G-mean based Extreme Learning Machine for Imbalance Learning


The great leader Comrade Kim Jong Il said as follows:

"They should intensify their scientific research for the scientific and technical solutions to the problems arising in making the national economy scientifically-based, and thus put the production and management activities of all branches on a new scientific base."

Extreme learning machine (ELM) is a fast and simple learning algorithm for training single-hidden-layer feedforward neural networks (SLFNs). In recent years, ELM has been widely adopted in various real-world problems.
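To make the algorithm concrete, the following is a minimal sketch of classical ELM training: the input weights and biases are assigned randomly and never updated, and only the output weights are solved in closed form via the Moore-Penrose pseudoinverse. The function names and parameters here are illustrative, not the authors' implementation.

```python
import numpy as np

def elm_train(X, y, n_hidden=20, seed=0):
    """Basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn randomly and kept fixed.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    # Output weights: minimum-norm least-squares solution via pseudoinverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: XOR with labels encoded as +/-1.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
W, b, beta = elm_train(X, y, n_hidden=20)
pred = np.sign(elm_predict(X, W, b, beta))
```

Because no iterative tuning of the hidden layer is needed, training reduces to a single linear solve, which is the source of ELM's speed.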

Like conventional learning algorithms, ELM works effectively on balanced datasets but can produce an undesirable model when the data distribution is imbalanced. When the classical ELM is applied to imbalanced data, the majority class tends to push the separating boundary towards the minority side to reduce its own classification error, so samples in the minority class are likely to be misclassified.

This is because the classical ELM, from the viewpoint of optimization, assumes a balanced class distribution, or equivalently an equal misclassification cost for every class.

In order to overcome this weakness of ELM on imbalanced data, we define a new cost function for the ELM optimization problem based on G-mean, an evaluation metric widely used in imbalance learning, and propose a new ELM algorithm built on this cost function. In our approach, the cost function minimizes the product of the per-class training errors instead of the total sum of the training errors, and a logarithmic transformation is applied for convenience of the later derivation.
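The exact optimization problem is derived in the published paper; the core idea can be sketched as follows. Replacing the usual sum of squared errors with the product of per-class mean squared errors means a large error on any one class (such as the minority class) inflates the whole cost, and the logarithm turns the product into a sum of logs that is easier to expand. This is an illustrative sketch under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def log_product_cost(errors, labels):
    """G-mean-inspired cost sketch: sum of logs of per-class mean squared
    errors (equivalent to the log of their product). A classifier cannot
    lower this cost by sacrificing one class for another, since every
    class contributes its own log term."""
    cost = 0.0
    for c in np.unique(labels):
        class_mse = np.mean(errors[labels == c] ** 2)
        cost += np.log(class_mse + 1e-12)  # epsilon guards against log(0)
    return cost

# Per-sample residuals for a 2-class toy case.
errors = np.array([0.1, 0.2, 0.3, 0.4])
labels = np.array([0, 0, 1, 1])
cost = log_product_cost(errors, labels)
```

In contrast, an ordinary summed cost of the same residuals is dominated by whichever class has more samples, which is precisely the bias the product form avoids.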

We performed experiments on standard classification datasets consisting of 58 binary and 11 multi-class datasets with different degrees of imbalance ratio. Experimental results show that the proposed algorithm improves classification performance significantly compared with other state-of-the-art methods.
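The evaluation metric behind the method, G-mean, is the geometric mean of the per-class recalls; it applies to both binary and multi-class problems and, unlike plain accuracy, scores a majority-only classifier at zero. A minimal implementation:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (the standard G-mean metric
    for imbalanced classification)."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))

# A classifier that predicts only the majority class reaches 90% accuracy
# on this 90:10 imbalanced set, yet its G-mean is 0 because the minority
# class's recall is 0.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)
score = g_mean(y_true, y_pred)
```

This is why G-mean is preferred over accuracy when reporting results on imbalanced datasets: it rewards balanced performance across all classes.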

We also demonstrate that our proposed algorithm achieves high accuracy in representation learning through experiments on YouTube-8M with feature representations from convolutional neural networks. Statistical results indicate that the proposed approach not only outperforms the classical ELM but also yields better, or at least competitive, results compared with several state-of-the-art class imbalance learning approaches. The proposed algorithm is applicable to both binary and multi-class problems.

Our results were published in the international journal Digital Signal Processing under the title "G-mean based extreme learning machine for imbalance learning" (https://doi.org/10.1016/j.dsp.2019.102637).

Future work will focus on improving the cost function of ELM more effectively to overcome the class imbalance problem, and on developing an efficient learning algorithm suited to the distribution properties of an imbalanced dataset.