Standard predictive models are designed under the assumption that samples are distributed evenly across all classes. Imbalanced classification, where the distribution of samples over the classes is skewed, poses a major challenge to the artificial intelligence community. In such cases the classification outcome is dominated by the biased distribution: the minority class contains a small number of samples while the majority class contains a large number. Skewed or biased data distributions arise in a wide range of applications, such as disease diagnosis, cyber security, image recognition and earth observation.
Imbalanced classification problems have been explored for many years in the machine learning community. Methods for addressing them can be categorised into data-level, algorithm-level and hybrid approaches. Data-level methods modify the training distribution by over- and under-sampling. Unlike data sampling, algorithm-level approaches enhance the learning and decision process so as to increase the importance of the positive class, for example through cost-sensitive learning. Hybrid methods combine the two, coupling data sampling with cost-sensitive learning.
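As a concrete sketch of the two levels (illustrative only; the dataset and variable names are our own, not part of the proposed system), the snippet below balances a toy dataset by random over-sampling of the minority class and computes inverse-frequency class weights of the kind used in cost-sensitive learning:

```python
import numpy as np

# Toy imbalanced dataset: 90 majority samples (label 0) vs 10 minority (label 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

# Data-level remedy: random over-sampling of the minority class
# until both classes contain the same number of samples.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=90 - 10, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# Algorithm-level remedy: inverse-frequency class weights that raise the
# cost of misclassifying the minority (positive) class during training.
counts = np.bincount(y)
class_weights = len(y) / (2 * counts)  # weight for class c = N / (K * n_c)
```

After over-sampling, both classes contain 90 samples, and the minority class receives a far larger loss weight than the majority class.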
Among the available techniques, deep learning has achieved promising success owing to its high learning capacity. Despite these powerful capabilities, deep learning architectures remain vulnerable to imbalanced data distributions, particularly when learning complex data representations.
In this project, we intend to explore the possibility of incorporating boosting concepts into deep learning architectures for imbalanced classification. We aim to achieve the following objectives over the course of the project:
1. To develop a novel deep learning framework with boosting for imbalanced classification.
2. To develop a new representation learning strategy for multi-task applications with scarce samples and labels.
3. To comprehensively evaluate the systems proposed in objectives (1) and (2).
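As a toy illustration of the boosting idea that objective (1) would build on (a minimal sketch, not the proposed framework: AdaBoost-style sample reweighting with decision stumps standing in for deep networks), samples misclassified in earlier rounds, which are often the minority-class points, receive larger weights in later rounds:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak learner: best single-feature threshold under sample weights w."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=10):
    """AdaBoost with labels y in {-1, +1}: reweight samples each round."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # learner's vote weight
        pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)           # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of all weak learners."""
    score = sum(a * np.where(s * (X[:, j] - t) > 0, 1, -1)
                for a, j, t, s in ensemble)
    return np.sign(score)
```

The proposed framework would replace the stumps with deep networks and adapt the reweighting to favour the minority class; the loop above only conveys the reweighting mechanism itself.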
The expected outcomes of the proposed project include one fully working demo system, two papers submitted to top journals (e.g. the Journal of Machine Learning Research) and two papers submitted to top conferences such as ICML.
- J. Van Hulse, T. M. Khoshgoftaar and A. Napolitano, "Experimental perspectives on learning from imbalanced data," in Proc. 24th International Conference on Machine Learning (ICML '07), New York, NY, USA: ACM, 2007, pp. 935–942.
- C. X. Ling and V. S. Sheng, "Cost-sensitive learning and the class imbalance problem," in C. Sammut, Ed., 2007.
- H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, 2009.
- F. Bao, Y. Deng, Y. Kong, Z. Ren, J. Suo and Q. Dai, "Learning deep landmarks for imbalanced classification," IEEE Trans. Neural Networks Learn. Syst., vol. 31, no. 8, pp. 2691–2704, 2020.
- X. Jing, X. Zhang, X. Zhu, F. Wu, X. You, Y. Gao, S. Shan and J. Yang, "Multiset feature learning for highly imbalanced data classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 139–156, 2021.