Conversion of an Imbalanced Financial Data Source to a Wise Set through Sampling
Keywords:
Conversion, Financial Data, Imbalanced Data, Wise Data Set, Machine Learning, Sampling
Abstract
A large body of literature has explored ways to address high-dimensional feature space issues. Training classifiers on an imbalanced dataset is difficult, but reducing class imbalance can improve performance. We use an educationally-biased dataset. A classifier trained on an imbalanced dataset predicts the majority class more often than the rarely occurring minority classes. Accuracy, precision, recall, and F-measure are used to measure classifier performance. First, we explore classification with an imbalanced financial dataset. After balancing the classes using data-level techniques, we compare classifier performance. Both undersampling and oversampling balance financial datasets, but oversampling dominates. The production of financial data has exploded in many industries, and organizations want to process the financial data they acquire to gain decision-making insights. Classification with an imbalanced dataset may give higher accuracy but low precision and recall for the minority class.
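The two data-level techniques named in the abstract, and the gap between accuracy and minority-class precision and recall, can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority rows) and random undersampling (discarding majority rows); the abstract does not say which sampling variants the authors used. The helper names and the 95/5 toy labels are hypothetical and are not taken from the paper's data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 95 majority rows and 5 minority rows.
rows = [[i] for i in range(100)]
labels = ["normal"] * 95 + ["fraud"] * 5

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
over_counts = Counter(over_labels)
under_counts = Counter(under_labels)

# A degenerate classifier that always predicts the majority class.
always_majority = ["normal"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_majority)) / len(labels)
minority_scores = precision_recall_f1(labels, always_majority, positive="fraud")

print(over_counts, under_counts, accuracy, minority_scores)
```

In this toy setting the always-majority classifier reaches high accuracy while its minority-class precision, recall, and F-measure are all zero, which is the pattern the abstract warns about. In practice, resampling should be applied to the training split only, so that the evaluation data keeps its original class distribution.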