ALGORITHMS OF LEARNING AND EVALUATION OF MACHINE LEARNING MODELS OF STRUCTURED DATA SETS
DOI:
https://doi.org/10.32782/IT/2023-3-1
Keywords:
machine learning, data, algorithm, data processing, machine learning regression models, linear regression, decision tree, random forest.
Abstract
The article describes the sequential process of preliminary analysis and processing of structured data on various types of construction vehicles. It presents an algorithm for building machine learning models, in particular linear regression, decision tree, and random forest, for assessing the quality of the resulting models, and for producing results. The work describes research into buying and selling cars on the secondary market using modern data mining technologies. The main objective of the study is to predict vehicle value using attributes highly correlated with price. Pricing is examined by building three kinds of machine learning models: models that take into account the characteristics of specific brands of cars, models that take into account the characteristics specific to certain types of vehicles, and a general model that includes all the characteristics available in the data set. Models were built using linear regression and decision tree methods. The machine learning algorithms were selected to minimize cost-forecasting error while keeping the models fast and their results easy to interpret, that is, making clear which data a prediction was based on and which data most strongly influence the cost. To minimize the prediction error, detailed data analysis and preparation were carried out for each type of construction vehicle. Many experiments were conducted with various methods for finding and removing anomalous observations and for finding and using the essential features. In particular, methods such as the Z-score, the interquartile range, recursive feature elimination, and feature search based on detecting dependencies with statistical methods were used. A comparative analysis of the results of each model was carried out, and the possible reasons for specific results were analyzed.
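The two anomaly filters named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds (3.0 for the Z-score, 1.5 for the interquartile range multiplier), and the price list are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev, quantiles


def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute Z-score does not exceed `threshold`.

    The threshold of 3.0 is a common convention, assumed here.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) / sigma <= threshold]


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical prices with one obviously anomalous listing.
prices = [21000, 22500, 19800, 23000, 20500, 22000, 21500, 250000]

# With only 8 observations, a Z-score based on the population standard
# deviation cannot exceed sqrt(n - 1), about 2.65, so a threshold of 3.0
# can never remove anything here; a lower threshold is needed.
kept_by_z = zscore_filter(prices, threshold=2.5)
kept_by_iqr = iqr_filter(prices)
```

On small samples the interquartile range rule is usually the more robust of the two, because the quartiles are barely affected by a single extreme value, whereas that value inflates both the mean and the standard deviation used by the Z-score.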
The article also presents the main difficulty that arises in solving this regression problem: selecting the data that best captures how the cost of a technical vehicle is formed.
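One simple way to pick attributes "highly correlated with price" is to rank features by the absolute value of their Pearson correlation with the target. The sketch below assumes that approach; the article does not state which statistic was used, and the feature names, sample values, and the 0.5 cut-off are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(vx * vy)


def rank_features(columns, target, min_abs_corr=0.5):
    """Return (name, r) pairs with |r| >= min_abs_corr, strongest first."""
    scored = [(name, pearson(vals, target)) for name, vals in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= min_abs_corr]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data for five vehicles.
columns = {
    "year": [2010, 2012, 2014, 2016, 2018],
    "hours": [9000, 7000, 5200, 3000, 1000],  # operating hours
    "paint_code": [3, 1, 4, 1, 5],            # arbitrary category id
}
price = [20000, 26000, 31000, 38000, 45000]

ranked = rank_features(columns, price)
selected = [name for name, _ in ranked]
```

Pearson correlation only detects linear dependence, so a feature with a strong but non-linear effect on price could be dropped by this filter; that is one reason to compare it against other selection methods such as recursive feature elimination.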