Data Mining Techniques in Direct Marketing on Imbalanced Data Using Tomek Link Combined With Random Under-Sampling


Date

2021

Publisher

Association for Computing Machinery

Green Open Access

No

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Abstract

Identifying potential customers is very important in direct marketing, and data mining techniques are among the most important methods companies use to do so. However, because potential customers are far fewer than non-potential customers, there is a class imbalance problem that significantly degrades the performance of data mining techniques. In this paper, different combinations of basic and advanced resampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE), Tomek Link, Random Under-Sampling (RUS), and Random Over-Sampling (ROS), were evaluated to improve the performance of customer classification. Feature selection techniques, such as Information Gain, Gain Ratio, Chi-squared, and Relief, were used to reduce the number of non-informative features in the data. Classification performance was compared across several data mining techniques, such as LightGBM, XGBoost, Gradient Boost, Random Forest, AdaBoost, ANN, Logistic Regression, Decision Trees, SVC, and Bagging Classifier, based on the ROC AUC and sensitivity metrics. The combination of Tomek Link and Random Under-Sampling as the resampling technique with the Chi-squared method as the feature selection algorithm outperformed the other combinations. Detailed performance evaluations showed that, with the proposed approach, LightGBM, a gradient boosting algorithm based on decision trees, gave the best results among the classifiers, with a sensitivity of 0.947 and a ROC AUC of 0.896.
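
The resampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the choice to remove only the majority-class member of each Tomek link, and the 1:1 target class ratio are all assumptions made here for illustration, since the abstract does not specify them. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; removing the majority-class side cleans the class boundary before random under-sampling balances the classes.

```python
import math
import random


def tomek_link_indices(X, y):
    """Return indices of all samples that take part in a Tomek link.

    Two samples form a Tomek link when they have different labels and
    each one is the other's nearest neighbour (Euclidean distance).
    """
    def nearest(i):
        # Index of the closest other sample to X[i].
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}


def tomek_then_rus(X, y, majority=0, seed=0):
    """Drop majority-class members of Tomek links, then randomly
    under-sample the majority class down to the minority-class size."""
    linked = tomek_link_indices(X, y)
    # Step 1 (assumption): remove only the majority side of each link.
    keep = [i for i in range(len(X))
            if not (i in linked and y[i] == majority)]
    maj = [i for i in keep if y[i] == majority]
    mino = [i for i in keep if y[i] != majority]
    # Step 2 (assumption): under-sample to a 1:1 class ratio.
    rng = random.Random(seed)
    if len(maj) > len(mino):
        maj = rng.sample(maj, len(mino))
    chosen = sorted(maj + mino)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Hypothetical toy data: label 1 = potential customer (minority class).
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0),
     (3.2, 3.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0]

X_res, y_res = tomek_then_rus(X, y)
print(sorted(tomek_link_indices(X, y)), y_res.count(0), y_res.count(1))
```

In this toy set the only Tomek link is the pair at indices 2 and 3, so the majority-class point (1.0, 1.0) is removed first and the remaining majority samples are then randomly reduced to match the three minority samples. The brute-force neighbour search is quadratic in the number of samples; a real pipeline would more likely rely on an established resampling library.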

Description

Yilmaz, Umit/0000-0003-2918-7799

Keywords

Direct Marketing, Data Mining, Tomek Link, Machine Learning, Imbalanced Data

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

WoS Q

N/A

Scopus Q

N/A
OpenCitations Citation Count
2

Source

5th International Conference on Information System and Data Mining (ICISDM) -- MAY 27-29, 2021 -- ELECTR NETWORK

Start Page

67

End Page

73
PlumX Metrics
Citations

CrossRef : 2

Scopus : 3

Captures

Mendeley Readers : 16

OpenAlex FWCI
0.2719
