Data Mining Techniques in Direct Marketing on Imbalanced Data using Tomek Link Combined with Random Under-sampling

dc.contributor.author Ümit Yilmaz
dc.contributor.author Zafer Aydin
dc.contributor.author V. Çağri Güngör
dc.contributor.author Cengiz Gezer
dc.contributor.department AGÜ, Faculty of Engineering, Department of Computer Engineering en_US
dc.contributor.institutionauthor Yilmaz, Ümit
dc.contributor.institutionauthor Aydin, Zafer
dc.contributor.institutionauthor Güngör, V. Çağri
dc.date.accessioned 2022-04-08T07:16:55Z
dc.date.available 2022-04-08T07:16:55Z
dc.date.issued 2021 en_US
dc.description.abstract Identifying potential customers is very important in direct marketing, and data mining techniques are among the most important methods companies use to do so. However, because potential customers are far fewer than non-potential customers, there is a class imbalance problem that significantly affects the performance of data mining techniques. In this paper, different combinations of basic and advanced resampling techniques, such as Synthetic Minority Oversampling Technique (SMOTE), Tomek Link, Random Under-Sampling (RUS), and Random Over-Sampling (ROS), were evaluated to improve the performance of customer classification. Feature selection techniques such as Information Gain, Gain Ratio, Chi-squared, and Relief were used to reduce the number of non-informative features in the data. Classification performance was compared across several data mining techniques, namely LightGBM, XGBoost, Gradient Boost, Random Forest, AdaBoost, ANN, Logistic Regression, Decision Trees, SVC, and Bagging Classifier, based on the ROC AUC and sensitivity metrics. The combination of Tomek Link and Random Under-Sampling as the resampling technique and the Chi-squared method as the feature selection algorithm outperformed the other combinations. Detailed performance evaluations demonstrated that, with the proposed approach, LightGBM, a gradient boosting algorithm based on decision trees, gave the best results among the classifiers, with a sensitivity of 0.947 and a ROC AUC of 0.896. © 2021 ACM. en_US
dc.description.sponsorship Illinois State University; South Asia Institute of Science and Engineering (SAISE); University of Hawaii at Hilo en_US
dc.identifier.isbn 978-145038954-9
dc.identifier.uri https://doi.org/10.1145/3471287.3471299
dc.identifier.uri https://hdl.handle.net/20.500.12573/1256
dc.language.iso eng en_US
dc.publisher Association for Computing Machinery en_US
dc.relation.isversionof 10.1145/3471287.3471299 en_US
dc.relation.journal ACM International Conference Proceeding Series en_US
dc.relation.publicationcategory Conference Item - International - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Data Mining en_US
dc.subject Direct Marketing en_US
dc.subject Imbalanced Data en_US
dc.subject Machine Learning en_US
dc.subject Tomek Link en_US
dc.title Data Mining Techniques in Direct Marketing on Imbalanced Data using Tomek Link Combined with Random Under-sampling en_US
dc.type conferenceObject en_US
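
The abstract names Tomek Link removal followed by Random Under-Sampling as the best-performing resampling combination, but this record does not include the authors' implementation. The following is only a minimal pure-Python sketch of that two-step idea on toy data. The function names, the `ratio` parameter, the toy points, and the choice to drop only the majority-class member of each Tomek link are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random

def nearest_neighbor(i, X):
    """Index of the sample closest to X[i] (Euclidean), excluding i itself."""
    best, best_d = -1, math.inf
    for j, x in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], x)
        if d < best_d:
            best, best_d = j, d
    return best

def remove_tomek_links(X, y, majority=0):
    """Drop the majority-class member of every Tomek link (assumed convention)."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # A Tomek link is a pair of mutual nearest neighbours with different labels.
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

def random_under_sample(X, y, majority=0, ratio=1.0, seed=0):
    """Randomly keep majority samples so n_minority / n_majority is about `ratio`."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    n_keep = min(len(maj), int(len(mino) / ratio))
    keep = sorted(rng.sample(maj, n_keep) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: label 0 = non-potential customer (majority),
# label 1 = potential customer (minority).
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.0, 0.2), (0.3, 0.3),
     (1.05, 1.0), (1.0, 1.0), (2.0, 2.0), (2.1, 2.0)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# Step 1: clean the class boundary by removing Tomek links.
X_t, y_t = remove_tomek_links(X, y)
# Step 2: randomly under-sample the remaining majority class.
X_bal, y_bal = random_under_sample(X_t, y_t, ratio=1.0, seed=0)
```

In this toy set, the majority point (1.05, 1.0) and the minority point (1.0, 1.0) are mutual nearest neighbours with different labels, so the first step removes the majority point. The second step then discards majority samples at random until the two classes are the same size. In practice a maintained library would normally be preferred over a hand-written quadratic-time neighbour search like this one.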

Files

Original bundle

Name:
Data Mining Techniques in Direct Marketing on Imbalanced.pdf
Size:
466.97 KB
Format:
Adobe Portable Document Format
Description:
Conference Item