Developing machine learning methods for business intelligence

dc.contributor.author KABORE, KADER MONHAMADY
dc.contributor.department AGÜ, Fen Bilimleri Enstitüsü, Elektrik ve Bilgisayar Mühendisliği Ana Bilim Dalı en_US
dc.contributor.institutionauthor KABORE, KADER MONHAMADY
dc.date.accessioned 2020-07-21T13:47:40Z
dc.date.available 2020-07-21T13:47:40Z
dc.date.issued 2018 en_US
dc.description.abstract Detection of key attributes in text is an area of research that attracts attention because of the growth of data and the availability of massive document collections. Key attributes serve as metadata for documents, and discovering accurate characteristics makes it possible to capture significant pieces of information from lengthy text. They enable faster and more efficient information retrieval on the web, where the number of websites keeps increasing. In this thesis, a novel two-stage machine learning method is developed to identify the company name from web page text. The problem is reduced to a classification task at the token (i.e., word) level, followed by a post-processing phase that predicts the company name. Features are extracted using natural language processing techniques and by observing patterns in the textual data, so that they reflect the properties and significance of words in context. The derived features are given as input to classification algorithms such as naive Bayes, decision tree, and random forest. In addition to the token-based classifier, a rule-based method is designed that also considers tokens from the domain name and the page title and ranks tokens by computing similarity metrics. The results show that the machine learning model achieves high precision but leaves many cases undefined, whereas the rule-based approach achieves high accuracy with lower precision than the token-based model. When the two classification strategies are combined into a two-stage classifier, both high accuracy and high precision are obtained. en_US
dc.identifier.other Thesis No: 541338
dc.identifier.uri https://hdl.handle.net/20.500.12573/323
dc.language.iso eng en_US
dc.publisher Abdullah Gül Üniversitesi en_US
dc.relation.publicationcategory Thesis en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Named Entity Recognition en_US
dc.subject Company Name Detection en_US
dc.subject Natural Language Processing en_US
dc.subject Web Mining en_US
dc.subject Feature Extraction en_US
dc.subject Machine Learning en_US
dc.title Developing machine learning methods for business intelligence en_US
dc.title.alternative İş zekası için makine öğrenmesi yöntemlerinin geliştirilmesi en_US
dc.type masterThesis en_US
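
The abstract describes the two-stage method only at a high level, so the following is a minimal Python sketch of the idea under stated assumptions, not the thesis's implementation. Everything concrete here is hypothetical: the four binary token features, the legal-suffix list, the toy training vectors, the thresholds, and the use of `difflib.SequenceMatcher.ratio` as the similarity metric. A small Bernoulli naive Bayes stands in for whichever of the three named classifiers is used, and the combination rule (trust stage 1 when it returns a name, otherwise fall back to the rule-based ranking of title tokens against the domain name) is an inference from the precision versus undefined-case trade-off the abstract reports.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical legal-form suffixes; not the thesis's actual feature set.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "corp"}


def domain_label(url: str) -> str:
    """'https://www.acme-robotics.com/about' -> 'acmerobotics'."""
    host = re.sub(r"^[a-z]+://", "", url.lower()).split("/")[0]
    return host.removeprefix("www.").split(".")[0].replace("-", "")


def token_features(tokens: list[str], i: int, label: str, title: str) -> tuple[int, ...]:
    """Four illustrative binary features for tokens[i]."""
    tok = tokens[i].lower()
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    return (
        int(tokens[i][:1].isupper()),          # capitalised word
        int(len(tok) > 2 and tok in label),    # substring of the domain label
        int(tok in title.lower().split()),     # word of the page title
        int(nxt in LEGAL_SUFFIXES),            # followed by a legal-form suffix
    )


class BernoulliNB:
    """Minimal Bernoulli naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X: list[tuple[int, ...]], y: list[int]) -> "BernoulliNB":
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, yi in zip(X, y) if yi == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature_j = 1 | class c)
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict_proba(self, x: tuple[int, ...]) -> dict[int, float]:
        logs = {c: self.log_prior[c] + sum(math.log(p if xj else 1 - p)
                                          for xj, p in zip(x, self.p[c]))
                for c in self.classes}
        top = max(logs.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - top) for v in logs.values())
        return {c: math.exp(v - top) / z for c, v in logs.items()}


def stage1(model: BernoulliNB, tokens: list[str], label: str, title: str,
           threshold: float = 0.5) -> Optional[str]:
    """Classify each token, then join runs of positive tokens into spans and
    return the most frequent span; None means the case is left undefined."""
    spans, run = [], []
    for i, tok in enumerate(tokens):
        if model.predict_proba(token_features(tokens, i, label, title))[1] >= threshold:
            run.append(tok)
        elif run:
            spans.append(" ".join(run))
            run = []
    if run:
        spans.append(" ".join(run))
    return Counter(spans).most_common(1)[0][0] if spans else None


def stage2(label: str, title: str, min_sim: float = 0.6) -> Optional[str]:
    """Rank title unigrams and bigrams by string similarity to the domain label."""
    words = re.findall(r"[A-Za-z]+", title)
    cands = words + [f"{a} {b}" for a, b in zip(words, words[1:])]

    def sim(c: str) -> float:
        return SequenceMatcher(None, c.lower().replace(" ", ""), label).ratio()

    best = max(cands, key=sim, default=None)
    return best if best is not None and sim(best) >= min_sim else None


def two_stage(model: BernoulliNB, text: str, url: str, title: str) -> Optional[str]:
    """Trust stage 1 when it commits to a name; otherwise fall back to stage 2."""
    label = domain_label(url)
    tokens = re.findall(r"[A-Za-z]+", text)
    return stage1(model, tokens, label, title) or stage2(label, title)


# Toy, hand-made vectors (capitalised, in_domain, in_title, before_suffix); 1 = company token.
TOY_X = [(1, 1, 1, 0), (1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 1),
         (1, 0, 0, 0), (0, 0, 0, 0), (0, 0, 1, 0), (1, 0, 1, 0), (0, 0, 0, 0), (1, 0, 0, 1)]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = BernoulliNB().fit(TOY_X, TOY_Y)
```

With this toy model, `two_stage(model, "Welcome to Acme Robotics Ltd. We build industrial arms.", "https://www.acme-robotics.com/about", "Acme Robotics | Industrial Arms")` returns `"Acme Robotics"` from stage 1. If the company tokens are removed from the page text, stage 1 leaves the case undefined and the domain-to-title similarity ranking in stage 2 supplies the same name; when neither stage is confident, the result stays `None`.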

Files

Original bundle

Name: Developing Machine Learning Methods for busness intelligence.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format
Description: Master's Thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission