1. Home
  2. Browse by Author

Browsing by Author "KABORE, KADER MONHAMADY"

Filter results by typing the first few letters
Now showing 1 - 1 of 1
  • Results Per Page
  • Sort Options
  • Loading...
    Thumbnail Image
    masterthesis.listelement.badge
    Developing machine learning methods for business intelligence
    (Abdullah Gül Üniversitesi, 2018) KABORE, KADER MONHAMADY; AGÜ, Fen Bilimleri Enstitüsü, Elektrik ve Bilgisayar Mühendisliği Ana Bilim Dalı; KABORE, KADER MONHAMADY
    Detection of key attributes in text is an area of research, which attracts attention due to the increase of data and the availability of massive documents. Key attributes serve as metadata for documents and the discovery of accurate characteristics allows to capture significant pieces of information from a lengthy text. They allow faster and efficient information retrieval on the web domain with an ever increasing number of websites. In this thesis, a novel two-stage machine learning method is developed to identify the company name from web page text. The problem is reduced to a classification task at the token (i.e. word) level followed by a post-processing phase for predicting the company name. Features are extracted using natural language processing techniques and by observing patterns present in textual data to reflect the properties and significance of the words in context. Derived features are sent as input to classification algorithms such as naive Bayes, decision tree, and random forest. In addition to the token-based classifier, a rule-based method is designed that also considers tokens from domain as well as page title and ranks tokens by computing similarity metrics. The results demonstrate high precision from the machine learning model along with high undefined cases whereas the rule-based approach obtained high accuracy with precision inferior to the token-based model. When the two classification strategies are combined into a two-stage classifier, high accuracy and precision scores are obtained.