Publication:
Character n-gram application for automatic new topic identification

dc.contributor.buuauthor: Çağlar, Burcu Gençosman
dc.contributor.buuauthor: Özmutlu, Hüseyin Cenk
dc.contributor.buuauthor: Özmutlu, Seda
dc.contributor.department: Faculty of Engineering
dc.contributor.department: Department of Industrial Engineering
dc.contributor.orcid: 0000-0003-0159-8529
dc.contributor.researcherid: AAH-4480-2021
dc.contributor.researcherid: ABH-5209-2020
dc.contributor.researcherid: AAG-8600-2021
dc.contributor.scopusid: 56263661900
dc.contributor.scopusid: 6603061328
dc.contributor.scopusid: 6603660605
dc.date.accessioned: 2023-06-21T10:13:19Z
dc.date.available: 2023-06-21T10:13:19Z
dc.date.issued: 2014-06-26
dc.description.abstract: The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of incorrect estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several perspectives, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
dc.identifier.citation: Çağlar, B. G. et al. (2014). "Character n-gram application for automatic new topic identification". Information Processing and Management, 50(6), 821-856.
dc.identifier.doi: 10.1016/j.ipm.2014.06.005
dc.identifier.endpage: 856
dc.identifier.issn: 0306-4573
dc.identifier.issn: 1873-5371
dc.identifier.issue: 6
dc.identifier.scopus: 2-s2.0-84905168720
dc.identifier.startpage: 821
dc.identifier.uri: https://doi.org/10.1016/j.ipm.2014.06.005
dc.identifier.uri: https://www.sciencedirect.com/science/article/pii/S0306457314000521
dc.identifier.uri: http://hdl.handle.net/11452/33097
dc.identifier.volume: 50
dc.identifier.wos: 000342546900001
dc.indexed.wos: SCIE
dc.indexed.wos: SSCI
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.journal: Information Processing and Management
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Content-ignorant algorithms
dc.subject: The Levenshtein edit-distance
dc.subject: New topic identification
dc.subject: The character n-gram method
dc.subject: Pre-processed spelling correction methods
dc.subject: Neural-network applications
dc.subject: Web
dc.subject: Categorization
dc.subject: Computer science
dc.subject: Information science & library science
dc.subject: Behavioral research
dc.subject: Search engines
dc.subject: Errors
dc.subject: Internet
dc.subject: Edit distance
dc.subject: Topic identification
dc.subject: Internet-based applications
dc.subject: Spelling correction
dc.subject: Minimizing the number of
dc.subject: Search engine performance
dc.subject: N-gram methods
dc.subject: Network methodologies
dc.subject: Algorithms
dc.subject.scopus: Query Reformulation; Image Indexing; Information Retrieval
dc.subject.wos: Computer science, information systems
dc.subject.wos: Information science & library science
dc.title: Character n-gram application for automatic new topic identification
dc.type: Article
dc.wos.quartile: Q2
dspace.entity.type: Publication
local.contributor.department: Faculty of Engineering/Department of Industrial Engineering
local.indexed.at: Scopus
local.indexed.at: WOS
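
The abstract contrasts character n-gram similarity with Levenshtein edit distance as ways to keep misspelled queries from being read as topic shifts. The Python sketch below is only an illustration of those two generic measures, not the authors' hybrid algorithm: the trigram size, the space padding, the Dice coefficient, and the function names `char_ngrams`, `dice_similarity`, and `levenshtein` are all assumptions made here, and the neural network stage described in the abstract is not modeled.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string.

    Padding with n-1 spaces on each side is an assumption of this sketch; it
    lets the first and last characters contribute as many n-grams as the rest.
    """
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets, in the range [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                  # delete ch_a
                curr[j - 1] + 1,              # insert ch_b
                prev[j - 1] + (ch_a != ch_b)  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical consecutive queries: a misspelling versus a genuine topic change.
    previous = "britney spears"
    for current in ("brittney spears", "cheap flights"):
        print(current,
              round(dice_similarity(previous, current), 3),
              levenshtein(previous, current))
```

Under these assumptions, a misspelled repeat of the previous query keeps a high n-gram similarity while an unrelated query scores near zero, which is the kind of signal a topic-shift detector could use before deciding that a new topic has started.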
