Publication:
Classification of breast cancer using ensemble machine learning with Apache Spark

dc.contributor.author: Krotha, Durga Pujitha
dc.contributor.author: Shaik, Fathimabi
dc.contributor.author: Lakshmi, G. Jaya
dc.contributor.buuauthor: Krotha, Durga Pujitha
dc.contributor.buuauthor: Shaik, Fathimabi
dc.contributor.buuauthor: Lakshmi, G. Jaya
dc.contributor.department: Faculty of Engineering
dc.contributor.department: Department of Textile Engineering
dc.contributor.department: Faculty of Arts and Sciences
dc.contributor.department: Department of Chemistry
dc.contributor.orcid: 0009-0006-9127-0736
dc.contributor.scopusid: 58802674100
dc.contributor.scopusid: 60042155100
dc.contributor.scopusid: 60041519300
dc.date.accessioned: 2025-11-28T11:25:56Z
dc.date.issued: 2025-08-01
dc.description.abstract: Breast cancer is one of the most common and serious problems affecting people around the world. Detecting it early and correctly identifying whether a tumor is benign or malignant are essential. In this study, we developed a new model, the Logistic Ensemble Fusion Model, to improve the accuracy of breast cancer diagnosis. The model combines three machine learning models (Support Vector Machine, Decision Tree, and Logistic Regression) into a single ensemble that significantly improves over traditional methods. We used Apache Spark with its Python API to handle large datasets quickly and efficiently. To select the most important features for prediction, we used Recursive Feature Elimination (RFE) with both a Support Vector Machine (SVM-RFE) and a Random Forest (RF-RFE). We evaluated the model by splitting the data into training and testing sets in an 80:20 ratio. The Logistic Ensemble Fusion Model achieved an accuracy of 99.13%, precision of 98.71%, recall of 99.91%, and an F1 score of 99.12%. The entire process, which involved running 12 Spark jobs, was completed in 38 seconds. The model was compared with other models such as Random Forest, Gradient Boosting, Factorization Machine, One-vs-Rest, and Multilayer Perceptron. The main innovation of this study is the use of multiple machine learning models in a unified ensemble fusion approach, which provides high classification performance and demonstrates a significant advance over previous methods. This study underscores the potential of advanced ensemble machine learning techniques and big data technologies in refining breast cancer diagnosis and supporting more effective clinical decision-making.
dc.identifier.doi: 10.14744/sigma.2025.00126
dc.identifier.endpage: 1399
dc.identifier.issn: 1304-7191
dc.identifier.issue: 4
dc.identifier.scopus: 2-s2.0-105013338127
dc.identifier.startpage: 1385
dc.identifier.uri: https://hdl.handle.net/11452/56974
dc.identifier.volume: 43
dc.indexed.scopus: Scopus
dc.language.iso: en
dc.publisher: Yıldız Teknik Üniversitesi
dc.relation.journal: Sigma Journal of Engineering and Natural Sciences
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Spark
dc.subject: Machine learning
dc.subject: Feature selection
dc.subject: Classification, ensemble methods
dc.subject: Breast cancer
dc.subject.scopus: Machine Learning Models for Breast Cancer Diagnosis
dc.title: Classification of breast cancer using ensemble machine learning with Apache Spark
dc.type: Article
dspace.entity.type: Publication
local.contributor.department: Faculty of Engineering/Department of Textile Engineering
local.indexed.at: Scopus
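
The abstract describes fusing three base classifiers and reporting accuracy, precision, recall, and F1, but it does not state the fusion rule. The following is a minimal sketch that assumes simple majority voting over the base models' predicted labels. It is plain Python rather than Spark code, the function names `majority_vote` and `binary_metrics` are hypothetical, and the toy labels are invented for illustration; none of this is taken from the study itself.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Fuse per-model label lists by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels (1 = malignant, 0 = benign); not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
svm_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # stand-in for Support Vector Machine output
tree_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # stand-in for Decision Tree output
lr_pred = [1, 0, 1, 1, 0, 0, 0, 0]    # stand-in for Logistic Regression output

fused = majority_vote(svm_pred, tree_pred, lr_pred)
metrics = binary_metrics(y_true, fused)
print(fused, metrics)
```

With an odd number of binary voters there are no ties, so the fused label is always well defined. In this toy case each base model makes at least one error on a different sample, and the vote corrects them, which is the general motivation for combining models.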
