Publication:
Classification of breast cancer using ensemble machine learning with Apache Spark

Institutional Authors

Krotha, Durga Pujitha
Shaik, Fathimabi
Lakshmi, G. Jaya

Authors

Krotha, Durga Pujitha
Shaik, Fathimabi
Lakshmi, G. Jaya

Publisher:

Yıldız Teknik Üniversitesi

Abstract

Breast cancer is one of the most common and serious health problems affecting people around the world, so detecting it early and correctly identifying whether a tumor is benign or malignant are critical. In this study, we developed a new model, the Logistic Ensemble Fusion Model, to improve the accuracy of breast cancer diagnosis. The model combines the strengths of three different machine learning models, namely Support Vector Machine (SVM), Decision Tree, and Logistic Regression, into one ensemble that significantly improves on traditional methods. We used Apache Spark through its Python API (PySpark) to process large datasets quickly and efficiently. To select the most important features for prediction, we applied Recursive Feature Elimination (RFE) with two estimators: a Support Vector Machine (SVM-RFE) and a Random Forest (RF-RFE). We evaluated the model by dividing the data into training and testing sets in an 80:20 ratio. The Logistic Ensemble Fusion Model achieved an accuracy of 99.13%, a precision of 98.71%, a recall of 99.91%, and an F1 score of 99.12%. The entire process, which involved running 12 Spark jobs, completed in 38 seconds. The model was also compared with other classifiers, including Random Forest, Gradient Boosting, Factorization Machine, One-vs-Rest, and Multilayer Perceptron. The main innovation of this study is the combination of multiple machine learning models in a unified ensemble fusion approach, which provides improved classification performance and represents a significant advance over previous methods. This study underscores the potential of advanced ensemble machine learning techniques and big data technologies for refining breast cancer diagnosis and supporting more effective clinical decision-making.
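The abstract does not spell out how the three base models are fused. The sketch below is a minimal, dependency-free illustration of one plausible reading, assuming the Logistic Ensemble Fusion Model is a stacking ensemble: a logistic-regression meta-learner is trained on the per-sample scores produced by the SVM, Decision Tree and Logistic Regression base models. It also shows an 80:20 split and the four reported metrics. All function names (`split_80_20`, `fit_fusion`, `predict_fusion`, `classification_metrics`) and the example scores are hypothetical. In a PySpark pipeline the split would more likely use `DataFrame.randomSplit([0.8, 0.2], seed=...)`, which yields only approximately 80:20 proportions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def split_80_20(rows: Sequence, seed: int = 42) -> Tuple[list, list]:
    """Shuffle a copy of the rows and cut it into 80% train / 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fusion(base_scores: List[List[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Assumed fusion step: fit a logistic-regression meta-learner on base-model
    scores (one column per base model) with full-batch gradient descent on log loss."""
    n, k = len(labels), len(base_scores[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(base_scores, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_fusion(w: List[float], b: float, base_scores: List[List[float]],
                   threshold: float = 0.5) -> List[int]:
    """Label a sample 1 (malignant) when the fused probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0
            for x in base_scores]


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with class 1 (malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical malignancy scores from three base models (SVM, DT, LR), one row per sample.
scores = [[0.9, 0.8, 0.95], [0.85, 1.0, 0.7], [0.2, 0.0, 0.1],
          [0.1, 0.3, 0.2], [0.7, 1.0, 0.8], [0.3, 0.0, 0.25]]
labels = [1, 1, 0, 0, 1, 0]
w, b = fit_fusion(scores, labels)
print(classification_metrics(labels, predict_fusion(w, b, scores)))
# expected: {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

In a real stacking setup the meta-learner should be trained on held-out (for example, out-of-fold) base-model scores from the training split only, and evaluated on the 20% test split; the toy example fits and scores the same six rows just to exercise the code. F1 here is the harmonic mean of precision and recall for the positive class; averaged variants (for example, class-weighted) can give different values.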

Subject

Spark, Machine learning, Feature selection, Classification, Ensemble methods, Breast cancer
