Predicting Breast Cancer Diagnosis

https://medium.com/@tjanmichela/predicting-breast-cancer-diagnoses-207154c25089

Contributions by Michela Tjan Effendie and Michelle Manfrini.

Description

Breast cancer is one of the leading causes of death among women, so early screening is important for increasing the chances of successful treatment. A robust prediction model can help medical professionals identify cases early. Our project proposes four breast cancer prediction models based on logistic regression, K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA).
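As a rough illustration of how the four model families named above can be compared side by side, here is a minimal sketch using scikit-learn. It is not the project's actual code: the data is a generated stand-in from `make_classification`, and the split size, `n_neighbors=5`, and the scaling step are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: 9 numeric predictors and a binary label (NOT the project's data).
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per model family; hyperparameters here are illustrative only.
models = {
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Fit every model on the same training split and score it on the same test split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Scaling is applied to logistic regression and KNN because both are sensitive to feature magnitudes; LDA and QDA are left unscaled in this sketch.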

Framework

Problem definition

Data

Evaluation

Features

Modelling

Experimentation

Research Questions

❓

Based on a model trained on synthetic data, how well can that model predict whether or not a clinical patient has breast cancer?

Can the model accurately predict whether an individual has breast cancer based on the provided predictors?

Which attributes are significant in distinguishing between healthy and affected individuals?

How well does the model perform in terms of accuracy and reliability?

How clinically relevant and applicable is the model? Can it be used by healthcare professionals for early-stage breast cancer detection?
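The first question implies a specific evaluation design: fit on synthetic rows only, then score on clinical rows only. A minimal sketch of that design follows, assuming scikit-learn and a logistic regression as the example model. The helper `evaluate_transfer` is hypothetical, and the arrays are random placeholders that only mimic the shapes of the two datasets (4,000 and 116 rows, 9 predictors), so the printed accuracy says nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_transfer(X_synthetic, y_synthetic, X_clinical, y_clinical):
    """Fit on synthetic rows only, then return accuracy on clinical rows only."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_synthetic, y_synthetic)
    return accuracy_score(y_clinical, model.predict(X_clinical))


# Random placeholder arrays (NOT real data), shaped like the two datasets.
rng = np.random.default_rng(0)
X_synthetic = rng.normal(size=(4000, 9))
y_synthetic = rng.integers(1, 3, size=4000)  # labels are 1 or 2
X_clinical = rng.normal(size=(116, 9))
y_clinical = rng.integers(1, 3, size=116)

clinical_accuracy = evaluate_transfer(
    X_synthetic, y_synthetic, X_clinical, y_clinical
)
print(f"Accuracy on clinical rows: {clinical_accuracy:.3f}")
```

Keeping the clinical rows entirely out of training is what lets the score speak to the first question; any preprocessing, such as the scaler here, is fitted on the synthetic rows alone.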

Data Sources

Two datasets:

The original “Breast Cancer Coimbra” dataset found in the UC Irvine Machine Learning Repository, with 116 instances based on clinical observations from 64 patients with breast cancer and 52 healthy controls.

Features

Age (years): Represents the age of individuals in the dataset.

BMI (kg/m²): Body Mass Index, a measure of body fat based on weight and height.

Glucose (mg/dL): Reflects blood glucose levels, a vital metabolic indicator.

Insulin (µU/mL): Indicates insulin levels, a hormone associated with glucose regulation.

HOMA: Homeostatic Model Assessment, a method assessing insulin resistance and beta-cell function.

Leptin (ng/mL): Represents leptin levels, a hormone involved in appetite and energy balance regulation.

Adiponectin (µg/mL): Reflects adiponectin levels, a protein associated with metabolic regulation.

Resistin (ng/mL): Indicates resistin levels, a protein implicated in insulin resistance.

MCP-1 (pg/dL): Reflects Monocyte Chemoattractant Protein-1 levels, a cytokine involved in inflammation.

Labels:

1: Healthy controls

2: Patients with breast cancer

A derived dataset from Kaggle: a synthetic dataset of 4,000 instances, generated by a deep learning model that was trained on the original dataset.

Features and labels: identical to those of the original dataset (the nine predictors and the 1/2 label coding listed above).
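Both datasets code the label as 1 (healthy control) and 2 (patient with breast cancer). Many classifiers and metrics are easier to read with a 0/1 target, so a small recoding step can help. The sketch below is one possible way to do it. The names `FEATURES`, `LABEL_NAMES`, and `to_binary` are hypothetical, and the exact column spelling in the source files is an assumption.

```python
# Predictor names as listed above; the header spelling in the actual files
# is an assumption.
FEATURES = [
    "Age", "BMI", "Glucose", "Insulin", "HOMA",
    "Leptin", "Adiponectin", "Resistin", "MCP-1",
]
LABEL_NAMES = {1: "Healthy control", 2: "Patient with breast cancer"}


def to_binary(label):
    """Map the datasets' 1/2 coding to 0/1, where 1 means breast cancer."""
    if label not in LABEL_NAMES:
        raise ValueError(f"Unexpected label: {label!r}")
    return label - 1


# Class counts reported for the original dataset: 52 controls, 64 patients.
original_labels = [1] * 52 + [2] * 64
binary_labels = [to_binary(label) for label in original_labels]

print(f"{sum(binary_labels)} of {len(binary_labels)} rows are patients")
# prints "64 of 116 rows are patients"
```

Rejecting unexpected values explicitly, rather than silently subtracting 1, guards against rows where the label column was mis-parsed.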

Results

Results reveal that none of the models significantly outperforms the others. KNN appears to be the highest-performing model across all the evaluation metrics, including the confusion matrix, and it does not show a strong tendency to predict one class over the other.
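To make the two ideas in this summary concrete (metrics derived from a confusion matrix, and a model's tendency to favor one class), here is a small self-contained sketch. The function names and the toy labels are made up for illustration and are not the project's results. Label 2 (patient with breast cancer) is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=2):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=2):
    """Compute basic metrics plus predicted vs. actual positive share."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # If these two shares differ a lot, the model leans toward one class.
        "predicted_positive_rate": (tp + fp) / total,
        "actual_positive_rate": (tp + fn) / total,
    }


# Toy labels for illustration only (1 = healthy control, 2 = patient).
y_true = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [1, 1, 2, 1, 2, 2, 1, 2, 2, 2]

summary = summarize(y_true, y_pred)
for metric, value in summary.items():
    print(f"{metric}: {value:.3f}")
```

In this toy case the predicted and actual positive shares are both 0.6, which is the pattern one would expect from a model that does not lean toward either class.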

Click here for the in-depth project report.
