-

Diabetes Prediction based on diagnostic measures

This project was a task proposed on a kaggle dataset.

The objective was to diagnosticly predict whether a patient has diabetes or not based on the certain diagnostic measurements.

The data was checked, explored and then prepared for modelling. Given the high percentage (~50%) of missing values for one feature, machine learning based imputation was applied.
The Support Vector Machine model was applied, but the accuracy of multiple models was assessed.

The final model had an accuracy of approximately 80%, with low sensitivity (57.1%), but high specificity (91.7%).
In practical term, the model can updated the prior risk for an individual from 23% to 65% or 12.5%, with a positive or negative result, respectively.

Afterwards, two methods were used to assess the most important features, namely Shapley Values and Permutation Importance. Both indicated Blood Glucose, Insulin and BMI as the most relevant features.

At the end, a decision surface of the model was presented in 2D using PCA transformed features, as a visualization alternative.

title
title
Open Image Initial Exploration - Missing values & Boxplot
title
Open Image Initial Exploration - Distributions
title
Open Image Inicial Exploration - Pairplot
title
Open Image Confusion Matrix and Bayesian Statistics
title
Open Image Shapley Values Analysis
title
Open Image Decision surface using PCA

Location

  •  Recife,Brazil
  •   +55 89 9 9464-0354
  • fariasaramis@gmail.com
  • aramisfa