This project was based on a task proposed for a Kaggle dataset.
The provided dataset contains students' marks in several subjects together with socio-economic information. The main goal was to build a machine learning model that predicts each student's gender.
The data was checked, explored and then prepared for modelling. Two models were applied, namely Random Forest and Support Vector Machine (SVM), and an accuracy of 89% was achieved.
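The comparison of the two classifiers can be sketched with scikit-learn as below. This is an illustration, not the project's code: synthetic data from `make_classification` stands in for the student dataset, and the hold-out split, hyperparameters and feature scaling for the SVM are assumptions. The printed accuracies therefore say nothing about the 89% reported for the real data.

```python
# Sketch: compare a Random Forest and an SVM on a binary classification task.
# ASSUMPTION: synthetic data stands in for the student dataset (not reproduced here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in: 1000 rows, 8 numeric features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)

# Hold out 20% of the rows, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are sensitive to feature scale, so standardize inside the pipeline.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2%}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test rows leaks into the SVM.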
Afterwards, Shapley values were computed for the model to identify the features most relevant to the prediction. The analysis showed that the socio-economic information added no meaningful benefit to the model.
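To make the idea concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model; the summary does not say which implementation the project used. "Removing" a feature is modelled here by replacing it with a baseline value, which is one common convention and an assumption of this sketch. Libraries such as the `shap` package approximate the same quantity for real models, where enumerating every feature subset is too expensive.

```python
# Sketch: exact Shapley values by enumerating all feature subsets.
# ASSUMPTION: an "absent" feature takes its baseline value.
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Return the Shapley value of each feature of x for model f."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(v):
    # Hypothetical model: feature 3 is ignored, features 1 and 2 interact.
    return 3 * v[0] + 2 * v[1] * v[2] + 0 * v[3]


phi = shapley_values(toy_model, x=[1, 1, 1, 1], baseline=[0, 0, 0, 0])
print([round(p, 6) for p in phi])  # → [3.0, 1.0, 1.0, 0.0]
```

The values add up to `toy_model(x) - toy_model(baseline)` (here 5), and the unused feature gets a value of zero. A feature group whose Shapley values stay near zero across all students contributes little to the predictions, which is the kind of evidence behind dropping uninformative inputs.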
Finally, an additional model was trained to predict the students' math scores, showing that the same approach can also be applied to predicting numerical variables (regression).
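Switching from gender prediction to math-score prediction mainly means swapping the classifier for a regressor and accuracy for a regression metric. The sketch below assumes a Random Forest regressor (the summary does not name the final model) and uses synthetic data from `make_regression` in place of the real scores, so its target is not on the dataset's scale.

```python
# Sketch: predict a numerical target with a Random Forest regressor.
# ASSUMPTION: synthetic data stands in for the students' math scores.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 1000 rows, 6 numeric features, noisy continuous target.
X, y = make_regression(n_samples=1000, n_features=6, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

# Accuracy does not apply to continuous targets; use error-based metrics instead.
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE = {mae:.2f}, R^2 = {r2:.3f}")
```

Mean absolute error is expressed in the target's own units (score points, for the real data), which makes it easier to interpret than R² alone.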