Using data from a Kaggle competition, let’s compare the performance of classic and new-generation gradient boosting decision trees (GBDTs).


In this competition, Santander Bank invites Kaggle users to predict which customers will make a specific transaction in the future, regardless of the amount of money transacted. The data provided for the contest has the same structure as the real data the bank uses to solve this problem. That lets us tackle a real problem on a dataset that is demanding in both number of records and number of features, and use it to test classic algorithms against next-generation ones.

The data is anonymised: each row contains 200 numeric variables and no categorical variables.
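To make that structure concrete, here is a minimal sketch that builds a small synthetic stand-in for the training table and checks its shape. The column names (`ID_code`, `target`, `var_0` to `var_199`), the row count and the 10% positive rate are assumptions for illustration only; the values are random and are not the competition data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_features = 1000, 200  # illustrative size, not the real record count

# Synthetic stand-in: 200 anonymised feature columns (assumed names var_0..var_199).
train = pd.DataFrame(
    rng.normal(size=(n_rows, n_features)),
    columns=[f"var_{i}" for i in range(n_features)],
)
# Assumed binary target (1 = customer makes the transaction) and a row identifier.
train.insert(0, "target", rng.binomial(1, 0.1, size=n_rows))
train.insert(0, "ID_code", [f"train_{i}" for i in range(n_rows)])

# Count numeric feature columns; anything else would need categorical handling.
feature_cols = [c for c in train.columns if c.startswith("var_")]
n_numeric = train[feature_cols].select_dtypes(include="number").shape[1]
n_categorical = len(feature_cols) - n_numeric
positive_rate = train["target"].mean()

print(f"rows={len(train)}, numeric features={n_numeric}, "
      f"categorical features={n_categorical}, positive rate={positive_rate:.3f}")
```

With the real files, the same checks would run on the loaded DataFrame instead of the synthetic one.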

Next we’ll explore the data, prepare it for modelling, determine which algorithms perform best with the least overfitting, and compare their results.


  1. Libraries
  2. Data extraction
  3. Data exploration
  4. Unbalanced Data and Resampling
  5. Feature selection
  6. Binary classification models
  7. Hyperparameter tuning
  8. Detection of the most influential variables

Web project on Kaggle:

Web project on GitHub: GitHub
