Be Careful of This Data Science Mistake I Wasted 30 Hours Over

By Medium - 2020-12-30

Description

The model had been training across several sessions for many days on an image recognition competition. It was relatively simple and scored about 0.9 AUC initially, the metric for the…

Summary

  • How to avoid (and take advantage of) this blunder. The model had been training across several sessions for many days on an image recognition competition.
  • But I had not made some sort of code error in which the model was trained on the test data.
  • However, when the competition ends, the model is evaluated on the other 75% of the test set to determine the position on the final private leaderboard.
  • If we're able, however, to use the public 25% of the test set to improve the score on the private leaderboard, this counts as data leakage.
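
The public/private split described above can be illustrated with a small sketch. It assumes a 25%/75% random split of the test set, as the summary states, and scores each part with AUC. The helper names (`auc`, `split_public_private`), the seed, and the synthetic labels and scores are all hypothetical and exist only for illustration; how the competition actually assigns rows to each part is not stated in the source.

```python
import random


def auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_public_private(n_rows, public_fraction=0.25, seed=0):
    """Hypothetical leaderboard split: returns (public_idx, private_idx)."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(n_rows * public_fraction)
    return idx[:cut], idx[cut:]


# Synthetic test set (assumption): noisy scores that still carry signal.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(400)]
scores = [0.5 * y + rng.random() for y in labels]

public_idx, private_idx = split_public_private(len(labels))

# The public score is visible during the competition; the private score
# decides the final ranking and is only revealed at the end.
public_auc = auc([labels[i] for i in public_idx], [scores[i] for i in public_idx])
private_auc = auc([labels[i] for i in private_idx], [scores[i] for i in private_idx])

print(f"public AUC  (25% of test set): {public_auc:.3f}")
print(f"private AUC (75% of test set): {private_auc:.3f}")
```

The two numbers generally differ somewhat because they are computed on disjoint samples. Tuning a model against the public score therefore feeds information from the test set back into training, which is the leakage the summary refers to.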

 

Topics

  1. Backend (0.35)
  2. Database (0.16)
  3. Machine_Learning (0.15)
