How to correctly select a sample from a huge dataset in machine learning

By KDnuggets - 2021-03-22

Description

We explain how choosing a small, representative dataset from a large population can improve model training reliability.

Summary

  • In machine learning, we often need to train a model with a very large dataset of thousands or even millions of records.
  • If our book has three cantiche and each of them has 33 canti, it may well be complete, and we can safely learn from it.
  • In other words, if we look at the histogram of the sample, it should closely match the histogram of the population.
  • The other field is a factor variable whose values are drawn uniformly from the first 10 letters of the alphabet.
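
The histogram comparison described in the summary can be sketched in a few lines of Python. This is a minimal illustration and not the article's own code. The population size, sample size, bin width, the normally distributed numeric field, and the helper names `proportions`, `bin_numeric`, and `max_proportion_gap` are all assumptions made for the example. Only the factor field built from the first 10 letters of the alphabet comes from the summary.

```python
import random
import string
from collections import Counter


def proportions(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {key: count / total for key, count in counts.items()}


def bin_numeric(values, width=0.5):
    """Map each value to the lower edge of its histogram bin."""
    return [width * (v // width) for v in values]


def max_proportion_gap(population, sample):
    """Largest absolute difference between population and sample frequencies."""
    pop = proportions(population)
    smp = proportions(sample)
    keys = set(pop) | set(smp)
    return max(abs(pop.get(k, 0.0) - smp.get(k, 0.0)) for k in keys)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
levels = string.ascii_lowercase[:10]  # first 10 letters of the alphabet

# Hypothetical population (sizes are assumptions): one numeric field,
# assumed normal here, and one uniformly distributed factor field.
POPULATION_SIZE = 100_000
SAMPLE_SIZE = 5_000
population_numeric = [rng.gauss(0.0, 1.0) for _ in range(POPULATION_SIZE)]
population_factor = [rng.choice(levels) for _ in range(POPULATION_SIZE)]

# Sample row indices without replacement so both fields stay aligned.
sample_idx = rng.sample(range(POPULATION_SIZE), SAMPLE_SIZE)
sample_numeric = [population_numeric[i] for i in sample_idx]
sample_factor = [population_factor[i] for i in sample_idx]

# Compare the two histograms field by field.
factor_gap = max_proportion_gap(population_factor, sample_factor)
numeric_gap = max_proportion_gap(
    bin_numeric(population_numeric), bin_numeric(sample_numeric)
)
print(f"factor gap: {factor_gap:.4f}, numeric gap: {numeric_gap:.4f}")
```

A small gap for every field suggests the sample's histograms track the population's. What counts as small enough depends on the sample size and the application, and a formal test (for example a Kolmogorov-Smirnov or chi-square test) could replace this simple gap measure.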

 

Topics

  1. Machine_Learning (0.33)
  2. Backend (0.2)
  3. NLP (0.16)

Similar Articles

K-fold Cross Validation with PyTorch

By MachineCurve - 2021-02-02

Explanations and code examples showing you how to use K-fold Cross Validation for Machine Learning model evaluation/testing with PyTorch.