Description
We explain how choosing a small, representative dataset from a large population can improve model training reliability.
Summary
- In machine learning, we often need to train a model with a very large dataset of thousands or even millions of records.
- If our book has three cantiche and each one of them has 33 canti, maybe it’s complete and we can safely learn from it.
- In other words, if we take a look at the histogram of the sample, it must be the same as the histogram of the population.
- The other field is a factor variable created by using the first 10 letters from the alphabet uniformly distributed.