Reducing memory usage in pandas with smaller datatypes

By Medium - 2021-03-15

Description

Managing large datasets with pandas is a pretty common issue. As a result, a lot of libraries and tools have been developed to ease that pain. Take, for instance, the pydatatable library mentioned…

Summary

  • Optimizing pandas memory usage through effective use of datatypes: managing large datasets with pandas is a pretty common issue.
  • There is no difference in the amount of memory allocated between int8 and uint8 (one byte per value), but, as the name suggests, unsigned integers can only store non-negative values, i.e., 0–255 for uint8.
  • Finally, we can also specify the datatypes of individual columns when loading a CSV file.
  • However, for truly large datasets it is worth looking at other libraries that handle big data much more efficiently.
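
The int8/uint8 point can be checked with a short sketch, assuming NumPy and pandas are installed. The series of one million counts is hypothetical data, and pd.to_numeric with downcast="unsigned" is used here as one convenient way to pick the smallest unsigned type; it is not necessarily how the original article converts its columns.

```python
import numpy as np
import pandas as pd

# int8 and uint8 both use one byte per value; only the representable range differs.
for name in ("int8", "uint8"):
    info = np.iinfo(name)
    print(name, np.dtype(name).itemsize, "byte(s), range", info.min, "to", info.max)

# Hypothetical data: one million small, non-negative counts (0-199).
n = 1_000_000
counts = pd.Series(np.arange(n) % 200)

# Memory of the default integer column (typically int64, i.e. 8 bytes per value).
before = counts.memory_usage(index=False)

# downcast="unsigned" selects the smallest unsigned dtype that holds every value.
small = pd.to_numeric(counts, downcast="unsigned")
after = small.memory_usage(index=False)

print(small.dtype, before, after)
```

Because every value lies in 0–199, the column fits in uint8, so it shrinks to one byte per value.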
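
Specifying datatypes at load time can be sketched with the dtype argument of pandas.read_csv. The CSV content and the column names (user_id, age, score) are hypothetical, and io.StringIO stands in for a real file path; the chosen types assume the values fit their ranges.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = """user_id,age,score
1,34,0.5
2,27,0.75
3,45,0.25
"""

# Map each column to a compact dtype; read_csv returns the columns in these
# types instead of the default int64/float64.
dtypes = {"user_id": "uint32", "age": "uint8", "score": "float32"}
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

print(df.dtypes)
print(df.memory_usage(deep=True))
```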
