Datasets should behave like git repositories

By DAGsHub Blog - 2021-01-18

Description

Create, maintain, and contribute to a long-living dataset that will update itself automatically across projects, using git and DVC as versioning systems.

Summary

  • Problems that originate in the data are common in research as well as in industry.
  • I will show you how to create, maintain, and contribute to a long-living dataset that will update itself automatically across projects, using git and DVC as versioning systems, and DAGsHub as a host for the datasets.
  • Repository B, a.k.a. the machine learning project, is where I want to use the files stored in my living dataset.
  • This will download only the images and annotations directories from my dataset repository, and keep the information needed to continue tracking changes made to it.
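
As a rough illustration of the last two points, here is a minimal sketch that builds the commands such an import might use. It assumes the download is done with DVC's `dvc import` command and later refreshed with `dvc update`; the summary itself does not name the commands. The repository URL and the helper function names are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
import shlex


def build_import_commands(dataset_repo_url, paths):
    """Return one `dvc import` argument list per directory to pull.

    Each import downloads the directory into the current project and
    writes a matching .dvc file that records the source repository.
    """
    return [["dvc", "import", dataset_repo_url, path] for path in paths]


def build_update_command(dvc_file):
    """Return the `dvc update` argument list that refreshes an earlier import."""
    return ["dvc", "update", dvc_file]


if __name__ == "__main__":
    # Hypothetical URL standing in for the dataset repository (repository A).
    repo = "https://example.com/user/living-dataset"
    for argv in build_import_commands(repo, ["images", "annotations"]):
        print(shlex.join(argv))
    # Assumption: importing `images` leaves an `images.dvc` file behind.
    print(shlex.join(build_update_command("images.dvc")))
```

Run inside repository B, the printed `dvc import` commands would fetch the two directories, and the `dvc update` command would later pull in whatever has changed in the dataset repository since the import.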

Topics

  1. Backend (0.34)
  2. Machine_Learning (0.2)
  3. Database (0.16)
