Building a Simple ETL Pipeline with Python and Google Cloud Platform

By Medium - 2020-09-16

Description

There are a lot of ETL tools out there and sometimes they can be overwhelming, especially when you simply want to copy a file from point A to B. So today, I am going to show you how to extract a CSV…

Summary

  • Extracting data from an FTP server using Google Cloud Functions and loading it into BigQuery. There are a lot of ETL tools out there, and sometimes they can be overwhelming, especially when you simply want to copy a file from point A to point B.
  • So today, I am going to show you how to extract a CSV file from an FTP server (Extract), modify it (Transform), and automatically load it into a Google BigQuery table (Load) using Python 3.6 and Google Cloud Functions.
  • At the time of writing, Cloud Functions (CF) isn’t available in every Google data-centre region, so check the list of supported locations to see where Cloud Functions is enabled.
  • Please note that the FTP server I was working on had multiple CSVs representing transaction data for different days.
  • Since we are enabling “auto-detection”, the BigQuery table doesn’t need a schema when it is created, as the schema will be inferred from the data in the CSV file.
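
The three steps summarised above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the file-naming pattern (transactions_YYYYMMDD.csv), the whitespace-trimming transform, and every function name here are assumptions made for the example. The load step assumes the google-cloud-bigquery package is installed and that the function runs with credentials allowed to write to the target table.

```python
import csv
import io
from datetime import date
from ftplib import FTP


def pick_daily_file(filenames, day):
    """Return the CSV whose name contains the date stamp for `day`, or None.

    Assumes hypothetical names such as 'transactions_20200916.csv'.
    """
    stamp = day.strftime("%Y%m%d")
    for name in sorted(filenames):
        if name.lower().endswith(".csv") and stamp in name:
            return name
    return None


def fetch_daily_csv(host, user, password, day):
    """Extract: download the CSV for `day` from the FTP server as bytes."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        filename = pick_daily_file(ftp.nlst(), day)
        if filename is None:
            return None
        buffer = io.BytesIO()
        ftp.retrbinary(f"RETR {filename}", buffer.write)
        return buffer.getvalue()


def transform(raw_bytes):
    """Transform (example only): trim whitespace in cells, drop blank rows."""
    reader = csv.reader(io.StringIO(raw_bytes.decode("utf-8")))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = [cell.strip() for cell in row]
        if any(cleaned):
            writer.writerow(cleaned)
    return out.getvalue().encode("utf-8")


def load_to_bigquery(csv_bytes, table_id):
    """Load: append the CSV to `table_id`, letting BigQuery infer the schema."""
    # Imported here so the extract/transform helpers work without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # the first row is treated as the header
        autodetect=True,       # schema auto-detection, as described above
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_file(
        io.BytesIO(csv_bytes), table_id, job_config=job_config
    )
    job.result()  # block until the load job finishes
```

A Cloud Function entry point would call fetch_daily_csv, transform, and load_to_bigquery in that order, most likely reading the FTP credentials and the table ID from environment variables rather than hard-coding them.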

 

Topics

  1. Backend (0.63)
  2. Database (0.13)
  3. Security (0.06)
