Spark it up a notch. Nitty-gritty details of Apache Spark

By Medium - 2021-03-13

Description

I’ve spent about a year learning and implementing the different subtleties associated with Spark. In this series, starting with this article, I’m going to attempt to document the different scenarios…

Summary

  • Spark it up a notch: nitty-gritty details of Apache Spark. I’ve spent about a year learning and implementing the different subtleties associated with Spark.
  • Transformations are lazy by nature: Spark records which transformations have been applied to which RDD (using the DAG) and executes them only when an action is called on the data (for example, printing the first 5 lines of the dataset).
  • Note here that transformations return new RDDs since RDDs are immutable.
  • Wide transformation: the elements required to compute the records in a single partition may reside in many partitions of the parent RDD.
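
The points in this summary (lazy transformations, immutable RDDs, and wide dependencies) can be sketched with a toy model in plain Python. This is not Spark's API and makes no claim about how Spark is implemented: `ToyRDD`, `reduce_by_key`, and the `key % num_out` partitioner are hypothetical names invented for illustration, and the toy ignores real concerns such as distributed execution and shuffling over the network.

```python
from operator import add


class ToyRDD:
    """Toy stand-in for an RDD: fixed partitions plus a recorded lineage."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions   # source data, never mutated
        self._lineage = tuple(lineage)  # recorded steps, not yet executed

    def map(self, fn):
        # Transformation: only record the step and return a NEW ToyRDD.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: same idea, nothing is computed here.
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def take(self, n):
        # Action: replay the recorded lineage now, partition by partition.
        parts = [list(p) for p in self._partitions]
        for kind, fn in self._lineage:
            if kind == "map":
                parts = [[fn(x) for x in p] for p in parts]
            else:
                parts = [[x for x in p if fn(x)] for p in parts]
        return [x for p in parts for x in p][:n]


def reduce_by_key(partitions, fn, num_out):
    """Toy wide step: also reports which parent partitions each output reads."""
    out = [dict() for _ in range(num_out)]
    parents = [set() for _ in range(num_out)]
    for idx, part in enumerate(partitions):
        for key, value in part:
            target = key % num_out  # hypothetical partitioner for integer keys
            bucket = out[target]
            bucket[key] = fn(bucket[key], value) if key in bucket else value
            parents[target].add(idx)
    return out, parents


calls = []  # records every element the mapped function actually touches


def double(x):
    calls.append(x)
    return x * 2


base = ToyRDD([[1, 2, 3], [4, 5, 6]])
doubled = base.map(double)               # lazy: double() has not run yet
calls_before_action = list(calls)
large = doubled.filter(lambda x: x > 4)  # still lazy
result = large.take(5)                   # action: the lineage runs here
base_result = base.take(5)               # the original dataset is unchanged

pairs = [[(0, 1), (1, 1)], [(0, 1), (1, 1)]]
reduced, parents = reduce_by_key(pairs, add, 2)
```

In this sketch nothing is computed until `take` is called, each transformation returns a new object and leaves `base` untouched, and every output partition of `reduce_by_key` depends on both parent partitions, which is what makes the step wide.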

 

Topics

  1. Backend (0.26)
  2. Database (0.14)
  3. Machine_Learning (0.07)

Similar Articles

7 Ways Your Data Is Telling You It’s a Graph

By Neo4j Graph Database Platform - 2015-12-23

Watch (or read) Senior Project Manager Karen Lopez’s GraphConnect presentation on the signs that your data is actually a graph and needs a graph database.

Data Science Learning Roadmap for 2021

By freeCodeCamp.org - 2021-01-12

Although nothing really changes but the date, a new year fills everyone with the hope of starting things afresh. If you add in a bit of planning, some well-envisioned goals, and a learning roadmap, yo ...

15 Essential Steps To Build Reliable Data Pipelines

By Medium - 2020-12-01

If I learned anything from working as a data engineer, it is that practically any data pipeline fails at some point. Broken connection, broken dependencies, data arriving too late, or some external…