Description
I’ve spent about a year learning and implementing the different subtleties associated with Spark. In this series, starting with this article, I’m going to attempt to document the different scenarios…
Summary
- Spark it up a notch: nitty-gritty details of Apache Spark. I’ve spent about a year learning and implementing the different subtleties associated with Spark.
- Transformations are lazy by nature: Spark records which transformations are applied to the data (using the DAG) and executes them only when an action is called on that data (for example, printing the top 5 lines of the dataset).
- Note here that transformations return new RDDs since RDDs are immutable.
- Wide transformation: all the elements required to compute the records in a single partition may reside in many partitions of the parent RDD.
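
The lazy-evaluation and immutability points above can be sketched with a toy model in plain Python. This is not Spark's API: `ToyRDD`, its `map`/`filter`/`take` methods, and the `traced_double` helper are hypothetical names invented for illustration. The sketch only assumes that a transformation records a step and returns a new object, while an action replays the recorded steps.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class)."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # never modified after creation
        self._lineage = tuple(lineage)  # recorded steps, not yet executed

    def map(self, fn):
        # Transformation: record the step and return a NEW ToyRDD.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded, never run here.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def take(self, n):
        # Action: only now are the recorded steps replayed over the data.
        out = []
        for item in self._source:
            if len(out) >= n:
                break
            keep = True
            for kind, fn in self._lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []  # records every element the mapped function actually touches


def traced_double(x):
    calls.append(x)
    return x * 2


base = ToyRDD(range(10))
doubled = base.map(traced_double)       # nothing runs yet
big = doubled.filter(lambda x: x > 5)   # still nothing runs
calls_before_action = list(calls)       # snapshot: empty list

result = big.take(2)                    # the action triggers the work
```

In this toy model `base` is left untouched by `map` and `filter`, which mirrors the point that transformations return new RDDs rather than changing existing ones. Real Spark additionally distributes the work across partitions and executors, which this sketch does not attempt to show.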
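
The wide-transformation point can be illustrated the same way, again as a plain-Python sketch rather than Spark code. Partitions are modeled as lists, and `narrow_map` and `wide_group_by_key` are hypothetical helpers. Integer keys and a `key % num_out` placement rule are assumptions made here to keep the example deterministic; they are not a description of how Spark partitions data.

```python
def narrow_map(partitions, fn):
    # Narrow dependency: output partition i is built from parent partition i only.
    return [[fn(record) for record in part] for part in partitions]


def wide_group_by_key(partitions, num_out):
    # Wide dependency: records with the same key must meet in one output
    # partition, so each output partition may read from many parents.
    out = [dict() for _ in range(num_out)]
    parents = [set() for _ in range(num_out)]  # parent indices read per output
    for idx, part in enumerate(partitions):
        for key, value in part:
            target = key % num_out  # assumed placement rule for this sketch
            out[target].setdefault(key, []).append(value)
            parents[target].add(idx)
    return out, parents


parent_partitions = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (4, "f")],
]

upper = narrow_map(parent_partitions, lambda kv: (kv[0], kv[1].upper()))
grouped, parents = wide_group_by_key(parent_partitions, num_out=2)
```

Here `parents[0]` and `parents[1]` each contain more than one parent index, which is the situation the bullet describes: computing a single output partition needs records from several partitions of the parent. The narrow case keeps the same number of partitions and never looks outside the matching parent.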