Apache Spark - Architecture and Components

Introduction: This note takes a high-level look at the Spark architecture and how it executes actions across nodes.

Apache Spark - Caching and Persistence

Introduction: If we naively call actions on an RDD many times, the RDD and all of its dependencies are recomputed each time an action is called. This can be very expensive, especially for iterative algorithms.
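The cost of recomputation can be sketched with a toy model in plain Python. This is not the Spark API: `ToyRDD`, `expensive_square`, and the `calls` counter are hypothetical names made up for illustration, and the model only imitates how an unpersisted RDD rebuilds its lineage on every action while a persisted one reuses the first result.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it stores a recipe, not data (not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from scratch
        self._persist = False     # set by persist()
        self._cached = None       # filled by the first action after persist()

    def map(self, fn):
        # Transformation: the new ToyRDD's recipe depends on this one.
        return ToyRDD(lambda: [fn(x) for x in self._materialize()])

    def persist(self):
        # Only marks the data for caching; nothing is computed yet.
        self._persist = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        data = self._compute()    # walks the whole dependency chain
        if self._persist:
            self._cached = data
        return data

    # Two toy "actions": each one forces materialization.
    def count(self):
        return len(self._materialize())

    def sum(self):
        return sum(self._materialize())


calls = {"n": 0}  # counts how often the expensive function runs


def expensive_square(x):
    calls["n"] += 1
    return x * x


source = ToyRDD(lambda: [1, 2, 3, 4])

# Naive: two actions, so the map is recomputed twice.
naive = source.map(expensive_square)
naive.count()
naive.sum()
naive_calls = calls["n"]

# Persisted: the second action reuses the result of the first.
calls["n"] = 0
cached = source.map(expensive_square).persist()
cached.count()
cached.sum()
cached_calls = calls["n"]

print(naive_calls, cached_calls)
```

In this toy run the unpersisted pipeline calls the expensive function twice as often as the persisted one, which is the effect caching is meant to avoid.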

Apache Spark - Actions and aspects of RDD

Introduction: Actions are operations that return a final value to the driver program or persist data to an external storage system. They force the evaluation of the transformations required for the RDD they are called on.
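How an action forces evaluation can be illustrated with a small plain-Python sketch. It is not the Spark API: `LazyPipeline` and the `log` list are hypothetical, and the sketch only assumes that transformations record steps while an action such as `collect` runs them.

```python
log = []  # records when evaluation actually happens


class LazyPipeline:
    """Toy model of lazy transformations plus an eager action (not the Spark API)."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def map(self, fn):
        # Transformation: only records the step, nothing runs yet.
        return LazyPipeline(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: only records the step, nothing runs yet.
        return LazyPipeline(self.source, self.steps + (("filter", pred),))

    def collect(self):
        # Action: runs every recorded step and returns the final value.
        log.append("evaluated")
        data = list(self.source)
        for kind, fn in self.steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


pipeline = LazyPipeline(range(1, 6)).map(lambda x: x * 10).filter(lambda x: x > 20)
evaluated_before_action = len(log)  # still 0: no action has been called

result = pipeline.collect()         # the action triggers the whole chain
print(evaluated_before_action, result)
```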

Apache Spark - Set Operations

Introduction: This blog post focuses on one type of RDD transformation: set operations.

Apache Spark - RDD, and filter & map Transformations

Introduction: This blog covers some basic concepts and practice of Apache Spark. GitHub Repo