Apache Spark - SparkSQL Run on Amazon EMR Cluster

Introduction: This post shows how to run a Spark application on an Amazon EMR (Elastic MapReduce) cluster. The application runs on top of the Hadoop cluster, and the input data source is stored in Amazon S3.

Apache Spark - SparkSQL Dataframe Vs. RDD


Apache Spark - SparkSQL Join


Apache Spark - SparkSQL Action

Introduction: The most powerful way to use Spark SQL is to integrate it into a Spark application, where we can not only run SQL queries against a DataFrame but also combine those queries with the rest of the application code.

Apache Spark - SparkSQL Introduction

Introduction: SparkSQL is an interface on Spark for working with structured and semi-structured data.