Connect Spark with a Cloudera YARN Cluster

Cloudera automatically installs a Spark cluster, but upgrading the Spark version through Cloudera is not easy. As an
alternative, users can run the latest version of Spark on the Cloudera YARN cluster from a separate Spark package.

  1. Set up the environment so that the Hadoop configuration files are available to the Spark package:
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/conf/yarn-conf/
  2. Start the spark-shell from the Spark package:
./spark-shell --master yarn --deploy-mode client --num-executors 10 --driver-memory 12g --executor-memory 10g --executor-cores 24
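
Before launching, it can help to confirm that the directory exported in step 1 really contains the client configuration Spark reads to find YARN and HDFS. The following is a minimal sketch, not part of Spark or Cloudera: `check_hadoop_conf` is a hypothetical helper, and the assumption that `core-site.xml` and `yarn-site.xml` are the files to look for should be adjusted to match your deployment.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Spark or Cloudera).
# Succeeds only if the given directory holds the client configs that
# are assumed here to be required: core-site.xml and yarn-site.xml.
check_hadoop_conf() {
    conf_dir="$1"
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            # Report the first missing file on stderr and fail.
            echo "missing $conf_dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Example usage, after exporting HADOOP_CONF_DIR as in step 1:
#   check_hadoop_conf "$HADOOP_CONF_DIR" && ./spark-shell --master yarn --deploy-mode client
```

If the check fails, spark-shell would most likely be unable to locate the ResourceManager, so it is cheaper to catch a wrong parcel path here than after the shell has started.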