Tuesday, December 8, 2020

GCP Spark Job Server Setup

Create a GCP VM Instance:


First, create a VM instance on GCP with the configuration given below:
    ·        Machine type: 2 vCPUs, 13 GB memory. You can choose a larger machine type, but the price will vary.
    ·        Image: Ubuntu 18.04 LTS.
    ·        Disk size: at least 50 GB.
    ·        Zone: us-central1-c, or choose the same zone as the Dataproc cluster master VM.
    ·        Firewalls: If you want to reach this VM over HTTP or HTTPS, select Allow for both kinds of traffic.
    ·        Network tags: jobserver and spark-jobserver. These tags open the network for specific ports on the VM. (Note: Creating the firewall rule behind the network tag is described later in this document.)




Cloud API Access Scopes: You can choose which Cloud APIs the VM may access. By default, only a limited set of APIs is accessible. If you want access to all Cloud APIs, choose the “Allow full access to all Cloud APIs” option.



Now save the configuration and start the VM.
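If you prefer the gcloud CLI over the console, a roughly equivalent command is sketched below. The instance name spark-jobserver-vm is only a placeholder; n1-highmem-2 is the 2 vCPU / 13 GB machine type, and --scopes=cloud-platform corresponds to the “Allow full access to all Cloud APIs” option.

# Sketch: create the VM from the command line (adjust names and zone to your project)
gcloud compute instances create spark-jobserver-vm \
    --zone=us-central1-c \
    --machine-type=n1-highmem-2 \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=50GB \
    --tags=jobserver,spark-jobserver \
    --scopes=cloud-platform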

Firewall Rules:

You can define firewall rules to open the network for a specific port, such as 8090, on which Spark Job Server listens.

Go to the VPC Network menu in the left sidebar of the GCP console and click the “Firewall Rules” link:


Here you define the firewall rule's name and target tags as per your requirement; in this case the rule is named jobserver.

The source IP range should be 0.0.0.0/0 and the protocol and ports should be tcp:8090.


Now save the details.

Finally, this jobserver rule can be used as a network tag when defining the VM instance, since the rule opens the network on port 8090 of the VM.
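The same firewall rule can also be created from the gcloud CLI; a minimal sketch, assuming the default VPC network, is given below.

# Sketch: open tcp:8090 for VMs carrying the jobserver network tag
gcloud compute firewall-rules create jobserver \
    --network=default \
    --allow=tcp:8090 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=jobserver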

SSH Terminal Access:

You can access the VM's SSH terminal from the GCP console. This opens a PuTTY-like command prompt in the browser itself.
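If you prefer a local terminal instead of the browser window, the gcloud CLI can open the same SSH session (instance name and zone below are placeholders):

gcloud compute ssh <instance-name> --zone=us-central1-c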




Software Installation:

The Dataproc cluster master VM was created with the 1.2-debian9 image, which ships specific software versions (https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-1.2).

Hence, we need to install similar versions of the software on our Ubuntu 18.04 LTS VM.

Version details are given below:

        ·        Java 1.8

        ·        Hadoop 2.8.5

        ·        sbt 0.13.11

        ·        Spark 2.2.3 (build without bundled Hadoop)

        ·        Spark Job Server 0.8.0

Java Installation:

First of all, log in as the root user:

sudo su -

Install wget:

sudo apt-get install wget

Install OpenJDK 8:

sudo apt-get install openjdk-8-jdk

Set JAVA_HOME in the .bashrc file:

vi ~/.bashrc
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Now reload the .bashrc file:

source ~/.bashrc 
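To confirm the Java installation before moving on, you can check the version and the JAVA_HOME value (the expected output shown in the comments is indicative only):

java -version          # should report openjdk version "1.8.0_..."
echo $JAVA_HOME        # should print /usr/lib/jvm/java-8-openjdk-amd64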


Hadoop Installation:

Install Hadoop 2.8.5 under /opt/:

cd /opt/
curl -O https://archive.apache.org/dist/hadoop/core/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xzf hadoop-2.8.5.tar.gz -C /opt/
rm /opt/hadoop-2.8.5.tar.gz

Set the environment variables for Hadoop:

vi ~/.bashrc
export HADOOP_HOME="/opt/hadoop-2.8.5"
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export YARN_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
export HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
source ~/.bashrc
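A quick sanity check that the Hadoop binaries unpacked correctly:

/opt/hadoop-2.8.5/bin/hadoop version   # should report Hadoop 2.8.5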


SBT Installation:

Set the environment variable for SBT:

export SBT_VERSION=0.13.11

Install SBT 0.13.11:

 wget -P /tmp "http://dl.bintray.com/sbt/native-packages/sbt/0.13.11/sbt-${SBT_VERSION}.tgz"
 tar -xzf /tmp/sbt-${SBT_VERSION}.tgz -C /usr/local/
 ln -sf /usr/local/sbt/bin/sbt /usr/local/bin/sbt 
 rm /tmp/sbt-${SBT_VERSION}.tgz
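To verify the sbt launcher is on the PATH, you can ask it to print its own version (this may take a moment the first time while sbt resolves its dependencies):

sbt sbtVersion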


Spark Installation:

Set the environment variables for the Spark installation:

 export HADOOP_VERSION=2.8.5
 export SPARK_VERSION=2.2.3
 export SPARK_HADOOP_BIN=without-hadoop

Install Spark 2.2.3:

wget -P /tmp "http://archive.apache.org/dist/spark/spark-2.2.3/spark-2.2.3-bin-without-hadoop.tgz"
tar -xzf /tmp/spark-${SPARK_VERSION}-bin-${SPARK_HADOOP_BIN}.tgz
mv spark-${SPARK_VERSION}-bin-${SPARK_HADOOP_BIN} /opt/spark-${SPARK_VERSION}
rm /tmp/spark-${SPARK_VERSION}-bin-${SPARK_HADOOP_BIN}.tgz

Now set SPARK_HOME and the other variables in .bashrc:

vi ~/.bashrc
export SPARK_HOME=/opt/spark-2.2.3
export SPARK_DIST_CLASSPATH=/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar:$SPARK_HOME/jars/*.jar
source ~/.bashrc
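At this point you can verify that Spark picks up the Hadoop jars from SPARK_DIST_CLASSPATH by checking the version and, optionally, running the bundled SparkPi example in local mode (the examples jar name is matched with a wildcard, assuming the examples jars are present in the distribution):

spark-submit --version
spark-submit --master local[2] \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 10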

Spark Job Server Installation:

Set the environment variables for the Spark Job Server installation:

export JOBSERVER_VERSION=v0.8.0
export SPARK_JOBSERVER_VERSION=0.8.0 

Install Spark Job Server 0.8.0:

wget -q https://github.com/spark-jobserver/spark-jobserver/archive/v${SPARK_JOBSERVER_VERSION}.tar.gz
tar -xzf v${SPARK_JOBSERVER_VERSION}.tar.gz
mv spark-jobserver-${SPARK_JOBSERVER_VERSION} /opt/spark-jobserver
rm v${SPARK_JOBSERVER_VERSION}.tar.gz

Now add the bin directories to PATH in the .bashrc file:

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/sbin:$SPARK_HOME/bin

To check whether the environment variables are set:

env
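To check only the variables that matter here rather than the whole environment, you can filter the output, for example:

env | grep -E 'JAVA_HOME|HADOOP_HOME|HADOOP_CONF_DIR|YARN_CONF_DIR|SPARK_HOME|PATH'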


Steps to run Spark Job Server when Spark runs in yarn-client mode:

Make a file spark-defaults.conf in the /opt/spark-2.2.3/conf directory and add the lines below to the file.

Make sure the master is yarn and the deploy mode is client.

The Spark jars location should be the HDFS address of the cluster master VM where all the Spark jars are located.

Note: Replace <cluster-master-name> with your actual master VM instance name.

spark.master yarn
spark.submit.deployMode client
spark.yarn.jars=hdfs://<cluster-master-name>/user/spark/jars/*.jar
spark.eventLog.enabled true
spark.eventLog.dir hdfs://<cluster-master-name>/user/spark/eventlog
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.executor.instances 10000
spark.dynamicAllocation.maxExecutors 10000
spark.shuffle.service.enabled true
spark.scheduler.minRegisteredResourcesRatio 0.0
spark.yarn.historyServer.address <cluster-master-name>:18080
spark.history.fs.logDirectory hdfs://<cluster-master-name>/user/spark/eventlog
spark.driver.extraJavaOptions -Dflogger.backend_factory=com.google.cloud.hadoop.repackaged.gcs.com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance
spark.executor.extraJavaOptions -Dflogger.backend_factory=com.google.cloud.hadoop.repackaged.gcs.com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance
spark.sql.parquet.cacheMetadata=false
spark.executor.cores=1
spark.executor.memory=3712m
spark.driver.memory=1920m
spark.driver.maxResultSize=960m
spark.yarn.am.memory=640m
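Before starting the job server, it is worth confirming that the spark.yarn.jars location actually exists on the cluster's HDFS (with <cluster-master-name> replaced by your master VM name, as above):

hdfs dfs -ls hdfs://<cluster-master-name>/user/spark/jars/ | head -n 5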

Add HADOOP_CONF_DIR to the spark-env.sh file at /opt/spark-2.2.3/conf:

export HADOOP_CONF_DIR="/opt/hadoop-2.8.5/etc/hadoop"

Add the YARN ResourceManager hostname (in our case the cluster master VM hostname) and other properties to the yarn-site.xml file at /opt/hadoop-2.8.5/etc/hadoop:

Note: Replace <cluster-master-name> with your actual master VM instance name.

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value><cluster-master-name></value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
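With yarn-site.xml pointing at the cluster master, you can confirm that this VM can reach the YARN ResourceManager (this assumes the network between this VM and the Dataproc cluster allows the YARN ports):

yarn node -list          # should list the Dataproc worker NodeManagers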

Add the properties below to the core-site.xml file at /opt/hadoop-2.8.5/etc/hadoop. Here make sure fs.default.name is hdfs://<cluster-master-name>, which is the HDFS address of the master VM in the cluster.

Note: Replace <cluster-master-name> with your actual master VM instance name.

 

<configuration>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<cluster-master-name></value>
    <description>The old FileSystem used by FsShell.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<cluster-master-name></value>
    <description>
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The uri's scheme determines
      the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The uri's authority is used to determine the
      host, port, etc. for a filesystem.
    </description>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>fs.gs.working.dir</name>
    <value>/</value>
    <description>
      The directory relative gs: uris resolve in inside of the default bucket.
    </description>
  </property>
  <property>
    <name>fs.gs.system.bucket</name>
    <value><Bucket-name></value>
    <description>
      GCS bucket to use as a default bucket if fs.default.name is not a gs: uri.
    </description>
  </property>
  <property>
    <name>fs.gs.metadata.cache.directory</name>
    <value>/hadoop_gcs_connector_metadata_cache</value>
    <description>
      Only used if fs.gs.metadata.cache.type is FILESYSTEM_BACKED, specifies
      the local path to use as the base path for storing mirrored GCS metadata.
      Must be an absolute path, must be a directory, and must be fully
      readable/writable/executable by any user running processes which use the
      GCS connector.
    </description>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value><name-of-project></value>
    <description>
      Google Cloud Project ID with access to configured GCS buckets.
    </description>
  </property>
  <property>
    <name>fs.gs.metadata.cache.enable</name>
    <value>false</value>
    <final>false</final>
    <source>Dataproc Cluster Properties</source>
  </property>
  <property>
    <name>fs.gs.implicit.dir.infer.enable</name>
    <value>true</value>
    <description>
      If set, we create and return in-memory directory objects on the fly when
      no backing object exists, but we know there are files with the same
      prefix.
    </description>
  </property>
  <property>
    <name>fs.gs.application.name.suffix</name>
    <value>-dataproc</value>
    <description>
      Appended to the user-agent header for API requests to GCS to help identify
      the traffic as coming from Dataproc.
    </description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.gs.metadata.cache.type</name>
    <value>FILESYSTEM_BACKED</value>
    <description>
      Specifies which implementation of DirectoryListCache to use for
      supplementing GCS API &quot;list&quot; requests. Supported implementations:
      IN_MEMORY: Enforces immediate consistency within the same Java process.
      FILESYSTEM_BACKED: Enforces consistency across all cooperating processes
      pointed at the same local mirror directory, which may be an NFS directory
      for massively-distributed coordination.
    </description>
  </property>
  <property>
    <name>fs.gs.block.size</name>
    <value>134217728</value>
    <final>false</final>
    <source>Dataproc Cluster Properties</source>
  </property>
  <property>
    <name>hadoop.ssl.enabled.protocols</name>
    <value>TLSv1,TLSv1.1,TLSv1.2</value>
    <final>false</final>
    <source>Dataproc Cluster Properties</source>
  </property>
</configuration>
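A quick check that fs.defaultFS resolves to the Dataproc master's HDFS (note that the gs:// paths configured above will additionally require the GCS connector jar on the Hadoop classpath, which is not installed by the steps in this document):

hadoop fs -ls hdfs://<cluster-master-name>/user/spark/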

 

Make a file yarn.sh at the /opt/spark-jobserver/config location of spark-jobserver and paste the content below into the file:

Here it is important that YARN_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop and HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop are set, which is the Hadoop location where all the configuration files are available.


#!/usr/bin/env bash
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
DEPLOY_HOSTS="spark-job-server-ubuntu"
APP_USER=spark
APP_GROUP=spark
JMX_PORT=9999
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
INSTALL_DIR=/usr/local/spark-jobserver
LOG_DIR=/var/log/job-server
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=1G
SPARK_VERSION=2.2.3
MAX_DIRECT_MEMORY=512M
SPARK_HOME=/opt/spark-2.2.3
SPARK_CONF_DIR=$SPARK_HOME/conf
YARN_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
SCALA_VERSION=2.11.6

 

Make a file yarn.conf at the /opt/spark-jobserver/bin location of spark-jobserver and paste the content below into the file. Here make sure of the following points:

“master” should be “yarn”

“submit.deployMode” should be “client”

“context-per-jvm” should be “true”

# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  # local[...], yarn, mesos://... or spark://...
  master = "yarn"
  # client or cluster deployment
  submit.deployMode = "client"
  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4
  jobserver {
    port = 8090
    context-per-jvm = true
    # Note: JobFileDAO is deprecated from v0.7.0 because of issues in
    # production and will be removed in future, now defaults to H2 file.
    jobdao = spark.jobserver.io.JobSqlDAO
    filedao {
      rootdir = /tmp/spark-jobserver/filedao/data
    }
 
    datadao {
      # storage directory for files that are uploaded to the server
      # via POST/data commands
      rootdir = /tmp/spark-jobserver/upload
    }
    sqldao {
      # Slick database driver, full classpath
      slick-driver = slick.driver.H2Driver
      # JDBC driver, full classpath
      jdbc-driver = org.h2.Driver
           # Directory where default H2 driver stores its data. Only needed for H2.
      rootdir = /tmp/spark-jobserver/sqldao/data
      # Full JDBC URL / init string, along with username and password.  Sorry, needs to match above.
      # Substitutions may be used to launch job-server, but leave it out here in the default or tests won't pass
      jdbc {
        url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
        user = ""
        password = ""
      }
      # DB connection pool settings
      dbcp {
        enabled = false
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }
    # When using chunked transfer encoding with scala Stream job results, this is the size of each chunk
    result-chunk-size = 1m
  }
  # Predefined Spark contexts
  # contexts {
  #   my-low-latency-context {
  #     num-cpu-cores = 1           # Number of cores to allocate.  Required.
  #   memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, 1G, etc.
  #   }
  #   # define additional contexts here
  # }
  # Universal context configuration.  These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2           # Number of cores to allocate.  Required.
    memory-per-node = 512m         # Executor memory per node, -Xmx style eg 512m, #1G, etc.
    # In case spark distribution should be accessed from HDFS (as opposed to being installed on every Mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
    # URIs of Jars to be loaded into the classpath for this context.
    # Uris is a string list, or a string separated by commas ','
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
    # Add settings you wish to pass directly to the sparkConf as-is such as Hadoop connection
    # settings that don't use the "spark." prefix
    passthrough {
         #es.nodes = ""
    }
  }
  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  # home = "/home/spark/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well.  Spark job configuration merges with this configuration file as defaults.
akka {
  remote.netty.tcp {
     #hostname = ""
    # This controls the maximum message size, including job results, that can be sent
     maximum-frame-size = 100 MiB
  }
}

 

At the /opt/spark-jobserver/bin location of spark-jobserver, paste the content below into the settings.sh file:

#!/usr/bin/env bash
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
DEPLOY_HOSTS="spark-jobserver"
APP_USER=spark
APP_GROUP=spark
JMX_PORT=9999
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
INSTALL_DIR=/home/spark/job-server
LOG_DIR=/var/log/job-server
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=1G
SPARK_VERSION=2.2.3
MAX_DIRECT_MEMORY=512M
SPARK_HOME=/opt/spark-2.2.3
SPARK_CONF_DIR=$SPARK_HOME/conf 
YARN_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
SCALA_VERSION=2.11.6

At the /opt/spark-jobserver/bin location of spark-jobserver, paste the content below into the log4j-server.properties file. This writes logs to the /var/log/job-server/ location:

# Rotating log file configuration for server deploys
# Root logger option
log4j.rootLogger=INFO,LOGFILE
log4j.appender.LOGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.LOGFILE.File=${LOG_DIR}/spark-job-server.log
log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
# log4j.appender.LOGFILE.layout.ConversionPattern=%d %-5p %c - %m%n
log4j.appender.LOGFILE.layout.ConversionPattern=[%d] %-5p %.26c [%X{testName}] [%X{akkaSource}] - %m%n
log4j.appender.LOGFILE.maxFileSize=20MB
log4j.appender.LOGFILE.maxBackupIndex=30
# Settings to quiet spark logs that are too verbose
log4j.logger.org.apache.spark.scheduler.TaskSetManager=WARN
log4j.logger.org.apache.spark.scheduler.DAGScheduler=WARN

 

Start/Stop Spark Job Server:

Now for the final step: to run Spark Job Server, go to the /opt/spark-jobserver/bin location and run the command below:

./server_start.sh

You can check that the server has started with the jps command:

jps

This shows all Java processes running on the system; you should see one process named SparkSubmit, which confirms that Spark Job Server has started.
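You can also hit the job server's REST API on port 8090 to confirm it is responding; the endpoints below exist in Spark Job Server 0.8.0 and should return empty JSON lists on a fresh install:

curl http://localhost:8090/contexts    # lists running contexts
curl http://localhost:8090/binaries    # lists uploaded job binaries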

To stop Spark Job Server, run the command below from the /opt/spark-jobserver/bin location:

./server_stop.sh


Errors while running the software from the terminal:

        1)     While running Spark Job Server, if you get the error below:

Failed to find Spark jars directory (/opt/spark-2.2.3/assembly/target/scala-2.10/jars).

You need to build Spark with the target "package" before running this program.

Solution: This means the Hadoop distribution jars are missing, so we have two options:

Option 1:

Copy the jars from the Hadoop distribution into the Spark directory.

Option 2: (Recommended)

Download the Spark distribution without Hadoop (spark-2.2.3-bin-without-hadoop.tgz) from the Spark website. This package already contains all the required jar files.

        2)     While starting spark-shell in yarn mode, for example by running the command below at the terminal:

spark-shell --master yarn

you get the following error:

Error: Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

19/06/26 07:21:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

19/06/26 07:22:01 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

19/06/26 07:22:07 ERROR spark.SparkContext: Error initializing SparkContext.

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch an application master.

        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)

        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)

        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)

        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)

        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)

        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)

 Solution: Step 1. Set SPARK_HOME and add $SPARK_HOME/sbin:$SPARK_HOME/bin to PATH:

               export SPARK_HOME=/opt/spark-2.2.3

               export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/sbin:$SPARK_HOME/bin

          Step 2. Add $SPARK_HOME/jars/*.jar to the SPARK_DIST_CLASSPATH variable:

export SPARK_DIST_CLASSPATH=/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar:$SPARK_HOME/jars/*.jar

                               

==================================================================            

        3)     How to set SPARK_DIST_CLASSPATH as an environment variable in the .bashrc file:

Solution:

Option 1: (Recommended)

Run echo $(hadoop classpath); this will give output like the following:

/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar

Now copy this complete classpath, append :$SPARK_HOME/jars/*.jar to it, and add it to the .bashrc file as below:

export SPARK_DIST_CLASSPATH=/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar:$SPARK_HOME/jars/*.jar

Option 2:

Add the line below to the .bashrc file:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

==================================================================            

        4)     To run Spark Job Server in yarn-client mode, the following variables are necessary in the .bashrc file:

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

export HADOOP_HOME="/opt/hadoop-2.8.5"

export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

export YARN_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop

export HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop