Issue
Perspectives
Actually, I need to configure two service files: one for the Spark Master and another for the Spark Slave (Worker) node. Please find the environment and service configuration below:
Configurations
/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
SPARK_HOME="/opt/cli/spark-3.3.0-bin-hadoop3"
PYSPARK_PYTHON="/usr/bin/python3"
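The units below run as a dedicated spark user. A minimal sketch of creating that account and handing it the install tree (assumed commands, not part of the original post) could look like:

# Create an unprivileged system account for the services below
sudo useradd --system --shell /usr/sbin/nologin spark
# Let it own the Spark installation, including etc/env and the PID files
sudo chown -R spark:spark /opt/cli/spark-3.3.0-bin-hadoop3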
/etc/systemd/system/spark-master.service
[Unit]
Description=Apache Spark Master
Wants=network-online.target
After=network-online.target
[Service]
User=spark
Group=spark
Type=forking
WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh
[Install]
WantedBy=multi-user.target
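Once the unit file is in place, it is registered and started with the usual systemctl steps (standard commands, shown here for convenience):

sudo systemctl daemon-reload
sudo systemctl enable --now spark-master.service
systemctl status spark-master.service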
/etc/systemd/system/spark-slave.service
[Unit]
Description=Apache Spark Slave
Wants=network-online.target
After=network-online.target
[Service]
User=spark
Group=spark
Type=forking
WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://spark.cdn.chorke.org:7077
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh
[Install]
WantedBy=multi-user.target
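The Slave is enabled the same way; the worker must be able to reach the Master URL given to start-slave.sh. Checking the Master's web UI for a registered worker is a rough sanity check and assumes the default UI port 8080:

sudo systemctl enable --now spark-slave.service
# Assumes the Master web UI listens on its default port 8080
curl -s http://spark.cdn.chorke.org:8080/ | grep -i worker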
Outcome
Both services start successfully but fail to stop cleanly: stopping the Apache Spark Master or Slave through systemd always leaves the unit in a failed state.
Spark Master Stop Status
× spark-master.service - Apache Spark Master
Loaded: loaded (/etc/systemd/system/spark-master.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2022-09-26 18:43:39 +08; 8s ago
Docs: https://spark.apache.org/docs/3.3.0
Process: 488887 ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh (code=exited, status=0/SUCCESS)
Process: 489000 ExecStartPost=/bin/bash -c echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid (code=exited, status=0/SUCCESS)
Process: 489484 ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh (code=exited, status=0/SUCCESS)
Main PID: 488903 (code=exited, status=143)
CPU: 4.813s
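The details behind a failed stop can be read from the journal (standard command, not part of the post):

journalctl -u spark-master.service -n 50 --no-pager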
Spark Slave Stop Status
× spark-slave.service - Apache Spark Slave
Loaded: loaded (/etc/systemd/system/spark-slave.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2022-09-26 18:38:22 +08; 15s ago
Docs: https://spark.apache.org/docs/3.3.0
Process: 489024 ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://ns12-pc04:7077 (code=exited, status=0/SUCCESS)
Process: 489145 ExecStartPost=/bin/bash -c echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid (code=exited, status=0/SUCCESS)
Process: 489174 ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh (code=exited, status=0/SUCCESS)
Main PID: 489040 (code=exited, status=143)
CPU: 4.306s
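Exit status 143 is not a Spark error: it is the conventional 128 + 15 reported for a process that ends on SIGTERM, which is what Spark's stop scripts send to the daemon JVM. A quick bash demonstration (illustrative only):

sleep 30 &
pid=$!
kill -TERM "$pid"   # stop-master.sh / stop-slave.sh terminate the JVM the same way
wait "$pid"
echo $?             # prints 143, i.e. 128 + SIGTERM(15)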
Expected Behavior
Any guidance on shutting down both the Master and Slave nodes without errors would be appreciated.
Solution
Theoretical Solution
In this case you could write your own shutdown wrapper that forces exit code 0 instead of 143. If you are lazy like me, you can instead tell systemd to accept 143 by setting SuccessExitStatus=143 in the unit. By default systemd treats only exit status 0 as a successful exit, while the Spark JVM exits with 143 when it is terminated by SIGTERM (128 + 15), so we need to change that default behavior.
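Instead of editing the unit files directly, the same change can be applied as a drop-in override via the standard systemctl edit workflow (a sketch, equivalent to the full units below):

sudo systemctl edit spark-master.service
# In the editor that opens, add:
#   [Service]
#   SuccessExitStatus=143
sudo systemctl edit spark-slave.service   # same override for the slave
sudo systemctl daemon-reload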
Practical Solution
/etc/systemd/system/spark-master.service
[Unit]
Description=Apache Spark Master
Wants=network-online.target
After=network-online.target
[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143
WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh
[Install]
WantedBy=multi-user.target
/etc/systemd/system/spark-slave.service
[Unit]
Description=Apache Spark Slave
Wants=network-online.target
After=network-online.target
[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143
WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://spark.cdn.chorke.org:7077
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh
[Install]
WantedBy=multi-user.target
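With SuccessExitStatus=143 applied, a stop should leave the units inactive rather than failed. A quick verification (assumed commands):

sudo systemctl daemon-reload
sudo systemctl restart spark-master.service spark-slave.service
sudo systemctl stop spark-slave.service spark-master.service
systemctl is-failed spark-master.service spark-slave.service   # expect "inactive", not "failed"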
Answered By - Śhāhēēd
Answer Checked By - Clifford M. (WPSolving Volunteer)