CLUSTERNAME-pod-0, to expand the Pod.

create_gres_conf.sh: Generates the Slurm Generic Resource (GRES) configuration file that defines GPU resources for each node.
create_slurm_conf.sh: Creates the main Slurm configuration file with cluster settings, node definitions, and partition setup.
install.sh: The primary installation script that sets up MUNGE authentication, configures Slurm, and prepares the environment.
test_batch.sh: A sample Slurm job script for testing cluster functionality.

Replace [MUNGE_SECRET_KEY] with any secure random string (like a password). The secret key is used for authentication between nodes and must be identical across all Pods in your cluster.
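One way to produce a suitable random string is sketched below. It assumes only standard POSIX tools (head, od, tr) and a readable /dev/urandom. The variable name MUNGE_SECRET_KEY is used here purely for illustration. Generate the value once and reuse the same string on every Pod.

```shell
# Read 32 random bytes and print them as 64 lowercase hex characters.
# MUNGE_SECRET_KEY is an illustrative variable name, not something install.sh is known to read.
MUNGE_SECRET_KEY="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
echo "$MUNGE_SECRET_KEY"
```

Copy the printed value and substitute it for [MUNGE_SECRET_KEY] on each Pod, so that all nodes share the identical key.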
Run echo $HOSTNAME on the web terminal of each Pod and look for node-0.

On node-0, run both Slurm services:
On node-1, run:
The -D flag keeps the services running in the foreground, so each command needs its own terminal.
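The split of services between nodes can be summarized in a small sketch. It assumes the two services are the standard Slurm daemons, slurmctld (controller) and slurmd (compute), each started with -D. The helper name slurm_daemons_for is hypothetical. It only prints the commands; it does not start anything.

```shell
# Hypothetical helper: print the Slurm daemon commands to run for a given hostname.
# Assumes node-0 hosts the controller (slurmctld) plus a compute daemon (slurmd),
# and every other node runs only slurmd.
slurm_daemons_for() {
  case "$1" in
    node-0) printf '%s\n' 'slurmctld -D' 'slurmd -D' ;;
    *)      printf '%s\n' 'slurmd -D' ;;
  esac
}

# Show what this machine should run, based on its own hostname.
slurm_daemons_for "$(uname -n)"
```

Because -D keeps each daemon in the foreground, run every printed command in a separate terminal on that node.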
Run the following on node-0 to check the status of your nodes:
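A minimal sketch of a status check, assuming the standard sinfo command is used and that a healthy, unloaded node reports the state idle. The helper name all_nodes_idle is hypothetical. It reads lines of the form "hostname state" on standard input, so it can be tried without a running cluster.

```shell
# Hypothetical helper: succeed only if every "<host> <state>" line on stdin says "idle".
# Empty input counts as a failure, since no nodes were reported at all.
all_nodes_idle() {
  awk '$2 != "idle" { bad = 1 } END { exit (bad || NR == 0) }'
}

# On a live cluster, on node-0 (sinfo: -N one line per node, -h no header,
# -o output format with %n = hostname and %T = node state):
#   sinfo -N -h -o '%n %T' | all_nodes_idle && echo "all nodes idle"
```

Running plain sinfo without the helper prints the same information as a table, which is usually enough for a two-node cluster.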
Run the following on node-0 to submit the test job script and confirm that your cluster is working properly:
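As a sketch, assuming the job is submitted with the standard sbatch command, the job ID can be captured from the confirmation line that sbatch prints ("Submitted batch job", followed by the ID). The helper name job_id_from_sbatch is hypothetical.

```shell
# Hypothetical helper: extract the job ID from sbatch's "Submitted batch job <id>" line.
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# On node-0, from the directory that contains test_batch.sh:
#   JOBID="$(sbatch test_batch.sh | job_id_from_sbatch)"
#   echo "Job $JOBID submitted"
```

Keeping the job ID in a variable makes it easier to locate the job's output file afterwards.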
Open the output file (test_simple_[JOBID].out) and look for the hostnames of both nodes. This confirms that the job ran successfully across the cluster.
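The same check can be scripted. The sketch below assumes the two nodes are named node-0 and node-1 and that the job ID is already known. The helper name output_mentions_hosts is hypothetical.

```shell
# Hypothetical helper: succeed only if the file contains every hostname given.
# Usage: output_mentions_hosts <file> <host>...
output_mentions_hosts() {
  file="$1"
  shift
  for host in "$@"; do
    grep -q -- "$host" "$file" || return 1
  done
}

# Example, with JOBID set to the real job ID:
#   output_mentions_hosts "test_simple_${JOBID}.out" node-0 node-1 && echo "both nodes ran the job"
```

If the check fails, the output file itself shows which hostname is missing, which indicates the node that did not take part in the job.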