Management

Cluster update

To update the configuration of a cluster, perform the following actions:

  • On the HeadNode of the cluster:

    • Stop the batch jobs with scancel <job_id>

    • Stop the Windows VDI sessions, if any: connect to EnginFrame as an administrator, click on Admin portal/All Sessions, and terminate all active sessions.

  • On the CCME Management Host, perform the following steps. At each step, you can check the Compute Fleet status with pcluster describe-compute-fleet -n <cluster_name>:

    1. Stop the Compute Fleet with pcluster update-compute-fleet --status STOP_REQUESTED -n <cluster_name>, and wait for the Compute Fleet status to be stopped

    2. Update the cluster with pcluster update-cluster -c <configuration_file> -n <cluster_name>, and wait for the end of the cluster update

    3. Restart the Compute Fleet with pcluster update-compute-fleet --status START_REQUESTED -n <cluster_name>, and wait for the Compute Fleet status to be running

    4. Your cluster is now updated and functional
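
The Management Host steps above can be scripted. The following is a minimal sketch, not a CCME-provided tool. The function names, the 30-second polling interval, and the sed-based JSON parsing are assumptions made for illustration. It also assumes that pcluster describe-compute-fleet reports the fleet state in a "status" field (STOPPED, RUNNING), that pcluster describe-cluster reports a "clusterStatus" field (UPDATE_IN_PROGRESS, UPDATE_COMPLETE), and that pcluster exits with a non-zero code on failure. Check these against your ParallelCluster version before relying on them.

```shell
#!/usr/bin/env bash
# Sketch of the stop / update / restart sequence (all helper names are hypothetical).

POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between status checks (assumed value)

# Read JSON on stdin and print the string value of key $1 (first match only).
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Block until the Compute Fleet of cluster $1 reports status $2.
wait_for_fleet() {
  local cluster="$1" wanted="$2" current
  while true; do
    current="$(pcluster describe-compute-fleet -n "$cluster" | json_field status)"
    if [ "$current" = "$wanted" ]; then
      return 0
    fi
    echo "Compute Fleet is ${current:-unknown}, waiting for $wanted" >&2
    sleep "$POLL_INTERVAL"
  done
}

# Block until the update of cluster $1 ends; fail on any unexpected status.
wait_for_update() {
  local cluster="$1" status
  while true; do
    status="$(pcluster describe-cluster -n "$cluster" | json_field clusterStatus)"
    case "$status" in
      UPDATE_COMPLETE) return 0 ;;
      UPDATE_IN_PROGRESS) sleep "$POLL_INTERVAL" ;;
      *) echo "Unexpected cluster status: ${status:-unknown}" >&2; return 1 ;;
    esac
  done
}

# Steps 1 to 3: stop the fleet, update the cluster, restart the fleet.
update_cluster() {
  local cluster="$1" config="$2"
  pcluster update-compute-fleet --status STOP_REQUESTED -n "$cluster" || return 1
  wait_for_fleet "$cluster" STOPPED || return 1
  pcluster update-cluster -c "$config" -n "$cluster" || return 1
  wait_for_update "$cluster" || return 1
  pcluster update-compute-fleet --status START_REQUESTED -n "$cluster" || return 1
  wait_for_fleet "$cluster" RUNNING
}
```

After sourcing the file on the Management Host, the sequence would be started with update_cluster <cluster_name> <configuration_file>. The loops have no timeout, so a fleet that never reaches the expected status has to be interrupted manually.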

For more information about the pcluster update-cluster command, refer to: https://docs.aws.amazon.com/parallelcluster/latest/ug/pcluster.update-cluster-v3.html

Updating CCME dependencies

Warning

Updating CCME dependencies in dependencies.yaml can break CCME. First test any changes in a sandbox environment where there is no risk of breaking your production cluster.

In some cases, it can be necessary to update CCME dependencies, for example when a CVE has been issued for one of the components of CCME.

If you need to do so, update the version of the impacted CCME dependencies in CCME/dependencies.yaml and/or management/dependencies.yaml.

For example, if you want to update Slurm to a newer version, you can update the slurm: "YY.MM.V" parameter.
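
The edit can be done by hand or, as a rough sketch, with a small helper such as the one below. The function name set_dependency_version is hypothetical, and the sketch assumes that each dependency appears on its own line in the form key: "version", as in the slurm example above. If the same key appears more than once in the file, every occurrence is rewritten, so review the result before starting or updating a cluster.

```shell
#!/usr/bin/env bash
# set_dependency_version FILE KEY VERSION  (hypothetical helper, not part of CCME)
# Rewrites lines of the form  KEY: "old"  to  KEY: "VERSION"  in FILE and
# keeps the previous content in FILE.bak.
set_dependency_version() {
  local file="$1" key="$2" version="$3"
  sed -i.bak "s/^\([[:space:]]*${key}:[[:space:]]*\)\"[^\"]*\"/\1\"${version}\"/" "$file"
}
```

For example, set_dependency_version CCME/dependencies.yaml slurm "YY.MM.V" would set the Slurm version, with YY.MM.V replaced by the actual target version. The .bak copy makes it easy to compare or restore the previous file.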

Then, start a new CCME cluster, or update a currently running CCME cluster, to apply the changes.