Cluster

Configuration

Global configuration

In order to launch a cluster, you first need to create a configuration file for it. You can either write your own, or edit one of the examples provided in /home/$(whoami)/.parallelcluster/ on the Management Host: add a configuration to the pcluster configuration file in /home/$(whoami)/.parallelcluster/config, or edit /home/$(whoami)/.parallelcluster/simple for a simple cluster. An example is provided below, in which you must replace each YAML variable (e.g., {{ CCME_VARIABLE }}) with the required information. Make sure you use the correct subnets, security groups and policies.

The HeadNode.CustomActions.OnNodeUpdated.Args section of the ParallelCluster configuration file can contain a set of parameters that influence the configuration of your cluster. Here is the list of supported parameters (an illustrative Args snippet follows the list):

  • OS

    • CCME_NO_PROXY (optional): string containing a list of hosts whose traffic should bypass the proxy. It is used to set the no_proxy and NO_PROXY environment variables: the content of CCME_NO_PROXY is appended to those two variables. This is usually used in conjunction with the Proxy option in the AWS ParallelCluster configuration file.

  • S3

    • CCME_S3FS (optional): list of S3 buckets to mount through S3FS (the policies attached to the HeadNode and Computes need to have read/write access to these buckets). Format is CSV: CCME_S3FS=bucket1,bucket2,bucket3 (or simply CCME_S3FS=bucket if there is a single bucket). If this variable is unset or equal to NONE, then no bucket is mounted through S3FS.

    • CCME_JSLOGS_BUCKET (mandatory): name of an S3 bucket on which Slurm accounting logs will be exported (see Slurm accounting logs, the policies attached to the HeadNode and Computes need to have read/write access to this bucket).

  • Security

    • CCME_PASSWORDS_SIZE (optional): this variable allows you to change the length of the passwords generated by CCME. The default value is 32.

  • ALB

    • CCME_OIDC: a prefix used to locate the CCME/conf/${CCME_OIDC}.oidc.yaml file that contains configurations for OIDC authentication (see OIDC External Authentication).

    • CCME_DNS: the DNS name to use for the ALB (if you are not using the default ALB DNS name), e.g., my.domain.com. By default, the cluster URLs are alb_url/my_cluster_name/portal/ for the portal and alb_url/my_cluster_name/dcv-instance-id for DCV sessions, where alb_url is the DNS name of the Application Load Balancer. This default can be replaced by a custom domain name: if you configure your DNS so that personal.domain.com points to the Application Load Balancer, and want your cluster web accesses to be personal.domain.com/my_cluster/portal/ and personal.domain.com/my_cluster/visualization, then set CCME_DNS to personal.domain.com instead of NONE.

  • User

    • CCME_USER_HOME: this variable allows you to use a path other than /home for the users’ home directories. You can use the %u parameter to retrieve the username (e.g., /file-system/home/%u). If the file system is created with the cluster by ParallelCluster, you must set the file system mount point to /home instead of using CCME_USER_HOME.

  • Linux remote visualization (VDI through Autoscaling fleet) (see Autoscaling fleet):

    • CCME_LIN_SG: A CSV list of security groups (a single SG can be provided).

    • CCME_LIN_ASG_SUBNETS: A CSV list of subnets in which to deploy the instances (a single subnet can be provided).

    • CCME_LIN_AMI: The AMI to use for the Linux DCV fleet instances. It must be an AMI baked with CCME (or at least with AWS ParallelCluster).

    • CCME_LIN_BOOTSTRAP_TIMEOUT: The bootstrap timeout for launching Linux DCV nodes.

    • CCME_LIN_INSTANCE_PROFILE: The ARN of the Instance profile to use on the instances (must be the ccme_cluster_compute_instance_profile parameter of the CMH stack or the ComputeNodeInstanceProfileSlurm output of the CRS stack).

    • CCME_LIN_ASG_INST_TYPES: A CSV list of instance types. Each one will get its own Autoscaling Group (ASG).

    • CCME_LIN_ASG_MIN: A CSV list of minimum number of instances. There must be the same number of values as in CCME_LIN_ASG_INST_TYPES.

    • CCME_LIN_ASG_DES: A CSV list of initial desired number of instances. There must be the same number of values as in CCME_LIN_ASG_INST_TYPES.

    • CCME_LIN_ASG_MAX: A CSV list of maximum numbers of instances. There must be the same number of values as in CCME_LIN_ASG_INST_TYPES.

    • CCME_LIN_WARM_POOL: A CSV list of Booleans to activate or not the warm pool. There must be the same number of values as in CCME_LIN_ASG_INST_TYPES.

    • CCME_LIN_BUFFER_SIZE: A CSV list of integers specifying the buffer size (number of instances to keep ready on top of the ones used by running DCV sessions). There must be the same number of values as in CCME_LIN_ASG_INST_TYPES.

  • Windows remote visualization (VDI) (see Configuration for a complete description of these parameters):

    • CCME_WIN_LAUNCH_TEMPLATE_ID: Launch template used to launch Windows EC2 instances. CCME creates a default launch template when deploying the CMH, but you can set up your own here.

    • CCME_WIN_AMI: ID of the AMI used to launch Windows EC2 instances (see Prerequisites for prerequisites).

    • CCME_WIN_INSTANCE_TYPE: Instance type used to launch Windows EC2 instances.

    • CCME_WIN_INACTIVE_SESSION_TIME, CCME_WIN_NO_SESSION_TIME and CCME_WIN_NO_BROKER_COMMUNICATION_TIME: parameters to control the lifecycle of Windows remote visualization sessions.

    • CCME_WIN_TAGS: Dictionary of additional tags to apply on the instances of the Windows fleet (see Starting a Windows DCV session for the list of default tags).

  • EnginFrame

    • CCME_EF_ADMIN_GROUP: The name of an OS group; users belonging to this group are automatically promoted to administrators in EnginFrame (no sudo access).

    • CCME_EFADMIN_PASSWORD: ARN of a secret containing the password of the EnginFrame admin account. The expected value is the ARN of a preexisting plaintext secret stored in AWS Secrets Manager (ASM). Do not set this variable to let CCME generate a password and store it in /shared/CCME/ccme.passwords.efadmin.

    • CCME_EFADMIN_SUDOER: if true, then the efadmin user is a sudoer. The default value is false.

    • CCME_EFADMIN_ID: The uid/gid used to create the efadmin user locally.

    • CCME_EFNOBODY_ID: The uid/gid used to create the efnobody user locally.

  • Remote access

    • CCME_AWS_SSM: if set to true, the AWS SSM agent is installed on all the nodes, to allow remote connections to them through AWS SSM.

  • Slurm

    • CCME_CUSTOM_SLURMDBD_SETTINGS: dictionary of specific options to add to the SlurmDBD configuration. See slurmdbd configuration for possible values. The format must be a valid “YAML dictionary embedded in a string”: the whole line must be enclosed in double quotes, and the value of CCME_CUSTOM_SLURMDBD_SETTINGS must be the dict enclosed in escaped double quotes. See the following example: "CCME_CUSTOM_SLURMDBD_SETTINGS=\"{'PrivateData': 'jobs,events,accounts,reservations,usage,users', 'PurgeEventAfter': '12'}\""

  • Notifications

    • CCME_ADMIN_PHONE: can be set to a valid mobile phone number (in E.164 format) to deliver information about the cluster when it is ready to be used. As described by the ITU, the E.164 general format must contain only digits, split as follows:

      • ‘+’ sign

      • Country code (max 3 digits)

      • Subscriber number (max 12 digits)

      WARNING: This feature is restricted by an “account spend limit” that prevents you from spending more than a given amount of money on SMS messages. See this documentation.

    • CCME_ADMIN_SNS_TOPIC_ARN: can be set to a valid SNS topic ARN to which you want to deliver information about the cluster when it is ready to be used. For example, you can configure your SNS topic to deliver the information by email to the administrators of the platform.
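For illustration, here is a minimal sketch of how some of these parameters could appear in the cluster configuration file, focusing on the Linux VDI fleet and notification settings that are not shown in the full example below. All IDs, ARNs and the phone number are placeholders to replace with values from your own environment:

HeadNode:
  CustomActions:
    OnNodeUpdated:
      Args:
        # Linux VDI autoscaling fleet (hypothetical values)
        - CCME_LIN_SG=sg-0123456789abcdef0
        - CCME_LIN_ASG_SUBNETS=subnet-0123456789abcdef0
        - CCME_LIN_AMI=ami-0123456789abcdef0
        - CCME_LIN_INSTANCE_PROFILE=arn:aws:iam::123456789012:instance-profile/ccme_cluster_compute_instance_profile
        - CCME_LIN_ASG_INST_TYPES=g4dn.xlarge,t3.large
        - CCME_LIN_ASG_MIN=0,0
        - CCME_LIN_ASG_DES=0,0
        - CCME_LIN_ASG_MAX=4,4
        - CCME_LIN_WARM_POOL=false,false
        - CCME_LIN_BUFFER_SIZE=1,0
        # Notifications (hypothetical values)
        - CCME_ADMIN_PHONE=+33612345678
        - CCME_ADMIN_SNS_TOPIC_ARN=arn:aws:sns:eu-west-1:123456789012:ccme-admin-notifications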

Note

No parameters must be set in the following sections, as they are inherited from the HeadNode.CustomActions.OnNodeUpdated.Args parameters:

  • HeadNode.CustomActions.OnNodeStart.Args

  • HeadNode.CustomActions.OnNodeConfigured.Args

Note

The CCME solution is downloaded from S3 onto the HeadNode. The download directory is then mounted on /opt/CCME on each ComputeNode from the HeadNode using NFS.

CCME applies its configurations and installs software on ParallelCluster clusters through a set of Bash and Ansible scripts. The entry points are the Bash scripts specified in the OnNodeStart, OnNodeConfigured and OnNodeUpdated parameters of the HeadNode.CustomActions and Scheduling.SlurmQueues[].CustomActions sections. The values presented below (and present in the generated example configuration files) must always be present.

Example ParallelCluster configuration file for CCME
Region: '{{ AWS_REGION }}'
CustomS3Bucket: '{{ CCME_CLUSTER_S3BUCKET }}'
Iam:
  Roles:
    LambdaFunctionsRole: '{{ CCME_CLUSTER_LAMBDA_ROLE }}'
  # If the role associated with the cluster includes a custom IAM path prefix,
  # replace "parallelcluster" by the custom IAM path prefix.
  ResourcePrefix: "parallelcluster"
Image:
  Os: {{ "alinux2" or "rhel8" or "rhel9" or "rocky8" or "rocky9" }}
  # CustomAmi: ami-id
Tags:
  - Key: Owner
    Value: '{{ CCME_OWNER }}'
  - Key: Reason
    Value: '{{ CCME_REASON }}'
SharedStorage:
  - Name: shared
    StorageType: Ebs
    MountDir: shared
HeadNode:
  InstanceType: t3.medium
  Networking:
    SubnetId: '{{ CCME_SUBNET }}'
    SecurityGroups:
      - '{{ CCME_PRIVATE_SG }}'
  Ssh:
    KeyName: '{{ AWS_KEYNAME }}'
  LocalStorage:
    RootVolume:
      Size: 50
      Encrypted: true
  CustomActions:
    OnNodeStart:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
    OnNodeConfigured:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
    OnNodeUpdated:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/update-install.sh
      Args:
        # CCME_ANSIBLE_SKIP_TAGS can be used to skip some phases of the CCME installation process.
        # If CCME_ANSIBLE_SKIP_TAGS is not specified, then no phase is skipped.
        # - CCME_ANSIBLE_SKIP_TAGS=
        - CCME_CMH_NAME={{ CCME_CMH_NAME }}
        - CCME_S3FS={{ CCME_DATA_BUCKET }}
        - CCME_JSLOGS_BUCKET={{ CCME_DATA_BUCKET }}
        - CCME_AWS_SSM=true
        - CCME_OIDC=default
        - CCME_USER_HOME=/file-system/home/%u
        - CCME_DNS="my.domain.com"
        - CCME_REPOSITORY_PIP="https://my.pip.domain.com/index/,https://my.pip.domain.com/index-url/"
        # Optional Windows fleet
        - CCME_WIN_AMI="ami-i..."
        - CCME_WIN_INSTANCE_TYPE=NONE
        - CCME_WIN_INACTIVE_SESSION_TIME=600
        - CCME_WIN_NO_SESSION_TIME=600
        - CCME_WIN_NO_BROKER_COMMUNICATION_TIME=600
        - CCME_WIN_CUSTOM_CONF_REBOOT=true
        - CCME_WIN_LAUNCH_TRIGGER_DELAY=10
        - CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS=100
        ## CCME_WIN_TAGS allows you to add specific tags to instances of the Windows fleet.
        # The format must be a valid "YAML dictionary embedded in a string".
        # Hence, the whole line must be enclosed in double quotes, and then the value
        # of CCME_WIN_TAGS must be the dict enclosed in escaped double quotes. See the following example:
        - "CCME_WIN_TAGS=\"{'MyTagKey1': 'MyTagValue1', 'MyTagKey2': 'MyTagValue2'}\""
        - CCME_EFADMIN_PASSWORD="arn:aws:secretsmanager:eu-west-1:012345678910:secret:ccme-prefix-efadmin.password-4riso"
        - CCME_EF_ADMIN_GROUP="Administrators"
        # CCME_EFADMIN_SUDOER defines if efadmin has a sudo role
        # Required: No
        # Patterns: true or false
        # CCME_EFADMIN_SUDOER=false
        # Specify EFADMIN or EFNOBODY UID/GID
        # - CCME_EFADMIN_ID=
        # - CCME_EFNOBODY_ID=
        ## CCME_CUSTOM_SLURMDBD_SETTINGS allows you to add specific options to SlurmDBD.
        # See https://slurm.schedmd.com/slurmdbd.conf.html for possible values
        # The format must be a valid "YAML/JSON dictionary embedded in a string".
        # Hence, the whole line must be enclosed in double quotes, and then the value
        # of CCME_CUSTOM_SLURMDBD_SETTINGS must be the dict enclosed in escaped double quotes.
        # Note: if you set PrivateData here, you must set it as well in Scheduling.SlurmSettings.CustomSlurmSettings
        # See the following example:
        - "CCME_CUSTOM_SLURMDBD_SETTINGS=\"{'PrivateData': 'jobs,events,accounts,reservations,usage,users', 'PurgeEventAfter': '12'}\""
  Iam:
    InstanceProfile: '{{ CCME_CLUSTER_HEADNODE_INSTANCE_PROFILE }}'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      # If the role associated with the cluster is not authorized to use Route 53,
      # or if you don't want to use Route 53,
      # set "DisableManagedDns" to true and "UseEc2Hostnames" to true
      DisableManagedDns: false
      UseEc2Hostnames: false
    CustomSlurmSettings:
      # If you set PrivateData in CCME_CUSTOM_SLURMDBD_SETTINGS, you must set it here as well.
      # PrivateData must be present in both slurmdbd.conf and slurm.conf.
      # Note the addition of the "cloud" value as well.
      - PrivateData: "jobs,events,accounts,reservations,usage,users,cloud"
  SlurmQueues:
    - Name: basic-slurm
      CapacityType: ONDEMAND
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 50
            Encrypted: true
      ComputeResources:
        - Name: t2-small
          InstanceType: t2.small
          MinCount: 0
          MaxCount: 2
      CustomActions:
        OnNodeStart:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
        OnNodeConfigured:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
      Iam:
        InstanceProfile: '{{ CCME_CLUSTER_COMPUTE_INSTANCE_PROFILE }}'
      Networking:
        SubnetIds:
          - '{{ CCME_SUBNET }}'
        SecurityGroups:
          - '{{ CCME_COMPUTE_SG }}'
    - Name: dcv-basic
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: t3-medium
          InstanceType: t3.medium
          MinCount: 0
          MaxCount: 2
      CustomActions:
        OnNodeStart:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
        OnNodeConfigured:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
      Iam:
        InstanceProfile: '{{ CCME_CLUSTER_COMPUTE_INSTANCE_PROFILE }}'
      Networking:
        SubnetIds:
          - '{{ CCME_SUBNET }}'
        SecurityGroups:
          - '{{ CCME_COMPUTE_SG }}'
DirectoryService:
  DomainName: {{ CCME_AD_DIR_NAME }}
  DomainAddr: ldap://{{ CCME_AD_URI_1 }},ldap://{{ CCME_AD_URI_2 }}
  PasswordSecretArn: {{ CCME_AD_PASSWORD }}
  DomainReadOnlyUser: cn={{ CCME_AD_READ_ONLY_USER }},ou=Users,ou={{ CCME_AD_DIR_NAME_DC1 }},dc={{ CCME_AD_DIR_NAME_DC1 }},dc={{ CCME_AD_DIR_NAME_DC2 }}
  LdapTlsReqCert: never # Set "hard" to enable ldaps
  # LdapTlsCaCert: /opt/CCME/conf/{{ CCME_CA_FILE }} # Set it only with ldaps
  # LdapAccessFilter:
  AdditionalSssdConfigs:
    # debug_level: "0x1ff" # Uncomment for logs, can be heavy
    ldap_auth_disable_tls_never_use_in_production: True # Don't set it with ldaps

Note

If, in the configuration of your cluster, you want to use an external resource such as an EFS or FSx for NetApp file system that has not been deployed by CCME, you must ensure that the targeted resource has at least one tag whose name starts with ccme. For security reasons, CCME roles only allow CCME to describe and use services that have such ccme* tags.

We recommend using an explicit tag name such as ccme:allow (the value is not important, but for readability use a value such as true).

Without such a tag, you will get an error message when trying to launch the cluster. For example, when trying to connect an FSx for NetApp ONTAP file system without a ccme* tag, you can get an error like:

"message": "Invalid cluster configuration: User:
arn:aws:sts::123456789012:assumed-role/CRS-myrole-ParallelClusterUserRole-10T744D833QZH/i-0423d0720df91381b
is not authorized to perform: fsx:DescribeVolumes on resource: arn:aws:fsx:eu-west-3:123456789012:volume/*/*
because no identity-based policy allows the fsx:DescribeVolumes action"
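To add such a tag to an existing resource, you can use the AWS Console or the AWS CLI. Below is a minimal sketch for an FSx file system (the ARN is a placeholder to replace with your own):

# Tag an existing FSx resource so that CCME is allowed to describe and use it
aws fsx tag-resource \
  --resource-arn arn:aws:fsx:eu-west-3:123456789012:file-system/fs-0123456789abcdef0 \
  --tags Key=ccme:allow,Value=true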

Custom Scripts

On top of CCME-specific configurations, you can integrate your own custom scripts into CCME. To deploy a cluster embedding and executing your own custom scripts, place them in the CCME/custom directory and synchronize this directory to the S3 bucket. You can provide your own Ansible playbooks or Bash scripts to add specific configurations to the HeadNode, to the Compute Nodes, or to all nodes. Ansible playbooks and Bash scripts are executed in the following order:

  1. install-*-all.yaml: run Ansible playbook on all nodes (Head and Compute nodes)

  2. install-*-head.yaml: run Ansible playbook on Head Node only

  3. install-*-compute.yaml: run Ansible playbook on Compute Nodes only

  4. install-*-all.sh: run Bash script on all nodes (Head and Compute nodes)

  5. install-*-head.sh: run Bash script on Head Node only

  6. install-*-compute.sh: run Bash script on Compute Nodes only

To load the CCME environment variables in a custom Bash script, source /etc/ccme/ccme.env.sh.
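For example, a minimal custom Bash script following the naming convention above could look like the sketch below (the file name install-99-example-head.sh and its content are purely illustrative):

#!/bin/bash
# Hypothetical example: CCME/custom/install-99-example-head.sh
# Load the CCME environment variables
. /etc/ccme/ccme.env.sh

# CCME_CONF and CCME_DEPENDENCIES are available to custom scripts (see the Warning below)
echo "CCME configuration file: ${CCME_CONF}"
echo "CCME dependencies file: ${CCME_DEPENDENCIES}"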

Warning

Do not try to use any of the CCME Ansible tasks of the CCME role, as there might be dependencies your custom scripts will not inherit automatically. The custom scripts have access to the following variables that you can use:

  • CCME_CONF: Path to the CCME configuration file (YAML format in Ansible scripts, Bash format in shell scripts)

  • CCME_DEPENDENCIES: Path to CCME list of dependencies (YAML format)

You can then load these files if needed with the following tasks in your playbooks:

- name: Load global CCME variables
  ansible.builtin.include_vars:
    file: "{{ CCME_CONF }}"
    name: ccme_conf

- name: Load CCME dependencies
  ansible.builtin.include_vars:
    file: "{{ CCME_DEPENDENCIES }}"
    name: ccme_deps

- name: Load local CCME environment variables
  ansible.builtin.include_vars:
    file: "/etc/ccme/ccme.env.yaml"
    name: ccme_env_var
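If needed, you can then check what has been loaded with a simple debug task, for example (illustrative only):

- name: Show the loaded CCME dependencies
  ansible.builtin.debug:
    var: ccme_deps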

To update the CCME solution bucket from the Management Host, use the updateCCME.sh command.

$ ../../management/sbin/scripts/updateCCME.sh
updateCCME <S3BUCKET> <OPTIONAL: CCME.CONF> <OPTIONAL: AWS CREDENTIAL PROFILE>
 - S3BUCKET:  Name of the S3 bucket solution to upload CCME
 - CCME.CONF: Path to a ccme.conf file to replace the bucket

Note

Using the updateCCME.sh command on a Management Host does not require specifying a ccme.conf file: it will take the correct CCME configuration file from /opt/ccme/CCME/conf/ccme.conf.
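For example, on the Management Host, uploading the CCME sources to a bucket named my-ccme-bucket (a placeholder bucket name) could look like:

# Run from the directory containing updateCCME.sh on the Management Host
./updateCCME.sh my-ccme-bucket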

Management

Note

The AWS Region is a required parameter. It is taken either from the --region option of the CLI or from the Region option in the ParallelCluster configuration file used with the command.

If the AWS Region is specified both on the command line and in the cluster configuration file, the value from the CLI takes priority.

Create cluster

To create a cluster, use the following command:

pcluster create-cluster --cluster-name "${cluster_name}" --cluster-configuration ~/.parallelcluster/"${configuration_file}" --region "${aws_region}"

Note

If you are creating your first cluster (or a first cluster in a new environment), it is strongly recommended to create it in debug mode, by setting the rollback-on-failure pcluster parameter to false with --rollback-on-failure false, as shown in the command below.

pcluster create-cluster --rollback-on-failure false --cluster-name "${cluster_name}" --cluster-configuration ~/.parallelcluster/"${configuration_file}" --region "${aws_region}"

Delete cluster

pcluster delete-cluster --cluster-name "${cluster_name}" --region "${aws_region}"

List clusters

pcluster list-clusters --region "${aws_region}"

Connect to the clusters

The possible ways to connect to a deployed cluster are listed below (an example connection command follows the list):

  • Use the administrator account (sudoer)

    • ec2-user or rocky depending on the selected OS

    • The SSH key associated with this user is the one selected in the cluster configuration file at deployment

  • Use any user from the ActiveDirectory authorized to connect to the cluster

    • With the tuple username + password

    • With the username + SSH key (after a first username + password authentication): the SSH key is available in the user’s home directory, under ~/.ssh/
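For example, an SSH connection with the administrator account might look like the following (the key path and HeadNode IP address are placeholders):

# Connect to the HeadNode as the administrator account (ec2-user or rocky depending on the OS)
ssh -i ~/.ssh/my-cluster-key.pem ec2-user@10.0.0.10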

Build a CCME AMI

From an AWS ParallelCluster AMI

First of all, locate the base ParallelCluster AMI you want to customize. From the console, go to EC2/AMIs and search for aws-parallelcluster-3.X.Y (replace X.Y with the current version of ParallelCluster supported by CCME). Select the AMI that corresponds to the target OS and architecture (x86_64 or arm64), launch an instance with this AMI, and attach a role with read access to the CCME S3 bucket. Then SSH to the instance.
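If you prefer the CLI to the console, a sketch of an equivalent search is shown below (the region and version pattern are examples to adapt):

aws ec2 describe-images --region eu-west-1 --owners amazon \
  --filters "Name=name,Values=aws-parallelcluster-3.*" \
  --query "Images[].{Name:Name,Id:ImageId,Arch:Architecture}" --output table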

You will need to retrieve the prepare-ami.sh script provided in the CCME/sbin directory of your CCME S3 bucket, or simply copy/paste the current version provided here:

#!/bin/bash
################################################################################
# Copyright (c) 2017-2025 UCit SAS
# All Rights Reserved
#
# This software is the confidential and proprietary information
# of UCit SAS ("Confidential Information").
# You shall not disclose such Confidential Information
# and shall use it only in accordance with the terms of
# the license agreement you entered into with UCit.
################################################################################

# This script can be used when preparing a CCME ami to be used with AWS ParallelCluster.
# Exit if anything goes wrong
set -Eu -o pipefail

function print_error {
    read -r line file <<<"$(caller || true)"
    echo "An error occurred in line ${line} of file ${file}:" >&2
    sed "${line}q;d" "${file}" >&2

    if [[ "${CCME_DEBUG:-}" != "true" ]]; then
      exit 1
    else
      echo "DEBUG mode, continuing"
    fi
}
trap print_error ERR


# Set null globs to prevent unwanted iterations over nonexistent files
shopt -s nullglob

# History will not be saved
set +o history

# Arguments
# $1 => name of the S3 CCME bucket (e.g., ccme_bucket)
# $2 => optional, additional tags (CSV format) to be passed to ansible roles
if [[ $# -lt 1 ]]; then
  echo "Error, you need to specify the name of the S3 bucket that contains CCME (e.g., my_bucket, or my_bucket/subdir)"
  echo "$0 ccme_bucket_name [tags]"
  echo ""
  echo " - ccme_bucket_name: name of the bucket containing CCME sources. This instance must have read access to this bucket."
  echo " - tags: list of Ansible tags to be passed to CCME roles (CSV format: tag1,tag2...)"
  echo "   currently supported tags:"
  echo "      + dcv"
  echo "      + headnode_only"
  exit 1
fi

addtags=""
if [[ $# -eq 2 ]]; then
   addtags+="$2"
fi

# Logs
declare logFile="/var/log/ccme.prepare-ami.log"
touch "${logFile}"; chmod -v 600 "${logFile}"
exec  > >(awk '{printf "[%s] %s\n", strftime("%FT%T%z"), $0; fflush()}' >>"${logFile}" || true)
exec 2>&1; set -x

echo "**** CCME Prepare AMI - START ****"
# remove potential trailing '/' in $1
ccme_bucket="${1%/}"
echo "Will retrieve CCME from ${ccme_bucket}"

# Define CCME variables as we are not running this script inside an actual cluster
CCME_DIR="/opt/CCME"
CCME_DEPENDENCIES="${CCME_DIR}/dependencies.yaml"
CCME_ENV_DIR="/etc/ccme"
CCME_AMI_FILE="${CCME_ENV_DIR}/ccme.ami.info"
CCME_CONF="${CCME_CONF:-}"

if [[ -f "/etc/profile.d/proxy.sh" ]]; then
  # shellcheck source=/dev/null
  . "/etc/profile.d/proxy.sh"
fi

# We add /usr/local/bin to PATH as the aws cli can be available there
export PATH="${PATH}:/usr/local/bin"

# Cleanup any potential CCME_DIR to start from a fresh directory
mkdir -p "${CCME_DIR}"
rm -rf "${CCME_DIR:?}/"*

# Download CCME from repo
aws s3 cp --recursive "s3://${ccme_bucket}/CCME/" "${CCME_DIR}/"

# Create and load Python Environment
# shellcheck disable=SC2154,SC1091
. "${CCME_DIR}/sbin/setup.pyenv.sh" install "${CCME_DEPENDENCIES}" "${CCME_DIR}/sbin/requirements.txt"
# shellcheck disable=SC2154,SC1091
. "${CCME_DIR}/sbin/setup.pyenv.sh" activate


# Define a new log file to redirect outputs
function new_log_file() {
  # $1: log file basename
  logfile="/var/log/ccme.${1}.log"
  touch "${logfile}" >/dev/null 2>&1
  chmod -v 600 "${logfile}" >/dev/null 2>&1
  echo "${logfile}"
}

# Define method to call ansible-playbook
function anspb() {
  local _anslogfile
  local _targettag
  local _pyexec
  local _extra_vars

  _anslogfile=$(new_log_file "$(basename "$1" .yaml)-AMIBuild")
  _targettag="${2:-build}"

  _extra_vars=""
  if [[ -n "${CCME_CONF}" ]]; then
    _extra_vars="CCME_CONF=${CCME_CONF}"
  fi

  echo "Running playbook $1 with tag ${_targettag} - outputs will be written to ${_anslogfile}"
  # As we are running inside a virtual environment, we need to explicitly set
  # the python interpreter that we want to use through ansible_python_interpreter
  _pyexec=$(command -v python3)
  # shellcheck disable=SC2154
  ansible-playbook -v --tags "${_targettag}" --extra-vars="ansible_python_interpreter=${_pyexec} CCME_DEPENDENCIES=${CCME_DEPENDENCIES} ${_extra_vars}" "$1" >> "${_anslogfile}" 2>&1
}


#### Run our entry point playbook that will apply the ccme role, only the build part
tags="build"
if [[ "${addtags}" != "" ]]; then
  tags+=",${addtags}"
fi
# shellcheck disable=SC2154
anspb "${CCME_DIR}/sbin/deployCCMEAMI.yaml" "${tags}"


#### Create a file indicating that we have built a CCME AMI
# We store the current date and the list of "features" that we have packaged
mkdir -p "${CCME_ENV_DIR}"
date > "${CCME_AMI_FILE}"
echo "${tags}" >> "${CCME_AMI_FILE}"

#### Cleanup
## CCME files
rm -rf "${CCME_DIR}/"{conf,custom,templates}/*
## Ansible cache
rm -rf /tmp/facts_cache
## Cleanup tmp dirs
rm -rf /tmp/* /var/tmp/*
## Cleanup history
shred -u ~/.*history || true
history -c

echo "**** CCME Prepare AMI - END ****"

Note

This script takes as argument the path to the CCME S3 bucket (including an optional subdirectory), and optionally a list of Ansible tags to apply on top of the build phase (for example, you can specify dcv if you want to preinstall DCV packages to create an AMI for the HeadNode and DCV nodes). If you want to pre-set certain CCME variables that influence which packages are installed, create a ccme.yaml file and export the path to this file in CCME_CONF right before running prepare-ami.sh:

export CCME_CONF=/tmp/ccme.conf.yaml
# Force AWS SSM installation
echo "CCME_AWS_SSM: true" > "${CCME_CONF}"
# Ensure all packages are up to date
 echo "CCME_UPDATE: true" >> "${CCME_CONF}"
# Launch AMI preparation script
sudo bash prepare-ami.sh my_ccme_bucket/subdir

As soon as you are ready, run the script using the following command syntax in your terminal (the <dcv> mention below indicates it is optional):

sudo bash prepare-ami.sh my_ccme_bucket/subdir <dcv>

Logs will be written to /var/log/ccme.prepare-ami.log and /var/log/ccme.deployCCMEAMI-AMIBuild.log. When the script exits with a return code of 0, you can clean up the instance (e.g., remove public keys in ~/.ssh/authorized_keys, remove SSH host key pairs…) and prepare it to build a new AMI.
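A minimal cleanup sketch (to adapt to your needs) could be:

# Remove authorized SSH public keys and host key pairs before creating the AMI
rm -f ~/.ssh/authorized_keys
sudo rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub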

Then in the AWS Console, in EC2, select your instance, and click Actions/Image and templates/Create image.

Once created, you can use this new AMI in the AWS ParallelCluster configuration through the CustomAmi parameter.
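For example, a minimal sketch at the cluster level (the AMI ID is a placeholder):

Image:
  Os: rhel8
  CustomAmi: ami-0123456789abcdef0

AWS ParallelCluster also supports queue-level AMIs (Scheduling.SlurmQueues[].Image.CustomAmi) if you want to use different AMIs for the HeadNode and the compute queues.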

Note

If you have prepared a CCME AMI, some CCME Ansible roles are skipped by default during the boot phase. prepare-ami.sh creates a /etc/ccme/ccme.ami.info file that contains the list of roles that have already been applied during AMI preparation (e.g., build,dcv). These roles are then automatically skipped during instance startup to speed up instance preparation.
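Based on what prepare-ami.sh writes (the creation date followed by the list of applied tags), the content of /etc/ccme/ccme.ami.info might look like the following illustrative example:

Tue Jan 14 10:23:45 UTC 2025
build,dcv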

Warning

It is highly recommended to prepare and use CCME AMIs instead of directly using AWS ParallelCluster AMIs. CCME dynamically downloads and installs packages, which can in time be removed from public repositories.

The good practice is to create:

  • for DCV nodes (Headnode and Linux DCV visualization Nodes):

    • 1 AMI for instances that will have a GPU

    • 1 AMI for instances that will NOT have a GPU

    As the installation process is a bit different in both cases, you cannot use an AMI built without GPU support on an instance that has a GPU (GPU-related packages/configurations will not be available). The reverse (using a GPU AMI on a non-GPU instance) will work, though the startup phase will be a bit longer as CCME needs to undo some configurations.

    Use ./prepare-ami.sh <bucket> dcv in both cases.

  • 1 AMI for the compute nodes: ./prepare-ami.sh <bucket>

Using pcluster build-image

Coming soon…