Cluster
Configuration
Global configuration
Add a configuration to the pcluster configuration file in /home/$(whoami)/.parallelcluster/config, or edit /home/$(whoami)/.parallelcluster/simple for a simple cluster.
In order to launch a cluster, you first need to create a configuration file for it.
You can either create your own, or edit one of the examples provided in /home/$(whoami)/.parallelcluster/
on the Management Host. We provide an example below, in which you must replace each YAML variable (e.g., {{ CCME_VARIABLE }})
with the required information. Make sure you use the correct subnets, security groups and policies.
The HeadNode.CustomActions.OnNodeStart.Args section of the ParallelCluster configuration file can contain a set of
parameters that influence the configuration of your cluster. Here is the list of supported parameters:
OS related parameters
CCME_NO_PROXY (optional): a string containing a list of hosts whose traffic should bypass the proxy. It is used to set the no_proxy and NO_PROXY environment variables: the content of CCME_NO_PROXY is appended to those two variables. This is usually used in conjunction with the Proxy option in the AWS ParallelCluster configuration file.
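For instance, a minimal sketch of the relevant configuration excerpt (the proxy address and the excluded hosts below are placeholders, not values from this guide):

  HeadNode:
    Networking:
      Proxy:
        HttpProxyAddress: http://proxy.internal.example.com:3128  # hypothetical proxy
    CustomActions:
      OnNodeStart:
        Args:
          # Appended to the no_proxy/NO_PROXY environment variables on the nodes
          - CCME_NO_PROXY=169.254.169.254,.internal.example.com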
S3 related parameters
CCME_S3FS (optional): list of S3 buckets to mount through S3FS (the policies attached to the HeadNode and the Compute nodes need read/write access to these buckets). The format is CSV: CCME_S3FS=bucket1,bucket2,bucket3 (or simply CCME_S3FS=bucket if there is a single bucket). If this variable is unset or equal to NONE, no bucket is mounted through S3FS.
CCME_JSLOGS_BUCKET (mandatory): name of an S3 bucket to which Slurm accounting logs will be exported (see Slurm accounting logs; the policies attached to the HeadNode and the Compute nodes need read/write access to this bucket).
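For example, the corresponding entries in HeadNode.CustomActions.OnNodeStart.Args could look as follows (the bucket names are hypothetical):

  Args:
    - CCME_S3FS=my-project-data,my-shared-tools   # mounted via S3FS on all nodes
    - CCME_JSLOGS_BUCKET=my-slurm-accounting-logs # mandatory, receives Slurm accounting logs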
ALB related parameters
CCME_OIDC: a prefix used to locate the CCME/conf/${CCME_OIDC}.oidc.yaml file that contains configurations for OIDC authentication (see OIDC External Authentication).
CCME_DNS: the DNS name of the ALB (if you are not using the default ALB DNS name), e.g., my.domain.com. By default, the cluster URLs are alb_url/my_cluster_name/portal/ for the portal and alb_url/my_cluster_name/dcv-instance-id for DCV sessions, where alb_url is the DNS name of the Application Load Balancer. This default can be replaced by a custom domain name: if you configure your DNS so that personal.domain.com points to the Application Load Balancer, and you want your cluster web access to use the personal.domain.com/my_cluster/portal/ and personal.domain.com/my_cluster/visualization URLs, then set CCME_DNS to personal.domain.com instead of NONE.
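As an illustration, assuming a hypothetical domain grid.example.com already pointing to the ALB:

  Args:
    - CCME_OIDC=default            # reads CCME/conf/default.oidc.yaml
    - CCME_DNS="grid.example.com"  # hypothetical custom domain
  # The portal then becomes reachable at grid.example.com/my_cluster_name/portal/
  # and visualization at grid.example.com/my_cluster_name/visualization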
User related parameters
CCME_USER_HOME: the user home can be mounted from a file system, by binding the CCME_USER_HOME variable to the mount point path. You can use the %u parameter to retrieve the username (e.g., /file-system/home/%u).
Remote visualization related parameters (see Prerequisites and configuration for a complete description of these parameters):
CCME_WIN_LAUNCH_TEMPLATE_ID: launch template used to launch Windows EC2 instances. CCME creates a default launch template when deploying the CMH, but you can set up your own here.
CCME_WIN_AMI: ID of the AMI used to launch Windows EC2 instances (see Prerequisites and configuration for prerequisites).
CCME_WIN_INSTANCE_TYPE: instance type used to launch Windows EC2 instances.
CCME_WIN_INACTIVE_SESSION_TIME, CCME_WIN_NO_SESSION_TIME and CCME_WIN_NO_BROKER_COMMUNICATION_TIME: parameters to control the lifecycle of Windows remote visualization sessions.
CCME_WIN_TAGS: dictionary of additional tags to apply to the instances of the Windows fleet (see Starting a Windows DCV session for the list of default tags).
EnginFrame related parameters
CCME_EF_ADMIN_GROUP: the name of an OS group; users belonging to this group are automatically promoted to administrators in EnginFrame (no sudo access).
CCME_EF_ADMIN_PASSWORD: ARN of a secret containing the password of the EnginFrame admin account. The expected value is the ARN of a preexisting plaintext string stored in AWS Secrets Manager (ASM). Do not set this variable to let CCME generate and store a password in /shared/CCME/ccme.passwords.efadmin.
CCME_EFADMIN_ID: the uid/gid used to create the efadmin user locally.
CCME_EFNOBODY_ID: the uid/gid used to create the efnobody user locally.
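A sketch of the corresponding Args entries (the group name, secret ARN and uid/gid values below are hypothetical):

  Args:
    - CCME_EF_ADMIN_GROUP="ef-admins"  # hypothetical OS group promoted to EnginFrame admins
    - CCME_EF_ADMIN_PASSWORD="arn:aws:secretsmanager:eu-west-1:123456789012:secret:efadmin-password-AbCdEf"
    - CCME_EFADMIN_ID=2001             # hypothetical uid/gid for efadmin
    - CCME_EFNOBODY_ID=2002            # hypothetical uid/gid for efnobody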
Remote access related parameters
CCME_AWS_SSM: if set to true, the AWS SSM agent will be installed on all the nodes, to allow remote connection to them with AWS SSM.
Slurm related parameters
CCME_CUSTOM_SLURMDBD_SETTINGS: dictionary of specific options to add to the SlurmDBD configuration. See the slurmdbd configuration for possible values. The format must be a valid "YAML dictionary embedded in a string". Hence, the whole line must be enclosed in double quotes, and the value of CCME_CUSTOM_SLURMDBD_SETTINGS must be the dict enclosed in escaped double quotes. See the following example: "CCME_CUSTOM_SLURMDBD_SETTINGS=\"{'PrivateData': 'jobs,events,accounts,reservations,usage,users', 'PurgeEventAfter': '12'}\""
Note
No parameters must be set in the following sections, as they are inherited from the
HeadNode.CustomActions.OnNodeStart.Args parameters:
HeadNode.CustomActions.OnNodeConfigured.Args
HeadNode.CustomActions.OnNodeUpdated.Args
Scheduling.SlurmQueues[].CustomActions.OnNodeStart.Args
Scheduling.SlurmQueues[].CustomActions.OnNodeConfigured.Args
Note
The CCME solution is downloaded from S3 onto the HeadNode.
The download directory is then mounted on /opt/CCME
on each Compute node from the HeadNode using NFS.
CCME applies its configurations and installs software on ParallelCluster clusters through a set of Bash and
Ansible scripts. The entry points are the Bash scripts specified in the OnNodeStart, OnNodeConfigured and
OnNodeUpdated parameters of the HeadNode.CustomActions and Scheduling.SlurmQueues[].CustomActions sections.
The values shown below (and included in the generated example configuration files) must always be present.
Region: '{{ AWS_REGION }}'
CustomS3Bucket: '{{ CCME_CLUSTER_S3BUCKET }}'
Iam:
  Roles:
    LambdaFunctionsRole: '{{ CCME_CLUSTER_LAMBDA_ROLE }}'
  # If the role associated to the cluster includes a custom IAM path prefix,
  # replace "parallelcluster" by the custom IAM path prefix.
  ResourcePrefix: "parallelcluster"
Image:
  Os: {{ "alinux2" or "centos7" }}
Tags:
  - Key: Owner
    Value: '{{ CCME_OWNER }}'
  - Key: Reason
    Value: '{{ CCME_REASON }}'
SharedStorage:
  - Name: shared
    StorageType: Ebs
    MountDir: shared
HeadNode:
  InstanceType: t3.medium
  Networking:
    SubnetId: '{{ CCME_SUBNET }}'
    SecurityGroups:
      - '{{ CCME_PRIVATE_SG }}'
  Ssh:
    KeyName: '{{ AWS_KEYNAME }}'
  LocalStorage:
    RootVolume:
      Size: 50
      Encrypted: true
  CustomActions:
    OnNodeStart:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
      Args:
        - CCME_CMH_NAME={{ CCME_CMH_NAME }}
        - CCME_S3FS={{ CCME_DATA_BUCKET }}
        - CCME_JSLOGS_BUCKET={{ CCME_DATA_BUCKET }}
        - CCME_EF_ADMIN_PASSWORD="arn:aws:secretsmanager:eu-west-1:012345678910:secret:ccme-prefix-efadmin.password-4riso"
        - CCME_OIDC=default
        - CCME_USER_HOME=/file-system/home/%u
        - CCME_DNS="my.domain.com"
        - CCME_EF_ADMIN_GROUP="Administrators"
        - CCME_REPOSITORY_PIP="https://my.pip.domain.com/index/,https://my.pip.domain.com/index-url/"
        # Optional windows fleet
        - CCME_WIN_AMI="ami-i..."
        - CCME_WIN_INSTANCE_TYPE=NONE
        - CCME_WIN_INACTIVE_SESSION_TIME=600
        - CCME_WIN_NO_SESSION_TIME=600
        - CCME_WIN_NO_BROKER_COMMUNICATION_TIME=600
        - CCME_WIN_CUSTOM_CONF_REBOOT=true
        - CCME_WIN_LAUNCH_TRIGGER_DELAY=10
        - CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS=100
        ## CCME_WIN_TAGS allows to add specific tags to instances of the Windows fleet
        # The format must be a valid "YAML dictionary embedded in a string".
        # Hence, the whole line must be enclosed in double quotes, and then the value
        # of CCME_WIN_TAGS must be the dict enclosed in escaped double quotes. See the following example:
        - "CCME_WIN_TAGS=\"{'MyTagKey1': 'MyTagValue1', 'MyTagKey2': 'MyTagValue2'}\""
        - CCME_AWS_SSM=true
        ## CCME_CUSTOM_SLURMDBD_SETTINGS allows to add specific options to SlurmDBD
        # See https://slurm.schedmd.com/slurmdbd.conf.html for possible values
        # The format must be a valid "YAML/JSON dictionary embedded in a string".
        # Hence, the whole line must be enclosed in double quotes, and then the value
        # of CCME_CUSTOM_SLURMDBD_SETTINGS must be the dict enclosed in escaped double quotes. See the following example:
        - "CCME_CUSTOM_SLURMDBD_SETTINGS=\"{'PrivateData': 'jobs,events,accounts,reservations,usage,users', 'PurgeEventAfter': '12'}\""
    OnNodeConfigured:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
    OnNodeUpdated:
      Script: s3://{{ CCME_SOURCES }}CCME/sbin/update-install.sh
  Iam:
    S3Access:
      - BucketName: '{{ CCME_BUCKET }}'
      - EnableWriteAccess: true
        BucketName: '{{ CCME_DATA_BUCKET }}'
    AdditionalIamPolicies:
      - Policy: '{{ CCME_CLUSTER_POLICY_ALB }}'
      - Policy: '{{ CCME_CLUSTER_POLICY_DCV }}'
      - Policy: '{{ CCME_CLUSTER_POLICY_EF }}'
      - Policy: '{{ CCME_CLUSTER_POLICY_JOB_COSTS }}'
      - Policy: '{{ CCME_CLUSTER_POLICY_SNS }}'
      - Policy: 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      # If the role associated to the cluster is not authorized to use Route 53,
      # set "DisableManagedDns" to true.
      DisableManagedDns: False
  SlurmQueues:
    - Name: basic-slurm
      CapacityType: ONDEMAND
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 50
            Encrypted: true
      ComputeResources:
        - Name: t2-small
          InstanceType: t2.small
          MinCount: 0
          MaxCount: 2
      CustomActions:
        OnNodeStart:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
        OnNodeConfigured:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
      Iam:
        S3Access:
          - BucketName: '{{ CCME_BUCKET }}'
          - EnableWriteAccess: true
            BucketName: '{{ CCME_DATA_BUCKET }}'
        AdditionalIamPolicies:
          - Policy: '{{ CCME_CLUSTER_POLICY_ALB }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_DCV }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_JOB_COSTS }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_SNS }}'
          - Policy: 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
      Networking:
        SubnetIds:
          - '{{ CCME_SUBNET }}'
        SecurityGroups:
          - '{{ CCME_COMPUTE_SG }}'
    - Name: dcv-basic
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: t3-medium
          InstanceType: t3.medium
          MinCount: 0
          MaxCount: 2
      CustomActions:
        OnNodeStart:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/pre-install.sh
        OnNodeConfigured:
          Script: s3://{{ CCME_SOURCES }}CCME/sbin/post-install.sh
      Iam:
        S3Access:
          - BucketName: '{{ CCME_BUCKET }}'
          - EnableWriteAccess: true
            BucketName: '{{ CCME_DATA_BUCKET }}'
        AdditionalIamPolicies:
          - Policy: '{{ CCME_CLUSTER_POLICY_ALB }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_DCV }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_JOB_COSTS }}'
          - Policy: '{{ CCME_CLUSTER_POLICY_SNS }}'
          - Policy: 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
      Networking:
        SubnetIds:
          - '{{ CCME_SUBNET }}'
        SecurityGroups:
          - '{{ CCME_COMPUTE_SG }}'
DirectoryService:
  DomainName: {{ CCME_AD_DIR_NAME }}
  DomainAddr: ldap://{{ CCME_AD_IP_1 }},ldap://{{ CCME_AD_IP_2 }}
  PasswordSecretArn: {{ CCME_AD_PASSWORD }}
  DomainReadOnlyUser: cn={{ CCME_AD_READ_ONLY_USER }},ou=Users,ou={{ CCME_AD_DIR_NAME_DC1 }},dc={{ CCME_AD_DIR_NAME_DC1 }},dc={{ CCME_AD_DIR_NAME_DC2 }}
  LdapTlsReqCert: never # Set "hard" to enable ldaps
  # LdapTlsCaCert: /opt/CCME/conf/{{ CCME_CA_FILE }} # Set it only with ldaps
  # LdapAccessFilter:
  AdditionalSssdConfigs:
    # debug_level: "0x1ff" # Uncomment for logs, can be heavy
    ldap_auth_disable_tls_never_use_in_production: True # Don't set it with ldaps
Note
If in the configuration of your cluster you want to use an external resource, such as an EFS
or FSx for NetApp file system that hasn't been deployed by CCME, you will need to ensure that
the targeted resource has at least one tag whose name starts with ccme. For security reasons,
CCME roles only allow CCME to describe and use services that have such ccme* tags.
We recommend using an explicit tag name such as ccme:allow (the value is not important, but
for readability use a value such as true).
Without such a tag, you will get an error message when trying to launch a cluster. For example,
when trying to connect an FSx for NetApp ONTAP file system without a ccme* tag, you can get an error like:
"message": "Invalid cluster configuration: User:
arn:aws:sts::123456789012:assumed-role/CRS-myrole-ParallelClusterUserRole-10T744D833QZH/i-0423d0720df91381b
is not authorized to perform: fsx:DescribeVolumes on resource: arn:aws:fsx:eu-west-3:123456789012:volume/*/*
because no identity-based policy allows the fsx:DescribeVolumes action"
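For example, a cluster configuration could attach a preexisting EFS file system as follows. This is a sketch: the file system ID is hypothetical, and it is assumed to already carry a tag such as ccme:allow=true:

  SharedStorage:
    - Name: projects
      StorageType: Efs
      MountDir: projects
      EfsSettings:
        # Hypothetical existing EFS file system, tagged ccme:allow=true
        FileSystemId: fs-0123456789abcdef0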
Custom Scripts
On top of CCME specific configurations, you can integrate your own custom scripts into CCME.
To deploy a cluster embedding and executing your own custom scripts, you must place them in
the CCME/custom directory and synchronize this directory to the S3 bucket.
You can provide your own Ansible playbooks or Bash scripts to add specific configurations to
the HeadNode, the Compute Nodes, or all nodes.
Ansible playbooks and Bash scripts are executed in this order:
install-*-all.yaml: run Ansible playbook on all nodes (Head and Compute nodes)
install-*-head.yaml: run Ansible playbook on the Head Node only
install-*-compute.yaml: run Ansible playbook on the Compute Nodes only
install-*-all.sh: run Bash script on all nodes (Head and Compute nodes)
install-*-head.sh: run Bash script on the Head Node only
install-*-compute.sh: run Bash script on the Compute Nodes only
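For example, a hypothetical CCME/custom/install-monitoring-all.yaml playbook, executed on all nodes, could look like the sketch below (the host pattern and package choice are illustrative assumptions, not values prescribed by CCME):

  # CCME/custom/install-monitoring-all.yaml (hypothetical custom playbook)
  - hosts: all
    become: true
    tasks:
      - name: Install htop on every node
        ansible.builtin.package:
          name: htop
          state: present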
To update the CCME solution bucket from the Management Host, use the updateCCME.sh
command.
$ ../../management/sbin/scripts/updateCCME.sh
updateCCME <S3BUCKET> <OPTIONAL: CCME.CONF> <OPTIONAL: AWS CREDENTIAL PROFILE>
- S3BUCKET: name of the S3 bucket to which the CCME solution is uploaded
- CCME.CONF: path to a ccme.conf file to replace the bucket
Note
Using the updateCCME.sh command on a Management Host does not require specifying a ccme.conf file. It will take the correct CCME configuration file from /opt/ccme/CCME/conf/ccme.conf.
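For example, to push the CCME sources to a hypothetical bucket named my-ccme-bucket from the Management Host (relying on the default ccme.conf mentioned in the note above):

  $ ../../management/sbin/scripts/updateCCME.sh my-ccme-bucket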
Management
Note
The AWS region is a required parameter. It is taken either from the --region option of the CLI or from the Region option in the ParallelCluster configuration file used with the command.
If the AWS region is specified both on the command line and in the cluster configuration file, the region from the CLI takes priority.
Create cluster
To create a cluster, use the following command:
pcluster create-cluster --cluster-name "${cluster_name}" --cluster-configuration ~/.parallelcluster/"${configuration_file}" --region "${aws_region}"
Note
If you are creating your first clusters (or a first cluster in a new environment),
it is strongly recommended to create the cluster in debug mode, by setting the rollback-on-failure pcluster parameter to false with --rollback-on-failure false, as shown in the command below.
pcluster create-cluster --rollback-on-failure false --cluster-name "${cluster_name}" --cluster-configuration ~/.parallelcluster/"${configuration_file}" --region "${aws_region}"
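Once the creation has started, you can for example monitor its progress with the pcluster CLI:
pcluster describe-cluster --cluster-name "${cluster_name}" --region "${aws_region}"
The cluster is ready when the reported clusterStatus reaches CREATE_COMPLETE.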
Delete cluster
pcluster delete-cluster --cluster-name "${cluster_name}" --region "${aws_region}"
List clusters
pcluster list-clusters --region "${aws_region}"
Connect to the clusters
There are two ways to connect to a deployed cluster:
Use the administrator account (sudoer): centos or ec2-user, depending on the selected OS. The SSH key associated with this user is the one selected in the cluster configuration file at deployment.
Use any user from the Active Directory authorized to connect to the cluster:
with the tuple username + password
with the username + SSH key (after a first username + password authentication); the SSH key is available in the user's home directory ~/.ssh/ after a first username + password authentication
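For example, you can open an SSH session on the HeadNode through the pcluster CLI; this is a minimal sketch, where the key path is a placeholder for the key selected at deployment:
pcluster ssh --cluster-name "${cluster_name}" --region "${aws_region}" -i ~/.ssh/"${key_name}".pem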