Releases
Release 5.3.2 - January 19, 2024
BUG FIXES
CCME Roles Stack (CRS)
Added parameter CCMEKmsAdditionalKey to specify an additional KMS key that will be accessible from the CMH, HeadNode, and Compute nodes (e.g., a KMS key used to encrypt AMIs).
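A minimal illustration, assuming the parameter is supplied alongside the other CRS stack parameters; the key ARN is a placeholder:
# Hypothetical CRS parameter excerpt
CCMEKmsAdditionalKey: arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID  # e.g., the key used to encrypt AMIs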
Release 5.3.1 - December 14, 2023
BUG FIXES
Updated Slurm to version 23.02.7 to fix CVE-2023-49933 through CVE-2023-49938:
Slurmd Message Integrity Bypass (CVE-2023-49935). Permits an attacker to reuse root-level authentication tokens when interacting with the slurmd process, bypassing the RPC message hashes that protect against malicious MUNGE credential reuse.
Slurm Arbitrary File Overwrite (CVE-2023-49938). Permits an attacker to modify their extended group list used with the sbcast subsystem, and open files with an incorrect set of extended groups.
Slurm NULL Pointer Dereference (CVE-2023-49936). Denial of service.
Slurm Protocol Double Free (CVE-2023-49937). Denial of service, potential for arbitrary code execution.
Slurm Protocol Message Extension (CVE-2023-49933). Allows for malicious modification of RPC traffic that bypasses the message hash checks.
Release 5.3.0 - December 13, 2023
NEW
CCME Management Host (CMH)
The parameter MgtHostKeyName is now optional
Clusters
Added options CCME_EFADMIN_ID and CCME_EFNOBODY_ID to the clusters.
Added options CCME_WIN_LAUNCH_TRIGGER_DELAY and CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS to configure the Windows DCV session startup monitoring processes (see the sketch after this list).
Added an example on how to delay DCV and DCVSM services startup on Windows DCV instances to account for custom bootstrap processes.
The EnginFrame logo can now be customized by adding a logo.png file in the CCME/custom directory.
Configure /etc/idmap.conf to specify the domain name (when joined to an Active Directory) to allow NFSv4 mounts.
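A minimal sketch of the two monitoring options, assuming they are set as key/value entries in the cluster configuration; the values, and the exact meaning of each option, are assumptions:
# Hypothetical cluster configuration excerpt
CCME_WIN_LAUNCH_TRIGGER_DELAY: 30        # assumed: seconds between two monitoring rounds
CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS: 20   # assumed: give up after this many rounds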
ENHANCEMENTS
Updated documentation on troubleshooting FSx for NetApp ONTAP.
BUG FIXES
Clusters
Fixed usage of CCME_WIN_CUSTOM_CONF_REBOOT.
Fixed Windows DCV sessions when CCME was stored in a subfolder of a bucket.
Fixed the dcv2 feature being defined twice on dcv* partitions.
Fixed the ALB Lambda handling of concurrent instance deployments.
Fixed high number of RPC calls from EnginFrame to Slurm, which caused Slurm to sometimes crash.
Release 5.2.0 - November 08, 2023
NEW
CCME Management Host (CMH)
Added option application_load_balancer_ingress_cidr to deployment.ccme.conf to specify a CIDR that is allowed to connect to the ALB.
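An illustrative excerpt, assuming the file uses simple key/value entries; the CIDR value is a placeholder:
# deployment.ccme.conf (illustrative)
application_load_balancer_ingress_cidr: 10.0.0.0/16  # only this range may reach the ALB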
Other
Added documentation on how to use FSx for NetApp ONTAP to share files between Linux and Windows instances.
ENHANCEMENTS
Clusters
Updated AWS ParallelCluster to version 3.6.1
Retrieve OnNodeUpdated arguments when executing the pre-install.sh script (OnNodeStart) in order to allow parameter overriding when updating a cluster.
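An illustrative AWS ParallelCluster configuration sketch of the idea; the bucket paths and the argument are placeholders. Arguments declared under OnNodeUpdated can override, at cluster update time, the values pre-install.sh originally received through OnNodeStart:
HeadNode:
  CustomActions:
    OnNodeStart:
      Script: s3://my-ccme-bucket/pre-install.sh      # placeholder path
      Args:
        - CCME_EXAMPLE_VAR=initial                    # hypothetical parameter
    OnNodeUpdated:
      Script: s3://my-ccme-bucket/update-install.sh   # placeholder path
      Args:
        - CCME_EXAMPLE_VAR=overridden                 # value picked up on cluster update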
BUG FIXES
Fixed INTERACTIVE_CLASS name in DCV services: it must contain only alphanumeric characters or -_
Fixed health check of the target group associated with the EnginFrame rules
Fixed the cluster-update command: the update-install.sh script ended in error
Removed debug level on SSSD on the Management Host
Note
This version updates the following dependencies for the management host. If you are on a private network without Internet connectivity, you must download the following packages and put them in management/pkgs:
{{ ansible_architecture }}/pcluster-installer-bundle-3.6.1.209-node-v16.19.0-Linux_{{ ansible_architecture }}-signed.zip
Release 5.1.0 - October 20, 2023
NEW
CCME Roles Stack (CRS)
Allow the Management Host Stack to perform the action elasticloadbalancing:ModifyLoadBalancerAttributes on the ALB created by CCME
CCME Management Host (CMH)
CMH can now use either RHEL8 or Amazon Linux 2023. Amazon Linux 2 has been removed.
Clusters
DCV sessions are now limited by default to 1 session per session type (partition) for each user. This applies to DCV sessions running on the Headnode, any dcv* partition, and Windows instances. This configuration can be updated in the associated EnginFrame services.
New configuration variables (illustrated below):
CCME_CUSTOM_SLURMDBD_SETTINGS to specify additional parameters for SlurmDBD (e.g., security variables such as PrivateData).
CCME_WIN_TAGS to specify additional tags to be set on the instances of the Windows fleet.
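An illustrative excerpt, assuming both variables are set as key/value entries in the cluster configuration; the values and the tag-list syntax are assumptions:
CCME_CUSTOM_SLURMDBD_SETTINGS: "PrivateData=jobs,usage"  # extra parameters appended to slurmdbd.conf
CCME_WIN_TAGS: "team=viz,costcenter=hpc"                 # hypothetical tag list for Windows fleet instances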
Other
Ansible, boto3, AWS CLI have been updated, and Python 3.9 is used on both CMH and clusters for executing Ansible playbooks. This reduces the deployment time by ~100-200s per node.
The Application Load Balancer can now store its logs in an S3 bucket named with ccme-alb-logs-${StackId} as prefix
ENHANCEMENTS
Cluster
Removed built-in services in EnginFrame that couldn't be used with a CCME cluster.
Enable update of Slurm when the version deployed by AWS ParallelCluster is different from the one in deployment.yaml
The bucket ccme-s3bucket created for clusters is renamed ccme-cluster-${StackId}
Updated DCV to version 2023.15487
BUG FIXES
Fixed cluster deployment on aarch64 architecture
Fixed Slurm CVE-2023-41914 by updating Slurm to version 23.02.6.
Warning
Fixing Slurm CVE-2023-41914 by updating Slurm to version 23.02.6 requires re-compiling Slurm on the Headnode at deployment time. You need to ensure that the deployment timeout of your Clusters is high enough to allow this. Update the AWS ParallelCluster configuration files accordingly, for example:
DevSettings:
  Timeouts:
    HeadNodeBootstrapTimeout: 2400
    ComputeNodeBootstrapTimeout: 1800
Release 5.0.0 - October 02, 2023
Updated EULA to version 2.3.
NEW
CCME Roles Stack (CRS)
Allow the HeadNode and ComputeNodes to perform the following actions on all ALBs:
elasticloadbalancing:DescribeLoadBalancerAttributes
elasticloadbalancing:DescribeListeners
elasticloadbalancing:DescribeRules
elasticloadbalancing:DescribeTags
Allow the Management Host Stack to perform the following actions:
elasticloadbalancing:AddTags on the ALB created by CCME
ec2:CreateTags on all network-interface resources
cloudformation:CreateChangeSet
CCME Management Host (CMH)
Support RedHat Enterprise Linux 8 (RHEL8) for the CCME Management Host
Add optional proxy, no_proxy and pip repository as variables for the CCME Management Host (CMH) and the clusters
Add optional custom AMI parameter
Add optional security-group parameter
Added the possibility to generate custom ParallelCluster configuration files from Jinja2 templates
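A minimal sketch of such a Jinja2 template; the file name and the template variables are hypothetical:
# cluster-config.yaml.j2 (illustrative)
HeadNode:
  InstanceType: {{ head_node_instance_type }}   # hypothetical template variable
  Networking:
    SubnetId: {{ private_subnet_id }}           # hypothetical template variable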
Clusters
Support RedHat Enterprise Linux 8 (RHEL8) for Headnodes and Compute nodes
CCME Ansible playbooks have been refactored in an Ansible role
New visualization Windows fleet, including:
A Windows fleet launch template deployed by the CCME Management Host (CMH)
New CCME_WIN_* variables available for CCME clusters to configure the Windows fleet (AMI, instance type, configuration files…)
EnginFrame
Dynamically generate an EnginFrame service for remote visualization for:
the headnode
each dcv* queue in Slurm
the Windows fleet
Renamed the DCVSM cluster name in EnginFrame from headnode to dcvsm, and made its hosts visible in the EnginFrame Hosts service
Add the possibility to automatically register all users belonging to the CCME_EF_ADMIN_GROUP OS group as administrators of EnginFrame
Add CCME_EF_ADMIN_PASSWORD as an AWS Secret ARN parameter to store the EnginFrame admin (efadmin) password for clusters
Add the possibility to encrypt all storages at deployment using multiple KMS keys, with variables prefixed with ccme_kms_ (see the sketch after this list)
Add a custom playbook named example.install-fix-nvidiacve-all.yaml to fix the Nvidia drivers CVE. The Nvidia driver version is defined in dependencies.yaml through the parameter nvidia_version (set to 515.48.07)
Verify the presence of the Nvidia drivers in the CCME sources for CCME clusters; download and install them if not present
Add default encryption for cluster root volumes in the cluster configuration files
CCME logs are now sent to CloudWatch
For the CMH, a new log group named ccme-cmh-<stackID> is created. The following logs are available:
CCME logs: /var/log/ccme.*.log
Cloud-init and cfn-init logs: /var/log/cloud-init.log, cloud-init-output.log, cfn-init.log
System logs: /var/log/messages and syslog
SSSD logs: /var/log/sssd/sssd.log and sssd_default.log
For each cluster, the logs are sent to the same subgroup as the log group of the cluster (see Amazon CloudWatch Logs cluster logs). The following logs are now available for each cluster:
On Head and Compute nodes:
All CCME pre/post/update Bash and Ansible script logs, including custom scripts
DCV logs (for Compute nodes belonging to a dcv* Slurm partition)
On the Headnode:
EnginFrame logs
DCVSM broker and agent logs
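As an illustration of the ccme_kms_ variables mentioned in the list above, a hypothetical deployment excerpt; the variable suffixes and the key ARNs are assumptions:
# Hypothetical: one KMS key per storage type
ccme_kms_ebs: arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-EBS-KEY
ccme_kms_fsx: arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-FSX-KEY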
ENHANCEMENTS
Documentation
Add an Active Directory users troubleshooting section to the CCME documentation
The documentation requirements relating to the AWS network environment have been updated:
Information relating to the subnet requirements is more explicit
Add specifications for Internet Gateway and Network Gateway depending on the multiple-network cases
The help function of the deployCCME.sh script is more verbose
CCME Management Host (CMH)
The resources created by the CMH stack base their name on the CMH stack id instead of the CMH stack name
Upgrading AWS ParallelCluster to version 3.6.0
The parameters ccme_bucket and ccme_bucket_subfolder are merged into a new parameter ccme_bucket_path
Clusters
Gnome initial setup is disabled on HeadNode and DCV nodes
The ALB URL can be replaced by a custom DNS name using the new CCME_DNS variable. This can be used to redirect both EnginFrame and DCV URLs through CCME_DNS (see the sketch after this list).
Improve the robustness and idempotency of the Ansible tasks
Upgrading EnginFrame to version 2021.0-r1667
Using native Nvidia ParallelCluster drivers (version 470.141.03) to reduce cluster deployment time by approximately 2 minutes
The name of the CCME configuration file available in /opt/CCME/conf/ is now based on its CMH name (e.g., CMH-Myfirstcmh.ccme.conf)
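A minimal illustration of the CCME_DNS variable mentioned above, assuming a key/value entry in the cluster configuration; the domain name is a placeholder:
CCME_DNS: hpc.example.com  # custom DNS name fronting the ALB; EnginFrame and DCV URLs are redirected through it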
Security
Improve security by adding restrictions on CloudFormation usage based on stack tags
Improve CCME security by using IMDSv2 tokens to retrieve EC2 metadata
Other
CRS CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation
CMH CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation
CMH stack tags are now propagated to:
CMH EBS
CMH ActiveDirectory (if created by the CMH stack)
BUG FIXES
CCME Roles Stack (CRS)
Fixed iam:PassRole with the parameter CustomIamPathPrefix in the CCME Roles Stack (CRS)
Fixed missing optional AWS Route53 policies in the CCME Roles Stack
Fixed ec2:RunInstances authorization for compute deployment in a placement group
Fixed tags associated with the CCME Roles Stack (CRS) and CCME Management Host (CMH) deployed with the deployCCME.sh script
Fixed the CCME Management Host (CMH) ALB policy, allowing the Application Load Balancer (ALB) certificate to be updated with elasticloadbalancing:ModifyListener
CCME Management Host (CMH)
Fixed the Name tag for the CCME Management Host (CMH)
Fixed the inability to attach multiple EBS volumes to the CMH
Fixed optional SNS notification at cluster deployment:
The CCME Management Host (CMH) parameter CCMESnsALB becomes CCMEAdminSnsTopic
The default value is now NONE instead of *
On the Management Host, /var/log/ccme.ccme-start.log now correctly displays the logs on individual lines
Fixed FSx policies (fsx:Describe*) for deployment with FSxOnTap storage
Clusters
Fixed JWT headers decoding when using OIDC authentication
Fixed the management_stack_role variable description in the deployment configuration file of CCME
Fixed dependencies downloading when external repositories take time to respond
Fixed URL redirection of the EnginFrame logo
Fixed install of the latest S3FS-Fuse version:
x86_64 (1.9*)
aarch64 (== 1.93-1)
Fixed S3FS mount point with the aarch64 architecture and IMDSv2
Fixed configuration file for ARM clusters
Fixed AWS SSM installation when a previous version is already installed
Fixed EFA usage on computes with the compute security group deployed with the CCME Management Host (CMH) stack
Fixed classic cluster configuration file
Fixed /opt/CCME NFS export: compute nodes can now be in different networks and AZs than the headnode
Fixed mode for the CCME env file
Fixed MariaDB installation with aarch64 for the ALinux 2 OS with the following packages:
MariaDB-Server: 10.6.9-1
Galera: 4-26.4.12-1
Fixed CCME ALB Lambda policy, explicitly allowing the Lambda to perform:
logs:CreateLogStream
logs:PutLogEvents
elasticloadbalancing:AddTags
Fixed compute egress security group
Fixed versions for pip packages
Fixed retrieval of pricing data through the Slurm epilog script: use IMDSv2 to retrieve metadata
Fixed host status in EnginFrame when hosts are in the IDLE+CLOUD state in Slurm
Release 4.2.0 - May 17, 2023
NEW
CCME is now supported in the AWS region Stockholm (eu-north-1)
AWS IAM Roles support for CCME management, Lambdas, and clusters
Automated deployment of the CCME Roles Stack (CRS) with deployCCME.sh
ENHANCEMENTS
CCME dependency packages are no longer required
Upgrading DCV to 2023.0 (15065)
Upgrading DCVSM to 2023.0:
Broker: 392-1
Agent: 675-1
BUG FIXES
Add fix to anticipate the resolution of a colord profile dialog box issue in virtual DCV sessions
Add fix to remove the screen lock in Gnome screensaver settings
Add fix to force the minimize and maximize buttons to appear in the top right corner of windows in Gnome-based DCV sessions
Release 4.1.0 - April 21, 2023
NEW
Amazon SSM is now installed on the CCME Management Host (CMH)
ENHANCEMENTS
Separation of public and private subnets for security improvement:
The Application Load Balancer component is created in public subnets, separated from other components
The Active Directory and Management Host components and the clusters are now in private subnets
Upgrading PCluster to 3.5.0
Upgrading Ansible to 4.10.0
Upgrading Pip to 23.0.1
Support usage of IMDSv2 for CCME clusters
BUG FIXES
xorg.conf is now configured correctly for DCV on instances equipped with GPUs, with the HardDPMS option set to false (and the UseDisplayDevice option removed)
Amazon SSM usage on clusters, including HeadNodes and ComputeNodes
Cluster time-outs, including separate variables for the HeadNode and ComputeNodes, are now configurable
Cluster update now works correctly
The first visualization session / first job starting a dynamic node is now executed correctly after the node configuration completes
Fixed ALB rules creation for DCV nodes when many nodes are deployed at the same time
EnginFrame services are no longer reset when the cluster is updated
Release 4.0.0 - January 24, 2023
NEW
Multiple S3 Buckets can now be mounted through S3FS
Clusters can be deployed in VPCs different from the CCME Management Host's VPC
Support pre-existing AWS Application Load Balancer (ALB)
Support pre-existing Active Directory internal/external to AWS using LDAP
Support a list of key:value tags for the CCME Management Host and Clusters
Integration of custom script execution
Support optional authentication with OIDC to the EnginFrame portal
Support mounted file systems as the mount point for user homes by setting the fallback_homedir option in SSSD
Support timezone configuration for CMH and cluster instances
ENHANCEMENTS
Enforce TLS requirements in CCME S3 policies
Upgrading PCluster to 3.2.0
Upgrading Slurm to 21.08.8-2
Upgrading DCV to 2022.1 (13067)
Upgrading DCVSM to 2022.1:
Broker: 355-1
Agent: 592-1
Upgrading EnginFrame to 2021.0-r1646
Upgrading Nvidia Drivers to 515.48.07
Deploy DCV on compute nodes depending on the presence of dcv in partition name(s)
No configuration action is required to start a first pre-configured cluster
The cluster policies are now generated by the ManagementHost
The cluster component named master has been renamed to headnode
Possibility to specify the authorized CIDR for the frontend ALB
Automated creation of a private S3 bucket to use as the AWS ParallelCluster CustomS3Bucket configuration (see the sketch below)
Management Host public IP configuration can be set to NONE
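An illustrative AWS ParallelCluster configuration excerpt showing where such a bucket is referenced; the bucket name is a placeholder:
# Top-level ParallelCluster setting pointing at the CCME-created private bucket
CustomS3Bucket: ccme-cluster-example-bucket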
DEPRECATED
Removing CCME Command Line Interface (CCME-CLI) support
Removing Ganglia support
Release 3.0.0 - March 23, 2021
NEW
Adding a common secured load-balanced HTTPS entry point for:
EnginFrame portal
Ganglia
DCV sessions
Adding dedicated stack to deploy Management Host
Adding user centralized authentication through directory services (AD)
Secure access to cluster through selected groups in the AD
Secure access to Management Host through selected groups in the AD
Adding a CCME Command Line Interface for Management Host
Start, Stop, Update, Delete a cluster
Possibility to set a time-to-live to a cluster
Updating HeadNode so that it uses DCVSessionManager as its session viewer
Adding documentation
ENHANCEMENTS
Upgrading to AWS ParallelCluster 2.10.1
Updating Slurm to 20.02.4
Upgrading DCV to 2020.2 (9508)
Upgrading EnginFrame to 2020.0-r58
Adding option to specify on which partition(s) DCV should be deployed
DEPRECATED
Removing BeeGFS support
BUG FIXES
Fixing S3 Bucket policies