Releases

Release 5.3.5 - January 25, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing some bugs during CMH startup on RHEL8

Release 5.3.4 - January 24, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing some bugs during CMH startup on RHEL8

Clusters

  • Fixing usage of CCME_WIN_INACTIVE_SESSION_TIME, CCME_WIN_NO_SESSION_TIME and CCME_WIN_NO_BROKER_COMMUNICATION_TIME for Windows DCV sessions: a value of 0 did not disable the timers.
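
As a hypothetical illustration of these variables (the values, units, and file syntax are assumptions, not taken from the CCME documentation):

```
# Disable the no-session timer entirely (a value of 0 now works as documented)
CCME_WIN_NO_SESSION_TIME=0
# Illustrative non-zero values for the other two timers
CCME_WIN_INACTIVE_SESSION_TIME=3600
CCME_WIN_NO_BROKER_COMMUNICATION_TIME=600
```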

Release 5.3.3 - January 23, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing detection of Python 3.9

Release 5.3.2 - January 19, 2024

BUG FIXES

CCME Roles Stack (CRS)

  • Added parameter CCMEKmsAdditionalKey to specify an additional KMS key that will be accessible from the CMH, HeadNode and Compute nodes. (e.g., KMS key used to encrypt AMIs)
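
As an illustration, the new parameter could be supplied like any other CloudFormation stack parameter (the key ARN below is a placeholder, not a real key):

```
ParameterKey=CCMEKmsAdditionalKey,ParameterValue=arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
```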

Release 5.3.1 - December 14, 2023

BUG FIXES

  • Updated Slurm to version 23.02.7 to fix CVE-2023-49933 through CVE-2023-49938:

    • Slurmd Message Integrity Bypass. CVE-2023-49935. Permits an attacker to reuse root-level authentication tokens when interacting with the slurmd process, bypassing the RPC message hashes which protect against malicious MUNGE credential reuse.

    • Slurm Arbitrary File Overwrite. CVE-2023-49938. Permits an attacker to modify their extended group list used with the sbcast subsystem, and open files with an incorrect set of extended groups.

    • Slurm NULL Pointer Dereference. CVE-2023-49936. Denial of service.

    • Slurm Protocol Double Free. CVE-2023-49937. Denial of service, potential for arbitrary code execution.

    • Slurm Protocol Message Extension. CVE-2023-49933. Allows for malicious modification of RPC traffic that bypasses the message hash checks.

Release 5.3.0 - December 13, 2023

NEW

CCME Management Host (CMH)

  • The parameter MgtHostKeyName is now optional

Clusters

  • Added options CCME_EFADMIN_ID and CCME_EFNOBODY_ID to the clusters.

  • Added options CCME_WIN_LAUNCH_TRIGGER_DELAY and CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS to configure Windows DCV sessions startup monitoring processes.

  • Added an example on how to delay DCV and DCVSM services startup on Windows DCV instances to account for custom bootstrap processes.

  • EnginFrame logo can now be customized by adding a logo.png file in CCME/custom directory.

  • Configure /etc/idmapd.conf to specify the domain name (when joined to an Active Directory) to allow NFSv4 mounts.
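
For reference, on most Linux distributions the NFSv4 ID-mapping domain is set in the [General] section of /etc/idmapd.conf (the domain name below is a placeholder):

```
[General]
Domain = example.corp
```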

ENHANCEMENTS

  • Updated documentation on troubleshooting FSx for NetApp ONTAP.

BUG FIXES

Clusters

  • Fixed usage of CCME_WIN_CUSTOM_CONF_REBOOT.

  • Fixed Windows DCV sessions when CCME was stored in a subfolder in a bucket.

  • Fixed the dcv2 feature being defined twice on dcv* partitions.

  • Fixed the ALB Lambda handling of concurrent instance deployments.

  • Fixed high number of RPC calls from EnginFrame to Slurm, which caused Slurm to sometimes crash.

Release 5.2.0 - November 08, 2023

NEW

CCME Management Host (CMH)

  • Added option application_load_balancer_ingress_cidr to deployment.ccme.conf to specify a CIDR that is allowed to connect to the ALB.
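
A hypothetical deployment.ccme.conf fragment (the exact file syntax and the CIDR value are assumptions for illustration):

```
# Restrict ALB ingress to the corporate network
application_load_balancer_ingress_cidr: 10.0.0.0/16
```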

Other

  • Added documentation on how to use FSx for NetApp ONTAP to share files between Linux and Windows instances.

ENHANCEMENTS

Clusters

  • Updated AWS ParallelCluster to version 3.6.1

  • Retrieve OnNodeUpdated arguments when executing pre-install.sh script (OnNodeStart) in order to allow parameter overriding when updating a cluster.

BUG FIXES

  • Fixed INTERACTIVE_CLASS name in DCV services: it must contain only alphanumeric characters, hyphens, or underscores

  • Fixed health check of target group associated with EnginFrame rules

  • Fixed cluster-update command: the update-install.sh script ended in error

  • Removed debug level on SSSD on the Management Host
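
A minimal sketch of the INTERACTIVE_CLASS naming rule above (the allowed characters come from this entry; the check itself is illustrative, not part of CCME):

```shell
# Illustrative check: a class name may contain only alphanumerics, '-' and '_'
name="interactive-class_1"
if [[ "$name" =~ ^[A-Za-z0-9_-]+$ ]]; then
  echo "valid: $name"
else
  echo "invalid: $name"
fi
```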

Note

This version updates the following dependencies for the management host. If you are on a private network without Internet connectivity, you must download the following packages and put them in management/pkgs:

  • {{ ansible_architecture }}/pcluster-installer-bundle-3.6.1.209-node-v16.19.0-Linux_{{ ansible_architecture }}-signed.zip

Release 5.1.0 - October 20, 2023

NEW

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform action elasticloadbalancing:ModifyLoadBalancerAttributes on the ALB created by CCME

CCME Management Host (CMH)

  • CMH can now use either RHEL8 or Amazon Linux 2023. Amazon Linux 2 has been removed.

Clusters

  • DCV sessions are now limited by default to 1 session per session type (partition) for each user. This is set for DCV sessions running on the Headnode, any dcv* partition, and Windows instances. This configuration can be updated in the associated EnginFrame services.

  • New configuration variables:

    • CCME_CUSTOM_SLURMDBD_SETTINGS to specify additional parameters for SlurmDBD (e.g., security variables such as PrivateData).

    • CCME_WIN_TAGS to specify additional tags to be set on the instances of the Windows fleet.
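
As a sketch of the SlurmDBD example given above, PrivateData is a standard slurmdbd.conf option; the CCME pass-through syntax shown here is an assumption:

```
# Hide accounting data from non-privileged users
CCME_CUSTOM_SLURMDBD_SETTINGS="PrivateData=accounts,jobs,usage,users"
```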

Other

  • Ansible, boto3, AWS CLI have been updated, and Python 3.9 is used on both CMH and clusters for executing Ansible playbooks. This reduces the deployment time by ~100-200s per node.

  • Application Load Balancer can now store its logs in an S3 bucket named with ccme-alb-logs-${StackId} as prefix

ENHANCEMENTS

Clusters

  • Removed builtin services in EnginFrame that couldn’t be used with a CCME cluster.

  • Enable update of Slurm when the version deployed by AWS ParallelCluster is different from the one in deployment.yaml

  • The bucket ccme-s3bucket created for clusters is renamed ccme-cluster-${StackId}

  • Updated DCV to version 2023.15487

BUG FIXES

  • Fixed cluster deployment on aarch64 architecture

  • Fixed Slurm CVE-2023-41914 by updating Slurm to version 23.02.6.

Warning

Fixing Slurm CVE-2023-41914 by updating Slurm to version 23.02.6 requires recompiling Slurm on the HeadNode at deployment time. Ensure that the deployment timeouts of your clusters are high enough to allow this, and update the AWS ParallelCluster configuration files accordingly, for example:

DevSettings:
  Timeouts:
    HeadNodeBootstrapTimeout: 2400
    ComputeNodeBootstrapTimeout: 1800

Release 5.0.0 - October 02, 2023

Updated EULA to version 2.3.

NEW

CCME Roles Stack (CRS)

  • Allow the HeadNode and ComputeNodes to perform the following actions on all ALBs

    • elasticloadbalancing:DescribeLoadBalancerAttributes

    • elasticloadbalancing:DescribeListeners

    • elasticloadbalancing:DescribeRules

    • elasticloadbalancing:DescribeTags

  • Allow the Management Host Stack to perform the following actions

    • elasticloadbalancing:AddTags on the ALB created by CCME

    • ec2:CreateTags on all network-interface resources

    • cloudformation:CreateChangeSet

CCME Management Host (CMH)

  • Support RedHat Enterprise Linux 8 (RHEL8) for the CCME Management Host

  • Add optional proxy, no_proxy and pip repository as variables for the CCME Management Host (CMH) and the clusters

  • Add optional custom AMI parameter

  • Add optional security-group parameter

  • Added the possibility to generate custom ParallelCluster configuration files from Jinja2 templates

Clusters

  • Support RedHat Enterprise Linux 8 (RHEL8) for Headnodes and Compute nodes

  • CCME Ansible playbooks have been refactored in an Ansible role

  • New visualization Windows fleet including

    • A Windows fleet launch template deployed by the CCME Management Host (CMH)

    • New CCME_WIN_* variables available for CCME clusters to configure the Windows fleet (AMI, instance type, configuration files…)

  • EnginFrame

    • Dynamically generate an EnginFrame service for remote visualization for:

      • the headnode

      • each dcv* queue in Slurm

      • Windows fleet

    • Renamed the DCVSM cluster name in EnginFrame from headnode to dcvsm, and made its hosts visible in the EnginFrame Hosts service

    • Add the possibility to automatically register all users belonging to the CCME_EF_ADMIN_GROUP OS group as administrators of EnginFrame

    • Add CCME_EF_ADMIN_PASSWORD as AWS Secret arn parameter to store the EnginFrame admin (efadmin) password for clusters

  • Add the possibility to encrypt all storage at deployment using multiple KMS keys, via variables prefixed with ccme_kms_

  • Add a custom playbook to fix Nvidia drivers CVE named example.install-fix-nvidiacve-all.yaml

    • The Nvidia driver version is defined in dependencies.yaml through the parameter nvidia_version (set to 515.48.07)

    • Verify the presence of the Nvidia drivers in the CCME sources for CCME clusters, and download and install them if not present

  • Add default encryption of cluster root volumes in the cluster configuration files

CCME logs are now sent to CloudWatch

  • For the CMH, a new log group named ccme-cmh-<stackID> is created. The following logs are available:

    • CCME logs: /var/log/ccme.*.log

    • Cloud init and cfn init logs: /var/log/cloud-init.log, cloud-init-output.log, cfn-init.log

    • System logs: /var/log/messages and syslog

    • SSSD logs: /var/log/sssd/sssd.log and sssd_default.log

  • For each cluster, the logs are sent to the same log group as the cluster's other logs (see Amazon CloudWatch Logs cluster logs). The following logs are now available for each cluster:

    • On Head and Compute nodes:

      • All CCME pre/post/update Bash and Ansible scripts logs, including custom scripts

      • DCV logs (for Compute nodes belonging to a dcv* Slurm partition)

    • On the Headnode:

      • EnginFrame logs

      • DCVSM broker and agent logs

ENHANCEMENTS

Documentation

  • Add an Active Directory users troubleshooting section to the CCME documentation

  • The documented requirements relating to the AWS network environment have been updated

    • Information relating to the subnet requirements is more explicit

    • Add specifications for the Internet Gateway and Network Gateway depending on multi-network cases

  • The help function of the deployCCME.sh script is more verbose

CCME Management Host (CMH)

  • The resources created by the CMH stack base their names on the CMH stack id instead of the CMH stack name

  • Upgrading AWS ParallelCluster to version 3.6.0

  • The parameters ccme_bucket and ccme_bucket_subfolder are merged to a new parameter ccme_bucket_path

Clusters

  • Gnome initial setup is disabled on HeadNode and DCV nodes

  • The ALB URL can be replaced by a custom DNS name using the new CCME_DNS variable. This can be used to redirect both EnginFrame and DCV URLs through CCME_DNS.

  • Improve the robustness and idempotency of the Ansible tasks

  • Upgrading EnginFrame to version 2021.0-r1667

  • Using native AWS ParallelCluster Nvidia drivers (version 470.141.03) to reduce cluster deployment time by approximately 2 minutes

  • The CCME configuration file available in /opt/CCME/conf/ is now named after its CMH (e.g., CMH-Myfirstcmh.ccme.conf)
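
The CCME_DNS variable mentioned above could be set as in the sketch below (the DNS name is a placeholder, and a DNS record pointing at the ALB is assumed to exist):

```
# Serve EnginFrame and DCV through a custom DNS name instead of the ALB URL
CCME_DNS=hpc.example.com
```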

Security

  • Improve the security by adding restrictions on CloudFormation usage based on stack tags

  • Improve the CCME security by using IMDSv2 tokens to retrieve EC2 metadata

Other

  • CRS CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation

  • CMH CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation

  • CMH stack tags are now propagated to:

    • CMH EBS

    • CMH ActiveDirectory (if created by the CMH stack)

BUG FIXES

CCME Roles Stack (CRS)

  • Fixed iam:PassRole with the parameter CustomIamPathPrefix in the CCME Roles Stack (CRS)

  • Fixed missing optional AWS Route53 policies in the CCME Roles Stack

  • Fixed ec2:RunInstances authorization for compute node deployment in a placement group

  • Fixed tags associated with the CCME Roles Stack (CRS) and CCME Management Host (CMH) deployed with the deployCCME.sh script

  • Fixed the CCME Management Host (CMH) ALB policy to allow updating the Application Load Balancer (ALB) certificate with elasticloadbalancing:ModifyListener

CCME Management Host (CMH)

  • Fixed tag Name for the CCME Management Host (CMH)

  • Fixed an issue preventing multiple EBS volumes from being attached to the CMH

  • Fixed optional SNS notification at cluster deployment

    • The CCME Management Host (CMH) parameter CCMESnsALB becomes CCMEAdminSnsTopic

    • The default value is now NONE instead of *

  • On the Management Host /var/log/ccme.ccme-start.log now correctly displays the logs on individual lines

  • Fixed FSx policies (fsx:Describe*) for deployment with FSxOnTap storage

Clusters

  • Fixed JWT headers decoding when using OIDC authentication

  • Fixed management_stack_role variable description in the deployment configuration file of CCME

  • Fixed dependencies downloading when external repositories take time to respond

  • Fixed URL redirection of the EnginFrame logo

  • Fixed installation of the latest S3FS-Fuse version:

    • x86_64 (1.9*)

    • aarch64 (== 1.93-1)

  • Fixed S3FS mount point with aarch64 architecture and IMDSv2

  • Fixed configuration file for ARM clusters

  • Fixed AWS SSM installation when a previous version is already installed

  • Fixed EFA usage on compute nodes with the compute security group deployed with the CCME Management Host (CMH) stack

  • Fixed classic cluster configuration file

  • Fixed /opt/CCME NFS export: compute nodes can now be in different networks and AZs than the headnode

  • Fixed mode for the CCME env file

  • Fixed MariaDB installation with aarch64 on Amazon Linux 2 with the following packages:

    • MariaDB-Server: 10.6.9-1

    • Galera: 4-26.4.12-1

  • Fixed CCME ALB Lambda policy, explicitly allowing the lambda to:

    • logs:CreateLogStream

    • logs:PutLogEvents

    • elasticloadbalancing:AddTags

  • Fixed compute egress security group

  • Fixed versions for pip packages

  • Fixed retrieval of pricing data through Slurm epilog script: use IMDSv2 to retrieve metadata

  • Fixed hosts status in EnginFrame when they are in IDLE+CLOUD Status in Slurm

Release 4.2.0 - May 17, 2023

NEW

  • CCME is now supported in the AWS Stockholm region (eu-north-1)

  • AWS IAM Roles support for CCME management, lambdas and clusters

  • Automated deployment of the CCME Roles Stack (CRS) with deployCCME.sh

ENHANCEMENTS

  • CCME dependency packages are no longer required

  • Upgrading DCV to 2023.0 (15065)

  • Upgrading DCVSM to 2023.0

    • Broker: 392-1

    • Agent: 675-1

BUG FIXES

  • Add fix to prevent a colord profile dialog box issue in virtual DCV sessions

  • Add fix to remove screenlock in Gnome screensaver settings

  • Add fix to force the minimize and maximize buttons to appear in the top right corner of windows in Gnome-based DCV sessions

Release 4.1.0 - April 21, 2023

NEW

  • Amazon SSM is now installed on the CCME Management Host (CMH)

ENHANCEMENTS

  • Separation of public and private subnets for security improvement

    • The component Application Load Balancer is created in public subnets separated from other components

    • The components Active Directory, Management Host and the clusters are now in private subnets

  • Upgrading PCluster to 3.5.0

  • Upgrading Ansible to 4.10.0

  • Upgrading Pip to 23.0.1

  • Support usage of IMDSv2 for CCME clusters

BUG FIXES

  • xorg.conf is now configured correctly for DCV on instances equipped with GPUs with HardDPMS option set to false (and option UseDisplayDevice removed)

  • Fixed Amazon SSM usage on clusters, including HeadNodes and ComputeNodes

  • Cluster timeouts, including separate variables for the HeadNode and ComputeNodes, are now configurable

  • Cluster update is now working correctly

  • The first visualization session or the first job starting a dynamic node is now executed correctly after the end of node configuration

  • Fixed ALB rules creation for DCV nodes when lots of nodes are deployed at the same time.

  • EnginFrame services are no longer reset when the cluster is updated

Release 4.0.0 - January 24, 2023

NEW

  • Multiple S3 Buckets can now be mounted through S3FS

  • Clusters can be deployed in VPCs different from the CCME Management Host's VPC

  • Support pre-existing AWS Application Load Balancer (ALB)

  • Support pre-existing Active Directory internal/external to AWS using LDAP

  • Support list of key:value tags for CCME Management Host and Clusters

  • Integration of custom scripts execution

  • Support optional authentication with OIDC to the EnginFrame portal

  • Support mounted file systems as mount points for user homes by setting the fallback_homedir option in SSSD

  • Support timezone configuration for CMH and cluster instances
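
The fallback_homedir option mentioned above is a standard SSSD setting; a sketch of an sssd.conf domain section (the domain name and path are placeholders, %u expands to the username):

```
[domain/example.corp]
fallback_homedir = /shared/home/%u
```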

ENHANCEMENTS

  • Enforce TLS requirements in CCME S3 policies

  • Upgrading PCluster to 3.2.0

  • Upgrading Slurm to 21.08.8-2

  • Upgrading DCV to 2022.1 (13067)

  • Upgrading DCVSM to 2022.1

    • Broker: 355-1

    • Agent: 592-1

  • Upgrading EnginFrame to 2021.0-r1646

  • Upgrading Nvidia Drivers to 515.48.07

  • Deploy DCV on compute nodes depending on the presence of dcv in partition name(s)

  • No configuration action is required to start a first pre-configured cluster

  • The cluster policies are now generated by the ManagementHost

  • The cluster component named master has been renamed to headnode

  • Possibility to specify the authorized CIDR for the frontend ALB

  • Automated creation of private S3 bucket to use as the AWS ParallelCluster CustomS3Bucket configuration

  • Management Host public IP configuration can be set to NONE

DEPRECATED

  • Removing CCME Command Line Interface (CCME-CLI) support

  • Removing Ganglia support

Release 3.0.0 - March 23, 2021

NEW

  • Adding a common secured load-balanced HTTPS entry point for:

    • EnginFrame portal

    • Ganglia

    • DCV sessions

  • Adding dedicated stack to deploy Management Host

  • Adding user centralized authentication through directory services (AD)

    • Secure access to cluster through selected groups in the AD

    • Secure access to Management Host through selected groups in the AD

  • Adding a CCME Command Line Interface for Management Host

    • Start, Stop, Update, Delete a cluster

    • Possibility to set a time-to-live to a cluster

  • Updating HeadNode so that it uses DCVSessionManager as its session viewer

  • Adding documentation

ENHANCEMENTS

  • Upgrading to AWS ParallelCluster 2.10.1

  • Updating Slurm to 20.02.4

  • Upgrading DCV to 2020.2 (9508)

  • Upgrading EnginFrame to 2020.0-r58

  • Adding option to specify on which partition(s) DCV should be deployed

DEPRECATED

  • Removing BeeGFS support

BUG FIXES

  • Fixing S3 Bucket policies