Releases

Release 6.1.5 - June 10, 2025

BUG FIXES

CCME Management Host (CMH)

  • Fixed network connection loss on CMH

Release 6.1.4 - June 04, 2025

BUG FIXES

CCME Roles Stack (CRS)

  • Allow CCME ManagementHost (CMH) to ec2:RunInstances on arn:aws:resource-groups:${AWS::Region}:${AWS::AccountId}:group/*

Release 6.1.3 - June 03, 2025

BUG FIXES

CCME Roles Stack (CRS)

  • Allow CCME ManagementHost (CMH) to ec2:CreateImage

Release 6.1.2 - May 26, 2025

BUG FIXES

CCME Roles Stack (CRS)

  • Allow CCME ManagementHost (CMH) to ec2:DetachVolume.

  • Allow ImageInstanceRole to ec2:CreateTags on volumes, snapshots, instances, network interfaces.

Release 6.1.1 - May 22, 2025

ENHANCEMENTS

CCME Roles Stack (CRS)

  • Allow CCME ManagementHost (CMH) the following actions to help when building custom AMIs:

    • ec2:AttachVolume

    • ec2:DescribeSnapshots

    • ec2:DeleteSnapshot

    • ec2:RegisterImage

BUG FIXES

  • Fixed MariaDB installation, updated to package versions:

    • MariaDB-Server: 10.6.22-1

    • Galera: 4-26.4.22-1

  • Increased timeout on xdcv and dcvagent process in EF Portal/EnginFrame interactive services. On instance startup, the first session sometimes crashed due to slow start of these processes.

Release 6.1.0 - May 06, 2025

NEW

CCME Roles Stack (CRS)

  • Allow CCME ManagementHost (CMH) and clusters to optionally:

    • Read S3 buckets prefixed with GlobalReadBucketPrefix

    • Read-Write S3 buckets prefixed with GlobalReadWriteBucketPrefix

CCME Management Host (CMH)

  • Add LoginNodes security group for login nodes autoscaling instances and the dedicated NetworkLoadBalancer deployed with clusters.

Clusters

  • Added a new way to create and manage Linux visualization nodes through Autoscaling Groups. Allows finer control and faster deployment of DCV instances. See Autoscaling fleet for more details.

  • More secure web portal: ensure HSTS is enabled, and add the X-Content-Type-Options=nosniff HTTP header.

  • CCME Python virtual environment is now available through source "/opt/scripts/ccme_venv/bin/activate".

Documentation

  • Added the list of URLs to whitelist so that CCME can download its dependencies (e.g., to build CCME AMIs).

ENHANCEMENTS

CCME Roles Stack (CRS)

  • Allow adding/updating/deleting tags on the CMH stack

  • Add fsx:Describe* for the ParallelClusterUserRole

CCME Management Host (CMH)

  • All of the KMS key parameters are now MANDATORY

  • Update AWS ParallelCluster to version 3.12.0

  • Added /var/log/audit/audit.log and /var/log/secure to CloudWatch logs

  • Do not install AWS CLI version 1 if version 2 is already installed

  • Packages for AWS ParallelCluster, AWS SSM, AWS CloudWatch, and cfn-bootstrap are now shipped with the CCME tarball, to avoid downloading them from the Internet in air-gapped environments.

Clusters

  • Updated MariaDB version to 10.6.19 for x86_64 and aarch64

  • Updated Galera version to 4-26.4.19-1 for x86_64 and aarch64

  • Updated DCVSM version to 2024.0 including:

    • DCVSM Broker to 493-1

    • DCVSM Agent to 801-1

  • Add EF Portal version 2024.1-r1842

  • Added an example of custom playbook to add a link in the footer of the web portal: CCME/custom/example.install-portal-footer-head.yaml

  • Added /var/log/audit/audit.log and /var/log/secure to CloudWatch logs

  • Moved installation of OIDC requirements and MariaDB to the ccme_build phase, to limit the need to access repositories during the ccme_configure phase.

BUG FIXES

  • Fixed error in Logging configuration for the DCV Resolver.

  • Fixed error in CloudWatch configuration for Job scheduler accounting backup log file.

  • Fixed option -s of deployCCME.sh to retrieve roles and populate the CMH configuration file.

  • Fixed job scheduler logs backup service in case of cluster update.

  • Fixed installation of SSM: the GPG signature is no longer checked, as AWS changes it often and there is no URL from which to automate its download.

  • Fixed prepare-ami.sh so it correctly reads the CCME_CONF file when CCME_CONF is defined

  • Fixed ParallelClusterUserRole for ImageBuilder

DEPRECATED

  • Removed DcvProxyAutoscalingGroupRole from the CCME Roles Stack (CRS) and CCME Management Host (CMH); the AutoScalingGroup default account role is used instead.

Release 6.0.0 - January 21, 2025

Warning

If you are updating CCME from a version prior to 6.0.0 to version 6.0.0, please review Update to 6.0.0 to understand the parameter changes.

NEW

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform the following actions on the CCME Management Host stack resources:

    • Allow the following actions for all resources:

      • ec2:DeleteTags

      • elasticloadbalancing:DescribeListeners

      • elasticloadbalancing:DescribeRules

      • elasticloadbalancing:DescribeTargetGroups

      • elasticloadbalancing:DescribeTargetGroupAttributes

      • elasticloadbalancing:DescribeTargetHealth

      • elasticloadbalancing:DescribeTags

      • autoscaling:DescribeAutoScalingGroups

      • autoscaling:DescribeScalingActivities

    • Allow the following actions for the ccmeDcvReverseProxyAutoScalingGroup autoscaling group resource:

      • autoscaling:CreateAutoScalingGroup

      • autoscaling:DeleteAutoScalingGroup

      • autoscaling:UpdateAutoScalingGroup

      • autoscaling:PutScalingPolicy

      • autoscaling:DeletePolicy

    • Allow the following actions for dcv-* and portal-* target groups and ccmeALB-* listener and rules:

      • elasticloadbalancing:CreateTargetGroup

      • elasticloadbalancing:DeleteTargetGroup

      • elasticloadbalancing:RegisterTargets

      • elasticloadbalancing:CreateRule

      • elasticloadbalancing:DeleteRule

    • Allow ec2:RunInstances with capacity-reservation/*

CCME Management Host (CMH)

  • Add -mr <roles_stack_name> as deployCCME.sh parameter with CMH deployment. This parameter configures the CCME Management Host (CMH) configuration file with CCME roles provided by the CRS (CCME Roles Stack).

  • The number of DCV instances is no longer limited to 100. A new AutoScaling group has been introduced to host Nginx reverse proxies for the DCV streams. The CMH ALB now only has 1 rule for all DCV sessions.

  • Add CCMEManagementDcvProxyInstanceAmi as parameter to the CCME Management Host (CMH)

  • Add the following parameters as mandatory

    • CCMELoggingBucket for CCME Management Host (CMH) parameter

    • ccme_logging_bucket for the deployCCME.sh script

  • Add LoginNodes to the ParallelCluster configuration common template file

  • Add Active Directory management scripts in the directory /opt/scripts/activedirectory/ of the CCME Management Host (CMH); these apply only to Active Directories deployed by CCME.

Clusters

  • Add support for ParallelCluster LoginNodes.

  • Add option CCME_PASSWORDS_SIZE to define the default length of passwords generated by CCME (default is 32 characters).

  • Add CCME_AD_SSH_KEY_ATTRIBUTE as an optional cluster parameter that sets and uses the sshPublicKey attribute provided by Active Directory. It can be set to NONE or to any attribute name (e.g., sshPublicKey for an Active Directory deployed by the CCME Management Host stack).

  • Add option CCME_ANSIBLE_SKIP_TAGS to skip some phases of the CCME installation process.

  • When a CCME AMI is created with prepare-ami.sh, you can now specify an additional parameter dcv to preinstall DCV packages in the AMI.

  • CCME now automatically detects whether a CCME AMI is being used, and skips some installation/configuration steps when launching an instance. This allows for faster instance boot times.
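
The CCME_PASSWORDS_SIZE option above controls the length of generated passwords; a minimal sketch of equivalent generation logic (illustrative only, not CCME's actual implementation):

```python
import secrets
import string

# Illustrative only: generate a password of the default
# CCME_PASSWORDS_SIZE length (32 characters).
CCME_PASSWORDS_SIZE = 32
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(CCME_PASSWORDS_SIZE))
print(len(password))  # 32
```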

ENHANCEMENTS

CCME Roles Stack (CRS)

  • All of the KMS key parameters are now MANDATORY

  • Replace the following actions in all CCME role policies with the individual permissions required, following the principle of least privilege:

    • s3:*

    • fsx:*

    • elasticfilesystem:*

CCME Management Host (CMH)

  • All of the KMS key parameters are now MANDATORY

  • Update AWS ParallelCluster to version 3.11.1

  • Update CCME Management Host (CMH) AMIs alinux2023, rhel8 and rhel9 to the latest available version.

  • Add the ccme_ingress_security_groups_ssh_cidr_block parameter to the deployCCME.sh script. This parameter corresponds to the ManagementHost template variable CCMEIngressSecurityGroupSshCidr.

  • Add NFSv4 ACL management to the NetApp ONTAP File system creation documentation section.

  • Add S3 logging bucket for ccme-alb-logs-* and ccme-cluster-* buckets created by the CCME Management Host (CMH) stack

  • Add S3 lifecycle policy for ccme-alb-logs-* and ccme-cluster-* buckets created by the CCME Management Host (CMH) stack

Clusters

  • Add Scope and AuthenticationRequestExtraParams as parameters to the oidc.yaml configuration file.

  • Change CCME_AD_GROUP_SUDOER to CCME_SUDOER_GROUP as parameter to configure cluster sudoer group. This parameter is not limited to domain groups anymore.

  • Add NFSv4 ACL management to the NetApp ONTAP File system creation documentation section.

  • Updated DCV Server version to 2024.0 and patch to 18131.

  • Updated DCVSM version to 2024.0 including:

    • DCVSM Broker to 457-1

    • DCVSM Agent to 781-1

  • Updated S3FS-Fuse to version 1.95-1 for aarch64

  • Windows DCV sessions

    • Instance creation process (launch template and run-instance command) now supports EC2 Fast Launch

    • More robust restart of DCV and DCVSM services on Windows instances (in case the services were disabled in the AMI).

    • Added tips on how to improve Windows instances start time in the documentation

    • Added details about NVidia drivers version in the documentation

  • Do not truncate log files when updating or replaying CCME scripts/playbooks. Append to existing log files.

  • Allow the use of * for KMS keys in the CCME Role Stack. This is now the default value. Though it is still recommended to specify the KMS keys you want to use.

  • Added more EnginFrame logs to CloudWatch

BUG FIXES

Clusters

  • Fix cluster deployment without Active Directory when the CCME Management Host (CMH) is configured with Active Directory

  • Fix EnginFrame admin web access for users within the CCME_EF_ADMIN_GROUP.

  • Tags specified at the CMH CloudFormation stack level are now passed down to instances’ tags definition in Launch templates.

DEPRECATED

  • Removed support for CentOS 7 (no longer supported by AWS ParallelCluster).

  • Removed the EnableKmsEncryption parameter from CCME Roles Stack (CRS)

  • Removed the CCMEEnableKmsEncryption parameter from CCME Management Host (CMH)

Release 5.7.2 - September 12, 2024

BUG FIXES

  • Fix CMH update. Configuration files are now correctly regenerated and playbooks re-run during an update.

  • Fix some default values in deployCCME.sh.

  • Tags in generated AWS ParallelCluster configuration files can now have numeric values (added quotes around the values).

  • Do not truncate log files when updating or replaying CCME scripts/playbooks. Append to existing log files.

Warning

Please check out the new requirements for Windows AMIs for remote DCV sessions. See Prerequisites.

Release 5.7.1 - August 06, 2024

BUG FIXES

  • Fix SSM, DCV, DCVSM and MariaDB installation on Rocky9 and RHEL9: GPG keys are not checked, as SHA-1 is not among the standard supported encryption algorithms on these OSes.

Release 5.7.0 - August 02, 2024

NEW

Clusters

  • CCME_SHARED_DIR must now be defined to specify where CCME will store its shared files (e.g., EnginFrame spoolers and sessions directories). In CCME default templates, this value is set to shared, which is the mountpoint of an EFS.

ENHANCEMENTS

Clusters

  • Password files and keystores are now available only on the HeadNode in /opt/CCME_internal.

  • Updated troubleshooting section in the documentation.

  • Added option CCME_EF_MAX_UPLOAD_SIZE to set the maximum size allowed for uploaded files, in bytes (default is 1024 * 1024 * 1024 * 40, i.e. 40 GiB: 42949672960).
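
As a quick sanity check on the byte arithmetic of the default upload limit above:

```python
# CCME_EF_MAX_UPLOAD_SIZE default: 40 GiB expressed in bytes.
max_upload_bytes = 1024 * 1024 * 1024 * 40
print(max_upload_bytes)  # 42949672960
```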

BUG FIXES

Clusters

  • Fix the startup of slurmctld during a stop/restart of HeadNodes

  • Fixed AWS SSM gpg key import for Rocky Linux 9 (Rocky9) and RedHat Enterprise Linux 9 (RHEL9)

CCME Management Host (CMH)

  • Fix LDAPS URI for CCME Management Host, replacing CCMEAdIPs by CCMEAdUri

Warning

CCME_SHARED_DIR is now a new and mandatory parameter. It must be set to a valid mountpoint specified in one of AWS ParallelCluster SharedStorage (see this link for more details).

We recommend that you use a file system that can sustain your needs. EFS can be a good choice to have “unlimited” storage that will automatically grow to your needs.

CCME_SHARED_DIR must not be changed during a cluster update. Updating it may break EnginFrame setup.

Warning

The parameter CCMEAdIPs no longer exists in the CCME Management Host stack and deployment configuration files. You must use CCMEAdUri for any new deployment.

Release 5.6.3 - July 10, 2024

BUG FIXES

CCME Management Host (CMH)

  • Updated AWS ParallelCluster to version 3.9.3

Release 5.6.2 - July 05, 2024

BUG FIXES

  • Added lambda:ListTags permissions to the ALB Lambda. Previously, permissions on ListTags were required only when using the ListTags API explicitly; however, principals with GetFunction API permissions could still access tag information returned by the GetFunction call. Beginning July 27, 2024, Lambda returns tags data only when the principal calling the GetFunction API has a policy with an explicit allow on the ListTags API.
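
A sketch of the corresponding IAM statement; the account ID and function name pattern below are placeholders, not CCME's actual policy:

```json
{
  "Effect": "Allow",
  "Action": ["lambda:GetFunction", "lambda:ListTags"],
  "Resource": "arn:aws:lambda:*:111122223333:function:ccmeLambdaALB*"
}
```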

Release 5.6.1 - July 04, 2024

ENHANCEMENTS

Clusters

  • License Management: print the error message, if any, when calling lmstat.

BUG FIXES

  • When building a CCME AMI, Ansible facts were not cleaned up, leading to issues if the AMI was used within 2 hours after its creation (due to Ansible facts cache timeout).

  • Fixed a failing condition for sacctmgr when creating the default and efadmin accounts

Release 5.6.0 - June 12, 2024

NEW

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform the following actions on the CCME Management Host stack resources:

    • ec2:DeleteTags

CCME Management Host (CMH)

  • Support RedHat Enterprise Linux 9 (RHEL9) for the CCME Management Host

  • Add optional MgtHostVolumeSize for a custom CCME Management Host volume size

Clusters

  • Add sample custom scripts in CCME/custom/:

    • example.install-ldapfilters-compute.yaml

    • example.install-paraview-head.yaml

    • example.install-singularity-all.yaml

    • example.install-updatenvidiadrivers-all.yaml

  • Added a script (CCME/sbin/prepare-ami.sh) to create a CCME AMI from an AWS ParallelCluster AMI. The process downloads external packages and preinstalls some of them. See this section for more information.

  • Support RedHat Enterprise Linux 9 (RHEL9) for Headnodes and Compute nodes with x86_64 and aarch64 architectures

  • Support Rocky Linux 9 (Rocky9) for Headnodes and Compute nodes with x86_64 and aarch64 architectures

Warning

CCME does not yet support DCV on instances with Nvidia GPU on Rocky Linux 9.

ENHANCEMENTS

CCME Management Host (CMH)

  • Updated AWS ParallelCluster to version 3.9.2

Clusters

  • Updated Slurm to version 23.11.7

  • Updated S3FS-Fuse to version 1.94-1 for aarch64

BUG FIXES

CCME Roles Stack (CRS)

  • Fix the documentation related to ccme_secret_prefix as CCME Roles Stack parameter

CCME Management Host (CMH)

  • Fixed secrets usage by removing the secrets_manager_prefix parameter from the deployment.ccme.conf configuration file

  • Fixed the ParallelCluster configuration file template with a ResourcePrefix depending on the CCME Roles Stack

  • Fixed authentication on the CCME Management Host with Active Directory users

  • Fixed home directory on the CCME Management Host with Active Directory users

Clusters

  • Fixed MariaDB version to 10.6.18 for x86_64

  • Fixed Fuse3 version to 3.3.0-18 for Rocky

  • Fixed authentication on the EnginFrame portal on aarch64 architecture

Release 5.5.1 - April 30, 2024

NEW

CCME Roles Stack (CRS)

  • Add a tag named ccme:version to the CCME Roles Stack when deployed with the deployCCME script

CCME Management Host (CMH)

  • Add a tag named ccme:version to the CCME Management Host stack when deployed with the deployCCME script

  • Add a tag named ccme:version to the cluster templates

Clusters

  • Add a variable named CCME_VERSION to the CCME_ENV_FILE and CCME_ENV_SH_FILE

ENHANCEMENTS

  • CCME tarball is now created without extended file attributes, as it can cause errors with old versions of tar when extracting the archive.

BUG FIXES

CCME Management Host (CMH)

  • Fixed ccmeLambdaALBLogGroup path for ccmeLambdaALB in CMH template

Release 5.5.0 - April 02, 2024

NEW

Clusters

  • License management with Slurm: you can now specify a list of licenses hosted by FlexLM license servers; these are added as resources in Slurm and periodically updated by querying the license servers. This allows jobs to wait in the queue until licenses become available.

  • Added the possibility to monitor inactive Linux DCV sessions, and automatically terminate them after CCME_LIN_INACTIVE_SESSION_TIME seconds. CCME_LIN_INACTIVE_SESSION_TIME can be configured in ParallelCluster configuration file or directly in EnginFrame services.
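
For illustration, remote licenses in Slurm are typically registered as resources and then requested by jobs; the license name, count, and server below are placeholders, not CCME's exact commands:

```shell
# Register a FlexLM-served license as a Slurm resource (placeholder values)
sacctmgr add resource name=ansys count=20 server=flexlm1 servertype=flexlm type=license

# Jobs request licenses and stay queued until they are available
sbatch --licenses=ansys@flexlm1:2 job.sh
```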

CCME Management Host (CMH)

  • Add ccme_logs_retention_in_days as CCME Management Host deployment parameter to configure how long logs should be retained in CloudWatch.

  • Enhanced security

    • HTTP headers with invalid header fields are now removed by the load balancer (drop_invalid_header_fields is enabled),

    • Ensure that Load Balancer Listener is using at least TLS v1.2,

    • Limit the number of concurrent executions of CMH Lambda to 10.

Documentation

  • Added Active Directory LDAPS documentation.

  • Added Simple AD management documentation.

ENHANCEMENTS

CCME Management Host (CMH)

  • Add the ARNs of the resources created by the CCME Management Host Stack to the stack outputs.

Clusters

  • Use the OnNodeUpdated args instead of the OnNodeStart parameters for CCME cluster deployments. This allows updating “any” CCME parameter during a cluster update without duplicating it.

  • Updated DCV Server version to 2023.1 and patch to 16388.

  • Updated DCVSM version to 2023.1 including:

    • DCVSM Broker to 410-1

    • DCVSM Agent to 732-1

  • Added information about NTFS Unix Security Options for ONTAP.

  • Refresh the Active Directory ReadOnlyUser password when the cluster is updated.

  • Add multiple CCME environment variable files

    • CCME_ENV_FILE: Path to /etc/ccme/ccme.env.yaml

    • CCME_ENV_SH_FILE: Path to /etc/ccme/ccme.env.sh

  • Add the following environment variables to the CCME environment variable files (CCME_ENV_FILE and CCME_ENV_SH_FILE):

    • CCME_NODE_DCV

    • CCME_NODE_PARTITION

  • Added examples in CCME/custom/winec2_config.ps1 to

    • join an Active Directory,

    • at user logon, mount ONTAP volumes and export specific variables.

  • Added the possibility to have a specific CCME/custom/CLUSTERNAME.ef-services.zip (with CLUSTERNAME the name of the cluster) to specify a list of EnginFrame services to publish specifically for this cluster. The CCME/custom/ef-services.zip can still exist, but is ignored if CLUSTERNAME.ef-services.zip is present.

  • Add two variables to the OpenID (OIDC) cluster configuration

    • UsernameAttribute is optional and selects the user attribute used as the username,

    • UserMapping is optional and maps every user to the same portal service account.

  • Add custom decode_jwt_headers python script in CCME/custom/ for OpenID (OIDC).
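
The per-cluster service archive selection described above (CLUSTERNAME.ef-services.zip taking precedence over ef-services.zip) can be sketched as follows; the paths are simulated with a temporary directory and this is not CCME's actual code:

```python
import tempfile
from pathlib import Path

# Simulate CCME/custom/ contents for a cluster named "demo".
custom = Path(tempfile.mkdtemp())
(custom / "ef-services.zip").touch()
(custom / "demo.ef-services.zip").touch()

cluster_name = "demo"
per_cluster = custom / f"{cluster_name}.ef-services.zip"
fallback = custom / "ef-services.zip"

# The cluster-specific archive wins; the generic one is ignored if present.
services_zip = per_cluster if per_cluster.exists() else fallback
print(services_zip.name)  # demo.ef-services.zip
```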

Documentation

  • Updated the Active Directory documentation

    • adcli commands

    • ldapsearch and ldapmodify ldif files

BUG FIXES

CCME Management Host (CMH)

  • Fixed download on large files through EnginFrame: increased the ApplicationLoadBalancer idle timeout to 300s.

  • Fixed usage of the latest version of the launch template for Windows DCV instances.

  • Automatically clean Windows sessions that have been closed on the instance, but were still visible in EnginFrame in an unknown state.

Clusters

  • Fixed buffer_pool_size and lock_wait_timeout errors from the MariaDB server.

  • Do not download Slurm tarball when the current version is already the correct one.

  • Force restart NFS server on HeadNode.

Warning

All CCME variables in AWS ParallelCluster configuration file have been migrated from OnNodeStart.Args to OnNodeUpdated.Args. You should now specify them only in OnNodeUpdated.Args.

Warning

The following variables have been renamed for consistency. If you used any of them in your configurations, please update them:

  • CCME_EF_ADMIN_PASSWORD has been renamed to CCME_EFADMIN_PASSWORD: ARN of an AWS Secret that stores the password of efadmin,

  • CCME_EF_ADMIN_SUDOER has been renamed to CCME_EFADMIN_SUDOER: is efadmin sudoer or not,

  • CCME_EF_ADMIN_USER has been renamed to CCME_EFADMIN_USER: username of efadmin (default efadmin)

Release 5.4.1 - February 19, 2024

BUG FIXES

Clusters

  • Fixed EnginFrame service publishing when the OIDC feature is enabled

  • Add PrivateData in Scheduling.SlurmSettings.CustomSlurmSettings; specifying it only in CCME_CUSTOM_SLURMDBD_SETTINGS is not sufficient.

  • Fixed exportfs from HeadNodes to their VPC when the VPC has more than one CIDR
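
An illustrative AWS ParallelCluster snippet setting PrivateData directly in CustomSlurmSettings; the PrivateData values are examples only:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    CustomSlurmSettings:
      - PrivateData: jobs,usage,users
```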

Release 5.4.0 - February 02, 2024

NEW

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform the following actions on the CCME Management Host stack resources; these are needed during an update:

    • ec2:StopInstances

    • ec2:ModifyInstanceAttribute

    • ec2:StartInstances

    • ec2:ReplaceIamInstanceProfileAssociation

    • elasticloadbalancing:SetSubnets

    • lambda:UpdateFunctionCode

    • lambda:UpdateFunctionConfiguration

    • lambda:ListTags

    • sns:SetTopicAttributes

    • sns:ListSubscriptionsByTopic

CCME Management Host (CMH)

  • Updated AWS ParallelCluster to version 3.8.0

  • The CCME Management Host instance type can now be set to any x86_64 instance type instead of a limited EC2 list

  • Added the possibility to update the CCME Management Host stack:

    • Update policies integration to deployment.ccme.conf.

    • CCME Management Host update policies documentation in Management.

Clusters

  • Load EnginFrame services from the CCME/custom/ef-services.zip instead of CCME/pkgs/efservices-slurm.tgz. These services are now automatically published to all users, only if CCME_OIDC is not used.

  • Add support to EC2 G5 instance family for DCV partitions.

  • Add CCME_EFADMIN_SUDOER parameter to configure the efadmin user as a sudoer.

  • Add CCME_AD_GROUP_SUDOER parameter to configure users of an AD group as sudoers.

  • Add an update policy to each CCME cluster parameter. Most of the CCME_ parameters can now be updated with OnNodeUpdated/Args.

  • Use generated passwords for the database users instead of hardcoded credentials

  • Add Rocky Linux 8 support for CCME Clusters. Only with a custom AMI.

ENHANCEMENTS

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform the logs: actions on /ccme/ccme* instead of /aws/lambda/ccme*.

  • Logs of custom playbooks are now also uploaded to the AWS CloudWatch log group /ccme/ccme-cmh-*.

CCME Management Host (CMH)

  • Updated CMH Amazon Linux 2023 AMI to version al2023-ami-2023.3.20240131.0-kernel-6.1-x86_64

  • The CCME Application Load Balancer lambda log group path has changed from /aws/lambda/ccmeLambdaALB-${StackId} to /ccme/ccme-lambda-alb-${StackId}.

  • The Windows instances use the latest version of their CMH Windows launch template

  • The CCME Management Host instance uses the latest version of its launch template

  • The CCME Management Host can no longer be used in (insecure) privileged mode. CRS roles are now mandatory.

    • The CCME Management Host stack no longer creates policies complementary to those dynamically created by AWS ParallelCluster

    • The following settings are now mandatory in the CCME management host configuration:

      • ccme_cluster_lambda_role

      • ccme_cluster_headnode_instance_profile

      • ccme_cluster_compute_instance_profile

Clusters

  • The OIDC prerequisites have been updated to the following versions:

    • decode: 2.4.0

    • pyjwt: 2.8.0

    • requests: 2.31.0

Documentation

  • Added more details about how to configure FSx for NetApp ONTAP

  • Added the documentation related to the CCME cluster parameter update policies

  • Added a link to an AWS Calculator to show an example of the cost of using CCME

  • Updated documentation related to the Application Load Balancer lambda log

BUG FIXES

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform secretsmanager:DescribeSecret for cluster deployment with Active Directory

CCME Management Host (CMH)

  • The policy fsx:CreateDataRepositoryAssociation is now correctly applied

Clusters

  • Add retries when registering EnginFrame as a client to DCVSM

  • Added a default value of NONE for all variables set in /etc/ccme/ccme.env.sh.

  • Ensure that {{ CCME_DEPS_DIR }}/{{ ansible_architecture }} is created when running CCME ansible role

Release 5.3.5 - January 25, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing some bugs during CMH startup on RHEL8

Release 5.3.4 - January 24, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing some bugs during CMH startup on RHEL8

Clusters

  • Fixing usage of CCME_WIN_INACTIVE_SESSION_TIME, CCME_WIN_NO_SESSION_TIME and CCME_WIN_NO_BROKER_COMMUNICATION_TIME for Windows DCV sessions: a value of 0 did not deactivate the timers as expected.

Release 5.3.3 - January 23, 2024

BUG FIXES

CCME Management Host (CMH)

  • Fixing detection of Python 3.9

Release 5.3.2 - January 19, 2024

BUG FIXES

CCME Roles Stack (CRS)

  • Added parameter CCMEKmsAdditionalKey to specify an additional KMS key that will be accessible from the CMH, HeadNode and Compute nodes. (e.g., KMS key used to encrypt AMIs)

Release 5.3.1 - December 14, 2023

BUG FIXES

  • Updated Slurm to version 23.02.7 to fix CVE-2023-49933 to CVE-2023-49938:

    • Slurmd Message Integrity Bypass. CVE-2023-49935. Permits an attacker to reuse root-level authentication tokens when interacting with the slurmd process, bypassing the RPC message hashes which protect against malicious MUNGE credential reuse.

    • Slurm Arbitrary File Overwrite. CVE-2023-49938. Permits an attacker to modify their extended group list used with the sbcast subsystem, and open files with an incorrect set of extended groups.

    • Slurm NULL Pointer Dereference. CVE-2023-49936. Denial of service.

    • Slurm Protocol Double Free. CVE-2023-49937. Denial of service, potential for arbitrary code execution.

    • Slurm Protocol Message Extension. CVE-2023-49933. Allows for malicious modification of RPC traffic that bypasses the message hash checks.

Release 5.3.0 - December 13, 2023

NEW

CCME Management Host (CMH)

  • The parameter MgtHostKeyName is now optional

Clusters

  • Added option CCME_EFADMIN_ID and CCME_EFNOBODY_ID to the clusters.

  • Added options CCME_WIN_LAUNCH_TRIGGER_DELAY and CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS to configure Windows DCV sessions startup monitoring processes.

  • Added an example on how to delay DCV and DCVSM services startup on Windows DCV instances to account for custom bootstrap processes.

  • EnginFrame logo can now be customized by adding a logo.png file in CCME/custom directory.

  • Configure /etc/idmap.conf to specify the domain name (when joined to an Active Directory) to allow NFSv4 mounts.
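
A typical NFSv4 ID-mapping configuration simply pins the domain; the domain below is a placeholder for the Active Directory domain name:

```ini
[General]
# Must match the Active Directory domain for NFSv4 owner/group mapping
Domain = example.com
```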

ENHANCEMENTS

  • Updated documentation on troubleshooting FSx for NetApp ONTAP.

BUG FIXES

Clusters

  • Fixed usage of CCME_WIN_CUSTOM_CONF_REBOOT.

  • Fixed Windows DCV sessions when CCME was stored in a subfolder in a bucket.

  • dcv2 feature was defined twice on dcv* partitions.

  • Fixed the ALB Lambda for instance concurrency at deployment.

  • Fixed high number of RPC calls from EnginFrame to Slurm, which caused Slurm to sometimes crash.

Release 5.2.0 - November 08, 2023

NEW

CCME Management Host (CMH)

  • Added option application_load_balancer_ingress_cidr to deployment.ccme.conf to specify a CIDR that is allowed to connect to the ALB.

Other

  • Added documentation on how to use FSx for NetApp ONTAP to share files between Linux and Windows instances.

ENHANCEMENTS

Clusters

  • Updated AWS ParallelCluster to version 3.6.1

  • Retrieve OnNodeUpdated arguments when executing pre-install.sh script (OnNodeStart) in order to allow parameter overriding when updating a cluster.

BUG FIXES

  • Fixed INTERACTIVE_CLASS name in DCV services; it must contain only alphanumeric characters, -, or _

  • Fixed health check of target group associated with EnginFrame rules

  • Fixed cluster-update command: the update-install.sh script ended in error

  • Removed debug level on SSSD on the Management Host

Note

This version updates the following dependencies for the management host. If you are on a private network without Internet connectivity, you must download the following packages and put them in management/pkgs:

  • {{ ansible_architecture }}/pcluster-installer-bundle-3.6.1.209-node-v16.19.0-Linux_{{ ansible_architecture }}-signed.zip

Release 5.1.0 - October 20, 2023

NEW

CCME Roles Stack (CRS)

  • Allow the Management Host Stack to perform action elasticloadbalancing:ModifyLoadBalancerAttributes on the ALB created by CCME

CCME Management Host (CMH)

  • CMH can now use either RHEL8 or Amazon Linux 2023. Amazon Linux 2 has been removed.

Clusters

  • DCV sessions are now limited by default to 1 session per session type (partition) for each user. This is set for DCV sessions running on the Headnode, any dcv* partition, and Windows instances. This configuration can be updated in the associated EnginFrame services.

  • New configuration variables:

    • CCME_CUSTOM_SLURMDBD_SETTINGS to specify additional parameters for SlurmDBD (e.g., security variables such as PrivateData).

    • CCME_WIN_TAGS to specify additional tags to be set on the instances of the Windows fleet.

Other

  • Ansible, boto3, AWS CLI have been updated, and Python 3.9 is used on both CMH and clusters for executing Ansible playbooks. This reduces the deployment time by ~100-200s per node.

  • Application Load Balancer can now store its logs in an S3 bucket named with the ccme-alb-logs-${StackId} prefix

ENHANCEMENTS

Clusters

  • Removed builtin services in EnginFrame that couldn’t be used with a CCME cluster.

  • Enable updating Slurm when the version deployed by AWS ParallelCluster differs from the one in deployment.yaml

  • The bucket ccme-s3bucket created for clusters is renamed ccme-cluster-${StackId}

  • Updated DCV to version 2023.15487

BUG FIXES

  • Fixed cluster deployment on aarch64 architecture

  • Fixed Slurm CVE-2023-41914 by updating Slurm to version 23.02.6.

Warning

Fixing Slurm CVE-2023-41914 by updating Slurm to version 23.02.6 requires re-compiling Slurm on the Headnode at deployment time. Ensure that the deployment timeout of your clusters is high enough to allow this. Update the AWS ParallelCluster configuration files accordingly, for example:

DevSettings:
  Timeouts:
    HeadNodeBootstrapTimeout: 2400
    ComputeNodeBootstrapTimeout: 1800

Release 5.0.0 - October 02, 2023

Updated EULA to version 2.3.

NEW

CCME Roles Stack (CRS)

  • Allow the HeadNode and ComputeNodes to perform the following actions on all ALBs

    • elasticloadbalancing:DescribeLoadBalancerAttributes

    • elasticloadbalancing:DescribeListeners

    • elasticloadbalancing:DescribeRules

    • elasticloadbalancing:DescribeTags

  • Allow the Management Host Stack to perform the following actions

    • elasticloadbalancing:AddTags on the ALB created by CCME

    • ec2:CreateTags on all network-interface resources

    • cloudformation:CreateChangeSet

CCME Management Host (CMH)

  • Support RedHat Enterprise Linux 8 (RHEL8) for the CCME Management Host

  • Add optional proxy, no_proxy, and pip repository variables for the CCME Management Host (CMH) and the clusters

  • Add optional custom AMI parameter

  • Add optional security-group parameter

  • Added the possibility to generate custom ParallelCluster configuration files from Jinja2 templates
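
The Jinja2-based generation mentioned above could look like the following. This is a hypothetical sketch: the template variables (region, head_instance_type, compute_instance_type) and layout are illustrative, not the actual CCME template syntax, though the keys shown are standard AWS ParallelCluster 3 configuration keys.

```yaml
# Hypothetical Jinja2 template for a ParallelCluster configuration excerpt
Region: {{ region }}
HeadNode:
  InstanceType: {{ head_instance_type }}
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: dcv-queue
      ComputeResources:
        - Name: dcv
          InstanceType: {{ compute_instance_type }}
```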

Clusters

  • Support RedHat Enterprise Linux 8 (RHEL8) for Headnodes and Compute nodes

  • CCME Ansible playbooks have been refactored in an Ansible role

  • New Windows visualization fleet, including:

    • A Windows fleet launch template deployed by the CCME Management Host (CMH)

    • New variables prefixed with CCME_WIN_ available for CCME clusters to configure the Windows fleet (AMI, instance type, configuration files…)

  • EnginFrame

    • Dynamically generate an EnginFrame service for remote visualization for:

      • the headnode

      • each dcv* queue in Slurm

      • Windows fleet

    • Renamed the DCVSM cluster name in EnginFrame from headnode to dcvsm, and made its hosts visible in the EnginFrame Hosts service

    • Add the possibility to automatically register all users belonging to the CCME_EF_ADMIN_GROUP OS group as administrators of EnginFrame

    • Add CCME_EFADMIN_PASSWORD as AWS Secret arn parameter to store the EnginFrame admin (efadmin) password for clusters

  • Add the possibility to encrypt all storage at deployment using multiple KMS keys, with variables prefixed with ccme_kms_

  • Add a custom playbook named example.install-fix-nvidiacve-all.yaml to fix an Nvidia driver CVE

    • The Nvidia driver version is defined in dependencies.yaml through the parameter nvidia_version (set to 515.48.07)

    • Verify the presence of the Nvidia drivers in the CCME sources for CCME clusters; download and install them if not present

  • Add default encryption of cluster root volumes in the cluster configuration files

CCME logs are now sent to CloudWatch

  • For the CMH, a new log group named ccme-cmh-<stackID> is created. The following logs are available:

    • CCME logs: /var/log/ccme.*.log

    • Cloud-init and cfn-init logs: /var/log/cloud-init.log, cloud-init-output.log, cfn-init.log

    • System logs: /var/log/messages and syslog

    • SSSD logs: /var/log/sssd/sssd.log and sssd_default.log

  • For each cluster, the logs are sent to the cluster's own CloudWatch log group (see Amazon CloudWatch Logs cluster logs). The following logs are now available for each cluster:

    • On Head and Compute nodes:

      • All CCME pre/post/update Bash and Ansible scripts logs, including custom scripts

      • DCV logs (for Compute nodes belonging to a dcv* Slurm partition)

    • On the Headnode:

      • EnginFrame logs

      • DCVSM broker and agent logs

ENHANCEMENTS

Documentation

  • Add an Active Directory users troubleshooting section to the CCME documentation

  • The documentation requirements relating to the AWS network environment have been updated

    • Information relating to the subnet requirements is more explicit

    • Add specifications for the Internet Gateway and Network Gateway for cases with multiple networks

  • The help function of the deployCCME.sh script is more verbose

CCME Management Host (CMH)

  • The resources created by the CMH stack base their names on the CMH stack ID instead of the CMH stack name

  • Upgrading AWS ParallelCluster to version 3.6.0

  • The parameters ccme_bucket and ccme_bucket_subfolder are merged into a new parameter ccme_bucket_path

Clusters

  • Gnome initial setup is disabled on HeadNode and DCV nodes

  • The ALB URL can be replaced by a custom DNS name using the new CCME_DNS variable. This can be used to redirect both EnginFrame and DCV URLs through CCME_DNS.

  • Improve the robustness and idempotency of the Ansible tasks

  • Upgrading EnginFrame to version 2021.0-r1667

  • Using native AWS ParallelCluster Nvidia drivers (version 470.141.03) to reduce cluster deployment time by approximately 2 minutes

  • The CCME configuration file available in /opt/CCME/conf/ is now named after its CMH (e.g., CMH-Myfirstcmh.ccme.conf)
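
As a sketch of the CCME_DNS redirection described above, the variable might be set as follows. The value is a placeholder, and whether it is set as an environment variable or a YAML key depends on your deployment files.

```yaml
# Hypothetical excerpt: point EnginFrame and DCV URLs at a custom DNS name
# instead of the ALB URL (value is an example)
CCME_DNS: "hpc.example.com"
```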

Security

  • Improve security by adding restrictions on CloudFormation usage based on stack tags

  • Improve CCME security by using IMDSv2 tokens to retrieve EC2 metadata
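
IMDSv2 replaces the open metadata endpoint with a session-token handshake. A minimal sketch of that flow using the standard EC2 IMDSv2 calls (it only returns data when run on an EC2 instance; elsewhere the token request simply fails and nothing is printed):

```shell
#!/bin/sh
# IMDSv2: fetch a session token, then use it to read instance metadata.
IMDS="http://169.254.169.254"
# Request a short-lived token (TTL in seconds); fails harmlessly off-EC2.
TOKEN=$(curl -sf --connect-timeout 2 -X PUT "$IMDS/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || TOKEN=""
if [ -n "$TOKEN" ]; then
  # Under IMDSv2 the token must accompany every metadata request.
  curl -sf -H "X-aws-ec2-metadata-token: $TOKEN" \
    "$IMDS/latest/meta-data/instance-id"
fi
```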

Other

  • CRS CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation

  • CMH CloudFormation template has been split into several templates to fit into template size restrictions for CloudFormation

  • CMH stack tags are now propagated to:

    • CMH EBS

    • CMH Active Directory (if created by the CMH stack)

BUG FIXES

CCME Roles Stack (CRS)

  • Fixed iam:PassRole with the parameter CustomIamPathPrefix in the CCME Roles Stack (CRS)

  • Fixed missing optional AWS Route53 policies in the CCME Roles Stack

  • Fixed ec2:RunInstances authorization for compute node deployment in a placement group

  • Fixed tags associated with the CCME Roles Stack (CRS) and CCME Management Host (CMH) deployed with the deployCCME.sh script

  • Fixed the CCME Management Host (CMH) ALB policy to allow updating the Application Load Balancer (ALB) certificate with elasticloadbalancing:ModifyListener

CCME Management Host (CMH)

  • Fixed tag Name for the CCME Management Host (CMH)

  • Fixed the inability to attach multiple EBS volumes to the CMH

  • Fixed optional SNS notification at cluster deployment

    • The CCME Management Host (CMH) parameter CCMESnsALB becomes CCMEAdminSnsTopic

    • The default value is now NONE instead of *

  • On the Management Host, /var/log/ccme.ccme-start.log now correctly displays the logs on individual lines

  • Fixed FSx policies (fsx:Describe*) for deployment with FSxOnTap storage

Clusters

  • Fixed JWT headers decoding when using OIDC authentication

  • Fixed management_stack_role variable description in the deployment configuration file of CCME

  • Fixed dependencies downloading when external repositories take time to respond

  • Fixed URL redirection of the EnginFrame logo

  • Fixed installation of the latest S3FS-Fuse version:

    • x86_64 (1.9*)

    • aarch64 (== 1.93-1)

  • Fixed S3FS mount points with the aarch64 architecture and IMDSv2

  • Fixed configuration file for ARM clusters

  • Fixed AWS SSM installation when a previous version is already installed

  • Fixed EFA usage on compute nodes with the compute security group deployed by the CCME Management Host (CMH) stack

  • Fixed classic cluster configuration file

  • Fixed /opt/CCME NFS export: compute nodes can now be in different networks and AZs than the headnode

  • Fixed mode for the CCME env file

  • Fixed MariaDB installation on aarch64 for the ALinux 2 OS, with the following package versions:

    • MariaDB-Server: 10.6.9-1

    • Galera: 4-26.4.12-1

  • Fixed CCME ALB Lambda policy, explicitly allowing the lambda to:

    • logs:CreateLogStream

    • logs:PutLogEvents

    • elasticloadbalancing:AddTags

  • Fixed compute egress security group

  • Fixed versions for pip packages

  • Fixed retrieval of pricing data through Slurm epilog script: use IMDSv2 to retrieve metadata

  • Fixed host status in EnginFrame when hosts are in the IDLE+CLOUD state in Slurm
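
The ALB Lambda policy fix listed above (logs:CreateLogStream, logs:PutLogEvents, elasticloadbalancing:AddTags) amounts to an IAM statement along these lines. This is a sketch only: the Sid is invented, and "Resource": "*" is shown for brevity where the actual CCME policy may scope resources more narrowly.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CcmeAlbLambdaLogsAndTags",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "elasticloadbalancing:AddTags"
      ],
      "Resource": "*"
    }
  ]
}
```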

Release 4.2.0 - May 17, 2023

NEW

  • CCME is now supported in the AWS Stockholm region (eu-north-1)

  • AWS IAM Roles support for CCME management, lambdas and clusters

  • Automated deployment of the CCME Roles Stack (CRS) with deployCCME.sh

ENHANCEMENTS

  • CCME dependency packages are no longer required

  • Upgrading DCV to 2023.0 (15065)

  • Upgrading DCVSM to 2023.0

    • Broker: 392-1

    • Agent: 675-1

BUG FIXES

  • Added a preemptive fix for a colord profile dialog box issue in virtual DCV sessions

  • Added a fix to remove the screen lock in Gnome screensaver settings

  • Added a fix to force the minimize and maximize buttons to appear in the top-right corner of windows in Gnome-based DCV sessions

Release 4.1.0 - April 21, 2023

NEW

  • Amazon SSM is now installed on the CCME Management Host (CMH)

ENHANCEMENTS

  • Separation of public and private subnets for improved security

    • The Application Load Balancer component is created in public subnets, separated from other components

    • The Active Directory, Management Host, and cluster components are now in private subnets

  • Upgrading PCluster to 3.5.0

  • Upgrading Ansible to 4.10.0

  • Upgrading Pip to 23.0.1

  • Support usage of IMDSv2 for CCME clusters

BUG FIXES

  • xorg.conf is now configured correctly for DCV on GPU-equipped instances, with the HardDPMS option set to false (and the UseDisplayDevice option removed)

  • Fixed Amazon SSM usage on clusters, including HeadNodes and ComputeNodes

  • Cluster timeouts are now configurable, with separate variables for the HeadNode and ComputeNodes

  • Cluster update is now working correctly

  • The first visualization session, or the first job starting a dynamic node, is now executed correctly after node configuration completes

  • Fixed ALB rules creation for DCV nodes when lots of nodes are deployed at the same time.

  • EnginFrame services are no longer reset when the cluster is updated
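
The xorg.conf fix described above corresponds to a Device section along these lines. This is a sketch: the Identifier is a placeholder, and any BusID or other GPU-specific lines are omitted.

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    # HardDPMS disabled per this release's fix
    Option     "HardDPMS" "false"
    # No "UseDisplayDevice" option: it was removed by this fix
EndSection
```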

Release 4.0.0 - January 24, 2023

NEW

  • Multiple S3 Buckets can now be mounted through S3FS

  • Clusters can be deployed in VPCs different from the CCME Management Host's VPC

  • Support pre-existing AWS Application Load Balancer (ALB)

  • Support pre-existing Active Directory internal/external to AWS using LDAP

  • Support list of key:value tags for CCME Management Host and Clusters

  • Integration of custom scripts execution

  • Support optional authentication with OIDC to the EnginFrame portal

  • Support mounted file systems as the mount point for user homes by setting the fallback_homedir option in sssd

  • Support timezone configuration for CMH and cluster instances
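
The fallback_homedir mechanism above can be sketched in sssd.conf as follows. The domain name and path are placeholders; fallback_homedir itself is a standard sssd option.

```ini
# /etc/sssd/sssd.conf (excerpt, hypothetical domain)
[domain/example.com]
# Used when the directory entry provides no home directory;
# %u expands to the user name.
fallback_homedir = /shared/home/%u
```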

ENHANCEMENTS

  • Enforce TLS requirements in CCME S3 policies

  • Upgrading PCluster to 3.2.0

  • Upgrading Slurm to 21.08.8-2

  • Upgrading DCV to 2022.1 (13067)

  • Upgrading DCVSM to 2022.1

    • Broker: 355-1

    • Agent: 592-1

  • Upgrading EnginFrame to 2021.0-r1646

  • Upgrading Nvidia Drivers to 515.48.07

  • Deploy DCV on compute nodes depending on the presence of dcv in partition name(s)

  • No configuration action is required to start a first pre-configured cluster

  • The cluster policies are now generated by the ManagementHost

  • The cluster component named master has been renamed to headnode

  • Possibility to specify the authorized CIDR for the frontend ALB

  • Automated creation of private S3 bucket to use as the AWS ParallelCluster CustomS3Bucket configuration

  • Management Host public IP configuration can be set to NONE

DEPRECATED

  • Removing CCME Command Line Interface (CCME-CLI) support

  • Removing Ganglia support

Release 3.0.0 - March 23, 2021

NEW

  • Adding a common secured, load-balanced HTTPS entry point for:

    • EnginFrame portal

    • Ganglia

    • DCV sessions

  • Adding dedicated stack to deploy Management Host

  • Adding user centralized authentication through directory services (AD)

    • Secure access to cluster through selected groups in the AD

    • Secure access to Management Host through selected groups in the AD

  • Adding a CCME Command Line Interface for Management Host

    • Start, Stop, Update, Delete a cluster

    • Possibility to set a time-to-live to a cluster

  • Updating HeadNode so that it uses DCVSessionManager as its session viewer

  • Adding documentation

ENHANCEMENTS

  • Upgrading to AWS ParallelCluster 2.10.1

  • Updating Slurm to 20.02.4

  • Upgrading DCV to 2020.2 (9508)

  • Upgrading EnginFrame to 2020.0-r58

  • Adding option to specify on which partition(s) DCV should be deployed

DEPRECATED

  • Removing BeeGFS support

BUG FIXES

  • Fixing S3 Bucket policies