Remote visualization with DCV

CCME provides multiple ways of accessing remote graphical sessions to run your pre- and post-processing workloads. Remote visualization is delivered by the DCV software. All DCV sessions are made available through the ALB deployed as part of the Cluster Management Host stack. The URL to reach them is https://ALB_URL/dcv-INSTANCE_ID/#SESSION_ID, with:

  • ALB_URL: the DNS name of the ALB or CCME_DNS if you specify it

  • INSTANCE_ID: the ID of the EC2 instance on which the session runs

  • SESSION_ID: the DCV session ID
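The URL pattern above can be illustrated with a small helper. This is only a sketch: the function name and the sample values (DNS name, instance ID, session ID) are hypothetical and not part of CCME.

```python
def dcv_session_url(alb_url: str, instance_id: str, session_id: str) -> str:
    """Build a session URL following https://ALB_URL/dcv-INSTANCE_ID/#SESSION_ID."""
    host = alb_url.strip().rstrip("/")
    # Accept either a bare DNS name or one already prefixed with https://
    if host.startswith("https://"):
        host = host[len("https://"):]
    return f"https://{host}/dcv-{instance_id}/#{session_id}"


# Hypothetical values, for illustration only
print(dcv_session_url("ccme.example.com", "i-0123456789abcdef0", "my-session"))
```

The session ID sits in the URL fragment (after the #), so only the dcv-INSTANCE_ID path segment is available to the ALB to route the request to the right instance.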

By default, only 1 session is allowed per user for each type of DCV session (see below: HeadNode, a dcv* partition, or a Windows instance). If a user tries to create a second session, they get a prompt asking “Do you want to reconnect to the existing session?”. This limit can be changed by updating the corresponding EnginFrame service: in the EnginFrame “Admin Portal”, go to “Manage/Services”, edit the selected DCV service, click on “Settings” and change the “Max number of Sessions” parameter. You can also change the “Session Class”: for example, setting the same value for all DCV services allows only 1 DCV session across all of them.

Note

If you modify one of the EnginFrame DCV services generated by CCME, be aware that these services will be regenerated by CCME the next time you run pcluster update-cluster. You can safely publish unmodified DCV services, but if you need to customize them, we recommend that you first create a copy of the services, and then customize and publish these copies.

Prerequisites

Check that the limits of your AWS account match your remote visualization needs. The quotas listed below must be high enough to support your use cases:

  • EC2: ensure that you have enough vCPUs for the different types of instances you will use across all your clusters.

  • Elastic Load Balancing (ELB):

    • Rules per Application Load Balancer: if you intend to have a large cluster of remote visualization instances (Linux and/or Windows), you need to ensure that this limit is above the total number of remote visualization instances you will allow on all your clusters. Reminder: only 1 session can run on a Windows instance, but multiple sessions can run concurrently on a Linux instance; only the number of instances counts toward this limit.

    • Target Groups per Application Load Balancer: same as above, as 1 target group is created per DCV instance.
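As a rough sizing aid, the check described above can be written down as follows. This is a sketch under stated assumptions: the function name is hypothetical, the quota value must be read from your own AWS account, and rules or target groups used by anything other than DCV instances are ignored.

```python
def alb_quota_shortfall(max_dcv_instances_per_cluster: list[int], quota: int) -> int:
    """Return how many units the quota falls short by (0 if it is sufficient).

    One rule and one target group are needed per DCV instance, so the same
    check applies to both ALB quotas.
    """
    needed = sum(max_dcv_instances_per_cluster)
    return max(0, needed - quota)


# Hypothetical example: three clusters allowing 40, 50 and 30 DCV instances
print(alb_quota_shortfall([40, 50, 30], quota=100))  # 20
```

A non-zero result means a quota increase should be requested before the clusters can reach their maximum number of visualization instances.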

Linux

Headnode

A service in EnginFrame is available to start a DCV session on the HeadNode. Its name is DCV Linux - Headnode. By default, it is not published to any user.

DCV sessions on the HeadNode are not managed by Slurm, but by DCV Session Manager.

Compute nodes

All instances placed in Slurm partitions named dcv* will have DCV installed and available.

A service is automatically created in EnginFrame to start a DCV session on each dcv* queue. Each service is named DCV Linux - queue <queueName> (with queueName the name of the DCV queue). By default, it is not published to any user.

Nodes are managed by Slurm and ParallelCluster as standard compute nodes, so instances are dynamically started and terminated depending on the needs of the users.

Windows

Aside from Linux remote visualization nodes, CCME provides a way to dynamically manage Windows instances for remote visualization. In this case, the instances are not managed by Slurm and ParallelCluster, but directly by CCME. Several mechanisms are involved, the main one being a new EnginFrame plugin that manages the lifecycle of the Windows instances along with the requested remote sessions.

Prerequisites and configuration

The only prerequisite to set up a Windows DCV fleet is an AMI that meets the following requirements:

  1. OS version: Windows Server 2019 or greater

  2. Installed software

    1. DCV server: configured to work properly on the target instance type (e.g., with or without GPU), and with DCV Session Manager.

    2. DCV Session Manager Agent <https://docs.aws.amazon.com/dcv/latest/sm-admin/agent.html> (DCVSM Agent). Its final configuration will be done at startup by CCME through an EC2 User-Data script.

    3. NVidia drivers: this is optional and depends on the target instance type

    4. AWSPowerShell: tools to call AWS APIs. Note that these tools are installed by default on AWS standard Windows AMIs.

  3. User authentication: your AMI must be configured to authenticate against your Active Directory, or use local users (not recommended)

  4. Shared file systems: any shared file system that needs to be mounted on the instance must be pre-configured in the AMI. The CCME EC2 User-Data script currently does not handle any dynamic mountpoints.

You’ll then need to select an instance type compatible with the AMI you created.

To configure CCME, you’ll need to update the following variables in the ParallelCluster configuration file, in the HeadNode.CustomActions.OnNodeStart.Args parameters:

  • Instance launch configuration:

    • CCME_WIN_AMI: the AMI ID.

    • CCME_WIN_INSTANCE_TYPE: the instance type to use.

    • CCME_WIN_TAGS (optional, default {}): dictionary of additional tags to apply on the instances of the Windows fleet (see Starting a Windows DCV session for the list of default tags). The format must be a valid “YAML dictionary embedded in a string”. Hence, the whole line must be enclosed in double quotes, and then the value of CCME_WIN_TAGS must be the dict enclosed in escaped double quotes. See the following example: "CCME_WIN_TAGS=\"{'MyTagKey1': 'MyTagValue1', 'MyTagKey2': 'MyTagValue2'}\"".

    • CCME_WIN_CUSTOM_CONF_REBOOT (optional, default false): set this to true if your PowerShell customization script winec2_config.ps1 reboots the instance.

  • Session launch monitoring from EnginFrame:

    • CCME_WIN_LAUNCH_TRIGGER_DELAY (optional, default 10): Specify the delay (in seconds) between two rounds performed by an EnginFrame trigger to check if the instance has started and if the DCVSM agent on the instance has registered to the DCVSM broker on the HeadNode.

    • CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS (optional, default 100): Maximum number of rounds allowed to check the instance startup (hence, the maximum delay for an instance to start and join DCVSM is CCME_WIN_LAUNCH_TRIGGER_DELAY * CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS, by default 10 * 100 = 1000 seconds)

  • Session cleanup configuration:

    • CCME_WIN_INACTIVE_SESSION_TIME (optional, default 600): the instance is terminated if no DCV session has been active on the Windows instance for more than CCME_WIN_INACTIVE_SESSION_TIME seconds. If the value is 0, this control is deactivated.

    • CCME_WIN_NO_SESSION_TIME (optional, default 600): the instance is terminated if no DCV session is up and running on the Windows instance and the instance was started more than CCME_WIN_NO_SESSION_TIME seconds ago. If the value is 0, this control is deactivated.

    • CCME_WIN_NO_BROKER_COMMUNICATION_TIME (optional, default 600): time in seconds after which the instance is terminated if the DCVSM broker on the headnode cannot be contacted from the Windows instance. If the value is 0, this control is deactivated.
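The quoting rule for CCME_WIN_TAGS and the startup time budget are easy to get wrong by hand. The following sketch generates the quoted line from a dictionary and computes the maximum startup wait. The helper names are hypothetical and not part of CCME; the sketch also assumes that tag keys and values contain no quotes or backslashes.

```python
def ccme_win_tags_arg(tags: dict[str, str]) -> str:
    """Return the Args entry for CCME_WIN_TAGS: the whole line in double
    quotes, with the dict itself wrapped in escaped double quotes."""
    for text in list(tags) + list(tags.values()):
        if any(ch in text for ch in "'\"\\"):
            raise ValueError(f"unsupported character in tag text: {text!r}")
    inner = "{" + ", ".join(f"'{k}': '{v}'" for k, v in tags.items()) + "}"
    return f'"CCME_WIN_TAGS=\\"{inner}\\""'


def max_startup_wait_seconds(delay: int = 10, max_rounds: int = 100) -> int:
    """CCME_WIN_LAUNCH_TRIGGER_DELAY * CCME_WIN_LAUNCH_TRIGGER_MAX_ROUNDS."""
    return delay * max_rounds


print(ccme_win_tags_arg({"MyTagKey1": "MyTagValue1", "MyTagKey2": "MyTagValue2"}))
print(max_startup_wait_seconds())         # 1000 with the default values
print(max_startup_wait_seconds(15, 120))  # 1800
```

The first call prints the same line as the CCME_WIN_TAGS example given above, ready to be pasted as an entry of HeadNode.CustomActions.OnNodeStart.Args.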

Note

If you need to have multiple configurations for your Windows interactive sessions in a single CCME cluster, you can override the values of CCME_WIN_AMI, CCME_WIN_INSTANCE_TYPE, CCME_WIN_INACTIVE_SESSION_TIME, CCME_WIN_NO_SESSION_TIME and CCME_WIN_NO_BROKER_COMMUNICATION_TIME that you have set in ParallelCluster configuration by exporting these variables with the expected values in the action script of your EnginFrame services (e.g., export CCME_WIN_INSTANCE_TYPE=g4dn.xlarge).

In this case, the variables set in ParallelCluster configuration file will be used as default values, and the variables set in EnginFrame services will be specific to the service.

You can thus create multiple DCV Windows services with different configurations. For example:

  • Allow long-lasting sessions by setting CCME_WIN_INACTIVE_SESSION_TIME to 0.

  • Propose multiple sizes of instances depending on the application/use-case your users have (e.g., g4dn.12xlarge for heavy 3D modeling, m6a.large for 2D-only sessions…).
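The precedence between cluster-wide defaults and per-service exports can be modeled as below. This only illustrates the documented behavior and is not CCME's actual implementation; the function name and the AMI ID are hypothetical.

```python
# Variables that the text above lists as overridable from an EnginFrame service
OVERRIDABLE = (
    "CCME_WIN_AMI",
    "CCME_WIN_INSTANCE_TYPE",
    "CCME_WIN_INACTIVE_SESSION_TIME",
    "CCME_WIN_NO_SESSION_TIME",
    "CCME_WIN_NO_BROKER_COMMUNICATION_TIME",
)


def effective_settings(cluster_defaults: dict[str, str],
                       service_exports: dict[str, str]) -> dict[str, str]:
    """Values exported in the service action script win over the defaults
    set in the ParallelCluster configuration file."""
    settings = dict(cluster_defaults)
    for name in OVERRIDABLE:
        if name in service_exports:
            settings[name] = service_exports[name]
    return settings


defaults = {
    "CCME_WIN_AMI": "ami-0123456789abcdef0",  # hypothetical AMI ID
    "CCME_WIN_INSTANCE_TYPE": "g4dn.xlarge",
    "CCME_WIN_INACTIVE_SESSION_TIME": "600",
}
# A "2D-only, long-lasting" service overriding two values
print(effective_settings(defaults, {
    "CCME_WIN_INSTANCE_TYPE": "m6a.large",
    "CCME_WIN_INACTIVE_SESSION_TIME": "0",
}))
```

Variables that a service does not export keep the cluster-wide value, so each service only needs to state what differs from the default.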

Note

If you need to update the AMI for your Windows instances (e.g., to apply a security update), you have two ways of doing it:

  1. Export CCME_WIN_AMI with the new AMI ID in the DCV services in EnginFrame. If you have multiple Windows DCV services, you will need to export this variable in all of them.

  2. Update the CCME_WIN_AMI variable in the ParallelCluster configuration file. For that, you will need to stop the compute fleet of your cluster, update the cluster, and then start the compute fleet again.

Windows session lifecycle

Starting a Windows DCV session

To create a Windows DCV session, you first need to publish the DCV Windows service that is automatically created in EnginFrame. Log in as an administrator (e.g., efadmin), click on Switch to Admin View and then on Services, select the DCV Windows service and Publish the service to the selected groups of users.

Whenever a user requests a new Windows session through the DCV Windows service published above, the request goes through the following process:

  • A new EC2 instance is launched.

    • Each session is tied to a specific EC2 instance: it is started when requested and terminated when the session is closed (see Terminating a DCV session and instance cleanup to prevent overspendings for details about EC2 instance termination).

    • The instance is launched with a Launch Template created by the Management Host (see the CCME_WIN_LAUNCH_TEMPLATE_ID parameter in the ParallelCluster configuration file in HeadNode.CustomActions.OnNodeStart.Args). It is started in the same network as the compute nodes.

    • The AMI ID used to launch the instance is the one specified by CCME_WIN_AMI.

    • The instance runs a specific EC2 user-data PowerShell script during startup. This script is responsible for the following items:

      • DCV: configuration of the web-url-path parameter to match the path used with the ALB, authentication with the DCVSM Broker, and disabling of the automatic console session.

      • DCVSM agent: configuration of the connectivity to the DCVSM Broker on the HeadNode (IP, Certificate), tags (same tags as the EC2 instance, see Cluster termination).

      • A scheduled task to clean up the instance in some cases (see Instance cleanup ScheduledTask).

      • Running a custom PowerShell script if it exists: if you put a winec2_config.ps1 file inside the custom directory of CCME in its S3 bucket, then this script will be executed during startup. You can use this script for example to mount additional file systems, add further configurations for Windows, or even join the instance to your Active Directory. Important: if this script reboots the instance, then you need to set CCME_WIN_CUSTOM_CONF_REBOOT to true.

      • Sending a message to SNS to register the instance with the CCME ALB.

      • Restarting the DCV and DCVSM Agent services.

    • The instance is tagged with the following tags:

      • All tags present in the Tags section of the ParallelCluster configuration file and parallelcluster:* tags created by ParallelCluster.

      • ccme_EF_USER: the username of the user who requested the session

      • ccme_CLUSTERNAME: the name of the cluster

      • ccme_HEADNODE_IP: the IP address of the HeadNode of the cluster

      • ccme_interactive_sessionUri: URI internal to EnginFrame to identify the session

      • ccme_EF_SPOOLER_NAME: name of the Spooler attached to the session in EnginFrame

      • parallelcluster:node-type: this tag cannot be changed and its value is always CCME_DCV. This allows the instances to be terminated along with the cluster.

      • All tags present in CCME_WIN_TAGS: tags described in this variable supersede the ones present in the Tags section of the ParallelCluster configuration file.

  • The user is redirected to the Sessions view of EnginFrame, and has to wait for the instance to start (this can take several minutes). The instance is automatically configured to join the DCVSM cluster, and a set of tags is applied so that it is recognized as belonging to the user and the requested DCV session. Once the session is ready, the user is redirected to the DCV session in their browser, or downloads a DCV file containing the connection information.

  • DCV network streams will go through the Application Load Balancer (ALB) deployed by the Management Stack, just like the Linux DCV sessions.
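The tagging rules listed in the process above can be summarized as a merge with a fixed precedence. This is a sketch, not CCME's code: the function name is hypothetical, and applying the ccme_* session tags after CCME_WIN_TAGS is an assumption, since the text does not state which of the two wins.

```python
def build_instance_tags(cluster_tags: dict[str, str],
                        session_tags: dict[str, str],
                        ccme_win_tags: dict[str, str]) -> dict[str, str]:
    """Merge the tag sources applied to a Windows DCV instance."""
    tags = dict(cluster_tags)    # Tags section + parallelcluster:* tags
    tags.update(ccme_win_tags)   # CCME_WIN_TAGS supersede the Tags section
    tags.update(session_tags)    # ccme_* tags (ordering is an assumption)
    # This tag cannot be changed: its value is always CCME_DCV
    tags["parallelcluster:node-type"] = "CCME_DCV"
    return tags


# Hypothetical values, for illustration only
print(build_instance_tags(
    cluster_tags={"Project": "cfd", "parallelcluster:cluster-name": "demo"},
    session_tags={"ccme_EF_USER": "alice", "ccme_CLUSTERNAME": "demo"},
    ccme_win_tags={"Project": "viz", "parallelcluster:node-type": "custom"},
))
```

In this example the Project tag ends up as viz, because CCME_WIN_TAGS overrides the Tags section, while parallelcluster:node-type stays CCME_DCV whatever the configuration says.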

image_submit_winec2_session

Limiting the number of sessions per user

EnginFrame provides a way to limit the number of sessions a user can launch at the same time.

Edit the DCV Windows service that you have created in EnginFrame (as an administrator, click on Switch to Admin View, then Manage/Services and edit your service), and click on Settings. There you’ll find two parameters that you can configure to limit the number of sessions:

  • Session Class: a String used to identify the type of session you are running. You can simply set it to windows for example.

  • Max number of Sessions: the number of concurrent sessions allowed to a user. If you leave it empty, users can launch as many sessions as they want, though they’ll be prompted to reuse an existing session if any. If you set it to 1, then only 1 session will be allowed.

Terminating a DCV session and instance cleanup to prevent overspendings

CCME embeds a set of functions and safeguards to prevent overspending and to terminate all Windows instances that are no longer required. This section presents these mechanisms.

Instance shutdown

All Windows instances are launched through a CCME EC2 launch template. This launch template sets the InstanceInitiatedShutdownBehavior parameter of the instance to terminate. This means that if a shutdown command is issued on the instance, the instance is terminated instead of being stopped, so you do not keep paying for the attached storage.

Closing the session from EnginFrame

The best and fastest way to clean up a Windows instance is simply to close the session from EnginFrame. The CCME EF plugin terminates the instance associated with the session as soon as you request to close it.

Warning

Simply disconnecting from the DCV client or from Windows will not directly trigger the termination of the instance. However, the instance might be terminated after CCME_WIN_INACTIVE_SESSION_TIME seconds (see Instance cleanup ScheduledTask).

image_close_winec2_session

EnginFrame cleanup trigger

The CCME EF plugin embeds an EnginFrame trigger that periodically (every 60 seconds) checks known DCVSM hosts and DCV sessions. If it detects that a host no longer has a session (e.g., because the session was closed from within Windows rather than from EnginFrame), the trigger terminates the corresponding instance.
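The comparison performed by this trigger boils down to finding hosts that have no session left. A minimal sketch, assuming the host list and the session list have already been retrieved (the function name and data shapes are hypothetical, and the real trigger may apply additional checks):

```python
def hosts_without_sessions(known_hosts: list[str],
                           sessions: list[tuple[str, str]]) -> list[str]:
    """Return the hosts that no longer run any DCV session.

    sessions is a list of (session_id, host_id) pairs.
    """
    busy_hosts = {host_id for _, host_id in sessions}
    return sorted(host for host in set(known_hosts) if host not in busy_hosts)


# Hypothetical instance IDs: i-bbb lost its session and would be terminated
print(hosts_without_sessions(
    ["i-aaa", "i-bbb", "i-ccc"],
    [("session-1", "i-aaa"), ("session-2", "i-ccc")],
))  # ['i-bbb']
```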

image_EFTrigger_winec2_session

Instance cleanup ScheduledTask

All Windows instances are configured with a periodic ScheduledTask that terminates the instance if one of the following conditions is met:

  1. No DCV session has been active for more than CCME_WIN_INACTIVE_SESSION_TIME seconds.

  2. No DCV session is available and more than CCME_WIN_NO_SESSION_TIME seconds have passed since the instance was started.

  3. The DCVSM broker (on the headnode) has been unreachable for more than CCME_WIN_NO_BROKER_COMMUNICATION_TIME seconds.
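The three conditions above, together with the rule that a value of 0 deactivates a control, can be expressed as a small decision function. This is a sketch in Python for illustration only: the real task runs on Windows, and the InstanceState fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceState:
    uptime_seconds: float                # time since the instance started
    session_exists: bool                 # a DCV session is available
    seconds_since_active_session: float  # time since a session was last active
    seconds_since_broker_contact: float  # time since the broker last answered


def termination_reason(state: InstanceState,
                       inactive_session_time: int = 600,
                       no_session_time: int = 600,
                       no_broker_communication_time: int = 600) -> Optional[str]:
    """Return why the instance should be terminated, or None to keep it.

    A limit of 0 deactivates the corresponding control.
    """
    if (inactive_session_time > 0
            and state.seconds_since_active_session > inactive_session_time):
        return "inactive session"
    if (no_session_time > 0 and not state.session_exists
            and state.uptime_seconds > no_session_time):
        return "no session"
    if (no_broker_communication_time > 0
            and state.seconds_since_broker_contact > no_broker_communication_time):
        return "broker unreachable"
    return None


idle = InstanceState(3600, True, 700, 5)
print(termination_reason(idle))                           # inactive session
print(termination_reason(idle, inactive_session_time=0))  # None
```

The second call shows the long-lasting session case: with CCME_WIN_INACTIVE_SESSION_TIME set to 0, an idle session no longer causes termination, while the two other controls stay in force.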

image_winScheduledTask_winec2_session

Cluster termination

All Windows instances launched as part of a CCME cluster inherit the parallelcluster:cluster-name tag from the ParallelCluster cluster, and are also tagged with parallelcluster:node-type=CCME_DCV (the role associated with the AWS ParallelCluster Cleanup Lambda has been updated to allow cleanup of instances with this tag). Thus, when you terminate the cluster, all instances with the same parallelcluster:cluster-name tag as the cluster are terminated by ParallelCluster through its cleanup Lambda.

image_pclusterCleanupResources_winec2_session