Cluster deployment
Deploy Cluster in debug mode
To debug the deployment of a cluster, add --rollback-on-failure false to disable rollback in CloudFormation:
$ pcluster create-cluster --cluster-name ClusterName --cluster-configuration configuration_file.yaml --region $REGION --rollback-on-failure false
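Because rollback is disabled, the failed stack and its resources are kept until you delete them yourself. Once you have finished debugging, remove the cluster with the standard delete command (ClusterName and $REGION match the example above):
$ pcluster delete-cluster --cluster-name ClusterName --region $REGION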
See deployment logs
Installation logs location: /var/log/
You can also see the logs in AWS CloudWatch under your cluster name. For more information, see AWS ParallelCluster in CloudWatch.
CCME logs are also present in the same CloudWatch group: search for ccme. in the log group to find all CCME logs.
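For example, you can list the CCME log streams from the command line with the AWS CLI (the log group name below is a placeholder; use the one created for your cluster):
$ aws logs describe-log-streams --log-group-name /aws/parallelcluster/<cluster_name>-<StackId> --query 'logStreams[].logStreamName' --output text | tr '\t' '\n' | grep ccme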
Accessing CCME logs
All logs are available on the Headnode and compute nodes of the cluster in /var/log, and also in the AWS CloudWatch LogGroup created with the cluster (see Amazon CloudWatch Logs cluster logs).
The CloudWatch LogGroup /aws/parallelcluster/<cluster_name>-<StackId> includes both the AWS ParallelCluster logs and the CCME logs.
The retention periods, in days, are:
For the CloudWatch LogGroup: the retention period is defined by the ccme_logs_retention_in_days CMH parameter; the default value is 14 days.
For the Instances: the retention period is 7 days.
All CCME logs start with the ccme. prefix. For example, to understand why the pre-install script failed to run, look at /var/log/ccme.pre-install.log; for the post-install script, look at /var/log/ccme.post-install.log.
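For instance, on the Headnode you can quickly check the end of the pre-install log (a simple illustration; any pager or log viewer works as well):
$ sudo tail -n 50 /var/log/ccme.pre-install.log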
Prevent compute nodes from being killed when there is a problem
AWS ParallelCluster will kill any compute node that has an issue.
If this happens too often, or at every startup of a node, check the logs in /var/log/ccme*.log to identify the problem.
To prevent nodes from being killed, log in to your AWS console, go to EC2, and identify the instance you want to protect. Then apply the following settings to the instance (Actions / Instance settings); a command-line equivalent is shown after the list:
Change termination protection: Enable
Change stop protection: Enable
Change shutdown behavior: Stop
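The same protections can be set with the AWS CLI. This is a minimal sketch, assuming a recent AWS CLI and i-0123456789abcdef0 as a placeholder for the instance identified above:
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --disable-api-termination
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --disable-api-stop
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --attribute instanceInitiatedShutdownBehavior --value stop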
Accessing Slurm logs
Slurm logs are available on the Headnode of the cluster in /var/log/slurmctld.log, and on the Compute Nodes of the cluster in /var/log/slurmd.log.
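For example, to follow the scheduler log on the Headnode while reproducing an issue (tail is just one convenient option):
$ sudo tail -f /var/log/slurmctld.log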
Accessing EnginFrame through Application Load Balancer
The Application Load Balancer (ALB) has a timeout value of 300 seconds. This timeout can be reached when uploading or downloading a large amount of data.
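If users regularly hit this limit, you can raise the idle timeout on the ALB. A minimal sketch with the AWS CLI, assuming $ALB_ARN holds the ARN of the cluster's load balancer and 1800 seconds as an example value:
$ aws elbv2 modify-load-balancer-attributes --load-balancer-arn $ALB_ARN --attributes Key=idle_timeout.timeout_seconds,Value=1800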
Users see "user (USER1) is not authorized to modify attributes of spooler" errors in EnginFrame portal
Whenever a user tries to submit a job or a VDI session, they see the following error:
user (USER1) is not authorized to modify attributes of spooler
If you see errors similar to the following in the ef.log file, then you may have an issue with the users' identities as seen by EnginFrame.
2024/Jul/01 15:56.24 ERROR TID [62707] USER1 EnginFrameServlet.doService(EnginFrameServlet.java:234): handling request
java.lang.NullPointerException: Neither key nor value can be loaded as null. [mapName: spooler.map, key: spooler:///shared/nice/spoolers/USER2/tmp11000975990629747030.session.ef, value: null]
You can first try to restart EnginFrame, which usually fixes the issue: systemctl restart enginframe.
It sometimes happens that you need to clear the SSSD cache, as described here.
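As a sketch, the whole SSSD cache can be invalidated with the sss_cache tool shipped with SSSD (the -E flag invalidates all cached entries; refer to the procedure linked above for your exact case):
$ sudo sss_cache -E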
You can clean up the persistence data for a specific user by following this procedure, which requires a short period of unavailability of the EnginFrame service (a scripted sketch is shown after the list):
make the user close all his/her sessions
identify all files belonging to [USERNAME] in the repository, sessions and spoolers directories:
ls /opt/nice/enginframe/{repository,sessions,spoolers}/[USERNAME]/*
stop EnginFrame: systemctl stop enginframe
remove all files identified by the previous ls command
start EnginFrame: systemctl start enginframe
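Put together, the steps can be scripted as follows. This is a minimal sketch: USER1 is a placeholder for the actual username, and the user must have closed all of his/her sessions first:
$ systemctl stop enginframe
$ rm -rf /opt/nice/enginframe/{repository,sessions,spoolers}/USER1/*
$ systemctl start enginframe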
Slow instance startup
To speed up instance startup, create a CCME AMI.
If possible, do not resize the EBS ComputeSettings.LocalStorage.RootVolume.Size, because a growpart phase will then occur during boot, and it can take several minutes to finish.
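For reference, this setting lives in the AWS ParallelCluster configuration file. A sketch with placeholder values, where leaving Size commented out keeps the AMI's default size and avoids the growpart step:
Scheduling:
  SlurmQueues:
    - Name: queue1
      ComputeSettings:
        LocalStorage:
          RootVolume:
            # Size: 100  # resizing triggers a growpart phase at boot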