Cluster deployment
Deploy a cluster in debug mode
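To debug the deployment of a cluster, add --rollback-on-failure false to the create-cluster command. This disables the CloudFormation rollback, so the resources created before the failure are kept and can be inspected: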
$ pcluster create-cluster --cluster-name ClusterName --cluster-configuration configuration_file.yaml --region $REGION --rollback-on-failure false
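With rollback disabled, the resources of a failed deployment are kept instead of being rolled back. The sketch below wraps the command above in a shell function and adds a second one that prints the CloudFormation stack events of the cluster, where the status reason of each failed resource is reported. It assumes the ParallelCluster 3 CLI (pcluster) is installed and configured; the function names create_debug_cluster and show_stack_events are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: deploy a cluster with CloudFormation rollback disabled, then read the
# stack events if the deployment fails.
# Assumes the ParallelCluster 3 CLI ("pcluster") is installed and configured.

# create_debug_cluster NAME CONFIG REGION   (hypothetical helper name)
create_debug_cluster() {
  local name="$1" config="$2" region="$3"
  pcluster create-cluster \
    --cluster-name "$name" \
    --cluster-configuration "$config" \
    --region "$region" \
    --rollback-on-failure false
}

# show_stack_events NAME REGION   (hypothetical helper name)
# Prints the CloudFormation stack events of the cluster; failed resources
# carry a status reason explaining the failure.
show_stack_events() {
  local name="$1" region="$2"
  pcluster get-cluster-stack-events --cluster-name "$name" --region "$region"
}
```

For example, run create_debug_cluster ClusterName configuration_file.yaml "$REGION"; if pcluster describe-cluster then reports CREATE_FAILED, run show_stack_events ClusterName "$REGION". Because the failed resources are kept, delete the cluster with pcluster delete-cluster once the problem is understood.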
See deployment logs
Installation logs location: /var/log/
You can also see the logs in AWS CloudWatch, in the log group named after your cluster. For more information, see AWS ParallelCluster in CloudWatch.
CCME logs are also present in the same CloudWatch log group: search for ccme. in the log group to find all CCME logs.
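To script this search, the sketch below uses the AWS CLI to find the log group of a cluster and list the log streams whose name contains the string "ccme.". It assumes AWS CLI v2 and the default ParallelCluster 3 log group naming, /aws/parallelcluster/<cluster-name>-<timestamp>; when several log groups match the prefix (for example a cluster whose name extends another one), only the first is used. The function names find_cluster_log_group and list_ccme_streams are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the CCME log streams of a cluster in CloudWatch Logs.
# Assumptions: AWS CLI v2 is configured, and the cluster uses the default
# ParallelCluster 3 log group name "/aws/parallelcluster/<cluster-name>-<timestamp>".

# find_cluster_log_group NAME REGION   (hypothetical helper name)
find_cluster_log_group() {
  local name="$1" region="$2" group
  group="$(aws logs describe-log-groups \
    --region "$region" \
    --log-group-name-prefix "/aws/parallelcluster/${name}-" \
    --query 'logGroups[0].logGroupName' \
    --output text)"
  # The CLI prints "None" in text output when the query matches nothing.
  if [ -z "$group" ] || [ "$group" = "None" ]; then
    echo "no log group found for cluster ${name}" >&2
    return 1
  fi
  echo "$group"
}

# list_ccme_streams LOG_GROUP REGION   (hypothetical helper name)
# Prints the names of the log streams that contain "ccme." in their name.
list_ccme_streams() {
  local group="$1" region="$2"
  aws logs describe-log-streams \
    --region "$region" \
    --log-group-name "$group" \
    --query "logStreams[?contains(logStreamName, 'ccme.')].logStreamName" \
    --output text
}
```

For example: list_ccme_streams "$(find_cluster_log_group ClusterName "$REGION")" "$REGION".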
Accessing CCME logs
All logs are available in /var/log on the head node and compute nodes of the cluster, and also in the AWS CloudWatch log group created with the cluster (see Amazon CloudWatch Logs cluster logs).
All CCME log files start with the ccme. prefix. For example, to see why the pre-install script failed to run, check /var/log/ccme.pre-install.log; for the post-install script, check /var/log/ccme.post-install.log.
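On a node, the end of each CCME log is often the quickest place to see which step failed. The sketch below prints the last lines of every ccme.*.log file in /var/log; the function name show_ccme_logs and its optional directory and line-count arguments are hypothetical. Run it from a shell that can read /var/log (for example a root shell).

```shell
#!/usr/bin/env bash
# Sketch: print the end of every CCME log file so that the failing step
# (for example the pre-install or post-install script) is easy to spot.
# LOG_DIR defaults to /var/log; pass another directory to inspect copied logs.

# show_ccme_logs [LOG_DIR] [LINES]   (hypothetical helper name)
show_ccme_logs() {
  local log_dir="${1:-/var/log}" lines="${2:-20}" f found=0
  for f in "$log_dir"/ccme.*.log; do
    [ -e "$f" ] || continue   # the glob matched nothing
    found=1
    echo "==> $f <=="
    tail -n "$lines" "$f"
  done
  if [ "$found" -eq 0 ]; then
    echo "no CCME log found in $log_dir" >&2
    return 1
  fi
}
```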
Prevent compute nodes from being killed when there is a problem
AWS ParallelCluster kills (terminates) any compute node that has an issue, for example a node that does not start correctly.
If this happens too often, or every time a node starts, check the logs in /var/log/ccme*.log to find the cause of the problem.
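To search all CCME logs for failures at once, a grep over the same files is often enough. The sketch below assumes that the relevant lines contain "error" or "fail" (case-insensitive), which may not match every message written by the scripts; find_ccme_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the lines of the CCME logs that look like errors.
# The "error|fail" pattern is an assumption: adapt it to the messages
# actually written by the scripts you are debugging.

# find_ccme_errors [LOG_DIR]   (hypothetical helper name)
find_ccme_errors() {
  local log_dir="${1:-/var/log}"
  # -H: print the file name, -n: print the line number,
  # -i: ignore case, -E: extended regular expression.
  # grep exits with status 1 when no line matches.
  grep -HniE 'error|fail' "$log_dir"/ccme*.log
}
```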
To prevent a node from being killed, log in to the AWS Management Console, open the EC2 console, and identify the instance you want to protect. Then apply the following settings to the instance (Actions > Instance settings):
Change termination protection: Enable
Change stop protection: Enable
Change shutdown behavior: Stop
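The same three settings can also be applied from the AWS CLI, which helps when a node must be protected quickly after it starts. This sketch assumes AWS CLI v2 credentials allowed to call ec2:ModifyInstanceAttribute; modify-instance-attribute changes only one attribute per call, hence the three calls. protect_instance is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: AWS CLI equivalent of the three console settings listed above.
# Assumes AWS CLI v2 with permission for ec2:ModifyInstanceAttribute.
# modify-instance-attribute accepts one attribute per call.

# protect_instance INSTANCE_ID REGION   (hypothetical helper name)
protect_instance() {
  local id="$1" region="$2"
  # Change termination protection: Enable
  aws ec2 modify-instance-attribute --region "$region" --instance-id "$id" \
    --disable-api-termination || return 1
  # Change stop protection: Enable
  aws ec2 modify-instance-attribute --region "$region" --instance-id "$id" \
    --disable-api-stop || return 1
  # Change shutdown behavior: Stop
  aws ec2 modify-instance-attribute --region "$region" --instance-id "$id" \
    --instance-initiated-shutdown-behavior Value=stop
}
```

For example: protect_instance i-0123456789abcdef0 "$REGION". When debugging is over, revert the settings (--no-disable-api-termination, --no-disable-api-stop, and the original shutdown behavior) so that AWS ParallelCluster can manage the node again.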
Accessing Slurm logs
Slurm logs are available on the head node of the cluster in:
Slurm logs are available on the compute nodes of the cluster in: