Out-of-the-box metrics for EC2 cover disk, CPU, and network; for more granularity, use the CloudWatch Unified Agent
CloudWatch Alarms
Alarms are used to trigger notifications for any metric
Alarms can go to Auto Scaling, EC2 Actions, SNS notifications
An alarm can evaluate different statistics of a metric: sample count, average, sum, percentile, max, min, etc.
Alarm states:
OK
INSUFFICIENT_DATA
ALARM
Period:
Length of time, in seconds, over which the metric is evaluated
With high-resolution custom metrics, we can choose a period of 10 or 30 seconds for the alarm
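As a rough sketch of how the statistic, period, and target action fit together, the Python below builds the arguments for an alarm on EC2 CPU utilization. The helper name `build_cpu_alarm`, the alarm name, the threshold, and the SNS topic ARN are illustrative assumptions. The dictionary keys follow the parameters of boto3's `put_metric_alarm` call, which is only mentioned in a comment so that the snippet runs without AWS credentials.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold=80.0, period=300):
    """Return keyword arguments for an alarm on average CPU utilization.

    Hypothetical helper: the names and default values are for illustration only.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",        # could also be Minimum, Maximum, Sum, SampleCount
        "Period": period,              # seconds per evaluation window
        "EvaluationPeriods": 2,        # consecutive windows that must breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # here: notify an SNS topic
    }


# Made-up instance ID and topic ARN, for illustration only.
params = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

With these values the alarm would move to ALARM once average CPU stays above 80% for two consecutive 5-minute periods.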
EC2 Instance Recovery
Status Checks:
Instance status = check the EC2 VM
System status = check the underlying hardware
A CloudWatch alarm on a failed status check can trigger an action called Instance Recovery; the recover action applies to the system status check (metric StatusCheckFailed_System). AWS then recovers the instance through an internal mechanism
After an instance recovery, the instance keeps the same private, public, and Elastic IP addresses, the same metadata, and the same placement group
Any data stored on an instance store is lost
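A recovery alarm can be sketched in the same way as any other alarm. The example below assumes the alarm watches the `StatusCheckFailed_System` metric and uses the `arn:aws:automate:<region>:ec2:recover` action. The helper name `build_recovery_alarm`, the alarm name, and the period and evaluation settings are assumptions chosen for illustration, not recommended values.

```python
def build_recovery_alarm(instance_id, region):
    """Return keyword arguments for an alarm that recovers an instance.

    Hypothetical helper: the alarm name and timing values are illustrative.
    """
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",  # underlying hardware check
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 action instead of an SNS topic: ask AWS to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Made-up instance ID, for illustration only.
recovery = build_recovery_alarm("i-0123456789abcdef0", "us-east-1")
# With boto3 installed and credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**recovery)
```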
AWS CloudWatch Events
CloudWatch events can be:
Scheduled: cron job
Event pattern: event rules to react to a service doing something
CloudWatch events can trigger a Lambda function, or can send SQS/SNS/Kinesis messages
A CloudWatch event creates a small JSON document to give information about the change
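To make the two rule types and the JSON document more concrete, here is a minimal sketch. `SAMPLE_EVENT` imitates the shape of an EC2 state-change event with made-up IDs, account number, and timestamp. `EVENT_PATTERN` and `SCHEDULE` show what an event-pattern rule and a scheduled rule might contain. The `matches` function is a deliberately simplified, hypothetical stand-in for the service's matching logic (exact values only, no prefix or numeric matching) and is here only to show the idea.

```python
import json

# Simplified example of the JSON document describing a change (values are made up).
SAMPLE_EVENT = json.loads("""
{
  "version": "0",
  "id": "11111111-2222-3333-4444-555555555555",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2024-01-01T12:00:00Z",
  "region": "us-east-1",
  "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"],
  "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"}
}
""")

# Event pattern rule: react when an EC2 instance is stopped or terminated.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Scheduled rule: cron-style expression, here every day at 12:00 UTC.
SCHEDULE = "cron(0 12 * * ? *)"


def matches(pattern, event):
    """Simplified matcher: each pattern key must exist in the event, and the
    event's value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


print(matches(EVENT_PATTERN, SAMPLE_EVENT))

# With boto3 installed and credentials configured, rules could be created with:
#   events = boto3.client("events")
#   events.put_rule(Name="ec2-stopped", EventPattern=json.dumps(EVENT_PATTERN))
#   events.put_rule(Name="daily-noon", ScheduleExpression=SCHEDULE)
```

A target such as a Lambda function would then receive the full JSON document as its input and could read, for example, `event["detail"]["instance-id"]`.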