Scalability means that a system can handle greater loads by adapting
We can distinguish two types of scalability strategies:
Vertical Scalability (scale up/down)
Increase the size of the current instance (e.g. migrate from a t2.micro instance to a t2.large one)
Vertical scalability is common for non-distributed systems, such as databases
RDS, ElastiCache are services that can scale vertically
Horizontal Scalability (scale out/in)
Increase the number of instances on which the application runs
Horizontal scaling implies having a distributed system
It’s easy to scale horizontally thanks to cloud offerings such as EC2
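As a rough illustration of vertical scaling, the sketch below resizes a hypothetical EC2 instance with boto3; the instance ID and target type are placeholders, and the instance must be stopped before its type can be changed:

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# Vertical scaling: stop the instance, change its type, start it again
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "t2.large"},  # scale up, e.g. from t2.micro
)

ec2.start_instances(InstanceIds=[instance_id])
```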
High Availability means running our application in at least 2 data centers (AZs)
The goal of high availability is to survive a data center loss
High Availability can be:
Passive (e.g. RDS Multi-AZ: a standby instance takes over on failure)
Active (e.g. horizontal scaling: all instances serve traffic)
Load balancers can scale but not instantaneously - contact AWS for a “warm-up”
Troubleshooting:
4xx errors are client induced errors
5xx errors are application induced errors (server side errors)
Error 503 means that the load balancer is at capacity or no registered targets can be found
If the load balancer can’t connect to the application, it most likely means that the security group blocks the connection
Monitoring:
ELB access logs will log all the access requests to the LB
CloudWatch Metrics will give aggregate statistics (example: connections counts)
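A minimal sketch (assuming boto3 and placeholder ALB ARN, S3 bucket and LB name) of enabling access logs and reading an aggregate CloudWatch metric:

```python
import boto3
from datetime import datetime, timedelta

elbv2 = boto3.client("elbv2")
cloudwatch = boto3.client("cloudwatch")

lb_arn = "arn:aws:elasticloadbalancing:...:loadbalancer/app/my-alb/123"  # placeholder

# Ship ELB access logs to an S3 bucket (bucket name is a placeholder)
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn=lb_arn,
    Attributes=[
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": "my-elb-access-logs"},
    ],
)

# Aggregate statistics from CloudWatch, e.g. active connection count over the last hour
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ApplicationELB",
    MetricName="ActiveConnectionCount",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/123"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
```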
Load Balancing Basics
Load balancers are servers that forward internet traffic to multiple other servers (most likely EC2 instances)
Why use load balancers?
Spread load across multiple downstream instances
Expose a single point of access (DNS) to the application
Seamlessly handle failures of downstream instances (by using health checks)
Do regular health checks on registered instances
Provide SSL termination (HTTPS) for the website hosted on the downstream instances
Enforce stickiness with cookies (route a given user to the same instance)
High availability across availability zones (a load balancer can be spread across multiple AZs, but not across regions!)
Cleanly separate public traffic from private traffic
An ELB (Elastic Load Balancer) is a managed load balancer which means:
AWS guarantees that it will be working
AWS takes care of upgrades, maintenance and high availability
An ELB provides a few configuration options for us also
It costs less to set up our own load balancer, but it will take a lot more effort to maintain in the long run
An ELB is integrated with many AWS offerings/services, which makes it more flexible than a custom LB
Health Checks
They enable the LB to know whether the instances it forwards traffic to are available to reply to requests
The health check is done using a port and a route (usually /health)
If the response is not 200, then the instance is considered unhealthy
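A minimal sketch, assuming boto3 and placeholder VPC/target-group names, of configuring a /health check on a target group and inspecting target health:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Target group with an HTTP health check on /health (names/IDs are placeholders)
tg = elbv2.create_target_group(
    Name="my-app-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",
    HealthCheckIntervalSeconds=30,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
    Matcher={"HttpCode": "200"},  # anything other than 200 => unhealthy
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

# Check which registered targets are currently healthy
health = elbv2.describe_target_health(TargetGroupArn=tg_arn)
for t in health["TargetHealthDescriptions"]:
    print(t["Target"]["Id"], t["TargetHealth"]["State"])
```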
Types of Load Balancers on AWS
AWS provides 4 types of load balancers:
Classic Load Balancer (v1 - old generation): supports HTTP, HTTPS and TCP
Application Load Balancer (v2 - new generation): supports HTTP, HTTPS and WebSockets
Network Load Balancer (v2 - new generation): supports TCP, TLS (secure TCP) and UDP
Gateway Load Balancer (new generation - see VPC section of the notes)
It is recommended to use the new versions
We can set up internal (private) and external (public) load balancers on AWS
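A minimal sketch (boto3, placeholder subnets and security group) of creating an internet-facing ALB and an internal NLB:

```python
import boto3

elbv2 = boto3.client("elbv2")
subnets = ["subnet-aaa111", "subnet-bbb222"]  # placeholder subnets in two AZs

# External (public) Application Load Balancer
elbv2.create_load_balancer(
    Name="my-public-alb",
    Type="application",
    Scheme="internet-facing",
    Subnets=subnets,
    SecurityGroups=["sg-0123456789abcdef0"],  # placeholder security group
)

# Internal (private) Network Load Balancer
elbv2.create_load_balancer(
    Name="my-private-nlb",
    Type="network",
    Scheme="internal",
    Subnets=subnets,
)
```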
Classic Load Balancers (CLB)
They support 2 types of connections: TCP (layer 4) and HTTP(S) (layer 7)
Health checks are either TCP or HTTP based
CLBs provide a fixed hostname: XXX.region.elb.amazonaws.com
Application Load Balancers (ALB)
They are layer 7 load balancers (only HTTP or HTTPS)
They allow load balancing to multiple HTTP applications across multiple machines (target groups). They also allow load balancing to multiple applications on the same EC2 instance (useful in case of containers)
They support HTTP/2 and WebSockets
They support redirects, for example from HTTP to HTTPS
They provide routing tables to different target groups (see the sketch at the end of this section):
Routing based on path in URL
Routing based on the hostname
Routing based on query strings and headers
ALBs are a great fit for microservices and container-based applications
ALBs have a port mapping feature to redirect to dynamic ports in case of ECS
Target groups can contain:
EC2 instances (can be managed by an Auto Scaling Group)
ECS tasks (managed by ECS itself)
Lambda Functions - HTTP request is translated to a JSON event
IP Addresses - must be private IP addresses
ALBs also provide a fixed hostname (same as CLBs): XXX.region.elb.amazonaws.com
The application servers behind the LB cannot see the client's IP directly, but they can retrieve it from the X-Forwarded-For header. The client port can be fetched from X-Forwarded-Port and the protocol from X-Forwarded-Proto
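The routing rules above map onto listener rules; a minimal sketch with boto3 (listener and target group ARNs are placeholders) of path-based and host-based routing:

```python
import boto3

elbv2 = boto3.client("elbv2")
listener_arn = "arn:aws:elasticloadbalancing:...:listener/app/my-alb/123/456"  # placeholder

# Route /users/* to one target group (path-based routing)
elbv2.create_rule(
    ListenerArn=listener_arn,
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/users/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": "arn:aws:...:targetgroup/users-tg/abc"}],
)

# Route api.example.com to another target group (host-based routing)
elbv2.create_rule(
    ListenerArn=listener_arn,
    Priority=20,
    Conditions=[{"Field": "host-header", "Values": ["api.example.com"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": "arn:aws:...:targetgroup/api-tg/def"}],
)
```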
Network Load Balancers (NLB)
Network load balancers (layer 4) allow us to:
Forward TCP and UDP traffic to the registered instances
Handle millions of requests per second
Achieve lower latency: ~100 ms (vs ~400 ms for ALB)
NLBs have one static IP per AZ and support Elastic IPs (useful when IP whitelisting is necessary)
Use case for NLBs: NLBs are used for extreme performance in case of TCP or UDP traffic (example: video games)
Instances behind an NLB don’t see traffic as coming from the load balancer; they see it as if it came from the outside world => no security group is attached to the NLB => the security group attached to the target EC2 instances must be changed to allow traffic from the outside (example: 0.0.0.0/0 on port 80)
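A minimal sketch (boto3, placeholder subnet and Elastic IP allocation IDs) of creating an NLB with one Elastic IP per AZ via subnet mappings:

```python
import boto3

elbv2 = boto3.client("elbv2")

# One Elastic IP (allocation ID) per AZ/subnet - IDs are placeholders
elbv2.create_load_balancer(
    Name="my-nlb",
    Type="network",
    Scheme="internet-facing",
    SubnetMappings=[
        {"SubnetId": "subnet-aaa111", "AllocationId": "eipalloc-111111"},
        {"SubnetId": "subnet-bbb222", "AllocationId": "eipalloc-222222"},
    ],
)
```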
Stickiness
It is possible to enable stickiness for CLB and ALB load balancers
Stickiness means that the traffic from the same client will be forwarded to the same target instance
Stickiness works by adding a cookie to the request which has an expiration date for controlling the stickiness period
Possible use case for stickiness: we have to make sure that the user does not lose his session data
Enabling stickiness may bring imbalance to the load over the downstream target instances
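A minimal sketch (boto3, placeholder target group ARN) of enabling cookie-based stickiness on an ALB target group, with the cookie duration controlling the stickiness period:

```python
import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:...:targetgroup/my-app-targets/abc",  # placeholder
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},  # LB-generated cookie
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},  # 1 day
    ],
)
```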
Cross-Zone Load Balancing
With cross-zone load balancing enabled, each LB node distributes traffic evenly across all registered instances in all AZs
Otherwise, each LB node distributes requests evenly only across the instances in its own AZ
Classic Load Balancer:
Cross-zone load balancing is disabled by default
No additional charges for cross zone load balancing if the feature is enabled
Application Load Balancer:
Cross-zone load balancing is always on, it cannot be disabled
No charges applied for cross zone load balancing
Network Load Balancer:
Cross-zone load balancing is disabled by default
Additional charges apply if the feature is enabled
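For NLBs (where it is off by default), cross-zone load balancing is a load balancer attribute; a minimal sketch with boto3 and a placeholder ARN:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Enable cross-zone load balancing on an NLB (disabled by default, extra charges apply)
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:...:loadbalancer/net/my-nlb/123",  # placeholder
    Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
)
```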
SSL/TLS Certificates
An SSL certificate allows traffic to be encrypted between the clients and the load balancer. This is called encryption in transit or in-flight encryption
SSL - Secure Socket Layer
TLS (newer version of SSL) - Transport Layer Security
Nowadays TLS certificates are mainly used, but we still refer to them as SSL
Public SSL certificates are issued by a Certificate Authority
SSL certificates have an expiration date and must be renewed
SSL termination: the client can talk to the LB using HTTPS, but internal traffic can be routed to the targets using HTTP
The load balancer can load an X.509 certificate (which is an SSL/TLS server certificate)
We can manage certificates in AWS using ACM (AWS Certificate Manager)
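A minimal sketch (boto3, placeholder domain) of requesting a public certificate from ACM with DNS validation:

```python
import boto3

acm = boto3.client("acm")

# Request a public certificate for a placeholder domain, validated via DNS
response = acm.request_certificate(
    DomainName="www.example.com",
    ValidationMethod="DNS",
    SubjectAlternativeNames=["example.com"],
)
certificate_arn = response["CertificateArn"]  # attach this to the HTTPS listener
```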
HTTPS Listener:
We must specify a default certificate
We can add an optional list of certificates to support multiple domains
Clients can use SNI (Server Name Indication) to specify which hostname they want to reach
Ability to specify a security policy to support older versions of SSL/TLS (for legacy clients like Internet Explorer 5 lol:) )
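A minimal sketch (boto3, placeholder ARNs) of an HTTPS listener with a default certificate and an explicit security policy:

```python
import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_listener(
    LoadBalancerArn="arn:aws:...:loadbalancer/app/my-alb/123",  # placeholder
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/default-cert"}],  # default certificate
    SslPolicy="ELBSecurityPolicy-TLS-1-2-2017-01",  # security policy for allowed SSL/TLS versions
    DefaultActions=[{"Type": "forward", "TargetGroupArn": "arn:aws:...:targetgroup/my-app-targets/abc"}],
)
```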
SNI - Server Name Indication
SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
It is a newer protocol which requires the client to indicate the hostname of the target server in the initial SSL handshake
In AWS this only works for ALB, NLB and CloudFront (not for CLB!)
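Additional certificates for other hostnames can be attached to the same listener and are served via SNI; a minimal sketch with boto3 and placeholder ARNs:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Extra certificates on the same HTTPS listener; SNI picks the right one per hostname
elbv2.add_listener_certificates(
    ListenerArn="arn:aws:...:listener/app/my-alb/123/456",  # placeholder
    Certificates=[
        {"CertificateArn": "arn:aws:acm:...:certificate/www-example-org"},
        {"CertificateArn": "arn:aws:acm:...:certificate/api-example-com"},
    ],
)
```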
ELB - Connection Draining
Feature naming:
In case of a CLB it is called Connection Draining
If we have a target group (ALB, NLB) it is called Deregistration Delay
Connection draining is the time given to complete in-flight requests while the instance is de-registering or unhealthy. Basically it allows the instance to finish whatever it was doing before being removed
The LB will stop sending new requests to the instance which is in the process of de-registering
The connection draining period can be set between 1 and 3600 seconds
It can also be disabled (by setting the period to 0 seconds)
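A minimal sketch (boto3, placeholder target group ARN) of setting the deregistration delay on an ALB/NLB target group:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Give in-flight requests 120 seconds to finish before a target is fully de-registered
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:...:targetgroup/my-app-targets/abc",  # placeholder
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "120"}],
)
```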