To use MFA-Delete we have to enable versioning on the selected bucket
MFA will be required when:
We want to permanently delete an object version
We want to suspend the versioning on the bucket
MFA won't be required when:
We want to enable versioning
We want to list deleted versions
We want to add a delete marker to an object
MFA-Delete can be enabled/disabled only by the owner of the bucket (root account)!
MFA-Delete currently can only be enabled using the CLI
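A minimal sketch of enabling MFA-Delete via the underlying PutBucketVersioning call (shown here with boto3 for illustration, CLI equivalent in the comment; bucket name and MFA device ARN are hypothetical):

    import boto3

    # Must be called with the bucket owner's (root account) credentials
    s3 = boto3.client("s3")

    # CLI equivalent:
    # aws s3api put-bucket-versioning --bucket my-bucket \
    #   --versioning-configuration Status=Enabled,MFADelete=Enabled \
    #   --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"
    s3.put_bucket_versioning(
        Bucket="my-bucket",  # hypothetical bucket name
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
        MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",  # device ARN + current code
    )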
S3 Access Logs
For audit purposes we would want to log all access to S3 buckets
Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
The data can be analyzed by some data analysis tools or Amazon Athena
Warnings
We should never set our logging bucket to be the monitored bucket! This may create a logging loop causing the bucket to grow exponentially
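A minimal boto3 sketch of enabling server access logging (bucket names are hypothetical; the target bucket must be a separate bucket that grants S3's log delivery write access):

    import boto3

    s3 = boto3.client("s3")

    # Deliver access logs for "my-app-bucket" into a separate "my-logs-bucket"
    s3.put_bucket_logging(
        Bucket="my-app-bucket",
        BucketLoggingStatus={
            "LoggingEnabled": {
                "TargetBucket": "my-logs-bucket",  # never the monitored bucket itself
                "TargetPrefix": "s3-access-logs/",
            }
        },
    )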
S3 Replication
In order to enable replication:
We must enable versioning on the source and destination buckets
There are 2 types of replication:
Cross Region Replication (CRR): buckets are in different regions
Used for: compliance, lower latency access, replication across accounts
Same Region Replication (SRR): buckets are in the same region
Used for: log aggregation, live replication between production and test accounts
In both cases the buckets can be in separate accounts
Copying between replica buckets happens asynchronously (it is very quick)
In order to be able to copy between replicas, the proper IAM permissions (an IAM role that S3 assumes) have to be configured on the source bucket
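A minimal boto3 sketch of a replication configuration (bucket names and role ARN are hypothetical; versioning must already be enabled on both buckets):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            # IAM role S3 assumes to read from the source and write to the destination
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [
                {
                    "ID": "replicate-everything",
                    "Status": "Enabled",
                    "Priority": 1,
                    "Filter": {"Prefix": ""},  # replicate all new objects
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                    "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                }
            ],
        },
    )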
Replication Notes
Only the new objects are replicated after the replication is activated (no retroactive replication)
For DELETE operations:
For deletion without version ID, a delete marker is added to the object. Deletion is not replicated
For deletion with version ID, the object is deleted in the source bucket. Deletion is not replicated
There is no replication chaining!
S3 Pre-signed URLs
We can generate pre-signed URLs using the SDK and the CLI
Pre-signed URLs have a default expiration time of 3600 seconds. This can be changed with the --expires-in argument
Users given a pre-signed URL will inherit the permissions of the person who generated the URL
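A minimal boto3 sketch of generating a pre-signed GET URL (bucket and key are hypothetical; the CLI equivalent is shown in the comment):

    import boto3

    s3 = boto3.client("s3")

    # CLI equivalent: aws s3 presign s3://my-bucket/path/to/file --expires-in 300
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "path/to/file"},
        ExpiresIn=300,  # seconds; defaults to 3600 when omitted
    )
    print(url)  # anyone holding this URL acts with the generator's permissions until it expires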
S3 Storage Classes
Amazon S3 Standard-General Purpose
Amazon S3 Standard-Infrequent Access (IA)
Amazon S3 One Zone-Infrequent Access
Amazon S3 Intelligent Tiering
Amazon Glacier
Amazon Glacier Deep Archive
Amazon S3 Reduced Redundancy Storage (deprecated)
S3 Standard - General Purpose
High durability (11 nines - 99.999999999%) of objects across multiple AZs
If we store 10 million objects in S3, we can expect to lose a single object once every 10,000 years on average
99.99% availability per year
It can sustain 2 concurrent facility failures
S3 Standard - Infrequent Access
Suitable for data that is less frequently accessed, but it should be retrieved quickly when it is needed
Same durability as General Purpose, 99.9% availability
Lower cost than General Purpose
S3 One Zone - Infrequent Access
Same as Standard IA, but data is stored in a single AZ
Same durability as Standard IA. Data can be lost if an AZ goes down
99.5% availability per year
Lower cost than IA
S3 Intelligent-Tiering
Automatically moves objects between two access tiers based on access patterns
Has a small monthly monitoring fee
Same durability as General Purpose, having availability of 99.9%
S3 Glacier
Low cost object storage for archiving/backup data
Data is retained for longer terms (10s of years)
Alternative to on-premise magnetic tape
Same durability as General Purpose
Cost per storage per month is really low, but we pay for data retrieval as well
Each item in Glacier is called an archive, archives are stored in Vaults
Provides 3 retrieval options:
Expedited (1 to 5 minutes): the most expensive option (about $10 per 1,000 retrieval requests)
Standard (3 to 5 hours)
Bulk (5 to 12 hours)
Minimum storage duration for Glacier is 90 days
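A minimal boto3 sketch of restoring a Glacier-class object using one of the retrieval tiers above (bucket and key are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Initiate a temporary restore of an archived object back into S3
    s3.restore_object(
        Bucket="my-archive-bucket",
        Key="backups/2020-01-01.tar.gz",
        RestoreRequest={
            "Days": 2,  # how long the restored copy stays available
            "GlacierJobParameters": {"Tier": "Expedited"},  # or "Standard" / "Bulk"
        },
    )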
S3 Glacier Deep Archive
For very long term storage - cheaper than S3 Glacier
Retrieval options:
Standard (12 hours)
Bulk (48 hours)
Minimum storage duration is 180 days
S3 - Moving between storage classes
We can transition objects between storage classes in order to save money
General rules:
Infrequently accessed documents should be moved to STANDARD_IA
Objects that don’t need real-time access should be moved to GLACIER or DEEP_ARCHIVE
Moving objects can be done manually or can be done via a lifecycle configuration
Transition actions: they define when objects should be transitioned from one storage class to another
Expiration actions: configuration to delete objects after a given time
Can be used to delete old versions of files if versioning is enabled on the bucket
Can be used to clean-up incomplete multi-part uploads
Rules can be applied for a certain prefix
Rules can be created for certain object tags
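A minimal boto3 sketch of a lifecycle configuration combining the rules above (bucket name and prefix are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-and-expire-reports",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "reports/"},  # rule applies to this prefix only
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 365},  # delete current versions after a year
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 60},  # clean up old versions
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                }
            ]
        },
    )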
S3 - Performance
Amazon S3 automatically scales to high request rates, with a latency of 100-200 ms for the first byte out of S3
We can achieve:
3500 PUT/COPY/POST/DELETE requests per second per prefix in a bucket
5500 GET/HEAD requests per second per prefix in a bucket
Prefix explained:
Example of a file in a bucket: my-bucket/folder/subfolder/file
Prefix in this case is: /folder/subfolder/
Request performance will apply to each prefix separately
S3 KMS Limitation
S3 Performance can be affected by KMS limits
If encryption is enabled using SSE-KMS, S3 makes additional requests to KMS, which count toward our KMS request quota
KMS could throttle S3 performance; as of today, we cannot request a quota increase for KMS
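A minimal boto3 sketch of an SSE-KMS upload (bucket, key and KMS key alias are hypothetical); each such request consumes KMS quota behind the scenes:

    import boto3

    s3 = boto3.client("s3")

    # Each SSE-KMS upload triggers a GenerateDataKey call to KMS (downloads trigger Decrypt),
    # and those calls count toward the KMS request quota
    s3.put_object(
        Bucket="my-bucket",
        Key="data/file.csv",
        Body=b"some,data\n",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-s3-key",  # hypothetical KMS key alias
    )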
S3 Performance Optimizations
Multi-Part upload: splits data into smaller chunks and uploads them in parallel. Recommended for files bigger than 100 MB, mandatory for files bigger than 5 GB
S3 Transfer Acceleration: it can increase the transfer speed in case of uploads by using an AWS edge location. Compatible with multi-part upload.
S3 Byte-Range Fetches: can be used to speed up downloads by parallelizing GET requests. Can be used to retrieve only a part of the file
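Minimal boto3 sketches of a multi-part upload (via the high-level transfer configuration) and a byte-range fetch (file, bucket and key names are hypothetical):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Multi-part upload: files above the threshold are split into chunks uploaded in parallel
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # 100 MB
        multipart_chunksize=100 * 1024 * 1024,
    )
    s3.upload_file("big-backup.bin", "my-bucket", "backups/big-backup.bin", Config=config)

    # Byte-range fetch: retrieve only the first 1 KB of the object (e.g. a header)
    resp = s3.get_object(Bucket="my-bucket", Key="backups/big-backup.bin", Range="bytes=0-1023")
    head = resp["Body"].read()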
S3 Select and Glacier Select
Can be used to retrieve less data using SQL queries to do server side filtering
We can filter by rows and columns. SQL statements should be simple; joins are not supported
The purpose of S3 Select is to use less network traffic
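A minimal boto3 sketch of S3 Select against a CSV object (bucket, key and column names are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    resp = s3.select_object_content(
        Bucket="my-bucket",
        Key="data/sales.csv",
        ExpressionType="SQL",
        Expression="SELECT s.region, s.amount FROM s3object s WHERE CAST(s.amount AS FLOAT) > 100",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    # Only the filtered rows/columns come back over the network, as an event stream
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())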
AWS Athena
Serverless service to perform analytics directly against S3 files
We can use SQL to query data from the files from S3
It provides a JDBC/ODBC driver
Pricing: we are charged per query, based on the amount of data scanned; we pay only for what we use
Supported file formats: CSV, JSON, ORC, Avro, Parquet. Under the hood it uses the Presto query engine
Athena use cases: business intelligence, analytics, reporting, log analysis, etc.
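A minimal boto3 sketch of running an Athena query against data in S3 (database, table and output bucket are hypothetical):

    import boto3

    athena = boto3.client("athena")

    # Start a query; Athena writes the result files to the given S3 output location
    exec_id = athena.start_query_execution(
        QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
        QueryExecutionContext={"Database": "my_logs_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # (in practice, poll until the state is no longer QUEUED/RUNNING)
    state = athena.get_query_execution(QueryExecutionId=exec_id)["QueryExecution"]["Status"]["State"]
    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=exec_id)["ResultSet"]["Rows"]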
S3 Object Lock and Glacier Vault Lock
S3 Object Lock: implements a WORM (Write Once Read Many) model, meaning that it guarantees a file is written only once and cannot be deleted until the lock is removed
Glacier Vault Lock: implements the same WORM model; a locked archive cannot be changed as long as the lock is active
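A minimal boto3 sketch of S3 Object Lock (bucket and key are hypothetical; Object Lock must be enabled when the bucket is created):

    import boto3
    from datetime import datetime, timezone

    s3 = boto3.client("s3")

    # Object Lock can only be used on buckets created with it enabled
    # (outside us-east-1, create_bucket also needs a CreateBucketConfiguration)
    s3.create_bucket(Bucket="my-worm-bucket", ObjectLockEnabledForBucket=True)

    s3.put_object(Bucket="my-worm-bucket", Key="audit/report.pdf", Body=b"...")

    # Protect this object version from deletion/overwrite until the given date
    s3.put_object_retention(
        Bucket="my-worm-bucket",
        Key="audit/report.pdf",
        Retention={
            "Mode": "COMPLIANCE",  # compliance mode cannot be shortened or removed, even by root
            "RetainUntilDate": datetime(2031, 1, 1, tzinfo=timezone.utc),
        },
    )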