Advanced S3

S3 MFA-Delete

  • To use MFA-Delete we have to enable versioning on the selected bucket
  • MFA will be required when:
    • We want to permanently delete an object version
    • We want to suspend the versioning on the bucket
  • MFA won't be required when:
    • We want to enable versioning
    • We want to list deleted versions
    • We want to add a delete marker to an object
  • MFA-Delete can be enabled/disabled only by the owner of the bucket (root account)!
  • MFA-Delete currently can only be enabled using the CLI
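  • A minimal sketch of enabling MFA-Delete from the CLI (bucket name and MFA device ARN are placeholders):

      # Run with the bucket owner's (root account) credentials; the --mfa
      # value is "<mfa-device-arn> <current-mfa-code>"
      aws s3api put-bucket-versioning \
          --bucket my-bucket \
          --versioning-configuration Status=Enabled,MFADelete=Enabled \
          --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"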

S3 Access Logs

  • For audit purposes we would want to log all access to S3 buckets
  • Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
  • The data can be analyzed by some data analysis tools or Amazon Athena

Warnings

  • We should never set our logging bucket to be the monitored bucket! This may create a logging loop causing the bucket to grow exponentially
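  • A sketch of enabling server access logging (bucket names are placeholders; the target must be a different bucket that permits log delivery):

      # Deliver access logs for my-bucket into a separate my-logs-bucket
      aws s3api put-bucket-logging \
          --bucket my-bucket \
          --bucket-logging-status '{
            "LoggingEnabled": {
              "TargetBucket": "my-logs-bucket",
              "TargetPrefix": "access-logs/"
            }
          }'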

S3 Replication

  • In order to enable replication:
    • We must enable versioning on the source and destination buckets
  • There are 2 types of replication:
    • Cross Region Replication (CRR): buckets are in different regions
      • Used for: compliance, lower latency access, replication across accounts
    • Same Region Replication (SRR): buckets are in the same region
      • Used for: log aggregation, live replication between production and test accounts
  • In both cases the buckets can be in separate accounts
  • Copying between replica buckets happens asynchronously (it is very quick)
  • In order to be able to copy objects between buckets, an IAM role with the proper permissions has to be assigned in the source bucket's replication configuration
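  • As an illustrative sketch (role ARN and bucket names are placeholders), a replication rule is attached to the versioned source bucket:

      # replication.json: the role must allow S3 to read from the source
      # and replicate into the destination
      {
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
          "Status": "Enabled",
          "Priority": 1,
          "Filter": {},
          "DeleteMarkerReplication": { "Status": "Disabled" },
          "Destination": { "Bucket": "arn:aws:s3:::my-destination-bucket" }
        }]
      }

      aws s3api put-bucket-replication \
          --bucket my-source-bucket \
          --replication-configuration file://replication.json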

Replication Notes

  • Only the new objects are replicated after the replication is activated (no retroactive replication)
  • For DELETE operations:
    • For deletion without a version ID, a delete marker is added to the object; the deletion is not replicated
    • For deletion with a version ID, the object version is deleted in the source bucket only; the deletion is not replicated
  • There is no replication chaining! If bucket A replicates into bucket B, and bucket B replicates into bucket C, objects created in bucket A are not replicated to bucket C

S3 Pre-signed URLs

  • We can generate pre-signed URLs using the SDK and the CLI
  • Pre-signed URLs have a default expiration time of 3600 seconds (1 hour). This can be changed with the --expires-in argument
  • Users given a pre-signed URL inherit the permissions of the person who generated the URL
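  • For example (bucket and object names are placeholders), generating a URL that is valid for 5 minutes:

      # The URL carries the permissions of the credentials that signed it
      aws s3 presign s3://my-bucket/private-file.txt --expires-in 300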

S3 Storage Classes

  • Amazon S3 Standard-General Purpose
  • Amazon S3 Standard-Infrequent Access (IA)
  • Amazon S3 One Zone-Infrequent Access
  • Amazon S3 Intelligent Tiering
  • Amazon Glacier
  • Amazon Glacier Deep Archive
  • Amazon S3 Reduced Redundancy Storage (deprecated)

S3 Standard - General Purpose

  • High durability (99.999999999%, 11 nines) of objects across multiple AZs
  • If we store 10 million objects in S3, we can on average expect to lose a single object once every 10,000 years
  • 99.99% availability per year
  • It can sustain 2 concurrent facility failures

S3 Standard - Infrequent Access

  • Suitable for data that is accessed less frequently, but requires rapid access when needed
  • Same durability as General Purpose, 99.9% availability
  • Lower cost than General Purpose

S3 One Zone - Infrequent Access

  • Same as Standard IA, but data is stored in a single AZ
  • Same durability as Standard IA. Data can be lost if an AZ goes down
  • 99.5% availability per year
  • Lower cost than IA

S3 Intelligent Tiering

  • Automatically moves objects between two access tiers based on access patterns
  • Has a small monthly monitoring fee
  • Same durability as General Purpose, having availability of 99.9%

S3 Glacier

  • Low cost object storage for archiving/backup data
  • Data is retained for longer terms (10s of years)
  • Alternative to on-premises magnetic tape
  • Same durability as General Purpose
  • Cost per storage per month is really low, but we pay for data retrieval as well
  • Each item in Glacier is called an archive, archives are stored in Vaults
  • Provides 3 retrieval options:
    • Expedited (1 to 5 minutes): the most expensive retrieval option
    • Standard (3 to 5 hours)
    • Bulk (5 to 12 hours)
  • Minimum storage duration for Glacier is 90 days
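  • A sketch of initiating a retrieval through the S3 API (bucket, key, and retention days are placeholders):

      # Restore a temporary copy of the archive for 7 days, Standard tier
      aws s3api restore-object \
          --bucket my-archive-bucket \
          --key backups/2019.zip \
          --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'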

S3 Glacier Deep Archive

  • For very long term storage - cheaper than S3 Glacier
  • Retrieval options:
    • Standard (12 hours)
    • Bulk (48 hours)
  • Minimum storage duration is 180 days

S3 - Moving between storage classes

  • We can transition objects between storage classes in order to save money
  • General rules:
    • Infrequently accessed documents should be moved to STANDARD_IA
    • Objects that don’t need real-time access should be moved to GLACIER or DEEP_ARCHIVE
  • Moving objects can be done manually or can be done via a lifecycle configuration
  • Transition actions: they define when objects should be transitioned from one storage class to another
  • Expiration actions: configuration to delete objects after a given time
    • Can be used to delete old versions of files if versioning is enabled on the bucket
    • Can be used to clean-up incomplete multi-part uploads
  • Rules can be applied for a certain prefix
  • Rules can be created for certain object tags
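  • A sketch of a lifecycle configuration combining transition and expiration actions (bucket name, prefix, and day counts are placeholders):

      # lifecycle.json
      {
        "Rules": [{
          "ID": "archive-then-expire-logs",
          "Filter": { "Prefix": "logs/" },
          "Status": "Enabled",
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" },
            { "Days": 90, "StorageClass": "GLACIER" }
          ],
          "Expiration": { "Days": 365 },
          "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        }]
      }

      aws s3api put-bucket-lifecycle-configuration \
          --bucket my-bucket \
          --lifecycle-configuration file://lifecycle.json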

S3 - Performance

  • Amazon S3 automatically scales to high request rates, with a latency of 100-200 ms to get the first byte out of S3
  • We can achieve:
    • 3500 PUT/COPY/POST/DELETE requests per second per prefix in a bucket
    • 5500 GET/HEAD requests per second per prefix in a bucket
  • Prefix explained:
    • Example of a file in a bucket: my-bucket/folder/subfolder/file
    • Prefix in this case is: /folder/subfolder/
    • Request performance will apply to each prefix separately

S3 KMS Limitation

  • S3 Performance can be affected by KMS limits
  • If encryption is enabled using SSE-KMS, each upload/download generates additional requests to KMS, which count towards our KMS request quota
  • KMS could throttle S3 performance; as of today we cannot request a quota increase for it

S3 Performance Optimizations

  • Multi-Part upload: splits the data into smaller chunks and uploads them in parallel. Recommended for files bigger than 100MB, mandatory for files bigger than 5GB
  • S3 Transfer Acceleration: it can increase the transfer speed in case of uploads by using an AWS edge location. Compatible with multi-part upload.
  • S3 Byte-Range Fetches: can be used to speed up downloads by parallelizing GET requests. Can be used to retrieve only a part of the file
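  • Hedged CLI sketches of these optimizations (bucket, file names, and sizes are placeholders):

      # Multi-part upload: the high-level s3 commands split large files
      # automatically; the threshold can be tuned
      aws configure set default.s3.multipart_threshold 100MB
      aws s3 cp big-file.bin s3://my-bucket/

      # Transfer Acceleration: enable it on the bucket, then upload through
      # the accelerate endpoint
      aws s3api put-bucket-accelerate-configuration \
          --bucket my-bucket --accelerate-configuration Status=Enabled

      # Byte-range fetch: download only the first 10 MB of an object
      aws s3api get-object --bucket my-bucket --key big-file.bin \
          --range "bytes=0-10485759" first-10mb.bin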

S3 Select and Glacier Select

  • Can be used to retrieve less data using SQL queries to do server side filtering
  • We can filter by rows and columns. SQL statements should be simple: we cannot use joins
  • The purpose of S3 Select is to transfer less data over the network, since the filtering happens server side
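  • A sketch of a server-side filter over a CSV object (bucket, key, and query are placeholders):

      aws s3api select-object-content \
          --bucket my-bucket \
          --key data/users.csv \
          --expression "SELECT s.name FROM S3Object s WHERE s.country = 'NL'" \
          --expression-type SQL \
          --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
          --output-serialization '{"CSV": {}}' \
          results.csv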

Amazon Athena

  • Serverless service to perform analytics directly against S3 files
  • We can use SQL to query data from the files from S3
  • It provides a JDBC/ODBC driver
  • Pricing: we are charged per query, based on the amount of data scanned; we are billed only for what we use
  • Supported file formats: CSV, JSON, ORC, Avro, Parquet. In the back-end it uses the Presto query engine
  • Athena use cases: business intelligence, analytics, reporting, log analysis, etc.
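  • A sketch of running a query from the CLI (database, table, and output location are placeholders):

      aws athena start-query-execution \
          --query-string "SELECT status, COUNT(*) FROM access_logs GROUP BY status" \
          --query-execution-context Database=my_database \
          --result-configuration OutputLocation=s3://my-athena-query-results/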

S3 Object Lock and Glacier Vault Lock

  • S3 Object Lock: implements the WORM (Write Once Read Many) model, meaning it guarantees that a file is written once and cannot be deleted until the lock is removed
  • Glacier Vault Lock: implements the same WORM model; locked files cannot be changed as long as the lock is active
  • Helpful for compliance and data retention
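  • A sketch of applying a retention lock to an object (names and the date are placeholders; the bucket is assumed to have been created with Object Lock enabled):

      aws s3api put-object-retention \
          --bucket my-locked-bucket \
          --key contracts/2020.pdf \
          --retention '{"Mode": "COMPLIANCE", "RetainUntilDate": "2030-01-01T00:00:00Z"}'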