S3 Basics

Amazon S3 is an object storage service. It is commonly used for:

  • Backup storage
  • App web hosting
  • Media hosting
  • Software delivery

Labs

  • Create s3 bucket complete with all features awscli. nxf
  • Scripts to self-host web static app on s3
  • Storage class test: compare standard, IA, glacier and its pricing
  • Metadata: query bucket metadata table
  • Upload:
    • Test multi-part upload, query list multiparts, abort multipart upload
      • awscli and python?
    • s3 lifecycle to delete multipart uploads
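
For the last lab item, a minimal sketch of a lifecycle configuration that aborts incomplete multipart uploads. The rule ID, the 7-day threshold, and the helper name are assumptions chosen for illustration. The resulting JSON should be usable with `aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://lifecycle.json`, but verify it against a test bucket first.

```python
import json


def abort_incomplete_uploads_rule(days=7, rule_id="abort-incomplete-multipart-uploads"):
    """Build a lifecycle configuration with one AbortIncompleteMultipartUpload rule.

    `days` and `rule_id` are illustrative defaults, not values from the notes.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved as lifecycle.json.
    print(json.dumps(abort_incomplete_uploads_rule(), indent=2))
```

The same dictionary could also be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, assuming boto3 is installed and credentials are configured.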

Working with buckets

Notes:

  • 2 types of buckets:
    • General purpose buckets
    • Directory buckets (S3 Express One Zone) for performance-sensitive workloads
  • As a best practice, keep “Block Public Access” enabled on every bucket at all times
    • For static web hosting, open the bucket only to CloudFront, via Origin Access Control (OAC, the successor to the legacy Origin Access Identity, OAI) granted in the bucket policy
  • Bucket configs: CORS, events, lifecycle, location, logging, object lock, policy, ACL, replication, tags, versioning, transfer acceleration, website, encryption
  • Quotas:
    • by default up to 10,000 buckets in an aws acct
  • aws s3 vhost: https://bucket-name.s3.region-code.amazonaws.com/key-name
  • S3 buckets are high-throughput and can be mounted as a local file system, on Linux only
    • Symlinks don’t work. Files up to 5 TB
    • Available for all storage classes except the Glacier ones
    • The mount-s3 binary (Mountpoint for Amazon S3) must be downloaded and installed to mount a bucket
      • Supports a local disk cache
  • S3 has “Storage Browser”, a simple web UI component that lets users browse the objects in your bucket
    • Can be used with AWS Cognito / Amplify for user access
  • Transfer Acceleration can be enabled on a bucket for faster uploads from users/customers in other regions
    • Can take up to 20 minutes to become available after activation
  • Requester Pays: the requester, instead of the bucket owner, pays for object requests and transfers. Disabled by default
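
The virtual-hosted URL pattern in the notes above can be sketched as a small helper. The function name and the sample bucket, region, and key are hypothetical. The percent-encoding uses Python's generic `urllib.parse.quote`, which may not match S3's handling of every special character.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket, region, key):
    """Return the virtual-hosted-style URL for an object.

    Follows the pattern https://bucket-name.s3.region-code.amazonaws.com/key-name.
    quote() leaves "/" unescaped by default, so prefixes stay readable.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    # Hypothetical bucket, region, and key, used only for illustration.
    print(virtual_hosted_url("my-bucket", "eu-west-1", "media/my photo.jpg"))
```

The URL is only reachable if the object is readable by the caller; with Block Public Access enabled, an anonymous request to it is expected to be denied.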

Working with objects

  • Naming objects
    • Keys support prefixes delimited by slashes (/), but prefixes are not directories per se
    • Relative-path elements such as ../../ are valid in key names, as if the prefixes were subdirectories
    • Some characters are safe for object names, some might require special handling, and others should be avoided.
  • Metadata:
    • System defined: creation date, size, storage class
    • User defined: user-defined name-value pairs
      • It can’t be modified after upload, unless a new copy of the object is made
      • In requests, user-defined metadata is sent as HTTP headers that start with x-amz-meta-
    • S3 has a Metadata Tables service; the tables are updated regularly and can be queried, at a cost, with Athena, Redshift, EMR, or QuickSight
      • Useful for finding objects by file extension or finding deleted objects
  • Uploading objects
    • Any file type can be uploaded
    • File size to be uploaded
      • Limits:
        • max 160 GB for a single file uploaded from the console (single operation)
        • max 5 GB in a single PUT using SDK/S3 REST/CLI (single operation)
        • with multipart upload, the file is uploaded in multiple chunks; the total file size can range from 5 MB to 5 TB
        • Max number of parts per upload: 10,000
      • Multipart upload: parts are uploaded in parallel, giving better throughput, and the upload can be paused/resumed
        • S3 uses defined checksum algorithms to validate whether a single or multipart upload succeeded
        • Pricing: S3 retains the uploaded parts of an incomplete multipart upload, and they incur bandwidth and storage costs
        • Recommended: clean up incomplete uploads periodically using the AbortIncompleteMultipartUpload action in a Lifecycle rule
        • IAM: s3:PutObject permission is needed
  • TODO: Making conditional requests (???)
  • Copying, moving, and renaming objects
    • If versioning is not enabled, the new copy becomes the current object and replaces the old one
    • If versioning is enabled, the old and the new object exist together
    • Metadata can be rewritten in the new obj
  • Downloading objects
    • Pricing: a data transfer fee applies when the object is downloaded from outside the AWS network
      • Within the same AWS Region data transfer is free, but the GET request is still charged
    • Multiple objects can be downloaded
    • Part of a larger object can be downloaded by specifying a byte range
    • Objects can be shared through a presigned URL: an expiring public URL for download
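
The multipart limits and byte-range downloads noted above can be illustrated with a small planning helper. This is a sketch under assumptions: the function names and the 8 MiB default part size are illustrative. The 10,000-part cap comes from the notes; the 5 MiB minimum and 5 GiB maximum part sizes are the documented S3 limits, with the last part allowed to be smaller.

```python
MIN_PART = 5 * 1024**2   # 5 MiB minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024**3   # 5 GiB maximum part size
MAX_PARTS = 10_000       # maximum number of parts per upload


def plan_parts(object_size, part_size=8 * 1024**2):
    """Split an object into (part_number, first_byte, last_byte) tuples.

    The part size grows automatically when the requested size would need
    more than MAX_PARTS parts. Byte offsets are inclusive.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    # Integer ceiling division: smallest part size that fits in MAX_PARTS parts.
    part_size = max(part_size, -(-object_size // MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object is too large for a multipart upload")
    parts = []
    start = 0
    number = 1
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
        start = end + 1
        number += 1
    return parts


def range_header(first_byte, last_byte):
    """Format an HTTP Range header value for an inclusive byte range."""
    return f"bytes={first_byte}-{last_byte}"


if __name__ == "__main__":
    for number, first, last in plan_parts(20 * 1024**2):
        print(number, range_header(first, last))
```

The same ranges work in both directions: each tuple describes one part to upload, and `range_header` gives the value for a ranged GET of that slice.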

Checking object integrity in Amazon S3

Deleting objects

Organizing and listing objects

Using presigned URLs to download and upload objects

Transforming objects

Performing object operations in bulk

Querying data in place

Working with Amazon S3 Tables and table buckets

Access control

Security

Data protection

Cost optimization

Logging and monitoring

Optimizing performance

Hosting a static website