S3 Basics
AWS S3 stores objects. It is commonly used for:
- Backup storage
- Static web app hosting
- Media hosting
- Software delivery
Labs
- Create an s3 bucket complete with all features using awscli (a boto3 sketch follows this list)
- Scripts to self-host web static app on s3
- Storage class test: compare Standard, IA, Glacier and their pricing
- Metadata: query bucket metadata table
- Upload:
- Test multi-part upload, query list multiparts, abort multipart upload
- awscli and python?
- s3 lifecycle to delete multipart uploads
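A minimal boto3 sketch for the first lab idea (create a bucket with versioning, default encryption, Block Public Access and tags); the bucket name, region, and tag values are placeholders:

```python
import boto3

region = "us-east-1"                 # placeholder region
bucket = "my-s3-labs-bucket-123456"  # placeholder, bucket names are globally unique

s3 = boto3.client("s3", region_name=region)

# us-east-1 is the one region that must NOT pass a LocationConstraint
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Versioning
s3.put_bucket_versioning(
    Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
)

# Default encryption (SSE-S3)
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# Block Public Access (all four settings on)
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Tags
s3.put_bucket_tagging(
    Bucket=bucket, Tagging={"TagSet": [{"Key": "project", "Value": "s3-labs"}]}
)
```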
Working with buckets
Notes:
- 2 types of buckets:
- General purpose buckets
- Directory buckets (S3 One Zone) for performance-sensitive workloads
- As a best practice, buckets should have “Block Public Access” enabled at all times
- For static web hosting, the bucket should only be opened to CloudFront via an Origin Access Identity (OAI) granted in the bucket policy (see the policy sketch after this list)
- Bucket configs: cors, events, lifecycle, location, logging, obj locking, policy, acl, replication, tag, versioning, transfer accl, website, encryption
- Quotas:
- by default up to 10,000 buckets in an aws acct
- aws s3 virtual-hosted-style URL:
https://bucket-name.s3.region-code.amazonaws.com/key-name
- path-style URLs also work, but are to be discontinued in the future: https://s3.region-code.amazonaws.com/bucket-name/key-name
- HTTP GET with no SSL: http://s3.us-east-1.amazonaws.com/example.com/homepage.html
- s3 supports CNAMEs: a bucket named images.example.com can be reached via a CNAME pointing images.example.com to images.example.com.s3.us-east-1.amazonaws.com
- An s3 bucket is high-throughput and can be mounted as a local file system (Linux only) via Mountpoint for Amazon S3
- symlinks don’t work. Files up to 5TB
- available for all storage classes except the Glacier ones
- the mount-s3 binary must be downloaded and installed in order to mount an s3 bucket; it supports a local disk cache
- s3 has “Storage Browser”, a simple web UI component that lets users browse your s3 bucket objs
- can be used with AWS Cognito / Amplify for user access
- Transfer acceleration can be enabled on s3 for faster uploads from users/customers in other regions (see the acceleration sketch after this list)
- Takes up to 20 min to become available after activation
- Requester Pays: the requester pays for s3 object requests/transfers instead of the bucket owner; disabled by default (see the Requester Pays sketch after this list)
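A minimal boto3 sketch of the “Block Public Access + open only to CloudFront via OAI” setup described above; the bucket name and OAI ID are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-static-site-bucket"  # placeholder
oai_id = "E1EXAMPLEOAIID"         # placeholder OAI ID

# Keep Block Public Access fully on; the OAI is an AWS principal, not "public",
# so the bucket policy below is still allowed.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIRead",
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
        },
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```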
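And a sketch for Transfer Acceleration: enable it on the bucket, then upload through the accelerate endpoint (bucket and file names are placeholders):

```python
import boto3
from botocore.config import Config

bucket = "my-uploads-bucket"  # placeholder

# Enable Transfer Acceleration (can take up to ~20 min to become active)
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket=bucket,
    AccelerateConfiguration={"Status": "Enabled"},
)

# Upload through the accelerate endpoint (s3-accelerate.amazonaws.com)
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", bucket, "uploads/big-file.bin")
```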
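For Requester Pays, enabling it is a single call, and requesters must then acknowledge the charge on each request (names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-shared-dataset-bucket"  # placeholder

# Turn Requester Pays on (it is off by default)
s3.put_bucket_request_payment(
    Bucket=bucket, RequestPaymentConfiguration={"Payer": "Requester"}
)

# A requester downloading from the bucket must opt in to paying
obj = s3.get_object(Bucket=bucket, Key="data/sample.csv", RequestPayer="requester")
```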
Working with objects
- Naming objects
- Keys support prefixes like slashes / but are not directories per se
- Relative-path ideas are valid as if they were subdirs: ../../
- There are some characters valid for obj naming, some characters that might require special handling, and others that should be avoided
- Metadata:
- System defined: creation date, size, storage class
- User defined: user-defined name-value pairs
- User-defined metadata can’t be modified later, unless a new obj copy is made
- using the SDK/REST API, user-defined metadata is sent as x-amz-meta-* HTTP headers (see the metadata sketch after this list)
- S3 has a Metadata Tables feature: tables that are updated regularly and can be queried, with some pricing (Athena, Redshift, EMR, Quicksight)
- Useful for finding files with extensions or objects deleted
- Uploading objects
- Any file type can be uploaded
- File size limits:
- max 160GB for a single file uploaded from the console (single operation)
- max 5GB in a single PUT using the SDK/S3 REST API/CLI (single operation)
- with multipart upload, a file is uploaded in multiple chunks; supported object size is 5MB-5TB (each part except the last must be at least 5MB)
- max number of parts per upload: 10,000
- Multipart upload: parts are uploaded in parallel, better throughput, can pause/resume (see the multipart sketch after this list)
- S3 uses checksum algorithms to validate whether a single-part or multipart upload succeeded
- Pricing: AWS retains the parts of incomplete multipart uploads, and those parts keep incurring bandwidth and storage costs
- Recommended to clean up incomplete uploads periodically using the AbortIncompleteMultipartUpload action in a Lifecycle rule (see the lifecycle sketch after this list)
- IAM: the s3:PutObject permission is needed
- TODO: Making conditional requests (???)
- Copying, moving, and renaming objects
- If versioning is not enabled, the new copy becomes the current object and the old one gets replaced
- If versioning is enabled, the old object and the new object exist together as versions
- Metadata can be rewritten on the new object (see the copy sketch after this list)
- Downloading objects
- Pricing: a Data Transfer fee applies if the obj is downloaded from outside the AWS network
- Within the same region inside AWS, data transfer is free, but the GET request is still charged
- Multiple objs can be downloaded
- Part of a larger object can be downloaded by specifying a byte range
- Objs can be presigned to get an expiring public URL for download (see the download sketch after this list)
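A minimal boto3 sketch of user-defined metadata (sent under the hood as x-amz-meta-* headers); bucket, key, and values are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-notes-bucket", "docs/report.pdf"  # placeholders

# User-defined metadata travels as x-amz-meta-* headers
s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=b"...",
    Metadata={"owner": "data-team", "source": "lab"},
)

# head_object returns the user-defined metadata plus system metadata
head = s3.head_object(Bucket=bucket, Key=key)
print(head["Metadata"])                                   # {'owner': 'data-team', 'source': 'lab'}
print(head["ContentLength"], head.get("StorageClass", "STANDARD"))
```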
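A low-level multipart upload sketch, assuming a local file dataset.bin and placeholder bucket/key; it uploads parts, completes the upload, aborts on failure so stored parts don’t keep accruing charges, and lists any in-progress multipart uploads:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-notes-bucket", "big/dataset.bin"  # placeholders
part_size = 8 * 1024 * 1024                         # parts (except the last) must be >= 5MB

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]
parts = []
try:
    with open("dataset.bin", "rb") as f:
        part_number = 1
        while chunk := f.read(part_size):
            resp = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=part_number, Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
except Exception:
    # Abort so the already-uploaded parts stop costing storage
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise

# Incomplete multipart uploads still sitting in the bucket
print(s3.list_multipart_uploads(Bucket=bucket).get("Uploads", []))
```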
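The lifecycle cleanup mentioned under uploading, as a sketch: a rule that aborts incomplete multipart uploads after a placeholder 7 days:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-notes-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "abort-incomplete-mpu",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)
```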
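A copy/rename sketch: S3 has no real rename, so "rename" is copy-then-delete, and MetadataDirective="REPLACE" rewrites the user-defined metadata on the copy (names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-notes-bucket"  # placeholder

# Copy to the new key, rewriting user-defined metadata on the new object
s3.copy_object(
    Bucket=bucket,
    Key="reports/2024/summary.pdf",
    CopySource={"Bucket": bucket, "Key": "tmp/summary.pdf"},
    MetadataDirective="REPLACE",
    Metadata={"reviewed": "true"},
)

# Delete the old key to complete the "rename"
s3.delete_object(Bucket=bucket, Key="tmp/summary.pdf")
```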
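A download sketch covering the byte-range and presigned URL notes above (names and expiry are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-notes-bucket", "big/dataset.bin"  # placeholders

# Download only the first 1 MiB of the object with a byte-range GET
first_mib = s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-1048575")["Body"].read()

# Presigned URL: anyone with the link can GET the object until it expires
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=3600,  # seconds
)
print(url)
```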