Instances
EC2 instances are virtual machines running on AWS.
Overview (ChatGPT summary):
- Instance types and families: CPU, memory, and storage families
- Pricing and purchase options: On-Demand, Reserved, Spot
- Launching and configuring instances: AWS Console, CLI, EBS
- Security and networking: security groups (SGs), firewalls, IAM roles, VPC concepts
- Monitoring, scaling, and maintenance: CloudWatch, Auto Scaling groups (ASG)
- Instance lifecycle and best practices: stop, start, hibernate, terminate
Instance types
Each instance type offers different CPU, memory, and storage capacities. Other host resources, such as the network and disk subsystem, are shared with other instances on the same host.
Which instance types are available also depends on the virtualization type, hypervisor, and CPU architecture.
Consider the following instance type families:
- DL, Inf, P, Trn: specialized for machine learning (training and inference).
- F: FPGA-based custom acceleration.
- G and VT: graphics and video processing.
- HPC: high-performance computing.
- High Memory, X: memory-intensive applications.
- Standard (A, C, D, H, I, M, R, T, Z): general-purpose, compute-optimized, memory-optimized, and storage-optimized families.
Hypervisor:
- Nitro-based:
  - general purpose: m5-m8, t3, t4g
  - compute optimized: c5-c8
  - memory optimized: r5-r8, u-*
  - storage optimized: d3, i3en, i4
  - accelerated computing: dl1-2, g4-g6
  - high-performance computing: hpc*
  - previous generation: a1
- Xen-based:
  - general purpose: m1-m4, t1, t2
  - compute optimized: c1, c3, c4
  - memory optimized: r3, r4, x1
  - storage optimized: d2, h1, i2, i3
  - accelerated computing: f1, g3, p2, p3
Virtualization types: paravirtual (PV) and hardware virtual machine (HVM)
CPU architecture:
- Intel (x86, x86_64): AES-NI, AVX2, AVX-512, Turbo Boost, VNNI/INT8
- AMD (x86_64): SME, AVX, VNNI
- AWS Graviton (Arm64): better price performance for general-purpose workloads
- AWS Trainium and Inferentia (used via SageMaker, ECS, EKS): deep learning training and inference
Burstables
T-family instances (for example T2, T3, T3a, T4g) can burst their CPU performance above a baseline.
Each burstable instance has a baseline CPU utilization. While it runs at or below that baseline, it earns CPU credits (one credit is one vCPU at 100% for one minute), up to a maximum accrued balance that depends on the instance size. Accrued credits are spent automatically whenever the CPU bursts above the baseline. What happens once the credits run out depends on the credit mode:
- Standard mode: the CPU is throttled back to the baseline.
- Unlimited mode: the instance keeps bursting by spending surplus credits; if its average CPU utilization over a rolling 24-hour period stays above the baseline, the surplus is billed as an additional charge.
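A minimal Python sketch of the credit accounting described above, simulated one minute at a time. It assumes one CPU credit equals one vCPU at 100% for one minute; the class name and the t3.micro-like numbers in the example (2 vCPUs, 10% baseline, 288 maximum credits) are illustrative. AWS's real accounting (launch credits, the 24-hour rule for charging surplus credits) has more detail than this model.

```python
from dataclasses import dataclass


@dataclass
class BurstableCpu:
    """Simplified per-minute CPU credit model for a burstable (T-family) instance.

    Assumption: 1 CPU credit = 1 vCPU at 100% utilization for 1 minute.
    Illustration only, not AWS's exact billing algorithm.
    """
    vcpus: int
    baseline: float          # baseline utilization per vCPU, e.g. 0.10 = 10%
    max_credits: float       # cap on accrued credits
    unlimited: bool = False  # False = standard mode, True = unlimited mode
    balance: float = 0.0     # accrued credits available for bursting
    surplus: float = 0.0     # surplus credits spent in unlimited mode (potentially billable)

    def run_minute(self, demand: float) -> float:
        """Run one minute at the requested utilization (0.0-1.0).

        Returns the utilization the instance actually delivers.
        """
        earned = self.vcpus * self.baseline  # credits earned this minute
        wanted = self.vcpus * demand         # credits needed this minute
        available = self.balance + earned

        if wanted <= available:
            leftover = available - wanted
            # Earned credits first pay down any outstanding surplus.
            repay = min(self.surplus, leftover)
            self.surplus -= repay
            self.balance = min(self.max_credits, leftover - repay)
            return demand

        self.balance = 0.0
        if self.unlimited:
            # Out of accrued credits: keep bursting and record surplus credits.
            self.surplus += wanted - available
            return demand
        # Standard mode: throttle to what the credits allow (the baseline once empty).
        return available / self.vcpus


cpu = BurstableCpu(vcpus=2, baseline=0.10, max_credits=288)
for _ in range(60):
    cpu.run_minute(0.05)
print(round(cpu.balance, 2))  # 6.0
```

With those numbers, an hour at 5% utilization banks about 6 credits (0.2 earned minus 0.1 spent per minute), while a standard-mode instance with an empty balance that asks for 100% only gets its 10% baseline. In unlimited mode the same request goes through and the shortfall is recorded in surplus.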
GPU instances
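Most GPU instances provide NVIDIA GPUs with thousands of compute cores.
They support GPU programming frameworks such as CUDA and OpenCL for engineering and scientific workloads.
Examples: p4, p3, g4, g5, and inf1 (inf1 uses AWS Inferentia chips rather than GPUs); the largest sizes offer 96 vCPUs or more.
Use cases: ML and deep learning training, inference, video transcoding, and graphics-intensive applications. GPUs include the NVIDIA A100 (p4d), V100 (p3), T4 (g4dn), and A10G (g5).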
Billing and purchasing
On-demand
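You pay as you go, with no long-term commitment. Most Linux and Windows instances are billed per second with a 60-second minimum; instances running some other operating systems are billed per hour.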
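A small Python sketch of what per-second billing with a 60-second minimum means for the cost of a single run. The hourly rate is a made-up number, not a real price, and the function name is hypothetical.

```python
def on_demand_cost(runtime_seconds: int, hourly_rate: float, minimum_seconds: int = 60) -> float:
    """Cost of one instance run under per-second billing with a minimum charge.

    hourly_rate is a hypothetical price in USD per hour.
    """
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    # Anything shorter than the minimum is billed as the minimum.
    billable = max(runtime_seconds, minimum_seconds)
    return hourly_rate * billable / 3600


rate = 0.10  # hypothetical USD per hour
print(on_demand_cost(30, rate))    # billed as 60 seconds
print(on_demand_cost(90, rate))    # billed as 90 seconds
print(on_demand_cost(3600, rate))  # one full hour
```

A 30-second run costs the same as a 60-second run, and past the minimum, cost grows linearly with the seconds used.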
Reserved Instances
- It is a billing discount, not a physical instance
- Running instances must match the RI's attributes: instance type, Region, platform (OS), and tenancy
- Regional RI: the discount applies in any Availability Zone of the Region and, for Linux/Unix with default tenancy, to any size within the same instance family
- Zonal RI: the discount applies only to the specific instance type in that Availability Zone (and reserves capacity there)
- Once the RI term expires, instances keep running and are billed at On-Demand rates
- Term: 1 or 3 years
- Payment options: All Upfront, Partial Upfront, No Upfront
- RIs can also be purchased from the Reserved Instance Marketplace; offerings can be explored with the AWS CLI command aws ec2 describe-reserved-instances-offerings
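Regional RI size flexibility works through normalization factors: each size has a weight (small = 1, large = 4, xlarge = 8, and so on), and an RI's normalized units can cover any mix of sizes in the same family. The Python sketch below applies only that rule; it ignores platform, tenancy, and the per-hour application of the discount, lists a subset of sizes, and the helper names are hypothetical.

```python
# Normalization factors used for regional RI size flexibility (subset of sizes).
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
    "12xlarge": 96, "16xlarge": 128, "24xlarge": 192,
}


def units(instance_type: str) -> float:
    """Normalized units for an instance type such as 'm5.xlarge'."""
    _family, size = instance_type.split(".", 1)
    return NORMALIZATION[size]


def coverage(ri_type: str, ri_count: int, running: list[str]) -> float:
    """Fraction of running instances' normalized units covered by regional RIs.

    Only running instances in the same family as the RI are counted.
    """
    family = ri_type.split(".", 1)[0]
    ri_units = units(ri_type) * ri_count
    needed = sum(units(t) for t in running if t.split(".", 1)[0] == family)
    if needed == 0:
        return 0.0
    return min(1.0, ri_units / needed)


# One m5.xlarge RI (8 units) fully covers two m5.large instances (4 + 4 units)...
print(coverage("m5.xlarge", 1, ["m5.large", "m5.large"]))  # 1.0
# ...but only half of one m5.2xlarge (16 units).
print(coverage("m5.xlarge", 1, ["m5.2xlarge"]))  # 0.5
```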
Spot instances
- Spare EC2 compute capacity offered at a lower price
- Prices adjust gradually based on long-term trends in supply and demand
- Prices can be checked with the AWS CLI (aws ec2 describe-spot-price-history) and update every 5 minutes
- Savings of up to 90% compared with On-Demand prices
- EC2 can reclaim its spare capacity at any time, with a two-minute interruption notice (an EventBridge event, formerly CloudWatch Events, and an instance metadata entry)
- An EC2 instance rebalance recommendation signal can arrive before the two-minute interruption notice, when the instance is at elevated risk of interruption
- Both signals can be caught with EventBridge rules or by polling (for example with curl) the instance metadata API:
  - EventBridge events: "EC2 Spot Instance Interruption Warning" and "EC2 Instance Rebalance Recommendation"
  - Metadata API: http://169.254.169.254/latest/meta-data/events/recommendations/rebalance (rebalance recommendation) and http://169.254.169.254/latest/meta-data/spot/instance-action (interruption notice)
- AWS CloudTrail logs instances terminated by a Spot interruption with the event name BidEvictedEvent
- Good fits: big data, containers, CI/CD, and other stateless, fault-tolerant applications
- Allocation: be flexible about instance families and Availability Zones in the Spot Fleet, so there are more capacity pools to allocate Spot Instances from
  - or select instance types by attributes (vCPUs, memory, storage) with attribute-based instance type selection
- AWS ASG vs EC2 Fleet?
- Services that can use Spot: EMR, ECS, AWS Batch, EKS, SageMaker, Elastic Beanstalk, GameLift
- For burstable instances launched as Spot, prefer standard mode over unlimited mode: a Spot Instance may not get the chance to accrue CPU credits, so unlimited mode would mostly spend billable surplus credits
- Instance interruptions can be triggered manually for testing (for example, with AWS Fault Injection Service)
- Manually interrupted Spot Instances are billed for the seconds used
- Best practices:
  - Don't store data on these instances; use S3, EBS, or DynamoDB
  - Use Spot with Auto Scaling groups
  - Make sure the AMI is fully prepared, so the OS and services initialize quickly
  - Monitor the two-minute interruption notice and the EC2 instance rebalance recommendation signal, to act proactively on Spot Instances that are about to be reclaimed
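Both Spot signals reach EventBridge from the aws.ec2 source. The Python sketch below writes an event pattern for them as a dict and adds a tiny, hypothetical matcher that implements only the exact-value-list part of the EventBridge pattern syntax, which is enough to check sample events locally. It is not the EventBridge service; real rules support many more operators, and the sample event is abbreviated.

```python
import json

# Event pattern for an EventBridge rule that catches both Spot signals.
SPOT_SIGNALS_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": [
        "EC2 Spot Instance Interruption Warning",
        "EC2 Instance Rebalance Recommendation",
    ],
}


def matches(pattern: dict, event: dict) -> bool:
    """Minimal matcher: every pattern key must be present in the event, and its
    value must equal one of the listed values (nested dicts are matched recursively).
    Covers only a small subset of EventBridge pattern syntax."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef", "instance-action": "terminate"},
}
print(matches(SPOT_SIGNALS_PATTERN, sample))  # True
print(json.dumps(SPOT_SIGNALS_PATTERN))       # JSON form to paste into the rule's event pattern
```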
Dedicated Hosts
- Physical servers dedicated to your account
- You can bring your own licenses (BYOL)
- BYOL requires your own AMI (not one from AWS Marketplace or AWS) containing the licensed software
- Purchase options: On-Demand, or a Dedicated Host Reservation with upfront payment for a discount
- Instance capacity: since you get the whole host, there is more to configure, such as sockets, physical cores, and vCPUs
- Single instance type vs multiple instance types: depending on the hardware, you choose whether the host:
  - can run multiple instance sizes within the same instance family
  - or runs only one specific instance type and size
- Burstable instances can run unchanged on a T3 Dedicated Host
  - CPU bursting works exactly as it does on shared tenancy
- Before launching instances on a Dedicated Host, you must first allocate the host
- Then choose single instance type or multiple instance type support
- When an allocated Dedicated Host is no longer needed, release it (On-Demand hosts are billed for as long as they are allocated)
- Sharing Dedicated Hosts:
  - You can share a Dedicated Host with other AWS accounts, or with accounts in your AWS Organization, so they can launch instances on it (via AWS RAM)
- Dedicated Host maintenance and recovery: AWS can prepare a replacement Dedicated Host and restart the running instances on it
Dedicated instances
- Run on single-tenant hardware: the hardware is dedicated to your AWS account and not shared with other accounts
- Why: compliance requirements
- More expensive than regular EC2 instances on shared tenancy
Capacity Reservations
- Dunno?
Launch templates
- Templates that store launch parameters for EC2 instances, so you don't have to specify them every time
- Attributes such as AMI, key pair, subnet ID, Availability Zone, etc.
- Launch templates are versioned and each version is immutable: a change creates a new version
- The AMI ID can be resolved from an SSM Parameter Store parameter (resolve:ssm:my_ami_name)
- Launch templates are the recommended replacement for launch configurations in Auto Scaling groups
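As an illustration, this Python sketch builds the LaunchTemplateData structure used by the EC2 CreateLaunchTemplate API, with the AMI resolved from Parameter Store through resolve:ssm:. The helper name and all values (parameter name, key pair, subnet ID) are placeholders. The boto3 calls appear only as comments, since running them needs AWS credentials.

```python
import json


def launch_template_data(ssm_ami_parameter: str, instance_type: str,
                         key_name: str, subnet_id: str) -> dict:
    """Build a LaunchTemplateData dict using EC2 CreateLaunchTemplate field names.

    The AMI is not hard-coded: 'resolve:ssm:<parameter>' tells EC2 to read the
    AMI ID from SSM Parameter Store at launch time.
    """
    return {
        "ImageId": f"resolve:ssm:{ssm_ami_parameter}",
        "InstanceType": instance_type,
        "KeyName": key_name,
        "NetworkInterfaces": [
            {"DeviceIndex": 0, "SubnetId": subnet_id},  # primary interface in the chosen subnet
        ],
    }


data = launch_template_data("my_ami_name", "t3.micro", "my-key", "subnet-0abc1234")
print(json.dumps(data, indent=2))

# With credentials, the same dict could be passed to boto3 (not executed here):
#   ec2 = boto3.client("ec2")
#   ec2.create_launch_template(LaunchTemplateName="web", LaunchTemplateData=data)
#   # A change goes into a new immutable version instead of editing the old one:
#   ec2.create_launch_template_version(LaunchTemplateName="web",
#                                      SourceVersion="1", LaunchTemplateData={...})
```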
Launching instances / Connect
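- Connect via SSH, or RDP for Windows
  - RDP is also available through AWS Systems Manager (SSM) -> Fleet Manager -> RDP
- EC2 Instance Connect: connect without managing long-lived SSH keys, at no additional cost
  - Requires the ec2-instance-connect package on the instance and an IAM policy that allows ec2-instance-connect:SendSSHPublicKey
- Instance usage is charged only while the instance is running (and while it is stopping to hibernate); attached EBS volumes are still charged while the instance is stopped
- Instance states: pending, running, stopping, stopped, shutting-down, terminated
- Instances can be protected against termination and stopping (termination protection and stop protection)
- An instance can hibernate, which saves its RAM contents to the EBS root volume
  - While hibernated (stopped), instance usage isn't charged
  - Check that the AMI and instance type support hibernation
  - The EBS root volume must be encrypted, since the RAM contents are written to it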
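The lifecycle above can be summarized as a small state machine. The states are the documented EC2 instance states; the action names are hypothetical labels, and the billed flag models instance-usage charges only (EBS, Elastic IPs, and other resources are billed separately), including the rule that a stop in preparation for hibernation is still billed.

```python
# Allowed transitions: (state, action) -> next state. Action names are hypothetical labels.
TRANSITIONS = {
    ("pending", "ready"): "running",
    ("running", "reboot"): "running",
    ("running", "stop"): "stopping",
    ("running", "hibernate"): "stopping",
    ("stopping", "done"): "stopped",
    ("stopped", "start"): "pending",
    ("running", "terminate"): "shutting-down",
    ("stopped", "terminate"): "shutting-down",
    ("shutting-down", "done"): "terminated",
}


class Instance:
    """Tracks an instance's lifecycle state and whether instance usage is billed."""

    def __init__(self) -> None:
        self.state = "pending"
        self.hibernating = False

    def apply(self, action: str) -> str:
        """Apply an action; raise ValueError if it isn't valid in the current state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} an instance in state {self.state!r}")
        # Remember whether this stop is a hibernation (it affects billing while stopping).
        self.hibernating = action == "hibernate"
        self.state = TRANSITIONS[key]
        return self.state

    @property
    def billed(self) -> bool:
        # Usage is billed while running, and while stopping in preparation to hibernate.
        return self.state == "running" or (self.state == "stopping" and self.hibernating)


inst = Instance()
for action in ("ready", "hibernate", "done"):
    print(inst.apply(action), inst.billed)
```

Launching and hibernating walks pending -> running -> stopping -> stopped; billed is True while running and during the hibernation's stopping phase, and False once stopped.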
Instance Metadata
URL: http://169.254.169.254/latest/meta-data/. The instance metadata service (IMDS) provides:
- instance information
- network information
- IAM role information
- storage information
- user data (served from http://169.254.169.254/latest/user-data)
- hostname information
- instance lifecycle
- security groups
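A minimal Python sketch of the token-based IMDSv2 flow using only the standard library: a PUT to /latest/api/token returns a session token, which must then be sent with every metadata GET. The helper names are hypothetical. Fetching only works from inside an EC2 instance with IMDS enabled, but the request builders can be inspected anywhere.

```python
import urllib.error
import urllib.request
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """PUT request that asks IMDS for a session token (IMDSv2)."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """GET request for a metadata path such as 'instance-id'."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def get_metadata(path: str, timeout: float = 2.0) -> Optional[str]:
    """Fetch one metadata item; return None if the path doesn't exist (HTTP 404).

    Only works from inside an EC2 instance.
    """
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    try:
        with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # e.g. no rebalance recommendation or interruption notice yet
            return None
        raise


# On an instance, for example:
#   get_metadata("instance-id")
#   get_metadata("placement/availability-zone")
#   get_metadata("events/recommendations/rebalance")
```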
Networking
Concepts: subnets, VPCs, security groups (SGs), route tables (RTs), subnet masks, public/private IPs, NAT, internet gateways (IGW), routing
VPCs support traffic monitoring (for example, VPC Flow Logs and Traffic Mirroring).
Summaries:
- ENA, Elastic Network Adapter: enhanced networking on Nitro-based instances
- EFA, Elastic Fabric Adapter: low-latency networking for HPC and ML workloads
AWS has multiple Regions, Availability Zones, Local Zones, Wavelength Zones, and AWS Outposts.
Instance IP addressing
- Supports IPv4 and IPv6
- Don't use a NAT gateway just for SSH access; use an EC2 Instance Connect Endpoint instead
  - so the instance doesn't need public internet access
- Use AWS PrivateLink to reach AWS services over the AWS network without public IPv4 addresses
- Public IPv4 addresses, including temporary auto-assigned ones, are billed
- Each VPC has an Amazon-provided DNS server, at the VPC CIDR base address plus two
- A single ENI can have multiple private IPv4 addresses, depending on the instance type and family
- An instance's private hostname can be:
  - resource name: i-0123456789abcdef.us-west-2.compute.internal
  - IP name: ip-10-24-34-0.us-west-2.compute.internal
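The addressing rules above can be computed with Python's ipaddress module. This sketch assumes the documented conventions: the Amazon-provided DNS server sits at the VPC base address plus two, AWS reserves five IPv4 addresses in every subnet, and IP-based hostnames use ec2.internal in us-east-1 and <region>.compute.internal elsewhere. The function names are hypothetical.

```python
import ipaddress


def vpc_dns_resolver(vpc_cidr: str) -> str:
    """Amazon-provided DNS server address: VPC CIDR base address plus two."""
    net = ipaddress.ip_network(vpc_cidr)
    return str(net.network_address + 2)


def usable_subnet_ips(subnet_cidr: str) -> int:
    """Usable IPv4 addresses in a subnet. AWS reserves 5: the network address,
    +1 (VPC router), +2 (DNS), +3 (future use), and the last address (broadcast)."""
    net = ipaddress.ip_network(subnet_cidr)
    return net.num_addresses - 5


def private_ip_hostname(private_ip: str, region: str) -> str:
    """IP-based private hostname, e.g. ip-10-24-34-0.us-west-2.compute.internal."""
    label = "ip-" + private_ip.replace(".", "-")
    domain = "ec2.internal" if region == "us-east-1" else f"{region}.compute.internal"
    return f"{label}.{domain}"


print(vpc_dns_resolver("10.0.0.0/16"))                   # 10.0.0.2
print(usable_subnet_ips("10.0.1.0/24"))                  # 251
print(private_ip_hostname("10.24.34.0", "us-west-2"))    # ip-10-24-34-0.us-west-2.compute.internal
```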
Elastic IP addresses
- Static public IPv4 addresses that can be associated with, disassociated from, and released from EC2 instances
- Default quota: 5 Elastic IP addresses per Region
- Can be transferred to another AWS account
Network interfaces
- Some instance types support multiple network cards (and more ENIs); these reach higher network bandwidth, above 100 Gbps
- Each instance family and type supports a given number of ENIs, and each ENI supports a given number of IPv4 addresses
Enhanced networking
- Nitro-based instances use the Elastic Network Adapter (ENA), which supports enhanced networking of up to 100 Gbps
- ethtool (for example, ethtool -S eth0) provides network interface statistics, including ENA driver counters
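On ENA instances, ethtool -S prints driver counters, including the *_allowance_exceeded counters that count packets queued or dropped because an instance-level network limit was hit. A small Python parser for that output; the sample text is abbreviated and made up for illustration, and the helper names are hypothetical.

```python
ALLOWANCE_COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
)


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse 'name: value' lines from `ethtool -S <iface>` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Skip header lines such as "NIC statistics:" that have no numeric value.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats


def exceeded_allowances(stats: dict[str, int]) -> dict[str, int]:
    """Return the ENA allowance counters that are non-zero."""
    return {k: stats[k] for k in ALLOWANCE_COUNTERS if stats.get(k, 0) > 0}


sample = """NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 12
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 3
"""
print(exceeded_allowances(parse_ethtool_stats(sample)))
# {'bw_out_allowance_exceeded': 12, 'linklocal_allowance_exceeded': 3}
```

On a real instance the text would come from running ethtool -S against the interface (for example via subprocess.run with capture_output=True).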
Elastic Fabric Adapter
- Low-latency network interfaces optimized for AI/ML and HPC applications
- For ML workloads, EFA is used with NCCL (NVIDIA Collective Communications Library) through the aws-ofi-nccl plugin
Others?
- Instance topology
- Placement groups
- Network MTU
- Virtual private clouds
Storage
EC2 instances can use several storage options: EBS, instance store, S3, EFS, FSx, and Amazon File Cache.
Amazon EBS
- EBS volumes can be attached to an EC2 instance
- EBS snapshots are point-in-time backups of an EBS volume (stored in S3)
- Each instance type has a maximum number of EBS volumes that can be attached
  - most Nitro instances support 28 attachments in total
  - network interfaces count toward that limit: 28 attachments = 27 EBS volumes + 1 ENI
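The shared-limit arithmetic, written out as a tiny Python helper. It assumes the shared 28-attachment limit described above, and also counts NVMe instance store volumes against it; some instance types have different or dedicated EBS limits, so check the documentation for the specific type. The function name is hypothetical.

```python
def max_ebs_volumes(shared_limit: int = 28, network_interfaces: int = 1,
                    nvme_instance_store: int = 0) -> int:
    """EBS volumes that still fit when attachments share one limit.

    Assumes the limit is shared by EBS volumes, network interfaces, and NVMe
    instance store volumes; the root EBS volume counts as one of the EBS volumes.
    """
    remaining = shared_limit - network_interfaces - nvme_instance_store
    if remaining < 1:
        raise ValueError("no attachment slots left for the root EBS volume")
    return remaining


print(max_ebs_volumes())                       # 27: one ENI leaves 27 EBS slots
print(max_ebs_volumes(network_interfaces=3))   # 25
```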
Amazon EC2 instance store
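- Ephemeral block storage on disks physically attached to the host; data is lost when the instance stops, hibernates, or terminates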
Root volumes
Device names for volumes
Block device mappings
