Learning More about AWS (Part 1) - Notes for Certified Cloud Practitioner Exam

I interact with AWS a good amount: I use S3 for image/audio/video/other data storage, and I use the Relational Database Service (RDS) for this site's database. So I am going to try to learn more about how it all works. I have learned a lot about computer networking by having to look up what is what when implementing AWS / Google Cloud services, but I still don't look forward to setting up another service. By learning more about AWS, I hope that the task will begin to seem less daunting.

What I Intend to Learn

  • I hope to learn enough to earn the AWS Certified Cloud Practitioner and AWS Certified Developer - Associate certifications.
Things that I Have Done with AWS / Things that I Need to Do

  • What I Have Done
    • I have created S3 buckets to store / retrieve images, audio files, video files, and GeoJSON objects.
    • I have used CloudFront to serve the objects in these S3 buckets from edge locations, making access to them quicker.
    • I have created Lambda functions that are triggered by events in the S3 buckets.
    • I have created an Amazon RDS for PostgreSQL database that I use for this website.
    • I have limited access to each of these buckets based on IP address, using CORS headers, and by restricting access to certain AWS users.
  • What I Need to Do
    • I need to create a few different things relating to images stored in the S3 bucket
      • A Lambda function that checks whether an image contains inappropriate content.
      • A way for images to be resized (width / height) on the server and then sent to the client, or saved as a new version of the image in the S3 bucket
    • I need to validate videos / audio stored in AWS
    • I need to generate captions for video / audio in AWS
    • I need to generate thumbnails for videos on AWS
    • I need to change the audio / video process for streaming video / audio
      • This replaces the ffmpeg process that I am currently performing on the server
AWS Certified Cloud Practitioner Notes

Getting Started
  • Amazon Web Services
    • AWS has 200+ services
    • Reliable, secure and cost-effective
  • Exam tests your decision-making capabilities
    • Which service do you choose in which situation
  • Benefits of the Cloud
    • On demand resource provisioning (also called Elasticity) - only provision resources when you need them and release the resources when no longer needed
    • Trade capital expense for variable expense
    • Benefit from massive economies of scale
    • Stop guessing capacity
    • Increase speed and agility
    • Stop spending money running and maintaining data centers
    • Go global in minutes
Regions and Zones
  • AWS provides 20+ regions around the world (expanding every year)
  • AWS Regions - Advantages
    • Low Latency
    • Global Footprint
    • Adhere to government regulations
    • High availability
  • Availability Zones (AZs)
    • Each AWS region consists of multiple, isolated, and physically separate AZs
    • Availability Zones in a Region are connected through low-latency links
    • Each availability zone:
      • Can have one or more discrete data centers
      • Has redundant power, networking, and connectivity
    • Increase availability and fault tolerance of applications in the same region
    • Achieve high availability and greater fault tolerance

EC2 Fundamentals
  • Where do you deploy applications in AWS?
    • EC2 instances - virtual servers in AWS
    • EC2 service - Provision EC2 instances or virtual servers
  • EC2 Features
    • Create and manage the lifecycle of EC2 instances
    • Attach storage (& network storage) to your EC2 instance
    • Manage network connectivity to your EC2 instance
    • Load balancing and auto scaling for multiple EC2 instances

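Useful Commands (install and start the Apache web server on an EC2 instance running Amazon Linux):

# Switch to the root user
sudo su
# Update the installed packages
yum update -y
# Install the Apache HTTP server (-y answers "yes" to the install prompt)
yum install -y httpd
# Start Apache now, and enable it so it starts again on every boot
systemctl start httpd
systemctl enable httpd
# Replace the default page with a simple test page
echo "Hello World 2" > /var/www/html/index.html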

EC2 Concepts - Amazon Machine Image (AMI)

  • What operating system and what software do you want on the instance?
  • Three AMI sources:
    • Provided by AWS
    • AWS Marketplace: Online store for customized AMIs. Per hour billing.
    • Customized AMIs: Created by you

EC2 Concepts - Instance Families

  • Optimized combination of compute (CPU, GPU), memory, disk (storage) and networking for specific workloads
  • 270+ instances across 40+ types for different workloads
    • m (m4, m5, m6) - General Purpose
    • c (c4, c5, c5n) - Compute Optimized
    • r (r4, r5, r5a, r5n) - Memory (RAM) Optimized
    • i (i3) - Storage (I/O) Optimized
    • g (g3, g4) - GPU Optimized - Graphics Processing

EC2 Concepts - Security Groups

  • Virtual firewall to control incoming and outgoing traffic to/from AWS resources (EC2 instances, databases, etc.)
  • Provides an additional layer of security - Defense in Depth

Security Group Rules

  • Default deny - If there are no rules configured, no outbound/inbound traffic is allowed
  • Supports allow rules ONLY (there are no deny rules)
  • Separate rules for inbound and outbound traffic
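
To make "default deny, allow rules only" concrete, here is a minimal Python sketch of how an inbound check behaves. It is a simplified model, not the AWS API: `AllowRule` and `is_inbound_allowed` are hypothetical names, each rule covers a single port instead of a port range, and it ignores the fact that real security groups are stateful (return traffic for an allowed request is automatically allowed).

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    protocol: str     # e.g. "tcp"
    port: int         # a single port (real rules can also use port ranges)
    source_cidr: str  # e.g. "0.0.0.0/0" for anywhere


def is_inbound_allowed(rules: Iterable[AllowRule], protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: traffic is allowed only if at least one allow rule matches."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and address in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


# Hypothetical rules for a web server.
web_server_rules = [
    AllowRule("tcp", 80, "0.0.0.0/0"),       # HTTP from anywhere
    AllowRule("tcp", 22, "203.0.113.0/24"),  # SSH only from one (documentation) IP range
]

print(is_inbound_allowed(web_server_rules, "tcp", 80, "198.51.100.7"))  # True
print(is_inbound_allowed(web_server_rules, "tcp", 22, "198.51.100.7"))  # False
print(is_inbound_allowed([], "tcp", 80, "198.51.100.7"))                # False: no rules, no traffic
```

Because there are no deny rules, the only way to block traffic is to not allow it; an empty rule list blocks everything.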

EC2 Security - Key Pairs

  • EC2 uses public key cryptography for protecting login credentials
  • Key pair - public key and private key
    • Public key is stored on the EC2 instance
    • Private key is stored by the customer

EC2 IP Addresses

  • Public IP addresses are internet addressable.
  • Private IP addresses are internal to a corporate network.
  • You CANNOT have two resources with same public IP address.
  • HOWEVER, two different corporate networks CAN have resources with the same private IP address
  • All EC2 instances are assigned private IP addresses
  • (Remember) When you stop an EC2 instance, public IP address is lost

Elastic IP Addresses

  • How do you get a constant public IP address for an EC2 instance?
    • Quick and dirty way is to use an Elastic IP
  • Elastic IP can be switched to another EC2 instance within the same region
  • Elastic IP remains attached even if you stop the instance. You have to manually detach it.
  • Remember: You are charged for an Elastic IP address when you are NOT using it. Make sure that you explicitly release an Elastic IP when you are not using it.

IaaS (Infrastructure as a Service)

  • Use only infrastructure from a cloud provider
    • Computers (virtual or on dedicated hardware), data storage space and Networking features
  • Also called "Lift and Shift"
  • Cloud Provider is responsible for:
    • Physical Infrastructure (Hardware, Networking)
    • Virtualization Layer (Hypervisor, Host OS)
  • Customer is responsible for:
    • Guest OS upgrades and patches
    • Application Code and Runtime
    • Availability, Fault Tolerance, Scalability etc.

PaaS (Platform as a Service)

  • Use a platform provided by the cloud provider
  • Cloud Provider is responsible for:
    • OS (incl. upgrades and patches)
    • Application Runtime
    • Auto scaling, Availability & Load balancing etc...
  • Customer is responsible for:
    • Application code and/or
    • Configuration

AWS Managed Service Offerings

  • Elastic Load Balancing - Distribute incoming traffic across multiple targets
  • AWS Elastic Beanstalk - Run and Manage Web Apps
  • Amazon RDS - Managed relational databases (MySQL, Oracle, SQL Server, etc.)
  • And a lot more...

Elastic Load Balancer

  • Distribute traffic across EC2 instances in one or more AZs in a single region
  • Managed service - AWS ensures that it is highly available
  • Auto scales to handle huge loads
  • Load Balancers can be public or private
  • Health checks - route traffic to healthy instances

Three Types of Elastic Load Balancers

  • Classic Load Balancer (Layer 4 and Layer 7)
    • Old generation supporting Layer 4 (TCP/TLS) and Layer 7 (HTTP/HTTPS) protocols
    • Not recommended by AWS
  • Application Load Balancer (Layer 7)
    • Most popular and frequently used ELB in AWS
    • New generation supporting HTTP/HTTPS
    • Supports advanced routing approaches (Headers, Query Params, Path and Host Based)
  • Network Load Balancer (Layer 4)
    • New generation supporting TCP/TLS and UDP
    • Very high performance use cases

Availability

  • Are the applications available when the users need them?
  • Percentage of time an application provides the operations expected of it
  • Example: 99.99% availability. Also called four 9's availability
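
To get a feel for what the nines mean, the sketch below converts an availability percentage into the downtime it allows per year (assuming a 365-day year). For example, 99.9% (three 9's) allows about 8.76 hours of downtime per year, while 99.99% (four 9's) allows about 52.6 minutes. The function name is hypothetical.

```python
# Convert an availability percentage into the maximum downtime it allows.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years for simplicity


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return how many minutes per period a service may be down at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_minutes


for availability in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(availability)
    print(f"{availability}% -> {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Each extra 9 cuts the allowed downtime by a factor of ten.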

Availability Basics - EC2 and ELB

  • Deploy to multiple AZs
  • Deploy to multiple regions

Scalability

  • A system is handling 1000 transactions per second. Load is expected to increase 10 times in the next month
    • Can we handle a growth in users, traffic, or data size without any drop in performance?
    • Does ability to serve more growth increase proportionally with resources?
  • Ability to adapt to changes in demand (users, data)
  • What are the options that can be considered?
    • Deploy to bigger instances with bigger CPU and more memory
    • Increase the number of application instances and set up a load balancer
    • and a lot more...

Vertical Scaling

  • Deploying application / database to bigger instance:
    • A larger hard drive
    • A faster CPU
    • More RAM, CPU, I/O, or networking capabilities
  • There are limits to vertical scaling

Vertical Scaling for EC2

  • Increasing EC2 instance size:
    • t2.micro to t2.small or
    • t2.small to t2.xlarge or
    • ...

Horizontal Scaling

  • Deploying multiple instances of application / database
  • (Typically but not always) Horizontal Scaling is preferred to Vertical Scaling:
    • Vertical Scaling has limits
    • Vertical Scaling can be expensive
    • Horizontal Scaling increases availability
  • (BUT) Horizontal Scaling needs additional infrastructure
    • Load Balancers (etc.)

Horizontal Scaling for EC2

  • Distribute EC2 instances
    • in a single AZ
    • in multiple AZs in a single region
    • in multiple AZs in multiple regions
  • Auto Scale: Auto Scaling Group
  • Distribute Load: Elastic Load Balancer

EC2 Tenancy - Shared vs Dedicated

  • Shared Tenancy (Default)
    • Single host machine can have instances from multiple customers
  • EC2 Dedicated Instances
    • Virtualized instances on hardware dedicated to one customer
    • You do NOT have visibility into the hardware of the underlying host
  • EC2 Dedicated Hosts
    • Physical servers dedicated to one customer
    • You have visibility into the hardware of the underlying host (sockets and physical cores)
    • (Use cases) Regulatory needs or server-bound software licenses like Windows Server, SQL Server

EC2 Pricing Models Overview

  • On Demand - Request when you need it. Flexible and most expensive
  • Spot - Quote the maximum price. Cheapest (up to 90% off) BUT NO guarantees
  • Reserved - Reserve ahead of time. Up to 75% off. 1 or 3 year reservation
  • Savings Plan - Commit to spending $X per hour on EC2, AWS Fargate, or Lambda. Up to 66% off. No restrictions. 1 or 3 year commitment

EC2 On-Demand

  • On demand resource and provisioning - Use And Throw!
  • Highest cost and highest flexibility
  • This is what we have been using until now in this course
  • Ideal for:
    • A web application which receives spiky traffic
    • A batch program which has unpredictable runtime and cannot be interrupted
    • A batch program being moved from on-premises to cloud for the first time

EC2 Spot Instances

  • (Old Model) Bid a price. Highest bidder wins
  • (New Model) Quote your maximum price. Prices decided by long term trends
  • Up to 90% off (compared to On-Demand)
  • Can be terminated with a 2 minute notice
  • Ideal for non-time-critical workloads that can tolerate interruptions (fault-tolerant)
    • A batch program that does not have a strict deadline AND can be stopped at short notice and re-started

EC2 Reserved Instances

  • Reserve EC2 instances ahead of time!
  • Get up to 75% OFF!
  • Payment Models:
    • No upfront - $0 upfront. Pay monthly installment
    • Partial Upfront. $XYZ upfront. Pay monthly installment
    • All Upfront - Full amount upfront. $0 monthly installment
    • Cost-wise: the earlier you pay, the bigger the discount. All Upfront < Partial Upfront < No Upfront
    • The difference between the payment options is up to 5%
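
A minimal sketch of why paying earlier saves more: the effective hourly rate of a reservation is the total paid (upfront plus monthly installments) spread across every hour of the term. All prices below are hypothetical placeholders chosen only to show the calculation. They are not real AWS prices.

```python
# Compare Reserved Instance payment options by their effective hourly rate.
# All prices below are HYPOTHETICAL placeholders, not real AWS prices.
HOURS_PER_YEAR = 365 * 24


def effective_hourly_rate(upfront: float, monthly: float, years: int) -> float:
    """Total cost of a reservation spread over every hour of its term."""
    total = upfront + monthly * 12 * years
    return total / (HOURS_PER_YEAR * years)


def discount_vs_on_demand(rate: float, on_demand_rate: float) -> float:
    """Percentage saved compared with paying the on-demand hourly rate."""
    return (1 - rate / on_demand_rate) * 100


on_demand = 0.10  # hypothetical on-demand price per hour
options = {
    "No Upfront": effective_hourly_rate(upfront=0, monthly=46.0, years=1),
    "Partial Upfront": effective_hourly_rate(upfront=270, monthly=22.5, years=1),
    "All Upfront": effective_hourly_rate(upfront=525, monthly=0, years=1),
}
for name, rate in options.items():
    print(f"{name}: ${rate:.4f}/hour, {discount_vs_on_demand(rate, on_demand):.1f}% off")
```

With these made-up numbers, All Upfront has the lowest effective rate and No Upfront has the highest.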

EC2 Savings Plans

  • EC2 Compute Savings Plans
    • Commitment: I would spend X dollars per hour on AWS compute resources (Amazon EC2 instances, AWS Fargate, and/or AWS Lambda) for a 1 or 3 year period
    • Up to 66% off (compared to on demand instances)
    • Provides complete flexibility:
      • You can change instance family, size, OS, tenancy or AWS Region of your Amazon EC2 instances
      • You can switch between Amazon EC2, AWS Fargate and/or AWS Lambda
  • EC2 Instance Savings Plans
    • Commitment: I would spend X dollars per hour on Amazon EC2 instances of a specific instance family (General Purpose, for example) within a specific region (us-east-1, for example)
    • Up to 72% off (compared to on demand instances)
    • You can switch operating systems (Windows to Linux, for example)

EC2 Pricing Models - Use Cases

  • On Demand - Spiky workloads
  • Spot - Cost sensitive, fault tolerant, non-immediate workloads
  • Reserved - Constant workloads that run all the time
  • Savings Plans - Constant workloads that run all the time when you want more flexibility

AWS Elastic Beanstalk

  • Next level of Platform as a Service
  • Simplest way to deploy and scale your web applications in AWS
    • Provides end-to-end web application management
  • Supports Java, .NET, Node.js, PHP, Ruby, Python, Go, and Docker Applications
  • No usage charges - Pay for AWS Resources provisioned
  • Features:
    • Automatic Load Balancing
    • Auto scaling
    • Managed platform updates
    • Application health monitoring

AWS Elastic Beanstalk Concepts

  • Application - A container for environments, versions and configuration
  • Application Version - A specific version of deployable code (stored in S3)
  • Environment - An application version deployed to AWS resources. You can have multiple environments running different application versions for the same application

Auto Scaling Components

  • Launch Configuration / Template (What?)
    • EC2 instance size and Amazon Machine Image
  • Auto Scaling Group (Where?)
    • Min, max and desired size of ASG
    • Health checks
  • Auto Scaling Policies (When?)
    • When and How to execute scaling

Dynamic Scaling Policy Types

Scaling Policies - Background

  • Two Parts:
    • CloudWatch alarm (Is CPU utilization > 80%? Or < 60%?)
    • Scaling action (+5 EC2 instances or -3 EC2 instances)
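
Here is a simplified sketch of the two parts working together. A CloudWatch-style alarm check picks a scaling action, and the Auto Scaling Group keeps the desired capacity between its min and max sizes. The thresholds (80% / 60%) and step sizes (+5 / -3) come from the notes above. The class and function names are hypothetical. Real scaling policies also involve cooldowns and alarm evaluation periods, and this sketch does not model them.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int

    def apply(self, change: int) -> int:
        """Apply a scaling action, keeping desired capacity within [min_size, max_size]."""
        self.desired = max(self.min_size, min(self.max_size, self.desired + change))
        return self.desired


def scaling_action(cpu_percent: float, high: float = 80, low: float = 60,
                   scale_out: int = 5, scale_in: int = -3) -> int:
    """Alarm check: +5 instances above `high`, -3 below `low`, otherwise no change."""
    if cpu_percent > high:
        return scale_out
    if cpu_percent < low:
        return scale_in
    return 0


asg = AutoScalingGroup(min_size=2, max_size=10, desired=4)
for cpu in (85, 90, 70, 40, 30, 20):
    print(cpu, asg.apply(scaling_action(cpu)))
```

Starting from 4 instances, the capacity goes 9, 10, 10, 7, 4, 2. The second +5 is capped at the max, and the last -3 is capped at the min.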

Serverless

  • What if we do not need to worry about servers and focus on building our application?
  • Enter Serverless
  • Remember: Serverless does not mean "No Servers"
  • Serverless for me:
    • You don't worry about infrastructure
    • Flexible scaling
    • Automated High Availability
    • Pay for use:
      • You don't have to provision servers or capacity
  • You focus on code and the cloud managed service takes care of all that is needed to scale your code to serve millions of requests

AWS Lambda

  • You don't worry about servers or scaling or availability
  • You only worry about your code
  • You pay for what you use
    • Number of requests
    • Duration of requests
    • Memory allocated to the function
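
The pricing dimensions above can be turned into a quick estimate. Requests are billed per million. Compute time is billed in GB-seconds: execution time in seconds multiplied by the memory configured for the function, in GB. This minimal sketch ignores the free tier. Prices are passed in as parameters, and the values in the usage example are placeholders, so check the AWS Lambda pricing page for real numbers.

```python
def lambda_monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int,
                        price_per_million_requests: float, price_per_gb_second: float) -> float:
    """Estimate monthly Lambda cost: request charge + compute (GB-seconds) charge, no free tier."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    # GB-seconds = total execution time in seconds x configured memory in GB
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder prices for illustration only.
cost = lambda_monthly_cost(
    requests=3_000_000, avg_duration_ms=120, memory_mb=512,
    price_per_million_requests=0.20, price_per_gb_second=0.00001,
)
print(f"${cost:.2f}")  # $2.40 (0.60 for requests + 1.80 for 180,000 GB-seconds)
```

Doubling the configured memory doubles the compute part of the bill unless it also shortens the execution time.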

AWS Lambda - Supported Languages

  • Java
  • Go
  • PowerShell
  • Node.js
  • C#
  • Python
  • Ruby
  • and a lot more...

AWS Lambda Event Sources

  • Amazon API Gateway
  • Amazon Cognito
  • Amazon DynamoDB (event)
  • Amazon CloudFront (Lambda@Edge)
  • AWS Step Functions
  • Amazon Kinesis (event)
  • Amazon Simple Storage Service (S3)
  • Amazon Simple Queue Service (event)
  • Amazon Simple Notification Service
  • The list is endless...
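
Amazon S3 is one of these event sources, and I use S3-triggered Lambda functions. Here is a minimal sketch of a Python handler that reads the bucket name and object key from each record of an S3 event notification. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format. Object keys arrive URL-encoded, so the handler decodes them with `unquote_plus`. The bucket name and key in the sample event are hypothetical. The actual processing, such as resizing an image with boto3, is left out.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
        # A real handler would process the object here (e.g. download and resize an image).
    return {"processed": objects}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-media-bucket"},
                "object": {"key": "images/my+photo.jpg"}}}
    ]
}
print(lambda_handler(sample_event, None))
# {'processed': [('example-media-bucket', 'images/my photo.jpg')]}
```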

Other Compute Services

  • Amazon Lightsail
    • Use case 1: Pre-configured development stacks like LAMP, Nginx, MEAN, and Node.js
    • Use case 2: Run websites on WordPress, Magento, Plesk, and Joomla
    • Low, predictable monthly price
  • AWS Batch
    • Use Case: Run batch computing workloads on AWS
    • Use Amazon EC2 and Amazon EC2 Spot Instances
Storage

I am thinking about having all my projects use the same S3 buckets for images / audio / video / fonts. I need to pay attention to bucket performance vs. size.

Amazon S3 (Simple Storage Service)

  • Most popular, very flexible, and inexpensive storage service
  • Store large objects using a key-value approach
  • Also called Object Storage
  • Provides REST API to access and modify objects
  • Provides unlimited storage
    • (S3 Standard storage class) 99.99% availability & 11 9's (99.999999999%) durability
    • Objects are replicated in a single region (across multiple AZs)
  • Store all file types - text, binary, backup, and archives:
    • Media files and archives
    • Application packages and logs
    • Backups of your databases or storage devices
    • Staging data during on-premise to cloud database migration

Amazon S3 - Objects and Buckets

  • Amazon S3 is a global service. Not associated with a region.
    • However, a bucket is created in specific AWS region.
  • Objects are stored in buckets
    • Bucket names are globally unique
    • Bucket names are used as part of object URLs => Can contain ONLY lowercase letters, numbers, hyphens, and periods
    • Unlimited objects in a bucket
  • Each object is stored as a key-value pair: the key is the object's unique name within the bucket and the value is its content
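
Here is a minimal sketch that checks a name against a subset of the S3 bucket naming rules:

  • 3 to 63 characters
  • only lowercase letters, numbers, hyphens, and periods
  • starts and ends with a letter or number
  • no two adjacent periods
  • not formatted like an IP address

The full rule list in the S3 documentation is longer, so treat this as illustrative. `is_valid_bucket_name` is a hypothetical helper.

```python
import re

# Partial check of S3 bucket naming rules (see the S3 documentation for the full list).
_ALLOWED = re.compile(r"[a-z0-9.-]+")
_IP_LIKE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_valid_bucket_name(name: str) -> bool:
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    # Must begin and end with a letter or number.
    if not (name[0].isalnum() and name[-1].isalnum()):
        return False
    if ".." in name:
        return False
    # Must not be formatted like an IP address.
    return not _IP_LIKE.fullmatch(name)


for candidate in ("my-site-images", "My_Bucket", "192.168.5.4"):
    print(candidate, is_valid_bucket_name(candidate))
```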

Amazon S3 Storage Classes - Introduction

  • Different kinds of data can be stored in Amazon S3
    • Media files and archives
    • Application packages and logs
    • Backups of your databases or storage devices
    • Long term archives
  • Huge variations in access patterns
  • Trade-off between access time and cost
  • S3 storage classes help to optimize your costs while meeting access time needs

Amazon S3 Storage Classes

Amazon S3 Storage Classes - Comparison

Amazon S3 Cost

  • Important pricing elements:
    • Cost of storage (per GB)
    • (If Applicable) Retrieval Charge (per GB)
    • Monthly tiering fee (Only for Intelligent Tiering)
    • Data transfer fee
  • FREE of Cost:
    • Data transfer into S3
    • Data transfer from Amazon S3 to Amazon CloudFront
    • Data transfer from Amazon S3 to services in the same region
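
Here is a rough cost model for the pricing elements above. It treats the three free transfer routes as zero cost. This sketch uses placeholder prices passed in as parameters, since real prices vary by storage class and region. It also leaves out the Intelligent-Tiering monitoring fee. The `Transfer` type and route labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    gigabytes: float
    route: str  # hypothetical labels: "in", "to-cloudfront", "same-region", "internet"


# Transfer routes listed in the notes as free of cost.
FREE_ROUTES = {"in", "to-cloudfront", "same-region"}


def monthly_s3_cost(stored_gb: float, retrieved_gb: float, transfers: list,
                    storage_price_per_gb: float, retrieval_price_per_gb: float,
                    transfer_price_per_gb: float) -> float:
    storage = stored_gb * storage_price_per_gb
    retrieval = retrieved_gb * retrieval_price_per_gb  # 0 for classes without retrieval charges
    billable_gb = sum(t.gigabytes for t in transfers if t.route not in FREE_ROUTES)
    return storage + retrieval + billable_gb * transfer_price_per_gb


transfers = [Transfer(50, "in"), Transfer(200, "to-cloudfront"), Transfer(10, "internet")]
cost = monthly_s3_cost(stored_gb=100, retrieved_gb=0, transfers=transfers,
                       storage_price_per_gb=0.02, retrieval_price_per_gb=0.0,
                       transfer_price_per_gb=0.09)  # placeholder prices
print(f"${cost:.2f}")  # $2.90: only the 10 GB sent to the internet is billed for transfer
```

Serving objects through CloudFront moves the transfer charge from S3 to CloudFront. The CloudFront charge is not modeled here.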

Amazon S3 Glacier

  • In addition to existing as an S3 storage class, S3 Glacier is a separate AWS service of its own
  • Extremely low cost storage for archives and long-term backups:
    • Old media content
    • Archives to meet regulatory requirements (old patient records etc.)
    • As a replacement for magnetic tapes
  • High durability (11 9s - 99.999999999%)
  • High scalability
  • High security (encrypted at rest and in transfer)

Amazon S3 vs S3 Glacier

Storage Types - Block Storage and File Storage

  • What is the type of storage of your hard disk?
    • Block storage
    • You've created a file share to share a set of files with your colleagues in an enterprise. What type of storage are you using?
    • File storage

Block Storage

  • Use case: Hard disks attached to your computers
  • Typically, ONE block storage device can be connected to ONE virtual server

File Storage

  • Media workflows need huge shared storage for supporting processes like video editing
  • Enterprise users need a quick way to share files in a secure and organized way
  • These file shares are shared by several virtual servers

AWS Block Storage and File Storage

  • Block Storage
    • Amazon Elastic Block Store (EBS)
    • Instance Store
  • File Storage
    • Amazon EFS (for Linux Instances)
    • Amazon FSx for Windows File Server
    • Amazon FSx for Lustre (high performance use cases)

EC2 - Block Storage

  • Two popular types of block storage can be attached to EC2 instances:
    • Elastic Block Store (EBS)
    • Instance Store

  • Instance Stores are physically attached to the EC2 instance
    • Temporary data
    • Lifecycle tied to EC2 instance
  • Elastic Block Store (EBS) is network storage
    • More durable
    • Lifecycle not tied to EC2 instance

Instance Store

  • Physically attached to your EC2 instance
  • Ephemeral Storage
    • Temporary data
    • Data is lost when hardware fails or an instance is terminated
    • Use case: cache or scratch files
  • Lifecycle is tied to EC2 instance
  • Only some of the EC2 instance types support Instance Store

Instance Store - Advantages and Disadvantages

  • Advantages
    • Very fast I/O (2-100X of EBS)
    • (Cost Effective) No extra cost. Cost is included in the cost of EC2 instance
    • Ideal for storing temporary information - cache, scratch files, etc.
  • Disadvantages
    • Slow boot up (up to 5 minutes)
    • Ephemeral storage (data is lost when hardware fails or instance is terminated)
    • CANNOT take a snapshot or restore from snapshot
    • Storage size based on instance type
    • You cannot detach and attach it to another EC2 instance

Amazon Elastic Block Store (EBS)

  • Network block storage attached to your EC2 instance
  • Provisioned capacity
  • Very flexible
    • Increase size when you need it - when attached to EC2 instance
  • Independent lifecycle from EC2 instance
    • Attach/Detach from one EC2 instance to another
    • 99.999% Availability and replicated within the same AZ
    • Use Case: Run your custom database

Amazon EBS vs Instance Store

Hard Disk Drive vs Solid State Drive

Amazon EFS

  • Petabyte scale, Auto scaling, Pay for use shared file storage
  • Compatible with Amazon EC2 Linux-based instances
  • (Use cases) Home directories, file share, content management
  • (Alternative) Amazon FSx for Lustre
    • File system optimized for performance
    • High performance computing (HPC) and media processing use cases
    • Automatic encryption at-rest and in-transit
  • (Alternative) Amazon FSx for Windows File Server
    • Fully managed Windows file servers
    • Accessible from Windows, Linux, and macOS instances
    • Integrated with Microsoft Active Directory (AD) to support Windows-based environments and enterprises
    • Automatic encryption at-rest and in-transit

Review of Storage Options

AWS Storage Gateway

  • Hybrid storage (cloud + on premise)
  • Unlimited cloud storage for on-premise software applications and users with good performance
  • (Remember) Storage Gateway and S3 Glacier encrypt data by default
  • Three Options
    • AWS Storage File Gateway
    • AWS Storage Tape Gateway
    • AWS Storage Volume Gateway

AWS Storage File Gateway

  • Problem Statement: Large on-premise file share with terabytes of data
    • Users put files into file share and applications use the files
    • Managing it is becoming expensive
    • Move the file share to cloud without performance impact
  • AWS Storage File gateway provides cloud storage for your file shares
    • Files stored in Amazon S3 & Glacier

AWS Storage Tape Gateway

  • Tape backups used in enterprises (archives)
    • Stored off-site - expensive, physical wear and tear
  • AWS Storage Tape Gateway - Avoid physical tape backups
  • No change needed for tape backup infrastructure
  • Backup data to virtual tapes (actually, Amazon S3 and Glacier)

AWS Storage Volume Gateway

  • Volume Gateway: Move block storage to cloud
  • Automate backup and disaster recovery
  • Use cases: Backup and disaster recovery, Migration of application data
  • (Option 1) Cached (Gateway Cached Volumes):
    • Primary Data Store - AWS - Amazon S3
    • On-premise cache stores frequently accessed data
  • (Option 2) Stored (Gateway Stored Volumes):
    • Primary Data Store - On-Premises
    • Asynchronous copy to AWS
    • Stored as EBS snapshots

AWS Storage Gateway - Summary

  • Key to look for: Hybrid Storage (cloud + on premise)
  • File Share moved to cloud => AWS Storage File Gateway
  • Tape Backups on cloud => AWS Storage Tape Gateway
  • Volume Backups on cloud (Block Storage) => AWS Storage Volume Gateway
    • High Performance => Stored
    • Otherwise => Cached
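
The decision logic for picking a gateway is small enough to write down directly. This sketch follows the File, Tape, and Volume Gateway descriptions in this section. The scenario labels are hypothetical strings, not AWS identifiers.

```python
def choose_storage_gateway(scenario: str, high_performance: bool = False) -> str:
    """Pick a Storage Gateway option for a hybrid (cloud + on-premises) storage scenario."""
    if scenario == "file-share":
        return "AWS Storage File Gateway"
    if scenario == "tape-backup":
        return "AWS Storage Tape Gateway"
    if scenario == "block-volume":
        # Stored volumes keep the primary copy on-premises (fast local access);
        # cached volumes keep the primary copy in Amazon S3 with a local cache.
        mode = "Stored" if high_performance else "Cached"
        return f"AWS Storage Volume Gateway ({mode})"
    raise ValueError(f"unknown scenario: {scenario!r}")


print(choose_storage_gateway("tape-backup"))
print(choose_storage_gateway("block-volume", high_performance=True))
```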
Databases

Databases Primer

  • Databases provide organized and persistent storage for your data
  • To choose between different database types, we would need to understand
    • Availability
    • Durability
    • Consistency
    • Transactions etc.
  • Let's get started on a simple journey to understand these
  • Database Snapshots - making a copy of your database at a point in time
  • Database Transaction Logs - add transaction logs to the database and create a process to copy them over to a second data center
  • Standby Database - a database in a second data center that you can switch to if your primary database goes down

Availability and Durability

  • Availability
    • Will I be able to access my data when I need it?
    • Percentage of time an application provides the operations expected of it
  • Durability
    • Will my data be available after 10 or 100 or 1000 years?
  • Examples of measuring availability and durability:
    • 4 9s - 99.99%
    • 11 9s - 99.999999999%
  • Typically, an availability of four 9's is considered very good
  • Typically, a durability of eleven 9's is considered very good
  • Typical online apps aim for 99.99% availability
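
Here is one way to get a feel for eleven 9's of durability. As a simplification, assume durability is the yearly probability that any single object survives. Then the chance of losing a given object is 10^-11 per year. With 10 million objects, you would expect to lose one object roughly every 10,000 years. The helper name is hypothetical.

```python
def expected_losses_per_year(num_objects: int, nines: int) -> float:
    """'n nines' of durability (e.g. 11 -> 99.999999999%) leaves a 10**-n yearly loss chance per object."""
    return num_objects * 10.0 ** -nines


for nines in (4, 11):
    losses = expected_losses_per_year(10_000_000, nines)
    print(f"{nines} nines: {losses:g} objects lost per year, about one every {1 / losses:g} years")
```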

Increasing Availability and Durability of Databases

  • Increasing Availability
    • Having multiple standbys available
      • in multiple AZs
      • in multiple Regions
  • Increasing Durability
    • Multiple copies of data (standby, snapshots, transaction logs and replicas)
      • in multiple AZs
      • in multiple regions
  • Replicating data comes with its own challenges

Database Terminology: RTO and RPO

  • RPO
    • Recovery Point Objective - maximum acceptable period of data loss
  • RTO
    • Recovery Time Objective - maximum acceptable downtime
  • Achieving minimum RTO and RPO is expensive
  • Trade-off based on the criticality of the data
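
Here is a minimal sketch of checking a backup plan against RPO and RTO. If copies of the data are taken every `copy_interval`, a failure just before the next copy loses up to one interval of data, and that must fit within the RPO. The time to restore must fit within the RTO. The intervals and restore times in the example are hypothetical.

```python
from datetime import timedelta


def check_recovery_plan(copy_interval: timedelta, restore_time: timedelta,
                        rpo: timedelta, rto: timedelta) -> dict:
    """Worst-case data loss is one copy interval (compare with RPO);
    restore time is the downtime until the database is back (compare with RTO)."""
    return {"rpo_met": copy_interval <= rpo, "rto_met": restore_time <= rto}


rpo, rto = timedelta(minutes=15), timedelta(hours=1)
daily_snapshots = check_recovery_plan(timedelta(hours=24), timedelta(minutes=30), rpo, rto)
log_shipping = check_recovery_plan(timedelta(minutes=5), timedelta(minutes=30), rpo, rto)
print(daily_snapshots)  # {'rpo_met': False, 'rto_met': True}
print(log_shipping)     # {'rpo_met': True, 'rto_met': True}
```

Daily snapshots alone can lose up to a day of data. Copying transaction logs every few minutes shrinks the worst-case loss, at the cost of more replication work.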

Database Read Replicas

  • Read-only copies of the main database that serve read queries
  • Reduces load on master databases
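
A common way to use read replicas is to send writes to the main (primary) database and spread reads across the replicas. This is a minimal, hypothetical sketch: `ReplicaRouter` and the endpoint names are placeholders, and it classifies a query as a read only if it starts with SELECT. Replication to read replicas is usually asynchronous, so a read sent to a replica right after a write may not see that write yet.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Send writes to the primary and spread reads round-robin across read replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Cycle through replicas forever; None means there are no replicas to read from.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes (and reads when there are no replicas) go to the primary


router = ReplicaRouter("primary.example.internal",
                       ["replica-1.example.internal", "replica-2.example.internal"])
for query in ("INSERT INTO posts VALUES (1, 'hi')", "SELECT * FROM posts", "SELECT * FROM posts"):
    print(router.endpoint_for(query))
```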

Consistency

  • How do we make sure that data in multiple database instances is updated simultaneously?
  • Strong consistency
    • Synchronous replication to all replicas
      • Will be slow if you have multiple replicas or standbys
  • Eventual Consistency
    • Asynchronous replication. A little lag - few seconds - before change is available in all replicas
      • In the intermediate period, different replicas might return different values
      • Used when scalability is more important than data integrity
      • Examples: Social Media Posts - Facebook status messages, Twitter tweets, LinkedIn posts, etc.
  • Read-after-write Consistency
    • Inserts are immediately available. Updates and deletes are eventually consistent
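      • Amazon S3 originally provided read-after-write consistency for new objects; since December 2020, it provides strong read-after-write consistency for all PUT and DELETE requests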
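
Here is a toy model of the difference between strong and eventual consistency. In synchronous mode, every replica is updated before a write completes. In asynchronous mode, updates are queued, and a replica can return a stale (here, missing) value until replication catches up. `ReplicatedStore` is a hypothetical class for illustration only.

```python
class ReplicatedStore:
    """Toy model: one primary plus replicas, with synchronous or asynchronous replication."""

    def __init__(self, num_replicas: int, synchronous: bool):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.synchronous = synchronous
        self.pending = []  # updates the replicas have not received yet

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Strong consistency: every replica is updated before the write completes.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual consistency: replicate later (replication lag).
            self.pending.append((key, value))

    def replicate(self):
        """Apply queued updates, as a background replication process eventually would."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)


strong = ReplicatedStore(num_replicas=2, synchronous=True)
strong.write("status", "hello")
print(strong.read(1, "status"))    # hello

eventual = ReplicatedStore(num_replicas=2, synchronous=False)
eventual.write("status", "hello")
print(eventual.read(1, "status"))  # None (not replicated yet)
eventual.replicate()
print(eventual.read(1, "status"))  # hello
```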

Database Categories

  • There are several categories of databases
    • Relational, Document, Key Value, Graph, etc.
  • Choosing the type of database for your use case is not easy. Factors:
    • Do you want a fixed schema?
      • Do you want flexibility in defining and changing your schema?
    • What level of transaction properties do you need?
    • What kind of latency do you want?
    • How many transactions do you expect?
    • How much data will be stored?

Relational Databases

  • Only option until recently
  • Most popular (or unpopular) type of databases
  • Predefined schema - tables & relationships
  • Supports Complex SQL Queries
  • Very strong transactional capabilities
  • Used for:
    • OLTP (Online transaction processing) use cases
    • OLAP (Online Analytics Processing) use cases
  • (OLTP) Applications where a large number of users make a large number of small transactions
    • Small data reads, updates, and deletes
  • Use Cases: Most traditional applications, ERP, CRM, e-commerce, banking applications
  • Popular Databases: MySQL, Oracle, SQL Server, etc.

Amazon RDS

  • Amazon RDS is a managed relational database service for OLTP use cases
  • Amazon RDS Features:
    • Multi-AZ deployment (standby in another AZ)
    • Read replicas
      • Same AZ
      • Multi AZ
      • Cross Region (Availability++)
    • Storage Auto scaling (up to a configured limit)
    • Automated backups (restore to point in time)
    • Manual snapshots
  • Amazon RDS - You vs AWS
    • AWS is responsible for
      • Availability (according to your configuration)
      • Durability
      • Scaling (according to your configuration)
      • Maintenance (patches)
      • Backups
    • You are responsible for
      • Managing database users
      • App optimization (tables, indexes, etc.)
    • You CANNOT
      • SSH into database EC2 instances or set up custom software (NOT ALLOWED)
      • Install OS or DB patches. RDS takes care of them (NOT ALLOWED)
  • Amazon RDS - When to Use

    • Use Amazon RDS for transactional applications needing
      • Pre-defined schema
      • Strong transactional capabilities
      • Complex Queries
    • Amazon RDS is NOT recommended when
      • You need highly scalable massive read/write operations - for example millions of writes/second
        • Go for DynamoDB
      • When you want to upload files using simple GET/PUT REST API
        • Go for Amazon S3
      • When you need heavy customizations for your database or need access to underlying EC2 instances
        • Go for custom database installation

Amazon Aurora

  • MySQL and PostgreSQL-compatible
  • 2 copies in each of a minimum of 3 AZs (6 copies of the data in total)
  • Provides "Global Database" option
    • Up to five read-only, secondary AWS regions
    • Low latency for global reads
    • Safe from region-wide outages
  • Minimal replication lag, typically less than 1 second

Relational Database - OLAP (Online Analytics Processing)

  • Applications allowing users to analyze petabytes of data
  • Examples: Reporting applications, Data warehouses, Business intelligence applications, Analytics systems
  • Sample application: Decide insurance premiums analyzing data from last hundred years
  • Data is consolidated from multiple (transactional) databases

OLAP vs OLTP

  • OLAP and OLTP use similar data structures
  • BUT very different approach in how data is stored
  • OLTP databases use row storage
    • Each table row is stored together
    • Efficient for processing small transactions
  • OLAP databases use columnar storage
    • Each table column is stored together
    • High compression - store petabytes of data efficiently
    • Distribute data - one table in multiple cluster nodes
    • Execute single query across multiple nodes - Complex queries can be executed efficiently
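
Here is a small in-memory illustration of row vs columnar layout, using a hypothetical orders table. The row layout suits fetching one whole record. The columnar layout lets an aggregate read just the one column it needs. Repeated values within a column also compress well, shown here with simple run-length encoding. Real columnar engines such as Redshift use their own storage formats and compression encodings, so this only shows the idea.

```python
# The same small table stored row-wise (OLTP style) and column-wise (OLAP style).
rows = [
    {"order_id": 1, "region": "us-east-1", "amount": 20},
    {"order_id": 2, "region": "us-east-1", "amount": 35},
    {"order_id": 3, "region": "eu-west-1", "amount": 15},
    {"order_id": 4, "region": "eu-west-1", "amount": 30},
]

# Columnar layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An OLTP-style lookup touches one whole row.
order_2 = rows[1]

# An OLAP-style aggregate only needs to scan the "amount" column.
total_amount = sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


print(order_2)
print(total_amount)                          # 100
print(run_length_encode(columns["region"]))  # [('us-east-1', 2), ('eu-west-1', 2)]
```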

Amazon Redshift

  • Redshift is a relational database (tables and relationships)
  • What is the need for another relational database?
    • RDS is optimized for online transaction processing
    • RDS is optimized to provide a balance between both reads and write operations
  • (However) OLAP workloads perform far more reads on the database than writes, and the reads are much larger
    • Can we use a different approach to design the database?
    • How about creating a cluster and splitting the execution of the same query across several nodes?
  • Redshift is a petabyte-scale distributed data warehouse based on PostgreSQL
  • Three important characteristics of Redshift:
    • Massive parallel processing (MPP) - storage and processing can be split across multiple nodes
    • Columnar data storage
    • High data compression
  • As a result
    • A single row of data might be stored across multiple nodes
    • A query to Redshift leader node is distributed to multiple compute nodes for execution
  • Start with a single node configuration and scale to multi node configuration
  • You can dynamically add and remove nodes
  • Used for traditional ETL (Extract, Transform, Load), OLAP and Business Intelligence (BI) use cases
    • Optimized for high performance analysis and reporting of very large datasets
  • Supports standard SQL
  • Integration with data loading, reporting, mining and analytics tools
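
Here is a sketch of the leader/compute-node pattern in plain Python, with threads standing in for compute nodes. The leader splits the data, each node computes a partial result over its slice, and the leader combines the partial results. The partial results are (sum, count) pairs rather than averages, because averaging per-node averages gives the wrong answer when slices have different sizes. All names are hypothetical, and this is not how Redshift is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, num_nodes):
    """Spread values across 'compute nodes' round-robin."""
    return [values[i::num_nodes] for i in range(num_nodes)]


def compute_node_partial(chunk):
    """Each compute node aggregates only its own slice: (sum, count)."""
    return sum(chunk), len(chunk)


def leader_average(values, num_nodes=3):
    """Leader node: send work to compute nodes in parallel, then combine partial results."""
    if not values:
        raise ValueError("no values to average")
    chunks = partition(values, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_node_partial, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(leader_average([10, 20, 30, 40, 50, 60]))  # 35.0
```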

Amazon EMR - Elastic MapReduce

  • Managed Hadoop service with high availability and durability
  • EMR gives access to the underlying OS => You can SSH into it
  • Important tools in the Hadoop ecosystem are natively supported:
    • Examples: Pig, Hive, Spark, or Presto
  • Install others using bootstrap actions
  • Use Cases:
    • Log Processing for insights
    • Click stream analysis for advertisers
    • Genomic and life science dataset processing
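The MapReduce model behind Hadoop-style log processing can be sketched in plain Python. The sketch assumes hypothetical log lines of the form "<path> <status>". The map step emits (status, 1) pairs, the shuffle groups them by key, and the reduce step sums each group. On EMR, the same three phases would run across the cluster.

```python
from collections import defaultdict

# Hypothetical log lines: "<path> <status>"
logs = [
    "/index.html 200",
    "/missing 404",
    "/index.html 200",
    "/api/orders 500",
    "/about 200",
]

def map_phase(line):
    """Emit (key, 1) for each record - here keyed by HTTP status code."""
    status = line.split()[-1]
    yield status, 1

def shuffle(pairs):
    """Group all emitted values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the grouped values for one key."""
    return key, sum(values)

pairs = (pair for line in logs for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'200': 3, '404': 1, '500': 1}
```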

Amazon Redshift Spectrum

  • Run SQL queries against datasets in Amazon S3
    • No need for any intermediate data stores
  • Auto scales based on your queries
  • Scale storage and compute independently
  • Eliminate expensive data transfers from S3 to data warehousing solutions (Cost Effective)
  • Query against Amazon EMR (as well)

Document Databases

  • Structure data the way your application needs it
  • Create one table instead of dozens
  • Quickly evolving semi structured data (schema-less)
  • Easily distributable
  • Advantages: (Horizontally) Scalable to terabytes of data with millisecond responses up to millions of transactions per second
  • Use cases: Content management, catalogs, user profiles

Key-Value Databases

  • Use a simple key-value pair to store data. Key is a unique identifier.
  • Values can be objects, compound objects, or simple data values
  • Advantages: (Horizontally) Scalable to terabytes of data with millisecond responses up to millions of transactions per second
  • Use cases: shopping carts, session stores, gaming applications, and very high web traffic apps

Amazon DynamoDB

  • Fast, scalable, distributed for any scale
  • Flexible NoSQL key-value & document database (schemaless)
  • Single-digit millisecond responses for millions of TPS
  • No need to worry about scaling, availability, or durability
    • Automatically partitions data as it grows
    • Maintains 3 replicas within the same region
  • No need to provision a database
    • Create a table and configure read and write capacity (RCU and WCU)
    • Automatically scales to meet your RCU and WCU
  • Also provides an on-demand (serverless) capacity mode, which is more expensive per request
  • Use cases: User profiles, shopping carts, high volume read and write applications
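Here is a small calculator for provisioned capacity, based on the standard DynamoDB sizing rules. One RCU covers one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB. One WCU covers one write per second of an item up to 1 KB. Item sizes are rounded up in both cases. The function names are my own, and transactional reads and writes (which cost more) are ignored.

```python
import math

def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed: item size is rounded up to the next 4 KB block.

    One RCU covers one strongly consistent read (or two eventually
    consistent reads) per second of an item up to 4 KB.
    """
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = units_per_read * reads_per_second
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed: one WCU covers one write per second of an item up to 1 KB."""
    return math.ceil(item_size_kb) * writes_per_second

print(read_capacity_units(6, 10))                             # 20
print(read_capacity_units(6, 10, strongly_consistent=False))  # 10
print(write_capacity_units(3, 5))                             # 15
```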

DynamoDB Tables

  • Hierarchy: Table > item(s) > attribute (key value pair)
  • Mandatory primary key
  • Other than the primary key, tables are schemaless
    • No need to define the other attributes or types
    • Each item in a table can have distinct attributes
  • Max 400 kB per item in a table
    • Use S3 for large objects and DynamoDB for smaller objects
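This sketch models the table rules above with a hypothetical in-memory SketchTable class (this is not the boto3 API). Every item must carry the primary key, all other attributes are free-form, and oversized items are rejected. The size check uses the length of the item's JSON encoding as an approximation. DynamoDB computes size from attribute names and values, so treat the number as a rough estimate.

```python
import json

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's 400 KB per-item limit

class SketchTable:
    """Hypothetical stand-in for a DynamoDB table, for illustration only."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.items = {}

    def put_item(self, item):
        if self.partition_key not in item:
            raise ValueError(f"missing primary key '{self.partition_key}'")
        # Rough size estimate (see lead-in); real sizing rules differ.
        size = len(json.dumps(item).encode("utf-8"))
        if size > MAX_ITEM_BYTES:
            raise ValueError(f"item is {size} bytes; store it in S3 instead")
        self.items[item[self.partition_key]] = item

table = SketchTable(partition_key="user_id")
# Items in the same table can have completely different attributes.
table.put_item({"user_id": "u1", "name": "Alice", "cart": ["book", "pen"]})
table.put_item({"user_id": "u2", "email": "bob@example.com"})
```

The usual workaround for the limit is the one in the notes: put the large object in S3 and store only its key (plus metadata) in the DynamoDB item.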

In-memory Databases (or Caches)

  • Retrieving data from memory is much faster than retrieving data from disk
  • You can speed up dynamic database-driven websites by caching data and objects in memory (Ex. Memcached)
  • You can deliver microsecond latency by storing some persistent data in memory (Ex. Redis)
  • Use Cases: Caching, session management, gaming leader boards, geospatial applications
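The most common caching pattern is cache-aside, also called lazy loading. You check the cache first, and only on a miss do you query the database and populate the cache. This minimal sketch uses a plain dict with per-entry expiry as a stand-in for ElastiCache. The slow_database_query function is a hypothetical stand-in for the database.

```python
import time

class TTLCache:
    """A tiny in-memory cache with per-entry expiry (illustration only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

db_calls = 0

def slow_database_query(user_id):
    """Hypothetical stand-in for a database round trip."""
    global db_calls
    db_calls += 1
    return {"user_id": user_id, "name": f"user-{user_id}"}

cache = TTLCache(ttl_seconds=60)

def get_user(user_id):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    user = cache.get(user_id)
    if user is None:
        user = slow_database_query(user_id)
        cache.set(user_id, user)
    return user

get_user(42)
get_user(42)
print(db_calls)  # 1 - the second call was served from memory
```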

Amazon ElastiCache

  • Highly scalable and low latency in-memory data store
  • Used for distributed caching
  • (Option 1) ElastiCache Memcached
    • Low maintenance caching solution
    • Easy horizontal scaling with auto discovery
    • Use case: Speed up database-driven websites by caching data
  • (Option 2) ElastiCache Redis:
    • Persistence
    • Advanced Features:
      • Pub-sub
      • Read Replicas and Failover
      • Encryption
    • Use cases: gaming leaderboards, queues, real time analytics

Database Types and AWS Services

  • Relational OLTP Databases => Amazon RDS
    • Row storage
    • Transactional use cases needing a predefined schema and very strong transactional capabilities
  • Relational OLAP Databases => Amazon Redshift
    • Columnar storage
    • Reporting, analytics, and intelligence apps needing a predefined schema
  • Document and Key-Value Databases => Amazon DynamoDB
    • Apps needing quickly evolving semi-structured data (schema-less)
    • Scale to terabytes of data with millisecond responses up to millions of TPS
  • Graph Databases => Amazon Neptune
    • Store and navigate data with complex relationships
    • Social networking data (Twitter, Facebook), fraud detection
  • In-memory Databases / Caches => Amazon ElastiCache
    • Applications needing microsecond responses
    • Redis - persistent data
    • Memcached - simple caches

Other Storage Services

  • Amazon DocumentDB
    • Managed document database service
    • Compatible with MongoDB
  • Amazon Keyspaces
    • Managed service for Apache Cassandra
    • Serverless (Pay for use)
  • AWS Backup
    • Centrally manage and automate backup across AWS services
    • Automate backup compliance and monitoring
Networking

Need for Amazon VPC

  • In a corporate network or an on-premises data center:
    • Can anyone on the internet see the data exchange between the application and the database?
      • No
    • Can anyone from the internet directly connect to your database?
      • Typically NO
      • You need to connect to your corporate network and then access your applications or databases
  • Corporate network provides a secure internal network protecting your resources, data and communication from external users
  • How do you create your own private network in the cloud?
    • Enter Virtual Private Cloud (VPC)

Amazon VPC

  • Your own isolated network in AWS cloud
    • Network traffic within a VPC is isolated (not visible) from all other Amazon VPCs
    • You control all the traffic coming into and going out of a VPC
  • (Best Practice) Create all your AWS resources (compute, storage, databases etc.) within a VPC
    • Secure resources from unauthorized access AND
    • Enable secure communication between your cloud resources

Need for VPC Subnets

  • Different resources are created in the cloud - databases, compute (EC2) etc.
  • Each type of resource has its own access needs
  • Public Elastic Load Balancers are accessible from the internet (public resources)
  • Databases or EC2 instances should NOT be accessible from the internet
    • ONLY applications within your network (VPC) should be able to access them (private resources)
  • How do you separate public resources from private resources inside a VPC?

VPC Subnets

  • (Solution) Create different subnets for public and private resources
    • Resources in a public subnet CAN be accessed from the internet
    • Resources in a private subnet CANNOT be accessed from the internet
    • BUT resources in a public subnet can talk to resources in a private subnet
  • Each VPC is created in a Region
  • Each Subnet is created in an Availability Zone
  • Example: VPC us-east-1 => Subnets - AZs us-east-1a or us-east-1b

Routing on the Internet

  • You have an IP address of a website you want to visit
  • There is no direct connection from your computer to the website
  • Internet is actually a set of routers routing traffic
  • Each router has a set of rules that help it decide the path to the destination IP address

Routing Inside AWS

  • In AWS, route tables are used for routing
  • Route tables can be associated with VPCs and subnets

  • Example route table:
    • Destination: 172.31.0.0/16 => Target: Local
    • Destination: 0.0.0.0/0 => Target: igw-1234567

  • Each route table consists of a set of rules called routes
    • Each route or routing rule has a destination and target
    • What Range of Addresses should be routed to which target resource?
  • Rule 1 (Above) - Route requests to VPC CIDR 172.31.0.0/16 (172.31.0.0 to 172.31.255.255) to local resources within the VPC
  • Rule 2 - Route all other IP addresses (0.0.0.0/0) to internet (internet gateway)
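Route selection can be sketched with Python's ipaddress module. AWS picks the most specific route that matches the destination (longest prefix match). That is why the 172.31.0.0/16 rule wins over 0.0.0.0/0 for addresses inside the VPC. The route table below is the example from these notes, and the keys are assumed to be written in canonical CIDR form.

```python
import ipaddress

# The example route table from above: destination CIDR -> target.
route_table = {
    "172.31.0.0/16": "local",
    "0.0.0.0/0": "igw-1234567",
}

def route(destination_ip, table):
    """Pick the most specific (longest-prefix) route that contains the IP."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        ipaddress.ip_network(cidr)
        for cidr in table
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]

print(route("172.31.20.5", route_table))  # local
print(route("8.8.8.8", route_table))      # igw-1234567
```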

Public Subnet vs Private Subnet

  • Public Subnet
    • Communication allowed from subnet to internet
    • Communication allowed from internet to subnet
  • Private Subnet
    • Communication NOT allowed from internet to subnet

  • Route table rules:
    • Rule 1 - Destination: 172.31.0.0/16 => Target: Local (Local Routing)
    • Rule 2 - Destination: 0.0.0.0/0 => Target: igw-1234567 (Internet Routing)

  • An internet gateway enables internet communication for subnets
  • Any subnet which has a route to an internet gateway is called a public subnet
  • Any subnet which DOES NOT have a route to an internet gateway is called a private subnet
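The public/private rule above fits in a one-line check. This assumes the route table is given as destination => target, and that internet gateway IDs start with igw- (as they do in AWS). The NAT gateway ID below is a hypothetical placeholder. A private subnet can still send 0.0.0.0/0 traffic somewhere, just not to an internet gateway.

```python
def classify_subnet(route_targets):
    """Return 'public' if any route targets an internet gateway (igw-...)."""
    has_igw_route = any(target.startswith("igw-") for target in route_targets)
    return "public" if has_igw_route else "private"

public_routes = {"172.31.0.0/16": "local", "0.0.0.0/0": "igw-1234567"}
private_routes = {"172.31.0.0/16": "local", "0.0.0.0/0": "nat-0abc1234"}  # hypothetical NAT ID

print(classify_subnet(public_routes.values()))   # public
print(classify_subnet(private_routes.values()))  # private
```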

Network Address Translation (NAT) Instance and Gateway

  • How do you allow instances in a private subnet to download software updates and security patches while denying inbound traffic from internet?
  • How do you allow instances in a private subnet to connect privately to other AWS Services outside the VPC?
  • Three Options:
    • NAT Gateway: Managed service
    • NAT Instance: Install an EC2 instance with a specific NAT AMI and configure it as a gateway
    • Egress-Only Internet Gateways: For IPv6 subnets

Network Access Control List

  • Security groups control traffic to a specific resource in a subnet
  • How about stopping traffic from even entering the subnet?
  • NACL provides a stateless firewall at the subnet level
  • Each subnet must be associated with a NACL
  • The default NACL allows all inbound and outbound traffic
  • A custom-created NACL denies all inbound and outbound traffic by default
  • Rules have a priority number
    • Lower Number => Higher Priority
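This is a simplified model of NACL rule evaluation. Rules are checked in ascending rule-number order and the first match decides. Anything unmatched falls through to the final catch-all deny rule. The NaclRule class is hypothetical and only matches a source CIDR and a single port. Real NACL rules also specify a protocol and port ranges.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class NaclRule:
    number: int   # lower number = evaluated first
    cidr: str
    port: int
    action: str   # "allow" or "deny"

def evaluate(rules, source_ip, port):
    """Return the action of the first matching rule, in rule-number order."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if address in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.action
    return "deny"  # the catch-all '*' rule denies anything unmatched

inbound = [
    NaclRule(100, "0.0.0.0/0", 443, "allow"),
    NaclRule(90, "203.0.113.0/24", 443, "deny"),  # hypothetical blocked range
]

print(evaluate(inbound, "198.51.100.7", 443))  # allow
print(evaluate(inbound, "203.0.113.9", 443))   # deny (rule 90 wins over 100)
print(evaluate(inbound, "198.51.100.7", 22))   # deny (no rule matches)
```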

Security Group vs NACL


  • Level
    • Security Group: Assigned to specific instance(s)/resource(s)
    • NACL: Configured for a subnet. Applied to traffic to all instances in the subnet
  • Rules
    • Security Group: Allow rules only
    • NACL: Both allow and deny rules
  • State
    • Security Group: Stateful. Return traffic is automatically allowed.
    • NACL: Stateless. You must explicitly allow return traffic.
  • Evaluation
    • Security Group: Traffic is allowed if there is a matching rule.
    • NACL: Rules are prioritized. The matching rule with the highest priority (lowest number) wins.

VPC Flow Logs

  • Monitor network traffic
  • Troubleshoot connectivity issues (NACL and/or security groups misconfiguration)
  • Capture traffic going in and out of your VPC (network interfaces)
  • Can be created for:
    • a VPC
    • a subnet
    • a network interface
  • Publish logs to Amazon CloudWatch Logs or Amazon S3
  • Flow log records contain ACCEPT or REJECT
    • Is traffic permitted by security groups or network ACLs?
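Flow log records are space-separated lines. This sketch parses one record in the default (version 2) format. The sample record uses made-up IDs and addresses.

```python
# Field order of the default (version 2) flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_log(line):
    """Split a space-separated default-format record into a dict."""
    return dict(zip(FIELDS, line.split()))

record = parse_flow_log(
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 REJECT OK"
)

print(record["dstport"], record["action"])  # 22 REJECT
```

Protocol 6 is TCP, so this record shows an SSH connection attempt (port 22) being rejected. That is a typical hint that a security group or NACL rule is missing.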

VPC Peering

  • Connect VPCs belonging to same or different AWS accounts irrespective of the region of the VPCs
  • Allows private communication between the connected VPCs
  • Peering uses a request/accept protocol
    • Owner of requesting VPC sends a request
    • Owner of the peer VPC has one week to accept

AWS and On-Premises - Overview

  • AWS Managed VPN
    • IPsec VPN tunnels from VPC to customer network
  • AWS Direct Connect (DX)
    • Private dedicated network connection from on-premises to AWS

AWS Managed VPN

  • IPsec VPN tunnels from VPC to customer network
  • Traffic over internet - encrypted using IPsec protocol
  • VPN gateway to connect one VPC to customer network
  • Customer gateway installed in customer network
    • You need an Internet routable IP address of customer gateway

AWS Direct Connect (DX)

  • Private dedicated network connection from on-premises to AWS
  • Advantages:
    • Private network
    • Reduce your (ISP) bandwidth costs
    • Consistent Network performance because of the private network
  • (REMEMBER) Establishing a DX connection can take more than a month

AWS Local Zones vs Outposts vs Wavelength

  • AWS Local Zones
    • Example use case: A gaming company providing a seamless online multiplayer experience in a specific city
    • Extend AWS infrastructure to metro areas closer to end users for single-digit millisecond latency. Ideal for real-time user engagement like online gaming.
  • AWS Outposts
    • Example use case: A healthcare organization processing patient data with regulatory requirements to keep data on-premises
    • Bring native AWS services, infrastructure, and operating models to on-premises. Suitable for workloads with regulatory or data residency needs.
  • AWS Wavelength
    • Example use case: Developing an augmented reality app for mobile devices requiring real-time edge data processing
    • Embed AWS infrastructure in mobile service providers' data centers at the 5G network edge. Enables ultra-low latency and high bandwidth applications for mobile and connected devices.

VPC - Review

  • VPC: Virtual network to protect resources and communication from the outside world
  • Subnet: Separate private resources from public resources
  • Internet Gateway: Allows public subnets to connect/accept traffic to/from the internet
  • NAT Gateway: Allows outbound internet traffic from private subnets
  • VPC Peering: Connect one VPC with other VPCs
  • VPC Flow Logs: Enable logs to debug problems
  • AWS Direct Connect: Private pipe from AWS to on-premises
  • AWS VPN: Encrypted (IPsec) tunnel over the internet to on-premises
IAM - Fundamentals

Typical Identity Management in the Cloud

  • You have resources in the cloud (examples - a virtual server, a database)
  • You have identities (human and non-human) that need access to those resources and perform actions
    • For example: launch (stop, start, or terminate) a virtual server
  • How do you configure which resources they can access?
  • How can you configure what actions to allow?
  • In AWS, Identity and Access Management (IAM) provides this service

AWS Identity and Access Management (IAM)

  • Authentication (is it the right user?) and
  • Authorization (do they have the right access?)
  • Identities can be
    • AWS users
    • Federated users (externally authenticated users)
  • Provides very granular control
    • Limit a single user
      • to perform a single action
      • on a specific AWS resource
      • from a specific IP address
      • during a specific time window
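This sketch writes the "single action, single resource, specific IP, specific time window" example as an IAM policy document, built as a Python dict. The account ID, instance ID, IP range, and dates are hypothetical. The condition operators (IpAddress, DateGreaterThan, DateLessThan) and the keys (aws:SourceIp, aws:CurrentTime) are standard IAM ones. The is_allowed function is a deliberately simplified check of this one statement, not the real IAM evaluation logic.

```python
import ipaddress
import json
from datetime import datetime, timezone

# Hypothetical policy: one action, one resource, one IP range, one time window.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:StartInstances",
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0",
            "Condition": {
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
                "DateGreaterThan": {"aws:CurrentTime": "2024-01-01T09:00:00Z"},
                "DateLessThan": {"aws:CurrentTime": "2024-01-01T17:00:00Z"},
            },
        }
    ],
}
print(json.dumps(policy, indent=2))

def _parse_time(value):
    """Parse the ISO 8601 'Z' timestamps used in the policy."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def is_allowed(policy, action, resource, source_ip, now):
    """Simplified check of this one statement; NOT the real IAM evaluation logic."""
    statement = policy["Statement"][0]
    condition = statement["Condition"]
    allowed_network = ipaddress.ip_network(condition["IpAddress"]["aws:SourceIp"])
    start = _parse_time(condition["DateGreaterThan"]["aws:CurrentTime"])
    end = _parse_time(condition["DateLessThan"]["aws:CurrentTime"])
    # Anything not explicitly allowed is implicitly denied.
    return (
        statement["Effect"] == "Allow"
        and action == statement["Action"]
        and resource == statement["Resource"]
        and ipaddress.ip_address(source_ip) in allowed_network
        and start < now < end
    )

instance = policy["Statement"][0]["Resource"]
noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_allowed(policy, "ec2:StartInstances", instance, "203.0.113.10", noon))   # True
print(is_allowed(policy, "ec2:StartInstances", instance, "198.51.100.10", noon))  # False
```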

Important IAM Concepts

  • IAM users: Users created in an AWS account
    • Has credentials attached (name/password or access keys)
  • IAM Groups: Collection of IAM Users
  • Roles: Temporary Identities
    • Does not have credentials attached
    • (Advantage) Temporary credentials expire after a set period of time
Data Encryption KMS and Cloud HSM

Data States

  • Data at rest: Stored on a device or a backup
    • Examples: data on a hard disk, in a database, backups and archives
  • Data in motion: Being transferred across a network
    • Also called data in transit
    • Examples:
      • Data copied from on-premises to cloud storage
      • An application in a VPC talking to a database
    • Two Types:
      • In and out of AWS
      • Within AWS
  • Data in use: Active data processed in a non-persistent state
    • Example: Data in your RAM

Encryption

  • If you store data as is, what would happen if an unauthorized entity gets access to it?
    • Imagine losing an unencrypted hard disk
  • First law of security: Defense in Depth
  • Typically, enterprises encrypt all data
    • Data on your hard disks
    • Data in your databases
    • Data on your file servers
  • Is it sufficient if you encrypt data at rest?
    • No. Encrypt data in transit (between the application and the database) as well.

Symmetric Key Encryption

  • Symmetric encryption algorithms use the same key for encryption and decryption
  • Key Factor 1: Choose the right encryption algorithm
  • Key Factor 2: How do we secure the encryption key?
  • Key Factor 3: How do we share the encryption key?
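The simplest symmetric cipher to demonstrate is a one-time pad: XOR the data with a random key of the same length. It shows the defining property, that one key both encrypts and decrypts. It also shows why key factors 2 and 3 matter, since anyone holding the key can decrypt and the key must never be reused. This is an illustration only; real systems use a vetted algorithm such as AES through a library or through KMS.

```python
import secrets

def xor_bytes(data, key):
    """XOR each byte with the key; the same operation encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"transfer 100 to account 42"
key = secrets.token_bytes(len(message))  # Key Factor 2: keep this secret

ciphertext = xor_bytes(message, key)     # encrypt with the key
plaintext = xor_bytes(ciphertext, key)   # decrypt with the SAME key

print(plaintext == message)  # True
```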

Asymmetric Key Encryption

  • Two Keys: Public Key and Private Key
  • Also called Public Key Cryptography
  • Encrypt data with Public Key and decrypt with Private Key
  • Share Public Key with everybody and keep the Private Key with you
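  • No crazy questions:
    • Will somebody not figure out the private key using the public key?
      • No: with sound algorithms and adequate key sizes, deriving the private key from the public key is computationally infeasible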
  • How do you create Asymmetric Keys?
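Textbook RSA with tiny primes shows how an asymmetric key pair is created and used. Anyone can encrypt with the public key (e, n), but decrypting needs d, and computing d requires factoring n. Factoring is trivial for 3233 but computationally infeasible for the 2048-bit or larger moduli used in practice. The numbers are the classic toy example, and this is not secure RSA (no padding, tiny keys).

```python
# Textbook RSA with tiny primes - insecure, for illustration only.
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent: public key is (e, n)
d = pow(e, -1, phi)      # private exponent: 2753 (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # encrypt with the PUBLIC key
decrypted = pow(ciphertext, d, n)  # decrypt with the PRIVATE key

print(d, ciphertext, decrypted)  # 2753 2790 65
```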

KMS and Cloud HSM

  • How do you generate, store, use, and replace your keys?
  • AWS provides two important services - KMS and Cloud HSM
    • Manage your keys
    • Perform encryption and decryption

AWS KMS

  • Create and manage cryptographic keys (symmetric and asymmetric)
  • Control their use in your applications and AWS services
  • Define key usage permissions (including cross-account access)
  • Track key usage in AWS CloudTrail (regulations and compliance)
  • Integrates with almost all AWS services that need data encryption
  • Automatically rotate master keys once a year
    • No need to re-encrypt previously encrypted data (versions of master key are maintained)
  • Schedule key deletion, giving you time to verify whether the key is still in use
    • Mandatory waiting period: minimum 7 days (maximum 30 days)
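Here is a toy model of why rotation doesn't force re-encryption. Each ciphertext records which key version encrypted it, and rotation adds a new version without discarding the old ones. Decryption then looks up the recorded version. KMS does this bookkeeping internally. The RotatingKey class and the SHA-256 keystream cipher below are hypothetical stand-ins and are NOT secure encryption.

```python
import hashlib
import secrets

def keystream(key, nonce, length):
    """Toy keystream: SHA-256(key || nonce || counter) blocks. NOT for real use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

class RotatingKey:
    """Hypothetical model of a key whose old versions are kept after rotation."""

    def __init__(self):
        self.versions = [secrets.token_bytes(32)]

    def rotate(self):
        self.versions.append(secrets.token_bytes(32))  # old versions are retained

    def encrypt(self, plaintext):
        version = len(self.versions) - 1  # always use the newest version
        nonce = secrets.token_bytes(16)
        stream = keystream(self.versions[version], nonce, len(plaintext))
        return version, nonce, bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, blob):
        version, nonce, ciphertext = blob  # ciphertext records its key version
        stream = keystream(self.versions[version], nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))

key = RotatingKey()
old_blob = key.encrypt(b"encrypted last year")
key.rotate()
new_blob = key.encrypt(b"encrypted after rotation")

print(key.decrypt(old_blob))     # b'encrypted last year' - no re-encryption needed
print(old_blob[0], new_blob[0])  # 0 1
```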

AWS CloudHSM

  • Managed (highly available and auto scaling) dedicated single tenant Hardware Security Module (HSM) for regulatory compliance
    • (Remember) AWS KMS is a multi-tenant service
  • FIPS 140-2 Level 3 compliant
  • AWS CANNOT access your encryption master keys in CloudHSM
    • In KMS, AWS can access your master keys
    • Be ultra safe with your keys when you are using CloudHSM
    • (Recommendation) Use two or more HSMs in separate AZs in a production cluster

AWS CloudHSM

  • AWS KMS can use CloudHSM cluster as "custom key store" to store the keys:
    • AWS Services can continue to talk to KMS for data encryption
    • (AND) KMS does the necessary integration with CloudHSM cluster
  • (Best Practice) CloudWatch for monitoring and CloudTrail to track key usage
  • Use Cases
    • (Web Servers) Offload SSL processing
    • Certificate Authority
    • Digital Rights Management
    • TDE for Oracle Databases

AWS Shield

  • Shields from Distributed Denial of Service (DDoS) attacks
    • Disrupt normal traffic of a server by overwhelming it with a flood of Internet traffic
  • Protect
    • Amazon Route 53
    • Amazon CloudFront
    • AWS Global Accelerator
    • Amazon Elastic Compute Cloud (EC2) Instances
    • Elastic Load Balancers (ELB)

AWS Shield - Standard and Advanced

  • AWS Shield Standard
    • Zero Cost. Automatically enabled.
    • Protection against common infrastructure (layer 3 and 4) DDoS attacks
  • AWS Shield Advanced
    • Paid service
    • Enhanced protection for Amazon EC2, Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53
    • 24x7 access to the AWS DDoS Response Team (DRT)
    • Protects your AWS bill from usage spikes as a result of a DDoS attack
  • Protect any web application (from Amazon S3 or external) from DDoS by putting Amazon CloudFront (with AWS Shield enabled) in front of it

AWS WAF - Web Application Firewall

  • AWS WAF protects your web application from OWASP Top 10 exploits, CVE and a lot more!
    • OWASP (Open Web Application Security Project) Top 10
      • List of broadly agreed "most critical security risks to web applications"
      • Examples: SQL injection, cross-site scripting
    • Common Vulnerabilities and Exposures (CVE) is a list of information-security vulnerabilities and exposures
  • Can be deployed on Amazon CloudFront, Application Load Balancer, Amazon API Gateway
  • Customize rules & trigger real-time alerts (CloudWatch Alarms)
  • Web traffic filtering: block attacks
    • Filter traffic based on IP addresses, geo locations, HTTP headers, and body (block attacks from specific user-agents, bad bots, or content scrapers)

Amazon Macie

  • Fully managed data security and data privacy service
  • Automatically discover, classify, and protect sensitive data in Amazon S3 buckets
  • When migrating to AWS use S3 for staging
    • Run Macie to discover sensitive data
  • Uses machine learning
  • Recognizes sensitive data
    • Example: personally identifiable information (PII) or intellectual property
  • Provides you with dashboards and alerts
    • Gives visibility into how data is being accessed and moved

AWS Inspector: Enhanced Security Scanning

  • AWS Inspector: Automated Security Scanning
  • Discover AWS workloads: Scans Amazon EC2 instances, containers, and Lambda functions for vulnerabilities
  • Security: Identifies software vulnerabilities and checks for unintended network exposures
  • Compliance: Helps ensure your AWS workloads comply with security standards and best practices
  • Continuous Monitoring: Automatically assesses new and existing workloads to improve your security posture over time

AWS Systems Manager Parameter Store

  • Manage application environment configuration and secrets
    • database connections, passwords, etc.
  • Supports hierarchical structure
  • Store configuration at one place
    • Multiple applications
    • multiple environments
  • Maintains history of configuration over a period of time
  • Integrates with KMS, IAM, CloudWatch, and SNS

AWS Secrets Manager

  • Rotate, Manage and retrieve database credentials, API keys, and other secrets for your applications
  • Integrates with KMS (encryption), Amazon RDS, Amazon Redshift, and Amazon DocumentDB
  • (KEY FEATURE) Rotate secrets automatically without impacting applications
  • (KEY FEATURE) Service dedicated to secrets management
  • Recommended for workloads needing HIPAA, PCI-DSS compliance

AWS Single Sign On

  • Cloud-based single sign-on (SSO) service
  • Centrally manage SSO access to all of your AWS accounts
  • Integrates with Microsoft AD (Supports using your existing corporate accounts)
  • Supports Security Assertion Markup Language (SAML) 2.0
  • Deep integration with AWS Organizations (Centrally manage access to multiple AWS accounts)
  • One place auditing in AWS CloudTrail

Other Important Security Services

  • Amazon GuardDuty
    • Continually monitor your AWS environment for suspicious activity (Intelligent Threat Detection)
    • Analyze AWS CloudTrail events, VPC flow logs, etc.
  • AWS Certificate Manager
    • Provision, manage, deploy, and renew SSL/TLS certificates on the AWS platform
  • AWS Artifact
    • Self-service portal for on-demand access to AWS compliance reports, certifications, accreditations, and other third-party attestations
    • Review, accept, and manage your agreements with AWS
  • AWS Security Hub
    • Consolidated view of your security status in AWS
    • Automate security checks, manage security findings, and identify the highest priority security issues across your AWS environment.
  • Amazon Detective
    • Investigate and quickly identify the root cause of your potential security issue
    • Automatically collects log data from your AWS resources and uses machine learning to help you visualize and conduct security investigations
  • Penetration Testing
    • Testing application security by simulating an attack
    • You do not need permission from AWS to do penetration testing on a limited set of services (EC2 instances, ELB, RDS, CloudFront, API Gateway, Lambda, Elastic BeanStalk)
CloudTrail, Config, and CloudWatch

AWS CloudTrail

  • Track events, API calls, changes made to your AWS resources:
    • Who made the request?
    • What action was performed?
    • What are the parameters used?
    • What was the end result?
  • (USE CASE) Compliance with regulatory standards
  • (USE CASE) Troubleshooting. Locate a missing resource
  • Delivers log files to S3 and/or an Amazon CloudWatch Logs log group (S3 is default)
  • You can setup SNS notifications for log file delivery

AWS CloudTrail Types

  • Multi Region Trail
    • One trail for all AWS regions
    • Events from all regions can be sent to one CloudWatch logs log group
    • Destination S3 bucket can be in any region

AWS CloudTrail - Good to Know

  • Log files are automatically encrypted with Amazon S3 SSE
  • You can configure S3 Lifecycle rules to archive or delete log files
  • Supports log file integrity
    • You can prove that a log file has not been altered
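The core idea behind log file integrity is a hash comparison. You record a digest when the file is delivered, then recompute it later and compare. This minimal sketch uses SHA-256 with a made-up log entry. CloudTrail's actual feature additionally signs its digest files, which is what lets you validate them with `aws cloudtrail validate-logs`. That signing step is left out here.

```python
import hashlib

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

log_file = b'{"eventName": "StopInstances", "userIdentity": "alice"}'
recorded_digest = sha256_hex(log_file)  # stored separately when the log is delivered

def is_unaltered(current_bytes, digest):
    """True if the file still hashes to the digest recorded at delivery time."""
    return sha256_hex(current_bytes) == digest

tampered = log_file.replace(b"alice", b"mallory")
print(is_unaltered(log_file, recorded_digest))  # True
print(is_unaltered(tampered, recorded_digest))  # False
```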

AWS Config

  • Auditing
    • Create a complete history of your AWS resources
  • Resource History and Change Tracking
    • Find how a resource was configured at any point in time
    • Configuration of deleted resources would be maintained
    • Delivers a history file to S3 bucket every 6 hours
    • Take configuration snapshots when needed
  • Governance
    • Customize Config Rules for specific resources or for entire AWS account
    • Continuously evaluate compliance against desired configuration
    • Get an SNS notification for every configuration change
  • Consistent Rules and Compliance across AWS accounts
    • Group Config Rules and Remediation Actions into Conformance Packs

Predefined Config Rule Examples (80+)

  • alb-http-to-https-redirection-check - Checks whether HTTP to HTTPS redirection is configured on all HTTP listeners of Application Load Balancers
  • ebs-optimized-instance - Checks whether EBS optimization is enabled for your EC2 instances that can be EBS-optimized
  • ec2-instance-no-public-ip - Do EC2 instances have public IPs?
  • encrypted-volumes - Are all EC2 instance attached EBS volumes encrypted?
  • eip-attached - Are all Elastic IP addresses used?
  • restricted-ssh - Checks whether security groups that are in use disallow unrestricted incoming SSH traffic

AWS Config Rules

  • (Feature) Create Lambda functions with your custom rules
  • (Feature) You can setup auto remediation for each rule
    • Take immediate action on a noncompliant resource
    • (Example) Stop EC2 instances without a specific tag!
  • Enable AWS Config to use the rules
    • No Free Tier
    • More rules to check => More $$$$
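This sketch shows the evaluation logic a custom rule's Lambda function might contain. It assumes a hypothetical policy that every EC2 instance needs an owner tag. The configuration item is reduced to a plain dict with resourceType and tags. A real function receives the item inside its event and reports results back to AWS Config with the PutEvaluations API, which is left out here.

```python
REQUIRED_TAG = "owner"  # hypothetical tag policy

def evaluate_compliance(configuration_item):
    """Return COMPLIANT / NON_COMPLIANT / NOT_APPLICABLE for one resource."""
    if configuration_item["resourceType"] != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    tags = configuration_item.get("tags", {})
    return "COMPLIANT" if REQUIRED_TAG in tags else "NON_COMPLIANT"

print(evaluate_compliance({"resourceType": "AWS::EC2::Instance", "tags": {"owner": "team-a"}}))  # COMPLIANT
print(evaluate_compliance({"resourceType": "AWS::EC2::Instance", "tags": {}}))                   # NON_COMPLIANT
print(evaluate_compliance({"resourceType": "AWS::S3::Bucket"}))                                 # NOT_APPLICABLE
```

Auto remediation would then act on the NON_COMPLIANT results, for example by stopping the untagged instances.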

AWS Config + AWS CloudTrail

  • AWS Config
    • What did my AWS resource look like?
  • AWS CloudTrail
    • Who made an API call to modify this resource?

Monitoring AWS With Amazon CloudWatch

  • Monitoring and observability service
  • Collects monitoring and operational data in the form of logs, metrics, and events
  • Set alarms, visualize logs, take automated actions, and troubleshoot issues
  • Integrates with more than 70 AWS services

Amazon CloudWatch Logs

  • Monitor and troubleshoot using system, application, and custom log files
  • Real time application and system monitoring
    • Monitor for patterns in your logs and trigger alerts based on them
    • Example: Errors in a specific interval exceed a certain threshold
  • Long term log retention
    • Store logs in CloudWatch Logs for as long as you want (configurable - default: forever)
    • Or archive logs to S3 bucket (Typically involves a delay of 12 hours)
    • Or stream real time to Amazon Elasticsearch Service (Amazon ES) cluster using CloudWatch Logs subscription

Amazon CloudWatch Logs

  • CloudWatch Logs Agent
    • Installed on EC2 instances to move logs from servers to CloudWatch logs
  • CloudWatch Logs Insights
    • Write queries and get actionable insights from your logs
  • CloudWatch Container Insights
    • Monitor, troubleshoot, and set alarms for your containerized applications running in EKS, ECS and Fargate

Amazon CloudWatch Alarms

  • Create alarms based on:
    • Amazon EC2 instance CPU utilization
    • Amazon SQS queue length
    • Amazon DynamoDB table throughput or
    • Your own custom metrics
  • Take immediate action:
    • Send an SNS event notification
      • Send an email using SNS
    • Execute an Auto Scaling Policy
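This is a simplified model of how an alarm decides its state: it goes to ALARM when the last N datapoints all breach the threshold. Real CloudWatch alarms also support "M out of N" datapoints and configurable handling of missing data. The function and the CPU numbers are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return 'ALARM' if each of the last N datapoints exceeds the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"

cpu_utilization = [35.0, 72.5, 85.1, 91.3, 88.0]  # hypothetical 5-minute averages

state = alarm_state(cpu_utilization, threshold=80.0, evaluation_periods=3)
print(state)  # ALARM
if state == "ALARM":
    # In AWS this is where the alarm action fires: an SNS notification
    # (for example, an email) or an Auto Scaling policy.
    print("scale out / notify")
```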

Amazon CloudWatch Dashboards

  • Create auto-refreshed graphs around all CloudWatch metrics
  • Automatic Dashboards are available for most AWS services and resources
  • Every Dashboard can have graphs from multiple regions

Amazon CloudWatch Events

  • Enable you to take immediate action based on events on AWS resources
    • Call an AWS Lambda function when an EC2 instance starts
    • Send event to an Amazon Kinesis stream when an Amazon EBS volume is created
    • Notify an Amazon SNS topic when an Auto Scaling event happens
  • Schedule events - Use Unix cron syntax
    • Schedule a call to a Lambda function every hour
    • Send a notification to Amazon SNS topic every 3 hours
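Event-driven rules are defined with an event pattern. This simplified matcher assumes the common pattern shape, where each field lists the accepted values and nested objects are matched recursively. Real patterns also support prefix, numeric, and other operators. The pattern below targets EC2 "instance started" events, the first example in the list.

```python
def matches(pattern, event):
    """Simplified event-pattern match: every pattern key must match the event.

    Lists mean "any of these values"; nested dicts are matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

# Pattern for "an EC2 instance started" (the rule's target could be a Lambda function).
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

print(matches(pattern, event))  # True
```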
Decoupling Applications with SQS, SNS, and MQ

Need for Asynchronous Communication

  • Why do we need asynchronous communication?

Synchronous Communication

  • Applications on your web server make synchronous calls to the logging service
  • What if your logging service goes down?
    • Will your application go down too?
  • What if, all of a sudden, there is high load and a lot of logs coming in?
    • Log Service is not able to handle the load and goes down very often

Asynchronous Communication - Decoupled

  • Create a queue or topic
  • Your applications put the logs on the queue
  • They would be picked up when the logging service is ready
  • Good example of decoupling

Asynchronous Communication - Scale Up

  • You can have multiple logging service instances reading from the queue

Asynchronous Communication - Pull Model - SQS

  • Producers put messages on the queue
  • Consumers poll the queue
    • Only one of the consumers will successfully process a given message
  • Scalability
    • Scale consumer instances under high load
  • Availability
    • Producer up even if consumer is down
  • Reliability
    • Work is not lost due to insufficient resources
  • Decoupling
    • Make changes to consumers without producers having to worry about them
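The pull model can be simulated in-process with queue.Queue and a few threads. The queue stands in for SQS and each thread for a logging-service instance. One difference is worth remembering: SQS does not remove a message when it is received. The consumer deletes it after processing, and until then the visibility timeout hides it from other consumers.

```python
import queue
import threading

log_queue = queue.Queue()  # stands in for an SQS queue
processed = []             # (consumer name, message) pairs
processed_lock = threading.Lock()

def producer(count):
    """The web application puts log messages on the queue and moves on."""
    for i in range(count):
        log_queue.put(f"log message {i}")

def consumer(name):
    """Each logging-service instance polls the queue at its own pace."""
    while True:
        message = log_queue.get()
        if message is None:  # sentinel: stop polling
            log_queue.task_done()
            break
        with processed_lock:
            processed.append((name, message))
        log_queue.task_done()

workers = [threading.Thread(target=consumer, args=(f"consumer-{i}",)) for i in range(3)]
for w in workers:
    w.start()

producer(10)
for _ in workers:
    log_queue.put(None)  # one stop signal per consumer
for w in workers:
    w.join()

# Every message was handled once, by exactly one consumer.
print(len(processed), len({m for _, m in processed}))  # 10 10
```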

Asynchronous Communication - Push Model - SNS


  • Subscribers subscribe to a topic
  • Producers send notifications to a topic
    • Notifications sent out to all subscribers
  • Decoupling
    • Producers don't care about who is listening
  • Availability
    • Producer up even if a subscriber is down
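This is a minimal in-process sketch of the push model, using a hypothetical Topic class. The producer publishes once, every subscriber gets a copy, and one failing subscriber doesn't stop delivery to the others. Real SNS retries failed deliveries according to its delivery policy, while this sketch simply skips them.

```python
from collections import defaultdict

class Topic:
    """Minimal in-process stand-in for an SNS topic (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        """Push the message to every subscriber; one failure doesn't stop the rest."""
        delivered = 0
        for callback in self.subscribers:
            try:
                callback(message)
                delivered += 1
            except Exception:
                pass  # SNS would retry delivery; this sketch just skips it
        return delivered

inboxes = defaultdict(list)
topic = Topic("order-events")
topic.subscribe(lambda m: inboxes["email"].append(m))
topic.subscribe(lambda m: inboxes["analytics"].append(m))

def broken_subscriber(message):
    raise RuntimeError("subscriber is down")

topic.subscribe(broken_subscriber)

print(topic.publish("order 17 shipped"))  # 2
print(dict(inboxes))  # {'email': ['order 17 shipped'], 'analytics': ['order 17 shipped']}
```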

Amazon Simple Queue Service (SQS)

  • Reliable, scalable, fully managed message queuing service
  • High availability
  • Unlimited scaling
    • Auto scale to process billions of messages per day
  • Low Cost (Pay for Use)

Standard and FIFO Queues

  • Standard Queue
    • Unlimited throughput
    • But NO Guarantee of ordering (Best-Effort Ordering)
    • and NO guarantee of exactly-once processing
      • Guarantees at-least-once delivery (some messages can be processed twice)
  • FIFO (first-in-first-out) Queue
    • First-In-First-Out Delivery
    • Exactly-Once Processing
    • BUT throughput is lower
      • Up to 300 messages per second (300 send, receive, or delete operations per second)
      • If you batch 10 messages per operation (maximum), up to 3,000 messages per second
  • Choose
    • Standard SQS queue if throughput is important
    • FIFO Queue if order of events is important
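This sketch shows two practical consequences of the queue types. The first is the FIFO throughput figure from the notes (300 operations per second times 10 messages per batch). The second is an idempotent consumer for standard queues, which may deliver the same message more than once. The message IDs are hypothetical, and a real consumer would persist seen_ids rather than keep them in memory.

```python
def fifo_max_messages_per_second(operations_per_second=300, batch_size=10):
    """FIFO throughput from the figures above: operations/sec x messages per batch."""
    return operations_per_second * batch_size

class IdempotentConsumer:
    """Standard queues may deliver a message twice, so skip IDs already handled."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: ignore
        self.seen_ids.add(message_id)
        self.balance += amount
        return True

consumer = IdempotentConsumer()
deliveries = [("m1", 50), ("m2", 25), ("m1", 50)]  # "m1" delivered twice
for message_id, amount in deliveries:
    consumer.handle(message_id, amount)

print(consumer.balance)                # 75, not 125
print(fifo_max_messages_per_second())  # 3000
```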

Amazon Simple Notification Service (SNS)

  • Publish-Subscribe (pub-sub) paradigm
  • Broadcast asynchronous event notifications
  • Simple process
    • Create an SNS Topic
    • Subscribers can register for a Topic
    • When an SNS Topic receives an event notification (from publisher), it is broadcast to all Subscribers
  • Use Cases: Monitoring Apps, workflow systems, mobile apps

Amazon Simple Notification Service (SNS)

  • Provides mobile and enterprise message web services
    • Push notifications to Apple, Android, FireOS, Windows devices
    • Send SMS to mobile users
    • Send Emails
  • REMEMBER: SNS does not need SQS or a queue
  • You can allow access to other AWS accounts using SNS generated policy

Amazon MQ

  • Managed message broker service for Apache ActiveMQ
  • (Functionally) Amazon MQ = Amazon SQS (Queues) + Amazon SNS (Topics)
    • BUT with restricted scalability
  • Supports traditional APIs (JMS) and protocols (AMQP, MQTT, OpenWire, and STOMP)
    • Easy to migrate on-premises applications using traditional message brokers
    • Start with Amazon MQ as first step and slowly re-design apps to use Amazon SQS and/or SNS
  • Scenario: An enterprise uses AMQP (standard message broker protocol). They want to migrate to AWS without making code changes
    • Recommend Amazon MQ
Routing and Content Delivery

Content Delivery Network

  • You want to deliver content to your global audience
  • Content Delivery Networks distribute content to multiple edge locations around the world
  • AWS provides 200+ edge locations around the world
  • Provides high availability and performance

Amazon CloudFront

  • How do you enable serving content directly from AWS edge locations?
    • Amazon CloudFront (one of the options)
  • Serve users from the nearest edge location (based on user location)
  • Source content can be from S3, EC2, ELB, and External Websites
  • If content is not available at the edge location, it is retrieved from the origin server and cached
  • No minimum usage commitment
  • Provides features to protect your private content
  • Use Cases
    • Static Web apps. Audio, video and software downloads. Dynamic web apps
    • Supporting media streaming with HTTP and RTMP
  • Integrates With
    • AWS Shield to protect from DDoS attacks
    • AWS Web Application Firewall (WAF) to protect from SQL injection, cross-site scripting, etc.
  • Cost Benefits
    • Zero cost for data transfer between S3 and CloudFront
    • Reduce compute workload for your EC2 instances

  • Create a CloudFront distribution to distribute content to edge locations
    • DNS domain name - example: abc.cloudfront.net
    • Origins - Where do you get content from? S3, EC2, ELB, External Website
    • Cache-Control
      • By default objects expire after 24 hours
      • Customize min, max, default TTL in CloudFront distribution
      • (For file level customization) Use Cache-Control max-age and Expires headers in origin server
  • You can configure CloudFront to only use HTTPS (or) use HTTPS for certain objects
    • Default is to support HTTP and HTTPS
    • You can configure CloudFront to redirect HTTP to HTTPS
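A simplified model of how the three TTL settings interact with an origin's Cache-Control max-age header. It assumes the rule described in the CloudFront developer guide: when the origin sends max-age, CloudFront keeps the object for max-age clamped between the minimum and maximum TTL; when it does not, CloudFront falls back to the default TTL (86400 seconds, the 24 hours above). Expires, s-maxage, no-cache and other cases are ignored, so treat this as a study aid rather than the exact algorithm.

```python
from typing import Optional


def effective_ttl(max_age: Optional[int],
                  min_ttl: int = 0,
                  default_ttl: int = 86_400,      # 24 hours
                  max_ttl: int = 31_536_000) -> int:  # 1 year
    """Simplified CloudFront caching duration in seconds (study model only)."""
    if max_age is None:
        # No Cache-Control max-age from the origin: use the default TTL.
        return default_ttl
    # Otherwise honour max-age, but never below min_ttl or above max_ttl.
    return max(min_ttl, min(max_age, max_ttl))


print(effective_ttl(None))             # no header: default TTL
print(effective_ttl(60))               # within range: max-age wins
print(effective_ttl(60, min_ttl=300))  # raised to the minimum TTL
```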

AWS Edge Locations: Content Delivery Hubs

  • AWS Edge Locations: Delivery Hubs for Your Apps
  • Used by Amazon CloudFront: CloudFront distributes your static content (images, videos, etc.) across a global network of edge locations, minimizing latency and improving delivery speeds for users worldwide
  • Used by Global Accelerator: Global Accelerator intelligently routes user traffic to the closest AWS edge location, minimizing latency and improving loading times
  • Used by Amazon S3 Transfer Acceleration: Accelerates long-distance transfers to and from your Amazon S3 buckets

Route 53

  • What would be the steps in setting up a website with a domain name (for example, in28minutes.com)?
    1. Buy the domain name in28minutes.com (Domain Registrar)
    2. Set up your website content (Website Hosting)
    3. Route requests for in28minutes.com to my website host server (DNS)
  • Route 53 = Domain Registrar + DNS
    • Buy your domain name
    • Setup your DNS routing for in28minutes.com

Route 53 - DNS (Domain Name System)

  • How should traffic be routed for in28minutes.com?
    • Configure Records:
      • Route api.in28minutes.com to the IP address of api server
      • Route static.in28minutes.com to the IP address of http server
      • Route email to the mail server
      • Each record is associated with a TTL (Time To Live) - How long is your mapping cached at the routers and the client?
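The TTL is why DNS changes are not instant. A minimal sketch of a caching resolver, assuming a hypothetical `authoritative` dictionary in place of real name servers and an injectable clock so the run is deterministic. Once an answer is cached, clients keep using it until its TTL runs out, even if the record has already changed upstream.

```python
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy resolver that caches answers for the record's TTL (hypothetical)."""

    def __init__(self, authoritative: Dict[str, Tuple[str, int]],
                 clock: Callable[[], float]) -> None:
        self.authoritative = authoritative  # name -> (ip, ttl_seconds)
        self.clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: answer from cache, even if stale upstream
        ip, ttl = self.authoritative[name]
        self._cache[name] = (ip, now + ttl)
        return ip


records = {"api.in28minutes.com": ("203.0.113.10", 300)}
current_time = [0.0]
resolver = CachingResolver(records, clock=lambda: current_time[0])

first = resolver.resolve("api.in28minutes.com")
records["api.in28minutes.com"] = ("203.0.113.20", 300)  # record updated upstream
current_time[0] = 120.0
during_ttl = resolver.resolve("api.in28minutes.com")  # still the old IP
current_time[0] = 301.0
after_ttl = resolver.resolve("api.in28minutes.com")   # cache expired, new IP
```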

Route 53 Hosted Zone

  • Container for DNS records that route traffic for a specific domain
  • I want to use Route 53 to manage the records (Name Server) for in28minutes.com
    • Create a hosted zone for in28minutes.com in Route 53
  • Hosted zones can be
    • private - routing within VPCs
    • public - routing on the internet
  • Manage the DNS records in a Hosted Zone

Standard DNS Records

  • A - Name to IPV4 address(es)
  • AAAA - Name to IPV6 address(es)
  • NS - Name Server containing DNS records
    • I bought in28minutes.com from GoDaddy (Domain Registrar)
    • BUT I can use Route 53 as DNS
      • Create NS records on GoDaddy
      • Redirect to Route 53 Name Servers
  • MX - Mail Exchange
  • CNAME - Name1 to Name2

Route 53 Specific Extension - Alias Records

  • Route traffic to selected AWS resources
    • Elastic BeanStalk environment
    • ELB load balancer
    • Amazon S3 bucket
    • CloudFront Distribution
  • Alias records can be created for
    • root (in28minutes.com) and
    • non root domains (api.in28minutes.com)
  • COMPARED to CNAME records which can only be created for
    • non root domains (api.in28minutes.com)

Route 53 - Routing

  • Route 53 can route across Regions
    • Create ALBs in multiple regions and route to them
    • Offers multiple routing policies
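Route 53 routing policies include simple, weighted, latency-based, failover and geolocation routing, among others. As one example, weighted routing answers each query with an endpoint chosen in proportion to its weight. The sketch below imitates that choice with Python's `random.choices`; the endpoint names and weights are hypothetical, and in reality Route 53 makes this decision on the DNS side, not in your code.

```python
import random
from collections import Counter
from typing import Dict


def pick_endpoint(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose one endpoint with probability weight / sum(weights)."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


# Hypothetical ALBs in two regions: send roughly 80% of traffic to us-east-1.
alb_weights = {"alb-us-east-1": 80, "alb-eu-west-1": 20}
rng = random.Random(7)  # fixed seed so the run is repeatable
tally = Counter(pick_endpoint(alb_weights, rng) for _ in range(10_000))
share_us = tally["alb-us-east-1"] / 10_000
print(f"us-east-1 share: {share_us:.2%}")
```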

Route 53 Routing Policies

Need for AWS Global Accelerator

  • Cached DNS answers
    • clients might cache DNS answers causing a delay in propagation of configuration updates
  • High latency
    • users connect to the region over the internet

AWS Global Accelerator

  • Directs traffic to optimal endpoints over the AWS global network
  • Global Accelerator provides you with two static IP addresses
  • Static IP addresses are anycast from the AWS edge network
    • Distribute traffic across multiple endpoint resources in multiple AWS Regions
  • Works with Network Load Balancers, Application Load Balancers, EC2 Instances, and Elastic IP addresses
Moving Data Between AWS and On-Premises

AWS Snowball

  • Transfer dozens of terabytes to petabytes of data from on-premises to AWS
  • 100TB (80TB usable) per appliance
  • Involves physical shipping
  • Simple process
    • Request for Snowball
    • Copy data
    • Ship it back
  • Manage jobs with AWS Snowball console
  • Data is automatically encrypted with KMS
  • Current versions of AWS Snowball use Snowball Edge devices
    • Provide both compute and storage
    • Pre-process data (using Lambda functions)
  • Choose between
    • Storage optimized (24 vCPUs, 32 GiB RAM)
    • Compute optimized (52 vCPUs, 208 GiB RAM)
    • Compute optimized with GPU
  • Choose Snowball if direct transfer takes over a week
    • 5TB can be transferred on 100 Mbps line in a week at 80% utilization
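The rule of thumb above is simple arithmetic: bits to move divided by the usable line rate. A small helper, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and ignoring protocol overhead. It shows that 5 TB over a 100 Mbps line at 80% utilization takes just under six days, so roughly 6 TB is the one-week break-even point.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, line_mbps: float, utilization: float) -> float:
    """Days needed to push `terabytes` over a link, using decimal units."""
    bits = terabytes * 1e12 * 8                 # TB -> bytes -> bits
    usable_bps = line_mbps * 1e6 * utilization  # effective throughput in bits/second
    return bits / usable_bps / SECONDS_PER_DAY


def prefer_snowball(terabytes: float, line_mbps: float,
                    utilization: float = 0.8) -> bool:
    """Rule of thumb from the notes: ship a device if the network needs > 1 week."""
    return transfer_days(terabytes, line_mbps, utilization) > 7


days_for_5tb = transfer_days(5, 100, 0.8)
print(f"5 TB at 100 Mbps (80%): {days_for_5tb:.2f} days")
```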

AWS DataSync - Transfer File Storage to Cloud

  • Secure and 10x faster (100s of TB) data transfers from/to AWS over internet or AWS Direct Connect
  • Transfer from on-premise file storage (NFS, SMB) to S3, EFS, or FSx for Windows
  • Monitor progress using Amazon CloudWatch
  • (Use Cases) Data Migration, Data replication, and Cold Data Archival
  • (Alternative) Use AWS Snowball if you are bandwidth constrained or transferring data from remote or disconnected locations
  • (Alternative) Use S3 Transfer Acceleration when your applications are integrated with S3 API. If not, prefer AWS DataSync (Supports multiple destinations, built-in retry)
  • (Integration) Migrate data using DataSync and use AWS Storage Gateway for ongoing updates from on-premises applications

AWS Data Pipeline

  • Process and move data (ETL) between S3, RDS, DynamoDB, EMR, On-premise data sources
  • Create complex data processing workloads that are fault tolerant, repeatable, and highly available
  • Launches required resources and tears them down after execution
  • REMEMBER: NOT for streaming data!

AWS Database Migration Service

  • Migrate databases to AWS while keeping source database operational
    • Homogeneous Migrations (ex. Oracle to Oracle)
    • Heterogeneous Migrations (ex. Oracle to Amazon Aurora, MySQL to Amazon Aurora)
  • Free for first 6 months when migrating to Aurora, Redshift, or DynamoDB
  • (AFTER MIGRATION) Keep databases in sync and pick the right moment to switch
  • (Use case) Consolidate multiple databases into a single target database
  • (Use case) Continuous Data Replication can be used for Disaster Recovery

AWS Schema Conversion Tool

  • Migrate data from commercial databases and data warehouses to open source or AWS services
    • Preferred option for migrating data warehouse data to Amazon Redshift
  • Migrate database schema (views, stored procedures, and functions) to compatible targets
  • Features
    • SCT assessment report
      • Analyze a database to determine the conversion complexity
    • Update source code (update embedded SQL in code)
    • Fan-in (multiple sources - single target)
    • Fan-out (single source - multiple targets)

Database Migration Service VS Schema Conversion Tool

  • (Remember) SCT is part of DMS service
  • DMS is preferred for homogeneous migrations
  • SCT is preferred when schema conversions are involved
  • DMS is for smaller workloads
  • SCT preferred for large data warehouse workloads
    • Prefer SCT for migrations to Amazon Redshift
  • Only DMS provides continuous data replication after migration
Amazon Kinesis

  • Handle streaming data
    • NOT recommended for ETL batch jobs
  • Amazon Kinesis Data Streams
    • Process data Streams
  • Amazon Kinesis Data Firehose
    • Data ingestion for streaming data: S3, Elasticsearch, etc.
  • Amazon Kinesis Analytics
    • Run queries against streaming data
  • Amazon Kinesis Video Streams
    • Monitor Video Streams

Amazon Kinesis Data Streams

  • Limitless Real time stream processing
    • Sub second processing latency
  • Alternative for Kafka
  • Supports multiple clients
    • Each client can track their stream position
  • Retain and replay data (max 7 days and default 1 day)
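"Each client tracks its own stream position" is what separates a stream from a queue: records are not deleted when read, so several consumers can read the same data independently and replay it. A toy in-memory model, assuming a single shard and integer sequence numbers; real Kinesis uses shards, string sequence numbers and the retention window above, and `ToyStream` is a hypothetical class, not the Kinesis API.

```python
from typing import Dict, List


class ToyStream:
    """Append-only log with per-consumer read positions (single shard, in memory)."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.positions: Dict[str, int] = {}

    def put_record(self, data: str) -> int:
        self.records.append(data)
        return len(self.records) - 1  # sequence number of the new record

    def read(self, consumer: str, limit: int = 10) -> List[str]:
        start = self.positions.get(consumer, 0)
        batch = self.records[start:start + limit]
        self.positions[consumer] = start + len(batch)  # consumer checkpoints its own offset
        return batch

    def replay_from(self, consumer: str, sequence_number: int) -> None:
        # Records are retained, so a consumer can rewind and process them again.
        self.positions[consumer] = sequence_number


stream = ToyStream()
for click in ["home", "cart", "checkout"]:
    stream.put_record(click)

analytics_batch = stream.read("analytics")         # reads everything
archiver_first = stream.read("archiver", limit=1)  # independent position
stream.replay_from("analytics", 1)
analytics_replay = stream.read("analytics")        # re-reads from record 1
```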

Amazon Kinesis Data Streams - Integrations

  • Use application integrations to generate streams
    • Toolkits: AWS SDK, AWS Mobile SDK, Kinesis Agent
    • Service Integrations: AWS IoT, CloudWatch Events and Logs
  • Process streams using Kinesis Stream Applications
    • Run on EC2 instances
    • Written using Kinesis Data Streams APIs

Amazon Kinesis Data Firehose

  • Data ingestion for streaming data
    • Receive
    • Process (transform - Lambda, compress, encrypt)
    • Store stream data to S3, Elasticsearch, Redshift and Splunk
  • Use existing analytics tools based on S3, Redshift, and Elasticsearch
  • Pay for volume of data ingested (Serverless)
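For the "transform - Lambda" step, Firehose invokes a Lambda function with a batch of base64-encoded records and expects every recordId back with a result (Ok, Dropped, or ProcessingFailed) plus base64-encoded data. A minimal handler sketch, assuming the incoming records are JSON objects and that we (hypothetically) want to drop test events and add a newline so objects land one per line in S3. The is_test field is invented for this example.

```python
import base64
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    output: List[Dict[str, str]] = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not valid JSON: let Firehose route it to its error output.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed", "data": record["data"]})
            continue
        if isinstance(payload, dict) and payload.get("is_test"):
            # Hypothetical field: discard test traffic instead of storing it.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped", "data": record["data"]})
            continue
        transformed = json.dumps(payload) + "\n"  # one JSON object per line
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```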

Amazon Kinesis Analytics

  • You want to continuously find the number of active users on a website in the last 5 minutes based on streaming website data
  • With Amazon Kinesis Analytics, you can write SQL queries and build Java applications to continuously analyze your streaming data
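"Active users in the last 5 minutes" is a sliding-window aggregation. Kinesis Analytics would express it as a windowed query; the plain-Python sketch below shows the same idea in memory, assuming events arrive in time order as (timestamp_seconds, user_id) pairs. `ActiveUsers` is a hypothetical helper, not part of any AWS SDK.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ActiveUsers:
    """Distinct users seen within the last `window_seconds` (sliding window)."""

    def __init__(self, window_seconds: int = 300) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Dict[str, int] = {}  # user -> events still inside the window

    def add(self, timestamp: float, user_id: str) -> None:
        self.events.append((timestamp, user_id))
        self.counts[user_id] = self.counts.get(user_id, 0) + 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events older than the window; forget users with no events left.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_user = self.events.popleft()
            self.counts[old_user] -= 1
            if self.counts[old_user] == 0:
                del self.counts[old_user]

    def active(self, now: float) -> int:
        self._evict(now)
        return len(self.counts)


tracker = ActiveUsers()
for ts, user in [(0, "ana"), (60, "bo"), (120, "ana"), (400, "cy")]:
    tracker.add(ts, user)
active_at_400 = tracker.active(400)  # ana (seen at 120) and cy
```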

Amazon Kinesis Video Streams

  • Monitor video streams from web cams
  • Examples: traffic lights, shopping malls, homes, etc.
  • Integrate with machine learning frameworks to get intelligence
DevOps

  • Getting better at "Three Elements of Great Software Teams"
    • Communication - Get teams together
    • Feedback - Earlier you find a problem, the easier it is to fix
    • Automation - Automate testing, infrastructure provisioning, deployment, and monitoring

DevOps - CI, CD

  • Continuous Integration
    • Continuously run your tests and packaging
  • Continuous Delivery
    • Continuously deploy to test environments (a production release still needs an approval step)
  • Continuous Deployment
    • Continuously (automatically) deploy to production

DevOps - CI, CD Tools

  • AWS CodeCommit - Private source control (Git)
  • AWS CodePipeline - Orchestrate CI/CD Pipelines
  • AWS CodeBuild - Build and Test Code (application packages and containers)
  • AWS CodeDeploy - Automate Deployment (EC2, ECS, Elastic Beanstalk, EKS, Lambda etc.)

DevOps - IaC (Infrastructure as Code)

  • Treat infrastructure the same way as application code
  • Track your infrastructure changes over time (version control)
  • Bring repeatability into your infrastructure
  • Two Key Parts
    • Infrastructure Provisioning
      • Provisioning compute, database, storage, and networking
      • Open source cloud neutral - Terraform
      • AWS Service - CloudFormation
    • Configuration Management
      • Install right software and tools on the provisioned resources
      • Open Source Tools - Chef, Puppet, Ansible
      • AWS Service - OpsWorks

AWS CloudFormation - Introduction

  • Let's consider an example:
    • I would want to create a new VPC and a subnet
    • I want to provision an ELB, an ASG with 5 EC2 instances, and an RDS database in the subnet
    • I would want to setup the right security groups
  • AND I would want to create 4 environments
    • Dev, QA, Stage, and production!
  • CloudFormation can help you do all these with a simple (or not so simple) script!

AWS CloudFormation - Advantages

  • Automate deployment and modification of AWS resources in a controlled, predictable way
  • Avoid configuration drift
  • Avoid mistakes with manual configuration
  • Think of it as version control for your environments

AWS CloudFormation

  • All configuration is defined in a simple text file - JSON or YAML
    • I want a VPC, a subnet, a database, and ...
  • CloudFormation understands dependencies
    • Create VPCs first, then subnets, then the database
  • (Default) Automatic rollbacks on errors (Easier to retry)
    • If creation of the database fails, it would automatically delete the subnet and VPC
  • Version control your configuration file and make changes to it over time
  • Free to use - Pay only for the resources provisioned
    • Get an automated estimate for your configuration

AWS CloudFormation - Terminology

  • Template
    • A CloudFormation JSON or YAML file defining multiple resources
  • Stack
    • A group of resources that are created from a CloudFormation template
    • For example, a stack might contain an EC2 instance and a security group
  • Change Sets
    • To make changes to a stack, update the template
    • Change set shows what would change if you execute
    • Allows you to verify the changes and then execute

AWS CloudFormation - Important Template Elements

  • Resources: What do you want to create
    • Mandatory: the template must declare at least one resource
  • Parameters - values to pass to your template at runtime
    • Which EC2 instance type to create? - ("t2.micro", "m1.small", "m1.large")
  • Mappings - Key value pairs
    • Example: Configure different values for different regions
  • Outputs - return values from execution
    • See them on console and use in automation
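A minimal template tying the four elements together. It is built as a Python dictionary and serialized to JSON, which CloudFormation accepts as a template format. The logical names (InstanceTypeParam, RegionMap, WebServer) and the AMI ID are placeholders invented for this example; a real template needs a valid AMI for each region you deploy to.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Chosen when the stack is created.
        "InstanceTypeParam": {
            "Type": "String",
            "Default": "t2.micro",
            "AllowedValues": ["t2.micro", "m1.small", "m1.large"],
        }
    },
    "Mappings": {
        # Per-region lookup table; this AMI ID is a placeholder, not a real image.
        "RegionMap": {"us-east-1": {"AMI": "ami-0123456789abcdef0"}}
    },
    "Resources": {
        # The only mandatory section.
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceTypeParam"},
                "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
            },
        }
    },
    "Outputs": {
        # Ref on an EC2 instance returns its instance ID.
        "WebServerId": {"Value": {"Ref": "WebServer"}}
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

Saved to a file, the output could be deployed with something like `aws cloudformation create-stack --stack-name demo --template-body file://template.json`, where the stack name is arbitrary.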

AWS CloudFormation - Remember

  • Deleting a stack deletes all the associated resources
    • EXCEPT for resources with DeletionPolicy attribute set to "Retain"
    • You can enable termination protection for the entire stack
  • Templates are stored in S3
  • Use CloudFormation Designer to visually design templates
  • AWS CloudFormation StackSets
    • Create, update, or delete stacks across multiple accounts and regions with a single operation

CloudFormation vs AWS Elastic Beanstalk

  • (Do you know?) You can create an Elastic Beanstalk environment using CloudFormation
  • Think of Elastic Beanstalk as a pre-packaged CloudFormation template with a User Interface
    • You choose what you want
    • (Background) A CloudFormation template is created and executed
    • The environment is ready

AWS CDK: Define Cloud Infrastructure Using Code

  • AWS CDK (Cloud Development Kit): Provision AWS resources with familiar programming languages
    • Code as infrastructure: Use TypeScript, Python, Java, or .NET to define resources
    • Use CloudFormation: Translates your code into AWS CloudFormation templates for reliable and repeatable deployments
    • Streamline Development: Simplify the creation of complex, multi-component AWS applications with modular, reusable components
    • Automate Deployment: Integrate with AWS deployment pipelines for CI/CD

AWS OpsWorks - Configuration Management

  • OpsWorks is used for Configuration Management
    • How do you ensure that 100 servers have the same configuration?
    • How can I make a change across 100 servers?
  • Managed service based on Chef and Puppet
  • One service for deployment and operations in cloud and on-premise environments
  • Configuration - Chef recipes or cookbooks, Puppet manifests
  • All metrics are sent to Amazon CloudWatch
  • (IMPORTANT) All configuration management tools can also do infrastructure provisioning
    • However, I would recommend NOT doing that as they are not good at infrastructure provisioning

AWS CloudShell: Command Line at Your Fingertips

  • AWS CloudShell: instant command-line interface
  • Browser-based access: No setup required, use the AWS CLI directly from your browser
  • Pre-authenticated: Automatically logs in with your console credentials for immediate access to your resources
  • Built-in tools: Comes with pre-installed AWS CLI and other useful software to manage your resources
  • No extra cost: Available at no additional charge, you pay only for the AWS resources you manage with CloudShell
Management Services in AWS

AWS Organizations

  • Organizations have multiple AWS accounts
    • Different business units
    • Different environments
  • How do you centralize your management (billing, access control, compliance and security) across multiple AWS accounts?
  • Welcome AWS organizations!
  • Organize accounts into Organizational Units (OU)
  • Provides API to automate creation of new accounts

AWS Organizations - Features

  • One consolidated bill for all AWS accounts
  • Centralized compliance management for AWS Config Rules
  • Send AWS CloudTrail data to one S3 bucket (across accounts)
  • AWS Firewall Manager to manage firewall rules (across accounts)
    • AWS WAF, AWS Shield Advanced protections, and Security Groups
  • Use Service Control Policies (SCPs) to define restrictions for actions (across accounts):
    • Prevent users from disabling AWS Config or changing its rules
    • Require Amazon EC2 instances to use a specific type
    • Require MFA to stop an Amazon EC2 instance
    • Require a tag upon resource creation
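SCPs are written in the same JSON policy language as IAM policies. A sketch covering two of the examples above (stop member accounts from disabling AWS Config or changing its rules, and require MFA to stop an EC2 instance), built in Python and serialized to JSON. The list of Config actions is illustrative rather than exhaustive, and an SCP only restricts permissions; it never grants them.

```python
import json

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Block member accounts from turning AWS Config off or editing rules.
            "Sid": "DenyConfigChanges",
            "Effect": "Deny",
            "Action": [
                "config:DeleteConfigRule",
                "config:DeleteConfigurationRecorder",
                "config:DeleteDeliveryChannel",
                "config:StopConfigurationRecorder",
                "config:PutConfigRule",
            ],
            "Resource": "*",
        },
        {
            # Deny stopping EC2 instances unless the caller authenticated with MFA.
            "Sid": "DenyStopWithoutMFA",
            "Effect": "Deny",
            "Action": "ec2:StopInstances",
            "Resource": "*",
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ],
}

scp_document = json.dumps(scp, indent=2)
print(scp_document)
```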

AWS Trusted Advisor

  • Recommendations for cost optimization, performance, security and fault tolerance
    • Red - Action recommended, Yellow - Investigate, and Green - Good to go
  • All AWS customers get 4 checks for free:
    • Service limits (usage > 80%)
    • Security groups having unrestricted access (0.0.0.0/0)
    • Proper use of IAM
    • MFA on Root Account
  • Business or Enterprise AWS support plan provides over 50 checks
    • Disable those you are not interested in
    • How much will you save by using Reserved Instances?
    • What does your resource utilization look like? Are you right-sized?

AWS Trusted Advisor Recommendations

  • Cost Optimization
    • Highlight unused resources
    • Opportunities to reduce your costs
  • Security
    • Settings that can make your AWS solution more secure
  • Fault Tolerance
    • Increase resiliency of your AWS solution
    • Redundancy improvements, over-utilized resources
  • Performance
    • Improved speed and responsiveness of your AWS solutions
  • Service Limits
    • Identify if your service usage is more than 80% of service limits

AWS Service Quotas

  • AWS account has Region-specific default quotas or limits for each service
    • You don't need to remember all of them
  • Service Quotas allows you to manage your quotas for over 100 AWS services, from one location

AWS Directory Service

  • Provide AWS Access to on-premise users without IAM users
  • Managed service deployed across multiple AZs
  • Option 1: AWS Directory Service for Microsoft AD
    • More than 5000 Users
    • Trust relationships needed between AWS and on-premise directory
  • Option 2: Simple AD
    • Less than 5000 users
    • Powered by Samba4 and compatible with Microsoft AD
    • Does not support trust relationships with other AD domains
  • Option 3: AD Connector
    • Use your existing on-premise directory with other AWS cloud services
    • Users use their existing credentials to access AWS resources

Billing and Cost Management Services / Tools

  • AWS Billing and Cost Management - Pay your AWS bill, monitor your usage, and analyze and control your costs
    • Cost Explorer - View your AWS cost as a graph. Filter by a Region, AZ, tags, etc. See future cost projection
    • AWS Budgets - Create a budget. Create Amazon SNS notifications to alert you when you go over (or projected to go over) budget
  • AWS Compute Optimizer - recommends optimal AWS Compute resources to reduce costs (Example: Right-sizing - EC2 instance type and EC2 Auto Scaling group configuration)
  • AWS Pricing Calculator (NEW) - Estimate cost of your architecture solution
  • AWS Simple Monthly Calculator (OLD) - Estimate charges for AWS services
  • Total Cost of Ownership (TCO) Calculator (OLD) - Compare Cost of running applications in AWS vs On Premise

Other Management Services

  • AWS Marketplace
    • Digital catalog to find, test, buy, and deploy licensed software solutions using flexible pricing options: Bring Your Own License (BYOL), free trial, pay-as-you-go, hourly, monthly, etc.
  • Resource Groups
    • Group your AWS resources
    • Automate Tasks using AWS Systems Manager
    • Get group related insights from AWS Config and CloudTrail
  • AWS Systems Manager
    • Run commands (operational tasks) on Amazon EC2 instances
    • Manage your OS patches
  • Personal Health Dashboard
    • Personalized alerts when AWS is experiencing events that may impact you
    • Provides troubleshooting guidance

Serverless Architecture

REST API Challenges

  • Most applications today are built around REST API
  • Management of REST API is not easy
    • You have to take care of authentication and authorization
    • You have to be able to set limits (rate limiting, quotas) for your API consumers
    • You have to take care of implementing multiple versions of your API
    • You would want to monitor your API calls
    • You would want to be able to cache API requests

Amazon API Gateway

  • How about a fully managed service with auto scaling that can act as a "front door" to your APIs?
  • Welcome "Amazon API Gateway"
    • "publish, monitor, and secure APIs at any scale"
    • Integrates with AWS Lambda, Amazon EC2, Amazon ECS, or any web application
    • Supports HTTP(S) and WebSockets (two way communication - chat apps and streaming dashboards)
    • Serverless. Pay for use (API calls and connection duration)

Amazon API Gateway - Remember

  • Run multiple versions of the same API
  • Rate Limits (request quota limits), throttling and fine-grained access permissions using API Keys for Third-Party Developers
  • Implement Authorization with
    • AWS IAM
    • Amazon Cognito
    • Custom Lambda Authorizer
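API Gateway's throttling settings (a steady-state rate and a burst limit) follow the token bucket algorithm. A minimal sketch of that algorithm, assuming an injectable clock so it can be tested deterministically; the rate and burst numbers are arbitrary. In practice API Gateway enforces this for you and answers throttled callers with HTTP 429 (Too Many Requests).

```python
from typing import Callable


class TokenBucket:
    """Allow `rate` requests/second on average, with bursts of up to `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would receive 429 Too Many Requests


t = [0.0]
bucket = TokenBucket(rate=10, burst=5, clock=lambda: t[0])
burst_results = [bucket.allow() for _ in range(7)]  # 5 allowed, then throttled
t[0] = 0.5                                          # 0.5 s later: 5 tokens refilled
after_refill = [bucket.allow() for _ in range(5)]
```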

Amazon Cognito

  • Want to quickly add a sign up page and authentication for your mobile and web apps?
  • Want to integrate with web identity providers (example: Google, Facebook, Amazon) and provide a social sign-in?
  • Do you want security features such as multi-factor authentication (MFA), phone and email verification?
  • Want to create your own user database without worrying about scaling or operations?
  • Let's go: Amazon Cognito
  • Support for SAML

Amazon Cognito - User Pools

  • Do you want to create your own secure and scalable user directory?
  • Do you want to create sign-up pages?
  • Do you want a built-in, customizable web UI to sign in users (with option to social sign in)?
  • Create a user pool

Amazon Cognito - Identity Pools

  • Identity pools provide AWS credentials to grant your users access to other AWS services
  • Connect identity pools with authentication (identity) providers
    • Your own user pool OR
    • Amazon, Apple, Facebook, Twitter, OR
    • OpenID Connect Provider OR
    • SAML identity Providers (SAML 2.0)
  • Configure multiple authentication (identity) providers for each identity pool
  • Federated identity
    • An external authentication (identity) provider
    • ex. Amazon, Apple, Facebook, OpenID, or SAML identity providers

Serverless Application Model

  • 1000s of Lambda functions to manage, versioning, deployment, etc.
  • Serverless projects can become maintenance headache
  • How to test serverless projects with Lambda, API Gateway, and DynamoDB locally?
  • How to ensure that your serverless projects are adhering to best practices?
    • Tracing (X-Ray), CI/CD (CodeBuild, CodeDeploy, CodePipeline) etc.
  • Welcome SAM - Serverless Application Model
    • Open source framework for building serverless applications
    • Define YAML with all the serverless resources you want:
      • Functions, APIs, Databases, etc.
    • BEHIND THE SCENES: Your configuration is transformed into an AWS CloudFormation template that deploys your application

AWS Step Functions

  • Create a serverless workflow in 10 minutes using a visual approach
  • Orchestrate multiple AWS services into serverless workflows:
    • Invoke an AWS Lambda function
    • Run an Amazon Elastic Container Service or AWS Fargate task
    • Get an existing item from an Amazon DynamoDB table or put a new item into a DynamoDB table
    • Publish a message to an Amazon SNS topic
    • Send a message to an Amazon SQS queue
  • Build workflows as a series of steps
    • Output of one step flows as input into next step
    • Retry multiple times until it succeeds
    • Maximum duration of 1 year
  • Integrates with Amazon API Gateway
    • Expose API around Step Functions
    • Include human approvals into workflows
  • (Use case) Long running workflows
    • Machine learning model training, report generation, and IT automation
  • (Use case) Short duration workflows
    • IoT data ingestion, and streaming data processing
  • (Benefits) Visual workflows with easy updates and less code
  • (Alternative) Amazon Simple Workflow Service
    • Complex orchestration code (external signals, launch child processes)
  • Step Functions is recommended for all new workflows UNLESS you need to write complex code for orchestration
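Workflows are defined in the Amazon States Language (JSON). A minimal sketch of a two-step "generate a report, then notify" workflow, built in Python: the first Task retries with exponential backoff, and its output becomes the second step's input. The Lambda and SNS ARNs use a placeholder account (123456789012) and hypothetical names, and the "$.summary" path assumes the hypothetical Lambda returns an object with a summary field. "arn:aws:states:::sns:publish" is the Step Functions service integration for SNS.

```python
import json

state_machine = {
    "Comment": "Generate a report, then announce it (illustrative only)",
    "StartAt": "GenerateReport",
    "States": {
        "GenerateReport": {
            "Type": "Task",
            # Hypothetical Lambda function; placeholder account and region.
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:generate-report",
            "Retry": [{
                "ErrorEquals": ["States.ALL"],  # retry on any error...
                "IntervalSeconds": 2,           # ...first after 2 s,
                "BackoffRate": 2.0,             # then 4 s, then 8 s
                "MaxAttempts": 3,
            }],
            "Next": "NotifyTeam",  # this state's output becomes NotifyTeam's input
        },
        "NotifyTeam": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:report-ready",
                "Message.$": "$.summary",  # take 'summary' from the previous output
            },
            "End": True,
        },
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```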
Containers and Container Orchestration

Microservices

  • Enterprises are heading towards microservices architectures
  • Build small focused microservices
  • Flexibility to innovate and build applications in different programming languages (Go, Java, Python, JavaScript, etc.)
  • But deployments become complex!
  • How can we have one way of deploying Go, Java, Python, or JavaScript ... microservices?
    • Enter containers!

Docker

  • Create Docker images for each microservice
  • Docker image contains everything a microservice needs to run:
    • Application Runtime
    • Application code
    • Dependencies
  • You can run these Docker containers the same way on any infrastructure
    • Your local machine
    • Corporate data center
    • Cloud

Docker - Advantages

  • Docker containers are light weight (compared to virtual Machines)
  • Docker provides isolation for containers
  • Docker is cloud neutral
  • How do you manage 1000's of containers belonging to multiple microservices?
    • Enter Container Orchestration
  • Requirement: I want 10 instances of Microservice A container, 15 instances of Microservice B container, and ...
  • Typical Features:
    • Auto Scaling - Scale containers based on demand
    • Service Discovery - Help microservices find one another
    • Load Balancer - Distribute load among multiple instances of a microservice
    • Self Healing - Do health checks and replace failing instances
    • Zero Downtime Deployments - Release new versions without downtime
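Self healing and auto scaling in an orchestrator both come down to a reconciliation loop: compare the desired number of instances with the healthy ones actually running, then start or stop containers to close the gap. A deliberately simplified sketch; `reconcile` and the task naming scheme are hypothetical and are not the ECS or Kubernetes API.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int],
              running: Dict[str, List[Tuple[str, bool]]]) -> Tuple[List[str], List[str]]:
    """Return (tasks_to_start, tasks_to_stop) so each service reaches its desired count.

    `running` maps service -> list of (task_id, is_healthy).
    """
    to_start: List[str] = []
    to_stop: List[str] = []
    for service, want in desired.items():
        tasks = running.get(service, [])
        # Self healing: unhealthy tasks are always replaced.
        to_stop += [task_id for task_id, ok in tasks if not ok]
        healthy = [task_id for task_id, ok in tasks if ok]
        if len(healthy) < want:    # scale out
            to_start += [f"{service}-new-{i}" for i in range(want - len(healthy))]
        elif len(healthy) > want:  # scale in
            to_stop += healthy[want:]
    return to_start, to_stop


desired_counts = {"microservice-a": 3, "microservice-b": 1}
current = {
    "microservice-a": [("a-1", True), ("a-2", False)],
    "microservice-b": [("b-1", True), ("b-2", True)],
}
start, stop = reconcile(desired_counts, current)
print(start, stop)
```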

Container Orchestration Options

  • Cloud Neutral
    • Kubernetes
    • AWS service - Amazon Elastic Kubernetes Service (EKS)
    • EKS does not have a free tier
  • AWS Specific
    • AWS Elastic Container Service (ECS)
    • AWS Fargate: Serverless version of AWS ECS
    • AWS Fargate does not have a free tier

Amazon Elastic Container Service (Amazon ECS)

  • Fully managed service for container orchestration
  • Serverless option - AWS Fargate
  • Use cases:
    • Microservices Architectures - Create containers for your microservices and orchestrate them using ECS and Fargate
    • Batch Processing - Run batch workloads on EC2 and AWS Batch
Architecture and Best Practices

Well Architected Framework

  • Helps cloud architects build application infrastructure which is:
    • Secure
    • High-performing
    • Resilient and
    • Efficient
  • Five Pillars (AWS added a sixth pillar, Sustainability, in 2021)
    • Operational Excellence
    • Security
    • Reliability
    • Performance Efficiency
    • Cost Optimization

Operational Excellence

  • Avoid/Minimize effort and problems with
    • Provisioning servers
    • Deployment
    • Monitoring
    • Support

Operational Excellence - Solutions and AWS Services

  • Use managed services
    • You don't need to worry about managing servers, availability, durability, etc.
  • Go serverless
    • Prefer Lambda to EC2
  • Automate with CloudFormation
    • Use Infrastructure As Code
  • Implement CI/CD to find problems early
    • CodePipeline
    • CodeBuild
    • CodeDeploy
  • Perform frequent, small reversible changes

Operational Excellence - Solutions and AWS Services

  • Prepare: Plan for failure
    • Game days
    • Disaster recovery exercises
    • Implement standards with AWS Config Rules
  • Operate: Gather Data and Metrics
    • CloudWatch (Logs agent), Config, Config Rules, CloudTrail, VPC Flow Logs and X-Ray (tracing)
  • Evolve: Get intelligence
    • Use Amazon Elasticsearch to analyze your logs

Security Pillar

  • Principle of least privilege for least time
  • Security in Depth - Apply security in all layers
  • Protect data in Transit and at rest
  • Actively monitor for security issues
  • Centralize policies for multiple AWS accounts

Security Pillar - Principle of least privilege for least time

  • Use temporary credentials when possible (IAM roles, Instance profiles)
  • Use IAM Groups to simplify IAM management
  • Enforce strong password practices
  • Enforce MFA
  • Rotate credentials regularly

Security Pillar - Security in Depth

  • VPCs and Private Subnets
    • Security groups
    • Network Access Control List (NACL)
  • Use hardened EC2 AMIs (golden image)
    • Automate patches for OS, Software, etc.
  • Use CloudFront with AWS Shield for DDoS mitigation
  • Use WAF with CloudFront and ALB
    • Protect web applications from cross-site scripting (XSS), SQL injection, etc.
  • Use CloudFormation
    • Automate provisioning infrastructure that adheres to security policies

Security Pillar - Protecting Data at Rest

  • Enable Versioning (when available)
  • Enable encryption - AWS KMS and AWS CloudHSM
    • Rotate encryption keys
  • Amazon S3
    • SSE-C, SSE-S3, SSE-KMS
  • Amazon DynamoDB
    • DynamoDB Encryption Client, SSE-KMS
  • Amazon Redshift
    • AWS KMS and AWS CloudHSM
  • Amazon EBS, Amazon SQS, and Amazon SNS
    • AWS KMS
  • Amazon RDS
    • AWS KMS, TDE

Security Pillar - Protecting Data in Transit

  • Data coming in and going out of AWS
  • By default, all AWS APIs use HTTPS (SSL/TLS)
  • You can also choose to perform client side encryption for additional security
  • Ensure that your data goes through AWS network as much as possible
    • VPC Endpoints and AWS PrivateLink

Security Pillar - Detect Threats

  • Actively monitor for security issues
    • Monitor CloudWatch Logs
    • Use Amazon GuardDuty to detect threats and continuously monitor for malicious behavior

Reliability

  • Ability to:
    • Recover from infrastructure and application issues
    • Adapt to changing demands in load

Reliability - Best Practices

  • Prefer serverless architectures
  • Prefer loosely coupled architectures
    • SQS, SNS
  • Distributed System Best Practices
    • Use Amazon API Gateway for throttling requests
    • AWS SDKs automatically retry failed requests with exponential backoff
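What the SDK's retry logic does, in miniature: wait longer after each failed attempt, with random jitter so that many clients do not retry in lockstep. This sketch uses the "full jitter" variant (a random delay between 0 and the capped exponential value). The base delay, cap, attempt count and the `flaky_call` function are assumptions for the example, and `sleep` is injectable so the demo does not actually wait.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Call `operation`, retrying on ConnectionError with capped, jittered backoff."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Full jitter: wait a random time between 0 and base * 2^attempt (capped).
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky_call() -> str:
    # Hypothetical operation that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("throttled")
    return "ok"


delays: List[float] = []
result = retry_with_backoff(flaky_call, sleep=delays.append, rng=random.Random(1))
```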

Loosely Coupled Architectures

  • ELB
    • Works in tandem with AWS Auto Scaling
  • Amazon SQS
    • Polling Mechanism
  • Amazon SNS
    • Publish subscribe pattern
    • Bulk notifications and Mobile push support
  • Amazon Kinesis
    • Handle event streams
    • Multiple clients
    • Each client can track their stream position

Performance Efficiency - Best Practices

  • Use managed services
    • Focus on your business instead of focusing on resource provisioning and management
  • Go serverless
    • Lower transactional cost and less operational burden
  • Experiment
    • Cloud makes it easy to experiment
  • Monitor Performance
    • Trigger CloudWatch alarms and perform actions through Amazon SNS and Lambda

Performance Efficiency - Choose the Right Solution

  • Compute
    • EC2 instance vs Lambda vs Containers
  • Storage
    • Block, File, Object
  • Database
    • RDS vs DynamoDB vs Redshift
  • Caching
    • ElastiCache vs CloudFront vs DAX vs Read Replicas
  • Network
    • CloudFront, Global Accelerator, Route 53, Placement Groups, VPC endpoints, Direct Connect
  • Use product specific features
    • Enhanced Networking, S3 Transfer Acceleration, EBS Optimized Instances

Cost Optimization

  • Run systems at lowest cost

Cost Optimization - Best Practices

  • Match supply and demand
    • Implement Auto Scaling
    • Stop Dev/Test resources when you don't need them
    • Go Serverless
  • Track your expenditure
    • Cost Explorer to track and analyze your spend
    • AWS Budgets to trigger alerts
    • Use tags on resources to attribute costs
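
Cost Explorer and AWS Budgets do this work for you, but the idea behind tagging and budget alerts is simple to sketch. The line-item format below (`resource`, `cost`, `tags`) and the sample numbers are hypothetical. They are not the shape of any AWS billing export. Untagged resources end up grouped together, which is one reason consistent tagging matters.

```python
from collections import defaultdict

# Hypothetical monthly line items; NOT a real AWS billing export format.
LINE_ITEMS = [
    {"resource": "web-server", "cost": 120.0, "tags": {"env": "prod"}},
    {"resource": "test-server", "cost": 30.5, "tags": {"env": "dev"}},
    {"resource": "log-bucket", "cost": 50.0, "tags": {}},
    {"resource": "main-db", "cost": 80.0, "tags": {"env": "prod"}},
]


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of `tag_key`; items without the tag go under "(untagged)"."""
    totals = defaultdict(float)
    for item in line_items:
        value = item["tags"].get(tag_key, "(untagged)")
        totals[value] += item["cost"]
    return dict(totals)


def budget_alerts(actual_spend, budget, thresholds=(0.8, 1.0)):
    """Return the thresholds (fractions of the budget) that actual spend has reached."""
    return [t for t in thresholds if actual_spend >= budget * t]


if __name__ == "__main__":
    print(cost_by_tag(LINE_ITEMS, "env"))  # {'prod': 200.0, 'dev': 30.5, '(untagged)': 50.0}
    total = sum(item["cost"] for item in LINE_ITEMS)
    print(total, budget_alerts(total, budget=300))  # 280.5 [0.8]
```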

Cost Optimization - Choose Cost-Effective Solutions

  • Right Sizing: Analyze 5 large servers vs 10 small servers
    • Use CloudWatch (monitoring) and Trusted Advisor (recommendations) to right-size your resources
  • Email server vs managed email service (charged per email)
  • On-demand vs reserved vs spot instances
  • Avoid expensive software: MySQL vs Aurora vs Oracle
  • Optimize data transfer costs using AWS Direct Connect and Amazon CloudFront
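
The right-sizing and pricing-model comparison is mostly arithmetic. This sketch assumes hypothetical instance sizes (4 and 2 vCPUs) and made-up hourly prices in cents, not real AWS rates. It sizes a fleet to cover a measured peak demand.

Smaller instances let capacity track demand more closely, and reserved or spot pricing lowers the hourly rate. Keep in mind that reserved pricing requires a 1- or 3-year commitment and that spot instances can be interrupted.

```python
import math

HOURS_PER_MONTH = 730  # approximation commonly used for monthly estimates

# Hypothetical instance sizes and hourly prices in US cents; NOT real AWS rates.
SIZES = {"large": {"vcpus": 4}, "small": {"vcpus": 2}}
PRICE_CENTS = {
    "large": {"on_demand": 20, "reserved": 12, "spot": 6},
    "small": {"on_demand": 10, "reserved": 6, "spot": 3},
}


def instances_needed(peak_vcpus, size):
    """Smallest number of `size` instances whose total vCPUs cover the measured peak."""
    return math.ceil(peak_vcpus / SIZES[size]["vcpus"])


def monthly_cost_dollars(count, size, model, hours=HOURS_PER_MONTH):
    # Multiply in integer cents, then convert to dollars once at the end.
    return count * PRICE_CENTS[size][model] * hours / 100


if __name__ == "__main__":
    peak = 14  # hypothetical peak vCPU demand, e.g. read off CloudWatch metrics
    for size in ("large", "small"):
        count = instances_needed(peak, size)
        for model in ("on_demand", "reserved", "spot"):
            print(size, count, model, monthly_cost_dollars(count, size, model))
```

With these made-up numbers, covering a peak of 14 vCPUs takes 4 large instances ($584.00 per month on demand) or 7 small ones ($511.00 per month on demand). The smaller size wastes less idle capacity.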

Digital Transformation

  • How consumers make purchase decisions (Social)
  • How we do things (Mobile)
  • How much data we have (Big Data)
  • Digital Transformation: Using modern technologies to create (or modify) business processes and customer experiences by innovating with technology and team culture

Shared Responsibility Model

  • Security and Compliance is shared between AWS and customer

  • AWS manages security of the cloud
    • AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities
  • YOU are responsible for security in the cloud
    • The customer assumes responsibility for, and management of:
      • Guest operating system (including updates and security patches)
      • Application software
      • Configuration of the AWS-provided security group firewall
      • Choosing and Integrating AWS Services with their IT environments

More AWS Services

AWS Transit Gateway

  • AWS Transit Gateway: connect multiple VPCs and on-premises networks (over VPN and Direct Connect) through a central hub
    • Supports Global inter-Region peering
    • Traffic between an Amazon VPC and AWS Transit Gateway remains on the AWS global private network
    • Create route tables and associate VPC and VPN attachments with them
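
Transit Gateway's value shows up in the connection count. VPC peering is point-to-point and non-transitive, so connecting n VPCs to each other needs n(n-1)/2 peering connections. A hub needs only one attachment per network. The sketch also models a route table that maps CIDR blocks to attachments and picks the most specific (longest prefix) match, which is how transit gateway route tables choose a route. The attachment names and CIDRs are hypothetical.

```python
import ipaddress


def full_mesh_peerings(n_vpcs):
    """VPC peering is point-to-point and non-transitive: a full mesh needs n(n-1)/2 links."""
    return n_vpcs * (n_vpcs - 1) // 2


def hub_attachments(n_networks):
    """With a hub such as Transit Gateway, each VPC or VPN attaches once."""
    return n_networks


class ToyRouteTable:
    """Hypothetical model of a transit gateway route table (CIDR -> attachment)."""

    def __init__(self):
        self._routes = []

    def add_route(self, cidr, attachment):
        self._routes.append((ipaddress.ip_network(cidr), attachment))

    def lookup(self, address):
        ip = ipaddress.ip_address(address)
        matches = [(net, att) for net, att in self._routes if ip in net]
        if not matches:
            return None  # no matching route: the traffic is not forwarded
        # Longest prefix match: the most specific CIDR wins.
        return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    print(full_mesh_peerings(10), hub_attachments(10))  # 45 10

    table = ToyRouteTable()
    table.add_route("10.0.0.0/16", "vpc-a-attachment")
    table.add_route("10.1.0.0/16", "vpc-b-attachment")
    table.add_route("10.0.0.0/8", "on-prem-vpn-attachment")
    print(table.lookup("10.1.2.3"))   # vpc-b-attachment
    print(table.lookup("10.9.9.9"))   # on-prem-vpn-attachment
    print(table.lookup("192.0.2.1"))  # None
```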

Machine Learning - 3 Approaches

  • Use Pre-Trained Models
    • Get intelligence from text, images, audio, video
    • Amazon Comprehend, Amazon Rekognition...
  • Build simple models: Without needing Data scientists
    • Limited / no-code experience
    • Example: Amazon SageMaker AutoML
  • Build complex models: Using data scientists and team
    • Build your own ML models from zero (code-experienced)
    • Example: Amazon SageMaker

Pre-Trained Models in AWS

  • Amazon Comprehend: Analyze unstructured text
  • Amazon Textract: Easily extract text and data from virtually any document
  • Amazon Rekognition: Search and Analyze Images and Videos
  • Amazon Transcribe: Powerful Speech Recognition
  • Amazon Polly: Turn Text into Lifelike Speech
  • Amazon Translate: Powerful Neural Machine Translation
  • Amazon Personalize: Add real-time recommendations to your apps
  • Amazon Fraud Detector: Detect online fraud faster
  • Amazon Forecast: Time-series forecasting service
  • Amazon Kendra: Intelligent search service (Search from scattered content - multiple locations and content repositories)
  • Amazon Lex: Build Voice and Text chatbots

Amazon SageMaker

  • Amazon SageMaker: Simplifies creation of your models
    • Manage data, code, compute, models, etc.
    • Prepare data
    • Train models
    • Publish models
    • Monitor models
  • Multiple Options to create models
    • AutoML/Autopilot: Build custom models with minimum ML expertise
    • Build your own models: Data Scientists
      • Support for deep learning frameworks such as TensorFlow, Apache MXNet, PyTorch, and more (use them within the built-in containers)
      • Data and compute

Big Data - Terminology and Evolution

  • 3Vs of Big Data
    • Volume: Terabytes to Petabytes to Exabytes
    • Variety: Structured, semi-structured, unstructured
    • Velocity: Batch, Streaming
  • Terminology: Data warehouse vs Data lake
    • Data warehouse: PBs of Storage + Compute (Typically)
      • Data stored in format ready for specific analysis (processed data)
        • Examples: Teradata, BigQuery (GCP), Redshift (AWS), Azure Synapse Analytics
      • Typically uses specialized hardware
    • Data lake: Typically retains all raw data (compressed)
      • Typically object storage is used as data lake
        • Amazon S3, Google Cloud Storage, Azure Data Lake Storage Gen2, etc.
      • Flexibility while saving cost
      • Perform ad-hoc analysis on demand
      • Analytics & intelligence services (even data warehouses) can directly read from data lake
        • Azure Synapse Analytics, BigQuery (GCP), Redshift Spectrum (AWS), Amazon Athena, etc.
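
The data warehouse vs. data lake distinction is often summarized as schema-on-write vs. schema-on-read. The sketch below uses a few hypothetical JSON event records:

- the "warehouse" function transforms them up front into a fixed table for one known analysis;
- the "lake" function keeps the raw lines and applies structure only when a new question is asked.

This is plain Python for illustration only. In AWS, raw data would typically sit in S3 and be queried in place with services such as Amazon Athena or Redshift Spectrum.

```python
import json
from collections import Counter

# Hypothetical raw events as they might land in a data lake (e.g. JSON lines in S3):
# stored as-is, with fields that vary from record to record.
RAW_EVENTS = [
    '{"type": "view", "page": "/home"}',
    '{"type": "purchase", "page": "/checkout", "amount": 20}',
    '{"type": "view", "page": "/pricing", "referrer": "search"}',
    '{"type": "purchase", "page": "/checkout", "amount": 15}',
]


def load_warehouse_table(raw_lines):
    """Warehouse style (schema-on-write): transform up front into a fixed schema
    for one known analysis (purchases); everything else is dropped."""
    rows = []
    for line in raw_lines:
        event = json.loads(line)
        if event["type"] == "purchase":
            rows.append({"page": event["page"], "amount": event["amount"]})
    return rows


def adhoc_query_lake(raw_lines, field):
    """Lake style (schema-on-read): keep raw data and interpret it at query time,
    so a question nobody planned for (e.g. counting referrers) can still be answered."""
    counts = Counter()
    for line in raw_lines:
        event = json.loads(line)
        if field in event:
            counts[event[field]] += 1
    return counts


if __name__ == "__main__":
    table = load_warehouse_table(RAW_EVENTS)
    print(sum(row["amount"] for row in table))      # 35
    print(adhoc_query_lake(RAW_EVENTS, "referrer"))  # Counter({'search': 1})
```

The warehouse table answers the planned purchase question cheaply but has already discarded the `referrer` field. The lake still has it because nothing was thrown away at load time.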

Big Data and Data warehousing in AWS
