AWS Cloud Practitioner Study Session Eight
January 02, 2026
I am taking the AWS Cloud Practitioner Exam in approximately two days and want to ensure I am prepared. This series will serve as non-exhaustive note taking for the information that I am internalizing as I go.
ChatGPT Summary:
AWS Certified Cloud Practitioner – AI/ML, Data & Analytics Services Summary
This section is about choosing the right level of intelligence and data tooling. The exam does not expect you to build ML models, but it does expect you to know:
- When to use pre-built AI vs custom ML
- Which AWS service matches a specific AI or data use case
- How data flows from ingestion → processing → analytics → visualization
The AWS AI/ML Stack (Very Testable)
AWS organizes AI/ML into three tiers, from least to most customization.
🧠 Golden Memory Trick
Use it → Train it → Build it
| Tier | What It Means | Who It’s For |
|---|---|---|
| Tier 1 – AI Services | Pre-built, trained models | Developers who want results fast |
| Tier 2 – ML Services | Build/train models without infra | Data scientists |
| Tier 3 – Frameworks & Infra | Full control, custom ML | ML engineers |
Tier 1: Pre-Built AWS AI Services
When to Use
- You don’t want to train models
- You want quick AI capabilities
- Common tasks like speech, vision, language, recommendations
🧠 Exam Clue
- “No ML expertise required”
- “Pre-trained” → AI Services
Language & Text Services
Amazon Comprehend (NLP)
- Extracts:
- Key phrases
- Sentiment
- Language
- From unstructured text
Use Cases
- Customer sentiment analysis
- Content classification
- Compliance monitoring
🧠 Memory Tip:
Comprehend = “Understand text”
Amazon Polly (Text → Speech)
- Converts text into natural-sounding speech
Use Cases
- Virtual assistants
- E-learning
- Accessibility
🧠 Memory Tip:
Polly talks
Amazon Transcribe (Speech → Text)
- Converts audio into text
- Supports:
- Real-time transcription
- Speaker identification
Use Cases
- Call transcription
- Subtitles
- Media metadata
🧠 Memory Tip:
Transcribe writes what you say
Amazon Translate
- Translates text between languages
Use Cases
- Multilingual apps
- Document translation
🧠 Memory Tip:
Translate translates
Amazon Kendra (Enterprise Search)
- Uses NLP to search enterprise content
Use Cases
- Intelligent search
- Chatbots
- Knowledge bases
🧠 Memory Tip:
Kendra finds answers
Vision & Document Services
Amazon Rekognition
- Image and video analysis
- Identifies:
- Objects
- Faces
- Text
- Activities
Use Cases
- Content moderation
- Identity verification
- Media analysis
🧠 Memory Tip:
Rekognition recognizes images
Amazon Textract
- Extracts text from:
- Forms
- Tables
- Handwritten documents
Use Cases
- Financial documents
- Healthcare records
- Government forms
🧠 Memory Tip:
Textract extracts text
Conversational AI & Personalization
Amazon Lex
- Builds conversational interfaces
- Uses:
- ASR (speech recognition)
- NLU (language understanding)
Use Cases
- Chatbots
- Virtual assistants
- FAQ bots
🧠 Memory Tip:
Lex lets you talk to apps
Amazon Personalize
- Generates personalized recommendations
Use Cases
- Product recommendations
- Media streaming suggestions
🧠 Memory Tip:
Personalize = “Just for you”
Tier 2: ML Services (Amazon SageMaker AI)
What It Is
- Fully managed ML platform
- Build, train, deploy models
- No infrastructure management
Key Benefits
- Multiple ML tools (IDE, no-code)
- Managed, scalable infrastructure
- Repeatable ML workflows (MLOps)
🧠 Memory Tip:
SageMaker = “Serious ML without servers”
📝 Exam Clue
- “Custom ML model”
- “Training and deployment” → SageMaker
Generative AI on AWS (High-Level Exam Awareness)
Amazon SageMaker JumpStart
- Hub of:
- Pre-trained models
- Foundation models (FMs)
Use Cases
- Rapid deployment
- Fine-tuning models
🧠 Memory Tip:
JumpStart = Start fast
Amazon Bedrock
- Managed access to Foundation Models
- Unified API for multiple providers
Use Cases
- Generative AI apps
- Multimodal content
- Conversational agents
🧠 Memory Tip:
Bedrock = Foundation models
Amazon Q
Amazon Q Business
- AI assistant for company knowledge
Use Cases
- Answering internal questions
- Automating workflows
Amazon Q Developer
- AI coding assistant
Use Cases
- Code generation
- Code reviews
- Security improvements
🧠 Memory Tip:
Q = Question-answering AI
Data Pipelines (ETL Fundamentals)
ETL Stages
- Extract data from sources
- Transform into usable formats
- Load into analytics systems
🧠 Memory Tip:
ETL = Get it, clean it, store it
AWS Data Pipeline Services (End-to-End)
Data Ingestion
Amazon Kinesis Data Streams
- Real-time data ingestion
- Handles large volumes of streaming data
🧠 Memory Tip:
Kinesis = Streaming
Amazon Data Firehose
- Near real-time delivery
- Sends data to:
- S3
- Redshift
- Analytics tools
🧠 Memory Tip:
Firehose delivers
Data Storage
Amazon S3 (Data Lake)
- Virtually unlimited storage
- Ideal for unstructured data
🧠 Memory Tip:
S3 = Data lake foundation
Amazon Redshift (Data Warehouse)
- Columnar storage
- Massively parallel processing
🧠 Memory Tip:
Redshift = Fast SQL analytics
Data Cataloging
AWS Glue Data Catalog
- Central metadata repository
- Improves data discovery
🧠 Memory Tip:
Glue Catalog = Data dictionary
Data Processing
AWS Glue
- Serverless ETL service
🧠 Memory Tip:
Glue prepares data
Amazon EMR
- Big data processing
- Supports Spark, Hadoop, Hive
🧠 Memory Tip:
EMR = Big data clusters
Data Analysis & Visualization
Amazon Athena
- Serverless SQL on S3
🧠 Memory Tip:
Athena = Ask data with SQL
Amazon QuickSight
- BI dashboards
- Natural language queries
🧠 Memory Tip:
QuickSight = See insights quickly
Amazon OpenSearch Service
- Search and analytics
- Visualize pipeline data
🧠 Memory Tip:
OpenSearch = Search your data
End-to-End Analytics Example (Exam-Style)
Goal: Analyze streaming application data
- Kinesis ingests real-time data
- Firehose delivers to S3
- Glue catalogs and transforms
- Athena / Redshift analyze
- QuickSight visualizes insights
🧠 Memory Tip:
Ingest → Store → Process → Analyze → Visualize
Final Exam Takeaways
- Tier 1 AI Services = Pre-trained, fast results
- SageMaker = Custom ML without infra
- Bedrock = Generative AI foundation models
- Kinesis = Real-time streams
- Firehose = Data delivery
- S3 = Data lake
- Athena = Serverless SQL
- Redshift = Data warehouse
- QuickSight = Dashboards
Study materials:
- Free Code Camp Preparation
- AWS Certified Solutions Architect Practice Tests
- AWS Cloud Practitioner Essentials
- AWS Documentation
- What is Cloud Computing?
- Shared Responsibility Model
- Regions and Availability Zones
- Containers on AWS
- Amazon Elastic Container Registry
- Amazon Elastic Container Service
- Amazon Elastic Kubernetes Service
- AWS Fargate
- AWS Elastic Beanstalk
- AWS Batch
- What is Amazon Lightsail?
- What is AWS Outposts?
- Choosing a modern application strategy
- AWS Global Infrastructure
- AWS for the Edge
- AWS CloudFormation
- Amazon Virtual Private Cloud
- Subnet
- Internet gateway
- Virtual private gateway
- AWS Client VPN
- AWS Site-to-Site VPN
- AWS PrivateLink
- AWS Direct Connect
- Network Access Control List (network ACL)
- Security groups
- Domain Name System (DNS)
- Amazon Route 53
- Amazon CloudFront
- AWS Global Accelerator
- Amazon Transit Gateway
- NAT Gateway
- API Gateway
- Amazon EC2 Instance Store User Guide
- Amazon Elastic Block Store (Amazon EBS)
- Amazon Elastic Block Store (Amazon EBS) FAQ
- Amazon EBS Snapshots User Guide
- Amazon Data Lifecycle Manager User Guide
- Amazon Simple Storage Service (Amazon S3)
- Amazon Simple Storage Service (Amazon S3) FAQ
- Amazon S3 Storage Classes
- Amazon S3 Versioning User Guide
- Amazon S3 Buckets User Guide
- Amazon Elastic File System (Amazon EFS)
- Amazon Elastic File System (Amazon EFS) FAQ
- Amazon FSx
- Amazon FSx for Windows File Server
- Amazon FSx for NetApp ONTAP
- Amazon FSx for OpenZFS
- Amazon FSx for Lustre
- AWS Storage Gateway
- Amazon S3 File Gateway
- Tape Gateway
- Volume Gateway
- Amazon Relational Database Service (Amazon RDS)
- Amazon RDS Security
- Amazon Aurora
- AWS Database Migration Service (AWS DMS)
- Amazon DynamoDB
- Amazon ElastiCache
- Amazon DocumentDB
- Amazon Backup
- Amazon Neptune
- What Is a Relational Database?
- What Is a NoSQL Database?
- What Is an In-Memory Caching Service?
- AWS Shared Responsibility Model
- Amazon Comprehend
- Amazon Polly
- Amazon Transcribe
- Amazon Translate
- Amazon Kendra
- Amazon Rekognition
- Amazon Textract
- Amazon Lex
- Amazon Personalize
- Amazon SageMaker AI
- Amazon SageMaker JumpStart
- Amazon Bedrock
- Amazon Q Business
- Amazon Q Developer
- Amazon Kinesis Data Streams
- Amazon Data Firehose
- Amazon S3
- Amazon Redshift
- AWS Glue Data Catalog
- AWS Glue
- Amazon EMR
- Amazon Athena
- Amazon QuickSight
- Amazon OpenSearch Service
- ChatGPT
Raw Input Notes:
AWS AI/ML stack is composed of 3 tiers.
- (1) AI Services - Pre-built models that are already trained to perform specific functions
- (2) ML Services - Customized approach with Amazon SageMaker AI where you build, train, deploy own ML models with fully managed infra
- (3) ML Frameworks / Infra - Custom approach to building models using purpose-built chips that integrate with popular ML frameworks
What is an ML Framework? A software library with pre-built, optimized components.
Tier 1: Pre-built AWS AI Services
- Language Services
- Computer Vision and Search Services
- Conversational AI and Personalization Services
Language Services - When you need to interpret text / speech and turn it into something meaningful. (TTS, STT)
- Amazon Comprehend - Uses NLP to extract key insights from docs. Develops insights by recognizing key phrases, language, sentiment.
- Use Cases: Content Classification, Customer Sentiment Analyisis, Compliance Monitoring
Amazon Polly - Converts text in to lifelike speech. Supports multiple languages, different genders, accents.
- Use Cases: Virtual assistants, e-learning apps, accessibility enhancements for visually impaired users
Amazon Transcribe - Converts speech into text. Supports multiple languages. Features: Speaker identification, custom vocabulary, real-time transcription.
- Use Cases: Customer call transcription, automated subtitling, metadata generation for media content
Amazon Translate - Text translation service that supports real-time and batch text translation across multiple languages
- Use Cases: Document translation and multi-language application integrations
Amazon Kendra - UsesNLP to search for answers within enterprise content.
- Use Cases: Intelligent search, chatbots, application search integration
Amazon Rekognition - Video analysis service. Can identify objects, people, text, scenes, activities within images and videos stored in Amazon S3.
- Use Cases: Content moderation, identity verification, media analysis, home automation services
Amazon Textract - Detects and extracts typed and handwritten text found in documents, forms, tables within documents
- Use Cases: Financial, healthcare, government form text extraction for quick processing
Amazon Lex - NLU and ASR to create lifelike conversations.
- Use Cases: Virtual assistants, natural language search for FAQs, automated application bots
Amazon Personalize - Can use historical data to build intelligent applications with personalized customer recommendations
- Use Cases: Personalized streaming, product, trending recommendations
Tier 2: ML Services
- Provides a more customized approach for more control without having to manage infrastructure
- SageMaker AI is a key offering in this tier
SageMaker AI: Fully managed service, can build, train, deploy ML models without worrying about infrastructure. IDE. Can track training, visualize data, debug workflows. Access to pre-trained models to deploy. Benefits - Choice of ML Tools: Increase innovation with different tools (IDE, no-code interface) Fully managed Infra: Focus on ML model development whle SageMaker AI provides with high-performance cost-effective infrastructure Repeatable ML Workflows: Automate / standardize MLOps practices and governance across your enterprise to support transparency and auditability
Introduction to Gen AI on AWS
- Amazon SageMaker JumpStart - An ML hub with FMs and pre-built ML solutions deployable with a few clicks
-
Use Cases: Quickly deploy pre-trained models, fine-tune with domain-specific data, compare performance for different models
- Amazon Bedrock - Fully managed service for adapting FMs from Amazon and elsewhere. Provides access to FMs through a single unified API.
-
Use Cases - Build gen AI apps, create apps that can generate multiple content types (multimodel), conversational agents
-
Amazon Q - Interactive AI assistant that can be integrated with company’s info repositories.
-
Amazon Q Business - Can answer pressing questions, help solve problems, take actions using data and expertise found in company’s info repositories Use Cases: Information requests, automated workflows, insight extract
-
Amazon Q Developer - Provides code recommendations to accelerate development for coding langauges. Use Cases: Faster code generation, improved reliability / security, automated code reviews
- Data Pipelines for ETL Processes
- (1) Extract data from arious sources and store it.
- (2) Transform data into consistent, usable format for downstream tools to consume.
-
(3) Load it into destination system like data warehouse or analytics platform.
- Data Analytics
- Use Cases - Loan companies explaining lending decisions to customers, medical researchers analyzing clinical trial data through hypothesis testing, insurance companies making risk assessment models transparent to regulators.
AWS Data Pipeline Services
- Amazon Kinesis Data Streams
- Real-time ingestion of terabytes of data from applications, streams, sensors.
-
Automatic provisioning and scaling in on-demand mode.
- Amazon Data Firehose
- Data ingestion in near real-time. Provides automatic provisioning and scaling. Delivers data within seconds to data lakes, warehouses, analytics services.
- Amazon S3 - Fully elastic, automatically scaling as you add / remove data.
- Amazon Redshift - Fully manageed data warehouse service that can store petabytes of structured / semistructured data.
- Data Cataloging Services
- AWS Glue Data Catalog - Centralized, scalable, managed metadata repository that enhances data discovery.
- Data Processing Services
- AWS Glue - Fully managed ETL, makes data prep simpler, faster, cost effective. Best suited for data processing in data pipeline.
- Amazon EMR - Automatically handles infra provisioning, cluster management, scaling. Supports Apache Spark, Apache Hadoop, Apache Hive.
- Data Analysis and Visualization Services
- Amazon Athena - Serverless service that can access data hosted on Amazon S3, on-prem, or multicloud and runs SQL queries to analyze data in relational, non-relational, object, custom data sources.
- Amazon Redshift - Fully managed data warehouse solution, columnar storage, massively parallel processesing architecture makes it ideal for analyzing large datasets.
- Can use it to perform SQL queries on large datasets for frequent, high-performance analytics workloads
- Amazon QuickSight - Technical and non-technical users can quickly create modern interactive dashboards / reports from data sources without managing infra.
-
Natural language queries
- Amazon OpenSearch Service - Can search for relevant content through precise keyword matching or natural language queries.
-
Can use OpenSearch Service to visualize data in a data pipeline.
- Amazon S3 can store virtually unlimited amounts of unstructured data. This makes it a popular data lake choice and best option for the team.
Feedback
Have thoughts or suggestions about this post?