AWS Cloud Practitioner Study Session Eight

January 02, 2026

I am taking the AWS Cloud Practitioner Exam in approximately two days and want to ensure I am prepared. This series will serve as non-exhaustive note taking for the information that I am internalizing as I go.

ChatGPT Summary:

AWS Certified Cloud Practitioner – AI/ML, Data & Analytics Services Summary

This section is about choosing the right level of intelligence and data tooling. The exam does not expect you to build ML models, but it does expect you to know:

When to use pre-built AI vs custom ML
Which AWS service matches a specific AI or data use case
How data flows from ingestion → processing → analytics → visualization

The AWS AI/ML Stack (Very Testable)

AWS organizes AI/ML into three tiers, from least to most customization.

🧠 Golden Memory Trick

Use it → Train it → Build it

Tier	What It Means	Who It’s For
Tier 1 – AI Services	Pre-built, trained models	Developers who want results fast
Tier 2 – ML Services	Build/train models without infra	Data scientists
Tier 3 – Frameworks & Infra	Full control, custom ML	ML engineers

Tier 1: Pre-Built AWS AI Services

When to Use

You don’t want to train models
You want quick AI capabilities
Common tasks like speech, vision, language, recommendations

🧠 Exam Clue

“No ML expertise required”
“Pre-trained” → AI Services

Language & Text Services

Amazon Comprehend (NLP)

Extracts:
- Key phrases
- Sentiment
- Language
From unstructured text

Use Cases

Customer sentiment analysis
Content classification
Compliance monitoring

🧠 Memory Tip:
Comprehend = “Understand text”

Amazon Polly (Text → Speech)

Converts text into natural-sounding speech

Use Cases

Virtual assistants
E-learning
Accessibility

🧠 Memory Tip:
Polly talks

Amazon Transcribe (Speech → Text)

Converts audio into text
Supports:
- Real-time transcription
- Speaker identification

Use Cases

Call transcription
Subtitles
Media metadata

🧠 Memory Tip:
Transcribe writes what you say

Amazon Translate

Translates text between languages

Use Cases

Multilingual apps
Document translation

🧠 Memory Tip:
Translate translates

Amazon Kendra (Enterprise Search)

Uses NLP to search enterprise content

Use Cases

Intelligent search
Chatbots
Knowledge bases

🧠 Memory Tip:
Kendra finds answers

Vision & Document Services

Amazon Rekognition

Image and video analysis
Identifies:
- Objects
- Faces
- Text
- Activities

Use Cases

Content moderation
Identity verification
Media analysis

🧠 Memory Tip:
Rekognition recognizes images

Amazon Textract

Extracts text from:
- Forms
- Tables
- Handwritten documents

Use Cases

Financial documents
Healthcare records
Government forms

🧠 Memory Tip:
Textract extracts text

Conversational AI & Personalization

Amazon Lex

Builds conversational interfaces
Uses:
- ASR (speech recognition)
- NLU (language understanding)

Use Cases

Chatbots
Virtual assistants
FAQ bots

🧠 Memory Tip:
Lex lets you talk to apps

Amazon Personalize

Generates personalized recommendations

Use Cases

Product recommendations
Media streaming suggestions

🧠 Memory Tip:
Personalize = “Just for you”

Tier 2: ML Services (Amazon SageMaker AI)

What It Is

Fully managed ML platform
Build, train, deploy models
No infrastructure management

Key Benefits

Multiple ML tools (IDE, no-code)
Managed, scalable infrastructure
Repeatable ML workflows (MLOps)

🧠 Memory Tip:
SageMaker = “Serious ML without servers”

📝 Exam Clue

“Custom ML model”
“Training and deployment” → SageMaker

Generative AI on AWS (High-Level Exam Awareness)

Amazon SageMaker JumpStart

Hub of:
- Pre-trained models
- Foundation models (FMs)

Use Cases

Rapid deployment
Fine-tuning models

🧠 Memory Tip:
JumpStart = Start fast

Amazon Bedrock

Managed access to Foundation Models
Unified API for multiple providers

Use Cases

Generative AI apps
Multimodal content
Conversational agents

🧠 Memory Tip:
Bedrock = Foundation models

Amazon Q

Amazon Q Business

AI assistant for company knowledge

Use Cases

Answering internal questions
Automating workflows

Amazon Q Developer

AI coding assistant

Use Cases

Code generation
Code reviews
Security improvements

🧠 Memory Tip:
Q = Question-answering AI

Data Pipelines (ETL Fundamentals)

ETL Stages

Extract data from sources
Transform into usable formats
Load into analytics systems

🧠 Memory Tip:
ETL = Get it, clean it, store it

AWS Data Pipeline Services (End-to-End)

Data Ingestion

Amazon Kinesis Data Streams

Real-time data ingestion
Handles large volumes of streaming data

🧠 Memory Tip:
Kinesis = Streaming

Amazon Data Firehose

Near real-time delivery
Sends data to:
- S3
- Redshift
- Analytics tools

🧠 Memory Tip:
Firehose delivers

Data Storage

Amazon S3 (Data Lake)

Virtually unlimited storage
Ideal for unstructured data

🧠 Memory Tip:
S3 = Data lake foundation

Amazon Redshift (Data Warehouse)

Columnar storage
Massively parallel processing

🧠 Memory Tip:
Redshift = Fast SQL analytics

Data Cataloging

AWS Glue Data Catalog

Central metadata repository
Improves data discovery

🧠 Memory Tip:
Glue Catalog = Data dictionary

Data Processing

AWS Glue

Serverless ETL service

🧠 Memory Tip:
Glue prepares data

Amazon EMR

Big data processing
Supports Spark, Hadoop, Hive

🧠 Memory Tip:
EMR = Big data clusters

Data Analysis & Visualization

Amazon Athena

Serverless SQL on S3

🧠 Memory Tip:
Athena = Ask data with SQL

Amazon QuickSight

BI dashboards
Natural language queries

🧠 Memory Tip:
QuickSight = See insights quickly

Amazon OpenSearch Service

Search and analytics
Visualize pipeline data

🧠 Memory Tip:
OpenSearch = Search your data

End-to-End Analytics Example (Exam-Style)

Goal: Analyze streaming application data

Kinesis ingests real-time data
Firehose delivers to S3
Glue catalogs and transforms
Athena / Redshift analyze
QuickSight visualizes insights

🧠 Memory Tip:
Ingest → Store → Process → Analyze → Visualize

Final Exam Takeaways

Tier 1 AI Services = Pre-trained, fast results
SageMaker = Custom ML without infra
Bedrock = Generative AI foundation models
Kinesis = Real-time streams
Firehose = Data delivery
S3 = Data lake
Athena = Serverless SQL
Redshift = Data warehouse
QuickSight = Dashboards

Study materials:

Raw Input Notes:

AWS AI/ML stack is composed of 3 tiers.

(1) AI Services - Pre-built models that are already trained to perform specific functions
(2) ML Services - Customized approach with Amazon SageMaker AI where you build, train, deploy own ML models with fully managed infra
(3) ML Frameworks / Infra - Custom approach to building models using purpose-built chips that integrate with popular ML frameworks

What is an ML Framework? A software library with pre-built, optimized components.

Tier 1: Pre-built AWS AI Services

Language Services
Computer Vision and Search Services
Conversational AI and Personalization Services

Language Services - When you need to interpret text / speech and turn it into something meaningful. (TTS, STT)

Amazon Comprehend - Uses NLP to extract key insights from docs. Develops insights by recognizing key phrases, language, sentiment.
Use Cases: Content Classification, Customer Sentiment Analyisis, Compliance Monitoring

Amazon Polly - Converts text in to lifelike speech. Supports multiple languages, different genders, accents.

Use Cases: Virtual assistants, e-learning apps, accessibility enhancements for visually impaired users

Amazon Transcribe - Converts speech into text. Supports multiple languages. Features: Speaker identification, custom vocabulary, real-time transcription.

Use Cases: Customer call transcription, automated subtitling, metadata generation for media content

Amazon Translate - Text translation service that supports real-time and batch text translation across multiple languages

Use Cases: Document translation and multi-language application integrations

Amazon Kendra - UsesNLP to search for answers within enterprise content.

Use Cases: Intelligent search, chatbots, application search integration

Amazon Rekognition - Video analysis service. Can identify objects, people, text, scenes, activities within images and videos stored in Amazon S3.

Use Cases: Content moderation, identity verification, media analysis, home automation services

Amazon Textract - Detects and extracts typed and handwritten text found in documents, forms, tables within documents

Use Cases: Financial, healthcare, government form text extraction for quick processing

Amazon Lex - NLU and ASR to create lifelike conversations.

Use Cases: Virtual assistants, natural language search for FAQs, automated application bots

Amazon Personalize - Can use historical data to build intelligent applications with personalized customer recommendations

Use Cases: Personalized streaming, product, trending recommendations

Tier 2: ML Services

Provides a more customized approach for more control without having to manage infrastructure
SageMaker AI is a key offering in this tier

SageMaker AI: Fully managed service, can build, train, deploy ML models without worrying about infrastructure. IDE. Can track training, visualize data, debug workflows. Access to pre-trained models to deploy. Benefits - Choice of ML Tools: Increase innovation with different tools (IDE, no-code interface) Fully managed Infra: Focus on ML model development whle SageMaker AI provides with high-performance cost-effective infrastructure Repeatable ML Workflows: Automate / standardize MLOps practices and governance across your enterprise to support transparency and auditability

Introduction to Gen AI on AWS

Amazon SageMaker JumpStart - An ML hub with FMs and pre-built ML solutions deployable with a few clicks
Use Cases: Quickly deploy pre-trained models, fine-tune with domain-specific data, compare performance for different models
Amazon Bedrock - Fully managed service for adapting FMs from Amazon and elsewhere. Provides access to FMs through a single unified API.
Use Cases - Build gen AI apps, create apps that can generate multiple content types (multimodel), conversational agents
Amazon Q - Interactive AI assistant that can be integrated with company’s info repositories.
Amazon Q Business - Can answer pressing questions, help solve problems, take actions using data and expertise found in company’s info repositories Use Cases: Information requests, automated workflows, insight extract
Amazon Q Developer - Provides code recommendations to accelerate development for coding langauges. Use Cases: Faster code generation, improved reliability / security, automated code reviews
Data Pipelines for ETL Processes
(1) Extract data from arious sources and store it.
(2) Transform data into consistent, usable format for downstream tools to consume.
(3) Load it into destination system like data warehouse or analytics platform.
Data Analytics
Use Cases - Loan companies explaining lending decisions to customers, medical researchers analyzing clinical trial data through hypothesis testing, insurance companies making risk assessment models transparent to regulators.

AWS Data Pipeline Services

Amazon Kinesis Data Streams
Real-time ingestion of terabytes of data from applications, streams, sensors.
Automatic provisioning and scaling in on-demand mode.
Amazon Data Firehose
Data ingestion in near real-time. Provides automatic provisioning and scaling. Delivers data within seconds to data lakes, warehouses, analytics services.
Amazon S3 - Fully elastic, automatically scaling as you add / remove data.
Amazon Redshift - Fully manageed data warehouse service that can store petabytes of structured / semistructured data.

Data Cataloging Services
AWS Glue Data Catalog - Centralized, scalable, managed metadata repository that enhances data discovery.

Data Processing Services
AWS Glue - Fully managed ETL, makes data prep simpler, faster, cost effective. Best suited for data processing in data pipeline.
Amazon EMR - Automatically handles infra provisioning, cluster management, scaling. Supports Apache Spark, Apache Hadoop, Apache Hive.

Data Analysis and Visualization Services
Amazon Athena - Serverless service that can access data hosted on Amazon S3, on-prem, or multicloud and runs SQL queries to analyze data in relational, non-relational, object, custom data sources.

Amazon Redshift - Fully managed data warehouse solution, columnar storage, massively parallel processesing architecture makes it ideal for analyzing large datasets.
Can use it to perform SQL queries on large datasets for frequent, high-performance analytics workloads

Amazon QuickSight - Technical and non-technical users can quickly create modern interactive dashboards / reports from data sources without managing infra.
Natural language queries
Amazon OpenSearch Service - Can search for relevant content through precise keyword matching or natural language queries.
Can use OpenSearch Service to visualize data in a data pipeline.
Amazon S3 can store virtually unlimited amounts of unstructured data. This makes it a popular data lake choice and best option for the team.

Feedback

Have thoughts or suggestions about this post?