Cost of S3 and Data Lake in AWS

This section gives a high-level cost overview of an S3-based data lake and the query services commonly run on top of it. Treat it as a starting point for discussion; actual pricing varies by region and usage.

Core Cost Drivers

S3 storage class (Standard, Intelligent-Tiering, Glacier), billed per GB-month.
Data ingest and transfer (PUT/GET requests are billed per request; inbound transfer is free, data transfer out is billed per GB).
Athena queries (charged per TB scanned).
Glue jobs and crawlers (billed by DPU-hour).
OpenSearch (instance hours for the cluster, plus attached storage).
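
To make the drivers above concrete, the following is a minimal sketch of a monthly estimate that adds them up. The function name, parameter names, and every unit price are assumptions chosen for illustration; they are placeholders, not quoted AWS rates, so replace them with current figures for your region before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitPrices:
    """Illustrative unit prices in USD. Placeholders, not quoted AWS rates."""
    s3_gb_month: float = 0.023             # storage, per GB-month
    put_per_1k: float = 0.005              # per 1,000 PUT requests
    get_per_1k: float = 0.0004             # per 1,000 GET requests
    egress_gb: float = 0.09                # data transfer out, per GB
    athena_tb_scanned: float = 5.0         # per TB scanned
    glue_dpu_hour: float = 0.44            # per DPU-hour
    opensearch_instance_hour: float = 0.15 # per instance-hour
    opensearch_gb_month: float = 0.10      # cluster storage, per GB-month


def monthly_cost(stored_gb, put_requests, get_requests, egress_gb,
                 athena_tb_scanned, glue_dpu_hours,
                 opensearch_instances, opensearch_gb,
                 prices=UnitPrices(), hours_per_month=730):
    """Return a per-driver cost breakdown (USD) plus a 'total' entry."""
    breakdown = {
        "s3_storage": stored_gb * prices.s3_gb_month,
        "s3_requests": (put_requests / 1000) * prices.put_per_1k
                       + (get_requests / 1000) * prices.get_per_1k,
        "data_transfer_out": egress_gb * prices.egress_gb,
        "athena": athena_tb_scanned * prices.athena_tb_scanned,
        "glue": glue_dpu_hours * prices.glue_dpu_hour,
        "opensearch": opensearch_instances * hours_per_month
                      * prices.opensearch_instance_hour
                      + opensearch_gb * prices.opensearch_gb_month,
    }
    # Sum the individual drivers before adding the total to the dict.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # Hypothetical workload; every number here is an assumption.
    estimate = monthly_cost(
        stored_gb=5000, put_requests=2_000_000, get_requests=10_000_000,
        egress_gb=100, athena_tb_scanned=3, glue_dpu_hours=40,
        opensearch_instances=2, opensearch_gb=200,
    )
    for driver, usd in estimate.items():
        print(f"{driver:>18}: ${usd:,.2f}")
```

Keeping the prices in one dataclass makes it easy to swap in region-specific rates or compare storage classes without touching the arithmetic.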

Cost Control Ideas

Partition data and use columnar formats (Parquet) to reduce Athena scans.
Use S3 lifecycle policies to move cold data to cheaper tiers.
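Run Glue jobs only as often as the data changes, and use incremental crawls so crawlers scan only new data.
Monitor query patterns and reuse cached query results where possible, so repeated queries do not rescan the same data.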
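
As one illustration of the lifecycle idea above, the sketch below builds a lifecycle configuration that moves objects under a prefix to Standard-IA and later to Glacier. The function names, the `raw/` prefix, the bucket name, and the 30/90-day thresholds are all hypothetical. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; boto3 is imported only inside the function that applies the policy, so the builder can be inspected without AWS credentials.

```python
def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90):
    """Build an S3 lifecycle configuration dict for objects under `prefix`.

    Standard-IA has a 30-day minimum storage duration, so this sketch
    requires at least 30 days before the first transition and at least
    30 days between the two transitions.
    """
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("glacier_after_days must be at least 30 days "
                         "after ia_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket, config):
    """Apply the configuration to a bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical prefix; inspect the result before applying it to a real bucket.
config = build_lifecycle_config("raw/")
# apply_lifecycle_config("example-data-lake-bucket", config)  # hypothetical name
```

Note that this call replaces the bucket's existing lifecycle configuration, so any rules already in place would need to be included in the same dictionary.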

What to Include in Your Assessment

Assumptions about data volume and growth.
Expected query frequency and typical scan size.
Basic cost levers you would use to optimize spend.
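
The three items above can be tied together in a small projection. The sketch below is one possible way to do it, not a prescribed method: it grows the stored volume by a fixed monthly rate, holds query frequency and scan size constant, and models a cost lever (for example partitioning or Parquet) as a `scan_fraction` multiplier on the data scanned. All names, the default prices, and the 1 TB = 1024 GB conversion are assumptions made for illustration.

```python
def project_costs(initial_gb, monthly_growth_rate, queries_per_month,
                  scan_gb_per_query, months=12,
                  s3_gb_month=0.023,    # placeholder price, USD per GB-month
                  athena_per_tb=5.0,    # placeholder price, USD per TB scanned
                  scan_fraction=1.0):   # 1.0 = no optimization applied
    """Project monthly S3 storage and Athena cost under simple assumptions."""
    rows = []
    stored_gb = initial_gb
    for month in range(1, months + 1):
        storage_usd = stored_gb * s3_gb_month
        # Assumes 1 TB = 1024 GB and a constant scan size per query.
        scanned_tb = queries_per_month * scan_gb_per_query * scan_fraction / 1024
        athena_usd = scanned_tb * athena_per_tb
        rows.append({
            "month": month,
            "stored_gb": stored_gb,
            "storage_usd": storage_usd,
            "athena_usd": athena_usd,
            "total_usd": storage_usd + athena_usd,
        })
        # Compound growth: next month's volume builds on this month's.
        stored_gb *= 1 + monthly_growth_rate
    return rows


if __name__ == "__main__":
    # Hypothetical assumptions: 2 TB today, 5% monthly growth,
    # 3,000 queries per month scanning 4 GB each.
    baseline = project_costs(2048, 0.05, 3000, 4)
    optimized = project_costs(2048, 0.05, 3000, 4, scan_fraction=0.2)
    for before, after in zip(baseline, optimized):
        print(f"month {before['month']:>2}: "
              f"baseline ${before['total_usd']:,.2f}  "
              f"optimized ${after['total_usd']:,.2f}")
```

Running the baseline and an optimized scenario side by side gives a simple way to state how much a given lever is expected to save under the stated assumptions.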