How computers & cloud storage work — beginner map

How software & cloud storage fit together

A mental map for beginners and interview warm-ups. Analogies first, cloud names second—then you can dig into each vendor’s docs. Pairs with architect interview lens and SQL Reference Guide once you know what “warehouse” means.

Why this matters (jobs, money, and Snowflake)

Interviewers and architects care that you can separate compute (CPU/RAM doing work right now) from storage (bytes kept durably) from network (moving bytes between places). Mixing them up leads to wrong cost guesses and weak designs.

Snowflake maps cleanly: your virtual warehouse is compute; long-lived tables lean on cloud object storage in the account’s region; the control plane coordinates auth, metadata, and query planning.

Interview phrase: “We optimized bytes scanned and warehouse concurrency, not just SQL syntax—because that’s where credits and SLAs live.”
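To see why keeping the three apart helps with cost guesses, here is a minimal sketch. Every rate in it is a made-up placeholder (not a real price list), and the names `Rates` and `estimate_monthly_cost` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; replace with numbers from your own bill."""
    compute_per_hour: float      # cost of one hour of running compute
    storage_per_tb_month: float  # cost of keeping 1 TB for a month
    egress_per_tb: float         # cost of moving 1 TB out of the region


def estimate_monthly_cost(rates, compute_hours, stored_tb, egress_tb):
    """Return a per-category breakdown so each lever is visible on its own."""
    breakdown = {
        "compute": compute_hours * rates.compute_per_hour,
        "storage": stored_tb * rates.storage_per_tb_month,
        "network": egress_tb * rates.egress_per_tb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder rates chosen only to make the arithmetic easy to follow.
rates = Rates(compute_per_hour=4.0, storage_per_tb_month=25.0, egress_per_tb=90.0)

# Same data, same traffic; only the hours of running compute differ.
always_on = estimate_monthly_cost(rates, compute_hours=720, stored_tb=10, egress_tb=1)
auto_suspend = estimate_monthly_cost(rates, compute_hours=120, stored_tb=10, egress_tb=1)
```

With these placeholder numbers the storage and network lines are identical in both scenarios, and the whole difference in the total comes from compute hours. That is the kind of reasoning the compute/storage/network split is meant to make easy.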

1 — What is a computer, physically?

Four friends that work together: CPU, RAM, disk, and network.

Fun analogy: Cooking — CPU is you chopping, RAM is the cutting board (limited space), disk is the fridge (everything stored until you need it), network is ordering groceries delivered.

2 — What is software?

Software is instructions + data. Layers stack from metal upward:

YOU → browser or app (“I want to see my data”)
  ↓
APP → your application code (web app, game, Snowflake SQL worksheet…)
  ↓
OS → Windows / macOS / Linux (manages CPU, RAM, files, network)
  ↓
HARDWARE → CPU · RAM · disk · network card

Libraries and frameworks sit inside the “app” box—they reuse someone else’s solved problems (draw a button, talk HTTPS, parse JSON).

3 — What happens when you click “Run query”?

  1. Your browser sends a request over the network (HTTPS).
  2. A server in a data center receives it (a computer you rent, not the one on your desk).
  3. That server runs software (Snowflake’s services + warehouses), reads/writes storage, and sends a result back.
  4. Your screen paints rows—the “answer” was computed somewhere else.

So “the cloud” mostly means someone else’s computers running your workload, billed by use.
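The four steps above can be sketched as three plain functions. This is a toy model only: the function names, the `demo-token` check, and the sample rows are all invented for illustration and are not real Snowflake APIs. The "network" is just a function call here.

```python
# Toy stand-ins: none of these names are real Snowflake APIs.
STORAGE = {  # durable bytes, kept even when no compute is running
    "orders": [
        {"region": "EU", "amount": 120},
        {"region": "US", "amount": 80},
        {"region": "EU", "amount": 50},
    ]
}


def warehouse_run(table, region):
    """Compute step: read rows from storage and do the work (filter + sum)."""
    rows = STORAGE[table]
    return sum(r["amount"] for r in rows if r["region"] == region)


def server_handle(request):
    """Server step: check the request, hand work to compute, build a response."""
    if request.get("token") != "demo-token":
        return {"status": 401, "body": None}
    total = warehouse_run(request["table"], request["region"])
    return {"status": 200, "body": {"region": request["region"], "total": total}}


def browser_click_run(region):
    """Client step: send a request and receive the answer to paint."""
    request = {"token": "demo-token", "table": "orders", "region": region}
    response = server_handle(request)  # in real life: an HTTPS round trip
    return response["body"]


result = browser_click_run("EU")
```

The browser function never touches `STORAGE` directly. It only sees the response, which is the point of step 4: the answer was computed somewhere else.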

OLTP vs OLAP (two kinds of “database work”)

Same word “database,” very different jobs. Knowing which world you are in stops you from using the wrong tool for the pattern.

| | OLTP (online transaction processing) | OLAP / analytics (warehouse workload) |
|---|---|---|
| Typical question | “Insert this order,” “update this balance,” “show this customer’s last login.” | “Revenue by region for three years,” “funnel conversion last week,” “train features from billions of rows.” |
| Row pattern | Many small reads/writes; low latency per operation. | Large scans, aggregations, joins; throughput matters more than single-row speed. |
| Common homes | PostgreSQL, MySQL, SQL Server, Oracle; often backing a product or store. | Snowflake, BigQuery, Redshift, Databricks SQL; curated analytics and reporting. |
| Storage shape (conceptual) | Row-friendly layouts; indexes for point lookups. | Often columnar or hybrid; great for “sum/average these columns across huge history.” |
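The “storage shape” difference is easier to see with the same data laid out both ways. This is a conceptual sketch in plain Python with invented sample orders. Real engines add compression, pages, and much more.

```python
# The same three orders stored two ways.
rows = [
    {"order_id": 1, "customer": "ana", "amount": 30},
    {"order_id": 2, "customer": "bo", "amount": 45},
    {"order_id": 3, "customer": "ana", "amount": 25},
]

columns = {
    "order_id": [1, 2, 3],
    "customer": ["ana", "bo", "ana"],
    "amount": [30, 45, 25],
}

# OLTP-style point lookup: an index maps a key straight to one whole row.
index_by_id = {row["order_id"]: row for row in rows}
order_2 = index_by_id[2]

# OLAP-style aggregate: only the 'amount' column is touched; the other
# columns are never read.
total_amount = sum(columns["amount"])
```

The row layout hands back everything about one order in a single step. The column layout lets an aggregate skip every column it does not need, which matters when the table has billions of rows and dozens of columns.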

Rule of thumb: do not treat a warehouse like the primary database for thousands of single-row writes per second from a shopping app—that is OLTP territory. Land events fast, then batch or micro-batch into analytics stores.
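“Land events fast, then micro-batch” can be sketched as a small buffer in front of the analytics store. `MicroBatcher` is a hypothetical class written for this example, and the `loads` list merely stands in for bulk loads into a warehouse. Production systems usually also flush on a timer, which this sketch leaves out.

```python
class MicroBatcher:
    """Buffer incoming events and flush them to an analytics sink in batches."""

    def __init__(self, sink, batch_size):
        self.sink = sink              # callable that accepts a list of events
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        """Accept one event quickly; flush only when the buffer is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as one bulk load, then clear the buffer."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


loads = []                       # stands in for bulk loads into a warehouse
batcher = MicroBatcher(sink=loads.append, batch_size=3)

for order_id in range(7):        # seven single-row events arrive one at a time
    batcher.add({"order_id": order_id})
batcher.flush()                  # drain the remainder at the end of the window
```

Seven single-row writes become three bulk loads, so the analytics store sees a few larger operations instead of a stream of tiny ones.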

Data warehouse, data mart, lake, lakehouse (one paragraph each)

Data warehouse: A governed place for analytics-ready data—dimensions, facts, conformed keys—so BI and SQL users get consistent answers. Workloads are mostly read-heavy and set-oriented.

Data mart: A smaller slice of the warehouse for one department (finance, sales). Same ideas, narrower scope, so it is faster to build and easier to permission.

Data lake: Cheap, flexible storage (often object storage + open file formats). Many teams can land raw data; many engines can read it. Flexibility goes up; governance discipline must go up too or it becomes a swamp.

Lakehouse (idea): Combine lake economics with warehouse-style tables—e.g. Apache Iceberg, Delta Lake—so you get ACID-ish table semantics and SQL over files. Products differ; the interview story is “one copy of truth, clearer contracts.”

ETL vs ELT

ETL: Extract → Transform outside the warehouse (tooling on VMs/containers) → Load curated tables.

ELT: Extract → Load raw or light shape → Transform inside the warehouse with SQL (scale compute when needed).

Neither is universally “right”—compliance, skill mix, and cost of large transforms drive the choice.
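A minimal side-by-side sketch, using Python’s built-in `sqlite3` as a stand-in for the warehouse. The table names, sample rows, and the `run_etl` / `run_elt` helpers are invented for this example. The only difference between the two paths is where the cleaning happens.

```python
import sqlite3

# Raw extract: amounts arrive as strings, and one row has no amount.
extracted = [("EU", "120.50"), ("US", "80.00"), ("EU", None), ("US", "19.50")]


def run_etl(conn, raw):
    """ETL: clean in application code first, then load only curated rows."""
    curated = [(region, float(amount)) for region, amount in raw if amount is not None]
    conn.execute("CREATE TABLE sales_etl (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", curated)


def run_elt(conn, raw):
    """ELT: load the raw shape as-is, then transform with SQL in the database."""
    conn.execute("CREATE TABLE sales_raw (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT region, CAST(amount AS REAL) AS amount
        FROM sales_raw
        WHERE amount IS NOT NULL
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, extracted)
run_elt(conn, extracted)

query = "SELECT region, SUM(amount) FROM {} GROUP BY region ORDER BY region"
etl_totals = conn.execute(query.format("sales_etl")).fetchall()
elt_totals = conn.execute(query.format("sales_elt")).fetchall()
raw_row_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
```

Both paths end with the same curated totals. The ELT path also keeps the untouched raw table around, which helps with reprocessing but means the raw data (including the bad row) now lives inside the warehouse and falls under its access rules.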

Batch vs “near real time”

How data flows (landing → curated → consumed)

Most platform answers sound like a pipeline. You do not need every buzzword—just the direction of travel.

SOURCES (apps, SaaS, IoT, partners)
   │
   ▼ ingest (API, CDC, files, events)
OBJECT STORAGE (often Parquet/CSV/JSON “landing”)
   │
   ▼ transform & model (SQL, Spark, dbt…)
WAREHOUSE / LAKE TABLES (curated, governed)
   │
   ▼ serve
BI · notebooks · ML features · reverse ETL
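The direction of travel can be sketched in a few lines of Python. The sample records and the function names `curate` and `serve_revenue_by_user` are invented for illustration. A real pipeline would use an object store and a warehouse instead of in-memory lists.

```python
import json
from collections import defaultdict

# Landing: raw JSON lines exactly as a source emitted them (invented samples).
landing = [
    '{"user": "ana", "event": "purchase", "amount": "30"}',
    '{"user": "bo", "event": "view"}',
    '{"user": "ana", "event": "purchase", "amount": "25"}',
    'not valid json',
]


def curate(lines):
    """Transform & model: parse, skip bad records, keep purchases, fix types."""
    curated = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would quarantine this, not discard it
        if record.get("event") == "purchase":
            curated.append({"user": record["user"], "amount": float(record["amount"])})
    return curated


def serve_revenue_by_user(curated):
    """Serve: the kind of aggregate a BI dashboard might ask for."""
    totals = defaultdict(float)
    for row in curated:
        totals[row["user"]] += row["amount"]
    return dict(totals)


curated_rows = curate(landing)
revenue = serve_revenue_by_user(curated_rows)
```

The landing data stays messy on purpose. The curated step is where the data contract is enforced: valid JSON, known event types, numeric amounts.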

Hyperscalers sell the object store, virtual machines and containers, managed databases, identity, and network paths, so racking and stacking the hardware under this pipeline becomes someone else’s day job. You still own data contracts, access rules, and the bill.

Why “object storage” matters for data platforms

Traditional files live in folders on a disk you manage. Object storage is a giant, API-driven warehouse of objects (files + metadata + a key like a path). It scales huge, is built for the network, and is the bedrock under many databases and data lakes.

Analogy: Instead of a basement full of labeled boxes you walk to yourself, you get a barcode system: “bring me object reports/2025/jan.parquet” and the system retrieves it from a massive automated warehouse.
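The barcode idea can be modeled as a flat mapping from key to bytes plus metadata. `ToyObjectStore` is a hypothetical in-memory class written for this example. It is not any vendor’s SDK, and real object stores add durability, versioning, and access control on top.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: flat keys mapped to bytes plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        """Store bytes under a key; the key only looks like a path."""
        self._objects[key] = {"data": data, "metadata": metadata}

    def get(self, key):
        """Fetch the bytes for one exact key."""
        return self._objects[key]["data"]

    def head(self, key):
        """Fetch only the metadata, without the payload."""
        return self._objects[key]["metadata"]

    def list(self, prefix=""):
        """'Folders' are just a prefix filter over flat keys."""
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyObjectStore()
bucket.put("reports/2025/jan.parquet", b"jan-bytes", content_type="application/octet-stream")
bucket.put("reports/2025/feb.parquet", b"feb-bytes", content_type="application/octet-stream")
bucket.put("logs/app.json", b"{}", content_type="application/json")

reports_2025 = bucket.list("reports/2025/")
jan = bucket.get("reports/2025/jan.parquet")
```

There is no directory tree inside the store. `reports/2025/` behaves like a folder only because listing filters keys by prefix.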

Three big clouds: buckets at a glance

Each has regions (geography), durability (very hard to lose data), and access control (who can read/write). Names differ; idea is the same.

| Concept | AWS | Google Cloud | Microsoft Azure |
|---|---|---|---|
| Object store product | S3 (Simple Storage Service) | Cloud Storage (GCS), “buckets” | Blob Storage (containers & blobs) |
| Unit you create | Bucket (globally unique name) | Bucket | Storage account → container → blob |
| “Cold / cheap” tiers | S3 Glacier tiers, Intelligent-Tiering | Archive / Nearline / Coldline | Cool / Archive access tiers |

Typical use (all three): Data lake files (Parquet, CSV), backups, static websites, ingest landing zone before loading a warehouse.

Snowflake often reads your data from external stages pointing at these systems (with credentials and a URL). The warehouse compute is separate from where bytes sit—same interview story as “compute vs storage.”

Remember: The cloud logo is not magic—it’s disciplined ops: encryption, access keys, network paths, and bills. You’re the architect of who can touch which bucket.

Beyond buckets: what AWS, Azure, and GCP actually provide

Object storage is only one layer. Platforms are sold as building blocks you compose; interviews reward naming the category even if the product name slips.

| Building block | Why it matters | AWS (examples) | Azure (examples) | GCP (examples) |
|---|---|---|---|---|
| Identity & access | Who may read/write which bucket or table? Least privilege reduces breach blast radius. | IAM roles & policies | Microsoft Entra ID, RBAC | IAM, service accounts |
| Network | Private paths from your VPC to managed services; fewer public endpoints. | VPC, PrivateLink | Virtual Network, Private Link | VPC, Private Service Connect |
| Encryption | Data at rest (disk/object) and in flight (TLS). Keys tied to compliance stories. | KMS | Key Vault | Cloud KMS |

Regions & zones (all three): Latency, residency, and DR: data and compute placement are policy questions, not just performance. Pick region first; then enable services inside it. Cross-region replication exists but adds cost and complexity.

Snowflake runs in a cloud region; your external stages and egress patterns still follow the cloud provider’s rules. Verify private connectivity and encryption settings in current docs for each product.