SQL — basics → expert (interview & real time)  
   
     

SQL Reference Guide — Basics to Expert

     

← Hub

         
 
 
   

New to SQL? Use Start here first (tables, rows, reading a query in English). Interview / on-call? Use Basics → Expert below. Snowflake dialect is close to ANSI SQL—always confirm function names in docs for your engine.

   
     

Start here — if you are a student or fresher

     

SQL is a language for asking questions about tables of data. You describe what you want; the database figures out how to fetch it (mostly—you still learn good habits).

     

Words that matter

           

Read a SELECT in English

     

Logical order to reason about (not always execution order): FROM (where is the data?) → WHERE (which rows keep?) → SELECT (which columns show?) → ORDER BY (how to sort?).

     
SELECT name, amount
FROM sales
WHERE region = 'EU'
ORDER BY amount DESC;
     

Plain English: “From the sales table, keep only rows where region is EU, show name and amount, sort biggest amount first.”

     

Common fresher mistakes

           
Next step: When the above feels obvious, jump to Basics for joins and null-safe patterns.
   
   
     

Basics — read and filter data

     

Think: FROM → WHERE → SELECT → ORDER (logical order you reason about, not always how the optimizer runs).

     

Core statements

           
SELECT id, name, amount * 1.1 AS amount_with_tax
FROM sales
WHERE region = 'EU' AND amount > 0
ORDER BY amount DESC
LIMIT 100;
     

Joins (interview favourite)

                                                   
JoinMeaning in one line
INNER JOINOnly rows that match both sides.
LEFT JOINAll left rows; right columns NULL if no match (great for “missing dimension” checks).
FULL OUTER JOINKeep both sides; unmatched columns NULL.
CROSS JOINCartesian product—easy to explode row count by mistake.
     

Nulls and safety

         
   
     

Intermediate — aggregates, shapes, reuse

     

Group and filter groups

     
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM sales
WHERE order_date >= DATE '2026-01-01'
GROUP BY region
HAVING SUM(amount) > 10000;
     

HAVING filters after aggregation (like a WHERE for groups).

     

Common Table Expressions (CTEs)

     

Readable “pipeline in SQL”: name intermediate steps. Interview win: you structure logic instead of nested subqueries only.

     
WITH daily AS (
  SELECT order_date::DATE AS d, SUM(amount) AS rev FROM sales GROUP BY 1
)
SELECT d, rev, AVG(rev) OVER (ORDER BY d ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rev_7d_avg
FROM daily;
     

CASE — conditional values

     
CASE WHEN amount >= 1000 THEN 'large' WHEN amount >= 100 THEN 'medium' ELSE 'small' END AS tier
     

Set operations

     

UNION combines results (dedup); UNION ALL keeps duplicates (usually faster). INTERSECT / EXCEPT where supported.

     

Semi-join patterns

     

EXISTS (SELECT 1 FROM … WHERE …) stops once a match exists—often clearer than IN (subquery) for large sets.

   
   
     

Expert — windows, analytics, SQL that “feels like code”

     

Window functions

     

Compute per row using a window of related rows without collapsing to one row per group. Pattern: function() OVER (PARTITION BY … ORDER BY … frame).

     
SELECT
  user_id,
  event_time,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) AS seq,
  LAG(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_type
FROM events;
           

Snowflake / modern dialect: QUALIFY

     

Filter on window results without a subquery—great for “latest row per key” patterns (verify support on your engine).

     

MERGE (upsert)

     

One statement: match rows, then WHEN MATCHED update/delete, WHEN NOT MATCHED insert—common in slowly changing dimensions and sync jobs.

     

Performance mindset (interview gold)

         
   
     

Real time & pipelines (SQL’s role)

     

Read this first if you are new: A warehouse is built for big questions over history, not for replacing your checkout database. “Fresher data” usually means shorter gaps between batches (minutes) or continuous small loads, then SQL that only touches new or changed rows instead of re-reading everything.

     

“Real time” in warehouses is usually micro-batch or continuous ingest, then SQL transforms on fresh data—not single-row OLTP inside the warehouse.

     

Vocabulary (one line each)

           

Why micro-batch is common

                                                         
PatternSQL idea
Incremental loadFilter on updated_at > :watermark or use change feeds / streams (Snowflake) / CDC tables.
Idempotent loadMERGE on natural key; or delete+insert partition; avoid blind duplicates.
Late-arriving eventsWindows with RANGE / watermarks in streaming engines; in SQL, QUALIFY or self-join on “as of” time.
SessionizationSUM(is_new_session) OVER (ORDER BY time) or gap-based grouping—classic window interview problem.
     
Realtime interview line: “We bounded latency at ingest (seconds), then used incremental SQL downstream so dashboards didn’t full-scan history every run.”
   
   
     

Interview angles — how to sound experienced

           

Pair with SnowPro study for Snowflake-specific objects (VARIANT, streams, tasks) on top of this SQL base.

   
   
     

Top SQL interview questions (click to expand)

     

Short model answers—adapt to the engine (Postgres, SQL Server, Snowflake, etc.).

     
What is SQL?        

Structured Query Language: a declarative language to define, query, and manipulate relational data. You describe sets of rows and columns; the engine plans how to execute.

     
     
What is a JOIN in plain words?        

Combining rows from two tables when they match on a key (e.g. orders and customers on customer_id). Inner join keeps only matches; left join keeps all rows from the left table and fills missing right-side data with NULL.

     
     
What is a table vs a query result?        

A table is stored data with a name. A query result is a temporary set of rows produced by SELECT—you can save it as a view, table, or CTE depending on the product.

     
     
What is the difference between WHERE and HAVING?        

WHERE filters rows before aggregation. HAVING filters groups after GROUP BY (can use aggregates like SUM(amount) > 100).

     
     
Explain INNER JOIN vs LEFT JOIN.        

Inner: only rows that match on both tables. Left: all rows from the left table; columns from the right are NULL when there is no match—useful for “what’s missing?” checks.

     
     
What is the difference between UNION and UNION ALL?        

UNION combines result sets and removes duplicate rows (extra sort/hash work). UNION ALL keeps everything including duplicates—usually faster when you know duplicates aren’t a problem.

     
     
What are primary keys and foreign keys?        

Primary key: uniquely identifies a row in a table (often surrogate id). Foreign key: column(s) referencing another table’s key—enforces relational integrity when the DB enforces constraints.

     
     
What is database normalization (high level)?        

Organizing tables to reduce redundancy: e.g. put repeating groups in their own table, use keys to relate. Tradeoff: more joins vs update anomalies. Denormalize selectively for read-heavy analytics.

     
     
What is an index and when can it hurt?        

An index is a separate structure to find rows faster (like a book index). Helps selective lookups; hurts bulk loads and heavy writes (maintain index). Wrong index may be ignored by the optimizer if the predicate isn’t selective.

     
     
How do NULLs behave in comparisons and joins?        

NULL = NULL is unknown, not true—use IS NULL. In joins, NULL doesn’t match NULL unless you use special predicates—classic source of “missing rows” bugs.

     
     
What is a correlated subquery?        

A subquery that references columns from an outer query. It can run once per outer row (expensive)—often replaceable with a join or window for clarity and performance.

     
     
When would you use a window function instead of GROUP BY?        

When you need per-row output with aggregates over a partition (running total, rank, previous row) without collapsing to one row per group.

     
     
How would you find duplicate keys in a table?        
SELECT user_id, COUNT(*) AS c
FROM events
GROUP BY user_id
HAVING COUNT(*) > 1;
     
     
What is a transaction? Mention ACID briefly.        

A transaction groups operations that commit or roll back together. ACID: Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don’t corrupt each other—levels vary), Durability (after commit, survives crashes).

     
     
What is the difference between DELETE, TRUNCATE, and DROP?        

DELETE removes rows (can filter, logged row-wise). TRUNCATE clears the table quickly (usually minimal row logging; resets storage semantics vary). DROP removes the table object entirely.

     
     
What does “sargable” mean?        

A predicate can use an index/zone map—e.g. WHERE order_date >= '2026-01-01'. Often not sargable: WHERE YEAR(order_date) = 2026 because a function wraps the column.

     
     
How does pagination with LIMIT/OFFSET scale?        

Large OFFSET scans/skips many rows each page—gets slow. Better patterns: keyset pagination (WHERE id > :last_id ORDER BY id LIMIT 50) or cursors—depends on product.

     
     
What would you look at in a query execution plan (high level)?        

Join order and types (nested loop vs hash), estimated vs actual rows (cardinality mis-estimates), spills to disk, which indexes were used, and biggest cost operators to target first.

     
     

See also Python cheat sheet for paired data-engineering interviews.