Databricks

Databricks SQL Complete Guide: Serverless SQL Warehouse

2026-03-21
更新: 2026-03-27
NicheeLab Editorial Team

Databricks SQL is a SQL analytics environment designed for BI analysts and SQL users. It bundles SQL warehouses (dedicated compute), a SQL editor, dashboards, and alerts into a single product, delivering a SQL analytics experience on the Lakehouse that rivals traditional data warehouses such as Snowflake, BigQuery, and Redshift. This article walks through SQL warehouse types, the Photon engine, dashboards, query federation, Query Profile, Unity Catalog integration, and exam-relevant points.

Databricks SQL Overview

Databricks SQL is the integrated environment for running SQL analytics on the Databricks Lakehouse Platform. Separate from the notebook environment where data engineers build ETL, it is made up of the following SQL-focused components.

  • SQL Warehouse: Dedicated compute resources for running SQL queries (Classic / Pro / Serverless)
  • SQL Editor: A web-based editor for writing, running, saving, and sharing queries. Includes autocomplete and a schema browser
  • Dashboards: Visualize query results and build interactive reports to share with stakeholders
  • Alerts: Automatically notify via email, Slack, or webhook when query results meet defined conditions
  • Query History: Inspect past query performance, runtime, and scanned data volume

Comparing the 3 SQL Warehouse Types

Databricks SQL offers three types of SQL warehouses. Pick the right one based on workload characteristics, startup speed, cost, and required features.

CharacteristicClassicProServerless
Startup time5-10 min5-10 minSeconds
Photon engineEnabledEnabledEnabled
Query federationNot supportedSupportedSupported
Predictive OptimizationNot supportedSupportedSupported
Infrastructure managementCustomer-managedCustomer-managedDatabricks-managed
AutoscalingManual configurationManual configurationFully automatic
DBU rateLowMediumHigh
Recommended use caseBasic SQL queriesAdvanced analytics, federationAd-hoc queries, dashboards

As of 2026, Databricks recommends Serverless SQL Warehouse. It starts in seconds, drastically reducing wait time for ad-hoc queries, and bills nothing while idle, making it cost-efficient for intermittent usage patterns.

The Photon Engine

Photon is Databricks' next-generation vectorized query execution engine, written in C++. It is enabled by default on every SQL warehouse and speeds up queries transparently — no SQL or code changes required.

  • Vectorized execution: Processes data in batches (columnar format) instead of row by row, maximizing the use of CPU SIMD instructions
  • Native C++ implementation: Avoids JVM GC overhead and optimizes memory layout
  • Delta Lake optimization: Specifically optimized for decoding, filtering, and aggregating Parquet files
  • Full Spark compatibility: Existing Spark SQL queries run as-is with identical results

Benchmarks report 2-8x speedups over the legacy Spark engine, with the largest gains on large-volume scans, filters, aggregations, and joins.

Building and Sharing Dashboards

Databricks SQL dashboards place SQL query results into visualizations (charts and tables) to compose interactive reports. The 2026 refresh added AI-assisted automatic dashboard generation.

  • Visualizations: Bar, line, pie, scatter, heatmap, counter, table, and many other chart types
  • Filter widgets: Dropdowns, date ranges, and text-input filters let viewers narrow down the data dynamically
  • Query parameters: Use {{param_name}} placeholders to bind dynamic values into SQL queries. Provides safe parameterization that prevents SQL injection
  • Scheduled refresh: Re-run queries automatically at intervals from 1 minute to 24 hours to keep dashboards current
  • Sharing: Share via URL, embed with an iframe, or export to PDF. View permissions integrate with Unity Catalog to also control data access

Configuring Alerts

SQL alerts run a specified SQL query on a schedule and automatically send a notification when results meet a threshold condition. They are used for data quality monitoring, KPI threshold alerts, and anomaly detection.

-- アラート用クエリの例: 過去1時間のAPIエラー率が5%を超えたら通知
SELECT
  COUNT(CASE WHEN status_code >= 500 THEN 1 END) * 100.0
    / NULLIF(COUNT(*), 0) AS error_rate_pct
FROM prod_catalog.logs.api_requests
WHERE request_timestamp > current_timestamp() - INTERVAL 1 HOUR

-- アラート条件: error_rate_pct > 5
-- 通知先: Slackチャンネル #data-alerts
-- 評価間隔: 15分ごと

Alert evaluation results are tracked in three states (TRIGGERED / OK / UNKNOWN), and you can configure a notification to fire when the alert recovers from TRIGGERED back to OK as well.

Query Federation (Lakehouse Federation)

Query federation (Lakehouse Federation) lets you query data in external databases (PostgreSQL, MySQL, SQL Server, Snowflake, BigQuery, and others) directly from Databricks SQL. You can query across heterogeneous data sources without moving data or building ETL pipelines.

-- 1. 外部データベースへの接続を作成
CREATE CONNECTION pg_production
TYPE POSTGRESQL
OPTIONS (
  host 'prod-db.example.com',
  port '5432',
  user secret('db_scope', 'pg_user'),
  password secret('db_scope', 'pg_password')
);

-- 2. 外部カタログの登録
CREATE FOREIGN CATALOG pg_catalog
USING CONNECTION pg_production
OPTIONS (database 'production');

-- 3. 外部テーブルとDelta Lakeテーブルの横断クエリ
SELECT c.customer_name, o.order_total, o.order_date
FROM prod_catalog.sales.customers c
JOIN pg_catalog.public.orders o
  ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01'
ORDER BY o.order_total DESC;

Federation is available on Pro SQL Warehouse and Serverless SQL Warehouse. External table connection details are managed centrally in Unity Catalog, and access permissions are controlled with GRANT statements.

Query Profile

Query Profile visualizes the execution plan of a SQL query along with each operator's processing time, scanned row count, and shuffle volume. Use it to find query bottlenecks and to guide performance tuning.

  • Operator tree: Displays the execution plan (Scan → Filter → Aggregate → Sort, etc.) as a tree
  • Time breakdown: Color-codes each operator's processing time so you can spot the most expensive operation
  • Scan statistics: Shows the files, rows, and bytes scanned so you can verify that predicate pushdown is working
  • Spill information: Identifies where memory pressure caused a spill to disk

The DAA exam covers reading Query Profile and asks you to identify which operator is the bottleneck and whether predicate pushdown is being applied.

Unity Catalog Integration

Databricks SQL is fully integrated with Unity Catalog. Every piece of data accessed from a SQL warehouse is managed under Unity Catalog's three-level namespace (catalog.schema.table).

  • Access control: Manage permissions on tables, views, and functions with GRANT / REVOKE statements. Dashboard viewers need SELECT permission on the underlying tables referenced by the SQL queries
  • Data lineage: Data flow between tables is tracked automatically from SQL query execution. Dashboard data-source dependencies are also visualized
  • Column masking and row filters: Unity Catalog's column masking functions and row filter policies let you display different data to different users from the same table
  • Audit logs: Audit logs of queries run on SQL warehouses are recorded in System Tables, helping you meet compliance requirements

Exam Coverage (DEA / DAA)

Databricks SQL appears on multiple certification exams. The depth of coverage differs by exam, so focus on the points that matter for the exam you are taking.

Data Engineer Associate (DEA) coverage

  • Differences between the 3 SQL warehouse types (Classic / Pro / Serverless) and how to choose
  • Basics of the Photon engine (C++ implementation, vectorization, automatic application)
  • Basic use cases and setup steps for dashboards and alerts
  • Unity Catalog integration (referencing tables in the three-level namespace)

Data Analyst Associate (DAA) coverage

  • Writing queries in the SQL editor and parameterized query syntax
  • Dashboard creation workflow (adding visualizations, configuring filters, sharing)
  • Reading Query Profile (identifying bottlenecks, interpreting scan statistics)
  • Configuring alert conditions and notification destinations
  • Query federation concepts (CONNECTION / FOREIGN CATALOG)

Sample Question

DEA - SQL Warehouse

問題 1

A data analyst team frequently runs ad-hoc queries and wants to minimize the wait time before queries execute while also cutting costs during idle periods. Which SQL warehouse type is the best fit?

  1. Run a Classic SQL Warehouse (2X-Large) 24/7 with Auto-stop disabled
  2. Use a Pro SQL Warehouse at the smallest size with Auto-stop set to 5 minutes
  3. Use a Serverless SQL Warehouse and let it auto-stop when idle
  4. Run SQL queries on an All-Purpose Cluster

正解: C

Serverless SQL Warehouse starts in seconds, minimizing wait time for ad-hoc queries. When idle, the warehouse returns to the shared resource pool that Databricks manages automatically, so there are no charges during downtime. Classic/Pro take 5-10 minutes to start, creating a dilemma where enabling Auto-stop makes restart times painful. Option A is expensive because the warehouse runs 24/7, and Option D — an All-Purpose Cluster — is not as optimized for SQL analytics as a SQL warehouse. The DEA exam includes questions about choosing the right SQL warehouse type.

Databricks SQL Best Practices

  • Sizing the warehouse: Use 2X-Small to Small for small ad-hoc queries, and Medium or larger for production dashboards with many concurrent users
  • Leverage Auto-stop: Save money by stopping warehouses after an idle period. Serverless manages this automatically — no Auto-stop configuration needed
  • Use the query cache: Repeated identical queries return cached results, which is highly effective for dashboards viewed by multiple users concurrently
  • Use parameterized queries: Bind dashboard filters with {{param}} query parameters to safely inject values and prevent SQL injection
  • Combine Delta Lake with Liquid Clustering: Apply Liquid Clustering to columns that are frequently filtered to dramatically reduce SQL warehouse scan volume — speeding up queries and lowering cost at the same time

Frequently Asked Questions

How is Databricks SQL different from Spark SQL?

Spark SQL is Apache Spark's distributed SQL execution engine, used to run SQL from notebooks and jobs. Databricks SQL is a complete SQL-focused analytics environment that bundles SQL warehouses (dedicated compute), the SQL editor, dashboards, and alerts. It is built for BI analysts and SQL users, with the Photon engine delivering fast query execution. Think of Spark SQL as the data engineer's tool and Databricks SQL as the data analyst's product.

How much does Serverless SQL Warehouse cost compared with Pro?

Serverless SQL Warehouse charges roughly 1.4-1.7x the DBU rate of Pro, but it starts in seconds (Pro takes minutes) and incurs no charges while idle. For ad-hoc queries or intermittent workloads, Serverless often has the lower total cost; for always-on, high-throughput dashboards Pro can be more cost-effective. Tune the warehouse Auto-stop setting (default 15 minutes) at the same time.

Which exam domains cover Databricks SQL?

The Data Engineer Associate (DEA) exam tests the comparison of SQL warehouse types (Classic/Pro/Serverless), the use cases for dashboards and alerts, and Unity Catalog integration. The Data Analyst Associate (DAA) exam goes deeper into practical SQL syntax, dashboard creation workflows, parameterized queries, and reading Query Profile. The basics of the Photon engine can appear on both DEA and DAA.

Try Databricks SQL questions

Check your grasp of SQL warehouses, dashboards, and federation

Try free questions

Related Articles

Spark SQL Complete Guide

Deep dive into Spark's SQL execution layer

Serverless Compute Explained

Details on the serverless execution environment

Unity Catalog Complete Guide

Data governance and access control

How to Study for Databricks Certifications

Fastest path to passing and study-time estimates

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Databricks

Databricks Certifications: All 7 Exams, Difficulty & Study Plan (2026)

Complete guide to all 7 Databricks certifications — Data Eng...

Databricks

Databricks Exam Difficulty Ranking: All 7 Certs Compared (2026)

Every Databricks certification ranked by difficulty, with st...

Databricks

Databricks Study Guide: Fastest Pass Route & Time Estimates (2026)

How to pass Databricks certifications efficiently. Official ...

Databricks

Databricks Data Engineer Associate: Complete Guide (2026)

Domain-by-domain breakdown of the Databricks Certified Data ...

Databricks

Databricks Data Engineer Professional: Complete Guide (2026)

Tactics for the Databricks Certified Data Engineer Professio...

Browse all Databricks articles (110)
© 2026 NicheeLab All rights reserved.