
What is Lakeflow Jobs? Complete Databricks Workflow Orchestration Guide

2026-03-26
Updated: 2026-03-27
NicheeLab Editorial Team

Lakeflow Jobs is the orchestration layer on Databricks for scheduling and monitoring data pipelines and ML workflows. The product was previously called "Databricks Workflows" and "Databricks Jobs" and was renamed to "Lakeflow Jobs" in late 2024. You compose multiple tasks into a DAG (directed acyclic graph) and manage dependencies, retries, and parameter passing in one place.

This article walks through multi-task job structure, scheduling, error handling, and monitoring from both a practical and exam-prep perspective.

Multi-Task Job Structure (DAG)

The base unit of Lakeflow Jobs is the "job", and one job consists of one or more "tasks". Defining dependencies between tasks forms a DAG, and a task starts automatically once its upstream dependencies complete.

[Ingest] --> [Transform_Bronze] --> [Transform_Silver] --> [Load_Gold]
                    |                                          |
                    +--> [Data_Quality_Check] ---------------->+
                                                               |
                                                          [Notify]

In the DAG above, Transform_Silver and Data_Quality_Check start in parallel once Transform_Bronze completes, while Ingest, Transform_Bronze, Transform_Silver, and Load_Gold run sequentially. The Notify task only runs after both Load_Gold and Data_Quality_Check finish. By combining serial, parallel, and join patterns, you can express complex pipelines.

You define dependencies either by drag-and-drop in the UI or by listing task keys in the depends_on array of a JSON/YAML definition. Cyclic dependencies fail validation.
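As a minimal sketch, the diagram above can be written as a list of task dictionaries that use task_key and depends_on, following the edges as drawn (Data_Quality_Check branching off Transform_Bronze). The execution_order helper is hypothetical and uses Python's standard graphlib only to illustrate why a cyclic definition cannot be ordered; it is not how Databricks itself validates a job.

```python
from graphlib import CycleError, TopologicalSorter

# Task list mirroring the diagram; only dependency-related keys are shown.
tasks = [
    {"task_key": "Ingest"},
    {"task_key": "Transform_Bronze", "depends_on": [{"task_key": "Ingest"}]},
    {"task_key": "Transform_Silver", "depends_on": [{"task_key": "Transform_Bronze"}]},
    {"task_key": "Data_Quality_Check", "depends_on": [{"task_key": "Transform_Bronze"}]},
    {"task_key": "Load_Gold", "depends_on": [{"task_key": "Transform_Silver"}]},
    {
        "task_key": "Notify",
        "depends_on": [{"task_key": "Load_Gold"}, {"task_key": "Data_Quality_Check"}],
    },
]


def execution_order(task_list):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in task_list
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"cyclic dependency: {exc.args[1]}") from exc


order = execution_order(tasks)
print(order)
```

Any order the helper returns starts with Ingest and ends with Notify; the relative position of Transform_Silver and Data_Quality_Check is free because they can run in parallel.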

Task Types at a Glance

You can mix different task types within a single job. For example, one job can run ETL in a Notebook, aggregate the result in a SQL task, and then run a streaming DLT pipeline, all as one end-to-end workflow.

| Task Type | What It Runs | Typical Use Case |
|---|---|---|
| Notebook | Runs a Notebook in the workspace | ETL, data transformation, ML training |
| Python script | Runs a .py file on DBFS/Volumes | Packaged Python code |
| Python wheel | Installs and runs a .whl package | Distributable code produced by CI/CD |
| SQL | Runs a SQL query or SQL file | Aggregation and reporting table refresh |
| DLT pipeline | Launches a Lakeflow Declarative Pipeline | Declarative ETL, streaming ingestion |
| dbt | Runs tasks from a dbt Core project | Building and testing dbt models |
| JAR | Runs a Spark application JAR | Batch processing in Scala/Java |
| If/Else condition | Branches the DAG based on a condition | Dynamic workflow branching |
| For Each | Loops a task over an input list | Repeating the same processing across multiple tables |

Scheduling

Lakeflow Jobs supports three trigger modes.

CRON Schedule

A standard time-based schedule, defined by a CRON expression plus an explicit timezone. For example, 0 0 2 * * ? runs daily at 2 AM and 0 0 */6 * * ? runs every 6 hours. The expression uses the Quartz CRON format (6 fields, starting with a seconds field), so the field count differs from the standard 5-field Linux CRON.
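The field-count difference is easy to trip over, so here is a small sketch of a guard you might run before submitting a schedule. The build_schedule helper is hypothetical; the keys quartz_cron_expression, timezone_id, and pause_status are the schedule fields of a job definition, and the timezone value is only an example.

```python
def build_schedule(quartz_cron_expression, timezone_id="UTC"):
    """Build a schedule block after a basic Quartz field-count check."""
    fields = quartz_cron_expression.split()
    # Quartz expects 6 fields with seconds first; Quartz itself also
    # allows an optional 7th "year" field, so both counts are accepted here.
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 Quartz fields, got {len(fields)}: "
            f"{quartz_cron_expression!r}"
        )
    return {
        "quartz_cron_expression": quartz_cron_expression,
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


daily_2am = build_schedule("0 0 2 * * ?", timezone_id="Asia/Tokyo")
print(daily_2am)
```

Passing a 5-field Linux expression such as 0 2 * * * raises ValueError, which is the mistake this check is meant to catch. It does not validate the contents of each field.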

File Arrival Trigger

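Watches a cloud storage path governed by Unity Catalog (on S3, ADLS, or GCS) and launches the job when new files arrive. From the job author's point of view it behaves like an event-driven trigger, although Databricks detects new files by checking the location periodically on a best-effort basis, so expect a short delay rather than an instant start. It is ideal for batch ingestion pipelines that follow a "process when a file lands" pattern.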
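A file arrival trigger is configured in the job definition rather than in code. The sketch below builds such a trigger block as a plain dictionary; the helper name and the volume path are hypothetical, and the two timing fields and their 60-second values are shown as assumptions about typical settings, so check them against the current Jobs API reference before use.

```python
def file_arrival_trigger(url, min_time_between_triggers_seconds=60,
                         wait_after_last_change_seconds=60):
    """Build a trigger block that starts the job when files land under url."""
    if not url:
        raise ValueError("a storage URL or volume path is required")
    return {
        "trigger": {
            "pause_status": "UNPAUSED",
            "file_arrival": {
                "url": url,
                # Throttle: do not start runs more often than this.
                "min_time_between_triggers_seconds": min_time_between_triggers_seconds,
                # Debounce: wait until no new files have arrived for this long.
                "wait_after_last_change_seconds": wait_after_last_change_seconds,
            },
        }
    }


# Hypothetical Unity Catalog volume used as a landing zone.
settings = file_arrival_trigger("/Volumes/main/raw/landing/")
print(settings)
```

The debounce setting is useful when an upstream system writes many files in a burst and you want one run per batch rather than one run per file.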

Continuous Run

A mode that re-runs the job as soon as the previous run completes. Use it for streaming-style Notebooks that should stay running. You can set the minimum interval between restarts in seconds and combine it with automatic retry on failure to build an always-on pipeline.

Retries and Error Handling

Lakeflow Jobs lets you control failure handling at two levels: retry settings on each task, and a concurrency limit on the job as a whole.

| Level | Parameter | Behavior |
|---|---|---|
| Task-level | max_retries (0-10) | Re-runs that single task when it fails |
| Task-level | min_retry_interval_millis | Minimum interval between retries (milliseconds) |
| Task-level | retry_on_timeout | Whether timeouts are also retried |
| Job-level | max_concurrent_runs | Maximum concurrent runs of the same job (default 1) |

When task-level retries are configured, a failed task is first retried in isolation. If the task still fails after max_retries, it is marked as a final "failure", and downstream tasks that depend on it are skipped. The overall job result becomes "failure", but tasks on branches that do not depend on the failed task continue to run normally.
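To make the retry semantics concrete, the following sketch simulates them in plain Python: a task gets one initial attempt plus up to max_retries further attempts, with a pause of min_retry_interval_millis between them. The run_with_retries function and the flaky task are hypothetical illustrations, not part of any Databricks API.

```python
import time


def run_with_retries(task_fn, max_retries=0, min_retry_interval_millis=0,
                     sleep=time.sleep):
    """Call task_fn up to 1 + max_retries times; return (succeeded, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            task_fn()
            return True, attempts
        except Exception:
            if attempts > max_retries:
                # Retries exhausted: the task is finally marked as failed.
                return False, attempts
            sleep(min_retry_interval_millis / 1000)


# A hypothetical flaky task that fails twice, then succeeds.
calls = {"n": 0}


def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")


# sleep is stubbed out so the example runs instantly.
result = run_with_retries(flaky_task, max_retries=2, sleep=lambda seconds: None)
print(result)
```

With max_retries=2 the flaky task succeeds on its third attempt. With max_retries=1 the same task would end as a final failure after two attempts, and its downstream tasks would be skipped.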

The "Repair Run" feature lets you re-run only the failed task and its downstream tasks. Already-successful tasks are not re-executed, which is very useful for partial recovery in large pipelines.
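Which tasks a repair touches follows directly from the DAG: the failed tasks plus everything downstream of them. The sketch below computes that set for the example pipeline from earlier in this article. The tasks_to_repair helper is a hypothetical illustration of the selection logic, not the Repair Run implementation.

```python
from collections import deque

# Each key maps to the tasks it depends on (the example DAG from this article).
depends_on = {
    "Ingest": [],
    "Transform_Bronze": ["Ingest"],
    "Transform_Silver": ["Transform_Bronze"],
    "Data_Quality_Check": ["Transform_Bronze"],
    "Load_Gold": ["Transform_Silver"],
    "Notify": ["Load_Gold", "Data_Quality_Check"],
}


def tasks_to_repair(depends_on, failed):
    """Return the failed tasks plus every task downstream of them."""
    # Invert the dependency map: parent -> list of direct children.
    children = {}
    for task, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(task)

    to_run = set(failed)
    queue = deque(failed)
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in to_run:
                to_run.add(child)
                queue.append(child)
    return to_run


repair_set = tasks_to_repair(depends_on, ["Transform_Silver"])
print(sorted(repair_set))
```

If Transform_Silver fails, the repair set is Transform_Silver, Load_Gold, and Notify. Ingest, Transform_Bronze, and Data_Quality_Check are left alone because they neither failed nor depend on the failed task.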

Choosing Between Job Cluster and All-Purpose Cluster

Lakeflow Jobs tasks run on either a Job Cluster (a job-dedicated cluster) or an All-Purpose Cluster (an interactive cluster).

| Aspect | Job Cluster | All-Purpose Cluster |
|---|---|---|
| Lifecycle | Auto-created at job start, auto-deleted on completion | Manually started and stopped (auto-termination is configurable) |
| DBU rate | About 30-60% cheaper than All-Purpose | Standard rate |
| Startup latency | Cluster spins up on each run (a few minutes) | Runs immediately if the cluster is already up |
| When to use | Production batch and scheduled runs | Development test runs and interactive debugging |

For production pipelines, you should generally use Job Clusters. The DBU rate is lower and the cluster is auto-deleted after the run, so there is no risk of leaving resources idle. During development, it is common to test manually on an already-running All-Purpose Cluster and then switch to a Job Cluster configuration when deploying to production.
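In a job definition, the choice shows up in how each task points at its compute. The sketch below assumes the usual shape: a job_clusters entry with a new_cluster spec that tasks reference through job_cluster_key, versus a task that names an already-running cluster through existing_cluster_id. The job name, notebook paths, and angle-bracket placeholders are hypothetical, and compute_kind is an illustrative helper.

```python
job_settings = {
    "name": "nightly_etl",  # hypothetical job name
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "<runtime-version>",  # placeholder
                "node_type_id": "<node-type>",         # placeholder
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            # Production-style task: runs on the job cluster defined above.
            "task_key": "Ingest",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
        },
        {
            # Development-style task: reuses an interactive cluster.
            "task_key": "Debug_Adhoc",
            "existing_cluster_id": "<all-purpose-cluster-id>",  # placeholder
            "notebook_task": {"notebook_path": "/Workspace/etl/debug"},
        },
    ],
}


def compute_kind(task):
    """Classify a task by the compute it references."""
    if "job_cluster_key" in task or "new_cluster" in task:
        return "job cluster"
    if "existing_cluster_id" in task:
        return "all-purpose cluster"
    return "unspecified"


for task in job_settings["tasks"]:
    print(task["task_key"], "->", compute_kind(task))
```

A check like compute_kind can be run in CI against job definitions to flag production jobs that still point at an all-purpose cluster.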

Parameters and Task Value Hand-off

Lakeflow Jobs offers two mechanisms: "Job Parameters" passed into the job from outside, and "Task Values" passed between tasks at runtime.

Job Parameters

You define parameter keys and default values at job definition time and can override them at schedule execution or API invocation. Built-in dynamic references such as {{job.start_time}} and {{job.run_id}} are available. Inside a Notebook, fetch them with dbutils.widgets.get("param_name").
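The default-plus-override behavior can be sketched in a few lines. The example assumes job parameters are declared as a list of name/default pairs and that a caller supplies overrides as a key-value mapping at run time. The resolve_job_parameters helper and the parameter names are hypothetical, and the sketch does not expand dynamic references such as {{job.run_id}}, which Databricks resolves when the run starts.

```python
# Hypothetical parameter declarations with their defaults.
parameter_definitions = [
    {"name": "env", "default": "dev"},
    {"name": "target_table", "default": "main.silver.orders"},
    {"name": "run_id", "default": "{{job.run_id}}"},  # left unresolved here
]


def resolve_job_parameters(definitions, overrides=None):
    """Merge run-time overrides on top of the declared defaults."""
    resolved = {p["name"]: p["default"] for p in definitions}
    resolved.update(overrides or {})
    return resolved


# A scheduled run uses the defaults; an API-triggered run overrides one key.
scheduled = resolve_job_parameters(parameter_definitions)
manual = resolve_job_parameters(parameter_definitions, {"env": "prod"})
print(scheduled["env"], manual["env"])
```

Inside a Notebook task, each resolved value would then be read with dbutils.widgets.get("env") and so on.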

Task Values (dbutils.jobs.taskValues)

A mechanism for passing the result of one task to downstream tasks.

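# Task A (task key: task_a): publish small results for downstream tasks
dbutils.jobs.taskValues.set(key="row_count", value=df.count())
dbutils.jobs.taskValues.set(key="target_date", value="2026-03-27")

# Task B (depends on task_a): read the values back.
# debugValue is returned when the Notebook runs interactively, outside a job.
row_count = dbutils.jobs.taskValues.get(
    taskKey="task_a", key="row_count", default=0, debugValue=0
)
target_date = dbutils.jobs.taskValues.get(
    taskKey="task_a", key="target_date", debugValue="2026-03-27"
)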

Values must be JSON-serializable (strings, numbers, booleans, and lists or dictionaries built from them) and should be small. Large DataFrames or binary data cannot be passed this way; in those cases, write to a Delta Table or temporary file and share only the table name or path via a taskValue.
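A quick way to see what qualifies is to test a value against Python's json module before setting it. The as_task_value helper below is a hypothetical local check, not a Databricks function, and the table name is only an example of the "share the location, not the data" pattern.

```python
import json


def as_task_value(value):
    """Return value after a JSON round trip, or raise TypeError if it cannot be encoded."""
    try:
        encoded = json.dumps(value)
    except (TypeError, ValueError) as exc:
        raise TypeError(f"not JSON-serializable: {type(value).__name__}") from exc
    return json.loads(encoded)


# Small scalars and simple containers are fine.
print(as_task_value(12345))
print(as_task_value({"target_date": "2026-03-27", "partitions": [1, 2, 3]}))

# For large results, pass a reference to where the data was written
# (hypothetical table name) instead of the data itself.
handoff = as_task_value({"staging_table": "main.silver.orders_staging"})
print(handoff)

# Binary payloads are rejected.
try:
    as_task_value(b"\x00\x01")
except TypeError as err:
    print("rejected:", err)
```

In a job, the upstream task would write the DataFrame to the staging table and set only the handoff dictionary as a task value; the downstream task reads the name and loads the table itself.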

Notification Settings

You can configure notifications for job events such as start, success, failure, and skip.

  • Email: send to any email address; multiple recipients supported
  • Webhook: POST to any HTTPS endpoint; integrates with PagerDuty or custom systems
  • Slack integration: register a Slack Webhook as a Databricks destination to post into channels
  • Microsoft Teams: notify via Incoming Webhook

Notifications can be configured at both the job and task levels. For example, you can route "overall job failure to the on-call team" while sending "specific task failures to the data quality team", splitting granularity so the right people get the right signal without alert fatigue.
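The job-versus-task split described above might look like the following in a job definition. This is a sketch under assumptions: email_notifications and webhook_notifications with an on_failure list are shown at both levels, the addresses and the destination ID are placeholders, and failure_recipients is a hypothetical helper for inspecting the routing.

```python
ON_CALL = ["oncall@example.com"]            # placeholder address
DATA_QUALITY = ["data-quality@example.com"]  # placeholder address

job_settings = {
    "name": "nightly_etl",  # hypothetical job name
    # Job level: any overall failure goes to the on-call team.
    "email_notifications": {"on_failure": ON_CALL},
    "webhook_notifications": {
        "on_failure": [{"id": "<notification-destination-id>"}]  # placeholder
    },
    "tasks": [
        {
            "task_key": "Data_Quality_Check",
            # Task level: this task's failures go to the data quality team.
            "email_notifications": {"on_failure": DATA_QUALITY},
        },
        {"task_key": "Load_Gold"},
    ],
}


def failure_recipients(settings, task_key=None):
    """Return the on_failure email list for the job, or for one task."""
    if task_key is None:
        return settings.get("email_notifications", {}).get("on_failure", [])
    for task in settings["tasks"]:
        if task["task_key"] == task_key:
            return task.get("email_notifications", {}).get("on_failure", [])
    raise KeyError(task_key)


print(failure_recipients(job_settings))
print(failure_recipients(job_settings, "Data_Quality_Check"))
```

Tasks without their own notification block, such as Load_Gold here, send nothing at the task level and are covered only by the job-level routing.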

Monitoring (Run History and System Tables)

The Lakeflow Jobs UI lets you inspect each job's run history, with success/failure status, execution time, and per-task status displayed together. The default retention is 60 days of run history.

For deeper analytics and longer retention, use the Unity Catalog System Tables. The system.lakeflow schema provides the following tables.

| Table | Contents |
|---|---|
| system.lakeflow.jobs | Job definition metadata (name, owner, cluster config, etc.) |
| system.lakeflow.job_tasks | Per-task definition (task type, dependencies, etc.) |
| system.lakeflow.job_run_timeline | Per-run timeline (start/end times, result status) |
| system.lakeflow.job_task_run_timeline | Per-task run timeline (duration and status of each task) |

Aggregating these tables in SQL surfaces insights like "the jobs with the highest failure rate over the past 30 days", "tasks whose average runtime is trending upward", and "cluster utilization spikes during specific time windows". Combined with Databricks SQL dashboards, you can build operations-team-facing job health monitoring.
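The "highest failure rate" report boils down to a group-by over run records. The sketch below shows the aggregation in plain Python over a handful of made-up rows so the logic is easy to follow. The column names job_id and result_state and the state values are assumptions about what system.lakeflow.job_run_timeline exposes, and in practice you would express the same thing in SQL against that table, filtered to the last 30 days.

```python
from collections import Counter

# Made-up rows standing in for run records; not real data.
sample_runs = [
    {"job_id": "101", "result_state": "SUCCEEDED"},
    {"job_id": "101", "result_state": "FAILED"},
    {"job_id": "101", "result_state": "SUCCEEDED"},
    {"job_id": "101", "result_state": "SUCCEEDED"},
    {"job_id": "202", "result_state": "FAILED"},
    {"job_id": "202", "result_state": "FAILED"},
]


def failure_rates(runs):
    """Return (job_id, failure_rate) pairs, highest failure rate first."""
    total, failed = Counter(), Counter()
    for run in runs:
        total[run["job_id"]] += 1
        if run["result_state"] == "FAILED":
            failed[run["job_id"]] += 1
    rates = {job_id: failed[job_id] / total[job_id] for job_id in total}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranking = failure_rates(sample_runs)
print(ranking)
```

Whether states such as timeouts or cancellations should count as failures is a policy choice; this sketch counts only FAILED.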

What the Exam Tests

On the Data Engineer Associate exam, Lakeflow Jobs appears in the "Production Pipelines" domain (about 16%). The following patterns show up frequently.

  • Which to use in production: Job Cluster or All-Purpose Cluster → Job Cluster (cost and auto-termination)
  • How to re-run only the failed task after a job failure → Repair Run
  • How to pass a previous task's computed result to the next task → dbutils.jobs.taskValues.set() / get()
  • How to launch a job when a file arrives → File Arrival Trigger
  • Field count of the CRON expression → 6 fields in Quartz format (including seconds)
  • If task dependencies contain a cycle → Validation error (no longer a DAG, so cannot be created)

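The exam frequently includes distractor choices that conflate "DLT pipeline", "Lakeflow Jobs", and "standalone Notebook run". Make sure you can state what each one is for and when it applies: a DLT pipeline defines declarative ETL, Lakeflow Jobs orchestrates and schedules tasks (which can include a DLT pipeline), and a standalone Notebook run is an ad hoc execution with no orchestration.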

Check Your Understanding

Data Engineer Associate – Production Pipelines

Question 1

A data engineer operates a production multi-task job. In last night's run, only the third of five tasks (Transform_Silver) failed due to a transient network issue. Tasks 1 and 2 succeeded, and tasks 4 and 5 were skipped. To efficiently re-run only the failed task and its downstream tasks without re-running the already-successful tasks, which feature should be used?

  1. Re-run the entire job from the beginning
  2. Use Repair Run to re-run only the failed task and its downstream tasks
  3. Manually run the failed task's Notebook on an All-Purpose Cluster
  4. Delete the job, recreate it with the same definition, and run it

Correct answer: 2

Repair Run re-runs only the failed task and its downstream tasks. Already-successful tasks (Ingest, Transform_Bronze) are not re-executed, saving compute cost and runtime. A full job re-run is inefficient because it re-executes all tasks, including ones that already succeeded. A manual Notebook run loses DAG dependency management and does not trigger downstream tasks automatically.

Frequently Asked Questions

What is the difference between Lakeflow Jobs and the old Databricks Workflows name?

There is no functional difference. In late 2024, Databricks consolidated its branding by renaming Workflows/Jobs to 'Lakeflow Jobs' and Delta Live Tables to 'Lakeflow Declarative Pipelines'. The UI labels and documentation URLs are being updated gradually, but the REST API endpoints (/api/2.1/jobs/) and CLI command structure still use the old names. Exams may use either name, so you should recognize that both old and new labels refer to the same feature.

When should I use dbutils.jobs.taskValues vs. Job Parameters?

Job Parameters are static values passed in externally at job start time and are visible to every task. By contrast, dbutils.jobs.taskValues.set() / get() are for runtime data hand-off between tasks. For example, when an upstream task computes a row count or partition key that a downstream task needs, use taskValues. Job Parameters are best for fixed values or dynamic references (such as {{job.start_time}}) configured at schedule definition or API invocation time.

How heavily is Lakeflow Jobs tested on the Data Engineer Associate exam?

According to the Data Engineer Associate Exam Guide, the 'Production Pipelines' domain accounts for about 16% of the exam, and Lakeflow Jobs is the core topic of that domain. Expect questions on multi-task job structure, scheduling, retry settings, and the rationale for choosing between Job Clusters and All-Purpose Clusters. Rather than rote memorization of feature names, the exam focuses on judgment-style questions like 'Which setting is most appropriate in this scenario?'

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

