Lakeflow Jobs: Successor to Workflows (2026)

Lakeflow Jobs is the orchestration layer on Databricks for scheduling and monitoring data pipelines and ML workflows. The product previously known as "Databricks Workflows" or "Databricks Jobs" was renamed to "Lakeflow Jobs" in late 2024. You compose multiple tasks into a DAG (directed acyclic graph) and manage dependencies, retries, and parameter passing in one place.

This article walks through multi-task job structure, scheduling, error handling, and monitoring from both a practical and exam-prep perspective.

Multi-Task Job Structure (DAG)

The base unit of Lakeflow Jobs is the "job", and one job consists of one or more "tasks". Defining dependencies between tasks forms a DAG, and a task starts automatically once its upstream dependencies complete.

[Ingest] --> [Transform_Bronze] --> [Transform_Silver] --> [Load_Gold]
                    |                                          |
                    +--> [Data_Quality_Check] ---------------->+
                                                               |
                                                          [Notify]

In the DAG above, Transform_Bronze runs once Ingest completes. Transform_Silver and Data_Quality_Check then start in parallel, and Load_Gold follows Transform_Silver. The Notify task runs only after both Load_Gold and Data_Quality_Check finish. By combining serial, parallel (fan-out), and join (fan-in) patterns, you can express complex pipelines.

You define dependencies either by drag-and-drop in the UI or by listing task keys in the depends_on array of a JSON/YAML definition. Cyclic dependencies fail validation.
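As a minimal sketch of how depends_on entries form a DAG, the following plain-Python example reads the dependencies off the diagram above (with Data_Quality_Check hanging off Transform_Bronze) and groups tasks into "waves" that could run in parallel. The dict layout only imitates the tasks/depends_on shape of a job definition, and execution_waves is a hypothetical helper, not part of any Databricks API. The real scheduler starts each task as soon as its own dependencies finish rather than in strict waves.

```python
from graphlib import CycleError, TopologicalSorter

# Task keys mirror the example DAG; the structure is illustrative only.
tasks = [
    {"task_key": "Ingest", "depends_on": []},
    {"task_key": "Transform_Bronze", "depends_on": [{"task_key": "Ingest"}]},
    {"task_key": "Data_Quality_Check", "depends_on": [{"task_key": "Transform_Bronze"}]},
    {"task_key": "Transform_Silver", "depends_on": [{"task_key": "Transform_Bronze"}]},
    {"task_key": "Load_Gold", "depends_on": [{"task_key": "Transform_Silver"}]},
    {
        "task_key": "Notify",
        "depends_on": [{"task_key": "Load_Gold"}, {"task_key": "Data_Quality_Check"}],
    },
]


def execution_waves(task_list):
    """Group tasks into waves; tasks in the same wave have no mutual dependency."""
    graph = {
        t["task_key"]: {d["task_key"] for d in t["depends_on"]} for t in task_list
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(tasks)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

A cyclic definition (for example, two tasks that depend on each other) makes prepare() raise CycleError, which is the local analogue of the validation failure described above.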

Task Types at a Glance

You can mix different task types within a single job. For example, one job can run ETL in a Notebook, aggregate the result in a SQL task, and then launch a streaming DLT pipeline, all within a single end-to-end run.

Task Type	What It Runs	Typical Use Case
Notebook	Runs a Notebook in the workspace	ETL, data transformation, ML training
Python script	Runs a .py file on DBFS/Volumes	Packaged Python code
Python wheel	Installs and runs a .whl package	Distributable code produced by CI/CD
SQL	Runs a SQL query or SQL file	Aggregation and reporting table refresh
DLT pipeline	Launches a Lakeflow Declarative Pipeline	Declarative ETL, streaming ingestion
dbt	Runs tasks from a dbt Core project	Building and testing dbt models
JAR	Runs a Spark application JAR	Batch processing in Scala/Java
If/Else condition	Branches the DAG based on a condition	Dynamic workflow branching
For Each	Loops a task over an input list	Repeating the same processing across multiple tables

Scheduling

Lakeflow Jobs supports three trigger modes.

CRON Schedule

A standard time-based schedule, for example 0 0 2 * * ? (daily at 2 AM) or 0 0 */6 * * ? (every 6 hours). You can specify the timezone explicitly. Expressions use the Quartz CRON format, which has 6 fields including a leading seconds field, so be aware that the field count differs from the standard 5-field Linux CRON.
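The field-count difference is easy to check mechanically. The sketch below assumes the schedule block uses the keys quartz_cron_expression, timezone_id, and pause_status, as in the Jobs REST API. make_schedule is a hypothetical helper, and it only counts fields; it does not validate the full Quartz grammar.

```python
def make_schedule(quartz_cron_expression, timezone_id="UTC"):
    """Build a schedule block after a basic Quartz field-count check."""
    fields = quartz_cron_expression.split()
    # Quartz order: seconds minutes hours day-of-month month day-of-week [year]
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 fields (7 with the optional year), got {len(fields)}"
        )
    return {
        "quartz_cron_expression": quartz_cron_expression,
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


# Daily at 2 AM in an explicitly chosen timezone
daily = make_schedule("0 0 2 * * ?", timezone_id="Asia/Tokyo")
print(daily)
```

Passing a 5-field Linux expression such as 0 2 * * * raises ValueError here, which catches the most common mistake before the job definition is submitted.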

File Arrival Trigger

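Watches a Unity Catalog-governed cloud storage path (an external location or volume) and launches the job when new files arrive. From the pipeline's point of view the behavior is event-driven: you do not write polling code yourself, although Databricks detects new files on a best-effort basis, so a launch can lag the actual arrival slightly. You can target S3 / ADLS / GCS paths. It is ideal for batch ingestion pipelines that follow a "process when a file lands" pattern.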

Continuous Run

A mode that re-runs the job as soon as the previous run completes. Use it for streaming-style Notebooks that should stay running. You can set the minimum interval between restarts in seconds and combine it with automatic retry on failure to build an always-on pipeline.
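For comparison with the CRON schedule, the two other modes might look as follows in a job definition. This is a sketch under assumptions: the key names (trigger.file_arrival.url, min_time_between_triggers_seconds, wait_after_last_change_seconds, continuous.pause_status) follow the Jobs REST API as far as I know it, the storage path is a placeholder, and trigger_mode is a hypothetical helper. The rule that a job carries only one trigger block is a simplification made by this sketch.

```python
# Placeholder path; a real job would point at an external location or volume.
file_arrival_settings = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            "url": "s3://example-bucket/landing/",
            "min_time_between_triggers_seconds": 300,
            "wait_after_last_change_seconds": 60,
        },
    }
}

continuous_settings = {"continuous": {"pause_status": "UNPAUSED"}}

TRIGGER_KEYS = ("schedule", "trigger", "continuous")


def trigger_mode(settings):
    """Return which trigger block a settings dict uses, or 'manual' if none."""
    present = [key for key in TRIGGER_KEYS if key in settings]
    if len(present) > 1:
        # Assumption of this sketch: one trigger block per job.
        raise ValueError(f"expected one trigger block, found {present}")
    return present[0] if present else "manual"


print(trigger_mode(file_arrival_settings))
print(trigger_mode(continuous_settings))
```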

Retries and Error Handling

Lakeflow Jobs lets you configure behavior at two levels: retry settings at the task level and run concurrency at the job level.

Level	Parameter	Behavior
Task-level	max_retries (0-10)	Re-runs that single task when it fails
Task-level	min_retry_interval_millis	Minimum interval between retries (milliseconds)
Task-level	retry_on_timeout	Whether timeouts are also retried
Job-level	max_concurrent_runs	Maximum concurrent runs of the same job (default 1)

When task-level retries are configured, a failed task is first retried in isolation. If the task still fails after max_retries, it is marked as a final "failure", and downstream tasks that depend on it are skipped. The overall job result becomes "failure", but tasks on branches that do not depend on the failed task continue to run normally.
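The task-level retry semantics can be illustrated with a small local simulation. This is not how Databricks implements retries; run_with_retries is a hypothetical function that borrows the parameter names max_retries and min_retry_interval_millis from the table above and applies them to an ordinary Python callable.

```python
import time


def run_with_retries(task, max_retries=0, min_retry_interval_millis=0, sleep=time.sleep):
    """Call task() once, then up to max_retries more times if it raises.

    Returns (result, attempts). Re-raises the last error when retries run out,
    which corresponds to the task being marked as a final failure.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise
            # Wait at least the configured interval before the next attempt
            sleep(min_retry_interval_millis / 1000)


# A task that fails twice with a transient error and then succeeds
state = {"calls": 0}


def flaky_task():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient network issue")
    return "ok"


result, attempts = run_with_retries(flaky_task, max_retries=2)
print(result, attempts)
```

With max_retries=2 the third attempt succeeds. With max_retries=1 the same task would exhaust its retries and the error would propagate, at which point dependent tasks would be skipped.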

The "Repair Run" feature lets you re-run only the failed task and its downstream tasks. Already-successful tasks are not re-executed, which is very useful for partial recovery in large pipelines.
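To see which tasks a repair would touch, the sketch below computes "failed tasks plus everything downstream" over a dependency mapping. The actual selection is made by Databricks when you start a Repair Run; repair_set is a hypothetical helper that only illustrates the idea, using the task names from the DAG example as read off the diagram.

```python
def repair_set(depends_on, failed):
    """Return the failed tasks plus every task downstream of them."""
    # Invert the depends_on mapping to find the children of each task
    children = {}
    for task, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, set()).add(task)

    to_rerun = set()
    stack = list(failed)
    while stack:
        task = stack.pop()
        if task not in to_rerun:
            to_rerun.add(task)
            stack.extend(children.get(task, ()))
    return to_rerun


depends_on = {
    "Ingest": [],
    "Transform_Bronze": ["Ingest"],
    "Data_Quality_Check": ["Transform_Bronze"],
    "Transform_Silver": ["Transform_Bronze"],
    "Load_Gold": ["Transform_Silver"],
    "Notify": ["Load_Gold", "Data_Quality_Check"],
}

rerun = repair_set(depends_on, failed={"Transform_Silver"})
print(sorted(rerun))
```

Ingest, Transform_Bronze, and Data_Quality_Check are left out of the result because they neither failed nor depend on the failed task.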

Choosing Between Job Cluster and All-Purpose Cluster

Lakeflow Jobs tasks run on either a Job Cluster (a job-dedicated cluster) or an All-Purpose Cluster (an interactive cluster).

Aspect	Job Cluster	All-Purpose Cluster
Lifecycle	Auto-created at job start, auto-deleted on completion	Manually started and stopped (auto-termination is configurable)
DBU rate	About 30-60% cheaper than All-Purpose	Standard rate
Startup latency	Cluster spins up on each run (a few minutes)	Runs immediately if the cluster is already up
When to use	Production batch and scheduled runs	Development test runs and interactive debugging

For production pipelines, you should generally use Job Clusters. The DBU rate is lower and the cluster is auto-deleted after the run, so there is no risk of leaving resources idle. During development, it is common to test manually on an already-running All-Purpose Cluster and then switch to a Job Cluster configuration when deploying to production.
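In a job definition, the difference comes down to which compute reference a task carries. The following sketch assumes the Jobs REST API field names job_clusters, job_cluster_key, new_cluster, and existing_cluster_id. The job name, notebook path, cluster ID, runtime version, and node type are placeholders, and compute_kind is a hypothetical helper.

```python
# Production-style definition: the cluster is created for the run and removed afterwards.
production_job = {
    "name": "nightly_etl",  # placeholder
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",  # placeholder node type
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "Ingest",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
        }
    ],
}

# Development-style task: reuses an already-running All-Purpose Cluster.
dev_task = {
    "task_key": "Ingest",
    "existing_cluster_id": "0000-000000-example",  # placeholder ID
    "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
}


def compute_kind(task):
    """Classify a task by the compute reference it carries."""
    if "job_cluster_key" in task or "new_cluster" in task:
        return "job cluster"
    if "existing_cluster_id" in task:
        return "all-purpose cluster"
    return "unspecified"


print(compute_kind(production_job["tasks"][0]))
print(compute_kind(dev_task))
```

A check like this can run in CI to flag production job definitions that still point at an interactive cluster.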

Parameters and Task Value Hand-off

Lakeflow Jobs offers two mechanisms for passing values: "Job Parameters", passed into the job from outside, and "Task Values", passed between tasks at runtime.

Job Parameters

You define parameter keys and default values at job definition time and can override them at schedule execution or API invocation. Built-in dynamic references such as {{job.start_time}} and {{job.run_id}} are available. Inside a Notebook, fetch them with dbutils.widgets.get("param_name").
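The default-plus-override behavior can be pictured as a simple merge. The sketch below is a local illustration only: the list of name/default pairs imitates how job parameters are declared, the parameter names are made up, and resolve_parameters is a hypothetical helper. Dynamic references such as {{job.run_id}} are resolved by Databricks at run time and are not handled here.

```python
# Parameters as they might be declared on the job (names are made up).
job_parameter_definitions = [
    {"name": "target_date", "default": "2026-03-27"},
    {"name": "env", "default": "dev"},
]


def resolve_parameters(definitions, overrides=None):
    """Merge declared defaults with the overrides supplied for one run."""
    resolved = {d["name"]: d["default"] for d in definitions}
    resolved.update(overrides or {})
    return resolved


# A scheduled run with no overrides sees the defaults
scheduled_run = resolve_parameters(job_parameter_definitions)

# An API-triggered run overrides one value
api_run = resolve_parameters(job_parameter_definitions, {"env": "prod"})

print(scheduled_run)
print(api_run)
```

Every task in the run sees the same resolved values, which is what distinguishes job parameters from values produced by an individual task.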

Task Values (dbutils.jobs.taskValues)

A mechanism for passing the result of one task to downstream tasks.

# Set values in Task A
dbutils.jobs.taskValues.set(key="row_count", value=df.count())
dbutils.jobs.taskValues.set(key="target_date", value="2026-03-27")

# Read values in Task B (depends on Task A)
row_count = dbutils.jobs.taskValues.get(
    taskKey="task_a", key="row_count"
)
target_date = dbutils.jobs.taskValues.get(
    taskKey="task_a", key="target_date"
)

Values must be JSON-compatible types (string, number, boolean). Large DataFrames or binary data cannot be passed this way; in those cases, write to a Delta Table or temporary file and share only the path via a taskValue.
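One way to enforce this rule before calling dbutils.jobs.taskValues.set() is a small guard that checks JSON compatibility and size. This is a sketch: to_task_value is a hypothetical helper, and the 48 KiB limit is an assumption based on my reading of the documentation, so verify it against the current docs.

```python
import json

MAX_TASK_VALUE_BYTES = 48 * 1024  # assumed limit; verify against current docs


def to_task_value(value, max_bytes=MAX_TASK_VALUE_BYTES):
    """Return value unchanged if it is small and JSON-serializable, else raise."""
    try:
        encoded = json.dumps(value)
    except TypeError as exc:
        raise TypeError(f"not JSON-serializable: {type(value).__name__}") from exc
    if len(encoded.encode("utf-8")) > max_bytes:
        raise ValueError(
            "too large for a task value; write the data to a table or file "
            "and pass only its name or path"
        )
    return value


# Small scalars and short strings pass through
row_count = to_task_value(12345)
# For large results, share a reference instead of the data itself
output_ref = to_task_value({"table": "main.silver.orders", "row_count": 12345})
print(row_count, output_ref)
```

The table name in the example is a placeholder. The downstream task would read the reference and load the data itself.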

Notification Settings

You can configure notifications for job events such as start, success, failure, and skip.

Email: send to any email address; multiple recipients supported
Webhook: POST to any HTTPS endpoint; integrates with PagerDuty or custom systems
Slack integration: register a Slack Webhook as a Databricks destination to post into channels
Microsoft Teams: notify via Incoming Webhook

Notifications can be configured at both the job and task levels. For example, you can route "overall job failure to the on-call team" while sending "specific task failures to the data quality team", splitting granularity so the right people get the right signal without alert fatigue.
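The job-versus-task routing described above could be expressed roughly as follows. The email_notifications / on_failure keys follow the Jobs REST API as far as I know it. The addresses are placeholders and failure_recipients is a hypothetical helper for inspecting a definition locally.

```python
job_settings = {
    # Job-level: overall failure goes to the on-call team
    "email_notifications": {"on_failure": ["oncall@example.com"]},
    "tasks": [
        {
            "task_key": "Data_Quality_Check",
            # Task-level: this task's failures go to the data quality team
            "email_notifications": {"on_failure": ["data-quality@example.com"]},
        },
        {"task_key": "Load_Gold"},
    ],
}


def failure_recipients(settings, task_key=None):
    """Return on_failure recipients for the job, or for one task if task_key is given."""
    if task_key is None:
        return settings.get("email_notifications", {}).get("on_failure", [])
    for task in settings.get("tasks", []):
        if task["task_key"] == task_key:
            return task.get("email_notifications", {}).get("on_failure", [])
    raise KeyError(task_key)


print(failure_recipients(job_settings))
print(failure_recipients(job_settings, "Data_Quality_Check"))
```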

Monitoring (Run History and System Tables)

The Lakeflow Jobs UI lets you inspect each job's run history, with success/failure status, execution time, and per-task status displayed together. The default retention is 60 days of run history.

For deeper analytics and longer retention, use the Unity Catalog System Tables. The system.lakeflow schema provides the following tables.

Table	Contents
system.lakeflow.jobs	Job definition metadata (name, owner, cluster config, etc.)
system.lakeflow.job_tasks	Per-task definition (task type, dependencies, etc.)
system.lakeflow.job_run_timeline	Per-run timeline (start/end times, result status)
system.lakeflow.job_task_run_timeline	Per-task run timeline (duration and status of each task)

Aggregating these tables in SQL surfaces insights like "the jobs with the highest failure rate over the past 30 days", "tasks whose average runtime is trending upward", and "cluster utilization spikes during specific time windows". Combined with Databricks SQL dashboards, you can build job-health monitoring for the operations team.
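As a sketch of the first insight, the function below builds a failure-rate query against system.lakeflow.job_run_timeline. The column names (workspace_id, job_id, result_state, period_end_time) and the set of result_state values treated as failures are assumptions based on my reading of the system table documentation, so check them against your workspace before relying on the numbers. failure_rate_query is a hypothetical helper.

```python
def failure_rate_query(days=30, limit=20):
    """Return SQL that ranks jobs by failure rate over the last `days` days."""
    days, limit = int(days), int(limit)  # keep the interpolated values numeric
    return f"""
SELECT
  workspace_id,
  job_id,
  COUNT(*) AS finished_runs,
  SUM(CASE WHEN result_state IN ('FAILED', 'ERROR', 'TIMED_OUT') THEN 1 ELSE 0 END) AS failed_runs,
  ROUND(AVG(CASE WHEN result_state IN ('FAILED', 'ERROR', 'TIMED_OUT') THEN 1.0 ELSE 0.0 END), 3) AS failure_rate
FROM system.lakeflow.job_run_timeline
WHERE result_state IS NOT NULL
  AND period_end_time >= current_timestamp() - INTERVAL {days} DAYS
GROUP BY workspace_id, job_id
ORDER BY failure_rate DESC, finished_runs DESC
LIMIT {limit}
""".strip()


query = failure_rate_query(days=30)
print(query)
```

In a Notebook you would pass the string to spark.sql(query) or paste it into a Databricks SQL editor. The result_state IS NOT NULL filter is there on the assumption that only the final row of each run carries a result.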

What the Exam Tests

On the Data Engineer Associate exam, Lakeflow Jobs appears in the "Production Pipelines" domain (about 16%). The following patterns show up frequently.

Which to use in production: Job Cluster or All-Purpose Cluster → Job Cluster (cost and auto-termination)
How to re-run only the failed task after a job failure → Repair Run
How to pass a previous task's computed result to the next task → dbutils.jobs.taskValues.set() / get()
How to launch a job when a file arrives → File Arrival Trigger
Field count of the CRON expression → 6 fields in Quartz format (including seconds)
If task dependencies contain a cycle → Validation error (no longer a DAG, so cannot be created)

The exam frequently includes distractor choices that conflate "DLT pipeline", "Lakeflow Jobs", and "standalone Notebook run". Make sure you can clearly distinguish the purpose and scope of each.

Check Your Understanding

Data Engineer Associate – Production Pipelines

Question 1

A data engineer operates a production multi-task job. In last night's run, only the third of five tasks (Transform_Silver) failed due to a transient network issue. Tasks 1 and 2 succeeded, and tasks 4 and 5 were skipped. To efficiently re-run only the failed task and its downstream tasks without re-running the already-successful tasks, which feature should be used?

A. Re-run the entire job from the beginning
B. Use Repair Run to re-run only the failed task and its downstream tasks
C. Manually run the failed task's Notebook on an All-Purpose Cluster
D. Delete the job, recreate it with the same definition, and run it

Correct answer: B

Repair Run re-runs only the failed task and its downstream tasks. Already-successful tasks (Ingest, Transform_Bronze) are not re-executed, saving compute cost and runtime. A full job re-run is inefficient because it re-executes all tasks, including ones that already succeeded. A manual Notebook run loses DAG dependency management and does not trigger downstream tasks automatically.

Frequently Asked Questions

What is the difference between Lakeflow Jobs and the old Databricks Workflows name?

There is no functional difference. In late 2024, Databricks consolidated its branding by renaming Workflows/Jobs to 'Lakeflow Jobs' and Delta Live Tables to 'Lakeflow Declarative Pipelines'. The UI labels and documentation URLs are being updated gradually, but the REST API endpoints (/api/2.1/jobs/) and CLI command structure still use the old names. Exams may use either name, so you should recognize that both old and new labels refer to the same feature.

When should I use dbutils.jobs.taskValues vs. Job Parameters?

Job Parameters are static values passed in externally at job start time and are visible to every task. By contrast, dbutils.jobs.taskValues.set() / get() are for runtime data hand-off between tasks. For example, when an upstream task computes a row count or partition key that a downstream task needs, use taskValues. Job Parameters are best for fixed values or dynamic references (such as {{job.start_time}}) configured at schedule definition or API invocation time.

How heavily is Lakeflow Jobs tested on the Data Engineer Associate exam?

On the Data Engineer Associate Exam Guide, the 'Production Pipelines' domain accounts for about 16% of the exam, and Lakeflow Jobs is the core topic of that domain. Expect questions on multi-task job structure, scheduling, retry settings, and the rationale for choosing between Job Clusters and All-Purpose Clusters. Rather than rote memorization of feature names, the exam focuses on judgment-style questions like 'Which setting is most appropriate in this scenario?'

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
