Lakeflow Jobs is the orchestration layer on Databricks for scheduling and monitoring data pipelines and ML workflows. It was previously called "Databricks Workflows" or "Databricks Jobs" and was renamed to "Lakeflow Jobs" in late 2024. You compose multiple tasks into a DAG (directed acyclic graph) and manage dependencies, retries, and parameter passing in one place.
This article walks through multi-task job structure, scheduling, error handling, and monitoring from both a practical and exam-prep perspective.
The base unit of Lakeflow Jobs is the "job", and one job consists of one or more "tasks". Defining dependencies between tasks forms a DAG, and a task starts automatically once its upstream dependencies complete.
    [Ingest] --> [Transform_Bronze] --> [Transform_Silver] --> [Load_Gold]
       |                                                            |
       +--> [Data_Quality_Check] ---------------------------------->+
                                                                    |
                                                                [Notify]

In the DAG above, Transform_Bronze and Data_Quality_Check start in parallel once Ingest completes, while Transform_Silver and Load_Gold run sequentially. The Notify task only runs after both Load_Gold and Data_Quality_Check finish. By combining serial, parallel, and join patterns, you can express complex pipelines.
You define dependencies either by drag-and-drop in the UI or by listing task keys in the depends_on array of a JSON/YAML definition. Cyclic dependencies fail validation.
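The dependency rules above can be sketched locally with Python's standard graphlib module. This is only an illustration, not how Lakeflow Jobs validates a job internally: it mirrors the example DAG as a mapping from each task key to the keys it would list in depends_on, derives one valid start order, and shows that a cycle is rejected. The helper name start_order is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each task key maps to the task keys it depends on (its depends_on list).
# This mirrors the example DAG in this article.
DEPENDS_ON = {
    "Ingest": [],
    "Transform_Bronze": ["Ingest"],
    "Data_Quality_Check": ["Ingest"],
    "Transform_Silver": ["Transform_Bronze"],
    "Load_Gold": ["Transform_Silver"],
    "Notify": ["Load_Gold", "Data_Quality_Check"],
}


def start_order(depends_on):
    """Return one valid execution order; raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"cyclic dependency: {exc.args[1]}") from exc


order = start_order(DEPENDS_ON)
print(order)
```

Ingest always comes first and Notify always comes last. The relative order of the two parallel branches may vary, which matches the fact that they can run concurrently.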
You can mix different task types within a single job. For example, one job can run ETL in a Notebook, aggregate the result in a SQL task, and then run a streaming DLT pipeline, all as one end-to-end workflow.
| Task Type | What It Runs | Typical Use Case |
|---|---|---|
| Notebook | Runs a Notebook in the workspace | ETL, data transformation, ML training |
| Python script | Runs a .py file on DBFS/Volumes | Packaged Python code |
| Python wheel | Installs and runs a .whl package | Distributable code produced by CI/CD |
| SQL | Runs a SQL query or SQL file | Aggregation and reporting table refresh |
| DLT pipeline | Launches a Lakeflow Declarative Pipeline | Declarative ETL, streaming ingestion |
| dbt | Runs tasks from a dbt Core project | Building and testing dbt models |
| JAR | Runs a Spark application JAR | Batch processing in Scala/Java |
| If/Else condition | Branches the DAG based on a condition | Dynamic workflow branching |
| For Each | Loops a task over an input list | Repeating the same processing across multiple tables |
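As a rough sketch of how mixed task types might look in a job definition, the dictionary below chains a Notebook task, a SQL task, and a pipeline task. The field names (notebook_task, sql_task, pipeline_task) follow the Jobs API task settings as I understand them. All paths and IDs are placeholders, and the helper task_types is hypothetical.

```python
import json

# Placeholder identifiers: replace with real values from your workspace.
WAREHOUSE_ID = "<sql-warehouse-id>"
PIPELINE_ID = "<pipeline-id>"

job_settings = {
    "name": "mixed_task_types_example",
    "tasks": [
        {
            "task_key": "etl_notebook",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
        },
        {
            "task_key": "aggregate_sql",
            "depends_on": [{"task_key": "etl_notebook"}],
            "sql_task": {
                "warehouse_id": WAREHOUSE_ID,
                "file": {"path": "/Workspace/sql/aggregate.sql"},
            },
        },
        {
            "task_key": "refresh_pipeline",
            "depends_on": [{"task_key": "aggregate_sql"}],
            "pipeline_task": {"pipeline_id": PIPELINE_ID},
        },
    ],
}


def task_types(settings):
    """Map each task key to the *_task field that defines its type."""
    return {
        task["task_key"]: next(k for k in task if k.endswith("_task"))
        for task in settings["tasks"]
    }


print(json.dumps(task_types(job_settings), indent=2))
```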
Lakeflow Jobs supports three trigger modes.
Scheduled (CRON): A standard time-based schedule. You write a CRON expression such as 0 0 2 * * ? (daily at 2 AM) or 0 0 */6 * * ? (every 6 hours), and you can specify the timezone explicitly. Lakeflow Jobs uses the Quartz CRON format, which has 6 fields including a leading seconds field, so the field count differs from the standard 5-field Linux CRON.
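To make the field-count difference concrete, here is a minimal sketch that converts a simple 5-field Linux expression into the 6-field Quartz form. The function linux_to_quartz is hypothetical and deliberately narrow: it only handles expressions whose day-of-week field is *, because Quartz numbers weekdays differently. The schedule dictionary uses the quartz_cron_expression, timezone_id, and pause_status fields as I understand the Jobs API. The timezone value is just an example.

```python
def linux_to_quartz(expr: str) -> str:
    """Convert a simple 5-field Linux cron expression to 6-field Quartz.

    Only handles expressions whose day-of-week field is '*'; anything else
    is rejected because Quartz numbers weekdays differently (1 = Sunday).
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_week != "*":
        raise ValueError("day-of-week conversion is not handled in this sketch")
    # Quartz adds a leading seconds field and expects '?' in one of the
    # two day fields.
    return " ".join(["0", minute, hour, day_of_month, month, "?"])


# Schedule block shaped like the Jobs API "schedule" setting (assumed shape).
schedule = {
    "quartz_cron_expression": linux_to_quartz("0 2 * * *"),
    "timezone_id": "Asia/Tokyo",  # example timezone, chosen for illustration
    "pause_status": "UNPAUSED",
}
print(schedule["quartz_cron_expression"])
```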
File arrival: Watches a Unity Catalog-governed cloud storage path and launches the job when new files arrive. The behavior is closer to event-driven than polling, and you can target S3 / ADLS / GCS paths. It is ideal for batch ingestion pipelines that follow a "process when a file lands" pattern.
Continuous: A mode that re-runs the job as soon as the previous run completes. Use it for streaming-style Notebooks that should stay running. You can set the minimum interval between restarts in seconds and combine it with automatic retry on failure to build an always-on pipeline.
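The two non-schedule modes can be sketched as job-settings fragments. The shapes below (a trigger block with file_arrival, and a continuous block) reflect the Jobs API as I understand it, so treat the field names as assumptions to verify against the current API reference. The bucket path is a placeholder and both helper functions are hypothetical.

```python
def file_arrival_trigger(url: str, min_seconds_between_triggers: int = 60) -> dict:
    """Build a settings fragment that starts the job when files land at url."""
    return {
        "trigger": {
            "pause_status": "UNPAUSED",
            "file_arrival": {
                "url": url,
                # Assumed field: lower bound on time between triggered runs.
                "min_time_between_triggers_seconds": min_seconds_between_triggers,
            },
        }
    }


def continuous_trigger() -> dict:
    """Build a settings fragment that keeps one run of the job always active."""
    return {"continuous": {"pause_status": "UNPAUSED"}}


landing = file_arrival_trigger("s3://<bucket>/landing/")
print(landing["trigger"]["file_arrival"]["url"])
print(continuous_trigger())
```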
Lakeflow Jobs lets you configure retry and concurrency behavior at two levels: task-level and job-level.
| Level | Parameter | Behavior |
|---|---|---|
| Task-level | max_retries (0-10) | Re-runs that single task when it fails |
| Task-level | min_retry_interval_millis | Minimum interval between retries (milliseconds) |
| Task-level | retry_on_timeout | Whether timeouts are also retried |
| Job-level | max_concurrent_runs | Maximum concurrent runs of the same job (default 1) |
When task-level retries are configured, a failed task is first retried in isolation. If the task still fails after max_retries, it is marked as a final "failure", and downstream tasks that depend on it are skipped. The overall job result becomes "failure", but tasks on branches that do not depend on the failed task continue to run normally.
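The task-level retry semantics can be approximated with a small local simulation. This is a sketch of the behavior described above, not Databricks code: run_with_retries is a hypothetical helper, and its parameter names simply reuse max_retries and min_retry_interval_millis from the table. A task gets one initial attempt plus up to max_retries further attempts.

```python
import time


def run_with_retries(task, max_retries=0, min_retry_interval_millis=0, sleep=time.sleep):
    """Call task(); on an exception, retry up to max_retries more times.

    Returns (result, attempts). Re-raises the last error once retries
    are exhausted, which corresponds to the task's final failure.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise
            sleep(min_retry_interval_millis / 1000)


calls = {"n": 0}


def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient network issue")
    return "ok"


# The sleep is stubbed out so the example finishes instantly.
result, attempts = run_with_retries(
    flaky, max_retries=3, min_retry_interval_millis=10, sleep=lambda _: None
)
print(result, attempts)
```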
The "Repair Run" feature lets you re-run only the failed task and its downstream tasks. Already-successful tasks are not re-executed, which is very useful for partial recovery in large pipelines.
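To see which tasks a failure affects, the sketch below walks the example DAG from this article. It computes the tasks that would be skipped when Transform_Silver fails, the set a Repair Run would re-execute (the failed task plus its downstream tasks), and the tasks left untouched. The helper downstream_of is hypothetical and only models the dependency logic.

```python
from collections import deque

# Task key -> upstream task keys, mirroring the example DAG in this article.
DEPENDS_ON = {
    "Ingest": [],
    "Transform_Bronze": ["Ingest"],
    "Data_Quality_Check": ["Ingest"],
    "Transform_Silver": ["Transform_Bronze"],
    "Load_Gold": ["Transform_Silver"],
    "Notify": ["Load_Gold", "Data_Quality_Check"],
}


def downstream_of(depends_on, task_key):
    """Return every task that directly or transitively depends on task_key."""
    children = {key: [] for key in depends_on}
    for key, parents in depends_on.items():
        for parent in parents:
            children[parent].append(key)
    seen, queue = set(), deque([task_key])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


failed = "Transform_Silver"
skipped = downstream_of(DEPENDS_ON, failed)
repair_set = {failed} | skipped
unaffected = set(DEPENDS_ON) - repair_set

print("skipped:", sorted(skipped))
print("repair run re-executes:", sorted(repair_set))
print("not re-executed:", sorted(unaffected))
```

Data_Quality_Check sits on a branch that does not depend on Transform_Silver, so it is neither skipped nor part of the repair.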
Lakeflow Jobs tasks run on either a Job Cluster (a job-dedicated cluster) or an All-Purpose Cluster (an interactive cluster).
| Aspect | Job Cluster | All-Purpose Cluster |
|---|---|---|
| Lifecycle | Auto-created at job start, auto-deleted on completion | Manually started and stopped (auto-termination is configurable) |
| DBU rate | About 30-60% cheaper than All-Purpose | Standard rate |
| Startup latency | Cluster spins up on each run (a few minutes) | Runs immediately if the cluster is already up |
| When to use | Production batch and scheduled runs | Development test runs and interactive debugging |
For production pipelines, you should generally use Job Clusters. The DBU rate is lower and the cluster is auto-deleted after the run, so there is no risk of leaving resources idle. During development, it is common to test manually on an already-running All-Purpose Cluster and then switch to a Job Cluster configuration when deploying to production.
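One way to express the development-to-production switch is to keep the compute choice in a single place. The sketch below assumes the Jobs API fields job_clusters, job_cluster_key, new_cluster, and existing_cluster_id. The runtime version, node type, and cluster ID are placeholders, and compute_for is a hypothetical helper.

```python
# Job-level cluster definition, created at run start and removed afterwards.
job_clusters = [
    {
        "job_cluster_key": "etl_cluster",
        "new_cluster": {
            "spark_version": "<runtime-version>",  # placeholder
            "node_type_id": "<node-type>",         # placeholder
            "num_workers": 2,
        },
    }
]


def compute_for(environment: str) -> dict:
    """Return the task-level compute setting for 'prod' or 'dev'."""
    if environment == "prod":
        # Production: reference the job cluster defined above.
        return {"job_cluster_key": "etl_cluster"}
    if environment == "dev":
        # Development: attach to an already-running All-Purpose Cluster.
        return {"existing_cluster_id": "<all-purpose-cluster-id>"}
    raise ValueError(f"unknown environment: {environment!r}")


task = {
    "task_key": "Transform_Silver",
    "notebook_task": {"notebook_path": "/Workspace/etl/transform_silver"},
    **compute_for("prod"),
}
print(task)
```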
Lakeflow Jobs offers two mechanisms: "Job Parameters" passed into the job from outside, and "Task Values" passed between tasks at runtime.
You define parameter keys and default values at job definition time and can override them at schedule execution or API invocation. Built-in dynamic references such as {{job.start_time}} and {{job.run_id}} are available. Inside a Notebook, fetch them with dbutils.widgets.get("param_name").
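The default-and-override behavior can be illustrated with a small pure-Python sketch. The definitions list imitates a job's parameter definitions (name plus default) and the overrides dictionary imitates values supplied at schedule or API invocation time. The function resolve_parameters is hypothetical and does not expand dynamic references such as {{job.run_id}}, which Databricks resolves at run time.

```python
def resolve_parameters(definitions, overrides=None):
    """Merge job-level defaults with run-time overrides.

    definitions: list of {"name": ..., "default": ...} dictionaries.
    overrides:   mapping of parameter name to the value for this run.
    """
    overrides = overrides or {}
    return {
        definition["name"]: overrides.get(definition["name"], definition["default"])
        for definition in definitions
    }


definitions = [
    {"name": "env", "default": "dev"},
    {"name": "run_id", "default": "{{job.run_id}}"},
]

# A run that overrides only "env"; "run_id" keeps its default reference.
params = resolve_parameters(definitions, {"env": "prod"})
print(params)

# Inside a Notebook task the resolved value would be read with:
#   dbutils.widgets.get("env")
```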
A mechanism for passing the result of one task to downstream tasks.
    # Set values in Task A
    dbutils.jobs.taskValues.set(key="row_count", value=df.count())
    dbutils.jobs.taskValues.set(key="target_date", value="2026-03-27")

    # Read values in Task B (depends on Task A).
    # taskKey must match the task key of the upstream task.
    row_count = dbutils.jobs.taskValues.get(
        taskKey="task_a", key="row_count"
    )
    target_date = dbutils.jobs.taskValues.get(
        taskKey="task_a", key="target_date"
    )

Values must be JSON-compatible types (string, number, boolean). Large DataFrames or binary data cannot be passed this way; in those cases, write to a Delta Table or temporary file and share only the path via a taskValue.
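The "share only the path" pattern can be tried outside Databricks with a small stand-in object. In the sketch below, InMemoryTaskValues is a hypothetical local substitute for dbutils.jobs.taskValues, and publish_result, table_to_process, and the table name are illustrative. In a real task you would pass dbutils.jobs.taskValues instead of the stand-in.

```python
class InMemoryTaskValues:
    """Local stand-in for dbutils.jobs.taskValues, for use outside Databricks."""

    def __init__(self):
        self._values = {}
        self.current_task_key = None  # set by the caller to mimic the running task

    def set(self, key, value):
        self._values[(self.current_task_key, key)] = value

    def get(self, taskKey, key, default=None):
        return self._values.get((taskKey, key), default)


def publish_result(task_values, table_name, row_count):
    """Upstream task: share a table name and a row count, not the data itself."""
    task_values.set(key="output_table", value=table_name)
    task_values.set(key="row_count", value=row_count)


def table_to_process(task_values, upstream_task_key):
    """Downstream task: return the table to read, or None if it is empty."""
    table = task_values.get(taskKey=upstream_task_key, key="output_table")
    rows = task_values.get(taskKey=upstream_task_key, key="row_count", default=0)
    return table if rows > 0 else None


task_values = InMemoryTaskValues()
task_values.current_task_key = "task_a"
publish_result(task_values, "main.silver.orders", 1250)
print(table_to_process(task_values, "task_a"))
```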
You can configure notifications for job events such as start, success, failure, and skip.
Notifications can be configured at both the job and task levels. For example, you can route "overall job failure to the on-call team" while sending "specific task failures to the data quality team", splitting granularity so the right people get the right signal without alert fatigue.
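The routing example above might look like the following settings sketch. It assumes the email_notifications block with an on_failure list can be set on both the job and an individual task, as I understand the Jobs API. The addresses are placeholders and failure_recipients is a hypothetical helper for inspecting the result.

```python
job_settings = {
    "name": "nightly_pipeline",
    # Job-level: overall failure goes to the on-call team.
    "email_notifications": {"on_failure": ["oncall@example.com"]},
    "tasks": [
        {
            "task_key": "Data_Quality_Check",
            # Task-level: this task's failures go to the data quality team.
            "email_notifications": {"on_failure": ["data-quality@example.com"]},
        },
        {"task_key": "Load_Gold"},
    ],
}


def failure_recipients(settings):
    """Return who is notified on failure, keyed by 'job' or by task key."""
    routing = {"job": settings.get("email_notifications", {}).get("on_failure", [])}
    for task in settings["tasks"]:
        recipients = task.get("email_notifications", {}).get("on_failure", [])
        if recipients:
            routing[task["task_key"]] = recipients
    return routing


print(failure_recipients(job_settings))
```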
The Lakeflow Jobs UI lets you inspect each job's run history, with success/failure status, execution time, and per-task status displayed together. The default retention is 60 days of run history.
For deeper analytics and longer retention, use the Unity Catalog System Tables. The system.lakeflow schema provides the following tables.
| Table | Contents |
|---|---|
| system.lakeflow.jobs | Job definition metadata (name, owner, cluster config, etc.) |
| system.lakeflow.job_tasks | Per-task definition (task type, dependencies, etc.) |
| system.lakeflow.job_run_timeline | Per-run timeline (start/end times, result status) |
| system.lakeflow.job_task_run_timeline | Per-task run timeline (duration and status of each task) |
Aggregating these tables in SQL surfaces insights like "the jobs with the highest failure rate over the past 30 days", "tasks whose average runtime is trending upward", and "cluster utilization spikes during specific time windows". Combined with Databricks SQL dashboards, you can build operations-team-facing job health monitoring.
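As one possible starting point for the failure-rate insight, the sketch below builds a query against system.lakeflow.job_run_timeline. The column names (workspace_id, job_id, result_state, period_end_time) and the 'FAILED' state value are assumptions based on my understanding of the system table schema, so check them against your workspace before relying on the numbers. The function failure_rate_query is hypothetical. In a Notebook you would run the returned string with spark.sql.

```python
def failure_rate_query(days: int = 30, limit: int = 10) -> str:
    """Build a SQL query ranking jobs by failure rate over the last `days` days."""
    if days <= 0 or limit <= 0:
        raise ValueError("days and limit must be positive integers")
    return f"""
SELECT
  workspace_id,
  job_id,
  COUNT(*) AS finished_runs,
  SUM(CASE WHEN result_state = 'FAILED' THEN 1 ELSE 0 END) AS failed_runs,
  ROUND(AVG(CASE WHEN result_state = 'FAILED' THEN 1.0 ELSE 0.0 END), 3) AS failure_rate
FROM system.lakeflow.job_run_timeline
WHERE result_state IS NOT NULL  -- assumed: only rows for finished runs carry a state
  AND period_end_time >= current_timestamp() - INTERVAL {int(days)} DAYS
GROUP BY workspace_id, job_id
ORDER BY failure_rate DESC, failed_runs DESC
LIMIT {int(limit)}
""".strip()


query = failure_rate_query(days=30, limit=10)
print(query)

# In a Databricks Notebook (not runnable locally):
#   display(spark.sql(query))
```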
On the Data Engineer Associate exam, Lakeflow Jobs appears in the "Production Pipelines" domain (about 16%). The following patterns show up frequently.
The exam frequently includes distractor choices that conflate "DLT pipeline", "Lakeflow Jobs", and "standalone Notebook run". Be sure to organize the purpose and applicability of each in your head.
Data Engineer Associate – Production Pipelines
Question 1
A data engineer operates a production multi-task job. In last night's run, only the third of five tasks (Transform_Silver) failed due to a transient network issue. Tasks 1 and 2 succeeded, and tasks 4 and 5 were skipped. To efficiently re-run only the failed task and its downstream tasks without re-running the already-successful tasks, which feature should be used?
Correct answer: B
Repair Run re-runs only the failed task and its downstream tasks. Already-successful tasks (Ingest, Transform_Bronze) are not re-executed, saving compute cost and runtime. A full job re-run is inefficient because it re-executes all tasks, including ones that already succeeded. A manual Notebook run loses DAG dependency management and does not trigger downstream tasks automatically.
What is the difference between Lakeflow Jobs and the old Databricks Workflows name?
There is no functional difference. In late 2024, Databricks consolidated its branding by renaming Workflows/Jobs to 'Lakeflow Jobs' and Delta Live Tables to 'Lakeflow Declarative Pipelines'. The UI labels and documentation URLs are being updated gradually, but the REST API endpoints (/api/2.1/jobs/) and CLI command structure still use the old names. Exams may use either name, so you should recognize that both old and new labels refer to the same feature.
When should I use dbutils.jobs.taskValues vs. Job Parameters?
Job Parameters are static values passed in externally at job start time and are visible to every task. By contrast, dbutils.jobs.taskValues.set() / get() are for runtime data hand-off between tasks. For example, when an upstream task computes a row count or partition key that a downstream task needs, use taskValues. Job Parameters are best for fixed values or dynamic references (such as {{job.start_time}}) configured at schedule definition or API invocation time.
How heavily is Lakeflow Jobs tested on the Data Engineer Associate exam?
On the Data Engineer Associate Exam Guide, the 'Production Pipelines' domain accounts for about 16% of the exam, and Lakeflow Jobs is the core topic of that domain. Expect questions on multi-task job structure, scheduling, retry settings, and the rationale for choosing between Job Clusters and All-Purpose Clusters. Rather than rote memorization of feature names, the exam focuses on judgment-style questions like 'Which setting is most appropriate in this scenario?'
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.