Databricks AutoML: Glass-Box Model Generation (2026)

Databricks AutoML is a managed machine learning capability that automates data preprocessing, model selection, hyperparameter tuning, and evaluation. It supports 3 task types — classification, regression, and time-series forecasting — and runs the entire flow from searching for the best model to generating a reproducible notebook with one click (UI) or one line of code (API). On the ML Associate exam, 10-15% of questions cover AutoML, focusing on API usage, understanding the generated artifacts, and judging when to apply it.

AutoML Processing Flow

AutoML automatically runs the 4 steps below. Understanding what each step does and what it produces is the foundation of exam prep.

┌─────────────────────────────────────────────────────────┐
│                AutoML Processing Flow                    │
│                                                         │
│  Step 1: Data analysis & preprocessing                  │
│  ├─ Missing-value handling (median/mode/imputation)     │
│  ├─ Categorical variable encoding                       │
│  ├─ Numeric feature normalization                       │
│  └─ Automatic feature selection                         │
│          │                                              │
│  Step 2: Model selection                                │
│  ├─ Automatic candidate algorithm selection             │
│  └─ LightGBM / XGBoost / sklearn / Prophet, etc.        │
│          │                                              │
│  Step 3: Hyperparameter tuning                          │
│  ├─ Automated search via Hyperopt                       │
│  └─ Each combination recorded as an MLflow Run          │
│          │                                              │
│  Step 4: Evaluation & ranking                           │
│  ├─ Evaluation via cross-validation                     │
│  └─ Automatic best-model selection                      │
└─────────────────────────────────────────────────────────┘
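
To make Step 1 concrete, here is a minimal pure-Python sketch of median/mode imputation and one-hot encoding. It only illustrates the idea. The helper names impute_column and one_hot are hypothetical, and AutoML itself does this work inside the preprocessing pipeline of the generated notebook, not with this code.

```python
from statistics import median, mode


def impute_column(values):
    """Fill None entries: median for numeric columns, mode for everything else."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = median(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


ages = impute_column([34, None, 51, 29])                 # numeric: median fill
plans = impute_column(["basic", "pro", None, "basic"])   # categorical: mode fill
print(ages)
print(plans)
print(one_hot(plans))
```

The same two decisions (which fill value, which encoding) are what you would typically revisit first when customizing a generated notebook.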

Supported Tasks and Evaluation Metrics

Task	API method	Supported algorithms	Primary metrics
Classification	databricks.automl.classify()	LightGBM, XGBoost, sklearn (LogisticRegression, RandomForest, DecisionTree)	F1 score, accuracy, log_loss, precision, recall
Regression	databricks.automl.regress()	LightGBM, XGBoost, sklearn (LinearRegression, RandomForest, DecisionTree)	RMSE, MAE, R², MSE
Forecasting	databricks.automl.forecast()	Prophet, ARIMA	SMAPE, MSE, RMSE, MAE

Classification and regression automatically try LightGBM, XGBoost, and sklearn algorithms. Forecasting is limited to Prophet and ARIMA — deep-learning-based methods (LSTM, etc.) are not supported by AutoML.
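
For readers who want to see what the primary metrics measure, the sketch below computes F1, RMSE, and SMAPE by hand. These are textbook definitions written for illustration. SMAPE in particular has several variants, and the article does not state which one AutoML reports, so treat the formula here as an assumption.

```python
import math


def f1(actual, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean squared error (lower is better)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE as a fraction in [0, 2]; pairs where both values are 0 add 0."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom:
            total += 2 * abs(a - p) / denom
    return total / len(actual)


print(f1([1, 0, 1, 1], [1, 1, 0, 1]))
print(rmse([0, 0, 0, 0], [2, 2, 2, 2]))
print(smape([100, 100], [100, 300]))
```

Note the direction of each metric: F1 is maximized, while RMSE and SMAPE are minimized. This matters when you sort trials.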

UI Execution Steps

You can run AutoML from the Databricks workspace UI without writing any code.

1. Experiments page: select "Create AutoML Experiment"
2. Dataset: choose a Unity Catalog table or Delta table
3. Prediction target: specify the target column (label)
4. Problem type: choose Classification / Regression / Forecasting (auto-detection is also available)
5. Advanced Configuration: set evaluation metric, excluded columns, timeout, and trial count
6. Click Start to run

After the run completes, you can compare each trial's metrics in the MLflow Experiment UI, then open the best model's notebook to review and edit its contents.

API Execution (Python Code)

import databricks.automl

# Run a classification task
summary = databricks.automl.classify(
    dataset="catalog.schema.customer_data",   # Unity Catalog table
    target_col="churn",                       # target column
    primary_metric="f1",                      # metric to optimize
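    timeout_minutes=30                        # search-time upper bound
    # max_trials is intentionally omitted: it is deprecated and not accepted
    # on Databricks Runtime 11.0 ML and above; bound the search with timeout_minutes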
)

# Inspect the result
print(f"Best trial: {summary.best_trial}")
print(f"Best metric: {summary.best_trial.metrics}")
print(f"MLflow run ID: {summary.best_trial.mlflow_run_id}")

# Load the best model
best_model = summary.best_trial.load_model()

# Regression example
reg_summary = databricks.automl.regress(
    dataset=sales_df,              # a Spark DataFrame also works as input
    target_col="revenue",
    primary_metric="rmse",
    timeout_minutes=60
)

# Forecasting example
forecast_summary = databricks.automl.forecast(
    dataset="catalog.schema.daily_sales",
    target_col="sales_amount",
    time_col="date",                          # time column (required)
    frequency="D",                            # "D"=daily, "W"=weekly, "M"=monthly
    horizon=30,                               # number of periods to forecast, in units of frequency (30 days here)
    identity_col=["store_id"]                 # group key for multi-series forecasting
)

For dataset you can pass a Unity Catalog table name (3-level namespace) or a Spark DataFrame. primary_metric is the metric used for model optimization; the default for classification is F1. The exam tests the parameter names and roles of each API method.
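
One way to memorize which parameters belong to which method is to write the rules down as a small validator. The sketch below is a study aid, not part of the AutoML API. build_automl_kwargs is a hypothetical helper, and the metric names are lowercase forms of the metrics in the table above, so check them against the API reference before relying on them.

```python
# Assumption: metric names mirror the article's table, lowercased.
ALLOWED_METRICS = {
    "classify": {"f1", "accuracy", "log_loss", "precision", "recall"},
    "regress": {"rmse", "mae", "r2", "mse"},
    "forecast": {"smape", "mse", "rmse", "mae"},
}

# Assumption: forecast() needs these in addition to dataset and target_col.
FORECAST_REQUIRED = ("time_col", "horizon")


def build_automl_kwargs(task, dataset, target_col, primary_metric=None,
                        timeout_minutes=None, **task_specific):
    """Validate arguments and return a kwargs dict for one AutoML method."""
    if task not in ALLOWED_METRICS:
        raise ValueError(f"unknown task: {task!r}")
    if primary_metric is not None and primary_metric not in ALLOWED_METRICS[task]:
        raise ValueError(f"{primary_metric!r} is not listed for {task}")
    if task == "forecast":
        missing = [k for k in FORECAST_REQUIRED if k not in task_specific]
        if missing:
            raise ValueError(f"forecast is missing: {missing}")
    kwargs = {"dataset": dataset, "target_col": target_col}
    if primary_metric is not None:
        kwargs["primary_metric"] = primary_metric
    if timeout_minutes is not None:
        kwargs["timeout_minutes"] = timeout_minutes
    kwargs.update(task_specific)  # e.g. time_col, frequency, horizon, identity_col
    return kwargs


print(build_automl_kwargs("classify", "catalog.schema.customer_data", "churn",
                          primary_metric="f1", timeout_minutes=30))
```

On a Databricks ML runtime, the resulting dict could be passed as databricks.automl.classify(**kwargs). Outside Databricks the helper still runs, which makes it convenient for drilling the parameter names.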

Structure and Use of the Generated Notebook

AutoML's biggest differentiator is that it auto-generates a reproducible Python notebook for each trial. It is not a black box — the code is fully exposed, so data scientists can understand and edit it.

What the generated notebook contains

Data loading: loads the input table and splits it into train/test
Preprocessing pipeline: sklearn Pipeline for imputation, encoding, and normalization
Model definition: the selected algorithm and its hyperparameters
Training and evaluation: cross-validation and metric computation
MLflow logging: logs parameters, metrics, and the model
Feature Importance: feature importance visualization via SHAP values

The recommended workflow is to use the generated notebook as a baseline and then customize it with domain knowledge — adding features, changing preprocessing, swapping algorithms, and so on. The exam asks questions like "which of the following is the most appropriate way to use the AutoML-generated notebook?"

MLflow Experiment Integration

Every AutoML trial is automatically recorded as a Run in an MLflow Experiment. You do not need to write MLflow logging code by hand.

Recorded information	Details
Parameters	Hyperparameters of each trial (learning_rate, n_estimators, etc.)
Metrics	Cross-validation results for evaluation metrics (F1, accuracy, RMSE, etc.)
Artifacts	Trained model, Feature Importance plot, generated notebook
Tags	Algorithm name, AutoML version, dataset information

import mlflow

# After the AutoML run, get the best model from the MLflow Experiment
experiment_id = summary.experiment.experiment_id
best_run = mlflow.search_runs(
    experiment_ids=[experiment_id],
    order_by=["metrics.val_f1_score DESC"],
    max_results=1
).iloc[0]

print(f"Best F1: {best_run['metrics.val_f1_score']:.4f}")
print(f"Algorithm: {best_run['tags.estimator_name']}")

# Register the best model in the Model Registry
mlflow.register_model(
    model_uri=f"runs:/{best_run.run_id}/model",
    name="catalog.schema.churn_classifier"
)
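
The order_by string in the query above sorts descending because F1 is a higher-is-better metric. For an error metric such as RMSE the direction flips. The sketch below shows that logic on plain dictionaries so it runs without MLflow. order_by_clause and pick_best are hypothetical helpers, and the key val_rmse is only an example of what an error metric key might look like.

```python
def order_by_clause(metric_key, higher_is_better):
    """Build an MLflow-style order_by entry for a logged metric key."""
    direction = "DESC" if higher_is_better else "ASC"
    return f"metrics.{metric_key} {direction}"


def pick_best(runs, metric_key, higher_is_better):
    """Return the run dict with the best metric value, skipping runs that lack it."""
    scored = [r for r in runs if metric_key in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged {metric_key!r}")
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda r: r["metrics"][metric_key])


# Stand-in for trial records; real runs would come from mlflow.search_runs.
runs = [
    {"run_id": "a1", "metrics": {"val_f1_score": 0.81}},
    {"run_id": "b2", "metrics": {"val_f1_score": 0.87}},
    {"run_id": "c3", "metrics": {}},  # e.g. a trial that failed before logging
]

print(order_by_clause("val_f1_score", higher_is_better=True))
print(pick_best(runs, "val_f1_score", higher_is_better=True)["run_id"])
```

Sorting in the wrong direction silently returns the worst trial, so it is worth checking the direction whenever you change primary_metric.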

AutoML vs. Manual ML Comparison

Comparison	AutoML	Manual ML
Development speed	Get a baseline model in minutes to tens of minutes	Requires days to weeks of development
ML knowledge required	Usable with basic knowledge	Requires deep knowledge of algorithms and tuning
Customizability	Constrained by supported algorithms and preprocessing	Fully flexible
Supported algorithms	LightGBM, XGBoost, sklearn, Prophet	Any framework, including deep learning
Large-scale data	Runs on a single node (applies sampling)	Supports distributed training (Spark ML, Horovod)
Reproducibility	Fully reproducible via the generated notebook	Depends on developer discipline
Recommended use cases	Baseline construction, PoC, data exploration	Maximizing production-model accuracy, custom pipelines

Limitations and Caveats

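Single-node execution: Each model is trained on a single node; AutoML does not support distributed training. Large datasets are sampled to fit, so sample or aggregate them yourself when you need control over which rows are kept.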
No deep learning: CNN, RNN, and Transformer-based models are not part of the search space.
No unstructured data: Images, text, and audio cannot be passed directly (you must extract features up front).
No custom metrics: Only built-in metrics are supported; user-defined metrics cannot be specified.
Forecasting constraints: Only Prophet and ARIMA are available; deep-learning sequence models (N-BEATS, etc.) are not supported.
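
The limitations above can be turned into a quick checklist. The following sketch encodes them as a function that lists the reasons a scenario falls outside AutoML's scope. Scenario and automl_blockers are hypothetical names used only for this illustration, and the rules simply restate the list above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    data_kind: str = "tabular"            # "tabular", "image", "text", or "audio"
    needs_deep_learning: bool = False     # CNN, RNN, Transformer, N-BEATS, ...
    needs_custom_metric: bool = False     # a user-defined evaluation metric
    needs_distributed_training: bool = False


def automl_blockers(scenario):
    """Return the limitations from the list above that this scenario runs into."""
    blockers = []
    if scenario.data_kind != "tabular":
        blockers.append("unstructured data: extract features first")
    if scenario.needs_deep_learning:
        blockers.append("deep learning is outside the search space")
    if scenario.needs_custom_metric:
        blockers.append("only built-in metrics are supported")
    if scenario.needs_distributed_training:
        blockers.append("models are trained on a single node")
    return blockers


print(automl_blockers(Scenario()))  # empty list: AutoML is a reasonable first step
print(automl_blockers(Scenario(data_kind="image", needs_deep_learning=True)))
```

An empty result does not prove AutoML is the best choice. It only means none of the listed limitations rules it out.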

Key Points Tested on the ML Associate Exam

Choosing between API methods: parameters and roles of classify() / regress() / forecast()
Using the generated notebook: the recommended workflow of customizing it as a baseline
Automatic MLflow logging: how every trial is recorded as a Run
When to apply: telling apart scenarios where AutoML fits from those that require manual ML
Understanding the limitations: single-node, supported algorithms, and data-size constraints
Forecast-specific parameters: the role of time_col, frequency, horizon, and identity_col
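
The interaction between frequency, horizon, and identity_col is easier to remember with numbers. The sketch below assumes horizon counts periods of the chosen frequency and that each identity_col value is forecast as its own series. forecast_dates and forecast_row_count are hypothetical helpers, and monthly frequency is left out because it needs calendar arithmetic.

```python
from datetime import date, timedelta

# Only fixed-length frequencies are covered in this sketch.
STEP = {"D": timedelta(days=1), "W": timedelta(weeks=1)}


def forecast_dates(last_observed, frequency, horizon):
    """Dates covered by a forecast of `horizon` periods after the last observation."""
    if frequency not in STEP:
        raise ValueError(f"unsupported frequency in this sketch: {frequency!r}")
    return [last_observed + STEP[frequency] * i for i in range(1, horizon + 1)]


def forecast_row_count(series_ids, horizon):
    """One forecast row per distinct series per period."""
    return len(set(series_ids)) * horizon


daily = forecast_dates(date(2026, 1, 31), "D", 30)
weekly = forecast_dates(date(2026, 1, 31), "W", 30)
print(daily[0], daily[-1])    # 30 daily steps
print(weekly[0], weekly[-1])  # the same horizon reaches much further with "W"
print(forecast_row_count(["store_1", "store_2", "store_1"], 30))
```

The same horizon=30 therefore means 30 days with frequency "D" but 30 weeks with frequency "W".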

Sample Question

AutoML / ML Associate

Question 1

A data scientist built a customer churn prediction model with AutoML. Which combination correctly describes what is produced by an AutoML run?

A. Only the best model. No information about other trials is recorded.
B. MLflow Experiment with Runs for every trial (parameters, metrics, models), plus a reproducible notebook for each trial.
C. The best model and a table of hyperparameters. Notebook generation and MLflow logging must be configured manually.
D. Only the model files for every trial. Metric comparison has to be done in Databricks SQL.

Correct answer: B

Databricks AutoML automatically records every trial as a Run in an MLflow Experiment. Each Run contains hyperparameters, evaluation metrics (cross-validation results), the trained model, and Feature Importance. In addition, a reproducible Python notebook is auto-generated for each trial, containing the data loading, preprocessing, model definition, training, and evaluation code. Options A (only the best model) and C (manual MLflow setup) are wrong, and D is wrong because metric comparison can be done directly in the MLflow UI.

Frequently Asked Questions

How can I use the notebook generated by AutoML?

The notebook generated by AutoML contains code for data preprocessing, feature engineering, model training, and hyperparameter settings. The notebook is fully runnable and freely editable, so the recommended workflow is to use the AutoML result as a baseline and then customize it with domain knowledge (adding features, switching algorithms, adjusting preprocessing).

What dataset sizes does AutoML support?

AutoML trains each model on a single node, so it works best with datasets that fit in that node's memory. AutoML estimates the memory needed to load and train on the dataset and automatically samples it when it would not fit. For very large datasets (billions of rows or more), you should sample or aggregate up front, or consider distributed training frameworks (Spark ML, Horovod, etc.).
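
If you prefer to sample up front, the fraction can be derived from a memory budget. The sketch below is a back-of-the-envelope calculation, not the estimator AutoML uses internally. sampling_fraction is a hypothetical helper, and both byte figures are inputs you would have to estimate yourself.

```python
def sampling_fraction(estimated_bytes, memory_budget_bytes):
    """Fraction of rows to keep so the dataset fits in the given memory budget."""
    if estimated_bytes <= 0 or memory_budget_bytes <= 0:
        raise ValueError("sizes must be positive")
    return min(1.0, memory_budget_bytes / estimated_bytes)


GIB = 1024 ** 3
fraction = sampling_fraction(estimated_bytes=200 * GIB, memory_budget_bytes=50 * GIB)
print(fraction)  # keep a quarter of the rows in this made-up case
```

With a Spark DataFrame, the result could be passed to df.sample(fraction=fraction, seed=42) before handing the data to AutoML. Fixing the seed keeps the sample reproducible.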

How are AutoML results recorded in MLflow?

Every AutoML trial (model candidate) is automatically recorded as a Run in an MLflow Experiment. Each Run includes the hyperparameters, evaluation metrics (accuracy, F1, RMSE, etc.), the trained model, and a link to the generated notebook. You can pick the best model using MLflow UI's metrics comparison, then register it in the Model Registry and proceed to production deployment.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
