Hyperparameter Tuning on Databricks: Hyperopt & MLflow (2026)

Hyperparameter tuning is the process of optimizing parameters that aren't learned during training — learning rate, tree depth, regularization strength, and so on. On Databricks, Hyperopt (distributed search via SparkTrials) and Optuna are the main tools, and every trial can be automatically recorded in MLflow Tracking. The ML Associate exam frequently tests the basic Hyperopt API, while ML Professional often asks when to use SparkTrials vs Trials.

Comparing Search Strategies

The algorithms used to find the best combination from a hyperparameter search space fall into three broad categories.

Strategy	How it works	Pros	Cons	Representative tools
Grid Search	Exhaustively tries every specified combination	Highly reproducible; effective when there are few parameters	Search space grows exponentially (curse of dimensionality)	scikit-learn GridSearchCV
Random Search	Samples randomly from the search space	Tends to find good solutions with fewer trials than Grid Search	Does not learn from earlier trials, so it keeps spending trials on unpromising regions	scikit-learn RandomizedSearchCV
Bayesian Optimization	Builds a probabilistic model (such as TPE) from past trials and uses it to propose the next point to try	Usually reaches a good configuration in fewer trials than Random Search; TPE copes well with larger, mixed search spaces	Each suggestion depends on earlier results, so heavy parallelization erodes its advantage	Hyperopt (TPE), Optuna (TPE)

For both practical work and the exam, the most important strategy on Databricks is Bayesian Optimization with TPE (Tree-structured Parzen Estimator). Hyperopt's tpe.suggest and Optuna's default sampler both implement TPE, and they often find strong configurations within roughly 50-200 trials, even when many parameters are searched.
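The snippet below is a minimal standard-library sketch that makes the "curse of dimensionality" row of the table concrete. The parameter names and candidate values are hypothetical. Grid Search's cost is the product of the option counts, so each new parameter multiplies it. Random Search's cost is simply the number of trials you choose to draw.

```python
import itertools
import math
import random

# Hypothetical search space: each parameter gets a short list of candidate values
grid = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [3, 6, 9, 12, 15],
    "min_samples_split": [2, 5, 10, 20],
    "max_features": [0.25, 0.5, 0.75, 1.0],
}

def grid_size(space):
    """Number of combinations Grid Search must evaluate (product of option counts)."""
    return math.prod(len(values) for values in space.values())

def random_configs(space, n_trials, seed=0):
    """Random Search: draw n_trials independent configurations from the same space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n_trials)]

all_grid_points = list(itertools.product(*grid.values()))
print(grid_size(grid), len(all_grid_points))  # 320 320  (4 * 5 * 4 * 4)

# Adding one more 5-valued parameter multiplies the grid by 5
print(grid_size({**grid, "min_samples_leaf": [1, 2, 4, 8, 16]}))  # 1600

# Random Search costs exactly the trial budget, whatever the dimensionality
print(len(random_configs(grid, n_trials=50)))  # 50
```

Bayesian Optimization keeps the fixed trial budget of Random Search but uses completed trials to decide where to sample next.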

Hyperopt Basics

Hyperopt is the Bayesian optimization library built into Databricks. You can run tuning simply by passing an objective function, a search space, an algorithm, and a maximum number of trials to fmin().

Defining the Search Space

Function	Use case	Example
`hp.choice(label, options)`	Categorical values (discrete choices)	`hp.choice("algo", ["rf", "xgb", "lgb"])`
`hp.uniform(label, low, high)`	Uniform distribution (continuous values)	`hp.uniform("dropout", 0.1, 0.5)`
`hp.loguniform(label, low, high)`	Log-uniform distribution (parameters that span orders of magnitude, e.g. learning rate)	`hp.loguniform("lr", log(1e-5), log(1e-1))`
`hp.quniform(label, low, high, q)`	Quantized uniform distribution (integer-valued parameters; values come back as floats such as 8.0, so cast with int())	`hp.quniform("max_depth", 3, 15, 1)`
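Two of these distributions are easy to misuse, so the sketch below reproduces their sampling rules with the standard library only. This is not Hyperopt code. The helper names are hypothetical, and the formulas follow Hyperopt's documented definitions:
- `hp.loguniform` returns exp(uniform(low, high)), so `low` and `high` are given in log space.
- `hp.quniform` returns round(uniform(low, high) / q) * q, as a float.

```python
import math
import random

rng = random.Random(42)

def sample_loguniform(low, high):
    """Like hp.loguniform: exp(uniform(low, high)); low/high are natural-log bounds."""
    return math.exp(rng.uniform(low, high))

def sample_quniform(low, high, q):
    """Like hp.quniform: round(uniform(low, high) / q) * q, returned as a float."""
    return float(round(rng.uniform(low, high) / q) * q)

lrs = [sample_loguniform(math.log(1e-5), math.log(1e-1)) for _ in range(10_000)]

# 1e-3 sits halfway between 1e-5 and 1e-1 on a log scale,
# so roughly half of the draws should fall below it
share_below_1e_3 = sum(lr < 1e-3 for lr in lrs) / len(lrs)
print(share_below_1e_3)  # expected to be close to 0.5

# Floats such as 9.0: cast with int() before passing them to the model
depths = [sample_quniform(3, 15, 1) for _ in range(5)]
print(depths)
```

A plain `hp.uniform(1e-5, 1e-1)` would put about 99% of draws above 1e-3. That is why learning-rate-like parameters use the log scale.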

Hyperopt + MLflow Integration Example

from hyperopt import fmin, tpe, hp, STATUS_OK, SparkTrials
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Example binary-classification data so that scoring="f1" works; replace with your own training set
X_train, y_train = make_classification(n_samples=2000, n_features=20, random_state=42)

# Define the search space
# (RandomForestClassifier has no learning_rate parameter, so none is searched here)
search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
    "max_depth": hp.quniform("max_depth", 3, 15, 1),
    "min_samples_split": hp.quniform("min_samples_split", 2, 20, 1),
    "max_features": hp.uniform("max_features", 0.2, 1.0),  # fraction of features per split
}

# Objective function (returns the value to minimize)
def objective(params):
    # hp.quniform returns floats such as 8.0; scikit-learn needs ints for these parameters
    params = dict(params)
    for name in ("n_estimators", "max_depth", "min_samples_split"):
        params[name] = int(params[name])

    clf = RandomForestClassifier(**params, random_state=42)
    score = cross_val_score(clf, X_train, y_train, cv=3, scoring="f1").mean()

    # Log metrics to MLflow (with SparkTrials on Databricks, this is recorded on the trial's child run)
    mlflow.log_metrics({"f1_cv": score})

    # Return STATUS_OK and loss (loss is minimized, so negate the score to maximize F1)
    return {"loss": -score, "status": STATUS_OK}

# Run distributed with SparkTrials
spark_trials = SparkTrials(parallelism=8)

with mlflow.start_run(run_name="hyperopt_rf_tuning"):
    best_params = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,       # TPE (Bayesian Optimization)
        max_evals=100,          # up to 100 trials
        trials=spark_trials,    # run in parallel on Spark executors
    )

# fmin returns the raw sampled values (e.g. max_depth as 9.0); cast them before refitting
print(best_params)

The objective function must return a dict of the form {"loss": value, "status": STATUS_OK}. Because fmin minimizes loss, return the negated score (-score) when you want to maximize a metric such as F1.
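The loss/status contract can be seen without a cluster. The sketch below is a toy random-search minimizer written with the standard library only. `toy_minimize` and `toy_score` are hypothetical stand-ins for `fmin` and a cross-validated F1; they are not Hyperopt APIs. `STATUS_OK` is set to `"ok"`, the string value that `hyperopt.STATUS_OK` holds.

```python
import random

STATUS_OK = "ok"  # same string value as hyperopt.STATUS_OK

def toy_score(x):
    """Hypothetical score to MAXIMIZE (stands in for a cross-validated F1); peaks at x = 0.3."""
    return 1.0 - (x - 0.3) ** 2

def objective(params):
    score = toy_score(params["x"])
    # The minimizer only looks at "loss", so negate a score you want to maximize
    return {"loss": -score, "status": STATUS_OK}

def toy_minimize(fn, sample, max_evals, seed=0):
    """Toy stand-in for fmin using random search: keeps the params with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = sample(rng)
        result = fn(params)
        if result["status"] == STATUS_OK and result["loss"] < best_loss:
            best_params, best_loss = params, result["loss"]
    return best_params, best_loss

best, loss = toy_minimize(objective, lambda rng: {"x": rng.uniform(0.0, 1.0)}, max_evals=200)
print(best, -loss)  # x close to 0.3, score close to 1.0
```

If the objective returned `+score` instead, the minimizer would drift toward the worst configuration. This is the most common mistake with this API.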

SparkTrials vs Trials

Which trials class you pass to fmin() determines Hyperopt's execution mode. Whether the search uses the whole cluster or only the driver makes a large difference in tuning time.

Item	Trials	SparkTrials
Execution location	Driver node (single machine)	Spark executors (entire cluster)
Parallelism	Sequential execution only	Controlled by the parallelism parameter
Suitable models	Distributed training algorithms (Spark MLlib, Horovod), or small single-machine jobs	Single-machine ML such as scikit-learn (each trial runs independently on an executor)
MLflow integration	Manual logging required	Each trial is automatically logged as a nested run
Recommended parallelism	—	Match the cluster's worker count, or use the square root of max_evals
Caveats	Even 100 trials run sequentially on a single driver	Too-high parallelism erodes TPE's sequential-optimization advantage

SparkTrials' parallelism involves a tradeoff. Higher values run more trials at once, but TPE uses completed results to choose the next point. When parallelism is too high, many points are chosen while most in-flight trials have not yet returned, so the search behaves more like random search. A commonly used rule of thumb is to set parallelism to roughly the square root of max_evals.
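The arithmetic behind this tradeoff can be sketched in plain Python. The model is simplified: it treats the search as "waves" of `parallelism` trials, while SparkTrials in practice dispatches trials more continuously. The helper names and the `task_slots` cap are hypothetical, and the square-root rule is a heuristic rather than a Hyperopt default.

```python
import math

def suggested_parallelism(max_evals, task_slots):
    """Rule-of-thumb sketch: about sqrt(max_evals), capped by the cluster's concurrent task slots."""
    return max(1, min(task_slots, round(math.sqrt(max_evals))))

def adaptive_rounds(max_evals, parallelism):
    """Approximate number of 'waves'; TPE can only learn from trials finished before each wave."""
    return math.ceil(max_evals / parallelism)

for p in (1, 8, 10, 100):
    print(p, adaptive_rounds(100, p))
# 1 -> 100 (fully sequential), 8 -> 13, 10 -> 10, 100 -> 1 (all points chosen blind, like random search)

print(suggested_parallelism(100, task_slots=8))   # 8: capped by the cluster
print(suggested_parallelism(100, task_slots=32))  # 10: sqrt(100)
```

The fewer rounds there are, the fewer times TPE can update its model before the budget runs out.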

Optuna Basics

Optuna is a Bayesian optimization framework developed by Japan-based Preferred Networks. Unlike Hyperopt, it supports pruning (early termination) out of the box, letting you abort unpromising trials mid-flight to cut compute costs.

Key Optuna APIs

API	Role
`optuna.create_study(direction)`	Create an optimization study ("minimize" or "maximize")
`study.optimize(objective, n_trials)`	Run the specified number of optimization trials
`trial.suggest_int(name, low, high)`	Search an integer parameter
`trial.suggest_float(name, low, high, log)`	Search a float parameter (log=True for log scale)
`trial.suggest_categorical(name, choices)`	Search categorical values
`trial.report(value, step)`	Report intermediate values (used for pruning decisions)
`trial.should_prune()`	Pruning check (True means terminate early)
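In Optuna itself, pruning is wired up as follows:
- Pass a pruner such as `optuna.pruners.MedianPruner()` to `create_study`.
- Call `trial.report(value, step)` inside the objective.
- Raise `optuna.TrialPruned()` when `trial.should_prune()` returns True.

The standard-library sketch below shows only the decision rule. It is loosely modeled on MedianPruner: prune when the trial's best intermediate value so far is worse than the median of earlier trials at the same step. It is a simplified re-implementation, not Optuna's code, and it does not reproduce Optuna's defaults (for example, its startup-trial and warmup-step settings).

```python
import statistics

def should_prune_median(history, trial_values, step, n_startup_trials=2, maximize=True):
    """Simplified median-pruning rule, loosely modeled on Optuna's MedianPruner.

    history: intermediate values of earlier completed trials (one list per trial, index = step)
    trial_values: intermediate values the current trial has reported so far (steps 0..step)
    """
    peers = [values[step] for values in history if len(values) > step]
    if len(peers) < n_startup_trials:
        return False  # too few completed trials to compare against
    median = statistics.median(peers)
    best_so_far = max(trial_values) if maximize else min(trial_values)
    # Prune when even the trial's best result so far is worse than the median at this step
    return best_so_far < median if maximize else best_so_far > median

# Toy validation-F1 curves of three finished trials (made-up numbers for illustration)
history = [[0.60, 0.70, 0.75], [0.62, 0.72, 0.78], [0.58, 0.68, 0.74]]

print(should_prune_median(history, [0.50, 0.55], step=1))  # True: 0.55 < median 0.70
print(should_prune_median(history, [0.65, 0.73], step=1))  # False: 0.73 > median 0.70
```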

Optuna + MLflow Integration Example

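import optuna
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Example binary-classification data so that scoring="f1" works; replace with your own training set
X_train, y_train = make_classification(n_samples=2000, n_features=20, random_state=42)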

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 15),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 20),
    }

    clf = GradientBoostingClassifier(**params, random_state=42)
    score = cross_val_score(clf, X_train, y_train, cv=3, scoring="f1").mean()

    # Log each trial to MLflow
    with mlflow.start_run(nested=True):
        mlflow.log_params(params)
        mlflow.log_metric("f1_cv", score)

    return score

with mlflow.start_run(run_name="optuna_gbm_tuning"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100)

    # Record the best parameters
    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_f1", study.best_value)

Hyperopt vs Optuna

Item	Hyperopt	Optuna
Search algorithms	TPE, Random Search, Adaptive TPE	TPE, CMA-ES, Random Search, Grid Search, GP
Distributed support on Databricks	Native support via SparkTrials	No SparkTrials equivalent; parallelize with n_jobs, Joblib, or a shared storage backend
Pruning (early termination)	Not supported out of the box	MedianPruner, HyperbandPruner, and others built in
Objective return value	A dict containing loss (to minimize) and STATUS_OK	A scalar value (direction selects maximize/minimize)
MLflow integration	Automatic logging when using SparkTrials	Manual mlflow.start_run(nested=True)
Relation to AutoML	Engine behind Databricks AutoML	Not used by AutoML
Visualization	Use the MLflow UI	Visualize the search process with optuna.visualization
Exam importance	Frequently tested on ML Associate and ML Professional	Rarely tested directly, but useful for conceptual understanding

Relationship with AutoML

Databricks AutoML automatically performs preprocessing, feature engineering, model selection, and hyperparameter tuning once you hand it data. Internally it uses Hyperopt's TPE algorithm to search hyperparameters for each model, and every trial is automatically logged to an MLflow experiment.

Every notebook AutoML generates contains tuning code that uses Hyperopt
You can edit the generated notebooks to expand the search space or add custom preprocessing
A common production pattern is to use AutoML for a baseline and then fine-tune with Hyperopt
Because AutoML results are logged to an MLflow experiment, you can directly compare them with manual tuning results

MLflow Tracking Integration

When you use Hyperopt with SparkTrials, each trial is automatically logged as a child run nested under the parent run. That lets you compare parameters and metrics across all trials in the MLflow UI.

Parent run: created with mlflow.start_run(); records metadata for the entire tuning job
Child runs: SparkTrials automatically logs each trial as a nested run (parameters, loss, status)
Compare feature: plot metrics across child runs to analyze the impact of each parameter
Register the best run's model in the Model Registry to move seamlessly to deployment
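Picking the best trial out of the child runs is a filter-then-minimize step. The sketch below is a minimal standard-library version. The rows imitate what `mlflow.search_runs()` returns: `tags.mlflow.parentRunId` is MLflow's system tag linking a child run to its parent, while the `metrics.loss` column name and all values are made-up assumptions for illustration. `best_child_run` is a hypothetical helper. In a real workspace you would then register the chosen run's logged model, if the trial logged one.

```python
def best_child_run(runs, parent_run_id, metric="metrics.loss"):
    """Return the child run with the lowest loss under the given parent run, or None.

    `runs` imitates rows from mlflow.search_runs(); the column names are assumptions.
    """
    children = [
        r for r in runs
        if r.get("tags.mlflow.parentRunId") == parent_run_id and r.get(metric) is not None
    ]
    if not children:
        return None
    return min(children, key=lambda r: r[metric])  # loss is minimized (loss = -F1)

# Made-up rows: one parent run and three trial (child) runs
runs = [
    {"run_id": "parent", "tags.mlflow.parentRunId": None, "metrics.loss": None},
    {"run_id": "t1", "tags.mlflow.parentRunId": "parent", "metrics.loss": -0.81, "params.max_depth": "6"},
    {"run_id": "t2", "tags.mlflow.parentRunId": "parent", "metrics.loss": -0.86, "params.max_depth": "9"},
    {"run_id": "t3", "tags.mlflow.parentRunId": "parent", "metrics.loss": -0.79, "params.max_depth": "12"},
]

best = best_child_run(runs, "parent")
print(best["run_id"], best["params.max_depth"])  # t2 9
```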

Best Practices for Distributed Tuning

Setting parallelism: use the square root of max_evals as a rule of thumb; for 100 trials, parallelism of around 10 works well. Too high a value hurts TPE's search efficiency
Early stopping: use the early_stop_fn parameter of fmin to stop the search once the target accuracy is reached
Search-space types: use hp.loguniform for learning rate (parameters that span orders of magnitude) and hp.quniform for tree depth (integer parameters)
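Cluster sizing: SparkTrials runs each trial as a single Spark task, so the number of concurrent trials is bounded by the available task slots (CPU cores, or the GPUs available to tasks on a GPU cluster)
Data size and caching: data referenced in the objective is serialized and shipped with every trial; for large training sets, broadcast it with spark.sparkContext.broadcast() or write it to DBFS in advance and load it inside the objective
Ensuring reproducibility: pass rstate=np.random.default_rng(seed) to fmin and set random_state on the model, but note that with SparkTrials the completion order of parallel trials is non-deterministic, so results can still differ between runs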
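The early-stopping rule from this list can be sketched as a small function over the sequence of trial losses. In Hyperopt itself you would pass a callable to `fmin(early_stop_fn=...)`, for example `hyperopt.early_stop.no_progress_loss(20)`. The standard-library version below is an illustrative re-implementation, not Hyperopt's code:
- `early_stop_trial` and its parameters are hypothetical.
- It stops either when a target loss is reached or after `patience` trials without improvement.
- `min_delta` is an absolute threshold, whereas Hyperopt's helper uses a percentage.

```python
def early_stop_trial(losses, patience=20, min_delta=0.0, target_loss=None):
    """Return the 1-based trial number at which the search would stop, or None if it never stops.

    Stops when the best loss reaches target_loss, or when `patience` consecutive trials
    fail to improve the best loss by more than min_delta. Illustrative re-implementation only.
    """
    best = float("inf")
    stale = 0
    for i, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
        if target_loss is not None and best <= target_loss:
            return i  # target reached (loss = -F1, so -0.9 means F1 >= 0.9)
        if stale >= patience:
            return i  # no improvement for `patience` consecutive trials
    return None

print(early_stop_trial([-0.5, -0.6, -0.7, -0.65, -0.66, -0.64], patience=3))  # 6
print(early_stop_trial([-0.5, -0.8, -0.95, -0.9], patience=10, target_loss=-0.9))  # 3
```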

Exam Focus Points

Exam	Scope	Key points
ML Associate	Hyperopt basics	Meaning and usage of fmin, hp.choice, hp.loguniform, and STATUS_OK
ML Associate	Differences between search strategies	Characteristics of Grid Search vs Random Search vs Bayesian Optimization
ML Professional	SparkTrials vs Trials	Difference between distributed and single-machine execution; setting parallelism
ML Professional	MLflow integration	Automatic nested-run logging with SparkTrials and how to compare results
ML Professional	Relationship with AutoML	The fact that AutoML uses Hyperopt internally; using generated notebooks

Check Your Understanding

ML Professional

Question 1

An ML engineer wants to run 100 hyperparameter tuning trials on a scikit-learn random forest on an 8-worker Databricks cluster. Which approach best maximizes cluster resource usage while keeping TPE's search efficiency intact?

A. Pass a Trials object to fmin() and run all 100 trials sequentially on the driver node
B. Pass SparkTrials(parallelism=8) to fmin() and run trials in parallel on each executor. The square root of max_evals (10) is a common rule of thumb for parallelism, but matching the worker count at 8 is acceptable
C. Pass SparkTrials(parallelism=100) to fmin() and run all 100 trials simultaneously
D. Use Optuna's create_study and distribute across Spark executors with Joblib parallelism

Answer: B

SparkTrials distributes trials across Spark executors. parallelism=8 matches the worker count and is close to the square root of max_evals (√100 = 10), so it is a reasonable choice. Option A doesn't use the cluster's resources. Option C, with parallelism=100, launches every trial before any result is available. That removes TPE's advantage of using past results to pick the next point and makes the search effectively random. Option D is not the best fit because Optuna has no SparkTrials-style integration on Databricks.

Frequently Asked Questions

Should I use Hyperopt or Optuna?

If integration with the Databricks ecosystem matters most, Hyperopt is the first choice. Its strengths are cluster-wide distributed tuning via SparkTrials, being the engine behind AutoML, and automatic integration with MLflow Tracking. On the other hand, Optuna fits better when you need pruning (early termination) for efficiency, search algorithms beyond TPE (such as CMA-ES), or when you want to run the same code across other clouds and on-prem. For exam prep, Hyperopt is the priority.

What is the difference between SparkTrials and Trials?

Trials runs trials sequentially on a single machine, using only the driver node's resources. SparkTrials distributes trials across Spark executors and runs them in parallel across the cluster. For example, with SparkTrials(parallelism=8), eight trials can run at the same time when the cluster has at least eight free task slots, which greatly shortens the time to complete 100 trials. This distinction comes up often on the ML Professional exam.

How does hyperparameter tuning relate to AutoML?

Databricks AutoML uses Hyperopt internally to search hyperparameters. AutoML is a higher-level layer that automates preprocessing, feature engineering, model selection, and tuning, and every trial is automatically logged to MLflow. The notebooks AutoML generates contain Hyperopt code, which you can customize to build your own tuning pipeline.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
