The Databricks ML exams (Associate / Professional) cover the entire machine learning lifecycle: MLflow, AutoML, Feature Store, distributed training, and Model Serving. This article organizes the domain-by-domain weight tables and typical question patterns, then provides 3 domain-specific practice questions with detailed explanations.
MLA (ML Associate) is a 45-question, 120-minute exam that tests ML implementation skills on Databricks. The 2026 update increased the weight of the Unity Catalog-integrated Model Registry and aliases.
| Domain | Weight | Main Topics |
|---|---|---|
| MLflow | ~30% | Experiment Tracking, autolog, Model Registry, aliases, Signature |
| EDA & Feature Engineering | ~20% | pandas/Spark DataFrame, missing value handling, categorical variables, Feature Store |
| Model Training & Evaluation | ~20% | scikit-learn, classification/regression metrics, overfitting countermeasures, AutoML |
| Model Deployment | ~15% | Batch inference, Model Serving REST API, A/B test concepts |
| Distributed Training & Scaling | ~15% | Pandas API on Spark, Hyperopt SparkTrials, pandas UDF |
MLP (ML Professional) is the advanced 60-question, 120-minute exam that tests your ability to design and operate production ML systems. Long-form scenario questions dominate, and you are expected to make design decisions that span multiple domains.
| Domain | Weight | Main Topics |
|---|---|---|
| MLOps & Model Lifecycle | ~30% | CI/CD for ML, model promotion via aliases, Webhook triggers |
| Distributed Training & Large-Scale Data | ~20% | TorchDistributor, Horovod, DeepSpeed, data parallelism / model parallelism |
| Feature Engineering & Monitoring | ~20% | Feature Store Point-in-Time Lookup, Lakehouse Monitoring, drift detection |
| Model Serving & Inference Optimization | ~15% | Serverless Serving, GPU Serving, batch vs real-time inference |
| Experiment Management & Reproducibility | ~15% | MLflow Projects, environment reproducibility (Conda/Docker), hyperparameter management |
mlflow.autolog() automatically records parameters, metrics, and models for supported libraries (scikit-learn, XGBoost, LightGBM, PyTorch, etc.). A common exam pattern asks what autolog() does not record. The key gotcha is that custom business metrics (e.g., profit margin or cost-weighted scores) must still be logged manually with mlflow.log_metric().
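As a rough illustration of the manual step, the sketch below computes a cost-weighted error and hands it to a logging callback. The function names, the cost values, and the `log_fn` parameter are all hypothetical; the default callback is `print` so the snippet runs without MLflow installed.

```python
from typing import Callable, Sequence


def cost_weighted_error(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    fp_cost: float = 1.0,   # illustrative cost of a false positive
    fn_cost: float = 5.0,   # illustrative cost of a false negative
) -> float:
    """Average misclassification cost per row for binary labels (1 = positive)."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 0:
            total += fp_cost
        elif pred == 0 and truth == 1:
            total += fn_cost
    return total / len(y_true)


def log_business_metric(key: str, value: float, log_fn: Callable = print) -> float:
    """Send a custom metric to a logger; autolog() would not capture this value."""
    log_fn(key, value)
    return value


y_true = [1, 0, 1, 0]
y_pred = [0, 0, 1, 1]  # one false negative, one false positive
cost = log_business_metric("cost_weighted_error", cost_weighted_error(y_true, y_pred))
```

Inside an active MLflow run, passing `log_fn=mlflow.log_metric` would record the value next to the autologged metrics, since `mlflow.log_metric` accepts a key and a value.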
The defining feature of Databricks AutoML is its "glass-box" approach: each trial is generated as a complete notebook. The exam frequently probes the boundary of what AutoML does automatically versus what it does not. Algorithm selection, preprocessing, and hyperparameter search are automatic; fetching external data, defining custom loss functions, and production deployment are manual.
In Feature Engineering in Unity Catalog, each FeatureLookup passed to create_training_set() uses lookup_key to name the join key between the training data and the feature table. MLP increasingly tests Point-in-Time Lookup (timestamp_lookup_key): retrieving, for each training row, the feature values that were available as of that row's event timestamp rather than the latest values.
Hyperopt's SparkTrials parallelizes hyperparameter search across Spark workers. MLA tests the basics of SparkTrials, while MLP tests how to configure distributed PyTorch training with TorchDistributor (data parallelism) and how to adjust batch size based on the number of GPUs.
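The batch-size adjustment for data parallelism comes down to simple arithmetic, sketched below. The helper names are hypothetical, and the linear learning-rate scaling is a common heuristic assumed here for illustration, not something the exam guide or TorchDistributor prescribes.

```python
def effective_global_batch(per_gpu_batch: int, num_gpus: int) -> int:
    """With data parallelism, each GPU processes its own batch per step."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return per_gpu_batch * num_gpus


def per_gpu_batch_for(global_batch: int, num_gpus: int) -> int:
    """Split a fixed global batch evenly across GPUs."""
    if num_gpus < 1 or global_batch % num_gpus != 0:
        raise ValueError("global_batch must divide evenly across GPUs")
    return global_batch // num_gpus


def linearly_scaled_lr(base_lr: float, num_gpus: int) -> float:
    """Assumed heuristic: scale the learning rate with the global batch size."""
    return base_lr * num_gpus


# Keeping 32 samples per GPU on 4 GPUs gives a global batch of 128.
global_batch = effective_global_batch(per_gpu_batch=32, num_gpus=4)
# Keeping the global batch at 128 on 8 GPUs means 16 samples per GPU.
per_gpu = per_gpu_batch_for(global_batch=128, num_gpus=8)
```

Which of the two you hold constant is a design choice: fixing the per-GPU batch maximizes throughput but changes the optimization dynamics, while fixing the global batch keeps results closer to the single-GPU run.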
Databricks Model Serving offers three endpoint types: CPU Serverless, GPU Serving, and External Models (a proxy for external LLMs such as OpenAI). A frequent question pattern asks you to choose Serverless Serving for low-latency real-time inference, and batch inference via score_batch() for large-scale batch processing.
Below is one question each from MLflow, Feature Store, and distributed training. After answering, review the explanation and the related concepts to lock in the knowledge.
MLflow
Question 1
In MLflow Model Registry in Unity Catalog, which is the recommended procedure for switching the production model to a new version?
Correct answer: B
The Unity Catalog-integrated Model Registry manages model versions with aliases instead of the legacy Staging/Production stages. The recommended flow is: set the challenger alias on the new version → validate quality via A/B or shadow testing → if successful, move the champion alias to the new version. Alias changes take effect immediately, and any workload that loads the model through an alias URI such as models:/model_name@champion resolves to the new version the next time it loads the model. Option A is the old Workspace Model Registry approach and is no longer recommended in the Unity Catalog-integrated version.
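The challenger-to-champion flow can be pictured with a toy in-memory registry. `AliasRegistry`, the model name `ml.prod.churn`, and the version numbers are hypothetical stand-ins, not the MLflow API; in MLflow itself the alias is moved with `MlflowClient.set_registered_model_alias(name, alias, version)`.

```python
class AliasRegistry:
    """Toy stand-in for alias bookkeeping; one alias points to exactly one version."""

    def __init__(self) -> None:
        self._aliases: dict[tuple[str, str], int] = {}

    def set_alias(self, name: str, alias: str, version: int) -> None:
        # Re-pointing an existing alias simply overwrites the old target.
        self._aliases[(name, alias)] = version

    def resolve(self, uri: str) -> int:
        """Resolve a 'models:/<name>@<alias>' URI to a version number."""
        prefix = "models:/"
        if not uri.startswith(prefix) or "@" not in uri:
            raise ValueError(f"not an alias URI: {uri}")
        name, alias = uri[len(prefix):].rsplit("@", 1)
        return self._aliases[(name, alias)]


registry = AliasRegistry()
registry.set_alias("ml.prod.churn", "champion", 3)    # current production version
registry.set_alias("ml.prod.churn", "challenger", 4)  # candidate under validation

before = registry.resolve("models:/ml.prod.churn@champion")

# Validation passed: move the champion alias to the new version.
registry.set_alias("ml.prod.churn", "champion", 4)
after = registry.resolve("models:/ml.prod.churn@champion")
```

The consumer's URI never changes; only the alias target does, which is why promotion needs no code change on the inference side.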
Feature Store
Question 2
Why would you specify timestamp_lookup_key when calling create_training_set() in Feature Engineering in Unity Catalog?
Correct answer: B
timestamp_lookup_key is the parameter that enables Point-in-Time Lookup. In time-series ML there is a fundamental data-leakage constraint: you must not use feature values from the future to predict at a given point in time. When you set timestamp_lookup_key to the event timestamp column, only the latest feature values that were available at each training row's timestamp are joined, preventing future information from leaking in. A typical example is purchase prediction, where only user features computed before the target date should be used. This is a frequent topic on MLP in particular.
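The join semantics can be sketched in plain Python as an as-of lookup. This is a minimal illustration, not the Feature Engineering client: the function name and the sample data are hypothetical, and the sketch assumes a feature stamped exactly at the event time counts as available.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def point_in_time_lookup(
    feature_rows: Sequence[Tuple[str, int]], event_ts: str
) -> Optional[int]:
    """Return the latest feature value with timestamp <= event_ts.

    feature_rows must be sorted by timestamp; ISO date strings sort
    chronologically, so plain string comparison is enough here.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, event_ts)
    return feature_rows[idx - 1][1] if idx > 0 else None


# Hypothetical snapshots of one user's 30-day purchase count.
purchases_30d = [("2025-01-01", 2), ("2025-01-10", 4), ("2025-01-20", 7)]

# A label observed on 2025-01-12 may only see the 2025-01-10 snapshot,
# never the 2025-01-20 value that did not exist yet.
value_at_event = point_in_time_lookup(purchases_30d, "2025-01-12")
```

A naive join on the user key alone would attach the 2025-01-20 value (7) to every row, which is exactly the future leakage the timestamp key prevents.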
Distributed Training
Question 3
What is the most important difference between using SparkTrials and Trials (single node) in Hyperopt?
Correct answer: B
SparkTrials is Hyperopt's parallelization backend: it distributes trials across Spark worker nodes, with each individual trial running on a single worker. For example, running a TPE search with max_evals=100 and SparkTrials(parallelism=10) executes up to 10 trials in parallel and, in the ideal case, completes in roughly one tenth the time of a single-node run. Trials, on the other hand, runs sequentially on the driver, so all 100 trials run in series. The search algorithm (TPE, random, etc.) can be specified for either, so A is wrong. There is no GPU/CPU restriction either, so C is wrong. On Databricks, combining either with mlflow.autolog() automatically logs the results to MLflow.
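The speed-up figure follows from counting waves of trials. The helper below is a back-of-the-envelope sketch with a hypothetical name; it assumes every trial takes the same time and ignores Spark scheduling overhead, so real runs will be slower than this estimate.

```python
import math


def ideal_wall_time(max_evals: int, parallelism: int, seconds_per_trial: float) -> float:
    """Idealized wall-clock time when trials run in equal-length waves."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    waves = math.ceil(max_evals / parallelism)
    return waves * seconds_per_trial


# 100 trials at an assumed 60 seconds each.
sequential = ideal_wall_time(max_evals=100, parallelism=1, seconds_per_trial=60)
parallel = ideal_wall_time(max_evals=100, parallelism=10, seconds_per_trial=60)
speedup = sequential / parallel
```

Because the wave count is rounded up, a parallelism that does not divide max_evals evenly (for example 100 trials at parallelism 8) leaves some workers idle in the last wave.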
MLflow accounts for about 30% of MLA and underpins the MLOps domain on MLP, which makes it the single most important topic across both exams. On Community Edition, run through mlflow.start_run(), log_param(), log_metric(), and a flavor-specific log_model() such as mlflow.sklearn.log_model(). Walk through the MLflow UI, observe autolog() behavior, and practice registering models and setting aliases in the Model Registry end-to-end.
Lock in the trade-offs between Accuracy, Precision, Recall, F1, and AUC-ROC for classification, and RMSE, MAE, and R² for regression. In particular, the reason Accuracy is inappropriate on imbalanced data (predicting the majority class always yields high accuracy) and the Precision-Recall trade-off (raising one by threshold tuning lowers the other) are frequent exam topics.
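Both pitfalls are easy to reproduce with a few lines of plain Python. The data below is made up for illustration, and precision or recall is reported as 0.0 when its denominator is zero, which is a convention assumed for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def predict_at(scores, threshold):
    """Turn probability-like scores into labels at a given threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Imbalanced data: 5 positives, 95 negatives, and a model that always says 0.
imbalanced_true = [1] * 5 + [0] * 95
baseline = classification_metrics(imbalanced_true, [0] * 100)

# Threshold tuning on a small scored sample.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
loose = classification_metrics(y_true, predict_at(scores, 0.3))
strict = classification_metrics(y_true, predict_at(scores, 0.65))
```

The always-negative baseline reaches 95% accuracy while finding no positives at all, and raising the threshold from 0.3 to 0.65 trades recall for precision on the scored sample.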
Be able to map each phase — data preparation → feature engineering → model training → evaluation → deployment → monitoring — to the Databricks feature you would use. Internalizing the flow Feature Store → AutoML/MLflow → Model Registry → Model Serving → Lakehouse Monitoring is what lets you make the right design call on MLP scenario questions.
Do the MLA and MLP exam domains overlap?
MLA (ML Associate) and MLP (ML Professional) share some exam domains, but the depth of the questions is very different. MLA tests whether you can execute machine learning on Databricks: basic MLflow operations, using AutoML, Feature Store concepts, and core evaluation metrics. MLP tests whether you can design and operate ML systems in production: MLOps design decisions, distributed training optimization, model serving at scale, and A/B test design. MLP assumes you already have the MLA knowledge and is built around practical scenario questions.
What is the difference between MLflow Model Registry in Unity Catalog and Workspace Model Registry?
Workspace Model Registry manages models per workspace and uses stage transitions (Staging → Production → Archived) for lifecycle management. The Unity Catalog-integrated Model Registry lets you share models across the entire account and uses aliases (arbitrary names like champion/challenger) to reference versions instead of stages. As of 2026, Databricks recommends the Unity Catalog-integrated Model Registry, and alias-based management is the central focus on the exam.
How far can Community Edition take you when studying for the ML exams?
Community Edition gives you a free single-node cluster, so you can actually practice MLflow Tracking (experiment logging and autolog), scikit-learn/XGBoost model training, basic Feature Store operations, and AutoML runs. However, Model Serving (endpoint creation), distributed training (multi-node), and Serverless compute are not available, so you will need to supplement those topics with official documentation and question-bank practice.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.