RAG (Retrieval-Augmented Generation) is an architecture that combines an LLM's generation capability with retrieval over external knowledge, suppressing hallucinations while producing up-to-date, accurate answers. On the Databricks GenAI Engineer exam, roughly 30-40% of the questions cover RAG, testing your understanding of architecture design, component selection, and evaluation methods.
A RAG pipeline consists of two stages: a retrieval phase and a generation phase. The user's query is converted into an Embedding, Vector Search retrieves related documents, and the prompt is then sent to the LLM together with that context.
┌─────────────────────────────────────────────────────────────┐
│                        RAG Pipeline                         │
│                                                             │
│  User Query                                                 │
│      │                                                      │
│      ▼                                                      │
│  ┌──────────────┐    ┌──────────────────┐                   │
│  │  Embedding   │───▶│  Vector Search   │                   │
│  │    Model     │    │   (similarity)   │                   │
│  └──────────────┘    └───────┬──────────┘                   │
│                              │ Top-K chunks                 │
│                              ▼                              │
│                      ┌──────────────────┐                   │
│                      │ Prompt Template  │                   │
│                      │ (Query+Context)  │                   │
│                      └───────┬──────────┘                   │
│                              │                              │
│                              ▼                              │
│                      ┌──────────────────┐                   │
│                      │ LLM (generation) │                   │
│                      └───────┬──────────┘                   │
│                              │                              │
│                              ▼                              │
│                          Response                           │
└─────────────────────────────────────────────────────────────┘

This pipeline design lets the LLM access information it did not see at training time (internal documents, fresh data, etc.) and produce grounded answers.
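The retrieve-then-generate flow can be sketched end to end in plain Python. This is a minimal illustration, not the Databricks implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for Vector Search, and the sample chunks and function names are assumptions made for the example. The LLM call itself is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Retrieval phase: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Generation phase input: the query plus the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "Cluster policies define constraints on cluster creation settings.",
    "Delta tables store data in the Parquet format with a transaction log.",
    "Vector Search returns the chunks most similar to a query embedding.",
]
query = "What do cluster policies define?"
top_chunks = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

In a real pipeline the prompt would then be sent to the LLM; only the retrieval and prompt-assembly steps are modeled here.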
Before storing documents in Vector Search, you need to split them into appropriately sized chunks. Chunking strategy directly affects retrieval accuracy, so the right choice depends on the characteristics of the document.
| Strategy | Split Criteria | Pros | Cons | When to Use |
|---|---|---|---|---|
| Fixed-size | Fixed token count (e.g., 512 tokens) | Simple to implement, fast to process | May cut sentences in the middle | Uniform documents (FAQs, logs) |
| Semantic | Semantic sentence boundaries | High semantic coherence | Additional Embedding model cost | Technical documents, papers |
| Recursive | Hierarchical split: paragraph → sentence → token | Preserves structure while controlling size | Requires parameter tuning | Markdown and structured HTML documents |
The GenAI Engineer exam asks which chunking strategy to pick based on the type of document. For example, Recursive fits a document with heading structure (like an internal wiki), while Fixed-size suits a collection of short FAQs.
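To make the difference between Fixed-size and Recursive chunking concrete, here is a small sketch of both. It is a simplification under stated assumptions: whitespace-separated words stand in for model tokens, sizes for the recursive splitter are measured in characters, and the function names and default separators are illustrative rather than taken from any particular library.

```python
def fixed_size_chunks(text, size=8, overlap=2):
    """Split into windows of `size` words; neighbours share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    tokens = text.split()
    step = size - overlap
    # Skip a trailing window that would contain only already-covered words.
    return [
        " ".join(tokens[i:i + size])
        for i in range(0, len(tokens), step)
        if i == 0 or i + overlap < len(tokens)
    ]


def recursive_chunks(text, max_chars=200, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first; use finer ones only when needed."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard character cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # the piece still fits, keep packing
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too large: split it with a finer separator.
            chunks.extend(recursive_chunks(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\nSecond paragraph that is a bit longer than the limit allows."
print(fixed_size_chunks("a b c d e f g h i j", size=4, overlap=1))
print(recursive_chunks(doc, max_chars=30))
```

The fixed-size splitter ignores paragraph boundaries entirely, while the recursive one keeps the short first paragraph intact and only breaks the long one at word boundaries.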
Embedding models convert text into a vector space and form the foundation of similarity search. On Databricks, you can use models served by the Foundation Model API or call external APIs.
| Model | Dimensions | Japanese Support | Notes |
|---|---|---|---|
| BGE-large-en | 1024 | Limited | Open-source, self-hostable |
| Instructor | 768 | Limited | Task-instruction-aware Embedding |
| OpenAI text-embedding-3 | 256-3072 (variable) | Supported | High accuracy, usage-based API pricing |
| GTE-large (provided by Databricks) | 1024 | Limited (the served `gte-large-en` model is English-focused) | Ready to use via the Foundation Model API |
If you want to stay entirely within Databricks, the GTE model on the Foundation Model API is the easiest to integrate. You can also use the External Model feature to call external APIs such as OpenAI.
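As a rough sketch of how texts might be embedded through a served model, the helper below wraps a deployments-style client. Several things here are assumptions rather than facts from this article: the endpoint name `databricks-gte-large-en`, the request shape `{"input": [...]}`, and the OpenAI-style response layout with a `data` list. `FakeClient` is a hypothetical stand-in so the example runs without a workspace; in practice the client would come from `mlflow.deployments.get_deploy_client("databricks")`.

```python
def embed_texts(client, texts, endpoint="databricks-gte-large-en"):
    """Return one embedding vector per input text, in input order."""
    response = client.predict(endpoint=endpoint, inputs={"input": texts})
    # Assumed payload shape: {"data": [{"index": 0, "embedding": [...]}, ...]}
    rows = sorted(response["data"], key=lambda row: row.get("index", 0))
    return [row["embedding"] for row in rows]


class FakeClient:
    """Hypothetical stand-in for a deployments client; returns 3-dim vectors."""

    def predict(self, endpoint, inputs):
        return {
            "data": [
                {"index": i, "embedding": [float(len(text)), 0.0, 1.0]}
                for i, text in enumerate(inputs["input"])
            ]
        }


vectors = embed_texts(FakeClient(), ["cluster policies", "vector search"])
print(len(vectors), len(vectors[0]))
```

Passing the client in as a parameter keeps the helper testable and lets the same code point at either a Foundation Model API endpoint or an External Model endpoint.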
Databricks Vector Search is a managed vector database that offers similarity search under Unity Catalog-integrated access control. The choice of index type strongly affects how the RAG architecture is operated and kept up to date.
| Item | Delta Sync Index | Direct Vector Access Index |
|---|---|---|
| Data Source | Delta Table (auto-synced) | Direct writes via REST API |
| Update Mode | Auto-updates when the source table changes | Manually insert and update vectors |
| Embedding | Auto-computed (specify a model) or precomputed column | Precomputed vectors only |
| When to Use | RAG over internal documents (periodic updates) | Real-time integration with external systems |
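The two embedding modes of a Delta Sync Index (auto-computed from a text column versus a precomputed vector column) can be sketched as a small argument builder. Treat this as an assumption-laden illustration: the keyword names mirror what the `databricks-vectorsearch` client's `create_delta_sync_index` is understood to accept, but they should be checked against the current SDK documentation, and the endpoint, table, and column names are hypothetical.

```python
def delta_sync_index_args(endpoint_name, index_name, source_table_name, primary_key,
                          *, text_column=None, embedding_endpoint=None,
                          vector_column=None, vector_dimension=None,
                          pipeline_type="TRIGGERED"):
    """Build keyword arguments for a Delta Sync Index in one embedding mode."""
    if (text_column is None) == (vector_column is None):
        raise ValueError("Specify exactly one of text_column or vector_column")
    args = {
        "endpoint_name": endpoint_name,
        "index_name": index_name,
        "source_table_name": source_table_name,
        "primary_key": primary_key,
        "pipeline_type": pipeline_type,
    }
    if text_column is not None:
        # Auto-computed embeddings: the index calls a model endpoint itself.
        if embedding_endpoint is None:
            raise ValueError("text_column requires embedding_endpoint")
        args["embedding_source_column"] = text_column
        args["embedding_model_endpoint_name"] = embedding_endpoint
    else:
        # Precomputed embeddings: the source table already holds the vectors.
        if vector_dimension is None:
            raise ValueError("vector_column requires vector_dimension")
        args["embedding_vector_column"] = vector_column
        args["embedding_dimension"] = vector_dimension
    return args


args = delta_sync_index_args(
    endpoint_name="docs_endpoint",            # hypothetical names throughout
    index_name="main.rag.docs_index",
    source_table_name="main.rag.docs_chunks",
    primary_key="chunk_id",
    text_column="chunk_text",
    embedding_endpoint="databricks-gte-large-en",
)
print(args)
# In a workspace with databricks-vectorsearch installed, the call would look like:
# from databricks.vector_search.client import VectorSearchClient
# VectorSearchClient().create_delta_sync_index(**args)
```

A Direct Vector Access Index has no source table to sync from, so it only fits the precomputed-vector path and needs explicit upserts instead.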
In RAG, the prompt you hand to the LLM determines answer quality. Passing the retrieved context and the user's query in a structured way helps suppress hallucinations.
# Example RAG prompt template
prompt_template = """
You are an assistant that answers based on internal documents.
Use ONLY the context below to answer the question.
If the context does not contain the answer, reply "Information not found."
## Context
{context}
## Question
{query}
## Answer
"""
# Running it on the Databricks Foundation Model API
import mlflow.deployments
# retrieved_context (the Top-K chunks joined into one string) and user_query
# are assumed to have been produced by the retrieval step.
client = mlflow.deployments.get_deploy_client("databricks")
response = client.predict(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    inputs={
        "messages": [
            {"role": "system", "content": "Answer based on the internal documents."},
            {"role": "user", "content": prompt_template.format(
                context=retrieved_context,
                query=user_query,
            )},
        ],
        "max_tokens": 1024,
        "temperature": 0.1,  # low temperature keeps the answer close to the context
    },
)

Setting temperature low (0.0-0.2) makes it easier to produce answers that stay faithful to the context. The exam also tests whether you recognize why the prompt template should include a constraint such as "do not use information outside the context".
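One way to pass retrieved context "in a structured way" is to number each chunk and attach its source, so the answer can point back to a document. The sketch below assumes each chunk is a dict with `text` and `source` keys and uses a character budget as a crude proxy for the context window; the template, helper names, and example URL are illustrative, not part of any Databricks API.

```python
PROMPT_TEMPLATE = """Use ONLY the context below to answer the question.
If the context does not contain the answer, reply "Information not found."
## Context
{context}
## Question
{query}
## Answer
"""


def format_context(chunks, max_chars=4000):
    """Number each chunk and append its source, stopping at the size budget."""
    blocks, used = [], 0
    for number, chunk in enumerate(chunks, start=1):
        block = f"[{number}] {chunk['text']} (source: {chunk['source']})"
        if used + len(block) > max_chars:
            break  # stop before the context budget is exceeded
        blocks.append(block)
        used += len(block)
    return "\n".join(blocks)


def build_prompt(query, chunks):
    """Fill the template; make an empty retrieval result explicit to the LLM."""
    context = format_context(chunks) or "(no documents retrieved)"
    return PROMPT_TEMPLATE.format(context=context, query=query)


chunks = [
    {"text": "Cluster policies define constraints on cluster creation settings.",
     "source": "https://wiki.example.com/cluster-policies"},
]
prompt = build_prompt("What are cluster policies?", chunks)
print(prompt)
```

Stating explicitly that nothing was retrieved gives the model a clear path to the "Information not found." reply instead of inviting it to answer from its own training data.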
To quantitatively evaluate RAG pipeline quality, use the following three metrics. On Databricks, they are built into MLflow evaluate().
| Metric | What It Measures | Judgment Criterion |
|---|---|---|
| Faithfulness | Whether the answer is faithful to the context | Whether each sentence of the answer is supported by the context |
| Answer Relevance | Whether the answer is relevant to the question | Whether the answer correctly captures the intent of the question |
| Context Precision | Retrieval precision | Whether the retrieved chunks contain ones relevant to the question |
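Of the three metrics, Context Precision is the one that can be approximated without an LLM judge when relevance labels are available. The sketch below computes plain precision@k, the share of the top-k retrieved chunks that are labeled relevant. This is a simplified stand-in: judge-based or rank-weighted variants of context precision exist and may score differently, and the chunk IDs are made up for the example.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved chunks that are labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    return hits / len(top_k)


retrieved = ["c1", "c7", "c3", "c9"]   # ranked output of the retriever
relevant = {"c1", "c3"}                # human-labeled relevant chunks
print(precision_at_k(retrieved, relevant, k=4))  # 2 of 4 are relevant -> 0.5
```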
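import mlflow
import pandas as pd
# Build the evaluation dataset: one row per question.
# retrieved_chunks_list and rag_responses_list are outputs of the RAG pipeline under evaluation.
eval_data = pd.DataFrame({
    "questions": ["What are Databricks cluster policies?", ...],
    "ground_truth": ["A feature that defines constraints on cluster creation settings.", ...],
    "retrieved_context": [retrieved_chunks_list, ...],
    "generated_answers": [rag_responses_list, ...],
})
# Evaluate the precomputed answers (no model is called to generate them).
results = mlflow.evaluate(
    data=eval_data,
    predictions="generated_answers",  # column holding the RAG answers
    targets="ground_truth",           # column holding the reference answers
    model_type="question-answering",
    evaluators="default",
    extra_metrics=[
        # The judge defaults to an OpenAI model; pass model="endpoints:/<endpoint-name>" to use a serving endpoint.
        mlflow.metrics.genai.faithfulness(),
        mlflow.metrics.genai.relevance(),
    ],
    # Map this dataset's column names onto the names the metrics expect.
    evaluator_config={"col_mapping": {"inputs": "questions", "context": "retrieved_context"}},
)
print(results.metrics)
# Aggregated scores appear under keys such as 'faithfulness/v1/mean' and 'relevance/v1/mean'.

MLflow evaluate() uses the LLM-as-a-Judge pattern, having a separate LLM score the answer quality. Each metric is rated on a 1-5 scale and can be used as a threshold-based quality gate for the pipeline.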
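A threshold-based quality gate over the aggregated scores could look like the following sketch. The metric keys follow the `faithfulness/v1/mean` naming shown in the evaluation output; the threshold values and the sample scores are illustrative assumptions on the 1-5 scale, not recommended settings or real results.

```python
def quality_gate(metrics, thresholds):
    """Return (passed, failures); a missing metric counts as a failure."""
    failures = {}
    for name, minimum in thresholds.items():
        score = metrics.get(name)
        if score is None or score < minimum:
            failures[name] = {"score": score, "minimum": minimum}
    return (not failures), failures


# Illustrative values only; in practice this would be results.metrics.
sample_metrics = {"faithfulness/v1/mean": 4.6, "relevance/v1/mean": 3.7}
thresholds = {"faithfulness/v1/mean": 4.0, "relevance/v1/mean": 4.0}

passed, failures = quality_gate(sample_metrics, thresholds)
print(passed, failures)
```

A deployment job could run this check after evaluation and stop the rollout whenever `passed` is false, reporting `failures` as the reason.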
| Dimension | RAG | Fine-tuning |
|---|---|---|
| Knowledge Updates | Reflected immediately by updating documents | Requires retraining |
| Citing Sources | Can cite source documents | Embedded in the model; cannot be cited |
| Cost | Vector Search plus LLM call at inference time | Training cost plus inference cost |
| Latency | Slightly slower due to the retrieval step | Fast: a single model call |
| Hallucinations | Suppressed via context constraints | More likely on questions outside the training data |
| Example Use Cases | Internal Q&A, document search | Code generation, domain-specific style transfer |
Roughly 30-40% of the GenAI Engineer exam covers RAG. It tests more than architectural understanding: you need practical judgment about which component to choose in a given scenario.
RAG / GenAI Engineer
Question 1
A company is building a Q&A chatbot powered by its internal knowledge base (thousands of pages on Confluence). The documents are updated weekly and every answer must include a link to the source page. Which approach best fits these requirements?
Correct answer: B
For a Q&A bot driven by weekly-updated documents, RAG is the best fit because data changes are reflected immediately. With a Delta Sync Index, source-table changes auto-sync into Vector Search, and chunk metadata (e.g., source URLs) is preserved so you can surface source links. Fine-tuning (A) makes citing sources hard and retraining is costly. Stuffing all documents directly (C) exceeds the context window. Few-shot summaries (D) cannot guarantee coverage or accuracy.
When should I use RAG vs Fine-tuning?
RAG is the right choice when you need access to fresh information or external knowledge such as internal documents. Fine-tuning fits when you want to permanently change the model's output format, tone, or domain vocabulary. The Databricks GenAI Engineer exam frequently tests this decision criterion. Hybrid setups that combine both approaches are also effective in production.
What components are required to build RAG on Databricks?
At minimum you need four components: (1) a pipeline that splits documents into chunks, (2) an Embedding model (Foundation Model API or external API), (3) a Vector Search Index (Delta Sync Index or Direct Vector Access Index), and (4) an LLM (Foundation Model API or External Model). It is also recommended to add a quality evaluation pipeline using MLflow evaluate().
What evaluation metrics are used for RAG?
There are three main metrics: Faithfulness (whether the generated answer is faithful to the context), Answer Relevance (whether the answer properly addresses the user's question), and Context Precision (whether the retrieved context contains chunks relevant to the question). On Databricks, these metrics are built into MLflow evaluate() and can be automatically scored using the LLM-as-a-Judge pattern.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.