What is the RAG Pattern? Implementing Retrieval-Augmented Generation on Databricks

2026-03-21
Updated: 2026-03-27
NicheeLab Editorial Team

RAG (Retrieval-Augmented Generation) is an architecture that combines an LLM's generation capability with retrieval over external knowledge, suppressing hallucinations while producing up-to-date, accurate answers. On the Databricks GenAI Engineer exam, roughly 30-40% of the questions cover RAG, testing your understanding of architecture design, component selection, and evaluation methods.

Overall RAG Architecture

A RAG pipeline consists of two stages: a retrieval phase and a generation phase. The user's query is converted into an embedding, Vector Search retrieves the most similar document chunks, and the query is then sent to the LLM together with that retrieved context.

┌─────────────────────────────────────────────────────────────┐
│                    RAG Pipeline                             │
│                                                             │
│  User Query                                                 │
│      │                                                      │
│      ▼                                                      │
│  ┌──────────────┐    ┌──────────────────┐                   │
│  │ Embedding    │───▶│ Vector Search    │                   │
│  │ Model        │    │ (similarity)     │                   │
│  └──────────────┘    └───────┬──────────┘                   │
│                              │ Top-K chunks                 │
│                              ▼                              │
│                     ┌──────────────────┐                    │
│                     │ Prompt Template  │                    │
│                     │ (Query+Context)  │                    │
│                     └───────┬──────────┘                    │
│                             │                               │
│                             ▼                               │
│                     ┌──────────────────┐                    │
│                     │ LLM (generation) │                    │
│                     └───────┬──────────┘                    │
│                             │                               │
│                             ▼                               │
│                        Response                             │
└─────────────────────────────────────────────────────────────┘

This pipeline design lets the LLM access information it did not see at training time (internal documents, fresh data, etc.) and produce grounded answers.
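
The flow in the diagram can be sketched as a single function that wires the four steps together. This is a minimal illustration, not a Databricks API: `answer_with_rag` and the three `fake_*` stubs are hypothetical names, and in a real pipeline the callables would wrap an embedding endpoint, a Vector Search index, and an LLM endpoint.

```python
from typing import Callable, Sequence

def answer_with_rag(
    query: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Run the retrieval phase, then the generation phase."""
    query_vector = embed(query)                    # 1. embed the query
    chunks = search(query_vector, k)               # 2. retrieve the top-k chunks
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"  # 3. fill the prompt
    return generate(prompt)                        # 4. generate the answer

# Stub components so the sketch runs without any external service.
calls: list[str] = []

def fake_embed(text: str) -> list[float]:
    calls.append("embed")
    return [float(len(text))]

def fake_search(vector: Sequence[float], k: int) -> list[str]:
    calls.append("search")
    return ["Cluster policies constrain cluster settings."][:k]

def fake_generate(prompt: str) -> str:
    calls.append("generate")
    # Echo the first context line to show the answer is built from retrieved text.
    return "ANSWER based on: " + prompt.splitlines()[1]

result = answer_with_rag("What are cluster policies?", fake_embed, fake_search, fake_generate)
print(result)
```

Passing the components in as callables keeps the retrieval and generation phases swappable, which mirrors the component-selection decisions discussed in the rest of this article.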

Comparing Chunking Strategies

Before storing documents in Vector Search, you need to split them into appropriately sized chunks. Chunking strategy directly affects retrieval accuracy, so the right choice depends on the characteristics of the document.

| Strategy | Split Criteria | Pros | Cons | When to Use |
|---|---|---|---|---|
| Fixed-size | Fixed token count (e.g., 512 tokens) | Simple to implement, fast to process | May cut sentences in the middle | Uniform documents (FAQs, logs) |
| Semantic | Semantic sentence boundaries | High semantic coherence | Additional Embedding model cost | Technical documents, papers |
| Recursive | Hierarchical split: paragraph → sentence → token | Preserves structure while controlling size | Requires parameter tuning | Markdown and structured HTML documents |

The GenAI Engineer exam asks which chunking strategy to pick based on the type of document. For example, Recursive fits a document with heading structure (like an internal wiki), while Fixed-size suits a collection of short FAQs.
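
To make the difference between Fixed-size and Recursive concrete, here is a small sketch using only the standard library. It is a simplification under stated assumptions: whitespace-separated words stand in for model tokens, the function names are hypothetical, and unlike production splitters the recursive version does not merge small neighboring pieces back together.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Slide a window of `size` words over the text, sharing `overlap` words between windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

def recursive_chunks(text: str, max_words: int,
                     separators: tuple[str, ...] = ("\n\n", "\n", ". ")) -> list[str]:
    """Split on the coarsest separator first and recurse only into pieces that are still too long."""
    if len(text.split()) <= max_words:
        return [text.strip()] if text.strip() else []
    if not separators:
        # No structure left to exploit, so fall back to fixed-size windows.
        return fixed_size_chunks(text, max_words)
    head, *rest = separators
    chunks = []
    for piece in text.split(head):
        chunks.extend(recursive_chunks(piece, max_words, tuple(rest)))
    return chunks

wiki_page = "# Setup\n\nInstall the CLI. Configure a profile.\n\n# Usage\n\nRun the sync command."
print(fixed_size_chunks("a b c d e f g", size=3, overlap=1))
print(recursive_chunks(wiki_page, max_words=4))
```

On the wiki-style page, the recursive splitter keeps each heading as its own chunk and only breaks the long paragraph at sentence boundaries, whereas a fixed-size window would cut wherever the word count runs out.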

Choosing an Embedding Model

Embedding models convert text into a vector space and form the foundation of similarity search. On Databricks, you can use models served by the Foundation Model API or call external APIs.

| Model | Dimensions | Japanese Support | Notes |
|---|---|---|---|
| BGE-large-en | 1024 | Limited | Open-source, self-hostable |
| Instructor | 768 | Limited | Task-instruction-aware Embedding |
| OpenAI text-embedding-3 | 256-3072 (variable) | Supported | High accuracy, usage-based API pricing |
| GTE-large (provided by Databricks) | 1024 | Limited (the hosted model is English-focused) | Ready to use via the Foundation Model API |

If you want to stay entirely within Databricks, the GTE model on the Foundation Model API is the easiest to integrate. You can also use the External Model feature to call external APIs such as OpenAI.
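
Whichever model you pick, similarity search boils down to comparing vectors of the same dimension, so queries and documents need to be embedded with the same model. The sketch below shows cosine similarity and a top-k ranking in plain Python. The helper names and the three-dimensional vectors are made up for illustration; real embeddings have the dimensions listed in the table.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; raises if their dimensions differ."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every document vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy vectors standing in for model output.
docs = {"faq-1": [1.0, 0.0, 0.0], "faq-2": [0.6, 0.8, 0.0], "faq-3": [0.0, 0.0, 1.0]}
ranked = top_k([1.0, 0.0, 0.0], docs, k=2)
print(ranked)
```

The dimension check is the practical reason you cannot switch embedding models without re-embedding the stored documents.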

Configuring Vector Search

Databricks Vector Search is a managed vector database that offers similarity search under Unity Catalog-integrated access control. The choice of index type largely determines how the RAG pipeline is operated and kept up to date.

| Item | Delta Sync Index | Direct Vector Access Index |
|---|---|---|
| Data Source | Delta Table (auto-synced) | Direct writes via REST API |
| Update Mode | Auto-updates when the source table changes | Manually insert and update vectors |
| Embedding | Auto-computed (specify a model) or precomputed column | Precomputed vectors only |
| When to Use | RAG over internal documents (periodic updates) | Real-time integration with external systems |
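
Once an index exists, querying it from Python might look like the sketch below. It assumes the `databricks-vectorsearch` package and workspace authentication are available; the column names `chunk_text` and `source_url` are hypothetical, and `sample_response` is a hand-written stand-in for the response shape (column names under `manifest`, rows under `result.data_array`) so that the parsing helper can run locally.

```python
def query_index(endpoint_name: str, index_name: str, question: str, k: int = 3) -> dict:
    """Run a similarity search against an existing index (needs a Databricks workspace)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from databricks.vector_search.client import VectorSearchClient

    index = VectorSearchClient().get_index(endpoint_name=endpoint_name, index_name=index_name)
    # query_text assumes the index computes embeddings itself (Delta Sync with a model);
    # with precomputed vectors you would pass query_vector instead.
    return index.similarity_search(
        query_text=question,
        columns=["chunk_text", "source_url"],  # hypothetical column names
        num_results=k,
    )

def rows_as_dicts(response: dict) -> list[dict]:
    """Pair each result row with the column names listed in the response manifest."""
    names = [col["name"] for col in response["manifest"]["columns"]]
    rows = response.get("result", {}).get("data_array") or []
    return [dict(zip(names, row)) for row in rows]

# Hand-written example in the assumed response shape; not real search output.
sample_response = {
    "manifest": {"columns": [{"name": "chunk_text"}, {"name": "source_url"}, {"name": "score"}]},
    "result": {"data_array": [
        ["Cluster policies limit cluster settings.", "https://wiki.example.com/policies", 0.91],
        ["Policies are managed by admins.", "https://wiki.example.com/admin", 0.84],
    ]},
}
hits = rows_as_dicts(sample_response)
print(hits[0]["source_url"])
```

Converting rows to dictionaries early makes it easier to carry metadata such as the source URL through to the prompt and the final answer.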

Prompt Engineering Templates

In RAG, the prompt you hand to the LLM determines answer quality. Passing the retrieved context and the user's query in a structured way helps suppress hallucinations.

# Example RAG prompt template

prompt_template = """
You are an assistant that answers based on internal documents.
Use ONLY the context below to answer the question.
If the context does not contain the answer, reply "Information not found."

## Context
{context}

## Question
{query}

## Answer
"""

# Run it against the Databricks Foundation Model API.
# retrieved_context (the Top-K chunks joined into one string) and user_query
# are assumed to be defined by the retrieval step.
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    inputs={
        "messages": [
            {"role": "system", "content": "Answer based on the internal documents."},
            {"role": "user", "content": prompt_template.format(
                context=retrieved_context,
                query=user_query
            )}
        ],
        "max_tokens": 1024,
        "temperature": 0.1
    }
)

# Assuming an OpenAI-style chat completion payload, the answer text is here:
print(response["choices"][0]["message"]["content"])

Setting temperature low (0.0-0.2) makes it easier to produce answers that stay faithful to the context. The exam also expects you to recognize why the prompt template should include a constraint such as "do not use information outside the context."
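
Passing the context "in a structured way" usually means numbering the chunks and keeping them within a size budget before filling the template. The following is one possible sketch, not the article's own implementation: `format_context`, `build_prompt`, the `text` key, and the character budget are all assumptions made for illustration.

```python
def format_context(chunks: list[dict], max_chars: int = 2000) -> str:
    """Number each chunk and stop adding chunks once the character budget is used up."""
    parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk['text']}"
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n".join(parts)

TEMPLATE = (
    "Use ONLY the context below to answer the question.\n"
    'If the context does not contain the answer, reply "Information not found."\n\n'
    "## Context\n{context}\n\n## Question\n{query}\n\n## Answer\n"
)

def build_prompt(chunks: list[dict], query: str, max_chars: int = 2000) -> str:
    """Fill the template with the numbered, size-limited context and the user's query."""
    return TEMPLATE.format(context=format_context(chunks, max_chars), query=query)

# Hypothetical retrieved chunks, ordered by relevance.
example_chunks = [
    {"text": "Cluster policies limit the settings users can choose."},
    {"text": "Policies are managed by workspace admins."},
]
full_prompt = build_prompt(example_chunks, "Who manages cluster policies?")
short_prompt = build_prompt(example_chunks, "Who manages cluster policies?", max_chars=80)
print(full_prompt)
```

Because chunks arrive ordered by relevance, trimming from the end drops the least relevant text first when the budget is tight.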

RAG Evaluation Metrics and MLflow evaluate()

To quantitatively evaluate RAG pipeline quality, use the following three metrics. On Databricks, they are built into MLflow evaluate().

| Metric | What It Measures | Judgment Criterion |
|---|---|---|
| Faithfulness | Whether the answer is faithful to the context | Whether each sentence of the answer is supported by the context |
| Answer Relevance | Whether the answer is relevant to the question | Whether the answer correctly captures the intent of the question |
| Context Precision | Retrieval precision | Whether the retrieved chunks are actually relevant to the question |
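
import mlflow
import pandas as pd

# Build the evaluation dataset: one row per question.
# The "..." entries and the *_list names are placeholders for your own data.
eval_data = pd.DataFrame({
    "questions": ["What are Databricks cluster policies?", ...],
    "ground_truth": ["A feature that defines constraints on cluster creation settings.", ...],
    "retrieved_context": [retrieved_chunks_list, ...],
    "generated_answers": [rag_responses_list, ...]
})

# Evaluate the precomputed answers (static dataset, so no model argument).
results = mlflow.evaluate(
    data=eval_data,
    predictions="generated_answers",
    targets="ground_truth",
    model_type="question-answering",
    evaluators="default",
    extra_metrics=[
        mlflow.metrics.genai.faithfulness(),
        mlflow.metrics.genai.relevance(),
    ],
    # Map the dataset columns to the names the GenAI metrics expect.
    evaluator_config={
        "col_mapping": {"inputs": "questions", "context": "retrieved_context"}
    },
)

print(results.metrics)
# Keys include 'faithfulness/v1/mean' and 'relevance/v1/mean' (scores on the 1-5 judge scale)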

These MLflow evaluate() metrics use the LLM-as-a-Judge pattern: a separate LLM scores the quality of each answer. Each metric is rated on a 1-5 scale, and the aggregated scores can serve as a threshold-based quality gate for the pipeline.
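
A threshold-based quality gate can be as simple as comparing the metrics dictionary against minimum scores. The sketch below is an assumption-laden illustration: `failed_gates` is a hypothetical helper, the metric keys follow the naming shown in the evaluation example, and the scores and thresholds are invented values on the 1-5 scale.

```python
def failed_gates(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of metrics that are missing or below their minimum score."""
    failures = []
    for name, minimum in thresholds.items():
        score = metrics.get(name)
        if score is None or score < minimum:
            failures.append(name)
    return failures

# Hypothetical aggregated scores, not real evaluation output.
example_metrics = {"faithfulness/v1/mean": 4.6, "relevance/v1/mean": 3.4}
minimum_scores = {"faithfulness/v1/mean": 4.0, "relevance/v1/mean": 4.0}

failures = failed_gates(example_metrics, minimum_scores)
if failures:
    print("Quality gate failed for:", ", ".join(failures))
else:
    print("Quality gate passed")
```

Treating a missing metric as a failure is a deliberate choice here: a gate that silently passes when a metric was never computed would defeat its purpose.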

RAG vs Fine-tuning Comparison

| Dimension | RAG | Fine-tuning |
|---|---|---|
| Knowledge Updates | Reflected immediately by updating documents | Requires retraining |
| Citing Sources | Can cite source documents | Embedded in the model; cannot be cited |
| Cost | Vector Search plus LLM call at inference time | Training cost plus inference cost |
| Latency | Slightly slower due to the retrieval step | Fast (a single model call) |
| Hallucinations | Suppressed via context constraints | More likely on questions outside the training data |
| Example Use Cases | Internal Q&A, document search | Code generation, domain-specific style transfer |

Key Topics on the GenAI Engineer Exam

  • Choosing a chunking strategy: picking Fixed / Semantic / Recursive based on document characteristics
  • Choosing an index type: criteria for Delta Sync Index vs Direct Vector Access Index
  • Understanding evaluation metrics: definitions and measurement methods for Faithfulness, Relevance, and Context Precision
  • RAG vs Fine-tuning: picking the right approach for the use case
  • Prompt design: building a prompt template that includes context constraints
  • MLflow evaluate(): evaluation methods for RAG pipeline quality

Because RAG accounts for roughly 30-40% of the GenAI Engineer exam, architectural understanding alone is not enough: you need practical judgment about which component to choose in a given scenario.

Sample Question

RAG / GenAI Engineer

Question 1

A company is building a Q&A chatbot powered by its internal knowledge base (thousands of pages on Confluence). The documents are updated weekly and every answer must include a link to the source page. Which approach best fits these requirements?

  A. Fine-tune the LLM on all internal documents and retrain it periodically
  B. Build a RAG pipeline, auto-sync weekly updates via a Delta Sync Index, and pull source URLs from the retrieved chunks' metadata
  C. Stuff the entire document set directly into the LLM's context window
  D. Pass document summaries to the LLM as a few-shot prompt every time

Correct answer: B

For a Q&A bot over documents that change weekly, RAG is the best fit because updates reach the index without retraining the model. With a Delta Sync Index, source-table changes auto-sync into Vector Search, and chunk metadata (e.g., source URLs) is preserved so you can surface source links. Fine-tuning (A) makes citing sources hard, and retraining every week is costly. Stuffing all documents directly (C) would exceed the context window for thousands of pages. Few-shot summaries (D) cannot guarantee coverage or accuracy.
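
The "source link" part of option B can be sketched as a post-processing step on the retrieved chunks. This is an illustration only: `answer_with_sources` is a hypothetical helper, `source_url` is an assumed metadata column name, and the URLs are placeholders.

```python
def answer_with_sources(answer: str, chunks: list[dict]) -> str:
    """Append the de-duplicated source URLs of the retrieved chunks, in retrieval order."""
    urls = list(dict.fromkeys(c["source_url"] for c in chunks if c.get("source_url")))
    if not urls:
        return answer
    return answer + "\n\nSources:\n" + "\n".join(f"- {url}" for url in urls)

# Hypothetical retrieved chunks; two of them come from the same page.
retrieved = [
    {"text": "VPN access requires MFA.", "source_url": "https://wiki.example.com/vpn"},
    {"text": "MFA tokens expire after 30 days.", "source_url": "https://wiki.example.com/mfa"},
    {"text": "VPN clients are listed here.", "source_url": "https://wiki.example.com/vpn"},
]
final_answer = answer_with_sources("VPN access requires MFA.", retrieved)
print(final_answer)
```

Because the links come from chunk metadata rather than from the model's output, they point at pages that were actually retrieved, which is something a fine-tuned model alone cannot provide.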

Frequently Asked Questions

When should I use RAG vs Fine-tuning?

RAG is the right choice when you need access to fresh information or external knowledge such as internal documents. Fine-tuning fits when you want to permanently change the model's output format, tone, or domain vocabulary. The Databricks GenAI Engineer exam frequently tests this decision criterion. Hybrid setups that combine both approaches are also effective in production.

What components are required to build RAG on Databricks?

At minimum you need four components: (1) a pipeline that splits documents into chunks, (2) an Embedding model (Foundation Model API or external API), (3) a Vector Search Index (Delta Sync Index or Direct Vector Access Index), and (4) an LLM (Foundation Model API or External Model). It is also recommended to add a quality evaluation pipeline using MLflow evaluate().

What evaluation metrics are used for RAG?

There are three main metrics: Faithfulness (whether the generated answer is faithful to the context), Answer Relevance (whether the answer properly addresses the user's question), and Context Precision (whether the retrieved context contains chunks relevant to the question). On Databricks, these metrics are built into MLflow evaluate() and can be automatically scored using the LLM-as-a-Judge pattern.
