Snowpark is a development framework for processing data inside Snowflake programmatically from Python, Scala, and Java. DataFrame operations are translated to SQL and run on a Snowflake warehouse, so your data does not have to leave Snowflake. Evaluation is lazy: no SQL is issued until you call an action such as collect() or show(), and Snowflake's query optimizer plans the generated SQL for you.
| Item | Details |
|---|---|
| Questions | 65 |
| Time limit | 115 minutes |
| Exam fee | $225 USD |
| Passing score | 750 / 1000 |
| Prerequisite | SnowPro Core Certification |
| Validity | 2 years (recertification available) |

| Domain | Weight | Key topics |
|---|---|---|
| Snowflake Core Knowledge | ~15% | Architecture, warehouses, caching, security |
| DataFrame API | ~30% | Session, DataFrame operations, lazy eval, window functions |
| UDF / UDTF / Stored Procedure | ~25% | Scalar UDF, Vectorized UDF, UDTF, Sproc, permissions |
| Data Engineering | ~15% | File operations, Dynamic Tables, Tasks, Streams |
| ML Integration | ~15% | Snowpark ML, Model Registry, Feature Store |
Snowpark's DataFrame API lets you write data processing as Spark-style method chains. Every method compiles down to SQL, so the Snowflake Optimizer handles the optimization for you.
| Method | Purpose | SQL equivalent |
|---|---|---|
| select() | Column selection | SELECT col1, col2 |
| filter() / where() | Row filtering | WHERE condition |
| group_by().agg() | Group aggregation | GROUP BY + aggregate functions |
| join() | Table join | JOIN ... ON |
| with_column() | Add or transform column | SELECT ..., expr AS alias |
| sort() / order_by() | Sort | ORDER BY |
| collect() | Fetch results (action) | Execute query + return results |
| show() | Display results (action) | Execute query + console output |
| write.save_as_table() | Save table (action) | CREATE TABLE AS SELECT |
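The split between transformations and actions in the table above can be illustrated without a Snowflake connection. The following is a minimal sketch, assuming a hypothetical `LazyFrame` class that is not part of Snowpark: transformations only record what was asked for, and the SQL string is produced and "sent" only when the action `collect()` is called. The `executed_statements` list is a stand-in for queries reaching a warehouse.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical stand-in for a DataFrame: it records operations, nothing more."""
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    order: Optional[str] = None

    # Transformations return a new LazyFrame and never touch the "database".
    def select(self, *cols: str) -> "LazyFrame":
        return LazyFrame(self.table, cols, self.predicates, self.order)

    def filter(self, predicate: str) -> "LazyFrame":
        return LazyFrame(self.table, self.columns, self.predicates + (predicate,), self.order)

    def sort(self, expr: str) -> "LazyFrame":
        return LazyFrame(self.table, self.columns, self.predicates, expr)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql

    # The action: only here is a statement handed to the executor.
    def collect(self, executed: List[str]) -> None:
        executed.append(self.to_sql())


executed_statements: List[str] = []  # stands in for queries sent to a warehouse

plan = (
    LazyFrame("SALES")
    .filter("SALE_DATE >= '2026-01-01'")
    .select("REGION", "AMOUNT")
    .sort("AMOUNT DESC")
)
statements_before_action = len(executed_statements)  # nothing has run yet

plan.collect(executed_statements)
print(executed_statements[0])
```

Because the whole chain is known before anything runs, a single SQL statement can be generated for it, which is the property that lets the database optimize the query as a whole.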
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_, avg, count

# connection_params: a dict of connection settings (account, user, and so on) defined elsewhere
session = Session.builder.configs(connection_params).create()

# Lazy evaluation: no SQL is executed at this point
df = session.table("SALES")
result = (
    df.filter(col("SALE_DATE") >= "2026-01-01")
    .group_by("REGION", "PRODUCT_CATEGORY")
    .agg(
        sum_("AMOUNT").alias("TOTAL_SALES"),
        avg("AMOUNT").alias("AVG_SALE"),
        count("ORDER_ID").alias("ORDER_COUNT"),
    )
    .sort(col("TOTAL_SALES").desc())
)

# collect() issues the SQL and runs it on Snowflake
rows = result.collect()
for row in rows:
    print(f"{row['REGION']}: {row['TOTAL_SALES']}")

| Aspect | UDF | UDTF | Stored Procedure |
|---|---|---|---|
| Invocation | Inside SELECT (scalar function) | FROM clause (TABLE(func())) | CALL statement |
| Return value | Scalar value (one value per row) | Table (multiple rows) | Arbitrary (side effects are the main goal) |
| Side effects | Not allowed (pure function) | Not allowed (pure function) | Allowed (DDL/DML supported) |
| Permission model | Caller Rights | Caller Rights | Caller Rights / Owner Rights |
| Languages | Python / Scala / Java / JavaScript / SQL | Python / Scala / Java / JavaScript / SQL | Python / Scala / Java / JavaScript / SQL (Snowflake Scripting) |
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import StringType, IntegerType

# Inline UDF (requires an active session, such as the one created above)
@udf(name="calculate_tax", return_type=IntegerType(),
     input_types=[IntegerType()], replace=True, is_permanent=False)
def calculate_tax(amount: int) -> int:
    return int(amount * 0.1)

# SQL call: SELECT calculate_tax(AMOUNT) FROM ORDERS;

# --- UDTF definition example ---
from snowflake.snowpark.types import StructType, StructField

class SplitTags:
    def process(self, tags: str):
        # Yield one output row (a 1-tuple) per comma-separated tag
        for tag in tags.split(","):
            yield (tag.strip(),)

session.udtf.register(
    SplitTags,
    output_schema=StructType([StructField("TAG", StringType())]),
    input_types=[StringType()],
    name="split_tags",
    replace=True,
)

# SQL call: SELECT * FROM TABLE(split_tags('ml,ai,data'));

from snowflake.snowpark.functions import sproc
# Session and count are imported in the DataFrame example above
@sproc(name="refresh_summary", replace=True, is_permanent=True,
       stage_location="@DEPLOY_STAGE",
       packages=["snowflake-snowpark-python"])
def refresh_summary(session: Session) -> str:
    source = session.table("RAW_EVENTS")
    summary = (
        source.group_by("EVENT_TYPE")
        .agg(count("*").alias("CNT"))
    )
    # Side effect: overwrites the EVENT_SUMMARY table
    summary.write.save_as_table("EVENT_SUMMARY", mode="overwrite")
    return "EVENT_SUMMARY refreshed"

# Execution: CALL refresh_summary();

A Vectorized UDF takes pandas.Series as input and output and processes rows in batches instead of one at a time, so it often runs considerably faster than a regular UDF. It is well-suited to performance-sensitive numerical computation and ML inference.
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import PandasSeriesType, IntegerType
import pandas as pd

@pandas_udf(name="batch_normalize", return_type=PandasSeriesType(IntegerType()),
            input_types=[PandasSeriesType(IntegerType())],
            replace=True)
def batch_normalize(series: pd.Series) -> pd.Series:
    # Note: mean and std are computed per batch, not over the whole column.
    # Assumes each batch has no nulls and more than one distinct value (std != 0).
    mean = series.mean()
    std = series.std()
    return ((series - mean) / std * 100).astype(int)

# SQL: SELECT batch_normalize(SCORE) FROM EXAM_RESULTS;
# Batch processing can be substantially faster than a row-wise UDF

Snowpark ML is a library that lets you train and serve ML models entirely inside Snowflake. It exposes a scikit-learn-compatible API for fit/predict, and models are registered to the Snowflake Model Registry for production deployment.
from snowflake.ml.modeling.linear_model import LogisticRegression
from snowflake.ml.registry import Registry

# Training
train_df = session.table("TRAINING_DATA")
model = LogisticRegression(
    input_cols=["FEATURE_A", "FEATURE_B", "FEATURE_C"],
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
model.fit(train_df)

# Inference
test_df = session.table("TEST_DATA")
predictions = model.predict(test_df)
predictions.write.save_as_table("PREDICTIONS", mode="overwrite")

# Model registration
reg = Registry(session)
mv = reg.log_model(
    model,
    model_name="churn_classifier",
    version_name="v1",
    sample_input_data=train_df.limit(10),
)

| Snowpark ML component | Role |
|---|---|
| snowflake.ml.modeling | scikit-learn-compatible ML API (preprocessing, training, inference) |
| Model Registry | Model version management and deployment |
| Feature Store | Feature management, sharing, and point-in-time joins |
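The "point-in-time join" listed for the Feature Store is easier to picture with a small example. What follows is a plain-Python sketch of the idea only, not the Feature Store API: the `feature_history` data, the integer timestamps, and the `feature_as_of` helper are all hypothetical. For each labeled event, it picks the latest feature value recorded at or before the event time, so no value from the future leaks into training data.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity id -> list of (timestamp, value), sorted by timestamp.
feature_history: Dict[str, List[Tuple[int, float]]] = {
    "cust_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
    "cust_2": [(15, 1.0)],
}


def feature_as_of(entity: str, ts: int) -> Optional[float]:
    """Return the latest feature value recorded at or before ts, or None."""
    history = feature_history.get(entity, [])
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of records with timestamp <= ts
    return history[idx - 1][1] if idx else None


# Label rows: (entity, event timestamp). Each is joined to the feature value
# that was known at that moment, never a later one.
labels = [("cust_1", 25), ("cust_1", 5), ("cust_2", 15)]
joined = [(entity, ts, feature_as_of(entity, ts)) for entity, ts in labels]
print(joined)
```

The event at timestamp 25 receives the value recorded at 20 rather than the one at 30, and the event at timestamp 5 receives no value because nothing had been recorded yet.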
Snowpark Container Services (SPCS) is a managed service for running Docker containers inside Snowflake. While Snowpark focuses on DataFrame operations and UDF/Sproc execution, SPCS runs general-purpose workloads such as GPU-based model training, custom REST APIs, and full-stack web applications.
| Aspect | Snowpark UDF/Sproc | SPCS |
|---|---|---|
| Execution model | SQL function / CALL statement | Long-running container service |
| GPU | Not supported | Supported |
| External network | Restricted | Allowed via External Access Integration |
| Use cases | Data transformation, lightweight ML inference | Large-scale ML training, custom APIs, web apps |

| Period | Phase | What to study |
|---|---|---|
| Months 1-2 | Core review + Python fundamentals | Snowflake architecture, basic Python/pandas operations |
| Months 3-4 | Focused DataFrame API study | Session setup, DataFrame operations, window functions, file I/O |
| Months 5-6 | UDF / UDTF / Sproc | Different definition styles, Vectorized UDF, permission model, package management |
| Months 7-8 | Data Engineering + ML | Dynamic Tables, Streams/Tasks, Snowpark ML, Model Registry |
| Months 9-12 | Mock exams + targeted review | Take multiple full-length 65-question mock exams and revisit the domains you missed |
Question 1
You want to build a Python UDF in Snowpark and apply it to a column in a SELECT statement to perform a custom transformation on each row's text data. The UDF returns a single scalar value. To process a large number of rows quickly, you want batch execution instead of row-by-row. Which implementation is most appropriate?
Correct answer: A
A Vectorized UDF (@pandas_udf) is the best fit for batch processing many rows. It receives and returns pandas.Series, so the handler is called once per batch rather than once per row, which removes the per-row function-call overhead and is typically much faster than a regular UDF. Option B's is_permanent only controls whether the UDF is persisted and has no effect on speed. Option C's UDTF returns a table (multiple rows), which does not match the scalar return requirement. Option D's Stored Procedure is invoked with a CALL statement and cannot be applied per row inside a SELECT statement.
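The per-row call overhead mentioned in this explanation can be made concrete with a small local experiment. This is a sketch under stated assumptions, using plain Python lists rather than pandas or Snowpark: the handler names, the `call_count` dictionary, and the batch size of 250 are all made up for illustration. It counts how often each style of handler is invoked for the same 1,000 rows.

```python
from typing import List

call_count = {"row_wise": 0, "batch": 0}


def to_upper_row(value: str) -> str:
    """Row-wise handler: invoked once per row."""
    call_count["row_wise"] += 1
    return value.upper()


def to_upper_batch(values: List[str]) -> List[str]:
    """Batch handler: invoked once per batch of rows."""
    call_count["batch"] += 1
    return [v.upper() for v in values]


def run_row_wise(rows: List[str]) -> List[str]:
    return [to_upper_row(r) for r in rows]


def run_batched(rows: List[str], batch_size: int) -> List[str]:
    out: List[str] = []
    for start in range(0, len(rows), batch_size):
        out.extend(to_upper_batch(rows[start:start + batch_size]))
    return out


rows = [f"row-{i}" for i in range(1000)]
row_result = run_row_wise(rows)
batch_result = run_batched(rows, batch_size=250)  # batch size is an arbitrary assumption

print(call_count)
```

Both styles produce the same output, but the batch handler is entered 4 times instead of 1,000. A real vectorized UDF gains further from pandas' vectorized operations inside each batch, which this sketch does not model.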
How does Snowpark's DataFrame API differ from Spark's DataFrame API?
Snowpark's DataFrame API offers Spark-like syntax (select, filter, group_by, join, and so on), but where the work runs is fundamentally different. Spark distributes processing across a driver and executors, whereas Snowpark translates DataFrame operations into SQL that runs on a Snowflake warehouse. The data stays in Snowflake until you explicitly fetch results, and the generated SQL benefits from the Snowflake Optimizer. Snowpark also uses lazy evaluation: no SQL is actually issued until you call an action such as collect() or show().
When should you use UDF vs UDTF vs Stored Procedure?
A UDF (User-Defined Function) is a scalar function used inside a SELECT statement, ideal for per-row transformations and custom calculations. A UDTF (User-Defined Table Function) is called in the FROM clause and returns multiple rows per input row — useful for JSON expansion or log parsing. A Stored Procedure is invoked with a CALL statement and can issue DDL/DML or orchestrate multi-step workflows. Anything with side effects (creating tables, mutating data) belongs in a Stored Procedure. The exam frequently tests the differences in invocation style and return values across these three.
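The three return shapes described above can be mimicked locally, since handlers are ordinary Python callables before they are registered. The sketch below makes that assumption explicit: nothing here talks to Snowflake, and the in-memory `store` dictionary is a hypothetical stand-in for tables. It mirrors the `calculate_tax`, `SplitTags`, and `refresh_summary` examples from earlier in this guide.

```python
from typing import Iterator, Tuple


def calculate_tax(amount: int) -> int:
    """UDF-style handler: one input row in, one scalar out."""
    return int(amount * 0.1)


class SplitTags:
    """UDTF-style handler: process() may yield any number of output rows."""

    def process(self, tags: str) -> Iterator[Tuple[str]]:
        for tag in tags.split(","):
            yield (tag.strip(),)


def refresh_summary(store: dict) -> str:
    """Stored-procedure-style handler: its purpose is the side effect."""
    summary = {}
    for event_type in store["RAW_EVENTS"]:
        summary[event_type] = summary.get(event_type, 0) + 1
    store["EVENT_SUMMARY"] = summary  # mutates state, like DML would
    return "EVENT_SUMMARY refreshed"


amounts = [100, 200]
taxes = [calculate_tax(a) for a in amounts]      # same row count as the input
tags = list(SplitTags().process("ml, ai,data"))  # one input row, three output rows

store = {"RAW_EVENTS": ["click", "view", "click"]}  # hypothetical in-memory "database"
status = refresh_summary(store)
```

Testing handlers this way before registration keeps the feedback loop short; the registration step then only adds the type declarations and deployment options.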
What is Snowpark Container Services, and how does it relate to Snowpark?
Snowpark Container Services (SPCS) is a managed service for running Docker containers inside Snowflake. While Snowpark handles DataFrame operations and UDF/UDTF/Stored Procedure execution, SPCS runs more general-purpose workloads — full-stack applications, GPU-based ML model training, custom REST APIs — all within Snowflake. A common integration pattern is to invoke an SPCS service from a Snowpark Stored Procedure. On the exam, SPCS shows up as the answer for any 'container-based processing' option.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.