Snowflake

Snowflake Semi-Structured Data Processing

2026-03-21
更新: 2026-03-27
NicheeLab Editorial Team

Snowflake centers on the VARIANT type to store semi-structured data — JSON, Parquet, Avro, ORC, XML — natively inside relational tables and query it directly with SQL. The three operation patterns that show up most often on exams are: path access via colon notation (:), array unnesting via LATERAL FLATTEN, and construction via OBJECT_CONSTRUCT/ARRAY_AGG.

VARIANT Type Basics

VARIANT is a general-purpose type that can store any semi-structured data — JSON, objects, arrays, scalar values, and more. A single value can hold up to 16MB, and Snowflake stores it internally in a compressed columnar format.

Semi-structured TypeWhat it StoresUse Case
VARIANTAny semi-structured data (JSON / object / array / scalar)The most general-purpose; the landing zone for JSON loads
OBJECTA collection of key-value pairsJSON objects with a known structure
ARRAYAn ordered array of valuesLists, tags, and array data
-- Create a table with a VARIANT column
CREATE OR REPLACE TABLE raw.events (
  event_id   INT AUTOINCREMENT,
  received_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
  payload    VARIANT
);

-- Direct INSERT of JSON data
INSERT INTO raw.events (payload)
SELECT PARSE_JSON('{
  "user_id": 42,
  "action": "purchase",
  "items": [
    {"sku": "A100", "qty": 2, "price": 1500},
    {"sku": "B200", "qty": 1, "price": 3200}
  ],
  "metadata": {"source": "web", "version": "2.1"}
}');

Path Access with Colon Notation (:)

You access keys inside VARIANT data with colon notation (:). Use dot notation (.) for nested objects and bracket notation ([]) for array elements.

-- Accessing top-level keys
SELECT
  payload:user_id::INT       AS user_id,
  payload:action::STRING     AS action
FROM raw.events;

-- Accessing nested objects
SELECT
  payload:metadata.source::STRING  AS source,
  payload:metadata.version::STRING AS api_version
FROM raw.events;

-- Accessing array elements (0-indexed)
SELECT
  payload:items[0].sku::STRING   AS first_sku,
  payload:items[0].price::NUMBER AS first_price
FROM raw.events;

Without an explicit cast using :: (the cast operator), the result stays as VARIANT. You must cast whenever you use the value in a WHERE clause or a JOIN.

Unnesting Arrays with LATERAL FLATTEN

LATERAL FLATTEN is the function that expands arrays and objects inside a VARIANT into rows. It is the standard way to convert JSON array elements into a relational, one-row-per-element form.

-- Expand each item in the array into its own row
SELECT
  e.event_id,
  e.payload:user_id::INT    AS user_id,
  f.index                   AS item_index,
  f.value:sku::STRING       AS sku,
  f.value:qty::INT          AS quantity,
  f.value:price::NUMBER     AS price
FROM raw.events e,
  LATERAL FLATTEN(INPUT => e.payload:items) f;

FLATTEN Output Columns

ColumnDescription
SEQSequence number of the input record
KEYObject key name (NULL for arrays)
PATHPath string to the element
INDEXArray index (0-based)
VALUEThe expanded value (VARIANT)
THISThe entire source array/object being expanded

Unnesting Multiple Levels

-- Two-level unnesting (orders -> items)
SELECT
  o.value:order_id::INT      AS order_id,
  i.value:product::STRING    AS product,
  i.value:amount::NUMBER     AS amount
FROM raw.batch_data d,
  LATERAL FLATTEN(INPUT => d.payload:orders) o,
  LATERAL FLATTEN(INPUT => o.value:items) i;

Building Objects with OBJECT_CONSTRUCT

OBJECT_CONSTRUCT builds a VARIANT JSON object out of relational columns. Use it for data unloads (exporting as JSON) and for generating payloads to send to APIs.

-- Build a JSON object from relational data
SELECT OBJECT_CONSTRUCT(
  'order_id',   o.order_id,
  'customer',   o.customer_name,
  'total',      o.total_amount,
  'order_date', o.order_date::STRING
) AS order_json
FROM analytics.orders o
WHERE o.order_date = '2026-03-27';

-- OBJECT_CONSTRUCT_KEEP_NULL (keeps NULL keys)
SELECT OBJECT_CONSTRUCT_KEEP_NULL(
  'name',  e.name,
  'email', e.email,
  'phone', e.phone
) AS employee_json
FROM hr.employees e;

Aggregating into Arrays with ARRAY_AGG

ARRAY_AGG aggregates values from multiple rows into a single ARRAY. Combined with GROUP BY, it is useful for converting relational data into a denormalized JSON structure.

-- Aggregate per-customer order IDs into an array
SELECT
  customer_id,
  ARRAY_AGG(DISTINCT order_id) WITHIN GROUP (ORDER BY order_date)
    AS order_ids,
  ARRAY_SIZE(ARRAY_AGG(DISTINCT order_id)) AS order_count
FROM analytics.orders
GROUP BY customer_id;

-- Combining OBJECT_CONSTRUCT and ARRAY_AGG
SELECT OBJECT_CONSTRUCT(
  'customer_id', customer_id,
  'orders', ARRAY_AGG(
    OBJECT_CONSTRUCT(
      'order_id', order_id,
      'amount',   total_amount
    )
  )
) AS customer_orders_json
FROM analytics.orders
GROUP BY customer_id;

Loading Patterns for Semi-Structured Data

When loading JSON files, a very common pattern is using the STRIP_OUTER_ARRAY option to break the outermost array into individual rows.

-- Define a JSON File Format
CREATE OR REPLACE FILE FORMAT json_format
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE
  ALLOW_DUPLICATE = FALSE;

-- Load from a stage into a VARIANT column
COPY INTO raw.events (payload)
FROM @data_stage/events/
FILE_FORMAT = json_format;

-- Load Parquet files (auto schema detection)
COPY INTO raw.parquet_data
FROM @data_stage/parquet/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

Other Key Functions

FunctionUse CaseExample
PARSE_JSONConvert string -> VARIANTPARSE_JSON('{"key":"val"}')
TO_JSONConvert VARIANT -> JSON stringTO_JSON(payload)
ARRAY_SIZEGet the number of elements in an arrayARRAY_SIZE(payload:items)
ARRAY_CONTAINSCheck whether a value exists in an arrayARRAY_CONTAINS('A100'::VARIANT, payload:skus)
OBJECT_KEYSGet an object's keys as an arrayOBJECT_KEYS(payload:metadata)
GET_PATH / GETGet a VARIANT element by path stringGET_PATH(payload, 'metadata.source')
TYPEOFCheck the internal type of a VARIANT valueTYPEOF(payload:user_id)

Check Your Understanding

Semi-structured Data

問題 1

You want to expand the JSON array payload:items stored in a VARIANT column payload and retrieve the sku and price of each element. Which SQL is correct?

  1. SELECT f.value:sku, f.value:price FROM events, LATERAL FLATTEN(INPUT => payload:items) f
  2. SELECT payload:items.sku, payload:items.price FROM events
  3. SELECT SPLIT(payload:items, ',') AS sku, price FROM events
  4. SELECT PARSE_JSON(payload:items):sku, PARSE_JSON(payload:items):price FROM events

正解: A

Use LATERAL FLATTEN to expand an array inside a VARIANT into rows. You access each expanded element via f.value, and then pick keys with colon notation like f.value:sku. payload:items.sku is a property access on the array as a whole and won't produce the expected result. SPLIT is a string-splitting function and cannot be used to expand a VARIANT array.

Frequently Asked Questions

Is there a size limit for data stored in a VARIANT column?

A single VARIANT value is capped at 16MB. There are also size constraints on the micro-partitions that contain VARIANT columns, so in production you should aim to keep each record within a few MB. Rather than storing a giant JSON document as-is, it is better — both for query performance and for cost — to normalize the data with LATERAL FLATTEN and split it into separate tables.

Do queries against VARIANT columns perform as well as queries against relational columns?

Snowflake stores VARIANT data internally in a columnar format alongside metadata (type information and path information), so frequently accessed keys can be retrieved quickly. However, evaluating semi-structured paths via colon notation (:) or bracket notation ([]) does carry more overhead than scanning a relational column. The best practice is to pre-extract keys you frequently filter, JOIN, or aggregate on into relational columns using a VIEW or CTAS.

How do you unnest deeply nested arrays with FLATTEN?

LATERAL FLATTEN unnests a single level at a time, so for multi-level nesting you chain multiple LATERAL FLATTEN calls. For a 3-level structure like orders[].items[].tags[], you would write FROM raw, LATERAL FLATTEN(raw.data:orders) o, LATERAL FLATTEN(o.value:items) i, LATERAL FLATTEN(i.value:tags) t. Each FLATTEN result is passed to the next FLATTEN via its value column.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Snowflake

Snowflake Certifications: All 11 Exams Explained (2026)

Every SnowPro certification — Associate, Core, Specialty, Ad...

Snowflake

Snowflake Exam Difficulty Ranking: All 11 Certs Compared (2026)

All 11 SnowPro exams ranked by difficulty with study-time es...

Snowflake

Snowflake Study Guide: Fastest Pass Route by Exam (2026)

How to pass SnowPro certifications efficiently — official ma...

Snowflake

SnowPro Core (COF-C03): Complete Exam Guide (2026)

Pass the SnowPro Core exam — six domains, scope, sample ques...

Snowflake

SnowPro Associate Platform (SOL-C01): Complete Guide (2026)

The entry-level SnowPro Associate exam — scope, weighting, s...

Browse all Snowflake articles (103)
© 2026 NicheeLab All rights reserved.