Snowflake Semi-Structured Data: VARIANT, OBJECT, ARRAY (2026)

Snowflake centers on the VARIANT type to store semi-structured data — JSON, Parquet, Avro, ORC, XML — natively inside relational tables and query it directly with SQL. The three operation patterns that show up most often on exams are: path access via colon notation (:), array unnesting via LATERAL FLATTEN, and construction via OBJECT_CONSTRUCT/ARRAY_AGG.

VARIANT Type Basics

VARIANT is a general-purpose type that can store any semi-structured data — JSON, objects, arrays, scalar values, and more. A single value can hold up to 16MB, and Snowflake stores it internally in a compressed columnar format.

Semi-structured Type	What it Stores	Use Case
VARIANT	Any semi-structured data (JSON / object / array / scalar)	The most general-purpose; the landing zone for JSON loads
OBJECT	A collection of key-value pairs	JSON objects with a known structure
ARRAY	An ordered array of values	Lists, tags, and array data

-- Create a table with a VARIANT column
CREATE OR REPLACE TABLE raw.events (
  event_id   INT AUTOINCREMENT,
  received_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
  payload    VARIANT
);

-- Direct INSERT of JSON data
INSERT INTO raw.events (payload)
SELECT PARSE_JSON('{
  "user_id": 42,
  "action": "purchase",
  "items": [
    {"sku": "A100", "qty": 2, "price": 1500},
    {"sku": "B200", "qty": 1, "price": 3200}
  ],
  "metadata": {"source": "web", "version": "2.1"}
}');

Path Access with Colon Notation (:)

You access keys inside VARIANT data with colon notation (:). Use dot notation (.) for nested objects and bracket notation ([]) for array elements.

-- Accessing top-level keys
SELECT
  payload:user_id::INT       AS user_id,
  payload:action::STRING     AS action
FROM raw.events;

-- Accessing nested objects
SELECT
  payload:metadata.source::STRING  AS source,
  payload:metadata.version::STRING AS api_version
FROM raw.events;

-- Accessing array elements (0-indexed)
SELECT
  payload:items[0].sku::STRING   AS first_sku,
  payload:items[0].price::NUMBER AS first_price
FROM raw.events;

Without an explicit cast using :: (the cast operator), the result stays as VARIANT. You must cast whenever you use the value in a WHERE clause or a JOIN.

Unnesting Arrays with LATERAL FLATTEN

LATERAL FLATTEN is the function that expands arrays and objects inside a VARIANT into rows. It is the standard way to convert JSON array elements into a relational, one-row-per-element form.

-- Expand each item in the array into its own row
SELECT
  e.event_id,
  e.payload:user_id::INT    AS user_id,
  f.index                   AS item_index,
  f.value:sku::STRING       AS sku,
  f.value:qty::INT          AS quantity,
  f.value:price::NUMBER     AS price
FROM raw.events e,
  LATERAL FLATTEN(INPUT => e.payload:items) f;

FLATTEN Output Columns

Column	Description
SEQ	Sequence number of the input record
KEY	Object key name (NULL for arrays)
PATH	Path string to the element
INDEX	Array index (0-based)
VALUE	The expanded value (VARIANT)
THIS	The entire source array/object being expanded

Unnesting Multiple Levels

-- Two-level unnesting (orders -> items)
SELECT
  o.value:order_id::INT      AS order_id,
  i.value:product::STRING    AS product,
  i.value:amount::NUMBER     AS amount
FROM raw.batch_data d,
  LATERAL FLATTEN(INPUT => d.payload:orders) o,
  LATERAL FLATTEN(INPUT => o.value:items) i;

Building Objects with OBJECT_CONSTRUCT

OBJECT_CONSTRUCT builds a VARIANT JSON object out of relational columns. Use it for data unloads (exporting as JSON) and for generating payloads to send to APIs.

-- Build a JSON object from relational data
SELECT OBJECT_CONSTRUCT(
  'order_id',   o.order_id,
  'customer',   o.customer_name,
  'total',      o.total_amount,
  'order_date', o.order_date::STRING
) AS order_json
FROM analytics.orders o
WHERE o.order_date = '2026-03-27';

-- OBJECT_CONSTRUCT_KEEP_NULL (keeps NULL keys)
SELECT OBJECT_CONSTRUCT_KEEP_NULL(
  'name',  e.name,
  'email', e.email,
  'phone', e.phone
) AS employee_json
FROM hr.employees e;

Aggregating into Arrays with ARRAY_AGG

ARRAY_AGG aggregates values from multiple rows into a single ARRAY. Combined with GROUP BY, it is useful for converting relational data into a denormalized JSON structure.

-- Aggregate per-customer order IDs into an array
SELECT
  customer_id,
  ARRAY_AGG(DISTINCT order_id) WITHIN GROUP (ORDER BY order_date)
    AS order_ids,
  ARRAY_SIZE(ARRAY_AGG(DISTINCT order_id)) AS order_count
FROM analytics.orders
GROUP BY customer_id;

-- Combining OBJECT_CONSTRUCT and ARRAY_AGG
SELECT OBJECT_CONSTRUCT(
  'customer_id', customer_id,
  'orders', ARRAY_AGG(
    OBJECT_CONSTRUCT(
      'order_id', order_id,
      'amount',   total_amount
    )
  )
) AS customer_orders_json
FROM analytics.orders
GROUP BY customer_id;

Loading Patterns for Semi-Structured Data

When loading JSON files, a very common pattern is using the STRIP_OUTER_ARRAY option to break the outermost array into individual rows.

-- Define a JSON File Format
CREATE OR REPLACE FILE FORMAT json_format
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE
  ALLOW_DUPLICATE = FALSE;

-- Load from a stage into a VARIANT column
COPY INTO raw.events (payload)
FROM @data_stage/events/
FILE_FORMAT = json_format;

-- Load Parquet files (auto schema detection)
COPY INTO raw.parquet_data
FROM @data_stage/parquet/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

Other Key Functions

Function	Use Case	Example
PARSE_JSON	Convert string -> VARIANT	PARSE_JSON('{"key":"val"}')
TO_JSON	Convert VARIANT -> JSON string	TO_JSON(payload)
ARRAY_SIZE	Get the number of elements in an array	ARRAY_SIZE(payload:items)
ARRAY_CONTAINS	Check whether a value exists in an array	ARRAY_CONTAINS('A100'::VARIANT, payload:skus)
OBJECT_KEYS	Get an object's keys as an array	OBJECT_KEYS(payload:metadata)
GET_PATH / GET	Get a VARIANT element by path string	GET_PATH(payload, 'metadata.source')
TYPEOF	Check the internal type of a VARIANT value	TYPEOF(payload:user_id)

Check Your Understanding

Semi-structured Data

問題 1

You want to expand the JSON array payload:items stored in a VARIANT column payload and retrieve the sku and price of each element. Which SQL is correct?

SELECT f.value:sku, f.value:price FROM events, LATERAL FLATTEN(INPUT => payload:items) f
SELECT payload:items.sku, payload:items.price FROM events
SELECT SPLIT(payload:items, ',') AS sku, price FROM events
SELECT PARSE_JSON(payload:items):sku, PARSE_JSON(payload:items):price FROM events

正解: A

Use LATERAL FLATTEN to expand an array inside a VARIANT into rows. You access each expanded element via f.value, and then pick keys with colon notation like f.value:sku. payload:items.sku is a property access on the array as a whole and won't produce the expected result. SPLIT is a string-splitting function and cannot be used to expand a VARIANT array.

Frequently Asked Questions

Is there a size limit for data stored in a VARIANT column?

A single VARIANT value is capped at 16MB. There are also size constraints on the micro-partitions that contain VARIANT columns, so in production you should aim to keep each record within a few MB. Rather than storing a giant JSON document as-is, it is better — both for query performance and for cost — to normalize the data with LATERAL FLATTEN and split it into separate tables.

Do queries against VARIANT columns perform as well as queries against relational columns?

Snowflake stores VARIANT data internally in a columnar format alongside metadata (type information and path information), so frequently accessed keys can be retrieved quickly. However, evaluating semi-structured paths via colon notation (:) or bracket notation ([]) does carry more overhead than scanning a relational column. The best practice is to pre-extract keys you frequently filter, JOIN, or aggregate on into relational columns using a VIEW or CTAS.

How do you unnest deeply nested arrays with FLATTEN?

LATERAL FLATTEN unnests a single level at a time, so for multi-level nesting you chain multiple LATERAL FLATTEN calls. For a 3-level structure like orders[].items[].tags[], you would write FROM raw, LATERAL FLATTEN(raw.data:orders) o, LATERAL FLATTEN(o.value:items) i, LATERAL FLATTEN(i.value:tags) t. Each FLATTEN result is passed to the next FLATTEN via its value column.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Snowflake Semi-Structured Data Processing