Vision AI / Video Intelligence Complete Guide | Image and Video Analysis APIs (GCP)

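GCP's image and video AI consists of three services: Cloud Vision API (image analysis), Video Intelligence API (video analysis), and Vertex AI Vision (custom models + Edge). This article summarizes what each service does, when to use which, and how they compare with AWS and Azure.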

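Cloud Vision API features

Feature	Use
Label Detection	Labels objects and concepts
Object Detection / Localization	Bounding-box detection
OCR (Text Detection)	Text extraction (Japanese supported)
Document Text Detection	Document OCR (Document AI recommended)
Face Detection	Face position, emotion, and attributes (not identification)
Landmark Detection	Recognition of famous landmarks worldwide
Logo Detection	Company logo recognition
SafeSearch	Inappropriate content classification
Web Detection	Similar-image search on the web
Image Properties	Dominant colors / crop hints
Product Search	Product image search

Vision API usage example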

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = "gs://my-bucket/photo.jpg"

# Request several features in a single call
features = [
    vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION, max_results=10),
    vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION),
    vision.Feature(type_=vision.Feature.Type.SAFE_SEARCH_DETECTION),
]
request = vision.AnnotateImageRequest(image=image, features=features)
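response = client.annotate_image(request=request)

# Per-image failures are reported inside the response rather than raised
if response.error.message:
    raise RuntimeError(response.error.message)

for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
# text_annotations is empty when no text is found; the first entry holds the full text
if response.text_annotations:
    print("OCR:", response.text_annotations[0].description)
print("Adult:", response.safe_search_annotation.adult.name)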
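
The SafeSearch likelihood names printed above (for example `VERY_UNLIKELY` or `LIKELY`) are ordered, so they can be compared against a threshold to decide whether to block an image. The following is a minimal sketch, assuming the SafeSearch result has already been copied into a plain dict of category name to likelihood name. `should_block` and its default threshold are hypothetical names chosen for illustration and are not part of the client library.

```python
# Likelihood names in ascending order, matching the Vision API Likelihood enum.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def should_block(safe_search: dict[str, str], threshold: str = "LIKELY") -> bool:
    """Return True if any category is at or above the threshold likelihood.

    `safe_search` is a hypothetical plain-dict view of the SafeSearch result,
    e.g. {"adult": "UNLIKELY", "violence": "POSSIBLE"}.
    """
    limit = LIKELIHOOD_ORDER.index(threshold)
    return any(
        LIKELIHOOD_ORDER.index(level) >= limit for level in safe_search.values()
    )
```

With the client response from the example, the dict could be built from fields such as `response.safe_search_annotation.adult.name` and `response.safe_search_annotation.violence.name`. The right threshold depends on how strict the moderation policy needs to be.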

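Video Intelligence API features

Label Detection: labels at the video, shot, and frame level
Shot Change Detection: points where the shot changes
Object Tracking: tracks objects through the video
Text Detection: OCR of text appearing in the video
Logo Detection: tracks brand logos
Explicit Content: detects inappropriate content
Speech Transcription: transcribes speech in the video
Person Detection: person detection + pose estimation
Face Detection: face detection (not identification)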
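
As a rough illustration of how one of these features might be called, the sketch below requests Shot Change Detection through the `google-cloud-videointelligence` Python client and formats the shot boundaries. It assumes the package is installed, credentials are configured, and the video already sits in Cloud Storage. `detect_shots` and `format_offset` are hypothetical helper names, and the bucket path in the usage note is a placeholder.

```python
from datetime import timedelta


def format_offset(offset: timedelta) -> str:
    """Render a time offset as M:SS.mmm using integer arithmetic."""
    total_ms = offset // timedelta(milliseconds=1)
    minutes, rem = divmod(total_ms, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{minutes}:{seconds:02d}.{ms:03d}"


def detect_shots(gcs_uri: str, timeout_s: int = 300) -> list[tuple[str, str]]:
    """Return (start, end) strings for each detected shot in the video."""
    # Imported lazily so the formatting helper works without the client library.
    from google.cloud import videointelligence

    client = videointelligence.VideoIntelligenceServiceClient()
    # annotate_video starts a long-running operation; result() blocks until done.
    operation = client.annotate_video(
        request={
            "features": [videointelligence.Feature.SHOT_CHANGE_DETECTION],
            "input_uri": gcs_uri,
        }
    )
    result = operation.result(timeout=timeout_s)
    shots = result.annotation_results[0].shot_annotations
    return [
        (format_offset(s.start_time_offset), format_offset(s.end_time_offset))
        for s in shots
    ]
```

A call such as `detect_shots("gs://my-bucket/video.mp4")` would return one pair per shot. Other features can be requested by adding them to the `features` list, and each one fills its own field on the annotation result.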

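Vertex AI Vision

Custom model training (image classification / object detection / OCR)
Stream ingestion (video streams from RTSP / Pub/Sub)
People Counting / Occupancy Analytics (for stores and facilities)
Person/Vehicle Detection (surveillance cameras)
IoT Edge deployment (Coral / Jetson)
BigQuery integration for large-scale analysis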

Pricing examples

Service	Price
Vision API (Label / OCR)	$1.50 per 1,000 units
Vision API free tier	1,000 units per month per feature
Video Intelligence Label	$0.10 per minute
Video Intelligence Speech	$0.048 per minute
Vertex AI Vision (custom training)	$3.15 per hour
Vertex AI Vision Stream	$0.30 per hour (per stream)
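
To make the table concrete, here is a small cost estimator. It is a sketch that uses only the figures listed above and assumes a flat rate with the free tier applied per feature; actual billing may be tiered or may change, so treat the output as a rough estimate. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Figures copied from the pricing table above (assumed flat rates).
VISION_PRICE_PER_1000_UNITS = Decimal("1.50")
VISION_FREE_UNITS_PER_FEATURE = 1000
VIDEO_PRICE_PER_MINUTE = {"label": Decimal("0.10"), "speech": Decimal("0.048")}


def vision_monthly_cost(units_by_feature: dict[str, int]) -> Decimal:
    """Estimate monthly Vision API cost; the free tier applies to each feature."""
    total = Decimal("0")
    for units in units_by_feature.values():
        billable = max(units - VISION_FREE_UNITS_PER_FEATURE, 0)
        total += VISION_PRICE_PER_1000_UNITS * billable / 1000
    return total


def video_cost(feature: str, minutes: int) -> Decimal:
    """Estimate Video Intelligence cost for one feature over a number of minutes."""
    return VIDEO_PRICE_PER_MINUTE[feature] * minutes
```

For example, 50,000 Label Detection units plus 500 OCR units in a month leaves 49,000 billable units, or $73.50, and 120 minutes of video Label Detection comes to $12.00.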

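Comparison with other clouds

Item	Vision AI	AWS Rekognition	Azure Computer Vision
OCR (Japanese)	Excellent	Good (English-centric)	Good
Face identification	Not supported	Excellent	Good
Video analysis	Video Intelligence	Video Analysis	Video Indexer
Custom training	AutoML Vision / Vertex	Custom Labels	Custom Vision
Price (Label)	$1.50 per 1,000	$1 per 1,000	$1 per 1,000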

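Typical use cases

E-commerce: automatic tagging of product images + blocking inappropriate images
Manufacturing: visual inspection (Vertex AI Vision custom models)
Retail: in-store foot traffic analysis (Vertex AI Vision Occupancy)
Media: moderation of video content + metadata generation
Logistics: OCR of shipping labels
Security: anomaly detection on surveillance cameras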

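What is Vision AI / Cloud Vision API?

A pre-trained API that detects objects, faces, text, logos, landmarks, inappropriate content, and more in images. Multiple features can be combined in a single request.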
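
Combining features in one request looks like this at the REST level: the `images:annotate` endpoint accepts a `requests` array in which each entry pairs one image with a list of features. The helper below is a sketch that only builds the JSON body and does not send it; `build_annotate_body` is a hypothetical name, and applying the same `maxResults` to every feature is a simplification.

```python
import json


def build_annotate_body(
    image_uri: str, feature_types: list[str], max_results: int = 10
) -> str:
    """Build a JSON body for POST https://vision.googleapis.com/v1/images:annotate."""
    request = {
        "image": {"source": {"imageUri": image_uri}},
        # One entry per feature; all of them run against the same image.
        "features": [
            {"type": feature_type, "maxResults": max_results}
            for feature_type in feature_types
        ],
    }
    return json.dumps({"requests": [request]}, indent=2)
```

Calling `build_annotate_body("gs://my-bucket/photo.jpg", ["LABEL_DETECTION", "TEXT_DETECTION"])` yields a body with a single request carrying two features. Each feature in a request is generally counted as its own billable unit.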

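What can the Video Intelligence API do?

It extracts shot changes, object tracks, text, logos, inappropriate content, person detections, and speech transcripts from video. Input videos are typically read from Cloud Storage and processed asynchronously, which suits large-scale video processing.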

How does it differ from Vertex AI Vision?

Vision API is a pre-trained API; Vertex AI Vision covers custom model training and IoT deployment. For real-time monitoring or Edge inference, use Vertex AI Vision.

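How does it relate to Imagen?

Vision API analyzes existing images; Imagen generates images from text. They are separate products but can be combined (for example, generate product images with Imagen, then run a quality check with Vision API).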

How is it priced?

Vision API: $1.50 per 1,000 units (billed per feature), with 1,000 units free each month. Video Intelligence: $0.10 per minute for Label Detection; pricing for the other features varies.

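How does it compare with AWS Rekognition / Azure Computer Vision?

All three offer roughly equivalent features. Vision AI has the edge in Japanese OCR accuracy, Web Detection, and SafeSearch. Rekognition is strong in face recognition.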

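What can AutoML Vision do?

It trains industry-specific image classification and object detection models without code, covering custom categories that Vision API cannot detect (such as classifying your own products).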

Is Edge deployment possible?

Yes. Vertex AI Vision can deploy models to IoT Edge devices, and AutoML Vision Edge generates lightweight models (TensorFlow Lite / Core ML / Edge TPU).

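About the author

NicheeLab Editorial Team

Specialists in data engineering and cloud certifications. The team holds certifications including Databricks and Snowflake, and writes questions and explanations based on hands-on experience. Operated by NicheeLab.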
