Extracting Pedestrian Trajectories from Street Video as JSON
Motivation
Why extract pedestrian trajectories from smartphone video footage? This approach serves multiple purposes in my research on urban social movements:
GIS-ready data: JSON output integrates seamlessly with geographic information systems and mapping tools
Cost-effective data collection: Eliminates the need for expensive GPS trackers or surveillance infrastructure
Understanding pedestrian behavior: Reveals how people move and interact in urban environments
Measuring protest reactions: Quantifies how standing demonstrations affect surrounding pedestrian flow
This project emphasizes rapid deployment for protest monitoring. The entire setup requires only a smartphone and tripod, enabling quick response to emerging events.
In urban planning and transportation studies, understanding pedestrian movement patterns is crucial for designing safer and more efficient public spaces. Previous methods like manual observation or GPS tracking have limitations in coverage and cost. Computer vision offers a scalable alternative through video analysis.
This article demonstrates how to extract pedestrian trajectories from street video footage using open-source tools. I'll use YOLOX-Tiny for real-time person detection and implement a custom centroid-based tracker to generate structured JSON trajectory data. The sample videos used in this project were captured on a smartphone, which keeps the setup lightweight and easy to deploy.
YOLOX-Tiny is a lightweight object detection model optimized for real-time inference. I use the ONNX export for cross-platform compatibility with OpenCV and ONNX Runtime.
The detection pipeline:
Preprocessing: Letterbox resizing to maintain aspect ratio
Inference: YOLOX model processes the frame
Postprocessing: Convert detections to bounding boxes
Filtering: Confidence thresholding and non-maximum suppression
For tracking detected persons across frames, I implement a simple but effective centroid tracker:
Each detection's bounding box center becomes a centroid
Tracks are maintained by matching centroids between frames
New tracks are registered for unmatched detections
Lost tracks are deregistered after a maximum disappearance threshold
For visualization, tracked detections are also drawn onto the output footage.
For each complete trajectory, I extract:
Duration: Total time the person was tracked
Distance: Total pixels traveled
Direction: Movement angle in degrees
Start/End positions: Entry and exit points
Screen exit detection: Whether the person left the frame
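These metrics follow directly from the stored (x, y, frame) points. A minimal sketch, assuming a helper of my own naming and a known frame rate:

```python
import math

def summarize_track(points, fps=30.0):
    """Compute summary metrics from a trajectory of (x, y, frame) tuples."""
    (x0, y0, f0), (x1, y1, f1) = points[0], points[-1]
    duration = (f1 - f0) / fps  # seconds tracked
    # Total path length in pixels (sum of segment lengths)
    distance = sum(
        math.hypot(bx - ax, by - ay)
        for (ax, ay, _), (bx, by, _) in zip(points, points[1:])
    )
    # Net movement angle in degrees; image y grows downward, so negate dy
    direction = math.degrees(math.atan2(-(y1 - y0), x1 - x0)) % 360
    return {
        "duration": duration,
        "total_distance": distance,
        "direction": direction,
        "start": (x0, y0),
        "end": (x1, y1),
    }
```

Screen-exit detection can be added on top by checking whether the end point lies within a margin of the frame border.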
# Required packages
pip install opencv-python numpy onnxruntime
# Download YOLOX-Tiny ONNX model
# From: https://github.com/Megvii-BaseDetection/YOLOX
def detect_persons(frame, session):
    # Preprocess frame
    blob, ratio = preprocess_yolox(frame, 416, 416)
    # Run inference
    output = session.run(None, {session.get_inputs()[0].name: blob})[0]
    # Postprocess detections
    # ... (filter by confidence, apply NMS)
    return boxes, confidences
Full CentroidTracker Implementation
from collections import defaultdict
import numpy as np

class CentroidTracker:
    """
    Centroid-based tracking algorithm for associating detected bounding boxes across frames.
    In addition to tracking centroids, it also maintains trajectories based on the foot point
    of the bounding box (the point where the person touches the ground), which is more stable
    for movement analysis.
    """

    def __init__(self, max_disappeared=50):
        self.next_object_id = 0
        self.objects = {}                      # ID: (centroid_x, centroid_y)
        self.disappeared = {}                  # ID: disappeared_frame_count
        self.trajectories = defaultdict(list)  # ID: [(x, y, frame), ...]
        self.first_seen = {}                   # ID: first frame detected
        self.last_seen = {}                    # ID: last frame detected
        self.max_disappeared = max_disappeared

    def register(self, centroid, foot_point, frame_num):
        """Register a new object with a unique ID."""
        self.objects[self.next_object_id] = centroid
        self.disappeared[self.next_object_id] = 0
        self.trajectories[self.next_object_id].append(
            (foot_point[0], foot_point[1], frame_num)
        )
        self.first_seen[self.next_object_id] = frame_num
        self.last_seen[self.next_object_id] = frame_num
        self.next_object_id += 1

    def deregister(self, object_id):
        """Deregister an object and remove it from active tracking."""
        del self.objects[object_id]
        del self.disappeared[object_id]

    def update(self, rects, frame_num):
        """
        Update the tracker with new bounding box detections.

        Args:
            rects: list of detected bounding boxes [(x1, y1, x2, y2), ...]
            frame_num: the current frame number

        Returns:
            objects: a dictionary mapping object IDs to their current centroids {ID: (cx, cy)}
        """
        # When no detections are present, mark existing objects as disappeared
        if len(rects) == 0:
            for object_id in list(self.disappeared.keys()):
                self.disappeared[object_id] += 1
                if self.disappeared[object_id] > self.max_disappeared:
                    self.deregister(object_id)
            return self.objects

        # Compute centroids and foot points for the current detections
        input_centroids = np.zeros((len(rects), 2), dtype="int")
        input_feet = np.zeros((len(rects), 2), dtype="int")
        for i, (x1, y1, x2, y2) in enumerate(rects):
            cx = int((x1 + x2) / 2.0)
            input_centroids[i] = (cx, int((y1 + y2) / 2.0))
            input_feet[i] = (cx, y2)  # foot point is the bottom center of the bounding box

        # If there are no existing objects, register all input centroids
        if len(self.objects) == 0:
            for i in range(len(input_centroids)):
                self.register(input_centroids[i], input_feet[i], frame_num)
        # Otherwise, match input centroids to existing object centroids
        else:
            object_ids = list(self.objects.keys())
            object_centroids = list(self.objects.values())

            # Compute the distance matrix between existing object centroids and input centroids
            D = np.zeros((len(object_centroids), len(input_centroids)))
            for i, oc in enumerate(object_centroids):
                for j, ic in enumerate(input_centroids):
                    D[i, j] = np.linalg.norm(oc - ic)

            # Greedily pair each existing object with its nearest input centroid
            rows = D.min(axis=1).argsort()
            cols = D.argmin(axis=1)[rows]
            used_rows = set()
            used_cols = set()
            for (row, col) in zip(rows, cols):
                if row in used_rows or col in used_cols:
                    continue
                # Only accept the pair when the distance is below a threshold
                if D[row, col] > 100:  # if the distance is too large, ignore the match (this threshold can be tuned)
                    continue
                object_id = object_ids[row]
                self.objects[object_id] = input_centroids[col]  # use the centroid for tracking
                self.disappeared[object_id] = 0
                self.trajectories[object_id].append(
                    (input_feet[col][0], input_feet[col][1], frame_num)  # use the foot point for trajectory analysis
                )
                self.last_seen[object_id] = frame_num
                used_rows.add(row)
                used_cols.add(col)

            # Existing objects that were not matched
            unused_rows = set(range(D.shape[0])) - used_rows
            for row in unused_rows:
                object_id = object_ids[row]
                self.disappeared[object_id] += 1
                if self.disappeared[object_id] > self.max_disappeared:
                    self.deregister(object_id)

            # Input centroids that were not matched become new objects
            unused_cols = set(range(D.shape[1])) - used_cols
            for col in unused_cols:
                self.register(input_centroids[col], input_feet[col], frame_num)

        return self.objects
The trajectory data is saved as structured JSON:
{
  "video_name": "street_footage.mp4",
  "fps": 30,
  "resolution": "1920x1080",
  "tracks": [
    {
      "id": 1,
      "duration": 12.5,
      "total_distance": 320.4,
      "trajectory": [
        {"x": 100, "y": 200, "frame": 10, "time_sec": 0.333},
        {"x": 105, "y": 202, "frame": 11, "time_sec": 0.367}
      ],
      "geometry": {
        "type": "LineString",
        "coordinates": [[100, 200], [105, 202]]
      }
    }
  ]
}
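A minimal export sketch that produces this structure, assuming the tracker's `trajectories` dict and a known frame rate; the helper name `export_tracks_json` and the `min_points` filter are my own additions:

```python
import math

def export_tracks_json(trajectories, fps, video_name, resolution, min_points=2):
    """Serialize foot-point trajectories {id: [(x, y, frame), ...]} into the JSON schema above."""
    tracks = []
    for track_id, points in trajectories.items():
        if len(points) < min_points:
            continue  # drop one-off detections
        f0, f1 = points[0][2], points[-1][2]
        # Total path length in pixels (sum of segment lengths)
        distance = sum(math.hypot(bx - ax, by - ay)
                       for (ax, ay, _), (bx, by, _) in zip(points, points[1:]))
        tracks.append({
            "id": track_id,
            "duration": round((f1 - f0) / fps, 3),
            "total_distance": round(distance, 1),
            "trajectory": [{"x": x, "y": y, "frame": f,
                            "time_sec": round(f / fps, 3)}
                           for x, y, f in points],
            "geometry": {"type": "LineString",
                         "coordinates": [[x, y] for x, y, _ in points]},
        })
    return {"video_name": video_name, "fps": fps,
            "resolution": resolution, "tracks": tracks}
```

The returned dict can be written out with `json.dump`; the LineString geometry drops directly into GeoJSON-aware tools once coordinates are georeferenced.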
Processing a 5-minute video at 30 FPS means running detection on roughly 9,000 frames. The resulting JSON output provides rich data for further analysis:
Spatial patterns of movement
Temporal distribution of pedestrian activity
Flow direction analysis
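As one concrete example of flow-direction analysis, each track's net heading can be binned into compass sectors. This sketch operates on the JSON structure shown earlier; the function name and bin count are my own choices:

```python
import math
from collections import Counter

def flow_direction_histogram(data, n_bins=8):
    """Bin each track's net movement angle into n_bins equal angular sectors."""
    counts = Counter()
    for track in data["tracks"]:
        coords = track["geometry"]["coordinates"]
        (x0, y0), (x1, y1) = coords[0], coords[-1]
        # image y grows downward, so negate dy for conventional angles
        angle = math.degrees(math.atan2(-(y1 - y0), x1 - x0)) % 360
        counts[int(angle // (360 / n_bins))] += 1
    return counts
```

With 8 bins, bin 0 covers rightward movement (0–45°), bin 2 covers upward movement in image terms, and so on.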
Cost-effective: Uses commodity hardware and free software
Scalable: Can process hours of footage automatically
Structured output: JSON format integrates with GIS and analysis tools
Real-time capable: YOLOX-Tiny enables live processing
The sample videos in this project were captured on a smartphone, but the same pipeline can be applied to fixed surveillance cameras for longer-term monitoring.
Processing demonstration videos from Shinbashi station revealed insights about centroid tracking performance and pedestrian behavior during protests:
Commuter indifference: In Japan, individual protests are uncommon, so commuters typically ignore demonstrators. Additionally, most people are busy office workers who tend to focus on their commute rather than noticing activities around them.
Camera height issues: Using a smartphone camera with a low tripod created unreliable detections. People near the camera appeared with unnatural up-and-down trajectories due to the low-angle perspective.
ID swapping during interactions: When pedestrians crossed paths or interacted closely, their tracking IDs would swap, creating fragmented trajectories for the same individuals.
Overall, the system successfully captured general movement patterns. Future improvements could include filtering trajectories with sudden angle changes after intersections or removing outliers based on historical movement differences.
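The angle-change filter suggested above could look like this sketch; the function name and the 120° threshold are hypothetical and would need tuning against real tracks:

```python
import math

def has_sudden_turn(points, max_turn_deg=120.0):
    """Flag a trajectory [(x, y, frame), ...] containing an implausibly sharp turn,
    which in this footage usually indicates an ID swap rather than a real pedestrian."""
    headings = []
    for (ax, ay, _), (bx, by, _) in zip(points, points[1:]):
        if (bx, by) != (ax, ay):  # skip zero-length segments
            headings.append(math.degrees(math.atan2(by - ay, bx - ax)))
    for h1, h2 in zip(headings, headings[1:]):
        turn = abs((h2 - h1 + 180) % 360 - 180)  # smallest angle between headings
        if turn > max_turn_deg:
            return True
    return False
```

Flagged trajectories could be dropped, or split at the sharp turn into two candidate tracks.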
Occlusion handling: Simple centroid tracking fails in crowds
Camera motion: Assumes static camera position
Identity persistence: No re-identification across camera cuts
Stopping behavior: People who stop moving in videos sometimes lose their tracking ID due to centroid distance thresholds, leading to fragmented trajectories (e.g., ID 5 → 110 → 430 as the same person gets re-detected with new IDs)
For crowded scenes, more sophisticated trackers like DeepSORT or ByteTrack would improve performance. Camera motion compensation using optical flow could extend applicability to moving platforms.
In this project, I prioritized spending time on analysis and visualization rather than implementing the most advanced tracking pipeline; that tradeoff made it easier to iterate quickly with real data.
This trajectory data serves as input for:
Urban planning: Identifying pedestrian flow bottlenecks
Safety analysis: Detecting high-risk crossing patterns
Traffic engineering: Optimizing signal timing
Accessibility studies: Understanding mobility patterns
The structured JSON format makes it easy to integrate with mapping libraries like MapLibre GL JS for visualization, as I'll explore in the next article.
By combining YOLOX-Tiny detection with centroid tracking, I can extract meaningful pedestrian trajectory data from video footage. The resulting JSON structure provides a foundation for spatial analysis of urban movement patterns. While the current implementation works well for moderate-density scenarios, future enhancements could address occlusion and camera motion challenges.
In the next article, I'll visualize these trajectories on an interactive map using MapLibre GL JS.
YOLOX-Tiny ONNX model: https://github.com/Megvii-BaseDetection/YOLOX
Centroid tracker: https://pyimagesearch.com/2025/07/14/people-tracker-with-yolov12-and-centroid-tracker/
My GitHub project: https://github.com/TOKIHISA/people_trajectory_analysis/blob/main/src/detects_people.py