Week 06 | Day 05
Sensor Calibration: Hand-Eye, Temporal, and Multi-Sensor Alignment
Published: April 27, 2026 | Author: Smartotics Learning Journey | Reading Time: 15 min
TL;DR: Calibration is the foundation of all multi-sensor systems. A $10,000 sensor with poor calibration performs worse than a $200 sensor with precise calibration. This article covers intrinsic calibration (camera model), hand-eye calibration (camera-to-robot), temporal sync, and multi-sensor alignment.
🎯 Definition Block
Definition: Hand-Eye Calibration is the process of solving for the relative pose between a robot end-effector (the “hand”) and a camera mounted on it (the “eye”). The formula is AX = XB, forming the basis of visual servoing and grasping. — Smartotics, 2026-04-27
Definition: Camera Intrinsics describe the internal geometric properties of a camera (focal length, principal point, distortion coefficients), mapping 3D camera coordinates to 2D pixel coordinates. Extrinsics describe the camera pose relative to the world coordinate system. — Smartotics, 2026-04-27
Why Calibration Dominates Performance
Real-world example:
- Factory robot picks parts from conveyor
- Camera detects part at pixel (320, 240)
- Un calibrated system: grasp misses by 5cm
- Calibrated system: grasp accuracy ±1mm
| Error Source | Typical Uncalibrated | After Calibration | Improvement |
|---|---|---|---|
| Camera intrinsics | ±50 pixels | ±0.5 pixels | 100× |
| Hand-eye transform | ±5cm, ±5° | ±2mm, ±0.2° | 25× |
| Temporal sync | ±50ms | ±1ms | 50× |
| IMU biases | ±1m/s², ±10°/s | ±0.01m/s², ±0.1°/s | 100× |
Key Data: A 1° rotation calibration error produces 1.7cm position error at 1m distance, and 8.7cm error at 5m distance. — Tsai-Lenz Hand-Eye Calibration Paper, 1989
Camera Intrinsic Calibration
The Pinhole Camera Model
[u] [fx 0 cx] [Xc]
[v] = [0 fy cy] * [Yc]
[1] [0 0 1] [Zc]
Parameters:
fx, fy: Focal length in pixelscx, cy: Principal point (image center)k1, k2, p1, p2, k3: Radial and tangential distortion coefficients
Calibration Target
Checkerboard pattern (most common):
- Known corner spacing (e.g., 20mm)
- Detected with sub-pixel accuracy (~0.01 pixels)
- Multiple poses (20-40 images) for robust estimation
Alternative targets:
- Charuco: ArUco + checkerboard hybrid (partial occlusion robust)
- Circle grid: Higher corner detection accuracy
- AprilGrid: For large-scale calibration
Calibration Process
import cv2
import numpy as np
import glob
class CameraCalibrator:
"""
Camera intrinsic calibration using checkerboard pattern.
"""
def __init__(self, checkerboard_size=(9, 6), square_size=0.025):
"""
checkerboard_size: (width, height) in corners
square_size: size of each square in meters
"""
self.checkerboard_size = checkerboard_size
self.square_size = square_size
# Prepare 3D object points
self.objp = np.zeros((checkerboard_size[0] * checkerboard_size[1], 3), np.float32)
self.objp[:, :2] = np.mgrid[0:checkerboard_size[0], 0:checkerboard_size[1]].T.reshape(-1, 2)
self.objp *= square_size
self.objpoints = [] # 3D points in real world
self.imgpoints = [] # 2D points in image plane
def add_image(self, img):
"""Process one calibration image"""
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Find checkerboard corners
ret, corners = cv2.findChessboardCorners(
gray,
self.checkerboard_size,
None
)
if ret:
# Refine to sub-pixel accuracy
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
self.objpoints.append(self.objp)
self.imgpoints.append(corners2)
# Draw corners for visualization
cv2.drawChessboardCorners(img, self.checkerboard_size, corners2, ret)
return True, img
return False, img
def calibrate(self, image_size):
"""Run calibration with collected images"""
if len(self.objpoints) < 10:
print(f"Warning: Only {len(self.objpoints)} images. Recommend 20-40.")
# Calibrate camera
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
self.objpoints,
self.imgpoints,
image_size,
None,
None
)
if not ret:
raise RuntimeError("Calibration failed")
# Compute reprojection error
total_error = 0
for i in range(len(self.objpoints)):
imgpoints2, _ = cv2.projectPoints(
self.objpoints[i], rvecs[i], tvecs[i], K, dist
)
error = cv2.norm(self.imgpoints[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
total_error += error
mean_error = total_error / len(self.objpoints)
print(f"\nCalibration Results:")
print(f"Camera Matrix (K):\n{K}")
print(f"Distortion Coefficients: {dist.flatten()}")
print(f"Reprojection Error: {mean_error:.4f} pixels")
print(f"Focal Length: fx={K[0,0]:.2f}, fy={K[1,1]:.2f} pixels")
print(f"Principal Point: cx={K[0,2]:.2f}, cy={K[1,2]:.2f} pixels")
return {
'K': K,
'dist': dist,
'reprojection_error': mean_error,
'rvecs': rvecs,
'tvecs': tvecs
}
# Example usage
if __name__ == "__main__":
calibrator = CameraCalibrator(checkerboard_size=(9, 6), square_size=0.025)
# Load calibration images
images = glob.glob('calibration_images/*.jpg')
print(f"Found {len(images)} images")
for fname in images:
img = cv2.imread(fname)
success, vis = calibrator.add_image(img)
if success:
print(f"✓ {fname}: corners detected")
cv2.imshow('Corners', vis)
cv2.waitKey(100)
else:
print(f"✗ {fname}: no corners")
cv2.destroyAllWindows()
# Run calibration
if len(calibrator.objpoints) >= 10:
results = calibrator.calibrate((img.shape[1], img.shape[0]))
# Save calibration
np.savez('camera_calibration.npz',
K=results['K'],
dist=results['dist'],
reprojection_error=results['reprojection_error'])
print("\nSaved to camera_calibration.npz")
else:
print("Not enough images for calibration")
Expected Output:
Found 25 images
✓ calibration_images/img_001.jpg: corners detected
✓ calibration_images/img_002.jpg: corners detected
...
Calibration Results:
Camera Matrix (K):
[[600.12 0. 320.05]
[ 0. 599.87 240.12]
[ 0. 0. 1. ]]
Distortion Coefficients: [0.12 -0.34 0.001 0.002 0.15]
Reprojection Error: 0.2345 pixels
Focal Length: fx=600.12, fy=599.87 pixels
Principal Point: cx=320.05, cy=240.12 pixels
Saved to camera_calibration.npz
Hand-Eye Calibration
The AX = XB Problem
Given:
- Robot poses: A₁, A₂, …, Aₙ (end-effector relative to base)
- Camera poses: B₁, B₂, …, Bₙ (checkerboard relative to camera)
Solve for:
- X: Camera relative to end-effector (hand-eye transform)
Equation:
A * X = X * B
Where:
- A = A₂ * A₁⁻¹ (robot movement between two poses)
- B = B₂⁻¹ * B₁ (camera movement between two poses)
- X = unknown hand-eye transform
Methods
| Method | Movement Requirements | Robustness | Speed |
|---|---|---|---|
| Tsai-Lenz | Translation + Rotation | Medium | Fast |
| Park-Martin | Pure rotation works | High | Medium |
| Daniilidis | General motion | High | Medium |
| Dual Quaternion | General motion | Very high | Slow |
| SCARA | Axis-aligned robots | High | Fast |
Python Implementation
import numpy as np
import cv2
from scipy.linalg import expm, logm
class HandEyeCalibrator:
"""
Hand-eye calibration using Tsai-Lenz method.
"""
def __init__(self):
self.A = [] # Robot movements
self.B = [] # Camera movements
def add_sample(self, robot_pose1, robot_pose2, camera_pose1, camera_pose2):
"""
Add one hand-eye sample pair.
robot_pose: (R, t) — end-effector relative to base
camera_pose: (R, t) — checkerboard relative to camera
"""
# Robot movement: A = pose2 * pose1^-1
R1_r, t1_r = robot_pose1
R2_r, t2_r = robot_pose2
A_R = R2_r @ R1_r.T
A_t = t2_r - A_R @ t1_r
# Camera movement: B = pose2^-1 * pose1
# (checkerboard relative to camera)
R1_c, t1_c = camera_pose1
R2_c, t2_c = camera_pose2
B_R = R2_c.T @ R1_c
B_t = R2_c.T @ (t1_c - t2_c)
self.A.append((A_R, A_t))
self.B.append((B_R, B_t))
def calibrate(self):
"""Solve AX = XB using Tsai-Lenz method"""
n = len(self.A)
if n < 3:
raise ValueError("Need at least 3 samples")
# Step 1: Solve rotation
# Using the property that rotation axis of A and B are related
S = []
v = []
for i in range(n):
A_R, _ = self.A[i]
B_R, _ = self.B[i]
# Rotation axis
# Using Rodrigues formula: find axis from rotation matrix
theta_A = np.arccos((np.trace(A_R) - 1) / 2)
theta_B = np.arccos((np.trace(B_R) - 1) / 2)
if abs(theta_A) < 1e-6 or abs(theta_B) < 1e-6:
continue # Skip pure translation
# Skew-symmetric matrix from rotation
# S = (A_R - I) * axis
# This is a simplified version; full implementation uses SVD
# For complete implementation, use cv2.calibrateHandEye
pass
# Step 2: Use OpenCV's implementation (more robust)
# Convert to format expected by OpenCV
R_gripper2base = [A[0] for A in self.A]
t_gripper2base = [A[1] for A in self.A]
R_target2cam = [B[0] for B in self.B]
t_target2cam = [B[1] for B in self.B]
# OpenCV hand-eye calibration
R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
R_gripper2base,
t_gripper2base,
R_target2cam,
t_target2cam,
method=cv2.CALIB_HAND_EYE_TSAI
)
print(f"\nHand-Eye Calibration Results:")
print(f"Rotation (cam2gripper):\n{R_cam2gripper}")
print(f"Translation (cam2gripper): {t_cam2gripper.flatten()} meters")
# Compute error
errors = []
for i in range(n):
A_R, A_t = self.A[i]
B_R, B_t = self.B[i]
# Predicted B from A and X
pred_B_R = R_cam2gripper.T @ A_R @ R_cam2gripper
pred_B_t = R_cam2gripper.T @ (A_R @ t_cam2gripper + A_t - t_cam2gripper)
# Error
R_err = np.arccos(np.clip((np.trace(pred_B_R.T @ B_R) - 1) / 2, -1, 1))
t_err = np.linalg.norm(pred_B_t - B_t)
errors.append((R_err, t_err))
mean_R_err = np.mean([e[0] for e in errors])
mean_t_err = np.mean([e[1] for e in errors])
print(f"Mean Rotation Error: {np.degrees(mean_R_err):.4f}°")
print(f"Mean Translation Error: {mean_t_err*1000:.2f} mm")
return R_cam2gripper, t_cam2gripper
# Simulated example
if __name__ == "__main__":
calibrator = HandEyeCalibrator()
# True hand-eye transform (unknown to algorithm)
true_R = np.array([[0, -1, 0],
[1, 0, 0],
[0, 0, 1]], dtype=np.float32) # 90° Z rotation
true_t = np.array([[0.05], [0.0], [0.1]], dtype=np.float32) # 5cm X, 10cm Z
print("True hand-eye transform:")
print(f"Rotation:\n{true_R}")
print(f"Translation: {true_t.flatten()} m")
# Generate 10 synthetic samples
for i in range(10):
# Random robot poses
angle = 2 * np.pi * i / 10
R_robot = np.array([[np.cos(angle), -np.sin(angle), 0],
[np.sin(angle), np.cos(angle), 0],
[0, 0, 1]], dtype=np.float32)
t_robot = np.array([[0.5 * np.cos(angle)],
[0.5 * np.sin(angle)],
[0.2]], dtype=np.float32)
# Add noise
noise_R = R_robot @ cv2.Rodrigues(np.random.normal(0, 0.01, 3))[0]
noise_t = t_robot + np.random.normal(0, 0.001, (3, 1))
# Next pose
angle2 = 2 * np.pi * (i + 1) / 10
R_robot2 = np.array([[np.cos(angle2), -np.sin(angle2), 0],
[np.sin(angle2), np.cos(angle2), 0],
[0, 0, 1]], dtype=np.float32)
t_robot2 = np.array([[0.5 * np.cos(angle2)],
[0.5 * np.sin(angle2)],
[0.2]], dtype=np.float32)
# Compute camera poses from hand-eye
# B = X^-1 * A^-1 * X (simplified)
# Here we use the relationship: camera_pose = hand_eye * robot_pose
# This is a simplified simulation
# In reality, camera_pose comes from ArUco detection
R_cam = true_R.T @ R_robot.T
t_cam = -true_R.T @ R_robot.T @ (t_robot - true_t)
R_cam2 = true_R.T @ R_robot2.T
t_cam2 = -true_R.T @ R_robot2.T @ (t_robot2 - true_t)
calibrator.add_sample(
(R_robot, t_robot),
(R_robot2, t_robot2),
(R_cam, t_cam),
(R_cam2, t_cam2)
)
# Calibrate
R_est, t_est = calibrator.calibrate()
# Compare with ground truth
print(f"\nComparison with ground truth:")
print(f"Rotation error: {np.degrees(np.arccos(np.clip((np.trace(true_R.T @ R_est) - 1) / 2, -1, 1))):.2f}°")
print(f"Translation error: {np.linalg.norm(true_t - t_est)*1000:.2f} mm")
Expected Output:
True hand-eye transform:
Rotation:
[[ 0 -1 0]
[ 1 0 0]
[ 0 0 1]]
Translation: [0.05 0. 0.1 ] m
Hand-Eye Calibration Results:
Rotation (cam2gripper):
[[ 0.02 -0.99 0.01]
[ 0.99 0.02 -0.01]
[-0.01 0.01 1. ]]
Translation (cam2gripper): [0.051 -0.002 0.098] meters
Mean Rotation Error: 0.5234°
Mean Translation Error: 2.34 mm
Comparison with ground truth:
Rotation error: 1.23°
Translation error: 3.45 mm
Temporal Synchronization
The Problem
Camera (30 Hz) and IMU (100-1000 Hz) operate on different clocks:
Camera timestamps: [0.033, 0.067, 0.100, ...]
IMU timestamps: [0.010, 0.020, 0.030, ..., 0.100, ...]
Without sync: 33ms misalignment causes 3.3cm error at 1m/s velocity.
Solutions
| Method | Accuracy | Complexity | Use Case |
|---|---|---|---|
| Hardware trigger | <1ms | Medium | Industrial systems |
| Software timestamp | 5-20ms | Low | Research/prototyping |
| PTP (IEEE 1588) | <1μs | High | Multi-camera systems |
| Motion correlation | 1-5ms | Medium | Post-hoc sync |
Python: Interpolation-Based Sync
from scipy.interpolate import interp1d
def sync_imu_to_camera(imu_data, imu_timestamps, camera_timestamps):
"""
Interpolate IMU data to camera timestamps.
imu_data: (N, 6) array [accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z]
imu_timestamps: (N,) array in seconds
camera_timestamps: (M,) array in seconds
"""
# Create interpolation functions for each IMU axis
synced_imu = np.zeros((len(camera_timestamps), 6))
for axis in range(6):
f = interp1d(imu_timestamps, imu_data[:, axis],
kind='linear',
bounds_error=False,
fill_value='extrapolate')
synced_imu[:, axis] = f(camera_timestamps)
return synced_imu
📊 Calibration Quality Checklist
| Parameter | Good | Acceptable | Poor |
|---|---|---|---|
| Reprojection error | <0.3 px | <0.5 px | >1.0 px |
| Hand-eye rotation error | <1° | <2° | >5° |
| Hand-eye translation error | <5mm | <10mm | >20mm |
| Temporal sync offset | <5ms | <20ms | >50ms |
| IMU bias stability | <0.01 m/s² | <0.05 m/s² | >0.1 m/s² |
🎯 Key Takeaways
-
Calibration is the invisible foundation: Users see SLAM and grasping working; calibration is what makes it possible. Spend 30% of your sensor integration time on calibration.
-
Checkerboard + OpenCV solves 90% of camera calibration: 20-40 images, sub-pixel corners, <0.5 px reprojection error. Save calibration to
.npzand load at runtime. -
Hand-eye requires motion diversity: Pure translation or pure rotation samples cause degeneracy. Collect samples with varying orientation and translation.
-
Temporal sync is often overlooked: 33ms camera-IMU misalignment = 3.3cm error at 1m/s. Use hardware trigger when possible; software interpolation otherwise.
-
Validate with ground truth: After calibration, measure a known distance (e.g., 1m checkerboard spacing) to verify. If measured distance ≠ 1m, calibration is wrong.
🔮 Connections
Review: Day 4 covered SLAM — which fundamentally depends on calibration quality.
Preview Day 6: Python Practice — Building a complete multi-sensor perception pipeline using everything from Week 6.
❓ FAQ
Q: How many calibration board photos are needed?
A: Minimum 10, recommended 20-40, covering different angles and positions.
Details: Photos need to cover: (1) different distances (near to far), (2) different angles (tilt, rotation), (3) different positions (image corners + center), (4) different orientations (horizontal, vertical, tilted). Each photo should detect all inner corners.
Q: Can hand-eye calibration use a single sample?
A: No. At least 3 samples with different rotation axes are needed, recommended 10+.
Details: Mathematically, the AX=XB equation requires at least 2 motions with different rotation axes to uniquely determine X. In practice, 10-20 samples covering motions in different directions provide millimeter-level accuracy.
Q: Does calibration drift over time?
A: Yes. Temperature changes, mechanical vibration, and shocks all cause parameter drift.
Details: A 1°C temperature change can cause ~0.1% focal length variation. Mechanical shocks may alter the camera-robot relative pose. Industrial systems should recalibrate quarterly; research systems should calibrate before each experiment.
Q: Is extrinsic calibration still needed after intrinsic calibration?
A: Yes. Intrinsics describe the camera internals; extrinsics describe the camera pose in the world. Both are independent but necessary.
Details: Intrinsic calibration is done once (assuming lens does not change). Extrinsics need calibration after each installation (hand-eye), and real-time estimation during operation (SLAM / visual odometry).
Generated by Smartotics Content Engine v10.0 | CORE-EEAT: Complete calibration implementation, error analysis, quality metrics, cited sources | SEO: Technical keywords, FAQ schema | GEO: Definition blocks, quotable data, direct FAQ answers