Week 06 | Day 01
Sensors & Perception: How Robots Sense the World
Published: April 27, 2026 | Author: Smartotics Learning Journey | Reading Time: 16 min
TL;DR: Robots use diverse sensors — cameras (2D/3D), LiDAR, IMU, tactile arrays, encoders — to perceive their environment. Sensor fusion combines these inputs into coherent world models, enabling autonomous navigation, manipulation, and interaction.
🎯 Definition Block
Definition: Robot Perception is the process by which robots acquire environmental information through sensors and convert it into structured representations usable for decision-making. It is the foundation of robot autonomy, bridging the physical world and intelligent decision-making. — Smartotics, 2026-04-27
Definition: Sensor Fusion is the technology of integrating data from multiple sensors into a consistent, accurate, and reliable environmental representation, typically through Kalman filtering or deep learning. — Smartotics, 2026-04-27
Why Sensors Matter
Every autonomous system — from warehouse AMRs to surgical robots to self-driving cars — depends on perception. You can’t act on what you can’t see.
Consider the reality gap:
- A factory robot arm knows its joint angles precisely (encoders), but doesn’t know if a human walked into its workspace
- A drone knows its orientation (IMU), but can’t navigate without seeing obstacles
- A vacuum robot knows it’s bumped something (bumper), but can’t plan around furniture it hasn’t detected
The sensor stack bridges this gap.
The Sensor Taxonomy
| Sensor Category | What It Measures | Key Examples | Typical Cost |
|---|---|---|---|
| Proprioceptive | Internal state | Encoders, IMU, current sensors | $10–$200 |
| Exteroceptive | External environment | Camera, LiDAR, sonar, radar | $50–$10,000+ |
| Proximity | Distance to nearby objects | IR, ultrasonic, capacitive | $5–$50 |
| Tactile | Physical contact/force | FSR, strain gauges, tactile arrays | $20–$500 |
| Environmental | Ambient conditions | Temperature, humidity, gas | $5–$100 |
Proprioceptive vs. Exteroceptive
Proprioceptive sensors tell the robot about itself:
- Motor encoders: Joint position/velocity (resolution: 0.01°–0.001°)
- IMU: Acceleration + angular velocity (100–1000 Hz update rate)
- Current sensors: Motor torque estimation (indirect force sensing)
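As a rough illustration of the proprioceptive quantities listed above, the sketch below converts encoder counts to a joint angle and estimates motor torque from current. The counts-per-revolution value and the torque constant are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def encoder_resolution_deg(counts_per_rev: int) -> float:
    """Smallest measurable angle step, in degrees per count."""
    return 360.0 / counts_per_rev


def counts_to_angle_deg(counts: int, counts_per_rev: int) -> float:
    """Joint angle in degrees, wrapped to [0, 360)."""
    return (counts % counts_per_rev) * 360.0 / counts_per_rev


def torque_from_current(current_a: float, torque_constant_nm_per_a: float) -> float:
    """Indirect torque estimate tau = Kt * I (ignores friction and gearing)."""
    return torque_constant_nm_per_a * current_a


if __name__ == "__main__":
    cpr = 36_000  # hypothetical encoder with 36,000 counts per revolution
    print(encoder_resolution_deg(cpr))      # 0.01 degrees per count
    print(counts_to_angle_deg(9_000, cpr))  # 90.0 degrees
    print(torque_from_current(2.0, 0.05))   # 0.1 N*m, assuming Kt = 0.05 N*m/A
```

An encoder at the coarse end of the quoted resolution range (0.01°) therefore needs 36,000 counts per revolution; the fine end (0.001°) needs ten times as many.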
Exteroceptive sensors tell the robot about the world:
- Cameras: 2D/3D visual data (30–120 Hz)
- LiDAR: Precise 3D point clouds (10–20 Hz, 0.1° angular resolution)
- Depth sensors: Structured light, stereo vision, Time-of-Flight (ToF)
Key Data: Modern autonomous vehicles typically carry 8+ cameras, 1–3 LiDARs, 5+ radars, and 1 IMU, with sensor hardware costing 15–30% of total BOM. — Waymo Technical Report, 2025; Tesla AI Day, 2025
Vision: The Dominant Exteroceptive Sensor
2D Cameras
Standard RGB cameras remain the most information-dense sensor per dollar.
| Specification | Consumer Grade | Industrial Grade | Robotic Vision |
|---|---|---|---|
| Resolution | 2K–4K | 5MP–12MP | 2MP–8MP (balance with latency) |
| Frame Rate | 30 fps | 60–120 fps | 30–90 fps |
| Interface | USB 3.0 | GigE / USB 3.1 | GigE Vision (industrial standard) |
| Latency | 50–100 ms | 10–30 ms | 20–50 ms |
| Cost | $50–$200 | $500–$2,000 | $200–$1,500 |
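The resolution, frame-rate, and interface rows in the table interact: more pixels or more frames per second means more data to move. A minimal sketch of that estimate follows, assuming uncompressed 8-bit monochrome frames and a nominal 1 Gbit/s GigE link with protocol overhead ignored (both are simplifying assumptions, not figures from the table).

```python
def raw_data_rate_mbit_s(width: int, height: int, fps: float,
                         bytes_per_pixel: int = 1) -> float:
    """Uncompressed video data rate in megabits per second."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


def fits_link(rate_mbit_s: float, link_mbit_s: float = 1000.0) -> bool:
    """True if the raw stream fits the nominal link capacity."""
    return rate_mbit_s <= link_mbit_s


if __name__ == "__main__":
    # 1920x1080 is roughly a 2 MP sensor
    for fps in (30, 60, 90):
        rate = raw_data_rate_mbit_s(1920, 1080, fps)
        print(f"{fps} fps: {rate:.1f} Mbit/s, fits GigE: {fits_link(rate)}")
```

Under these assumptions a 2 MP stream at 60 fps already sits near the nominal link limit, which is one reason robotic vision systems often trade resolution for latency.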
Key capabilities:
- Object detection & classification (YOLO, DETR, RT-DETR)
- Semantic segmentation (pixel-level class labels)
- Visual odometry (tracking camera motion from image sequences)
- AprilTag/ArUco marker detection (precision pose estimation)
3D / Depth Cameras
| Technology | Principle | Range | Precision | Cost | Best For |
|---|---|---|---|---|---|
| Stereo Vision | Triangulation from two cameras | 0.3–10 m | 1–5% of range | $50–$300 | Outdoor, textured scenes |
| Structured Light | Projected pattern distortion | 0.2–3 m | 0.1–1% | $100–$500 | Indoor, close-range manipulation |
| Time-of-Flight (ToF) | Light pulse round-trip time | 0.1–5 m | 1–3% | $50–$200 | Real-time applications |
| LiDAR | Laser pulse time-of-flight | 0.5–300 m | 0.5–2 cm | $500–$10,000 | Long-range, outdoor, precision mapping |
Key Data: Intel RealSense D435 (active infrared stereo) provides a 1280×720 @ 30 fps depth stream at about $200, making it a popular depth camera for educational and research robots. — Intel RealSense Datasheet, 2025
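The "Principle" column of the depth-camera table can be turned into two short formulas: stereo depth from disparity (Z = f·B/d) and time-of-flight distance (d = c·t/2). The sketch below uses a hypothetical 640 px focal length and 5 cm baseline purely for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def stereo_depth_error_m(focal_px: float, baseline_m: float,
                         depth_m: float, disparity_error_px: float) -> float:
    """First-order depth uncertainty: dZ = Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


def tof_distance_m(round_trip_s: float) -> float:
    """Time-of-flight range: the pulse travels out and back, so divide by two."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


if __name__ == "__main__":
    f_px, base = 640.0, 0.05  # hypothetical camera parameters
    print(stereo_depth_m(f_px, base, 16.0))             # depth for a 16 px disparity
    print(stereo_depth_error_m(f_px, base, 2.0, 0.1))   # error at 2 m
    print(stereo_depth_error_m(f_px, base, 4.0, 0.1))   # error at 4 m (four times larger)
    print(tof_distance_m(20e-9))                        # range for a 20 ns round trip
```

Stereo error grows with the square of the distance, which is consistent with the table quoting stereo precision as a percentage of range rather than a fixed value.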
LiDAR: The Precision Mapper
LiDAR (Light Detection and Ranging) has dropped in price from about $75,000 for a mechanical Velodyne HDL-64E (2012) to under $500 for some solid-state units (2025).
LiDAR Specifications Comparison
| Model | Type | Range | Points/sec | FOV (H×V) | Cost (2025) |
|---|---|---|---|---|---|
| Velodyne VLP-16 | Mechanical | 100 m | 300k | 360°×30° | $4,000 |
| Ouster OS1-64 | Mechanical | 120 m | 1.3M | 360°×33° | $6,000 |
| Livox Mid-360 | Solid-state | 80 m | 200k | 360°×59° | $1,000 |
| RoboSense M1 Plus | MEMS | 150 m | 750k | 120°×25° | $500 |
| Hesai ET25 | Solid-state | 120 m | 300k | 120°×25° | $400 |
Key metrics:
- Range: Maximum detectable distance (objects beyond this are invisible)
- Resolution: Angular spacing between points (0.1° = ~1.7 cm at 10 m)
- Update rate: How fast the full scene is scanned (10 Hz = 100 ms per sweep)
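- Multi-echo: Ability to record several returns per laser pulse, which helps in rain, fog, or foliage (the first echo is often the partial obstruction, the last echo the solid surface behind it)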
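The metrics above are linked by simple geometry and arithmetic. A small sketch, using only values already quoted in this section (0.1° resolution, 10 Hz update rate, 300k points/sec):

```python
import math


def point_spacing_m(range_m: float, angular_resolution_deg: float) -> float:
    """Distance between neighbouring points on a surface at the given range."""
    return range_m * math.tan(math.radians(angular_resolution_deg))


def sweep_period_ms(update_rate_hz: float) -> float:
    """Time needed to scan the full scene once."""
    return 1000.0 / update_rate_hz


def points_per_sweep(points_per_second: float, update_rate_hz: float) -> float:
    """Number of points available in a single sweep."""
    return points_per_second / update_rate_hz


if __name__ == "__main__":
    print(f"{point_spacing_m(10.0, 0.1) * 100:.1f} cm")  # 1.7 cm at 10 m
    print(f"{sweep_period_ms(10.0):.0f} ms")             # 100 ms per sweep
    print(f"{points_per_sweep(300_000, 10.0):.0f}")      # 30000 points per sweep
```

The same spacing formula shows why small objects vanish at long range: at 100 m the 0.1° spacing grows to roughly 17 cm.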
Key Data: Hesai Technology ET25 exceeded 20,000 units in monthly shipments in Q4 2024, primarily for autonomous delivery robots and ADAS systems. — Hesai Technology Q4 2024 Earnings Report, 2025-02
IMU: The Inner Ear
Inertial Measurement Units (IMU) combine accelerometers and gyroscopes to estimate orientation and motion.
| Component | Measures | Drift | Typical Use |
|---|---|---|---|
| Accelerometer | Linear acceleration (m/s²) | Bias: 1–10 mg | Gravity reference, tilt detection |
| Gyroscope | Angular velocity (°/s) | Drift: 1–10°/hr | Orientation tracking, vibration rejection |
| Magnetometer | Magnetic field (μT) | Interference-prone | Absolute heading (compass) |
Critical limitation: Gyroscopes drift. A MEMS gyro with 5°/hr drift accumulates about 0.08° of heading error per minute (5° ÷ 60 min), or 5° after an hour, which is unacceptable for long-term navigation.
Solution: Sensor fusion with GPS, visual odometry, or wheel odometry to bound drift.
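To make the drift arithmetic and the idea of bounding drift concrete, here is a minimal sketch. It uses a complementary filter that blends the gyro with an accelerometer-derived tilt angle, which is a simpler stand-in for the GPS or odometry fusion mentioned above, not the same technique. The 0.5°/s gyro bias and the blend factor `alpha` are hypothetical values.

```python
def drift_deg(drift_rate_deg_per_hr: float, elapsed_s: float) -> float:
    """Orientation error accumulated by a constant gyro drift rate."""
    return drift_rate_deg_per_hr * elapsed_s / 3600.0


def complementary_filter(gyro_rates_dps, accel_angles_deg, dt, alpha=0.98):
    """Blend integrated gyro rate (smooth, drifts) with accel tilt (noisy, drift-free)."""
    angle = accel_angles_deg[0]
    estimates = []
    for rate, accel_angle in zip(gyro_rates_dps, accel_angles_deg):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


if __name__ == "__main__":
    print(round(drift_deg(5.0, 60.0), 3))  # 0.083 degrees after one minute

    # Stationary sensor, true tilt 0 degrees, gyro with a constant bias (assumed)
    n, dt, bias_dps = 2000, 0.01, 0.5
    fused = complementary_filter([bias_dps] * n, [0.0] * n, dt)
    gyro_only = bias_dps * n * dt
    print(f"gyro only: {gyro_only:.1f} deg, fused: {fused[-1]:.3f} deg")
```

In this toy case, pure integration reaches 10° after 20 seconds, while the fused estimate settles at a small constant offset because the drift-free reference keeps pulling it back.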
Tactile Sensing: Touch
For manipulation tasks, vision alone is insufficient. Robots need to feel.
| Sensor Type | Principle | Resolution | Application |
|---|---|---|---|
| Force/Torque (F/T) | Strain gauges on beam | 0.1–1 N | Peg-in-hole, polishing |
| Tactile Array | Capacitive/resistive matrix | 1–5 mm spatial | Grip stability, texture recognition |
| Pressure Film | Piezoresistive change | 0.1 MPa | Contact pressure distribution |
| BioTac (SynTouch) | Fluid-filled elastomer | Human-like | Research-grade manipulation |
Key Data: Meta’s Digit tactile glove uses 124 tactile sensors + IMU array, achieving human-level fine tactile feedback for teleoperation training. — Meta AI Research Blog, 2024-12
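One simple quantity that can be extracted from a tactile array is the center of pressure, whose movement between frames is a common cue for grip stability. The sketch below assumes a hypothetical 3×3 array with a 2 mm pitch and unitless pressure readings.

```python
def center_of_pressure(pressure_grid, pitch_mm):
    """Return (x_mm, y_mm, total_pressure) for a 2D tactile array.

    Returns None when nothing is touching the sensor.
    """
    total = sum(sum(row) for row in pressure_grid)
    if total <= 0:
        return None
    x = sum(p * col for row in pressure_grid for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(pressure_grid) for p in row)
    return x * pitch_mm / total, y * pitch_mm / total, total


if __name__ == "__main__":
    frame = [
        [0, 0, 0],
        [0, 1, 3],  # contact concentrated toward the right edge
        [0, 0, 0],
    ]
    print(center_of_pressure(frame, pitch_mm=2.0))  # (3.5, 2.0, 4)
```

A controller could compare this value across consecutive frames and tighten the grip when the contact point starts to move, though the threshold would have to be tuned for the actual sensor.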
Sensor Fusion: The Integration Challenge
No single sensor is sufficient. Sensor fusion algorithms combine multiple sources:
Extended Kalman Filter (EKF)
The classic approach for state estimation:
Prediction: x̂ₖ|ₖ₋₁ = f(x̂ₖ₋₁, uₖ)
Pₖ|ₖ₋₁ = Fₖ Pₖ₋₁ Fₖᵀ + Qₖ
Update: Kₖ = Pₖ|ₖ₋₁ Hₖᵀ (Hₖ Pₖ|ₖ₋₁ Hₖᵀ + Rₖ)⁻¹
x̂ₖ = x̂ₖ|ₖ₋₁ + Kₖ (zₖ - h(x̂ₖ|ₖ₋₁))
Pₖ = (I - Kₖ Hₖ) Pₖ|ₖ₋₁
Where:
- x: State vector (position, velocity, orientation)
- u: Control input (motor commands)
- z: Sensor measurement (GPS, IMU, wheel odometry)
- f, h: Motion model and measurement model
- F, H: Jacobians of f and h, evaluated at the current estimate
- P: State covariance (uncertainty estimate)
- Q: Process noise (model uncertainty)
- R: Measurement noise (sensor uncertainty)
- K: Kalman gain (how much to trust the measurement vs. prediction)
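For intuition, the five equations collapse to scalars when the state is one-dimensional and both models are linear. The sketch below assumes f(x, u) = x + u and h(x) = x, so F = H = 1, which makes it a plain Kalman filter (the linear special case of the EKF). The noise values are hypothetical.

```python
def kalman_step(x, p, u, z, q, r):
    """One predict/update cycle of a scalar Kalman filter with F = H = 1."""
    # Prediction
    x_pred = x + u            # state prediction from the motion model
    p_pred = p + q            # covariance prediction: F P F^T + Q with F = 1
    # Update
    k = p_pred / (p_pred + r)             # Kalman gain
    x_new = x_pred + k * (z - x_pred)     # correct with the measurement residual
    p_new = (1.0 - k) * p_pred            # reduced uncertainty after the update
    return x_new, p_new, k


if __name__ == "__main__":
    # Odometry reports 1.0 m of motion; a position sensor then reads 1.2 m
    x, p, k = kalman_step(x=0.0, p=1.0, u=1.0, z=1.2, q=0.1, r=0.5)
    print(f"estimate={x:.4f} m, variance={p:.5f}, gain={k:.4f}")
```

With these numbers the gain is about 0.69, so the estimate lands closer to the measurement than to the prediction; raising `r` (a noisier sensor) would shift it back toward the prediction.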
Modern: Deep Learning Fusion
End-to-end neural networks (e.g., BEVFusion, TransFusion) directly process raw sensor data:
- BEVFusion: Combines camera + LiDAR into Bird’s Eye View representation
- TransFusion: Transformer-based query initialization for 3D object detection
Trade-off: EKF is interpretable and lightweight; deep fusion is often more accurate on complex perception tasks but data-hungry and opaque.
💻 Python Implementation: Reading Sensor Data
IMU Data Processing
import numpy as np
import matplotlib.pyplot as plt

# Gravity in the world frame (Z up), m/s^2
GRAVITY = np.array([0.0, 0.0, 9.81])


class IMUProcessor:
    """
    Basic IMU data processing: integrate accelerometer + gyroscope.
    WARNING: This naive integration drifts over time.
    Real systems need sensor fusion (Kalman filter).
    """

    def __init__(self, dt=0.01):
        self.dt = dt  # 100 Hz sampling
        self.accel_bias = np.zeros(3)
        self.gyro_bias = np.zeros(3)
        # State: [position(3), velocity(3), orientation_euler(3)]
        # The Euler slots (6:9) are not updated in this simplified example;
        # orientation is tracked in the rotation matrix below.
        self.state = np.zeros(9)
        self.orientation = np.eye(3)  # Rotation matrix (body -> world)

    def calibrate(self, accel_data, gyro_data, n_samples=100):
        """Estimate biases from static data (sensor level, Z axis up)"""
        self.accel_bias = np.mean(accel_data[:n_samples], axis=0)
        self.accel_bias[2] -= GRAVITY[2]  # Gravity is not a bias: remove it from Z
        self.gyro_bias = np.mean(gyro_data[:n_samples], axis=0)
        print(f"Accel bias: {self.accel_bias}")
        print(f"Gyro bias: {self.gyro_bias}")

    def update(self, accel, gyro):
        """Process one IMU sample"""
        # Remove biases
        accel_corrected = accel - self.accel_bias
        gyro_corrected = gyro - self.gyro_bias

        # Update orientation (simplified: integrate angular velocity)
        # Real: use quaternion integration or Madgwick/Mahony filter
        angle_delta = gyro_corrected * self.dt
        self.orientation = self._rotate_matrix(self.orientation, angle_delta)

        # Transform accel to world frame, then subtract gravity.
        # Without this subtraction the 9.81 m/s^2 reading would be
        # integrated as if the robot were accelerating upward.
        accel_world = self.orientation @ accel_corrected - GRAVITY

        # Integrate to get velocity and position
        # (Naive - drifts without fusion)
        self.state[3:6] += accel_world * self.dt      # velocity
        self.state[0:3] += self.state[3:6] * self.dt  # position
        return self.state.copy()

    def _rotate_matrix(self, R, angles):
        """Apply small rotation to matrix"""
        # First-order (small-angle) approximation of Rodrigues' rotation formula.
        # The result is not re-orthonormalized, which adds to the drift.
        wx, wy, wz = angles
        W = np.array([[0, -wz, wy],
                      [wz, 0, -wx],
                      [-wy, wx, 0]])
        return R @ (np.eye(3) + W)


# Example usage
if __name__ == "__main__":
    # Simulate IMU data (static + small noise)
    np.random.seed(42)
    n_samples = 1000  # 10 seconds at 100 Hz

    # True: stationary (accel = [0, 0, 9.81], gyro = [0, 0, 0])
    accel_data = np.random.normal([0, 0, 9.81], 0.1, (n_samples, 3))
    gyro_data = np.random.normal([0, 0, 0], 0.05, (n_samples, 3))

    imu = IMUProcessor(dt=0.01)
    imu.calibrate(accel_data, gyro_data)

    # Process data
    positions = []
    for i in range(n_samples):
        state = imu.update(accel_data[i], gyro_data[i])
        positions.append(state[0:3])
    positions = np.array(positions)

    print(f"\nFinal position drift: {positions[-1]}")
    print(f"Drift magnitude: {np.linalg.norm(positions[-1]):.3f} m")
    print("(Expected: near 0, but naive integration causes drift!)")

    # Plot
    fig, axes = plt.subplots(3, 1, figsize=(10, 8))
    labels = ['X', 'Y', 'Z']
    for i, ax in enumerate(axes):
        ax.plot(positions[:, i], label=f'Position {labels[i]}')
        ax.set_ylabel(f'{labels[i]} (m)')
        ax.legend()
        ax.grid(True)
    axes[-1].set_xlabel('Sample')
    axes[0].set_title('Naive IMU Integration: Position Drift Over Time')
    plt.tight_layout()
    plt.savefig('imu_drift.png', dpi=150)
    print("\nSaved plot to imu_drift.png")
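Example output (illustrative; exact numbers depend on the random seed and the simulated noise levels, and the drift can be substantially larger):
Accel bias: [-0.005 0.003 -0.005]
Gyro bias: [-0.001 0.002 -0.001]

Final position drift: [0.234 -0.156 0.089]
Drift magnitude: 0.289 m
(Expected: near 0, but naive integration causes drift!)

Saved plot to imu_drift.png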
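Key Lesson: Even with the sensor standing still, pure IMU integration accumulates position error within seconds (about 0.3 m after 10 seconds in the example output above). This illustrates why sensor fusion is needed: GPS, visual odometry, or wheel odometry must be used to constrain the drift.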
📊 Sensor Selection Decision Matrix
| Application | Primary Sensor | Secondary | Fusion Algorithm | Budget |
|---|---|---|---|---|
| Indoor navigation | Depth camera (RealSense) | IMU + wheel odometry | EKF | $300–$800 |
| Outdoor delivery robot | LiDAR + Camera | IMU + GPS | EKF + LOAM | $2,000–$5,000 |
| Autonomous vehicle | LiDAR + 8×Camera + Radar | IMU + GPS + HD Map | BEVFusion / HydraNets | $10,000+ |
| Manipulation (arm) | RGB camera + Force/Torque | Tactile array | Hand-eye calibration | $500–$2,000 |
| Drone | Camera + IMU | GPS + Barometer | Visual-Inertial SLAM | $200–$1,000 |
| Vacuum robot | LiDAR (360°) | Bumpers + cliff sensors | Custom SLAM | $50–$200 |
🎯 Key Takeaways
- No sensor is perfect: Cameras fail in darkness, LiDAR fails in fog, IMU drifts. Multi-sensor fusion is mandatory for robust autonomy.
- Proprioceptive vs. Exteroceptive: Internal sensors (encoders, IMU) provide high-rate, precise self-state. External sensors (camera, LiDAR) provide lower-rate, richer environment information. Both are essential.
- The cost-performance curve is shifting: LiDAR prices dropped from $75,000 for a mechanical unit (2012) to about $400 for a solid-state unit (2025). Depth cameras are commodity hardware. High-quality perception is no longer the exclusive domain of funded research labs.
- Sensor fusion is where the magic happens: Individual sensors have fundamental limitations. The integration algorithm (EKF, particle filter, or deep network) determines the overall system performance.
- Calibration is critical: A $10,000 sensor with poor calibration can perform worse than a $200 sensor with precise calibration. Hand-eye calibration, temporal synchronization, and extrinsic parameter estimation are non-negotiable.
🔮 Connections
Review Week 5: We learned dynamics and control — how to calculate forces and motion. Now we need to know how these motions are measured (encoders, IMU) and how the environment is perceived (cameras, LiDAR).
Preview Week 6 Remaining Days:
- Day 2: Computer Vision for Robotics — Object detection, tracking, and pose estimation
- Day 3: LiDAR Point Cloud Processing — Registration, segmentation, and mapping
- Day 4: Visual-Inertial SLAM — Building maps and tracking simultaneously
- Day 5: Sensor Calibration & Synchronization — Hand-eye, temporal, and multi-sensor calibration
- Day 6: Python Practice — Building a multi-sensor perception pipeline
- Day 7: Week 6 Summary
Outlook Week 7: ROS2 Introduction — We will learn how to actually integrate these sensors using ROS2 interfaces (sensor_msgs, including its PointCloud2 message type, and cv_bridge).
📚 Further Reading
Classical Papers
- “A Tutorial on Graph-Based SLAM” (Grisetti et al., 2010) — SLAM fundamentals
- “ORB-SLAM: A Versatile and Accurate Monocular SLAM System” (Mur-Artal et al., 2015) — Visual SLAM milestone
- “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation” (Liu et al., 2022) — Modern sensor fusion
Books
- “Robotics, Vision and Control” (Peter Corke) — Chapters 15-17 (Perception)
- “Probabilistic Robotics” (Thrun, Burgard, Fox) — Sensor models and fusion
- “Computer Vision: Algorithms and Applications” (Szeliski) — Visual fundamentals
Online Resources
- Kalman and Bayesian Filters in Python — Roger Labbe’s open-source textbook
- OpenCV Python Tutorials — Computer vision practice
- Point Cloud Library (PCL) — LiDAR point cloud processing standard library
Related Course Articles
- Week 2: Coordinate Systems — Understanding reference frames for sensor data
- Week 5: Dynamics Fundamentals — Physical quantities measured by sensors
❓ FAQ
Q: Why can’t robots rely on cameras alone?
A: Cameras fail in darkness, strong light, on textureless surfaces, or during rapid motion. Complementary LiDAR or radar is needed to cover these cases.
Details: Pure vision systems (like Tesla FSD) rely on powerful neural networks and massive data to fill perception gaps, but edge cases still exist (e.g., white truck vs. sky). LiDAR provides precise 3D geometric information unaffected by lighting, making it essential redundancy for safety-critical systems.
Q: How severe is IMU drift?
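A: A MEMS gyro drifting 1–10°/hour accumulates roughly 0.02–0.17° of orientation error per minute; in the simulated example in this article, pure integration produced about 0.3 m of position error in 10 seconds.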
Details: Consumer MEMS IMU (e.g., in smartphones) gyro drift is about 5°/hour. Industrial grade (e.g., VectorNav) can drop to 1°/hour. Fiber-optic IMU (e.g., KVH) reaches 0.01°/hour but costs $10,000+. All IMUs require fusion with other sensors to constrain drift.
Q: What are the advantages of solid-state LiDAR over mechanical LiDAR?
A: Solid-state costs 10-20× less, has higher reliability, and faster scanning speed, but narrower field of view.
Details: Mechanical LiDAR (e.g., Velodyne) uses rotating parts with limited lifespan (~10,000 hours) and high cost ($4,000–$75,000). Solid-state LiDAR (e.g., Hesai ET25, RoboSense M1 Plus) uses MEMS mirrors or OPA technology with no rotating parts, lifespan >50,000 hours, and a cost of roughly $500 or less. However, field of view is usually narrower (120°×25° vs. 360°×30°), requiring multiple units for full coverage. The Livox Mid-360 in the comparison table is an exception, offering 360° horizontal coverage at about $1,000.
Q: Why is sensor calibration important?
A: A 1° rotation calibration error produces 17cm position error at 10m distance.
Details: Multi-sensor fusion relies on precise relative poses (extrinsics). Hand-eye calibration estimates the camera pose relative to the robot arm end-effector; temporal synchronization ensures all sensor data corresponds to the same moment. Uncalibrated systems produce systematic errors, leading to planning failures or collisions.
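The 17 cm figure in the short answer can be checked with a few lines. The sketch below computes the lateral offset directly and also applies a 1° extrinsic rotation error about the Z axis to a point 10 m ahead; the point and axis are chosen only for illustration.

```python
import math


def lateral_error_m(distance_m: float, angle_error_deg: float) -> float:
    """Sideways position error caused by an angular calibration error."""
    return distance_m * math.tan(math.radians(angle_error_deg))


def rotate_z(point, angle_deg):
    """Rotate an (x, y, z) point about the Z axis by angle_deg."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


if __name__ == "__main__":
    print(f"{lateral_error_m(10.0, 1.0) * 100:.1f} cm")  # 17.5 cm

    true_point = (10.0, 0.0, 0.0)                 # object 10 m straight ahead
    seen_point = rotate_z(true_point, 1.0)        # same object with a 1 degree extrinsic error
    print(f"{math.dist(true_point, seen_point) * 100:.1f} cm")  # 17.5 cm
```

Both routes give about 17.5 cm, in line with the rounded 17 cm quoted above, and the error scales linearly with distance.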