Projects

Selected robotics, embodied intelligence, and visual perception projects.

FluxDAgger

A Model-Decoupled DAgger Pipeline for Dual-Arm Robotic Manipulation

Rollout → Takeover. A single DAgger episode with autonomous-to-human handover at 0:30.
Model-decoupled architecture. Policy and reward modules plug in via standardized ROS topics.
Baseline collection system. Cameras, master arm (demonstration), and slave arm (execution) on the AgileX Piper platform.
End-to-end data flow. Raw multi-modal capture → Parquet → MP4 / NumPy / takeover clips / reward annotations.
Timestamp synchronization. Per-camera image buffers matched to robot states inside a bounded sync window.
Hardware (before). Original shared CAN-bus topology of the AgileX Piper four-arm platform.
Hardware (after). Per-arm dedicated USB-CAN interfaces enabling independent enable/mode/control.

TL;DR

Deploying DAgger on a real dual-arm robot is far more than a policy-inference problem — it tangles policy, hardware, teleoperation, multi-camera sync, and post-processing into a single brittle stack. FluxDAgger decouples these concerns behind a small set of ROS topics, so swapping the VLA model or the reward model never touches the collection logic.

Contributions

  • Model-decoupled architecture. Policy inference lives in an external project/service; the collector consumes a fixed action topic, so one DAgger workflow serves arbitrary VLA policies.
  • Human-in-the-loop DAgger. Autonomous rollout, online human takeover, and per-frame source tagging (rollout vs. correction) all happen within a single episode, with timestamps aligned across the cameras and joint states.
  • Reward-pluggable infrastructure. A Qwen3-VL reward module is integrated via an independent interface for both online ROS publishing and offline batch annotation, enabling reward-guided dataset filtering.
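
The timestamp alignment and per-frame source tagging described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FluxDAgger's actual code: the names `Frame`, `align`, and `tag_source`, the nearest-neighbour matching rule, and the window value are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Frame:
    stamp: float   # camera timestamp in seconds
    source: str    # "rollout" or "correction"


def tag_source(stamp: float, takeover_start: Optional[float]) -> str:
    """Label a frame by whether it falls before or after the human takeover."""
    if takeover_start is not None and stamp >= takeover_start:
        return "correction"
    return "rollout"


def nearest_index(state_stamps: Sequence[float], t: float) -> int:
    """Index of the state timestamp closest to t (input sorted ascending, non-empty)."""
    i = bisect_left(state_stamps, t)
    if i == 0:
        return 0
    if i == len(state_stamps):
        return len(state_stamps) - 1
    # Choose whichever neighbour is closer in time.
    return i if state_stamps[i] - t < t - state_stamps[i - 1] else i - 1


def align(frames: List[Frame], state_stamps: Sequence[float],
          window: float) -> List[Tuple[Frame, int]]:
    """Pair each frame with its nearest robot state; drop frames outside the window."""
    if not state_stamps:
        return []
    pairs = []
    for frame in frames:
        j = nearest_index(state_stamps, frame.stamp)
        if abs(state_stamps[j] - frame.stamp) <= window:
            pairs.append((frame, j))
    return pairs
```

With a takeover at 30 s, `tag_source(29.9, 30.0)` yields a rollout frame and `tag_source(30.0, 30.0)` a correction frame. Frames with no robot state inside the window are dropped rather than paired with a stale state, which is one plausible reading of a "bounded sync window".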

Method & Hardware

FluxDAgger is organized as a set of ROS Noetic nodes (camera, sync-observation, model-inference, DAgger controller, DAgger collector, and reward) that communicate through standardized topic interfaces. The policy and reward models therefore act as drop-in modules rather than being built into the collector. On the hardware side, the stock AgileX Piper platform shares one CAN bus across arms; FluxDAgger introduces a per-arm USB-CAN topology so that each arm’s enable state, mode, and commands can be managed independently.
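
To illustrate why a fixed action topic decouples the collector from the model, here is a small sketch that uses an in-process publish/subscribe stand-in instead of ROS. Everything in it is an assumption made for illustration: `TopicBus`, the topic name `/policy/action`, and the two toy policies are hypothetical and do not come from the FluxDAgger code base.

```python
from typing import Callable, Dict, List, Sequence

Action = Sequence[float]
ACTION_TOPIC = "/policy/action"  # hypothetical topic name


class TopicBus:
    """In-process stand-in for ROS pub/sub (illustration only)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[Action], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[Action], None]) -> None:
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic: str, msg: Action) -> None:
        for callback in self._subs.get(topic, []):
            callback(msg)


class Collector:
    """Records whatever arrives on the action topic; it never imports a model."""

    def __init__(self, bus: TopicBus) -> None:
        self.actions: List[Action] = []
        bus.subscribe(ACTION_TOPIC, self.actions.append)


def zero_policy(obs: Sequence[float]) -> List[float]:
    """Toy policy A: always outputs zeros."""
    return [0.0 for _ in obs]


def half_policy(obs: Sequence[float]) -> List[float]:
    """Toy policy B: scales the observation by 0.5."""
    return [0.5 * x for x in obs]


def demo() -> List[Action]:
    bus = TopicBus()
    collector = Collector(bus)
    # Swapping the policy changes only the publisher side.
    for policy in (zero_policy, half_policy):
        bus.publish(ACTION_TOPIC, policy([1.0, 2.0]))
    return collector.actions
```

Calling `demo()` records one action from each policy without any change to `Collector`, which is the property the topic-based design aims for.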

For interactive demo videos and the full system walkthrough, visit the project page.

Gesture Recognition on Horizon X3 Pi

Efficient gesture recognition and mobile-robot deployment (undergraduate thesis)

Undergraduate Thesis · Edge AI · YOLOv5s · ROS2
Real-time gesture recognition. YOLOv5s on Horizon X3 Pi after optimization — 30 FPS detection with 0.36% mAP loss versus the GPU baseline.
Gesture-controlled mobile robot. Recognized gestures mapped to ROS2 motion commands driving the platform in real time.
Human tracking. Detection-driven person tracking module integrated with the mobile-robot motion-control node.

TL;DR

An undergraduate-thesis project on deploying efficient gesture recognition on the Horizon X3 Pi edge board and integrating it into a mobile-robot system. The work covers model selection, edge-side optimization, and ROS2 integration with simulation and real-robot validation.

Contributions

  • Edge-deployable detector. Trained YOLOv5s on the HaGRID static-gesture dataset, reaching 86.60% mAP, and benchmarked against the YOLO and DETR families.
  • Edge optimization. Optimized and deployed YOLOv5s on the Horizon X3 Pi, improving inference from 20 FPS to 30 FPS with only 0.36% mAP degradation.
  • Robot integration. Built ROS2 motion-control nodes and integrated gesture recognition with human tracking on the lab’s mobile robot.

Method & Platform

The pipeline combines a YOLOv5s detector (HaGRID-trained) with Horizon-toolchain-based quantization and graph optimization for the X3 Pi BPU. Recognized gestures are mapped to discrete motion primitives published as ROS2 Twist messages; a parallel human-tracking module shares the detection backbone to drive the platform’s follow behavior.
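
The gesture-to-motion mapping can be sketched as a lookup table with a confidence gate. This is a hedged illustration only: the `Twist2D` class stands in for the `linear.x` and `angular.z` fields of a ROS2 `geometry_msgs/Twist` message, and the specific gesture-to-command assignments, speeds, and score threshold are hypothetical rather than the values used in the thesis.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Twist2D:
    """Stand-in for the linear.x / angular.z fields of geometry_msgs/Twist."""
    linear_x: float = 0.0    # forward speed, m/s
    angular_z: float = 0.0   # yaw rate, rad/s


# Hypothetical mapping from detector class labels to motion primitives.
GESTURE_TO_TWIST: Dict[str, Twist2D] = {
    "palm": Twist2D(0.0, 0.0),      # stop
    "like": Twist2D(0.2, 0.0),      # move forward
    "dislike": Twist2D(-0.2, 0.0),  # move backward
    "one": Twist2D(0.0, 0.5),       # turn left
    "peace": Twist2D(0.0, -0.5),    # turn right
}


def gesture_to_twist(label: str, score: float, min_score: float = 0.6) -> Twist2D:
    """Return the motion primitive for a detection; stop if unsure or unknown."""
    if score < min_score:
        return Twist2D()
    return GESTURE_TO_TWIST.get(label, Twist2D())
```

Defaulting to a zero-velocity command for low-confidence or unmapped detections is a conservative choice for a mobile robot; in a real node the returned values would be copied into a `Twist` message and published on the velocity topic.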