$ cat ~/projects/vision-action-model.md

Vision-Action Model for Robot Control

internal
#VLA Models #Imitation Learning #PyTorch #Robotics #Simulation

From perception to manipulation: a Vision-Language-Action (VLA) model that converts visual understanding into precise robot actions for real-world manipulation tasks.

Extended the VLM into a full Vision-Language-Action architecture by attaching action-decoding heads that predict robot joint trajectories and gripper commands. The model maps visual input and language instructions to executable robot actions and was tested both in simulation and on real hardware.
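As an illustration only (the production code is proprietary and not reproduced here), the sketch below shows one way the raw outputs of an action head could be decoded into bounded joint targets and a binary gripper command. The names `Action` and `decode_actions`, the tanh squashing, the flat output layout, and the zero-logit gripper threshold are all assumptions made for this example, not details of the actual system. It uses plain Python lists; in a PyTorch model the same mapping would normally be applied to tensors.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Action:
    """One step of a predicted trajectory (hypothetical container)."""
    joint_targets: List[float]  # one target per joint, in the joint's own units
    gripper_closed: bool


def decode_actions(
    raw: Sequence[float],
    joint_limits: Sequence[Tuple[float, float]],
    horizon: int,
) -> List[Action]:
    """Decode a flat vector of unbounded head outputs into `horizon` actions.

    Assumed layout: for each time step, n_joints joint values followed by
    one gripper logit.
    """
    n_joints = len(joint_limits)
    step = n_joints + 1
    if len(raw) != horizon * step:
        raise ValueError(f"expected {horizon * step} values, got {len(raw)}")

    actions = []
    for t in range(horizon):
        chunk = raw[t * step:(t + 1) * step]
        joints = []
        for value, (lo, hi) in zip(chunk[:n_joints], joint_limits):
            unit = math.tanh(value)  # squash to [-1, 1]
            joints.append(lo + (unit + 1.0) * 0.5 * (hi - lo))  # rescale to [lo, hi]
        # A logit above zero corresponds to a close probability above 0.5.
        gripper_closed = chunk[n_joints] > 0.0
        actions.append(Action(joints, gripper_closed))
    return actions
```

Squashing before rescaling keeps every predicted target inside the joint limits regardless of how large the raw output is, which is one common reason to bound a head's outputs this way.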

// key_highlights

  • Action prediction head architecture converting VLM embeddings to robot trajectories
  • Simulation-based evaluation pipeline with standard robotics benchmarks
  • Policy inference system optimized for real-time control loops
  • Zero-shot generalization to unseen manipulation tasks
  • Bridged the gap from research prototype to production deployment on robot hardware
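
To make the real-time control loop point concrete, here is a minimal sketch of a fixed-rate loop that runs a policy once per tick and counts missed deadlines. It is not the deployed system: `run_control_loop`, its parameters, and the resynchronize-on-overrun strategy are hypothetical choices for this example, and a real controller would typically rely on a real-time scheduler rather than `time.sleep`.

```python
import time
from typing import Any, Callable


def run_control_loop(
    policy: Callable[[Any], Any],
    get_observation: Callable[[], Any],
    send_action: Callable[[Any], None],
    rate_hz: float,
    steps: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `steps` control ticks at `rate_hz` and return the number of overruns."""
    period = 1.0 / rate_hz
    missed = 0
    next_tick = clock()
    for _ in range(steps):
        # Inference and actuation must both fit inside one period.
        send_action(policy(get_observation()))
        next_tick += period
        remaining = next_tick - clock()
        if remaining > 0:
            sleep(remaining)  # wait out the rest of the period
        else:
            missed += 1
            next_tick = clock()  # resynchronize instead of trying to catch up
    return missed
```

Injecting `clock` and `sleep` keeps the loop testable without waiting in real time, and the returned overrun count gives a simple signal for whether inference latency fits the chosen control rate.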

This is proprietary work from my role at Agile Robots SE. Source code is not publicly available, but the write-up above describes the architecture and technical approach.