$ cat ~/projects/async-ml-inference.md

Async ML Inference System

internal
#WebSocket #Async Python #ML Inference #Robotics #Systems

Real-time ML inference over an asynchronous WebSocket architecture, streaming multimodal data (images, tensors, text) for low-latency robot control.

Designed and built a high-performance async inference server that serves ML model predictions over WebSocket connections. The system handles streaming multimodal inputs (camera images, proprioceptive tensors, and language commands) and adds under 10 ms of overhead, which keeps it usable for real-time robotic control applications.
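
To illustrate how one image, one tensor, and one text command could travel together in a single binary WebSocket message, here is a minimal sketch. It is not the proprietary wire format: the layout (length-prefixed JSON header followed by raw payloads) and the names `pack_frame` and `unpack_frame` are assumptions made for illustration, and only the Python standard library is used.

```python
import json
import struct


def pack_frame(image: bytes, tensor: list, text: str) -> bytes:
    """Pack one observation into a single binary message (hypothetical layout)."""
    # Proprioceptive values are stored as little-endian float32.
    tensor_bytes = struct.pack(f"<{len(tensor)}f", *tensor)
    header = json.dumps(
        {"image_len": len(image), "tensor_len": len(tensor), "text": text}
    ).encode("utf-8")
    # 4-byte big-endian header length, then header, then the raw payloads.
    return struct.pack(">I", len(header)) + header + image + tensor_bytes


def unpack_frame(frame: bytes):
    """Inverse of pack_frame: returns (image, tensor, text)."""
    (header_len,) = struct.unpack_from(">I", frame, 0)
    header = json.loads(frame[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    image = frame[offset:offset + header["image_len"]]
    offset += header["image_len"]
    tensor = list(struct.unpack_from(f"<{header['tensor_len']}f", frame, offset))
    return image, tensor, header["text"]
```

Keeping the image and tensor as raw bytes avoids base64 inflation, and a single message per control step means the receiver never has to reassemble an observation from several messages.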

// key_highlights

  • Async Python server with WebSocket-based client-server architecture
  • Streaming multimodal data pipeline (images + tensors + text) over persistent connections
  • Policy wrapper for serving VLA (vision-language-action) model inference in real-time control loops
  • Concurrent request handling with GPU-aware batching
  • Benchmarked latency and throughput under production-like robot workloads
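
The concurrent request handling with batching listed above can be sketched with plain `asyncio`. This is a simplified illustration, not the production code: `MicroBatcher`, `fake_policy`, and the `max_batch` / `max_wait_s` parameters are hypothetical names, and the stand-in policy runs on the CPU where a real deployment would call the model on the GPU.

```python
import asyncio


class MicroBatcher:
    """Collects concurrent requests into small batches (illustrative only)."""

    def __init__(self, infer_batch, max_batch=8, max_wait_s=0.002):
        self._infer_batch = infer_batch  # callable: list of inputs -> list of outputs
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._queue = asyncio.Queue()

    async def submit(self, observation):
        # Each caller waits on its own future until its batch has been processed.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((observation, future))
        return await future

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block for the first request, then gather more until the batch
            # is full or the wait budget is spent.
            items = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            while len(items) < self._max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self._infer_batch([obs for obs, _ in items])
            for (_, future), out in zip(items, outputs):
                if not future.done():
                    future.set_result(out)


async def demo():
    batch_sizes = []

    def fake_policy(batch):
        # Stand-in for a model forward pass over the whole batch.
        batch_sizes.append(len(batch))
        return [x * 2 for x in batch]

    # Constructed inside the running loop so the queue binds to it.
    batcher = MicroBatcher(fake_policy, max_batch=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(4)))
    worker.cancel()
    return results, batch_sizes
```

Running `asyncio.run(demo())` submits four requests concurrently and returns their results along with the batch sizes the worker saw. The wait budget is the main tuning knob: a longer wait yields larger batches and better throughput, at the cost of added latency per control step.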

This is proprietary work from my role at Agile Robots SE. Source code is not publicly available, but the write-up above describes the architecture and technical approach.