$ cat ~/projects/async-ml-inference.md

Async ML Inference System

internal

#WebSocket #Async Python #ML Inference #Robotics #Systems

Real-time ML inference over an asynchronous WebSocket architecture: streaming multimodal data (images, tensors, text) for low-latency robot control.

Designed and built a high-performance async inference server that serves ML model predictions over WebSocket connections. The system handles streaming multimodal inputs (camera images, proprioceptive tensors, and language commands) with sub-10 ms overhead for real-time robotic control applications.
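
To make the idea of sending an image, a tensor, and a text command over one persistent connection concrete, here is a minimal sketch of how a single observation could be packed into one binary WebSocket message. This is not the production wire format, which is proprietary. The frame layout, the field names, and the functions `pack_observation` and `unpack_observation` are all hypothetical and chosen for illustration only.

```python
import json
import struct


def pack_observation(image_bytes: bytes, state: list[float], command: str) -> bytes:
    """Pack one observation into a single binary frame (hypothetical layout)."""
    # Proprioceptive state as little-endian float32 values.
    state_bytes = struct.pack(f"<{len(state)}f", *state)
    # A small JSON header describes the payload sizes and carries the text command.
    header = json.dumps({
        "image_len": len(image_bytes),
        "state_len": len(state),
        "command": command,
    }).encode("utf-8")
    # Frame = 4-byte header length + header + raw image bytes + raw state bytes.
    return struct.pack("<I", len(header)) + header + image_bytes + state_bytes


def unpack_observation(frame: bytes) -> tuple[bytes, list[float], str]:
    """Inverse of pack_observation: recover image bytes, state, and command."""
    (header_len,) = struct.unpack_from("<I", frame, 0)
    header = json.loads(frame[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    image_bytes = frame[offset:offset + header["image_len"]]
    offset += header["image_len"]
    state = list(struct.unpack_from(f"<{header['state_len']}f", frame, offset))
    return image_bytes, state, header["command"]
```

Keeping the image and tensor as raw bytes next to a short JSON header avoids base64-encoding large payloads, which is one plausible way to keep per-message overhead small.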

// key_highlights

▸ Async Python server with WebSocket-based client-server architecture
▸ Streaming multimodal data pipeline (images + tensors + text) over persistent connections
▸ Policy wrapper for serving vision-language-action (VLA) model inference in real-time control loops
▸ Concurrent request handling with GPU-aware batching
▸ Benchmarked latency and throughput under production-like robot workloads

This is proprietary work from my role at Agile Robots SE. Source code is not publicly available, but the write-up above describes the architecture and technical approach.