$ cat ~/projects/async-ml-inference.md
Async ML Inference System
internal | #WebSocket | #Async Python | #ML Inference | #Robotics | #Systems
Real-time ML inference using asynchronous WebSocket architecture — streaming multimodal data (images, tensors, text) for low-latency robot control.
Designed and built a high-performance async inference server that serves ML model predictions over WebSocket connections. The system handles streaming multimodal inputs — camera images, proprioceptive tensors, and language commands — with sub-10ms overhead for real-time robotic control applications.
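The production wire format is not public, so the following is only an illustrative sketch of how one multimodal observation (image bytes, a proprioceptive tensor, and a language command) could be packed into a single binary WebSocket frame. The layout (length-prefixed JSON header followed by raw payload bytes) and the names `encode_message` and `decode_message` are assumptions made for this example, not the system's actual protocol.

```python
import json
import struct

# Hypothetical frame layout (an assumption for illustration):
#   [4-byte big-endian header length][UTF-8 JSON header][concatenated raw blobs]
_LEN = struct.Struct(">I")


def encode_message(text, blobs):
    """Pack a text command and named binary blobs into one bytes frame.

    `blobs` maps a name to a (dtype, shape, raw_bytes) tuple.
    """
    header = {"text": text, "blobs": []}
    payload = bytearray()
    for name, (dtype, shape, data) in blobs.items():
        # Record where each blob lives inside the payload section.
        header["blobs"].append({
            "name": name,
            "dtype": dtype,
            "shape": list(shape),
            "offset": len(payload),
            "nbytes": len(data),
        })
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    return _LEN.pack(len(header_bytes)) + header_bytes + bytes(payload)


def decode_message(frame):
    """Inverse of encode_message: returns (text, blobs)."""
    (header_len,) = _LEN.unpack_from(frame, 0)
    start = _LEN.size
    header = json.loads(frame[start:start + header_len].decode("utf-8"))
    body = memoryview(frame)[start + header_len:]
    blobs = {}
    for b in header["blobs"]:
        raw = bytes(body[b["offset"]:b["offset"] + b["nbytes"]])
        blobs[b["name"]] = (b["dtype"], tuple(b["shape"]), raw)
    return header["text"], blobs
```

A frame built this way can be sent as one binary message over a persistent connection, so the image, tensor, and command for a control step arrive together rather than as separate requests.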
// key_highlights
- Async Python server with WebSocket-based client-server architecture
- Streaming multimodal data pipeline (images + tensors + text) over persistent connections
- Policy wrapper for serving VLA model inference in real-time control loops
- Concurrent request handling with GPU-aware batching
- Benchmarked latency and throughput under production-like robot workloads
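As a rough illustration of the concurrent batching idea in the highlights, here is a minimal asyncio micro-batcher using only the standard library. It is a sketch under stated assumptions: `MicroBatcher`, `predict_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names, the model call is a stand-in function, and nothing here models GPU memory or device placement the way the real system may.

```python
import asyncio


class MicroBatcher:
    """Group concurrent requests into small batches for one model call.

    Construct this inside a running event loop (e.g. within a coroutine).
    """

    def __init__(self, predict_batch, max_batch_size=8, max_wait_s=0.002):
        # Hypothetical callable: list of inputs -> list of outputs (blocking).
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = asyncio.Queue()

    async def infer(self, item):
        """Submit one input and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut

    async def run(self):
        """Worker loop: drain the queue into batches until cancelled."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            # Keep collecting until the batch is full or the wait budget is spent.
            while len(batch) < self._max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [item for item, _ in batch]
            try:
                # Run the blocking model call in a thread so the loop stays responsive.
                outputs = await loop.run_in_executor(None, self._predict_batch, inputs)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():
                    fut.set_result(out)


async def demo():
    """Send four concurrent requests through a fake doubling 'model'."""
    batch_sizes = []

    def fake_predict(inputs):
        batch_sizes.append(len(inputs))
        return [x * 2 for x in inputs]

    batcher = MicroBatcher(fake_predict, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.infer(i) for i in range(4)))
    worker.cancel()
    return results, batch_sizes
```

Calling `asyncio.run(demo())` exercises the sketch end to end. The `max_wait_s` budget is the design trade-off: a longer wait yields fuller batches and better throughput, while a shorter wait keeps per-request latency closer to what a control loop needs.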
This is proprietary work from my role at Agile Robots SE. Source code is not publicly available, but the write-up above describes the architecture and technical approach.