Senior ML Platform Engineer

Company: Cubiq Recruitment

Location: London

Closing Date: 07/07/2026

Hours: Full Time

Type: Permanent

Job Description

On-site Central London

About the company

A stealth-stage, founder-backed robotics company based in central London. The founders are experienced operators who built and exited a major UK business and are now self-funding this venture, so there is no VC pressure and no fundraising treadmill. The company is building dexterous manipulation robots for deployment into specific, high-value industrial environments, with a real customer wedge rather than lab demos. The team is small, technical, and moving fast.

The role

We are looking for a Senior ML Platform Engineer to own the software, web UI, and tooling layer of our platform: the day-to-day applications that data operators, ML engineers, and robotics engineers rely on to do their work. You will be the engineer who unblocks an entire team.

This is a broad, senior generalist role. You will own the operator-facing tooling end to end, and partner closely with the ML team on the data and training infrastructure that sits underneath it. Most of your collaboration will be with ML engineers and scientists, helping data and tooling flow smoothly across the machine learning side of the business.

What you will own

The recording web application (FastAPI backend, vanilla-JS frontend) used by data operators for live robot data collection: real-time topic discovery, camera previews, episode lifecycle controls, and the recording state machine behind them
The episode-manager web application operators use to review, QA, and publish recorded episodes, including the embedded viewer, multi-tab review workflows, and the publishing pipeline
The CLI orchestration layer that drives interactive and headless recording workflows, including config validation, stale-data recovery, batch finalisation, archival, and the supporting services
Evolving the operator experience as new robots, sensors, and capture workflows come online
Strengthening the recording state machine and growing the test suites that protect it
Containerising and orchestrating these services so they deploy consistently across machines and lab environments

What you will collaborate on

With the ML team:

The dataset builder pipeline that converts raw episodes into production-grade datasets
Optimised dataset ingestion layers that feed training: high-throughput readers, prefetching, sharding, caching, and storage-format choices
Infrastructure to support large-scale, multi-node distributed model training: orchestration, configuration, reproducibility, and observability
Performance of CPU-bound and IO-bound stages, including video encoding via ffmpeg and frame-level concurrency

With the robotics team:

The integration surface between the platform and the robots: how recording, replay, and inference tooling consume ROS2 topics, and where the ownership boundaries sit
Cross-cutting work that touches operator tooling, such as exposing new robot capabilities through the recording UI

Required

Foundations:

Senior-level professional software engineering experience, typically 5+ years, with a track record of owning non-trivial systems end to end in production
Exceptional software craftsmanship: clean, well-tested, well-documented Python as the default
Advanced Python: comfortable with typing, async/await, threading, and multiprocessing, and able to reason confidently about concurrent and mixed-IO workloads
Strong fundamentals in algorithms, data structures, and low-level systems concepts
Experience contributing to high-throughput data pipelines and/or distributed systems
Comfort orchestrating external CLI tooling from Python via subprocess, and reasoning about IO-bound performance
A degree in Computer Science, Engineering, Robotics, or a related field, or equivalent practical experience

Web, UI and tooling:

FastAPI, or an equivalent async Python web framework, used for production tooling backends
Vanilla JavaScript: DOM manipulation, fetch, WebSockets, and a willingness to extend a no-framework frontend without inflating it
A real eye for the operator experience: tools that are clear, fast, hard to misuse, and easy to debug
Strong CLI ergonomics: argparse / Click, YAML/JSON config validation, sensible subcommand and error-handling design

Engineering practice:

Continuous integration and automated testing
Containerisation (Docker) and reproducible build and runtime environments

Bonus (genuinely nice-to-have, not expected)

You do not need robotics experience for this role. The following are welcome but none are required:

Distributed training infrastructure (PyTorch DDP/FSDP, DeepSpeed) or the data-side concerns of large-model training: sharded datasets, streaming readers, prefetch tuning, GPU utilisation profiling
ROS2 experience: subscribers and publishers, QoS tuning, message types, DDS debugging
Experience with physical robot hardware in the loop
OpenCV, NumPy, pandas
Familiarity with the LeRobot dataset format
Cloud or HPC environments (AWS, GCP, Azure)
Light familiarity with PyTorch or JAX
Experience working alongside ML researchers and translating fast-moving requests into well-engineered tools

Logistics

Central London, on-site five days a week
Permanent or contract considered
Meaningful founding equity
