Senior ML Platform Engineer

Company: Cubiq Recruitment
Location: London
Closing Date: 07/07/2026
Hours: Full Time
Type: Permanent

Job Description

On-site Central London


About the company

A stealth-stage, founder-backed robotics company based in central London. The founders are experienced operators who built and exited a major UK business and are now self-funding this venture, so there is no VC pressure and no fundraising treadmill. The company is building dexterous manipulation robots for deployment into specific, high-value industrial environments, with a real customer wedge rather than lab demos. The team is small, technical, and moving fast.


The role

We are looking for a Senior ML Platform Engineer to own the software, web UI, and tooling layer of our platform: the day-to-day applications that data operators, ML engineers, and robotics engineers rely on to do their work. You will be the engineer who unblocks an entire team.

This is a broad, senior generalist role. You will own the operator-facing tooling end to end, and partner closely with the ML team on the data and training infrastructure that sits underneath it. Most of your collaboration will be with ML engineers and scientists, helping data and tooling flow smoothly across the machine learning side of the business.


What you will own

  • The recording web application (FastAPI backend, vanilla-JS frontend) used by data operators for live robot data collection: real-time topic discovery, camera previews, episode lifecycle controls, and the recording state machine behind them
  • The episode-manager web application operators use to review, QA, and publish recorded episodes, including the embedded viewer, multi-tab review workflows, and the publishing pipeline
  • The CLI orchestration layer that drives interactive and headless recording workflows, including config validation, stale-data recovery, batch finalisation, archival, and the supporting services
  • Evolving the operator experience as new robots, sensors, and capture workflows come online
  • Strengthening the recording state machine and growing the test suites that protect it
  • Containerising and orchestrating these services so they deploy consistently across machines and lab environments


What you will collaborate on


With the ML team:

  • The dataset builder pipeline that converts raw episodes into production-grade datasets
  • Optimised dataset ingestion layers that feed training: high-throughput readers, prefetching, sharding, caching, and storage-format choices
  • Infrastructure to support large-scale, multi-node distributed model training: orchestration, configuration, reproducibility, and observability
  • Performance of CPU-bound and IO-bound stages, including video encoding via ffmpeg and frame-level concurrency


With the robotics team:

  • The integration surface between the platform and the robots: how recording, replay, and inference tooling consume ROS2 topics, and where the ownership boundaries sit
  • Cross-cutting work that touches operator tooling, such as exposing new robot capabilities through the recording UI


Required


Foundations:

  • Senior-level professional software engineering experience, typically 5+ years, with a track record of owning non-trivial systems end to end in production
  • Exceptional software craftsmanship: clean, well-tested, well-documented Python as the default
  • Advanced Python: comfortable with typing, async/await, threading, and multiprocessing, and able to reason confidently about concurrent and mixed-IO workloads
  • Strong fundamentals in algorithms, data structures, and low-level systems concepts
  • Experience contributing to high-throughput data pipelines and/or distributed systems
  • Comfort orchestrating external CLI tooling from Python via subprocess, and reasoning about IO-bound performance
  • A degree in Computer Science, Engineering, Robotics, or a related field, or equivalent practical experience


Web, UI and tooling:

  • FastAPI, or an equivalent async Python web framework, used for production tooling backends
  • Vanilla JavaScript: DOM manipulation, fetch, WebSockets, and a willingness to extend a no-framework frontend without inflating it
  • A real eye for the operator experience: tools that are clear, fast, hard to misuse, and easy to debug
  • Strong CLI ergonomics: argparse / Click, YAML/JSON config validation, sensible subcommand and error-handling design


Engineering practice:

  • Continuous integration and automated testing
  • Containerisation (Docker) and reproducible build and runtime environments


Bonus (genuinely nice-to-have, not expected)

You do not need robotics experience for this role. The following are welcome, but none are required:

  • Distributed training infrastructure (PyTorch DDP/FSDP, DeepSpeed) or the data-side concerns of large-model training: sharded datasets, streaming readers, prefetch tuning, GPU utilisation profiling
  • ROS2 experience: subscribers and publishers, QoS tuning, message types, DDS debugging
  • Experience with physical robot hardware in the loop
  • OpenCV, NumPy, pandas
  • Familiarity with the LeRobot dataset format
  • Cloud or HPC environments (AWS, GCP, Azure)
  • Light familiarity with PyTorch or JAX
  • Experience working alongside ML researchers and translating fast-moving requests into well-engineered tools


Logistics

  • Central London, on-site five days a week
  • Permanent or contract considered
  • Meaningful founding equity