Hcompany

Member of technical staff (Inference)

London, UK (posted 2026-04-10)


About this role

About H:
H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.

About the Team:
The Inference team develops and enhances the inference stack for serving H-models that power our agent technology. The team focuses on optimizing hardware utilization to reach high throughput, low latency, and cost efficiency in order to deliver a seamless user experience.

Key Responsibilities:
- Develop scalable, low-latency, and cost-effective inference pipelines
- Optimize model performance (memory usage, throughput, and latency) using advanced techniques such as distributed computing, model compression, quantization, and caching mechanisms
- Develop specialized GPU kernels for performance-critical tasks such as attention mechanisms and matrix multiplications
- Collaborate with H research teams on model architectures to enhance efficiency during inference
- Review state-of-the-art papers to improve memory usage, throughput, and latency (FlashAttention, PagedAttention, continuous batching, etc.)
- Prioritize and implement state-of-the-art inference techniques

Requirements:
- Technical skills:
  - MS or PhD in Computer Science, Machine Learning, or a related field
  - Proficient in at least one of the following programming languages: Python, Rust, or C/C++
  - Experience in GPU programming with CUDA, OpenAI Triton, Metal, etc.
  - Experience in model compression and quantization techniques
- Soft skills:
  - Collaborative mindset, thriving in dynamic, multidisciplinary teams
  - Strong communication and presentation skills
  - Eager to explore new challenges
- Bonuses:
  - Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, llama.cpp, etc.
  - Experience with CUDA kernel programming and NCCL
  - Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)

Location:
- Paris or London.
- This role is hybrid, and you are expected to be in the office 3 days a week on average.
- The final decision on this lies with the hiring manager for each individual role.

What We Offer:
- Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
- Collaborate with a fun, dynamic, and multicultural team, working alongside world-class AI talent in a highly collaborative environment
- Enjoy a competitive salary
- Unlock opportunities for professional growth, continuous learning, and career development

If you want to change the status quo in AI, join us.

Tech stack

Python, Rust, LLM, C++
