AI Infrastructure Engineer
Lead AI Infrastructure engagements — GPU clusters, model serving, MLOps, LLM observability. The work that makes production AI as boring as the rest of the stack.
Remote (India / EU-friendly hours) · Full-time · Senior
What you’d do
- Lead AI Infrastructure engagements. Discovery, architecture, hands-on delivery. The clients are mostly Series A–C teams putting their first ML or LLM workloads into production.
- Design GPU cluster topologies. EKS / GKE / AKS GPU node groups, NVIDIA GPU Operator, Spot vs. on-demand mix, multi-region resilience.
- Stand up serving runtimes. vLLM, TGI, Triton — picking, deploying, tuning, autoscaling. You know which flag matters and which is theatre.
- Build the MLOps platform. Argo Workflows, Kubeflow, MLflow — depending on the client’s existing stack. Paved-path templates so data scientists ship without filing platform tickets.
- Wire LLM observability. Langfuse / Phoenix / Helicone alongside the standard stack. Make hallucinations a debuggable artefact.
What we’re looking for
- 4+ years on production infrastructure, with at least 1 year on AI/ML workloads
- Deep Kubernetes — you’ve shipped GPU workloads, you know the failure modes
- Comfort with the modern serving stack (vLLM, TGI, etc.) and the trade-offs between them
- FinOps instincts — GPU spend is a different beast and clients hire us partly to control it
- Ability to write — blog posts, runbooks, deliverable documents
Nice to have
- Open-source contributions to vLLM, Kubeflow, Argo, Langfuse, or similar
- Prior experience scaling ML inference at meaningful traffic
- Background that combines ops with hands-on ML exposure (we don’t need a PhD; we do need someone who can pair fluently with data scientists)
Why CloudWizz
- A real AI infrastructure practice. Not “we tried it once.” Real engagements, real metrics, real on-call.
- AI-native workflow. AI does the routine; you do the senior judgment.
- Open-source aligned. Our work feeds back into the OSS we publish.
- Sustainable pace. No graveyard rotations.
How we hire
- 30-min intro call
- Technical conversation on a real (anonymized) AI infra problem
- Paid trial — 1–2 weeks on a real engagement with another engineer
- Offer