ML engineer resume examples

Full-length ML-engineer resumes across the LLM and classical-ML eras. Each leads with the model + the system around it, names latency-per-token and eval metrics in real units, and surfaces the production MLOps work hiring panels grade on.

By Tomás Albrecht · Senior Resume Writer · Reviewed by Daniel Ortega · Head of Writing · 1 example

ML engineer hiring grades on three axes: model (which models has the candidate served or trained, at what scale), evidence (which production metrics — latency, throughput, cost, accuracy — moved in real units), and discipline (does the candidate operate ML like a production system with evals + monitoring, or like a research project with one-off notebooks). The resumes on this page are written for those axes.

This matters because ML engineering split into two largely distinct roles post-2023: classical ML (tabular models, computer vision, traditional MLOps) and LLM engineering (serving, fine-tuning, evals, RAG). The roles share a Python + PyTorch foundation but the day-to-day work and the hiring vocabulary diverge sharply. Senior LLM-engineering panels read for vLLM / Triton / Modal / Ray fluency, eval discipline, and serving-cost economics. Senior classical-ML panels read for feature-store + drift-detection + experimentation depth.

For entry-level candidates, the structure mirrors the senior one with smaller scope. A substantial side project — a fine-tuned model with public evals, a deployed RAG system with real users, a Kaggle medal-class submission with an open writeup — is high-leverage. Capable side-project work in ML carries unusual weight because the field is moving so fast.

For senior and staff candidates, the structure widens. The summary names the model and system. Bullets quantify in LLM-era units (ms/token, tokens/sec/GPU, cost-per-1k-tokens) or classical-ML units (precision, recall, AUC, business-metric delta). The bottom third reserves space for capability proof — vLLM / Transformers / PyTorch contributions, NeurIPS or ICML papers, NeurIPS-workshop talks, or shipped open-source models with measurable adoption.

The example

Anjali Subramaniam

Senior LLM Engineer · vLLM + Fine-tuning + Evals · 4M req/day fleet
Bengaluru·[email protected]·+91 98 555 0381·github.com/asubramaniam·huggingface.co/asubramaniam·linkedin.com/in/asubramaniam

Summary

Senior LLM engineer with six years of ML + two years on production LLM systems. Owns the inference request router for the Llama-3.1-70B-Instruct fleet at a Series C AI company (4M req/day, 4xH100 nodes). Cut p99 ms/token from 84ms to 28ms via vLLM continuous batching + speculative decoding. Two merged PRs to vllm-project/vllm; NeurIPS 2024 workshop coauthor.

Skills

Serving + Inference
vLLM (continuous batching, paged-attention) · SGLang · Triton Inference Server · Speculative decoding
Training + Fine-tuning
PyTorch 2.x (torch.compile) · Transformers + Accelerate · LoRA / QLoRA / DPO · DeepSpeed + FSDP · Axolotl + Unsloth
Evals + Ops
lm-evaluation-harness · Custom rubric evals (LLM-as-judge + calibration) · Weights & Biases · Drift detection (NannyML / Evidently) · Hugging Face Hub

Experience

Senior LLM Engineer
Quill AI · Remote (Bengaluru, IN)
Feb 2023 - Present
  • Cut p99 ms/token on the Llama-3.1-70B inference path from 84ms to 28ms by migrating from HF Transformers serving to vLLM with continuous batching + paged-attention; speculative decoding with a Llama-3.1-8B draft model accounted for the last 12ms.
  • Reduced cost-per-1k-tokens on the inference fleet from $0.42 to $0.18 through H100 → H200 migration (1.4× throughput) + continuous batching tuning + prefix caching for long-system-prompt workloads.
  • Built the team's eval harness in lm-evaluation-harness + custom rubric tasks (n=480 prompts, 4-judge ensemble with calibration); held post-fine-tune model to ≥ base + 6pp on the internal benchmark across 4 quarterly releases.
  • Fine-tuned Llama-3.1-8B on a 38k-row internal support-ticket dataset (LoRA r=16, alpha=32, 3 epochs); resulting model beat GPT-4o-mini on internal eval by 14pp at 1/12th per-token cost.
  • Shipped a prompt-injection detection layer (Llama-3.1-8B guard + heuristic prefilter); detected 98.4% of injection attempts with 0.4% FP rate on benign traffic.
ML Engineer
Hugging Face · Paris / Remote
Sep 2020 - Jan 2023
  • Built the team's RAG pipeline (pgvector + sentence-transformers + hybrid retriever); Recall@5 62% → 87% via BM25 + dense + LLM rerank.
  • Migrated the eval pipeline from notebook-based to CI-gated harness; every model release now runs against 480 prompts in 38 min; deploy gated on regression < 2pp.
  • Mentored 2 junior ML engineers transitioning from classical ML into LLM-engineering focus; both shipped sole-owner production LLM systems within 6 months.
Machine Learning Engineer
Razorpay · Bengaluru, IN
Jun 2018 - Aug 2020
  • Built the merchant-fraud detection model (XGBoost on 38M historic transactions); precision @ recall=0.80 went from 0.62 to 0.84 on the post-launch holdout.

Open Source & Publications

vllm-project/vllm
Contributor (2 merged PRs)

Two merged PRs to vLLM — one extended the continuous-batching scheduler for prefix-caching across long-prefix workloads; one closed a memory leak in the paged-attention kernel under high tensor-parallelism. Plus: NeurIPS 2024 workshop paper (coauthor) — 'Speculative decoding with mixture-of-experts draft models.' Open-sourced llama-3.1-8b-quill-support on Hugging Face (4,800 downloads).

Python · PyTorch · CUDA

Education

MTech in Computer Science (Machine Learning)
IIT Madras
Jul 2016 - May 2018

Why this resume works

The summary names the model (Llama-3.1-70B), the system (inference router), and two LLM-era metrics. Bullets quantify ms/token and cost-per-1k-tokens in absolute units. The eval bullet names its methodology, including the judge ensemble. vLLM open-source contributions close the page. One page, tight.

What hiring managers look for

The specific signals an experienced ML-engineer hiring panel grades on during the eight-second scan.

  • Summary names the model + the system

    'Owns the inference request router for the 70B LLM on 4M req/day' beats 'ML engineer.' Model + system is what panels scan for.

  • Latency-per-token + throughput-per-GPU

    LLM-era metrics: ms/token, tokens/sec/GPU. A generic 'low latency' claim reads as unquantified.

  • Eval methodology named

    Specific eval framework (lm-evaluation-harness, custom rubric-based, LLM-as-judge). Generic 'evaluated model performance' is filler.

  • Production serving stack

    vLLM, Triton, TGI, SGLang, Modal, Ray Serve. Named tools parse as ML-engineering depth.

  • Drift / monitoring work

    Data drift, model drift, prompt-injection rate, hallucination rate. Production-ML is graded on monitoring discipline.

  • Open-source contribution (if any)

    vLLM, Transformers, PyTorch, JAX, LangChain, LlamaIndex. ML engineering is open-source-native.

How to write an ML engineer resume

  1. Open with the model and the system

    A senior LLM-engineering summary names the model and the system: 'LLM engineer at a Series C AI company; owns the inference request router for the Llama-3.1-70B-Instruct fleet on 4M req/day.' Classical-ML: 'ML engineer at a marketplace; owns the recommendation model + the feature store (Feast) for 38M users.' Entry-level: 'Recent CS grad with a Kaggle medal; fine-tuned Llama-3.1-8B on internal support data, public eval writeup at 4,800 reads.'

  2. Quantify with model-era units

    LLM era: ms/token (P50 + P99), tokens/sec/GPU, cost-per-1k-tokens, eval-vs-base improvement (pp). Classical ML: accuracy / precision / recall / AUC, business-metric delta, training time, model size, feature count.

    The numbers that pull a resume forward differ by era; surface the units that match the JD you're targeting.

  3. Name the serving + training stack

    Serving: vLLM, Triton, SGLang, TGI, Modal, Ray Serve, BentoML. Training: PyTorch (2.x with torch.compile), JAX, Transformers, DeepSpeed, FSDP, Accelerate, Axolotl, Unsloth. Evals: lm-evaluation-harness, custom rubric, LLM-as-judge with calibration.

    Named tools parse better than 'ML serving experience.'

  4. Surface eval + monitoring discipline

    Eval discipline is the load-bearing senior-LLM signal. The pattern that works:

      • Build one rigorous eval that catches regressions.
      • Run it on every model release.
      • Gate model deploys on the eval.

    Monitoring: data drift, model drift, prompt-injection rate, hallucination rate (with named methodology), output toxicity, refusal rate.

    ML resumes without monitoring or eval bullets read as research-only.

  5. Close with OSS / paper / shipped model

    High-signal closing items:

      • Merged PRs to vLLM, Transformers, PyTorch, JAX (named PR, technical description).
      • NeurIPS / ICML / ICLR paper (named conference, title, role).
      • Open-sourced fine-tuned model on Hugging Face with downloads + named evals.
      • NeurIPS / ICML workshop talk.

    These items pull senior ML resumes forward.
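The eval-gate pattern in step 4 (one eval, run on every release, deploys gated on it) fits in a few lines. This is a minimal sketch, not any team's actual harness: the function names, the example scores, and the 2pp default threshold are hypothetical, and scores are assumed to be accuracy fractions measured on the same fixed prompt set.

```python
def regression_pp(baseline: float, candidate: float) -> float:
    """Drop from baseline to candidate in percentage points (positive = worse)."""
    return (baseline - candidate) * 100.0


def gate_release(baseline: float, candidate: float, max_regression_pp: float = 2.0) -> bool:
    """Return True when the candidate may deploy.

    Assumes both scores are accuracy fractions in [0, 1] measured on the
    same fixed prompt set. The 2pp default threshold is illustrative.
    """
    return regression_pp(baseline, candidate) < max_regression_pp


# Hypothetical scores: a 1.5pp drop passes the gate, a 3pp drop does not.
print(gate_release(0.810, 0.795))  # True
print(gate_release(0.810, 0.780))  # False
```

In CI, the release job would exit non-zero when `gate_release` returns False. A resume bullet that states the threshold and the prompt count is describing exactly this check.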

Pro tip

Specify the era — classical ML vs LLM

'Classical ML engineer (tabular + computer vision)' vs 'LLM engineer (serving + fine-tuning)' are different roles. Naming the era reads as senior; conflating them reads as keyword stuffing.

Pro tip

Latency-per-token is the load-bearing LLM metric

ms/token (P50 and P99), tokens/sec/GPU, tokens-per-second-per-dollar. These are the units LLM-engineering hiring panels grade on.
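For readers who have raw request logs but not the headline numbers, the units above can be derived roughly as follows. This is a sketch under stated assumptions: the function names and sample values are hypothetical, the percentile uses the nearest-rank definition (one of several in common use), and 'tokens-per-second-per-dollar' is read here as per-GPU throughput divided by the hourly GPU price.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def ms_per_token(decode_ms: float, output_tokens: int) -> float:
    """Average decode latency per generated token for one request."""
    return decode_ms / output_tokens


def tokens_per_sec_per_gpu(total_tokens: int, wall_seconds: float, gpus: int) -> float:
    """Fleet throughput normalised by GPU count."""
    return total_tokens / wall_seconds / gpus


def tokens_per_sec_per_dollar(tok_s_per_gpu: float, gpu_dollars_per_hour: float) -> float:
    """Assumed reading of the unit: per-GPU throughput per dollar of hourly GPU price."""
    return tok_s_per_gpu / gpu_dollars_per_hour


# Hypothetical log: (decode time in ms, output tokens) per request.
log = [(2800, 100), (3000, 100), (4200, 100), (8400, 100)]
per_request = [ms_per_token(ms, n) for ms, n in log]
print(percentile(per_request, 50))  # 30.0
print(percentile(per_request, 99))  # 84.0
```

Whichever percentile definition you use, state it once and apply it to both the before and the after number.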

Pro tip

Eval discipline beats eval count

Building one rigorous eval that catches regressions beats running 20 noisy benchmarks. Surface the eval methodology, not just the benchmark names.

Pro tip

Name the model + the size

'Llama-3.1-70B-Instruct served on 4xH100s' parses as current LLM work. 'LLM serving experience' is too generic.

ATS notes

ML-engineering ATS pipelines screen for a distinctive token set that changed substantially post-2023. LLM-era tokens: LLM, large language model, GPT-4, Claude, Llama, Mistral, vLLM, Triton, SGLang, TGI, PyTorch, JAX, Transformers, Modal, Ray, Anyscale, RAG, retrieval-augmented generation, embeddings, vector database, Pinecone, Weaviate, pgvector, LangChain, LlamaIndex, fine-tuning, LoRA, QLoRA, SFT, DPO, RLHF, eval, lm-evaluation-harness, prompt engineering, prompt injection.

Classical-ML tokens: scikit-learn, XGBoost, LightGBM, computer vision, NLP, recommendation systems, MLflow, Weights & Biases, feature store, Feast, Tecton, drift detection, model monitoring, A/B testing.

Name the tokens precisely. JDs in 2026 are often very specific about LLM vs classical ML — match the JD's vocabulary.
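A quick way to check your own vocabulary match is to diff the JD against your resume. The sketch below is only a self-check: how a given ATS actually matches tokens is vendor-specific and not covered here, and the vocabulary subset, function names, and sample strings are hypothetical.

```python
import re

# Small illustrative subset of the LLM-era tokens listed above.
LLM_ERA_TOKENS = ["vLLM", "Triton", "SGLang", "LoRA", "QLoRA", "DPO", "RAG", "pgvector"]


def contains_token(text: str, token: str) -> bool:
    """Case-insensitive match that refuses to match inside a longer word.

    The lookarounds stop 'LoRA' from matching inside 'QLoRA'.
    """
    pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def missing_tokens(jd: str, resume: str, vocabulary: list[str]) -> list[str]:
    """Tokens the JD mentions that the resume never does, in vocabulary order."""
    return [t for t in vocabulary if contains_token(jd, t) and not contains_token(resume, t)]


jd = "We serve Llama models with vLLM and Triton; LoRA fine-tuning and RAG experience required."
resume = "Migrated serving to vLLM; fine-tuned with QLoRA; built a RAG pipeline on pgvector."
print(missing_tokens(jd, resume, LLM_ERA_TOKENS))  # ['Triton', 'LoRA']
```

Only add a missing token if you have actually shipped with that tool.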

Sample bullets you can adapt

Each follows the [verb] [object] [number] structure hiring managers grade against. Copy them as a starting point, swap in your own numbers, and read the annotation to understand why each one works.

  • Latency

    Cut p99 ms/token on the Llama-3.1-70B inference path from 84ms to 28ms by migrating from HF Transformers serving to vLLM with continuous batching + paged-attention; speculative decoding with a Llama-3.1-8B draft model accounted for the last 12ms.

    Why it works: Names the model, before/after, three interventions, and the draft model. The depth signals senior LLM engineering.

  • Evals

    Built the team's eval harness in lm-evaluation-harness + custom rubric tasks (n=480 prompts, 4-judge ensemble with calibration); held post-fine-tune model to ≥ base + 6pp on the internal benchmark across 4 quarterly releases.

    Why it works: Names the eval framework, harness size, judge-ensemble methodology, and the regression-gate outcome over a sustained window.

  • Fine-tuning

    Fine-tuned Llama-3.1-8B on a 38k-row internal support-ticket dataset (LoRA r=16, alpha=32, 3 epochs); resulting model beat GPT-4o-mini on the internal eval by 14pp at 1/12th the per-token cost.

    Why it works: Base model, dataset size, LoRA config, eval outcome, cost comparison. The combo is the senior LLM-fine-tuning signature.

  • Cost

    Reduced cost-per-1k-tokens on the inference fleet from $0.42 to $0.18 through three interventions: H100 → H200 migration (1.4× throughput), continuous batching tuning, and prefix caching for long-system-prompt workloads.

    Why it works: Cost-per-token improvement with absolute numbers, hardware migration named, and three specific interventions. LLM cost work is heavily weighted in 2026 hiring.

  • RAG

    Built the team's RAG pipeline (pgvector + sentence-transformers + custom hybrid retriever); retrieval Recall@5 went from 62% to 87% via a BM25 + dense ensemble + LLM rerank, and end-to-end RAG accuracy on the internal eval rose 18pp.

    Why it works: Names the stack, the metric (Recall@5), the retrieval architecture (hybrid + rerank), and the end-to-end outcome. RAG depth at this level is senior signal.

  • Safety

    Shipped a real-time prompt-injection detection layer (Llama-3.1-8B guard model + heuristic prefilter); detected 98.4% of injection attempts in the redteam corpus with 0.4% false-positive rate on benign traffic.

    Why it works: Names the architecture (guard + prefilter), the detection rate, and the FP rate. Safety work with sustained metrics is senior signal.

  • Monitoring

    Built the model-monitoring system (data drift + output entropy + refusal-rate dashboards in Grafana); caught a 14% accuracy regression within 4 hours of a base-model upstream release.

    Why it works: Names the three monitoring signals, the dashboard tool, and a specific catch outcome. Monitoring discipline is senior signal.

  • Tooling

    Migrated the eval pipeline from a notebook-based ad-hoc loop to a CI-gated harness (lm-eval-harness in GitHub Actions); every model release now runs against 480 prompts in 38 minutes; deploy is gated on regression < 2pp.

    Why it works: Names the before/after architecture, the runtime, and the deploy gate. Eval-in-CI is the modern LLM-engineering bread-and-butter.

  • Open Source

    Two merged PRs to vllm-project/vllm — one extended the continuous-batching scheduler for prefix-caching across long-prefix workloads; one closed a memory leak in the paged-attention kernel under high tensor-parallelism.

    Why it works: Named project (vLLM), PR count, and two technically meaty descriptions. vLLM is current-vintage LLM serving infra.

  • Shipped model

    Open-sourced llama-3.1-8b-quill-support — a LoRA-fine-tuned support-ticket model; 4,800 downloads on Hugging Face; named in two model-card writeups from the AI-eng community.

    Why it works: Named model, download count, external citations. Open-sourced fine-tuned models are gold-standard LLM-engineering credentials.

  • Mentorship

    Mentored 2 junior ML engineers transitioning from classical ML into LLM-engineering focus; both shipped sole-owner production LLM systems within 6 months.

    Why it works: Names the transition, timeframe, and deliverable per mentee. Cross-era mentorship is a senior signal.

  • Publication

    NeurIPS 2024 workshop paper — 'Speculative decoding with mixture-of-experts draft models' (coauthor, presenter). 380 conference attendees; 2,400 paper views post-publication.

    Why it works: Named conference, paper title, role, and post-publication metrics. Workshop papers carry weight in ML hiring.
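The RAG bullet above reports Recall@5, and a number like that is only credible if you can say how it was computed. The sketch below uses one common definition: the fraction of a query's relevant documents that appear in the top k results, averaged over queries. Some teams instead report a hit rate (at least one relevant document in the top k). The function names and document IDs are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must be non-empty")
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Hypothetical eval set: query 1 finds one of two relevant docs in the top 5,
# query 2 finds its single relevant doc.
runs = [
    (["d1", "d7", "d3", "d9", "d2", "d4"], {"d3", "d4"}),
    (["d5", "d6"], {"d5"}),
]
print(mean_recall_at_k(runs, k=5))  # 0.75
```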

Wrong vs Right · bullet rewrites

Same intent, two phrasings. Read why the right column lands on the keep-pile and the wrong column doesn't.

Summary opener

Wrong

ML engineer with experience in machine learning and AI.

Right

LLM engineer at a Series C AI company; owns the inference request router for the Llama-3.1-70B-Instruct fleet on 4M req/day. Cut p99 ms/token from 84ms to 28ms via vLLM continuous batching + speculative decoding; cost-per-1k-tokens fell 38% over Q3.

Why: Right version names the model (with size + variant), the request scale, two LLM-era metrics, and the optimization technique. Wrong version is the LLM-default opener.

Latency

Wrong

Improved model serving performance through optimization.

Right

Cut p99 ms/token on the Llama-3.1-70B inference path from 84ms to 28ms by migrating from naive HF Transformers serving to vLLM with continuous batching + paged-attention; speculative decoding with a Llama-3.1-8B draft model accounted for the last 12ms.

Why: Right version names the model, the before/after, three specific interventions (vLLM, continuous batching, speculative decoding), and the draft model. Hiring panels recognize the depth.

Eval

Wrong

Evaluated model performance across various benchmarks.

Right

Built the team's eval harness in lm-evaluation-harness + custom rubric tasks (n=480 prompts, 4-judge ensemble with calibration); held the post-fine-tuning model to ≥ base + 6pp on the internal benchmark across 4 quarterly releases.

Why: Right version names the eval framework, the harness size, the judge ensemble methodology, and the regression-gate outcome. Eval discipline is the senior-ML signal.
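A bullet that claims a 'judge ensemble with calibration' should be backed by a mechanism you can explain in an interview. The page does not define calibration, so the sketch below assumes one plausible reading: the ensemble's majority verdict is compared against human labels on a small held-out set. The function names, the tie rule, and the sample votes are all hypothetical.

```python
def ensemble_verdict(votes: list[bool]) -> bool:
    """Pass only on a strict majority; a 2-2 tie among four judges counts as fail."""
    return sum(votes) * 2 > len(votes)


def agreement_with_humans(judge_votes: list[list[bool]], human_labels: list[bool]) -> float:
    """Share of calibration prompts where the ensemble verdict matches the human label."""
    if len(judge_votes) != len(human_labels) or not human_labels:
        raise ValueError("need one human label per prompt, and at least one prompt")
    matches = sum(ensemble_verdict(v) == h for v, h in zip(judge_votes, human_labels))
    return matches / len(human_labels)


# Hypothetical calibration set: four prompts, four judge votes each.
votes = [
    [True, True, True, False],
    [True, True, False, False],   # tie, treated as fail
    [False, False, False, True],
    [True, True, True, True],
]
humans = [True, True, False, True]
print(agreement_with_humans(votes, humans))  # 0.75
```

Quoting the agreement rate next to the prompt count tells a panel that the judges were validated, not just run.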

Fine-tuning

Wrong

Fine-tuned LLMs for various downstream tasks.

Right

Fine-tuned Llama-3.1-8B on a 38k-row internal support-ticket dataset (LoRA r=16, alpha=32, 3 epochs); resulting model beat GPT-4o-mini on the internal eval by 14pp at 1/12th the per-token cost.

Why: Right version names the base model, dataset size, LoRA config, eval outcome, and cost comparison. Fine-tuning without these details reads as resume-padding.
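If you quote a LoRA config, expect to be asked what the numbers mean. In the original LoRA formulation, a rank-r adapter on a d_in × d_out weight adds r × (d_in + d_out) trainable parameters, and the update is scaled by alpha / r. The sketch below works through that arithmetic. The 4096 × 4096 layer shape is an illustrative assumption, not a statement about which Llama-3.1-8B modules the example resume adapted.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in the original LoRA formulation."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# r=16, alpha=32 as in the bullet; the layer shape is a hypothetical example.
d_in, d_out, r, alpha = 4096, 4096, 16, 32
adapter = lora_trainable_params(d_in, d_out, r)
full = d_in * d_out
print(lora_scaling(r, alpha))   # 2.0
print(adapter)                  # 131072
print(adapter / full * 100)     # 0.78125 (percent of the full matrix)
```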

Open source

Wrong

Contributed to ML open-source projects.

Right

Two merged PRs to vllm-project/vllm — one extended the continuous-batching scheduler for prefix-caching across long-prefix workloads; one closed a memory leak in the paged-attention kernel under high tensor-parallelism.

Why: Right version names the project (vLLM), PR count, and two technically meaty descriptions. vLLM is current-vintage LLM-serving infra; the contribution signals depth.

Common mistakes (and how to fix them)

Patterns our writers see most often when reviewing ML engineer resumes. Each one disqualifies candidates faster than weak experience does.

  • Mistake

    Conflating classical ML and LLM engineering.

    Fix

    Specify the era. Classical ML and LLM engineering are different roles in 2026.

  • Mistake

    Generic 'low latency' claims without ms/token.

    Fix

    LLM era: ms/token (P50, P99), tokens/sec/GPU. Classical ML: training time, inference latency in absolute ms.

  • Mistake

    Eval claims without naming the methodology.

    Fix

    Name the eval framework (lm-evaluation-harness, custom rubric, LLM-as-judge with calibration). Generic 'evaluated model' is filler.

  • Mistake

    Listing every ML framework without naming what you shipped.

    Fix

    Name the framework only if you've shipped production with it. PyTorch + Transformers + vLLM + a couple of adjacent tools is more credible than ten framework names.

  • Mistake

    Notebook-only work without production deployment.

    Fix

    Production ML and notebook ML are different. If you've only shipped notebooks, lead with the experimentation signal — but be transparent.

  • Mistake

    Reporting generic latency figures without ms/token (in an LLM context).

    Fix

    LLM-era latency is reported in ms/token, not as generic latency. Match the unit to the JD.

  • Mistake

    No mention of monitoring or evals.

    Fix

    Surface eval discipline and monitoring. ML hiring panels in 2026 read for production maturity.

  • Mistake

    Two-page resume with fewer than 7 years of experience.

    Fix

    One page. ML hiring panels move fast.

Resume format for ML Engineers

Reverse-chronological. Header → era (LLM / classical ML) + system + scope summary → experience → open-source / papers / shipped models → skills (Models / Serving / Training / Evals / Tooling) → education. One page until at least seven years of experience.

Salary & job outlook

Median annual salary

$165,000

Range: $94,200 to $268,000

Projected job growth

+26% from 2023 to 2033 (much faster than average)

Action verbs for ML engineers

Strong verbs lead strong bullets. Replace generic openers (worked on, helped with, was responsible for) with the specific verb that matches what you actually did.

shipped · owned · trained · fine-tuned · served · deployed · quantized · distilled · evaluated · benchmarked · instrumented · monitored · drift-detected · batched · prefix-cached · speculatively-decoded · tensor-paralleled · rolled out · open-sourced · published · mentored · led

Skills hiring managers screen for

ATS pipelines weight your Skills section as a structured list. Include 15-25 of the items below if they match your experience — not soft skills.

PyTorch 2.x (torch.compile) · Transformers + Diffusers · JAX + Flax · vLLM · Triton + TensorRT-LLM · SGLang · TGI · Modal · Ray + Ray Serve · DeepSpeed + FSDP · Axolotl + Unsloth · LoRA / QLoRA / DPO / RLHF · lm-evaluation-harness · Custom evals (LLM-as-judge with calibration) · pgvector · Pinecone / Weaviate · LangChain / LlamaIndex (selectively) · Hugging Face Hub · Weights & Biases · MLflow · Feast (feature store) · Drift detection (NannyML / Evidently) · PyTorch Distributed (DDP, FSDP) · CUDA + Triton kernels (basics)

FAQ

Should I list both classical ML and LLM work?

If you've shipped both, yes — but separate the eras in your experience. Lead with the era that matches the JD. The 2026 hiring landscape distinguishes the two more than it used to.

Do I need a PhD for ML engineering roles?

No. Senior ML-engineering roles at sophisticated companies (Anthropic, OpenAI, Meta, Google) regularly go to candidates without PhDs when the production-ML and OSS evidence is strong. A PhD helps for research-track roles; the engineering track weights shipped systems more heavily.

How do I show LLM serving depth without proprietary numbers?

Use relative numbers + the tooling. 'Cut p99 ms/token by 67% via vLLM continuous batching + speculative decoding with a draft model' is credible without exact absolute numbers.
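Converting absolute figures to relative ones is simple arithmetic, but it is worth checking before it goes on the page. The sketch below reuses the before/after figures from the example resume on this page. The function name is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# p99 ms/token: 84 ms -> 28 ms
print(round(percent_reduction(84, 28)))      # 67
# cost-per-1k-tokens: $0.42 -> $0.18
print(round(percent_reduction(0.42, 0.18)))  # 57
```

Round conservatively, and keep the absolute numbers on hand in case an interviewer asks for them.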

Should I list LangChain / LlamaIndex on my resume?

Selectively. LangChain has reputation issues in senior ML-engineering circles for being too abstract. Naming it is fine; leading with it as primary expertise can read as junior. LlamaIndex is more respected for RAG-specific work.

What if I've only fine-tuned models, not trained from scratch?

That's the modern reality — most production ML engineers fine-tune, they don't train from scratch. Lead with fine-tuning depth (LoRA configs, eval discipline, base-model selection). Pretraining experience is rare and is mostly relevant for foundation-model companies.

How important are evals on a resume?

Very. Since 2023, eval discipline has been the load-bearing senior LLM-engineering signal. Surface the methodology, the harness size, and the regression-gate outcome.

Should I include Kaggle results?

Only if you've placed in named competitions (gold / silver medal, top 1-5%). Generic Kaggle participation reads as filler at the senior level. Gold-medal placement is a high-signal entry-level credential.

Do I need to know CUDA / Triton kernels?

It helps but is not required for most ML-engineering roles. Foundation-model companies weight it more. General LLM-serving roles weight vLLM and Triton Inference Server (NVIDIA's model server, not the OpenAI Triton kernel language) more than custom CUDA kernels.

How do I handle a transition from generalist SWE to ML engineering?

Lead with the ML work, even if it's smaller in scope. 'Software engineer transitioning to ML: fine-tuned and shipped Llama-3.1-8B for the support team over the last 14 months.' Show evidence of production ML work, not just learning.

Should I list every Hugging Face model I've fine-tuned?

Group them into one bullet if they're related. 'Fine-tuned 4 LoRA adapters on internal data across customer-support, sales, and internal-search use cases; top adapter at 4,800 Hugging Face downloads.'
