DEV Community

# vllm

Posts

How RunPod FlashBoot Actually Works (4-Request Test)

10 min read
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

5 min read
vLLM's V1 Release Fixes the Silent Killer in RL Training

2 min read
The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation

8 min read
Rethinking Open Source Contribution in the Age of AI Agents, featuring vLLM Core Maintainer Roger Wang at MLSys'26

3 min read
72B Parameters, Zero Quantization, One GPU: Benchmarking Qwen2-VL on AMD MI300X

13 min read
From one model to seven — what it took to make TurboQuant model-portable

3 min read
Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1

2 min read
Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI

16 min read
11-Second Time to First Token on a Healthy vLLM Server

5 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM

9 min read
Gemma-SRE: Self-Hosted vLLM Infrastructure Agent

18 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

4 min read
I Pushed Local LLMs Harder. Here's What Two Models Actually Did.

8 min read
vLLM Request Lifecycle (Where TTFT is measured)

2 min read