DEV Community

# gpu

Posts

FlashAttention CUDA Kernel, Strix Halo MOE Boost, & NVIDIA DLSS 4.5 Driver Update

3 min read
PatentLLM: CUDA TileLang/Triton B200 5x Speedup, RTX 5090 Power, PTX Grammar

3 min read
How to Detect GPU Waste in a Kubernetes Cluster

5 min read
Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)

5 min read
RTX 5080 Undervolt Benchmarks, CGO-Free CUDA API Binding, & AMD GPU Compatibility Fix

3 min read
AMD GPU/AI Launches, Legacy Driver Update & CUDA Optimization Platform

3 min read
Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture

5 min read
RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains

4 min read
Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops

8 min read
Turning a 1-Line Idea Into a 40-Second Short with a 10-Beat Local Video Pipeline

7 min read
HiDream-O1-Image 3–8x Faster: Benchmarking Steps, CFG, and Resolution

5 min read
Profiling a CUDA Python Program with GPUFlight

10 min read
LLM Compilers, GGUF Quantization, & Radeon RX 9060 Benchmarks

3 min read
Go+CUDA Optimization, LLM VRAM Benchmarks & NVIDIA G-SYNC Firmware 1.1.6

3 min read
Construyendo la PC de Escritorio de tus Sueños (Building the Desktop PC of Your Dreams)

5 min read