#quantization
Posts
Why your quantized LLM loses its MTP heads and how to keep them
Alan West
May 27
#machinelearning #llm #python #quantization
5 min read
The Best Result This Week Was a Failed Prediction — Phase-3a Doesn't Transfer
MxGuru
May 20
#quantization #hsaq #methodology #granite
1 min read
Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close
MxGuru
May 20
#quantization #hsaq #methodology #granite
1 min read
When the Sensitivity Metric Lies: A Drift-Inversion Smoking Gun in Mixed-Precision LLM Quantization
MxGuru
May 20
#quantization #hsaq #awq #granite
8 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)
Patrick Hughes
May 13
#llamacpp #gguf #quantization #localai
4 min read
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4
Vilius
May 9
#ai #llm #local #quantization
2 reactions, 1 comment
2 min read
KVQuant: Run 70B LLMs on 8GB RAM with KV Cache Quantization
Aman Sachan
Apr 30
#python #llm #quantization
1 min read
KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization
Aman Sachan
Apr 30
#python #llm #quantization #optimization
1 min read
Traditional Quantization vs 1.58-Bit Ternary Models: A Practical Comparison
Alan West
Apr 18
#machinelearning #llm #quantization #ai
1 comment
5 min read
GIMP's Posterization: Simple Quantization vs. Median Cut for Better Visuals
Denis Lavrentyev
Apr 13
#gimp #posterization #quantization #mediancut
8 min read
Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke
plasmon
Apr 8
#llm #quantization #vram #localllm
8 min read
Chasing 16MB: My Parameter Golf Journey and What I Learned the Hard Way
Jean
May 8
#parametergolf #tinyllm #aiexperimentation #quantization
1 reaction
3 min read
Building a Vector Database That Never Decompresses Your Vectors
Scott Everitt
Mar 30
#vectordatabase #quantization #turboquant #go
2 reactions
16 min read