NOW IN EARLY ACCESS — 30+ MODELS READY

8BIT/4BIT

The quantized model registry. Every model. Every format.
GGUF · GPTQ · AWQ · ONNX · TFLite · CoreML

8bit4bit.com — model registry
$ 8b4b pull llama-3.1-8b-instruct --format gguf-q4
# Fetching quantized variant from registry...
→ Model: Llama 3.1 8B Instruct
→ Format: GGUF Q4_K_M
→ Size: 4.1 GB (vs 16 GB FP16, 74% smaller)
→ Accuracy: 98.2% · Perplexity: 6.84
→ CDN: Mumbai edge node · 0ms latency
✓ Downloaded in 38s
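The "74% smaller" figure in the demo output follows directly from the two sizes shown. A minimal sketch of that arithmetic, where the helper name `percent_smaller` is hypothetical and not part of the 8b4b CLI:

```python
def percent_smaller(quantized_gb: float, baseline_gb: float) -> float:
    """Return how much smaller the quantized file is, as a percentage of the baseline."""
    if baseline_gb <= 0:
        raise ValueError("baseline size must be positive")
    return (1 - quantized_gb / baseline_gb) * 100


# Sizes taken from the demo output: 4.1 GB (GGUF Q4_K_M) vs 16 GB (FP16).
reduction = percent_smaller(4.1, 16.0)
print(f"{reduction:.0f}% smaller")  # prints "74% smaller"
```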
SUPPORTED FORMATS

Every quantization format.
One registry.

We support every major compression format so you can find the exact variant for your hardware — from consumer CPUs to mobile NPUs.

GGUF
Q4_K_M · Q8_0 · Q5_K
llama.cpp ecosystem. Best for CPU inference on Apple Silicon and x86.
GPTQ
INT4 · INT8
GPU-quantized via AutoGPTQ. Optimized for NVIDIA cards with vLLM.
AWQ
INT4 Activation-aware
Often faster than GPTQ with better accuracy retention. Ideal for NVIDIA inference at scale.
ONNX
INT8 · FP16
Cross-platform via ONNX Runtime. Runs on CPU, GPU, and edge devices.
TFLite
INT8 · FP16
Android on-device inference. Optimized for ARM CPUs and mobile NPUs.
CoreML
FP16 · INT8
iOS and macOS via Apple Neural Engine. Built with coremltools.
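One way to read the format list above is as a lookup from deployment target to candidate formats. The sketch below encodes only what the descriptions state. The target keys, the ordering within each list, and the function name `suggest_formats` are illustrative assumptions, not registry behavior.

```python
# Hypothetical target names mapped to the formats described above.
# Ordering within each list is an assumption, not a registry ranking.
FORMATS_BY_TARGET: dict[str, list[str]] = {
    "x86-cpu": ["GGUF", "ONNX"],
    "apple-silicon": ["GGUF", "CoreML"],
    "nvidia-gpu": ["AWQ", "GPTQ", "ONNX"],
    "android": ["TFLite"],
    "ios": ["CoreML"],
    "edge": ["ONNX"],
}


def suggest_formats(target: str) -> list[str]:
    """Return candidate formats for a deployment target, or raise if unknown."""
    key = target.strip().lower()
    if key not in FORMATS_BY_TARGET:
        known = ", ".join(sorted(FORMATS_BY_TARGET))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}")
    return FORMATS_BY_TARGET[key]


print(suggest_formats("nvidia-gpu"))
```

A plain dictionary keeps the mapping easy to audit against the format descriptions. Real hardware selection would likely also weigh memory limits and runtime support.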
PIPELINE

From base model
to download-ready.

Every model goes through a rigorous 4-step pipeline before it reaches you. Models are rejected if accuracy degradation exceeds 5%.

01
INGEST
Pull from Hugging Face or direct upload. License verified automatically on submission.
02
QUANTIZE
Async GPU workers convert to all formats using llama.cpp, AutoGPTQ, AutoAWQ, and Optimum.
03
BENCHMARK
Accuracy, size, and latency tested on reference hardware. Hard threshold enforced.
04
DEPLOY
Stored on Cloudflare R2 with zero egress fees. Served from the nearest edge node globally.
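The hard threshold in the benchmark step can be pictured as a simple gate. This is a sketch under stated assumptions: the page does not say how degradation is measured, so the code treats it as the relative drop of a higher-is-better score against the unquantized baseline. `BenchmarkResult` and `passes_threshold` are hypothetical names.

```python
from dataclasses import dataclass

# From the pipeline description: reject if degradation exceeds 5%.
MAX_DEGRADATION = 0.05


@dataclass(frozen=True)
class BenchmarkResult:
    variant: str
    baseline_score: float   # score of the unquantized model (higher is better)
    quantized_score: float  # score of the quantized variant on the same benchmark

    def __post_init__(self) -> None:
        if self.baseline_score <= 0:
            raise ValueError("baseline_score must be positive")

    @property
    def degradation(self) -> float:
        # Assumed definition: relative drop versus the baseline score.
        return 1.0 - self.quantized_score / self.baseline_score


def passes_threshold(result: BenchmarkResult,
                     max_degradation: float = MAX_DEGRADATION) -> bool:
    """Return True when the variant stays within the allowed degradation."""
    return result.degradation <= max_degradation


for result in (
    BenchmarkResult("gguf-q4", baseline_score=100.0, quantized_score=98.2),
    BenchmarkResult("int4", baseline_score=100.0, quantized_score=93.0),
):
    status = "accepted" if passes_threshold(result) else "rejected"
    print(f"{result.variant}: {result.degradation:.1%} degradation, {status}")
```

With these illustrative scores, the first variant retains 98.2% of the baseline and is accepted, while the second loses 7% and is rejected.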
MODEL CATALOG

Browse quantized models.

MODALITIES

Text. Vision. Audio.
Image generation.

Phase 1 covers the most in-demand model types. Phase 2 adds video and more advanced modalities.

🔤
Text → Text (LLM)
● PHASE 1 — AVAILABLE NOW
👁
Image → Text (VLM)
● PHASE 1 — AVAILABLE NOW
🎨
Text → Image
● PHASE 1 — AVAILABLE NOW
🎙
Audio → Text (ASR)
● PHASE 1 — AVAILABLE NOW
🖼
Image → Image
○ PHASE 2 — COMING SOON
🎬
Text / Image → Video
○ PHASE 2 — COMING SOON
SERVICES

More than a registry.

Pay only for what you use. No subscriptions, no lock-in.

Quantize on Demand
$2 – $15 per model
Submit any Hugging Face model ID. Get all quantized variants back in minutes. Pure pay-per-job — no commitment.
📡
Metadata API
Free · $19/mo · $49/mo
Programmatic access to model catalog, benchmarks, and download stats. Build your own tools on our registry.
🏢
Enterprise Registry
Custom pricing
Private model hosting, custom quantization SLAs, SSO, audit logs, and dedicated support for your team.
GET STARTED

Stop quantizing models
yourself.

The average developer spends 3 hours quantizing one model.
We did it already. Download in seconds.

BROWSE ALL MODELS →
READ THE DOCS