NOW IN EARLY ACCESS — 30+ MODELS READY

8BIT/4BIT

The quantized model registry. Every model. Every format.
GGUF · GPTQ · AWQ · ONNX · TFLite · CoreML

BROWSE MODELS · HOW IT WORKS →

8bit4bit.com — model registry

$ 8b4b pull llama-3.1-8b-instruct --format gguf-q4
# Fetching quantized variant from registry...
→ Model: Llama 3.1 8B Instruct
→ Format: GGUF Q4_K_M
→ Size: 4.1 GB (vs 16 GB FP16, 74% smaller)
→ Accuracy: 98.2% · perplexity: 6.84
→ CDN: Mumbai edge node · 0ms latency
✓ Downloaded in 38s
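The "74% smaller" figure in the sample output follows from the two sizes shown. A minimal sketch of that arithmetic, where `size_reduction` is an illustrative helper and not part of the 8b4b CLI:

```python
def size_reduction(quantized_gb: float, baseline_gb: float) -> float:
    """Return the fraction of the baseline size saved by quantization."""
    if baseline_gb <= 0:
        raise ValueError("baseline_gb must be positive")
    return 1.0 - quantized_gb / baseline_gb


# Figures from the sample output: 4.1 GB quantized vs 16 GB FP16.
saved = size_reduction(4.1, 16.0)
print(f"{saved:.0%} smaller")  # 74% smaller
```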

SUPPORTED FORMATS

Every quantization format.
One registry.

We support every major compression format so you can find the exact variant for your hardware — from consumer CPUs to mobile NPUs.

GGUF

Q4_K_M · Q8_0 · Q5_K

llama.cpp ecosystem. Best for CPU inference on Apple Silicon and x86.

GPTQ

INT4 · INT8

GPU-quantized via AutoGPTQ. Optimized for NVIDIA cards with vLLM.

AWQ

INT4 Activation-aware

Typically faster than GPTQ at comparable or better accuracy. Ideal for NVIDIA inference at scale.

ONNX

INT8 · FP16

Cross-platform via ONNX Runtime. Runs on CPU, GPU, and edge devices.

TFLite

INT8 · FP16

Android on-device inference. Optimized for ARM CPUs and mobile NPUs.

CoreML

FP16 · INT8

iOS and macOS via Apple Neural Engine. Built with coremltools.
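The format descriptions above amount to a lookup from deployment target to format. A rough sketch of that lookup follows; the `Target` enum, `FORMATS_BY_TARGET` table, and `pick_formats` function are hypothetical names for illustration, and the AWQ-before-GPTQ ordering is an assumption based only on the AWQ description.

```python
from enum import Enum


class Target(Enum):
    CPU = "cpu"                      # Apple Silicon or x86, via llama.cpp
    NVIDIA_GPU = "nvidia_gpu"
    CROSS_PLATFORM = "cross_platform"
    ANDROID = "android"
    APPLE_DEVICE = "apple_device"    # iOS and macOS


# Mapping taken from the format descriptions; the ordering within a list is
# an assumption (AWQ first, since it is described as faster than GPTQ).
FORMATS_BY_TARGET = {
    Target.CPU: ["GGUF"],
    Target.NVIDIA_GPU: ["AWQ", "GPTQ"],
    Target.CROSS_PLATFORM: ["ONNX"],
    Target.ANDROID: ["TFLite"],
    Target.APPLE_DEVICE: ["CoreML"],
}


def pick_formats(target: Target) -> list[str]:
    """Return candidate formats for a deployment target, best guess first."""
    return list(FORMATS_BY_TARGET[target])


print(pick_formats(Target.NVIDIA_GPU))  # ['AWQ', 'GPTQ']
```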

PIPELINE

From base model
to download-ready.

Every model goes through a rigorous four-step pipeline before it reaches you. A variant is rejected if its accuracy degradation exceeds 5%.
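The 5% rejection rule can be pictured as a simple gate. The page does not say how degradation is measured, so the sketch below assumes a relative drop in a higher-is-better benchmark score; `accuracy_degradation` and `passes_gate` are hypothetical names.

```python
MAX_DEGRADATION = 0.05  # the 5% threshold stated on the page


def accuracy_degradation(baseline_score: float, quantized_score: float) -> float:
    """Relative drop in a higher-is-better score (assumed definition)."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (baseline_score - quantized_score) / baseline_score


def passes_gate(baseline_score: float, quantized_score: float) -> bool:
    """A variant is rejected only if degradation exceeds the threshold."""
    return accuracy_degradation(baseline_score, quantized_score) <= MAX_DEGRADATION


print(passes_gate(100.0, 98.2))  # True: about 1.8% degradation
print(passes_gate(100.0, 94.0))  # False: about 6% degradation
```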

INGEST

Pull from Hugging Face or direct upload. License verified automatically on submission.

QUANTIZE

Async GPU workers convert to all formats using llama.cpp, AutoGPTQ, AutoAWQ, Optimum.

BENCHMARK

Accuracy, size, and latency tested on reference hardware. Hard threshold enforced.

DEPLOY

Stored on Cloudflare R2 with zero egress fees. Served from the nearest edge node globally.
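The QUANTIZE step describes asynchronous workers fanning out over formats. One way this could look is sketched below with `asyncio`; the `convert` coroutine is a stand-in that does no real work, the format-to-tool pairing is an assumption (the page lists the tools without saying which produces which format), and TFLite and CoreML are left out because that step names no tool for them.

```python
import asyncio

# Format -> tool, using the tools named in the QUANTIZE step. Pairing ONNX
# with Optimum is an assumption made for this sketch.
TOOLS = {
    "GGUF": "llama.cpp",
    "GPTQ": "AutoGPTQ",
    "AWQ": "AutoAWQ",
    "ONNX": "Optimum",
}


async def convert(model_id: str, fmt: str) -> str:
    """Stand-in for a GPU worker job; a real worker would invoke the tool."""
    await asyncio.sleep(0)  # yield control, simulating asynchronous work
    return f"{model_id}.{fmt.lower()} (via {TOOLS[fmt]})"


async def quantize_all(model_id: str) -> list[str]:
    """Run one conversion per format concurrently and collect the results."""
    jobs = [convert(model_id, fmt) for fmt in TOOLS]
    return await asyncio.gather(*jobs)


artifacts = asyncio.run(quantize_all("llama-3.1-8b-instruct"))
for artifact in artifacts:
    print(artifact)
```

`asyncio.gather` returns results in the order the jobs were submitted, so the output order matches the `TOOLS` table regardless of which job finishes first.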

MODEL CATALOG

Browse quantized models.

MODALITIES

Text. Vision. Audio.
Image generation.

Phase 1 covers the most in-demand model types. Phase 2 adds video and more advanced modalities.

🔤

Text → Text (LLM)

● PHASE 1 — AVAILABLE NOW

👁

Image → Text (VLM)

● PHASE 1 — AVAILABLE NOW

🎨

Text → Image

● PHASE 1 — AVAILABLE NOW

🎙

Audio → Text (ASR)

● PHASE 1 — AVAILABLE NOW

🖼

Image → Image

○ PHASE 2 — COMING SOON

🎬

Text / Image → Video

○ PHASE 2 — COMING SOON

SERVICES

More than a registry.

Pay only for what you use. No subscriptions, no lock-in.

⚡

Quantize on Demand

$2 – $15 per model

Submit any Hugging Face model ID. Get all quantized variants back in minutes. Pure pay-per-job — no commitment.

📡

Metadata API

Free · $19/mo · $49/mo

Programmatic access to model catalog, benchmarks, and download stats. Build your own tools on our registry.
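As an illustration of the kind of tool that could sit on top of catalog metadata, the sketch below filters variants by size. The `Variant` record shape and the `find_variants` helper are hypothetical; the actual API schema and endpoints are not described on this page. The two sample records reuse the sizes from the pull example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record shape; the real API schema is not documented here.
    model: str
    fmt: str
    size_gb: float


def find_variants(catalog: list[Variant], model: str, max_size_gb: float) -> list[Variant]:
    """Variants of `model` that fit within `max_size_gb`, smallest first."""
    matches = [v for v in catalog if v.model == model and v.size_gb <= max_size_gb]
    return sorted(matches, key=lambda v: v.size_gb)


# Sample records built from the sizes quoted in the pull example.
catalog = [
    Variant("llama-3.1-8b-instruct", "FP16", 16.0),
    Variant("llama-3.1-8b-instruct", "GGUF Q4_K_M", 4.1),
]

fits = find_variants(catalog, "llama-3.1-8b-instruct", max_size_gb=8.0)
print([v.fmt for v in fits])  # ['GGUF Q4_K_M']
```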

🏢

Enterprise Registry

Custom pricing

Private model hosting, custom quantization SLAs, SSO, audit logs, and dedicated support for your team.

GET STARTED

Stop quantizing models
yourself.

The average developer spends 3 hours quantizing one model.
We did it already. Download in seconds.

BROWSE ALL MODELS → · READ THE DOCS
