experiment · 13 · model landscape

// madcool · lab · deep research

AI Model Landscape

Two views of the field. First: a regularly updated multi-axial comparison of 11 cloud models scored across eight capability axes. Below the fold: 25 open-weights models small enough to host yourself (unquantized 16-bit weight footprint under 500 GB), each with one-click links to source, weights, and hosted inference on Featherless where available.

local · updated 2026-05-24 · cloud · updated 2026-05-24

Cloud · Multi-Axial Comparison

Subjective curated scores. Click model chips to overlay up to four at once — the shape tells the story.

// axes

Reasoning — Math + multistep logic (GPQA, MathArena)
Coding — Real-codebase repair (SWE-bench, Aider)
Knowledge — Breadth + factuality (MMLU-Pro)
Writing — Creative + long-form (AidanBench, Arena-Hard)
Vision — Multimodal understanding (MMMU)
Tools — Agentic + function-calling (Tau-bench)
Speed — Inverse latency (tokens/s, TTFT)
Cost — Inverse $/Mtok (100 = cheapest)
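
The radar overlay is plain geometry: one spoke per axis, with the score setting the distance from the centre. A minimal sketch, assuming the axis order listed above, the first axis pointing straight up, and a clockwise layout. The function name and the sample scores are hypothetical and not taken from the site's code or data.

```python
import math

# Axis order as listed on the page.
AXES = ["Reasoning", "Coding", "Knowledge", "Writing",
        "Vision", "Tools", "Speed", "Cost"]


def radar_vertices(scores, radius=100.0):
    """Map one 0-100 score per axis to (x, y) polygon vertices.

    Assumption: the first axis points up and the rest follow clockwise.
    """
    n = len(AXES)
    points = []
    for i, axis in enumerate(AXES):
        r = radius * scores[axis] / 100.0
        angle = math.pi / 2 - 2 * math.pi * i / n  # start at 12 o'clock
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# Made-up scores, for illustration only.
example = {axis: 50 for axis in AXES}
example["Knowledge"] = 80
for axis, (x, y) in zip(AXES, radar_vertices(example)):
    print(f"{axis:<10} x={x:7.2f} y={y:7.2f}")
```

Overlaying several models means drawing one such polygon per model on the same spokes, which is why a lopsided profile is visible at a glance.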

Curated 0-100 scores per axis, anchored to public benchmarks (LMSys Arena, SWE-bench Verified, MMLU-Pro, GPQA, MathArena, AidanBench, Aider, Vellum leaderboard) and blended with field reports. Cost is inverse price (100 = cheapest) and speed is inverse latency (100 = fastest). Refreshed monthly or after a noteworthy model release.
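
The page does not say how price or latency is mapped to the inverse 0-100 scale. One plausible way to do it, sketched under assumptions: a log scale between a chosen best and worst value, clamped at both ends. The function name, the log scale, and the example prices are all hypothetical.

```python
import math


def inverse_score(value, best, worst):
    """Score a lower-is-better metric on 0-100; `best` (lowest) maps to 100.

    Assumptions for illustration: a log scale, so a 10x gap counts the
    same anywhere in the range, and clamping outside [best, worst].
    """
    if not (0 < best < worst):
        raise ValueError("expected 0 < best < worst")
    clamped = min(max(value, best), worst)
    span = math.log(worst) - math.log(best)
    fraction = (math.log(worst) - math.log(clamped)) / span
    return round(100 * fraction)


# Hypothetical prices in $/Mtok, not real vendor pricing.
print(inverse_score(0.1, best=0.1, worst=10.0))   # cheapest in range
print(inverse_score(1.0, best=0.1, worst=10.0))   # log-midpoint
print(inverse_score(10.0, best=0.1, worst=10.0))  # most expensive in range
```

The same function works for the speed axis if latency is passed in as the lower-is-better value.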

Claude Sonnet 4.7

Anthropic

The current default. Top-tier coding + tool use, 1M context.

vendor site ↗

Claude Opus 4.7

Anthropic

Anthropic's heavyweight. Deepest reasoning, premium tier.

vendor site ↗

Claude Haiku 4

Anthropic

Cheap + fast Anthropic. Surprisingly capable for the price.

vendor site ↗

GPT-5

OpenAI

OpenAI's frontier. Unified reasoning + agentic stack.

vendor site ↗

GPT-4o

OpenAI

Multimodal workhorse. Solid all-rounder, native audio + vision.

vendor site ↗

Gemini 2.5 Pro

Google

Google's flagship. 2M context, video native, integrated thinking.

vendor site ↗

Gemini 2.5 Flash

Google

Cheap Gemini. Astonishingly long context for the price.

vendor site ↗

Grok 4

xAI

xAI's frontier model. Real-time X data, large context, growing tool stack.

vendor site ↗

DeepSeek V3.1

DeepSeek

Open-weights frontier. 671B MoE, hosted API at roughly a tenth of closed-frontier pricing.

vendor site ↗

Mistral Large 2

Mistral AI

European frontier. EU-data-resident inference, strong multilingual.

vendor site ↗

Command R+

Cohere

RAG-first cloud model. Citation-aware, tool-use solid.

vendor site ↗

Local · Open Weights ≤ 500 GB

Filter by vendor or kind. Click any link to jump to source, weights, or a hosted runner on Featherless.
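
The footprint on each card matches simple arithmetic: parameter count times two bytes, i.e. 16-bit weights. A minimal sketch of that estimate, assuming decimal gigabytes and counting weights only (no KV cache, activations, or runtime overhead). The function names and the 4-bit example are illustrative, not part of the site.

```python
def footprint_gb(params_billions, bytes_per_param=2.0):
    """Approximate weight footprint in decimal GB.

    1e9 parameters x N bytes each = N GB, so the billions cancel out.
    For MoE models pass the TOTAL parameter count: every expert has to
    be resident even though only a few are active per token.
    """
    return params_billions * bytes_per_param


def fits_budget(params_billions, budget_gb=500.0, bytes_per_param=2.0):
    """True if the weights alone fit within the page's 500 GB cutoff."""
    return footprint_gb(params_billions, bytes_per_param) <= budget_gb


print(footprint_gb(236))                       # DeepSeek V2.5, 236B total
print(fits_budget(236))                        # inside the cutoff
print(fits_budget(405))                        # a 405B dense model is not
print(footprint_gb(236, bytes_per_param=0.5))  # same model at 4-bit
```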

DeepSeek V2.5

DeepSeek · 236B MoE (21B active) · 472 GB

general

MoE generalist + coder fused. Activates 21B per token.

DeepSeek License

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Mixtral 8x22B

Mistral AI · 141B MoE (39B active) · 282 GB

general

MoE workhorse. Apache 2.0, fast at scale.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

DBRX Instruct

Databricks · 132B MoE (36B active) · 264 GB

general

MoE built for the data-platform crowd. Long context, good code.

Databricks Open Model

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Mistral Large 2 (123B)

Mistral AI · 123B · 246 GB

general

Mistral's flagship dense weights. Strong multilingual + function calling.

Mistral Research

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Command R+ 104B

Cohere · 104B · 208 GB

general

RAG + tool-use specialist. Non-commercial weights.

CC-BY-NC 4.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Llama 3.2 90B Vision

Meta · 90B · 180 GB

vision

Multimodal Llama. Image + text reasoning at frontier scale.

Llama 3.2 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Qwen 2.5 72B Instruct

Alibaba · 72B · 144 GB

general

Top open-weights generalist on LMSys. Tool-use solid.

Qwen License

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Llama 3.3 70B Instruct

Meta · 70B · 140 GB

general

Meta's late-2024 70B refresh. Closes most of the 405B gap with a smaller footprint.

Llama 3.3 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Nemotron-70B Instruct

NVIDIA · 70B · 140 GB

general

Llama 3.1 70B finetune. Led Arena-Hard at release; its sibling reward model led RewardBench.

Llama 3.1 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Hermes 3 70B

Nous Research · 70B · 140 GB

general

Steerable Llama 3.1 finetune. Sharper persona + tool-use control.

Llama 3.1 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Yi-1.5 34B

01.AI · 34B · 68 GB

general

Apache-licensed bilingual model. Strong English + Chinese.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Qwen 2.5 32B Instruct

Alibaba · 32B · 64 GB

general

Two-card sweet spot. Apache-licensed Qwen sibling.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Qwen 2.5 Coder 32B

Alibaba · 32B · 64 GB

coding

Best open-weights coder at release. Competitive with GPT-4o on Aider and code-generation benchmarks.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

QwQ 32B Preview

Alibaba · 32B · 64 GB

reasoning

Open o1-style reasoning model. Long chain-of-thought, math heavy.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Gemma 2 27B

Google · 27B · 54 GB

general

Google's open weights. Sliding-window attention, efficient.

Gemma Terms

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Mistral Small 22B

Mistral AI · 22B · 44 GB

general

Single-GPU dense model. Strong for size.

Mistral Research

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

DeepSeek Coder V2 Lite

DeepSeek · 16B MoE (2.4B active) · 32 GB

coding

Punchy MoE coder, single-GPU. 128K context, FIM training.

DeepSeek License

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Phi-4 14B

Microsoft · 14B · 28 GB

reasoning

Synthetic-data trained. Math + reasoning above its weight class.

MIT

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

OLMo 2 13B

AI2 · 13B · 26 GB

general

Fully open: weights + data + training code. The open-science benchmark.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Llama 3.2 11B Vision

Meta · 11B · 22 GB

vision

Compact vision-language model. Single-GPU territory.

Llama 3.2 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Gemma 2 9B

Google · 9B · 18 GB

general

Small but mighty. Beats Llama 3 8B on most benchmarks.

Gemma Terms

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Llama 3.1 8B Instruct

Meta · 8B · 16 GB

general

The workhorse small model. Fits on a single 24 GB card.

Llama 3.1 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Granite 3 8B

IBM · 8B · 16 GB

general

IBM's enterprise-trained open model. Code + instruction tuned.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

Llama 3.2 3B

Meta · 3B · 6 GB

small

Edge-deployment Llama. Pruned from Llama 3.1 8B, distilled from the 8B and 70B.

Llama 3.2 Community

GitHub ↗ 🤗 HF ↗ ▶ Featherless ↗

SmolLM2 1.7B

Hugging Face · 1.7B · 3.4 GB

small

On-device class. Runs on a Raspberry Pi 5 with room to spare.

Apache 2.0

GitHub ↗ 🤗 HF ↗ ▶ Featherless n/a

Numbers are curated estimates — not a leaderboard. The shape of the radar matters more than any single number. Refreshed monthly from public/data/ai-models-cloud.json and public/data/ai-models-local.json.
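
The schema of those two JSON files is not shown on this page. As a rough sketch of how a local-model record might be loaded and filtered by vendor or kind, assuming a flat list of objects: the field names, class name, and functions below are hypothetical, and the sample values are copied from the cards above.

```python
import json
from dataclasses import dataclass


@dataclass
class LocalModel:
    # Hypothetical field names; the real JSON layout may differ.
    name: str
    vendor: str
    kind: str
    size_gb: float
    license: str


def load_models(raw_json):
    """Parse a JSON array of objects into LocalModel records."""
    return [LocalModel(**entry) for entry in json.loads(raw_json)]


def filter_models(models, vendor=None, kind=None):
    """Keep models matching every filter given; None means no filter."""
    return [
        m for m in models
        if (vendor is None or m.vendor == vendor)
        and (kind is None or m.kind == kind)
    ]


SAMPLE = """[
  {"name": "Qwen 2.5 Coder 32B", "vendor": "Alibaba", "kind": "coding",
   "size_gb": 64, "license": "Apache 2.0"},
  {"name": "Phi-4 14B", "vendor": "Microsoft", "kind": "reasoning",
   "size_gb": 28, "license": "MIT"},
  {"name": "QwQ 32B Preview", "vendor": "Alibaba", "kind": "reasoning",
   "size_gb": 64, "license": "Apache 2.0"}
]"""

models = load_models(SAMPLE)
for m in filter_models(models, kind="reasoning"):
    print(m.name, m.vendor, m.size_gb)
```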
