Claude Sonnet 4.7
AnthropicThe current default. Top-tier coding + tool use, 1M context.
vendor site ↗// madcool · lab · deep research
Two views of the field. Below the fold: 25 open-weights models small enough to host yourself (full-precision footprint under 500 GB) with one-click links to source, weights, and a hosted-inference button on Featherless where available. Then: a regularly-updated multi-axial comparison of 11 cloud models scored across eight capability axes.
Subjective curated scores. Click model chips to overlay up to four at once — the shape tells the story.
The current default. Top-tier coding + tool use, 1M context.
vendor site ↗Anthropic's heavyweight. Deepest reasoning, premium tier.
vendor site ↗Cheap + fast Anthropic. Surprisingly capable for the price.
vendor site ↗OpenAI's frontier. Unified reasoning + agentic stack.
vendor site ↗Multimodal workhorse. Solid all-rounder, native audio + vision.
vendor site ↗Google's flagship. 2M context, video native, integrated thinking.
vendor site ↗Cheap Gemini. Astonishingly long context for the price.
vendor site ↗xAI's frontier model. Real-time X data, large context, growing tool stack.
vendor site ↗Open-weights frontier. 671B MoE, hosted API at 1/10th the price.
vendor site ↗European frontier. EU-data-resident inference, strong multilingual.
vendor site ↗RAG-first cloud model. Citation-aware, tool-use solid.
vendor site ↗Filter by vendor or kind. Click any link to jump to source, weights, or a hosted runner on Featherless.
MoE generalist + coder fused. Activates 21B per token.
MoE workhorse. Apache 2.0, fast at scale.
MoE built for the data-platform crowd. Long context, good code.
Mistral's flagship dense weights. Strong multilingual + function calling.
RAG + tool-use specialist. Non-commercial weights.
Multimodal Llama. Image + text reasoning at frontier scale.
Top open-weights generalist on LMSys. Tool-use solid.
Meta's 2025 70B refresh. Closes most of the 405B gap with smaller footprint.
Llama 3.1 70B finetune. RewardBench leader at release.
Steerable Llama 3 finetune. Sharper persona + tool-use control.
Apache-licensed bilingual model. Strong English + Chinese.
Two-card sweet spot. Apache-licensed Qwen sibling.
Best open-weights coder. Competitive with Sonnet on SWE-bench Verified.
Open o1-style reasoning model. Long chain-of-thought, math heavy.
Google's open weights. Sliding-window attention, efficient.
Single-GPU dense model. Strong for size.
Punchy MoE coder, single-GPU. 128K context, FIM training.
Synthetic-data trained. Math + reasoning above its weight class.
Fully open: weights + data + training code. The open-science benchmark.
Compact vision-language model. Single-GPU territory.
Small but mighty. Beats Llama 3 8B on most benchmarks.
The workhorse small model. Fits on a single 24GB card.
IBM's enterprise-trained open model. Code + instruction tuned.
Edge-deployment Llama. Pruned + distilled from 8B.
On-device class. Runs on a Raspberry Pi 5 with room to spare.
Numbers are curated estimates — not a leaderboard. The shape of the
radar matters more than any single number. Refreshed monthly from
public/data/ai-models-cloud.json and
public/data/ai-models-local.json.