Current

Onyx AI Open LLM Leaderboard

A curated benchmarking interface for open-weight models across coding, reasoning, and engineering tasks.

Signal

Best Open Source LLM Leaderboard 2026 | Open Source Model Rankings and Tier List | Onyx AI · Brave · 2026-03-12

Compare the best open source models and LLMs on coding, reasoning, math, and software engineering benchmarks. Tier list, benchmark scores, and head-to-head comparisons.

Context

Evaluation infrastructure is shifting from static, paper-based benchmarks to dynamic, task-specific leaderboards. As open-weight models proliferate, operators need standardized metrics to select models for specific workflows without relying on vendor marketing claims. This signal represents a consolidation of performance data into a single accessible interface.

Relevance

Leaderboards function as operational literacy tools, reducing the cognitive load required to navigate the open-weights ecosystem. They provide a baseline for comparing model capabilities across different architectures and training regimes. This aligns with the Openflows principle of treating AI selection as a technical infrastructure decision rather than a consumer choice.

Current State

The Onyx AI interface aggregates scores across multiple domains, including coding, reasoning, and mathematics. It uses tier lists to group models into performance brackets, so operators can quickly shortlist candidates for a given task. The dashboard supports head-to-head comparisons, allowing operators to weigh trade-offs between model size, speed, and accuracy.
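
The tiering and head-to-head ideas described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model names, the scores, the unweighted mean, and the tier cutoffs are hypothetical and are not drawn from Onyx AI, whose actual scoring method is not visible in the signal.

```python
from statistics import mean

# Hypothetical per-domain scores on a 0-100 scale; not taken from the leaderboard.
SCORES = {
    "model-a": {"coding": 82.0, "reasoning": 78.0, "math": 74.0},
    "model-b": {"coding": 64.0, "reasoning": 70.0, "math": 58.0},
    "model-c": {"coding": 41.0, "reasoning": 52.0, "math": 39.0},
}

# Assumed tier cutoffs (minimum aggregate score), highest tier first.
TIER_CUTOFFS = [("S", 75.0), ("A", 60.0), ("B", 45.0)]


def aggregate(domain_scores):
    """Unweighted mean across domains; a real leaderboard may weight them."""
    return mean(domain_scores.values())


def assign_tier(score):
    """Return the first tier whose cutoff the score meets, else 'C'."""
    for tier, cutoff in TIER_CUTOFFS:
        if score >= cutoff:
            return tier
    return "C"


def build_tier_list(scores):
    """Map each model name to a (aggregate score, tier) pair."""
    result = {}
    for name, domain_scores in scores.items():
        total = aggregate(domain_scores)
        result[name] = (total, assign_tier(total))
    return result


def head_to_head(scores, left, right):
    """Per-domain score difference (left minus right) over shared domains."""
    shared = scores[left].keys() & scores[right].keys()
    return {d: scores[left][d] - scores[right][d] for d in sorted(shared)}


if __name__ == "__main__":
    for name, (total, tier) in build_tier_list(SCORES).items():
        print(f"{name}: {total:.1f} -> tier {tier}")
    print(head_to_head(SCORES, "model-a", "model-b"))
```

A fixed-cutoff scheme like this is easy to audit but sensitive to where the cutoffs sit; a model scoring 44.0 lands a full tier below one scoring 45.0, which is one reason the underlying scores matter more than the tier label.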

Open Questions

Methodology transparency remains a critical constraint: the summary signal does not show which datasets and evaluation protocols produce the scores. Update frequency, and the lag before new model releases appear, determine how current the data is. It is also unclear whether the rankings favor models that have overfit to well-known benchmarks.

Connections

The entry connects to the Chinese Open-Source Model Infrastructure circuit, as regional performance tiers often diverge on specific benchmarks. It supports the Open Weights Commons circuit by providing a mechanism for circulating evaluation data. Finally, it serves as an input layer for local inference tools like LM Studio, where ranking data informs model selection for deployment.

Mediation note

Tooling: OpenRouter / qwen/qwen3.5-flash-02-23

Use: drafted entry from external signal, assessed linkage against existing knowledge base

Human role: review, edit, and approve before publication

Limits: signal content may be incomplete; verify primary sources before publishing