Nvidia’s New AI Model: Toggle Your Way to Sanity

Nvidia just dropped Nemotron-Nano-9B-V2, a “small” language model that lets you turn reasoning on and off like a budget-conscious light switch. Because who doesn’t want to decide whether their AI should think before it speaks?
This 9-billion-parameter hybrid of Transformer and Mamba layers fits on a single A10 GPU and, per Nvidia, delivers up to six times the throughput of comparable models. It aced benchmarks like MATH500 (97.8%, no big deal), lets you toggle reasoning with /think and /no_think commands, and even lets you cap how many tokens it spends “thinking.” Because nothing says “trustworthy AI” like telling it when to shut up.
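For the curious, the toggle really is that blunt: per Nvidia’s model card, you drop /think or /no_think into the system prompt. Here’s a minimal sketch of what wiring that up might look like; the `build_system_prompt` helper and the message layout are our own invention, not an official API.

```python
# Hypothetical sketch of toggling Nemotron-Nano-9B-V2's reasoning mode.
# The /think and /no_think control strings are from Nvidia's model card;
# build_system_prompt is an illustrative helper, not part of any SDK.

def build_system_prompt(instructions: str, reasoning: bool) -> str:
    """Prepend the reasoning toggle to the system instructions."""
    toggle = "/think" if reasoning else "/no_think"
    return f"{toggle} {instructions}"

# OpenAI-style chat messages you would hand to whatever server hosts the model.
messages = [
    {"role": "system",
     "content": build_system_prompt("You are a terse assistant.", reasoning=False)},
    {"role": "user", "content": "What is 2 + 2?"},
]

print(messages[0]["content"])  # → /no_think You are a terse assistant.
```

Flip `reasoning=True` and the model is allowed to mull things over first; leave it off and you get the answer straight, no inner monologue billed to your GPU.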
Trained on synthetic data and licensed under Nvidia’s own “Open Model License”—which basically says “use it, but don’t make us look bad”—this thing is aimed squarely at devs who want performance without the power bill of a small nation.
Is it brilliant or just another cleverly packaged GPU sales pitch? Either way, it’s refreshing to see an AI that knows when to stop overthinking. Maybe we could all use that feature.