NVIDIA

Nemotron 3 Nano Omni

Name: NVIDIA Nemotron 3 Nano Omni
Author: NVIDIA

2026-04-28

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to serve as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, so agents can perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning than separate vision and speech pipelines. It supports a context length of up to 300K and a reasoning budget of 16,384, with extended thinking enabled via reasoning.

Vision, Reasoning | Proprietary Model

Knowledge Cutoff

Unknown

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

256K in, 66K out

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.

Cost/1M Tokens

—

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
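As a rough illustration of how per-million-token pricing turns into a per-request cost, the sketch below multiplies token counts by a price per 1 million tokens. The function name, the separate input and output prices, and the example figures are all hypothetical; this page lists no price for the model.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_million: float,
                     output_price_per_million: float) -> float:
    """Return the USD cost of one request under per-million-token pricing."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices (USD per 1M tokens), for illustration only.
example_cost = request_cost_usd(
    input_tokens=200_000,
    output_tokens=4_000,
    input_price_per_million=0.50,
    output_price_per_million=2.00,
)
print(f"Estimated cost: ${example_cost:.3f}")
```

Splitting input and output prices is an assumption made here because many providers bill them at different rates; with a single flat rate, pass the same price for both.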

AI Performance Evaluation

Overall
AA Intelligence Index: 21% (↓18%)

Reasoning & Math
GPQA Diamond: 47% (↓35%)
HLE: 5.3% (↓12%)

Coding
AA Coding Index: 15% (↓22%)
TAU2: 45% (↓35%)
TerminalBench: 8.3% (↓26%)
SciCode: 28% (↓14%)

Language & Instructions
IFBench: 63% (↑0%)
AA-LCR: 36% (↓26%)

Output Speed
Standard Mode: 312 tok/s (↑235)
First Output: 6.96 s

Source: Artificial Analysis