## How it works
### 1. Analyze your intent
A fast model (~200ms) reads your message and figures out: What’s the goal? How complex? Which tools are needed? Should this be a background task?

Simple messages like “hi” or “what time is it?” skip this step entirely — zero added latency.
### 2. Pick the right strategy
Based on complexity, NIOM chooses how much effort to invest:
| Complexity | Example | What NIOM does |
|---|---|---|
| Simple | “What’s my CPU usage?” | Quick response, 3 tool calls max |
| Standard | “List my project structure and explain it” | Executes with light evaluation, up to 10 steps |
| Complex | “Refactor this module into smaller files” | Full pipeline with quality evaluation loop, up to 25 steps |
| Long-running | “Write a summary every 2 days” | Creates a background task with scheduling |
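The table above could reduce to a simple lookup. The field names and budgets here mirror the table, but the shape of the structure is an assumption about NIOM’s internals:

```python
# Effort budgets per complexity level (mirrors the strategy table above).
# Keys and field names are illustrative, not NIOM's real config schema.
STRATEGIES = {
    "simple":       {"max_steps": 3,  "evaluate": None,    "background": False},
    "standard":     {"max_steps": 10, "evaluate": "light", "background": False},
    "complex":      {"max_steps": 25, "evaluate": "full",  "background": False},
    "long-running": {"max_steps": 25, "evaluate": "full",  "background": True},
}

def pick_strategy(complexity: str) -> dict:
    """Choose how much effort to invest based on detected complexity."""
    return STRATEGIES[complexity]
```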
### 3. Execute with tools
The agent gets to work — reading files, running commands, searching the web, taking screenshots, calling MCP tools — whatever your request needs.
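A registry plus a dispatch function is one plausible shape for this step; the decorator, tool names, and error convention below are illustrative, not NIOM’s real tool interface:

```python
from typing import Callable

# Hypothetical tool registry: each tool is a named callable returning a string.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function as a named tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def dispatch(name: str, **kwargs) -> str:
    """Route an agent's tool call to the registered implementation."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**kwargs)
```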
## Smart model routing
NIOM doesn’t burn expensive models on cheap tasks. It uses the right model for each phase:

| Role | What it does | Default | Why this model |
|---|---|---|---|
| Fast | Analyzes intent, evaluates quality | Groq Llama 3.3 70B | Sub-second, nearly free |
| Capable | Does the actual work | Your selected model | The workhorse — Claude, GPT-4o, etc. |
| Vision | Understands screenshots | GPT-4o or Claude | Needed for computer use tasks |
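The routing table could come down to a small lookup keyed by phase. The model identifiers below come from the table’s defaults, but the function itself is a hypothetical sketch:

```python
# Per-phase model routes (defaults from the table above; mapping is assumed).
# "user-selected" is a placeholder resolved to whatever model the user picked.
ROUTES = {
    "fast":    "groq/llama-3.3-70b",  # intent analysis, quality evaluation
    "capable": "user-selected",       # the workhorse doing the actual work
    "vision":  "gpt-4o",              # screenshot understanding
}

def model_for(phase: str, user_model: str = "claude-sonnet") -> str:
    """Return the model identifier to use for a given phase."""
    route = ROUTES.get(phase, "user-selected")
    return user_model if route == "user-selected" else route
```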
## Why this matters
| What most AI agents do | What NIOM does |
|---|---|
| LLM decides when to stop (usually too early) | Quality criteria defined upfront, checked after |
| No quality assessment | Explicit evaluation with refinement loops |
| Same effort for “hi” and “refactor my codebase” | Adaptive depth based on detected complexity |
| Can’t loop back to improve | Execute → evaluate → refine → re-execute |
| One model for everything | Right model for each phase = faster + cheaper |
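The execute → evaluate → refine → re-execute loop from the right-hand column can be sketched as follows; the round cap and the way feedback is folded back into the prompt are assumptions, not NIOM’s actual control flow:

```python
from typing import Callable

def run_with_refinement(
    execute: Callable[[str], str],
    evaluate: Callable[[str], bool],  # True when quality criteria are met
    task: str,
    max_rounds: int = 3,
) -> str:
    """Execute, check against upfront quality criteria, refine if needed."""
    result = execute(task)
    for _ in range(max_rounds - 1):
        if evaluate(result):
            break  # criteria met: stop here instead of letting the LLM decide
        # Fold the shortfall back into the prompt and re-execute
        result = execute(f"{task}\nPrevious attempt:\n{result}\nImprove it.")
    return result
```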