## How it works
### 1. Analyze your intent
A fast model (~200ms) reads your message and figures out: What’s the goal? How complex? Which tools are needed? Should this be a background task?

Simple messages like “hi” or “what time is it?” skip this step entirely — zero added latency.
### 2. Pick the right strategy
Based on complexity, NIOM chooses how much effort to invest:
| Complexity | Example | What NIOM does |
|---|---|---|
| Simple | “What’s my CPU usage?” | Quick response, 3 tool calls max |
| Standard | “List my project structure and explain it” | Executes with light evaluation, up to 10 steps |
| Complex | “Refactor this module into smaller files” | Full pipeline with quality evaluation loop, up to 25 steps |
| Long-running | “Write a summary every 2 days” | Creates a background task with scheduling |
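The table above could reduce to a simple lookup. The field names and budgets here mirror the table, but the shape of the structure is an assumption about NIOM’s internals:

```python
# Effort budgets per complexity level (mirrors the strategy table above).
# Keys and field names are illustrative, not NIOM's real config schema.
STRATEGIES = {
    "simple":       {"max_steps": 3,  "evaluate": None,    "background": False},
    "standard":     {"max_steps": 10, "evaluate": "light", "background": False},
    "complex":      {"max_steps": 25, "evaluate": "full",  "background": False},
    "long-running": {"max_steps": 25, "evaluate": "full",  "background": True},
}

def pick_strategy(complexity: str) -> dict:
    """Choose how much effort to invest based on detected complexity."""
    return STRATEGIES[complexity]
```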
### 3. Execute with tools
The agent gets to work — reading files, running commands, searching the web, taking screenshots, calling MCP tools — whatever your request needs.
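A registry plus a dispatch function is one plausible shape for this step; the decorator, tool names, and error convention below are illustrative, not NIOM’s real tool interface:

```python
from typing import Callable

# Hypothetical tool registry: each tool is a named callable returning a string.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function as a named tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def dispatch(name: str, **kwargs) -> str:
    """Route an agent's tool call to the registered implementation."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**kwargs)
```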
## Smart model routing
NIOM doesn’t burn expensive models on cheap tasks. It uses the right model for each phase:

| Role | What it does | Default | Why this model |
|---|---|---|---|
| Fast | Analyzes intent, evaluates quality | Groq Llama 3.3 70B | Sub-second, nearly free |
| Capable | Does the actual work | Your selected model | The workhorse — Claude, GPT-4o, etc. |
| Vision | Understands screenshots | GPT-4o or Claude | Needed for computer use tasks |
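The routing table could come down to a small lookup keyed by phase. The model identifiers below come from the table’s defaults, but the function itself is a hypothetical sketch:

```python
# Per-phase model routes (defaults from the table above; mapping is assumed).
# "user-selected" is a placeholder resolved to whatever model the user picked.
ROUTES = {
    "fast":    "groq/llama-3.3-70b",  # intent analysis, quality evaluation
    "capable": "user-selected",       # the workhorse doing the actual work
    "vision":  "gpt-4o",              # screenshot understanding
}

def model_for(phase: str, user_model: str = "claude-sonnet") -> str:
    """Return the model identifier to use for a given phase."""
    route = ROUTES.get(phase, "user-selected")
    return user_model if route == "user-selected" else route
```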
## Why this matters
| What most AI agents do | What NIOM does |
|---|---|
| LLM decides when to stop (usually too early) | Quality criteria defined upfront, checked after |
| No quality assessment | Explicit evaluation with refinement loops |
| Same effort for “hi” and “refactor my codebase” | Adaptive depth based on detected complexity |
| Can’t loop back to improve | Execute → evaluate → refine → re-execute |
| One model for everything | Right model for each phase = faster + cheaper |
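The execute → evaluate → refine → re-execute loop from the right-hand column can be sketched as follows; the round cap and the way feedback is folded back into the prompt are assumptions, not NIOM’s actual control flow:

```python
from typing import Callable

def run_with_refinement(
    execute: Callable[[str], str],
    evaluate: Callable[[str], bool],  # True when quality criteria are met
    task: str,
    max_rounds: int = 3,
) -> str:
    """Execute, check against upfront quality criteria, refine if needed."""
    result = execute(task)
    for _ in range(max_rounds - 1):
        if evaluate(result):
            break  # criteria met: stop here instead of letting the LLM decide
        # Fold the shortfall back into the prompt and re-execute
        result = execute(f"{task}\nPrevious attempt:\n{result}\nImprove it.")
    return result
```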