Computer Use

Most AI tools are limited to reading files and running terminal commands. NIOM goes further — it can see your screen, click buttons, type text, scroll through apps, and interact with any GUI application. This is what separates a file tool from a genuine computer agent.

What NIOM can do

Tool	What it does
`screenshot`	Captures your full screen (or a region) and sends it to a vision model for analysis
`mouseClick`	Left, right, or double-click at specific screen coordinates
`mouseMove`	Moves the cursor (for triggering hover effects, tooltips, etc.)
`typeText`	Types text at the current cursor position
`pressKey`	Keyboard shortcuts: `Ctrl+C`, `Alt+Tab`, `Ctrl+Shift+S`, and more
`scroll`	Scrolls up or down at a specific location
`getActiveWindow`	Gets the title, process name, position, and size of the active window
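The `pressKey` tool takes shortcuts written as `Ctrl+C` or `Ctrl+Shift+S`. Here is a minimal sketch of how such a string could be normalized before it is dispatched. The function name `parse_shortcut` and the modifier set are hypothetical and are not NIOM's API.

```python
# Hypothetical sketch: normalize a shortcut string such as "Ctrl+Shift+S".
# The modifier names and the return format are assumptions, not NIOM's API.
MODIFIERS = {"ctrl", "alt", "shift", "cmd", "win"}


def parse_shortcut(combo: str) -> tuple[list[str], str]:
    """Split 'Ctrl+Shift+S' into (['ctrl', 'shift'], 's')."""
    # Lowercase each part and drop empty pieces (e.g. from stray '+' signs).
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty shortcut")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    if key in MODIFIERS:
        raise ValueError("shortcut must end with a non-modifier key")
    # Sort modifiers so 'Shift+Ctrl+S' and 'Ctrl+Shift+S' normalize identically.
    return sorted(mods), key
```

For example, `parse_shortcut("Ctrl+Shift+S")` returns `(['ctrl', 'shift'], 's')`. Joining the parts with `+` produces a form such as `ctrl+shift+s`, which is the syntax `xdotool key` accepts on Linux.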

How it works

The process is simple and visual:

1. Screenshot → NIOM captures what’s on your screen.
2. Understand → A vision model (GPT-4o or Claude) analyzes the image, reading text, identifying buttons, and understanding the layout.
3. Plan → The agent decides what to click, what to type, or where to navigate.
4. Act → Mouse and keyboard actions are performed via native OS APIs.
5. Verify → Another screenshot confirms the action succeeded.
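The five steps form a loop: each verification screenshot becomes the input to the next planning step. The sketch below captures that loop under stated assumptions. The screenshot function, the vision-model planner, and the action executor are passed in as plain callables, and `Action`, `run_task`, and `max_steps` are hypothetical names rather than NIOM internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    tool: str   # e.g. "mouseClick" or "typeText" (hypothetical representation)
    args: dict


def run_task(
    capture: Callable[[], bytes],
    plan: Callable[[bytes], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> bool:
    """Screenshot -> understand/plan -> act -> verify, until the planner is done.

    `plan` stands in for the vision model: it sees the latest screenshot and
    returns the next action, or None when the goal looks complete.
    """
    screen = capture()              # 1. Screenshot
    for _ in range(max_steps):
        action = plan(screen)       # 2-3. Understand + Plan
        if action is None:
            return True             # planner judges the goal reached
        act(action)                 # 4. Act
        screen = capture()          # 5. Verify: the next plan() call sees the new state
    return False                    # gave up after max_steps actions
```

Capping the number of steps is one simple way to stop a loop that never converges, for example when a click keeps missing its target.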

Works on every platform

Platform	Screenshot	Mouse/Keyboard	How
Windows	✅	✅	PowerShell + .NET, `user32.dll`, `SendKeys`
macOS	✅	✅	`screencapture`, `cliclick`, `osascript`
Linux	✅	✅	ImageMagick `import`, `xdotool`
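To make the table concrete, here is a minimal sketch of how the screenshot and click tools could map to the macOS and Linux utilities it lists. The argument forms (`screencapture -x`, `import -window root`, cliclick's `c:`/`rc:`/`dc:` commands, `xdotool mousemove X Y click N`) are standard usage of those tools. The function names are hypothetical, and NIOM's actual invocations may differ. Windows is omitted because its PowerShell/.NET path does not reduce to a single command.

```python
# Hypothetical command builders; NIOM's real invocations may differ.


def screenshot_cmd(platform: str, path: str) -> list[str]:
    """Full-screen capture command (macOS and Linux only in this sketch)."""
    if platform == "darwin":
        return ["screencapture", "-x", path]        # -x: no shutter sound
    if platform.startswith("linux"):
        return ["import", "-window", "root", path]  # ImageMagick, needs an X11 display
    raise NotImplementedError(platform)


def click_cmd(platform: str, x: int, y: int, button: str = "left") -> list[str]:
    """Click command for a left, right, or double click at (x, y)."""
    if button not in ("left", "right", "double"):
        raise ValueError(f"unsupported button: {button}")
    if platform == "darwin":
        # cliclick: c = click, rc = right click, dc = double click
        prefix = {"left": "c", "right": "rc", "double": "dc"}[button]
        return ["cliclick", f"{prefix}:{x},{y}"]
    if platform.startswith("linux"):
        move = ["xdotool", "mousemove", str(x), str(y)]
        if button == "double":
            return move + ["click", "--repeat", "2", "1"]  # button 1 twice
        return move + ["click", "1" if button == "left" else "3"]  # 3 = right button
    raise NotImplementedError(platform)
```

A caller would pass `sys.platform` as the platform and run the result with `subprocess.run(click_cmd(sys.platform, 100, 200), check=True)`. Building argument lists instead of shell strings avoids quoting problems.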

Example scenarios

Navigate a settings page

You say: “Go to browser settings and enable dark mode”
NIOM: Takes a screenshot → identifies the settings icon → clicks it → finds the dark mode toggle → clicks it → takes another screenshot to verify.

Read a dashboard

You say: “Take a screenshot of my analytics dashboard and summarize the key metrics”
NIOM: Captures the screen → the vision model reads all text, charts, and numbers → produces a clean summary with the key figures.

Automate a repetitive GUI task

You say: “Open each PDF in my downloads folder and rename it based on its content”
NIOM: Combines file tools (to list the PDFs) with computer use (to open each one, read the content via screenshot, then rename the file).

Fill out a form

You say: “Fill out this registration form with my details”
NIOM: Screenshots the form → identifies each field → clicks into each one → types the appropriate information → clicks Submit.
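The form scenario reduces to a sequence of `mouseClick` and `typeText` calls. The sketch below assumes the vision model has already returned each field's label and on-screen center. The `Field` type, `plan_form_fill`, and the `(tool, args)` tuple format are hypothetical illustrations, not NIOM's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Field:
    label: str  # text the vision model read next to the input
    x: int      # center of the input box, in screen pixels
    y: int


def plan_form_fill(fields: list[Field], details: dict[str, str],
                   submit: tuple[int, int]) -> list[tuple[str, dict]]:
    """Turn detected fields plus the user's details into click/type tool calls."""
    steps: list[tuple[str, dict]] = []
    for f in fields:
        value = details.get(f.label)
        if value is None:
            continue  # no data for this field, so leave it untouched
        steps.append(("mouseClick", {"x": f.x, "y": f.y, "button": "left"}))
        steps.append(("typeText", {"text": value}))
    # Click Submit last; per the safety note below, this goes through approval.
    steps.append(("mouseClick", {"x": submit[0], "y": submit[1], "button": "left"}))
    return steps
```

Skipping fields with no matching detail keeps the agent from typing guesses into inputs the user never supplied.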

Screenshot management

Screenshots are handled automatically:

Saved to ~/.niom/screenshots/
Only the last 5 are kept (auto-cleanup)
Encoded as base64 for vision model analysis
No screenshots are sent anywhere except your chosen AI provider
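The retention and encoding rules above can be sketched in a few lines. This assumes screenshots are PNG files in a single directory (for NIOM, `Path.home() / ".niom" / "screenshots"`). The function name `store_screenshot` is hypothetical, and the pruning order (newest by modification time, ties broken by filename) is an assumption rather than documented behavior.

```python
import base64
from pathlib import Path

KEEP_LAST = 5  # the docs state that only the last 5 screenshots are kept


def store_screenshot(png: bytes, directory: Path, name: str) -> str:
    """Save a screenshot, prune older ones, and return it base64-encoded."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / name).write_bytes(png)
    # Newest first by modification time (filename breaks ties), then
    # delete everything beyond the KEEP_LAST most recent files.
    shots = sorted(
        directory.glob("*.png"),
        key=lambda p: (p.stat().st_mtime_ns, p.name),
        reverse=True,
    )
    for old in shots[KEEP_LAST:]:
        old.unlink()
    # Vision APIs typically accept images as base64 text inside a JSON request.
    return base64.b64encode(png).decode("ascii")
```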

Safety first. Destructive GUI actions (like clicking “Delete” or “Submit” buttons) always trigger NIOM’s approval flow — you’ll be asked before any risky action is performed.
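One way to picture the approval flow is as a gate in front of the executor. This is a hypothetical sketch: NIOM's real risk rules are not documented here, so it only flags clicks whose target label contains the two examples the docs mention ("Delete" and "Submit"). `needs_approval`, `perform`, and the callback signatures are illustrative names.

```python
from typing import Callable, Optional

# Hypothetical rule set: only the two examples given in the docs.
RISKY_LABELS = {"delete", "submit"}


def needs_approval(tool: str, target_label: Optional[str]) -> bool:
    """Return True if a GUI action should pause for user confirmation."""
    if tool != "mouseClick" or not target_label:
        return False
    # Match whole words so a label like "Undelete" does not trigger approval.
    words = (w.strip(".,!?:") for w in target_label.lower().split())
    return any(w in RISKY_LABELS for w in words)


def perform(tool: str, args: dict, target_label: Optional[str],
            ask_user: Callable[[str], bool],
            execute: Callable[[str, dict], None]) -> str:
    """Run an action, asking the user first when it looks destructive."""
    if needs_approval(tool, target_label) and not ask_user(f"{tool} on '{target_label}'?"):
        return "skipped"
    execute(tool, args)
    return "done"
```

A production gate would also need to cover indirect triggers, such as pressing Enter while a Submit button has focus, which this sketch does not check.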
