Most AI tools are limited to reading files and running terminal commands. NIOM goes further — it can see your screen, click buttons, type text, scroll through apps, and interact with any GUI application. This is what separates a file tool from a genuine computer agent.

What NIOM can do

Tool | What it does
screenshot | Captures your full screen (or a region) and sends it to a vision model for analysis
mouseClick | Left, right, or double-click at specific screen coordinates
mouseMove | Moves the cursor (for triggering hover effects, tooltips, etc.)
typeText | Types text at the current cursor position
pressKey | Keyboard shortcuts: Ctrl+C, Alt+Tab, Ctrl+Shift+S, and more
scroll | Scrolls up or down at a specific location
getActiveWindow | Gets the title, process name, position, and size of the active window
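
As a rough illustration of how an agent might represent calls to the tools listed above, the sketch below models a tool call as a name plus arguments and rejects unknown tool names. The tool names come from this page; the `ToolCall` class and the argument names (`x`, `y`, `button`, `text`, `keys`) are assumptions made for illustration, not NIOM's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Tool names as listed on this page.
TOOL_NAMES = frozenset({
    "screenshot", "mouseClick", "mouseMove", "typeText",
    "pressKey", "scroll", "getActiveWindow",
})


@dataclass
class ToolCall:
    """A hypothetical record of one tool invocation."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail early if the model asks for a tool that does not exist.
        if self.name not in TOOL_NAMES:
            raise ValueError(f"unknown tool: {self.name!r}")


# Argument names here are illustrative assumptions.
calls = [
    ToolCall("mouseClick", {"x": 640, "y": 360, "button": "left"}),
    ToolCall("typeText", {"text": "hello"}),
    ToolCall("pressKey", {"keys": "Ctrl+S"}),
]
```

Validating the tool name at construction time means a malformed plan is rejected before any mouse or keyboard action runs.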

How it works

The process is simple and visual:
  1. Screenshot → NIOM captures what’s on your screen
  2. Understand → A vision model (GPT-4o or Claude) analyzes the image — reading text, identifying buttons, understanding layout
  3. Plan → The agent decides what to click, type, or navigate
  4. Act → Mouse and keyboard actions are performed via native OS APIs
  5. Verify → Another screenshot confirms the action succeeded
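
The five steps above can be sketched as a small loop. This is a minimal outline under stated assumptions, not NIOM's implementation: `run_agent_loop`, `Action`, and the five callback parameters are hypothetical names, and the vision model and input layer are passed in as plain functions so the control flow can be read (and tested) in isolation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """A hypothetical planned action, e.g. tool='mouseClick'."""
    tool: str
    args: dict


def run_agent_loop(
    capture: Callable[[], bytes],             # step 1: take a screenshot
    analyze: Callable[[bytes], str],          # step 2: vision model describes it
    plan: Callable[[str], Optional[Action]],  # step 3: next action, or None when done
    act: Callable[[Action], None],            # step 4: perform mouse/keyboard input
    verify: Callable[[Action, str], bool],    # step 5: check the new screen state
    max_steps: int = 10,
) -> int:
    """Run the screenshot/understand/plan/act/verify cycle.

    Returns the number of actions performed; raises if verification fails.
    """
    performed = 0
    for _ in range(max_steps):
        state = analyze(capture())
        action = plan(state)
        if action is None:
            break
        act(action)
        performed += 1
        # A second screenshot confirms the action had the intended effect.
        if not verify(action, analyze(capture())):
            raise RuntimeError(f"action did not succeed: {action}")
    return performed


# Stubbed usage: the "screen" is a string and the goal is a single click.
screen = {"text": "button: Save"}
log = []


def fake_act(action: Action) -> None:
    log.append(action.tool)
    screen["text"] = "saved"


steps = run_agent_loop(
    capture=lambda: screen["text"].encode(),
    analyze=lambda image: image.decode(),
    plan=lambda state: None if state == "saved" else Action("mouseClick", {"x": 10, "y": 20}),
    act=fake_act,
    verify=lambda action, state: state == "saved",
)
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused planner from clicking indefinitely.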

Works on every platform

Platform | Screenshot | Mouse/Keyboard | How
Windows | Yes | Yes | PowerShell + .NET, user32.dll, SendKeys
macOS | Yes | Yes | screencapture, cliclick, osascript
Linux | Yes | Yes | ImageMagick import, xdotool
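
To make the per-platform split concrete, here is a hedged sketch of how a dispatcher could choose a command line for each operating system. The function names are hypothetical and the exact flags NIOM passes are not documented on this page; the commands shown are common invocations of the tools named above. The functions only build argument lists, so nothing is executed until a caller hands the result to something like `subprocess.run`.

```python
import platform
from typing import List, Optional


def screenshot_command(path: str, system: Optional[str] = None) -> List[str]:
    """Return an argv list that saves a screen capture to `path`."""
    system = system or platform.system()
    if system == "Darwin":
        # -x suppresses the shutter sound.
        return ["screencapture", "-x", path]
    if system == "Linux":
        # ImageMagick's import; "-window root" grabs the whole X11 screen.
        return ["import", "-window", "root", path]
    if system == "Windows":
        # PowerShell + .NET; captures the primary screen only.
        # Assumes `path` contains no single quotes.
        script = (
            "Add-Type -AssemblyName System.Windows.Forms, System.Drawing; "
            "$b = [System.Windows.Forms.Screen]::PrimaryScreen.Bounds; "
            "$bmp = New-Object System.Drawing.Bitmap $b.Width, $b.Height; "
            "$g = [System.Drawing.Graphics]::FromImage($bmp); "
            "$g.CopyFromScreen($b.Location, [System.Drawing.Point]::Empty, $b.Size); "
            f"$bmp.Save('{path}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    raise NotImplementedError(f"unsupported platform: {system}")


def click_command(x: int, y: int, system: Optional[str] = None) -> List[str]:
    """Return an argv list for a left click at (x, y)."""
    system = system or platform.system()
    if system == "Darwin":
        return ["cliclick", f"c:{x},{y}"]
    if system == "Linux":
        return ["xdotool", "mousemove", str(x), str(y), "click", "1"]
    # On Windows the page mentions user32.dll; that path is omitted here.
    raise NotImplementedError(f"click not sketched for: {system}")
```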

Example scenarios

You say: “Take a screenshot of my analytics dashboard and summarize the key metrics”
NIOM: Captures the screen → the vision model reads all text, charts, and numbers → produces a clean summary with the key figures.

You say: “Open each PDF in my downloads folder and rename it based on its content”
NIOM: Combines file tools (to list the PDFs) with computer use (to open each one, read the content via screenshot, then rename the file).

You say: “Fill out this registration form with my details”
NIOM: Screenshots the form → identifies each field → clicks into each one → types the appropriate information → clicks Submit.

Screenshot management

Screenshots are handled automatically:
  • Saved to ~/.niom/screenshots/
  • Only the last 5 are kept (auto-cleanup)
  • Encoded as base64 for vision model analysis
  • No screenshots are sent anywhere except your chosen AI provider
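
The cleanup and encoding behavior described above could look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, screenshots are assumed to be `.png` files, and "last 5" is interpreted as the five most recently modified files.

```python
import base64
from pathlib import Path
from typing import List

SCREENSHOT_DIR = Path.home() / ".niom" / "screenshots"  # location from this page
KEEP_LAST = 5                                           # retention from this page


def prune_screenshots(directory: Path, keep: int = KEEP_LAST) -> List[Path]:
    """Delete all but the `keep` newest PNGs; return the deleted paths."""
    shots = sorted(
        directory.glob("*.png"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    stale = shots[keep:]
    for path in stale:
        path.unlink()
    return stale


def encode_for_vision(path: Path) -> str:
    """Return the file's bytes as a base64 ASCII string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")
```

Calling `prune_screenshots(SCREENSHOT_DIR)` after each capture would keep the folder at five files or fewer.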
Safety first. Destructive GUI actions (like clicking “Delete” or “Submit” buttons) always trigger NIOM’s approval flow — you’ll be asked before any risky action is performed.
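
One simple way to picture an approval gate of this kind is a wrapper that asks the user before clicking anything whose label looks risky. The sketch below is an assumption-laden illustration: how NIOM actually classifies an action as destructive is not described on this page, and `RISKY_LABELS`, `needs_approval`, and `guarded_click` are hypothetical names.

```python
from typing import Callable

# Hypothetical keyword list; this page names "Delete" and "Submit" as examples.
RISKY_LABELS = ("delete", "submit")


def needs_approval(target_label: str) -> bool:
    """Return True if the label of the click target looks risky."""
    label = target_label.lower()
    return any(word in label for word in RISKY_LABELS)


def guarded_click(
    target_label: str,
    click: Callable[[], None],
    ask_user: Callable[[str], bool],
) -> bool:
    """Perform `click` unless the target is risky and the user declines.

    Returns True if the click was performed.
    """
    if needs_approval(target_label) and not ask_user(f"Click {target_label!r}?"):
        return False
    click()
    return True
```

Keyword matching is deliberately crude here; a real classifier would need more context than a button label.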