What NIOM can do
| Tool | What it does |
|---|---|
screenshot | Captures your full screen (or a region) and sends it to a vision model for analysis |
mouseClick | Left, right, or double-click at specific screen coordinates |
mouseMove | Moves the cursor (for triggering hover effects, tooltips, etc.) |
typeText | Types text at the current cursor position |
pressKey | Keyboard shortcuts: Ctrl+C, Alt+Tab, Ctrl+Shift+S, and more |
scroll | Scrolls up or down at a specific location |
getActiveWindow | Gets the title, process name, position, and size of the active window |
How it works
The process is simple and visual:- Screenshot → NIOM captures what’s on your screen
- Understand → A vision model (GPT-4o or Claude) analyzes the image — reading text, identifying buttons, understanding layout
- Plan → The agent decides what to click, type, or navigate
- Act → Mouse and keyboard actions are performed via native OS APIs
- Verify → Another screenshot confirms the action succeeded
Works on every platform
| Platform | Screenshot | Mouse/Keyboard | How |
|---|---|---|---|
| Windows | ✅ | ✅ | PowerShell + .NET, user32.dll, SendKeys |
| macOS | ✅ | ✅ | screencapture, cliclick, osascript |
| Linux | ✅ | ✅ | ImageMagick import, xdotool |
Example scenarios
Navigate a settings page
Navigate a settings page
Read a dashboard
Read a dashboard
You say: “Take a screenshot of my analytics dashboard and summarize the key metrics”NIOM: Captures the screen → the vision model reads all text, charts, and numbers → produces a clean summary with the key figures.
Automate a repetitive GUI task
Automate a repetitive GUI task
You say: “Open each PDF in my downloads folder and rename it based on its content”NIOM: Combines file tools (to list the PDFs) with computer use (to open each one, read the content via screenshot, then rename the file).
Fill out a form
Fill out a form
You say: “Fill out this registration form with my details”NIOM: Screenshots the form → identifies each field → clicks into each one → types the appropriate information → clicks Submit.
Screenshot management
Screenshots are handled automatically:- Saved to
~/.niom/screenshots/ - Only the last 5 are kept (auto-cleanup)
- Encoded as base64 for vision model analysis
- No screenshots are sent anywhere except your chosen AI provider