From f746dc45c296dca4dd5fc7a4b7bf9e6007bcf8db Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Fri, 23 May 2025 05:39:36 +0200
Subject: [PATCH] Add docs

---
 .cursor/rules/agent.mdc         |  108 +++
 .cursor/rules/mcp-inspector.mdc |   98 ++
 .cursor/rules/safari.mdc        |  216 +++++
 .cursor/safari.mdc              |  195 ++++
 .cursor/scripts/peekaboo.scpt   | 1614 +++++++++++++++++++++++++++++++
 .cursor/scripts/terminator.scpt |  774 +++++++++++++++
 README.md                       | 1205 +++++++----------------
 docs/spec.md                    |  423 ++++++++
 8 files changed, 3797 insertions(+), 836 deletions(-)
 create mode 100644 .cursor/rules/agent.mdc
 create mode 100644 .cursor/rules/mcp-inspector.mdc
 create mode 100644 .cursor/rules/safari.mdc
 create mode 100644 .cursor/safari.mdc
 create mode 100755 .cursor/scripts/peekaboo.scpt
 create mode 100755 .cursor/scripts/terminator.scpt
 create mode 100644 docs/spec.md

diff --git a/.cursor/rules/agent.mdc b/.cursor/rules/agent.mdc
new file mode 100644
index 0000000..f0b048d
--- /dev/null
+++ b/.cursor/rules/agent.mdc
@@ -0,0 +1,108 @@
+---
+description:
+globs:
+alwaysApply: false
+---
+# Agent Instructions
+
+This file provides guidance to AI assistants when working with code in this repository.
+
+## Project Overview
+
+This is the `peekaboo` project, which provides a Model Context Protocol (MCP) server that enables executing AppleScript and JavaScript for Automation (JXA) scripts on macOS. The server features a knowledge base of pre-defined scripts accessible by ID and supports inline scripts, script files, and argument passing.
+
+## Architecture
+
+- **Server Configuration**: The server reads configuration from environment variables like `LOG_LEVEL` and `KB_PARSING`.
+- **MCP Tools**: Two main tools are provided:
+  1. `execute_script`: Executes AppleScript/JXA from inline content, a file path, or a knowledge base ID
+  2. `get_scripting_tips`: Retrieves information from the knowledge base
+- **Knowledge Base**: A collection of pre-defined scripts stored as Markdown files with YAML frontmatter in the `knowledge_base/` directory
+- **ScriptExecutor**: Core component that executes scripts via the `osascript` command
+
+## Knowledge Base System
+
+The knowledge base (`knowledge_base/` directory) contains numerous Markdown files organized by category:
+- Each file has YAML frontmatter with metadata: `id`, `title`, `description`, `language`, etc.
+- The actual script code is contained in the Markdown body in a fenced code block
+- Scripts can use placeholders like `--MCP_INPUT:keyName` and `--MCP_ARG_N` for parameter substitution
+
+## Common Development Commands
+
+```bash
+# Install dependencies
+npm install
+
+# Run the server in development mode with hot reloading
+npm run dev
+
+# Build the TypeScript project
+npm run build
+
+# Start the compiled server
+npm run start
+
+# Lint the codebase
+npm run lint
+
+# Format the codebase
+npm run format
+
+# Validate the knowledge base
+npm run validate
+```
+
+## Environment Variables
+
+- `LOG_LEVEL`: Sets the logging level (`DEBUG`, `INFO`, `WARN`, `ERROR`); the default is `INFO`
+- `KB_PARSING`: Controls when the knowledge base is parsed:
+  - `lazy` (default): Parsed on first request
+  - `eager`: Parsed when the server starts
+
+## Working with the Knowledge Base
+
+When adding new scripts to the knowledge base:
+1. Create a new `.md` file in the appropriate category folder
+2. Include the required YAML frontmatter (`title`, `description`, etc.)
+3. Add the script code in a fenced code block
+4. Run `npm run validate` to ensure the new content is correctly formatted
+
+## Code Execution Flow
+
+1. The `server.ts` file defines the MCP server and its tools
+2. `knowledgeBaseService.ts` loads and indexes scripts from the knowledge base
+3. `ScriptExecutor.ts` handles the actual execution of scripts
+4. Input validation is handled via Zod schemas in `schemas.ts`
+5. Logging is managed by the `Logger` class in `logger.ts`
+
+## Security and Permissions
+
+Remember that scripts run on macOS require specific permissions, depending on what they do:
+- Automation permissions for controlling applications
+- Accessibility permissions for UI scripting via System Events
+- Full Disk Access for certain file operations
+
+## Agent Operational Learnings and Debugging Strategies
+
+This section captures key operational strategies and debugging techniques for the agent (me) based on collaborative sessions.
+
+### Prioritizing Log Visibility for Debugging
+
+When an external tool or script (like AppleScript via `osascript`) returns cryptic errors, or when agent-generated code/substitutions might be faulty:
+
+1. **Suspect Dynamic Content**: Issues often stem from the dynamic content being passed to the external tool (e.g., incorrect placeholder substitutions leading to syntax errors in the target language).
+2. **Enable/Add Detailed Logging**: Prioritize enabling any built-in detailed logging features of the tool in question (e.g., `includeSubstitutionLogs: true` for this project's `execute_script` tool).
+3. **Ensure Log Visibility**: If standard debug logging doesn't appear in the primary output channel the user is observing, modify the code to force critical diagnostic information (like step-by-step transformations, variable states, or the exact content being passed externally) into that main output. This might involve temporarily altering the structure of the success or error messages to include these logs.
+    * **Confirm Restarts and Code Version**: For changes requiring server restarts (common in this project), use any feature that confirms the new code is active. For example, the server startup timestamp and execution-mode info appended to the `get_scripting_tips` output help verify that a restart succeeded and that the intended code version (e.g., TypeScript source via `tsx` vs. compiled `dist/server.js`) is running.
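The "Ensure Log Visibility" and "Confirm Restarts" steps above can be sketched as a small output formatter. This is a minimal sketch, not this project's code. Only the `includeSubstitutionLogs` flag name comes from the notes. `formatToolOutput`, `SERVER_STARTED_AT`, and `executionMode` are hypothetical. The mode check assumes that running the source via `tsx` leaves a `.ts` entry point in `process.argv[1]`.

```javascript
// Hypothetical sketch: fold diagnostics into the tool output the user actually sees.

// Captured once at module load, so a successful restart shows up as a new timestamp.
const SERVER_STARTED_AT = new Date().toISOString();

// Assumption: a tsx run has a .ts entry point; the compiled build runs dist/server.js.
function executionMode(entryPoint = process.argv[1] || "") {
  return entryPoint.endsWith(".ts") ? "TypeScript source (tsx)" : "compiled JavaScript (dist)";
}

function formatToolOutput(result, logs, { includeSubstitutionLogs = false } = {}) {
  const parts = [result];
  if (includeSubstitutionLogs && logs.length > 0) {
    // Step-by-step diagnostics go into the primary output channel, not a side log.
    parts.push("--- substitution log ---", ...logs);
  }
  parts.push(`[server started ${SERVER_STARTED_AT}; mode: ${executionMode()}]`);
  return parts.join("\n");
}

// Usage
const logs = ['--MCP_INPUT:appName -> "Finder"'];
console.log(formatToolOutput("ok", logs, { includeSubstitutionLogs: true }));
```

Appending the startup line to every response costs one line of output. In exchange, a stale server is obvious without a separate health-check tool.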
+
+### Iterative Simplification for Complex Patterns (e.g., Regex)
+
+If a complex pattern (like a regular expression) in code being generated or modified by the agent is not working as expected, and the cause isn't immediately obvious:
+
+1. **Isolate the Pattern**: Identify the specific complex pattern (e.g., a regex for string replacement).
+2. **Drastically Simplify**: Reduce the pattern to its most basic form that should still achieve a part of the goal or match a core component of the target string (e.g., simplifying `/(?:["'])--MCP_INPUT:(\w+)(?:["'])/g` to `/--MCP_INPUT:/g` to test basic matching of the placeholder prefix).
+3. **Test the Simple Form**: Verify whether this simplified pattern works. If it does, the core string manipulation mechanism is likely sound.
+4. **Incrementally Rebuild & Test**: Gradually add back elements of the original complexity (e.g., capture groups, character sets, quantifiers, lookarounds, backreferences like `\1`). Test at each incremental step to pinpoint which specific construct or combination introduces the failure. This process helped identify that `(?:["'])` was problematic in our placeholder regex, leading to a solution using a capturing group and a backreference like `/(["'])--MCP_INPUT:(\w+)\1/g`.
+5. **Verify Replacement Logic**: If the pattern uses capturing groups in a replacement, ensure the replacement logic uses those captures correctly and produces the intended output format (e.g., `valueToAppleScriptLiteral` for AppleScript).
+
+This methodical approach is more effective than repeatedly trying minor variations of an already complex and failing pattern.
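The end state of that regex debugging can be sketched as follows. The final pattern uses a capturing group for the opening quote and a `\1` backreference to require the same closing quote. The notes mention a `valueToAppleScriptLiteral` helper without showing it. `toAppleScriptLiteral` below is therefore a hypothetical, simplified stand-in covering strings, numbers, booleans, and lists. Treat the block as an illustration of the technique, not the project's implementation.

```javascript
// Hypothetical, simplified stand-in for the project's valueToAppleScriptLiteral.
function toAppleScriptLiteral(value) {
  if (typeof value === "string") {
    // AppleScript string literals need backslashes and double quotes escaped.
    return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
  }
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  if (Array.isArray(value)) return "{" + value.map(toAppleScriptLiteral).join(", ") + "}";
  if (value === null || value === undefined) return "missing value";
  throw new Error(`Unsupported value: ${typeof value}`);
}

// Step 2 of the recipe: the drastically simplified pattern, used only to confirm
// that the placeholder prefix is found at all.
const SIMPLE = /--MCP_INPUT:/g;

// Final pattern: group 1 captures the opening quote, \1 demands the same quote
// to close it, and group 2 captures the key name.
const PLACEHOLDER = /(["'])--MCP_INPUT:(\w+)\1/g;

function substituteInputs(script, inputs) {
  return script.replace(PLACEHOLDER, (match, _quote, key) =>
    // Unknown keys are left untouched so the problem stays visible in the output.
    Object.prototype.hasOwnProperty.call(inputs, key) ? toAppleScriptLiteral(inputs[key]) : match
  );
}

const source = 'tell application "--MCP_INPUT:appName" to activate';
console.log((source.match(SIMPLE) || []).length); // 1
console.log(substituteInputs(source, { appName: "Finder" }));
// tell application "Finder" to activate
```

The whole quoted placeholder, including its quotes, is replaced by a complete AppleScript literal. Values containing quotes therefore cannot break out of the surrounding script.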
\ No newline at end of file
diff --git a/.cursor/rules/mcp-inspector.mdc b/.cursor/rules/mcp-inspector.mdc
new file mode 100644
index 0000000..183a595
--- /dev/null
+++ b/.cursor/rules/mcp-inspector.mdc
@@ -0,0 +1,98 @@
+---
+description:
+globs:
+alwaysApply: false
+---
+Rule Name: mcp-inspector
+Description: Debugging and verifying the `macos-automator-mcp` server via the MCP Inspector, using Playwright for UI automation and direct terminal commands for server management. This rule prioritizes stability and detailed verification through Playwright's introspection capabilities.
+
+**Required Tools:**
+- `run_terminal_cmd`
+- `mcp_playwright_browser_navigate`
+- `mcp_playwright_browser_type`
+- `mcp_playwright_browser_click`
+- `mcp_playwright_browser_snapshot`
+- `mcp_playwright_browser_console_messages`
+- `mcp_playwright_browser_wait_for`
+
+**User Workspace Path Placeholder:**
+- The path to the `start.sh` script will be specified as `[WORKSPACE_PATH]/start.sh`.
+- The AI assistant executing this rule **MUST** replace `[WORKSPACE_PATH]` with the absolute path to the user's current project workspace (e.g., as found in the `` context block during rule execution).
+- Example of a resolved path if the workspace is `/Users/username/Projects/my-mcp-project`: `/Users/username/Projects/my-mcp-project/start.sh`.
+
+---
+**Main Flow:**
+
+**Phase 1: Start MCP Inspector Server**
+1. **Kill Existing Inspector Processes:**
+    * Action: Call `run_terminal_cmd`.
+    * `command`: `pkill -f 'npx @modelcontextprotocol/inspector' || true`
+    * `is_background`: `false`
+    * Expected: Cleans up any lingering Inspector processes.
+2. **Start New Inspector Process:**
+    * Action: Call `run_terminal_cmd`.
+    * `command`: `npx @modelcontextprotocol/inspector`
+    * `is_background`: `true`
+    * Expected: MCP Inspector starts in the background.
+3. **Wait for Inspector Initialization:**
+    * Action: Call `mcp_playwright_browser_wait_for`.
+    * `time`: `10` (seconds)
+    * Expected: Allows ample time for the Inspector server to be ready. This tool needs an active Playwright page: if no browser page is open yet, perform the Phase 2 navigation first and then wait.
+
+**Phase 2: Connect to Server via Playwright**
+1. **Navigate to Inspector URL:**
+    * Action: Call `mcp_playwright_browser_navigate`.
+    * `url`: `http://127.0.0.1:6274`
+    * Expected: Playwright opens the MCP Inspector web UI.
+    * Snapshot: Take a snapshot (`mcp_playwright_browser_snapshot`) to confirm the page loaded and to identify the initial form element references (`ref`).
+2. **Fill Form (Command & Args only):**
+    * **Set Command:**
+        * Action: Call `mcp_playwright_browser_type`.
+        * `element`: "Command textbox" (Obtain `ref` from snapshot).
+        * `text`: `macos-automator-mcp`
+    * **Set Arguments:**
+        * Action: Call `mcp_playwright_browser_type`.
+        * `element`: "Arguments textbox" (Obtain `ref` from snapshot).
+        * `text`: `[WORKSPACE_PATH]/start.sh` (The AI executing the rule MUST replace this placeholder with the absolute path to the user's current workspace.)
+    * *(Note: Environment variables are skipped in this flow for simplicity and stability, because issues were previously observed when setting LOG_LEVEL=DEBUG during connection.)*
+3. **Click "Connect":**
+    * Action: Call `mcp_playwright_browser_click`.
+    * `element`: "Connect button" (Obtain `ref` from snapshot).
+    * Expected: Connection to the `macos-automator-mcp` server is established.
+    * Snapshot: Take a snapshot. Verify the connection status (e.g., text changes to "Connected") and check for initial server logs in the UI.
+
+**Phase 3: Interact with a Tool via Playwright**
+1. **List Tools:**
+    * Action: Call `mcp_playwright_browser_click`.
+    * `element`: "List Tools button" (Obtain `ref` from the latest snapshot).
+    * Expected: The list of available tools from the `macos-automator-mcp` server is displayed.
+    * Snapshot: Take a snapshot. Verify that tools like `execute_script` and `get_scripting_tips` are visible.
+2. **Select 'get_scripting_tips' Tool:**
+    * Action: Call `mcp_playwright_browser_click`.
+    * `element`: "get_scripting_tips tool in list" (Obtain `ref` by identifying it in the snapshot's tool list).
+    * Expected: The parameters form for `get_scripting_tips` is displayed in the right-hand panel.
+    * Snapshot: Take a snapshot. Verify the right panel shows details for `get_scripting_tips` (e.g., its name, description, and parameter fields like 'searchTerm', 'listCategories', etc.).
+3. **Execute 'get_scripting_tips' (default parameters):**
+    * Action: Call `mcp_playwright_browser_click`.
+    * `element`: "Run Tool button" (From the snapshot, obtain the `ref` of the 'Run Tool' button in the right-panel `get_scripting_tips` form).
+    * Expected: The `get_scripting_tips` tool is executed with its default parameters.
+    * Snapshot: Take a snapshot.
+
+**Phase 4: Verify Tool Execution and Logs in Playwright**
+1. **Check for Results in UI:**
+    * Action: Examine the latest snapshot.
+    * Look for: The results of the `get_scripting_tips` call (e.g., a list of script categories if `listCategories` was implicitly true by default, or an empty result if no default search term was run).
+    * The results should appear in the 'Result from tool' section (or a similarly named one) within the right-hand panel where the tool's form was.
+2. **Check Console Logs (Optional but Recommended):**
+    * Action: Call `mcp_playwright_browser_console_messages`.
+    * Expected: Review for any errors or relevant messages from the Inspector or the tool interaction.
+3. **Check MCP Server Logs in UI:**
+    * Action: Examine the latest snapshot.
+    * Look for: Logs related to the `get_scripting_tips` tool execution in the main server log panel (usually bottom-left; it may be titled "Error output from MCP server" or similar, but it also shows general logs).
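The `[WORKSPACE_PATH]` replacement this rule requires can be combined with the quoting advice given for the same placeholder in the `.cursor/safari.mdc` notes. Two small helpers cover it. This is a minimal sketch with several assumptions:

* `resolveWorkspacePlaceholder` and `shellQuote` are hypothetical names.
* The absolute-path check assumes a macOS workspace.
* The quoting follows the standard POSIX-shell idiom: single-quote the whole value and write each embedded single quote as `'\''`.

```javascript
// Hypothetical helpers for the [WORKSPACE_PATH] placeholder used by this rule.
const WORKSPACE_TOKEN = "[WORKSPACE_PATH]";

// POSIX-shell quoting: wrap in single quotes; an embedded single quote closes the
// quoted run, adds an escaped quote (\'), and reopens it, i.e. '\''.
function shellQuote(value) {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function resolveWorkspacePlaceholder(template, workspacePath) {
  // Assumption: macOS workspace paths are absolute.
  if (!workspacePath.startsWith("/")) {
    throw new Error(`Workspace path must be absolute: ${workspacePath}`);
  }
  // split/join replaces every occurrence without treating "[" as regex syntax.
  return template.split(WORKSPACE_TOKEN).join(workspacePath);
}

const scriptPath = resolveWorkspacePlaceholder(
  "[WORKSPACE_PATH]/start.sh",
  "/Users/username/My Projects/project-name"
);
console.log(scriptPath); // /Users/username/My Projects/project-name/start.sh
console.log(shellQuote(scriptPath)); // '/Users/username/My Projects/project-name/start.sh'
```

Quoting is only needed when the resolved path is spliced into a shell command. Text typed into a form field, as in Phase 2, is entered literally.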
+
+**Troubleshooting Notes:**
+- If the connection fails, check the `run_terminal_cmd` output for the Inspector to ensure it started correctly.
+- Check Playwright console messages for clues.
+- Ensure `[WORKSPACE_PATH]` was correctly resolved and points to an existing `start.sh` script.
+- Element `ref` values can change. Always use the latest snapshot to get correct `ref` values before an interaction.
+- Shadow DOM: The MCP Inspector UI uses Shadow DOM extensively for the tool details and results panels. Playwright's CSS and text locators pierce open shadow roots by default (XPath does not, and closed shadow roots are not reachable), but keep this in mind if elements *within* the tool panel (the right-hand side after selecting a tool) cannot be found. The provided flow assumes Playwright's auto-piercing handles this sufficiently.
diff --git a/.cursor/rules/safari.mdc b/.cursor/rules/safari.mdc
new file mode 100644
index 0000000..f08d9bd
--- /dev/null
+++ b/.cursor/rules/safari.mdc
@@ -0,0 +1,216 @@
+---
+description:
+globs:
+alwaysApply: false
+---
+### Meta Note
+
+This file, `safari.mdc`, serves as a repository for detailed working notes, observations, and learnings acquired while automating Safari interactions, particularly for the MCP Inspector UI. It captures the nuances of trial and error, debugging steps, and insights into what worked, what didn't, and why.
+
+This contrasts with `mcp-inspector.mdc`, which is designed to be the concise, polished, and operational ruleset for future automated runs once a specific automation flow (like connecting to the MCP Inspector) has been stabilized and proven reliable. `mcp-inspector.mdc` should contain the 'final' working scripts and minimal necessary commentary, while `safari.mdc` is the space for the extended antechamber of discovery.
+
+---
+
+### Key Learnings and Observations from Safari Automation (MCP Inspector)
+
+#### 1. Managing Safari Windows and Tabs for the Inspector
+
+* **Objective:** Reliably direct Safari to the MCP Inspector URL (`http://127.0.0.1:6274`) in a predictable way, preferably using a single, consistent browser window and tab to avoid disrupting the user's workspace or losing context.
+* **Initial Challenges & Evolution:**
+    * Simply using `make new document with properties {URL:"..."}` could lead to multiple windows/tabs if not managed.
+    * Attempts to close all existing Inspector tabs first (`repeat with w in windows... close t...`) were functional but could be overly aggressive if the user had other work in Safari.
+    * Identifying and reusing an *existing specific tab* for the Inspector requires careful targeting (e.g., `first tab whose URL starts with "..."`). If this tab was from a previous, unconfigured session, just switching to it wasn't enough; it needed to be reloaded/reset.
+* **Refined & Recommended Approach (as implemented in `mcp-inspector.mdc`):**
+    ```applescript
+    tell application "Safari"
+        activate
+        delay 0.2 -- Allow Safari to become the frontmost application
+        if (count of windows) is 0 then
+            -- No Safari windows are open, so create a new one.
+            make new document with properties {URL:"http://127.0.0.1:6274"}
+        else
+            -- Safari has windows open; use the frontmost one.
+            tell front window
+                set inspectorTab to missing value
+                try
+                    -- Check if a tab for the Inspector is already open in this window.
+                    set inspectorTab to (first tab whose URL starts with "http://127.0.0.1:6274")
+                end try
+
+                if inspectorTab is not missing value then
+                    -- An Inspector tab exists: set its URL again (to refresh/reset) and make it active.
+                    set URL of inspectorTab to "http://127.0.0.1:6274"
+                    set current tab to inspectorTab
+                else
+                    -- No specific Inspector tab found: set the URL of the *current active tab*.
+                    set URL of current tab to "http://127.0.0.1:6274"
+                end if
+            end tell
+        end if
+        delay 1 -- Pause to allow the page to begin loading.
+    end tell
+    ```
+    This logic uses the existing front window and either reuses/refreshes an Inspector tab or repurposes the current active tab. It falls back to creating a new window only if Safari has no open windows.
+
+#### 2. Clicking Elements Programmatically (The "Connect" Button Saga)
+
+* **The Core Challenge:** Programmatically clicking the "Connect" button in the MCP Inspector UI to initiate the server connection.
+* **Strategies Explored & Lessons:**
+    * **CSS Selectors (`querySelector`):**
+        * Simple selectors like `[data-testid='env-vars-button']` worked for some buttons. Because the selector sits inside a single-quoted JavaScript string, its inner single quotes need a backslash, and that backslash must itself be doubled inside the AppleScript string: `do JavaScript "document.querySelector('[data-testid=\\'env-vars-button\\']').click();"`.
+        * A complex `querySelector` for the "Connect" button (e.g., `'button[data-testid*=connect-button], button:not([disabled])... > span:contains(Connect)...'.click()`) ran without a reported JS error but didn't reliably establish the connection. It probably did not find the exact interactable element, or the click wasn't registering correctly. Note that `:contains()` is a jQuery extension, not standard CSS, so `document.querySelector` throws a `SyntaxError` for it; any such error was evidently not surfaced through `do JavaScript`.
+    * **XPath (`document.evaluate`):**
+        * **Highly Specific XPaths:** An initial XPath based on the rule (`//button[contains(., 'Connect') and .//svg[.//polygon[@points='6 3 20 12 6 21 6 3']]]`) was very difficult to embed correctly in AppleScript because its nested single quotes required complex escaping (`\'`). This often led to AppleScript parsing errors (`-2741`).
+        * **`character id 39` for AppleScript String Construction:** To avoid these escaping issues, building the JavaScript string in AppleScript with `set sQuote to character id 39` for the internal single quotes got the AppleScript parser to accept the command. Example:
+            ```applescript
+            set sQuote to character id 39
+            set jsConnectText to "Connect"
+            set specificXPath to "//button[contains(., " & sQuote & jsConnectText & sQuote & ") and .//svg[.//polygon[@points=" & sQuote & "6 3 20 12 6 21 6 3" & sQuote & "]]]"
+            set jsCommand to "document.evaluate(" & sQuote & specificXPath & sQuote & ", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();"
+            ```
+            While this made the AppleScript runnable, this very specific XPath still didn't reliably trigger the connection. As written, the generated JavaScript wraps an XPath that itself contains single quotes inside single quotes, which is a JavaScript syntax error. Even with the quoting fixed, the `svg`/`polygon` steps would likely not match: in HTML documents, unprefixed XPath name tests match only HTML-namespace elements, and inline SVG lives in the SVG namespace (`*[local-name()='svg']` is the usual workaround).
+        * **Successful XPath:** The breakthrough came with a slightly less specific but more robust XPath: `//button[.//text()='Connect']`. This finds a button that *contains* a text node exactly matching "Connect".
+            * JavaScript: `document.evaluate("//button[.//text()='Connect']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();`
+            * AppleScript embedding (note `\"` for JS string quotes):
+                ```applescript
+                set jsCommand to "document.evaluate(\"//button[.//text()='Connect']\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();"
+                do JavaScript jsCommand in front document
+                ```
+                This method proved successful in clicking the button and establishing the connection.
+    * **`dispatchEvent(new MouseEvent('click', ...))`:** This was tried as an alternative to `.click()` but did not yield a different outcome for the "Connect" button in this specific scenario.
+
+#### 3. JavaScript Construction and Execution in AppleScript
+
+* **`do JavaScript "..."`:** This is the fundamental command.
+* **String Literals and Escaping:**
+    * If the AppleScript command itself is enclosed in double quotes (`"..."`), then any literal double quotes *within the JavaScript code* must be escaped as `\"`.
+    * Single quotes (`'`) within the JavaScript code usually do not need escaping in this context.
+    * Example: `do JavaScript "var el = document.getElementById(\"myId\"); el.value = 'Hello';"`
+* **Long/Multiline JavaScript:**
+    * Concatenating multiple AppleScript string literals using `&` (and optionally `¬` for line continuation) can build up a long JavaScript command. However, this is fragile unless every part is perfectly quoted and escaped. AppleScript parsing errors (`-2741`) often occur before the JS is even attempted.
+    * For complex JS, it's often more robust to ensure the entire JavaScript code is a single, well-formed string literal from AppleScript's perspective. If the JS itself is very complex, pre-constructing parts of it in AppleScript variables (especially strings that need careful quoting, like XPaths) can help.
+* **Returning Values:** The `do JavaScript` command returns the value of the last expression the script evaluates, which is invaluable for debugging. End the script with an expression such as `'Found element';` or `element !== null;`. A top-level `return` statement is a syntax error outside a function. To use `return`, wrap the code in an immediately invoked function, e.g. `(() => { /* ... */ return element !== null; })();`.
+
+#### 4. Asynchronicity and Delays
+
+* **Essential `delay` commands (Strategic vs. Tactical):**
+    * **Strategic Delay (Crucial):** A critical lesson was the need for a significant delay (e.g., ~5 seconds) *after* an external process like the MCP Inspector is launched (e.g., via `npx` in iTerm) and *before* Safari automation attempts to interact with its web UI. This allows the external process and its web server to fully initialize. Without it, Safari automation might target a page that isn't ready or fully functional, leading to failures.
+    * **Tactical Delays (Within Safari UI Automation - Often Avoidable):** Initially, small `delay` commands were used within Safari AppleScripts after actions like clicks or page loads (e.g., `delay 0.25`, `delay 1`). These can sometimes help ensure the DOM is updated. However, the latest successful runs showed that once the backend (the Inspector) is fully ready thanks to the strategic delay, rapid Safari UI interactions (form filling, sequential clicks) can often be performed reliably *without* these internal micro-delays. Removing them speeds up the automation if the underlying application is responsive enough.
+    * **Context is Key:** The need for tactical delays depends on how quickly the web application updates its DOM and responds to JavaScript events. For the MCP Inspector, once it's running, its UI responds quickly enough to handle a sequence of JavaScript commands without interspersed AppleScript delays, provided the commands themselves are valid and target the correct elements.
+
+* **Checking for Results:** When verifying an action (e.g., checking whether `document.body.innerText.includes('Connected')`), the check must happen *after* the action has had a chance to complete and the UI to reflect the change. Even without tactical delays, run this check after the JavaScript action that is supposed to cause the change.
+
+#### 5. MCP Inspector Specifics
+
+* **URL Consistency:** The MCP Inspector URL (`http://127.0.0.1:6274`) was consistent between runs, simplifying Safari targeting.
+* **Server Logs in the Inspector UI:** Once the `macos-automator-mcp` server connects via the MCP Inspector, its startup and operational logs (e.g., `[macos_automator_server] [INFO] Starting...`) are displayed directly within the MCP Inspector's web interface in Safari. This is the primary place to check server-specific logs. The iTerm console running `npx @modelcontextprotocol/inspector` shows only the Inspector's own proxy/connection logs. The Safari UI shows the "Connected" status, and the server logs within the UI confirm the server's state in detail.
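The escaping rules from section 3 can be applied mechanically instead of by hand. This sketch assumes the AppleScript is generated from JavaScript, for example inside a Node-based MCP tool. `toAppleScriptString` and `buildDoJavaScript` are hypothetical names, not part of any existing API. The helper escapes backslashes and double quotes, and turns line breaks into the `\n`/`\r` escapes that AppleScript string literals accept.

```javascript
// Hypothetical helpers that generate AppleScript `do JavaScript` commands from JS source.

// Turn arbitrary text into an AppleScript string literal.
function toAppleScriptString(text) {
  const body = text
    .replace(/\\/g, "\\\\") // backslashes first, so the escapes added below are not doubled
    .replace(/"/g, '\\"')   // a bare double quote would end the AppleScript string early
    .replace(/\r/g, "\\r")  // line breaks become AppleScript escape sequences
    .replace(/\n/g, "\\n");
  return `"${body}"`;
}

// Build a one-line command suitable for `osascript -e`.
function buildDoJavaScript(js, target = "front document") {
  return `tell application "Safari" to do JavaScript ${toAppleScriptString(js)} in ${target}`;
}

// The "Connect" click from section 2, written as plain JavaScript.
const clickConnect =
  "document.evaluate(\"//button[.//text()='Connect']\", document, null, " +
  "XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();";

console.log(buildDoJavaScript(clickConnect));
```

The printed command reproduces the working `\"`-escaped form from section 2. The order of the replacements matters: escaping quotes first and backslashes second would turn each `\"` into `\\"`, which ends the AppleScript string early.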
+
+#### 6. Automating iTerm via AppleScript and Advanced Timing Considerations
+
+* **Full iTerm Automation via AppleScript:** The iTerm-specific MCP tools (e.g., `mcp_iterm_send_control_character`, `mcp_iterm_write_to_terminal`) consistently failed with "Tool not found" errors. To work around this, a robust AppleScript was developed and successfully used to manage the iTerm portion of the MCP Inspector setup. This script handles:
+    * Activating iTerm.
+    * Ensuring a window is available.
+    * Sending a Control-C to the current session via `System Events` (targeting the iTerm process, for reliability) to terminate any running commands.
+    * Writing the `npx @modelcontextprotocol/inspector` command to the iTerm session to start the Inspector.
+    * The successful AppleScript structure is as follows (and now part of `mcp-inspector.mdc`):
+        ```applescript
+        tell application "iTerm"
+            activate
+            if (count of windows) is 0 then
+                create window with default profile
+                delay 0.5 # Brief delay for window creation
+            end if
+        end tell
+        delay 0.2 # Ensure iTerm is frontmost
+
+        tell application "System Events"
+            # Note: 'iTerm' process name might need to be 'iTerm2' for iTerm3+.
+            tell process "iTerm"
+                keystroke "c" using control down
+            end tell
+        end tell
+        delay 0.2 # Pause after Ctrl-C
+
+        tell application "iTerm"
+            tell current window
+                tell current session
+                    write text "npx @modelcontextprotocol/inspector"
+                end tell
+            end tell
+        end tell
+        ```
+
+* **iTerm Process Name in System Events:** When using `System Events` to control iTerm (e.g., for `keystroke`), `tell process "iTerm"` might need to be `tell process "iTerm2"` with iTerm version 3 or later, because the application's registered process name can vary.
+
+* **Reinforcing the Strategic Delay:** Running the Safari UI automation steps *without* internal (tactical) delays only works because of the *strategic* delay inserted *after* launching the MCP Inspector in iTerm and *before* any Safari interaction. A delay of approximately 5 seconds was effective, giving `npx` and the Inspector server time to fully initialize. Starting Safari automation too soon, especially without tactical delays, will likely fail because the web UI won't be ready or responsive.
+
+#### 7. Interacting with Shadow DOM (Advanced)
+
+* **Identifying Shadow DOM:** Some web UIs, potentially including parts of the MCP Inspector (especially complex, self-contained components like the tool details and results panels), use Shadow DOM to encapsulate their structure and styles. Standard `document.querySelector` or `document.evaluate` calls from the main document context do *not* pierce these shadow boundaries.
+* **Symptoms of Shadow DOM:** Shadow DOM may be in use if `document.body.innerText` misses details of an active UI component, or if standard selectors fail for visible elements that clearly belong to a specific component.
+* **Accessing Elements within Shadow DOM (Conceptual JavaScript Approach):**
+    To interact with elements inside a shadow root, first get a reference to the host element, then access its `shadowRoot` property, and then query within that root. `shadowRoot` is `null` for closed shadow roots, so this approach only reaches open ones.
+    ```javascript
+    // 1. Find the host element (custom element tag name, e.g., 'tool-details-panel')
+    const hostElement = document.querySelector('your-shadow-host-tag-name');
+
+    if (hostElement && hostElement.shadowRoot) {
+        const shadowRoot = hostElement.shadowRoot;
+
+        // 2. Query within the shadowRoot for target elements
+        const targetElementInShadow = shadowRoot.querySelector('#some-element-inside-shadow');
+        if (targetElementInShadow) {
+            // targetElementInShadow.click();
+            // return targetElementInShadow.textContent;
+        } else {
+            // return 'Element not found within shadowRoot';
+        }
+    } else {
+        // return 'Shadow host not found or no shadowRoot attached';
+    }
+    ```
+* **Deep Query Helper (Conceptual):** For nested shadow DOMs, or when the exact host is unknown, a recursive or iterative deep query function can be useful. It traverses the DOM, checking each element for a `shadowRoot` and searching within it.
+    ```javascript
+    function $deep(selector, rootNode = document) {
+        const stack = [rootNode];
+        while (stack.length) {
+            // shift() takes the oldest entry, so the search is breadth-first.
+            const currentNode = stack.shift();
+            if (currentNode.nodeType === Node.ELEMENT_NODE && currentNode.matches(selector)) {
+                return currentNode;
+            }
+            if (currentNode.shadowRoot) {
+                stack.push(currentNode.shadowRoot);
+            }
+            // Descend into Elements, the Document (the default root), and
+            // DocumentFragments (like a shadowRoot); all three expose `children`.
+            if (currentNode.nodeType === Node.ELEMENT_NODE ||
+                currentNode.nodeType === Node.DOCUMENT_NODE ||
+                currentNode.nodeType === Node.DOCUMENT_FRAGMENT_NODE) {
+                if (currentNode.children) { // Ensure children property exists
+                    stack.push(...currentNode.children);
+                }
+            }
+        }
+        return null;
+    }
+    // Usage: const someButton = $deep('button.some-class-in-shadow');
+    ```
+* **Challenges with AppleScript `do JavaScript`:**
+    * **Return Value Limitations:** Complex objects (like DOM elements) or very large strings (like extensive `outerHTML`) returned from `do JavaScript` can come back as `missing value` or empty strings in AppleScript, which makes debugging difficult.
+    * **Debugging:** Console output from `do JavaScript` is not visible to the AppleScript environment, which complicates troubleshooting JavaScript execution within Safari.
+    * **Reliability:** For highly dynamic UIs with extensive Shadow DOM, the AppleScript `do JavaScript` bridge may not be reliable enough for complex, multi-step interactions, especially when precise timing or access to nuanced DOM states is required. Direct API/tool calls, if available, are often more robust for verification in such cases.
+* **Discovering Shadow Host Tag Names:** If the tag name of a shadow host is unknown, one might attempt to list all elements that have a `shadowRoot`:
+    ```javascript
+    // JavaScript to be executed via AppleScript to list shadow host tag names.
+    // (Note: Return value handling by AppleScript needs to be robust, so return a JSON string.)
+    // The IIFE avoids a top-level `return` and keeps `hosts` out of the page's global scope.
+    (() => {
+        const hosts = [...document.querySelectorAll('*')]
+            .filter(el => el.shadowRoot)
+            .map(el => el.tagName);
+        return JSON.stringify(hosts);
+    })();
+    ```
+    However, executing this and returning the data via AppleScript `do JavaScript` proved unreliable in the attempts to automate the MCP Inspector.
+
+These notes capture the iterative process and key takeaways from the Safari automation for the MCP Inspector. The successful methods are now enshrined in `mcp-inspector.mdc`, while this document provides the background and context.
+
+---
+### Meta-Level Collaboration & Rule Evolution Notes
+
+* **Rule Refinement for Readability (User Feedback):** Based on user feedback, the main operational rule file (`mcp-inspector.mdc`) was refactored to move lengthy scripts (like the Safari tab setup AppleScript) into an Appendix section (e.g., `[Setup Safari Tab for Inspector]`). This keeps the main flow of the rule concise and readable for both humans and models, while still providing the full implementation details in a structured way. The `safari.mdc` file is designated for the more verbose, evolutionary notes and debugging narratives.
+* **Tool Usage Preferences (User Feedback):** User indicated a preference for using the `edit_file` tool for modifying rule files (like `.mdc` files) rather than `claude_code`. This allows the user to review the diff in their IDE before the change is effectively applied by the AI. This preference will be honored for future rule file modifications. diff --git a/.cursor/safari.mdc b/.cursor/safari.mdc new file mode 100644 index 0000000..e5653fb --- /dev/null +++ b/.cursor/safari.mdc @@ -0,0 +1,195 @@ +--- +description: +globs: +alwaysApply: false +--- +#### 5. MCP Inspector Specifics + +* **URL Consistency:** The MCP Inspector URL (`http://127.0.0.1:6274`) was found to be consistent between runs, simplifying Safari targeting. +* **"Connected" State vs. iTerm Logs:** A key finding was that the Safari Inspector UI can show "Connected" (and tools subsequently work) even if detailed `DEBUG`-level logs from the launched server process (`start.sh` -> `node dist/server.js`) do not appear in the iTerm console where `npx @modelcontextprotocol/inspector` is running. The Inspector seems to show its own proxying/connection logs, but the full stdout/stderr of the child might not always be visible there. This means successful connection and tool usability are the primary indicators, and absence of detailed server logs in the iTerm console is not necessarily a showstopper for basic interaction, though it would affect deeper debugging of the server itself. + +These notes capture the iterative process and key takeaways from the Safari automation for the MCP Inspector. The successful methods are now enshrined in `mcp-inspector.mdc`, while this document provides the background and context. + +This contrasts with `mcp-inspector.mdc`, which is designed to be the concise, polished, and operational ruleset for future automated runs once a specific automation flow (like connecting to the MCP Inspector) has been stabilized and proven reliable. 
`mcp-inspector.mdc` should contain the 'final' working scripts and minimal necessary commentary, while `safari.mdc` keeps the longer, exploratory record of how those scripts were found. + +* **Clarification on `[WORKSPACE_PATH]` Resolution:** The placeholder `[WORKSPACE_PATH]` used in rules (e.g., for script paths like `[WORKSPACE_PATH]/start.sh`) must be dynamically replaced by the AI with the absolute path of the current project workspace. This path is typically available to the AI from its context (e.g., derived from `user_info.workspace_path` or a similar environment variable). The AI must quote the resolved path correctly whenever it is used in shell commands or script arguments, especially if the path might contain spaces or special characters. For instance, a path like `/Users/username/My Projects/project-name` should be passed as `'/Users/username/My Projects/project-name'` in a shell command. + +--- + +### Strategies for Robust Element Selection + +When automating UI interactions, the reliability of your scripts heavily depends on how you identify and select HTML elements. Here's a hierarchy of preferences and tips for making your selectors more robust: + +1. **`data-testid` Attributes (Gold Standard):** + * **Why:** These are custom attributes specifically added for testing and automation. They are decoupled from styling and functional implementation details, making them the most resilient to UI changes. + * **Example (CSS):** `[data-testid='user-login-button']` + * **Example (XPath):** `//*[@data-testid='user-login-button']` + +2. **Unique `id` Attributes:** + * **Why:** `id` attributes are *supposed* to be unique within a page. If developers adhere to this, they are very reliable. + * **Example (CSS):** `#submit-form` + * **Example (XPath):** `//*[@id='submit-form']` + +3.
**Stable `aria-label`, `aria-labelledby`, `role`, or other Accessibility Attributes:** + * **Why:** Accessibility attributes are often more stable than class names used for styling, as they relate to the element's function and purpose. + * **Example (CSS):** `button[aria-label='Open settings']` + * **Example (XPath):** `//button[@aria-label='Open settings']` + +4. **Stable Class Names (Used for Structure/Function, Not Just Styling):** + * **Why:** Some class names indicate the structure or function of an element rather than just its appearance. These can be reasonably stable. Avoid classes that are purely presentational (e.g., `color-blue`, `margin-small`). + * **Example (CSS):** `.user-profile-card .username` (Contextual selection) + * **Example (XPath):** `//div[contains(@class, 'user-profile-card')]//span[contains(@class, 'username')]` + +5. **Structural XPaths (Based on DOM hierarchy):** + * **Why:** Relying on the element's position within the DOM (e.g., "the second `div` inside a `section` with a specific header"). These are more brittle than attribute-based selectors because any structural change can break them. Use sparingly and keep them as simple as possible. + * **Example (XPath):** `//section[@id='main-content']/div[2]/p` + +6. **Text-Based XPaths (Using visible text):** + * **Why:** Selecting elements based on their visible text content (e.g., a button with the text "Submit"). Can be useful, but prone to breakage if the text changes (e.g., for localization or wording updates). + * **Example (XPath):** `//button[text()='Submit']` or `//button[contains(text(), 'Submit')]` + * **Tip for Robustness:** Use XPath's `normalize-space()` function to handle variations in whitespace (leading, trailing, multiple internal spaces). + * `//button[normalize-space(text())='Submit']` (Matches " Submit ", "Submit", " Submit" etc.) 
+ * `//a[contains(normalize-space(.), 'Learn More')]` (Checks within any descendant text nodes) + +**General Tips for Selectors:** + +* **Prefer CSS Selectors for Simplicity and Speed:** When applicable, CSS selectors are often more concise and can be faster than XPaths. +* **Use Browser Developer Tools:** Actively use the "Inspect Element" feature in your browser to test and refine your CSS selectors and XPaths. Most dev tools allow you to directly test them. +* **Avoid Generated IDs/Classes:** Be wary of IDs or class names that look auto-generated (e.g., `id="ext-gen1234"`), as these are likely to change between page loads or application versions. +* **Context is Key:** Instead of overly complex global selectors, try to select a stable parent element first, then find the target element within that parent's context. This often leads to simpler and more reliable selectors. + +--- + +### Debugging AppleScript `do JavaScript` Execution Flow + +Successfully executing JavaScript via AppleScript's `do JavaScript` command often involves navigating two potential layers of errors: AppleScript parsing errors and JavaScript runtime errors. Here's how to approach debugging: + +**1. Differentiating Error Types:** + +* **AppleScript Compile-Time/Parsing Errors (e.g., `-2741`):** + * **Symptom:** The AppleScript editor shows an error, or the script fails immediately when run, often with error messages like "Syntax Error," "Expected end of line but found...", or specific error codes like `-2741` (which typically means the command couldn't be parsed correctly, often due to malformed strings or incorrect quoting). + * **Cause:** The AppleScript interpreter itself cannot understand the structure of your `do JavaScript "..."` command, usually due to incorrect quoting or escaping of characters *within the AppleScript string that defines the JavaScript code*. 
+ * **The JavaScript code itself hasn't even been sent to Safari yet.** + +* **JavaScript Runtime Errors:** + * **Symptom:** The AppleScript command runs without an immediate AppleScript error, but the desired action doesn't occur in Safari, or `do JavaScript` returns an error message from the JavaScript engine (e.g., "TypeError: null is not an object" or "SyntaxError: Unexpected identifier"). + * **Cause:** The JavaScript code was successfully passed to Safari, but the JavaScript engine encountered an error while trying to execute it (e.g., trying to access a property of a non-existent element, incorrect JS syntax, etc.). + +**2. Debugging AppleScript Syntax/Parsing Errors:** + +* **Simplify the JavaScript String:** Start with the simplest possible JavaScript that should work, e.g.: + ```applescript + tell application "Safari" + do JavaScript "'test';" in front document + end tell + ``` +* **Log the Constructed JavaScript String:** Before the `do JavaScript` line, use AppleScript's `log` command to print the exact JavaScript string you are about to send. This helps you visually inspect it for quoting issues. + ```applescript + set jsCommand to "document.getElementById(\"myButton\").click();" + log jsCommand + tell application "Safari" + do JavaScript jsCommand in front document + end tell + ``` + Check the logged output carefully in Script Editor's "Messages" tab. +* **Build Complex Strings Incrementally:** If your JavaScript is complex, build it in parts using AppleScript variables. This can make it easier to manage quoting for each part. +* **Master Quoting:** + * AppleScript string literals are always double-quoted (`"..."`), so escape every double quote inside the JavaScript as `\"` and every literal backslash (e.g., in a JS regular expression) as `\\`. JS single quotes need no escaping. + * If the JavaScript contains many single quotes, `character id 39` can keep the construction readable: `set sQuote to character id 39`, then `set jsCommand to "var name = " & sQuote & "Pete" & sQuote & ";"` + +**3.
Debugging JavaScript Runtime Errors:** + +* **Test in Safari's Web Inspector Console:** The most effective way to debug the JavaScript itself is to open Safari, navigate to the target page, open the Web Inspector (Develop > Show Web Inspector), and paste your JavaScript snippet directly into the Console. This gives immediate feedback and error messages and allows interactive debugging. +* **Use `try...catch` in Your JavaScript:** Wrap your JavaScript code in a `try...catch` block and return the error message to AppleScript; this makes it much easier to see what went wrong inside Safari. A bare top-level `return` is itself a SyntaxError, so put the code in an immediately invoked function expression (IIFE) and return from inside it. + ```applescript + set jsCommand to "(function () { try { document.getElementById('nonExistentElement').value = 'test'; return 'Success'; } catch (e) { return 'JS Error: ' + e.name + ': ' + e.message; } })();" + tell application "Safari" + set jsResult to do JavaScript jsCommand in front document + log jsResult + end tell + ``` +* **Return Values for Debugging:** Have your JavaScript return intermediate values or status indicators to AppleScript to understand its state. + ```applescript + set jsCommand to "(function () { var el = document.getElementById('myField'); return el ? 'Element found!' : 'Element NOT found.'; })();" + tell application "Safari" + log (do JavaScript jsCommand in front document) + end tell + ``` + +By systematically checking for AppleScript parsing issues first, then moving to debug the JavaScript logic within Safari's environment, you can effectively troubleshoot `do JavaScript` commands. + +--- + +### Advanced Asynchronous Handling: Polling for Conditions + +Web pages load and update content asynchronously. Relying on fixed `delay` commands in AppleScript after an action (like a click or page navigation) can be unreliable because the actual time needed for the UI to update can vary due to network speed, server load, or client-side processing.
+ +A more robust approach is to actively poll for a specific condition to be met (e.g., an element appearing, text changing, a certain JavaScript variable becoming true) before proceeding. This makes your scripts more resilient to timing variations. + +**How Polling Works:** + +1. Define the JavaScript code that checks for your desired condition (this should return `true` or `false`). +2. In AppleScript, create a loop that: + * Executes the JavaScript check. + * If the condition is met, exit the loop. + * If not, wait for a short interval (e.g., 0.5 seconds). + * Include a counter or timeout mechanism to prevent the loop from running indefinitely if the condition is never met. + +**Example: Polling for 'Connected' Status in MCP Inspector** + +This AppleScript snippet demonstrates polling for the text "Connected" to appear on the page after clicking the connect button: + +```applescript +-- JavaScript to check if the page body contains the text "Connected" +set jsCheckConnected to "document.body.innerText.includes('Connected');" + +set isNowConnected to false +set attempts to 0 +set maxAttempts to 20 -- Set a reasonable limit, e.g., 20 attempts +set pollInterval to 0.5 -- Wait 0.5 seconds between attempts + +log "Polling for 'Connected' status..." + +tell application "Safari" + tell front document + repeat while isNowConnected is false and attempts < maxAttempts + try + if (do JavaScript jsCheckConnected) is true then + set isNowConnected to true + log "Status changed to 'Connected' after " & (attempts + 1) & " attempts." 
+ else + delay pollInterval + end if + on error errMsg number errNum + log "Error during JavaScript check (attempt " & (attempts + 1) & "): " & errMsg & " (Number: " & errNum & ")" + -- Decide if you want to stop on error or just log and continue + delay pollInterval -- Still delay even if JS itself errored, maybe it's a temporary issue + end try + set attempts to attempts + 1 + end repeat + end tell +end tell + +if isNowConnected then + log "Successfully confirmed 'Connected' status via polling." + -- Proceed with next actions that depend on being connected +else + log "Failed to see 'Connected' status within " & (maxAttempts * pollInterval) & " seconds." + -- Handle the failure case (e.g., log error, stop script) +end if +``` + +**Benefits of Polling:** + +* **Increased Reliability:** Scripts wait only as long as necessary, adapting to real-time conditions rather than fixed, potentially too short or too long, delays. +* **Reduced Brittleness:** Less likely to fail due to unexpected slowdowns. +* **Clearer Intent:** The script explicitly states what condition it's waiting for. + +**Considerations:** + +* **Timeout:** Always implement a maximum number of attempts or a total timeout to prevent infinite loops if the condition never occurs. +* **Poll Interval:** Choose a reasonable interval. Too short can be resource-intensive; too long can make the script feel sluggish. +* **Error Handling:** Include `try...on error` blocks within your loop to gracefully handle potential errors during the JavaScript execution (e.g., if the page is still transitioning and elements are not yet available). 
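The same poll-with-timeout pattern can also be sketched in plain JavaScript, which is handy when prototyping the condition check in Safari's Web Inspector console. This is a minimal sketch under stated assumptions. `waitFor` is a hypothetical helper name that is not part of this project. The defaults (10 s timeout, 500 ms interval) are chosen to mirror the AppleScript loop above. As far as we know, `do JavaScript` returns the script's immediate result and does not wait for a Promise to settle, so AppleScript-driven flows should keep the AppleScript loop.

```javascript
// Hypothetical helper (not part of the project): polls `predicate` every
// `intervalMs` until it returns a truthy value or `timeoutMs` elapses.
// Resolves to true on success and false on timeout; it never rejects.
function waitFor(predicate, { timeoutMs = 10000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const tick = () => {
      let ok = false;
      try {
        ok = Boolean(predicate());
      } catch (err) {
        // Treat errors (e.g. an element not yet in the DOM) as "not yet".
        ok = false;
      }
      if (ok) {
        resolve(true);
      } else if (Date.now() >= deadline) {
        resolve(false);
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Example for the Web Inspector console on the MCP Inspector page:
// waitFor(() => document.body.innerText.includes('Connected'))
//   .then((ok) => console.log(ok ? 'Connected' : 'Timed out'));
```

Resolving to `false` instead of rejecting keeps the caller's control flow as simple as the `isNowConnected` flag in the AppleScript version.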
+ +--- +### Meta-Level Collaboration & Rule Evolution Notes + diff --git a/.cursor/scripts/peekaboo.scpt b/.cursor/scripts/peekaboo.scpt new file mode 100755 index 0000000..71db53f --- /dev/null +++ b/.cursor/scripts/peekaboo.scpt @@ -0,0 +1,1614 @@ +#!/usr/bin/osascript +-------------------------------------------------------------------------------- +-- peekaboo.scpt - v1.0.0 "Peekaboo Pro! 👀 → 📸 → 💾" +-- Enhanced screenshot capture with multi-window support and app discovery +-- Peekaboo - screenshot got you! Now you see it, now it's saved. +-- +-- IMPORTANT: This script uses non-interactive screencapture methods +-- Do NOT use flags like -o -W which require user interaction +-- Instead use -l for specific window capture +-------------------------------------------------------------------------------- + +--#region Configuration Properties +property scriptInfoPrefix : "Peekaboo 👀: " +property defaultScreenshotFormat : "png" +property captureDelay : 0.3 +property windowActivationDelay : 0.2 +property enhancedErrorReporting : true +property verboseLogging : false +property maxWindowTitleLength : 50 +-- AI Analysis Configuration +property defaultVisionModel : "qwen2.5vl:7b" +-- Prioritized list of vision models (best to fallback) +property visionModelPriority : {"qwen2.5vl:7b", "llava:7b", "llava-phi3:3.8b", "minicpm-v:8b", "gemma3:4b", "llava:latest", "qwen2.5vl:3b", "llava:13b", "llava-llama3:8b"} +-- AI Provider Configuration +property aiProvider : "auto" -- "auto", "ollama", "claude" +property claudeModel : "sonnet" -- default Claude model alias +-- AI Analysis Timeout (90 seconds) +property aiAnalysisTimeout : 90 +-- Image Resize Configuration +property defaultImageMaxDimension : 0 -- 0 means no resize, otherwise max width/height in pixels +property defaultAIResizePercent : 50 -- Default resize percentage for AI analysis (50 = 50%) +--#endregion Configuration Properties + +--#region Helper Functions +on isValidPath(thePath) + if thePath is not
"" and (thePath starts with "/") then + return true + end if + return false +end isValidPath + +on getFileExtension(filePath) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to "." + set pathParts to text items of filePath + set AppleScript's text item delimiters to oldDelims + if (count pathParts) > 1 then + return item -1 of pathParts + else + return "" + end if +end getFileExtension + +on ensureDirectoryExists(dirPath) + try + do shell script "mkdir -p " & quoted form of dirPath + return true + on error + return false + end try +end ensureDirectoryExists + +on sanitizeFilename(filename) + -- Replace problematic characters for filenames + set filename to my replaceText(filename, "/", "_") + set filename to my replaceText(filename, ":", "_") + set filename to my replaceText(filename, "*", "_") + set filename to my replaceText(filename, "?", "_") + set filename to my replaceText(filename, "\"", "_") + set filename to my replaceText(filename, "<", "_") + set filename to my replaceText(filename, ">", "_") + set filename to my replaceText(filename, "|", "_") + if (length of filename) > maxWindowTitleLength then + set filename to text 1 thru maxWindowTitleLength of filename + end if + return filename +end sanitizeFilename + +on sanitizeAppName(appName) + -- Create model-friendly app names (lowercase, underscores, no spaces) + set appName to my replaceText(appName, " ", "_") + set appName to my replaceText(appName, ".", "_") + set appName to my replaceText(appName, "-", "_") + set appName to my replaceText(appName, "/", "_") + set appName to my replaceText(appName, ":", "_") + set appName to my replaceText(appName, "*", "_") + set appName to my replaceText(appName, "?", "_") + set appName to my replaceText(appName, "\"", "_") + set appName to my replaceText(appName, "<", "_") + set appName to my replaceText(appName, ">", "_") + set appName to my replaceText(appName, "|", "_") + -- Convert to lowercase using shell + try + 
set appName to do shell script "echo " & quoted form of appName & " | tr '[:upper:]' '[:lower:]'" + on error + -- Fallback if shell command fails + end try + -- Limit length for readability + if (length of appName) > 20 then + set appName to text 1 thru 20 of appName + end if + return appName +end sanitizeAppName + +on replaceText(theText, searchStr, replaceStr) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to searchStr + set textItems to text items of theText + set AppleScript's text item delimiters to replaceStr + set newText to textItems as text + set AppleScript's text item delimiters to oldDelims + return newText +end replaceText + +on formatErrorMessage(errorType, errorMsg, context) + if enhancedErrorReporting then + set formattedMsg to scriptInfoPrefix & errorType & ": " & errorMsg + if context is not "" then + set formattedMsg to formattedMsg & " (Context: " & context & ")" + end if + return formattedMsg + else + return scriptInfoPrefix & errorMsg + end if +end formatErrorMessage + +on logVerbose(message) + if verboseLogging then + log "🔍 " & message + end if +end logVerbose + +on trimWhitespace(theText) + set whitespaceChars to {" ", tab} + set newText to theText + repeat while (newText is not "") and (character 1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 2 thru -1 of newText + else + set newText to "" + end if + end repeat + repeat while (newText is not "") and (character -1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 1 thru -2 of newText + else + set newText to "" + end if + end repeat + return newText +end trimWhitespace + +on formatCaptureOutput(outputPath, appName, mode, isQuiet) + if isQuiet then + return outputPath + else + set msg to scriptInfoPrefix & "Screenshot captured successfully! 
πŸ“Έ" & linefeed + set msg to msg & "β€’ File: " & outputPath & linefeed + set msg to msg & "β€’ App: " & appName & linefeed + set msg to msg & "β€’ Mode: " & mode + return msg + end if +end formatCaptureOutput + +on formatMultiOutput(capturedFiles, appName, isQuiet) + if isQuiet then + -- Just return paths separated by newlines + set paths to "" + repeat with fileInfo in capturedFiles + set filePath to item 1 of fileInfo + set paths to paths & filePath & linefeed + end repeat + return paths + else + set windowCount to count of capturedFiles + set msg to scriptInfoPrefix & "Multi-window capture successful! Captured " & windowCount & " window(s) for " & appName & ":" & linefeed + repeat with fileInfo in capturedFiles + set filePath to item 1 of fileInfo + set winTitle to item 2 of fileInfo + set msg to msg & " πŸ“Έ " & filePath & " β†’ \"" & winTitle & "\"" & linefeed + end repeat + return msg + end if +end formatMultiOutput + +on formatMultiWindowAnalysis(capturedFiles, analysisResults, appName, question, model, isQuiet) + if isQuiet then + -- In quiet mode, return condensed results + set output to "" + repeat with result in analysisResults + set winTitle to windowTitle of result + set answer to answer of result + set output to output & scriptInfoPrefix & "Window \"" & winTitle & "\": " & answer & linefeed + end repeat + return output + else + -- Full formatted output + set windowCount to count of capturedFiles + set msg to scriptInfoPrefix & "Multi-window AI Analysis Complete! 
πŸ€–" & linefeed & linefeed + set msg to msg & "πŸ“Έ App: " & appName & " (" & windowCount & " windows)" & linefeed + set msg to msg & "❓ Question: " & question & linefeed + set msg to msg & "πŸ€– Model: " & model & linefeed & linefeed + + set msg to msg & "πŸ’¬ Results for each window:" & linefeed & linefeed + + set windowNum to 1 + repeat with result in analysisResults + set winTitle to windowTitle of result + set winIndex to windowIndex of result + set answer to answer of result + set success to success of result + + set msg to msg & "πŸͺŸ Window " & windowNum & ": \"" & winTitle & "\"" & linefeed + if success then + set msg to msg & answer & linefeed & linefeed + else + set msg to msg & "⚠️ Analysis failed: " & answer & linefeed & linefeed + end if + + set windowNum to windowNum + 1 + end repeat + + -- Add timing info if available + set msg to msg & scriptInfoPrefix & "Analysis of " & windowCount & " windows complete." + + return msg + end if +end formatMultiWindowAnalysis +--#endregion Helper Functions + +--#region AI Analysis Functions +on checkOllamaAvailable() + try + -- Check if ollama command exists + do shell script "ollama --version >/dev/null 2>&1" + -- Check if ollama service is running by testing API + do shell script "curl -s http://localhost:11434/api/tags >/dev/null 2>&1" + return true + on error + return false + end try +end checkOllamaAvailable + +on checkClaudeAvailable() + try + -- Check if claude command exists + do shell script "claude --version >/dev/null 2>&1" + return true + on error + return false + end try +end checkClaudeAvailable + +on getAvailableVisionModels() + set availableModels to {} + try + set ollamaList to do shell script "ollama list 2>/dev/null | tail -n +2 | awk '{print $1}' | grep -v '^$'" + set modelLines to paragraphs of ollamaList + repeat with modelLine in modelLines + set modelName to contents of modelLine + if modelName is not "" then + set end of availableModels to modelName + end if + end repeat + on error + -- 
Return empty list if ollama list fails + end try + return availableModels +end getAvailableVisionModels + +on findBestVisionModel(requestedModel) + my logVerbose("Finding best vision model, requested: " & requestedModel) + + set availableModels to my getAvailableVisionModels() + my logVerbose("Available models: " & (availableModels as string)) + + -- If specific model requested and available, use it + if requestedModel is not defaultVisionModel then + repeat with availModel in availableModels + if contents of availModel is requestedModel then + my logVerbose("Using requested model: " & requestedModel) + return requestedModel + end if + end repeat + -- Requested model not found, will fall back to priority list + my logVerbose("Requested model '" & requestedModel & "' not found, checking priority list") + end if + + -- Find best available model from priority list + repeat with priorityModel in visionModelPriority + repeat with availModel in availableModels + if contents of availModel is contents of priorityModel then + my logVerbose("Using priority model: " & contents of priorityModel) + return contents of priorityModel + end if + end repeat + end repeat + + -- No priority models available, use first available vision model + repeat with availModel in availableModels + set modelName to contents of availModel + if modelName contains "llava" or modelName contains "qwen" or modelName contains "gemma" or modelName contains "minicpm" then + my logVerbose("Using first available vision model: " & modelName) + return modelName + end if + end repeat + + -- No vision models found + return "" +end findBestVisionModel + +on getOllamaInstallInstructions() + set instructions to scriptInfoPrefix & "AI Analysis requires Ollama with a vision model." & linefeed & linefeed + set instructions to instructions & "🚀 Quick Setup:" & linefeed + set instructions to instructions & "1.
Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh" & linefeed + set instructions to instructions & "2. Pull a vision model: ollama pull " & defaultVisionModel & linefeed + set instructions to instructions & "3. Models are ready to use!" & linefeed & linefeed + set instructions to instructions & "💡 Recommended models:" & linefeed + set instructions to instructions & " • qwen2.5vl:7b (6GB) - Best doc/chart understanding" & linefeed + set instructions to instructions & " • llava:7b (4.7GB) - Solid all-rounder" & linefeed + set instructions to instructions & " • llava-phi3:3.8b (2.9GB) - Tiny but chatty" & linefeed + set instructions to instructions & " • minicpm-v:8b (5.5GB) - Killer OCR" & linefeed & linefeed + set instructions to instructions & "Then retry your Peekaboo command with --ask or --analyze!" + return instructions +end getOllamaInstallInstructions + +on analyzeImageWithOllama(imagePath, question, requestedModel, resizeDimension) + my logVerbose("Analyzing image with AI: " & imagePath) + my logVerbose("Requested model: " & requestedModel) + my logVerbose("Question: " & question) + + -- Record start time + set startTime to do shell script "date +%s" + + -- Check if Ollama is available + if not my checkOllamaAvailable() then + return my formatErrorMessage("Ollama Error", "Ollama is not installed, not in PATH, or its service is not running." & linefeed & linefeed & my getOllamaInstallInstructions(), "ollama unavailable") + end if + + -- Find best available vision model + set modelToUse to my findBestVisionModel(requestedModel) + if modelToUse is "" then + return my formatErrorMessage("Model Error", "No vision models found."
& linefeed & linefeed & my getOllamaInstallInstructions(), "no vision models") + end if + + -- Query the vision model through Ollama's HTTP API (/api/generate) with a base64-encoded image + try + my logVerbose("Using model: " & modelToUse) + -- Check the image size first; large images get extra compression below + set imageSize to do shell script "wc -c < " & quoted form of imagePath + set imageSizeBytes to imageSize as number + my logVerbose("Image size: " & imageSize & " bytes") + + set processedImagePath to imagePath + -- Always resize for AI unless already resized by user + set compressedPath to "/tmp/peekaboo_ai_compressed.png" + + -- Check if we need to resize + set shouldResize to true + if resizeDimension > 0 then + -- User already specified resize, don't resize again + set shouldResize to false + my logVerbose("Image already resized by user to " & resizeDimension & " pixels, skipping AI resize") + end if + + if shouldResize then + if imageSizeBytes > 5000000 then -- 5MB threshold for additional compression + my logVerbose("Image is large (" & imageSize & " bytes), capping the longest edge at 2048 px for AI") + do shell script "sips -Z 2048 -s format png " & quoted form of imagePath & " --out " & quoted form of compressedPath + else + -- Otherwise scale by defaultAIResizePercent (50% by default) + my logVerbose("Applying default " & defaultAIResizePercent & "% resize for AI analysis") + -- Read the pixel dimensions and scale them; div keeps the results whole, so sips gets values like 540 rather than 540.0 + set dimensions to do shell script "sips -g pixelHeight -g pixelWidth " & quoted form of imagePath & " | grep -E 'pixelHeight|pixelWidth' | awk '{print $2}'" + set dimensionLines to paragraphs of dimensions + set imgHeight to (item 1 of dimensionLines) as integer + set imgWidth to (item 2 of dimensionLines) as integer + set newHeight to (imgHeight * defaultAIResizePercent) div 100 + set newWidth to (imgWidth * defaultAIResizePercent) div 100 + do shell script "sips -z "
& newHeight & " " & newWidth & " -s format png " & quoted form of imagePath & " --out " & quoted form of compressedPath + end if + set processedImagePath to compressedPath + end if + + set base64Image to do shell script "base64 -i " & quoted form of processedImagePath & " | tr -d '\\n'" + set jsonPayload to "{\"model\":\"" & modelToUse & "\",\"prompt\":\"" & my escapeJSON(question) & "\",\"images\":[\"" & base64Image & "\"],\"stream\":false}" + my logVerbose("Running API call to Ollama") + my logVerbose("JSON payload size: " & (length of jsonPayload) & " characters") + my logVerbose("Base64 image size: " & (length of base64Image) & " characters") + + -- Write JSON to temporary file using AppleScript file writing to avoid shell limitations + set jsonTempFile to "/tmp/peekaboo_ollama_request.json" + try + set fileRef to open for access (POSIX file jsonTempFile) with write permission + set eof of fileRef to 0 + -- Write as UTF-8 so non-ASCII characters in the question are preserved + write jsonPayload to fileRef as «class utf8» + close access fileRef + on error writeErr + try + close access fileRef + end try + -- Fail loudly instead of sending a stale or empty request file + error "Could not write request file: " & writeErr + end try + -- Limit the request to aiAnalysisTimeout seconds (90 by default) via curl's --max-time + set curlCmd to "curl -s -X POST http://localhost:11434/api/generate -H 'Content-Type: application/json' -d @" & quoted form of jsonTempFile & " --max-time " & aiAnalysisTimeout + + set response to do shell script curlCmd + + -- Parse JSON response + set responseStart to (offset of "\"response\":\"" in response) + 12 + if responseStart > 12 then + set responseEnd to responseStart + set inEscape to false + repeat with i from responseStart to (length of response) + set char to character i of response + if inEscape then + set inEscape to false + else if char is "\\" then + set inEscape to true + else if char is "\"" then + set responseEnd to i - 1 + exit repeat + end if + end repeat + + set aiResponse to text responseStart thru responseEnd of response + -- Unescape JSON + set aiResponse to my replaceText(aiResponse, "\\n", linefeed) + set aiResponse to my replaceText(aiResponse, "\\\"", "\"") + set
aiResponse to my replaceText(aiResponse, "\\\\", "\\") + else + error "Could not parse response: " & response + end if + + -- Calculate elapsed time + set endTime to do shell script "date +%s" + set elapsedTime to (endTime as number) - (startTime as number) + -- Simple formatting - just show seconds + set elapsedTimeFormatted to elapsedTime as string + + set resultMsg to scriptInfoPrefix & "AI Analysis Complete! 🤖" & linefeed & linefeed + set resultMsg to resultMsg & "📸 Image: " & imagePath & linefeed + set resultMsg to resultMsg & "❓ Question: " & question & linefeed + set resultMsg to resultMsg & "🤖 Model: " & modelToUse & linefeed & linefeed + set resultMsg to resultMsg & "💬 Answer:" & linefeed & aiResponse & linefeed & linefeed + set resultMsg to resultMsg & scriptInfoPrefix & "Analysis via " & modelToUse & " took " & elapsedTimeFormatted & " sec." + + return resultMsg + + on error errMsg + -- Calculate elapsed time even on error + set endTime to do shell script "date +%s" + set elapsedTime to (endTime as number) - (startTime as number) + + -- curl runs with -s, so a --max-time timeout usually surfaces only as a non-zero exit status; check elapsed time too + if errMsg contains "timed out" or errMsg contains "timeout" or elapsedTime ≥ aiAnalysisTimeout then + return my formatErrorMessage("Timeout Error", "AI analysis timed out after " & aiAnalysisTimeout & " seconds." & linefeed & linefeed & "The model '" & modelToUse & "' may be too large or slow for your system." & linefeed & linefeed & "Try:" & linefeed & "• Using a smaller model (e.g., llava-phi3:3.8b)" & linefeed & "• Checking if Ollama is responding: ollama list" & linefeed & "• Restarting Ollama service", "timeout") + else if errMsg contains "model" and errMsg contains "not found" then + return my formatErrorMessage("Model Error", "Model '" & modelToUse & "' not found."
& linefeed & linefeed & "Install it with: ollama pull " & modelToUse & linefeed & linefeed & my getOllamaInstallInstructions(), "model not found") + else + return my formatErrorMessage("Analysis Error", "Failed to analyze image: " & errMsg & linefeed & linefeed & "Make sure Ollama is running and the model is available.", "ollama execution") + end if + end try +end analyzeImageWithOllama + +on escapeJSON(inputText) + set escapedText to my replaceText(inputText, "\\", "\\\\") + set escapedText to my replaceText(escapedText, "\"", "\\\"") + set escapedText to my replaceText(escapedText, linefeed, "\\n") + set escapedText to my replaceText(escapedText, return, "\\n") + set escapedText to my replaceText(escapedText, tab, "\\t") + return escapedText +end escapeJSON + +on analyzeImageWithClaude(imagePath, question, modelAlias) + my logVerbose("Analyzing image with Claude: " & imagePath) + my logVerbose("Model: " & modelAlias) + my logVerbose("Question: " & question) + + -- Record start time + set startTime to do shell script "date +%s" + + -- Check if Claude is available + if not my checkClaudeAvailable() then + return my formatErrorMessage("Claude Error", "Claude CLI is not installed." & linefeed & linefeed & "Install it from: https://claude.ai/code", "claude unavailable") + end if + + -- Get Claude version + set claudeVersion to "" + try + set claudeVersion to do shell script "claude --version 2>/dev/null | head -1" + on error + set claudeVersion to "unknown" + end try + + try + -- Note: Claude CLI doesn't support direct image file analysis + -- This is a limitation of the current Claude CLI implementation + set errorMsg to "Claude CLI currently doesn't support direct image file analysis."
& linefeed & linefeed + set errorMsg to errorMsg & "Claude can analyze images through:" & linefeed + set errorMsg to errorMsg & "• Copy/paste images in interactive mode" & linefeed + set errorMsg to errorMsg & "• MCP (Model Context Protocol) integrations" & linefeed & linefeed + set errorMsg to errorMsg & "For automated image analysis, please use Ollama with vision models instead." + + -- Calculate elapsed time even for error + set endTime to do shell script "date +%s" + set elapsedTime to (endTime as number) - (startTime as number) + set elapsedTimeFormatted to elapsedTime as string + + set errorMsg to errorMsg & linefeed & linefeed & scriptInfoPrefix & "Claude " & claudeVersion & " check took " & elapsedTimeFormatted & " sec." + + return my formatErrorMessage("Claude Limitation", errorMsg, "feature not supported") + + on error errMsg + return my formatErrorMessage("Claude Analysis Error", "Failed to analyze image with Claude: " & errMsg, "claude execution") + end try +end analyzeImageWithClaude + +on analyzeImageWithAI(imagePath, question, requestedModel, requestedProvider, resizeDimension) + my logVerbose("Starting AI analysis with smart provider selection") + my logVerbose("Requested provider: " & requestedProvider) + + -- Determine which AI provider to use + set ollamaAvailable to my checkOllamaAvailable() + set claudeAvailable to my checkClaudeAvailable() + + my logVerbose("Ollama available: " & ollamaAvailable) + my logVerbose("Claude available: " & claudeAvailable) + + -- If neither is available, provide helpful error + if not ollamaAvailable and not claudeAvailable then + set errorMsg to "Neither Ollama nor Claude CLI is installed."
& linefeed & linefeed + set errorMsg to errorMsg & "Install one of these AI providers:" & linefeed & linefeed + set errorMsg to errorMsg & "🤖 Ollama (local, privacy-focused):" & linefeed + set errorMsg to errorMsg & my getOllamaInstallInstructions() & linefeed & linefeed + set errorMsg to errorMsg & "☁️ Claude CLI (cloud-based):" & linefeed + set errorMsg to errorMsg & "Install from: https://claude.ai/code" + return my formatErrorMessage("No AI Provider", errorMsg, "no ai provider") + end if + + -- Smart selection based on availability and preference + if requestedProvider is "ollama" and ollamaAvailable then + return my analyzeImageWithOllama(imagePath, question, requestedModel, resizeDimension) + else if requestedProvider is "claude" and claudeAvailable then + return my analyzeImageWithClaude(imagePath, question, requestedModel) + else if requestedProvider is "auto" then + -- Auto mode: prefer Ollama, fallback to Claude + if ollamaAvailable then + return my analyzeImageWithOllama(imagePath, question, requestedModel, resizeDimension) + else if claudeAvailable then + return my analyzeImageWithClaude(imagePath, question, requestedModel) + end if + else + -- Requested provider not available, try the other one + if ollamaAvailable then + my logVerbose("Requested provider not available, using Ollama instead") + return my analyzeImageWithOllama(imagePath, question, requestedModel, resizeDimension) + else if claudeAvailable then + my logVerbose("Requested provider not available, using Claude instead") + return my analyzeImageWithClaude(imagePath, question, requestedModel) + end if + end if + + -- Should never reach here + return my formatErrorMessage("Provider Error", "Unable to determine AI provider", "provider selection") +end analyzeImageWithAI +--#endregion AI Analysis Functions + +--#region App Discovery Functions +on listRunningApps() + set appList to {} + try + tell application "System Events" + repeat with proc in (every application process whose background
only is false) + try + set appName to name of proc + set bundleID to bundle identifier of proc + set windowCount to count of windows of proc + set windowTitles to {} + + if windowCount > 0 then + repeat with win in windows of proc + try + set winTitle to title of win + if winTitle is not "" then + set end of windowTitles to winTitle + end if + on error + -- Skip windows without accessible titles + end try + end repeat + end if + + set end of appList to {appName:appName, bundleID:bundleID, windowCount:windowCount, windowTitles:windowTitles} + on error + -- Skip apps we can't access + end try + end repeat + end tell + on error errMsg + return my formatErrorMessage("Discovery Error", "Failed to enumerate running applications: " & errMsg, "app enumeration") + end try + return appList +end listRunningApps + +on formatAppList(appList) + if appList starts with scriptInfoPrefix then + return appList -- Error message + end if + + set output to scriptInfoPrefix & "Running Applications:" & linefeed & linefeed + + repeat with appInfo in appList + set appName to appName of appInfo + set bundleID to bundleID of appInfo + set windowCount to windowCount of appInfo + set windowTitles to windowTitles of appInfo + + set output to output & "• " & appName & " (" & bundleID & ")" & linefeed + set output to output & " Windows: " & windowCount + + if windowCount > 0 and (count of windowTitles) > 0 then + set output to output & linefeed + repeat with winTitle in windowTitles + set output to output & " - \"" & winTitle & "\"" & linefeed + end repeat + else + set output to output & linefeed + end if + set output to output & linefeed + end repeat + + return output +end formatAppList +--#endregion App Discovery Functions + +--#region App Resolution Functions +on resolveAppIdentifier(appIdentifier) + my logVerbose("Resolving app identifier: " & appIdentifier) + + -- Resolution priority order: + -- 1. Exact app name match (running apps) + -- 2.
Fuzzy matching for running apps (case-insensitive, partial, common mappings) + -- 3. Exact match in /Applications (not running) + -- 4. Fuzzy match in /Applications (not running) + -- 5. Bundle ID lookup (last resort) + + -- PRIORITY 1: Try as application name for running apps (exact match) + try + tell application "System Events" + set nameApps to (every application process whose name is appIdentifier) + if (count nameApps) > 0 then + set targetApp to item 1 of nameApps + set actualAppName to name of targetApp -- Get the actual name with correct case + try + set bundleID to bundle identifier of targetApp + on error + set bundleID to "" + end try + my logVerbose("Found running app by name: " & actualAppName) + return {appName:actualAppName, bundleID:bundleID, isRunning:true, resolvedBy:"app_name"} + end if + end tell + on error + my logVerbose("App name lookup failed for running processes") + end try + + -- PRIORITY 2: Try fuzzy matching for running apps + try + tell application "System Events" + set allApps to every application process + set appIdentifierLower to do shell script "echo " & quoted form of appIdentifier & " | tr '[:upper:]' '[:lower:]'" + + -- Try case-insensitive exact match first + repeat with appProc in allApps + set appName to name of appProc + set appNameLower to do shell script "echo " & quoted form of appName & " | tr '[:upper:]' '[:lower:]'" + if appNameLower is appIdentifierLower then + try + set bundleID to bundle identifier of appProc + on error + set bundleID to "" + end try + my logVerbose("Found running app by case-insensitive match: " & appName) + return {appName:appName, bundleID:bundleID, isRunning:true, resolvedBy:"case_insensitive"} + end if + end repeat + + -- Try partial match (app identifier is contained in app name) + repeat with appProc in allApps + set appName to name of appProc + set appNameLower to do shell script "echo " & quoted form of appName & " | tr '[:upper:]' '[:lower:]'" + if appNameLower contains 
appIdentifierLower then + try + set bundleID to bundle identifier of appProc + on error + set bundleID to "" + end try + my logVerbose("Found running app by partial match: " & appName & " (searched for: " & appIdentifier & ")") + return {appName:appName, bundleID:bundleID, isRunning:true, resolvedBy:"partial_match"} + end if + end repeat + + -- Try common variations (e.g., "Chrome" -> "Google Chrome", "Code" -> "Visual Studio Code") + -- Only include apps that are actually running + set commonMappings to {{"chrome", "Google Chrome"}, {"safari", "Safari"}, {"firefox", "Firefox"}, {"code", {"Visual Studio Code", "Visual Studio Code - Insiders"}}, {"vscode", {"Visual Studio Code", "Visual Studio Code - Insiders"}}, {"terminal", "Terminal"}, {"term", "Terminal"}, {"iterm", "iTerm2"}, {"iterm2", "iTerm2"}, {"slack", "Slack"}, {"zoom", "zoom.us"}, {"teams", "Microsoft Teams"}, {"outlook", "Microsoft Outlook"}, {"excel", "Microsoft Excel"}, {"word", "Microsoft Word"}, {"powerpoint", "Microsoft PowerPoint"}, {"keynote", "Keynote"}, {"pages", "Pages"}, {"numbers", "Numbers"}} + + repeat with mapping in commonMappings + if appIdentifierLower is item 1 of mapping then + set mappedAppNames to item 2 of mapping + -- Handle both single string and list of strings + if class of mappedAppNames is text then + set mappedAppNames to {mappedAppNames} + end if + + -- Try each possible mapped name + repeat with mappedAppName in mappedAppNames + repeat with appProc in allApps + if name of appProc is contents of mappedAppName then + try + set bundleID to bundle identifier of appProc + on error + set bundleID to "" + end try + my logVerbose("Found running app by common name mapping: " & contents of mappedAppName & " (searched for: " & appIdentifier & ")") + return {appName:contents of mappedAppName, bundleID:bundleID, isRunning:true, resolvedBy:"common_mapping"} + end if + end repeat + end repeat + end if + end repeat + end tell + on error errMsg + my logVerbose("Fuzzy matching failed: " & 
errMsg) + end try + + -- PRIORITY 3: Try to find the app in /Applications (not running) + try + set appPath to "/Applications/" & appIdentifier & ".app" + tell application "System Events" + if exists file appPath then + try + set bundleID to bundle identifier of file appPath + on error + set bundleID to "" + end try + my logVerbose("Found app in /Applications: " & appIdentifier) + return {appName:appIdentifier, bundleID:bundleID, isRunning:false, resolvedBy:"applications_folder"} + end if + end tell + on error + my logVerbose("/Applications lookup failed") + end try + + -- PRIORITY 4: Try fuzzy matching in /Applications folder + try + set appIdentifierLower to do shell script "echo " & quoted form of appIdentifier & " | tr '[:upper:]' '[:lower:]'" + set appFiles to do shell script "ls /Applications/ | grep -E '\\.app$' || true" + set appFileList to paragraphs of appFiles + + repeat with appFile in appFileList + set appNameOnly to text 1 thru -5 of appFile -- Remove ".app" extension + set appNameLower to do shell script "echo " & quoted form of appNameOnly & " | tr '[:upper:]' '[:lower:]'" + + -- Check if fuzzy match + if appNameLower contains appIdentifierLower or appIdentifierLower contains appNameLower then + set appPath to "/Applications/" & appFile + tell application "System Events" + if exists file appPath then + try + set bundleID to bundle identifier of file appPath + on error + set bundleID to "" + end try + my logVerbose("Found app in /Applications by fuzzy match: " & appNameOnly & " (searched for: " & appIdentifier & ")") + return {appName:appNameOnly, bundleID:bundleID, isRunning:false, resolvedBy:"applications_fuzzy"} + end if + end tell + end if + end repeat + on error errMsg + my logVerbose("/Applications fuzzy search failed: " & errMsg) + end try + + -- PRIORITY 5: Try as bundle ID for running apps (last resort) + try + tell application "System Events" + set bundleApps to (every application process whose bundle identifier is appIdentifier) + if 
(count bundleApps) > 0 then + set targetApp to item 1 of bundleApps + set appName to name of targetApp + my logVerbose("Found running app by bundle ID: " & appName) + return {appName:appName, bundleID:appIdentifier, isRunning:true, resolvedBy:"bundle_id"} + end if + end tell + on error + my logVerbose("Bundle ID lookup failed") + end try + + -- If it looks like a bundle ID, try launching it directly + if appIdentifier contains "." then + try + tell application "System Events" + launch application id appIdentifier + delay windowActivationDelay + set bundleApps to (every application process whose bundle identifier is appIdentifier) + if (count bundleApps) > 0 then + set targetApp to item 1 of bundleApps + set appName to name of targetApp + my logVerbose("Successfully launched app by bundle ID: " & appName) + return {appName:appName, bundleID:appIdentifier, isRunning:true, resolvedBy:"bundle_id_launch"} + end if + end tell + on error errMsg + my logVerbose("Bundle ID launch failed: " & errMsg) + end try + end if + + return missing value +end resolveAppIdentifier + +on getAppWindows(appName) + set windowInfo to {} + set windowCount to 0 + set accessibleWindows to 0 + + try + tell application "System Events" + tell process appName + set windowCount to count of windows + repeat with i from 1 to windowCount + try + set win to window i + set winTitle to title of win + if winTitle is "" then set winTitle to "Untitled Window " & i + set end of windowInfo to {winTitle, i} + set accessibleWindows to accessibleWindows + 1 + on error + set end of windowInfo to {("Window " & i), i} + end try + end repeat + end tell + end tell + on error errMsg + my logVerbose("Failed to get windows for " & appName & ": " & errMsg) + return {windowInfo:windowInfo, totalWindows:0, accessibleWindows:0, errorMsg:errMsg} + end try + + return {windowInfo:windowInfo, totalWindows:windowCount, accessibleWindows:accessibleWindows, errorMsg:""} +end getAppWindows + +on getAppWindowStatus(appName) + set 
windowStatus to my getAppWindows(appName) + set windowInfo to windowInfo of windowStatus + set totalWindows to totalWindows of windowStatus + set accessibleWindows to accessibleWindows of windowStatus + set windowError to errorMsg of windowStatus + + if windowError is not "" then + return my formatErrorMessage("Window Access Error", "Cannot access windows for app '" & appName & "': " & windowError & ". The app may not be running or may not have accessibility permissions.", "window enumeration") + end if + + if totalWindows = 0 then + return my formatErrorMessage("No Windows Error", "App '" & appName & "' is running but has no windows open. Peekaboo needs at least one window to capture. Please open a window in " & appName & " and try again.", "zero windows") + end if + + if accessibleWindows = 0 and totalWindows > 0 then + return my formatErrorMessage("Window Access Error", "App '" & appName & "' has " & totalWindows & " window(s) but none are accessible. This may require accessibility permissions in System Preferences > Security & Privacy > Accessibility.", "accessibility required") + end if + + -- Success case + set statusMsg to "App '" & appName & "' has " & totalWindows & " window(s)" + if accessibleWindows < totalWindows then + set statusMsg to statusMsg & " (" & accessibleWindows & " accessible)" + end if + + return {status:"success", message:statusMsg, windowInfo:windowInfo, totalWindows:totalWindows, accessibleWindows:accessibleWindows} +end getAppWindowStatus + +on bringAppToFront(appInfo) + set appName to appName of appInfo + set isRunning to isRunning of appInfo + + my logVerbose("Bringing app to front: " & appName & " (running: " & isRunning & ")") + + -- Skip app focus for fullscreen mode + if appName is "fullscreen" then + my logVerbose("Fullscreen mode - skipping app focus") + return "" + end if + + if not isRunning then + try + tell application appName to activate + delay windowActivationDelay + on error errMsg + return my 
formatErrorMessage("Activation Error", "Failed to launch app '" & appName & "': " & errMsg, "app launch") + end try + else + try + tell application "System Events" + tell process appName + set frontmost to true + end tell + end tell + delay windowActivationDelay + on error errMsg + return my formatErrorMessage("Focus Error", "Failed to bring app '" & appName & "' to front: " & errMsg, "app focus") + end try + end if + + return "" +end bringAppToFront +--#endregion App Resolution Functions + +--#region Screenshot Functions +on captureScreenshot(outputPath, captureMode, appName, resizeDimension) + my logVerbose("Capturing screenshot to: " & outputPath & " (mode: " & captureMode & ", resize: " & resizeDimension & ")") + + -- Ensure output directory exists + set outputDir to do shell script "dirname " & quoted form of outputPath + if not my ensureDirectoryExists(outputDir) then + return my formatErrorMessage("Directory Error", "Could not create output directory: " & outputDir, "directory creation") + end if + + -- Wait for capture delay + delay captureDelay + + -- Determine screenshot format + set fileExt to my getFileExtension(outputPath) + if fileExt is "" then + set fileExt to defaultScreenshotFormat + set outputPath to outputPath & "." & fileExt + end if + + -- Build screencapture command based on mode + set screencaptureCmd to "screencapture -x" + + if captureMode is "window" and appName is not "fullscreen" then + -- IMPORTANT: Do NOT use -o -W flags as they require user interaction! 
+ -- Instead, capture the window using its bounds (position and size) + try + tell application "System Events" + tell process appName + set winPosition to position of window 1 + set winSize to size of window 1 + end tell + end tell + + set x to item 1 of winPosition + set y to item 2 of winPosition + set w to item 1 of winSize + set h to item 2 of winSize + + -- Use -R flag to capture a specific rectangle + set screencaptureCmd to screencaptureCmd & " -R" & x & "," & y & "," & w & "," & h + my logVerbose("Capturing window bounds for " & appName & ": " & x & "," & y & "," & w & "," & h) + on error errMsg + -- Fallback to full screen if we can't get window bounds + my logVerbose("Could not get window bounds for " & appName & ", error: " & errMsg) + log scriptInfoPrefix & "Warning: Could not capture window bounds for " & appName & ", using full screen capture instead" + end try + end if + + -- Add format flag if not PNG (default) + if fileExt is not "png" then + set screencaptureCmd to screencaptureCmd & " -t " & fileExt + end if + + -- Add output path + set screencaptureCmd to screencaptureCmd & " " & quoted form of outputPath + + -- Capture screenshot + try + my logVerbose("Running: " & screencaptureCmd) + do shell script screencaptureCmd + + -- Verify file was created + try + do shell script "test -f " & quoted form of outputPath + + -- Apply resize if requested + if resizeDimension > 0 then + my logVerbose("Resizing image to max dimension: " & resizeDimension) + try + do shell script "sips -Z " & resizeDimension & " " & quoted form of outputPath + my logVerbose("Image resized successfully") + on error resizeErr + my logVerbose("Failed to resize image: " & resizeErr) + -- Continue without resize on error + end try + end if + + return outputPath + on error + return my formatErrorMessage("Capture Error", "Screenshot file was not created at: " & outputPath, "file verification") + end try + + on error errMsg number errNum + -- Enhanced error handling for common 
screencapture issues + if errMsg contains "not authorized" or errMsg contains "Screen Recording" then + return my formatErrorMessage("Permission Error", "Screen Recording permission required. Please go to System Preferences > Security & Privacy > Screen Recording and add your terminal app to the allowed list. Then restart your terminal and try again.", "screen recording permission") + else if errMsg contains "No such file" then + return my formatErrorMessage("Path Error", "Cannot create screenshot at '" & outputPath & "'. Check that the directory exists and you have write permissions.", "file creation") + else if errMsg contains "Permission denied" then + return my formatErrorMessage("Permission Error", "Permission denied writing to '" & outputPath & "'. Check file/directory permissions or try a different location like /tmp/", "write permission") + else + return my formatErrorMessage("Capture Error", "screencapture failed: " & errMsg & ". This may be due to permissions, disk space, or system restrictions.", "error " & errNum) + end if + end try +end captureScreenshot + +on captureWindowByIndex(outputPath, appName, winIndex, resizeDimension) + my logVerbose("Capturing window " & winIndex & " of " & appName & " to: " & outputPath & " (resize: " & resizeDimension & ")") + + -- Ensure output directory exists + set outputDir to do shell script "dirname " & quoted form of outputPath + if not my ensureDirectoryExists(outputDir) then + return my formatErrorMessage("Directory Error", "Could not create output directory: " & outputDir, "directory creation") + end if + + -- Wait for capture delay + delay captureDelay + + -- Determine screenshot format + set fileExt to my getFileExtension(outputPath) + if fileExt is "" then + set fileExt to defaultScreenshotFormat + set outputPath to outputPath & "." 
& fileExt + end if + + -- Build screencapture command + set screencaptureCmd to "screencapture -x" + + -- Get window bounds for specific window + try + tell application "System Events" + tell process appName + set winPosition to position of window winIndex + set winSize to size of window winIndex + end tell + end tell + + set x to item 1 of winPosition + set y to item 2 of winPosition + set w to item 1 of winSize + set h to item 2 of winSize + + -- Use -R flag to capture a specific rectangle + set screencaptureCmd to screencaptureCmd & " -R" & x & "," & y & "," & w & "," & h + my logVerbose("Capturing window " & winIndex & " bounds: " & x & "," & y & "," & w & "," & h) + on error errMsg + -- No full-screen fallback here: without bounds this specific window cannot be targeted, so report an error + my logVerbose("Could not get bounds for window " & winIndex & ", error: " & errMsg) + return my formatErrorMessage("Window Error", "Could not get bounds for window " & winIndex & " of " & appName, "window bounds") + end try + + -- Add format flag if not PNG (default) + if fileExt is not "png" then + set screencaptureCmd to screencaptureCmd & " -t " & fileExt + end if + + -- Add output path + set screencaptureCmd to screencaptureCmd & " " & quoted form of outputPath + + -- Capture screenshot + try + my logVerbose("Running: " & screencaptureCmd) + do shell script screencaptureCmd + + -- Verify file was created + try + do shell script "test -f " & quoted form of outputPath + + -- Apply resize if requested + if resizeDimension > 0 then + my logVerbose("Resizing image to max dimension: " & resizeDimension) + try + do shell script "sips -Z " & resizeDimension & " " & quoted form of outputPath + my logVerbose("Image resized successfully") + on error resizeErr + my logVerbose("Failed to resize image: " & resizeErr) + -- Continue without resize on error + end try + end if + + return outputPath + on error + return my formatErrorMessage("Capture Error", "Screenshot file was not created at: " & outputPath, "file verification") + end
try + + on error errMsg + return my formatErrorMessage("Capture Error", "Failed to capture window: " & errMsg, "screencapture") + end try +end captureWindowByIndex + +on captureMultipleWindows(appName, baseOutputPath, resizeDimension) + -- Get detailed window status first + set windowStatus to my getAppWindowStatus(appName) + + -- Check if it's an error (string) or success (record) + try + set statusClass to class of windowStatus + if statusClass is text or statusClass is string then + -- It's an error message + return windowStatus + end if + on error + -- Assume it's a record and continue + end try + + -- Extract window info from successful status + set windowInfo to windowInfo of windowStatus + set totalWindows to totalWindows of windowStatus + set accessibleWindows to accessibleWindows of windowStatus + set capturedFiles to {} + + my logVerbose("Multi-window capture: " & totalWindows & " total, " & accessibleWindows & " accessible") + + if (count of windowInfo) = 0 then + return my formatErrorMessage("Multi-Window Error", "App '" & appName & "' has no accessible windows for multi-window capture. Try using single screenshot mode instead, or ensure the app has open windows.", "no accessible windows") + end if + + -- Get base path components + set outputDir to do shell script "dirname " & quoted form of baseOutputPath + set baseName to do shell script "basename " & quoted form of baseOutputPath + set fileExt to my getFileExtension(baseName) + if fileExt is not "" then + set baseNameNoExt to text 1 thru -((length of fileExt) + 2) of baseName + else + set baseNameNoExt to baseName + set fileExt to defaultScreenshotFormat + end if + + my logVerbose("Capturing " & (count of windowInfo) & " windows for " & appName) + + repeat with winInfo in windowInfo + set winTitle to item 1 of winInfo + set winIndex to item 2 of winInfo + set sanitizedTitle to my sanitizeAppName(winTitle) + + set windowFileName to baseNameNoExt & "_window_" & winIndex & "_" & sanitizedTitle & "." 
& fileExt + set windowOutputPath to outputDir & "/" & windowFileName + + -- Focus the specific window first + try + tell application "System Events" + tell process appName + set frontmost to true + tell window winIndex + perform action "AXRaise" + end tell + end tell + end tell + delay 0.1 + on error + my logVerbose("Could not focus window " & winIndex & ", continuing anyway") + end try + + -- Capture the specific window using a custom method + set captureResult to my captureWindowByIndex(windowOutputPath, appName, winIndex, resizeDimension) + if captureResult starts with scriptInfoPrefix then + -- Error occurred, but continue with other windows + my logVerbose("Failed to capture window " & winIndex & ": " & captureResult) + else + set end of capturedFiles to {captureResult, winTitle, winIndex} + end if + end repeat + + return capturedFiles +end captureMultipleWindows +--#endregion Screenshot Functions + +--#region Main Script Logic (on run) +on run argv + set appSpecificErrorOccurred to false + try + my logVerbose("Starting Peekaboo v2.0.0") + + set argCount to count argv + + -- Initialize all variables + set command to "" -- "capture", "analyze", "list", "help" + set appIdentifier to "" + set outputPath to "" + set outputSpecified to false + set captureMode to "" -- will be determined + set forceFullscreen to false + set multiWindow to false + set analyzeMode to false + set analysisQuestion to "" + set visionModel to defaultVisionModel + set requestedProvider to aiProvider + set outputFormat to "" + set quietMode to false + set resizeDimension to defaultImageMaxDimension + + -- Handle no arguments - default to fullscreen + if argCount = 0 then + set command to "capture" + set forceFullscreen to true + else + -- Check first argument for commands + set firstArg to item 1 of argv + if firstArg is "list" or firstArg is "ls" then + return my formatAppList(my listRunningApps()) + else if firstArg is "help" or firstArg is "-h" or firstArg is "--help" then + return my 
usageText() + else if firstArg is "analyze" then + set command to "analyze" + -- analyze command requires at least image and question + if argCount < 3 then + return my formatErrorMessage("Argument Error", "analyze command requires: analyze \"question\"" & linefeed & linefeed & my usageText(), "validation") + end if + set appIdentifier to item 2 of argv -- actually the image path + set analysisQuestion to item 3 of argv + set analyzeMode to true + else + -- Regular capture command + set command to "capture" + -- Check if first arg is a flag or app name + if not (firstArg starts with "-") then + set appIdentifier to firstArg + end if + end if + end if + + -- Parse remaining arguments + set i to 1 + if command is "analyze" then set i to 4 -- Skip "analyze image question" + if command is "capture" and appIdentifier is not "" then set i to 2 -- Skip app name + + repeat while i ≤ argCount + set arg to item i of argv + + -- Handle flags with values + if arg is "--output" or arg is "-o" then + if i < argCount then + set i to i + 1 + set outputPath to item i of argv + set outputSpecified to true + else + return my formatErrorMessage("Argument Error", arg & " requires a path parameter", "validation") + end if + else if arg is "--ask" or arg is "-a" then + if i < argCount then + set i to i + 1 + set analysisQuestion to item i of argv + set analyzeMode to true + else + return my formatErrorMessage("Argument Error", arg & " requires a question parameter", "validation") + end if + else if arg is "--model" then + if i < argCount then + set i to i + 1 + set visionModel to item i of argv + else + return my formatErrorMessage("Argument Error", "--model requires a model name parameter", "validation") + end if + else if arg is "--provider" then + if i < argCount then + set i to i + 1 + set requestedProvider to item i of argv + if requestedProvider is not "auto" and requestedProvider is not "ollama" and requestedProvider is not "claude" then + return my formatErrorMessage("Argument
Error", "--provider must be 'auto', 'ollama', or 'claude'", "validation") + end if + else + return my formatErrorMessage("Argument Error", "--provider requires a provider name parameter", "validation") + end if + else if arg is "--format" then + if i < argCount then + set i to i + 1 + set outputFormat to item i of argv + if outputFormat is not "png" and outputFormat is not "jpg" and outputFormat is not "pdf" then + return my formatErrorMessage("Argument Error", "--format must be 'png', 'jpg', or 'pdf'", "validation") + end if + else + return my formatErrorMessage("Argument Error", "--format requires a format parameter", "validation") + end if + else if arg is "--resize" or arg is "-r" then + if i < argCount then + set i to i + 1 + try + set resizeDimension to (item i of argv) as integer + if resizeDimension < 0 then + return my formatErrorMessage("Argument Error", "--resize value must be a positive number or 0 (no resize)", "validation") + end if + on error + return my formatErrorMessage("Argument Error", "--resize requires a numeric value (max dimension in pixels)", "validation") + end try + else + return my formatErrorMessage("Argument Error", "--resize requires a dimension parameter", "validation") + end if + + -- Handle boolean flags + else if arg is "--fullscreen" or arg is "-f" then + set forceFullscreen to true + else if arg is "--window" or arg is "-w" then + set captureMode to "window" + else if arg is "--multi" or arg is "-m" then + set multiWindow to true + else if arg is "--verbose" or arg is "-v" then + set verboseLogging to true + else if arg is "--quiet" or arg is "-q" then + set quietMode to true + + -- Handle positional argument (output path for old-style compatibility) + else if not (arg starts with "-") and command is "capture" and not outputSpecified then + set outputPath to arg + set outputSpecified to true + end if + + set i to i + 1 + end repeat + + -- Handle analyze command + if command is "analyze" then + -- For analyze command, 
appIdentifier contains the image path + return my analyzeImageWithAI(appIdentifier, analysisQuestion, visionModel, requestedProvider, resizeDimension) + end if + + -- For capture command, determine capture mode + if captureMode is "" then + if forceFullscreen or appIdentifier is "" then + set captureMode to "screen" + else + -- App specified, default to window capture + set captureMode to "window" + end if + end if + + -- Set default output path if none provided + if outputPath is "" then + set timestamp to do shell script "date +%Y%m%d_%H%M%S" + -- Create model-friendly filename with app name + if appIdentifier is "" or appIdentifier is "fullscreen" then + set appNameForFile to "fullscreen" + else + set appNameForFile to my sanitizeAppName(appIdentifier) + end if + + -- Determine extension based on format + set fileExt to outputFormat + if fileExt is "" then set fileExt to defaultScreenshotFormat + + set outputPath to "/tmp/peekaboo_" & appNameForFile & "_" & timestamp & "." & fileExt + else + -- Check if user specified a directory for multi-window mode + if multiWindow and outputPath ends with "/" then + set timestamp to do shell script "date +%Y%m%d_%H%M%S" + set appNameForFile to my sanitizeAppName(appIdentifier) + set fileExt to outputFormat + if fileExt is "" then set fileExt to defaultScreenshotFormat + set outputPath to outputPath & "peekaboo_" & appNameForFile & "_" & timestamp & "." & fileExt + else if outputFormat is not "" and not (outputPath ends with ("." & outputFormat)) then + -- Apply format if specified but not in path + set outputPath to outputPath & "." 
& outputFormat + end if + end if + + -- Validate output path + if outputSpecified and not my isValidPath(outputPath) then + return my formatErrorMessage("Argument Error", "Output path must be an absolute path starting with '/'.", "validation") + end if + + -- Resolve app identifier with detailed diagnostics + if appIdentifier is "" or appIdentifier is "fullscreen" then + set appInfo to {appName:"fullscreen", bundleID:"fullscreen", isRunning:true, resolvedBy:"fullscreen"} + else + set appInfo to my resolveAppIdentifier(appIdentifier) + end if + if appInfo is missing value then + set errorDetails to "Could not resolve app identifier '" & appIdentifier & "'." + + -- Provide specific guidance based on identifier type + if appIdentifier contains "." then + set errorDetails to errorDetails & " This appears to be a bundle ID. Common issues:" & linefeed + set errorDetails to errorDetails & "• Bundle ID may be incorrect (try 'com.apple.' prefix for system apps)" & linefeed + set errorDetails to errorDetails & "• App may not be installed" & linefeed + set errorDetails to errorDetails & "• Use 'osascript peekaboo.scpt list' to see available apps" + else + set errorDetails to errorDetails & " Fuzzy matching tried but no matches found."
& linefeed + set errorDetails to errorDetails & "• Partial names are supported (e.g., 'Chrome' for 'Google Chrome')" & linefeed + set errorDetails to errorDetails & "• Common abbreviations work (e.g., 'Code' for 'Visual Studio Code')" & linefeed + set errorDetails to errorDetails & "• App may not be installed or running" & linefeed + set errorDetails to errorDetails & "• Use 'osascript peekaboo.scpt list' to see running apps" + end if + + return my formatErrorMessage("App Resolution Error", errorDetails, "app resolution") + end if + + set resolvedAppName to appName of appInfo + set resolvedBy to resolvedBy of appInfo + my logVerbose("App resolved: " & resolvedAppName & " (method: " & resolvedBy & ")") + + -- Bring app to front + set frontError to my bringAppToFront(appInfo) + if frontError is not "" then return frontError + + -- Smart multi-window detection for AI analysis + if analyzeMode and resolvedAppName is not "fullscreen" and not forceFullscreen then + -- Check how many windows the app has + set windowStatus to my getAppWindowStatus(resolvedAppName) + try + set statusClass to class of windowStatus + if statusClass is not text and statusClass is not string then + -- It's a success record + set totalWindows to totalWindows of windowStatus + if totalWindows > 1 and not multiWindow and captureMode is not "screen" then + -- Automatically enable multi-window mode for AI analysis + set multiWindow to true + my logVerbose("Auto-enabling multi-window mode for AI analysis (app has " & totalWindows & " windows)") + end if + end if + on error + -- Continue without auto-enabling + end try + end if + + -- Pre-capture window validation for better error messages + if (multiWindow or captureMode is "window") and resolvedAppName is not "fullscreen" then + set windowStatus to my getAppWindowStatus(resolvedAppName) + -- Check if it's an error (string starting with prefix) or success (record) + try + set statusClass to class of windowStatus + if statusClass is text or
statusClass is string then + -- It's an error message + if multiWindow then + set contextError to "Multi-window capture failed: " & windowStatus + set contextError to contextError & linefeed & "💡 Suggestion: Try basic screenshot mode without --multi flag" + else + set contextError to "Window capture failed: " & windowStatus + set contextError to contextError & linefeed & "💡 Suggestion: Try full-screen capture mode without --window flag" + end if + return contextError + else + -- It's a success record + set statusMsg to message of windowStatus + my logVerbose("Window validation passed: " & statusMsg) + end if + on error + -- Fallback if type check fails + my logVerbose("Window validation status check bypassed") + end try + end if + + -- Handle multi-window capture + if multiWindow then + set capturedFiles to my captureMultipleWindows(resolvedAppName, outputPath, resizeDimension) + -- Check if it's an error (string) or success (list) + try + set capturedClass to class of capturedFiles + if capturedClass is text or capturedClass is string then + return capturedFiles -- Error message + end if + on error + -- Continue with list processing + end try + + -- If AI analysis requested, analyze all captured windows + if analyzeMode and (count of capturedFiles) > 0 then + set analysisResults to {} + set allSuccess to true + + repeat with fileInfo in capturedFiles + set filePath to item 1 of fileInfo + set windowTitle to item 2 of fileInfo + set windowIndex to item 3 of fileInfo + + set analysisResult to my analyzeImageWithAI(filePath, analysisQuestion, visionModel, requestedProvider, resizeDimension) + + if analysisResult starts with scriptInfoPrefix and analysisResult contains "Analysis Complete" then + -- Extract just the answer part from the analysis + set answerStart to (offset of "💬 Answer:" in analysisResult) + 10 + set answerEnd to (offset of (scriptInfoPrefix & "Analysis via") in analysisResult) - 1 + if answerStart > 10 and answerEnd > answerStart then + set
windowAnswer to text answerStart thru answerEnd of analysisResult + else + set windowAnswer to analysisResult + end if + set end of analysisResults to {windowTitle:windowTitle, windowIndex:windowIndex, answer:windowAnswer, success:true} + else + set allSuccess to false + set end of analysisResults to {windowTitle:windowTitle, windowIndex:windowIndex, answer:analysisResult, success:false} + end if + end repeat + + -- Format multi-window AI analysis results + return my formatMultiWindowAnalysis(capturedFiles, analysisResults, resolvedAppName, analysisQuestion, visionModel, quietMode) + else + -- Process successful capture without AI + return my formatMultiOutput(capturedFiles, resolvedAppName, quietMode) + end if + else + -- Single capture + set screenshotResult to my captureScreenshot(outputPath, captureMode, resolvedAppName, resizeDimension) + if screenshotResult starts with scriptInfoPrefix then + return screenshotResult -- Error message + else + set modeDescription to "full screen" + if captureMode is "window" then set modeDescription to "front window only" + + -- If AI analysis requested, analyze the screenshot + if analyzeMode then + set analysisResult to my analyzeImageWithAI(screenshotResult, analysisQuestion, visionModel, requestedProvider, resizeDimension) + if analysisResult starts with scriptInfoPrefix and analysisResult contains "Analysis Complete" then + -- Successful analysis + return analysisResult + else + -- Analysis failed, return screenshot success + analysis error + return scriptInfoPrefix & "Screenshot captured successfully! 
📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & linefeed & "⚠️ AI Analysis failed:" & linefeed & analysisResult + end if + else + -- Regular screenshot without analysis + return my formatCaptureOutput(screenshotResult, resolvedAppName, modeDescription, quietMode) + end if + end if + end if + + on error generalErrorMsg number generalErrorNum + if appSpecificErrorOccurred then error generalErrorMsg number generalErrorNum + return my formatErrorMessage("Execution Error", generalErrorMsg, "error " & generalErrorNum) + end try +end run +--#endregion Main Script Logic (on run) + +--#region Usage Function +on usageText() + set LF to linefeed + set scriptName to "peekaboo.scpt" + + set outText to "Peekaboo v1.0.0 - Screenshot automation that actually works! 👀 → 📸 → 💾" & LF & LF + + set outText to outText & "USAGE:" & LF + set outText to outText & " peekaboo [app] [options] # Screenshot app or fullscreen" & LF + set outText to outText & " peekaboo analyze <image> \"question\" [opts] # Analyze existing image" & LF + set outText to outText & " peekaboo list # List running apps" & LF + set outText to outText & " peekaboo help # Show this help" & LF & LF + + set outText to outText & "COMMANDS:" & LF + set outText to outText & " [app] App name or bundle ID (optional, defaults to fullscreen)" & LF + set outText to outText & " analyze Analyze existing image with AI vision" & LF + set outText to outText & " list, ls List all running apps with window info" & LF + set outText to outText & " help, -h Show this help message" & LF & LF + + set outText to outText & "OPTIONS:" & LF + set outText to outText & " -o, --output Output file or directory path" & LF + set outText to outText & " -f, --fullscreen Force fullscreen capture" & LF + set outText to outText & " -w, --window Single window capture (default with app)" & LF + set outText to outText & " -m, --multi Capture all app
windows separately" & LF + set outText to outText & " -a, --ask \"question\" AI analysis of screenshot" & LF + set outText to outText & " --model AI model (e.g., llava:7b)" & LF + set outText to outText & " --provider AI provider: auto|ollama|claude" & LF + set outText to outText & " --format Output format: png|jpg|pdf" & LF + set outText to outText & " -r, --resize Resize to max dimension (faster AI)" & LF + set outText to outText & " -v, --verbose Enable debug output" & LF + set outText to outText & " -q, --quiet Minimal output (just file path)" & LF & LF + + set outText to outText & "EXAMPLES:" & LF + set outText to outText & " # Basic captures" & LF + set outText to outText & " peekaboo # Fullscreen" & LF + set outText to outText & " peekaboo Safari # Safari window" & LF + set outText to outText & " peekaboo Safari -o ~/Desktop/safari.png # Specific path" & LF + set outText to outText & " peekaboo -f -o /tmp/screenshot.jpg --format jpg # Fullscreen as JPG" & LF & LF + + set outText to outText & " # Multi-window capture" & LF + set outText to outText & " peekaboo Chrome -m # All Chrome windows" & LF + set outText to outText & " peekaboo Safari -m -o ~/screenshots/ # To directory" & LF & LF + + set outText to outText & " # AI analysis" & LF + set outText to outText & " peekaboo Safari -a \"What's on this page?\" # Screenshot + analyze" & LF + set outText to outText & " peekaboo -f -a \"Any errors visible?\" # Fullscreen + analyze" & LF + set outText to outText & " peekaboo analyze photo.png \"What is this?\" # Analyze existing" & LF + set outText to outText & " peekaboo Terminal -a \"Show the error\" --model llava:13b" & LF + set outText to outText & " peekaboo Safari -a \"What's shown?\" -r 1024 # Resize for faster AI" & LF & LF + + set outText to outText & " # Other commands" & LF + set outText to outText & " peekaboo list # Show running apps" & LF + set outText to outText & " peekaboo help # This help" & LF & LF + + set outText to outText & "Note: When using with
osascript, quote arguments and escape as needed:" & LF + set outText to outText & " osascript peekaboo.scpt Safari -a \"What's shown?\"" & LF & LF + + set outText to outText & "AI Analysis Features:" & LF + set outText to outText & " • Smart provider detection: auto-detects Ollama or Claude CLI" & LF + set outText to outText & " • Smart multi-window: Automatically analyzes ALL windows for multi-window apps" & LF + set outText to outText & " - App has 3 windows? Analyzes all 3 and reports on each" & LF + set outText to outText & " - Use -w flag to force single window analysis" & LF + set outText to outText & " • Ollama: Local inference with vision models (recommended)" & LF + set outText to outText & " - Supports direct image file analysis" & LF + set outText to outText & " - Priority: qwen2.5vl:7b > llava:7b > llava-phi3:3.8b > minicpm-v:8b" & LF + set outText to outText & " • Claude: Limited support (CLI doesn't analyze image files)" & LF + set outText to outText & " - Claude CLI detected but can't process image files directly" & LF + set outText to outText & " - Use Ollama for automated image analysis" & LF + set outText to outText & " • One-step: Screenshot + analysis in single command" & LF + set outText to outText & " • Two-step: Analyze existing images separately" & LF + set outText to outText & " • Timeout protection: 90-second timeout prevents hanging" & LF & LF + + set outText to outText & "Multi-Window Features:" & LF + set outText to outText & " • --multi creates separate files with descriptive names" & LF + set outText to outText & " • Window titles are sanitized for safe filenames" & LF + set outText to outText & " • Files named as: basename_window_N_title.ext" & LF + set outText to outText & " • Each window is focused before capture for accuracy" & LF & LF + + set outText to outText & "Notes:" & LF + set outText to outText & " • Default behavior: App specified = window capture, No app = full screen" & LF + set outText to
outText & " • Requires Screen Recording permission in System Preferences" & LF + set outText to outText & " • Accessibility permission may be needed for window enumeration" & LF + set outText to outText & " • Window titles longer than " & maxWindowTitleLength & " characters are truncated" & LF + set outText to outText & " • Default capture delay: " & (captureDelay as string) & " second(s) (optimized for speed)" & LF + + return outText +end usageText +--#endregion Usage Function \ No newline at end of file diff --git a/.cursor/scripts/terminator.scpt b/.cursor/scripts/terminator.scpt new file mode 100755 index 0000000..e1d838f --- /dev/null +++ b/.cursor/scripts/terminator.scpt @@ -0,0 +1,774 @@ +#!/usr/bin/osascript +-------------------------------------------------------------------------------- +-- terminator.scpt - v0.6.1 Enhanced "T-1000" +-- Enhanced Terminal session management with smart session reuse and better error reporting +-- Features: Smart session reuse, enhanced error reporting, improved timing, better output formatting +-------------------------------------------------------------------------------- + +--#region Configuration Properties +property maxCommandWaitTime : 15.0 -- Increased from 10.0 for better reliability +property pollIntervalForBusyCheck : 0.1 +property startupDelayForTerminal : 0.7 +property minTailLinesOnWrite : 100 -- Increased from 15 for better build log visibility +property defaultTailLines : 100 -- Increased from 30 for better build log visibility +property tabTitlePrefix : "🤖💥 " -- For the window/tab title itself +property scriptInfoPrefix : "Terminator 🤖💥: " -- For messages generated by this script +property projectIdentifierInTitle : "Project: " +property taskIdentifierInTitle : " - Task: " +property enableFuzzyTagGrouping : true +property fuzzyGroupingMinPrefixLength : 4 + +-- Safe enhanced properties (minimal additions) +property enhancedErrorReporting : true +property verboseLogging : false
+--#endregion Configuration Properties + +--#region Helper Functions +on isValidPath(thePath) + if thePath is not "" and (thePath starts with "/") then + if not (thePath contains " -") then -- Basic heuristic + return true + end if + end if + return false +end isValidPath + +on getPathComponent(thePath, componentIndex) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to "/" + set pathParts to text items of thePath + set AppleScript's text item delimiters to oldDelims + set nonEmptyParts to {} + repeat with aPart in pathParts + -- Dereference the loop variable; comparing the bare reference to "" is unreliable + if (contents of aPart) is not "" then set end of nonEmptyParts to (contents of aPart) + end repeat + if (count nonEmptyParts) = 0 then return "" + try + if componentIndex is -1 then + return item -1 of nonEmptyParts + else if componentIndex > 0 and componentIndex ≤ (count nonEmptyParts) then + return item componentIndex of nonEmptyParts + end if + on error + return "" + end try + return "" +end getPathComponent + +on generateWindowTitle(taskTag as text, projectGroup as text) + if projectGroup is not "" then + return tabTitlePrefix & projectIdentifierInTitle & projectGroup & taskIdentifierInTitle & taskTag + else + return tabTitlePrefix & taskTag + end if +end generateWindowTitle + +on bufferContainsMeaningfulContentAS(multiLineText, knownInfoPrefix as text, commonShellPrompts as list) + if multiLineText is "" then return false + + -- Simple approach: if the trimmed content is substantial and not just our info messages, consider it meaningful + set trimmedText to my trimWhitespace(multiLineText) + if (length of trimmedText) < 3 then return false + + -- Check if it's only our script info messages + if trimmedText starts with knownInfoPrefix then + -- If it's ONLY our message and nothing else meaningful, return false + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to linefeed + set textLines to text items of multiLineText + set AppleScript's text item delimiters to oldDelims +
set nonInfoLines to 0 + repeat with aLine in textLines + set currentLine to my trimWhitespace(aLine as text) + if currentLine is not "" and not (currentLine starts with knownInfoPrefix) then + set nonInfoLines to nonInfoLines + 1 + end if + end repeat + + -- If we have substantial non-info content, consider it meaningful + return (nonInfoLines > 2) + end if + + -- If content doesn't start with our info prefix, likely contains command output + return true +end bufferContainsMeaningfulContentAS + +-- Enhanced error reporting helper +on formatErrorMessage(errorType, errorMsg, context) + if enhancedErrorReporting then + set formattedMsg to scriptInfoPrefix & errorType & ": " & errorMsg + if context is not "" then + set formattedMsg to formattedMsg & " (Context: " & context & ")" + end if + return formattedMsg + else + return scriptInfoPrefix & errorMsg + end if +end formatErrorMessage + +-- Enhanced logging helper +on logVerbose(message) + if verboseLogging then + log "🔍 " & message + end if +end logVerbose +--#endregion Helper Functions + +--#region Main Script Logic (on run) +on run argv + set appSpecificErrorOccurred to false + try + my logVerbose("Starting Terminator v0.6.1 Enhanced") + + tell application "System Events" + if not (exists process "Terminal") then + launch application id "com.apple.Terminal" + delay startupDelayForTerminal + end if + end tell + + set originalArgCount to count argv + if originalArgCount < 1 then return my usageText() + + set projectPathArg to "" + set actualArgsForParsing to argv + if originalArgCount > 0 then + set potentialPath to item 1 of argv + if my isValidPath(potentialPath) then + set projectPathArg to potentialPath + my logVerbose("Detected project path: " & projectPathArg) + if originalArgCount > 1 then + set actualArgsForParsing to items 2 thru -1 of argv + else + return my formatErrorMessage("Argument Error", "Project path \"" & projectPathArg & "\" provided, but no task tag or command specified."
& linefeed & linefeed & my usageText(), "") + end if + end if + end if + + if (count actualArgsForParsing) < 1 then return my usageText() + + set taskTagName to item 1 of actualArgsForParsing + my logVerbose("Task tag: " & taskTagName) + + if (length of taskTagName) > 40 or (not my tagOK(taskTagName)) then + set errorMsg to "Task Tag missing or invalid: \"" & taskTagName & "\"." & linefeed & linefeed & ¬ + "A 'task tag' (e.g., 'build', 'tests') is a short name (1-40 letters, digits, -, _) " & ¬ + "to identify a specific task, optionally within a project session." & linefeed & linefeed + return my formatErrorMessage("Validation Error", errorMsg & my usageText(), "tag validation") + end if + + set doWrite to false + set shellCmd to "" + set originalUserShellCmd to "" + set currentTailLines to defaultTailLines + set explicitLinesProvided to false + set argCountAfterTagOrPath to count actualArgsForParsing + + if argCountAfterTagOrPath > 1 then + set commandParts to items 2 thru -1 of actualArgsForParsing + if (count commandParts) > 0 then + set lastOfCmdParts to item -1 of commandParts + if my isInteger(lastOfCmdParts) then + set currentTailLines to (lastOfCmdParts as integer) + set explicitLinesProvided to true + my logVerbose("Explicit lines requested: " & currentTailLines) + if (count commandParts) > 1 then + set commandParts to items 1 thru -2 of commandParts + else + set commandParts to {} + end if + end if + end if + if (count commandParts) > 0 then + set originalUserShellCmd to my joinList(commandParts, " ") + my logVerbose("Command detected: " & originalUserShellCmd) + end if + else if argCountAfterTagOrPath = 1 then + -- Only taskTagName was provided after potential projectPathArg + -- This is a read operation by default.
+ my logVerbose("Read-only operation detected") + end if + + if originalUserShellCmd is not "" and (my trimWhitespace(originalUserShellCmd) is not "") then + set doWrite to true + set shellCmd to originalUserShellCmd + else if projectPathArg is not "" and originalUserShellCmd is "" then + -- Path provided, task tag, and empty command string "" OR no command string but lines_to_read was there + set doWrite to true + set shellCmd to "" -- will become 'cd path' + my logVerbose("CD-only operation for path: " & projectPathArg) + else + set doWrite to false + set shellCmd to "" + end if + + if currentTailLines < 1 then set currentTailLines to 1 + if doWrite and (shellCmd is not "" or projectPathArg is not "") and currentTailLines < minTailLinesOnWrite then + set currentTailLines to minTailLinesOnWrite + my logVerbose("Increased tail lines for write operation: " & currentTailLines) + end if + + if projectPathArg is not "" and doWrite then + set quotedProjectPath to quoted form of projectPathArg + if shellCmd is not "" then + set shellCmd to "cd " & quotedProjectPath & " && " & shellCmd + else + set shellCmd to "cd " & quotedProjectPath + end if + my logVerbose("Final command: " & shellCmd) + end if + + set derivedProjectGroup to "" + if projectPathArg is not "" then + set derivedProjectGroup to my getPathComponent(projectPathArg, -1) + if derivedProjectGroup is "" then set derivedProjectGroup to "DefaultProject" + my logVerbose("Project group: " & derivedProjectGroup) + end if + + set allowCreation to false + if doWrite then + set allowCreation to true + else if explicitLinesProvided then + set allowCreation to true + end if + + set effectiveTabTitleForLookup to my generateWindowTitle(taskTagName, derivedProjectGroup) + my logVerbose("Tab title: " & effectiveTabTitleForLookup) + + set tabInfo to my ensureTabAndWindow(taskTagName, derivedProjectGroup, allowCreation, effectiveTabTitleForLookup) + + if tabInfo is missing value then + if not allowCreation then + set errorMsg 
to "Terminal session \"" & effectiveTabTitleForLookup & "\" not found." & linefeed & ¬ + "To create this session, provide a command (even an empty string \"\" if only 'cd'-ing to a project path), " & ¬ + "or specify lines to read (e.g., ... \"" & taskTagName & "\" 1)." & linefeed + if projectPathArg is not "" then + set errorMsg to errorMsg & "Project path was specified as: \"" & projectPathArg & "\"." & linefeed + else + set errorMsg to errorMsg & "If this is for a new project, provide the absolute project path as the first argument." & linefeed + end if + return my formatErrorMessage("Session Error", errorMsg & linefeed & my usageText(), "session lookup") + else + return my formatErrorMessage("Creation Error", "Could not find or create Terminal tab for \"" & effectiveTabTitleForLookup & "\". Check permissions/Terminal state.", "tab creation") + end if + end if + + set targetTab to targetTab of tabInfo + set parentWindow to parentWindow of tabInfo + set wasNewlyCreated to wasNewlyCreated of tabInfo + set createdInExistingViaFuzzy to createdInExistingWindowViaFuzzy of tabInfo + + my logVerbose("Tab info - new: " & wasNewlyCreated & ", fuzzy: " & createdInExistingViaFuzzy) + + set bufferText to "" + set commandTimedOut to false + set tabWasBusyOnRead to false + set previousCommandActuallyStopped to true + set attemptMadeToStopPreviousCommand to false + set identifiedBusyProcessName to "" + set theTTYForInfo to "" + + if not doWrite and wasNewlyCreated then + if createdInExistingViaFuzzy then + return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" created in existing project window and ready." + else + return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" (in new window) created and ready."
+ end if + end if + + tell application id "com.apple.Terminal" + try + set index of parentWindow to 1 + set selected tab of parentWindow to targetTab + if wasNewlyCreated and doWrite then + delay 0.4 + else + delay 0.1 + end if + + if doWrite and shellCmd is not "" then + my logVerbose("Executing command: " & shellCmd) + set canProceedWithWrite to true + if busy of targetTab then + if not wasNewlyCreated or createdInExistingViaFuzzy then + set attemptMadeToStopPreviousCommand to true + set previousCommandActuallyStopped to false + try + set theTTYForInfo to my trimWhitespace(tty of targetTab) + end try + set processesBefore to {} + try + set processesBefore to processes of targetTab + end try + set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"} + set identifiedBusyProcessName to "" + if (count of processesBefore) > 0 then + repeat with i from (count of processesBefore) to 1 by -1 + set aProcessName to item i of processesBefore + if aProcessName is not in commonShells then + set identifiedBusyProcessName to aProcessName + exit repeat + end if + end repeat + end if + my logVerbose("Busy process identified: " & identifiedBusyProcessName) + set processToTargetForKill to identifiedBusyProcessName + set killedViaPID to false + if theTTYForInfo is not "" and processToTargetForKill is not "" then + set shortTTY to text 6 thru -1 of theTTYForInfo + set pidsToKillText to "" + try + set psCommand to "ps -t " & shortTTY & " -o pid,comm | awk '$2 == \"" & processToTargetForKill & "\" {print $1}'" + set pidsToKillText to do shell script psCommand + end try + if pidsToKillText is not "" then + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to linefeed + set pidList to text items of pidsToKillText + set AppleScript's text item delimiters to oldDelims + repeat with aPID in pidList + set aPID to my trimWhitespace(aPID) + if aPID is not "" then + try + do shell 
script "kill -INT " & aPID + delay 0.3 + do shell script "kill -0 " & aPID + try + do shell script "kill -KILL " & aPID + delay 0.2 + try + do shell script "kill -0 " & aPID + on error + set previousCommandActuallyStopped to true + end try + end try + on error + set previousCommandActuallyStopped to true + end try + end if + if previousCommandActuallyStopped then + set killedViaPID to true + exit repeat + end if + end repeat + end if + end if + if not previousCommandActuallyStopped and busy of targetTab then + activate + delay 0.5 + tell application "System Events" to keystroke "c" using control down + delay 0.6 + if not (busy of targetTab) then + set previousCommandActuallyStopped to true + if identifiedBusyProcessName is not "" and (identifiedBusyProcessName is in (processes of targetTab)) then + set previousCommandActuallyStopped to false + end if + end if + else if not busy of targetTab then + set previousCommandActuallyStopped to true + end if + if not previousCommandActuallyStopped then + set canProceedWithWrite to false + end if + else if wasNewlyCreated and not createdInExistingViaFuzzy and busy of targetTab then + delay 0.4 + if busy of targetTab then + set attemptMadeToStopPreviousCommand to true + set previousCommandActuallyStopped to false + set identifiedBusyProcessName to "extended initialization" + set canProceedWithWrite to false + else + set previousCommandActuallyStopped to true + end if + end if + end if + + if canProceedWithWrite then + -- Clear before write to prevent output truncation (only for reused tabs) + if not wasNewlyCreated then + do script "clear" in targetTab + delay 0.1 + end if + do script shellCmd in targetTab + set commandStartTime to current date + set commandFinished to false + repeat while ((current date) - commandStartTime) < maxCommandWaitTime + if not (busy of targetTab) then + set commandFinished to true + exit repeat + end if + delay pollIntervalForBusyCheck + end repeat + if not commandFinished then set commandTimedOut 
to true + if commandFinished then delay 0.2 -- Increased from 0.1 for better output settling + my logVerbose("Command execution completed, timeout: " & commandTimedOut) + end if + else if not doWrite then + if busy of targetTab then + set tabWasBusyOnRead to true + try + set theTTYForInfo to my trimWhitespace(tty of targetTab) + end try + set processesReading to processes of targetTab + set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"} + set identifiedBusyProcessName to "" + if (count of processesReading) > 0 then + repeat with i from (count of processesReading) to 1 by -1 + set aProcessName to item i of processesReading + if aProcessName is not in commonShells then + set identifiedBusyProcessName to aProcessName + exit repeat + end if + end repeat + end if + my logVerbose("Tab busy during read with: " & identifiedBusyProcessName) + end if + end if + + set bufferText to history of targetTab + on error errMsg number errNum + set appSpecificErrorOccurred to true + return my formatErrorMessage("Terminal Error", errMsg, "error " & errNum) + end try + end tell + + set appendedMessage to "" + set ttyInfoStringForMessage to "" + if theTTYForInfo is not "" then set ttyInfoStringForMessage to " (TTY " & theTTYForInfo & ")" + if attemptMadeToStopPreviousCommand then + set processNameToReport to "process" + if identifiedBusyProcessName is not "" and identifiedBusyProcessName is not "extended initialization" then + set processNameToReport to "'" & identifiedBusyProcessName & "'" + else if identifiedBusyProcessName is "extended initialization" then + set processNameToReport to "tab's extended initialization" + end if + if previousCommandActuallyStopped then + set appendedMessage to linefeed & scriptInfoPrefix & "Previous " & processNameToReport & ttyInfoStringForMessage & " was interrupted. 
---" + else + set appendedMessage to linefeed & scriptInfoPrefix & "Attempted to interrupt previous " & processNameToReport & ttyInfoStringForMessage & ", but it may still be running. New command NOT executed. ---" + end if + end if + if commandTimedOut then + set cmdForMsg to originalUserShellCmd + if projectPathArg is not "" and originalUserShellCmd is not "" then set cmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")" + if projectPathArg is not "" and originalUserShellCmd is "" then set cmdForMsg to "(cd " & projectPathArg & ")" + set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Command '" & cmdForMsg & "' may still be running. Returned after " & maxCommandWaitTime & "s timeout. ---" + else if tabWasBusyOnRead then + set processNameToReportOnRead to "process" + if identifiedBusyProcessName is not "" then set processNameToReportOnRead to "'" & identifiedBusyProcessName & "'" + set busyProcessInfoString to "" + if identifiedBusyProcessName is not "" then set busyProcessInfoString to " with " & processNameToReportOnRead + set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Tab" & ttyInfoStringForMessage & " was busy" & busyProcessInfoString & " during read. Output may be from an ongoing process. 
---" + end if + + if appendedMessage is not "" then + if bufferText is "" then + set bufferText to my trimWhitespace(appendedMessage) + else + set bufferText to bufferText & appendedMessage + end if + end if + + set tailedOutput to my tailBufferAS(bufferText, currentTailLines) + set finalResult to my trimBlankLinesAS(tailedOutput) + + if finalResult is "" then + set effectiveOriginalCmdForMsg to originalUserShellCmd + if projectPathArg is not "" and originalUserShellCmd is "" then + set effectiveOriginalCmdForMsg to "(cd " & projectPathArg & ")" + else if projectPathArg is not "" and originalUserShellCmd is not "" then + set effectiveOriginalCmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")" + end if + + set baseMsgInfo to "Session \"" & effectiveTabTitleForLookup & "\", requested " & currentTailLines & " lines." + set specificAppendedInfo to my trimWhitespace(appendedMessage) + set suffixForReturn to "" + if specificAppendedInfo is not "" then set suffixForReturn to linefeed & specificAppendedInfo + + if attemptMadeToStopPreviousCommand and not previousCommandActuallyStopped then + return my formatErrorMessage("Process Error", "Previous command/initialization in session \"" & effectiveTabTitleForLookup & "\"" & ttyInfoStringForMessage & " may not have terminated. New command '" & effectiveOriginalCmdForMsg & "' NOT executed." & suffixForReturn, "process termination") + else if commandTimedOut then + return my formatErrorMessage("Timeout Error", "Command '" & effectiveOriginalCmdForMsg & "' timed out after " & maxCommandWaitTime & "s. No other output. " & baseMsgInfo & suffixForReturn, "command timeout") + else if tabWasBusyOnRead then + return my formatErrorMessage("Busy Error", "Tab for session \"" & effectiveTabTitleForLookup & "\" was busy during read. No other output. 
" & baseMsgInfo & suffixForReturn, "read busy") + else if doWrite and shellCmd is not "" then + return scriptInfoPrefix & "Command '" & effectiveOriginalCmdForMsg & "' executed in session \"" & effectiveTabTitleForLookup & "\". No output captured." + else + return scriptInfoPrefix & "No meaningful content found in session \"" & effectiveTabTitleForLookup & "\"." + end if + end if + + my logVerbose("Returning " & (length of finalResult) & " characters of output") + return finalResult + + on error generalErrorMsg number generalErrorNum + if appSpecificErrorOccurred then error generalErrorMsg number generalErrorNum + return my formatErrorMessage("Execution Error", generalErrorMsg, "error " & generalErrorNum) + end try +end run +--#endregion Main Script Logic (on run) + +--#region Helper Functions +on ensureTabAndWindow(taskTagName as text, projectGroupName as text, allowCreate as boolean, desiredFullTitle as text) + set wasActuallyCreated to false + set createdInExistingViaFuzzy to false + + tell application id "com.apple.Terminal" + try + repeat with w in windows + repeat with tb in tabs of w + try + if custom title of tb is desiredFullTitle then + set selected tab of w to tb + return {targetTab:tb, parentWindow:w, wasNewlyCreated:false, createdInExistingWindowViaFuzzy:false} + end if + end try + end repeat + end repeat + end try + + if allowCreate and enableFuzzyTagGrouping and projectGroupName is not "" then + set projectGroupSearchPatternForWindowName to tabTitlePrefix & projectIdentifierInTitle & projectGroupName + try + repeat with w in windows + try + -- Look for any window that contains our project name + if name of w contains projectGroupSearchPatternForWindowName or name of w contains (projectIdentifierInTitle & projectGroupName) then + if not frontmost then activate + delay 0.2 + set newTabInGroup to do script "" in w + delay 0.3 + set custom title of newTabInGroup to desiredFullTitle + delay 0.2 + set selected tab of w to newTabInGroup + return 
{targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true} + end if + end try + end repeat + end try + end if + + -- Enhanced fallback: if no project-specific window found, try to use any existing Terminator window + if allowCreate and enableFuzzyTagGrouping then + try + repeat with w in windows + try + if name of w contains tabTitlePrefix then + -- Found an existing Terminator window, use it for grouping + if not frontmost then activate + delay 0.2 + set newTabInGroup to do script "" in w + delay 0.3 + set custom title of newTabInGroup to desiredFullTitle + delay 0.2 + set selected tab of w to newTabInGroup + return {targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true} + end if + end try + end repeat + end try + end if + + if allowCreate then + try + if not frontmost then activate + delay 0.3 + set newTabInNewWindow to do script "" + set wasActuallyCreated to true + delay 0.4 + set custom title of newTabInNewWindow to desiredFullTitle + delay 0.2 + set parentWinOfNew to missing value + try + set parentWinOfNew to window of newTabInNewWindow + on error + if (count of windows) > 0 then set parentWinOfNew to front window + end try + if parentWinOfNew is not missing value then + if custom title of newTabInNewWindow is desiredFullTitle then + set selected tab of parentWinOfNew to newTabInNewWindow + return {targetTab:newTabInNewWindow, parentWindow:parentWinOfNew, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false} + end if + end if + repeat with w_final_scan in windows + repeat with tb_final_scan in tabs of w_final_scan + try + if custom title of tb_final_scan is desiredFullTitle then + set selected tab of w_final_scan to tb_final_scan + return {targetTab:tb_final_scan, parentWindow:w_final_scan, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false} + end if + end try + end repeat + end repeat + return missing value + on error + return 
missing value + end try + else + return missing value + end if + end tell +end ensureTabAndWindow + +on tailBufferAS(txt, n) + set AppleScript's text item delimiters to linefeed + set lst to text items of txt + if (count lst) = 0 then return "" + set startN to (count lst) - (n - 1) + if startN < 1 then set startN to 1 + set slice to items startN thru -1 of lst + set outText to slice as text + set AppleScript's text item delimiters to "" + return outText +end tailBufferAS + +on lineIsEffectivelyEmptyAS(aLine) + if aLine is "" then return true + set trimmedLine to my trimWhitespace(aLine) + return (trimmedLine is "") +end lineIsEffectivelyEmptyAS + +on trimBlankLinesAS(txt) + if txt is "" then return "" + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to {linefeed} + set originalLines to text items of txt + set linesToProcess to {} + repeat with aLineRef in originalLines + set aLine to contents of aLineRef + if my lineIsEffectivelyEmptyAS(aLine) then + set end of linesToProcess to "" + else + set end of linesToProcess to aLine + end if + end repeat + set firstContentLine to 1 + repeat while firstContentLine ≤ (count linesToProcess) and (item firstContentLine of linesToProcess is "") + set firstContentLine to firstContentLine + 1 + end repeat + set lastContentLine to count linesToProcess + repeat while lastContentLine ≥ firstContentLine and (item lastContentLine of linesToProcess is "") + set lastContentLine to lastContentLine - 1 + end repeat + if firstContentLine > lastContentLine then + set AppleScript's text item delimiters to oldDelims + return "" + end if + set resultLines to items firstContentLine thru lastContentLine of linesToProcess + set AppleScript's text item delimiters to linefeed + set trimmedTxt to resultLines as text + set AppleScript's text item delimiters to oldDelims + return trimmedTxt +end trimBlankLinesAS + +on trimWhitespace(theText) + set whitespaceChars to {" ", tab} + set newText to theText + 
repeat while (newText is not "") and (character 1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 2 thru -1 of newText + else + set newText to "" + end if + end repeat + repeat while (newText is not "") and (character -1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 1 thru -2 of newText + else + set newText to "" + end if + end repeat + return newText +end trimWhitespace + +on isInteger(v) + try + v as integer + return true + on error + return false + end try +end isInteger + +on tagOK(t) + try + do shell script "/bin/echo " & quoted form of t & " | /usr/bin/grep -E -q '^[A-Za-z0-9_-]+$'" + return true + on error + return false + end try +end tagOK + +on joinList(theList, theDelimiter) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to theDelimiter + set theText to theList as text + set AppleScript's text item delimiters to oldDelims + return theText +end joinList + +on usageText() + set LF to linefeed + set scriptName to "terminator.scpt" + set exampleProject to "/Users/name/Projects/FancyApp" + set exampleProjectNameForTitle to my getPathComponent(exampleProject, -1) + if exampleProjectNameForTitle is "" then set exampleProjectNameForTitle to "DefaultProject" + set exampleTaskTag to "build_frontend" + set exampleFullCommand to "npm run build" + + set generatedExampleTitle to my generateWindowTitle(exampleTaskTag, exampleProjectNameForTitle) + + set outText to scriptName & " - v0.6.0 Enhanced \"T-1000\" – AppleScript Terminal helper" & LF & LF + set outText to outText & "Enhancements: Smart session reuse, enhanced error reporting, verbose logging (optional)" & LF & LF + set outText to outText & "Manages dedicated, tagged Terminal sessions, grouped by project path." & LF & LF + + set outText to outText & "Core Concept:" & LF + set outText to outText & " 1. 
For a NEW project, provide the absolute project path FIRST, then task tag, then command:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"" & exampleFullCommand & "\"" & LF + set outText to outText & " The script will 'cd' into the project path and run the command." & LF + set outText to outText & " The tab will be titled like: \"" & generatedExampleTitle & "\"" & LF + set outText to outText & " 2. For SUBSEQUENT commands for THE SAME PROJECT, use the project path and task tag:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"another_command\"" & LF + set outText to outText & " 3. To simply READ from an existing session (path & tag must identify an existing session):" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\"" & LF + set outText to outText & " A READ operation on a non-existent tag (without path/command to create) will error." & LF & LF + + set outText to outText & "Title Format: \"" & tabTitlePrefix & projectIdentifierInTitle & "" & taskIdentifierInTitle & "\"" & LF + set outText to outText & "Or if no project path provided: \"" & tabTitlePrefix & "\"" & LF & LF + + set outText to outText & "Enhanced Features:" & LF + set outText to outText & " • Smart session reuse for same project paths" & LF + set outText to outText & " • Enhanced error reporting with context information" & LF + set outText to outText & " • Optional verbose logging for debugging" & LF + set outText to outText & " • No automatic clearing to prevent interrupting builds" & LF + set outText to outText & " • 100-line default output for better build log visibility" & LF + set outText to outText & " • Automatically 'cd's into project path if provided with a command." 
& LF + set outText to outText & " • Groups new task tabs into existing project windows if fuzzy grouping enabled." & LF + set outText to outText & " • Interrupts busy processes in reused tabs." & LF & LF + + set outText to outText & "Usage Examples:" & LF + set outText to outText & " # Start new project session, cd, run command, get 100 lines:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" \"npm run build\" 100" & LF + set outText to outText & " # Create/use 'backend_tests' task tab in the 'FancyApp' project window:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"backend_tests\" \"pytest\"" & LF + set outText to outText & " # Prepare/create a new session by just cd'ing into project path (empty command):" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"dev_shell\" \"\" 1" & LF + set outText to outText & " # Read from an existing session:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" 50" & LF & LF + + set outText to outText & "Parameters:" & LF + set outText to outText & " [\"/absolute/project/path\"]: (Optional First Arg) Base path for project. Enables 'cd' and grouping." & LF + set outText to outText & " \"\": Required. Specific task name for the tab (e.g., 'build', 'tests')." & LF + set outText to outText & " [\"\"]: (Optional) Command. If path provided, 'cd path &&' is prepended." & LF + set outText to outText & " Use \"\" for no command (will just 'cd' if path given)." & LF + set outText to outText & " [[lines_to_read]]: (Optional Last Arg) Number of history lines. Default: " & defaultTailLines & "." & LF & LF + + set outText to outText & "Notes:" & LF + set outText to outText & " • Provide project path on first use for a project for best window grouping and auto 'cd'." 
& LF + set outText to outText & " • Ensure Automation permissions for Terminal.app & System Events.app." & LF + set outText to outText & " • Works within Terminal.app's AppleScript limitations for reliable operation." & LF + + return outText +end usageText +--#endregion Helper Functions \ No newline at end of file diff --git a/README.md b/README.md index f8c5790..4120633 100644 --- a/README.md +++ b/README.md @@ -1,892 +1,425 @@ -# Peekaboo — The screenshot tool that just works™ +# Peekaboo MCP Server -![Peekaboo Banner](assets/banner.png) +A macOS utility exposed via Node.js MCP server for advanced screen captures, image analysis, and window management. -👀 → 📸 → 💾 — **Zero-click screenshots with AI superpowers** +## 🚀 Installation & Setup ---- +### Prerequisites -## ✨ **FEATURES** +Before installing Peekaboo, ensure your system meets these requirements: -🎯 **Clean CLI** • 🤫 **Quiet Mode** • 🤖 **AI Support** • ⚡ **Non-Interactive** • 🪟 **Multi-Window** +**System Requirements:** +- **macOS 12.0+** (Monterey or later) +- **Node.js 18.0+** +- **Swift 5.7+** (for building the native CLI) +- **Xcode Command Line Tools** ---- +**Install Prerequisites:** +```bash +# Install Node.js (if not already installed) +brew install node -## 🚀 **THE MAGIC** +# Install Xcode Command Line Tools (if not already installed) +xcode-select --install +``` -**Peekaboo** captures any app, any window, any time — no clicking required. Now with a beautiful command-line interface and AI vision analysis. 
+### Installation Methods -### 🎯 **Core Features** -- **Smart capture**: App window by default, fullscreen when no app specified -- **Zero interaction**: Uses window IDs, not mouse clicks -- **AI vision**: Ask questions about your screenshots (Ollama + Claude CLI) -- **Quiet mode**: Perfect for scripts and automation (`-q`) -- **Multi-window**: Capture all app windows separately (`-m`) -- **Format control**: PNG, JPG, PDF with auto-detection -- **Smart paths**: Auto-generated filenames or custom paths -- **Fast & reliable**: Optimized delays, robust error handling - -### 🌟 **Key Highlights** -- **Smart Multi-Window AI**: Automatically analyzes ALL windows for multi-window apps -- **Timeout Protection**: 90-second timeout prevents hanging on slow models -- **Clean CLI Design**: Consistent flags, short aliases, logical defaults -- **Claude CLI support**: Smart provider selection (Ollama preferred) -- **Performance tracking**: See how long AI analysis takes -- **Comprehensive help**: Clear sections, real examples - ---- - -## 🎯 **QUICK START** +#### Method 1: NPM Installation (Recommended) ```bash -# Install (one-time) -chmod +x peekaboo.scpt +# Install globally for system-wide access +npm install -g peekaboo-mcp -# Basic usage -osascript peekaboo.scpt # Capture fullscreen -osascript peekaboo.scpt Safari # Capture Safari window -osascript peekaboo.scpt help # Show all options +# Or install locally in your project +npm install peekaboo-mcp ``` ---- - -## 📖 **COMMAND REFERENCE** - -### 🎨 **Command Structure** -``` -peekaboo [app] [options] # Capture app or fullscreen -peekaboo analyze "question" [opts] # Analyze existing image -peekaboo list|ls # List running apps -peekaboo help|-h # Show help -``` - -### 🏷️ **Options** -| Option | Short | Description | -|--------|-------|-------------| -| `--output ` | `-o` | Output file or directory path | -| `--fullscreen` | `-f` | Force fullscreen capture | -| `--window` | `-w` | Single window (default with app) | -| `--multi` | 
`-m` | Capture all app windows | -| `--ask "question"` | `-a` | AI analysis of screenshot | -| `--quiet` | `-q` | Minimal output (just path) | -| `--verbose` | `-v` | Debug output | -| `--format ` | | Output format: png\|jpg\|pdf | -| `--model ` | | AI model (e.g., llava:7b) | -| `--provider
` | | AI provider: auto\|ollama\|claude | - ---- - -## 🎪 **USAGE EXAMPLES** - -### 📸 **Basic Screenshots** -```bash -# Simplest captures -osascript peekaboo.scpt # Fullscreen → /tmp/peekaboo_fullscreen_[timestamp].png -osascript peekaboo.scpt Safari # Safari window → /tmp/peekaboo_safari_[timestamp].png -osascript peekaboo.scpt com.apple.Terminal # Using bundle ID → /tmp/peekaboo_com_apple_terminal_[timestamp].png - -# Custom output paths -osascript peekaboo.scpt Safari -o ~/Desktop/safari.png -osascript peekaboo.scpt Finder -o ~/screenshots/finder.jpg --format jpg -osascript peekaboo.scpt -f -o ~/fullscreen.pdf # Fullscreen as PDF -``` - -### 🤫 **Quiet Mode** (Perfect for Scripts) -```bash -# Just get the file path - no extra output -FILE=$(osascript peekaboo.scpt Safari -q) -echo "Screenshot saved to: $FILE" - -# Use in scripts -SCREENSHOT=$(osascript peekaboo.scpt Terminal -q) -scp "$SCREENSHOT" user@server:/uploads/ - -# Chain commands -osascript peekaboo.scpt Finder -q | pbcopy # Copy path to clipboard -``` - -### 🎭 **Multi-Window Capture** -```bash -# Capture all windows of an app -osascript peekaboo.scpt Chrome -m -# Creates: /tmp/peekaboo_chrome_[timestamp]_window_1_[title].png -# /tmp/peekaboo_chrome_[timestamp]_window_2_[title].png -# etc. - -# Save to specific directory -osascript peekaboo.scpt Safari -m -o ~/safari-windows/ -# Creates: ~/safari-windows/peekaboo_safari_[timestamp]_window_1_[title].png -# ~/safari-windows/peekaboo_safari_[timestamp]_window_2_[title].png -``` - -### 🤖 **AI Vision Analysis** -```bash -# One-step: Screenshot + Analysis -osascript peekaboo.scpt Safari -a "What website is this?" -osascript peekaboo.scpt Terminal -a "Are there any error messages?" -osascript peekaboo.scpt -f -a "Describe what's on my screen" - -# Specify AI model -osascript peekaboo.scpt Xcode -a "Is the build successful?" --model llava:13b - -# Two-step: Analyze existing image -osascript peekaboo.scpt analyze screenshot.png "What do you see?" 
-osascript peekaboo.scpt analyze error.png "Explain this error" --provider ollama -``` - -### 🔍 **App Discovery** -```bash -# List all running apps with window info -osascript peekaboo.scpt list -osascript peekaboo.scpt ls # Short alias - -# Output: -# • Google Chrome (com.google.Chrome) -# Windows: 3 -# - "GitHub - Project" -# - "Documentation" -# - "Stack Overflow" -# • Safari (com.apple.Safari) -# Windows: 2 -# - "Apple.com" -# - "News" -``` - -### 🎯 **Advanced Combinations** -```bash -# Quiet fullscreen with custom path and format -osascript peekaboo.scpt -f -o ~/desktop-capture --format jpg -q - -# Multi-window with AI analysis (analyzes first window) -osascript peekaboo.scpt Chrome -m -a "What tabs are open?" - -# Verbose mode for debugging -osascript peekaboo.scpt Safari -v -o ~/debug.png - -# Force window mode on fullscreen request -osascript peekaboo.scpt Safari -f -w # -w overrides -f -``` - ---- - -## ⚡ **QUICK WINS** - -### 🎯 **Basic Captures** -```bash -# Fullscreen (no app specified) -osascript peekaboo.scpt -``` -**Result**: Full screen → `/tmp/peekaboo_fullscreen_20250522_143052.png` +#### Method 2: From Source ```bash -# App window with smart filename -osascript peekaboo.scpt Finder +# Clone the repository +git clone https://github.com/yourusername/peekaboo.git +cd peekaboo + +# Install Node.js dependencies +npm install + +# Build the TypeScript server +npm run build + +# Build the Swift CLI component +cd swift-cli +swift build -c release + +# Copy the binary to the project root +cp .build/release/peekaboo ../peekaboo + +# Return to project root +cd .. 
+ +# Optional: Link for global access +npm link ``` -**Result**: Finder window → `/tmp/peekaboo_finder_20250522_143052.png` + +### 🔧 Configuration + +#### Environment Setup + +Create a `.env` file in your project or set environment variables: ```bash -# Custom output path -osascript peekaboo.scpt Finder -o ~/Desktop/finder.png -``` -**Result**: Finder window → `~/Desktop/finder.png` +# AI Provider Configuration (Optional) +AI_PROVIDERS='[ + { + "type": "ollama", + "baseUrl": "http://localhost:11434", + "model": "llava", + "enabled": true + }, + { + "type": "openai", + "apiKey": "your-openai-api-key", + "model": "gpt-4-vision-preview", + "enabled": false + } +]' -### 🎭 **Multi-Window Magic** -```bash -osascript peekaboo.scpt Safari -m +# Logging Configuration +LOG_LEVEL="INFO" +PEEKABOO_LOG_FILE="/tmp/peekaboo-mcp.log" + +# Optional: Custom paths for screenshots +PEEKABOO_DEFAULT_SAVE_PATH="~/Pictures/Screenshots" ``` -**Result**: Multiple files with smart names: -- `/tmp/peekaboo_safari_20250522_143052_window_1_github.png` -- `/tmp/peekaboo_safari_20250522_143052_window_2_docs.png` -- `/tmp/peekaboo_safari_20250522_143052_window_3_search.png` + +#### MCP Server Configuration + +Add Peekaboo to your MCP client configuration: + +**For Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):** +```json +{ + "mcpServers": { + "peekaboo": { + "command": "peekaboo-mcp", + "args": [], + "env": { + "AI_PROVIDERS": "[{\"type\":\"ollama\",\"baseUrl\":\"http://localhost:11434\",\"model\":\"llava\",\"enabled\":true}]" + } + } + } +} +``` + +**For other MCP clients:** +```json +{ + "server": { + "command": "node", + "args": ["/path/to/peekaboo/dist/index.js"], + "env": { + "AI_PROVIDERS": "[{\"type\":\"ollama\",\"baseUrl\":\"http://localhost:11434\",\"model\":\"llava\",\"enabled\":true}]" + } + } +} +``` + +### 🔐 Permissions Setup + +Peekaboo requires specific macOS permissions to function properly: + +#### 1. 
Screen Recording Permission + +**Grant permission via System Preferences:** +1. Open **System Preferences** → **Security & Privacy** → **Privacy** +2. Select **Screen Recording** from the left sidebar +3. Click the **lock icon** and enter your password +4. Click **+** and add your terminal application or MCP client +5. Restart the application + +**For common applications:** +- **Terminal.app**: `/Applications/Utilities/Terminal.app` +- **Claude Desktop**: `/Applications/Claude.app` +- **VS Code**: `/Applications/Visual Studio Code.app` + +#### 2. Accessibility Permission (Optional) + +For advanced window management features: +1. Open **System Preferences** → **Security & Privacy** → **Privacy** +2. Select **Accessibility** from the left sidebar +3. Add your terminal/MCP client application + +### ✅ Verification + +Test your installation: ```bash -# Save to specific directory -osascript peekaboo.scpt Chrome -m -o ~/screenshots/ -``` -**Result**: All Chrome windows saved to `~/screenshots/` directory +# Test the Swift CLI directly +./peekaboo --help -### 🔍 **App Discovery** -```bash -osascript peekaboo.scpt list # or use 'ls' -``` -**Result**: Every running app + window titles. No guessing! +# Test server status +./peekaboo list server_status --json-output ---- +# Test screen capture (requires permissions) +./peekaboo image --mode screen --format png -## 🛠 **SETUP** - -### 1️⃣ **Make Executable** -```bash -chmod +x peekaboo.scpt +# Start the MCP server for testing +peekaboo-mcp ``` -### 2️⃣ **Grant Powers** -- System Preferences → Security & Privacy → **Screen Recording** -- Add your terminal app to the list -- ✨ You're golden! 
- ---- - -## 🎨 **FORMAT PARTY** - -Peekaboo speaks all the languages: - -```bash -# PNG (default) - smart filename in /tmp -osascript peekaboo.scpt Safari -# → /tmp/peekaboo_safari_20250522_143052.png - -# JPG with format flag -osascript peekaboo.scpt Safari -o ~/shot --format jpg -# → ~/shot.jpg - -# PDF - vector goodness -osascript peekaboo.scpt Safari -o ~/doc.pdf -# → ~/doc.pdf (format auto-detected from extension) - -# Mix and match options -osascript peekaboo.scpt -f --format jpg -o ~/fullscreen -q -# → ~/fullscreen.jpg (quiet mode just prints path) +**Expected output for server status:** +```json +{ + "success": true, + "data": { + "swift_cli_available": true, + "permissions": { + "screen_recording": true + }, + "system_info": { + "macos_version": "14.0" + } + } +} ``` ---- +### 🎯 Quick Start -## 🤖 **AI VISION ANALYSIS** ⭐ +Once installed and configured: -Peekaboo integrates with AI providers for powerful vision analysis - ask questions about your screenshots! Supports both **Ollama** (local, privacy-focused) and **Claude CLI** (cloud-based). - 
- -### 🎯 **Key Features** -- **πŸ€– Smart Provider Selection** - Auto-detects Ollama or Claude CLI -- **🧠 Smart Model Auto-Detection** - Automatically picks the best available vision model (Ollama) -- **πŸ“ Intelligent Image Resizing** - Auto-compresses large screenshots (>5MB β†’ 2048px) for optimal AI processing -- **πŸͺŸ Smart Multi-Window Analysis** - Automatically analyzes ALL windows when app has multiple windows -- **⚑ One or Two-Step Workflows** - Screenshot+analyze or analyze existing images -- **πŸ”’ Privacy Options** - Choose between local (Ollama) or cloud (Claude) analysis -- **⏱️ Performance Tracking** - Shows analysis time for each request -- **⛰️ Timeout Protection** - 90-second timeout prevents hanging on slow models -- **🎯 Zero Configuration** - Just install your preferred AI provider, Peekaboo handles the rest - -### πŸš€ **One-Step: Screenshot + Analysis** -```bash -# Take screenshot and analyze it in one command (auto-selects provider) -osascript peekaboo.scpt Safari -a "What's the main content on this page?" -osascript peekaboo.scpt Terminal -a "Any error messages visible?" -osascript peekaboo.scpt Xcode -a "Is the build successful?" - -# Multi-window apps: Automatically analyzes ALL windows! -osascript peekaboo.scpt Chrome -a "What tabs are open?" -# πŸ€– Result: Window 1 "GitHub": Shows a pull request page... -# Window 2 "Docs": Shows API documentation... -# Window 3 "Gmail": Shows email inbox... - -# Force single window with -w flag -osascript peekaboo.scpt Chrome -w -a "What's on this tab?" - -# Specify AI provider explicitly -osascript peekaboo.scpt Chrome -a "What product is shown?" --provider ollama -osascript peekaboo.scpt Safari -a "Describe the page" --provider claude - -# Specify custom model (Ollama) -osascript peekaboo.scpt Chrome -a "What product is being shown?" 
--model llava:13b - -# Fullscreen analysis (no app specified) -osascript peekaboo.scpt -f -a "Describe what's on my screen" -osascript peekaboo.scpt -a "Any UI errors or warnings visible?" -v - -# Quiet mode for scripting (just outputs path after analysis) -osascript peekaboo.scpt Terminal -a "Find errors" -q -``` - -### 🔍 **Two-Step: Analyze Existing Images** -```bash -# Analyze screenshots you already have -osascript peekaboo.scpt analyze /tmp/screenshot.png "Describe what you see" -osascript peekaboo.scpt analyze error.png "What error is shown?" -osascript peekaboo.scpt analyze ui.png "Any UI issues?" --model qwen2.5vl:7b -``` - -### 🤖 **AI Provider Comparison** - -| Provider | Type | Image Analysis | Setup | Best For | -|----------|------|---------------|-------|----------| -| **Ollama** | Local | ✅ Direct file analysis | Install + pull models | Privacy, automation | -| **Claude CLI** | Cloud | ❌ Limited support* | Install CLI | Text prompts | - -*Claude CLI currently doesn't support direct image file analysis but can work with images through interactive mode or MCP integrations. 
- -### πŸ› οΈ **Complete Ollama Setup Guide** (Recommended for Image Analysis) - -#### 1️⃣ **Install Ollama** -```bash -# macOS (Homebrew) -brew install ollama - -# Or direct install -curl -fsSL https://ollama.ai/install.sh | sh - -# Or download from https://ollama.ai -``` - -#### 2️⃣ **Start Ollama Service** -```bash -# Start the service (runs in background) -ollama serve - -# Or use the Ollama.app (GUI version) -# Download from https://ollama.ai β†’ Double-click to install -``` - -#### 3️⃣ **Pull Vision Models** -```bash -# πŸ† Recommended: Best overall (6GB) -ollama pull qwen2.5vl:7b - -# πŸš€ Popular choice: Good balance (4.7GB) -ollama pull llava:7b - -# ⚑ Lightweight: Low RAM usage (2.9GB) -ollama pull llava-phi3:3.8b - -# πŸ” OCR specialist: Great for text (5.5GB) -ollama pull minicpm-v:8b - -# 🌍 Latest and greatest: Cutting edge (11GB) -ollama pull llama3.2-vision:11b -``` - -#### 4️⃣ **Verify Setup** -```bash -# Check running models -ollama list - -# Test vision analysis -osascript peekaboo.scpt --ask "What do you see on my screen?" 
-``` - -### 🧠 **Smart Model Selection** -Peekaboo automatically picks the best available vision model in priority order: - -| Model | Size | Strengths | Best For | -|-------|------|-----------|----------| -| **qwen2.5vl:7b** | 6GB | πŸ† Document/chart analysis | Technical screenshots, code, UI | -| **llava:7b** | 4.7GB | πŸš€ Well-rounded performance | General purpose, balanced usage | -| **llava-phi3:3.8b** | 2.9GB | ⚑ Fast & lightweight | Low-resource systems, quick analysis | -| **minicpm-v:8b** | 5.5GB | πŸ” Superior OCR accuracy | Text-heavy images, error messages | -| **llama3.2-vision:11b** | 11GB | 🌟 Latest technology | Best quality, high-end systems | - -### πŸ“ **Smart Image Processing** -Peekaboo automatically optimizes images for AI analysis: - -```bash -# Large screenshots (>5MB) are automatically compressed -πŸ” Image size: 7126888 bytes -πŸ” Image is large (7126888 bytes), creating compressed version for AI -# β†’ Resized to 2048px max dimension while preserving aspect ratio -# β†’ Maintains quality while ensuring fast AI processing -``` - -**Benefits:** -- βœ… **Faster Analysis** - Smaller images = quicker AI responses -- βœ… **Reliable Processing** - Avoids API timeouts with huge images -- βœ… **Preserves Originals** - Full-resolution screenshots remain untouched -- βœ… **Smart Compression** - Uses macOS native `sips` tool for quality resizing - -### πŸ’‘ **Pro Usage Examples** - -```bash -# Automated UI testing with smart resizing -osascript peekaboo.scpt "Your App" --ask "Any error dialogs or crashes visible?" - -# High-resolution dashboard analysis (auto-compressed for AI) -osascript peekaboo.scpt "Grafana" --ask "Are all metrics healthy and green?" - -# Detailed code review screenshots -osascript peekaboo.scpt "VS Code" --ask "Any syntax errors or warnings in the code?" 
- -# Large-screen analysis (automatically handles 4K+ displays) -osascript peekaboo.scpt --ask "Describe the overall layout and any issues" -``` - -### πŸͺŸ **Smart Multi-Window Analysis** -When an app has multiple windows, Peekaboo automatically analyzes ALL of them: - -```bash -# Chrome with 3 tabs open? Peekaboo analyzes them all! -osascript peekaboo.scpt Chrome -a "What's on each tab?" - -# Result format: -# Peekaboo πŸ‘€: Multi-window AI Analysis Complete! πŸ€– -# -# πŸ“Έ App: Chrome (3 windows) -# ❓ Question: What's on each tab? -# πŸ€– Model: qwen2.5vl:7b -# -# πŸ’¬ Results for each window: -# -# πŸͺŸ Window 1: "GitHub - Pull Request #42" -# This shows a pull request for adding authentication... -# -# πŸͺŸ Window 2: "Stack Overflow - Python threading" -# A Stack Overflow page discussing Python threading concepts... -# -# πŸͺŸ Window 3: "Gmail - Inbox (42)" -# Gmail inbox showing 42 unread emails... -``` - -**Smart Defaults:** -- βœ… Multi-window apps β†’ Analyzes ALL windows automatically -- βœ… Single window apps β†’ Analyzes the one window -- βœ… Want just one window? β†’ Use `-w` flag to force single window mode -- βœ… Quiet mode β†’ Returns condensed results for each window - -### ⏱️ **Performance Tracking & Timeouts** -Every AI analysis shows execution time and has built-in protection: -``` -Peekaboo πŸ‘€: Analysis via qwen2.5vl:7b took 7 sec. -Peekaboo πŸ‘€: Analysis timed out after 90 seconds. -``` - -**Timeout Protection:** -- ⏰ 90-second timeout prevents hanging on large models -- πŸ›‘οΈ Clear error messages if model is too slow -- πŸ’‘ Suggests using smaller models on timeout - -**Perfect for:** -- πŸ§ͺ **Automated UI Testing** - "Any error messages visible?" -- πŸ“Š **Dashboard Monitoring** - "Are all systems green?" -- πŸ› **Error Detection** - "What errors are shown in this log?" -- πŸ“Έ **Content Verification** - "Does this page look correct?" -- πŸ” **Visual QA Automation** - "Any broken UI elements?" 
-- πŸ“± **App State Verification** - "Is the login successful?" -- ⏱️ **Performance Benchmarking** - Compare model speeds - ---- - -## ☁️ **CLOUD AI INTEGRATION** - -Peekaboo works seamlessly with **any AI service** that can read files! Perfect for Claude Code, Windsurf, ChatGPT, or any other AI tool. - -### πŸš€ **Quick Cloud AI Setup** - -**For AI tools like Claude Code, Windsurf, etc.:** - -1. **Copy the script file** to your project directory: +1. **Capture Screenshot:** ```bash - cp peekaboo.scpt /path/to/your/project/ + peekaboo-mcp + # In your MCP client: "Take a screenshot of my screen" ``` -2. **Tell your AI tool about it**: - ``` - I have a screenshot automation tool called peekaboo.scpt in this directory. - It can capture screenshots of any app and save them automatically. - Please read the file to understand how to use it. +2. **List Applications:** + ```bash + # In your MCP client: "Show me all running applications" ``` -3. **Your AI will automatically understand** how to: - - Take screenshots of specific apps - - Use smart filenames with timestamps - - Capture multiple windows - - Handle different output formats - - Integrate with your workflow - -### πŸ’‘ **Example AI Prompts** - +3. 
**Analyze Screenshot:** ```bash -# Ask your AI assistant: -"Use peekaboo.scpt to take a screenshot of Safari and save it to /tmp/webpage.png" + # In your MCP client: "Take a screenshot and tell me what's on my screen" + ``` -"Capture all Chrome windows with the multi-window feature" +### 🐛 Troubleshooting -"Take a screenshot of Xcode and then analyze if there are any build errors visible" +**Common Issues:** -"Set up an automated screenshot workflow for testing my app" -``` +| Issue | Solution | +|-------|----------| +| `Permission denied` errors | Grant Screen Recording permission in System Preferences | +| `Swift CLI unavailable` | Rebuild Swift CLI: `cd swift-cli && swift build -c release` | +| `AI analysis failed` | Check AI provider configuration and network connectivity | +| `Command not found: peekaboo-mcp` | Run `npm link` or check global npm installation | -### 🎯 **AI Tool Integration Examples** - -**Claude Code / Windsurf:** -``` -Use the peekaboo.scpt tool to capture screenshots during our development session. -The script automatically handles app targeting, file paths, and smart naming. -``` - -**ChatGPT / GitHub Copilot:** -``` -I have a screenshot automation script. Please read peekaboo.scpt and help me -integrate it into my testing workflow. -``` - -**Custom AI Scripts:** -```python -import subprocess - -def take_screenshot(app_name, output_path): - """Use Peekaboo to capture app screenshots""" - cmd = ["osascript", "peekaboo.scpt", app_name, output_path] - return subprocess.run(cmd, capture_output=True, text=True) - -# Your AI can now use this function automatically!
-``` - -### 🧠 **Why AI Tools Love Peekaboo** - -- **📖 Self-Documenting**: AI reads the script and understands all features instantly -- **🎯 Zero Config**: No API keys, no setup - just works -- **🧠 Smart Outputs**: Model-friendly filenames make AI integration seamless -- **⚡ Reliable**: Unattended operation perfect for AI-driven workflows -- **🔍 Comprehensive**: From basic screenshots to multi-window analysis - -**The AI tool will automatically discover:** -- All available command-line options (`--multi`, `--window`, `--verbose`) -- Smart filename generation patterns -- Error handling and troubleshooting -- Integration with local Ollama for AI analysis -- Testing capabilities and examples - -### 🎪 **Cloud AI + Local AI Combo** - -**Powerful workflow example:** +**Debug Mode:** ```bash -# 1. Use Peekaboo to capture and analyze locally -osascript peekaboo.scpt "Your App" --ask "Any errors visible?" +# Enable verbose logging +LOG_LEVEL=DEBUG peekaboo-mcp -# 2. Your cloud AI assistant can read the results and provide guidance -# 3. Iterate and improve based on AI recommendations -# 4.
Automate the entire process with AI-generated scripts +# Check permissions +./peekaboo list server_status --json-output ``` +**Get Help:** +- 📚 [Documentation](./docs/) +- 🐛 [Issues](https://github.com/yourusername/peekaboo/issues) +- 💬 [Discussions](https://github.com/yourusername/peekaboo/discussions) + --- -## 🧠 **SMART FILENAMES** +## 🛠️ Available Tools -Peekaboo automatically generates **model-friendly** filenames that are perfect for automation: +Once installed, Peekaboo provides three powerful MCP tools: -```bash -# App names become lowercase with underscores -osascript peekaboo.scpt "Safari" → peekaboo_safari_TIMESTAMP.png -osascript peekaboo.scpt "Activity Monitor" → peekaboo_activity_monitor_TIMESTAMP.png -osascript peekaboo.scpt "com.apple.TextEdit" → peekaboo_com_apple_textedit_TIMESTAMP.png -osascript peekaboo.scpt "Final Cut Pro" → peekaboo_final_cut_pro_TIMESTAMP.png +### 📸 `peekaboo.image` - Screen Capture -# Multi-window gets descriptive names -osascript peekaboo.scpt "Chrome" --multi → chrome_window_1_github.png - → chrome_window_2_documentation.png +**Parameters:** +- `mode`: `"screen"` | `"window"` | `"multi"` (default: "screen") +- `app`: Application identifier for window/multi modes +- `path`: Custom save path (optional) + +**Example:** +```json +{ + "name": "peekaboo.image", + "arguments": { + "mode": "window", + "app": "Safari" + } +} ``` -**Perfect for:** -- 🤖 AI model file references -- 📝 Scripting and automation -- 🔍 Easy file searching -- 📊 Batch processing +### 📋 `peekaboo.list` - Application Listing + +**Parameters:** +- `item_type`: `"running_applications"` | `"application_windows"` | `"server_status"` +- `app`: Application identifier (required for application_windows) + +**Example:** +```json +{ + "name": "peekaboo.list", + "arguments": { + "item_type": "running_applications" + } +} +``` + +### 🧩 `peekaboo.analyze` - AI Analysis + +**Parameters:** +- `image_path`: Absolute path to image
file +- `question`: Question/prompt for AI analysis + +**Example:** +```json +{ + "name": "peekaboo.analyze", + "arguments": { + "image_path": "/tmp/screenshot.png", + "question": "What applications are visible in this screenshot?" + } +} +``` + +## 🎯 Key Features + +### Screen Capture +- **Multi-display support**: Captures each display separately +- **Window targeting**: Intelligent app/window matching with fuzzy search +- **Format flexibility**: PNG, JPEG, WebP, HEIF support +- **Automatic naming**: Timestamps and descriptive filenames +- **Permission handling**: Automatic screen recording permission checks + +### Application Management +- **Running app enumeration**: Complete system application listing +- **Window discovery**: Per-app window enumeration with metadata +- **Fuzzy matching**: Find apps by partial name, bundle ID, or PID +- **Real-time status**: Active/background status, window counts + +### AI Integration +- **Provider agnostic**: Support for Ollama, OpenAI, and other providers +- **Image analysis**: Natural language querying of captured content +- **Configurable**: Environment-based provider selection + +## 🏛️ Project Structure + +``` +Peekaboo/ +├── src/ # Node.js MCP Server (TypeScript) +│ ├── index.ts # Main MCP server entry point +│ ├── tools/ # Individual tool implementations +│ │ ├── image.ts # Screen capture tool +│ │ ├── analyze.ts # AI analysis tool +│ │ └── list.ts # Application/window listing +│ ├── utils/ # Utility modules +│ │ ├── swift-cli.ts # Swift CLI integration +│ │ ├── ai-providers.ts # AI provider management +│ │ └── server-status.ts # Server status utilities +│ └── types/ # Shared type definitions +├── swift-cli/ # Native Swift CLI +│ └── Sources/peekaboo/ # Swift source files +│ ├── main.swift # CLI entry point +│ ├── ImageCommand.swift # Image capture implementation +│ ├── ListCommand.swift # Application
listing +│ ├── Models.swift # Data structures +│ ├── ApplicationFinder.swift # App discovery logic +│ ├── WindowManager.swift # Window management +│ ├── PermissionsChecker.swift # macOS permissions +│ └── JSONOutput.swift # JSON response formatting +├── package.json # Node.js dependencies +├── tsconfig.json # TypeScript configuration +└── README.md # This file +``` + +## 🔧 Technical Details + +### Swift CLI JSON Output +The Swift CLI outputs structured JSON when called with `--json-output`: + +```json +{ + "success": true, + "data": { + "applications": [ + { + "app_name": "Safari", + "bundle_id": "com.apple.Safari", + "pid": 1234, + "is_active": true, + "window_count": 2 + } + ] + }, + "debug_logs": ["Found 50 applications"] +} +``` + +### MCP Integration +The Node.js server translates between MCP's JSON-RPC protocol and the Swift CLI's JSON output, providing: +- **Schema validation** via Zod +- **Error handling** with proper MCP error codes +- **Logging** via Pino logger +- **Type safety** throughout the TypeScript codebase + +### Permission Model +Peekaboo respects macOS security by: +- **Checking screen recording permissions** before capture operations +- **Graceful degradation** when permissions are missing +- **Clear error messages** guiding users to grant required permissions + +## 🧪 Testing + +### Manual Testing +```bash +# Test Swift CLI directly +./peekaboo list apps --json-output | head -20 + +# Test MCP integration +echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}' | node dist/index.js + +# Test image capture +echo '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "peekaboo.image", "arguments": {"mode": "screen"}}}' | node dist/index.js +``` + +### Automated Testing +```bash +# TypeScript compilation +npm run build + +# Swift compilation +cd swift-cli && swift build +``` + +## 🐛 Known Issues + +- **FileHandle warning**: Non-critical Swift warning about TextOutputStream
conformance +- **AI Provider Config**: Requires `AI_PROVIDERS` environment variable for analysis features + +## 🚀 Future Enhancements + +- [ ] **OCR Integration**: Built-in text extraction from screenshots +- [ ] **Video Capture**: Screen recording capabilities +- [ ] **Annotation Tools**: Drawing/markup on captured images +- [ ] **Cloud Storage**: Direct upload to cloud providers +- [ ] **Hotkey Support**: System-wide keyboard shortcuts + +## 📄 License + +MIT License - see LICENSE file for details. + +## 🤝 Contributing + +1. Fork the repository +2. Create a feature branch (`git checkout -b feature/amazing-feature`) +3. Commit your changes (`git commit -m 'Add amazing feature'`) +4. Push to the branch (`git push origin feature/amazing-feature`) +5. Open a Pull Request --- -## 🏆 **POWER MOVES** - -### 🎯 **Targeting Options** -```bash -# By name (easy) - smart filename -osascript peekaboo.scpt Safari -# → /tmp/peekaboo_safari_20250522_143052.png - -# By name with custom path -osascript peekaboo.scpt Safari -o /tmp/safari.png - -# By bundle ID (precise) - gets sanitized -osascript peekaboo.scpt com.apple.Safari -# → /tmp/peekaboo_com_apple_safari_20250522_143052.png - -# By display name (works too!) - spaces become underscores -osascript peekaboo.scpt "Final Cut Pro" -# → /tmp/peekaboo_final_cut_pro_20250522_143052.png -``` - -### 🎪 **Pro Features** -```bash -# Multi-window capture --m, --multi # All windows with descriptive names - -# Window modes --w, --window # Front window only (unattended!)
--f, --fullscreen # Force fullscreen capture - -# Output control --q, --quiet # Minimal output (just path) --v, --verbose # See what's happening under the hood -``` - -### 🔍 **Discovery Mode** -```bash -osascript peekaboo.scpt list -``` -Shows you: -- 📱 All running apps -- 🆔 Bundle IDs -- 🪟 Window counts -- 📝 Exact window titles - ---- - -## 🎭 **REAL-WORLD SCENARIOS** - -### 📊 **Documentation Screenshots** -```bash -# Quick capture to /tmp with descriptive names -osascript peekaboo.scpt Xcode -m -osascript peekaboo.scpt Terminal -m -osascript peekaboo.scpt Safari -m - -# Capture your entire workflow to specific directory -osascript peekaboo.scpt Xcode -m -o /docs/ -osascript peekaboo.scpt Terminal -m -o /docs/ -osascript peekaboo.scpt Safari -m -o /docs/ - -# Or specific files -osascript peekaboo.scpt Xcode -o /docs/xcode.png -osascript peekaboo.scpt Terminal -o /docs/terminal.png -osascript peekaboo.scpt Safari -o /docs/browser.png -``` - -### 🚀 **CI/CD Integration** -```bash -# Quick automated testing screenshots with smart names -osascript peekaboo.scpt "Your App" -# → /tmp/peekaboo_your_app_20250522_143052.png - -# Automated visual testing with AI -osascript peekaboo.scpt "Your App" -a "Any error messages or crashes visible?" -osascript peekaboo.scpt "Your App" -a "Is the login screen displayed correctly?" - -# Custom path with timestamp -osascript peekaboo.scpt "Your App" -o "/test-results/app-$(date +%s).png" - -# Quiet mode for scripts (just outputs path) -SCREENSHOT=$(osascript peekaboo.scpt "Your App" -q) -echo "Screenshot saved: $SCREENSHOT" -``` - -### 🎬 **Content Creation** -```bash -# Before/after shots with AI descriptions -osascript peekaboo.scpt Photoshop -a "Describe the current design state" -# ... do your work ... -osascript peekaboo.scpt Photoshop -a "What changes were made to the design?" - -# Traditional before/after shots -osascript peekaboo.scpt Photoshop -o /content/before.png -# ... do your work ...
-osascript peekaboo.scpt Photoshop -o /content/after.png - -# Capture all design windows -osascript peekaboo.scpt Photoshop -m -o /content/designs/ -``` - -### 🧪 **Automated QA & Testing** -```bash -# Visual regression testing -osascript peekaboo.scpt "Your App" -a "Does the UI look correct?" -osascript peekaboo.scpt Safari -a "Are there any broken images or layout issues?" -osascript peekaboo.scpt Terminal -a "Any red error text visible?" - -# Dashboard monitoring -osascript peekaboo.scpt analyze /tmp/dashboard.png "Are all metrics green?" - -# Quiet mode for test scripts -if osascript peekaboo.scpt "Your App" -a "Any errors?" -q | grep -q "No errors"; then - echo "✅ Test passed" -else - echo "❌ Test failed" -fi -``` - ---- - -## 🚨 **TROUBLESHOOTING** - -### 🔐 **Permission Denied?** -- Check Screen Recording permissions -- Restart your terminal after granting access - -### 👻 **App Not Found?** -```bash -# See what's actually running -osascript peekaboo.scpt list -# or -osascript peekaboo.scpt ls - -# Try the bundle ID instead -osascript peekaboo.scpt com.company.AppName -o /tmp/shot.png -``` - -### 📁 **File Not Created?** -- Check the output directory exists (Peekaboo creates it!)
-- Verify disk space -- Try a simple `/tmp/test.png` first - -### 🐛 **Debug Mode** -```bash -osascript peekaboo.scpt Safari -o /tmp/debug.png -v -# or -osascript peekaboo.scpt Safari --output /tmp/debug.png --verbose -``` - ---- - -## 🎪 **FEATURES** - -| Feature | Description | -|---------|-------------| -| **Basic screenshots** | ✅ Full screen capture with app targeting | -| **App targeting** | ✅ By name or bundle ID | -| **Multi-format** | ✅ PNG, JPG, PDF support | -| **App discovery** | ✅ `list`/`ls` command shows running apps | -| **Multi-window** | ✅ `-m`/`--multi` captures all app windows | -| **Smart naming** | ✅ Descriptive filenames for windows | -| **Window modes** | ✅ `-w`/`--window` for front window only | -| **Auto paths** | ✅ Optional output path with smart /tmp defaults | -| **Smart filenames** | ✅ Model-friendly: app_name_timestamp format | -| **AI Vision Analysis** | ✅ Ollama + Claude CLI support with smart fallback | -| **Smart AI Models** | ✅ Auto-picks best: qwen2.5vl > llava > phi3 > minicpm | -| **Smart Image Compression** | ✅ Auto-resizes large images (>5MB → 2048px) for AI | -| **AI Provider Selection** | ✅ Auto-detect or specify with `--provider` flag | -| **Performance Tracking** | ✅ Shows analysis time for benchmarking | -| **Cloud AI Integration** | ✅ Self-documenting for Claude, Windsurf, ChatGPT, etc.
| -| **Quiet mode** | ✅ `-q`/`--quiet` for minimal output | -| **Verbose logging** | ✅ `-v`/`--verbose` for debugging | - ---- - -## 🧪 **TESTING** - -We've got you covered with comprehensive testing: - -```bash -# Run the full test suite -./test_peekaboo.sh - -# Test specific features -./test_peekaboo.sh ai # AI vision analysis only -./test_peekaboo.sh advanced # Multi-window, discovery, AI -./test_peekaboo.sh basic # Core screenshot functionality -./test_peekaboo.sh quick # Essential tests only - -# Test and cleanup -./test_peekaboo.sh all --cleanup -``` - -**Complete Test Coverage:** -- ✅ Basic screenshots with smart filenames -- ✅ App resolution (names + bundle IDs) -- ✅ Format support (PNG, JPG, PDF) -- ✅ Multi-window scenarios with descriptive names -- ✅ App discovery and window enumeration -- ✅ **AI vision analysis (8 comprehensive tests)** - - One-step: Screenshot + AI analysis - - Two-step: Analyze existing images - - Model auto-detection and custom models - - Error handling and edge cases -- ✅ Enhanced error messaging -- ✅ Performance and stress testing -- ✅ Integration workflows -- ✅ Compatibility with system apps - -**AI Test Details:** -```bash -# Specific AI testing scenarios -./test_peekaboo.sh ai -``` -- ✅ One-step screenshot + analysis workflow -- ✅ Custom model specification testing -- ✅ Two-step analysis of existing images -- ✅ Complex questions with special characters -- ✅ Invalid model error handling -- ✅ Missing file error handling -- ✅ Malformed command validation -- ✅ Graceful Ollama/model availability checks - ---- - -## ⚙️ **CUSTOMIZATION** - -Tweak the magic in the script headers: - -```applescript -property captureDelay : 1.0 -- Wait before snap -property windowActivationDelay : 0.5 -- Window focus time -property enhancedErrorReporting : true -- Detailed errors -property verboseLogging : false -- Debug output -``` - ---- - -## 🎉 **WHY PEEKABOO ROCKS** - -### 🚀 **Unattended =
Unstoppable** -- No clicking, no selecting, no babysitting -- Perfect for automation and CI/CD -- Set it and forget it - -### 🧠 **Smart Everything** -- **Smart filenames**: Model-friendly with app names -- **Smart targeting**: Works with app names OR bundle IDs -- **Smart delays**: Optimized for speed (70% faster) -- **Smart AI analysis**: Auto-detects best vision model -- Auto-launches sleeping apps and brings them forward - -### 🎭 **Multi-Window Mastery** -- Captures ALL windows with descriptive names -- Safe filename generation with sanitization -- Never overwrites accidentally - -### ⚡ **Blazing Fast** -- **0.3s capture delay** (down from 1.0s) -- **0.2s window activation** (down from 0.5s) -- **0.1s multi-window focus** (down from 0.3s) -- Responsive and practical for daily use - -### 🤖 **AI-Powered Vision** -- **Local analysis**: Private Ollama integration, no cloud -- **Smart model selection**: Auto-picks best available model -- **Multi-window intelligence**: Analyzes ALL windows automatically -- **One or two-step**: Screenshot+analyze or analyze existing images -- **Perfect for automation**: Visual testing, error detection, QA - -### 🔍 **Discovery Built-In** -- See exactly what's running -- Get precise window titles -- No more guessing games - ---- - -## 📚 **INSPIRED BY** - -Built in the style of the legendary **terminator.scpt** — because good patterns should be celebrated and extended. - ---- - -## 🎪 **PROJECT FILES** - -``` -📁 Peekaboo/ -├── 🎯 peekaboo.scpt # Main screenshot tool -├── 🧪 test_peekaboo.sh # Comprehensive test suite -├── 📖 README.md # This awesomeness -└── 🎨 assets/ - └── banner.png # Project banner -``` - ---- - -## 🏆 **THE BOTTOM LINE** - -**Peekaboo** doesn't just take screenshots. It **conquers** them. - -👀 Point → 📸 Shoot → 💾 Save → 🎉 Done! - -*Now you see it, now it's saved.
Peekaboo!* - ---- - -*Built with ❤️ and lots of ☕ for the macOS automation community.* \ No newline at end of file +**🎉 Peekaboo is ready to use!** The project successfully combines the power of native macOS APIs with modern Node.js tooling to create a comprehensive screen capture and analysis solution. \ No newline at end of file diff --git a/docs/spec.md b/docs/spec.md new file mode 100644 index 0000000..59b2a42 --- /dev/null +++ b/docs/spec.md @@ -0,0 +1,423 @@ +## Peekaboo: Full & Final Detailed Specification v1.1.1 +https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW- + +**Project Vision:** Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named `peekaboo`, which is called by the Node.js server. All image captures automatically exclude window shadows/frames. + +**Core Components:** + +1. **Node.js/TypeScript MCP Server (`peekaboo-mcp`):** + * **NPM Package Name:** `peekaboo-mcp`. + * **GitHub Project Name:** `peekaboo`. + * Implements MCP server logic using the latest stable `@modelcontextprotocol/sdk`. + * Exposes three primary MCP tools: `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list`. + * Translates MCP tool calls into commands for the Swift `peekaboo` CLI. + * Parses structured JSON output from the Swift `peekaboo` CLI. + * Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client. + * Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer. + * Implements robust logging to a file using `pino`, ensuring no logs interfere with MCP stdio communication. +2.
**Swift CLI (`peekaboo`):** + * A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64). + * Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching. + * **Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).** + * Outputs all results and errors in a structured JSON format via a global `--json-output` flag. This JSON includes a `debug_logs` array for internal Swift CLI logs, which the Node.js server can relay to its own logger. + * The `peekaboo` binary is bundled at the root of the `peekaboo-mcp` NPM package. + +--- + +### I. Node.js/TypeScript MCP Server (`peekaboo-mcp`) + +#### A. Project Setup & Distribution + +1. **Language/Runtime:** Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+). +2. **Package Manager:** NPM. +3. **`package.json`:** + * `name`: `"peekaboo-mcp"` + * `version`: Semantic versioning (e.g., `1.1.1`). + * `type`: `"module"` (for ES Modules). + * `main`: `"dist/index.js"` (compiled server entry point). + * `bin`: `{ "peekaboo-mcp": "dist/index.js" }`. + * `files`: `["dist/", "peekaboo"]` (includes compiled JS and the Swift `peekaboo` binary at package root). + * `scripts`: + * `build`: Command to compile TypeScript (e.g., `tsc`). + * `start`: `node dist/index.js`. + * `prepublishOnly`: `npm run build`. + * `dependencies`: `@modelcontextprotocol/sdk` (latest stable), `zod` (for input validation), `pino` (for logging), relevant cloud AI SDKs (e.g., `openai`, `@anthropic-ai/sdk`). + * `devDependencies`: `typescript`, `@types/node`, `pino-pretty` (for optional development console logging). +4. **Distribution:** Published to NPM. Installable via `npm i -g peekaboo-mcp` or usable with `npx peekaboo-mcp`. +5. **Swift CLI Location Strategy:** + * The Node.js server will first check the environment variable `PEEKABOO_CLI_PATH`. If set and points to a valid executable, that path will be used. 
+ * If `PEEKABOO_CLI_PATH` is not set or invalid, the server will fall back to a bundled path, resolved relative to its own script location (e.g., `path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo')`, assuming the compiled server script is in `dist/` and `peekaboo` binary is at the package root). + +#### B. Server Initialization & Configuration (`src/index.ts`) + +1. **Imports:** `McpServer`, `StdioServerTransport` from `@modelcontextprotocol/sdk`; `pino` from `pino`; `os`, `path` from Node.js built-ins. +2. **Server Info:** `name: "PeekabooMCP"`, `version: `. +3. **Server Capabilities:** Advertise `tools` capability. +4. **Logging (Pino):** + * Instantiate `pino` logger. + * **Default Transport:** File transport to `path.join(os.tmpdir(), 'peekaboo-mcp.log')`. Use `mkdir: true` option for destination. + * **Log Level:** Controlled by ENV VAR `LOG_LEVEL` (standard Pino levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`). Default: `"info"`. + * **Conditional Console Logging (Development Only):** If ENV VAR `PEEKABOO_MCP_CONSOLE_LOGGING="true"`, add a second Pino transport targeting `process.stderr.fd` (potentially using `pino-pretty` for human-readable output). + * **Strict Rule:** All server operational logging must use the configured Pino instance. No direct `console.log/warn/error` that might output to `stdout`. +5. **Environment Variables (Read by Server):** + * `AI_PROVIDERS`: Comma-separated list of `provider_name/default_model_for_provider` pairs (e.g., `"openai/gpt-4o,ollama/qwen2.5vl:7b"`). If unset/empty, `peekaboo.analyze` tool reports AI not configured. + * `OPENAI_API_KEY`: API key for OpenAI. + * `ANTHROPIC_API_KEY`: (Example for future) API key for Anthropic. + * (Other cloud provider API keys as standard ENV VAR names). + * `OLLAMA_BASE_URL`: Base URL for local Ollama instance. Default: `"http://localhost:11434"`. + * `LOG_LEVEL`: For Pino logger. Default: `"info"`. 
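The two-step Swift CLI location strategy from item A.5 can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the project's code: `resolveSwiftCliPath` and `defaultIsExecutable` are hypothetical names, and "valid executable" is approximated as "a regular file the current user may execute", checked with Node's `fs.accessSync` (`X_OK`) and `fs.statSync`.

```typescript
import * as path from "node:path";
import { accessSync, constants, statSync } from "node:fs";

// Hypothetical "valid executable" check: a regular file the current user may execute.
function defaultIsExecutable(candidate: string): boolean {
  try {
    accessSync(candidate, constants.X_OK);
    return statSync(candidate).isFile();
  } catch {
    return false;
  }
}

// Prefer PEEKABOO_CLI_PATH when it points at an executable; otherwise fall back to
// the bundled binary at the package root, one level above the compiled dist/ directory.
function resolveSwiftCliPath(
  env: Record<string, string | undefined>,
  serverScriptDir: string,
  isExecutable: (candidate: string) => boolean = defaultIsExecutable,
): string {
  const override = env.PEEKABOO_CLI_PATH;
  if (override && isExecutable(override)) {
    return override;
  }
  return path.resolve(serverScriptDir, "..", "peekaboo");
}
```

In the compiled ES module, `serverScriptDir` would presumably be `path.dirname(fileURLToPath(import.meta.url))`, matching the spec's example, and `env` would be `process.env`. Injecting the executable check keeps the fallback logic testable without touching the filesystem.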
+ * `PEEKABOO_MCP_CONSOLE_LOGGING`: Boolean (`"true"`/`"false"`) for dev console logs. Default: `"false"`. + * `PEEKABOO_CLI_PATH`: Optional override for Swift `peekaboo` CLI path. +6. **Initial Status Reporting Logic:** + * A server-instance-level boolean flag: `let hasSentInitialStatus = false;`. + * A function `generateServerStatusString()`: Creates a formatted string: `"\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: \nConfigured AI Providers (from AI_PROVIDERS ENV): \n---"`. + * Response Augmentation: In the function that sends a `ToolResponse` back to the MCP client, if the response is for a successful tool call (not `initialize`/`initialized` or `peekaboo.list` with `item_type: "server_status"`) AND `hasSentInitialStatus` is `false`: + * Append `generateServerStatusString()` to the first `TextContentItem` in `ToolResponse.content`. If no text item exists, prepend a new one. + * Set `hasSentInitialStatus = true`. +7. **Tool Registration:** Register `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` with their Zod input schemas and handler functions. +8. **Transport:** `await server.connect(new StdioServerTransport());`. +9. **Shutdown:** Implement graceful shutdown on `SIGINT`, `SIGTERM` (e.g., `await server.close(); logger.flush(); process.exit(0);`). + +#### C. MCP Tool Specifications & Node.js Handler Logic + +**General Node.js Handler Pattern (for tools calling Swift `peekaboo` CLI):** + +1. Validate MCP `input` against the tool's Zod schema. If invalid, log error with Pino and return MCP error `ToolResponse`. +2. Construct command-line arguments for Swift `peekaboo` CLI based on MCP `input`. **Always include `--json-output`**. +3. Log the constructed Swift command with Pino at `debug` level. +4. Execute Swift `peekaboo` CLI using `child_process.spawn`, capturing `stdout`, `stderr`, and `exitCode`. +5. If any data is received on Swift CLI's `stderr`, log it immediately with Pino at `warn` level, prefixed (e.g., `[SwiftCLI-stderr]`). 
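The one-time status augmentation in B.6 is small enough to sketch directly. The sketch assumes simplified content-item types rather than the SDK's own, fills the elided version and provider fields from two hypothetical parameters, and uses an invented `(not set)` fallback. `initialize`/`initialized` never reach this function because they are not tool responses.

```typescript
type TextContentItem = { type: "text"; text: string };
type ImageContentItem = { type: "image"; data: string; mimeType: string };
type ContentItem = TextContentItem | ImageContentItem;

interface ToolResponse {
  content: ContentItem[];
  isError?: boolean;
}

// Server-instance-level flag from the spec: the status block is attached only once.
let hasSentInitialStatus = false;

// Hypothetical rendering of the status string; the "(not set)" fallback is an assumption.
function generateServerStatusString(version: string, aiProviders: string | undefined): string {
  return (
    "\n\n--- Peekaboo MCP Server Status ---\n" +
    "Name: PeekabooMCP\n" +
    `Version: ${version}\n` +
    `Configured AI Providers (from AI_PROVIDERS ENV): ${aiProviders?.trim() || "(not set)"}\n` +
    "---"
  );
}

// Mutates and returns `response`. `isStatusListing` is true for peekaboo.list
// calls with item_type "server_status", which the spec excludes.
function augmentWithInitialStatus(
  response: ToolResponse,
  isStatusListing: boolean,
  version: string,
  aiProviders: string | undefined,
): ToolResponse {
  if (hasSentInitialStatus || response.isError || isStatusListing) {
    return response;
  }
  const status = generateServerStatusString(version, aiProviders);
  const firstText = response.content.find(
    (item): item is TextContentItem => item.type === "text",
  );
  if (firstText) {
    firstText.text += status; // append to the first text item
  } else {
    response.content.unshift({ type: "text", text: status }); // or prepend a new one
  }
  hasSentInitialStatus = true;
  return response;
}
```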
+6. On Swift CLI process close: + * If `exitCode !== 0` or `stdout` is empty/not parseable as JSON: + * Log failure details with Pino (`error` level). + * Construct MCP error `ToolResponse` (e.g., `errorCode: "SWIFT_CLI_EXECUTION_ERROR"` or `SWIFT_CLI_INVALID_OUTPUT` in `_meta`). Message should include relevant parts of raw `stdout`/`stderr` if available. + * If `exitCode === 0`: + * Attempt to parse `stdout` as JSON. If parsing fails, treat as error (above). + * Let `swiftResponse = JSON.parse(stdout)`. + * If `swiftResponse.debug_logs` (array of strings) exists, log each entry via Pino at `debug` level, clearly marked as from backend (e.g., `logger.debug({ backend: "swift", swift_log: entry })`). + * If `swiftResponse.success === false`: + * Extract `swiftResponse.error.message`, `swiftResponse.error.code`, `swiftResponse.error.details`. + * Construct and return MCP error `ToolResponse`, relaying these details (e.g., `message` in `content`, `code` in `_meta.backend_error_code`). + * If `swiftResponse.success === true`: + * Process `swiftResponse.data` to construct the success MCP `ToolResponse`. + * Relay `swiftResponse.messages` as `TextContentItem`s in the MCP response if appropriate. + * For `peekaboo.image` with `input.return_data: true`: + * Iterate `swiftResponse.data.saved_files.[*].path`. + * For each path, read image file into a `Buffer`. + * Base64 encode the `Buffer`. + * Construct `ImageContentItem` for MCP `ToolResponse.content`, including `data` (Base64 string) and `mimeType` (from `swiftResponse.data.saved_files.[*].mime_type`). + * Augment successful `ToolResponse` with initial server status string if applicable (see B.6). + * Send MCP `ToolResponse`. + +**Tool 1: `peekaboo.image`** + +* **MCP Description:** "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. 
Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching." +* **MCP Input Schema (`ImageInputSchema`):** + ```typescript + z.object({ + app: z.string().optional().describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."), + path: z.string().optional().describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified."), + mode: z.enum(["screen", "window", "multi"]).optional().describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."), + window_specifier: z.union([ + z.object({ title: z.string().describe("Capture window by title.") }), + z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }), + ]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."), + format: z.enum(["png", "jpg"]).optional().default("png").describe("Output image format. Defaults to 'png'."), + return_data: z.boolean().optional().default(false).describe("Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."), + capture_focus: z.enum(["background", "foreground"]) + .optional().default("background").describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.") + }) + ``` + * **Node.js Handler - Default `mode` Logic:** If `input.app` provided & `input.mode` undefined, `mode="window"`. If no `input.app` & `input.mode` undefined, `mode="screen"`. 
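Two handler rules lend themselves to a short sketch: the default `mode` rule just stated for `peekaboo.image`, and step 6 of the general handler pattern, which turns the Swift CLI's exit code and `stdout` into either data or an MCP error. The sketch assumes a minimal model of the JSON envelope (`success`, `data`, `messages`, `debug_logs`, `error`). `resolveCaptureMode`, `interpretSwiftCliOutput`, and the `UNKNOWN_BACKEND_ERROR` fallback code are hypothetical; `SWIFT_CLI_EXECUTION_ERROR` and `SWIFT_CLI_INVALID_OUTPUT` come from the spec.

```typescript
type CaptureMode = "screen" | "window" | "multi";

// Default mode rule: "window" when an app is given, otherwise "screen".
function resolveCaptureMode(app: string | undefined, mode: CaptureMode | undefined): CaptureMode {
  return mode ?? (app ? "window" : "screen");
}

// Minimal model of the Swift CLI's `--json-output` envelope (assumed field set).
interface SwiftCliResponse {
  success: boolean;
  data?: unknown;
  messages?: string[];
  debug_logs?: string[];
  error?: { message: string; code: string; details?: unknown };
}

type CliOutcome =
  | { ok: true; data: unknown; messages: string[]; debugLogs: string[] }
  | { ok: false; message: string; backendErrorCode: string; debugLogs: string[] };

// Step 6: classify the finished process into success data or an error to relay.
function interpretSwiftCliOutput(exitCode: number | null, stdout: string, stderr: string): CliOutcome {
  if (exitCode !== 0) {
    const detail = stderr.trim() || stdout.trim() || "no output";
    return {
      ok: false,
      message: `Swift CLI exited with code ${exitCode}: ${detail}`,
      backendErrorCode: "SWIFT_CLI_EXECUTION_ERROR",
      debugLogs: [],
    };
  }
  let parsed: unknown = undefined;
  if (stdout.trim() !== "") {
    try {
      parsed = JSON.parse(stdout);
    } catch {
      // Leave `parsed` undefined; reported as invalid output below.
    }
  }
  if (typeof parsed !== "object" || parsed === null) {
    return {
      ok: false,
      message: `Swift CLI produced no parseable JSON: ${stdout.slice(0, 200)}`,
      backendErrorCode: "SWIFT_CLI_INVALID_OUTPUT",
      debugLogs: [],
    };
  }
  const response = parsed as SwiftCliResponse;
  const debugLogs = Array.isArray(response.debug_logs) ? response.debug_logs : [];
  if (response.success !== true) {
    // Relay the backend's own message and code; the fallbacks are hypothetical.
    return {
      ok: false,
      message: response.error?.message ?? "Swift CLI reported failure without a message",
      backendErrorCode: response.error?.code ?? "UNKNOWN_BACKEND_ERROR",
      debugLogs,
    };
  }
  return { ok: true, data: response.data, messages: response.messages ?? [], debugLogs };
}
```

A handler would then log each `debugLogs` entry at `debug` level, map `ok: false` to an MCP error carrying `_meta.backend_error_code`, and only build content items from `data` on success. Keeping this step free of I/O makes it straightforward to unit-test.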
+* **MCP Output Schema (`ToolResponse`):** + * `content`: `Array` + * If `input.return_data: true`: Contains `ImageContentItem`(s): `{ type: "image", data: "", mimeType: "image/", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`. + * May contain `TextContentItem`(s) (summary, file paths from `saved_files`, Swift CLI `messages`). + * `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }>` (Directly from Swift CLI JSON `data.saved_files` if images were saved). + * `isError?: boolean` + * `_meta?: { backend_error_code?: string }` (For relaying Swift CLI error codes). + +**Tool 2: `peekaboo.analyze`** + +* **MCP Description:** "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides." +* **MCP Input Schema (`AnalyzeInputSchema`):** + ```typescript + z.object({ + image_path: z.string().describe("Required. Absolute path to image file (.png, .jpg, .webp) to be analyzed."), + question: z.string().describe("Required. Question for the AI about the image."), + provider_config: z.object({ + type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto") + .describe("AI provider. 'auto' uses server's AI_PROVIDERS ENV preference. Specific provider must be enabled in server's AI_PROVIDERS."), + model: z.string().optional().describe("Optional. Model name. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.") + }).optional().describe("Optional. Explicit provider/model. Validated against server's AI_PROVIDERS.") + }) + ``` +* **Node.js Handler Logic:** + 1. Validate input. 
Server pre-checks `image_path` extension (`.png`, `.jpg`, `.jpeg`, `.webp`); return MCP error if not recognized. + 2. Read `process.env.AI_PROVIDERS`. If unset/empty, return MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (`error` level). + 3. Parse `AI_PROVIDERS` into `configuredItems = [{provider: string, model: string}]`. + 4. **Determine Provider & Model:** + * `requestedProviderType = input.provider_config?.type || "auto"`. + * `requestedModelName = input.provider_config?.model`. + * `chosenProvider: string | null = null`, `chosenModel: string | null = null`. + * If `requestedProviderType !== "auto"`: + * Find entry in `configuredItems` where `provider === requestedProviderType`. + * If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration." + * `chosenProvider = requestedProviderType`. + * `chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider`. + * Else (`requestedProviderType === "auto"`): + * Iterate `configuredItems` in order. For each `{provider, modelFromEnv}`: + * Check availability (Ollama up? Cloud API key for `provider` set in `process.env`?). + * If available: `chosenProvider = provider`, `chosenModel = requestedModelName || modelFromEnv`. Break. + * If no provider found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational." + 5. **Execute Analysis (Node.js handles all AI calls):** + * Read `input.image_path` into a `Buffer`. Base64 encode. + * If `chosenProvider` is "ollama": HTTP POST to Ollama (using `process.env.OLLAMA_BASE_URL`) with Base64 image, `input.question`, `chosenModel`. Handle Ollama API errors. + * If `chosenProvider` is "openai": Use OpenAI SDK/HTTP with Base64 image, `input.question`, `chosenModel`, and API key from `process.env.OPENAI_API_KEY`. Handle OpenAI API errors. 
+ * (Similar for other cloud providers). + 6. Construct MCP `ToolResponse`. +* **MCP Output Schema (`ToolResponse`):** + * `content`: `[{ type: "text", text: "" }]` + * `analysis_text`: `string` (Core AI answer). + * `model_used`: `string` (e.g., "ollama/llava:7b", "openai/gpt-4o") - The actual provider/model pair used. + * `isError?: boolean` + * `_meta?: { backend_error_code?: string }` (For AI provider API errors). + +**Tool 3: `peekaboo.list`** + +* **MCP Description:** "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching." +* **MCP Input Schema (`ListInputSchema`):** + ```typescript + z.object({ + item_type: z.enum(["running_applications", "application_windows", "server_status"]) + .default("running_applications").describe("What to list. 'server_status' returns Peekaboo server info."), + app: z.string().optional().describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."), + include_window_details: z.array( + z.enum(["off_screen", "bounds", "ids"]) + ).optional().describe("Optional, for 'application_windows'. Additional window details. 
Example: ['bounds', 'ids']") + }).refine(data => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""), { + message: "For 'application_windows', 'app' identifier is required.", path: ["app"], + }).refine(data => !data.include_window_details || data.item_type === "application_windows", { + message: "'include_window_details' only for 'application_windows'.", path: ["include_window_details"], + }).refine(data => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined), { + message: "'app' and 'include_window_details' not applicable for 'server_status'.", path: ["item_type"] + }) + ``` +* **Node.js Handler Logic:** + * If `input.item_type === "server_status"`: Handler directly calls `generateServerStatusString()` and returns it in `ToolResponse.content[{type:"text"}]`. Does NOT call Swift CLI. Does NOT affect `hasSentInitialStatus`. + * Else (for "running_applications", "application_windows"): Call Swift `peekaboo list ...` with mapped args (including joining `include_window_details` array to comma-separated string for Swift CLI flag). Parse Swift JSON. Format MCP `ToolResponse`. +* **MCP Output Schema (`ToolResponse`):** + * `content`: `[{ type: "text", text: "

" }]` + * If `item_type: "running_applications"`: `application_list`: `Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>`. + * If `item_type: "application_windows"`: + * `window_list`: `Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x:number,y:number,w:number,h:number}; is_on_screen?: boolean }>`. + * `target_application_info`: `{ app_name: string; bundle_id?: string; pid: number }`. + * `isError?: boolean` + * `_meta?: { backend_error_code?: string }` + +--- + +### II. Swift CLI (`peekaboo`) + +#### A. General CLI Design + +1. **Executable Name:** `peekaboo` (Universal macOS binary: arm64 + x86_64). +2. **Argument Parser:** Use `swift-argument-parser` package. +3. **Top-Level Commands (Subcommands of `peekaboo`):** `image`, `list`. (No `analyze` command). +4. **Global Option (for all commands/subcommands):** `--json-output` (Boolean flag). + * If present: All `stdout` from Swift CLI MUST be a single, valid JSON object. `stderr` should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed. + * If absent: Output human-readable text to `stdout` and `stderr` as appropriate for direct CLI usage. + * **Success JSON Structure:** + ```json + { + "success": true, + "data": { /* Command-specific structured data */ }, + "messages": ["Optional user-facing status/warning message from Swift CLI operations"], + "debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"] + } + ``` + * **Error JSON Structure:** + ```json + { + "success": false, + "error": { + "message": "Detailed, user-understandable error message.", + "code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING + "details": "Optional additional technical details or context." 
+ }, + "debug_logs": ["Contextual debug log leading to error"] + } + ``` + * **Standardized Swift Error Codes (`error.code` values):** + * `PERMISSION_DENIED_SCREEN_RECORDING` + * `PERMISSION_DENIED_ACCESSIBILITY` (if Accessibility API is attempted for foregrounding) + * `APP_NOT_FOUND` (general app lookup failure) + * `AMBIGUOUS_APP_IDENTIFIER` (fuzzy match yields multiple candidates) + * `WINDOW_NOT_FOUND` + * `CAPTURE_FAILED` (general image capture error) + * `FILE_IO_ERROR` (e.g., cannot write to specified path) + * `INVALID_ARGUMENT` (CLI argument validation failure) + * `SIPS_ERROR` (if `sips` is used for PDF fallback and fails) + * `INTERNAL_SWIFT_ERROR` (unexpected Swift runtime errors) +5. **Permissions Handling:** + * The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via `CGWindowListCopyWindowInfo`). + * If Accessibility is used for `--capture-focus foreground` window raising, check that permission. + * If permissions are missing, output the specific JSON error (e.g., code `PERMISSION_DENIED_SCREEN_RECORDING`) and exit. Do not hang or prompt interactively. +6. **Temporary File Management:** + * If the CLI needs to save an image temporarily (e.g., if `screencapture` is used as a fallback for PDF, or if no `--path` is given by Node.js), it uses `FileManager.default.temporaryDirectory` with unique filenames (e.g., `peekaboo__.`). + * These self-created temporary files **MUST be deleted by the Swift CLI** after it has successfully generated and flushed its JSON output to `stdout`. + * Files saved to a user/Node.js-specified `--path` are **NEVER** deleted by the Swift CLI. +7. **Internal Logging for `--json-output`:** + * When `--json-output` is active, internal verbose/debug messages are collected into the `debug_logs: [String]` array in the final JSON output. They are **NOT** printed to `stderr`. 
+ * For standalone CLI use (no `--json-output`), these debug messages can print to `stderr`. + +#### B. `peekaboo image` Command + +* **Options (defined using `swift-argument-parser`):** + * `--app `: App identifier. + * `--path `: Base output directory or file prefix/path. + * `--mode `: `ModeEnum` is `screen, window, multi`. Default logic: if `--app` then `window`, else `screen`. + * `--window-title `: For `mode window`. + * `--window-index `: For `mode window`. + * `--format `: `FormatEnum` is `png, jpg`. Default `png`. + * `--capture-focus `: `FocusEnum` is `background, foreground`. Default `background`. +* **Behavior:** + * Implements fuzzy app matching. On ambiguity, returns JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"` and lists potential matches in `error.details` or `error.message`. + * Always attempts to exclude window shadow/frame (`CGWindowImageOption.boundsIgnoreFraming` or `screencapture -o` if shelled out for PDF). No cursor is captured. + * **Background Capture (`--capture-focus background` or default):** + * Primary method: Uses `CGWindowListCopyWindowInfo` to identify target window(s)/screen(s). + * Captures via `CGDisplayCreateImage` (for screen mode) or `CGWindowListCreateImageFromArray` (for window/multi modes). + * Converts `CGImage` to `Data` (PNG or JPG) and saves to file (at user `--path` or its own temp path). + * **Foreground Capture (`--capture-focus foreground`):** + * Activates app using `NSRunningApplication.activate(options: [.activateIgnoringOtherApps])`. + * If a specific window needs raising (e.g., from `--window-index` or specific `--window-title` for an app with many windows), it *may* attempt to use Accessibility API (`AXUIElementPerformAction(kAXRaiseAction)`) if available and permissioned. 
+ * If the specific window raise fails (or Accessibility is not used/permitted), it logs a warning to the `debug_logs` array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app. + * The capture mechanism still prefers native CG APIs. + * **Multi-Screen (`--mode screen`):** Enumerates `CGGetActiveDisplayList`, captures each display using `CGDisplayCreateImage`. Filenames (if saving) get display-specific suffixes (e.g., `_display0_main.png`, `_display1.png`). + * **Multi-Window (`--mode multi`):** Uses `CGWindowListCopyWindowInfo` for the target app's PID, captures each relevant window (on-screen by default) with `CGWindowListCreateImageFromArray`. Filenames get window-specific suffixes. + * **PDF Format Handling (as per Q7 decision):** Not applicable. PDF output has been removed, so `--format` accepts only `png` and `jpg`; the former plan to call `screencapture -t pdf -R` or `-l` via `Process` no longer applies. +* **JSON Output `data` field structure (on success):** + ```json + { + "saved_files": [ // Array is always present, even if empty (e.g. capture failed before saving) + { + "path": "/absolute/path/to/saved/image.png", // Absolute path + "item_label": "Display 1 / Main", // Or window_title for window/multi modes + "window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant + "window_index": 0, // Optional, if relevant (e.g. for multi-window or indexed capture) + "mime_type": "image/png" // Actual MIME type of the saved file + } + // ... more items if mode is screen or multi ... + ] + } + ``` + +#### C. `peekaboo list` Command + +* **Subcommands & Options:** + * `peekaboo list apps [--json-output]` + * `peekaboo list windows --app [--include-details ] [--json-output]` + * `--include-details` options: `off_screen`, `bounds`, `ids`. +* **Behavior:** + * `apps`: Uses `NSWorkspace.shared.runningApplications`.
For each app, retrieves `localizedName`, `bundleIdentifier`, `processIdentifier` (pid), and `isActive`. To get `window_count`, it calls `CGWindowListCopyWindowInfo`, filters the results by the app's PID (`kCGWindowOwnerPID`), and counts on-screen windows. + * `windows`: + * Resolves `app_identifier` using fuzzy matching. If ambiguous, returns a JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"`. + * Calls `CGWindowListCopyWindowInfo` and filters the results by the target app's PID (`kCGWindowOwnerPID`). + * If `--include-details` contains `"off_screen"`, uses `CGWindowListOption.optionAll` (and reports `kCGWindowIsOnscreen` as `is_on_screen` in the output). Otherwise, uses `CGWindowListOption.optionOnScreenOnly`. + * Extracts `kCGWindowName` (title). + * If `--include-details` contains `"ids"`, extracts `kCGWindowNumber` as `window_id`. + * If `--include-details` contains `"bounds"`, extracts `kCGWindowBounds` as `bounds: {x, y, width, height}`. + * `window_index` is the 0-based index in the filtered array returned by `CGWindowListCopyWindowInfo` (reflecting z-order for on-screen windows). +* **JSON Output `data` field structure (on success):** + * For `apps`: + ```json + { + "applications": [ + { + "app_name": "Safari", + "bundle_id": "com.apple.Safari", + "pid": 501, + "is_active": true, + "window_count": 3 // Count of on-screen windows for this app + } + // ... more applications ... + ] + } + ``` + * For `windows`: + ```json + { + "target_application_info": { + "app_name": "Safari", + "pid": 501, + "bundle_id": "com.apple.Safari" + }, + "windows": [ + { + "window_title": "Apple", + "window_id": 67, // if "ids" requested + "window_index": 0, + "is_on_screen": true, // Potentially useful, especially if "off_screen" included + "bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested + } + // ... more windows ... + ] + } + ``` + +--- + +### III. Build, Packaging & Distribution + +1. **Swift CLI (`peekaboo`):** + * `Package.swift` defines an executable product named `peekaboo`.
+ * Build process (e.g., part of NPM `prepublishOnly` or a separate build script): `swift build -c release --arch arm64 --arch x86_64`. + * The resulting universal binary (e.g., from `.build/apple/Products/Release/peekaboo`) is copied to the root of the `peekaboo-mcp` NPM package directory before publishing. +2. **Node.js MCP Server:** + * TypeScript is compiled to JavaScript (e.g., into `dist/`) using `tsc`. + * The NPM package includes `dist/` and the `peekaboo` Swift binary (at package root). + +--- + +### IV. Documentation (`README.md` for `peekaboo-mcp` NPM Package) + +1. **Project Overview:** Briefly state vision and components. +2. **Prerequisites:** + * macOS version (e.g., 12.0+ or as required by Swift/APIs). + * Xcode Command Line Tools (recommended for a stable development environment on macOS, even if not strictly used by the final Swift binary for all operations). + * Ollama (if using local Ollama for analysis) + instructions to pull models. +3. **Installation:** + * Primary: `npm install -g peekaboo-mcp`. + * Alternative: `npx peekaboo-mcp`. +4. **MCP Client Configuration:** + * Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use `peekaboo-mcp`. + * Example for VS Code/Cursor using `npx` for robustness (add any other environment variables to the `env` object): + ```json + { + "mcpServers": { + "PeekabooMCP": { + "command": "npx", + "args": ["peekaboo-mcp"], + "env": { + "AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o", + "OPENAI_API_KEY": "sk-yourkeyhere" + } + } + } + } + ``` +5. **Required macOS Permissions:** + * **Screen Recording:** Essential for ALL `peekaboo.image` functionality and for `peekaboo.list`, which reads window titles via `CGWindowListCopyWindowInfo`. Provide clear, step-by-step instructions for System Settings. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"` command.
+ * **Accessibility:** Required *only* if `peekaboo.image` with `capture_focus: "foreground"` needs to perform specific window raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"` command. +6. **Environment Variables (for Node.js `peekaboo-mcp` server):** + * `AI_PROVIDERS`: Crucial for `peekaboo.analyze`. Explain format (`provider/model,provider/model`), effect, and that `peekaboo.analyze` reports "not configured" if unset. List recognized `provider` names ("ollama", "openai"). + * `OPENAI_API_KEY` (and similar for other cloud providers): How they are used. + * `OLLAMA_BASE_URL`: Default and purpose. + * `LOG_LEVEL`: For `pino` logger. Values and default. + * `PEEKABOO_MCP_CONSOLE_LOGGING`: For development. + * `PEEKABOO_CLI_PATH`: For overriding bundled Swift CLI. +7. **MCP Tool Overview:** + * Brief descriptions of `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` and their primary purpose. +8. **Link to Detailed Tool Specification:** A separate `TOOL_API_REFERENCE.md` (generated from or summarizing the Zod schemas and output structures in this document) for users/AI developers needing full schema details. +9. **Troubleshooting / Support:** Link to GitHub issues.
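The default-`mode` rule for `peekaboo.image` and the mapping of its MCP input onto the `peekaboo image` flags from Section II.B can be sketched in TypeScript as follows. This is a minimal illustration, not the server's actual handler: `ImageInput`, `resolveMode`, and `buildImageArgs` are hypothetical names, and appending the global `--json-output` flag last is an assumption.

```typescript
// Hypothetical input shape mirroring ImageInputSchema.
interface ImageInput {
  app?: string;
  path?: string;
  mode?: "screen" | "window" | "multi";
  window_specifier?: { title: string } | { index: number };
  format?: "png" | "jpg";
  return_data?: boolean;
  capture_focus?: "background" | "foreground";
}

// Default mode: "window" when an app is given, otherwise "screen".
function resolveMode(input: ImageInput): "screen" | "window" | "multi" {
  if (input.mode !== undefined) return input.mode;
  return input.app !== undefined ? "window" : "screen";
}

// Maps MCP input onto `peekaboo image` flags. `return_data` has no CLI
// counterpart in this sketch: the Node.js handler reads the saved files itself.
function buildImageArgs(input: ImageInput): string[] {
  const args = ["image", "--mode", resolveMode(input)];
  if (input.app) args.push("--app", input.app);
  if (input.path) args.push("--path", input.path);
  if (input.window_specifier) {
    if ("title" in input.window_specifier) {
      args.push("--window-title", input.window_specifier.title);
    } else {
      args.push("--window-index", String(input.window_specifier.index));
    }
  }
  args.push("--format", input.format ?? "png");
  args.push("--capture-focus", input.capture_focus ?? "background");
  args.push("--json-output"); // global flag: stdout becomes a single JSON object
  return args;
}
```

Keeping `return_data` out of the argument list reflects the division of labor in the spec: the Swift CLI reports `saved_files`, and the Node.js layer decides whether to Base64-encode those files into `ImageContentItem`s.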
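Step 3 of the `peekaboo.analyze` handler parses `AI_PROVIDERS` (format `provider/model,provider/model`) into `configuredItems`. A minimal sketch follows; `parseAiProviders` is a hypothetical name, splitting at the first `/` lets model names contain further slashes, and silently skipping malformed entries is an assumption the spec does not state.

```typescript
interface ProviderEntry {
  provider: string;
  model: string;
}

// Parses e.g. "ollama/llava:latest,openai/gpt-4o" into ordered entries.
// Order matters: "auto" selection walks this list front to back.
function parseAiProviders(raw: string | undefined): ProviderEntry[] {
  if (!raw || raw.trim() === "") return []; // unset/empty: analysis not configured
  const entries: ProviderEntry[] = [];
  for (const part of raw.split(",")) {
    const item = part.trim();
    const slash = item.indexOf("/");
    // Assumption: skip entries without a provider or a model.
    if (slash <= 0 || slash === item.length - 1) continue;
    entries.push({ provider: item.slice(0, slash), model: item.slice(slash + 1) });
  }
  return entries;
}
```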
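The provider/model decision in step 4 of the `peekaboo.analyze` handler can be expressed as a pure function, which keeps it testable without network access. In this sketch `chooseProvider` and `DEFAULT_MODELS` are hypothetical, and the `isAvailable` callback stands in for the real checks (is Ollama reachable? is the provider's API key set in `process.env`?). The error strings are taken from the spec.

```typescript
interface ProviderEntry {
  provider: string;
  model: string;
}
interface Choice {
  provider: string;
  model: string;
}

// Hypothetical per-provider fallbacks; the spec only says "an internal default".
const DEFAULT_MODELS: Record<string, string> = { ollama: "llava:latest", openai: "gpt-4o" };

function chooseProvider(
  configured: ProviderEntry[],
  request: { type?: string; model?: string } | undefined,
  isAvailable: (provider: string) => boolean,
): Choice {
  const type = request?.type ?? "auto";
  if (type !== "auto") {
    // Explicit provider: must be enabled in AI_PROVIDERS.
    const match = configured.find((e) => e.provider === type);
    if (!match) {
      throw new Error(`Provider '${type}' is not enabled in server's AI_PROVIDERS configuration.`);
    }
    return { provider: type, model: request?.model || match.model || DEFAULT_MODELS[type] };
  }
  // "auto": first operational provider in AI_PROVIDERS order wins.
  for (const entry of configured) {
    if (isAvailable(entry.provider)) {
      return { provider: entry.provider, model: request?.model || entry.model };
    }
  }
  throw new Error("No configured AI providers in AI_PROVIDERS are currently operational.");
}
```

Note that, as in the spec, an explicitly requested provider is only checked against `AI_PROVIDERS`, not for availability; an unreachable service then surfaces as an API error during step 5.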
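Steps 1 and 5 of the `peekaboo.analyze` handler (extension pre-check, then Base64-encoding the image for Ollama) might look like the sketch below. Ollama's `/api/generate` endpoint accepts `model`, `prompt`, and an `images` array of Base64 strings; `stream: false` requests a single JSON reply. The function names are hypothetical, the case-insensitive extension match is an assumption, and the caller is assumed to pass `process.env.OLLAMA_BASE_URL` (Ollama itself listens on `http://localhost:11434` by default).

```typescript
import { extname } from "node:path";
import { Buffer } from "node:buffer";

const SUPPORTED_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".webp"]);

// Step 1: reject paths whose extension is not a recognized image type.
function isSupportedImagePath(imagePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extname(imagePath).toLowerCase());
}

// Step 5 (Ollama branch): build, but do not send, the HTTP request.
function buildOllamaRequest(imageBytes: Uint8Array, question: string, model: string, baseUrl: string) {
  return {
    url: new URL("/api/generate", baseUrl).toString(),
    body: {
      model,
      prompt: question,
      images: [Buffer.from(imageBytes).toString("base64")],
      stream: false,
    },
  };
}
```

The handler would then `POST` `JSON.stringify(body)` to `url` and map any non-2xx status into an MCP error with `_meta.backend_error_code`.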
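The three `.refine()` rules of `ListInputSchema` and the handler's mapping to `peekaboo list` (including joining `include_window_details` into a comma-separated `--include-details` value) can be mirrored without Zod, as in this sketch. `validateListInput` and `buildListArgs` are hypothetical helpers; returning `null` for `server_status` reflects the spec's rule that this item type never invokes the Swift CLI.

```typescript
type ItemType = "running_applications" | "application_windows" | "server_status";
type WindowDetail = "off_screen" | "bounds" | "ids";
interface ListInput {
  item_type?: ItemType;
  app?: string;
  include_window_details?: WindowDetail[];
}

// Mirrors the refine() rules; returns the first violated message, or null.
function validateListInput(input: ListInput): string | null {
  const itemType = input.item_type ?? "running_applications";
  if (itemType === "application_windows" && (input.app === undefined || input.app.trim() === "")) {
    return "For 'application_windows', 'app' identifier is required.";
  }
  if (input.include_window_details && itemType !== "application_windows") {
    return "'include_window_details' only for 'application_windows'.";
  }
  if (itemType === "server_status" && (input.app !== undefined || input.include_window_details !== undefined)) {
    return "'app' and 'include_window_details' not applicable for 'server_status'.";
  }
  return null;
}

// Maps validated input to CLI arguments; server_status is answered by Node.js alone.
function buildListArgs(input: ListInput): string[] | null {
  const itemType = input.item_type ?? "running_applications";
  if (itemType === "server_status") return null;
  if (itemType === "running_applications") return ["list", "apps", "--json-output"];
  const args = ["list", "windows", "--app", input.app ?? ""];
  const details = input.include_window_details;
  if (details && details.length > 0) args.push("--include-details", details.join(","));
  args.push("--json-output");
  return args;
}
```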
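On the Node.js side, the Swift CLI's `--json-output` envelope (Section II.A.4) can be modeled as a discriminated union on `success`, with error codes relayed through `_meta.backend_error_code`. This is a sketch under stated assumptions: the helper names are hypothetical, and `INVALID_JSON` is a made-up Node-side code for unparseable stdout, not one of the standardized Swift error codes.

```typescript
interface SwiftSuccess<T> {
  success: true;
  data: T;
  messages?: string[];
  debug_logs?: string[];
}
interface SwiftFailure {
  success: false;
  error: { message: string; code: string; details?: string };
  debug_logs?: string[];
}
type SwiftResponse<T> = SwiftSuccess<T> | SwiftFailure;

interface ToolResponse {
  content: { type: "text"; text: string }[];
  isError?: boolean;
  _meta?: { backend_error_code?: string };
}

// Parses stdout; anything lacking a boolean `success` becomes a failure
// with the hypothetical code INVALID_JSON.
function parseSwiftOutput<T>(stdout: string): SwiftResponse<T> {
  try {
    const parsed = JSON.parse(stdout);
    if (parsed && typeof parsed === "object" && typeof parsed.success === "boolean") {
      return parsed as SwiftResponse<T>;
    }
  } catch {
    // fall through to the failure below
  }
  return { success: false, error: { message: "Swift CLI did not return valid JSON.", code: "INVALID_JSON" } };
}

// Surfaces the message (plus details) and relays the Swift error code.
function toErrorResponse(res: SwiftFailure): ToolResponse {
  const text = res.error.details ? `${res.error.message}\n${res.error.details}` : res.error.message;
  return { content: [{ type: "text", text }], isError: true, _meta: { backend_error_code: res.error.code } };
}
```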