mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-03-25 09:25:47 +00:00
Add docs
This commit is contained in:
parent
95a5208127
commit
f746dc45c2
8 changed files with 3797 additions and 836 deletions
108
.cursor/rules/agent.mdc
Normal file
|
|
@@ -0,0 +1,108 @@
|
|||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: false
|
||||
---
|
||||
# Agent Instructions
|
||||
|
||||
This file provides guidance to AI assistants when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
This is the `peekaboo` project, which provides a Model Context Protocol (MCP) server that enables executing AppleScript and JavaScript for Automation (JXA) scripts on macOS. The server features a knowledge base of pre-defined scripts accessible by ID and supports inline scripts, script files, and argument passing.
|
||||
|
||||
## Architecture
|
||||
|
||||
- **Server Configuration**: The server reads configuration from environment variables like `LOG_LEVEL` and `KB_PARSING`.
|
||||
- **MCP Tools**: Two main tools are provided:
|
||||
1. `execute_script`: Executes AppleScript/JXA from inline content, file path, or knowledge base ID
|
||||
2. `get_scripting_tips`: Retrieves information from the knowledge base
|
||||
- **Knowledge Base**: A collection of pre-defined scripts stored as Markdown files in `knowledge_base/` directory with YAML frontmatter
|
||||
- **ScriptExecutor**: Core component that executes scripts via `osascript` command
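
As a rough illustration of the `osascript` step, the sketch below builds the argument list for an inline script or a script file. The helper name `buildOsascriptArgs` and its option names are hypothetical and are not taken from the project's `ScriptExecutor`; only the `-l` and `-e` flags are standard `osascript` options.

```javascript
// Hypothetical helper: builds the argument list for `osascript`.
// Option names (language, inlineScript, filePath, args) are illustrative only.
function buildOsascriptArgs({ language = "applescript", inlineScript, filePath, args = [] }) {
  if (Boolean(inlineScript) === Boolean(filePath)) {
    throw new Error("Provide exactly one of inlineScript or filePath");
  }
  const argv = [];
  // `-l JavaScript` selects JXA; AppleScript is osascript's default language.
  if (language === "javascript") argv.push("-l", "JavaScript");
  // `-e` passes the script text directly; otherwise osascript reads the file.
  if (inlineScript) argv.push("-e", inlineScript);
  else argv.push(filePath);
  // Trailing arguments are handed to the script's `run` handler.
  return argv.concat(args.map(String));
}

// Usage (macOS only), left as a comment so the sketch has no side effects:
//   const { execFile } = require("node:child_process");
//   execFile("osascript", buildOsascriptArgs({ inlineScript: 'return "hi"' }), callback);
```

Passing the arguments as an array to `execFile` avoids a shell, so the script text needs no shell quoting.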
|
||||
|
||||
## Knowledge Base System
|
||||
|
||||
The knowledge base (`knowledge_base/` directory) contains numerous Markdown files organized by category:
|
||||
- Each file has YAML frontmatter with metadata: `id`, `title`, `description`, `language`, etc.
|
||||
- The actual script code is contained in the Markdown body in a fenced code block
|
||||
- Scripts can use placeholders like `--MCP_INPUT:keyName` and `--MCP_ARG_N` for parameter substitution
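
The file layout described above can be sketched as a small parser. This is a minimal illustration, not the project's `knowledgeBaseService.ts`: the function name `parseKnowledgeBaseFile` is hypothetical, and it only handles flat `key: value` frontmatter, where a real implementation would use a proper YAML parser.

```javascript
// Three backticks, built indirectly so this sketch can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Hypothetical helper: splits a knowledge-base Markdown file into its
// frontmatter fields and the contents of the first fenced code block.
function parseKnowledgeBaseFile(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("Missing YAML frontmatter");
  const [, rawFrontmatter, body] = match;

  const metadata = {};
  for (const line of rawFrontmatter.split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip blank or unsupported lines
    const key = line.slice(0, separator).trim();
    // Strip one surrounding quote character on each side, if present.
    const value = line.slice(separator + 1).trim().replace(/^["']|["']$/g, "");
    if (key) metadata[key] = value;
  }

  // Capture everything between the first opening fence line and its closing fence.
  const fencePattern = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)\\n" + FENCE);
  const fenceMatch = fencePattern.exec(body);
  return { metadata, script: fenceMatch ? fenceMatch[1] : null };
}
```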
|
||||
|
||||
## Common Development Commands
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
npm install
|
||||
|
||||
# Run the server in development mode with hot reloading
|
||||
npm run dev
|
||||
|
||||
# Build the TypeScript project
|
||||
npm run build
|
||||
|
||||
# Start the compiled server
|
||||
npm run start
|
||||
|
||||
# Lint the codebase
|
||||
npm run lint
|
||||
|
||||
# Format the codebase
|
||||
npm run format
|
||||
|
||||
# Validate the knowledge base
|
||||
npm run validate
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
- `LOG_LEVEL`: Set logging level (`DEBUG`, `INFO`, `WARN`, `ERROR`) - default is `INFO`
|
||||
- `KB_PARSING`: Controls when knowledge base is parsed:
|
||||
- `lazy` (default): Parsed on first request
|
||||
- `eager`: Parsed when server starts
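
A minimal sketch of reading these two variables with their documented defaults follows. The function name `readServerConfig` is hypothetical, and two behaviors are assumptions of this sketch rather than confirmed server behavior: values are matched case-insensitively, and unrecognized values fall back to the default.

```javascript
const LOG_LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"];
const KB_PARSING_MODES = ["lazy", "eager"];

// Hypothetical helper: reads LOG_LEVEL and KB_PARSING from an env-like object.
// Assumption: unknown or missing values fall back to the documented defaults.
function readServerConfig(env = process.env) {
  const logLevel = String(env.LOG_LEVEL || "").toUpperCase();
  const kbParsing = String(env.KB_PARSING || "").toLowerCase();
  return {
    logLevel: LOG_LEVELS.includes(logLevel) ? logLevel : "INFO",
    kbParsing: KB_PARSING_MODES.includes(kbParsing) ? kbParsing : "lazy",
  };
}
```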
|
||||
|
||||
## Working with the Knowledge Base
|
||||
|
||||
When adding new scripts to the knowledge base:
|
||||
1. Create a new `.md` file in the appropriate category folder
|
||||
2. Include required YAML frontmatter (`title`, `description`, etc.)
|
||||
3. Add the script code in a fenced code block
|
||||
4. Run `npm run validate` to ensure the new content is correctly formatted
|
||||
|
||||
## Code Execution Flow
|
||||
|
||||
1. The `server.ts` file defines the MCP server and its tools
|
||||
2. `knowledgeBaseService.ts` loads and indexes scripts from the knowledge base
|
||||
3. `ScriptExecutor.ts` handles the actual execution of scripts
|
||||
4. Input validation is handled via Zod schemas in `schemas.ts`
|
||||
5. Logging is managed by the `Logger` class in `logger.ts`
|
||||
|
||||
## Security and Permissions
|
||||
|
||||
Remember that scripts run on macOS require specific permissions:
|
||||
- Automation permissions for controlling applications
|
||||
- Accessibility permissions for UI scripting via System Events
|
||||
- Full Disk Access for certain file operations
|
||||
|
||||
## Agent Operational Learnings and Debugging Strategies
|
||||
|
||||
This section captures key operational strategies and debugging techniques for the agent (me) based on collaborative sessions.
|
||||
|
||||
### Prioritizing Log Visibility for Debugging
|
||||
|
||||
When an external tool or script (like AppleScript via `osascript`) returns cryptic errors, or when agent-generated code/substitutions might be faulty:
|
||||
|
||||
1. **Suspect Dynamic Content**: Issues often stem from the dynamic content being passed to the external tool (e.g., incorrect placeholder substitutions leading to syntax errors in the target language).
|
||||
2. **Enable/Add Detailed Logging**: Prioritize enabling any built-in detailed logging features of the tool in question (e.g., `includeSubstitutionLogs: true` for this project's `execute_script` tool).
|
||||
3. **Ensure Log Visibility**: If standard debug logging doesn't appear in the primary output channel the user is observing, attempt to modify the code to force critical diagnostic information (like step-by-step transformations, variable states, or the exact content being passed externally) into that main output. This might involve temporarily altering the structure of the success or error messages to include these logs.
|
||||
4. **Confirm Restarts and Code Version**: For changes requiring server restarts (common in this project), leverage any features that confirm the new code is active. For example, the server startup timestamp and execution mode info appended to `get_scripting_tips` output helps verify that a restart was successful and the intended code version (e.g., TypeScript source via `tsx` vs. compiled `dist/server.js`) is running.
|
||||
|
||||
### Iterative Simplification for Complex Patterns (e.g., Regex)
|
||||
|
||||
If a complex pattern (like a regular expression) in code being generated or modified by the agent is not working as expected, and the cause isn't immediately obvious:
|
||||
|
||||
1. **Isolate the Pattern**: Identify the specific complex pattern (e.g., a regex for string replacement).
|
||||
2. **Drastically Simplify**: Reduce the pattern to its most basic form that should still achieve a part of the goal or match a core component of the target string. (e.g., simplifying `/(?:["'])--MCP_INPUT:(\w+)(?:["'])/g` to `/--MCP_INPUT:/g` to test basic matching of the placeholder prefix).
|
||||
3. **Test the Simple Form**: Verify if this simplified pattern works. If it does, the core string manipulation mechanism is likely sound.
|
||||
4. **Incrementally Rebuild & Test**: Gradually add back elements of the original complexity (e.g., capture groups, character sets, quantifiers, lookarounds, backreferences like `\1`). Test at each incremental step to pinpoint which specific construct or combination introduces the failure. This process helped identify that `(?:["'])` was problematic in our placeholder regex, leading to a solution using a capturing group and a backreference like `/(["'])--MCP_INPUT:(\w+)\1/g`.
|
||||
5. **Verify Replacement Logic**: Ensure that if the pattern involves capturing groups for use in a replacement, the replacement logic correctly utilizes these captures and produces the intended output format (e.g., `valueToAppleScriptLiteral` for AppleScript).
|
||||
|
||||
This methodical approach is more effective than repeatedly trying minor variations of an already complex and failing pattern.
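
The simplified and rebuilt patterns from the steps above can be compared in a small sketch. The two regular expressions come from these notes; `toAppleScriptLiteral` is a hypothetical stand-in for the project's `valueToAppleScriptLiteral`, whose real behavior may differ, and `substituteInputs` is likewise an illustrative name.

```javascript
// Step 2: the drastically simplified pattern, matching only the placeholder prefix.
function hasPlaceholderPrefix(script) {
  return /--MCP_INPUT:/.test(script);
}

// Step 4: the rebuilt pattern. `\1` requires the closing quote to match the opening one.
const QUOTED_PLACEHOLDER = /(["'])--MCP_INPUT:(\w+)\1/g;

// Hypothetical stand-in for valueToAppleScriptLiteral; the real helper may differ.
function toAppleScriptLiteral(value) {
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  // Escape backslashes first, then double quotes, and wrap in double quotes.
  return '"' + String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Step 5: the replacement uses the captured key and drops the original quotes,
// because the literal supplies its own quoting. Unknown keys are left untouched.
function substituteInputs(script, inputs) {
  return script.replace(QUOTED_PLACEHOLDER, (match, _quote, key) =>
    Object.prototype.hasOwnProperty.call(inputs, key) ? toAppleScriptLiteral(inputs[key]) : match
  );
}
```

Testing `hasPlaceholderPrefix` first confirms the basic matching works before the quoted form is blamed for a failed substitution.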
|
||||
98
.cursor/rules/mcp-inspector.mdc
Normal file
|
|
@@ -0,0 +1,98 @@
|
|||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: false
|
||||
---
|
||||
Rule Name: mcp-inspector
|
||||
Description: Debugging and verifying the `macos-automator-mcp` server via the MCP Inspector, using Playwright for UI automation and direct terminal commands for server management. This rule prioritizes stability and detailed verification through Playwright's introspection capabilities.
|
||||
|
||||
**Required Tools:**
|
||||
- `run_terminal_cmd`
|
||||
- `mcp_playwright_browser_navigate`
|
||||
- `mcp_playwright_browser_type`
|
||||
- `mcp_playwright_browser_click`
|
||||
- `mcp_playwright_browser_snapshot`
|
||||
- `mcp_playwright_browser_console_messages`
|
||||
- `mcp_playwright_browser_wait_for`
|
||||
|
||||
**User Workspace Path Placeholder:**
|
||||
- The path to the `start.sh` script will be specified as `[WORKSPACE_PATH]/start.sh`.
|
||||
- The AI assistant executing this rule **MUST** replace `[WORKSPACE_PATH]` with the absolute path to the user's current project workspace (e.g., as found in the `<user_info>` context block during rule execution).
|
||||
- Example of a resolved path if the workspace is `/Users/username/Projects/my-mcp-project`: `/Users/username/Projects/my-mcp-project/start.sh`.
|
||||
|
||||
---
|
||||
**Main Flow:**
|
||||
|
||||
**Phase 1: Start MCP Inspector Server**
|
||||
1. **Kill Existing Inspector Processes:**
|
||||
* Action: Call `run_terminal_cmd`.
|
||||
* `command`: `pkill -f 'npx @modelcontextprotocol/inspector' || true`
|
||||
* `is_background`: `false`
|
||||
* Expected: Cleans up any lingering Inspector processes.
|
||||
2. **Start New Inspector Process:**
|
||||
* Action: Call `run_terminal_cmd`.
|
||||
* `command`: `npx @modelcontextprotocol/inspector`
|
||||
* `is_background`: `true`
|
||||
* Expected: MCP Inspector starts in the background.
|
||||
3. **Wait for Inspector Initialization:**
|
||||
* Action: Call `mcp_playwright_browser_wait_for`.
|
||||
* `time`: `10` (seconds)
|
||||
* Expected: Allows ample time for the Inspector server to be ready. This step requires an active Playwright page, so it's implicitly preceded by navigation in Phase 2 if the browser isn't already open.
|
||||
|
||||
**Phase 2: Connect to Server via Playwright**
|
||||
1. **Navigate to Inspector URL:**
|
||||
* Action: Call `mcp_playwright_browser_navigate`.
|
||||
* `url`: `http://127.0.0.1:6274`
|
||||
* Expected: Playwright opens the MCP Inspector web UI.
|
||||
* Snapshot: Take a snapshot (`mcp_playwright_browser_snapshot`) to confirm page load and identify initial form element references (`ref`).
|
||||
2. **Fill Form (Command & Args only):**
|
||||
* **Set Command:**
|
||||
* Action: Call `mcp_playwright_browser_type`.
|
||||
* `element`: "Command textbox" (Obtain `ref` from snapshot).
|
||||
* `text`: `macos-automator-mcp`
|
||||
* **Set Arguments:**
|
||||
* Action: Call `mcp_playwright_browser_type`.
|
||||
* `element`: "Arguments textbox" (Obtain `ref` from snapshot).
|
||||
* `text`: `[WORKSPACE_PATH]/start.sh` (This placeholder MUST be replaced by the AI executing the rule with the absolute path to the user's current workspace).
|
||||
* *(Note: Environment Variables are skipped in this flow for simplicity and stability, as issues were previously observed when setting LOG_LEVEL=DEBUG during connection.)*
|
||||
3. **Click "Connect":**
|
||||
* Action: Call `mcp_playwright_browser_click`.
|
||||
* `element`: "Connect button" (Obtain `ref` from snapshot).
|
||||
* Expected: Connection to the `macos-automator-mcp` server is established.
|
||||
* Snapshot: Take a snapshot. Verify connection status (e.g., text changes to "Connected") and check for initial server logs in the UI.
|
||||
|
||||
**Phase 3: Interact with a Tool via Playwright**
|
||||
1. **List Tools:**
|
||||
* Action: Call `mcp_playwright_browser_click`.
|
||||
* `element`: "List Tools button" (Obtain `ref` from the latest snapshot).
|
||||
* Expected: The list of available tools from the `macos-automator-mcp` server is displayed.
|
||||
* Snapshot: Take a snapshot. Verify tools like `execute_script` and `get_scripting_tips` are visible.
|
||||
2. **Select 'get_scripting_tips' Tool:**
|
||||
* Action: Call `mcp_playwright_browser_click`.
|
||||
* `element`: "get_scripting_tips tool in list" (Obtain `ref` by identifying it in the snapshot's tool list).
|
||||
* Expected: The parameters form for `get_scripting_tips` is displayed in the right-hand panel.
|
||||
* Snapshot: Take a snapshot. Verify the right panel shows details for `get_scripting_tips` (e.g., its name, description, and parameter fields like 'searchTerm', 'listCategories', etc.).
|
||||
3. **Execute 'get_scripting_tips' (default parameters):**
|
||||
* Action: Call `mcp_playwright_browser_click`.
|
||||
* `element`: "Run Tool button" (Obtain `ref` for the 'Run Tool' button specific to the `get_scripting_tips` form in the right panel from the snapshot).
|
||||
* Expected: The `get_scripting_tips` tool is executed with its default parameters.
|
||||
* Snapshot: Take a snapshot.
|
||||
|
||||
**Phase 4: Verify Tool Execution and Logs in Playwright**
|
||||
1. **Check for Results in UI:**
|
||||
* Action: Examine the latest snapshot.
|
||||
* Look for: The results of the `get_scripting_tips` call (e.g., a list of script categories if `listCategories` was implicitly true by default, or an empty result if no default search term was run).
|
||||
* The results should appear in the 'Result from tool' or a similarly named section within the right-hand panel where the tool's form was.
|
||||
2. **Check Console Logs (Optional but Recommended):**
|
||||
* Action: Call `mcp_playwright_browser_console_messages`.
|
||||
* Expected: Review for any errors or relevant messages from the Inspector or the tool interaction.
|
||||
3. **Check MCP Server Logs in UI:**
|
||||
* Action: Examine the latest snapshot.
|
||||
* Look for: Logs related to the `get_scripting_tips` tool execution in the main server log panel (usually bottom-left, titled "Error output from MCP server" or similar, but also shows general logs).
|
||||
|
||||
**Troubleshooting Notes:**
|
||||
- If connection fails, check the `run_terminal_cmd` output for the Inspector to ensure it started correctly.
|
||||
- Check Playwright console messages for clues.
|
||||
- Ensure the `[WORKSPACE_PATH]` was correctly resolved and points to an existing `start.sh` script.
|
||||
- Element `ref` values can change. Always use the latest snapshot to get correct `ref` values before an interaction.
|
||||
- Shadow DOM: The MCP Inspector UI uses Shadow DOM extensively for the tool details and results panels. Playwright's default selectors should pierce Shadow DOM, but if issues arise with finding elements *within* the tool panel (right-hand side after selecting a tool), be mindful of this. The provided flow assumes Playwright's auto-piercing handles this sufficiently.
|
||||
216
.cursor/rules/safari.mdc
Normal file
|
|
@@ -0,0 +1,216 @@
|
|||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: false
|
||||
---
|
||||
### Meta Note
|
||||
|
||||
This file, `safari.mdc`, serves as a repository for detailed working notes, observations, and learnings acquired during the process of automating Safari interactions, particularly for the MCP Inspector UI. It's intended to capture the nuances of trial-and-error, debugging steps, and insights into what worked, what didn't, and why.
|
||||
|
||||
This contrasts with `mcp-inspector.mdc`, which is designed to be the concise, polished, and operational ruleset for future automated runs once a specific automation flow (like connecting to the MCP Inspector) has been stabilized and proven reliable. `mcp-inspector.mdc` should contain the 'final' working scripts and minimal necessary commentary, while `safari.mdc` is the space for the extended record of discovery.
|
||||
|
||||
---
|
||||
|
||||
### Key Learnings and Observations from Safari Automation (MCP Inspector)
|
||||
|
||||
#### 1. Managing Safari Windows and Tabs for the Inspector
|
||||
|
||||
* **Objective:** Reliably direct Safari to the MCP Inspector URL (`http://127.0.0.1:6274`) in a predictable way, preferably using a single, consistent browser window and tab to avoid disrupting the user's workspace or losing context.
|
||||
* **Initial Challenges & Evolution:**
|
||||
* Simply using `make new document with properties {URL:"..."}` could lead to multiple windows/tabs if not managed.
|
||||
* Attempts to close all existing Inspector tabs first (`repeat with w in windows... close t...`) were functional but could be overly aggressive if the user had other work in Safari.
|
||||
* Identifying and reusing an *existing specific tab* for the Inspector requires careful targeting (e.g., `first tab whose URL starts with "..."`). If this tab was from a previous, unconfigured session, just switching to it wasn't enough; it needed to be reloaded/reset.
|
||||
* **Refined & Recommended Approach (as implemented in `mcp-inspector.mdc`):**
|
||||
```applescript
|
||||
tell application "Safari"
|
||||
activate
|
||||
delay 0.2 -- Allow Safari to become the frontmost application
|
||||
if (count of windows) is 0 then
|
||||
-- No Safari windows are open, so create a new one.
|
||||
make new document with properties {URL:"http://127.0.0.1:6274"}
|
||||
else
|
||||
-- Safari has windows open; use the frontmost one.
|
||||
tell front window
|
||||
set inspectorTab to missing value
|
||||
try
|
||||
-- Check if a tab for the Inspector is already open in this window.
|
||||
set inspectorTab to (first tab whose URL starts with "http://127.0.0.1:6274")
|
||||
end try
|
||||
|
||||
if inspectorTab is not missing value then
|
||||
-- An Inspector tab exists: set its URL again (to refresh/reset) and make it active.
|
||||
set URL of inspectorTab to "http://127.0.0.1:6274"
|
||||
set current tab to inspectorTab
|
||||
else
|
||||
-- No specific Inspector tab found: set the URL of the *current active tab*.
|
||||
set URL of current tab to "http://127.0.0.1:6274"
|
||||
end if
|
||||
end tell
|
||||
end if
|
||||
delay 1 -- Pause to allow the page to begin loading.
|
||||
end tell
|
||||
```
|
||||
This logic aims to use the existing front window and either reuse/refresh an Inspector tab or repurpose the current active tab, falling back to creating a new window only if Safari isn't open.
|
||||
|
||||
#### 2. Clicking Elements Programmatically (The "Connect" Button Saga)
|
||||
|
||||
* **The Core Challenge:** Programmatically clicking the "Connect" button in the MCP Inspector UI to initiate the server connection.
|
||||
* **Strategies Explored & Lessons:**
|
||||
* **CSS Selectors (`querySelector`):**
|
||||
* Simple selectors like `[data-testid='env-vars-button']` worked for some buttons but required escaping single quotes in AppleScript: `do JavaScript "document.querySelector('[data-testid=\\\'env-vars-button\\']').click();"`.
|
||||
* A complex `querySelector` for the "Connect" button (e.g., `'button[data-testid*=connect-button], button:not([disabled])... > span:contains(Connect)...'.click()`) ran without JS error but didn't reliably establish the connection, suggesting it might not have found the exact interactable element or the click wasn't registering correctly.
|
||||
* **XPath (`document.evaluate`):**
|
||||
* **Highly Specific XPaths:** An initial XPath based on the rule (`//button[contains(., 'Connect') and .//svg[.//polygon[@points='6 3 20 12 6 21 6 3']]]`) was very difficult to embed correctly in AppleScript due to nested single quotes requiring complex escaping (`\'`). This often led to AppleScript parsing errors (`-2741`).
|
||||
* **`character id 39` for AppleScript String Construction:** To combat escaping issues, building the JavaScript string in AppleScript using `set sQuote to character id 39` for internal single quotes was effective for getting the AppleScript parser to accept the command. Example:
|
||||
```applescript
|
||||
set sQuote to character id 39
|
||||
set jsConnectText to "Connect"
|
||||
set specificXPath to "//button[contains(., " & sQuote & jsConnectText & sQuote & ") and .//svg[.//polygon[@points=" & sQuote & "6 3 20 12 6 21 6 3" & sQuote & "]]]"
|
||||
set jsCommand to "document.evaluate(" & sQuote & specificXPath & sQuote & ", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();"
|
||||
```
|
||||
While this made the AppleScript runnable, this very specific XPath still didn't reliably trigger the connection.
|
||||
* **Successful XPath:** The breakthrough came with a slightly less specific but more robust XPath: `//button[.//text()='Connect']`. This finds a button that *contains* a text node exactly matching "Connect".
|
||||
* JavaScript: `document.evaluate("//button[.//text()='Connect']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();`
|
||||
* AppleScript embedding (note `\"` for JS string quotes):
|
||||
```applescript
|
||||
set jsCommand to "document.evaluate(\"//button[.//text()='Connect']\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();"
|
||||
do JavaScript jsCommand in front document
|
||||
```
|
||||
This method proved successful in clicking the button and establishing the connection.
|
||||
* **`dispatchEvent(new MouseEvent('click', ...))`:** This was tried as an alternative to `.click()` but did not yield a different outcome for the "Connect" button in this specific scenario.
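
One way to make such clicks easier to diagnose is to guard against a missing node and report a status string instead of failing silently. The sketch below is an illustration only: `clickByXPath` is a hypothetical helper, and the document is passed in as a parameter so the function can also be exercised with a stub outside a browser.

```javascript
// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE, inlined so the
// function can also run against a stub document outside a browser.
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: clicks the first node matching `xpath` and reports what happened.
function clickByXPath(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  const node = result.singleNodeValue;
  // Without this guard, `.click()` on a null node would throw a TypeError.
  if (!node) return "not found: " + xpath;
  node.click();
  return "clicked: " + xpath;
}

// In Safari this might be called as:
//   clickByXPath(document, "//button[.//text()='Connect']");
```

A returned status of "not found" separates a selector problem from a click that fired but had no effect.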
|
||||
|
||||
#### 3. JavaScript Construction and Execution in AppleScript
|
||||
|
||||
* **`do JavaScript "..."`:** This is the fundamental command.
|
||||
* **String Literals and Escaping:**
|
||||
* If the AppleScript string holding the command is enclosed in double quotes (`"..."`), then any literal double quotes *within the JavaScript code* must be escaped as `\"`.
|
||||
* Single quotes (`'`) within the JavaScript code usually do not need escaping in this context.
|
||||
* Example: `do JavaScript "var el = document.getElementById(\"myId\"); el.value = 'Hello';"`
|
||||
* **Long/Multiline JavaScript:**
|
||||
* Concatenating multiple AppleScript string literals using `&` (and optionally `¬` for line continuation) can build up a long JavaScript command. However, this can be fragile if not every part is perfectly quoted and escaped. Often, AppleScript parsing errors (`-2741`) occur before the JS is even attempted.
|
||||
* For complex JS, it's often more robust to ensure the entire JavaScript code is a single, well-formed string literal from AppleScript's perspective. If the JS itself is very complex, pre-constructing parts of it in AppleScript variables (especially strings that need careful quoting, like XPaths) can help.
|
||||
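* **Returning Values:** The `do JavaScript` command returns the value of the last JavaScript expression evaluated. This can be invaluable for debugging, e.g., ending the script with an expression such as `element !== null`. Because `return` is only valid inside a function, statements like `return 'Found element';` should be wrapped in an immediately invoked function expression.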
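
When the AppleScript source is itself generated from JavaScript, the escaping rules above can be applied mechanically. The following is a minimal sketch; the helper names `escapeForAppleScriptString` and `buildDoJavaScript` are hypothetical and not part of this project.

```javascript
// Escape a JavaScript snippet for embedding in an AppleScript double-quoted
// string: backslashes first, then double quotes. Single quotes are left as-is.
function escapeForAppleScriptString(js) {
  return js.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical wrapper producing a complete `do JavaScript` statement.
function buildDoJavaScript(js) {
  return 'do JavaScript "' + escapeForAppleScriptString(js) + '" in front document';
}

// Usage:
//   buildDoJavaScript("document.getElementById(\"myId\").value = 'Hello';")
```

Escaping backslashes before quotes matters: doing it in the other order would double the backslashes that were just added for the quotes.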
|
||||
|
||||
#### 4. Asynchronicity and Delays
|
||||
|
||||
* **Essential `delay` commands (Strategic vs. Tactical):**
|
||||
* **Strategic Delay (Crucial):** A critical lesson was the necessity of a significant delay (e.g., ~5 seconds) *after* an external process like the MCP Inspector is launched (e.g., via `npx` in iTerm) and *before* Safari automation attempts to interact with its web UI. This allows the external process and its web server to fully initialize. Without this, Safari automation might target a page that isn't ready or fully functional, leading to failures.
|
||||
* **Tactical Delays (Within Safari UI Automation - Often Avoidable):** Initially, small `delay` commands were used within Safari AppleScripts after actions like clicks or page loads (e.g., `delay 0.25`, `delay 1`). While these can sometimes help ensure the DOM is updated, the latest successful runs showed that if the backend/server (Inspector) is fully ready (due to the strategic delay), rapid Safari UI interactions (form filling, sequential clicks) can often be performed reliably *without* these internal micro-delays. Removing them can speed up the automation if the underlying application is responsive enough.
|
||||
* **Context is Key:** The need for tactical delays depends on how quickly the web application updates its DOM and responds to JavaScript events. For the MCP Inspector, once it's running, its UI seems to respond quickly enough to handle a sequence of JavaScript commands without interspersed AppleScript delays, provided the commands themselves are valid and target the correct elements.
|
||||
|
||||
* **Checking for Results:** When verifying an action (e.g., checking if `document.body.innerText.includes('Connected')`), it's vital that this check happens *after* the action has had a chance to complete and the UI to reflect the change. If running without tactical delays, this check should still be performed after the relevant JavaScript action that's supposed to cause the change.
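
Where the driving code is JavaScript, a fixed delay before such a check can be replaced by polling with a timeout. This is a sketch under stated assumptions: `pollUntil` is a hypothetical helper, its default timeout and interval are illustrative, and the loop is meant to run in the driving script (with the actual check injected as a function), not inside a `do JavaScript` call.

```javascript
// Hypothetical helper: repeatedly runs `check` until it returns a truthy value
// or the timeout elapses. Resolves to that value, or to null on timeout.
async function pollUntil(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) return null;
    // Wait before the next attempt so the check is not run in a tight loop.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example check (hypothetical): a function that asks the page whether
// document.body.innerText includes "Connected" and returns true or false.
```

Polling returns as soon as the condition holds, so it keeps the speed of the no-delay runs while still tolerating a slow start.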
|
||||
|
||||
#### 5. MCP Inspector Specifics
|
||||
|
||||
* **URL Consistency:** The MCP Inspector URL (`http://127.0.0.1:6274`) was found to be consistent between runs, simplifying Safari targeting.
|
||||
* **Server Logs in the Inspector UI:** It was confirmed that after the `macos-automator-mcp` server connects via the MCP Inspector, its startup and operational logs (e.g., `[macos_automator_server] [INFO] Starting...`) are displayed directly within the MCP Inspector's web interface in Safari. This is the primary place to check for these server-specific logs, rather than the iTerm console running the `npx @modelcontextprotocol/inspector` command (which shows the Inspector's own proxy/connection logs). The Safari UI shows "Connected" status, and the server logs within the UI provide detailed confirmation of the server's state.
|
||||
|
||||
#### 6. Automating iTerm via AppleScript and Advanced Timing Considerations
|
||||
|
||||
* **Full iTerm Automation via AppleScript:** Due to persistent issues with iTerm-specific MCP tools (e.g., `mcp_iterm_send_control_character`, `mcp_iterm_write_to_terminal` consistently failing with "Tool not found" errors), a robust AppleScript workaround was developed and successfully implemented to manage the iTerm portion of the MCP Inspector setup. This script handles:
|
||||
* Activating iTerm.
|
||||
* Ensuring a window is available.
|
||||
* Sending a Control-C command to the current session using `System Events` (for reliability, targeting the iTerm process) to terminate any running commands.
|
||||
* Writing the `npx @modelcontextprotocol/inspector` command to the iTerm session to start the inspector.
|
||||
* The successful AppleScript structure is as follows (and now part of `mcp-inspector.mdc`):
|
||||
```applescript
|
||||
tell application "iTerm"
|
||||
activate
|
||||
if (count of windows) is 0 then
|
||||
create window with default profile
|
||||
delay 0.5 # Brief delay for window creation
|
||||
end if
|
||||
end tell
|
||||
delay 0.2 # Ensure iTerm is frontmost
|
||||
|
||||
tell application "System Events"
|
||||
# Note: 'iTerm' process name might need to be 'iTerm2' for iTerm3+.
|
||||
tell process "iTerm"
|
||||
keystroke "c" using control down
|
||||
end tell
|
||||
end tell
|
||||
delay 0.2 # Pause after Ctrl-C
|
||||
|
||||
tell application "iTerm"
|
||||
tell current window
|
||||
tell current session
|
||||
write text "npx @modelcontextprotocol/inspector"
|
||||
end tell
|
||||
end tell
|
||||
end tell
|
||||
```
|
||||
|
||||
* **iTerm Process Name in System Events:** When using `System Events` to control iTerm (e.g., for `keystroke`), the `tell process "iTerm"` command might need to be `tell process "iTerm2"` if using iTerm version 3 or later, as the application's registered process name can vary.
|
||||
|
||||
* **Reinforcing the Strategic Delay:** The success of running Safari UI automation steps *without* internal (tactical) delays is highly dependent on the *strategic* delay implemented *after* initiating the MCP Inspector in iTerm and *before* beginning any Safari interaction. A delay of approximately 5 seconds was found to be effective, allowing `npx` and the Inspector server to fully initialize. Attempting Safari automation too soon, especially without tactical delays, will likely result in failures as the web UI won't be ready or responsive.
|
||||
|
||||
#### 7. Interacting with Shadow DOM (Advanced)
|
||||
|
||||
* **Identifying Shadow DOM:** Some web UIs, including potentially parts of the MCP Inspector (especially complex, self-contained components like the tool details and results panels), may use Shadow DOM to encapsulate their structure and styles. Standard `document.querySelector` or `document.evaluate` calls from the main document context will *not* pierce these shadow boundaries.
|
||||
* **Symptoms of Shadow DOM:** If `document.body.innerText` seems to miss details of an active UI component, or if standard selectors fail for visible elements that are clearly part of a specific component, Shadow DOM may be in use.
|
||||
* **Accessing Elements within Shadow DOM (Conceptual JavaScript Approach):**
|
||||
To interact with elements inside a shadow root, you first need a reference to the host element, then access its `shadowRoot` property, and then query within that root.
|
||||
```javascript
|
||||
// 1. Find the host element (custom element tag name, e.g., 'tool-details-panel')
|
||||
const hostElement = document.querySelector('your-shadow-host-tag-name');
|
||||
|
||||
if (hostElement && hostElement.shadowRoot) {
|
||||
const shadowRoot = hostElement.shadowRoot;
|
||||
|
||||
// 2. Query within the shadowRoot for target elements
|
||||
const targetElementInShadow = shadowRoot.querySelector('#some-element-inside-shadow');
|
||||
if (targetElementInShadow) {
|
||||
// targetElementInShadow.click();
|
||||
// return targetElementInShadow.textContent;
|
||||
} else {
|
||||
// return 'Element not found within shadowRoot';
|
||||
}
|
||||
} else {
|
||||
// return 'Shadow host not found or no shadowRoot attached';
|
||||
}
|
||||
```
|
||||
* **Recursive Deep Query Helper (Conceptual):** For nested shadow DOMs or when the exact host is unknown, a recursive or iterative deep query function can be useful. This function would traverse the DOM, checking each element for a `shadowRoot` and searching within it.
|
||||
```javascript
|
||||
function $deep(selector, rootNode = document) {
|
||||
const stack = [rootNode];
|
||||
while (stack.length) {
|
||||
const currentNode = stack.shift();
|
||||
if (currentNode.nodeType === Node.ELEMENT_NODE && currentNode.matches(selector)) {
|
||||
return currentNode;
|
||||
}
|
||||
if (currentNode.shadowRoot) {
|
||||
stack.push(currentNode.shadowRoot);
|
||||
}
|
||||
// Descend into children of a Document (the default root), an Element, or a DocumentFragment (like a shadowRoot)
|
||||
if (currentNode.nodeType === Node.DOCUMENT_NODE || currentNode.nodeType === Node.ELEMENT_NODE || currentNode.nodeType === Node.DOCUMENT_FRAGMENT_NODE) {
|
||||
if (currentNode.children) { // Ensure children property exists
|
||||
stack.push(...currentNode.children);
|
||||
}
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
// Usage: const someButton = $deep('button.some-class-in-shadow');
|
||||
```
|
||||
* **Challenges with AppleScript `do JavaScript`:**
|
||||
* **Return Value Limitations:** Complex objects (like DOM elements) or very large strings (like extensive `outerHTML`) returned from `do JavaScript` can sometimes result in `missing value` or empty strings in AppleScript, making debugging difficult.
|
||||
* **Debugging:** Direct console logging from `do JavaScript` is not visible to the AppleScript environment, complicating troubleshooting of JavaScript execution within Safari.
|
||||
* **Reliability:** For highly dynamic UIs with extensive Shadow DOM, the AppleScript `do JavaScript` bridge may not always be reliable enough for complex, multi-step interactions, especially when precise timing or access to nuanced DOM states is required. Direct API/tool calls, if available, are often more robust for verification in such cases.
|
||||
* **Discovering Shadow Host Tag Names:** If the specific tag name of a shadow host is unknown, one might attempt to list all elements that have a `shadowRoot`:
|
||||
```javascript
|
||||
// JavaScript to be executed via AppleScript to list shadow host tag names
|
||||
// (Note: Return value handling by AppleScript needs to be robust, e.g., JSON stringify)
|
||||
// let hosts = [...document.querySelectorAll('*')]
|
||||
//   .filter(el => el.shadowRoot)
|
||||
//   .map(el => el.tagName);
|
||||
// JSON.stringify(hosts); // final expression becomes the value returned to AppleScript
|
||||
```
|
||||
However, successful execution and return of this data via AppleScript `do JavaScript` can be unreliable, as experienced in attempts to automate the MCP Inspector.
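
One limitation of the snippet above is that `querySelectorAll('*')` does not descend into shadow roots, so hosts nested inside other hosts are missed. The following is a minimal sketch of a recursive variant, not the method used in the original automation. The function names `collectShadowHostTags` and `shadowHostReport` are hypothetical, and closed shadow roots (where `shadowRoot` is `null`) remain unreachable either way. The result is always a JSON string, which tends to survive the trip back to AppleScript better than arrays or objects.

```javascript
// Recursively collect the tag names of every element that exposes an open shadowRoot.
// `root` is anything with querySelectorAll: a Document or a ShadowRoot.
function collectShadowHostTags(root, seen = new Set()) {
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {
      seen.add(el.tagName);
      // Descend into the shadow tree to find nested hosts.
      collectShadowHostTags(el.shadowRoot, seen);
    }
  }
  return seen;
}

// Always return a string so the caller never has to interpret a complex value.
function shadowHostReport(root) {
  try {
    const hosts = [...collectShadowHostTags(root)].sort();
    return JSON.stringify({ ok: true, hosts: hosts });
  } catch (e) {
    return JSON.stringify({ ok: false, error: e.name + ': ' + e.message });
  }
}

// In a browser: shadowHostReport(document)
```

On the AppleScript side, the returned text can then be logged or parsed as a plain string.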
|
||||
|
||||
These notes capture the iterative process and key takeaways from the Safari automation for the MCP Inspector. The successful methods are now enshrined in `mcp-inspector.mdc`, while this document provides the background and context.
|
||||
|
||||
---
|
||||
### Meta-Level Collaboration & Rule Evolution Notes
|
||||
|
||||
* **Rule Refinement for Readability (User Feedback):** Based on user feedback, the main operational rule file (`mcp-inspector.mdc`) was refactored to move lengthy scripts (like the Safari tab setup AppleScript) into an Appendix section (e.g., `[Setup Safari Tab for Inspector]`). This keeps the main flow of the rule concise and readable for both humans and models, while still providing the full implementation details in a structured way. The `safari.mdc` file is designated for the more verbose, evolutionary notes and debugging narratives.
|
||||
* **Tool Usage Preferences (User Feedback):** User indicated a preference for using the `edit_file` tool for modifying rule files (like `.mdc` files) rather than `claude_code`. This allows the user to review the diff in their IDE before the change is effectively applied by the AI. This preference will be honored for future rule file modifications.
|
||||
195
.cursor/safari.mdc
Normal file
|
|
@ -0,0 +1,195 @@
|
|||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: false
|
||||
---
|
||||
#### 5. MCP Inspector Specifics
|
||||
|
||||
* **URL Consistency:** The MCP Inspector URL (`http://127.0.0.1:6274`) was found to be consistent between runs, simplifying Safari targeting.
|
||||
* **"Connected" State vs. iTerm Logs:** A key finding was that the Safari Inspector UI can show "Connected" (and tools subsequently work) even if detailed `DEBUG`-level logs from the launched server process (`start.sh` -> `node dist/server.js`) do not appear in the iTerm console where `npx @modelcontextprotocol/inspector` is running. The Inspector seems to show its own proxying/connection logs, but the full stdout/stderr of the child might not always be visible there. This means successful connection and tool usability are the primary indicators, and absence of detailed server logs in the iTerm console is not necessarily a showstopper for basic interaction, though it would affect deeper debugging of the server itself.
|
||||
|
||||
These notes capture the iterative process and key takeaways from the Safari automation for the MCP Inspector. The successful methods are now enshrined in `mcp-inspector.mdc`, while this document provides the background and context.
|
||||
|
||||
This contrasts with `mcp-inspector.mdc`, which is designed to be the concise, polished, operational ruleset for future automated runs once a specific automation flow (like connecting to the MCP Inspector) has been stabilized and proven reliable. `mcp-inspector.mdc` should contain the final working scripts and only the commentary they need, while `safari.mdc` is the place for extended exploratory notes and debugging narratives.
|
||||
|
||||
* **Clarification on `[WORKSPACE_PATH]` Resolution:** The placeholder `[WORKSPACE_PATH]` used in rules (e.g., for script paths like `[WORKSPACE_PATH]/start.sh`) must be dynamically replaced by the AI with the absolute path of the current project workspace. This path is typically available to the AI from its context (e.g., derived from `user_info.workspace_path` or a similar environment variable). It is crucial that the AI ensures the resolved path is correctly quoted if it's used in shell commands or script arguments, especially if the path might contain spaces or special characters. For instance, a path like `/Users/username/My Projects/project-name` should be passed as `'/Users/username/My Projects/project-name'` in a shell command.
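
As a rough illustration of the resolution and quoting steps described above, here is a small sketch. The helper names `shellQuote` and `resolveWorkspacePath` are hypothetical and do not exist in any rule file; the example path is the one from the text.

```javascript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell.
// A literal ' inside the value becomes '\'' (close quote, escaped quote, reopen).
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: substitute the [WORKSPACE_PATH] placeholder.
function resolveWorkspacePath(template, workspacePath) {
  if (!workspacePath.startsWith('/')) {
    throw new Error('workspace path must be absolute: ' + workspacePath);
  }
  // split/join replaces every occurrence without regex-escaping the brackets.
  return template.split('[WORKSPACE_PATH]').join(workspacePath);
}

const scriptPath = resolveWorkspacePath(
  '[WORKSPACE_PATH]/start.sh',
  '/Users/username/My Projects/project-name'
);
const command = 'bash ' + shellQuote(scriptPath);
console.log(command);
// prints: bash '/Users/username/My Projects/project-name/start.sh'
```

Quoting after substitution, rather than before, keeps the whole resolved path inside one pair of quotes.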
|
||||
|
||||
---
|
||||
|
||||
### Strategies for Robust Element Selection
|
||||
|
||||
When automating UI interactions, the reliability of your scripts heavily depends on how you identify and select HTML elements. Here's a hierarchy of preferences and tips for making your selectors more robust:
|
||||
|
||||
1. **`data-testid` Attributes (Gold Standard):**
|
||||
* **Why:** These are custom attributes specifically added for testing and automation. They are decoupled from styling and functional implementation details, making them the most resilient to UI changes.
|
||||
* **Example (CSS):** `[data-testid='user-login-button']`
|
||||
* **Example (XPath):** `//*[@data-testid='user-login-button']`
|
||||
|
||||
2. **Unique `id` Attributes:**
|
||||
* **Why:** `id` attributes are *supposed* to be unique within a page. If developers adhere to this, they are very reliable.
|
||||
* **Example (CSS):** `#submit-form`
|
||||
* **Example (XPath):** `//*[@id='submit-form']`
|
||||
|
||||
3. **Stable `aria-label`, `aria-labelledby`, `role`, or other Accessibility Attributes:**
|
||||
* **Why:** Accessibility attributes are often more stable than class names used for styling, as they relate to the element's function and purpose.
|
||||
* **Example (CSS):** `button[aria-label='Open settings']`
|
||||
* **Example (XPath):** `//button[@aria-label='Open settings']`
|
||||
|
||||
4. **Stable Class Names (Used for Structure/Function, Not Just Styling):**
|
||||
* **Why:** Some class names indicate the structure or function of an element rather than just its appearance. These can be reasonably stable. Avoid classes that are purely presentational (e.g., `color-blue`, `margin-small`).
|
||||
* **Example (CSS):** `.user-profile-card .username` (Contextual selection)
|
||||
* **Example (XPath):** `//div[contains(@class, 'user-profile-card')]//span[contains(@class, 'username')]`
|
||||
|
||||
5. **Structural XPaths (Based on DOM hierarchy):**
|
||||
* **Why:** These rely on the element's position within the DOM (e.g., "the second `div` inside a `section` with a specific header"). They are more brittle than attribute-based selectors because any structural change can break them, so use them sparingly and keep them as simple as possible.
|
||||
* **Example (XPath):** `//section[@id='main-content']/div[2]/p`
|
||||
|
||||
6. **Text-Based XPaths (Using visible text):**
|
||||
* **Why:** These select elements by their visible text content (e.g., a button with the text "Submit"). They can be useful but break if the text changes (e.g., for localization or wording updates).
|
||||
* **Example (XPath):** `//button[text()='Submit']` or `//button[contains(text(), 'Submit')]`
|
||||
* **Tip for Robustness:** Use XPath's `normalize-space()` function to handle variations in whitespace (leading, trailing, multiple internal spaces).
|
||||
* `//button[normalize-space(text())='Submit']` (Matches " Submit ", "Submit", " Submit" etc.)
|
||||
* `//a[contains(normalize-space(.), 'Learn More')]` (Checks within any descendant text nodes)
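
The preference order above can be sketched as a small selector builder. This is an illustrative sketch only: the `desc` object shape and the function names are assumptions made for the example, and `normalizeSpace` is a hand-written JavaScript counterpart of XPath's `normalize-space()` for cases where text has to be compared in script.

```javascript
// Escape a value for use inside a single-quoted CSS attribute selector.
function escapeAttrValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Build a CSS selector from a hypothetical description of the target element,
// walking the hierarchy: data-testid, then id, then aria-label, then classes.
function preferredSelector(desc) {
  const tag = desc.tag || '';
  if (desc.testId) return `[data-testid='${escapeAttrValue(desc.testId)}']`;
  // Attribute form instead of #id, so unusual characters need no CSS escaping.
  if (desc.id) return `[id='${escapeAttrValue(desc.id)}']`;
  if (desc.ariaLabel) return `${tag}[aria-label='${escapeAttrValue(desc.ariaLabel)}']`;
  if (desc.classes && desc.classes.length > 0) return `${tag}.${desc.classes.join('.')}`;
  return null; // caller falls back to a structural or text-based XPath
}

// Collapse runs of space, tab, CR, LF to one space and trim both ends,
// mirroring XPath normalize-space().
function normalizeSpace(text) {
  return String(text).replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

console.log(preferredSelector({ tag: 'button', ariaLabel: 'Open settings' }));
// prints: button[aria-label='Open settings']
```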
|
||||
|
||||
**General Tips for Selectors:**
|
||||
|
||||
* **Prefer CSS Selectors for Simplicity and Speed:** When applicable, CSS selectors are often more concise and can be faster than XPaths.
|
||||
* **Use Browser Developer Tools:** Actively use the "Inspect Element" feature in your browser to test and refine your CSS selectors and XPaths. Most dev tools allow you to directly test them.
|
||||
* **Avoid Generated IDs/Classes:** Be wary of IDs or class names that look auto-generated (e.g., `id="ext-gen1234"`), as these are likely to change between page loads or application versions.
|
||||
* **Context is Key:** Instead of overly complex global selectors, try to select a stable parent element first, then find the target element within that parent's context. This often leads to simpler and more reliable selectors.
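
Two of the tips above lend themselves to a short sketch. The patterns in `looksAutoGenerated` are assumptions chosen for illustration, not an established rule, and `queryWithin` is a hypothetical helper showing the "stable parent first" approach.

```javascript
// Heuristic guess at whether an id or class name was machine-generated.
// Assumed signals: a run of three or more digits, or a long hex-like token.
function looksAutoGenerated(name) {
  return /\d{3,}/.test(name) || /^[a-f0-9]{8,}$/i.test(name);
}

// Find a stable parent first, then search only inside it.
// `root` is anything with querySelector, e.g. document or a shadowRoot.
function queryWithin(root, parentSelector, childSelector) {
  const parent = root.querySelector(parentSelector);
  return parent ? parent.querySelector(childSelector) : null;
}

console.log(looksAutoGenerated('ext-gen1234')); // prints: true
console.log(looksAutoGenerated('submit-form')); // prints: false
// In a browser: queryWithin(document, '.user-profile-card', '.username')
```

Returning `null` when the parent is missing lets the caller distinguish "container not rendered yet" from a script error.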
|
||||
|
||||
---
|
||||
|
||||
### Debugging AppleScript `do JavaScript` Execution Flow
|
||||
|
||||
Successfully executing JavaScript via AppleScript's `do JavaScript` command often involves navigating two potential layers of errors: AppleScript parsing errors and JavaScript runtime errors. Here's how to approach debugging:
|
||||
|
||||
**1. Differentiating Error Types:**
|
||||
|
||||
* **AppleScript Compile-Time/Parsing Errors (e.g., `-2741`):**
|
||||
* **Symptom:** The AppleScript editor shows an error, or the script fails immediately when run, often with error messages like "Syntax Error," "Expected end of line but found...", or specific error codes like `-2741` (which typically means the command couldn't be parsed correctly, often due to malformed strings or incorrect quoting).
|
||||
* **Cause:** The AppleScript interpreter itself cannot understand the structure of your `do JavaScript "..."` command, usually due to incorrect quoting or escaping of characters *within the AppleScript string that defines the JavaScript code*.
|
||||
* **The JavaScript code itself hasn't even been sent to Safari yet.**
|
||||
|
||||
* **JavaScript Runtime Errors:**
|
||||
* **Symptom:** The AppleScript command runs without an immediate AppleScript error, but the desired action doesn't occur in Safari, or `do JavaScript` returns an error message from the JavaScript engine (e.g., "TypeError: null is not an object" or "SyntaxError: Unexpected identifier").
|
||||
* **Cause:** The JavaScript code was successfully passed to Safari, but the JavaScript engine encountered an error while trying to execute it (e.g., trying to access a property of a non-existent element, incorrect JS syntax, etc.).
|
||||
|
||||
**2. Debugging AppleScript Syntax/Parsing Errors:**
|
||||
|
||||
* **Simplify the JavaScript String:** Start with the simplest possible JavaScript that should work, e.g.:
|
||||
```applescript
|
||||
tell application "Safari"
|
||||
do JavaScript "'test';" in front document
|
||||
end tell
|
||||
```
|
||||
* **Log the Constructed JavaScript String:** Before the `do JavaScript` line, use AppleScript's `log` command to print the exact JavaScript string you are about to send. This helps you visually inspect it for quoting issues.
|
||||
```applescript
|
||||
set jsCommand to "document.getElementById(\"myButton\").click();"
|
||||
log jsCommand
|
||||
tell application "Safari"
|
||||
do JavaScript jsCommand in front document
|
||||
end tell
|
||||
```
|
||||
Check the logged output carefully in Script Editor's "Messages" tab.
|
||||
* **Build Complex Strings Incrementally:** If your JavaScript is complex, build it in parts using AppleScript variables. This can make it easier to manage quoting for each part.
|
||||
* **Master Quoting:**
|
||||
* If the AppleScript string is in double quotes (`"..."`), escape internal JS double quotes as `\"` and literal backslashes as `\\`. JS single quotes do not need escaping.
* If the JS contains many single quotes, build them from `character id 39` to avoid confusion: `set sQuote to character id 39`, then `set jsCommand to "var name = " & sQuote & "Pete" & sQuote & ";"`.
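
When the AppleScript source is itself generated by a tool, the quoting rules above can be applied mechanically. The sketch below assumes that workflow; `toAppleScriptString` and `buildDoJavaScript` are hypothetical names, and only backslashes and double quotes are handled.

```javascript
// Turn arbitrary JavaScript source into an AppleScript double-quoted string literal.
// Backslashes are doubled first, then double quotes are escaped.
function toAppleScriptString(js) {
  return '"' + String(js).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Assemble a one-line AppleScript command around the escaped snippet.
function buildDoJavaScript(js) {
  return 'tell application "Safari" to do JavaScript ' +
    toAppleScriptString(js) + ' in front document';
}

console.log(buildDoJavaScript('document.getElementById("myButton").click();'));
// prints: tell application "Safari" to do JavaScript "document.getElementById(\"myButton\").click();" in front document
```

Escaping backslashes before quotes matters: doing it in the other order would double the backslashes that were just added for the quotes.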
|
||||
|
||||
**3. Debugging JavaScript Runtime Errors:**
|
||||
|
||||
* **Test in Safari's Web Inspector Console:** The most effective way to debug the JavaScript itself is to open Safari, navigate to the target page, open the Web Inspector (Develop > Show Web Inspector), and paste your JavaScript snippet directly into the Console. This provides immediate feedback, error messages, and allows for interactive debugging.
|
||||
* **Use `try...catch` in Your JavaScript:** Wrap your JavaScript code in a `try...catch` block to capture and return error messages back to AppleScript. This can make it much easier to see what went wrong inside Safari.
|
||||
```applescript
|
||||
set jsCommand to "(function () { try { document.getElementById('nonExistentElement').value = 'test'; return 'Success'; } catch (e) { return 'JS Error: ' + e.name + ': ' + e.message; } })();"
|
||||
tell application "Safari"
|
||||
set jsResult to do JavaScript jsCommand in front document
|
||||
log jsResult
|
||||
end tell
|
||||
```
|
||||
* **Return Values for Debugging:** Have your JavaScript return intermediate values or status indicators to AppleScript to understand its state.
|
||||
```applescript
|
||||
set jsCommand to "(function () { var el = document.getElementById('myField'); return el ? 'Element found!' : 'Element NOT found.'; })();"
tell application "Safari"
log (do JavaScript jsCommand in front document)
end tell
|
||||
```
|
||||
|
||||
By systematically checking for AppleScript parsing issues first, then moving to debug the JavaScript logic within Safari's environment, you can effectively troubleshoot `do JavaScript` commands.
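
The `try...catch` pattern can be applied uniformly by generating the wrapper around each snippet. The following is a sketch under the assumption that the snippet is evaluated as a whole program, where a bare top-level `return` may be rejected; wrapping the body in an immediately invoked function keeps `return` valid either way. The name `wrapForDoJavaScript` is hypothetical.

```javascript
// Wrap a JavaScript body so that it always evaluates to a value:
// either whatever the body returns, or a readable 'JS Error: ...' string.
function wrapForDoJavaScript(body) {
  return "(function () { try { " + body +
    " } catch (e) { return 'JS Error: ' + e.name + ': ' + e.message; } })();";
}

const wrapped = wrapForDoJavaScript("return 1 + 1;");
console.log(wrapped);
// prints: (function () { try { return 1 + 1; } catch (e) { return 'JS Error: ' + e.name + ': ' + e.message; } })();
```

A body that ends in a `//` line comment would swallow the closing braces, so such snippets should end with a statement or a newline.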
|
||||
|
||||
---
|
||||
|
||||
### Advanced Asynchronous Handling: Polling for Conditions
|
||||
|
||||
Web pages load and update content asynchronously. Relying on fixed `delay` commands in AppleScript after an action (like a click or page navigation) can be unreliable because the actual time needed for the UI to update can vary due to network speed, server load, or client-side processing.
|
||||
|
||||
A more robust approach is to actively poll for a specific condition to be met (e.g., an element appearing, text changing, a certain JavaScript variable becoming true) before proceeding. This makes your scripts more resilient to timing variations.
|
||||
|
||||
**How Polling Works:**
|
||||
|
||||
1. Define the JavaScript code that checks for your desired condition (this should return `true` or `false`).
|
||||
2. In AppleScript, create a loop that:
|
||||
* Executes the JavaScript check.
|
||||
* If the condition is met, exit the loop.
|
||||
* If not, wait for a short interval (e.g., 0.5 seconds).
|
||||
* Include a counter or timeout mechanism to prevent the loop from running indefinitely if the condition is never met.
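
Step 1 can be sketched as a small predicate. This is an illustrative variant rather than the check used in the rule file: the function name `isConnected` is hypothetical, and requiring `readyState === 'complete'` and a whole-word match are assumptions that make the check stricter than a plain substring test.

```javascript
// Return true only when the page has loaded and shows the word "Connected".
// `doc` is the page's document, or any object with the same shape.
function isConnected(doc) {
  if (!doc || doc.readyState !== 'complete' || !doc.body) return false;
  // \b avoids matching inside longer words such as "NotConnected".
  return /\bConnected\b/.test(doc.body.innerText || '');
}

// In a browser: isConnected(document)
console.log(isConnected({ readyState: 'complete', body: { innerText: 'Status: Connected' } }));
// prints: true
```

Guarding against a missing `body` means the check reports `false` instead of throwing while the page is still transitioning.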
|
||||
|
||||
**Example: Polling for 'Connected' Status in MCP Inspector**
|
||||
|
||||
This AppleScript snippet demonstrates polling for the text "Connected" to appear on the page after clicking the connect button:
|
||||
|
||||
```applescript
|
||||
-- JavaScript to check if the page body contains the text "Connected"
|
||||
set jsCheckConnected to "document.body.innerText.includes('Connected');"
|
||||
|
||||
set isNowConnected to false
|
||||
set attempts to 0
|
||||
set maxAttempts to 20 -- Set a reasonable limit, e.g., 20 attempts
|
||||
set pollInterval to 0.5 -- Wait 0.5 seconds between attempts
|
||||
|
||||
log "Polling for 'Connected' status..."
|
||||
|
||||
tell application "Safari"
|
||||
tell front document
|
||||
repeat while isNowConnected is false and attempts < maxAttempts
|
||||
try
|
||||
if (do JavaScript jsCheckConnected) is true then
|
||||
set isNowConnected to true
|
||||
log "Status changed to 'Connected' after " & (attempts + 1) & " attempts."
|
||||
else
|
||||
delay pollInterval
|
||||
end if
|
||||
on error errMsg number errNum
|
||||
log "Error during JavaScript check (attempt " & (attempts + 1) & "): " & errMsg & " (Number: " & errNum & ")"
|
||||
-- Decide if you want to stop on error or just log and continue
|
||||
delay pollInterval -- Still delay even if JS itself errored, maybe it's a temporary issue
|
||||
end try
|
||||
set attempts to attempts + 1
|
||||
end repeat
|
||||
end tell
|
||||
end tell
|
||||
|
||||
if isNowConnected then
|
||||
log "Successfully confirmed 'Connected' status via polling."
|
||||
-- Proceed with next actions that depend on being connected
|
||||
else
|
||||
log "Failed to see 'Connected' status within " & (maxAttempts * pollInterval) & " seconds."
|
||||
-- Handle the failure case (e.g., log error, stop script)
|
||||
end if
|
||||
```
|
||||
|
||||
**Benefits of Polling:**
|
||||
|
||||
* **Increased Reliability:** Scripts wait only as long as necessary, adapting to real-time conditions rather than fixed, potentially too short or too long, delays.
|
||||
* **Reduced Brittleness:** Less likely to fail due to unexpected slowdowns.
|
||||
* **Clearer Intent:** The script explicitly states what condition it's waiting for.
|
||||
|
||||
**Considerations:**
|
||||
|
||||
* **Timeout:** Always implement a maximum number of attempts or a total timeout to prevent infinite loops if the condition never occurs.
|
||||
* **Poll Interval:** Choose a reasonable interval. Too short can be resource-intensive; too long can make the script feel sluggish.
|
||||
* **Error Handling:** Include `try...on error` blocks within your loop to gracefully handle potential errors during the JavaScript execution (e.g., if the page is still transitioning and elements are not yet available).
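
The same three considerations carry over when the polling loop lives in JavaScript itself, for example while experimenting in the Web Inspector console. The sketch below is a generic helper under that assumption; `pollUntil` and its option names are hypothetical, and because it is asynchronous it is not meant to be returned directly from `do JavaScript`.

```javascript
// Poll `check` until it returns a truthy value or the attempt limit is reached.
// Errors thrown by `check` are treated as "not ready yet" rather than fatal.
async function pollUntil(check, { intervalMs = 500, maxAttempts = 20 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await check()) return { met: true, attempts: attempt };
    } catch (err) {
      // Possibly a transient state (page still transitioning); keep polling.
    }
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  return { met: false, attempts: maxAttempts };
}

// Usage sketch (in a browser):
// pollUntil(() => document.body.innerText.includes('Connected'))
//   .then(result => console.log(result));
```

With the defaults, the worst case is 20 checks separated by 19 waits of 0.5 seconds, so the helper gives up after roughly 9.5 seconds.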
|
||||
|
||||
---
|
||||
|
||||
### Meta-Level Collaboration & Rule Evolution Notes
|
||||
|
||||
1614
.cursor/scripts/peekaboo.scpt
Executable file
File diff suppressed because it is too large
774
.cursor/scripts/terminator.scpt
Executable file
|
|
@ -0,0 +1,774 @@
|
|||
#!/usr/bin/osascript
|
||||
--------------------------------------------------------------------------------
|
||||
-- terminator.scpt - v0.6.1 Enhanced "T-1000"
|
||||
-- Enhanced Terminal session management with smart session reuse and better error reporting
|
||||
-- Features: Smart session reuse, enhanced error reporting, improved timing, better output formatting
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
--#region Configuration Properties
|
||||
property maxCommandWaitTime : 15.0 -- Increased from 10.0 for better reliability
|
||||
property pollIntervalForBusyCheck : 0.1
|
||||
property startupDelayForTerminal : 0.7
|
||||
property minTailLinesOnWrite : 100 -- Increased from 15 for better build log visibility
|
||||
property defaultTailLines : 100 -- Increased from 30 for better build log visibility
|
||||
property tabTitlePrefix : "🤖💥 " -- For the window/tab title itself
|
||||
property scriptInfoPrefix : "Terminator 🤖💥: " -- For messages generated by this script
|
||||
property projectIdentifierInTitle : "Project: "
|
||||
property taskIdentifierInTitle : " - Task: "
|
||||
property enableFuzzyTagGrouping : true
|
||||
property fuzzyGroupingMinPrefixLength : 4
|
||||
|
||||
-- Safe enhanced properties (minimal additions)
|
||||
property enhancedErrorReporting : true
|
||||
property verboseLogging : false
|
||||
--#endregion Configuration Properties
|
||||
|
||||
--#region Helper Functions
|
||||
on isValidPath(thePath)
|
||||
if thePath is not "" and (thePath starts with "/") then
|
||||
if not (thePath contains " -") then -- Basic heuristic
|
||||
return true
|
||||
end if
|
||||
end if
|
||||
return false
|
||||
end isValidPath
|
||||
|
||||
on getPathComponent(thePath, componentIndex)
|
||||
set oldDelims to AppleScript's text item delimiters
|
||||
set AppleScript's text item delimiters to "/"
|
||||
set pathParts to text items of thePath
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
set nonEmptyParts to {}
|
||||
repeat with aPart in pathParts
|
||||
if (contents of aPart) is not "" then set end of nonEmptyParts to (contents of aPart) -- dereference the loop reference before comparing
|
||||
end repeat
|
||||
if (count nonEmptyParts) = 0 then return ""
|
||||
try
|
||||
if componentIndex is -1 then
|
||||
return item -1 of nonEmptyParts
|
||||
else if componentIndex > 0 and componentIndex ≤ (count nonEmptyParts) then
|
||||
return item componentIndex of nonEmptyParts
|
||||
end if
|
||||
on error
|
||||
return ""
|
||||
end try
|
||||
return ""
|
||||
end getPathComponent
|
||||
|
||||
on generateWindowTitle(taskTag as text, projectGroup as text)
|
||||
if projectGroup is not "" then
|
||||
return tabTitlePrefix & projectIdentifierInTitle & projectGroup & taskIdentifierInTitle & taskTag
|
||||
else
|
||||
return tabTitlePrefix & taskTag
|
||||
end if
|
||||
end generateWindowTitle
|
||||
|
||||
on bufferContainsMeaningfulContentAS(multiLineText, knownInfoPrefix as text, commonShellPrompts as list)
|
||||
if multiLineText is "" then return false
|
||||
|
||||
-- Simple approach: if the trimmed content is substantial and not just our info messages, consider it meaningful
|
||||
set trimmedText to my trimWhitespace(multiLineText)
|
||||
if (length of trimmedText) < 3 then return false
|
||||
|
||||
-- Check if it's only our script info messages
|
||||
if trimmedText starts with knownInfoPrefix then
|
||||
-- If it's ONLY our message and nothing else meaningful, return false
|
||||
set oldDelims to AppleScript's text item delimiters
|
||||
set AppleScript's text item delimiters to linefeed
|
||||
set textLines to text items of multiLineText
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
|
||||
set nonInfoLines to 0
|
||||
repeat with aLine in textLines
|
||||
set currentLine to my trimWhitespace(aLine as text)
|
||||
if currentLine is not "" and not (currentLine starts with knownInfoPrefix) then
|
||||
set nonInfoLines to nonInfoLines + 1
|
||||
end if
|
||||
end repeat
|
||||
|
||||
-- If we have substantial non-info content, consider it meaningful
|
||||
return (nonInfoLines > 2)
|
||||
end if
|
||||
|
||||
-- If content doesn't start with our info prefix, likely contains command output
|
||||
return true
|
||||
end bufferContainsMeaningfulContentAS
|
||||
|
||||
-- Enhanced error reporting helper
|
||||
on formatErrorMessage(errorType, errorMsg, context)
|
||||
if enhancedErrorReporting then
|
||||
set formattedMsg to scriptInfoPrefix & errorType & ": " & errorMsg
|
||||
if context is not "" then
|
||||
set formattedMsg to formattedMsg & " (Context: " & context & ")"
|
||||
end if
|
||||
return formattedMsg
|
||||
else
|
||||
return scriptInfoPrefix & errorMsg
|
||||
end if
|
||||
end formatErrorMessage
|
||||
|
||||
-- Enhanced logging helper
|
||||
on logVerbose(message)
|
||||
if verboseLogging then
|
||||
log "🔍 " & message
|
||||
end if
|
||||
end logVerbose
|
||||
--#endregion Helper Functions
|
||||
|
||||
--#region Main Script Logic (on run)
|
||||
on run argv
|
||||
set appSpecificErrorOccurred to false
|
||||
try
|
||||
my logVerbose("Starting Terminator v0.6.1 Enhanced")
|
||||
|
||||
tell application "System Events"
|
||||
if not (exists process "Terminal") then
|
||||
launch application id "com.apple.Terminal"
|
||||
delay startupDelayForTerminal
|
||||
end if
|
||||
end tell
|
||||
|
||||
set originalArgCount to count argv
|
||||
if originalArgCount < 1 then return my usageText()
|
||||
|
||||
set projectPathArg to ""
|
||||
set actualArgsForParsing to argv
|
||||
if originalArgCount > 0 then
|
||||
set potentialPath to item 1 of argv
|
||||
if my isValidPath(potentialPath) then
|
||||
set projectPathArg to potentialPath
|
||||
my logVerbose("Detected project path: " & projectPathArg)
|
||||
if originalArgCount > 1 then
|
||||
set actualArgsForParsing to items 2 thru -1 of argv
|
||||
else
|
||||
return my formatErrorMessage("Argument Error", "Project path \"" & projectPathArg & "\" provided, but no task tag or command specified." & linefeed & linefeed & my usageText(), "")
|
||||
end if
|
||||
end if
|
||||
end if
|
||||
|
||||
if (count actualArgsForParsing) < 1 then return my usageText()
|
||||
|
||||
set taskTagName to item 1 of actualArgsForParsing
|
||||
my logVerbose("Task tag: " & taskTagName)
|
||||
|
||||
if (length of taskTagName) > 40 or (not my tagOK(taskTagName)) then
|
||||
set errorMsg to "Task Tag missing or invalid: \"" & taskTagName & "\"." & linefeed & linefeed & ¬
|
||||
"A 'task tag' (e.g., 'build', 'tests') is a short name (1-40 letters, digits, -, _) " & ¬
|
||||
"to identify a specific task, optionally within a project session." & linefeed & linefeed
|
||||
return my formatErrorMessage("Validation Error", errorMsg & my usageText(), "tag validation")
|
||||
end if
|
||||
|
||||
set doWrite to false
|
||||
set shellCmd to ""
|
||||
set originalUserShellCmd to ""
|
||||
set currentTailLines to defaultTailLines
|
||||
set explicitLinesProvided to false
|
||||
set argCountAfterTagOrPath to count actualArgsForParsing
|
||||
|
||||
if argCountAfterTagOrPath > 1 then
|
||||
set commandParts to items 2 thru -1 of actualArgsForParsing
|
||||
if (count commandParts) > 0 then
|
||||
set lastOfCmdParts to item -1 of commandParts
|
||||
if my isInteger(lastOfCmdParts) then
|
||||
set currentTailLines to (lastOfCmdParts as integer)
|
||||
set explicitLinesProvided to true
|
||||
my logVerbose("Explicit lines requested: " & currentTailLines)
|
||||
if (count commandParts) > 1 then
|
||||
set commandParts to items 1 thru -2 of commandParts
|
||||
else
|
||||
set commandParts to {}
|
||||
end if
|
||||
end if
|
||||
end if
|
||||
if (count commandParts) > 0 then
|
||||
set originalUserShellCmd to my joinList(commandParts, " ")
|
||||
my logVerbose("Command detected: " & originalUserShellCmd)
|
||||
end if
|
||||
else if argCountAfterTagOrPath = 1 then
|
||||
-- Only taskTagName was provided after potential projectPathArg
|
||||
-- This is a read operation by default.
|
||||
my logVerbose("Read-only operation detected")
|
||||
end if
|
||||
|
||||
if originalUserShellCmd is not "" and (my trimWhitespace(originalUserShellCmd) is not "") then
|
||||
set doWrite to true
|
||||
set shellCmd to originalUserShellCmd
|
||||
else if projectPathArg is not "" and originalUserShellCmd is "" then
|
||||
-- Path provided, task tag, and empty command string "" OR no command string but lines_to_read was there
|
||||
set doWrite to true
|
||||
set shellCmd to "" -- will become 'cd path'
|
||||
my logVerbose("CD-only operation for path: " & projectPathArg)
|
||||
else
|
||||
set doWrite to false
|
||||
set shellCmd to ""
|
||||
end if
|
||||
|
||||
if currentTailLines < 1 then set currentTailLines to 1
|
||||
if doWrite and (shellCmd is not "" or projectPathArg is not "") and currentTailLines < minTailLinesOnWrite then
|
||||
set currentTailLines to minTailLinesOnWrite
|
||||
my logVerbose("Increased tail lines for write operation: " & currentTailLines)
|
||||
end if
|
||||
|
||||
if projectPathArg is not "" and doWrite then
|
||||
set quotedProjectPath to quoted form of projectPathArg
|
||||
if shellCmd is not "" then
|
||||
set shellCmd to "cd " & quotedProjectPath & " && " & shellCmd
|
||||
else
|
||||
set shellCmd to "cd " & quotedProjectPath
|
||||
end if
|
||||
my logVerbose("Final command: " & shellCmd)
|
||||
end if
|
||||
|
||||
set derivedProjectGroup to ""
|
||||
if projectPathArg is not "" then
|
||||
set derivedProjectGroup to my getPathComponent(projectPathArg, -1)
|
||||
if derivedProjectGroup is "" then set derivedProjectGroup to "DefaultProject"
|
||||
my logVerbose("Project group: " & derivedProjectGroup)
|
||||
end if
|
||||
|
||||
set allowCreation to false
|
||||
if doWrite then
|
||||
set allowCreation to true
|
||||
else if explicitLinesProvided then
|
||||
set allowCreation to true
|
||||
end if
|
||||
|
||||
set effectiveTabTitleForLookup to my generateWindowTitle(taskTagName, derivedProjectGroup)
|
||||
my logVerbose("Tab title: " & effectiveTabTitleForLookup)
|
||||
|
||||
set tabInfo to my ensureTabAndWindow(taskTagName, derivedProjectGroup, allowCreation, effectiveTabTitleForLookup)
|
||||
|
||||
if tabInfo is missing value then
|
||||
if not allowCreation then
|
||||
set errorMsg to "Terminal session \"" & effectiveTabTitleForLookup & "\" not found." & linefeed & ¬
|
||||
"To create this session, provide a command (even an empty string \"\" if only 'cd'-ing to a project path), " & ¬
|
||||
"or specify lines to read (e.g., ... \"" & taskTagName & "\" 1)." & linefeed
|
||||
if projectPathArg is not "" then
|
||||
set errorMsg to errorMsg & "Project path was specified as: \"" & projectPathArg & "\"." & linefeed
|
||||
else
|
||||
set errorMsg to errorMsg & "If this is for a new project, provide the absolute project path as the first argument." & linefeed
|
||||
end if
|
||||
return my formatErrorMessage("Session Error", errorMsg & linefeed & my usageText(), "session lookup")
|
||||
else
|
||||
return my formatErrorMessage("Creation Error", "Could not find or create Terminal tab for \"" & effectiveTabTitleForLookup & "\". Check permissions/Terminal state.", "tab creation")
|
||||
end if
|
||||
end if
|
||||
|
||||
set targetTab to targetTab of tabInfo
|
||||
set parentWindow to parentWindow of tabInfo
|
||||
set wasNewlyCreated to wasNewlyCreated of tabInfo
|
||||
set createdInExistingViaFuzzy to createdInExistingWindowViaFuzzy of tabInfo
|
||||
|
||||
my logVerbose("Tab info - new: " & wasNewlyCreated & ", fuzzy: " & createdInExistingViaFuzzy)
|
||||
|
||||
set bufferText to ""
|
||||
set commandTimedOut to false
|
||||
set tabWasBusyOnRead to false
|
||||
set previousCommandActuallyStopped to true
|
||||
set attemptMadeToStopPreviousCommand to false
|
||||
set identifiedBusyProcessName to ""
|
||||
set theTTYForInfo to ""
|
||||
|
||||
if not doWrite and wasNewlyCreated then
|
||||
if createdInExistingViaFuzzy then
|
||||
return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" created in existing project window and ready."
|
||||
else
|
||||
return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" (in new window) created and ready."
|
||||
end if
|
||||
end if
|
||||
|
||||
tell application id "com.apple.Terminal"
|
||||
try
|
||||
set index of parentWindow to 1
|
||||
set selected tab of parentWindow to targetTab
|
||||
if wasNewlyCreated and doWrite then
|
||||
delay 0.4
|
||||
else
|
||||
delay 0.1
|
||||
end if
|
||||
|
||||
if doWrite and shellCmd is not "" then
|
||||
my logVerbose("Executing command: " & shellCmd)
|
||||
set canProceedWithWrite to true
|
||||
if busy of targetTab then
|
||||
if not wasNewlyCreated or createdInExistingViaFuzzy then
|
||||
set attemptMadeToStopPreviousCommand to true
|
||||
set previousCommandActuallyStopped to false
|
||||
try
|
||||
set theTTYForInfo to my trimWhitespace(tty of targetTab)
|
||||
end try
|
||||
set processesBefore to {}
|
||||
try
|
||||
set processesBefore to processes of targetTab
|
||||
end try
|
||||
set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"}
|
||||
set identifiedBusyProcessName to ""
|
||||
if (count of processesBefore) > 0 then
|
||||
repeat with i from (count of processesBefore) to 1 by -1
|
||||
set aProcessName to item i of processesBefore
|
||||
if aProcessName is not in commonShells then
|
||||
set identifiedBusyProcessName to aProcessName
|
||||
exit repeat
|
||||
end if
|
||||
end repeat
|
||||
end if
|
||||
my logVerbose("Busy process identified: " & identifiedBusyProcessName)
|
||||
set processToTargetForKill to identifiedBusyProcessName
|
||||
set killedViaPID to false
|
||||
if theTTYForInfo is not "" and processToTargetForKill is not "" then
|
||||
set shortTTY to text 6 thru -1 of theTTYForInfo
|
||||
set pidsToKillText to ""
|
||||
try
|
||||
set psCommand to "ps -t " & shortTTY & " -o pid,comm | awk '$2 == \"" & processToTargetForKill & "\" {print $1}'"
|
||||
set pidsToKillText to do shell script psCommand
|
||||
end try
|
||||
if pidsToKillText is not "" then
|
||||
set oldDelims to AppleScript's text item delimiters
|
||||
set AppleScript's text item delimiters to linefeed
|
||||
set pidList to text items of pidsToKillText
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
repeat with aPID in pidList
|
||||
set aPID to my trimWhitespace(aPID)
|
||||
if aPID is not "" then
|
||||
try
|
||||
do shell script "kill -INT " & aPID
|
||||
delay 0.3
|
||||
do shell script "kill -0 " & aPID
|
||||
try
|
||||
do shell script "kill -KILL " & aPID
|
||||
delay 0.2
|
||||
try
|
||||
do shell script "kill -0 " & aPID
|
||||
on error
|
||||
set previousCommandActuallyStopped to true
|
||||
end try
|
||||
end try
|
||||
on error
|
||||
set previousCommandActuallyStopped to true
|
||||
end try
|
||||
end if
|
||||
if previousCommandActuallyStopped then
|
||||
set killedViaPID to true
|
||||
exit repeat
|
||||
end if
|
||||
end repeat
|
||||
end if
|
||||
end if
|
||||
if not previousCommandActuallyStopped and busy of targetTab then
|
||||
activate
|
||||
delay 0.5
|
||||
tell application "System Events" to keystroke "c" using control down
|
||||
delay 0.6
|
||||
if not (busy of targetTab) then
|
||||
set previousCommandActuallyStopped to true
|
||||
if identifiedBusyProcessName is not "" and (identifiedBusyProcessName is in (processes of targetTab)) then
|
||||
set previousCommandActuallyStopped to false
|
||||
end if
|
||||
end if
|
||||
else if not busy of targetTab then
|
||||
set previousCommandActuallyStopped to true
|
||||
end if
|
||||
if not previousCommandActuallyStopped then
|
||||
set canProceedWithWrite to false
|
||||
end if
|
||||
else if wasNewlyCreated and not createdInExistingViaFuzzy and busy of targetTab then
|
||||
delay 0.4
|
||||
if busy of targetTab then
|
||||
set attemptMadeToStopPreviousCommand to true
|
||||
set previousCommandActuallyStopped to false
|
||||
set identifiedBusyProcessName to "extended initialization"
|
||||
set canProceedWithWrite to false
|
||||
else
|
||||
set previousCommandActuallyStopped to true
|
||||
end if
|
||||
end if
|
||||
end if
|
||||
|
||||
if canProceedWithWrite then
|
||||
-- Clear before write to prevent output truncation (only for reused tabs)
|
||||
if not wasNewlyCreated then
|
||||
do script "clear" in targetTab
|
||||
delay 0.1
|
||||
end if
|
||||
do script shellCmd in targetTab
|
||||
set commandStartTime to current date
|
||||
set commandFinished to false
|
||||
repeat while ((current date) - commandStartTime) < maxCommandWaitTime
|
||||
if not (busy of targetTab) then
|
||||
set commandFinished to true
|
||||
exit repeat
|
||||
end if
|
||||
delay pollIntervalForBusyCheck
|
||||
end repeat
|
||||
if not commandFinished then set commandTimedOut to true
|
||||
if commandFinished then delay 0.2 -- Increased from 0.1 for better output settling
|
||||
my logVerbose("Command execution completed, timeout: " & commandTimedOut)
|
||||
end if
|
||||
else if not doWrite then
|
||||
if busy of targetTab then
|
||||
set tabWasBusyOnRead to true
|
||||
try
|
||||
set theTTYForInfo to my trimWhitespace(tty of targetTab)
|
||||
end try
|
||||
set processesReading to processes of targetTab
|
||||
set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"}
|
||||
set identifiedBusyProcessName to ""
|
||||
if (count of processesReading) > 0 then
|
||||
repeat with i from (count of processesReading) to 1 by -1
|
||||
set aProcessName to item i of processesReading
|
||||
if aProcessName is not in commonShells then
|
||||
set identifiedBusyProcessName to aProcessName
|
||||
exit repeat
|
||||
end if
|
||||
end repeat
|
||||
end if
|
||||
my logVerbose("Tab busy during read with: " & identifiedBusyProcessName)
|
||||
end if
|
||||
end if
|
||||
|
||||
set bufferText to history of targetTab
|
||||
on error errMsg number errNum
|
||||
set appSpecificErrorOccurred to true
|
||||
return my formatErrorMessage("Terminal Error", errMsg, "error " & errNum)
|
||||
end try
|
||||
end tell
|
||||
|
||||
set appendedMessage to ""
|
||||
set ttyInfoStringForMessage to ""
|
||||
if theTTYForInfo is not "" then set ttyInfoStringForMessage to " (TTY " & theTTYForInfo & ")"
|
||||
if attemptMadeToStopPreviousCommand then
|
||||
set processNameToReport to "process"
|
||||
if identifiedBusyProcessName is not "" and identifiedBusyProcessName is not "extended initialization" then
|
||||
set processNameToReport to "'" & identifiedBusyProcessName & "'"
|
||||
else if identifiedBusyProcessName is "extended initialization" then
|
||||
set processNameToReport to "tab's extended initialization"
|
||||
end if
|
||||
if previousCommandActuallyStopped then
|
||||
set appendedMessage to linefeed & scriptInfoPrefix & "Previous " & processNameToReport & ttyInfoStringForMessage & " was interrupted. ---"
|
||||
else
|
||||
set appendedMessage to linefeed & scriptInfoPrefix & "Attempted to interrupt previous " & processNameToReport & ttyInfoStringForMessage & ", but it may still be running. New command NOT executed. ---"
|
||||
end if
|
||||
end if
|
||||
if commandTimedOut then
|
||||
set cmdForMsg to originalUserShellCmd
|
||||
if projectPathArg is not "" and originalUserShellCmd is not "" then set cmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")"
|
||||
if projectPathArg is not "" and originalUserShellCmd is "" then set cmdForMsg to "(cd " & projectPathArg & ")"
|
||||
set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Command '" & cmdForMsg & "' may still be running. Returned after " & maxCommandWaitTime & "s timeout. ---"
|
||||
else if tabWasBusyOnRead then
|
||||
set processNameToReportOnRead to "process"
|
||||
if identifiedBusyProcessName is not "" then set processNameToReportOnRead to "'" & identifiedBusyProcessName & "'"
|
||||
set busyProcessInfoString to ""
|
||||
if identifiedBusyProcessName is not "" then set busyProcessInfoString to " with " & processNameToReportOnRead
|
||||
set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Tab" & ttyInfoStringForMessage & " was busy" & busyProcessInfoString & " during read. Output may be from an ongoing process. ---"
|
||||
end if
|
||||
|
||||
if appendedMessage is not "" then
|
||||
if bufferText is "" then
|
||||
set bufferText to my trimWhitespace(appendedMessage)
|
||||
else
|
||||
set bufferText to bufferText & appendedMessage
|
||||
end if
|
||||
end if
|
||||
|
||||
set tailedOutput to my tailBufferAS(bufferText, currentTailLines)
|
||||
set finalResult to my trimBlankLinesAS(tailedOutput)
|
||||
|
||||
if finalResult is "" then
|
||||
set effectiveOriginalCmdForMsg to originalUserShellCmd
|
||||
if projectPathArg is not "" and originalUserShellCmd is "" then
|
||||
set effectiveOriginalCmdForMsg to "(cd " & projectPathArg & ")"
|
||||
else if projectPathArg is not "" and originalUserShellCmd is not "" then
|
||||
set effectiveOriginalCmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")"
|
||||
end if
|
||||
|
||||
set baseMsgInfo to "Session \"" & effectiveTabTitleForLookup & "\", requested " & currentTailLines & " lines."
|
||||
set specificAppendedInfo to my trimWhitespace(appendedMessage)
|
||||
set suffixForReturn to ""
|
||||
if specificAppendedInfo is not "" then set suffixForReturn to linefeed & specificAppendedInfo
|
||||
|
||||
if attemptMadeToStopPreviousCommand and not previousCommandActuallyStopped then
|
||||
return my formatErrorMessage("Process Error", "Previous command/initialization in session \"" & effectiveTabTitleForLookup & "\"" & ttyInfoStringForMessage & " may not have terminated. New command '" & effectiveOriginalCmdForMsg & "' NOT executed." & suffixForReturn, "process termination")
|
||||
else if commandTimedOut then
|
||||
return my formatErrorMessage("Timeout Error", "Command '" & effectiveOriginalCmdForMsg & "' timed out after " & maxCommandWaitTime & "s. No other output. " & baseMsgInfo & suffixForReturn, "command timeout")
|
||||
else if tabWasBusyOnRead then
|
||||
return my formatErrorMessage("Busy Error", "Tab for session \"" & effectiveTabTitleForLookup & "\" was busy during read. No other output. " & baseMsgInfo & suffixForReturn, "read busy")
|
||||
else if doWrite and shellCmd is not "" then
|
||||
return scriptInfoPrefix & "Command '" & effectiveOriginalCmdForMsg & "' executed in session \"" & effectiveTabTitleForLookup & "\". No output captured."
|
||||
else
|
||||
return scriptInfoPrefix & "No meaningful content found in session \"" & effectiveTabTitleForLookup & "\"."
|
||||
end if
|
||||
end if
|
||||
|
||||
my logVerbose("Returning " & (length of finalResult) & " characters of output")
|
||||
return finalResult
|
||||
|
||||
on error generalErrorMsg number generalErrorNum
|
||||
if appSpecificErrorOccurred then error generalErrorMsg number generalErrorNum
|
||||
return my formatErrorMessage("Execution Error", generalErrorMsg, "error " & generalErrorNum)
|
||||
end try
|
||||
end run
|
||||
--#endregion Main Script Logic (on run)
|
||||
|
||||
--#region Helper Functions
|
||||
on ensureTabAndWindow(taskTagName as text, projectGroupName as text, allowCreate as boolean, desiredFullTitle as text)
|
||||
set wasActuallyCreated to false
|
||||
set createdInExistingViaFuzzy to false
|
||||
|
||||
tell application id "com.apple.Terminal"
|
||||
try
|
||||
repeat with w in windows
|
||||
repeat with tb in tabs of w
|
||||
try
|
||||
if custom title of tb is desiredFullTitle then
|
||||
set selected tab of w to tb
|
||||
return {targetTab:tb, parentWindow:w, wasNewlyCreated:false, createdInExistingWindowViaFuzzy:false}
|
||||
end if
|
||||
end try
|
||||
end repeat
|
||||
end repeat
|
||||
end try
|
||||
|
||||
if allowCreate and enableFuzzyTagGrouping and projectGroupName is not "" then
|
||||
set projectGroupSearchPatternForWindowName to tabTitlePrefix & projectIdentifierInTitle & projectGroupName
|
||||
try
|
||||
repeat with w in windows
|
||||
try
|
||||
-- Look for any window that contains our project name
|
||||
if name of w contains projectGroupSearchPatternForWindowName or name of w contains (projectIdentifierInTitle & projectGroupName) then
|
||||
if not frontmost then activate
|
||||
delay 0.2
|
||||
set newTabInGroup to do script "" in w
|
||||
delay 0.3
|
||||
set custom title of newTabInGroup to desiredFullTitle
|
||||
delay 0.2
|
||||
set selected tab of w to newTabInGroup
|
||||
return {targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true}
|
||||
end if
|
||||
end try
|
||||
end repeat
|
||||
end try
|
||||
end if
|
||||
|
||||
-- Enhanced fallback: if no project-specific window found, try to use any existing Terminator window
|
||||
if allowCreate and enableFuzzyTagGrouping then
|
||||
try
|
||||
repeat with w in windows
|
||||
try
|
||||
if name of w contains tabTitlePrefix then
|
||||
-- Found an existing Terminator window, use it for grouping
|
||||
if not frontmost then activate
|
||||
delay 0.2
|
||||
set newTabInGroup to do script "" in w
|
||||
delay 0.3
|
||||
set custom title of newTabInGroup to desiredFullTitle
|
||||
delay 0.2
|
||||
set selected tab of w to newTabInGroup
|
||||
return {targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true}
|
||||
end if
|
||||
end try
|
||||
end repeat
|
||||
end try
|
||||
end if
|
||||
|
||||
if allowCreate then
|
||||
try
|
||||
if not frontmost then activate
|
||||
delay 0.3
|
||||
set newTabInNewWindow to do script ""
|
||||
set wasActuallyCreated to true
|
||||
delay 0.4
|
||||
set custom title of newTabInNewWindow to desiredFullTitle
|
||||
delay 0.2
|
||||
set parentWinOfNew to missing value
|
||||
try
|
||||
set parentWinOfNew to window of newTabInNewWindow
|
||||
on error
|
||||
if (count of windows) > 0 then set parentWinOfNew to front window
|
||||
end try
|
||||
if parentWinOfNew is not missing value then
|
||||
if custom title of newTabInNewWindow is desiredFullTitle then
|
||||
set selected tab of parentWinOfNew to newTabInNewWindow
|
||||
return {targetTab:newTabInNewWindow, parentWindow:parentWinOfNew, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false}
|
||||
end if
|
||||
end if
|
||||
repeat with w_final_scan in windows
|
||||
repeat with tb_final_scan in tabs of w_final_scan
|
||||
try
|
||||
if custom title of tb_final_scan is desiredFullTitle then
|
||||
set selected tab of w_final_scan to tb_final_scan
|
||||
return {targetTab:tb_final_scan, parentWindow:w_final_scan, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false}
|
||||
end if
|
||||
end try
|
||||
end repeat
|
||||
end repeat
|
||||
return missing value
|
||||
on error
|
||||
return missing value
|
||||
end try
|
||||
else
|
||||
return missing value
|
||||
end if
|
||||
end tell
|
||||
end ensureTabAndWindow
|
||||
|
||||
on tailBufferAS(txt, n)
|
||||
-- Save the caller's delimiters and restore them on every exit path
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set lst to text items of txt
if (count lst) = 0 then
    set AppleScript's text item delimiters to oldDelims
    return ""
end if
set startN to (count lst) - (n - 1)
if startN < 1 then set startN to 1
set slice to items startN thru -1 of lst
set outText to slice as text
set AppleScript's text item delimiters to oldDelims
|
||||
return outText
|
||||
end tailBufferAS
|
||||
|
||||
on lineIsEffectivelyEmptyAS(aLine)
|
||||
if aLine is "" then return true
|
||||
set trimmedLine to my trimWhitespace(aLine)
|
||||
return (trimmedLine is "")
|
||||
end lineIsEffectivelyEmptyAS
|
||||
|
||||
on trimBlankLinesAS(txt)
|
||||
if txt is "" then return ""
|
||||
set oldDelims to AppleScript's text item delimiters
|
||||
set AppleScript's text item delimiters to {linefeed}
|
||||
set originalLines to text items of txt
|
||||
set linesToProcess to {}
|
||||
repeat with aLineRef in originalLines
|
||||
set aLine to contents of aLineRef
|
||||
if my lineIsEffectivelyEmptyAS(aLine) then
|
||||
set end of linesToProcess to ""
|
||||
else
|
||||
set end of linesToProcess to aLine
|
||||
end if
|
||||
end repeat
|
||||
set firstContentLine to 1
|
||||
repeat while firstContentLine ≤ (count linesToProcess) and (item firstContentLine of linesToProcess is "")
|
||||
set firstContentLine to firstContentLine + 1
|
||||
end repeat
|
||||
set lastContentLine to count linesToProcess
|
||||
repeat while lastContentLine ≥ firstContentLine and (item lastContentLine of linesToProcess is "")
|
||||
set lastContentLine to lastContentLine - 1
|
||||
end repeat
|
||||
if firstContentLine > lastContentLine then
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
return ""
|
||||
end if
|
||||
set resultLines to items firstContentLine thru lastContentLine of linesToProcess
|
||||
set AppleScript's text item delimiters to linefeed
|
||||
set trimmedTxt to resultLines as text
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
return trimmedTxt
|
||||
end trimBlankLinesAS
|
||||
|
||||
on trimWhitespace(theText)
|
||||
set whitespaceChars to {" ", tab}
|
||||
set newText to theText
|
||||
repeat while (newText is not "") and (character 1 of newText is in whitespaceChars)
|
||||
if (length of newText) > 1 then
|
||||
set newText to text 2 thru -1 of newText
|
||||
else
|
||||
set newText to ""
|
||||
end if
|
||||
end repeat
|
||||
repeat while (newText is not "") and (character -1 of newText is in whitespaceChars)
|
||||
if (length of newText) > 1 then
|
||||
set newText to text 1 thru -2 of newText
|
||||
else
|
||||
set newText to ""
|
||||
end if
|
||||
end repeat
|
||||
return newText
|
||||
end trimWhitespace
|
||||
|
||||
on isInteger(v)
|
||||
try
|
||||
v as integer
|
||||
return true
|
||||
on error
|
||||
return false
|
||||
end try
|
||||
end isInteger
|
||||
|
||||
on tagOK(t)
|
||||
try
|
||||
do shell script "/bin/echo " & quoted form of t & " | /usr/bin/grep -E -q '^[A-Za-z0-9_-]+$'"
|
||||
return true
|
||||
on error
|
||||
return false
|
||||
end try
|
||||
end tagOK
|
||||
|
||||
on joinList(theList, theDelimiter)
|
||||
set oldDelims to AppleScript's text item delimiters
|
||||
set AppleScript's text item delimiters to theDelimiter
|
||||
set theText to theList as text
|
||||
set AppleScript's text item delimiters to oldDelims
|
||||
return theText
|
||||
end joinList
|
||||
|
||||
on usageText()
|
||||
set LF to linefeed
|
||||
set scriptName to "terminator.scpt"
|
||||
set exampleProject to "/Users/name/Projects/FancyApp"
|
||||
set exampleProjectNameForTitle to my getPathComponent(exampleProject, -1)
|
||||
if exampleProjectNameForTitle is "" then set exampleProjectNameForTitle to "DefaultProject"
|
||||
set exampleTaskTag to "build_frontend"
|
||||
set exampleFullCommand to "npm run build"
|
||||
|
||||
set generatedExampleTitle to my generateWindowTitle(exampleTaskTag, exampleProjectNameForTitle)
|
||||
|
||||
set outText to scriptName & " - v0.6.0 Enhanced \"T-1000\" – AppleScript Terminal helper" & LF & LF
|
||||
set outText to outText & "Enhancements: Smart session reuse, enhanced error reporting, verbose logging (optional)" & LF & LF
|
||||
set outText to outText & "Manages dedicated, tagged Terminal sessions, grouped by project path." & LF & LF
|
||||
|
||||
set outText to outText & "Core Concept:" & LF
|
||||
set outText to outText & " 1. For a NEW project, provide the absolute project path FIRST, then task tag, then command:" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"" & exampleFullCommand & "\"" & LF
|
||||
set outText to outText & " The script will 'cd' into the project path and run the command." & LF
|
||||
set outText to outText & " The tab will be titled like: \"" & generatedExampleTitle & "\"" & LF
|
||||
set outText to outText & " 2. For SUBSEQUENT commands for THE SAME PROJECT, use the project path and task tag:" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"another_command\"" & LF
|
||||
set outText to outText & " 3. To simply READ from an existing session (path & tag must identify an existing session):" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\"" & LF
|
||||
set outText to outText & " A READ operation on a non-existent tag (without path/command to create) will error." & LF & LF
|
||||
|
||||
set outText to outText & "Title Format: \"" & tabTitlePrefix & projectIdentifierInTitle & "<ProjectName>" & taskIdentifierInTitle & "<TaskTag>\"" & LF
|
||||
set outText to outText & "Or if no project path provided: \"" & tabTitlePrefix & "<TaskTag>\"" & LF & LF
|
||||
|
||||
set outText to outText & "Enhanced Features:" & LF
|
||||
set outText to outText & " • Smart session reuse for same project paths" & LF
|
||||
set outText to outText & " • Enhanced error reporting with context information" & LF
|
||||
set outText to outText & " • Optional verbose logging for debugging" & LF
|
||||
set outText to outText & "  • Clears reused tabs before writing a new command; newly created tabs are not cleared" & LF
|
||||
set outText to outText & " • 100-line default output for better build log visibility" & LF
|
||||
set outText to outText & " • Automatically 'cd's into project path if provided with a command." & LF
|
||||
set outText to outText & " • Groups new task tabs into existing project windows if fuzzy grouping enabled." & LF
|
||||
set outText to outText & " • Interrupts busy processes in reused tabs." & LF & LF
|
||||
|
||||
set outText to outText & "Usage Examples:" & LF
|
||||
set outText to outText & " # Start new project session, cd, run command, get 100 lines:" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" \"npm run build\" 100" & LF
|
||||
set outText to outText & " # Create/use 'backend_tests' task tab in the 'FancyApp' project window:" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"backend_tests\" \"pytest\"" & LF
|
||||
set outText to outText & " # Prepare/create a new session by just cd'ing into project path (empty command):" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"dev_shell\" \"\" 1" & LF
|
||||
set outText to outText & " # Read from an existing session:" & LF
|
||||
set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" 50" & LF & LF
|
||||
|
||||
set outText to outText & "Parameters:" & LF
|
||||
set outText to outText & " [\"/absolute/project/path\"]: (Optional First Arg) Base path for project. Enables 'cd' and grouping." & LF
|
||||
set outText to outText & " \"<task_tag_name>\": Required. Specific task name for the tab (e.g., 'build', 'tests')." & LF
|
||||
set outText to outText & " [\"<shell_command_parts...>\"]: (Optional) Command. If path provided, 'cd path &&' is prepended." & LF
|
||||
set outText to outText & " Use \"\" for no command (will just 'cd' if path given)." & LF
|
||||
set outText to outText & " [[lines_to_read]]: (Optional Last Arg) Number of history lines. Default: " & defaultTailLines & "." & LF & LF
|
||||
|
||||
set outText to outText & "Notes:" & LF
|
||||
set outText to outText & " • Provide project path on first use for a project for best window grouping and auto 'cd'." & LF
|
||||
set outText to outText & " • Ensure Automation permissions for Terminal.app & System Events.app." & LF
|
||||
set outText to outText & " • Works within Terminal.app's AppleScript limitations for reliable operation." & LF
|
||||
|
||||
return outText
|
||||
end usageText
|
||||
--#endregion Helper Functions
|
||||
423
docs/spec.md
Normal file
|
|
@ -0,0 +1,423 @@
|
|||
## Peekaboo: Full & Final Detailed Specification v1.1.1

https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW-

**Project Vision:** Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named `peekaboo`, which is called by the Node.js server. All image captures automatically exclude window shadows/frames.

**Core Components:**

1.  **Node.js/TypeScript MCP Server (`peekaboo-mcp`):**
    *   **NPM Package Name:** `peekaboo-mcp`.
    *   **GitHub Project Name:** `peekaboo`.
    *   Implements MCP server logic using the latest stable `@modelcontextprotocol/sdk`.
    *   Exposes three primary MCP tools: `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list`.
    *   Translates MCP tool calls into commands for the Swift `peekaboo` CLI.
    *   Parses structured JSON output from the Swift `peekaboo` CLI.
    *   Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client.
    *   Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer.
    *   Implements robust logging to a file using `pino`, ensuring no logs interfere with MCP stdio communication.
2.  **Swift CLI (`peekaboo`):**
    *   A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64).
    *   Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching.
    *   **Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).**
    *   Outputs all results and errors in a structured JSON format via a global `--json-output` flag. This JSON includes a `debug_logs` array for internal Swift CLI logs, which the Node.js server can relay to its own logger.
    *   The `peekaboo` binary is bundled at the root of the `peekaboo-mcp` NPM package.
|
||||
|
||||
---

### I. Node.js/TypeScript MCP Server (`peekaboo-mcp`)

#### A. Project Setup & Distribution

1.  **Language/Runtime:** Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+).
2.  **Package Manager:** NPM.
3.  **`package.json`:**
    *   `name`: `"peekaboo-mcp"`
    *   `version`: Semantic versioning (e.g., `1.1.1`).
    *   `type`: `"module"` (for ES Modules).
    *   `main`: `"dist/index.js"` (compiled server entry point).
    *   `bin`: `{ "peekaboo-mcp": "dist/index.js" }`.
    *   `files`: `["dist/", "peekaboo"]` (includes compiled JS and the Swift `peekaboo` binary at package root).
    *   `scripts`:
        *   `build`: Command to compile TypeScript (e.g., `tsc`).
        *   `start`: `node dist/index.js`.
        *   `prepublishOnly`: `npm run build`.
    *   `dependencies`: `@modelcontextprotocol/sdk` (latest stable), `zod` (for input validation), `pino` (for logging), relevant cloud AI SDKs (e.g., `openai`, `@anthropic-ai/sdk`).
    *   `devDependencies`: `typescript`, `@types/node`, `pino-pretty` (for optional development console logging).
4.  **Distribution:** Published to NPM. Installable via `npm i -g peekaboo-mcp` or usable with `npx peekaboo-mcp`.
5.  **Swift CLI Location Strategy:**
    *   The Node.js server will first check the environment variable `PEEKABOO_CLI_PATH`. If set and points to a valid executable, that path will be used.
    *   If `PEEKABOO_CLI_PATH` is not set or invalid, the server will fall back to a bundled path, resolved relative to its own script location (e.g., `path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo')`, assuming the compiled server script is in `dist/` and `peekaboo` binary is at the package root).
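
The location strategy above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the names `resolvePeekabooCliPath` and `isExecutableFile` are hypothetical, and treating "valid executable" as "a regular file with the execute permission bit" is an assumption.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import * as path from "node:path";

// Hypothetical check: true when `candidate` is a regular file the current user may execute.
function isExecutableFile(candidate: string): boolean {
  try {
    if (!statSync(candidate).isFile()) return false;
    accessSync(candidate, constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper implementing the two-step lookup:
// 1. use PEEKABOO_CLI_PATH when it points at a valid executable;
// 2. otherwise fall back to the binary bundled one level above `scriptDir` (dist/).
// `isValid` is injectable so the lookup can be exercised without touching the disk.
export function resolvePeekabooCliPath(
  env: Record<string, string | undefined>,
  scriptDir: string,
  isValid: (candidate: string) => boolean = isExecutableFile,
): string {
  const override = env.PEEKABOO_CLI_PATH;
  if (override && isValid(override)) {
    return override;
  }
  return path.resolve(scriptDir, "..", "peekaboo");
}
```

In the server entry point, `scriptDir` would presumably be `path.dirname(fileURLToPath(import.meta.url))`, as in the expression quoted in the spec.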
|
||||
|
||||
#### B. Server Initialization & Configuration (`src/index.ts`)

1.  **Imports:** `McpServer` from `@modelcontextprotocol/sdk/server/mcp.js` and `StdioServerTransport` from `@modelcontextprotocol/sdk/server/stdio.js`; `pino` from `pino`; `os`, `path` from Node.js built-ins.
2.  **Server Info:** `name: "PeekabooMCP"`, `version: <package_version from package.json>`.
3.  **Server Capabilities:** Advertise `tools` capability.
4.  **Logging (Pino):**
    *   Instantiate `pino` logger.
    *   **Default Transport:** File transport to `path.join(os.tmpdir(), 'peekaboo-mcp.log')`. Use `mkdir: true` option for destination.
    *   **Log Level:** Controlled by ENV VAR `LOG_LEVEL` (standard Pino levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`). Default: `"info"`.
    *   **Conditional Console Logging (Development Only):** If ENV VAR `PEEKABOO_MCP_CONSOLE_LOGGING="true"`, add a second Pino transport targeting `process.stderr.fd` (potentially using `pino-pretty` for human-readable output).
    *   **Strict Rule:** All server operational logging must use the configured Pino instance. No direct `console.log/warn/error` that might output to `stdout`.
5.  **Environment Variables (Read by Server):**
    *   `AI_PROVIDERS`: Comma-separated list of `provider_name/default_model_for_provider` pairs (e.g., `"openai/gpt-4o,ollama/qwen2.5vl:7b"`). If unset/empty, `peekaboo.analyze` tool reports AI not configured.
    *   `OPENAI_API_KEY`: API key for OpenAI.
    *   `ANTHROPIC_API_KEY`: (Example for future) API key for Anthropic.
    *   (Other cloud provider API keys as standard ENV VAR names).
    *   `OLLAMA_BASE_URL`: Base URL for local Ollama instance. Default: `"http://localhost:11434"`.
    *   `LOG_LEVEL`: For Pino logger. Default: `"info"`.
    *   `PEEKABOO_MCP_CONSOLE_LOGGING`: Boolean (`"true"`/`"false"`) for dev console logs. Default: `"false"`.
    *   `PEEKABOO_CLI_PATH`: Optional override for Swift `peekaboo` CLI path.
6.  **Initial Status Reporting Logic:**
    *   A server-instance-level boolean flag: `let hasSentInitialStatus = false;`.
    *   A function `generateServerStatusString()`: Creates a formatted string: `"\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: <server_version>\nConfigured AI Providers (from AI_PROVIDERS ENV): <parsed list or 'None Configured. Set AI_PROVIDERS ENV.'>\n---"`.
    *   Response Augmentation: In the function that sends a `ToolResponse` back to the MCP client, if the response is for a successful tool call (not `initialize`/`initialized` or `peekaboo.list` with `item_type: "server_status"`) AND `hasSentInitialStatus` is `false`:
        *   Append `generateServerStatusString()` to the first `TextContentItem` in `ToolResponse.content`. If no text item exists, prepend a new one.
        *   Set `hasSentInitialStatus = true`.
7.  **Tool Registration:** Register `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` with their Zod input schemas and handler functions.
8.  **Transport:** `await server.connect(new StdioServerTransport());`.
9.  **Shutdown:** Implement graceful shutdown on `SIGINT`, `SIGTERM` (e.g., `await server.close(); logger.flush(); process.exit(0);`).
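
To make the `AI_PROVIDERS` format and the status string from items 5 and 6 concrete, here is a small sketch. It takes the version and raw environment value as parameters so it stays self-contained. The `AiProvider` type and `parseAiProviders` helper are hypothetical names, and two behaviours are assumptions the spec does not state: each entry is split on its first `/`, and malformed entries are silently skipped.

```typescript
// Hypothetical shape for one parsed AI_PROVIDERS entry.
interface AiProvider {
  provider: string;
  model: string;
}

// Parses "openai/gpt-4o,ollama/qwen2.5vl:7b" into provider/model pairs.
// Assumption: split on the FIRST "/", so model names may contain ":" or further "/".
// Assumption: entries without both a provider and a model are skipped.
export function parseAiProviders(raw: string | undefined): AiProvider[] {
  if (!raw) return [];
  const providers: AiProvider[] = [];
  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) continue;
    providers.push({
      provider: trimmed.slice(0, slash),
      model: trimmed.slice(slash + 1),
    });
  }
  return providers;
}

// Builds the status block described in item 6, using the wording quoted in the spec.
export function generateServerStatusString(
  version: string,
  rawProviders: string | undefined,
): string {
  const providers = parseAiProviders(rawProviders);
  const list =
    providers.length > 0
      ? providers.map((p) => `${p.provider}/${p.model}`).join(", ")
      : "None Configured. Set AI_PROVIDERS ENV.";
  return (
    "\n\n--- Peekaboo MCP Server Status ---" +
    "\nName: PeekabooMCP" +
    `\nVersion: ${version}` +
    `\nConfigured AI Providers (from AI_PROVIDERS ENV): ${list}` +
    "\n---"
  );
}
```

A server would typically call `generateServerStatusString(packageVersion, process.env.AI_PROVIDERS)` once, guarded by the `hasSentInitialStatus` flag.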
|
||||
|
||||
#### C. MCP Tool Specifications & Node.js Handler Logic

**General Node.js Handler Pattern (for tools calling Swift `peekaboo` CLI):**

1.  Validate MCP `input` against the tool's Zod schema. If invalid, log error with Pino and return MCP error `ToolResponse`.
2.  Construct command-line arguments for Swift `peekaboo` CLI based on MCP `input`. **Always include `--json-output`**.
3.  Log the constructed Swift command with Pino at `debug` level.
4.  Execute Swift `peekaboo` CLI using `child_process.spawn`, capturing `stdout`, `stderr`, and `exitCode`.
5.  If any data is received on Swift CLI's `stderr`, log it immediately with Pino at `warn` level, prefixed (e.g., `[SwiftCLI-stderr]`).
6.  On Swift CLI process close:
    *   If `exitCode !== 0` or `stdout` is empty/not parseable as JSON:
        *   Log failure details with Pino (`error` level).
        *   Construct MCP error `ToolResponse` (e.g., `errorCode: "SWIFT_CLI_EXECUTION_ERROR"` or `SWIFT_CLI_INVALID_OUTPUT` in `_meta`). Message should include relevant parts of raw `stdout`/`stderr` if available.
    *   If `exitCode === 0`:
        *   Attempt to parse `stdout` as JSON. If parsing fails, treat as error (above).
        *   Let `swiftResponse = JSON.parse(stdout)`.
        *   If `swiftResponse.debug_logs` (array of strings) exists, log each entry via Pino at `debug` level, clearly marked as from backend (e.g., `logger.debug({ backend: "swift", swift_log: entry })`).
        *   If `swiftResponse.success === false`:
            *   Extract `swiftResponse.error.message`, `swiftResponse.error.code`, `swiftResponse.error.details`.
            *   Construct and return MCP error `ToolResponse`, relaying these details (e.g., `message` in `content`, `code` in `_meta.backend_error_code`).
        *   If `swiftResponse.success === true`:
            *   Process `swiftResponse.data` to construct the success MCP `ToolResponse`.
            *   Relay `swiftResponse.messages` as `TextContentItem`s in the MCP response if appropriate.
            *   For `peekaboo.image` with `input.return_data: true`:
                *   Iterate `swiftResponse.data.saved_files[*].path`.
                *   For each path, read image file into a `Buffer`.
                *   Base64 encode the `Buffer`.
                *   Construct `ImageContentItem` for MCP `ToolResponse.content`, including `data` (Base64 string) and `mimeType` (from `swiftResponse.data.saved_files[*].mime_type`).
            *   Augment successful `ToolResponse` with initial server status string if applicable (see B.6).
            *   Send MCP `ToolResponse`.
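
Step 6 of the pattern above is the part most easily gotten wrong, so a sketch of just that decision logic may help. It is a pure function over the captured `exitCode`, `stdout`, and `stderr`, leaving out `child_process.spawn`, Pino, and image handling. All type and function names here are hypothetical stand-ins, not the MCP SDK's types, and the `"UNKNOWN"` fallback code is an assumption for the case where the Swift CLI reports failure without an error object.

```typescript
// Minimal stand-ins for the response shapes (assumed for this sketch only).
interface TextContentItem {
  type: "text";
  text: string;
}

interface ToolResponse {
  content: TextContentItem[];
  isError?: boolean;
  _meta?: { errorCode?: string; backend_error_code?: string };
}

interface SwiftResponse {
  success: boolean;
  data?: unknown;
  messages?: string[];
  debug_logs?: string[];
  error?: { message: string; code: string; details?: string };
}

function errorResponse(text: string, meta: ToolResponse["_meta"]): ToolResponse {
  return { content: [{ type: "text", text }], isError: true, _meta: meta };
}

// Interprets the Swift CLI result once the process has closed.
// `logDebug` receives each `debug_logs` entry so the caller can relay it to its logger.
export function interpretSwiftCliResult(
  exitCode: number | null,
  stdout: string,
  stderr: string,
  logDebug: (entry: string) => void = () => {},
): ToolResponse {
  if (exitCode !== 0) {
    const detail = stderr.trim() || stdout.trim();
    return errorResponse(`Swift CLI exited with code ${exitCode}. ${detail}`, {
      errorCode: "SWIFT_CLI_EXECUTION_ERROR",
    });
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout); // empty stdout throws here as well
  } catch {
    parsed = undefined;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return errorResponse(`Swift CLI produced invalid JSON output: ${stdout}`, {
      errorCode: "SWIFT_CLI_INVALID_OUTPUT",
    });
  }

  const swiftResponse = parsed as SwiftResponse;
  for (const entry of swiftResponse.debug_logs ?? []) {
    logDebug(entry);
  }

  if (swiftResponse.success !== true) {
    return errorResponse(swiftResponse.error?.message ?? "Unknown Swift CLI error", {
      backend_error_code: swiftResponse.error?.code ?? "UNKNOWN", // assumed fallback
    });
  }

  // Success: relay any messages as text items; tool-specific data handling is omitted.
  const messages = swiftResponse.messages ?? [];
  return { content: messages.map((text) => ({ type: "text" as const, text })) };
}
```

Keeping this logic separate from process spawning makes each branch (non-zero exit, invalid JSON, `success: false`, `success: true`) easy to unit test.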
|
||||
|
||||
**Tool 1: `peekaboo.image`**
|
||||
|
||||
* **MCP Description:** "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
|
||||
* **MCP Input Schema (`ImageInputSchema`):**
|
||||
```typescript
|
||||
z.object({
|
||||
app: z.string().optional().describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
|
||||
path: z.string().optional().describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified."),
|
||||
mode: z.enum(["screen", "window", "multi"]).optional().describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
|
||||
window_specifier: z.union([
|
||||
z.object({ title: z.string().describe("Capture window by title.") }),
|
||||
z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }),
|
||||
]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."),
|
||||
format: z.enum(["png", "jpg"]).optional().default("png").describe("Output image format. Defaults to 'png'."),
|
||||
return_data: z.boolean().optional().default(false).describe("Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."),
|
||||
capture_focus: z.enum(["background", "foreground"])
|
||||
.optional().default("background").describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.")
|
||||
})
|
||||
```
|
||||
* **Node.js Handler - Default `mode` Logic:** If `input.mode` is undefined, the handler sets `mode="window"` when `input.app` is provided and `mode="screen"` otherwise. An explicit `input.mode` is always passed through unchanged.
|
||||
* **MCP Output Schema (`ToolResponse`):**
|
||||
* `content`: `Array<ImageContentItem | TextContentItem>`
|
||||
* If `input.return_data: true`: Contains `ImageContentItem`(s): `{ type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
|
||||
* May contain `TextContentItem`(s) (summary, file paths from `saved_files`, Swift CLI `messages`).
|
||||
* `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, window_index?: number, mime_type: string }>` (Directly from Swift CLI JSON `data.saved_files` if images were saved).
|
||||
* `isError?: boolean`
|
||||
* `_meta?: { backend_error_code?: string }` (For relaying Swift CLI error codes).
|
||||
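As a rough sketch of how the handler might translate `ImageInputSchema` input into Swift CLI arguments, assuming the flag names listed for `peekaboo image` later in this document. The function names `resolveMode` and `buildImageArgs` are hypothetical, and the argument order is an arbitrary choice for the example.

```typescript
// Input fields mirror ImageInputSchema (return_data is handled in Node.js only).
interface ImageInput {
  app?: string;
  path?: string;
  mode?: "screen" | "window" | "multi";
  window_specifier?: { title: string } | { index: number };
  format?: "png" | "jpg";
  capture_focus?: "background" | "foreground";
}

// Default mode logic: an explicit mode wins; otherwise "window" if an app is
// given, else "screen".
function resolveMode(input: ImageInput): "screen" | "window" | "multi" {
  if (input.mode !== undefined) return input.mode;
  return input.app !== undefined ? "window" : "screen";
}

// Hypothetical mapper from validated MCP input to `peekaboo image` arguments.
function buildImageArgs(input: ImageInput): string[] {
  const args = ["image", "--mode", resolveMode(input)];
  if (input.app !== undefined) args.push("--app", input.app);
  if (input.path !== undefined) args.push("--path", input.path);
  const spec = input.window_specifier;
  if (spec !== undefined) {
    if ("title" in spec) args.push("--window-title", spec.title);
    else args.push("--window-index", String(spec.index));
  }
  args.push("--format", input.format ?? "png");
  args.push("--capture-focus", input.capture_focus ?? "background");
  // The MCP server always needs machine-readable output.
  args.push("--json-output");
  return args;
}
```

Returning an argument array rather than a single command string keeps the values safe to hand to a process-spawning API without shell quoting.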
|
||||
**Tool 2: `peekaboo.analyze`**
|
||||
|
||||
* **MCP Description:** "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides."
|
||||
* **MCP Input Schema (`AnalyzeInputSchema`):**
|
||||
```typescript
|
||||
z.object({
|
||||
image_path: z.string().describe("Required. Absolute path to image file (.png, .jpg, .jpeg, .webp) to be analyzed."),
|
||||
question: z.string().describe("Required. Question for the AI about the image."),
|
||||
provider_config: z.object({
|
||||
type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto")
|
||||
.describe("AI provider. 'auto' uses server's AI_PROVIDERS ENV preference. Specific provider must be enabled in server's AI_PROVIDERS."),
|
||||
model: z.string().optional().describe("Optional. Model name. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.")
|
||||
}).optional().describe("Optional. Explicit provider/model. Validated against server's AI_PROVIDERS.")
|
||||
})
|
||||
```
|
||||
* **Node.js Handler Logic:**
|
||||
1. Validate input. Server pre-checks `image_path` extension (`.png`, `.jpg`, `.jpeg`, `.webp`); return MCP error if not recognized.
|
||||
2. Read `process.env.AI_PROVIDERS`. If unset/empty, return MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (`error` level).
|
||||
3. Parse `AI_PROVIDERS` into `configuredItems = [{provider: string, model: string}]`.
|
||||
4. **Determine Provider & Model:**
|
||||
* `requestedProviderType = input.provider_config?.type || "auto"`.
|
||||
* `requestedModelName = input.provider_config?.model`.
|
||||
* `chosenProvider: string | null = null`, `chosenModel: string | null = null`.
|
||||
* If `requestedProviderType !== "auto"`:
|
||||
* Find entry in `configuredItems` where `provider === requestedProviderType`.
|
||||
* If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration."
|
||||
* `chosenProvider = requestedProviderType`.
|
||||
* `chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider`.
|
||||
* Else (`requestedProviderType === "auto"`):
|
||||
* Iterate `configuredItems` in order. For each `{provider, model: modelFromEnv}`:
|
||||
* Check availability (Ollama up? Cloud API key for `provider` set in `process.env`?).
|
||||
* If available: `chosenProvider = provider`, `chosenModel = requestedModelName || modelFromEnv || hardcoded_default_for_chosenProvider`. Break.
|
||||
* If no provider found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational."
|
||||
5. **Execute Analysis (Node.js handles all AI calls):**
|
||||
* Read `input.image_path` into a `Buffer`. Base64 encode.
|
||||
* If `chosenProvider` is "ollama": HTTP POST to Ollama (using `process.env.OLLAMA_BASE_URL`) with Base64 image, `input.question`, `chosenModel`. Handle Ollama API errors.
|
||||
* If `chosenProvider` is "openai": Use OpenAI SDK/HTTP with Base64 image, `input.question`, `chosenModel`, and API key from `process.env.OPENAI_API_KEY`. Handle OpenAI API errors.
|
||||
* (Similar for other cloud providers).
|
||||
6. Construct MCP `ToolResponse`.
|
||||
* **MCP Output Schema (`ToolResponse`):**
|
||||
* `content`: `[{ type: "text", text: "<AI's analysis/answer>" }]`
|
||||
* `analysis_text`: `string` (Core AI answer).
|
||||
* `model_used`: `string` (e.g., "ollama/llava:7b", "openai/gpt-4o") - The actual provider/model pair used.
|
||||
* `isError?: boolean`
|
||||
* `_meta?: { backend_error_code?: string }` (For AI provider API errors).
|
||||
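Steps 3 and 4 of the handler logic above (parsing `AI_PROVIDERS` and choosing a provider/model pair) might look roughly like this. It is a sketch under stated assumptions: the function names, the `defaults` table, the synchronous `isAvailable` callback, and the treatment of an entry without a `/` as "provider only" are all illustrative choices, not behavior confirmed by this spec. A real availability check (Ollama reachable, API key present) would likely be asynchronous.

```typescript
interface ConfiguredItem {
  provider: string;
  model: string;
}

type Selection = ConfiguredItem | { error: string };

// Parses a value such as "ollama/llava:latest,openai/gpt-4o".
function parseAiProviders(raw: string): ConfiguredItem[] {
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry): ConfiguredItem => {
      const slash = entry.indexOf("/");
      // Assumption: an entry without "/" names a provider with no model.
      return slash === -1
        ? { provider: entry, model: "" }
        : { provider: entry.slice(0, slash), model: entry.slice(slash + 1) };
    });
}

// Chooses provider and model following the explicit/auto rules of step 4.
function selectProvider(
  items: ConfiguredItem[],
  requestedType: string,
  requestedModel: string | undefined,
  isAvailable: (provider: string) => boolean, // hypothetical availability probe
  defaults: Record<string, string>, // hypothetical per-provider default models
): Selection {
  if (requestedType !== "auto") {
    const match = items.find((item) => item.provider === requestedType);
    if (match === undefined) {
      return {
        error: `Provider '${requestedType}' is not enabled in server's AI_PROVIDERS configuration.`,
      };
    }
    const model = requestedModel || match.model || defaults[requestedType] || "";
    return { provider: requestedType, model };
  }
  // "auto": the first available provider in configured order wins.
  for (const item of items) {
    if (isAvailable(item.provider)) {
      const model = requestedModel || item.model || defaults[item.provider] || "";
      return { provider: item.provider, model };
    }
  }
  return { error: "No configured AI providers in AI_PROVIDERS are currently operational." };
}
```

Returning an error value instead of throwing lets the caller turn either outcome into an MCP `ToolResponse` in one place.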
|
||||
**Tool 3: `peekaboo.list`**
|
||||
|
||||
* **MCP Description:** "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching."
|
||||
* **MCP Input Schema (`ListInputSchema`):**
|
||||
```typescript
|
||||
z.object({
|
||||
item_type: z.enum(["running_applications", "application_windows", "server_status"])
|
||||
.default("running_applications").describe("What to list. 'server_status' returns Peekaboo server info."),
|
||||
app: z.string().optional().describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."),
|
||||
include_window_details: z.array(
|
||||
z.enum(["off_screen", "bounds", "ids"])
|
||||
).optional().describe("Optional, for 'application_windows'. Additional window details. Example: ['bounds', 'ids']")
|
||||
}).refine(data => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""), {
|
||||
message: "For 'application_windows', 'app' identifier is required.", path: ["app"],
|
||||
}).refine(data => !data.include_window_details || data.item_type === "application_windows", {
|
||||
message: "'include_window_details' only for 'application_windows'.", path: ["include_window_details"],
|
||||
}).refine(data => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined), {
|
||||
message: "'app' and 'include_window_details' not applicable for 'server_status'.", path: ["item_type"]
|
||||
})
|
||||
```
|
||||
* **Node.js Handler Logic:**
|
||||
* If `input.item_type === "server_status"`: Handler directly calls `generateServerStatusString()` and returns it in `ToolResponse.content[{type:"text"}]`. Does NOT call Swift CLI. Does NOT affect `hasSentInitialStatus`.
|
||||
* Else (for "running_applications", "application_windows"): Call Swift `peekaboo list ...` with mapped args (including joining `include_window_details` array to comma-separated string for Swift CLI flag). Parse Swift JSON. Format MCP `ToolResponse`.
|
||||
* **MCP Output Schema (`ToolResponse`):**
|
||||
* `content`: `[{ type: "text", text: "<Summary or Status String>" }]`
|
||||
* If `item_type: "running_applications"`: `application_list`: `Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>`.
|
||||
* If `item_type: "application_windows"`:
|
||||
* `window_list`: `Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x:number,y:number,width:number,height:number}; is_on_screen?: boolean }>`.
|
||||
* `target_application_info`: `{ app_name: string; bundle_id?: string; pid: number }`.
|
||||
* `isError?: boolean`
|
||||
* `_meta?: { backend_error_code?: string }`
|
||||
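The branching described in the handler logic for `peekaboo.list` can be sketched as a small argument mapper. The subcommand and flag names (`list apps`, `list windows`, `--app`, `--include-details`, `--json-output`) come from the Swift CLI section of this document; the function name `buildListArgs` and the convention of returning `null` for `server_status` are assumptions for illustration.

```typescript
type WindowDetail = "off_screen" | "bounds" | "ids";

interface ListInput {
  item_type: "running_applications" | "application_windows" | "server_status";
  app?: string;
  include_window_details?: WindowDetail[];
}

// Returns the Swift CLI arguments, or null when the request is answered in
// Node.js without calling the CLI (server_status).
function buildListArgs(input: ListInput): string[] | null {
  if (input.item_type === "server_status") return null;
  if (input.item_type === "running_applications") {
    return ["list", "apps", "--json-output"];
  }
  // application_windows: the schema refinement already requires `app`,
  // this check only guards against unvalidated callers.
  if (input.app === undefined || input.app.trim() === "") {
    throw new Error("For 'application_windows', 'app' identifier is required.");
  }
  const args = ["list", "windows", "--app", input.app];
  const details = input.include_window_details;
  if (details !== undefined && details.length > 0) {
    // The array is joined into the comma-separated form the CLI flag expects.
    args.push("--include-details", details.join(","));
  }
  args.push("--json-output");
  return args;
}
```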
|
||||
---
|
||||
|
||||
### II. Swift CLI (`peekaboo`)
|
||||
|
||||
#### A. General CLI Design
|
||||
|
||||
1. **Executable Name:** `peekaboo` (Universal macOS binary: arm64 + x86_64).
|
||||
2. **Argument Parser:** Use `swift-argument-parser` package.
|
||||
3. **Top-Level Commands (Subcommands of `peekaboo`):** `image`, `list`. (No `analyze` command).
|
||||
4. **Global Option (for all commands/subcommands):** `--json-output` (Boolean flag).
|
||||
* If present: All `stdout` from Swift CLI MUST be a single, valid JSON object. `stderr` should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed.
|
||||
* If absent: Output human-readable text to `stdout` and `stderr` as appropriate for direct CLI usage.
|
||||
* **Success JSON Structure:**
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"data": { /* Command-specific structured data */ },
|
||||
"messages": ["Optional user-facing status/warning message from Swift CLI operations"],
|
||||
"debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"]
|
||||
}
|
||||
```
|
||||
* **Error JSON Structure:**
|
||||
```json
|
||||
{
|
||||
"success": false,
|
||||
"error": {
|
||||
"message": "Detailed, user-understandable error message.",
|
||||
"code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING
|
||||
"details": "Optional additional technical details or context."
|
||||
},
|
||||
"debug_logs": ["Contextual debug log leading to error"]
|
||||
}
|
||||
```
|
||||
* **Standardized Swift Error Codes (`error.code` values):**
|
||||
* `PERMISSION_DENIED_SCREEN_RECORDING`
|
||||
* `PERMISSION_DENIED_ACCESSIBILITY` (if Accessibility API is attempted for foregrounding)
|
||||
* `APP_NOT_FOUND` (general app lookup failure)
|
||||
* `AMBIGUOUS_APP_IDENTIFIER` (fuzzy match yields multiple candidates)
|
||||
* `WINDOW_NOT_FOUND`
|
||||
* `CAPTURE_FAILED` (general image capture error)
|
||||
* `FILE_IO_ERROR` (e.g., cannot write to specified path)
|
||||
* `INVALID_ARGUMENT` (CLI argument validation failure)
|
||||
* `SIPS_ERROR` (if `sips` is used for PDF fallback and fails)
|
||||
* `INTERNAL_SWIFT_ERROR` (unexpected Swift runtime errors)
|
||||
5. **Permissions Handling:**
|
||||
* The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via `CGWindowListCopyWindowInfo`).
|
||||
* If Accessibility is used for `--capture-focus foreground` window raising, check that permission.
|
||||
* If permissions are missing, output the specific JSON error (e.g., code `PERMISSION_DENIED_SCREEN_RECORDING`) and exit. Do not hang or prompt interactively.
|
||||
6. **Temporary File Management:**
|
||||
* If the CLI needs to save an image temporarily (e.g., if `screencapture` is used as a fallback for PDF, or if no `--path` is given by Node.js), it uses `FileManager.default.temporaryDirectory` with unique filenames (e.g., `peekaboo_<uuid>_<info>.<format>`).
|
||||
* These self-created temporary files **MUST be deleted by the Swift CLI** after it has successfully generated and flushed its JSON output to `stdout`.
|
||||
* Files saved to a user/Node.js-specified `--path` are **NEVER** deleted by the Swift CLI.
|
||||
7. **Internal Logging for `--json-output`:**
|
||||
* When `--json-output` is active, internal verbose/debug messages are collected into the `debug_logs: [String]` array in the final JSON output. They are **NOT** printed to `stderr`.
|
||||
* For standalone CLI use (no `--json-output`), these debug messages can print to `stderr`.
|
||||
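On the Node.js side, the success and error envelopes above lend themselves to a discriminated union. The following is a minimal sketch of parsing the CLI's `stdout` and relaying an error code through `_meta.backend_error_code`. The names `parseSwiftStdout` and `toErrorResponse` are hypothetical, and reusing `INTERNAL_SWIFT_ERROR` for unparseable output is an assumption rather than something this spec prescribes.

```typescript
interface SwiftSuccess {
  success: true;
  data: unknown;
  messages?: string[];
  debug_logs?: string[];
}

interface SwiftError {
  success: false;
  error: { message: string; code: string; details?: string };
  debug_logs?: string[];
}

type SwiftResponse = SwiftSuccess | SwiftError;

interface ToolResponse {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
  _meta?: { backend_error_code?: string };
}

// Parses the single JSON object the CLI prints when --json-output is set.
function parseSwiftStdout(stdout: string): SwiftResponse {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { success?: unknown }).success === "boolean"
    ) {
      return parsed as SwiftResponse;
    }
  } catch {
    // Fall through to the synthetic error below.
  }
  // Assumption: malformed output is reported with the generic internal code.
  return {
    success: false,
    error: { message: "Swift CLI did not emit a valid JSON envelope.", code: "INTERNAL_SWIFT_ERROR" },
  };
}

// Relays a Swift CLI error to the MCP client, preserving the error code.
function toErrorResponse(err: SwiftError): ToolResponse {
  return {
    content: [{ type: "text", text: err.error.message }],
    isError: true,
    _meta: { backend_error_code: err.error.code },
  };
}
```

Because `success` is a literal `true` or `false`, checking it narrows the type, so `data` and `error` are only reachable on the matching branch.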
|
||||
#### B. `peekaboo image` Command
|
||||
|
||||
* **Options (defined using `swift-argument-parser`):**
|
||||
* `--app <String?>`: App identifier.
|
||||
* `--path <String?>`: Base output directory or file prefix/path.
|
||||
* `--mode <ModeEnum?>`: `ModeEnum` is `screen, window, multi`. Default logic: if `--app` then `window`, else `screen`.
|
||||
* `--window-title <String?>`: For `mode window`.
|
||||
* `--window-index <Int?>`: For `mode window`.
|
||||
* `--format <FormatEnum?>`: `FormatEnum` is `png, jpg`. Default `png`.
|
||||
* `--capture-focus <FocusEnum?>`: `FocusEnum` is `background, foreground`. Default `background`.
|
||||
* **Behavior:**
|
||||
* Implements fuzzy app matching. On ambiguity, returns JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"` and lists potential matches in `error.details` or `error.message`.
|
||||
* Always attempts to exclude window shadow/frame (`CGWindowImageOption.boundsIgnoreFraming` or `screencapture -o` if shelled out for PDF). No cursor is captured.
|
||||
* **Background Capture (`--capture-focus background` or default):**
|
||||
* Primary method: Uses `CGWindowListCopyWindowInfo` to identify target window(s)/screen(s).
|
||||
* Captures via `CGDisplayCreateImage` (for screen mode) or `CGWindowListCreateImageFromArray` (for window/multi modes).
|
||||
* Converts `CGImage` to `Data` (PNG or JPG) and saves to file (at user `--path` or its own temp path).
|
||||
* **Foreground Capture (`--capture-focus foreground`):**
|
||||
* Activates app using `NSRunningApplication.activate(options: [.activateIgnoringOtherApps])`.
|
||||
* If a specific window needs raising (e.g., from `--window-index` or specific `--window-title` for an app with many windows), it *may* attempt to use Accessibility API (`AXUIElementPerformAction(kAXRaiseAction)`) if available and permissioned.
|
||||
* If specific window raise fails (or Accessibility not used/permitted), it logs a warning to the `debug_logs` array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app.
|
||||
* Capture mechanism is still preferably native CG APIs.
|
||||
* **Multi-Screen (`--mode screen`):** Enumerates `CGGetActiveDisplayList`, captures each display using `CGDisplayCreateImage`. Filenames (if saving) get display-specific suffixes (e.g., `_display0_main.png`, `_display1.png`).
|
||||
* **Multi-Window (`--mode multi`):** Uses `CGWindowListCopyWindowInfo` for target app's PID, captures each relevant window (on-screen by default) with `CGWindowListCreateImageFromArray`. Filenames get window-specific suffixes.
|
||||
* **PDF Format Handling (as per Q7 decision):** PDF output has been removed, so `--format` accepts only `png` and `jpg`. The earlier design, which used `Process` to call `screencapture -t pdf -R<bounds>` or `-l<id>`, no longer applies.
|
||||
* **JSON Output `data` field structure (on success):**
|
||||
```json
|
||||
{
|
||||
"saved_files": [ // Array is always present, even if empty (e.g. capture failed before saving)
|
||||
{
|
||||
"path": "/absolute/path/to/saved/image.png", // Absolute path
|
||||
"item_label": "Display 1 / Main", // Or window_title for window/multi modes
|
||||
"window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant
|
||||
"window_index": 0, // Optional, if relevant (e.g. for multi-window or indexed capture)
|
||||
"mime_type": "image/png" // Actual MIME type of the saved file
|
||||
}
|
||||
// ... more items if mode is screen or multi ...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### C. `peekaboo list` Command
|
||||
|
||||
* **Subcommands & Options:**
|
||||
* `peekaboo list apps [--json-output]`
|
||||
* `peekaboo list windows --app <app_identifier_string> [--include-details <comma_separated_string_of_options>] [--json-output]`
|
||||
* `--include-details` options: `off_screen`, `bounds`, `ids`.
|
||||
* **Behavior:**
|
||||
* `apps`: Uses `NSWorkspace.shared.runningApplications`. For each app, retrieves `localizedName`, `bundleIdentifier`, `processIdentifier` (pid), `isActive`. To get `window_count`, it performs a `CGWindowListCopyWindowInfo` call filtered by the app's PID and counts on-screen windows.
|
||||
* `windows`:
|
||||
* Resolves `app_identifier` using fuzzy matching. If ambiguous, returns JSON error.
|
||||
* Uses `CGWindowListCopyWindowInfo` filtered by the target app's PID.
|
||||
* If `--include-details` contains `"off_screen"`, uses `CGWindowListOption.optionAll` (and includes `kCGWindowIsOnscreen` boolean in output). Otherwise, uses `CGWindowListOption.optionOnScreenOnly`.
|
||||
* Extracts `kCGWindowName` (title).
|
||||
* If `"ids"` in `--include-details`, extracts `kCGWindowNumber` as `window_id`.
|
||||
* If `"bounds"` in `--include-details`, extracts `kCGWindowBounds` as `bounds: {x, y, width, height}`.
|
||||
* `window_index` is the 0-based index from the filtered array returned by `CGWindowListCopyWindowInfo` (reflecting z-order for on-screen windows).
|
||||
* **JSON Output `data` field structure (on success):**
|
||||
* For `apps`:
|
||||
```json
|
||||
{
|
||||
"applications": [
|
||||
{
|
||||
"app_name": "Safari",
|
||||
"bundle_id": "com.apple.Safari",
|
||||
"pid": 501,
|
||||
"is_active": true,
|
||||
"window_count": 3 // Count of on-screen windows for this app
|
||||
}
|
||||
// ... more applications ...
|
||||
]
|
||||
}
|
||||
```
|
||||
* For `windows`:
|
||||
```json
|
||||
{
|
||||
"target_application_info": {
|
||||
"app_name": "Safari",
|
||||
"pid": 501,
|
||||
"bundle_id": "com.apple.Safari"
|
||||
},
|
||||
"windows": [
|
||||
{
|
||||
"window_title": "Apple",
|
||||
"window_id": 67, // if "ids" requested
|
||||
"window_index": 0,
|
||||
"is_on_screen": true, // Potentially useful, especially if "off_screen" included
|
||||
"bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested
|
||||
}
|
||||
// ... more windows ...
|
||||
]
|
||||
}
|
||||
```
|
||||
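Note that the Swift CLI emits `applications` and `windows`, while the MCP output schema for `peekaboo.list` names the corresponding fields `application_list` and `window_list`. A sketch of the renaming for the `apps` case follows. The function name and the wording of the summary text are invented for this example.

```typescript
// One entry of the Swift CLI `data.applications` array.
interface SwiftApp {
  app_name: string;
  bundle_id: string;
  pid: number;
  is_active: boolean;
  window_count: number;
}

interface SwiftAppsData {
  applications: SwiftApp[];
}

interface AppListResponse {
  content: Array<{ type: "text"; text: string }>;
  application_list: SwiftApp[];
}

// Hypothetical mapper: renames `applications` to `application_list` and adds
// a short human-readable summary (summary format is an assumption).
function toApplicationListResponse(data: SwiftAppsData): AppListResponse {
  const lines = data.applications.map(
    (app) =>
      `${app.app_name} (${app.bundle_id}), pid ${app.pid}, windows: ${app.window_count}` +
      (app.is_active ? " [active]" : ""),
  );
  const header = `Found ${data.applications.length} running application(s).`;
  return {
    content: [{ type: "text", text: [header, ...lines].join("\n") }],
    application_list: data.applications,
  };
}
```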
|
||||
---
|
||||
|
||||
### III. Build, Packaging & Distribution
|
||||
|
||||
1. **Swift CLI (`peekaboo`):**
|
||||
* `Package.swift` defines an executable product named `peekaboo`.
|
||||
* Build process (e.g., part of NPM `prepublishOnly` or a separate build script): `swift build -c release --arch arm64 --arch x86_64`.
|
||||
* The resulting universal binary (e.g., from `.build/apple/Products/Release/peekaboo`) is copied to the root of the `peekaboo-mcp` NPM package directory before publishing.
|
||||
2. **Node.js MCP Server:**
|
||||
* TypeScript is compiled to JavaScript (e.g., into `dist/`) using `tsc`.
|
||||
* The NPM package includes `dist/` and the `peekaboo` Swift binary (at package root).
|
||||
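Since the binary ships at the package root and `PEEKABOO_CLI_PATH` (described under Environment Variables) can override it, locating the CLI at runtime could be sketched as below. The function name `resolveCliPath` and the explicit `env` and `packageRoot` parameters are assumptions made to keep the example self-contained.

```typescript
import * as path from "node:path";

// Hypothetical resolver: prefers the PEEKABOO_CLI_PATH override, otherwise
// falls back to the `peekaboo` binary bundled at the package root.
function resolveCliPath(
  env: Record<string, string | undefined>,
  packageRoot: string,
): string {
  const override = env["PEEKABOO_CLI_PATH"];
  if (override !== undefined && override.trim() !== "") {
    return override;
  }
  return path.join(packageRoot, "peekaboo");
}
```

In the server this would presumably be called once at startup with `process.env` and the directory that contains `dist/`.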
|
||||
---
|
||||
|
||||
### IV. Documentation (`README.md` for `peekaboo-mcp` NPM Package)
|
||||
|
||||
1. **Project Overview:** Briefly state vision and components.
|
||||
2. **Prerequisites:**
|
||||
* macOS version (e.g., 12.0+ or as required by Swift/APIs).
|
||||
* Xcode Command Line Tools (recommended for a stable development environment on macOS, even if the final Swift binary does not strictly need them at runtime).
|
||||
* Ollama (if using local Ollama for analysis) + instructions to pull models.
|
||||
3. **Installation:**
|
||||
* Primary: `npm install -g peekaboo-mcp`.
|
||||
* Alternative: `npx peekaboo-mcp`.
|
||||
4. **MCP Client Configuration:**
|
||||
* Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use `peekaboo-mcp`.
|
||||
* Example for VS Code/Cursor using `npx` for robustness:
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"PeekabooMCP": {
|
||||
"command": "npx",
|
||||
"args": ["peekaboo-mcp"],
|
||||
"env": {
|
||||
"AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o",
|
||||
"OPENAI_API_KEY": "sk-yourkeyhere"
|
||||
/* other ENV VARS */
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
5. **Required macOS Permissions:**
|
||||
* **Screen Recording:** Essential for ALL `peekaboo.image` functionalities and for `peekaboo.list` if it needs to read window titles (which it does via `CGWindowListCopyWindowInfo`). Provide clear, step-by-step instructions for System Settings. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"` command.
|
||||
* **Accessibility:** Required *only* if `peekaboo.image` with `capture_focus: "foreground"` needs to perform specific window raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"` command.
|
||||
6. **Environment Variables (for Node.js `peekaboo-mcp` server):**
|
||||
* `AI_PROVIDERS`: Crucial for `peekaboo.analyze`. Explain format (`provider/model,provider/model`), effect, and that `peekaboo.analyze` reports "not configured" if unset. List recognized `provider` names ("ollama", "openai").
|
||||
* `OPENAI_API_KEY` (and similar for other cloud providers): How they are used.
|
||||
* `OLLAMA_BASE_URL`: Default and purpose.
|
||||
* `LOG_LEVEL`: For `pino` logger. Values and default.
|
||||
* `PEEKABOO_MCP_CONSOLE_LOGGING`: For development.
|
||||
* `PEEKABOO_CLI_PATH`: For overriding bundled Swift CLI.
|
||||
7. **MCP Tool Overview:**
|
||||
* Brief descriptions of `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` and their primary purpose.
|
||||
8. **Link to Detailed Tool Specification:** A separate `TOOL_API_REFERENCE.md` (generated from or summarizing the Zod schemas and output structures in this document) for users/AI developers needing full schema details.
|
||||
9. **Troubleshooting / Support:** Link to GitHub issues.
|
||||