mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-04-16 13:05:49 +00:00
423 lines
32 KiB
Markdown
423 lines
32 KiB
Markdown
## Peekaboo: Full & Final Detailed Specification v1.1.1
|
|
https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW-
|
|
|
|
**Project Vision:** Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named `peekaboo`, which is called by the Node.js server. All image captures automatically exclude window shadows/frames.
|
|
|
|
**Core Components:**
|
|
|
|
1. **Node.js/TypeScript MCP Server (`peekaboo-mcp`):**
|
|
* **NPM Package Name:** `peekaboo-mcp`.
|
|
* **GitHub Project Name:** `peekaboo`.
|
|
* Implements MCP server logic using the latest stable `@modelcontextprotocol/sdk`.
|
|
* Exposes three primary MCP tools: `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list`.
|
|
* Translates MCP tool calls into commands for the Swift `peekaboo` CLI.
|
|
* Parses structured JSON output from the Swift `peekaboo` CLI.
|
|
* Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client.
|
|
* Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer.
|
|
* Implements robust logging to a file using `pino`, ensuring no logs interfere with MCP stdio communication.
|
|
2. **Swift CLI (`peekaboo`):**
|
|
* A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64).
|
|
* Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching.
|
|
* **Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).**
|
|
* Outputs all results and errors in a structured JSON format via a global `--json-output` flag. This JSON includes a `debug_logs` array for internal Swift CLI logs, which the Node.js server can relay to its own logger.
|
|
* The `peekaboo` binary is bundled at the root of the `peekaboo-mcp` NPM package.
|
|
|
|
---
|
|
|
|
### I. Node.js/TypeScript MCP Server (`peekaboo-mcp`)
|
|
|
|
#### A. Project Setup & Distribution
|
|
|
|
1. **Language/Runtime:** Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+).
|
|
2. **Package Manager:** NPM.
|
|
3. **`package.json`:**
|
|
* `name`: `"peekaboo-mcp"`
|
|
* `version`: Semantic versioning (e.g., `1.1.1`).
|
|
* `type`: `"module"` (for ES Modules).
|
|
* `main`: `"dist/index.js"` (compiled server entry point).
|
|
* `bin`: `{ "peekaboo-mcp": "dist/index.js" }`.
|
|
* `files`: `["dist/", "peekaboo"]` (includes compiled JS and the Swift `peekaboo` binary at package root).
|
|
* `scripts`:
|
|
* `build`: Command to compile TypeScript (e.g., `tsc`).
|
|
* `start`: `node dist/index.js`.
|
|
* `prepublishOnly`: `npm run build`.
|
|
* `dependencies`: `@modelcontextprotocol/sdk` (latest stable), `zod` (for input validation), `pino` (for logging), relevant cloud AI SDKs (e.g., `openai`, `@anthropic-ai/sdk`).
|
|
* `devDependencies`: `typescript`, `@types/node`, `pino-pretty` (for optional development console logging).
|
|
4. **Distribution:** Published to NPM. Installable via `npm i -g peekaboo-mcp` or usable with `npx peekaboo-mcp`.
|
|
5. **Swift CLI Location Strategy:**
|
|
* The Node.js server will first check the environment variable `PEEKABOO_CLI_PATH`. If set and points to a valid executable, that path will be used.
|
|
* If `PEEKABOO_CLI_PATH` is not set or invalid, the server will fall back to a bundled path, resolved relative to its own script location (e.g., `path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo')`, assuming the compiled server script is in `dist/` and `peekaboo` binary is at the package root).
|
|
|
|
#### B. Server Initialization & Configuration (`src/index.ts`)
|
|
|
|
1. **Imports:** `McpServer`, `StdioServerTransport` from `@modelcontextprotocol/sdk`; `pino` from `pino`; `os`, `path` from Node.js built-ins.
|
|
2. **Server Info:** `name: "PeekabooMCP"`, `version: <package_version from package.json>`.
|
|
3. **Server Capabilities:** Advertise `tools` capability.
|
|
4. **Logging (Pino):**
|
|
* Instantiate `pino` logger.
|
|
* **Default Transport:** File transport to `path.join(os.tmpdir(), 'peekaboo-mcp.log')`. Use `mkdir: true` option for destination.
|
|
* **Log Level:** Controlled by ENV VAR `LOG_LEVEL` (standard Pino levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`). Default: `"info"`.
|
|
* **Conditional Console Logging (Development Only):** If ENV VAR `PEEKABOO_MCP_CONSOLE_LOGGING="true"`, add a second Pino transport targeting `process.stderr.fd` (potentially using `pino-pretty` for human-readable output).
|
|
* **Strict Rule:** All server operational logging must use the configured Pino instance. No direct `console.log/warn/error` that might output to `stdout`.
|
|
5. **Environment Variables (Read by Server):**
|
|
* `AI_PROVIDERS`: Comma-separated list of `provider_name/default_model_for_provider` pairs (e.g., `"openai/gpt-4o,ollama/qwen2.5vl:7b"`). If unset/empty, `peekaboo.analyze` tool reports AI not configured.
|
|
* `OPENAI_API_KEY`: API key for OpenAI.
|
|
* `ANTHROPIC_API_KEY`: (Example for future) API key for Anthropic.
|
|
* (Other cloud provider API keys as standard ENV VAR names).
|
|
* `OLLAMA_BASE_URL`: Base URL for local Ollama instance. Default: `"http://localhost:11434"`.
|
|
* `LOG_LEVEL`: For Pino logger. Default: `"info"`.
|
|
* `PEEKABOO_MCP_CONSOLE_LOGGING`: Boolean (`"true"`/`"false"`) for dev console logs. Default: `"false"`.
|
|
* `PEEKABOO_CLI_PATH`: Optional override for Swift `peekaboo` CLI path.
|
|
6. **Initial Status Reporting Logic:**
|
|
* A server-instance-level boolean flag: `let hasSentInitialStatus = false;`.
|
|
* A function `generateServerStatusString()`: Creates a formatted string: `"\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: <server_version>\nConfigured AI Providers (from AI_PROVIDERS ENV): <parsed list or 'None Configured. Set AI_PROVIDERS ENV.'>\n---"`.
|
|
* Response Augmentation: In the function that sends a `ToolResponse` back to the MCP client, if the response is for a successful tool call (not `initialize`/`initialized` or `peekaboo.list` with `item_type: "server_status"`) AND `hasSentInitialStatus` is `false`:
|
|
* Append `generateServerStatusString()` to the first `TextContentItem` in `ToolResponse.content`. If no text item exists, prepend a new one.
|
|
* Set `hasSentInitialStatus = true`.
|
|
7. **Tool Registration:** Register `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` with their Zod input schemas and handler functions.
|
|
8. **Transport:** `await server.connect(new StdioServerTransport());`.
|
|
9. **Shutdown:** Implement graceful shutdown on `SIGINT`, `SIGTERM` (e.g., `await server.close(); logger.flush(); process.exit(0);`).
|
|
|
|
#### C. MCP Tool Specifications & Node.js Handler Logic
|
|
|
|
**General Node.js Handler Pattern (for tools calling Swift `peekaboo` CLI):**
|
|
|
|
1. Validate MCP `input` against the tool's Zod schema. If invalid, log error with Pino and return MCP error `ToolResponse`.
|
|
2. Construct command-line arguments for Swift `peekaboo` CLI based on MCP `input`. **Always include `--json-output`**.
|
|
3. Log the constructed Swift command with Pino at `debug` level.
|
|
4. Execute Swift `peekaboo` CLI using `child_process.spawn`, capturing `stdout`, `stderr`, and `exitCode`.
|
|
5. If any data is received on Swift CLI's `stderr`, log it immediately with Pino at `warn` level, prefixed (e.g., `[SwiftCLI-stderr]`).
|
|
6. On Swift CLI process close:
|
|
* If `exitCode !== 0` or `stdout` is empty/not parseable as JSON:
|
|
* Log failure details with Pino (`error` level).
|
|
* Construct MCP error `ToolResponse` (e.g., `errorCode: "SWIFT_CLI_EXECUTION_ERROR"` or `SWIFT_CLI_INVALID_OUTPUT` in `_meta`). Message should include relevant parts of raw `stdout`/`stderr` if available.
|
|
* If `exitCode === 0`:
|
|
* Attempt to parse `stdout` as JSON. If parsing fails, treat as error (above).
|
|
* Let `swiftResponse = JSON.parse(stdout)`.
|
|
* If `swiftResponse.debug_logs` (array of strings) exists, log each entry via Pino at `debug` level, clearly marked as from backend (e.g., `logger.debug({ backend: "swift", swift_log: entry })`).
|
|
* If `swiftResponse.success === false`:
|
|
* Extract `swiftResponse.error.message`, `swiftResponse.error.code`, `swiftResponse.error.details`.
|
|
* Construct and return MCP error `ToolResponse`, relaying these details (e.g., `message` in `content`, `code` in `_meta.backend_error_code`).
|
|
* If `swiftResponse.success === true`:
|
|
* Process `swiftResponse.data` to construct the success MCP `ToolResponse`.
|
|
* Relay `swiftResponse.messages` as `TextContentItem`s in the MCP response if appropriate.
|
|
* For `peekaboo.image` with `input.return_data: true`:
|
|
* Iterate `swiftResponse.data.saved_files.[*].path`.
|
|
* For each path, read image file into a `Buffer`.
|
|
* Base64 encode the `Buffer`.
|
|
* Construct `ImageContentItem` for MCP `ToolResponse.content`, including `data` (Base64 string) and `mimeType` (from `swiftResponse.data.saved_files.[*].mime_type`).
|
|
* Augment successful `ToolResponse` with initial server status string if applicable (see B.6).
|
|
* Send MCP `ToolResponse`.
|
|
|
|
**Tool 1: `peekaboo.image`**
|
|
|
|
* **MCP Description:** "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
|
|
* **MCP Input Schema (`ImageInputSchema`):**
|
|
```typescript
|
|
z.object({
|
|
app: z.string().optional().describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
|
|
path: z.string().optional().describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified."),
|
|
mode: z.enum(["screen", "window", "multi"]).optional().describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
|
|
window_specifier: z.union([
|
|
z.object({ title: z.string().describe("Capture window by title.") }),
|
|
z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }),
|
|
]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."),
|
|
format: z.enum(["png", "jpg"]).optional().default("png").describe("Output image format. Defaults to 'png'."),
|
|
return_data: z.boolean().optional().default(false).describe("Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."),
|
|
capture_focus: z.enum(["background", "foreground"])
|
|
.optional().default("background").describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.")
|
|
})
|
|
```
|
|
* **Node.js Handler - Default `mode` Logic:** If `input.app` provided & `input.mode` undefined, `mode="window"`. If no `input.app` & `input.mode` undefined, `mode="screen"`.
|
|
* **MCP Output Schema (`ToolResponse`):**
|
|
* `content`: `Array<ImageContentItem | TextContentItem>`
|
|
* If `input.return_data: true`: Contains `ImageContentItem`(s): `{ type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
|
|
* May contain `TextContentItem`(s) (summary, file paths from `saved_files`, Swift CLI `messages`).
|
|
* `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }>` (Directly from Swift CLI JSON `data.saved_files` if images were saved).
|
|
* `isError?: boolean`
|
|
* `_meta?: { backend_error_code?: string }` (For relaying Swift CLI error codes).
|
|
|
|
**Tool 2: `peekaboo.analyze`**
|
|
|
|
* **MCP Description:** "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides."
|
|
* **MCP Input Schema (`AnalyzeInputSchema`):**
|
|
```typescript
|
|
z.object({
|
|
image_path: z.string().describe("Required. Absolute path to image file (.png, .jpg, .webp) to be analyzed."),
|
|
question: z.string().describe("Required. Question for the AI about the image."),
|
|
provider_config: z.object({
|
|
type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto")
|
|
.describe("AI provider. 'auto' uses server's AI_PROVIDERS ENV preference. Specific provider must be enabled in server's AI_PROVIDERS."),
|
|
model: z.string().optional().describe("Optional. Model name. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.")
|
|
}).optional().describe("Optional. Explicit provider/model. Validated against server's AI_PROVIDERS.")
|
|
})
|
|
```
|
|
* **Node.js Handler Logic:**
|
|
1. Validate input. Server pre-checks `image_path` extension (`.png`, `.jpg`, `.jpeg`, `.webp`); return MCP error if not recognized.
|
|
2. Read `process.env.AI_PROVIDERS`. If unset/empty, return MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (`error` level).
|
|
3. Parse `AI_PROVIDERS` into `configuredItems = [{provider: string, model: string}]`.
|
|
4. **Determine Provider & Model:**
|
|
* `requestedProviderType = input.provider_config?.type || "auto"`.
|
|
* `requestedModelName = input.provider_config?.model`.
|
|
* `chosenProvider: string | null = null`, `chosenModel: string | null = null`.
|
|
* If `requestedProviderType !== "auto"`:
|
|
* Find entry in `configuredItems` where `provider === requestedProviderType`.
|
|
* If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration."
|
|
* `chosenProvider = requestedProviderType`.
|
|
* `chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider`.
|
|
* Else (`requestedProviderType === "auto"`):
|
|
* Iterate `configuredItems` in order. For each `{provider, modelFromEnv}`:
|
|
* Check availability (Ollama up? Cloud API key for `provider` set in `process.env`?).
|
|
* If available: `chosenProvider = provider`, `chosenModel = requestedModelName || modelFromEnv`. Break.
|
|
* If no provider found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational."
|
|
5. **Execute Analysis (Node.js handles all AI calls):**
|
|
* Read `input.image_path` into a `Buffer`. Base64 encode.
|
|
* If `chosenProvider` is "ollama": HTTP POST to Ollama (using `process.env.OLLAMA_BASE_URL`) with Base64 image, `input.question`, `chosenModel`. Handle Ollama API errors.
|
|
* If `chosenProvider` is "openai": Use OpenAI SDK/HTTP with Base64 image, `input.question`, `chosenModel`, and API key from `process.env.OPENAI_API_KEY`. Handle OpenAI API errors.
|
|
* (Similar for other cloud providers).
|
|
6. Construct MCP `ToolResponse`.
|
|
* **MCP Output Schema (`ToolResponse`):**
|
|
* `content`: `[{ type: "text", text: "<AI's analysis/answer>" }]`
|
|
* `analysis_text`: `string` (Core AI answer).
|
|
* `model_used`: `string` (e.g., "ollama/llava:7b", "openai/gpt-4o") - The actual provider/model pair used.
|
|
* `isError?: boolean`
|
|
* `_meta?: { backend_error_code?: string }` (For AI provider API errors).
|
|
|
|
**Tool 3: `peekaboo.list`**
|
|
|
|
* **MCP Description:** "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching."
|
|
* **MCP Input Schema (`ListInputSchema`):**
|
|
```typescript
|
|
z.object({
|
|
item_type: z.enum(["running_applications", "application_windows", "server_status"])
|
|
.default("running_applications").describe("What to list. 'server_status' returns Peekaboo server info."),
|
|
app: z.string().optional().describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."),
|
|
include_window_details: z.array(
|
|
z.enum(["off_screen", "bounds", "ids"])
|
|
).optional().describe("Optional, for 'application_windows'. Additional window details. Example: ['bounds', 'ids']")
|
|
}).refine(data => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""), {
|
|
message: "For 'application_windows', 'app' identifier is required.", path: ["app"],
|
|
}).refine(data => !data.include_window_details || data.item_type === "application_windows", {
|
|
message: "'include_window_details' only for 'application_windows'.", path: ["include_window_details"],
|
|
}).refine(data => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined), {
|
|
message: "'app' and 'include_window_details' not applicable for 'server_status'.", path: ["item_type"]
|
|
})
|
|
```
|
|
* **Node.js Handler Logic:**
|
|
* If `input.item_type === "server_status"`: Handler directly calls `generateServerStatusString()` and returns it in `ToolResponse.content[{type:"text"}]`. Does NOT call Swift CLI. Does NOT affect `hasSentInitialStatus`.
|
|
* Else (for "running_applications", "application_windows"): Call Swift `peekaboo list ...` with mapped args (including joining `include_window_details` array to comma-separated string for Swift CLI flag). Parse Swift JSON. Format MCP `ToolResponse`.
|
|
* **MCP Output Schema (`ToolResponse`):**
|
|
* `content`: `[{ type: "text", text: "<Summary or Status String>" }]`
|
|
* If `item_type: "running_applications"`: `application_list`: `Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>`.
|
|
* If `item_type: "application_windows"`:
|
|
* `window_list`: `Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x:number,y:number,w:number,h:number}; is_on_screen?: boolean }>`.
|
|
* `target_application_info`: `{ app_name: string; bundle_id?: string; pid: number }`.
|
|
* `isError?: boolean`
|
|
* `_meta?: { backend_error_code?: string }`
|
|
|
|
---
|
|
|
|
### II. Swift CLI (`peekaboo`)
|
|
|
|
#### A. General CLI Design
|
|
|
|
1. **Executable Name:** `peekaboo` (Universal macOS binary: arm64 + x86_64).
|
|
2. **Argument Parser:** Use `swift-argument-parser` package.
|
|
3. **Top-Level Commands (Subcommands of `peekaboo`):** `image`, `list`. (No `analyze` command).
|
|
4. **Global Option (for all commands/subcommands):** `--json-output` (Boolean flag).
|
|
* If present: All `stdout` from Swift CLI MUST be a single, valid JSON object. `stderr` should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed.
|
|
* If absent: Output human-readable text to `stdout` and `stderr` as appropriate for direct CLI usage.
|
|
* **Success JSON Structure:**
|
|
```json
|
|
{
|
|
"success": true,
|
|
"data": { /* Command-specific structured data */ },
|
|
"messages": ["Optional user-facing status/warning message from Swift CLI operations"],
|
|
"debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"]
|
|
}
|
|
```
|
|
* **Error JSON Structure:**
|
|
```json
|
|
{
|
|
"success": false,
|
|
"error": {
|
|
"message": "Detailed, user-understandable error message.",
|
|
"code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING
|
|
"details": "Optional additional technical details or context."
|
|
},
|
|
"debug_logs": ["Contextual debug log leading to error"]
|
|
}
|
|
```
|
|
* **Standardized Swift Error Codes (`error.code` values):**
|
|
* `PERMISSION_DENIED_SCREEN_RECORDING`
|
|
* `PERMISSION_DENIED_ACCESSIBILITY` (if Accessibility API is attempted for foregrounding)
|
|
* `APP_NOT_FOUND` (general app lookup failure)
|
|
* `AMBIGUOUS_APP_IDENTIFIER` (fuzzy match yields multiple candidates)
|
|
* `WINDOW_NOT_FOUND`
|
|
* `CAPTURE_FAILED` (general image capture error)
|
|
* `FILE_IO_ERROR` (e.g., cannot write to specified path)
|
|
* `INVALID_ARGUMENT` (CLI argument validation failure)
|
|
* `SIPS_ERROR` (if `sips` is used for PDF fallback and fails)
|
|
* `INTERNAL_SWIFT_ERROR` (unexpected Swift runtime errors)
|
|
5. **Permissions Handling:**
|
|
* The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via `CGWindowListCopyWindowInfo`).
|
|
* If Accessibility is used for `--capture-focus foreground` window raising, check that permission.
|
|
* If permissions are missing, output the specific JSON error (e.g., code `PERMISSION_DENIED_SCREEN_RECORDING`) and exit. Do not hang or prompt interactively.
|
|
6. **Temporary File Management:**
|
|
* If the CLI needs to save an image temporarily (e.g., if `screencapture` is used as a fallback for PDF, or if no `--path` is given by Node.js), it uses `FileManager.default.temporaryDirectory` with unique filenames (e.g., `peekaboo_<uuid>_<info>.<format>`).
|
|
* These self-created temporary files **MUST be deleted by the Swift CLI** after it has successfully generated and flushed its JSON output to `stdout`.
|
|
* Files saved to a user/Node.js-specified `--path` are **NEVER** deleted by the Swift CLI.
|
|
7. **Internal Logging for `--json-output`:**
|
|
* When `--json-output` is active, internal verbose/debug messages are collected into the `debug_logs: [String]` array in the final JSON output. They are **NOT** printed to `stderr`.
|
|
* For standalone CLI use (no `--json-output`), these debug messages can print to `stderr`.
|
|
|
|
#### B. `peekaboo image` Command
|
|
|
|
* **Options (defined using `swift-argument-parser`):**
|
|
* `--app <String?>`: App identifier.
|
|
* `--path <String?>`: Base output directory or file prefix/path.
|
|
* `--mode <ModeEnum?>`: `ModeEnum` is `screen, window, multi`. Default logic: if `--app` then `window`, else `screen`.
|
|
* `--window-title <String?>`: For `mode window`.
|
|
* `--window-index <Int?>`: For `mode window`.
|
|
* `--format <FormatEnum?>`: `FormatEnum` is `png, jpg`. Default `png`.
|
|
* `--capture-focus <FocusEnum?>`: `FocusEnum` is `background, foreground`. Default `background`.
|
|
* **Behavior:**
|
|
* Implements fuzzy app matching. On ambiguity, returns JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"` and lists potential matches in `error.details` or `error.message`.
|
|
* Always attempts to exclude window shadow/frame (`CGWindowImageOption.boundsIgnoreFraming` or `screencapture -o` if shelled out for PDF). No cursor is captured.
|
|
* **Background Capture (`--capture-focus background` or default):**
|
|
* Primary method: Uses `CGWindowListCopyWindowInfo` to identify target window(s)/screen(s).
|
|
* Captures via `CGDisplayCreateImage` (for screen mode) or `CGWindowListCreateImageFromArray` (for window/multi modes).
|
|
* Converts `CGImage` to `Data` (PNG or JPG) and saves to file (at user `--path` or its own temp path).
|
|
* **Foreground Capture (`--capture-focus foreground`):**
|
|
* Activates app using `NSRunningApplication.activate(options: [.activateIgnoringOtherApps])`.
|
|
* If a specific window needs raising (e.g., from `--window-index` or specific `--window-title` for an app with many windows), it *may* attempt to use Accessibility API (`AXUIElementPerformAction(kAXRaiseAction)`) if available and permissioned.
|
|
* If specific window raise fails (or Accessibility not used/permitted), it logs a warning to the `debug_logs` array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app.
|
|
* Capture mechanism is still preferably native CG APIs.
|
|
* **Multi-Screen (`--mode screen`):** Enumerates `CGGetActiveDisplayList`, captures each display using `CGDisplayCreateImage`. Filenames (if saving) get display-specific suffixes (e.g., `_display0_main.png`, `_display1.png`).
|
|
* **Multi-Window (`--mode multi`):** Uses `CGWindowListCopyWindowInfo` for target app's PID, captures each relevant window (on-screen by default) with `CGWindowListCreateImageFromArray`. Filenames get window-specific suffixes.
|
|
* **PDF Format Handling (as per Q7 decision):** If `--format pdf` were still supported (it's removed), it would use `Process` to call `screencapture -t pdf -R<bounds>` or `-l<id>`. Since PDF is removed, this is not applicable.
|
|
* **JSON Output `data` field structure (on success):**
|
|
```json
|
|
{
|
|
"saved_files": [ // Array is always present, even if empty (e.g. capture failed before saving)
|
|
{
|
|
"path": "/absolute/path/to/saved/image.png", // Absolute path
|
|
"item_label": "Display 1 / Main", // Or window_title for window/multi modes
|
|
"window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant
|
|
"window_index": 0, // Optional, if relevant (e.g. for multi-window or indexed capture)
|
|
"mime_type": "image/png" // Actual MIME type of the saved file
|
|
}
|
|
// ... more items if mode is screen or multi ...
|
|
]
|
|
}
|
|
```
|
|
|
|
#### C. `peekaboo list` Command
|
|
|
|
* **Subcommands & Options:**
|
|
* `peekaboo list apps [--json-output]`
|
|
* `peekaboo list windows --app <app_identifier_string> [--include-details <comma_separated_string_of_options>] [--json-output]`
|
|
* `--include-details` options: `off_screen`, `bounds`, `ids`.
|
|
* **Behavior:**
|
|
* `apps`: Uses `NSWorkspace.shared.runningApplications`. For each app, retrieves `localizedName`, `bundleIdentifier`, `processIdentifier` (pid), `isActive`. To get `window_count`, it performs a `CGWindowListCopyWindowInfo` call filtered by the app's PID and counts on-screen windows.
|
|
* `windows`:
|
|
* Resolves `app_identifier` using fuzzy matching. If ambiguous, returns JSON error.
|
|
* Uses `CGWindowListCopyWindowInfo` filtered by the target app's PID.
|
|
* If `--include-details` contains `"off_screen"`, uses `CGWindowListOption.optionAllScreenWindows` (and includes `kCGWindowIsOnscreen` boolean in output). Otherwise, uses `CGWindowListOption.optionOnScreenOnly`.
|
|
* Extracts `kCGWindowName` (title).
|
|
* If `"ids"` in `--include-details`, extracts `kCGWindowNumber` as `window_id`.
|
|
* If `"bounds"` in `--include-details`, extracts `kCGWindowBounds` as `bounds: {x, y, width, height}`.
|
|
* `window_index` is the 0-based index from the filtered array returned by `CGWindowListCopyWindowInfo` (reflecting z-order for on-screen windows).
|
|
* **JSON Output `data` field structure (on success):**
|
|
* For `apps`:
|
|
```json
|
|
{
|
|
"applications": [
|
|
{
|
|
"app_name": "Safari",
|
|
"bundle_id": "com.apple.Safari",
|
|
"pid": 501,
|
|
"is_active": true,
|
|
"window_count": 3 // Count of on-screen windows for this app
|
|
}
|
|
// ... more applications ...
|
|
]
|
|
}
|
|
```
|
|
* For `windows`:
|
|
```json
|
|
{
|
|
"target_application_info": {
|
|
"app_name": "Safari",
|
|
"pid": 501,
|
|
"bundle_id": "com.apple.Safari"
|
|
},
|
|
"windows": [
|
|
{
|
|
"window_title": "Apple",
|
|
"window_id": 67, // if "ids" requested
|
|
"window_index": 0,
|
|
"is_on_screen": true, // Potentially useful, especially if "off_screen" included
|
|
"bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested
|
|
}
|
|
// ... more windows ...
|
|
]
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### III. Build, Packaging & Distribution
|
|
|
|
1. **Swift CLI (`peekaboo`):**
|
|
* `Package.swift` defines an executable product named `peekaboo`.
|
|
* Build process (e.g., part of NPM `prepublishOnly` or a separate build script): `swift build -c release --arch arm64 --arch x86_64`.
|
|
* The resulting universal binary (e.g., from `.build/apple/Products/Release/peekaboo`) is copied to the root of the `peekaboo-mcp` NPM package directory before publishing.
|
|
2. **Node.js MCP Server:**
|
|
* TypeScript is compiled to JavaScript (e.g., into `dist/`) using `tsc`.
|
|
* The NPM package includes `dist/` and the `peekaboo` Swift binary (at package root).
|
|
|
|
---
|
|
|
|
### IV. Documentation (`README.md` for `peekaboo-mcp` NPM Package)
|
|
|
|
1. **Project Overview:** Briefly state vision and components.
|
|
2. **Prerequisites:**
|
|
* macOS version (e.g., 12.0+ or as required by Swift/APIs).
|
|
* Xcode Command Line Tools (recommended for a stable development environment on macOS, even if not strictly used by the final Swift binary for all operations).
|
|
* Ollama (if using local Ollama for analysis) + instructions to pull models.
|
|
3. **Installation:**
|
|
* Primary: `npm install -g peekaboo-mcp`.
|
|
* Alternative: `npx peekaboo-mcp`.
|
|
4. **MCP Client Configuration:**
|
|
* Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use `peekaboo-mcp`.
|
|
* Example for VS Code/Cursor using `npx` for robustness:
|
|
```json
|
|
{
|
|
"mcpServers": {
|
|
"PeekabooMCP": {
|
|
"command": "npx",
|
|
"args": ["peekaboo-mcp"],
|
|
"env": {
|
|
"AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o",
|
|
"OPENAI_API_KEY": "sk-yourkeyhere"
|
|
/* other ENV VARS */
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
5. **Required macOS Permissions:**
|
|
* **Screen Recording:** Essential for ALL `peekaboo.image` functionalities and for `peekaboo.list` if it needs to read window titles (which it does via `CGWindowListCopyWindowInfo`). Provide clear, step-by-step instructions for System Settings. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"` command.
|
|
* **Accessibility:** Required *only* if `peekaboo.image` with `capture_focus: "foreground"` needs to perform specific window raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance. Include `open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"` command.
|
|
6. **Environment Variables (for Node.js `peekaboo-mcp` server):**
|
|
* `AI_PROVIDERS`: Crucial for `peekaboo.analyze`. Explain format (`provider/model,provider/model`), effect, and that `peekaboo.analyze` reports "not configured" if unset. List recognized `provider` names ("ollama", "openai").
|
|
* `OPENAI_API_KEY` (and similar for other cloud providers): How they are used.
|
|
* `OLLAMA_BASE_URL`: Default and purpose.
|
|
* `LOG_LEVEL`: For `pino` logger. Values and default.
|
|
* `PEEKABOO_MCP_CONSOLE_LOGGING`: For development.
|
|
* `PEEKABOO_CLI_PATH`: For overriding bundled Swift CLI.
|
|
7. **MCP Tool Overview:**
|
|
* Brief descriptions of `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list` and their primary purpose.
|
|
8. **Link to Detailed Tool Specification:** A separate `TOOL_API_REFERENCE.md` (generated from or summarizing the Zod schemas and output structures in this document) for users/AI developers needing full schema details.
|
|
9. **Troubleshooting / Support:** Link to GitHub issues.
|