Peter Steinberger f746dc45c2 Add docs
2025-05-23 05:39:36 +02:00


Peekaboo: Full & Final Detailed Specification v1.1.1

https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW-

Project Vision: Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named peekaboo, which is called by the Node.js server. All image captures automatically exclude window shadows/frames.

Core Components:

  1. Node.js/TypeScript MCP Server (peekaboo-mcp):
    • NPM Package Name: peekaboo-mcp.
    • GitHub Project Name: peekaboo.
    • Implements MCP server logic using the latest stable @modelcontextprotocol/sdk.
    • Exposes three primary MCP tools: peekaboo.image, peekaboo.analyze, peekaboo.list.
    • Translates MCP tool calls into commands for the Swift peekaboo CLI.
    • Parses structured JSON output from the Swift peekaboo CLI.
    • Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client.
    • Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer.
    • Implements robust logging to a file using pino, ensuring no logs interfere with MCP stdio communication.
  2. Swift CLI (peekaboo):
    • A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64).
    • Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching.
    • Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).
    • Outputs all results and errors in a structured JSON format via a global --json-output flag. This JSON includes a debug_logs array for internal Swift CLI logs, which the Node.js server can relay to its own logger.
    • The peekaboo binary is bundled at the root of the peekaboo-mcp NPM package.

I. Node.js/TypeScript MCP Server (peekaboo-mcp)

A. Project Setup & Distribution

  1. Language/Runtime: Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+).
  2. Package Manager: NPM.
  3. package.json:
    • name: "peekaboo-mcp"
    • version: Semantic versioning (e.g., 1.1.1).
    • type: "module" (for ES Modules).
    • main: "dist/index.js" (compiled server entry point).
    • bin: { "peekaboo-mcp": "dist/index.js" }.
    • files: ["dist/", "peekaboo"] (includes compiled JS and the Swift peekaboo binary at package root).
    • scripts:
      • build: Command to compile TypeScript (e.g., tsc).
      • start: node dist/index.js.
      • prepublishOnly: npm run build.
    • dependencies: @modelcontextprotocol/sdk (latest stable), zod (for input validation), pino (for logging), relevant cloud AI SDKs (e.g., openai, @anthropic-ai/sdk).
    • devDependencies: typescript, @types/node, pino-pretty (for optional development console logging).
  4. Distribution: Published to NPM. Installable via npm i -g peekaboo-mcp or usable with npx peekaboo-mcp.
  5. Swift CLI Location Strategy:
    • The Node.js server will first check the environment variable PEEKABOO_CLI_PATH. If set and points to a valid executable, that path will be used.
    • If PEEKABOO_CLI_PATH is not set or invalid, the server will fall back to a bundled path, resolved relative to its own script location (e.g., path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo'), assuming the compiled server script is in dist/ and peekaboo binary is at the package root).
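The lookup order above could be sketched as follows; `resolvePeekabooCliPath` and the `serverDir` parameter are illustrative names, not part of the spec (in an ES module, `serverDir` would be derived via `path.dirname(fileURLToPath(import.meta.url))`):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve the Swift CLI: honor PEEKABOO_CLI_PATH when it points at an
// executable, otherwise fall back to the binary bundled at the package root.
export function resolvePeekabooCliPath(env: NodeJS.ProcessEnv, serverDir: string): string {
  const override = env.PEEKABOO_CLI_PATH;
  if (override) {
    try {
      fs.accessSync(override, fs.constants.X_OK); // must exist and be executable
      return override;
    } catch {
      // invalid override: fall through to the bundled binary
    }
  }
  // The compiled server lives in dist/, one level below the package root,
  // where the peekaboo binary is bundled.
  return path.resolve(serverDir, "..", "peekaboo");
}
```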

B. Server Initialization & Configuration (src/index.ts)

  1. Imports: McpServer, StdioServerTransport from @modelcontextprotocol/sdk; pino from pino; os, path from Node.js built-ins.
  2. Server Info: name: "PeekabooMCP", version: <package_version from package.json>.
  3. Server Capabilities: Advertise tools capability.
  4. Logging (Pino):
    • Instantiate pino logger.
    • Default Transport: File transport to path.join(os.tmpdir(), 'peekaboo-mcp.log'). Use mkdir: true option for destination.
    • Log Level: Controlled by ENV VAR LOG_LEVEL (standard Pino levels: trace, debug, info, warn, error, fatal). Default: "info".
    • Conditional Console Logging (Development Only): If ENV VAR PEEKABOO_MCP_CONSOLE_LOGGING="true", add a second Pino transport targeting process.stderr.fd (potentially using pino-pretty for human-readable output).
    • Strict Rule: All server operational logging must use the configured Pino instance. No direct console.log/warn/error that might output to stdout.
  5. Environment Variables (Read by Server):
    • AI_PROVIDERS: Comma-separated list of provider_name/default_model_for_provider pairs (e.g., "openai/gpt-4o,ollama/qwen2.5vl:7b"). If unset/empty, peekaboo.analyze tool reports AI not configured.
    • OPENAI_API_KEY: API key for OpenAI.
    • ANTHROPIC_API_KEY: (Example for future) API key for Anthropic.
    • (Other cloud provider API keys as standard ENV VAR names).
    • OLLAMA_BASE_URL: Base URL for local Ollama instance. Default: "http://localhost:11434".
    • LOG_LEVEL: For Pino logger. Default: "info".
    • PEEKABOO_MCP_CONSOLE_LOGGING: Boolean ("true"/"false") for dev console logs. Default: "false".
    • PEEKABOO_CLI_PATH: Optional override for Swift peekaboo CLI path.
  6. Initial Status Reporting Logic:
    • A server-instance-level boolean flag: let hasSentInitialStatus = false;.
    • A function generateServerStatusString(): Creates a formatted string: "\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: <server_version>\nConfigured AI Providers (from AI_PROVIDERS ENV): <parsed list or 'None Configured. Set AI_PROVIDERS ENV.'>\n---".
    • Response Augmentation: In the function that sends a ToolResponse back to the MCP client, if the response is for a successful tool call (not initialize/initialized or peekaboo.list with item_type: "server_status") AND hasSentInitialStatus is false:
      • Append generateServerStatusString() to the first TextContentItem in ToolResponse.content. If no text item exists, prepend a new one.
      • Set hasSentInitialStatus = true.
  7. Tool Registration: Register peekaboo.image, peekaboo.analyze, peekaboo.list with their Zod input schemas and handler functions.
  8. Transport: await server.connect(new StdioServerTransport());.
  9. Shutdown: Implement graceful shutdown on SIGINT, SIGTERM (e.g., await server.close(); logger.flush(); process.exit(0);).
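Items 5 and 6 above fit together: a parser for the `AI_PROVIDERS` value feeds the status string. A minimal TypeScript sketch; function and type names are illustrative, not part of the SDK:

```typescript
interface ProviderEntry { provider: string; model: string; }

// Parse "openai/gpt-4o,ollama/llava:latest" into ordered provider/model pairs.
function parseAiProviders(raw: string | undefined): ProviderEntry[] {
  if (!raw || raw.trim() === "") return [];
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => {
      // Split on the first "/" only, so model names may contain ":" or "/".
      const i = s.indexOf("/");
      return i === -1 ? { provider: s, model: "" } : { provider: s.slice(0, i), model: s.slice(i + 1) };
    });
}

// Build the one-time status block appended to the first successful ToolResponse.
function generateServerStatusString(version: string, aiProvidersEnv?: string): string {
  const entries = parseAiProviders(aiProvidersEnv);
  const providers = entries.length > 0
    ? entries.map((e) => `${e.provider}/${e.model}`).join(", ")
    : "None Configured. Set AI_PROVIDERS ENV.";
  return `\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: ${version}\nConfigured AI Providers (from AI_PROVIDERS ENV): ${providers}\n---`;
}
```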

C. MCP Tool Specifications & Node.js Handler Logic

General Node.js Handler Pattern (for tools calling Swift peekaboo CLI):

  1. Validate MCP input against the tool's Zod schema. If invalid, log error with Pino and return MCP error ToolResponse.
  2. Construct command-line arguments for Swift peekaboo CLI based on MCP input. Always include --json-output.
  3. Log the constructed Swift command with Pino at debug level.
  4. Execute Swift peekaboo CLI using child_process.spawn, capturing stdout, stderr, and exitCode.
  5. If any data is received on Swift CLI's stderr, log it immediately with Pino at warn level, prefixed (e.g., [SwiftCLI-stderr]).
  6. On Swift CLI process close:
    • If exitCode !== 0 or stdout is empty/not parseable as JSON:
      • Log failure details with Pino (error level).
      • Construct MCP error ToolResponse (e.g., errorCode: "SWIFT_CLI_EXECUTION_ERROR" or SWIFT_CLI_INVALID_OUTPUT in _meta). Message should include relevant parts of raw stdout/stderr if available.
    • If exitCode === 0:
      • Attempt to parse stdout as JSON. If parsing fails, treat as error (above).
      • Let swiftResponse = JSON.parse(stdout).
      • If swiftResponse.debug_logs (array of strings) exists, log each entry via Pino at debug level, clearly marked as from backend (e.g., logger.debug({ backend: "swift", swift_log: entry })).
      • If swiftResponse.success === false:
        • Extract swiftResponse.error.message, swiftResponse.error.code, swiftResponse.error.details.
        • Construct and return MCP error ToolResponse, relaying these details (e.g., message in content, code in _meta.backend_error_code).
      • If swiftResponse.success === true:
        • Process swiftResponse.data to construct the success MCP ToolResponse.
        • Relay swiftResponse.messages as TextContentItems in the MCP response if appropriate.
        • For peekaboo.image with input.return_data: true:
          • Iterate swiftResponse.data.saved_files[*].path.
          • For each path, read image file into a Buffer.
          • Base64 encode the Buffer.
          • Construct ImageContentItem for MCP ToolResponse.content, including data (Base64 string) and mimeType (from swiftResponse.data.saved_files[*].mime_type).
    • Augment successful ToolResponse with initial server status string if applicable (see B.6).
    • Send MCP ToolResponse.
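The branching in step 6 is pure string-and-JSON logic and can be factored out of the `spawn` close handler. A hedged sketch, with illustrative type names:

```typescript
interface SwiftError { message: string; code: string; details?: string; }
interface SwiftResponse {
  success: boolean;
  data?: unknown;
  messages?: string[];
  debug_logs?: string[];
  error?: SwiftError;
}
type CliResult =
  | { ok: true; response: SwiftResponse }
  | { ok: false; errorCode: string; message: string };

// Turn the Swift CLI's exit code and stdout into either a relayed error
// (steps 6a / success=false) or parsed structured data (success=true).
function parseSwiftCliResult(exitCode: number | null, stdout: string): CliResult {
  if (exitCode !== 0) {
    return { ok: false, errorCode: "SWIFT_CLI_EXECUTION_ERROR", message: `Swift CLI exited with code ${exitCode}: ${stdout}` };
  }
  let parsed: SwiftResponse;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return { ok: false, errorCode: "SWIFT_CLI_INVALID_OUTPUT", message: `Swift CLI produced non-JSON output: ${stdout}` };
  }
  if (parsed.success === false) {
    const e = parsed.error ?? { message: "Unknown Swift CLI error", code: "INTERNAL_SWIFT_ERROR" };
    return { ok: false, errorCode: e.code, message: e.message };
  }
  return { ok: true, response: parsed };
}
```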

Tool 1: peekaboo.image

  • MCP Description: "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
  • MCP Input Schema (ImageInputSchema):
    z.object({
      app: z.string().optional().describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
      path: z.string().optional().describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified."),
      mode: z.enum(["screen", "window", "multi"]).optional().describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
      window_specifier: z.union([
        z.object({ title: z.string().describe("Capture window by title.") }),
        z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }),
      ]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."),
      format: z.enum(["png", "jpg"]).optional().default("png").describe("Output image format. Defaults to 'png'."),
      return_data: z.boolean().optional().default(false).describe("Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."),
      capture_focus: z.enum(["background", "foreground"])
        .optional().default("background").describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.")
    })
    
    • Node.js Handler - Default mode Logic: If input.app provided & input.mode undefined, mode="window". If no input.app & input.mode undefined, mode="screen".
  • MCP Output Schema (ToolResponse):
    • content: Array<ImageContentItem | TextContentItem>
      • If input.return_data: true: Contains ImageContentItem(s): { type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }.
      • May contain TextContentItem(s) (summary, file paths from saved_files, Swift CLI messages).
    • saved_files: Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }> (Directly from Swift CLI JSON data.saved_files if images were saved).
    • isError?: boolean
    • _meta?: { backend_error_code?: string } (For relaying Swift CLI error codes).
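Two pieces of this handler can be sketched concretely: the mode-defaulting rule and the `return_data` path that turns `saved_files` into `ImageContentItem`s. The file reader is injected so the mapping stays testable; names are illustrative:

```typescript
import * as fs from "node:fs";

type CaptureMode = "screen" | "window" | "multi";

// Default mode: "window" when an app is given, otherwise "screen".
function resolveMode(app?: string, mode?: CaptureMode): CaptureMode {
  if (mode) return mode;
  return app ? "window" : "screen";
}

interface SavedFile { path: string; item_label?: string; window_title?: string; window_id?: number; mime_type: string; }
interface ImageContentItem { type: "image"; data: string; mimeType: string; metadata?: Record<string, unknown>; }

// Map each saved file to an image content item with Base64 data (no data: prefix).
function buildImageContent(
  savedFiles: SavedFile[],
  readFile: (p: string) => Buffer = fs.readFileSync,
): ImageContentItem[] {
  return savedFiles.map((f) => ({
    type: "image",
    data: readFile(f.path).toString("base64"),
    mimeType: f.mime_type,
    metadata: { item_label: f.item_label, window_title: f.window_title, window_id: f.window_id, source_path: f.path },
  }));
}
```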

Tool 2: peekaboo.analyze

  • MCP Description: "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's AI_PROVIDERS environment variable and client overrides."
  • MCP Input Schema (AnalyzeInputSchema):
    z.object({
      image_path: z.string().describe("Required. Absolute path to image file (.png, .jpg, .jpeg, .webp) to be analyzed."),
      question: z.string().describe("Required. Question for the AI about the image."),
      provider_config: z.object({
        type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto")
          .describe("AI provider. 'auto' uses server's AI_PROVIDERS ENV preference. Specific provider must be enabled in server's AI_PROVIDERS."),
        model: z.string().optional().describe("Optional. Model name. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.")
      }).optional().describe("Optional. Explicit provider/model. Validated against server's AI_PROVIDERS.")
    })
    
  • Node.js Handler Logic:
    1. Validate input. Server pre-checks image_path extension (.png, .jpg, .jpeg, .webp); return MCP error if not recognized.
    2. Read process.env.AI_PROVIDERS. If unset/empty, return MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (error level).
    3. Parse AI_PROVIDERS into configuredItems = [{provider: string, model: string}].
    4. Determine Provider & Model:
      • requestedProviderType = input.provider_config?.type || "auto".
      • requestedModelName = input.provider_config?.model.
      • chosenProvider: string | null = null, chosenModel: string | null = null.
      • If requestedProviderType !== "auto":
        • Find entry in configuredItems where provider === requestedProviderType.
        • If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration."
        • chosenProvider = requestedProviderType.
        • chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider.
      • Else (requestedProviderType === "auto"):
        • Iterate configuredItems in order. For each {provider, modelFromEnv}:
          • Check availability (Ollama up? Cloud API key for provider set in process.env?).
          • If available: chosenProvider = provider, chosenModel = requestedModelName || modelFromEnv. Break.
        • If no provider found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational."
    5. Execute Analysis (Node.js handles all AI calls):
      • Read input.image_path into a Buffer. Base64 encode.
      • If chosenProvider is "ollama": HTTP POST to Ollama (using process.env.OLLAMA_BASE_URL) with Base64 image, input.question, chosenModel. Handle Ollama API errors.
      • If chosenProvider is "openai": Use OpenAI SDK/HTTP with Base64 image, input.question, chosenModel, and API key from process.env.OPENAI_API_KEY. Handle OpenAI API errors.
      • (Similar for other cloud providers).
    6. Construct MCP ToolResponse.
  • MCP Output Schema (ToolResponse):
    • content: [{ type: "text", text: "<AI's analysis/answer>" }]
    • analysis_text: string (Core AI answer).
    • model_used: string (e.g., "ollama/llava:7b", "openai/gpt-4o") - The actual provider/model pair used.
    • isError?: boolean
    • _meta?: { backend_error_code?: string } (For AI provider API errors).
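Step 4's selection logic can be kept pure by injecting the availability probe (the "Ollama up? API key set?" check), which also makes the error messages above easy to verify. A sketch with illustrative names; the hardcoded per-provider defaults are assumptions, not from the spec:

```typescript
interface ConfiguredItem { provider: string; model: string; }
interface Choice { provider: string; model: string; }

function chooseProvider(
  configured: ConfiguredItem[],           // parsed AI_PROVIDERS, in order
  requestedType: string,                  // "auto" or an explicit provider name
  requestedModel: string | undefined,
  isAvailable: (provider: string) => boolean,
  hardcodedDefaults: Record<string, string> = { ollama: "llava:latest", openai: "gpt-4o" }, // assumed
): Choice | { error: string } {
  if (requestedType !== "auto") {
    const match = configured.find((c) => c.provider === requestedType);
    if (!match) {
      return { error: `Provider '${requestedType}' is not enabled in server's AI_PROVIDERS configuration.` };
    }
    return { provider: requestedType, model: requestedModel || match.model || hardcodedDefaults[requestedType] || "" };
  }
  // "auto": first configured provider that is currently operational wins.
  for (const { provider, model } of configured) {
    if (isAvailable(provider)) {
      return { provider, model: requestedModel || model };
    }
  }
  return { error: "No configured AI providers in AI_PROVIDERS are currently operational." };
}
```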

Tool 3: peekaboo.list

  • MCP Description: "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching."
  • MCP Input Schema (ListInputSchema):
    z.object({
      item_type: z.enum(["running_applications", "application_windows", "server_status"])
        .default("running_applications").describe("What to list. 'server_status' returns Peekaboo server info."),
      app: z.string().optional().describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."),
      include_window_details: z.array(
        z.enum(["off_screen", "bounds", "ids"])
      ).optional().describe("Optional, for 'application_windows'. Additional window details. Example: ['bounds', 'ids']")
    }).refine(data => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""), {
      message: "For 'application_windows', 'app' identifier is required.", path: ["app"],
    }).refine(data => !data.include_window_details || data.item_type === "application_windows", {
      message: "'include_window_details' only for 'application_windows'.", path: ["include_window_details"],
    }).refine(data => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined), {
        message: "'app' and 'include_window_details' not applicable for 'server_status'.", path: ["item_type"]
    })
    
  • Node.js Handler Logic:
    • If input.item_type === "server_status": Handler directly calls generateServerStatusString() and returns it in ToolResponse.content[{type:"text"}]. Does NOT call Swift CLI. Does NOT affect hasSentInitialStatus.
    • Else (for "running_applications", "application_windows"): Call Swift peekaboo list ... with mapped args (including joining include_window_details array to comma-separated string for Swift CLI flag). Parse Swift JSON. Format MCP ToolResponse.
  • MCP Output Schema (ToolResponse):
    • content: [{ type: "text", text: "<Summary or Status String>" }]
    • If item_type: "running_applications": application_list: Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>.
    • If item_type: "application_windows":
      • window_list: Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x:number,y:number,w:number,h:number}; is_on_screen?: boolean }>.
      • target_application_info: { app_name: string; bundle_id?: string; pid: number }.
    • isError?: boolean
    • _meta?: { backend_error_code?: string }
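The argument mapping in the handler's second branch might look like this; flag names follow the Swift CLI spec in section II.C:

```typescript
interface ListInput {
  item_type: "running_applications" | "application_windows";
  app?: string;
  include_window_details?: Array<"off_screen" | "bounds" | "ids">;
}

// Map validated peekaboo.list input onto Swift CLI arguments.
function buildListArgs(input: ListInput): string[] {
  if (input.item_type === "running_applications") {
    return ["list", "apps", "--json-output"];
  }
  const args = ["list", "windows", "--app", input.app ?? "", "--json-output"];
  if (input.include_window_details && input.include_window_details.length > 0) {
    // The Swift CLI takes the details as one comma-separated string.
    args.push("--include-details", input.include_window_details.join(","));
  }
  return args;
}
```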

II. Swift CLI (peekaboo)

A. General CLI Design

  1. Executable Name: peekaboo (Universal macOS binary: arm64 + x86_64).
  2. Argument Parser: Use swift-argument-parser package.
  3. Top-Level Commands (Subcommands of peekaboo): image, list. (No analyze command).
  4. Global Option (for all commands/subcommands): --json-output (Boolean flag).
    • If present: All stdout from Swift CLI MUST be a single, valid JSON object. stderr should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed.
    • If absent: Output human-readable text to stdout and stderr as appropriate for direct CLI usage.
    • Success JSON Structure:
      {
        "success": true,
        "data": { /* Command-specific structured data */ },
        "messages": ["Optional user-facing status/warning message from Swift CLI operations"],
        "debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"]
      }
      
    • Error JSON Structure:
      {
        "success": false,
        "error": {
          "message": "Detailed, user-understandable error message.",
          "code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING
          "details": "Optional additional technical details or context."
        },
        "debug_logs": ["Contextual debug log leading to error"]
      }
      
    • Standardized Swift Error Codes (error.code values):
      • PERMISSION_DENIED_SCREEN_RECORDING
      • PERMISSION_DENIED_ACCESSIBILITY (if Accessibility API is attempted for foregrounding)
      • APP_NOT_FOUND (general app lookup failure)
      • AMBIGUOUS_APP_IDENTIFIER (fuzzy match yields multiple candidates)
      • WINDOW_NOT_FOUND
      • CAPTURE_FAILED (general image capture error)
      • FILE_IO_ERROR (e.g., cannot write to specified path)
      • INVALID_ARGUMENT (CLI argument validation failure)
      • SIPS_ERROR (if sips is used for PDF fallback and fails)
      • INTERNAL_SWIFT_ERROR (unexpected Swift runtime errors)
  5. Permissions Handling:
    • The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via CGWindowListCopyWindowInfo).
    • If Accessibility is used for --capture-focus foreground window raising, check that permission.
    • If permissions are missing, output the specific JSON error (e.g., code PERMISSION_DENIED_SCREEN_RECORDING) and exit. Do not hang or prompt interactively.
  6. Temporary File Management:
    • If the CLI needs to save an image temporarily (e.g., when no --path is given by Node.js), it uses FileManager.default.temporaryDirectory with unique filenames (e.g., peekaboo_<uuid>_<info>.<format>).
    • These self-created temporary files MUST be deleted by the Swift CLI after it has successfully generated and flushed its JSON output to stdout.
    • Files saved to a user/Node.js-specified --path are NEVER deleted by the Swift CLI.
  7. Internal Logging for --json-output:
    • When --json-output is active, internal verbose/debug messages are collected into the debug_logs: [String] array in the final JSON output. They are NOT printed to stderr.
    • For standalone CLI use (no --json-output), these debug messages can print to stderr.
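On the Node.js side, the success and error envelopes from item 4 can be modeled as a discriminated union with a runtime guard, so handlers narrow on `success` before touching `data` or `error`. An illustrative sketch:

```typescript
type SwiftEnvelope =
  | { success: true; data: unknown; messages?: string[]; debug_logs?: string[] }
  | { success: false; error: { message: string; code: string; details?: string }; debug_logs?: string[] };

// Runtime check that parsed JSON matches one of the two envelope shapes.
function isSwiftEnvelope(value: unknown): value is SwiftEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.success === true) return "data" in v;
  if (v.success === false) {
    const e = v.error as Record<string, unknown> | undefined;
    return !!e && typeof e.message === "string" && typeof e.code === "string";
  }
  return false;
}
```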

B. peekaboo image Command

  • Options (defined using swift-argument-parser):
    • --app <String?>: App identifier.
    • --path <String?>: Base output directory or file prefix/path.
    • --mode <ModeEnum?>: ModeEnum is screen, window, multi. Default logic: if --app then window, else screen.
    • --window-title <String?>: For mode window.
    • --window-index <Int?>: For mode window.
    • --format <FormatEnum?>: FormatEnum is png, jpg. Default png.
    • --capture-focus <FocusEnum?>: FocusEnum is background, foreground. Default background.
  • Behavior:
    • Implements fuzzy app matching. On ambiguity, returns JSON error with code: "AMBIGUOUS_APP_IDENTIFIER" and lists potential matches in error.details or error.message.
    • Always attempts to exclude window shadow/frame (CGWindowImageOption.boundsIgnoreFraming or screencapture -o if shelled out for PDF). No cursor is captured.
    • Background Capture (--capture-focus background or default):
      • Primary method: Uses CGWindowListCopyWindowInfo to identify target window(s)/screen(s).
      • Captures via CGDisplayCreateImage (for screen mode) or CGWindowListCreateImageFromArray (for window/multi modes).
      • Converts CGImage to Data (PNG or JPG) and saves to file (at user --path or its own temp path).
    • Foreground Capture (--capture-focus foreground):
      • Activates app using NSRunningApplication.activate(options: [.activateIgnoringOtherApps]).
      • If a specific window needs raising (e.g., from --window-index or specific --window-title for an app with many windows), it may attempt to use Accessibility API (AXUIElementPerformAction(kAXRaiseAction)) if available and permissioned.
      • If specific window raise fails (or Accessibility not used/permitted), it logs a warning to the debug_logs array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app.
      • Capture mechanism is still preferably native CG APIs.
    • Multi-Screen (--mode screen): Enumerates CGGetActiveDisplayList, captures each display using CGDisplayCreateImage. Filenames (if saving) get display-specific suffixes (e.g., _display0_main.png, _display1.png).
    • Multi-Window (--mode multi): Uses CGWindowListCopyWindowInfo for target app's PID, captures each relevant window (on-screen by default) with CGWindowListCreateImageFromArray. Filenames get window-specific suffixes.
    • PDF Format Handling (per the Q7 decision): PDF output is not supported; --format accepts only png and jpg. (Earlier drafts shelled out via Process to screencapture -t pdf -R<bounds> or -l<id>; that path has been removed.)
  • JSON Output data field structure (on success):
    {
      "saved_files": [ // Array is always present, even if empty (e.g. capture failed before saving)
        {
          "path": "/absolute/path/to/saved/image.png", // Absolute path
          "item_label": "Display 1 / Main", // Or window_title for window/multi modes
          "window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant
          "window_index": 0,  // Optional, if relevant (e.g. for multi-window or indexed capture)
          "mime_type": "image/png" // Actual MIME type of the saved file
        }
        // ... more items if mode is screen or multi ...
      ]
    }
    

C. peekaboo list Command

  • Subcommands & Options:
    • peekaboo list apps [--json-output]
    • peekaboo list windows --app <app_identifier_string> [--include-details <comma_separated_string_of_options>] [--json-output]
      • --include-details options: off_screen, bounds, ids.
  • Behavior:
    • apps: Uses NSWorkspace.shared.runningApplications. For each app, retrieves localizedName, bundleIdentifier, processIdentifier (pid), isActive. To get window_count, it performs a CGWindowListCopyWindowInfo call filtered by the app's PID and counts on-screen windows.
    • windows:
      • Resolves app_identifier using fuzzy matching. If ambiguous, returns JSON error.
      • Uses CGWindowListCopyWindowInfo filtered by the target app's PID.
      • If --include-details contains "off_screen", uses CGWindowListOption.optionAllScreenWindows (and includes kCGWindowIsOnscreen boolean in output). Otherwise, uses CGWindowListOption.optionOnScreenOnly.
      • Extracts kCGWindowName (title).
      • If "ids" in --include-details, extracts kCGWindowNumber as window_id.
      • If "bounds" in --include-details, extracts kCGWindowBounds as bounds: {x, y, width, height}.
      • window_index is the 0-based index from the filtered array returned by CGWindowListCopyWindowInfo (reflecting z-order for on-screen windows).
  • JSON Output data field structure (on success):
    • For apps:
      {
        "applications": [
          {
            "app_name": "Safari",
            "bundle_id": "com.apple.Safari",
            "pid": 501,
            "is_active": true,
            "window_count": 3 // Count of on-screen windows for this app
          }
          // ... more applications ...
        ]
      }
      
    • For windows:
      {
        "target_application_info": {
          "app_name": "Safari",
          "pid": 501,
          "bundle_id": "com.apple.Safari"
        },
        "windows": [
          {
            "window_title": "Apple",
            "window_id": 67, // if "ids" requested
            "window_index": 0,
            "is_on_screen": true, // Potentially useful, especially if "off_screen" included
            "bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested
          }
          // ... more windows ...
        ]
      }
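For illustration, the two `data` payloads above expressed as TypeScript interfaces, together with a hypothetical helper that renders the text summary carried in the MCP `content` field (the summary wording is an assumption, not from the spec):

```typescript
interface AppInfo { app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number; }
interface WindowInfo {
  window_title: string;
  window_id?: number;       // present if "ids" requested
  window_index?: number;
  is_on_screen?: boolean;
  bounds?: { x: number; y: number; width: number; height: number }; // if "bounds" requested
}

// One-line summaries for the TextContentItem accompanying the structured fields.
function summarizeApps(apps: AppInfo[]): string {
  const active = apps.find((a) => a.is_active);
  return `${apps.length} running application(s)` + (active ? `; active: ${active.app_name}` : "");
}

function summarizeWindows(appName: string, windows: WindowInfo[]): string {
  return `${windows.length} window(s) for ${appName}`;
}
```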
      

III. Build, Packaging & Distribution

  1. Swift CLI (peekaboo):
    • Package.swift defines an executable product named peekaboo.
    • Build process (e.g., part of NPM prepublishOnly or a separate build script): swift build -c release --arch arm64 --arch x86_64.
    • The resulting universal binary (e.g., from .build/apple/Products/Release/peekaboo) is copied to the root of the peekaboo-mcp NPM package directory before publishing.
  2. Node.js MCP Server:
    • TypeScript is compiled to JavaScript (e.g., into dist/) using tsc.
    • The NPM package includes dist/ and the peekaboo Swift binary (at package root).

IV. Documentation (README.md for peekaboo-mcp NPM Package)

  1. Project Overview: Briefly state vision and components.
  2. Prerequisites:
    • macOS version (e.g., 12.0+ or as required by Swift/APIs).
    • Xcode Command Line Tools (needed to build the Swift CLI from source; the prebuilt binary bundled with the NPM package does not require them at runtime).
    • Ollama (if using local Ollama for analysis) + instructions to pull models.
  3. Installation:
    • Primary: npm install -g peekaboo-mcp.
    • Alternative: npx peekaboo-mcp.
  4. MCP Client Configuration:
    • Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use peekaboo-mcp.
    • Example for VS Code/Cursor using npx for robustness:
      {
        "mcpServers": {
          "PeekabooMCP": {
            "command": "npx",
            "args": ["peekaboo-mcp"],
            "env": {
              "AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o",
              "OPENAI_API_KEY": "sk-yourkeyhere"
              /* other ENV VARS */
            }
          }
        }
      }
      
  5. Required macOS Permissions:
    • Screen Recording: Essential for ALL peekaboo.image functionalities and for peekaboo.list if it needs to read window titles (which it does via CGWindowListCopyWindowInfo). Provide clear, step-by-step instructions for System Settings. Include open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture" command.
    • Accessibility: Required only if peekaboo.image with capture_focus: "foreground" needs to perform specific window raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance. Include open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility" command.
  6. Environment Variables (for Node.js peekaboo-mcp server):
    • AI_PROVIDERS: Crucial for peekaboo.analyze. Explain format (provider/model,provider/model), effect, and that peekaboo.analyze reports "not configured" if unset. List recognized provider names ("ollama", "openai").
    • OPENAI_API_KEY (and similar for other cloud providers): How they are used.
    • OLLAMA_BASE_URL: Default and purpose.
    • LOG_LEVEL: For pino logger. Values and default.
    • PEEKABOO_MCP_CONSOLE_LOGGING: For development.
    • PEEKABOO_CLI_PATH: For overriding bundled Swift CLI.
  7. MCP Tool Overview:
    • Brief descriptions of peekaboo.image, peekaboo.analyze, peekaboo.list and their primary purpose.
  8. Link to Detailed Tool Specification: A separate TOOL_API_REFERENCE.md (generated from or summarizing the Zod schemas and output structures in this document) for users/AI developers needing full schema details.
  9. Troubleshooting / Support: Link to GitHub issues.