mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-03-25 09:25:47 +00:00
Peekaboo: Full & Final Detailed Specification v1.1.1
https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW-
Project Vision: Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named peekaboo, which is called by the Node.js server. All image captures automatically exclude window shadows/frames.
Core Components:
- Node.js/TypeScript MCP Server (`peekaboo-mcp`):
  - NPM Package Name: `peekaboo-mcp`.
  - GitHub Project Name: `peekaboo`.
  - Implements MCP server logic using the latest stable `@modelcontextprotocol/sdk`.
  - Exposes three primary MCP tools: `peekaboo.image`, `peekaboo.analyze`, `peekaboo.list`.
  - Translates MCP tool calls into commands for the Swift `peekaboo` CLI.
  - Parses structured JSON output from the Swift `peekaboo` CLI.
  - Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client.
  - Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer.
  - Implements robust logging to a file using `pino`, ensuring no logs interfere with MCP stdio communication.
- Swift CLI (`peekaboo`):
  - A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64).
  - Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching.
  - Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).
  - Outputs all results and errors in a structured JSON format via a global `--json-output` flag. This JSON includes a `debug_logs` array for internal Swift CLI logs, which the Node.js server can relay to its own logger.
  - The `peekaboo` binary is bundled at the root of the `peekaboo-mcp` NPM package.
I. Node.js/TypeScript MCP Server (peekaboo-mcp)
A. Project Setup & Distribution
- Language/Runtime: Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+).
- Package Manager: NPM.
- `package.json`:
  - `name`: `"peekaboo-mcp"`
  - `version`: Semantic versioning (e.g., `1.1.1`).
  - `type`: `"module"` (for ES Modules).
  - `main`: `"dist/index.js"` (compiled server entry point).
  - `bin`: `{ "peekaboo-mcp": "dist/index.js" }`.
  - `files`: `["dist/", "peekaboo"]` (includes the compiled JS and the Swift `peekaboo` binary at the package root).
  - `scripts`:
    - `build`: Command to compile TypeScript (e.g., `tsc`).
    - `start`: `node dist/index.js`.
    - `prepublishOnly`: `npm run build`.
  - `dependencies`: `@modelcontextprotocol/sdk` (latest stable), `zod` (for input validation), `pino` (for logging), relevant cloud AI SDKs (e.g., `openai`, `@anthropic-ai/sdk`).
  - `devDependencies`: `typescript`, `@types/node`, `pino-pretty` (for optional development console logging).
- Distribution: Published to NPM. Installable via `npm i -g peekaboo-mcp` or usable with `npx peekaboo-mcp`.
- Swift CLI Location Strategy:
  - The Node.js server first checks the environment variable `PEEKABOO_CLI_PATH`. If it is set and points to a valid executable, that path is used.
  - If `PEEKABOO_CLI_PATH` is not set or invalid, the server falls back to a bundled path, resolved relative to its own script location (e.g., `path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo')`, assuming the compiled server script is in `dist/` and the `peekaboo` binary is at the package root).
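The location strategy above can be sketched as a small pure function. This is a minimal illustration, not the actual implementation: the helper names `resolveCliPath` and the injected `isExecutable` predicate are assumptions for testability.

```typescript
import path from "node:path";
import { fileURLToPath } from "node:url";

type ExecCheck = (p: string) => boolean;

function resolveCliPath(
  env: Record<string, string | undefined>,
  moduleUrl: string,          // the server's import.meta.url
  isExecutable: ExecCheck,    // injected so the strategy can be tested without fs
): string {
  // 1. Explicit override via PEEKABOO_CLI_PATH, if it points at a valid executable.
  const override = env.PEEKABOO_CLI_PATH;
  if (override && isExecutable(override)) return override;
  // 2. Fall back to the binary bundled at the package root (one level above dist/).
  return path.resolve(path.dirname(fileURLToPath(moduleUrl)), "..", "peekaboo");
}
```

In production code the predicate would wrap something like `fs.accessSync(p, fs.constants.X_OK)`.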
B. Server Initialization & Configuration (src/index.ts)
- Imports: `McpServer`, `StdioServerTransport` from `@modelcontextprotocol/sdk`; `pino` from `pino`; `os`, `path` from Node.js built-ins.
- Server Info: `name: "PeekabooMCP"`, `version: <package_version from package.json>`.
- Server Capabilities: Advertise the `tools` capability.
- Logging (Pino):
  - Instantiate the `pino` logger.
  - Default Transport: File transport to `path.join(os.tmpdir(), 'peekaboo-mcp.log')`. Use the `mkdir: true` option for the destination.
  - Log Level: Controlled by ENV VAR `LOG_LEVEL` (standard Pino levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`). Default: `"info"`.
  - Conditional Console Logging (Development Only): If ENV VAR `PEEKABOO_MCP_CONSOLE_LOGGING="true"`, add a second Pino transport targeting `process.stderr.fd` (potentially using `pino-pretty` for human-readable output).
  - Strict Rule: All server operational logging must use the configured Pino instance. No direct `console.log`/`warn`/`error` that might output to `stdout`.
- Environment Variables (Read by Server):
  - `AI_PROVIDERS`: Comma-separated list of `provider_name/default_model_for_provider` pairs (e.g., `"openai/gpt-4o,ollama/qwen2.5vl:7b"`). If unset/empty, the `peekaboo.analyze` tool reports AI not configured.
  - `OPENAI_API_KEY`: API key for OpenAI.
  - `ANTHROPIC_API_KEY`: (Example for future) API key for Anthropic.
  - (Other cloud provider API keys as standard ENV VAR names.)
  - `OLLAMA_BASE_URL`: Base URL for the local Ollama instance. Default: `"http://localhost:11434"`.
  - `LOG_LEVEL`: For the Pino logger. Default: `"info"`.
  - `PEEKABOO_MCP_CONSOLE_LOGGING`: Boolean (`"true"`/`"false"`) for dev console logs. Default: `"false"`.
  - `PEEKABOO_CLI_PATH`: Optional override for the Swift `peekaboo` CLI path.
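Parsing `AI_PROVIDERS` into ordered `{provider, model}` pairs might look like the sketch below. The helper name `parseAiProviders` is illustrative; the only non-obvious choice is splitting on the first `/` only, since model names such as `qwen2.5vl:7b` can contain extra punctuation.

```typescript
interface ProviderItem { provider: string; model: string; }

function parseAiProviders(raw: string | undefined): ProviderItem[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      // Split on the first "/" only; everything after it is the model name.
      const slash = entry.indexOf("/");
      if (slash === -1) return { provider: entry.toLowerCase(), model: "" };
      return {
        provider: entry.slice(0, slash).trim().toLowerCase(),
        model: entry.slice(slash + 1).trim(),
      };
    });
}
```

An empty result signals the "AI not configured" state for `peekaboo.analyze`.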
- Initial Status Reporting Logic:
  - A server-instance-level boolean flag: `let hasSentInitialStatus = false;`.
  - A function `generateServerStatusString()`: Creates a formatted string: `"\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: <server_version>\nConfigured AI Providers (from AI_PROVIDERS ENV): <parsed list or 'None Configured. Set AI_PROVIDERS ENV.'>\n---"`.
  - Response Augmentation: In the function that sends a `ToolResponse` back to the MCP client, if the response is for a successful tool call (not `initialize`/`initialized`, and not `peekaboo.list` with `item_type: "server_status"`) AND `hasSentInitialStatus` is `false`:
    - Append `generateServerStatusString()` to the first `TextContentItem` in `ToolResponse.content`. If no text item exists, prepend a new one.
    - Set `hasSentInitialStatus = true`.
- Tool Registration: Register `peekaboo.image`, `peekaboo.analyze`, and `peekaboo.list` with their Zod input schemas and handler functions.
- Transport: `await server.connect(new StdioServerTransport());`.
- Shutdown: Implement graceful shutdown on `SIGINT`/`SIGTERM` (e.g., `await server.close(); logger.flush(); process.exit(0);`).
C. MCP Tool Specifications & Node.js Handler Logic
General Node.js Handler Pattern (for tools calling Swift peekaboo CLI):
- Validate the MCP `input` against the tool's Zod schema. If invalid, log the error with Pino and return an MCP error `ToolResponse`.
- Construct command-line arguments for the Swift `peekaboo` CLI based on the MCP `input`. Always include `--json-output`.
- Log the constructed Swift command with Pino at `debug` level.
- Execute the Swift `peekaboo` CLI using `child_process.spawn`, capturing `stdout`, `stderr`, and `exitCode`.
- If any data is received on the Swift CLI's `stderr`, log it immediately with Pino at `warn` level, prefixed (e.g., `[SwiftCLI-stderr]`).
- On Swift CLI process close:
  - If `exitCode !== 0` or `stdout` is empty or not parseable as JSON:
    - Log failure details with Pino (`error` level).
    - Construct an MCP error `ToolResponse` (e.g., `errorCode: "SWIFT_CLI_EXECUTION_ERROR"` or `SWIFT_CLI_INVALID_OUTPUT` in `_meta`). The message should include relevant parts of the raw `stdout`/`stderr` if available.
  - If `exitCode === 0`:
    - Attempt to parse `stdout` as JSON. If parsing fails, treat as an error (above).
    - Let `swiftResponse = JSON.parse(stdout)`.
    - If `swiftResponse.debug_logs` (an array of strings) exists, log each entry via Pino at `debug` level, clearly marked as from the backend (e.g., `logger.debug({ backend: "swift", swift_log: entry })`).
    - If `swiftResponse.success === false`:
      - Extract `swiftResponse.error.message`, `swiftResponse.error.code`, and `swiftResponse.error.details`.
      - Construct and return an MCP error `ToolResponse`, relaying these details (e.g., `message` in `content`, `code` in `_meta.backend_error_code`).
    - If `swiftResponse.success === true`:
      - Process `swiftResponse.data` to construct the success MCP `ToolResponse`.
      - Relay `swiftResponse.messages` as `TextContentItem`s in the MCP response if appropriate.
      - For `peekaboo.image` with `input.return_data: true`:
        - Iterate over `swiftResponse.data.saved_files[*].path`.
        - For each path, read the image file into a `Buffer`.
        - Base64 encode the `Buffer`.
        - Construct an `ImageContentItem` for `ToolResponse.content`, including `data` (Base64 string) and `mimeType` (from `swiftResponse.data.saved_files[*].mime_type`).
- Augment the successful `ToolResponse` with the initial server status string if applicable (see B.6).
- Send the MCP `ToolResponse`.
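The close-handler branching above reduces to a pure classification of `(exitCode, stdout)` into a success payload or an error with a backend code. The following sketch shows that classification only (spawning and logging omitted); the type and function names are illustrative, while the error codes `SWIFT_CLI_EXECUTION_ERROR` and `SWIFT_CLI_INVALID_OUTPUT` are the ones named in the pattern.

```typescript
interface SwiftError { message: string; code: string; details?: string; }
interface SwiftResponse {
  success: boolean;
  data?: unknown;
  messages?: string[];
  error?: SwiftError;
  debug_logs?: string[];
}
type HandlerResult =
  | { kind: "ok"; data: unknown; messages: string[] }
  | { kind: "error"; message: string; backendErrorCode: string };

function interpretSwiftOutput(exitCode: number, stdout: string): HandlerResult {
  // Non-zero exit or empty stdout: execution failure.
  if (exitCode !== 0 || stdout.trim() === "") {
    return {
      kind: "error",
      message: `Swift CLI failed (exit ${exitCode}): ${stdout}`,
      backendErrorCode: "SWIFT_CLI_EXECUTION_ERROR",
    };
  }
  let parsed: SwiftResponse;
  try {
    parsed = JSON.parse(stdout) as SwiftResponse;
  } catch {
    return {
      kind: "error",
      message: "Swift CLI produced non-JSON output",
      backendErrorCode: "SWIFT_CLI_INVALID_OUTPUT",
    };
  }
  // Relay the Swift CLI's own structured error, if any.
  if (!parsed.success) {
    return {
      kind: "error",
      message: parsed.error?.message ?? "Unknown Swift CLI error",
      backendErrorCode: parsed.error?.code ?? "INTERNAL_SWIFT_ERROR",
    };
  }
  return { kind: "ok", data: parsed.data, messages: parsed.messages ?? [] };
}
```

The real handler would additionally relay `debug_logs` to Pino before discarding the envelope.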
Tool 1: peekaboo.image
- MCP Description: "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
- MCP Input Schema (`ImageInputSchema`):

      z.object({
        app: z.string().optional()
          .describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
        path: z.string().optional()
          .describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified."),
        mode: z.enum(["screen", "window", "multi"]).optional()
          .describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
        window_specifier: z.union([
          z.object({ title: z.string().describe("Capture window by title.") }),
          z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }),
        ]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."),
        format: z.enum(["png", "jpg"]).optional().default("png")
          .describe("Output image format. Defaults to 'png'."),
        return_data: z.boolean().optional().default(false)
          .describe("Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."),
        capture_focus: z.enum(["background", "foreground"]).optional().default("background")
          .describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.")
      })

- Node.js Handler, Default `mode` Logic: If `input.app` is provided and `input.mode` is undefined, `mode = "window"`. If no `input.app` and `input.mode` is undefined, `mode = "screen"`.
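The default-mode rule stated above is a two-line pure function. A sketch, with an illustrative name:

```typescript
type CaptureMode = "screen" | "window" | "multi";

function resolveMode(app?: string, mode?: CaptureMode): CaptureMode {
  if (mode !== undefined) return mode;          // an explicit client choice always wins
  return app !== undefined ? "window" : "screen"; // app given -> window, else screen
}
```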
- MCP Output Schema (`ToolResponse`):
  - `content`: `Array<ImageContentItem | TextContentItem>`
    - If `input.return_data: true`: Contains `ImageContentItem`(s): `{ type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
    - May contain `TextContentItem`(s) (summary, file paths from `saved_files`, Swift CLI `messages`).
  - `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }>` (directly from the Swift CLI JSON `data.saved_files` if images were saved).
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }` (for relaying Swift CLI error codes).
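Mapping the Swift CLI's `saved_files` entries into `ImageContentItem`s (the `return_data: true` path) can be sketched as below. The file reader is injected so the shape can be shown without touching the real filesystem; `toImageContentItems` is an assumed helper name, while the field names match the schemas above.

```typescript
interface SavedFile {
  path: string;
  item_label?: string;
  window_title?: string;
  window_id?: number;
  mime_type: string;
}
interface ImageContentItem {
  type: "image";
  data: string;      // Base64, no data: URL prefix
  mimeType: string;
  metadata?: { item_label?: string; window_title?: string; window_id?: number; source_path?: string };
}

function toImageContentItems(
  savedFiles: SavedFile[],
  readFile: (p: string) => Buffer,   // e.g. fs.readFileSync in the real handler
): ImageContentItem[] {
  return savedFiles.map((f) => ({
    type: "image",
    data: readFile(f.path).toString("base64"),
    mimeType: f.mime_type,
    metadata: {
      item_label: f.item_label,
      window_title: f.window_title,
      window_id: f.window_id,
      source_path: f.path,
    },
  }));
}
```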
Tool 2: peekaboo.analyze
- MCP Description: "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides."
- MCP Input Schema (`AnalyzeInputSchema`):

      z.object({
        image_path: z.string().describe("Required. Absolute path to image file (.png, .jpg, .webp) to be analyzed."),
        question: z.string().describe("Required. Question for the AI about the image."),
        provider_config: z.object({
          type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto")
            .describe("AI provider. 'auto' uses server's AI_PROVIDERS ENV preference. Specific provider must be enabled in server's AI_PROVIDERS."),
          model: z.string().optional()
            .describe("Optional. Model name. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.")
        }).optional().describe("Optional. Explicit provider/model. Validated against server's AI_PROVIDERS.")
      })
- Validate input. Server pre-checks
image_pathextension (.png,.jpg,.jpeg,.webp); return MCP error if not recognized. - Read
process.env.AI_PROVIDERS. If unset/empty, return MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (errorlevel). - Parse
AI_PROVIDERSintoconfiguredItems = [{provider: string, model: string}]. - Determine Provider & Model:
requestedProviderType = input.provider_config?.type || "auto".requestedModelName = input.provider_config?.model.chosenProvider: string | null = null,chosenModel: string | null = null.- If
requestedProviderType !== "auto":- Find entry in
configuredItemswhereprovider === requestedProviderType. - If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration."
chosenProvider = requestedProviderType.chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider.
- Find entry in
- Else (
requestedProviderType === "auto"):- Iterate
configuredItemsin order. For each{provider, modelFromEnv}:- Check availability (Ollama up? Cloud API key for
providerset inprocess.env?). - If available:
chosenProvider = provider,chosenModel = requestedModelName || modelFromEnv. Break.
- Check availability (Ollama up? Cloud API key for
- If no provider found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational."
- Iterate
- Execute Analysis (Node.js handles all AI calls):
- Read
input.image_pathinto aBuffer. Base64 encode. - If
chosenProvideris "ollama": HTTP POST to Ollama (usingprocess.env.OLLAMA_BASE_URL) with Base64 image,input.question,chosenModel. Handle Ollama API errors. - If
chosenProvideris "openai": Use OpenAI SDK/HTTP with Base64 image,input.question,chosenModel, and API key fromprocess.env.OPENAI_API_KEY. Handle OpenAI API errors. - (Similar for other cloud providers).
- Read
- Construct MCP
ToolResponse.
- Validate input. Server pre-checks
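The provider/model selection steps above can be sketched as a pure function. Availability checking (Ollama reachability, API key presence) is injected as a predicate so the ordering logic stands alone; `chooseProvider` and the `Chosen` result type are assumed names, and the error strings are the ones specified above.

```typescript
interface ConfiguredItem { provider: string; model: string; }
type Chosen = { provider: string; model: string } | { error: string };

function chooseProvider(
  configuredItems: ConfiguredItem[],     // parsed from AI_PROVIDERS, in order
  requestedType: string,                 // "auto" or a specific provider name
  requestedModel: string | undefined,    // optional client override
  isAvailable: (provider: string) => boolean,
): Chosen {
  if (requestedType !== "auto") {
    // A specific provider must be enabled in AI_PROVIDERS.
    const match = configuredItems.find((c) => c.provider === requestedType);
    if (!match) {
      return { error: `Provider '${requestedType}' is not enabled in server's AI_PROVIDERS configuration.` };
    }
    return { provider: requestedType, model: requestedModel || match.model };
  }
  // "auto": first configured provider that is currently operational wins.
  for (const { provider, model } of configuredItems) {
    if (isAvailable(provider)) {
      return { provider, model: requestedModel || model };
    }
  }
  return { error: "No configured AI providers in AI_PROVIDERS are currently operational." };
}
```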
- MCP Output Schema (`ToolResponse`):
  - `content`: `[{ type: "text", text: "<AI's analysis/answer>" }]`
  - `analysis_text`: `string` (core AI answer).
  - `model_used`: `string` (e.g., "ollama/llava:7b", "openai/gpt-4o"), the actual provider/model pair used.
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }` (for AI provider API errors).
Tool 3: peekaboo.list
- MCP Description: "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching."
- MCP Input Schema (`ListInputSchema`):

      z.object({
        item_type: z.enum(["running_applications", "application_windows", "server_status"])
          .default("running_applications")
          .describe("What to list. 'server_status' returns Peekaboo server info."),
        app: z.string().optional()
          .describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."),
        include_window_details: z.array(
          z.enum(["off_screen", "bounds", "ids"])
        ).optional()
          .describe("Optional, for 'application_windows'. Additional window details. Example: ['bounds', 'ids']")
      }).refine(data => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""), {
        message: "For 'application_windows', 'app' identifier is required.",
        path: ["app"],
      }).refine(data => !data.include_window_details || data.item_type === "application_windows", {
        message: "'include_window_details' only for 'application_windows'.",
        path: ["include_window_details"],
      }).refine(data => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined), {
        message: "'app' and 'include_window_details' not applicable for 'server_status'.",
        path: ["item_type"]
      })

- Node.js Handler Logic:
  - If `input.item_type === "server_status"`: the handler directly calls `generateServerStatusString()` and returns it in `ToolResponse.content` as `[{type: "text"}]`. It does NOT call the Swift CLI and does NOT affect `hasSentInitialStatus`.
  - Else (for "running_applications" and "application_windows"): call Swift `peekaboo list ...` with mapped args (including joining the `include_window_details` array into a comma-separated string for the Swift CLI flag). Parse the Swift JSON. Format the MCP `ToolResponse`.
- MCP Output Schema (`ToolResponse`):
  - `content`: `[{ type: "text", text: "<Summary or Status String>" }]`
  - If `item_type: "running_applications"`: `application_list`: `Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>`.
  - If `item_type: "application_windows"`:
    - `window_list`: `Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x: number, y: number, w: number, h: number}; is_on_screen?: boolean }>`.
    - `target_application_info`: `{ app_name: string; bundle_id?: string; pid: number }`.
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }`
II. Swift CLI (peekaboo)
A. General CLI Design
- Executable Name: `peekaboo` (universal macOS binary: arm64 + x86_64).
- Argument Parser: Use the `swift-argument-parser` package.
- Top-Level Commands (subcommands of `peekaboo`): `image`, `list`. (No `analyze` command.)
- Global Option (for all commands/subcommands): `--json-output` (Boolean flag).
  - If present: all `stdout` from the Swift CLI MUST be a single, valid JSON object. `stderr` should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed.
  - If absent: output human-readable text to `stdout` and `stderr` as appropriate for direct CLI usage.
  - Success JSON Structure:

        {
          "success": true,
          "data": { /* Command-specific structured data */ },
          "messages": ["Optional user-facing status/warning message from Swift CLI operations"],
          "debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"]
        }

  - Error JSON Structure:

        {
          "success": false,
          "error": {
            "message": "Detailed, user-understandable error message.",
            "code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING
            "details": "Optional additional technical details or context."
          },
          "debug_logs": ["Contextual debug log leading to error"]
        }

  - Standardized Swift Error Codes (`error.code` values):
    - `PERMISSION_DENIED_SCREEN_RECORDING`
    - `PERMISSION_DENIED_ACCESSIBILITY` (if the Accessibility API is attempted for foregrounding)
    - `APP_NOT_FOUND` (general app lookup failure)
    - `AMBIGUOUS_APP_IDENTIFIER` (fuzzy match yields multiple candidates)
    - `WINDOW_NOT_FOUND`
    - `CAPTURE_FAILED` (general image capture error)
    - `FILE_IO_ERROR` (e.g., cannot write to the specified path)
    - `INVALID_ARGUMENT` (CLI argument validation failure)
    - `SIPS_ERROR` (if `sips` is used for PDF fallback and fails)
    - `INTERNAL_SWIFT_ERROR` (unexpected Swift runtime errors)
- Permissions Handling:
  - The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via `CGWindowListCopyWindowInfo`).
  - If Accessibility is used for `--capture-focus foreground` window raising, check that permission as well.
  - If permissions are missing, output the specific JSON error (e.g., code `PERMISSION_DENIED_SCREEN_RECORDING`) and exit. Do not hang or prompt interactively.
- Temporary File Management:
  - If the CLI needs to save an image temporarily (e.g., if `screencapture` is used as a fallback, or if no `--path` is given by Node.js), it uses `FileManager.default.temporaryDirectory` with unique filenames (e.g., `peekaboo_<uuid>_<info>.<format>`).
  - These self-created temporary files MUST be deleted by the Swift CLI after it has successfully generated and flushed its JSON output to `stdout`.
  - Files saved to a user/Node.js-specified `--path` are NEVER deleted by the Swift CLI.
- Internal Logging for `--json-output`:
  - When `--json-output` is active, internal verbose/debug messages are collected into the `debug_logs: [String]` array in the final JSON output. They are NOT printed to `stderr`.
  - For standalone CLI use (no `--json-output`), these debug messages can print to `stderr`.
B. peekaboo image Command
- Options (defined using `swift-argument-parser`):
  - `--app <String?>`: App identifier.
  - `--path <String?>`: Base output directory or file prefix/path.
  - `--mode <ModeEnum?>`: `ModeEnum` is `screen`, `window`, or `multi`. Default logic: if `--app` is given, `window`; otherwise `screen`.
  - `--window-title <String?>`: For `--mode window`.
  - `--window-index <Int?>`: For `--mode window`.
  - `--format <FormatEnum?>`: `FormatEnum` is `png` or `jpg`. Default `png`.
  - `--capture-focus <FocusEnum?>`: `FocusEnum` is `background` or `foreground`. Default `background`.
- Behavior:
  - Implements fuzzy app matching. On ambiguity, returns a JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"` and lists the potential matches in `error.details` or `error.message`.
  - Always attempts to exclude the window shadow/frame (`CGWindowImageOption.boundsIgnoreFraming`, or `screencapture -o` if shelled out). No cursor is captured.
  - Background Capture (`--capture-focus background`, the default):
    - Primary method: uses `CGWindowListCopyWindowInfo` to identify the target window(s)/screen(s).
    - Captures via `CGDisplayCreateImage` (screen mode) or `CGWindowListCreateImageFromArray` (window/multi modes).
    - Converts the `CGImage` to `Data` (PNG or JPG) and saves it to file (at the user-supplied `--path` or its own temp path).
  - Foreground Capture (`--capture-focus foreground`):
    - Activates the app using `NSRunningApplication.activate(options: [.activateIgnoringOtherApps])`.
    - If a specific window needs raising (e.g., from `--window-index`, or a specific `--window-title` for an app with many windows), it may attempt to use the Accessibility API (`AXUIElementPerformAction(kAXRaiseAction)`) if available and permissioned.
    - If the specific window raise fails (or Accessibility is not used/permitted), it logs a warning to the `debug_logs` array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app.
    - The capture mechanism still preferably uses native CG APIs.
  - Multi-Screen (`--mode screen`): enumerates `CGGetActiveDisplayList`, captures each display using `CGDisplayCreateImage`. Filenames (if saving) get display-specific suffixes (e.g., `_display0_main.png`, `_display1.png`).
  - Multi-Window (`--mode multi`): uses `CGWindowListCopyWindowInfo` for the target app's PID and captures each relevant window (on-screen by default) with `CGWindowListCreateImageFromArray`. Filenames get window-specific suffixes.
  - PDF Format Handling (per the Q7 decision): PDF output has been removed. Were `--format pdf` still supported, it would use `Process` to call `screencapture -t pdf` with `-R<bounds>` or `-l<id>`; since PDF is removed, this path is not applicable.
- JSON Output `data` field structure (on success):

      {
        "saved_files": [ // Array is always present, even if empty (e.g., capture failed before saving)
          {
            "path": "/absolute/path/to/saved/image.png", // Absolute path
            "item_label": "Display 1 / Main", // Or window_title for window/multi modes
            "window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant
            "window_index": 0, // Optional, if relevant (e.g., for multi-window or indexed capture)
            "mime_type": "image/png" // Actual MIME type of the saved file
          }
          // ... more items if mode is screen or multi ...
        ]
      }
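Since the Node.js server consumes this payload across a process boundary, it may want to validate the shape before using it. A minimal sketch of such a guard on the Node side; the guard name `isImageData` and the interfaces are illustrative, mirroring the JSON structure above.

```typescript
interface SavedFileEntry {
  path: string;
  item_label?: string;
  window_id?: number;
  window_index?: number;
  mime_type: string;
}
interface ImageData { saved_files: SavedFileEntry[]; }

function isImageData(value: unknown): value is ImageData {
  if (typeof value !== "object" || value === null) return false;
  const files = (value as { saved_files?: unknown }).saved_files;
  // The array must be present, even if empty.
  if (!Array.isArray(files)) return false;
  return files.every(
    (f) =>
      typeof f === "object" && f !== null &&
      typeof (f as SavedFileEntry).path === "string" &&
      typeof (f as SavedFileEntry).mime_type === "string",
  );
}
```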
C. peekaboo list Command
- Subcommands & Options:
peekaboo list apps [--json-output]peekaboo list windows --app <app_identifier_string> [--include-details <comma_separated_string_of_options>] [--json-output]--include-detailsoptions:off_screen,bounds,ids.
- Behavior:
  - `apps`: uses `NSWorkspace.shared.runningApplications`. For each app, retrieves `localizedName`, `bundleIdentifier`, `processIdentifier` (pid), and `isActive`. To get `window_count`, it performs a `CGWindowListCopyWindowInfo` call filtered by the app's PID and counts on-screen windows.
  - `windows`:
    - Resolves the `app_identifier` using fuzzy matching. If ambiguous, returns a JSON error.
    - Uses `CGWindowListCopyWindowInfo` filtered by the target app's PID.
    - If `--include-details` contains `"off_screen"`, uses `CGWindowListOption.optionAllScreenWindows` (and includes the `kCGWindowIsOnscreen` boolean in output). Otherwise, uses `CGWindowListOption.optionOnScreenOnly`.
    - Extracts `kCGWindowName` (title).
    - If `"ids"` is in `--include-details`, extracts `kCGWindowNumber` as `window_id`.
    - If `"bounds"` is in `--include-details`, extracts `kCGWindowBounds` as `bounds: {x, y, width, height}`.
    - `window_index` is the 0-based index into the filtered array returned by `CGWindowListCopyWindowInfo` (reflecting z-order for on-screen windows).
- JSON Output `data` field structure (on success):
  - For `apps`:

        {
          "applications": [
            {
              "app_name": "Safari",
              "bundle_id": "com.apple.Safari",
              "pid": 501,
              "is_active": true,
              "window_count": 3 // Count of on-screen windows for this app
            }
            // ... more applications ...
          ]
        }

  - For `windows`:

        {
          "target_application_info": {
            "app_name": "Safari",
            "pid": 501,
            "bundle_id": "com.apple.Safari"
          },
          "windows": [
            {
              "window_title": "Apple",
              "window_id": 67, // if "ids" requested
              "window_index": 0,
              "is_on_screen": true, // Potentially useful, especially if "off_screen" included
              "bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested
            }
            // ... more windows ...
          ]
        }
III. Build, Packaging & Distribution
- Swift CLI (`peekaboo`):
  - `Package.swift` defines an executable product named `peekaboo`.
  - Build process (e.g., part of NPM `prepublishOnly` or a separate build script): `swift build -c release --arch arm64 --arch x86_64`.
  - The resulting universal binary (e.g., from `.build/apple/Products/Release/peekaboo`) is copied to the root of the `peekaboo-mcp` NPM package directory before publishing.
- Node.js MCP Server:
  - TypeScript is compiled to JavaScript (e.g., into `dist/`) using `tsc`.
  - The NPM package includes `dist/` and the `peekaboo` Swift binary (at the package root).
IV. Documentation (README.md for peekaboo-mcp NPM Package)
- Project Overview: Briefly state vision and components.
- Prerequisites:
- macOS version (e.g., 12.0+ or as required by Swift/APIs).
- Xcode Command Line Tools (recommended for a stable macOS development environment, even if not strictly required by the final Swift binary for all operations).
- Ollama (if using local Ollama for analysis) + instructions to pull models.
- Installation:
  - Primary: `npm install -g peekaboo-mcp`.
  - Alternative: `npx peekaboo-mcp`.
- MCP Client Configuration:
  - Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use `peekaboo-mcp`.
  - Example for VS Code/Cursor using `npx` for robustness:

        {
          "mcpServers": {
            "PeekabooMCP": {
              "command": "npx",
              "args": ["peekaboo-mcp"],
              "env": {
                "AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o",
                "OPENAI_API_KEY": "sk-yourkeyhere"
                /* other ENV VARS */
              }
            }
          }
        }
- Required macOS Permissions:
  - Screen Recording: Essential for ALL `peekaboo.image` functionality and for `peekaboo.list` when it needs to read window titles (which it does via `CGWindowListCopyWindowInfo`). Provide clear, step-by-step instructions for System Settings, including the command `open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"`.
  - Accessibility: Required only if `peekaboo.image` with `capture_focus: "foreground"` needs to perform specific window-raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance, and include the command `open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"`.
- Environment Variables (for the Node.js `peekaboo-mcp` server):
  - `AI_PROVIDERS`: Crucial for `peekaboo.analyze`. Explain the format (`provider/model,provider/model`), its effect, and that `peekaboo.analyze` reports "not configured" if unset. List the recognized `provider` names ("ollama", "openai").
  - `OPENAI_API_KEY` (and similar for other cloud providers): How they are used.
  - `OLLAMA_BASE_URL`: Default and purpose.
  - `LOG_LEVEL`: For the `pino` logger. Values and default.
  - `PEEKABOO_MCP_CONSOLE_LOGGING`: For development.
  - `PEEKABOO_CLI_PATH`: For overriding the bundled Swift CLI.
- MCP Tool Overview: Brief descriptions of `peekaboo.image`, `peekaboo.analyze`, and `peekaboo.list` and their primary purposes.
- Link to Detailed Tool Specification: A separate `TOOL_API_REFERENCE.md` (generated from, or summarizing, the Zod schemas and output structures in this document) for users/AI developers needing full schema details.
- Troubleshooting / Support: Link to GitHub issues.