mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-03-25 09:25:47 +00:00
32 KiB
Peekaboo: Full & Final Detailed Specification v1.1.1
https://aistudio.google.com/prompts/1B0Va41QEZz5ZMiGmLl2gDme8kQ-LQPW-
Project Vision: Peekaboo is a macOS utility exposed via a Node.js MCP server, enabling AI agents to perform advanced screen captures, image analysis via user-configured AI providers, and query application/window information. The core macOS interactions are handled by a native Swift command-line interface (CLI) named peekaboo, which is called by the Node.js server. All image captures automatically exclude window shadows/frames.
Core Components:
- Node.js/TypeScript MCP Server (`peekaboo-mcp`):
  - NPM Package Name: `peekaboo-mcp`.
  - GitHub Project Name: `peekaboo`.
  - Implements MCP server logic using the latest stable `@modelcontextprotocol/sdk`.
  - Exposes three primary MCP tools: `image`, `analyze`, `list`.
  - Translates MCP tool calls into commands for the Swift `peekaboo` CLI.
  - Parses structured JSON output from the Swift `peekaboo` CLI.
  - Handles image data preparation (reading files, Base64 encoding) for MCP responses if image data is explicitly requested by the client.
  - Manages interaction with configured AI providers based on environment variables. All AI provider calls (Ollama, OpenAI, etc.) are made from this Node.js layer.
  - Implements robust logging to a file using `pino`, ensuring no logs interfere with MCP stdio communication.
- Swift CLI (`peekaboo`):
  - A standalone macOS command-line tool, built as a universal binary (arm64 + x86_64).
  - Handles all direct macOS system interactions: image capture, application/window listing, and fuzzy application matching.
  - Does NOT directly interact with any AI providers (Ollama, OpenAI, etc.).
  - Outputs all results and errors in a structured JSON format via a global `--json-output` flag. This JSON includes a `debug_logs` array for internal Swift CLI logs, which the Node.js server can relay to its own logger.
  - The `peekaboo` binary is bundled at the root of the `peekaboo-mcp` NPM package.
I. Node.js/TypeScript MCP Server (peekaboo-mcp)
A. Project Setup & Distribution
- Language/Runtime: Node.js (latest LTS recommended, e.g., v18+ or v20+), TypeScript (latest stable, e.g., v5+).
- Package Manager: NPM.
- `package.json`:
  - `name`: `"peekaboo-mcp"`
  - `version`: Semantic versioning (e.g., `1.1.1`).
  - `type`: `"module"` (for ES Modules).
  - `main`: `"dist/index.js"` (compiled server entry point).
  - `bin`: `{ "peekaboo-mcp": "dist/index.js" }`.
  - `files`: `["dist/", "peekaboo"]` (includes compiled JS and the Swift `peekaboo` binary at the package root).
  - `scripts`:
    - `build`: Command to compile TypeScript (e.g., `tsc`).
    - `start`: `node dist/index.js`.
    - `prepublishOnly`: `npm run build`.
  - `dependencies`: `@modelcontextprotocol/sdk` (latest stable), `zod` (for input validation), `pino` (for logging), relevant cloud AI SDKs (e.g., `openai`, `@anthropic-ai/sdk`).
  - `devDependencies`: `typescript`, `@types/node`, `pino-pretty` (for optional development console logging).
- Distribution: Published to NPM. Installable via `npm i -g peekaboo-mcp` or usable with `npx peekaboo-mcp`.
- Swift CLI Location Strategy:
  - The Node.js server first checks the environment variable `PEEKABOO_CLI_PATH`. If it is set and points to a valid executable, that path is used.
  - If `PEEKABOO_CLI_PATH` is not set or invalid, the server falls back to a bundled path, resolved relative to its own script location (e.g., `path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'peekaboo')`, assuming the compiled server script is in `dist/` and the `peekaboo` binary is at the package root).
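The fallback strategy above can be sketched as a small helper. This is illustrative only: `resolveCliPath` and its parameters are not names from the codebase, and the environment value is passed in so the fallback logic stays easy to test.

```typescript
import fs from "node:fs";
import path from "node:path";

// Resolve the Swift CLI path per the strategy above: honor the env override
// only when it points at an existing file, otherwise fall back to the binary
// bundled one level above the compiled server script (dist/).
function resolveCliPath(envOverride: string | undefined, compiledScriptDir: string): string {
  if (envOverride && fs.existsSync(envOverride)) {
    return envOverride;
  }
  return path.resolve(compiledScriptDir, "..", "peekaboo");
}
```

In the server itself, `compiledScriptDir` would come from `path.dirname(fileURLToPath(import.meta.url))`, and `envOverride` from `process.env.PEEKABOO_CLI_PATH`.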
B. Server Initialization & Configuration (src/index.ts)
- Imports: `McpServer`, `StdioServerTransport` from `@modelcontextprotocol/sdk`; `pino` from `pino`; `os`, `path` from Node.js built-ins.
- Server Info: `name: "PeekabooMCP"`, `version: <package_version from package.json>`.
- Server Capabilities: Advertise the `tools` capability.
- Logging (Pino):
  - Instantiate the `pino` logger.
  - Default Transport: File transport to `path.join(os.tmpdir(), 'peekaboo-mcp.log')`. Use the `mkdir: true` option for the destination.
  - Log Level: Controlled by the ENV VAR `LOG_LEVEL` (standard Pino levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`). Default: `"info"`.
  - Conditional Console Logging (Development Only): If the ENV VAR `PEEKABOO_MCP_CONSOLE_LOGGING="true"`, add a second Pino transport targeting `process.stderr.fd` (potentially using `pino-pretty` for human-readable output).
  - Strict Rule: All server operational logging must use the configured Pino instance. No direct `console.log/warn/error` that might output to `stdout`.
- Environment Variables (Read by Server):
  - `AI_PROVIDERS`: Comma-separated list of `provider_name/default_model_for_provider` pairs (e.g., `"openai/gpt-4o,ollama/qwen2.5vl:7b"`). If unset or empty, the `analyze` tool reports that AI is not configured.
  - `OPENAI_API_KEY`: API key for OpenAI.
  - `ANTHROPIC_API_KEY`: (Example for future) API key for Anthropic.
  - (Other cloud provider API keys as standard ENV VAR names.)
  - `OLLAMA_BASE_URL`: Base URL for the local Ollama instance. Default: `"http://localhost:11434"`.
  - `LOG_LEVEL`: For the Pino logger. Default: `"info"`.
  - `LOG_FILE`: Path to the server's log file. Default: `path.join(os.tmpdir(), 'peekaboo-mcp.log')`.
  - `DEFAULT_SAVE_PATH`: Default base absolute path for saving images captured by `image` when not specified in the tool input. If this ENV is also not set, the Swift CLI uses its own temporary directory logic.
  - `PEEKABOO_MCP_CONSOLE_LOGGING`: Boolean (`"true"`/`"false"`) for dev console logs. Default: `"false"`.
  - `PEEKABOO_CLI_PATH`: Optional override for the Swift `peekaboo` CLI path.
- Initial Status Reporting Logic:
  - A server-instance-level boolean flag: `let hasSentInitialStatus = false;`.
  - A function `generateServerStatusString()`: Creates a formatted string: `"\n\n--- Peekaboo MCP Server Status ---\nName: PeekabooMCP\nVersion: <server_version>\nConfigured AI Providers (from AI_PROVIDERS ENV): <parsed list or 'None Configured. Set AI_PROVIDERS ENV.'>\n---"`.
  - Response Augmentation: In the function that sends a `ToolResponse` back to the MCP client, if the response is for a successful tool call (not `initialize`/`initialized`, and not `list` with `item_type: "server_status"`) AND `hasSentInitialStatus` is `false`:
    - Append `generateServerStatusString()` to the first `TextContentItem` in `ToolResponse.content`. If no text item exists, prepend a new one.
    - Set `hasSentInitialStatus = true`.
- Tool Registration: Register `image`, `analyze`, and `list` with their Zod input schemas and handler functions.
- Transport: `await server.connect(new StdioServerTransport());`.
- Shutdown: Implement graceful shutdown on `SIGINT`/`SIGTERM` (e.g., `await server.close(); logger.flush(); process.exit(0);`).
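A minimal sketch of the status-string generator described above. The signature is illustrative: the `AI_PROVIDERS` value is passed in rather than read from `process.env`, which keeps the formatting logic pure.

```typescript
// Build the one-time status block appended to the first tool response.
// Follows the template in the Initial Status Reporting Logic above.
function generateServerStatusString(serverVersion: string, aiProvidersEnv: string | undefined): string {
  const providers = (aiProvidersEnv ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const providerText =
    providers.length > 0 ? providers.join(", ") : "None Configured. Set AI_PROVIDERS ENV.";
  return [
    "",
    "",
    "--- Peekaboo MCP Server Status ---",
    "Name: PeekabooMCP",
    `Version: ${serverVersion}`,
    `Configured AI Providers (from AI_PROVIDERS ENV): ${providerText}`,
    "---",
  ].join("\n");
}
```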
C. MCP Tool Specifications & Node.js Handler Logic
General Node.js Handler Pattern (for tools calling the Swift `peekaboo` CLI):
- Validate the MCP `input` against the tool's Zod schema. If invalid, log the error with Pino and return an MCP error `ToolResponse`.
- Construct command-line arguments for the Swift `peekaboo` CLI based on the MCP `input`. Always include `--json-output`.
- Log the constructed Swift command with Pino at `debug` level.
- Execute the Swift `peekaboo` CLI using `child_process.spawn`, capturing `stdout`, `stderr`, and `exitCode`.
- If any data is received on the Swift CLI's `stderr`, log it immediately with Pino at `warn` level, prefixed (e.g., `[SwiftCLI-stderr]`).
- On Swift CLI process close:
  - If `exitCode !== 0` or `stdout` is empty or not parseable as JSON:
    - Log failure details with Pino (`error` level).
    - Construct an MCP error `ToolResponse` (e.g., `errorCode: "SWIFT_CLI_EXECUTION_ERROR"` or `SWIFT_CLI_INVALID_OUTPUT` in `_meta`). The message should include relevant parts of the raw `stdout`/`stderr` if available.
  - If `exitCode === 0`:
    - Attempt to parse `stdout` as JSON. If parsing fails, treat it as an error (above).
    - Let `swiftResponse = JSON.parse(stdout)`.
    - If `swiftResponse.debug_logs` (an array of strings) exists, log each entry via Pino at `debug` level, clearly marked as coming from the backend (e.g., `logger.debug({ backend: "swift", swift_log: entry })`).
    - If `swiftResponse.success === false`:
      - Extract `swiftResponse.error.message`, `swiftResponse.error.code`, and `swiftResponse.error.details`.
      - Construct and return an MCP error `ToolResponse`, relaying these details (e.g., `message` in `content`, `code` in `_meta.backend_error_code`).
    - If `swiftResponse.success === true`:
      - Process `swiftResponse.data` to construct the success MCP `ToolResponse`.
      - Relay `swiftResponse.messages` as `TextContentItem`s in the MCP response if appropriate.
      - For `image` with `input.return_data: true`:
        - Iterate `swiftResponse.data.saved_files[*].path`.
        - For each path, read the image file into a `Buffer`.
        - Base64 encode the `Buffer`.
        - Construct an `ImageContentItem` for `ToolResponse.content`, including `data` (Base64 string) and `mimeType` (from `swiftResponse.data.saved_files[*].mime_type`).
- Augment a successful `ToolResponse` with the initial server status string if applicable (see B.6).
- Send the MCP `ToolResponse`.
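The branching in the close handler above can be captured in a pure helper that classifies the Swift CLI's output before any MCP response is built. This is a sketch: the `SwiftResult` type and `classifySwiftOutput` name are hypothetical, not from the codebase.

```typescript
// Classify raw Swift CLI output into the three cases the handler must cover:
// valid success envelope, valid error envelope, or unusable output.
type SwiftResult =
  | { kind: "ok"; data: unknown; messages: string[]; debugLogs: string[] }
  | { kind: "backend_error"; code: string; message: string; debugLogs: string[] }
  | { kind: "invalid_output"; raw: string };

function classifySwiftOutput(exitCode: number, stdout: string): SwiftResult {
  let parsed: any = null;
  try {
    parsed = stdout.trim() ? JSON.parse(stdout) : null;
  } catch {
    parsed = null;
  }
  // Non-zero exit or unparseable/empty stdout: treat as execution failure.
  if (exitCode !== 0 || parsed === null || typeof parsed !== "object") {
    return { kind: "invalid_output", raw: stdout };
  }
  const debugLogs: string[] = Array.isArray(parsed.debug_logs) ? parsed.debug_logs : [];
  if (parsed.success === true) {
    return { kind: "ok", data: parsed.data, messages: parsed.messages ?? [], debugLogs };
  }
  // success === false: relay the structured error from the Swift envelope.
  return {
    kind: "backend_error",
    code: parsed.error?.code ?? "SWIFT_CLI_EXECUTION_ERROR",
    message: parsed.error?.message ?? "Unknown Swift CLI error",
    debugLogs,
  };
}
```

The real handler would then log `debugLogs` via Pino and map each variant onto the corresponding `ToolResponse`.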
Tool 1: image
- MCP Description: "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
- MCP Input Schema (`ImageInputSchema`):

  ```typescript
  z.object({
    app: z.string().optional()
      .describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
    path: z.string().optional()
      .describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by the backend. If omitted, the server checks the DEFAULT_SAVE_PATH environment variable. If neither is set, the Swift CLI uses its default temporary paths. If 'return_data' is true, images are saved AND returned when a path is determined (either from input or ENV)."),
    mode: z.enum(["screen", "window", "multi"]).optional()
      .describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
    window_specifier: z.union([
      z.object({ title: z.string().describe("Capture window by title.") }),
      z.object({ index: z.number().int().nonnegative()
        .describe("Capture window by index (0 = frontmost). 'capture_focus' might need to be 'foreground'.") }),
    ]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to the main/frontmost window of the target app."),
    format: z.enum(["png", "jpg"]).optional().default("png")
      .describe("Output image format. Defaults to 'png'."),
    return_data: z.boolean().optional().default(false)
      .describe("Optional. If true, image data is returned in the response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode)."),
    capture_focus: z.enum(["background", "foreground"]).optional().default("background")
      .describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture."),
  })
  ```

- Node.js Handler Default `mode` Logic: If `input.app` is provided and `input.mode` is undefined, `mode = "window"`. If no `input.app` and `input.mode` is undefined, `mode = "screen"`.
- MCP Output Schema (`ToolResponse`):
  - `content`: `Array<ImageContentItem | TextContentItem>`
    - If `input.return_data: true`: contains `ImageContentItem`(s): `{ type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
    - May contain `TextContentItem`(s) (summary, file paths from `saved_files`, Swift CLI `messages`).
  - `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }>` (taken directly from the Swift CLI JSON `data.saved_files` when images were saved).
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }` (for relaying Swift CLI error codes).
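Two pieces of the handler described above can be sketched directly: the default-`mode` rule and the `return_data` conversion of saved files into `ImageContentItem`s. Type shapes follow the schemas above; the helper names are illustrative.

```typescript
import fs from "node:fs";

type CaptureMode = "screen" | "window" | "multi";

// Default-mode rule: "window" when an app is targeted, "screen" otherwise.
function resolveCaptureMode(app?: string, mode?: CaptureMode): CaptureMode {
  if (mode !== undefined) return mode;
  return app !== undefined ? "window" : "screen";
}

interface SavedFile { path: string; mime_type: string; item_label?: string }
interface ImageContentItem {
  type: "image";
  data: string; // Base64, no data: URI prefix
  mimeType: string;
  metadata?: { item_label?: string; source_path?: string };
}

// return_data path: read each saved file, Base64 encode it, and wrap it as
// an MCP image content item per the output schema above.
function toImageContentItems(savedFiles: SavedFile[]): ImageContentItem[] {
  return savedFiles.map((f) => ({
    type: "image" as const,
    data: fs.readFileSync(f.path).toString("base64"),
    mimeType: f.mime_type,
    metadata: { item_label: f.item_label, source_path: f.path },
  }));
}
```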
Tool 2: analyze
- MCP Description: "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires an image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides."
- MCP Input Schema (`AnalyzeInputSchema`):

  ```typescript
  z.object({
    image_path: z.string()
      .describe("Required. Absolute path to the image file (.png, .jpg, .webp) to be analyzed."),
    question: z.string()
      .describe("Required. Question for the AI about the image."),
    provider_config: z.object({
      type: z.enum(["auto", "ollama", "openai" /* future: "anthropic_api" */]).default("auto")
        .describe("AI provider. 'auto' uses the server's AI_PROVIDERS ENV preference. A specific provider must be enabled in the server's AI_PROVIDERS."),
      model: z.string().optional()
        .describe("Optional. Model name. If omitted, uses the model from the server's AI_PROVIDERS for the chosen provider, or an internal default for that provider."),
    }).optional().describe("Optional. Explicit provider/model. Validated against the server's AI_PROVIDERS."),
  })
  ```

- Node.js Handler Logic:
  - Validate input. The server pre-checks the `image_path` extension (`.png`, `.jpg`, `.jpeg`, `.webp`); return an MCP error if it is not recognized.
  - Read `process.env.AI_PROVIDERS`. If unset or empty, return the MCP error "AI analysis not configured on this server. Set the AI_PROVIDERS environment variable." Log this with Pino (`error` level).
  - Parse `AI_PROVIDERS` into `configuredItems = [{provider: string, model: string}]`.
  - Determine Provider & Model:
    - `requestedProviderType = input.provider_config?.type || "auto"`.
    - `requestedModelName = input.provider_config?.model`.
    - `chosenProvider: string | null = null`, `chosenModel: string | null = null`.
    - If `requestedProviderType !== "auto"`:
      - Find the entry in `configuredItems` where `provider === requestedProviderType`.
      - If not found, MCP error: "Provider '{requestedProviderType}' is not enabled in server's AI_PROVIDERS configuration."
      - `chosenProvider = requestedProviderType`. `chosenModel = requestedModelName || model_from_matching_configuredItem || hardcoded_default_for_chosenProvider`.
    - Else (`requestedProviderType === "auto"`):
      - Iterate `configuredItems` in order. For each `{provider, modelFromEnv}`:
        - Check availability (is Ollama up? Is the cloud API key for `provider` set in `process.env`?).
        - If available: `chosenProvider = provider`, `chosenModel = requestedModelName || modelFromEnv`. Break.
      - If no provider is found after iteration, MCP error: "No configured AI providers in AI_PROVIDERS are currently operational."
  - Execute Analysis (Node.js handles all AI calls):
    - Read `input.image_path` into a `Buffer`. Base64 encode it.
    - If `chosenProvider` is "ollama": HTTP POST to Ollama (using `process.env.OLLAMA_BASE_URL`) with the Base64 image, `input.question`, and `chosenModel`. Handle Ollama API errors.
    - If `chosenProvider` is "openai": Use the OpenAI SDK/HTTP with the Base64 image, `input.question`, `chosenModel`, and the API key from `process.env.OPENAI_API_KEY`. Handle OpenAI API errors.
    - (Similar for other cloud providers.)
  - Construct the MCP `ToolResponse`.
- MCP Output Schema (`ToolResponse`):
  - `content`: `[{ type: "text", text: "<AI's analysis/answer>" }]`
  - `analysis_text`: `string` (core AI answer).
  - `model_used`: `string` (e.g., "ollama/llava:7b", "openai/gpt-4o"): the actual provider/model pair used.
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }` (for AI provider API errors).
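The provider/model resolution above can be sketched as pure functions, with availability checking abstracted into a predicate so the ordering logic stays testable. Names are illustrative; the real server would probe Ollama or check API keys inside the predicate.

```typescript
interface ProviderEntry { provider: string; model: string }

// Parse the AI_PROVIDERS value ("openai/gpt-4o,ollama/qwen2.5vl:7b") into
// ordered {provider, model} pairs, tolerating stray whitespace.
function parseAiProviders(env: string | undefined): ProviderEntry[] {
  return (env ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.includes("/"))
    .map((item) => {
      const [provider, ...rest] = item.split("/");
      return { provider: provider.trim(), model: rest.join("/").trim() };
    });
}

// Resolve the provider and model per the handler logic above: an explicit
// type must be enabled in AI_PROVIDERS; "auto" takes the first available
// entry in configured order. Client-requested model always wins.
function chooseProvider(
  configured: ProviderEntry[],
  requestedType: string, // "auto" or a concrete provider name
  requestedModel: string | undefined,
  isAvailable: (provider: string) => boolean
): ProviderEntry | { error: string } {
  if (requestedType !== "auto") {
    const match = configured.find((c) => c.provider === requestedType);
    if (!match) {
      return { error: `Provider '${requestedType}' is not enabled in server's AI_PROVIDERS configuration.` };
    }
    return { provider: requestedType, model: requestedModel ?? match.model };
  }
  for (const entry of configured) {
    if (isAvailable(entry.provider)) {
      return { provider: entry.provider, model: requestedModel ?? entry.model };
    }
  }
  return { error: "No configured AI providers in AI_PROVIDERS are currently operational." };
}
```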
Tool 3: list
- MCP Description: "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App identification uses fuzzy matching."
- MCP Input Schema (`ListInputSchema`):

  ```typescript
  z.object({
    item_type: z.enum(["running_applications", "application_windows", "server_status"])
      .default("running_applications")
      .describe("What to list. 'server_status' returns Peekaboo server info."),
    app: z.string().optional()
      .describe("Required if 'item_type' is 'application_windows'. Target application. Uses fuzzy matching."),
    include_window_details: z.array(z.enum(["off_screen", "bounds", "ids"])).optional()
      .describe("Optional, for 'application_windows'. Additional window details. Example: ['bounds', 'ids']"),
  }).refine(
    (data) => data.item_type !== "application_windows" || (data.app !== undefined && data.app.trim() !== ""),
    { message: "For 'application_windows', 'app' identifier is required.", path: ["app"] }
  ).refine(
    (data) => !data.include_window_details || data.item_type === "application_windows",
    { message: "'include_window_details' only for 'application_windows'.", path: ["include_window_details"] }
  ).refine(
    (data) => data.item_type !== "server_status" || (data.app === undefined && data.include_window_details === undefined),
    { message: "'app' and 'include_window_details' not applicable for 'server_status'.", path: ["item_type"] }
  )
  ```

- Node.js Handler Logic:
  - If `input.item_type === "server_status"`: the handler directly calls `generateServerStatusString()` and returns it in `ToolResponse.content` as `[{type: "text"}]`. It does NOT call the Swift CLI and does NOT affect `hasSentInitialStatus`.
  - Else (for "running_applications" and "application_windows"): call the Swift `peekaboo list ...` with mapped args (including joining the `include_window_details` array into a comma-separated string for the Swift CLI flag). Parse the Swift JSON. Format the MCP `ToolResponse`.
- MCP Output Schema (`ToolResponse`):
  - `content`: `[{ type: "text", text: "<Summary or Status String>" }]`
  - If `item_type: "running_applications"`: `application_list`: `Array<{ app_name: string; bundle_id: string; pid: number; is_active: boolean; window_count: number }>`.
  - If `item_type: "application_windows"`:
    - `window_list`: `Array<{ window_title: string; window_id?: number; window_index?: number; bounds?: {x: number, y: number, w: number, h: number}; is_on_screen?: boolean }>`.
    - `target_application_info`: `{ app_name: string; bundle_id?: string; pid: number }`.
  - `isError?: boolean`
  - `_meta?: { backend_error_code?: string }`
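A sketch of the argument mapping for the two Swift-backed variants, including the comma-joining of `include_window_details`. The helper name is illustrative; flag names follow the Swift CLI options in section II.C.

```typescript
// Map validated list-tool input onto Swift CLI argv. --json-output is always
// included, per the general handler pattern.
function buildListArgs(
  itemType: "running_applications" | "application_windows",
  app?: string,
  includeWindowDetails?: Array<"off_screen" | "bounds" | "ids">
): string[] {
  if (itemType === "running_applications") {
    return ["list", "apps", "--json-output"];
  }
  const args = ["list", "windows", "--app", app ?? "", "--json-output"];
  if (includeWindowDetails && includeWindowDetails.length > 0) {
    // The Swift CLI takes a single comma-separated string for --include-details.
    args.push("--include-details", includeWindowDetails.join(","));
  }
  return args;
}
```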
II. Swift CLI (peekaboo)
A. General CLI Design
- Executable Name: `peekaboo` (universal macOS binary: arm64 + x86_64).
- Argument Parser: Use the `swift-argument-parser` package.
- Top-Level Commands (subcommands of `peekaboo`): `image`, `list`. (No `analyze` command.)
- Global Option (for all commands/subcommands): `--json-output` (Boolean flag).
  - If present: all `stdout` from the Swift CLI MUST be a single, valid JSON object. `stderr` should be empty on success, or may contain system-level error text on catastrophic failure before JSON can be formed.
  - If absent: output human-readable text to `stdout` and `stderr` as appropriate for direct CLI usage.
  - Success JSON Structure:

    ```jsonc
    {
      "success": true,
      "data": { /* Command-specific structured data */ },
      "messages": ["Optional user-facing status/warning message from Swift CLI operations"],
      "debug_logs": ["Internal Swift CLI debug log entry 1", "Another trace message"]
    }
    ```

  - Error JSON Structure:

    ```jsonc
    {
      "success": false,
      "error": {
        "message": "Detailed, user-understandable error message.",
        "code": "SWIFT_ERROR_CODE_STRING", // e.g., PERMISSION_DENIED_SCREEN_RECORDING
        "details": "Optional additional technical details or context."
      },
      "debug_logs": ["Contextual debug log leading to error"]
    }
    ```

  - Standardized Swift Error Codes (`error.code` values):
    - `PERMISSION_DENIED_SCREEN_RECORDING`
    - `PERMISSION_DENIED_ACCESSIBILITY` (if the Accessibility API is attempted for foregrounding)
    - `APP_NOT_FOUND` (general app lookup failure)
    - `AMBIGUOUS_APP_IDENTIFIER` (fuzzy match yields multiple candidates)
    - `WINDOW_NOT_FOUND`
    - `CAPTURE_FAILED` (general image capture error)
    - `FILE_IO_ERROR` (e.g., cannot write to the specified path)
    - `INVALID_ARGUMENT` (CLI argument validation failure)
    - `SIPS_ERROR` (if `sips` is used for PDF fallback and fails)
    - `INTERNAL_SWIFT_ERROR` (unexpected Swift runtime errors)
- Permissions Handling:
  - The CLI must proactively check for Screen Recording permission before attempting any capture or window listing that requires it (e.g., reading window titles via `CGWindowListCopyWindowInfo`).
  - If Accessibility is used for `--capture-focus foreground` window raising, check that permission as well.
  - If permissions are missing, output the specific JSON error (e.g., code `PERMISSION_DENIED_SCREEN_RECORDING`) and exit. Do not hang or prompt interactively.
- Temporary File Management:
  - If the CLI needs to save an image temporarily (e.g., if `screencapture` is used as a fallback for PDF, or if no `--path` is given by Node.js), it uses `FileManager.default.temporaryDirectory` with unique filenames (e.g., `peekaboo_<uuid>_<info>.<format>`).
  - These self-created temporary files MUST be deleted by the Swift CLI after it has successfully generated and flushed its JSON output to `stdout`.
  - Files saved to a user/Node.js-specified `--path` are NEVER deleted by the Swift CLI.
- Internal Logging for `--json-output`:
  - When `--json-output` is active, internal verbose/debug messages are collected into the `debug_logs: [String]` array in the final JSON output. They are NOT printed to `stderr`.
  - For standalone CLI use (no `--json-output`), these debug messages may print to `stderr`.
B. peekaboo image Command
- Options (defined using `swift-argument-parser`):
  - `--app <String?>`: App identifier.
  - `--path <String?>`: Base output directory or file prefix/path.
  - `--mode <ModeEnum?>`: `ModeEnum` is `screen`, `window`, `multi`. Default logic: if `--app` then `window`, else `screen`.
  - `--window-title <String?>`: For `mode window`.
  - `--window-index <Int?>`: For `mode window`.
  - `--format <FormatEnum?>`: `FormatEnum` is `png`, `jpg`. Default `png`.
  - `--capture-focus <FocusEnum?>`: `FocusEnum` is `background`, `foreground`. Default `background`.
- Behavior:
  - Implements fuzzy app matching. On ambiguity, returns a JSON error with `code: "AMBIGUOUS_APP_IDENTIFIER"` and lists potential matches in `error.details` or `error.message`.
  - Always attempts to exclude the window shadow/frame (`CGWindowImageOption.boundsIgnoreFraming`, or `screencapture -o` if shelled out for PDF). No cursor is captured.
  - Background Capture (`--capture-focus background` or default):
    - Primary method: uses `CGWindowListCopyWindowInfo` to identify the target window(s)/screen(s).
    - Captures via `CGDisplayCreateImage` (for screen mode) or `CGWindowListCreateImageFromArray` (for window/multi modes).
    - Converts the `CGImage` to `Data` (PNG or JPG) and saves it to a file (at the user `--path` or its own temp path).
  - Foreground Capture (`--capture-focus foreground`):
    - Activates the app using `NSRunningApplication.activate(options: [.activateIgnoringOtherApps])`.
    - If a specific window needs raising (e.g., from `--window-index` or a specific `--window-title` for an app with many windows), it may attempt to use the Accessibility API (`AXUIElementPerformAction(kAXRaiseAction)`) if available and permissioned.
    - If the specific window raise fails (or Accessibility is not used/permitted), it logs a warning to the `debug_logs` array (e.g., "Could not raise specific window; proceeding with frontmost of activated app.") and captures the most suitable front window of the activated app.
    - The capture mechanism is still preferably native CG APIs.
  - Multi-Screen (`--mode screen`): enumerates `CGGetActiveDisplayList` and captures each display using `CGDisplayCreateImage`. Filenames (if saving) get display-specific suffixes (e.g., `_display0_main.png`, `_display1.png`).
  - Multi-Window (`--mode multi`): uses `CGWindowListCopyWindowInfo` for the target app's PID and captures each relevant window (on-screen by default) with `CGWindowListCreateImageFromArray`. Filenames get window-specific suffixes.
  - PDF Format Handling (per the Q7 decision): PDF output has been removed. If `--format pdf` were still supported, it would use `Process` to call `screencapture -t pdf -R<bounds>` or `-l<id>`; since PDF is removed, this is not applicable.
- JSON Output `data` field structure (on success):

  ```jsonc
  {
    "saved_files": [ // Array is always present, even if empty (e.g., capture failed before saving)
      {
        "path": "/absolute/path/to/saved/image.png", // Absolute path
        "item_label": "Display 1 / Main", // Or window_title for window/multi modes
        "window_id": 12345, // CGWindowID (UInt32), optional, if available & relevant
        "window_index": 0, // Optional, if relevant (e.g., for multi-window or indexed capture)
        "mime_type": "image/png" // Actual MIME type of the saved file
      }
      // ... more items if mode is screen or multi ...
    ]
  }
  ```
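On the Node.js side, this payload can be modeled with a narrow runtime check before the handler trusts its fields. This is a sketch; the type and function names are illustrative, and only the two required fields are validated.

```typescript
// Node-side model of the image command's `data` payload shown above.
interface SavedFileEntry {
  path: string;
  mime_type: string;
  item_label?: string;
  window_id?: number;
  window_index?: number;
}

// Type guard: accepts the payload only when saved_files is an array whose
// entries carry the two required string fields (path, mime_type).
function isSavedFilesPayload(data: unknown): data is { saved_files: SavedFileEntry[] } {
  if (typeof data !== "object" || data === null) return false;
  const files = (data as { saved_files?: unknown }).saved_files;
  return (
    Array.isArray(files) &&
    files.every(
      (f) =>
        typeof f === "object" &&
        f !== null &&
        typeof (f as SavedFileEntry).path === "string" &&
        typeof (f as SavedFileEntry).mime_type === "string"
    )
  );
}
```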
C. peekaboo list Command
- Subcommands & Options:
  - `peekaboo list apps [--json-output]`
  - `peekaboo list windows --app <app_identifier_string> [--include-details <comma_separated_string_of_options>] [--json-output]`
    - `--include-details` options: `off_screen`, `bounds`, `ids`.
- Behavior:
  - `apps`: uses `NSWorkspace.shared.runningApplications`. For each app, retrieves `localizedName`, `bundleIdentifier`, `processIdentifier` (pid), and `isActive`. To get `window_count`, it performs a `CGWindowListCopyWindowInfo` call filtered by the app's PID and counts on-screen windows.
  - `windows`:
    - Resolves the `app_identifier` using fuzzy matching. If ambiguous, returns a JSON error.
    - Uses `CGWindowListCopyWindowInfo` filtered by the target app's PID.
    - If `--include-details` contains `"off_screen"`, uses `CGWindowListOption.optionAllScreenWindows` (and includes the `kCGWindowIsOnscreen` boolean in the output). Otherwise, uses `CGWindowListOption.optionOnScreenOnly`.
    - Extracts `kCGWindowName` (title).
    - If `"ids"` is in `--include-details`, extracts `kCGWindowNumber` as `window_id`.
    - If `"bounds"` is in `--include-details`, extracts `kCGWindowBounds` as `bounds: {x, y, width, height}`.
    - `window_index` is the 0-based index into the filtered array returned by `CGWindowListCopyWindowInfo` (reflecting z-order for on-screen windows).
- JSON Output `data` field structure (on success):
  - For `apps`:

    ```jsonc
    {
      "applications": [
        {
          "app_name": "Safari",
          "bundle_id": "com.apple.Safari",
          "pid": 501,
          "is_active": true,
          "window_count": 3 // Count of on-screen windows for this app
        }
        // ... more applications ...
      ]
    }
    ```

  - For `windows`:

    ```jsonc
    {
      "target_application_info": {
        "app_name": "Safari",
        "pid": 501,
        "bundle_id": "com.apple.Safari"
      },
      "windows": [
        {
          "window_title": "Apple",
          "window_id": 67, // if "ids" requested
          "window_index": 0,
          "is_on_screen": true, // potentially useful, especially if "off_screen" included
          "bounds": {"x": 0, "y": 0, "width": 800, "height": 600} // if "bounds" requested
        }
        // ... more windows ...
      ]
    }
    ```
III. Build, Packaging & Distribution
- Swift CLI (`peekaboo`):
  - `Package.swift` defines an executable product named `peekaboo`.
  - Build process (e.g., part of the NPM `prepublishOnly` step or a separate build script): `swift build -c release --arch arm64 --arch x86_64`.
  - The resulting universal binary (e.g., from `.build/apple/Products/Release/peekaboo`) is copied to the root of the `peekaboo-mcp` NPM package directory before publishing.
- Node.js MCP Server:
  - TypeScript is compiled to JavaScript (e.g., into `dist/`) using `tsc`.
  - The NPM package includes `dist/` and the `peekaboo` Swift binary (at the package root).
IV. Documentation (README.md for peekaboo-mcp NPM Package)
- Project Overview: Briefly state the vision and components.
- Prerequisites:
  - macOS version (e.g., 12.0+, or as required by Swift/APIs).
  - Xcode Command Line Tools (recommended for a stable development environment on macOS, even if not strictly required by the final Swift binary for all operations).
  - Ollama (if using local Ollama for analysis), plus instructions to pull models.
- Installation:
  - Primary: `npm install -g peekaboo-mcp`.
  - Alternative: `npx peekaboo-mcp`.
- MCP Client Configuration:
  - Provide example JSON snippets for configuring popular MCP clients (e.g., VS Code, Cursor) to use `peekaboo-mcp`.
  - Example for VS Code/Cursor using `npx` for robustness:

    ```jsonc
    {
      "mcpServers": {
        "PeekabooMCP": {
          "command": "npx",
          "args": ["peekaboo-mcp"],
          "env": {
            "AI_PROVIDERS": "ollama/llava:latest,openai/gpt-4o",
            "OPENAI_API_KEY": "sk-yourkeyhere"
            /* other ENV VARS */
          }
        }
      }
    }
    ```

- Required macOS Permissions:
  - Screen Recording: Essential for ALL `image` functionality, and for `list` when it needs to read window titles (which it does, via `CGWindowListCopyWindowInfo`). Provide clear, step-by-step instructions for System Settings. Include the command `open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"`.
  - Accessibility: Required only if `image` with `capture_focus: "foreground"` needs to perform specific window-raising actions (beyond simple app activation) via the Accessibility API. Explain this nuance. Include the command `open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"`.
- Environment Variables (for the Node.js `peekaboo-mcp` server):
  - `AI_PROVIDERS`: Crucial for `analyze`. Explain the format (`provider/model,provider/model`), its effect, and that `analyze` reports "not configured" if unset. List recognized `provider` names ("ollama", "openai").
  - `OPENAI_API_KEY` (and similar for other cloud providers): How they are used.
  - `OLLAMA_BASE_URL`: Default and purpose.
  - `LOG_LEVEL`: For the `pino` logger. Values and default.
  - `LOG_FILE`: Path to the server's log file. Default: `path.join(os.tmpdir(), 'peekaboo-mcp.log')`.
  - `DEFAULT_SAVE_PATH`: Default base absolute path for saving images captured by `image` if not specified in the tool input. If this ENV is also not set, the Swift CLI uses its own temporary directory logic.
  - `PEEKABOO_MCP_CONSOLE_LOGGING`: For development.
  - `PEEKABOO_CLI_PATH`: For overriding the bundled Swift CLI.
- MCP Tool Overview: Brief descriptions of `image`, `analyze`, and `list` and their primary purposes.
- Link to Detailed Tool Specification: A separate `TOOL_API_REFERENCE.md` (generated from, or summarizing, the Zod schemas and output structures in this document) for users/AI developers needing full schema details.
- Troubleshooting / Support: Link to GitHub issues.