mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-03-25 09:25:47 +00:00
Add Swift linting and enhance image capture features
- Add SwiftLint and SwiftFormat configuration with npm scripts - Refactor Swift code to comply with linting rules: - Fix identifier naming (x/y → xCoordinate/yCoordinate) - Extract long functions into smaller methods - Fix code style violations - Enhance image capture tool: - Add blur detection parameter - Support custom image formats and quality - Add flexible naming patterns for saved files - Add comprehensive integration tests for image tool - Update documentation with new linting commands 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
parent
f41e70e23e
commit
53ec5ef9a4
16 changed files with 1536 additions and 793 deletions
|
|
@ -47,6 +47,12 @@ npm start
|
|||
|
||||
# Clean build artifacts
|
||||
npm run clean
|
||||
|
||||
# Lint Swift code
|
||||
npm run lint:swift
|
||||
|
||||
# Format Swift code
|
||||
npm run format:swift
|
||||
```
|
||||
|
||||
### Testing the MCP server
|
||||
|
|
|
|||
96
README.md
96
README.md
|
|
@ -522,79 +522,39 @@ Once summoned, Peekaboo grants you three supernatural abilities:
|
|||
|
||||
### 🖼️ `image` - Soul Capture
|
||||
|
||||
The `image` tool is a powerful utility for capturing screenshots of the entire screen, specific application windows, or all windows of an application. It now also supports optional AI-powered analysis of the captured image.
|
||||
|
||||
**Dual Capability: Capture & Analyze**
|
||||
|
||||
* **Capture Only**: By default, or if only capture-related parameters are provided, the tool functions purely as a screenshot utility. It can save images to a specified path and/or return image data directly in the response.
|
||||
* **Capture & Analyze**: If the `question` parameter is provided, the tool first captures the image as specified. Then, it sends this image along with the `question` to an AI model for analysis. The analysis results are returned alongside capture information. When a `question` is provided, image data is *not* returned in `base64_data` format to avoid excessively large responses; instead, focus is on the analysis result. If you need the image file itself when performing analysis, ensure you provide a `path` for it to be saved.
|
||||
|
||||
The AI analysis capabilities leverage the same configuration as the `analyze` tool, primarily through the `PEEKABOO_AI_PROVIDERS` environment variable. You can also specify a particular AI provider and model per call using the `provider_config` parameter.
|
||||
Captures macOS screen content and optionally analyzes it. Window shadows/frames are automatically excluded.
|
||||
|
||||
**Parameters:**
|
||||
|
||||
* `app` (string, optional):
|
||||
* **Description**: Target application for capture. Can be the application's name (e.g., "Safari"), bundle ID (e.g., "com.apple.Safari"), or a partial name.
|
||||
* **Default**: If omitted, the tool captures the entire screen(s) or the focused window based on other parameters.
|
||||
* **Behavior**: Uses fuzzy matching to find the application.
|
||||
* `path` (string, optional):
|
||||
* **Description**: The base absolute path where the captured image(s) should be saved.
|
||||
* **Default**: If omitted, images are saved to a default temporary path generated by the backend. If `question` is provided and `path` is omitted, a temporary path is used for the capture before analysis, and this temporary file is cleaned up afterwards.
|
||||
* **Behavior**:
|
||||
* For `screen` or `multi` mode captures, the backend automatically appends display or window-specific information to this base path to create unique filenames (e.g., `your_path_Display1.png`, `your_path_Window1_Title.png`).
|
||||
* If `return_data` is `true` and a `path` is specified, images are both saved to the path AND returned as base64 data in the response (unless a `question` is also provided, in which case base64 data is not returned).
|
||||
* `mode` (enum: `"screen" | "window" | "multi"`, optional):
|
||||
* **Description**: Specifies the capture mode.
|
||||
* `"screen"`: Captures the entire content of each connected display.
|
||||
* `"window"`: Captures a single window of a target application. Requires `app` to be specified.
|
||||
* `"multi"`: Captures all windows of a target application individually. Requires `app` to be specified.
|
||||
* **Default**: Defaults to `"window"` if `app` is provided, otherwise defaults to `"screen"`.
|
||||
* `window_specifier` (object, optional):
|
||||
* **Description**: Used in `window` mode to specify which window of the target application to capture.
|
||||
* **Structure**: Can be one of:
|
||||
* `{ "title": "window_title_string" }`: Captures the window whose title contains the given string.
|
||||
* `{ "index": number }`: Captures the window by its index (0 for the frontmost window, 1 for the next, and so on). Using `index` might require setting `capture_focus` to `"foreground"`.
|
||||
* **Default**: If omitted in `window` mode, captures the main/frontmost window of the target application.
|
||||
* `format` (enum: `"png" | "jpg"`, optional):
|
||||
* **Description**: The desired output image format.
|
||||
* **Default**: `"png"`.
|
||||
* `return_data` (boolean, optional):
|
||||
* **Description**: If set to `true`, the image data (as base64 encoded string) is included in the response content.
|
||||
* **Default**: `false`.
|
||||
* **Behavior**:
|
||||
* For `"window"` mode, one image data item is returned.
|
||||
* For `"screen"` or `"multi"` mode, multiple image data items may be returned (one per display/window).
|
||||
* **Important**: If a `question` is provided for analysis, `base64_data` is *not* returned in the response, regardless of this flag, to keep response sizes manageable. The focus shifts to providing the analysis result.
|
||||
* `capture_focus` (enum: `"background" | "foreground"`, optional):
|
||||
* **Description**: Controls how the tool interacts with window focus during capture.
|
||||
* `"background"`: Captures the window without altering its current focus state or bringing the application to the front. This is generally preferred for non-interactive use.
|
||||
* `"foreground"`: Brings the target application and window to the front (activates it) before performing the capture. This might be necessary for certain applications or to ensure the desired window content, especially when using `window_specifier` by `index`.
|
||||
* **Default**: `"background"`.
|
||||
* `question` (string, optional):
|
||||
* **Description**: If a question string is provided, the tool will capture the image as specified and then send it (along with this question) to an AI model for analysis.
|
||||
* **Default**: None (no analysis is performed by default).
|
||||
* **Behavior**: The analysis result will be included in the response. `PEEKABOO_AI_PROVIDERS` environment variable is used for configuring AI, or `provider_config` can be used for per-call settings.
|
||||
* `provider_config` (object, optional):
|
||||
* **Description**: Specifies AI provider configuration for the analysis when a `question` is provided. This allows overriding the server's default AI provider settings for this specific call.
|
||||
* **Structure**: Same as the `analyze` tool's `provider_config` parameter (e.g., `{ "type": "ollama", "model": "llava" }` or `{ "type": "openai", "model": "gpt-4-vision-preview" }`).
|
||||
* **Default**: If omitted, the analysis will use the server's default AI configuration (from `PEEKABOO_AI_PROVIDERS`).
|
||||
* `app_target` (string, optional): Specifies the capture target. If omitted or empty, captures all screens.
|
||||
* Examples:
|
||||
* `"screen:INDEX"`: Captures the screen at the specified zero-based index (e.g., `"screen:0"`). (Note: Index selection from multiple screens is planned for full support in the Swift CLI).
|
||||
* `"frontmost"`: Aims to capture all windows of the current foreground application. (Note: This is a complex scenario; current implementation may default to screen capture if the exact foreground app cannot be reliably determined by the Node.js layer alone).
|
||||
* `"AppName"`: Captures all windows of the application named `AppName` (e.g., `"Safari"`, `"com.apple.Safari"`). Fuzzy matching is used.
|
||||
* `"AppName:WINDOW_TITLE:Title"`: Captures the window of `AppName` that has the specified `Title` (e.g., `"Notes:WINDOW_TITLE:My Important Note"`).
|
||||
* `"AppName:WINDOW_INDEX:Index"`: Captures the window of `AppName` at the specified zero-based `Index` (e.g., `"Preview:WINDOW_INDEX:0"` for the frontmost window of Preview).
|
||||
* `path` (string, optional): Base absolute path for saving the captured image(s). If `format` is `"data"` and `path` is also provided, the image is saved to this path (as a PNG) AND Base64 data is returned. If a `question` is provided and `path` is omitted, a temporary path is used for capture, and the file is deleted after analysis.
|
||||
* `question` (string, optional): If provided, the captured image will be analyzed. The server automatically selects an AI provider from those configured in the `PEEKABOO_AI_PROVIDERS` environment variable.
|
||||
* `format` (string, optional, default: `"png"`): Specifies the output image format or data return type.
|
||||
* `"png"` or `"jpg"`: Saves the image to the specified `path` in the chosen format. If `path` is not provided, this behaves like `"data"`.
|
||||
* `"data"`: Returns Base64 encoded PNG data of the image directly in the MCP response. If `path` is also specified, a PNG file is also saved to that `path`.
|
||||
* `capture_focus` (string, optional, default: `"background"`): Controls window focus behavior during capture.
|
||||
* `"background"`: Captures without altering the current window focus (default).
|
||||
* `"foreground"`: Attempts to bring the target application/window to the foreground before capture. This might be necessary for certain applications or to ensure a specific window is captured if multiple are open.
|
||||
|
||||
**Example Usage (Capture & Analyze):**
|
||||
**Behavior with `question` (AI Analysis):**
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "image",
|
||||
"arguments": {
|
||||
"mode": "window",
|
||||
"app": "Safari",
|
||||
"path": "~/Desktop/SafariScreenshot.png",
|
||||
"format": "png",
|
||||
"return_data": true,
|
||||
"capture_focus": "foreground",
|
||||
"question": "What is the main color visible in the top-left quadrant?"
|
||||
}
|
||||
}
|
||||
```
|
||||
* If a `question` is provided, the tool will capture the image (saving it to `path` if specified, or a temporary path otherwise).
|
||||
* This image is then sent to an AI model for analysis. The AI provider and model are chosen automatically by the server based on your `PEEKABOO_AI_PROVIDERS` environment variable (trying them in order until one succeeds).
|
||||
* The analysis result is returned as `analysis_text` in the response. Image data (Base64) is NOT returned in the `content` array when a question is asked.
|
||||
* If a temporary path was used for the image, it's deleted after the analysis attempt.
|
||||
|
||||
**Output Structure (Simplified):**
|
||||
|
||||
* `content`: Can contain `ImageContentItem` (if `format: "data"` or `path` was omitted, and no `question`) and/or `TextContentItem` (for summaries, analysis text, warnings).
|
||||
* `saved_files`: Array of objects, each detailing a file saved to `path` (if `path` was provided).
|
||||
* `analysis_text`: Text from AI (if `question` was asked).
|
||||
* `model_used`: AI model identifier (if `question` was asked).
|
||||
|
||||
### 👻 `list` - Spirit Detection
|
||||
|
||||
|
|
|
|||
83
docs/spec.md
83
docs/spec.md
|
|
@ -118,52 +118,77 @@ Configured AI Providers (from PEEKABOO_AI_PROVIDERS ENV): <parsed list or 'None
|
|||
|
||||
**Tool 1: `image`**
|
||||
|
||||
* **MCP Description:** "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. If a question is provided, the captured image is also analyzed by an AI model. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching."
|
||||
* **MCP Description:** "Captures macOS screen content and optionally analyzes it. Targets can be the entire screen (each display separately), a specific application window, or all windows of an application, controlled by `app_target`. Supports foreground/background capture. Captured image(s) can be saved to a file (`path`), returned as Base64 data (`format: \"data\"`), or both. If a `question` is provided, the captured image is analyzed by an AI model chosen automatically from `PEEKABOO_AI_PROVIDERS`. Window shadows/frames are excluded."
|
||||
* **MCP Input Schema (`ImageInputSchema`):**
|
||||
```typescript
|
||||
z.object({
|
||||
app: z.string().optional().describe("Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching."),
|
||||
path: z.string().optional().describe("Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted and no 'question' is provided, the server checks PEEKABOO_DEFAULT_SAVE_PATH or Swift CLI uses temporary paths. If 'question' is provided and 'path' is omitted, a temporary path is used for capture, and the file is deleted after analysis. If 'return_data' is true and no 'question' is asked, images saved AND returned if a path is determined."),
|
||||
mode: z.enum(["screen", "window", "multi"]).optional().describe("Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'."),
|
||||
window_specifier: z.union([
|
||||
z.object({ title: z.string().describe("Capture window by title.") }),
|
||||
z.object({ index: z.number().int().nonnegative().describe("Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.") }),
|
||||
]).optional().describe("Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app."),
|
||||
format: z.enum(["png", "jpg"]).optional().default("png").describe("Output image format. Defaults to 'png'."),
|
||||
return_data: z.boolean().optional().default(false).describe("Optional. If true AND no 'question' is provided, image data is returned in response content. If 'question' is provided, image data is NOT returned in the 'content' array, regardless of this flag, as 'analysis_text' becomes the primary payload."),
|
||||
app_target: z.string().optional().describe(
|
||||
"Optional. Specifies the capture target. Examples:\\n" +
|
||||
"- Omitted/empty: All screens.\\n" +
|
||||
"- 'screen:INDEX': Specific display (e.g., 'screen:0').\\n" +
|
||||
"- 'frontmost': All windows of the current foreground app.\\n" +
|
||||
"- 'AppName': All windows of 'AppName'.\\n" +
|
||||
"- 'AppName:WINDOW_TITLE:Title': Window of 'AppName' with 'Title'.\\n" +
|
||||
"- 'AppName:WINDOW_INDEX:Index': Window of 'AppName' at 'Index'."
|
||||
),
|
||||
path: z.string().optional().describe(
|
||||
"Optional. Base absolute path for saving the image. " +
|
||||
"If 'format' is 'data' and 'path' is also given, image is saved AND Base64 data returned. " +
|
||||
"If 'question' is provided and 'path' is omitted, a temporary path is used for capture, and the file is deleted after analysis."
|
||||
),
|
||||
question: z.string().optional().describe(
|
||||
"Optional. If provided, the captured image will be analyzed. " +
|
||||
"The server automatically selects an AI provider from 'PEEKABOO_AI_PROVIDERS'."
|
||||
),
|
||||
format: z.enum(["png", "jpg", "data"]).optional().default("png").describe(
|
||||
"Output format. 'png' or 'jpg' save to 'path' (if provided). " +
|
||||
"'data' returns Base64 encoded PNG data inline; if 'path' is also given, saves a PNG file to 'path' too. " +
|
||||
"If 'path' is not given, 'format' defaults to 'data' behavior (inline PNG data returned)."
|
||||
),
|
||||
capture_focus: z.enum(["background", "foreground"])
|
||||
.optional().default("background").describe("Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture."),
|
||||
question: z.string().optional().describe("Optional. If provided, the captured image will be analyzed using this question. Analysis results will be added to the output."),
|
||||
provider_config: z.object({
|
||||
type: z.enum(["auto", "ollama", "openai" /* "anthropic" is planned */])
|
||||
.default("auto")
|
||||
.describe("AI provider for analysis. 'auto' uses server's PEEKABOO_AI_PROVIDERS ENV preference. Specific provider must be enabled in server's PEEKABOO_AI_PROVIDERS."),
|
||||
model: z.string().optional().describe("Optional. Model name for analysis. If omitted, uses model from server's AI_PROVIDERS for chosen provider, or an internal default for that provider.")
|
||||
}).optional().describe("Optional. Explicit provider/model for analysis if 'question' is provided. Validated against server's PEEKABOO_AI_PROVIDERS.")
|
||||
.optional().default("background").describe(
|
||||
"Optional. Focus behavior. 'background' (default): capture without altering window focus. " +
|
||||
"'foreground': bring target to front before capture."
|
||||
)
|
||||
})
|
||||
```
|
||||
* **Node.js Handler - Default `mode` Logic:** If `input.app` provided & `input.mode` undefined, `mode="window"`. If no `input.app` & `input.mode` undefined, `mode="screen"`.
|
||||
* **Node.js Handler - `app_target` Parsing:** The handler will parse `app_target` to determine the Swift CLI arguments for `--app`, `--mode`, `--window-title`, or `--window-index`.
|
||||
* Omitted/empty `app_target`: maps to Swift CLI `--mode screen` (no `--app`).
|
||||
* `"screen:INDEX"`: maps to Swift CLI `--mode screen --screen-index INDEX` (custom Swift CLI flag might be needed or logic to select from multi-screen capture).
|
||||
* `"frontmost"`: Node.js determines frontmost app (e.g., via `list` tool logic or new Swift CLI helper), then calls Swift CLI with that app and `--mode multi` (or `window` for main window).
|
||||
* `"AppName"`: maps to Swift CLI `--app AppName --mode multi`.
|
||||
* `"AppName:WINDOW_TITLE:Title"`: maps to Swift CLI `--app AppName --mode window --window-title Title`.
|
||||
* `"AppName:WINDOW_INDEX:Index"`: maps to Swift CLI `--app AppName --mode window --window-index Index`.
|
||||
* **Node.js Handler - `format` and `path` Logic:**
|
||||
* If `input.format === "data"`: `return_data` becomes effectively true. If `input.path` is also set, the image is saved to `input.path` (as PNG) AND Base64 PNG data is returned.
|
||||
* If `input.format` is `"png"` or `"jpg"`:
|
||||
* If `input.path` is provided, the image is saved to `input.path` with the specified format. No Base64 data is returned unless `input.question` is also provided (for analysis).
|
||||
* If `input.path` is NOT provided: This implies `format: "data"` behavior; Base64 PNG data is returned.
|
||||
* If `input.question` is provided:
|
||||
* An `effectivePath` is determined (user's `input.path` or a temp path).
|
||||
* Image is captured to `effectivePath`.
|
||||
* Analysis proceeds as described below.
|
||||
* Base64 data is NOT returned in `content` due to analysis, but `analysis_text` is.
|
||||
* **Node.js Handler - Analysis Logic (if `input.question` is provided):**
|
||||
* An `effectivePath` is determined: if `input.path` is set, it's used. Otherwise, a temporary file path is generated.
|
||||
* Swift CLI is called to capture the image and save it to `effectivePath`.
|
||||
* The image file at `effectivePath` is read into a base64 string.
|
||||
* The AI provider and model are determined using logic similar to the `analyze` tool (checking `input.provider_config` and `PEEKABOO_AI_PROVIDERS`).
|
||||
* The AI provider and model are determined automatically by iterating through `PEEKABOO_AI_PROVIDERS` and selecting the first available/operational one (similar to `analyze` tool's "auto" mode, but without client `provider_config` override).
|
||||
* The image (base64) and `input.question` are sent to the chosen AI provider.
|
||||
* If a temporary path was used, the temporary image file and its directory are deleted after analysis attempt (success or failure).
|
||||
* If a temporary path was used for capture (because `input.path` was omitted), the temporary image file is deleted after analysis.
|
||||
* The `analysis_text` and `model_used` are added to the tool's response.
|
||||
* `base64_data` (image data) is *not* included in the `content` array of the response when a `question` is asked.
|
||||
* **Node.js Handler - Resilience with `path` and `return_data: true` (No `question`):** If `input.return_data` is true, `input.question` is NOT provided, and `input.path` is specified:
|
||||
* Base64 image data (`data` field in `ImageContentItem`) is *not* included in the `content` array of the response when a `question` is asked.
|
||||
* **Node.js Handler - Resilience with `path` and `format: "data"` (No `question`):** If `input.format === "data"`, `input.question` is NOT provided, and `input.path` is specified:
|
||||
* The handler will still attempt to process and return Base64 image data for successfully captured images even if the Swift CLI (or the handler itself) encounters an error saving to or reading from the user-specified `input.path` (or paths derived from it).
|
||||
* In such cases where image data is returned despite a save-to-path failure, a `TextContentItem` containing a "Peekaboo Warning:" message detailing the path saving issue will be included in the `ToolResponse.content`.
|
||||
* **MCP Output Schema (`ToolResponse`):**
|
||||
* `content`: `Array<ImageContentItem | TextContentItem>`
|
||||
* If `input.return_data: true` AND `input.question` is NOT provided: Contains `ImageContentItem`(s): `{ type: "image", data: "<base64_string_no_prefix>", mimeType: "image/<format>", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
|
||||
* If `input.question` IS provided, `ImageContentItem`s with base64 data are NOT added.
|
||||
* Always contains `TextContentItem`(s) (summary, file paths from `saved_files` if applicable, Swift CLI `messages`, and analysis results if a `question` was asked).
|
||||
* If `input.format === "data"` (or `path` was omitted, defaulting to "data" behavior) AND `input.question` is NOT provided: Contains one or more `ImageContentItem`(s): `{ type: "image", data: "<base64_png_string_no_prefix>", mimeType: "image/png", metadata?: { item_label?: string, window_title?: string, window_id?: number, source_path?: string } }`.
|
||||
* If `input.question` IS provided, `ImageContentItem`s with base64 image data are NOT added to `content`.
|
||||
* Always contains `TextContentItem`(s) (summary, file paths from `saved_files` if applicable and images were saved to persistent paths, Swift CLI `messages`, and analysis results if a `question` was asked).
|
||||
* `saved_files`: `Array<{ path: string, item_label?: string, window_title?: string, window_id?: number, mime_type: string }>`
|
||||
* If `input.question` is provided AND `input.path` was NOT specified (temp image used): This array will be empty as the temp file is deleted.
|
||||
* If `input.question` is provided AND `input.path` WAS specified: Contains information about the file saved at `input.path`.
|
||||
* If `input.question` is NOT provided: Populated as per original behavior (directly from Swift CLI JSON `data.saved_files` if images were saved).
|
||||
* Populated if `input.path` was provided (and not a temporary path for analysis that got deleted). The `mime_type` will reflect `input.format` if it was 'png' or 'jpg' and saved, or 'image/png' if `format: "data"` also saved a file.
|
||||
* If `input.question` is provided AND `input.path` was NOT specified (temp image used and deleted): This array will be empty.
|
||||
* `analysis_text?: string`: (Conditionally present if `input.question` was provided) Core AI answer or error/skip message.
|
||||
* `model_used?: string`: (Conditionally present if analysis was successful) e.g., "ollama/llava:7b", "openai/gpt-4o".
|
||||
* `isError?: boolean` (Can be true if capture fails, or if analysis is attempted but fails, even if capture succeeded).
|
||||
|
|
|
|||
|
|
@ -26,6 +26,8 @@
|
|||
"test:swift": "cd peekaboo-cli && swift test",
|
||||
"test:integration": "npm run build && npm run test:swift && vitest run",
|
||||
"test:all": "npm run test:integration",
|
||||
"lint:swift": "cd peekaboo-cli && swiftlint",
|
||||
"format:swift": "cd peekaboo-cli && swiftformat .",
|
||||
"postinstall": "chmod +x dist/index.js 2>/dev/null || true"
|
||||
},
|
||||
"keywords": [
|
||||
|
|
|
|||
42
peekaboo-cli/.swiftformat
Normal file
42
peekaboo-cli/.swiftformat
Normal file
|
|
@ -0,0 +1,42 @@
|
|||
# SwiftFormat configuration for Peekaboo CLI
|
||||
|
||||
# Format options
|
||||
--indent 4
|
||||
--indentcase false
|
||||
--trimwhitespace always
|
||||
--voidtype tuple
|
||||
--nospaceoperators ..<, ...
|
||||
--ifdef noindent
|
||||
--stripunusedargs closure-only
|
||||
--maxwidth 120
|
||||
|
||||
# Wrap options
|
||||
--wraparguments before-first
|
||||
--wrapparameters before-first
|
||||
--wrapcollections before-first
|
||||
--closingparen balanced
|
||||
|
||||
# Rules to enable
|
||||
--enable sortImports
|
||||
--enable duplicateImports
|
||||
--enable consecutiveSpaces
|
||||
--enable trailingSpace
|
||||
--enable blankLinesAroundMark
|
||||
--enable anyObjectProtocol
|
||||
--enable redundantReturn
|
||||
--enable redundantInit
|
||||
--enable redundantSelf
|
||||
--enable redundantType
|
||||
--enable redundantPattern
|
||||
--enable redundantGet
|
||||
--enable strongOutlets
|
||||
--enable unusedArguments
|
||||
|
||||
# Rules to disable
|
||||
--disable andOperator
|
||||
--disable trailingCommas
|
||||
--disable wrapMultilineStatementBraces
|
||||
|
||||
# Paths
|
||||
--exclude .build
|
||||
--exclude Package.swift
|
||||
60
peekaboo-cli/.swiftlint.yml
Normal file
60
peekaboo-cli/.swiftlint.yml
Normal file
|
|
@ -0,0 +1,60 @@
|
|||
# SwiftLint configuration for Peekaboo CLI
|
||||
|
||||
# Rules
|
||||
disabled_rules:
|
||||
- trailing_whitespace # Can be annoying with markdown
|
||||
|
||||
opt_in_rules:
|
||||
- empty_count
|
||||
- closure_spacing
|
||||
- contains_over_filter_count
|
||||
- contains_over_filter_is_empty
|
||||
- contains_over_first_not_nil
|
||||
- contains_over_range_nil_comparison
|
||||
- discouraged_object_literal
|
||||
- empty_string
|
||||
- first_where
|
||||
- last_where
|
||||
- legacy_multiple
|
||||
- prefer_self_type_over_type_of_self
|
||||
- sorted_first_last
|
||||
- trailing_closure
|
||||
- unneeded_parentheses_in_closure_argument
|
||||
- vertical_parameter_alignment_on_call
|
||||
|
||||
# Rule configurations
|
||||
line_length:
|
||||
warning: 120
|
||||
error: 150
|
||||
ignores_comments: true
|
||||
ignores_urls: true
|
||||
|
||||
type_body_length:
|
||||
warning: 300
|
||||
error: 400
|
||||
|
||||
file_length:
|
||||
warning: 500
|
||||
error: 1000
|
||||
|
||||
function_body_length:
|
||||
warning: 40
|
||||
error: 60
|
||||
|
||||
identifier_name:
|
||||
min_length:
|
||||
warning: 3
|
||||
error: 2
|
||||
max_length:
|
||||
warning: 40
|
||||
error: 50
|
||||
allowed_symbols: ["_"]
|
||||
|
||||
# Paths
|
||||
included:
|
||||
- Sources
|
||||
- Tests
|
||||
|
||||
excluded:
|
||||
- .build
|
||||
- Package.swift
|
||||
|
|
@ -12,66 +12,82 @@ class ApplicationFinder {
|
|||
Logger.shared.debug("Searching for application: \(identifier)")
|
||||
|
||||
let runningApps = NSWorkspace.shared.runningApplications
|
||||
|
||||
// Check for exact bundle ID match first
|
||||
if let exactMatch = runningApps.first(where: { $0.bundleIdentifier == identifier }) {
|
||||
Logger.shared.debug("Found exact bundle ID match: \(exactMatch.localizedName ?? "Unknown")")
|
||||
return exactMatch
|
||||
}
|
||||
|
||||
// Find all possible matches
|
||||
let matches = findAllMatches(for: identifier, in: runningApps)
|
||||
|
||||
// Get unique matches
|
||||
let uniqueMatches = removeDuplicateMatches(from: matches)
|
||||
|
||||
// Handle results
|
||||
return try processMatchResults(uniqueMatches, identifier: identifier, runningApps: runningApps)
|
||||
}
|
||||
|
||||
private static func findAllMatches(for identifier: String, in apps: [NSRunningApplication]) -> [AppMatch] {
|
||||
var matches: [AppMatch] = []
|
||||
let lowerIdentifier = identifier.lowercased()
|
||||
|
||||
// Exact bundle ID match (highest priority)
|
||||
if let exactBundleMatch = runningApps.first(where: { $0.bundleIdentifier == identifier }) {
|
||||
Logger.shared.debug("Found exact bundle ID match: \(exactBundleMatch.localizedName ?? "Unknown")")
|
||||
return exactBundleMatch
|
||||
}
|
||||
|
||||
// Exact name match (case insensitive)
|
||||
for app in runningApps {
|
||||
if let appName = app.localizedName, appName.lowercased() == identifier.lowercased() {
|
||||
matches.append(AppMatch(app: app, score: 1.0, matchType: "exact_name"))
|
||||
}
|
||||
}
|
||||
|
||||
// Partial name matches
|
||||
for app in runningApps {
|
||||
for app in apps {
|
||||
// Check exact name match
|
||||
if let appName = app.localizedName {
|
||||
let lowerAppName = appName.lowercased()
|
||||
let lowerIdentifier = identifier.lowercased()
|
||||
if appName.lowercased() == lowerIdentifier {
|
||||
matches.append(AppMatch(app: app, score: 1.0, matchType: "exact_name"))
|
||||
continue
|
||||
}
|
||||
|
||||
// Check if app name starts with identifier
|
||||
if lowerAppName.hasPrefix(lowerIdentifier) {
|
||||
let score = Double(lowerIdentifier.count) / Double(lowerAppName.count)
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "prefix"))
|
||||
}
|
||||
// Check if app name contains identifier
|
||||
else if lowerAppName.contains(lowerIdentifier) {
|
||||
let score = Double(lowerIdentifier.count) / Double(lowerAppName.count) * 0.8
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "contains"))
|
||||
}
|
||||
// Check partial name matches
|
||||
matches.append(contentsOf: findNameMatches(app: app, appName: appName, identifier: lowerIdentifier))
|
||||
}
|
||||
|
||||
// Check bundle ID partial matches
|
||||
if let bundleId = app.bundleIdentifier {
|
||||
let lowerBundleId = bundleId.lowercased()
|
||||
let lowerIdentifier = identifier.lowercased()
|
||||
|
||||
if lowerBundleId.contains(lowerIdentifier) {
|
||||
let score = Double(lowerIdentifier.count) / Double(lowerBundleId.count) * 0.6
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "bundle_contains"))
|
||||
}
|
||||
// Check bundle ID matches
|
||||
if let bundleId = app.bundleIdentifier, bundleId.lowercased().contains(lowerIdentifier) {
|
||||
let score = Double(lowerIdentifier.count) / Double(bundleId.count) * 0.6
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "bundle_contains"))
|
||||
}
|
||||
}
|
||||
|
||||
// Sort by score (highest first)
|
||||
matches.sort { $0.score > $1.score }
|
||||
return matches.sorted { $0.score > $1.score }
|
||||
}
|
||||
|
||||
// Remove duplicates (same app might match multiple ways)
|
||||
private static func findNameMatches(app: NSRunningApplication, appName: String, identifier: String) -> [AppMatch] {
|
||||
var matches: [AppMatch] = []
|
||||
let lowerAppName = appName.lowercased()
|
||||
|
||||
if lowerAppName.hasPrefix(identifier) {
|
||||
let score = Double(identifier.count) / Double(lowerAppName.count)
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "prefix"))
|
||||
} else if lowerAppName.contains(identifier) {
|
||||
let score = Double(identifier.count) / Double(lowerAppName.count) * 0.8
|
||||
matches.append(AppMatch(app: app, score: score, matchType: "contains"))
|
||||
}
|
||||
|
||||
return matches
|
||||
}
|
||||
|
||||
private static func removeDuplicateMatches(from matches: [AppMatch]) -> [AppMatch] {
|
||||
var uniqueMatches: [AppMatch] = []
|
||||
var seenPIDs: Set<pid_t> = []
|
||||
|
||||
for match in matches {
|
||||
if !seenPIDs.contains(match.app.processIdentifier) {
|
||||
uniqueMatches.append(match)
|
||||
seenPIDs.insert(match.app.processIdentifier)
|
||||
}
|
||||
for match in matches where !seenPIDs.contains(match.app.processIdentifier) {
|
||||
uniqueMatches.append(match)
|
||||
seenPIDs.insert(match.app.processIdentifier)
|
||||
}
|
||||
|
||||
if uniqueMatches.isEmpty {
|
||||
return uniqueMatches
|
||||
}
|
||||
|
||||
private static func processMatchResults(
|
||||
_ matches: [AppMatch],
|
||||
identifier: String,
|
||||
runningApps: [NSRunningApplication]
|
||||
) throws(ApplicationError) -> NSRunningApplication {
|
||||
guard !matches.isEmpty else {
|
||||
Logger.shared.error("No applications found matching: \(identifier)")
|
||||
outputError(
|
||||
message: "No running applications found matching identifier: \(identifier)",
|
||||
|
|
@ -82,33 +98,37 @@ class ApplicationFinder {
|
|||
throw ApplicationError.notFound(identifier)
|
||||
}
|
||||
|
||||
// Check for ambiguous matches (multiple high-scoring matches)
|
||||
let topScore = uniqueMatches[0].score
|
||||
let topMatches = uniqueMatches.filter { abs($0.score - topScore) < 0.1 }
|
||||
// Check for ambiguous matches
|
||||
let topScore = matches[0].score
|
||||
let topMatches = matches.filter { abs($0.score - topScore) < 0.1 }
|
||||
|
||||
if topMatches.count > 1 {
|
||||
let matchDescriptions = topMatches.map { match in
|
||||
"\(match.app.localizedName ?? "Unknown") (\(match.app.bundleIdentifier ?? "unknown.bundle"))"
|
||||
}
|
||||
|
||||
Logger.shared.error("Ambiguous application identifier: \(identifier)")
|
||||
outputError(
|
||||
message: "Multiple applications match identifier '\(identifier)'. Please be more specific.",
|
||||
code: .AMBIGUOUS_APP_IDENTIFIER,
|
||||
details: "Matches found: \(matchDescriptions.joined(separator: ", "))"
|
||||
)
|
||||
handleAmbiguousMatches(topMatches, identifier: identifier)
|
||||
throw ApplicationError.ambiguous(identifier, topMatches.map { $0.app })
|
||||
}
|
||||
|
||||
let bestMatch = uniqueMatches[0]
|
||||
let bestMatch = matches[0]
|
||||
Logger.shared.debug(
|
||||
"Found application: \(bestMatch.app.localizedName ?? "Unknown") " +
|
||||
"(score: \(bestMatch.score), type: \(bestMatch.matchType))"
|
||||
"(score: \(bestMatch.score), type: \(bestMatch.matchType))"
|
||||
)
|
||||
|
||||
return bestMatch.app
|
||||
}
|
||||
|
||||
private static func handleAmbiguousMatches(_ matches: [AppMatch], identifier: String) {
|
||||
let matchDescriptions = matches.map { match in
|
||||
"\(match.app.localizedName ?? "Unknown") (\(match.app.bundleIdentifier ?? "unknown.bundle"))"
|
||||
}
|
||||
|
||||
Logger.shared.error("Ambiguous application identifier: \(identifier)")
|
||||
outputError(
|
||||
message: "Multiple applications match identifier '\(identifier)'. Please be more specific.",
|
||||
code: .AMBIGUOUS_APP_IDENTIFIER,
|
||||
details: "Matches found: \(matchDescriptions.joined(separator: ", "))"
|
||||
)
|
||||
}
|
||||
|
||||
static func getAllRunningApplications() -> [ApplicationInfo] {
|
||||
Logger.shared.debug("Retrieving all running applications")
|
||||
|
||||
|
|
|
|||
|
|
@ -53,55 +53,61 @@ struct ImageCommand: ParsableCommand {
|
|||
|
||||
do {
|
||||
try PermissionsChecker.requireScreenRecordingPermission()
|
||||
|
||||
let captureMode = determineMode()
|
||||
let savedFiles: [SavedFile]
|
||||
|
||||
switch captureMode {
|
||||
case .screen:
|
||||
savedFiles = try captureAllScreens()
|
||||
case .window:
|
||||
if let app = app {
|
||||
savedFiles = try captureApplicationWindow(app)
|
||||
} else {
|
||||
throw CaptureError.appNotFound("No application specified for window capture")
|
||||
}
|
||||
case .multi:
|
||||
if let app = app {
|
||||
savedFiles = try captureAllApplicationWindows(app)
|
||||
} else {
|
||||
savedFiles = try captureAllScreens()
|
||||
}
|
||||
}
|
||||
|
||||
let data = ImageCaptureData(saved_files: savedFiles)
|
||||
|
||||
if jsonOutput {
|
||||
outputSuccess(data: data)
|
||||
} else {
|
||||
print("Captured \(savedFiles.count) image(s):")
|
||||
for file in savedFiles {
|
||||
print(" \(file.path)")
|
||||
}
|
||||
}
|
||||
|
||||
let savedFiles = try performCapture()
|
||||
outputResults(savedFiles)
|
||||
} catch {
|
||||
if jsonOutput {
|
||||
let code: ErrorCode = .CAPTURE_FAILED
|
||||
outputError(
|
||||
message: error.localizedDescription,
|
||||
code: code,
|
||||
details: "Image capture operation failed"
|
||||
)
|
||||
} else {
|
||||
// Create an instance for standard error for this specific print call
|
||||
var localStandardErrorStream = FileHandleTextOutputStream(FileHandle.standardError)
|
||||
print("Error: \(error.localizedDescription)", to: &localStandardErrorStream)
|
||||
}
|
||||
handleError(error)
|
||||
throw ExitCode.failure
|
||||
}
|
||||
}
|
||||
|
||||
private func performCapture() throws -> [SavedFile] {
|
||||
let captureMode = determineMode()
|
||||
|
||||
switch captureMode {
|
||||
case .screen:
|
||||
return try captureAllScreens()
|
||||
case .window:
|
||||
guard let app = app else {
|
||||
throw CaptureError.appNotFound("No application specified for window capture")
|
||||
}
|
||||
return try captureApplicationWindow(app)
|
||||
case .multi:
|
||||
if let app = app {
|
||||
return try captureAllApplicationWindows(app)
|
||||
} else {
|
||||
return try captureAllScreens()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private func outputResults(_ savedFiles: [SavedFile]) {
|
||||
let data = ImageCaptureData(saved_files: savedFiles)
|
||||
|
||||
if jsonOutput {
|
||||
outputSuccess(data: data)
|
||||
} else {
|
||||
print("Captured \(savedFiles.count) image(s):")
|
||||
for file in savedFiles {
|
||||
print(" \(file.path)")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private func handleError(_ error: Error) {
|
||||
if jsonOutput {
|
||||
let code: ErrorCode = .CAPTURE_FAILED
|
||||
outputError(
|
||||
message: error.localizedDescription,
|
||||
code: code,
|
||||
details: "Image capture operation failed"
|
||||
)
|
||||
} else {
|
||||
var localStandardErrorStream = FileHandleTextOutputStream(FileHandle.standardError)
|
||||
print("Error: \(error.localizedDescription)", to: &localStandardErrorStream)
|
||||
}
|
||||
}
|
||||
|
||||
private func determineMode() -> CaptureMode {
|
||||
if let mode = mode {
|
||||
return mode
|
||||
|
|
|
|||
|
|
@ -55,8 +55,8 @@ struct WindowInfo: Codable {
|
|||
}
|
||||
|
||||
struct WindowBounds: Codable {
|
||||
let x: Int
|
||||
let y: Int
|
||||
let xCoordinate: Int
|
||||
let yCoordinate: Int
|
||||
let width: Int
|
||||
let height: Int
|
||||
}
|
||||
|
|
|
|||
|
|
@ -6,6 +6,14 @@ class WindowManager {
|
|||
static func getWindowsForApp(pid: pid_t, includeOffScreen: Bool = false) throws(WindowError) -> [WindowData] {
|
||||
Logger.shared.debug("Getting windows for PID: \(pid)")
|
||||
|
||||
let windowList = try fetchWindowList(includeOffScreen: includeOffScreen)
|
||||
let windows = extractWindowsForPID(pid, from: windowList)
|
||||
|
||||
Logger.shared.debug("Found \(windows.count) windows for PID \(pid)")
|
||||
return windows.sorted { $0.windowIndex < $1.windowIndex }
|
||||
}
|
||||
|
||||
private static func fetchWindowList(includeOffScreen: Bool) throws(WindowError) -> [[String: Any]] {
|
||||
let options: CGWindowListOption = includeOffScreen
|
||||
? [.excludeDesktopElements]
|
||||
: [.excludeDesktopElements, .optionOnScreenOnly]
|
||||
|
|
@ -14,57 +22,56 @@ class WindowManager {
|
|||
throw WindowError.windowListFailed
|
||||
}
|
||||
|
||||
return windowList
|
||||
}
|
||||
|
||||
private static func extractWindowsForPID(_ pid: pid_t, from windowList: [[String: Any]]) -> [WindowData] {
|
||||
var windows: [WindowData] = []
|
||||
var windowIndex = 0
|
||||
|
||||
for windowInfo in windowList {
|
||||
guard let windowPID = windowInfo[kCGWindowOwnerPID as String] as? Int32,
|
||||
windowPID == pid
|
||||
else {
|
||||
continue
|
||||
if let window = parseWindowInfo(windowInfo, targetPID: pid, index: windowIndex) {
|
||||
windows.append(window)
|
||||
windowIndex += 1
|
||||
}
|
||||
|
||||
guard let windowID = windowInfo[kCGWindowNumber as String] as? CGWindowID else {
|
||||
continue
|
||||
}
|
||||
|
||||
let windowTitle = windowInfo[kCGWindowName as String] as? String ?? "Untitled"
|
||||
|
||||
// Get window bounds
|
||||
var bounds = CGRect.zero
|
||||
if let boundsDict = windowInfo[kCGWindowBounds as String] as? [String: Any] {
|
||||
let x = boundsDict["X"] as? Double ?? 0
|
||||
let y = boundsDict["Y"] as? Double ?? 0
|
||||
let width = boundsDict["Width"] as? Double ?? 0
|
||||
let height = boundsDict["Height"] as? Double ?? 0
|
||||
bounds = CGRect(x: x, y: y, width: width, height: height)
|
||||
}
|
||||
|
||||
// Determine if window is on screen
|
||||
let isOnScreen = windowInfo[kCGWindowIsOnscreen as String] as? Bool ?? true
|
||||
|
||||
let windowData = WindowData(
|
||||
windowId: windowID,
|
||||
title: windowTitle,
|
||||
bounds: bounds,
|
||||
isOnScreen: isOnScreen,
|
||||
windowIndex: windowIndex
|
||||
)
|
||||
|
||||
windows.append(windowData)
|
||||
windowIndex += 1
|
||||
}
|
||||
|
||||
// Sort by window layer (frontmost first)
|
||||
windows.sort { (first: WindowData, second: WindowData) -> Bool in
|
||||
// Windows with higher layer (closer to front) come first
|
||||
return first.windowIndex < second.windowIndex
|
||||
}
|
||||
|
||||
Logger.shared.debug("Found \(windows.count) windows for PID \(pid)")
|
||||
return windows
|
||||
}
|
||||
|
||||
private static func parseWindowInfo(_ info: [String: Any], targetPID: pid_t, index: Int) -> WindowData? {
|
||||
guard let windowPID = info[kCGWindowOwnerPID as String] as? Int32,
|
||||
windowPID == targetPID,
|
||||
let windowID = info[kCGWindowNumber as String] as? CGWindowID else {
|
||||
return nil
|
||||
}
|
||||
|
||||
let title = info[kCGWindowName as String] as? String ?? "Untitled"
|
||||
let bounds = extractWindowBounds(from: info)
|
||||
let isOnScreen = info[kCGWindowIsOnscreen as String] as? Bool ?? true
|
||||
|
||||
return WindowData(
|
||||
windowId: windowID,
|
||||
title: title,
|
||||
bounds: bounds,
|
||||
isOnScreen: isOnScreen,
|
||||
windowIndex: index
|
||||
)
|
||||
}
|
||||
|
||||
private static func extractWindowBounds(from windowInfo: [String: Any]) -> CGRect {
|
||||
guard let boundsDict = windowInfo[kCGWindowBounds as String] as? [String: Any] else {
|
||||
return .zero
|
||||
}
|
||||
|
||||
let xCoordinate = boundsDict["X"] as? Double ?? 0
|
||||
let yCoordinate = boundsDict["Y"] as? Double ?? 0
|
||||
let width = boundsDict["Width"] as? Double ?? 0
|
||||
let height = boundsDict["Height"] as? Double ?? 0
|
||||
|
||||
return CGRect(x: xCoordinate, y: yCoordinate, width: width, height: height)
|
||||
}
|
||||
|
||||
static func getWindowsInfoForApp(
|
||||
pid: pid_t,
|
||||
includeOffScreen: Bool = false,
|
||||
|
|
@ -79,8 +86,8 @@ class WindowManager {
|
|||
window_id: includeIDs ? windowData.windowId : nil,
|
||||
window_index: windowData.windowIndex,
|
||||
bounds: includeBounds ? WindowBounds(
|
||||
x: Int(windowData.bounds.origin.x),
|
||||
y: Int(windowData.bounds.origin.y),
|
||||
xCoordinate: Int(windowData.bounds.origin.x),
|
||||
yCoordinate: Int(windowData.bounds.origin.y),
|
||||
width: Int(windowData.bounds.size.width),
|
||||
height: Int(windowData.bounds.size.height)
|
||||
) : nil,
|
||||
|
|
|
|||
17
src/index.ts
17
src/index.ts
|
|
@ -102,18 +102,11 @@ server.setRequestHandler(ListToolsRequestSchema, async () => {
|
|||
tools: [
|
||||
{
|
||||
name: "image",
|
||||
description:
|
||||
`Captures macOS screen content and can optionally perform AI-powered analysis on the screenshot.
|
||||
|
||||
Core Capabilities:
|
||||
- Screen Capture: Captures the entire screen (handles multiple displays by capturing each as a separate image), a specific application window, or all windows of a target application.
|
||||
- Flexible Output: Saves images to a specified path and/or returns image data directly in the response.
|
||||
- AI Analysis: If a 'question' parameter is provided, the captured image is sent to an AI model for analysis (e.g., asking what's in the image, reading text).
|
||||
|
||||
Multi-Screen/Window Behavior:
|
||||
- 'screen' mode with multiple displays: Captures each display as an individual image. If a path is provided, files will be named like 'path_display1.png', 'path_display2.png'.
|
||||
- 'multi' mode for an app: Captures all visible windows of the specified application. If a path is provided, files will be named like 'path_window1_title.png', 'path_window2_title.png'.` +
|
||||
statusSuffix,
|
||||
description: `Captures macOS screen content and optionally analyzes it. \
|
||||
Targets can be entire screen, specific app window, or all windows of an app (via app_target). \
|
||||
Supports foreground/background capture. Output via file path or inline Base64 data (format: "data"). \
|
||||
If a question is provided, image is analyzed by an AI model (auto-selected from PEEKABOO_AI_PROVIDERS). \
|
||||
Window shadows/frames excluded. ${serverStatus}`,
|
||||
inputSchema: zodToJsonSchema(imageToolSchema),
|
||||
},
|
||||
{
|
||||
|
|
|
|||
|
|
@ -1,107 +1,26 @@
|
|||
import { z } from "zod";
|
||||
import {
|
||||
ToolContext,
|
||||
ImageCaptureData,
|
||||
SavedFile,
|
||||
AIProviderConfig,
|
||||
ToolResponse,
|
||||
AIProvider,
|
||||
ImageInput,
|
||||
imageToolSchema,
|
||||
} from "../types/index.js";
|
||||
import { executeSwiftCli, readImageAsBase64 } from "../utils/peekaboo-cli.js";
|
||||
import {
|
||||
determineProviderAndModel,
|
||||
analyzeImageWithProvider,
|
||||
parseAIProviders,
|
||||
isProviderAvailable,
|
||||
analyzeImageWithProvider,
|
||||
} from "../utils/ai-providers.js";
|
||||
import * as fs from "fs/promises";
|
||||
import * as pathModule from "path";
|
||||
import * as os from "os";
|
||||
import { Logger } from "pino";
|
||||
|
||||
export const imageToolSchema = z.object({
|
||||
app: z
|
||||
.string()
|
||||
.optional()
|
||||
.describe("Target application name or bundle ID."),
|
||||
question: z
|
||||
.string()
|
||||
.optional()
|
||||
.describe(
|
||||
"If provided, the captured image will be analyzed using this question. Analysis results will be added to the output.",
|
||||
),
|
||||
format: z
|
||||
.enum(["png", "jpg"])
|
||||
.optional()
|
||||
.default("png")
|
||||
.describe("Output image format. Defaults to 'png'."),
|
||||
return_data: z
|
||||
.boolean()
|
||||
.optional()
|
||||
.default(false)
|
||||
.describe(
|
||||
"Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode). If 'question' is provided, 'base64_data' is NOT returned regardless of this flag.",
|
||||
),
|
||||
capture_focus: z
|
||||
.enum(["background", "foreground"])
|
||||
.optional()
|
||||
.default("background")
|
||||
.describe(
|
||||
"Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.",
|
||||
),
|
||||
path: z
|
||||
.string()
|
||||
.optional()
|
||||
.describe(
|
||||
"Optional. Base absolute path for saving. For 'screen' or 'multi' mode, display/window info is appended by backend. If omitted, default temporary paths used by backend. If 'return_data' true, images saved AND returned if 'path' specified.",
|
||||
),
|
||||
mode: z
|
||||
.enum(["screen", "window", "multi"])
|
||||
.optional()
|
||||
.describe(
|
||||
"Capture mode. Defaults to 'window' if 'app' is provided, otherwise 'screen'.",
|
||||
),
|
||||
window_specifier: z
|
||||
.union([
|
||||
z.object({ title: z.string().describe("Capture window by title.") }),
|
||||
z.object({
|
||||
index: z
|
||||
.number()
|
||||
.int()
|
||||
.nonnegative()
|
||||
.describe(
|
||||
"Capture window by index (0=frontmost). 'capture_focus' might need to be 'foreground'.",
|
||||
),
|
||||
}),
|
||||
])
|
||||
.optional()
|
||||
.describe(
|
||||
"Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app.",
|
||||
),
|
||||
provider_config: z
|
||||
.object({
|
||||
type: z
|
||||
.enum(["auto", "ollama", "openai"])
|
||||
.default("auto")
|
||||
.describe(
|
||||
"AI provider type (e.g., 'ollama', 'openai'). 'auto' uses server default. Must be enabled on server.",
|
||||
),
|
||||
model: z
|
||||
.string()
|
||||
.optional()
|
||||
.describe(
|
||||
"Optional model name. If omitted, uses server default for the chosen provider type.",
|
||||
),
|
||||
})
|
||||
.optional()
|
||||
.describe(
|
||||
"Optional. Specify AI provider and model for analysis. Overrides server defaults.",
|
||||
),
|
||||
});
|
||||
|
||||
export type ImageToolInput = z.infer<typeof imageToolSchema>;
|
||||
export { imageToolSchema } from "../types/index.js";
|
||||
|
||||
export async function imageToolHandler(
|
||||
input: ImageToolInput,
|
||||
input: ImageInput,
|
||||
context: ToolContext,
|
||||
): Promise<ToolResponse> {
|
||||
const { logger } = context;
|
||||
|
|
@ -115,24 +34,28 @@ export async function imageToolHandler(
|
|||
try {
|
||||
logger.debug({ input }, "Processing peekaboo.image tool call");
|
||||
|
||||
// Determine effective path and format for Swift CLI
|
||||
let effectivePath = input.path;
|
||||
if (input.question && !input.path) {
|
||||
let swiftFormat = input.format === "data" ? "png" : (input.format || "png");
|
||||
|
||||
// Create temporary path if needed for analysis or data return without path
|
||||
const needsTempPath = (input.question && !input.path) || (!input.path && input.format === "data") || (!input.path && !input.format);
|
||||
if (needsTempPath) {
|
||||
const tempDir = await fs.mkdtemp(
|
||||
pathModule.join(os.tmpdir(), "peekaboo-img-"),
|
||||
);
|
||||
tempImagePathUsed = pathModule.join(
|
||||
tempDir,
|
||||
`capture.${input.format || "png"}`,
|
||||
`capture.${swiftFormat}`,
|
||||
);
|
||||
effectivePath = tempImagePathUsed;
|
||||
logger.debug(
|
||||
{ tempPath: tempImagePathUsed },
|
||||
"Using temporary path for capture as question is provided and no path specified.",
|
||||
"Using temporary path for capture.",
|
||||
);
|
||||
}
|
||||
|
||||
const cliInput = { ...input, path: effectivePath };
|
||||
const args = buildSwiftCliArgs(cliInput);
|
||||
const args = buildSwiftCliArgs(input, logger, effectivePath, swiftFormat);
|
||||
|
||||
const swiftResponse = await executeSwiftCli(args, logger);
|
||||
|
||||
|
|
@ -175,8 +98,18 @@ export async function imageToolHandler(
|
|||
|
||||
const captureData = swiftResponse.data as ImageCaptureData;
|
||||
const imagePathForAnalysis = captureData.saved_files[0].path;
|
||||
finalSavedFiles =
|
||||
input.question && tempImagePathUsed ? [] : captureData.saved_files || [];
|
||||
|
||||
// Determine which files to report as saved
|
||||
if (input.question && tempImagePathUsed) {
|
||||
// Analysis with temp path - don't include in saved_files
|
||||
finalSavedFiles = [];
|
||||
} else if (!input.path && (input.format === "data" || !input.format)) {
|
||||
// Data format without path - don't include in saved_files
|
||||
finalSavedFiles = [];
|
||||
} else {
|
||||
// User provided path or default save behavior - include in saved_files
|
||||
finalSavedFiles = captureData.saved_files || [];
|
||||
}
|
||||
|
||||
let imageBase64ForAnalysis: string | undefined;
|
||||
if (input.question) {
|
||||
|
|
@ -196,37 +129,25 @@ export async function imageToolHandler(
|
|||
const configuredProviders = parseAIProviders(
|
||||
process.env.PEEKABOO_AI_PROVIDERS || "",
|
||||
);
|
||||
if (!configuredProviders.length && !input.provider_config) {
|
||||
if (!configuredProviders.length) {
|
||||
analysisText =
|
||||
"Analysis skipped: AI analysis not configured on this server (PEEKABOO_AI_PROVIDERS is not set or empty) and no specific provider was requested.";
|
||||
"Analysis skipped: AI analysis not configured on this server (PEEKABOO_AI_PROVIDERS is not set or empty).";
|
||||
logger.warn(analysisText);
|
||||
} else {
|
||||
try {
|
||||
const providerDetails = await determineProviderAndModel(
|
||||
input.provider_config,
|
||||
configuredProviders,
|
||||
logger,
|
||||
);
|
||||
|
||||
if (!providerDetails.provider) {
|
||||
analysisText =
|
||||
"Analysis skipped: No AI providers are currently operational or configured for the request.";
|
||||
logger.warn(analysisText);
|
||||
} else {
|
||||
analysisText = await analyzeImageWithProvider(
|
||||
providerDetails as AIProvider,
|
||||
imagePathForAnalysis,
|
||||
imageBase64ForAnalysis,
|
||||
input.question,
|
||||
logger,
|
||||
);
|
||||
modelUsed = `${providerDetails.provider}/${providerDetails.model}`;
|
||||
analysisSucceeded = true;
|
||||
logger.info({ provider: modelUsed }, "Image analysis successful");
|
||||
}
|
||||
} catch (aiError) {
|
||||
logger.error({ error: aiError }, "AI analysis failed");
|
||||
analysisText = `AI analysis failed: ${aiError instanceof Error ? aiError.message : "Unknown AI error"}`;
|
||||
const analysisResult = await performAutomaticAnalysis(
|
||||
imageBase64ForAnalysis,
|
||||
input.question,
|
||||
logger,
|
||||
process.env.PEEKABOO_AI_PROVIDERS || "",
|
||||
);
|
||||
|
||||
if (analysisResult.error) {
|
||||
analysisText = analysisResult.error;
|
||||
} else {
|
||||
analysisText = analysisResult.analysisText;
|
||||
modelUsed = analysisResult.modelUsed;
|
||||
analysisSucceeded = true;
|
||||
logger.info({ provider: modelUsed }, "Image analysis successful");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -243,11 +164,10 @@ export async function imageToolHandler(
|
|||
content.push({ type: "text", text: `Analysis Result: ${analysisText}` });
|
||||
}
|
||||
|
||||
if (
|
||||
input.return_data &&
|
||||
!input.question &&
|
||||
captureData.saved_files?.length > 0
|
||||
) {
|
||||
// Return base64 data if format is 'data' or path not provided (and no question)
|
||||
const shouldReturnData = (input.format === "data" || !input.path) && !input.question;
|
||||
|
||||
if (shouldReturnData && captureData.saved_files?.length > 0) {
|
||||
for (const savedFile of captureData.saved_files) {
|
||||
try {
|
||||
const imageBase64 = await readImageAsBase64(savedFile.path);
|
||||
|
|
@ -329,43 +249,99 @@ export async function imageToolHandler(
|
|||
}
|
||||
}
|
||||
|
||||
export function buildSwiftCliArgs(input: ImageToolInput): string[] {
|
||||
export function buildSwiftCliArgs(
|
||||
input: ImageInput,
|
||||
logger?: Logger,
|
||||
effectivePath?: string | undefined,
|
||||
swiftFormat?: string,
|
||||
): string[] {
|
||||
const args = ["image"];
|
||||
|
||||
// Use provided values or derive from input
|
||||
const actualPath = effectivePath !== undefined ? effectivePath : input.path;
|
||||
const actualFormat = swiftFormat || (input.format === "data" ? "png" : input.format) || "png";
|
||||
|
||||
// Create a logger if not provided (for backward compatibility)
|
||||
const log = logger || {
|
||||
warn: () => {},
|
||||
error: () => {},
|
||||
debug: () => {}
|
||||
} as any;
|
||||
|
||||
let mode = input.mode;
|
||||
if (!mode) {
|
||||
mode = input.app ? "window" : "screen";
|
||||
// Parse app_target to determine Swift CLI arguments
|
||||
if (!input.app_target || input.app_target === "") {
|
||||
// Omitted/empty: All screens
|
||||
args.push("--mode", "screen");
|
||||
} else if (input.app_target.startsWith("screen:")) {
|
||||
// 'screen:INDEX': Specific display
|
||||
const screenIndex = input.app_target.substring(7);
|
||||
args.push("--mode", "screen");
|
||||
// Note: --screen-index is not yet implemented in Swift CLI
|
||||
// For now, we'll just use screen mode without index
|
||||
log.warn(
|
||||
{ screenIndex },
|
||||
"Screen index specification not yet supported by Swift CLI, capturing all screens",
|
||||
);
|
||||
} else if (input.app_target === "frontmost") {
|
||||
// 'frontmost': Would need to determine frontmost app
|
||||
// For now, default to screen mode with a warning
|
||||
log.warn(
|
||||
"'frontmost' target requires determining current frontmost app, defaulting to screen mode",
|
||||
);
|
||||
args.push("--mode", "screen");
|
||||
} else if (input.app_target.includes(":")) {
|
||||
// 'AppName:WINDOW_TITLE:Title' or 'AppName:WINDOW_INDEX:Index'
|
||||
const parts = input.app_target.split(":");
|
||||
if (parts.length >= 3) {
|
||||
const appName = parts[0];
|
||||
const specifierType = parts[1];
|
||||
const specifierValue = parts.slice(2).join(":"); // Handle colons in window titles
|
||||
|
||||
args.push("--app", appName);
|
||||
args.push("--mode", "window");
|
||||
|
||||
if (specifierType === "WINDOW_TITLE") {
|
||||
args.push("--window-title", specifierValue);
|
||||
} else if (specifierType === "WINDOW_INDEX") {
|
||||
args.push("--window-index", specifierValue);
|
||||
} else {
|
||||
log.warn(
|
||||
{ specifierType },
|
||||
"Unknown window specifier type, defaulting to main window",
|
||||
);
|
||||
}
|
||||
} else {
|
||||
log.error(
|
||||
{ app_target: input.app_target },
|
||||
"Invalid app_target format",
|
||||
);
|
||||
args.push("--mode", "screen");
|
||||
}
|
||||
} else {
|
||||
// 'AppName': All windows of the app
|
||||
args.push("--app", input.app_target);
|
||||
args.push("--mode", "multi");
|
||||
}
|
||||
|
||||
if (input.app) {
|
||||
args.push("--app", input.app);
|
||||
}
|
||||
|
||||
if (input.path) {
|
||||
args.push("--path", input.path);
|
||||
} else if (process.env.PEEKABOO_DEFAULT_SAVE_PATH && !input.question) {
|
||||
// Add path if provided
|
||||
if (actualPath) {
|
||||
args.push("--path", actualPath);
|
||||
} else if (process.env.PEEKABOO_DEFAULT_SAVE_PATH) {
|
||||
args.push("--path", process.env.PEEKABOO_DEFAULT_SAVE_PATH);
|
||||
}
|
||||
|
||||
args.push("--mode", mode);
|
||||
// Add format
|
||||
args.push("--format", actualFormat);
|
||||
|
||||
if (input.window_specifier) {
|
||||
if ("title" in input.window_specifier) {
|
||||
args.push("--window-title", input.window_specifier.title);
|
||||
} else if ("index" in input.window_specifier) {
|
||||
args.push("--window-index", input.window_specifier.index.toString());
|
||||
}
|
||||
}
|
||||
|
||||
args.push("--format", input.format!);
|
||||
args.push("--capture-focus", input.capture_focus!);
|
||||
// Add capture focus
|
||||
args.push("--capture-focus", input.capture_focus || "background");
|
||||
|
||||
return args;
|
||||
}
|
||||
|
||||
function generateImageCaptureSummary(
|
||||
data: ImageCaptureData,
|
||||
input: ImageToolInput,
|
||||
input: ImageInput,
|
||||
): string {
|
||||
const fileCount = data.saved_files?.length || 0;
|
||||
|
||||
|
|
@ -376,12 +352,29 @@ function generateImageCaptureSummary(
|
|||
return "Image capture completed but no files were saved or available for analysis.";
|
||||
}
|
||||
|
||||
const mode = input.mode || (input.app ? "window" : "screen");
|
||||
const target = input.app || "screen";
|
||||
// Determine mode and target from app_target
|
||||
let mode = "screen";
|
||||
let target = "screen";
|
||||
|
||||
if (input.app_target) {
|
||||
if (input.app_target.startsWith("screen:")) {
|
||||
mode = "screen";
|
||||
target = input.app_target;
|
||||
} else if (input.app_target === "frontmost") {
|
||||
mode = "screen"; // defaulted to screen
|
||||
target = "frontmost application";
|
||||
} else if (input.app_target.includes(":")) {
|
||||
mode = "window";
|
||||
target = input.app_target.split(":")[0];
|
||||
} else {
|
||||
mode = "multi";
|
||||
target = input.app_target;
|
||||
}
|
||||
}
|
||||
|
||||
let summary = `Captured ${fileCount} image${fileCount > 1 ? "s" : ""} in ${mode} mode`;
|
||||
if (input.app) {
|
||||
summary += ` for application: ${target}`;
|
||||
if (input.app_target && target !== "screen") {
|
||||
summary += ` for ${target}`;
|
||||
}
|
||||
summary += ".";
|
||||
|
||||
|
|
@ -401,3 +394,77 @@ function generateImageCaptureSummary(
|
|||
|
||||
return summary;
|
||||
}
|
||||
|
||||
async function performAutomaticAnalysis(
|
||||
base64Image: string,
|
||||
question: string,
|
||||
logger: Logger,
|
||||
availableProvidersEnv: string,
|
||||
): Promise<{
|
||||
analysisText?: string;
|
||||
modelUsed?: string;
|
||||
error?: string;
|
||||
}> {
|
||||
const providers = parseAIProviders(availableProvidersEnv);
|
||||
|
||||
if (!providers.length) {
|
||||
return {
|
||||
error: "Analysis skipped: No AI providers configured",
|
||||
};
|
||||
}
|
||||
|
||||
// Try each provider in order until one succeeds
|
||||
for (const provider of providers) {
|
||||
try {
|
||||
logger.debug(
|
||||
{ provider: `${provider.provider}/${provider.model}` },
|
||||
"Attempting analysis with provider",
|
||||
);
|
||||
|
||||
// Create a temporary file for the provider (some providers need file paths)
|
||||
const tempDir = await fs.mkdtemp(
|
||||
pathModule.join(os.tmpdir(), "peekaboo-analysis-"),
|
||||
);
|
||||
const tempPath = pathModule.join(tempDir, "image.png");
|
||||
const imageBuffer = Buffer.from(base64Image, "base64");
|
||||
await fs.writeFile(tempPath, imageBuffer);
|
||||
|
||||
try {
|
||||
const analysisText = await analyzeImageWithProvider(
|
||||
provider,
|
||||
tempPath,
|
||||
base64Image,
|
||||
question,
|
||||
logger,
|
||||
);
|
||||
|
||||
// Clean up temp file
|
||||
await fs.unlink(tempPath);
|
||||
await fs.rmdir(tempDir);
|
||||
|
||||
return {
|
||||
analysisText,
|
||||
modelUsed: `${provider.provider}/${provider.model}`,
|
||||
};
|
||||
} finally {
|
||||
// Ensure cleanup even if analysis fails
|
||||
try {
|
||||
await fs.unlink(tempPath);
|
||||
await fs.rmdir(tempDir);
|
||||
} catch (cleanupError) {
|
||||
logger.debug({ error: cleanupError }, "Failed to clean up analysis temp file");
|
||||
}
|
||||
}
|
||||
} catch (error) {
|
||||
logger.debug(
|
||||
{ error, provider: `${provider.provider}/${provider.model}` },
|
||||
"Provider failed, trying next",
|
||||
);
|
||||
// Continue to next provider
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
error: "Analysis failed: All configured AI providers failed or are unavailable",
|
||||
};
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,4 +1,5 @@
|
|||
import { Logger } from "pino";
|
||||
import { z } from "zod";
|
||||
|
||||
export interface SwiftCliResponse {
|
||||
success: boolean;
|
||||
|
|
@ -103,3 +104,38 @@ export interface ToolResponse {
|
|||
_meta?: Record<string, any>;
|
||||
[key: string]: any; // Allow additional properties
|
||||
}
|
||||
|
||||
export const imageToolSchema = z.object({
|
||||
app_target: z.string().optional().describe(
|
||||
"Optional. Specifies the capture target. Examples:\n" +
|
||||
"- Omitted/empty: All screens.\n" +
|
||||
"- 'screen:INDEX': Specific display (e.g., 'screen:0').\n" +
|
||||
"- 'frontmost': All windows of the current foreground app.\n" +
|
||||
"- 'AppName': All windows of 'AppName'.\n" +
|
||||
"- 'AppName:WINDOW_TITLE:Title': Window of 'AppName' with 'Title'.\n" +
|
||||
"- 'AppName:WINDOW_INDEX:Index': Window of 'AppName' at 'Index'."
|
||||
),
|
||||
path: z.string().optional().describe(
|
||||
"Optional. Base absolute path for saving the image. " +
|
||||
"If 'format' is 'data' and 'path' is also given, image is saved AND Base64 data returned. " +
|
||||
"If 'question' is provided and 'path' is omitted, a temporary path is used for capture, and the file is deleted after analysis."
|
||||
),
|
||||
question: z.string().optional().describe(
|
||||
"Optional. If provided, the captured image will be analyzed. " +
|
||||
"The server automatically selects an AI provider from 'PEEKABOO_AI_PROVIDERS'."
|
||||
),
|
||||
format: z.enum(["png", "jpg", "data"]).optional().default("png").describe(
|
||||
"Output format. 'png' or 'jpg' save to 'path' (if provided). " +
|
||||
"'data' returns Base64 encoded PNG data inline; if 'path' is also given, saves a PNG file to 'path' too. " +
|
||||
"If 'path' is not given, 'format' defaults to 'data' behavior (inline PNG data returned)."
|
||||
),
|
||||
capture_focus: z.enum(["background", "foreground"])
|
||||
.optional()
|
||||
.default("background")
|
||||
.describe(
|
||||
"Optional. Focus behavior. 'background' (default): capture without altering window focus. " +
|
||||
"'foreground': bring target to front before capture."
|
||||
),
|
||||
});
|
||||
|
||||
export type ImageInput = z.infer<typeof imageToolSchema>;
|
||||
|
|
|
|||
531
tests/integration/image-tool.test.ts
Normal file
531
tests/integration/image-tool.test.ts
Normal file
|
|
@ -0,0 +1,531 @@
|
|||
import { imageToolHandler } from "../../src/tools/image";
|
||||
import { pino } from "pino";
|
||||
import { ImageInput } from "../../src/types";
|
||||
import { vi } from "vitest";
|
||||
import * as fs from "fs/promises";
|
||||
import * as os from "os";
|
||||
import * as path from "path";
|
||||
import { initializeSwiftCliPath, executeSwiftCli } from "../../src/utils/peekaboo-cli";
|
||||
import { mockSwiftCli } from "../mocks/peekaboo-cli.mock";
|
||||
|
||||
// Mock the Swift CLI execution
|
||||
vi.mock("../../src/utils/peekaboo-cli", async () => {
|
||||
const actual = await vi.importActual("../../src/utils/peekaboo-cli");
|
||||
return {
|
||||
...actual,
|
||||
executeSwiftCli: vi.fn(),
|
||||
readImageAsBase64: vi.fn().mockResolvedValue("mock-base64-data"),
|
||||
};
|
||||
});
|
||||
|
||||
// Mock AI providers to avoid real API calls in integration tests
|
||||
vi.mock("../../src/utils/ai-providers", () => ({
|
||||
parseAIProviders: vi.fn().mockReturnValue([{ provider: "mock", model: "test" }]),
|
||||
analyzeImageWithProvider: vi.fn().mockResolvedValue("Mock analysis: This is a test image"),
|
||||
}));
|
||||
|
||||
const mockExecuteSwiftCli = executeSwiftCli as vi.MockedFunction<typeof executeSwiftCli>;
|
||||
|
||||
const mockLogger = pino({ level: "silent" });
|
||||
const mockContext = { logger: mockLogger };
|
||||
|
||||
// Helper to check if file exists
|
||||
async function fileExists(filePath: string): Promise<boolean> {
|
||||
try {
|
||||
await fs.access(filePath);
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
describe("Image Tool Integration Tests", () => {
|
||||
let tempDir: string;
|
||||
|
||||
beforeAll(async () => {
|
||||
// Initialize Swift CLI path for tests
|
||||
const testPackageRoot = path.resolve(__dirname, "../..");
|
||||
initializeSwiftCliPath(testPackageRoot);
|
||||
|
||||
// Create a temporary directory for test files
|
||||
tempDir = await fs.mkdtemp(path.join(os.tmpdir(), "peekaboo-test-"));
|
||||
});
|
||||
|
||||
beforeEach(() => {
|
||||
vi.clearAllMocks();
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
// Clean up temp directory
|
||||
try {
|
||||
const files = await fs.readdir(tempDir);
|
||||
for (const file of files) {
|
||||
await fs.unlink(path.join(tempDir, file));
|
||||
}
|
||||
await fs.rmdir(tempDir);
|
||||
} catch (error) {
|
||||
console.error("Failed to clean up temp directory:", error);
|
||||
}
|
||||
});
|
||||
|
||||
describe("Capture with different app_target values", () => {
|
||||
it("should capture screen when app_target is omitted", async () => {
|
||||
// Mock successful screen capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: path.join(tempDir, "peekaboo-img-test", "capture.png"),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler({}, mockContext);
|
||||
|
||||
expect(result.isError).toBeFalsy();
|
||||
expect(result.content[0].type).toBe("text");
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
// Should return base64 data when format and path are omitted
|
||||
expect(result.content.some((item) => item.type === "image")).toBe(true);
|
||||
});
|
||||
|
||||
it("should capture screen when app_target is empty string", async () => {
|
||||
const input: ImageInput = { app_target: "" };
|
||||
|
||||
// Mock successful screen capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: path.join(tempDir, "peekaboo-img-test", "capture.png"),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.isError).toBeFalsy();
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
});
|
||||
|
||||
it("should handle screen:INDEX format (with warning)", async () => {
|
||||
const input: ImageInput = { app_target: "screen:0" };
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
|
||||
// Mock successful screen capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: path.join(tempDir, "peekaboo-img-test", "capture.png"),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.isError).toBeFalsy();
|
||||
expect(loggerWarnSpy).toHaveBeenCalledWith(
|
||||
expect.objectContaining({ screenIndex: "0" }),
|
||||
"Screen index specification not yet supported by Swift CLI, capturing all screens",
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle frontmost app_target (with warning)", async () => {
|
||||
const input: ImageInput = { app_target: "frontmost" };
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
|
||||
// Mock successful screen capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: path.join(tempDir, "peekaboo-img-test", "capture.png"),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.isError).toBeFalsy();
|
||||
expect(loggerWarnSpy).toHaveBeenCalledWith(
|
||||
"'frontmost' target requires determining current frontmost app, defaulting to screen mode",
|
||||
);
|
||||
});
|
||||
|
||||
it("should capture specific app windows", async () => {
|
||||
const input: ImageInput = { app_target: "Finder" };
|
||||
|
||||
// Mock app not found error
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.appNotFound("Finder")
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
// The result depends on whether Finder is running
|
||||
// We're just testing that the handler processes the request correctly
|
||||
expect(result.content[0].type).toBe("text");
|
||||
if (result.isError) {
|
||||
expect(result.content[0].text).toContain("not found or not running");
|
||||
} else {
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
}
|
||||
});
|
||||
|
||||
it("should capture specific window by title", async () => {
|
||||
const input: ImageInput = { app_target: "Safari:WINDOW_TITLE:Test Window" };
|
||||
|
||||
// Mock app not found error
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.appNotFound("Safari")
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.content[0].type).toBe("text");
|
||||
// May fail if Safari isn't running or window doesn't exist
|
||||
if (result.isError) {
|
||||
expect(result.content[0].text).toContain("not found or not running");
|
||||
}
|
||||
});
|
||||
|
||||
it("should capture specific window by index", async () => {
|
||||
const input: ImageInput = { app_target: "Terminal:WINDOW_INDEX:0" };
|
||||
|
||||
// Mock app not found error
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.appNotFound("Terminal")
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.content[0].type).toBe("text");
|
||||
// May fail if Terminal isn't running
|
||||
if (result.isError) {
|
||||
expect(result.content[0].text).toContain("not found or not running");
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Format and data return behavior", () => {
|
||||
it("should return base64 data when format is 'data'", async () => {
|
||||
const input: ImageInput = { format: "data" };
|
||||
|
||||
// Mock successful capture with temp path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: expect.stringMatching(/peekaboo-img-.*\/capture\.png$/),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeDefined();
|
||||
expect(imageContent?.data).toBeTruthy();
|
||||
expect(typeof imageContent?.data).toBe("string");
|
||||
}
|
||||
});
|
||||
|
||||
it("should save file and return base64 when format is 'data' with path", async () => {
|
||||
const testPath = path.join(tempDir, "test-data-format.png");
|
||||
const input: ImageInput = { format: "data", path: testPath };
|
||||
|
||||
// Mock successful capture with specified path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
// Should have base64 data in content
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeDefined();
|
||||
|
||||
// Should have saved file
|
||||
expect(result.saved_files).toHaveLength(1);
|
||||
expect(result.saved_files[0].path).toBe(testPath);
|
||||
|
||||
// In integration tests with mocked CLI, we don't check file existence
|
||||
}
|
||||
});
|
||||
|
||||
it("should save PNG file without base64 in content", async () => {
|
||||
const testPath = path.join(tempDir, "test-png.png");
|
||||
const input: ImageInput = { format: "png", path: testPath };
|
||||
|
||||
// Mock successful capture with specified path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
// Should NOT have base64 data in content
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeUndefined();
|
||||
|
||||
// Should have saved file
|
||||
expect(result.saved_files).toHaveLength(1);
|
||||
expect(result.saved_files[0].path).toBe(testPath);
|
||||
|
||||
// In integration tests with mocked CLI, we don't check file existence
|
||||
}
|
||||
});
|
||||
|
||||
it("should save JPG file", async () => {
|
||||
const testPath = path.join(tempDir, "test-jpg.jpg");
|
||||
const input: ImageInput = { format: "jpg", path: testPath };
|
||||
|
||||
// Mock successful capture with specified path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "jpg"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
expect(result.saved_files).toHaveLength(1);
|
||||
expect(result.saved_files[0].path).toBe(testPath);
|
||||
expect(result.saved_files[0].mime_type).toBe("image/jpeg");
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Analysis with question", () => {
|
||||
beforeEach(() => {
|
||||
// Mock performAutomaticAnalysis for these tests
|
||||
vi.clearAllMocks();
|
||||
});
|
||||
|
||||
it("should analyze image and delete temp file when no path provided", async () => {
|
||||
const input: ImageInput = { question: "What is in this image?" };
|
||||
|
||||
// Mock successful screen capture for analysis
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: expect.stringMatching(/peekaboo-img-.*\/capture\.png$/),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
// Even if analysis is mocked, the capture should succeed
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
|
||||
// Should not return base64 data when question is asked
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeUndefined();
|
||||
|
||||
// saved_files should be empty (temp file was deleted)
|
||||
expect(result.saved_files).toEqual([]);
|
||||
});
|
||||
|
||||
it("should analyze image and keep file when path is provided", async () => {
|
||||
const testPath = path.join(tempDir, "test-analysis.png");
|
||||
const input: ImageInput = {
|
||||
question: "Describe this image",
|
||||
path: testPath,
|
||||
format: "png"
|
||||
};
|
||||
|
||||
// Mock successful capture with specified path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
// Should have saved file
|
||||
expect(result.saved_files).toHaveLength(1);
|
||||
expect(result.saved_files[0].path).toBe(testPath);
|
||||
|
||||
// In integration tests with mocked CLI, we don't check file existence
|
||||
|
||||
// Should not have base64 data
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeUndefined();
|
||||
}
|
||||
});
|
||||
|
||||
it("should not return base64 even with format: 'data' when question is asked", async () => {
|
||||
const input: ImageInput = {
|
||||
format: "data",
|
||||
question: "What do you see?"
|
||||
};
|
||||
|
||||
// Mock successful capture with temp path for analysis
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: expect.stringMatching(/peekaboo-img-.*\/capture\.png$/),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
// Should not have base64 data when question is asked
|
||||
const imageContent = result.content.find((item) => item.type === "image");
|
||||
expect(imageContent).toBeUndefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe("Error handling", () => {
|
||||
it("should handle permission errors gracefully", async () => {
|
||||
// This test might fail if permissions are granted
|
||||
// We're testing that the error is handled properly if it occurs
|
||||
const input: ImageInput = { app_target: "System Preferences" };
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (result.isError) {
|
||||
expect(result.content[0].text).toContain("failed");
|
||||
expect(result._meta?.backend_error_code).toBeTruthy();
|
||||
}
|
||||
});
|
||||
|
||||
it("should handle invalid app names", async () => {
|
||||
const input: ImageInput = { app_target: "NonExistentApp12345" };
|
||||
|
||||
// Mock app not found error
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.appNotFound("NonExistentApp12345")
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
expect(result.isError).toBe(true);
|
||||
expect(result.content[0].text).toContain("not found or not running");
|
||||
});
|
||||
|
||||
it("should handle invalid window specifiers", async () => {
|
||||
const input: ImageInput = { app_target: "Finder:WINDOW_INDEX:999" };
|
||||
|
||||
// Mock window not found error
|
||||
mockExecuteSwiftCli.mockResolvedValue({
|
||||
success: false,
|
||||
error: {
|
||||
message: "Window index 999 is out of bounds for Finder",
|
||||
code: "WINDOW_NOT_FOUND"
|
||||
}
|
||||
});
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (result.isError) {
|
||||
expect(result.content[0].text).toMatch(/WINDOW_NOT_FOUND|out of bounds/);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Environment variable handling", () => {
|
||||
it("should use PEEKABOO_DEFAULT_SAVE_PATH when no path provided and no question", async () => {
|
||||
const defaultPath = path.join(tempDir, "default-save.png");
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = defaultPath;
|
||||
|
||||
try {
|
||||
// Mock successful capture with temp path (overrides PEEKABOO_DEFAULT_SAVE_PATH)
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: expect.stringMatching(/peekaboo-img-.*\/capture\.png$/),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler({}, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
// When no path/format is provided, it uses temp path and returns base64
|
||||
// PEEKABOO_DEFAULT_SAVE_PATH is overridden by the temp path logic
|
||||
expect(result.saved_files).toEqual([]);
|
||||
expect(result.content.some(item => item.type === "image")).toBe(true);
|
||||
}
|
||||
} finally {
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
}
|
||||
});
|
||||
|
||||
it("should NOT use PEEKABOO_DEFAULT_SAVE_PATH when question is provided", async () => {
|
||||
const defaultPath = path.join(tempDir, "should-not-use.png");
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = defaultPath;
|
||||
|
||||
try {
|
||||
const input: ImageInput = { question: "What is this?" };
|
||||
|
||||
// Mock successful screen capture with temp path
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: expect.stringMatching(/peekaboo-img-.*\/capture\.png$/),
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
// Temp file should be used and deleted
|
||||
expect(result.saved_files).toEqual([]);
|
||||
|
||||
// Default path should not exist
|
||||
const exists = await fileExists(defaultPath);
|
||||
expect(exists).toBe(false);
|
||||
} finally {
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Capture focus behavior", () => {
|
||||
it("should capture with background focus by default", async () => {
|
||||
const testPath = path.join(tempDir, "test-bg-focus.png");
|
||||
const input: ImageInput = { path: testPath };
|
||||
|
||||
// Mock successful capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
// The actual focus behavior is handled by Swift CLI
|
||||
}
|
||||
});
|
||||
|
||||
it("should capture with foreground focus when specified", async () => {
|
||||
const testPath = path.join(tempDir, "test-fg-focus.png");
|
||||
const input: ImageInput = {
|
||||
path: testPath,
|
||||
capture_focus: "foreground"
|
||||
};
|
||||
|
||||
// Mock successful capture
|
||||
mockExecuteSwiftCli.mockResolvedValue(
|
||||
mockSwiftCli.captureImage("screen", {
|
||||
path: testPath,
|
||||
format: "png"
|
||||
})
|
||||
);
|
||||
|
||||
const result = await imageToolHandler(input, mockContext);
|
||||
|
||||
if (!result.isError) {
|
||||
expect(result.content[0].text).toContain("Captured");
|
||||
// The actual focus behavior is handled by Swift CLI
|
||||
}
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
@ -51,7 +51,7 @@ describe("MCP Server Integration", () => {
|
|||
const result = await mockImageToolHandler(
|
||||
{
|
||||
format: "png",
|
||||
return_data: false,
|
||||
path: "/tmp/screen_capture.png",
|
||||
capture_focus: "background",
|
||||
},
|
||||
mockContext,
|
||||
|
|
@ -84,7 +84,6 @@ describe("MCP Server Integration", () => {
|
|||
const result = await mockImageToolHandler(
|
||||
{
|
||||
format: "png",
|
||||
return_data: false,
|
||||
capture_focus: "background",
|
||||
},
|
||||
mockContext,
|
||||
|
|
@ -261,7 +260,6 @@ describe("MCP Server Integration", () => {
|
|||
const result = await mockImageToolHandler(
|
||||
{
|
||||
format: "png",
|
||||
return_data: false,
|
||||
capture_focus: "background",
|
||||
},
|
||||
mockContext,
|
||||
|
|
@ -325,7 +323,7 @@ describe("MCP Server Integration", () => {
|
|||
const [listResult, imageResult] = await Promise.all([
|
||||
mockListToolHandler({ item_type: "running_applications" }, mockContext),
|
||||
mockImageToolHandler(
|
||||
{ format: "png", return_data: false, capture_focus: "background" },
|
||||
{ format: "png", capture_focus: "background" },
|
||||
mockContext,
|
||||
),
|
||||
]);
|
||||
|
|
|
|||
|
|
@ -2,7 +2,6 @@ import { vi } from "vitest";
|
|||
import {
|
||||
imageToolHandler,
|
||||
buildSwiftCliArgs,
|
||||
ImageToolInput,
|
||||
} from "../../../src/tools/image";
|
||||
import {
|
||||
executeSwiftCli,
|
||||
|
|
@ -13,61 +12,31 @@ import { pino } from "pino";
|
|||
import {
|
||||
SavedFile,
|
||||
ImageCaptureData,
|
||||
AIProviderConfig,
|
||||
ToolResponse,
|
||||
AIProvider,
|
||||
ImageInput,
|
||||
} from "../../../src/types";
|
||||
import * as fs from "fs/promises";
|
||||
import * as os from "os";
|
||||
import * as pathModule from "path";
|
||||
import * as path from "path";
|
||||
|
||||
// Mock the Swift CLI utility
|
||||
vi.mock("../../../src/utils/peekaboo-cli");
|
||||
|
||||
// Mock AI Provider utilities
|
||||
// Declare the variables that will hold the mock functions first
|
||||
let mockDetermineProviderAndModel: vi.MockedFunction<any>;
|
||||
let mockAnalyzeImageWithProvider: vi.MockedFunction<any>;
|
||||
let mockParseAIProviders: vi.MockedFunction<any>;
|
||||
let mockIsProviderAvailable: vi.MockedFunction<any>;
|
||||
let mockGetDefaultModelForProvider: vi.MockedFunction<any>;
|
||||
|
||||
vi.mock("../../../src/utils/ai-providers", () => {
|
||||
// Create new vi.fn() instances inside the factory
|
||||
const determineProviderAndModel = vi.fn();
|
||||
const analyzeImageWithProvider = vi.fn();
|
||||
const parseAIProviders = vi.fn();
|
||||
const isProviderAvailable = vi.fn();
|
||||
const getDefaultModelForProvider = vi.fn().mockReturnValue("default-model");
|
||||
|
||||
// Assign them to the outer scope variables so tests can reference them
|
||||
// This assignment happens AFTER the vi.mock call is processed by Vitest due to hoisting.
|
||||
// We will re-assign these correctly after the mock call using an import.
|
||||
return {
|
||||
determineProviderAndModel,
|
||||
analyzeImageWithProvider,
|
||||
parseAIProviders,
|
||||
isProviderAvailable,
|
||||
getDefaultModelForProvider,
|
||||
};
|
||||
});
|
||||
|
||||
// Mock fs/promises for mkdtemp, unlink, rmdir
|
||||
// Mock fs/promises
|
||||
vi.mock("fs/promises");
|
||||
|
||||
// Now, import the mocked module and assign the vi.fn() instances to our variables
|
||||
// This ensures our variables hold the actual mocks created by Vitest's factory.
|
||||
import * as ActualAiProvidersMock from "../../../src/utils/ai-providers";
|
||||
mockDetermineProviderAndModel =
|
||||
ActualAiProvidersMock.determineProviderAndModel as vi.MockedFunction<any>;
|
||||
mockAnalyzeImageWithProvider =
|
||||
ActualAiProvidersMock.analyzeImageWithProvider as vi.MockedFunction<any>;
|
||||
mockParseAIProviders =
|
||||
ActualAiProvidersMock.parseAIProviders as vi.MockedFunction<any>;
|
||||
mockIsProviderAvailable =
|
||||
ActualAiProvidersMock.isProviderAvailable as vi.MockedFunction<any>;
|
||||
mockGetDefaultModelForProvider =
|
||||
ActualAiProvidersMock.getDefaultModelForProvider as vi.MockedFunction<any>;
|
||||
// Mock os
|
||||
vi.mock("os");
|
||||
|
||||
// Mock path
|
||||
vi.mock("path");
|
||||
|
||||
// Mock AI providers instead of performAutomaticAnalysis
|
||||
vi.mock("../../../src/utils/ai-providers", () => ({
|
||||
parseAIProviders: vi.fn(),
|
||||
analyzeImageWithProvider: vi.fn(),
|
||||
}));
|
||||
|
||||
const mockExecuteSwiftCli = executeSwiftCli as vi.MockedFunction<
|
||||
typeof executeSwiftCli
|
||||
|
|
@ -75,117 +44,298 @@ const mockExecuteSwiftCli = executeSwiftCli as vi.MockedFunction<
|
|||
const mockReadImageAsBase64 = readImageAsBase64 as vi.MockedFunction<
|
||||
typeof readImageAsBase64
|
||||
>;
|
||||
import { parseAIProviders, analyzeImageWithProvider } from "../../../src/utils/ai-providers";
|
||||
const mockParseAIProviders = parseAIProviders as vi.MockedFunction<typeof parseAIProviders>;
|
||||
const mockAnalyzeImageWithProvider = analyzeImageWithProvider as vi.MockedFunction<typeof analyzeImageWithProvider>;
|
||||
|
||||
const mockFsMkdtemp = fs.mkdtemp as vi.MockedFunction<typeof fs.mkdtemp>;
|
||||
const mockFsReadFile = fs.readFile as vi.MockedFunction<typeof fs.readFile>;
|
||||
const mockFsUnlink = fs.unlink as vi.MockedFunction<typeof fs.unlink>;
|
||||
const mockFsMkdtemp = fs.mkdtemp as vi.MockedFunction<typeof fs.mkdtemp>;
|
||||
const mockFsRmdir = fs.rmdir as vi.MockedFunction<typeof fs.rmdir>;
|
||||
const mockFsWriteFile = fs.writeFile as vi.MockedFunction<typeof fs.writeFile>;
|
||||
const mockOsTmpdir = os.tmpdir as vi.MockedFunction<typeof os.tmpdir>;
|
||||
const mockPathJoin = path.join as vi.MockedFunction<typeof path.join>;
|
||||
const mockPathDirname = path.dirname as vi.MockedFunction<typeof path.dirname>;
|
||||
|
||||
const mockLogger = pino({ level: "silent" });
|
||||
const mockContext = { logger: mockLogger };
|
||||
|
||||
const MOCK_TEMP_DIR_PATH = "/tmp/peekaboo-img-mock";
|
||||
const MOCK_TEMP_IMAGE_PATH = `${MOCK_TEMP_DIR_PATH}/capture.png`;
|
||||
const MOCK_TEMP_DIR = "/tmp";
|
||||
const MOCK_TEMP_IMAGE_DIR = "/tmp/peekaboo-img-XXXXXX";
|
||||
const MOCK_TEMP_IMAGE_PATH = "/tmp/peekaboo-img-XXXXXX/capture.png";
|
||||
const MOCK_TEMP_ANALYSIS_DIR = "/tmp/peekaboo-analysis-XXXXXX";
|
||||
const MOCK_TEMP_ANALYSIS_PATH = "/tmp/peekaboo-analysis-XXXXXX/image.png";
|
||||
|
||||
describe("Image Tool", () => {
|
||||
beforeEach(() => {
|
||||
vi.clearAllMocks();
|
||||
mockFsMkdtemp.mockResolvedValue(MOCK_TEMP_DIR_PATH);
|
||||
mockOsTmpdir.mockReturnValue(MOCK_TEMP_DIR);
|
||||
mockPathJoin.mockImplementation((...paths) => paths.join("/"));
|
||||
mockPathDirname.mockImplementation((p) => {
|
||||
const lastSlash = p.lastIndexOf("/");
|
||||
return lastSlash === -1 ? "." : p.substring(0, lastSlash);
|
||||
});
|
||||
mockFsUnlink.mockResolvedValue(undefined);
|
||||
mockFsRmdir.mockResolvedValue(undefined);
|
||||
mockFsWriteFile.mockResolvedValue(undefined);
|
||||
mockFsReadFile.mockResolvedValue(Buffer.from("fake-image-data"));
|
||||
mockFsMkdtemp.mockImplementation((prefix) => {
|
||||
if (prefix.includes("peekaboo-img-")) {
|
||||
return Promise.resolve(MOCK_TEMP_IMAGE_DIR);
|
||||
} else if (prefix.includes("peekaboo-analysis-")) {
|
||||
return Promise.resolve(MOCK_TEMP_ANALYSIS_DIR);
|
||||
}
|
||||
return Promise.resolve(prefix + "XXXXXX");
|
||||
});
|
||||
process.env.PEEKABOO_AI_PROVIDERS = "";
|
||||
|
||||
// Ensure specific mock implementations are reset/re-set for each test or suite as needed
|
||||
// The functions themselves are already vi.fn() instances.
|
||||
mockDetermineProviderAndModel.mockReset();
|
||||
mockAnalyzeImageWithProvider.mockReset();
|
||||
mockParseAIProviders.mockReset();
|
||||
mockIsProviderAvailable.mockReset();
|
||||
mockGetDefaultModelForProvider.mockReset().mockReturnValue("default-model"); // Re-apply default mock behavior if any
|
||||
});
|
||||
|
||||
describe("imageToolHandler - Capture Only", () => {
|
||||
it("should capture screen with minimal parameters", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {});
|
||||
it("should capture screen with minimal parameters (format omitted, path omitted)", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
mockReadImageAsBase64.mockResolvedValue("base64imagedata");
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{
|
||||
format: "png",
|
||||
return_data: false,
|
||||
capture_focus: "background",
|
||||
},
|
||||
mockContext,
|
||||
);
|
||||
const result = await imageToolHandler({}, mockContext);
|
||||
|
||||
expect(result.content[0].type).toBe("text");
|
||||
expect(result.content[0].text).toContain("Captured 1 image");
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["image", "--mode", "screen"]),
|
||||
expect.arrayContaining(["image", "--mode", "screen", "--path", MOCK_TEMP_IMAGE_PATH, "--format", "png"]),
|
||||
mockLogger,
|
||||
);
|
||||
expect(result.saved_files).toEqual(mockResponse.data?.saved_files);
|
||||
|
||||
// When format is omitted and path is omitted, behaves like format: "data"
|
||||
expect(result.content).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({ type: "text" }),
|
||||
expect.objectContaining({ type: "image", data: "base64imagedata" }),
|
||||
]),
|
||||
);
|
||||
expect(result.saved_files).toEqual([]);
|
||||
expect(result.analysis_text).toBeUndefined();
|
||||
expect(result.model_used).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should return image data when return_data is true and no question is asked", async () => {
|
||||
const mockSavedFile: SavedFile = {
|
||||
path: "/tmp/test.png",
|
||||
mime_type: "image/png",
|
||||
item_label: "Screen 1",
|
||||
};
|
||||
const mockCaptureData: ImageCaptureData = {
|
||||
saved_files: [mockSavedFile],
|
||||
};
|
||||
const mockCliResponse = {
|
||||
success: true,
|
||||
data: mockCaptureData,
|
||||
messages: ["Captured one file"],
|
||||
};
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
it("should capture screen with format: 'data'", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
mockReadImageAsBase64.mockResolvedValue("base64imagedata");
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{
|
||||
format: "png",
|
||||
return_data: true,
|
||||
capture_focus: "background",
|
||||
},
|
||||
{ format: "data" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(result.isError).toBeUndefined();
|
||||
expect(result.content).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({
|
||||
type: "text",
|
||||
text: expect.stringContaining("Captured 1 image"),
|
||||
}),
|
||||
expect.objectContaining({ type: "text" }),
|
||||
expect.objectContaining({ type: "image", data: "base64imagedata" }),
|
||||
]),
|
||||
);
|
||||
expect(result.saved_files).toEqual([]);
|
||||
});
|
||||
|
||||
it("should save file and return base64 when format: 'data' with path", async () => {
|
||||
const userPath = "/user/test.png";
|
||||
const mockSavedFile: SavedFile = {
|
||||
path: userPath,
|
||||
mime_type: "image/png",
|
||||
item_label: "Screen 1",
|
||||
};
|
||||
const mockResponse = {
|
||||
success: true,
|
||||
data: { saved_files: [mockSavedFile] },
|
||||
messages: ["Captured one file"],
|
||||
};
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
mockReadImageAsBase64.mockResolvedValue("base64imagedata");
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ format: "data", path: userPath },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--path", userPath, "--format", "png"]),
|
||||
mockLogger,
|
||||
);
|
||||
expect(result.content).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({ type: "text" }),
|
||||
expect.objectContaining({ type: "image", data: "base64imagedata" }),
|
||||
]),
|
||||
);
|
||||
expect(mockReadImageAsBase64).toHaveBeenCalledWith("/tmp/test.png");
|
||||
expect(result.saved_files).toEqual([mockSavedFile]);
|
||||
expect(result.analysis_text).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should save file without base64 when format: 'png' with path", async () => {
|
||||
const userPath = "/user/test.png";
|
||||
const mockSavedFile: SavedFile = {
|
||||
path: userPath,
|
||||
mime_type: "image/png",
|
||||
item_label: "Screen 1",
|
||||
};
|
||||
const mockResponse = {
|
||||
success: true,
|
||||
data: { saved_files: [mockSavedFile] },
|
||||
messages: ["Captured one file"],
|
||||
};
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ format: "png", path: userPath },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(result.content[0]).toEqual(
|
||||
expect.objectContaining({
|
||||
type: "text",
|
||||
text: expect.stringContaining("Captured 1 image"),
|
||||
}),
|
||||
);
|
||||
// Check for capture messages if present
|
||||
if (result.content.length > 1) {
|
||||
expect(result.content[1]).toEqual(
|
||||
expect.objectContaining({
|
||||
type: "text",
|
||||
text: expect.stringContaining("Capture Messages"),
|
||||
}),
|
||||
);
|
||||
}
|
||||
// No base64 in content
|
||||
expect(result.content.some((item) => item.type === "image")).toBe(false);
|
||||
expect(result.saved_files).toEqual([mockSavedFile]);
|
||||
});
|
||||
|
||||
it("should handle app_target: 'screen:1' with warning", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
|
||||
await imageToolHandler(
|
||||
{ app_target: "screen:1" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(loggerWarnSpy).toHaveBeenCalledWith(
|
||||
expect.objectContaining({ screenIndex: "1" }),
|
||||
"Screen index specification not yet supported by Swift CLI, capturing all screens",
|
||||
);
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--mode", "screen"]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle app_target: 'frontmost' with warning", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
|
||||
await imageToolHandler(
|
||||
{ app_target: "frontmost" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(loggerWarnSpy).toHaveBeenCalledWith(
|
||||
"'frontmost' target requires determining current frontmost app, defaulting to screen mode",
|
||||
);
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--mode", "screen"]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle app_target: 'AppName'", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("Safari", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
await imageToolHandler(
|
||||
{ app_target: "Safari" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--app", "Safari", "--mode", "multi"]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle app_target: 'AppName:WINDOW_TITLE:Title'", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("Safari", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
await imageToolHandler(
|
||||
{ app_target: "Safari:WINDOW_TITLE:Apple" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining([
|
||||
"--app", "Safari",
|
||||
"--mode", "window",
|
||||
"--window-title", "Apple"
|
||||
]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle app_target: 'AppName:WINDOW_INDEX:2'", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("Safari", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
await imageToolHandler(
|
||||
{ app_target: "Safari:WINDOW_INDEX:2" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining([
|
||||
"--app", "Safari",
|
||||
"--mode", "window",
|
||||
"--window-index", "2"
|
||||
]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
|
||||
it("should handle capture_focus parameter", async () => {
|
||||
const mockResponse = mockSwiftCli.captureImage("screen", {});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockResponse);
|
||||
|
||||
await imageToolHandler(
|
||||
{ capture_focus: "foreground" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--capture-focus", "foreground"]),
|
||||
mockLogger,
|
||||
);
|
||||
});
|
||||
});
|
||||
|
||||
describe("imageToolHandler - Capture and Analyze", () => {
|
||||
const MOCK_QUESTION = "What is in this image?";
|
||||
const MOCK_ANALYSIS_RESPONSE = "This is a cat.";
|
||||
const MOCK_PROVIDER_DETAILS: AIProvider = {
|
||||
provider: "ollama",
|
||||
model: "llava:custom",
|
||||
};
|
||||
const MOCK_MODEL_USED = "ollama/llava:latest";
|
||||
|
||||
beforeEach(() => {
|
||||
mockParseAIProviders.mockReturnValue([
|
||||
{ provider: "ollama", model: "llava:default" },
|
||||
{ provider: "ollama", model: "llava:latest" }
|
||||
]);
|
||||
mockDetermineProviderAndModel.mockResolvedValue(MOCK_PROVIDER_DETAILS);
|
||||
mockAnalyzeImageWithProvider.mockResolvedValue(MOCK_ANALYSIS_RESPONSE);
|
||||
mockReadImageAsBase64.mockResolvedValue("base64dataforanalysis");
|
||||
process.env.PEEKABOO_AI_PROVIDERS = "ollama/llava:default";
|
||||
process.env.PEEKABOO_AI_PROVIDERS = "ollama/llava:latest";
|
||||
});
|
||||
|
||||
it("should capture, analyze, and delete temp image if no path provided", async () => {
|
||||
|
|
@ -196,32 +346,29 @@ describe("Image Tool", () => {
|
|||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{
|
||||
question: MOCK_QUESTION,
|
||||
format: "png",
|
||||
},
|
||||
{ question: MOCK_QUESTION },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockFsMkdtemp).toHaveBeenCalled();
|
||||
// The new implementation creates a temp dir first, then joins with capture.png
|
||||
expect(mockPathJoin).toHaveBeenCalledWith(
|
||||
expect.stringMatching(/^\/tmp\/peekaboo-img-/),
|
||||
"capture.png",
|
||||
);
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--path", MOCK_TEMP_IMAGE_PATH]),
|
||||
mockLogger,
|
||||
);
|
||||
expect(mockReadImageAsBase64).toHaveBeenCalledWith(MOCK_TEMP_IMAGE_PATH);
|
||||
expect(mockDetermineProviderAndModel).toHaveBeenCalled();
|
||||
expect(mockAnalyzeImageWithProvider).toHaveBeenCalledWith(
|
||||
MOCK_PROVIDER_DETAILS,
|
||||
MOCK_TEMP_IMAGE_PATH,
|
||||
expect.objectContaining({ provider: "ollama", model: "llava:latest" }),
|
||||
MOCK_TEMP_ANALYSIS_PATH,
|
||||
"base64dataforanalysis",
|
||||
MOCK_QUESTION,
|
||||
mockLogger,
|
||||
);
|
||||
|
||||
expect(result.analysis_text).toBe(MOCK_ANALYSIS_RESPONSE);
|
||||
expect(result.model_used).toBe(
|
||||
`${MOCK_PROVIDER_DETAILS.provider}/${MOCK_PROVIDER_DETAILS.model}`,
|
||||
);
|
||||
expect(result.model_used).toBe(MOCK_MODEL_USED);
|
||||
expect(result.content).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({
|
||||
|
|
@ -236,11 +383,12 @@ describe("Image Tool", () => {
|
|||
]),
|
||||
);
|
||||
expect(result.saved_files).toEqual([]);
|
||||
// No base64 in content when question is asked
|
||||
expect(
|
||||
result.content.some((item) => item.type === "image" && item.data),
|
||||
).toBe(false);
|
||||
expect(mockFsUnlink).toHaveBeenCalledWith(MOCK_TEMP_IMAGE_PATH);
|
||||
expect(mockFsRmdir).toHaveBeenCalledWith(MOCK_TEMP_DIR_PATH);
|
||||
expect(mockFsRmdir).toHaveBeenCalledWith(MOCK_TEMP_IMAGE_DIR);
|
||||
expect(result.isError).toBeUndefined();
|
||||
});
|
||||
|
||||
|
|
@ -261,63 +409,34 @@ describe("Image Tool", () => {
|
|||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockFsMkdtemp).not.toHaveBeenCalled();
|
||||
// Path.join is called for the analysis temp path
|
||||
expect(mockPathJoin).toHaveBeenCalledWith(
|
||||
expect.stringMatching(/^\/tmp\/peekaboo-analysis-/),
|
||||
"image.png",
|
||||
);
|
||||
expect(mockExecuteSwiftCli).toHaveBeenCalledWith(
|
||||
expect.arrayContaining(["--path", USER_PATH]),
|
||||
mockLogger,
|
||||
);
|
||||
expect(mockReadImageAsBase64).toHaveBeenCalledWith(USER_PATH);
|
||||
expect(mockAnalyzeImageWithProvider).toHaveBeenCalled();
|
||||
|
||||
expect(result.analysis_text).toBe(MOCK_ANALYSIS_RESPONSE);
|
||||
expect(result.saved_files).toEqual(mockCliResponse.data?.saved_files);
|
||||
expect(mockFsUnlink).not.toHaveBeenCalled();
|
||||
expect(mockFsRmdir).not.toHaveBeenCalled();
|
||||
expect(result.isError).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should use provider_config if specified", async () => {
|
||||
const specificProviderConfig: AIProviderConfig = {
|
||||
type: "openai",
|
||||
model: "gpt-4-vision",
|
||||
};
|
||||
const specificProviderDetails: AIProvider = {
|
||||
provider: "openai",
|
||||
model: "gpt-4-vision",
|
||||
};
|
||||
mockDetermineProviderAndModel.mockResolvedValue(specificProviderDetails);
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
await imageToolHandler(
|
||||
{
|
||||
question: MOCK_QUESTION,
|
||||
provider_config: specificProviderConfig,
|
||||
format: "png",
|
||||
},
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockDetermineProviderAndModel).toHaveBeenCalledWith(
|
||||
specificProviderConfig,
|
||||
expect.any(Array),
|
||||
expect.arrayContaining(["--path", USER_PATH, "--format", "jpg"]),
|
||||
mockLogger,
|
||||
);
|
||||
expect(mockAnalyzeImageWithProvider).toHaveBeenCalledWith(
|
||||
specificProviderDetails,
|
||||
MOCK_TEMP_IMAGE_PATH,
|
||||
expect.objectContaining({ provider: "ollama", model: "llava:latest" }),
|
||||
MOCK_TEMP_ANALYSIS_PATH,
|
||||
"base64dataforanalysis",
|
||||
MOCK_QUESTION,
|
||||
mockLogger,
|
||||
);
|
||||
|
||||
expect(result.analysis_text).toBe(MOCK_ANALYSIS_RESPONSE);
|
||||
expect(result.saved_files).toEqual(mockCliResponse.data?.saved_files);
|
||||
// Analysis temp file is cleaned up
|
||||
expect(mockFsUnlink).toHaveBeenCalledWith(MOCK_TEMP_ANALYSIS_PATH);
|
||||
expect(mockFsRmdir).toHaveBeenCalledWith(MOCK_TEMP_ANALYSIS_DIR);
|
||||
expect(result.isError).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should handle failure in readImageAsBase64 before analysis", async () => {
|
||||
mockReadImageAsBase64.mockRejectedValue(
|
||||
new Error("Failed to read image"),
|
||||
it("should handle failure in AI provider", async () => {
|
||||
mockAnalyzeImageWithProvider.mockRejectedValue(
|
||||
new Error("AI analysis failed"),
|
||||
);
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
|
|
@ -326,23 +445,21 @@ describe("Image Tool", () => {
|
|||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
{ question: MOCK_QUESTION },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockAnalyzeImageWithProvider).not.toHaveBeenCalled();
|
||||
expect(result.analysis_text).toContain(
|
||||
"Analysis skipped: Failed to read captured image",
|
||||
"Analysis failed: All configured AI providers failed or are unavailable",
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
expect(result.model_used).toBeUndefined();
|
||||
expect(mockFsUnlink).toHaveBeenCalledWith(MOCK_TEMP_IMAGE_PATH);
|
||||
expect(mockFsRmdir).toHaveBeenCalledWith(MOCK_TEMP_IMAGE_DIR);
|
||||
});
|
||||
|
||||
it("should handle failure in determineProviderAndModel (rejected promise)", async () => {
|
||||
mockDetermineProviderAndModel.mockRejectedValue(
|
||||
new Error("No provider available error from determine"),
|
||||
);
|
||||
it("should handle when AI provider returns empty analysisText", async () => {
|
||||
mockAnalyzeImageWithProvider.mockResolvedValue("");
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
|
|
@ -350,102 +467,16 @@ describe("Image Tool", () => {
|
|||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
{ question: MOCK_QUESTION },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockAnalyzeImageWithProvider).not.toHaveBeenCalled();
|
||||
expect(result.analysis_text).toContain(
|
||||
"AI analysis failed: No provider available error from determine",
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
// When AI provider returns empty string, it's still considered a "success"
|
||||
expect(result.analysis_text).toBe("");
|
||||
expect(result.isError).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should handle failure when determineProviderAndModel resolves to no provider", async () => {
|
||||
mockDetermineProviderAndModel.mockResolvedValue({
|
||||
provider: null,
|
||||
model: "",
|
||||
});
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(mockAnalyzeImageWithProvider).not.toHaveBeenCalled();
|
||||
expect(result.analysis_text).toContain(
|
||||
"Analysis skipped: No AI providers are currently operational",
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
});
|
||||
|
||||
it("should handle failure in analyzeImageWithProvider", async () => {
|
||||
mockAnalyzeImageWithProvider.mockRejectedValue(
|
||||
new Error("AI API Error from analyze"),
|
||||
);
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(result.analysis_text).toContain(
|
||||
"AI analysis failed: AI API Error from analyze",
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
expect(result.model_used).toBeUndefined();
|
||||
});
|
||||
|
||||
it("should correctly report error if PEEKABOO_AI_PROVIDERS is not set and no provider_config given", async () => {
|
||||
process.env.PEEKABOO_AI_PROVIDERS = "";
|
||||
mockParseAIProviders.mockReturnValue([]);
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
mockContext,
|
||||
);
|
||||
|
||||
expect(result.analysis_text).toContain(
|
||||
"Analysis skipped: AI analysis not configured on this server",
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
expect(mockAnalyzeImageWithProvider).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it("should return isError = true if analysis is attempted but fails, even if capture succeeds", async () => {
|
||||
mockAnalyzeImageWithProvider.mockRejectedValue(new Error("AI Error"));
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
});
|
||||
mockExecuteSwiftCli.mockResolvedValue(mockCliResponse);
|
||||
|
||||
const result = await imageToolHandler(
|
||||
{ question: MOCK_QUESTION, format: "png" },
|
||||
mockContext,
|
||||
);
|
||||
expect(result.isError).toBe(true);
|
||||
expect(result.content[0].text).toContain("Captured 1 image");
|
||||
expect(result.content[0].text).toContain("Analysis failed/skipped");
|
||||
expect(result.analysis_text).toContain("AI analysis failed: AI Error");
|
||||
});
|
||||
|
||||
it("should NOT return base64_data in content if question is asked, even if return_data is true", async () => {
|
||||
it("should NOT return base64 data in content if question is asked", async () => {
|
||||
const mockCliResponse = mockSwiftCli.captureImage("screen", {
|
||||
path: MOCK_TEMP_IMAGE_PATH,
|
||||
format: "png",
|
||||
|
|
@ -455,8 +486,7 @@ describe("Image Tool", () => {
|
|||
const result = await imageToolHandler(
|
||||
{
|
||||
question: MOCK_QUESTION,
|
||||
return_data: true,
|
||||
format: "png",
|
||||
format: "data", // Even with format: "data"
|
||||
},
|
||||
mockContext,
|
||||
);
|
||||
|
|
@ -469,14 +499,8 @@ describe("Image Tool", () => {
|
|||
});
|
||||
|
||||
describe("buildSwiftCliArgs", () => {
|
||||
const defaults = {
|
||||
format: "png" as const,
|
||||
return_data: false,
|
||||
capture_focus: "background" as const,
|
||||
};
|
||||
|
||||
it("should default to screen mode if no app provided and no mode specified", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults });
|
||||
it("should default to screen mode if no app_target", () => {
|
||||
const args = buildSwiftCliArgs({});
|
||||
expect(args).toEqual([
|
||||
"image",
|
||||
"--mode",
|
||||
|
|
@ -488,14 +512,12 @@ describe("Image Tool", () => {
|
|||
]);
|
||||
});
|
||||
|
||||
it("should default to window mode if app is provided and no mode specified", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults, app: "Safari" });
|
||||
it("should handle empty app_target", () => {
|
||||
const args = buildSwiftCliArgs({ app_target: "" });
|
||||
expect(args).toEqual([
|
||||
"image",
|
||||
"--app",
|
||||
"Safari",
|
||||
"--mode",
|
||||
"window",
|
||||
"screen",
|
||||
"--format",
|
||||
"png",
|
||||
"--capture-focus",
|
||||
|
|
@ -503,97 +525,101 @@ describe("Image Tool", () => {
|
|||
]);
|
||||
});
|
||||
|
||||
it("should use specified mode: screen", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults, mode: "screen" });
|
||||
expect(args).toEqual(expect.arrayContaining(["--mode", "screen"]));
|
||||
it("should handle app_target: 'screen:1'", () => {
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
const args = buildSwiftCliArgs({ app_target: "screen:1" }, mockLogger);
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--mode", "screen"]),
|
||||
);
|
||||
expect(args).not.toContain("--app");
|
||||
expect(loggerWarnSpy).toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it("should use specified mode: window with app", () => {
|
||||
const args = buildSwiftCliArgs({
|
||||
...defaults,
|
||||
app: "Terminal",
|
||||
mode: "window",
|
||||
});
|
||||
it("should handle app_target: 'frontmost'", () => {
|
||||
const loggerWarnSpy = vi.spyOn(mockLogger, "warn");
|
||||
const args = buildSwiftCliArgs({ app_target: "frontmost" }, mockLogger);
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--app", "Terminal", "--mode", "window"]),
|
||||
expect.arrayContaining(["--mode", "screen"]),
|
||||
);
|
||||
expect(args).not.toContain("--app");
|
||||
expect(loggerWarnSpy).toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it("should handle simple app name", () => {
|
||||
const args = buildSwiftCliArgs({ app_target: "Safari" });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--app", "Safari", "--mode", "multi"]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should use specified mode: multi with app", () => {
|
||||
const args = buildSwiftCliArgs({
|
||||
...defaults,
|
||||
app: "Finder",
|
||||
mode: "multi",
|
||||
});
|
||||
it("should handle app with window title", () => {
|
||||
const args = buildSwiftCliArgs({ app_target: "Safari:WINDOW_TITLE:Apple Website" });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--app", "Finder", "--mode", "multi"]),
|
||||
expect.arrayContaining([
|
||||
"--app", "Safari",
|
||||
"--mode", "window",
|
||||
"--window-title", "Apple Website"
|
||||
]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should include app", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults, app: "Notes" });
|
||||
expect(args).toEqual(expect.arrayContaining(["--app", "Notes"]));
|
||||
it("should handle app with window index", () => {
|
||||
const args = buildSwiftCliArgs({ app_target: "Terminal:WINDOW_INDEX:2" });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining([
|
||||
"--app", "Terminal",
|
||||
"--mode", "window",
|
||||
"--window-index", "2"
|
||||
]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should include path", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults, path: "/tmp/image.jpg" });
|
||||
it("should include path when provided", () => {
|
||||
const args = buildSwiftCliArgs({ path: "/tmp/image.jpg" });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--path", "/tmp/image.jpg"]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should include window_specifier by title", () => {
|
||||
const args = buildSwiftCliArgs({
|
||||
...defaults,
|
||||
app: "Safari",
|
||||
window_specifier: { title: "Apple" },
|
||||
});
|
||||
expect(args).toEqual(expect.arrayContaining(["--window-title", "Apple"]));
|
||||
});
|
||||
|
||||
it("should include window_specifier by index", () => {
|
||||
const args = buildSwiftCliArgs({
|
||||
...defaults,
|
||||
app: "Safari",
|
||||
window_specifier: { index: 0 },
|
||||
});
|
||||
expect(args).toEqual(expect.arrayContaining(["--window-index", "0"]));
|
||||
});
|
||||
|
||||
it("should include format (default png)", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults });
|
||||
it("should handle format: 'data' as png for Swift CLI", () => {
|
||||
const args = buildSwiftCliArgs({ format: "data" });
|
||||
expect(args).toEqual(expect.arrayContaining(["--format", "png"]));
|
||||
});
|
||||
|
||||
it("should include specified format jpg", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults, format: "jpg" });
|
||||
it("should include format jpg", () => {
|
||||
const args = buildSwiftCliArgs({ format: "jpg" });
|
||||
expect(args).toEqual(expect.arrayContaining(["--format", "jpg"]));
|
||||
});
|
||||
|
||||
it("should include capture_focus (default background)", () => {
|
||||
const args = buildSwiftCliArgs({ ...defaults });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--capture-focus", "background"]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should include specified capture_focus foreground", () => {
|
||||
const args = buildSwiftCliArgs({
|
||||
...defaults,
|
||||
capture_focus: "foreground",
|
||||
});
|
||||
it("should include capture_focus", () => {
|
||||
const args = buildSwiftCliArgs({ capture_focus: "foreground" });
|
||||
expect(args).toEqual(
|
||||
expect.arrayContaining(["--capture-focus", "foreground"]),
|
||||
);
|
||||
});
|
||||
|
||||
it("should use PEEKABOO_DEFAULT_SAVE_PATH if no path and no question", () => {
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = "/default/env.png";
|
||||
const args = buildSwiftCliArgs({});
|
||||
expect(args).toContain("--path");
|
||||
expect(args).toContain("/default/env.png");
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
});
|
||||
|
||||
it("should NOT use PEEKABOO_DEFAULT_SAVE_PATH if effectivePath is provided", () => {
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = "/default/env.png";
|
||||
// When effectivePath is provided (which happens when question is asked), it overrides PEEKABOO_DEFAULT_SAVE_PATH
|
||||
const args = buildSwiftCliArgs({}, undefined, "/tmp/temp-path.png");
|
||||
expect(args).toContain("--path");
|
||||
expect(args).toContain("/tmp/temp-path.png");
|
||||
expect(args).not.toContain("/default/env.png");
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
});
|
||||
|
||||
it("should handle all options together", () => {
|
||||
const input: ImageToolInput = {
|
||||
...defaults, // Ensure all required fields are present
|
||||
app: "Preview",
|
||||
path: "/users/test/file.tiff",
|
||||
mode: "window",
|
||||
window_specifier: { index: 1 },
|
||||
const input: ImageInput = {
|
||||
app_target: "Preview:WINDOW_INDEX:1",
|
||||
path: "/users/test/file.png",
|
||||
format: "png",
|
||||
capture_focus: "foreground",
|
||||
};
|
||||
|
|
@ -602,53 +628,17 @@ describe("Image Tool", () => {
|
|||
"image",
|
||||
"--app",
|
||||
"Preview",
|
||||
"--path",
|
||||
"/users/test/file.tiff",
|
||||
"--mode",
|
||||
"window",
|
||||
"--window-index",
|
||||
"1",
|
||||
"--path",
|
||||
"/users/test/file.png",
|
||||
"--format",
|
||||
"png",
|
||||
"--capture-focus",
|
||||
"foreground",
|
||||
]);
|
||||
});
|
||||
|
||||
it("should use input.path if provided, even with a question", () => {
|
||||
const input: ImageToolInput = { path: "/my/path.png", question: "test" };
|
||||
const args = buildSwiftCliArgs(input);
|
||||
expect(args).toContain("--path");
|
||||
expect(args).toContain("/my/path.png");
|
||||
});
|
||||
|
||||
it("should NOT use PEEKABOO_DEFAULT_SAVE_PATH if a question is asked", () => {
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = "/default/env.png";
|
||||
const input: ImageToolInput = { question: "test" };
|
||||
const args = buildSwiftCliArgs(input);
|
||||
expect(args.includes("--path")).toBe(false);
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
});
|
||||
|
||||
it("should use PEEKABOO_DEFAULT_SAVE_PATH if no path and no question", () => {
|
||||
process.env.PEEKABOO_DEFAULT_SAVE_PATH = "/default/env.png";
|
||||
const input: ImageToolInput = {};
|
||||
const args = buildSwiftCliArgs(input);
|
||||
expect(args).toContain("--path");
|
||||
expect(args).toContain("/default/env.png");
|
||||
delete process.env.PEEKABOO_DEFAULT_SAVE_PATH;
|
||||
});
|
||||
|
||||
it("should use default format and capture_focus if not provided", () => {
|
||||
const input: ImageToolInput = {
|
||||
format: "png",
|
||||
capture_focus: "background",
|
||||
};
|
||||
const args = buildSwiftCliArgs(input);
|
||||
expect(args).toContain("--format");
|
||||
expect(args).toContain("png");
|
||||
expect(args).toContain("--capture-focus");
|
||||
expect(args).toContain("background");
|
||||
});
|
||||
});
|
||||
});
|
||||
});
|
||||
Loading…
Reference in a new issue