Add proper tool description and work around a bug in Gemini’s parser

2026-06-28 05:29:34 +00:00 · 2025-05-25 16:45:10 +02:00 · 2025-05-25 16:45:10 +02:00 · f41e70e23e
commit f41e70e23e
parent 50846a5816
6 changed files with 417 additions and 201 deletions
--- a/README.md
+++ b/README.md
@ -522,18 +522,76 @@ Once summoned, Peekaboo grants you three supernatural abilities:

 ### 🖼️ `image` - Soul Capture

-**Parameters:**
- `mode`: `"screen"` | `"window"` | `"multi"` (default: "screen")
- `app`: Application identifier for window/multi modes
- `path`: Custom save path (optional)
+The `image` tool is a powerful utility for capturing screenshots of the entire screen, specific application windows, or all windows of an application. It now also supports optional AI-powered analysis of the captured image.
+
+**Dual Capability: Capture & Analyze**
+
+*   **Capture Only**: By default, or if only capture-related parameters are provided, the tool functions purely as a screenshot utility. It can save images to a specified path and/or return image data directly in the response.
+*   **Capture & Analyze**: If the `question` parameter is provided, the tool first captures the image as specified. Then, it sends this image along with the `question` to an AI model for analysis. The analysis results are returned alongside capture information. When a `question` is provided, image data is *not* returned in `base64_data` format to avoid excessively large responses; instead, focus is on the analysis result. If you need the image file itself when performing analysis, ensure you provide a `path` for it to be saved.
+
+The AI analysis capabilities leverage the same configuration as the `analyze` tool, primarily through the `PEEKABOO_AI_PROVIDERS` environment variable. You can also specify a particular AI provider and model per call using the `provider_config` parameter.
+
+**Parameters:**
+
+*   `app` (string, optional):
+    *   **Description**: Target application for capture. Can be the application's name (e.g., "Safari"), bundle ID (e.g., "com.apple.Safari"), or a partial name.
+    *   **Default**: If omitted, the tool captures the entire screen(s) or the focused window based on other parameters.
+    *   **Behavior**: Uses fuzzy matching to find the application.
+*   `path` (string, optional):
+    *   **Description**: The base absolute path where the captured image(s) should be saved.
+    *   **Default**: If omitted, images are saved to a default temporary path generated by the backend. If `question` is provided and `path` is omitted, a temporary path is used for the capture before analysis, and this temporary file is cleaned up afterwards.
+    *   **Behavior**:
+        *   For `screen` or `multi` mode captures, the backend automatically appends display or window-specific information to this base path to create unique filenames (e.g., `your_path_Display1.png`, `your_path_Window1_Title.png`).
+        *   If `return_data` is `true` and a `path` is specified, images are both saved to the path AND returned as base64 data in the response (unless a `question` is also provided, in which case base64 data is not returned).
+*   `mode` (enum: `"screen" | "window" | "multi"`, optional):
+    *   **Description**: Specifies the capture mode.
+        *   `"screen"`: Captures the entire content of each connected display.
+        *   `"window"`: Captures a single window of a target application. Requires `app` to be specified.
+        *   `"multi"`: Captures all windows of a target application individually. Requires `app` to be specified.
+    *   **Default**: Defaults to `"window"` if `app` is provided, otherwise defaults to `"screen"`.
+*   `window_specifier` (object, optional):
+    *   **Description**: Used in `window` mode to specify which window of the target application to capture.
+    *   **Structure**: Can be one of:
+        *   `{ "title": "window_title_string" }`: Captures the window whose title contains the given string.
+        *   `{ "index": number }`: Captures the window by its index (0 for the frontmost window, 1 for the next, and so on). Using `index` might require setting `capture_focus` to `"foreground"`.
+    *   **Default**: If omitted in `window` mode, captures the main/frontmost window of the target application.
+*   `format` (enum: `"png" | "jpg"`, optional):
+    *   **Description**: The desired output image format.
+    *   **Default**: `"png"`.
+*   `return_data` (boolean, optional):
+    *   **Description**: If set to `true`, the image data (as base64 encoded string) is included in the response content.
+    *   **Default**: `false`.
+    *   **Behavior**:
+        *   For `"window"` mode, one image data item is returned.
+        *   For `"screen"` or `"multi"` mode, multiple image data items may be returned (one per display/window).
+        *   **Important**: If a `question` is provided for analysis, `base64_data` is *not* returned in the response, regardless of this flag, to keep response sizes manageable. The focus shifts to providing the analysis result.
+*   `capture_focus` (enum: `"background" | "foreground"`, optional):
+    *   **Description**: Controls how the tool interacts with window focus during capture.
+        *   `"background"`: Captures the window without altering its current focus state or bringing the application to the front. This is generally preferred for non-interactive use.
+        *   `"foreground"`: Brings the target application and window to the front (activates it) before performing the capture. This might be necessary for certain applications or to ensure the desired window content, especially when using `window_specifier` by `index`.
+    *   **Default**: `"background"`.
+*   `question` (string, optional):
+    *   **Description**: If a question string is provided, the tool will capture the image as specified and then send it (along with this question) to an AI model for analysis.
+    *   **Default**: None (no analysis is performed by default).
+    *   **Behavior**: The analysis result will be included in the response. `PEEKABOO_AI_PROVIDERS` environment variable is used for configuring AI, or `provider_config` can be used for per-call settings.
+*   `provider_config` (object, optional):
+    *   **Description**: Specifies AI provider configuration for the analysis when a `question` is provided. This allows overriding the server's default AI provider settings for this specific call.
+    *   **Structure**: Same as the `analyze` tool's `provider_config` parameter (e.g., `{ "type": "ollama", "model": "llava" }` or `{ "type": "openai", "model": "gpt-4-vision-preview" }`).
+    *   **Default**: If omitted, the analysis will use the server's default AI configuration (from `PEEKABOO_AI_PROVIDERS`).
+
+**Example Usage (Capture & Analyze):**

-**Example:**
 ```json
 {
-  "name": "image", 
+  "name": "image",
  "arguments": {
    "mode": "window",
-    "app": "Safari"
+    "app": "Safari",
+    "path": "~/Desktop/SafariScreenshot.png",
+    "format": "png",
+    "return_data": true,
+    "capture_focus": "foreground",
+    "question": "What is the main color visible in the top-left quadrant?"
  }
 }
 ```
--- a/docs/tool-description.md
+++ b/docs/tool-description.md
@ -0,0 +1,94 @@
+# Tool Schema Description: Debugging and Resolution
+
+This document outlines the process undertaken to debug and resolve issues related to tool schema generation and compatibility, particularly for the `image` tool in the Peekaboo MCP server. The goal was to ensure that tool parameters, including their descriptions, were correctly processed by the `zodToJsonSchema` function and displayed accurately in the client (Cursor, powered by Gemini 2.5 Pro).
+
+## 1. Initial Problem
+
+The primary issue was that the `image` tool's parameters were not loading or being displayed correctly in the client. The client often reported an "incompatible schema" error for this tool, while other tools like `analyze` and `list` (which had simpler schemas) were working correctly after some initial refinements. This indicated a problem specific to the complexity or structure of the `image` tool's Zod schema or how it was being converted to a JSON schema.
+
+## 2. Refinement of `zodToJsonSchema`
+
+A significant early step was to refactor the `zodToJsonSchema` function located in `src/index.ts`. The initial version was somewhat simplistic and did not robustly handle various Zod constructs:
+*   Extracting descriptions from `.describe()` calls, especially when nested within `.optional()` or `.default()`.
+*   Properly representing Zod unions (`z.union()`), objects (`z.object()`), enums (`z.enum()`), and custom types (`z.custom()`).
+
+The refactoring involved:
+*   Creating a recursive helper function (`unwrapZodSchema`) to get to the core Zod type and its description, peeling off wrappers like `ZodOptional` and `ZodDefault`.
+*   Ensuring that descriptions from `.describe()` calls were consistently picked up and added to the `description` field in the resulting JSON schema properties.
+*   Explicitly handling different Zod types (`ZodString`, `ZodBoolean`, `ZodNumber`, `ZodEnum`, `ZodObject`, `ZodArray`, `ZodUnion`, `ZodLiteral`, `ZodNativeEnum`, `ZodEffects` for transformations/refinements) to build a more accurate JSON schema representation.
+
+This refactoring was crucial for the `analyze` and `list` tools to display their parameters correctly and laid the foundation for debugging the `image` tool.
+
+## 3. Debugging the `imageToolSchema`
+
+The `imageToolSchema` was the most complex, involving several optional fields, enums, a union of objects, and a custom Zod type. The debugging approach was methodical:
+
+1.  **Bottom-Up Simplification**: The `imageToolSchema` was initially reduced to its simplest possible form (e.g., a single optional string field like `app`). The `imageToolHandler` logic was also temporarily stubbed out to return a static success message to avoid TypeScript errors due to the schema changes.
+2.  **Incremental Re-addition of Fields**: One by one, each original field was added back to the `imageToolSchema`, followed by a build and client test:
+    *   `app` (optional string)
+    *   `question` (optional string)
+    *   `return_data` (optional boolean with default)
+    *   `format` (optional enum with default)
+    *   `capture_focus` (optional enum with default)
+    *   `path` (optional string)
+    *   `mode` (optional enum without Zod default)
+    *   All these fields, when added individually or in combination, resulted in a schema that was correctly displayed in the client. This confirmed that basic Zod types, optionals, defaults, and enums were being handled correctly by the improved `zodToJsonSchema` and were compatible with the client/model.
+3.  **Identifying the Problematic Fields**: The issues arose when reintroducing the more complex fields:
+    *   `window_specifier`: An optional `z.union([z.object({ title: ... }), z.object({ index: ... })])`. This field, surprisingly, *did* work and its parameters displayed correctly once the main tool description in `src/index.ts` was shortened (see section 4).
+    *   `provider_config`: This was the most problematic. Initially defined in `imageToolSchema` as `z.custom<AIProviderConfig>().optional().describe(...)`, where `AIProviderConfig` was a `z.union([OllamaConfig, OpenAIConfig])`.
+
+## 4. Addressing UI Space and Main Descriptions
+
+During testing, it became apparent that the client UI (Cursor's tooltip/parameter display area) had limited space. The verbose, multi-line `description` strings in `src/index.ts` (which manually listed parameters as a hack) were consuming this space, preventing the schema-derived parameters from being fully visible.
+
+**Solution**: The main `description` strings for all tools in `src/index.ts` were shortened to be concise summaries. This allowed the client to properly display the parameter details generated by `zodToJsonSchema`.
+
+## 5. Resolving `provider_config` Incompatibility
+
+Even with the refined `zodToJsonSchema` and shortened main descriptions, the `image` tool would often trigger an "incompatible schema" error message from the Gemini model when `provider_config` was included with its `z.custom(z.union(...))` definition. However, paradoxically, the client UI *would sometimes still display the parameters correctly*, suggesting a discrepancy between the client's display-rendering schema validation and the model's execution-time schema validation.
+
+**The Fix**:
+The `provider_config` field in `imageToolSchema` (in `src/tools/image.ts`) was changed to mirror the structure used in the `analyzeToolSchema` (which was working reliably). Instead of `z.custom<AIProviderConfig>()`, it became a direct `z.object()` definition:
+```typescript
+provider_config: z
+    .object({
+      type: z
+        .enum(["auto", "ollama", "openai"])
+        .default("auto")
+        .describe(
+          "AI provider type. 'auto' uses server default.",
+        ),
+      model: z
+        .string()
+        .optional()
+        .describe(
+          "Optional model name. Uses server default if omitted.",
+        ),
+    })
+    .optional()
+    .describe(
+      "Optional. Specify AI provider/model for analysis.",
+    ),
+```
+This simpler, more explicit Zod structure for `provider_config` resolved the incompatibility. The client was then able to consistently load and display all parameters for the `image` tool without the "incompatible schema" error blocking its usability (though the error message itself sometimes lingered, possibly due to caching, it no longer prevented parameter display).
+
+## 6. Restoring Handler Logic and Testing
+
+After the schema was confirmed to be working, the stubbed-out implementations of `imageToolHandler`, `buildSwiftCliArgs`, and `generateImageCaptureSummary` in `src/tools/image.ts` needed to be reverted to their original, fully functional code. Following this, `npm test` was run to ensure all unit and integration tests passed.
+
+## 7. Fine-tuning Description Length for Client UI
+
+After successfully resolving the Gemini model's schema compatibility issues, a separate observation was made regarding the client UI (Cursor). When the main tool descriptions in `src/index.ts` were made very verbose (e.g., including long use-case examples directly in the description string), Cursor's UI would not display these long descriptions, defaulting to showing only the parameter list generated from the schema. This was *not* a Gemini model rejection but rather a UI display limitation or choice.
+
+**Solution**: The main descriptions were adjusted to be moderately detailed, providing core capabilities and multi-screen/window behavior, while omitting extremely long examples. This length was found to be acceptable for Cursor's UI, allowing it to display the richer description alongside the schema-derived parameters.
+
+## Key Learnings:
+
+*   **Schema Simplicity**: The Gemini model's schema validation appears to be sensitive to complex Zod structures, especially combinations like `z.custom()` wrapping `z.union()` of `z.object()`s. Favoring more direct and explicit Zod definitions (e.g., `z.object()` with clearly defined properties) improves compatibility.
+*   **`zodToJsonSchema` Robustness**: A comprehensive `zodToJsonSchema` function that correctly handles various Zod types and extracts `.describe()` metadata is crucial for accurate schema generation.
+*   **Client UI vs. Model Validation**: There can be slight differences in how a client UI parses/displays a schema and how the underlying model validates it for execution. Successful display in the UI is a good sign but not a definitive guarantee of model compatibility if complex types are involved. The inverse can also occur: the model may accept a schema that a client UI truncates or simplifies for display due to its own constraints.
+*   **Main Description Length for Client UI**: Client UIs (like Cursor) may have their own limitations or display preferences for the length of the main tool description string. Overly verbose descriptions might not be fully displayed. It's important to balance richness of information with conciseness suitable for the UI, relying on the schema for detailed parameter information.
+*   **Concise Main Descriptions (Initial Approach)**: Initially, keeping the primary `description` field for a tool (in `src/index.ts`) very concise was a workaround for UI space issues, allowing schema-derived parameters to appear. The final approach found a middle ground.
+*   **Iterative Debugging**: The bottom-up approach (simplifying then incrementally adding complexity) was highly effective in isolating the problematic parts of the schema.
+
+By addressing these points, particularly the structure of `provider_config`, the verbosity of main descriptions for UI compatibility, and ensuring a robust `zodToJsonSchema` implementation, the Peekaboo tools' schemas were made fully compatible and are now presented effectively in the client. 
--- a/src/index.ts
+++ b/src/index.ts
@ -22,6 +22,7 @@ import {
 } from "./tools/index.js";
 import { generateServerStatusString } from "./utils/server-status.js";
 import { initializeSwiftCliPath } from "./utils/peekaboo-cli.js";
+import { zodToJsonSchema } from "./utils/zod-to-json-schema.js";

 // Get package version and determine package root
 const __filename = fileURLToPath(import.meta.url);
@ -78,86 +79,6 @@ const logger = pino(
 // Tool context for handlers
 const toolContext = { logger };

-// Convert Zod schema to JSON Schema format
-function zodToJsonSchema(schema: any): any {
-  // Simple conversion - this would need to be more sophisticated for complex schemas
-  if (schema._def?.typeName === "ZodObject") {
-    const properties: any = {};
-    const required = [];
-
-    for (const [key, value] of Object.entries(schema.shape)) {
-      const field = value as any;
-      if (field._def?.typeName === "ZodString") {
-        properties[key] = { type: "string", description: field.description };
-        if (!field.isOptional()) required.push(key);
-      } else if (field._def?.typeName === "ZodEnum") {
-        properties[key] = {
-          type: "string",
-          enum: field._def.values,
-          description: field.description,
-        };
-        if (!field.isOptional()) required.push(key);
-      } else if (field._def?.typeName === "ZodBoolean") {
-        properties[key] = { type: "boolean", description: field.description };
-        if (!field.isOptional()) required.push(key);
-      } else if (field._def?.typeName === "ZodOptional") {
-        // Handle optional fields
-        const innerType = field._def.innerType;
-        if (innerType._def?.typeName === "ZodString") {
-          properties[key] = {
-            type: "string",
-            description: innerType.description,
-          };
-        } else if (innerType._def?.typeName === "ZodEnum") {
-          properties[key] = {
-            type: "string",
-            enum: innerType._def.values,
-            description: innerType.description,
-          };
-        }
-      } else if (field._def?.typeName === "ZodDefault") {
-        // Handle default fields
-        const innerType = field._def.innerType;
-        if (innerType._def?.typeName === "ZodString") {
-          properties[key] = {
-            type: "string",
-            description: innerType.description,
-            default: field._def.defaultValue(),
-          };
-        } else if (innerType._def?.typeName === "ZodEnum") {
-          properties[key] = {
-            type: "string",
-            enum: innerType._def.values,
-            description: innerType.description,
-            default: field._def.defaultValue(),
-          };
-        } else if (innerType._def?.typeName === "ZodBoolean") {
-          properties[key] = {
-            type: "boolean",
-            description: innerType.description,
-            default: field._def.defaultValue(),
-          };
-        }
-      } else {
-        // Fallback for complex types
-        properties[key] = {
-          type: "object",
-          description: field.description || "Complex object",
-        };
-      }
-    }
-
-    return {
-      type: "object",
-      properties,
-      required,
-    };
-  }
-
-  // Fallback
-  return { type: "object" };
-}
-
 // Create MCP server using the low-level API
 const server = new Server(
  {
@ -182,21 +103,56 @@ server.setRequestHandler(ListToolsRequestSchema, async () => {
      {
        name: "image",
        description:
-          "Captures macOS screen content. Targets: entire screen (each display separately), a specific application window, or all windows of an application. Supports foreground/background capture. Captured image(s) can be saved to file(s) and/or returned directly as image data. Window shadows/frames are automatically excluded. Application identification uses intelligent fuzzy matching." +
+`Captures macOS screen content and can optionally perform AI-powered analysis on the screenshot.
+
+Core Capabilities:
+- Screen Capture: Captures the entire screen (handles multiple displays by capturing each as a separate image), a specific application window, or all windows of a target application.
+- Flexible Output: Saves images to a specified path and/or returns image data directly in the response.
+- AI Analysis: If a 'question' parameter is provided, the captured image is sent to an AI model for analysis (e.g., asking what's in the image, reading text).
+
+Multi-Screen/Window Behavior:
+- 'screen' mode with multiple displays: Captures each display as an individual image. If a path is provided, files will be named like 'path_display1.png', 'path_display2.png'.
+- 'multi' mode for an app: Captures all visible windows of the specified application. If a path is provided, files will be named like 'path_window1_title.png', 'path_window2_title.png'.` +
          statusSuffix,
        inputSchema: zodToJsonSchema(imageToolSchema),
      },
      {
        name: "analyze",
        description:
-          "Analyzes an image file using a configured AI model (local Ollama, cloud OpenAI, etc.) and returns a textual analysis/answer. Requires image path. AI provider selection and model defaults are governed by the server's `AI_PROVIDERS` environment variable and client overrides." +
+`Analyzes a pre-existing image file from the local filesystem using a configured AI model.
+
+This tool is useful when an image already exists (e.g., previously captured, downloaded, or generated) and you need to understand its content, extract text, or answer specific questions about it.
+
+Capabilities:
+- Image Understanding: Provide any question about the image (e.g., "What objects are in this picture?", "Describe the scene.", "Is there a red car?").
+- Text Extraction (OCR): Ask the AI to extract text from the image (e.g., "What text is visible in this screenshot?").
+- Flexible AI Configuration: Can use server-default AI providers/models or specify a particular one per call via 'provider_config'.
+
+Example:
+If you have an image '/tmp/chart.png' showing a bar chart, you could ask:
+{ "image_path": "/tmp/chart.png", "question": "Which category has the highest value in this bar chart?" }
+The AI will analyze the image and attempt to answer your question based on its visual content.` +
          statusSuffix,
        inputSchema: zodToJsonSchema(analyzeToolSchema),
      },
      {
        name: "list",
        description:
-          "Lists system items: all running applications, windows of a specific app, or server status. Allows specifying window details. App ID uses fuzzy matching." +
+`Lists various system items on macOS, providing situational awareness.
+
+Capabilities:
+- Running Applications: Get a list of all currently running applications (names and bundle IDs).
+- Application Windows: For a specific application (identified by name or bundle ID), list its open windows.
+  - Details: Optionally include window IDs, bounds (position and size), and whether a window is off-screen.
+  - Multi-window apps: Clearly lists each window of the target app.
+- Server Status: Provides information about the Peekaboo MCP server itself (version, configured AI providers).
+
+Use Cases:
+- Agent needs to know if 'Photoshop' is running before attempting to automate it.
+  { "item_type": "running_applications" } // Agent checks if 'Photoshop' is in the list.
+- Agent wants to find a specific 'Notes' window to capture.
+  { "item_type": "application_windows", "app": "Notes", "include_window_details": ["ids", "bounds"] }
+  The agent can then use the window title or ID with the 'image' tool.` +
          statusSuffix,
        inputSchema: zodToJsonSchema(listToolSchema),
      },
--- a/src/tools/image.ts
+++ b/src/tools/image.ts
@ -20,10 +20,33 @@ import * as os from "os";

 export const imageToolSchema = z.object({
  app: z
+    .string()
+    .optional()
+    .describe("Target application name or bundle ID."),
+  question: z
    .string()
    .optional()
    .describe(
-      "Optional. Target application: name, bundle ID, or partial name. If omitted, captures screen(s). Uses fuzzy matching.",
+      "If provided, the captured image will be analyzed using this question. Analysis results will be added to the output.",
+    ),
+  format: z
+    .enum(["png", "jpg"])
+    .optional()
+    .default("png")
+    .describe("Output image format. Defaults to 'png'."),
+  return_data: z
+    .boolean()
+    .optional()
+    .default(false)
+    .describe(
+      "Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode). If 'question' is provided, 'base64_data' is NOT returned regardless of this flag.",
+    ),
+  capture_focus: z
+    .enum(["background", "foreground"])
+    .optional()
+    .default("background")
+    .describe(
+      "Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.",
    ),
  path: z
    .string()
@ -54,36 +77,24 @@ export const imageToolSchema = z.object({
    .describe(
      "Optional. Specifies which window for 'window' mode. Defaults to main/frontmost of target app.",
    ),
-  format: z
-    .enum(["png", "jpg"])
-    .optional()
-    .default("png")
-    .describe("Output image format. Defaults to 'png'."),
-  return_data: z
-    .boolean()
-    .optional()
-    .default(false)
-    .describe(
-      "Optional. If true, image data is returned in response content (one item for 'window' mode, multiple for 'screen' or 'multi' mode). If 'question' is provided, 'base64_data' is NOT returned regardless of this flag.",
-    ),
-  capture_focus: z
-    .enum(["background", "foreground"])
-    .optional()
-    .default("background")
-    .describe(
-      "Optional. Focus behavior. 'background' (default): capture without altering window focus. 'foreground': bring target to front before capture.",
-    ),
-  question: z
-    .string()
-    .optional()
-    .describe(
-      "If provided, the captured image will be analyzed using this question. Analysis results will be added to the output.",
-    ),
  provider_config: z
-    .custom<AIProviderConfig>()
+    .object({
+      type: z
+        .enum(["auto", "ollama", "openai"])
+        .default("auto")
+        .describe(
+          "AI provider type (e.g., 'ollama', 'openai'). 'auto' uses server default. Must be enabled on server.",
+        ),
+      model: z
+        .string()
+        .optional()
+        .describe(
+          "Optional model name. If omitted, uses server default for the chosen provider type.",
+        ),
+    })
    .optional()
    .describe(
-      "AI provider configuration for analysis (e.g., { type: 'ollama', model: 'llava' }). If not provided, uses server default configuration for analysis. Refer to 'analyze' tool schema for structure.",
+      "Optional. Specify AI provider and model for analysis. Overrides server defaults.",
    ),
 });

--- a/src/utils/zod-to-json-schema.ts
+++ b/src/utils/zod-to-json-schema.ts
@ -1,85 +1,93 @@
 import { z } from "zod";

+/**
+ * Helper function to recursively unwrap Zod schema wrappers
+ * This properly extracts descriptions from nested wrapper types
+ */
+function unwrapZodSchema(field: z.ZodTypeAny): { 
+  coreSchema: z.ZodTypeAny; 
+  description: string | undefined;
+  hasDefault: boolean;
+  defaultValue?: any;
+} {
+  let description = (field as any)._def?.description || (field as any).description;
+  let hasDefault = false;
+  let defaultValue: any;
+  
+  // Handle wrapper types
+  if (field instanceof z.ZodOptional) {
+    const inner = unwrapZodSchema(field._def.innerType);
+    return {
+      coreSchema: inner.coreSchema,
+      description: description || inner.description,
+      hasDefault: inner.hasDefault,
+      defaultValue: inner.defaultValue,
+    };
+  }
+  
+  if (field instanceof z.ZodDefault) {
+    hasDefault = true;
+    defaultValue = field._def.defaultValue();
+    const inner = unwrapZodSchema(field._def.innerType);
+    return {
+      coreSchema: inner.coreSchema,
+      description: description || inner.description,
+      hasDefault: true,
+      defaultValue,
+    };
+  }
+  
+  if (field instanceof z.ZodEffects) {
+    const inner = unwrapZodSchema(field._def.schema);
+    return {
+      coreSchema: inner.coreSchema,
+      description: description || inner.description,
+      hasDefault: inner.hasDefault,
+      defaultValue: inner.defaultValue,
+    };
+  }
+  
+  // Return the core schema
+  return { coreSchema: field, description, hasDefault, defaultValue };
+}
+
 /**
 * Convert Zod schema to JSON Schema format
- * This is a simplified converter for common Zod types used in the tools
+ * This is a robust converter for common Zod types used in the tools
 */
 export function zodToJsonSchema(schema: z.ZodTypeAny): any {
-  // Handle ZodDefault first
-  if (schema instanceof z.ZodDefault) {
-    const jsonSchema = zodToJsonSchema(schema._def.innerType);
-    jsonSchema.default = schema._def.defaultValue();
-    return jsonSchema;
-  }
-
-  // Handle ZodOptional
-  if (schema instanceof z.ZodOptional) {
-    return zodToJsonSchema(schema._def.innerType);
-  }
-
-  if (schema instanceof z.ZodString) {
-    const jsonSchema: any = { type: "string" };
-    if (schema.description) {
-      jsonSchema.description = schema.description;
-    }
-    return jsonSchema;
-  }
-
-  if (schema instanceof z.ZodNumber) {
-    const jsonSchema: any = { type: "number" };
-    if (schema.description) {
-      jsonSchema.description = schema.description;
-    }
-    if (schema.isInt) {
-      jsonSchema.type = "integer";
-    }
-    return jsonSchema;
-  }
-
-  if (schema instanceof z.ZodBoolean) {
-    const jsonSchema: any = { type: "boolean" };
-    if (schema.description) {
-      jsonSchema.description = schema.description;
-    }
-    return jsonSchema;
-  }
-
-  if (schema instanceof z.ZodEnum) {
-    const jsonSchema: any = {
-      type: "string",
-      enum: schema._def.values,
-    };
-    if (schema.description) {
-      jsonSchema.description = schema.description;
-    }
-    return jsonSchema;
-  }
-
-  if (schema instanceof z.ZodArray) {
-    const jsonSchema: any = {
-      type: "array",
-      items: zodToJsonSchema(schema._def.type),
-    };
-    if (schema.description) {
-      jsonSchema.description = schema.description;
-    }
-    return jsonSchema;
-  }
-
-  if (schema instanceof z.ZodObject) {
-    const shape = schema.shape;
+  const { coreSchema, description: rootDescription, hasDefault, defaultValue } = unwrapZodSchema(schema);
+  
+  // Handle ZodObject
+  if (coreSchema instanceof z.ZodObject) {
+    const shape = coreSchema.shape;
    const properties: any = {};
    const required: string[] = [];

    for (const [key, value] of Object.entries(shape)) {
      const fieldSchema = value as z.ZodTypeAny;
-      properties[key] = zodToJsonSchema(fieldSchema);
-
-      // Check if field is required (not optional and not default)
-      if (
-        !(fieldSchema instanceof z.ZodOptional) &&
-        !(fieldSchema instanceof z.ZodDefault)
-      ) {
+      const unwrapped = unwrapZodSchema(fieldSchema);
+      
+      // Check if field is optional
+      const isOptional = fieldSchema instanceof z.ZodOptional;
+      
+      // Build JSON schema for the property
+      const propertySchema = zodToJsonSchema(unwrapped.coreSchema);
+      
+      // Add description from unwrapping if not already present
+      if (unwrapped.description && !propertySchema.description) {
+        propertySchema.description = unwrapped.description;
+      }
+      
+      // Add default value if available
+      if (unwrapped.hasDefault && unwrapped.defaultValue !== undefined) {
+        propertySchema.default = unwrapped.defaultValue;
+      }
+      
+      properties[key] = propertySchema;
+      
+      // Add to required array if not optional and no default
+      if (!isOptional && !unwrapped.hasDefault) {
        required.push(key);
      }
    }
@ -93,23 +101,112 @@ export function zodToJsonSchema(schema: z.ZodTypeAny): any {
      jsonSchema.required = required;
    }

-    if (schema.description) {
-      jsonSchema.description = schema.description;
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
    }

    return jsonSchema;
  }
+  
+  // Handle ZodArray
+  if (coreSchema instanceof z.ZodArray) {
+    const jsonSchema: any = {
+      type: "array",
+      items: zodToJsonSchema(coreSchema._def.type),
+    };
+    
+    // Handle array constraints
+    const minLength = (coreSchema as any)._def.minLength;
+    if (minLength?.value > 0) {
+      jsonSchema.minItems = minLength.value;
+    }
+    
+    const maxLength = (coreSchema as any)._def.maxLength;
+    if (maxLength?.value !== undefined) {
+      jsonSchema.maxItems = maxLength.value;
+    }
+    
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    
+    if (hasDefault && defaultValue !== undefined) {
+      jsonSchema.default = defaultValue;
+    }
+    
+    return jsonSchema;
+  }

-  if (schema instanceof z.ZodUnion) {
-    return {
-      oneOf: schema._def.options.map((option: z.ZodTypeAny) =>
+  // Handle ZodString
+  if (coreSchema instanceof z.ZodString) {
+    const jsonSchema: any = { type: "string" };
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    if (hasDefault && defaultValue !== undefined) {
+      jsonSchema.default = defaultValue;
+    }
+    return jsonSchema;
+  }
+
+  // Handle ZodNumber
+  if (coreSchema instanceof z.ZodNumber) {
+    const jsonSchema: any = { type: "number" };
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    if ((coreSchema as any).isInt) {
+      jsonSchema.type = "integer";
+    }
+    if (hasDefault && defaultValue !== undefined) {
+      jsonSchema.default = defaultValue;
+    }
+    return jsonSchema;
+  }
+
+  // Handle ZodBoolean
+  if (coreSchema instanceof z.ZodBoolean) {
+    const jsonSchema: any = { type: "boolean" };
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    if (hasDefault && defaultValue !== undefined) {
+      jsonSchema.default = defaultValue;
+    }
+    return jsonSchema;
+  }
+
+  // Handle ZodEnum
+  if (coreSchema instanceof z.ZodEnum) {
+    const jsonSchema: any = {
+      type: "string",
+      enum: coreSchema._def.values,
+    };
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    if (hasDefault && defaultValue !== undefined) {
+      jsonSchema.default = defaultValue;
+    }
+    return jsonSchema;
+  }
+
+  // Handle ZodUnion
+  if (coreSchema instanceof z.ZodUnion) {
+    const jsonSchema: any = {
+      oneOf: coreSchema._def.options.map((option: z.ZodTypeAny) =>
        zodToJsonSchema(option),
      ),
    };
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
+    }
+    return jsonSchema;
  }

-  if (schema instanceof z.ZodLiteral) {
-    const value = schema._def.value;
+  // Handle ZodLiteral
+  if (coreSchema instanceof z.ZodLiteral) {
+    const value = coreSchema._def.value;
    const jsonSchema: any = {};

    if (typeof value === "string") {
@ -126,8 +223,8 @@ export function zodToJsonSchema(schema: z.ZodTypeAny): any {
      jsonSchema.const = value;
    }

-    if (schema.description) {
-      jsonSchema.description = schema.description;
+    if (rootDescription) {
+      jsonSchema.description = rootDescription;
    }

    return jsonSchema;
--- a/tests/unit/tools/image.test.ts
+++ b/tests/unit/tools/image.test.ts
@ -111,7 +111,7 @@ describe("Image Tool", () => {
      const result = await imageToolHandler(
        {
          format: "png",
-          return_data: false,
+        return_data: false,
          capture_focus: "background",
        },
        mockContext,
@ -455,7 +455,7 @@ describe("Image Tool", () => {
      const result = await imageToolHandler(
        {
          question: MOCK_QUESTION,
-          return_data: true,
+        return_data: true,
          format: "png",
        },
        mockContext,
@ -651,4 +651,4 @@ describe("Image Tool", () => {
      expect(args).toContain("background");
    });
  });
-});
+});