Tool Schema Description: Debugging and Resolution

This document outlines the process undertaken to debug and resolve issues related to tool schema generation and compatibility, particularly for the image tool in the Peekaboo MCP server. The goal was to ensure that tool parameters, including their descriptions, were correctly processed by the zodToJsonSchema function and displayed accurately in the client (Cursor, powered by Gemini 2.5 Pro).

1. Initial Problem

The primary issue was that the image tool's parameters were not loading or being displayed correctly in the client. The client often reported an "incompatible schema" error for this tool, while other tools like analyze and list (which had simpler schemas) were working correctly after some initial refinements. This indicated a problem specific to the complexity or structure of the image tool's Zod schema or how it was being converted to a JSON schema.

2. Refinement of zodToJsonSchema

A significant early step was to refactor the zodToJsonSchema function located in src/index.ts. The initial version was simplistic and fell short in two areas:

  • Extracting descriptions from .describe() calls, especially when the described schema was wrapped in .optional() or .default().
  • Representing Zod unions (z.union()), objects (z.object()), enums (z.enum()), and custom types (z.custom()).

The refactoring involved:

  • Creating a recursive helper function (unwrapZodSchema) to get to the core Zod type and its description, peeling off wrappers like ZodOptional and ZodDefault.
  • Ensuring that descriptions from .describe() calls were consistently picked up and added to the description field in the resulting JSON schema properties.
  • Explicitly handling different Zod types (ZodString, ZodBoolean, ZodNumber, ZodEnum, ZodObject, ZodArray, ZodUnion, ZodLiteral, ZodNativeEnum, ZodEffects for transformations/refinements) to build a more accurate JSON schema representation.

This refactoring was crucial for the analyze and list tools to display their parameters correctly and laid the foundation for debugging the image tool.
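
The recursive unwrapping described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: it uses a simplified, hypothetical node type (SchemaNode) in place of real Zod classes so that it runs without dependencies, and the names unwrapSchema and toJsonSchema are assumptions that only mirror the roles of unwrapZodSchema and zodToJsonSchema.

```typescript
// Simplified stand-in for a Zod schema tree. These node shapes are
// hypothetical and are NOT Zod's real internals.
type CoreNode =
  | { kind: "string" | "number" | "boolean"; description?: string }
  | { kind: "enum"; values: string[]; description?: string }
  | { kind: "object"; shape: Record<string, SchemaNode>; description?: string };

type WrapperNode =
  | { kind: "optional"; inner: SchemaNode; description?: string }
  | { kind: "default"; inner: SchemaNode; value: unknown; description?: string };

type SchemaNode = CoreNode | WrapperNode;

interface Unwrapped {
  core: CoreNode;
  description?: string;
  optional: boolean;
  defaultValue?: unknown;
}

// Peel off optional/default wrappers until a core type is reached,
// remembering the outermost description and any default value on the way.
function unwrapSchema(node: SchemaNode): Unwrapped {
  if (node.kind === "optional" || node.kind === "default") {
    const inner = unwrapSchema(node.inner);
    return {
      core: inner.core,
      description: node.description ?? inner.description,
      optional: true,
      defaultValue: node.kind === "default" ? node.value : inner.defaultValue,
    };
  }
  return { core: node, description: node.description, optional: false };
}

type JsonSchema = Record<string, unknown>;

// Build a JSON-Schema-like object, attaching descriptions and defaults
// that were found on any wrapper level.
function toJsonSchema(node: SchemaNode): JsonSchema {
  const { core, description, defaultValue } = unwrapSchema(node);
  const out: JsonSchema = {};
  if (core.kind === "enum") {
    out.type = "string";
    out.enum = core.values;
  } else if (core.kind === "object") {
    out.type = "object";
    const properties: Record<string, JsonSchema> = {};
    const required: string[] = [];
    for (const key of Object.keys(core.shape)) {
      properties[key] = toJsonSchema(core.shape[key]);
      if (!unwrapSchema(core.shape[key]).optional) required.push(key);
    }
    out.properties = properties;
    if (required.length > 0) out.required = required;
  } else {
    out.type = core.kind;
  }
  if (description !== undefined) out.description = description;
  if (defaultValue !== undefined) out.default = defaultValue;
  return out;
}
```

In this sketch the outermost description wins when several wrapper levels carry one. That is a design choice made for the example; the real function may resolve conflicts differently.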

3. Debugging the imageToolSchema

The imageToolSchema was the most complex, involving several optional fields, enums, a union of objects, and a custom Zod type. The debugging approach was methodical:

  1. Bottom-Up Simplification: The imageToolSchema was initially reduced to its simplest possible form (e.g., a single optional string field like app). The imageToolHandler logic was also temporarily stubbed out to return a static success message to avoid TypeScript errors due to the schema changes.
  2. Incremental Re-addition of Fields: One by one, each original field was added back to the imageToolSchema, followed by a build and client test:
    • app (optional string)
    • question (optional string)
    • return_data (optional boolean with default)
    • format (optional enum with default)
    • capture_focus (optional enum with default)
    • path (optional string)
    • mode (optional enum without Zod default)
    Result: each of these fields, added individually or in combination, produced a schema that the client displayed correctly. This confirmed that the improved zodToJsonSchema handled basic Zod types, optionals, defaults, and enums correctly, and that its output was compatible with the client/model.
  3. Identifying the Problematic Fields: The issues arose when reintroducing the more complex fields:
    • window_specifier: An optional z.union([z.object({ title: ... }), z.object({ index: ... })]). This field, surprisingly, did work and its parameters displayed correctly once the main tool description in src/index.ts was shortened (see section 4).
    • provider_config: This was the most problematic field. It was initially defined in imageToolSchema as z.custom<AIProviderConfig>().optional().describe(...), where AIProviderConfig was a z.union([OllamaConfig, OpenAIConfig]).
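
The incremental re-addition in steps 2 and 3 can be pictured as a simple loop. The sketch below is hypothetical: in practice each check was a manual build followed by a client test, so the isAccepted callback here only stands in for that step, and findFirstRejectedField is an illustrative name, not a function from the project.

```typescript
// Hypothetical helper: the source describes a manual build-and-test loop,
// not an automated one. `isAccepted` stands in for "build, reload the
// client, and check whether the tool's parameters display".
type FieldMap = Record<string, unknown>;

function findFirstRejectedField(
  fields: ReadonlyArray<readonly [string, unknown]>,
  isAccepted: (schema: FieldMap) => boolean,
): string | null {
  const current: FieldMap = {};
  for (const [name, definition] of fields) {
    current[name] = definition; // re-add one field at a time
    if (!isAccepted({ ...current })) {
      return name; // first field whose addition breaks the schema
    }
  }
  return null; // every field was accepted
}

// Usage with a fake check that rejects any schema containing provider_config.
const reAddOrder: Array<[string, unknown]> = [
  ["app", "optional string"],
  ["format", "optional enum with default"],
  ["window_specifier", "optional union of objects"],
  ["provider_config", "custom type over a union"],
];
const culprit = findFirstRejectedField(
  reAddOrder,
  (schema) => !("provider_config" in schema),
);
// culprit === "provider_config"
```

Adding fields in order of increasing complexity keeps each failing step attributable to a single change, which is what made the approach effective here.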

4. Addressing UI Space and Main Descriptions

During testing, it became apparent that the client UI (Cursor's tooltip/parameter display area) had limited space. The verbose, multi-line description strings in src/index.ts (which manually listed parameters as a hack) were consuming this space, preventing the schema-derived parameters from being fully visible.

Solution: The main description strings for all tools in src/index.ts were shortened to be concise summaries. This allowed the client to properly display the parameter details generated by zodToJsonSchema.

5. Resolving provider_config Incompatibility

Even with the refined zodToJsonSchema and shortened main descriptions, the image tool would often trigger an "incompatible schema" error from the Gemini model when provider_config was included with its z.custom<AIProviderConfig>() definition (a custom type over a union of provider objects). Paradoxically, the client UI would sometimes still display the parameters correctly, which suggests that the client validates the schema for display differently from how the model validates it at execution time.

The Fix: The provider_config field in imageToolSchema (in src/tools/image.ts) was changed to mirror the structure used in the analyzeToolSchema (which was working reliably). Instead of z.custom<AIProviderConfig>(), it became a direct z.object() definition:

provider_config: z
    .object({
      type: z
        .enum(["auto", "ollama", "openai"])
        .default("auto")
        .describe(
          "AI provider type. 'auto' uses server default.",
        ),
      model: z
        .string()
        .optional()
        .describe(
          "Optional model name. Uses server default if omitted.",
        ),
    })
    .optional()
    .describe(
      "Optional. Specify AI provider/model for analysis.",
    ),

This simpler, more explicit Zod structure for provider_config resolved the incompatibility. The client could then consistently load and display all parameters for the image tool, and the "incompatible schema" error no longer blocked its use. The error message itself sometimes lingered, possibly due to caching, but it no longer prevented parameter display.
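
For orientation, the JSON schema that a converter would plausibly emit for this field looks roughly like the object below. The exact keys are an assumption: they depend on how the project's zodToJsonSchema maps enums and defaults. The isValidProviderConfig helper is purely illustrative and is not part of the project.

```typescript
// Assumed converter output for the simplified provider_config field.
// The exact keys depend on the project's zodToJsonSchema implementation.
const providerConfigJsonSchema = {
  type: "object",
  description: "Optional. Specify AI provider/model for analysis.",
  properties: {
    type: {
      type: "string",
      enum: ["auto", "ollama", "openai"],
      default: "auto",
      description: "AI provider type. 'auto' uses server default.",
    },
    model: {
      type: "string",
      description: "Optional model name. Uses server default if omitted.",
    },
  },
};

// Illustrative check of a candidate value against the shape above.
// Both properties may be omitted: `type` has a default and `model` is optional.
function isValidProviderConfig(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const allowed: unknown[] = providerConfigJsonSchema.properties.type.enum;
  const typeOk = v.type === undefined || allowed.indexOf(v.type) !== -1;
  const modelOk = v.model === undefined || typeof v.model === "string";
  return typeOk && modelOk;
}
```

A flat object with a string enum and a plain string contains only constructs that the earlier incremental tests had already shown to work, which is consistent with this form being accepted.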

6. Restoring Handler Logic and Testing

After the schema was confirmed to be working, the stubbed-out implementations of imageToolHandler, buildSwiftCliArgs, and generateImageCaptureSummary in src/tools/image.ts were restored to their original, fully functional code. npm test was then run to confirm that all unit and integration tests passed.

7. Fine-tuning Description Length for Client UI

After successfully resolving the Gemini model's schema compatibility issues, a separate observation was made regarding the client UI (Cursor). When the main tool descriptions in src/index.ts were made very verbose (e.g., including long use-case examples directly in the description string), Cursor's UI would not display these long descriptions, defaulting to showing only the parameter list generated from the schema. This was not a Gemini model rejection but rather a UI display limitation or choice.

Solution: The main descriptions were adjusted to be moderately detailed, providing core capabilities and multi-screen/window behavior, while omitting extremely long examples. This length was found to be acceptable for Cursor's UI, allowing it to display the richer description alongside the schema-derived parameters.
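
One way to keep descriptions within a length that the UI will display is a small check over the tool definitions. The sketch below is an assumption-laden illustration: the ToolDefinition shape and the findOverlongDescriptions name are hypothetical, and the character limit must be supplied by the caller because no specific limit for Cursor was measured here.

```typescript
// Hypothetical shape for a tool entry; the real definitions live in src/index.ts.
interface ToolDefinition {
  name: string;
  description: string;
}

// Return the names of tools whose main description exceeds `maxChars`.
// `maxChars` is chosen by the caller; no client limit is known from the source.
function findOverlongDescriptions(
  tools: ToolDefinition[],
  maxChars: number,
): string[] {
  return tools
    .filter((tool) => tool.description.length > maxChars)
    .map((tool) => tool.name);
}
```

Running a check like this in a unit test would flag a description that grows past the chosen threshold before it reaches the client.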

Key Learnings:

  • Schema Simplicity: The Gemini model's schema validation appears to be sensitive to complex Zod structures, especially combinations like z.custom() wrapping z.union() of z.object()s. Favoring more direct and explicit Zod definitions (e.g., z.object() with clearly defined properties) improves compatibility.
  • zodToJsonSchema Robustness: A comprehensive zodToJsonSchema function that correctly handles various Zod types and extracts .describe() metadata is crucial for accurate schema generation.
  • Client UI vs. Model Validation: There can be slight differences in how a client UI parses/displays a schema and how the underlying model validates it for execution. Successful display in the UI is a good sign but not a definitive guarantee of model compatibility if complex types are involved. The inverse can also occur: the model may accept a schema that a client UI truncates or simplifies for display due to its own constraints.
  • Main Description Length for Client UI: Client UIs (like Cursor) may have their own limitations or display preferences for the length of the main tool description string. Overly verbose descriptions might not be fully displayed. It's important to balance richness of information with conciseness suitable for the UI, relying on the schema for detailed parameter information.
  • Concise Main Descriptions (Initial Approach): Keeping the primary description field for a tool (in src/index.ts) very concise began as a workaround for UI space issues, allowing schema-derived parameters to appear. The final approach settled on moderately detailed descriptions.
  • Iterative Debugging: The bottom-up approach (simplifying then incrementally adding complexity) was highly effective in isolating the problematic parts of the schema.

By restructuring provider_config, adjusting the length of the main descriptions for UI compatibility, and making zodToJsonSchema more robust, the Peekaboo tools' schemas were made compatible with the model and are now presented effectively in the client.