Peekaboo/CHANGELOG.md

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.1.0-beta.1] - 2025-01-09

### Added
- **PID-based application targeting**: You can now target applications by their Process ID using the `PID:XXXX` syntax
  - Works with both `image` and `list` tools
  - Example: `app_target: "PID:663"` to capture windows from process 663
  - Provides clear error messages for invalid PIDs or non-existent processes
  - Useful for targeting specific instances when multiple copies of an app are running

## [1.0.1] - 2025-01-08

### Fixed
- Re-release of v1.0.0 due to npm registry issue
- No code changes from v1.0.0

## [1.0.0] - 2025-01-08

### 🎉 First Stable Release

Peekaboo MCP is now production-ready! This release marks the culmination of extensive development, testing, and refinement to create a robust macOS screen capture and window management tool for AI agents.

### Key Features
- **Advanced Screen Capture**: Capture entire screens, specific windows, or all windows of an application
- **AI-Powered Image Analysis**: Analyze captured or existing images using multiple AI providers (Ollama, OpenAI)
- **Window Management**: List running applications and their windows with detailed metadata
- **Flexible Output Options**: Save to file or return Base64-encoded data inline
- **Swift 6 Compatibility**: Fully migrated to Swift 6 with strict concurrency for maximum reliability
- **Universal Binary**: Supports both Apple Silicon and Intel Macs

### Recent Improvements (from beta releases)
- Fixed critical MCP server error handling for edge cases
- Complete Swift 6 migration with proper async/await patterns
- Enhanced error messages and debugging capabilities
- Improved window matching with fuzzy search
- Better handling of multi-display setups
- Robust permission handling for Screen Recording and Accessibility
- Lowered macOS requirement from 15.0 to 14.0 (Sonoma)

### Requirements
- macOS 14.0 or later (Sonoma)
- Node.js 18 or later
- Screen Recording permission (for capture features)
- Accessibility permission (optional, for foreground window detection)

### Getting Started
```bash
npm install -g @steipete/peekaboo-mcp
```

For detailed documentation, visit: https://github.com/steipete/Peekaboo

## [1.0.0-beta.26] - 2025-01-08

### Changed
- **Lowered macOS requirement from 15.0 to 14.0 (Sonoma)**
  - Analysis showed that all APIs used by Peekaboo are available in macOS 14.0
  - Key APIs: SCScreenshotManager.captureImage, configuration.shouldBeOpaque
  - Makes Peekaboo available to more users who haven't upgraded to Sequoia
  - Updated Package.swift, documentation, and availability annotations

### Fixed
- Fixed TypeScript warning about undefined modelName in AI providers

## [1.0.0] - 2025-01-08

### 🎉 First Stable Release

Peekaboo MCP is now production-ready! This release marks the culmination of extensive development, testing, and refinement to create a robust macOS screen capture and window management tool for AI agents.

### Key Features
- **Advanced Screen Capture**: Capture entire screens, specific windows, or all windows of an application
- **AI-Powered Image Analysis**: Analyze captured or existing images using multiple AI providers (Ollama, OpenAI)
- **Window Management**: List running applications and their windows with detailed metadata
- **Flexible Output Options**: Save to file or return Base64-encoded data inline
- **Swift 6 Compatibility**: Fully migrated to Swift 6 with strict concurrency for maximum reliability
- **Universal Binary**: Supports both Apple Silicon and Intel Macs

### Recent Improvements (from beta releases)
- Fixed critical MCP server error handling for edge cases
- Complete Swift 6 migration with proper async/await patterns
- Enhanced error messages and debugging capabilities
- Improved window matching with fuzzy search
- Better handling of multi-display setups
- Robust permission handling for Screen Recording and Accessibility

### Requirements
- macOS 14.0 or later (Sonoma)
- Node.js 18 or later
- Screen Recording permission (for capture features)
- Accessibility permission (optional, for foreground window detection)

### Getting Started
```bash
npm install -g @steipete/peekaboo-mcp
```

For detailed documentation, visit: https://github.com/steipete/Peekaboo

## [1.0.0-beta.25] - 2025-01-08

### Fixed
- **Critical MCP server error handling**
  - Fixed issue where unexpected errors would cause "No result received" response
  - All tool execution errors now return proper MCP error responses
  - Handles edge cases with special characters in tool parameters gracefully
  - Prevents server from silently failing on unexpected exceptions

## [1.0.0-beta.24] - 2025-01-08

### Changed
- **Complete Swift 6 migration with strict concurrency**
  - Migrated to Swift 6.0 toolchain with StrictConcurrency enabled
  - All data models and types now conform to Sendable protocol
  - Replaced AsyncParsableCommand with ParsableCommand + async adapter pattern
  - Implemented proper async/sync bridging using DispatchSemaphore for ArgumentParser compatibility
  - Fixed CLI execution issue where commands were showing help instead of executing

### Improved
- Enhanced thread safety with @unchecked Sendable for synchronized state
- Better separation of concerns between async operations and CLI interface
- More robust error handling in async contexts

## [1.0.0-beta.23] - 2025-01-08

### Changed
- Initial Swift 6 migration attempt (had execution issues, fixed in beta.24)

## [1.0.0-beta.22] - 2025-01-08

### Fixed
- **Critical deadlock fix in Swift CLI image capture**
  - Removed DispatchSemaphore usage that violated Swift concurrency rules and caused infinite hangs
  - Implemented RunLoop-based async-to-sync bridging for proper concurrency handling
  - Converted all capture methods to async/await patterns while maintaining CLI compatibility
  - Replaced Thread.sleep with Task.sleep in async contexts
  - Fixed test timeouts by eliminating blocking operations
  - No macOS version requirements added - solution uses standard Foundation APIs

### Added
- **Smart browser helper filtering for improved Chrome/Safari matching**
  - Automatically filters out browser helper processes when searching for common browsers (chrome, safari, firefox, edge, brave, arc, opera)
  - Prevents confusing "no capturable windows" errors when helper processes like "Google Chrome Helper (Renderer)" are matched instead of the main browser
  - Provides browser-specific error messages: "Chrome browser is not running or not found" instead of generic app not found errors
  - Only applies filtering to browser identifiers - other application searches work normally
  - Comprehensive test coverage for browser filtering scenarios

- **Proper frontmost window capture implementation**
  - Added dedicated `frontmost` capture mode that captures the frontmost window of the frontmost application
  - Replaces previous fallback behavior that incorrectly captured all screens
  - Uses `NSWorkspace.shared.frontmostApplication` to detect the currently active application
  - Returns exactly one image with proper metadata (app name, window title, window ID)
  - Generates descriptive filenames like `frontmost_Safari_20250608_083230.png`

### Fixed
- **List tool empty string parameter handling**
  - Fixed issue where `item_type: ""` was not properly defaulting to the correct operation
  - Empty strings and whitespace-only strings now fall back to proper default logic
  - Added comprehensive test coverage for edge cases

## [1.0.0-beta.21] - 2025-06-08

### Security
- **Critical security fix for malformed app targets**
  - Fixed vulnerability where malformed app targets with multiple leading colons (e.g., "::::::::::::::::Finder") created empty app names that would match ALL system processes
  - Enhanced input validation to prevent unintended broad process matching
  - Added defensive parsing logic with fallback to screen mode for invalid inputs
  - Comprehensive test coverage for edge cases and malformed inputs

### Changed
- **Multiple exact app matches now capture all windows instead of erroring**
  - When multiple applications have exact matches (e.g., "claude" and "Claude"), the system now captures all windows from all matching applications
  - This replaces the previous behavior of throwing an ambiguous match error
  - Window indices are sequential across all matched applications
  - Each saved file preserves the original application name in `item_label`
  - Only truly ambiguous fuzzy matches still return errors
  - Comprehensive test coverage for various multiple match scenarios

### Fixed
- **Enhanced error handling and user experience**
  - Improved window title matching error messages with available window titles and URL guidance
  - Fixed path traversal error reporting to show correct file system errors instead of permission errors
  - Added case-insensitive handling for window specifiers (WINDOW_TITLE, window_title, etc.)
  - Enhanced backward compatibility with hidden path parameters in analyze tool
- **Format validation improvements**
  - Added defensive format validation with automatic PNG fallback for invalid formats
  - Improved file extension correction when format is changed
  - Better handling of edge cases in image processing

## [1.0.0-beta.20] - 2025-06-08

### Added
- **Window count display optimization**: Single-window apps no longer show "Windows: 1" in list output ([#6](https://github.com/steipete/Peekaboo/pull/6))
  - Reduces visual clutter for the common case of apps with only one window
  - Apps with 0, 2, or more windows still display the count
  - Improves readability of the `list apps` command output
- **Timeout handling for Swift CLI operations** ([#2](https://github.com/steipete/Peekaboo/pull/2))
  - Prevents test suite and operations from hanging indefinitely
  - Default timeout of 30 seconds, configurable via `PEEKABOO_CLI_TIMEOUT` environment variable
  - Graceful process termination with SIGTERM followed by SIGKILL if needed
  - Clear timeout error messages indicating when operations exceed time limits

### Fixed
- **Input validation improvements**:
  - Whitespace is now trimmed from `app_target` parameter (e.g., `"   Spotify   "` now works correctly)
  - Format parameter is now case-insensitive (`"PNG"` and `"png"` both work)
  - Added support for `"jpeg"` as an alias for `"jpg"` format
- **Edge case handling**:
  - Float and hex screen indices now parse correctly (e.g., `screen:1.5` → `screen:1`, `screen:0x1` → `screen:0`)
  - Special filesystem characters (|, :, *) in filenames are preserved as-is
  - Empty questions to analyze tool are handled gracefully (analysis is skipped)
- **Swift error handling improvements**:
  - Fixed CaptureError enum compatibility issues in tests
  - Improved error messages with better context for ApplicationFinder errors
- Fixed overly broad permission error detection that incorrectly reported file I/O errors as screen recording permission issues
  - File permission errors (e.g., writing to `/System/`) now correctly report as `FILE_IO_ERROR`
  - Directory not found errors provide clear messages about missing parent directories
  - Added specific error code checking for ScreenCaptureKit and CoreGraphics APIs
  - Only errors containing both "permission" and capture-related terms are now considered screen recording issues
- Enhanced file write error handling with pre-emptive directory checks
- Added debug logging to permission checker for diagnosing intermittent failures
- Improved error propagation from deep system APIs
  - Underlying errors from ScreenCaptureKit and file operations are now captured and logged
  - Debug logs include full error details for better troubleshooting
  - Error messages include the original system error descriptions
- Fixed duplicate error output when ApplicationFinder throws errors
- Enhanced error details for app not found errors to include list of available applications
- Removed complex multi-JSON parsing logic from TypeScript that was only needed due to duplicate error output
- Fixed all test assertions to match the new `executeSwiftCli` signature with timeout parameter

## [1.0.0-beta.19] - 2025-06-08

### Added
- Automatic format fallback for screen captures to prevent JavaScript stack overflow errors
  - When `format: "data"` is specified for screen captures, the tool automatically falls back to PNG format
  - A warning message is included in the response explaining why the fallback occurred
  - Application window captures can still use `format: "data"` without restrictions
  - This prevents agents from encountering "Maximum call stack size exceeded" errors when capturing screens
- Invalid format values now automatically fall back to PNG instead of returning an error
  - Empty strings, null values, and unrecognized format values are converted to PNG
  - This provides a better user experience by gracefully handling invalid inputs
- Enhanced error messages for ambiguous application identifiers
  - When multiple applications match an identifier (e.g., "C" matches Calendar, Console, and Cursor), the error message now lists all matching applications with their bundle IDs
  - This helps users quickly identify the correct application name to use
  - Applies to both `image` and `list` tools

## [1.0.0-beta.18] - 2025-06-08

### Added
- Fuzzy matching for application names using Levenshtein distance algorithm
  - Typos like "Chromee" now correctly match "Google Chrome"
  - Common misspellings are handled intelligently (e.g., "Finderr" → "Finder")
  - Multi-word app names are matched word-by-word for better accuracy
- Smart error messages that suggest similar app names when no exact match is found
- Window-specific labels in analysis results when capturing multiple windows
  - Shows window titles instead of repeating app names
  - Example: 'Analysis for "MCP Inspector":' instead of "Analysis for Google Chrome"

### Fixed
- Error messages now show specific details instead of generic "unknown error"
  - Non-existent apps show: "No running applications found matching identifier: AppName"
  - Properly parses Swift CLI JSON error responses
- Fixed test failures related to error message format changes

### Changed
- Improved application matching scoring to prefer main apps over helper processes
- Enhanced TypeScript error handling to parse JSON responses even on non-zero exit codes

## [1.0.0-beta.21] - 2025-01-10

### Fixed
- The `list` tool no longer returns a generic "unknown error" when a non-existent `app` is specified. It now returns a clear error message: `"List operation failed: The specified application ('AppName') is not running or could not be found."`, improving usability and error diagnosis.

## [1.0.0-beta.20] - 2025-01-09

### Changed
- Improved error message for the `image` tool. When an `app_target` is specified for a running application that has no visible windows, the tool now returns a specific error (`"Image capture failed: The 'AppName' process is running, but it has no capturable windows..."`) instead of a generic "window not found" error. This provides clearer feedback and suggests using `capture_focus: 'foreground'` as a remedy.

## [1.0.0-beta.19] - 2025-01-08

### Changed
- The `image` tool's behavior has been updated. When a `question` is provided for analysis and no `path` is specified, the tool now preserves the captured image(s) in their temporary directory instead of deleting them. The paths to these saved files are now correctly returned in the `saved_files` array, making them accessible after the tool run completes.

## [1.0.0-beta.18] - 2025-01-08

### Fixed
- Fixed a bug where providing an empty string for the `capture_focus` parameter in the `image` tool would cause a validation error. The schema now correctly handles this case and applies the default value ('background'), making the parameter truly optional.

## [1.0.0-beta.17] - 2025-01-08

### Added
- The `image` tool's analysis capability has been significantly enhanced. When a capture results in multiple images (e.g., targeting an application with multiple windows) and a `question` is provided, the tool will now perform an AI analysis for **every single captured image**.
- The analysis results are returned in a single, clearly formatted text block, with each window's analysis presented under a descriptive header.

## [1.0.0-beta.16] - 2025-01-08

### Enhanced
- **Smart Path Handling**: The Swift CLI now intelligently detects whether a provided path is intended as a file or directory:
  - **File paths** (with extensions): Uses exact path for single screen captures, appends screen identifiers for multiple captures
  - **Directory paths** (no extension or trailing `/`): Places generated filenames inside the directory
  - **Auto-Creation**: Automatically creates intermediate directories as needed for both file and directory paths
  - **Edge Cases**: Properly handles special directory indicators (`.`, `..`), hidden files, unicode characters, and paths with spaces

### Improved
- **Enhanced Error Messages**: File write errors now provide detailed, actionable guidance:
  - Permission denied errors include specific directory permission checks
  - Missing directory errors suggest ensuring parent directories exist
  - Disk space errors clearly indicate insufficient storage
  - Generic I/O errors include underlying system error details

### Added
- **Comprehensive Test Coverage**: Added 52+ new tests covering path handling, error scenarios, and edge cases
- **Path Logic Validation**: Tests for file vs directory detection, multiple format support, and special character handling

### Fixed
- Fixed original issue where `/tmp/screenshot.png` was incorrectly treated as a directory instead of a filename
- Improved file extension preservation when appending screen/window identifiers to filenames
- Enhanced path validation for complex nested directory structures

## [1.0.0-beta.15] - 2025-01-08

### Improved
- The `list` tool is now more lenient. `item_type` is optional and defaults to `running_applications`. If an `app` is specified without an `item_type`, it intelligently defaults to `application_windows`.

### Fixed
- Fixed a bug where the `list` tool would crash if called with an empty `item_type`.
- Fixed a bug where the `image` tool would fail silently if no path was provided, resulting in a generic "Failed to write file" error. The logic for handling temporary paths is now more robust.

## [1.0.0-beta.14] - 2025-01-08

### Added
- Enhanced test host application with real-time permission status display and CLI availability checking
- Comprehensive test coverage improvements with proper Swift Testing patterns
- Local test execution framework with detailed setup instructions

### Improved
- Swift code quality: Fixed all SwiftLint violations (reduced from 31 to 0 serious violations)
- Test stability: Resolved Swift test compilation errors and improved test reliability
- Code organization: Refactored ImageCommand.swift for better readability and maintainability
- Documentation: Enhanced CLAUDE.md and release documentation with proper testing procedures

### Fixed
- JSON encoding/decoding issues in tests by removing unnecessary snake_case conversions
- Window title validation expectations for system windows without titles
- Swift Testing syntax errors and compiler warnings
- Function and file length violations through strategic refactoring

## [1.0.0-beta.13] - 2025-01-08

### Added
- Comprehensive local-only test framework for testing actual screenshot functionality
- SwiftUI test host application for controlled testing environment
- Screenshot validation tests including content validation and visual regression
- Performance benchmarking tests for capture operations
- Multi-display capture tests
- Test infrastructure for permission dialog testing

### Improved
- The `list` tool with `item_type: 'running_applications'` now intelligently filters its results to only show applications that have one or more windows. This provides a cleaner, more relevant list for a screenshot utility by default, hiding background processes that have no user interface.
- Test coverage with local-only tests that can validate actual capture functionality
- Test organization with new tags: `localOnly`, `screenshot`, `multiWindow`, `focus`

### Fixed
- Fixed a bug where calling the `image` tool without any arguments would incorrectly result in a "Failed to write to file" error. The tool now correctly creates and uses a temporary file, returning the capture as Base64 data as intended.
- The `list` tool's input validation is now more lenient. It will no longer error when an empty `include_window_details: []` array is provided for an `item_type` other than `application_windows`.

## [1.0.0-beta.12] - 2025-01-08

### Added
- Comprehensive Swift Testing framework adoption with enhanced test coverage
- New test files for JSON output validation, logger thread safety, and image capture logic
- Centralized test tagging system for better test organization

### Improved
- CI/CD pipeline now uses macOS-15 runner with Xcode 16.3
- Swift CLI is now built before TypeScript tests to fix integration test failures
- Applied SwiftFormat to all Swift files for consistent code style
- Fixed all SwiftLint violations (31 issues resolved) achieving zero linting issues
- Enhanced thread safety in Logger implementation
- Optimized tests with parameterized testing and async/await patterns

### Fixed
- Fixed a bug where calling the `image` tool without a `path` argument would incorrectly result in a "Failed to write to file" error. The tool now correctly captures the image to a temporary location and returns the image data as Base64, as intended by the specification.
- Fixed Swift test compilation errors with proper Swift Testing syntax
- Fixed TypeScript test expectations after error message improvements
- Resolved CI integration test failures by ensuring Swift CLI availability

## [1.0.0-beta.11] - 2025-01-06

### Improved
- Greatly enhanced error handling for the `image` tool. The Swift CLI now returns distinct exit codes for different error conditions, such as missing Screen Recording or Accessibility permissions, instead of a generic failure code.
- The Node.js server now maps these specific exit codes to clear, user-friendly error messages, guiding the user on how to resolve the issue (e.g., "Screen Recording permission is not granted. Please enable it in System Settings...").
- This replaces the previous generic "Swift CLI execution failed" error, providing a much better user experience, especially during initial setup and permission granting.

## [1.0.0-beta.10] - 2024-07-28

### 🎉 Major Improvements
- **Full MCP Best Practices Compliance**: Implemented all requirements from the MCP best practices guide
- **Enhanced Info Command**: The `server_status` option in the list tool now provides comprehensive diagnostics including:
  - Native binary (Swift CLI) status and version
  - System permissions (screen recording, accessibility)
  - Environment configuration and potential issues
  - Log file accessibility checks
- **Dynamic Version Injection**: Swift CLI version is now automatically synchronized with package.json during build
- **Improved Code Quality**:
  - Split large image.ts (472 lines) into smaller, focused modules (<250 lines each)
  - Added ESLint configuration with TypeScript support
  - Fixed all critical linting errors and reduced warnings
  - Improved TypeScript types throughout the codebase

### 🔧 Changed
- Default log path updated to `~/Library/Logs/peekaboo-mcp.log` (macOS standard location)
- Updated macOS requirement to v14+ (Sonoma) for better compatibility
- Pino logger now falls back to temp directory if configured path is not writable
- LICENSE and README.md now included in npm package

### 🐛 Fixed
- Swift CLI version synchronization with npm package
- ESLint errors for unused variables and improper types
- Test setup converted from Jest to Vitest syntax
- All trailing spaces and formatting issues

### 📦 Development
- Added Swift compiler warning checks in release preparation
- Enhanced prepare-release script with comprehensive validation
- Added `npm run inspector` for MCP inspector tool

## [1.0.0-beta.9] - 2025-01-25

### 🔧 Changed
- Updated server status formatting to improve readability

## [1.0.0-beta.3] - 2025-01-21

### Added
- Enhanced `image` tool to support optional immediate analysis of the captured screenshot by providing a `question` and `provider_config`.
  - If a `question` is given and no `path` is specified, the image is saved to a temporary location and deleted after analysis.
  - If a `question` is given, Base64 image data is not returned in the `content` array; the analysis result becomes the primary payload, alongside image metadata.

### Changed
- Migrated test runner from Jest to Vitest.
- Updated documentation (`README.md`, `docs/spec.md`) to reflect new `image` tool capabilities.

## [1.0.0-beta.2] - Previous Release Date

### Fixed
- (Summarize fixes from beta.2 if known, otherwise remove or mark as TBD)

### Added
- Initial E2E tests for CLI image capture.

## [1.0.0-beta.8] - 2025-01-25

### 🔧 Changed
- Updated server status formatting

## [1.0.0-beta.7] - 2025-01-25

### 🔧 Changed
- Minor updates and improvements

## [1.0.0-beta.6] - 2025-01-25

### 📝 Changed
- Updated tool descriptions for better clarity

## [1.0.0-beta.5] - 2025-01-25

### 🔄 Changed
- Version bump for npm release (beta.4 was already published)

## [1.0.0-beta.4] - 2025-01-25

### ✨ Added
- Comprehensive Swift unit tests for all CLI components
- Release preparation script with extensive validation checks
- Swift code linting and formatting with SwiftLint and SwiftFormat
- Enhanced image tool with blur detection, custom formats (PNG/JPG), and naming patterns
- Robust error handling for Swift CLI integration

### 🐛 Fixed
- Swift CLI integration tests now properly handle error output
- Fixed Swift code to comply with SwiftLint rules
- Corrected JSON structure expectations in tests

### 📚 Changed
- Updated all dependencies to latest versions
- Improved test coverage for both TypeScript and Swift code
- Enhanced release process with automated checks
- Swift CLI `image` command: Added `--screen-index <Int>` option to capture a specific display when `--mode screen` is used
- MCP `image` tool: Now fully supports `app_target: "screen:INDEX"` by utilizing the Swift CLI's new `--screen-index` capability

### ♻️ Changed

- **MCP `image` tool API significantly simplified:**
    - Replaced `app`, `mode`, and `window_specifier` parameters with a single `app_target` string (e.g., `"AppName"`, `"AppName:WINDOW_TITLE:Title"`, `"screen:0"`).
    - `format` parameter now includes `"data"` option to return Base64 PNG data directly. If `path` is also given with `format: "data"`, file is saved (as PNG) AND data is returned.
    - If `path` is omitted, `image` tool now defaults to `format: "data"` behavior (returns Base64 PNG data).
    - `