Peekaboo/README.md
Peter Steinberger f746dc45c2 Add docs
2025-05-23 05:39:36 +02:00

# Peekaboo MCP Server
A macOS utility, exposed via a Node.js MCP server, for advanced screen captures, image analysis, and window management.
## 🚀 Installation & Setup
### Prerequisites
Before installing Peekaboo, ensure your system meets these requirements:
**System Requirements:**
- **macOS 12.0+** (Monterey or later)
- **Node.js 18.0+**
- **Swift 5.7+** (for building the native CLI)
- **Xcode Command Line Tools**
**Install Prerequisites:**
```bash
# Install Node.js (if not already installed)
brew install node
# Install Xcode Command Line Tools (if not already installed)
xcode-select --install
```
### Installation Methods
#### Method 1: NPM Installation (Recommended)
```bash
# Install globally for system-wide access
npm install -g peekaboo-mcp
# Or install locally in your project
npm install peekaboo-mcp
```
#### Method 2: From Source
```bash
# Clone the repository
git clone https://github.com/yourusername/peekaboo.git
cd peekaboo
# Install Node.js dependencies
npm install
# Build the TypeScript server
npm run build
# Build the Swift CLI component
cd swift-cli
swift build -c release
# Copy the binary to the project root
cp .build/release/peekaboo ../peekaboo
# Return to project root
cd ..
# Optional: Link for global access
npm link
```
### 🔧 Configuration
#### Environment Setup
Create a `.env` file in your project or set environment variables:
```bash
# AI Provider Configuration (Optional)
AI_PROVIDERS='[
  {
    "type": "ollama",
    "baseUrl": "http://localhost:11434",
    "model": "llava",
    "enabled": true
  },
  {
    "type": "openai",
    "apiKey": "your-openai-api-key",
    "model": "gpt-4-vision-preview",
    "enabled": false
  }
]'
# Logging Configuration
LOG_LEVEL="INFO"
PEEKABOO_LOG_FILE="/tmp/peekaboo-mcp.log"
# Optional: Custom paths for screenshots
PEEKABOO_DEFAULT_SAVE_PATH="~/Pictures/Screenshots"
```
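As a rough illustration of how a value like `AI_PROVIDERS` can be consumed, the sketch below parses the JSON string and keeps only enabled entries. The `AiProvider` interface and `parseAiProviders` function are hypothetical names inferred from the example above, not Peekaboo's actual implementation.

```typescript
// Hypothetical shape of one AI_PROVIDERS entry, inferred from the example above.
interface AiProvider {
  type: string;
  model: string;
  enabled: boolean;
  baseUrl?: string;
  apiKey?: string;
}

// Parse the raw AI_PROVIDERS value and keep only well-formed, enabled entries.
// Returns an empty list when the variable is unset or is not a JSON array.
function parseAiProviders(raw: string | undefined): AiProvider[] {
  if (!raw) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (p): p is AiProvider =>
      typeof p === "object" &&
      p !== null &&
      typeof (p as AiProvider).type === "string" &&
      typeof (p as AiProvider).model === "string" &&
      (p as AiProvider).enabled === true
  );
}

// Same two providers as in the .env example, on one line.
const exampleValue =
  '[{"type":"ollama","baseUrl":"http://localhost:11434","model":"llava","enabled":true},' +
  '{"type":"openai","apiKey":"your-openai-api-key","model":"gpt-4-vision-preview","enabled":false}]';

const enabledProviders = parseAiProviders(exampleValue);
console.log(enabledProviders.map((p) => `${p.type}:${p.model}`));
```

Returning an empty list on malformed input, rather than throwing, matches the fact that the provider configuration is optional.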
#### MCP Server Configuration
Add Peekaboo to your MCP client configuration:
**For Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):**
```json
{
  "mcpServers": {
    "peekaboo": {
      "command": "peekaboo-mcp",
      "args": [],
      "env": {
        "AI_PROVIDERS": "[{\"type\":\"ollama\",\"baseUrl\":\"http://localhost:11434\",\"model\":\"llava\",\"enabled\":true}]"
      }
    }
  }
}
```
**For other MCP clients:**
```json
{
  "server": {
    "command": "node",
    "args": ["/path/to/peekaboo/dist/index.js"],
    "env": {
      "AI_PROVIDERS": "[{\"type\":\"ollama\",\"baseUrl\":\"http://localhost:11434\",\"model\":\"llava\",\"enabled\":true}]"
    }
  }
}
```
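The escaped `AI_PROVIDERS` string in these configs is easy to get wrong by hand. A minimal sketch of generating it instead: the provider list is serialized once into a string (environment values must be strings), and the surrounding config is serialized a second time, which produces the `\"` escapes. The variable names are illustrative only.

```typescript
// Providers as a plain object, mirroring the Ollama entry used above.
const providers = [
  { type: "ollama", baseUrl: "http://localhost:11434", model: "llava", enabled: true },
];

// First serialization: the provider list becomes a single string value.
const clientConfig = {
  mcpServers: {
    peekaboo: {
      command: "peekaboo-mcp",
      args: [] as string[],
      env: { AI_PROVIDERS: JSON.stringify(providers) },
    },
  },
};

// Second serialization: writing the whole config escapes the inner quotes.
const configText = JSON.stringify(clientConfig, null, 2);
console.log(configText);
```

Paste the printed text into your MCP client configuration file, adjusting `command` and `args` as needed for your client.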
### 🔐 Permissions Setup
Peekaboo requires specific macOS permissions to function properly:
#### 1. Screen Recording Permission
**Grant permission via System Preferences (System Settings on macOS 13 and later):**
1. Open **System Preferences** → **Security & Privacy** → **Privacy**
2. Select **Screen Recording** from the left sidebar
3. Click the **lock icon** and enter your password
4. Click **+** and add your terminal application or MCP client
5. Restart the application
**For common applications:**
- **Terminal.app**: `/Applications/Utilities/Terminal.app`
- **Claude Desktop**: `/Applications/Claude.app`
- **VS Code**: `/Applications/Visual Studio Code.app`
#### 2. Accessibility Permission (Optional)
For advanced window management features:
1. Open **System Preferences** → **Security & Privacy** → **Privacy**
2. Select **Accessibility** from the left sidebar
3. Add your terminal/MCP client application
### ✅ Verification
Test your installation:
```bash
# Test the Swift CLI directly
./peekaboo --help
# Test server status
./peekaboo list server_status --json-output
# Test screen capture (requires permissions)
./peekaboo image --mode screen --format png
# Start the MCP server for testing
peekaboo-mcp
```
**Expected output for server status:**
```json
{
  "success": true,
  "data": {
    "swift_cli_available": true,
    "permissions": {
      "screen_recording": true
    },
    "system_info": {
      "macos_version": "14.0"
    }
  }
}
```
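If you want to check this output from a script rather than by eye, something like the following sketch could work. It assumes only the fields shown in the expected output above; `ServerStatus` and `diagnoseStatus` are hypothetical names, and every field is treated as optional because the full schema is not documented here.

```typescript
// Shape taken from the expected output above; all nested fields are optional
// because this sketch does not assume the CLI always includes them.
interface ServerStatus {
  success?: boolean;
  data?: {
    swift_cli_available?: boolean;
    permissions?: { screen_recording?: boolean };
    system_info?: { macos_version?: string };
  };
}

// Return human-readable problems; an empty array means the status looks healthy.
function diagnoseStatus(jsonText: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return ["Output is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["Output is not a JSON object"];
  }
  const status = parsed as ServerStatus;
  const problems: string[] = [];
  if (status.success !== true) problems.push("CLI reported failure");
  if (status.data?.swift_cli_available !== true) problems.push("Swift CLI unavailable");
  if (status.data?.permissions?.screen_recording !== true) {
    problems.push("Screen Recording permission not granted");
  }
  return problems;
}

const healthy =
  '{"success":true,"data":{"swift_cli_available":true,' +
  '"permissions":{"screen_recording":true},"system_info":{"macos_version":"14.0"}}}';
console.log(diagnoseStatus(healthy));
```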
### 🎯 Quick Start
Once installed and configured:
1. **Capture Screenshot:**
```bash
peekaboo-mcp
# In your MCP client: "Take a screenshot of my screen"
```
2. **List Applications:**
```bash
# In your MCP client: "Show me all running applications"
```
3. **Analyze Screenshot:**
```bash
# In your MCP client: "Take a screenshot and tell me what's on my screen"
```
### 🐛 Troubleshooting
**Common Issues:**
| Issue | Solution |
|-------|----------|
| `Permission denied` errors | Grant Screen Recording permission in System Preferences |
| `Swift CLI unavailable` | Rebuild Swift CLI: `cd swift-cli && swift build -c release` |
| `AI analysis failed` | Check AI provider configuration and network connectivity |
| `Command not found: peekaboo-mcp` | Run `npm link` or check global npm installation |
**Debug Mode:**
```bash
# Enable verbose logging
LOG_LEVEL=DEBUG peekaboo-mcp
# Check permissions
./peekaboo list server_status --json-output
```
**Get Help:**
- 📚 [Documentation](./docs/)
- 🐛 [Issues](https://github.com/yourusername/peekaboo/issues)
- 💬 [Discussions](https://github.com/yourusername/peekaboo/discussions)
---
## 🛠️ Available Tools
Once installed, Peekaboo provides three MCP tools:
### 📸 `peekaboo.image` - Screen Capture
**Parameters:**
- `mode`: `"screen"` | `"window"` | `"multi"` (default: "screen")
- `app`: Application identifier for window/multi modes
- `path`: Custom save path (optional)
**Example:**
```json
{
  "name": "peekaboo.image",
  "arguments": {
    "mode": "window",
    "app": "Safari"
  }
}
```
### 📋 `peekaboo.list` - Application Listing
**Parameters:**
- `item_type`: `"running_applications"` | `"application_windows"` | `"server_status"`
- `app`: Application identifier (required for application_windows)
**Example:**
```json
{
  "name": "peekaboo.list",
  "arguments": {
    "item_type": "running_applications"
  }
}
```
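The one conditional rule above (`app` is required for `application_windows`) can be sketched as a small validator. This is a hand-written illustration with hypothetical names; the server itself is described as using Zod for schema validation, and its real schema may differ.

```typescript
type ItemType = "running_applications" | "application_windows" | "server_status";

interface ListArguments {
  item_type: ItemType;
  app?: string;
}

const ITEM_TYPES: readonly string[] = [
  "running_applications",
  "application_windows",
  "server_status",
];

// Return validated arguments, or throw with a message naming the bad field.
function validateListArguments(input: { item_type?: unknown; app?: unknown }): ListArguments {
  const itemType = input.item_type;
  if (typeof itemType !== "string" || ITEM_TYPES.indexOf(itemType) === -1) {
    throw new Error(`item_type must be one of: ${ITEM_TYPES.join(", ")}`);
  }

  let appName: string | undefined;
  if (typeof input.app === "string") {
    appName = input.app;
  } else if (input.app !== undefined) {
    throw new Error("app must be a string");
  }

  // The only cross-field rule stated in the parameter list above.
  if (itemType === "application_windows" && !appName) {
    throw new Error("app is required when item_type is application_windows");
  }

  const result: ListArguments = { item_type: itemType as ItemType };
  if (appName !== undefined) result.app = appName;
  return result;
}

console.log(validateListArguments({ item_type: "application_windows", app: "Safari" }));
```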
### 🧩 `peekaboo.analyze` - AI Analysis
**Parameters:**
- `image_path`: Absolute path to image file
- `question`: Question/prompt for AI analysis
**Example:**
```json
{
  "name": "peekaboo.analyze",
  "arguments": {
    "image_path": "/tmp/screenshot.png",
    "question": "What applications are visible in this screenshot?"
  }
}
```
## 🎯 Key Features
### Screen Capture
- **Multi-display support**: Captures each display separately
- **Window targeting**: Intelligent app/window matching with fuzzy search
- **Format flexibility**: PNG, JPEG, WebP, HEIF support
- **Automatic naming**: Timestamps and descriptive filenames
- **Permission handling**: Automatic screen recording permission checks
### Application Management
- **Running app enumeration**: Complete system application listing
- **Window discovery**: Per-app window enumeration with metadata
- **Fuzzy matching**: Find apps by partial name, bundle ID, or PID
- **Real-time status**: Active/background status, window counts
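
To give a feel for what matching by partial name, bundle ID, or PID can look like, here is an illustrative sketch. It is not the actual matching logic, which lives in the Swift CLI and may rank or score candidates differently; `AppInfo` and `findApps` are hypothetical names, and the field names follow the CLI's JSON output.

```typescript
interface AppInfo {
  app_name: string;
  bundle_id: string;
  pid: number;
}

// Illustrative matcher: an all-digit query is treated as a PID, then an exact
// (case-insensitive) bundle ID is tried, then a substring of the app name.
function findApps(apps: AppInfo[], query: string): AppInfo[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];

  if (/^\d+$/.test(q)) {
    const pid = Number(q);
    return apps.filter((a) => a.pid === pid);
  }

  const byBundle = apps.filter((a) => a.bundle_id.toLowerCase() === q);
  if (byBundle.length > 0) return byBundle;

  return apps.filter((a) => a.app_name.toLowerCase().indexOf(q) !== -1);
}

const sampleApps: AppInfo[] = [
  { app_name: "Safari", bundle_id: "com.apple.Safari", pid: 1234 },
  { app_name: "Visual Studio Code", bundle_id: "com.microsoft.VSCode", pid: 5678 },
];

console.log(findApps(sampleApps, "saf").map((a) => a.app_name));
console.log(findApps(sampleApps, "5678").map((a) => a.app_name));
```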
### AI Integration
- **Provider agnostic**: Support for Ollama, OpenAI, and other providers
- **Image analysis**: Natural language querying of captured content
- **Configurable**: Environment-based provider selection
## 🏛️ Project Structure
```
Peekaboo/
├── src/                              # Node.js MCP Server (TypeScript)
│   ├── index.ts                      # Main MCP server entry point
│   ├── tools/                        # Individual tool implementations
│   │   ├── image.ts                  # Screen capture tool
│   │   ├── analyze.ts                # AI analysis tool
│   │   └── list.ts                   # Application/window listing
│   ├── utils/                        # Utility modules
│   │   ├── swift-cli.ts              # Swift CLI integration
│   │   ├── ai-providers.ts           # AI provider management
│   │   └── server-status.ts          # Server status utilities
│   └── types/                        # Shared type definitions
├── swift-cli/                        # Native Swift CLI
│   └── Sources/peekaboo/             # Swift source files
│       ├── main.swift                # CLI entry point
│       ├── ImageCommand.swift        # Image capture implementation
│       ├── ListCommand.swift         # Application listing
│       ├── Models.swift              # Data structures
│       ├── ApplicationFinder.swift   # App discovery logic
│       ├── WindowManager.swift       # Window management
│       ├── PermissionsChecker.swift  # macOS permissions
│       └── JSONOutput.swift          # JSON response formatting
├── package.json                      # Node.js dependencies
├── tsconfig.json                     # TypeScript configuration
└── README.md                         # This file
```
## 🔧 Technical Details
### Swift CLI JSON Output
The Swift CLI outputs structured JSON when called with `--json-output`:
```json
{
  "success": true,
  "data": {
    "applications": [
      {
        "app_name": "Safari",
        "bundle_id": "com.apple.Safari",
        "pid": 1234,
        "is_active": true,
        "window_count": 2
      }
    ]
  },
  "debug_logs": ["Found 50 applications"]
}
```
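On the TypeScript side, an envelope like this could be modelled as a discriminated union on `success`. The sketch below is an assumption-laden illustration: the success shape follows the example above, while the failure shape (in particular the `error` field) is hypothetical because it is not shown in this README.

```typescript
interface ApplicationInfo {
  app_name: string;
  bundle_id: string;
  pid: number;
  is_active: boolean;
  window_count: number;
}

interface CliSuccess<T> {
  success: true;
  data: T;
  debug_logs?: string[];
}

// Hypothetical failure shape; the real error payload is not documented here.
interface CliFailure {
  success: false;
  error?: unknown;
  debug_logs?: string[];
}

type CliResponse<T> = CliSuccess<T> | CliFailure;

// Return the names of active applications, or an empty list on failure.
function activeAppNames(jsonText: string): string[] {
  const response = JSON.parse(jsonText) as CliResponse<{ applications: ApplicationInfo[] }>;
  if (!response.success) return [];
  // Inside this branch the compiler knows `data` is present.
  return response.data.applications.filter((a) => a.is_active).map((a) => a.app_name);
}

const sampleOutput =
  '{"success":true,"data":{"applications":[{"app_name":"Safari","bundle_id":"com.apple.Safari",' +
  '"pid":1234,"is_active":true,"window_count":2}]},"debug_logs":["Found 50 applications"]}';
console.log(activeAppNames(sampleOutput));
```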
### MCP Integration
The Node.js server translates between MCP's JSON-RPC protocol and the Swift CLI's JSON output, providing:
- **Schema validation** via Zod
- **Error handling** with proper MCP error codes
- **Logging** via Pino logger
- **Type safety** throughout the TypeScript codebase
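
The translation step can be pictured roughly as follows. This sketch maps a CLI envelope onto the MCP tool-result shape (a `content` array of text items plus an optional `isError` flag). The `CliEnvelope` type, the `error.message` field, and `toToolResult` are assumptions for illustration, not the server's actual code.

```typescript
// Assumed envelope from the Swift CLI; `error.message` is hypothetical.
interface CliEnvelope {
  success: boolean;
  data?: unknown;
  error?: { message?: string };
}

// Minimal subset of an MCP tool result: text content plus an error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Convert a CLI envelope into a tool result the MCP client can display.
function toToolResult(envelope: CliEnvelope): ToolResult {
  if (envelope.success) {
    const text = JSON.stringify(envelope.data ?? null, null, 2);
    return { content: [{ type: "text", text }] };
  }
  const message = envelope.error?.message ?? "Swift CLI reported a failure";
  return { content: [{ type: "text", text: message }], isError: true };
}

console.log(toToolResult({ success: true, data: { applications: [] } }));
console.log(toToolResult({ success: false, error: { message: "Permission denied" } }));
```

Reporting CLI failures through `isError` keeps them visible to the model as tool output, while protocol-level problems such as malformed requests would still use JSON-RPC error responses.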
### Permission Model
Peekaboo respects macOS security by:
- **Checking screen recording permissions** before capture operations
- **Graceful degradation** when permissions are missing
- **Clear error messages** guiding users to grant required permissions
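
A check-before-capture guard along these lines is one way to picture the behaviour described above. The real permission check happens in the Swift CLI; the function names, result type, and message wording here are hypothetical.

```typescript
type CaptureResult =
  | { ok: true; path: string }
  | { ok: false; message: string };

// Run `capture` only when the permission check passes; otherwise return a
// message that tells the user how to fix the problem instead of throwing.
function captureWithPermissionCheck(
  hasScreenRecording: () => boolean,
  capture: () => string
): CaptureResult {
  if (!hasScreenRecording()) {
    return {
      ok: false,
      message:
        "Screen Recording permission is missing. Grant it in the Privacy settings " +
        "for your terminal or MCP client, then restart the application.",
    };
  }
  return { ok: true, path: capture() };
}

// Usage with stand-in callbacks; a real caller would query the CLI instead.
const denied = captureWithPermissionCheck(() => false, () => "/tmp/screenshot.png");
const granted = captureWithPermissionCheck(() => true, () => "/tmp/screenshot.png");
console.log(denied, granted);
```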
## 🧪 Testing
### Manual Testing
```bash
# Test Swift CLI directly
./peekaboo list apps --json-output | head -20
# Test MCP integration
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}' | node dist/index.js
# Test image capture
echo '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "peekaboo.image", "arguments": {"mode": "screen"}}}' | node dist/index.js
```
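Hand-typing these JSON-RPC payloads gets error-prone as arguments grow. A small sketch for generating them, using the same `tools/list` and `tools/call` methods as the commands above; the helper names are hypothetical:

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Build a tools/list request.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Build a tools/call request for a named tool with its arguments.
function callToolRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// One JSON document per line, matching the payloads used in the echo commands.
const requestLines = [
  listToolsRequest(1),
  callToolRequest(2, "peekaboo.image", { mode: "screen" }),
].map((request) => JSON.stringify(request));

console.log(requestLines.join("\n"));
```

Each printed line can be piped into `node dist/index.js` in place of the inline `echo` strings.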
### Automated Testing
```bash
# TypeScript compilation
npm run build
# Swift compilation
cd swift-cli && swift build
```
## 🐛 Known Issues
- **FileHandle warning**: Non-critical Swift warning about TextOutputStream conformance
- **AI Provider Config**: Requires `AI_PROVIDERS` environment variable for analysis features
## 🚀 Future Enhancements
- [ ] **OCR Integration**: Built-in text extraction from screenshots
- [ ] **Video Capture**: Screen recording capabilities
- [ ] **Annotation Tools**: Drawing/markup on captured images
- [ ] **Cloud Storage**: Direct upload to cloud providers
- [ ] **Hotkey Support**: System-wide keyboard shortcuts
## 📄 License
MIT License - see LICENSE file for details.
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
---
**🎉 Peekaboo is ready to use!** It combines native macOS APIs with Node.js tooling to provide screen capture, window listing, and AI-assisted image analysis through MCP.