vibetunnel/mac/docs/screencap.md
Helmut Januschka f3b2022d48
Integrate screencap functionality for remote screen sharing (#209)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2025-07-06 03:31:34 +01:00


# Screen Capture (Screencap) Feature
## Overview
VibeTunnel's screen capture feature lets users share their Mac screen and control it remotely from a web browser. Video is streamed over WebRTC for low latency. Control messages travel over an authenticated WebSocket between the browser and the server, and over a local UNIX socket between the server and the Mac app.
## Architecture
### Architecture Diagram
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Browser │ │ Server │ │ Mac App │
│ (Client) │ │ (Port 4020) │ │ (VibeTunnel)│
└─────┬───────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
│ 1. Connect WebSocket │ │
├───────────────────────────────────►│ │
│ /ws/screencap-signal (auth) │ │
│ │ │
│ │ 2. Connect UNIX Socket │
│ │◄──────────────────────────────────┤
│ │ ~/.vibetunnel/screencap.sock │
│ │ │
│ 3. Request window list │ │
├───────────────────────────────────►│ 4. Forward request │
│ {type: 'api-request', ├──────────────────────────────────►│
│ method: 'GET', │ │
│ endpoint: '/processes'} │ │
│ │ │
│ │ 5. Return window data │
│ 6. Receive window list │◄──────────────────────────────────┤
│◄───────────────────────────────────┤ {type: 'api-response', │
│ │ result: [...]} │
│ │ │
│ 7. Start capture request │ │
├───────────────────────────────────►│ 8. Forward to Mac │
│ ├──────────────────────────────────►│
│ │ │
│ │ 9. WebRTC Offer │
│ 10. Receive Offer │◄──────────────────────────────────┤
│◄───────────────────────────────────┤ │
│ │ │
│ 11. Send Answer │ 12. Forward Answer │
├───────────────────────────────────►├──────────────────────────────────►│
│ │ │
│ 13. Exchange ICE candidates │ (Server relays ICE) │
│◄──────────────────────────────────►│◄─────────────────────────────────►│
│ │ │
│ │ │
│ 14. WebRTC P2P Connection Established │
│◄═══════════════════════════════════════════════════════════════════════►│
│ (Direct video stream, no server involved) │
│ │ │
│ 15. Mouse/Keyboard events │ 16. Forward events │
├───────────────────────────────────►├──────────────────────────────────►│
│ {type: 'api-request', │ │
│ method: 'POST', │ │
│ endpoint: '/click'} │ │
│ │ │
```
### Components
1. **ScreencapService** (`mac/VibeTunnel/Core/Services/ScreencapService.swift`)
   - Singleton service that manages screen capture functionality
   - Uses ScreenCaptureKit for capturing screen/window content
   - Manages capture sessions and processes video frames
   - Provides API endpoints for window/display enumeration and control
   - Supports process grouping with app icons
2. **WebRTCManager** (`mac/VibeTunnel/Core/Services/WebRTCManager.swift`)
   - Manages WebRTC peer connections
   - Handles signaling via UNIX socket (not WebSocket)
   - Processes video frames from ScreenCaptureKit
   - Supports H.264 and VP8 video codecs (VP8 prioritized for compatibility)
   - Implements session-based security for control operations
   - Adaptive bitrate control (1-50 Mbps) based on network conditions
   - Supports 4K and 8K quality modes
3. **Web Frontend** (`web/src/client/components/screencap-view.ts`)
   - LitElement-based UI for screen capture
   - WebRTC client for receiving video streams
   - API client for controlling capture sessions
   - Session management for secure control operations
   - Touch support for mobile devices
4. **UNIX Socket Handler** (`web/src/server/websocket/screencap-unix-handler.ts`)
   - Manages UNIX socket at `~/.vibetunnel/screencap.sock`
   - Facilitates WebRTC signaling between Mac app and browser
   - Routes API requests between browser and Mac app
   - No authentication needed for local UNIX socket
### Communication Flow
```
Browser <--WebSocket--> Node.js Server <--UNIX Socket--> Mac App
<--WebRTC P2P--------------------------------->
```
1. Browser connects to `/ws/screencap-signal` with JWT auth
2. Mac app connects via UNIX socket at `~/.vibetunnel/screencap.sock`
3. Browser requests screen capture via API
4. Mac app creates a WebRTC offer and sends it through the signaling channel
5. Browser responds with answer
6. P2P connection established for video streaming
## Features
### Capture Modes
- **Desktop Capture**: Share entire display(s)
- **Window Capture**: Share specific application windows
- **Multi-display Support**: Handle multiple monitors (display index `-1` selects all displays)
- **Process Grouping**: View windows grouped by application with icons
### Security Model
#### Authentication Flow
1. **Browser → Server**: JWT token in WebSocket connection
2. **Mac App → Server**: Local UNIX socket connection (no auth needed - local only)
3. **No Direct Access**: All communication goes through server relay
#### Session Management
- Each capture session has a unique ID for security
- Session IDs are generated by the browser client
- Control operations (click, key, capture) require valid session
- Session is validated on each control operation
- Session is cleared when capture stops
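
The session rules above can be sketched in TypeScript as follows. This is only an illustration: the real validation happens in the Mac app, and the `SessionGuard` class, its method names, and the exact set of guarded endpoints are assumptions made for this example.

```typescript
// Endpoints treated as control operations in this sketch (assumed set).
const GUARDED_ENDPOINTS = new Set(["/click", "/mousedown", "/mouseup", "/mousemove", "/key"]);

// Hypothetical guard that tracks the single active capture session.
class SessionGuard {
  private activeSessionId: string | null = null;

  // Called when capture starts; the browser-generated ID becomes the active session.
  start(sessionId: string): void {
    if (sessionId.length === 0) throw new Error("sessionId must not be empty");
    this.activeSessionId = sessionId;
  }

  // Called when capture stops; the session is cleared.
  clear(): void {
    this.activeSessionId = null;
  }

  // Validated on every request: guarded endpoints need the active session ID.
  isAuthorized(endpoint: string, sessionId: string | undefined): boolean {
    if (!GUARDED_ENDPOINTS.has(endpoint)) return true;
    return this.activeSessionId !== null && sessionId === this.activeSessionId;
  }
}
```

With this shape, a `/click` sent before `start` or after `clear` is rejected, which corresponds to the "Unauthorized: Invalid session" case described under Troubleshooting.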
#### Eliminated Vulnerabilities
Previously, the Mac app ran an HTTP server on port 4010:
```
❌ OLD: Browser → HTTP (no auth) → Mac App:4010
✅ NEW: Browser → WebSocket (auth) → Server → UNIX Socket → Mac App
```
This eliminates:
- Unauthenticated local access
- CORS vulnerabilities
- Open port exposure
### Video Quality
- **Codec Support**:
  - VP8 (prioritized for browser compatibility)
  - H.264/AVC (secondary)
- **Resolution Options**:
  - 4K (3840x2160) - Default
  - 8K (7680x4320) - Optional high quality mode
- **Frame Rate**: 60 FPS target
- **Adaptive Bitrate**:
  - Starts at 40 Mbps
  - Adjusts between 1-50 Mbps based on:
    - Packet loss (reduces bitrate if > 2%)
    - Round-trip time (reduces if > 150ms)
    - Network conditions (increases in good conditions)
- **Hardware Acceleration**: Uses VideoToolbox for efficient encoding
- **Low Latency**: < 50ms typical latency
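
As a rough illustration of the adaptive bitrate rule, the TypeScript sketch below applies the thresholds listed above (2% packet loss, 150 ms RTT, 1-50 Mbps bounds, 40 Mbps start). The step factors `0.8` and `1.1`, the `NetworkStats` shape, and the function name are assumptions chosen for the example. The actual logic lives in `WebRTCManager.swift` and may differ.

```typescript
interface NetworkStats {
  packetLoss: number; // fraction of packets lost, 0..1
  rttMs: number;      // round-trip time in milliseconds
}

const MIN_BITRATE_BPS = 1_000_000;    // 1 Mbps floor
const MAX_BITRATE_BPS = 50_000_000;   // 50 Mbps ceiling
const START_BITRATE_BPS = 40_000_000; // 40 Mbps starting point

// Returns the next target bitrate given the current one and fresh stats.
function nextBitrate(currentBps: number, stats: NetworkStats): number {
  const congested = stats.packetLoss > 0.02 || stats.rttMs > 150;
  const factor = congested ? 0.8 : 1.1; // assumed step sizes
  const proposed = Math.round(currentBps * factor);
  return Math.min(MAX_BITRATE_BPS, Math.max(MIN_BITRATE_BPS, proposed));
}
```

Calling `nextBitrate(START_BITRATE_BPS, { packetLoss: 0.05, rttMs: 20 })` lowers the target to 32 Mbps, while repeated calls under good conditions climb until they reach the 50 Mbps ceiling.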
## Message Protocol
### API Request/Response
Browser → Server → Mac:
```json
{
  "type": "api-request",
  "requestId": "uuid",
  "method": "GET|POST",
  "endpoint": "/processes|/displays|/capture|/click|/key",
  "params": { /* optional */ },
  "sessionId": "session-uuid"
}
```
Mac → Server → Browser:
```json
{
  "type": "api-response",
  "requestId": "uuid",
  "result": { /* success data */ },
  "error": "error message if failed"
}
```
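
Because requests and responses share a `requestId`, a client can match each `api-response` to the `api-request` that caused it. The TypeScript sketch below shows one way to do that. `createApiRequest` and `PendingRequests` are hypothetical names for this example, not the actual client in `screencap-view.ts`, and the caller is assumed to supply a unique `requestId`.

```typescript
// Envelope shapes taken from the JSON examples above.
interface ApiRequest {
  type: "api-request";
  requestId: string;
  method: "GET" | "POST";
  endpoint: string;
  params?: unknown;
  sessionId?: string;
}

interface ApiResponse {
  type: "api-response";
  requestId: string;
  result?: unknown;
  error?: string;
}

// Hypothetical helper: builds the request envelope, omitting unset optional fields.
function createApiRequest(
  requestId: string,
  method: "GET" | "POST",
  endpoint: string,
  params?: unknown,
  sessionId?: string,
): ApiRequest {
  const request: ApiRequest = { type: "api-request", requestId, method, endpoint };
  if (params !== undefined) request.params = params;
  if (sessionId !== undefined) request.sessionId = sessionId;
  return request;
}

// Hypothetical tracker: maps requestId to the callbacks of a pending promise.
class PendingRequests {
  private waiting = new Map<string, { resolve: (value: unknown) => void; reject: (error: Error) => void }>();

  // Registers the request and returns a promise settled by the matching response.
  track(request: ApiRequest): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.waiting.set(request.requestId, { resolve, reject });
    });
  }

  // Settles the matching promise; returns false for unknown or duplicate responses.
  settle(response: ApiResponse): boolean {
    const entry = this.waiting.get(response.requestId);
    if (!entry) return false;
    this.waiting.delete(response.requestId);
    if (response.error !== undefined) entry.reject(new Error(response.error));
    else entry.resolve(response.result);
    return true;
  }
}
```

In use, the client would serialize the request onto the WebSocket, call `track`, and pass every incoming `api-response` to `settle`.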
### WebRTC Signaling
Signaling messages relayed between the browser and the Mac app:
- `start-capture`: Initiate screen sharing
- `offer`: SDP offer from Mac
- `answer`: SDP answer from browser
- `ice-candidate`: ICE candidate exchange
- `mac-ready`: Mac app ready for capture
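
The direction of each signaling message can be summarized with a small TypeScript sketch of a relay decision. The `relayTarget` function and `Peer` type are hypothetical. Treating `mac-ready` as a Mac-to-browser message is an assumption based on its description, and the real routing in `screencap-unix-handler.ts` may differ.

```typescript
type Peer = "browser" | "mac";
type SignalType = "start-capture" | "offer" | "answer" | "ice-candidate" | "mac-ready";

// Returns the peer a message should be relayed to, or null if the sender is unexpected.
function relayTarget(from: Peer, type: SignalType): Peer | null {
  switch (type) {
    case "start-capture":
    case "answer":
      // Sent by the browser, consumed by the Mac app.
      return from === "browser" ? "mac" : null;
    case "offer":
    case "mac-ready":
      // Sent by the Mac app, consumed by the browser.
      return from === "mac" ? "browser" : null;
    case "ice-candidate":
      // ICE candidates flow in both directions.
      return from === "browser" ? "mac" : "browser";
  }
}
```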
## API Endpoints (via WebSocket)
All API requests are sent through the WebSocket connection as `api-request` messages:
### GET /displays
Returns list of available displays:
```json
{
  "displays": [
    {
      "id": "NSScreen-1",
      "width": 1920,
      "height": 1080,
      "scaleFactor": 2.0,
      "name": "Built-in Display"
    }
  ]
}
```
### GET /processes
Returns process groups with windows and app icons:
```json
{
  "processes": [
    {
      "name": "Terminal",
      "pid": 456,
      "icon": "base64-encoded-icon",
      "windows": [
        {
          "cgWindowID": 123,
          "title": "Terminal — bash",
          "ownerName": "Terminal",
          "ownerPID": 456,
          "x": 0, "y": 0,
          "width": 1920, "height": 1080,
          "isOnScreen": true
        }
      ]
    }
  ]
}
```
### POST /capture
Starts desktop capture:
```json
// Request
{
  "type": "desktop",
  "index": 0, // Display index or -1 for all displays
  "webrtc": true,
  "use8k": false
}
// Response
{
  "status": "started",
  "type": "desktop",
  "webrtc": true,
  "sessionId": "uuid"
}
```
### POST /capture-window
Starts window capture:
```json
// Request
{
  "cgWindowID": 123,
  "webrtc": true,
  "use8k": false
}
// Response
{
  "status": "started",
  "cgWindowID": 123,
  "webrtc": true,
  "sessionId": "uuid"
}
```
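
Since desktop and window capture use different endpoints and parameters, a client may find it convenient to build both from a single description of the target. The TypeScript sketch below is one possible shape. `CaptureTarget` and `buildCaptureCall` are hypothetical names, and the field values mirror the request examples above.

```typescript
// Hypothetical description of what to capture.
type CaptureTarget =
  | { kind: "desktop"; displayIndex: number } // -1 selects all displays
  | { kind: "window"; cgWindowID: number };

interface CaptureCall {
  endpoint: "/capture" | "/capture-window";
  params: Record<string, unknown>;
}

// Maps a target to the endpoint and params shown in the request examples.
function buildCaptureCall(target: CaptureTarget, use8k = false): CaptureCall {
  if (target.kind === "desktop") {
    return {
      endpoint: "/capture",
      params: { type: "desktop", index: target.displayIndex, webrtc: true, use8k },
    };
  }
  return {
    endpoint: "/capture-window",
    params: { cgWindowID: target.cgWindowID, webrtc: true, use8k },
  };
}
```

The returned `params` object would be sent as the `params` field of a `POST` `api-request`.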
### POST /stop
Stops capture and clears session:
```json
{
  "status": "stopped"
}
```
### POST /click, /mousedown, /mouseup, /mousemove
Sends mouse events (requires session):
```json
{
  "x": 500, // 0-1000 normalized range
  "y": 500 // 0-1000 normalized range
}
```
### POST /key
Sends keyboard events (requires session):
```json
{
  "key": "a",
  "metaKey": false,
  "ctrlKey": false,
  "altKey": false,
  "shiftKey": true
}
```
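
The `/key` payload uses the same field names as a browser `KeyboardEvent`, so the parameters can be copied straight from the event. The TypeScript sketch below assumes a hypothetical `toKeyParams` helper and a minimal `KeyLike` interface. It avoids DOM types so that it also runs outside a browser.

```typescript
// Minimal structural subset of a DOM KeyboardEvent.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
}

// Copies only the fields the /key endpoint expects, dropping everything else.
function toKeyParams(event: KeyLike): KeyLike {
  return {
    key: event.key,
    metaKey: event.metaKey,
    ctrlKey: event.ctrlKey,
    altKey: event.altKey,
    shiftKey: event.shiftKey,
  };
}
```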
### GET /frame
Returns the current frame as a JPEG (for non-WebRTC mode):
```json
{
  "frame": "base64-encoded-jpeg"
}
```
## Implementation Details
### UNIX Socket Connection
The Mac app connects to the server via UNIX socket instead of WebSocket:
1. **Socket Path**: `~/.vibetunnel/screencap.sock`
2. **Shared Connection**: Uses `SharedUnixSocketManager` for socket management
3. **Message Routing**: Messages are routed between browser WebSocket and Mac UNIX socket
4. **No Authentication**: Local UNIX socket doesn't require authentication
### WebRTC Implementation
1. **Video Processing**:
   - `processVideoFrameSync` method handles CMSampleBuffer without data races
   - Frames are converted to RTCVideoFrame with proper timestamps
   - First frame and periodic frames are logged for debugging
2. **Codec Configuration**:
   - VP8 is prioritized over H.264 in SDP for better compatibility
   - Bandwidth constraints added to SDP (b=AS:bitrate)
   - Codec reordering happens during peer connection setup
3. **Stats Monitoring**:
   - Stats collected every 2 seconds when connected
   - Monitors packet loss, RTT, and bytes sent
   - Automatically adjusts bitrate based on conditions
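
To make the codec configuration step concrete, the TypeScript sketch below shows a common way to prioritize a codec in SDP: move its payload types to the front of the `m=video` line and add a `b=AS:` bandwidth line in kbps. This is a generic illustration, not the Swift code in `WebRTCManager`. The function name `preferVideoCodec` is hypothetical, the SDP is assumed to use CRLF line endings, and the codec name is assumed to be plain alphanumeric text such as `VP8` or `H264`.

```typescript
// Reorders payload types so `codec` comes first and optionally caps bandwidth.
function preferVideoCodec(sdp: string, codec: string, maxKbps?: number): string {
  const lines = sdp.split("\r\n");
  const mIndex = lines.findIndex((line) => line.startsWith("m=video "));
  if (mIndex === -1) return sdp; // no video section, nothing to do

  // The video section runs until the next m= line or the end of the SDP.
  const nextSection = lines.findIndex((line, i) => i > mIndex && line.startsWith("m="));
  const end = nextSection === -1 ? lines.length : nextSection;
  const section = lines.slice(mIndex + 1, end);

  // Collect payload types whose rtpmap entry names the preferred codec.
  const pattern = new RegExp(`^a=rtpmap:(\\d+) ${codec}/`, "i");
  const preferred = new Set<string>();
  for (const line of section) {
    const payloadType = pattern.exec(line)?.[1];
    if (payloadType !== undefined) preferred.add(payloadType);
  }

  // m=video <port> <proto> <payload types...>: keep the header, reorder the rest.
  const parts = (lines[mIndex] ?? "").split(" ");
  const payloads = parts.slice(3);
  lines[mIndex] = [
    ...parts.slice(0, 3),
    ...payloads.filter((p) => preferred.has(p)),
    ...payloads.filter((p) => !preferred.has(p)),
  ].join(" ");

  // b= belongs after the c= line when present, otherwise right after m=.
  if (maxKbps !== undefined) {
    const cOffset = section.findIndex((line) => line.startsWith("c="));
    const insertAt = cOffset === -1 ? mIndex + 1 : mIndex + 1 + cOffset + 1;
    lines.splice(insertAt, 0, `b=AS:${maxKbps}`);
  }
  return lines.join("\r\n");
}
```

The sketch does not remove an existing `b=AS:` line and leaves associated RTX payload types in their original relative order, so production code would need to handle those cases.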
### Coordinate System
- Browser uses 0-1000 normalized range for mouse coordinates
- Mac app converts to actual pixel coordinates based on capture area
- Ensures consistent input handling across different resolutions
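
The mapping between browser pixels, the 0-1000 normalized range, and capture-area pixels can be sketched in TypeScript as below. The `Rect` type and both function names are hypothetical. The Mac app performs the second conversion in Swift, so `toPixels` is shown here only to illustrate the arithmetic.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Browser side: pixel position relative to the video element -> 0-1000 range.
function toNormalized(px: number, py: number, view: Rect): { x: number; y: number } {
  return {
    x: clamp(Math.round(((px - view.x) / view.width) * 1000), 0, 1000),
    y: clamp(Math.round(((py - view.y) / view.height) * 1000), 0, 1000),
  };
}

// Mac side: 0-1000 range -> pixel position inside the captured area.
function toPixels(nx: number, ny: number, capture: Rect): { x: number; y: number } {
  return {
    x: capture.x + (clamp(nx, 0, 1000) / 1000) * capture.width,
    y: capture.y + (clamp(ny, 0, 1000) / 1000) * capture.height,
  };
}
```

For example, a click in the center of a 960x540 video element normalizes to `(500, 500)`, which maps to `(1920, 1080)` on a 3840x2160 capture area with its origin at `(0, 0)`.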
## Usage
### Accessing Screen Capture
1. Ensure VibeTunnel server is running
2. Navigate to `http://localhost:4020/screencap` in a web browser
3. Grant Screen Recording permission if prompted
4. Select capture mode (desktop or window)
5. Click "Start" to begin sharing
### Prerequisites
- macOS 14.0 or later
- Screen Recording permission granted to VibeTunnel
- Modern web browser with WebRTC support
- Screencap feature enabled in VibeTunnel settings
## Development
### Running Locally
1. **Start server** (includes UNIX socket handler):
   ```bash
   cd web
   pnpm run dev
   ```
2. **Run Mac app** (connects to local server):
   - Open Xcode project
   - Build and run
   - UNIX socket will auto-connect
3. **Access screen sharing**:
   - Navigate to `http://localhost:4020/screencap`
   - Requires authentication
### Testing
```bash
# Monitor logs during capture
./scripts/vtlog.sh -c WebRTCManager -f
# Check frame processing
./scripts/vtlog.sh -s "video frame" -f
# Debug session issues
./scripts/vtlog.sh -s "session" -c WebRTCManager
# Monitor bitrate adjustments
./scripts/vtlog.sh -s "bitrate" -f
# Check UNIX socket connection
./scripts/vtlog.sh -c UnixSocket -f
```
### Debug Logging
Enable debug logs:
```bash
# Browser console (JavaScript; run in the browser's DevTools, not in a shell)
localStorage.setItem('DEBUG', 'screencap*');
# Mac app (run in a terminal, or use vtlog)
defaults write sh.vibetunnel.vibetunnel debugMode -bool YES
```
## Troubleshooting
### Common Issues
**"Mac peer not connected" error**
- Ensure Mac app is running
- Check UNIX socket connection at `~/.vibetunnel/screencap.sock`
- Verify Mac app has permissions to create socket file
- Check server logs for connection errors
**"Unauthorized: Invalid session" error**
- This happens when clicking before a session is established
- The browser client generates a session ID when starting capture
- Ensure the session ID is being forwarded through the socket
- Check that the Mac app is validating the session properly
**Black screen or no video**
- Check browser console for WebRTC errors
- Ensure Screen Recording permission is granted
- Try refreshing the page
- Verify VP8/H.264 codec support in browser
- Check if video frames are being sent (look for "FIRST VIDEO FRAME SENT" in logs)
**Poor video quality**
- Check network conditions (logs show packet loss and RTT)
- Monitor bitrate adjustments in logs
- Try disabling 8K mode if enabled
- Ensure sufficient bandwidth (up to 50 Mbps for high quality)
**Input events not working**
- Check Accessibility permissions for Mac app
- Verify coordinate transformation (0-1000 range)
- Check API message flow in logs
- Ensure session is valid
## Security Considerations
- Always validate session IDs for control operations
- Input validation for coordinates and key events
- Rate limiting on API requests to prevent abuse
- Secure session generation (crypto.randomUUID with fallback)
- Sessions tied to specific capture instances
- Clear audit logging with session IDs and timestamps
- Control operations include: click, key, mouse events, capture start/stop
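
A session ID generator along the lines of "crypto.randomUUID with fallback" could look like the TypeScript sketch below. The fallback strategy shown here (a version 4 UUID built from `getRandomValues`, or from `Math.random` as a last resort) is an assumption, since this document does not describe the actual fallback. `generateSessionId` and `CryptoLike` are hypothetical names.

```typescript
// Minimal subset of the Web Crypto API used by this sketch.
interface CryptoLike {
  randomUUID?: () => string;
  getRandomValues?: (array: Uint8Array) => Uint8Array;
}

// Accepts an optional crypto implementation so the fallback paths can be exercised.
function generateSessionId(cryptoImpl?: CryptoLike): string {
  const source = cryptoImpl ?? (globalThis as unknown as { crypto?: CryptoLike }).crypto;
  if (source?.randomUUID) return source.randomUUID();

  const bytes = new Uint8Array(16);
  if (source?.getRandomValues) {
    source.getRandomValues(bytes);
  } else {
    // Last resort only: Math.random is NOT cryptographically secure.
    for (let i = 0; i < bytes.length; i++) bytes[i] = Math.floor(Math.random() * 256);
  }

  // Set the version (4) and variant bits as a version 4 UUID requires.
  bytes[6] = ((bytes[6] ?? 0) & 0x0f) | 0x40;
  bytes[8] = ((bytes[8] ?? 0) & 0x3f) | 0x80;

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

Calling `generateSessionId()` with no argument uses the platform's `crypto` object when one exists. Passing an empty object forces the `Math.random` path, which is useful only for testing.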
## Future Enhancements
- Audio capture support
- Recording capabilities with configurable formats
- Multiple concurrent viewers for same screen
- Annotation/drawing tools overlay
- File transfer through drag & drop
- Enhanced mobile touch controls and gestures
- Screen area selection for partial capture
- Virtual display support