mirror of
https://github.com/samsonjs/vibetunnel.git
synced 2026-04-04 11:05:53 +00:00
Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com>
474 lines
No EOL
17 KiB
Markdown
474 lines
No EOL
17 KiB
Markdown
# Screen Capture (Screencap) Feature
|
|
|
|
## Overview
|
|
|
|
VibeTunnel's screen capture feature allows users to share their Mac screen and control it remotely through a web browser. The implementation uses WebRTC for high-performance video streaming with low latency and WebSocket/UNIX socket for secure control messages.
|
|
|
|
## Architecture
|
|
|
|
### Architecture Diagram
|
|
|
|
```
|
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
|
│ Browser │ │ Server │ │ Mac App │
|
|
│ (Client) │ │ (Port 4020) │ │ (VibeTunnel)│
|
|
└─────┬───────┘ └──────┬──────┘ └──────┬──────┘
|
|
│ │ │
|
|
│ 1. Connect WebSocket │ │
|
|
├───────────────────────────────────►│ │
|
|
│ /ws/screencap-signal (auth) │ │
|
|
│ │ │
|
|
│ │ 2. Connect UNIX Socket │
|
|
│ │◄──────────────────────────────────┤
|
|
│ │ ~/.vibetunnel/screencap.sock │
|
|
│ │ │
|
|
│ 3. Request window list │ │
|
|
├───────────────────────────────────►│ 4. Forward request │
|
|
│ {type: 'api-request', ├──────────────────────────────────►│
|
|
│ method: 'GET', │ │
|
|
│ endpoint: '/processes'} │ │
|
|
│ │ │
|
|
│ │ 5. Return window data │
|
|
│ 6. Receive window list │◄──────────────────────────────────┤
|
|
│◄───────────────────────────────────┤ {type: 'api-response', │
|
|
│ │ result: [...]} │
|
|
│ │ │
|
|
│ 7. Start capture request │ │
|
|
├───────────────────────────────────►│ 8. Forward to Mac │
|
|
│ ├──────────────────────────────────►│
|
|
│ │ │
|
|
│ │ 9. WebRTC Offer │
|
|
│ 10. Receive Offer │◄──────────────────────────────────┤
|
|
│◄───────────────────────────────────┤ │
|
|
│ │ │
|
|
│ 11. Send Answer │ 12. Forward Answer │
|
|
├───────────────────────────────────►├──────────────────────────────────►│
|
|
│ │ │
|
|
│ 13. Exchange ICE candidates │ (Server relays ICE) │
|
|
│◄──────────────────────────────────►│◄─────────────────────────────────►│
|
|
│ │ │
|
|
│ │ │
|
|
│ 14. WebRTC P2P Connection Established │
|
|
│◄═══════════════════════════════════════════════════════════════════════►│
|
|
│ (Direct video stream, no server involved) │
|
|
│ │ │
|
|
│ 15. Mouse/Keyboard events │ 16. Forward events │
|
|
├───────────────────────────────────►├──────────────────────────────────►│
|
|
│ {type: 'api-request', │ │
|
|
│ method: 'POST', │ │
|
|
│ endpoint: '/click'} │ │
|
|
│ │ │
|
|
```
|
|
|
|
### Components
|
|
|
|
1. **ScreencapService** (`mac/VibeTunnel/Core/Services/ScreencapService.swift`)
|
|
- Singleton service that manages screen capture functionality
|
|
- Uses ScreenCaptureKit for capturing screen/window content
|
|
- Manages capture sessions and processes video frames
|
|
- Provides API endpoints for window/display enumeration and control
|
|
- Supports process grouping with app icons
|
|
|
|
2. **WebRTCManager** (`mac/VibeTunnel/Core/Services/WebRTCManager.swift`)
|
|
- Manages WebRTC peer connections
|
|
- Handles signaling via UNIX socket (not WebSocket)
|
|
- Processes video frames from ScreenCaptureKit
|
|
- Supports H.264 and VP8 video codecs (VP8 prioritized for compatibility)
|
|
- Implements session-based security for control operations
|
|
- Adaptive bitrate control (1-50 Mbps) based on network conditions
|
|
- Supports 4K and 8K quality modes
|
|
|
|
3. **Web Frontend** (`web/src/client/components/screencap-view.ts`)
|
|
- LitElement-based UI for screen capture
|
|
- WebRTC client for receiving video streams
|
|
- API client for controlling capture sessions
|
|
- Session management for secure control operations
|
|
- Touch support for mobile devices
|
|
|
|
4. **UNIX Socket Handler** (`web/src/server/websocket/screencap-unix-handler.ts`)
|
|
- Manages UNIX socket at `~/.vibetunnel/screencap.sock`
|
|
- Facilitates WebRTC signaling between Mac app and browser
|
|
- Routes API requests between browser and Mac app
|
|
- No authentication needed for local UNIX socket
|
|
|
|
### Communication Flow
|
|
|
|
```
|
|
Browser <--WebSocket--> Node.js Server <--UNIX Socket--> Mac App
|
|
<--WebRTC P2P--------------------------------->
|
|
```
|
|
|
|
1. Browser connects to `/ws/screencap-signal` with JWT auth
|
|
2. Mac app connects via UNIX socket at `~/.vibetunnel/screencap.sock`
|
|
3. Browser requests screen capture via API
|
|
4. Mac app creates WebRTC offer and sends through signaling
|
|
5. Browser responds with answer
|
|
6. P2P connection established for video streaming
|
|
|
|
## Features
|
|
|
|
### Capture Modes
|
|
|
|
- **Desktop Capture**: Share entire display(s)
|
|
- **Window Capture**: Share specific application windows
|
|
- **Multi-display Support**: Handle multiple monitors (-1 index for all displays)
|
|
- **Process Grouping**: View windows grouped by application with icons
|
|
|
|
### Security Model
|
|
|
|
#### Authentication Flow
|
|
|
|
1. **Browser → Server**: JWT token in WebSocket connection
|
|
2. **Mac App → Server**: Local UNIX socket connection (no auth needed - local only)
|
|
3. **No Direct Access**: All communication goes through server relay
|
|
|
|
#### Session Management
|
|
|
|
- Each capture session has unique ID for security
|
|
- Session IDs are generated by the browser client
|
|
- Control operations (click, key, capture) require valid session
|
|
- Session is validated on each control operation
|
|
- Session is cleared when capture stops
|
|
|
|
#### Eliminated Vulnerabilities
|
|
|
|
Previously, the Mac app ran an HTTP server on port 4010:
|
|
```
|
|
❌ OLD: Browser → HTTP (no auth) → Mac App:4010
|
|
✅ NEW: Browser → WebSocket (auth) → Server → UNIX Socket → Mac App
|
|
```
|
|
|
|
This eliminates:
|
|
- Unauthenticated local access
|
|
- CORS vulnerabilities
|
|
- Open port exposure
|
|
|
|
### Video Quality
|
|
|
|
- **Codec Support**:
|
|
- VP8 (prioritized for browser compatibility)
|
|
- H.264/AVC (secondary)
|
|
- **Resolution Options**:
|
|
- 4K (3840x2160) - Default
|
|
- 8K (7680x4320) - Optional high quality mode
|
|
- **Frame Rate**: 60 FPS target
|
|
- **Adaptive Bitrate**:
|
|
- Starts at 40 Mbps
|
|
- Adjusts between 1-50 Mbps based on:
|
|
- Packet loss (reduces bitrate if > 2%)
|
|
- Round-trip time (reduces if > 150ms)
|
|
- Network conditions (increases in good conditions)
|
|
- **Hardware Acceleration**: Uses VideoToolbox for efficient encoding
|
|
- **Low Latency**: < 50ms typical latency
|
|
|
|
## Message Protocol
|
|
|
|
### API Request/Response
|
|
|
|
Browser → Server → Mac:
|
|
```json
|
|
{
|
|
"type": "api-request",
|
|
"requestId": "uuid",
|
|
"method": "GET|POST",
|
|
"endpoint": "/processes|/displays|/capture|/click|/key",
|
|
"params": { /* optional */ },
|
|
"sessionId": "session-uuid"
|
|
}
|
|
```
|
|
|
|
Mac → Server → Browser:
|
|
```json
|
|
{
|
|
"type": "api-response",
|
|
"requestId": "uuid",
|
|
"result": { /* success data */ },
|
|
"error": "error message if failed"
|
|
}
|
|
```
|
|
|
|
### WebRTC Signaling
|
|
|
|
Standard WebRTC signaling messages:
|
|
- `start-capture`: Initiate screen sharing
|
|
- `offer`: SDP offer from Mac
|
|
- `answer`: SDP answer from browser
|
|
- `ice-candidate`: ICE candidate exchange
|
|
- `mac-ready`: Mac app ready for capture
|
|
|
|
## API Endpoints (via WebSocket)
|
|
|
|
All API requests are sent through the WebSocket connection as `api-request` messages:
|
|
|
|
### GET /displays
|
|
Returns list of available displays:
|
|
```json
|
|
{
|
|
"displays": [
|
|
{
|
|
"id": "NSScreen-1",
|
|
"width": 1920,
|
|
"height": 1080,
|
|
"scaleFactor": 2.0,
|
|
"name": "Built-in Display"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### GET /processes
|
|
Returns process groups with windows and app icons:
|
|
```json
|
|
{
|
|
"processes": [
|
|
{
|
|
"name": "Terminal",
|
|
"pid": 456,
|
|
"icon": "base64-encoded-icon",
|
|
"windows": [
|
|
{
|
|
"cgWindowID": 123,
|
|
"title": "Terminal — bash",
|
|
"ownerName": "Terminal",
|
|
"ownerPID": 456,
|
|
"x": 0, "y": 0,
|
|
"width": 1920, "height": 1080,
|
|
"isOnScreen": true
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### POST /capture
|
|
Starts desktop capture:
|
|
```json
|
|
// Request
|
|
{
|
|
"type": "desktop",
|
|
"index": 0, // Display index or -1 for all displays
|
|
"webrtc": true,
|
|
"use8k": false
|
|
}
|
|
|
|
// Response
|
|
{
|
|
"status": "started",
|
|
"type": "desktop",
|
|
"webrtc": true,
|
|
"sessionId": "uuid"
|
|
}
|
|
```
|
|
|
|
### POST /capture-window
|
|
Starts window capture:
|
|
```json
|
|
// Request
|
|
{
|
|
"cgWindowID": 123,
|
|
"webrtc": true,
|
|
"use8k": false
|
|
}
|
|
|
|
// Response
|
|
{
|
|
"status": "started",
|
|
"cgWindowID": 123,
|
|
"webrtc": true,
|
|
"sessionId": "uuid"
|
|
}
|
|
```
|
|
|
|
### POST /stop
|
|
Stops capture and clears session:
|
|
```json
|
|
{
|
|
"status": "stopped"
|
|
}
|
|
```
|
|
|
|
### POST /click, /mousedown, /mouseup, /mousemove
|
|
Sends mouse events (requires session):
|
|
```json
|
|
{
|
|
"x": 500, // 0-1000 normalized range
|
|
"y": 500 // 0-1000 normalized range
|
|
}
|
|
```
|
|
|
|
### POST /key
|
|
Sends keyboard events (requires session):
|
|
```json
|
|
{
|
|
"key": "a",
|
|
"metaKey": false,
|
|
"ctrlKey": false,
|
|
"altKey": false,
|
|
"shiftKey": true
|
|
}
|
|
```
|
|
|
|
### GET /frame
|
|
Get current frame as JPEG (for non-WebRTC mode):
|
|
```json
|
|
{
|
|
"frame": "base64-encoded-jpeg"
|
|
}
|
|
```
|
|
|
|
## Implementation Details
|
|
|
|
### UNIX Socket Connection
|
|
|
|
The Mac app connects to the server via UNIX socket instead of WebSocket:
|
|
|
|
1. **Socket Path**: `~/.vibetunnel/screencap.sock`
|
|
2. **Shared Connection**: Uses `SharedUnixSocketManager` for socket management
|
|
3. **Message Routing**: Messages are routed between browser WebSocket and Mac UNIX socket
|
|
4. **No Authentication**: Local UNIX socket doesn't require authentication
|
|
|
|
### WebRTC Implementation
|
|
|
|
1. **Video Processing**:
|
|
- `processVideoFrameSync` method handles CMSampleBuffer without data races
|
|
- Frames are converted to RTCVideoFrame with proper timestamps
|
|
- First frame and periodic frames are logged for debugging
|
|
|
|
2. **Codec Configuration**:
|
|
- VP8 is prioritized over H.264 in SDP for better compatibility
|
|
- Bandwidth constraints added to SDP (b=AS:bitrate)
|
|
- Codec reordering happens during peer connection setup
|
|
|
|
3. **Stats Monitoring**:
|
|
- Stats collected every 2 seconds when connected
|
|
- Monitors packet loss, RTT, and bytes sent
|
|
- Automatically adjusts bitrate based on conditions
|
|
|
|
### Coordinate System
|
|
|
|
- Browser uses 0-1000 normalized range for mouse coordinates
|
|
- Mac app converts to actual pixel coordinates based on capture area
|
|
- Ensures consistent input handling across different resolutions
|
|
|
|
## Usage
|
|
|
|
### Accessing Screen Capture
|
|
|
|
1. Ensure VibeTunnel server is running
|
|
2. Navigate to `http://localhost:4020/screencap` in a web browser
|
|
3. Grant Screen Recording permission if prompted
|
|
4. Select capture mode (desktop or window)
|
|
5. Click "Start" to begin sharing
|
|
|
|
### Prerequisites
|
|
|
|
- macOS 14.0 or later
|
|
- Screen Recording permission granted to VibeTunnel
|
|
- Modern web browser with WebRTC support
|
|
- Screencap feature enabled in VibeTunnel settings
|
|
|
|
## Development
|
|
|
|
### Running Locally
|
|
|
|
1. **Start server** (includes UNIX socket handler):
|
|
```bash
|
|
cd web
|
|
pnpm run dev
|
|
```
|
|
|
|
2. **Run Mac app** (connects to local server):
|
|
- Open Xcode project
|
|
- Build and run
|
|
- UNIX socket will auto-connect
|
|
|
|
3. **Access screen sharing**:
|
|
- Navigate to http://localhost:4020/screencap
|
|
- Requires authentication
|
|
|
|
### Testing
|
|
|
|
```bash
|
|
# Monitor logs during capture
|
|
./scripts/vtlog.sh -c WebRTCManager -f
|
|
|
|
# Check frame processing
|
|
./scripts/vtlog.sh -s "video frame" -f
|
|
|
|
# Debug session issues
|
|
./scripts/vtlog.sh -s "session" -c WebRTCManager
|
|
|
|
# Monitor bitrate adjustments
|
|
./scripts/vtlog.sh -s "bitrate" -f
|
|
|
|
# Check UNIX socket connection
|
|
./scripts/vtlog.sh -c UnixSocket -f
|
|
```
|
|
|
|
### Debug Logging
|
|
|
|
Enable debug logs:
|
|
```bash
|
|
# Browser console
|
|
localStorage.setItem('DEBUG', 'screencap*');
|
|
|
|
# Mac app (or use vtlog)
|
|
defaults write sh.vibetunnel.vibetunnel debugMode -bool YES
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
**"Mac peer not connected" error**
|
|
- Ensure Mac app is running
|
|
- Check UNIX socket connection at `~/.vibetunnel/screencap.sock`
|
|
- Verify Mac app has permissions to create socket file
|
|
- Check server logs for connection errors
|
|
|
|
**"Unauthorized: Invalid session" error**
|
|
- This happens when clicking before a session is established
|
|
- The browser client generates a session ID when starting capture
|
|
- Ensure the session ID is being forwarded through the socket
|
|
- Check that the Mac app is validating the session properly
|
|
|
|
**Black screen or no video**
|
|
- Check browser console for WebRTC errors
|
|
- Ensure Screen Recording permission is granted
|
|
- Try refreshing the page
|
|
- Verify VP8/H.264 codec support in browser
|
|
- Check if video frames are being sent (look for "FIRST VIDEO FRAME SENT" in logs)
|
|
|
|
**Poor video quality**
|
|
- Check network conditions (logs show packet loss and RTT)
|
|
- Monitor bitrate adjustments in logs
|
|
- Try disabling 8K mode if enabled
|
|
- Ensure sufficient bandwidth (up to 50 Mbps for high quality)
|
|
|
|
**Input events not working**
|
|
- Check Accessibility permissions for Mac app
|
|
- Verify coordinate transformation (0-1000 range)
|
|
- Check API message flow in logs
|
|
- Ensure session is valid
|
|
|
|
## Security Considerations
|
|
|
|
- Always validate session IDs for control operations
|
|
- Input validation for coordinates and key events
|
|
- Rate limiting on API requests to prevent abuse
|
|
- Secure session generation (crypto.randomUUID with fallback)
|
|
- Sessions tied to specific capture instances
|
|
- Clear audit logging with session IDs and timestamps
|
|
- Control operations include: click, key, mouse events, capture start/stop
|
|
|
|
## Future Enhancements
|
|
|
|
- Audio capture support
|
|
- Recording capabilities with configurable formats
|
|
- Multiple concurrent viewers for same screen
|
|
- Annotation/drawing tools overlay
|
|
- File transfer through drag & drop
|
|
- Enhanced mobile touch controls and gestures
|
|
- Screen area selection for partial capture
|
|
- Virtual display support |