Screen Capture (Screencap) Feature
Overview
VibeTunnel's screen capture feature allows users to share their Mac screen and control it remotely through a web browser. The implementation uses WebRTC for low-latency video streaming, while signaling and control messages travel over an authenticated WebSocket (browser to server) and a local UNIX socket (server to Mac app).
Architecture
Architecture Diagram
┌─────────────┐                     ┌─────────────┐                     ┌─────────────┐
│   Browser   │                     │   Server    │                     │   Mac App   │
│  (Client)   │                     │ (Port 4020) │                     │ (VibeTunnel)│
└─────┬───────┘                     └──────┬──────┘                     └──────┬──────┘
      │                                    │                                   │
      │  1. Connect WebSocket              │                                   │
      ├───────────────────────────────────►│                                   │
      │  /ws/screencap-signal (auth)       │                                   │
      │                                    │                                   │
      │                                    │  2. Connect UNIX Socket           │
      │                                    │◄──────────────────────────────────┤
      │                                    │  ~/.vibetunnel/screencap.sock     │
      │                                    │                                   │
      │  3. Request window list            │                                   │
      ├───────────────────────────────────►│  4. Forward request               │
      │  {type: 'api-request',             ├──────────────────────────────────►│
      │   method: 'GET',                   │                                   │
      │   endpoint: '/processes'}          │                                   │
      │                                    │                                   │
      │                                    │  5. Return window data            │
      │  6. Receive window list            │◄──────────────────────────────────┤
      │◄───────────────────────────────────┤  {type: 'api-response',           │
      │                                    │   result: [...]}                  │
      │                                    │                                   │
      │  7. Start capture request          │                                   │
      ├───────────────────────────────────►│  8. Forward to Mac                │
      │                                    ├──────────────────────────────────►│
      │                                    │                                   │
      │                                    │  9. WebRTC Offer                  │
      │  10. Receive Offer                 │◄──────────────────────────────────┤
      │◄───────────────────────────────────┤                                   │
      │                                    │                                   │
      │  11. Send Answer                   │  12. Forward Answer               │
      ├───────────────────────────────────►├──────────────────────────────────►│
      │                                    │                                   │
      │  13. Exchange ICE candidates       │  (Server relays ICE)              │
      │◄──────────────────────────────────►│◄─────────────────────────────────►│
      │                                    │                                   │
      │                                    │                                   │
      │                 14. WebRTC P2P Connection Established                  │
      │◄══════════════════════════════════════════════════════════════════════►│
      │               (Direct video stream, no server involved)                │
      │                                    │                                   │
      │  15. Mouse/Keyboard events         │  16. Forward events               │
      ├───────────────────────────────────►├──────────────────────────────────►│
      │  {type: 'api-request',             │                                   │
      │   method: 'POST',                  │                                   │
      │   endpoint: '/click'}              │                                   │
      │                                    │                                   │
Components
- ScreencapService (mac/VibeTunnel/Core/Services/ScreencapService.swift)
  - Singleton service that manages screen capture functionality
  - Uses ScreenCaptureKit for capturing screen/window content
  - Manages capture sessions and processes video frames
  - Provides API endpoints for window/display enumeration and control
  - Supports process grouping with app icons
- WebRTCManager (mac/VibeTunnel/Core/Services/WebRTCManager.swift)
  - Manages WebRTC peer connections
  - Handles signaling via UNIX socket (not WebSocket)
  - Processes video frames from ScreenCaptureKit
  - Supports H.264 and VP8 video codecs (VP8 prioritized for compatibility)
  - Implements session-based security for control operations
  - Adaptive bitrate control (1-50 Mbps) based on network conditions
  - Supports 4K and 8K quality modes
- Web Frontend (web/src/client/components/screencap-view.ts)
  - LitElement-based UI for screen capture
  - WebRTC client for receiving video streams
  - API client for controlling capture sessions
  - Session management for secure control operations
  - Touch support for mobile devices
- UNIX Socket Handler (web/src/server/websocket/screencap-unix-handler.ts)
  - Manages the UNIX socket at ~/.vibetunnel/screencap.sock
  - Facilitates WebRTC signaling between the Mac app and the browser
  - Routes API requests between the browser and the Mac app
  - No authentication needed for the local UNIX socket
Communication Flow
Browser <--WebSocket--> Node.js Server <--UNIX Socket--> Mac App
Browser <----------------- WebRTC P2P -----------------> Mac App
- Browser connects to /ws/screencap-signal with JWT auth
- Mac app connects via UNIX socket at ~/.vibetunnel/screencap.sock
- Browser requests screen capture via the API
- Mac app creates a WebRTC offer and sends it through the signaling channel
- Browser responds with an answer; ICE candidates are relayed by the server
- P2P connection is established for video streaming
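The relay step in this flow can be sketched as a small routing table keyed by requestId. This is a hypothetical TypeScript illustration, not the code in screencap-unix-handler.ts: the class and method names are invented, real sockets are replaced by send callbacks, and answering with a "Mac peer not connected" error when the Mac app is absent is an assumption based on the Troubleshooting section.

```typescript
// Hypothetical sketch: relaying api-request/api-response messages between
// browser WebSocket connections and the single Mac app UNIX-socket peer.
// Real sockets are replaced by `send` callbacks so only the routing logic remains.

type Send = (message: string) => void;

// Only the fields the relay needs; other fields are passed through untouched.
interface RoutedMessage {
  type: string;
  requestId: string;
  [key: string]: unknown;
}

class ScreencapRelay {
  private macPeer: Send | null = null;
  // requestId -> the browser connection that issued the request
  private pending = new Map<string, Send>();

  setMacPeer(send: Send | null): void {
    this.macPeer = send;
  }

  // A browser sent an api-request over its authenticated WebSocket.
  fromBrowser(browser: Send, request: RoutedMessage): void {
    if (this.macPeer === null) {
      browser(
        JSON.stringify({ type: "api-response", requestId: request.requestId, error: "Mac peer not connected" }),
      );
      return;
    }
    this.pending.set(request.requestId, browser);
    this.macPeer(JSON.stringify(request));
  }

  // The Mac app sent an api-response over the UNIX socket.
  // Returns false if no browser is waiting for this requestId.
  fromMac(response: RoutedMessage): boolean {
    const browser = this.pending.get(response.requestId);
    if (browser === undefined) return false;
    this.pending.delete(response.requestId);
    browser(JSON.stringify(response));
    return true;
  }
}
```

Keeping the map on the server means the Mac app never needs to know which browser asked; it only has to echo the requestId back.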
Features
Capture Modes
- Desktop Capture: Share entire display(s)
- Window Capture: Share specific application windows
- Multi-display Support: Handle multiple monitors (display index -1 captures all displays)
- Process Grouping: View windows grouped by application with icons
Security Model
Authentication Flow
- Browser → Server: JWT token in WebSocket connection
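- Mac App → Server: Local UNIX socket connection (no auth needed, since the socket is local-only)
- No Direct Access: All control and signaling traffic goes through the server relay; the only direct browser-to-Mac path is the negotiated WebRTC video stream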
Session Management
- Each capture session has a unique ID for security
- Session IDs are generated by the browser client
- Control operations (click, key, capture) require a valid session
- The session is validated on each control operation
- The session is cleared when capture stops
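A minimal sketch of these rules, assuming the Mac app adopts the browser-generated ID when it receives POST /capture or POST /capture-window and checks it on every later input request. The CaptureSessionGate class is hypothetical and written in TypeScript for illustration; the actual check lives in the Swift WebRTCManager.

```typescript
// Hypothetical sketch of per-request session checks. The endpoint lists come
// from this document; binding the session on capture start is an assumption.

const SESSION_START_ENDPOINTS = new Set(["/capture", "/capture-window"]);
const SESSION_REQUIRED_ENDPOINTS = new Set(["/click", "/mousedown", "/mouseup", "/mousemove", "/key", "/stop"]);

class CaptureSessionGate {
  private activeSessionId: string | null = null;

  // Returns null when the request may proceed, otherwise an error message.
  authorize(endpoint: string, sessionId?: string): string | null {
    if (SESSION_START_ENDPOINTS.has(endpoint)) {
      if (!sessionId) return "Unauthorized: Invalid session";
      this.activeSessionId = sessionId; // adopt the browser-generated ID
      return null;
    }
    if (SESSION_REQUIRED_ENDPOINTS.has(endpoint)) {
      if (!sessionId || sessionId !== this.activeSessionId) {
        return "Unauthorized: Invalid session";
      }
      if (endpoint === "/stop") this.activeSessionId = null; // cleared when capture stops
      return null;
    }
    return null; // read-only endpoints such as /displays and /processes
  }
}
```

With this shape, a click sent before capture starts is rejected with the same "Unauthorized: Invalid session" message described under Troubleshooting.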
Eliminated Vulnerabilities
Previously, the Mac app ran an HTTP server on port 4010:
❌ OLD: Browser → HTTP (no auth) → Mac App:4010
✅ NEW: Browser → WebSocket (auth) → Server → UNIX Socket → Mac App
This eliminates:
- Unauthenticated local access
- CORS vulnerabilities
- Open port exposure
Video Quality
- Codec Support:
  - VP8 (prioritized for browser compatibility)
  - H.264/AVC (secondary)
- Resolution Options:
  - 4K (3840x2160): default
  - 8K (7680x4320): optional high-quality mode
- Frame Rate: 60 FPS target
- Adaptive Bitrate:
  - Starts at 40 Mbps
  - Adjusts between 1 and 50 Mbps based on:
    - Packet loss (reduces bitrate if > 2%)
    - Round-trip time (reduces bitrate if > 150 ms)
    - Network conditions (increases bitrate when conditions are good)
- Hardware Acceleration: Uses VideoToolbox for efficient encoding
- Low Latency: typically < 50 ms
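The adaptive bitrate rule can be sketched as two small functions: one turns cumulative stats counters into a loss percentage for the interval, the other applies the thresholds above. The limits and thresholds come from this section; the 0.8x/1.1x step factors and treating loss as lost/sent are assumptions for illustration.

```typescript
// Hypothetical sketch of the adaptive bitrate rule. Limits and thresholds
// (1-50 Mbps, start at 40 Mbps, 2% loss, 150 ms RTT) come from this document;
// the step factors and the loss formula are assumptions.

const MIN_BITRATE_MBPS = 1;
const MAX_BITRATE_MBPS = 50;
const INITIAL_BITRATE_MBPS = 40;
const DECREASE_FACTOR = 0.8; // assumed back-off step
const INCREASE_FACTOR = 1.1; // assumed probe-up step

// Cumulative counters as read from two consecutive stats reports.
interface StatsSnapshot {
  packetsSent: number;
  packetsLost: number;
}

// Approximate loss over the interval as lost / sent, in percent.
function packetLossPercent(prev: StatsSnapshot, curr: StatsSnapshot): number {
  const sent = curr.packetsSent - prev.packetsSent;
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  return sent > 0 ? (lost / sent) * 100 : 0;
}

// Back off on either congestion signal, otherwise probe upward; always clamp.
function nextBitrateMbps(currentMbps: number, lossPercent: number, rttMs: number): number {
  const congested = lossPercent > 2 || rttMs > 150;
  const target = currentMbps * (congested ? DECREASE_FACTOR : INCREASE_FACTOR);
  return Math.min(MAX_BITRATE_MBPS, Math.max(MIN_BITRATE_MBPS, target));
}
```

In the W3C WebRTC statistics model, packetsSent is reported on outbound-rtp stats and packetsLost on remote-inbound-rtp stats; a caller would sample both on each 2-second stats tick, starting from INITIAL_BITRATE_MBPS.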
Message Protocol
API Request/Response
Browser → Server → Mac:
{
"type": "api-request",
"requestId": "uuid",
"method": "GET|POST",
"endpoint": "/processes|/displays|/capture|/click|/key",
"params": { /* optional */ },
"sessionId": "session-uuid"
}
Mac → Server → Browser:
{
"type": "api-response",
"requestId": "uuid",
"result": { /* success data */ },
"error": "error message if failed"
}
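The two envelopes above map naturally onto TypeScript interfaces. The sketch below types them and adds two hypothetical helpers, makeApiRequest and readApiResult, to build a request and unwrap the matching response; how the real client generates requestId values is not specified here, so the caller passes one in.

```typescript
// Types transcribed from the api-request / api-response examples above.
// makeApiRequest and readApiResult are hypothetical helpers.

type HttpMethod = "GET" | "POST";

interface ApiRequest {
  type: "api-request";
  requestId: string;
  method: HttpMethod;
  endpoint: string;
  params?: Record<string, unknown>;
  sessionId?: string;
}

interface ApiResponse<T = unknown> {
  type: "api-response";
  requestId: string;
  result?: T;
  error?: string;
}

function makeApiRequest(
  requestId: string,
  method: HttpMethod,
  endpoint: string,
  params?: Record<string, unknown>,
  sessionId?: string,
): ApiRequest {
  const request: ApiRequest = { type: "api-request", requestId, method, endpoint };
  if (params !== undefined) request.params = params; // omit optional fields when unused
  if (sessionId !== undefined) request.sessionId = sessionId;
  return request;
}

// Parse a raw message and return its result, or throw the remote error.
function readApiResult<T>(raw: string, expectedRequestId: string): T {
  const message = JSON.parse(raw) as Partial<ApiResponse<T>>;
  if (message.type !== "api-response") throw new Error(`unexpected message type: ${String(message.type)}`);
  if (message.requestId !== expectedRequestId) throw new Error("response does not match request");
  if (message.error) throw new Error(message.error);
  return message.result as T;
}
```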
WebRTC Signaling
Standard WebRTC signaling messages:
- start-capture: Initiate screen sharing
- offer: SDP offer from Mac
- answer: SDP answer from browser
- ice-candidate: ICE candidate exchange
- mac-ready: Mac app ready for capture
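One way to model these signaling messages is a discriminated union plus a direction check. Only the type values and their senders (offer and mac-ready from the Mac, answer and start-capture from the browser, ICE candidates in both directions) come from this document; the payload field names are assumptions, and payload contents are not validated in this sketch.

```typescript
// Hypothetical typing of the signaling messages. The `data` payload shapes are assumed.

type SignalMessage =
  | { type: "start-capture" }
  | { type: "mac-ready" }
  | { type: "offer"; data: { sdp: string } }
  | { type: "answer"; data: { sdp: string } }
  | { type: "ice-candidate"; data: { candidate: string; sdpMid: string | null; sdpMLineIndex: number | null } };

type Peer = "browser" | "mac";

// Which side is expected to originate each message type.
const SENDER: Record<SignalMessage["type"], Peer | "either"> = {
  "start-capture": "browser",
  "mac-ready": "mac",
  offer: "mac",
  answer: "browser",
  "ice-candidate": "either",
};

// Parse a raw message; reject malformed JSON, unknown types, or the wrong direction.
function parseSignal(raw: string, from: Peer): SignalMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string" || !Object.prototype.hasOwnProperty.call(SENDER, type)) return null;
  const expected = SENDER[type as SignalMessage["type"]];
  if (expected !== "either" && expected !== from) return null;
  return value as SignalMessage;
}
```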
API Endpoints (via WebSocket)
All API requests are sent through the WebSocket connection as api-request messages:
GET /displays
Returns list of available displays:
{
"displays": [
{
"id": "NSScreen-1",
"width": 1920,
"height": 1080,
"scaleFactor": 2.0,
"name": "Built-in Display"
}
]
}
GET /processes
Returns process groups with windows and app icons:
{
"processes": [
{
"name": "Terminal",
"pid": 456,
"icon": "base64-encoded-icon",
"windows": [
{
"cgWindowID": 123,
"title": "Terminal — bash",
"ownerName": "Terminal",
"ownerPID": 456,
"x": 0, "y": 0,
"width": 1920, "height": 1080,
"isOnScreen": true
}
]
}
]
}
POST /capture
Starts desktop capture:
// Request
{
"type": "desktop",
"index": 0, // Display index or -1 for all displays
"webrtc": true,
"use8k": false
}
// Response
{
"status": "started",
"type": "desktop",
"webrtc": true,
"sessionId": "uuid"
}
POST /capture-window
Starts window capture:
// Request
{
"cgWindowID": 123,
"webrtc": true,
"use8k": false
}
// Response
{
"status": "started",
"cgWindowID": 123,
"webrtc": true,
"sessionId": "uuid"
}
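A client typically picks a window from the GET /processes response and sends its cgWindowID to POST /capture-window. The interfaces below are transcribed from the JSON examples; the helper names onScreenWindows and captureWindowParams are hypothetical.

```typescript
// Types transcribed from the /processes and /capture-window examples.

interface WindowInfo {
  cgWindowID: number;
  title: string;
  ownerName: string;
  ownerPID: number;
  x: number;
  y: number;
  width: number;
  height: number;
  isOnScreen: boolean;
}

interface ProcessGroup {
  name: string;
  pid: number;
  icon: string; // base64-encoded app icon
  windows: WindowInfo[];
}

interface ProcessesResponse {
  processes: ProcessGroup[];
}

interface CaptureWindowRequest {
  cgWindowID: number;
  webrtc: boolean;
  use8k: boolean;
}

// Flatten the grouped response into the windows that are currently on screen.
function onScreenWindows(response: ProcessesResponse): WindowInfo[] {
  const result: WindowInfo[] = [];
  for (const group of response.processes) {
    for (const win of group.windows) {
      if (win.isOnScreen) result.push(win);
    }
  }
  return result;
}

// Build the params for POST /capture-window from a selected window.
function captureWindowParams(win: WindowInfo, use8k = false): CaptureWindowRequest {
  return { cgWindowID: win.cgWindowID, webrtc: true, use8k };
}
```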
POST /stop
Stops capture and clears the session. Response:
{
"status": "stopped"
}
POST /click, /mousedown, /mouseup, /mousemove
Sends mouse events (requires session):
{
"x": 500, // 0-1000 normalized range
"y": 500 // 0-1000 normalized range
}
POST /key
Sends keyboard events (requires session):
{
"key": "a",
"metaKey": false,
"ctrlKey": false,
"altKey": false,
"shiftKey": true
}
GET /frame
Returns the current frame as a base64-encoded JPEG (for non-WebRTC mode):
{
"frame": "base64-encoded-jpeg"
}
Implementation Details
UNIX Socket Connection
The Mac app connects to the server via UNIX socket instead of WebSocket:
- Socket Path: ~/.vibetunnel/screencap.sock
- Shared Connection: Uses SharedUnixSocketManager for socket management
- Message Routing: Messages are routed between the browser WebSocket and the Mac UNIX socket
- No Authentication: The local UNIX socket doesn't require authentication
WebRTC Implementation
- Video Processing:
  - The processVideoFrameSync method handles CMSampleBuffer without data races
  - Frames are converted to RTCVideoFrame with proper timestamps
  - The first frame and periodic frames are logged for debugging
- Codec Configuration:
  - VP8 is prioritized over H.264 in the SDP for better compatibility
  - Bandwidth constraints are added to the SDP (b=AS:bitrate)
  - Codec reordering happens during peer connection setup
- Stats Monitoring:
  - Stats are collected every 2 seconds while connected
  - Monitors packet loss, RTT, and bytes sent
  - Automatically adjusts bitrate based on conditions
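The codec-priority and bandwidth steps amount to SDP munging on the video section. The function below is a hypothetical sketch of that idea, not the WebRTCManager implementation: it moves VP8 payload types to the front of the m=video line and writes a single b=AS line, whose unit is kbps (so 40 Mbps becomes b=AS:40000). It does not reorder associated RTX payload types.

```typescript
// Hypothetical SDP munging sketch (not the Mac app's actual code):
// 1. move VP8 payload types to the front of the m=video format list,
// 2. replace any b=AS line in the video section with one for the given kbps.

function preferVp8WithBandwidth(sdp: string, bitrateKbps: number): string {
  const lines = sdp.split("\r\n");
  const mIndex = lines.findIndex((line) => line.startsWith("m=video "));
  if (mIndex === -1) return sdp; // no video section to adjust

  // The video section runs until the next m= line or the end of the SDP.
  let end = lines.findIndex((line, i) => i > mIndex && line.startsWith("m="));
  if (end === -1) end = lines.length;

  // Payload types mapped to VP8 by a=rtpmap lines in this section.
  const vp8PayloadTypes = new Set<string>();
  for (let i = mIndex + 1; i < end; i++) {
    const match = /^a=rtpmap:(\d+) VP8\//i.exec(lines[i]);
    if (match) vp8PayloadTypes.add(match[1]);
  }

  // m=video <port> <proto> <fmt> <fmt> ...: keep the header, reorder the formats.
  const parts = lines[mIndex].split(" ");
  const formats = parts.slice(3);
  const reordered = [
    ...formats.filter((pt) => vp8PayloadTypes.has(pt)),
    ...formats.filter((pt) => !vp8PayloadTypes.has(pt)),
  ];
  lines[mIndex] = [...parts.slice(0, 3), ...reordered].join(" ");

  // b= lines follow c= in a media section; insert right after m= if there is no c=.
  const section = lines.slice(mIndex + 1, end).filter((line) => !line.startsWith("b=AS:"));
  const cIndex = section.findIndex((line) => line.startsWith("c="));
  section.splice(cIndex + 1, 0, `b=AS:${Math.round(bitrateKbps)}`);

  return [...lines.slice(0, mIndex + 1), ...section, ...lines.slice(end)].join("\r\n");
}
```

In a browser, RTCRtpTransceiver.setCodecPreferences() can express codec order without string editing; SDP munging is shown here because this section describes reordering in the SDP itself.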
Coordinate System
- Browser uses 0-1000 normalized range for mouse coordinates
- Mac app converts to actual pixel coordinates based on capture area
- Ensures consistent input handling across different resolutions
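The coordinate mapping reduces to two linear conversions. This TypeScript sketch assumes the video fills its on-page box exactly (no letterboxing) and leaves open whether the Mac side works in points or pixels; the function names are hypothetical, and the Mac half would be written in Swift in practice.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Browser side: pointer position (client coordinates) inside the video box -> 0-1000.
function normalizePoint(clientX: number, clientY: number, videoBox: Rect): Point {
  return {
    x: Math.round(clamp((clientX - videoBox.x) / videoBox.width, 0, 1) * 1000),
    y: Math.round(clamp((clientY - videoBox.y) / videoBox.height, 0, 1) * 1000),
  };
}

// Mac side: 0-1000 -> coordinates inside the captured display or window.
function denormalizePoint(nx: number, ny: number, captureArea: Rect): Point {
  return {
    x: captureArea.x + (clamp(nx, 0, 1000) / 1000) * captureArea.width,
    y: captureArea.y + (clamp(ny, 0, 1000) / 1000) * captureArea.height,
  };
}
```

For a window capture, captureArea would presumably be the window frame (the x, y, width, height fields from /processes); for a desktop capture, the display bounds.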
Usage
Accessing Screen Capture
- Ensure the VibeTunnel server is running
- Navigate to http://localhost:4020/screencap in a web browser
- Grant Screen Recording permission if prompted
- Select capture mode (desktop or window)
- Click "Start" to begin sharing
Prerequisites
- macOS 14.0 or later
- Screen Recording permission granted to VibeTunnel
- Modern web browser with WebRTC support
- Screencap feature enabled in VibeTunnel settings
Development
Running Locally
- Start the server (includes the UNIX socket handler):
  cd web
  pnpm run dev
- Run the Mac app (connects to the local server):
  - Open the Xcode project
  - Build and run
  - The UNIX socket will auto-connect
- Access screen sharing:
  - Navigate to http://localhost:4020/screencap
  - Authentication is required
Testing
# Monitor logs during capture
./scripts/vtlog.sh -c WebRTCManager -f
# Check frame processing
./scripts/vtlog.sh -s "video frame" -f
# Debug session issues
./scripts/vtlog.sh -s "session" -c WebRTCManager
# Monitor bitrate adjustments
./scripts/vtlog.sh -s "bitrate" -f
# Check UNIX socket connection
./scripts/vtlog.sh -c UnixSocket -f
Debug Logging
Enable debug logs:
# Browser console
localStorage.setItem('DEBUG', 'screencap*');
# Mac app (or use vtlog)
defaults write sh.vibetunnel.vibetunnel debugMode -bool YES
Troubleshooting
Common Issues
"Mac peer not connected" error
- Ensure Mac app is running
- Check the UNIX socket connection at ~/.vibetunnel/screencap.sock
- Verify the Mac app has permission to create the socket file
- Check server logs for connection errors
"Unauthorized: Invalid session" error
- This happens when clicking before a session is established
- The browser client generates a session ID when starting capture
- Ensure the session ID is being forwarded through the socket
- Check that the Mac app is validating the session properly
Black screen or no video
- Check browser console for WebRTC errors
- Ensure Screen Recording permission is granted
- Try refreshing the page
- Verify VP8/H.264 codec support in browser
- Check if video frames are being sent (look for "FIRST VIDEO FRAME SENT" in logs)
Poor video quality
- Check network conditions (logs show packet loss and RTT)
- Monitor bitrate adjustments in logs
- Try disabling 8K mode if enabled
- Ensure sufficient bandwidth (up to 50 Mbps for high quality)
Input events not working
- Check Accessibility permissions for Mac app
- Verify coordinate transformation (0-1000 range)
- Check API message flow in logs
- Ensure session is valid
Security Considerations
- Always validate session IDs for control operations
- Input validation for coordinates and key events
- Rate limiting on API requests to prevent abuse
- Secure session generation (crypto.randomUUID with fallback)
- Sessions tied to specific capture instances
- Clear audit logging with session IDs and timestamps
- Control operations include: click, key, mouse events, capture start/stop
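The session ID generation noted above can be sketched as follows. In browsers, crypto.randomUUID() is only exposed in secure contexts, which is one plausible reason for a fallback; building a version-4 UUID from crypto.getRandomValues() is an assumed fallback, not necessarily the one the client uses. The CryptoLike parameter is a hypothetical seam so the function can be exercised without a browser.

```typescript
// Hypothetical sketch: prefer crypto.randomUUID(), otherwise build an RFC 4122
// version-4 UUID from crypto.getRandomValues(). The real client's fallback may differ.

interface CryptoLike {
  randomUUID?: () => string;
  getRandomValues(array: Uint8Array): Uint8Array;
}

function generateSessionId(cryptoImpl: CryptoLike): string {
  if (typeof cryptoImpl.randomUUID === "function") {
    return cryptoImpl.randomUUID();
  }
  // 16 random bytes with the version and variant bits set as RFC 4122 requires.
  const bytes = cryptoImpl.getRandomValues(new Uint8Array(16));
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10xx
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}
```

In a browser or recent Node.js runtime the call would be generateSessionId(globalThis.crypto). The sketch deliberately has no Math.random() branch, since that is not a cryptographically secure source.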
Future Enhancements
- Audio capture support
- Recording capabilities with configurable formats
- Multiple concurrent viewers for same screen
- Annotation/drawing tools overlay
- File transfer through drag & drop
- Enhanced mobile touch controls and gestures
- Screen area selection for partial capture
- Virtual display support