mirror of
https://github.com/samsonjs/Peekaboo.git
synced 2026-04-25 14:47:43 +00:00
🤖 Added AI Vision Analysis with Smart Model Selection
Major new feature: Local AI vision analysis with Ollama integration

Features:
• One-step: Screenshot + AI analysis in single command
• Two-step: Analyze existing images separately
• Smart model auto-detection with priority ranking
• Simplified ollama run commands (no complex API calls)
• Comprehensive error handling and setup instructions

Priority models: qwen2.5vl:7b > llava:7b > llava-phi3:3.8b > minicpm-v:8b

Examples:
osascript peekaboo.scpt "Safari" --ask "What's on this page?"
osascript peekaboo.scpt analyze "/tmp/shot.png" "Any errors?"

Perfect for automated testing, QA, and visual verification!

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
parent
939c60aaaf
commit
a5132f53c1
2 changed files with 288 additions and 6 deletions
92
README.md
@@ -18,6 +18,7 @@
- 💥 **Zero interaction**: 100% unattended operation
- 🧠 **Smart filenames**: Model-friendly names with app info
- ⚡ **Optimized speed**: 70% faster capture delays
- 🤖 **AI Vision Analysis**: Local Ollama integration with auto-model detection

---
@@ -53,6 +54,12 @@ osascript peekaboo.scpt "Chrome" "/tmp/chrome.png" --multi

# 🪟 Just the front window
osascript peekaboo.scpt "TextEdit" "/tmp/textedit.png" --window

# 🤖 AI analysis: Screenshot + question in one step
osascript peekaboo.scpt "Safari" --ask "What's on this page?"

# 🔍 Analyze existing image
osascript peekaboo.scpt analyze "/tmp/screenshot.png" "Any errors visible?"
```

---
@@ -125,6 +132,60 @@ osascript peekaboo.scpt "Safari" "/tmp/shot.pdf"

---

## 🤖 **AI VISION ANALYSIS**

Peekaboo integrates with **Ollama** for local AI vision analysis - ask questions about your screenshots!

### 🚀 **One-Step: Screenshot + Analysis**
```bash
# Take screenshot and analyze it in one command
osascript peekaboo.scpt "Safari" --ask "What's the main content on this page?"
osascript peekaboo.scpt "Terminal" --ask "Any error messages visible?"
osascript peekaboo.scpt "Xcode" --ask "Is the build successful?"
osascript peekaboo.scpt "Chrome" --ask "What product is being shown?" --model llava:13b
```

### 🔍 **Two-Step: Analyze Existing Images**
```bash
# Analyze screenshots you already have
osascript peekaboo.scpt analyze "/tmp/screenshot.png" "Describe what you see"
osascript peekaboo.scpt analyze "/path/error.png" "What error is shown?"
osascript peekaboo.scpt analyze "/Desktop/ui.png" "Any UI issues?" --model qwen2.5vl:7b
```

### 🧠 **Smart Model Selection**
Peekaboo automatically picks the best available vision model.

**Priority order:**
1. `qwen2.5vl:7b` (6GB) - Best doc/chart understanding
2. `llava:7b` (4.7GB) - Solid all-rounder
3. `llava-phi3:3.8b` (2.9GB) - Tiny but chatty
4. `minicpm-v:8b` (5.5GB) - Killer OCR
5. `gemma3:4b` (3.3GB) - Multilingual support
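That priority ranking can be sketched in plain shell. The `pick_vision_model` helper below is hypothetical (not part of Peekaboo); it reads installed model names on stdin, one per line, in place of a live `ollama list`:

```shell
# Hypothetical helper mirroring Peekaboo's priority ranking.
pick_vision_model() {
  installed=$(cat)
  for candidate in qwen2.5vl:7b llava:7b llava-phi3:3.8b minicpm-v:8b gemma3:4b; do
    # -F: fixed string, -x: whole line must match, -q: quiet
    if printf '%s\n' "$installed" | grep -Fqx "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1 # no known vision model installed
}

# Only llava:7b and gemma3:4b installed: llava:7b wins on priority
printf 'llava:7b\ngemma3:4b\n' | pick_vision_model
# → llava:7b
```

The same first-match-wins scan is what the script's `findBestVisionModel` handler does against `visionModelPriority`.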
### ⚡ **Quick Setup**
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a vision model (pick one)
ollama pull qwen2.5vl:7b    # Recommended: best overall
ollama pull llava:7b        # Popular: good balance
ollama pull llava-phi3:3.8b # Lightweight: low RAM

# Ready to analyze!
osascript peekaboo.scpt "Safari" --ask "What's on this page?"
```
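Model auto-detection works by reading the installed-model table from `ollama list`. The pipeline the script runs can be exercised on canned output (the rows and IDs below are illustrative, not real):

```shell
# Simulated `ollama list` output: header row, then NAME ID SIZE MODIFIED
sample='NAME              ID        SIZE    MODIFIED
llava:7b          0000aaaa  4.7 GB  2 days ago
qwen2.5vl:7b      0000bbbb  6.0 GB  5 hours ago'

# Same pipeline peekaboo.scpt uses: drop the header row,
# keep only the NAME column, discard empty lines.
printf '%s\n' "$sample" | tail -n +2 | awk '{print $1}' | grep -v '^$'
# → llava:7b
# → qwen2.5vl:7b
```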
**Perfect for:**
- 🧪 Automated UI testing
- 📊 Dashboard monitoring
- 🐛 Error detection
- 📸 Content verification
- 🔍 Visual QA automation

---

## 🧠 **SMART FILENAMES**

Peekaboo automatically generates **model-friendly** filenames that are perfect for automation:
@@ -214,18 +275,38 @@ osascript peekaboo.scpt "Safari" "/docs/browser.png" --multi
osascript peekaboo.scpt "Your App"
# → /tmp/peekaboo_your_app_20250522_143052.png

# Automated visual testing with AI
osascript peekaboo.scpt "Your App" --ask "Any error messages or crashes visible?"
osascript peekaboo.scpt "Your App" --ask "Is the login screen displayed correctly?"

# Custom path with timestamp
osascript peekaboo.scpt "Your App" "/test-results/app-$(date +%s).png"
```

### 🎬 **Content Creation**
```bash
# Before/after shots
# Before/after shots with AI descriptions
osascript peekaboo.scpt "Photoshop" --ask "Describe the current design state"
# ... do your work ...
osascript peekaboo.scpt "Photoshop" --ask "What changes were made to the design?"

# Traditional before/after shots
osascript peekaboo.scpt "Photoshop" "/content/before.png"
# ... do your work ...
osascript peekaboo.scpt "Photoshop" "/content/after.png"
```

### 🧪 **Automated QA & Testing**
```bash
# Visual regression testing
osascript peekaboo.scpt "Your App" --ask "Does the UI look correct?"
osascript peekaboo.scpt "Safari" --ask "Are there any broken images or layout issues?"
osascript peekaboo.scpt "Terminal" --ask "Any red error text visible?"

# Dashboard monitoring
osascript peekaboo.scpt analyze "/tmp/dashboard.png" "Are all metrics green?"
```

---

## 🚨 **TROUBLESHOOTING**
@@ -268,6 +349,8 @@ osascript peekaboo.scpt "Safari" "/tmp/debug.png" --verbose
| **Window modes** | ✅ `--window` for front window only |
| **Auto paths** | ✅ Optional output path with smart /tmp defaults |
| **Smart filenames** | ✅ Model-friendly: app_name_timestamp format |
| **AI Vision Analysis** | ✅ Local Ollama integration with auto-model detection |
| **Smart AI Models** | ✅ Auto-picks best: qwen2.5vl > llava > phi3 > minicpm |
| **Verbose logging** | ✅ `--verbose` for debugging |

---
@@ -318,6 +401,7 @@ property verboseLogging : false -- Debug output
- **Smart filenames**: Model-friendly with app names
- **Smart targeting**: Works with app names OR bundle IDs
- **Smart delays**: Optimized for speed (70% faster)
- **Smart AI analysis**: Auto-detects best vision model
- Auto-launches sleeping apps and brings them forward

### 🎭 **Multi-Window Mastery**
@@ -331,6 +415,12 @@ property verboseLogging : false -- Debug output
- **0.1s multi-window focus** (down from 0.3s)
- Responsive and practical for daily use

### 🤖 **AI-Powered Vision**
- **Local analysis**: Private Ollama integration, no cloud
- **Smart model selection**: Auto-picks best available model
- **One or two-step**: Screenshot+analyze or analyze existing images
- **Perfect for automation**: Visual testing, error detection, QA

### 🔍 **Discovery Built-In**
- See exactly what's running
- Get precise window titles
202
peekaboo.scpt
@@ -13,6 +13,10 @@ property windowActivationDelay : 0.2
property enhancedErrorReporting : true
property verboseLogging : false
property maxWindowTitleLength : 50

-- AI Analysis Configuration
property defaultVisionModel : "qwen2.5vl:7b"
-- Prioritized list of vision models (best to fallback)
property visionModelPriority : {"qwen2.5vl:7b", "llava:7b", "llava-phi3:3.8b", "minicpm-v:8b", "gemma3:4b", "llava:latest", "qwen2.5vl:3b", "llava:13b", "llava-llama3:8b"}
--#endregion Configuration Properties

--#region Helper Functions
@@ -135,6 +139,125 @@ on trimWhitespace(theText)
end trimWhitespace
--#endregion Helper Functions

--#region AI Analysis Functions
on checkOllamaAvailable()
	try
		do shell script "ollama --version >/dev/null 2>&1"
		return true
	on error
		return false
	end try
end checkOllamaAvailable

on getAvailableVisionModels()
	set availableModels to {}
	try
		set ollamaList to do shell script "ollama list 2>/dev/null | tail -n +2 | awk '{print $1}' | grep -v '^$'"
		set modelLines to paragraphs of ollamaList
		repeat with modelLine in modelLines
			set modelName to contents of modelLine
			if modelName is not "" then
				set end of availableModels to modelName
			end if
		end repeat
	on error
		-- Return empty list if ollama list fails
	end try
	return availableModels
end getAvailableVisionModels

on findBestVisionModel(requestedModel)
	my logVerbose("Finding best vision model, requested: " & requestedModel)

	set availableModels to my getAvailableVisionModels()
	my logVerbose("Available models: " & (availableModels as string))

	-- If specific model requested and available, use it
	if requestedModel is not defaultVisionModel then
		repeat with availModel in availableModels
			if contents of availModel is requestedModel then
				my logVerbose("Using requested model: " & requestedModel)
				return requestedModel
			end if
		end repeat
		-- Requested model not found, will fall back to priority list
		my logVerbose("Requested model '" & requestedModel & "' not found, checking priority list")
	end if

	-- Find best available model from priority list
	repeat with priorityModel in visionModelPriority
		repeat with availModel in availableModels
			if contents of availModel is contents of priorityModel then
				my logVerbose("Using priority model: " & contents of priorityModel)
				return contents of priorityModel
			end if
		end repeat
	end repeat

	-- No priority models available, use first available vision model
	repeat with availModel in availableModels
		set modelName to contents of availModel
		if modelName contains "llava" or modelName contains "qwen" or modelName contains "gemma" or modelName contains "minicpm" then
			my logVerbose("Using first available vision model: " & modelName)
			return modelName
		end if
	end repeat

	-- No vision models found
	return ""
end findBestVisionModel

on getOllamaInstallInstructions()
	set instructions to scriptInfoPrefix & "AI Analysis requires Ollama with a vision model." & linefeed & linefeed
	set instructions to instructions & "🚀 Quick Setup:" & linefeed
	set instructions to instructions & "1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh" & linefeed
	set instructions to instructions & "2. Pull a vision model: ollama pull " & defaultVisionModel & linefeed
	set instructions to instructions & "3. Models are ready to use!" & linefeed & linefeed
	set instructions to instructions & "💡 Recommended models:" & linefeed
	set instructions to instructions & " • qwen2.5vl:7b (6GB) - Best doc/chart understanding" & linefeed
	set instructions to instructions & " • llava:7b (4.7GB) - Solid all-rounder" & linefeed
	set instructions to instructions & " • llava-phi3:3.8b (2.9GB) - Tiny but chatty" & linefeed
	set instructions to instructions & " • minicpm-v:8b (5.5GB) - Killer OCR" & linefeed & linefeed
	set instructions to instructions & "Then retry your Peekaboo command with --ask or --analyze!"
	return instructions
end getOllamaInstallInstructions

on analyzeImageWithAI(imagePath, question, requestedModel)
	my logVerbose("Analyzing image with AI: " & imagePath)
	my logVerbose("Requested model: " & requestedModel)
	my logVerbose("Question: " & question)

	-- Check if Ollama is available
	if not my checkOllamaAvailable() then
		return my formatErrorMessage("Ollama Error", "Ollama is not installed or not in PATH." & linefeed & linefeed & my getOllamaInstallInstructions(), "ollama unavailable")
	end if

	-- Find best available vision model
	set modelToUse to my findBestVisionModel(requestedModel)
	if modelToUse is "" then
		return my formatErrorMessage("Model Error", "No vision models found." & linefeed & linefeed & my getOllamaInstallInstructions(), "no vision models")
	end if

	-- Use ollama run command (much simpler than API)
	try
		my logVerbose("Using model: " & modelToUse)
		set ollamaCmd to "ollama run " & quoted form of modelToUse & " --image " & quoted form of imagePath & " " & quoted form of question
		my logVerbose("Running: " & ollamaCmd)

		set aiResponse to do shell script ollamaCmd

		return scriptInfoPrefix & "AI Analysis Complete! 🤖" & linefeed & linefeed & "📸 Image: " & imagePath & linefeed & "❓ Question: " & question & linefeed & "🤖 Model: " & modelToUse & linefeed & linefeed & "💬 Answer:" & linefeed & aiResponse

	on error errMsg
		if errMsg contains "model" and errMsg contains "not found" then
			return my formatErrorMessage("Model Error", "Model '" & modelToUse & "' not found." & linefeed & linefeed & "Install it with: ollama pull " & modelToUse & linefeed & linefeed & my getOllamaInstallInstructions(), "model not found")
		else
			return my formatErrorMessage("Analysis Error", "Failed to analyze image: " & errMsg & linefeed & linefeed & "Make sure Ollama is running and the model is available.", "ollama execution")
		end if
	end try
end analyzeImageWithAI
--#endregion AI Analysis Functions

--#region App Discovery Functions
on listRunningApps()
	set appList to {}
@@ -523,6 +646,26 @@ on run argv
		end if
	end if

	-- Handle analyze command for existing images (two-step workflow)
	if argCount ≥ 3 then
		set firstArg to item 1 of argv
		if firstArg is "analyze" or firstArg is "--analyze" then
			set imagePath to item 2 of argv
			set question to item 3 of argv
			set modelToUse to defaultVisionModel

			-- Check for custom model
			if argCount ≥ 5 then
				set modelFlag to item 4 of argv
				if modelFlag is "--model" then
					set modelToUse to item 5 of argv
				end if
			end if

			return my analyzeImageWithAI(imagePath, question, modelToUse)
		end if
	end if

	if argCount < 1 then return my usageText()

	set appIdentifier to item 1 of argv
@@ -538,19 +681,38 @@ on run argv
	end if
	set captureMode to "screen" -- default
	set multiWindow to false
	set analyzeMode to false
	set analysisQuestion to ""
	set visionModel to defaultVisionModel

	-- Parse additional options
	if argCount > 2 then
		repeat with i from 3 to argCount
		set i to 3
		repeat while i ≤ argCount
			set arg to item i of argv
			if arg is "--window" or arg is "-w" then
				set captureMode to "window"
			-- Remove interactive mode option
			else if arg is "--multi" or arg is "-m" then
				set multiWindow to true
			else if arg is "--verbose" or arg is "-v" then
				set verboseLogging to true
			else if arg is "--ask" or arg is "--analyze" then
				set analyzeMode to true
				if i < argCount then
					set i to i + 1
					set analysisQuestion to item i of argv
				else
					return my formatErrorMessage("Argument Error", "--ask requires a question parameter" & linefeed & linefeed & my usageText(), "validation")
				end if
			else if arg is "--model" then
				if i < argCount then
					set i to i + 1
					set visionModel to item i of argv
				else
					return my formatErrorMessage("Argument Error", "--model requires a model name parameter" & linefeed & linefeed & my usageText(), "validation")
				end if
			end if
			set i to i + 1
		end repeat
	end if
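For readers less familiar with AppleScript's manual index loop, the same value-consuming flag pattern looks like this in plain shell (a standalone sketch, not code from peekaboo.scpt):

```shell
# Walk the argument list; flags that take a value shift an extra time.
parse_args() {
  mode=screen question='' model=qwen2.5vl:7b
  while [ $# -gt 0 ]; do
    case $1 in
      --window|-w) mode=window ;;
      --multi|-m) mode=multi ;;
      --ask|--analyze)
        [ $# -ge 2 ] || { echo 'error: --ask requires a question' >&2; return 1; }
        question=$2
        shift ;;
      --model)
        [ $# -ge 2 ] || { echo 'error: --model requires a model name' >&2; return 1; }
        model=$2
        shift ;;
    esac
    shift
  done
  printf 'mode=%s question=%s model=%s\n' "$mode" "$question" "$model"
}

parse_args --window --ask "Any errors?" --model llava:13b
# → mode=window question=Any errors? model=llava:13b
```

As in the script, a trailing `--ask` with no following token is rejected with a usage error rather than silently ignored.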
@@ -638,7 +800,20 @@ on run argv
			set modeDescription to "full screen"
			if captureMode is "window" then set modeDescription to "front window only"

			return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & "💡 The " & modeDescription & " of " & resolvedAppName & " has been saved."
			-- If AI analysis requested, analyze the screenshot
			if analyzeMode then
				set analysisResult to my analyzeImageWithAI(screenshotResult, analysisQuestion, visionModel)
				if analysisResult starts with scriptInfoPrefix and analysisResult contains "Analysis Complete" then
					-- Successful analysis
					return analysisResult
				else
					-- Analysis failed, return screenshot success + analysis error
					return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & linefeed & "⚠️ AI Analysis failed:" & linefeed & analysisResult
				end if
			else
				-- Regular screenshot without analysis
				return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & "💡 The " & modeDescription & " of " & resolvedAppName & " has been saved."
			end if
		end if
	end if
@@ -660,6 +835,7 @@ on usageText()

	set outText to outText & "Usage:" & LF
	set outText to outText & " osascript " & scriptName & " \"<app_name_or_bundle_id>\" [\"<output_path>\"] [options]" & LF
	set outText to outText & " osascript " & scriptName & " analyze \"<image_path>\" \"<question>\" [--model model_name]" & LF
	set outText to outText & " osascript " & scriptName & " list" & LF
	set outText to outText & " osascript " & scriptName & " help" & LF & LF
@@ -670,12 +846,14 @@ on usageText()

	set outText to outText & "Options:" & LF
	set outText to outText & " --window, -w: Capture frontmost window only" & LF
	set outText to outText & " --interactive, -i: Interactive window selection" & LF
	set outText to outText & " --multi, -m: Capture all windows with descriptive names" & LF
	set outText to outText & " --ask \"question\": AI analysis of screenshot (requires Ollama)" & LF
	set outText to outText & " --model model_name: Custom vision model (auto-detects best available)" & LF
	set outText to outText & " --verbose, -v: Enable verbose logging" & LF & LF

	set outText to outText & "Commands:" & LF
	set outText to outText & " list: List all running apps with window titles" & LF
	set outText to outText & " analyze: Analyze existing image with AI vision" & LF
	set outText to outText & " help: Show this help message" & LF & LF

	set outText to outText & "Examples:" & LF
@@ -688,7 +866,21 @@ on usageText()
	set outText to outText & " # Front window only:" & LF
	set outText to outText & " osascript " & scriptName & " \"TextEdit\" \"/tmp/textedit.png\" --window" & LF
	set outText to outText & " # All windows with descriptive names:" & LF
	set outText to outText & " osascript " & scriptName & " \"Safari\" \"/tmp/safari_windows.png\" --multi" & LF & LF
	set outText to outText & " osascript " & scriptName & " \"Safari\" \"/tmp/safari_windows.png\" --multi" & LF
	set outText to outText & " # One-step: Screenshot + AI analysis:" & LF
	set outText to outText & " osascript " & scriptName & " \"Safari\" --ask \"What's on this page?\"" & LF
	set outText to outText & " # Two-step: Analyze existing image:" & LF
	set outText to outText & " osascript " & scriptName & " analyze \"/tmp/screenshot.png\" \"Describe what you see\"" & LF
	set outText to outText & " # Custom model:" & LF
	set outText to outText & " osascript " & scriptName & " \"Safari\" --ask \"Any errors?\" --model llava:13b" & LF & LF

	set outText to outText & "AI Analysis Features:" & LF
	set outText to outText & " • Local inference with Ollama (private, no data sent to cloud)" & LF
	set outText to outText & " • Auto-detects best available vision model from your Ollama install" & LF
	set outText to outText & " • Priority: qwen2.5vl:7b > llava:7b > llava-phi3:3.8b > minicpm-v:8b" & LF
	set outText to outText & " • One-step: Screenshot + analysis in single command" & LF
	set outText to outText & " • Two-step: Analyze existing images separately" & LF
	set outText to outText & " • Detailed setup guide if models missing" & LF & LF

	set outText to outText & "Multi-Window Features:" & LF
	set outText to outText & " • --multi creates separate files with descriptive names" & LF