🤖 Added AI Vision Analysis with Smart Model Selection

Major new feature: Local AI vision analysis with Ollama integration

Features:
• One-step: Screenshot + AI analysis in single command
• Two-step: Analyze existing images separately
• Smart model auto-detection with priority ranking
• Simplified ollama run commands (no complex API calls)
• Comprehensive error handling and setup instructions

Priority models: qwen2.5vl:7b > llava:7b > llava-phi3:3.8b > minicpm-v:8b

Examples:
osascript peekaboo.scpt "Safari" --ask "What's on this page?"
osascript peekaboo.scpt analyze "/tmp/shot.png" "Any errors?"

Perfect for automated testing, QA, and visual verification!

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-27 15:07:41 +00:00 · 2025-05-22 19:02:14 +02:00 · a5132f53c1
commit a5132f53c1
parent 939c60aaaf
2 changed files with 288 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@
 - 💥 **Zero interaction**: 100% unattended operation
 - 🧠 **Smart filenames**: Model-friendly names with app info
 - ⚡ **Optimized speed**: 70% faster capture delays
 - 🤖 **AI Vision Analysis**: Local Ollama integration with auto-model detection
 ---
@@ -53,6 +54,12 @@ osascript peekaboo.scpt "Chrome" "/tmp/chrome.png" --multi
 # 🪟 Just the front window  
 osascript peekaboo.scpt "TextEdit" "/tmp/textedit.png" --window
 # 🤖 AI analysis: Screenshot + question in one step
 osascript peekaboo.scpt "Safari" --ask "What's on this page?"
 # 🔍 Analyze existing image
 osascript peekaboo.scpt analyze "/tmp/screenshot.png" "Any errors visible?"
 ```
 ---
@@ -125,6 +132,60 @@ osascript peekaboo.scpt "Safari" "/tmp/shot.pdf"
 ---
 ## 🤖 **AI VISION ANALYSIS**
 Peekaboo integrates with **Ollama** for local AI vision analysis - ask questions about your screenshots!
 ### 🚀 **One-Step: Screenshot + Analysis**
 ```bash
 # Take screenshot and analyze it in one command
 osascript peekaboo.scpt "Safari" --ask "What's the main content on this page?"
 osascript peekaboo.scpt "Terminal" --ask "Any error messages visible?"
 osascript peekaboo.scpt "Xcode" --ask "Is the build successful?"
 osascript peekaboo.scpt "Chrome" --ask "What product is being shown?" --model llava:13b
 ```
 ### 🔍 **Two-Step: Analyze Existing Images**  
 ```bash
 # Analyze screenshots you already have
 osascript peekaboo.scpt analyze "/tmp/screenshot.png" "Describe what you see"
 osascript peekaboo.scpt analyze "/path/error.png" "What error is shown?"
 osascript peekaboo.scpt analyze "/Desktop/ui.png" "Any UI issues?" --model qwen2.5vl:7b
 ```
 ### 🧠 **Smart Model Selection**
 Peekaboo automatically picks the best available vision model:
 **Priority order:**
 1. `qwen2.5vl:7b` (6GB) - Best doc/chart understanding  
 2. `llava:7b` (4.7GB) - Solid all-rounder
 3. `llava-phi3:3.8b` (2.9GB) - Tiny but chatty
 4. `minicpm-v:8b` (5.5GB) - Killer OCR
 5. `gemma3:4b` (3.3GB) - Multilingual support
 ### ⚡ **Quick Setup**
 ```bash
 # Install Ollama
 curl -fsSL https://ollama.ai/install.sh | sh
 # Pull a vision model (pick one)
 ollama pull qwen2.5vl:7b    # Recommended: best overall
 ollama pull llava:7b        # Popular: good balance  
 ollama pull llava-phi3:3.8b # Lightweight: low RAM
 # Ready to analyze!
 osascript peekaboo.scpt "Safari" --ask "What's on this page?"
 ```
 **Perfect for:**
 - 🧪 Automated UI testing  
 - 📊 Dashboard monitoring
 - 🐛 Error detection
 - 📸 Content verification
 - 🔍 Visual QA automation
 ---
 ## 🧠 **SMART FILENAMES**
 Peekaboo automatically generates **model-friendly** filenames that are perfect for automation:
@@ -214,18 +275,38 @@ osascript peekaboo.scpt "Safari" "/docs/browser.png" --multi
 osascript peekaboo.scpt "Your App"
 # → /tmp/peekaboo_your_app_20250522_143052.png
 # Automated visual testing with AI
 osascript peekaboo.scpt "Your App" --ask "Any error messages or crashes visible?"
 osascript peekaboo.scpt "Your App" --ask "Is the login screen displayed correctly?"
 # Custom path with timestamp
 osascript peekaboo.scpt "Your App" "/test-results/app-$(date +%s).png"
 ```
 ### 🎬 **Content Creation**
 ```bash
-# Before/after shots
+# Before/after shots with AI descriptions
 osascript peekaboo.scpt "Photoshop" --ask "Describe the current design state"
 # ... do your work ...
 osascript peekaboo.scpt "Photoshop" --ask "What changes were made to the design?"
 # Traditional before/after shots
 osascript peekaboo.scpt "Photoshop" "/content/before.png"
 # ... do your work ...
 osascript peekaboo.scpt "Photoshop" "/content/after.png"
 ```
 ### 🧪 **Automated QA & Testing**
 ```bash
 # Visual regression testing
 osascript peekaboo.scpt "Your App" --ask "Does the UI look correct?"
 osascript peekaboo.scpt "Safari" --ask "Are there any broken images or layout issues?"
 osascript peekaboo.scpt "Terminal" --ask "Any red error text visible?"
 # Dashboard monitoring
 osascript peekaboo.scpt analyze "/tmp/dashboard.png" "Are all metrics green?"
 ```
 ---
 ## 🚨 **TROUBLESHOOTING**
@@ -268,6 +349,8 @@ osascript peekaboo.scpt "Safari" "/tmp/debug.png" --verbose
 | **Window modes** | ✅ `--window` for front window only |
 | **Auto paths** | ✅ Optional output path with smart /tmp defaults |
 | **Smart filenames** | ✅ Model-friendly: app_name_timestamp format |
 | **AI Vision Analysis** | ✅ Local Ollama integration with auto-model detection |
 | **Smart AI Models** | ✅ Auto-picks best: qwen2.5vl > llava > phi3 > minicpm |
 | **Verbose logging** | ✅ `--verbose` for debugging |
 ---
@@ -318,6 +401,7 @@ property verboseLogging : false          -- Debug output
 - **Smart filenames**: Model-friendly with app names
 - **Smart targeting**: Works with app names OR bundle IDs
 - **Smart delays**: Optimized for speed (70% faster)
 - **Smart AI analysis**: Auto-detects best vision model
 - Auto-launches sleeping apps and brings them forward
 ### 🎭 **Multi-Window Mastery**
@@ -331,6 +415,12 @@ property verboseLogging : false          -- Debug output
 - **0.1s multi-window focus** (down from 0.3s)
 - Responsive and practical for daily use
 ### 🤖 **AI-Powered Vision**
 - **Local analysis**: Private Ollama integration, no cloud
 - **Smart model selection**: Auto-picks best available model  
 - **One or two-step**: Screenshot+analyze or analyze existing images
 - **Perfect for automation**: Visual testing, error detection, QA
 ### 🔍 **Discovery Built-In**
 - See exactly what's running
 - Get precise window titles
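
The priority order listed under "Smart Model Selection" in the README changes above can be sketched in shell as a first-match search. This is only an illustration of the ranking idea, not the script's actual code path: `pick_vision_model` is a hypothetical helper, and the installed-model list is passed in as a newline-separated string rather than read from Ollama.

```bash
# Sketch only: return the first model from the README's priority order that
# also appears in a newline-separated list of installed model names.
# pick_vision_model is a hypothetical helper, not part of peekaboo.scpt.
pick_vision_model() {
    installed=$1
    for candidate in qwen2.5vl:7b llava:7b llava-phi3:3.8b minicpm-v:8b gemma3:4b; do
        # -x matches the whole line, -F treats the model name as a literal string
        if printf '%s\n' "$installed" | grep -qxF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing from the priority list is installed
    return 1
}
```

With `llava:7b` and `minicpm-v:8b` installed, the function prints `llava:7b`, because it ranks higher in the list even though both are available.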
--- a/peekaboo.scpt
+++ b/peekaboo.scpt
@@ -13,6 +13,10 @@ property windowActivationDelay : 0.2
 property enhancedErrorReporting : true
 property verboseLogging : false
 property maxWindowTitleLength : 50
 -- AI Analysis Configuration  
 property defaultVisionModel : "qwen2.5vl:7b"
 -- Prioritized list of vision models (best to fallback)
 property visionModelPriority : {"qwen2.5vl:7b", "llava:7b", "llava-phi3:3.8b", "minicpm-v:8b", "gemma3:4b", "llava:latest", "qwen2.5vl:3b", "llava:13b", "llava-llama3:8b"}
 --#endregion Configuration Properties
 --#region Helper Functions
@@ -135,6 +139,125 @@ on trimWhitespace(theText)
 end trimWhitespace
 --#endregion Helper Functions
 --#region AI Analysis Functions
 on checkOllamaAvailable()
    try
        do shell script "ollama --version >/dev/null 2>&1"
        return true
    on error
        return false
    end try
 end checkOllamaAvailable
 on getAvailableVisionModels()
    set availableModels to {}
    try
        set ollamaList to do shell script "ollama list 2>/dev/null | tail -n +2 | awk '{print $1}' | grep -v '^$'"
        set modelLines to paragraphs of ollamaList
        repeat with modelLine in modelLines
            set modelName to contents of modelLine
            if modelName is not "" then
                set end of availableModels to modelName
            end if
        end repeat
    on error
        -- Return empty list if ollama list fails
    end try
    return availableModels
 end getAvailableVisionModels
 on findBestVisionModel(requestedModel)
    my logVerbose("Finding best vision model, requested: " & requestedModel)
    set availableModels to my getAvailableVisionModels()
    my logVerbose("Available models: " & (availableModels as string))
    -- If specific model requested and available, use it
    if requestedModel is not defaultVisionModel then
        repeat with availModel in availableModels
            if contents of availModel is requestedModel then
                my logVerbose("Using requested model: " & requestedModel)
                return requestedModel
            end if
        end repeat
        -- Requested model not found, will fall back to priority list
        my logVerbose("Requested model '" & requestedModel & "' not found, checking priority list")
    end if
    -- Find best available model from priority list
    repeat with priorityModel in visionModelPriority
        repeat with availModel in availableModels
            if contents of availModel is contents of priorityModel then
                my logVerbose("Using priority model: " & contents of priorityModel)
                return contents of priorityModel
            end if
        end repeat
    end repeat
    -- No priority models available, use first available vision model
    repeat with availModel in availableModels
        set modelName to contents of availModel
        if modelName contains "llava" or modelName contains "qwen" or modelName contains "gemma" or modelName contains "minicpm" then
            my logVerbose("Using first available vision model: " & modelName)
            return modelName
        end if
    end repeat
    -- No vision models found
    return ""
 end findBestVisionModel
 on getOllamaInstallInstructions()
    set instructions to scriptInfoPrefix & "AI Analysis requires Ollama with a vision model." & linefeed & linefeed
    set instructions to instructions & "🚀 Quick Setup:" & linefeed
    set instructions to instructions & "1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh" & linefeed
    set instructions to instructions & "2. Pull a vision model: ollama pull " & defaultVisionModel & linefeed
    set instructions to instructions & "3. Models are ready to use!" & linefeed & linefeed
    set instructions to instructions & "💡 Recommended models:" & linefeed
    set instructions to instructions & "  • qwen2.5vl:7b (6GB) - Best doc/chart understanding" & linefeed
    set instructions to instructions & "  • llava:7b (4.7GB) - Solid all-rounder" & linefeed  
    set instructions to instructions & "  • llava-phi3:3.8b (2.9GB) - Tiny but chatty" & linefeed
    set instructions to instructions & "  • minicpm-v:8b (5.5GB) - Killer OCR" & linefeed & linefeed
    set instructions to instructions & "Then retry your Peekaboo command with --ask or --analyze!"
    return instructions
 end getOllamaInstallInstructions
 on analyzeImageWithAI(imagePath, question, requestedModel)
    my logVerbose("Analyzing image with AI: " & imagePath)
    my logVerbose("Requested model: " & requestedModel)
    my logVerbose("Question: " & question)
    -- Check if Ollama is available
    if not my checkOllamaAvailable() then
        return my formatErrorMessage("Ollama Error", "Ollama is not installed or not in PATH." & linefeed & linefeed & my getOllamaInstallInstructions(), "ollama unavailable")
    end if
    -- Find best available vision model
    set modelToUse to my findBestVisionModel(requestedModel)
    if modelToUse is "" then
        return my formatErrorMessage("Model Error", "No vision models found." & linefeed & linefeed & my getOllamaInstallInstructions(), "no vision models")
    end if
    -- Use ollama run command (much simpler than API)
    try
        my logVerbose("Using model: " & modelToUse)
        set ollamaCmd to "ollama run " & quoted form of modelToUse & " --image " & quoted form of imagePath & " " & quoted form of question
        my logVerbose("Running: " & ollamaCmd)
        set aiResponse to do shell script ollamaCmd
        return scriptInfoPrefix & "AI Analysis Complete! 🤖" & linefeed & linefeed & "📸 Image: " & imagePath & linefeed & "❓ Question: " & question & linefeed & "🤖 Model: " & modelToUse & linefeed & linefeed & "💬 Answer:" & linefeed & aiResponse
    on error errMsg
        if errMsg contains "model" and errMsg contains "not found" then
            return my formatErrorMessage("Model Error", "Model '" & modelToUse & "' not found." & linefeed & linefeed & "Install it with: ollama pull " & modelToUse & linefeed & linefeed & my getOllamaInstallInstructions(), "model not found")
        else
            return my formatErrorMessage("Analysis Error", "Failed to analyze image: " & errMsg & linefeed & linefeed & "Make sure Ollama is running and the model is available.", "ollama execution")
        end if
    end try
 end analyzeImageWithAI
 --#endregion AI Analysis Functions
 --#region App Discovery Functions
 on listRunningApps()
    set appList to {}
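
For reference, the model discovery in `getAvailableVisionModels` and the last-resort name check in `findBestVisionModel` can be reproduced outside AppleScript. The sketch below reads `ollama list`-style text on stdin so it can be tried without Ollama installed. The single `awk` call is a substituted equivalent of the `tail | awk | grep` pipeline in the hunk, and both function names are hypothetical.

```bash
# Sketch only: extract model names from `ollama list`-style output on stdin.
# Skips the header row, keeps the first column, ignores blank lines.
model_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Sketch only: the name-based fallback heuristic from findBestVisionModel.
# It matches substrings, so a text-only model whose name contains "qwen"
# or "gemma" would also pass this check.
is_vision_model_name() {
    case "$1" in
        *llava*|*qwen*|*gemma*|*minicpm*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use (requires Ollama):
#   ollama list 2>/dev/null | model_names
```

One behavioural difference is worth knowing: in the original pipeline, `grep -v '^$'` exits non-zero when it selects no lines, and `do shell script` treats a non-zero exit as an error. An empty model list therefore appears to reach the handler's `on error` branch, which still returns an empty list. The `awk` variant here exits zero in that case.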
@@ -523,6 +646,26 @@ on run argv
            end if
        end if
        -- Handle analyze command for existing images (two-step workflow)
        if argCount ≥ 3 then
            set firstArg to item 1 of argv
            if firstArg is "analyze" or firstArg is "--analyze" then
                set imagePath to item 2 of argv
                set question to item 3 of argv
                set modelToUse to defaultVisionModel
                -- Check for custom model
                if argCount ≥ 5 then
                    set modelFlag to item 4 of argv
                    if modelFlag is "--model" then
                        set modelToUse to item 5 of argv
                    end if
                end if
                return my analyzeImageWithAI(imagePath, question, modelToUse)
            end if
        end if
        if argCount < 1 then return my usageText()
        set appIdentifier to item 1 of argv
@@ -538,19 +681,38 @@ on run argv
        end if
        set captureMode to "screen" -- default
        set multiWindow to false
        set analyzeMode to false
        set analysisQuestion to ""
        set visionModel to defaultVisionModel
        -- Parse additional options
        if argCount > 2 then
-            repeat with i from 3 to argCount
+            set i to 3
            repeat while i ≤ argCount
                set arg to item i of argv
                if arg is "--window" or arg is "-w" then
                    set captureMode to "window"
                -- Remove interactive mode option
                else if arg is "--multi" or arg is "-m" then
                    set multiWindow to true
                else if arg is "--verbose" or arg is "-v" then
                    set verboseLogging to true
                else if arg is "--ask" or arg is "--analyze" then
                    set analyzeMode to true
                    if i < argCount then
                        set i to i + 1
                        set analysisQuestion to item i of argv
                    else
                        return my formatErrorMessage("Argument Error", "--ask requires a question parameter" & linefeed & linefeed & my usageText(), "validation")
                    end if
                else if arg is "--model" then
                    if i < argCount then
                        set i to i + 1
                        set visionModel to item i of argv
                    else
                        return my formatErrorMessage("Argument Error", "--model requires a model name parameter" & linefeed & linefeed & my usageText(), "validation")
                    end if
                end if
                set i to i + 1
            end repeat
        end if
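
The switch from `repeat with i from 3 to argCount` to a manual `repeat while` loop is presumably needed because `--ask` and `--model` consume the following argument, and AppleScript's counted `repeat with` loop does not honour changes made to its counter inside the body. The same "option takes a value" pattern looks like this in shell, where `shift` plays the role of `set i to i + 1`. `parse_options` and its variable names are hypothetical, and the sketch covers only the two value-taking options.

```bash
# Sketch only: options that take a value consume the next argument.
# parse_options and its variables are illustrative, not part of peekaboo.scpt.
parse_options() {
    analyze_mode=false
    question=""
    model="qwen2.5vl:7b"   # mirrors defaultVisionModel in the diff
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --ask|--analyze)
                if [ "$#" -lt 2 ]; then
                    echo "--ask requires a question parameter" >&2
                    return 2
                fi
                analyze_mode=true
                question=$2
                shift   # skip the value we just consumed
                ;;
            --model)
                if [ "$#" -lt 2 ]; then
                    echo "--model requires a model name parameter" >&2
                    return 2
                fi
                model=$2
                shift
                ;;
        esac
        shift   # advance to the next option
    done
}
```

As in the AppleScript version, unrecognised arguments are skipped silently, and a value-taking option at the end of the argument list is reported as an error instead of reading past the end.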
@@ -638,7 +800,20 @@ on run argv
                set modeDescription to "full screen"
                if captureMode is "window" then set modeDescription to "front window only"
-                return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & "💡 The " & modeDescription & " of " & resolvedAppName & " has been saved."
+                -- If AI analysis requested, analyze the screenshot
                if analyzeMode then
                    set analysisResult to my analyzeImageWithAI(screenshotResult, analysisQuestion, visionModel)
                    if analysisResult starts with scriptInfoPrefix and analysisResult contains "Analysis Complete" then
                        -- Successful analysis
                        return analysisResult
                    else
                        -- Analysis failed, return screenshot success + analysis error
                        return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & linefeed & "⚠️ AI Analysis failed:" & linefeed & analysisResult
                    end if
                else
                    -- Regular screenshot without analysis
                    return scriptInfoPrefix & "Screenshot captured successfully! 📸" & linefeed & "• File: " & screenshotResult & linefeed & "• App: " & resolvedAppName & linefeed & "• Mode: " & modeDescription & linefeed & "💡 The " & modeDescription & " of " & resolvedAppName & " has been saved."
                end if
            end if
        end if
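
Each branch in this hunk returns a string, so a calling script probably has to inspect the text to tell a successful analysis from a screenshot whose analysis failed. The sketch below matches only on substrings that appear literally in the diff. The value of `scriptInfoPrefix` is not shown here, so it is deliberately not assumed. `classify_peekaboo_output` and its result labels are hypothetical.

```bash
# Sketch only: classify captured peekaboo output by the marker strings that
# appear in the diff. Function name and result labels are illustrative.
classify_peekaboo_output() {
    case "$1" in
        # Checked first: the failure message also contains the screenshot text
        *"AI Analysis failed:"*)              echo "analysis-failed" ;;
        *"Analysis Complete"*)                echo "analysis-ok" ;;
        *"Screenshot captured successfully"*) echo "screenshot-only" ;;
        *)                                    echo "other" ;;
    esac
}

# Typical use (requires macOS and the script itself):
#   out=$(osascript peekaboo.scpt "Safari" --ask "Any errors?")
#   [ "$(classify_peekaboo_output "$out")" = "analysis-ok" ] || echo "check output"
```

Substring matching is a heuristic. A model answer that happens to quote one of the marker strings would be misclassified, so a stricter caller may want to anchor on the first line of the output instead.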
@@ -660,6 +835,7 @@ on usageText()
    set outText to outText & "Usage:" & LF
    set outText to outText & "  osascript " & scriptName & " \"<app_name_or_bundle_id>\" [\"<output_path>\"] [options]" & LF
    set outText to outText & "  osascript " & scriptName & " analyze \"<image_path>\" \"<question>\" [--model model_name]" & LF
    set outText to outText & "  osascript " & scriptName & " list" & LF
    set outText to outText & "  osascript " & scriptName & " help" & LF & LF
@@ -670,12 +846,14 @@ on usageText()
    set outText to outText & "Options:" & LF
    set outText to outText & "  --window, -w:         Capture frontmost window only" & LF
    set outText to outText & "  --interactive, -i:    Interactive window selection" & LF
    set outText to outText & "  --multi, -m:          Capture all windows with descriptive names" & LF
    set outText to outText & "  --ask \"question\":      AI analysis of screenshot (requires Ollama)" & LF
    set outText to outText & "  --model model_name:   Custom vision model (auto-detects best available)" & LF
    set outText to outText & "  --verbose, -v:        Enable verbose logging" & LF & LF
    set outText to outText & "Commands:" & LF
    set outText to outText & "  list:                 List all running apps with window titles" & LF
    set outText to outText & "  analyze:              Analyze existing image with AI vision" & LF
    set outText to outText & "  help:                 Show this help message" & LF & LF
    set outText to outText & "Examples:" & LF
@@ -688,7 +866,21 @@ on usageText()
    set outText to outText & "  # Front window only:" & LF
    set outText to outText & "  osascript " & scriptName & " \"TextEdit\" \"/tmp/textedit.png\" --window" & LF
    set outText to outText & "  # All windows with descriptive names:" & LF
-    set outText to outText & "  osascript " & scriptName & " \"Safari\" \"/tmp/safari_windows.png\" --multi" & LF & LF
+    set outText to outText & "  osascript " & scriptName & " \"Safari\" \"/tmp/safari_windows.png\" --multi" & LF
    set outText to outText & "  # One-step: Screenshot + AI analysis:" & LF
    set outText to outText & "  osascript " & scriptName & " \"Safari\" --ask \"What's on this page?\"" & LF
    set outText to outText & "  # Two-step: Analyze existing image:" & LF
    set outText to outText & "  osascript " & scriptName & " analyze \"/tmp/screenshot.png\" \"Describe what you see\"" & LF
    set outText to outText & "  # Custom model:" & LF
    set outText to outText & "  osascript " & scriptName & " \"Safari\" --ask \"Any errors?\" --model llava:13b" & LF & LF
    set outText to outText & "AI Analysis Features:" & LF
    set outText to outText & "  • Local inference with Ollama (private, no data sent to cloud)" & LF
    set outText to outText & "  • Auto-detects best available vision model from your Ollama install" & LF
    set outText to outText & "  • Priority: qwen2.5vl:7b > llava:7b > llava-phi3:3.8b > minicpm-v:8b" & LF
    set outText to outText & "  • One-step: Screenshot + analysis in single command" & LF
    set outText to outText & "  • Two-step: Analyze existing images separately" & LF
    set outText to outText & "  • Detailed setup guide if models missing" & LF & LF
    set outText to outText & "Multi-Window Features:" & LF
    set outText to outText & "  • --multi creates separate files with descriptive names" & LF