mirror of
https://github.com/1SecondEveryday/image-analysis-eval.git
synced 2026-03-25 09:05:49 +00:00
Tweak extract_tags.rb script
parent 296bf87522
commit f5004815ef
2 changed files with 33 additions and 8 deletions
CLAUDE.md | 25
@@ -72,6 +72,25 @@ The evaluation framework tests different combinations of:
 ## Current Focus
 
 Based on git history, the project has narrowed from broad testing to:
-- **Models**: llava:7b (quality) and qwen2.5vl:3b (speed)
-- **Sizes**: 768px and 1024px (balance of quality and performance)
-- **Goal**: Optimal tag extraction for video diary search functionality
+- **Models**: llava:7b, qwen2.5vl:7b, and minicpm-v:8b
+- **Sizes**: 768px (optimal balance of quality and performance)
+- **Prompts**: Simplified to 01, 03, and 05 (complex prompts removed)
+- **Goal**: Optimal tag extraction for video diary search functionality
+
+## Evaluation Priorities
+
+When evaluating model performance, our priorities are (in order):
+1. **People detection** - Detecting human presence, emotions, expressions, moods, activities, and interactions
+2. **Overall mood/atmosphere** - Capturing the feeling and emotional tone of scenes
+3. **Objects** - Important items that provide context
+4. **Scene details** - Colors, lighting, setting/location, time of day
+5. **Camera perspective** - Identifying selfies and POV (first-person) shots
+
+### Key Insights from Testing
+- **Emotion focus**: We prioritize understanding how people feel over precisely counting them
+- **Background matters**: Details like "bicycles in distance" enable memory-based searches
+- **Simple prompts win**: Complex prompts cause repetition without adding value
+- **Model strengths vary**:
+  - Qwen2.5VL: Best for emotion keywords
+  - MiniCPM-V: Best for comprehensive scene understanding
+  - LLaVA:7b: Most reliable with minimal repetition
extract_tags.rb | 16
@@ -11,7 +11,8 @@ require 'time'
 class TagExtractor
   OLLAMA_URL = 'http://localhost:11434/api/generate'
-  DEFAULT_MODELS = ['llava:7b', 'qwen2.5vl:7b', 'minicpm-v:8b']
+  # DEFAULT_MODELS = ['llava:7b', 'qwen2.5vl:7b', 'minicpm-v:8b']
+  DEFAULT_MODELS = ['gemma3:4b', 'gemma3:12b', 'gemma3:27b']
   VALID_EXTENSIONS = %w[.jpg .jpeg .png .gif .bmp .tiff .tif].freeze
 
   def initialize(options = {})
@@ -61,15 +62,20 @@
       # Check if model exists and pull if needed
       unless model_exists?(model)
-        puts "  📦 Model not found locally. Pulling #{model}..."
+        puts "  📦 Model #{model} not found locally. Attempting to pull..."
+        puts "  ⏳ This may take a while for large models..."
 
         pull_success = system("ollama pull #{model}")
 
         unless pull_success
-          puts "  ❌ Failed to pull #{model}. Skipping..."
+          puts "  ❌ Failed to pull #{model}. Skipping this model."
+          puts "     Try running manually: ollama pull #{model}"
           next
         end
 
         puts "  ✓ Successfully pulled #{model}"
+      else
+        puts "  ✓ Model #{model} already available"
       end
 
       # Ensure model is loaded
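A note on the `unless pull_success` branch in this hunk: Ruby's `Kernel#system` returns `true` on a zero exit status, `false` on a non-zero one, and `nil` when the command cannot be run at all (e.g. the `ollama` binary is missing). Both `false` and `nil` are falsy, so the branch catches either failure mode. A minimal illustration, assuming a Unix-like system and unrelated to the script itself:

```ruby
# system returns true / false / nil depending on how the command ran.
ok   = system('true')             # command exits 0            -> true
bad  = system('false')            # command exits non-zero     -> false
gone = system('no-such-cmd-xyz')  # command cannot be executed -> nil

puts ok.inspect
puts bad.inspect
puts gone.inspect
```

This is why a single `unless pull_success` is enough; there is no need to check for `nil` separately.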
@@ -205,8 +211,8 @@
   def model_exists?(model)
     list_output = `ollama list 2>&1`
-    model_name = model.split(':').first
-    list_output.include?(model_name)
+    # The model name appears at the start of each line in the output
+    list_output.lines.any? { |line| line.strip.start_with?("#{model} ") || line.strip.start_with?("#{model}\t") }
   end
 
   def unload_model(model)
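The `model_exists?` fix matters because the old version matched only the name before the colon as a bare substring, so any installed variant (or even a partial name match anywhere in the output) satisfied the check. A sketch of the difference, using hypothetical `ollama list` output (real output comes from the CLI):

```ruby
# Hypothetical `ollama list` output: only llava:13b and gemma3:12b installed.
list_output = <<~OUT
  NAME            ID            SIZE    MODIFIED
  llava:13b       0123456789ab  8.0 GB  2 days ago
  gemma3:12b      ba9876543210  8.1 GB  5 hours ago
OUT

def old_check(list_output, model)
  # Old behavior: substring match on the name before the colon.
  list_output.include?(model.split(':').first)
end

def new_check(list_output, model)
  # New behavior: the full model tag must start a line.
  list_output.lines.any? do |line|
    line.strip.start_with?("#{model} ") || line.strip.start_with?("#{model}\t")
  end
end

puts old_check(list_output, 'llava:7b')   # true  (false positive via llava:13b)
puts new_check(list_output, 'llava:7b')   # false (correct: 7b is not installed)
puts new_check(list_output, 'gemma3:12b') # true
```

Anchoring the full `name:tag` to the start of a line avoids triggering a skip of the pull step when only a different tag of the same model is present.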