mirror of
https://github.com/1SecondEveryday/image-analysis-eval.git
synced 2026-03-25 09:05:49 +00:00
Restore complex prompts and add more models

parent db295c545c
commit e73c212b87

3 changed files with 27 additions and 1 deletion
@@ -11,7 +11,7 @@ require 'time'
 class TagExtractor
   OLLAMA_URL = 'http://localhost:11434/api/generate'
-  DEFAULT_MODELS = ['llava:7b']
+  DEFAULT_MODELS = ['llava:7b', 'qwen2.5vl:7b', 'bakllava:7b', 'minicpm-v:8b', 'llama3.2-vision:11b', 'llava:13b']
   VALID_EXTENSIONS = %w[.jpg .jpeg .png .gif .bmp .tiff .tif].freeze

   def initialize(options = {})
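The hunk above only widens DEFAULT_MODELS; each model is presumably queried through the same OLLAMA_URL endpoint. As a minimal sketch, here is the JSON payload shape that Ollama's documented /api/generate endpoint accepts for vision models (the helper name and default arguments are illustrative assumptions, not code from this repository):

```ruby
require 'json'
require 'base64'

# Illustrative helper (not from the repository): builds the request body
# that would be POSTed to OLLAMA_URL for one model and one image.
def build_generate_payload(image_bytes, model: 'llava:7b', prompt: 'List keywords for this image.')
  {
    model: model,
    prompt: prompt,
    images: [Base64.strict_encode64(image_bytes)], # Ollama expects base64-encoded images
    stream: false                                  # single JSON response instead of streamed chunks
  }
end
```

Swapping models in the eval then reduces to changing the `model` field, which is what makes a flat DEFAULT_MODELS list a convenient configuration point.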
prompts/08-memory-search-optimizer.txt (new file, 13 lines)
@@ -0,0 +1,13 @@
+Create detailed keywords for video diary search. Users might search for: "happy moments", "food experiences", "family time", "adventures", "quiet moments", "celebrations", "daily life", "travel memories".
+
+Keyword everything visible including:
+- People: if present include 'people' with count descriptor (e.g. '3-people'), approximate ages, emotions on faces, what they're doing, how they're interacting
+- Scene type: where this is happening, indoor/outdoor, public/private space
+- Time: morning light, afternoon, golden hour, evening, night time
+- Mood: the feeling of the moment (joyful, peaceful, exciting, intimate, festive, contemplative)
+- Activities: eating, playing, working, relaxing, traveling, celebrating, exploring
+- Details: specific foods visible, drinks, decorations, clothing styles, weather, season
+- Colors: main colors that define the scene
+- Special moments: laughter, hugs, cheers, surprises, achievements
+
+Format: comma-separated keywords only, be specific rather than generic.
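The prompt above instructs the model to emit comma-separated keywords only. A small, hypothetical post-processing step (the function name is an assumption, not from the repository) would normalize that raw output into a clean, deduplicated tag list:

```ruby
# Illustrative sketch: normalize a model's comma-separated keyword
# response into a deduplicated, lowercase tag list.
def parse_keywords(response_text)
  response_text
    .split(',')                    # split on the comma delimiter the prompt requests
    .map { |k| k.strip.downcase }  # trim whitespace, normalize case
    .reject(&:empty?)              # drop blanks from stray delimiters
    .uniq                          # deduplicate repeated keywords
end
```

Normalizing like this matters for search: "Golden Hour" and "golden hour " should index to the same tag.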
prompts/11-smart-scene-decoder.txt (new file, 13 lines)
@@ -0,0 +1,13 @@
+You're analyzing frames for an AI-powered video diary search. Users search with natural language like "dinner with friends", "kids playing", "sunset at the beach", "birthday celebrations", "quiet morning coffee".
+
+Extract and keyword:
+HUMANS: if humans visible include 'people' and descriptive count (e.g. '4-people', 'couple', 'crowd'), estimated ages in decades (20s/30s/etc), primary emotion per person, body language, attire style. Skip if no humans present
+ACTIONS: primary action, secondary actions, interactions, gestures
+LOCATION: venue type, indoor/outdoor, architectural style, geographic region if evident
+TEMPORAL: exact time if visible, otherwise: dawn/morning/noon/afternoon/dusk/night, season indicators
+AMBIANCE: energy level 1-10, mood descriptors, lighting quality, color temperature
+OBJECTS: enumerate all significant objects, food with cuisine type, beverages, decorations
+CONTEXT: occasion type, relationship dynamics, cultural indicators
+TECHNICAL: image quality descriptors, composition style
+
+Output as comma-separated keywords. Prioritize specific over generic (e.g., "pepperoni pizza" not just "food").
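Since the commit's point is to evaluate these prompts across several vision models, the eval presumably runs each prompt against every entry in DEFAULT_MODELS and compares the outputs. A hypothetical sketch of that loop (not code from the repository; the actual query is passed in as a block so the iteration stays independent of the HTTP layer):

```ruby
# Models restored by this commit, per the tag_extractor diff above.
DEFAULT_MODELS = ['llava:7b', 'qwen2.5vl:7b', 'bakllava:7b', 'minicpm-v:8b', 'llama3.2-vision:11b', 'llava:13b']

# Illustrative sketch: run one prompt against every model and collect
# the responses keyed by model name for side-by-side comparison.
def compare_models(image_path, prompt, models: DEFAULT_MODELS)
  models.each_with_object({}) do |model, results|
    # The block performs the actual model query (e.g. a POST to /api/generate).
    results[model] = yield(model, prompt, image_path)
  end
end
```

Keying results by model name makes it easy to diff how `llava:7b` and `llava:13b` keyword the same frame under the same prompt.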