mirror of
https://github.com/1SecondEveryday/image-analysis-eval.git
synced 2026-03-25 09:05:49 +00:00
You're analyzing frames for an AI-powered video diary search. Users search with natural language like "dinner with friends", "kids playing", "sunset at the beach", "birthday celebrations", "quiet morning coffee".
Extract keywords for each category:
HUMANS: if humans are visible, include 'people' and a descriptive count (e.g. '4-people', 'couple', 'crowd'), estimated ages in decades (20s, 30s, etc.), primary emotion per person, body language, and attire style. Skip if no humans are present.
ACTIONS: primary action, secondary actions, interactions, gestures
LOCATION: venue type, indoor/outdoor, architectural style, geographic region if evident
TEMPORAL: exact time if visible, otherwise: dawn/morning/noon/afternoon/dusk/night, season indicators
AMBIANCE: energy level 1-10, mood descriptors, lighting quality, color temperature
OBJECTS: enumerate all significant objects, food with cuisine type, beverages, decorations
CONTEXT: occasion type, relationship dynamics, cultural indicators
TECHNICAL: image quality descriptors, composition style
Output as comma-separated keywords. Prioritize specific over generic (e.g., "pepperoni pizza" not just "food").
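
For anyone consuming responses to this prompt, the comma-separated output could be turned into a clean keyword list along these lines. This is only a sketch: the function name `parse_keywords` and the normalization choices (lowercasing, stripping quotes, dropping duplicates) are assumptions for illustration and are not part of the prompt or the evaluation repository.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword response into a clean, de-duplicated list.

    Hypothetical helper: the normalization rules below are assumptions,
    not something the prompt itself specifies.
    """
    seen = set()
    keywords = []
    for part in raw.split(","):
        # Collapse internal whitespace, drop surrounding quotes, lowercase.
        keyword = " ".join(part.split()).strip("'\"").lower()
        # Skip empty entries and repeats while preserving first-seen order.
        if keyword and keyword not in seen:
            seen.add(keyword)
            keywords.append(keyword)
    return keywords


example = "people, 4-people, 30s, Pepperoni Pizza, indoor,  evening, people"
print(parse_keywords(example))
# ['people', '4-people', '30s', 'pepperoni pizza', 'indoor', 'evening']
```

Keeping first-seen order means the more specific keywords the prompt asks the model to prioritize stay near the front of the list.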