30 commits

SHA1 Message Date
aa2b7abc2f
Add an untested script to run batches in parallel on runpod 2025-07-14 19:53:52 -07:00
0848b43304
Enhance README with comprehensive testing history and insights
Documents the complete 7-round evaluation process, from initial 6-model testing through Gemma3:12b's breakthrough selfie detection. Adds historical context for removed experimental prompts (07-11), model evolution insights, and performance characteristics discovered through extensive testing.

Key additions:
- Complete testing history (Take 1-7 plus mini-tests)
- Model ranking evolution and breakthrough discoveries
- Experimental prompt history and removal rationale
- Technical insights from 768px optimization and repetition patterns
- Results archive documentation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-09 13:35:46 -07:00
357018ee7b
Add comprehensive README for image analysis evaluation framework
Documents the VLM evaluation system for extracting searchable tags from video diary snippets. Includes setup instructions, script documentation, prompt strategies, and performance insights from extensive testing.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-09 13:26:34 -07:00
c683981279
Improve prompt when no people are present 2025-07-09 11:49:04 -07:00
7a5d40eb4b
Add take7 results with no user prompt, it's bad 2025-06-27 22:57:03 -04:00
5cff18bb9a
Replace --single-prompt with --skip-prompt, but it sucks 2025-06-27 22:56:37 -04:00
6c6d0e86a3
Add more results, keyword selector, and 768px photos 2025-06-26 12:46:13 -04:00
2481b9299d
Add take 5 results 2025-06-26 09:42:31 -04:00
f5004815ef
Tweak extract_tags.rb script 2025-06-26 09:42:11 -04:00
296bf87522
Tweak prompts to put the right emphasis on people (it's not counting them) 2025-06-25 09:24:18 -04:00
f115fdf0a1
Add results take 4 - bigger models 2025-06-25 08:38:37 -04:00
3a39303629
Add results from runs 2 and 3 2025-06-25 01:08:16 -04:00
e73c212b87
Restore complex prompts and add more models 2025-06-25 01:08:01 -04:00
db295c545c
Remove qwen2.5vl:3b for now 2025-06-25 00:30:31 -04:00
e706d68e72
Remove under-performing prompts 2025-06-25 00:29:55 -04:00
dd16b71f54
Try to reduce mentions of people when there are none 2025-06-25 00:22:57 -04:00
12769edaf4
Improve system prompt, drop temp back down to 0.1 2025-06-25 00:20:37 -04:00
6807db8ad9
Add CLAUDE.md 2025-06-24 23:51:31 -04:00
81f4ac2396
Add results-take1 summary 2025-06-24 23:51:21 -04:00
437a4a3284
Simplify, focus on llava:7b and qwen2.5vl:3b and 768px and 1024px images 2025-06-24 23:05:19 -04:00
9c32f2d04c
Add first batch of results 2025-06-24 22:47:30 -04:00
554488d1c4
Add scripts to work in parallel and aggregate results separately 2025-06-24 21:55:21 -04:00
c9fbfc1b67
Tweak concurrency/parallelism per model 2025-06-24 11:14:53 -04:00
7b6a1e5479
Pull models properly when benchmarking 2025-06-24 10:20:18 -04:00
86a382a700
Make benchmarking a lot faster and more efficient, stop deleting models 2025-06-24 10:09:36 -04:00
0d0e7a7cb0
Add benchmark script 2025-06-24 10:01:18 -04:00
f2750ed0e2
Pull models as needed 2025-06-24 09:46:44 -04:00
02a430c60e
Add more default models 2025-06-24 09:43:22 -04:00
7e4036ff20
Rename original photos 2025-06-24 09:30:03 -04:00
c283e6fb4f
First commit 2025-06-24 09:22:33 -04:00