| Commit | Message | Date |
|---|---|---|
| 296bf87522 | Tweak prompts to put the right emphasis on people (it's not counting them) | 2025-06-25 09:24:18 -04:00 |
| f115fdf0a1 | Add results take 4 - bigger models | 2025-06-25 08:38:37 -04:00 |
| 3a39303629 | Add results from runs 2 and 3 | 2025-06-25 01:08:16 -04:00 |
| e73c212b87 | Restore complex prompts and add more models | 2025-06-25 01:08:01 -04:00 |
| db295c545c | Remove qwen2.5vl:3b for now | 2025-06-25 00:30:31 -04:00 |
| e706d68e72 | Remove under-performing prompts | 2025-06-25 00:29:55 -04:00 |
| dd16b71f54 | Try to reduce mentions of people when there are none | 2025-06-25 00:22:57 -04:00 |
| 12769edaf4 | Improve system prompt, drop temp back down to 0.1 | 2025-06-25 00:20:37 -04:00 |
| 6807db8ad9 | Add CLAUDE.md | 2025-06-24 23:51:31 -04:00 |
| 81f4ac2396 | Add results-take1 summary | 2025-06-24 23:51:21 -04:00 |
| 437a4a3284 | Simplify, focus on llava:7b and qwen2.5vl:3b and 768px and 1024px images | 2025-06-24 23:05:19 -04:00 |
| 9c32f2d04c | Add first batch of results | 2025-06-24 22:47:30 -04:00 |
| 554488d1c4 | Add scripts to work in parallel and aggregate results separately | 2025-06-24 21:55:21 -04:00 |
| c9fbfc1b67 | Tweak concurrency/parallelism per model | 2025-06-24 11:14:53 -04:00 |
| 7b6a1e5479 | Pull models properly when benchmarking | 2025-06-24 10:20:18 -04:00 |
| 86a382a700 | Make benchmarking a lot faster and more efficient, stop deleting models | 2025-06-24 10:09:36 -04:00 |
| 0d0e7a7cb0 | Add benchmark script | 2025-06-24 10:01:18 -04:00 |
| f2750ed0e2 | Pull models as needed | 2025-06-24 09:46:44 -04:00 |
| 02a430c60e | Add more default models | 2025-06-24 09:43:22 -04:00 |
| 7e4036ff20 | Rename original photos | 2025-06-24 09:30:03 -04:00 |
| c283e6fb4f | First commit | 2025-06-24 09:22:33 -04:00 |