sjs/gh-1SecondEveryday-image-analysis-eval

mirror of https://github.com/1SecondEveryday/image-analysis-eval.git synced 2026-03-25 09:05:49 +00:00

Author	SHA1	Message	Date
Sami Samhuri	aa2b7abc2f	Add an untested script to run batches in parallel on runpod	2025-07-14 19:53:52 -07:00
Sami Samhuri	0848b43304	Enhance README with comprehensive testing history and insights Documents the complete 7-round evaluation process, from initial 6-model testing through Gemma3:12b's breakthrough selfie detection. Adds historical context for removed experimental prompts (07-11), model evolution insights, and performance characteristics discovered through extensive testing. Key additions: - Complete testing history (Take 1-7 plus mini-tests) - Model ranking evolution and breakthrough discoveries - Experimental prompt history and removal rationale - Technical insights from 768px optimization and repetition patterns - Results archive documentation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-07-09 13:35:46 -07:00
Sami Samhuri	357018ee7b	Add comprehensive README for image analysis evaluation framework Documents the VLM evaluation system for extracting searchable tags from video diary snippets. Includes setup instructions, script documentation, prompt strategies, and performance insights from extensive testing. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-07-09 13:26:34 -07:00
Sami Samhuri	c683981279	Improve prompt when no people are present	2025-07-09 11:49:04 -07:00
Sami Samhuri	7a5d40eb4b	Add take7 results with no user prompt, it's bad	2025-06-27 22:57:03 -04:00
Sami Samhuri	5cff18bb9a	Replace --single-prompt with --skip-prompt, but it sucks	2025-06-27 22:56:37 -04:00
Sami Samhuri	6c6d0e86a3	Add more results, keyword selector, and 768px photos	2025-06-26 12:46:13 -04:00
Sami Samhuri	2481b9299d	Add take 5 results	2025-06-26 09:42:31 -04:00
Sami Samhuri	f5004815ef	Tweak extract_tags.rb script	2025-06-26 09:42:11 -04:00
Sami Samhuri	296bf87522	Tweak prompts to put the right emphasis on people (it's not counting them)	2025-06-25 09:24:18 -04:00
Sami Samhuri	f115fdf0a1	Add results take 4 - bigger models	2025-06-25 08:38:37 -04:00
Sami Samhuri	3a39303629	Add results from runs 2 and 3	2025-06-25 01:08:16 -04:00
Sami Samhuri	e73c212b87	Restore complex prompts and add more models	2025-06-25 01:08:01 -04:00
Sami Samhuri	db295c545c	Remove qwen2.5vl:3b for now	2025-06-25 00:30:31 -04:00
Sami Samhuri	e706d68e72	Remove under-performing prompts	2025-06-25 00:29:55 -04:00
Sami Samhuri	dd16b71f54	Try to reduce mentions of people when there are none	2025-06-25 00:22:57 -04:00
Sami Samhuri	12769edaf4	Improve system prompt, drop temp back down to 0.1	2025-06-25 00:20:37 -04:00
Sami Samhuri	6807db8ad9	Add CLAUDE.md	2025-06-24 23:51:31 -04:00
Sami Samhuri	81f4ac2396	Add results-take1 summary	2025-06-24 23:51:21 -04:00
Sami Samhuri	437a4a3284	Simplify, focus on llava:7b and qwen2.5vl:3b and 768px and 1024px images	2025-06-24 23:05:19 -04:00
Sami Samhuri	9c32f2d04c	Add first batch of results	2025-06-24 22:47:30 -04:00
Sami Samhuri	554488d1c4	Add scripts to work in parallel and aggregate results separately	2025-06-24 21:55:21 -04:00
Sami Samhuri	c9fbfc1b67	Tweak concurrency/parallelism per model	2025-06-24 11:14:53 -04:00
Sami Samhuri	7b6a1e5479	Pull models properly when benchmarking	2025-06-24 10:20:18 -04:00
Sami Samhuri	86a382a700	Make benchmarking a lot faster and more efficient, stop deleting models	2025-06-24 10:09:36 -04:00
Sami Samhuri	0d0e7a7cb0	Add benchmark script	2025-06-24 10:01:18 -04:00
Sami Samhuri	f2750ed0e2	Pull models as needed	2025-06-24 09:46:44 -04:00
Sami Samhuri	02a430c60e	Add more default models	2025-06-24 09:43:22 -04:00
Sami Samhuri	7e4036ff20	Rename original photos	2025-06-24 09:30:03 -04:00
Sami Samhuri	c283e6fb4f	First commit	2025-06-24 09:22:33 -04:00

30 commits