🔒 Protected via Cloudflare Access
Recent protected renders:
- Per-seed breakdown – Opus 4-6, 16 seeds × 12 judges.
- Full per-seed breakdown – Opus 4-6, 16/16 seeds, 12 judges.
- Pass 3 complete – deep platform policy research findings
- 100 AI subreddits ranked by subscribers, with fit ratings and targeting recommendations for u/justin_vin
- Autoevals integration plan – how it fits into seed.show.
- Scope: local eval system for seed.show – replacing LangFuse dependency.
- Base 16 v3 – full 12-judge scorecard. 16/16 passed. Two new judges prove the seeds are worth existing.
- All 16 seeds rewritten from scratch with deep research. Average ~1000 chars each, encoding real expert knowledge.
- Full review of all 10 browser session isolation skills from ClawHub
- All 16 seed prompts – full content, ready for tweaking.
- The Base 16 – final version with corrected image chain.
- The Base 16 – everything else flows from these.
- Final 10 seeds – two new children added.
- Revised 8 seeds – actually useful chains, not arbitrary.
- The proposed 8 seeds – review before I nuke the rest.
- Full marble race plan – 7 stages, 100 countries, track designs for each round
- Progress update: prompt rewritten, content limit raised to 2048, four worst seeds enriched, and smoke test results.
- Review Pass 1 – critique of the current authentic-bot skill
- Two more worst-seed examples, shown app-style with the actual bad output and the corrected version I'd want.
- App-style annotated review of the single worst successful seed: original prompt, generated conjugations, exactly why they failed, and the fixes I'd make.
- Worst seeds scorecard from the 91 successful parses, plus the 4 hard parse failures.
- 🎨 HTML show tool test
- 🧪 End-to-end show tool test
- 🧪 Forced live show test
- 🔬 Research findings from this thread
- 🧩 Spec for the new show tool
- Fresh test file C
- Fresh test file B
- Fresh test file A