feat: add A/B comparison script for measuring skill effectiveness (!2)

Adds evals/compare.py, which runs the behavioral evals with and without a skill loaded and reports the pass-rate delta, token usage, and cost per question.
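For reviewers, here is a minimal sketch of how the reported metrics could be computed from two sets of eval results. It is not the code in evals/compare.py: the `RunResult` structure, the `summarize` and `compare` helpers, and the per-1k-token price parameters are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one eval question in one arm (hypothetical structure)."""
    passed: bool
    input_tokens: int
    output_tokens: int


def summarize(results, usd_per_1k_input, usd_per_1k_output):
    """Aggregate pass rate, tokens, and cost per question for one arm."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    in_tok = sum(r.input_tokens for r in results)
    out_tok = sum(r.output_tokens for r in results)
    # Prices are supplied by the caller; no real pricing is assumed here.
    cost = in_tok / 1000 * usd_per_1k_input + out_tok / 1000 * usd_per_1k_output
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "tokens_per_question": (in_tok + out_tok) / n,
        "cost_per_question": cost / n,
    }


def compare(baseline, with_skill, usd_per_1k_input, usd_per_1k_output):
    """Compare the arm without the skill against the arm with it."""
    base = summarize(baseline, usd_per_1k_input, usd_per_1k_output)
    skill = summarize(with_skill, usd_per_1k_input, usd_per_1k_output)
    return {
        "baseline": base,
        "with_skill": skill,
        # Positive delta means the skill improved the pass rate.
        "pass_rate_delta": skill["pass_rate"] - base["pass_rate"],
    }
```

Reporting cost and token usage next to the pass-rate delta makes it visible when a skill improves results only by spending noticeably more per question.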

Documents the before/after workflow in CONTRIBUTING.md so contributors can measure the impact of their changes.
