feat: #3583203 Add multi-model comparison support to compare.py
Add a --models flag that runs evals across several models in a single invocation (e.g., --models sonnet haiku). The output is one comparison table per model, followed by a cross-model summary showing which models benefit from guidance. Backwards-compatible: the singular --model flag works as before.
Also adds duplicate-model detection, makes --models mutually exclusive with --model, and branches the JSON output: single-model runs keep the old format, while multi-model runs use a per_model + summary structure.
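For reviewers, the argument handling and JSON branching described above could look roughly like the sketch below. It is not the code in compare.py: the function names parse_args and build_output, the required=True group, the "baseline"/"guided" metric keys, and the "benefits_from_guidance" summary key are all assumptions made for illustration. Only the --model and --models flags and the per_model/summary keys come from this description.

```python
import argparse


def parse_args(argv=None):
    """Parse --model / --models (hypothetical helper, not the real compare.py)."""
    parser = argparse.ArgumentParser(prog="compare.py")
    # argparse rejects invocations that pass both flags of the same group.
    # required=True is an assumption of this sketch.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--model", help="single model (original behaviour)")
    group.add_argument("--models", nargs="+", help="models to compare in one run")
    args = parser.parse_args(argv)

    if args.models:
        # Report every model name that appears more than once.
        dupes = sorted({m for m in args.models if args.models.count(m) > 1})
        if dupes:
            parser.error("duplicate models: " + ", ".join(dupes))
    return args


def build_output(results, multi):
    """Shape the JSON payload.

    results maps model name -> metrics dict. The "baseline" and "guided"
    keys are assumed here; the real metrics may differ.
    """
    if not multi:
        # Single-model run: emit that model's metrics unchanged (old format).
        return next(iter(results.values()))

    # Multi-model run: per-model results plus a cross-model summary.
    benefits = [m for m, r in results.items() if r["guided"] > r["baseline"]]
    return {
        "per_model": results,
        "summary": {"benefits_from_guidance": benefits},
    }
```

With this shape, consumers of the single-model JSON are unaffected, and only callers that opt into --models need to handle the per_model + summary structure.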
By: zorz
Closes #3583203