Competitive

Prompt Simulator

Cross-model citation probe. Single-probe measurement is statistically meaningless.
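Why a single probe says so little can be illustrated with a confidence interval. The following is a minimal sketch, not the tool's own method: it assumes each probe yields a binary "cited / not cited" outcome and uses the Wilson score interval, which is a choice made here purely for illustration. The function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# One probe that came back "cited": the interval spans most of [0, 1].
single_low, single_high = wilson_interval(1, 1)
# Thirty probes that all came back "cited": the interval is much tighter.
many_low, many_high = wilson_interval(30, 30)
print(f"n=1:  [{single_low:.3f}, {single_high:.3f}]")
print(f"n=30: [{many_low:.3f}, {many_high:.3f}]")
```

With one observation the interval is too wide to distinguish a model that cites rarely from one that cites almost always, which is the motivation for repeated runs per prompt.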

SAMPLED

ai-product-bench baseline: 47.3% agreement across 132 questions × 3 runs × 2 models = 792 responses

Your simulations appear above or below this baseline in the delta column.

Topic

Models

Prompts per topic: 50 (range: 10 to 100)

Runs per prompt: 3

~20 min · 600 LLM calls
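The page does not state how the call count is derived. A minimal sketch, assuming one LLM call per (topic, prompt, run, model) combination; the function name and its parameters are hypothetical. Under that assumption the baseline's 132 × 3 × 2 gives 792, and the settings shown (50 prompts, 3 runs) give 150 calls per model per topic, so 600 calls would correspond to a models × topics product of 4.

```python
def estimate_llm_calls(prompts_per_topic: int, runs_per_prompt: int,
                       n_models: int, n_topics: int = 1) -> int:
    """Hypothetical estimate: one call per (topic, prompt, run, model)."""
    return prompts_per_topic * runs_per_prompt * n_models * n_topics


# Baseline figures from the page: 132 questions x 3 runs x 2 models.
baseline_calls = estimate_llm_calls(132, 3, 2)

# Settings shown on the page: 50 prompts, 3 runs.
# n_models=4 is an assumption chosen to match the displayed 600 calls.
default_calls = estimate_llm_calls(50, 3, 4)

print(baseline_calls, default_calls)  # 792 600
```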

Simulation history