Why accuracy matters
Predicting behavior without understanding causality is guesswork. The leaderboard shows how closely our agent simulations match observed human outcomes so operators can trust the models behind their decisions.
Benchmark methodology
Each leaderboard entry represents a replicated human study, a production experiment, or a partner pilot. We ingest anonymized outcomes, recreate the scenario in simulation, and measure how well agent choices align with observed behavior.
Benchmarks cover pricing, messaging, product adoption, policy response, and support experience. New domains are added every month as partners share additional data.
What the leaderboard tracks
01 :: Human match rate
Percentage of scenarios in which simulated agent choices match observed human outcomes within a defined confidence interval (see the sketch after this list).
02 :: Rank-order correlation
Spearman correlation between simulated and observed preference rankings across alternatives.
03 :: Causal uplift
Incremental lift achieved when teams deploy simulation-backed decisions instead of relying on legacy heuristics or A/B testing alone.
04 :: Time-to-insight
Median time from hypothesis to actionable recommendation across all leaderboard entries.
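For concreteness, here is a minimal sketch of how the first two metrics could be computed from paired simulated and observed outcomes. The function names, example numbers, and tolerance value are illustrative only; this is not the production scoring pipeline.

```python
# Illustrative sketch of human match rate and rank-order correlation,
# assuming paired simulated and observed outcomes per scenario.
import numpy as np
from scipy.stats import spearmanr

def human_match_rate(simulated, observed, ci_half_width):
    """Share of scenarios where the simulated outcome lands within a
    symmetric tolerance band around the observed outcome."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    hits = np.abs(simulated - observed) <= ci_half_width
    return float(hits.mean())

def rank_order_correlation(simulated_scores, observed_scores):
    """Spearman correlation between simulated and observed preference
    rankings across the same set of alternatives."""
    rho, _ = spearmanr(simulated_scores, observed_scores)
    return float(rho)

# Example with made-up numbers: predicted vs. observed conversion rates
# for five alternatives in one scenario.
sim = [0.12, 0.30, 0.22, 0.08, 0.18]
obs = [0.10, 0.24, 0.27, 0.07, 0.16]
print(human_match_rate(sim, obs, ci_half_width=0.03))  # 0.6
print(rank_order_correlation(sim, obs))                # 0.9
```

Spearman correlation is a natural fit here because only the ordering of alternatives needs to agree between simulation and observation, not the absolute values.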
Benchmarked domains
01 :: Pricing & packaging
Subscription, usage-based, and hybrid pricing models benchmarked against revenue and retention outcomes.
02 :: Narrative & messaging
Campaigns, onboarding flows, and policy communications ranked by engagement, activation, and comprehension.
03 :: Product adoption
Feature prioritization, bundling, and roadmap decisions scored against post-launch adoption curves.
04 :: Support & CX
Service scripts, retention plays, and channel routing optimized for resolution time and CSAT.
05 :: Policy response
Public sector pilots and civic engagement programs benchmarked for participation and equity impact.
06 :: Risk & compliance
Fraud safeguards, lending decisions, and claims processes evaluated for accuracy and fairness.
How we score
01 :: Ground truth ingestion
We import anonymized human outcome data with consent, ensuring segment coverage and statistical validity.
02 :: Scenario reconstruction
Simulations mirror the real-world experiment design, constraints, and available levers before we score accuracy.
03 :: Cross-segment checks
Performance is sliced by demographic, psychographic, and firmographic cohorts to surface bias.
04 :: Confidence intervals
We publish interval bounds alongside mean accuracy so operators understand variance and edge cases (a worked example follows this list).
05 :: Continuous recalibration
Benchmarks update automatically as partners feed new decisions and outcomes into Subconscious.ai.
06 :: Open verification
Partners can reproduce any score using our leaderboard methodology guide and validation scripts.
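As an illustration of the interval bounds mentioned in item 04 above, the sketch below reports a Wilson score interval around a match-rate proportion. The Wilson interval is a standard choice for proportions and is used here only as an example; it is not necessarily the exact method behind the published bounds.

```python
# Example of reporting a match rate with 95% interval bounds.
from math import sqrt

def wilson_interval(matches, trials, z=1.96):
    """95% Wilson score interval for a proportion such as human match rate."""
    if trials == 0:
        return (0.0, 0.0)
    p = matches / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (max(0.0, center - margin), min(1.0, center + margin))

# Example: 87 matched scenarios out of 100 benchmarked.
low, high = wilson_interval(87, 100)
print(f"match rate 0.87, 95% CI [{low:.2f}, {high:.2f}]")  # [0.79, 0.92]
```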
Frequently asked questions
01 :: How is accuracy calculated?
We compare simulated decisions to anonymized human outcomes and score match rate, rank correlation, and causal uplift for each scenario.
02 :: Can we audit the methodology?
Yes. We provide the experiment checklist, scoring scripts, and sample data under NDA so you can replicate results.
03 :: How often is the leaderboard updated?
New benchmarks publish monthly, with rolling updates when partners contribute large datasets.
04 :: Do you share partner names?
We anonymize sensitive engagements by default, but some partners opt in to named case studies showcased elsewhere on the site.