LLM Performance Leaderboard
A comparison of LLMs across research domains, scored on four statistical metrics that measure how closely each model reproduces results from published human studies.
About the Metrics
Manhattan Similarity (%)
We recreate the conjoint survey from a published study and collect responses from an AI model. Next, we estimate a discrete choice model similar to the one reported in the study. We then compute choice probabilities for each option twice, once from the estimated parameters and once from the reported parameters, and compare the two sets of probabilities using Manhattan distance. This metric quantifies how closely the AI replicates human decision patterns on a probability scale.
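The comparison step can be sketched as follows. This is a minimal illustration, not the leaderboard's actual implementation: it assumes a multinomial logit model, and it assumes the Manhattan distance is halved (its maximum between two probability vectors is 2) and averaged across choice sets to give a percentage. The function names and the normalization are hypothetical.

```python
import math

def choice_probabilities(alternatives, beta):
    """Multinomial logit probabilities for one choice set (assumed model form).

    alternatives: list of attribute vectors, one per option.
    beta: list of part-worth parameters, one per attribute.
    """
    utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def manhattan_similarity(choice_sets, beta_llm, beta_reported):
    """Average similarity across choice sets, as a percentage.

    Assumption: the L1 distance is divided by 2 so that it lies in [0, 1].
    """
    similarities = []
    for alternatives in choice_sets:
        p_llm = choice_probabilities(alternatives, beta_llm)
        p_rep = choice_probabilities(alternatives, beta_reported)
        l1 = sum(abs(a - b) for a, b in zip(p_llm, p_rep))
        similarities.append(1.0 - l1 / 2.0)
    return 100.0 * sum(similarities) / len(similarities)
```

With identical parameter vectors the two probability sets coincide and the score is 100; the further the predicted choice shares diverge, the lower the score.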
Coverage Probability (%)
We recreate the conjoint survey from a published study and collect responses from an AI model. Then, we estimate a discrete choice model similar to the one in the study. To assess similarity, we check whether the study's reported parameters fall within confidence intervals constructed around the estimated parameters. This metric evaluates Human-LLM Equivalence while accounting for sampling differences.
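A rough sketch of the coverage check, under stated assumptions: the interval is a 95% Wald interval (estimate plus or minus 1.96 standard errors), and the score is the share of reported parameters that land inside their interval. The confidence level and the function name are assumptions for illustration.

```python
def coverage_probability(estimated, std_errors, reported, z=1.96):
    """Percentage of reported parameters inside the interval around each estimate.

    Assumption: a Wald interval, estimate +/- z * standard error, with z = 1.96
    for a nominal 95% level.
    """
    covered = 0
    for est, se, rep in zip(estimated, std_errors, reported):
        lower, upper = est - z * se, est + z * se
        if lower <= rep <= upper:
            covered += 1
    return 100.0 * covered / len(reported)
```

For example, if two of three reported parameters fall inside the intervals built from the LLM-based estimates, the score is about 66.7.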
Directional Effect Matching (%)
We recreate the conjoint survey from a published study and collect responses from an AI model. After estimating a discrete choice model similar to the study's, we compare the signs (positive or negative) of the estimated parameters with those reported in the study. This metric measures Human-LLM Equivalence in terms of preference alignment.
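The sign comparison is simple enough to sketch directly. The function names are hypothetical, and how exact zeros are treated is an assumption (here a zero matches only another zero).

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def directional_match(estimated, reported):
    """Percentage of parameters whose estimated sign equals the reported sign."""
    matches = sum(1 for e, r in zip(estimated, reported) if sign(e) == sign(r))
    return 100.0 * matches / len(reported)
```

With estimates `[1.0, -0.5, 0.2, -0.1]` and reported values `[0.8, 0.3, 0.1, -0.4]`, three of the four signs agree, so the score is 75.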
Spearman Correlation (%)
We recreate the conjoint survey from a published study and collect responses from an AI model. After estimating a discrete choice model similar to the one in the study, we calculate the Spearman correlation between the estimated and reported parameters. This metric quantifies Human-LLM Equivalence based on preference ordering.
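As an illustration, Spearman correlation can be computed as the Pearson correlation of the ranked parameters. The sketch below uses average ranks for ties and scales the result by 100 to match the percentage shown in the heading; the scaling and the function names are assumptions.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_percent(estimated, reported):
    """Spearman correlation of two parameter vectors, scaled to [-100, 100].

    Undefined (division by zero) if either vector is constant.
    """
    rx, ry = average_ranks(estimated), average_ranks(reported)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 100.0 * cov / math.sqrt(vx * vy)
```

A score of 100 means the LLM-based estimates rank the attributes in exactly the same order as the published study, regardless of the magnitudes involved.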
Domain-Specific Leaderboards
Detailed performance breakdowns for each research domain.
Public Health
Consumer Research
Economics
Agricultural Sciences
Business Administration