LLM A/B Dashboard
GPT-3.5 vs GPT-4 Cost-Quality Tradeoff
Running
2024-02-01
Total Users: 3245
Total Interactions: 28405
Variants: 2
Primary Metric: response_quality_score
[Chart: Conversion Rate Over Time]
[Chart: Metric Comparison]
Data Quality Checks
Temporal Leakage: Passed
Duplicate Records: Passed
Cross-Group Contamination: Passed
Sample Ratio Mismatch: Passed
Recommendations
No recommendations available