LLM A/B Dashboard
Llama-3 vs Mixtral Q&A
Status: Completed (2024-01-15 to 2024-02-28)
Total Users: 7834
Total Interactions: 62672
Variants: 2
Primary Metric: converted
A/B Test Results
Metric     Control  Treatment  Lift    CI (95%)        P-value  Significant  Effect Size
converted  0.68     0.65       -4.41%  [-0.05, -0.01]  0.0089   Yes          -0.06
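The reported lift, confidence interval, and effect size can be roughly reproduced from the two conversion rates. The sketch below is an illustration, not the dashboard's actual computation: it assumes the CI is a Wald interval on the absolute difference (treatment minus control), that the effect size is Cohen's h, and that the 7834 users were split evenly between the two arms (the dashboard does not report per-arm counts). The function name `compare_proportions` is hypothetical.

```python
import math

def compare_proportions(p_control, p_treatment, n_control, n_treatment, z_crit=1.96):
    """Compare two conversion rates (treatment relative to control)."""
    diff = p_treatment - p_control      # absolute difference
    lift = diff / p_control             # relative lift

    # Wald 95% CI on the absolute difference (unpooled standard error).
    se = math.sqrt(p_control * (1 - p_control) / n_control
                   + p_treatment * (1 - p_treatment) / n_treatment)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Two-sided two-proportion z-test (pooled standard error).
    p_pool = (p_control * n_control + p_treatment * n_treatment) / (n_control + n_treatment)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = diff / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Cohen's h: difference of arcsine-transformed proportions.
    h = 2 * math.asin(math.sqrt(p_treatment)) - 2 * math.asin(math.sqrt(p_control))

    return {"lift": lift, "ci": ci, "z": z, "p_value": p_value, "cohens_h": h}

# ASSUMPTION: an even split of the 7834 users; per-arm counts are not reported.
result = compare_proportions(0.68, 0.65, n_control=3917, n_treatment=3917)
print(f"lift = {result['lift']:.2%}")
print(f"95% CI = [{result['ci'][0]:.2f}, {result['ci'][1]:.2f}]")
print(f"Cohen's h = {result['cohens_h']:.2f}")
print(f"p-value = {result['p_value']:.4f}")
```

Under these assumptions the lift, CI, and effect size round to the values in the table. The p-value will not necessarily match the reported 0.0089, because the displayed rates are rounded and the true per-arm counts and test used by the dashboard are unknown.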
[Chart: Conversion Rate Over Time]
[Chart: Metric Comparison]
Data Quality Checks
Temporal Leakage: Passed
Duplicate Records: Passed
Cross-Group Contamination: Passed
Sample Ratio Mismatch: Passed
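A Sample Ratio Mismatch check is commonly done with a chi-square goodness-of-fit test on the per-arm user counts. The sketch below shows one way it might work; it is not the dashboard's implementation. The per-arm counts (3920 and 3914) are hypothetical values that merely sum to the reported 7834 users, the intended 50/50 allocation is an assumption, and the 0.001 threshold is a common convention rather than something stated on this page. The function name `srm_check` is also hypothetical.

```python
import math

def srm_check(n_control, n_treatment, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) for sample ratio mismatch.

    Returns (chi2, p_value, passed). `passed` is True when the observed
    split is consistent with the intended allocation.
    """
    total = n_control + n_treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)

    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)

    # Survival function of chi-square with 1 dof, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value >= alpha

# HYPOTHETICAL per-arm counts; the dashboard reports only the total (7834).
chi2, p_value, passed = srm_check(3920, 3914)
print(f"chi2 = {chi2:.4f}, p = {p_value:.3f}, passed = {passed}")
```

A strict threshold such as 0.001 is often preferred here so that the check flags only clear allocation problems rather than ordinary sampling noise.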
Recommendations
Control (Llama-3-70B) outperforms treatment (Mixtral-8x7B) for Q&A tasks on the primary metric: conversion rate 0.68 vs 0.65 (lift -4.41%, p = 0.0089).
Recommend keeping Llama-3 as the default for the Q&A use case.