Llama-3 vs Mixtral Q&A

Completed

2024-01-15 - 2024-02-28

Total Users: 7834
Total Interactions: 62672
Variants: 2
Primary Metric: converted
A/B Test Results
Metric     Control  Treatment  Lift    CI (95%)        P-value  Significant  Effect Size
converted  0.68     0.65       -4.41%  [-0.05, -0.01]  0.0089   Yes          -0.06
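The Lift and Effect Size columns can be reproduced from the two conversion rates alone. The sketch below is a minimal illustration, not the dashboard's actual computation: it assumes Lift is the relative change against control and that Effect Size is Cohen's h (the report does not name the statistic). The function names are hypothetical.

```python
import math


def relative_lift(control: float, treatment: float) -> float:
    """Relative change of the treatment rate against the control rate."""
    return (treatment - control) / control


def cohens_h(control: float, treatment: float) -> float:
    """Cohen's h for two proportions (assumed to be the report's effect size)."""
    return 2 * math.asin(math.sqrt(treatment)) - 2 * math.asin(math.sqrt(control))


# Rates taken from the results table above.
control_rate = 0.68
treatment_rate = 0.65

lift = relative_lift(control_rate, treatment_rate)
h = cohens_h(control_rate, treatment_rate)

print(f"lift = {lift:.2%}")      # lift = -4.41%
print(f"Cohen's h = {h:.2f}")    # Cohen's h = -0.06
```

Under these assumptions both values match the table. By the usual convention an |h| below 0.2 is a small effect, so the difference is statistically significant but modest in size.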
[Chart: Conversion Rate Over Time]
[Chart: Metric Comparison]
Data Quality Checks
  • Temporal Leakage: Passed
  • Duplicate Records: Passed
  • Cross-Group Contamination: Passed
  • Sample Ratio Mismatch: Passed
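As an illustration of what a Sample Ratio Mismatch check typically involves, the sketch below runs a chi-square goodness-of-fit test on the per-variant user counts. The report gives neither the per-variant counts nor the intended split, so the 50/50 allocation, the 0.001 cutoff, the function name, and the example counts are all assumptions chosen for illustration.

```python
import math


def srm_p_value(n_control: int, n_treatment: int,
                expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square goodness-of-fit test (1 degree of freedom)
    comparing observed group sizes with the intended allocation."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, not taken from this experiment.
balanced = srm_p_value(5000, 5000)
skewed = srm_p_value(5000, 4600)

SRM_THRESHOLD = 0.001  # assumed cutoff; a strict value is common for SRM checks
print("balanced passes:", balanced >= SRM_THRESHOLD)
print("skewed passes:", skewed >= SRM_THRESHOLD)
```

A check like this passes when the p-value stays above the cutoff, meaning the observed split is consistent with the intended allocation.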
Recommendations
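  • Control (Llama-3-70B) outperforms treatment (Mixtral-8x7B) on the primary metric for Q&A tasks: conversion of 0.68 vs 0.65, a statistically significant difference (p = 0.0089)
  • Keep Llama-3-70B as the default for the Q&A use case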