Llama-3-70B vs Mixtral-8x7B Q&A

Completed

2025-12-16 - 2026-01-27

Total Users: --
Total Interactions: --
Variants: 2
Primary Metric: converted
A/B Test Results
Metric                  Control  Treatment  Lift     CI (95%)            P-value  Significant  Effect Size
converted               0.8952   0.8636     -3.54%   [-0.0517, -0.0117]  0.0      Yes          -0.0975
response_quality_score  8.56     8.38       -2.07%   [-0.48, 0.12]       0.15     No           -0.0885
latency_ms              905.0    580.0      -35.94%  [-375.0, -275.0]    0.01     Yes          -0.6506
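The Lift and Significant columns can be sanity-checked from the other columns. The sketch below assumes that lift is the relative change (treatment - control) / control and that a metric is flagged significant exactly when its 95% CI excludes zero; the report does not state either rule, so both are assumptions. The names RESULTS, relative_lift_pct, ci_excludes_zero, and check_results are hypothetical. Because the inputs are the rounded values shown in the table, recomputed lifts can differ from the reported ones in the second decimal place.

```python
# Rows transcribed from the results table:
# metric -> (control, treatment, reported lift %, 95% CI, reported "Significant" flag)
RESULTS = {
    "converted": (0.8952, 0.8636, -3.54, (-0.0517, -0.0117), "Yes"),
    "response_quality_score": (8.56, 8.38, -2.07, (-0.48, 0.12), "No"),
    "latency_ms": (905.0, 580.0, -35.94, (-375.0, -275.0), "Yes"),
}


def relative_lift_pct(control, treatment):
    """Relative change of treatment vs control, in percent (assumed definition)."""
    return (treatment - control) / control * 100.0


def ci_excludes_zero(ci):
    """True when both CI bounds lie strictly on the same side of zero."""
    low, high = ci
    return low > 0 or high < 0


def check_results(results, tol=0.05):
    """Compare recomputed lift and CI-based significance with the reported values.

    tol is in percentage points and absorbs rounding in the displayed inputs.
    """
    report = {}
    for metric, (control, treatment, lift, ci, flag) in results.items():
        recomputed = relative_lift_pct(control, treatment)
        report[metric] = {
            "recomputed_lift_pct": round(recomputed, 2),
            "lift_matches": abs(recomputed - lift) <= tol,
            "flag_matches_ci": ci_excludes_zero(ci) == (flag == "Yes"),
        }
    return report


if __name__ == "__main__":
    for metric, row in check_results(RESULTS).items():
        print(metric, row)
```

Under these assumptions, all three rows are internally consistent: each reported lift is within 0.05 percentage points of the recomputed value, and each Significant flag agrees with whether the CI excludes zero.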
[Chart: Conversion Rate Over Time]
[Chart: Metric Comparison]
Data Quality Checks
  • Failed
  • Failed
  • Failed
Recommendations
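  • Control (Llama-3-70B) outperforms treatment (Mixtral-8x7B) on the primary metric: converted is 0.8952 vs 0.8636 (lift -3.54%, significant at the 95% level). Treatment lowers latency_ms from 905.0 to 580.0, and the difference in response_quality_score is not significant. Do not roll out.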
Subgroup Analysis
User_Tier
Segment     Control Rate  Treatment Rate  Lift    Sample Size  Significant
free        0.8823        0.8497          -3.7%   50598        No
enterprise  0.9392        0.9172          -2.35%  3678         No
pro         0.9098        0.879           -3.39%  33248        No
Use_Case
Segment             Control Rate  Treatment Rate  Lift    Sample Size  Significant
creative_writing    0.8831        0.8517          -3.55%  21873        No
qa                  0.9269        0.9016          -2.73%  21976        No
code_generation     0.8861        0.8481          -4.29%  21832        No
text_summarization  0.8849        0.8524          -3.68%  21843        No
Region_Code
Segment  Control Rate  Treatment Rate  Lift    Sample Size  Significant
LATAM    0.8918        0.8689          -2.56%  21019        No
EU       0.898         0.8546          -4.84%  22853        No
NA       0.9036        0.8641          -4.37%  22594        No
APAC     0.8874        0.8669          -2.31%  21058        No
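A quick consistency check on the subgroup tables is to confirm that each dimension partitions the same population and that each Lift value follows from its two rates. The sketch below assumes that Sample Size counts both variants combined and that lift is (treatment - control) / control; the report states neither, so treat both as assumptions. SUBGROUPS, dimension_totals, and lift_mismatches are hypothetical names.

```python
# Rows transcribed from the three subgroup tables:
# dimension -> segment -> (control rate, treatment rate, reported lift %, sample size)
SUBGROUPS = {
    "User_Tier": {
        "free": (0.8823, 0.8497, -3.7, 50598),
        "enterprise": (0.9392, 0.9172, -2.35, 3678),
        "pro": (0.9098, 0.879, -3.39, 33248),
    },
    "Use_Case": {
        "creative_writing": (0.8831, 0.8517, -3.55, 21873),
        "qa": (0.9269, 0.9016, -2.73, 21976),
        "code_generation": (0.8861, 0.8481, -4.29, 21832),
        "text_summarization": (0.8849, 0.8524, -3.68, 21843),
    },
    "Region_Code": {
        "LATAM": (0.8918, 0.8689, -2.56, 21019),
        "EU": (0.898, 0.8546, -4.84, 22853),
        "NA": (0.9036, 0.8641, -4.37, 22594),
        "APAC": (0.8874, 0.8669, -2.31, 21058),
    },
}


def dimension_totals(subgroups):
    """Sum the sample sizes of all segments within each dimension."""
    return {
        dim: sum(row[3] for row in segments.values())
        for dim, segments in subgroups.items()
    }


def lift_mismatches(subgroups, tol=0.05):
    """List segments whose reported lift differs from the recomputed relative lift.

    tol is in percentage points and absorbs rounding in the displayed rates.
    """
    mismatches = []
    for dim, segments in subgroups.items():
        for segment, (control, treatment, lift, _n) in segments.items():
            recomputed = (treatment - control) / control * 100.0
            if abs(recomputed - lift) > tol:
                mismatches.append((dim, segment, round(recomputed, 2)))
    return mismatches


if __name__ == "__main__":
    print(dimension_totals(SUBGROUPS))
    print(lift_mismatches(SUBGROUPS))
```

With the values as transcribed, the three dimensions each sum to 87524, and no segment's reported lift differs from the recomputed one by more than 0.05 percentage points.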