The table below compares performance metrics across all tested LLM models.
| Model | Provider | Quality Score | Conversion Rate | Avg Latency (ms) | P95 Latency (ms) | Error Rate | Cost / 1k Tokens | Interactions |
|---|---|---|---|---|---|---|---|---|
| GPT-4-Turbo | OpenAI | 7.82 | 0.72% | 1,245 | 2,890 | 0.012% | $0.01 | 125,000 |
| GPT-3.5-Turbo | OpenAI | 6.92 | 0.65% | 458 | 890 | 0.008% | $0.0005 | 98,000 |
| Claude-3-Opus | Anthropic | 8.15 | 0.75% | 1,420 | 3,200 | 0.015% | $0.015 | 87,000 |
| Claude-3-Sonnet | Anthropic | 7.54 | 0.70% | 820 | 1,650 | 0.011% | $0.003 | 72,000 |
| Llama-3-70B | Meta | 7.22 | 0.68% | 920 | 1,850 | 0.018% | $0.0009 | 65,000 |
| Mixtral-8x7B | Mistral AI | 7.05 | 0.65% | 680 | 1,200 | 0.014% | $0.0002 | 54,000 |
| Gemini-1.5-Pro | Google | 7.68 | 0.71% | 750 | 1,480 | 0.010% | $0.007 | 48,000 |
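As a quick way to weigh these trade-offs, the sketch below ranks the models by cost-normalised quality using the figures from the table. It is illustrative only: the `ModelMetrics` record and `value_score` helper are not part of any benchmark harness, and the cost column is assumed to be USD per 1k tokens.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    name: str
    quality: float         # mean quality score from the table above
    avg_latency_ms: float  # average response latency
    cost_per_1k: float     # assumed USD per 1k tokens

MODELS = [
    ModelMetrics("GPT-4-Turbo",     7.82, 1245, 0.01),
    ModelMetrics("GPT-3.5-Turbo",   6.92,  458, 0.0005),
    ModelMetrics("Claude-3-Opus",   8.15, 1420, 0.015),
    ModelMetrics("Claude-3-Sonnet", 7.54,  820, 0.003),
    ModelMetrics("Llama-3-70B",     7.22,  920, 0.0009),
    ModelMetrics("Mixtral-8x7B",    7.05,  680, 0.0002),
    ModelMetrics("Gemini-1.5-Pro",  7.68,  750, 0.007),
]

def value_score(m: ModelMetrics) -> float:
    """Quality points per dollar of cost per 1k tokens (higher is better)."""
    return m.quality / m.cost_per_1k

# Rank models by cost-normalised quality; swap in latency or error rate
# as the denominator to explore other trade-offs.
for m in sorted(MODELS, key=value_score, reverse=True):
    print(f"{m.name:16} quality={m.quality:.2f}  "
          f"avg latency={m.avg_latency_ms:5.0f} ms  "
          f"quality per $/1k tokens={value_score(m):8.0f}")
```

Normalising by cost alone strongly favours the cheapest models (Mixtral-8x7B and GPT-3.5-Turbo lead by a wide margin); weighting quality, latency, or error rate differently will reorder the list.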