Performance metrics compared across all tested LLM models:
| Model | Provider | Quality | Conversion (%) | Avg Latency (ms) | P95 Latency (ms) | Error Rate (%) | Cost/1k | Interactions |
|---|---|---|---|---|---|---|---|---|
| GPT-4-Turbo | OpenAI | 8.99 | -- | 1150 | -- | -- | $0.01 | 131949 |
| Llama-3-70B | Meta | 8.56 | -- | 905 | -- | -- | $0.0009 | 44871 |
| Claude-3-Opus | Anthropic | 9.17 | -- | 1346 | -- | -- | $0.015 | 44637 |
| Gemini-1.5-Pro | Google | 8.86 | -- | 948 | -- | -- | $0.007 | 43401 |
| GPT-3.5-Turbo | OpenAI | 8.29 | -- | 463 | -- | -- | $0.0005 | 43294 |
| Mixtral-8x7B | Mistral AI | 8.38 | -- | 580 | -- | -- | $0.0002 | 42653 |
| Llama-3-8B | Meta | -- | -- | -- | -- | -- | $0.0001 | -- |
| Claude-3-Sonnet | Anthropic | -- | -- | -- | -- | -- | $0.003 | -- |
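One way to read the table is quality per unit cost. The sketch below uses only the quality scores and Cost/1k values present above; the quality-per-dollar metric itself is our illustration, not something reported in the original data.

```python
# Quality-per-dollar ranking for the six models with complete metrics.
# "quality" is the reported quality score; "cost" is the listed cost per 1k.
models = {
    "GPT-4-Turbo":    {"quality": 8.99, "cost": 0.01},
    "Llama-3-70B":    {"quality": 8.56, "cost": 0.0009},
    "Claude-3-Opus":  {"quality": 9.17, "cost": 0.015},
    "Gemini-1.5-Pro": {"quality": 8.86, "cost": 0.007},
    "GPT-3.5-Turbo":  {"quality": 8.29, "cost": 0.0005},
    "Mixtral-8x7B":   {"quality": 8.38, "cost": 0.0002},
}

# Sort by quality points per dollar, best value first.
ranked = sorted(models, key=lambda m: models[m]["quality"] / models[m]["cost"],
                reverse=True)
print(ranked[0])   # best value per dollar
print(ranked[-1])  # most expensive per quality point
```

On these numbers the cheap open-weight models dominate on value (Mixtral-8x7B leads by a wide margin) while Claude-3-Opus, the top quality scorer, ranks last on cost efficiency; whether that trade-off matters depends on the conversion and error-rate figures missing from the table.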