fireworks LLM Benchmarks – Performance & Latency

Provider Snapshot

Models Tracked: 7
Avg Tokens / Second: 36.66
Avg Time to First Token (ms): 102.86
Last Updated: May 20, 2026

Seven fireworks models are actively benchmarked, with 512 total measurements across 464 benchmark runs.
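kimi-k2p6 leads the fleet at 47.10 tokens/second, while glm-5p1 is the slowest at 20.50 tok/s.
Among the six fastest models, average throughput varies by 46.3% (kimi-k2p6 at 47.10 tok/s vs. glm-5 at 32.20 tok/s). Across all seven models, the fastest is about 130% faster than the slowest (47.10 vs. 20.50 tok/s), which may reflect different optimization targets for different use cases.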
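Avg time to first token across the fleet is 102.86 ms. This figure comes from a single nonzero entry: llama-3.3-70b reports 720.00 ms and the other six models report 0.00 ms, which likely indicates missing TTFT data rather than instantaneous responses. The fleet average (720.00 / 7) therefore says little about typical responsiveness for interactive applications.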
The fireworks model fleet shows fairly consistent average throughput (coefficient of variation of 22.6% across the seven models), which suggests standardized infrastructure.
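The summary figures above can be rechecked from the per-model averages in the tables on this page. The following is a minimal sketch, not the benchmark site's own code: the helper name `relative_spread` is hypothetical, and the use of the population standard deviation is an assumption (it happens to match the published 22.6%).

```python
from statistics import mean, pstdev

# Average tokens/second per model, copied from the tables on this page.
avg_tps = {
    "kimi-k2p6": 47.10,
    "minimax-m2p7": 45.20,
    "llama-3.3-70b": 40.10,
    "kimi-k2p5": 37.10,
    "deepseek-v4-pro": 34.40,
    "glm-5": 32.20,
    "glm-5p1": 20.50,
}


def relative_spread(fast: float, slow: float) -> float:
    """Percentage by which `fast` exceeds `slow` (hypothetical helper)."""
    return (fast - slow) / slow * 100.0


values = list(avg_tps.values())
fleet_mean = mean(values)

# Coefficient of variation: standard deviation divided by the mean.
# Assumption: population standard deviation (pstdev), not the sample one.
cv_percent = pstdev(values) / fleet_mean * 100.0

top6_spread = relative_spread(avg_tps["kimi-k2p6"], avg_tps["glm-5"])
full_spread = relative_spread(max(values), min(values))

print(f"fleet mean: {fleet_mean:.2f} tok/s")
print(f"kimi-k2p6 vs glm-5: {top6_spread:.1f}%")
print(f"fastest vs slowest: {full_spread:.1f}%")
print(f"coefficient of variation: {cv_percent:.1f}%")
```

With the values above, this reproduces the published 36.66 tok/s average, the 46.3% gap between kimi-k2p6 and glm-5, and the 22.6% coefficient of variation. The gap between the fastest and slowest of all seven models comes out at roughly 129.8%.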

The six fastest fireworks models, ranked by average tokens per second.

Provider	Model	Avg Toks/Sec	Min Toks/Sec	Max Toks/Sec	Avg TTFT (ms)
fireworks	kimi-k2p6	47.10	4.48	78.60	0.00
fireworks	minimax-m2p7	45.20	2.61	100.00	0.00
fireworks	llama-3.3-70b	40.10	5.57	69.20	720.00
fireworks	kimi-k2p5	37.10	1.11	70.00	0.00
fireworks	deepseek-v4-pro	34.40	14.50	69.50	0.00
fireworks	glm-5	32.20	17.40	48.80	0.00

Complete list of all fireworks models tracked in the benchmark system.

Provider	Model	Avg Toks/Sec	Min Toks/Sec	Max Toks/Sec	Avg TTFT (ms)
fireworks	deepseek-v4-pro	34.40	14.50	69.50	0.00
fireworks	glm-5	32.20	17.40	48.80	0.00
fireworks	glm-5p1	20.50	8.38	37.00	0.00
fireworks	kimi-k2p5	37.10	1.11	70.00	0.00
fireworks	kimi-k2p6	47.10	4.48	78.60	0.00
fireworks	llama-3.3-70b	40.10	5.57	69.20	720.00
fireworks	minimax-m2p7	45.20	2.61	100.00	0.00
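Because six of the seven rows report 0.00 ms for time to first token, the fleet-wide average depends heavily on how those zeros are read. The sketch below parses the tab-separated rows above and computes the average both ways. The field names in `FIELDS` are hypothetical labels chosen for this example, and treating 0.00 as "no measurement recorded" is an assumption, not something the page states.

```python
import csv
import io
from statistics import mean

# Hypothetical field names for the six table columns.
FIELDS = ["provider", "model", "avg_tps", "min_tps", "max_tps", "avg_ttft_ms"]

# Tab-separated data rows copied from the full model table on this page.
ROWS = """\
fireworks\tdeepseek-v4-pro\t34.40\t14.50\t69.50\t0.00
fireworks\tglm-5\t32.20\t17.40\t48.80\t0.00
fireworks\tglm-5p1\t20.50\t8.38\t37.00\t0.00
fireworks\tkimi-k2p5\t37.10\t1.11\t70.00\t0.00
fireworks\tkimi-k2p6\t47.10\t4.48\t78.60\t0.00
fireworks\tllama-3.3-70b\t40.10\t5.57\t69.20\t720.00
fireworks\tminimax-m2p7\t45.20\t2.61\t100.00\t0.00
"""

reader = csv.DictReader(io.StringIO(ROWS), fieldnames=FIELDS, delimiter="\t")
ttft = {row["model"]: float(row["avg_ttft_ms"]) for row in reader}

# Reading 1: every 0.00 is a real measurement and counts toward the average.
mean_all = mean(list(ttft.values()))

# Reading 2 (assumption): 0.00 means no TTFT was recorded, so skip it.
recorded = {model: ms for model, ms in ttft.items() if ms > 0.0}
mean_recorded = mean(list(recorded.values()))

print(f"average over all {len(ttft)} models: {mean_all:.2f} ms")
print(f"average over {len(recorded)} model(s) with data: {mean_recorded:.2f} ms")
```

Under the first reading the result matches the published 102.86 ms. Under the second, only llama-3.3-70b contributes, and the average is its 720.00 ms.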