groq LLM Benchmarks – Performance & Latency

Provider Snapshot

Models Tracked: 6
Avg Tokens / Second: 204.50
Avg Time to First Token (ms): 235.00
Last Updated: Feb 8, 2026

Six groq models are actively benchmarked, with 1143 total measurements across 626 benchmark runs.
llama-3.1-8b leads the fleet at an average of 281.00 tokens/second, while kimi-k2 is the slowest at 140.00 tok/s.
Average throughput ranges from 140.00 to 281.00 tok/s across the groq model lineup, so the fastest model is 100.7% faster than the slowest.
Avg time to first token across the fleet is 235.00 ms, with per-model averages ranging from 120.00 ms (llama-3.3-70b) to 510.00 ms (llama-4-maverick), which is low enough for most interactive applications.
The coefficient of variation of per-model average throughput is 22.7%, a moderate spread around the 204.50 tok/s fleet mean.
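
The page does not say how its spread and variation figures are calculated. As a minimal sketch, assuming the spread is the gap between the fastest and slowest model relative to the slowest, and the variation coefficient is the population standard deviation divided by the mean, the following Python reproduces both numbers from the per-model averages in the table. The function names are illustrative and not part of any benchmark tooling.

```python
from statistics import mean, pstdev

# Per-model average throughput (tokens/second), copied from the table.
avg_tok_per_sec = {
    "llama-3.1-8b": 281.00,
    "qwen-3-32b": 240.00,
    "llama-3.3-70b": 206.00,
    "llama-4-scout": 195.00,
    "llama-4-maverick": 165.00,
    "kimi-k2": 140.00,
}


def relative_spread_pct(values):
    """Gap between fastest and slowest, as a percentage of the slowest.

    Assumed formula: the page does not document its own definition.
    """
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100


def coefficient_of_variation_pct(values):
    """Population standard deviation over the mean, as a percentage.

    Assumed formula: the population (not sample) deviation is used.
    """
    return pstdev(values) / mean(values) * 100


values = list(avg_tok_per_sec.values())
print(f"spread: {relative_spread_pct(values):.1f}%")         # spread: 100.7%
print(f"cv: {coefficient_of_variation_pct(values):.1f}%")    # cv: 22.7%
```

Using the sample standard deviation (`statistics.stdev`) instead would give roughly 24.9%, so the population form is the one that matches the reported 22.7%.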

Provider	Model	Avg Tok/s	Min Tok/s	Max Tok/s	Avg TTFT (ms)
groq	llama-3.1-8b	281.00	95.20	447.00	130.00
groq	qwen-3-32b	240.00	46.20	391.00	150.00
groq	llama-3.3-70b	206.00	79.50	280.00	120.00
groq	llama-4-scout	195.00	23.10	316.00	250.00
groq	llama-4-maverick	165.00	19.70	310.00	510.00
groq	kimi-k2	140.00	21.90	203.00	250.00
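
The fleet-level averages can be rechecked from the table rows. The sketch below assumes the fleet figures are unweighted means of the per-model averages (the page does not state any weighting by measurement count) and reads columns by position. The field names such as `avg_tps` and `avg_ttft_ms` are hypothetical labels chosen for this example.

```python
import csv
import io
from statistics import mean

# Rows copied from the table above, tab-separated. Column order:
# provider, model, avg tok/s, min tok/s, max tok/s, avg time to first token (ms)
ROWS = """\
groq\tllama-3.1-8b\t281.00\t95.20\t447.00\t130.00
groq\tqwen-3-32b\t240.00\t46.20\t391.00\t150.00
groq\tllama-3.3-70b\t206.00\t79.50\t280.00\t120.00
groq\tllama-4-scout\t195.00\t23.10\t316.00\t250.00
groq\tllama-4-maverick\t165.00\t19.70\t310.00\t510.00
groq\tkimi-k2\t140.00\t21.90\t203.00\t250.00
"""


def parse_rows(text):
    """Turn tab-separated rows into dicts with numeric fields."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        provider, model, avg, lo, hi, ttft = row
        records.append({
            "provider": provider,
            "model": model,
            "avg_tps": float(avg),
            "min_tps": float(lo),
            "max_tps": float(hi),
            "avg_ttft_ms": float(ttft),
        })
    return records


records = parse_rows(ROWS)

# Unweighted means across models (an assumption about the page's method).
fleet_tps = mean(r["avg_tps"] for r in records)
fleet_ttft = mean(r["avg_ttft_ms"] for r in records)
fastest = max(records, key=lambda r: r["avg_tps"])

print(f"fleet avg throughput: {fleet_tps:.2f} tok/s")   # 204.50
print(f"fleet avg TTFT: {fleet_ttft:.2f} ms")           # 235.00
print(f"fastest model: {fastest['model']}")             # llama-3.1-8b
```

Both results match the snapshot values of 204.50 tok/s and 235.00 ms, which is consistent with a simple per-model mean.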
