FreeLLM is an open-source, self-hosted gateway that unifies eight free LLM providers (Groq, Gemini, Mistral, Cerebras, NVIDIA NIM, Cloudflare Workers AI, GitHub Models, Ollama) into a single OpenAI-compatible endpoint. It automatically handles rate limits, failover, and multi-key rotation, allowing you to stack up to 3 keys per provider for ~450 free requests per minute. Key features include:
- Drop-in OpenAI SDK: Change only the base URL to use any OpenAI-compatible SDK.
- Automatic failover: If one provider rate-limits, requests silently route to the next.
- Multi-key rotation: Set multiple API keys per provider to increase throughput.
- Token tracking: Rolling 24-hour token counts per provider.
- Circuit breakers: Per-provider health monitoring with automatic recovery.
- Three meta-models:
free-fastfor speed,free-smartfor reasoning,freefor max uptime. - Real-time dashboard: Live request log, latency, and token usage.
- Response caching: SHA-256 keyed, LRU eviction, configurable TTL – zero quota burn.
- Truly $0: No markup, no subscription, self-host in 2 minutes.
Ideal for developers, indie hackers, and startups who want to experiment with LLMs without incurring costs.
