Free AI Access Tracker

# Antigravity

Details and Limits: Early Access / Beta. Check website for availability. Currently offers free access to Gemini 3 Pro, Claude Sonnet 4.5 and GPT-OSS 120B.

🏆 Absolute Best Free Option
Google Antigravity
Currently the absolute best free agentic coding IDE. Google's next-gen agentic coding IDE built by Google DeepMind.

# Gemini CLI

Details and Limits: Generous free tier with 60 requests per minute and 1000 requests per day using OAuth with your Google account.

🏆 Absolute Best Free Option
Google Gemini CLI
The official CLI coding agent for Google's Gemini models. Provides access to Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash and Gemini 2.5 Flash-Lite.

# OpenRouter

Details and Limits: 20 requests/minute, 50 requests/day (up to 1000/day with $10 lifetime topup)

🏆 Best Free API Choice
xAI Grok 4.1 Fast
Currently best free model from OpenRouter. Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the reasoning enabled parameter in the API.
Details: Temporarily free access via OpenRouter.
x-ai/grok-4.1-fast:free Click to copy

Show All OpenRouter Models

Tongyi DeepResearch 30B A3B (free) (alibaba/tongyi-deepresearch-30b-a3b:free)
Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks and delivers state-of-the-art performance on benchmarks like Humanity's Last Exam, BrowserComp, BrowserComp-ZH, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES. This makes it superior for complex agentic search, reasoning, and multi-step problem-solving compared to prior models. The model includes a fully automated synthetic data pipeline for scalable pre-training, fine-tuning, and reinforcement learning. It uses large-scale continual pre-training on diverse agentic data to boost reasoning and stay fresh. It also features end-to-end on-policy RL with a customized Group Relative Policy Optimization, including token-level gradients and negative sample filtering for stable training. The model supports ReAct for core ability checks and an IterResearch-based 'Heavy' mode for max performance through test-time scaling. It's ideal for advanced research agents, tool use, and heavy inference workflows.
Details: Free access via OpenRouter. Context: 131k tokens.

alibaba/tongyi-deepresearch-30b-a3b:free Click to copy

ArliAI: QwQ 32B RpR v1 (free) (arliai/qwq-32b-arliai-rpr-v1:free)
QwQ-32B-ArliAI-RpR-v1 is a 32B parameter model fine-tuned from Qwen/QwQ-32B using a curated creative writing and roleplay dataset originally developed for the RPMax series. It is designed to maintain coherence and reasoning across long multi-turn conversations by introducing explicit reasoning steps per dialogue turn, generated and refined using the base model itself. The model was trained using RS-QLORA+ on 8K sequence lengths and supports up to 128K context windows (with practical performance around 32K). It is optimized for creative roleplay and dialogue generation, with an emphasis on minimizing cross-context repetition while preserving stylistic diversity.
Details: Free access via OpenRouter. Context: 33k tokens.

arliai/qwq-32b-arliai-rpr-v1:free Click to copy

Venice: Uncensored (free) (cognitivecomputations/dolphin-mistral-24b-venice-edition:free)
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models.
Details: Free access via OpenRouter. Context: 33k tokens.

cognitivecomputations/dolphin-mistral-24b-venice-edition:free Click to copy

DeepSeek: R1 0528 (free) (deepseek/deepseek-r1-0528:free)
May 28th update to the original DeepSeek R1 Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model.
Details: Free access via OpenRouter. Context: 164k tokens.

deepseek/deepseek-r1-0528:free Click to copy

DeepSeek: R1 Distill Llama 70B (free) (deepseek/deepseek-r1-distill-llama-70b:free)
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
Details: Free access via OpenRouter. Context: 8k tokens.

deepseek/deepseek-r1-distill-llama-70b:free Click to copy

DeepSeek: R1 (free) (deepseek/deepseek-r1:free)
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & technical report. MIT licensed: Distill & commercialize freely!
Details: Free access via OpenRouter. Context: 164k tokens.

deepseek/deepseek-r1:free Click to copy

DeepSeek: DeepSeek R1 0528 Qwen3 8B (free) (deepseek/deepseek-r1-0528-qwen3-8b:free)
DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and smarter post-training tricks, pushing its reasoning and inference to the brink of flagship models like O3 and Gemini 2.5 Pro. It now tops math, programming, and logic leaderboards, showcasing a step-change in depth-of-thought. The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8 B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235 B “thinking” giant on AIME 2024.
Details: Free access via OpenRouter. Context: 131k tokens.

deepseek/deepseek-r1-0528-qwen3-8b:free Click to copy

DeepSeek: DeepSeek V3 0324 (free) (deepseek/deepseek-chat-v3-0324:free)
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well on a variety of tasks.
Details: Free access via OpenRouter. Context: 164k tokens.

deepseek/deepseek-chat-v3-0324:free Click to copy

Google: Gemma 3 27B (free) (google/gemma-3-27b-it:free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2
Details: Free access via OpenRouter. Context: 131k tokens.

google/gemma-3-27b-it:free Click to copy

Google: Gemini 2.0 Flash Experimental (free) (google/gemini-2.0-flash-exp:free)
Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
Details: Free access via OpenRouter. Context: 1049k tokens.

google/gemini-2.0-flash-exp:free Click to copy

Google: Gemma 3 4B (free) (google/gemma-3-4b-it:free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.
Details: Free access via OpenRouter. Context: 33k tokens.

google/gemma-3-4b-it:free Click to copy

Google: Gemma 3n 4B (free) (google/gemma-3n-e4b-it:free)
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. Read more in the blog post
Details: Free access via OpenRouter. Context: 8k tokens.

google/gemma-3n-e4b-it:free Click to copy

Google: Gemma 3 12B (free) (google/gemma-3-12b-it:free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after Gemma 3 27B
Details: Free access via OpenRouter. Context: 33k tokens.

google/gemma-3-12b-it:free Click to copy

Google: Gemma 3n 2B (free) (google/gemma-3n-e2b-it:free)
Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data.
Details: Free access via OpenRouter. Context: 8k tokens.

google/gemma-3n-e2b-it:free Click to copy

Kwaipilot: KAT-Coder-Pro V1 (free) (kwaipilot/kat-coder-pro:free)
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
Details: Free access via OpenRouter. Context: 256k tokens.

kwaipilot/kat-coder-pro:free Click to copy

Meituan: LongCat Flash Chat (free) (meituan/longcat-flash-chat:free)
LongCat-Flash-Chat is a large-scale Mixture-of-Experts (MoE) model with 560B total parameters, of which 18.6B–31.3B (≈27B on average) are dynamically activated per input. It introduces a shortcut-connected MoE design to reduce communication overhead and achieve high throughput while maintaining training stability through advanced scaling strategies such as hyperparameter transfer, deterministic computation, and multi-stage optimization. This release, LongCat-Flash-Chat, is a non-thinking foundation model optimized for conversational and agentic tasks. It supports long context windows up to 128K tokens and shows competitive performance across reasoning, coding, instruction following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions.
Details: Free access via OpenRouter. Context: 131k tokens.

meituan/longcat-flash-chat:free Click to copy

Meta: Llama 3.2 3B Instruct (free) (meta-llama/llama-3.2-3b-instruct:free)
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the original model card. Usage of this model is subject to Meta's Acceptable Use Policy.
Details: Free access via OpenRouter. Context: 131k tokens.

meta-llama/llama-3.2-3b-instruct:free Click to copy

Meta: Llama 3.3 70B Instruct (free) (meta-llama/llama-3.3-70b-instruct:free)
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Model Card
Details: Free access via OpenRouter. Context: 131k tokens.

meta-llama/llama-3.3-70b-instruct:free Click to copy

Microsoft: MAI DS R1 (free) (microsoft/mai-ds-r1:free)
MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1. MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.
Details: Free access via OpenRouter. Context: 164k tokens.

microsoft/mai-ds-r1:free Click to copy

Mistral: Mistral 7B Instruct (free) (mistralai/mistral-7b-instruct:free)
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Details: Free access via OpenRouter. Context: 33k tokens.

mistralai/mistral-7b-instruct:free Click to copy

Mistral: Mistral Nemo (free) (mistralai/mistral-nemo:free)
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.
Details: Free access via OpenRouter. Context: 131k tokens.

mistralai/mistral-nemo:free Click to copy

Mistral: Mistral Small 3 (free) (mistralai/mistral-small-24b-instruct-2501:free)
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.
Details: Free access via OpenRouter. Context: 33k tokens.

mistralai/mistral-small-24b-instruct-2501:free Click to copy

Mistral: Mistral Small 3.1 24B (free) (mistralai/mistral-small-3.1-24b-instruct:free)
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is Mistral Small 3.2
Details: Free access via OpenRouter. Context: 128k tokens.

mistralai/mistral-small-3.1-24b-instruct:free Click to copy

Mistral: Mistral Small 3.2 24B (free) (mistralai/mistral-small-3.2-24b-instruct:free)
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks. It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA).
Details: Free access via OpenRouter. Context: 131k tokens.

mistralai/mistral-small-3.2-24b-instruct:free Click to copy

MoonshotAI: Kimi K2 0711 (free) (moonshotai/kimi-k2:free)
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Details: Free access via OpenRouter. Context: 33k tokens.

moonshotai/kimi-k2:free Click to copy

NVIDIA: Nemotron Nano 12B 2 VL (free) (nvidia/nemotron-nano-12b-v2-vl:free)
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.
Details: Free access via OpenRouter. Context: 128k tokens.

nvidia/nemotron-nano-12b-v2-vl:free Click to copy

Nous: Hermes 3 405B Instruct (free) (nousresearch/hermes-3-llama-3.1-405b:free)
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
Details: Free access via OpenRouter. Context: 131k tokens.

nousresearch/hermes-3-llama-3.1-405b:free Click to copy

NVIDIA: Nemotron Nano 9B V2 (free) (nvidia/nemotron-nano-9b-v2:free)
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.
Details: Free access via OpenRouter. Context: 128k tokens.

nvidia/nemotron-nano-9b-v2:free Click to copy

OpenAI: gpt-oss-20b (free) (openai/gpt-oss-20b:free)
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
Details: Free access via OpenRouter. Context: 131k tokens.

openai/gpt-oss-20b:free Click to copy

Bert-Nebulon Alpha (openrouter/bert-nebulon-alpha)
This is a cloaked model provided to the community to gather feedback. A general-purpose multimodal model (text/image in, text out) designed for reliability, long-context comprehension, and adaptive logic. It is engineered for production-grade assistants, retrieval-augmented systems, science workloads, and complex agentic workflows. Note: All prompts and completions for this model are logged by the provider and may be used to improve the model.
Details: Free access via OpenRouter. Context: 256k tokens.

openrouter/bert-nebulon-alpha Click to copy

Qwen2.5 72B Instruct (free) (qwen/qwen-2.5-72b-instruct:free)
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
Details: Free access via OpenRouter. Context: 33k tokens.

qwen/qwen-2.5-72b-instruct:free Click to copy

Qwen2.5 Coder 32B Instruct (free) (qwen/qwen-2.5-coder-32b-instruct:free)
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significantly improvements in code generation, code reasoning and code fixing. - A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. To read more about its evaluation results, check out Qwen 2.5 Coder's blog.
Details: Free access via OpenRouter. Context: 33k tokens.

qwen/qwen-2.5-coder-32b-instruct:free Click to copy

Qwen: Qwen2.5 VL 32B Instruct (free) (qwen/qwen2.5-vl-32b-instruct:free)
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.
Details: Free access via OpenRouter. Context: 16k tokens.

qwen/qwen2.5-vl-32b-instruct:free Click to copy

Qwen: Qwen3 14B (free) (qwen/qwen3-14b:free)
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.
Details: Free access via OpenRouter. Context: 41k tokens.

qwen/qwen3-14b:free Click to copy

Qwen: Qwen3 235B A22B (free) (qwen/qwen3-235b-a22b:free)
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling.
Details: Free access via OpenRouter. Context: 131k tokens.

qwen/qwen3-235b-a22b:free Click to copy

Qwen: Qwen3 4B (free) (qwen/qwen3-4b:free)
Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows.
Details: Free access via OpenRouter. Context: 41k tokens.

qwen/qwen3-4b:free Click to copy

Qwen: Qwen3 30B A3B (free) (qwen/qwen3-30b-a3b:free)
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
Details: Free access via OpenRouter. Context: 41k tokens.

qwen/qwen3-30b-a3b:free Click to copy

Qwen: Qwen3 Coder 480B A35B (free) (qwen/qwen3-coder:free)
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used.
Details: Free access via OpenRouter. Context: 262k tokens.

qwen/qwen3-coder:free Click to copy

TNG: DeepSeek R1T Chimera (free) (tngtech/deepseek-r1t-chimera:free)
DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.
Details: Free access via OpenRouter. Context: 164k tokens.

tngtech/deepseek-r1t-chimera:free Click to copy

TNG: DeepSeek R1T2 Chimera (free) (tngtech/deepseek-r1t2-chimera:free)
DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20 % faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60 k tokens in standard use (tested to ~130 k) and maintains consistent token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.
Details: Free access via OpenRouter. Context: 164k tokens.

tngtech/deepseek-r1t2-chimera:free Click to copy

xAI: Grok 4.1 Fast (free) (x-ai/grok-4.1-fast:free)
Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the reasoning enabled parameter in the API. Learn more in our docs
Details: Free access via OpenRouter. Context: 2000k tokens.

x-ai/grok-4.1-fast:free Click to copy

xAI: Grok 4.1 Fast (x-ai/grok-4.1-fast)
Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the reasoning enabled parameter in the API. Learn more in our docs
Details: Free access via OpenRouter. Context: 2000k tokens.

x-ai/grok-4.1-fast Click to copy

Z.AI: GLM 4.5 Air (free) (z-ai/glm-4.5-air:free)
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs
Details: Free access via OpenRouter. Context: 131k tokens.

z-ai/glm-4.5-air:free Click to copy

AllenAI: Olmo 3 32B Think (free) (allenai/olmo-3-32b-think:free)
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology.
Details: Free access via OpenRouter. Context: 66k tokens.

allenai/olmo-3-32b-think:free Click to copy

AllenAI: Olmo 3 32B Think (allenai/olmo-3-32b-think)
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology.
Details: Free access via OpenRouter. Context: 66k tokens.

allenai/olmo-3-32b-think Click to copy

Arcee AI: Trinity Mini (free) (arcee-ai/trinity-mini:free)
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows.
Details: Free access via OpenRouter. Context: 131k tokens.

arcee-ai/trinity-mini:free Click to copy

Amazon: Nova 2 Lite (free) (amazon/nova-2-lite-v1:free)
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.
Details: Free access via OpenRouter. Context: 1000k tokens.

amazon/nova-2-lite-v1:free Click to copy

TNG: R1T Chimera (free) (tngtech/tng-r1t-chimera:free)
TNG-R1T-Chimera is an experimental LLM with a faible for creative storytelling and character interaction. It is a derivate of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter. Characteristics and improvements include: We think that it has a creative and pleasant personality. It has a preliminary EQ-Bench3 value of about 1305. It is quite a bit more intelligent than the original, albeit a slightly slower. It is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated. Tool calling is much improved. TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).
Details: Free access via OpenRouter. Context: 164k tokens.

tngtech/tng-r1t-chimera:free Click to copy

# Copilot

Details and Limits: Free for everyone (limited). Pro is free for verified students, teachers, & maintainers.

GitHub Copilot
The industry-standard AI pair programmer. Copilot Free: Access to Claude Haiku 4.5, GPT-4.1, GPT-5 mini, and Raptor mini. Copilot Pro: Adds Claude Opus 4.5, Claude Sonnet 4/4.5, Gemini 2.5/3 Pro, GPT-5, GPT-5-Codex, GPT-5.1, Grok Code Fast 1, and more.

# Kilo Code

Details and Limits: Free access to Minimax M2 and Grok Code Fast 1 as well as a cloaked model called Spectre with unlimited usage. Needs an account.

Kilo Code
An open-source VSCode extension for agentig coding. Currently provides free access to premium models including. CLI agent available.

# OpenCode

Details and Limits: Free access via OpenCode Zen.

Big Pickle
Cloaked model with advanced reasoning capabilities.
big-pickle Click to copy
Grok Code Fast 1
xAI's coding model designed for speed.
grok-code Click to copy

# Qwen CLI

Details and Limits: Generous free tier with 60 requests per minute and 2000 requests per day with a Qwen account OAuth.

Qwen CLI
Qwen Code is a powerful command-line AI workflow tool adapted from Gemini CLI, specifically optimized for Qwen3-Coder models. It enhances your development workflow with advanced code understanding, automated tasks, and intelligent assistance.