The AI Models Allowed on This Site
I'm writing this blog post to keep track of the AI models that are allowed on this site. Users can choose which AI model to use for writing and coding (I plan to implement the AI coder for analyzing surveys with Jupyter Notebooks, but this has not yet been implemented).
OpenRouter
OpenRouter is an API provider service that offers access to most of the currently relevant AI models in one location. It is what I use for generative AI problems on this site.
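As a sketch of how this works: OpenRouter exposes an OpenAI-compatible chat completions endpoint, and you pick a model by passing its slug (the identifiers in parentheses throughout this post). This is a minimal illustration, not the site's actual code; the API key and prompt are placeholders:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model_slug: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for OpenRouter."""
    payload = {
        "model": model_slug,  # e.g. "meta-llama/llama-3.2-1b-instruct"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (requires a real key):
# req = build_request("meta-llama/llama-3.2-1b-instruct", "Summarize this survey.", "sk-...")
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swapping models is then just a matter of changing the slug string.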
Note: All the cost tables below show the cost per million tokens.
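Since prices are per million tokens, the cost of a single request can be estimated with a small helper like this (for illustration only; token counts are examples):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# e.g. Llama 3.2 1B Instruct ($0.01 in / $0.01 out), 2,000 input + 500 output tokens:
# estimate_cost(2000, 500, 0.01, 0.01) -> 0.000025 (a fortieth of a cent)
```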
AI Writing Models
- Meta: Llama 3.2 1B Instruct (meta-llama/llama-3.2-1b-instruct)
- Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.
- Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.
Input | Output |
---|---|
$0.01 | $0.01 |
- Meta: Llama 3.2 3B Instruct (meta-llama/llama-3.2-3b-instruct)
- Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
- Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
Input | Output |
---|---|
$0.015 | $0.025 |
- Meta: Llama 3.1 8B Instruct (meta-llama/llama-3.1-8b-instruct)
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
- It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Input | Output |
---|---|
$0.02 | $0.05 |
- Qwen2.5 7B Instruct (qwen/qwen-2.5-7b-instruct)
- Qwen2.5 7B is part of the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support of up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Input | Output |
---|---|
$0.025 | $0.05 |
- Mistral: Mistral 7B Instruct (mistralai/mistral-7b-instruct)
- A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Input | Output |
---|---|
$0.03 | $0.055 |
- Google: Gemma 2 9B (google/gemma-2-9b-it)
- Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
- Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
Input | Output |
---|---|
$0.03 | $0.06 |
- Meta: Llama 3 8B Instruct (meta-llama/llama-3-8b-instruct)
- Meta's Llama 3 class of model launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.
- It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Input | Output |
---|---|
$0.03 | $0.06 |
- Mistral: Ministral 3B (mistralai/ministral-3b)
- Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.
Input | Output |
---|---|
$0.04 | $0.04 |
- Amazon: Nova Micro 1.0 (amazon/nova-micro-v1)
- Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.
Input | Output |
---|---|
$0.035 | $0.14 |
- Google: Gemini Flash 1.5 8B (google/gemini-flash-1.5-8b)
- Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
Input | Output |
---|---|
$0.0375 | $0.15 |
- Mistral: Ministral 8B (mistralai/ministral-8b)
- Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.
Input | Output |
---|---|
$0.10 | $0.10 |
- Google: Gemini Flash 2.0 (google/gemini-2.0-flash-001)
- Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
Input | Output |
---|---|
$0.10 | $0.40 |
- Meta: Llama 3.3 70B Instruct (meta-llama/llama-3.3-70b-instruct)
- The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Input | Output |
---|---|
$0.12 | $0.30 |
Added 3/5/2025
- Meta: Llama 3.1 70B Instruct (meta-llama/llama-3.1-70b-instruct)
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.
- It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Input | Output |
---|---|
$0.12 | $0.30 |
- Google: Gemma 7B (google/gemma-7b-it)
- Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models.
Input | Output |
---|---|
$0.15 | $0.15 |
- OpenAI: GPT-4o-mini (openai/gpt-4o-mini-2024-07-18)
- GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.
- As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.
Input | Output |
---|---|
$0.15 | $0.60 |
- Meta: Llama 2 13B Chat (meta-llama/llama-2-13b-chat)
- A 13-billion-parameter language model from Meta, fine-tuned for chat completions.
Input | Output |
---|---|
$0.22 | $0.22 |
- Anthropic: Claude 3 Haiku (anthropic/claude-3-haiku)
- Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness, offering quick and accurate targeted performance.
Input | Output |
---|---|
$0.25 | $1.25 |
- OpenAI: GPT-3.5 Turbo (openai/gpt-3.5-turbo)
- GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
Input | Output |
---|---|
$0.50 | $1.50 |
- Anthropic: Claude 3.5 Haiku (anthropic/claude-3.5-haiku)
- Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.
- This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.
Input | Output |
---|---|
$0.80 | $4.00 |
- OpenAI: o3 Mini (openai/o3-mini)
- OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
- This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the model's thinking time. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high`, which defaults the parameter to "high".
- The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
- The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
Input | Output |
---|---|
$1.10 | $4.40 |
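Based on the description above, requesting high reasoning effort is just a matter of either adding the parameter to the request body or switching slugs. A sketch of both options (the exact request shape is an assumption drawn from the parameter and slug names above):

```python
# Option 1: set the parameter explicitly on the standard slug.
payload_explicit = {
    "model": "openai/o3-mini",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "reasoning_effort": "high",  # "high" | "medium" (default) | "low"
}

# Option 2: use the convenience slug that defaults the parameter to "high".
payload_slug = {
    "model": "openai/o3-mini-high",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}
```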
- OpenAI: GPT-4o (openai/gpt-4o)
- GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Input | Output |
---|---|
$2.50 | $10.00 |
AI Coding Models
Not all coding models available are listed here. If a model is available as a coding and writing model, then it is only listed in the writing model section.
- Qwen2.5 Coder 32B Instruct (qwen/qwen-2.5-coder-32b-instruct)
Input | Output |
---|---|
$0.07 | $0.16 |
- Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
- Significant improvements in code generation, code reasoning, and code fixing.
- A more comprehensive foundation for real-world applications such as Code Agents. It not only enhances coding capabilities but also maintains its strengths in mathematics and general competencies.
- Google: Gemini Flash 1.5 (google/gemini-flash-1.5)
Input | Output |
---|---|
$0.075 | $0.40 |
- Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
- Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
- Google: Gemini Flash 2.0 (google/gemini-2.0-flash-001)
Input | Output |
---|---|
$0.10 | $0.40 |
- Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
- Meta: Llama 3.3 70B Instruct (meta-llama/llama-3.3-70b-instruct)
Input | Output |
---|---|
$0.12 | $0.30 |
- The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
- OpenAI: GPT-4o-mini (openai/gpt-4o-mini)
Input | Output |
---|---|
$0.15 | $0.60 |
- GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.
- As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.
- Mistral: Mistral 7B Instruct v0.1 (mistralai/mistral-7b-instruct-v0.1)
Input | Output |
---|---|
$0.20 | $0.20 |
- A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Added 3/5/2025
- Google PaLM 2 Chat (google/palm-2-chat-bison)
- PaLM 2 is a language model by Google with improved multilingual, reasoning and coding capabilities.
Input | Output |
---|---|
$1.00 | $2.00 |
Comments
User Comments
I currently use the Salesforce/blip-image-captioning-large model from HuggingFace to generate captions for images on this site that don't have captions. (An implementation of this method of captioning images can be seen here). I should try to test the Google: Gemini Flash 1.5 and OpenAI: GPT-4o-mini to see if they could be used for image captioning and possibly for safe search.
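If I do test GPT-4o-mini as a captioner, the request would presumably use OpenAI-style multimodal content blocks, where an image URL rides alongside a text instruction. A sketch of building that message (the prompt wording and URL are placeholders):

```python
def build_caption_messages(image_url: str) -> list:
    """Build a chat 'messages' list asking a vision model to caption one image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write a short alt-text caption for this image."},
            {"type": "image_url",
             "image_url": {"url": image_url}},
        ],
    }]
```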
I should also try to implement question answering and AI summarization for blogs, notes, markdown notes, and the like on this site using one of the models above.
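The summarization side could start as nothing more than a prompt template fed to one of the cheap writing models listed above. A hypothetical helper (not implemented on the site; the wording is illustrative):

```python
def build_summary_prompt(title: str, body_markdown: str, max_sentences: int = 3) -> str:
    """Build a summarization prompt for a blog post, note, or markdown note."""
    return (
        f"Summarize the following blog post in at most {max_sentences} sentences.\n"
        f"Title: {title}\n\n"
        f"{body_markdown}"
    )
```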