
Detailed History of Google Gemini Models: Context Limits and Pricing

Google's Gemini family of models has undergone significant evolution since its inception, marked by advancements in capabilities, context window sizes, and pricing structures. This comprehensive history details the progression of Gemini models, their context limits, and per-token pricing, drawing from official Google documentation and credible sources.


Early Foundations: LaMDA and PaLM

Before the Gemini series, Google developed foundational language models that paved the way for its current AI offerings:

  • LaMDA (Language Model for Dialogue Applications) - 2021: This model was Google's initial foray into conversational AI, laying the groundwork for future models. While not directly part of the Gemini family, it was a crucial precursor. 1
  • PaLM (Pathways Language Model) - 2022: PaLM represented a significant leap forward, boasting enhanced coding, multilingual, and reasoning skills compared to LaMDA. It was later succeeded by PaLM 2, which powered Bard. 1

Gemini 1.0 Pro

Gemini 1.0 Pro marked the initial release of the Gemini family, designed for general-purpose AI tasks.

  • Release Date: December 2023. 2
  • Capabilities: It offered text and image reasoning, along with stable and reliable performance for production use, supporting text generation, embeddings, and structured outputs. 3
  • Context Limit: 32,000 tokens. 4
  • Pricing:
    • Free Tier:
      • Input: Free
      • Output: Free
      • Rate Limits: 15 requests per minute (RPM), 32,000 tokens per minute (TPM), 1,500 requests per day (RPD). 5
    • Pay-as-You-Go (see the cost-estimation sketch after this list):
      • Input: $0.50 per 1 million tokens. 6
      • Output: $1.50 per 1 million tokens. 6
      • Because the model's context window is 32,000 tokens, the long-context (>128k-token) pricing tiers and context caching introduced with the 1.5 series do not apply. 6
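As a quick illustration of per-million-token billing, here is a minimal Python sketch that estimates the cost of a single request. The rates in the example call mirror the pay-as-you-go figures quoted above and are assumptions to be re-checked against Google's current price sheet, not values read from any SDK.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate the USD cost of one request, given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token answer at the
# Gemini 1.0 Pro pay-as-you-go rates listed above (illustrative).
print(f"${estimate_cost(4_000, 1_000, input_rate=0.50, output_rate=1.50):.4f}")  # -> $0.0035
```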

Gemini 1.5 Series

The Gemini 1.5 series introduced significant advancements, particularly in context window size and multimodal capabilities.

Gemini 1.5 Pro
  • Initial Release Date: February 2024 (limited preview); general availability was announced at Google I/O in May 2024. 7
  • Updated Version: Gemini-1.5-Pro-002 (Released September 24, 2024). 8
  • Capabilities: Enhanced general performance across text, code, and multimodal tasks. 9
  • Context Limit: Up to 2 million tokens (1 million at launch, expanded to 2 million in 2024). 10
  • Pricing:
    • Free Tier:
      • Input: Free
      • Output: Free
      • Rate Limits: 2 RPM, 32,000 TPM, 50 RPD. 11
    • Pay-as-You-Go (as of September 2024):
      • Input: $1.25 per 1 million tokens (for prompts <128k tokens). 12
      • Output: $5.00 per 1 million tokens (for prompts <128k tokens). 12
      • Context Caching: $0.3125 per 1 million tokens. 12
      • Prompts longer than 128k tokens (a token-counting sketch follows this list):
        • Input: $2.50 per 1 million tokens. 12
        • Output: $10.00 per 1 million tokens. 12
        • Context Caching: $0.625 per 1 million tokens. 12
    • Pricing Update (September 24, 2024):
      • >50% reduction in pricing for prompts <128k tokens. 13
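Because the rates above differ for prompts under and over 128k tokens, it can help to count tokens before sending a request. The following is a minimal sketch assuming the google-generativeai Python SDK; the model name and API-key placeholder are illustrative, not prescriptive.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; create a key in Google AI Studio
model = genai.GenerativeModel("gemini-1.5-pro-002")

prompt = "Summarize the history of the Gemini model family."
total = model.count_tokens(prompt).total_tokens

# Prompts at or below 128k tokens bill at the lower tier listed above.
tier = "standard (<128k tokens)" if total <= 128_000 else "long-context (>128k tokens)"
print(f"{total} prompt tokens -> {tier} pricing tier")
```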
Gemini 1.5 Flash
  • Initial Release Date: May 2024. 14
  • Updated Version: Gemini-1.5-Flash-002 (Released September 24, 2024). 15
  • Capabilities: Optimized for lower latency and faster output. 16
  • Context Limit: 1 million tokens. 17
  • Pricing:
    • Free Tier:
      • Input: Free
      • Output: Free
      • Rate Limits: 15 RPM, 1 million TPM, 1,500 RPD (a client-side throttling sketch follows this model's entry). 18
    • Pay-as-You-Go:
      • Input: $0.075 per 1 million tokens (for prompts <128k tokens). 19
      • Output: $0.30 per 1 million tokens (for prompts <128k tokens). 19
      • Prompts longer than 128k tokens: $0.15 per 1 million input tokens, $0.60 per 1 million output tokens. 19
      • Context Caching: Free for up to 1 million tokens of cache storage per hour. 19
    • Rate Limit Update (September 24, 2024):
      • 2x increase in rate limits for Flash models. 20
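The free-tier quotas above are enforced on Google's side, but a simple client-side throttle can help avoid hitting the requests-per-minute ceiling. This is a minimal sketch that tracks only the RPM dimension; the TPM and RPD quotas would need analogous bookkeeping.

```python
import time
from collections import deque

class RpmLimiter:
    """Minimal client-side throttle for a requests-per-minute quota."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # monotonic timestamps of recent requests

    def wait(self):
        now = time.monotonic()
        # Discard timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest request in the window expires.
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())

# Example: stay under the 15 RPM free-tier limit quoted above.
limiter = RpmLimiter(rpm=15)
# limiter.wait()  # call once before each API request
```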
Gemini 1.5 Flash-8B Experimental (Exp-0924)
  • Release Date: September 24, 2024. 21
  • Capabilities: Experimental updates with improved performance across text and multimodal use cases. 22
  • Context Limit: 1 million tokens. 23
  • Pricing: Free of charge while experimental; the production Gemini 1.5 Flash-8B model released in October 2024 is priced at roughly half the Gemini 1.5 Flash rates. 24

Gemini 2.0

  • Announcement Date: December 11, 2024. 25
  • Capabilities: Positioned as a next-generation AI model for the "agentic era," with advanced reasoning and multimodal capabilities. 26
  • Context Limit: The experimental Gemini 2.0 Flash model launched with a 1 million token context window; larger limits have not been announced. 27
  • Pricing: The experimental model is free to use through the Gemini API; pay-as-you-go pricing has not yet been published. 28

Key Updates and Milestones

  1. December 2023: Launch of Gemini 1.0 Pro with a 32,000-token context window. 29
  2. May 2024: Introduction of Gemini 1.5 Pro and Flash models at Google I/O, featuring multimodal capabilities and larger context windows. 30
  3. June 2024: Updates to Gemini Code Assist to use the Gemini 1.5 Flash model. 31
  4. September 24, 2024: Release of Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002 with:
    • >50% reduced pricing for 1.5 Pro. 32
    • 2x faster output and 3x lower latency. 33
    • 2x higher rate limits for Flash and ~3x higher for Pro. 34
    • Updated default filter settings. 35
  5. December 11, 2024: Announcement of Gemini 2.0 as the next-generation model. 36

Summary of Pricing Structures

Free Tier:
  • Input and output are free for all models.
  • Rate limits vary by model:
    • Gemini 1.0 Pro: 15 RPM, 32,000 TPM. 37
    • Gemini 1.5 Flash: 15 RPM, 1 million TPM. 38
    • Gemini 1.5 Pro: 2 RPM, 32,000 TPM. 39
Pay-as-You-Go (Gemini 1.5 Pro):
  • Input Pricing: $1.25 per 1 million tokens (for prompts <128k tokens). 40
  • Output Pricing: $5.00 per 1 million tokens (for prompts <128k tokens). 40
  • Context Caching: $0.3125 per 1 million tokens. 40
  • Prompts >128k tokens: Higher rates apply. 41
  • Gemini 1.5 Flash and Flash-8B are priced substantially lower per token than 1.5 Pro.

Consumer Access

Beyond the developer APIs, Google offers Gemini directly to consumers through the Gemini app and the Gemini Advanced subscription:

  • Gemini Advanced: Offered through the Google One AI Premium plan at $19.99 per month. 42
  • Model Access: At launch in February 2024 the subscription provided access to Gemini 1.0 Ultra, and it was later upgraded to Gemini 1.5 Pro with an expanded context window. 43
  • Included Features: The plan bundles 2 TB of Google One storage and Gemini features in Gmail, Docs, and other Workspace apps. 44

Notes

  • All pricing and rate limits are subject to updates as per Google's official announcements.
  • Developers can access Gemini models via Google AI Studio or Vertex AI for enterprise use cases.
  • Safety filters are available but not applied by default for the latest Gemini models, allowing developers to customize configurations (see the API sketch after these notes).
  • Context caching is available for free up to specified limits for models like Gemini 1.5 Flash and Flash-8B.
  • Grounding with Google Search was introduced for the Gemini API in late 2024 as a paid feature; availability varies by model and platform.
  • Experimental models are periodically released and replaced with newer versions as part of Google's development process.
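To make the developer-access and safety-filter notes concrete, here is a minimal sketch of calling a Gemini model through the google-generativeai Python SDK with an explicit safety setting. The model name, API-key placeholder, and chosen threshold are illustrative assumptions rather than recommendations; Vertex AI uses a separate client library.

```python
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

genai.configure(api_key="YOUR_API_KEY")  # placeholder; create a key in Google AI Studio

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    # Filters are configurable rather than applied by default on the -002 models;
    # the single threshold below is purely illustrative.
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)

response = model.generate_content("Summarize the Gemini 1.5 context window sizes.")
print(response.text)
```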

This detailed history provides a comprehensive overview of the evolution of Google Gemini models, ensuring accuracy and relevance based on official sources. The information is current as of December 11, 2024.

1 LaMDA and PaLM details from Opinion A.

2 Gemini 1.0 Pro release date from Opinion D.

3 Gemini 1.0 Pro capabilities from Opinion C.

4 Gemini 1.0 Pro context limit from Opinion B.

5 Gemini 1.0 Pro free tier rate limits from Opinion B.

6 Gemini 1.0 Pro pay-as-you-go pricing from Opinion B.

7 Gemini 1.5 Pro initial release date from Opinion B.

8 Gemini 1.5 Pro updated version from Opinion B.

9 Gemini 1.5 Pro capabilities from Opinion B.

10 Gemini 1.5 Pro context limit from Opinion B.

11 Gemini 1.5 Pro free tier rate limits from Opinion C.

12 Gemini 1.5 Pro pay-as-you-go pricing from Opinion B.

13 Gemini 1.5 Pro pricing update from Opinion B.

14 Gemini 1.5 Flash initial release date from Opinion B.

15 Gemini 1.5 Flash updated version from Opinion B.

16 Gemini 1.5 Flash capabilities from Opinion B.

17 Gemini 1.5 Flash context limit from Opinion B.

18 Gemini 1.5 Flash free tier rate limits from Opinion C.

19 Gemini 1.5 Flash pay-as-you-go pricing from Opinion B.

20 Gemini 1.5 Flash rate limit update from Opinion B.

21 Gemini 1.5 Flash-8B Experimental release date from Opinion B.

22 Gemini 1.5 Flash-8B Experimental capabilities from Opinion B.

23 Gemini 1.5 Flash-8B Experimental context limit from Opinion B.

24 Gemini 1.5 Flash-8B Experimental pricing from Opinion B.

25 Gemini 2.0 announcement date from Opinion B.

26 Gemini 2.0 capabilities from Opinion B.

27 Gemini 2.0 context limit expectation from Opinion B.

28 Gemini 2.0 pricing expectation from Opinion B.

29 Gemini 1.0 Pro launch date from Opinion B.

30 Gemini 1.5 Pro and Flash introduction date from Opinion B.

31 Gemini Code Assist update from Opinion B.

32 Gemini 1.5 Pro pricing reduction from Opinion B.

33 Gemini 1.5 Pro and Flash output speed and latency improvements from Opinion B.

34 Gemini 1.5 Pro and Flash rate limit increases from Opinion B.

35 Gemini 1.5 Pro and Flash filter settings update from Opinion B.

36 Gemini 2.0 announcement date from Opinion B.

37 Gemini 1.0 Pro free tier rate limits from Opinion B.

38 Gemini 1.5 Flash free tier rate limits from Opinion B.

39 Gemini 1.5 Pro free tier rate limits from Opinion C.

40 Gemini pay-as-you-go pricing from Opinion B.

41 Gemini pay-as-you-go pricing for prompts >128k tokens from Opinion B.

42 Gemini Advanced pricing from Opinion A.

43 Gemini Advanced access to Gemini 1.0 Ultra from Opinion A.

44 Gemini Advanced included features from Opinion A.


December 16, 2024