Tokens in Generative AI

November 20, 2024 · 6 min read

This document provides a comprehensive guide to tokens in generative AI covering tokenization, text processing, input limits, token pricing, and optimization strategies for AI models.

Tokens play a vital role in generative AI models, influencing how text is processed, generated, and priced. Understanding tokens explains why requests have size limits and how the cost of a request is calculated.

Definition of Tokens

Tokens: Tokens are the fundamental units of text that AI models process. They can represent characters, words, subwords, or even punctuation. For instance, in the sentence “AI is evolving,” the tokens might be:
- “AI,” “is,” and “evolving” (word-based tokenization).
- “A,” “I,” “is,” “evolv,” and “ing” (subword-based tokenization).
Token Limits: AI models have a predefined limit, called the context window, on the number of tokens they can process in a single request. For example, GPT-4 was released in versions with 8,192-token and 32,768-token context windows. This limit includes both input and output tokens.
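The two splits above can be sketched in Python. This is a toy illustration under stated assumptions: word-based tokenization is approximated with a regular expression, and subword tokenization uses greedy longest-match against a tiny hand-picked vocabulary (`TOY_VOCAB`, hypothetical, chosen only to reproduce the example). Real models use learned vocabularies, such as the byte pair encoding (BPE) vocabularies of GPT models, so their actual splits will differ.

```python
import re

# Hypothetical toy vocabulary, chosen only to reproduce the example above.
# Real models learn vocabularies with tens of thousands of entries.
TOY_VOCAB = {"A", "I", "is", "evolv", "ing"}


def word_tokenize(text: str) -> list[str]:
    """Word-based tokenization: runs of word characters, punctuation kept separate."""
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split one word into vocabulary pieces, always taking the longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


sentence = "AI is evolving"
words = word_tokenize(sentence)
subwords = [piece for w in words for piece in subword_tokenize(w, TOY_VOCAB)]
print(words)     # ['AI', 'is', 'evolving']
print(subwords)  # ['A', 'I', 'is', 'evolv', 'ing']
```

Because "AI" is not in the toy vocabulary, the subword splitter falls back to its two single-letter entries, which is why the same sentence yields five subword tokens but only three word tokens.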

Exploring Tokens

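Every interaction with a generative AI model is measured in tokens, so the token count of a prompt determines whether it fits within the model's limit and how much it costs. Looking at how text is split into tokens helps you write prompts that stay within these size restrictions.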

Tools for Token Exploration

Several tools help analyze how text is split into tokens:

OpenAI Tokenizer: A web-based tool to visualize how text is split into tokens for OpenAI models.
Hugging Face Tokenizer: Allows users to experiment with various tokenization algorithms, including those for specific models like GPT and BERT.
Python Libraries: Tools like tiktoken (for OpenAI) or tokenizers (from Hugging Face) enable programmatic token exploration.

Counting Tokens

Counting tokens involves determining how a given text is divided into units for processing:

Using Tokenization Tools: Paste text into tools like OpenAI’s tokenizer to see the number of tokens generated.
APIs or Libraries: Employ APIs or libraries (tiktoken in Python) to analyze token counts for programmatic needs.

Example (Using `tiktoken`)

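```python
import tiktoken

text = "AI is evolving rapidly."
# Look up the tokenizer (encoding) that GPT-4 uses
encoding = tiktoken.encoding_for_model("gpt-4")
# Convert the text into a list of integer token IDs
tokens = encoding.encode(text)
print(f"Number of tokens: {len(tokens)}")
```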

OpenAI Guidelines for Token Count

Include Input and Output: When estimating token usage, consider both the tokens in the prompt (input) and the response (output).
Plan for Model Limits: Ensure that token usage stays within model-specific token limits to prevent truncation or errors.
Optimize Prompts: Use concise prompts to save tokens and reduce costs.
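The first two guidelines can be turned into a simple pre-flight check. This is a minimal sketch that assumes you already know the prompt's token count (for example, from `tiktoken`); `check_request` is a hypothetical helper, and 8,192 is the GPT-4 8k context window.

```python
def check_request(input_tokens: int, max_output_tokens: int, context_window: int) -> int:
    """Return the most tokens a request may use, or raise if it cannot fit.

    Input and output share one context window, so both are counted.
    """
    total = input_tokens + max_output_tokens
    if total > context_window:
        raise ValueError(
            f"Request needs up to {total} tokens but the context window is "
            f"{context_window}; shorten the prompt or lower the output limit."
        )
    return total


GPT4_8K_WINDOW = 8192  # GPT-4 (8k context)

print(check_request(6000, 2000, GPT4_8K_WINDOW))  # 8000
try:
    check_request(7000, 2000, GPT4_8K_WINDOW)
except ValueError as err:
    print(err)
```

Counting the requested output limit, not just the prompt, is what keeps the response from being cut off when the prompt alone would have fit.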

Estimating AI Costs

Token usage directly affects the cost of interactions with AI models. Estimation involves:

Count Total Tokens: Calculate the number of input and expected output tokens.
Match Against Pricing: Use the pricing table for the specific AI model to estimate costs.
Adjust Usage: Optimize text or response requirements to manage expenses effectively.

Current Pricing of AI Models

Below is an example of OpenAI’s GPT-4 pricing as of 2024:

GPT-4-8k
- Input Tokens: $0.03 per 1,000 tokens
- Output Tokens: $0.06 per 1,000 tokens
GPT-4-32k
- Input Tokens: $0.06 per 1,000 tokens
- Output Tokens: $0.12 per 1,000 tokens

Other models like GPT-3.5-turbo have lower costs, making them suitable for less demanding applications.
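The estimation steps above can be combined with the GPT-4 prices listed in a short sketch. The dictionary keys are informal labels for this example rather than official API model names, the token counts are made up, and prices should be re-checked against OpenAI's current pricing before use.

```python
# Prices in US dollars per 1,000 tokens, copied from the GPT-4 list above.
# Prices change over time, so check OpenAI's pricing page before relying on them.
PRICES_PER_1K = {
    "gpt-4-8k": {"input": 0.03, "output": 0.06},
    "gpt-4-32k": {"input": 0.06, "output": 0.12},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from its input and output token counts."""
    price = PRICES_PER_1K[model]
    input_cost = input_tokens / 1000 * price["input"]
    output_cost = output_tokens / 1000 * price["output"]
    return input_cost + output_cost


# Example: a 1,000-token prompt with a 500-token response.
for model in PRICES_PER_1K:
    print(f"{model}: ${estimate_cost(model, 1000, 500):.4f}")
```

For a 1,000-token prompt with a 500-token response, this gives $0.06 on GPT-4-8k and $0.12 on GPT-4-32k, since output tokens cost twice as much as input tokens on both.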

Conclusion

Understanding tokens and tokenization is crucial for using generative AI models efficiently and managing costs. Knowing how text maps to tokens lets you keep requests within context limits, estimate costs before sending requests, and trim prompts where it matters.

FAQ

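What are tokens in AI APIs?

Tokens are the basic units that the ChatGPT API and OpenAI's other language-model APIs work with; a token is often a piece of a word. Before processing a prompt, the API breaks the input into these tokens. Tokens can include spaces (GPT tokenizers typically attach a space to the start of the following word) and sub-words, so token boundaries do not necessarily line up with the start or end of words.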

What are token limits in AI APIs?

Requests to OpenAI's language models are constrained by a token limit shared between the prompt and the completion, which depends on the model's context window. For example, the text-davinci-003 model has a 4,097-token limit: if the prompt uses 4,000 tokens, the completion can use at most 97 tokens. Common ways to stay within these constraints include condensing the prompt or splitting the text into smaller chunks. Always verify the token limit of the specific model in use.
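The arithmetic in this example, and the chunking approach, can be sketched as follows. Both helpers are hypothetical, the token IDs are stand-ins for real tokenizer output, and `reserved_tokens` is assumed to cover whatever each chunk needs besides the text itself (instructions plus the completion).

```python
def completion_budget(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the completion when prompt and completion share one window."""
    if prompt_tokens > context_window:
        raise ValueError("The prompt alone exceeds the context window.")
    return context_window - prompt_tokens


def split_into_chunks(
    token_ids: list[int], context_window: int, reserved_tokens: int
) -> list[list[int]]:
    """Split a long token sequence so each chunk leaves room for reserved tokens."""
    chunk_size = context_window - reserved_tokens
    if chunk_size <= 0:
        raise ValueError("Reserved tokens must be fewer than the context window.")
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


# The text-davinci-003 example: a 4,000-token prompt in a 4,097-token window.
print(completion_budget(4097, 4000))  # 97

# A hypothetical 10,000-token document, reserving 500 tokens per request.
document = list(range(10_000))  # stand-in token IDs
chunks = split_into_chunks(document, 4097, 500)
print([len(c) for c in chunks])  # [3597, 3597, 2806]
```

Splitting on token IDs rather than characters guarantees that each chunk fits; a real implementation would usually also try to break at sentence or paragraph boundaries.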

What are the token limits of different models?

Model-Specific Token Limits:

The 4,097-token limit is specific to certain models such as text-davinci-003. Other models, such as GPT-4 and GPT-3.5, have different limits. For example:
- GPT-4 (8k context): 8,192 tokens.
- GPT-4 (32k context): 32,768 tokens.
- GPT-3.5-turbo: 4,096 tokens.

Dependent on Context Window:
The exact token limit depends on the model's context window, which varies across OpenAI's offerings.

Exact Token Distribution:
The limit covers more than the visible prompt and completion. In chat-based models such as GPT-3.5-turbo and GPT-4, system messages and the formatting tokens the API adds around each message also count toward it.

How is token pricing structured?

The API offers several models at different price points, each with a different range of capabilities: the GPT-4 models are the most capable, while gpt-3.5-turbo costs much less. Requests are billed per token, with separate input and output rates for the language models listed below; detailed information is available on OpenAI's pricing page.

How does the API process tokens?

The API does not process words directly. A model such as GPT-3 converts the input text into a list of tokens, processes those tokens, and converts the tokens it predicts back into text for the response. Because token boundaries depend on surrounding characters, such as a preceding space or capitalization, the same word can map to different tokens depending on where it appears in the text.
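The encode, process, decode round trip and the context effect can be illustrated with a toy vocabulary. This sketch is not how OpenAI's tokenizer works internally: it uses greedy longest-match instead of learned BPE merges, and `VOCAB` with its IDs is hypothetical. It does mimic one real property of GPT-style tokenizers, which store many words together with a leading space, so "AI" at the start of a text and " AI" in the middle of a sentence become different tokens.

```python
# Hypothetical toy vocabulary; a token's ID is its position in the list.
# Some entries include a leading space, as in GPT-style tokenizers.
VOCAB = ["AI", " AI", " is", " evolving", ".", "We", " study"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
MAX_TOKEN_LEN = max(len(token) for token in VOCAB)


def encode(text: str) -> list[int]:
    """Convert text to token IDs using greedy longest-match over the vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first so " evolving" beats shorter pieces.
        for end in range(min(len(text), pos + MAX_TOKEN_LEN), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos = end
                break
        else:
            raise ValueError(f"No token in the toy vocabulary matches {text[pos:]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Convert token IDs back to text by concatenating their strings."""
    return "".join(VOCAB[i] for i in ids)


print(encode("AI is evolving."))        # [0, 2, 3, 4]
print(encode("We study AI."))           # [5, 6, 1, 4]
print(decode(encode("We study AI.")))   # We study AI.
```

The word "AI" becomes ID 0 at the start of the first sentence but ID 1 in the second, because there it is preceded by a space.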

What are popular token tools?

The OpenAI interactive tokenizer tool helps calculate the number of tokens and shows how text is broken down into tokens. For programmatic tokenization, tiktoken is a fast BPE tokenizer designed for OpenAI models. Other libraries include the transformers package for Python and the gpt-3-encoder package for Node.js.

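How can tokens be counted for an OpenAI API call?

To count tokens for an OpenAI API call, follow these steps:

Identify the model the call will use, since the tokenizer and the context window depend on the model.
Tokenize the prompt (including any system and user messages) with that model's tokenizer, for example with tiktoken, to get the input token count.
Add the maximum number of output tokens you allow for the response to get the worst-case total for the call.
After the call, read the `usage` field in the API response, which reports `prompt_tokens`, `completion_tokens`, and `total_tokens`.
Track token usage to stay within any usage limits or quotas set by the API provider.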
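For the tracking step, the raw JSON returned by OpenAI's chat completions endpoint includes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The sketch below sums those fields across responses represented as plain dictionaries; the sample numbers and the `DAILY_TOKEN_BUDGET` quota are hypothetical.

```python
# Hypothetical responses, shaped like the `usage` part of OpenAI API JSON.
responses = [
    {"usage": {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}},
    {"usage": {"prompt_tokens": 300, "completion_tokens": 150, "total_tokens": 450}},
]

DAILY_TOKEN_BUDGET = 1_000  # hypothetical quota for this example


def summarize_usage(responses: list[dict]) -> dict[str, int]:
    """Add up the token counts reported in each response's `usage` field."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        for key in totals:
            totals[key] += response["usage"][key]
    return totals


totals = summarize_usage(responses)
print(totals)
remaining = DAILY_TOKEN_BUDGET - totals["total_tokens"]
print(f"Remaining budget: {remaining} tokens")
```

Relying on the reported `usage` values avoids drift between your own estimates and what the provider actually bills.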

What are the OpenAI guidelines for token count?

According to OpenAI:

1 token is approximately 4 characters in English.
1 token is approximately 3/4 of a word.
100 tokens are approximately 75 words.
1-2 sentences are approximately 30 tokens.
1 paragraph is approximately 100 tokens.
1,500 words are approximately 2,048 tokens.
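These rules of thumb can be turned into quick estimators for English text. This is a rough sketch; the constants come from the list above, and a real tokenizer such as `tiktoken` should be used whenever an exact count matters.

```python
import math

# OpenAI's rules of thumb for English text, from the list above.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate: about 3/4 of a word per token, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_chars("AI is evolving rapidly."))  # 6
print(estimate_tokens_from_words(1500))  # 2000
```

The rules do not agree exactly: the word-based rule gives 2,000 tokens for 1,500 words, while the list above quotes about 2,048, which is why these numbers are only estimates.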

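How can AI costs be estimated?

To estimate AI costs:

Determine the number of tokens in the input prompt, either by counting them with a tokenizer or by estimating from the word count (about 750 words per 1,000 tokens).
Calculate the input cost from that token count and the model's input price per 1,000 tokens.
Calculate the cost of the output generated by the AI model the same way, using the expected number of output tokens and the output price.
Add the input and output costs to get the total estimated price.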
For example, if an application calls the API 1000 times a day, calculate the daily and monthly costs based on the number of tokens used.
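The daily and monthly calculation can be sketched with the GPT-3.5-Turbo prices from the pricing table that follows. The workload numbers (calls per day, tokens per call, a 30-day month) are assumptions for illustration.

```python
# Assumed workload (hypothetical numbers for illustration).
CALLS_PER_DAY = 1000
INPUT_TOKENS_PER_CALL = 500
OUTPUT_TOKENS_PER_CALL = 200
DAYS_PER_MONTH = 30

# GPT-3.5-Turbo prices from the pricing table, in US dollars per 1,000 tokens.
INPUT_PRICE_PER_1K = 0.0010
OUTPUT_PRICE_PER_1K = 0.0020

# Input and output tokens are priced separately, then added per call.
cost_per_call = (
    INPUT_TOKENS_PER_CALL / 1000 * INPUT_PRICE_PER_1K
    + OUTPUT_TOKENS_PER_CALL / 1000 * OUTPUT_PRICE_PER_1K
)
daily_cost = cost_per_call * CALLS_PER_DAY
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"Per call: ${cost_per_call:.4f}")
print(f"Per day: ${daily_cost:.2f}")
print(f"Per month: ${monthly_cost:.2f}")
```

With these assumptions, each call costs about $0.0009, which comes to about $0.90 per day and $27 per 30-day month.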

What is the current pricing of AI models?

Pricing varies based on the model and is subject to change. Token prices are per 1,000 tokens, with 1,000 tokens being about 750 words. Model-specific pricing is as follows:

| Model | Input price (or per-unit price) | Output price |
|---|---|---|
| GPT-4 Turbo | $0.01 / 1K tokens | $0.03 / 1K tokens |
| gpt-4-1106-vision-preview | $0.01 / 1K tokens | $0.03 / 1K tokens |
| GPT-4 | $0.03 / 1K tokens | $0.06 / 1K tokens |
| gpt-4-32k | $0.06 / 1K tokens | $0.12 / 1K tokens |
| GPT-3.5-Turbo | $0.0010 / 1K tokens | $0.0020 / 1K tokens |
| gpt-3.5-turbo-instruct | $0.0015 / 1K tokens | $0.0020 / 1K tokens |
| Code Interpreter | $0.03 / session | n/a |
| DALL·E 3 Standard (1024×1024) | $0.040 / image | n/a |
| DALL·E 3 HD (1024×1024) | $0.080 / image | n/a |
