Workers AI provides access to AI models directly from Workers. Use the API to run inference on text, images, and embeddings.
Overview
Access the Workers AI API through the operations described below.
Run inference
Execute AI models on-demand.
Text generation
Generate text using large language models.
The AI model identifier (e.g., '@cf/meta/llama-2-7b-chat-int8')
Your Cloudflare account ID
The input text prompt
Maximum number of tokens to generate
Controls randomness (0.0 to 1.0, higher = more random)
Nucleus sampling parameter
Top-k sampling parameter
The generated text response
Token usage statistics
Number of tokens in the prompt
Number of tokens in the completion
Total number of tokens used
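The parameters and response fields above can be sketched as a single inference request against the Cloudflare REST endpoint `POST /accounts/{account_id}/ai/run/{model_name}`. The field names (`prompt`, `max_tokens`, `temperature`, `top_p`, `top_k`, `response`, `usage`) are not stated in the text above and are assumed here; `buildRunRequest`, `totalTokens`, and the placeholder credentials are hypothetical names introduced for illustration.

```typescript
// Assumed request fields; the section above lists descriptions only.
interface TextGenerationParams {
  prompt: string;
  max_tokens?: number;
  temperature?: number; // 0.0 to 1.0, higher = more random
  top_p?: number;
  top_k?: number;
}

// Assumed response shape, mirroring the response fields listed above.
interface TextGenerationResult {
  response: string;
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) an inference request.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  params: TextGenerationParams
): RunRequest {
  if (params.temperature !== undefined && (params.temperature < 0 || params.temperature > 1)) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }
  return {
    // The model identifier contains slashes and is appended to the path as-is.
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(params),
  };
}

// Hypothetical helper: reads the total token count when usage statistics are present.
function totalTokens(result: TextGenerationResult): number | undefined {
  return result.usage?.total_tokens;
}

const runRequest = buildRunRequest("ACCOUNT_ID", "API_TOKEN", "@cf/meta/llama-2-7b-chat-int8", {
  prompt: "Explain what a Worker is in one sentence.",
  max_tokens: 128,
  temperature: 0.7,
});
```

Sending `runRequest` with `fetch(runRequest.url, runRequest)` should return a JSON envelope; the generated text is assumed to sit under its `result` property.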
Chat completion
Generate chat responses using a messages format.
Array of message objects
Message role: 'system', 'user', or 'assistant'
Message content
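The messages format can be sketched as a typed conversation history that grows by one message per turn. The field names `messages`, `role`, `content`, and `max_tokens` are assumptions (the text above gives descriptions only), and `appendMessage` and `buildChatBody` are hypothetical helpers.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical helper: returns a new conversation with one more message appended.
function appendMessage(
  conversation: readonly ChatMessage[],
  role: Role,
  content: string
): ChatMessage[] {
  return [...conversation, { role, content }];
}

// Hypothetical helper: serializes the conversation into a request body.
// `messages` and `max_tokens` are assumed field names.
function buildChatBody(conversation: readonly ChatMessage[], maxTokens?: number): string {
  const body: { messages: readonly ChatMessage[]; max_tokens?: number } = {
    messages: conversation,
  };
  if (maxTokens !== undefined) {
    body.max_tokens = maxTokens;
  }
  return JSON.stringify(body);
}

let conversation: ChatMessage[] = [
  { role: "system", content: "You are a concise assistant." },
];
conversation = appendMessage(conversation, "user", "What is an embedding?");
const chatBody = buildChatBody(conversation, 256);
```

After each model reply, appending it with the `assistant` role keeps the full exchange available for the next request.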
Text embeddings
Generate vector embeddings for text.
The embedding model (e.g., '@cf/baai/bge-base-en-v1.5')
Text string or array of strings to embed
Array of embedding vectors
Shape of the embedding array [count, dimensions]
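Embedding vectors are usually compared rather than read directly. The sketch below checks that a response matches its declared `[count, dimensions]` shape and computes cosine similarity between two vectors. The field names `shape` and `data` are assumed, the sample vectors are made up for illustration, and `assertShape` and `cosineSimilarity` are hypothetical helpers.

```typescript
// Assumed response shape: `shape` is [count, dimensions], `data` holds one vector per input.
interface EmbeddingResult {
  shape: [number, number];
  data: number[][];
}

// Throws when `data` does not match the declared shape.
function assertShape(result: EmbeddingResult): void {
  const [count, dims] = result.shape;
  if (result.data.length !== count || result.data.some((v) => v.length !== dims)) {
    throw new Error("embedding data does not match declared shape");
  }
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: 1 for the same direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Made-up two-dimensional vectors; real embeddings have many more dimensions.
const vecA = [1, 0];
const vecB = [0, 1];
const vecC = [1, 1];
const sampleEmbeddings: EmbeddingResult = { shape: [3, 2], data: [vecA, vecB, vecC] };

assertShape(sampleEmbeddings);
const similarityAB = cosineSimilarity(vecA, vecB);
const similarityAC = cosineSimilarity(vecA, vecC);
```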
Text classification
Classify text into categories.
Classification label
Confidence score (0-1)
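A classification response is typically reduced to its most confident label. The sketch below assumes the response is an array of objects with `label` and `score` fields (the names are not given above); `topLabel` is a hypothetical helper and the sample values are made up.

```typescript
// Assumed result entry: a label with a confidence score between 0 and 1.
interface Classification {
  label: string;
  score: number;
}

// Returns the highest-scoring entry at or above `minScore`, or undefined if none qualifies.
function topLabel(
  results: readonly Classification[],
  minScore = 0
): Classification | undefined {
  let best: Classification | undefined;
  for (const entry of results) {
    if (entry.score >= minScore && (best === undefined || entry.score > best.score)) {
      best = entry;
    }
  }
  return best;
}

// Made-up sample results for illustration.
const sampleResults: Classification[] = [
  { label: "NEGATIVE", score: 0.02 },
  { label: "POSITIVE", score: 0.98 },
];
const bestMatch = topLabel(sampleResults);
const strictMatch = topLabel(sampleResults, 0.99);
```

Passing a `minScore` threshold lets callers treat low-confidence results as "no decision" instead of acting on a weak label.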
Translation
Translate text between languages.
Text to translate
Source language code (e.g., 'en')
Target language code (e.g., 'es')
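When one text needs several target languages, one request body per language can be prepared up front. This is a minimal sketch assuming the field names `text`, `source_lang`, and `target_lang`, which do not appear in the text above; `buildTranslationBodies` is a hypothetical helper.

```typescript
// Assumed request fields for a translation call.
interface TranslationParams {
  text: string;
  source_lang: string;
  target_lang: string;
}

// Builds one request body per target language, skipping the source language itself.
function buildTranslationBodies(
  text: string,
  sourceLang: string,
  targetLangs: readonly string[]
): TranslationParams[] {
  return targetLangs
    .filter((lang) => lang !== sourceLang)
    .map((lang) => ({ text, source_lang: sourceLang, target_lang: lang }));
}

const translationBodies = buildTranslationBodies("Good morning", "en", ["es", "fr", "en"]);
```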
Summarization
Summarize long text.
Text to summarize
Maximum summary length in tokens
Text-to-image
Generate images from text descriptions.
Text description of the image to generate
Things to avoid in the image
Number of diffusion steps (higher = better quality, slower)
How closely to follow the prompt (1-20)
Image width in pixels
Image height in pixels
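The numeric constraints above can be checked on the client before a request is sent. The sketch below assumes the field names `prompt`, `negative_prompt`, `num_steps`, `guidance`, `width`, and `height`; only the 1-20 guidance range comes from the text above, while the positive-integer checks are a defensive assumption of this example. `validateImageParams` is a hypothetical helper.

```typescript
// Assumed request fields for a text-to-image call.
interface TextToImageParams {
  prompt: string;
  negative_prompt?: string;
  num_steps?: number;
  guidance?: number; // 1 to 20, per the parameter description
  width?: number;
  height?: number;
}

function checkPositiveInt(name: string, value: number | undefined, problems: string[]): void {
  if (value !== undefined && (!Number.isInteger(value) || value < 1)) {
    problems.push(`${name} must be a positive integer`);
  }
}

// Returns a list of problems; an empty list means the parameters passed every check.
function validateImageParams(params: TextToImageParams): string[] {
  const problems: string[] = [];
  if (params.prompt.trim() === "") {
    problems.push("prompt must not be empty");
  }
  if (params.guidance !== undefined && (params.guidance < 1 || params.guidance > 20)) {
    problems.push("guidance must be between 1 and 20");
  }
  checkPositiveInt("num_steps", params.num_steps, problems);
  checkPositiveInt("width", params.width, problems);
  checkPositiveInt("height", params.height, problems);
  return problems;
}

const validProblems = validateImageParams({
  prompt: "a lighthouse at dusk",
  negative_prompt: "blurry",
  guidance: 7.5,
  width: 512,
  height: 512,
});
const invalidProblems = validateImageParams({ prompt: "a lighthouse", guidance: 25, width: -1 });
```

Returning a list of problems rather than throwing on the first one lets a user-facing form report every invalid field at once.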
Image classification
Classify images into categories.
Image data as array of integers (8-bit unsigned)
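Binary image data usually arrives as a `Uint8Array`, so it has to be converted to a plain array of integers before it is serialized as JSON. The sketch below shows one way to do that; the field name `image` is an assumption, and `isByteArray` and `buildImageBody` are hypothetical helpers.

```typescript
// True when every value is an integer in the 8-bit unsigned range 0..255.
function isByteArray(values: readonly number[]): boolean {
  return values.every((v) => Number.isInteger(v) && v >= 0 && v <= 255);
}

// Converts raw bytes to a JSON body; `image` is the assumed field name.
function buildImageBody(bytes: Uint8Array): string {
  const values = Array.from(bytes);
  return JSON.stringify({ image: values });
}

// The first four bytes of a PNG file, used here as stand-in image data.
const pngHeader = new Uint8Array([137, 80, 78, 71]);
const imageBody = buildImageBody(pngHeader);
```

The same conversion applies to any input described as an array of integers, such as audio data.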
Automatic speech recognition
Transcribe audio to text.
Audio data as array of integers
Source language of the audio
The transcribed text
Number of words transcribed
Models
Browse and discover available AI models.
List models
Retrieve all available AI models.
Your Cloudflare account ID
Model identifier
Model description
Task type (e.g., 'text-generation', 'text-embeddings')
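A model listing is often grouped by task so that a suitable model can be picked for each job. The sketch below assumes a flat entry with `name`, `description`, and `task` fields, mirroring the three response fields above (the actual response may be nested differently), and assumes the listing is served from the `/ai/models/search` path. `modelsUrl` and `groupByTask` are hypothetical helpers and the sample entries are illustrative.

```typescript
// Assumed flat entry shape, mirroring the response fields described above.
interface ModelSummary {
  name: string;
  description: string;
  task: string;
}

// Assumed listing endpoint for an account.
function modelsUrl(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/models/search`;
}

// Groups model names by task type.
function groupByTask(models: readonly ModelSummary[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const model of models) {
    const names = groups.get(model.task) ?? [];
    names.push(model.name);
    groups.set(model.task, names);
  }
  return groups;
}

// Illustrative entries using the model identifiers mentioned on this page.
const sampleModels: ModelSummary[] = [
  { name: "@cf/meta/llama-2-7b-chat-int8", description: "Chat model", task: "text-generation" },
  { name: "@cf/baai/bge-base-en-v1.5", description: "Embedding model", task: "text-embeddings" },
];
const modelsByTask = groupByTask(sampleModels);
```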
Using Workers AI in Workers
Bind Workers AI to your Worker so that models can be called from Worker code.
Best practices
- Model selection: Choose the right model for your task (size vs. performance)
- Caching: Cache embeddings and frequently used results
- Rate limiting: Implement rate limiting for user-facing applications
- Error handling: Handle model errors gracefully with fallbacks
- Streaming: Use streaming for long-running text generation
- Context length: Be mindful of model context limits
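The binding approach from "Using Workers AI in Workers" and the error-handling practice above can be combined in one sketch. It assumes the binding is named `AI` in the Worker's configuration and exposes a `run(model, inputs)` method; `AiBinding` is a minimal stand-in type declared here so the example stays self-contained, `runWithFallback` and `handlePrompt` are hypothetical helpers, and the second model identifier is illustrative.

```typescript
// Minimal stand-in for the AI binding; in a real Worker the runtime provides it.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumes the binding is named "AI" in the Worker's configuration
}

// Tries each model in order and returns the first successful result.
// If every model fails, the last error is rethrown.
async function runWithFallback(
  ai: AiBinding,
  models: readonly string[],
  inputs: Record<string, unknown>
): Promise<{ model: string; result: unknown }> {
  let lastError: unknown = new Error("no models supplied");
  for (const model of models) {
    try {
      return { model, result: await ai.run(model, inputs) };
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Worker-style handler, written as a plain function to keep the sketch self-contained.
async function handlePrompt(env: Env, prompt: string): Promise<string> {
  const { result } = await runWithFallback(
    env.AI,
    ["@cf/meta/llama-2-7b-chat-int8", "@cf/mistral/mistral-7b-instruct-v0.1"],
    { prompt, max_tokens: 256 }
  );
  return JSON.stringify(result);
}
```

Inside a Worker's `fetch` handler, `handlePrompt(env, prompt)` would be called with the `env` object the runtime passes in.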