Replace `[ENDPOINT_ID]` with your Serverless endpoint ID.
| Endpoint | Description | Status |
|---|---|---|
| `/chat/completions` | Generate chat model completions | Fully supported |
| `/completions` | Generate text completions | Fully supported |
| `/models` | List available models | Supported |
The `MODEL_NAME` environment variable is essential for all OpenAI-compatible API requests. This variable corresponds to either:

- The model you deployed (for example, `mistralai/Mistral-7B-Instruct-v0.2`).
- A custom name, if you set `OPENAI_SERVED_MODEL_NAME_OVERRIDE` as an environment variable.

The `/chat/completions` endpoint is designed for instruction-tuned LLMs that follow a chat format.
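As a sketch, a chat completion request can be sent with only the Python standard library (the official `openai` SDK works the same way if you point its `base_url` at your endpoint). The endpoint ID, API key, and base-URL pattern below are placeholder assumptions to adapt:

```python
import json
import os
import urllib.request

# Placeholders: substitute your own Serverless endpoint ID and Runpod API key.
ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID", "your-endpoint-id")
API_KEY = os.environ.get("RUNPOD_API_KEY", "")

# Assumed OpenAI-compatible base-URL pattern for a Runpod Serverless endpoint.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/chat/completions"

payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # must match MODEL_NAME
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Runpod API key, not an OpenAI key
        "Content-Type": "application/json",
    },
)

# Only send the request when credentials are actually configured.
if API_KEY:
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The response follows the standard OpenAI schema, so the generated text lives under `choices[0].message.content`.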
The `/completions` endpoint is designed for base LLMs and text completion tasks.
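The request body differs from chat completions in one key way: it takes a raw `prompt` string instead of a `messages` list, and the generated text comes back under `choices[0].text`. A minimal sketch, under the same assumed base-URL pattern and placeholder credentials:

```python
import json
import os
import urllib.request

# Placeholders: substitute your own endpoint ID and Runpod API key.
ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID", "your-endpoint-id")
API_KEY = os.environ.get("RUNPOD_API_KEY", "")

# Assumed base-URL pattern; /completions takes a raw prompt, not messages.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/completions"

payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # must match MODEL_NAME
    "prompt": "The capital of France is",
    "max_tokens": 16,
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
)

if API_KEY:  # skip the network call when no key is configured
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    print(body["choices"][0]["text"])  # completions return plain text, not a message
```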
The `/models` endpoint allows you to retrieve the list of models available on your endpoint:
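This is a simple authenticated GET request; in the OpenAI-style list response, the models sit under the `data` key. A sketch with the same placeholder endpoint ID and assumed base-URL pattern:

```python
import json
import os
import urllib.request

# Placeholders: substitute your own endpoint ID and Runpod API key.
ENDPOINT_ID = os.environ.get("RUNPOD_ENDPOINT_ID", "your-endpoint-id")
API_KEY = os.environ.get("RUNPOD_API_KEY", "")

# Assumed base-URL pattern; /models needs no request body.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/models"
request = urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})

if API_KEY:  # skip the network call when no key is configured
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    # Each entry's "id" is the name to pass as "model" in other requests.
    print([model["id"] for model in body["data"]])
```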
| Variable | Default | Description |
|---|---|---|
| `RAW_OPENAI_OUTPUT` | `1` (true) | Enables raw OpenAI SSE format for streaming |
| `OPENAI_SERVED_MODEL_NAME_OVERRIDE` | None | Overrides the model name in responses |
| `OPENAI_RESPONSE_ROLE` | `assistant` | Role for responses in chat completions |
| Issue | Solution |
|---|---|
| "Invalid model" error | Verify that your model name matches what you deployed |
| Authentication error | Check that you're using your Runpod API key, not an OpenAI key |
| Timeout errors | Increase client timeout settings for large models |
| Incompatible responses | Set `RAW_OPENAI_OUTPUT=1` in your environment variables |
| Different response format | Some models may format output differently; use a chat template |