OpenAI API WebSocket Mode Guide
WebSocket mode is a specialized feature of OpenAI's Responses API designed for long-running, tool-call-heavy workflows. It enables developers to maintain persistent WebSocket connections to `/v1/responses` and continue each turn by sending only new input items along with a `previous_response_id`.
Key Features
- Persistent Connections: Maintain a single WebSocket connection for multiple response turns, reducing connection overhead
- Incremental Inputs: Send only new input items (like tool outputs and user messages) rather than full context each time
- Lower Latency: Achieve up to 40% faster end-to-end execution for workflows with 20+ tool calls
- Compatibility: Works with both Zero Data Retention (ZDR) and `store=false` modes
- Connection-Local Caching: The service keeps previous-response state in memory for fast continuation
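The incremental-input idea above can be sketched as a payload builder: each turn carries only the new items (tool outputs, user messages) plus the id of the prior response. This is a minimal sketch; the field names and message shape are assumptions for illustration, not the official wire format — only `previous_response_id` and the incremental-input concept come from this guide.

```python
import json

def build_turn(previous_response_id, new_items, model="gpt-4.1"):
    """Build an incremental turn payload: only the new input items plus the
    id of the previous response, rather than the full conversation history.
    Field names here are illustrative, not the documented wire format."""
    payload = {"model": model, "input": new_items}
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

# First turn: no previous response yet, so the user message itself is sent.
first = build_turn(None, [{"role": "user", "content": "List files in /tmp"}])

# Later turn: only the new tool output is sent, chained to the prior response.
follow_up = build_turn(
    "resp_123",
    [{"type": "function_call_output", "call_id": "call_1", "output": "a.txt b.txt"}],
)
```

Note how each follow-up message stays small regardless of how long the conversation has grown, which is what makes the persistent connection cheap to keep feeding.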
Use Cases
- Agentic Coding: Long chains of code generation and tool execution
- Orchestration Loops: Workflows with repeated tool calls and model interactions
- Real-time Applications: Scenarios requiring minimal latency between model-tool round trips
- High-Volume Tool Usage: Applications where each workflow involves many tool interactions
Technical Implementation
The WebSocket mode uses the same previous_response_id chaining semantics as HTTP mode but adds a lower-latency continuation path on the active socket. When continuing from the most recent response, the service can reuse connection-local state stored in memory, providing significant performance benefits for sequential workflows.
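The chaining semantics described above amount to a sequential loop: send, read the response id, execute any tool calls, and continue from the most recent id so the server can reuse connection-local state. The sketch below abstracts the socket behind a `send` callable and assumes a response shape with `id` and `output` fields; the exact shapes are illustrative.

```python
def run_tool_loop(send, initial_items, execute_tool, max_turns=10):
    """Drive a sequential tool loop on one connection. Each turn sends only
    the new items and chains via the most recent response id, matching the
    continuation path described in the guide. `send(items, prev_id)` stands
    in for the actual socket round trip."""
    items = initial_items
    prev_id = None
    response = None
    for _ in range(max_turns):
        response = send(items, prev_id)
        prev_id = response["id"]  # always chain from the latest response
        calls = [o for o in response.get("output", [])
                 if o.get("type") == "function_call"]
        if not calls:
            return response  # model produced a final answer, no tools requested
        # Only the new tool outputs become the next turn's input.
        items = [
            {"type": "function_call_output", "call_id": c["call_id"],
             "output": execute_tool(c)}
            for c in calls
        ]
    return response
```

Because continuation always targets the most recent response, every turn in this loop stays on the fast in-memory path rather than forcing the server to rehydrate older state.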
Connection Management
- Sequential Processing: A single WebSocket connection processes responses sequentially (one at a time)
- 60-Minute Limit: Connections are limited to a maximum duration of 60 minutes
- Reconnection Patterns: Multiple strategies for reconnecting after connection closure
- Error Handling: Specific error codes such as `previous_response_not_found` and `websocket_connection_limit_reached`
Integration with Other Features
- Compaction: Works with both server-side compaction (`context_management`) and the standalone `/responses/compact` endpoint
- Streaming Events: Follows the existing Responses streaming event model
- Warm-up Requests: Supports `generate: false` requests to prepare request state before actual generation
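A warm-up request can be sketched as a normal turn payload with generation disabled, letting the server prepare request state (tools, chained context) ahead of the real call. Only the `generate: false` flag and `previous_response_id` come from this guide; the other field names are assumptions for illustration.

```python
import json

def build_warmup(previous_response_id, tools):
    """Build a warm-up payload that prepares request state without producing
    output. Apart from `generate`, the field names are illustrative."""
    return json.dumps({
        "generate": False,  # prepare state only; no model generation
        "previous_response_id": previous_response_id,
        "tools": tools,
    })

warmup = build_warmup("resp_123", [{"type": "function", "name": "list_files"}])
```

Sending such a request while the client is still gathering input can hide setup latency, so the subsequent real turn starts generating immediately.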

