OpenAI API WebSocket Mode Guide
WebSocket mode is a specialized feature of OpenAI's Responses API designed for long-running, tool-call-heavy workflows. It enables developers to maintain persistent WebSocket connections to `/v1/responses` and continue each turn by sending only new input items along with a `previous_response_id`.
Key Features
- Persistent Connections: Maintain a single WebSocket connection for multiple response turns, reducing connection overhead
- Incremental Inputs: Send only new input items (like tool outputs and user messages) rather than full context each time
- Lower Latency: Achieve up to 40% faster end-to-end execution for workflows with 20+ tool calls
- Compatibility: Works with both Zero Data Retention (ZDR) and `store=false` modes
- Connection-Local Caching: The service keeps previous-response state in memory for fast continuation
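The incremental-input idea above can be sketched as a payload builder: each turn carries only the new items (tool outputs, user messages) plus the id of the prior response. This is a minimal sketch; the field names and message shape are assumptions for illustration, not the official wire format — only `previous_response_id` and the incremental-input concept come from this guide.

```python
import json

def build_turn(previous_response_id, new_items, model="gpt-4.1"):
    """Build an incremental turn payload: only the new input items plus the
    id of the previous response, rather than the full conversation history.
    Field names here are illustrative, not the documented wire format."""
    payload = {"model": model, "input": new_items}
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

# First turn: no previous response yet, so the user message itself is sent.
first = build_turn(None, [{"role": "user", "content": "List files in /tmp"}])

# Later turn: only the new tool output is sent, chained to the prior response.
follow_up = build_turn(
    "resp_123",
    [{"type": "function_call_output", "call_id": "call_1", "output": "a.txt b.txt"}],
)
```

Note how each follow-up message stays small regardless of how long the conversation has grown, which is what makes the persistent connection cheap to keep feeding.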
Use Cases
- Agentic Coding: Long chains of code generation and tool execution
- Orchestration Loops: Workflows with repeated tool calls and model interactions
- Real-time Applications: Scenarios requiring minimal latency between model-tool round trips
- High-Volume Tool Usage: Applications where each workflow involves many tool interactions
Technical Implementation
The WebSocket mode uses the same previous_response_id chaining semantics as HTTP mode but adds a lower-latency continuation path on the active socket. When continuing from the most recent response, the service can reuse connection-local state stored in memory, providing significant performance benefits for sequential workflows.
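The chaining semantics described above amount to a sequential loop: send, read the response id, execute any tool calls, and continue from the most recent id so the server can reuse connection-local state. The sketch below abstracts the socket behind a `send` callable and assumes a response shape with `id` and `output` fields; the exact shapes are illustrative.

```python
def run_tool_loop(send, initial_items, execute_tool, max_turns=10):
    """Drive a sequential tool loop on one connection. Each turn sends only
    the new items and chains via the most recent response id, matching the
    continuation path described in the guide. `send(items, prev_id)` stands
    in for the actual socket round trip."""
    items = initial_items
    prev_id = None
    response = None
    for _ in range(max_turns):
        response = send(items, prev_id)
        prev_id = response["id"]  # always chain from the latest response
        calls = [o for o in response.get("output", [])
                 if o.get("type") == "function_call"]
        if not calls:
            return response  # model produced a final answer, no tools requested
        # Only the new tool outputs become the next turn's input.
        items = [
            {"type": "function_call_output", "call_id": c["call_id"],
             "output": execute_tool(c)}
            for c in calls
        ]
    return response
```

Because continuation always targets the most recent response, every turn in this loop stays on the fast in-memory path rather than forcing the server to rehydrate older state.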
Connection Management
- Sequential Processing: A single WebSocket connection processes responses sequentially (one at a time)
- 60-Minute Limit: Connections are limited to a maximum duration of 60 minutes
- Reconnection Patterns: Multiple strategies for reconnecting after connection closure
- Error Handling: Specific error codes such as `previous_response_not_found` and `websocket_connection_limit_reached`
Integration with Other Features
- Compaction: Works with both server-side compaction (`context_management`) and the standalone `/responses/compact` endpoint
- Streaming Events: Follows the existing Responses streaming event model
- Warm-up Requests: Supports `generate: false` requests to prepare request state before actual generation
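A warm-up request can be sketched as a normal turn payload with generation disabled, letting the server prepare request state (tools, chained context) ahead of the real call. Only the `generate: false` flag and `previous_response_id` come from this guide; the other field names are assumptions for illustration.

```python
import json

def build_warmup(previous_response_id, tools):
    """Build a warm-up payload that prepares request state without producing
    output. Apart from `generate`, the field names are illustrative."""
    return json.dumps({
        "generate": False,  # prepare state only; no model generation
        "previous_response_id": previous_response_id,
        "tools": tools,
    })

warmup = build_warmup("resp_123", [{"type": "function", "name": "list_files"}])
```

Sending such a request while the client is still gathering input can hide setup latency, so the subsequent real turn starts generating immediately.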

