
WebSocket Mode

Learn how to use OpenAI's Responses API WebSocket mode for lower-latency agentic workflows with persistent connections and incremental inputs.

Introduction


WebSocket mode is a specialized feature of OpenAI's Responses API designed for long-running, tool-call-heavy workflows. It enables developers to maintain persistent WebSocket connections to /v1/responses and continue each turn by sending only new input items along with a previous_response_id.
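To make the request shape concrete, here is a minimal sketch of a first-turn frame a client might send over an open WebSocket to /v1/responses. The field names and the model id are illustrative assumptions, not the authoritative schema; consult the official Responses API reference for exact wire format.

```python
import json

# Hypothetical first-turn request frame for a WebSocket connection to
# /v1/responses. Field names and the model id are assumptions for
# illustration only.
first_turn = {
    "model": "gpt-4.1",          # assumed model id
    "input": [
        {"role": "user", "content": "List the files in the repo."}
    ],
    "stream": True,              # WebSocket mode follows the streaming event model
}

frame = json.dumps(first_turn)   # serialized frame to send on the socket
```

Later turns then reference this response's id rather than resending the full conversation.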

Key Features
  • Persistent Connections: Maintain a single WebSocket connection for multiple response turns, reducing connection overhead
  • Incremental Inputs: Send only new input items (like tool outputs and user messages) rather than full context each time
  • Lower Latency: Achieve up to 40% faster end-to-end execution for workflows with 20+ tool calls
  • Compatibility: Works with both Zero Data Retention (ZDR) and store=false modes
  • Connection-Local Caching: The service keeps previous-response state in memory for fast continuation
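The incremental-input feature above can be sketched as a continuation frame that carries only the new items plus `previous_response_id`. The response id, call id, and field names here are hypothetical placeholders.

```python
import json

# Hypothetical continuation frame: only new input items are sent, plus
# previous_response_id pointing at the prior turn. Ids and field names
# below are illustrative assumptions.
continuation = {
    "previous_response_id": "resp_abc123",  # hypothetical id from the prior turn
    "input": [
        {
            "type": "function_call_output",
            "call_id": "call_xyz",          # hypothetical tool-call id
            "output": "README.md\nsrc/",    # the tool's result
        }
    ],
}

frame = json.dumps(continuation)
```

Because the service keeps connection-local state for the most recent response, this small frame is all the client needs to send to continue the turn.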
Use Cases
  • Agentic Coding: Long chains of code generation and tool execution
  • Orchestration Loops: Workflows with repeated tool calls and model interactions
  • Real-time Applications: Scenarios requiring minimal latency between model-tool round trips
  • High-Volume Tool Usage: Applications where each workflow involves many tool interactions
Technical Implementation

The WebSocket mode uses the same previous_response_id chaining semantics as HTTP mode but adds a lower-latency continuation path on the active socket. When continuing from the most recent response, the service can reuse connection-local state stored in memory, providing significant performance benefits for sequential workflows.
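The chaining pattern described above can be sketched as a simple agentic loop. `send_turn` below is a stub standing in for a real WebSocket round trip, and the response ids and stop condition are illustrative; the point is that every turn after the first carries only `previous_response_id` plus the new items.

```python
import itertools

# Stub transport: stands in for a WebSocket round trip to /v1/responses.
# Pretends the service finishes after three turns.
_ids = itertools.count(1)

def send_turn(payload):
    n = next(_ids)
    return {"id": f"resp_{n}", "done": n >= 3}

def run_loop(first_input):
    """Drive a tool-call loop using previous_response_id chaining."""
    payload = {"input": first_input}
    while True:
        resp = send_turn(payload)
        if resp["done"]:
            return resp["id"]
        # Continue by sending only the new items plus the prior response id,
        # rather than replaying the full context.
        payload = {
            "previous_response_id": resp["id"],
            "input": [{"type": "function_call_output", "output": "ok"}],
        }
```

In HTTP mode each iteration would open a new request; on a persistent socket the same loop reuses the connection and the service's in-memory state for the latest response.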

Connection Management
  • Sequential Processing: A single WebSocket connection processes responses sequentially (one at a time)
  • 60-Minute Limit: Connections have a maximum duration of 60 minutes
  • Reconnection Patterns: Multiple strategies for reconnecting after connection closure
  • Error Handling: Specific error codes like previous_response_not_found and websocket_connection_limit_reached
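One reconnection pattern implied by the error codes above is a replay fallback: if a continuation fails with `previous_response_not_found` (for example, after the socket that held the cached state has closed), the client resends the full accumulated context on a fresh connection. The transport and exception below are stubs for illustration.

```python
# Sketch of a replay fallback after previous_response_not_found.
# send_turn is a stub for a real WebSocket round trip; the flag
# server_has_state simulates whether connection-local state survived.

class PreviousResponseNotFound(Exception):
    """Stand-in for the previous_response_not_found error code."""

def send_turn(payload, server_has_state):
    if "previous_response_id" in payload and not server_has_state:
        raise PreviousResponseNotFound()
    return {"id": "resp_new"}

def continue_or_replay(prev_id, new_items, full_context, server_has_state):
    try:
        # Fast path: incremental continuation on the existing state
        return send_turn(
            {"previous_response_id": prev_id, "input": new_items},
            server_has_state,
        )
    except PreviousResponseNotFound:
        # Fallback: replay the full context without a previous_response_id
        return send_turn({"input": full_context}, server_has_state)
```

Keeping a client-side copy of the full context makes this fallback cheap to implement, at the cost of extra memory on the client.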
Integration with Other Features
  • Compaction: Works with both server-side compaction (context_management) and standalone /responses/compact
  • Streaming Events: Follows the existing Responses streaming event model
  • Warm-up Requests: Supports generate: false requests to prepare request state before actual generation
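A warm-up request might look like the frame below: the same shape as a normal request but with `generate: false`, so the service can prepare request state without producing output. The surrounding field names and model id are assumptions for illustration.

```python
import json

# Hypothetical warm-up frame: "generate": false prepares request state
# ahead of the real turn without generating a response. Field names and
# the model id are illustrative assumptions.
warmup = {
    "model": "gpt-4.1",          # assumed model id
    "input": [{"role": "user", "content": "Pre-load this context."}],
    "generate": False,
}

frame = json.dumps(warmup)
```

A client could send this while the user is still typing, then follow up with a normal generating request that reuses the prepared state.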
