Implement auto-reconnection for SUB sockets #230

Merged
rgbkrk merged 2 commits into master from quill/surabaya-v1 on Mar 4, 2026

rgbkrk commented Mar 4, 2026

Summary

This PR implements automatic reconnection for zmq.rs SUB sockets. When a connected peer disconnects, a background task reconnects with exponential backoff and re-sends the socket's subscriptions once the connection is re-established. This restores the standard ZeroMQ behavior in which SUB sockets maintain connectivity.
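
To illustrate the "subscriptions are re-sent on reconnection" part, here is a minimal sketch, not the code in this PR. It assumes the socket keeps a set of active topics and replays them as ZMTP 3.0 style subscribe frames (a leading `0x01` byte followed by the topic; `0x00` cancels). The `Subscriptions` type and its methods are hypothetical names, and the real zmq.rs internals may be organized differently.

```rust
use std::collections::BTreeSet;

// ZMTP 3.0 style subscription frames: first byte 1 = subscribe, 0 = cancel.
const SUBSCRIBE: u8 = 0x01;
const CANCEL: u8 = 0x00;

/// Build a subscription frame: one kind byte followed by the topic bytes.
fn frame(kind: u8, topic: &[u8]) -> Vec<u8> {
    let mut f = Vec::with_capacity(1 + topic.len());
    f.push(kind);
    f.extend_from_slice(topic);
    f
}

/// Hypothetical record of the topics a SUB socket is currently subscribed to.
#[derive(Default)]
struct Subscriptions {
    topics: BTreeSet<Vec<u8>>,
}

impl Subscriptions {
    /// Remember the topic and return the frame to send to connected peers.
    fn subscribe(&mut self, topic: &[u8]) -> Vec<u8> {
        self.topics.insert(topic.to_vec());
        frame(SUBSCRIBE, topic)
    }

    /// Forget the topic and return the cancel frame.
    fn unsubscribe(&mut self, topic: &[u8]) -> Vec<u8> {
        self.topics.remove(topic);
        frame(CANCEL, topic)
    }

    /// Frames to send on a freshly re-established connection.
    fn replay(&self) -> Vec<Vec<u8>> {
        self.topics.iter().map(|t| frame(SUBSCRIBE, t)).collect()
    }
}

fn main() {
    let mut subs = Subscriptions::default();
    subs.subscribe(b"weather");
    subs.subscribe(b"news");
    subs.unsubscribe(b"news");
    // After a reconnect, only the still-active topics are replayed.
    for f in subs.replay() {
        println!("{:?}", f);
    }
}
```

Keeping the topic set on the socket rather than per connection is what lets a new connection be brought back to the same subscription state without the application re-subscribing.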

Implementation Details

  • New src/reconnect.rs module providing ReconnectConfig and spawn_reconnect_task
  • FairQueue enhancements with on_disconnect callback for detecting peer disconnections
  • Backend shutdown improvements to ensure TCP connections close during cleanup
  • SubSocketBackend supports registering disconnect notifiers for reconnection tasks
  • SubSocket overrides connect() to spawn background reconnection monitors
  • Previously-ignored reconnection_compliant test now passes

rgbkrk added 2 commits March 4, 2026 12:57
This commit adds automatic reconnection support to zmq.rs SUB sockets for the case where a connected peer disconnects and later comes back. When a connection is lost, a background task attempts to reconnect with exponential backoff (100ms to 30s); on successful reconnection, subscriptions are automatically re-sent to restore the subscription state.
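
The backoff schedule can be sketched as follows. This is only an illustration under stated assumptions: the 100ms and 30s bounds come from the description above, but the doubling factor, the `BackoffConfig` name, and its fields are hypothetical and may not match the actual `ReconnectConfig`.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the PR's `ReconnectConfig`; field names are assumptions.
#[derive(Debug, Clone, Copy)]
struct BackoffConfig {
    initial: Duration,
    max: Duration,
    multiplier: u32, // assumed growth factor; the PR does not state it
}

impl Default for BackoffConfig {
    fn default() -> Self {
        Self {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(30),
            multiplier: 2,
        }
    }
}

impl BackoffConfig {
    /// Delay before retry number `attempt` (0-based), capped at `max`.
    fn delay(&self, attempt: u32) -> Duration {
        // Saturate instead of overflowing for very large attempt counts.
        let factor = self.multiplier.checked_pow(attempt).unwrap_or(u32::MAX);
        self.initial
            .checked_mul(factor)
            .unwrap_or(self.max)
            .min(self.max)
    }
}

fn main() {
    let cfg = BackoffConfig::default();
    for attempt in 0..10 {
        println!("attempt {}: wait {:?}", attempt, cfg.delay(attempt));
    }
}
```

With these assumed values the delay doubles from 100ms and reaches the 30s cap on the tenth attempt, after which every retry waits 30s.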

The implementation includes:
- New src/reconnect.rs module with ReconnectConfig and spawn_reconnect_task
- FairQueue enhancements: on_disconnect callback to detect peer disconnections
- Backend shutdown fix: clear fair_queue streams to ensure TCP connections close
- SubSocketBackend with disconnect notifier support for reconnection tasks
- SubSocket override of connect() to spawn background reconnection monitors
- Updated reconnection_compliant test that now passes (previously ignored)

All CI checks pass: formatting, clippy (--deny warnings), and tests across all three runtime backends (tokio, async-std, async-dispatcher).

Address three issues identified in code review:

1. Arc cycle fix: Use Arc::downgrade() in fair_queue on_disconnect callback
   to capture a Weak reference instead of a strong Arc. This prevents the
   cycle: backend -> fair_queue_inner -> callback -> backend.

2. Shutdown check in retry loop: Add shutdown signal check during backoff
   sleep using futures::select!. The shutdown_rx is now fused so it can be
   polled multiple times. This ensures ReconnectHandle::shutdown() works
   even when the endpoint is down and retries are in progress.

3. Emit Connected event on reconnect: try_reconnect now returns the
   resolved endpoint, and the reconnect task emits SocketEvent::Connected
   after successful reconnection. This ensures monitor consumers see
   matching Connected/Disconnected events.
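
The Arc cycle fix in item 1 can be illustrated with a small std-only sketch. The `Queue` and `Backend` types below are hypothetical stand-ins for the fair queue and the SUB backend, not the actual zmq.rs types. The point is only that the callback stored inside the backend captures a `Weak` obtained from `Arc::downgrade`, so the backend's strong count is unaffected.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, Weak};

type Callback = Box<dyn Fn() + Send + Sync>;

/// Hypothetical stand-in for the fair queue: owns an optional disconnect callback.
#[derive(Default)]
struct Queue {
    on_disconnect: Mutex<Option<Callback>>,
}

impl Queue {
    fn set_on_disconnect(&self, cb: Callback) {
        *self.on_disconnect.lock().unwrap() = Some(cb);
    }

    /// Called when a peer stream ends; invokes the callback if one is set.
    fn peer_disconnected(&self) {
        if let Some(cb) = self.on_disconnect.lock().unwrap().as_ref() {
            cb();
        }
    }
}

/// Hypothetical stand-in for the backend, which owns the queue.
#[derive(Default)]
struct Backend {
    queue: Queue,
    disconnects: AtomicUsize,
}

fn new_backend() -> Arc<Backend> {
    let backend = Arc::new(Backend::default());
    // Capture a Weak, not a clone of the Arc. A strong clone would create
    // the cycle backend -> queue -> callback -> backend and leak the backend.
    let weak: Weak<Backend> = Arc::downgrade(&backend);
    backend.queue.set_on_disconnect(Box::new(move || {
        // If the backend is already gone, the callback is a no-op.
        if let Some(b) = weak.upgrade() {
            b.disconnects.fetch_add(1, Ordering::SeqCst);
        }
    }));
    backend
}

fn main() {
    let backend = new_backend();
    backend.queue.peer_disconnected();
    println!("disconnects seen: {}", backend.disconnects.load(Ordering::SeqCst));
    println!("strong refs: {}", Arc::strong_count(&backend));
}
```

Because the only strong reference is the one held by the caller, dropping it frees the backend together with the queue and the callback it stores.
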
@rgbkrk rgbkrk merged commit 34aa1a5 into master Mar 4, 2026
3 checks passed