Task Broker
This is still under design.
The ZIO task broker is essentially a rip-off of the ZeroMQ Majordomo pattern, aka the service-oriented reliable queuing pattern (python version), with some variations:
- It is more generalized ("generaldomo"?) in how client requests and workers are matched.
- Replace ROUTER with SERVER so that clients may use the thread-safe CLIENT socket.
1 Requirements
The primary requirements (goals) for the system are:
- Allow one program ("client") to run a task in an "external" program ("worker") where the two may be on different computers.
- Support N-to-M mapping between clients and workers such that either number may change over time without affecting operation ("broker", aka ZeroMQ reliable request pattern / pirate patterns).
- Do not restrict client endpoints to always run from the same thread (use CLIENT socket, not REQ nor DEALER).
- Separate out broker logic from request/service matching logic.
- Broker accepts clients and workers appearing and disappearing over its lifetime.
- Reissue task if worker disappears after accepting it.
- Never return the same result to a client more than once (eg, if a worker reappears after a reissue).
- Clients and workers can discover the broker SERVER socket address.
- Broker provides status information.
2 Parts
- client request message specification and payload
- worker service specification
- request/service matching mechanism
- result messages specification and payload
- client API
- worker API
- broker implementation
3 Roles and activity
- broker (B) starts with a given service resolver (SR) plugin
- one worker (W) of many and one client (C) of many connect to broker in any order
- W sends to B a query message M1 which includes a service description object (SDO)
- If M1 has a payload, B sends M1 to the waiting client C previously associated with W.
- B applies SR to SDO and receives associated service management object (SMO)
- B adds W to SMO
- C sends message M2 with request description object (RDO) + payload
- B applies SR to RDO and receives associated SMO
- B sends M2 to first W in SMO, removes W
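For concreteness, below is a minimal Python sketch of the broker-side bookkeeping implied by the steps above. The class and method names (Broker, ServiceManagementObject, on_worker_query, etc.) are illustrative assumptions, not ZIO API; only the SR(SDO) -> SMO and SR(RDO) -> SMO transformations and the "first waiting worker" dispatch come from the description.

  # Sketch of the broker-side matching implied by the steps above.
  # Class and method names here are illustrative assumptions, not ZIO API.

  class ServiceManagementObject:
      """SMO: tracks workers and requests matched to one 'service'."""
      def __init__(self):
          self.waiting_workers = []   # workers ready to take a request
          self.pending_requests = []  # requests not yet given to a worker

  class Broker:
      def __init__(self, resolver):
          self.resolver = resolver    # the SR plugin: SDO/RDO -> SMO

      def on_worker_query(self, worker, sdo):
          "W sends query message M1 with a service description object (SDO)."
          smo = self.resolver(sdo)    # SR(SDO) -> SMO
          smo.waiting_workers.append(worker)
          self.dispatch(smo)

      def on_client_request(self, client, rdo, payload):
          "C sends message M2 with request description object (RDO) + payload."
          smo = self.resolver(rdo)    # SR(RDO) -> SMO
          smo.pending_requests.append((client, payload))
          self.dispatch(smo)

      def dispatch(self, smo):
          "Give each pending request to the first waiting worker, removing it."
          while smo.waiting_workers and smo.pending_requests:
              worker = smo.waiting_workers.pop(0)
              client, payload = smo.pending_requests.pop(0)
              self.send_to_worker(worker, client, payload)  # transport-specific

      def send_to_worker(self, worker, client, payload):
          raise NotImplementedError   # depends on the socket handling below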
This is essentially identical to ZeroMQ Majordomo except for adding the transformations SR(SDO) -> SMO and SR(RDO) -> SMO. In Majordomo, the SMO is simply a hard-wired service name which both client and worker agree on by contract. Here, client and worker each express their own terms and the contract is embodied in code (the SR transform function).
Besides breaking a tight conceptual binding, this allows the SR to take into account additional information which could be static or dynamic. A static example is to apply some priority bias to selecting which service to match to a request. Eg, workers fronting GPUs of different capabilities. A dynamic example would be to load balance workers across some set of resources.
This generality reduces to Majordomo: if it is not needed, a trivial SR function may be provided.
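To make the reduction concrete, here is a sketch of such a trivial SR, reusing the illustrative names from the sketch above: it simply keys SMOs on a hard-wired service name that client and worker agree on, exactly as in Majordomo.

  # A trivial SR that reduces the general matching to plain Majordomo
  # by keying SMOs on a hard-wired service name.  Illustrative only.

  class TrivialResolver:
      def __init__(self):
          self.by_name = {}           # service name -> SMO

      def __call__(self, desc):
          # Both SDO and RDO are assumed to carry a "service" field on
          # which client and worker agree by contract, as in Majordomo.
          return self.by_name.setdefault(desc["service"],
                                         ServiceManagementObject())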
4 Why SERVER/CLIENT
WCT has a single-threaded and a multi-threaded graph execution engine. The latter runs WCT components via a TBB thread pool in TBB's "flow graph" implementation.
If the conventional DEALER socket were used from a component, the socket would be exercised from different threads over the component's lifetime, yet DEALER is not thread-safe. Proper operation would not be guaranteed and is unlikely even by luck.
Instead, CLIENT must be used, which forces a SERVER socket somewhere. Two clear options exist:
- Re-implement Majordomo broker to replace ROUTER with SERVER.
- Run one or more SERVER<->DEALER "devices" outside of the TBB thread pool to proxy to a standard Majordomo ROUTER.
A Python prototype for option 1 exists as generaldomo. It lets SERVER be swapped for ROUTER at runtime by providing two parallel implementations of the message handling. This adds some runtime overhead, but it is not horrible. A "real" implementation might stick to just a SERVER-based implementation. Note that such an implementation is not compliant with RFC 18/MDP, as use of ROUTER is explicitly required there. This prototype also does not handle message envelopes.
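The two handling modes amount to something like the following pyzmq sketch (an assumption about the shape of the code, not the actual generaldomo implementation; zmq.SERVER requires a libzmq/pyzmq built with the draft API). ROUTER deals in multipart messages carrying an identity frame, while SERVER deals in single-part messages tagged with a routing id, which is also why envelopes need separate treatment.

  import zmq

  def recv_router(sock):
      "ROUTER: multipart message; the sender identity is the first frame."
      frames = sock.recv_multipart()
      return frames[0], frames[1:]          # (identity, body frames)

  def send_router(sock, identity, body_frames):
      sock.send_multipart([identity] + body_frames)

  def recv_server(sock):
      "SERVER: single-part message; the sender is carried as a routing id."
      frame = sock.recv(copy=False)
      return frame.routing_id, frame.bytes  # (routing id, one body blob)

  def send_server(sock, routing_id, blob):
      sock.send(blob, routing_id=routing_id)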
Option 2 would allow the "official" Majordomo implementation to be used. As it is C/C++, the Majordomo broker could be embedded in a WCT (service-ish) component, as could the SERVER-DEALER "device" proxy. This would allow inproc:// transport from client, through proxy, to broker. Besides inherently reducing message transfer time, it would allow messages to hold pointers. And it would simplify job configuration and management, as the overall job would look to the user like just one wire-cell program. As the broker's ROUTER could bind to both inproc:// and tcp://, external workers can still be supported. However, this would require either that pointer-based messages are not used, or that some data-aware "device" proxy runs inside WCT which converts between pointer and object and forwards to a worker.
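For option 2, the SERVER<->DEALER "device" could look roughly like the sketch below: it accepts single-part messages from CLIENT sockets (e.g. WCT components over inproc://), forwards them via a DEALER to a standard Majordomo ROUTER broker, and routes replies back by routing id. The endpoint addresses and the simplified framing are assumptions for illustration; a real device must speak proper MDP envelopes.

  import zmq

  def run_proxy(ctx, front_addr="inproc://taskbroker",
                back_addr="tcp://127.0.0.1:5555"):
      front = ctx.socket(zmq.SERVER)   # CLIENTs (WCT components) connect here
      front.bind(front_addr)
      back = ctx.socket(zmq.DEALER)    # talks to the Majordomo ROUTER broker
      back.connect(back_addr)

      poller = zmq.Poller()
      poller.register(front, zmq.POLLIN)
      poller.register(back, zmq.POLLIN)

      while True:
          for sock, _ in poller.poll():
              if sock is front:
                  frame = front.recv(copy=False)
                  # Carry the reply route explicitly so it survives the
                  # round trip through the broker and back.
                  back.send_multipart([str(frame.routing_id).encode(),
                                       frame.bytes])
              else:
                  rid, body = back.recv_multipart()
                  front.send(body, routing_id=int(rid))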