cogs ⚙ demo
Demonstration of Configuration Object Generation System

Introduction

The demo/ directory of the cogs repository holds a demonstration of one way to make use of cogs. It includes:

  • A mocked-up framework, application and components.
  • Schema to generate configuration classes.
  • An example cogs configuration stream file.
  • Simple tooling to rerun code generation.
  • Integration into the cogs build system.

This document first describes how to build and run the demo. It then gives a tour of the mocked framework, which illustrates one possible way to use cogs. Details on the code generation steps come next, and the document ends with a section that uses the demo's schema to illustrate how one may develop schema for applications.

Build

This section describes how to build the demo code.

Prerequisites

In addition to what the cogs library requires, the demo requires:

  • the moo Python program from moo (needed only to regenerate code)

Compile

With the prerequisites satisfied, the demo builds along with the cogs library. For example:

waf configure --prefix=$(pwd)/install \
    --with-nljs=$HOME/opt/nljs \
    --with-ers=$HOME/opt/ers 
waf install

You should be rewarded with:

./install/bin/cogs-demo || /bin/true
2020-Jul-09 14:10:51,819 INFO [main(...) at unknown/demo/cogs-demo.cpp:12] usage: ./install/bin/cogs-demo <uri>

Regenerate

The demo relies on generated code, which is committed to the repository to reduce the build-time dependencies of cogs. If the additional prerequisites are satisfied, the code may be regenerated:

./demo/generate.sh
codegen in /home/bv/dev/cogs/demo
Codegen for structs and serialization
generating ./comp_nljs.hpp:
generating ./comp_structs.hpp:
generating ./head_nljs.hpp:
generating ./head_structs.hpp:
generating ./node_nljs.hpp:
generating ./node_structs.hpp:
Validating configuration
[
    null,
    null,
    null,
    null
]
null above means okay!
Compiling configuration to cogs stream file

Below we will look more at what this script does.

Run

The demo provides a ready-made cogs configuration stream file:

./install/bin/cogs-demo file://demo/demo-config.json | sed -e 's/^[^]]*\]//'
making stream for file://demo/demo-config.json
main:		lookup: [demoSource]: "mycomp_source1"
Source:	constructing
main:		configure: [demoSource]: "mycomp_source1"
Source:	configured to send 42 things
main:		lookup: [demoNode]: "mynode_inst1"
Node:		constructing
main:		configure: [demoNode]: "mynode_inst1"
Node:		mynode1
Node:			making port: src
Node:				link: bind'ing to: tcp://127.0.0.1:5678
Node:		lookin up component: mycomp_source1
Node:			set port: src
Source:	given port src
main:		configuration stream done

The sed command simply removes ERS output augmentation that is more appropriate to log files. The output shows the configuration driving the construction of a "node" and a "component" (the "source") followed by their configuration. When the node is configured it makes a (dummy) "port" and hands that C++ object to the source. The next section on the demo framework describes these terms. They are not inherent to cogs itself, only to this demo, but they represent typical code patterns.

Framework

It is possible to use cogs in a variety of patterns. This demo illustrates one particular pattern such as may be used in an "application framework". This pattern might be named something like "factory configuration". It provides for a highly flexible, configuration-driven method for "aggregating" an application instance from a set of factory-instantiated components.

Demo configuration stream

The construction of the demo application and its configuration is driven by a cogs configuration stream. The stream is composed of a sequence of pairs of configuration objects. The first object in each pair corresponds to a fixed type, democfg::ConfigHeader. It provides information required to locate a component instance. The second object in a pair corresponds to the configuration of that component instance.

The stream is illustrated as:

component 1: democfg::ConfigHeader
component 1: corresponding cfg object
component N: democfg::ConfigHeader
component N: corresponding cfg object

Dispatching configuration

The ConfigHeader provides two attributes:

implementation identifier
this is some name associated with a construction method for an implementation of ConfigurableBase. This identifier is some simple name, likely derived from the component's C++ class name.
instance identifier
multiple instances of one component may be constructed, and this identifier keeps them distinct.

The main application walks the cogs::Stream using the ConfigHeader to retrieve an instance from the demo factory. It then reads the next object from the cogs::Stream and passes it to the component's configure() method. When the stream is exhausted the demo app simply exits. A real app would of course go on to some other phase of execution.
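
The dispatch loop described above can be sketched in a few lines. The following Python is only an illustration of the pattern, not the demo's C++ code: the Source class, the FACTORY and INSTANCES registries, and the get_instance() and run() functions are all hypothetical names, and a plain list stands in for a cogs::Stream.

```python
# Illustrative sketch only: a Python analogue of the demo's dispatch loop.
# All names here (Source, FACTORY, INSTANCES, get_instance, run) are hypothetical.

class Source:
    """Stand-in for a configurable component."""
    def __init__(self):
        self.ntosend = None

    def configure(self, cfg):
        self.ntosend = cfg["ntosend"]


# Maps an implementation identifier to a constructor.
FACTORY = {"demoSource": Source}
# Caches instances by (implementation, instance) identifiers.
INSTANCES = {}


def get_instance(impname, instname):
    """Return the named instance, constructing it on first lookup."""
    key = (impname, instname)
    if key not in INSTANCES:
        INSTANCES[key] = FACTORY[impname]()
    return INSTANCES[key]


def run(stream):
    """Consume (header, config) pairs until the stream is exhausted."""
    objects = iter(stream)
    for header in objects:
        cfg = next(objects)  # the object that follows each header
        comp = get_instance(header["impname"], header["instname"])
        comp.configure(cfg)


run([
    {"impname": "demoSource", "instname": "mycomp_source1"},
    {"ntosend": 42},
])
```

Keeping instances in a registry keyed by both identifiers is what lets a later object in the stream refer back to a component that was configured earlier.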

Non-trivial application patterns

The demo adds some non-trivial complexity by considering two types of configurable objects:

node
an object which has a collection of ports such as may be associated with sockets. The demo keeps ports as dummies but they represent some shared resource that is non-trivial to construct.
component
an object which is configurable and may also want to use ports. There is only a single component in the demo, called a "source". It represents some arbitrary "code execution unit" aka "user module".

The node is really just another component, but it is called out specially here because it uses the factory to locate instances of other components, as directed by its configuration, in order to deliver fully formed "ports" to them. A component must inherit from demo::PortuserBase and be listed in the node's configuration in order to receive its ports.
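
The hand-off of ports from the node to a component might look like the following sketch. It is a Python analogue under stated assumptions, not the demo's C++: PortUser loosely stands in for demo::PortuserBase, while the Port class, the set_port() method and the components registry are hypothetical. The configuration keys follow the demo-config.json file shown later in this document.

```python
# Illustrative sketch only; every class and method name here is hypothetical.

class Port:
    """Dummy port: records its name and the links it was given."""
    def __init__(self, ident, links):
        self.ident = ident
        self.links = links


class PortUser:
    """Stand-in for a component that can receive ports."""
    def __init__(self):
        self.ports = {}

    def set_port(self, port):
        self.ports[port.ident] = port


class Node:
    def __init__(self, components):
        # Registry of already-constructed components, keyed by instance name.
        self.components = components
        self.ports = {}

    def configure(self, cfg):
        # First make every port defined for this node.
        for pdef in cfg["portdefs"]:
            self.ports[pdef["ident"]] = Port(pdef["ident"], pdef["links"])
        # Then hand each listed component the ports it asked for.
        for cdef in cfg["compdefs"]:
            comp = self.components[cdef["ident"]]
            for name in cdef["portlist"]:
                comp.set_port(self.ports[name])


source = PortUser()
node = Node({"mycomp_source1": source})
node.configure({
    "ident": "mynode1",
    "portdefs": [{"ident": "src", "links": [
        {"linktype": "bind", "address": "tcp://127.0.0.1:5678"}]}],
    "compdefs": [{"ident": "mycomp_source1", "type_name": "demoSource",
                  "portlist": ["src"], "config": ""}],
})
```

The node owns the ports and the component only borrows them, which is why the component must already exist when the node is configured.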

This pattern is a mock of a real implementation found in ZIO which uses a zio::Node to create and link zio::Port instances either directly or automatically with the help of a zio::Peer performing distributed network discovery.

Codegen

This section provides a tour of the code generation part of the demo. The tour focuses on the short generate.sh script. This script runs its commands from the demo/ directory, which should be kept in mind when reading the excerpts of the script shown below.

Generating C++

User code should not be burdened with validating and interpreting a configuration byte stream, or even with interpreting a dynamic C++ object like nlohmann::json. Instead, with cogs the user code receives a fully typed C++ struct, which guarantees at least a valid object structure.

The code for a struct and its serialization methods is generated from an application schema realized with the Avro domain schema and a few extra bits of information. This information for the demo's "node", "comp" and "head" application schema is brought together in the short file demo-codegen.jsonnet included here:

local moo = import "moo.jsonnet";

local s = {
    node: import "node-schema.jsonnet",
    comp: import "comp-schema.jsonnet",
    head: import "head-schema.jsonnet",
};

local render_one = function(cg) [
    moo.render(cg, "avro_%s.hpp.j2"%t, "%s_%s.hpp"%[cg.name,t])
    for t in ["nljs", "structs"]];


local cg = moo.schema.avro.codegen;
std.flattenArrays([render_one(cg(k, s[k], "democfg")) for k in std.objectFields(s)])

To produce the set of six header files (one for structs and one for their serialization for each application schema) one runs:

echo "Codegen for structs and serialization"
moo render-many demo-codegen.jsonnet

Configuration stream

We finally generate an example cogs configuration stream in the form of a JSON file holding an array. This file is created from Jsonnet by moo:

echo "Compiling configuration to cogs stream file"
moo -D model compile demo-config.jsonnet > demo-config.json

The demo-config.json file is what was used above to run the demo. It is not long and so is included here:

[
    {
        "impname": "demoSource",
        "instname": "mycomp_source1"
    },
    {
        "ntosend": 42
    },
    {
        "impname": "demoNode",
        "instname": "mynode_inst1"
    },
    {
        "compdefs": [
            {
                "config": "",
                "ident": "mycomp_source1",
                "portlist": [
                    "src"
                ],
                "type_name": "demoSource"
            }
        ],
        "ident": "mynode1",
        "portdefs": [
            {
                "ident": "src",
                "links": [
                    {
                        "address": "tcp://127.0.0.1:5678",
                        "linktype": "bind"
                    }
                ]
            }
        ]
    }
]

You can see the paired objects: each pair begins with what will become a democfg::ConfigHeader, followed by an object of a specific type corresponding to the component named in the preceding header.

Note, the choice of ordering is intentional. It leads to the construction and configuration of the demoSource prior to the use of this component inside the node. That use calls back to the component in order to pass in the requested "port" objects.
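
The ordering constraint can be checked mechanically. The sketch below is a hypothetical helper, not part of cogs or moo: it walks the array in pairs and reports any component named in a node's compdefs that was not introduced by an earlier header. The stream text is an abbreviated copy of the file above (empty port lists), inlined only to keep the example self-contained.

```python
import json

# Abbreviated copy of the stream above, inlined to keep the sketch self-contained.
STREAM_TEXT = """
[
  {"impname": "demoSource", "instname": "mycomp_source1"},
  {"ntosend": 42},
  {"impname": "demoNode", "instname": "mynode_inst1"},
  {"ident": "mynode1", "portdefs": [],
   "compdefs": [{"ident": "mycomp_source1", "type_name": "demoSource",
                 "portlist": [], "config": ""}]}
]
"""


def check_order(stream):
    """Return components used by a node before they were configured (hypothetical check)."""
    seen = set()
    missing = []
    # Even indices hold headers, odd indices hold the matching configuration.
    for header, cfg in zip(stream[0::2], stream[1::2]):
        for cdef in cfg.get("compdefs", []):
            if cdef["ident"] not in seen:
                missing.append(cdef["ident"])
        seen.add(header["instname"])
    return missing


stream = json.loads(STREAM_TEXT)
print(check_order(stream))                   # the order used by the demo
print(check_order(stream[2:] + stream[:2]))  # the same pairs with the node first
```

With the demo's ordering the helper finds nothing to report. With the node placed first it reports mycomp_source1, the component the node would try to look up before it had been configured.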

Schema

This section describes how to develop schema. It first describes the layers of "application schema" and "abstract base schema". It then illustrates the elements of the latter and walks through an example of the former.

Layers

The demo assumes two layers of schema. The lowest is called an "abstract base schema". Strictly speaking it is a specification of a set of function names and their arguments. The demo then provides a number of implementations of this base schema. An implementation of a base schema function returns a corresponding data structure that adheres to the schema vocabulary of a particular domain.

For example, one base schema implementation provides structures suitable for directly producing Avro schema JSON. Another provides structures which adhere to JSON Schema vocabulary. Yet another produces structures that may be applied to a message.proto.j2 template to produce a Protobuf .proto file that can then be compiled into C++ classes via protoc.

Using these primitive base functions, an application developer writes the next layer of functions, which emit schema describing the specific data types required by the developer's components.

The next section describes the functions provided by a base schema followed by a tour of the application level schema for the configuration used by the "node" component in the cogs demo.

Base schema

The base schema in its abstract form is a set of Jsonnet function prototypes which are summarized here. An implementation of an abstract function is expected to return a description of the type named by the function in some domain vocabulary. For example, the demo provides one base implementation for the Avro schema domain and one for JSON Schema.

Domains differ in what they can meaningfully accept, so some domains may ignore some arguments to their functions. Furthermore, some arguments are optional, which is indicated by a default value of Jsonnet null. A domain must either provide a default inside the function body or ignore the argument (no null values should "leak out" from the functions).

The abstract base schema functions are:

boolean()
a Boolean type
number(dtype, extra={})
a numeric type. The dtype argument should provide specific type information using Numpy codes (eg i4 for C++ int, u2 for C++ uint16_t). The extra may specify JSON Schema constraints.
bytes(encoding=null, media_type=null)
a sequence of byte values
string(pattern=null, format=null)
a string type. The pattern and format are JSON Schema arguments specifying a regular expression or a named format that a valid string must match.
field(name, type, default=null, doc=null)
a named and typed element in the context of a record. If the type is not scalar (eg, is a record) then type should be given as the name of the type. The default may provide a default value for this field. The doc provides a brief English description of the meaning of the field.
record(name, fields=[], doc=null)
a type which aggregates fields. This corresponds to a JSON object or a C++ struct or class, etc. The fields array is a sequence of objects returned from the field() function (from the same domain).
sequence(type)
an ordered sequence holding elements of type type.
enum(name, symbols, default=null, doc=null)
an enumerated type. The symbols is an array of string literals naming the enumerated values. The default may specify an enumerated value to be used if otherwise not specified.
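
To make the idea of a domain implementation concrete, the following sketch shows what a few of these functions could return for the Avro domain. It is a Python analogue written for illustration, not moo's Jsonnet implementation. The AVRO_NUMBERS mapping and the exact set of keys emitted are assumptions, chosen to follow the Avro schema vocabulary.

```python
# Illustrative sketch only: a Python analogue of a base schema implementation
# that emits Avro-style structures. This is not moo's Jsonnet code.

# Assumed mapping from NumPy-style dtype codes to Avro primitive names.
AVRO_NUMBERS = {"i4": "int", "i8": "long", "f4": "float", "f8": "double"}


def boolean():
    return "boolean"


def number(dtype, extra=None):
    # This domain has no use for JSON Schema constraints, so extra is ignored.
    return AVRO_NUMBERS[dtype]


def string(pattern=None, format=None):
    # Avro strings carry no pattern or format, so both are ignored here.
    return "string"


def field(name, type, default=None, doc=None):
    out = {"name": name, "type": type}
    # Optional arguments are dropped rather than emitted as null.
    if default is not None:
        out["default"] = default
    if doc is not None:
        out["doc"] = doc
    return out


def record(name, fields=(), doc=None):
    out = {"type": "record", "name": name, "fields": list(fields)}
    if doc is not None:
        out["doc"] = doc
    return out


def sequence(type):
    return {"type": "array", "items": type}


def enum(name, symbols, default=None, doc=None):
    out = {"type": "enum", "name": name, "symbols": list(symbols)}
    if default is not None:
        out["default"] = default
    if doc is not None:
        out["doc"] = doc
    return out


ltype = enum("LinkType", ["bind", "connect"], default="bind")
link = record("Link", fields=[field("linktype", ltype),
                              field("address", string())])
```

Each optional argument is either ignored or omitted from the result, so no null value leaks out of any function.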

Node schema

The concept of a "node" in this demo has been described above. Here we examine the node-schema.jsonnet file as an example of an application-level schema.

First we look at the high-level structure of the file:

function(schema) {
    // defines types
    types: [ typeA, typeB, ...]
}

This Jsonnet compiles down to a single function object which takes the argument schema which provides a set of base schema functions such as described in the previous sections. The primary result of this function is to return a Jsonnet object ({...}) which contains an attribute types holding an array of objects describing types constructed through calls to functions provided by schema.

Looking at the first few lines:


Here, the Jsonnet file re.jsonnet is imported and is provided by moo. It contains a set of regular expressions that are to be used to constrain the validity of strings in the schema. For example it begins with:

{
    // Basic identifier (restrict to legal C variable name)
    ident: '[a-zA-Z][a-zA-Z0-9_]*',
    ident_only: '^' + self.ident + '$',

Thus a string with pattern set to ident may be validated to hold only a limited alphanumeric content.
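
The effect of the two expressions can be tried out directly. The sketch below uses Python's re module with the two patterns quoted above. The is_ident() helper is a hypothetical name, and Python's regular expression dialect is assumed to agree with the validator's for this simple pattern.

```python
import re

# The two patterns quoted from re.jsonnet above.
ident = "[a-zA-Z][a-zA-Z0-9_]*"
ident_only = "^" + ident + "$"


def is_ident(text):
    """True when the whole string is a legal identifier (hypothetical helper)."""
    return re.search(ident_only, text) is not None


print(is_ident("mycomp_source1"))  # a name used in the demo configuration
print(is_ident("1source"))         # starts with a digit
print(is_ident("my-comp"))         # contains a hyphen

# Without the anchors the bare pattern matches a fragment of a bad string.
print(re.search(ident, "1source") is not None)
```

Only the first name is accepted by is_ident(). The last line shows why an anchored form exists at all: a search with the bare ident pattern succeeds on "1source" because it matches the trailing "source".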

Back in node-schema.jsonnet, we find ident defined as a string with the pattern re.ident_only, held in a Jsonnet local. This means the variable is temporary and known only in the scope of the object. This lets it be referred to simply by the name ident later.

Next we find an example of an enum and a record which describe a "link":

local address = schema.string("Address", pattern=moo.schema.re.uri),

local ltype = schema.enum("LinkType", ["bind","connect"], default="bind",
                     doc="How a port links to an address"),
local link = schema.record("Link", fields= [
    schema.field("linktype", ltype,
            doc="The socket may bind or connect the link"),
    schema.field("address", address, 
            doc="The address to link to")
], doc="Describes how a single link is to be made"),

A "link" is intended to generalize the relationship between a socket and an address. The LinkType represents one of the two allowed link mechanisms (a bind() or a connect()). Note how the Jsonnet representation of this type is used to further define the Link along with the representation of a string type Address. In a real system like ZIO, an address is in the form of a URL like tcp://127.0.0.1:5678 for direct ZeroMQ addressing or zyre://nodename/portname for automated network peer discovery.

Next we come to a "port":

local port = schema.record("Port", fields=[
    schema.field("ident", ident,
            doc="Identify the port uniquely in the node"),
    schema.field("links", linklist, 
            doc="Describe how this port should link to addresses"),
], doc="A port configuration object",),

A port is another record with an identifier (name) of a type ident which we defined above. That is, a string which may be validated against a regular expression. The second field is links which is a sequence. In ZIO a port corresponds to a ZeroMQ socket which may have a multitude of both bind() and connect() links.

Next we define the part of the node configuration which describes what a node needs to know in order to interact with a component:

local comp = schema.record("Comp", fields=[
    schema.field("ident", ident, 
            doc="Identify component instance uniquely in the node"),
    schema.field("type_name", ident, 
            doc="Identify the component implementation"),
    schema.field("portlist", portlist,
            doc="Identity of ports required by component"),
    schema.field("config", extra_config,
            doc="Per instance configuration string used by node")
], doc="An object used by the node to partly configure a component"),

The first two fields are identifiers used to look up the component using the factory (ie, matching what is also provided to main() in the header object). The portlist is a sequence of identifiers which must match those used in defining a Port above. This required consistency can be enforced by Jsonnet when generating actual configuration objects as described in the next section. Finally, an arbitrary extra string is provided, which the demo does not actually use for anything. It may be used by the node to trigger some special action on the component (eg, "ignore" or something).
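
The consistency between a portlist and the defined ports can also be checked after the fact, on an already compiled node configuration object. The following sketch is a hypothetical helper and not part of the demo. The "sink" port in the example is deliberately left undefined to show a failure.

```python
def undefined_ports(node_cfg):
    """Map each component ident to the listed ports that the node does not define."""
    defined = {p["ident"] for p in node_cfg["portdefs"]}
    problems = {}
    for comp in node_cfg["compdefs"]:
        unknown = [name for name in comp["portlist"] if name not in defined]
        if unknown:
            problems[comp["ident"]] = unknown
    return problems


# Same shape as the node object in demo-config.json, with one bad port added.
node_cfg = {
    "ident": "mynode1",
    "portdefs": [{"ident": "src", "links": []}],
    "compdefs": [{"ident": "mycomp_source1", "type_name": "demoSource",
                  "portlist": ["src", "sink"], "config": ""}],
}
print(undefined_ports(node_cfg))
```

An empty result means every port a component asks for is one the node will actually make.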

Penultimately, we get to the top level of the "node" schema:

local node = schema.record("Node", fields=[
    schema.field("ident", ident,
            doc="Identify the node instance"),
    schema.field("portdefs", portdefs,
            doc="Define ports on the node to be used by components"),
    schema.field("compdefs", compdefs,
            doc="Define components the node should instantiate and configure"),
], doc="A node configures ports and components"),

This defines a record type Node with fields meant to hold the port and component definitions.

And, finally, the "return" value collects all the types:

types: [ ident, address, ltype, link, linklist,
         port, portlist, portdefs,
         extra_config, comp, compdefs, node ],

Configuration objects

The demo generates a configuration stream (JSON array) with demo-config.jsonnet. It defines two top level attributes: model which provides the configuration sequence and schema which is a sequence of objects in JSON Schema vocabulary.

The configuration stream can first be validated with moo:

moo -S schema -D model \
   validate --sequence \
   -s demo/demo-config.jsonnet \
      demo/demo-config.jsonnet

This tells moo to treat the named model and schema each as a sequence, of models and of schemas respectively, and then to test each pair one by one. Success is marked by a null for each pair. Failure is reported with information indicating where the model is invalid.
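
The pairwise nature of this validation can be sketched as follows. This is not how moo is implemented: check() is a toy stand-in for a real JSON Schema validator that understands only the "type", "required" and "properties" keywords, and the two schemas are hypothetical rather than the ones the demo generates.

```python
# Illustrative sketch only: pairwise validation of a model sequence against
# a schema sequence. check() is a toy stand-in for a real JSON Schema
# validator and understands only "type", "required" and "properties".

TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def check(model, schema):
    """Return None when the model is acceptable, else a short message."""
    expected = schema.get("type")
    if expected and not isinstance(model, TYPES[expected]):
        return "expected " + expected
    if isinstance(model, dict):
        for name in schema.get("required", []):
            if name not in model:
                return "missing " + name
        for name, sub in schema.get("properties", {}).items():
            if name in model:
                err = check(model[name], sub)
                if err:
                    return name + ": " + err
    return None


def validate_sequence(models, schemas):
    """Test each (model, schema) pair one by one."""
    return [check(m, s) for m, s in zip(models, schemas)]


# Hypothetical schemas for a header and for the source's configuration.
header_schema = {"type": "object", "required": ["impname", "instname"],
                 "properties": {"impname": {"type": "string"},
                                "instname": {"type": "string"}}}
source_schema = {"type": "object", "required": ["ntosend"],
                 "properties": {"ntosend": {"type": "integer"}}}

models = [{"impname": "demoSource", "instname": "mycomp_source1"},
          {"ntosend": 42}]
print(validate_sequence(models, [header_schema, source_schema]))
```

As in the moo output shown earlier, a None (null) in each position means the corresponding model passed.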

When valid the model may be compiled into a form that can be consumed as a cogs stream:

moo -D model  compile demo/demo-config.jsonnet > demo/demo-config.json

One may now return to the Run section above, where this configuration was run through the cogs-demo application.

Author: Brett Viren

Created: 2020-07-09 Thu 14:10
