moo
無 ovalid
Validating objects with moo oschema types
Concepts
moo provides a method called ovalid which validates data structures (models) against moo oschema (or JSON Schema). When moo oschema is used to produce moo otypes, a valid-by-construction pattern is followed. On the other hand, it is common to receive data structures (models) which "should" be valid against an oschema but which may actually have been constructed in some faulty manner. The moo ovalid methods can be used to check their validity.
This second form of validation relies on the standard JSON Schema form of schema information and its Python implementations for validation (so far including the jsonschema and fastjsonschema Python packages). moo will accept a schema description in moo oschema form (as well as JSON Schema form) and some arbitrary data and will determine if that data is valid against the schema. A flexible moo validate command line interface is provided to apply the validation to a single model/schema pair or to a pair of matched sequences of models and schema. This "sequence mode" is particularly well suited to writing unit tests that ensure your schema and example models are mutually valid.
Much of the same functionality exposed by the moo validate command line interface can be utilized from your own Python programs via the moo.ovalid module.
The remainder of this document describes how to apply moo ovalid validation on the command line through a series of examples and then describes how to apply it in your own Python code.
Validation from the command line
The validation is actually performed using both schema and data structure (model) represented as Python objects. With moo's support for many file formats the user is able to provide file representations of these objects to the command line interface in a variety of formats.
Validation at the command line starts with the moo validate command line interface, and its comprehensive "help" provides all the essential documentation:
moo validate --help
Usage: moo validate [OPTIONS] MODEL

  Validate models against target schema.

  A full "context" schema must be provided by -s/--schema if it is required
  for target schema to resolve any dependencies.

  The "context" schema is identified with a string of the form "filename
  with optional dataprefix".

      -s myschema.subschema:my-schema.jsonnet

  This resulst ins the "subschema" attribute of the "myschema" attribute of
  the top level object from "my-schema.jsonnet" to be used as the "context"
  schema.

  A "target" schema is what is used to validate a model and may be specified
  in a variety of "target forms" with the -t/--target option.

  The supported target forms are:

  - an integer indicating an index into the full "context" schema is
    alloweed when the context is of a sequence form.

  - a simple string indicating either a key of the full "context" schema,
    allowed only if the context is an object, or indicating the "name"
    attribute of an moo oschema object held in the context (be it of
    sequence or object form).

  - a filename with optional "datapath:" prefix. When this last form is
    used the resulting data structure may be any target form listed above
    or may directly be an moo oschema object or a JSON Schema object.

  By default, this command operates in "scalar mode" meaning a single model
  and single target schema are processed.

  It may instead operate in "sequence mode" which expects a matching
  sequence of models and target schema. Sequence mode is entered when any of
  the following are true:

  - the --sequence option is given indicating array of models is given

  - more than one -t/--target is given

  - a -t/--target value is a comma-separated list of target forms

  - a -t/--target is a filename with optional "datapath:" prefix and the
    loaded data produces a list or tuple form.

  The multiple targets are concantenated and the resulting sequence must
  match the supplied sequence of models.

  In the special cases that all target schema are either in JSON Schema form
  or are in moo oschema form but lack any type dependency, a context schema
  is not required.

Options:
  -o, --output FILE               Output file, default is stdout
  -s, --schema TEXT               File containing a representation of a
                                  schema.
  -t, --target TEXT               Specify target schema of the model
  --sequence                      Indicate the model is a sequence of models
                                  (ie, not an array model)
  --passfail                      Print PASS or FAIL instead of null/throw
  -V, --validator [jsonschema|fastjsonschema]
                                  Specify which validator
  -h, --help                      Show this message and exit.
Through this CLI you may provide both schema and model (data) files in a variety of ways and formats. This flexibility lets usage grow as complex as the task requires. The rest of this section goes through examples, starting from the simple and gaining complexity to show more advanced processing patterns.
Atomic model and schema
The simplest case is to validate a single "atomic" unit of data with no structure, such as a number or a string. To keep the number of files small we bundle both the model and schema in a single file.
local moo = import "moo.jsonnet";
local as = moo.oschema.schema("ovalid.atomic");
{
    model: 42,
    target: as.number("Count", dtype="u4")
}
We tell moo where in that single file to find the model and the target schema via "data path prefixed file names":
moo validate --passfail \
    -t target:examples/ovalid/atomic.jsonnet \
    model:examples/ovalid/atomic.jsonnet
[ true ]
If you read the --help output above you may note that there is no "context schema" provided via the -s/--schema option. It can be omitted in this case because the target schema is provided as an object directly and because that schema depends on no other schema (it is atomic). In examples below we will show how the "context" schema becomes required.
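The "data path prefixed file name" convention used above can be pictured with a short sketch. This is not moo's implementation; the helper names split_datapath and select are hypothetical, and the compiled dictionary is a simplified stand-in for what the Jsonnet file evaluates to.

```python
from typing import Any, Tuple


def split_datapath(spec: str) -> Tuple[str, str]:
    """Split 'datapath:filename' into (datapath, filename).

    A spec without a colon is taken to have an empty data path.
    """
    if ":" in spec:
        dpath, filename = spec.split(":", 1)
        return dpath, filename
    return "", spec


def select(data: Any, dpath: str) -> Any:
    """Follow the dot-separated keys of dpath into nested dictionaries."""
    for key in filter(None, dpath.split(".")):
        data = data[key]
    return data


# Simplified stand-in for the compiled content of atomic.jsonnet.
# The "target" entry is a placeholder, not the real oschema structure.
compiled = {"model": 42, "target": {"name": "Count"}}

dpath, filename = split_datapath("model:examples/ovalid/atomic.jsonnet")
print(dpath, filename)
print(select(compiled, dpath))
```

The same idea extends to nested prefixes such as the myschema.subschema example in the help text, where each dot descends one level.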
Providing JSON Schema
Before increasing the complexity of the schema we can use the "atomic" example to show how moo validate can also utilize schema in JSON Schema form in addition to moo oschema form. It is convenient here to provide the JSON Schema via the moo jsonschema command but the JSON Schema could just as well be provided in some other manner. Just to show what JSON Schema gets used in validation we emit the intermediate results:
moo jsonschema target:examples/ovalid/atomic.jsonnet
{ "$schema": "http://json-schema.org/draft-07/schema#", "$defs": {}, "type": "integer", "minimum": 0 }
More complex JSON Schema can be generated from moo oschema via this command. If the schema is compound then a target schema likely must be provided via the -t/--target option. We will see more about target schema in the later moo validate examples below. Or, as always, see the help:
moo jsonschema -h
Usage: moo jsonschema [OPTIONS] OSCHEMA

  Convert from moo oschema to JSON Schema

Options:
  -o, --output FILE  Output file, default is stdout
  -t, --target TEXT  Specify target schema
  -h, --help         Show this message and exit.
In any case, the following shows that the JSON Schema form also validates our simple atomic model:
moo jsonschema target:examples/ovalid/atomic.jsonnet > atomic.json
moo validate --passfail \
    -t atomic.json \
    model:examples/ovalid/atomic.jsonnet
[ true ]
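To make concrete what the generated JSON Schema asks of the model, here is a minimal hand-written check of only the two keywords it uses ("type": "integer" and "minimum"). It is a sketch for illustration, not a substitute for the jsonschema or fastjsonschema packages, and the function names are hypothetical.

```python
import json

# The JSON Schema emitted by "moo jsonschema" for the atomic example.
schema = json.loads("""
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$defs": {},
    "type": "integer",
    "minimum": 0
}
""")


def is_json_integer(value):
    """True if value counts as a JSON Schema draft-07 integer."""
    # bool is a subclass of int in Python but is not a JSON integer.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # draft-07 accepts numbers with a zero fractional part, e.g. 42.0
    return isinstance(value, float) and value.is_integer()


def check_integer_minimum(model, schema):
    """Hand-check the "type": "integer" and "minimum" keywords only."""
    if schema.get("type") == "integer" and not is_json_integer(model):
        return False
    if "minimum" in schema and model < schema["minimum"]:
        return False
    return True


print(check_integer_minimum(42, schema))
print(check_integer_minimum(-1, schema))
print(check_integer_minimum("42", schema))
```

A real validator handles every keyword and reports why a model fails; this sketch only shows why 42 passes while a negative number or a string would not.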
Validating with a non-trivial but still simple schema
A schema describing an atomic model is not very expressive. Within the rules of moo oschema, arbitrarily complex structure can be described. In this example we minimally extend the atomic case to include a few more atoms and a record type named Object with fields composed of these other types.
As described more in the oschema doc we typically build any moderately complex schema in the context of a "working object" (often named a "hier" as in "hierarchy of types"). This allows constructing one type with references to others via referencing features provided by the Jsonnet language. Here is a simple such schema:
local moo = import "moo.jsonnet";
local as = moo.oschema.schema("ovalid.simple");
{
    name: as.string("Name"),
    count: as.number("Count", dtype="u4"),
    real: as.number("Real", dtype="f4"),
    any: as.any("Data"),
    obj: as.record("Object", [
        as.field("rname", self.name, doc="required string"),
        as.field("rany", self.any, doc="required any"),
        as.field("oname", self.name, optional=true, doc="optional string"),
        as.field("oany", self.any, optional=true, doc="optional any"),
        as.field("dname", self.name, default="", doc="default string"),
        ///NOTE: can not currently provide a default to an any!
        //as.field("dany", self.any, default=???, doc="default any"),
    ]),
    counts: as.sequence("Counts", self.count),
    cobj: as.record("CountsObject", [
        as.field("counts", self.counts),
    ]),
}
And here is an example of a model that matches the Object schema:
{ rname: "required_name", rany: ["anything",4,"you"], }
Let's check if it indeed matches:
moo validate \
    --passfail \
    --target Object \
    --schema examples/ovalid/simple-schema-hier.jsonnet \
    examples/ovalid/simple-model.jsonnet
[ true ]
Here we have provided moo validate with schema information in two ways that differ from the atomic example.
- We have provided a "context schema" with the -s/--schema option.
- We have identified the target inside this context via its type name Object.
An example below shows some other ways to provide target schema.
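The model above validates even though it supplies only two of the five fields of Object. A hand-written stand-in for the record check can illustrate the idea. It assumes that fields marked optional or given a default may be absent, which is consistent with the result above but is not taken from moo's source. FIELDS and check_object are hypothetical names, and how a real validator treats unknown extra fields is not addressed here.

```python
def is_name(value):
    """Stand-in for the Name (string) schema."""
    return isinstance(value, str)


def is_any(value):
    """Stand-in for the Data (any) schema: everything is accepted."""
    return True


# field name -> (type check, required?)
# Assumption: optional fields and fields with a default may be omitted.
FIELDS = {
    "rname": (is_name, True),
    "rany": (is_any, True),
    "oname": (is_name, False),
    "oany": (is_any, False),
    "dname": (is_name, False),
}


def check_object(model):
    """Return True if model satisfies the stand-in Object record."""
    if not isinstance(model, dict):
        return False
    for fname, (check, required) in FIELDS.items():
        if fname not in model:
            if required:
                return False
            continue
        if not check(model[fname]):
            return False
    return True


model = {"rname": "required_name", "rany": ["anything", 4, "you"]}
print(check_object(model))
```

Dropping rname from the model, or giving oname a non-string value, makes this check fail.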
More information when validation fails
So far, the examples have all been valid and true is returned. Let's make a failure.
moo validate \
    --passfail \
    --target Count \
    --schema examples/ovalid/simple-schema-hier.jsonnet \
    examples/ovalid/simple-model.jsonnet
[ false ]
Now a false is printed. The model is really meant to be of type Object but we validate it against type Count. We can see what the underlying JSON Schema validation engine thinks of this situation by omitting the --passfail option. Here is an example:
moo validate \
    --target Count \
    --schema examples/ovalid/simple-schema-hier.jsonnet \
    examples/ovalid/simple-model.jsonnet 2>&1 | awk /Failed/,EOF
Failed validating 'type' in schema:
    {'$defs': {},
     '$schema': 'http://json-schema.org/draft-07/schema#',
     'minimum': 0,
     'type': 'integer'}

On instance:
    {'rany': ['anything', 4, 'you'], 'rname': 'required_name'}
We use the awk filter to avoid cluttering this display with the Python traceback that precedes the more useful part of the output.
Using a different validation engine
By default moo validate uses the jsonschema Python module to perform ovalid type validation. Optionally it may apply fastjsonschema like so:
moo validate \
    --target Count \
    --validator fastjsonschema \
    --schema examples/ovalid/simple-schema-hier.jsonnet \
    examples/ovalid/simple-model.jsonnet 2>&1 | grep '^fastjsonschema'
fastjsonschema.exceptions.JsonSchemaValueException: data must be integer
As can be seen, fastjsonschema provides a rather more terse explanation of validation failures.
Other ways to identify target schema
Getting back to our "simple" schema, we identified a target schema in the above examples by providing a schema name of Object. Because that schema was provided in a "hier" schema object form we can also give an object key:
moo validate \
    --passfail \
    --target obj \
    --schema examples/ovalid/simple-schema-hier.jsonnet \
    examples/ovalid/simple-model.jsonnet
[ true ]
You can learn the key name by reading the Jsonnet source but it can sometimes be easier to compile the Jsonnet to JSON and examine that. We won't do that here but the command would be:
moo dump -f json examples/ovalid/simple-schema-hier.jsonnet
In many cases, the context schema is provided not as a "hier" object but as a sequence which has been topologically sorted according to type dependency information. The "simple" hier object is transformed into such a sequence with this example:
local moo = import "moo.jsonnet";
local hier = import "simple-schema-hier.jsonnet";
moo.oschema.sort_select(hier)
In order to identify a target schema in a sequence context schema one can still provide the schema name (e.g. Object, Count, etc.) or identify the target by providing an integer index into the sequence, counting in the usual "Python" way:
moo validate \
    --passfail \
    --target -2 \
    --schema examples/ovalid/simple-schema-seq.jsonnet \
    examples/ovalid/simple-model.jsonnet
[ true ]
And, again, moo dump may provide an easy way to learn which index to supply.
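The three ways of naming a target seen so far (object key, schema name, integer index) can be summarized in a small sketch. The function resolve_target is hypothetical and is not moo's own resolver. The contexts are reduced stand-ins carrying only a "name" attribute, and the ordering of the sequence is illustrative rather than the actual output of sort_select.

```python
def resolve_target(target, context):
    """Return the schema object in context identified by target.

    target may be an integer index (sequence context only), a key
    (object context only) or the "name" attribute of a held schema.
    """
    if isinstance(context, (list, tuple)):
        try:
            # Command line arguments arrive as strings such as "-2".
            return context[int(target)]
        except ValueError:
            candidates = context
    else:
        if target in context:
            return context[target]
        candidates = context.values()
    for schema in candidates:
        if schema.get("name") == target:
            return schema
    raise KeyError(f"no schema matches target {target!r}")


# Reduced stand-ins for the "hier" object and a sorted sequence.
hier = {
    "name": {"name": "Name"},
    "count": {"name": "Count"},
    "obj": {"name": "Object"},
}
seq = [{"name": "Name"}, {"name": "Count"}, {"name": "Object"}, {"name": "Counts"}]

print(resolve_target("obj", hier))
print(resolve_target("Object", hier))
print(resolve_target("-2", seq))
```

The key is tried before the name in an object context, and the index before the name in a sequence context; that ordering is a choice made for this sketch.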
Validating in sequence mode
So far the examples validated in the default "scalar mode" of moo validate. This mode assumes both the target schema and the model are singular, be they atomic or an aggregate. moo validate has a second mode called "sequence mode" where a pair of matched sequences of individual models and schema is assumed.
The "sequence mode" is really not much more than a glorified loop calling moo validate in "scalar mode" on each singular, matched model/schema pair from their individual sequences. Users may implement this loop themselves by calling moo validate many times in "scalar mode". The benefit in moving this loop inside of moo validate is that the user may provide fewer and more concise files holding schema and model information. In particular, sequence mode is useful to develop unit tests which exercise various portions of a larger schema.
We now look at an example of applying sequence mode using the "simple" example. We provide a single, concise file that brings both context and a sequence of target schema together with their sequence of models:
local h = import "simple-schema-hier.jsonnet";
{
    schema: h,
    targets: ["real", h.count, "Name",
              "Counts", "Counts", "Counts",
              "Count", "Count"],
    models: [6.9, 42, "Arthur",
             [1,2,3], [1.1, 2.2, 3.3], ["one", "two", "three"],
             1.1, "one"]
}
Note that the targets attribute of the top-level object which this file produces is an array of the same length as the models array. The validation walks down both arrays in step. The targets array holds a schema name or the key by which the schema can be found in the context schema, which is provided by the schema attribute.
We validate the entire sequence with this command:
moo validate \
    --passfail \
    -t targets:examples/ovalid/simple-sandm.jsonnet \
    -s schema:examples/ovalid/simple-sandm.jsonnet \
    models:examples/ovalid/simple-sandm.jsonnet
[ true, true, true, true, false, false, false, false ]
Compared to the examples above, we identify the target and context schema and the models all as attributes of the same file. Because the data provided by -t/--target resolves to more than one target schema, moo validate enters sequence mode automatically. Sequence mode is also detected if more than one -t/--target option is given, or the user can explicitly request it with the --sequence flag. Strictly speaking, this flag is only required if one processes a sequence of exactly one target.
As can be seen, half the validations failed. This is contrived and you can examine simple-sandm.jsonnet to understand why. In looking at that file, note that several ways to "spell" a target are used. As described in moo validate --help you can provide a target in at least these ways:
- the key name into a context schema object as in "real"
- directly a moo oschema or JSON Schema object as in h.count
- the schema name as in "Name"
If the context schema is provided as an array of schema objects one can specify a target as an integer index. It is even possible to specify a target as another data path prefixed file name.
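To see why the sequence example gives four passes followed by four failures, the loop that sequence mode performs can be mimicked with hand-written predicates standing in for the schema. This is only a sketch: CHECKS and validate_sequence are hypothetical names, the predicates approximate the Real, Count, Name and Counts types, and all targets are normalized to schema names rather than using the mixed spellings of the real file.

```python
def is_real(value):
    """Stand-in for Real: any number, excluding bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_count(value):
    """Stand-in for Count (dtype u4): a non-negative integer."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_name(value):
    """Stand-in for Name: a string."""
    return isinstance(value, str)


def is_counts(value):
    """Stand-in for Counts: a sequence whose items are all Count."""
    return isinstance(value, list) and all(is_count(v) for v in value)


CHECKS = {"Real": is_real, "Count": is_count, "Name": is_name, "Counts": is_counts}

targets = ["Real", "Count", "Name", "Counts", "Counts", "Counts", "Count", "Count"]
models = [6.9, 42, "Arthur", [1, 2, 3], [1.1, 2.2, 3.3],
          ["one", "two", "three"], 1.1, "one"]


def validate_sequence(models, targets):
    """Apply each target check to its matched model, in step."""
    if len(models) != len(targets):
        raise ValueError("models and targets must have the same length")
    return [CHECKS[target](model) for model, target in zip(models, targets)]


print(validate_sequence(models, targets))
# [True, True, True, True, False, False, False, False]
```

The last four pairs fail because floats and strings are offered where counts are expected, either as sequence items or as the value itself.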
Validation in Python
The moo validate command line interface is a thin wrapper around moo Python modules. This section steps through the essential function calls.
Loading files
A function is needed to load data files. Without going into details, here is one example. If search paths, top-level arguments and datapath prefixes are not important, they can be omitted.
import moo.io
import moo.util

def load_file(fn, path=(), **tlas):
    # Split off an optional "datapath:" prefix, then locate and load the file.
    dpath, filename = moo.util.unprefix(fn)
    sp = moo.util.search_path(filename, path)
    return moo.io.load(filename, sp, dpath, **tlas)
Loading the context schema
The context schema, if used, can then be loaded:
context = load_file(context_filename)
Loading target schema
The target schema can be given as described above in many forms. They can be resolved with code like:
targets = [...]  # list of targets in various forms
targets = moo.util.resolve_schema(targets, context, load_file)
Loading models
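Models are loaded like schema:
# ctx is not defined above; it appears to be the command line interface's
# context object. The load_file() helper may be used in its place.
models = ctx.obj.load(model)
# or with multiple models:
models = [ctx.obj.load(model) for model in models]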
Validation
Finally, validation, assuming a matched sequence of target schema and models:
res = moo.ovalid.validate(models, targets, context, throw=False)