Spreadsheet Designer’s Guide¶
In this guide you’ll learn the various rules Flatten Tool uses to convert one or more sheets in a spreadsheet into a JSON document. These rules are documented with examples based around a cafe theme.
Once you’ve understood how Flatten Tool works you should be able to design your own spreadsheet structures, debug problems in your spreadsheets and be able to make use of Flatten Tool’s more advanced features.
Before we get into too much detail though, let’s start by looking at the Command Line API for unflattening a spreadsheet.
Command Line API¶
To demonstrate the command line API you’ll start with the simplest possible example, a sheet listing Cafe names:
name |
---|
Healthy Cafe |
We’d like Flatten Tool to convert it to the following JSON structure for an array of cafes, with the cafe name being the only property we want for each cafe:
{
"cafe": [
{
"name": "Healthy Cafe"
}
]
}
Let’s try converting the sheet to the JSON above.
$ flatten-tool unflatten -f=csv examples/cafe/simple/
{
"main": [
{
"name": "Healthy Cafe"
}
]
}
That’s not too far off what we wanted. You can see the array of cafes, but the
key is named main
instead of cafe
. You can tell Flatten Tool that that the
rows in the spreadsheet are cafes and should come under a cafe
key by
specifying a root list path, described next.
Caution
Older Python versions add a trailing space after ,
characters when
outputting indented JSON. This means that your output might have whitespace
differences compared to what is described here.
Root List Path¶
The root list path is the key under which Flatten Tool should add an array of objects representing each row of the main sheet.
You specify the root list path with --root-list-path
option. If you don’t
specify it, main
is used as the default as you saw in the last example.
Let’s set --root-list-path
to cafe
so that our original input generates the
JSON we were expecting:
$ flatten-tool unflatten -f=csv examples/cafe/simple/ --root-list-path=cafe
{
"cafe": [
{
"name": "Healthy Cafe"
}
]
}
That’s what we expected. Great.
Note
Although --root-list-path
sounds like it accepts a path such as
building/cafe
, it only accepts a single key.
The root is a list¶
You can also specify the data outputted is just a list, using the --root-is-list
option.
$ flatten-tool unflatten -f=csv examples/cafe/simple/ --root-is-list
[
{
"name": "Healthy Cafe"
}
]
Writing output to a file¶
By default, Flatten Tool prints its output to standard output (your terminal).
If you want it to write its output to a file instead, you can use the
--output-name
option (or -o
for short).
$ flatten-tool unflatten -f=csv examples/cafe/simple/ -o=examples/cafe/simple-file/unflattened.json
$ cat examples/cafe/simple-file/unflattened.json
{
"main": [
{
"name": "Healthy Cafe"
}
]
}
Base JSON¶
If you want the resulting JSON to also include other keys that you know in advance, you can specify them in a separate base JSON file and Flatten Tool will merge the data from your spreadsheet into that file.
For example, if base.json
looks like this:
{
"country": "England"
}
and the data looks like this:
name |
---|
Healthy Cafe |
you can run this command using the --base-json
option to see the base.json
data with the spreadsheet rows merged in:
$ flatten-tool unflatten -f=csv --root-list-path=cafe --base-json=examples/cafe/simple-base-json/base.json examples/cafe/simple-base-json/
{
"country": "England",
"cafe": [
{
"name": "Healthy Cafe"
}
]
}
Warning
If you give the base JSON the same key as you specify in --root-list-path
then Flatten Tool will overwrite its value.
All unflatten options¶
You can see all the options available for unflattening by running:
$ flatten-tool unflatten -h
usage: flatten-tool unflatten [-h] -f {csv,ods,xlsx} [--xml]
[--id-name ID_NAME] [-b BASE_JSON]
[-m ROOT_LIST_PATH] [-e ENCODING]
[-o OUTPUT_NAME] [-c CELL_SOURCE_MAP]
[-a HEADING_SOURCE_MAP]
[--timezone-name TIMEZONE_NAME] [-r ROOT_ID]
[-s SCHEMA] [--convert-titles]
[--vertical-orientation]
[--metatab-name METATAB_NAME]
[--metatab-schema METATAB_SCHEMA]
[--metatab-only]
[--metatab-vertical-orientation]
[--xml-schema [XML_SCHEMA [XML_SCHEMA ...]]]
[--default-configuration DEFAULT_CONFIGURATION]
[--root-is-list] [--disable-local-refs]
[--xml-comment XML_COMMENT] [--convert-wkt]
input_name
positional arguments:
input_name Name of the input file or directory.
optional arguments:
-h, --help show this help message and exit
-f {csv,ods,xlsx}, --input-format {csv,ods,xlsx}
File format of input file or directory.
--xml Use XML as the output format
--id-name ID_NAME String to use for the identifier key, defaults to 'id'
-b BASE_JSON, --base-json BASE_JSON
A base json file to populate with the unflattened
data.
-m ROOT_LIST_PATH, --root-list-path ROOT_LIST_PATH
The path in the JSON that will contain the unflattened
list. Defaults to main.
-e ENCODING, --encoding ENCODING
Encoding of the input file(s) (only relevant for CSV).
This can be any encoding recognised by Python.
Defaults to utf8.
-o OUTPUT_NAME, --output-name OUTPUT_NAME
Name of the outputted file. Will have an extension
appended as appropriate.
-c CELL_SOURCE_MAP, --cell-source-map CELL_SOURCE_MAP
Path to write a cell source map to. Will have an
extension appended as appropriate.
-a HEADING_SOURCE_MAP, --heading-source-map HEADING_SOURCE_MAP
Path to write a heading source map to. Will have an
extension appended as appropriate.
--timezone-name TIMEZONE_NAME
Name of the timezone, defaults to UTC. Should be in
tzdata format, e.g. Europe/London
-r ROOT_ID, --root-id ROOT_ID
Root ID of the data format, e.g. ocid for OCDS
-s SCHEMA, --schema SCHEMA
Path to a relevant schema.
--convert-titles Convert titles. Requires a schema to be specified.
--vertical-orientation
Read spreadsheet so that headings are in the first
column and data is read vertically. Only for XLSX not
CSV
--metatab-name METATAB_NAME
If supplied will assume there is a metadata tab with
the given name
--metatab-schema METATAB_SCHEMA
The jsonschema of the metadata tab
--metatab-only Parse the metatab and nothing else
--metatab-vertical-orientation
Read metatab so that headings are in the first column
and data is read vertically. Only for XLSX not CSV
--xml-schema [XML_SCHEMA [XML_SCHEMA ...]]
Path to one or more XML schemas (used for sorting)
--default-configuration DEFAULT_CONFIGURATION
Comma separated list of default parsing commands for
all sheets. Only for XLSX not CSV
--root-is-list The root element is a list. --root-list-path and meta
data will be ignored.
--disable-local-refs Disable local refs when parsing JSON Schema.
--xml-comment XML_COMMENT
String comment of what generates the xml file
--convert-wkt Enable conversion of WKT to geojson
As you can see, some of the documentation is specific to two projects that use Flatten Tool:
- OCDS - http://standard.open-contracting.org/validator/
- 360Giving - http://www.threesixtygiving.org/standard/reference/
Other options such as --cell-source-map
and --heading-source-map
will be
described in the Developer Guide once the features stabilise.
Understanding JSON Pointer and how Flatten Tool uses it¶
Let’s consider this data again and explore the algorithm Flatten Tool uses to make it work:
name |
---|
Healthy Cafe |
Here’s a command to unflatten it and the resulting JSON:
$ flatten-tool unflatten -f=csv examples/cafe/simple/ --root-list-path=cafe
{
"cafe": [
{
"name": "Healthy Cafe"
}
]
}
The key to understanding how Flatten Tool represents more complex examples in a spreadsheet lies in knowing about the JSON Pointer specification. This specification describes a fairly intuitive way to reference values in a JSON document.
To briefly describe how it works, each /
character after the first one drills
down into a JSON structure. If the value after the /
is a string, then a key
is looked up, if it is an integer then an array index is taken.
For example, the JSON pointer /cafe/0/name
is equivalent to taking the
following value out of a JSON document named document
:
>>> document['cafe'][0]['name']
In the JSON document above, the JSON pointer /cafe/0/name
would return Healthy Cafe
.
Note
JSON pointer array indexes start at 0, just like lists in Python, hence the first cafe is at index 0.
Whilst JSON pointer is designed as a way for getting data out of a JSON document, Flatten Tool uses JSON Pointer as a way of describing how to move values into a JSON document from a spreadsheet.
To do this, as it comes across JSON pointers, it automatically creates the objects and arrays required.
You can think of Flatten Tool doing the following as it parses a sheet:
Load the base JSON or use an empty JSON object
For each row:
- Convert each column heading to a JSON pointer by removing whitespace and
prepending with
/cafe/
, then adding the row index and another/
to the front - Take the value in each column and associate it with the JSON pointer (treating any numbers as array indexes, and overwriting existing JSON pointer values for that row if necessary)
- Write the value into the position in the JSON object being specified by the JSON pointer, creating more structures as you go
- Convert each column heading to a JSON pointer by removing whitespace and
prepending with
In this example there is only one sheet, and only one row, so when parsing that
first row, /cafe/0/
is appended to name
to give the JSON pointer
/cafe/0/name
. Flatten Tool then writes Healthy Cafe
in the correct position.
Index behaviour¶
There is one subtlety you need to be aware of though before you see some examples.
Although Flatten Tool always uses strings in a JSON pointer as object keys, it only takes numbers it comes across as an indication of the array position.
For example, if you gave it the JSON pointer /cafe/1503/name
, there is no
guarantee that the name
would be placed in an object at index position 1503.
Instead Flatten Tool uses numbers in the same sheet that are at the same parent
JSON pointer path (/cafe/
in this case), as being the sort order the child
objects should appear in, but not the literal index positions.
If two objects use the same index at the same base JSON pointer path, Flatten Tool will keep both but the one it comes across first will come before the other.
This behaviour has two advantages:
- data won’t be lost if for some reason the index wasn’t specified correctly
- the data in the generated JSON will be in the same order as it was specified in the sheets which is likely to be what the person putting data into the spreadsheet would expect
This behaviour is also important when you learn about Lists of Objects (without IDs) later.
Tip
You’ll see later in the relationships section, that special id
values can
alter the index behavior described here and allow Flatten Tool to merge rows
from multiple sheets.
Multiple rows¶
Let’s look at a multi-row example:
name |
---|
Healthy Cafe |
Vegetarian Cafe |
This time Healthy Cafe
would be placed at /cafe/0/name
and Vegetarian
Cafe
at /cafe/1/name
producing this:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/simple-row/
{
"cafe": [
{
"name": "Healthy Cafe"
},
{
"name": "Vegetarian Cafe"
}
]
}
Although both Healthy Cafe
and Vegetarian Cafe
are under a column that
resolves to /cafe/0/name
, the rules described in the previous section explain
why both are present in the output and why Healthy Cafe
comes before
Vegetarian Cafe
.
Multiple columns¶
Let’s add the cafe address to the spreadsheet:
name | address |
---|---|
Healthy Cafe | 123 City Street, London |
Vegetarian Cafe | 42 Town Road, Bristol |
Note
CSV files require cells containing ,
characters to be escaped by wrapping
them in double quotes. That’s why if you look at the source CSV, the addresses
are escaped with "
characters.
This time Healthy Cafe
is placed at /cafe/0/name
as before, London
is
placed at /cafe/0/address
. Vegetarian Cafe
at /cafe/1/name
as before and
Bristol
is at /cafe/1/address
.
The result is:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/simple-col/
{
"cafe": [
{
"name": "Healthy Cafe",
"address": "123 City Street, London"
},
{
"name": "Vegetarian Cafe",
"address": "42 Town Road, Bristol"
}
]
}
Multiple sheets¶
So far, all the examples have just used one sheet. When multiple sheets are involved, the behaviour isn’t much different.
In effect, all Flatten Tool does is:
- take the JSON structure produced after processing the previous sheets and use it as the base JSON for processing the next sheet
- keep track of the index numbers of existing objects and generate JSON pointers that point to the next free index at any existing locations (with the effect of having new objects appended to any existing ones at the same location)
Once all the sheets have been processed the resulting JSON is returned.
Note
The CSV specification doesn’t support multiple sheets. To work around this, Flatten Tool treats a directory of CSV files as a single spreadsheet with multiple sheets - one for each file.
This is why all the CSV file examples given so far have been written to a
file in an empty directory and why only the directory name was needed in
the flatten-tool
commands.
Here’s a simple two-sheet example where the headings are the same in both sheets:
name |
---|
Healthy Cafe |
name |
---|
Vegetarian Cafe |
When you run the example you get this:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/multiple/
{
"cafe": [
{
"name": "Healthy Cafe"
},
{
"name": "Vegetarian Cafe"
}
]
}
The order is because the data
sheet was processed before the other
sheet.
Tip
CSV file sheets are processed in the order returned by os.listdir()
so
you should name them in the order you would like them processed.
Objects¶
Now you know that the column headings are really just a JSON Pointer specification, and the index values are only treated as indicators of the presence of arrays you can write some more sophisticated examples.
Rather than have the address just as string, we could represent it as an object. For example, imagine you’d like out output JSON in this structure:
{
"cafe": [
{
"name": "Healthy Cafe",
"address": {
"street": "123 City Street",
"city": "London"
}
},
{
"name": "Vegetarian Cafe",
"address": {
"street": "42 Town Road",
"city": "Bristol"
}
}
]
}
You can do this by knowing that the JSON Pointer to “123 City Street” would be
/cafe/0/address/street
so that we would need to name the street column
address/street
.
Here’s the data:
name | address/street | address/city |
---|---|---|
Healthy Cafe | 123 City Street | London |
Vegetarian Cafe | 42 Town Road | Bristol |
Let’s try it:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/object/
{
"cafe": [
{
"name": "Healthy Cafe",
"address": {
"street": "123 City Street",
"city": "London"
}
},
{
"name": "Vegetarian Cafe",
"address": {
"street": "42 Town Road",
"city": "Bristol"
}
}
]
}
Lists of Objects (without IDs)¶
The cafe’s that have made up our examples so far also have tables, and the tables have a table number so that the waiters know where the food has to be taken to.
Each cafe has many tables, so this is an example of a one-to-many relationship if you are used to working with relational databases.
You can represent the table information in JSON as a array of objects, where each
object represents a table, and each table has a number
key. Let’s imagine the
Healthy Cafe
has three tables numbered 1, 2 and 3. We’d like to produce this
structure:
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
}
]
}
In the relationships section later, we’ll see other (often better) ways of arranging this data using identifiers, but for now we’ll demonstrate an approach that puts all the table information in the same row as the cafe itself.
For example, consider this spreadsheet data:
name | table/0/number | table/1/number | table/2/number |
---|---|---|---|
Healthy Cafe | 1 | 2 | 3 |
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/list-of-objects/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
}
]
}
We’ll use this example of tables (of the furniture variety) in subsequent examples.
Index behaviour¶
Just as in the multiple sheets example earlier, the exact numbers at the table index positions aren’t too important to Flatten Tool. They just tell Flatten Tool that the value in the cell is part of an object in an array.
In this particular case though, Flatten Tool will keep columns in order implied by the indexes.
For example here the index values are such that the lowest number comes last:
name | table/30/number | table/20/number | table/10/number |
---|---|---|---|
Healthy Cafe | 1 | 2 | 3 |
We’d still expect 3 tables in the output, but we expect Flatten Tool to re-order the columns so that table 3 comes first, then 2, then 1:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/tables-index/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": "3"
},
{
"number": "2"
},
{
"number": "1"
}
]
}
]
}
Child objects like these tables can, of course have more than one key. Let’s
add a reserved
key to table number 1 but to try to confuse Flatten Tool,
we’ll specify it at the end:
name | table/30/number | table/20/number | table/10/number | table/30/reserved |
---|---|---|---|---|
Healthy Cafe | 1 | 2 | 3 | True |
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/tables-index-reserved/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": "3"
},
{
"number": "2"
},
{
"number": "1",
"reserved": "True"
}
]
}
]
}
Notice that Flatten Tool correctly associated the reserved
key with table 1
because of the index numbered 30
, even though the columns weren’t next to
each other.
For a much richer way of organising arrays of objects, see the Relationships section.
Plain Lists (Unsupported)¶
Flatten Tool doesn’t recognise arrays of JSON values other than objects (just described in the previous section) unless a schema is used.
Heading names such as tag/0
and tag/1
would be ignored and an
empty array would be put into the JSON.
Here’s some example data:
name | tag/0 |
---|---|
Healthy Cafe | health |
Vegetarian Cafe | veggie |
And the result:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/plain-list/
{
"cafe": [
{
"name": "Healthy Cafe",
"tag": []
},
{
"name": "Vegetarian Cafe",
"tag": []
}
]
}
However, an array of tags in the following format (semi-colon separated) can be handled if a schema is passed to Flatten Tool specifying the array type of the field.
name | tags |
---|---|
Healthy Cafe | health;low-cost;locally sourced food;take-out |
Vegetarian Cafe | veggie |
The schema we’ll pass is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"tags": {
"items": {
"type": "string"
},
"type": "array"
},
"name": {
"type": "string"
}
}
}
And the result:
$ flatten-tool unflatten -f=csv --root-list-path=cafe --schema=examples/cafe/plain-list-schema/tagsArraySchema.json examples/cafe/plain-list-schema/
{
"cafe": [
{
"name": "Healthy Cafe",
"tags": [
"health",
"low-cost",
"locally sourced food",
"take-out"
]
},
{
"name": "Vegetarian Cafe",
"tags": [
"veggie"
]
}
]
}
Read on for more about typed fields and use of schemas.
Typed fields¶
In the table examples you’ve seen so far, the table numbers are produced as strings in the JSON. The JSON Pointer specification doesn’t provide any way of telling you what type the value being pointed to is, so we can’t get the information from the column headings.
There are two places we can get it from though:
- The spreadsheet cell (if the underlying spreadsheet type supports it, e.g. CSV doesn’t but XLSX does)
- An external JSON Schema describing the data
If we can’t get any type information we fall back to assuming strings.
Here is the sample data we’ll use for the examples in the next two sections:
name | table/0/number | table/1/number | table/2/number |
---|---|---|---|
Healthy Cafe | 1 | 2 | 3 |
Using spreadsheet cell formatting¶
CSV files only support string values, so the easiest way to get the example above to use integers would be to use a spreadsheet format such as XLSX that supported integers and make sure the cell type was number. Flatten Tool would pass the cell value through to the JSON as a number in that case.
Note
Make sure you specify the correct format -f=xlsx
on the command line if
you want to use an XLSX file.
$ flatten-tool unflatten -f=xlsx --root-list-path=cafe examples/cafe/tables-typed-xlsx/tables.xlsx
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": 1
},
{
"number": 2
},
{
"number": 3
}
]
}
]
}
Caution
Number formats in spreadsheets are ignored in Python 2.7 so this example won’t work. It does work in Python 3.4 and above though.
If you look at Flatten Tool’s source code you’ll see the in test_docs.py
that the above example is skipped on older Python versions.
Using a JSON Schema with types¶
Here’s an example of a JSON Schema that can provide the typing information:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"TableObject": {
"type": "object",
"properties": {
"number": {
"type": "integer"
}
}
}
},
"type": "object",
"properties": {
"table": {
"items": {
"$ref": "#/definitions/TableObject"
},
"type": "array"
}
}
}
$ flatten-tool unflatten -f=csv --root-list-path=cafe --schema=examples/cafe/tables-typed-schema/cafe.schema examples/cafe/tables-typed-schema/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": 1
},
{
"number": 2
},
{
"number": 3
}
]
}
]
}
Tip
Although this example is too simple to demonstrate it, Flatten Tool ignores the order of individual properties in a JSON schema when producing JSON output, and instead follows the order of the columns in the sheets.
Human-friendly headings using a JSON Schema with titles¶
Let’s take a closer look at the array of objects example from earlier again:
name | table/0/number | table/1/number | table/2/number |
---|---|---|---|
Healthy Cafe | 1 | 2 | 3 |
The column headings table/0/number
, table/1/number
and table/2/number
aren’t very human readable, wouldn’t it be great if we could use headings like
this:
name | Table: 0: Number | Table: 1: Number | Table: 2: Number |
---|---|---|---|
Healthy Cafe | 1 | 2 | 3 |
Flatten Tool supports this if you do the following:
- Write a JSON Schema specifying the titles being used and
specify it with the
--schema
option - Use
:
characters instead of/
characters in the headings - Specify the
--convert-titles
option on the command line
Caution
If you forget any of these, Flatten Tool might produce incorrect JSON rather than failing.
Here’s a new JSON schema for this example:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"TableObject": {
"type": "object",
"properties": {
"number": {
"title": "Number",
"type": "integer"
}
}
}
},
"type": "object",
"properties": {
"table": {
"items": {
"$ref": "#/definitions/TableObject"
},
"title": "Table",
"type": "array"
}
}
}
Notice that both Table
and Number
are specified as titles.
Here’s what we get when we run it:
$ flatten-tool unflatten -f=csv --root-list-path=cafe --schema=examples/cafe/tables-human-1/cafe.schema --convert-titles examples/cafe/tables-human-1/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": 1
},
{
"number": 2
},
{
"number": 3
}
]
}
]
}
Optional array indexes¶
Looking at the JSON Schema from the last example again you’ll see that table
is specified as an array type:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"TableObject": {
"type": "object",
"properties": {
"number": {
"title": "Number",
"type": "integer"
}
}
}
},
"type": "object",
"properties": {
"table": {
"items": {
"$ref": "#/definitions/TableObject"
},
"title": "Table",
"type": "array"
}
}
}
This means that Flatten Tool can work out that any names specified in that
column are part of that array. If you had an example with just one column
representing each level of the tree, you could miss out the index in the
heading when using --schema
and --convert-titles
.
Here’s a similar example, but with just one rolled up column:
name | Table: Number |
---|---|
Healthy Cafe | 1 |
Here’s what we get when we run this new data with this schema:
$ flatten-tool unflatten -f=csv --root-list-path=cafe --schema=examples/cafe/tables-human-2/cafe.schema --convert-titles examples/cafe/tables-human-2/
{
"cafe": [
{
"name": "Healthy Cafe",
"table": [
{
"number": 1
}
]
}
]
}
Relationships using Identifiers¶
So far, all the examples you’ve seen have served to demonstrate how Flatten Tool works, but probably wouldn’t be particularly useful in real life, simply because they require everything related to be on the same row.
In this section you’ll learn how identifiers work and that will allow you much more freedom in designing different spreadsheet layouts that produce the same JSON.
In Flatten Tool, any field named id
is considered special. Flatten Tool knows
that any objects with the same id
at the same level are the same object and
that their values should be merged.
ID-based object merge behaviour¶
The merge behaviour happens whether the two IDs are specified in:
- different rows in the same sheet
- two rows in two different sheets
Basically, any time Flatten Tool comes across a row with an id
in it, it will
lookup any other objects in the array to see if that id
is already used and if
it is, it will merge it. If not, it will just append a new object to the array.
Caution
It is important to make sure your id
values really are unique. If you
accidentally use the same id
for two different objects, Flatten Tool
will think they are the same and merge them.
Flatten Tool will merge an existing and new object as follows:
- Any fields in new object that are missing in the existing one are added
- Any fields in the existing object that aren’t in the new one are left as they are
- If there are fields that are in both that have the same value, that value is kept
- If there are fields that are in both with different values, the existing values are kept and conflict warnings issued
This means that values in later rows do not overwrite existing conflicting values.
Let’s have a look at these rules in action in the next two sections with an example from a single sheet, and one from multiple sheets.
ID-based object merge in a single sheet¶
Here’s an example that demonstrates these rules:
id | name | address | number_of_tables |
---|---|---|---|
CAFE-HEALTH | Healthy Cafe | ||
CAFE-HEALTH | Vegetarian Cafe | 3 | |
CAFE-HEALTH | 123 City Street, London | ||
CAFE-HEALTH | 4 |
Let’s run it:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-merge-single/
Notice the warnings above about values being over-written:
You may have a duplicate Identifier: We couldn't merge these rows with the id "CAFE-HEALTH": field "name" in sheet "data": one cell has the value: "Healthy Cafe", the other cell has the value: "Vegetarian Cafe"
You may have a duplicate Identifier: We couldn't merge these rows with the id "CAFE-HEALTH": field "number_of_tables" in sheet "data": one cell has the value: "3", the other cell has the value: "4"
The actual JSON contains a single Cafe with id
value CAFE-HEALTH
and all
the values merged in:
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"number_of_tables": "3",
"address": "123 City Street, London"
}
]
}
ID-based object merge in multiple sheets¶
Here’s an example that uses the same data as the single sheet example above,
but spreads the rows over four sheets named a
, b
, c
and d
:
id | name | address | number_of_tables |
---|---|---|---|
CAFE-HEALTH | Healthy Cafe |
id | name | address | number_of_tables |
---|---|---|---|
CAFE-HEALTH | Incorrect value | 3 |
id | name | address | number_of_tables |
---|---|---|---|
CAFE-HEALTH | 123 City Street, London |
id | name | address | number_of_tables |
---|---|---|---|
CAFE-HEALTH | 4 |
Let’s run it:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-merge-multiple/
Notice the warnings above about values being over-written:
You may have a duplicate Identifier: We couldn't merge these rows with the id "CAFE-HEALTH": field "name" in sheet "b": one cell has the value: "Healthy Cafe", the other cell has the value: "Incorrect value"
You may have a duplicate Identifier: We couldn't merge these rows with the id "CAFE-HEALTH": field "number_of_tables" in sheet "d": one cell has the value: "3", the other cell has the value: "4"
And the rest of the output:
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"number_of_tables": "3",
"address": "123 City Street, London"
}
]
}
The result is the same as before.
Parent-child relationships (arrays of objects)¶
Things get much more interesting when you start dealing with arrays of objects
whose parents have an id
. This enables you to split the parents and children
up into multiple sheets rather than requiring everything sits one the same row.
As an example, let’s imagine that Vegetarian Cafe
is arranged having two
tables numbered 16
and 17
because they are share tables with another
restaurant next door.
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
},
{
"id": "CAFE-VEG",
"name": "Vegetarian Cafe",
"table": [
{
"number": "16"
},
{
"number": "17"
}
]
}
]
}
From the knowledge you gained when learning about arrays of objects without IDs earlier, you know that you can produce the correct structure with a CSV file like this:
id | name | table/0/number | table/1/number | table/2/number |
---|---|---|---|---|
CAFE-HEALTH | Healthy Cafe | 1 | 2 | 3 |
CAFE-VEG | Vegetarian Cafe | 16 | 17 |
This time, we’ll give both the Cafe’s IDs and move the tables into a separate sheet:
id | name |
---|---|
CAFE-HEALTH | Healthy Cafe |
CAFE-VEG | Vegetarian Cafe |
id | table/0/number |
---|---|
CAFE-HEALTH | 1 |
CAFE-VEG | 16 |
CAFE-HEALTH | 2 |
CAFE-HEALTH | 3 |
CAFE-VEG | 17 |
By having the tables in a separate sheet, you can now support cafe’s with as
many tables as you like, just by adding more rows and making sure the id
column for the table matches the id
value for the cafe.
Let’s run this example:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-lists-of-objects/
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
},
{
"id": "CAFE-VEG",
"name": "Vegetarian Cafe",
"table": [
{
"number": "16"
},
{
"number": "17"
}
]
}
]
}
By specifying an ID, the values in the tables sheet can be associated with the correct part of the tree created by the cafes sheet.
Index behaviour¶
Within the array of tables for each cafe, you might have noticed that each table
number has a JSON Pointer that ends in with /0/number
. Since they all have the
same index, they are simply ordered within each cafe in the order of the rows
in the sheet.
Grandchild relationships¶
In future we might like to extend this example so that we can track the dishes ordered by each table so we can generate a bill.
Let’s take the case of dishes served at tables and imagine that Healthy Cafe
has its own health fish and chips
dish. Now let’s also imagine that the dish
is ordered at tables 1 and 3.
If you are used to thinking about relational database you would probably think
about having a new sheet called dishes
with a two columns, one for an id
and one for the name
of the dish. You would then create a sheet to represent
a join table called table_dishes
that contained the ID of the table and of
the dish.
The problem with this approach is that the output is actually a tree, and not a
normalised relational model. Have a think about how you would write the
table_dishes
sheet. You’d need to write something like this:
table/0/id | dish/0/id |
---|---|
TABLE-1 | DISH-fish-and-chips |
TABLE-3 | DISH-fish-and-chips |
The problem is that dish/0/id
is really a JSON Pointer to /cafe/0/dish/0/id
and so would try to create a new dish
key under each cafe, not a dish
key
under each table.
You can’t do it this way. Instead you have to design you dish
sheet to
specify both the ID of the cafe and the ID of the table as well as the name of
the dish. If a dish is used in multiple tables, you will have multiple rows,
each with the same name in the name column. In this each way row contains the
entire path to its position in the tree.
Since nothing depends on the dishes yet, they don’t have to have an ID themselves, they just need to reference their parent IDs:
id | name |
---|---|
CAFE-HEALTH | Healthy Cafe |
id | table/0/id | table/0/number |
---|---|---|
CAFE-HEALTH | TABLE-1 | 1 |
CAFE-HEALTH | TABLE-2 | 2 |
CAFE-HEALTH | TABLE-3 | 3 |
id | table/0/id | table/0/dish/0/name |
---|---|---|
CAFE-HEALTH | TABLE-1 | Fish and Chips |
CAFE-HEALTH | TABLE-3 | Fish and Chips |
Here are the results:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-multiple/
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"id": "TABLE-1",
"dish": [
{
"name": "Fish and Chips"
}
],
"number": "1"
},
{
"id": "TABLE-3",
"dish": [
{
"name": "Fish and Chips"
}
],
"number": "3"
},
{
"id": "TABLE-2",
"number": "2"
}
]
}
]
}
Notice the ordering in this example. Because dishes
is processed before
tables
, TABLE-3
gets defined before TABLE-2
, and dish
gets added as a
key before tables
.
If the sheets were processed the other way around the data would be the same, but the ordering different.
Tip
Flatten Tool supports producing JSON hierarchies of arbitrary depth, not just the parent-child and parent-child-grandchild relationships you’ve seen in the examples so far. Just make sure that however deep an object is, it always has the IDs of all of its parents in the same row as it, as the tables and dishes sheets do.
Arbitrary-depth in a single table¶
You can also structure all the data into a single table. It is only recommended to do this if you have a very simple data structure where there is only one object at each part of the hierarchy.
In this example we’ll use a JSON Schema to infer the structure, allowing us to use human-readable column titles.
Here’s the data:
Identifier | Name | Table: Identifier | Table: Number | Table: Dish: Name |
---|---|---|---|---|
CAFE-HEALTH | Healthy Cafe | TABLE-1 | 1 | Fish and Chips |
CAFE-HEALTH | Healthy Cafe | TABLE-2 | 2 | |
CAFE-HEALTH | Healthy Cafe | TABLE-3 | 3 | Fish and Chips |
Let’s unflatten this table:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-grandchild/ -s examples/receipt/cafe.schema --convert-titles
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"id": "TABLE-1",
"number": 1,
"dish": [
{
"name": "Fish and Chips"
}
]
},
{
"id": "TABLE-2",
"number": 2
},
{
"id": "TABLE-3",
"number": 3,
"dish": [
{
"name": "Fish and Chips"
}
]
}
]
}
]
}
If you’d like to explore this example yourself, here’s the schema used in the example above:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"TableObject": {
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"number": {
"type": "integer",
"title": "Number"
},
"dish": {
"items": {
"$ref": "#/definitions/DishObject"
},
"type": "array",
"title": "Dish"
}
}
},
"DishObject": {
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"name": {
"type": "string",
"title": "Name"
},
"cost": {
"type": "number",
"title": "Cost"
}
}
}
},
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"name": {
"type": "string",
"title": "Name"
},
"address": {
"type": "string",
"title": "Address"
},
"table": {
"items": {
"$ref": "#/definitions/TableObject"
},
"type": "array",
"title": "Table"
}
}
}
Missing IDs¶
You might be wondering what happens if IDs are accidentally missing. There are two cases where this can happen:
- The ID is missing but no child objects reference it anyway
- The ID is missing and so children can’t be added
To demonstrate both of these in one example consider the following example. In particular notice that:
CAFE-VEG
is missing from thecafes
sheetCAFE-VEG
is missing from the last row in thetables
sheet
id | name | address |
---|---|---|
CAFE-HEALTH | Healthy Cafe | 123 City Street, London |
Vegetarian Cafe | 42 Town Road, Bristol |
id | table/0/number |
---|---|
CAFE-HEALTH | 1 |
CAFE-HEALTH | 2 |
CAFE-HEALTH | 3 |
CAFE-VEG | 16 |
17 |
Let’s run this example:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/relationship-missing-ids/
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"address": "123 City Street, London",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
},
{
"id": "CAFE-VEG",
"table": [
{
"number": "16"
}
]
},
{
"name": "Vegetarian Cafe",
"address": "42 Town Road, Bristol"
},
{
"table": [
{
"number": "17"
}
]
}
]
}
You’ll notice that all the data and tables for CAFE-HEALTH
are output
correctly in the first object. This is what we’d expect because all the IDs
were present.
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"address": "123 City Street, London",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
},
Next is this cafe:
{
"name": "Vegetarian Cafe",
"address": "42 Town Road, Bristol"
},
This is as much information as Flatten Tool can work out from the second row of
the cafes
sheet because the ID is missing. Flatten Tool just appends a new
cafe with the data it has.
Next, Flatten Tool works through the tables
sheet, it finds table 16 and
knows it must be associated with a cafe called CAFE-VEG
that is specified in
the id
column, but because this id
is present in the cafes
sheet, it
can’t merge it in. Instead it just appends data for the cafe:
{
"id": "CAFE-VEG",
"table": [
{
"number": "16"
}
]
},
Finally, Flatten Tool finds table 17 in the tables
sheet. It doesn’t know
which Cafe this is for, but it knows tables are part of cafes so it adds
another unnamed cafe:
{
"table": [
{
"number": "17"
}
]
}
Relationships with JSON Schema¶
If you want to use Flatten Tool’s support for JSON Schema to extend to relationships
you need to amend your JSON Schema to tell it about the id
fields:
- Make sure that the
id
field is specified for every object in the hierarchy (although this isn’t necessary for objects right at the bottom of the hierarchy) - Give the
id
field a title ofIdentifier
With these two things in place, Flatten Tool will correctly handle relationships.
Caution
If you forget to add the id
field, Flatten Tool will not know anything
about it when generating templates or converting titles.
Sheet Shapes¶
Now that you’ve seen some of the details of how Flatten Tool works we can look in more detail at the different shapes your data can have in the sheets.
To discuss the pros and cons of the different shapes, we’ll work through a whole example.
Imagine that you Healthy Cafe and Vegetarian Cafe are both part of a chain and you have to create a receipt system for them. You need to track which dishes are ordered at which tables in which cafes.
The JSON you would like to produce from the sheets the waiters write as they take orders looks like this:
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"id": "TABLE-1",
"number": "1",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
},
{
"name": "Pesto Pasta Salad",
"cost": "6.95"
}
]
},
{
"id": "TABLE-2",
"number": "2"
},
{
"id": "TABLE-3",
"number": "3",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
}
]
}
]
},
{
"id": "CAFE-VEG",
"name": "Vegetarian Cafe",
"table": [
{
"id": "TABLE-16",
"number": "16",
"dish": [
{
"name": "Large Glass Sauvignon",
"cost": "5.95"
}
]
},
{
"id": "TABLE-17",
"number": "17"
}
]
}
]
}
There are many ways we could arrange this data:
- cafes, tables and dishes all separate
- cafes and tables together, dishes separate, with one row per table in the cafes and tables sheet
- cafes and tables together, dishes separate, with one row per cafe in the cafes and tables sheet
- tables and dishes together, cafes separate, with one row per table in the tables and dishes sheet
- tables and dishes together, cafes separate, with one row per dish in the tables and dishes sheet
Let’s take a look at the first three cases. Combining tables and dishes into one sheet follows the same principles as combining cafes and tables, so we won’t demonstrate those examples too.
Separate sheet for each object¶
Here’s the first way of doing this with everything in its own sheet. This is the recommended approach unless you have a good reason to move some parts of a table into another one. It is also the default you will get when using Flatten Tool to flatten or generate a template for a JSON structure. You’ll learn about that later.
id | name |
---|---|
CAFE-HEALTH | Healthy Cafe |
CAFE-VEG | Vegetarian Cafe |
id | table/0/id | table/0/number |
---|---|---|
CAFE-HEALTH | TABLE-1 | 1 |
CAFE-HEALTH | TABLE-2 | 2 |
CAFE-HEALTH | TABLE-3 | 3 |
CAFE-VEG | TABLE-16 | 16 |
CAFE-VEG | TABLE-17 | 17 |
id | table/0/id | table/0/dish/0/name | table/0/dish/0/cost |
---|---|---|---|
CAFE-HEALTH | TABLE-1 | Fish and Chips | 9.95 |
CAFE-HEALTH | TABLE-1 | Pesto Pasta Salad | 6.95 |
CAFE-HEALTH | TABLE-3 | Fish and Chips | 9.95 |
CAFE-VEG | TABLE-16 | Large Glass Sauvignon | 5.95 |
Note
Notice that this time the CSV sheets are prefixed with an integer to make sure they are processed in the right order. If the prefixes weren’t there, the order of the tables in the resulting JSON might be different.
If you were using an XLSX file, Flatten Tool would process the sheets in the order they appeared, regardless of their names, so the prefix wouldn’t be needed.
You can run the example with this:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/receipt/normalised/
You should see the same JSON as shown at the top of the section.
The advantage of this set up is that it allows any number of cafes, tables and dishes. The disadvantage is that it requires three sheets, making data a bit harder to find.
Combining objects¶
Now let’s imagine that all your cafe’s are small and they never have more than three tables. In this case we can combine tables into cafes so that we just have two sheets.
Table per row¶
Here’s what it looks like when you want to use one row per table:
id | name | table/0/id | table/0/number |
---|---|---|---|
CAFE-HEALTH | Healthy Cafe | TABLE-1 | 1 |
CAFE-HEALTH | Healthy Cafe | TABLE-2 | 2 |
CAFE-HEALTH | Healthy Cafe | TABLE-3 | 3 |
CAFE-VEG | Vegetarian Cafe | TABLE-16 | 16 |
CAFE-VEG | Vegetarian Cafe | TABLE-17 | 17 |
id | table/0/id | table/0/dish/0/name | table/0/dish/0/cost |
---|---|---|---|
CAFE-HEALTH | TABLE-1 | Fish and Chips | 9.95 |
CAFE-HEALTH | TABLE-1 | Pesto Pasta Salad | 6.95 |
CAFE-HEALTH | TABLE-3 | Fish and Chips | 9.95 |
CAFE-VEG | TABLE-16 | Large Glass Sauvignon | 5.95 |
You can run the example with this:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/receipt/combine-table-into-cafe/
If you do run it you’ll see the JSON is exactly the same as before.
Unlike a database, Flatten Tool won’t complain if different Cafe names are associated with the same Cafe ID in the same table, instead you’ll just get a warning.
Combining sheets works best when:
- the child (the object being combined in to the parent) doesn’t have that many properties
- you can be sure there won’t be too many children for each parent
- there is a low risk of typos being made in the duplicated data
Cafe per row¶
There’s another variant of this shape that we can use. If we just want to use one row per cafe.
id | name | table/0/id | table/0/number | table/1/id | table/1/number | table/2/id | table/2/number |
---|---|---|---|---|---|---|---|
CAFE-HEALTH | Healthy Cafe | TABLE-1 | 1 | TABLE-2 | 2 | TABLE-3 | 3 |
CAFE-VEG | Vegetarian Cafe | TABLE-16 | 16 | TABLE-17 | 17 |
id | table/0/id | table/0/dish/0/name | table/0/dish/0/cost |
---|---|---|---|
CAFE-HEALTH | TABLE-1 | Fish and Chips | 9.95 |
CAFE-HEALTH | TABLE-1 | Pesto Pasta Salad | 6.95 |
CAFE-HEALTH | TABLE-3 | Fish and Chips | 9.95 |
CAFE-VEG | TABLE-16 | Large Glass Sauvignon | 5.95 |
You can run the example with this:
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/receipt/combine-table-into-cafe-2/
The JSON is the same as before, as you would expect.
All in one table¶
It would also be possible to put all the data in a single table, but this would look quite complicated since there is more than one table in each cafe and more than one dish at each table.
To understand the approach, have a look at the “Arbitrary-depth in a single table” section earlier.
Tip
If you’d like to explore these examples yourself using human-readable column titles, you can use the schema in the “Arbitrary-depth in a single table” section too.
Metadata Tab¶
Flatten Tool supports naming of a special sheet (or Tab) in a spreadsheet to add data to the top level of the returned data structure. Currently it only supports output format JSON and the input format has to be XLSX.
Example Usage¶
You have a spreadsheet named “mydata.xlsx with” 2 sheets. The first sheet named “Cafe”:
name | address |
---|---|
Healthy Cafe | 123 City Street, London |
Vegetarian Cafe | 42 Town Road, Bristol |
A second sheet you would like to add some metadata to this list of rows to a sheet named “Meta”:
dataLicense | CC |
rowCount | 2 |
publishedDate | 2001-01-01 |
As you can see it is also possible to choose to have the metadata headings on the first column (not the first row) with metadata vertical.
The command for doing this:
$ flatten-tool unflatten --input-format=xlsx --metatab-name Meta --metatab-vertical-orientation examples/cafe/meta-tab/meta-tab.xlsx
{
"dataLicense": "CC",
"rowCount": "2",
"publishedDate": "2001-01-01",
"main": [
{
"name": "Healthy Cafe",
"address": "123 City Street, London"
},
{
"name": "Vegetarian Cafe",
"address": "42 Town Road, Bristol"
}
]
}
Options¶
--metatab-name
This is the name of the sheet with the metadata on. It is case sensitive. It is the only mandatory option if you want to parse a metatab, without it no metatab will be parsed
--metatab-schema
The JSON schema of the metatab. This schema will be used to determine the types and/or titles of the data in the metatab. It works in the same way as the –schema option but just for the metatab. The schema used with the –schema option has no effect on the metatab parsing, so this has to be specified if you need title handling or want to specify types.
--metatab-only
Just return the metatab information and not the rest of the doc. Using the example above:
$ flatten-tool unflatten --input-format=xlsx --metatab-only --metatab-name Meta --metatab-vertical-orientation examples/cafe/meta-tab-only/meta-tab.xlsx
{
"dataLicense": "CC",
"rowCount": "2",
"publishedDate": "2001-01-01"
}
--metatab-vertical-orientation
Say that the metatab data runs vertically rather that horizontally see example above.
Configuration properties: skip and header rows¶
Flatten Tool supports directives in the first row of a file to tell it to:
- skiprows - start processing data from n rows down
- headerrows - the total number of header rows. Note that the first header row will be treated as field paths.
Example usage¶
You have a CSV file named “mydata.csv” that contains:
- Two rows of general provenance information or notes;
- The field paths;
- Two rows that explain the meaning of the fields
This pattern may occur, for example, when you export from a spreadsheet that includes formatted header rows that explain the data.
By adding a row containing a cell with ‘#’, and then a set of configuration directives, you can instruct Flatten Tool to skip rows at the top of the file, and to recognise that the field paths are followed by a set of additional header lines.
# | skiprows 2 | headerrows 3 |
This row is skipped | It could contain some provenance data | |
This row is skipped too | ||
name | address/street | address/city |
Name | Street Address | City address |
Use name from the sign | Don’t include the postcode | |
Healthy Cafe | 123 City Street | London |
Vegetarian Cafe | 42 Town Road | Bristol |
Flatten tool will interpret the ‘#’ configuration row, and generate the appropriate output with no additional parameters needed at the command line.
$ flatten-tool unflatten -f=csv --root-list-path=cafe examples/cafe/skip-and-headers/
{
"cafe": [
{
"name": "Healthy Cafe",
"address": {
"street": "123 City Street",
"city": "London"
}
},
{
"name": "Vegetarian Cafe",
"address": {
"street": "42 Town Road",
"city": "Bristol"
}
}
]
}
List of configuration features¶
There is also the ‘ignore’ command, which can be used to ignore sheets in a multi-tab workbook.
When configuration options are set in metatab, they apply to all sheets unless they are overridden.
Further configuration options can be seen at https://github.com/OpenDataServices/flatten-tool/blob/7fa96933b8fc3ba07a3d44fe07dccf2791165686/flattentool/lib.py
Source maps¶
Once you have unflattened a spreadsheet into a JSON document you will usually pass the document to a JSON Schema validator to make sure all the data is valid.
If there are any errors in the JSON, it is very useful to be able to point the user back to the corresponding place in the original spreadsheet. Flatten Tool provides source maps for exactly this purpose.
There are two types of source map:
- Cell source map - points from a JSON pointer path to a cell (or row) in the original spreadsheet
- Heading source map - specifies the column for each heading
Here’s an example where we unflatten a normalised spreadsheet, but generate both a cell and a heading source map as we do.
$ flatten-tool unflatten -f=csv --root-list-path=cafe --cell-source-map examples/receipt/source-map/actual/cell_source_map.json --heading-source-map examples/receipt/source-map/actual/heading_source_map.json examples/receipt/source-map/input/
Here’s the source data:
id | name |
---|---|
CAFE-HEALTH | Healthy Cafe |
CAFE-VEG | Vegetarian Cafe |
id | table/0/id | table/0/number |
---|---|---|
CAFE-HEALTH | TABLE-1 | 1 |
CAFE-HEALTH | TABLE-2 | 2 |
CAFE-HEALTH | TABLE-3 | 3 |
CAFE-VEG | TABLE-16 | 16 |
CAFE-VEG | TABLE-17 | 17 |
id | table/0/id | table/0/dish/0/name | table/0/dish/0/cost |
---|---|---|---|
CAFE-HEALTH | TABLE-1 | Fish and Chips | 9.95 |
CAFE-HEALTH | TABLE-1 | Pesto Pasta Salad | 6.95 |
CAFE-HEALTH | TABLE-3 | Fish and Chips | 9.95 |
CAFE-VEG | TABLE-16 | Large Glass Sauvignon | 5.95 |
Here’s the resulting JSON document (the same as before):
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"id": "TABLE-1",
"number": "1",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
},
{
"name": "Pesto Pasta Salad",
"cost": "6.95"
}
]
},
{
"id": "TABLE-2",
"number": "2"
},
{
"id": "TABLE-3",
"number": "3",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
}
]
}
]
},
{
"id": "CAFE-VEG",
"name": "Vegetarian Cafe",
"table": [
{
"id": "TABLE-16",
"number": "16",
"dish": [
{
"name": "Large Glass Sauvignon",
"cost": "5.95"
}
]
},
{
"id": "TABLE-17",
"number": "17"
}
]
}
]
}
Let’s look in detail at the cell source map and heading source map for this example.
Cell source map¶
A cell source map maps each JSON pointer in the document above back to the cells where that value is referenced.
Using the example you’ve just seen, let’s look at the very last value in the
spreadsheet for the number of TABLE-17
in CAFE-VEG
. The JSON pointer is
cafe/1/table/1/number
and the value itself is 17
.
Looking back at the source sheets you can see the only place this value appears
is in 2_tables.csv
. It appears in column C (the third column), row 6 (row 1
is treated as the heading so the values start at row 2). The heading of this
column in table/0/number
(which happens to be a JSON pointer, but if we were
using human readable headings, those headings would be used instead). We’d
therefore expect the cell source map to have just one entry for
cafe/1/table/1/number
that points to cell C2
like this:
"cafe/1/table/1/number": [
[
"2_tables",
"C",
6,
"table/0/number"
]
],
Here’s the actual cell source map and as you can see, the entry for
cafe/1/table/1/number
is as we expect (it is near the end):
{
"cafe/0/id": [
[
"1_cafes",
"A",
2,
"id"
],
[
"2_tables",
"A",
2,
"id"
],
[
"2_tables",
"A",
3,
"id"
],
[
"2_tables",
"A",
4,
"id"
],
[
"3_dishes",
"A",
2,
"id"
],
[
"3_dishes",
"A",
3,
"id"
],
[
"3_dishes",
"A",
4,
"id"
]
],
"cafe/0/name": [
[
"1_cafes",
"B",
2,
"name"
]
],
"cafe/0/table/0/dish/0/cost": [
[
"3_dishes",
"D",
2,
"table/0/dish/0/cost"
]
],
"cafe/0/table/0/dish/0/name": [
[
"3_dishes",
"C",
2,
"table/0/dish/0/name"
]
],
"cafe/0/table/0/dish/1/cost": [
[
"3_dishes",
"D",
3,
"table/0/dish/0/cost"
]
],
"cafe/0/table/0/dish/1/name": [
[
"3_dishes",
"C",
3,
"table/0/dish/0/name"
]
],
"cafe/0/table/0/id": [
[
"2_tables",
"B",
2,
"table/0/id"
],
[
"3_dishes",
"B",
2,
"table/0/id"
],
[
"3_dishes",
"B",
3,
"table/0/id"
]
],
"cafe/0/table/0/number": [
[
"2_tables",
"C",
2,
"table/0/number"
]
],
"cafe/0/table/1/id": [
[
"2_tables",
"B",
3,
"table/0/id"
]
],
"cafe/0/table/1/number": [
[
"2_tables",
"C",
3,
"table/0/number"
]
],
"cafe/0/table/2/dish/0/cost": [
[
"3_dishes",
"D",
4,
"table/0/dish/0/cost"
]
],
"cafe/0/table/2/dish/0/name": [
[
"3_dishes",
"C",
4,
"table/0/dish/0/name"
]
],
"cafe/0/table/2/id": [
[
"2_tables",
"B",
4,
"table/0/id"
],
[
"3_dishes",
"B",
4,
"table/0/id"
]
],
"cafe/0/table/2/number": [
[
"2_tables",
"C",
4,
"table/0/number"
]
],
"cafe/1/id": [
[
"1_cafes",
"A",
3,
"id"
],
[
"2_tables",
"A",
5,
"id"
],
[
"2_tables",
"A",
6,
"id"
],
[
"3_dishes",
"A",
5,
"id"
]
],
"cafe/1/name": [
[
"1_cafes",
"B",
3,
"name"
]
],
"cafe/1/table/0/dish/0/cost": [
[
"3_dishes",
"D",
5,
"table/0/dish/0/cost"
]
],
"cafe/1/table/0/dish/0/name": [
[
"3_dishes",
"C",
5,
"table/0/dish/0/name"
]
],
"cafe/1/table/0/id": [
[
"2_tables",
"B",
5,
"table/0/id"
],
[
"3_dishes",
"B",
5,
"table/0/id"
]
],
"cafe/1/table/0/number": [
[
"2_tables",
"C",
5,
"table/0/number"
]
],
"cafe/1/table/1/id": [
[
"2_tables",
"B",
6,
"table/0/id"
]
],
"cafe/1/table/1/number": [
[
"2_tables",
"C",
6,
"table/0/number"
]
],
"cafe/0": [
[
"1_cafes",
2
],
[
"2_tables",
2
],
[
"2_tables",
3
],
[
"2_tables",
4
],
[
"3_dishes",
2
],
[
"3_dishes",
3
],
[
"3_dishes",
4
]
],
"cafe/0/table/0/dish/0": [
[
"3_dishes",
2
]
],
"cafe/0/table/0/dish/1": [
[
"3_dishes",
3
]
],
"cafe/0/table/0": [
[
"2_tables",
2
],
[
"3_dishes",
2
],
[
"3_dishes",
3
]
],
"cafe/0/table/1": [
[
"2_tables",
3
]
],
"cafe/0/table/2/dish/0": [
[
"3_dishes",
4
]
],
"cafe/0/table/2": [
[
"2_tables",
4
],
[
"3_dishes",
4
]
],
"cafe/1": [
[
"1_cafes",
3
],
[
"2_tables",
5
],
[
"2_tables",
6
],
[
"3_dishes",
5
]
],
"cafe/1/table/0/dish/0": [
[
"3_dishes",
5
]
],
"cafe/1/table/0": [
[
"2_tables",
5
],
[
"3_dishes",
5
]
],
"cafe/1/table/1": [
[
"2_tables",
6
]
]
}
You’ll notice that some JSON pointers map to multiple source cells. This happens when data appears in multiple places, such as when the cell refers to an identifier.
You’ll also notice that after all the JSON pointers that point to values such
as cafe/0/id
or cafe/1/table/1/number
there are a set of JSON pointers that
point to objects rather than cells. For example cafe/0
or cafe/1/table/1
.
These JSON pointers refer back to the rows which contain values that make up
the object. For example cafe/1/table/1
looks like this:
"cafe/1/table/1": [
[
"2_tables",
6
]
]
This tells us that the data that makes up that table in the final JSON was all
defined in the 2_tables
sheet, row 6 (remembering that rows start at 2
because the header row is row 1). Again, if data from multiple rows goes to
make up the object, there may be multiple arrays in the JSON pointer result.
This second kind of entry in the cell source map is useful when a JSON schema validator gives errors to describe a missing value since it is likely that you will need to add the value on one for the rows where the other values are defined.
Heading source map¶
The heading source map maps a JSON pointer with all numbers removed, back to the column heading at the top of the columns where corresponding values have been placed.
Here’s the heading source map that was generated in the example we’ve been using in this section:
{
"cafe/id": [
[
"1_cafes",
"id"
],
[
"2_tables",
"id"
],
[
"3_dishes",
"id"
]
],
"cafe/name": [
[
"1_cafes",
"name"
]
],
"cafe/table/dish/cost": [
[
"3_dishes",
"table/0/dish/0/cost"
]
],
"cafe/table/dish/name": [
[
"3_dishes",
"table/0/dish/0/name"
]
],
"cafe/table/id": [
[
"2_tables",
"table/0/id"
],
[
"3_dishes",
"table/0/id"
]
],
"cafe/table/number": [
[
"2_tables",
"table/0/number"
]
]
}
The heading source map is generated separately from the cell source map, so headings can be found even if they have no corresponding data in the resulting JSON.
XML Comment¶
When a file is unflatten to an xml file, there is a default comment that says:
XML generated by flatten-tool
This comment can be edited by using the unflatten argument xml_comment or cli command –xml-comment.