Flattening¶
Caution
This page is a work in progress. The information may not be complete, and unlike the Spreadsheet Designer’s Guide, the tests backing up the documented examples are not correct. Use with caution.
The Spreadsheet Designer’s Guide was about unflattening - taking a spreadsheet and producing a JSON document.
In this section you’ll learn about flattening. The main use case for wanting to flatten a JSON document is so that you can manage the data in a spreadsheet.
Flatten Tool provides the flatten-tool flatten
sub-command for this purpose.
Generating a spreadsheet from a JSON document¶
Generating a spreadsheet from a JSON document is very similar to creating a template.
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/simple/actual
One difference is that the default output name is flattened
, and so the
command above will generate a flattened/
directory with CSV files and a
flattened.xlsx
file in the current working directory.
The schema is the same as the one used in the user guide and looks like this:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"TableObject": {
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"number": {
"type": "integer",
"title": "Number"
},
"dish": {
"items": {
"$ref": "#/definitions/DishObject"
},
"type": "array",
"title": "Dish"
}
}
},
"DishObject": {
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"name": {
"type": "string",
"title": "Name"
},
"cost": {
"type": "number",
"title": "Cost"
}
}
}
},
"type": "object",
"properties": {
"id": {
"type": "string",
"title": "Identifier"
},
"name": {
"type": "string",
"title": "Name"
},
"address": {
"type": "string",
"title": "Address"
},
"table": {
"items": {
"$ref": "#/definitions/TableObject"
},
"type": "array",
"title": "Table"
}
}
}
The input JSON file looks like this:
{
"cafe": [
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"table": [
{
"id": "TABLE-1",
"number": "1",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
},
{
"name": "Pesto Pasta Salad",
"cost": "6.95"
}
]
},
{
"id": "TABLE-2",
"number": "2"
},
{
"id": "TABLE-3",
"number": "3",
"dish": [
{
"name": "Fish and Chips",
"cost": "9.95"
}
]
}
]
},
{
"id": "CAFE-VEG",
"name": "Vegetarian Cafe",
"table": [
{
"id": "TABLE-16",
"number": "16",
"dish": [
{
"name": "Large Glass Sauvignon",
"cost": "5.95"
}
]
},
{
"id": "TABLE-17",
"number": "17"
}
]
}
]
}
If you run the example above, Flatten Tool will generate the following CSV files for you, populated with the data from the input JSON file.
id |
name |
address |
---|---|---|
CAFE-HEALTH |
Healthy Cafe |
|
CAFE-VEG |
Vegetarian Cafe |
id |
table/0/id |
table/0/number |
---|---|---|
CAFE-HEALTH |
TABLE-1 |
1 |
CAFE-HEALTH |
TABLE-2 |
2 |
CAFE-HEALTH |
TABLE-3 |
3 |
CAFE-VEG |
TABLE-16 |
16 |
CAFE-VEG |
TABLE-17 |
17 |
id |
table/0/id |
table/0/dish/0/id |
table/0/dish/0/name |
table/0/dish/0/cost |
CAFE-HEALTH |
TABLE-1 |
Fish and Chips |
9.95 |
|
CAFE-HEALTH |
TABLE-1 |
Pesto Pasta Salad |
6.95 |
|
CAFE-HEALTH |
TABLE-3 |
Fish and Chips |
9.95 |
|
CAFE-VEG |
TABLE-16 |
Large Glass Sauvignon |
5.95 |
Caution
If you forget the --root-list-path
option and your data isn’t under a top
level key called main
, Flatten Tool won’t find your data and will instead
generate a single empty sheet called main
, which probably isn’t what you
want.
If your data has a list as a root, use the --root-is-list
option.
[
{
"id": "shop",
"title": "Shop"
},
{
"id": "pub",
"title": "Pub"
}
]
$ flatten-tool flatten --root-is-list examples/flatten/root-is-list/data.json -o examples/flatten/root-is-list/actual
id |
title |
shop |
Shop |
pub |
Pub |
Sheet Prefix¶
You can pass a string that will be added at the start of all CSV file names, or all Excel sheet names, using the --sheet-prefix
option.
$ flatten-tool flatten --sheet-prefix=test- --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/sheet-prefix/actual
Will produce:
id |
name |
address |
---|---|---|
CAFE-HEALTH |
Healthy Cafe |
|
CAFE-VEG |
Vegetarian Cafe |
id |
table/0/id |
table/0/number |
---|---|---|
CAFE-HEALTH |
TABLE-1 |
1 |
CAFE-HEALTH |
TABLE-2 |
2 |
CAFE-HEALTH |
TABLE-3 |
3 |
CAFE-VEG |
TABLE-16 |
16 |
CAFE-VEG |
TABLE-17 |
17 |
id |
table/0/id |
table/0/dish/0/id |
table/0/dish/0/name |
table/0/dish/0/cost |
CAFE-HEALTH |
TABLE-1 |
Fish and Chips |
9.95 |
|
CAFE-HEALTH |
TABLE-1 |
Pesto Pasta Salad |
6.95 |
|
CAFE-HEALTH |
TABLE-3 |
Fish and Chips |
9.95 |
|
CAFE-VEG |
TABLE-16 |
Large Glass Sauvignon |
5.95 |
Filter¶
When flattening, you can optionally choose to only process some of the data.
Currently, only simple filters can be specified using the --filter-field
and --filter-value
option.
{
"main": [
{
"id": "1",
"type": "cafe",
"title": "Cafe Open",
"dishes": [
{ "title": "All Day Breakfast" }
],
"coffee": "passable"
},
{
"id": "2",
"type": "pub",
"title": "Pints R Us",
"pints": [
{ "title": "Beer" },
{ "title": "Larger" }
],
"coke": "pepsi"
}
]
}
$ flatten-tool flatten --filter-field=type --filter-value=pub examples/flatten/filter/input.json -o examples/flatten/filter/actual
id |
type |
title |
coke |
---|---|---|---|
2 |
pub |
Pints R Us |
pepsi |
id |
pints/0/title |
---|---|
2 |
Beer |
2 |
Larger |
No dishes
sheet is produced, and the main sheet does not have a coffee
column.
The field specified must be a field directly on the data object - it’s not possible to filter on fields like pints/0/title
.
Remove Empty Schema Columns¶
By default, when using this with schema specified columns that are empty (have no data) will be kept in the output.
But you can pass the remove-empty-schema-columns flag to have these removed.
If all columns are removed from a sheet and it is empty, the whole sheet will be removed too.
This shows without and with the flag:
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/simple/actual
id |
name |
address |
---|---|---|
CAFE-HEALTH |
Healthy Cafe |
|
CAFE-VEG |
Vegetarian Cafe |
$ flatten-tool flatten --remove-empty-schema-columns --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/flatten/remove-empty-schema-columns/input.json -o examples/flatten/remove-empty-schema-columns/actual
id |
name |
---|---|
CAFE-HEALTH |
Healthy Cafe |
CAFE-VEG |
Vegetarian Cafe |
(Using this without the schema option does nothing.)
Preserve Fields¶
By default, all fields in the input JSON are turned into columns in the CSV output. If you wish to keep only a subset of fields, you can pass the fields you want as a file using the preserve-fields
option.
{
"main": [
{
"id": "1",
"type": "cafe",
"title": "Cafe Open",
"dishes": [
{
"title": "All Day Breakfast",
"allergens": {
"freefrom": ["dairy", "eggs"],
"contains": ["gluten", "soy"],
"label": ["vegan", "vegetarian"]
}
},
{
"title": "Jack's Soup",
"allergens": {
"freefrom": ["onion", "garlic"],
"contains": ["dairy"],
"label": ["vegetarian"]
}
}
],
"coffee": "passable"
},
{
"id": "2",
"type": "pub",
"title": "Pints R Us",
"pints": [
{ "title": "Beer", "price": "2gbp" },
{ "title": "Larger" }
],
"coke": "pepsi"
}
]
}
id
type
title
pints
coke
dishes/title
dishes/price
dishes/allergens/label
$ flatten-tool flatten --preserve-fields examples/flatten/preserve-fields/fields_to_preserve.csv examples/flatten/preserve-fields/input.json -o examples/flatten/preserve-fields/actual
id |
type |
title |
coke |
---|---|---|---|
1 |
cafe |
Cafe Open |
|
2 |
pub |
Pints R Us |
pepsi |
The input file should contain the full JSON paths of the fields you want to preserve, one per line. If any of the fields passed contain objects, all child fields will be preserved. Eg. if you pass title, the top level title fields will be preserved, but dishes/title will not; if you pass dishes, the dishes/title and any other children of dishes will automatically be preserved. If you pass dishes and dishes/title, only dishes/title will be preserved, other children of dishes will be excluded. The order of the fields in the input is not significant.
Rollup¶
If you have a JSON schema where objects are modeled as lists of objects but actually represent one to one relationships, you can roll up certain properties.
This means taking the values and rather than having them as a separate sheet, have the values listed on the main sheet.
To enable roll up behaviour you have to:
Use the
--rollup
flag
And do one of:
Via schema: Add the
rollUp
key to the JSON Schema to the child object with a value that is an array of the fields to roll upDirect input: Pass one or more field at the command line
File input: Pass a file with a line-separated list of fields
If you pass direct input or file input, and a schema which contains a rollUp
attribute,
the schema is used and the direct input or file input are ignored.
However, if you pass direct or file input, and and a schema which does not contain a rollUp
attribute, the direct or file input will be used.
For the following examples, we use this input data:
{
"cafe": [
{
"id": "1",
"type": "cafe",
"title": "Cafe Open",
"owners": [
{
"firstname": "Clark",
"lastname": "Kent",
"email": "contact@cafeopen.example"
}
],
"dishes": [
{
"id": "bfast",
"title": "All Day Breakfast",
"allergens": {
"freefrom": ["dairy", "eggs"],
"contains": ["gluten", "soy"],
"label": ["vegan", "vegetarian"]
}
},
{
"id": "soup",
"title": "Jack's Soup",
"allergens": {
"freefrom": ["onion", "garlic"],
"contains": ["dairy"],
"label": ["vegetarian"]
}
}
],
"coffee": "passable"
},
{
"id": "2",
"type": "pub",
"title": "Pints R Us",
"owners": [
{
"firstname": "Jim",
"lastname": "Kirk",
"email": "pintsrus@protonmail.com"
}
],
"coke": "pepsi"
}
]
}
Here, the owners
property contains a list with a single object. The dishes
property is also a list of objects, but cannot be rolled up, as the cafe has more
than one dish.
Rollup via schema¶
Here are the changes we make to the schema:
--- /home/docs/checkouts/readthedocs.org/user_builds/flatten-tool/checkouts/latest/examples/flatten/rollup/schema/cafe.schema
+++ /home/docs/checkouts/readthedocs.org/user_builds/flatten-tool/checkouts/latest/examples/flatten/rollup/schema/cafe-rollup.schema
@@ -82,7 +82,8 @@
"$ref": "#/definitions/OwnerObject"
},
"type": "array",
- "title": "Owner"
+ "title": "Owner",
+ "rollUp": ["firstname", "lastname", "email"]
},
"dishes": {
"items": {
Here’s the command we run:
$ flatten-tool flatten --root-list-path=cafe --use-titles --main-sheet-name=cafe --rollup --schema=examples/flatten/rollup/schema/cafe-rollup.schema examples/flatten/rollup/input.json -o examples/flatten/rollup/schema/actual
Here are the resulting sheets:
Identifier |
Type |
Name |
Address |
Owner:First name |
Owner:Last name |
Owner:Email |
Coffee quality |
Type of coke |
1 |
cafe |
Cafe Open |
Clark |
Kent |
passable |
|||
2 |
pub |
Pints R Us |
Jim |
Kirk |
pepsi |
Identifier |
Owner:First name |
Owner:Last name |
Owner:Email |
1 |
Clark |
Kent |
|
2 |
Jim |
Kirk |
Notice how Owner: First name
, Owner: Last name
and Owner: Email
now
appear in both the cafe.csv
and Owner.csv
files.
Caution
If you try to roll up multiple values you’ll get a warning like this:
UserWarning: More than one value supplied for "dishes". Could not provide rollup, so adding a warning to the relevant cell(s) in the spreadsheet.
warn('More than one value supplied for "{}". Could not provide rollup, so adding a warning to the relevant cell(s) in the spreadsheet.'.format(parent_name+key))
Rollup via direct input¶
Run this command to rollup all properties of the owners
object:
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual
You can include multiple fields to rollup by passing --rollup
multiple times:
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners --rollup=dishes examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual
For the result:
id |
type |
title |
owners/0/firstname |
owners/0/lastname |
owners/0/email |
coffee |
coke |
1 |
cafe |
Cafe Open |
Clark |
Kent |
passable |
||
2 |
pub |
Pints R Us |
Jim |
Kirk |
pepsi |
Rollup via file input¶
Run this command to rollup all properties of the owners
object:
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=examples/flatten/rollup/file/fields_to_rollup.txt examples/flatten/rollup/input.json -o examples/flatten/rollup/file/actual
Where the file contains:
owners
You can include multiple fields to rollup by passing --rollup
multiple times:
For the result:
id |
type |
title |
owners/0/firstname |
owners/0/lastname |
owners/0/email |
coffee |
coke |
1 |
cafe |
Cafe Open |
Clark |
Kent |
passable |
||
2 |
pub |
Pints R Us |
Jim |
Kirk |
pepsi |
Selective rollup¶
If you don’t want to include all of the properties of a rolled up object in the main
sheet, you can use rollup
in combination with preserve-fields
, eg.
$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners --preserve-fields fields-to-preserve.txt examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual
Where fields-to-preserve.txt
contains:
id
type
title
owners/email
This excludes owners/firstname
and owners/lastname
from both the main sheet
and the owners sheet.
All flatten options¶
$ flatten-tool flatten -h
usage: flatten-tool flatten [-h] [-s SCHEMA] [-f {csv,ods,xlsx,all}] [--xml]
[--id-name ID_NAME] [-m MAIN_SHEET_NAME]
[-o OUTPUT_NAME] [--root-list-path ROOT_LIST_PATH]
[--rollup [ROLLUP]] [-r ROOT_ID] [--use-titles]
[--truncation-length TRUNCATION_LENGTH]
[--root-is-list] [--sheet-prefix SHEET_PREFIX]
[--filter-field FILTER_FIELD]
[--filter-value FILTER_VALUE]
[--preserve-fields PRESERVE_FIELDS]
[--disable-local-refs]
[--remove-empty-schema-columns]
[--line-terminator LINE_TERMINATOR]
[--convert-wkt]
input_name
positional arguments:
input_name Name of the input JSON file.
options:
-h, --help show this help message and exit
-s SCHEMA, --schema SCHEMA
Path to a relevant schema.
-f {csv,ods,xlsx,all}, --output-format {csv,ods,xlsx,all}
Type of template you want to create. Defaults to all
available options
--xml Use XML as the input format
--id-name ID_NAME String to use for the identifier key, defaults to 'id'
-m MAIN_SHEET_NAME, --main-sheet-name MAIN_SHEET_NAME
The name of the main sheet, as seen in the first tab
of the spreadsheet for example. Defaults to main
-o OUTPUT_NAME, --output-name OUTPUT_NAME
Name of the outputted file. Will have an extension
appended if format is all.
--root-list-path ROOT_LIST_PATH
Path of the root list, defaults to main
--rollup [ROLLUP] "Roll up" columns from subsheets into the main sheet.
Pass one or more JSON paths directly, or a file with
one JSON path per line, or no value and use a schema
containing (a) rollUp attribute(s). Schema takes
precedence if both direct input and schema with
rollUps are present.
-r ROOT_ID, --root-id ROOT_ID
Root ID of the data format, e.g. ocid for OCDS
--use-titles Convert titles. Requires a schema to be specified.
--truncation-length TRUNCATION_LENGTH
The length of components of sub-sheet names (default
3).
--root-is-list The root element is a list. --root-list-path and meta
data will be ignored.
--sheet-prefix SHEET_PREFIX
A string to prefix to the start of every sheet (or
file) name.
--filter-field FILTER_FIELD
Data Filter - only data with this will be processed.
Use with --filter-value
--filter-value FILTER_VALUE
Data Filter - only data with this will be processed.
Use with --filter-field
--preserve-fields PRESERVE_FIELDS
Only these fields will be processed. Pass a file with
JSON paths to be preserved one per line.
--disable-local-refs Disable local refs when parsing JSON Schema.
--remove-empty-schema-columns
When using flatten with a schema, remove columns and
sheets from the output that contain no data.
--line-terminator LINE_TERMINATOR
The line terminator to use when writing CSV files:
CRLF or LF
--convert-wkt Enable conversion of geojson to WKT