Flattening

Caution

This page is a work in progress. The information may not be complete, and unlike the Spreadsheet Designer’s Guide, the tests backing up the documented examples are not correct. Use with caution.

The Spreadsheet Designer’s Guide was about unflattening - taking a spreadsheet and producing a JSON document.

In this section you’ll learn about flattening. The main use case for wanting to flatten a JSON document is so that you can manage the data in a spreadsheet.

Flatten Tool provides the flatten-tool flatten sub-command for this purpose.

Generating a spreadsheet from a JSON document

Generating a spreadsheet from a JSON document is very similar to creating a template.

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/simple/actual

One difference is that the default output name is flattened, and so the command above will generate a flattened/ directory with CSV files and a flattened.xlsx file in the current working directory.

The schema is the same as the one used in the user guide and looks like this:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {
        "TableObject": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "string",
                    "title": "Identifier"
                },
                "number": {
                    "type": "integer",
                    "title": "Number"
                },
                "dish": {
                    "items": {
                        "$ref": "#/definitions/DishObject"
                    },
                    "type": "array",
                    "title": "Dish"
                }
            }
        },
        "DishObject": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "string",
                    "title": "Identifier"
                },
                "name": {
                    "type": "string",
                    "title": "Name"
                },
                "cost": {
                    "type": "number",
                    "title": "Cost"
                }
            }
        }
    },
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "title": "Identifier"
        },
        "name": {
            "type": "string",
            "title": "Name"
        },
        "address": {
            "type": "string",
            "title": "Address"
        },
        "table": {
            "items": {
                "$ref": "#/definitions/TableObject"
            },
            "type": "array",
            "title": "Table"
        }
    }
}

The input JSON file looks like this:

{
    "cafe": [
        {
            "id": "CAFE-HEALTH",
            "name": "Healthy Cafe",
            "table": [
                {
                    "id": "TABLE-1",
                    "number": "1",
                    "dish": [
                        {
                            "name": "Fish and Chips",
                            "cost": "9.95"
                        },
                        {
                            "name": "Pesto Pasta Salad",
                            "cost": "6.95"
                        }
                    ]
                },
                {
                    "id": "TABLE-2",
                    "number": "2"
                },
                {
                    "id": "TABLE-3",
                    "number": "3",
                    "dish": [
                        {
                            "name": "Fish and Chips",
                            "cost": "9.95"
                        }
                    ]
                }
            ]
        },
        {
            "id": "CAFE-VEG",
            "name": "Vegetarian Cafe",
            "table": [
                {
                    "id": "TABLE-16",
                    "number": "16",
                    "dish": [
                        {
                            "name": "Large Glass Sauvignon",
                            "cost": "5.95"
                        }
                    ]
                },
                {
                    "id": "TABLE-17",
                    "number": "17"
                }
            ]
        }
    ]
}

If you run the example above, Flatten Tool will generate the following CSV files for you, populated with the data from the input JSON file.

sheet: cafe.csv
id name address
CAFE-HEALTH Healthy Cafe  
CAFE-VEG Vegetarian Cafe  
sheet: table.csv
id table/0/id table/0/number
CAFE-HEALTH TABLE-1 1
CAFE-HEALTH TABLE-2 2
CAFE-HEALTH TABLE-3 3
CAFE-VEG TABLE-16 16
CAFE-VEG TABLE-17 17
sheet: tab_dish.csv
id table/0/id table/0/dish/0/id table/0/dish/0/name table/0/dish/0/cost
CAFE-HEALTH TABLE-1   Fish and Chips 9.95
CAFE-HEALTH TABLE-1   Pesto Pasta Salad 6.95
CAFE-HEALTH TABLE-3   Fish and Chips 9.95
CAFE-VEG TABLE-16   Large Glass Sauvignon 5.95

Caution

If you forget the --root-list-path option and your data isn’t under a top level key called main, Flatten Tool won’t find your data and will instead generate a single empty sheet called main, which probably isn’t what you want.

If your data has a list as a root, use the --root-is-list option.

[
  {
    "id": "shop",
    "title": "Shop"
  },
  {
    "id": "pub",
    "title": "Pub"
  }
]
$ flatten-tool flatten --root-is-list  examples/flatten/root-is-list/data.json -o examples/flatten/root-is-list/actual
sheet: main.csv
id title
shop Shop
pub Pub

Sheet Prefix

You can pass a string that will be added at the start of all CSV file names, or all Excel sheet names, using the --sheet-prefix option.

$ flatten-tool flatten --sheet-prefix=test- --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/sheet-prefix/actual

Will produce:

sheet: test-cafe.csv
id name address
CAFE-HEALTH Healthy Cafe  
CAFE-VEG Vegetarian Cafe  
sheet: test-table.csv
id table/0/id table/0/number
CAFE-HEALTH TABLE-1 1
CAFE-HEALTH TABLE-2 2
CAFE-HEALTH TABLE-3 3
CAFE-VEG TABLE-16 16
CAFE-VEG TABLE-17 17
sheet: test-tab_dish.csv
id table/0/id table/0/dish/0/id table/0/dish/0/name table/0/dish/0/cost
CAFE-HEALTH TABLE-1   Fish and Chips 9.95
CAFE-HEALTH TABLE-1   Pesto Pasta Salad 6.95
CAFE-HEALTH TABLE-3   Fish and Chips 9.95
CAFE-VEG TABLE-16   Large Glass Sauvignon 5.95

Filter

When flattening, you can optionally choose to only process some of the data.

Currently, only simple filters can be specified using the --filter-field and --filter-value option.

{
  "main": [
    {
      "id": "1",
      "type": "cafe",
      "title": "Cafe Open",
      "dishes": [
        { "title": "All Day Breakfast" }
      ],
      "coffee": "passable"
    },
    {
      "id": "2",
      "type": "pub",
      "title": "Pints R Us",
      "pints": [
        { "title": "Beer" },
        { "title": "Larger" }
      ],
      "coke": "pepsi"
    }
  ]
}
$ flatten-tool flatten --filter-field=type --filter-value=pub examples/flatten/filter/input.json -o examples/flatten/filter/actual
sheet: main.csv
id type title coke
2 pub Pints R Us pepsi
sheet: pints.csv
id pints/0/title
2 Beer
2 Larger

No dishes sheet is produced, and the main sheet does not have a coffee column.

The field specified must be a field directly on the data object - it’s not possible to filter on fields like pints/0/title .

Remove Empty Schema Columns

By default, when using this with schema specified columns that are empty (have no data) will be kept in the output.

But you can pass the remove-empty-schema-columns flag to have these removed.

If all columns are removed from a sheet and it is empty, the whole sheet will be removed too.

This shows without and with the flag:

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/receipt/normalised/expected.json -o examples/flatten/simple/actual
sheet: cafe.csv
id name address
CAFE-HEALTH Healthy Cafe  
CAFE-VEG Vegetarian Cafe  
$ flatten-tool flatten --remove-empty-schema-columns --root-list-path=cafe --main-sheet-name=cafe --schema=examples/receipt/cafe.schema examples/flatten/remove-empty-schema-columns/input.json -o examples/flatten/remove-empty-schema-columns/actual
sheet: cafe.csv
id name
CAFE-HEALTH Healthy Cafe
CAFE-VEG Vegetarian Cafe

(Using this without the schema option does nothing.)

Preserve Fields

By default, all fields in the input JSON are turned into columns in the CSV output. If you wish to keep only a subset of fields, you can pass the fields you want as a file using the preserve-fields option.

{
  "main": [
    {
      "id": "1",
      "type": "cafe",
      "title": "Cafe Open",
      "dishes": [
        {
          "title": "All Day Breakfast",
          "allergens": {
            "freefrom": ["dairy", "eggs"],
            "contains": ["gluten", "soy"],
            "label": ["vegan", "vegetarian"]
          }
        },
        {
          "title": "Jack's Soup",
          "allergens": {
            "freefrom": ["onion", "garlic"],
            "contains": ["dairy"],
            "label": ["vegetarian"]
          }
        }
      ],
      "coffee": "passable"
    },
    {
      "id": "2",
      "type": "pub",
      "title": "Pints R Us",
      "pints": [
        { "title": "Beer", "price": "2gbp" },
        { "title": "Larger" }
      ],
      "coke": "pepsi"
    }
  ]
}
id
type
title
pints
coke
dishes/title
dishes/price
dishes/allergens/label
$ flatten-tool flatten --preserve-fields examples/flatten/preserve-fields/fields_to_preserve.csv examples/flatten/preserve-fields/input.json -o examples/flatten/preserve-fields/actual
sheet: main.csv
id type title coke
1 cafe Cafe Open  
2 pub Pints R Us pepsi

The input file should contain the full JSON paths of the fields you want to preserve, one per line. If any of the fields passed contain objects, all child fields will be preserved. Eg. if you pass title, the top level title fields will be preserved, but dishes/title will not; if you pass dishes, the dishes/title and any other children of dishes will automatically be preserved. If you pass dishes and dishes/title, only dishes/title will be preserved, other children of dishes will be excluded. The order of the fields in the input is not significant.

Rollup

If you have a JSON schema where objects are modeled as lists of objects but actually represent one to one relationships, you can roll up certain properties.

This means taking the values and rather than having them as a separate sheet, have the values listed on the main sheet.

To enable roll up behaviour you have to:

  • Use the --rollup flag

And do one of:

  • Via schema: Add the rollUp key to the JSON Schema to the child object with a value that is an array of the fields to roll up
  • Direct input: Pass one or more field at the command line
  • File input: Pass a file with a line-separated list of fields

If you pass direct input or file input, and a schema which contains a rollUp attribute, the schema is used and the direct input or file input are ignored.

However, if you pass direct or file input, and and a schema which does not contain a rollUp attribute, the direct or file input will be used.

For the following examples, we use this input data:

{
  "cafe": [
    {
      "id": "1",
      "type": "cafe",
      "title": "Cafe Open",
      "owners": [
        {
          "firstname": "Clark",
          "lastname": "Kent",
          "email": "contact@cafeopen.example"
        }
      ],
      "dishes": [
        {
          "id": "bfast",
          "title": "All Day Breakfast",
          "allergens": {
            "freefrom": ["dairy", "eggs"],
            "contains": ["gluten", "soy"],
            "label": ["vegan", "vegetarian"]
          }
        },
        {
          "id": "soup",
          "title": "Jack's Soup",
          "allergens": {
            "freefrom": ["onion", "garlic"],
            "contains": ["dairy"],
            "label": ["vegetarian"]
          }
        }
      ],
      "coffee": "passable"
    },
    {
      "id": "2",
      "type": "pub",
      "title": "Pints R Us",
      "owners": [
        {
          "firstname": "Jim",
          "lastname": "Kirk",
          "email": "pintsrus@protonmail.com"
        }
      ],
      "coke": "pepsi"
    }
  ]
}

Here, the owners property contains a list with a single object. The dishes property is also a list of objects, but cannot be rolled up, as the cafe has more than one dish.

Rollup via schema

Here are the changes we make to the schema:

--- /home/docs/checkouts/readthedocs.org/user_builds/flatten-tool/checkouts/stable/examples/flatten/rollup/schema/cafe.schema
+++ /home/docs/checkouts/readthedocs.org/user_builds/flatten-tool/checkouts/stable/examples/flatten/rollup/schema/cafe-rollup.schema
@@ -82,7 +82,8 @@
                 "$ref": "#/definitions/OwnerObject"
             },
             "type": "array",
-            "title": "Owner"
+            "title": "Owner",
+            "rollUp": ["firstname", "lastname", "email"]
         },
         "dishes": {
             "items": {

Here’s the command we run:

$ flatten-tool flatten --root-list-path=cafe --use-titles --main-sheet-name=cafe --rollup --schema=examples/flatten/rollup/schema/cafe-rollup.schema examples/flatten/rollup/input.json -o examples/flatten/rollup/schema/actual

Here are the resulting sheets:

sheet: cafe.csv
Identifier Type Name Address Owner:First name Owner:Last name Owner:Email Coffee quality Type of coke
1 cafe Cafe Open   Clark Kent contact@cafeopen.example passable  
2 pub Pints R Us   Jim Kirk pintsrus@protonmail.com   pepsi
sheet: owners.csv
Identifier Owner:First name Owner:Last name Owner:Email
1 Clark Kent contact@cafeopen.example
2 Jim Kirk pintsrus@protonmail.com

Notice how Owner: First name, Owner: Last name and Owner: Email now appear in both the cafe.csv and owners.csv files.

Caution

If you try to roll up multiple values you’ll get a warning like this:

UserWarning: More than one value supplied for "dishes". Could not provide rollup, so adding a warning to the relevant cell(s) in the spreadsheet.
  warn('More than one value supplied for "{}". Could not provide rollup, so adding a warning to the relevant cell(s) in the spreadsheet.'.format(parent_name+key))

Rollup via direct input

Run this command to rollup all properties of the owners object:

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual

You can include multiple fields to rollup by passing --rollup multiple times:

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners --rollup=dishes examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual

For the result:

sheet: cafe.csv
id type title owners/0/firstname owners/0/lastname owners/0/email coffee coke
1 cafe Cafe Open Clark Kent contact@cafeopen.example passable  
2 pub Pints R Us Jim Kirk pintsrus@protonmail.com   pepsi

Rollup via file input

Run this command to rollup all properties of the owners object:

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=examples/flatten/rollup/file/fields_to_rollup.txt examples/flatten/rollup/input.json -o examples/flatten/rollup/file/actual

Where the file contains:

You can include multiple fields to rollup by passing --rollup multiple times:

For the result:

sheet: cafe.csv
id type title owners/0/firstname owners/0/lastname owners/0/email coffee coke
1 cafe Cafe Open Clark Kent contact@cafeopen.example passable  
2 pub Pints R Us Jim Kirk pintsrus@protonmail.com   pepsi

Selective rollup

If you don’t want to include all of the properties of a rolled up object in the main sheet, you can use rollup in combination with preserve-fields, eg.

$ flatten-tool flatten --root-list-path=cafe --main-sheet-name=cafe --rollup=owners --preserve-fields fields-to-preserve.txt examples/flatten/rollup/input.json -o examples/flatten/rollup/direct/actual

Where fields-to-preserve.txt contains:

id
type
title
owners/email

This excludes owners/firstname and owners/lastname from both the main sheet and the owners sheet.

All flatten options

$ flatten-tool flatten -h
usage: flatten-tool flatten [-h] [-s SCHEMA] [-f {csv,xlsx,all}] [--xml]
                            [--id-name ID_NAME] [-m MAIN_SHEET_NAME]
                            [-o OUTPUT_NAME] [--root-list-path ROOT_LIST_PATH]
                            [--rollup [ROLLUP]] [-r ROOT_ID] [--use-titles]
                            [--truncation-length TRUNCATION_LENGTH]
                            [--root-is-list] [--sheet-prefix SHEET_PREFIX]
                            [--filter-field FILTER_FIELD]
                            [--filter-value FILTER_VALUE]
                            [--preserve-fields PRESERVE_FIELDS]
                            [--disable-local-refs]
                            [--remove-empty-schema-columns]
                            input_name

positional arguments:
  input_name            Name of the input JSON file.

optional arguments:
  -h, --help            show this help message and exit
  -s SCHEMA, --schema SCHEMA
                        Path to a relevant schema.
  -f {csv,xlsx,all}, --output-format {csv,xlsx,all}
                        Type of template you want to create. Defaults to all
                        available options
  --xml                 Use XML as the input format
  --id-name ID_NAME     String to use for the identifier key, defaults to 'id'
  -m MAIN_SHEET_NAME, --main-sheet-name MAIN_SHEET_NAME
                        The name of the main sheet, as seen in the first tab
                        of the spreadsheet for example. Defaults to main
  -o OUTPUT_NAME, --output-name OUTPUT_NAME
                        Name of the outputted file. Will have an extension
                        appended if format is all.
  --root-list-path ROOT_LIST_PATH
                        Path of the root list, defaults to main
  --rollup [ROLLUP]     "Roll up" columns from subsheets into the main sheet.
                        Pass one or more JSON paths directly, or a file with
                        one JSON path per line, or no value and use a schema
                        containing (a) rollUp attribute(s). Schema takes
                        precedence if both direct input and schema with
                        rollUps are present.
  -r ROOT_ID, --root-id ROOT_ID
                        Root ID of the data format, e.g. ocid for OCDS
  --use-titles          Convert titles. Requires a schema to be specified.
  --truncation-length TRUNCATION_LENGTH
                        The length of components of sub-sheet names (default
                        3).
  --root-is-list        The root element is a list. --root-list-path and meta
                        data will be ignored.
  --sheet-prefix SHEET_PREFIX
                        A string to prefix to the start of every sheet (or
                        file) name.
  --filter-field FILTER_FIELD
                        Data Filter - only data with this will be processed.
                        Use with --filter-value
  --filter-value FILTER_VALUE
                        Data Filter - only data with this will be processed.
                        Use with --filter-field
  --preserve-fields PRESERVE_FIELDS
                        Only these fields will be processed. Pass a file with
                        JSON paths to be preserved one per line.
  --disable-local-refs  Disable local refs when parsing JSON Schema.
  --remove-empty-schema-columns
                        When using flatten with a schema, remove columns and
                        sheets from the output that contain no data.