Flatten Tool for BODS¶
BODS 0.4¶
Flatten tool can be used to flatten and unflatten BODS 0.4 data.
Flatten tool has not yet been updated to handle the BODS 0.4 schema. (For the moment the --schema
option should not be used when handling BODS 0.4 data.) This documentation will be updated when the work is complete.
flatten and unflatten¶
This data standard has a list as the root element,
as opposed to other standards where the root element is a dict with meta data and a list of data.
When flattening and unflattening, use the --root-is-list
option.
The id element is statementId
, so also use the --id-name
option.
flatten-tool flatten -f csv --root-is-list --id-name=statementId -o examples/bods-one-flatten examples/bods-one.json
flatten-tool unflatten -f csv --root-is-list --id-name=statementId -o examples/bods-one-unflattened.json examples/bods-one-flatten
BODS 0.3 and previous¶
flatten and unflatten¶
This data standard has a list as the root element,
as opposed to other standards where the root element is a dict with meta data and a list of data.
When flattening and unflattening, use the --root-is-list
option.
The id element is statementID
, so also use the --id-name
option.
flatten-tool flatten -f csv --root-is-list --id-name=statementID -o examples/bods-one-flatten examples/bods-one.json
flatten-tool unflatten -f csv --root-is-list --id-name=statementID -o examples/bods-one-unflattened.json examples/bods-one-flatten
flatten¶
This data standard has three types of statement - entityStatement
, personStatement
or ownershipOrControlStatement
.
When using flatten, the spreadsheets produced can become very mixed up.
The main.csv spreadsheet will have data of all three types in it.
What’s worse is that it’s unclear what columns apply to what types -
for instance, foundingDate
applies to entities but birthDate
applies to people.
However both columns appear in main.csv!
It’s also unclear what subsheets apply to which type - for instance there might be a subsheet called identifiers
but it’s not clear what type this applies to! (The answer is entities)
It would be better to have separate sheets for each type. That way, only the relevant columns will appear in each sheet and it will be clear which subsheet applies to which sheet.
You can solve this with a combination of the filter and sheet prefix options. To flatten a set of data, run 3 commands:
flatten-tool flatten --sheet-prefix=1_person_ --filter-field=statementType --filter-value=personStatement -f csv -o example1/ example1.json --root-is-list --id-name=statementID
flatten-tool flatten --sheet-prefix=2_entity_ --filter-field=statementType --filter-value=entityStatement -f csv -o example1/ example1.json --root-is-list --id-name=statementID
flatten-tool flatten --sheet-prefix=3_ownership_ --filter-field=statementType --filter-value=ownershipOrControlStatement -f csv -o example1/ example1.json --root-is-list --id-name=statementID
You will have a set of sheets:
1_person_addresses.csv
1_person_main.csv
1_person_names.csv
1_person_nationalities.csv
2_entity_identifiers.csv
2_entity_main.csv
3_ownership_interests.csv
3_ownership_main.csv
birthDate
only appears in 1_person_main.csv
and foundingDate
only appears in 2_entity_main.csv
, so it is clear which column is for which type.
Note this works in CSV mode. If you want to use Excel mode, you’ll need to specify 3 separate output files and then combine the sheets in them into one file afterwards by hand.
flatten-tool flatten --sheet-prefix=1_person_ --filter-field=statementType --filter-value=personStatement -f xlsx -o example1/part1.xlsx example1.json --root-is-list --id-name=statementID
flatten-tool flatten --sheet-prefix=2_entity_ --filter-field=statementType --filter-value=entityStatement -f xlsx -o example1/part2.xlsx example1.json --root-is-list --id-name=statementID
flatten-tool flatten --sheet-prefix=3_ownership_ --filter-field=statementType --filter-value=ownershipOrControlStatement -f xlsx -o example1/part3.xlsx example1.json --root-is-list --id-name=statementID
unflatten¶
Schema¶
As well as the options above, also pass the --schema
option so that types are set correctly. Note the boolean and the integer in the output.
$ flatten-tool unflatten -f csv examples/bods/unflatten/input -o examples/bods/unflatten/actual/out.json --root-is-list --id-name=statementID --schema examples/bods/schema/bods-package.json
[
{
"statementID": "fbfd0547-d0c6-4a00-b559-5c5e91c34f5c",
"interests": [
{
"type": "shareholding",
"interestLevel": "direct",
"beneficialOwnershipOrControl": true,
"startDate": "2016-04-06",
"share": {
"exact": 100
}
}
],
"statementType": "ownershipOrControlStatement",
"statementDate": "2017-11-18",
"subject": {
"describedByEntityStatement": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7"
},
"interestedParty": {
"describedByPersonStatement": "019a93f1-e470-42e9-957b-03559861b2e2"
}
}
]
Order is important¶
In the BODS schema, statements must appear in a certain order. Each of the entityStatements
or personStatements
referenced by a particular ownershipOrControlStatement
must appear before that particular statement in the ordered array.
If you have only one main table, you must make sure the statements appear in the correct order.
For example, this is good:
statementID,statementType, ...
1dc0e987-5c57-4a1c-b3ad-61353b66a9b7,entityStatement,
019a93f1-e470-42e9-957b-03559861b2e2,personStatement,
fbfd0547-d0c6-4a00-b559-5c5e91c34f5c,ownershipOrControlStatement,
Pay attention to other sheets to. If a subsheet is loaded before main.csv, the order might still be wrong.
Alternatively, you may have several main tables, one for each type of statement:
main-control-own.csv
main-entity.csv
main-person.csv
In this case, you may get data in the wrong order.
To fix this, put numbers in front of the file names so that you can be sure what order they will appear in. For instance:
1identifiers.csv
1main-entity.csv
2addresses.csv
2main-person.csv
2names.csv
2nationalities.csv
3interests.csv
3main-control-own.csv
create-template¶
You can run this directly on bods-package.json
:
flatten-tool create-template -f csv -s bods-package.json -o template --root-id=statementID
However, this will produce spreadsheets where the several types are all mixed up. As explained above, this creates problems because columns appear in the main sheet that are not relevant to all types, and it’s not clear which subsheet applies to which type.
Instead, this process can be followed to obtain clearer templates:
Create a new blank directory and change into it.
Produce the person sheets only by running:
flatten-tool create-template -f csv -s /path/to/person-statement.json -o . --root-id=statementID
Rename all the files in the directory to have
1_person_
at the start.
If your on a bash shell, you can do this by running:
for FILENAME in *; do mv $FILENAME 1_person_$FILENAME; done
Produce the entity sheets only by running:
flatten-tool create-template -f csv -s /path/to/entity-statement.json -o . --root-id=statementID
Rename all the new files in the directory to have
2_entity_
at the start.
If your on a bash shell, you can do this by running:
for FILENAME in *; do if [[ $FILENAME != 1_* ]] ; then mv $FILENAME 2_entity_$FILENAME; fi; done
Produce the ownership or control sheets only by running:
flatten-tool create-template -f csv -s /path/to/ownership-or-control-statement.json -o . --root-id=statementID
Rename all the new files in the directory to have
3_ownership_control_
at the start.
If your on a bash shell, you can do this by running:
for FILENAME in *; do if [[ $FILENAME != 1_* ]] && [[ $FILENAME != 2_* ]] ; then mv $FILENAME 3_ownership_control_$FILENAME; fi; done
You will now have a directory of files that look like this:
1_person_addresses.csv
1_person_annotations.csv
1_person_identifiers.csv
1_person_main.csv
1_person_names.csv
1_person_nationalities.csv
1_person_pepStatus.csv
1_person_sou_assertedBy.csv
2_entity_addresses.csv
2_entity_annotations.csv
2_entity_identifiers.csv
2_entity_main.csv
2_entity_sou_assertedBy.csv
3_ownership_control_annotations.csv
3_ownership_control_interests.csv
3_ownership_control_main.csv
3_ownership_control_sou_assertedBy.csv
The advantages are:
Separate for each type, so it’s clear what sheet applies to each type.
Each sheet only has the relevant columns in it, so there is no confusion about whether they apply or not.
The sheets have numbers at the start, so that when
unflatten
is used the statements will appear in the right order in the output.