Wikidata:WikiProjekt offeneregister.de
This is a Wikidata project to enrich existing Wikidata items with data on legal entities (German companies, foundations, registered associations (Vereine), etc.) from offeneregister.de. For now, only OpenCorporates ID (P1320) is licensed CC0, so we'll start with that.
We have already downloaded the entire offeneregister dataset, split it into chunks to make it more manageable, and done some pre-processing (re-formatted the OpenCorporates ID to include the "de/" prefix, and split the data into sets based on the quality of the address fields).
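The pre-processing described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names, the example company number and the rule for what counts as a "clean" address are assumptions. The three set names are the ones used in the task list further down.

```python
from typing import Optional


def align_opencorporates_id(company_number: str, jurisdiction: str = "de") -> str:
    """Return the ID with a jurisdiction prefix such as 'de/', adding it only if missing."""
    cleaned = company_number.strip()
    prefix = jurisdiction + "/"
    if cleaned.startswith(prefix):
        return cleaned
    return prefix + cleaned


def address_bucket(address: Optional[str]) -> str:
    """Sort a record into one of the three sets using a deliberately crude check."""
    if address is None or not address.strip():
        return "without_address"
    # Assumption: a "clean" address has at least two comma-separated parts
    # and contains a 5-digit German postal code.
    parts = [p.strip() for p in address.split(",")]
    has_postal_code = any(
        tok.isdigit() and len(tok) == 5 for p in parts for tok in p.split()
    )
    if len(parts) >= 2 and has_postal_code:
        return "openrefine_project"
    return "to_clean"


# Made-up example values, for illustration only.
print(align_opencorporates_id("K1101R_HRB150148"))
print(address_bucket("Musterstraße 1, 10115 Berlin"))
```

The prefix is only added when it is missing, so running the step twice on the same chunk does not produce "de/de/..." IDs.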
To contribute, download a chunk of data and mark it as In progress. Then install OpenRefine and load the data as JSON, selecting the outermost bracket as the import path. Some of the chunks have been preprocessed for your convenience and uploaded as OpenRefine projects.
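Before importing a raw chunk into OpenRefine, it can help to check that it really is one JSON array of records (the "outermost bracket") and to see which fields it carries. A minimal sketch, assuming a chunk is a top-level JSON array of objects; the field names in the sample are invented for the example:

```python
import json
from collections import Counter


def summarize_chunk(text: str):
    """Parse one chunk and count how often each field name appears."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    field_counts = Counter(key for rec in records for key in rec)
    return len(records), field_counts


# Tiny stand-in for a real chunk file; real chunks hold far more records.
sample = '[{"name": "Beispiel GmbH", "company_number": "X1"}, {"name": "Muster e.V."}]'
count, fields = summarize_chunk(sample)
print(count, dict(fields))
```

For a real chunk, read the downloaded .json file into a string first and pass it to `summarize_chunk`.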
So far we've been reconciling the company name ("_ - name") against Organisation (Q43229). The expected match rate when reconciling the data with Wikidata is about 0.7% (a fraction of roughly 0.007), which will probably result in about 36,000 affected items.
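OpenRefine sends these reconciliation requests itself, so nobody needs to write them by hand. For anyone curious what such a batch looks like, here is a rough sketch in the query layout of the Reconciliation Service API. The function name, the `limit` value and the sample company names are assumptions for illustration, and no endpoint is contacted.

```python
import json

ORGANISATION_QID = "Q43229"  # type we reconcile against, as stated above


def build_reconciliation_queries(names, type_qid=ORGANISATION_QID, limit=3):
    """Build a batch of queries keyed q0, q1, ... for a reconciliation service."""
    return {
        f"q{i}": {"query": name, "type": type_qid, "limit": limit}
        for i, name in enumerate(names)
    }


queries = build_reconciliation_queries(["Beispiel GmbH", "Muster e.V."])
# A client would send this as the form field "queries" in a POST request.
payload = {"queries": json.dumps(queries, ensure_ascii=False)}
print(payload["queries"])
```

Restricting the type to Organisation (Q43229) keeps candidates such as people or places with the same name out of the results.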
Properties
Main properties
- OpenCorporates ID (P1320)
Other properties - do not upload! licensing not yet clarified...
- street address (P6375)
- postal code (P281)
- headquarters location (P159)
- chief executive officer (P169)
- country (P17)
- inception (P571)
- official name (P1448)
TODOs
https://github.com/rgreschner/offeneregister-wikidata-chunked
Use these templates to mark progress and avoid duplication.
Not done
Done
Tasks:
- Download and chunk data from offeneregister. Done
- Create openrefine_projects, re-format OpenCorporates ID to include "de/" prefix and split chunks further based on quality of address data. User:a_ka_es In progress
- raw = full unprocessed chunked opencorporates.com dataset; 100,000 records - .json
- openrefine_project = only the records with clean addresses; ready to import as a project in OpenRefine; OpenCorporates IDs are aligned, addresses are cleaned; ready to reconcile/upload - .openrefine.tar.gz
- without_address = only the records without addresses; OpenCorporates IDs are aligned; ready to import/reconcile/upload - .csv
- to_clean = only the records with "messy" addresses; OpenCorporates IDs are aligned - .csv
- Reconcile chunks with wikidata and upload OpenCorporates ID. In progress
- chunk 0 raw or openrefine_project User:a_ka_es Done without_address In progress to_clean In progress
- chunk 1 raw or openrefine_project Done without_address Done to_clean Done
- chunk 2 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 3 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 4 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 5 raw or openrefine_project Done without_address Done to_clean Done
- chunk 6 raw or openrefine_project Done without_address In progress User:1ucyp to_clean Not done
- chunk 7 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 8 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 9 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 10 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 11 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 12 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 13 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 14 Not done
- chunk 15 raw or openrefine_project In progress User:Kristbaum without_address Done to_clean Done
- chunk 16 raw or openrefine_project In progress User:Kristbaum without_address Not done to_clean Not done
- chunk 17 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 18 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 19 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 20 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 21 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 22 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 23 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 24 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 25 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 26 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 27 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 28 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 29 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 30 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 31 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 32 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 33 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 34 raw or openrefine_project Not done without_address Not done to_clean Not done
Note: the links below are not ready yet; "raw" is linked, while "openrefine_project", "without_address" and "to_clean" are still in progress.
- chunk 35 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 36 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 37 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 38 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 39 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 40 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 41 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 42 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 43 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 44 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 45 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 46 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 47 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 48 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 49 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 50 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 51 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 52 raw or openrefine_project Not done without_address Not done to_clean Not done
- chunk 53 raw or openrefine_project Not done without_address Not done to_clean Not done
Participants
The participants listed below can be notified using the following template in discussions: {{Ping project|WikiProjekt offeneregister.de}}