Wikidata:Dataset Imports/Higher Education Institutions of Brazil (2011)


Guidelines for using this page

Documenting the import

  • Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
  • Please include notes on all steps of the process.
  • Once a dataset has been imported into Wikidata, please edit the page to change the progress status from "in progress" to "complete".
  • It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.

Creating a Wikidata item for the dataset

  • Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata and to understand which datasets are available on that topic and which of them have been added to Wikidata.
  • If you are working with a very large dataset, you can break it into smaller Mix n' Match catalogues, but only create one Wikidata item.
  • Link the dataset Wikidata item to this page using Wikidata Dataset Imports URL (P5195).

Getting help

  • If your dataset import runs into issues, please edit the page to change the progress status from "in progress" to "help needed".
  • You can ask for help on Wikidata:Project chat.

Overview

Dataset name

Higher Education Institutions of Brazil (2011)

Source

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP)

Link

http://dados.gov.br/dataset/instituicoes-de-ensino-superior

Dataset description

Information about the 2365 institutions present in the 2011 registry of higher education institutions.

Additional information

The latest dataset available is from 2011, although up-to-date information from 2014 already exists.

INEP will probably only update this particular dataset upon request.

Progress of import

The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet; you can change some column headings or add new columns as required to best describe the progress of this import.

Wikidata item for the dataset | Import data into spreadsheet | Format the spreadsheet to import the data | Structure of data within Wikidata | Importing data into Wikidata | Visualisations

list of Higher Education Institutions of Brazil (Q56599716) | Link: Original Dataset; Link: Structured for Wikidata

Done:

- Converted labels to title case, except for a few common stopwords (o, a, os, as, de, do, da, dos, das, para)

- Reconciled municipalities (mostly automatic matches plus a few manual ones; fairly confident of the quality)

- Reconciled municipality, state or country as source of income; no conflicts with previous matches

- Standardized website URLs and removed invalid ones

- Retrieved coordinates by combining the Google Geocoding API (on addresses) with the Google Places API (on organization names), accepting a 1 km error and prioritizing the Places response

- Generated short descriptions

- Substituted original columns with their properties/statements values
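
The label conversion listed above can be sketched roughly as follows. This is only an illustration, not the transformation actually used for the import: the function name is hypothetical, and it assumes that source labels are plain whitespace-separated words and that the first word is always capitalized, even when it is a stopword.

```python
# Stopwords taken from the list in the notes above.
STOPWORDS = {"o", "a", "os", "as", "de", "do", "da", "dos", "das", "para"}


def title_case_label(label: str) -> str:
    """Title-case a label, keeping stopwords lowercase (hypothetical helper)."""
    words = label.lower().split()
    result = []
    for index, word in enumerate(words):
        # Assumption: a stopword in first position is still capitalized.
        if index > 0 and word in STOPWORDS:
            result.append(word)
        else:
            result.append(word.capitalize())
    return " ".join(result)


print(title_case_label("UNIVERSIDADE FEDERAL DO RIO DE JANEIRO"))
# prints "Universidade Federal do Rio de Janeiro"
```

Words that are not in the stopword list are always capitalized, so any other short connective would need to be added to the set explicitly.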

- Names will be imported as rdfs:label; short names will be imported as both rdfs:altLabel and short name (P1813)

- Descriptions will be generated in Portuguese as a summary of NOMEORG, REDE and DEPADM5

- Items are going to be instance of (P31) either university (Q3918) or the more general higher education institution (Q38723), depending on NOMEORG

- Items are going to be instance of (P31) either public university (Q875538) or private university (Q902104), depending on REDE

- Items are going to be instance of (P31) private not-for-profit educational institution (Q23002054), depending on DEPADM5

- Municipalities will be imported as located in the administrative territorial entity (P131)

- Addresses will be translated into coordinate location (P625)

- Sanitized website URLs will be added as official website (P856)
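
One possible reading of how the two geocoding answers are combined into a single coordinate location (P625) value is sketched below. It is not the procedure used for the import. The function names are hypothetical, and the handling of a missing answer or of answers more than 1 km apart (no value is returned, leaving the row for manual review) is an assumption that the notes do not state.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def pick_coordinates(geocode: Optional[Coord],
                     places: Optional[Coord],
                     max_km: float = 1.0) -> Optional[Coord]:
    """Return the Places answer when both answers agree within max_km."""
    # Assumption: a single unconfirmed answer is not trusted.
    if geocode is None or places is None:
        return None
    if haversine_km(geocode, places) <= max_km:
        return places  # the Places response takes priority
    # Assumption: disagreeing answers are left for manual review.
    return None
```

The inputs are expected to be coordinates already extracted from each API response; no API calls are made here.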

Wikidata:Requests for permissions/Bot/GupyBotProWD profile

Edit history

Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.

Date | Description | Method | Properties | Qualifiers | References | Statements added | Statements removed | Link to import sheet

Discussion of import

These headings are generally useful; please change this section to suit your needs.

Import completion notes

Data was imported successfully with OpenRefine, but batch editing failed a few times, which required splitting the dataset into smaller batches. This indicates that some reconciliation matches conflict. There are very few of them, probably at most 10 entities. They call for hand curation afterwards, which can probably be done by investigating constraint violations such as multiple "official website" statements on a single item.
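
As a rough illustration of that curation step, the sketch below flags items that carry more than one distinct official website value. It assumes the (item, URL) pairs have already been exported from Wikidata by some other means; the function name and the sample identifiers are placeholders, not real items from this import.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def items_with_multiple_websites(
    statements: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Group website URLs by item and keep items with more than one distinct URL."""
    sites: Dict[str, Set[str]] = defaultdict(set)
    for item_id, url in statements:
        sites[item_id].add(url)
    return {item_id: urls for item_id, urls in sites.items() if len(urls) > 1}


# Placeholder data, made up for this example.
sample = [
    ("item-1", "http://a.example"),
    ("item-1", "http://b.example"),
    ("item-2", "http://c.example"),
    ("item-2", "http://c.example"),  # repeated value, not a conflict
]
conflicts = items_with_multiple_websites(sample)
print(sorted(conflicts))  # prints ['item-1']
```

Each flagged item would then need a manual check to decide whether two institutions were merged by a wrong reconciliation match.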

Visualisations

Maintenance

Queries and expected results

Query link | Description | Expected results

Schedule of new data released