Wikidata:WikiProject India/Portals/Andhra Pradesh/Notes/Village names consistency with Local government directory data

Participants: User:Arjunaraoc. Last update: 2024-03-30

I pulled the village data from the reports available at https://lgdirectory.gov.in/ (Local Government Directory, LGD) and used OpenRefine to update the English labels and descriptions of Andhra Pradesh villages on Wikidata, with the census 2011 code as the matching key. QuickStatements is too slow, and its batch mode too buggy, for a dataset of 15622 items.
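
Matching on the census 2011 code amounts to a keyed join between the LGD report and the Wikidata items. The following is only a minimal Python sketch of that idea, not the OpenRefine workflow actually used; the function name and the field names (`census_2011_code`, `village_name_en`, `qid`) are hypothetical.

```python
def match_by_census_code(lgd_rows, wikidata_items):
    """Pair each Wikidata item with the LGD row that shares its census 2011 code.

    lgd_rows: list of dicts with hypothetical keys
        'census_2011_code' and 'village_name_en'.
    wikidata_items: list of dicts with hypothetical keys
        'qid' and 'census_2011_code'.
    Returns (matched, unmatched): matched is a list of
    (qid, LGD English name) pairs, unmatched is a list of qids
    whose code is not present in the LGD data.
    """
    # Index the LGD rows by census code for constant-time lookup.
    lgd_by_code = {row["census_2011_code"]: row for row in lgd_rows}

    matched, unmatched = [], []
    for item in wikidata_items:
        row = lgd_by_code.get(item["census_2011_code"])
        if row is None:
            unmatched.append(item["qid"])
        else:
            matched.append((item["qid"], row["village_name_en"]))
    return matched, unmatched
```

Items returned in `unmatched` would need a manual check, for example for a missing or mistyped census code on the Wikidata side.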

Here are my notes:

  1. LGD lists 17950 villages, including uninhabited ones, while Wikidata has information on 15629 inhabited villages.
  2. LGD has local-language names for 384 entries, and some of those are actually just the English names.
  3. LGD entries for a few villages contain spelling mistakes or literal English translations of proper nouns, which need to be corrected. As the data is updated by the local authorities, these will hopefully be fixed in due course. To keep matching easy, I have not fixed these issues when updating the English labels. Examples: Ravigudem [Big] for పెద్ద రావిగూడెం; Prathapaviswana Dhapuram (unnecessary split).
  4. LGD does not differentiate between habitations that share the same village name, which happens particularly in tribal areas. I have added a numerical (Indo-Arabic) suffix, following the practice in Telugu Wikipedia articles. A Roman numeral suffix is already used officially for some names (example: కోడూరు-I, కోడూరు- II).
  5. Each LGD entry has a version number, which changes whenever somebody updates the name with a different spelling or the limits of the village change.
  6. LGD has a link to a map, but no map is displayed when it is clicked.
  7. Telugu labels were copied from the Telugu Wikipedia articles, which include the mandal as additional information for ambiguous names. Since disambiguation on Wikidata is done through the description, these labels need to be edited to remove any disambiguation other than the numerical suffix, in the absence of a panchayat name. This is complete, except for names with a direction indicator in parentheses.
  8. A few entries have somewhat odd names that use @.
  9. Places upgraded to towns after 2011 have both village and city attributes; the village attribute is to be removed. Examples: Tuni, Samarlakota.
  10. As a "town of India" property is not established in Wikidata, some places that are nagar panchayats or above in panchayat raj may be flagged as villages, some as cities, and some as both.
  11. In OpenRefine, the number of entries shown may be less than the number of rows. This means there are duplicated rows, which arise when the same place has multiple attributes such as village and human settlement. When the label and description are updated together, the console may show a MediaWiki error if other items already have the same label and description, but the updates are still done. I checked for such items using Wikidata search and fixed the issues manually where required.
  12. The English spelling may be the same for a few villages whose Telugu names differ slightly, for example రమాపురం vs రామాపురం for Ramapuram. In such cases I have used a numerical suffix for differentiation where required.
  13. The updated English names will be useful for tool-assisted matching with village names on OSM, and for updating Wikidata and the Telugu names on OSM. About 4432 items have coordinate info as of 2024-03-30; 1209 have source info.
  14. The Telugu descriptions do not have mandal info for some entries.
  15. The official suffixes of English names are not consistent; for example, II, Ii and ii are all in use. I did not intervene, as the labels need to stay consistent with LGD. A sync will be required at regular intervals.
  16. When the same name is used more than once in a mandal, the Indo-Arabic suffix -1 may be missing for some entries.
  17. As the data is based on census 2011, village names are suffixed with census qualifiers, such as Ct- for census town, or (part) when referring to the rural part of the original village.
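
The numerical suffix described in notes 4, 12 and 16 can be sketched as follows. This is an illustration in Python rather than the procedure actually used: it assumes the suffix has the form -1, -2, ... and that every repeated name within a mandal gets one, including the first occurrence. The function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def add_numeric_suffixes(villages):
    """Return one label per (mandal, name) pair, in input order.

    villages: list of (mandal, name) tuples (hypothetical layout).
    A name that occurs once in its mandal is left unchanged.
    A name that occurs several times in the same mandal gets
    '-1', '-2', ... in order of appearance (assumed suffix format).
    """
    totals = Counter(villages)   # how often each (mandal, name) occurs
    seen = defaultdict(int)      # running count per (mandal, name)
    labels = []
    for key in villages:
        _, name = key
        if totals[key] == 1:
            labels.append(name)
        else:
            seen[key] += 1
            labels.append(f"{name}-{seen[key]}")
    return labels
```

Because the count is kept per mandal, the same name in two different mandals stays unsuffixed here; telling those apart is left to the description, as in note 7.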