Wikidata:Requests for permissions/Bot/DBpedia-mapper-bot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 08:17, 6 June 2015 (UTC)[reply]
DBpedia-mapper-bot[edit]
DBpedia-mapper-bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Hjfocs (talk • contribs • logs)
Task/s: Addition of Wikidata-to-DBpedia classes/properties mappings as per the DBpedia ontology, and as discussed below and in this project chat thread.
Code: User:Hjfocs/add_dbpedia_mapping.py, ready to fulfill the task
Function details:
For each (Wikidata, DBpedia) mapping pair found in the DBpedia ontology (currently 484 pairs, see the human-readable DBpedia mappings), the bot adds the following data:
- an equivalency claim (either equivalent class or equivalent property) to a Wikidata Item representing a class or a property in the Wikidata ontology. The claim maps to the equivalent DBpedia ontology item;
- a described at URL qualifier pointing to a human-readable description of the DBpedia ontology item;
- a retrieved qualifier stating when the mapping was retrieved with day precision;
- a reference stating that the claim was imported from DBpedia.
Mapping test edit: see sandbox Item
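As a minimal sketch (not the actual bot code, which lives at User:Hjfocs/add_dbpedia_mapping.py), the four-part claim described above could be represented as a plain data structure before being written to Wikidata. The property and item IDs below (P1709, P973, P813, P143, Q465) are assumptions based on the property names used in this request, not values stated by the operator:

```python
from datetime import date

# Assumed Wikidata IDs for the properties named in the function details.
P_EQUIVALENT_CLASS = "P1709"  # "equivalent class" (P1628 would be used for properties)
P_DESCRIBED_AT_URL = "P973"   # "described at URL" qualifier
P_RETRIEVED = "P813"          # "retrieved" qualifier, day precision
P_IMPORTED_FROM = "P143"      # "imported from" reference
Q_DBPEDIA = "Q465"            # assumed item for DBpedia

def build_mapping_claim(dbpedia_uri, mapping_page_url, retrieved=None):
    """Build the equivalency claim, qualifiers, and reference
    described in the function details for one mapping pair."""
    retrieved = retrieved or date.today()
    return {
        "property": P_EQUIVALENT_CLASS,
        "value": dbpedia_uri,
        "qualifiers": {
            # human-readable description of the DBpedia ontology item
            P_DESCRIBED_AT_URL: mapping_page_url,
            # when the mapping was retrieved, with day precision
            P_RETRIEVED: retrieved.isoformat(),
        },
        "references": [{P_IMPORTED_FROM: Q_DBPEDIA}],
    }

claim = build_mapping_claim(
    "http://dbpedia.org/ontology/Person",
    "http://mappings.dbpedia.org/index.php/OntologyClass:Person",
    retrieved=date(2015, 6, 4),
)
```

In the real bot this structure would be written via the Wikidata API (e.g., through pywikibot claims), but the shape of the data mirrors the four bullet points above.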
--Hjfocs (talk) 17:27, 12 March 2015 (UTC)[reply]
- Any comments here? If not, I will approve in a couple of days.--Ymblanter (talk) 10:24, 14 March 2015 (UTC)[reply]
Reopening, after approval by Ymblanter. --Atlasowa (talk) 09:19, 17 March 2015 (UTC)[reply]
Oppose for now:
- No indication of how many edits will be performed. Scale? "All"?
- No pairing of properties provided for checking.
- No test edits done (100 test edits are customary)
- What is the point of this pairing, and how will it help Wikidata? Is it a preliminary step for other edits/imports/projects? Earlier proposals for DBpedia imports have been abandoned:
- Wikidata:Requests_for_comment/Can_we_reuse_anything_from_DBpedia? (Opening date of discussion: 27 August 2013)
- Wikidata:Requests_for_comment/DBpedia_import_process
- Wikidata:Project_chat/Archive/2014/04#DBpedia_headsup
- If this property mapping is useful, why isn't it done on DBpedia? [1]: "We also fully extract wikidata property pages. However, for now we don’t apply any mappings to wikidata properties." If it's not done on DBpedia, why should it be added to wikidata?
- This request needs more explanation and deliberation, ping Hjfocs. --Atlasowa (talk) 09:19, 17 March 2015 (UTC)[reply]
- Hi Atlasowa,
- Thanks for the feedback.
- Let me first highlight what I think is the key point:
- This bot is not intended to provide an import facility for third-party data, but only a linkage one.
- This has 2 benefits:
- Provenance information is kept intact: users can simply check how similar fragments of knowledge are described in different knowledge bases by browsing through the links. This holds for both humans and machines (since the data is machine-readable);
- No need to merge different data models.
- Here you can find detailed answers for each question:
- No indication of how many edits will be performed. Scale? "All"?
- The edits will only affect the schema Items, so the number will not be big. Currently, we have a total of 114 class and property mappings as per the DBpedia mappings wiki, plus 688 class mappings and 335 property mappings as per this spreadsheet.
- No pairing of properties provided for checking.
- You can have a look at the referenced DBpedia mappings wiki, which lists the mappings that are already in production.
- The referenced spreadsheet content is scheduled to be added to both the DBpedia mappings wiki and Wikidata.
- No test edits done (100 test edits are customary)
- Actually, I have been testing on the Alternative Sandbox Item by adding a few claims. I don't think I will need many more test edits (certainly not 100).
- What is the point of this pairing, and how will it help Wikidata?
- Linking to third-party knowledge bases like DBpedia (which contains lots of statements that are not in Wikidata and links to other datasets) facilitates the reuse and consumption of further data, without having to import them into Wikidata. Cf. the key benefits.
- Is it a preliminary step for other edits/imports/projects?
- No, I think it is a standalone action.
- Earlier proposals for DBpedia imports have been abandoned
- I have no knowledge of this; I fear that the DBpedia community was not directly involved in those discussions.
- If this property mapping is useful, why isn't it done on DBpedia? [2]: "We also fully extract wikidata property pages. However, for now we don’t apply any mappings to wikidata properties." If it's not done on DBpedia, why should it be added to wikidata?
- You are referring to an internal project (for which we are looking for feedback from the Wikidata community, that's why it was posted there), which aims at a full integration of Wikidata into DBpedia.
- The property mapping in DBpedia is already in production, cf. my reply above.
- Hi Hjfocs, thanks for answering. Can you try to give a really precise answer to the question of how many edits/mappings?
- "If this request is approved, it will scale to all the available mappings."
- "Currently, we have a total of 114 class and property mappings as per the DBpedia mappings wiki, plus 688 class mappings and 335 property mappings as per this spreadsheet."
- Do you want to do the 114 class and property mappings?
- Do you want to do the 114 class and property mappings plus the 688 class and 335 property mappings from the Google spreadsheet?
- Do you want to do the 114 class and property mappings plus those from the Google spreadsheet, minus those that have been classified as wrong or uncertain mappings?
- Can you give the number of mappings you want to do? --Atlasowa (talk) 14:10, 17 March 2015 (UTC)[reply]
- Hi Hjfocs, thanks for answering. Can you try to give a really precise answer to the question of how many edits/mappings?
- Sure, Atlasowa!
- The best-case scenario would be to use all of them, so the bot will perform at most 114 official mappings + 688 draft class mappings + 335 draft property mappings = 1,137 edits.
- As you noticed, however, the entries in the spreadsheet are still only partially validated, so I will need extra pairs of eyes.
- I believe they will come from the two communities, as I plan to upload the mappings to both the DBpedia mappings wiki and Wikidata. Of course, I will personally double-check them before that.
The details of what is being done are not clear to me. Can you explain why "a reference stating that the claim was imported from DBpedia" is true? A statement that two entries in two different databases agree with each other is different from an entry in DBpedia being imported into WikiData. Also, is there an explanation of what test you do to decide if two entries are equivalent? Jc3s5h (talk) 15:23, 17 March 2015 (UTC)[reply]
- Hi Jc3s5h,
- Since the mapping originates from a DBpedia community effort, I thought that the imported from property would fit best. Do you have any suggestions for a better alternative?
- The procedure to mint a new mapping pair combines the following automatic techniques (in order of complexity):
- String similarity measures (e.g., exact match, Levenshtein distance);
- String kernel matching;
- Logical constraint check (i.e., domain and range);
- Instance distribution similarity;
- SVM-based matching, with features such as labels or aliases.
- Then, the results need at least a round of human validation, and are finally considered official.
- Thanks. My impression is DBpedia created a class or property, and Wikidata independently created a class or property, and the effort you are involved with has discovered that certain classes or properties in the two databases are equivalent. Since the things were created independently, there is no importation involved. Jc3s5h (talk) 20:44, 17 March 2015 (UTC)[reply]
- This is a great synthesis, Jc3s5h! You pointed out the crucial aspects, thanks!
- The bot will perform a schema alignment task. --Hjfocs (talk) 09:23, 18 March 2015 (UTC)[reply]
- This is a great synthesis, Jc3s5h! You pointed out the crucial aspects, thanks!
- Sure, I totally agree. Also, the bot's behavior will be updated to handle the date stamp of the claim. I would add a qualifier with property == point in time and value == date stamp, as in the population of Berlin. Do you agree, Atlasowa? --Hjfocs (talk) 11:50, 18 March 2015 (UTC)[reply]
- I would suggest only adding mappings that are "already in production" at DBpedia. Further mappings should not be "wrong" or "uncertain". ;-)
- Some further links that might be useful:
- Wikidata:Tools/External_tools#Wikidata Class and Property Browser: Lets you explore Wikidata property and class usage in your browser. http://tools.wmflabs.org/wikidata-exports/miga/?classes#
- Schema.org should have mappings to Wikidata terms where possible #280 danbri opened this Issue Jan 23, 2015, started incomplete Schema.org-Wikidata property mappings google spreadsheet
- Wikidata_talk:WikiProject_Freebase#Property_mappings: started incomplete WikiData-Freebase property mappings google spreadsheet
- Property:P1282 OpenStreetMap tagging schema (a Key:key or Tag:key=value) for classes of things, Wikidata:Property_proposal/Archive/22#OpenStreetMap-Tagging, Autolist of items with OpenStreetMap tag or key (P1282) and subclass of (P279)
- [3] [4]
- HTH, --Atlasowa (talk) 10:24, 19 March 2015 (UTC)[reply]
- Thanks for the pointers Atlasowa, they are really useful.
- Agreed regarding the automatically generated mappings: they still need human validation, and this will be done first on the DBpedia community side. Then I can propose the linkage to Wikidata.
- Looking forward to getting more feedback on which property best fits the reference. --Hjfocs (talk) 17:54, 19 March 2015 (UTC)[reply]
- What is the current situation here?--Ymblanter (talk) 09:19, 25 March 2015 (UTC)[reply]
- I'm waiting for feedback on which property to use for referencing. If no one objects, I will proceed with imported from. --Hjfocs (talk) 10:28, 25 March 2015 (UTC)[reply]
Hi everyone, the bot is ready to run!
The function details are updated.
Instead of point in time, I found that the retrieved sub-property is more specific and fits better.
If no one objects, I will run it in a couple of days.
Thanks for your valuable feedback! --Hjfocs (talk) 17:00, 4 June 2015 (UTC)[reply]
- Any new comments please?--Ymblanter (talk) 05:55, 5 June 2015 (UTC)[reply]