Wikidata:Requests for permissions/Bot/Botcrux
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 14:21, 3 January 2017 (UTC)
Botcrux
Botcrux (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Horcrux92 (talk • contribs • logs)
Task/s: Import identifiers for the new property Treccani ID (P3365), using Italian Wikipedia (Q11920) as the source
Function details: I use PWB. The function parses the text of a Wikipedia page, searching for all article identifiers of the site treccani.it. To do this, it uses two regular expressions:
\{\{ *(?:[Tt]emplate *: *)?[Tt]reccani(?![^}]*\|\s*v *= *[^|}\s])\s*\|\s*([a-z0-9-]+)/?\s*[|}]
treccani\.it/enciclopedia/([a-z0-9-]+)/?\s*[|}\]<]
Then it takes the Italian label and aliases, unidecodes them, converts them to lowercase, and replaces spaces with hyphens. Finally, if one of the Treccani identifiers matches a label or alias (now well-formed with respect to the syntax of Treccani's URLs), it adds the property, using it.wikipedia as the source. The matching is necessary because, in general, a link does not necessarily refer to the same subject as the article that contains it.
Example: the article w:it:Gesù contains a link to http://www.treccani.it/enciclopedia/gesu-cristo/, so I get the identifier gesu-cristo. Then from the alias "Gesù Cristo" I can derive the string gesu-cristo, so I get a match and add the property.
--Horcrux92 (talk) 21:17, 4 December 2016 (UTC)
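The extraction and matching steps described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the two patterns are copied from the request, while the function names (`slugify`, `find_ids`, `matching_ids`) are hypothetical, and Unicode NFKD normalization is assumed as a stand-in for the unidecode step (the two can differ for some characters).

```python
import re
import unicodedata

# The two patterns quoted in the request, copied unchanged.
TEMPLATE_RE = re.compile(
    r"\{\{ *(?:[Tt]emplate *: *)?[Tt]reccani(?![^}]*\|\s*v *= *[^|}\s])"
    r"\s*\|\s*([a-z0-9-]+)/?\s*[|}]"
)
URL_RE = re.compile(r"treccani\.it/enciclopedia/([a-z0-9-]+)/?\s*[|}\]<]")


def slugify(name):
    """Approximate the 'unidecode, lowercase, hyphenate' step.

    Assumption: NFKD decomposition plus dropping non-ASCII characters
    stands in for the unidecode package.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_name.lower().replace(" ", "-")


def find_ids(wikitext):
    """Return every Treccani identifier found by either pattern."""
    return set(TEMPLATE_RE.findall(wikitext)) | set(URL_RE.findall(wikitext))


def matching_ids(wikitext, names):
    """Keep only identifiers equal to a slugified label or alias."""
    slugs = {slugify(n) for n in names}
    return find_ids(wikitext) & slugs
```

With the Gesù example, `matching_ids` would be called with the article wikitext and the list of Italian label and aliases; an identifier is kept only if it equals one of the slugified names.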
- Here's a handful of edits: Q4, Q13396, Q21201, Q23522, Q14378, Q83440, Q2329, Q11465, Q12800, Q23792, Q5990, Q8818, Q1537056, Q133212, Q13369. --Horcrux92 (talk) 21:36, 4 December 2016 (UTC)
- Although I can appreciate a good regex, I'm all in favour of easy solutions. Take for example Rimini (Q13369), where you did this edit based on "http://www.treccani.it/enciclopedia/rimini/" on it:Rimini. That URL is in the externallinks. There are several ways to get this external link: you can search for it (there is also an API version, afaik), have MediaWiki parse it for you, pull it from the search engine dump, do a database query, etc. Hope this helps. Multichill (talk) 11:00, 7 December 2016 (UTC)
- That's much easier and more useful, thank you. --Horcrux92 (talk) 15:25, 7 December 2016 (UTC)
- Are we ready for approval here?--Ymblanter (talk) 19:55, 20 December 2016 (UTC)
- I am :) --Horcrux92 (talk) 11:20, 28 December 2016 (UTC)
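Multichill's suggestion of reading a page's external links instead of parsing its wikitext might look roughly like the sketch below. It is not the code the bot ended up using. The helper names are hypothetical, and it assumes the MediaWiki `prop=extlinks` query with `elquery` filtering and the `formatversion=2` response layout (pages as a list, each link under a `url` key). The HTTP request itself is left out so the example stays self-contained.

```python
import re
from urllib.parse import urlencode

API_URL = "https://it.wikipedia.org/w/api.php"
# Accept only plain identifiers such as "rimini", optionally followed by "/".
ID_RE = re.compile(r"treccani\.it/enciclopedia/([a-z0-9-]+)/?$")


def extlinks_query_url(title):
    """Build a prop=extlinks request limited to Treccani encyclopedia links."""
    params = {
        "action": "query",
        "prop": "extlinks",
        "titles": title,
        "elquery": "www.treccani.it/enciclopedia",  # assumed filter value
        "ellimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def ids_from_response(data):
    """Collect identifiers from a parsed prop=extlinks JSON response."""
    ids = set()
    for page in data.get("query", {}).get("pages", []):
        for link in page.get("extlinks", []):
            # Assumption: formatversion=2 stores each link under "url".
            match = ID_RE.search(link.get("url", ""))
            if match:
                ids.add(match.group(1))
    return ids
```

The identifiers returned this way would still need to be compared against the item's label and aliases, since an external link does not necessarily point to the article's own subject.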