Wikidata talk:WikiProject Authority control/Archive 1

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Discussion moved from main page

"GND co-referencing is more advanced than VIAF co-referencing" Well, yes: "The inclusion of authority data in Wikimedia projects was pioneered by the German Wikipedia at the request of the German library community. As of June 2012, the German Wikipedia had over 220,000 articles tagged with "Normdaten". As of June 2012, Wikimedia Commons had over 31,000 categories tagged with Authority control. As of April 2013, the English Wikipedia has over 260,000 articles tagged with Authority control." en:Wikipedia:Authority_control As of now ~320.000 GND in german WP. de:Wikipedia:Normdaten#Statistik. Basically, enWP copied from deWP.

The German GND is not just persons, while VIAF is only persons. GND has different types:
  • p Person (individualisiert) person (individualized)
  • n Name (nicht individualisiert) name (not individualized)
  • k Körperschaft corporate body/institution
etc. This is really valuable information, which we store in deWP. It was also copied to Wikidata. But some people at Wikidata were too stupid: they abused "GND type" as "MAIN type", fucked things up badly, and then deleted all GND type data! A lot of people at deWP are still mad about this.
Don't start botruns if you don't know what you're dealing with. --Atlasowa (talk) 19:48, 23 January 2015 (UTC)

@Atlasowa:

  • viaf is only persons: this is false. Everything in GND is copied over to VIAF! Can you point out one GND entity at DNB that is not in VIAF? My point is exactly that we can leverage GND IDs in Wikidata (and any other VIAF-participating IDs) to populate VIAF IDs (and all other VIAF-participating IDs).
  • "GND type" ... fucked things up: I wasn't involved, and I agree with you that "GND type" should have been kept, since it's useful to the Authority Control maintainers. But IMHO its general utility is limited. Let's not open this topic again, let's work together on the new opportunities.
  • enWP copied from deWP: You should feel good about this, not offended! I've read the VIAFbot paper, which says it imported some 370k VIAF IDs using the WKP (enwiki) links. Whoever can, contributes coreferencing info, which is then propagated to other authorities. This is how the global coreferencing network grows!
  • Don't start botruns if you don't know: Oh, I'm too new here to dare run bots. That's why I'm raising this question. But it seems to me that huge coreferencing opportunities through the VIAF clusters remain unexploited. Every time an ID is added for a VIAF-participating authority, it should be propagated to all VIAF-participating IDs.
  • Would you support my suggestion to start an Authority Control project here? --Vladimir Alexiev (talk) 08:02, 25 January 2015 (UTC)
You're right, Vladimir Alexiev, VIAF is NOT only persons. I guess I got confused somehow, maybe because I see VIAF IDs only on person articles on the English WP. --Atlasowa (talk) 11:45, 27 January 2015 (UTC)

First of all: Great idea. But how can we avoid the same errors being imported multiple times?

Good question. A related question is: how to keep precise provenance of identifiers. When a GND is copied from VIAF to Wikidata, how do we mark this consistently with a "source" qualifier, and, if the VIAF number is later changed, how do we use this to clear the GND field? And how do we signal to VIAF when a correction is needed? (Vladimir Alexiev, 28 January 2015)

VIAF is helpful but also incomplete, outdated, and in some cases shows different, internal numbers that produce dead links. (It has, for example, problems with hyphens.) VIAF is only harvesting numbers. It is not the original source (see: Property talk:P227#Usage note). --Kolja21 (talk) 08:40, 27 January 2015 (UTC)

Yes, VIAF itself doesn't register persons (except the xA and xR files, which are maintained by OCLC and have some 2.5M records... but they're used more for "error correction"). This doesn't mean it's not important. It's not some hobby initiative: it's run by OCLC but is managed/directed by the VIAF consortium, which includes the major national libraries, and DNB/GND is one of the main stakeholders. It's hugely important in the library community. See Name Data Sources for Semantic Enrichment and the discussion thereof in the main section. (Vladimir Alexiev, 28 January 2015)
No doubt, VIAF is important. We can import the VIAF number if we have a GND, LCNAF etc. But we can't import the numbers the other way around without errors. Nevertheless, if we use imported from Wikimedia project (P143) + retrieved (P813), it's OK to import GND, LCNAF etc. from VIAF anyway. But again: how can we avoid a bot re-adding numbers that have already been deleted by hand because they do not match? --Kolja21 (talk) 15:15, 29 January 2015 (UTC)
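For illustration, a minimal sketch of what such a sourced import could look like with pywikibot; the helper name is invented, item selection is omitted, and the use of stated in (P248) = VIAF (Q54919) alongside retrieved (P813) follows the convention discussed later in this thread:

```python
# Sketch: add a GND (P227) taken from VIAF with provenance, so a later bot
# can recognise (and, if necessary, retract) the import. Assumes pywikibot;
# how the item is chosen is left open.
import pywikibot
from datetime import date

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def add_gnd_from_viaf(item: pywikibot.ItemPage, gnd_value: str) -> None:
    claim = pywikibot.Claim(repo, "P227")  # GND ID
    claim.setTarget(gnd_value)
    item.addClaim(claim, summary="import GND from VIAF cluster")

    stated_in = pywikibot.Claim(repo, "P248", is_reference=True)
    stated_in.setTarget(pywikibot.ItemPage(repo, "Q54919"))  # VIAF
    retrieved = pywikibot.Claim(repo, "P813", is_reference=True)
    today = date.today()
    retrieved.setTarget(pywikibot.WbTime(year=today.year, month=today.month,
                                         day=today.day))
    claim.addSources([stated_in, retrieved], summary="add provenance")
```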
Instead of deleting, a user might set a "no value" or use the proposed differentFrom (see example there). But there are no good "source" properties for a user to explain what he consulted, provide evidence (which facts), and give his reasoning: you can't add a comment and a user as a source. --Vladimir Alexiev (talk) 15:14, 23 February 2015 (UTC)
BTW: It would be a great help if a bot could check the GNDs that have been imported from the German Wikipedia. Some of them are Tn (not valid), others are outdated. The property retrieved (P813) is also missing. Over the last one and a half years a lot of corrections have been made in Wikipedia that are not reflected on Wikidata. --Kolja21 (talk) 15:26, 29 January 2015 (UTC)
@Pasleim: Maybe your bot can help? --Succu (talk) 16:24, 29 January 2015 (UTC)
As soon as time allows, I will create a list with differences between dewiki and Wikidata. With this list we can then define further bot tasks. --Pasleim (talk) 22:50, 29 January 2015 (UTC)
These differences could be determined easily if de:Template:Normdaten set an automatic maintenance category (or just hidden redlinks, as in de:Template:MdEP). Inappropriate GND numbers in de:WP are also identified at irregular intervals at de:Benutzer:Gymel/GND-Probleme#Entitätenfehler. What seems to be missing is a list of those Wikidata items whose GND ID (P227) values correspond to "undifferentiated" GND records: a bot could simply remove them. -- Gymel (talk) 10:58, 30 January 2015 (UTC)
What is an undifferentiated GND record? Do you have an example? --Pasleim (talk) 14:06, 30 January 2015 (UTC)
They were referred to as "Tn" above. The VIAF cluster 10125559 for Julian Musielak (Q11729072) shows two "DNB" (i.e. GND) records. One (d 1928-) is the "good" one, 17226894X, showing "Person" in the HTML display and <rdf:type rdf:resource="http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson" /> in the "RDF/XML-Repräsentation dieses Datensatzes" (RDF/XML representation of this record; link in the right column of the HTML display). The other one is marked (undifferentiated) in VIAF (and also (sparse), which does not matter here); the GND display 110829719 says "Name" instead of "Person", and the RDF/XML form calls it a <rdf:type rdf:resource="http://d-nb.info/standards/elementset/gnd#UndifferentiatedPerson" />. -- Gymel (talk) 15:03, 30 January 2015 (UTC)
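For illustration, a minimal sketch of how a bot could tell the two kinds of record apart from the RDF/XML, using exactly the two rdf:type URIs quoted above; how the RDF/XML is fetched is left open (it is the "RDF/XML-Repräsentation" link on the record's page at d-nb.info):

```python
# Sketch: classify a GND record as differentiated (Tp) vs. undifferentiated
# (Tn) from its RDF/XML serialization, via the rdf:type URIs quoted above.
DIFFERENTIATED = "http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson"
UNDIFFERENTIATED = "http://d-nb.info/standards/elementset/gnd#UndifferentiatedPerson"

def gnd_person_kind(rdf_xml: str) -> str:
    """Return 'Tn' (name only), 'Tp' (individualized person) or 'unknown'."""
    if UNDIFFERENTIATED in rdf_xml:
        return "Tn"   # candidate for removal of the P227 value
    if DIFFERENTIATED in rdf_xml:
        return "Tp"
    return "unknown"  # e.g. corporate bodies and other GND entity types
```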
A similar concern holds for Wikimedia disambiguation pages and RKDartists disambiguation pages. See Filter_out_Disambiguation_entries_and_Un-notable_Persons --Vladimir Alexiev (talk) 15:14, 23 February 2015 (UTC)
Linkfix: VIAF:10125559. @Pasleim: Thanks for your help! Explanation undifferentiated GND record (= Tn): de:Hilfe:GND#Personen. --Kolja21 (talk) 16:01, 30 January 2015 (UTC)
OK, I understand. Now I can think of 4 different cases: the Wikidata GND value is Tn and
  1. there is no sitelink to dewiki
  2. dewiki article has the same GND value
  3. dewiki article has a different GND value but also a GNDName value
  4. dewiki article has a different GND value and no GNDName value
What should I do in these cases? --Pasleim (talk) 19:44, 30 January 2015 (UTC)
Don't worry about the 4 cases. Imho all Tns in WD can be deleted without loss of information, since they were inserted accidentally. In deWP we take care of Tns and other problems through de:Kategorie:Wikipedia:Normdaten-Wartung. de:Vorlage:Normdaten#GND-Einträge mit Wartungsbedarf has a special parameter called "GNDName". --Kolja21 (talk) 22:47, 30 January 2015 (UTC)
ad 1: When P227 values are deleted, KrBot will restore them on a regular basis from the individual Wikipedias if the number is still listed in local authority control templates (fortunately KrBot provides the statements it adds with imported from Wikimedia project (P143) qualifiers, so one has a chance to identify at least one Wikipedia where a wrong entry originates). Thus fixing (i.e. removing) undifferentiated ones here is only part of the game. In cases where P227 has a source statement, the bot should log it (i.e. Wikidata item number, P227 value, source statement) to give us a chance to clean up the local AC templates before the undifferentiated values are automatically re-added to Wikidata. When there is a) no source statement, or b) imported from Wikimedia project (P143) with any value, or c) stated in (P248) with value Virtual International Authority File (Q54919), then P227 should be deleted; see the sketch after this message. (It would be nice to have the log partitioned into two sections: values already deleted by bot vs. to be checked manually.)
ad 3-4: We can safely exclude the case of multiple P227 values (they are listed in Wikidata:Database_reports/Constraint_violations/P227, and kicking out undifferentiated entries is a daily routine in order to keep the number of "unique" violations small). Since KrBot is regularly importing from de:WP (I think the import is triggered when the article has not been edited for four weeks), I don't think there are many cases of differing GND values at all, since they would result in multiple values here and therefore be listed in the constraint violations report.
ad 2: This should amount to about 500 entries (the count is from last October, so you might encounter a couple of hundred more), which are known as a problem in de:WP but unfortunately have not been fixed yet. Deleting the values in Wikidata would make the discrepancy more visible without any real loss.
Thus if I were the bot I would not handle cases 2-4 differently from case 1, i.e. I would base the decisions solely on findings in Wikidata and not even bother cross-checking with de:WP or any other wiki (I hope that we do not need a bot inspecting all sitelinks in order to tell us where some strange entry stems from). -- Gymel (talk) 23:26, 30 January 2015 (UTC)
The bot will identify at least 6,200 P227 values (and I guess no more than 6,500) corresponding to undifferentiated persons, and the hope is that a high proportion of them can be weeded out automatically and that not too many of them are still backed by local Wikipedia templates, which will eventually have to be fixed by hand. -- Gymel (talk) 23:26, 30 January 2015 (UTC)
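For illustration, a minimal sketch of the deletion rule described under "ad 1", assuming pywikibot; selecting the affected items and writing out the log are left open:

```python
# Sketch of the rule: delete a P227 claim known to point at an
# undifferentiated (Tn) record when it has (a) no source, (b) an
# "imported from" (P143) source, or (c) "stated in" (P248) = VIAF;
# otherwise log it for manual template cleanup.
import pywikibot

VIAF_ITEM = "Q54919"  # Virtual International Authority File

def handle_tn_claim(item, claim, log):
    sources = claim.getSources()
    removable = not sources  # (a) no source statement at all
    for source in sources:   # each source is a dict: property id -> claims
        if "P143" in source:  # (b) imported from a Wikimedia project
            removable = True
        if "P248" in source and any(
                s.getTarget().id == VIAF_ITEM for s in source["P248"]):
            removable = True  # (c) stated in VIAF
    if removable:
        item.removeClaims([claim], summary="remove undifferentiated GND (Tn)")
    else:
        log.append((item.id, claim.getTarget(), sources))  # check manually
```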
Finally, I found time to write a bot. On /Tn you find 100 test edits I did today. If they are okay, I will continue removing Tn records. --Pasleim (talk) 22:41, 23 February 2015 (UTC)
Thank you very much. Looks good, no objections. Roughly one third of the list has numbers imported from still active entries on en:wikisource, but I don't think this reflects the overall ratio. For many of these (about 80% of the 16 I checked) there exist (relatively recently created) individualized GND records, but I'm quite certain that we (or anybody on en:ws?) won't be able to systematically process a /Tn-style list of several thousand entries. In the last couple of weeks I noticed (within the unique value constraint violations) an increase of sourceless Tn values for P227, and I suspect some editors have tools which incorporate VIAF clusters into Wikidata without the extra check for "undifferentiated" GND entries. Therefore it would be interesting to re-check the numbers a month or so after your bot has completely cleared the list. -- Gymel (talk) 22:00, 24 February 2015 (UTC)

Subject item of this property

@Multichill: removed "Wikidata item of this property (P1629)" from Artsy artist ID (P2042).

This property is collected by the "propose property" form. Has there been a change in policy?


FYI

I really hope someone has an overview of what is copied/deleted from wikipedia/wikidata/viaf and in what direction and order... --Atlasowa (talk) 13:14, 29 April 2015 (UTC)

Nope. It seems that sometimes some bots took "isolated" AC numbers (e.g. those imported from some Wikipedia) and supplied Wikidata with the corresponding VIAF number. Other bots sometimes took over and provided the item with other AC numbers found in the VIAF cluster. Thus small glitches had a tendency to multiply over time. And there are lots of "small glitches": it seems to be usual practice in any Wikipedia to make a copy of an article to create a new one about a completely different person. This works well except for the authority control templates, where even the most experienced users sometimes forget to delete inappropriate AC numbers from the copy job. Typically, these kinds of inappropriate numbers can reside there unnoticed for years.
The good news is that there never was any bulk transfer of AC numbers from Wikidata to an individual Wikipedia. Thus the general tendency is concentration at Wikidata, with its more thorough checks on uniqueness and single-valuedness.
Even better news is that I have been actively fighting Wikidata:Database reports/Constraint violations/P214#"Unique value" violations for some weeks now. The current KasparBot task does not produce more than about a dozen new errors per day in that section, chiefly, I presume, because a huge "import AC numbers from en:WP" bot action happened less than a month ago.
Because of its dynamic nature, VIAF was always less suitable as a source than ordinary authority files. Since the recent switch of OCLC to using Wikidata instead of the English Wikipedia as one source of input for VIAF, the situation is slightly more explosive than before: some data inconsistency here has the potential to trick VIAF into, e.g., inappropriately merging two entities (of which one at the moment may not have a Wikidata item at all), and hypothetical bots here relying too heavily on VIAF as a source may strengthen that by importing wrong data, or data to inappropriate items in Wikidata. This in turn may hinder VIAF from splitting the cluster in question at the opportunity of future reclustering runs, even in the presence of additional data from third parties...
Generally I would rate the quality of AC data in Wikidata as very good: in the majority of today's dealings with the VIAF violations report mentioned above, the cause turned out to be duplicate items at Wikidata, which otherwise would have remained unnoticed for an unpredictable time. Thus we are actually using Authority Control like everyone else, and it works! -- Gymel (talk) 17:11, 29 April 2015 (UTC)
"Task: Removes redundant Authority control information and copies them to Wikidata." Excellent, we need more bots doing important tasks like this. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:50, 29 April 2015 (UTC)

The main aim is to move the authority identifiers permanently to Wikidata. 72 wikis use the Authority control template according to Template:Authority control (Q3907614). I'd like to propose the following steps to improve the data quality on all wikimedia wikis:

  1. enforce the Lua version as the implementation of the Authority control template (implement "edit on Wikidata" buttons?)
  2. create specific bot tasks for these wikis to copy and delete the identifiers
  3. solve the conflicts between wiki and Wikidata
  4. remove the parameters from the templates and use only Wikidata as the place for them.

It would be good to collect information about these wikis. They could be categorized as phase 1, 2, 3 or 4 wikis. Regards, --T.seppelt (talk) 19:22, 3 May 2015 (UTC)

There's a stage 5: add authority control templates to articles where Wikidata has an identifier and the wiki does not. We need to work more on this aspect, I think. --Izno (talk) 16:04, 19 May 2015 (UTC)
That's true. My bot has this task on enwiki. It is a little bit controversial. AC info was used almost exclusively on articles about humans. We will have to consider local discussions at the wikis. --T.seppelt (talk) 16:09, 19 May 2015 (UTC)

Some information

@Atlasowa:, @Gymel:, @Pigsonthewing:, @Izno: I began to collect some information at Wikidata:WikiProject Authority control/Status. Kind regards, --T.seppelt (talk) 18:13, 19 May 2015 (UTC)

Before implementation we should look at which identifiers are needed in the Wikipedias. Small wikis especially don't need all identifiers and focus on the authority files that are relevant for their region. --Kolja21 (talk) 20:02, 19 May 2015 (UTC)
This should be easy. We would only improve existing templates with individual selection of identifiers. --T.seppelt (talk) 21:03, 19 May 2015 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

A few points:

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:44, 20 May 2015 (UTC)

User pages

I had to think about it a second time. Drafts should not use the template with parameters, because this information can be retrieved once the article is moved to namespace 0. Please have a look at this statistic. On most wikis the template is almost never used on users' main pages. We are discussing very special cases. Regards, --T.seppelt (talk) 11:18, 22 May 2015 (UTC)

Keeping our mappings with authority control databases up to date in a structural / stable way

I'm in a conversation with KIK/IRPA, the Belgian equivalent of RKDartists, at the moment. We already have a property for their BALaT person database - BALaT person/organisation id (P1901) - and I am planning to match their identifiers to persons and organisations on Wikidata in the near future. However, they are asking me a really good and justified question: how do we deal with updates to their database - new IDs for new persons/organisations that are added continuously? I think this issue is at play with all authority files that we are matching here. How do we stay up to date with their changes and additions? If I understand well, Magnus might scrape their websites every now and then (but not consistently/regularly?). And volunteers like Multichill also keep an eye on this. But that seems a bit random, also considering the increasing number of authority files that we are connecting with. I think we need a structural way of dealing with this - also because IMO we want to be a reliable partner to these authority data providers and to users of our data. I have no immediate ideas on how to deal with this. Does anyone have suggestions? Spinster (talk) 15:09, 16 July 2015 (UTC)

Why not have THEM add missing people to Wikidata? We are considering ways of comparing databases. So when they have someone truly new, they can add the data to Wikidata. When they know it exists with an identifier from another source, they can add their identifier. We can have a bot that does much of the legwork for them. Particularly when they want to share anyway, we can in this way add data based on the info they have.
The numbers may for now be relatively small, but they are still well into three figures and we mustn't break them. I anticipate they will grow considerably in future. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:41, 22 May 2015 (UTC)
Yes, we should not interfere with any positive development. In this case our aim is consistency between Wikipedia articles and Wikidata entities. How do we achieve this without arguing about exceptions? A solution is to introduce a new template especially for the user namespace, designed to display AC information about Wikipedia contributors. On wikis with such a template we can finally decommission the parameters of the primary template. --T.seppelt (talk) 06:06, 24 May 2015 (UTC)

Cross-wiki tool

Hi everyone, I am working on a tool which can be used to fix the problems (malformed AC on the wikis, cross-wiki differences etc.) which my bot has detected. Please have a look at https://tools.wmflabs.org/kasparbot/ac.php, go through the problems and help me improve the software. Kind regards, -- T.seppelt (talk) 19:42, 5 July 2015 (UTC)

Offtopic about ORCID

How about holding the authority control data in Wikidata and removing the Authority control templates

Hi. I am now running a bot trying to add {{Authority control}} to every zhwiki page that has authority control information. I think it would be better to display it like the interwiki language links instead of adding a template, but I am not sure if the idea is reasonable. How about showing the authority control data like the language links displayed in the left block, and removing all authority control templates? --Kanashimi (talk) 10:46, 2 August 2015 (UTC)

I assume you add an empty {{Authority control}} template and that template just fetches all the data from Wikidata? With a bit of javascript and css magic you can display it in the sidebar. See for example this page. In the sidebar you'll see "In andere projecten" with a link to Commons:Category:Bloemendaal#mw-subcategories. This is filled by the Commons category template on the page. Multichill (talk) 13:23, 2 August 2015 (UTC)
@Multichill: Thank you for your reply. It is easy to add a JS+CSS hack to the template. I just think: why not just show the authority control data if there is data in Wikidata? Then even if we do not add the template, the information will still be shown automatically. Since the data is held by Wikidata, I think it is possible to do this. --Kanashimi (talk) 05:34, 3 August 2015 (UTC)
@Kanashimi:, @Multichill: adding an empty template is worth the time until such a JS/CSS solution is implemented by default at the Wikipedias. Another aspect is that the template offers all users with the privilege to edit the template the opportunity to decide whether an identifier is displayed or not. Manipulating the core web page is much more technical and in my eyes far above the (technical) understanding of most users. Putting AC information in the sidebar (which is a very good idea) would cause a loss of transparency. -- T.seppelt (talk) 07:59, 4 August 2015 (UTC)
@T.seppelt:Thank you. By the way, I don't know how to use the tool to solve the conflicts between wiki and Wikidata. Are there more details? --Kanashimi (talk) 13:39, 4 August 2015 (UTC)
@Kanashimi: the tool lists all conflicts between wiki and Wikidata and other related issues. The basic idea is that the user takes a look at the issue, resolves it and presses the button at the right. After that the issue isn't shown in the list any more. You can select specific projects or error types. You can start with this selection. I have to set up a help page with more details. Feel free to ask anything. -- T.seppelt (talk) 16:29, 4 August 2015 (UTC)
@T.seppelt: For some pages, e.g. deprecated VIAF IDs or other situations I can detect, I need to modify the data at Wikidata and in the tool by bot. Are there any APIs to do this? --Kanashimi (talk) 01:44, 5 August 2015 (UTC)
@Kanashimi: Sorry for the delay. You don't need to modify the information stored in the tool (via API). My bot updates the database routinely and detects your edits by itself -- T.seppelt (talk) 08:35, 17 August 2015 (UTC)

Locations

I noticed KrBot popping up on my watchlist on items I created about locations. The bot added VIAF links. I reviewed some of these edits and a lot of them are incorrect:

I could probably add some more, and I'm quite sure @sjoerddebruin: had some of them too. The logic used here by VIAF to match things seems to be causing quite a few mistakes. When did VIAF start matching locations in the first place? What should we do about this? Also @Ivan A. Krestinin:. Multichill (talk) 20:11, 8 August 2015 (UTC)

also ping @Gymel, Ralphlevan, ThomasBHickey, GerardM, Jura1:
All 3 samples do not have sitelinks. Does anybody know something about VIAF's matching algorithm for this case? I have excluded items without sitelinks from processing for now. — Ivan A. Krestinin (talk) 20:45, 8 August 2015 (UTC)
I also had many examples in my watchlist, which have plenty of sitelinks:
In my opinion, the current matching algorithm for geographic entities is quite bad (I'd eyeball the error rate at about 50% for geographic entities, based on my watchlist from last month). In the future either VIAF or your bot should check the matches for geographic entities before importing them to Wikidata. I could e.g. imagine that they are checked for proximity (by comparing the geographic coordinates in Wikidata to the ones in VIAF); even a threshold of as much as 100 km would have prevented all four of the mismatches I mentioned above (the source records of the German National Library (DNB), for example, have geographic coordinates quite often). --Floscher (talk) 13:35, 12 August 2015 (UTC)
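For illustration, a sketch of the proximity check proposed here, with the 100 km threshold suggested above (haversine distance on a spherical Earth):

```python
# Sketch: accept a VIAF match for a geographic item only if the coordinates
# in Wikidata and in the source record (e.g. DNB) are close to each other.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def plausible_match(wd_coord, source_coord, threshold_km=100.0):
    """Reject matches whose two records locate the entity far apart."""
    return distance_km(*wd_coord, *source_coord) <= threshold_km
```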
Well, we are talking about KrBot's one-time task to import all VIAF-Wikidata matches performed by VIAF that we don't yet know about. Thus the outcome is somewhat expected, and we have two options to provide VIAF with hints to correct the matches:
  1. set P214 to "novalue" instead of simply deleting the inappropriate value
  2. provide the correct item with the VIAF number (and delete it from the inappropriate item)
Of course, many erroneous matches will go undetected for the moment. But would not importing them be of any help in the long run? Not processing items without sitelinks might be legitimate, since their meaning might not be as precisely defined as for items with sitelinks (but we would have to correct that here anyway?). Not importing from VIAF clusters which contain members of conflicting meaning might be even better - but obviously this cannot be detected unless someone inspects them... -- Gymel (talk) 21:03, 8 August 2015 (UTC)

Moved from User talk:Ivan A. Krestinin#Bot adding random VIAF
Hello Ivan, I've reverted[3] these additions[4] by the bot. Not sure where they're being sourced from… —Sladen (talk) 23:55, 10 August 2015 (UTC)

The value was imported from https://viaf.org/viaf/70720542/. It looks like the VIAF matching algorithm is based on name and birth date only. I am not sure the error level of such an algorithm is acceptable for us. Should I stop the import? — Ivan A. Krestinin (talk) 04:35, 11 August 2015 (UTC)
FYI, got a reply. Not sure about continuing or stopping. Do you have a rough idea of what the error rate currently is? Multichill (talk) 16:16, 11 August 2015 (UTC)
As I see it, VIAF uses name, surname and birth/death dates for matching people. So the error rate is on the order of <same name probability> * <same surname> * <same birth year> * <same death year>. I think this expression has a low value for people. But the issue has another side: a low error rate is still non-zero. And we need an efficient way to fix the errors we find. — Ivan A. Krestinin (talk) 20:35, 11 August 2015 (UTC)
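As a back-of-the-envelope illustration of that expression (the probabilities below are invented for the example, not measured values):

```python
# Rough illustration of the error-rate expression above. With millions of
# candidate pairs even a tiny per-pair probability yields some wrong matches,
# which is why an efficient correction workflow is still needed.
p_same_name = 0.01            # two arbitrary records share the given name
p_same_surname = 0.005        # ... and the surname
p_same_birth_year = 1 / 80    # birth years spread over ~80 years
p_same_death_year = 1 / 80

p_false_match = (p_same_name * p_same_surname
                 * p_same_birth_year * p_same_death_year)
print(f"{p_false_match:.2e} per candidate pair")  # ~7.81e-09
```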


Yesterday the bot completed the VIAF ID import for person items (P31 = Q5). The discussion above did not reach a conclusion. So should the bot import IDs for the remaining 105,000 non-person items? — Ivan A. Krestinin (talk) 21:08, 26 August 2015 (UTC)

Result? Fixed or not?

Did a bot revert the VIAF errors? At VIAF or Wikidata? Will the errors be reimported again and again? --Atlasowa (talk) 21:26, 29 December 2015 (UTC)

Unfortunately, the bot request was never taken up. Your example indicates that VIAF has recently detected the problem with Dutch streets and dropped the mapping. Since it was a one-time action of User:KrBot (to ingest VIAF numbers missing here), there is no risk of repetition. (There might be other bots running which clandestinely import data either from VIAF or from other data sets based on VIAF mappings, which IMHO should never be done.) -- Gymel (talk) 22:32, 29 December 2015 (UTC)
Thank you for the info, Gymel! --Atlasowa (talk) 17:08, 1 January 2016 (UTC)

Largest Wikipedias with no or poor authority control integration

It seems that the largest (more than 250,000 articles) Wikipedias with no authority control templates, or poor integration between their template and Wikidata, are:

  • de - no Wikidata integration; no ORCID in template
  • es - no template
  • nl - no template
  • sr - no Wikidata integration

Have I missed any? What are their objections? How can we encourage them to join in this initiative? Do we have speakers of those languages, active here, who would be willing to act as "ambassadors"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:03, 17 August 2015 (UTC)

German

We had this RfC on dewiki which prohibits using Wikidata in that way. I tried to discuss this here. There will not be any progress soon, I think. -- T.seppelt (talk) 09:13, 17 August 2015 (UTC)

In brief, what are the objections? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:15, 17 August 2015 (UTC)
Loss of control and data quality (the flagged revisions debate...). According to the RfC, direct Wikidata transclusion is only allowed for externally referenced claims. (There are more conditions, but they can be fulfilled easily.) Most Wikidata AC information is referenced by imported from Wikimedia project (P143) → Wikipedia language edition (Q10876391)... -- T.seppelt (talk) 09:26, 17 August 2015 (UTC)
*sigh* All AC values are externally (self-) referenced, by default. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:35, 17 August 2015 (UTC)
Yes, I know. We have to wait for a new RfC to get suitable rules. -- T.seppelt (talk) 10:29, 17 August 2015 (UTC)

You can try to find dewiki ambassadors in this and related discussions :-) — Ivan A. Krestinin (talk) 09:24, 17 August 2015 (UTC)

A new discussion has started, at de:Vorlage_Diskussion:Normdaten#ORCID. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:30, 27 August 2015 (UTC)

Spanish

Some discussion here. The main objection seemed to be the use of a separate template; one editor (at least) wanted the values to be displayed in an infobox. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:30, 17 August 2015 (UTC)

This ORCID blog post on ORCID in Latin America may be useful. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:45, 25 August 2015 (UTC)

Double-check

Is it common that the same identifier appears on more than one Wikidata item? If not, then maybe somebody could run a DB scan or whatever and check how many Wikidata items share the same identifiers. --Edgars2007 (talk) 06:06, 22 August 2015 (UTC)

We already do this. See Wikidata:Database reports/Constraint violations/P214#"Unique value" violations, for example. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:54, 22 August 2015 (UTC)
Thanks. Edgars2007 (talk) 11:35, 22 August 2015 (UTC)
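For illustration, the check behind those "Unique value" report sections can also be reproduced ad hoc against the Wikidata Query Service; a sketch using VIAF ID (P214). Large properties may need a LIMIT or may time out, so the generated reports remain the practical tool:

```python
# Sketch: list identifier values shared by more than one item.
import requests

QUERY = """
SELECT ?value (COUNT(DISTINCT ?item) AS ?items) WHERE {
  ?item wdt:P214 ?value .      # VIAF ID; swap in any identifier property
}
GROUP BY ?value
HAVING (COUNT(DISTINCT ?item) > 1)
LIMIT 100
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["value"]["value"], row["items"]["value"])
```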

Stage 3.1 → Stage 4.1 ?

@Pigsonthewing: for most users at other projects, Wikidata:WikiProject Authority control/Status is very confusing. It is important that we offer AC templates for the user namespace. In my eyes stage 3.1 belongs to stage 4. Can we regroup the stages in that way and maybe use only one column for 4 and 4.1? This would reduce the complexity of the table. We should also consider creating a new entity to connect all future user-namespace Authority control templates. What do you think? -- T.seppelt (talk) 18:31, 29 August 2015 (UTC)

@T.seppelt: I placed it before item 4 because the user-page template should be available before the direct-entry parameters are removed from the existing template. Otherwise, data on user pages is made invisible. I don't believe that including the name of the user-space template in the same column as "Done" would be wise, but I'm open to a reasoned counter-argument. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 29 August 2015 (UTC)
@Pigsonthewing: yes, you are right. This is okay for me. I am going to propose an implementation of such a user template on enwiki soon. Regards, -- T.seppelt (talk) 06:21, 27 September 2015 (UTC)
see en:Template talk:Authority control#Authority control for user namespace. -- T.seppelt (talk) 06:53, 27 September 2015 (UTC)

Hello everyone,

I have a question concerning malformed Bibliothèque nationale de France ID (P268) values. My bot spotted about 640 pages in several wikis with Bibliothèque nationale de France ID (P268) values in the format \d{8}. According to the format as a regular expression (P1793) of this property, all values have to match \d{8}[0-9bcdfghjkmnpqrstvwxz]. What happened to the last character in these 640 cases? How can I handle them automatically? Thanks, -- T.seppelt (talk) 19:28, 2 October 2015 (UTC)

It's a check digit mod 29, and VIAF does not show it... Cf. Property talk:P268#Prefix and check character for a pointer to s:User:Inductiveload/BnF ARK format. -- Gymel (talk) 23:28, 2 October 2015 (UTC)
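For illustration, a sketch of the mod-29 computation in the NOID style described on that Wikisource page. The assumption that the checked span is the "cb" prefix plus the eight digits should be verified there before repairing values in bulk:

```python
# Sketch: compute the BnF ARK check character (position-weighted sum of the
# character values, mod 29). The 29-character alphabet matches the P1793
# regex quoted above.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"

def bnf_check_char(eight_digits: str) -> str:
    """Return the check character to append to an 8-digit BnF number."""
    name = "cb" + eight_digits  # assumption: the check spans "cb" + digits
    total = sum(pos * ALPHABET.index(ch)
                for pos, ch in enumerate(name, start=1))
    return ALPHABET[total % 29]

# Repairing a truncated value harvested from VIAF would then be:
# full_value = digits + bnf_check_char(digits)
```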
Thank you very much. This was a huge step forward. -- T.seppelt (talk) 08:51, 4 October 2015 (UTC)

VIAF errors

Someone on frwiki reported a VIAF error (discussion): fr:Marie-Guillemine_Benoist and fr:Marie-Élisabeth_Laville-Leroux seem to be two different persons with the same VIAF ID. Is there a (documented, if possible) procedure for marking such mistakes as errors on Wikidata, to track them and report them back to VIAF? This might be more efficient than reporting it to VIAF directly, as someone suggested in the discussion. author  TomT0m / talk page 09:30, 6 October 2015 (UTC)

I added a "novalue" as VIAF ID (P214) for Marie-Élisabeth Laville-Leroux (Q3292321): This could suffice to signal VIAF to remove that item from the VIAF 71658103 in the next processing round (usually monthly). Depending on the case we treat duplicates differently here: If mostly a wikidata item is assigned to a wrong cluster, we remove the wrong VIAF number here (if at all set, VIAF did a matching independent of our settings for P214) and set the correct VIAF number (including novalue in that case) and try to provide as much data (especially exact birth and death dates) as possible. In other cases (VIAF cluster is inconsistent even after subtraction of Wikidata items from the picture) we keep the VIAF number on the different items and rely on VIAF taking into account the listing at Wikidata:Database reports/Constraint violations/P214#"Unique value" violations (currently, after a massive import of VIAF's own matchings this list has a huge number of false positives when it comes to non-persons). -- Gymel (talk) 10:00, 6 October 2015 (UTC)
Addition @TomT0m: You can report errors to VIAF via mail or leave a note at en:Wikipedia:VIAF/errors. The result is the same: nothing will happen. VIAF harvests authority files automatically; no intellectual work is done by a librarian. --Kolja21 (talk) 10:17, 6 October 2015 (UTC)
Well, en:Wikipedia talk:Authority control/VIAF#OCLC Mechanisms indicates that VIAF has switched focus from en:WP to Wikidata and is well aware of the constraint reports here and their possible implications for the VIAF processes. And manual (or rather operator-triggered) operations on VIAF's side are possible. (They also have their internal small "xA" authority file, which serves to permanently overrule problematic clusterings.) But it is true that there is no (active, working) reporting pipeline as might have been intended when en:Wikipedia:VIAF/errors was set up, and my perception is that neither here nor at OCLC is there any intention to establish one. -- Gymel (talk) 10:43, 6 October 2015 (UTC)

Wikimania 2016

Only this week left for comments: Wikidata:Wikimania 2016 (Thank you for translating this message). --Tobias1984 (talk) 11:47, 25 November 2015 (UTC)

Duplicates in Wikidata/VIAF

FYI:

  • Wikidata-mailinglist: Duplicates in Wikidata "During the most recent VIAF harvest we encountered a number of duplicate records in Wikidata. Forwarding on in case this is of interest (there is an attached file – not sure if that will go through on this list or not). Some discussion from OCLC colleagues is included below."[5]

--Atlasowa (talk) 21:33, 27 December 2015 (UTC)

Thanks for notifying. BTW: Is there anyone taking care of:
--Kolja21 (talk) 23:48, 27 December 2015 (UTC)

Here is the list with the 315 duplicates: User:Kolja21/VIAF duplicates-20151223. --Kolja21 (talk) 03:44, 28 December 2015 (UTC)

Shouldn't they all be on Wikidata:Database_reports/Constraint_violations/P214#Unique_value ? --- Jura 07:50, 28 December 2015 (UTC)
No, for that someone first has to add the VIAF ID as a property. P214#Unique_value shows different errors, like: Tex Rubinowitz (Q2407532) (b. 1961) ≠ Tex Rubinowitz (Q3519315) (b. 1944). In some cases VIAF has mixed up different persons; in others Wikidata has errors. --Kolja21 (talk) 15:07, 28 December 2015 (UTC)
Thank you for the list. The majority of duplicates stems from imports of botanist author abbreviation (P428) from IPNI, performed almost at the same time as the ingest of Specieswiki (author names), and is easily solvable, since identical names, birth and death dates are provided. But this should be done manually in order to also add VIAF and other authority control numbers to the merged items. -- Gymel (talk) 10:02, 29 December 2015 (UTC)
They all have been dealt with now. -- Gymel (talk) 10:43, 31 December 2015 (UTC)

Great, thanks! Some good analysis and systematic questions on the mailinglist by User:Tom Morris and User:Spinster. --Atlasowa (talk) 18:04, 31 December 2015 (UTC)

Atlasowa: I don't post on the Wikidata mailing list. I'm different from (P1889) the Tom Morris (Tfmorris) who posts on the mailing list. We do move in some similar circles though, so it can get quite confusing. Tom Morris (talk) 18:52, 31 December 2015 (UTC)
I think we should mention the good analysis Tom Morris just gave.
--- Jura 10:59, 1 January 2016 (UTC)
One remark to this post: I'm always checking botanist author abbreviation (P428) via WDQ before creating a new item. The crucial point was the time overlap. There are around 1,000 botanist author abbreviation (P428) values imported from Wikispecies left which have no reference to an IPNI author ID (P586). --Succu (talk) 11:25, 1 January 2016 (UTC)
There were many instances in the run where the Wikispecies import happened to match (or skip) an existing item with "regular" sitelinks and your author abbreviation import created a duplicate. Thus I think some basic checks (the VIAF findings were based on exact simultaneous matches of English label, birth and death dates) are always appropriate. Or maybe it is an argument for channeling even seemingly trivial tasks, like adding one property, through semi-automatic means like Mix'n'Match. -- Gymel (talk) 13:21, 1 January 2016 (UTC)
I can also claim that there were "many" instances in the run where the Wikispecies import mismatched the author to a completely wrong person (the total Wikispecies import has a lot of different issues to be fixed). Finding duplicates after creation is much easier than avoiding them. As far as I know there is no possibility in WDQ to query labels and to find persons with an equal birth and/or death year. SPARQL can do that of course, and after fixing all the issues I was aware of, I created User:Succu/SPARQL to make sure I can do more precise pre- and post-checks. This query can help to fix the remaining unreferenced botanist author abbreviation (P428) values. But there are 20,000 authors left which have only an English name (variant) and no birth and death years, only a hint via floruit (P1317). I can imagine a lot of pre-item-creation checks using the given floruit year (e.g. label checks against author name string (P2093)), but I think most of them will fail because we are lacking data. So I'm open to suggestions. --Succu (talk) 23:37, 1 January 2016 (UTC)

Documentation

Where is the documentation for users who need to quickly and easily edit Authority Control that has been moved here from Wikipedia? -- Erika aka BrillLyle (talk) 02:21, 19 January 2016 (UTC)

Good point. There is no guide as far as I know. It would be great if somebody could tidy up Wikidata:WikiProject Authority control (translate it to other major languages, move content to subpages etc.) and work on a guide for this. It's probably getting even more important after the implementation of the new identifier data type. -- T.seppelt (talk) 18:21, 24 January 2016 (UTC)
That documentation is probably Help:Statements. --Izno (talk) 14:29, 25 January 2016 (UTC)
@T.seppelt: & @Izno: cc: @Addshore: -- Thanks for the responses.
I would be interested in working on tidying up the Wikidata:WikiProject Authority control, though translation is not my specialty.
I would also be interested in creating some sort of general guide on this.
I understand that Help:Statements is critical, but I am envisioning something a little more user-friendly and general. It would not be as fancy as what exists on the Wikidata:Tours but would be for GLAM-type folks like me (I've got a Library Science degree, which makes me a bit obsessed with Authority Control). I see some sort of pathway between Template:Authority control and Wikidata....
I suspect that there needs to be a similar type of guide for the infobox data too, possibly -- although it hasn't been deprecated like Authority Control -- though maybe that's another discussion. If this is successful I would be happy to assist with that.
Thanks again! -- Erika aka BrillLyle (talk) 23:30, 26 January 2016 (UTC)

Is there a tool?

Vladimir Alexiev Jonathan Groß Andy Mabbett Jneubert Sic19 Wikidelo ArthurPSmith PKM Ettorerizza Fuzheado Daniel Mietchen Iwan.Aucamp Epìdosis Sotho Tal Ker Bargioni Carlobia Pablo Busatto Matlin Msuicat Uomovariabile Silva Selva 1-Byte Alessandra.Moi CamelCaseNick Songceci moz AhavaCohen Kolja21 RShigapov Jason.nlw MasterRus21thCentury Newt713 Pierre Tribhou Powerek38 Ahatd JordanTimothyJames Silviafanti Back ache AfricanLibrarian M.roszkowski Rhagfyr 沈澄心 MrBenjo S.v.Mering

Notified participants of WikiProject Authority control I'm a complete newbie when it comes to large datasets and their merger, which apparently is what this project is all about. However, every now and then I do import authority control numbers to Wikidata, mostly from VIAF. Take a look at Russian State Film and Photo Archive (Q4398058): it has 9 identifiers, all of them imported by hand. It's a tedious task, and I wonder if there is a tool that makes it easier to batch-import IDs for a particular Q. Anyone? (BTW, please ping me.) Halibutt (talk) 09:47, 14 August 2017 (UTC)

Hi @Halibutt: Some weeks ago I had a similar problem, namely adding GND IDs for all economists which had a VIAF ID but no GND ID in Wikidata. My approach was to direct a query to a custom-built SPARQL endpoint holding all the VIAF data, selecting in a subquery to the Wikidata service all WD items with VIAF and without GND, and then obtaining the GND IDs from the VIAF dataset. The result was about 12,000 VIAF-GND pairs, which I syntactically transformed into a QuickStatements2 input file and worked through in batches of some thousand. An additional hint: I've learned that it is considered appropriate to ask for a bot flag before using QuickStatements2 on a large scale. The discussions in the Project Chat and in the bot approval process were really helpful. Jneubert (talk) 10:08, 15 August 2017 (UTC)
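For illustration, the last step, turning such (item, GND) pairs into a QuickStatements2 input file, is only a few lines; the pair below is purely illustrative, and the federated query against the custom VIAF endpoint is site-specific and therefore omitted:

```python
# Sketch: write QuickStatements v2 rows (tab-separated: item, property,
# quoted value) for GND ID (P227).
pairs = [("Q42", "119033364")]  # illustrative pair, verify before use

with open("gnd_statements.qs", "w") as f:
    for qid, gnd in pairs:
        f.write(f'{qid}\tP227\t"{gnd}"\n')
```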
Thanks @Jneubert:, it seems a great way - for people who understand how all that magic works :) Sadly, I don't (okay, I do get the basics, but it would take me ages to replicate it). I wonder if there's a simpler method for casual WD users like me who would like to add authority control links to particular entries. Say, paste a VIAF URL and the tool would automagically add those VIAFs, GNDs, NUKATs and whatnot. Halibutt (talk) 13:30, 15 August 2017 (UTC)
@Halibutt: That seems reasonable. Unfortunately, I currently cannot write such a tool. Perhaps you should ask again in the Wikidata:Project chat. Maybe a similar tool already exists, or somebody takes action to create it. A starting point could be the "justlinks" function of the VIAF API: http://viaf.org/viaf/123911488/justlinks.json (if you know the VIAF ID), or http://viaf.org/viaf/sourceID/WKP%7CQ4398058/justlinks.json for Wikidata item Q4398058 (so you would not even have to build the VIAF URL yourself, if it is present in the item); these give you all the links VIAF has. Jneubert (talk) 14:02, 15 August 2017 (UTC)
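For illustration, a sketch of reading that endpoint; the exact keys in the response ("DNB", "LC", "WKP", ...) vary per cluster, so they are handled defensively here:

```python
# Sketch: fetch all identifiers VIAF holds for one cluster via justlinks.
import requests

def viaf_justlinks(viaf_id: str) -> dict:
    url = f"https://viaf.org/viaf/{viaf_id}/justlinks.json"
    r = requests.get(url, headers={"Accept": "application/json"})
    r.raise_for_status()
    return r.json()

for source, ids in viaf_justlinks("123911488").items():
    if isinstance(ids, list):  # some keys (e.g. "viafID") hold plain strings
        print(source, ids)
```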
@Jneubert: Thanks, posted my question here, hopefully someone will be interested. Halibutt (talk) 08:45, 23 August 2017 (UTC)
@Halibutt:, Is this what you're looking for? Wikidata:Tools/User_scripts#Authority_control. It will create an "Authority control" link in your toolbar (usually on the left of your display). It's for use on a single WD entry. It parses the links via VIAF. Hazmat2 (talk) 14:31, 11 October 2017 (UTC)

@Hazmat2: thank you! Precisely what I needed! Halibutt (talk) 00:49, 12 October 2017 (UTC)

Thesauruses with hierarchical structure

Vladimir Alexiev Jonathan Groß Andy Mabbett Jneubert Sic19 Wikidelo ArthurPSmith PKM Ettorerizza Fuzheado Daniel Mietchen Iwan.Aucamp Epìdosis Sotho Tal Ker Bargioni Carlobia Pablo Busatto Matlin Msuicat Uomovariabile Silva Selva 1-Byte Alessandra.Moi CamelCaseNick Songceci moz AhavaCohen Kolja21 RShigapov Jason.nlw MasterRus21thCentury Newt713 Pierre Tribhou Powerek38 Ahatd JordanTimothyJames Silviafanti Back ache AfricanLibrarian M.roszkowski Rhagfyr 沈澄心 MrBenjo S.v.Mering

Notified participants of WikiProject Authority control

The property broader concept (P4900) is now available, to be used as a qualifier on the 'external ID' statement matching a Wikidata item to a thesaurus entry, so that hierarchical relationships in the external thesaurus can now be represented on an item here in Wikidata. The talk page there includes examples of use of the qualifier, and a practical application, viz. a query to find all items corresponding to entries under "costume accessory" in the Getty Art & Architecture Thesaurus (Art & Architecture Thesaurus ID (P1014)), identifying which have upward relationships in the thesaurus that cannot as yet be 'explained' by our existing subclass of (P279) relations.
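For illustration, a sketch of the kind of query described, run against the Wikidata Query Service: items whose Art & Architecture Thesaurus ID (P1014) statement carries a broader concept (P4900) qualifier that is not yet mirrored by a subclass of (P279) path:

```python
# Sketch: find "unexplained" thesaurus hierarchy links (illustrative query).
import requests

QUERY = """
SELECT ?item ?broader WHERE {
  ?item p:P1014 ?stmt .
  ?stmt pq:P4900 ?broader .                         # broader per the thesaurus
  FILTER NOT EXISTS { ?item wdt:P279+ ?broader . }  # no matching WD hierarchy
}
LIMIT 100
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"})
print(len(r.json()["results"]["bindings"]), "unexplained broader links (sample)")
```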

It seems to me it would be useful to start a page identifying which of the external sources we match against have well-developed hierarchical structures, then to what extent that hierarchy has been imported and represented via broader concept (P4900), and to what extent it may have been reviewed against our own hierarchical relations.

Either here or Wikidata:WikiProject KOS seems a good place to host such a survey. Indeed, I was thinking it might make sense to make Wikidata:WikiProject KOS a sub-project of this one, and devolve all work on authority control involving thesauruses and controlled vocabularies with hierarchical structure to there, saving this project primarily for authority control for people and individual objects, rather than concepts. But on the other hand the WPKOS pages seem a bit confused at the moment, and perhaps need quite a lot of bringing up to date to reflect what has become established practice: using classes of items to represent both the items contained and the concept of the class, in the same item.

Anyway, whether here or at WPKOS, I do think a page would be useful to track the hierarchical information in external sources, and to what extent that external hierarchical information is now available within Wikidata, via broader concept (P4900).

So far, use of P4900 has been in conjunction with Wikidata:WikiProject Fashion, with hierarchical information from Europeana Fashion Vocabulary ID (P3832), and from Art & Architecture Thesaurus ID (P1014) below costume (Q9053464) now added. Jheald (talk) 15:03, 5 March 2018 (UTC)