Wikidata talk:Primary sources tool/Archive/2015

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Google Books/News

Primary sources tool mockup

The mockup already looks good. It would be even nicer if, when someone finds a source for a statement on Google Books or Google News, not only a URL reference were added but an extended reference according to Help:Sources were created. --Pasleim (talk) 23:21, 17 December 2014 (UTC)

+1 on that (maybe Phase 2?) - PKM (talk) 00:58, 18 December 2014 (UTC)
+1. And obviously not just Google books :) --Denny (talk) 18:23, 18 December 2014 (UTC)
I'd like to see options for scholarly references (e.g. PubMed, arXiv, Google Scholar, CrossRef). --Daniel Mietchen (talk) 00:36, 22 December 2014 (UTC)

See also Wikidata:Referencing improvements input. I think what is needed is a reference tool for wikidata that is a combination of these tools:

1) Sourcerer by user:Magnus Manske (Wikidata:Tools/User scripts#Sourcerer) which provides refs/links from the corresponding Wikipedia articles
Sourcerer screenshot
2) autofilling of references via mw:Citoid service (mw:VisualEditor/Design/Reference_Dialog) after input of URL or DOI, ISBN etc. Citoid relies in part on zotero, currently outputs cite_web and cite_news. (see tools en:User:Salix_alba/Citoid and en:User:Mvolz/veCiteFromURL) and autofilling of references via Wikipedia template filling by user:Diberri, currently outputs cite_book, cite_journal, cite_web etc.
3) a template form/input dialog box for manually correcting the autofilled values, e.g. MakeRef, Templator Vorlagen-Generator or en:Wikipedia:RefToolbar.

Just automatically adding the first google search result as a "reference" is not OK. --Atlasowa (talk) 09:22, 18 December 2014 (UTC)

Thanks, I completely agree. I hope no one's planning to just use the top Google search result, that wouldn't be particularly useful. Thank you especially for the links to Sourcerer, Citoid and MakeRef - I hope that we can reuse some of that work. I especially wonder if the tool could be mostly implemented as a Gadget (like Sourcerer) or if it should be a stand-alone Website (like the Wikidata Games)? Anyone an opinion? --Denny (talk) 18:23, 18 December 2014 (UTC)
@Denny: Have you taken a look at intelwiki [1] [2]? "This thesis investigates whether an intelligent system that automatically generates resource recommendations could make the process of editing Wikipedia articles easier. This investigation involves the following research questions: 1) How should a system generate resource recommendations? 2) In what way should a system present the recommended resource materials to the user? 3) Does having streamlined access to recommended resource materials make it easier for users to edit Wikipedia articles?" [3]
Seems clearly on topic, even though their "laboratory evaluation with 16 novice Wikipedia editors" is not sufficient testing. --Atlasowa (talk) 23:38, 4 January 2015 (UTC)

Disable all wikimedia.org sites as source?

Would it be an option to exclude all wikimedia.org sites from the URLs that can be used? Or is there a good use case for using a Wikimedia site as a reference? --Pavel Richter (WMDE) (talk) 13:28, 23 December 2014 (UTC)

Wikisource might be a valid option, especially if the original source of the Wikisource document is not available any more, or not online (see also #Link rot below). --Daniel Mietchen (talk) 11:12, 30 December 2014 (UTC)
Wikisource works should be presented as items, not URLs in references. —Wylve (talk) 20:57, 4 January 2015 (UTC)

Link rot

I like where this tool is heading, but the workflow as currently sketched out ignores link rot. I suggest adding an automated archiving step to address that. For a tool that does this already, see ru:Участник:WebCite Archiver. --Daniel Mietchen (talk) 11:12, 30 December 2014 (UTC)

Hi Daniel Mietchen, you're absolutely right about linkrot and archive-links. Have a look at Auto-archive@Talk:Cite-from-id, where i mentioned archiving links of wikiwix@frenchWP and of webarchive@enWP. MVolz proposed to open up a task on phabricator citoid board https://phabricator.wikimedia.org/project/board/62/ Could you please open a task? And post the link here? (I don't contribute to phabricator because it demands my email address) Thank you! --Atlasowa (talk) 09:12, 13 February 2015 (UTC)
Done: https://phabricator.wikimedia.org/T89438 . --Daniel Mietchen (talk) 09:41, 13 February 2015 (UTC)

Great idea

I wonder however if it would be possible (or better) to make it consensus-based - i.e. to make it a real reference, 3 users have to agree on the same reference. Since we propose the suggestions, agreement on the good ones should be quickly reached. Same for deleting bad claims. --Smalyshev (WMF) (talk) 18:35, 16 January 2015 (UTC)

It could be eventually expanded to do this, but this is not where it is going. I think a consensus based approach might maybe make sense if references are not given too much weight, i.e. for unreferenced statements, but when we explicitly ask for references to be added, then I don't think voting is the best way. --Denny (talk) 23:48, 4 February 2015 (UTC)

As a Gadget

After talking with a few folks at Wikimedia and at the DevSummit, we have rethought the proposal and decided that it makes more sense to integrate it more tightly into Wikidata, i.e. as a Gadget rather than an external Website.

Since we will retain a server-based component for providing the statements and storing the metadata about them, it would actually be quite easy to also add a Website that provides the originally planned tool.

Any opinions? --Denny (talk) 23:51, 4 February 2015 (UTC)

Problem with IE?

With Internet Explorer 11 on Win 8.1, the suggestion links don't work :( --ValterVB (talk) 19:43, 1 April 2015 (UTC)

Reported here. --Denny (talk) 20:24, 1 April 2015 (UTC)

Tagged?

Is there a tag to track which references are added by this tool? --Succu (talk) 20:49, 1 April 2015 (UTC)

Good idea. Tracked here. Do we need to register the tag first? --Denny (talk) 21:03, 1 April 2015 (UTC)
I have no idea. The suggestion was inspired by this change of yours. Is a private blog like http://andreacefalo.com/ reliable? --Succu (talk) 21:23, 1 April 2015 (UTC)
My question would be 'is it an improvement to the previous situation?', but it's up to the community to decide. If we have (implementable) criteria for what makes a good reference we can filter the ones out that do not.
I checked all the Wikipedia articles connected to Conrad I of Hochstaden (Q77882), and none of them has a reference for that particular claim (as an 'Einzelnachweis').
The German article lists a number of books as literature which should have it if it is true and which would obviously be better, also the Italian has one book. --Denny (talk) 21:55, 1 April 2015 (UTC)
I guess if a PhD thesis is questioned as a reliable source by you, not many of the URLs our data currently suggests would meet your threshold, though. --Denny (talk) 22:19, 1 April 2015 (UTC)
Mixing topics? Wow! A randomly selected URL should be a problem for all of us. --Succu (talk) 22:28, 1 April 2015 (UTC)
It's not a random URL. It's a URL that should support the claim. As said, I think the question is, whether it is an improvement. If not, then it should be reverted, and the dataset with suggestions be cleaned up to reduce the number of URLs that do not improve the situation. --Denny (talk) 22:37, 1 April 2015 (UTC)

Colors ...

The instructions say "you will see references or whole statements with a green background color, which you can simply accept or reject, or edit." The color is blue for me (Chrome on Win7). Not sure if this is a bug or a feature... PKM (talk) 00:44, 2 April 2015 (UTC)

Yes, you are right. Corrected it. Thanks! --Denny (talk) 15:33, 3 April 2015 (UTC)

URL blacklist

Hey :)

I've started a list of URLs we don't want to be suggested by the Primary Sources Tool for referencing. You can find it at Wikidata:Primary sources tool/URL blacklist. Please add.

The implementation of using this blacklist is tracked at https://github.com/google/primarysources/issues/11. --LydiaPintscher (talk) 16:04, 6 April 2015 (UTC)
Great idea, thanks! I just did a quick check on the data, and removing the one suggestion so far would already remove more than 20k suggestions, which is really good. --Denny (talk) 16:33, 6 April 2015 (UTC)

iframe appearing from time to time

From time to time I see a small iframe appear at the bottom left of the page which loads Wikidata:List of properties/all which appears to be coming from User:Tomayac/freebase2wikidata.js (I can make it show up again by deleting f2w_properties from the browser's local storage). It would be nice if it could load the list of properties without having a visible iframe. - Nikki (talk) 13:06, 24 April 2015 (UTC)

I'm seeing the same. --LydiaPintscher (talk) 13:09, 24 April 2015 (UTC)
This is still happening. Sjoerd de Bruin (talk) 08:23, 3 July 2015 (UTC)
Also noticed this several times today. Mbch331 (talk) 10:48, 16 July 2015 (UTC)

P21 suggestion

Hi! I see that the tool suggests female organism (Q43445) instead of female (Q6581072) for property sex or gender (P21). For example, this happens in Emmanuelle Béart (Q106458). Is that OK? --Escudero (talk) 13:37, 29 April 2015 (UTC)

This is already tracked on GitHub --Mineo (talk) 16:43, 29 April 2015 (UTC)

Providing data about already imported statements

Hi,

the "Random Freebase item" link just took me to Q315711 where a new Property:P434 statement is suggested (although it already has one, but that's not relevant for now). All the MusicBrainz IDs that are linked to Wikipedia/Wikidata pages are automatically imported by User:MineoBot, so it seems like a waste of time to show them again, just to let them get rejected. The code of User:MineoBot logs everything it does. Those logs include the property ID, the Wikidata item ID and the MusicBrainz ID added to WD. That's exactly what the statements in the sources tool provide, so if you're interested, I can extract this data from the logs and make it available to mark the statements as approved in the tools database. --Mineo (talk) 12:47, 20 May 2015 (UTC)

problem with "random freebase item"

The Random Freebase item link led me to Pinetop Smith (Q1095520), but then only looped on the same item.

Using this link from Wikidata:Primary_sources_tool only loops on the page :/ --Hsarrazin (talk) 21:58, 9 June 2015 (UTC)

Hello Helene, I can't reproduce the issue. Could you purge your browser cache and see if the issue happens again? Cheers, Tpt (talk) 19:50, 10 June 2015 (UTC)
Did you by any chance open the "Random Freebase item" link by anything other than left-clicking on it? It's not a normal link, but works (iirc) as an on-click handler on the text, which is only called by left-clicking. --Mineo (talk) 08:28, 12 June 2015 (UTC)

Bug

  1. Adding references always fails because the tool always tries to add empty references.
  2. If the tool suggests items without label in interface language, error "Cannot read property 'value' of undefined" occurs instead of doing label fallback. Test the bug here.
  3. The word "references" is not localized.
  4. URLs of references should be displayed as links (http://example.com) instead of in quotation marks ("http://example.com").

--GZWDer (talk) 03:39, 12 June 2015 (UTC)
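The label fallback asked for in point 2 could look roughly like the sketch below. It is only an illustration in Python, not the gadget's code: the function name `label_with_fallback` and the language chain are hypothetical, and the entity is assumed to follow the usual Wikibase JSON layout in which each label is an object with a `value` key.

```python
def label_with_fallback(entity, languages, default=None):
    """Return the first available label along a language fallback chain.

    `entity` is assumed to look like Wikibase entity JSON, e.g.
    {"id": "Q1", "labels": {"en": {"language": "en", "value": "universe"}}}.
    """
    labels = entity.get("labels", {})
    for lang in languages:
        entry = labels.get(lang)
        # A missing language yields None here instead of raising an error.
        if entry and entry.get("value"):
            return entry["value"]
    # No label in any requested language: show the entity ID instead.
    return default if default is not None else entity.get("id", "")
```

Called with a chain such as `["fr", "de", "en"]`, the function returns the first label it finds and falls back to the item ID when none exists, so the interface never dereferences an undefined label.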

Blacklist not working?

The first domain listed on Wikidata:Primary sources tool/URL blacklist is ebay.com, but http://www.ebay.com/itm/J-POP-Morning-Musume-Early-Single-Box-9CD-JAPAN-LIMITED-cd71-/400500834873 was one of the suggestions I just got for EARLY SINGLE BOX (Q1188639). Is the blacklist not working properly, or has it just not been implemented yet? - Nikki (talk) 15:26, 14 June 2015 (UTC)
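For illustration, a domain check of the kind the blacklist needs might be sketched as follows. This is a Python sketch under assumptions, not the tool's actual implementation: the function name `is_blacklisted` is hypothetical, and the blacklist is assumed to be a plain collection of domain names such as `ebay.com`.

```python
from urllib.parse import urlsplit


def is_blacklisted(url, blacklist):
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in blacklist:
        domain = domain.lower()
        # Match "ebay.com" itself and subdomains such as "www.ebay.com",
        # but not unrelated hosts that merely end in the same letters.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

With this kind of match, the `www.ebay.com` suggestion above would be caught by an `ebay.com` entry, whereas a naive exact comparison of host names would miss it.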

3 x primary sources gadget

Hi, i activated the primary sources gadget and clicked "Random Primary Source item" 3 times:

  • Drunk drivers (Q5309330): I'm asked to add a statement "Freebase identifier", stated in Samsung Freebase-Wikidata mapping
  • Aly Cissokho (Q18881) "Welsh Football player": I'm asked to add another "country of citizenship" (before: France with 0 references), "Senegal" and get 3 useless links to facebook as references (https://www.facebook.com/public/Faty-Cissokho)
  • Melanocyma faunula (Q6811449): I'm asked to add a statement "Freebase identifier", stated in Samsung Freebase-Wikidata mapping

Facebook and Freebase? Deinstalled. --Atlasowa (talk) 10:25, 2 July 2015 (UTC)

And regarding "several hundred thousand" nationality claims by Freebase, uhhhh. --Atlasowa (talk) 10:38, 2 July 2015 (UTC)

Where does it come from ?

How does the tool generate these three references? http://www.freebase.com/m/01_xqj has the statement, but not the urls. None of the listed pages even makes the statement. Though one of the website mentions it somewhere else. --- Jura 13:02, 5 July 2015 (UTC)

The sources are not from Freebase but from an internal Google project called Knowledge Vault, whose aim is to extract information from the web. A paper about it was published last year: https://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf Tpt (talk) 02:00, 6 July 2015 (UTC)
Interesting read, thanks for the link. Doesn't it primarily explain how the statements are generated rather than where the links come from? I should probably re-read. --- Jura 10:51, 10 July 2015 (UTC)


Suggestions

On Q1335635, "Spain" gets suggested as place of birth (we already have Alicante). Somehow 1942 gets suggested a 2nd time as date of birth. --- Jura 10:51, 10 July 2015 (UTC)

Also, for Q91, "President of the United States" and "gunshot wound" is suggested a second time (w/o qualifiers).
On Q76, an award is suggested twice (good) as it's for two different works at different dates.
Can I opt out of getting suggestions for P172 and P738? --- Jura 12:32, 10 July 2015 (UTC)
Thank you for reporting these issues. I'll investigate them after having worked on performance issues. Tpt (talk) 16:52, 10 July 2015 (UTC)
I'm not sure if there is an easy solution for Spain. At some other item, the suggestion (of a city) made me realize that the POB at Wikidata needed correction. At Q91#P20 I accepted the suggestion Washington and set it to deprecated. Not sure which way is ideal. Honolulu keeps getting suggested for Q76#P19 --- Jura 17:21, 10 July 2015 (UTC)
A solution would be to ask a Wikidata query service if the suggestion has any of the existing values inside them. - Jan Zerebecki 23:01, 10 July 2015 (UTC)
For suggestions like "Spain", yes, there is no easy solution because calling a query service is a very expensive operation. For repeated suggestions like "gunshot wound" for Abraham Lincoln (Q91), it seems that the filter I wrote to handle this kind of case hasn't worked. I'll look for a way to remove this kind of statement from the database. About Abraham Lincoln (Q91)'s birth place, I believe it's better to set the city as "normal" and the precise place as "preferred" in order to respect the semantics of "deprecated". Opting out of specific properties would be very nice, but I believe I'll first work on an "opt-in" feature in order to be able to review only one property. Tpt (talk) 21:11, 13 July 2015 (UTC)
I have just fixed the issue of duplicate claims like "President of the United States" for Abraham Lincoln (Q91). Tpt (talk) 22:57, 13 July 2015 (UTC)
Looks good, I haven't seen any yet. BTW, re Spain: maybe we could not suggest countries for some fields (place of birth, death) when a value is present.
For religion, "Catholicism" was just changed everywhere into "Catholic Church". At Q946, the old value gets suggested. --- Jura 07:39, 14 July 2015 (UTC)

On Q543470 1946 as DOB gets suggested another time. Q21 is frequently suggested for people with P27=Q145. I wonder if we shouldn't get rid of all reference suggestions for gender if it matches the one already listed. I doubt things like the three imdb titles suggested at Q118988#P21 do any good. --- Jura 12:03, 15 July 2015 (UTC)

The 1946 on Q543470 happens because the already existing year is encoded as 1946-01-01T00:00:00Z and not as the usual 1946-00-00T00:00:00Z. For United Kingdom (Q145) vs England (Q21), what do you think about replacing all country of citizenship (P27)=England (Q21) with country of citizenship (P27)=United Kingdom (Q145) and then applying the usual duplicate filter? It may be an easy solution if Wikidata prefers to use United Kingdom (Q145) rather than England (Q21) for this property. About sex or gender (P21) references: if you think that the quality of the proposed references is too low, yes, I can remove them from Primary Sources. Tpt (talk) 17:19, 15 July 2015 (UTC)
I'd drop the references for P21 if there is an existing property with the same value. Maybe we want to think about which other basic properties we could do the same.
As for p27, let me come back on that. P27 can be tricky. --- Jura 15:57, 16 July 2015 (UTC)
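The two encodings of 1946 mentioned in this thread differ only in parts that a year-precision date does not cover, so a duplicate filter could compare dates only down to their stated precision. The following Python sketch assumes Wikibase-style timestamps and the Wikibase precision codes (9 = year, 10 = month, 11 = day); the function names are hypothetical and this is not the filter Tpt wrote.

```python
import re

# Matches the date part of timestamps such as "1946-01-01T00:00:00Z"
# or "+1946-00-00T00:00:00Z".
_TIME = re.compile(r"^([+-]?\d+)-(\d{2})-(\d{2})T")


def time_key(value, precision):
    """Reduce a timestamp to the components covered by its precision."""
    match = _TIME.match(value)
    if match is None:
        raise ValueError("unrecognised timestamp: %r" % value)
    year, month, day = (int(part) for part in match.groups())
    if precision <= 9:       # year or coarser: ignore month and day
        return (year,)
    if precision == 10:      # month: ignore day
        return (year, month)
    return (year, month, day)


def same_date(a, b, precision):
    """True if two timestamps denote the same date at the given precision."""
    return time_key(a, precision) == time_key(b, precision)
```

At year precision, `1946-01-01T00:00:00Z` and `1946-00-00T00:00:00Z` compare equal, so the second suggestion would be recognised as a duplicate. Precisions coarser than a year (decade, century) are collapsed to the year here, which is a simplification.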

Data

Please exclude taxon name (P225) (44,560) and taxon rank (P105) (134,705). I think the probability introducing old errors is high. BTW we have more than 1,900,000 taxa. --Succu (talk) 05:46, 14 July 2015 (UTC)

✓ Done. Tpt (talk) 17:43, 15 July 2015 (UTC)

Stats per property

Wikidata:Primary_sources_tool#Statements_per_property looks impressive.

Is there a way to tell for how many items we could newly get a specific property? Supposedly "date of death" for John Paul II (Q989) now adds three statements, as the tool suggests three links. These three wouldn't fill the gap of 890962 items mentioned here --- Jura 08:23, 14 July 2015 (UTC)

Yes, "date of death" for John Paul II (Q989) adds 3 statements. In Wikidata:Primary_sources_tool#Statements_per_property I don't count the statements already in Wikidata. For example, I have no source to suggest for the death date of Ulrich Frédéric Woldemar, Comte de Lowendal (Q498), which is the same in Freebase and Wikidata, so there are no statements in Primary Sources about it and it is not counted. Also, I am not sure that we have a death date for every person in Freebase. But if you are interested, I can output the number of claims for each property created by my mapping tool (before filtering out what is already in Wikidata). Tpt (talk) 20:42, 14 July 2015 (UTC)
I think it would be interesting to have an idea of the scale.
For the pseudo-random sample "Lucy" I worked on, the percentages of People_charts#The_3_main_properties are at 100%, 100% and 74%. The following two are at 41% and 92%. These last three percentages could still improve. Freebase identifiers are at 11%. Note that some charts combine several properties. --- Jura 12:28, 15 July 2015 (UTC)

I add new statements and it is incorrectly attributed to the Primary sources tool

The PST is really a pain in the rear end. When I add manually statements it does not mean that it is done by the PST. Only when I approve or reject claims the PST is involved. Thanks, GerardM (talk) 13:27, 14 July 2015 (UTC)

Do you mean that the PST adds the comment "added with the Primary Sources tool" even when you do a regular Wikibase statement edit? If yes, it's a strange bug that I can't reproduce. Tpt (talk) 20:28, 14 July 2015 (UTC)
Haven't seen that either. BTW shouldn't it add "imported from Freebase" when this is where the data comes from? --- Jura 12:29, 15 July 2015 (UTC)
When I "approve" a statement, I do not want to be credited for an edit. It is not mine. At round about the same time I added the same statements using a tool. This is why the confusion. IMHO this tool is brain dead. It does not show as an existing statement when you do not have the gadget enabled. It is horribly wrong in conception. Why should it not be seen as a statement? The only situation where this might make sense is in using it in Wikipedia but really. Thanks, GerardM (talk) 11:19, 16 July 2015 (UTC)
If you manually add a new statement to Wikidata with the gadget enabled, the statement disappears (but is added to Wikidata). It only appears once the page is reloaded. There used to be a similar bug with the Wikidata interface when one started adding one statement, but then added another before completing it. --- Jura 13:01, 16 July 2015 (UTC)

Very slow process

I noticed the same behaviour, but did not think it was due to the gadget - thank you Jura1.
Also, the process is very slow: the initial load of every item takes very long, and every approved or rejected claim forces a refresh of the item, which makes approving data very tedious. It's quicker to manually add claims without references by copying them :/ For some items where a lot of claims were proposed, I had to refresh more than 10 or 12 times… it takes very, very long.
Could the behavior be changed so that approved or rejected claims or refs are only marked as such (or added to the back of the list, like Magnus Manske/wikidata_useful.js does)? --Hsarrazin (talk) 10:21, 28 July 2015 (UTC)

@Tpt: Please also exclude this property as the data are not in the correct format (₁₂₃₄₅₆₇₈₉₀ instead of 1234567890).--GZWDer (talk) 15:01, 16 July 2015 (UTC)

@GZWDer: ✓ Done If the difference is only ₁₂₃₄₅₆₇₈₉₀ instead of 1234567890 what I could do is convert the Freebase data for the property and upload them again, this time in the good format, into Primary Sources. What do you think about it? Tpt (talk) 17:22, 16 July 2015 (UTC)
@Tpt: Also, the tool should not suggest value if there's already one. For example, ethanol (Q153)'s chemical formula may be C2H5OH or C2H6O, but we only need one.--GZWDer (talk) 18:54, 16 July 2015 (UTC)
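The conversion Tpt offers amounts to a character-for-character mapping between ASCII digits and subscript digits. The thread does not state unambiguously which direction is needed, so the Python sketch below shows both; the function names are hypothetical, and a real chemical formula may need more care than a blanket digit replacement.

```python
ASCII_DIGITS = "0123456789"
SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"

# Translation tables for str.translate(), one per direction.
_TO_SUBSCRIPT = str.maketrans(ASCII_DIGITS, SUBSCRIPT_DIGITS)
_TO_ASCII = str.maketrans(SUBSCRIPT_DIGITS, ASCII_DIGITS)


def to_subscript(text):
    """Replace every ASCII digit with its subscript counterpart."""
    return text.translate(_TO_SUBSCRIPT)


def to_ascii(text):
    """Replace every subscript digit with its ASCII counterpart."""
    return text.translate(_TO_ASCII)
```

Normalising both the existing value and the suggested value to the same form before comparing them would also help with the duplicate check for a single formula, although it cannot tell that two different notations such as C2H5OH and C2H6O describe the same compound.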

Link reference URL?

@Tpt: URLs in references should be linked. Currently we must copy the URLs and paste them to the browser address bar in order to view them. When we click the URLs the related webpages should be opened in a new window/tab.--GZWDer (talk) 15:10, 16 July 2015 (UTC)

@GZWDer: I can't reproduce this issue. Could you give me more details about your web browser? Tpt (talk) 17:28, 16 July 2015 (UTC)
@Tpt: You can check it there. the URLs are not linked. P.S. I have reported it one month ago. The quotation marks are removed but the URL is still unlinked.--GZWDer (talk) 17:32, 16 July 2015 (UTC)
On the latest Firefox and Chrome on Ubuntu 14.04 and OS X 10.11 the URLs are linked. I can't solve your issue if I don't know which browser you use. Tpt (talk) 17:36, 16 July 2015 (UTC)
@Tpt: Is it linked in Windows? However I have used an old version of browser. You can try to reproduce it in Chrome 31.--GZWDer (talk) 18:50, 16 July 2015 (UTC)

Strange genre

I get Danish (Q9035) as a genre for Armadillo (Q1029748). That's absolutely illogical. Mbch331 (talk) 15:39, 16 July 2015 (UTC)

Maybe it comes from there: https://www.freebase.com/m/0chtrm5#%2Fmedia_common%2Fnetflix_title --- Jura 15:46, 16 July 2015 (UTC)
Yes, exactly. See the mapping. If data provided by this property are too bad we could remove them from the mapping. Tpt (talk) 17:30, 16 July 2015 (UTC)

Wrong (very wrong) claims without sources...

Sometimes the tool proposes very strange claims (such as a country as an occupation).

As there is no reference, there is no way to know where this comes from, or how to signal it. Could there be a link of some sort, to be able to check claims without references, so that we know what source (Freebase source, I mean), this comes out from ? --Hsarrazin (talk) 10:24, 28 July 2015 (UTC)

All data currently in Primary Sources are from Freebase, and, more exactly, from the Freebase topic mapped to it. Could you give me some examples of such errors? It would be nice to see if the root cause of these errors is in the mapping process or these wrong claims are also in Freebase. 16:46, 28 July 2015 (UTC)
ok. Next time I'll note the Q… of item ;) - I just rejected them when found them. --Hsarrazin (talk) 20:11, 28 July 2015 (UTC)


Where's his wife now?

Primary sources suggests a location qualifier for the spouse property (spouse (P26)). Supposedly, this is the place where the marriage was celebrated, but currently only start date and end date are accepted as qualifiers for P26. I think we should either create a dedicated qualifier or move this to "significant event". --- Jura 11:29, 4 August 2015 (UTC)

Yes, I believe it would make sense to create a dedicated qualifier. But it is only a personal opinion. Feel free to do a property proposal/open the discussion somewhere else more visible. Tpt (talk) 17:07, 4 August 2015 (UTC)

Allow editors to select properties of interest

I requested the creation of Integrated Postsecondary Education Data System ID (P1771) and now I'm thrilled to see that Primary Sources picked up almost 4000 entities! I'd like to be able to use the Primary Sources tool to choose a random page that needs P1771 reviewed. Can you allow users to set P values (a whitelist) in the Random Item tool? Runner1928 (talk) 18:46, 9 August 2015 (UTC)

This feature is not implemented currently. But the "Primary Sources list" tool available in the tools section of the sidebar is able to list all simple statements with a specific property and may fit your needs. Tpt (talk) 01:04, 10 August 2015 (UTC)
That's exactly what I was hoping for. Thank you! Runner1928 (talk) 02:52, 10 August 2015 (UTC)
Great! If the suggested statements for Integrated Postsecondary Education Data System ID (P1771) are good enough, I could use TptBot to add them all to Wikidata. Tpt (talk) 17:04, 10 August 2015 (UTC)
I haven't found a suggestion error yet in about 40 entities. I'd love TptBot's help. Runner1928 (talk) 17:44, 10 August 2015 (UTC)
✓ Done. See User:TptBot/Existing_statements#Integrated_Postsecondary_Education_Data_System_identifier_.28P1771.29 for the few detected conflicts. Tpt (talk) 20:39, 11 August 2015 (UTC)
Thank you very, very much. It means a lot to me. I'll work through the conflicts soon. Best wishes, Runner1928 (talk) 16:01, 18 August 2015 (UTC)

Reject Claim

The function reject claim doesn't work as I would expect it to. When a suggested claim has multiple references, the top reference gets rejected when I click reject claim. Only when 1 reference is left it actually rejects the claim. I would expect that when I click reject claim, it would reject the claim and all references. (Otherwise I would reject specific references and not the claim). Mbch331 (talk) 07:56, 11 August 2015 (UTC)

I have already reported that on GitHub. --Mineo (talk) 12:17, 11 August 2015 (UTC)
Good to know it's been reported. Hope it can be solved soon. Mbch331 (talk) 12:45, 11 August 2015 (UTC)

Convert this page to Flow?

I proposed to convert this page to a Flow board. Does anyone oppose? --GZWDer (talk) 07:20, 13 August 2015 (UTC)

I'm opposed to it for the reasons other people gave on Wikidata:Contact_the_development_team#Convert_this_page_to_Flow.3F. - Nikki (talk) 09:01, 13 August 2015 (UTC)

Add a new statement with qualifier or add qualifiers to existing statement?

In general, it would be preferable to suggest qualifiers to existing statements rather than propose new statement with qualifiers. Sample: "start date" for spouse (P26) at Q175910.

There are few properties where several statements with the same value are possible/likely (maybe "award received" or "position held"). --- Jura 05:24, 21 August 2015 (UTC)

Thank you for the suggestion. I have opened a bug on GitHub to track it. Tpt (talk) 01:58, 22 August 2015 (UTC)

Edit summary

Currently, the edit summary of edits made using the tool simply states things like "Added claim: Added via Wikidata:Primary sources tool", leaving out which claim, qualifier etc. has been added, created or removed. I suggest changing this to follow the general practice here, so as to make it easier to check the tool's edits. --Daniel Mietchen (talk) 21:58, 21 August 2015 (UTC)

It should be done in a few days/weeks when the change that fixes phabricator:T97247 will be deployed. Tpt (talk) 01:54, 22 August 2015 (UTC)

statistics

Could we please have statistics on this tool?

  • the growth rate of its data
  • the speed in which data moves into Wikidata

How else can we deduce if this experiment is worth it or if we should reconsider its operation? Thanks, GerardM (talk) 06:30, 6 September 2015 (UTC)

Suggesting obsolete properties

On MC5 (Q830316), P738 (P738) is being suggested, but that's marked as an obsolete property. - Nikki (talk) 09:51, 21 September 2015 (UTC)

See Fyodor Dostoyevsky (Q991) for another example with many proposed statements for this property. - Nikki (talk) 19:01, 30 September 2015 (UTC)
+1. Omitting obsolete properties should be relatively easy and will help immensely. Good value for the development work. Runner1928 (talk) 19:12, 30 September 2015 (UTC)
✓ Done. I've hidden the statements with P738 (P738). "Should be relatively easy": no, it isn't, because there is no easy way for machines to know whether a property is deprecated or not. Thank you for the report. Tpt (talk) 16:59, 1 October 2015 (UTC)
For P738, shouldn't they be added as inverse? --- Jura 17:17, 1 October 2015 (UTC)
I think they already are because I mapped the Freebase inverse of the property mapped to P738 (P738) to influenced by (P737). Tpt (talk) 20:13, 1 October 2015 (UTC)
Obsolete properties hopefully state instance of obsolete Wikidata property, which is machine readable. - Jan Zerebecki 17:25, 1 October 2015 (UTC)
I only meant that the property label in English includes the string "(OBSOLETE)".Runner1928 (talk) 18:33, 1 October 2015 (UTC)
Thank you both for these details. But as the number of properties used in Primary Sources data is currently small (fewer than 150), I believe that writing a big change to Primary Sources just for that is maybe a little bit overkill. Tpt (talk) 20:13, 1 October 2015 (UTC)
Thanks :) - Nikki (talk) 14:21, 2 October 2015 (UTC)
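The two machine-readable signals mentioned in this thread, the "(OBSOLETE)" marker in the English label and an "instance of" statement pointing at an obsolete-property class, could be checked along these lines. This is a Python sketch under assumptions: the function name is hypothetical, the property is assumed to be available as Wikibase-style JSON with an `id` field in item values, and the class IDs must be supplied by the caller because the thread does not name them.

```python
def is_obsolete_property(prop, obsolete_classes=frozenset()):
    """Guess whether a property is obsolete from its label or its P31 values.

    `prop` is assumed to be Wikibase-style property JSON.
    `obsolete_classes` is a caller-supplied set of item IDs that mark
    obsolete properties (not specified in the discussion above).
    """
    label = prop.get("labels", {}).get("en", {}).get("value", "")
    if "(OBSOLETE)" in label.upper():
        return True
    # P31 is "instance of"; look for any value in the supplied class set.
    for claim in prop.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") in obsolete_classes:
            return True
    return False
```

With fewer than 150 properties in the data set, such a check could be run once over the property list rather than on every suggestion, which keeps it cheap even if it is not built into the tool itself.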

Blacklisting versus whitelisting?

I have added quite a few URLs to the blacklist. In my experience (most of my edits are related to art and architecture), the tool suggests (rough estimate) more than 80% really unreliable sources.

For some of the properties I use most often, it would probably be quite easy for me, and for other volunteers, to suggest a whitelist instead: of sources that are trustworthy for a specific property by default. In the topic area of the visual arts, that would be all the RKD databases and most major museum websites for instance.

I’d be interested to hear if this idea has been discussed before, and what everyone thinks of this. Spinster (talk) 12:02, 25 September 2015 (UTC)

Just to be clear: I don't think the tool should give source suggestions from the whitelists exclusively. But it would be good if it tried to retrieve suggestions from the whitelisted sources first, and then fall back on 'the rest of the web'. Spinster (talk) 12:04, 25 September 2015 (UTC)
+1. At least ordering sources by presence in a list of preferred sources. Maybe we could derive this whitelist by a frequency count of URL roots in Wikipedia references. Runner1928 (talk) 19:12, 30 September 2015 (UTC)
Thank you for the suggestion. I've opened a bug on GitHub for it. Tpt (talk) 17:04, 1 October 2015 (UTC)
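The idea in this thread, deriving preferred sources from a frequency count of URL roots and listing them first without excluding the rest, might be sketched as below. This is an illustrative Python sketch, not the tool's behaviour: the function names, the `min_count` threshold and the use of the host name as the "URL root" are all assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_root(url):
    """Use the lower-cased host name as the URL root (an assumption)."""
    return (urlsplit(url).hostname or "").lower()


def preferred_roots(reference_urls, min_count=2):
    """Return roots that occur at least `min_count` times in existing references."""
    counts = Counter(url_root(url) for url in reference_urls)
    return {root for root, n in counts.items() if root and n >= min_count}


def order_suggestions(urls, preferred):
    """Put suggestions from preferred roots first, keeping the rest afterwards.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(urls, key=lambda url: url_root(url) not in preferred)
```

Because nothing is filtered out, suggestions from 'the rest of the web' remain available after the preferred ones, as Spinster asked.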

Open Library identifiers

See for example The Adolescent (Q2011716) which has a number of suggested Open Library ID (P648) statements. None of the links work and none of the values match the regex given on the page for Open Library ID (P648). Not all proposed statements for this property are the wrong format, e.g. those on Fyodor Dostoyevsky (Q991) are the right format. It would be good if the ones which are the wrong format could be filtered out. It might even make sense to do that for all properties (although this is the only property I've noticed so far with a problem). - Nikki (talk) 19:09, 30 September 2015 (UTC)

Thank you for the report. I have filtered the Open Library ids in the wrong format. Tpt (talk) 17:18, 1 October 2015 (UTC)
Thanks :) - Nikki (talk) 14:21, 2 October 2015 (UTC)
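
The kind of format filter discussed above could look roughly like the following sketch. The pattern is an assumption about the Open Library ID format ("OL", a number without leading zeros, then A, M or W); the authoritative regex is the one given on the property page for Open Library ID (P648), and the helper name is hypothetical.

```python
import re

# Assumed format: "OL", a number without leading zeros, then A (author),
# M (edition) or W (work). Check the property page for the real pattern.
OPEN_LIBRARY_ID = re.compile(r"OL[1-9]\d*[AMW]")


def keep_well_formed(values, pattern=OPEN_LIBRARY_ID):
    """Return only the values that match the pattern in full."""
    return [v for v in values if pattern.fullmatch(v)]


if __name__ == "__main__":
    # Made-up candidate values, used only to show the call pattern.
    print(keep_well_formed(["OL22914A", "/authors/OL22914A", "OL0123W", "OL45883W"]))
```

Because the pattern is a parameter, the same check could be applied to any property that publishes a format regex, as suggested in the report.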

Redirecting MusicBrainz IDs

Would it be possible to filter out redirecting MusicBrainz IDs? Regardless of whether we want to store them or not (there is currently a discussion on the mailing list about it), I don't think it makes sense to show them in the primary sources tool. As long as we have the non-redirecting IDs, all the redirects to those IDs can be imported far more efficiently by a bot any time we want, if that's what we want (preferably not until the interface has been fixed to be able to handle hundreds of IDs for a single property without completely overwhelming the page, see Polly (Q1575153) for an example with 485 suggested IDs). It's more productive for humans to check whether a non-redirecting ID is a new match we don't have yet. - Nikki (talk) 14:20, 2 October 2015 (UTC)

Sure, it is definitely possible to do so but it requires some development work. I'll try to do it. Tpt (talk) 13:06, 12 October 2015 (UTC)

Skyscrapercenter ID

I have noticed that the tool often suggests a CTBUH Skyscraper Center building ID (P1305) value. It seems to be reliable, but adding them that way sounds likely to take years. If the values are sufficiently reliable, maybe a bot could upload them all for real? Or maybe there would be a way to move them to user:Magnus Manske's mixandmatch tool as "auto-matched" values (OK, mixandmatch doesn't support this property yet, just wondering about the idea)? --Zolo (talk) 07:43, 8 October 2015 (UTC)

Sure, I can import them using TptBot. If you think the ids are good enough, I can do it in the next few days. Tpt (talk) 13:07, 12 October 2015 (UTC)
Thanks, yes they appear good enough. If you do it, I'll try to do a few post-upload sanity checks, like all items are instances of architectural structure. --Zolo (talk) 07:59, 15 October 2015 (UTC)
I see it's done, thanks. If I find wrong values like in The Southern Star (Q1155904), would it be useful to deprecate them or something, or can I simply remove them? --Zolo (talk) 11:57, 16 October 2015 (UTC)
If the value is wrong I don't see why we should keep it in Wikidata. The Primary Sources database keeps the list of added statements so I don't see any value of keeping these statements in Wikidata too. Tpt (talk) 19:09, 17 October 2015 (UTC)

Migration of enwiki Persondata

I am currently requesting permission to run a bot which deletes the Persondata information from enwiki. Part of the proposal is to provide a tool to import the data manually to Wikidata. We are speaking about 1.9 million articles with at most 4 possibly importable statements each (plus description and aliases, which are not part of this request). I expect at least a million statements. Providing a stable infrastructure for this import is certainly possible, but I am wondering whether we could use already existing software instead of developing something new. I would provide the data set in the required format. The question is: Can the enwiki Persondata be handled by the primary sources tool? Warm regards, -- T.seppelt (talk) 12:41, 6 November 2015 (UTC)

I'd rather see it uploaded directly, including aliases. --- Jura 13:12, 6 November 2015 (UTC)
There are doubts about the data quality, so a user-assisted import was considered the best solution. --T.seppelt (talk) 15:07, 6 November 2015 (UTC)
Not really. Of course, you'd have to cross-check the data before uploading it. What is a user going to do with it? We would just have millions of statements lingering in the tool .. --- Jura 15:13, 6 November 2015 (UTC)
I think it'd be a good use of the primary sources tool. It is exactly made for a human curation step. --LydiaPintscher (talk) 15:23, 6 November 2015 (UTC)
Can you outline the steps that would be done by such "human curation" and the timeline for fixing the tool to resolve the stability issues it has? --- Jura 15:26, 6 November 2015 (UTC)
Steps: statement is offered, you look up a reference for it, you approve the statement, you add a reference. Stability: it works rather well for me personally. Not sure what Tpt and co want to do next. --LydiaPintscher (talk) 16:59, 6 November 2015 (UTC)
I doubt users actually use it that way. --- Jura 17:08, 6 November 2015 (UTC)
I might be looking at the wrong account, but you seem to be adding claims with the tool without any reference whatsoever as well. --- Jura 17:15, 6 November 2015 (UTC)
I do look for references. But I don't add them when the sources I can find are not particularly good (but I am confident enough in the information anyway) or when it is identifiers I am adding. --LydiaPintscher (talk) 19:00, 6 November 2015 (UTC)

Let's focus again on the Persondata. It would be exactly the same as with Freebase. I provide statements related to persons for place of death (P20), place of birth (P19), date of birth (P569) and date of death (P570). They can be referenced with stated in (P248) English Wikipedia (Q328) (which is nonsense from my point of view). Also, I think that there are a couple of people in the enwiki community who would like to help with this user-assisted migration. I think they wouldn't concentrate only on the Persondata data set but also on the other data sets which are provided by the primary sources tool right now. It is a good opportunity to acquire more help to go through these millions of statements. -- T.seppelt (talk) 10:30, 7 November 2015 (UTC)

Well, we still need to decide what we expect users to do. Apparently, Lydia would only do part of the ideal path she outlined. Personally, I'm not sure if it's a good use of user time. Last time we discussed it, one of the opinions was that we already imported most of it. I know there is some left, as I imported some of it myself, but that was only a fraction of items that actually had persondata. How did you get your numbers? What do you think users should do once they have such a statement suggested by the tool? --- Jura 10:39, 7 November 2015 (UTC)
  • T.seppelt: It would really help to see numbers for each field. How many we already got, how much is missing. You could add maintenance categories to the persondata template to determine that. --- Jura 11:05, 9 November 2015 (UTC)
@Jura1: I started a complete analysis. The results are available live under [4]. It will take a few days till it's done. You can scan the file for claim_new_* to get entries which would be interesting for the primary sources tool. Regards, -- T.seppelt (talk) 11:38, 9 November 2015 (UTC)
T.seppelt: if we already have a claim, maybe it would be better to note that in priority, instead of "unparsable" (sample: Q1325#P19 and Q1325#P19). For items with no claims, maybe some of the unparsable ones can be assessed and converted manually. --- Jura 11:46, 9 November 2015 (UTC)
For conflicts, maybe it's worth logging the current WD information as well. This could facilitate checking these. --- Jura 11:52, 9 November 2015 (UTC)
@Jura1: It now has two types, unparsable_new and unparsable_check (for already existing, possibly conflicting values). The current value in Wikidata is also logged (second \t-separated value). Regards, -- T.seppelt (talk) 12:18, 9 November 2015 (UTC)
T.seppelt: Thanks. Looks good. If it's not already done, maybe text would need possible tabs and c/r converted into spaces. --- Jura 12:26, 9 November 2015 (UTC)
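
As a rough sketch of the clean-up suggested here: before a field is written to a tab-separated log, any tabs, carriage returns and newlines inside it are collapsed into single spaces so that they cannot break the column layout. The helper names and the three-field layout (entry type, current Wikidata value, raw Persondata text) are assumptions for illustration; only the entry type names and the position of the current value come from the discussion above.

```python
import re


def clean_field(text):
    """Collapse runs of tabs, carriage returns and newlines into one space."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def log_line(entry_type, current_value, raw_text):
    """Build one tab-separated log line from cleaned fields (assumed layout)."""
    return "\t".join(clean_field(part) for part in (entry_type, current_value, raw_text))


if __name__ == "__main__":
    # Hypothetical entry, used only to show the call pattern.
    print(repr(log_line("unparsable_check", "Q64", "Berlin,\r\n\tGermany")))
```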

The log can probably provide us with the following:

Persondata field | Wikidata | New ready | New unparsable | Conflict | Conflict unparsable | Same
DATE OF BIRTH | date of birth (P569) | 51269 | 4695 | 88790 | 5093 | ?
PLACE OF BIRTH | place of birth (P19) | 310575 | 32086 | 44907 | 27230 | ?
DATE OF DEATH | date of death (P570) | 26379 | 2335 | 67835 | 2724 | ?
PLACE OF DEATH | place of death (P20) | 90996 | 10654 | 14996 | 10737 | ?
ALTERNATIVE NAMES | Aen | 101976 | n/a | n/a
SHORT DESCRIPTION | Den | 21417 | 135961
NAME | Len (in future Aen) | 54 | 244569

T.seppelt: good work! --- Jura 12:48, 9 November 2015 (UTC)

Here is a first (intermediary) status.

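Property | Conflict | Conflict unparsable | New | New unparsable | Total | Row total
P570 | 3298 | 289 | 89 | 98 | | 3774
P569 | 3257 | 481 | 178 | 64 | | 3980
P20 | 2663 | 1356 | 1480 | 268 | | 5767
P19 | 4401 | 2177 | 2562 | 390 | | 9530
Len | 8958 [1] | | | | | 8958
Den | 10569 | | 17 | | | 10586
Aen | | | | | 6597 | 6597
Column total | 33146 | 4303 | 4326 | 820 | 6597 | 49192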
  1. Can be added as alias

T.seppelt: interesting start! --- Jura 18:54, 9 November 2015 (UTC)

@Jura1: ✓ Done so far. The first run is complete -- T.seppelt (talk) 16:39, 15 November 2015 (UTC)

Sorry for the long delay before answering. It's OK on the technical level to add persondata content to Primary Sources. I will be happy to help you with the import of data into Primary Sources if there is no strong concern against it from the community. The Primary Sources backend uses quick statements syntax as input, so if you could provide me a file with all statements in this format I'll be able to add them to the tool. Tpt (talk) 19:33, 15 November 2015 (UTC)
@Tpt: I uploaded the dataset I have so far to [5]. The date values don't have precisions. If this is very important I can provide the data later with precisions after a second run. I have to do at least one more run anyway, so it wouldn't be a problem. To demonstrate the functionality to the enwiki community, it would also be nice to have part of the set uploaded as a teaser. Thank you very much, -- T.seppelt (talk) 20:02, 15 November 2015 (UTC)
@T.seppelt: Great, thank you. Yes, I need precisions for the time datatype. When it's done I'll be able to upload a teaser into Primary Sources. Have you removed data already in Wikidata from this dataset? If not, I have a script to do it so it's not a big deal. Tpt (talk) 20:53, 15 November 2015 (UTC)
@Tpt: I only uploaded possibly new statements. It should be good like this. -- T.seppelt (talk) 20:37, 16 November 2015 (UTC)
Thank you! But in order to upload the teaser I need to have precision for times and only dates after year 1000 are supported (it's a limitation of the backend and it's not fixed yet). Tpt (talk) 07:02, 17 November 2015 (UTC)
Then please take the places first. I am working on the dates with precisions. -- T.seppelt (talk) 09:40, 17 November 2015 (UTC)
✓ Done. They are in the "enwiki-persondata" dataset. Tpt (talk) 15:46, 20 November 2015 (UTC)
@Tpt: thank you very much! I launched my own tool for the migration now. It can handle aliases, descriptions, places and dates. It would be great if you could test it. -- T.seppelt (talk) 15:47, 21 November 2015 (UTC)
Nice! I'll test it as soon as the OAuth system is working. If you launch and advertise this tool, it's maybe a good idea to remove the persondata content from Primary Sources. What do you think about it? Tpt (talk) 16:49, 21 November 2015 (UTC)
@Tpt: the application is now approved and should work. Yes, I think you can remove the data set -- T.seppelt (talk) 19:47, 22 November 2015 (UTC)
@T.seppelt: Great! The application is very nice. It would be nice to have a button that allows users to skip the suggested addition when they are not sure what should be done, just like what is offered in Wikidata Games. I have removed the persondata dataset from Primary Sources. Tpt (talk) 14:00, 24 November 2015 (UTC)
@Tpt: there is a Skip it button at the bottom. Maybe I should make it bigger. Thank you for your effort anyway. -- T.seppelt (talk) 16:10, 24 November 2015 (UTC)
Looks good. Finally it seems to give mainly P19. Can primary sources handle Julian dates? --- Jura 20:32, 15 November 2015 (UTC)
No, it can't. All dates are imported into Wikidata in Gregorian calendar. Tpt (talk) 20:53, 15 November 2015 (UTC)
Without additional parsing? I think the calendar format was one concern of the enwiki community. -- T.seppelt (talk) 20:37, 16 November 2015 (UTC)
Yes, without. So, I think that, at first, importing only "safe dates" (i.e. after 1920) into Primary Sources is maybe a good step. Tpt (talk) 07:02, 17 November 2015 (UTC)
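
To illustrate the date requirements mentioned in this thread (a precision on every time value, and only "safe" Gregorian dates after 1920 at first), here is a minimal sketch that formats date statements as tab-separated lines with a time value of the form +YYYY-MM-DDT00:00:00Z/precision. The precision codes 9, 10 and 11 are Wikidata's codes for year, month and day. The function names, the use of 01 for unknown month or day, and the exact line layout are assumptions; the syntax actually accepted by the Primary Sources backend may differ.

```python
PRECISION = {"year": 9, "month": 10, "day": 11}  # Wikidata time precision codes


def time_value(year, month=None, day=None):
    """Format a Gregorian date with the precision implied by the known parts."""
    if month is None:
        day = None  # a day without a month cannot be used
    if day is not None:
        precision = PRECISION["day"]
    elif month is not None:
        precision = PRECISION["month"]
    else:
        precision = PRECISION["year"]
    # Unknown month/day are written as 01 here; this is a choice of the sketch.
    return "+%04d-%02d-%02dT00:00:00Z/%d" % (year, month or 1, day or 1, precision)


def statement_line(item, prop, year, month=None, day=None, min_year=1920):
    """Return a tab-separated statement, or None unless the year is after min_year."""
    if year <= min_year:
        return None
    return "\t".join((item, prop, time_value(year, month, day)))


if __name__ == "__main__":
    print(statement_line("Q42", "P569", 1952, 3, 11))
    print(statement_line("Q42", "P569", 1900))  # filtered out as not "safe"
```

Dates at or before the cut-off are dropped rather than converted, since the tool imports everything as Gregorian without additional parsing.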

"Donation"

This term is inappropriate; better is "release". Donation implies a transfer of property, which is the opposite of what Wikimedia is interested in i.e. public licenses (or increase of the public domain by means of CC-0, as in Wikidata's case). --Nemo 14:46, 10 September 2015 (UTC)

give us filtering capabilities

I, for one, am much more efficient deciding on inclusion of data in a domain I am familiar with. I can't imagine just clicking through a flow of random items. And I don't want to visit every page in some domain to check for available data. And if I'm thinking about manually adding a lot of stuff to a defined set of objects that may be partially already available from primary sources, it would be a waste of effort since just approving statements is much quicker.

So, I need to be able to somehow specify criteria for what source data I want to process. And just choosing one or the other source is not enough here.

Do you see any chance for some WDQ filtering feature in the tool or for integration with Autolist (à la Deep InSight) or a „Non-Random Primary Sources item“ sidebar link or something?.. Would be really useful.--Frysch (talk) 18:06, 29 November 2015 (UTC)

Education institutions properties

I'm a member of Wikidata:WikiProject Education and I've been working on a taxonomy of properties for educational institutions. Two properties in particular are in the Primary Sources suggestions but seem less useful to me than other options:

cc: @Kareyac:, @ArthurPSmith: Runner1928 (talk) 18:08, 1 December 2015 (UTC)

Bug report - Float selection of dataset

Firstly, a bug: The list of datasets has grown too large for the JavaScript popup (displayed by clicking on the gear in the navigation frame on the left). It needs a dropdown or search box of datasets, or simply box wrapping to the next line. --Izno (talk) 15:11, 11 December 2015 (UTC)

Not working?

Nothing is loading up for me. Wondering if there's a server problem? —Tom Morris (talk) 11:49, 18 December 2015 (UTC)

Thank you for the report. I've restarted the server and it should work now. Tpt (talk) 13:23, 18 December 2015 (UTC)