User talk:Multichill/Archives/2014/September


Most used templates on Commons?

Do we have (or can it easily be made) a list of the most used templates on Commons? If broken down by namespace - ie separate stats for use on categories, articles, filepages - even better.

Maintenance templates could be excluded; classes like Creator templates and Institution templates could be grouped together.

It would be a great start towards thinking which ones could be fed from Wikidata. Jheald (talk) 17:22, 17 August 2014 (UTC)

Hi Jheald, I guess you already found Commons:Special:MostTranscludedPages? It's pretty straightforward to do queries on Toollabs to figure it out per namespace. Actually, it's about two namespaces, the namespace of the page that transcludes (for example file) and the namespace of the page (for example creator) that is included.
What question are you trying to answer? Creator and institution templates should just all be matched and imported; I don't think statistics would change anything about that. Multichill (talk) 18:38, 21 August 2014 (UTC)
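A minimal sketch of the per-namespace breakdown described above, assuming the transclusion rows have already been fetched (for example from a query on Toollabs) as tuples of (namespace of the transcluding page, namespace of the included page, title of the included page). The function name, the sample rows, and the namespace numbers used for illustration (6 for File, 14 for Category, 10 for Template, 100 for Creator) are assumptions, not something confirmed in this discussion.

```python
from collections import Counter


def count_by_namespace_pair(rows):
    """Count transclusions per (transcluding namespace, included namespace) pair.

    rows: iterable of (transcluding_ns, included_ns, included_title) tuples.
    """
    counts = Counter()
    for transcluding_ns, included_ns, _title in rows:
        counts[(transcluding_ns, included_ns)] += 1
    return counts


# Hypothetical sample rows; real data would come from the database.
sample_rows = [
    (6, 10, "Example template"),     # file page using a template
    (6, 10, "Example template"),     # another file page using the same template
    (6, 100, "Example creator"),     # file page using a creator page
    (14, 10, "Example template"),    # category page using a template
]

pair_counts = count_by_namespace_pair(sample_rows)
for (transcluding_ns, included_ns), n in sorted(pair_counts.items()):
    print(transcluding_ns, included_ns, n)
```

Keying on the pair rather than on a single namespace keeps both namespaces visible, which is the distinction the reply draws between the page that transcludes and the page that is included.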
I didn't know about that listing, that's very useful. What I guess I was trying to think through was: what are the most important templates that contain data that ought to be uploaded to Wikidata; and which could and should be amended to allow them to read data, internationalisations, etc, from Wikidata? Plus also I suppose, which templates are heavily used e.g. on File pages, but will probably not ever be replaced with templates that draw from Wikidata? Also, what content is on category pages, that would be likely to go on relying on data specified locally, rather than pulled from Wikidata.
-- eg Template:Infobox_aircraft_image (transclusions), most of the fields of which seem typically to relate to pretty unique per image things (and therefore not good candidates for Wikidata)
but there are also other heavily used templates that might be candidates for drawing data from a WD reference; plus also gallery- and category- header type templates. So I was just trying to get a sense of what there is out there that is important. Jheald (talk) 19:47, 21 August 2014 (UTC)
Jheald getting back to this. Things on the to do list:
  • Licenses, examples. That tree needs to be built here. It would probably be wise to import the translations. This is tough. Be careful.
  • Creator templates: I see you noticed I worked on this list. A bit more matching, and then probably import them here.
  • Institution templates: need to match the templates in this category. I expect it's a bit harder than the people.
  • Commons:Category:Single artwork templates is easy. Just a couple to link manually.
  • We have templates on Commons to mark individual monuments, see this category. Wikidata:WikiProject Cultural heritage is working on importing individual monuments to Wikidata. That solves this problem. For example, I imported all 60,000+ Dutch Rijksmonumenten.
I doubt importing information from artwork templates will work that well. The data is too unstructured. It's better to get the data from the source. Multichill (talk) 10:01, 7 September 2014 (UTC)

Sorting out category-category and article-gallery link integrity for Commons

Dear Multichill,

As per our earlier chat, it would be really incredibly useful to know more about the true state of links between Wikidata and Commons.

For example, currently it seems we have:

There's a strong chance that when the new "In other projects" sidebar goes live as a beta feature on Wikipedia, Wikisource and Wikiquote in seven days' time, it will include links to both Commons categories and Commons galleries, for both Wikipedia articles and Wikipedia categories. This should be fairly feasible using Wikidata properties Commons category (P373) and Commons gallery (P935). It would certainly be popular on Commons, the way this RfC is going.

So it would be really good if we could get these properties plumbed in over the next few days.

Lydia has also given the go-ahead for Phase 2 on Commons as soon as Commons wants it. We will therefore be able to have link templates at the top of Commons categories, automatically pointing to the full multilingual set of available Wiki articles.

With these two developments, it would be a really good time to find and sort out whatever mismatches there are in category-category and article-gallery integrity.

Would you be up for helping out with this?

Wikidata:WikiProject_Structured_Data_for_Commons/Phase_1_progress is available as a possible good central co-ordinating page.

Thanks (and sorry if we didn't get off to the best of starts),

All best, Jheald (talk) 08:42, 19 August 2014 (UTC)

@Multichill: Any thoughts on this ? Jheald (talk) 21:48, 24 August 2014 (UTC)
Worked out how to do the queries. (Thanks for the example of your cross-namespace query!)
See:
Cheers, Jheald (talk) 23:25, 5 September 2014 (UTC)

New P910/P301 pairs from matching P373s: your advice sought

Hi! I've now had a first go at trying to identify new topic's main category (P910) / category's main topic (P301) pairs by matching on Commons category (P373)

A sample, from the letter D, can currently be found at User:Jheald/sandbox

These are for categories and non-categories that have a unique P373, where the non-cat currently has no topic's main category (P910) set.

Statistics:

  • Category-like items with P373: 252,181, of which 15,398 have shared P373s, and 236,783 are unique
  • Noncategory-like items with P373: 697,434, of which 60,954 have shared P373s, and 636,480 are unique
  • 75,857 matches between noncats and cats with unique P373, of which 40,289 of the noncats already have P910 set, and 35,568 do not.

It's thus found 35,568 potential match candidates -- which is fewer than I'd hoped for, given 250,000 categories to start with, but better than nothing.
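The matching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: items are given as (item id, P373 value) pairs, the set of non-category items that already have P910 is known, and all identifiers and names in the sample data are made up rather than real Wikidata items.

```python
from collections import Counter


def unique_by_p373(items):
    """Return {p373_value: item_id} for P373 values used by exactly one item."""
    counts = Counter(p373 for _item_id, p373 in items)
    return {p373: item_id for item_id, p373 in items if counts[p373] == 1}


def candidate_pairs(cat_items, noncat_items, has_p910):
    """Pair non-category and category items that share a unique P373 value.

    Non-category items that already have P910 set are skipped.
    Returns a list of (noncat_id, cat_id) tuples.
    """
    cats = unique_by_p373(cat_items)
    noncats = unique_by_p373(noncat_items)
    pairs = []
    for p373, noncat_id in sorted(noncats.items()):
        cat_id = cats.get(p373)
        if cat_id is not None and noncat_id not in has_p910:
            pairs.append((noncat_id, cat_id))
    return pairs


# Hypothetical sample data (placeholder ids, not real items).
cat_items = [
    ("cat-1", "Dams in Exampleland"),
    ("cat-2", "Shared name"),
    ("cat-3", "Shared name"),       # shared P373, so excluded
    ("cat-4", "Example painter"),
]
noncat_items = [
    ("item-1", "Dams in Exampleland"),
    ("item-2", "Shared name"),
    ("item-3", "Example painter"),
]
has_p910 = {"item-3"}               # already linked, so skipped

pairs = candidate_pairs(cat_items, noncat_items, has_p910)
print(pairs)
```

A match on the string alone says nothing about whether the category and the item have the same scope, so the output is a list of candidates to review rather than statements to add blindly.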

So the question I have as a newbie is then, what to do next. Are (enough of) the matches good enough?

Looking at the sample for the start of the letter D, they seem mostly not bad; but there are some cases where the category seems more precise than the article, or vice-versa, eg Category:Daimler vehicles (Q8359516) and Daimler Company (Q27539), or Category:Compositions in D major (Q8407165) and D major (Q1124006).

I also noticed a few cases where the noncat item is a "list of" article, rather than a concept -- eg Category:Dams in Cyprus (Q8360464) and list of dams and reservoirs in Cyprus (Q1444295). I am not sure whether this is a problem or not.

However on the whole, the matches seem to be pretty good.

So do I now just apply for a bot account, and put them all in? Or is there some more careful thought needed first? Jheald (talk) 12:02, 7 September 2014 (UTC)

I did quite a bit of matching and importing last year, I still seem to have a leftover list at User:Multichill/Kladblok. Not sure if I still have the code somewhere. A good rule of thumb is that if there is a category about a certain topic, that topic is in that category and most of the time the sortkey is " " (a space).
It's sometimes a bit of a puzzle. Best to solve the easy cases with a bot to get them out the way so the humans can focus their time and effort on the difficult edge cases. Multichill (talk) 20:24, 7 September 2014 (UTC)
Jheald: Take a look at User:Multichill/Kladblok. I just updated that page. Multichill (talk) 11:15, 13 September 2014 (UTC)
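The rule of thumb above (the topic page sits in its own category, usually with a single space as sortkey) could be turned into a first-pass filter along these lines. This is a sketch only: the input shape, the function name and the sample data are assumptions, and it takes sortkeys as plain strings, which may not match how they are actually stored.

```python
def main_topic_candidates(category_members):
    """Pick the easy cases: categories with exactly one member sorted under " ".

    category_members: {category_title: [(page_title, sortkey), ...]}
    Returns {category_title: page_title} for unambiguous cases only.
    """
    result = {}
    for category, members in category_members.items():
        hits = [title for title, sortkey in members if sortkey == " "]
        if len(hits) == 1:          # zero or several hits are left for humans
            result[category] = hits[0]
    return result


# Hypothetical sample data.
members = {
    "Category:Example A": [("Example A", " "), ("Other page", "Other page")],
    "Category:Example B": [("Page X", " "), ("Page Y", " ")],   # ambiguous
    "Category:Example C": [("Page Z", "Page Z")],               # no space sortkey
}

easy_cases = main_topic_candidates(members)
print(easy_cases)
```

Keeping only the unambiguous cases matches the suggestion to let a bot clear the easy ones and leave the edge cases for people.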

A thought about WD Phase 3, and Commons Creator templates

As you've explained, the big blocker for Wikidata Phase 3 is tracking the dependency tree of things that depend on a Wikidata item and need to have their cached versions marked as dirty if the Wikidata item is edited.

But how much of a problem would this really be for, say, Creator templates ?

If one had a generic template {{Creator|q-number}}, and one had a script that simply made a null-edit on the template once a week, would that not still be far better than the current system? (And presumably not that unmanageable in terms of pages unnecessarily being marked as dirty -- how many file pages with a creator template get hit a week, and is this beyond the capacity of the servers to cope with?). The page could be locked against anyone else editing it in the meantime.

If somebody wanted to verify that changes made to a painter's Wikidata entry had accurately propagated through, they could just make a null edit on any file-page, to see the result immediately, or otherwise know it would automatically be updated Commons-wide within a week.

Commons wasn't perhaps the first production-wiki that the team may have been thinking of turning Phase 3 on for; but would you agree that, actually, it might not be so crazy? Jheald (talk) 12:32, 7 September 2014 (UTC)

One other thing: for the present system of Creator templates, do you know whether there are any tools to automatically generate a filled-in Creator template from a Wikidata item? Or even perhaps a template script, analogous to an ingestion template, that could be subst'd here on Wikidata, to produce the appropriate wikitext? (The latter could be very close to what would be used on Commons, once Phase 3 is live). But thanks for the thought. Jheald (talk) 12:35, 7 September 2014 (UTC)
What you're suggesting is a bad implementation just to save a couple of months. Bugzilla:47930 should be implemented properly. By the time we have that, we'll also have added a Q id to each creator template and copied the data over to Wikidata. Then we can slowly start migrating (removing) data from the creator templates to use Wikidata. At some point in the future we just end up with only {{Creator|q-number}}. By that time structured metadata will become available on Commons and we can just add a creator claim and the creator templates can disappear completely. Rome wasn't built in a day. Multichill (talk) 20:37, 7 September 2014 (UTC)
I'm sure Bugzilla:47930 will be implemented properly. But do we need to wait for it to be implemented? A {{Creator|q-number}} template with Bugzilla 47930 would be exactly the same as a {{Creator|q-number}} template implemented without it -- and indeed would probably be almost exactly the same as a {{Creator|q-number}} template implemented as a prototype on Wikidata, that could be subst'd to generate a new Commons {{Creator:Artistname}}
It's true, I may be acting like a newbie in a hurry, when in reality there is more than enough other work needing doing to keep everybody busy until Bugzilla 47930 has been implemented. But at the same time, as indicated above, in this particular case I'm not sure I actually see the harm in proceeding with Phase 3 for a few tightly controlled templates, even without Bugzilla 47930. Jheald (talk) 21:24, 7 September 2014 (UTC)