Topic on User talk:Magnus Manske


Mix'n'match improvements - 2020 ideas

Epìdosis (talkcontribs)

Hi Magnus! In 2020, also due to the lockdowns, I've spent a lot of time on Wikidata, and on Mix'n'match in particular (also with the great collaboration of @Bargioni), which has proven an extraordinary tool for synchronising different catalogs with Wikidata. I've had many occasions to disturb you (fortunately far fewer since you gave me the opportunity to perform many actions myself through the "catalog editor") and I've always very much appreciated your kind answers and solutions.

This evening I've tried to go through (and close, where possible) some of the old threads open on this page, in order to keep track of which problems still need solving in the coming months. I've found many interesting issues, more or less important or urgent, related to the functioning of Mix'n'match, which has already seen a lot of great improvements in recent years. I've tried to collect them in a single thread, so that we can have a better overview of the suggestions already made by different users.

Ideas collected from 2020 threads (I have surely missed something):

  1. it often happens that many IDs get automatched to one (generic) ID [from Topic:Vbnw3kqpnuj7lu5w] > there should be a way to un-automatch all IDs matched to a wrong item ✓ Done
  2. it often happens that all multiple automatches for an entry are wrong [from Topic:Vfxcovdfimsoezgo] > there should be a way to remove all multiple automatches from an entry, as there is a way for removing single automatches ✓ Done
  3. some parameters of the scrape may become outdated [from Topic:Vhwlvtqzghmg31og] > there should be a way to visualize the parameters of a catalog's scrape and change them
  4. references added when an item is created from MnM are plain URLs, all added in the same reference to the statement [from Topic:Vit29jh9tlbjpzt3] > different catalogs should constitute different references; references should not be plain URLs, but references with stated in (P248) X and Property ID ID ✓ Done
  5. at the moment years of birth/death (date of birth (P569)/date of death (P570)) and any other available data (e.g. VIAF ID (P214) or ISNI (P213)) are extracted at a later stage from auxdata [from Topic:Vjgor5rtcz2kgqmh] > when a catalog is imported through the import tool, it would be very good to have a dedicated column for entering at least years of birth/death up front, and if possible also VIAF or ISNI
  6. for very big catalogs, the sync page doesn't load [from Topic:Vqzxlmdonip2vscc and Topic:Vu2hluqaescv78qy] > make it possible in some way to load the sync page for very big catalogs
  7. some catalogs, while having some merits, should not be used for sourcing key information such as birth/death dates [from Topic:Vqzxlmdonip2vscc] > there should be a way to indicate for single catalogs that they should not be used by bots to add references to statements
  8. some catalogs' entries have auxdata which are useful for the matching process but which should not be imported into new Wikidata items if they are created [from Topic:Vl8reqahkhkl4jq8] > there should be a way to indicate that the auxdata of a catalog should not be added to new Wikidata items if they are created
  9. there is no easy way to merge catalogs [from Topic:Vqzxlmdonip2vscc and Topic:Vucpec2813ab9h5f] > there should possibly be an easy way to merge catalogs
  10. Mix'n'match (and QuickStatements) sessions seem to expire quickly [from Topic:Vnojx2ve72jzm5wl] > check if it is possible to make these sessions longer, in order to avoid multiple logins
  11. the functions "names in other catalogs" and "creation candidates" could be made more visible [from Topic:Vx91gnzs6zdvyf0v] ✓ Done
  12. detailed remarks from Tpalonen
  13. the Mix'n'match gadget adds IDs not considering if they are already present in the item, which sometimes causes duplication [from talk page; it has also been reported that it still links to "tools.wmflabs.org/mix-n-match" instead of "mix-n-match.toolforge.org"] > Mix'n'match gadget should avoid adding IDs which are already present in the item
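
For point 13, the duplicate check could be as simple as the following. This is only a minimal sketch in Python, not the gadget's actual code: it assumes the item is available in the standard Wikidata entity JSON layout (claims, then property, then mainsnak, then datavalue), and the function names `existing_ids` and `should_add_id` are hypothetical.

```python
def existing_ids(entity: dict, prop: str) -> set[str]:
    """Collect the string values already stored on `entity` for property `prop`."""
    values = set()
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            values.add(snak["datavalue"]["value"])
    return values


def should_add_id(entity: dict, prop: str, value: str) -> bool:
    """Return True only if `value` is not yet present for `prop` on the item."""
    return value.strip() not in existing_ids(entity, prop)
```

The gadget would then call something like `should_add_id(item, "P214", "12345")` before adding the statement, and skip the edit when it returns False.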

Other ideas from me and Bargioni (I will add others in the next days, if we find new ones):

  1. when a catalog has a default type (e.g. human (Q5)), all automatchers should consider only items being instance of (P31) default type (this would avoid a lot of wrong automatches) ✓ Done
  2. when entries get matched to an item which is afterwards redirected, at the moment the catalog and Wikidata get out of sync, while there should be a button that allows adjusting all such matches by substituting the redirected item with the new item
  3. it would be convenient, in the pages of a catalog (manually matched entries, automatched entries, unmatched entries), to have the possibility to show: only entries with auxdata; only entries without auxdata; all entries
  4. it would be convenient, when searching Mix'n'match, to have the possibility to show one or more of the following categories of entries: manually matched entries; automatched entries; multiply-automatched entries; unmatched entries
  5. the automatic description of the items automatched to entries often doesn't load, being substituted by "Could not load description for X"
  6. the internal search sometimes is very slow (also when searching only in a single catalog), although nearly always finally succeeding in showing results
  7. the internal search at the moment searches only in the names of the entries; it would be useful having a second search-box searching only in the auxiliary data of the entries, in order to exploit both at the same time
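
For point 2 of this second list, the core of such a button could look roughly like this. It is a sketch under assumptions, not a description of how Mix'n'match works internally: the function name `resolve_redirects` is hypothetical, matches are assumed to be a plain mapping from entry ID to item ID, and how the redirect map is obtained from Wikidata is left open.

```python
def resolve_redirects(matches: dict[str, str], redirects: dict[str, str]) -> dict[str, str]:
    """Replace every matched item that is a redirect with its final target.

    `matches` maps a catalog entry ID to the item it is matched to;
    `redirects` maps a redirected item ID to the item it points to.
    """
    resolved = {}
    for entry_id, qid in matches.items():
        seen = set()
        # Follow chains of redirects, stopping if a loop is detected.
        while qid in redirects and qid not in seen:
            seen.add(qid)
            qid = redirects[qid]
        resolved[entry_id] = qid
    return resolved
```

Entries whose item is not a redirect are returned unchanged, so the result can simply be compared with the original matches to find what needs updating.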

I thank you again for the great work you do in maintaining Mix'n'match, QuickStatements and all the tools which make our work on Wikidata much easier and more enjoyable, and I'm looking forward to the next months, with a lot of new MnM catalogs to crossmatch. Good night!

Matlin (talkcontribs)

+ Creation candidates: a way to skip a candidate with a button at the top of the proposed items. Sometimes there are candidates with lots of items (for example: David Jones) which I don't want to check. It would be less laggy and more comfortable to skip a candidate without scrolling down the entire page. ✓ Done
