User:Jmabel/Documentation thoughts

From Wikidata

During the recent Wikimedia Conference, I talked with User:John Cummings and (separately) User:Jens Ohlig about ways we could improve the help for Wikidata. I gather from the exchange I had at Wikidata talk:Introduction#Editing this page that even where I have simple, concrete changes I'd like to make, it would probably be a poor idea to "be bold" and jump in and edit help pages, etc., so I'm jotting down my ideas here.

User journeys

I think John Cummings is basically right that we should think of documentation in terms of user journeys, as long as we maintain some flexibility about that. The "journey" model allows for having multiple starting points and multiple destinations, and for some people being interested in "traveling" farther than others.

Most, but not all, of what I have to say relates to the creation of help content and a "help desk."

Making existing external data sets available to Wikidata

John already seems to be looking very seriously at what is needed by people who have large, existing data sets that they want to bring into Wikidata. Given that he is clearly working on that, I'm not going to focus on what documentation is needed there, but I do want to raise a few technical considerations related to the scope of imports; these may or may not already be part of what John has been thinking about.

  1. I think it is important that Wikidata host metadata about the existence of online datasets, even where it is not plausible for us to import their content. In particular, where copyrights prevent us from importing data, we can still model the fact that the data exists and is available. In many cases, the schemas will not be copyrighted, even if the content is, and we should be able to model schemas in ways that will let generic tools access the data, even if we can't host it.
  2. I also think that it is important that when we do massive imports of data we be able to tag that data (probably with qualifiers on values) as having come from a particular import. This is important both in terms of our usual concerns about sourcing and in terms of the organizations which may be the sources of the data: if values from their data are later overridden by values from somewhere else, they will want it clear that they are not responsible for the latter.
  3. I suspect that this means that each external dataset (whether imported or not) and each batch import (with date/time & an indication of scope) should probably merit a Q-code in our system, though there may be some other way to model this. I'm less sure how modeling the schema of an external dataset fits into our current models, but I think that if it hasn't already been addressed it merits serious thought.
  4. Also, for datasets that have been imported, it might be worth thinking about whether there is a way to do a query to reproduce the content of that import, possibly including whether any of the values are now deprecated in Wikidata.
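As a rough illustration of points 2 through 4, here is a minimal sketch in Python, not a description of how Wikidata actually stores anything. It assumes each statement carries a tag naming the batch import it came from. The `Statement` class, the `import_batch` field, the `reproduce_import` function, and all the IDs are hypothetical names invented for this example; only the three rank values correspond to ranks that Wikidata statements really have.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Statement:
    item: str                           # placeholder item ID, not a real Q-code
    prop: str                           # placeholder property ID
    value: str
    rank: str = "normal"                # "preferred", "normal" or "deprecated"
    import_batch: Optional[str] = None  # hypothetical ID of the batch import


def reproduce_import(
    statements: List[Statement], batch_id: str
) -> List[Tuple[Statement, bool]]:
    """Return each statement tagged with batch_id, paired with a flag
    that is True when the statement is now deprecated."""
    return [
        (s, s.rank == "deprecated")
        for s in statements
        if s.import_batch == batch_id
    ]


# Toy data: two values from one import, one of them later deprecated,
# plus a newer value that came from somewhere else.
statements = [
    Statement("Q_ITEM_A", "P_POPULATION", "1000", "deprecated", "Q_BATCH_1"),
    Statement("Q_ITEM_A", "P_POPULATION", "1200", "preferred", None),
    Statement("Q_ITEM_B", "P_POPULATION", "500", "normal", "Q_BATCH_1"),
]

for stmt, deprecated in reproduce_import(statements, "Q_BATCH_1"):
    status = "deprecated" if deprecated else "current"
    print(stmt.item, stmt.prop, stmt.value, status)
```

Because the newer value for the first item carries no batch tag, it never shows up in the reproduced import, which is the separation of responsibility that a source organization would presumably want.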

Potential reusers of Wikidata content

JM 2018-05-15: This is an area that is largely unfamiliar to me, so it's going to be a while till I have a lot to contribute here beyond the high-level headings themselves, but I think these are clearly important user journeys.

Within Wikimedia projects

Outside of Wikimedia projects

  • Are there specific other users for whom we should have a level of concern comparable to what we have for other WMF projects? That is, if they want features added for their needs, what is the process?

Increasingly complex queries

  • What is the path by which a person learns to make increasingly complex queries?

Ability to be aware of changes in your mirrored data

  • Is there a way for someone to track what changes there are in the results of an identical query made at different times? How do they learn about that? (For what it's worth, there is a mention, but no real discussion, at Wikidata:Data access#Incremental updates.)
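One client-side way to approach the question above is to save the result rows of each run and compare snapshots. The following is a minimal sketch under stated assumptions: each result row is a dictionary, and one column (here called `item`) uniquely identifies the row. The function name `diff_results`, the column names, and the sample IDs are all hypothetical; nothing here is an existing Wikidata feature.

```python
def diff_results(old_rows, new_rows, key="item"):
    """Compare two snapshots of the same query's results.

    Rows are dictionaries; `key` names the column that identifies a row.
    Returns the keys that were added, removed, or whose row changed.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Two toy snapshots of the same (hypothetical) query, taken at different times.
snapshot_monday = [
    {"item": "Q_A", "label": "Alpha", "population": "1000"},
    {"item": "Q_B", "label": "Beta", "population": "500"},
]
snapshot_friday = [
    {"item": "Q_A", "label": "Alpha", "population": "1200"},
    {"item": "Q_C", "label": "Gamma", "population": "70"},
]

changes = diff_results(snapshot_monday, snapshot_friday)
print(changes)
```

This only works when the query returns one row per identifying value, and it says nothing about who made a change or when; it is a stopgap for a reuser, not a substitute for documented incremental updates.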

Wikimedians who have data to add from Wikipedia, Wikisource, etc.

Increasingly complex modeling

I think we should be assembling (possibly in various subject matter areas) a set of increasingly complex paradigm examples of how to model different things. For example, for a given value of instance of (P31), it should be possible to find an example of a Q-item that was done right.

Near the "high end" of the increasingly complex paradigm examples, consider, for example:

  • A citable, published, but controversial claim that two people that we have under separate Q-codes are actually the same person. E.g. the various claims that certain individuals other than Shakespeare were the actual authors of all or some of Shakespeare's plays.
  • Similarly, a citable, published, but possibly controversial claim that two or more names refer to the same person, e.g. B. Traven (Q342268) (for what it's worth, what we have in that item as of 2018-05-15 gives numerous possible aliases, and no citation for any of them)
  • A politician who is in office at the time a change of charter or constitution changes the title of the office (s)he holds. E.g. Mike O'Brien (Q6848222), where as of 2018-05-15 we haven't addressed the issue of the change of the structure of the Seattle City Council while he served on it.
  • A book written in one language, but first published in a translation to a different language, then eventually published years later in its original language. E.g. Escape to Life (Q1366818), almost certainly incorrectly modeled as of 2018-05-15.

I'd guess that right now only a handful of people have any idea how to correctly represent any of those in Wikidata, and there is no particular process for anyone to learn.

Probably a separate journey for Commons users

Of course, as of 2018-05, we are just starting to get together structured data for Commons. If someone comes in right now to Wikidata's front page, interested in that, I see no indication at all of where they would look.

Thinking from the user point of view

When people who are involved with a process write documentation, there is always a danger of thinking from an internal, technical point of view rather than from what the user is trying to accomplish. An example of where we can fall short would be a 5,000-word "how to" that doesn't start off with a "nutshell" description of basic criteria. For example, if the only way we can import third-party data (data not originally gathered by the uploader) is that the creator/owner of the data gives a CC-0 license, that shouldn't be buried halfway through a large document; it should be in a "nutshell" section up front. Similarly, someone shouldn't have to read deep into a document to find out how hard it is to scrape a raster-graphics-only PDF.

Similarly, we need an obvious starting point for help/"how to", and clear ways to get from that to what any given individual is trying to do. From the entry point, users need to be able to easily find:

  • overview
  • "how to"
  • where to ask for further help

Right now, the closest we have is the "Get involved" section of Wikidata:Main Page. That leads to:

  • Learn about Wikidata
    • Wikidata:Introduction. Sort of useful, but really a bit "insiderish".
      • First section "What does this mean?" is almost a mission statement.
      • Second section, "How does Wikidata work?" would be of interest to a Wikimedian looking to add a few properties to an existing item, maybe even build an item from scratch; the subsection "Working with Wikidata" (of interest to those who would want to do a query) is a bit buried, the subsection title is hardly a grabber, and if I were someone coming from outside the WMF world wondering whether I could make live queries off of Wikidata I'd leave none the wiser.
        • By the way, why aren't things like "rank" in the Douglas Adams-related image here active links? We have the technology.
      • Third section, "Where to get started" very much presumes people have arrived to help us in our mission, rather than to see how we can help in theirs. Similarly for what follows in the remaining short sections of this page.
    • The Reasonator example for Douglas Adams is semi-useful, though I suspect that a lot of non-techies won't understand what exactly they are looking at. Couldn't we embed that in a page the upper part of which explains what they are seeing? Also, why just one such linked example, and that from a kind-of-geeky context? We talk about wanting to be more demographically inclusive; wouldn't it be good to have, say, 3 to 6 examples, each from a very different field of knowledge?
    • "Get started with Wikidata's SPARQL query service" => Wikidata:SPARQL query service/Wikidata Query Help. I haven't yet had time to look through this, so no detailed critique but: again, "insiderish". A user doesn't come in saying "I want to get started with Wikidata's SPARQL query service," they come in saying "I want to learn how to make queries and extract data from the system."
  • Contribute to Wikidata
    • "Learn to edit Wikidata: follow the tutorials" leads to Wikidata:Tours (instant vocabulary mismatch: tutorials/tours). I do think the two "tours" are fine as a very basic introduction for someone wanting to make very simple edits. But what tells them what they might want to learn next? What leads them into references, qualifiers, etc.? Just time and experience?
    • "Work with other volunteers on a subject that interests you: join a WikiProject" leads to Wikidata:WikiProjects. Man. Such an inside perspective. First thing in the list is "Completed WikiProjects" (of strictly internal interest), then "WikiProject resources" (not a beginner focus), then finally some things that might be of interest. So let's say I think, "History. That's something I focus on. 'WikiProject Ancient Greece'. Sure!" I click. I get Category:WikiProject Ancient Greece and unless I'm a very experienced Wikimedian, I've basically hit a dead end. I have no idea what to do.
    • "Individuals and organisations can also donate data." Well, for one thing, "donate" seems completely the wrong word. It's not something they are giving us, it's a way to publish, or to make your data available. Do we talk about "donating" a Wikipedia article? Same with the target page, "Wikidata:Data donation". (That then leads to far more detail than I have time yet to take up, gets into the user journey #Making existing external data sets available to Wikidata, where as I said before I'm largely deferring to John Cummings for the moment.)
  • "Meet the Wikidata community" (TODO)
    • "Visit the community portal or attend a Wikidata event" (TODO)
      • Most of the Community portal is very much focused on people who already have a lot of experience with Wikimedia projects. That's not necessarily bad, but there needs to be a comparable page that isn't focused that way.
      • Events: fine, I guess.
    • "Create a user account." Would be nice if there were some indication (at least an info popup on the linked page) of why you'd want to do this, that it's an account across all WMF projects, etc.
    • "Talk and ask questions on the Project chat or via live IRC chat [connect]" (TODO)
      • Project chat is not the most welcoming to beginners; it would really be nice to have something that was more of an "entrance ramp" as well.
      • I'm all for us linking an IRC channel, but how much of the time is it really being monitored? How does anyone know what to expect about that?
  • "Use data from Wikidata"
    • "Learn how you can retrieve and use data from Wikidata."
      • Wikidata:Data access refers, perhaps pedantically, to "dereferenceable URIs", links a not-too-great dictionary definition for "dereferenceable", and never uses the term again. I think this means something like Representational state transfer, what is often called a "RESTful" interface, but even as a person with a degree in Computer Science and decades in the field, I'm not sure. The content continues in that tone: I defy anyone who does not have a serious background in one or another form of computer science or data science to make head or tail of it.
  • "More..." links to community portal, which I remarked on above.
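To make the "dereferenceable URI" term from the Wikidata:Data access item above a little more concrete, here is a minimal sketch in Python of what a plain-language explanation could show. It relies on two URL patterns: the entity URI `http://www.wikidata.org/entity/<ID>`, which identifies an item, and the `Special:EntityData` page, which serves that item's data in a chosen format. The function names, the ID check, and the `fetch_entity` helper are assumptions made for this example, and the helper is defined but deliberately not called, since it needs network access.

```python
import json
import re
import urllib.request

# Rough pattern for item (Q), property (P) and lexeme (L) IDs;
# an assumption for this sketch, not an official validation rule.
ENTITY_ID = re.compile(r"^[QPL][1-9]\d*$")


def concept_uri(entity_id):
    """The URI that names the entity; requesting it leads to data about it."""
    if not ENTITY_ID.match(entity_id):
        raise ValueError(f"not a plausible entity ID: {entity_id!r}")
    return f"http://www.wikidata.org/entity/{entity_id}"


def data_url(entity_id, fmt="json"):
    """The URL that returns the entity's data in the requested format."""
    if not ENTITY_ID.match(entity_id):
        raise ValueError(f"not a plausible entity ID: {entity_id!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.{fmt}"


def fetch_entity(entity_id):
    """Download and parse the JSON for one entity (requires network access)."""
    with urllib.request.urlopen(data_url(entity_id)) as response:
        return json.load(response)


# Q42 is the Douglas Adams item used as the example on the introduction page.
print(concept_uri("Q42"))
print(data_url("Q42"))
```

In other words, "dereferenceable" seems to mean little more than "the identifier is also a web address you can fetch", which is the kind of one-line gloss the help page could offer.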

Help desk

I believe that (analogously to, for example, Commons) we need a Help desk separate from Wikidata:Project chat. Beginner questions tend to get lost in the shuffle on Wikidata:Project chat and, in particular, the page is not particularly attended by people who see it as an important part of their mission to help out beginners and learners. Also, a good help desk process very often helps elucidate what is unclear to less experienced users and feeds into continuous improvement of the documentation.