Wikidata talk:Lexicographical data/Archive/2016/12


How will Wiktionaries access lexeme data?

In Wikipedia and other projects, pages are linked to Wikidata, so it is pretty straightforward to determine which Wikidata item to fetch data from by default.

In Wiktionary though, each page can have numerous lexemes. For example, wikt:en:cat has around 20 lexemes. If templates like wikt:en:Template:IPA or wikt:en:Template:audio want to fall back to Wikidata data, how will Wiktionaries be able to say which lexeme applies? Will they have to manually pass the lexeme ID to every single template or will there be a better way?

- Nikki (talk) 17:01, 27 November 2016 (UTC)
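As a rough sketch of the ambiguity Nikki describes: a single page title can correspond to many lexemes, distinguished by language and part of speech, so a template cannot infer the right one from the title alone. The IDs and entries below are invented for illustration, not real Wikidata lexemes.

```python
# Invented example data: one Wiktionary page title ("cat") maps to
# several lexemes, keyed by (lemma, language, part of speech).
LEXEMES = {
    ("cat", "en", "noun"): "L100",   # the animal
    ("cat", "en", "verb"): "L101",   # e.g. "to cat the anchor"
    ("cat", "fr", "noun"): "L102",   # hypothetical entry in another language
}

def lexemes_for_page(title):
    """Return every (language, part of speech, id) sharing this lemma."""
    return [(lang, pos, lid)
            for (lemma, lang, pos), lid in LEXEMES.items()
            if lemma == title]

matches = lexemes_for_page("cat")
```

With this toy data, "cat" alone yields three candidates, which is exactly why a template like {{IPA}} would need more context (or an explicit ID) to pick one.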

It will probably require the creation of a whole new set of query tools. With Wikipedia and Wikisource, the data stored at Wikidata is metadata. But for Wiktionary, Wikidata is planning to store content data. This is a whole new animal, and it is still not clear what or how. This has been attempted at least twice before (Wiktionary Z and Omega Wiktionary), but both projects spiraled into obscurity. --EncycloPetey (talk) 19:23, 27 November 2016 (UTC)
I don't agree that Wikidata stores only metadata. Metadata about articles would be things like the page history or the list of an article's authors (data about data). In Wikidata we store things well beyond that (populations, the kind of entity, presidents of countries, ...). This data is also used to generate content, such as infoboxes. author  TomT0m / talk page 20:13, 27 November 2016 (UTC)
You're heading off on a tangent there. Infoboxes are a third use of data, but it's still not content data for either Wikipedia or Wikisource. The content on Wikipedia is the article, and the information in the infobox is simply supporting data pertaining to the subject of the article; the data from Wikidata that could be used in the infobox would not be used to generate any portion of the article itself. Likewise, the content on Wikisource is the book, article, play, poem, etc., and the data housed at Wikidata describes the source material, but again, none of the data at Wikidata could ever be used to generate the book, article, etc.
But with Wiktionary, this project is talking about storing content data, and this most certainly is NOT the same as "infobox" data. An infobox is set to the side of a Wikipedia article, providing statistics, metadata, and the like. But definitions of words, pronunciations of words, etymologies, synonyms, antonyms, translations, supporting quotations, hyphenation, spelling variation, etc. are the content at Wiktionary, and Wikidata is talking about trying to house that information here. What the Wiktionary project on Wikidata is essentially trying to do is to convert all the content at the Wiktionaries into a single database from which entries at any Wiktionary project can be generated automagically. That is an enormous task that, as I said, has been attempted twice before and met with failure.
My point is that we're talking about housing an entirely different sort of data, in an entirely different way, for an entirely different sort of usage, and this has been attempted twice before without success. --EncycloPetey (talk) 02:31, 28 November 2016 (UTC)
I think you're totally wrong. First, in a lot of cases, Wikidata could be used to generate parts of the article, say the first sentence of some articles. Second, Wikidata is not supposed to only support Wikipedia, and it can hold data that will never be used in a Wikipedia article. Sure, in most cases Wikidata will be less detailed than Wikipedia, because our data model is less flexible and far less expressive than natural language. Wikidata will also be able to support the generation of graphs in Wikipedia. A good example of why things are not so simple, and why you can't draw such a strong dichotomy, is references: Wikidata's statements are supposed to be backed by serious references. These sources can definitely be used in Wikipedia. Say the Wikidata reference for a population figure is shown in the Wikipedia article, and that source actually comes from Wikidata. Will you duplicate the reference in Wikipedia? That seems useless to me.
"That is an enormous task" — agreed. "that, as I said, has been attempted twice before and met with failure" — maybe, but those attempts were not intermingled, side by side, with the biggest collaborative project of all time and its community. I had never heard the name "OmegaWiki" before the Wikidata era. See the difference? Wikidata has more or less managed to merge Freebase (who knew Freebase?) and Wikimedia. That seems like a success to me. author  TomT0m / talk page 18:15, 28 November 2016 (UTC)
EncycloPetey, I would neither call OmegaWiki a failure, nor would I say it was tried twice with OmegaWiki and Ultimate Wiktionary: in fact, those two are the same project, just at different points in time, or different aspects of the same project. OmegaWiki is up and running, and has a community of its own. Wikidata for Wiktionary has one major difference from OmegaWiki, which is that it is centered around Lexemes, whereas OmegaWiki is centered around defined meanings. It is my understanding that the centering around Lexemes is what will allow a much smoother interaction with the Wiktionary projects.
In fact, if one cared less about interacting with the Wiktionary projects as they are, it might be that OmegaWiki's model would be the model they'd come up with. Wikidata doesn't have the intention to replace the Wiktionaries with something new, but to provide a backend database that can support the Wiktionaries, if they so choose. This makes the incentive structure and architecture of Wiktionary for Wikidata very different from OmegaWiki, which is why I think that W4W has a good chance of performing better than OmegaWiki on a number of metrics. --Denny (talk) 20:56, 28 November 2016 (UTC)


Nikki, yes, as far as I can tell, we would need to specify every single lexeme ID explicitly in Wiktionary for now. I am pretty sure that after some time contributors will figure out how to use Lua to do that automatically, and at that point we will need to see how the infrastructure will be able to deal with that kind of querying, but for now, my assumption is that the lexeme IDs will be passed explicitly. This also helps the local communities to keep better control of the data. I am fully expecting this to evolve over time, and change as usage patterns from the Wiktionary communities change. --Denny (talk) 17:24, 28 November 2016 (UTC)
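A minimal sketch of the fallback Denny describes, in Python with a stand-in dict rather than a real Wikidata API: use an explicitly passed lexeme ID when one is given, otherwise attempt automatic resolution from page title and language section, and give up when the match is ambiguous. All names and IDs here are invented.

```python
# Invented candidate table: (page title, language) -> possible lexeme IDs.
CANDIDATES = {
    ("cat", "en"): ["L100", "L101"],   # noun and verb: ambiguous
    ("dog", "fr"): ["L200"],           # single lexeme: resolvable
}

def resolve_lexeme(title, language, explicit_id=None):
    """Prefer an explicit ID; fall back to auto-resolution only if unique."""
    if explicit_id:                    # community keeps direct control
        return explicit_id
    ids = CANDIDATES.get((title, language), [])
    if len(ids) == 1:
        return ids[0]
    return None                        # ambiguous or unknown: require an explicit ID
```

The `None` branch is the key design choice: rather than guessing among several lexemes, the template would refuse to fetch data until an editor supplies the ID.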

I was worried you'd say that. :) I started thinking about it because of what Noé said above about integration with VisualEditor.
I think, as a minimum, there would need to be something in the editing interface which would find lexemes matching the current page name and list them. Ideally, if you're editing a section for a specific language, it would also detect which language and only show matching lexemes. That would mean users would be able to find the lexeme ID efficiently without having to leave the Wiktionary edit page. Without something like that, I think only hardcore Wikidata fans will bother. :)
I wonder if it would be possible to have something like a magic word (e.g. {{LEXEME:L1234}}) which could be added at the beginning of a section, would apply until the end of that section, and would be available to all templates in that section... It sounds like it might be too complicated, but if it's a plausible option, maybe it could be a way of only having to specify the ID once rather than in every single template.
- Nikki (talk) 23:17, 28 November 2016 (UTC)
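A sketch of how the section-scoped {{LEXEME:...}} idea could behave, in Python over a toy fragment of wikitext. No such magic word exists; the syntax, the IDs, and the parsing are all assumptions made up for illustration: a declaration at the top of a section sets the default lexeme for every template until the next heading resets the scope.

```python
import re

# Invented wikitext: an English section with a lexeme declaration,
# followed by a French section without one.
WIKITEXT = """\
==English==
{{LEXEME:L1234}}
{{IPA|/kaet/}}
==French==
{{IPA|/sha/}}
"""

def template_scopes(text):
    """Map each template invocation to the lexeme ID in scope, if any."""
    current = None
    result = []
    for line in text.splitlines():
        if line.startswith("=="):              # a new heading ends the scope
            current = None
        m = re.match(r"\{\{LEXEME:(L\d+)\}\}", line)
        if m:
            current = m.group(1)               # declaration sets the default
        elif line.startswith("{{"):
            name = line[2:].split("|")[0].rstrip("}")
            result.append((name, current))
    return result
```

Here the English {{IPA}} call would see L1234 while the French one would see no default, which matches the "once per section" behaviour Nikki is after.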

That sounds like a great solution! --Denny (talk)

I support Nikki's idea of one magic word per section. Adding an ID to every template may turn the wikicode into very confusing gibberish for amateur lexicographers without an IT background (and they are our target audience). I also like very much the way Nikki described the query operation in VisualEditor. Thanks! Noé (talk) 09:53, 15 December 2016 (UTC)