User talk:Robert Važan

Welcome to Wikidata, Robert Važan!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards! --VIGNERON (talk) 11:58, 17 February 2020 (UTC)

Forms in Slovak

Hi,

A short message to let you know that I added forms to Slovak noun Lexemes that didn't have any forms before. I copied the main lemma as a singular form; see absolútno (L251418). I hope this is mostly correct; please check and tell me. If anything needs attention (for instance, errors to correct or regular forms to add), please let me know.

Cheers, VIGNERON (talk) 15:46, 2 March 2020 (UTC)

@VIGNERON: Hi. Thanks. Slovak nouns usually have 12 forms and the lemma is the singular nominative case form. There are some nouns (for example nohavica (L251194)) that have only plural forms and the lemma is then also plural. So yes, it's mostly correct, but it will all require manual review. Robert Važan (talk) 19:52, 2 March 2020 (UTC)
@VIGNERON: On second thought, since all these forms were programmatically generated, they are redundant to the existing data (lemma+lexcat). Any user of Wikidata could have done the same programmatic extrapolation of existing data locally on their computer. I think Wikidata should contain content that was manually curated and reviewed at some level. Filling Wikidata with generated data just makes it hard to tell which data is reviewed (and thus highly reliable) and which is not. I think you should avoid uploading any generated data unless you have some plan to subsequently review and correct it. — Robert Važan (talk) 23:25, 2 March 2020 (UTC)
Yes, it's a bit redundant but not entirely: I've added "singular" as a grammatical feature, and anyway, Lexemes need to be redundant on that specific point. Yes, anyone could have done the same (ideally someone who understands Slovak), but for months nobody did, so I've been bold.
« I think Wikidata should contain content that was manually curated and reviewed at some level. » I understand this point of view but I respectfully disagree: most data is imported or maintained by bots (I myself added almost 3 million pieces of data to Wikidata and I'm a small contributor). That said, I will stop for Lexemes in Slovak (I have enough to do in French and Breton anyway) and let you complete them all by hand (almost none have complete forms). And in this case, it's easy to know which forms have not been reviewed, since I forgot to add the case during my import.
Again, if you want or need any help, I'll be glad to answer. Meanwhile, here is a simple query that might help you:
SELECT ?l ?lemma ?catLabel (COUNT(?form) AS ?count) (GROUP_CONCAT(DISTINCT ?form;separator=", ") AS ?forms) WHERE {
  ?l dct:language wd:Q9058 ; wikibase:lemma ?lemma ; wikibase:lexicalCategory ?cat ; ontolex:lexicalForm/ontolex:representation ?form .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "sk". }
}
GROUP BY ?l ?lemma ?catLabel
ORDER BY DESC(?count)
Cheers, VIGNERON (talk) 13:05, 3 March 2020 (UTC)
@VIGNERON: Actually, I added nearly all of those lexemes myself a few weeks ago. :-) I am using a custom editing tool, which is Slovak-aware and lets me review all edits. I hope to publish it, so that others can use it for rapid editing. It could be later ported to other languages. — Robert Važan (talk) 13:17, 3 March 2020 (UTC)
Oh ok, that sounds great (I'm eager to know more), and I'm sorry I intruded. Cheers, VIGNERON (talk) 13:32, 3 March 2020 (UTC)
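
As a rough illustration of the 12-form rule mentioned earlier in this thread, the following Python sketch lists the case/number slots a Slovak noun lexeme would be expected to fill and reports which ones are still missing. It assumes that the 12 forms are the six cases of modern standard Slovak in singular and plural; the thread itself only gives the number 12. The names `expected_slots` and `missing_slots` are hypothetical and do not come from any tool discussed here.

```python
from itertools import product

# Assumption: six cases in two numbers make up the "12 forms" mentioned above.
CASES = ("nominative", "genitive", "dative", "accusative", "locative", "instrumental")
NUMBERS = ("singular", "plural")


def expected_slots(plural_only=False):
    """Return the (case, number) slots a noun is expected to have.

    Plural-only nouns (the thread gives nohavica as an example) get
    only the plural slots.
    """
    numbers = ("plural",) if plural_only else NUMBERS
    return [(case, number) for number, case in product(numbers, CASES)]


def missing_slots(existing, plural_only=False):
    """Return the expected slots not covered by the existing (case, number) pairs."""
    have = set(existing)
    return [slot for slot in expected_slots(plural_only) if slot not in have]
```

A form tagged only as singular, with no case, fills none of the slots, so `missing_slots([(None, "singular")])` still reports all 12. Under these assumptions, that is one way forms imported without a case could be told apart from reviewed ones.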

Arbitrary break

I see that you have been creating lots of (mostly empty) Slovak lexemes. It's been ten months since the discussion above; when do you plan on adding information (such as forms and senses) to these? Mahir256 (talk) 18:41, 12 January 2021 (UTC)

I have been very busy during those 10 months. I am happy to have some time to contribute here again. I am currently adding noun genders (see gender stats) and forms for several lexical categories (see form stats). I will add more data as my software for semi-automated editing matures. — Robert Važan (talk) 18:48, 12 January 2021 (UTC)

[WMF Board of Trustees - Call for feedback: Community Board seats] Meetings with the Wikidata community

The Wikimedia Foundation Board of Trustees is organizing a call for feedback about community selection processes between February 1 and March 14. While the Wikimedia Foundation and the movement have grown about five times in the past ten years, the Board’s structure and processes have remained basically the same. As the Board is designed today, we have a problem of capacity, performance, and lack of representation of the movement’s diversity. Our current processes to select individual volunteer and affiliate seats have some limitations. Direct elections tend to favor candidates from the leading language communities, regardless of how relevant their skills and experience might be in serving as a Board member, or contributing to the ability of the Board to perform its specific responsibilities. It is also a fact that the current processes have favored volunteers from North America and Western Europe. In the upcoming months, we need to renew three community seats and appoint three more community members in the new seats. This call for feedback is to see what processes we can all collaboratively design to promote and choose candidates who represent our movement and are prepared with the experience, skills, and insight to perform as trustees.

In this regard, two rounds of feedback meetings are being hosted to collect feedback from the Wikidata community. Both rounds have the same agenda, to accommodate people from various time zones across the globe. We will be discussing ideas proposed by the Board and the community to address the above-mentioned problems. Please sign up for whichever is most convenient for you. You are welcome to participate in both as well!

Also, please share this with other volunteers who might be interested in this. Let me know if you have any questions. KCVelaga (WMF), 14:33, 21 February 2021 (UTC)

15000 bot-like edits?

Hi, I recently understood that we have an informal limit of 10,000 non-interactive bot edits. Above that, the community wants a bot request and the edits done with a bot account. See https://wikidata.wikiscan.org/user/Robert_Va%C5%BEan --So9q (talk) 04:06, 12 April 2021 (UTC)

@So9q: From my user page: "I am using semi-automated editing tool, which allows me to save whole batch of edits at once. Don't confuse this with a bot. My edits are all manually reviewed. All data I add should be highly reliable." I guess I am a flooder, so a bot account is inappropriate. — Robert Važan (talk) 04:10, 12 April 2021 (UTC)
I just saw that what I wrote above does not seem to be in line with the community. I'm curious, which tool do you use? Is the code available somewhere? Could you add it to Wikidata:Tools if not already done? --So9q (talk) 04:19, 12 April 2021 (UTC)
@So9q: I develop my own tool. I would be happy to share it, but it's not in a publishable state yet, unfortunately. — Robert Važan (talk) 04:22, 12 April 2021 (UTC)
Do you know GitHub and similar platforms? I publish my code there, finished or not. This is the latest work in progress: https://github.com/dpriskorn/LexSAOB. I would like to read your code even if you think it is not finished. Code IMO is never really finished; there are always things to improve. :) --So9q (talk) 04:48, 12 April 2021 (UTC)
@So9q: I think that would be just a waste of time for you and support/communication overhead for me. I am, however, working on a well-documented lexeme editing API in Wikidata Toolkit. After that, I can upstream more general-purpose code from my private tool. Once the low-level routines are out, I can try to publish portions of my tool that are less dependent on hardcoded knowledge about the Slovak language. — Robert Važan (talk) 05:03, 12 April 2021 (UTC)
Ok, good luck with improving Wikidata-Toolkit. In Python-land, WikibaseIntegrator already supports all features of lexemes, which is very nice :). --So9q (talk) 05:26, 12 April 2021 (UTC)


Request To Know More About Lexeme Editing Tool

Hi Robert Važan, after reading a few discussions about lexemes on several WikiProjects, I found out that you developed a tool to gather and edit lexeme data. Can we have a quick session to discuss this tool? Eugene233 (talk) 14:19, 23 June 2021 (UTC)

@Eugene233: Sure. The tool currently hardcodes rules for the Slovak language. I am gradually making it multilingual. I intend to publish it once it becomes useful for other editors. Besides that, I am also working on a lexeme editing API in WDTK. — Robert Važan (talk) 15:00, 23 June 2021 (UTC)