Wikidata talk:Wikidata Lexeme Forms

About this board

Previous discussion was archived at Wikidata talk:Wikidata Lexeme Forms/Archive 1 on 2018-12-27.

Switching Wikifunctions API

Summary by Lucas Werkmeister

new API is now used

Jdforrester (WMF) (talkcontribs)
Lucas Werkmeister (talkcontribs)

Thanks for the reminder (I still had the task email unread but it slipped out of my attention); implemented and deployed. The API calls happen server-side because there can be many of them, and this way the network requests all happen inside the data center (Toolforge to MediaWiki) and we only pay for the user⇔DC RTT once.

Maltese templates ready

Summary by Lucas Werkmeister

deployed

Mtanti (talkcontribs)

@Lucas Werkmeister

Note that I will probably be editing this some more in the near future.

Nikki (talkcontribs)

A couple of issues came up while adding the templates to the tool:

The template for feminine nouns has "feminil" in the heading and "femminil" in the label, which one is right?

In the one for verbs, the last two forms both have "plural imperattiv" as the label. I assume the first one should be "singular imperattiv"?

Also for verbs, am I right in thinking the lemma should be "ħadem" (third-person masculine singular perfective)? (Right now the tool always uses the first form as the lemma)

Mtanti (talkcontribs)

Hi Nikki, sorry for taking so long to answer. I've fixed the first two issues, you were right about them. Yes, the lemma should be the third person masculine singular perfective. What should I do?

Nikki (talkcontribs)

I don't think there's anything you need to do about the verbs - that's something Lucas will need to fix. :) It might take a bit longer for the verb one to be added though because of that.

Was the first change in Special:Diff/1863957047 (changing plural (Q146786) to masculine (Q499327)) intentional?

Mtanti (talkcontribs)

Nope it was a mistake! Good catch! I fixed it.

Is there some way to see a preview of what the form will look like?

Mtanti (talkcontribs)
Mtanti (talkcontribs)
Lucas Werkmeister (talkcontribs)

I pushed a commit that should support using other forms for the lemma; hopefully Nikki can try it out later.

Mtanti (talkcontribs)
Mtanti (talkcontribs)
Nikki (talkcontribs)
Lucas Werkmeister (talkcontribs)

The templates are deployed now, please try them out.

Your tool docs are featured as an example in the new Tool Docs Guide!

TBurmeister (WMF) (talkcontribs)

Hello Wikidata Lexeme Forms maintainers, contributors, and fans! I wanted to let you know that I highlighted the Wikidata Lexeme Forms documentation as a shining example in the new Tool Docs guide that I just published. Thank you for creating lovely tool documentation that can serve as an example to help others create and improve tool docs :-) This guide was created as part of the Doc Your Tool project for the upcoming 2024 Hackathon. If you're interested, please join that project to work on or talk about tool documentation during the hackathon!

Lucas Werkmeister (talkcontribs)

Thank you, that’s lovely to hear :)

(And it prompted me to look through the docs again and notice some outdated parts :D)

Reply to "Your tool docs are featured as an example in the new Tool Docs Guide!"
So9q (talkcontribs)
Nikki (talkcontribs)

"All tools" is clearly not true. There are plenty of tools which are not subpages of "Wikidata:Tools" (Wikidata:Recoin, Wikidata:Listeria, Wikidata:PAWS, to name just a few), and I don't think they should be either. Categorising content is what categories are for. There's no need to make the page names any longer than necessary.

So9q (talkcontribs)

like your arguments 😀

Strange behaviour with lang subtag

VIGNERON (talkcontribs)

Hi @Lucas Werkmeister:,

Apparently, the tool doesn't understand language subtags. So when editing an existing lexeme with such a tag, it seems a bit lost. See for instance tavañjer (L628403) (Special:Diff/1962462516) where it removed the F2 and merged it with F1. Not sure how common such cases are, but at least I want to warn about this.

Bonus general question: how can I have the F2 back? If I add a new form, it would be F14 and not F2, right?

Cheers,

Lucas Werkmeister (talkcontribs)

Revert the bad edit? That should restore F2, I think.

VIGNERON (talkcontribs)

Well the edit is mostly good (creation of forms F4 to F13)... I could put it in F14, it's not a big deal.

Lucas Werkmeister (talkcontribs)

Ah, okay, I didn’t notice those other forms :/

Lucas Werkmeister (talkcontribs)

(And then I can take a look again at how the tool handled the old version of the lexeme and try to understand what it should have done instead.)

Reply to "Strange behaviour with lang subtag"

I created a couple of templates for Aragonese

Summary by Lucas Werkmeister

all deployed

Juanpabl (talkcontribs)
Lucas Werkmeister (talkcontribs)

Thanks! The templates look mostly good to me, but the labels should be in Aragonese, not English. I could replace the template labels with the headings (which sound like they’re already in Aragonese), but I’m not sure about the form labels: are singular and plural the same words in Aragonese, or should they be different?

Juanpabl (talkcontribs)

Yes, singular and plural are the same words. I already changed the labels. Thanks!

Lucas Werkmeister (talkcontribs)

Alright, should be deployed now.

عُثمان (talkcontribs)

I have prepared a template each for masculine and feminine nouns in Punjabi at Wikidata:Wikidata Lexeme Forms/Punjabi.

This template does seem to run into a similar dilemma mentioned for Vietnamese just recently on this talk page in that it involves multiple representations of forms. I separated the Gurmukhi (`pa`) and Shahmukhi (`pnb`) text in the template with slashes as I initially misunderstood the slash feature mentioned in the description as being for this purpose rather than for separate forms with the same grammatical characteristics. Maybe a different separator could be used for this purpose, like a semicolon? I can try to help with implementing support for this though I am not sure if this involves a technical limitation I am not aware of.

I think it is important in this case that the template requires input in both scripts. The situation for Punjabi is that a greater number of textual references are written with Gurmukhi, but a majority of Punjabi speakers can only read Shahmukhi (Perso-Arabic script). Only including the former makes the lexemes inaccessible to the majority of speakers, and only including the latter makes the lexemes challenging to verify or to distinguish lemmas which are homographs in only one of the scripts, while also still excluding the many Punjabi speakers who are only familiar with Gurmukhi. --Middle river exports (talk) 13:39, 12 July 2022 (UTC)

عُثمان (talkcontribs)
Lucas Werkmeister (talkcontribs)

Hm, I don’t like the idea of another separator, for a couple of reasons:

  • “Special-casing” any character in the input is a bit awkward, because in principle almost any character is allowed in a form representation (i.e., as part of the data, instead of separating it). I’m already a bit uneasy with the slash (there are some lexemes where it’s a legitimate part of a form representation, e.g. I don’t think 24/7 (L44135) is unreasonable as a lexeme), and a semicolon or something else would make this worse.
  • It’s pretty non-obvious for users. The slash is at least a comparatively uncommon feature; you’re saying every Punjabi lexeme should have both representations.
  • You provided all the example sentences in both scripts (great!), but a simple separator solution wouldn’t provide a place for both versions of the surrounding sentence.
  • Mixing different languages/scripts in a single HTML input is probably bad for accessibility and usability in general (with a single language per input, it’s at least theoretically possible to, say, have a smartphone’s on-screen keyboard in different languages automatically, though to be fair I have no idea if browsers actually do that); more specifically –
  • mixing a left-to-right script and a right-to-left script in a single input sounds extra confusing.

At first, I thought that a relatively simple to implement approach would be to first have the user create the lexeme in one script, but when you click submit, instead of redirecting to the created lexeme, the tool immediately redirects to edit mode for the same lexeme in the other script. But that doesn’t solve the example sentence problem at all – the user would still have to know to enter the Shahmukhi form representations in the middle of sentences written in Gurmukhi, or vice versa. So that’s not really great either.

I think there’s no way around changing the data model of the tool a bit. Instead of a plain string, the example sentence could also be a map from language code to string, a bit like this:

'forms': [
	{
		'label': 'ਇੱਕ ਵਚਨ, ਸਾਧਾਰਣ ਰੂਪ',
		'example': {
			'pa': 'ਇਹ [ਕਮਰਾ] ਹੈ।',
			'pnb': '.ایہہ [کمرہ] ہے',
		},
		'grammatical_features_item_ids': ['Q110786', 'Q1751855'],
	},
	# ...
]

(In fact, I suppose the label should also become a map / dictionary.) And then the tool would just show both at the same time, one below the other. And the “heading” for each form would probably have the labels one after the other (ਇੱਕ ਵਚਨ, ਸਾਧਾਰਣ ਰੂਪ / اِک وچن، سادھارن رُوپ).

How does that sound?
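To make the proposed data model concrete, here is a minimal sketch (not the tool's actual code) of how a template's `example` could be accepted either as a plain string or as a map from language code to string. The names `examples_by_language`, `split_example` and the `default_language` parameter are assumptions made for illustration only.

```python
from typing import Dict, Tuple, Union

# Hypothetical type: a plain string for single-script templates,
# or a language-code -> sentence map for multi-script templates.
Example = Union[str, Dict[str, str]]


def examples_by_language(example: Example, default_language: str) -> Dict[str, str]:
    """Normalize an example to a language-code -> sentence map."""
    if isinstance(example, str):
        return {default_language: example}
    return dict(example)


def split_example(sentence: str) -> Tuple[str, str, str]:
    """Split 'prefix [placeholder] suffix' into its three parts."""
    prefix, rest = sentence.split('[', 1)
    placeholder, suffix = rest.split(']', 1)
    return prefix, placeholder, suffix
```

With the Punjabi form above, `examples_by_language(form['example'], 'pa')` would give one sentence per script, and each sentence could then be rendered in its own input with its own `lang` attribute, one below the other.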

Lucas Werkmeister (talkcontribs)
Deryck Chan (talkcontribs)

Interesting. If we get this to work, there will certainly be things Chinese can learn from. For Punjabi we are talking about two completely different scripts, not variant standardisations of the same script like Chinese though (as you know, hans and hant have a large [~2/3 by frequency] overlap).

عُثمان (talkcontribs)

That sounds really good! I appreciate your response and agree this is a better solution than using a separator character per the reasons stated. Dealing with left-to-right and right-to-left scripts in the same field is generally a pain, so having separate inputs and using separate HTML attributes would be the best approach. I have my Wikidata common.css set up to give text wrapped with lang="pnb" a different font treatment from other Arabic-based scripts, so it makes that sort of thing simpler as well.

If this is possible, it will also tentatively be feasible to have a Hindustani template which takes separate Hindi and Urdu inputs for the same form. --Middle river exports (talk) 22:04, 12 July 2022 (UTC)

Lucas Werkmeister (talkcontribs)

One thing I forgot to clarify above is the question whether these inputs should all be required. You mention that most Punjabi speakers can only read Shahmukhi, so requiring both inputs to be filled in might exclude a lot of users – but making either of them optional would potentially result in a lot of incomplete data (though you could at least query for that). What do you think?

عُثمان (talkcontribs)

I was thinking about this, and I think requiring it for the lexeme form tools for Punjabi specifically would be reasonable - the considerations may be different for other languages. While it does require more work on the part of the person entering data, so does understanding the grammatical distinctions, and entering one or the other is still possible via the main Wikidata interface. Querying for incomplete data is possible but a little challenging at the moment - part of what would become more achievable with manual entry is building a more reliable tool for converting between the two scripts, as the existing ones are a bit rudimentary and make too many mistakes to be used at the moment.


(Also, the number of people adding Punjabi lexemes at the moment is rather small, if anybody wants pointers in converting between the two scripts I would be happy to direct them to resources to help. The Punjabi-English Dictionary published online by the Punjabi University (Q3631316) provides excellent examples of headwords in both scripts, and there are books published in both scripts available online (see and ). I am from a Pakistani background and was initially only familiar with (Perso-)Arabic-based scripts, so I can understand learning an additional script can be a bit of a hurdle, but I would recommend it to anyone interested in adding lexicographical data in the language since otherwise it means missing out on a lot of information.)

Lucas Werkmeister (talkcontribs)

Sounds good – I expect in advanced mode all the inputs will still be optional, if anyone really doesn’t want to enter the other script.

Lucas Werkmeister (talkcontribs)

Hm, how is this going to work in bulk mode?

I could imagine just listing the representations consecutively – either grouped by script:

ਕਮਰਾ|ਕਮਰੇ|ਕਮਰੇ|ਕਮਰਿਆਂ|کمرہ|کمرے|کمرے|کمریاں

or grouped by form:

ਕਮਰਾ|کمرہ|ਕਮਰੇ|کمرے|ਕਮਰੇ|کمرے|ਕਮਰਿਆਂ|کمریاں

but in either of those cases the LTR/RTL mixing could become pretty confusing.

Or we group each script in its own line, with extra pipe characters to mark one line as a “continuation” of the other (rather than a new lexeme):

ਕਮਰਾ|ਕਮਰੇ|ਕਮਰੇ|ਕਮਰਿਆਂ|
|کمرہ|کمرے|کمرے|کمریاں

(That’s an extra pipe at the beginning of the second line, in case it isn’t visually clear.) But that’s kind of ambiguous with the possibility to leave forms out.

Perhaps “multi-script templates do not support bulk mode” is also an acceptable answer, at least for the initial version. Then it can be added later without breaking anything.
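As a rough sketch of the two single-line layouts discussed above, the following shows how each could be parsed into one dictionary per form. It assumes every form has exactly one representation per script and that no form is left out; `parse_grouped_by_script` and `parse_grouped_by_form` are hypothetical names, not functions of the tool.

```python
from typing import Dict, List


def parse_grouped_by_form(line: str, languages: List[str]) -> List[Dict[str, str]]:
    """Parse 'form1-lang1|form1-lang2|form2-lang1|form2-lang2|...'."""
    parts = line.split('|')
    n = len(languages)
    if len(parts) % n != 0:
        raise ValueError('number of representations is not a multiple of the number of scripts')
    return [dict(zip(languages, parts[i:i + n])) for i in range(0, len(parts), n)]


def parse_grouped_by_script(line: str, languages: List[str]) -> List[Dict[str, str]]:
    """Parse 'form1-lang1|form2-lang1|...|form1-lang2|form2-lang2|...'."""
    parts = line.split('|')
    n_forms, remainder = divmod(len(parts), len(languages))
    if remainder != 0:
        raise ValueError('number of representations is not a multiple of the number of scripts')
    return [
        {lang: parts[j * n_forms + i] for j, lang in enumerate(languages)}
        for i in range(n_forms)
    ]
```

Both functions return the same structure, so the choice between the layouts would mostly be a question of which one is easier to read and type with mixed LTR/RTL text.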

عُثمان (talkcontribs)

I am partial to the separate line option as it makes dealing with the different script directions much easier, but even with separate pipe-separated lines, I can see it becoming hard to parse visually since the forms would not line up.


If possible, I think using single line breaks to separate representations of the same form, and double line breaks for different forms would be the easiest to read. (Then triple line breaks for different lexemes?) Something like:


ਕਸਰਾ
کمرہ

ਕਸਰੇ
کمرے

ਕਸਰੇ
کمرے

ਕਸਰਿਾਂ
کمریاں


It's not a huge deal, but this would also evade any confusion caused by the fact that the Gurmukhi full stop punctuation mark । looks a bit like |.


Of course, if this is too difficult to implement at the moment, that's fine as well. Bulk editing would be very useful but the non-bulk templates on their own would be too.
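The line-break layout proposed above could be parsed along these lines. This is only a sketch under the stated reading (one line break between representations of the same form, one blank line between forms, two or more blank lines between lexemes); `parse_line_break_bulk` is a hypothetical name.

```python
import re
from typing import Dict, List


def parse_line_break_bulk(text: str, languages: List[str]) -> List[List[Dict[str, str]]]:
    """Return a list of lexemes, each a list of forms (language -> representation)."""
    lexemes = []
    # three or more consecutive line breaks separate lexemes
    for lexeme_block in re.split(r'\n{3,}', text.strip('\n')):
        forms = []
        # exactly two line breaks (one blank line) separate forms
        for form_block in lexeme_block.split('\n\n'):
            representations = form_block.split('\n')
            if len(representations) != len(languages):
                raise ValueError('expected one representation per script')
            forms.append(dict(zip(languages, representations)))
        lexemes.append(forms)
    return lexemes
```

Since no pipe character is involved, this layout would also sidestep the visual similarity between | and the Gurmukhi full stop.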

عُثمان (talkcontribs)

@Lucas Werkmeister: As the Hindustani templates have been working out well so far, I have now created a series of Punjabi templates in the same style. I still need to finish transcribing the templates towards the end to Gurmukhi, but since these will also likely have to be deployed in parts I should have those transcribed by the time you get to that part.

You will find that these templates are similar to the Hindustani ones in forms and features. (John Beames makes the analogy that it is like the German to Hindustani's English.) Although there are also "red" and "black" adjectives, these take different values for "paradigm class" as the way adjectives decline in the feminine differs. The most marked differences are in the verb templates - these are also set up along the verb phase model, but the number and combination of forms on a Punjabi verb is different for each one. Most of the verb forms are marked as optional to accommodate this; for example, some verbs might semantically exclude the passive forms or various gendered/numbered forms of the subjunctive. Each subsequent phase has fewer forms than the one before it.

Lucas Werkmeister (talkcontribs)

Alright, thanks! I’ll see when I can find some time to transcribe them (might take a bit, I’m afraid).

عُثمان (talkcontribs)

Thanks for catching the error on the feminine noun template—I'll give the big ones an additional look over today since I can be a bit scatterbrained.

Lucas Werkmeister (talkcontribs)

Alright, thanks! I’ve deployed the four noun templates now.

Lucas Werkmeister (talkcontribs)

Adjective templates also deployed.

Lucas Werkmeister (talkcontribs)

Adverb templates also deployed :)

Lucas Werkmeister (talkcontribs)
عُثمان (talkcontribs)

Thank you very much! I appreciate the amount of effort you put into this.

Lucas Werkmeister (talkcontribs)

@عُثمان: I transcribed the punjabi-verb-basic-transitive-{shah,guru} templates now, but unfortunately there’s an issue preventing them from being deployed yet: form 1 is indistinguishable from form 71, and form 2 from form 70. Since those pairs of forms have the same grammatical features and statements, they can’t be matched correctly in edit mode. Perhaps forms 70 and 71 are missing something in their grammatical features or statements? (This applies to the -shah and -guru versions equally, by the way.)
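The matching problem described above can be checked mechanically. The following is a minimal sketch, assuming each form is a dictionary with `grammatical_features_item_ids` (as in the example earlier on this page) and an optional `statements` map from property ID to a list of item IDs; the `statements` shape and the function name are assumptions, and the item IDs in the usage note are placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def find_indistinguishable_forms(forms: List[dict]) -> List[List[int]]:
    """Return groups of 1-based form numbers sharing the same features and statements."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for number, form in enumerate(forms, start=1):
        features = frozenset(form.get('grammatical_features_item_ids', []))
        statements = frozenset(
            (property_id, item_id)
            for property_id, item_ids in form.get('statements', {}).items()
            for item_id in item_ids
        )
        groups[(features, statements)].append(number)
    # only groups with more than one form are ambiguous in edit mode
    return [numbers for numbers in groups.values() if len(numbers) > 1]
```

For example, two forms with features `['QA', 'QB']` and `['QB', 'QA']` and no statements would be reported as one group, while adding a distinguishing feature to one of them (as was later done with passive (Q1194697)) would make the group disappear.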

Lucas Werkmeister (talkcontribs)

I also noticed two things in the punjabi-verb-additive-causative templates: form 31 is missing variety of lexeme, form or sense (P7481): Charhdi (Q113612554) + variety of lexeme, form or sense (P7481): Central Punjabi (Q25592134) compared to the surrounding definite subjunctive forms (39–42), is that intentional? And after form 42, the numbering jumps back to form 38, making the form numbers misleading (the final form is called form 52, but there are in fact 57 forms, not 52).

عُثمان (talkcontribs)

@Lucas Werkmeister: Ah, forms 70 and 71 on the basic-transitive templates were meant to be the verbal noun forms for the passive stem, I've added passive (Q1194697) to those to fix that. Thanks for catching

I am investigating what to do about the causatives at the moment

عُثمان (talkcontribs)

OK, I've fixed the issues with numbering and missing statements on the additive causative templates, and on the causative-double templates (hopefully you hadn't started on those yet).

Then I went back and checked some other issues—

  • This was already in the templates but the value for paradigm class on the adjective-black templates should be kāḷā adjective, Common Punjabi (Q113064965)
  • For verbs, there are a few spelling updates. For each of these form numbers I have updated the spelling of the bracketed placeholder form:
    • basic-intransitive-shah: 65
    • basic-intransitive-guru: 64
    • basic-transitive-shah: 35, 48, 50
    • additive-transitive-shah: 19, 26, 39, 41
    • additive-transitive-guru: 54
    • additive-causative-shah: 17, 24, 39, 41
    • additive-causative-guru: 52

Sorry about all the confusion! We have a few new people besides myself starting to contribute Punjabi lexemes so it has helped to get some additional sets of eyes.

Lucas Werkmeister (talkcontribs)

Alright, I deployed the changes to the intransitive templates, as well as basic-transitive (shah, guru) and additive-transitive (shah, guru).

عُثمان (talkcontribs)

Thank you!

Lucas Werkmeister (talkcontribs)

additive-causative also done (shah, guru).

Lucas Werkmeister (talkcontribs)

And additive-causative-double also done (shah, guru).

Whew!

Reply to "Punjabi noun templates ready"

Rendering error in locales which have the title translated

Summary by Lucas Werkmeister

Fixed.

عُثمان (talkcontribs)

@Lucas Werkmeister: Compare, for example, changing the setting to Punjabi (either script) versus to Sanskrit to see what I mean.

Lucas Werkmeister (talkcontribs)

Apparently this happens when the “logged in as” translation uses $1 inside {{GENDER:$2|...}}. I’ll see if I can fix the formatting code, but you can also avoid this and other issues by just not using {{GENDER:}} in the translation if it’s not needed.

عُثمان (talkcontribs)

Oh! I assumed it was necessary even if not used. Good to know.

Lucas Werkmeister (talkcontribs)

Yeah, it’s unfortunate that translatewiki.net shows (IIRC) a warning when the $2 isn’t used.

I’ve concluded that I can’t really fix this in code – I’ll just update all the translations (and hope the l10n update bot can update the on-wiki pages accordingly without conflicts) and add a test to ensure it doesn’t happen with any future translations either.
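A test of the kind mentioned above might look roughly like this. It is a sketch, not the tool's actual test: the function name is hypothetical, the sample messages are made up, and the simple regular expression does not handle templates nested inside `{{GENDER:...}}`.

```python
import re

# matches a non-nested {{GENDER:...}} call
GENDER_CALL = re.compile(r'\{\{GENDER:[^{}]*\}\}')


def uses_parameter_inside_gender(message: str, parameter: str = '$1') -> bool:
    """Return True if the parameter occurs inside any {{GENDER:...}} call."""
    return any(parameter in call for call in GENDER_CALL.findall(message))
```

A test suite could then loop over all translations of the affected message and assert that this function returns `False` for each of them.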

Lucas Werkmeister (talkcontribs)

Should be fixed now.

عُثمان (talkcontribs)

@Lucas Werkmeister: I have created noun templates for Hindustani (Hindi/Urdu) following a similar format to the Hindko ones at Wikidata:Wikidata Lexeme Forms/Hindustani. I have been rethinking the pros and cons of having separate templates for different scripts and I think at this point the benefits of having separate templates outweigh the amount of effort to keep the representations in sync. The Urdu templates could be used to add forms to lexemes with just the Hindi representations and vice versa using edit mode, and at this point I have some tools to assist in automating the script conversions. If these work out, then I can add some more Hindi/Urdu and Punjabi templates which are set up similarly.

The gender features are marked as optional just to account for the fact that gender statements and features are missing on a lot of Hindustani lexemes, so Lexeme Forms could help fill those out.

عُثمان (talkcontribs)

@Lucas Werkmeister: The templates for all major parts of speech in Hindustani in both scripts/registers are now ready and on the page above.

The Hindi translation of the Lexeme Forms messages was already completed by someone, and I can finish the transcription of those for Urdu in short order.

(Sorry for the extra ping, I am often unsure of what messages result in a notification on here. Whenever you have a chance to take a look.)

Lucas Werkmeister (talkcontribs)

I see, interesting approach… I think this is doable, but since the index page (toolforge:lexeme-forms) groups templates by language code, they’ll be separated there: one group for اردو (ur), and one group for हिन्दी (hi). (I’m not sure if the blocks would be adjacent or not… so far, the groups are sorted by language code, so hi and ur would be quite a bit apart – but I could also put them next to each other, I think.) Does that sound okay?

عُثمان (talkcontribs)

If they can be placed next to each other, that would be perfect - I think it would make sense to place them in the alphabetical position of "hi" since this is what the joined language name also starts with.
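One way to keep the groups sorted by language code while placing the Urdu group directly after the Hindi one is a sort key with an override table. This is only an illustrative sketch; `SORT_AS`, `group_sort_key` and `sorted_language_groups` are hypothetical names and not taken from the tool.

```python
from typing import Dict, List, Tuple

# hypothetical override: sort the "ur" group at the position of "hi"
SORT_AS: Dict[str, str] = {'ur': 'hi'}


def group_sort_key(language_code: str) -> Tuple[str, str]:
    """Sort by the overridden code first, then by the real code as a tie-breaker."""
    return (SORT_AS.get(language_code, language_code), language_code)


def sorted_language_groups(language_codes: List[str]) -> List[str]:
    return sorted(language_codes, key=group_sort_key)
```

Because the real code is the second element of the key, `hi` still comes before `ur` within the shared position.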

Lucas Werkmeister (talkcontribs)

Regarding the template identifiers, I think it would make more sense to put the language code at the end (e.g. hindustani-noun-masculine-ur) – I think that would match the “more general parts first” guideline that I generally follow for the identifiers, for example:

  • hindustani-noun-masculine-hi
  • hindustani-noun-masculine-ur – still a masculine noun in Hindustani, only the language code changed
  • hindustani-noun-feminine-hi – still a noun but now feminine
  • hindustani-adjective-red-hi – still Hindustani but no longer a noun

Though it’s not totally clear (hi/ur is a bit more “orthogonal” to the other “dimensions” than usual, perhaps). But on a more practical note, it might make editing the URL easier to toggle between the Hindi and Urdu templates ^^ [edit: What do you think?]

And a more straightforward thing: the last two forms (the optional ones) in the Hindi version of the feminine noun template have masculine (Q499327) instead of feminine (Q1775415) as the grammatical feature, is that intentional or a copy+paste mistake? (I suspect the latter, since the Urdu version has feminine (Q1775415) for all six forms.)

عُثمان (talkcontribs)

That makes sense regarding the language codes, I have changed the identifiers accordingly. And yes, that was a copy paste mistake on the Hindi feminine vocative forms, thanks for catching that

Lucas Werkmeister (talkcontribs)

Alright, thanks! Then I’ll start transcribing the templates (and hopefully not notice anything else amiss).

Lucas Werkmeister (talkcontribs)
Lucas Werkmeister (talkcontribs)

So far it seems to me like the identifier might as well use “declinable”/“indeclinable”/“comparable” – or are there going to be more adjective templates?

عُثمان (talkcontribs)

It's more common within in-language sources, but these categories are also brought up in Hindi: An Essential Grammar. The names are sort of idiomatic in that they use an adjective of the type they refer to - the common word for red "lal" is invariant, while the word for black "kala" changes for gender, number, and case. Grammars of the other northwestern Indic languages like Punjabi, Hindko, Saraiki, etc. have adopted the same pattern for adjective naming so it is also consistent with that (Saraiki for example has "unfast" adjectives which change for gender but not number, named for an adjective for unfast dyes which does this.)

The reason for not using a more generic label like declinable/indeclinable is there are other types of declinable adjectives which decline along a pattern other than that of "black" (changing just for case and nothing else, for example). There are too few of these in common use in Hindustani to warrant their own templates however.

عُثمان (talkcontribs)

I haven't created a lexeme for the Hindustani cognate yet, but this would be an example of what I mean by an adjective which declines along a different paradigm - ਬਾਕੀ/باقی (L1037483). ਹੱਬਾ/ہبّا (L985132) has the same number of forms but the endings are different. So I think these types of adjectives are too idiosyncratic to have their own templates.

Lucas Werkmeister (talkcontribs)

Google isn’t letting me read this book (“You have reached your viewing limit for this book”, even in a private window), so I’ll have to take your word for it :D thanks! I wasn’t too fazed by red/black (I think I’ve seen something similar in templates for another language, though I don’t remember which one), but handsomest felt a bit stranger ^^

عُثمان (talkcontribs)

That one can be changed to comparable if that sounds less silly, there isn't a competing comparable adjective paradigm so I only chose that one to match the scheme of red / black.

Lucas Werkmeister (talkcontribs)

It’s alright… but why “handsomest” and not just “handsome”?

عُثمان (talkcontribs)

Part of why the color names are used for the other adjectives is that it is apparent from the label what type it is - so for "black adjective" it becomes "kali sifat" if we use "sifat" a feminine word for adjective, or "kala gun" using a masculine word. "Handsomest" indicates the existence of a superlative form in the same way, and English has comparable adjectives but not gendered ones so it is possible to carry that idea over in the calque.

Lucas Werkmeister (talkcontribs)

Alright, that sounds sensible enough to me. Thanks!

Lucas Werkmeister (talkcontribs)

The templates for nouns, adjectives and adverbs should be deployed now – the verbs will need more time, I’m afraid (I’ll try to start them tomorrow but might need more than one evening to finish them).

عُثمان (talkcontribs)

Fantastic, thank you - it took me several evenings to assemble the verb templates, so that is understandable

عُثمان (talkcontribs)

I've made a couple adjustments to these templates - User:Mahir256 pointed out I had used two different spellings of "Hindustani" in Devanagari without realizing it, so I have updated them all to हिंदुस्तानी which seems to be the most common spelling in writing as opposed to हिंदुसतानी (the more common spelling apparently differs from the Urdu spelling and from the pronunciation, but that seems to be expected in this context). I have also updated the items for the transitivity values on the verbs as User:Nikki has started making an effort to shift all of these to new items describing transitivity as a property of verbs instead of the items for the types of verbs themselves.

Lucas Werkmeister (talkcontribs)

I think some of the verb template identifier components should perhaps be switched around too:

  • hindustani-verb-basic-transitive-ur → hindustani-verb-transitive-basic-ur
  • hindustani-verb-additive-transitive-ur → hindustani-verb-transitive-additive-ur
  • hindustani-verb-causative-ur is fine, but mentioning for context
  • hindustani-verb-double-causative-ur → hindustani-verb-causative-double-ur

This way, the first two templates still share the property of being transitive, and the last two also have the causative element in common. What do you think? (And the same would apply to the hi templates too, of course.)

عُثمان (talkcontribs)

So it's slightly counter-intuitive, but the "additive transitive" phase is actually a property of intransitive verbs - this "verb phase" model comes from John Beames's work A Comparative Grammar of the Modern Aryan Languages of India (Q113330708). There is a regular pattern in Hindustani (and Punjabi and the other northwestern Indic languages) where most intransitive verbs can be made to take additional objects. (The inverse used to be true in Hindustani too, but this feature has been dropped over time - the Punjabi templates I am working on also have the "subtractive" phases which allow intransitive forms of transitives, or even avalent forms of intransitives.) The "basic" phase describes the base form of the verb (intransitive or transitive). The "additive" phases then describe extensions of either type of base. If written out fully, the categories would be:

  • verb-basic-intransitive
  • verb-basic-transitive
  • verb-additive-transitive
  • verb-additive-causative
  • verb-additive-double-causative

However, since the intransitive phase can only be a "basic" form, there is no need to distinguish this, and likewise causative forms are only "additive" as there are no verbs with a causative base form. Only the transitive extension of intransitives needs to be distinguished from the "base" transitive template. The idea behind these templates is that it is a lot easier to enter the forms for these if broken up into smaller chunks rather than having one massive template with ~200+ fields, especially since not every verb goes up to double causative - the additive-transitive and causative / double-causative template would be used in edit mode to add these forms to existing lexemes. I think it would be fine to switch the order for double causative to causative double, but the others seem like they might cause additional confusion in this context.

If it would be helpful to look at an example of what I mean, गड़ना/گڑنا (L991835) is a fully modeled Hindustani intransitive verb, and on the talk page of that lexeme there is an explanation of how this verb phase information can be used in the context of larger verbal expressions.

Lucas Werkmeister (talkcontribs)

Okay, I see… tbh, I think I’d prefer to include the “redundant” parts in the identifier in that case? As in your fully written out version, except probably still with causative-double too. I feel like this makes the relationship between the templates a bit clearer. But I’m also okay with keeping the current identifiers if you prefer that.

عُثمان (talkcontribs)

OK, that's fine with me; I have updated the identifiers accordingly. Broadly speaking, the two types of template available can be grouped as the "basic" ones and the "additive" ones so I can see how that would be less confusing.

Lucas Werkmeister (talkcontribs)

Alright, thanks! The basic-intransitive templates are up now (hi, ur), the rest will follow.

Lucas Werkmeister (talkcontribs)

basic-transitive also up now (hi, ur).

Lucas Werkmeister (talkcontribs)

additive-transitive too (hi, ur). I’m starting to figure out how to efficiently transcribe them ^^

عُثمان (talkcontribs)

Nice, I appreciate your efforts on this since I know these are a lot. So far these have worked great

عُثمان (talkcontribs)

In trying these out some more, I noticed I propagated a slightly hairy mistake across Form 30 specifically in each of the Urdu verb templates. I've updated the template wikitext, but to highlight the changes, these are the corrected strings which should be in the brackets in the example sentences for Form 30 on each of them:

  • basic-intransitive:
    • پھَیلےگی
  • basic-transitive:
    • دھارےگی
  • additive-transitive:
    • پھَیلائےگی
  • additive-causative:
    • کھِلائےگی
  • additive-causative-double:
    • کھِلائےگی

I don't remember what I did that might have resulted in this, but thankfully it looks like there are no issues with any of the other forms or in the Hindi versions of this form upon second review.

Lucas Werkmeister (talkcontribs)

Alright, that fix should be deployed now. Thanks!

Lucas Werkmeister (talkcontribs)

additive-causative also done (hi, ur).

Lucas Werkmeister (talkcontribs)

additive-causative-double also done (hi, ur) – that should be everything now \o/

عُثمان (talkcontribs)

Amazing, thank you! This opens up a lot of possibilities.

عُثمان (talkcontribs)

I spotted another typo I made - on `hindustani-verb-basic-intransitive-ur` Form 1, the bracketed example should be پھَیلنا (I just fixed this on the wiki page).

Lucas Werkmeister (talkcontribs)

Alright, the fix should be deployed now.

Reply to "Hindustani templates ready"
عُثمان (talkcontribs)

@Lucas Werkmeister: Here are some Hindko noun templates I put together at Wikidata:Wikidata Lexeme Forms/Hindko, for masculine and feminine nouns. These should be simpler than the Punjabi ones, as only one script is involved.

(Only the first two are complete; I am adding WIP verb templates underneath, but these will take some time.)

عُثمان (talkcontribs)

Hindko is not available on TranslateWiki for the interface messages. As Punjabi is mutually intelligible, using the pnb strings would be acceptable, but I could also provide Hindko adaptations of these to insert manually if that is preferable.

Lucas Werkmeister (talkcontribs)

Thanks! In the noun templates, the order of grammatical features seems a bit inconsistent (e.g. singular/plural moves to the end of the list in the 4th form), and IIRC this order is kept when the lexemes are saved and viewed later, so it would be nice to have it consistent – can you move them around? (I’d probably go for case, then singular/plural, then masculine, but I’m not sure.)

Regarding the translations, it would definitely be easier for me to just use the pnb translations, but I don’t know how disruptive that would be to tool users… if you want, I can ask some translatewiki people if it would be possible to add the language, or if it would disturb the bot that transfers the messages (FuzzyBot IIRC?) if we added the messages manually to the tool’s source code.

عُثمان (talkcontribs)

Sorry about that, I was scatterbrained in adding the features. I went with gender-case-number, since when selecting adjective/verb/postposition forms to use for a noun, that is the order that makes sense to follow.

Later tonight, I'll have a look at the strings and assess how feasible it is to tweak the translations in a way that is grammatical in both languages. It is hard to say off the top of my head; it depends on whether there are enough overlapping constructions to suit the context.

عُثمان (talkcontribs)

@Lucas Werkmeister: I have now rewritten the pnb translations, checking alongside the Gandhara Hindko Board dictionary, and managed to make them work in a way that is valid in both languages, so you can just use those. This tool is specific enough in scope that it is possible to avoid many constructions without coming across as awkward; adding Hindko to TranslateWiki is likely suited to a larger project at a later time.

Lucas Werkmeister (talkcontribs)
عُثمان (talkcontribs)

Looks good, thank you! تحصِیل (L743917), جُرُم (L743913)

The grammatical gender properties at the top of the lexeme did not show, though.

Also, I made a mistake in the feminine template and put the diacritic in the placeholder under the wrong letter; it is تحصِیل not تحِصیل. I was zoomed out too far when I typed that, sorry. It is corrected for each form in the wiki template now.

Lucas Werkmeister (talkcontribs)

Alright, should be fixed now. Thanks!

عُثمان (talkcontribs)

@Lucas Werkmeister: As Hindko is now available on TranslateWiki, I have moved the translations for this to the hno code with some adjustments now that the messages do not have to be shared.

Lucas Werkmeister (talkcontribs)

Alright, those translations should now be used. Sorry for the delay.

عُثمان (talkcontribs)

No worries, and thank you!

عُثمان (talkcontribs)

@Lucas Werkmeister: And now the code hno is available in MediaWiki and usable with lexemes, so these templates can be changed to use that.

Lucas Werkmeister (talkcontribs)

Thanks, should be deployed now.

عُثمان (talkcontribs)

Thanks!

Lucas Werkmeister (talkcontribs)

FYI, I just deployed a change to move the Hindko templates between the Hindustani and Croatian ones, so they’re sorted by language code (hi, hno, hr) – I should’ve done that earlier. Just in case you’re wondering where they went in the list on the main page ^^

Reply to "Hindko noun templates ready"