Wikidata:Wikidata Lexeme Forms/Transcribing templates

These are some rough and incomplete instructions for transcribing new templates for languages where at least one template already exists. They’re very specific to my (Lucas Werkmeister’s) setup, and are meant more for my own future reference, but if I get hit by a bus or something, perhaps they’re useful to someone else as well.

  • one-time setup: add User:Lucas Werkmeister/Wikidata Lexeme Forms.css (or your own version; generated with python entity_ids) to your common.css
  • before doing anything else: read through the template and check that it makes sense, so you don’t waste a bunch of time transcribing a template that will have to be revised later (minor corrections can be ported later, but for major changes it’s too risky in my opinion, so that would require an almost complete repeat of this work)
  • open the template page in a browser window on one screen
  • edit templates.py in Emacs in a terminal window on the other screen
  • go to the last existing template of that language
  • copy that whole template below itself
  • adjust the comment if the new template was contributed by a different user
  • replace the identifier
  • replace the label
  • replace the lexical category item ID (see below)
  • empty the forms
  • for each form
    • (if the labels are wrapped in <q> elements, omit the single quotes surrounding the label value, since it’ll be pasted with double quotes anyways)
    • type { Enter 'lab M-/ ': '
    • (if there was an <hr> above this form, add 'section_break': True, before the label)
    • copy label (triple-click)
    • paste into terminal – this may paste with double quotes and/or four extra spaces, don’t worry about those for now
    • type ', Enter 'exa M-/ ': '
    • copy example (triple-click)
    • paste into terminal – ditto
    • type ', Enter 'gra M-/ ': ['
    • enter the grammatical feature item IDs (see below)
    • type '], Enter }, Tab Enter
  • replace all " with ', and then replace every ' that is followed by the four extra spaces from pasting with a plain ', taking note of the Replaced n occurrences message (should match the number of forms)
  • run make to find the almost inevitable syntax errors, then fix them (the Emacs command to go to a line by line number is M-g M-g)
  • run the tool and look at the template in the browser
  • search for a similar commit in the log and copy the hash
  • git add -p templates.py
  • git commit -c PASTE --reset-author
  • update the language, lexical category and author in the commit message with global search-and-replace (:%s/old/new/g)
  • :x
  • git push
  • git push github
  • (usually in a new tab) exec ssh toolforge
  • exec become lexeme-forms
  • cd www/python/src/
  • git pull
  • kubectl rollout restart deployment lexeme-forms
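
For reference, the per-form typing steps above should leave entries shaped roughly like the following. This is only a sketch: the key names 'label' and 'example' are inferred from the abbreviations that M-/ expands, the label and example texts are made up, and the ID strings are placeholders standing in for the entity_ids variables. The incomplete_forms helper is hypothetical and not part of the tool; it is just a quick way to spot a form where a step was skipped.

```python
# Sketch only: the forms of one template after transcription.
# Texts and IDs are placeholders, not taken from any real template.
forms = [
    {
        'label': 'nominative singular',
        'example': 'This is the [thing].',
        'grammatical_features_item_ids': ['Q_NOMINATIVE', 'Q_SINGULAR'],
    },
    {
        'section_break': True,  # there was an <hr> above this form
        'label': 'nominative plural',
        'example': 'These are the [things].',
        'grammatical_features_item_ids': ['Q_NOMINATIVE', 'Q_PLURAL'],
    },
]

REQUIRED_KEYS = {'label', 'example', 'grammatical_features_item_ids'}


def incomplete_forms(forms):
    """Hypothetical helper: indexes of forms missing one of the three typed keys."""
    return [i for i, form in enumerate(forms) if not REQUIRED_KEYS.issubset(form)]


print(len(forms), incomplete_forms(forms))
```

The number of forms printed here is the same number that the Replaced n occurrences message is compared against.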

Entity IDs (property IDs, language item IDs, lexical category item IDs, grammatical feature item IDs, or statement value item IDs) are usually specified as variables from one of the modules in entity_ids/; the user CSS mentioned earlier will show you the right variable name, and you can use M-/ to autocomplete the variable names. M-/ with no input autocompletes based on the previous buffer contents, which also helps: in a long list of grammatical features where only the last few change between most forms, repeatedly typing M-/ SPC after 'grammatical_features_item_ids': can save a lot of typing (the commas will be inserted automatically).
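
As a rough illustration of why named variables help, the sketch below uses a stand-in namespace in place of a real entity_ids/ module. The namespace name, the variable names and the order of the features are assumptions, and the Q-IDs should be checked against Wikidata before being relied on.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a module in entity_ids/; real names may differ.
# Q-IDs are given from memory and should be verified on Wikidata.
features = SimpleNamespace(
    nominative='Q131105',
    accusative='Q146078',
    singular='Q110786',
    plural='Q146786',
)

# Only the last feature changes between neighbouring forms, which is the
# situation where repeating M-/ SPC re-enters most of the previous list.
forms = [
    {'label': 'nominative singular',
     'grammatical_features_item_ids': [features.nominative, features.singular]},
    {'label': 'nominative plural',
     'grammatical_features_item_ids': [features.nominative, features.plural]},
    {'label': 'accusative singular',
     'grammatical_features_item_ids': [features.accusative, features.singular]},
]

for form in forms:
    print(form['label'], form['grammatical_features_item_ids'])
```

Reading features.nominative in a diff is easier to review than a bare ID string, which is presumably the point of keeping the IDs in named variables.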

Depending on the template, Emacs registers can also be useful: C-x r s LETTER to save to register LETTER, C-x r i LETTER to insert register LETTER. For instance, a common 'statements': line may be saved in registers s/S, and the line 'optional': True, in register o.

New entity IDs may have to be added to the appropriate module first; for new language item IDs, also add a corresponding language_Name variable in templates.py, similar to the existing variables there.