Wikidata:Lexicographical data/Documentation/Languages/fr
Context
French (Q150) is spoken in France (Q142), Belgium (Q31), Switzerland (Q39) and Canada (Q16) (Quebec (Q176)).
Corresponding language codes:
- fr for "standard" French
- there is no fr-CH yet; use fr-x-Q39 (Switzerland (Q39)) instead
- fr-x-Q486561 (orthographic corrections of French in 1990 (Q486561)) for modern orthography
SELECT ?languageCode (COUNT(?lexeme) AS ?count) WHERE {
?lexeme dct:language wd:Q150 ; wikibase:lemma ?lemma .
BIND(LANG(?lemma) AS ?languageCode)
}
GROUP BY ?languageCode
ORDER BY DESC(COUNT(?lexeme))
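The language codes returned by this query can be split into a base language and an optional variant item. The following is a minimal Python sketch, assuming the codes are either a plain code such as fr or a private-use code of the shape fr-x-Q39; the names `LemmaLanguage` and `parse_lemma_language` are hypothetical and not part of any Wikidata tool.

```python
import re
from typing import NamedTuple, Optional


class LemmaLanguage(NamedTuple):
    base: str                     # e.g. "fr"
    variant_item: Optional[str]   # e.g. "Q39", or None for a plain code


def parse_lemma_language(code: str) -> LemmaLanguage:
    """Split a lemma language code into its base and its variant item.

    Assumption: variants are written "<base>-x-Q<digits>".
    Anything else after "-x-" is ignored and reported as no variant.
    """
    base, separator, private = code.partition("-x-")
    if separator and re.fullmatch(r"Q[1-9][0-9]*", private):
        return LemmaLanguage(base, private)
    return LemmaLanguage(base, None)
```

For example, `parse_lemma_language("fr-x-Q486561")` yields the base `fr` with the variant item `Q486561`, which can then be looked up as an ordinary item.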
Lexical categories
- noun (Q1084)
- adjective (Q34698)
- verb (Q24905)
- proper noun (Q147276)
- adverb (Q380057)
- preposition (Q4833830)
SELECT ?lexCat ?lexCatLabel (COUNT(?lexeme) AS ?count) WHERE {
?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory ?lexCat .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?lexCat ?lexCatLabel
ORDER BY DESC(COUNT(?lexeme))
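Queries like the one above can also be sent from a script rather than pasted into the query editor. The sketch below only builds the request URL, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its `query` and `format` parameters; the helper name `build_query_url` is hypothetical, and sending the request is left to whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service endpoint (not named on this page).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The lexical category query from this section, copied as-is.
LEXICAL_CATEGORY_QUERY = """
SELECT ?lexCat ?lexCatLabel (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory ?lexCat .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?lexCat ?lexCatLabel
ORDER BY DESC(COUNT(?lexeme))
"""


def build_query_url(sparql: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    parameters = {"query": sparql.strip(), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(parameters)
```

Building the URL separately keeps the query text reusable and makes it easy to check what will be sent before any network call is made.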
French nouns have two grammatical genders: masculine (Q499327) and feminine (Q1775415).
A few rare exceptions exist, most famously the three nouns orgue (L471), amour (L1021) and délice (L15976), which are masculine in the singular and feminine in the plural.
In some cases dictionaries disagree on the gender, for example après-midi (L25740).
Occupation nouns raise an open question: the masculine is sometimes (perhaps in an old-fashioned way) treated as the neutral or generic form, and for some nouns the masculine and the feminine are identical, for example pirate (L24230) and géologue (L621684).
SELECT ?genre ?genreLabel (COUNT(?l) AS ?nb) WHERE {
?l dct:language wd:Q150 ; p:P5185/ps:P5185 ?genre .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?genre ?genreLabel
ORDER BY DESC(?nb)
Two grammatical numbers: singular (Q110786), plural (Q146786).
Some nouns are invariable, but we still need a separate form for each number.
For verbs, the infinitive should be used as the main lemma (in lowercase: lemmas are not titles).
Forms
Forms vary depending on several grammatical features:
- grammatical mood (Q184932): infinitive (Q179230), participle (Q814722), indicative (Q682111), subjunctive (Q473746), conditional (Q625581), imperative (Q22716), ...
- grammatical tense (Q177691): present tense (Q192613), past tense (Q1994301), preterite (Q442485), imperfect (Q108524486), simple future (Q1475560), ...
- grammatical person (Q690940): first person (Q21714344), second person (Q51929049), third person (Q51929074)
- grammatical number (Q104083): singular (Q110786), plural (Q146786)
- grammatical gender (Q162378): feminine (Q1775415), masculine (Q499327)
- grammatical aspect (Q208084)
- voice (Q211101)
Note: conditional (Q625581) is sometimes considered part of indicative (Q682111); in Wikidata, we keep it as a mood of its own.
On Wikidata, it is proposed to fill in only non-obvious forms (for instance, compound tenses and the gerund would not be filled in). In the general case, this gives 51 forms per verb:
- infinitive (Q179230) (1)
- participle (Q814722)
  - present tense (Q192613) (1)
  - past tense (Q1994301) (4, combining singular (Q110786) / plural (Q146786) and feminine (Q1775415) / masculine (Q499327))
- indicative (Q682111)
  - present tense (Q192613) (6*)
  - preterite (Q442485) (6*)
  - imperfect (Q108524486) (6*)
  - simple future (Q1475560) (6*)
- subjunctive (Q473746) (12, i.e. 6* each for the present and the imperfect, as implied by the total of 51)
- conditional (Q625581) (6*, present only, as implied by the total of 51)
- imperative (Q22716)
  - present tense (Q192613) (3**)
With:
* combining first person (Q21714344) / second person (Q51929049) / third person (Q51929074) and singular (Q110786) / plural (Q146786)
** second person singular, first person plural and second person plural
Some verbs, called defective verbs (defective verb (Q2721259)), have fewer forms, like pleuvoir (L1917). Other verbs have more forms, like payer (L10770), which accepts two spellings for some forms (je paie / je paye).
Grammatical features must use atomic values listed above (for instance first person (Q21714344) and singular (Q110786) instead of first-person singular (Q51929218)).
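The count of 51 forms can be checked by enumerating the combinations of atomic features. This is a minimal Python sketch, not an official tool: the function name `expected_verb_forms` is hypothetical, the labels stand in for the corresponding items, and the tenses used for the subjunctive (present, imperfect) and the conditional (present) as well as the imperative persons (second person singular, first person plural, second person plural) are assumptions based on standard French conjugation.

```python
from itertools import product

PERSONS = ["first person", "second person", "third person"]
NUMBERS = ["singular", "plural"]
GENDERS = ["feminine", "masculine"]

# Tenses filled in for each fully conjugated mood.
# Assumption: subjunctive and conditional tenses follow standard French grammar.
CONJUGATED_TENSES = {
    "indicative": ["present tense", "preterite", "imperfect", "simple future"],
    "subjunctive": ["present tense", "imperfect"],
    "conditional": ["present tense"],
}

# Assumption: the imperative exists only for these three person/number pairs.
IMPERATIVE_PERSONS = [
    ("second person", "singular"),
    ("first person", "plural"),
    ("second person", "plural"),
]


def expected_verb_forms():
    """List the feature combinations of a regular verb, one tuple per form."""
    forms = [("infinitive",), ("participle", "present tense")]
    # Past participle: one form per number and gender.
    forms += [("participle", "past tense", n, g) for n, g in product(NUMBERS, GENDERS)]
    # Conjugated moods: one form per tense, person and number.
    for mood, tenses in CONJUGATED_TENSES.items():
        for tense, person, number in product(tenses, PERSONS, NUMBERS):
            forms.append((mood, tense, person, number))
    forms += [("imperative", "present tense", p, n) for p, n in IMPERATIVE_PERSONS]
    return forms
```

Each tuple holds only atomic features (a person and a number as two separate values), which matches the rule that combined values such as first-person singular (Q51929218) should not be used.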
Groups
French has three conjugation groups. The group of a verb is stored in conjugation class (P5186):
- conjugation of Group I French verbs (Q2993354)
  - all verbs ending in -er
  - exception: aller (L750) (and some double verb (Q54595814))
- conjugation of Group II French verbs (Q2993353)
  - all verbs ending in -ir whose present participle ends in -issant
  - includes haïr (L17358)
- conjugation of Group III French verbs (Q2993358)
  - all other verbs (all more or less irregular)
#title: French verbs by group
SELECT ?group ?groupLabel (COUNT(?lexeme) AS ?nb) WHERE {
?lexeme dct:language wd:Q150 ; wdt:P5186 ?group .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?group ?groupLabel
ORDER BY ?groupLabel
Notes on filling groups on Wikidata
- Generate QuickStatements commands for first-group verbs missing conjugation class (P5186) (the list should be empty):
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (CONCAT(SUBSTR(STR(?lexeme), 32), ',Q2993354') AS ?qs) {
?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
FILTER(REGEX(?lemma, 'er$'))
}
GROUP BY ?lexeme
ORDER BY ?lemmas
- Generate QuickStatements commands for third-group verbs missing conjugation class (P5186) (the list should be empty):
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (CONCAT(SUBSTR(STR(?lexeme), 32), ',Q2993358') AS ?qs) {
?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
FILTER((!REGEX(?lemma, 'er$') && !REGEX(?lemma, '[iï]r$')) || REGEX(?lemma, 'oir$'))
}
GROUP BY ?lexeme
ORDER BY ?lemmas
- Once the previous lists are empty, all remaining verbs missing conjugation class (P5186) have to be reviewed manually and split between the second and third groups. The user script User:Envlh/FrenchLexemes.js can be useful to quickly add conjugation class (P5186) (and other properties) to a verb.
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (IRI(CONCAT('https://fr.wiktionary.org/wiki/Conjugaison:français/', ?lemmas)) AS ?wkt) {
?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
}
GROUP BY ?lexeme
ORDER BY ?lemmas
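The ending-based triage used by the three queries above can be summarised as follows. This Python sketch is illustrative only: the function names are hypothetical, the two named exceptions are taken from the group list earlier on this page, and the output row imitates the `?qs` column of the queries rather than a complete QuickStatements batch.

```python
GROUP_I = "Q2993354"
GROUP_II = "Q2993353"
GROUP_III = "Q2993358"

# Exceptions named in the group list: aller is not a first-group verb
# despite its -er ending, and haïr belongs to the second group.
KNOWN_EXCEPTIONS = {"aller": GROUP_III, "haïr": GROUP_II}


def guess_group(lemma):
    """Return the group item for a verb lemma, or None if it needs manual review."""
    if lemma in KNOWN_EXCEPTIONS:
        return KNOWN_EXCEPTIONS[lemma]
    if lemma.endswith("oir"):
        return GROUP_III          # -oir verbs are never second group
    if lemma.endswith("er"):
        return GROUP_I
    if lemma.endswith("ir"):
        return None               # second or third group: decide by hand
    return GROUP_III              # every other ending


def quickstatements_row(lexeme_id, lemma):
    """Build an 'L-id,Q-id' row like the ?qs column, or None when undecided."""
    group = guess_group(lemma)
    return None if group is None else f"{lexeme_id},{group}"
```

Verbs ending in -ir are deliberately left undecided, since telling the second group from the third requires knowing whether the present participle ends in -issant.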
Identifiers
The following query lists the identifiers used on French lexemes. As of 2024-04-07, the top three are Cordial Dictionary ID (P11178), Larousse Online French Dictionary ID (P11118) and Littré ID (P7724), with more than 10,000 uses each.
SELECT ?prop ?propLabel (COUNT(?l) AS ?number) WHERE {
?l dct:language wd:Q150 ;
?dict ?id .
?prop wikibase:directClaim ?dict .
?prop wdt:P31 wd:Q56216056 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?prop ?propLabel
ORDER BY DESC ( ?number)
Resources
- Wikidata:Lexicographical coverage/fr/Statistics (see also: Wikidata:Lexicographical data/Statistics)
- Wikidata:Lexical Masks : E164
- Last edits (month) on Lexemes in French
- Ordia about French (Q150)
- Publications about French (Q150) in Scholia
- SQL query
- Properties linking to online dictionaries (list with usage statistics):
  - TLFi ID (P7722) https://w.wiki/5UXA
  - Littré ID (P7724) https://w.wiki/5RcQ
  - Dico en ligne Le Robert ID (P10338) https://w.wiki/5RcN
  - Bob ID (P7766)
  - Dictionnaire de l'Académie française ID (9th edition) (P7732)
  - Dictionnaire électronique des synonymes ID (P7765)
  - Larousse Online French Dictionary ID (P11118)
  - Cordial Dictionary ID (P11178)