Wikidata:WikiProject Chemistry/Guidelines/Naming conventions

General remarks

This page presents general guidelines regarding the use of labels, aliases and descriptions in chemistry-related items. While some of the information here is similar to that on the help pages regarding labels, descriptions and aliases, there is also more detailed information directly related to the field of chemistry, in particular to items describing chemical entities, their groups and classes. The following rules apply to English, but it is advisable to apply them appropriately in other languages, if possible.

It is worth remembering that labels, descriptions and aliases were never intended to replace statements. They are a collection of data primarily for use within Wikidata and for Wikidata purposes, especially to facilitate the use of the search engine. Therefore, labels and aliases are not a set of all possible names, but only those that may be helpful in searching; similarly, descriptions are not intended to contain a description of a given chemical entity, but only simple information that allows the item to be distinguished from another with the same or a similar label.

Labels

Ambiguity

While labels in Wikidata can be ambiguous, chemical nomenclature makes it possible to avoid ambiguity, so it may be advisable to use a more specific name as a label than the one resulting from common usage or the name of a Wikipedia article. Other names may be added as aliases.

Although the most common name for 2-phenethylamine is phenethylamine or phenylethylamine, using the locant in the label is advisable so that it can easily be distinguished from its positional isomer, DL-1-phenethylamine (Q3560549).

A similar situation occurs in the case of glucose, where this name is usually given to the D isomer and is described as such in many language versions of Wikipedia. In Wikidata both isomers should be labelled equally, as D-glucose and L-glucose; this also allows them to be differentiated from the item describing both isomers, glucose (Q37525).

Wikidata describes reality in a different way than Wikipedia. The level of detail here is much greater than in encyclopedia articles, and something that is usually described in one article may be divided into many Wikidata items. Although salbutamol (i.e. both enantiomers and the racemic mixture) is usually described in one article, it is divided into four items in Wikidata: each enantiomer is described in its own item, one item describes the racemic mixture, and one describes the so-called structure with an undefined configuration. Therefore, it is advisable to name items describing racemic mixtures with an appropriate prefix (rac-), and to use the prefixes RS, DL, or ± for structures with an undefined configuration.

Labels should not contain disambiguation information in parentheses.

The former label for this item was D-glucose (closed ring structure, complete stereochemistry). However, the proper structure can easily be specified using chemical nomenclature: D-glucopyranose.
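The rule about parenthetical disambiguation can be checked mechanically before a label is saved. The following is a minimal sketch, not an existing Wikidata tool: the function name and the heuristic (whitespace followed by a parenthesised group at the very end of the label) are assumptions made for illustration. Parentheses inside a name, such as locants or oxidation states, are deliberately left alone.

```python
import re

# Assumed heuristic: disambiguation text is a parenthesised group at the very
# end of the label that is preceded by whitespace, e.g. "name (some remark)".
_TRAILING_REMARK = re.compile(r"\s\([^()]*\)$")


def has_disambiguation_suffix(label: str) -> bool:
    """Return True if the label ends with ' (...)' style disambiguation."""
    return _TRAILING_REMARK.search(label) is not None


if __name__ == "__main__":
    for label in (
        "D-glucose (closed ring structure, complete stereochemistry)",
        "D-glucopyranose",
        "iron(III) chloride",
    ):
        print(label, "->", has_disambiguation_suffix(label))
```

A label flagged by such a check still needs a human decision on the correct replacement name; the check only points out candidates.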

Capitalisation

Standard Wikidata rules apply here. A label should begin with a lowercase letter, except in situations in which an uppercase letter is required.

Since no rules require the use of capital letters, the labels of such items are written in lower case.

IUPAC nomenclature recommendations require that some prefixes or locants should be capitalised.

Non-standard symbols and formatting

Although IUPAC nomenclature recommendations require name formatting in certain cases (e.g. italics or smaller font), this is neither technically possible in Wikidata nor important from the point of view of the search engine. Therefore, formatting should be omitted. In the case of names with superscript or subscript text, it is advisable to find a name without such text; alternatively, two names may be added (one as a label, the other as an alias), one of which uses Unicode superscript or subscript symbols, while the other has this text written in a regular font.

Both the S locant and the D prefix should be formatted, but neither italics nor small script is technically possible in Wikidata.
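For names with superscript or subscript text, the regular-font variant can be derived from the Unicode one. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the mapping covers only Unicode superscript and subscript digits and plus/minus signs, not letters.

```python
# Explicit mapping from Unicode superscript/subscript digits and signs to
# their regular-font equivalents (letters are intentionally not covered).
_SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻"
_SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉₊₋"
_REGULAR = "0123456789+-"

_TO_REGULAR = str.maketrans(_SUPERSCRIPTS + _SUBSCRIPTS, _REGULAR + _REGULAR)


def regular_font_variant(name: str) -> str:
    """Return the name with superscript/subscript characters in regular font."""
    return name.translate(_TO_REGULAR)


if __name__ == "__main__":
    unicode_name = "N⁶-methyladenosine"
    print(unicode_name, "/", regular_font_variant(unicode_name))
```

One of the two forms would then be used as the label and the other as an alias.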

Names with non-Latin characters as locants should be added in both forms: one that uses the non-Latin letter and the other with the full name of that letter.

The item describing β-alanine has a Greek locant in its label, and also has beta-alanine as an alias.
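Producing both forms of such a name is easy to automate. The following sketch assumes a small, hand-written table of Greek letters and a hypothetical helper name; it is not taken from any existing import tool, and the table would need to be extended for other letters or scripts.

```python
# Partial, illustrative table of Greek locants and their spelled-out names.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ε": "epsilon",
    "ω": "omega",
}


def name_forms(name: str) -> list[str]:
    """Return the name as given plus, if different, its spelled-out form."""
    spelled_out = name
    for letter, word in GREEK_NAMES.items():
        spelled_out = spelled_out.replace(letter, word)
    return [name] if spelled_out == name else [name, spelled_out]


if __name__ == "__main__":
    print(name_forms("β-alanine"))
```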

Grammatical number

Labels of items that describe groups or classes of chemical entities should be written in the singular. Plural names should be added as aliases.

Both items have the same label, but describe different concepts: one describes a specific chemical entity, the other a class of chemical entities. The distinction can be made here based on their descriptions.

Names imported from external sources

Most labels in Wikidata items come from external sources. However, names present in such sources are not always correct. Sometimes incorrect names are propagated through many sources until they finally reach Wikidata. This occurs especially with similar stereoisomers, of which only one spatial arrangement is correctly referred to by a given name. Such situations require checking against reliable sources, primarily scientific articles that describe the spatial configuration of the specific chemical entity.

This item had the label siamenoside I, imported from the PubChem database. However, it is a backbone structure with 30 stereocentres of undefined configuration and as such is considered a group of stereoisomers in Wikidata. The correct structure of siamenoside I, with a defined configuration of all stereocentres, is described under siamenoside I (Q3482900). The proper label for this item would be its systematic name; however, that name is too long to be added as a label.

Character limit of labels

Due to the specificity of chemical nomenclature, some names may be too long and exceed the permissible limit of 250 characters. If no shorter and correct name exists for the structure, the InChIKey is added as the English label and used as a temporary name.
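The fallback described above can be sketched as a small selection function. The function name and its inputs are hypothetical: the candidate names are assumed to be already verified as correct and ordered by preference, and the InChIKey in the usage example is a placeholder, not a real identifier.

```python
MAX_LABEL_LENGTH = 250  # character limit for labels stated in this guideline


def choose_english_label(candidate_names: list[str], inchikey: str) -> str:
    """Pick the first candidate that fits the limit, else fall back to the InChIKey.

    Assumes candidate_names contains only correct names, in order of preference.
    """
    for name in candidate_names:
        if len(name) <= MAX_LABEL_LENGTH:
            return name
    return inchikey  # temporary label until a shorter correct name is found


if __name__ == "__main__":
    too_long = "x" * 300  # stands in for an overly long systematic name
    placeholder_key = "AAAAAAAAAAAAAA-BBBBBBBBBB-N"  # placeholder, not a real key
    print(choose_english_label([too_long], placeholder_key))
```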

Aliases

Most of the rules that apply to labels also apply to aliases. Importantly, aliases are other names for the described concept and must strictly refer to it. These cannot be the names of similar or related concepts.

Names of derivatives or other forms

Even though Wikipedia sometimes describes some derivatives of a given compound or the forms in which it occurs within one article, in Wikidata they will usually be described in different items. Therefore, it is important that aliases only apply to the concept described.

This item had butyrate, butanoate, propanecarboxylate and a few other aliases added that refer to its deprotonated form. This form, however, is described in butyrate (Q55582441), and such names are valid only in that item. Such forms are linked via a pair of properties: conjugate acid (P4147) and conjugate base (P4149).

At some point in the past, calcium chloride dihydrate was added to this item as an alias. As this item describes the anhydrous form of calcium chloride, that name is valid only for calcium chloride dihydrate (Q29207042). Hydrates are linked to their anhydrous form using the hydrated form of (P4770) property.

Although the difference may be small from the pharmacological point of view, the active substance and its salt are described in separate items. Hence doxorubicin hydrochloride is not a valid alias for this item; this salt is described under doxorubicin hydrochloride (Q27032359) and linked to its parent form using parent form of an active substance (P12099).

Acronyms, formulae, identifiers

Adding acronyms as aliases is allowed, but care must be taken to ensure that they are correct and that they refer to the concept described, not its other forms. Use of short name (P1813) in items describing chemical entities has not been discussed.

Chemical formulae may be added when their presence may in any way help in finding a given chemical entity using a search engine, i.e. entering them manually is so simple and obvious that someone can realistically search for a compound using such a formula. Adding chemical formulae that are identical for many chemical compounds makes no sense. Chemical formulae should always be added using chemical formula (P274).

External database identifiers should not be added as aliases. Instead, such identifiers should be added using the appropriate properties.

Names of commercial products

Commercial products are usually not pure chemical substances (which are what Wikidata items describe) but mixtures of various chemical compounds. Therefore, their names should not be added as aliases. This applies primarily to the names of pharmaceutical products, which are described in separate Wikidata items; their names are correct only in those items.

This item had Zoloft as an alias. Zoloft is a pharmaceutical product that contains sertraline hydrochloride as its active ingredient alongside non-active ingredients, so chemically it is a mixture. This product is described under Zoloft (Q47522126), which should be linked to sertraline (Q407617) through sertraline hydrochloride (Q27108281) using parent form of an active substance (P12099), active ingredient in (P3780) and has active ingredient (P3781).

Descriptions

The general rules regarding descriptions from the relevant help page apply here. Descriptions are intended solely for Wikidata purposes, namely to disambiguate the concept described, and are not intended to be definitions, just short phrases; the concept described in the item is defined only by the statements. Overly long descriptions may even be confusing, making it difficult to quickly grasp the differences between items. Moreover, the description should refer to the concept described in Wikidata, not the one in the Wikipedia article, as the two do not always correspond 1:1.

For these reasons, descriptions such as chemical compound or group of isomers are completely sufficient in most cases. Describing uses, properties or functions of a chemical entity is unnecessary and therefore not recommended.

Descriptions of classes of chemical entities are a temporary exception to this. They contain a fairly concise description characterizing the members of such a class. This makes it easier to manually classify items into chemical classes until solutions like Wikidata usage instructions (P2559) are better supported in Wikidata.

Notes regarding automatic import of names

Before any large import of names, it is advisable to discuss the planned import on the main WikiProject discussion page.

Bot operators and users of automatic or semi-automatic Wikidata editing tools should be mindful of the following:

  1. do not automatically change names that have been manually curated,
  2. do not assign names in one language to a different language (databases do not always distinguish properly between names in different languages),
  3. do not add names that differ only in capitalization from the existing ones,
  4. do not add names that do not directly refer to a given concept – the fact that a name appears in the corresponding entry in the database does not mean that this name should also be included in Wikidata.

All of the above points stem from errors made during previous data imports, which required, or still require, a lot of work to fix.
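Of the points above, the one about capitalization is the simplest to enforce in code. The following is a minimal sketch of such a filter, assuming a hypothetical helper that receives the names already present on an item (label and aliases in one language) and the names proposed by an import. It does not address the other three points, which need curation history, language detection, and human judgement respectively.

```python
def filter_new_names(existing: list[str], candidates: list[str]) -> list[str]:
    """Drop candidates that differ only in capitalization from a known name.

    'Known' covers both the existing names and candidates accepted earlier
    in the same batch; the order of accepted candidates is preserved.
    """
    seen = {name.casefold() for name in existing}
    accepted = []
    for name in candidates:
        key = name.casefold()
        if key in seen:
            continue  # same name apart from letter case: skip it
        seen.add(key)
        accepted.append(name)
    return accepted


if __name__ == "__main__":
    print(filter_new_names(["D-glucose"], ["d-glucose", "dextrose", "Dextrose"]))
```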