Wikidata talk:WikiProject Biography

Ethically complicated WikiProject

WikiProject Biography is the single largest WikiProject in every Wikipedia where it exists. In English Wikipedia, for example, there are about 6 million articles, about 1 million of which are biographies. Likewise, in Wikidata there are many people curating the data profiles of people, and thereby making biographies of those people in the form of data.

Wikipedia was established in 2001. WikiProject Biography in English Wikipedia was established in 2002. Wikidata was established in 2012. While there have been WikiProjects in Wikidata somewhat related to biographies for some years, I and some others are only establishing this WikiProject now, in 2022. It is hard for me to survey the Wikidata editorial community to ask why no one has already created this WikiProject, but by looking at WikiProject Biography (Q4913761), I see that there are already 34 language versions of this project for Wikipedias, and I know that many people expect that a WikiProject like this must exist in Wikidata. I think no one has yet established this WikiProject because anyone who thinks about community organization in this space immediately recognizes that modeling individuals as data 1) is problematic and invites social and ethical discussion and 2) is already happening at a high rate in Wikidata and other online platforms. Creating a WikiProject Biography would be unlikely to speed the curation of data in Wikidata, and there are not going to be easy answers for explaining how this data curation works; this is why I think there is not already a WikiProject.

I personally do editing with Wikidata:WikiProject LGBT, Wikidata:WikiProject Higher education, and Wikidata:WikiProject Source Metadata, so I am encountering enough issues related to data about people that I think establishing this WikiProject is the correct thing to do right now. I am not sure what issues will arise or when, but I think many will arise. Bluerasberry (talk) 18:31, 26 May 2022 (UTC)

Other similar projects

Here are other existing projects which have some overlapping scope with this one.

Bluerasberry (talk) 20:45, 26 May 2022 (UTC)

Cinemaazi film & person IDs

I have created property proposals for the Cinemaazi film ID & Cinemaazi people ID for Indian Cinematic works and people, including for regional language works.

Apologies for adding this here, but the "Ping" function is not working correctly.

Kindly requesting any comments or concerns about the property proposals. Wallacegromit1 (talk) 21:55, 30 October 2022 (UTC)

@Wallacegromit1: I supported. Proposals for identifiers are a better fit at Wikidata:WikiProject Biographical Identifiers.
I think I fixed the ping also. Bluerasberry (talk) 15:51, 3 November 2022 (UTC)
@Bluerasberry Thanks! Wallacegromit1 (talk) 05:48, 4 November 2022 (UTC)

Athlete and model requests deletion for personal privacy

This image does not depict the person discussed. This is a case about surfing and sport in general.

To protect the person's privacy I want to describe a case in general terms. The issue is recurring.

A user writes into meta:Volunteer Response Team asking for removal of their Wikidata entry and also Wikipedia articles. They note that their Wikidata entry includes name, place of birth, date of birth, and height. Other information is available that they do not mention, including weight. They ask for privacy. Right now it is fairly uncommon for people to write in about Wikidata, as the general public is often unaware of Wikidata and only sees Wikipedia.

This person was a professional surfer some years ago but I see no recent records of competition. Some of their sponsors include major brands for swimsuits and professional level athletic gear. They ranked highly in multinational athletic competition. For a sport like surfing, and when there is documentation that their sponsors are swimsuit brands, it does not seem surprising to me that media would report the information I see in Wikidata. In addition to being a surfer, this person is documented as a fashion model with major fashion houses at high profile shows. In that context also height, weight, and gender are common data fields.

Wikidata ingests datasets at scale. As sports and fashion datasets become more accessible, Wikidata ingests them. If we were to delete the entry on this person for some reason, then I expect they would re-enter Wikidata as their name appears in the kinds of public datasets we collect. It is not possible to document sports outcomes without naming those who ranked in competition, so for those sports rankings, they meet Wikidata:Notability's requirement of being a "structural need" to report sports outcomes.

I want to remark on gender also. I have seen a lot of requests by women who want to erase themselves from Wikipedia and online records. I think that either only women do this, or otherwise women do this ten times more than men. There are activists who observe that Wikipedia covers men more often than women, and who are trying to counter gender bias on Wikipedia (Q17002416) by ensuring that women are in the records. There is a social and ethical issue to balance here that women are underrepresented in media records, and that activists try to correct that lack of representation, but at the same time there are significant numbers of accomplished and documented women who request removal from that representation.

Under what circumstances do we grant requests for privacy versus publish data which seems to be in the scope of Wikidata's coverage?

This issue merits conversation somehow. Bluerasberry (talk) 15:40, 3 November 2022 (UTC)

Adding onto this, simply to underscore what I see as the absolutely vital question that you pose at the very end there. We need to be precise about this: might not some people's lives be at stake, in the most extreme of circumstances, if data about, say, their ethnicity or religion were available on Wikidata? And yet, even in asking that question, I know that some other people's lives might be at stake if they get (correctly) identified as a "white supremacist" or some other reprehensible category... HappyBear5000 (talk) 14:05, 5 November 2022 (UTC)

Data modeling the person

5-minute talk about the selection of examples in the data model, with @HappyBear5000 presenting

Bluerasberry (talk) 21:14, 13 December 2022 (UTC)

Complaint about Wikidata having a former name

Wikidata mailing list, "Web reference", 28 July 2023

The general situation is that a scientist changed their name 40 years ago and retired 20 years ago. Recently Wikidata surfaced all their names through the WikiCite project, because Wikidata attempts to present their full bibliography of scientific publications.

The person wants to be recorded only by their current name and does not want the change from decades ago to surface. From the Wikidata community perspective, the objective is simply to associate all papers with a given author. Bluerasberry (talk) 20:18, 3 August 2023 (UTC)

Clarifying guidelines for "Affiliation" and "Employer" properties

Hi y'all. After reflecting on my recent experiences within the WikiCite (Q21831105) project over the past few months (items of authors), I believe it's time to open a discussion regarding the appropriate and functional use of the properties affiliation (P1416) and employer (P108) (and related properties, but mostly these two). It's not uncommon for properties to overlap on Wikidata; for instance, we have languages spoken, written or signed (P1412) and writing language (P6886), and despite the latter being described as a "subproperty of" the former, both are still used. Therefore, the issue lies not in the existence of these properties, but rather in establishing guidelines for their usage. I've observed instances of oversimplified use and removal in the history of items, which I don't really agree with. Let's discuss and establish clearer guidelines for their effective application.

I've noticed a prevailing notion that if "employer" is used, then "affiliation" is not necessary, but they are not interchangeable. Firstly, "employer" is a delicate property that also encodes personal information, specifically who pays whom, with a very specific legal meaning. For example, Ph.D. students might not receive a salary (and educated at (P69) covers the description of unpaid students, yet we need to specify their affiliations because they can also have several), and some professors or researchers may have dual affiliations but are paid by one institution. One should be extremely careful in drawing inferences, as incorrect personal information may be provided in a public database.

This also involves specifying the proper sources to be used. affiliation (P1416) is a piece of information that can often be inferred from specific articles (which is why it also has a variant for strings). It can be sourced with bibliometric identifiers, making it manageable for those working with top-down massive bibliometric imports who have no interest in entering more detailed statements. On the other hand, employer (P108) should ideally be sourced with links to ORCID (where the information is clearly stated as such), official CVs, or maybe LinkedIn profiles. In the long term, a thorough cleanup should be introduced, with some mild constraints to progressively educate users to pay attention.

So, I open the discussion here for more examples, in order to finalize some more refined guidelines in the following months, at least in terms of warnings and constraints that encourage users to exercise caution and precision in using these properties, so we can build some literacy for the future. Alexmar983 (talk) 16:10, 17 February 2024 (UTC)

@Alexmar983: Good points - I have defaulted to using employer (P108) whenever I see an affiliation, unless I have other information (for example ORCID or LinkedIn or other sources to indicate the association was for education, not employment). I think you make a good point that using affiliation (P1416) would be a better default; I'll try to do that going forward. ArthurPSmith (talk) 13:12, 22 February 2024 (UTC)
I have also had doubts for some time about the usage of 'employer' and 'affiliation'. For me there's a subtle difference between the two: affiliation rather hints at the scientific field of work of the scientist.
The other point would be that leading scientists often establish labs. Their co-workers are affiliated with that lab, which in turn implies a field of work or study. Kpjas (talk) 09:27, 25 February 2024 (UTC)
Hello, I am re-entering this discussion to point out that ORCID says "employment", not "employer", which in some cases may cause confusion. To my knowledge, there is no way to indicate "affiliation" on ORCID, except through this "employment" section. In some countries (e.g. France has a notoriously complicated situation) or for some positions (PhD students), employers (= people or institutions paying the salary) may be multiple, and it may also happen that the affiliations (work done in a lab, for instance) are independent and/or multiple. Thus we tend to put both affiliations and employers in the "employment" section on ORCID. But the research agencies (ERC, ANR in France, NSF, etc.) distribute money (sometimes salaries) and ask to be explicitly mentioned in the articles; then, some journals put them in "affiliation", which is the only section at your disposal. Thus we need to have an overview of all these problems to decide about the properties and their definitions.

(Moreover, ORCID refers to Wikidata and Wikipedia now! Thus we are going in circles.) --Cgolds (talk) 12:20, 11 April 2024 (UTC)
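
The default discussed in this thread (use affiliation unless a source explicitly states employment) could be sketched roughly as follows. This is only an illustration, not an agreed guideline or an existing tool: the function name, the source-kind labels, and the `states_employment` flag are all hypothetical; only the property IDs P1416 and P108 come from the discussion.

```python
# Property IDs as named in the discussion above.
AFFILIATION = "P1416"
EMPLOYER = "P108"

# Hypothetical labels for kinds of source; these are not a Wikidata vocabulary.
EMPLOYMENT_GRADE_SOURCES = frozenset({"orcid", "official_cv", "linkedin"})


def choose_property(source_kind: str, states_employment: bool = False) -> str:
    """Pick a property for linking a person to an organization.

    Defaults to affiliation; returns employer only when a source of a kind
    listed above explicitly describes the link as paid employment.
    """
    if states_employment and source_kind in EMPLOYMENT_GRADE_SOURCES:
        return EMPLOYER
    return AFFILIATION


if __name__ == "__main__":
    print(choose_property("article_byline"))                  # P1416
    print(choose_property("orcid", states_employment=True))   # P108
    print(choose_property("orcid"))                           # P1416
```

Because the ORCID "employment" section may mix affiliations and employers, as noted above, the `states_employment` flag in this sketch is meant to be set by an editor's judgment of the source, not inferred from the section name alone.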

educated at (P69) - university or faculty

I would like to hear some clarification about the usage of educated at (P69).

If a person graduates from a university (or other higher education institution), are they educated at, for example, Jagiellonian University (Q189441) or Faculty of Biology and Earth Sciences of the Jagiellonian University (Q9379335)? Kpjas (talk) 09:46, 25 February 2024 (UTC)

I would say it would be the legal entity, so probably in most cases the university rather than lower levels (faculty, departments). -- Finn Årup Nielsen (fnielsen) (talk) 16:31, 9 April 2024 (UTC)
Unfortunately, I would say that it depends on the country and the time. In some cases, the diploma is indicated as being from a Faculty, in others from a University (or other institutions). It is particularly difficult because the official/legal status of an entity may change over time (this is the case in France). The best thing would be to distinguish between what is written on the diploma (to put as a qualifier of the diploma, for instance) and education in the broad sense, but I am not sure it can be known for each person. --Cgolds (talk) 16:29, 12 April 2024 (UTC)