Wikidata:Property proposal/risk factor

From Wikidata

risk factor

Originally proposed at Wikidata:Property proposal/Natural science

   Done: risk factor (P5642) (Talk and documentation)
Description: This relation highlights which factors are associated with a high prevalence of a particular gene, disease or characteristic. These factors can be country of origin, country of citizenship, race, gender, occupation, anamnesis, etc. Further information can be found at https://en.wikipedia.org/wiki/Risk_factor.
Data type: Item
Domain: item
Example 1: myocardial infarction (Q12152) → smoking (Q662860)
    has effect (P1542): mortality (Q1239812)
Example 2: hepatitis C (Q154869) → Egypt (Q79)
    criterion used (P1013): residence (Q699405)
    has effect (P1542): incidence (Q217690)
Example 3: myocardial infarction (Q12152) → male (Q6581097)
    criterion used (P1013): gender (Q48277)
    has effect (P1542): incidence (Q217690)
Example 4: lactic acidosis (Q1500373) → metformin (Q19484)
    criterion used (P1013): medical treatment (Q179661)
    has effect (P1542): incidence (Q217690)
Source: PubMed articles
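To make the statement shape concrete, Example 2 can be written as a plain Python data structure: one risk factor (P5642) statement on hepatitis C (Q154869) whose value is Egypt (Q79), carrying two qualifiers. This is a minimal sketch, assuming a hypothetical in-memory model; the class and method names are illustrative only, and this is neither the Wikibase JSON format nor the API of any Wikidata client library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Qualifier:
    prop: str   # qualifier property ID, e.g. "P1013" (criterion used)
    value: str  # item ID of the qualifier value


@dataclass
class RiskFactorStatement:
    # Hypothetical model of one statement; not the Wikibase data format.
    subject: str               # item carrying the statement (disease, gene or characteristic)
    value: str                 # item that is the risk factor
    prop: str = "P5642"        # risk factor
    qualifiers: List[Qualifier] = field(default_factory=list)

    def qualifier_values(self, prop: str) -> List[str]:
        """Return the values of every qualifier that uses property `prop`."""
        return [q.value for q in self.qualifiers if q.prop == prop]


# Example 2: hepatitis C (Q154869) -> Egypt (Q79)
hep_c_egypt = RiskFactorStatement(
    subject="Q154869",
    value="Q79",
    qualifiers=[
        Qualifier("P1013", "Q699405"),  # criterion used: residence
        Qualifier("P1542", "Q217690"),  # has effect: incidence
    ],
)

# Example 1: myocardial infarction (Q12152) -> smoking (Q662860), no criterion
mi_smoking = RiskFactorStatement(
    "Q12152", "Q662860", qualifiers=[Qualifier("P1542", "Q1239812")]  # has effect: mortality
)

# The "criterion used" qualifier says in which sense Egypt is a risk factor.
print(hep_c_egypt.qualifier_values("P1013"))  # ['Q699405']
```

Qualifiers are kept as a list rather than a dict because the same qualifier property may appear more than once on a single statement.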

@علاء, Ebrahim, *Youngjin, -revi, Addshore, Ajraddatz:

@Arkanosis, ChristianKl, Ladsgroup, Mahir256, Mbch331, Nikki:

@Okkn, Pamputt, Romaine, Sannita, Stryn:

Please create the property, as the proposal is now ready.

Motivation

Tobias1984
Doc James
Bluerasberry
Gambo7
Daniel Mietchen
Andrew Su
Andrux
Pavel Dušek
Mvolz
User:Jtuom
Chris Mungall
ChristianKl
Gstupp
Sintakso
علاء
Adert
CFCF
Jtuom
Drchriswilliams
Okkn
CAPTAIN RAJU
LeadSongDog
Ozzie10aaaa
Marsupium
Netha Hussain
Abhijeet Safai
Seppi333
Shani Evenstein
Csisc
Morgankevinj
TiagoLubiana
ZI Jony
Antoine2711
JustScienceJS
Scossin
Josegustavomartins
Zeromonk
The Anome
Kasyap
JMagalhães
Ameer Fauri

Notified participants of WikiProject Medicine

Discussion

Instead of merely 'high prevalence of a particular gene or disease', how about 'high prevalence of a particular gene or disease or characteristic'? MaynardClark (talk) 14:13, 1 August 2018 (UTC)
MaynardClark: Excellent idea. Fixed. --Csisc (talk) 12:48, 2 August 2018 (UTC)
This is an evidence-based resource. What is/are the (verifiable threshold) criteria for "high prevalence"? For the examples given, could such sourcing be provided? Soupvector (talk) 16:38, 1 August 2018 (UTC)
Soupvector: Whether a prevalence counts as high is relative. For one disease, a prevalence rate of 0.01 can be high; for another, 0.5 can be only moderate. That is why we will depend on explicit statements of high prevalence in the biomedical scientific literature for this purpose. For hepatitis C (Q154869) → Egypt (Q79), the reference used can be https://www.ncbi.nlm.nih.gov/pubmed/28553150. --Csisc (talk) 13:36, 2 August 2018 (UTC)
 Comment What about a prevalence property, perhaps providing a list of countries and quantitative prevalence values as a Commons table? ArthurPSmith (talk) 20:30, 1 August 2018 (UTC)
And I see we already have prevalence (P1193); this can be used now to express this with appropriate qualifiers for country etc. ArthurPSmith (talk) 20:31, 1 August 2018 (UTC)
ArthurPSmith: prevalence (P1193) is a quantitative property that gives the prevalence rate of a disease or a characteristic in general or in a given country. The proposed property, however, is qualitative: it returns the list of countries known to have a high prevalence rate of a particular disease, gene or characteristic. The two properties are therefore different, and the latter is easier to identify automatically from the scientific literature. --Csisc (talk) 13:40, 2 August 2018 (UTC)
How so? If you have the data available for prevalence by country, sources and hard numbers and all, shouldn't it be possible to derive which countries have the highest prevalence from that? Wouldn't this be redundant data? --Yair rand (talk) 03:33, 7 August 2018 (UTC)
Yair rand: That is absolutely accurate. However, it is difficult to extract prevalence data for all diseases in all countries because the needed resources are lacking. That is why I proposed this new property, whose values can easily be extracted from medical bibliographic databases. I have already developed Python code for that. --Csisc (talk) 09:16, 7 August 2018 (UTC)
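Yair rand's point (high-prevalence countries could be derived from prevalence (P1193) values) and Csisc's earlier point ("high" is relative to each disease) can be illustrated together with a small sketch. It assumes quantitative per-country rates are already available; the function name, the median-ratio cutoff and the country names and numbers are all hypothetical, chosen only for illustration. Neither the proposal nor prevalence (P1193) defines such a threshold.

```python
from statistics import median
from typing import Dict, List


def high_prevalence_countries(prevalence_by_country: Dict[str, float],
                              ratio: float = 2.0) -> List[str]:
    """Hypothetical rule: return countries whose prevalence is at least
    `ratio` times the median of the given values, highest first.

    The cutoff is relative to the disease's own distribution, because an
    absolute rate such as 0.01 can be high for one disease and low for another.
    """
    if not prevalence_by_country:
        return []
    cutoff = ratio * median(prevalence_by_country.values())
    # Require a positive rate so that a zero median does not flag every country.
    flagged = [country for country, rate in prevalence_by_country.items()
               if rate > 0 and rate >= cutoff]
    return sorted(flagged, key=prevalence_by_country.get, reverse=True)


# Hypothetical prevalence rates for one disease (illustrative, not real data).
rates = {"country_a": 0.002, "country_b": 0.004,
         "country_c": 0.015, "country_d": 0.003}
print(high_prevalence_countries(rates))             # ['country_c']
print(high_prevalence_countries(rates, ratio=1.0))  # ['country_c', 'country_b']
```

Such a ranking only works where quantitative prevalence values exist for enough countries, which is the gap Csisc cites as the reason for proposing a qualitative property.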

 Comment We cannot always get a precise prevalence (P1193), so this kind of property is very useful and meaningful for describing the epidemiology of diseases and other medical entities. However, is there any reason why you focus only on countries, @Csisc? There are many other factors associated with diseases, such as race, gender, occupation, anamnesis, etc. It seems better to expand the scope of the value of this property to "any factors" that are not based on a clear cause-and-effect relation. If that is done, I will gladly support this proposal. --Okkn (talk) 02:23, 9 August 2018 (UTC)

Okkn: Excellent idea. Fixed. --Csisc (talk) 15:43, 10 August 2018 (UTC)
 Support Ok, “risk factor” seems good. --Okkn (talk) 02:28, 11 August 2018 (UTC)
Blue Rasberry: A risk factor is a characteristic that makes you more susceptible to a disease. In the example you gave, lactic acidosis (Q1500373) (effect) is an adverse effect (Q2047938) of metformin (Q19484) (drug). --Csisc (talk) 16:42, 11 August 2018 (UTC)
@Csisc: In this example, is taking metformin a risk factor for lactic acidosis? Not everyone experiences the adverse effect, but this risk factor property applies to all use of the drug, does it not? Blue Rasberry (talk) 17:05, 11 August 2018 (UTC)
Blue Rasberry: This is absolutely accurate. I added this as an example to the property proposal. --Csisc (talk) 18:36, 11 August 2018 (UTC)
 Support @Csisc: Wow, this can get complicated, but I would guess that perhaps 50% of all medical papers talk about this. Blue Rasberry (talk) 18:42, 11 August 2018 (UTC)
Maybe the criterion is medical treatment (Q179661)? Blue Rasberry (talk) 18:44, 11 August 2018 (UTC)
Blue Rasberry: Thank you for supporting the proposal, and for your advice concerning the criterion; it is very useful. As for the complexity of extracting risk factors and adding them to Wikidata, I am well aware of it. However, I will find a simple method to make the work easier. You are welcome to join this effort. --Csisc (talk) 19:11, 11 August 2018 (UTC)
Jura: Of course. However, I did not find "Country of residence" as a Wikidata property. --Csisc (talk) 16:46, 11 August 2018 (UTC)
Jura: Useful information. Fixed. Thank you. --Csisc (talk) 18:36, 11 August 2018 (UTC)
@Jura1, Csisc: criterion used (P1013) seems redundant in these cases to me. Is it really needed for these statements? --Okkn (talk) 03:14, 15 August 2018 (UTC)
Okkn: Excellent question. Risk factor is mostly a transitive relation. For example, if we say that Egypt is a risk factor for hepatitis C, most users will ask whether Egypt is meant as a country of residence, a country of birth or a visited country. That is why we absolutely have to use criterion used (P1013). --Csisc (talk) 09:35, 15 August 2018 (UTC)
@Csisc: In many cases, we cannot distinguish an environmental factor (such as a residence or a visited country) from a genetic factor (such as a country of birth). Do your examples clearly refer to countries as residences? --Okkn (talk) 09:49, 15 August 2018 (UTC)
Okkn: Of course. For hepatitis C (Q154869) → Egypt (Q79) as a country of residence, see https://www.ncbi.nlm.nih.gov/pubmed/28553150. --Csisc (talk) 10:35, 15 August 2018 (UTC)
In addition, diseases have risk factors not only for their onset but also for their prognosis. I think P1013 should be used to specify what the risk factor is for. --Okkn (talk) 09:59, 15 August 2018 (UTC)
Okkn: Of course. However, I think that has effect (P1542) is better for such situations. Fixed. --Csisc (talk) 10:35, 15 August 2018 (UTC)
Andy Mabbett: This is not accurate. Practitioners will not be interested in the relative risk or hazard ratio of a risk factor, which may differ from one study to another. Their main concern is an exhaustive list of risk factors that they can easily use for prevention. --Csisc (talk) 10:56, 15 August 2018 (UTC)
  •  Comment I think it's essential that every such statement is backed up by literature, and I would like to see this enforced (via constraints, of course). There is a lot of simple stuff here (not so controversial statements), but other statements will be time-bound (changing health policies) or just outright controversial (think wine and anything that both causes and prevents cancer). Can expected qualifiers please be incorporated into the proposal? --Egon Willighagen (talk) 05:45, 13 August 2018 (UTC)
  •  Support This is what it is called in most papers. Using the usual vocabulary will make Wikidata more intuitive. -- Thibdx (talk) 14:29, 13 August 2018 (UTC)
  •  Comment This feature will only be as useful as it is feasible to populate reliably. I find the examples not to be compelling, the statement "Biomedical relations with this property can be easily retrieved with references using PubMed Entrez API" seems out of step with our usual approach to sourcing, and I feel that a clear and sustainable plan compliant with a standard like WP:MEDRS should be coupled to this in order for it to succeed. Soupvector (talk) 01:52, 15 August 2018 (UTC)
Soupvector: This is excellent and very useful input. Once the property is created, I will certainly hold several Skype meetings in which I show the method to be used for the automatic extraction of risk factors. I will announce them soon on the Wikidata and WikiProject Med mailing lists. If you like, you can participate in one of them and share your opinions with me. --Csisc (talk) 10:56, 15 August 2018 (UTC)
I expressed a need for a process with clearly articulated standards of evidence, which (IMHO) is essential. I do see the value in having a database that could inform inferences about epidemiology, but our standards of evidence need to be clear and firm. Soupvector (talk) 21:55, 15 August 2018 (UTC)
Soupvector: I agree. What would you propose to adjust this? --Csisc (talk) 09:32, 16 August 2018 (UTC)