Wikidata talk:WikiProject Wikidata for research


Contributions to the "Wikidata for research" project (including Wikidata:WikiProject Wikidata for research and all its pages) are dual licensed under CC BY-SA 3.0 (the Wikimedia default) and the Creative Commons Attribution 4.0 license.
Contributions by the project to the item and project namespaces of Wikidata shall be under CC0.

Licence problem

I just want to be sure that this topic is solved. Data stored in Wikidata are under the CC0 licence (see Wikidata:Introduction). According to this recommendation, data imports from any external database should be formally approved to avoid problems later. While I don't see a problem for reference data (books, scientific articles, ...), I do see some problems when specific data, such as data on molecules or genes, are imported.

A good solution would be the creation of an OTRS system in Wikidata to allow database owners to provide formal consent for some data imports. A special bot or a special tag should then be used for these imports in order to identify the data imported under these agreements. Snipre (talk) 14:11, 9 December 2014 (UTC)

Hi @Snipre:, thanks for thinking about these copyright issues, and for insisting that we address them. We are aware of the Wikidata policy that content in the item and property namespaces is under CC0, and certainly plan to honour that. We also plan to get approval for any automated imports. I tried several times to post a comment on the blog post in reply to your posts there (finally, a short version thereof got through) - in essence, I fully agree. It's just that
  • Wikidata content outside the item and property namespaces is CC BY-SA 3.0
  • open-access publishing has centred around CC BY 4.0, and the grant proposal by Andrew Su (on which we plan to build) is under CC BY 4.0 too.
  • we plan to use CC BY-SA 3.0/CC BY 4.0 for the materials produced for this proposal, and working out licensing issues for the project later on should be part of the proposal drafting process.
Does that address your concerns? Further feedback on any aspect of the project would certainly be appreciated. --Daniel Mietchen (talk) 21:18, 9 December 2014 (UTC)[reply]
I added {{Wikidata:WikiProject Wikidata for research/Licensing}} to the top of the page to clarify the licensing here. --Daniel Mietchen (talk) 01:43, 10 December 2014 (UTC)[reply]
@Daniel Mietchen: I just want to be sure that everyone is aware of the licence of Wikidata, because to be able to create tools or concepts around Wikidata we need data, and right now this is a problem because few databases have a similar licence. I think that with this project we will have plenty of tools to work with data but little data in the database. Snipre (talk) 11:02, 10 December 2014 (UTC)
@Snipre: Yes, there are way too few databases available under CC0, and we can't change that overnight. There are some good sources of open data, though, and the proposal will try to bring a good part of them together. Help with that is appreciated. --Daniel Mietchen (talk) 11:09, 10 December 2014 (UTC)

Policy

It might be interesting to include a section about current open data policies within the EU. If the current laws about open data are detrimental to European research interests, then that is a phenomenon in itself that could be researched. Maybe we can get political scientists and legal scholars to look into the licensing problems within the EU. Concrete research could also be used as a basis to argue for certain laws to be improved. --Tobias1984 (talk) 19:30, 11 December 2014 (UTC)

I heartily agree… Currently, there are many concerns with EU database laws. They are both complex (databases are covered by two different legal systems: classic intellectual property and sui generis rights) and quite fuzzy. For instance, even the most heavily copyrighted database is reusable under any kind of open license, provided one does not copy a substantial part. But the extent of "substantial" has never been precisely defined. There is no way to know when you switch from the information level to the original structure level.
In fact, this documentation work is not even an aside: Wikidata would be well suited to storing legislative data. The wording of laws is very formulaic and could be made even more understandable through a relevant ontology…
Alexander Doria (talk) 00:08, 12 December 2014 (UTC)[reply]
I agree that this is an important topic. I am aware of a paper addressing these issues to some extent, and involved with the Bouchout Declaration, which tries to move things forward in the domain of biodiversity research. I also think that making laws more machine readable through Wikidata would be interesting. In designing this proposal, we should keep both issues in mind, but I do not think we can allocate any significant amount of effort to addressing them specifically. --Daniel Mietchen (talk) 01:28, 12 December 2014 (UTC)[reply]

API

I think that it is especially important to develop an API for IPython notebooks. Pulling data right into NumPy or pandas data structures would ensure that Wikidata integrates well into the scientific workflow. --Tobias1984 (talk) 12:38, 11 December 2014 (UTC)

Good point, thanks! I put it on the list. --Daniel Mietchen (talk) 13:13, 11 December 2014 (UTC)[reply]
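
To make the idea above a bit more concrete, here is a minimal sketch of how query results could be turned into rows that load directly into a pandas DataFrame. It is not an existing project API: the function names `bindings_to_rows` and `run_query` and the User-Agent string are made up for illustration. It only assumes the public Wikidata Query Service endpoint and the standard SPARQL JSON results layout (`head.vars` and `results.bindings`).

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def bindings_to_rows(sparql_json):
    """Flatten a SPARQL JSON result into a list of plain dicts (one per row)."""
    variables = sparql_json["head"]["vars"]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        # Unbound variables are absent from a binding, so they become None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


def run_query(query, user_agent="wd4r-example/0.1 (hypothetical contact)"):
    """Send a SPARQL query to the endpoint and return flattened rows.

    Needs network access; the User-Agent value is a placeholder.
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return bindings_to_rows(json.load(response))
```

In a notebook, the returned list of dicts can be passed straight to `pandas.DataFrame(rows)`, with one column per query variable. Keeping the flattening step separate from the network call makes it easy to test offline.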

Molecules

I speak about what I like, i.e. chemistry: as concrete objectives we can have:

  • defining the most critical identifiers for molecules
  • mapping the items about molecules with the identifiers list in the first step
  • defining a list of properties (physical properties)
  • defining a classification system for molecules (the ChEBI ontology can be used, but I think we need something simpler so that everyone can perform queries without needing deep knowledge of chemistry). Snipre (talk) 15:03, 11 December 2014 (UTC)
Thanks! All of this looks good to me and is actually mostly already in the current version of the proposal: your first point closely corresponds to Task 3.4, your second to Task 3.1, your third and fourth to 5.2, with small molecules falling under the use cases to be explored in Task 5.1. As the mismatch between your and my numbering implies, there is room for rearrangements, and I'm pinging User:Egon Willighagen to take a look. --Daniel Mietchen (talk) 16:38, 11 December 2014 (UTC)[reply]

Geologic data

One possible "Anknüpfungspunkt" (point of contact) for the geosciences would be the article in the September 2014's GMIT (http://www.gmit-online.de/wp-content/uploads/2014/11/Gmit_57_F-1.pdf - There are some English references at the end of the article). Wikidata can be really helpful for regional sciences, because different scientific institutions developed different nomenclatures that are usually not simple translations. For example a layer of sediments in a country closer to the ocean might be influenced more by the marine environment and therefore be rich in carbonates. In another country the influence of the ocean disappears in that layer and the unit is mapped as a sandstone. So from a regional point of view both designations are valid. Wikidata being multi-lingual and enabling us to make statements from different perspectives could really help in the effort to unify such data across Europe. --Tobias1984 (talk) 23:49, 13 December 2014 (UTC)[reply]

Thanks - that sounds like a good fit. I have seen in the Open Data census that National Maps are openly available in a good number of countries, so this also seems doable. Unfortunately, the International Geological Map of Europe and Adjacent Areas (Q18638637) seems to be hampered by copyright issues. --Daniel Mietchen (talk) 01:23, 15 December 2014 (UTC)[reply]
 Strong support, geological cartographic data in Wikidata is a great idea. User:Tobias1984, I think I've seen a presentation by some of the people who wrote that paper. OSM has some geological data, but as far as I know it's all geomorphological ("here is a mountain", not "here are the areas of the mountain that are made of limestone, and here is where the shale outcrops..." [1] [2]).
Making a global open geological map would be brilliant, but would take a keen person willing to put in a lot of work to start it off. Datawise, a correspondence of reference frames, topographic data (w:ASTER? http://www.opentopography.org/?), and annotated lines, polygons, and polyhedra in a decent ontology should do most of it, and that definitely sounds like Wikidata.
Building it as a layer of OSM might reduce the work needed on visualization tools. Layers of LiDAR images etc. would be great. There seem to be public-domain sources for the large-scale (tens of meters for topography, maybe rougher for geology) mapping, but then we'd want to get finer descriptive and spatial resolution. This stage could actually be relatively quick and easy.
I'm trying to think of some good sources. Industry might be persuaded to share some geological map data. Geological Surveys vary. For instance, w:USGS maps are public-domain (though they sell prints); w:Canadian Geological Survey ones are under w:Queen's Copyright, which lasts forever. I think OSM has some scanned out-of-copyright geological maps. Digitising them might be crowdsourced; [www.missingmaps.org] is already OSM-mapping volcanoes for disaster response, so they might want geological maps of the same areas, and I think their software is open source. TimeScale Creator is a database of geological categorizations and observations, and tools to use it. Although it's copyrighted, Professor Ogg gives it away and is always looking for more people to work on it, especially people with good technology skills. He would, I think, be amenable to open-licensing the data for some obviously useful purpose, though I'm pretty sure he'd want to retain the right to stamp one copy as official.
Support base? Publishing geological maps can be slow. Maybe an open map database with a good interface could become a sort of arXiv of geological maps. This sounds like the sort of idea that might draw an academic grant or substantial industry funding. University students doing small-scale coursework mapping would add to it (especially if they could get a DOI to cite as evidence of their work). Checking out your neighbourhood and overlaying geological maps on satellite images can be fascinating, so that might drive use; help IDing provenanced bedrock as a volunteer service might also be popular, judging by the usual success of we-ID-your-rock outreach events.
It's a great idea. With some good tools, it could really improve the quality of images on Wikimedia. It sounds like it needs its own project, though. HLHJ (talk) 12:32, 1 February 2015 (UTC)[reply]

de-monopolization

Though I strongly agree with this concern and objective, I wonder if it makes sense to give it such prominence (looks like 1/3rd or more of the text under "Scientific content") or put it in those terms. The call doesn't use such language, and even in the declaration on responsible research only addresses it indirectly, calling for openness and inclusiveness. It seems like this proposal could be described and motivated in direct alignment with the call, which objectively seems perfectly aligned. WD4R should be presented as a huge and uniquely placed opportunity to further "more effective collaboration...higher efficiency and creativity...thanks to reliable and easy access to discovery, access and re-use of data" etc. The monopolization concern might alternately be expressed in terms of promoting inclusion, engagement, etc aligned with the responsible research declaration and the inclusive, innovative, and reflective societal challenge rather than in a free culture/open access rant (that I agree with! :-)). Mike Linksvayer (talk) 04:40, 18 December 2014 (UTC)[reply]

I agree. Will try to fix this during the day. Thanks! --Daniel Mietchen (talk) 05:06, 18 December 2014 (UTC)[reply]
Done. --Daniel Mietchen (talk) 09:50, 18 December 2014 (UTC)[reply]

Coordinate with OBO Foundry

I would encourage efforts from this project to use and maintain straightforward interoperability with major existing scientific ontologies, e.g. ChEBI (maintained by the European Bioinformatics Institute) and other OBO Foundry ontologies. People worth reaching out to on this may include:

  • Colin Batchelor (Royal Society of Chemistry) -- RSC sponsors a Wikimedian in Residence, IIRC
  • Janna Hastings (EBI)
  • Simon Jupp (EBI)
  • Stefan Schulz (Medical University of Graz)
  • Alan Ruttenberg (University at Buffalo)
  • Barry Smith (University at Buffalo)
  • Chris Mungall (Lawrence Berkeley National Laboratory)

That group has built infrastructure for developing ontologies at various levels (upper, middle, and domain) and their work is often integrated into scientific research. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration describes the initiative. Emw (talk) 14:01, 18 December 2014 (UTC)[reply]

Yes. Added. Pinging User:Pigsonthewing (Wikimedian in Residence at RSC). --Daniel Mietchen (talk) 14:20, 18 December 2014 (UTC)[reply]
Thank you. How can I help? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:19, 18 December 2014 (UTC)[reply]
Perhaps you can ask Colin Batchelor for advice on how best to go about interoperability of OBO with Wikidata? Thanks. --Daniel Mietchen (talk) 15:26, 18 December 2014 (UTC)[reply]

Visualizing art history

My big interest with Wikidata is art history (and a few other things, but it's my current focus), and this great paper, co-written by a colleague of mine, talks about the use of visualizations in documenting and exploring art history. I thought some of you might find it interesting! See it here. Missvain (talk) 03:20, 22 December 2014 (UTC)

Venue for Wikidata research

Posting this here another time seems appropriate: https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004678.html --Tobias1984 (talk) 09:46, 23 December 2014 (UTC)[reply]

Mailing list

Could we maybe use https://lists.wikimedia.org/ instead of Google Groups for the mailing list? Google Groups uses closed-source software and licenses the content we put on it to Google. Signing up for it also requires agreeing to their terms of use and privacy policy, which are quite different from Wikipedia's. Is there a reason not to use the internal system? HLHJ (talk) 10:56, 1 February 2015 (UTC)[reply]

I wanted to use a Wikimedia mailing list and asked WMDE to help set one up. I wasn't aware that they use Google groups, and given the time constraints we had in preparing the proposal, my priority was on getting things going. I'd be happy to see the list moved over to someplace more Wikimedia. --Daniel Mietchen (talk) 12:36, 1 February 2015 (UTC)[reply]

Tools for primary research

I'm sort of disappointed that Wikiversity's research platforms aren't better-used. So much of the software here is (unsurprisingly) really good for research; apparently PLOS is even using MediaWiki internally. Why aren't more academics using it? I think it's because we haven't actually quite got the sort of Open Labbook and collaborative article-editing software that academics want. Wishlist:

  • built-in spreadsheets
  • R interface, such that you can click on the graph to get the script.
  • figure referencing? Have we got this?
  • ContentMine's tools for getting data out of semi-machine-readable tables and graphs
  • Latex exporter (I suspect this exists, too)
  • reading lists and source recommendations
  • ability to annotate sources

Any other ideas? Would a Wikiproject Wikilabbook be good? HLHJ (talk) 21:05, 1 February 2015 (UTC)[reply]

Wikimania 2016

Only this week left for comments: Wikidata:Wikimania 2016 (Thank you for translating this message). --Tobias1984 (talk) 12:00, 25 November 2015 (UTC)[reply]

How to note an author's correction

Greetings. Can you tell me which property I should include in an item, e.g. a scholarly article, if there is a later published revision or change? I'm referring not to cases where the original is completely replaced, but to those where the author publishes an addendum. Thanks. I hope you don't mind my adding this topic. I didn't see any others here, but thought it was the right place.

Example:

I think the question might be related to the case where a subsequent author replies to, or comments on, another. This is important because references should take follow-ups to the original citation into account.

Thank you for any recommendation. -Trilotat (talk) 14:16, 8 November 2018 (UTC)[reply]

Poor data quality of researcher elements

It seems that a lot of scientific researchers have been imported in batches, with poor data quality as the result.
Examples:

Three elements would be easy to correct, but there are tens of thousands of them. This leaves some questions:

  1. What is the best way to present, discuss and deal with these challenges?
  2. The good news is that most of these elements are connected to one or more external databases like ORCID, VIAF, Scopus etc. What could be done to fetch data from these sources and supply the elements in Wikidata with a bot?
  3. Several researchers have duplicate records in ORCID or other external databases. What experience do we have with communicating with administrators of external databases about duplicates and other mismatches?
  4. Is there any way to create a query that detects possible duplicates? Or lists with researchers labeled with the syntaxes B Barr, B. Barr or Barr B? --Cavernia (talk) 08:04, 21 June 2019 (UTC)[reply]
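
As a rough illustration of the last question, here is a small sketch that groups labels such as "B Barr", "B. Barr" and "Barr B" under one key, so that likely duplicates can be listed for manual review. It is only a heuristic under stated assumptions: it handles labels made of one surname plus single-letter initials, the function names are made up, and the item IDs in the usage example are placeholders, not real items. A shared key means "worth checking", not "same person".

```python
import re
from collections import defaultdict


def name_key(label):
    """Reduce 'B Barr', 'B. Barr' or 'Barr B' to a (surname, initial) key.

    Returns None for labels that do not look like surname plus initials.
    """
    tokens = [t for t in re.split(r"[\s.,]+", label.strip()) if t]
    initials = [t for t in tokens if len(t) == 1]
    words = [t for t in tokens if len(t) > 1]
    if not initials or len(words) != 1:
        return None  # full names or ambiguous labels are skipped
    return (words[0].lower(), initials[0].lower())


def group_candidates(labels_by_id):
    """Group item IDs whose labels share a key; keep groups of two or more."""
    groups = defaultdict(list)
    for item_id, label in labels_by_id.items():
        key = name_key(label)
        if key is not None:
            groups[key].append(item_id)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

For example, `group_candidates({"Q_A": "B Barr", "Q_B": "B. Barr", "Q_C": "Barr B"})` puts all three placeholder IDs under the key `("barr", "b")`. Candidates found this way would still need to be compared against ORCID, VIAF or similar identifiers before any merge.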

Status of this project

This project seems rather dead: the mailing list does not work, the hashtag has not been used for years, and a lot of content is missing. Is anyone willing to clean up, or can I close the project? Too much outdated information is worse than no information except for one or two links to related projects. -- JakobVoss (talk) 07:25, 23 October 2019 (UTC)

Bibliography of Wikidata

Hi All,

I've created a Wikidata:Bibliography of Wikidata page, feel free to expand if interested.

Best, --Adam Harangozó (talk) 13:02, 13 June 2020 (UTC)[reply]

Thanks for doing this! -- Oa01 (talk) 13:23, 22 July 2020 (UTC)[reply]

knowledge infrastructure

Hello all,

Wikidata has an entity for research infrastructure (Q1438053).

Should there be a separate entity for "knowledge infrastructure"? If so, would "research infrastructure" be a subclass?

Your collective expertise on the matter would be much appreciated. Thanks. -- Oa01 (talk) 13:23, 22 July 2020 (UTC)[reply]

Mailing list

I had never heard of this mailing list before. I also try to avoid mailing lists that require the use of proprietary software where possible, so it might be that I heard of it but opted not to subscribe. Is the mailing list still active?

I think it would be nice to link Wikidata:Mailing list and mail:wiki-research-l, which is where most talk about Wikidata and research happens. Nemo 08:28, 19 September 2021 (UTC)[reply]

New properties applicant (P10602) and co-applicant (P10601)

These properties were created today: applicant (P10602) and co-applicant (P10601). I'm posting here because this project has more than 50 participants and couldn't be pinged. UWashPrincipalCataloger (talk) 07:50, 10 April 2022 (UTC)[reply]

@UWashPrincipalCataloger How does it work? For example, can you explain what the relationship between SFB 884: Political Economy of Reforms (Q48975701) and University of Mannheim (Q317070) is? In what way is the university an applicant to a collaborative research center? Thanks, Vojtěch Dostál (talk) 13:26, 10 April 2022 (UTC)[reply]
@Vojtěch Dostál I honestly cannot explain it well. I didn't propose it. I created the properties after they were marked as ready for creation by another property creator. The way I understand it, an applicant is a person or organization that applies for financial support to an agency that gives money for research projects. It does seem like an unusual relationship to record, because there's no link to the specific research that is funded. UWashPrincipalCataloger (talk) 18:10, 10 April 2022 (UTC)[reply]
Ah, in that case, I'm pinging @MasterRus21thCentury: - please, can you enlighten us? :) Vojtěch Dostál (talk) 19:16, 10 April 2022 (UTC)[reply]
MasterRus21thCentury failed to reply even though I pinged him/her many times. This is very unfortunate for a property creator. Can others who voted  Support comment, please? @LukasCBossert, Trade, Hannes Röst, Germartin1:? Thanks ... Vojtěch Dostál (talk) 17:29, 18 April 2022 (UTC)[reply]
@Vojtěch Dostál, UWashPrincipalCataloger: the way I understood this is that it would work a bit like author (P50): a research proposal can be viewed as a contract between the grant agency and the organization that executes the grant. For example, it seems that this grant, SFB 884: Political Economy of Reforms (Q48975701), was funded by the German Research Foundation (Q707283) from 2010 to 2021, and the money was given to the University of Mannheim (Q317070) and several others including GESIS – Leibniz Institute for the Social Sciences (Q1485220) (see Antragstellende Institution, "applicant institution", and Beteiligte Institution, "participating institution"), which are the applicants and co-applicants. What do you mean by "there is no link to the specific research"? I assume that any publications, patents or other research resulting from this grant could be linked back to the grant through funder (P8324), so where do you see the problem? Generally, researchers declare in a paper the grant numbers that funded the research, so that it can be linked back to individual grants. There may also be some translation misunderstandings here: SFB 884: Political Economy of Reforms (Q48975701) is not a long-term independent research center funded by the University, but rather a medium-term (10-year) grant that has sub-grants within it; see also here. --Hannes Röst (talk) 18:21, 18 April 2022 (UTC)
@Hannes Röst So, you are suggesting that these properties should only be used on grant proposal or grant project items? I am not aware of many grant project items in Wikidata but that could work. I will update the property constraints and descriptions after you confirm this. Vojtěch Dostál (talk) 18:47, 18 April 2022 (UTC)[reply]
See here, the original proposal read "With the property "applicant", research projects can be described in greater detail. The property could also be used in other domains with calls or tenders.", so this is clearly intended for research projects but could be used for other (government) agency grants. E.g. if there is a grant from the federal government to a local government to build an infrastructure project, this property could be used. Maybe also for awards / prizes or even architecture calls (not sure where else it could fit). We should also update this data model. --Hannes Röst (talk) 19:46, 18 April 2022 (UTC)
@Hannes Röst I don't think it's a good idea to mix up research facilities and consortia such as SFB 874 (Q7389939) with the research grants that were used to fund them. I think that your definition of applicant (P10602) is OK, but it should remain in the domain of research proposals, tenders and calls. I see a clear difference between a project (longer, often funded by many grants) and a proposal (which has applicants and co-applicants). So all examples at applicant (P10602) seem wrong. Vojtěch Dostál (talk) 11:12, 19 April 2022 (UTC)
PS: there is already some "alternative use" for this label here which I am not sure how I feel about it, but it seems like there is a need for this property outside research and we should define it more clearly for such use. --Hannes Röst (talk) 19:52, 18 April 2022 (UTC)[reply]
When the page specifically states 'Applicant' it makes sense to use the property with the same name as qualifier --Trade (talk) 22:43, 18 April 2022 (UTC)[reply]
I am all in favor of making the meaning and use of these properties more clear. Personally, I don't think that they should have been marked as ready to create, even though once they were I was the person who created them, doing my best to make sense of them. I fear that too many new property proposals are being marked as ready when to me they clearly are not completely ready (they often lack the appropriate subject item or applicable stated in items, for example). I create those when I realize they are necessary, but other creators are not doing that. UWashPrincipalCataloger (talk) 03:29, 19 April 2022 (UTC)[reply]
I changed the description a bit. Thoughts? I feel like 'rating organization' is open enough to leave some room for similar cases in the future. @UWashPrincipalCataloger: --Trade (talk) 23:11, 20 April 2022 (UTC)[reply]
@UWashPrincipalCataloger I think the problem is not the description but the domain. Which items should this property be used in? I think only those which are subclass of grant proposal. Vojtěch Dostál (talk) 10:58, 23 April 2022 (UTC)[reply]
Yes, that makes more sense to me. Applicant for a grant proposal. UWashPrincipalCataloger (talk) 19:16, 23 April 2022 (UTC)[reply]
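
One way to ground the question of domain raised above would be to look at which classes the property is currently used on. The following is only a sketch: the helper names are made up, and it assumes the public Wikidata Query Service endpoint, its predefined `wdt:` prefix, and instance of (P31) as the class link. It builds the query and a request URL without sending anything.

```python
import urllib.parse

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def class_usage_query(property_id, limit=50):
    """Build a SPARQL query counting items per class that use a property."""
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError("expected a property ID such as 'P10602'")
    return (
        "SELECT ?class (COUNT(DISTINCT ?item) AS ?uses) WHERE {\n"
        f"  ?item wdt:{property_id} ?value ;\n"
        "        wdt:P31 ?class .\n"  # P31 = instance of
        "}\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?uses)\n"
        f"LIMIT {int(limit)}"
    )


def query_url(query):
    """Return a GET URL for the query, asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params
```

Calling `class_usage_query("P10602")` gives a query that can also be pasted into the query service by hand. If most uses turn out to be on classes other than grant proposals, that would support tightening the property constraints as discussed.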

MUSE book ID

My recent property proposal may be of interest. Susmuffin (talk) 03:34, 17 June 2022 (UTC)[reply]

I feel like this proposal may have slipped under the radar somehow. Considering three other MUSE properties exist, this one probably should as well. UWashPrincipalCataloger, is there more that should be done to get this proposal promoted? Huntster (t @ c) 16:59, 23 November 2022 (UTC)[reply]

MUSE publisher ID

Scinapse Author ID

Hello, I have a property proposal and you might be interested in it.

Scinapse Author ID

I kindly invite you to discuss it.

Many thanks, --Luca.favorido (talk) 13:54, 18 August 2022 (UTC)[reply]