Wikidata talk:WikiCite/Roadmap

Your preferred scenario

Curious to hear people's thoughts on their preferred scenario from 1 to 4 and their motivation. Please remember this is not a vote or a binding RfC or community consultation.--Dario (WMF) (talk) 00:05, 15 August 2018 (UTC)

I prefer 1, possibly via 2. If that is hard to do right away, we can start with 3 with an eye towards integration into 1 within a year.
4 is already the state of the works outside the wikiverse, and we can help that world start using federated wikibases as a way to be maximally compatible with wikidata for data sharing and cross-comparison. But that will simply make import and sync with WikiCite easier; it should not be the preferred identity of WikiCite, the value of which should be precisely clarity of authority, deduplication, and (eventual) completeness. Sj (talk) 03:32, 15 August 2018 (UTC)
2 I am already finding search less and less useful because I am drowning in scientific articles when I am looking for concepts. This will only get worse if we don’t find a way to exclude scientific articles from search results, and a separate namespace seems like the way to do that. 3 might also be a good long-term solution. I assume either would require significant grant money for staffing. - PKM (talk) 03:53, 15 August 2018 (UTC)
It's hard to form a preference without some information on how the massive import has worked so far. For instance, how many users have stumbled upon the imported entities and/or edited them? How many connections are there between those entities and the others on Wikidata? Something I liked, but I had to dig a bit for, is that SourceMD allows us to very easily connect an author to many of their works, transforming the unstructured strings which were imported into real structured data. As far as I understand, this relies (also) on existing entities about people and entities. While affiliation data is very tightly connected to the kind of information we usually host on Wikimedia projects, the graph of citations seems to fit much less and it's not clear what benefit there is from hosting it on Wikidata. --Nemo 07:06, 15 August 2018 (UTC)
1 as a Wikisourceror, the Wikisource community is used to Wikidata, to the tools to edit directly from Wikisource, etc. It's growing slowly but it's growing and I fear that disruption would break the good momentum. Cdlt, VIGNERON (talk) 08:02, 15 August 2018 (UTC)
2: can search concepts or refs, not moving away or forks (bibliographic data are data, so they may be kept here)--Barcelona (talk) 08:29, 15 August 2018 (UTC)
Support only 1, strongly oppose others: The current item model works well and we don't need a second model. It would be a huge amount of work to migrate existing items for papers if we decide not to represent papers as items. Also, "there are too many items for papers to query" is not a problem; it only reflects a bias. There are many more other notable concepts, most of which are not yet described in Wikidata.--GZWDer (talk) 18:21, 15 August 2018 (UTC)
1 or 2 would be ok with me, I can see the advantages of both. 3 or 4 would be nice to get working for some types of real-world entities (for example the catalog of billions of stars that was recently mentioned in project chat) but I think sources are too integral to what we are trying to do here to have them live in independent locations. ArthurPSmith (talk) 13:58, 16 August 2018 (UTC)
delay decision, 1 for foreseeable future, be friendly to 4. Wikidata is now 50 million items. Right now we in WikiCite are talking about 200 million existing academic sources. In a few years we will talk about the 2 billion magazine articles and the 5 billion newspaper articles. We have to do all the books and break them down by chapter which is billions more. Somewhere in the midst of that we will talk about the 500 million individual people who are named as topics in all these publications. OpenStreetMap has several billion roads for us and a few billion buildings, each of which has a history in the structured data. All of this will belong to wiki someday. All of this is a part of WikiCite, because we need to know who published what and about what and where and which organizations were involved in which region.

I propose encouraging people to do #4, build out Wikibase instances for their datasets. Everyone should sort any data they like in Wikibase to make it nice for future integration with the mothership. For now people with nice data can push it into Wikidata, #1, or just be on hold.

For now WikiCite should develop #1. I have no idea what to think about future technology. Either search gets faster and putting everything in the main #1 Wikidata collection is not a problem, or search speed does become a problem, and we need to split off some parts of our collection. If the split is necessary, and option #2 or #3 is required, then the cause of that will not be WikiCite. It will be because of the other publications, and all the people, and every location in the world, the proteins, and the astronomical objects. WikiCite is so much smaller than the challenges in our future. We should not seek a permanent solution for WikiCite if we are not confident that what works for our small WikiCite collection is the permanent solution for the next billion items which will immediately follow. Blue Rasberry (talk) 16:55, 16 August 2018 (UTC)

First, thanks for this page and the good summary. I'd start with the motivation to change current practice. WikiCite items and statements will soon outnumber the rest of Wikidata, leading to growing pains with usability and technical infrastructure. If any of the four scenarios allows filtering out bibliographic entities, e.g. when searching for an item, then it is the right way to go. 1 is most convenient in the short run but I see no way to proceed in the long run. In theory 4 would be best but I doubt it's doable in practice. We need methods to filter out bibliographic entities, which can be done by both 2 and 3. This must be decided by the ones who implement required changes in Wikibase and MediaWiki. -- JakobVoss (talk) 09:37, 20 August 2018 (UTC)
1 or 2 I think No.1 is the obvious choice for several reasons. The data created will seamlessly link with the rest of Wikidata, bringing added value, through enrichment of the data. It's a known entity, which is increasingly being used by cultural and academic institutions to share their own data. Having said this, having some kind of skin, or portal which sits on top of Wikidata, specifically for importing/editing/querying relevant data could provide added value and improve user experience. Also, I think WikiCite should definitely be open to all citation data and not limited to citations already in use in Wiki projects, as it would encourage users of other projects to seek and utilise a wider range of sources, thus improving the quality of research. Jason.nlw (talk) 15:23, 29 August 2018 (UTC)
2 (best) or 1. I dislike 3 and 4 – the risks are too high for no real additional value. -- MichaelSchoenitzer (talk) 18:18, 3 January 2019 (UTC)

My 2c

Speaking as someone who contributes to the "Sum of all Paintings" Wikiproject on Wikidata and who created the "Women" Wikiproject on Wikidata, I think two things generally about this and both are in the same direction as this page indicates. Firstly, the word "billions" no longer scares me nor does it interest me particularly, because I have come to believe in the power of crowd sourcing to tackle those kinds of numbers. Secondly, build the framework from the bottom up, working from the needs of today and don't worry too much about the needs of tomorrow. We are missing the ontology and the vocabulary to describe what this is, which scares some Wikipedians, and that is OK. Yes, I think we need a separate bibliographic database, in the same way as we need a separate structured commons. I have often said over the years that separating Wikisource into language silos was a bad decision and I am hoping that the structured commons can repair some of those broken "interlanguage links" that never happened because of the silo effect. We need this bibliographic data to be a separate project because wikisource files are just some of the many disconnected files on Commons that are just too messy to pump into Wikidata "as is" and we need the crowd to massage the data and wiggle them in. The bibliographic data is like Wikisource in the language problem and can be seen alongside of Wikisource, as a pipeline into Wikisource, and of course mostly just "not suited for Wikisource due to copyright constraints". This last category is of course tricky with all of the ongoing buzz about database copyrights. I say ignore all of that and assume the laws of today will be the laws of tomorrow, but just to make sure, make the "possible pipeline into Wikisource", aka stuff already on archive.org, the main first focus area.
This I know will piss off any researchers working on the (huge) database of recent papers, but it will enable an influx of Wikisourcers into the project, and get more people thinking and contributing to Wikisource. Jane023 (talk) 06:00, 15 August 2018 (UTC)

A source is also a topic

Wikidata describes subjects/topics (towns, animals, roads, paintings). A source may also be a subject/topic, e.g., reviews treat other works that are potential sources as subject/topic. The journal (where a source is published) is a subject described in Wikipedia. The authors publishing a paper may be described in Wikipedia.

Listing of items with a subject that has a subject:

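# Items of class wd:Q265158 whose main subject (P921) itself has a main subject
SELECT ?item ?itemLabel ?item2 ?item2Label ?subject ?subjectLabel WHERE {
  ?item wdt:P921 ?item2 .       # ?item has main subject ?item2
  ?item2 wdt:P921 ?subject .    # ?item2 in turn has main subject ?subject
  ?item wdt:P31 wd:Q265158 .    # restrict ?item to instances of Q265158
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,da". }
}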

To me, it seems best to have all in one Wikidata. -- Finn Årup Nielsen (fnielsen) (talk) 12:33, 15 August 2018 (UTC)

I don't think these items should be removed, but the bibliographic database will be bigger. So this is just the same as e.g. painting items will not be removed from Wikidata once we have "Structured Data on Commons". Wikidata has some painting items that are not on Commons (due to copyright constraints), and Commons has lots of paintings that are not on Wikidata (because there isn't enough metadata in the file to make a meaningful item for it). Citations work the same way. Many newspapers cited on Wikipedia are not in Wikidata yet and the other way around. The extra project layer will be a landing place for citations that are just deadlinks today, so we record the citation in the new bibliographic database with the text that it supports, even though we don't have enough metadata to build an item for the cited source. Jane023 (talk) 18:27, 15 August 2018 (UTC)

An example of a paper discussing other papers is Inaccurate retraction notice for meta-analysis by Iwamoto et al (Q56294160) [1]. It is discussing a retraction note, and that retraction note discusses another paper. -- Finn Årup Nielsen (fnielsen) (talk) 08:54, 29 August 2018 (UTC)

Previous Wikidata RFC on this

See Wikidata:Requests for comment/Source items and supporting Wikipedia sources (from 2013, so it could certainly be revisited). There were some good arguments there on why sources should be treated as regular Wikidata objects - they are real entities in the world that can be described, and we do have Wikipedia articles directly about some of them (particular books, magazines or newspapers for example). I guess I'd like to hear more about the technical burdens - for example, could Wikidata search be enhanced to allow filtering by P31/P279 property values, rather than just by namespace? ArthurPSmith (talk) 18:14, 15 August 2018 (UTC)
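
For illustration only, here is a rough Python sketch of what class-based filtering of search hits could look like, run over toy data rather than the real search backend. The class identifiers (Q_SCHOLARLY_ARTICLE and so on), the hit structure and both function names are hypothetical placeholders, not real Wikidata IDs or an existing API. The idea is simply that a hit is dropped when any of its P31 (instance of) values reaches an excluded class through P279 (subclass of).

```python
from typing import Dict, Iterable, List, Set


def ancestors(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return cls plus every class reachable from it via P279 (subclass of)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the class graph
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def filter_hits(
    hits: Iterable[dict],
    excluded: Set[str],
    subclass_of: Dict[str, Set[str]],
) -> List[dict]:
    """Keep only hits none of whose P31 classes fall under an excluded class."""
    kept = []
    for hit in hits:
        classes: Set[str] = set()
        for cls in hit["P31"]:
            classes |= ancestors(cls, subclass_of)
        if classes.isdisjoint(excluded):
            kept.append(hit)
    return kept


# Toy class tree and search hits (all identifiers are made up).
SUBCLASS_OF = {
    "Q_SCHOLARLY_ARTICLE": {"Q_ARTICLE"},
    "Q_ARTICLE": {"Q_CREATIVE_WORK"},
}
HITS = [
    {"id": "hit-1", "P31": ["Q_SCHOLARLY_ARTICLE"]},
    {"id": "hit-2", "P31": ["Q_CITY"]},
]

if __name__ == "__main__":
    concepts_only = filter_hits(HITS, {"Q_CREATIVE_WORK"}, SUBCLASS_OF)
    print([hit["id"] for hit in concepts_only])
```

Whether something like this is cheap enough at search time is exactly the open technical question; the sketch says nothing about cost on the real index.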

How is Structured Data for Commons advancing?

Whether or not #3 or #4 would work out somehow depends on how well Structured Data for Commons goes. Maybe we should attempt to list things that work for Commons (or we think that should be working) which could apply for these. From what I have seen, it could work out rather well.
--- Jura 18:57, 15 August 2018 (UTC)

@Jura1: Structured Data on Commons is coming along with the first feature release, multilingual captions, coming out in October. This feature does not use Wikidata. The first integration with Wikidata comes with the release of Depicts (P180) statements for files; release for that feature is planned for early 2019. You can find the development plans on Commons, and you can sign up to get the Structured Data quarterly newsletter. Keegan (WMF) (talk) 18:18, 21 August 2018 (UTC)

In or out: no intermediate solution

My first opinion is stay in or leave, so option 1 or 3. I really don't see the benefit of option 2, and option 4 is just what is happening now and results in the current mess with no clear data model. But if we consider option 3, then we have to consider creating a copy of Wikidata: in a sister project, you can't just have bibliographic items. What about authors, publishing companies, publication places? To be able to fill all these data in bibliographic items, you need to create millions of other items, mainly people, so in the end you will duplicate a substantial part of WD, because even if you want to focus on bibliographic data, you will add data like citizenship or birth/death date to the author items, and you will add country membership to publication places. So then option 3 starts to be not so interesting. So why is option 2 not better than option 3? Because if bibliographic items have their own namespace, why couldn't people, chemicals and geographic places have a dedicated namespace? There is no structural need for a different namespace for bibliographic items; a bibliographic item requires the same sections as all other items (labels, identifiers, wikilinks,...). Finally we can come back to option 1. And we can discuss the main drawback of this option: billions of items will be created? Really, and why do we need all these items? In my working field, chemicals, we can create dozens of millions of items considering all possible combinations of chemical bonds and atoms, but we currently have fewer than 200,000 items about chemicals in WD. Why? Because we focus on chemicals with some interest, with some data available. Instead of speaking of billions of bibliographic items and related authors, we can perhaps start a first list of bibliographic documents having some interesting features.
Why don't we choose simple rules, like all bibliographic documents used as a reference in Wikidata/Wikipedia and documents matching some objective criteria, like a minimal number of citations for scientific articles, or having more than one edition or at least one translation for books, to justify the notability criterion? Having these simple rules reduces billions to dozens of millions. I have the impression that some people lose their critical thinking when working in WD: they add data because they can add data, especially because they can use bots. Instead of discussing what is the appropriate support to store billions of items, I think it is more critical to fix a data model for bibliographic documents and to have a structured classification to manage some millions of documents before starting to dream of billions. Recently we had a discussion about the FRBR model in WD in WikiProject Books. Few contributors took part in it, and I have the impression that WikiCite is not really concerned by the daily problems of contributors having to deal with bibliographic items. So again, stay in or leave. No intermediate solution. Snipre (talk) 21:00, 15 August 2018 (UTC)
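
To make the proposed rules concrete, a minimal Python sketch of such an inclusion test follows. It assumes a simplified record type (BibRecord), a function name (is_in_scope) and a citation threshold of 10, all of which are invented for illustration; the comment above does not specify a number or a data structure.

```python
from dataclasses import dataclass


@dataclass
class BibRecord:
    """Simplified, hypothetical view of a bibliographic document."""
    kind: str                         # "article" or "book"
    cited_in_wikimedia: bool = False  # used as a reference in Wikidata/Wikipedia
    citation_count: int = 0           # citations received (articles)
    editions: int = 1                 # number of known editions (books)
    translations: int = 0             # number of known translations (books)


# Hypothetical threshold; the discussion only asks for "a minimal number".
MIN_CITATIONS = 10


def is_in_scope(rec: BibRecord, min_citations: int = MIN_CITATIONS) -> bool:
    """Apply the simple inclusion rules sketched in the comment above."""
    if rec.cited_in_wikimedia:
        return True  # already used as a reference: always keep
    if rec.kind == "article":
        return rec.citation_count >= min_citations
    if rec.kind == "book":
        return rec.editions > 1 or rec.translations >= 1
    return False  # unknown kinds are left out until a rule exists for them


if __name__ == "__main__":
    print(is_in_scope(BibRecord(kind="article", citation_count=3)))
    print(is_in_scope(BibRecord(kind="book", translations=1)))
```

Any real policy would need agreed thresholds and a reliable source for citation counts, neither of which this sketch provides.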

Structured data on Commons more or less follows 3 (or maybe 4) and AFAIK doesn't plan to duplicate Wikidata.
--- Jura 21:11, 15 August 2018 (UTC)
@Jura1: Did you read my comment? I provided enough examples showing why the risk of duplication is high, but for you I will try to explain better:

Just imagine that Wikicite is a new sister project independent of Wikidata and I want to create one item about a book and add data about it. So the item is created and I want to add the author; how can I do it? I can add it as a string. But in that case I can't extract all books written by the same author in Wikicite, because strings can't be formatted rigorously enough to avoid different data formats (think about a woman who changes her name after marriage: how can you handle that in a string?). So the best is to create an item for the author in Wikicite. Then if Wikicite is used as a reference to build citations, we have to be able to differentiate first name and last name to generate the different citation formats. So we need 2 properties for the author item.

And if the author has an item in Wikicite and in Wikidata (a lot of authors are known for other reasons than their work), don't you have the impression that having an item for the person with the same properties in both projects is duplication?

And don't propose to leave the data about people on Wikidata: just imagine the code to create a citation in Wikipedia for a book using data in Wikicite including an author having his data in Wikidata. And when we know the difficulties Wikipedians have using Wikidata, I can't imagine their reactions when editing a citation in Wikipedia and having to modify some data in Wikicite and others in Wikidata. A nightmare: three projects, three communities, three different sets of policies; nobody will buy this. Snipre (talk) 21:57, 15 August 2018 (UTC)

How to cite?

How can we "cite" (reference) facts at WD along Op 3/4? --Succu (talk) 21:39, 15 August 2018 (UTC)

I think it would be the same for all 1/2/3/4. It's just a configuration question if one can use values only from the local Wikibase or also some other. Commons plans to use items and properties from Wikidata, but they could have chosen to use only Wikidata items.
--- Jura 21:49, 15 August 2018 (UTC)
I'd say: pretty much in the same way as we're doing it today? Scenario 4 assumes no central bibliographic record in Wikidata or in another Wikimedia project, so references will continue to follow the current practice of using reference URL (P854) when appropriate or additional dedicated properties and identifiers (if external catalogs or knowledge bases have one, that is). In Scenario 3, I assume we'd have a user interface that can pull metadata from the bibliographic sister site (it would work in a way similar to structured data look up from Commons to Wikidata), but I defer to others at WMDE who are more knowledgeable on this (also @Daniel Mietchen: since he raised a similar question in an earlier draft of the document).--DarTar (talk) 21:53, 15 August 2018 (UTC)
- I'm referring to the WD action "Add Reference". --Succu (talk) 22:12, 15 August 2018 (UTC)
- As on https://federated-commons.wmflabs.org/wiki/File:LighthouseinDublin.jpg except that the items used as values wouldn't be in the local wiki. This should also work for #4 if the Wikibase is configured.
  --- Jura 22:40, 15 August 2018 (UTC)
If option 3, use the description defined in Help:Sources#Databases (Wikicite will generate IDs for its items); if option 4, use the description defined in Help:Sources#Web_page. The problem will be the creation of citations in WP using values from WD and reference data from Wikicite: the template will be a nightmare. Snipre (talk) 22:35, 15 August 2018 (UTC)
- Help:Sources#Headstones_at_Commons is probably the better comparison (except that type of reference won't be needed and we can't load any media description data from Commons yet).
  --- Jura 22:47, 15 August 2018 (UTC)

The community appeal is not big enough to deserve an independent project

Building an ontological database is a niche hobby, and building an ontological database of sources is even further away from the mainstream. There are not that many people interested, and the value of any project is in its community. At least while on Wikidata, Wikicite still can piggyback on the success of the parent project, but independently it would be more difficult.

One should take a look at communities built around source metadata and see what happened with them. For instance, the OpenLibrary can be considered a predecessor since it aimed to collect information on all the books. The result is an almost nonexistent community. There are some edits, but a very limited community.

Also, by the nature of citations, you don't normally refine existing items, the only task left to do for potential contributors is to add new items, which will be increasingly obscure, and highly automated with little human intervention. What is the value of having a purely automated knowledge base? Close to zero. The added value of Wikidata over other databases resides in the human curation.

I also doubt the usefulness of growing the size of sources unrestrainedly. "Growth for the sake of growth is the ideology of the cancer cell." What is needed instead is intelligent growth: developing guidelines so that the sources that make it into Wikidata are the most relevant ones, and of course, deleting the rest.

I would see some advantages in creating a dedicated namespace if that would mean that those items would have special characteristics, like sub-entities to describe versions. Something that we missed in the past, but now with the experience gained while creating the Lexeme datatype it should be possible.

In the future, once the Structured Commons is finished and federation works, maybe someone can start a botipedia of sources away from WMF projects and federate it anyway. Our mission is not to collect all knowledge, but to collect the sum of all knowledge, which means that we have to be discriminating about what we dedicate resources to.--Micru (talk) 20:49, 17 August 2018 (UTC)

I don't think there is enough information to claim that "there are not that many people interested". The lack of community on OpenLibrary is a result of a variety of different factors. OpenLibrary has drastically limited developer support and funding, an outdated and sometimes deprecated tech stack, and a poorly defined data model. OpenLibrary doesn't even allow for discussion between its users; it's virtually impossible to form a community there. Comparing its success to Wikidata's is not an even playing field. As an avid editor of and open source contributor to OpenLibrary, I can say that we are investigating solutions to these problems, but a Wikimedia-backed or Wikibase-backed approach would allow for a lot of benefits. --Hardwigg (talk) 06:10, 18 August 2018 (UTC)

The information is there, you only need to find out how many contributors have participated in the millions of source items that we currently have. That would give us an estimate of the number of users that we are talking about.--Micru (talk) 09:15, 18 August 2018 (UTC)

WikiProject Books has plenty of participants, but somehow the project seems to struggle with the general Wikidata structure and with developing and implementing a clear structure for its items. Projects in other fields that steered away from its principled approach might be considered more successful. Given these problems, bibliographic data might actually work better in a separate project.
--- Jura 12:56, 18 August 2018 (UTC)

There seem to be two different communities that overlap slightly. The bibliographic community like WikiProject Books seems to care only about books and textual materials, while the Wikicite community seems to care mostly about papers. Still, until we know how many members participated in the creation of the millions of items about papers, this discussion is pointless.--Micru (talk) 14:06, 18 August 2018 (UTC)

I think there is a huge community, if you look at it the right way. Anyone who writes a scholarly work probably wants to cite sources. They might type up some metadata from paper sources, download other data from Crossref, and so on. But then you have to proofread the source metadata. There are so many errors even in Crossref data: papers assigned to the wrong century, glaring misspellings in the title (makes it fun to find), DOIs that don't exist, authors' names misspelled, you name it. And plain old missing fields (CC license?), or fields that don't match the reality they are attempting to document, like authors with names from non-western naming conventions. So everyone keeps their own little database of corrected metadata. Some research groups/areas co-operate by sharing proofread bibliographic metadata. If we can provide easy-to-use interfaces right where people are working already, academics and publishers worldwide will use WikiCite as an exchange medium, contributing as they go.

This is not an argument in favour of an independent project per se, as these benefits are realizable regardless, but the best way to realize them might deserve consideration. HLHJ (talk) 03:10, 27 November 2018 (UTC)

Multiple attempts to create Wikidata-like things have failed socially, and Wikidata succeeded. We seem to be unusually good at creating communities at Wikimedia. Every academic I have ever known curates an ontological database of sources, often using something like Zotero (which is interested in collaboration). It's not a very niche hobby, and I expect that it's one we will all be getting into. HLHJ (talk) 15:47, 15 June 2019 (UTC)

What is bibliographic data?

The proposal doesn't really outline this and several approaches are possible. Supposedly it could be any edition of a textual work used as a value in cites work (P2860) or in a reference of a statement.

From the comments above, it could also include biographical data on any author.
--- Jura 12:56, 18 August 2018 (UTC)

@Jura1: thanks for bringing this up. The scope of the proposal is limited to bibliographical records narrowly defined, that could be the target of a reference link in Wikidata or a citation template in a Wikipedia article. Entities about authors, publishers, affiliations, topics/subjects that are not instances of classes in the creative work tree are outside of the scope of this proposal and would still live in the main Wikidata mainspace. Does this make sense? --DarTar (talk) 20:26, 19 August 2018 (UTC)

No dedicated data model please

Option 2 proposes to mimic Lexemes, using a dedicated namespace and possibly data model. Do we really need a different data model for bibliographic sources? Is WMDE even committed to invest the development effort this represents? From what I have seen of the work on Lexemes, this is a very costly project, not just for the WMDE team but also for the rest of the technical community who needs to upgrade the tool ecosystem. In terms of data structure, what is wrong with the current approach with items? The statements and qualifiers we have seem to do the job pretty well, no? − Pintoch (talk) 10:48, 19 August 2018 (UTC)

@Pintoch: thanks for the input. I agree with you that #2 and #3 would incur big costs (and we would need to seek funding to make them happen); I also don't think this should be a blocker if these are the best directions the community agrees upon for the future of the project. You're also right about the cost of a different data model for tool developers (something I tried to capture in the risks and benefits table, but feel free to flesh out more details or change the wording). At Wikimania, we brainstormed a bit about reasons why bibliographic entities may benefit from a dedicated data model.

Allowing the creation of labels or aliases that can be translated in different languages for, say, a book edition, is going to create a horrible mess for data consumers. A book edition/translation really shouldn't have anything other than its title in the original language of the edition as a label. By the same token, a description may not be needed at all if it can be programmatically generated from the contents of the entity / instance of statements.
Probably none of the sitelinks would apply here: notable works should stay in Wikidata and generic works that are not notable enough should not have links in other projects.
Conversely, representing articles/entries across Wikimedia projects that cite a specific work as relations in the data model itself would open up opportunities for data reuse, analysis, and discovery. This cannot be done with the current data model but it was part of the original vision for WikiCite and LibraryBase.

These are just examples, some of which may or may not make sense (and I should clarify that I am not particularly married to scenario #2), but I think there would be benefits associated with a dedicated data model that we'd need to further explore. I think there's also an unrelated, but important, meta issue you're hinting at, i.e. the potential proliferation of namespaces and the creation of incongruent data models that a change like the one proposed in this document could trigger (the "slippery slope" argument in the table).--DarTar (talk) 20:12, 19 August 2018 (UTC)

The main point of option 2 is a dedicated namespace. Only some properties would have to be changed. -- JakobVoss (talk) 09:25, 20 August 2018 (UTC)

Questioning "growing pains"

"The rapid ingestion of content is taking a toll on the querying infrastructure, causing frequent timeouts." Is this actually the case? Given that @harej: is mostly doing the citations, couldn't Stanislav Malyshev (Q56010307) just talk to harej? The timeout I get on Wikidata Query Service (Q20950365) seems to be related to the imposed 1-minute timeout.
"For example, searching Wikidata for any given keyword is much more likely to return bibliographic items than other types of entities users might be interested" Is this really the case? Can we get some examples on this? Once main subject (P921) is used more I believe the non-source items will get boosted. I believe sources can actually be used to glue and describe non-source items better together, mostly through main subject (P921).

-- Finn Årup Nielsen (fnielsen) (talk) 09:06, 29 August 2018 (UTC)

"The prevalence of bibliographic content in Wikidata (nearly 40% of its items as of August 2018) makes it hard to sell its value proposition as a domain-general knowledge base." When we have worked with Wikidata for cultural items in Copenhagen meetings, e.g., for painting at Statens Museum for Kunst (Q671384) or films databased at Danish Film Institute (Q1201043), I do not recall ever having heard that concern. Indeed sources like catalogue raisonné (Q1050259) are seamlessly used together with the non-source cultural items. -- Finn Årup Nielsen (fnielsen) (talk) 09:15, 29 August 2018 (UTC)

I also question these two vaguely mentioned growing pains:

Potential issues resulting from the huge scale of data to be ingested. This is falsely assuming that we have the same configuration on the bridge between the Blazegraph machines and the mysql database. It also ignores the possibility for the community to monitor and act on WD being too busy by having tools like Wikibaseintegrator that delay uploads until the query lag is under 6 seconds. This guards against a bad user experience when editing manually on wikidata.org. Tools that do not act like this should be blocked until fixed. Since the maxlag or timeout change in 2020 I have not seen any big issues with lag. The only thing I have seen is one of the WDQS servers hiccup, getting minutes behind and needing a restart or whatever.
Queries are already taxed by having so many citation items in the database. This is false. Blazegraph is fine, but the timeout cutoff is of course affecting users. Anyone can rent a VPS, set up Wikibase and ingest a subset of or the whole Wikidata from a dump. Then they can run whatever queries they want with no timeout. So9q (talk) 09:25, 15 February 2021 (UTC)
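
The "delay uploads until the lag is under 6 seconds" behaviour can be sketched generically in Python as below. This is not WikibaseIntegrator's actual code or API: the function name wait_for_low_lag, the polling interval and the give-up limit are assumptions made for the example, and the lag reading is passed in as a callable so that the sketch runs without any network access.

```python
import time
from typing import Callable


def wait_for_low_lag(
    get_lag: Callable[[], float],
    max_lag_seconds: float = 6.0,   # threshold mentioned in the comment above
    poll_seconds: float = 5.0,      # hypothetical pause between checks
    max_polls: int = 60,            # hypothetical limit before giving up
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until get_lag() reports less than max_lag_seconds.

    Returns the number of pauses taken. Raises TimeoutError if the lag is
    still too high after max_polls pauses.
    """
    waits = 0
    while get_lag() >= max_lag_seconds:
        if waits >= max_polls:
            raise TimeoutError("lag stayed above the threshold")
        sleep(poll_seconds)
        waits += 1
    return waits


if __name__ == "__main__":
    # Simulated lag readings instead of a real server query.
    readings = iter([12.0, 8.0, 3.0])
    pauses = wait_for_low_lag(lambda: next(readings), sleep=lambda s: None)
    print(f"edit allowed after {pauses} pauses")
```

A bot would call something like this before each batch of edits; how the lag value is obtained in practice depends on the tool and is left out here.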

What options would allow fair-use metadata?[edit]

It would be good to include fair-use metadata, such as abstract and conflict-of-interest texts. See m:Talk:WikiCite#COI use case for some reasons. I'm told fair-use regs are set on a project-by-project basis. I understand that Wikidata may become multi-license, and whether WikiCite should become a separate project is currently a matter of discussion. Views? HLHJ (talk) 03:41, 27 November 2018 (UTC)[reply]

Outcome?[edit]

@Dario (WMF): After all the input received, and the Wikicite conference, what is in your opinion the direction in which WikiCite should further develop?--Micru (talk) 10:52, 4 January 2019 (UTC)[reply]

Again an interesting question now for the active WikiCite + WD communities, in advance of Wikimania. Sj (talk) 19:43, 7 August 2019 (UTC)[reply]

@micru: See below :) Sj (talk) 09:11, 19 August 2019 (UTC)[reply]

Update: a sister-project approach[edit]

Etherpad: https://etherpad.wikimedia.org/p/wikicitebase

Proposal

Stand up a short-term wikibase (WikiCiteBase) to fill w/ all known metadata, including from fatcat (below). Starting on a $100/mo virtual machine.
Plan to rebuild this from scratch in 1 month, and tear it down + entirely rethink in 6 months.
Seed this w/ properties from WD, and biblio entities from same.
Merge in records from Fatcat (below), add Daniel M's workflows

Immediate VM needs : a VM that supports the default wikibase docker; quickstatements &c. can't guarantee scaling.
See what we have to do to make this fail. Rebuild as needed.
See if reasonator + M&M + other tools can be made to work easily.
Future needs : spec a proper sister project w/ costs. (one estimate: O(200k) for migration & related overhead + 40k/y for a dedicated db cluster)

Sj (talk) 09:10, 19 August 2019 (UTC)[reply]

Asides[edit]

see also Federated knowledge graphs/Wikibase

Bnewbold has metadata for ~130M records in FatCat.

This includes hashes of files and file-entities, their metadata, and clusters of related instances + revisions of a work. Users can edit metadata directly. [fatcat could switch to importing this from WD if it has a QID]
Does fast ORCID and other ID lookups.
- Idea: connect w/ scripts that show md and allow editing [of WD] in-place? fc.w wouldn't want to host this, but it could be a toolforge overlay.
- Request: fatcat is cleaning up metadata about ~150k journals, wants to store this in Wikidata.
CScott: wants to add annotations to narrow the target of citations.

Tagging the page as historical[edit]

Suggestion:
I often find that people assume that the existence of this page implies that there are specific resources (staff, budget, hardware) allocated towards building one of these proposed outcomes. This is incorrect. It is a valid document: much discussion has happened here, and m:WikiCite has done lots of things as a result of the momentum from this kind of discussion. Therefore, to avoid confusion, I wonder if it's not best for everyone if we add this template:

This page is currently inactive and is retained for historical reference. Either the page is no longer relevant or consensus on its purpose has become unclear. To revive discussion, seek broader input via a forum such as the project chat.

Is that Ok?

Furthermore, I would point people to a proposal I've been preparing which is not the same as these ones here but is - I feel - their "spiritual successor": m:WikiCite/Shared Citations.

-- LWyatt (WMF) (talk) 16:39, 15 February 2021 (UTC)[reply]

Support Is m:WikiCite considered the "home" of WikiCite now? ArthurPSmith (talk) 18:09, 15 February 2021 (UTC)[reply]

For what it's worth ArthurPSmith, that is where I, using this WMF user-account as the contracted 'wikicite coordinator', work. LWyatt (WMF) (talk) 15:23, 16 February 2021 (UTC)[reply]

Support--So9q (talk) 18:48, 15 February 2021 (UTC)[reply]
Support - PKM (talk) 21:12, 15 February 2021 (UTC)[reply]

On the basis of these three quick support comments on the first day, and nothing for the subsequent few days, I've made the proposed change. LWyatt (WMF) (talk) 18:51, 18 February 2021 (UTC)[reply]
