Shortcuts: WD:INTRO, WD:I

User:Jaredscribe


A category (lit. "accusation") is predicated of an object of thought or action, regardless of whether that object exists actually, exists potentially, or does not exist at all.

An individual is a member of a species or class, of which some category can be predicated.

A genus is a set of species of the same kind.

Of the essential and accidental attributes, that which differentiates one species from another constitutes its essence and is used in the definition of the species or genus under consideration; this is also known as quiddity or haecceity. Diaeresis is the method of arriving at a definition by means of differentiation.

The tractate on Categories (s:Categories (Aristotle)) is instructive on the metaphysics of ontology, and its terms and relations are necessary tools for the Prior Analytics, which enables the scholarly w:dialectic, the "organic method".

My project on Wikidata is to organize and structure data to enable better cross-language and interwiki linking and data sharing from Wikisource to Wikiquote, and thence to Wikiversity and, of course, Wikipedia. And to conduct Posterior Analytics.


Data Modelling Days November Notes[edit]

Q367664 is engineering. A w:Data model (w:he:מודל נתונים, Q1172480) is a method of Model-driven_engineering (Q1941909).

Wikidata:Events/Data_Modelling_Days_2023, structured data, questions about series, expedition class, subclass, inheritance

Since libraries are full of data for us to enjoy, I'll someday/maybe join: Wikidata:WikiProject_LD4_Wikidata_Affinity_Group

Audio/video: YouTube @wikimediaDE

EntitySchemas[edit]

Slides

Presentation and slides by Lydia Pintscher and Arian Bozorg. "Let's write a simple EntitySchema from scratch, to understand how it works and the tools around it." Facilitator: Jelabra. Slides

Schemas: EntitySchema is a namespace(?) of encoded pages in Wikidata; for example, human is EntitySchema:E10.

Q54872 is a Q235557 used by the Q54837, including Q16354758, which provides RDF serializations for RDF dumps and SPARQL endpoints. EntitySchema:E1, EntitySchema:E50. A Shape is not a schema: RDF does not want to impose a schema, but producers and consumers assume one, so there is an implicit schema. Q29377880, the Shape Expressions language (ShEx), provides description and validation; SHACL provides constraints. ShEx syntax resembles w:Turtle or SPARQL. Learn more at shex.io and www.weso.es/YASHE. Leverage and reuse existing schemas, EntitySchema:E69 for example. ShEx can be used to generate UI forms, for input and validation.
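To make the idea concrete without a real ShEx engine: a shape is a set of conditions that an entity's property/value statements must satisfy. The sketch below is an illustrative Python stand-in (the dict layout and the "human" conditions are my assumptions, loosely inspired by EntitySchema:E10, not its actual definition):

```python
# Illustrative sketch, NOT a real ShEx validator: an entity is given as a
# property -> list-of-values dict, and the "shape" is a hand-written check.

def conforms_to_human_shape(entity: dict) -> bool:
    """A 'human' item must state instance of (P31) = human (Q5),
    and may carry at most one date of birth (P569)."""
    if "Q5" not in entity.get("P31", []):
        return False          # non-conformance: not an instance of human
    if len(entity.get("P569", [])) > 1:
        return False          # non-conformance: single-value expectation violated
    return True

douglas_adams = {"P31": ["Q5"], "P569": ["1952-03-11"]}
print(conforms_to_human_shape(douglas_adams))        # True
print(conforms_to_human_shape({"P31": ["Q11424"]}))  # False (a film, not a human)
```

A real ShEx schema expresses these conditions declaratively, and tools like the validator mentioned below report which condition failed.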

[1] User_talk:M.alten.tue wants input on improving the output of the ShEx validator: making non-conformance findings easier to interpret and act on, and making the schema validator more visible to new users, in the hope that more projects will start using it.

Reusing Wikidata on Wikivoyage[edit]


Presentation and slides by User:DerFussi. My new sandbox: voy:User:Jaredscribe

Module:Databox (Q53931871), said to be Q17487586, an instance of Q15184295; Module:Wikidata, an instance of Q59259626; Module:Vcard.

And on Commons[edit]

Slides

Presentation and slides by User:Jarekt

Here's my initiation into the project where I'll try this out for the first time: commons:User:Jaredscribe. (I like to navigate by q:WQ:Blueify-ing red links.) Module:Information (SDC), Module:Coordinates, Module:Artwork, Module:Institution, Module:Creator, Module:Authority control

Template:Wikidata Infobox

For example, commons:Mona Lisa, where much of the info comes from Wikidata. How do we know which Wikidata item to fetch?

Module:Wikidata; data i18n by Module:Complex data. Provenance: owned by (P127), significant event (P793), collection (P195) (hard cases: previously in collection of, on loan to).

Mitigating conflation Q14946528 and duplication Q55414[edit]

By User:Epidosis. At issue especially with batch imports. Two constraint violations: a unique value constraint violation (the same ID in two or more items: duplication) and a single value constraint violation (two or more ID values for one item). We have a gadget for merges; we need one for splits, which currently involve a lot of manual edits and are long and complex. Improve data round-tripping (duplicates are also found in external databases, so those databases could be improved too).
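The two constraint checks above can be sketched over a toy mapping of items to external-ID values (all item IDs and values below are made up for illustration):

```python
# Sketch of the two ID-constraint checks, over {item: [external-ID values]}.
from collections import defaultdict

statements = {
    "Q100": ["12345"],           # fine
    "Q200": ["67890"],           # same ID as Q300 -> likely duplicate items
    "Q300": ["67890"],
    "Q400": ["11111", "22222"],  # two values -> single value violation
}

def unique_value_violations(stmts):
    """External-ID values that appear on two or more items (duplication)."""
    holders = defaultdict(list)
    for item, values in stmts.items():
        for v in values:
            holders[v].append(item)
    return {v: items for v, items in holders.items() if len(items) > 1}

def single_value_violations(stmts):
    """Items carrying two or more values for a single-valued ID property."""
    return {item: values for item, values in stmts.items() if len(values) > 1}

print(unique_value_violations(statements))  # {'67890': ['Q200', 'Q300']}
print(single_value_violations(statements))  # {'Q400': ['11111', '22222']}
```

The first kind of finding is what a merge gadget resolves; the second is what a split gadget would help with.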

Bootstrapping a Knowledge Graph[edit]

texts and tables <==> knowledge extraction <==> terms and relations <==> KG

Subset of entities of interest: browse and retrieve the ontology hierarchy. Zero-shot analogical pruning. Analogical inference requires abstraction, inference, and creativity.

Accessing Wikidata in Obsidian, with a biodiversity informatics example[edit]

Slides. To understand species distribution in botany: echinopscis.github.io, based on markdown files, gives access to relevant data in context.

Enforcing Data Models with Autofix[edit]

Slides

Speaker: User:Epidosis. Template:Autofix is stored on property talk pages; it partially duplicates the info in property constraints, cannot be queried, and is not very visible. Property_talk:P106 (occupation); Property:P214, Property_talk:P214. Example: w:Martyr (Q6498826, w:he:מרטיר). Cf. w:Sanctification of the name => w:Kiddush Hashem (w:he:קידוש השם, Q2919881; Judaism) and w:Shahid (w:he:שהיד; Islam).

In our opinion, martyrdom is not an occupation; therefore autofix => subject has role (Property:P2868), aka "status".

The pattern ^Q6498826$ will be automatically matched, and the value Q6498826 moved to the subject has role (P2868) property.
Testing: TODO list
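The rule is essentially a regex-driven move: when a value matching ^Q6498826$ appears under occupation (P106), relocate it to subject has role (P2868). A minimal sketch (the dict-of-claims layout and the `apply_autofix` helper are my own illustration, not Template:Autofix's actual mechanism):

```python
# Minimal sketch of a regex-driven autofix: move matching values from one
# property to another, as described for P106 -> P2868 above.
import re

AUTOFIX = {"P106": (re.compile(r"^Q6498826$"), "P2868")}  # martyr -> role

def apply_autofix(claims: dict) -> dict:
    """Return a copy of {property: [values]} with matching values moved."""
    fixed = {p: list(vs) for p, vs in claims.items()}
    for prop, (pattern, target) in AUTOFIX.items():
        for value in list(fixed.get(prop, [])):
            if pattern.fullmatch(value):
                fixed[prop].remove(value)
                fixed.setdefault(target, []).append(value)
    return fixed

print(apply_autofix({"P106": ["Q6498826", "Q82955"]}))
# {'P106': ['Q82955'], 'P2868': ['Q6498826']}
```

Note that the anchored pattern leaves other occupation values (here Q82955, politician) untouched.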

Wikidata Query Service[edit]

query.wikidata.org, Wikidata's SPARQL endpoint
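A hedged sketch of how a client talks to the endpoint: WDQS accepts a SPARQL query via the `query` URL parameter and can return JSON. The snippet only builds the request URL (the `format=json` parameter and the helper name are assumptions; actually sending the request needs network access and a descriptive User-Agent):

```python
# Build (but do not send) a request URL for the Wikidata Query Service.
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

def build_query_url(sparql: str) -> str:
    """URL for a GET request carrying the SPARQL query, asking for JSON."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})

# Where was Douglas Adams (Q42) educated (P69)?
query = "SELECT ?school WHERE { wd:Q42 wdt:P69 ?school . }"
print(build_query_url(query))
```

Each such query is one unit of the read load discussed in the scaling session below.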

Scaling it - Split the Graph experiment[edit]

Speakers: User:GLederrey (WMF) and User:DCausse (WMF), re: Wikidata:SPARQL_query_service/WDQS_backend_update/October_2023_scaling_update and Wikidata:SPARQL_query_service. The Query Service uses the graph database w:Blazegraph rather than the RDBMS MySQL (which is used by the MediaWiki software). About 1M edits/day, all of which need to be ingested. The service is approaching a hard scaling limit that cannot be addressed by more hardware. Three facets: write load, read load, and data size. It is one of the largest SPARQL endpoints on the web.

Scaling graph databases is a hard problem. Experimenting with splitting into subgraphs and federating. A size reduction of 25% is needed.

Separating out scholarly articles is a good first experiment: scholarly articles represent roughly half of Wikidata's triples but affect only about 2% of queries (many of which are run as part of data imports), and such a split would be easy to understand. Also considered: truthy vs. fully reified graph; the truthy graph would be smaller, but the full graph would still need to be maintained.
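The split idea can be pictured as routing each triple by its subject into one of two stores. This is only a toy sketch of the concept (the membership set stands in for a real check such as P31 = scholarly article; the article ID below is made up):

```python
# Toy sketch: partition triples into a "main" and a "scholarly" subgraph,
# routing each triple by whether its subject is a scholarly article.
SCHOLARLY_SUBJECTS = {"Q900000001"}  # made-up stand-in for an article item

triples = [
    ("Q42", "P69", "Q691283"),     # Douglas Adams: stays in the main graph
    ("Q900000001", "P50", "Q42"),  # author statement on the (made-up) article
]

def split_graph(triples, scholarly_subjects):
    """Route each (s, p, o) triple into (main, scholarly) by subject."""
    main, scholarly = [], []
    for t in triples:
        (scholarly if t[0] in scholarly_subjects else main).append(t)
    return main, scholarly

main, scholarly = split_graph(triples, SCHOLARLY_SUBJECTS)
print(len(main), len(scholarly))  # 1 1
```

A federated query engine would then recombine the two subgraphs for the minority of queries that touch both.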

Alternate reference model[edit]

speaker: User:ArthurPSmith

Slides for Wikidata Data Modelling Days 2023 Friday 14:00 UTC session

On Wikipedia, we all know [citation needed]; yes, we want data to be w:WP:Verifiable. In Wikidata, every statement has independent references, which can be cumbersome to enter and can lead to duplication and storage limits.

What Wikidata Is[edit]

IMHO, copied from Wikidata:Introduction, with proposed improvements eventually forthcoming, someday/maybe. See also Help:About_data. I will probably try to merge these two pages into one, right here, and change the Wikipedia links to Wikidata links. Wikidata:Glossary will be important.

Wikidata is a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and to anyone in the world.

What does this mean?[edit]

Let's look at the opening statement in more detail:

  • Free. The data in Wikidata is published under the Creative Commons Public Domain Dedication 1.0, allowing the reuse of the data in many different scenarios. You can copy, modify, distribute and perform the data, even for commercial purposes, without asking for permission.
  • Collaborative. Data is entered and maintained by Wikidata editors, who decide on the rules of content creation and management. Automated bots also enter data into Wikidata.
  • Multilingual. Editing, consuming, browsing, and reusing the data is fully multilingual. Data entered in any language is immediately available in all other languages. Editing in any language is possible and encouraged.
  • A secondary database. Wikidata records not just statements, but also their sources, and connections to other databases. This reflects the diversity of knowledge available and supports the notion of verifiability.
  • Collecting structured data. Imposing a high degree of structured organization allows for easy reuse of data by Wikimedia projects and third parties, and enables computers to process and “understand” it.
  • Support for Wikimedia wikis. Wikidata assists Wikipedia with more easily maintainable information boxes and links to other languages, thus reducing editing workload while improving quality. Updates in one language are made available to all other languages.
  • Anyone in the world. Anyone can use Wikidata in any number of different ways via its application programming interface.


How does Wikidata work?[edit]

This diagram of a Wikidata item shows you the most important terms in Wikidata.

Wikidata is a central storage repository that can be accessed by others, such as the wikis maintained by the Wikimedia Foundation. Content loaded dynamically from Wikidata does not need to be maintained in each individual wiki project. For example, statistics, dates, locations, and other common data can be centralized in Wikidata.

The Wikidata repository[edit]

Items and their data are interconnected.

The Wikidata repository consists mainly of items, each one having a label, a description and any number of aliases. Items are uniquely identified by a Q followed by a number, such as Douglas Adams (Q42).

Statements describe detailed characteristics of an Item and consist of a property and a value. Properties in Wikidata have a P followed by a number, such as with educated at (P69).

For a person, you can add a property to specify where they were educated, by specifying a value for a school. For buildings, you can assign geographic coordinates properties by specifying longitude and latitude values. Properties can also link to external databases. A property that links an item to an external database, such as an authority control database used by libraries and archives, is called an identifier. Special Sitelinks connect an item to corresponding content on client wikis, such as Wikipedia, Wikibooks or Wikiquote.

All this information can be displayed in any language, even if the data originated in a different language. When accessing these values, client wikis will show the most up-to-date data.

Item            Property      Value
Q42             P69           Q691283
Douglas Adams   educated at   St John's College
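The row above, as a minimal data structure: an item carries per-language labels and a set of claims, and each statement is a (property, value) pair attached to the item. (The dict layout is a simplification for illustration, not the Wikibase JSON format.)

```python
# The Q42 example as a tiny data model: item -> labels + claims,
# and claims flattened into (item, property, value) statement triples.
item = {
    "id": "Q42",
    "labels": {"en": "Douglas Adams"},
    "claims": {"P69": ["Q691283"]},  # educated at: St John's College
}

def statement_triples(item):
    """Flatten an item's claims into (item, property, value) triples."""
    return [(item["id"], p, v) for p, vs in item["claims"].items() for v in vs]

print(statement_triples(item))  # [('Q42', 'P69', 'Q691283')]
```

Values that are themselves items (here Q691283) are what make the repository a graph rather than a flat table.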

About me[edit]