Wikidata talk:WikiProject Companies/Archive 1

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

First thread

@Jklamo: What about centralizing our discussion about companies here? To your recent change at Q335415 I have a remark: pulp and paper industry (Q2283886) is the industry, papermaking (Q335415) is the process.--Kopiersperre (talk) 16:20, 12 October 2016 (UTC)

Organizations generally?

I was thinking about starting a wikiproject on "organizations" and noticed this one. Is there an interest in generalizing this project to nonprofit, educational organizations, government agencies etc? I believe the Legal Entity Identifier (P1278), for example, covers a lot more than just companies. I was going to work on getting that more fully populated (only 77 ids so far). ArthurPSmith (talk) 10:10, 10 November 2016 (UTC)

There is, provided we don't get swamped with school boards and sports teams. Many customers need not just company data but also org data (eg who legislated what when, or which govt agencies are working in the particular domain). Eg DnB data includes all kinds of gov departments (so the biggest Global Ultimate parent is the US government, followed by the Chinese government). --Vladimir Alexiev (talk) 17:37, 23 February 2017 (UTC)

On the other hand, GLEI is mostly companies, and a lot of them in the financial sector. --Vladimir Alexiev (talk) 17:39, 23 February 2017 (UTC)

Just how much data does Wikidata want to hold?

Not sure I understand Wikidata's scope. For example, will WikiData follow Wikipedia and exclude most smaller companies from coverage, or will WikiData cover every company (or at least every public company)?

For the U.S., perhaps other countries, there is already highly structured financial statement data for all public companies. See for example:

Of course quarterly financial statements are only the tip of the iceberg. Will there be 8K (news releases) included in WikiData, currency reporting in real time, last sale transaction by transaction, bids and offers, etc.?

To cover companies you need financials plus market data (last sale, closing price, bids/offers), plus a database of who is trading (Legal Entity Identifiers). All the notes to the financials and all the text filings are very important too. Will WikiData swallow the entire thing? Much of the usage of these financial data sets is high speed; does it make any sense to try to load it into WikiData rapidly?

Rjlabs (talk) 00:14, 5 December 2016 (UTC)

@Rjlabs: good questions. I don't believe wikidata as it stands is suited to capture time-series data generally - for example stock prices over time - there's no datatype appropriate for time series other than quantity datatype with qualifiers, which would mean one statement for every point in time, which is really really messy. However, as far as containing some data about "every company", I think that may legitimately be within wikidata's scope. WD:N basically requires only that the entity be something in the real world that is described by reliable sources, so as long as we have some third party dataset with appropriate information, and with a reasonably compatible license, we can pull that in. Or portions of it depending on the license. I've been working with the GLEIF data for US entities via Mix N' Match - here - and found a huge number of entities there which aren't in wikidata but maybe could be. I've marked all the mutual fund/hedge fund/retirement fund entities in GLEIF as NOT suitable for wikidata - there are a lot of those as well and I doubt we really need a wikidata entity for every "municipal bond index fund" or whatever. Of course a large fund that is covered in wikipedia for some reason would be fine. Also not everything else in GLEIF is a company - there are a lot of churches, cities and counties, private colleges and universities, etc. and there are a few individuals. A lot of the "small companies" in GLEIF seem to be related to real estate - property management or rental etc. So I think by-hand filtering is still appropriate there but in the long run a good fraction of those small companies probably ought to be in wikidata too. ArthurPSmith (talk) 14:08, 5 December 2016 (UTC)
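
To make the "one statement for every point in time" concern concrete, here is a minimal Python sketch. The dictionary layout is a simplification invented for illustration (it is not the actual Wikibase JSON), the helper name is hypothetical, and the values are made-up placeholders rather than real figures. The only real identifiers are market capitalization (P2226) and point in time (P585).

```python
from datetime import date, timedelta


def series_to_statements(property_id, observations):
    """Turn (date, value) pairs into one statement per observation.

    Hypothetical helper: each observation becomes its own statement
    carrying a 'point in time' (P585) qualifier.
    """
    return [
        {
            "property": property_id,
            "value": value,
            "qualifiers": {"P585": day.isoformat()},  # P585 = point in time
        }
        for day, value in observations
    ]


# Placeholder values only: one made-up observation per day of 2016.
start = date(2016, 1, 1)
daily = [(start + timedelta(days=i), 100.0 + i) for i in range(366)]

statements = series_to_statements("P2226", daily)
print(len(statements))  # one statement per observation
```

A single property observed daily already produces hundreds of statements per item per year, which is the messiness described above; annual or quarterly values keep the count manageable.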

Although WD notability policy is broad enough to cover all companies, I think we have no human resources to maintain millions of items about companies. But I have no problem with having items about all listed companies and other notable companies (by means of enwiki notability).

Even if we have market capitalization (P2226), I think it is better to use it on an annual (or quarterly) basis, rather than on a daily basis. About total revenue (P2139) (etc.) I think WD is able to swallow quarterly financial statements.

For ownership changes (parent organization (P749) and owned by (P127)) it will be useful to store only sizable changes, not purchases of a few shares by management. But when we are talking about ownership, it will be useful to clarify the use of these two properties first.--Jklamo (talk) 10:36, 6 December 2016 (UTC)

Number of customers property

Often when I read WP articles on subscription based companies (e.g. telecommunication companies, newspapers, pay-tv channels, etc.) the first sentence in the WP article is something like 'company XYZ is the largest company (based on number of customers)'. Seems like a pretty obvious piece of company information. Unfortunately I didn't find a way to capture that, often detailed, information in wikidata.

So I was wondering: is there really no wikidata property for 'Number of customers'?

Did I miss something?

Any hint appreciated..

Givegivetake (talk) 21:26, 16 January 2017 (UTC)

hmm, closest I can think of for this right now is has part(s) of the class (P2670) with value customer (Q852835) and a qualifier with the count, but that doesn't seem right. I think you should propose a new property for number of customers... ArthurPSmith (talk) 13:50, 17 January 2017 (UTC)

Number of subscribers would be more fitting. The number of customers of a normal newspaper that don't subscribe is likely unavailable. ChristianKl (talk) 16:03, 3 February 2017 (UTC)

Thanks for your feedback! I added a first proposal at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Number_of_subscribers Looking forward to the discussion! Givegivetake (talk) 20:01, 25 February 2017 (UTC)

Sources of Data and How Much is Too Much

My impression is that WP/WD are not "business friendly" and have 150-200k true "Companies" (the Company class includes Independent Cities and things that we won't normally consider companies). --Vladimir Alexiev (talk) 15:22, 16 February 2017 (UTC)

I think the issue isn't so much "business friendliness" as a lack of open online resources that provide reliable information about companies. The GLEI data is a good starting point. Also there is a bit of a history of small companies creating their own wikipedia/wikidata entries as a form of advertising which is frowned upon... Anyway, please join the discussion at the existing wikiproject! ArthurPSmith (talk) 15:34, 16 February 2017 (UTC)

GLEI has 500k, but that's very low. DnB has 280M (not just companies, eg a Walmart store could have a DUNS). OpenCorporates has 127M from 160 registers (there's like 280 registers in the world). We'll be adding the BG Trade Register in a datathon end of March, together with OpenCorporates. My current project is H2020 euBusinessGraph http://cordis.europa.eu/project/rcn/206353_en.html, and we'll likely convert OpenCorporates to RDF.

The question is: WD won't accept 160M companies, and for good reason. Many companies have various registrations in different jurisdictions (even those that don't seek to hide money flows), plenty of them inactive... If you're doing business for 50 years, registrations "grow organically" and get many and messy. Also, I could register a company in 2-3h and then not do anything with it; or consider mom & pop shops with 2-3 employees. So even if WD Notability is lax, we really need some notability conditions, but it's hard to define such. Market cap or turnover could be such conditions, but they are not always available: many registers don't publish them in structured form, and some important companies are not public. --Vladimir Alexiev (talk) 19:39, 23 February 2017 (UTC)

@Vladimir Alexiev: these are good points. OpenCorporates could certainly be considered a valid source of verifiable data, but 160 million+ items would definitely overwhelm things here. It may be up to us to define reasonable criteria for inclusion. Here are some thoughts:

If a company has a wikipedia page (in any language) then they certainly should be included (presumably the 200k already has these?)
All public companies with stock market listings probably should be included? Do you know how many that includes?
Maybe anything that qualifies as more than a "small business" (i.e. $50 million or more in annual revenue, 50+ employees) should be included?
Factories, warehouses, research centers or other facilities in different locations might be worth having as separate items if they are particularly notable or have large staff (over 50?)

separate stores and franchises probably shouldn't be included generally though. I think with criteria like this we might be looking at up to several hundred thousand to 1 million company items, which seems reasonable to handle here. ArthurPSmith (talk) 20:38, 23 February 2017 (UTC)
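
The tentative criteria above could be sketched as a simple filter. This is only an illustration: the `Org` fields and the `qualifies` function are hypothetical names, and combining the revenue and employee thresholds with "or" is an assumption, since the list leaves that open. The facility rule is approximated by the same staff threshold.

```python
from dataclasses import dataclass


@dataclass
class Org:
    """Hypothetical record; field names are invented for this sketch."""
    has_wikipedia_page: bool = False
    publicly_listed: bool = False
    annual_revenue_usd: float = 0.0
    employees: int = 0
    is_store_or_franchise: bool = False


def qualifies(org: Org) -> bool:
    """Apply the proposed inclusion criteria in order."""
    if org.has_wikipedia_page:
        return True   # a Wikipedia page in any language is sufficient
    if org.is_store_or_franchise:
        return False  # separate stores and franchises are generally excluded
    if org.publicly_listed:
        return True   # stock market listing
    # Assumption: either threshold alone lifts an org above "small business".
    return org.annual_revenue_usd >= 50_000_000 or org.employees >= 50


print(qualifies(Org(employees=3)))                        # mom & pop shop
print(qualifies(Org(publicly_listed=True)))               # listed company
print(qualifies(Org(is_store_or_franchise=True, employees=200)))
```

Such a filter only works where revenue or headcount is available in structured form, which many registers do not publish.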

@ArthurPSmith: If a company has a wikipedia page: sure, that proves notability but even that is hard to identify. Eg go to OpenCorporates and search for goldman sachs. Which of the 1.7k entries corresponds to https://en.wikipedia.org/wiki/Goldman_Sachs? OCorp has about 1000 https://opencorporates.com/corporate_groupings that are user-contributed (eg Grouping of Goldman Sachs but seems to be currently broken), but that's a small number. --Vladimir Alexiev (talk) 15:45, 11 March 2017 (UTC)

Or maybe someone just deleted from https://opencorporates.com/corporate_groupings/Goldman+Sachs+Group+Inc? Tracking here https://twitter.com/valexiev1/status/840590062169542657 --Vladimir Alexiev (talk) 15:49, 11 March 2017 (UTC)

Entities vs. Establishments (a/k/a Facilities)

Large corporations typically own numerous subsidiaries. These entities are often spread around the globe. Further, large corporations, through those subsidiaries, frequently own several facilities (or factories), again, typically spread out in different locations. Many establishments (factories / facilities) keep their own set of accounting records, have their own "reputations", pay on time or extend out payments, borrow "locally" based on a local incorporation, have local assets/liabilities/profitability, and expand or contract at their own local rates. This is why Dun & Bradstreet assigns separate DUNS numbers to establishments. Economic reporting relies on the carefully constructed national registry of entities, as does state level and county level economic accounting. Considerable effort is expended maintaining the national registry, including good coverage of entities (facilities) so data may be aggregated properly by industry and geographic boundary (county / city, state, national).

There are numerous company pages currently on Wikipedia of companies with less than 500 employees. Likewise there are innumerable establishments that have over 500 employees, yet they do not have dedicated Wikipedia pages. While establishments of over 500 employees may be considered "lower interest" in terms of an encyclopedia, they are of critical interest in terms of industry data, and data at the County and State level. In my opinion, WikiData, given its data (vs. encyclopedia) orientation, should be designed up front to well handle both Companies (a legal concept) and entities, especially since companies have "fuzzy" geo coordinates, are often engaged in dozens of industries. Taxation and regulation of workers and business activities also occurs at the Federal, State and local levels. Legal jurisdiction for liability claims, workers compensation, etc. are often highly dependent on where the establishments are located vs. where the ultimate, top level, consolidated entity happens to be domiciled. Wikidata needs granular data at a low level, that is well structured. The establishments link together upwards to various legal entities (subsidiaries), and the subsidiaries link together upwards to form companies that issue consolidated financial statements. There are a vast number of data uses including different types and levels of aggregations that depend on excellent structuring at the granular level. Rjlabs (talk)

@Rjlabs: your signature here was missing a timestamp - does something need to be adjusted in your settings? Anyway I could see from history this bit was recent. On your comment: I absolutely agree, I think each facility (not necessarily each building, but each corporate entity of significant size at a specific geographic location) should have its own item. I think we do have all the properties we need for this - parent organization (P749) and has subsidiary (P355) for example - but maybe you're thinking we need something else here? We're certainly missing the items themselves. For the most part they would not have individual wikipedia pages and so these would be wikidata items without sitelinks. ArthurPSmith (talk) 14:19, 23 March 2017 (UTC)
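
The upward linking discussed in this thread (establishment to subsidiary to consolidated company) can be sketched in Python as a walk along parent links, in the spirit of parent organization (P749). Everything in the example is hypothetical: the organization names, the employee counts, and the `roll_up` helper are invented for illustration and do not describe real items.

```python
from collections import defaultdict

# child -> parent, mirroring the direction of parent organization (P749).
# All names and numbers below are made up for this sketch.
parent_of = {
    "Plant A": "Subsidiary X",
    "Plant B": "Subsidiary X",
    "Lab C": "Subsidiary Y",
    "Subsidiary X": "Group",
    "Subsidiary Y": "Group",
}
employees = {"Plant A": 600, "Plant B": 250, "Lab C": 120}


def roll_up(parent_of, leaf_values):
    """Add each establishment's value to itself and to every ancestor."""
    totals = defaultdict(int)
    for node, value in leaf_values.items():
        current, seen = node, set()
        # 'seen' guards against accidental cycles in the parent links.
        while current is not None and current not in seen:
            seen.add(current)
            totals[current] += value
            current = parent_of.get(current)
    return dict(totals)


totals = roll_up(parent_of, employees)
print(totals["Subsidiary X"], totals["Group"])
```

The same walk could aggregate by county or industry instead of by owner, provided each establishment item carries that information.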

@ArthurPSmith: Arthur, I'm very much on board with that approach. As for properties and objects I think it's best for WikiData to follow rather than try to lead. I think we should "go to school" on what is already in place. Specifically: 1. LEI (especially good for "company" as a legal entity view, including the new hierarchy of ownership/control), 2. EDGAR (Reverse engineer the current company infoboxes on wikipedia and "refactor" them so they can be auto populated from EDGAR xml (via WikiData) - including anything we can extract in terms of segments, sectors, industries, officers and directors, key executives, major owners, exchanges traded on (including the specific exchange ticker identifiers), etc.) 3. "Reverse engineer" how the US Census / BLS / SS Administration structures its surveys of the economy (various surveys of manufacturing, services, etc.) Specifically how they maintain a very high quality database of establishments that they survey periodically, (and constantly interpolate from those periodic surveys) and cast the sample data into NIPA accounts (for things like quarterly GDP / economic reporting, input/output reporting). I see the challenge here more as a "plumbing project" where we hook up wikidata to other well structured source "wells" vs. trying to reinvent "unique" data structures here, then populate by hand. I'm sure there are additional outside expert sources to tap here as well. International accounting standards is close with their own taxonomy, EU?, UK sources of xml company data?, Worldbank/UN economic accounts schema, etc. To be really valuable, company data (including establishment data) must be disciplined enough to "add up" at the county level, state level, nation...all the way up to global. To do that we need to merge accounting and economic schema. Ultimately some of the high level economic "guesstimates" could give way to just adding up detailed, granular data, no need for statistical sampling and hand editing. 
Give us any geo polygon, or industry/sector "polygon" (or both) and have WikiData sum up the most current detailed info for you. (Yes that is a long term goal.) Are you from the U.S.? (we need EU/UK and Asian input too, surely there must be schemas/structured data repositories outside the U.S. that are very important). I'm willing to do some serious investigation into establishment level data in the U.S., and merging accounting and economic schema. Would be great to link up with like minded international counterparts. Rjlabs (talk) 07:01, 26 March 2017 (UTC)

Yes, I'm in the US, definitely agree international input on this would be a big plus. ArthurPSmith (talk) 17:26, 27 March 2017 (UTC)

@ArthurPSmith: got to thinking about your interest in research establishments and their geo locations (as entities in themselves) vs. a company (a different but related ownership entity) which is based on top level consolidated financial statements. This is very similar to a large company having many significant "factories" located in several different U.S. Counties, and spread out internationally. Always the analyst I was thinking how you could back into a full list of academically/scientifically significant R&D centers (public and private). A few possibilities - one is the "Description of property" section (required under SEC Reg SK and often captioned Item 102) of a 10K report. Those might note major R&D facilities physically (typically down to city and state, particularly if they occupied many square feet of office or lab space). Another possibility - mine Google Scholar, Microsoft Academic Search, ssrn.com, etc. for authors who are well referenced, grab their emails and see how many match a local, geo-locatable domain? Another possibility is reversing patent filings which contain both the inventor, including (typically) residential city and state and assignee (often a company) including city and state. Rjlabs (talk) 18:23, 27 March 2017 (UTC)

European Companies

I'm in Europe. Two relevant initiatives:

euBusinessGraph (see above) aims to integrate data on EU companies, but started recently.
BRIS allows both national registries to exchange data (eg DE to mirror AT since there is a lot of trade between the two countries), and general customers to get such data. The data is not free though.
- The European Business Registry http://www.ebr.org is the site that presents the results of BRIS data aggregation.
- Countries covered: http://www.ebr.org/index.php/member-countries/: EBR currently covers 26 jurisdictions in Europe, 24 of which are online in the EBR service. This includes some non-EU/EEA countries, eg Macedonia.
- Data providers: http://www.ebr.org/index.php/information-distributors/. 17 distributors. Many of them are national registries, but there are also companies, i.e. a sort of Data Economy has formed around EBR.

We can talk all we want, but at the end of the day the question is what open data sources exist... --Vladimir Alexiev (talk) 08:05, 8 April 2017 (UTC)

External Identifiers used for Companies

Please help: (removed old link because it was lost in an old archive, and copied below by rjlabs)--Vladimir Alexiev (talk) 19:40, 23 February 2017 (UTC)

Trying to find all External Identifier props used for Companies. This finds all external identifier props:

select ?wd ?lab ?desc {
  ?wd wikibase:directClaim ?wdt.
  ?wdt a owl:DatatypeProperty 
  filter (exists{?wd wdt:P31/wdt:P279* wd:Q19847637} # Unique Identifier
      || exists{?wd wikibase:propertyType wikibase:ExternalId})
  #filter exists {[?wdt []; wdt:P31/wdt:P279* wd:Q783794]} # a Company: causes timeout
  ?wd rdfs:label ?lab filter(lang(?lab)="en")
  optional {?wd schema:description ?desc filter(lang(?desc)="en")}
}

But if I uncomment the part about Company, I get a timeout. The bracketry is probably confusing, so we can expand it like this for clarity:

   filter exists {?company ?wdt ?any_prop; wdt:P31/wdt:P279* wd:Q783794}

--Vladimir Alexiev (talk) 17:05, 23 February 2017 (UTC)
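
For comparing the two variants side by side, a small Python helper can assemble the query text with or without the expanded company filter. This is only a sketch: the function name and flag are invented here, nothing is sent to the query service, and toggling the filter this way does nothing to avoid the timeout described above.

```python
# Expanded form of the company filter, copied from the discussion above.
COMPANY_FILTER = (
    "  filter exists {?company ?wdt ?any_prop; wdt:P31/wdt:P279* wd:Q783794}"
)


def build_query(only_used_on_companies: bool = False) -> str:
    """Return the SPARQL text, optionally restricted to props used on companies."""
    lines = [
        "select ?wd ?lab ?desc {",
        "  ?wd wikibase:directClaim ?wdt.",
        "  ?wdt a owl:DatatypeProperty",
        "  filter (exists{?wd wdt:P31/wdt:P279* wd:Q19847637} # Unique Identifier",
        "      || exists{?wd wikibase:propertyType wikibase:ExternalId})",
    ]
    if only_used_on_companies:
        lines.append(COMPANY_FILTER)  # the part that currently times out
    lines += [
        '  ?wd rdfs:label ?lab filter(lang(?lab)="en")',
        '  optional {?wd schema:description ?desc filter(lang(?desc)="en")}',
        "}",
    ]
    return "\n".join(lines)


print(build_query(only_used_on_companies=True))
```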

Annual reports

Please see Wikidata:Property proposal/annual report.
--- Jura 11:25, 26 February 2017 (UTC)

Notified participants of WikiProject Companies
--- Jura 06:56, 5 March 2017 (UTC)

Discussion of Financial Statements

moved from the Annual report property discussion to here because it has broad implications

Comment I'm looking at the entire "company" and related object hierarchy (more here). So, only preliminary ideas for this property. Here you might want to think about "generalizing" to Public financial statement preferably with a URL to the real thing (perhaps as an entity of its own, not a property). Adding "public" qualifies that the statements issued are publicly accessible without restriction. Private financial statements are of little use to WikiData. Under that you might want periodicity (week/month/quarter/annual - annual only would be very restrictive), typical release date, accounting principles followed (enumerated list with link back to accounting standard setter, and year), dates released. If possible have a Legal Entity Identifier (what legal entity is publishing these financial statements). Would be good to mark statements audited or unaudited. If audited it would be good to have an enumerated list of audit opinions (clean, subject to (specifically stating what that is), going concern questioned, etc.). The auditor should be identified, and the date of the audit opinion. Special care should be taken to extract as much of this as possible from public XML repositories such as EDGAR in the U.S. and 990 Public Tax Filings from the IRS for non profits. Also need to remember that many entities that issue public financial statements are not for-profit entities (cities, municipalities, non profits, PACs, non profits via 990 tax filings, charitable groups, etc.). In centrally planned governments there are often public financial statements of various state owned entities. These vary from limited "production reports" to full, detailed accounting statements. Any "financial data" released to the public in the form of a regular statement is within scope for WikiData. Rjlabs (talk) 19:15, 10 March 2017 (UTC)
This is meant for organizations in general, not just companies. I fail to see the advantage of adding "public". If there is an url, it is publicly available.
A general problem we have with Wikidata:Property proposal/Economics is that many properties are requested and created (see Wikidata_talk:WikiProject_Economics#Sample_items.3F), but not necessarily used. This is possibly due to people attempting to follow schemes without much interest in actually contributing to the project or even the capacity to create sample items with their proposals (Wikidata_talk:WikiProject_Economics#Sample_items.3F). This leads to huge overhead of unused properties.
A capacity limitation of Wikidata is also that we can't mirror financial statements in their entirety.
Even with this property, I think people can still create items for specific reports or other reports, but apparently no one is actually interested in doing this (at least, I only found 4 (four) items when I checked). Items for annual reports could easily be cross-referenced with statement is subject of (P805).
--- Jura 08:50, 11 March 2017 (UTC)

@Rjlabs: Your comments show good domain expertise, but you're fairly swamping us with info, and as a result it's unlikely that info will be acted upon. Please look around some other property proposals: they're focused and describe one property only. Please study how some other domains are structured to get a feel of the Wikidata data structure. Certainly read the intro about "claims" and "qualifiers". --Vladimir Alexiev (talk) 11:36, 14 March 2017 (UTC)

Further dialog responding to @Jura1: and @Vladimir Alexiev:

Agree if there is a URL, it's public, so no need for that word in the label (provided all instances actually have the URL). Not sure I like "annual" as quarterly (and other time periods) are very important.

Would like to hear much more about A capacity limitation of Wikidata is also that we can't mirror financial statements in their entirety I think I've been misinformed on company data scope from other project posters. The lead goal on the Company project page used to be to build a system to rival Bloomberg. (that is a LOT of data!) Who at WikiData/Wikimedia is the final authority on "scope" for company data here, or is there authoritative written guidance?
@Rjlabs: There is no authority, like on all wikipedias it's by consensus. You make property proposals (like this one), and they get discussed (but you have muddled this one). I'm not familiar with the Bloomberg data but if it's in-depth data about important companies, maybe that would be appropriate. However, I personally don't believe WD is the place to store full accounting reports in a structured way: URLs to such, and income/networth/profit yes, but all the numbers no. It's not so much about the tech capacity, it's about the people/crowd capacity. WD currently has 25M items: before talking of adding 100M items, you need to have a plan who'll maintain them and use them. A separate WD (WikiBase) instance could be created for that sort of data, like the EC project EAGLE has done for Epigraphy. Of course, it takes effort and enthusiasm to maintain such instance, EAGLE may have died, see https://www.facebook.com/groups/Wikidata.GLAM/permalink/933245130111585/ --Vladimir Alexiev (talk) 15:30, 19 March 2017 (UTC)

I would like to create a category: Economic properties and go through the property pages and tag them to category: Economic properties if possible. It would be a great benefit to easily know what has already been established.
Good idea. @Jura1:, is there an appropriate prop he can use? For the time being, can you help me judge the rest of these External Identifier properties: https://docs.google.com/spreadsheets/d/1x5ib5UxOpblGSPi8GRyfPEw5h-u0LCJ8uCLOua5Eaak/edit --Vladimir Alexiev (talk) 15:30, 19 March 2017 (UTC)

It strikes me that other standard schema, outside of Wikidata, need to be followed where standardization of data already exists vs. brewing up something quick and "homemade" for one off projects here at WikiData. Company data is pretty complex and very advanced outside of WikiData. How much of that does WikiData ultimately want to try to "improve" upon? Wouldn't it be better, and much more efficient, to identify and merely follow the best external standards/schema? And, to work towards better linking between various standing standards around the globe (further developing concordances, cross references, etc.)

In terms of scope. Here is an estimate of data items covering just the basic financial statements in the U.S. alone. 10,000 companies file in xml at EDGAR; each year 3 quarterly statements on form 10-Q, and one annual on form 10-K; each of those contain four main statements (income statement, balance sheet, cash flow statement, changes in equity); guessing each of those have 100 "mainstream" data points. So over a 10 year period that is 160 million data points. Ultimately you need much more than that to include the data in the annual proxy statements (including all the officers, directors, major shareholders, etc) plus the detailed notes to the financial statements (many running more than 100 pages), managements discussion and analysis, and additional Reg SK disclosures etc.
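
The estimate above can be checked with a few lines of Python. All inputs are the figures stated in the paragraph (including the guessed 100 data points per statement); only the variable names are introduced here.

```python
# Inputs taken from the estimate above; the 100 data points per
# statement is the author's own guess, not a measured figure.
companies = 10_000
filings_per_year = 3 + 1          # three 10-Q filings plus one 10-K
statements_per_filing = 4         # income, balance sheet, cash flow, equity
data_points_per_statement = 100
years = 10

total = (
    companies
    * filings_per_year
    * statements_per_filing
    * data_points_per_statement
    * years
)
print(f"{total:,}")  # 160,000,000
```

That is 16 million data points per year before any proxy statements, notes, or Reg SK disclosures are counted.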

What "slice" of that data does WikiData want to store directly, and how is it going to import that automatically from EDGAR on a timely basis? Will the schema at WikiData accommodate the xBRL schema at EDGAR or will a transformation be required? Wouldn't it be better to talk EDGAR into offering a SPARQL node, store and serve up the data "through" WikiData?

re please study how some other domains are structured to get a feel of the Wikidata data structure - would like to look at domains that are currently well structured on WikiData that are most similar to the large amounts of "table data" as outlined immediately above. Recommendations as to which domains to study?
Eg see https://www.wikidata.org/wiki/Wikidata:WikiProject_Visual_arts/Item_structure, and https://www.wikidata.org/wiki/Wikidata:WikiProject_Visual_arts/Item_structure/Art_movements for a set of values

Would also like to have pointers to tools that "visualize" or at least output the WikiData class hierarchy, class properties, inheritance, specific enumerations, data types, etc. Anything like XML spy documentation generators? Is there an xsd for WikiData I can just load in XML spy and look at it? I know there are some tools available but am totally new here.
See https://tools.wmflabs.org/sqid/#/view?id=Q176831. For a class that is more populated, see https://tools.wmflabs.org/sqid/#/view?id=Q783794

re Certainly read the intro about "claims" and "qualifiers" Pointers to these? Any pointers to how enumerations are implemented?
https://www.wikidata.org/wiki/Help:About_data and https://www.wikidata.org/wiki/Help:Statements. Enumerations are simply items that are "instance of" some class (eg GAAP is an instance of accounting standard).

re you're fairly swamping us with info Sorry, do not intend to be overwhelming. Oddly, I have the same feeling here trying to acclimate. I have limited time to devote to this so I'm really trying to avoid being misinformed or misled (as in the "duplicate Bloomberg here" comment in the company project page.) Rjlabs (talk) 18:51, 14 March 2017 (UTC)
The problem is not this discussion, the problem is that it's in the wrong place. I'd suggest cutting it out of here and putting it on the "wikiproject company" discussion page. --Vladimir Alexiev (talk) 15:30, 19 March 2017 (UTC)

Comment interesting comments, but somehow this goes far beyond this (relatively simple) property proposal.
--- Jura 08:47, 18 March 2017 (UTC)

End of block copied from Wikidata:Property proposal/annual report.

Comment I'm sure eventually you will find people interested in expanding it to the field suggested above, but to start, it might be preferable to try something less ambitious, such as the proposal at Wikidata:Property proposal/annual report. This even if it wasn't primarily proposed for companies, but organizations in general. The resulting structured data might also give you a better basis to formulate more property proposals.
--- Jura 10:40, 25 March 2017 (UTC)

Related project: WikiProject Universities

Hello!

I have started a somewhat related project, WikiProject Universities, which should overlap this project for private educational institutions. Many of the datasets of interest to our project also cover companies (for instance Open ISNI for Organizations (Q28527677)). Feel free to join, and suggestions are very welcome! Cheers − Pintoch (talk) 09:31, 7 March 2017 (UTC)

Thanks for the ping @Pintoch: I looked at OpenISNI (by Ringgold) and was surprised that amongst 400k entries, there are many non-research orgs (eg Merrill Lynch India, Tata Motors, etc). Lots of overlap so the two projects should definitely collaborate --Vladimir Alexiev (talk) 15:36, 11 March 2017 (UTC)

Great! And thanks for joining! The dataset is on Mix'n'Match (373 and 375), so if you get bored, you can always match things there… My understanding is that most of the institutions there don't have a Wikidata item yet. Not sure if they meet the inclusion criteria though. − Pintoch (talk) 16:28, 11 March 2017 (UTC)

Subsidiaries of multinational companies

Hi, I'm trying to match GRID to Wikidata items and I run into the problem of representing national branches of multinational companies. Should each of these subsidiaries have their own item? Let's consider an example: Honeywell International, Inc. (Q898208). This item already bears multiple GRID ids, each of them corresponding to a national subsidiary of the multinational group. Should I add others, such as grid.410336.3 (Honeywell Canada)? Or is it better to create individual items for all these? I guess each branch has its own headquarters, national company number, number of employees, and so on. Or do these things just exist for legal reasons, and they don't actually represent anything different from the main group? But that conflicts with the uniqueness constraint on the identifiers. Ping @ArthurPSmith: who is involved in this import. − Pintoch (talk) 23:03, 18 March 2017 (UTC)

Hi, separate item for each national subsidiary is much more appropriate. --Jklamo (talk) 12:58, 19 March 2017 (UTC)

Agree with Jklamo. I haven't been creating a lot of new items for GRID identifiers as I've occasionally been finding some errors (duplicates) but we probably should feel freer to do that. Not only national subsidiaries, but individual corporate research labs may have separate records such as IBM's IBM Thomas J. Watson Research Center (Q476208). ArthurPSmith (talk) 20:08, 20 March 2017 (UTC)

I've created items for subsidiaries of business (Q4830453) which had multiple GRID with different countries (when the item had only one country itself). I hope I did more good than harm. I'm reasonably confident that these subsidiaries did not have items before as they would have been detected during the GRID import otherwise. At least the constraint violation report should go from 610 items with duplicate GRIDs down to 230. The remaining cases look more subtle. Some of them indeed look like duplicates on GRID's end, as in Red Universitaria Nacional (Q5841811). I've reported one but we might just point them to the list… − Pintoch (talk) 22:48, 3 April 2017 (UTC)
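For anyone repeating this clean-up, here is a rough sketch of the check described above: group GRID ids by item and flag the items whose GRID ids point to more than one country. The input tuples and the `grid.000000.0` id are made-up placeholders, not real import data; only Q898208 and grid.410336.3 come from this thread.

```python
from collections import defaultdict

def items_spanning_countries(grid_links):
    """Return {item: sorted countries} for items whose GRID ids cover several countries.

    grid_links is an iterable of (item_qid, grid_id, country) tuples.
    """
    countries = defaultdict(set)
    for qid, _grid_id, country in grid_links:
        countries[qid].add(country)
    return {qid: sorted(found) for qid, found in countries.items() if len(found) > 1}

# Illustrative data only: "grid.000000.0" and "grid.000000.1" are placeholders.
SAMPLE_LINKS = [
    ("Q898208", "grid.000000.0", "United States"),
    ("Q898208", "grid.410336.3", "Canada"),
    ("Q476208", "grid.000000.1", "United States"),
]

print(items_spanning_countries(SAMPLE_LINKS))
```

Items returned by this check are only candidates for splitting; duplicates on GRID's side within a single country would need a separate look.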

D-U-N-S number (P2771) and Bloomberg company ID (P3377)

Aren't these proprietary, subject to copyright and subject to licensing fees? If so they have to be removed unless they have been donated and that is documented. Same thing for memberships in published indexes such as Standard and Poor's (S&P), MSCI Inc. (formerly Morgan Stanley Capital International and then MSCI Barra), Dow Jones, CUSIP, ABA bank and routing numbers, SWIFT numbers, etc. Tickers and last sale prices tend to be declared public, after 10-15 minutes. Prior to that the data tends to be owned and licensed by the exchange. Rjlabs (talk) 00:00, 30 March 2017 (UTC)

The identifiers themselves cannot be copyrighted: technically, Wikidata just links to their websites by building the URL from the ID. We don't need to ask for permission to link to http://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=27444752, for instance. However, automatically scraping bloomberg.com to add many Bloomberg company ID (P3377) to our items is probably forbidden. For the rest of your concerns, facts cannot be copyrighted: if I learn a fact on a public website and write it in Wikidata with a reference to that source, there is nothing wrong about that. − Pintoch (talk) 08:16, 30 March 2017 (UTC)
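To illustrate the point about links being built from the ID, here is a minimal sketch of how a formatter URL with a `$1` placeholder expands into a link. The formatter string below is inferred from the example URL in the comment above and is an assumption, not necessarily the value stored on Bloomberg company ID (P3377).

```python
from urllib.parse import quote

# Assumed formatter, reconstructed from the example link quoted above.
BLOOMBERG_FORMATTER = (
    "http://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=$1"
)

def build_url(formatter_url: str, identifier: str) -> str:
    """Substitute the (URL-encoded) identifier for the $1 placeholder."""
    return formatter_url.replace("$1", quote(identifier, safe=""))

print(build_url(BLOOMBERG_FORMATTER, "27444752"))
```

Only the identifier string is stored on the item; the link itself is derived at display time.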

OpenCorporates claims that DUNS cannot be used openly. Which would be very strange because it's used in the US Government procurement system. If I tell our DUNS to a potential client or partner, would I be in violation of some right? Weird --Vladimir Alexiev (talk) 07:58, 8 April 2017 (UTC)

Notified participants of WikiProject Companies because many eyes should be focused on not violating Wikimedia's rather conservative copyright policies. Disagree that identifiers which are privately compiled and maintained over time (at considerable expense) can't be copyrighted and made subject to strict licensing and usage fees. Look at the legal controversy around ISIN (P946) which attempted to embed CUSIP without paying the toll. At the end of the day a reduced (but not zero) licensing fee was required for Europe, and U.S. consumers can't just use ISIN (P946) for U.S. companies without paying their own usage fees. (more at [[1]] and [[2]].) Here is a little quote out of a CUSIP license:

“Subscriber agrees and acknowledges that the CUSIP Database and the information contained therein is and shall remain valuable intellectual property owned by, or licensed to, Standard & Poor’s CUSIP Service Bureau (“CSB”) and the American Bankers Association (“ABA”), and that no proprietary rights are being transferred to Subscriber in such materials or in any of the information contained therein. Any use by Subscriber outside of the clearing and settlement of transactions requires a license from the CSB, along with an associated fee based on usage.” [[3]]

ABA, S&P, SWIFT, D&B - all have extensive teams of lawyers whose mission is to monetize the databases they create. Rjlabs (talk) 17:43, 30 March 2017 (UTC)

Wikipedia has hosted CAS numbers for chemicals for years despite the American Chemical Society claiming to own the numbers. Nothing I saw in the Wiki article about ISIN suggests to me that Wikimedia's hosting of the numbers produced any problems. ChristianKl (talk) 19:08, 30 March 2017 (UTC)

Indeed this is a very sensitive topic. ChristianKl I strongly disagree with this approach because it means they could ask for all these numbers to be removed from WD at any point in time. Without permission from the interested parties, these IDs should not be on Wikidata, and even more so if they are part of a business model. Either permission is officially given or these IDs have to be deleted. Another solution would be to create our own open source financial IDs system, which could then be used freely by Wikidata, but obviously it would be a very large undertaking. Parikan (talk) 02:50, 31 March 2017 (UTC)

Why do you believe that the IDs are protected by copyright? The fact that an organisation that gathers IDs wants them to be protected doesn't mean that they are. There are a lot of different IDs hosted in Wikidata and the fact that the IDs have a financial background doesn't distinguish them as far as copyright goes. If you think the laws prevent this large gathering of IDs, how about contacting https://meta.wikimedia.org/wiki/Legal about it? ChristianKl (talk) 07:12, 31 March 2017 (UTC)

Again, these IDs are just hypertext links! The only reason why we could be required to remove URLs from Wikidata would be that the content that is available there is illegal in its own right (copyvio, child pornography, and so on), which is not the case as far as I can tell. If these institutions do not want people to link to these pages, they should not run websites like that. Some people did try to create a "link tax" in the EU but the project was abandoned, and I would be very surprised if anything similar were in place in the US. − Pintoch (talk) 13:18, 31 March 2017 (UTC)

Hierarchy

Let's open the hierarchy issues. My thoughts and practices (feel free to comment directly in the list):

owned by (P127) - direct owner if possible, listing all notable owners. In case of listed company with fragmented ownership I go as far as 3%.
parent organization (P749) - usually only one top consolidated entity, which may be multiple layers up
has subsidiary (P355) - directly controlled subsidiaries; I tend not to include indirect subsidiaries if they are subsidiaries of a subsidiary that has its own item
owner of (P1830) - minor shares in other companies, joint ventures, again I prefer only direct ownership
business division (P199) - only non legal entities

--Jklamo (talk) 10:49, 30 March 2017 (UTC)
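A small sketch of how the convention above could work in practice: if only direct has subsidiary (P355) links are recorded, the top consolidated entity for parent organization (P749) can be derived by walking up the chain. The company names are hypothetical and the plain dictionaries stand in for real statements; this is an illustration of the idea, not an existing tool.

```python
from typing import Dict, List, Optional

def invert_subsidiaries(has_subsidiary: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn direct 'has subsidiary' (P355) links into a child -> direct parent map."""
    direct_parent: Dict[str, str] = {}
    for parent, children in has_subsidiary.items():
        for child in children:
            direct_parent[child] = parent
    return direct_parent

def top_parent(direct_parent: Dict[str, str], company: str) -> Optional[str]:
    """Walk up the direct links to the top entity; None if the company has no parent."""
    seen = {company}
    current = company
    while current in direct_parent:
        current = direct_parent[current]
        if current in seen:
            raise ValueError(f"ownership cycle detected at {current}")
        seen.add(current)
    return None if current == company else current

# Hypothetical three-layer group: GroupCo -> HoldCo -> OpCo.
HAS_SUBSIDIARY = {"GroupCo": ["HoldCo"], "HoldCo": ["OpCo"]}
PARENTS = invert_subsidiaries(HAS_SUBSIDIARY)

print(top_parent(PARENTS, "OpCo"))
```

This assumes each subsidiary has a single direct parent; joint ventures and minority stakes (owner of (P1830)) would need a different structure.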

Do part of (P361) and has part(s) (P527) play a role at all? For example chapters of a national nonprofit organization are not really "owned" by the national organization. Although I guess parent organization (P749) suffices there... ? ArthurPSmith (talk)

Comment: I suggest a hard look at LEI's approach to hierarchy. They have spent a lot of time on it, and they are likely to be the de facto standard going forward on the legal entity side. I also have inquiries into the U.S. Census as to how they maintain their Business Register, which includes Establishments (roughly factory or facility locations that are geolocated, have local employees, are assigned 1+ industry codes, etc.). Establishments aggregate to industry-by-industry statistics and reporting at the County, State and National level. Federal Employer IDs tie to payroll and geolocation. The LEI system is very much legal jurisdiction based (only a loose connection to actual geography). We need to incorporate both legal concepts and physical, on-the-ground Establishment/Facilities/Properties concepts into the ultimate hierarchy. Again, I highly recommend that all designers of WikiData as it relates to "companies" thoroughly brief themselves on the specifics of LEI as it is rolling out, and also on how economic accounting tied to geography currently works. In areas that are not based on capitalism there is still a great deal of economic accounting, and the WikiData structures need to be able to accommodate all types of systems. Rjlabs (talk) 16:45, 30 March 2017 (UTC)

Rjlabs can you give us a pointer to LEI hierarchy work somewhere? ArthurPSmith (talk) 17:54, 30 March 2017 (UTC)

ArthurPSmith start here [[4]] with an overview and see the detailed references listed below the article, including detailed xml schemas. They uncovered many issues when it comes to reporting (and hopefully independently verifying) chains of ownership. Issues that WikiData will surely face. Yet, they are forging ahead. Best of all LEI info is free and open, not subject to copyright and fees. Rjlabs (talk) 18:24, 30 March 2017 (UTC)

Databases of companies listed on org-id.guide

Hi!

I have recently discovered org-id.guide, which lists databases of organizations (mostly companies). We now have org-id.guide ID (P4824) to link these lists to items about the database.

One thing we can do is find out which ones do not have Wikidata properties yet: User:Pintoch/orgid. So if you are looking for your next database of companies to import in Wikidata, you can have a look there. Org-id provides explanations in English about the structure of the ids, the openness of the database and other things like that, so it can spare you some time when creating the proposal. I have made Wikidata:Property_proposal/UK_Provider_Reference_Number based on that for instance.

− Pintoch (talk) 11:13, 14 March 2018 (UTC)

Cool, thanks! ArthurPSmith (talk) 15:09, 14 March 2018 (UTC)

Bloomberg database as a reference?

Notified participants of WikiProject Companies

Hello,

I have created Kadimastem (Q52145643) and used statement supported by (P3680) to cite Bloomberg as a reference, but it isn't correct since statement supported by (P3680) can be used as a qualifier only.

Item GoViral Inc. (Q32137229) uses stated in (P248), but it isn't correct either since Bloomberg LP is not a work.

Should we create a specific item about the Bloomberg database and use it with imported from Wikimedia project (P143) ? — Mathieudu68 ^talk 09:39, 23 April 2018 (UTC)

Certainly yes to creating the item for the Bloomberg Database, but instead of imported from Wikimedia project (P143), I would suggest to use stated in (P248). --MB-one (talk) 10:04, 23 April 2018 (UTC)

Agree. As an alternative or addition, use reference URL (P854) with a direct URL to the Bloomberg database (not sure if using Bloomberg company ID (P3377) in refs is appropriate).--Jklamo (talk) 10:21, 23 April 2018 (UTC)

I try to encourage using identifiers in references. If an identifier was wrongly added to an item, that makes it easier to identify the claims that were derived from its record and delete them. It is also useful if the formatter URL changes. − Pintoch (talk) 11:52, 23 April 2018 (UTC)

@Mathieudu68: so to be clear:

imported from Wikimedia project (P143) should not be used for this - it is intended only to indicate imports of data from the wikipedias, not from reliable external sources
You should use stated in (P248) with a new item for the Bloomberg private company database (I'm surprised we don't already have one, but I certainly can't find it!)
It is also encouraged to add Bloomberg company ID (P3377) with the company id, if that is the reference you are relying on.
retrieved (P813) is also a good idea.

thanks for working on this! ArthurPSmith (talk) 12:50, 23 April 2018 (UTC)
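As a rough illustration of the reference pattern summarised in the list above, the three properties can be put together like this. The plain dictionary is a simplification and not the Wikibase JSON format; the database item id is a placeholder because that item still has to be created, and the company id is just the example number from the Bloomberg link quoted earlier on this page.

```python
from datetime import date

DATABASE_ITEM = "Q00000"  # placeholder for the future Bloomberg database item

def bloomberg_reference(database_qid: str, company_id: str, retrieved: date) -> dict:
    """Build a simplified reference: stated in, Bloomberg company ID, retrieved."""
    return {
        "P248": database_qid,            # stated in
        "P3377": company_id,             # Bloomberg company ID
        "P813": retrieved.isoformat(),   # retrieved
    }

print(bloomberg_reference(DATABASE_ITEM, "27444752", date(2018, 4, 23)))
```

Note that imported from Wikimedia project (P143) does not appear anywhere in the reference, in line with the first point of the list.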

Okay, thanks a lot. I've created Bloomberg private company database (Q52148486), but the database seems not to have any formal name.

By the way I came across Q41804121 and I'm not sure that this item is relevant.

— Mathieudu68 ^talk 14:03, 23 April 2018 (UTC)

If you click "What links here" on that item you'll see it was created as a reference for Yves Fortier (Q3205904) - it should probably be nominated for deletion and the reference replaced by something like what we've suggested above. ArthurPSmith (talk) 19:37, 23 April 2018 (UTC)