Wikidata:Requests for permissions/Bot/BorkedBot 3
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 10:36, 6 February 2021 (UTC)
BorkedBot
Operator: BrokenSegue
Task/s: Populate financial data using ticker symbol (P249) and stock exchange (P414).
Code: See bot user page (though this particular code isn't up yet)
Function details: I plan to populate a series of properties based on the ticker/exchange information using public APIs. In particular I plan to populate (if absent / when applicable / when available):
- Central Index Key (P5531)
- ISIN (P946)
- market capitalization (P2226)
- employees (P1128)
- Legal Entity Identifier (P1278)
- industry (P452) (this one is a little tricky since I'll have to map their industry names to our items. I'll probably be very conservative)
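As a rough illustration of the "if absent / when available" rule for the properties listed above, the selection step could look like the sketch below. The function name `missing_statements` and the dictionary shapes are assumptions made for this example only; they are not the bot's actual code, which was not published at the time of the request.

```python
# Property IDs are taken from the list above.
TARGET_PROPERTIES = ["P5531", "P946", "P2226", "P1128", "P1278", "P452"]


def missing_statements(existing_claims, fetched):
    """Return only the values that are available and not yet on the item.

    existing_claims: hypothetical dict mapping property ID -> list of current values.
    fetched: hypothetical dict mapping property ID -> value from the API (or None).
    """
    to_add = {}
    for pid in TARGET_PROPERTIES:
        value = fetched.get(pid)
        if value is None:
            continue  # "when available": the API returned nothing for this property
        if existing_claims.get(pid):
            continue  # "if absent": the item already has a statement, leave it alone
        to_add[pid] = value
    return to_add
```

Values that change over time (market capitalization, employees) would need a different rule on later runs, for example adding a new dated statement instead of skipping.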
I plan to start with only tickers from US/UK-based exchanges, with a similarity filter on company name to avoid issues with ticker re-use/duplication. Once the data is in, I think re-running every quarter or half-year would make sense for the properties that change (market cap/employees). There are a few more identifiers I could populate, but we don't have properties for them (e.g. one for Financial Instrument Global Identifier (Q4928186)). Total volume of edits should be low.
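One possible shape for the company-name similarity filter is sketched here. It uses `difflib.SequenceMatcher` from the Python standard library; the suffix list, the 0.85 threshold, and the function names are assumptions chosen for illustration, not the values the bot uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "limited", "plc", "llc", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in _SUFFIXES)


def names_match(item_label: str, api_name: str, threshold: float = 0.85) -> bool:
    """Return True if the item label and the API's company name look alike."""
    a, b = normalize(item_label), normalize(api_name)
    if not a or not b:
        return False  # nothing meaningful to compare, so do not trust the match
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

With this sketch, an item labelled "Apple Inc." would accept API data for "Apple" but reject data for "Applied Materials, Inc.", which is the kind of ticker re-use case the filter is meant to catch.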
The reason the code isn't ready is that I've been waiting on my previous request to be approved for a long while and so I figured I should get this in now to avoid waiting forever for approval.
--BrokenSegue (talk) 02:49, 29 October 2020 (UTC)
- This should be done carefully. Many Wikipedia articles and Wikidata items conflate different levels of a corporate group in one article/item. For example, China Unicom (Q1068485) refers to five entities:
1. China Unicom Group Co., Ltd., owned by (P127) State-owned Assets Supervision and Administration Commission of the State Council (Q1629075) (98.45%) and two others
2. China United Network Communications Limited, owned by (P127) #1 above (36%) and many others (SSE: 600050)
3. China Unicom BVI, owned by (P127) #1 (26%) and #2 (74%)
4. China Unicom (Hong Kong) Limited, owned by (P127) #3 (77%) and others (SEHK: 762 and NYSE: CHU via American depositary receipt (Q463881))
5. China Unicom Co., Ltd., owned by (P127) #4 (100%) and owner of many subsidiaries
#2 and #4 are both listed companies, but they have different market capitalizations. Many companies have a more complex group structure. --GZWDer (talk) 13:27, 29 October 2020 (UTC)
- You are right that it can be confusing, but I'm unclear what you think I should do. If an item has a ticker symbol (P249) and stock exchange (P414), then I will look up that ticker and return the metadata about it. The possible sources of error are that the ticker has been applied to the wrong item (I can't detect that or do anything about it) or that the API I'm using is itself confused (which seems unlikely). I don't think those two risks are a reason not to populate this data. BrokenSegue (talk) 15:13, 29 October 2020 (UTC)
- Before adding dozens of data points to an item, it might be worth manually checking each item to validate that it's correctly mapped. --- Jura 09:43, 31 October 2020 (UTC)
- @Jura1: There are 11,000 items with stock exchange (P414). Manually checking them all is not feasible. Further, even if I did, any of them could be changed or become invalid at any time. Plus, I'm not sure why I would be better at checking for these confusing cases than whoever entered the data. Maybe a compromise is that I only use statements with references? But asking bot operators to be responsible for existing bad data in Wikidata (which will be a minority) doesn't seem like a good path forward. We need to weigh the cost of errors against the value of the additions. BrokenSegue (talk) 18:18, 31 October 2020 (UTC)
- Maybe there is a good way to identify cases like the China Unicom example above? Items with multiple stock exchanges and tickers might be worth checking beforehand. Maybe other indicators could be used. If such indicators are present, caution is advised. BTW, this reminds me of Wikidata:Property_proposal/financials_URL, which still lingers. --- Jura 20:15, 1 November 2020 (UTC)
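A minimal sketch of the indicator suggested above, assuming each item's listings have already been extracted as (exchange, ticker) pairs. The function name `needs_manual_review` and the input shape are hypothetical and are not part of any existing tool.

```python
def needs_manual_review(listings):
    """Flag items listed on more than one exchange or under more than one ticker.

    listings: hypothetical list of (exchange, ticker) pairs for a single item.
    """
    exchanges = {exchange for exchange, _ in listings}
    tickers = {ticker for _, ticker in listings if ticker}
    return len(exchanges) > 1 or len(tickers) > 1


# The China Unicom item from the example above carries three listings.
china_unicom = [("SSE", "600050"), ("SEHK", "762"), ("NYSE", "CHU")]
print(needs_manual_review(china_unicom))
```

Items flagged this way could be skipped by the bot and queued for a human check, while single-listing items proceed automatically.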
- @Jura1: Should we consider your last remark as an objection? Lymantria (talk) 09:10, 6 February 2021 (UTC)
- Let's approve it. After all, 11,000 aren't that many. (I do think one should avoid adding more data to items that show clear signs of being in bad shape.) --- Jura 09:33, 6 February 2021 (UTC)