User talk:ProteinBoxBot/Archive 1

Bot Considered spindle (Q213761) a protein

Please prevent misidentifications such as this one – cf. https://www.wikidata.org/w/index.php?title=Q213761&diff=69399914&oldid=31262370. Thanks, --Marsupium (talk) 13:02, 18 April 2014 (UTC)

Thanks for the note, we will try to add some checks for situations like this in the future... Cheers, Andrew Su (talk) 16:51, 18 April 2014 (UTC)
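One possible shape for such a check, as a minimal sketch and not the bot's actual code: before writing protein statements, confirm that the target item is already typed as a protein through instance of (P31). The function names are hypothetical, the entity dicts are made-up samples in the Wikibase JSON layout, and the sketch assumes the item carries a P31 statement at all.

```python
PROTEIN = 8054  # numeric part of Q8054 (protein)

def instance_of_ids(entity):
    """Collect the numeric item IDs found in the entity's P31 statements."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if "numeric-id" in value:
            ids.add(value["numeric-id"])
    return ids

def safe_to_add_protein_data(entity):
    """Hypothetical guard: only write protein data to items typed as protein."""
    return PROTEIN in instance_of_ids(entity)

def p31_claim(numeric_id):
    """Build a made-up P31 statement for the samples below."""
    return {"mainsnak": {"snaktype": "value", "property": "P31",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item",
                                                 "numeric-id": numeric_id}}}}

protein_item = {"claims": {"P31": [p31_claim(8054)]}}  # typed as protein
gene_item = {"claims": {"P31": [p31_claim(7187)]}}     # typed as gene (Q7187)
untyped_item = {"claims": {}}                          # no P31 at all

for name, item in [("protein", protein_item), ("gene", gene_item), ("untyped", untyped_item)]:
    print(name, safe_to_add_protein_data(item))
```

A guard like this errs on the side of skipping: untyped items are left alone rather than guessed at.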

Bot creating some items with a hyphen as an alias

Hi there. Several items this bot has recently created have "-" as an alias. See e.g. Ftcd (Q18250668), Fshr (Q18250662), Frg1 (Q18250659). I assume this is a bug of some sort? Or is a hyphen actually a valid alias for these items, for some reason? — PinkAmpers&(Je vous invite à me parler) 21:25, 14 October 2014 (UTC)

Update: I ran an API query on the aliases of 50 of the bot's edits, and found 10 of them had "-" as an alias. If that number's representative, that'd be 20% of the bot's recent contributions you need to fix, and I'm not sure how long it's been doing this. — PinkAmpers&(Je vous invite à me parler) 21:51, 14 October 2014 (UTC)
Thanks for the update. It is indeed a bug in the bot. I have found it. I will make an update bot, which will remove all "-" from the alias. Andrawaag (talk) 07:03, 15 October 2014 (UTC)
The update bot removed all the dashes from the alias field. Andrawaag (talk) 05:36, 16 October 2014 (UTC)
@Andrawaag:, hey, think you still missed a few. Q18274219, Q18274220, Q18274222, for example. — PinkAmpers&(Je vous invite à me parler) 19:34, 19 October 2014 (UTC)
@PinkAmpers: I did a second thorough run. Now all should be really removed. May I ask how you found these "-" aliases? Andrawaag (talk) 20:32, 21 October 2014 (UTC)
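A check of this kind can be sketched as follows. This is not necessarily the query PinkAmpers ran; it assumes the entity data has already been fetched in the layout returned by the wbgetentities API module, and the function name and the sample item IDs are made up for illustration.

```python
def items_with_placeholder_alias(response, placeholder="-"):
    """Return the sorted IDs of items that have `placeholder` as an alias in any language."""
    flagged = []
    for qid, entity in response.get("entities", {}).items():
        for alias_list in entity.get("aliases", {}).values():
            if any(alias.get("value", "").strip() == placeholder for alias in alias_list):
                flagged.append(qid)
                break  # one hit per item is enough
    return sorted(flagged)

# Made-up sample in the wbgetentities response layout (IDs are placeholders).
sample = {
    "entities": {
        "Q_EXAMPLE_1": {"aliases": {"en": [{"language": "en", "value": "-"}]}},
        "Q_EXAMPLE_2": {"aliases": {"en": [{"language": "en", "value": "some alias"}]}},
        "Q_EXAMPLE_3": {"aliases": {}},
    }
}

print(items_with_placeholder_alias(sample))
```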

Reverted some additions and dupes

Hi there,

I reverted some additions (Q9384218, Q3621198, Q4267295, Q3451849) since these were to the wrong data object. Also Q5470356 which exists as Q18024349. --Nachcommonsverschieber (talk) 15:29, 15 October 2014 (UTC)

Q5470356 and Q18024349 are not duplicates of each other. Q18024349 is a gene, whereas Q5470356 is a protein. Andrawaag (talk) 11:01, 17 October 2014 (UTC)

Also Q5302993. --Nachcommonsverschieber (talk) 16:01, 15 October 2014 (UTC)

In these cases the bot added the gene data to the protein item. --Nachcommonsverschieber (talk) 11:30, 17 October 2014 (UTC)

Was a dupe Q14911608. --Nachcommonsverschieber (talk) 06:46, 17 October 2014 (UTC)

These are not dupes. Q5302993 is a protein, Q14911608 a gene.
Q14911608 and Q18258986 were dupes. --Nachcommonsverschieber (talk) 11:30, 17 October 2014 (UTC)

This Q511968 item got broken by the bot. --Nachcommonsverschieber (talk) 07:15, 17 October 2014 (UTC)

It seems okay to me. How did it break? Andrawaag (talk) 11:04, 17 October 2014 (UTC)
You can't undo the bot edit which added the gene data to the protein item. --Nachcommonsverschieber (talk) 11:30, 17 October 2014 (UTC)
@Nachcommonsverschieber: Thanks for noticing the issue. More of this type exist. I started a discussion on the matter on how to tackle this on the discussion page of the WikiProject Molecular biology

Another dupe: Q14905007 and Q18251182. Should I stop reporting? --Nachcommonsverschieber (talk) 13:45, 17 October 2014 (UTC)

Hi, I just unprotected Q17939676. Please make sure you don't add this much content to it again, items of that size tend to break various things, so please be careful and use common sense when adding content. Cheers, Hoo man (talk) 10:12, 25 November 2014 (UTC)

Hi, I have removed the many claims. It was an attempt to capture individual genes, before I learned of the existence of http://wdq.wmflabs.org/api?. Sorry for any inconvenience I might have caused. Andrawaag (talk) 23:10, 25 November 2014 (UTC)

Once is enough

It looks like you added a lot of duplicated claims, e.g. ICD-9 ID (P493), National Cancer Institute ID (P1395) and Disease Ontology ID (P699) claims on microcephaly (Q431643). Other duplicated claims you find at https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P699#Single_value --Pasleim (talk) 13:49, 26 November 2014 (UTC)
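One way to avoid writing the same value twice is to compare candidate values against the statements already on the item before adding anything. The sketch below is an illustration only, not the bot's code: the function name is hypothetical, the claims use the Wikibase JSON layout, and the identifier values are placeholders.

```python
def values_to_add(existing_claims, candidate_values):
    """Return the candidate values that are not already present, without repeats."""
    present = {
        claim["mainsnak"]["datavalue"]["value"]
        for claim in existing_claims
        if "datavalue" in claim.get("mainsnak", {})
    }
    fresh = []
    for value in candidate_values:
        if value not in present:
            fresh.append(value)
            present.add(value)  # also guards against repeats within the batch
    return fresh

# Made-up existing statement for one property on one item.
existing = [{"mainsnak": {"snaktype": "value", "property": "P699",
                          "datavalue": {"type": "string", "value": "DOID:EXAMPLE-1"}}}]
candidates = ["DOID:EXAMPLE-1", "DOID:EXAMPLE-2", "DOID:EXAMPLE-2"]

print(values_to_add(existing, candidates))
```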

Using qualifiers and/or references to capture provenance on claims

Until now we have used qualifiers to capture the provenance of claims imported through our bot. The rationale is that a given statement may only be valid in a given version of the imported resource (e.g. the Disease Ontology), because a specific disease term can be split into different subclasses or merged with another class in a later release. A link to an external reference (e.g. ICD9, MeSH) can therefore become invalid in a new version of the Disease Ontology. To capture this we chose to use the stated in property to express an explicit version of the Disease Ontology. Recently we noticed that most of the qualifiers made this way were deleted by KrBot. @Ivan_A._Krestinin: why were these qualifiers removed?

In this specific case it might actually make sense to capture the provenance as references and not as qualifiers. Each new release of the Disease Ontology follows a previous release. I applied the provenance model manually to the Wikidata entry Q431643. There we capture that a statement was imported from the Disease Ontology, on which date it was imported, and from which version.

Any input on this matter is appreciated. Andrawaag (talk) 23:31, 5 December 2014 (UTC)

The Wikidata model has a strong division between references and qualifiers. Breaking this division confuses different data clients; for example, it confused my bot's algorithms. Database versions are handled successfully by references, for example:
<my property> [123456]
reference: <stated in (P248)> [my database v1]
reference: <stated in (P248)> [my database v2]
<my property> [654321]
reference: <stated in (P248)> [my database v3]
Additionally the first claim can be marked as deprecated. — Ivan A. Krestinin (talk) 05:48, 6 December 2014 (UTC)
Andra, I agree with Ivan. These problems were foreseen by Wikidata's architects and fortunately have solutions baked into the data model. References are for provenance, and ranks are for claim status -- i.e. preferred, normal, or deprecated. Emw (talk) 15:59, 6 December 2014 (UTC)
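The model Ivan and Emw describe can be sketched in the Wikibase JSON layout: the value sits in the main snak, each database release becomes a stated in (P248) reference, and the rank marks a superseded claim. This is only an illustration under stated assumptions: `make_statement` is a hypothetical helper, the property ID and the numeric item IDs for the database releases are placeholders, and the values are the ones from Ivan's example.

```python
VALID_RANKS = ("preferred", "normal", "deprecated")

def make_statement(prop, value, stated_in_ids, rank="normal"):
    """Build a string-valued statement with one P248 reference per source release."""
    if rank not in VALID_RANKS:
        raise ValueError("unknown rank: %r" % rank)
    references = [
        {"snaks": {"P248": [{
            "snaktype": "value",
            "property": "P248",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "numeric-id": item_id}},
        }]}}
        for item_id in stated_in_ids
    ]
    return {
        "type": "statement",
        "rank": rank,
        "mainsnak": {"snaktype": "value", "property": prop,
                     "datavalue": {"type": "string", "value": value}},
        "references": references,
    }

# Placeholder IDs: P_EXAMPLE stands for "my property"; 1001-1003 stand for
# items describing database v1, v2 and v3.
old = make_statement("P_EXAMPLE", "123456", [1001, 1002], rank="deprecated")
new = make_statement("P_EXAMPLE", "654321", [1003])

print(old["rank"], len(old["references"]))
print(new["rank"], len(new["references"]))
```

No qualifiers are involved: provenance lives entirely in `references`, and claim status entirely in `rank`.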
emw, Ivan A. Krestinin Thanks for the clarification. I will adapt our bot accordingly. I did a first run on Q18611349, which is an obsolete term according to the Disease Ontology. This bot run was successful in the sense that the rank "deprecated" was added:
 {u'entities': {u'Q18611349': {u'claims': {u'P699': [{
     u'id': u'Q18611349$20AA32D2-D98B-4368-88F8-069DC15627E4',
     u'mainsnak': {u'datatype': u'string',
                   u'datavalue': {u'type': u'string',
                                  u'value': u'DOID:0050001'},
                   u'property': u'P699',
                   u'snaktype': u'value'},
     u'rank': u'deprecated',
     ......
However this rank is not visible in the web browser when looking at the page of Q18611349. Is this correct? Andrawaag (talk) 13:58, 9 December 2014 (UTC)
Andra, the rank is visible, but it's subtle. In the 'Disease Ontology ID' statement in Actinomadura madurae infectious disease (Q18611349), the white portion at right has a stack of three small boxes for rank: preferred, normal or deprecated. The box on bottom is filled in grey, indicating that the claim is deprecated. Emw (talk) 00:29, 10 December 2014 (UTC)
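Because the rank is easy to miss in the web interface, it can also be confirmed from the API output. The sketch below is a hypothetical helper, not part of the bot; the sample repeats only the fields shown in the output quoted above.

```python
def ranks_for_property(response, qid, prop):
    """Return (value, rank) pairs for every statement of `prop` on item `qid`."""
    claims = response["entities"][qid].get("claims", {}).get(prop, [])
    pairs = []
    for claim in claims:
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        pairs.append((value, claim.get("rank", "normal")))
    return pairs

# Trimmed to the fields visible in the quoted bot output.
response = {"entities": {"Q18611349": {"claims": {"P699": [{
    "id": "Q18611349$20AA32D2-D98B-4368-88F8-069DC15627E4",
    "mainsnak": {"datatype": "string",
                 "datavalue": {"type": "string", "value": "DOID:0050001"},
                 "property": "P699",
                 "snaktype": "value"},
    "rank": "deprecated",
}]}}}}

print(ranks_for_property(response, "Q18611349", "P699"))
```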

Duplicate items

Hello, bot created many duplicates: Wikidata:Database reports/Constraint violations/P699#Unique value. — Ivan A. Krestinin (talk) 11:06, 2 February 2015 (UTC)

Ivan A. Krestinin thanks for noticing. Something went wrong. I will fix it. Andrawaag (talk) 13:28, 2 February 2015 (UTC)

OMIM codes with MTHU prefix

Hello, the bot added ~30 OMIM ID (P492) values with an MTHU prefix. Example item: polycystic liver disease (Q246002). Links like MTHU014343 say "the requested page could not be found". Do these identifiers need to be deleted or converted to something else? The full current list can be found on Wikidata:Database reports/Constraint violations/Mandatory constraints/Violations. — Ivan A. Krestinin (talk) 12:01, 7 February 2015 (UTC)

These escaped from a re-tooling in the source file. Will be fixed asap. Thank you for the heads up, Ivan A. Krestinin . Emitraka (talk) 11:48, 9 February 2015 (UTC)
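Values like these could be caught before upload with a simple format check. The sketch below assumes that a valid OMIM ID is a six-digit number; that pattern is an assumption of this example rather than something stated in the thread, and the function name is hypothetical. The rejected sample is the identifier quoted above; the accepted one is a placeholder.

```python
import re

# Assumption: valid OMIM IDs consist of exactly six digits.
OMIM_PATTERN = re.compile(r"\d{6}")

def split_omim_ids(ids):
    """Split candidate identifiers into (valid, rejected) lists, keeping input order."""
    valid, rejected = [], []
    for identifier in ids:
        if OMIM_PATTERN.fullmatch(identifier):
            valid.append(identifier)
        else:
            rejected.append(identifier)
    return valid, rejected

valid, rejected = split_omim_ids(["123456", "MTHU014343"])
print(valid, rejected)
```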

National Cancer Institute

NCI only has content about cancer. This bot however has added a bunch of broken links to NCI pages that I have been removing [1]

Is there an automated way to remove all the broken NCI links? Thanks Doc James (talk · contribs · email) (if I write on your page reply on mine) 20:39, 18 February 2015 (UTC)

Hmmm, I think we were trying to incorrectly use an existing property. Those are valid identifiers in the NCI Metathesaurus. We'll get that straightened out... Thanks! Andrew Su (talk) 01:08, 20 February 2015 (UTC)
Thanks. Doc James (talk · contribs · email) (if I write on your page reply on mine) 03:59, 22 February 2015 (UTC)
Can you ping me when you get it fixed if you have not already? Doc James (talk · contribs · email) (if I write on your page reply on mine) 06:31, 24 February 2015 (UTC)
Doc James It is in the pipeline together with some other updates. I expect to have the script running and fixing this issue before the end of this week Andrawaag (talk) 11:41, 24 February 2015 (UTC)
Doc James it has been fixed. Andrawaag (talk) 10:41, 2 March 2015 (UTC)
Doc James We have now added all NCI Thesaurus IDs to their proper property (NCI Thesaurus ID (P1748)). Thanks for noticing. Andrawaag (talk) 15:24, 23 March 2015 (UTC)
Still causing problems. In this Aug 12, 2015 edit [2] you added for "property / National Cancer Institute ID" C3418. This is not correct. That is the NCI Thesaurus not the NCI ID. Doc James (talk · contribs · email) (if I write on your page reply on mine) 20:02, 22 August 2015 (UTC)
Doc James This is quite unfortunate, we recently completely rewrote our bot code. In this process this issue resurfaced unfortunately, in the sense that I selected the wrong property number. I have fixed this and all will be corrected in our next run. Apologies for the inconvenience Andrawaag (talk) 16:29, 24 August 2015 (UTC)
Thanks. Let me know when you have it fixed. User:Doc James. (unfortunately log in by chrome is broken) 24.66.183.25 03:55, 25 August 2015 (UTC)
I have removed it from the infobox disease until this is fixed [3] Cheers Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:42, 28 August 2015 (UTC)
Doc James All incorrect NCI ID have been removed. I first intended to fix it on the next update cycle which is imminent, but then I didn't realise it broke the info boxes. So now I ran a specific process to remove all incorrectly added NCI IDs. Andrawaag (talk) 18:56, 28 August 2015 (UTC)

Hi, Andrawaag. Your bot caused a lot of constraint violations. What are the plans to connect annotations and taxa? --Succu (talk) 08:56, 20 February 2015 (UTC)

Succu I have reverted the taxa annotations. Andrawaag (talk) 10:41, 2 March 2015 (UTC)
Thank you! What about my question? --Succu (talk) 11:21, 2 March 2015 (UTC)

Duplicate items recreating

Hello, please do not create duplicate items like Liddle syndrome (Q19616194). It has been merged 4 times already, but your bot created it again. For the full list of such items, please see Wikidata:Database reports/Constraint violations/P699#Unique value. Also, please merge or delete the items on this list. — Ivan A. Krestinin (talk) 21:18, 21 March 2015 (UTC)

Ivan A. Krestinin Thanks for the warning. This issue was caused by the merging term not being known in the original Disease Ontology. We are currently taking care of this and it will be fixed in the next update cycle. In the meantime, this issue provides us with a nice example of community curation affecting proprietary resources. Andrawaag (talk) 15:23, 23 March 2015 (UTC)
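A unique-value check of the kind used in the constraint report can be sketched by grouping items on their Disease Ontology ID and reporting any ID held by more than one item. This is an illustration only: the function name is hypothetical, and the item and DOID values are placeholders rather than real data.

```python
from collections import defaultdict

def unique_value_violations(pairs):
    """Map each identifier shared by several items to the sorted list of those items."""
    items_by_value = defaultdict(set)
    for qid, value in pairs:
        items_by_value[value].add(qid)
    return {value: sorted(qids)
            for value, qids in items_by_value.items()
            if len(qids) > 1}

# Placeholder (item, DOID) pairs; two items share the same identifier.
pairs = [
    ("Q_EXAMPLE_1", "DOID:EXAMPLE-A"),
    ("Q_EXAMPLE_2", "DOID:EXAMPLE-A"),
    ("Q_EXAMPLE_3", "DOID:EXAMPLE-B"),
]

print(unique_value_violations(pairs))
```

Running a check like this before creating an item, against items that already hold the identifier, would flag a candidate as a merge target instead of a new item.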

serum albumin vs. human serum albumin

This bot added a bunch of statements to human serum albumin (Q11721976), which should have been added to blood albumin (Q424232). human serum albumin (Q11721976) is a species-independent item on serum albumin, whereas blood albumin (Q424232) is the human-specific version (according to the sitelinks). Would it be possible to move all the statements over? If there's no automated way to do this, let me know and I'll help out. –Hardwigg (talk) 03:21, 14 April 2015 (UTC)

@Hardwigg: Thanks for finding and reporting this. This is an edit from September 2014 and apparently it slipped through the net. I am fixing it.
@Andrawaag: Ok, I resolved it. There were already links to the wrong protein as human serum albumin, so I decided to move the sitelinks/labels over. Should be good now. Here's the final standing:
--Hardwigg (talk) 03:11, 18 November 2015 (UTC)

Given names

Hi,

Nice feature to add the given names directly to authors.

Would you try to re-use existing items? "Alexander" is already at Alexander (Q923), so Q19859918 and Q19859790 duplicate/triplicate it.

If you skip P735, eventually my queries would pick it up.

BTW Q19856754 lacked P31:Q5. --- Jura 10:59, 6 May 2015 (UTC)

some more Alexander by your bot: Q19859806, Q19859845, Q19859873, Q19859754 --Pasleim (talk) 14:48, 6 May 2015 (UTC)

Labels

Please consider Help:Label before creating new items. You can read there that a label is like a page title, but is the smallest unit of information that names an item (e.g. "Paris", not "Paris, France").

So labels like in Medical University of Vienna (Q19859633) or Q19851139 should be corrected. --Pasleim (talk) 14:58, 6 May 2015 (UTC)


Trash

Please check the following and arrange for deletion: --- Jura 21:15, 10 May 2015 (UTC)

@Jura1: Deletion of most if not all of these items has been requested. (See: [[4]]) Apparently, they can't be deleted, but need to be redirected. I am currently reviewing my options, but apparently the requests for redirection need to be added manually and cannot be requested in batches, as is possible with requests for deletion. Either way, I am on it. Andrawaag (talk) 23:06, 10 May 2015 (UTC)
@Jura1: All are now merged to single WD items Andrawaag (talk) 23:51, 10 May 2015 (UTC)

AN Discussion

This bot is currently being discussed at Wikidata:Administrators'_noticeboard#User:ProteinBoxBot ·addshore· talk to me! 22:03, 10 May 2015 (UTC)

I commented at this discussion to give support from my perspective. Blue Rasberry (talk) 16:53, 27 July 2015 (UTC)

RfD Q18053915

There is an RfD open for Q18053915. There is a question regarding the bot's actions for this item. Could someone please respond at the RfD page? Mbch331 (talk) 10:01, 1 July 2015 (UTC)

I responded on the RfD page. Thanks for pointing this out. Andrawaag (talk) 10:34, 1 July 2015 (UTC)
Looks like this is resolved? --I9606 (talk) 16:39, 1 July 2015 (UTC)