User talk:Sic19

About this board

Previous discussion was archived at User talk:Sic19/Archive 1 on 2019-03-07.

ArthurPSmith (talk | contribs)

Hi Simon, thanks for your lists! I'm going to at least start working on them by hand. The "bad merges" list seems shorter, so I was planning to start with that. Some of them seem fine, though (Rachel Johnson and Rachael Johnson may be the same person?). Anyway, any suggestions on how to track work so we don't overlap (if you're planning on working on these too)? Maybe I can just add some notes here?

ArthurPSmith (talk | contribs)

Starting at the top of the "bad merges" file: only the first one, Q43005928, is sorted out so far. This seems doable...

ArthurPSmith (talk | contribs)

One hint: if you look at the history of an article, a Krbot edit changing the P50 value is a good indicator that the article should be reassigned to the previous author item after a bad merge. I wonder whether checking for this could be automated? That might help a lot in reversing the effects of merges.

ArthurPSmith (talk | contribs)

Now on lines 23-25 - Q59683200.

Sic19 (talk | contribs)

Hi Arthur, I've been working on the mismatched names. After comparing the authors of articles in Wikidata with the Crossref metadata and checking the number of statements on each author item, I deprecated about 1000 ORCID claims. Some of these items are likely duplicates, but with the ORCID deprecated it will be safe to merge them later. The check against Crossref has also revealed some cases where the author order from PubMed is wrong, which doesn't make things any easier!
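
A position-by-position comparison of the kind described above might be sketched as follows. This is only an illustration, not the process used to build the lists: it assumes the Wikidata author names are supplied already sorted by series ordinal (P1545), fetches a work's author list from the public Crossref REST API (`/works/{doi}`), and flags positions where the surname does not match. The surname-suffix rule is a simplifying assumption and will miss some legitimate name variants.

```python
import json
import urllib.parse
import urllib.request
from itertools import zip_longest
from typing import Optional


def crossref_authors(doi: str) -> list[dict]:
    """Return the 'author' list of a work from the Crossref REST API (needs network)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"].get("author", [])


def normalise(name: str) -> str:
    """Lower-case, treat full stops as spaces and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def surname_matches(full_name: str, family: str) -> bool:
    """True if the normalised full name ends with the normalised family name as a whole word."""
    return (" " + normalise(full_name)).endswith(" " + normalise(family))


def positional_mismatches(
    wikidata_names: list[str], crossref_list: list[dict]
) -> list[tuple[int, Optional[str], Optional[str]]]:
    """List (position, Wikidata name, Crossref name) wherever the two author lists disagree.

    Positions are 1-based; an entry missing on either side is reported as None.
    """
    mismatches = []
    for pos, (wd, cr) in enumerate(zip_longest(wikidata_names, crossref_list), start=1):
        cr_name = (
            f"{cr.get('given', '')} {cr.get('family', '')}".strip() if cr is not None else None
        )
        if wd is None or cr is None or not surname_matches(wd, cr.get("family", "")):
            mismatches.append((pos, wd, cr_name))
    return mismatches
```

Matching on the family name alone tolerates initials versus full given names; anything flagged still needs a human look, since a mismatch can mean either a wrong author statement or a wrong author order.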

The only item I've knowingly sorted out from the bad merges is Q41173397, but there may also have been some overlap with the mismatched names. It should be possible to automate checking for Krbot edits using the Revisions API.
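
A rough sketch of that automation, using the MediaWiki Action API (`action=query&prop=revisions`) on each article item linked to a merged author. Several things here are assumptions: the bot account name defaults to `KrBot`, the optional `prop` filter relies on the bot leaving Wikibase's default claim-edit summary (which mentions `[[Property:P50]]`), and only one batch of revisions is read (API continuation is not handled). The `parentid` of a matching edit points at the revision that still had the pre-merge author.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://www.wikidata.org/w/api.php"


def revisions_url(qid: str) -> str:
    """Build an Action API query for the revision history of one item."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """GET a URL and decode JSON; the User-Agent string is a placeholder to replace."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "merge-check-sketch/0.1 (replace with contact details)"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bot_edits(response: dict, bot: str = "KrBot", prop: Optional[str] = None) -> list[dict]:
    """Pick out revisions made by `bot` from a prop=revisions response (formatversion=2).

    If `prop` is given, also require the edit summary to mention [[Property:<prop>]].
    """
    hits = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            if rev.get("user") != bot:
                continue
            if prop is not None and f"[[Property:{prop}]]" not in rev.get("comment", ""):
                continue
            hits.append({
                "title": page.get("title"),
                "revid": rev.get("revid"),
                "parentid": rev.get("parentid"),
                "timestamp": rev.get("timestamp"),
                "comment": rev.get("comment", ""),
            })
    return hits
```

Typical use would be `bot_edits(fetch_json(revisions_url(article_qid)), prop="P50")` for each article on a bad-merge list; Wikimedia asks API clients to send a descriptive User-Agent, hence the explicit header.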

ArthurPSmith (talk | contribs)

PubMed seems to duplicate authors if it finds two matching ORCIDs with the same name (it just adds the same name twice, with the two different ORCID iDs). That offsets every subsequent author in the list. I've fixed a bunch of those cases, but it seems to be quite a widespread problem.
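
A small sketch for spotting this pattern in an author list, assuming (as described above) that the duplicate appears immediately after the original entry with a different ORCID iD. Dropping the repeat realigns every later position by one, but it keeps whichever iD came first, so deciding which ORCID actually belongs to that author still has to be done by hand.

```python
from typing import Optional


def normalise(name: str) -> str:
    """Lower-case, treat full stops as spaces and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def repeated_authors(authors: list[tuple[str, Optional[str]]]) -> list[int]:
    """1-based positions whose name repeats the previous author's name with a different ORCID iD."""
    return [
        i + 1
        for i in range(1, len(authors))
        if normalise(authors[i][0]) == normalise(authors[i - 1][0])
        and authors[i][1] != authors[i - 1][1]
    ]


def drop_repeats(authors: list[tuple[str, Optional[str]]]) -> list[tuple[str, Optional[str]]]:
    """Remove flagged repeats so that later authors shift back into their true positions."""
    flagged = set(repeated_authors(authors))
    return [a for pos, a in enumerate(authors, start=1) if pos not in flagged]
```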


By the way, currently working on Q87913633 from the bad merges list, lines 50-52.

ArthurPSmith (talk | contribs)
Sic19 (talk | contribs)

Hi @ArthurPSmith - I've updated the mismatched names list with extra data from Wikidata and Crossref to help identify what needs to be fixed and how: Wikidata-ORCID mismatched names with papers 20240426.csv. I'll continue working on this over the weekend but probably won't get much done next week as I'm away for the Wikimedia hackathon (although I'm planning to work on ORCID data processes and documentation there).

ArthurPSmith (talk | contribs)

Thanks! Not sure I'll get a chance to look at that soon, but it looks really useful.

ArthurPSmith (talk | contribs)

From going through many bad merge examples by hand, it looks like we have several different types of problem to handle:

  • Where the names differed on Wikidata items *before* the merge: in this case the redirect fixes made by Krbot (or another bot) should probably all be reversed, because the author relations should have been assigned correctly beforehand.
  • Where the names differ in ORCID, but one of the names on Wikidata was wrong before the merge, making the two names seem the same. In this case the Krbot fixes should probably *not* be reverted, as most likely the author relations were assigned to the shared name (wrong for one of them). However, there seem to be exceptions where, despite the name mismatch, the correct item (based on ORCID) was assigned as an author before the merge, and so that Krbot fix should be reverted. Perhaps this depends on exactly which process did the original author assignment in Wikidata?
  • Where the differing names are really for the same person (variant spellings of their name) so it wasn't actually a bad merge after all. Hard to detect automatically I think...
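
As a first-pass triage only, the three cases above could be bucketed roughly like this. The inputs are the two item labels as they were before the merge and the names in the two ORCID records; the use of `difflib` string similarity for the "variant spelling" case and the 0.85 threshold are arbitrary assumptions, not tested values, and every bucket still needs checking by hand.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, treat full stops as spaces and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means identical after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def triage(wd_a: str, wd_b: str, orcid_a: str, orcid_b: str, threshold: float = 0.85) -> str:
    """Rough bucket for a merge of items A and B (labels before merge, names in ORCID)."""
    if similarity(orcid_a, orcid_b) >= threshold:
        return "variant"  # possibly the same person: maybe not a bad merge at all
    if normalise(wd_a) != normalise(wd_b):
        return "revert"   # names already differed on Wikidata: bot P50 fixes are suspect
    return "review"       # names differ only in ORCID: bot fixes mostly right, look for exceptions
```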
Sic19 (talk | contribs)

My observations from working on the Wikidata items with a different name in ORCID concur with your second and third points. Most of the author relations are correct, but there tend to be one or two incorrect assignments that should be the person in ORCID. However, there are some examples where the complete opposite is true: all of the papers belong to the ORCID iD holder and the Wikidata item needs to be renamed.

I found the Crossref author data helps a lot with the checking, and I've added it to the bad merges file: Wikidata-ORCID - bad merges with papers - 20240429.csv. I tried to filter out the items that have already been corrected, but I haven't done any comparison of the Crossref and Wikidata naming to identify errors. At a glance, another problem is apparent: incorrect author assignments that involve a 'third party', i.e. neither the Wikidata nor the ORCID entity.

ArthurPSmith (talk | contribs)

Thanks @Sic19, that new file is amazing! I've already gone through the first 2000 or so lines and resolved a lot of issues; it's so much faster than what I was doing before...

ArthurPSmith (talk | contribs)

Up to about line 6500 now, over 20% of the way through that file! Some authors have over 1000 papers, so this is a very efficient way to check for problem areas. Thanks.

Reply to "ORCID mismatches"
Vanbasten 23 (talk | contribs)

Hello Sic19, I hope you are very well. I have noticed some changes to the occupations of university professors and I see a problem with the positions they hold. For example, in this case there are several job changes within the same organisation, but when these are imported into Wikidata only the dates and the university are added. This duplicates information and makes it look as though the person worked at the same place several times over the same dates. If no position is given, the import should be revised so that the university is inserted only once, with the broadest dates. All the best and thanks for your time.
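
The rule suggested above (when no position is given, keep a single employer statement covering the broadest dates) might be sketched like this. The `Employment` record shape is hypothetical, dates are assumed to be ISO `YYYY-MM-DD` strings (so they compare correctly as text), and a missing end date is assumed to mean the job is ongoing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employment:
    org: str                        # employer, e.g. a university item ID
    start: Optional[str]            # ISO "YYYY-MM-DD", or None if unknown
    end: Optional[str]              # ISO "YYYY-MM-DD", or None if ongoing (assumption)
    position: Optional[str] = None  # job title, if the source gives one


def _earliest(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Earliest known start date; None only if both are unknown."""
    known = [d for d in (a, b) if d is not None]
    return min(known) if known else None


def _latest_end(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Latest end date; a missing end is treated as ongoing, so it wins."""
    return None if a is None or b is None else max(a, b)


def collapse(records: list[Employment]) -> list[Employment]:
    """Keep records that have a position; merge position-less records per employer."""
    kept: list[Employment] = []
    merged: dict[str, Employment] = {}
    for rec in records:
        if rec.position is not None:
            kept.append(rec)
            continue
        prev = merged.get(rec.org)
        merged[rec.org] = rec if prev is None else Employment(
            rec.org, _earliest(prev.start, rec.start), _latest_end(prev.end, rec.end)
        )
    return kept + list(merged.values())
```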

Reply to "ORCID employment"
MediaWiki message delivery (talk | contribs)
Reply to "Wikidata weekly summary #625"

Wikifunctions & Abstract Wikipedia Newsletter #152 is out: Welcome, Sharvani!

MediaWiki message delivery (talk | contribs)

There is a new update for Abstract Wikipedia and Wikifunctions. Please, come and read it!

In this issue, we welcome a new member of the team and we take a look at the latest software developments.

Want to catch up with the previous updates? Check our archive.

Enjoy the reading! -- User:Sannita (WMF) (talk) 19:28, 22 April 2024 (UTC)

Reply to "Wikifunctions & Abstract Wikipedia Newsletter #152 is out: Welcome, Sharvani!"

MediaWiki message delivery (talk | contribs)
Reply to "Wikidata weekly summary #624"
Difool (talk | contribs)
Sic19 (talk | contribs)

Hi @Difool, thanks for your message. I merged based on ORCID iD and name similarity. I agree that IdRef is not a reliable source for ORCID iDs, and I will check the references on ORCID iD statements before doing any more merges.
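
Checking the references on ORCID iD (P496) statements before a merge could look roughly like this, working on an entity's JSON (the object under `entities` → QID in the output of `Special:EntityData/<QID>.json`). Treating any reference that carries P269 (IdRef ID) as IdRef-derived is an assumption, and `untrusted_items` is a placeholder for whichever QIDs the references use in "stated in" (P248); both should be adjusted to how the statements were actually sourced.

```python
def _reference_ok(ref: dict, untrusted_props: frozenset, untrusted_items: frozenset) -> bool:
    """False if the reference uses an untrusted property or is 'stated in' (P248) an untrusted item."""
    snaks = ref.get("snaks", {})
    if not untrusted_props.isdisjoint(snaks):  # iterating a dict yields its property IDs
        return False
    for snak in snaks.get("P248", []):
        item = snak.get("datavalue", {}).get("value", {}).get("id")
        if item in untrusted_items:
            return False
    return True


def weakly_sourced_orcids(
    entity: dict,
    untrusted_props: frozenset = frozenset({"P269"}),
    untrusted_items: frozenset = frozenset(),
) -> list[str]:
    """ORCID iDs on non-deprecated P496 statements with no reference outside the untrusted set."""
    flagged = []
    for stmt in entity.get("claims", {}).get("P496", []):
        if stmt.get("rank") == "deprecated":
            continue
        value = stmt.get("mainsnak", {}).get("datavalue", {}).get("value")
        refs = stmt.get("references", [])
        if not any(_reference_ok(r, untrusted_props, untrusted_items) for r in refs):
            flagged.append(value)
    return flagged
```

Unreferenced statements are flagged too, on the view that an ORCID iD with no source at all is no better than one from an unreliable source when deciding whether to merge.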

Reply to "Merges reverted"
MediaWiki message delivery (talk | contribs)
Reply to "Wikidata weekly summary #622"

Wikifunctions & Abstract Wikipedia Newsletter #151 is out: New API for calling Wikifunctions and celebrating 1000 functions

MediaWiki message delivery (talk | contribs)

There is a new update for Abstract Wikipedia and Wikifunctions. Please, come and read it!

In this issue, we discuss the new API for calling Wikifunctions, we celebrate our first 1,000 functions, and we take a look at the latest software developments. Also, there's a job opening for joining our team!

Want to catch up with the previous updates? Check our archive.

Enjoy the reading! -- User:Sannita (WMF) (talk) 09:57, 12 April 2024 (UTC)

Reply to "Wikifunctions & Abstract Wikipedia Newsletter #151 is out: New API for calling Wikifunctions and celebrating 1000 functions"
MediaWiki message delivery (talk | contribs)
Reply to "Wikidata weekly summary #623"