User talk:Deansfa

Previous discussion was archived at User talk:Deansfa/Archive 1 on 2016-12-09.

Moumou82 (talkcontribs)
Deansfa (talkcontribs)

Hello Moumou, I'll do that this evening (EST) or tomorrow. I'll let you know when it's done.

Deansfa (talkcontribs)

Hello again, I have filled in the property for all the items.

Important detail: in my script I did not request the new URLs to check whether each one corresponds to an actual page (the HTML page is hard to parse, harder than on some other sites), so some URLs may land on a "Page Not Found". But in the vast majority of cases, I think it's fine.

I could probably fix this once I manage to parse the page correctly. I'll see whether I have time this weekend. Have a good end of the week.
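
A minimal sketch of the URL check mentioned above, assuming the site answers missing pages with an HTTP 404 status. If it instead returns a normal 200 page that merely displays "Page Not Found", this would not catch it and the HTML would still need parsing. The names `http_status`, `find_dead_urls` and `fetch_status` are illustrative and not taken from the original script.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def find_dead_urls(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the URLs whose status is 404 (assumed to mean "Page Not Found")."""
    return [url for url in urls if fetch_status(url) == 404]
```

Passing `fetch_status` as a parameter makes it possible to try the filtering logic on a small dictionary of known statuses before running it against the live site.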

SPARQL query that shows the items without the new property: Result Query

Moumou82 (talkcontribs)

Thank you very much!

I think the "Page Not Found" cases might turn up in compound names where the site's version differs from Wikidata's, as here.

Reply to "Akadem"
Richard Arthur Norton (1958- ) (talkcontribs)

Do you have the ability to import all the obituaries from the New York Times archive?

Deansfa (talkcontribs)

It depends. It's always more difficult for old articles, like the ones you have to read in the "Times Machine".


For more "recent" articles, from a quick look, it seems that for 2018 and after we can access the obituaries this way:

And we can iterate over the dates and probably get all the obituaries this way (calling the API, etc.).


For obituaries before 2018, they seem to be here:

Same thing: I can probably loop over the dates in the URL and get the obituaries this way.
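
A rough sketch of the "loop over the dates in the URL" idea. The actual URL pattern is not reproduced in this thread, so `URL_TEMPLATE` below is a purely hypothetical placeholder on example.org, and `daily_urls` is an illustrative name.

```python
from datetime import date, timedelta
from typing import Iterator

# Hypothetical placeholder: the real URL pattern is not given in this thread.
URL_TEMPLATE = "https://example.org/obituaries/{year:04d}/{month:02d}/{day:02d}"


def daily_urls(start: date, end: date, template: str = URL_TEMPLATE) -> Iterator[str]:
    """Yield one URL per calendar day from `start` to `end`, inclusive."""
    current = start
    while current <= end:
        yield template.format(
            year=current.year, month=current.month, day=current.day
        )
        current += timedelta(days=1)
```

Each generated URL would then be fetched (or passed to the API) to collect that day's obituaries.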


For anything before 2006, I need to do more research.


I can definitely do it, though it may take some work. Would you be interested?

Richard Arthur Norton (1958- ) (talkcontribs)

I will help any way I can; you just need to tell me what to do. Ancestry and Familysearch have done a great job identifying obituaries at newspapers.com (ancestry) and genealogybank (familysearch) and automatically assigning them to the entry for the deceased. They recently ran their program again looking for marriage announcements.

Deansfa (talkcontribs)

Hi Richard,

I started importing New York Times obituaries into Wikidata! I'm very happy with the result so far. I have finished the year 2022, and in the coming days and weeks I'm planning to do the previous years (2021, 2020, etc.).

Here is a SPARQL query to see all the obituaries for 2022:

I will improve the query to get the person's date of death and to compute the difference between the publication date and the date of death (so we can track the discrepancies).

As you can see, so far fewer than a dozen are associated with their main subject. Tonight I will run a script to do the association (matching on the name and the date of death), which will probably work for the majority of cases. The rest will probably have to be done manually. We'll see.
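
A minimal sketch of what matching on name and date of death could look like. This is not the actual script: it assumes the subject's name has already been extracted from the obituary, and the `Person` class, the 30-day window and all sample values are illustrative assumptions.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    qid: str             # Wikidata item ID (placeholder values in examples)
    name: str
    date_of_death: date


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_subject(
    obit_name: str,
    published: date,
    candidates: List[Person],
    max_delay_days: int = 30,  # assumed window between death and publication
) -> Optional[Person]:
    """Return the single candidate with the same name who died shortly
    before publication, or None when there is no match or several."""
    wanted = normalize(obit_name)
    matches = [
        person
        for person in candidates
        if normalize(person.name) == wanted
        and 0 <= (published - person.date_of_death).days <= max_delay_days
    ]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for ambiguous cases leaves them for manual review, and the same `(published - date_of_death).days` difference is the delay between death and publication mentioned above.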

I'll keep you updated.

Richard Arthur Norton (1958- ) (talkcontribs)

Excellent, you should make this into a talk for the next Wikimania event. Have you given a talk before? Do you think you can automate creation of a Wikidata entry for the decedent, when we have no entry? I especially love the New York Times "Overlooked" series. I will try and add in Familysearch IDs for the people. One more thing to automate is the backlink: "described_by_source=". See: Q115213783

Richard Arthur Norton (1958- ) (talkcontribs)

If you go back far enough in the archive, the titles are in ALL CAPS. Are you going to convert them to sentence case? I see that for some journals we have the article title in ALL CAPS, and it is very distracting.

Deansfa (talkcontribs)

Hi Richard, thank you for bringing up that point. I would try to fix the capitalization of the title if it happens. But I won't go back that far. I can only process obituaries back to 2006 (right now, we have all obituaries from 2012 to 2022).
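
For illustration only, a naive sketch of how an ALL-CAPS title could be softened. It produces title case rather than true sentence case, it would wrongly turn acronyms such as "NYT" into "Nyt", and the `SMALL_WORDS` list and function name are assumptions rather than an established style rule.

```python
# Short words kept lowercase unless they start the title (assumed list).
SMALL_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def soften_all_caps(title: str) -> str:
    """Convert an ALL-CAPS title to title case; leave other titles untouched."""
    if not title.isupper():
        return title
    result = []
    for index, word in enumerate(title.lower().split()):
        if index > 0 and word in SMALL_WORDS:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

Titles that already use mixed case are returned unchanged, so the function can be applied to every title without checking first.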

Here is a query that shows the number of obituaries by year (the number is high in 2020 because of COVID):

I'm not sure how to find obituaries previous to 2006.

Reply to "New York Times archive"
Richard Arthur Norton (1958- ) (talkcontribs)

I have asked at Project Chat but there is no consensus. Take a look at Q90922820: there are two places for the image of the new article. Which one should we standardize on? I value your opinion, since you are now the newspaper guy.

Reply to "Images of new articles"
Richard Arthur Norton (1958- ) (talkcontribs)

At John Fred Pierson (1839-1932) obituary (Q112567088) you attributed a 1932 obituary to an author born in 1947. I do not see any author attribution in the original article, so where is the data coming from? Same for {{Q|Q105762166}}: the obituary is from 1966, the author started work at the NYT in 2000, and there is no byline in the article because it came from the Associated Press. I think your bot is off a bit. Does this mean your word count may be incorrect? At Q104907766, where we have the full text, the word count is 245, not the 1,091 you added.

Deansfa (talkcontribs)

Hi Richard, thank you for reporting the error. I'm sorry about that.

Just for info, I get the data from a New York Times API. Let's take an example. For the article When Progressives Embrace Hate(50317550), the API response will be here: and part of the response looks like this: {"wordCount":1826,"id":100000005322886,"publishedDate":1501597856000,"publishedTimestamp":1501597856000,.. }
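
As a small illustration of reading such a response, here is a sketch that parses only the fields quoted above. It assumes, from the size of the number, that `publishedDate` is a Unix timestamp in milliseconds; `parse_article` and the output keys are illustrative names, and the real response contains more fields than shown.

```python
import json
from datetime import datetime, timezone

# Fields copied from the fragment quoted above; the rest of the response is elided.
RAW = (
    '{"wordCount":1826,"id":100000005322886,'
    '"publishedDate":1501597856000,"publishedTimestamp":1501597856000}'
)


def parse_article(payload: str) -> dict:
    """Extract the ID, word count and publication time from a response fragment."""
    data = json.loads(payload)
    # Assumption: publishedDate is milliseconds since the Unix epoch (UTC).
    published = datetime.fromtimestamp(data["publishedDate"] / 1000, tz=timezone.utc)
    return {
        "id": data["id"],
        "word_count": data["wordCount"],
        "published": published,
    }
```

Under that assumption the timestamp in the example decodes to 1 August 2017 (UTC).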


I now see what the issue is in my code. I will rerun it, see where I made errors, and fix them. Give me a couple of days. I'll keep you updated.


Deans

Richard Arthur Norton (1958- ) (talkcontribs)

All good stuff, once working properly! The NYT archive is an amazing source. IBM Watson used Wikipedia and the NYT archive as its main sources of information to win Jeopardy. At one time reverse CAPTCHA was being used to transcribe articles. https://www.nytimes.com/2011/03/29/science/29recaptcha.html

I wish the Associated Press had an ASCII archive of all their articles.

Reply to "{{Q|Q112567088}}"
Billinghurst (talkcontribs)

Hi. With reference to special:diff/1904523818 and the other like edits, the publication date and other aspects of an article are best applied as qualifiers of the publication, rather than directly to the item. This is most notable when people ridiculously apply a page number directly to an item, rather than to the publication (Help:Qualifiers). You truly see the value of this approach when copying and moving these parts of an item.

Deansfa (talkcontribs)

I don't disagree, but no one follows this guidance, so when you query articles by date you miss those (or you need a very verbose query to handle this case).

Deansfa (talkcontribs)

I didn't notice you had removed the date. You should not revert the change: the date can stand as a qualifier AND directly on the item. There are more than 4,000 NYT articles, and I don't see why these 40 articles should differ from the rest. It makes querying articles by date very challenging.

Reply to "publication date as a qualifier"
Thelemic Magick (talkcontribs)

Hi Deansfa,


is this a mockup for importing all WSJ articles as Wikidata items?

Deansfa (talkcontribs)

Hey Thelemic, I don't plan to import all WSJ items (at least not if it's not needed), so it's not really a mockup.

I definitely did some test imports in the past, but they were limited to one author. I currently import some articles from time to time, generally because I use them as references in Wikipedia articles (using templates like Template:Cite Q on wp:en or Template:Bibliographie on wp:fr).

If there's a need, I can definitely do bigger imports (by issue, by author, etc.). I'm currently working on something similar for the nytimes and bloomberg. I think there are lots of possibilities for the future.

Reply to "WSJ articles project"
FeralOink (talkcontribs)

Hello Deansfa.

I noticed that you created a Wikidata entry for a Wall Street Journal article that is referenced in the en Wikipedia article for Alameda Research. Could you check the Wikidata entry for that WSJ article? There is an error in one of the fields. It is because the WSJ doesn't use a consistent article URL naming convention. I don't recall what error is thrown in Wikidata, but you'll see it toward the end of the entry, where it is denoted with a circled question mark. Could you correct that please? I don't know how to do that.

Deansfa (talkcontribs)

Hi FeralOink, thank you for the message. So two WSJ-WD articles are referenced in the Alameda Research article: Alameda, FTX Executives Are Said to Have Known FTX Was Using Customer Funds (Q115184709) and Binance Walks Away From Deal to Rescue FTX (Q115184738). I don't think it creates any issue in the article, but you're right, there are some "exceptions"/"errors" thrown in the Wikidata element (around article ID (P2322)).

It's because for each of the WSJ articles I set an article ID (which is unique per article; it's an exposed and documented attribute) with the property article ID (P2322). The problem is that this property accepts only alphanumeric characters, while WSJ article IDs can look like "WP-WSJ-0000344745" (with dashes). Actually, the format of this ID has changed slightly over time (it was like /SB[0-9]+/ for most of the time).
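
To make the mismatch concrete, here is a small sketch. The two patterns are assumed stand-ins for "alphanumeric only" and "alphanumeric plus dashes"; they are not the actual format constraint defined on P2322, and "SB1234567890" is a made-up ID in the older /SB[0-9]+/ style.

```python
import re

# Assumed stand-in for an "alphanumeric characters only" format constraint.
ALNUM_ONLY = re.compile(r"[A-Za-z0-9]+")
# Assumed relaxed variant that also accepts dashes (the temporary fix idea).
ALNUM_OR_DASH = re.compile(r"[A-Za-z0-9-]+")


def is_valid(article_id: str, pattern: re.Pattern) -> bool:
    """True when the whole ID matches the given format pattern."""
    return pattern.fullmatch(article_id) is not None


print(is_valid("SB1234567890", ALNUM_ONLY))          # True
print(is_valid("WP-WSJ-0000344745", ALNUM_ONLY))     # False
print(is_valid("WP-WSJ-0000344745", ALNUM_OR_DASH))  # True
```

The older-style ID passes the strict pattern, while the dashed ID only passes once dashes are allowed.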

I'm planning to create a property for WSJ article IDs, so I won't have to use article ID (P2322) in the future (and it won't throw errors). One temporary fix is to add the dash as an accepted character for article ID (P2322).

FeralOink (talkcontribs)

Okay! That sounds like a good way to deal with it. I am a WSJ subscriber, so I know their URL conventions changed about 3 years ago and have some weird variations like the dashes but only sometimes. Thank you so much for looking into it. I saw the notes about regex but that is beyond what I felt like considering ;o)

Deansfa (talkcontribs)
Reply to "WSJ article re FTX and Alameda"
Summary by Deansfa

The user may not have noticed the item was linked to other items.

Bovlb (talkcontribs)

{{subst:Welcome if not exists|welcome=yes}} Thank you for contributing to Wikidata. I see that you recently created an item Q113657295 that does not clearly indicate its notability. The Wikidata project only accepts items that meet its notability criteria, and your item is therefore likely to be deleted soon. In brief, items must have an associated Wikipedia article, must be needed for statements on another notable item, or must have both identifiers and serious sources. For the last case, a good indication of notability would be multiple articles about the subject in independent publications like newspapers or magazines. You can add such sources as references to specific claims using reference URL (P854), or as top-level claims using described at URL (P973). Also, this may not apply in this specific case, but you should know that we discourage editors from contributing on topics with which they have a strong personal connection, as this may present a conflict of interest. If you are being paid to edit here, then you are obliged to disclose this. For a longer version, you might find it useful to read the essay "How to create an item on Wikidata so that it won't get deleted".  ~~~~

Deansfa (talkcontribs)

Hey Bovlb. I really appreciate your message. I'm extremely familiar with Wikidata, having made more than 300k contributions to this project and having been, like you, on the Wikimedia projects for more than 15 years. This item is notable because it meets the Wikidata notability criteria. Have a good evening or night.

Pharos (talkcontribs)
Deansfa (talkcontribs)

Hi Pharos, this is great, thank you for letting me know!

Reply to "Wikidata:List of properties/New York City"
Sammyday (talkcontribs)

Hello Deansfa. Regarding this, I thought redirects were not considered equivalent to a dedicated article, and so should not be added to items. Am I wrong? Sammyday (talk) 17:01, 26 January 2022 (UTC)

Deansfa (talkcontribs)

Hello @Sammyday:. That may have been the case at one time, but links to redirects have become very common on Wikidata. For quite some time now there has even been a small "badge" that can be put on the link to the redirect (see an example at the bottom of this page), indicating that the Wikipedia link points to a redirect.

Sammyday (talkcontribs)

Interesting. However, in the case at hand this "badge" cannot be set, because oddly an error message appears, stating that since Eric Zemmour is already the subject of an item, the link cannot be marked as a redirect (whether I choose "sitelink to redirect" or "intentional sitelink to redirect"). If you can make any sense of it...

Deansfa (talkcontribs)

@Sammyday:, yes, it's a bit complicated. It has to be done at the moment the redirect link is added to the Wikidata item, because to add a redirect on Wikidata you have to temporarily break the redirect on the Wikipedia in question (usually by removing the hash sign), add the link to the redirect on Wikidata, and then restore the previously broken redirect (example here, with these two edits one minute apart). I could do it for Reconquête, but given the ongoing discussion on Éric Zemmour's talk page, it is very likely that the Reconquête article will be restored, at least for the duration of the deletion discussion (PàS).

Sammyday (talkcontribs)

That's very likely. In any case, thanks for these tips; I'll be sure to use them next time.

Reply to "Redirection"