Wikidata:Requests for comment/Creating items for videos at online video platforms that are representation of notable items

An editor has requested the community to provide input on "Creating items for videos at online video platforms that are representation of notable items" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

I believe that creating items for videos on online video platforms (e.g. Bilibili, Vimeo, Niconico, YouTube) that represent notable items would bring major advantages to the community. In this request, I explain what I think those advantages are.

In the context of online video platforms, by "notable items" I mean, for example, documentary films, orations, lectures or TEDx talks.

Advantage no. 1: We can store more properties about these items

These properties don't exist yet, but might help some people if created. For example, we could store information about subtitles, which might help deaf people.

  • "human-written subtitles available in language"
    • This property would contain the language of the subtitles that have been written or checked by a human.
    • Thanks to this property, an undergraduate student whose native language is Spanish and who is studying Chinese could find lectures in Chinese that have subtitles in Spanish, or a deaf person could find a documentary film whose speech they can fully understand.
  • "auto-generated subtitles available in language"
    • This property would contain the languages for which the online video platform has generated subtitles.
    • Note that on YouTube (Q866), this property doesn't necessarily equal the language of work of the video, because not all videos with dialogue have auto-generated subtitles. For example, as of now, the language of this video is Spanish; however, it doesn't have auto-generated subtitles in Spanish.
    • A native Spanish speaker proficient in English who wants to raise awareness of the work of women scientists from their country could answer this question: Which videos of presentations by women scientists native to my country don't have subtitles in English? They would then be able to create English captions on that online video platform so that more people can watch those videos.
  • "embedded subtitles in language"
    • This property would contain the language of the subtitles the video was distributed with.
    • For example, the language of this video on Vimeo is English, but the video has embedded subtitles in Spanish. Here's another example: the language of this video on YouTube is English, but it has embedded subtitles in Spanish.
  • "number of comments"
    • This property would be the total number of comments on a given video. This data is usually provided by online video platforms.
  • "average audio bit rate" and "audio sampling rate"
    • I honestly don't know how these two properties could be useful, given my ignorance of audio technology, but YouTube stores this information (I could see it by using the --dump-json flag of youtube-dl). Because these are attributes of the video's audio, I think they could be useful to an audiophile or an audio engineer.
  • (... more properties related to the attributes of videos on online video platforms ...)
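
As a rough illustration of where such values could come from, the sketch below parses the kind of JSON that youtube-dl prints with --dump-json and collects the fields that the properties listed above would be based on. The key names (subtitles, automatic_captions, comment_count, and abr/asr inside formats) are youtube-dl metadata fields; the sample JSON, its values and the helper name summarize_video_metadata are hypothetical. Not every extractor or option combination fills every field, and "subtitles" only means subtitles uploaded with the video, which may or may not be human-written.

```python
import json

# Hypothetical, heavily trimmed sample of what `youtube-dl --dump-json <URL>`
# might print; real output has many more keys and these values are made up.
SAMPLE_JSON = """
{
  "id": "EXAMPLE_ID",
  "comment_count": 12,
  "subtitles": {"es": [{"ext": "vtt"}]},
  "automatic_captions": {"en": [{"ext": "vtt"}]},
  "formats": [
    {"format_id": "audio-only", "abr": 128, "asr": 44100},
    {"format_id": "video-only", "abr": null, "asr": null}
  ]
}
"""


def summarize_video_metadata(raw_json):
    """Collect the fields that the proposed properties would be based on."""
    info = json.loads(raw_json)
    # Keep only the formats that actually report an audio bit rate.
    audio_formats = [f for f in info.get("formats", []) if f.get("abr") is not None]
    return {
        "uploaded_subtitle_languages": sorted(info.get("subtitles") or {}),
        "auto_generated_subtitle_languages": sorted(info.get("automatic_captions") or {}),
        "number_of_comments": info.get("comment_count"),
        # (average audio bit rate, audio sampling rate) per audio format
        "audio": [(f.get("abr"), f.get("asr")) for f in audio_formats],
    }


summary = summarize_video_metadata(SAMPLE_JSON)
print(summary)
```

In practice the JSON string would come from running youtube-dl on a real video URL rather than from a hard-coded sample.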

I know the properties related to subtitles can be somewhat handled by language of work or name (P407) and applies to part (P518) with the values subtitle (Q204028) or closed captioning (Q2367247). However, having dedicated properties would mean they can be used as qualifiers too, if necessary. In addition, there would be less friction for people to use them. Note that number of recoveries (P8010) and number of cases (P1603) could likewise be modelled with population (P1082) and applies to part (P518), yet they exist as dedicated properties.
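
To make the existing modelling concrete, here is a minimal sketch of a video's statements as plain Python objects, assuming that applies to part (P518) is attached as a qualifier to a language of work or name (P407) statement. The Statement class, the helper subtitle_languages and the sample statements are hypothetical; Q1321 and Q1860 are assumed to be the items for Spanish and English.

```python
from dataclasses import dataclass, field

LANGUAGE_OF_WORK = "P407"  # language of work or name
APPLIES_TO_PART = "P518"   # applies to part
SUBTITLE = "Q204028"       # subtitle
SPANISH = "Q1321"          # assumed item ID for Spanish
ENGLISH = "Q1860"          # assumed item ID for English


@dataclass
class Statement:
    """Simplified stand-in for a Wikidata statement (hypothetical helper)."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def subtitle_languages(statements):
    """Return languages from P407 statements qualified as applying to subtitles."""
    return [
        s.value
        for s in statements
        if s.prop == LANGUAGE_OF_WORK
        and s.qualifiers.get(APPLIES_TO_PART) == SUBTITLE
    ]


video_statements = [
    Statement(LANGUAGE_OF_WORK, ENGLISH),  # spoken language of the video
    Statement(LANGUAGE_OF_WORK, SPANISH, {APPLIES_TO_PART: SUBTITLE}),
]

print(subtitle_languages(video_statements))  # ['Q1321']
```

With a dedicated subtitle property, the filter would reduce to a check on the property alone, and the qualifier slot would stay free for other details.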

Advantage no. 2: We can store the number of viewers/listeners (P5436) at different points in time

Let's suppose a documentary film A is published on YouTube. Some people might recommend linking A to the video on YouTube by using YouTube video ID (P1651) in A, but this has the following disadvantage:

By doing this, it's not possible to store the number of viewers/listeners (P5436) for a given video at different points in time in an organized way. Sure, we can add multiple values of number of viewers/listeners (P5436) as qualifiers on YouTube video ID (P1651), but this would look as shown in Figure 1. This causes the following problem: we can't record the date on which each number was retrieved (P813), nor the archive URL (P1065) or archive date (P2960). This problem would be solved if the video had its own item, because that information could be structured as shown in Figure 2.

Figure 1. Qualifiers being used for storing the number of viewers/listeners (P5436) at different points in time
Figure 2. Statements being used for storing the number of viewers/listeners (P5436) at different points in time
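
The difference between the two figures can be sketched with plain Python objects. This is only an illustration under stated assumptions: the Statement class, the video ID, the view counts, the dates and the archive.example URLs are all made up. The text does not say whether Figure 2 attaches retrieved (P813), archive URL (P1065) and archive date (P2960) as qualifiers or as references, so the sketch lumps them into a single context dict.

```python
from dataclasses import dataclass, field

VIEWERS = "P5436"       # number of viewers/listeners
YOUTUBE_ID = "P1651"    # YouTube video ID
RETRIEVED = "P813"      # retrieved
ARCHIVE_URL = "P1065"   # archive URL
ARCHIVE_DATE = "P2960"  # archive date


@dataclass
class Statement:
    """Simplified stand-in for a Wikidata statement (hypothetical helper)."""
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)  # property -> list of bare values
    context: dict = field(default_factory=dict)     # per-statement details (simplified)


# Figure 1 style: view counts are qualifier values on the video ID statement.
# A qualifier value cannot carry its own details, so the two numbers are bare.
figure_1 = Statement(YOUTUBE_ID, "EXAMPLE_ID", qualifiers={VIEWERS: [1000, 2500]})

# Figure 2 style: the video has its own item, and each view count is a
# statement that can carry its own retrieval and archive details.
figure_2 = [
    Statement(VIEWERS, 2500, context={
        RETRIEVED: "2022-02-01",
        ARCHIVE_URL: "https://archive.example/snapshot-2",
        ARCHIVE_DATE: "2022-02-01",
    }),
    Statement(VIEWERS, 1000, context={
        RETRIEVED: "2022-01-01",
        ARCHIVE_URL: "https://archive.example/snapshot-1",
        ARCHIVE_DATE: "2022-01-01",
    }),
]


def view_count_history(statements):
    """Return (retrieved date, view count) pairs in chronological order."""
    rows = [
        (s.context[RETRIEVED], s.value)
        for s in statements
        if s.prop == VIEWERS and RETRIEVED in s.context
    ]
    return sorted(rows)  # ISO dates sort chronologically as strings


print(view_count_history(figure_2))
```

No equivalent history can be built from figure_1, because nothing ties either number to a date.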

Now, the rate at which this data would be added as statements wouldn't be the same for every video, since different people have different needs. Some users might argue that this information shouldn't be stored as statements, but as CSV data on Commons instead. I think forcing all users to follow a single path for contributing would discourage some of them from contributing. If Wikidata intends to be the sum of all human knowledge, then people should be given different ways to contribute. That is, both options should exist: storing the data as statements and as CSV files.

Finally, thanks to advantages no. 1 and no. 2, new questions similar to the following ones could be answered with Wikidata

What this RFC is not: I'm not encouraging the creation of Wikidata items for all YouTube videos. I'm arguing for the creation of Wikidata items for videos on online video platforms that are strongly related to Wikidata items. For example, videos that show

Here are some videos that I didn't consider when writing this request and, therefore, I'm not arguing for the creation of Wikidata items for those videos. Note that I'm not saying that I support or oppose doing this [1].

[1] Some people might also find these videos useful and, therefore, aligned with the core mission of the Wikimedia Foundation: bring free educational content to the world, so I wouldn't call those videos useless. However, this is out of the scope of this request. As mentioned before, this request is about the creation of Wikidata items for videos at online video platforms that are strongly related to Wikidata items.

Request for comments: Do you think items for videos on online video platforms that are representations of notable items should be created?

Rdrg109 (talk) 18:06, 21 February 2022 (UTC)


Discussion

  • The two figures above don't show the true space of options available to us. We could use number of viewers/listeners (P5436) on the video item without a new item. We could add qualifiers to indicate which hosting platform the view count applies to. Most of the other attributes are static and so could just be qualifiers on YouTube video ID (P1651). This solution avoids having to create a menagerie of properties for each hosting platform to link the video item to the web resource item. BrokenSegue (talk) 03:43, 22 February 2022 (UTC)
    @BrokenSegue:
    1. With regards to the alternative that you mentioned for storing number of viewers/listeners (P5436) at different points in time: I think what you are referring to is shown in Figure 3. If it doesn't represent what you meant, could you please provide a screenshot or use Template:Statement+ so that I can fully understand it?
    Figure 3. Storing number of viewers/listeners (P5436) of different videos in the same Wikidata item
    2. With regards to storing static attributes of YouTube videos as qualifiers: Note that doing this will make it impossible to use qualifiers for those qualifiers. What if we want to store the time at which a given subtitle started existing? Would this be useful? Yes, this could be useful for someone who is studying the priority with which a given online video platform creates subtitles for specific videos. It could also be useful to someone who is studying the behavior of users that contribute by creating subtitles on an online video platform XYZ. It could also be useful to someone who is studying the age-related warnings that are shown on online video platforms. Thanks to this, that person would be able to answer: How long after its publication date (P577) did it take online video platform XYZ to show a warning that the minimum age (P2899) for this video is 16 years old?
    3. With regards to creating a property for each online video platform: The topic being discussed here (i.e. creating Wikidata items for videos) doesn't imply creating those properties. I think you are aware of the property proposal under discussion for "Youtube video" (the proposal can be found here). Personally, I don't support that proposal because, as you mentioned, it would imply creating a property for each online video platform, and I think this would make SPARQL queries complex (therefore, the data would only be queried by users who know how to work with SPARQL).
    Instead of creating a property for each online video platform, a single property would be used for linking a work to its representation on each online video platform; the property could be called "full work available at". I'm aware of full work available at URL (P953), but it accepts a URL instead of a Wikidata item, and here we are discussing the advantages that creating items would bring.
    -- Rdrg109 (talk) 06:39, 22 February 2022 (UTC)
    Yes, Figure 3 roughly expresses my idea. The text of this RfC doesn't explain what is wrong with that approach. Yes, I recognize we wouldn't be able to qualify the qualifiers in my suggestion. I guess I don't see enough value there to justify this approach. BrokenSegue (talk) 19:03, 22 February 2022 (UTC)
    And I would argue such a solution is ontological garbage. Making a few items for notable YouTube videos is a lot better than this inconsistent, qualifier-based, un-queryable solution. Lectrician1 (talk) 14:39, 28 February 2022 (UTC)
  • I find the argumentation to be weak. It conflates usability with notability, and there is no strict correlation between the two. If the argument was good, then we could also have specific items for every notable person's Twitter, Instagram and Facebook accounts and so on to track their development over time. While it might be interesting data for a statistician to collect, the proposed items are not notable per se and thus this is data that, even though it technically could be added to Wikidata and that the query service could be used to analyze it, is better suited elsewhere. Ainali (talk) 16:49, 22 February 2022 (UTC)
    @Ainali The reason it seems weird and wrong to have dedicated items for Twitter, Instagram and Facebook accounts is that they aren't really ever significant in any way. However, YouTube videos have numerous times been notable and documented on wikis because they can go viral and actually have newsworthy significance. They also have relevant statistics, like likes and views, that should be documented over time. Twitter, Instagram and Facebook accounts do not go viral and their follower count is rarely significant. However, I could extend this argument of creating a dedicated item to another example of an internet account. Take r/WallStreetBets (Q88239185). This is an internet account. Because it is now an item, does that mean we can create items for any subreddit? I would say no and I think you would too. However, it is notable and newsworthy and therefore should have an item. I would argue notable YouTube videos should too. Lectrician1 (talk) 14:47, 28 February 2022 (UTC)
    It seems to me that you are still conflating concepts here, especially through your last sentence. I am not arguing that YouTube videos cannot be notable; of course they can. But it does not mean that every item about a video would automatically make the YouTube version of that video notable. Instead, the YouTube version should need to be notable on its own rights. The usefulness, or utility, of the items is irrelevant to the notability. So my stance is still Oppose to making every YouTube video notable by default. Ainali (talk) 17:48, 28 February 2022 (UTC)
    @Ainali "But it does not mean that every item about a video would automatically make the YouTube version of that video notable. Instead, the YouTube version should need to be notable on its own rights" Actually, I would say that the properties of a YouTube video do make it notable on its own. This is specifically because the properties of likes and views are notable and specific to the YouTube video itself, not the video. Lectrician1 (talk) 18:10, 28 February 2022 (UTC)
    Well, we can agree to disagree then because you claim notability from some homemade utility argument of likes and views rather than any of the ones under Wikidata:Notability. Ainali (talk) 21:10, 28 February 2022 (UTC)
    But being newsworthy, with news outlets talking about the YouTube video itself and its views or likes, makes the YouTube video a "clearly identifiable conceptual or material entity". Lectrician1 (talk) 21:35, 28 February 2022 (UTC)
    @Ainali:
    1. With regards to conflating usability with notability: I can agree with you on this. A video on an online video platform is not notable by itself. Therefore, most of them shouldn't have a Wikidata item. However, note that I'm defending/encouraging the creation of items for those videos that are strongly related to notable items. For example, if a given documentary film is shared on multiple online video platforms, then those videos would be able to have a Wikidata item because they are representations of the documentary film. Thanks to this, we could store information about the video itself in a more organized way.
    2. With regards to creating items for Twitter, Facebook and Instagram accounts: I wouldn't defend creating items for all accounts on Twitter, Facebook or Instagram, but I'd defend doing that for those accounts that are owned by people who exist in Wikidata. However, this RFC is not about that topic. I understand it is related, but my request neither supports nor opposes the creation of Wikidata items for such entities (social media accounts). I've added a new section to this RFC to make that clear.
    3. With regards to that information being better suited elsewhere: I'm wondering what alternatives could be used to store this information instead. If you know one, please let me know. Personally, I think Wikidata is an appropriate alternative for this main reason: this new data would enrich existing knowledge in Wikidata (e.g. natural disasters, orations, occurrences, conventions, etc.), so people would have more information about those entities. As a result, the core mission of the Wikimedia Foundation, to bring free educational content to the world, would be strengthened.
    Rdrg109 (talk) 22:03, 1 March 2022 (UTC)
  • Support for all the great reasons User:Rdrg109 has fully listed. Thank you so much for putting in the work of creating this in-depth RFC! Lectrician1 (talk) 14:49, 28 February 2022 (UTC)
  • Support, per the great arguments presented. -- Donald Trung/徵國單  (討論 🀄) (方孔錢 💴) 06:44, 1 March 2022 (UTC)