Wikidata and the sum of all video games − 2019 edition

13 months ago, I was recapping one year of contributions to Wikidata’s WikiProject Video games, floating the “Sum of all Video games” motto. If you are not familiar with that endeavor, I extensively described it in a mushroom-rambling blog-post a few months ago.

That project was never mine alone, but it has grown so much that as I try to summarize its state as 2019 closes, I will focus on some trends I observed (rather than on my own contributions). Be warned that I will most certainly have my blind spots though. 🙂

Overview

As of January 31st 2020, there are 39K video game (Q7889) items on Wikidata − a 10% growth over the year (but still a fairly low coverage, in the greater scheme of things).

Let’s have a look at how well these items are described (a task made more straightforward this year thanks to my very own inteGraality ^_^): 5.7K have no platform (P400), 8.6K no publication date (P577), 11.3K no genre (P136) − much better numbers than last year: while we kept adding items, the overall proportion of well-described items went up − way to go! Conversely, 19.7K have no developer, 20K no publisher − a share of the total a bit worse than 13 months ago.
(I have a hunch that part of that growth in platform, dates and genres can be attributed to Ghuron’s category inference bot, which leverages Wikipedia’s extensive category tree [and some tedious annotation work ^_^’]).
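
For the curious, here is a minimal sketch of how such "missing property" counts could be reproduced on the Wikidata Query Service. It is not necessarily how inteGraality computes its figures: the helper names are made up for illustration, and the only identifiers used beyond those cited above are `P31` (instance of) and the standard `wdt:`/`wd:` prefixes. The script only builds the query text and redoes the percentage arithmetic on the rounded numbers from this post; it does not contact any server.

```python
# Illustrative sketch: helper names are hypothetical, figures are the
# rounded ones quoted in the post.

VIDEO_GAME = "Q7889"  # video game


def missing_property_query(property_id: str, item_class: str = VIDEO_GAME) -> str:
    """Build a SPARQL query counting instances of item_class that have
    no truthy (wdt:) statement for property_id."""
    return (
        "SELECT (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31 wd:{item_class} .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{property_id} [] }}\n"
        "}"
    )


def share(part: float, total: float) -> float:
    """Percentage of total represented by part, rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    # Query text for "video games with no platform (P400)".
    print(missing_property_query("P400"))

    # Shares of the 39K items lacking each property (counts in thousands).
    for label, missing in [("platform", 5.7), ("publication date", 8.6), ("genre", 11.3)]:
        print(f"no {label}: {share(missing, 39)}%")
```

The generated query can be pasted into the query service as-is; swapping `P400` for `P577` or `P136` gives the other two counts.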

The real game changer has been on external identifiers. While 13 months ago, 40% of items (14K) had no external identifiers, we squashed that down to 22% − with only 8.5K items. All that Mix’n’matching was not done in vain!
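
As a rough sanity check, the rounded counts can be turned back into shares. The sketch below also includes one possible query for counting items without any external identifier; it assumes the standard `wikibase:propertyType wikibase:ExternalId` marking on properties and may well time out on a class this large. All names in the code are hypothetical.

```python
# Sketch only: rounded figures from the post, hypothetical helper names.

# One possible way to count video games without any external identifier.
# Assumption: properties are typed via wikibase:propertyType.
NO_EXTERNAL_ID_QUERY = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31 wd:Q7889 .
  FILTER NOT EXISTS {
    ?prop wikibase:propertyType wikibase:ExternalId ;
          wikibase:directClaim ?p .
    ?item ?p [] .
  }
}
"""


def percent(part: float, total: float) -> float:
    """Share of total represented by part, in percent."""
    return 100 * part / total


def implied_total(part: float, share_percent: float) -> float:
    """Total implied by a count and the share it represents."""
    return part * 100 / share_percent


if __name__ == "__main__":
    # 8.5K items without identifiers out of 39K items today.
    print(f"{percent(8_500, 39_000):.0f}% of items have no external identifier")
    # 14K items were 40% of the total 13 months ago.
    print(f"{implied_total(14_000, 40):,.0f} items implied 13 months ago")
```

The implied earlier total of about 35K items is roughly in line with the growth to 39K mentioned in the overview, allowing for rounding.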

Besides that, our data-model has not evolved much in the past 13 months. announcement date (P6949) was created in July and made its way to a hundred video game items. More interestingly, mod of (P7075) was created and is now used on ⅔ of our video game mod items. And in October, content descriptor (P7367) came around to qualify media content ratings.

Some activities

Project management

We expanded the project documentation significantly. New additions include showcase queries, an activity log, and a bunch of Listeria-based reports and dashboards − in particular our external identifier dashboard. Of course, we continued maintaining and organizing our data-model/properties list.

Some cool things in no particular order

External identifiers & bulk imports

We started the year with 50-something video-game-related external identifiers and tripled that to 180. Mix’n’match catalogues went from 42 to 110.

Looking at one example, usage of the MobyGames property went from 13K to 24K − an amazing growth (that can be very much attributed to the tireless Mix’n’matching from certain research assistants based in Leipzig ^_^)

Some pretty sweet bulk imports of identifiers also took place. Connor opened the year by “Connecting PCGamingWiki and Wikidata”. In March, Envel took on “Matching BnF and Wikidata video games using Dataiku DSS”. In June, Kristbaum added about 5,000 speedrun.com identifiers. Over the summer, Tracy from the diggr project imported an interlinking dataset of MobyGames and MediaArt Database. At the end of September, Premeditated made some 20,000 matches to the Mixer streaming website. Connor closed the year with a first match of the Lutris database, and later with IGDB. On a much smaller scale, I had some luck with populating some data from Wikipedias, using QuickStatements and HarvestTemplates.

Outreach

We did a lot of outreach this year − internally and externally.

In July, I was invited to a workshop “Videogame and Visual Media Data – Community-driven Initiatives and Research Avenues” at Leipzig University, co-organized by our friends from the diggr project (I summarized my statement there). In October, Tracy kindly presented “Using Wikidata for Video Game Research” at our Wikidata goes Library event at the Vienna public library.

With Tracy’s and Envel’s support, we put together a poster for Wikimania 2019 − which we reused, in enhanced form, for WikidataCon. And the three of us banded together for a session there on the Sum of All video games project, presenting the state, relevance, achievements and challenges of the project.

External interest and reuse

Our work generated some interest: more and more people are interested in linking to, leveraging or reusing Wikidata for video game metadata.

In January, Tracy and Peter Chan from Stanford University Libraries worked on linking the OLAC Video Game Vocabulary with Wikidata. In July, the German International Computer Game Collection relaunched, linking to Wikidata and leveraging us to import titles in different languages on-the-fly. In August, the Visual Novel Database replaced its Wikipedia links with Wikidata links, and started automatically fetching more links from us.

In June, Connor lifted the veil on vglist − a webapp for people to track their game library − not the first such app, but this one is entirely powered by Wikidata metadata. The same month, Wikidata was briefly considered (although rejected, for reasonable reasons) by the grilo project, which is used by the GNOME Games application, among other things.

Finally, at the last International Conference on Dublin Core and Metadata Applications in September, Kazufumi Fukuda (Center for Game Studies, Ritsumeikan University) presented a paper titled “Using Wikidata as Work Authority for Video Games”, concluding that “adopting Wikidata as a work authority was found to be a valid method”, and underlining the need to further enhance the metadata in Wikidata.

The road ahead

That was 2019! (If there’s anything I missed, please do drop a comment)

While a lot happened, there are many things we did not get to. One of the most crucial issues of all: advancing the data-model. It is still up in the air whether we want to implement a more sophisticated data model for work/releases. We also did not significantly expand our grammar and vocabulary to model things like art styles, viewpoint, narrative style or gameplay features. This is something we will have to tackle in 2020. More fun!

Thanks to Envel and Tracy for proofreading this article, and to Connor for motivating me to write it.


This work is licensed under a Creative Commons Attribution 4.0 International License.
