Wikidata and the sum of all video games − 2020 edition

After the 2018 and the 2019 recaps, let’s cover what happened in 2020 with the project that has been much of my focus, Wikidata’s WikiProject Video games. If you are not familiar with that endeavor, I will refer you to the mushroom-rambling blog-post I wrote in September 2019.

I feel that this year was a bit slower for the project than previous ones − maybe we were all too busy playing Q64566657 or Q96417649 (or, you know, busy with other things) − it definitely was for me, and as such, I will not claim to be exhaustive 😉


As of February 15th 2021, there are 42K video game (Q7889) items on Wikidata − a 6.4% growth (2.5K items) over the year.

As always, let’s have a look at how well these items are described: 6K have no platform (P400), 9.6K no publication date (P577), 11.8K no genre (P136) − somewhat worse numbers than last year: the overall proportion of well-described items went down as we added more items.
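Figures like these can be pulled straight from the Wikidata Query Service. A minimal sketch for the platform count (the numbers above may come from a slightly more involved query, e.g. one that also walks subclasses):

```sparql
# Count video game items with no platform (P400) statement.
# Swap wdt:P400 for wdt:P577 or wdt:P136 to reproduce the other figures.
SELECT (COUNT(?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 .                  # instance of: video game
  FILTER NOT EXISTS { ?game wdt:P400 [] }   # no platform statement at all
}
```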

Our work on external identifiers continues to pay off: only 4K items (9.7%) have no identifier at all. This is slightly misleading, as that figure counts items whose only identifier is vglist video game ID (P8351), which is itself based on Wikidata. Excluding vglist, we arrive at 5.2K items (12.5%). Down from 22% a year ago, and 40% two years ago, we are on a good trend 🙂
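These identifier figures lean on the fact that external-identifier properties are typed as such in the Wikibase RDF model, so one can count items lacking any identifier without listing the 240+ properties one by one. A sketch:

```sparql
# Count video game items bearing no external-identifier statement at all.
# The vglist exclusion mirrors the caveat above; drop that FILTER
# to get the raw (vglist-included) figure instead.
SELECT (COUNT(DISTINCT ?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 .
  FILTER NOT EXISTS {
    ?game ?claim ?id .
    ?prop wikibase:propertyType wikibase:ExternalId ;
          wikibase:directClaim ?claim .
    FILTER (?prop != wd:P8351)   # ignore vglist, itself Wikidata-based
  }
}
```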

Data-model-wise, we gained two new major properties:

External identifiers

We have now reached over 240 video-game-related external identifiers (compared to 180 last year).

The additions range from the old ( (P8059) started in 1994) to the super-recent (Game UI Database ID (P8994), added mere weeks after its announcement), and span several languages (many in English, but also French [Apple IIGS France ID (P7799) or Gamekult ID (P7913)], German [ ID (P7853) or GameStar ID (P7877)], Japanese [ URL (P7890)], Polish [ ID (P8020)], Danish [Play:Right ID (P9143)], Russian [Absolute Games ID (P8279)] or Spanish [D-MSX ID (P7802) or Computer Emuzone ID (P7733)]).

They specialize in various ways:

Mix’n’match catalogues, which we use to align external databases with Wikidata, almost doubled, from 110 to 209 (a situation I somewhat contributed to: at some point I even found it easier to keep track of databases to create properties for by simply importing their catalogues first ;-þ)

Discontinued databases

While there is no shortage of video game databases accessible out there on the Internet, there are many inaccessible ones: as I said before, « databases disappear all the time, leaving only behind the smoke of a “PHP version not supported”, “502 bad gateway” or “this domain is for sale” ». However, their usefulness and relevance did not end with their demise − in some cases, such a database might be the only one covering a particularly niche topic (platform or country).

I thus took an interest in creating Mix’n’match catalogues for a couple of discontinued databases − as long as they were reasonably well indexed in the Internet Archive’s Wayback Machine: AustrianGames (Austria), (NES), Commodore Gamebase (Commodore PET and VIC), AmigaMemo and EAGER (Amiga). As is, this already serves a purpose: identifying potential gaps in Wikidata’s coverage. But the next step could be to have Wikidata properties for them − making the linkage readily available, and effectively (re)creating a search index for the late database (the Wayback Machine has many amazing features, but easy browsing is not exactly one of them).

I have not made such a proposal yet: while it’s accepted practice to keep properties around after a website disappears, I’m not sure how the community would feel about creating properties for effectively dead websites.

Bulk imports

As far as I know, this was a slower year for video-game-related bulk imports, with only the ones done by Connor: in June, the addition of vglist video game ID (P8351) to 33.4K items; and in December, the addition of many MobyGames game ID (P1933), HowLongToBeat ID (P2816), and Internet Game Database game ID (P5794) statements based on items’ PCGamingWiki ID (P6337).

Some other things in no particular order

I work at the Royal Danish Library documenting and archiving Danish video games, and am currently in the middle of reforming our metadata records for each registered game. This would be a good time to start adding missing identifiers to Wikidata entries and our own records.

This might be the GLAM cooperation I have long dreamed of 🙂

Shiny SPARQL queries

Wikidata truly shines when it’s able to answer questions that would otherwise be very hard to answer. The challenge is less about writing the SPARQL (as one can often find someone to do it for you) and more about having a really interesting question to ask − and while our showcase queries page grew over the years, I never quite came up with a video-game related question that could not be just as well (or better) answered from other sources.

Until I saw this tweet from Ryan Hamann, musing “What video game series have the longest period of time between one game and its immediate sequel?” With help from [[User:VIGNERON]] (as often), I put together this query, which I am quite proud of: the output roughly matched expert knowledge, but also surfaced results one would not have thought of (and Wikidata being a work in progress, along the way we also found games released after their sequels ;-þ)
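For the curious, the gist of such a query can be sketched as follows. This is an assumption-laden simplification, not the query linked above: it takes direct sequel links via followed by (P156) and naively compares publication dates, where the real query is more careful about series membership and items with multiple release dates.

```sparql
# Games and their direct sequels, ordered by the gap between
# their publication dates (the Query Service returns date
# differences in days).
SELECT ?game ?gameLabel ?sequel ?sequelLabel ?gap WHERE {
  ?game wdt:P31 wd:Q7889 ;
        wdt:P156 ?sequel ;             # followed by
        wdt:P577 ?gameDate .
  ?sequel wdt:P577 ?sequelDate .
  BIND(?sequelDate - ?gameDate AS ?gap)
  FILTER(?gap > 0)                     # games released after their sequel exist!
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?gap)
LIMIT 20
```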

To a lesser extent, I’m also pleased with the idea of “Video games with the most time elapsed between their announcement and their release”, but the data we have is way too incomplete to be of value yet.

Outreach and bridge building

I did not do nearly as much outreach to other communities/organisations as I should have (besides saying hello in a couple of Discord channels); however, I have one story I’d like to retell.

Back in October, [[User:Trade]] relayed to the project the news that Sony was about to revamp the PlayStation Store, removing entries for thousands of older games − which would break many citations and external ID statements, and take away what is likely the only existing (citable) source for some data.

There was little I could directly do about it, having neither the expertise nor the bandwidth. So I reached out to the ArchiveTeam IRC and the Videogame Preservation Collective Discord − relaying information back and forth and eventually getting people in the same virtual room, as well as advocating for the Wiki perspective (most folks cared deeply about backing up the data, but not so much about the cite-able HTML pages − which is fair enough). The fine folks there have made great strides in getting at least part of the content saved. I have no claim to that success; but it’s been heartening to help bring these communities together, and watch the collaborative effort take off.

The road ahead

And that’s a wrap! For more information, the interested reader may refer to this news roundup / article compilation.

It’s becoming a bit of a running gag, but no, we still did not get around to advancing the data-model significantly. No progress on whether we want to implement a more sophisticated data model for works/releases (although we had more modeling discussions that, in my view, showed how this is inevitable); nor did we expand our grammar and vocabulary to model things like art styles, perspective/viewpoint, narrative style or gameplay features. This is still on the bucket list 🙂

Thanks to Nicolas for proofreading this article.

Creative Commons Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.
