Wikidata and the sum of all video games − 2022 edition

It’s that time of the year! After the 2018, 2019, 2020 and 2021 recaps, let’s cover what happened in 2022 with Wikidata’s WikiProject Video games. If you are not familiar with that endeavor, I will refer you to the mushroom-rambling blog-post I wrote in September 2019.

Overview

Mid-year, we passed the major milestone of 50,000 video game (Q7889) items. As of February 1st 2023, we stand at 55.5K − a whopping 22.5% growth (10.2K items) over the year.

As always, let’s look at how well these items are described (using, as always, integraality dashboards): 6.6K have no platform (P400) and 10.6K have no publication date (P577). While these numbers are higher than last year in absolute terms, the proportions are better: 14% → 12% and 23% → 19%. 23.4K (47%) have no country of origin (P495), which is stable. In contrast, 19K have no genre (P136), which is a not-so-good trend (27% → 34%).

Regarding external identifiers: only 570 items do not have any (1%, down from 1.8%). We know by now that this number is a bit meaningless − and so is the 1.32K items excluding vglist video game ID (P8351) (2.4%, down from 2.6%).

The number I will be tracking from now on is the count of items without any identifier property maintained by WikiProject (P6104): WikiProject Video games (Q8485882) − which is 6003 (10.8%) at time of writing. Compared to last year’s roughly comparable 15% (same idea, but slightly different methodology), this is a good trend.
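To make that metric concrete, here is a minimal Python sketch of the counting logic. It runs on a tiny in-memory sample rather than on Wikidata itself: the item labels, the sample property sets and the function name `count_without_maintained_id` are all made up for illustration, and the set of maintained properties is only an assumed subset.

```python
from typing import Dict, Set

# Illustrative subset of identifier properties ASSUMED to be tagged with
# "maintained by WikiProject (P6104): WikiProject Video games (Q8485882)".
MAINTAINED_IDS: Set[str] = {"P5794", "P1933", "P7597"}


def count_without_maintained_id(items: Dict[str, Set[str]], maintained: Set[str]) -> int:
    """Count items that carry none of the maintained identifier properties."""
    return sum(1 for props in items.values() if not (props & maintained))


# Hypothetical sample: item label -> properties present on the item.
sample = {
    "game A": {"P400", "P577", "P5794"},  # has an IGDB game ID
    "game B": {"P400", "P136"},           # platform and genre, but no identifier
    "game C": {"P1933", "P7597"},         # MobyGames and Lutris IDs
    "game D": set(),                      # no statements at all
}

missing = count_without_maintained_id(sample, MAINTAINED_IDS)
share = 100 * missing / len(sample)
print(f"{missing} of {len(sample)} items ({share:.1f}%) lack a maintained identifier")
```

The real figure comes from the live data, where the list of maintained properties is itself read from Wikidata instead of being hard-coded.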

External identifiers

We have now reached 356 video-game related external identifiers (compared to 274 last year).

Again, the additions cover various languages: English of course, but also Japanese (Tagoo video game ID (P10368), Refuge.tokyo video game ID (P10424)), French (JeuxActu ID (P10455)), many Russian (LKI ID (P10309) or Cybersport.ru ID (P10501)), quite a few German (ntower ID (P11340) or Kultboy video game ID (P10850)), Italian (Adventure’s Planet ID (P11361)), Spanish (amstrad.es ID (P11426)) − and some languages that were so far barely represented (or not at all): Chinese (A9VG game ID (P10371), TGbus ID (P10996)) and Korean (Naver game lounge ID (P11058)).

These new identifiers specialize in various ways:

platforms, from the old-school (worldofsam ID (P11338) covering Sam Coupé games) through the classic (Game Boy hardware database ID (P11376) & GBDB ID (P11359) for Game-Boy) to the recent (Nintendo Switch title ID (P11072)) ;
genres (RPG Site game ID (P11418) for RPGs or ifwizz ID (P10841) for interactive fictions) ;
practices (Pocket Gamer ID (P11411) for mobile/handheld play, Esports Earnings game ID (P10802) for e-sports) ;
technology (Viveport ID (P11117) & Oculus Store ID (P11088)) ;
specific facets, such as accessibility with Can I Play That? Games Codex game ID (P11339), or « evil practices » with microtransaction.zone ID (P11400) or Dark Patterns Games ID (P11425)

That’s for games; but we also have new identifiers covering other entity types:

companies (VideoGameGeek developer ID (P10511), Play:Right company ID (P11321)…)
series/franchises (Play:Right series ID (P11320), VideoGameGeek series/franchise ID (P11459), Glitchwave franchise ID (P11537))
game engines (IGDB game engine ID (P11046) & Mod DB engine ID (P11132))
magazines/sources (Kultboy magazine ID (P10853), OpenCritic outlet ID (P11223), UVL source ID (P11427))
controllers (Kultboy controller ID (P10852))
characters (Glitchwave character ID (P11541))

In terms of origin, we have the usual mix of fan databases, commercial/news websites and online stores ; but also one institutional database with International Computer Game Collection work ID (P11295).

Mix’n’match catalogues, which we use to align external databases with Wikidata, boomed once again, going from 235 to 305 − so much so that I split the collection in 6: companies (20), genres (10), platforms (23), series (9), sources (6) and the default/misc/games (236). If the Mix’n’match categories are anything to go by, then video games are by far the most represented domain on the tool.

Overview

Looking at which identifiers are used the most, the situation has changed since June: MobyGames game ID (P1933), present on 48.6% of our Q7889 items, has been dethroned by IGDB game ID (P5794), which now stands at the top with 58.6%. Lutris game ID (P7597) joins the podium with a whopping 45%. Although only created at the end of 2021, RAWG game ID (P9968) climbs to 6th place with 29%. (This progress can be attributed in large part to some automation, which will be discussed later.)

Discontinued databases

I continued my interest in discontinued databases, creating Mix’n’match catalogues for a couple of them − as long as they were reasonably well indexed in the Internet Archive’s Wayback Machine: HChistory.de, Personal Computer Museum magazines, LGDB, CoCo Site, CPC-Zone. I still have not taken the step of proposing properties for these − perhaps when they reach decent matching coverage.

On a sad note, in August the Japan PlayStation Software Database ID (P9636), which covers all games released in Japan on PlayStation systems, vanished from Sony’s website. I had never found a good way to index it in Mix’n’match, so our coverage is pretty low ; and I have since discovered that the Wayback Machine only had a partial snapshot of it (I noticed several pages not archived). I had (unfinished) plans to turn the Wayback Machine dump into a Mix’n’match catalogue. I should get to it.

Content rating databases

I feel a breakthrough was made with content rating systems and their databases.

First, the American Entertainment Software Rating Board (ESRB): ESRB game ID (P8303) has been around since 2020, but Nicereddy created a Mix’n’match catalogue for it in August − since then, its usage went from 789 to 7300. NicereddyBot then comes along to add the ESRB rating (P852) (example).

Second, the German Unterhaltungssoftware Selbstkontrolle (USK): I finally figured out the resolvable IDs of its database, and thus was born USK ID (P11063). Kirilloparma compiled a Mix’n’match catalogue, which already has close to 700 matches.

Technical support

Automation, automation, automation

In previous year-in-reviews, I have often showcased bulk data imports − QuickStatements batches or bot runs that populated a bunch of data points (often identifiers) in one go. There have been some of these again, some of which will be listed on the project activity log.

But I feel like a shift was made this year from one-off imports to sustained data-enrichment:

qualifier annotation: we want many identifier properties to be qualified − often for disambiguation purposes, typically with platform (P400) − this is now done automatically for GameTDB ID (P8087), Microsoft Store product ID (P5885), Nintendo eShop ID (P8084), UVL ID (P7555). IGDB ID (P5794) also gets the matching IGDB numeric game ID (P9043) added
identifiers addition: database A links to database B, Wikidata links to A, so we can figure out a link to B ; or Wikidata links to B, and we can also figure out the link to A. The Steam ID (P1733) has proven a major hub here, enabling linking to ten other IDs.
data-enrichment, for example
- country of origin (P495) from OGDB (P7564) (example)
- software engine (P408) based on Mod DB (P6774) (example)
- language of work or name (P407) based on Steam (example)
- OpenCritic’s review score (P444) based on OpenCritic ID (P2864) (example) (the number of OpenCritic scores climbed from 17 to 3556, by now the most represented review score source!)

(These are only examples, WikiProject users compiled a more comprehensive list ; I started to map them in a diagram but I gave up for now, faced with the complex web it drew ^_^)
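The “identifiers addition” idea above (Wikidata links to A, A links to B, so Wikidata can link to B) can be sketched in a few lines of Python. This is only an illustration of the principle, not the code of any of the bots mentioned here: the function name `derive_links`, the item labels and every identifier value are hypothetical.

```python
from typing import Dict


def derive_links(
    wikidata_to_a: Dict[str, str],
    a_to_b: Dict[str, str],
    existing_b: Dict[str, str],
) -> Dict[str, str]:
    """Propose B identifiers for items that link to A but not yet to B."""
    proposals: Dict[str, str] = {}
    for item, a_id in wikidata_to_a.items():
        if item in existing_b:
            continue  # never overwrite an identifier that is already present
        b_id = a_to_b.get(a_id)
        if b_id is not None:
            proposals[item] = b_id
    return proposals


# Hypothetical data: database A plays the role of the hub (as Steam does).
wikidata_to_steam = {"item-1": "1001", "item-2": "1002", "item-3": "1003"}
steam_to_other_db = {"1001": "alpha-game", "1003": "gamma-game"}
already_linked = {"item-3": "gamma-game"}

proposals = derive_links(wikidata_to_steam, steam_to_other_db, already_linked)
print(proposals)
```

Here only `item-1` gets a proposal: `item-2` has no counterpart in the second database, and `item-3` is skipped because it already carries the identifier. A real bot would presumably also need to handle conflicting or ambiguous matches, which this sketch ignores.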

This can lead to very elegant dances of bots and humans passing the ball to each other − see for example the edit history of The Last Hero of Nostalgaia (Q114772057) or Cat Cafe Manager (Q111602956).

Some of these were ideas (identifier annotation & addition) I was toying with 4 years ago already, in my very first Year-in-review blogpost − ideas I never followed up on for lack of time and skill. I am very happy to see others independently formulate similar ideas, and more importantly execute on them. A big big thank you to Facenapalm, Nicereddy and Josh404 here! The interested reader can learn more by browsing their programs on GitHub: Facenapalm’s WikidataBot, Nicereddy’s random-scripts repo, Josh404’s P444_Q21039459.py.

Also worthy of note is Facenapalm’s script to easily create items based on a Steam ID: created in September, it has been used by its author to create over 3300 items − and also piqued the interest of Nicereddy and Poslovitch, who created another 700 items (see this database query). (EDIT: the author corrected me that the tool had existed in an earlier form since March ; that accounts for another 2000 item creations)

UI-enhancement script

On Wikidata, we often establish relationships one-way: for example, we link expansion packs to the main game using expansion of (P8646), and not the other way around. That means that by default, on the StarCraft (Q165929) item page, you would not see any mention of Brood War (Q840409).

There are generic solutions for that, such as the RelatedItems gadget, but I wanted something tailored to our domain. ~~Jealous of~~ inspired by the ExMusica.js UI-enhancement script made by Nikki for WikiProject Music, I wrote ExLudo.js: a user-script that enhances the display of video-game related item pages:

on video game (Q7889):
- its video game expansion pack (Q209163) and video game mod (Q865493)
- the video game compilation (Q16070115) it’s on
- the remakes/remasters based on it.
on game engine (Q193564): the games using the engine
on video game series (Q7058673): the games of the series and their expansion packs/DLCs
on media franchise (Q196600): the works belonging to the franchise
on video game award (Q18328126): the winners of the award

I’m pretty proud of it, even though all the work really had been done by Nikki and I was merely tweaking it here and there. I see a lot of potential for WikiProjects developing their own domain-specific UI-enhancers.
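The underlying idea of showing a one-way relationship from the other side can be illustrated with a small Python sketch. This is not how ExLudo.js works internally (it is a JavaScript user-script running on the item page); the function name `reverse_index` and the triple representation are assumptions made for the example, which reuses the StarCraft and Brood War items mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Statement = Tuple[str, str, str]  # (subject item, property, object item)


def reverse_index(statements: Iterable[Statement], prop: str) -> Dict[str, List[str]]:
    """Map each object of `prop` to the list of subjects pointing at it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for subject, p, obj in statements:
        if p == prop:
            index[obj].append(subject)
    return dict(index)


statements = [
    # Brood War (Q840409) is an expansion of (P8646) StarCraft (Q165929)
    ("Q840409", "P8646", "Q165929"),
    # An unrelated statement, assumed for illustration, that the filter ignores
    ("Q840409", "P31", "Q209163"),
]

expansions = reverse_index(statements, "P8646")
# Looking up StarCraft now surfaces its expansion pack
print(expansions.get("Q165929", []))
```

The statement is stored once, on the expansion pack, and the reverse view is computed at display time, which is why nothing needs to be added to the main game’s item.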

A screenshot of text added by a Wikidata user-script, on the item page for the video game Postal 2. There are three headings: "Expansions", "On the compilations" and "Mods". Under each heading is a bullet list with (respectively) each expansion, compilation and mod − with a hyperlink to that item page, and the publication date in parentheses. − *Postal 2 (Q1974968)*

A screenshot of text added by a Wikidata user-script, on the item page for the video game series Dark Souls. There is an "In this series" heading. Under the heading is a bullet list with each game in the series, ordered by publication date (displayed in parentheses). Under each list item is a sub-list with the expansion packs of each game. − *Dark Souls series*

Some other things I worked on

In April, Twitter user Catel69 published “the first version of his complete list of French adventure games since 1982” (on Google Sheets). The list is impressive for its exhaustiveness, and I strongly believe such extensive data truly shines in an open, connected database (like Wikidata) and not in a closed system like Google Docs. Folks like Catel should of course use the tools they prefer, and it’s up to us (me) to then bridge the result over to Wikidata. I thus loaded the Google Sheet into Mix’n’match to further our own coverage.

As part of the celebrations around Wikidata’s tenth birthday in October 2022, Wikimedia Austria organized the “DACH Culture Contest” to add and improve data about culture in Austria, Germany and Switzerland. I modestly contributed a few thousand edits on the topic of DACH video games, improving the coverage nicely.

Outreach and external interest

In June, I was invited by the German Literature Archive Marbach (Q1205813) to moderate a panel about video game metadata at their workshop “Games: Collecting, archiving, accessibility”. The speakers were Malina Riedl and Winfried Bergmeyer from the Stiftung Digitale Spielekultur (Q76632568) and Tracy Arndt and Tobias Steinke from the Deutsche Nationalbibliothek (Q27302) (I had met Winfried at a workshop in 2020, and collaborated with Tracy many times in the last years). It was my first time moderating a panel, and I hope to do a better job next time :-þ, but I am happy to see Wikidata be part of such institutional and academic discussions.

In August, the paper A practice of cataloging based on community-generated data as authorities: A case of a video game catalog (Q116918759) by Kazufumi Fukuda was published. My Japanese is non-existent, so I cannot really process what it says, but it sure mentions Wikidata a lot :-þ. This appears to be a follow-up to the 2019 Using Wikidata as Work Authority for Video Games (Q70467546) by the same author, which I mentioned a few years back.

The Pixelvetica project

In April, I was interviewed by Magalie Vetter from Pixelvetica (Q116739051), a pilot project on video game preservation in Switzerland. This meeting came out of first contacts made back in 2021 with the Lausanne-based Gamelab UNIL-EPFL. The project report, “Sauvegarder le jeu vidéo suisse: État des lieux de la préservation du jeu vidéo en Suisse et dans le monde” (Q116770055), was published at the end of December.

The document is dense and exhaustive. It draws on in-depth interviews to establish the state of the art and current challenges of video game preservation and documentation ; presents an overview of the place of video games in Swiss cultural institutions based on a wide survey ; and concludes with recommendations (to policy makers, institutions, creators…) to develop video game preservation in Switzerland.

Close to my interests, chapter 2.1.2 is dedicated to metadata and description of games − both as artefact and as creative work (here called “panorama”). That section singles out Wikidata as “an interesting resource in which to invest”, emphasizing its openness, interoperability and durability. It points out how “linking one’s database to Wikidata allows to benefit from its multilingualism” (echoing what the ICS does) and to leverage “the research work already done elsewhere” ; while cautioning that this means having to “revamp the structure of one’s database” and to “take part in the life and discussions of the community”.

The appendices are also well worth a read. Appendix 2 is a deeper recount of the 10 interviews underpinning the report, with each time a section discussing the metadata model. Appendix 3 discusses community-driven preservation efforts, including metadata preservation. Appendix 5 is a deep dive through four metadata models for video games.

Finally, one of the final recommendations to archivists and librarians on archiving video games (section 3.4.2) reads as:
Regarding the description of [the document as creative work], we recommend that institutions pool their efforts via collective structures that enable to share the workload, either through participation in Wikidata or a common unified catalog at the national or international level.

I am delighted to see Wikidata mentioned in a report that reads like a Who’s Who of video game preservation − sharing the pages with organisations as established as MO5.com, institutions as prestigious as the French National Library, initiatives as hyped as the Embracer Games Archive, tools as ubiquitous as KryoFlux. I hope we can live up to it, and I certainly look forward to working together with Swiss institutions.

My other take-away from this story is that it’s good to make contacts, even if not much happens at first: like seeds thrown in the wind, it may take years for them to sprout − and bear fruit.

The road ahead

This is my fifth year-in-review, so I know better by now than to commit to leading any big data model developments − although I have a couple of ideas of course ;-þ.

But what I will aim to do is pen more of these ideas down on this blog. This year has shown me like no other the power of long-form writing:

not video game related; but my post on the technical volunteer support informed a job description at Wikimedia Sverige and a Wikimedia Foundation pilot project.
I wrote the 50K milestone post hoping to make some noise, and it worked out: it was noticed by editors of zh.wikipedia, who pointed me to several Chinese-language databases to integrate. It was a good post format to tweet out in bits − tagging databases started a nice discussion with folks from IGDB too!
my 2019 mushroom blogpost has proven evergreen, a helpful link to drop as an introduction ; and it was cited in the Pixelvetica report!

And if anything, I can try at least to write the next year in review before March 2024 :-þ

This work is licensed under a Creative Commons Attribution 4.0 International License.
