Wikidata and the sum of all video games − 2018 edition

Any resemblance to any actual, highly successful Wikidata project is entirely coincidental.

Over the past 12 months, I have focused my Wikidata work on the topic of video games − the very same topic which got me started on Wikipedia more than 10 years ago.

I will start with a short overview of the current status of the topic on Wikidata, and some of my own contributions during the last year. I will then describe some of the challenges I came across, and outline my plan for 2019.

Overview

There are currently 35K video game (Q7889) items on Wikidata. Meanwhile, there are close to 48K records in Media Art Database (Q54760023), 49K in OGDB (Q60315954), over 66K in Giant Bomb (Q1657282), over 186K in MobyGames (Q612975), and over 190K in IGDB (Q20056333). These numbers can be misleading, since records in these databases may not map to a Q7889 Wikidata item; yet they underline how far we still have to go in terms of coverage.

Using some SPARQL, we can also look at how well described these 35K items are: 9K have no platform (P400) information; 9.5K no publication date (P577); 13K no genre (P136); 16K no publisher (P123); 17K no developer (P178).
And even closer to my heart: 14K bear no external identifiers whatsoever (and 16K no video game-related identifiers).
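
The kind of query behind these counts can be sketched as follows. This is a minimal illustration, not necessarily the query used for the figures above: it assumes games are selected with a direct instance of (P31) claim pointing at Q7889, and the function names are made up for this example. The endpoint URL is the public Wikidata Query Service.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_missing_property_query(property_id: str) -> str:
    """Build a SPARQL query counting video game items with no claim for property_id.

    Assumption: games are matched by a direct P31 (instance of) claim on Q7889;
    subclasses of video game are not followed.
    """
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a property ID: {property_id!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        "  ?item wdt:P31 wd:Q7889 .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{property_id} [] }}\n"
        "}"
    )


def run_count(query: str) -> int:
    """Send a counting query to the endpoint and return the number (needs network)."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # The User-Agent string is a placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "video-game-coverage-sketch/0.1"})
    with urlopen(request) as response:
        data = json.load(response)
    return int(data["results"]["bindings"][0]["count"]["value"])
```

Calling `run_count(build_missing_property_query("P400"))` would give the number of items without platform information at the time of the call; swapping in P577, P136, P123 or P178 covers the other properties listed above.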

All that to say: there is a lot to do 🙂

Some contributions in 2018

Linking to external databases

My current endgame is to cross-link Wikidata with as many external databases about video games as possible.

For starters, some numbers on my personal 2018 achievements in that area:

Now, why am I doing this?

  • Leveraging existing databases is, in my opinion, our best shot at achieving completeness. This was already the case back in 2007 on Wikipedia, using them to build long lists (and sometimes fully blue-ing them ;-)). These catalogs may not be enough for actual sourcing, but they are definitely good enough to know something exists (and to look for other sources). While huge databases (MobyGames, GiantBomb, IGDB…) are handy, I strongly believe smaller, highly specialized databases are crucial.
  • It has been said that Wikidata is becoming the universal glue of the Internet. A few of these databases are linked to one another, but that’s the exception. Wikidata can become the hub linking to all of them, reconciling heterogeneous data models in the process.
  • Some of the data cannot be hosted on Wikidata, or does not belong there. We will not be hosting copyrighted game covers or screenshots any time soon; nor will we store, say, average completion time. By linking to specialized databases that do host such data, we make it possible to use it in other ways.
  • While slurping data from these databases is not appropriate, linking to them will open the door to automated sourcing and consistency checking.
  • There is a deadline. I only started compiling this todo-list a year ago, and already some of these databases are gone (pcepc.com, N-sider). The Internet Archive may save the individual record pages, but not the discovery mechanism (typically, there is no static list of all records, but only dynamic database search).

Community and project management

With the help of SPARQL-guru @WikidataFacts, I built a Listeria-based dashboard to keep track of new video game items. I routinely go through it to ensure they have the basic properties and some external identifiers.

At the request of my video game colleague FR, I made a red-links Listeria list of video games that have articles on many Wikipedias, but not in French. This is a classic yet powerful way for Wikipedians to use Wikidata for their work, and it is now being used to create high-priority articles on French-language Wikipedia.
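
For readers curious about what drives such a red-links list, here is a hedged sketch of a query builder. It is not the actual Listeria configuration: the function name, the sitelink threshold and the result limit are assumptions made for illustration, and only simple alphabetic language codes are accepted.

```python
def build_missing_article_query(language: str = "fr",
                                min_sitelinks: int = 10,
                                limit: int = 100) -> str:
    """Build a SPARQL query listing video games with many sitelinks
    but no article on the given language's Wikipedia.

    Assumptions: games are matched by a direct P31 claim on Q7889, and
    "many Wikipedias" is approximated by a minimum sitelink count.
    """
    if not language.isalpha():
        raise ValueError(f"unexpected language code: {language!r}")
    return (
        "SELECT ?item ?itemLabel ?sitelinks WHERE {\n"
        "  ?item wdt:P31 wd:Q7889 ;\n"
        "        wikibase:sitelinks ?sitelinks .\n"
        f"  FILTER(?sitelinks >= {int(min_sitelinks)})\n"
        "  FILTER NOT EXISTS {\n"
        "    ?article schema:about ?item ;\n"
        f"             schema:isPartOf <https://{language}.wikipedia.org/> .\n"
        "  }\n"
        "  SERVICE wikibase:label { bd:serviceParam wikibase:language "
        f'"{language},en" . }}\n'
        "}\n"
        "ORDER BY DESC(?sitelinks)\n"
        f"LIMIT {int(limit)}"
    )
```

The resulting string can be pasted into the Wikidata Query Service; sorting by sitelink count puts the most widely covered games first.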

Platforms and hardware

While working on the games, I also ventured into platforms and hardware. Inspired by diggr’s platform_mapping tool, I used Mix’n’match to get 5 external databases aligned with Wikidata. Trying to clarify the ontology, I created the items for both video game console model (Q56682555) and computer model (Q55990535), and reorganized many platforms using subclass of (P279). I also worked on game controllers, on both Wikidata and Wikimedia Commons.

Contacts

In 2018, I made interesting contacts with external parties interested in Wikidata, both in and out of the academic world: the folks from the diggr project (Hi Tracy!), the people of IGDB, and recently folks from Stanford University Libraries − although I unfortunately lacked time to fully follow up on them.

Things that are missing

After a year on the topic, I have encountered many challenges − I will outline a couple here.

We are missing essential vocabulary to describe things:

However, all issues are dwarfed by one: the data model for video games. We sort of inherited the current one (or lack thereof) from Wikipedia, where one article may compile knowledge on various elements (ports, remakes…). In the same way that books settled on using FRBR (Q16388), we should be using a more sophisticated data model, informed by current academic research. The paper A conceptual model for video games and interactive media (Q50180436) is a helpful read, differentiating games, editions and local releases.

Interested readers can refer to more of my own thoughts, as well as Tracy’s thoughts on the topic.

My 2019 roadmap

This is the time for overly ambitious goals for the year to come! What shall I do in 2019?

I will strive to automate some typical identifier management operations. This will include:

  • annotating identifier claims (using qualifiers). Likely candidates include adding the platform to claims for GameFAQs/GameRankings, Metacritic, the Amiga databases, Guardiana…
  • harvesting from Wikipedia. I had good success importing from identifier templates (using the handy Harvest Templates); the next step would be to harvest URLs from Wikipedia articles (probably semi-automatically, as they are used on most articles) − this worked well for old-computers.com IDs. Metacritic would be a particularly good candidate for this, with over 15K game URLs on English Wikipedia.
  • importing identifiers from other databases. Some databases link to others, and we could leverage this to import IDs − either directly (Metacritic links to GameFAQs, GameFAQs links to Gamespot…) or in reverse (IGDB links to e.g. Steam and GOG, so we can populate IGDB IDs based on the Steam and GOG IDs).
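
The "in reverse" import from the last bullet boils down to a join on a shared identifier. The sketch below is one possible way to do it, under stated assumptions: the function name and the input shape (pairs of IDs, already exported from Wikidata and from the external database) are hypothetical, and anything that is not a clean one-to-one match is set aside for manual review rather than imported.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def propose_ids_via_shared_key(
    wikidata_pairs: Iterable[Tuple[str, str]],
    external_pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Propose external IDs for Wikidata items through a shared key.

    wikidata_pairs: (QID, shared key) pairs, e.g. item and its Steam ID.
    external_pairs: (external ID, shared key) pairs, e.g. IGDB ID and Steam ID.
    Returns (proposals, review): proposals maps QID to external ID for
    unambiguous matches; review lists shared keys and QIDs needing a human.
    """
    qids_by_key = defaultdict(set)
    for qid, key in wikidata_pairs:
        qids_by_key[key].add(qid)

    external_by_key = defaultdict(set)
    for external_id, key in external_pairs:
        external_by_key[key].add(external_id)

    candidates = defaultdict(set)  # QID -> external IDs reached via any key
    review = set()
    for key in qids_by_key.keys() & external_by_key.keys():
        qids, external_ids = qids_by_key[key], external_by_key[key]
        if len(qids) == 1 and len(external_ids) == 1:
            candidates[next(iter(qids))].add(next(iter(external_ids)))
        else:
            # Several items or several records share this key: do not guess.
            review.add(key)

    proposals = {}
    for qid, external_ids in candidates.items():
        if len(external_ids) == 1:
            proposals[qid] = next(iter(external_ids))
        else:
            # The same item reached different records through different keys.
            review.add(qid)
    return proposals, sorted(review)
```

Keeping ambiguous cases out of the proposals is a deliberate choice here: a wrong identifier on an item is harder to spot later than a missing one.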

My 2018 side-project was about platforms and hardware. In 2019, I may venture into the topic of sourcing and bibliographic metadata: for example, creating items for all issues of major video game magazines, and indexing the reviews that way.

I will try to reach out to Wikipedians and involve more of them in Wikidata business. There are not that many of us working on the topic on Wikidata, and we could use all the extra help we can get. Also, the decisions made on Wikidata (e.g., the data model) will have an impact on how helpful the data is for Wikipedia.

I mentioned earlier the contacts I made over the year. I will cultivate these relationships further, hoping to involve more organisations and institutions, establish networks of cooperation and enable large-scale data donation/editing.

Finally, I will drive the conversation around the video game data model, involving the various stakeholders (Wikipedians, academic researchers, database maintainers) in the discussions.

Come on, it will be fun!


This work is licensed under a Creative Commons Attribution 4.0 International License.
