Every game we create at Filament Games is specifically tailored to meet pre-determined learning objectives. With this commitment come unique challenges in measuring the learning that occurs during gameplay. Prior to releasing games, we conduct playtests with members of our target audience and sometimes commission independent case studies that include pre- and post-tests; case studies for Reach for the Sun and Molecubes are available online. From a technical perspective, we have adopted several strategies for establishing and recording learning metrics. I am going to outline five of them.
1. Elegant Design – Completion Assessment
Recording game completion or progress is the simplest form of assessment. Elegant Design implies that the mechanics of the game are so tightly coupled to the learning objective that completing the experience demonstrates knowledge of the subject matter. For example, in Prisoner of Echo players learn about sound by manipulating sound waves and seeing how they affect the game environment. You cannot progress through the game without demonstrating your knowledge of sound.
Elegant Design is something we strive for in every game; however, the simplistic “checkmark” grade for completion works best for short games with one learning objective and one mechanic (for the uninitiated, a mechanic is a type of interaction in the game). This strategy suits games built around a single big concept, like “understanding what affects global warming” or “identifying the components of the national debt.” It is not beneficial if you have several discrete learning objectives, because players who never complete the in-game checkpoint give you no way to assess what they have learned.
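In code, this kind of assessment can be as small as a set of checkpoint flags. Here is a minimal sketch in Python; the checkpoint names and the CompletionTracker class are illustrative, not taken from an actual Filament project:

```python
from dataclasses import dataclass, field

@dataclass
class CompletionTracker:
    """Tracks which checkpoints a player has completed."""
    checkpoints: list[str]                            # ordered checkpoint ids
    completed: set[str] = field(default_factory=set)

    def complete(self, checkpoint: str) -> None:
        if checkpoint in self.checkpoints:
            self.completed.add(checkpoint)

    def progress(self) -> float:
        """Fraction of checkpoints completed, 0.0 to 1.0."""
        return len(self.completed) / len(self.checkpoints)

    def passed(self) -> bool:
        """The 'checkmark' grade: the player finished every checkpoint."""
        return self.completed == set(self.checkpoints)

# Hypothetical checkpoints for a single-objective game:
tracker = CompletionTracker(["tutorial", "wave_basics", "final_puzzle"])
tracker.complete("tutorial")
tracker.complete("wave_basics")
print(tracker.progress())  # 0.666... (two of three checkpoints)
print(tracker.passed())    # False until final_puzzle is completed
```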
2. Adaptive Difficulty – Difficulty Assessment
Some of the games we make include not only learning but also extended practice. When a learning objective involves many problems with different degrees of difficulty, we can build the game to adapt to each user. In these adaptive systems, every student learns at their own pace. Because the game adjusts the level of difficulty based on the user, the obvious metric to gather is the degree of difficulty the student reached. Measuring the efficacy of the game is straightforward: you can directly evaluate the change in difficulty over time. These systems can be applied in many subject areas, with math being the most common.
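One common way to implement this is a staircase rule: step the difficulty up after a run of correct answers, step it down after a miss, and log the highest level reached. A sketch of that idea, with the streak threshold chosen arbitrarily:

```python
class AdaptiveDifficulty:
    """Simple up/down staircase; the thresholds here are arbitrary choices."""
    def __init__(self, level: int = 1, streak_to_advance: int = 3):
        self.level = level
        self.streak_to_advance = streak_to_advance
        self.streak = 0
        self.peak = level                # the metric we ultimately report

    def record_answer(self, correct: bool) -> None:
        if correct:
            self.streak += 1
            if self.streak >= self.streak_to_advance:
                self.level += 1          # harder problems after a hot streak
                self.streak = 0
                self.peak = max(self.peak, self.level)
        else:
            self.level = max(1, self.level - 1)  # ease off after a miss
            self.streak = 0

# Efficacy is the change in difficulty over time:
session = AdaptiveDifficulty()
for correct in [True, True, True, True, False, True, True, True]:
    session.record_answer(correct)
print(session.level, session.peak)  # current level and highest level reached
```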
3. Multi-part Problems – Scaffolding Assessment
Games have an amazing ability to offer exactly the help you need, right when you need it. Well-scaffolded games provide numerous opportunities for in-game help. Sometimes this is as subtle as audiovisual cues, while other times it is as obvious as giant animated glowing arrows. This kind of help can be indirect, or the game can directly take control and advance the player through part of a problem. Instead of measuring right and wrong answers, we can record how much scaffolding a player required.
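One way to record this is a per-problem hint log that escalates from subtle cues to direct intervention, where the amount of help consumed becomes the score. A hedged sketch; the tier names and weighting scheme are invented for illustration:

```python
from collections import defaultdict

# Escalating help tiers; these names are illustrative, not a real taxonomy.
TIERS = ["audiovisual_cue", "glowing_arrow", "direct_takeover"]

scaffolding_used: dict[str, list[str]] = defaultdict(list)

def give_help(problem_id: str, tier: str) -> None:
    """Log each piece of help the game delivers for a problem."""
    scaffolding_used[problem_id].append(tier)

def scaffolding_score(problem_id: str) -> int:
    """Weight heavier intervention more: tier index + 1 per use."""
    return sum(TIERS.index(t) + 1 for t in scaffolding_used[problem_id])

give_help("fractions_03", "audiovisual_cue")
give_help("fractions_03", "glowing_arrow")
print(scaffolding_score("fractions_03"))  # 3: one tier-1 hint plus one tier-2 hint
```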
Whatever composes your game (levels, enemies, puzzles, and so on), very few players will experience every bit of scaffolding or every difficulty setting. These games have “low content density”: each practice session will likely touch only a small percentage of the content, compared to a linear game where every player experiences 100% of the content developed.
4. Big Data – External Assessment
At their heart, games are a series of interesting decisions. As programmers, we can record every decision the player makes. Sometimes it is possible to measure every kind of decision against a scoring rubric and forward that information to a server. Typically these games are meant to be played several times in pursuit of incremental improvement. We have completed many projects using xAPI (the Experience API), an open specification for recording learning experiences as standardized statements. With access to all the data, third parties can draw their own conclusions or build their own dashboards.
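An xAPI statement is essentially an actor-verb-object record posted to a Learning Record Store (LRS). Here is a sketch of what one might look like, using the standard ADL “completed” verb; the learner, activity id, endpoint, and credentials are all placeholders, not real services:

```python
import requests  # third-party HTTP library

# A minimal xAPI statement: actor-verb-object, plus an optional result.
statement = {
    "actor": {"mbox": "mailto:learner@example.com", "name": "Sample Learner"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "id": "http://example.com/activities/prisoner-of-echo/level-2",
        "definition": {"name": {"en-US": "Level 2"}},
    },
    "result": {"success": True, "score": {"scaled": 0.85}},
}

response = requests.post(
    "https://lrs.example.com/xapi/statements",      # hypothetical LRS endpoint
    json=statement,
    headers={"X-Experience-API-Version": "1.0.3"},  # required by the xAPI spec
    auth=("lrs_user", "lrs_password"),              # placeholder credentials
)
response.raise_for_status()
```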
5. Emergent Gameplay – Behavioral Assessment
If big data tracks “what they did,” this assessment strives to capture “how they did it,” particularly where interactions can be combined and the order of actions matters. Not every game can do this: Behavioral Assessment requires a sandbox game where the problem space is large and players can perform multiple actions. Examples include toys like LEGO or the game Minecraft, but it could also be applied to learning woodworking, cooking, or how to perform surgery.
Learning occurs in the gamer’s head. Consider the extreme scenarios: if someone plays a game perfectly, I can’t prove the game taught them anything; they likely already knew the material. At the opposite extreme, someone could fail miserably but still understand the material discussed in the game. Through the act of failing, their preconceived ideas were challenged, and at the very least they learned what didn’t work. However, until they act on what they learned, that change cannot be measured. We can only measure actions, not thoughts.
With this more complicated assessment, we look for a change in behavior when players are presented with similar situations, and we compare that behavior against an ideal. In some instances this means tracking a player’s sequence of inputs and comparing the current inputs to previous patterns. Trying anything new at least tells us that players identified a mistake they made or believe a process could be improved. This kind of pattern evaluation and diagnosis can be found in other fields of computer science, like robotics and machine learning.
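The simplest version of this is measuring how similar a player’s action sequence is to an ideal sequence, and how much it changes between attempts. A sketch using Python’s standard library diff matcher; the woodworking action names are invented for illustration:

```python
from difflib import SequenceMatcher

def similarity(actions: list[str], reference: list[str]) -> float:
    """0.0-1.0 similarity between two action sequences (stdlib diff ratio)."""
    return SequenceMatcher(None, actions, reference).ratio()

# Hypothetical action logs from two attempts at the same sandbox problem.
ideal     = ["measure", "mark", "cut", "sand", "glue", "clamp"]
attempt_1 = ["cut", "measure", "cut", "glue"]
attempt_2 = ["measure", "mark", "cut", "glue", "clamp"]

print(similarity(attempt_1, ideal))      # low: wrong order, missing steps
print(similarity(attempt_2, ideal))      # higher: closer to the ideal process
print(similarity(attempt_1, attempt_2))  # change between attempts = new behavior
```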
These techniques are only a few examples of how learning can be assessed through gameplay. Whether games succeed in the classroom depends on a variety of factors; research indicates that they work best when used in conjunction with traditional classroom teaching methods. In-game evaluation is not the only form of assessment. At Filament Games we are just as serious about the quality of our games as we are about their educational outcomes – and we can prove it.