Saturday, November 17, 2007

Project Horseshoe 2007 slides: Smashing the game design atom

Here are my slides (with talking notes) from Project Horseshoe. I blazed through this in about 30 minutes since dinner was waiting and there is nothing more ornery than a crew of wild haired game designers in complete glucose crash. See if you can spot the source of the infamous '8mm' meme that stalked the conference.

Since it was Halloween yesterday, let's start with a tale of horror. Not so long ago there was an experienced team, working with a known platform, and a known engine. They had just scored a popular girl-friendly license valued at roughly $160 million. Their game had the potential to hit as big as Pokemon or Nintendogs.

The designer ignored all this. You see, he had always wanted to make a Zelda with the critical element that has always been missing from all Zelda games: Hardcore jumping puzzles. The designer thought, "Nintendo is smart, but how could they have missed such an obvious improvement?" Sure, the license was targeted at tween girls, but tweens like big swords, don't they? This was the game he personally had always wanted to play.

The team were contractually obligated to go along with the design. It had been greenlit. There were milestones tied to the completion of each voluminous chapter of the tome-like design script. If the team missed the milestones, the penalties were extreme. So they crunched in happy little silos of artists, level designers and programmers, all in accordance with a strict production schedule. It was the only possible way to get all the specified content done in time for the looming holiday ship date.

Finally, as the end drew near, they sent it off to testing. Early reports came back that the jumps in the first few levels were rather clumsy. The designer relied on his gut and sent forth an email containing a new set of parameters that were intended to polish the jump mechanics.

Eventually, a month later, someone got around to playing the last few levels. Uh oh. They relied heavily on laboriously constructed jumping puzzles tuned to the old jump mechanics. The last few levels of the game were massively broken.

It was too late to fix all the early levels so they entered into a death march to rework the last few levels. Lacking time or resources, they busted their budget hiring extra crew to take up the extra workload. The game was still delayed by several months.

Surprise, surprise, the end result wasn’t a very good game. It received miserable scores, but even worse, the core audience who would have bought it on the license alone was completely alienated by the design decisions. They wanted to befriend and grow cute little animals. They didn't want to die repeatedly while being attacked by spiky monsters while scrambling through military obstacle courses.

When the licensee pulled out of the sequel, the team collapsed. The human cost was immense. Homes were lost, families relocated, and many were so burnt out on game development that they left the industry permanently, their passion crushed.

There were a lot of problems in this tale, but the primary one was a blatant failure of the game design process at almost every level. The game designer really didn’t know what he was doing. He thought he was writing a movie script. He thought he was making a game for himself. He had no idea that the game systems he was dabbling in were deeply interconnected.

Most game design that occurs isn’t much better off. It is a combination of craft (such as cloning Zelda) and intuition (such as when he hoped that tweaking the jumping mechanics would fix all his problems). There is no science here. No predictability.

We have the ability to do so much better. We can create a science of game design.

If we want to modernize game design and move beyond the land of craft and intuition, we need to face and conquer four great challenges. These challenges will define game design for at least the next twenty years and perhaps longer.

  • Modeling our players: What do our players really want?
  • Generating new game systems: How do we create new mechanics that serve those player needs?
  • Metrics: How and what do we test to make sure we are doing the right thing?
  • Results: How do we get the results quickly and iterate on them so we can evolve the game quickly?
If we solve these issues and start spreading the resulting practices across the industry, horror stories like the one I just told will become mostly a thing of the past. The ultimate promise of a deep pragmatic science of game design is better games, happier teams and fewer human tragedies.

We are starting to see a smattering of theorists and teams with money to burn tackling these problems. They are creating systems that save their butts from expensive design mistakes. This is damned exciting.
  • You’ve got the Halo folks tracking heat maps of where players die. Valve has been relying on metrics for years. Nintendo builds early player tests and kleenex testers right into their dev process.
  • On the game systems side you’ve got Raph’s game grammar.
  • We are starting to rely on real data to model players' moods and reactions with Chris Bateman and Nicole Lazzaro’s work.
The work is still at the stage where most pragmatic folks think of these systems as the domain of eccentrics. Yet, each isolated advance reminds me of the turn of the century when physics was being cracked. Brilliant theorists. Great experiments. World changing results.

All these systems are being developed in parallel. You can measure things, but you don’t know what you are supposed to measure. You can write about game grammar, but it never is anything more than a loosely applied system of egghead analysis.

Maybe, just maybe, we can come up with a unified system that tries to answer multiple challenges simultaneously. The connections are all there. We just need to put them together.

In my Seattle laboratory, I've been working on one attempt. It mixes game grammar, player models and measurement systems into one delightfully unified game design process. I’ve got 10 minutes left to share it with you. Think I can do it?

I started with a player model. Let's assume for a moment that players are naturally inclined to wander about, sucking up tidbits of info in the hope of learning interesting new skills.

From the player model we can construct an atomic feedback loop that describes how they learn all the new skills. This basic atomic loop includes all the fundamental pieces of a game. We are taking the deconstructed analytic elements described in so many books and tying them back together into a functional system.

  • You’ve got your game system, that black box in the center of the loops.
  • You’ve got your player input.
  • You've got feedback to the player.
  • You have the player's cognitive model of the game.

We’ve reduced the scope to a single atom in order to make things manageable.
Press button, character jumps. That’s a skill atom.

Once we have a skill atom we can say interesting things about how the player interacts with it. First, skill atoms have states.
  • The player can figure out how to jump. They can exercise that skill by jumping on things. That is mastery. We can record this.
  • The player can fail to figure out how to jump. They never touch the button again. That’s early burnout.
  • They can get bored with jumping and never jump again. That is late burnout. We can measure this as well.
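
As a rough sketch of how these states might be recorded, the Python below classifies a single atom from a log of attempts. The names (AtomState, classify_atom), the extra "learning" and "untouched" states, and the idle_limit threshold are all hypothetical, chosen only for illustration.

```python
from enum import Enum


class AtomState(Enum):
    UNTOUCHED = "untouched"          # the player never tried the action
    LEARNING = "learning"            # still trying, no success yet
    MASTERED = "mastered"            # succeeded and still exercising the skill
    EARLY_BURNOUT = "early burnout"  # gave up before ever succeeding
    LATE_BURNOUT = "late burnout"    # succeeded, then stopped using the skill


def classify_atom(attempts, now, idle_limit=120.0):
    """Classify one skill atom from its attempt log.

    attempts: list of (timestamp_seconds, succeeded) tuples, oldest first.
    now: current session time in seconds.
    idle_limit: assumed cutoff; an atom left alone for longer than this
        many seconds counts as abandoned.
    """
    if not attempts:
        return AtomState.UNTOUCHED
    ever_succeeded = any(ok for _, ok in attempts)
    abandoned = (now - attempts[-1][0]) > idle_limit
    if abandoned:
        return AtomState.LATE_BURNOUT if ever_succeeded else AtomState.EARLY_BURNOUT
    return AtomState.MASTERED if ever_succeeded else AtomState.LEARNING


# A made-up log for the jump atom: one miss, then two successful jumps.
jump_log = [(1.0, False), (2.5, True), (4.0, True)]
still_jumping = classify_atom(jump_log, now=10.0)   # recent use: mastered
gave_up_later = classify_atom(jump_log, now=500.0)  # long idle: late burnout
```

A real game would probably want a per-atom idle limit, since some skills are naturally used less often than others.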

Skill atoms are chained together. You can visualize them as a directed graph. Later stage atoms depend heavily on early stage atoms.

Want to kill a Koopa? You need to jump on him. Better hope you mastered the jump skill. We can now represent that classic relationship created by Miyamoto ages ago in a visual model. The theory is slowly catching up with the experimentalists.
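
One way to picture the dependency is as a small directed graph in code. The sketch below is only an illustration: the atom names are a hypothetical fragment of a platformer's skill chain, and blocked_atoms is an invented helper that finds every later atom cut off when an earlier one burns out.

```python
# Each key is a skill atom; its value lists the atoms it depends on.
# These atoms are a hypothetical fragment, not a full game's chain.
SKILL_CHAIN = {
    "move": [],
    "jump": ["move"],
    "stomp_koopa": ["jump"],
    "kick_shell": ["stomp_koopa"],
}


def blocked_atoms(chain, burned_out):
    """Return atoms that depend, directly or indirectly, on a burned-out atom."""
    blocked = set()
    changed = True
    while changed:  # repeat until no new atom gets blocked
        changed = False
        for atom, prereqs in chain.items():
            if atom in blocked or atom in burned_out:
                continue
            if any(p in burned_out or p in blocked for p in prereqs):
                blocked.add(atom)
                changed = True
    return blocked


# If the player never masters the jump, everything downstream is lost too.
lost = blocked_atoms(SKILL_CHAIN, {"jump"})
```

Here lost contains "stomp_koopa" and "kick_shell", while "move" is unaffected.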

You can turn these little chains into big chains that describe the entire game. Here’s a skill chain of Tetris.

Skill chains are remarkably flexible and rather easy to apply to almost any game. You look for the actions the user is performing, the skills they are learning and the positive / negative feedback you’ve built into the game. Any game with explicit rewards can be diagrammed.

There are probably a goodly number of you rolling your eyes at this point. You can create pretty diagrams to analyze anything. Here we've got someone who has created a very lovely and descriptive diagram of a penguin defecating. This is not a helpful diagram.

We ultimately need pragmatic everyday tools, not egghead analytics. The primary reason we create skill chains is to help solve two of our outstanding challenges:
  • Get real results quickly
  • Choose the right metrics so we aren't wading through huge quantities of data.

Skill chains can be used to create a rapid, iterative test driven game design process.

If we really want rapid feedback, let’s build the feedback system into the game from the very beginning. Skip the giant paper tome phase. Start with a playable system that gives you meaningful reports.

The nice thing about skill atoms is that they are eminently testable. When you write code that is going to be put in front of a player, define your skill atoms. It's the same conceptual idea behind writing unit tests.
  • You have a test framework.
  • You write the tests when you write game logic.
  • You run the test suite when you run the game logic.
  • You get a clean simple report when someone plays the game.

When you write your game systems, you can instrument each and every atom. It is a relatively inexpensive process.
  • You label the rewards
  • You label the actions

You know when an atom is touched. You know when it is inactive. All those states (mastery, burnout, inactivity and so on) you can record.
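
A minimal sketch of what such instrumentation could look like follows. AtomTracker and its methods are hypothetical names for illustration, not an existing library; the game code would call action() and reward() wherever it already handles input and feedback.

```python
import time
from collections import defaultdict


class AtomTracker:
    """Records when each labeled skill atom sees an action or a reward."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.events = defaultdict(list)  # atom name -> [(time, kind), ...]

    def action(self, atom):
        self.events[atom].append((self.clock(), "action"))

    def reward(self, atom):
        self.events[atom].append((self.clock(), "reward"))

    def report(self, all_atoms):
        """Summarize each atom: action count, reward count, touched or not."""
        summary = {}
        for atom in all_atoms:
            kinds = [kind for _, kind in self.events.get(atom, [])]
            summary[atom] = {
                "actions": kinds.count("action"),
                "rewards": kinds.count("reward"),
                "touched": bool(kinds),
            }
        return summary


tracker = AtomTracker()
tracker.action("jump")   # player pressed the jump button
tracker.reward("jump")   # character landed on a platform
tracker.action("jump")
report = tracker.report(["jump", "stomp_koopa"])
```

In this example the report shows two actions and one reward for "jump", and marks "stomp_koopa" as never touched.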

Remember burnout? The next time someone plays the game, we can visualize burnout directly on our skill chain diagram. You see instantly what atoms folks are missing. Here is someone failing to figure out how to complete a single line in Tetris.

You can also look at the data in terms of feedback channels and activity graphs.

Either way you get quick, easy to decipher feedback.
  • Instead of having a team that creates customized visualizations tailored to your game, you can use a more generalized system.
  • Instead of sorting through dozens or hundreds of badly organized logs, you can see at a glance where problems are occurring.

This requires a change in your development methodology. You want people to play your game as early as possible and as often as possible. Luckily automated testing of skill atoms reduces the cost substantially compared to traditional manual tests.
  • Anytime that anyone, anywhere in the world runs your game, you get valuable play balancing information.
  • Build up a database of a thousand players and release your daily builds to three people a day for every single day of your dev cycle.
In this day of web 2.0 and connected consoles this is now a broadly accessible practice.

Once you have rapid, daily feedback in place, you can use the resulting reports to evolve your design iteratively. All this analytical game grammar silliness becomes a foundational feedback system.
  • We can regression test game designs now.
  • We can fix busted skill atoms and see how things improve the next day.
  • What happens when we refactor our designs to make them more testable? I have no idea, but it excites me.
Imagine if a system like this had been in place when the 'designer' in our horror story made his jumping tweaks. The dashboard would have gone dark almost instantly with burnout spreading across the screen.
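
To make the regression-testing idea concrete, here is a small hedged sketch that compares per-atom burnout rates between two builds. The function name, the tolerance, and the sample numbers are all invented for illustration.

```python
def burnout_regressions(baseline, candidate, tolerance=0.05):
    """Return atoms whose burnout rate rose by more than `tolerance`.

    baseline, candidate: dicts mapping atom name -> fraction of players
    (0.0 to 1.0) who burned out on that atom in each build.
    Atoms missing from the candidate build are treated as unchanged.
    """
    regressions = {}
    for atom, old_rate in baseline.items():
        new_rate = candidate.get(atom, old_rate)
        if new_rate - old_rate > tolerance:
            regressions[atom] = (old_rate, new_rate)
    return regressions


# Made-up numbers: a jump tweak leaves the basic jump roughly alone
# but breaks a later puzzle atom that was tuned to the old jump.
yesterday = {"jump": 0.05, "gap_puzzle": 0.10}
today = {"jump": 0.07, "gap_puzzle": 0.60}
flagged = burnout_regressions(yesterday, today)
```

With these numbers only "gap_puzzle" is flagged, which is the kind of signal the dashboard in the horror story would have shown.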

The systems I've described today are just the beginning; a rough sketch of the future, if you will.
  • Our player models are primitive.
  • Our metrics can advance dramatically in their sophistication. We are just starting to tap into biometrics.
  • Our player testing systems are still expensive to run.
  • There are amazing new games waiting to be designed and evolved into stunning experiences.
The great challenges are still out there. Both the theory and practice of our science are still being born. Sometimes I wonder, "Who is going to take game design to the next level?"

I love this picture. It is the fifth Solvay conference, in 1927. Einstein, Bohr, Curie. 17 of the 29 attendees were or went on to become Nobel Prize winners.

The first conference was in 1911, almost a hundred years ago. Einstein was among the youngest present. Who is the youngest person here? These quirky, brilliant people revolutionized our understanding of physics. Without their work, we wouldn’t have semi-conductors, computers or video games. They were theorists and experimenters not so different from what we have in our industry today. A small group of eggheads changed the world.

I look out at this group and I see the same potential. We’ve got the brains. We’ve got the time.

Let’s make this an amazing weekend.

take care

PS: There was one more group photo shown immediately after the Solvay photo. It however, has been redacted due to national security concerns.


Monday, April 04, 2005

Design Testing: The use of addiction metrics to force rapid evolution of innovative game designs

One of my goals with this blog is to formulate a 'new game development methodology' that empowers the little guy and helps the growth of innovation in the game industry. How do we build innovative, highly addictive games more quickly and with lower risk? Part of the answer is the rigid application of gaming metrics to the process of improving player addiction.

The Legacy of Cowboy Designers
The traditional designer is a cowboy designer. Modern game designs are the result of the messy, content dependent process a cowboy designer intuitively follows when building a game.

Cowboy Programmers
The term comes from the land of programming where early programmers would whip out l33t code in as little time as possible. Cowboy programmers were lone guns, experts in their field who possessed a deeply intuitive understanding of what works and what doesn't.

Coding standards, methodologies, even teamwork were taboo for the ancient cowboy programmers. Their code was inscrutable and many decisions seemed arbitrary. Troubles inevitably arose as the industry matured. Projects didn't scale and many failed as they were bogged down in a plague of bugs. Eventually the world figured out better ways of programming that were more reliable, less risky, and produced better results. God bless process advancement.

Cowboy Designers: Copy cats with a hip attitude
Cowboy Designers are similar in many ways. They shoot from the hip when it comes to decisions, relying on their own finely tuned sense of 'fun' to design systems and create requirement docs. This sense of 'fun' is typically built up after internalizing the game play of dozens of similar game titles.

Such expertise works well when you are creating a clone or focusing on the later layers of the game design where you can't do much damage. Adding the 101st 'designer inspired' Pokemon is about as risk free as adding the 100th one. Subtle, oh so well crafted variations on existing themes are the bread and butter of a cowboy designer. The 'I could build it better' syndrome that drives many game designers is not only a contributing factor to the stagnation of innovation, but is actively encouraged by most game publishers as a means of reducing risk.

Cowboy designers stifle innovation
The big problem is that intuitive cowboy designers have high failure rates when it comes to inventing core game mechanics. Experience in pre-existing genres is a poor guide for success when your job is to put together new rules that result in dynamically different psychological scenarios. When cowboy designers attempt to refine a new genre, one of two things typically occurs.
  • The half-breed design: The designer mixes two well defined genres. Since their decisions are informed by experience and not psychology, the result is rarely enjoyable to play.
  • The mush design: The design mixes multiple genres together with rules that are untested and arbitrary in nature. Randomly designed games tend to have a remarkably low success rate.

Either way, the result is ruined teams and failed games. It is no wonder that publishers are averse to large investments in innovation.

Bad process, not bad people
It doesn't have to be this way. We are simply using the wrong design methodology to build innovative games. The old cowboy method only works with Shooter Clone #64. It fails miserably when attempting something new. With the right design methodology built around the concept of making risky game design decisions painless, innovative games have the opportunity to prosper.

Introducing feedback: A miracle design tool
What we really need is a reliable feedback mechanism that lets us reduce investment risk in order to create a safety net for innovation.

The modern feedback desert
Consider the traditional feedback cycle in game design. You spend 12 to 18 months building a game. You receive the majority of your feedback from traditional game testers and internal 'team testing'. The information is very useful, but runs into several difficulties:

  • The information is subjective: Most feedback is qualitative and is filtered through a pre-biased team of hardcore game developers.
  • Feedback is not statistically valid: Testing occurs with small numbers of testers who do not accurately represent the target market, nor are their opinions verifiably the same as those of the larger market.

At this stage you still don't get a chance to react. Almost all gameplay fixes occur in the higher layers of the game design due to its lower risk nature. You can change some art or a few engine variables, but rarely is there time to alter core game mechanics due to the exponential cost of change in a content heavy system.

Once the game is released and in the public's hands, you finally get your first pieces of accurate feedback on your design decisions. You either sell a lot of copies, or you don't. If your game happens to end up a failure, there is no second chance. You didn't get it right the first time and now your entire team will be culled in a grand bloodletting by your disappointed publisher.

This isn't a healthy feedback cycle. The opportunity to make meaningful changes is limited from an early stage. Those who make mistakes are punished dramatically. Those who survive see their lifeless brethren on the roadside and learn that risk taking is dangerous.

Specifying a useful feedback mechanism
We need a tool that:

  • Rapidly informs us when design decisions unbalance the game
  • Lets us test multiple variations on a rule without risk
  • Allows us to see the effects of a change before we invest heavily in expensive, difficult-to-change content.
Such a design tool would allow incremental investments in new game designs. If you make a mistake, you can back that change out without putting the entire project at risk. Since the cycle time between making a change, getting feedback, and accepting or rejecting the change is short, the team can iterate through a series of changes quickly.

Enter the Metrics

There are a wide variety of testing systems available that give us interesting feedback.

  • Unit testing
  • Market testing
  • Design testing

I'll briefly describe each of the first two and then explain how Design testing can radically change how you go about game design.

Unit testing
The most common testing in game development is the unit test, borrowed from Agile programming methodologies. These are covered extensively in a wide variety of books and websites and deal primarily with code integrity and refactorability. This is certainly good important stuff that is essential to creating an agile game design methodology. However, unit tests address only programmer risk, not design risk.

Market testing (aka Market Research)
Another common method of product testing involves giving a product sample to users and having them rate how likely they would be to purchase it. Market testing is a huge field and contains everything from focus groups, to concept testing, to full-on market testing with a wide-scale deployment of the finished product.

Traditional market testing has some fundamental problems when applied directly to game design.

  • Expensive: This restricts its use to only the biggest of game developers and publishers.
  • Provides limited insight: Most damning, it is nearly impossible to tell anything about the addictive qualities of a game without actually playing it. I can show potential players a box with a guy with a gun on it and ask if they would buy it. But such a survey gives me no meaningful information on whether or not I have the next Halo or Daikatana on my hands. "How does it play?" is critical competitive information.
Games, as a testable product, exist in a market research vacuum. Many of the traditional techniques honed over decades of consumer product research simply do not apply. They don't capture 'addiction', the competitive essence of games.

Design testing
We need tests and metrics that capture such ephemeral qualities as 'fun' and 'addiction'.

What makes me think we can test 'fun' and 'addiction'? I believe that core game mechanics rely on relatively simple psychological reward schedules. A successfully addicted player exhibits easily identifiable behavioral symptoms. By tracking these symptoms in a statistically valid manner, the designer gains useful feedback on the addictive properties of their gaming system.

Common Metrics for Design Testing

Testing for addiction is easier than you might imagine. The following are easily gathered metrics for measuring system-wide addictive behavior.
  • Length of playtime
  • Intensity of play time
  • Willingness to play again
  • Length between play times
  • Number of play times
  • Spot exit surveys
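
As an illustration of how cheaply a few of these can be computed, the sketch below derives play count, mean session length, mean gap between sessions, and a simple played-again flag from one player's session log. The function name and the sample timestamps are assumptions; intensity of play and exit surveys are left out because they need other data.

```python
from statistics import mean


def session_metrics(sessions):
    """Summarize one player's sessions.

    sessions: list of (start_seconds, end_seconds) tuples, oldest first.
    """
    lengths = [end - start for start, end in sessions]
    # Gap = time from the end of one session to the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(sessions, sessions[1:])]
    return {
        "play_count": len(sessions),
        "mean_length": mean(lengths) if lengths else 0.0,
        "mean_gap": mean(gaps) if gaps else None,
        "played_again": len(sessions) > 1,
    }


# Three made-up sessions of 10, 20 and 15 minutes, one hour apart.
player_log = [(0, 600), (4200, 5400), (9000, 9900)]
summary = session_metrics(player_log)
```

For this log the mean session length is 900 seconds and the mean gap is 3600 seconds.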
Game Token Metrics
You can also get more atomic and measure metrics for each game token in order to dig down into why a particular pattern of addiction is occurring.

  • Use time of each token
  • Frequency of use for each token
  • Gap between token use
  • Spot surveys of user's token enjoyment
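
A hedged sketch of the per-token version follows. It assumes the game logs a timestamp each time a token is used; the token names and the event format are hypothetical.

```python
from collections import defaultdict


def token_metrics(events):
    """Compute use count and mean gap between uses for each token.

    events: list of (timestamp_seconds, token_name), oldest first.
    """
    times = defaultdict(list)
    for stamp, token in events:
        times[token].append(stamp)
    metrics = {}
    for token, stamps in times.items():
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        metrics[token] = {
            "uses": len(stamps),
            "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        }
    return metrics


# Made-up event log from a single play session.
events = [(0, "sword"), (10, "powerup"), (20, "sword"), (50, "sword")]
per_token = token_metrics(events)
```

Here the sword is used three times with a mean gap of 25 seconds, while the powerup is used once and so has no gap to report.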
ROI metrics
Finally, you can gather ROI metrics by combining the information from metrics above with cost of production information from your project tracking. This gives you some interesting information on where your development investments are paying off.

  • ROI of each token: Calculated by use time / development cost.
  • ROI of each game system
Intriguing results immediately pop to the surface once you implement ROI metrics. Additional level design has a marginal ROI. Character art is the same. You can add a new monster or a new level to the game and the addictive qualities of the title don't budge an inch. On the other hand, add a new powerup system and watch the addiction rise.
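
The token ROI formula above (use time divided by development cost) is simple enough to sketch directly. The function name, the cost unit, and every number below are made up purely to show the calculation.

```python
def token_roi(use_seconds, dev_cost):
    """ROI per token, as total use time divided by development cost.

    use_seconds: dict of token -> total seconds players spent using it.
    dev_cost: dict of token -> cost to build it (any consistent unit).
    Tokens with zero or negative cost are skipped to avoid dividing by zero.
    """
    return {
        token: use_seconds.get(token, 0.0) / cost
        for token, cost in dev_cost.items()
        if cost > 0
    }


# Hypothetical figures: cost is in person-days, use time in seconds.
use = {"powerup_system": 90000, "extra_level": 12000}
cost = {"powerup_system": 30, "extra_level": 40}
roi = token_roi(use, cost)
```

With these invented figures the powerup system returns 3000 seconds of play per person-day and the extra level returns 300.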

Control Charts
You can track these metrics on control charts. This simple charting method tracks changes in specific metrics over time. When a system is changed, you can usually see the results immediately in the control chart.

In general there will be one or two key metrics (Key Performance Indicators, or KPIs) that give you a strong indication of the addiction of your gaming system. Other metrics will be secondary factors that influence your KPIs. For example, the reuse time of the powerup system is not the single most important factor in the game, but it influences total session playing time, which is your primary indicator of addiction.
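
For readers who have not used control charts, here is a minimal sketch of the idea: compute a center line and limits from baseline data, then flag new points that fall outside them. It uses the sample standard deviation and three-sigma limits as a simple assumption; production charts often estimate the spread differently. The sample data is invented.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower, center, upper) using three-sigma limits."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation, a simplification
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(baseline, new_points):
    """Return the new points that fall outside the control limits."""
    lower, _, upper = control_limits(baseline)
    return [x for x in new_points if x < lower or x > upper]


# Made-up daily averages of session length in minutes.
baseline_minutes = [30, 32, 29, 31, 30, 28, 31, 29]
after_change = [31, 38, 29]
signals = out_of_control(baseline_minutes, after_change)
```

With this data the limits sit at roughly 26 and 34 minutes, so only the 38-minute day is flagged as a real shift rather than noise.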

Using the data
Once you have the control charts populated with data, it is a simple matter of following a clearly defined change regimen:

  1. Create a design change
  2. Propagate that change to your game players
  3. Measure the results
  4. If the change is positive compared to the previous baseline, keep the functionality.
  5. If the change is negative or mixed, create a new set of changes
  6. Track the key metrics over time to ensure that there is a steady improvement.
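
The keep-or-revert step in the regimen above can be sketched as a single comparison against the baseline. This is a deliberately naive illustration with hypothetical names and data; it compares means only, and a real deployment would want a significance test before trusting a small difference.

```python
from statistics import mean


def evaluate_change(baseline_kpi, trial_kpi, min_gain=0.0):
    """Decide whether to keep a design change.

    baseline_kpi, trial_kpi: per-player KPI samples (for example, session
    minutes) before and after the change.
    Returns "keep" if the mean KPI improved by more than min_gain,
    otherwise "revert".
    """
    gain = mean(trial_kpi) - mean(baseline_kpi)
    return "keep" if gain > min_gain else "revert"


# Made-up session lengths in minutes.
before = [30, 28, 32]
good_tweak = evaluate_change(before, [35, 36, 34])
bad_tweak = evaluate_change(before, [29, 30, 28])
```

In this example the first tweak is kept and the second is reverted.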
Other areas of future exploration
This is a very rough overview of the techniques involved in design testing. It is both a broad and deep field that borrows from many well-developed ideas in the world of market research and process engineering and applies them to the problem of game design. Other topics include:
  • Batch testing: Test a wide number of variations in game design mechanics simultaneously. Take the best results and explore them further.
  • Tie KPI to financially meaningful results: Use regression analysis to link key statistics to financially meaningful results. For online games, measure re-up rate on subscriptions. For ad-based games, measure customer referral rates and number of impressions. For shareware games, measure initial purchase rates.
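
As a small illustration of batch testing, the sketch below ranks several design variants by their mean KPI and keeps the top few for further exploration. The variant names, sample values, and the best_variants helper are all hypothetical.

```python
from statistics import mean


def best_variants(results, keep=2):
    """Rank variants by mean KPI, best first, and return the top `keep`.

    results: dict of variant name -> list of KPI samples.
    """
    ranked = sorted(results, key=lambda name: mean(results[name]), reverse=True)
    return ranked[:keep]


# Made-up session lengths (minutes) for three gravity settings.
batch = {
    "gravity_low": [22, 25, 24],
    "gravity_mid": [31, 30, 33],
    "gravity_high": [27, 29, 28],
}
winners = best_variants(batch)
```

With these numbers the mid and high gravity settings survive to the next round.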

Design Testing Limitations

Design testing is the core of a rather radical new game design methodology. Let's take a look at some of the limitations.

Not every game can be design tested
Design testing is not for every team nor is it for every type of game. To borrow a term from the agile programmer world, most modern game designs are poorly refactored. They are clumsy, non object-oriented messes of content spaghetti strung together by cowboy designers and their complacent teams of artists and programmers. The typical modern game design has the following attributes:

  • Change is expensive.
  • Testing takes forever.
  • Development cycles are long
  • Static content is king

None of this is conducive to an effective design testable system. I think of applying design testing to an adventure game and shudder. The sheer mass of the static level content combined with the linear sequencing of content results in a system where a change to one location has no effect on any other location. Players are likely to play the game only once and it will take them 40 hours to complete. Good luck getting any timely feedback.

Requirements for a design-testable game
To use design testing as part of our process, we need a game design that is amenable to thorough application of the technique. The following are some key characteristics of a design-testable game.

  • Refactored Design: The game is composed of highly reusable object-oriented elements. Changes to these elements propagate throughout the entire game system.
  • Game Mechanics focused, not Content focused: Static content in the form of level designs, sequenced boss attacks, fixed plot points, etc. is rarely used. Instead the focus is on interesting game rules, meta game rules, and dynamically generated levels to create an enjoyable game experience.
  • Automated update mechanism: The game designer can rapidly push changes out to a population of game players.
  • Real-time metrics: When a change is made, statistics on current player usage are immediately sent back to a central electronic dashboard. Most commonly this will be through an internet-based tracking system connected directly into the game.
  • Large population of game players: Statistics are worse than meaningless if you do not have a pertinent population to survey. Testable games require a large standing population of active game players. This suggests extensive open betas and other mechanisms to encourage player interaction before the game is finished. Subscription-based models also work well with this requirement.
Markets ripe for design testing
Online games have a clear advantage. Many of the tracking systems are already built into the web and you already have logs and a database ready to receive the data. You are guaranteed 100% correct information since you see everything that occurs. MMOGs are already doing many of the things outlined in this essay. Their success is readily apparent and I challenge you to find a more addictive genre.

Consoles are moving online in the next generation and most gaming PCs are online already. The technological infrastructure is certainly possible. All it takes are teams innovative enough to improve their development process. Indie games, Nintendo DS titles, and mass market consumer titles are all places where new methodologies might blossom.

Is design testing worth it?

If you want to make a design testable game, you need to throw out decades of highly polished game design experience and theory. You need to rely on cold metrics instead of your warm fuzzy 'I could do it better' cowboy designer instincts. You need to shun common content heavy genres that you grew up with and love in the deepest core of your gamer heart.

What you lose
Design testing fundamentally changes how games are developed.

  • Long development schedules cloistered away from the public: Feedback is critical to the product's final success. Alphas, closed betas, and open betas become essential tools for releasing a polished game.
  • Offline games: If you aren't online, you've got no way of gathering data about gameplay.
  • Static level design: You need a refactored game design that allows changes to be made quickly.
  • Plot: If the ROI isn't there, kill it. You've got data that proves you have better things to spend your development time on.
What you gain
What you get in return is the ability to make radically addictive, highly competitive games with limited risk of failure.

  1. Increase your competitive advantage: The other guy is spending all his effort just to maintain his position at the top of the king-of-the-genre battle. He invests in mature genres and every game burns out his hardcore audience a little more. You can come onto the scene with a fresh new title that is more addictive than his current offering. When his FPS is merely one of many such competing titles, your title is a one-of-a-kind 'must have' title.
  2. Reduce your costs: Instead of spending millions on movie level content, you gain your addictive rush from intelligent, informed game mechanics. The result is a lower cost structure. Because you have ROI metrics built into your design
  3. Reduce product failure risk: From the very beginning of development you know, to the decimal point, how addictive your game is with your target market. This lets you cull the bad games early, and focus your efforts on the winners.
If I can make a game that does the three things listed above, I'm willing to give up all the game design traditions that don't work for design-testable games.

The result is a refactored, innovation friendly game design methodology. You can take risks and succeed. You can spend less money and still beat the big guy. As the next generation titles come upon us, the smaller game developers have a choice. They can either work smarter or they can die. Design testing is a great tool for avoiding the latter.
