February 7, 2014

Calibrating CR

In my day job, outcomes measurement is critical to demonstrating success and ensuring continuous quality improvement.  A proposal has to promise measurable results.  Until 2000, few if any tabletop roleplaying games promised measurable results in terms of tension building, dramatic pacing and the player experience of risk and danger.  A heavily overlooked technological advance in 3rd edition D&D gave its players and DMs the concept of Challenge Rating.

Other games had attempted to produce systems for measuring how hard a task or scene was.  Most games provide narrative difficulty descriptors for skill check target numbers, but few had even attempted to gauge the threat level of a combat scene.  Notably, Shadowrun (second edition) introduced "Threat Level," which was merely a way to add bonus dice to NPC attack and defense rolls. 

Challenge Rating (CR) in 3rd edition D&D directly measures the level of intensity or drama in a scene.

So before you accuse me of being a "roll player" keep in mind that this is about...
  • Measuring and improving on the tools you have as a GM
  • Being able to fine-tune tension and pacing in a D&D game
System adjustments can make your game more fun.  If you're afraid of math, that's your problem, not mine.  Also, if you're looking for my advice on fine-tuning skill scenes, see:
  • http://runagame.blogspot.com/2013/07/easily-replacing-4e-combats-with-skill.html
  • http://runagame.blogspot.com/2013/08/4e-skill-challenge-example.html
  • http://runagame.blogspot.com/2013/01/conflict-resolution-options.html
  • http://runagame.blogspot.com/2013/01/gming-sim-play-scaffolds-and-boundaries.html
  • http://runagame.blogspot.com/2013/03/splitting-party.html

The Problem


In every edition of d20 since 2000, the challenge rating system has been thrown off by the players' character "builds."  GMs can act to limit overpowered characters or help underpowered characters catch up, but the party as a whole may vary from the "expected" power level the CR system assumes by a few levels.  Even a few levels either way can make one party breeze through a "boss" encounter while another party of the same level is slaughtered by the same monster. 

Luckily, because CR is a measurable value, we can re-calibrate. 

Challenge Rating helps GMs set a difficulty, which is a subjective experience.  A difficult encounter will generate some measurable results:

  • The players will have a subjective experience of challenge and risk.
  • Characters will be more likely to "drop" (fall to 0 hit points and begin dying) or even be killed.
  • If the players feel that there is a greater risk, they will use valuable expendable items (such as scrolls or charges from staves) to help them win and avoid death.
So, to re-calibrate a Pathfinder game's CR system, here is a quick survey handout for gathering some basic data and using it to adjust your party's Average Party Level (APL).

Subjective Experience


The most important aspect of CR is the players' subjective experience of the encounter.  A combat encounter with a CR two levels above the party's APL is supposed to be "Hard."  If the players feel that it's hard, then your game is calibrated well.  If the players feel that it's easy (or overwhelming!) then you may need to adjust their APL.

After you've run your game for a few sessions and the players have a good feel for their characters and have developed some teamwork, start asking them to guess the challenge rating of encounters.  Do this for four or more encounters in a row, preferably of at least two or three different difficulty levels.  Use the Pathfinder difficulty levels (Easy, Average, Challenging, Hard and Epic) and add "Too Easy" and "Overwhelming" to cover extremes beyond the recommended range.  Also ask them whether they think the encounter played to their strengths as a party (e.g. undead against a party with a lot of divine characters), exploited their weaknesses (e.g. golems against save-or-die casters), or neither.  Adjust their estimates based on their opinion about the party's strengths and weaknesses.  Here's a CR calibration survey form you can hand out to the players if you'd rather use paper forms!

Using the Survey Form

Distribute a copy of the survey to each player after each of 4-6 encounters.  You will need 20-30 copies of the survey depending on how many encounters you want data for and how many players you have to survey. Explain what you’re doing and ask them to fill in the monsters from the fight at the top.  Then describe each question and answer any questions they have.  Give them a minute, then collect the responses.

GREYBOX TEXT:  "I'm trying something I read on a blog.  This guy made a survey to help me calibrate the CR system in Pathfinder to my players' skills and character builds.  Please fill out this short survey and pass it back real quick.  For question 1, put “Werewolves” so I know which encounter this was.  For question 2, write down what you think this encounter was about, other than fighting.  For question 3, circle the answer that best describes how hard the encounter was.  For question 4, let me know if you think this encounter significantly played to your party’s strengths or weaknesses, since that affects the challenge."


Hand out surveys for each of 4 to 6 encounters.  You can do more than 6 and get better data, but it might get cumbersome for your players!  Once you have all the forms and don’t plan to collect any more data, it’s time to compile the answers.  On each sheet, add together the numbers printed next to the player's chosen answers to #3 and #4.

EXAMPLE:  If the player selects E for #3 and B for #4, add "+2" and "-1" for a total of "+1."

Average all the players' sheets for each encounter.  This is their average perception of the encounter's difficulty expressed in terms of APL (this allows you to keep going even if the PCs gain a level in between!). 

EXAMPLE:  Ann answered 3.E and 4.B (+1 total).  Ben answered 3.C and 4.C (+1 total).  Chris answered 3.C and 4.C as well (+1 total).  Danny answered 3.E and 4.C (+2 total).  To compute an average, you add all the responses together (1+1+1+2=5) and divide by the number of responses (four responses, so 5/4=1.25).  In this example, the players’ average score is +1.25, so they perceived the encounter to be APL+1.25, or just over "Challenging" (APL+1).
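
If you'd rather let a computer do the averaging, here's a minimal Python sketch of the step above.  It assumes you've already added up each sheet by hand; the names (werewolf_totals, perceived_difficulty) are just mine for illustration, not part of the survey.

```python
from statistics import mean

# Per-player sheet totals (question 3 score + question 4 score) for one
# encounter, copied from the Ann/Ben/Chris/Danny example.
werewolf_totals = {"Ann": 1, "Ben": 1, "Chris": 1, "Danny": 2}


def perceived_difficulty(totals):
    """Average the players' sheet totals; the result is an offset from APL."""
    return mean(totals.values())


offset = perceived_difficulty(werewolf_totals)
print(f"Perceived difficulty: APL{offset:+.2f}")  # Perceived difficulty: APL+1.25
```

Keeping the result as an offset from APL, rather than an absolute CR, is what lets you compare encounters even if the party levels up partway through.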

Make a table of the encounters you assessed.  In the first column, write a name for the encounter.  In the second, write the intended CR in terms of APL.  In the third, write the players' average response in terms of APL.  In the fourth column, write the difference: [INTENDED CR] – [PLAYERS’ ACTUAL EXPERIENCE].  In the fifth column, count how many players correctly guessed the purpose of the encounter, or at least came close.  This helps you assess whether you are conveying the story behind your encounters clearly.  Sometimes players only see a fight!

EXAMPLE:

Encounter    CR Intended   Players’ Actual Experience   Difference (Intent-Actual)   Players Guessed Purpose?
Werewolves   APL+2         APL+1.25                     +0.75                        4/4
Ogres        APL+1         APL+0                        +1                           3/4
Bugbears     APL+0         APL-1.75                     +1.75                        4/4
Allip        APL+0         APL-0.75                     +0.75                        0/4

In the example above, we’ve surveyed 4 players for 4 encounters.  They almost always experience encounters as easier than the CR system expects.  Also, they totally didn’t understand the Allip fight, so the GM needs to troubleshoot why they didn't understand it, and how to avoid that problem in the future.  Was it just a "room with a monster" encounter?  Was the purpose of the fight not clear enough? 

Average the fourth column.  This tells you how much to adjust the group’s APL for future encounters if you want to calibrate your game.

EXAMPLE:  The average in this case is (0.75+1+1.75+0.75)/4=1.06. 
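
For anyone tracking this in a script instead of on paper, here's a rough Python sketch of the fourth-column math.  The numbers are the four example encounters from the table; the variable names and the tuple layout are my own assumptions, so rearrange them however suits your notes.

```python
from statistics import mean

# (encounter name, intended CR as an offset from APL,
#  players' average perceived difficulty as an offset from APL)
encounters = [
    ("Werewolves", 2, 1.25),
    ("Ogres", 1, 0.0),
    ("Bugbears", 0, -1.75),
    ("Allip", 0, -0.75),
]

# Column 4: intended minus actual.  Positive means the fight felt easier
# than the CR system predicted.
differences = {name: intended - perceived
               for name, intended, perceived in encounters}

adjustment = mean(differences.values())

for name, diff in differences.items():
    print(f"{name:<12}{diff:+.2f}")
print(f"Average difference: {adjustment:+.4f}")  # Average difference: +1.0625
```

The exact average is 1.0625, which is the 1.06 from the example once you round it.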

Consider if you want to calibrate your game.  If the numbers in column 4 are all within about 1-2 of each other, you probably got decent data.

EXAMPLE:  In this example, the lowest number is 0.75 and the highest is 1.75 – they’re all fairly close together.  And the average is reasonable.  It seems like the players are consistently experiencing encounters a little less challenging than the system predicts.  It would be good to try adjusting their APL upward by 1 for a while to see if it makes a difference. 
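
Here's one way that sanity check might look in Python.  Treat it as a sketch: the MAX_SPREAD cutoff of 2 is my reading of "within about 1-2 of each other," and rounding to a whole level is an assumption that matches the "adjust upward by 1" advice in the example.

```python
# Column 4 values (intended minus actual) from the example table.
differences = [0.75, 1.0, 1.75, 0.75]

# Assumption: the upper end of "within about 1-2 of each other".
MAX_SPREAD = 2.0

spread = max(differences) - min(differences)
average = sum(differences) / len(differences)

# Only trust the average if the encounters roughly agree with each other.
consistent = spread <= MAX_SPREAD

# Round to a whole level; skip the adjustment if the data looks noisy.
apl_adjustment = round(average) if consistent else 0

print(f"Spread: {spread:.2f}, average: {average:.4f}")
print(f"Suggested APL adjustment: {apl_adjustment:+d}")  # Suggested APL adjustment: +1
```

If the spread comes out wider than the cutoff, the sketch suggests no change at all, which is a cue to gather a few more encounters before touching the party's APL.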

If you decided to adjust their APL, re-assess the players with the same survey three or four sessions later.  This is just to make sure your original assessment was reliable and valid.  If everything works out correctly, their new column 4 average should be close to 0.

Tracking Additional Game Outcomes


If you're interested in the Gamist creative agenda, or you're a game designer and interested in this stuff, you may want more than just subjective experience.  In that case, during encounters keep track of...
  • Times a PC drops to 0hp or less.  Any encounter where a PC drops should be at least Challenging. 
  • If any PCs are killed.  Any encounter where a PC dies should be at least Hard, or more likely Epic.
  • If any PC uses an expendable magic item (or charge from any wand of a spell above level 1). Any encounter where a player chooses to use a valuable expendable item should be at least Challenging, and if it's a very expensive item, it should be at least Hard.
I thought about giving guidance for assessing the threat rating of an encounter based on the total damage the PCs take.  I even did the math (DM me at @RunAGame on twitter if you want to hear about it!) but I realized it would be too cumbersome to track at the table.
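
The three outcome checks in the list above can be boiled down to a "this fight was at least this hard" floor.  Here's a small Python sketch of that idea.  The function and flag names are hypothetical, and deciding what counts as a "very expensive" item is left to you as the GM.

```python
# Difficulty labels from easiest to hardest, including the two extremes
# added for the survey.
DIFFICULTY_ORDER = ["Too Easy", "Easy", "Average", "Challenging",
                    "Hard", "Epic", "Overwhelming"]


def minimum_difficulty(pc_dropped=False, pc_died=False,
                       used_expendable=False, expensive_expendable=False):
    """Return the lowest difficulty consistent with what happened, or None."""
    floors = []
    if pc_dropped:
        floors.append("Challenging")   # a PC fell to 0 hp or below
    if pc_died:
        floors.append("Hard")          # a PC was killed
    if used_expendable:
        floors.append("Challenging")   # a valuable expendable item was used
    if expensive_expendable:
        floors.append("Hard")          # a very expensive item was used
    if not floors:
        return None                    # no outcome data, so no floor
    # Keep the strictest floor among everything that happened.
    return max(floors, key=DIFFICULTY_ORDER.index)


print(minimum_difficulty(pc_dropped=True, used_expendable=True))  # Challenging
print(minimum_difficulty(pc_dropped=True, pc_died=True))          # Hard
```

You can compare this floor against the survey result for the same encounter: if the players called a fight "Easy" but someone died, that's worth a conversation.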

Summary


Use the survey to calibrate CR for your Pathfinder (or 3.5) game by administering it for 4-6 encounters and averaging the results.  Adjust your party's APL by the average result.