### How I Learned to Stop Worrying and Love the Hypergeometric Distribution

Posted: **Sun Apr 14, 2013 12:56 am**

This is intended to provide readers with the tools to figure out draw percentages for themselves, instead of following a bunch of rules of thumb about how many lands to include in a deck and how many copies of a non-land card to include.

It's a work-in-progress, and will be edited in place. I've included a copy of the spreadsheet on Google Docs, which I'll keep updated as I tweak it. I plan to do some additional calculations regarding land and mana.

Magic is a game of strategy, but it is affected strongly by random chance. Good deckbuilders are able to assemble a pile of 60 cards that provide a good chance of delivering effective and efficient plays during the turns that the deck intends to be strong.

Probability

From Wikipedia, probability is defined as a measure of the expectation that an event will occur or a statement is true. Probabilities are given a value between 0 (will not occur) and 1 (will occur). The higher the probability of an event, the more certain we are that the event will occur.

Every time we draw a card, we can calculate the probability of the card being one that we want. The only information we need to know prior to the draw is the number of cards in the deck and the number of cards within that define a "success".

Single Draw Probability

Since we're red mages, we all know and love this card. I've included the original Revised Lightning Bolt art, because it was the very first one I ever saw when I was 12 years old. I fell in love with 3 damage for [mana]R[/mana] and I've been a red mage ever since.

In my examples, I'll generally use Lightning Bolt as the card that we want to draw from the deck.

Example: If we have a pile of 40 cards, 4 of which are Lightning Bolt, what is the probability of drawing a Lightning Bolt? The probability, P, is defined as the number of successes divided by the total size of the population.

P = ( # successes ) / ( Population size )

P = ( 4 Lightning Bolts ) / ( 40 cards )

P = 4 / 40

P = 0.1 (or 10% - multiply any probability by 100 to find the percentage)

If we shuffle the same pile of 40 cards and draw 1 card, we should draw a Lightning Bolt 1 out of 10 times if we repeat the process enough times for random variance to wash out.
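If you'd rather see the "repeat it enough times" idea in action than take it on faith, here is a minimal simulation sketch in Python. The function name, the trial count, and the fixed seed are my own choices for illustration. It relies on the fact that the top card of a well-shuffled deck is equally likely to be any card in the pile.

```python
import random

def simulate_single_draw(deck_size=40, copies=4, trials=100_000, seed=1):
    """Fraction of simulated draws where the top card is a Lightning Bolt."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Cards numbered 0..copies-1 stand in for the Lightning Bolts.
    # Picking a random card index is equivalent to shuffling and
    # looking at the top card.
    hits = sum(1 for _ in range(trials) if rng.randrange(deck_size) < copies)
    return hits / trials

estimate = simulate_single_draw()
print(f"simulated: {estimate:.3f}, exact: {4 / 40:.3f}")
```

With a small number of trials the estimate wanders quite a bit; it only settles near 0.1 as the trial count grows.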

Multiple Draw Probability

Example: What if we take the same pile of 40 cards (4 Lightning Bolts within) and draw 2 cards instead? What are the odds of drawing a Lightning Bolt?

For the first draw, the probability is the same (P = 10%), but the second draw changes depending on the result of the first draw, since we're not putting the first draw card back into the deck.

P_draw1 = 4 / 40

P_draw1 = 0.1 (10%)

Now we have 1 card in hand and 39 cards in the deck. If the first card we drew wasn't a Lightning Bolt, we still have 4 left in the deck.

P_draw2 = 4 / 39

P_draw2 = 0.103 (10.3%)

When we miss with the first draw, the probability of drawing a Lightning Bolt on the second draw goes up. Nice!

What if we pull a Lightning Bolt on the first draw? What happens to our probability of seeing a Lightning Bolt on the second draw, too?

P_draw2 = 3 / 39 (since there are only 3 left in the deck now)

P_draw2 = 0.077 (7.7%)

This process can be repeated by hand for any number of deck sizes, card multiples within, and number of draw steps. If only there was an easy way to do this without sheets of engineering paper and a slide rule...
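For what it's worth, the by-hand process is just multiplying one draw at a time. A rough Python sketch of that chaining, using exact fractions so no rounding sneaks in (the helper name `p_no_hit` is made up for this example):

```python
from fractions import Fraction

def p_no_hit(deck_size, copies, draws):
    """Chance that none of the first `draws` cards is a hit, one draw at a time."""
    p = Fraction(1)
    misses = deck_size - copies
    for i in range(draws):
        # i cards are already gone from the deck, and all of them were misses
        p *= Fraction(misses - i, deck_size - i)
    return p

p_miss_both = p_no_hit(40, 4, 2)                 # 36/40 * 35/39
p_hit_both = Fraction(4, 40) * Fraction(3, 39)   # Bolt, then another Bolt
print(float(p_miss_both), float(p_hit_both))
```

This works fine for 2 draws, but the bookkeeping grows quickly once you want "exactly 1 hit in 11 cards", which is where a general formula earns its keep.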

The Hypergeometric Distribution

Luckily, all this has been worked out by smart mathematicians and statisticians who wanted to find a general formula for probability of selecting arbitrary numbers of items from a population.

From Wikipedia, the Hypergeometric Distribution is defined as a discrete probability distribution that describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.

Oh, that's not crystal clear? Those darn mathematicians, always complicating things! In plain English, this means that the hypergeometric distribution models the probability of drawing an item (such as a Magic card) from a population (such as a deck) without replacing the drawn item.

When we want to figure out probabilities of drawing a card from a deck of cards, we use the hypergeometric distribution.

I'll leave the math proof up to the mathematicians, but you can verify it on the Wikipedia page linked above.

The Hypergeometric Distribution and YOU!

If you have access to Microsoft Office, start up your copy of Excel and you can begin using a built-in function called HYPGEOMDIST(). The arguments, labeled for our Magic application, are HYPGEOMDIST(# of copies we want to draw, # of cards drawn, # of copies in the deck, total cards in the deck).

For the earlier example of drawing 1 Lightning Bolt from a pile of 40 cards, we'd use the following function: HYPGEOMDIST(1,1,4,40).

Then press enter. The cell will change to the result of the calculation.

Look at that, 10%! That seems like a lot of work for a single draw calculation, but the hypergeometric calculation really shines when you start looking at multiple draws involving different parameters. Remember the second example where we wanted to draw 2 cards and see how many times we'd find a Lightning Bolt? We end up with 3 possibilities for the draw result: We'll either draw 0 Lightning Bolts, 1 Lightning Bolt, or 2 Lightning Bolts. The probability of each result is quite different, but we'd have to do a lot of calculations (or a lot of experiments) to find out the probability of each result.

Using the hypergeometric function saves us a lot of time.
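If you don't have Excel handy, the same calculation can be sketched in a few lines of Python. This is my own stand-in, not Excel's implementation: it just applies the standard hypergeometric formula with `math.comb`, and the parameter names mirror the ones in Excel's documentation so the argument order matches the spreadsheet formula.

```python
from math import comb

def hypgeomdist(sample_s, number_sample, population_s, number_pop):
    """Probability of exactly `sample_s` hits in `number_sample` cards drawn
    from a deck of `number_pop` cards containing `population_s` copies."""
    ways_to_hit = comb(population_s, sample_s)
    ways_to_miss = comb(number_pop - population_s, number_sample - sample_s)
    return ways_to_hit * ways_to_miss / comb(number_pop, number_sample)

# The single-draw example: 1 Bolt wanted, 1 card drawn, 4 Bolts, 40 cards.
print(hypgeomdist(1, 1, 4, 40))
```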

Example: Want to know the probability of drawing 0 Lightning Bolts if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(0,2,4,40)

P = 0.808 (80.8%)

Example: Want to know the probability of drawing 1 Lightning Bolt if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(1,2,4,40)

P = 0.185 (18.5%)

Example: Want to know the probability of drawing 2 Lightning Bolts if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(2,2,4,40)

P = 0.008 (0.8%)

These results are pretty intuitive. If you draw 2 cards, you won't draw your Lightning Bolt very often, you'll sometimes draw 1, and you'll extremely rarely draw 2.

I rounded the results a little bit, but if you look at the raw numbers, something interesting happens.

P(0) = 0.8077

P(1) = 0.1846

P(2) = 0.0077

The probabilities all add up to 1. This isn't nearly as intuitive, but it means that we've accurately modeled the probabilities of every possible combination of Lightning Bolt and non-Lightning Bolt if we draw 2 cards.

This saves us some time when building spreadsheets, because instead of adding up a bunch of stuff, we can just start with 1 and subtract the values we don't want. Whatever is left is the answer we want.

Example: What is the probability of drawing at least one Lightning Bolt if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = 1 - HYPGEOMDIST(0,2,4,40)

P = 0.192 (19.2%)
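Here is a small Python sketch of the "start with 1 and subtract" trick, assuming the standard hypergeometric formula. The helper names `pmf` and `at_least` are mine. It also checks that the three outcomes for a 2 card draw really do add up to 1.

```python
from math import comb

def pmf(k, draws, copies, deck):
    """Probability of exactly k copies among `draws` cards."""
    return comb(copies, k) * comb(deck - copies, draws - k) / comb(deck, draws)

def at_least(k_min, draws, copies, deck):
    """1 minus the probability of every outcome below k_min."""
    return 1 - sum(pmf(k, draws, copies, deck) for k in range(k_min))

outcomes = [pmf(k, 2, 4, 40) for k in range(3)]   # 0, 1, or 2 Bolts
print([round(p, 4) for p in outcomes], round(sum(outcomes), 10))
print(round(at_least(1, 2, 4, 40), 3))
```

The same `at_least` helper handles "at least 2" or "at least 3" by subtracting more terms, which is all the bigger tables are doing.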

This turns out to be quite a useful calculation! In Magic, we tend to want to know "how soon will I see X card?" and the hypergeometric distribution lets us build tables to find out how soon you'll see it, statistically.

Now, when the goofballs at your LGS give you bad advice like "you should just run 2, bro, so you won't see it in your opener! It'll come down on turn 5 no problem!" you'll be able to see exactly how often you'll actually draw it in your opener and whether that's OK.

Pretty graphics!

I've built some nice tables to help us figure out how often we'll see cards on different turns, depending on whether we're on the play or the draw and how many cards are in the deck.

Chance to draw at least 1

Supporting calculation: P = 1 - HYPGEOMDIST(0, 7 card opener, # in deck, 60 card deck), with the 7 card opener growing by 1 card for each later turn. Percentages on the draw start with 1 extra card.

Chance to draw at least 2

Supporting calculation: P = 1 - HYPGEOMDIST(0, 7 card opener, # in deck, 60 card deck) - HYPGEOMDIST(1, 7 card opener, # in deck, 60 card deck), with the 7 card opener growing by 1 card for each later turn.

This one is a bit more complex, since we're trying to hit 2 of the same card, not just 1. We subtract P(0) and P(1) from 1, which gives us the probability of hitting at least 2.

We could repeat the same calculation to hit 3 of the same card: P = 1 - P(0) - P(1) - P(2). This percentage is incredibly small, though.
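Since the tables themselves are spreadsheet output, here is a rough Python sketch of how a row could be regenerated. It assumes a 60 card deck, a 7 card opener, no draw on turn 1 when on the play, and 1 extra card seen when on the draw. The function names are made up for this example.

```python
from math import comb

def at_least(k_min, draws, copies, deck=60):
    """Chance of at least k_min copies among `draws` cards."""
    below = sum(comb(copies, k) * comb(deck - copies, draws - k)
                for k in range(k_min))
    return 1 - below / comb(deck, draws)

def table_row(copies, k_min, on_the_play=True, turns=10, opener=7):
    """Chance to have seen at least k_min copies by each turn, 1 through `turns`."""
    # On the play we skip the turn 1 draw; on the draw we see 1 extra card.
    extra = 0 if on_the_play else 1
    return [at_least(k_min, opener + (turn - 1) + extra, copies)
            for turn in range(1, turns + 1)]

# "Chance to draw at least 1" for 1-ofs through 4-ofs, on the play.
for copies in (1, 2, 3, 4):
    row = table_row(copies, k_min=1)
    print(copies, " ".join(f"{p:6.1%}" for p in row))
```

Passing `k_min=2` gives the second chart, and `on_the_play=False` shifts every column over by one card.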

Using the tables

Since we define what the "success" is, we can use a row in these tables for multiple purposes. For example, if we play a deck with 12 1-drops (like Br zombies), what is the likelihood of drawing 2 in an opening hand? Assuming we don't care which of the 1-drops they are, look at the 12 row, turn 1 on the 2nd chart. We have a 42.8% chance of drawing at least 2 1-drops in our opening hand on the play. Pretty decent odds. What if we cut some of our maindeck creatures and move to an 8 1-drop build? Now we have a 23.2% chance of seeing 2 in our opening hand.

After building these tables, I immediately stopped putting 1-ofs in my deck. For aggro decks, we're primarily concerned with turns 1-5, and the probability of seeing a 1-of on the play by turn 5 is 18.3%, which is pretty miserable. On the draw, the probability improves to 20%, but that's not really much of an improvement. If I put it in my deck, I want to see it.

If I want to see it, why would I only want to see it in 1 out of 5 games? If the game goes long, I still only stand a 26.7% chance of seeing a 1-of by turn 10 on the play, 28.3% on the draw. This seems like bad planning in a deck with no tutors.

The minimum I'll run of anything now is 2. By turn 5, I'll see a 2-of 33.6% of the time on the play and 36.3% on the draw. 1 in 3 games isn't terrible. If the game goes long, my probability of seeing a 2-of by turn 10 is 46.6% on the play, 49% on the draw. So maybe every other game that goes long will cough up that 2-of.

Thundermaw Hellkites in my opener, ZOMG!

This dude is almost as good as a Lightning Bolt, right? The art is similar at least, and he's way better than Shivan Dragon.

Before I built the table, I was pretty nervous about running 3-4 Thundermaw Hellkites in my deck because "what if I draw it in my opener? Might as well have mulliganed!" Enough emotional response, let's look at the tables. If I put 3 Hellkites in my deck, I'll see them in my opener 31.5% of the time on the play, 35.4% on the draw. 1 out of 3 games isn't backbreaking.

What about the nightmare scenario, seeing a Hellkite in the opener and then drawing one on turn 2? Look at the tables! With 3 Hellkites, I stand a 4.4% chance of seeing 2 of them by turn 2 on the play, 5.6% on the draw. This is hardly worth worrying about, and running 3-of Hellkite lets me see them more regularly. If I put them in the deck, I want to see them and beat faces.

So... should I run 3 of everything?

Nope! Some stuff you'd love to see in your opener. 4-of Stromkirk Noble is obvious, you'll see him on turn 1 39.9% of the time on the play, 44.5% on the draw. Roughly every other game, you'll see his smug white face in your opener. If you run 4 Stromkirk Noble and 4 Rakdos Cackler, you'll see at least 1 in the opener 65.4% of the time on the play, 70.6% on the draw. Pretty consistent!

Bigger stuff like a Thundermaw Hellkite should be a 2-of or a 3-of, because you'll minimize the chance to draw them in your opener, but stand a reasonable chance of seeing at least 1 by turn 5 if you're curving out.

Card draw and why we love it

Remember those Underworld Connections I kept raving about? The amazing thing about extra card draw is that we move further along the tables without having to spend extra turns. If we can reliably cast our Underworld Connections or include card drawing spells like Faithless Looting, Wild Guess or Dangerous Wager, we can move over 1 or 2 columns at a time. This increases your chance to see every one of your cards, which will let you run fewer bombs to clog up your opening hand.

Example: If I run 2-of Thundermaw Hellkite, but I stick an Underworld Connections on turn 4, how much do I improve my chances of seeing a Hellkite on curve?

P = 33.6% (Hellkite on turn 5, right on curve)

P(with UC) = 36.3% (1 extra card drawn)

The extra card draw gives us +2.7% chance to see our Hellkite next turn. If the Hellkite was a 3-of, we'd go from 46.2% to 49.5%, an extra +3.3%. Running multiples in the deck will increase the positive effect of extra card draw.

Example: How many extra cards do I need to draw to make a 2-of behave like a 3-of on turn 5?

P(3-of), turn 5 = 46.2%

P(2-of), turn 5 = 33.6%

Move along the 2-of row until you hit ~46%. This happens on turn 10, so you'll need to draw 5 extra cards to make your 2-ofs behave like 3-ofs from a draw percentage standpoint.
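The "move along the row" lookup can also be sketched as a short search in Python. It assumes a 60 card deck and 11 cards seen by turn 5 on the play (7 card opener plus 4 draws). The function names are mine.

```python
from math import comb

def at_least_one(draws, copies, deck=60):
    """Chance of seeing at least 1 copy among `draws` cards."""
    return 1 - comb(deck - copies, draws) / comb(deck, draws)

def extra_cards_to_match(fewer, more, cards_seen, deck=60):
    """Smallest number of extra cards that lets `fewer` copies match
    the odds that `more` copies have at `cards_seen`."""
    target = at_least_one(cards_seen, more, deck)
    extra = 0
    while at_least_one(cards_seen + extra, fewer, deck) < target:
        extra += 1
    return extra

# Turn 5 on the play: 7 card opener + 4 draws = 11 cards seen.
print(extra_cards_to_match(fewer=2, more=3, cards_seen=11))
```

Swapping in other counts (say a 3-of versus a 4-of) shows how much card draw it takes to cover for trimming a copy.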

Land!

Supporting calculation: for turn n, P = 1 - HYPGEOMDIST(0, cards seen, land count in deck, 60 card deck) - HYPGEOMDIST(1, ...) - ... - HYPGEOMDIST(n-1, ...), where cards seen starts at the 7 card opener and grows by 1 for each extra turn.

This is why we care about the hypergeometric distribution. Playing spells on time means having land on time. The table above will show you how likely you are to have a land to play on every turn up to a given turn (at least n lands by turn n), assuming no extra card draw and a 7 card opening hand.

Note that this table does not take into account any flooding mitigation. If you draw 7 land in your opener, the table considers that a success since you've drawn at least one.

The table shows about what we'd expect. Higher land count means drawing more land, which means playing more of your spells on curve. Pretty intuitive, I think.

What's surprising about this table is what happens between the "low aggro" land count of 22 and the "accepted" land count of 24, especially around turn 5. With a 22 land deck, you'll see that 5th land on turn 5 36.7% of the time, or roughly once in 3 games. With a 24 land deck, you'll see that 5th land 46.7% of the time, or roughly once every other game. This is a big difference from a consistency perspective, so please consider how many land you run carefully. Cutting 1 to wedge in an extra spell might hurt you in the long run, even if you don't notice it.
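As a sketch of how the land numbers can be checked, the Python below assumes the table counts a turn as a success when you have seen at least as many lands as the turn number, on the play, with a 7 card opener. Under that assumption it lands on the 36.7% and 46.7% figures quoted above. The function name is made up for this example.

```python
from math import comb

def land_on_curve(lands, turn, deck=60, opener=7, on_the_play=True):
    """Chance of having seen at least `turn` lands by turn `turn`."""
    seen = opener + (turn - 1) + (0 if on_the_play else 1)
    # Count the hands with too few lands (0 through turn - 1), then subtract.
    too_few = sum(comb(lands, k) * comb(deck - lands, seen - k)
                  for k in range(turn))
    return 1 - too_few / comb(deck, seen)

for lands in (22, 24):
    print(lands, f"{land_on_curve(lands, 5):.1%}")
```

Looping `lands` over 20 through 26 and `turn` over 1 through 6 gives a quick feel for what each extra land buys you.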

How much land you choose to run is mostly dictated by your top end spells. If you really need that Hellrider on turn 4, play enough land that you'll see it on turn 4 more often than not. The break-even point is 50%, so I wouldn't consider playing a Hellrider Sligh deck with any less than 22 land. If I want to play my Thundermaw Hellkite on curve, I'd start at 25. You'll notice that our friendly neighborhood red mage, Zemanjaski, ran 25 land in his 4x Hellkite "Sledgehammer" deck. He may actually be a robot, because that's excellent land optimization when it comes to top end curve considerations.

Multivariate Hypergeometric Distributions

Holy crap, we're about to go really deep here. The tables above are really nice for calculating land drops, but what about deeper analysis where you want to draw more than 1 type of card? That's where the multivariate hypergeometric distribution comes in.

The multivariate hypergeometric distribution (abbreviated MHD from here) allows you to determine the probability that a given sample of elements from a population will contain m elements of type x and n elements of type y. This distribution can be extended to any number of elements and any number of types.

To really understand the MHD, we need to delve deeper into a mathematical function known as the binomial coefficient. Commonly abbreviated nCr on calculators, the binomial coefficient allows you to determine how many ways you can choose k items from a sample population of n. It is commonly written shorthand as (n choose k), and that's the form that I'll use here.

Example: How many possible combinations of 1 Lightning Bolt can I draw from a population of 4 Lightning Bolts?

# of combinations = (4 choose 1)

# of combinations = 4

Example: How many possible combinations of 2 Lightning Bolts can I draw from a population of 4 Lightning Bolts?

# of combinations = (4 choose 2)

# of combinations = 6

Once we're comfortable with the (n choose k) form, we can investigate how to calculate a multivariate hypergeometric distribution.
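If your calculator doesn't have an nCr button, Python's standard library does the same job with `math.comb`. The sketch below also writes the factorial formula out longhand (my own helper, `choose`) so you can see what the function is computing.

```python
from math import comb, factorial

def choose(n, k):
    """n! / (k! * (n - k)!), written out to show what nCr computes."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(choose(4, 1), choose(4, 2))     # the two Lightning Bolt examples
print(choose(60, 7) == comb(60, 7))   # the longhand version agrees with math.comb
```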

A probability is, at its most basic, a ratio of possibilities. When we calculate a draw probability in Magic, we are simply comparing how many ways we draw what we want and how many possible draws the population could produce. If the ratio is high, we can be more inclined to get what we want more often than not.

Example: In a deck of 4 Lightning Bolts and 4 Mountains, how many possible ways can we draw a Mountain AND a Lightning Bolt?

# of combinations of 1 Mountain = (4 choose 1)

# of combinations of 1 Mountain = 4

# of combinations of 1 Lightning Bolt = (4 choose 1)

# of combinations of 1 Lightning Bolt = 4

This makes sense. If you numbered each Mountain and each Lightning Bolt, you'd find that you can select 1 from the pile 4 different ways.

# of combinations of 1 Mountain and 1 Lightning Bolt = (4 choose 1) * (4 choose 1)

# of combinations of 1 Mountain and 1 Lightning Bolt = 16

This makes some sense. If you numbered each Lightning Bolt and Mountain and made individual, unique pairs of them, you'd get 16.

Example: How many unique configurations of 2 cards could I draw from a deck of 4 Lightning Bolts and 4 Mountains?

# of combinations = (8 choose 2)

# of combinations = 28

Lightning Bolt + Mountain = BFF?

It's fun to know that we can draw 16 unique configurations of Lightning Bolt and Mountain if we draw 2, but how does that help me? If we remember that probability is simply a ratio of what we want to what we might get, we can start making some calculations.

Example: What is the probability of drawing a Lightning Bolt and a Mountain from the deck above?

P = # of ways to get what we want / # of possible combinations that we might get

P = 16 / 28

P = 0.571 (57.1%)

Cool! If we draw 2 cards from the top of our 8 card deck, we'll get a hand to burn our opponent immediately 57.1% of the time.

This is a MHD, but we approached it intuitively instead of jumping directly into the formula.

MHD and you!

To calculate the probability of drawing 1 Lightning Bolt and 1 Mountain from the deck above, we form an equation like this.

P = (n1 choose k1) * (n2 choose k2) / (n_total choose k_total)

P = (4 choose 1) * (4 choose 1) / (8 choose 2)

P = 4 * 4 / 28

P = 0.571 (57.1%)

This is the same result that we calculated above, but all wrapped up into one handy formula.

To form the fraction, we multiply the number of ways to draw the first thing we want by the number of ways to draw the other thing we want, then divide by the number of possible combinations the population could give us when we sample it.
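Here's a minimal Python sketch of that formula, generalized to any number of card types. The function name `mhd` and the list-of-pairs input are my own choices. Each pair is (copies in the deck, copies wanted), and the pairs are assumed to cover the whole deck.

```python
from math import comb, prod

def mhd(groups):
    """groups: list of (copies_in_deck, copies_wanted) pairs covering the deck."""
    n_total = sum(n for n, _ in groups)   # deck size
    k_total = sum(k for _, k in groups)   # cards drawn
    ways_wanted = prod(comb(n, k) for n, k in groups)
    return ways_wanted / comb(n_total, k_total)

# 1 Lightning Bolt and 1 Mountain from the 8 card deck.
print(round(mhd([(4, 1), (4, 1)]), 3))
```

Because the deck size and hand size are derived from the groups themselves, leaving out a group (like "everything else") quietly changes the question being asked, so it pays to list every card type.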

MHD and mana bases

Let's take the following sample deck and do some analysis on it -

[deck]Spells

4 Pillar of Flame

4 Searing Spear

Creatures

4 Stromkirk Noble

4 Rakdos Cackler

2 Stonewright

4 Ash Zealot

4 Gore-House Chainwalker

4 Pyreheart Wolf

4 Falkenrath Aristocrat

Lands

18 Mountain

4 Blood Crypt

2 Hellion Crucible[/deck]

It's a nice 2-color red deck with a minimal black splash for Falkenrath Aristocrat. Hasty and explosive, we like it! It will also let us do some tricky things with the MHD to evaluate whether we'll have mana on time to cast our spells.

Example: On the play, what is the probability of drawing a Stromkirk Noble in your opening hand and having no untapped turn 1 red sources to cast it?

We'll form our numerator from 3 distinct combinations. I'll use the form C_(description) to represent an (n choose k) combination.

C_stromkirk = (4 choose 1)

C_untapped red = (22 choose 0)

C_everything else = (34 choose 6)

C_possible draws = (60 choose 7)

C_stromkirk = (4 choose 1) seems pretty obvious. We have 4 in the deck, we want to have 1 in the opener. We can do this 4 ways.

C_stromkirk = 4

C_untapped red = (22 choose 0) seems a little strange, right? We limited ourselves to turn 1 untapped red, so right away our Hellion Crucibles can't help us. We have 4 Blood Crypts and 18 Mountains, so 22 untapped red sources total. We're calculating failure here, so we want to choose 0.

C_untapped red = 1

C_everything else = (34 choose 6) is the least intuitive part of the calculation. If we want our opening hand to look like 1 Stromkirk Noble and 0 red-producing lands, it must have 6 more cards. We want to be sure that we're not drawing more Nobles or any of those red sources, so we will fill the rest of the hand with whatever is left in the deck. 60 - 4 - 22 = 34. We fill the remaining 6 slots in our hand from that population of 34 cards with (34 choose 6).

C_everything else = 1,344,904

C_possible draws = (60 choose 7) is the number of ways that we could draw an opening hand of 7. In this case, it's BIG.

C_possible draws = 386,206,920

We multiply the combinations of things that we want and divide by the total number of possible combinations in the sample, and out pops a probability!

P = C_stromkirk * C_untapped red * C_everything else / C_possible draws

P = 4 * 1 * 1,344,904 / 386,206,920

P = 0.0139 (1.39%)

This is great news! The odds of drawing 1 Stromkirk Noble and not having the mana to cast him on turn 1 are 1.39%. I can sleep at night with this low-risk deck design.

As a check, you should always find that the ns in your numerator add up to the n in your denominator. Similarly, the ks in the numerator should add up to the k in the denominator. To check our math, 4 + 22 + 34 = 60 and 1 + 0 + 6 = 7.
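The add-up check is easy to automate. This Python sketch redoes the Stromkirk Noble example and refuses to run if the n values or k values don't sum correctly. The dictionary layout and names are my own.

```python
from math import comb, prod

# (cards in deck, cards wanted in the opening hand) for each group
groups = {
    "stromkirk": (4, 1),
    "untapped red": (22, 0),
    "everything else": (34, 6),
}
deck_size, hand_size = 60, 7

# The sanity check from the text: the n values and the k values must add up.
assert sum(n for n, _ in groups.values()) == deck_size
assert sum(k for _, k in groups.values()) == hand_size

p = prod(comb(n, k) for n, k in groups.values()) / comb(deck_size, hand_size)
print(f"{p:.4f}")
```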

Example: On the play, what is the probability of drawing an Ash Zealot by turn 2 and not having the mana to cast it?

There are three ways to fail here, so we have to be wary of the OR condition when evaluating probabilities. When we can fail in one way OR any other number of ways, the probabilities add together.

P = P_1 + P_2 + P_3

P_1 represents the chance of drawing an Ash Zealot and no land.

P_2 represents the chance of drawing an Ash Zealot and only 1 land.

P_3 represents the chance of drawing an Ash Zealot and 2 lands of the wrong kind (1 red source and 1 Hellion Crucible)

Failure mode #1 - no lander

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land = (24 choose 0) since we're looking to hit 0 land in our 24 land deck.

C_land = 1

C_everything else = (32 choose 7) since we're on turn 2! We will see 8 cards by turn 2 and need to fill the rest of the hand with them.

C_everything else = 3,365,856

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_1 = (4 choose 1) * (24 choose 0) * (32 choose 7) / (60 choose 8)

P_1 = 0.0053 (0.53%)

Math check: 4 + 24 + 32 = 60, 1 + 0 + 7 = 8.

Failure mode #2 - 1 lander

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land = (24 choose 1) since we're looking to hit just 1 land in our 24 land deck.

C_land = 24

C_everything else = (32 choose 6) since we will see 8 cards by turn 2 and we need to fill the remaining 6 slots in the hand.

C_everything else = 906,192

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_2 = (4 choose 1) * (24 choose 1) * (32 choose 6) / (60 choose 8)

P_2 = 0.034 (3.4%)

Math check: 4 + 24 + 32 = 60, 1 + 1 + 6 = 8.

Failure mode #3 - right mana, wrong colors

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land_red = (22 choose 1) since we're looking to hit just 1 red land

C_land_red = 22

C_land_other = (2 choose 1) since we have 2 non-red lands in the deck and we want to choose 1

C_land_other = 2

C_everything else = (32 choose 5) since we will see 8 cards by turn 2 and we need to fill the remaining 5 slots in the hand.

C_everything else = 201,376

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_3 = (4 choose 1) * (22 choose 1) * (2 choose 1) * (32 choose 5) / (60 choose 8)

P_3 = 0.0139 (1.39%)

Math check: 4 + 22 + 2 + 32 = 60, 1 + 1 + 1 + 5 = 8.

Now that we've analyzed our failure modes, let's look at the probability of Ash Zealot languishing in your hand.

P_sad Zealot = P_1 + P_2 + P_3

P_sad Zealot = 0.53% + 3.4% + 1.39%

P_sad Zealot = 5.32%

The odds are good that your Zealot will be happy and swinging on turn 2.
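To double check the three failure modes without a calculator, here is a rough Python sketch that reuses the multivariate formula for each one and adds them up. The `mhd` helper is the same made-up function idea as before: each pair is (cards in deck, cards wanted among the 8 seen by turn 2).

```python
from math import comb, prod

def mhd(groups):
    """groups: list of (copies_in_deck, copies_wanted) pairs covering the deck."""
    n_total = sum(n for n, _ in groups)
    k_total = sum(k for _, k in groups)
    return prod(comb(n, k) for n, k in groups) / comb(n_total, k_total)

p_1 = mhd([(4, 1), (24, 0), (32, 7)])           # Zealot, no land
p_2 = mhd([(4, 1), (24, 1), (32, 6)])           # Zealot, exactly 1 land
p_3 = mhd([(4, 1), (22, 1), (2, 1), (32, 5)])   # Zealot, 1 red source + 1 Hellion Crucible
p_sad = p_1 + p_2 + p_3                         # OR condition: the failure modes add
print(f"{p_1:.2%} {p_2:.2%} {p_3:.2%} total {p_sad:.2%}")
```

Adding the unrounded values gives a total a hair different from adding the rounded percentages, which is normal and nothing to worry about at this precision.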

It's a work-in-progress, and will be edited in place. I've included a copy of the spreadsheet on Google Docs, which I'll keep updated as I tweak it. I plan to do some additional calculations regarding land and mana.

Magic is a game of strategy, but it is affected strongly by random chance. Good deckbuilders are able to assemble a pile of 60 cards that provide a good chance of delivering effective and efficient plays during the turns that the deck intends to be strong.

Probability

From Wikipedia, probability is defined as a measure of the expectation that an event will occur or a statement is true. Probabilities are given a value between 0 (will not occur) and 1 (will occur). The higher the probability of an event, the more certain we are that the event will occur.

Every time we draw a card, we can calculate the probability of the card being one that we want. The only information we need to know prior to the draw is the number of cards in the deck and the number of cards within that define a "success".

Single Draw Probability

Since we're red mages, we all know and love this card. I've included the original Revised Lightning Bolt art, because it was the very first one I ever saw when I was 12 years old. I fell in love with 3 damage for [mana]R[/mana] and I've been a red mage ever since.

In my examples, I'll generally use Lightning Bolt as the card that we want to draw from the deck.

Example: If we have a pile of 40 cards, 4 of which are Lightning Bolt, what is the probability of drawing a Lightning Bolt? Probability, defined as P equals the number of successes divided by the total size of the population.

P = ( # successes ) / ( Population size )

P = ( 4 Lightning Bolts ) / ( 40 cards )

P = 4 / 40

P = 0.1 (or 10% - multiply any probability by 100 to find the percentage)

If we shuffle the same pile of 40 cards and draw 1 card, we should draw a Lightning Bolt 1 out of 10 times if we repeat the process enough times for random variance to wash out.

Multiple Draw Probability

Example: What if we take the same pile of 40 cards (4 Lightning Bolts within) and draw 2 cards instead? What are the odds of drawing a Lightning Bolt?

For the first draw, the probability is the

same (P = 10%), but the second draw changes depending on the result of the first draw, since we're not putting the first draw card back into the deck.

P_draw1 = 4 / 40

P_draw1 = 0.1 (10%)

Now we have 1 card in hand and 39 cards in the deck. If the first card we drew wasn't a Lightning Bolt, we still have 4 left in the deck.

P_draw2 = 4 / 39

P_draw2 = 0.103 (10.3%)

When we miss with the first draw, the probability of drawing a Lightning Bolt on the second draw goes up. Nice!

What if we pull a Lightning Bolt on the first draw? What happens to our probability of seeing a Lightning Bolt on the second draw, too?

P_draw2 = 3 / 39 (since there are only 3 left in the deck now)

P_draw2 = 0.077 (7.7%)

This process can be repeated by hand for any number of deck sizes, card multiples within, and number of draw steps. If only there was an easy way to do this without sheets of engineering paper and a slide rule...

The Hypergeometric Distribution

Luckily, all this has been worked out by smart mathematicians and statisticians who wanted to find a general formula for probability of selecting arbitrary numbers of items from a population.

From Wikipedia, the Hypergeometric Distribution is defined as a discrete probability distribution that describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.

Oh, that's not crystal clear? Those darn mathematicians, always complicating things! In plain English, this means that the hypergeometric distribution models the probability of drawing an item (such as a Magic card) from a population (such as a deck) without replacing the drawn item.

When we want to figure out probabilities of drawing a card from a deck of cards, we use the hypergeometric distribution.

I'll leave the math proof up to the mathematicians, but you can verify it on the Wikipedia page linked above.

The Hypergeometric Distribution and YOU!

If you have access to Microsoft Office, start up your copy of Excel and you can begin using a built-in function called HYPGEOMDIST(). The format of this function, formatted for our Magic application, is HYPGEOMDIST(# of cards we want to draw, number of cards drawn, # of card in the deck, total cards in the deck).

For the earlier example of drawing 1 Lightning Bolt from a pile of 40 cards, we'd use the following function: HYPGEOMDIST(1,1,4,40).

Then press enter. The cell will change to the result of the calculation.

Look at that, 10%! That seems like a lot of work for a single draw calculation, but the hypergeometric calculation really shines when you start looking at multiple draws involving different parameters. Remember the second example where we wanted to draw 2 cards and see how many times we'd find a Lightning Bolt? We end up with 3 possibility for the draw result: We'll either draw 0 Lightning Bolts, 1 Lightning Bolt, or 2 Lightning Bolts. The probability of each result is quite different, but we'd have to do a lot of calculations (or a lot of experiments) to find out the probability of each result.

Using the hypergeometric function saves us a lot of time.

Example: Want to know the probability of drawing 0 Lightning Bolts if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(0,2,4,40)

P = 0.808 (80.8%)

Example: Want to know the probability of drawing 1 Lightning Bolt if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(1,2,4,40)

P = 0.185 (18.5%)

Example: Want to know the

probability of drawing 2 Lightning Bolts if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = HYPGEOMDIST(2,2,4,40)

P = 0.008 (0.8%)

These results are pretty intuitive. If you draw 2 cards, you won't draw your Lightning Bolt very often, you'll sometimes draw 1, and you'll extremely rarely draw 2.

I rounded the results a little bit, but if you look at the raw numbers, something interesting happens.

P(0) = 0.8077

P(1) = 0.1846

P(2) = 0.0077

The probabilities all add up to 1. This isn't nearly as intuitive, but it means that we've accurately modeled the probabilities of every possible combination of Lightning Bolt and non-Lightning Bolt if we draw 2 cards.

This saves us some time when building spreadsheets, because instead of adding up a bunch of stuff, we can just start with 1 and subtract the values we don't want. Whatever is left is the answer we want.

Example: What is the probability of drawing at least one Lightning Bolts if you draw 2 cards from a 40 card deck containing 4 Lightning Bolts?

P = 1 - HYPGEOMDIST(0,2,4,40)

P = 0.192 (19.2%)

This turns out to be quite a useful calculation! In Magic, we tend to want to know "how soon will I see X card?" and the hypergeometric distribution lets us build tables to find out how soon you'll see it, statistically.

Now, when the goofballs at your LGS give you bad advice like "you should just run 2, bro, so you won't see it in your opener! It'll come down on turn 5 no problem!" you'll be able to see exactly how often you'll actually draw it in your opener and whether that's OK.

Pretty graphics!

I've built some nice tables to help us figure out how often we'll see cards on different turns, depending on whether we're on the play or the draw and how many cards are in the deck.

Chance to draw at least 1

Supporting calculation: P = 1 - HYPGEOMDIST(0, # in deck , 7 card opener, 60 card deck), ascending to include 1 extra card per draw. Percentages on the draw start with 1 extra card.

Chance to draw at least 2

Supporting calculation: P = 1 - HYPGEOMDIST(0, 7 card opener, # in deck, 60 card deck) - HYPGEOMDIST(1, 7 card opener, # in deck, 60 card deck), with the number of cards drawn going up by 1 for each turn after the first.

This one is a bit more complex, since we're trying to hit 2 of the same card, not just 1. We subtract P(0) and P(1) from 1, which gives us the probability of hitting at least 2.

We could repeat the same calculation to hit 3 of the same card, using P = 1 - P(0) - P(1) - P(2), but this percentage is incredibly small.
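Here's a rough Python sketch of how a row in either table could be generated. The function names (`at_least`, `cards_seen`, `table_row`) are made up for this example, and I'm assuming a 60 card deck, a 7 card opener, no mulligans, and that the player on the play skips their turn 1 draw.

```python
from math import comb

def at_least(k, copies, seen, deck=60):
    """Chance of at least k copies among the `seen` cards drawn so far."""
    # Add up the ways to see fewer than k copies, then take the complement
    fewer = sum(
        comb(copies, i) * comb(deck - copies, seen - i)
        for i in range(k)
    )
    return 1 - fewer / comb(deck, seen)

def cards_seen(turn, on_the_play=True):
    """7 card opener plus one draw per turn; on the play there is no turn 1 draw."""
    return 7 + turn - (1 if on_the_play else 0)

def table_row(k, copies, turns=range(1, 11), on_the_play=True):
    """One row of a table: percentages for each turn, rounded to 1 decimal."""
    return [
        round(100 * at_least(k, copies, cards_seen(t, on_the_play)), 1)
        for t in turns
    ]

print("2-of, at least 1, on the play:", table_row(1, 2))
print("3-of, at least 2, on the draw:", table_row(2, 3, on_the_play=False))
```

Changing `k` to 3 gives the "at least 3" version without any new formulas, since the loop handles however many terms need to be subtracted.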

Using the tables

Since we define what the "success" is, we can use a row in these tables for multiple purposes. For example, if we play a deck with 12 1-drops (like Br zombies), what is the likelihood of drawing 2 in an opening hand? Assuming we don't care which of the 1-drops they are, look at the 12 row, turn 1 on the 2nd chart. We have a 42.8% chance of drawing at least 2 1-drops in our opening hand on the play. Pretty decent odds. What if we cut some of our maindeck creatures and move to an 8 1-drop build? Now we have a 23.2% chance of seeing 2 in our opening hand.

After building these tables, I immediately stopped putting 1-ofs in my deck. For aggro decks, we're primarily concerned with turns 1-5, and the probability of seeing a 1-of on the play by turn 5 is 18.3%, which is pretty miserable. On the draw, the probability improves to 20%, but that's not really much of an improvement. If I put it in my deck, I want to see it.

If I want to see it, why would I only want to see it in 1 out of 5 games? If the game goes long, I still only stand a 26.7% chance of seeing a 1-of by turn 10 on the play, 28.3% on the draw. This seems like bad planning in a deck with no tutors.

The minimum I'll run of anything now is 2. By turn 5, I'll see a 2-of 33.6% of the time on the play and 36.3% on the draw. 1 in 3 games isn't terrible. If the game goes long, my probability of seeing a 2-of by turn 10 is 46.6% on the play, 49% on the draw. So maybe every other game that goes long will cough up that 2-of.

Thundermaw Hellkites in my opener, ZOMG!

This dude is almost as good as a Lightning Bolt, right? The art is similar at least, and he's way better than Shivan Dragon.

Before I built the table, I was pretty nervous about running 3-4 Thundermaw Hellkites in my deck because "what if I draw it in my opener? Might as well have mulliganed!" Enough emotional response, let's look at the tables. If I put 3 Hellkites in my deck, I'll see them in my opener 31.5% of the time on the play, 35.4% on the draw. 1 out of 3 games isn't backbreaking.

What about the nightmare scenario, seeing a Hellkite in the opener and then drawing one on turn 2? Look at the tables! With 3 Hellkites, I stand a 4.4% chance of seeing 2 of them by turn 2 on the play, 5.6% on the draw. This is hardly worth worrying about, and running 3-of Hellkite lets me see them more regularly. If I put them in the deck, I want to see them and beat faces.

So... should I run 3 of everything?

Nope! Some stuff you'd love to see in your opener. 4-of Stromkirk Noble is obvious: you'll see him on turn 1 39.9% of the time on the play, 44.5% on the draw. Roughly every other game, you'll see his smug white face in your opener. If you run 4 Stromkirk Noble and 4 Rakdos Cackler, you'll see at least 1 in the opener 65.4% of the time on the play, 70.6% on the draw. Pretty consistent!

Bigger stuff like a Thundermaw Hellkite should be a 2-of or a 3-of, because you'll minimize the chance to draw them in your opener, but stand a reasonable chance of seeing at least 1 by turn 5 if you're curving out.

Card draw and why we love it

Remember those Underworld Connections I kept raving about? The amazing thing about extra card draw is that we move further along the tables without having to spend extra turns. If we can reliably cast our Underworld Connections or include card drawing spells like Faithless Looting, Wild Guess or Dangerous Wager, we can move over 1 or 2 columns at a time. This increases your chance to see every one of your cards, which will let you run fewer bombs to clog up your opening hand.

Example: If I run 2-of Thundermaw Hellkite, but I stick an Underworld Connections on turn 4, how much do I improve my chances of seeing a Hellkite on curve?

P = 33.6% (Hellkite on turn 5, right on curve)

P(with UC) = 36.3% (1 extra card drawn)

The extra card draw gives us +2.7% chance to see our Hellkite next turn. If the Hellkite was a 3-of, we'd go from 46.2% to 49.5%, an extra +3.3%. Running multiples in the deck will increase the positive effect of extra card draw.

Example: How many extra cards do I need to draw to make a 2-of behave like a 3-of on turn 5?

P(3-of), turn 5 = 46.2%

P(2-of), turn 5 = 33.6%

Move along the 2-of row until you hit ~46%. This happens on turn 10, so you'll need to draw 5 extra cards to make your 2-ofs behave like 3-ofs from a draw percentage standpoint.
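The "move along the row" step can also be done with a short loop. This is only a sketch under the same assumptions as the tables (60 card deck, on the play, 11 cards seen by turn 5); `at_least_one` is a hypothetical helper name.

```python
from math import comb

def at_least_one(copies, seen, deck=60):
    """Chance of seeing at least 1 copy among `seen` cards."""
    return 1 - comb(deck - copies, seen) / comb(deck, seen)

TURN_5_ON_THE_PLAY = 11  # 7 card opener + 4 draws

# What a 3-of delivers by turn 5
target = at_least_one(3, TURN_5_ON_THE_PLAY)

# Keep adding extra cards until the 2-of catches up
extra = 0
while at_least_one(2, TURN_5_ON_THE_PLAY + extra) < target:
    extra += 1

print(f"3-of by turn 5: {target:.1%}")
print(f"extra cards needed for a 2-of to match: {extra}")
```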

Land!

Supporting calculation: for turn n, P = 1 - HYPGEOMDIST(0, cards seen, land count in deck, 60 card deck) - HYPGEOMDIST(1, ...) - ... - HYPGEOMDIST(n-1, ...), where cards seen starts at the 7 card opener and goes up by 1 each turn.

This is why we care about the hypergeometric distribution. Playing spells on time means having land on time. The table above shows how likely you are to have a land to play on every turn up to and including a given turn (at least n lands by turn n), assuming no extra card draw and a 7 card opening hand.

Note that this table does not take into account any flooding mitigation. If you draw 7 land in your opener, the table considers that a success since you've drawn at least one.

The table shows about what we'd expect. Higher land count means drawing more land, which means playing more of your spells on curve. Pretty intuitive, I think.

What's surprising about this table is what happens between the "low aggro" land count of 22 and the "accepted" land count of 24, especially around turn 5. With a 22 land deck, you'll see that 5th land on turn 5 36.7% of the time, or roughly once in 3 games. With a 24 land deck, you'll see that 5th land 46.7% of the time, or roughly once every other game. This is a big difference from a consistency perspective, so please consider carefully how many land you run. Cutting 1 to wedge in an extra spell might hurt you in the long run, even if you don't notice it.
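For anyone who wants to reproduce the land numbers, here's a minimal Python sketch. It assumes the table means "at least n lands by turn n" on the play, with a 7 card opener, no mulligans, and no extra card draw. The function name `land_on_curve` is something I made up for this example.

```python
from math import comb

def land_on_curve(lands, turn, deck=60, on_the_play=True):
    """Chance of having drawn at least `turn` lands by that turn."""
    seen = 7 + turn - (1 if on_the_play else 0)
    # Count the hands holding fewer lands than the turn number
    misses = sum(
        comb(lands, k) * comb(deck - lands, seen - k)
        for k in range(turn)
    )
    return 1 - misses / comb(deck, seen)

for lands in (22, 23, 24, 25):
    row = [f"{100 * land_on_curve(lands, t):.1f}" for t in range(1, 7)]
    print(f"{lands} lands:", row)
```

Swapping in a different land count or passing `on_the_play=False` is an easy way to test a specific build before sleeving it up.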

How much land you choose to run is mostly dictated by your top end spells. If you really need that Hellrider on turn 4, play enough land that you'll see it on turn 4 more often than not. The break-even point is 50%, so I wouldn't consider playing a Hellrider Sligh deck with fewer than 22 land. If I want to play my Thundermaw Hellkite on curve, I'd start at 25. You'll notice that our friendly neighborhood red mage, Zemanjaski, ran 25 land in his 4x Hellkite "Sledgehammer" deck. He may actually be a robot, because that's excellent land optimization when it comes to top end curve considerations.

Multivariate Hypergeometric Distributions

Holy crap, we're about to go really deep here. The tables above are really nice for calculating land drops, but what about deeper analysis where you want to draw more than 1 type of card? That's where the multivariate hypergeometric distribution comes in.

The multivariate hypergeometric distribution (abbreviated MHD from here) allows you to determine the probability that a given sample of elements from a population will contain m elements of type x and n elements of type y. This distribution can be extended to any number of elements and any number of types.

To really understand the MHD, we need to delve deeper into a mathematical function known as the binomial coefficient. Commonly abbreviated nCr on calculators, the binomial coefficient tells you how many ways you can choose k items from a population of n. It is commonly written shorthand as (n choose k), and that's the form that I'll use here.

Example: How many possible combinations of 1 Lightning Bolt can I draw from a population of 4 Lightning Bolts?

# of combinations = (4 choose 1)

# of combinations = 4

Example: How many possible combinations of 2 Lightning Bolts can I draw from a population of 4 Lightning Bolts?

# of combinations = (4 choose 2)

# of combinations = 6
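If you want to check these by hand or in code, (n choose k) is n! / (k! * (n-k)!). Here's a small Python sketch; `choose` is a hypothetical helper written out from that definition, compared against the standard library's `math.comb` (Python 3.8 or newer).

```python
from math import comb, factorial

def choose(n, k):
    """(n choose k) straight from the factorial definition."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# 1 Lightning Bolt out of 4, then 2 Lightning Bolts out of 4
for n, k in [(4, 1), (4, 2)]:
    print(f"({n} choose {k}) = {choose(n, k)} (math.comb agrees: {comb(n, k) == choose(n, k)})")
```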

Once we're comfortable with the (n choose k) form, we can investigate how to calculate a multivariate hypergeometric distribution.

A probability is, at its most basic, a ratio of possibilities. When we calculate a draw probability in Magic, we are simply comparing how many ways we can draw what we want against how many possible draws the population could produce. The higher the ratio, the more often we'll get what we want.

Example: In a deck of 4 Lightning Bolts and 4 Mountains, how many possible ways can we draw a Mountain AND a Lightning Bolt?

# of combinations of 1 Mountain = (4 choose 1)

# of combinations of 1 Mountain = 4

# of combinations of 1 Lightning Bolt = (4 choose 1)

# of combinations of 1 Lightning Bolt = 4

This makes sense. If you numbered each Mountain and each Lightning Bolt, you'd find that you can select 1 from the pile 4 different ways.

# of combinations of 1 Mountain and 1 Lightning Bolt = (4 choose 1) * (4 choose 1)

# of combinations of 1 Mountain and 1 Lightning Bolt = 16

This makes some sense. If you numbered each Lightning Bolt and Mountain and made individual, unique pairs of them, you'd get 16.

Example: How many unique configurations of 2 cards could I draw from a deck of 4 Lightning Bolts and 4 Mountains?

# of combinations = (8 choose 2)

# of combinations = 28

Lightning Bolt + Mountain = BFF?

It's fun to know that we can draw 16 unique configurations of Lightning Bolt and Mountain if we draw 2, but how does that help me? If we remember that probability is simply a ratio of what we want to what we might get, we can start making some calculations.

Example: What is the probability of drawing a Lightning Bolt and a Mountain from the deck above?

P = # of ways to get what we want / # of possible combinations that we might get

P = 16 / 28

P = 0.571 (57.1%)

Cool! If we draw 2 cards from the top of our 8 card deck, we'll get a hand to burn our opponent immediately 57.1% of the time.

This is a MHD, but we approached it intuitively instead of jumping directly into the formula.

MHD and you!

To calculate the probability of drawing 1 Lightning Bolt and 1 Mountain from the deck above, we form an equation like this.

P = (n1 choose k1) * (n2 choose k2) / (n_total choose k_total)

P = (4 choose 1) * (4 choose 1) / (8 choose 2)

P = 4 * 4 / 28

P = 0.571 (57.1%)

This is the same result that we calculated above, but all wrapped up into one handy formula.

To form the fraction, we multiply the number of ways to get one thing we want by the number of ways to get the other thing we want, then divide by the number of possible combinations the population could give us when we sample it.
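The formula generalizes to any number of card types, so it's handy to wrap it in a function. This is a minimal sketch, not a library routine: `mhd` is a name I picked, and it assumes the (n, k) pairs you pass in cover the whole deck, so the deck size and sample size are just the sums.

```python
from math import comb, prod

def mhd(groups):
    """Multivariate hypergeometric probability.

    groups: list of (n, k) pairs, one per card type, meaning
    "draw exactly k cards from the n cards of this type".
    The pairs are assumed to account for every card in the deck.
    """
    total_n = sum(n for n, _ in groups)   # deck size
    total_k = sum(k for _, k in groups)   # cards drawn
    ways_wanted = prod(comb(n, k) for n, k in groups)
    return ways_wanted / comb(total_n, total_k)

# 1 Lightning Bolt and 1 Mountain from a deck of 4 Bolts and 4 Mountains
p = mhd([(4, 1), (4, 1)])
print(f"P = {p:.3f}")
```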

MHD and mana bases

Let's take the following sample deck and do some analysis on it -

[deck]Spells

4 Pillar of Flame

4 Searing Spear

Creatures

4 Stromkirk Noble

4 Rakdos Cackler

2 Stonewright

4 Ash Zealot

4 Gore-House Chainwalker

4 Pyreheart Wolf

4 Falkenrath Aristocrat

Lands

18 Mountain

4 Blood Crypt

2 Hellion Crucible[/deck]

It's a nice 2-color red deck with a minimal black splash for Falkenrath Aristocrat. Hasty and explosive, we like it! It will also let us do some tricky things with the MHD to evaluate whether we'll have mana on time to cast our spells.

Example: On the play, what is the probability of drawing a Stromkirk Noble in your opening hand and having no untapped turn 1 red sources to cast it?

We'll form our numerator from 3 distinct combinations. I'll use the form C_(description) to represent an (n choose k) combination.

C_stromkirk = (4 choose 1)

C_untapped red = (22 choose 0)

C_everything else = (34 choose 6)

C_possible draws = (60 choose 7)

C_stromkirk = (4 choose 1) seems pretty obvious. We have 4 in the deck, we want to have 1 in the opener. We can do this 4 ways.

C_stromkirk = 4

C_untapped red = (22 choose 0) seems a little strange, right? We limited ourselves to turn 1 untapped red, so right away our Hellion Crucibles can't help us. We have 4 Blood Crypts and 18 Mountains, so 22 untapped red sources total. We're calculating failure here, so we want to choose 0.

C_untapped red = 1

C_everything else = (34 choose 6) is the least intuitive part of the calculation. If we want our opening hand to look like 1 Stromkirk Noble and 0 red-producing lands, it must have 6 more cards. We want to be sure that we're not drawing more Nobles or any of those red sources, so we will fill the rest of the hand with whatever is left in the deck. 60 - 4 - 22 = 34. We fill the remaining 6 slots in our hand from that population of 34 cards with (34 choose 6).

C_everything else = 1,344,904

C_possible draws = (60 choose 7) is the number of ways that we could draw an opening hand of 7. In this case, it's BIG.

C_possible draws = 386,206,920

We multiply the combinations of things that we want and divide by the total number of possible combinations in the sample, and out pops a probability!

P = C_stromkirk * C_untapped red * C_everything else / C_possible draws

P = 4 * 1 * 1,344,904 / 386,206,920

P = 0.0139 (1.39%)

This is great news! The odds of opening with a Stromkirk Noble and no untapped red source to cast him on turn 1 are just 1.39%. I can sleep at night with this low-risk deck design.

As a check, you should always find that the ns in your numerator add up to the n in your denominator. Similarly, the ks in the numerator should add up to the k in the denominator. To check our math, 4 + 22 + 34 = 60 and 1 + 0 + 6 = 7.
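Here's the same Stromkirk Noble calculation as a short Python sketch, with the n and k checks written as assertions so a typo in the groups gets caught immediately. The group labels and variable names are just for illustration.

```python
from math import comb, prod

DECK_SIZE, HAND_SIZE = 60, 7

# (n in deck, k wanted in hand) for each group of cards
groups = {
    "stromkirk": (4, 1),
    "untapped red": (22, 0),
    "everything else": (34, 6),
}

# The ns must add up to the deck, the ks to the hand
assert sum(n for n, _ in groups.values()) == DECK_SIZE
assert sum(k for _, k in groups.values()) == HAND_SIZE

ways_wanted = prod(comb(n, k) for n, k in groups.values())
possible_draws = comb(DECK_SIZE, HAND_SIZE)
p = ways_wanted / possible_draws

print(f"{ways_wanted:,} / {possible_draws:,} = {p:.4f}")
```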

Example: On the play, what is the probability of drawing an Ash Zealot by turn 2 and not having the mana to cast it?

There are three ways to fail here, so we have to be wary of the OR condition when evaluating probabilities. When we can fail in one way OR another, and no two of those ways can happen in the same game, the probabilities add together.

P = P_1 + P_2 + P_3

P_1 represents the chance of drawing an Ash Zealot and no land.

P_2 represents the chance of drawing an Ash Zealot and only 1 land.

P_3 represents the chance of drawing an Ash Zealot and 2 lands of the wrong kind (1 red source and 1 Hellion Crucible)

Failure mode #1 - no lander

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land = (24 choose 0) since we're looking to hit 0 land in our 24 land deck.

C_land = 1

C_everything else = (32 choose 7) since we're on turn 2! We will see 8 cards by turn 2, so after the Ash Zealot we need to fill the remaining 7 slots from the 32 cards that are neither Ash Zealot nor land.

C_everything else = 3,365,856

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_1 = (4 choose 1) * (24 choose 0) * (32 choose 7) / (60 choose 8)

P_1 = 0.0053 (0.53%)

Math check: 4 + 24 + 32 = 60, 1 + 0 + 7 = 8.

Failure mode #2 - 1 lander

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land = (24 choose 1) since we're looking to hit just 1 land in our 24 land deck.

C_land = 24

C_everything else = (32 choose 6) since we will see 8 cards by turn 2 and we need to fill the remaining 6 slots in the hand.

C_everything else = 906,192

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_2 = (4 choose 1) * (24 choose 1) * (32 choose 6) / (60 choose 8)

P_2 = 0.034 (3.4%)

Math check: 4 + 24 + 32 = 60, 1 + 1 + 6 = 8.

Failure mode #3 - right mana, wrong colors

C_ash zealot = (4 choose 1)

C_ash zealot = 4

C_land_red = (22 choose 1) since we're looking to hit just 1 red land

C_land_red = 22

C_land_other = (2 choose 1) since we have 2 non-red lands in the deck and we want to choose 1

C_land_other = 2

C_everything else = (32 choose 5) since we will see 8 cards by turn 2 and we need to fill the remaining 5 slots in the hand.

C_everything else = 201,376

C_possible draws = (60 choose 8)

C_possible draws = 2,558,620,845

P_3 = (4 choose 1) * (22 choose 1) * (2 choose 1) * (32 choose 5) / (60 choose 8)

P_3 = 0.0139 (1.39%)

Math check: 4 + 22 + 2 + 32 = 60, 1 + 1 + 1 + 5 = 8.

Now that we've analyzed our failure modes, let's look at the probability of Ash Zealot languishing in your hand.

P_sad Zealot = P_1 + P_2 + P_3

P_sad Zealot = 0.53% + 3.4% + 1.39%

P_sad Zealot = 5.32%

The odds are good that your Zealot will be happy and swinging on turn 2.
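To double-check the three failure modes in one go, here's a rough Python sketch. The `mhd` helper and the failure mode labels are my own naming, and the groups are copied straight from the worked examples above (8 cards seen by turn 2 on the play, 60 card deck). Because it adds the unrounded values, the total can differ from the hand-added figure in the last digit.

```python
from math import comb, prod

DECK, SEEN = 60, 8  # on the play, 8 cards seen by turn 2

def mhd(groups):
    """Multivariate hypergeometric probability for a list of (n, k) pairs."""
    # Same sanity check as before: ns cover the deck, ks cover the cards seen
    assert sum(n for n, _ in groups) == DECK
    assert sum(k for _, k in groups) == SEEN
    return prod(comb(n, k) for n, k in groups) / comb(DECK, SEEN)

failure_modes = {
    "no land": [(4, 1), (24, 0), (32, 7)],
    "one land": [(4, 1), (24, 1), (32, 6)],
    "red source plus Hellion Crucible": [(4, 1), (22, 1), (2, 1), (32, 5)],
}

probs = {name: mhd(groups) for name, groups in failure_modes.items()}
p_sad_zealot = sum(probs.values())

for name, p in probs.items():
    print(f"{name}: {p:.2%}")
print(f"sad Zealot: {p_sad_zealot:.2%}")
```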