## Zero determinant strategy for people who can't math good


I myself cannot math. But social dilemma strategies, such as the new set of "extortion" or zero determinant strategies outlined in Press & Dyson's 2012 article (open access here!), are generally chock-full of equations that take some time to sift through if you aren't a game theory and/or math/stats whiz.

I, too, looked through the internet for straightforward interpretations of this fabled zero determinant strategy. I came upon three things:

- This blog post by Mike Shulman has some very nice commentary on Press & Dyson's article. Shulman definitely helps break down some barriers to understanding the paper, but the post is still a bit math-heavy for those who are math-phobic.
- This 2013 paper by Hilbe, Nowak, & Sigmund, which simplifies portions of the extortion strategy equation into specific ratios.
- And the realization that the math isn't too complex if you do it yourself. It comes down to plugging and chugging.

### The Prisoner's Dilemma and strategies

*If you're already familiar with The Prisoner's Dilemma, go ahead and skip this section.*

The Prisoner's Dilemma is a social dilemma game in which you choose either to cooperate with your partner or not to cooperate (that is, to defect against your partner).

The situation is that you and your partner are caught on minor charges by the police. You are brought into separate interrogation rooms. The police want to pin a larger crime on one or both of you. They give you an opportunity to testify against your partner. You can either choose to testify or you can stay silent.

As you can imagine, your partner is also getting this same choice. And therein lies the dilemma: do you cooperate with your partner and stay silent, or do you defect and testify against them?

You can read more about the rules of the game in this other post. Here, I'll focus more on the game theory dynamics.

A true Prisoner's Dilemma payoff matrix has to follow these rules (the first letter is the focal player's action, i.e. *xy* notation):

- DC > CC > DD > CD
- 2(CC) > DC + CD

Using the classic single-letter nicknames for the payoffs (T = temptation for DC, R = reward for CC, P = punishment for DD, S = sucker for CD), the same rules read:

- T > R > P > S
- 2(R) > T+S
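Both rules are easy to check mechanically. Here is a minimal Python sketch (the function name `is_prisoners_dilemma` is my own, not from any library) that takes the four payoffs in the T, R, P, S order used in this post:

```python
def is_prisoners_dilemma(T: float, R: float, P: float, S: float) -> bool:
    """Return True if the payoffs satisfy both Prisoner's Dilemma rules."""
    ordering = T > R > P > S  # DC > CC > DD > CD
    # Mutual cooperation must beat taking turns exploiting each other.
    cooperation_beats_alternating = 2 * R > T + S
    return ordering and cooperation_beats_alternating


print(is_prisoners_dilemma(5, 3, 1, 0))   # True: the classic {5,3,1,0} matrix
print(is_prisoners_dilemma(4, 3, 0, -1))  # True: the {4,3,0,-1} matrix
print(is_prisoners_dilemma(6, 3, 1, 0))   # False: 2R = 6 is not greater than T + S = 6
```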

In a single round, mutual defection (DD) is the game's Nash equilibrium: whatever your partner does, defecting earns you more, so neither player can do better by switching on their own, even though both would be better off cooperating.
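To see why a single round settles on mutual defection, compare your payoff for each of your choices against each choice your partner might make. A short Python sketch, assuming the {5,3,1,0} matrix (the `PAYOFF` table and `best_response` helper are illustrative names of my own):

```python
# Your payoff for (your move, partner's move), using T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_response(partner_move: str) -> str:
    """Return the move that earns you the most against a fixed partner move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, partner_move)])


for partner in "CD":
    print(f"If your partner plays {partner}, your best reply is {best_response(partner)}")
# Defecting is the best reply either way, so neither player gains by
# switching away from (D, D) on their own.
```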

This situation, although clearly framed as a social dilemma, is usually not played in a "one-shot" format. Rather, most studies using the Prisoner's Dilemma have people play multiple rounds, so you would encounter this same dilemma 10 or so times. Since you know the choices your partner made previously, you begin to form strategies to maximize your points.

One famous strategy is called Tit-for-Tat. Essentially, you cooperate on the first round and then choose whatever your partner chose in the previous round.

Another strategy is called win-stay, lose-switch. It is exactly how it sounds: if your last choice paid off well (an R or T outcome), keep choosing it. If it paid off poorly (an S or P outcome), switch your choice. Some know this as the Pavlovian strategy.

Two other strategies that are common for Prisoner's Dilemma are always-cooperate and always-defect.
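All four of these strategies only look at the previous round. Here is a minimal Python sketch of them playing a 10-round match, assuming the {5,3,1,0} matrix and assuming every strategy except always-defect opens by cooperating. Win-stay, lose-switch is formalized as "repeat your last move after earning T or R, switch after S or P", which is the usual Pavlov reading; all function names are my own.

```python
from typing import Callable, Optional, Tuple

# Points as (your points, partner's points) for (your move, partner's move).
PAYOFFS = {
    ("C", "C"): (3, 3),  # R, R
    ("C", "D"): (0, 5),  # S, T
    ("D", "C"): (5, 0),  # T, S
    ("D", "D"): (1, 1),  # P, P
}

# A strategy sees (my last move, partner's last move), or None on round one.
Strategy = Callable[[Optional[Tuple[str, str]]], str]


def tit_for_tat(last):
    return "C" if last is None else last[1]  # copy the partner's last move


def win_stay_lose_switch(last):
    if last is None:
        return "C"
    if PAYOFFS[last][0] >= 3:  # "win" = earned R or T (in this matrix): repeat
        return last[0]
    return "D" if last[0] == "C" else "C"  # "lose" = S or P: switch


def always_cooperate(last):
    return "C"


def always_defect(last):
    return "D"


def play(x: Strategy, y: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play an iterated match and return each player's total points."""
    x_last = y_last = None
    x_total = y_total = 0
    for _ in range(rounds):
        x_move, y_move = x(x_last), y(y_last)
        x_pts, y_pts = PAYOFFS[(x_move, y_move)]
        x_total += x_pts
        y_total += y_pts
        # Each player remembers the round from their own point of view.
        x_last, y_last = (x_move, y_move), (y_move, x_move)
    return x_total, y_total


print(play(tit_for_tat, always_defect))           # (9, 14)
print(play(win_stay_lose_switch, always_defect))  # (5, 30)
```

Tit-for-Tat loses only the first round to always-defect, while win-stay, lose-switch keeps returning to cooperation and gets exploited every other round.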

### An introduction to zero determinant strategies

*If you're familiar with zero determinant strategy definitions and logic, you can skip to the next section.*

Press & Dyson spend the majority of their introduction discussing whether remembering a long history of past outcomes helps with point maximizing. They conclude that it gives you no advantage over a partner with a shorter memory (which is fascinating in itself). So if there is no reason to remember a long string of previous outcomes, what *does* matter? They state that the player with the shortest memory effectively sets the terms of the game.

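Expanding on the idea that *strategies need to be simpler* leads to the concept that you can mathematically derive a strategy that more or less preys upon the predictability of other strategies. In short and in general, zero determinant strategies are strategies computed so that, within the limits of the payoff matrix, you unilaterally lock your long-run score and your partner's into a fixed relationship, no matter what your partner does. (Strictly speaking, Press & Dyson show you can pin your *partner's* score but not your own; an extortion strategy instead guarantees that your surplus over the punishment payoff P is a fixed multiple of your partner's.) In the {5, 3, 1, 0} payoff, for instance, you want to average more than 3 points per round, which is only possible if some rounds end with you defecting while your partner cooperates. However, how does one enforce such a relationship? That involves math.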
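For the curious, here is my reading of the single relationship underneath all of this, in Press & Dyson's notation. Write your probabilities of cooperating after R, S, T and P outcomes (CC, CD, DC, DD) as $p_R, p_S, p_T, p_P$, subtract 1 from the two that follow rounds where you cooperated, and call the result $\tilde{p}$. Let $S_X = (R, S, T, P)$ and $S_Y = (R, T, S, P)$ be your and your partner's payoffs for those four outcomes, $\mathbf{1}$ the all-ones vector, and $s_X, s_Y$ your long-run average scores. If $\tilde{p}$ is a weighted sum of those vectors, the two scores are tied together by a straight line, whatever your partner does:

$$
\tilde{p} = (p_R - 1,\; p_S - 1,\; p_T,\; p_P) = \alpha\, S_X + \beta\, S_Y + \gamma\, \mathbf{1}
\quad\Longrightarrow\quad
\alpha\, s_X + \beta\, s_Y + \gamma = 0
$$

An extortion strategy is the special case that enforces $s_X - P = \chi\,(s_Y - P)$, where $\chi$ is the extortion factor.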

### Zero determinant strategies for people who can't math good

The Press & Dyson paper contains plenty of proofs and equations, and it gets intimidating if you aren't a math person. Even the essential zero determinant equation:

is completely foreign to me. It stems from what seem like six other equations that I have no idea how to even begin to tease out.

But after some hours of reading other papers and calculating it out on my own, I figured out a few things within the extortion strategy -- which is the meat and potatoes of zero determinant strategies:

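The article discusses variable χ, which is your extortion factor: the strategy guarantees that your long-run score minus P is χ times your partner's score minus P. As χ increases (with ϕ set as a fixed fraction of its upper limit, as in the graphs below), you become less cooperative after CC ("reward" or "R") and DC ("temptation" or "T") outcomes.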

There is also a variable ϕ, which acts as a kind of rate/probability modulator. ϕ must be greater than 0, and its upper limit depends on χ: with a {5,3,1,0} payoff matrix that limit is never more than 0.2 (it equals 0.2 when χ = 1 and shrinks as χ grows). The closer ϕ is to its upper limit, the faster your probability of cooperating after R outcomes drops as the extortion factor increases. If ϕ is half of its upper limit, your probability of cooperating after R still decreases as your extortion factor increases, but more slowly.
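Written out, here is my transcription of Press & Dyson's extortion formulas in this post's payoff letters: the probability of cooperating after each outcome, plus the allowed range of ϕ. I've checked that they reproduce the numbers quoted in this post, but treat this as a reading aid rather than a substitute for the paper:

$$
\begin{aligned}
p_R &= 1 - \phi\,(\chi - 1)\,\frac{R - P}{P - S} \\
p_S &= 1 - \phi\left(1 + \chi\,\frac{T - P}{P - S}\right) \\
p_T &= \phi\left(\chi + \frac{T - P}{P - S}\right) \\
p_P &= 0 \\[4pt]
0 < \phi &\le \frac{P - S}{(P - S) + \chi\,(T - P)}
\end{aligned}
$$

With {5,3,1,0} the bound on ϕ works out to $1/(1 + 4\chi)$, which is 0.2 at χ = 1 and shrinks as χ grows.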

Let's take a look at one example:

Say we have an extortion factor of 2. The upper limit of ϕ is then 0.111, and half of the upper limit is 0.056. If you encounter an R outcome, the probability that you cooperate on your next choice would be about 78% using the upper-limit ϕ, or about 89% using half of the limit.

In comparison, if you encounter an R outcome using the Tit-for-Tat strategy, you would cooperate 100% of the time on your next choice.
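Plugging χ = 2 and the {5,3,1,0} payoffs (T = 5, R = 3, P = 1, S = 0) into Press & Dyson's upper limit for ϕ and their formula for cooperating after an R outcome gives the extortion numbers in that example:

$$
\phi_{\max} = \frac{P - S}{(P - S) + \chi\,(T - P)} = \frac{1}{1 + 2 \cdot 4} = \frac{1}{9} \approx 0.111
$$

$$
p_R = 1 - \phi\,(\chi - 1)\,\frac{R - P}{P - S} = 1 - 2\phi =
\begin{cases}
1 - \tfrac{2}{9} = \tfrac{7}{9} \approx 0.778 & \text{if } \phi = \tfrac{1}{9} \\[4pt]
1 - \tfrac{2}{18} = \tfrac{8}{9} \approx 0.889 & \text{if } \phi = \tfrac{1}{18}
\end{cases}
$$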

Below is a quick graph I made for both the upper-limit ϕ and half of it:

The x-axis is your extortion factor, and the y-axis is your probability of cooperating after a given outcome. Each of the four outcomes gets its own line. One of the rules for this particular strategy is that whenever there is a DD (punishment, "P") outcome, you always defect; that is the yellow line at 0. For a CD (sucker, "S") outcome, in which you are defected against, the upper-limit ϕ calls for you to always defect (as Tit-for-Tat would), whereas half-ϕ has you defect only 50% of the time, regardless of the extortion factor.
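The values behind a graph like this are easy to regenerate. A minimal Python sketch, assuming the {5,3,1,0} matrix and the extortion formulas as I read them in Press & Dyson (the function name `extortion_probs` and its `phi_fraction` parameter are my own); it prints one row per extortion factor for both the full and the halved ϕ:

```python
def extortion_probs(chi, phi_fraction=1.0, T=5, R=3, P=1, S=0):
    """Cooperation probabilities after (R, S, T, P) outcomes for an extortion
    strategy with factor chi, with phi set to phi_fraction of its upper limit."""
    phi = phi_fraction * (P - S) / ((P - S) + chi * (T - P))
    probs = (
        1 - phi * (chi - 1) * (R - P) / (P - S),  # after R (CC)
        1 - phi * (1 + chi * (T - P) / (P - S)),  # after S (CD)
        phi * (chi + (T - P) / (P - S)),          # after T (DC)
        0.0,                                      # after P (DD): always defect
    )
    # Clip floating-point round-off (e.g. -1e-16) back into [0, 1].
    return tuple(min(1.0, max(0.0, p)) for p in probs)


for fraction, label in [(1.0, "upper-limit phi"), (0.5, "half phi")]:
    print(label)
    print("  chi   after R  after S  after T  after P")
    for chi in [1, 1.5, 2, 3, 4, 5]:
        row = "  ".join(f"{p:7.3f}" for p in extortion_probs(chi, fraction))
        print(f"  {chi:<4}  {row}")
```

In the half-ϕ table the "after S" column sits at 0.500 in every row, which is the 50% line described above.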

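The paper discusses how Tit-for-Tat actually is a zero determinant strategy, and also the "most fair" one among the zero determinant/extortion strategies, since an extortion factor of 1 means your long-run score and your partner's come out equal. You can see Tit-for-Tat in the upper-limit ϕ graph at extortion factor 1: cooperate after R and T outcomes, defect after S and P outcomes.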
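To check the "most fair" claim numerically, here is a small Python sketch (my own, not code from the paper) that computes long-run average scores for two strategies that only look at the last round, by iterating the four-state chain over CC, CD, DC, DD outcomes until it settles. It assumes the {5,3,1,0} matrix, assumes the outcome frequencies settle to a single long-run mix (true for these examples), and uses as the opponent a hypothetical partner who cooperates 50% of the time no matter what:

```python
def long_run_scores(p, q, payoffs=(3, 0, 5, 1), steps=10_000):
    """Average per-round scores for players X and Y in the long run.

    p: X's cooperation probabilities after CC, CD, DC, DD (X's point of view).
    q: Y's cooperation probabilities after CC, CD, DC, DD (Y's point of view).
    payoffs: X's points for R, S, T, P ({5,3,1,0} by default).
    """
    R, S, T, P = payoffs
    # When X sees CD, Y sees DC (and vice versa), so reorder Y's probabilities.
    q_x_view = (q[0], q[2], q[1], q[3])
    dist = [0.25] * 4  # start from an even spread over CC, CD, DC, DD
    for _ in range(steps):
        new = [0.0] * 4
        for i, weight in enumerate(dist):
            px, qy = p[i], q_x_view[i]
            new[0] += weight * px * qy              # both cooperate
            new[1] += weight * px * (1 - qy)        # X cooperates, Y defects
            new[2] += weight * (1 - px) * qy        # X defects, Y cooperates
            new[3] += weight * (1 - px) * (1 - qy)  # both defect
        dist = new
    score_x = sum(w * s for w, s in zip(dist, (R, S, T, P)))
    score_y = sum(w * s for w, s in zip(dist, (R, T, S, P)))
    return score_x, score_y


random_partner = (0.5, 0.5, 0.5, 0.5)
tit_for_tat = (1, 0, 1, 0)
extort_2 = (7 / 9, 0, 2 / 3, 0)  # extortion factor 2, phi at its upper limit

print(long_run_scores(tit_for_tat, random_partner))  # (2.25, 2.25): equal scores
sx, sy = long_run_scores(extort_2, random_partner)
print(sx - 1, 2 * (sy - 1))  # both about 1.4706: s_X - P = 2 * (s_Y - P)
```

Tit-for-Tat ends up exactly even with its partner, while the extort-2 player's surplus over P is twice the partner's.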

To get something more like Generous Tit-for-Tat (where, after an S outcome, you cooperate 10% of the time on your next choice rather than always defecting), divide the upper-limit ϕ by about 1.11. As you can imagine, dividing ϕ by anything between 1 and 2 gets you cooperation rates after R and T outcomes that fall in between those in the previous graphs.

With the extortion-strategy twist on Generous Tit-for-Tat, you also cooperate 10% less often after a T outcome (90% of the time instead of always).
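As a worked check with the {5,3,1,0} payoffs (T − P = 4, P − S = 1) at an extortion factor of 1: the upper limit of ϕ is 1/(1 + 4) = 0.2, and dividing it by 10/9 (about 1.11) gives ϕ = 0.18. Press & Dyson's extortion formulas, as I read them, then give:

$$
\begin{aligned}
p_R &= 1 - 0.18\,(1 - 1)\cdot\frac{3 - 1}{1 - 0} = 1 \\
p_S &= 1 - 0.18\,(1 + 1 \cdot 4) = 0.1 \\
p_T &= 0.18\,(1 + 4) = 0.9 \\
p_P &= 0
\end{aligned}
$$

That is, you cooperate 10% of the time after an S outcome and 90% of the time after a T outcome.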

### Uh, okay. So how would you propose someone apply this intuitively?

The problem with probabilities and human nature is that we can always calculate how likely something between people was AFTER it has happened, but estimating probabilities in the moment, without previous knowledge, is difficult. And socially, nearly impossible.

For someone like me, a researcher who would like to train people to form strategies in a zero determinant manner in the moment, this requires some watering-down.

And really, it boils down to variants of Tit-for-Tat.

Tit-for-Tat is a zero determinant strategy. So in a study, you can treat Tit-for-Tat as the zero determinant condition as long as the other strategies you compare it against are *not* zero determinant strategies. Any difference you then see between strategies may come down to zero determinant strategies versus cooperative and/or deceptive strategies.

We can further modify Tit-for-Tat by saying:

If your partner cooperates, rather than always cooperating, incorporate some rate of defection.

In some fashion, that over-generalization of the extortion strategy is basically the antithesis of Generous Tit-for-Tat. If you look at the ϕ/1.11 graph, you can actually make an argument that rather than playing Generous Tit-for-Tat, you could play a "modified extortion" strategy, where you defect one time in ten after a T outcome. That corresponds to an extortion factor of 1 and is relatively easy to train (*"If you see a T outcome, defect on your next choice once out of every ten times you see it."*).

To have a human *exactly* mimic the extortion strategy would require calculators, dice, or some other many-sided probability decider. Unlike Tit-for-Tat or win-stay, lose-switch, there is no easy rule of thumb. You can choose convenient extortion factors for different outcomes (e.g. with the upper-limit ϕ, an extortion factor of 1 for R, which means always cooperating after R, and 3.5 for T, which means cooperating half the time after T), but that isn't exactly how this extortion strategy was computed.
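For the "modified extortion" rule above, a ten-sided die is enough. Here is a minimal Python sketch of what a person would do each round, with the die roll passed in so the rule is easy to check. The function names are hypothetical, and the probabilities are the ϕ/1.11, extortion-factor-1 values: always cooperate after R, cooperate after S only on a roll of 1, defect after T only on a roll of 1, always defect after P.

```python
import random
from typing import Optional


def modified_extortion_move(last_outcome: str, roll: int) -> str:
    """Next move given the last outcome ('R', 'S', 'T' or 'P', from your
    point of view) and a ten-sided die roll from 1 to 10."""
    if last_outcome == "R":
        return "C"  # always cooperate after mutual cooperation
    if last_outcome == "S":
        return "C" if roll == 1 else "D"  # forgive 1 time in 10
    if last_outcome == "T":
        return "D" if roll == 1 else "C"  # defect 1 time in 10
    if last_outcome == "P":
        return "D"  # always defect after mutual defection
    raise ValueError(f"unknown outcome: {last_outcome!r}")


def roll_d10(rng: Optional[random.Random] = None) -> int:
    """Simulate rolling a ten-sided die."""
    return (rng or random).randint(1, 10)


rng = random.Random(0)  # seeded so the run is repeatable
moves = [modified_extortion_move("T", roll_d10(rng)) for _ in range(1000)]
print(moves.count("D") / len(moves))  # should land near 0.1
```

Passing the roll in as an argument keeps the decision rule itself deterministic, which mirrors how a trained participant would work: roll first, then look up the move.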

Then again, strategy formation is never about what someone else thinks is successful, but rather what the individual feels is both successful in their own terms and also comfortable in usage. The extortion strategy with a high extortion factor is very difficult for a human to exactly follow. Even though it is computationally more advanced than any of the more popular strategies, who could really be comfortable thinking in that manner?

### TL;DR - Within the ZD strategies, the "extortion-2" strategy could be considered the "optimal" ZD extortion strategy.

If you really just want to "apply" an extortion strategy without doing all this complex math, the most commonly reported extortion strategy seems to be ext-2, which means an extortion factor of 2. If we set ϕ at its upper limit in a {5,3,1,0} game, your cooperation percentages after each outcome become:

- T = 66.7%
- R = 77.8%
- P = 0%
- S = 0%

Note that a different payoff matrix, for example {4,3,0,-1}, gives you different cooperation rates. For this particular payoff matrix (which I commonly use in my research):

- T = 66.7%
- R = 66.7%
- P = 0%
- S = 0%
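Both lists come from the same formulas with χ = 2 and ϕ at its upper limit. A minimal Python sketch (hypothetical function name; formulas as I read them in Press & Dyson) that reproduces them for any payoff matrix given in this post's T, R, P, S order:

```python
def ext2_cooperation(T, R, P, S, chi=2.0):
    """Cooperation probabilities after T, R, P and S outcomes for an
    extortion strategy with factor chi and phi at its upper limit."""
    phi = (P - S) / ((P - S) + chi * (T - P))
    after_r = 1 - phi * (chi - 1) * (R - P) / (P - S)
    after_s = 1 - phi * (1 + chi * (T - P) / (P - S))
    after_t = phi * (chi + (T - P) / (P - S))
    # max() clips floating-point round-off (e.g. -1e-16) to zero.
    return {"T": after_t, "R": after_r, "P": 0.0, "S": max(0.0, after_s)}


for matrix in [(5, 3, 1, 0), (4, 3, 0, -1)]:
    rates = ext2_cooperation(*matrix)
    print(matrix, {k: f"{100 * v:.1f}%" for k, v in rates.items()})
```

Only R − P differs between the two matrices here (2 versus 3), which is why only the cooperation rate after R changes.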

And now, hopefully, you have a better footing on ZD strategies without having to math good!
