Can Simulation Entice Cooperation in Non-Zero-Sum Games?

In honor of Joe Halpern's 70th birthday, Vincent Conitzer posted this puzzle a year ago: https://www.cs.cmu.edu/~conitzer/halpernbirthday.pdf

Can you solve it?



[Image: the games from Conitzer's puzzle]

We provide a solution for Game 1.

Let us denote by p_B the probability that Player B plays 'left,' and by s in [0,1] the probability that a simulated action of Player B is presented to Player A. Then, we can formalize the expected outcomes and rewards as follows:

  • The probability p_A that Player A plays 'top' is determined by three other probabilities p_0, p_1, p_2 in [0,1] which Player A must set as her strategy: p_0 is the probability of playing 'top' when no simulation takes place; p_1 is the probability of playing 'top' when simulation took place and Player B's action sample is 'left'; p_2, finally, is the probability of playing 'top' when simulation took place and Player B's action sample is 'right.'

Based on the strategy (p_0, p_1, p_2), we can determine

p_A = s (p_B p_1 + (1 - p_B) p_2) + (1 - s) p_0.     (Equation 1)

In other words, the strategy for Player A has become a function of the strategy of Player B due to the occasional simulation.
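For concreteness, here is a minimal plain-Python sketch of Equation 1 (not Seeker code; the function name prob_top is purely illustrative):

def prob_top(p0, p1, p2, p_B, s):
    # Probability that Player A plays 'top': with probability 1 - s there is
    # no simulation and A plays 'top' with probability p0; with probability s
    # a simulated action of B is shown, which is 'left' with probability p_B
    # (A then plays 'top' with probability p1) and 'right' otherwise
    # (A then plays 'top' with probability p2).
    return s * (p_B * p1 + (1 - p_B) * p2) + (1 - s) * p0

# With the strategy (1, 1, 0) discussed below and p_B = 1, this yields 1.0
# regardless of s.
print(prob_top(1.0, 1.0, 0.0, p_B=1.0, s=0.5))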

  • Denote by B in R^{2 x 2} the reward matrix for Player B. The expected reward for Player B then is

E[R_B] = p_A p_B B_11 + p_A (1 - p_B) B_12 + (1 - p_A) p_B B_21 + (1 - p_A) (1 - p_B) B_22.

Using Equation 1, and setting

q = p_1 - p_2,   x = B_11 - B_12 - B_21 + B_22,   y = B_12 - B_22,   z = B_21 - B_22,

we get

E[R_B] = (s x q) p_B^2 + (s y q + s x p_2 + (1 - s) x p_0 + z) p_B + y (s p_2 + (1 - s) p_0) + B_22.

Therefore, given Player A’s strategy (p_0, p_1, p_2), Player B should choose p_B such that the above quadratic term is maximized.
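To make this best-response computation concrete, here is a minimal plain-Python sketch (not Seeker code; the helper name best_response_B is ours) that maximizes the expression above over p_B in [0, 1]:

def best_response_B(B, p0, p1, p2, s):
    # Player B's best response to Player A's strategy (p0, p1, p2):
    # maximize a * p_B^2 + b * p_B (+ constant) over p_B in [0, 1].
    q = p1 - p2
    x = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    y = B[0][1] - B[1][1]
    z = B[1][0] - B[1][1]
    a = s * x * q                                      # quadratic coefficient
    b = s * y * q + s * x * p2 + (1 - s) * x * p0 + z  # linear coefficient
    if a < 0:
        # concave parabola: interior optimum, clipped to [0, 1]
        return min(1.0, max(0.0, -b / (2 * a)))
    # convex or linear: the optimum lies at an endpoint
    return 1.0 if a + b > 0 else 0.0

For instance, with the 'Practical Gameplay' matrices introduced further below and s = 0.7, best_response_B([[5, 9], [0, 2]], 1, 1, 0, 0.7) returns roughly 0.82.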

  • In our game, we have

[Image: the reward matrices of Game 1]

and therefore (the sums on the two main diagonals of B are equal here, so x = 0 and the expected reward is linear in p_B with slope s y q + z)

[Image: Player B's expected reward for Game 1]

As long as p_1 - p_2 = q > 2/3, Player B will want to play the (pure) strategy p_B = 1, which means Player B will always play 'left.'

  • Consider the strategy (p_0 = 1, p_1 = 1, p_2 = 0) for Player A, which implies q = 1 and therefore p_B = 1. Since Player B now only plays 'left,' Player A will always play 'top' (p_2 has become immaterial since no simulated sample will ever be 'right'). This is also the optimal outcome for Player A, since no other cell offers her a greater reward than top-left. Therefore, Player A has no incentive to change her strategy, and, as we have seen, given this strategy for Player A, Player B has no incentive to change his strategy either. We have therefore found an equilibrium with rewards (2, 3). Simulation has led to cooperation in this example.


Practical Gameplay

In the game above, we found that the reward matrix of the simulated player had equal sums on its two main diagonals, leading to x = 0, which turned the quadratic optimization problem for Player B into a linear one. Now, consider a game where this is not the case:

                     Player B: 'left'    Player B: 'right'
Player A: 'top'           (3, 5)              (0, 9)
Player A: 'bottom'       (-1, 0)              (2, 2)

(Each cell lists the rewards as (Player A, Player B); these are the matrices A and B used in the program below.)

This time, we will simulate an action of Player B with probability 0.7.

a) Can you devise a strategy for Player A? Is there an equilibrium whose rewards are strictly greater than (2, 2) for both players?

b) Consider a variant where Player B continues to want to maximize his expected reward, while Player A also wants to manage her risk of receiving a negative reward. In particular, what strategy should Player A play if she wants to keep her expected reward above 2.09 while minimizing her chance of running a loss?



Answering these questions in theory is very demanding. In practice, we may therefore resort to a tool for optimizing our strategy. The stochastic optimizer InsideOpt Seeker, for example, can answer both questions with the following simple program:

import seeker as skr

def halpern(s, A, B):
    # coefficients from the derivation above:
    # z = B_21 - B_22, y = B_12 - B_22, x = B_11 - B_12 - B_21 + B_22
    z = B[1][0] - B[1][1]
    y = B[0][1] - B[1][1]
    x = B[0][0] - B[1][0] - y

    env = skr.Env("license.sio", stochastic=True)
    env.set_restart_likelihood(0.05)
    p = [env.continuous(0, 1) for _ in range(3)]
    q = p[1] - p[2]
    # Player B maximizes a quadratic in p_B (see the derivation above)
    quadratic_coeff = s * x * q
    mixed_case = quadratic_coeff < 0
    linear_coeff = (s * y * q + (s * x) * p[2] +
                    ((1 - s) * x) * p[0] + z)
    # concave case: interior optimum, clipped to [0, 1]
    b_mixed = env.max([env.convert(0), env.min([env.convert(1),
                       -linear_coeff /
                       (2 * quadratic_coeff)])])
    # otherwise the optimum lies at one of the endpoints p_B = 0 or 1
    b_extreme = env.if_(quadratic_coeff + linear_coeff > 0,
                        1, 0)
    p_B = env.if_(mixed_case, b_mixed, b_extreme)
    p_A = s * (p_B * q + p[2]) + (1 - s) * p[0]
    draw = [env.continuous_uniform(0, 1) for _ in range(2)]
    plays = [env.if_(draw[0] < p_A, 0, 1),
             env.if_(draw[1] < p_B, 0, 1)]
    reward_A = env.index([plays[0], plays[1]], env.tensor(A))
    reward_B = env.index([plays[0], plays[1]], env.tensor(B))
    exp_reward_A = env.aggregate_mean(reward_A)

    # maximize reward for Player A for 60 seconds
    env.maximize(exp_reward_A, 60)
    exp_reward_B = env.aggregate_mean(reward_B)
    prob_A = env.aggregate_relative_frequency_geq(reward_A, 0)
    env.evaluate()
    print("Maximizing Expected Profit")
    print("Prob A >= 0:", prob_A.get_value())
    print("Reward A:", exp_reward_A.get_value())
    print("Reward B:", exp_reward_B.get_value())
    print("Strategy A:", p_A.get_value(), 
              [p[i].get_value() for i in range(3)])
    print("Strategy B:", p_B.get_value())
    print("Evaluations", env.get_number_evaluations())
    print()

    # constrain the expected profit
    env.enforce_geq(exp_reward_A, 2.09)
    # maximize probability to achieve at least profit 0
    env.maximize(prob_A, 60)
    print("Maximizing Chance of Profit>=0 (exp profit>=2.09)")
    print("Prob A >= 0:", prob_A.get_value())
    print("Reward A:", exp_reward_A.get_value())
    print("Reward B:", exp_reward_B.get_value())
    print("Strategy A:", p_A.get_value(), 
              [p[i].get_value() for i in range(3)])
    print("Strategy B:", p_B.get_value())


if __name__ == "__main__":
    A = [[3, 0], [-1, 2]]
    B = [[5, 9], [0, 2]]
    halpern(0.7, A, B)        

As a result, we get:

[Image: Seeker output for both optimization runs]

In the first case, where we simply maximize Player A's expected profit, we find that the solver chooses to set p_0 = 0.877. Since x = -2 in this game, lowering p_0 is an interesting lever for getting Player B to cooperate more: we see that he plays 'left' with probability 85.66%. The equilibrium is achieved at (2.1055, 4.8377).
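As a rough sanity check (a plain-Python sketch of the formulas above, not Seeker output; we assume p_1 is roughly 1 and p_2 roughly 0 alongside the reported p_0 = 0.877), we can plug the reported strategy back into the closed-form expressions:

A = [[3, 0], [-1, 2]]
B = [[5, 9], [0, 2]]
s = 0.7
p0, p1, p2 = 0.877, 1.0, 0.0   # approximately the strategy reported by Seeker

q = p1 - p2
x = B[0][0] - B[0][1] - B[1][0] + B[1][1]   # -2
y = B[0][1] - B[1][1]                       #  7
z = B[1][0] - B[1][1]                       # -2

# Player B's best response: concave quadratic, interior optimum clipped to [0, 1]
a = s * x * q
b = s * y * q + s * x * p2 + (1 - s) * x * p0 + z
p_B = min(1.0, max(0.0, -b / (2 * a)))

# Player A's induced mixed strategy and both players' expected rewards
p_A = s * (p_B * q + p2) + (1 - s) * p0

def expected(M):
    return (p_A * p_B * M[0][0] + p_A * (1 - p_B) * M[0][1]
            + (1 - p_A) * p_B * M[1][0] + (1 - p_A) * (1 - p_B) * M[1][1])

print(p_B, expected(A), expected(B))   # roughly 0.85, 2.10, 4.85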

[Figure: Seeker-provided reward densities for Player A after the risk minimization]


In the case where we minimize Player A's risk of receiving a negative reward, we find that Seeker increases p_0 to 1, which makes it beneficial for Player B to defect more often. As a result, the expected reward for Player A declines slightly, but the risk of running a loss is reduced by 1.2%.


Conclusion

As you see, InsideOpt Seeker allows us to devise optimized strategies in competitive environments, even in the face of stochasticity and non-linearities. After all, not everyone is lucky enough to employ a Joe Halpern!

Happy Birthday, Joe!
