Continuous Blackjack

Recently I had an interview where as an extra “bonus” question at the end I was asked an interesting maths problem. With a couple of hints from the interviewer, I was able to sketch out a rough solution, however afterwards I wanted to look up a proper solution just to verify it. Interestingly, I wasn’t able to find one (I’m sure it’s out there, I just need to look harder). Anyway, I thought it was a nice little problem and so I thought it was worth posting what I believe to be the correct solution.

The problem is this – consider that you are playing a game with one other person, and you are going to be going first. The rules are you pick a number randomly from the Uniform(0,1) distribution (i.e. a random number between 0 and 1). You then decide either to stick with this total, or you play on and choose another such random number. You can do this as many times as you like, however if the sum total of the numbers you pick goes over 1, you go bust and automatically lose. If you decide to stick with a number less than 1, the other player has a go and plays by the same rules. The person who sticks at the higher number (or doesn’t go bust when their opponent does) is the winner. Clearly it is an advantage to go second, and the optimal strategy for player 2 is extremely simple – keep playing until you get a higher total than your opponent or until you go bust. The question is, given that you are player 1 what is the best strategy you can adopt?

The first thing to realise is that our optimal strategy will be divided at some number t, which I shall call the “decision boundary”, and where if we have a sum less than t we will draw a new number, and if we have a sum greater than t we will stick. We can then think about what the probability of winning is, given that we stick at a particular value t. This is equal to 1 minus the probability that we lose – and the probability that we lose is the probability that the second player is able to land their sum within the interval \([t,1]\), given that they play on until they either reach this interval or go bust. To go about calculating this, let us define \(P_t[x]\) to be the probability that we are able to land in the interval \([t,1]\), given that we are currently at x and definitely going to play on if we have not yet reached t. We can write down an equation for this as follows:

\(P_t[x] = (1-t) + \int_x^t P_t[y]dy\)

where the first term is the probability that we reach the interval in the next turn, and the second term is the integral of the probability of reaching y < t (=1 as we are drawing uniform(0,1) random variables) multiplied by the probability that we reach the interval \([t,1]\) starting from y. We can convert this from an integral equation into an ODE:

\(\frac{dP_t[x]}{dx} = -P_t[x] \ \ \ \implies P_t[x] = A e^{-x} \)

We can obtain the constant A by noting that \(P_t[t] = 1-t \), and hence that \(A = (1-t)e^t\). This means that:

\(P_t[x] = (1-t)e^{t-x}\). Now, player 2 starts from \(x=0\), and so the probability of losing given that we stuck at a value t is simply \(P_t[0] = (1-t)e^t\). The probability that we win is \(1-P_t[0] = 1-(1-t)e^t\).

The final step we need to solve this is to say that the threshold at which our strategy changes should be the following point: where the probability of winning given that we stick at t is exactly the same as the probability that we win given that we choose one more number. We can write this condition as:

\(1-(1-t)e^t = \int_t^1 \left[ 1-(1-t’)e^{t’}\right]dt’ = (1-t) – e^t(t-2) – e \)

This gives a non-linear equation for the optimal decision boundary t, which cannot be re-arranged nicely but numerically we can solve to find that \(t \approx 0.57\). That is, if our sum is less than approximately 0.57 we should pick another number, and if it’s more we should stick!

Author: Henry Charlesworth

I am currently a PhD student at the University of Warwick, based in the Centre for Complexity Science. My research has been looking at how the principle of "future state maximisation" can be applied to both explaining behaviour in animal systems as well as generating behaviour in artificial systems. Loosely speaking, this is the idea that taking everything else to be equal it is preferable to keep your options open as much as possible. I am also very interested in artificial intelligence and machine learning, particularly reinforcement learning.

Leave a Reply

Your email address will not be published. Required fields are marked *