The Bunny in 100 Holes

jump bunny jump
Problem goes like this:
- There are 100 holes in a line, and a bunny 🐇 is hiding in one of them.
- You can look into only one hole at a time, and every time you look, the bunny jumps to an adjacent hole.
What’s your strategy?
Take a breath. Think before reading on.
I found this through Uli’s blog. Uli tried the obvious “smart” strategy (always check the most probable hole) and found it barely beat random guessing. That bothered me. So I went down a rabbit hole. Yes, I know.
What I thought would be a quick afternoon turned into bandit algorithms and eventually a lesson I should have learned much earlier: make sure you’re measuring the right thing before you start optimizing.
§ Randomly picking a hole
The naive strategy: just pick a random hole each time, no thinking involved. In code:
import random

# number of holes
N = 100

def jump(hole):
    # bunny moves one step; at the walls it must move inward
    if hole == 0:
        return hole + 1
    elif hole == N - 1:
        return hole - 1
    else:
        return hole + random.choice([-1, 1])

def play():
    # bunny hides in a random hole
    hole = random.randint(0, N - 1)
    t = 0
    while True:
        t += 1
        guess = random.randint(0, N - 1)
        if hole == guess:
            print(f'Found the bunny in hole {guess} after {t} tries!')
            break
        else:
            print(f'[{t}] Guessed {guess} (true is {hole})')
            # wrong hole, bunny jumps:
            hole = jump(hole)
    return t

if __name__ == "__main__":
    play()
When you run it, you will see how many tries it took to find the bunny:
$ uv run bunn.py
[1] Guessed 29 (true is 55)
[2] Guessed 75 (true is 56)
[3] Guessed 68 (true is 55)
...
[175] Guessed 28 (true is 55)
[176] Guessed 74 (true is 54)
Found the bunny in hole 55 after 177 tries!
One simple way to judge how good this strategy is: measure how well it does on average.
import statistics as stats

r = [play() for _ in range(10000)]
mean, median, stdev = stats.mean(r), stats.median(r), stats.stdev(r)
print(f'mean: {mean} median: {median} stddev {stdev}')
I get this:
mean: 99.3195 median: 70.0 stddev 97.85726556670393
About 100 guesses on average. Half the time I finish in 70 or fewer. But sometimes it drags past 500. Very unpredictable.
§ Getting smarter: track where the bunny probably is
Random guessing throws away all information. Every time I check a hole and miss, I know the bunny is not there. And since the bunny can only move one step, I also know something about where it went. Let’s use that.
The idea: maintain a probability distribution P over all 100 holes. Start uniform (bunny equally likely anywhere). After each miss, update two things. First, zero out the hole I just checked and renormalize. Second, spread the distribution one step to account for the bunny jumping.
def guess(P, j):
    # bunny is not in hole j: remove it and renormalize
    s = sum(P) - P[j]
    return [0 if i == j else P[i] / s for i in range(N)]

def jumps(P):
    # bunny took one random step, diffuse the distribution
    Pn = [0.0] * N
    Pn[0] = 0.5 * P[1]
    Pn[N - 1] = 0.5 * P[N - 2]
    for j in range(1, N - 1):
        Pn[j] = 0.5 * P[j - 1] + 0.5 * P[j + 1]
    # the wall holes jump inward with certainty, so holes 1 and N-2
    # receive the other half of that mass as well:
    Pn[1] += 0.5 * P[0]
    Pn[N - 2] += 0.5 * P[N - 1]
    return Pn
Then always check argmax(P), the most probable hole. After every miss, zero out that hole, renormalize, and diffuse to neighbors to account for the jump. Makes sense in theory. Let’s see.
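Wiring those pieces into a play loop looks roughly like this. A sketch, not the exact code from my runs: `play_argmax` is a made-up name, and the helpers are restated so the snippet runs on its own.

```python
import random

N = 100  # number of holes

def jump(hole):
    # bunny's step: forced inward at the walls
    if hole == 0:
        return 1
    if hole == N - 1:
        return N - 2
    return hole + random.choice([-1, 1])

def guess(P, j):
    # condition the belief on a miss at hole j
    s = sum(P) - P[j]
    return [0 if i == j else P[i] / s for i in range(N)]

def jumps(P):
    # diffuse the belief one step (wall holes move inward with certainty)
    Pn = [0.0] * N
    Pn[0] = 0.5 * P[1]
    Pn[N - 1] = 0.5 * P[N - 2]
    for j in range(1, N - 1):
        Pn[j] = 0.5 * P[j - 1] + 0.5 * P[j + 1]
    Pn[1] += 0.5 * P[0]
    Pn[N - 2] += 0.5 * P[N - 1]
    return Pn

def play_argmax():
    hole = random.randint(0, N - 1)  # bunny's true position
    P = [1.0 / N] * N                # uniform prior
    t = 0
    while True:
        t += 1
        g = max(range(N), key=P.__getitem__)  # most probable hole
        if g == hole:
            return t
        P = jumps(guess(P, g))  # miss: condition, then diffuse
        hole = jump(hole)
```

The belief update is exact Bayesian filtering for this toy world; only the decision rule on top (pick the argmax) is in question.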
mean: 94.5 median: 50.5 stddev: 118.4
Barely beats random on mean. Worse stddev. What went wrong?
The belief state P is actually correct. The problem is what I do with it. Always picking argmax(P) creates a ping-pong trap. I check hole 50 and miss; the probability peak splits across 49 and 51. I check 51 and miss; the peak shifts to 50 and 52. I check 50 again. I keep bouncing around the same neighborhood while the bunny has long since wandered off to, say, hole 30. I am laser-focused on a region the bunny already escaped.
The information is good. The decision rule on top of it is not.
§ Getting excited: bandit algorithms
This felt like a classic exploration vs exploitation problem. I have a good estimate of where the bunny is (exploit that), but I also need to check holes I have not visited in a while in case the bunny drifted there (explore those). There is a whole field built around this exact tradeoff: multi-armed bandits.
The standard algorithm is UCB (Upper Confidence Bound). Add a bonus to each option based on how long it has been since I tried it. Holes I have not checked recently get boosted, which forces me to spread attention over time.
import numpy as np

def ucb_choice(P, visits, t, c=1.5):
    # belief plus an exploration bonus that grows for rarely-checked holes
    # (t is the 1-based turn number, so log(t) is defined)
    scores = [P[i] + c * np.sqrt(np.log(t) / (visits[i] + 1))
              for i in range(N)]
    return int(np.argmax(scores))
Result:
mean: 79.9 median: 56.0 stddev: 76.5
The mean dropped from 94 to 80. I got excited. I started tuning c, trying linear staleness (P[i] + c * (t - last_visit[i])) instead of the log-sqrt formula, then gradient-aware scoring, then anti-ping-pong penalties. Each run felt like progress because the mean kept inching down. Eventually I got it to around 75.
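As a sketch, the linear-staleness rule amounts to the following (the name last_visit and the default c here are illustrative, not the tuned values from my runs):

```python
N = 100  # number of holes

def staleness_choice(P, last_visit, t, c=0.01):
    # belief plus a bonus that grows linearly with the time
    # since each hole was last checked
    scores = [P[i] + c * (t - last_visit[i]) for i in range(N)]
    return max(range(N), key=scores.__getitem__)
```

Unlike the log-sqrt bonus, this grows without bound, so every hole is guaranteed to be revisited eventually.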
I thought I was winning.
§ The column I was not looking at
Here is the full comparison I should have been looking at from the very beginning.
| Strategy | Mean | Median | StdDev | Max |
|---|---|---|---|---|
| random | 97.3 | 65.0 | 97.6 | 727 |
| argmax | 94.5 | 50.5 | 118.4 | 1021 |
| ucb_c1.5 | 79.9 | 56.0 | 76.5 | 574 |
| sweep | 98.5 | 100.0 | 58.1 | 198 |
Look at the sweep row. Just scanning left to right, then right to left, repeating forever. The mean looks bad at 98.5, worse than UCB’s 79.9. I had been dismissing it the whole time based on that one number. But look at the last column. Max of 198. Always. No exceptions across 10,000 runs.
UCB occasionally takes 574 steps. My best staleness strategy hits 620 in the worst case. The sweep never exceeds 198.
The reason is a simple physical argument. The bunny moves one step per turn, and the sweep also advances one position per turn, so the bunny cannot outrun the sweep. On a 100-hole line, one full back-and-forth covers every hole in exactly 2 × (100 − 1) = 198 steps, and that bound is guaranteed.
def sweep_position(t):
    period = 2 * (N - 1)  # 198
    pos = t % period
    if pos < N:
        return pos  # forward: 0 up to 99
    else:
        return period - pos  # backward: 98 down to 1
The stddev tells the same story. Sweep gets 58. UCB gets 76. All my staleness variants get 75 or higher. The boring sweep is the most consistent strategy by a wide margin.
I spent days pushing the mean from 94 down to 76, completely ignoring the max and stddev columns. The sweep had better variance and a hard upper bound the whole time. I just was not looking.
§ Why mean was the wrong thing to optimize
UCB was designed for stationary slot machines. Each arm has a fixed unknown payout and you want to find the best one over time. Here the “arms” are not stationary. The bunny moves. UCB has no model of that movement at all. Its exploration bonus happened to force rotation, which happened to reduce the mean, but this was accidental. UCB has no worst-case guarantee because it was never built to think about a moving target.
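For contrast, here is UCB1 in its native habitat: a sketch with stationary Bernoulli arms (the payout probabilities and the function name are made-up illustration values, not anything from the bunny experiments).

```python
import math
import random

def ucb1(payouts, rounds=10000):
    # payouts: true reward probabilities, unknown to the algorithm
    n = len(payouts)
    counts = [0] * n    # pulls per arm
    totals = [0.0] * n  # summed rewards per arm
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1  # play each arm once to initialize
        else:
            # empirical mean + confidence bonus; the arm is *fixed*,
            # so the estimate only improves with more pulls
            arm = max(range(n), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < payouts[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

The confidence bonus shrinks as counts grow, because each arm's payout never changes. That is exactly the assumption the bunny violates.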
The sweep encodes exactly one fact: the bunny is a 1D random walker and cannot teleport. That physical constraint is where the hard guarantee comes from. No bandit algorithm can exploit that constraint because bandit algorithms do not model the target at all.
Depending on what I actually care about, the right answer is very different:
| Goal | Best strategy |
|---|---|
| Minimize average steps | UCB or lin_staleness (~76 mean) |
| Minimize worst case | Sweep (198 steps, guaranteed) |
| Minimize variance | Sweep (StdDev 58 vs 76+) |
| Search and rescue, anything high stakes | Sweep, not even close |
§ One more twist: what if the bunny is smart?
Everything above assumed the bunny moves randomly. What if it moves against me, always jumping away from the hole I just checked?
Holes: 1, 2, 3, 4, 5. I just checked hole 3.
- Random bunny in hole 2: jumps to 1 or 3 with equal probability.
- Adversarial bunny in hole 2: always goes to 1, never toward the checked hole.
Now the sweep fails completely. A bunny that knows my pattern can stay one step ahead of a predictable sweep forever. The problem becomes pursuit-evasion, and I need randomness in my own strategy to have any guarantee. Everything changes.
That is a rabbit hole for another day.
*Full source code here. Problem originally from Uli’s blog.*