You get beautiful designs when you build simple things that compose correctly. - Chris Lattner
I’ve been ramping up on writing software for AI, and Python is the language of AI. So I thought to myself: could I write a cribbage engine in Python, wrap it in Gymnasium, and then use Stable Baselines to build an RL agent that could play cribbage well through reinforcement learning? Welp, I’m not there yet, because I built the cribbage engine, started doing some statistical analysis, and have gone down the rabbit hole of programmatic cribbage and statistical regression.
Cribbage holds a special place in my heart. When I first met Mrs.Chaos somehow the topic of cribbage came up and we learned that both of us had played a lot of cribbage. It is not a popular card game. She beat me at cribbage... a lot. So, all that time ago, being the nerd I was (am?) I wrote a little engine in JavaScript where you put in the six cards in your hand and it would do a very basic calculation to tell what to discard. Running that PWA on my iPhone 3 took around 30 seconds to crunch, I wonder how fast it would go today? Sadly, the program is lost to the land of wind and ghosts.
This time, because I was trying to learn Python, I built the whole cribbage game as an engine with an agent that played randomly. I have THOUGHTS on Python for later. I kept running 100 games and player one would win 70/30, so I figured I must have a bug. I did: player one always had the first crib. So, if you have the first crib in the game, you have a significant advantage. Once I fixed this so that playing multiple games alternates the crib, the win rate balanced out to 50/50. Good! Seems solid.
There are a couple of really difficult problems in scoring cribbage. During the run, a straight of three cards can occur out of order and then be built on. So a run of (5,7,6) scores a run of three, for three. Then a (5,7,6,4) builds on that for a run of four. A play of (5,7,4,6) is the sneaky one: nothing scores when the 4 lands, because (5,7,4) is not a sequence, but under the standard rules the 6 still completes an out-of-order run of four. In scoring a hand, it's important to score a complete run (4,5,6,7) for four points, but then DO NOT score the sub-runs (4,5,6 and 5,6,7) of three.
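The "score the longest run, skip the sub-runs" rule for hand scoring can be sketched roughly like this. It assumes ranks are plain integers (ace = 1 through king = 13), and `score_runs` is a hypothetical helper written for illustration, not the engine's actual code.

```python
from itertools import combinations

def score_runs(ranks):
    """Score the runs in a hand given as integer ranks (ace=1 ... king=13)."""
    ordered = sorted(ranks)
    # Try the longest possible run first and stop at the first length that
    # scores, so a run of four never also pays out its two runs of three.
    for length in range(len(ordered), 2, -1):
        runs = 0
        for combo in combinations(ordered, length):
            # A run is strictly consecutive: each rank is one above the last.
            if all(b - a == 1 for a, b in zip(combo, combo[1:])):
                runs += 1
        if runs:
            # Duplicate ranks yield several runs of the same length
            # (the classic "double run"), and each one is counted.
            return runs * length
    return 0

print(score_runs([4, 5, 6, 7, 10]))  # 4: one run of four, no sub-runs
print(score_runs([4, 5, 6, 6, 9]))   # 6: two runs of three
```

Working from the longest length down is what keeps the sub-runs out: as soon as any run of a given length is found, shorter lengths are never checked.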
Then, rather than build the reinforcement learning agent, could I program a decent cribbage player? I find that when I start to peel a game apart, it makes me a better player. Play begins with six cards dealt to you, so how do you decide what to discard? Oh, university combinatorics, how I miss you. It's C(6,2) (six choose two), and Python does have a nice helper for that: list(combinations(player_hand, 2)), from itertools, which returns all fifteen possible discards.
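As a small illustration of that helper, here is one way to list every discard along with the four cards that would be kept. The string card labels and the `options` name are assumptions made for this example only.

```python
from itertools import combinations

# Six dealt cards, written as rank + suit strings purely for illustration.
player_hand = ["5H", "5D", "6S", "JC", "KD", "2H"]

options = []
for discard in combinations(player_hand, 2):
    # Whatever is not discarded is the four-card hand that gets kept.
    keep = [card for card in player_hand if card not in discard]
    options.append((keep, list(discard)))

print(len(options))  # 15, which is C(6,2)
```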
Now, once those two cards are “discarded” to the crib, run through ALL POSSIBLE forty-six cards left in the deck: if each one were chosen as the cut card, what is the hand worth? Average those scores across the possible hands. Then, for the two cards going into the crib, take ALL POSSIBLE combinations of two from the remaining forty-five cards, C(45,2), and ask what the average value of the crib is. If it’s your crib, add it; if it’s your opponent’s crib, subtract it. Pick the discard with the highest score and go for it.
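Here is a minimal sketch of the first half of that calculation: the average value of a kept hand over all forty-six possible starter cards. It deliberately leaves out the crib term to stay short, so it is not the full strategy described above. Cards are assumed to be (rank, suit) tuples, and `score_hand`, `average_hand_value` and `best_keep` are hypothetical names, not the engine's real functions.

```python
from itertools import combinations

SUITS = "CDHS"
DECK = [(rank, suit) for rank in range(1, 14) for suit in SUITS]  # ace=1 ... king=13

def score_hand(hand, starter):
    """Score a four-card hand plus the starter (hand flush rules, not crib)."""
    cards = list(hand) + [starter]
    points = 0
    # Fifteens: any subset whose pip values (face cards count 10) sum to 15.
    for size in range(2, 6):
        for combo in combinations(cards, size):
            if sum(min(rank, 10) for rank, _ in combo) == 15:
                points += 2
    # Pairs: two points for every pair of matching ranks.
    points += 2 * sum(a[0] == b[0] for a, b in combinations(cards, 2))
    # Runs: longest length only, so sub-runs are not counted again.
    ranks = sorted(rank for rank, _ in cards)
    for length in (5, 4, 3):
        runs = sum(
            all(b - a == 1 for a, b in zip(combo, combo[1:]))
            for combo in combinations(ranks, length)
        )
        if runs:
            points += runs * length
            break
    # Flush: four points for the hand, five if the starter matches too.
    if len({suit for _, suit in hand}) == 1:
        points += 5 if starter[1] == hand[0][1] else 4
    # Nobs: a jack in hand of the same suit as the starter.
    if any(rank == 11 and suit == starter[1] for rank, suit in hand):
        points += 1
    return points

def average_hand_value(keep, dealt):
    """Average score of `keep` over every card not among the six dealt."""
    unseen = [card for card in DECK if card not in dealt]
    return sum(score_hand(keep, cut) for cut in unseen) / len(unseen)

def best_keep(dealt):
    """Pick the four cards with the best average hand value (crib ignored)."""
    return max(combinations(dealt, 4), key=lambda keep: average_hand_value(keep, dealt))
```

The crib term would follow the same pattern: for each discard, average a crib score over the starter and the opponent's possible two cards, then add or subtract it depending on whose crib it is.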
There are some missed optimizations here, because you might want different strategies depending on your opponent. For a good opponent, the likelihood of any particular discard to the crib will not be random. ACTUALLY, this would be a fascinating statistic to look at: what are the most common discards for a particular agent over time? Then use that to adjust the model away from an even distribution. Put it on the W0511 list (that's a pylint joke, hysterical)!
After the discard, the other part of the game is deciding which cards to play during the run portion of the game. So here was the strategy I programmed. First, for each card in your hand, how many points would you get if you played it? Then I nudge this based on a few different strategies. 1) If the run total is going to be less than 5, nudge it up, because that means your opponent can’t get 15. 2) If the run total is going to be exactly 5 or 10, nudge it down, because that makes it more likely your opponent gets a 15. 3) If the run total was LESS than 15 and your card makes it MORE than 15, nudge up, because you blocked your opponent. 4) Finally, all other things being equal, play the biggest card you can, because that helps you towards getting a “Go” for a point.
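Those four nudges could look something like the sketch below. The size of the nudge (0.5 here) is an assumption, since the post does not say how big it is, and `points_for_play` is a hypothetical callback standing in for whatever the engine uses to score a play.

```python
def choose_card(hand, count, points_for_play, nudge=0.5):
    """Pick a pip value (1-10) to play from `hand`, given the running `count`.

    `points_for_play(value)` is a hypothetical callback returning the points
    that card would peg right now. `nudge` is an assumed weight.
    """
    best = None
    for value in hand:
        total = count + value
        if total > 31:
            continue  # not a legal play
        score = float(points_for_play(value))
        if total < 5:
            score += nudge   # 1) opponent cannot reach 15 from here
        if total in (5, 10):
            score -= nudge   # 2) sets the opponent up for a 15
        if count < 15 < total:
            score += nudge   # 3) jumped past 15, blocking it
        # 4) ties on score fall through to the larger card.
        candidate = (score, value)
        if best is None or candidate > best:
            best = candidate
    return None if best is None else best[1]

no_points = lambda value: 0  # pretend no card pegs anything
print(choose_card([2, 5, 9], 0, no_points))  # 2: keeps the total under 5
print(choose_card([3, 8], 10, no_points))    # 8: jumps the total past 15
```

Comparing `(score, value)` tuples keeps the tie-break in one place: the nudged score decides first, and the bigger card wins only when scores are equal.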
So that’s the engine. How do you think it does when the OptimizedPlayer goes against the random player? It wins 80/20. So, what I’m reading here is… if you are a really good cribbage player and your opponent is playing randomly, they still win one out of five times. How did Mrs.Chaos beat me so often? I will never know. Maybe I was subconsciously letting her win.