## Action-selection and learning-rates in Q-learning

Implementing a Q-table reinforcement-learner is in many ways simple and straight-forward and also somewhat tricky. The basic concept is easy to grasp; but, as many have mentioned, reinforcement-learners almost want to work, despite whatever bugs or sub-optimal math might be in the implementation.

Here are some quick notes about the approach I've come to use, specifically about action-selection (e.g. epsilon-greedy versus UCB) and managing learning-rates. They've helped my learners converge to good strategies faster and more reliably. Hopefully they can help you, too!

Continue reading "Action-selection and learning-rates in Q-learning"

## Simulating deck-shuffling

I recently worked on a small project simulating random events that were far too numerous to enumerate. In such cases, every bit of speed matters.

The project in this case was similar to determining likelihood of five-card Poker hands in seven-card draws.

Simulation of shuffling the deck and drawing cards can take a large part of the runtime if not done well, but there's a trick that makes it almost trivial.

## Exact random sums

Sometimes, you need a list of random numbers that sum to a known constant. There's a known algorithm to provide this list of numbers with the proper distribution, but a straight-forward implementation may give a list that doesn't sum exactly to the desired constant because of rounding error.

This article describes the basic algorithm, why the rounding error happens, and the solution.

## Precision of random numbers

In some sense, random numbers uniformly-distributed in the range $$[0, 1)$$ are the easiest class of random number to generate.

Because of the internal representation of floating-point numbers, all you need to do is fill the significand with random bits, set the exponent to -1, and the sign bit to positive.

Some language run-times do this better than others.