Skip to content

Better averages for online machine-learning

Averages are used, in some form or other, and many machine-learning algorithms. Stochastic gradient descent is a great example of an average in disguise, thin though it may be.

Picking the right kind of average can be critical. As learning algorithms explore sub-optimal choices, the resulting negative impact on backed-up state values can persist over epochs, hampering performance. Alternatively, some kinds of average don't converge, preventing the algorithm from settling into optimal outcomes.

Here, I officially release a paper on a particular kind of average that's adaptable like an exponential moving average, but has guaranteed convergence like a simple average.

Continue reading "Better averages for online machine-learning"

Action-selection and learning-rates in Q-learning

Implementing a Q-table reinforcement-learner is in many ways simple and straight-forward and also somewhat tricky. The basic concept is easy to grasp; but, as many have mentioned, reinforcement-learners almost want to work, despite whatever bugs or sub-optimal math might be in the implementation.

Here are some quick notes about the approach I've come to use, specifically about action-selection (e.g. epsilon-greedy versus UCB) and managing learning-rates. They've helped my learners converge to good strategies faster and more reliably. Hopefully they can help you, too!

Continue reading "Action-selection and learning-rates in Q-learning"

Simulating deck-shuffling

I recently worked on a small project simulating random events that were far too numerous to enumerate. In such cases, every bit of speed matters.

The project in this case was similar to determining likelihood of five-card Poker hands in seven-card draws.

Simulation of shuffling the deck and drawing cards can take a large part of the runtime if not done well, but there's a trick that makes it almost trivial.

Continue reading "Simulating deck-shuffling"

Exact random sums

Sometimes, you need a list of random numbers that sum to a known constant. There's a known algorithm to provide this list of numbers with the proper distribution, but a straight-forward implementation may give a list that doesn't sum exactly to the desired constant because of rounding error.

This article describes the basic algorithm, why the rounding error happens, and the solution.

Continue reading "Exact random sums"