Better averages for online machine-learning
Averages are used, in some form or other, in many machine-learning algorithms. Stochastic gradient descent is a great example of an average in disguise, thin though the disguise may be.
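To make the "average in disguise" point concrete, here is a minimal sketch (on a made-up data stream) showing that SGD on a squared-error loss with a 1/n step size reproduces the ordinary running mean:

```python
import random

def sgd_mean(stream):
    """Estimate the mean of a stream by SGD on the squared loss
    L(theta) = 0.5 * (x - theta)**2, with step size 1/n."""
    theta = 0.0
    for n, x in enumerate(stream, start=1):
        grad = theta - x            # dL/dtheta for the current sample
        theta -= (1.0 / n) * grad   # SGD step; algebraically theta += (x - theta) / n
    return theta

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]
print(sgd_mean(data))            # ~5.0
print(sum(data) / len(data))     # identical up to floating-point error
```

With that step-size schedule the update is exactly the incremental-mean recursion, so the "learning algorithm" is literally a simple average.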
Picking the right kind of average can be critical. As learning algorithms explore sub-optimal choices, the resulting negative impact on backed-up state values can persist across epochs, hampering performance. Conversely, some kinds of average never converge, preventing the algorithm from settling on an optimal outcome.
Here, I officially release a paper on a particular kind of average that is as adaptable as an exponential moving average, yet has guaranteed convergence like a simple average.
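To illustrate the trade-off the paper targets (this is only a toy comparison, not the paper's estimator), the sketch below runs a simple average and an exponential moving average over a stream whose mean shifts halfway through; the stream and the `alpha` value are my own placeholders:

```python
import random

def simple_average(stream):
    """Running mean: converges under stationary noise, but adapts slowly."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n
    return mean

def exponential_moving_average(stream, alpha=0.05):
    """EMA: tracks a shifting signal quickly, but never settles under noise."""
    ema = 0.0
    for x in stream:
        ema += alpha * (x - ema)
    return ema

random.seed(1)
# The stream's mean jumps from 0 to 3 halfway through (non-stationary data).
stream = [random.gauss(0.0, 1.0) for _ in range(5_000)] + \
         [random.gauss(3.0, 1.0) for _ in range(5_000)]

print(simple_average(stream))              # ~1.5: dragged down by the old regime
print(exponential_moving_average(stream))  # ~3, plus noise: adapted, but still jittering
```

The simple average converges but stays anchored to stale data, while the EMA tracks the shift yet keeps fluctuating forever; an average with both properties is what the paper is after.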