## The correct way to start an Exponential Moving Average (EMA)

The EMA is a very handy tool. It lets us calculate an average over recent data. But, unlike a Simple Moving Average, we don't have to keep a window of samples around—we can update an EMA "online," one sample at a time.

But the perennial question is: how do you start an EMA?

First, here are a couple of wrong ways.

Let's assume that we have incoming data that looks like this:

    x <- c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)


The most straightforward way to start an EMA is to simply let it take on some arbitrary constant (usually 0) as its initial value. This means the first values that the EMA returns will be biased towards this constant, and we have to feed in enough samples to "warm it up" before we can get decent numbers out.

It's simple to implement:

    make.ema0 <- function (r) {
        s <- 0                          # initial value: an arbitrary constant
        list(
            update=function (x) {
                s <<- r*s + (1-r)*x     # the assigned value is also the return value
            }
        )
    }

    m0 <- make.ema0(0.7)

    y0 <- numeric(length(x))            # allocate the output before filling it
    for (i in seq_along(x)) {
        y0[i] <- m0$update(x[i])
    }

The common alternative is to take the first sample as the initial value for the EMA. Code for that looks like:

    make.ema1 <- function (r) {
        started <- FALSE
        s <- NULL
        list(
            update=function (x) {
                if (!started) {
                    started <<- TRUE
                    s <<- x             # first sample becomes the initial value
                } else {
                    s <<- r*s + (1-r)*x
                }
            }
        )
    }

    m1 <- make.ema1(0.7)

    y1 <- numeric(length(x))
    for (i in seq_along(x)) {
        y1[i] <- m1$update(x[i])
    }


In both cases, we're committing what's essentially the same mistake. In the first case, we're treating the EMA as if we've seen an infinite string of 0s before our first real sample. In the second, we're treating it as if we've seen an infinite string of our first sample before getting started.

Either way, we have to give enough time for the EMA to "warm up," which really means that we need to give it enough time that the impact of the imaginary infinite sequence becomes negligible.
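To make the "warm-up" concrete, here is a minimal sketch of how much weight the starting value still carries after a number of updates. Each update multiplies the previous state by $$r$$, so after $$n$$ samples the initial value retains weight $$r^n$$. The helper name `phantom.weight` is my own, not part of the code above.

```r
# Hypothetical helper: weight still carried by the initial value after n updates
phantom.weight <- function (r, n) r^n

r <- 0.7
print(round(phantom.weight(r, c(1, 5, 10, 20)), 4))

# Check against the recursion: start the EMA at 1 and feed it only zeros,
# so whatever remains in s is purely the initial value's contribution.
s <- 1
for (i in 1:10) s <- r*s + (1-r)*0
print(all.equal(s, phantom.weight(r, 10)))
```

With `r = 0.7`, the phantom data still accounts for roughly 3% of the result after 10 samples, which is one way to judge how long the warm-up needs to be.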

The correct approach is to actively account for how much data has gone into the EMA, versus how much of the EMA's value is from phantom data before our samples arrived.

Let's set our baseline.

Let's take $$r = 0.5$$, and our first 3 samples to be, in order: 3, 4, 5. Taking the exponentially-weighted sum, and dividing by the summed weights, gives us our expected EMA:

$$s = \frac{3 \cdot 0.5^2 + 4 \cdot 0.5^1 + 5 \cdot 0.5^0}{0.5^2 + 0.5^1 + 0.5^0} \approx 4.428571$$

However, if we use the first method of starting the EMA at 0, we would get $$3.875$$. If we use the second method of initializing with the first sample, we'd get $$4.25$$.
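These three numbers are easy to check directly. The snippet below is only a sketch of the arithmetic, with variable names (`xs`, `target`, `s0`, `s1`) chosen for illustration:

```r
r <- 0.5
xs <- c(3, 4, 5)
n <- length(xs)

# Baseline: weighted average where the newest sample has weight r^0
# and the oldest has weight r^(n-1)
w <- r^((n-1):0)
target <- sum(w * xs) / sum(w)

# First method: start the EMA at 0
s0 <- 0
for (v in xs) s0 <- r*s0 + (1-r)*v

# Second method: start the EMA at the first sample
s1 <- xs[1]
for (v in xs[-1]) s1 <- r*s1 + (1-r)*v

print(c(target=target, from.zero=s0, from.first=s1))
```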

We could assume that we have, in fact, seen an infinite sequence of data, initialize our EMA according to that, and then try to remove its effect. To keep things simple, we'll say that we've seen an infinite sequence of a single value, $$\alpha$$. Our equation would then look like this:

$$s = \frac{\ldots + \alpha \cdot 0.5^4 + \alpha \cdot 0.5^3 + 3 \cdot 0.5^2 + 4 \cdot 0.5^1 + 5 \cdot 0.5^0} {\ldots + 0.5^4 + 0.5^3 + 0.5^2 + 0.5^1 + 0.5^0}$$

We want to remove the effect of the infinite sequence of $$\alpha$$ from both the numerator and the denominator. If we let $$\alpha = 0$$, then the numerator takes care of itself:

$$s = \frac{3 \cdot 0.5^2 + 4 \cdot 0.5^1 + 5 \cdot 0.5^0} {\ldots + 0.5^4 + 0.5^3 + 0.5^2 + 0.5^1 + 0.5^0}$$

All that's left is to scale the result appropriately to account for the extra weights in the denominator.
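To see what "appropriately" means here: the infinite denominator sums to $$1/(1-r)$$, while the weights that belong to our $$n$$ real samples sum to $$(1-r^n)/(1-r)$$. The ratio between the two is $$1 - r^n$$, so dividing the zero-started EMA by $$1 - r^n$$ should recover the baseline. A quick check on the running example (a sketch, with illustrative variable names):

```r
r <- 0.5
xs <- c(3, 4, 5)

# Zero-started EMA, which implicitly divides by the full infinite sum of weights
s <- 0
for (v in xs) s <- r*s + (1-r)*v

# Rescale by the share of weight that belongs to real samples: 1 - r^n
corrected <- s / (1 - r^length(xs))
print(corrected)
```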

The following code performs this correction and gives the correct EMA:

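    make.ema2 <- function (r) {
        s <- 0                          # zero start, so phantom data adds nothing to the numerator
        extra <- 1                      # weight still attributed to phantom data (r^n after n updates)
        list(
            update=function (x) {
                s <<- r*s + (1-r)*x
                extra <<- r*extra
                s / (1-extra)           # rescale to the weight of the real samples only
            }
        )
    }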


Here's the new method against the original data, with the two alternatives:
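The original comparison is not reproduced here, but a rough equivalent can be tabulated. The sketch below does not reuse the closures above; it uses a small helper of my own, `run.ema`, and applies the correction in vectorized form, assuming the same `x` and `r = 0.7` as before.

```r
x <- c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
r <- 0.7

# Hypothetical helper: run the plain recursion from a given starting value s
run.ema <- function (x, r, s) {
    out <- numeric(length(x))
    for (i in seq_along(x)) {
        s <- r*s + (1-r)*x[i]
        out[i] <- s
    }
    out
}

y0 <- run.ema(x, r, 0)                      # start at 0
y1 <- c(x[1], run.ema(x[-1], r, x[1]))      # start at the first sample
y2 <- y0 / (1 - r^seq_along(x))             # zero start, then corrected

print(round(data.frame(x, y0, y1, y2), 3))
```

All three columns converge as more samples arrive; the differences are concentrated in the first few rows, where the phantom data still carries most of the weight.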

This correction may not matter much if all you care about is the average of your data, but the same technique can be used to get a meaningful and useful Exponential Moving Variance and other related values. From those, it's possible to construct a completely online regression to fit data as it comes in, with meaningful confidence and prediction intervals.