Exact random sums

Posted by David Owen on 2020-06-16

Sometimes, you need a list of random numbers that sum to a known constant. There's a known algorithm to provide this list of numbers with the proper distribution, but a straight-forward implementation may give a list that doesn't sum exactly to the desired constant because of rounding error.

This article describes the basic algorithm, why the rounding error happens, and the solution.

Basic algorithm

Many approach the problem with a simple but incorrect algorithm: generate a list of random numbers (the uniform distribution is often used), then rescale them to have the desired sum.

def rsum_wrong(m):
    x = [random.random() for i in range(m)]
    a = sum(x)
    return [b/a for b in x]

x = rsum_wrong(4)
x

0.3700578276458176

0.3321644972830869

0.18431049041512548

0.1134671846559699

sum(x)

1.0

The random numbers from this algorithm have a tendency to bias toward 1/N (for N random numbers) more strongly than they should.

The correct algorithm is to generate N-1 uniformly-distributed numbers, include 0 and 1, sort them, then calculate the successive differences.

def rsum1(m):
    x = [0.0, 1.0] + [random.random() for i in range(m - 1)]
    x.sort()
    return [x[i + 1] - x[i] for i in range(m)]

rsum1(4)

0.4049341374504143

0.10634058391819423

0.2725238676661641

0.21620141096522738

Rounding error

If you use the basic algorithm as-is, you will get a list of numbers that correctly sum to exactly 1.0.

def rsum(n, m):
    x = []
    for i in range(n):
        x.append(rsum1(m)) \
    return x

x = rsum(10, 3)
x

0.30331272607892745	0.17328422807342836	0.5234030458476442
0.5833820394550312	0.324730845740304	0.09188711480466483
0.28183784439970383	0.22284901141768643	0.49531314418260974
0.6183689966753316	0.1374352074818923	0.24419579584277606
0.25050634136244054	0.6592399146057996	0.0902537440317599
0.8102172359965896	0.17256824004106353	0.017214523962346906
0.3101475693193326	0.5920183811202501	0.0978340495604173
0.7298317482601286	0.16900653970786483	0.10116171203200652
0.47214271545271336	0.2118412164627279	0.31601606808455873
0.1007012080683658	0.3334706273854179	0.5658281645462163

[sum(y) for y in x]

1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0

However, if you need the list to sum to some other constant, you could run into problems. Simply scaling the output produces results with round-off problems:

def rsum(n, m, a=1.0):
    x = []
    for i in range(n):
        x.append([a*b for b in rsum1(m)]) \
    return x

random.seed(9)
a = 10000
x = rsum(10, 3, a)
[sum(y) - a for y in x]

0.0
1.8189894035458565e-12
0.0
0.0
0.0
0.0
0.0
0.0
0.0
1.8189894035458565e-12

As we already know, the uniform random numbers have precision down to a fixed power of 2. Subtracting any two uniform randoms will then be precise down to the same fixed power of 2.

Multiplying is different, though. If a random is in the range of \([0.5, 1.0)\), it requires all of the available bits to represent its value. Multiplying such a number by anything with multiple bits set means that some rounding must occur.

For example, assume 4-bit precision, multiplying a random 0.1011 by constant 11: the answer is \(1.011 + 0.1011 = 10.0001\), which would then round to 10.00.

If a random is in a lower range, however, then counting off after whichever bit is the leading set bit, it will have some "spare" bits in the low-order end that are set to 0. This means that the result of multiplication essentially has some extra room, meaning rounding may not be necessary at all, or at least round off fewer set bits.

For example, multiplying a random 0.0011 by constant 11: the answer is \(0.0110 + 0.0011 = 0.1001\), which is unaffected by rounding.

In order to maintain precision down to the same power of 2 for all numbers, a sacrificial leading bit must be set. This can be done by adding 1.0 to the uniform randoms. At that point, multiplying the uniform randoms (which are now in the range \([1.0, 2.0)\)) will be done at a constant precision in terms of powers of 2. As successive differences are calculated, they will then all be clipped identically; the added offset will also be canceled out without further work.

def rsum1(m, a=1.0):
    x = [0.0, 1.0] + [random.random() for i in range(m - 1)]
    x = [(b + 1.0) * a for b in x]
    x.sort()
    return [x[i + 1] - x[i] for i in range(m)]

def rsum(n, m, a=1.0):
    x = []
    for i in range(n):
        x.append(rsum1(m, a)) \
    return x

random.seed(9)
a = 10000.0
x = rsum(100, 3, a)
[sum(y) - a for y in x]

0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

David Owen’s blog