Sampling in Floating Point (2/3): 1D Intervals
After learning about Walker's algorithm for uniformly sampling the floating-point values in [0, 1), I wondered whether it could be generalized to sample arbitrary intervals [a, b) both uniformly and comprehensively, generating every representable value with appropriate probability.
The good news is that it was a fun thought exercise with things learned along the way. In the end I found enough insights to come up with a solution. However, upon further digging I found two previous implementations of the approach I came up with. So much for a tour de force paper describing my findings in ACM Transactions on Modeling and Computer Simulation. They are:
- Christoph Conrads’s Rademacher Floating Point Library, which dates to 2018. See the make_uniform_random_value() function there.
- Olaf Bernstein’s dist_uniformf_dense() which seems to date to 2021.
(I’d be interested to hear if there are any earlier instances of it.)
Nevertheless, I still thought it might be useful to write up some of the motivation, walk through my route to a solution, and explain some of the subtleties.
What’s Wrong With Linear Interpolation?
If you want to sample a value in an interval [a, b), the natural approach is to take a uniform sample in [0, 1) and linearly interpolate between the endpoints. Unfortunately, doing so has a few problems.
First one must choose an equation with which to linearly interpolate. For an interval [a, b) and a uniform value t in [0, 1), the two common formulations are a + t * (b - a) and (1 - t) * a + t * b. The first, though it uses one fewer operation, does not always respect the bounds of the interval: due to rounding in the subtraction, multiplication, and addition, it can return a value equal to b (or even beyond it) even though t < 1. The second returns a and b exactly at t = 0 and t = 1, but it is more expensive and is not even guaranteed to be monotonic in t.
In graphics, respecting intervals is important since we often do things like bound vertex positions at different times, linearly interpolate them, and then want to be able to assert that the result is inside the bounds. In this case, a value that lands outside the interval can trigger assertion failures or out-of-bounds accesses far from the original source of the error.
Both of these approaches also suffer from a minor bias for reasons similar to why dividing a random 32-bit value by 2^32 does not uniformly sample the floating-point values in [0, 1): most representable values in the interval can never be generated, and rounding makes some of the values that are generated slightly more likely than others.
For more on the above, including an entertaining review of how linear interpolation is implemented in various programming languages' standard libraries, and of the misstatements in their documentation about the characteristics of the results, see Drawing random floating-point numbers from an interval, by Frédéric Goualard.1
A Few Utility Functions
Before we go further, let’s specify a few utility functions that will be used in the forthcoming implementations. (All of the following code is available in a small header-only library.)
For efficient sampling of the exponent distributions that are ahead of us, we will want a function that counts the number of leading zero bits in a 64-bit value; it generally maps to a single instruction on modern CPUs (and to std::countl_zero() in C++20).
int CountLeadingZeros(uint64_t value);
We’ll assume the existence of two functions that provide uniform random values.
uint64_t Random64Bits();
uint32_t Random32Bits();
C++20’s std::bit_cast()
function makes it easy to convert from a float32 to
its bitwise representation and back.
uint32_t ToBits(float f) { return std::bit_cast<uint32_t>(f); }
float FromBits(uint32_t b) { return std::bit_cast<float>(b); }
A few helper functions extract the pieces of a float32. (If necessary, see
the Wikipedia
page
for details of the in-memory layout of float32s to understand what these
are doing.) Zero is returned by SignBit()
for positive values and one
for negative. Because the float32 exponent is stored in memory in biased
form as an unsigned value from zero to 255, Exponent()
returns the
unbiased exponent, which ranges from −127 (for zeros and the denormalized
numbers) to 128 (for infinities and NaNs); normal values have unbiased
exponents between −126 and 127.
int SignBit(float f) { return ToBits(f) >> 31; }
int Exponent(float f) { return ((ToBits(f) >> 23) & 0xff) - 127; }
constexpr int SignificandMask = (1 << 23) - 1;
int Significand(float f) { return ToBits(f) & SignificandMask; }
We’ll also find it useful to be able to generate uniform random significands.
uint32_t RandomSignificand() { return Random32Bits() & SignificandMask; }
Float32FromParts()
constructs a float32 value from the specified pieces.
The assertions document the requirements for the input parameters.
float Float32FromParts(int sign, int exponent, int significand) {
assert(sign == 0 || sign == 1);
assert(exponent >= -127 && exponent <= 127);
assert(significand >= 0 && significand < (1 << 23));
return FromBits((sign << 31) | ((exponent + 127) << 23) | significand);
}
A positive power-of-two float32 value can be constructed by shifting the biased exponent into place.
float FloatPow2(int exponent) {
assert(exponent >= -126 && exponent <= 127);
return FromBits((exponent + 127) << 23);
}
Expressed with these primitives, Reynolds's pragmatic compromise algorithm for uniformly sampling in [0, 1) is:
float Sample01() {
uint64_t bits = Random64Bits();
int significand = bits & SignificandMask;
if (int lz = CountLeadingZeros(bits); lz <= 40)
return Float32FromParts(0, -1 - lz, significand);
return 0x1p-64f * significand;
}
First Steps
Back to our original question: can we do better than linear interpolation to sample an arbitrary interval, and more specifically, is it possible to generalize Walker's algorithm to remove the limitation of sampling over [0, 1)?
An easy first case is to consider intervals that start at zero and end with an arbitrary power of two. I first took the smallest step possible and thought about [0, 2): the structure is the same as [0, 1) with every power-of-two interval shifted up by one exponent, so Walker's approach applies directly, starting with probability 1/2 of sampling in [1, 2).
Have we missed anything? We should be careful. To further validate this
direction, consider the case where we have a tiny power-of-two sized
interval, say [0, 2^−125).
We sample the exponent as before, halving the probability of each
successively smaller power-of-two interval until we reach the denormalized
numbers, which are equally spaced over [0, 2^−126). Conveniently, the two
smallest intervals, [0, 2^−126) and [2^−126, 2^−125), are then each sampled
with probability 1/2, which is exactly what uniform sampling over
[0, 2^−125) requires.
On this topic, Walker wrote:
In practice, the range of the exponent will be limited, and the probability of the number falling into either of the two smallest intervals will be the same.
Denormalized numbers were invented after his paper, so it seems that this was a minor fudge in his original approach, one that is corrected today by advances in floating point.
Here is a function to sample a float32 exponent for this case, taking 64
random bits at a time and counting leading zeros to sample the distribution. An
exponent is returned either if a one bit is found in the random bits or if
enough zero bits have been seen to make it to the denorms. In either case,
if the denorms have been reached, the exponent −127 is returned, which
corresponds to the denormalized numbers once a significand is included.
int SampleToPowerOfTwoExponent(int exponent) {
assert(exponent >= -127 && exponent <= 127);
while (exponent > -126) {
if (int lz = CountLeadingZeros(Random64Bits()); lz == 64)
exponent -= 64;
else
return std::max(-127, exponent - 1 - lz);
}
return -127;
}
Given SampleToPowerOfTwoExponent(), the full algorithm to uniformly sample an interval [0, 2^exponent) is straightforward.
float SampleToPowerOfTwo(int exponent) {
int ex = SampleToPowerOfTwoExponent(exponent);
return Float32FromParts(0, ex, RandomSignificand());
}
An implementation that uses a fixed number of random bits can be found with
a straightforward generalization of Reynolds's "pragmatic" algorithm that
always consumes only 64 random bits, though there are two differences
compared to the [0, 1) case. The first is that the starting exponent is a
parameter rather than being fixed.
The second difference is that if all of the bits used to select an interval
are zero, then if the initial exponent is large enough that the denorms have
not been reached, the remaining interval is sampled with equally spaced
values, just as Reynolds's algorithm handles its final interval.
float SampleToPowerOfTwoFast(int exponent, uint64_t bits) {
int significand = bits & SignificandMask;
int lz = CountLeadingZeros(bits);
if (lz >= 41 && exponent - 41 > -127)
return significand * 0x1p-23f * FloatPow2(exponent - 41);
int ex = exponent - 1 - lz;
return Float32FromParts(0, std::max(-127, ex), significand);
}
Another easy case comes with an interval where both endpoints have the same exponent. In that case, the spacing between them is uniform and a value can be sampled by randomly sampling a significand between theirs. That setting is shown on the floating-point number line below; the values marked in red are the possible results, depending on the significand.
The code is easy to write given a RandomInt()
function that returns a
uniform random integer between 0 and the specified value, inclusive:
float SampleSameExponent(float a, float b) {
assert(a < b && Exponent(a) == Exponent(b));
int sa = Significand(a), sb = Significand(b);
int sig = sa + RandomInt(sb - sa - 1);
return Float32FromParts(SignBit(a), Exponent(a), sig);
}
Arbitrary (Power of 2) Lower Bounds
The easy successes stop coming when we consider intervals with a
power-of-two value at their lower bounds: say that we'd like to sample
uniformly over an interval [2^emin, 2^emax).
Here we most definitely see the importance of the denorms and the last power-of-two sized interval of normal floating-point numbers having the same width. With a power-of-two interval that ends above zero, we no longer have two intervals at the end that should be sampled with the same probability and things fall apart.
Upon reaching this realization, I had no idea how to proceed; I feared that
the cause might be lost. Lacking any other ideas, I wondered if it would
work to apply Walker's approach still with the probability 1/2^(k+1) for the
kth power-of-two interval counting down from the top, but taking the
interval index modulo the number of intervals so that the leftover
probability wraps around.
With this method, the probability of sampling the kth interval from the top
works out to 2^−(k+1)/(1 − 2^−n), where n is the number of intervals
(summing the geometric series Σ_j 2^−(k+1+jn)), which is exactly
proportional to the interval's width, 2^(emax−1−k).
SampleExponent()
implements the algorithm that consumes random bits until
it successfully samples such a single power-of-two interval.
int SampleExponent(int emin, int emax) {
int c = 0;
while (true) {
if (int lz = CountLeadingZeros(Random64Bits()); lz == 64)
c += 64;
else
return emax - 1 - ((c + lz) % (emax - emin));
}
}
If emin
and emax
are not known at compile time, computing the integer
modulus in SampleExponent()
may be expensive. Because the maximum value
of emax-emin
is 253, it may be worthwhile to maintain a table of
constants for use with an efficient integer modulus algorithm (see e.g.,
Lemire et al. 2021.)
With SampleExponent()
in hand, the full algorithm is straightforward.
// Sample uniformly and comprehensively in [2^emin, 2^emax).
float SampleExponentRange(int emin, int emax) {
assert(emax > emin);
int significand = RandomSignificand();
return Float32FromParts(0, SampleExponent(emin, emax), significand);
}
For a "pragmatic" version of this algorithm that uses a fixed number of
random bits, we could take the number of leading zeros modulo the number of
power-of-two sized intervals under consideration to choose an interval and
then use a uniform random significand. However, in the rare case where all
of the bits used to sample an interval are zero, the remaining interval is
of the form [2^emin, 2^j) for some exponent j; handling it correctly
requires either consuming more random bits or accepting a small amount of
bias in how that leftover probability is distributed.
Partial Intervals at One or Both Ends
With that we finally have enough to return to the original task: uniformly
and comprehensively sampling an arbitrary interval [a, b).
An approach based on rejection sampling can be used to sample the specified
interval: the idea is that we will sample all of the possible power-of-two
intervals as before, with probability according to their width. We then
uniformly sample a significand in the chosen interval and accept the
resulting value only if it is within [a, b); otherwise we try again.
The implementation isn't much code given all the helpers we've already
defined, though there are two important details. First, the upper exponent
is bumped up by one if b's significand is non-zero. To understand why,
consider the difference between sampling the intervals [1, 2) and [1, 3):
in the second case, values with exponent 1 must also be candidates, with
those in [3, 4) rejected. Second, if a is zero or denormalized (i.e., its
exponent is −127), SampleToPowerOfTwoExponent() is used to choose the
exponent, since it correctly handles the denorms.
float SampleRange(float a, float b) {
assert(a < b && a >= 0.f && b >= 0.f);
int ea = Exponent(a), eb = Exponent(b);
if (Significand(b) != 0) ++eb;
while (true) {
int e = (ea == -127) ? SampleToPowerOfTwoExponent(eb) :
SampleExponent(ea, eb);
float v = Float32FromParts(0, e, RandomSignificand());
if (v >= a && v < b)
return v;
}
}
Note that it would probably be worthwhile to handle the special case of
matching exponents with a call to SampleSameExponent()
, as rejection
sampling with a significand that spans the entire power-of-two range will be
highly inefficient if the two values are close together.
The worst case for this algorithm comes when b
’s significand is
small—i.e., b
is just past a power-of-two. The upper power-of-two range
will be sampled with probability at least 1/2 but then the sampled value
v
will usually be rejected, requiring another time through the while
loop. Conversely, having a
just below a power of 2 is less trouble,
since the corresponding power-of-two interval is the least likely to be
sampled.
Closed Intervals
One nice thing about the algorithm implemented in SampleRange() is that
handling closed intervals [a, b] just requires changing the v < b
test in the while loop to v <= b. The only other difference is
that eb is always increased by one, so that b itself can be generated even
when it is an exact power of two. Thus, the worst case for this
version of the algorithm is when b is an exact power of 2, again giving a
roughly one-half chance of rejection each time through the loop.
Further Improvements
Stratified sampling is a topic we didn’t get to today; it is often
desirable when one is generating multiple samples over an interval. For a
power-of-2 stratification, it’s possible to work backward from the various
sampling algorithms to determine constraints on the bit patterns that
achieve stratification. I'll leave the details of that to the reader;
Sample01()
is a good place to start.
We also haven't dug into the case of an interval that spans zero. To
achieve uniform sampling, a similar rejection-based approach is probably
needed: given such an interval [a, b) with a < 0 < b, one might sample a
magnitude in [0, max(−a, b)), attach a random sign, and reject values
outside [a, b).
Discussion
It was pretty good going through the special cases until we reached the end. Unfortunately, I don’t see a good way to work around the need to do rejection sampling when there are partial power-of-two intervals at the ends of the range. Perhaps that isn’t the worst thing ever, but having an irregular amount of computation afoot is not ideal if high performance on GPUs or using SIMD instructions is of interest.
Nevertheless, a quick benchmark suggests that SampleRange() is only about
2.5 times slower than sampling with linear interpolation.
Note

1. Goualard also suggests a sampling algorithm based on a uniform spacing over the interval that is by design not able to generate all possible floating-point values. This algorithm seems to have been previously derived by Artur Grabowski in his random-double library from 2015; see rd_positive() there. ↩