<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.3.1">Jekyll</generator><link href="https://pharr.org/matt/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://pharr.org/matt/blog/" rel="alternate" type="text/html" /><updated>2026-05-15T12:45:14-07:00</updated><id>https://pharr.org/matt/blog/feed.xml</id><title type="html">Matt Pharr’s blog</title><subtitle>It seemed worth writing up at the time.
</subtitle><entry><title type="html">A Visit to the Sponza Palace’s Atrium</title><link href="https://pharr.org/matt/blog/2023/07/10/sponza-atrium.html" rel="alternate" type="text/html" title="A Visit to the Sponza Palace's Atrium" /><published>2023-07-10T00:00:00-07:00</published><updated>2023-07-10T00:00:00-07:00</updated><id>https://pharr.org/matt/blog/2023/07/10/sponza-atrium</id><content type="html" xml:base="https://pharr.org/matt/blog/2023/07/10/sponza-atrium.html">&lt;p&gt;Back in the early 2000s, the &lt;a href=&quot;http://hdri.cgtechniques.com&quot;&gt;CGTechniques
website&lt;/a&gt; had a “rendering challenge”, where
an interesting model would be posted and then artists would try to make the
best rendering they could of it.  I remember how remarkable the images were
in those early days of global illumination–seeing complex 3D models
coupled with beautiful lighting was incredibly inspiring, particularly when
there were few especially interesting scenes available for use in rendering
research.  One of the models used for the challenge was the now-famous
“Sponza Atrium,” created by Marko Dabrovic.  The CGTechniques website is
remarkably still online, but the &lt;a href=&quot;http://hdri.cgtechniques.com/~sibenik2/&quot;&gt;contest
pages&lt;/a&gt; are missing images, though
&lt;a href=&quot;https://web.archive.org/web/20030221053202/http://hdri.cgtechniques.com/~sibenik2/&quot;&gt;archive.org delivers some of them at
least&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In those days of the rendering challenge, Greg Humphreys and I were in the
thick of working on the first edition of &lt;em&gt;Physically Based Rendering&lt;/em&gt;.  We
were desperate for good scenes to render.  I got in touch with Marko and he
was more than happy to give us permission to use the Sponza Atrium model in
the book and to help with the conversion.  In addition to the atrium and a
model of the Šibenik Cathedral, less famous though still wonderful, Marko
and his colleague Mihovil Odak had a nice model of an Audi TT that we made
extensive use of as well.  They were all great scenes, especially for those
days; we kept using them in the book through the third edition, only now
moving on to new ones, two decades later.&lt;/p&gt;

&lt;p&gt;I was recently able to visit Croatia, home to the originals for the Sponza
Atrium and Šibenik Cathedral models.  I didn’t make it to Šibenik, so
missed the cathedral, but I did visit Dubrovnik, home of the Sponza
Palace.  You get a nice postcard when you pay the entrance fee.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/sponza-palace-postcard.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/sponza-palace-postcard.jpg&quot; width=&quot;453&quot; height=&quot;605&quot; alt=&quot;Postcard of the Sponza Palace&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The Sponza Palace is now home to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Dubrovnik_Archive&quot;&gt;Dubrovnik State
Archive&lt;/a&gt;. A room at the
entrance has a memorial to the 200 soldiers who died defending Dubrovnik
during the &lt;a href=&quot;https://en.wikipedia.org/wiki/Siege_of_Dubrovnik&quot;&gt;Siege of
Dubrovnik&lt;/a&gt;.  The Palace
itself was damaged then, hit by a number of shells.  You couldn’t tell now,
nor do you see any hint of the broader devastation to the city, just 30
years ago.&lt;/p&gt;

&lt;p&gt;It was quiet when I visited the Palace even though the streets of
Dubrovnik outside were packed with tourists.  And by quiet I mean that no
one else was in the Atrium most of the time I was there, not that I’m
complaining.  I’m happy to confirm that the GI effects are impressive to
see in person.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/mmp-sponza-atrium.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/mmp-sponza-atrium.jpg&quot; width=&quot;453&quot; height=&quot;605&quot; alt=&quot;Matt in the Sponza Palace Atrium&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;I took advantage of the emptiness and recorded &lt;a href=&quot;https://pub-3d16d67d2c68402fa2fb05197bac91f9.r2.dev/sponza%20atrium.MOV&quot;&gt;a three-minute-long
video&lt;/a&gt;,
walking about and panning my camera around to capture the details as well
as I could.  Here’s a low-resolution GIF of the first 20 seconds of it.
(Caution: I find it a little nausea-inducing to view the video now; the
goal was a thorough capture rather than something pleasing for human
consumption.)  I offer it up as fodder for a fun NeRF, or at least as
reference to the original.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;https://pub-3d16d67d2c68402fa2fb05197bac91f9.r2.dev/sponza-atrium.gif&quot;&gt;
    &lt;img src=&quot;https://pub-3d16d67d2c68402fa2fb05197bac91f9.r2.dev/sponza-atrium.gif&quot; width=&quot;320&quot; height=&quot;180&quot; alt=&quot;Short video of the Sponza Palace Atrium&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;</content><author><name></name></author><summary type="html">Croatia's must-see tourist destination.</summary></entry><entry><title type="html">Some News About the 4th Edition of Physically Based Rendering</title><link href="https://pharr.org/matt/blog/2022/12/22/pbr-4ed.html" rel="alternate" type="text/html" title="Some News About the 4th Edition of Physically Based Rendering" /><published>2022-12-22T00:00:00-08:00</published><updated>2022-12-22T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2022/12/22/pbr-4ed</id><content type="html" xml:base="https://pharr.org/matt/blog/2022/12/22/pbr-4ed.html">&lt;p&gt;I’m delighted to report that the final laid-out pages of the 4th edition of
&lt;em&gt;Physically Based Rendering&lt;/em&gt; are now on their way to the printer.  As they
say, it’s been a journey, but I think that all involved are thrilled with
the final result.  It has been a delight working with MIT Press this time
around, especially after all of the
&lt;a href=&quot;/matt/blog/2018/10/15/pbr-online.html#the-second-and-third-editions&quot;&gt;disappointment&lt;/a&gt;
of the conglomerate that shall not be named that was the publisher for the
last two editions.&lt;/p&gt;

&lt;p&gt;Speaking of delightful things (and of things that printers require), we
have a final cover design.  It’s another thing that I think turned out
well; the intent is to convey some idea of the distance that’s
traveled, more or less from scratch to photorealism, over the course of
the book.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/pbr-4ed-cover.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/pbr-4ed-cover.jpg&quot; width=&quot;512&quot; alt=&quot;Physically Based Rendering Book Cover&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The printed book will be available on March 28, 2023, and the full book text
will be available online for free starting on November 1, 2023.  That first
date has once again slipped, though just a few weeks this time.  Apologies,
though I’m glad we took a little extra time for the last rounds of reviews
and fine edits before sending it to the printer.&lt;/p&gt;

&lt;p&gt;For buyers in the US, a 20% off preorder discount and free shipping are
available if you enter the promo code “MITPHoliday22” and order from
&lt;a href=&quot;https://www.penguinrandomhouse.com/search/site/?q=9780262048026&quot;&gt;Penguin Random
House&lt;/a&gt;.
Alternatively, it’s available for preorder on
&lt;a href=&quot;https://www.amazon.com/Physically-Based-Rendering-fourth-Implementation/dp/0262048027?keywords=physically+based+rendering+4th+edition&amp;amp;qid=1671730412&amp;amp;sprefix=physically+based%2Caps%2C145&amp;amp;sr=8-1&amp;amp;linkCode=ll1&amp;amp;tag=pharr-20&amp;amp;linkId=81a816d90f0c7e872617f1f930a51fd6&amp;amp;language=en_US&amp;amp;ref_=as_li_ss_tl&quot;&gt;Amazon&lt;/a&gt;
and elsewhere.&lt;/p&gt;

&lt;p&gt;In the meantime, we have posted PDFs of two complete chapters from the new
edition: &lt;a href=&quot;https://pub-49ca6a23a58a46ef9cf5a5b34413a7ba.r2.dev/pbr-4ed-chap11.pdf&quot;&gt;Chapter 11, Volume
Scattering&lt;/a&gt; and &lt;a href=&quot;https://pub-49ca6a23a58a46ef9cf5a5b34413a7ba.r2.dev/pbr-4ed-chap14.pdf&quot;&gt;Chapter 14,
Light Transport: Volume
Rendering&lt;/a&gt;.  Together they
are over 100 pages of text, almost all of it brand new or largely rewritten
since the third edition.  Those two chapters cover the state of the art in
volumetric light transport, up to and including the null-scattering path
integral.&lt;/p&gt;

&lt;p&gt;One more thing… The cover image has always been an important part of the
book, conveying the value proposition—study the contents and you can
understand how to write a program that makes images like this.  For each
new edition, we’ve tried to find better and better scenes to keep up with
pbrt’s increasing capabilities and all of the topics covered in the book.&lt;/p&gt;

&lt;p&gt;This time around, we licensed the rights to two lovely scenes from &lt;a href=&quot;https://www.lucydreams.it&quot;&gt;Angelo
Ferretti&lt;/a&gt;, allowing us to convert them into
pbrt’s format and to distribute the result.  They are now available in the
&lt;a href=&quot;https://github.com/mmp/pbrt-v4-scenes&quot;&gt;pbrt-v4-scenes repository&lt;/a&gt;.
Together they are nearly 6 GiB, so your &lt;code class=&quot;highlighter-rouge&quot;&gt;git pull&lt;/code&gt; may take some time.
While you wait, here’s a selection of a few views of them that are
rendered, naturally, with pbrt.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/kroken-camera1.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/kroken-camera1.jpg&quot; width=&quot;625&quot; alt=&quot;Kroken Scene, Camera 1 View&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/watercolor-camera-3.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/watercolor-camera-3.jpg&quot; width=&quot;625&quot; alt=&quot;Watercolor Scene, Camera 3 View&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;a href=&quot;/matt/blog/images/watercolor-camera-13.jpg&quot;&gt;
    &lt;img src=&quot;/matt/blog/images/watercolor-camera-13.jpg&quot; width=&quot;625&quot; alt=&quot;Watercolor Scene, Camera 13 View&quot; /&gt;
&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Happy rendering!&lt;/p&gt;</content><author><name></name></author><summary type="html">We have liftoff.</summary></entry><entry><title type="html">Let’s Stop Calling it “GGX”</title><link href="https://pharr.org/matt/blog/2022/05/06/trowbridge-reitz.html" rel="alternate" type="text/html" title="Let's Stop Calling it &quot;GGX&quot;" /><published>2022-05-06T00:00:00-07:00</published><updated>2022-05-06T00:00:00-07:00</updated><id>https://pharr.org/matt/blog/2022/05/06/trowbridge-reitz</id><content type="html" xml:base="https://pharr.org/matt/blog/2022/05/06/trowbridge-reitz.html">&lt;p&gt;Fifteen years ago, Walter et al. published a &lt;a href=&quot;https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.html&quot;&gt;fantastic paper about
microfacets&lt;/a&gt;
at EGSR 2007.  It’s full of great contributions, including working out the
theory of refraction through rough microfacet models and evaluating various
models with respect to measured data.  Justifiably, it won the EGSR Test
of Time award in 2021.&lt;/p&gt;

&lt;p&gt;That paper also introduced a microfacet distribution, named there “GGX.”
That distribution was more effective at fitting their measured data than
distributions that had been used before in graphics.  To the authors’
knowledge at the time it was new, but it later became apparent that GGX is
equivalent to a microfacet distribution that Trowbridge and Reitz
introduced in 1975.&lt;sup id=&quot;fnref:tr&quot;&gt;&lt;a href=&quot;#fn:tr&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; It was an unintentional reinvention—these
things happen.&lt;/p&gt;

&lt;p&gt;Although this connection now appears to be fairly widely known, “GGX” seems
to have stuck in graphics.  To this day, “GGX” is used widely in the titles
of papers and their text, often without any reference to Trowbridge and
Reitz.  It’s an unfortunate state of affairs:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;First and foremost, Trowbridge and Reitz deserve their acknowledgment.
Their paper is fantastic&lt;sup id=&quot;fnref:visual&quot;&gt;&lt;a href=&quot;#fn:visual&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and their work dates to 1975.&lt;/li&gt;
  &lt;li&gt;It doesn’t reflect well on graphics as a field for us to continue to use
our own renaming of a preexisting model.  For example, if we all called
Monte Carlo integration “the Kajiya method,” the broader Monte Carlo
community would quite reasonably raise an eyebrow.&lt;/li&gt;
  &lt;li&gt;It reduces the impact of work done in graphics that is based on the
Trowbridge–Reitz distribution; if someone in another field is aware
of Trowbridge–Reitz but not “GGX,” then there’s research in
graphics that they’re unlikely to find even though it may be relevant to
their work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, better late than never—let’s make it “Trowbridge–Reitz,” or
if you prefer, “Trowbridge–Reitz (GGX).”&lt;/p&gt;

&lt;h2 id=&quot;notes&quot;&gt;notes&lt;/h2&gt;
&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:tr&quot;&gt;
      &lt;p&gt;Trowbridge, S., and K. P. Reitz. 1975. &lt;a href=&quot;/matt/blog/images/average-irregularity-representation-of-a-rough-surface-for-ray-reflection.pdf&quot;&gt;Average irregularity representation of a rough surface for ray reflection&lt;/a&gt;. &lt;em&gt;Journal of the Optical Society of America&lt;/em&gt; &lt;em&gt;65&lt;/em&gt; (5), 531–36. &lt;a href=&quot;#fnref:tr&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:visual&quot;&gt;
      &lt;p&gt;I love this: “The ellipsoid model may prove useful by allowing estimations of its parameter (e) to a reasonable accuracy simply from visual examination of a surface’s micro-structure. On each of the surfaces we examined, one of the authors has visually estimated the shape of the average ellipsoid by observing cross sections of surface irregularities and by observing variations of abundances of surface microareas with orientation relative to the macrosurface.” &lt;a href=&quot;#fnref:visual&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">A short plea for getting the naming right.</summary></entry><entry><title type="html">Sampling in Floating Point (2/3): 1D Intervals</title><link href="https://pharr.org/matt/blog/2022/03/14/sampling-float-intervals.html" rel="alternate" type="text/html" title="Sampling in Floating Point (2/3): 1D Intervals" /><published>2022-03-14T00:00:00-07:00</published><updated>2022-03-14T00:00:00-07:00</updated><id>https://pharr.org/matt/blog/2022/03/14/sampling-float-intervals</id><content type="html" xml:base="https://pharr.org/matt/blog/2022/03/14/sampling-float-intervals.html">&lt;p&gt;After learning about &lt;a href=&quot;https://www.semanticscholar.org/paper/Fast-generation-of-uniformly-distributed-numbers-Walker/71ebd4c11bf15f87918325d92a5b476344b3c7a2&quot;&gt;Walker’s
algorithm&lt;/a&gt;
for uniformly sampling \([0,1)\) in floating-point, I started thinking
about how his approach might be generalized to arbitrary intervals; being
able to uniformly sample any interval while potentially generating all
possible floating-point values inside it would certainly be a nice tool to
add to the toolbox.&lt;/p&gt;

&lt;p&gt;The good news is that it was a fun thought exercise with things learned
along the way.  In the end I found enough insights to come up with a
solution. However, upon further digging I found two previous
implementations of the approach I came up with.  So much for a &lt;em&gt;tour de
force&lt;/em&gt; paper describing my findings in &lt;em&gt;ACM Transactions on Modeling and
Computer Simulation&lt;/em&gt;.  They are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Christoph Conrads’s &lt;a href=&quot;https://gitlab.com/christoph-conrads/rademacher-fpl&quot;&gt;Rademacher Floating Point
Library&lt;/a&gt;, which dates
to 2018.  See the 
&lt;a href=&quot;https://gitlab.com/christoph-conrads/rademacher-fpl/-/blob/master/include/rademacher-fpl/impl/uniform-real-distribution.hpp#L225&quot;&gt;make_uniform_random_value()&lt;/a&gt;
function there.&lt;/li&gt;
  &lt;li&gt;Olaf Bernstein’s
&lt;a href=&quot;https://github.com/camel-cdr/cauldron/blob/7d5328441b1a1bc8143f627aebafe58b29531cb9/cauldron/random.h#L1604&quot;&gt;dist_uniformf_dense()&lt;/a&gt;
which seems to date to 2021.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(I’d be interested to hear if there are any earlier instances of it.)&lt;/p&gt;

&lt;p&gt;Nevertheless, I still thought it might be useful to write up some of the
motivation, walk through my route to a solution, and explain some of the
subtleties.&lt;/p&gt;

&lt;h2 id=&quot;whats-wrong-with-linear-interpolation&quot;&gt;What’s Wrong With Linear Interpolation?&lt;/h2&gt;

&lt;p&gt;If you want to sample a value in \([a,b)\) for arbitrary \(a\) and
\(b\), the usual thing is to take a uniform value in \([0,1)\) and use
it to linearly interpolate between \(a\) and \(b\).  &lt;a href=&quot;/matt/blog/2022/03/05/sampling-fp-unit-interval.html&quot;&gt;Last
time&lt;/a&gt; we saw how to
compute a gold-standard uniform floating-point value in \([0,1)\), so why
not use that to interpolate?  Needless to say, that works in practice, but
once again, there are some subtleties.&lt;/p&gt;

&lt;p&gt;First one must choose an equation with which to linearly interpolate.  For an
interval \([a,b)\) with interpolation parameter \(t\), there are two common
choices: \(a+t (b-a)\) and \((1-t)a +t b\).  Each has its own strengths
and weaknesses.&lt;/p&gt;

&lt;p&gt;The first, \(a + t (b-a)\), requires fewer operations but has the disadvantage
that even if \(t \in [0,1)\), it is not guaranteed that the interpolated
value will be in \([a,b)\).  For example, with \(a=2.5\),
\(b=8.87385559\), and \(t=1-2^{-24}\), the last floating-point value
before 1, then with float32, we find that
\[
2.5 + (1 - 2^{-24}) (8.87385559 - 2.5) \rightarrow 8.87385559;
\]
the result is equal to the upper bound even though 
\(t&amp;lt;1\).  A similar problem occurs with the closed interval
\([a,b]\): the value \(t=1\) can yield a value that is greater than
\(b\).&lt;/p&gt;
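
&lt;p&gt;To see where the overshoot comes from, here is a sketch of the arithmetic, assuming round-to-nearest-even float32 operations with each step rounded separately (no fused multiply-add).  Write \(u=2^{-21}\) for the float spacing in \([4,8)\), so that the spacing in \([8,16)\) is \(2u\).  Under those assumptions \(b-a=6.37385559\) is computed exactly, and the product rounds down by one step:
\[
t\,(b-a) = (b-a) - (b-a) \cdot 2^{-24} \approx (b-a) - 0.797\,u \rightarrow (b-a) - u.
\]
The final addition then gives
\[
a + \bigl((b-a) - u\bigr) = b - u \rightarrow b,
\]
because \(b-u\) lies exactly halfway between the neighboring floats \(b-2u\) and \(b\), and the tie goes to \(b\), whose significand is even.&lt;/p&gt;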

&lt;p&gt;In graphics, respecting intervals is important since we often do things
like bound vertex positions at different times, linearly interpolate them,
and then want to be able to assert that the result is inside the bounds.  In this
case, \((1-t)a+tb\) is
preferable, since \(t=0\) always yields \(a\) and \(t=1\) gives
\(b\). However, that formulation has the surprising shortcoming that
increasing \(t\) sometimes causes the interpolated value to move
backwards.  Consider this interpolant with \(a=2.5\) and
\(b=10.53479\):
\[
(1-t) \cdot 2.5 + t \cdot 10.53479.
\]
With \(t=0.985167086\), the interpolant gives 10.4156&lt;strong&gt;113&lt;/strong&gt;. Moving
\(t\) up to the next possible floating-point value, 0.985167146, the
interpolant’s value is reduced, down to 10.4156&lt;strong&gt;103&lt;/strong&gt;.  The rounding on
the terms has gone differently with that small change in
\(t\) and it’s (slightly) downhill from there.  In practice, these little
wobbles are unlikely to cause trouble, though they do mean that an
assertion that implicitly assumes that the interpolant is monotonic may
fail for fairly obscure reasons.&lt;/p&gt;

&lt;p&gt;Both of these approaches also suffer from a minor bias for reasons similar
to why dividing a random 32-bit value by \(2^{32}\) to generate a
uniform random variable led to a bias in sampled floats: each \(t\) value
maps to a single floating-point value and if there are more of one than the
other, rounding may introduce a minor non-uniformity.  (The numerics people
like to say this is due to the pigeonhole principle. I can’t say that is
incorrect, but I like to think of it in terms of aliasing: it’s taking
something at one frequency and then resampling it at another; things
always get a little messy when you do that and aren’t careful.)&lt;/p&gt;

&lt;p&gt;For more on the above, including an entertaining review of how linear
interpolation is implemented in assorted programming languages’ standard
libraries and assorted misstatements in their documentation about the
characteristics of the expected results, see &lt;a href=&quot;https://hal.archives-ouvertes.fr/hal-03282794/&quot;&gt;Drawing random
floating-point numbers from an
interval&lt;/a&gt;, by &lt;a href=&quot;https://frederic.goualard.net&quot;&gt;Frédéric
Goualard&lt;/a&gt;.&lt;sup id=&quot;fnref:fixed&quot;&gt;&lt;a href=&quot;#fn:fixed&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;h2 id=&quot;a-few-utility-functions&quot;&gt;A Few Utility Functions&lt;/h2&gt;

&lt;p&gt;Before we go further, let’s specify a few utility functions that will be
used in the forthcoming implementations.  (All of the following code is
available in a &lt;a href=&quot;https://github.com/mmp/uniform_float&quot;&gt;small header-only
library&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;For efficient sampling of the \(p=1/2\) geometric distribution, a function that
uses native bit counting instructions will be useful:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountLeadingZeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
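
&lt;p&gt;As a sketch of the connection, assume that &lt;code class=&quot;highlighter-rouge&quot;&gt;value&lt;/code&gt; consists of 64 independent, uniformly random bits.  The count of leading zeros is \(k\) exactly when the first \(k\) bits are zero and the next one is set, so
\[
\Pr[\mathrm{CountLeadingZeros}(\mathit{value}) = k] = 2^{-(k+1)}, \qquad k = 0, 1, \ldots, 63,
\]
which is the \(p=1/2\) geometric distribution, truncated only by the all-zeros case, which has probability \(2^{-64}\).&lt;/p&gt;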

&lt;p&gt;We’ll assume the existence of two functions that provide uniform random
values.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random64Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random32Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;C++20’s &lt;code class=&quot;highlighter-rouge&quot;&gt;std::bit_cast()&lt;/code&gt; function makes it easy to convert from a float32 to
its bitwise representation and back.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ToBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bit_cast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FromBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bit_cast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A few helper functions extract the pieces of a float32.  (If necessary, see
the &lt;a href=&quot;https://en.wikipedia.org/wiki/Single-precision_floating-point_format&quot;&gt;Wikipedia
page&lt;/a&gt;
for details of the in-memory layout of float32s to understand what these
are doing.)  Zero is returned by &lt;code class=&quot;highlighter-rouge&quot;&gt;SignBit()&lt;/code&gt; for positive values and one
for negative.  Because the float32 exponent is stored in memory in biased
form as an unsigned value from zero to 255, &lt;code class=&quot;highlighter-rouge&quot;&gt;Exponent()&lt;/code&gt; returns the
unbiased exponent, which ranges from \(-126\) to 127 for normal
floating-point values, with \(-127\) reserved for zero and the
denormalized floats.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SignBit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ToBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ToBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0xff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SignificandMask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ToBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SignificandMask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
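
&lt;p&gt;As a quick check of these definitions, consider \(2.5\), whose float32 bit pattern is &lt;code class=&quot;highlighter-rouge&quot;&gt;0x40200000&lt;/code&gt;: the sign bit is 0, the stored exponent is 128, and the stored significand is &lt;code class=&quot;highlighter-rouge&quot;&gt;0x200000&lt;/code&gt;, which is \(2^{21}\).  The three helpers should therefore return 0, \(128-127=1\), and \(2^{21}\), respectively, which is consistent with
\[
2.5 = (-1)^{0} \cdot 2^{1} \cdot \left(1 + \frac{2^{21}}{2^{23}}\right) = 2 \cdot 1.25.
\]&lt;/p&gt;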

&lt;p&gt;We’ll also find it useful to be able to generate uniform random
significands.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RandomSignificand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random32Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SignificandMask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;Float32FromParts()&lt;/code&gt; constructs a float32 value from the specified pieces.
The assertions document the requirements for the input parameters.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FromBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A positive power-of-two float32 value can be constructed by shifting the
biased exponent into place.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FloatPow2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;126&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FromBits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Expressed with these primitives, Reynolds’s &lt;a href=&quot;http://marc-b-reynolds.github.io/distribution/2017/01/17/DenseFloat.html#the-parts-im-not-tell-you&quot;&gt;pragmatic
compromise&lt;/a&gt;
algorithm for uniformly sampling in \([0,1)\) is:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Sample01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Random64Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SignificandMask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountLeadingZeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x1&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
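
&lt;p&gt;As a quick sanity check on the probabilities (a worked example, not taken from Reynolds’s write-up): the leading-zero count equals \(k\) for \(0 \le k \le 40\) exactly when the random value starts with \(k\) zero bits followed by a one bit, and in that case the result lands in \([2^{-(k+1)}, 2^{-k})\).  The selection probability therefore matches the interval’s width:
\[
P(\mathrm{lz} = k) = 2^{-(k+1)} = 2^{-k} - 2^{-(k+1)}.
\]
The remaining probability is
\[
1 - \sum_{k=0}^{40} 2^{-(k+1)} = 2^{-41},
\]
which is the width of \([0, 2^{-41})\); that last interval is covered by the \(2^{23}\) equally spaced values \(s \cdot 2^{-64}\), where \(s\) is the 23-bit significand.&lt;/p&gt;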

&lt;h2 id=&quot;first-steps&quot;&gt;First Steps&lt;/h2&gt;

&lt;p&gt;Back to our original question: can we do better than linear interpolation
to sample an arbitrary interval, and more specifically, is it possible to
generalize Walker’s algorithm to remove the limitation of sampling over
\([0,1)\)?  I had no idea how to go all the way from the original
algorithm to arbitrary intervals so I started with a few small thought
experiments to chip away at the edges of the problem and improve my
intuition.  (In all of the following, I’ll assume a half-open interval
\([a,b)\) that does not span zero; we’ll come back to the generalizations
of a closed interval \([a,b]\) and intervals that span zero at the end.)&lt;/p&gt;

&lt;p&gt;An easy first case is to consider intervals that start at zero and end with
an arbitrary power of two.  I first took the smallest step possible and
thought about \([0,2)\).  Indeed, Walker’s approach just works; there’s
nothing in it that requires the upper bound to be 1; we can apply the same
idea of starting with the upper \([1,2)\) interval, randomly selecting it
with probability 1/2 and otherwise continuing down intervals until one is
chosen or we hit the denorms. There’s an easy first victory.&lt;/p&gt;

&lt;p&gt;Have we missed anything?  We should be careful.  To further validate this
direction, consider the case where we have a tiny power-of-two sized
interval, say \([0,2^{-124})\).  The minimum exponent for normal numbers
is \(-126\), so we have just two regular power-of-two sized intervals and
then the denorms.  Here’s how that looks on the floating-point number line
with valid floats marked with hashes and a 3-bit significand to keep the
figure scrutable:&lt;/p&gt;

&lt;center&gt; &lt;img src=&quot;/matt/blog/images/fp-sample-interval-blogpost_down-by-denorms.svg&quot; height=&quot;70&quot; /&gt; &lt;/center&gt;

&lt;p&gt;We sample the \([2^{-125}, 2^{-124})\) interval with probability 1/2 and,
if it is not selected, sample \([2^{-126}, 2^{-125})\) with probability 1/2.  If
neither is selected, we uniformly sample the denorms, which lie in
\([0,2^{-126})\).  This extreme case helps us better understand how the
edge case of the denorms is handled: because the width of the last interval
of normal floats and the width of the denorms is equal, choosing between
them with equal probability leads to uniform sampling of the full interval.&lt;/p&gt;
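
&lt;p&gt;Spelled out for this example (a worked check under the same assumptions), each selection probability equals the interval’s width divided by the total width \(2^{-124}\):
\[
\frac{2^{-124} - 2^{-125}}{2^{-124}} = \frac{1}{2}, \qquad
\frac{2^{-125} - 2^{-126}}{2^{-124}} = \frac{1}{4}, \qquad
\frac{2^{-126} - 0}{2^{-124}} = \frac{1}{4}.
\]
The first two are the probabilities of stopping at the first or second coin flip, and the last is the probability that both flips fail.&lt;/p&gt;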

&lt;p&gt;On this topic, Walker wrote:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In practice, the range of the exponent will be limited, and the
probability of the number falling into either of the two smallest
intervals will be the same.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Denormalized numbers were invented after his paper so it seems that this
was a minor fudge in his original approach, corrected today by
advances in floating-point.&lt;/p&gt;

&lt;p&gt;Here is a function to sample a float32 exponent for this case, taking 64
random bits at a time and counting zeros to sample the distribution. An
exponent is returned either if a one bit is found in the random bits or if
enough zero bits have been seen to make it to the denorms.  In either case,
if the denorms have been reached, \(-127\) is returned so that a
denormalized or zero floating-point value results.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleToPowerOfTwoExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;126&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountLeadingZeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Random64Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Given &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleToPowerOfTwoExponent()&lt;/code&gt;, the full algorithm to
uniformly sample an interval \([0,2^x)\) is simple.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleToPowerOfTwo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SampleToPowerOfTwoExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomSignificand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;An implementation that uses a fixed number of random bits can be found with
a straightforward generalization of Reynolds’s “pragmatic” algorithm that
always consumes only 64 random bits, though there are two differences
compared to the \([0,1)\) case. First, if the initial interval is
\([0,2^{-86})\) or smaller, then the 41 bits remaining after the
significand is extracted from the 64-bit random value are more than are
needed to choose among all of the possible power-of-two intervals.  In that
case, we need to be careful to finish in the denorms rather than trying to
construct a float32 with an invalid exponent.  Clamping the exponent at
\(-127\) takes care of this.&lt;/p&gt;

&lt;p&gt;The second difference is that if all of the bits used to select an interval
are zero and the initial exponent is \(x\), then the remaining
interval that we will sample using equal spacing is \([0,2^{x-41})\).
Given a 23-bit significand \(s\), the sampled value is then
\[
\frac{s}{2^{23}} 2^{x-41}.
\]
It is tempting to merge the division by \(2^{23}\) and the multiplication
by \(2^{x-41}\) into a single constant, though doing so would lead to
underflow when \(x &amp;lt; -63\).  (Reynolds’s algorithm for \([0,1)\) could
just multiply the significand by \(2^{-64}\) in this case since \(x\)
was always 0 and there were no concerns about underflow.)&lt;/p&gt;
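
&lt;p&gt;To make the concern concrete (a worked example; the value \(x=-80\) is chosen only for illustration), the merged form would be
\[
\frac{s}{2^{23}} 2^{x-41} = s \cdot 2^{x-64},
\]
which for \(x=0\) is the \(2^{-64}\) scale used for \([0,1)\).  For \(x=-80\), however, the merged constant would be \(2^{-144}\), far below the smallest normal float32 value \(2^{-126}\), whereas the separate factors \(2^{-23}\) and \(2^{-121}\) are both normal.&lt;/p&gt;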

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleToPowerOfTwoFast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SignificandMask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountLeadingZeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// lz can exceed 41 when the top significand bits are zero too.
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;41&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;41&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x1&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FloatPow2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;41&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exponent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Another easy case comes with an interval where both endpoints have the
same exponent.  In that case, the spacing between them is uniform and a
value can be sampled by randomly sampling a significand between theirs.
That setting is shown on the floating-point number line below; the values
marked in red are the possible results, depending on the significand.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/matt/blog/images/fp-sample-interval-blogpost_single-interval.svg&quot; height=&quot;70&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The code is easy to write given a &lt;code class=&quot;highlighter-rouge&quot;&gt;RandomInt()&lt;/code&gt; function that returns a
uniform random integer between 0 and the specified value, inclusive:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleSameExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SignBit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2 id=&quot;arbitrary-power-of-2-lower-bounds&quot;&gt;Arbitrary (Power of 2) Lower Bounds&lt;/h2&gt;

&lt;p&gt;The easy successes stop coming when we consider intervals with a
power-of-two value at their lower bounds: say that we’d like to sample
uniformly over \([1,8)\).  Our intervals are \([1,2)\), \([2,4)\),
and \([4,8)\).  Their respective widths are 1, 2, and 4; the sampling
probabilities are \(1/7\), \(2/7\), and \(4/7\).  So much for a nice
geometric distribution with \(p=1/2\).  The setting is illustrated below:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/matt/blog/images/fp-sample-interval-blogpost_sample-1-8.svg&quot; height=&quot;80&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Here we most definitely see the importance of the denorms and the last
power-of-two sized interval of normal floating-point numbers having the same
width.  With a power-of-two interval that ends above zero, we no longer have
two intervals at the end that should be sampled with the same probability
and things fall apart.&lt;/p&gt;

&lt;p&gt;Upon reaching this realization, I had no idea how to proceed; I feared that
the cause might be lost.  Lacking any other ideas, I wondered if it would
work to apply Walker’s approach still with the probability \(1/2\) of
sampling each interval but then cycling around when one goes past the lower
interval, along these lines:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/matt/blog/images/fp-sample-interval-blogpost_sample-1-8-redo.svg&quot; height=&quot;110&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;With this method, the probability of sampling the \([4,8)\) interval is
then \(1/2\) the first time around.  With \(1/8\) probability we cycle
back around for another \(1/2\) chance, and so forth.  We have:
\[
\frac{1}{2} + \frac{1}{8} \frac{1}{2} + \cdots = \frac{1}{2}
\sum_{i=0}^\infty \frac{1}{8^i} = \frac{4}{7}
\]
Success! (Needless to say, the other intervals work out with the desired
probabilities as well.)&lt;/p&gt;
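
&lt;p&gt;For completeness, here is a sketch of the general calculation, using notation introduced only for this check: \(n\) is the number of power-of-two intervals and \(j=1,\ldots,n\) counts intervals from the top.  The \(j\)th interval is chosen after \(j-1\) failures on any pass, and a full pass of \(n\) failures happens with probability \(2^{-n}\), so
\[
P(j) = 2^{-j} \sum_{i=0}^\infty 2^{-ni} = \frac{2^{-j}}{1 - 2^{-n}} = \frac{2^{n-j}}{2^n - 1}.
\]
This is exactly the \(j\)th interval’s share of the total width, since the widths are proportional to \(2^{n-j}\) and sum to \(2^n - 1\).  With \(n=3\), \(j=2\) and \(j=3\) give \(2/7\) for \([2,4)\) and \(1/7\) for \([1,2)\).&lt;/p&gt;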

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;SampleExponent()&lt;/code&gt; implements the algorithm that consumes random bits until
it successfully samples such a single power-of-two interval.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountLeadingZeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Random64Bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lz&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If &lt;code class=&quot;highlighter-rouge&quot;&gt;emin&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;emax&lt;/code&gt; are not known at compile time, computing the integer
modulus in &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleExponent()&lt;/code&gt; may be expensive.  Because the maximum value
of &lt;code class=&quot;highlighter-rouge&quot;&gt;emax-emin&lt;/code&gt; is 253, it may be worthwhile to maintain a table of
constants for use with an efficient integer modulus algorithm (see, e.g.,
&lt;a href=&quot;https://arxiv.org/abs/2012.12369&quot;&gt;Lemire et al. 2021&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;With &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleExponent()&lt;/code&gt; in hand, the full algorithm is straightforward.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Sample uniformly and comprehensively in [2^emin, 2^emax).
&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleExponentRange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomSignificand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SampleExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;emin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;emax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For a “pragmatic” version of this algorithm that uses a fixed number of
random bits, we could take the number of leading zeros modulo the number of
power-of-two sized intervals under consideration to choose an interval and
then use a uniform random significand.  However, in the rare case where all
of the bits used to sample an interval are zero, the remaining interval is
of the form \([2^a, 2^c)\) where \(c=41 \mod (b-a)\); we’ve used up all
of our random bits to be faced with the same general problem we started
with (unless it happens that \(c=a+1\)).  At that point we might use
linear interpolation to sample the remaining interval, though that’s
admittedly unsatisfying, as linear interpolation is the thing we’re trying
to avoid.&lt;/p&gt;

&lt;h2 id=&quot;partial-intervals-at-one-or-both-ends&quot;&gt;Partial Intervals at One or Both Ends&lt;/h2&gt;

&lt;p&gt;With that we finally have enough to return to the original task, uniformly
and comprehensively sampling an arbitrary interval \([a,b)\).  This is,
unfortunately, the point at which I haven’t been able to figure out a
reasonable “pragmatic” implementation that uses a small and fixed number of
random bits.  The figure below shows the general setting; as before, the
valid candidate values are marked in red.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/matt/blog/images/fp-sample-interval-blogpost_sample-general-interval.svg&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;An approach based on rejection sampling can be used to sample the specified
interval: the idea is that we will sample all of the possible intervals as
before, with probability according to their width.  Then we uniformly sample a
significand in the chosen interval and then accept the value if it is
within \([a,b)\).  For the power-of-two intervals in the middle, we will
always accept the sample, and for the intervals on the ends, the
probability of acceptance is proportional to how much of the power-of-two
interval overlaps \([a,b)\).&lt;/p&gt;

&lt;p&gt;The implementation isn’t much code given all the helpers we’ve already
defined, though there are two important details.  First, the upper exponent
is bumped up by one if &lt;code class=&quot;highlighter-rouge&quot;&gt;b&lt;/code&gt;’s significand is non-zero.  To understand why,
consider the difference between sampling the intervals \([a, 8)\) and
\([a, 8.5)\).  In the former case, we will never need to consider an
exponent of 3, but for the latter case, we must.  Second, the algorithm used
for sampling the exponent must account for whether zero or the denorms are
included in \([a,b)\); this corresponds to the differences we saw earlier
in how to sample intervals like \([0,2^x)\) versus \([2^x,2^y)\).&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SampleRange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;assert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ea&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Significand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ea&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;127&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SampleToPowerOfTwoExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
                              &lt;span class=&quot;n&quot;&gt;SampleExponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ea&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
       &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Float32FromParts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomSignificand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
           &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Note that it would probably be worthwhile to handle the special case of
matching exponents with a call to &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleSameExponent()&lt;/code&gt;, as rejection
sampling with a significand that spans the entire power-of-two range will be
highly inefficient if the two values are close together.&lt;/p&gt;

&lt;p&gt;The worst case for this algorithm comes when &lt;code class=&quot;highlighter-rouge&quot;&gt;b&lt;/code&gt;’s significand is
small—i.e., &lt;code class=&quot;highlighter-rouge&quot;&gt;b&lt;/code&gt; is just past a power-of-two.  The upper power-of-two range
will be sampled with probability at least 1/2 but then the sampled value
&lt;code class=&quot;highlighter-rouge&quot;&gt;v&lt;/code&gt; will usually be rejected, requiring another time through the &lt;code class=&quot;highlighter-rouge&quot;&gt;while&lt;/code&gt;
loop.  Conversely, having &lt;code class=&quot;highlighter-rouge&quot;&gt;a&lt;/code&gt; just below a power of 2 is less trouble,
since the corresponding power-of-two interval is the least likely to be
sampled.&lt;/p&gt;
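
&lt;p&gt;To put a rough number on that, here is a sketch that treats the values as reals and assumes, as described above, that each power-of-two interval is chosen with probability proportional to its width.  Each pass through the loop then proposes a value uniformly over \([2^{e_a}, 2^{e_b})\), where \(e_a\) and \(e_b\) stand for the code’s &lt;code class=&quot;highlighter-rouge&quot;&gt;ea&lt;/code&gt; and (possibly incremented) &lt;code class=&quot;highlighter-rouge&quot;&gt;eb&lt;/code&gt;, and the lower limit becomes zero when &lt;code class=&quot;highlighter-rouge&quot;&gt;a&lt;/code&gt; is zero or a denorm.  Under those assumptions, the acceptance probability and the expected number of iterations are
\[
p = \frac{b - a}{2^{e_b} - 2^{e_a}}, \qquad \mathrm{E}[\text{iterations}] = \frac{1}{p}.
\]
For \([1, 8.5)\), for example, \(e_a = 0\) and \(e_b = 4\), which gives \(p = 7.5/15 = 1/2\) and two iterations on average.&lt;/p&gt;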

&lt;h2 id=&quot;closed-intervals&quot;&gt;Closed Intervals&lt;/h2&gt;

&lt;p&gt;One nice thing about the algorithm implemented in &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleRange()&lt;/code&gt; is that
handling closed intervals \([a,b]\) is mostly a matter of updating the
&lt;code class=&quot;highlighter-rouge&quot;&gt;if&lt;/code&gt; test in the &lt;code class=&quot;highlighter-rouge&quot;&gt;while&lt;/code&gt; loop accordingly.  The only other difference is
that &lt;code class=&quot;highlighter-rouge&quot;&gt;eb&lt;/code&gt; is always increased by one.  Thus, the worst case for this
version of the algorithm is when &lt;code class=&quot;highlighter-rouge&quot;&gt;b&lt;/code&gt; is an exact power of 2, again giving a \(1/2\)
chance of selecting the upper interval each time, with a \(1-2^{-23}\)
probability of rejecting the sample in that interval.&lt;/p&gt;

&lt;h2 id=&quot;further-improvements&quot;&gt;Further Improvements&lt;/h2&gt;

&lt;p&gt;Stratified sampling is a topic we didn’t get to today; it is often
desirable when one is generating multiple samples over an interval.  For a
power-of-2 stratification, it’s possible to work backward from the various
sampling algorithms to determine constraints on the bit patterns that
achieve stratification.  I’ll leave the details of that to the reader;
&lt;code class=&quot;highlighter-rouge&quot;&gt;Sample01()&lt;/code&gt; is a good place to start.&lt;/p&gt;

&lt;p&gt;We also haven’t dug into the case of an interval that spans zero.  To
achieve uniform sampling a similar rejection-based approach is probably
needed where given such an interval \([a,b)\) we define an extended
interval \([-c,c)\) with \(c=\max (|a|, |b|)\) that encompasses the
original interval.  We can then randomly select the positive or negative
side, generate a sample, and then reject it if it is not inside the
original interval.  However, the combination of an unbalanced interval that
spans zero and also includes an exact power of two at its upper bound gives
an even worse worst case: consider a highly unbalanced interval like
\([-2^{-100}, 2^{64}]\): we end up with a nearly \(3/4\) chance of
rejecting each candidate sample.&lt;/p&gt;
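
&lt;p&gt;One way to see where that figure comes from, assuming the two sides of the extended interval are chosen with equal probability and that the closed upper bound bumps the upper exponent as in the previous section: with \(c = 2^{64}\), candidates are effectively drawn from \([-2^{65}, 2^{65})\).  A candidate survives only if it lands on the positive side and in the lower half of that side, so
\[
P(\text{accept}) \approx \frac{1}{2} \cdot \frac{2^{64}}{2^{65}} = \frac{1}{4},
\]
which leaves a rejection probability of about \(3/4\).  The sliver \([-2^{-100}, 0)\) on the negative side contributes essentially nothing.&lt;/p&gt;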

&lt;h2 id=&quot;discussion&quot;&gt;Discussion&lt;/h2&gt;

&lt;p&gt;It was pretty good going through the special cases until we reached the
end.  Unfortunately, I don’t see a good way to work around the need to do
rejection sampling when there are partial power-of-two intervals at the ends
of the range.  Perhaps that isn’t the worst thing ever, but having an
irregular amount of computation afoot is not ideal if high performance on
GPUs or using SIMD instructions is of interest.&lt;/p&gt;

&lt;p&gt;Nevertheless, a quick benchmark suggests that &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleRange()&lt;/code&gt; is only about
2.5 times slower than \((1-t)a+tb\) on my system here if the cost of
random number generation is included.  If a reasonable amount of computation
is performed for each sample, the added cost may be no concern.  However, lacking
a clear example of a case where this first-class sampling makes a
difference in the final results, it’s hard to argue for the added expense
in general.&lt;/p&gt;

&lt;h3 id=&quot;note&quot;&gt;note&lt;/h3&gt;
&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:fixed&quot;&gt;
      &lt;p&gt;Goualard also suggests a sampling algorithm based on a uniform spacing over the interval that is by design not able to generate all possible floating-point values.  This algorithm seems to have been previously derived by Artur Grabowski in his &lt;a href=&quot;https://github.com/art4711/random-double&quot;&gt;random-double&lt;/a&gt; library from 2015; see &lt;a href=&quot;https://github.com/art4711/random-double/blob/60464979e3eb039803d5a840dbbde025e0b0956f/arbitrary_range.c#L296&quot;&gt;rd_positive()&lt;/a&gt; there. &lt;a href=&quot;#fnref:fixed&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">A ramble extending Walker's algorithm to sample arbitrary floating-point intervals, later found to be a rederivation of earlier work.</summary></entry><entry><title type="html">Sampling in Floating Point (1/3): The Unit Interval</title><link href="https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-interval.html" rel="alternate" type="text/html" title="Sampling in Floating Point (1/3): The Unit Interval" /><published>2022-03-05T00:00:00-08:00</published><updated>2022-03-05T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-interval</id><content type="html" xml:base="https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-interval.html">&lt;p&gt;&lt;em&gt;(The following assumes basic familiarity with the IEEE floating point
representation—sign, power of two exponent, and significand—but 
not necessarily expert-level understanding of it.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Taking samples from various distributions is at the heart of rendering;
perhaps most importantly, it allows us to use importance sampling when
performing Monte Carlo integration, which gives us a powerful tool to
reduce error.  The associated sampling algorithms are generally derived
assuming that real numbers are afoot but are then implemented using
floating-point math on computers.  For sampling, the differences between
reals and floats usually don’t cause any problems, though if you look
closely enough there are a few interesting subtleties.  We’ll start this
short series on that topic today with what seems like it should be the
simplest of problems: uniformly sampling a floating-point value between
zero and one.&lt;/p&gt;

&lt;h2 id=&quot;uniform-floats-by-dividing-by-an-integer&quot;&gt;Uniform Floats by Dividing by an Integer&lt;/h2&gt;

&lt;p&gt;Just about anywhere you look, from &lt;em&gt;Stack Overflow&lt;/em&gt; to all four editions of
&lt;em&gt;Physically Based Rendering&lt;/em&gt;, you’ll be told that it’s easy to sample a
uniform floating-point value in \([0,1)\): just generate a random \(n\)
bit unsigned integer and divide by \(2^{n}\).  Given real numbers, that’s
fine—the largest value your integer can take is \(2^n-1\) and dividing
by \(2^n\) gives a value that’s strictly less than one.&lt;sup id=&quot;fnref:mult&quot;&gt;&lt;a href=&quot;#fn:mult&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; With
32-bit floats (as we will exclusively consider today), there’s a nit: say that
\(n=32\) (as is used in pbrt).  After floating point rounding, one will
find that
\[
\frac{2^{32} - 1}{2^{32}} \rightarrow 1;
\]
so much for that non-inclusive upper bound.
The problem is that the spacing between the floats right below 1 is
\(2^{-24}\).  Because \(2^{-32}\) is much less than half that, \(1-2^{-32}\)
rounds to 1.  Even worse, all 128 integer values in \([2^{32}-128,
2^{32}-1]\) round to 1.&lt;/p&gt;
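
&lt;p&gt;A short derivation of that count, assuming the default round-to-nearest-even mode: ties at the midpoint between \(1-2^{-24}\) and 1 go to 1, so an integer \(u\) ends up at 1 whenever
\[
\frac{u}{2^{32}} \ge 1 - 2^{-25}
\quad\Longleftrightarrow\quad
u \ge 2^{32} - 2^{7} = 2^{32} - 128.
\]
That leaves the 128 integers from \(2^{32}-128\) through \(2^{32}-1\).&lt;/p&gt;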

&lt;p&gt;pbrt &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/fd3c25bf1062ab9a790a9ab5fbd4e84d813c2316/src/pbrt/util/rng.h#L129&quot;&gt;works around that
problem&lt;/a&gt;
by bumping any such 1s down to \(1-2^{-24}\), the last representable
float before 1. That gets things back to \([0,1)\) but it’s a stinkiness
in the code that in retrospect should have led to the algorithm used being
given more attention.&lt;/p&gt;

&lt;p&gt;One way to avoid this issue and many of the following is to set \(n=24\),
in which case all values after division are valid float32 values and no
rounding is required.&lt;sup id=&quot;fnref:petrik&quot;&gt;&lt;a href=&quot;#fn:petrik&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; However, that gives slightly more than 16
million unique values; that’s a fair number of them, but there are actually
a total of 1,065,353,216 float32 values in \([0,1)\)—nearly a quarter
of all possible 32-bit floats.  Under that lens, those 16 million seem
rather few.&lt;/p&gt;

&lt;p&gt;How much better do we do with \(n=32\)?  Although we start out with over
4 billion distinct integer values, if you divide each by \(2^{32}\) to
generate samples in \([0,1)\) and count how many float32s are generated,
it turns out that those four billion yield only 83,886,081 distinct
floating-point values, or 7.87% of all of the possible ones between zero
and one.  Not only do we have multiple integer values mapping to the same
floating-point value all the way from \(2^{-8}=0.00390625\) to 1, but
between 0 and \(2^{-9}=0.001953125\), the spacing between floats is less
than \(2^{-32}\) and many floating-point values are never generated.&lt;/p&gt;
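
&lt;p&gt;That count can be reconstructed by hand, assuming round-to-nearest.  The \(2^{24}\) integers below \(2^{24}\) all map to distinct floats, since each is exactly representable after scaling.  Each of the eight remaining power-of-two ranges of integers, \([2^{24+j}, 2^{25+j})\) for \(j = 0, \ldots, 7\), reaches all \(2^{23}\) floats in the corresponding interval \([2^{-8+j}, 2^{-7+j})\).  The value 1, reached by rounding up, contributes one more, so the total is
\[
2^{24} + 8 \cdot 2^{23} + 1 = 16{,}777{,}216 + 67{,}108{,}864 + 1 = 83{,}886{,}081,
\]
and \(83{,}886{,}081 / 1{,}065{,}353{,}216 \approx 0.0787\).&lt;/p&gt;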

&lt;p&gt;There’s another problem that comes with the choice of
\(n&amp;gt;24\), nicely explained in the paper &lt;a href=&quot;https://hal.archives-ouvertes.fr/hal-02427338&quot;&gt;&lt;em&gt;Generating Random
Floating-Point Numbers by Dividing Integers: A Case
Study&lt;/em&gt;&lt;/a&gt;, by &lt;a href=&quot;https://frederic.goualard.net&quot;&gt;Frédéric
Goualard&lt;/a&gt;.  When the usual
round-to-nearest-even is applied after dividing by \(2^{32}\), a systemic
bias is introduced in the final values, clearly shown in that paper with
examples that use low floating-point precision.  Thus, it’s not just “we’re
not making the most of what we’ve been given”, but it’s “the distribution
isn’t actually uniform.”&lt;/p&gt;

&lt;p&gt;The rounding problem is still evident with float32s and \(n=32\) bits; if
we consider all \(2^{32}\) integer values, we would expect for
example that all floats in \([0.5,1)\) would be generated the same number
of times.  (Indeed, we would expect 256 of each since we have \(2^{32}\)
values, the float32 spacing in that interval is \(2^{-24}\), and
\(2^{32} \cdot 2^{-24} = 256\).)  However, if we count them up, it turns
out that alternating floating-point values are generated 255 times and 257
times, all the way from 0.5 to 1.  That happens in many other intervals,
becoming its worst in the interval \([0.00390625, 0.0078125)\) where
alternating values are generated one and three times.&lt;sup id=&quot;fnref:rounding&quot;&gt;&lt;a href=&quot;#fn:rounding&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Depending on one’s application, all of these issues may be no problem in
practice, and I wouldn’t make the argument that they are likely to cause
errors in rendered images.  Most of the time \(n=24\) and not worrying
about it is probably fine.  Yet IEEE has given us all that precision and it
seems wasteful not to make use of it, if it isn’t too much trouble to do
so…&lt;/p&gt;

&lt;h2 id=&quot;uniform-floats-by-sampling-a-geometric-distribution&quot;&gt;Uniform Floats by Sampling a Geometric Distribution&lt;/h2&gt;

&lt;p&gt;What might be done about these problems?  A remarkably elegant and
efficient solution dates to 1974 with Walker’s paper &lt;a href=&quot;https://www.semanticscholar.org/paper/Fast-generation-of-uniformly-distributed-numbers-Walker/71ebd4c11bf15f87918325d92a5b476344b3c7a2&quot;&gt;&lt;em&gt;Fast Generation of
Uniformly Distributed Pseudorandom Numbers with Floating-Point
Representation&lt;/em&gt;&lt;/a&gt;,
which is based on the following observation (expressed here in terms of modern
IEEE float32s):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;In the interval \([1/2, 1)\), there are exactly \(2^{23}\)
equally-spaced numbers that can be represented
in float32.&lt;/li&gt;
  &lt;li&gt;In the interval \([1/4, 1/2)\), there are exactly \(2^{23}\)
equally-spaced numbers that can be represented
in float32.&lt;/li&gt;
  &lt;li&gt;In the interval \([1/8, 1/4)\), there are exactly \(2^{23}\)
equally-spaced numbers that can be represented
in float32.&lt;/li&gt;
  &lt;li&gt;And so on…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We would like an algorithm that can generate all of those numbers but does
so in a way that gives a uniform distribution over \([0,1)\).  Walker
observed that this can be done in two steps: first by choosing an interval
with probability according to its width, and then by sampling uniformly
within that interval. (This algorithm is sometimes credited to Downey, who
seems to have independently derived it in an &lt;a href=&quot;https://allendowney.com/research/rand/downey07randfloat.pdf&quot;&gt;unfinished
paper&lt;/a&gt; from
2007.)&lt;/p&gt;

&lt;p&gt;Because each interval’s width is half that of the one above it, sampling an
interval corresponds to sampling a geometric distribution with
\(p=1/2\).  There’s thus an easy iterative algorithm to select an
interval: one can first randomly choose to generate the sample within
\([1/2, 1)\) with probability \(1/2\).  Otherwise, sample within
\([1/4,1/2)\) with probability \(1/2\) and so forth; bottom out
if you hit the denorms.  Given an interval, the exponent follows and a
sample within an interval can be found by uniformly sampling a significand,
since values within a given interval are equally-spaced.&lt;/p&gt;

&lt;p&gt;Choosing an interval in that way takes only two iterations in expectation,
but the worst case requires many more.  The associated execution divergence
is especially undesirable for processors like GPUs.  Walker had another
trick up his sleeve, however:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Pseudorandom integer numbers with a truncated geometric distribution may
be obtained by counting consecutive 1s or 0s in a binary random number,
drawn from a set having a uniform frequency distribution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, generate a random binary integer and, say, count the number
of leading zero bits.  Use that count to choose an interval, where zero
leading zero bits has you sampling in \([1/2,1)\), one leading zero bit
puts you in \([1/4,1/2)\), and so forth.  Given an index \(i\) into the
intervals that starts at 0, the exponent is then \(-1 - i\).  Modern
processors offer bit counting instructions that yield such counts, so this
algorithm can be implemented very efficiently.&lt;/p&gt;
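
&lt;p&gt;As a quick check on why the bit count matches the interval widths: with independent uniform random bits, the probability of seeing exactly \(i\) leading zeros followed by a one is
\[
P(i) = 2^{-(i+1)} = 2^{-i} - 2^{-(i+1)},
\]
which is exactly the width of \([2^{-(i+1)}, 2^{-i})\).  The same distribution gives the expected number of steps for the iterative version,
\[
\sum_{k=1}^{\infty} k \, 2^{-k} = 2.
\]
(Both expressions ignore the truncation that comes from having only finitely many bits and intervals.)&lt;/p&gt;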

&lt;h2 id=&quot;from-theory-to-implementation&quot;&gt;From Theory to Implementation&lt;/h2&gt;

&lt;p&gt;With float32, the floating-point exponent factors over the \([0,1)\)
interval go from \(2^{-1}\) down to \(2^{-126}\) before the denorms
start.  Thus, 128 random bits may be required to choose the interval.
However, those intervals start becoming so small that one’s commitment to
possibly sampling every possible float might start to waver; the odds of
making it to one of the tiny ones become vanishingly small.&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;http://marc-b-reynolds.github.io/distribution/2017/01/17/DenseFloat.html&quot;&gt;blog
post&lt;/a&gt;
by Marc Reynolds has all sorts of good insights on the efficient
implementation of this algorithm.  (More generally, &lt;a href=&quot;http://marc-b-reynolds.github.io&quot;&gt;Marc’s
blog&lt;/a&gt; is full of great sampling and
floating-point content; highly recommended.)  He considers multiple
approaches (for example, successively generating as many random 32-bit
values as needed) and ends with a &lt;a href=&quot;http://marc-b-reynolds.github.io/distribution/2017/01/17/DenseFloat.html#the-parts-im-not-tell-you&quot;&gt;pragmatic
compromise&lt;/a&gt;
that takes a single 64-bit random value, uses 41 bits to choose the
exponent, and uses the remaining 23 bits to sample the significand.  The
remaining \([0,2^{-40})\) interval is sampled uniformly.  As long as an
efficient count-leading-zeros instruction is used, it’s only slightly more
work than multiplying by \(2^{-32}\) and clamping; in practice, most of
the extra expense comes from needing to generate a 64-bit pseudorandom
value rather than just a 32-bit one.&lt;/p&gt;

&lt;!--
// Given uniform 64-bit integer 'u' return a uniform float on [0,1)
// * the interval [2^-40, 1) is dense (all representable values produced)
// * the interval [0, 2^-40) is equidistantly (2^-64) populated

float rng_pdense_f32(uint64_t u)
{
  uint32_t z = lzc_64(u);                   // change to (u|1) for intel pre LZCNT hardware

  if (z &lt;= 40) {
    uint32_t e = 126-z;                     // compute the biased exponent
    uint32_t m = ((uint32_t)u) &amp; 0x7fffff;  // explict significand bits
    return f32_from_bits(e&lt;&lt;23|m);          // construct the binary32
  }
  
  // The probabilty of reaching here is 2^-40. There are as many points
  // on this subinterval as the standard equidistance method produces
  // across the entire output range.

return 0x1.0p-64f*(float)((uint32_t)u);
}
--&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Unless you’re bottlenecked on sample generation, it’s worth considering
using an efficient implementation of Walker’s algorithm to generate
uniform random floating-point numbers over \([0,1)\).  It’s not much more
computation than the usual, it makes the most of what floating point
offers, and it eliminates a minor source of bias.  Plus, you get to
exercise the bit counting instructions and feel like that much more of a
hacker.&lt;/p&gt;

&lt;p&gt;Next time we’ll look at uniformly sampling intervals of floating point
numbers beyond \([0,1)\).  After that, on to how low-discrepancy sampling
interacts with some of the topics that came up today as well as some
discussion about avoiding an unnecessary waste of precision when sampling
exponential functions.&lt;/p&gt;

&lt;h3 id=&quot;notes&quot;&gt;notes&lt;/h3&gt;
&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:mult&quot;&gt;
      &lt;p&gt;In practice, one multiplies by \(2^{-32}\) since dividing by a power of two and multiplying by its reciprocal give the same result  with IEEE floats. &lt;a href=&quot;#fnref:mult&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:petrik&quot;&gt;
      &lt;p&gt;If I remember correctly, Petrik Clarberg explained the superiority of \(n=24\) over \(n=32\) in this context to me a few years ago; it’s a point that I underappreciated at the time. &lt;a href=&quot;#fnref:petrik&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:rounding&quot;&gt;
      &lt;p&gt;On a processor where changing the rounding mode is inexpensive, it is probably a good idea to select rounding down in this case. For example, in CUDA, the multiplication by \(2^{-32}\) might be performed using &lt;code class=&quot;highlighter-rouge&quot;&gt;__fmul_rd()&lt;/code&gt;. &lt;a href=&quot;#fnref:rounding&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Sampling algorithms can be subtle and tricky, as can be computations that are performed using floating-point arithmetic. This post starts a short series on that topic, starting with the fundamentals: generating uniform random samples between zero and one.</summary></entry><entry><title type="html">Update: Some Analysis of Physically Based Rendering’s Bibliography</title><link href="https://pharr.org/matt/blog/2022/01/05/pbr-bibliography-4ed.html" rel="alternate" type="text/html" title="Update: Some Analysis of Physically Based Rendering's Bibliography" /><published>2022-01-05T00:00:00-08:00</published><updated>2022-01-05T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2022/01/05/pbr-bibliography-4ed</id><content type="html" xml:base="https://pharr.org/matt/blog/2022/01/05/pbr-bibliography-4ed.html">&lt;style&gt;
    table {
        font-size: 90%;
    }
&lt;/style&gt;

&lt;p&gt;It’s just over a year ago now that I posted an &lt;a href=&quot;/matt/blog/2020/12/26/pbr-bibliography-1-3ed.html&quot;&gt;analysis of citation counts
in the bibliographies of the first three editions of &lt;em&gt;Physically Based
Rendering&lt;/em&gt;&lt;/a&gt;.  Back then
I promised an update with statistics for the forthcoming fourth
edition “in the next few weeks.”  That turns out to have been rather
optimistic, but here we now are with results finally available.&lt;/p&gt;

&lt;p&gt;The fact that these results are ready means that we’re done fiddling with
the text; we will shortly be handing it over to the publisher so that the
book production process can begin.  We have switched to &lt;a href=&quot;https://mitpress.mit.edu&quot;&gt;MIT
Press&lt;/a&gt; for the fourth edition and it’s been a
fine experience working with them so far; we’re optimistic that our
interests are all well aligned in producing a quality book, much more so
than with the previous corporate conglomerate that shall not be
named.
Happily, MIT Press has agreed that we can continue to post a free edition
of the book online.  (The current plan is for that to be made available
roughly six months after the print edition hits the shelves.)  However,
that brings us to our first point of drama in this bibliographical vanity
contest: the online edition will be a superset of the print edition and so
there are differences between their bibliographies.&lt;/p&gt;

&lt;p&gt;The differences between the two versions are due to the amount of new
content that we wrote for the fourth edition; all in all, it would be about
1,600 printed pages.  That’s too much for a single volume, at least if one
wants paper that isn’t newspaper-thin and a binding that won’t fall apart.
Thus, Wenzel and I went through the exercise of rejiggering the book into a
1,200 page version for print while maintaining the full text for the online
edition.  In deciding what would be online-only, we looked for content that
was mostly independent of the rest of the book and was little-changed from
the third edition.  As examples, both the &lt;a href=&quot;https://pbr-book.org/3ed-2018/Camera_Models/Realistic_Cameras&quot;&gt;section on realistic camera
models&lt;/a&gt; and
the &lt;a href=&quot;https://pbr-book.org/3ed-2018/Light_Transport_III_Bidirectional_Methods&quot;&gt;chapter on bidirectional light
transport&lt;/a&gt;
will not be there in the print edition this time.&lt;/p&gt;

&lt;p&gt;The print edition still includes citations and discussion of previous work
for topics that are not included in its text, though not as much of it as
in the online edition—it doesn’t make sense to go into as much depth in
the citations when the text doesn’t deeply discuss the corresponding
topics.  Therefore, here I will report the results for both the print and
online editions. No doubt there will be years of arguments to come about
which is the more proper measure—one might argue that the online
edition’s bibliography is the canonical one, as it reflects what would be
printed if physical limitations didn’t intrude, or one might argue for the
print edition in that those citations earned consumption of actual paper
and not just electrons.  We will leave that question to be resolved by
future historians of computer graphics.&lt;/p&gt;

&lt;p&gt;Finally, a few notes on methodology: as before, the following is a simple
count of how often each name appears in the bibliography.  Editing a book
or a conference proceedings doesn’t count, but otherwise every citation is
counted equally—from a single-author SIGGRAPH paper to a blog post.  The
citations include work through SIGGRAPH 2021 but nothing published
subsequently.  Yes, SIGGRAPH Asia papers are now out, but we had to draw a
line somewhere in order to get that thing out the door.&lt;/p&gt;

&lt;p&gt;As before, many caveats are in order about how arbitrary a measure this is.
Another to mention today is the impact of the fine series of Eurographics
State of the Art Reports (STARs), twelve of which appear in the fourth
edition’s bibliography.  For topics that are not central to the book (e.g.,
texture synthesis), we will often cite a STAR and only a few additional
publications rather than comprehensively survey previous work.  Thus, there
is an irony in successfully developing a new area of research to the point
that it merits a STAR: in our bibliographic measure, a lengthy publication
record may end up collapsed into a STAR and a few additional citations,
putting one lower than one would have been otherwise.&lt;/p&gt;

&lt;!--
Raw data was generated using the following:

```bash
$ grep -v bibitem biblio.tex | 
  fgrep -v &quot;(Ed.)&quot; | fgrep -v &quot;(Eds.)&quot; |
  sed -e &quot;s/~/ /g&quot; -e 's/\\//g' -e 's/{//g' -e 's/}//g' -e &quot;s/'//g&quot; |
  tr -c '[:alnum:]' '[\n*]' |
  sort | uniq -c |
  sort -nr
```

Line by line: the `\bibitem` lines are removed (since those include some
names and we don't want to double-count them), lines that include editor
names are removed, various TeX-isms and accents are stripped out, words are
printed one per line, lines are sorted and their frequency counted, and the
final result is sorted by frequency.  I then went through the output,
disregarding things like that &quot;Carlo&quot; person who seems to keep showing up
in the bibliography, and culling the actual names.
--&gt;

&lt;p&gt;With that, here are the results for all four editions—author last names
with citation counts:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;1st (2004)&lt;/th&gt;
      &lt;th&gt;2nd (2010)&lt;/th&gt;
      &lt;th&gt;3rd (2016)&lt;/th&gt;
      &lt;th&gt;4th (print, 2022)&lt;/th&gt;
      &lt;th&gt;4th (online, 2022)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Greenberg (26)&lt;/td&gt;
      &lt;td&gt;Jensen (31)&lt;/td&gt;
      &lt;td&gt;Jensen (33)&lt;/td&gt;
      &lt;td&gt;Jarosz,&lt;br /&gt;Jensen (35)&lt;/td&gt;
      &lt;td&gt;Jarosz,&lt;br /&gt;Jensen (40)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Shirley (25)&lt;/td&gt;
      &lt;td&gt;Shirley (29)&lt;/td&gt;
      &lt;td&gt;Shirley (31)&lt;/td&gt;
      &lt;td&gt;Ramamoorthi (32) &lt;/td&gt;
      &lt;td&gt;Ramamoorthi (38)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hanrahan (22)&lt;/td&gt;
      &lt;td&gt;Greenberg,&lt;br /&gt;Hanrahan (27)&lt;/td&gt;
      &lt;td&gt;Keller (27)&lt;/td&gt;
      &lt;td&gt;Keller,&lt;br /&gt;Shirley (31)&lt;/td&gt;
      &lt;td&gt;Hanika (36)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Jensen (16)&lt;/td&gt;
      &lt;td&gt;Ramamoorthi (23) &lt;/td&gt;
      &lt;td&gt;Wald (25)&lt;/td&gt;
      &lt;td&gt;Hanika,&lt;br /&gt;Jakob (30)&lt;/td&gt;
      &lt;td&gt;Dachsbacher (33)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Arvo (14)&lt;/td&gt;
      &lt;td&gt;Wald (18)&lt;/td&gt;
      &lt;td&gt;Greenberg,&lt;br /&gt;Hanrahan (24)&lt;/td&gt;
      &lt;td&gt;Wald (29)&lt;/td&gt;
      &lt;td&gt;Jakob,&lt;br /&gt;Shirley (32)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mitchell (13) &lt;/td&gt;
      &lt;td&gt;Keller (17)&lt;/td&gt;
      &lt;td&gt;Slusallek (21)&lt;/td&gt;
      &lt;td&gt;Slusallek (27)&lt;/td&gt;
      &lt;td&gt;Keller (31)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Keller (12)&lt;/td&gt;
      &lt;td&gt;Arvo,&lt;br /&gt;Seidel,&lt;br /&gt;Slusallek (16)&lt;/td&gt;
      &lt;td&gt;Marschner,&lt;br /&gt; Ramamoorthi (19) &lt;/td&gt;
      &lt;td&gt;Dachsbacher,&lt;br /&gt;Marschner (26)&lt;/td&gt;
      &lt;td&gt;Křivánek,&lt;br /&gt;Wald (30)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Heckbert,&lt;br /&gt; Torrance (10)&lt;/td&gt;
      &lt;td&gt;Mitchell (13)&lt;/td&gt;
      &lt;td&gt;Arvo,&lt;br /&gt;Seidel (17)&lt;/td&gt;
      &lt;td&gt;Křivánek (25)&lt;/td&gt;
      &lt;td&gt;Marschner,&lt;br /&gt;Slusallek (27)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Cook,&lt;br /&gt;Kajiya,&lt;br /&gt;Levoy,&lt;br /&gt;Pattanaik,&lt;br /&gt;Ward (8)&lt;/td&gt;
      &lt;td&gt;Pattanaik,&lt;br /&gt; Torrance,&lt;br /&gt; Walter (12)&lt;/td&gt;
      &lt;td&gt;Jarosz (16)&lt;/td&gt;
      &lt;td&gt;Hanrahan,&lt;br /&gt;Novák (23)&lt;/td&gt;
      &lt;td&gt;Hachisuka (25)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Henrik continues his reign, though he now shares
the top spot with Wojciech Jarosz, who had not a single appearance in the
first edition’s bibliography.  The smallest of lexicographical
differences—“a” before “e”—puts Wojciech first in the alphabetical
ordering.  Fittingly, Wojciech was Henrik’s Ph.D. student, so I have to
assume that for Henrik the bitterness of sharing the glory is balanced by
the sweetness of a former student’s achievement.&lt;/p&gt;

&lt;p&gt;Besides Henrik, only Alex Keller and Pete Shirley made the list for all
four editions, though Pat Hanrahan also has that distinction if one neglects
the online fourth edition.  (Surely Pat will therefore be out there arguing
vigorously that the print edition is canonical.)&lt;/p&gt;

&lt;p&gt;Ravi Ramamoorthi has displaced Pete Shirley from his long-held number two
position, though just by a hair, at least in the print edition.  Johannes
Hanika has also rocketed up in the standings, making an especially quick
climb given that he was not cited in either the first or the second
edition.  Carsten Dachsbacher has also climbed rapidly, starting from one
citation in the first edition, two in the second, and 10 in the third.
Finally and fittingly, Jaroslav Křivánek is up there in the latest edition
as well.&lt;/p&gt;

&lt;p&gt;There you have it. I must admit my disappointment at seeing that the top
two spots had the same names for both versions of the fourth edition—all
the less potential controversy over which version is the canonical one,
though there’s enough motion farther down the list that one might hope for
some sparks in the future.&lt;/p&gt;</content><author><name></name></author><summary type="html">Ending a year of suspense, the numbers are finally in for the fourth edition.</summary></entry><entry><title type="html">Debugging Your Renderer (5/n): Rendering Deterministically</title><link href="https://pharr.org/matt/blog/2021/12/24/debugging-renderers-rendering-deterministically.html" rel="alternate" type="text/html" title="Debugging Your Renderer (5/n): Rendering Deterministically" /><published>2021-12-24T00:00:00-08:00</published><updated>2021-12-24T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2021/12/24/debugging-renderers-rendering-deterministically</id><content type="html" xml:base="https://pharr.org/matt/blog/2021/12/24/debugging-renderers-rendering-deterministically.html">&lt;p&gt;Deterministic program execution has a lot going for it.  For most programs,
it’s the natural way of being: for any particular input, the program
generates the same output.  Determinism makes debugging much easier, as it
saves you from having to re-run the system repeatedly to trigger a bug that
only happens sometimes, and it’s great for end-to-end tests, since you can
safely make strict assertions about cases where the program’s output should
remain absolutely unchanged (e.g., that &lt;a href=&quot;/matt/blog/2021/12/19/debugging-renderers-end-to-end-tests.html#when-the-images-should-not-change-at-all&quot;&gt;float parser
example&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;However, deterministic execution doesn’t always come naturally when you’re
rendering, especially when you’re rendering in parallel.  Today’s post will
go into some of the ways that deterministic execution can be lost, talk
about how to maintain determinism, and then finish with some further
discussion of its benefits.&lt;/p&gt;

&lt;h2 id=&quot;the-basics&quot;&gt;The Basics&lt;/h2&gt;

&lt;p&gt;To start, let’s settle on a more precise definition of deterministic
rendering than “same input gives same output.”  It is too much to ask for
bit accuracy in output across machines; not only will we encounter
different standard math libraries with different levels of precision, but
there are a number of corners of C++ that allow for things like variation
in &lt;a href=&quot;https://en.cppreference.com/w/cpp/language/eval_order&quot;&gt;order of
evaluation&lt;/a&gt; across
compilers that can lead to innocuous differences in output.&lt;/p&gt;

&lt;p&gt;Therefore, we’ll define the observable effect of determinism as: &lt;em&gt;on a
particular system with a particular compiler, repeatedly running the
renderer on the same input always produces the same value at every pixel&lt;/em&gt;.
Implicit in that definition is that the same computations are performed to
compute each pixel’s value, though not necessarily in the same order.  That
definition is plenty for our needs; the benefits from nailing it down
further almost certainly wouldn’t be worth the trouble.&lt;/p&gt;

&lt;p&gt;A renderer running on a single core should naturally achieve that goal.  If
it does not, fixing that is the first order of business.  Most likely it’s
an uninitialized memory access, other memory corruption, or code somewhere
that randomly seeds a random number generator based on something that
varies like the process id or current time.  (I won’t say more about fixing
those sorts of problems here, as it’s all rendering-independent and is
regular everyday debugging.)&lt;/p&gt;

&lt;p&gt;Rendering in parallel is when things get more complicated.  Indeed, none of
the versions of pbrt before the latest,
&lt;a href=&quot;https://github.com/mmp/pbrt-v4&quot;&gt;pbrt-v4&lt;/a&gt;, was deterministic.  That was
always a minor annoyance when debugging and testing the system, though I
honestly didn’t realize what a productivity drag it was until determinism
was achieved.&lt;/p&gt;

&lt;h2 id=&quot;consistent-samples&quot;&gt;Consistent Samples&lt;/h2&gt;

&lt;p&gt;For rendering to be deterministic, the Monte Carlo sampling routines must
use exactly the same random sample points at every sample taken in every
pixel.  If they do not, then determinism is lost from the start: slightly
different rays will leave the camera, different sampling decisions will be
made at intersections, and so forth.  One might assume that determinism is
the natural way of being for the
&lt;a href=&quot;https://pbr-book.org/3ed-2018/Sampling_and_Reconstruction/Sampling_Interface#BasicSamplerInterface&quot;&gt;Sampler&lt;/a&gt;s
that generate those points, but that was not so prior to pbrt-v4.  There
were two issues: the placement of low discrepancy point sets and carried
state in samplers that led to nondeterminism with multithreading.&lt;/p&gt;

&lt;p&gt;When using low discrepancy point sets like Halton points, pbrt-v3 aligns
the origin of the points with the upper left pixel of the image.  That’s
normally \((0,0)\), but then if the user specifies a crop window to
render just part of the image the low discrepancy points all shift in
compensation.  That was always a bother for debugging since you couldn’t
narrow in on a problem pixel without perturbing all of the samples and
often no longer hitting the bug.  That detail was easy enough to fix given
attention to it.&lt;/p&gt;

&lt;p&gt;The other issue came from the fact that each thread maintains its own
&lt;code class=&quot;highlighter-rouge&quot;&gt;Sampler&lt;/code&gt; instance.  This way samplers can maintain state that depends on
the current pixel and pixel sample (e.g., an offset into the Halton
sequence).  Many samplers also use pseudorandom number generators (RNGs) in
their work; those, too, are per-sampler state.  (For example, the
stratified sampler uses an RNG to jitter sample locations and low
discrepancy samplers use RNGs for randomization via scrambling.)&lt;/p&gt;

&lt;p&gt;In pbrt-v3, those per-sampler RNGs are seeded once at system startup time
and then chug along, generating random numbers as requested.  Because
threads are dynamically assigned to work on regions of the image, they may
not work on the same pixels over multiple runs.  In turn, the values that an
RNG returns at a pixel depend both on which thread was assigned that pixel
and on how many random numbers it had supplied previously for other
pixels.&lt;/p&gt;

&lt;p&gt;The fix was easy: reseed the RNG before generating sample points at a
particular pixel sample.  The &lt;code class=&quot;highlighter-rouge&quot;&gt;Sampler&lt;/code&gt; interface includes a
&lt;code class=&quot;highlighter-rouge&quot;&gt;StartPixelSample()&lt;/code&gt; method that is called before samples are requested at
a given pixel sample, so it’s just a few lines of code to put those RNGs in
a known state.  Here’s that method in &lt;code class=&quot;highlighter-rouge&quot;&gt;IndependentSampler&lt;/code&gt;, which generates
uniform independent samples without any further nuance:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StartPixelSample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Point2i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sampleIndex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dimension&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rng&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SetSequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rng&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Advance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sampleIndex&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;65536ull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dimension&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;There are two things to note in &lt;code class=&quot;highlighter-rouge&quot;&gt;StartPixelSample()&lt;/code&gt;’s implementation.
First, pbrt uses the &lt;a href=&quot;https://www.pcg-random.org/index.html&quot;&gt;PCG&lt;/a&gt; RNG,
which allows the specification of both a particular sequence of
pseudorandom values and an offset into that sequence.  Thus, we
choose a sequence according to the pixel coordinates and then offset into
it according to the index of the sample being taken in the pixel.&lt;/p&gt;

&lt;p&gt;The other thing to mention there is &lt;code class=&quot;highlighter-rouge&quot;&gt;Hash()&lt;/code&gt;, which has been useful all
over the place in pbrt-v4.  Here is its signature:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;typename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You can pass a bunch of values or objects straight away to it and it
marshals them up and passes them to
&lt;a href=&quot;https://en.wikipedia.org/wiki/MurmurHash&quot;&gt;MurmurHash&lt;/a&gt; to hash
them.&lt;sup id=&quot;fnref:padding&quot;&gt;&lt;a href=&quot;#fn:padding&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; In its use in the &lt;code class=&quot;highlighter-rouge&quot;&gt;IndependentSampler&lt;/code&gt;, we also allow the
user to specify a seed for random number generation; &lt;code class=&quot;highlighter-rouge&quot;&gt;Hash()&lt;/code&gt; makes
it simple to mush that together with the current pixel coordinates to
choose a pseudorandom sequence for the current pixel.&lt;/p&gt;

&lt;p&gt;There is, needless to say, a short &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/64c0a5cc0b29d6c6ffdacc53f93bc714e047e3b0/src/pbrt/samplers_test.cpp#L15&quot;&gt;unit
test&lt;/a&gt;
that ensures all of the samplers consistently generate the same sample values.&lt;/p&gt;
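
&lt;p&gt;The reseeding idea can be sketched outside of pbrt in a few lines.  What follows is only an illustration under stated assumptions, not pbrt’s code: &lt;code class=&quot;highlighter-rouge&quot;&gt;ToySampler&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;HashPixel()&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;MixBits()&lt;/code&gt; are hypothetical names, the SplitMix64 finalizer stands in for MurmurHash, and &lt;code class=&quot;highlighter-rouge&quot;&gt;std::mt19937_64&lt;/code&gt; stands in for PCG.  (It is far more expensive to reseed and skip ahead than PCG, so it would be a poor choice in a real renderer.)&lt;/p&gt;

```cpp
#include <cstdint>
#include <random>

// SplitMix64 finalizer; a stand-in here for the MurmurHash-based Hash().
inline uint64_t MixBits(uint64_t v) {
    v = (v ^ (v >> 30)) * 0xbf58476d1ce4e5b9ull;
    v = (v ^ (v >> 27)) * 0x94d049bb133111ebull;
    return v ^ (v >> 31);
}

// Hypothetical helper: fold pixel coordinates and a user seed into 64 bits.
inline uint64_t HashPixel(int x, int y, uint64_t seed) {
    uint64_t packed = (uint64_t(uint32_t(x)) << 32) | uint32_t(y);
    return MixBits(MixBits(packed) ^ seed);
}

class ToySampler {
  public:
    explicit ToySampler(uint64_t seed) : userSeed(seed) {}

    // Put the RNG in a state that depends only on (pixel, sample index, seed),
    // never on which pixels this sampler instance handled earlier.
    void StartPixelSample(int x, int y, int sampleIndex) {
        rng.seed(HashPixel(x, y, userSeed));
        // Reserve 64 values per pixel sample (an arbitrary budget for this sketch).
        rng.discard(64ull * uint64_t(sampleIndex));
    }

    // Uniform value in [0, 1) built from the top 53 bits of the next output.
    double Get1D() { return double(rng() >> 11) * (1.0 / 9007199254740992.0); }

  private:
    uint64_t userSeed;
    std::mt19937_64 rng;
};
```

&lt;p&gt;Two samplers constructed with the same seed return the same values for a given pixel sample no matter which pixels each one visited beforehand, which is the property that matters when threads are dynamically assigned to regions of the image.&lt;/p&gt;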

&lt;h2 id=&quot;other-moments-of-randomness&quot;&gt;Other Moments of Randomness&lt;/h2&gt;

&lt;p&gt;Samplers were much of the trouble in bringing pbrt-v4 into the land of
deterministic output, though two other places in the system that made
random decisions without the involvement of a sampler needed attention.&lt;/p&gt;

&lt;p&gt;First was a &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/64c0a5cc0b29d6c6ffdacc53f93bc714e047e3b0/src/pbrt/cpu/primitive.cpp#L57&quot;&gt;stochastic alpha
test&lt;/a&gt;,
deep in the primitive intersection code.  For shapes that have an alpha
texture assigned to them, we’d like to ignore any intersections where the
alpha texture is zero and randomly accept ones with fractional alpha with
probability according to their alpha value.  The sampler isn’t available in
the ray intersection routines and keeping a persistent RNG in that code has
obvious problems, so here is what we do instead:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Evaluate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;intr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Possibly ignore intersection based on stochastic alpha test
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HashFloat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Ignore this intersection and trace a new ray
&lt;/span&gt;        &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Given a less-than-one alpha value, a call to &lt;code class=&quot;highlighter-rouge&quot;&gt;HashFloat()&lt;/code&gt; gives a uniform
random floating-point value between 0 and 1.  It’s a buddy of &lt;code class=&quot;highlighter-rouge&quot;&gt;Hash()&lt;/code&gt; and
is also happy to take whichever-all values you pass it to turn into a
random floating-point value.  (Above, it’s the ray origin and direction.)&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;typename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HashFloat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x1&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Thus, the results are deterministic for any given ray.&lt;/p&gt;
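
&lt;p&gt;For experimenting with the idea outside of pbrt, here is a self-contained sketch.  Everything in it is an assumption made for illustration: &lt;code class=&quot;highlighter-rouge&quot;&gt;Ray3&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;Mix64()&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;HashFloats()&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;ToUnitFloat()&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;AcceptIntersection()&lt;/code&gt; are hypothetical names, and a SplitMix64-style mixer is used in place of MurmurHash.  The sketch also keeps only the top 24 bits of the hash so that the result is always strictly less than 1, which is a detail of this sketch rather than of pbrt.&lt;/p&gt;

```cpp
#include <cstdint>
#include <cstring>

// SplitMix64 finalizer, standing in for MurmurHash in this sketch.
inline uint64_t Mix64(uint64_t v) {
    v = (v ^ (v >> 30)) * 0xbf58476d1ce4e5b9ull;
    v = (v ^ (v >> 27)) * 0x94d049bb133111ebull;
    return v ^ (v >> 31);
}

// Hash the bit patterns of `count` floats.  Hashing bits rather than values
// means that 0.0f and -0.0f hash differently, which is fine for this purpose.
inline uint64_t HashFloats(const float *values, int count) {
    uint64_t h = 0;
    for (int i = 0; i < count; ++i) {
        uint32_t bits;
        std::memcpy(&bits, &values[i], sizeof(bits));
        h = Mix64(h + 0x9e3779b97f4a7c15ull + bits);
    }
    return h;
}

// Map a hash to [0, 1) using its top 24 bits, which a float represents exactly.
inline float ToUnitFloat(uint64_t h) {
    return float(h >> 40) * (1.0f / 16777216.0f);
}

struct Ray3 {
    float o[3];  // origin
    float d[3];  // direction
};

// Stochastic alpha test: the same ray always gets the same answer.
inline bool AcceptIntersection(const Ray3 &ray, float alpha) {
    if (alpha >= 1) return true;   // fully opaque: always accept
    if (alpha <= 0) return false;  // fully transparent: always ignore
    const float key[6] = {ray.o[0], ray.o[1], ray.o[2],
                          ray.d[0], ray.d[1], ray.d[2]};
    return ToUnitFloat(HashFloats(key, 6)) <= alpha;
}
```

&lt;p&gt;Because the “random” value is a pure function of the ray, no state needs to be carried between calls and it does not matter which thread traces the ray.&lt;/p&gt;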

&lt;p&gt;The second case was in pbrt-v4’s &lt;code class=&quot;highlighter-rouge&quot;&gt;LayeredBxDF&lt;/code&gt; class, which implements &lt;a href=&quot;https://shuangz.com/projects/layered-sa18/&quot;&gt;Guo
et al.’s algorithm&lt;/a&gt; for
stochastic evaluation and sampling of the BRDFs of layered materials.  That
needs an unbounded number of independent random samples, so we instantiate
an RNG for each evaluation, but &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/64c0a5cc0b29d6c6ffdacc53f93bc714e047e3b0/src/pbrt/bxdfs.h#L513&quot;&gt;seed it via the incident and outgoing
directions&lt;/a&gt;.
Thus again, for any pair of directions passed to the BRDF evaluation
method, the same set of random samples will be generated and the returned
value will be deterministic.&lt;/p&gt;

&lt;h2 id=&quot;consistent-pixel-sums&quot;&gt;Consistent Pixel Sums&lt;/h2&gt;

&lt;p&gt;With what we have so far, the same rays will be traced each
time the renderer runs and in turn, if an assertion fires along the
way, it will do so consistently.  That’s a big benefit for debugging, but
we have not yet achieved deterministic output, which is important for
making end-to-end tests maximally useful.&lt;/p&gt;

&lt;p&gt;The remaining challenge lies in summing sample values to compute each
pixel’s final value. Because floating-point addition is not associative, if
the image samples that contribute to a pixel are not accumulated carefully,
the order of summation may differ across runs of the program and so the
output may change.  That was a problem in pbrt-v3 due to
how it computed final pixel values: there, the image is decomposed into
rectangular regions that are assigned to threads and threads generate samples
within their regions, updating the pixels that each sample contributes to.&lt;/p&gt;

&lt;p&gt;This figure illustrates the problem with that, showing all of the samples
that contribute to a particular output pixel (black dot):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/matt/blog/images/thread pixel sampling.svg&quot; height=&quot;400&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We have two threads responsible for adjacent \(4 \times 4\) pixel regions of the
image (thick boxes).  For an output image pixel near the boundary of the
two regions that has a reconstruction filter that is wider than the pixel
spacing (shaded circle), some of the samples that contribute will be taken
by thread 1 (orange dots) and some will come from samples taken by thread 2
(blue dot).  Because the threads are independent, the filtered sample
values are not accumulated in a deterministic order and thus, the final
pixel value is not deterministic.&lt;/p&gt;
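
&lt;p&gt;The order dependence is easy to reproduce in isolation.  In this small sketch (the name &lt;code class=&quot;highlighter-rouge&quot;&gt;SumInOrder()&lt;/code&gt; is just for illustration), the same three single-precision values can give different totals depending on the order in which they are added:&lt;/p&gt;

```cpp
#include <vector>

// Accumulate contributions in exactly the order they are given, as a pixel
// would if samples arrived from threads in that order.
inline float SumInOrder(const std::vector<float> &values) {
    float sum = 0;
    for (float v : values)
        sum += v;
    return sum;
}
```

&lt;p&gt;With IEEE 754 single precision and no reordering optimizations such as &lt;code class=&quot;highlighter-rouge&quot;&gt;-ffast-math&lt;/code&gt;, summing 1e8, 1, and -1e8 in that order gives 0, since 1e8 + 1 rounds back to 1e8, while summing 1e8, -1e8, and 1 gives 1.  Real sample values differ far less dramatically, but the last few bits of a pixel can change in the same way.&lt;/p&gt;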

&lt;p&gt;pbrt-v4 addresses this issue by adopting Ernst et al.’s &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.183.3579&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;filter importance
sampling&lt;/a&gt;
approach.  Independent samples are taken for each output pixel, with no
sample sharing with other pixels.  If only a single thread works on a pixel
at a time, then the samples for each output pixel are naturally generated
in a consistent order, giving a consistent sum.  (Filter importance
sampling has a number of additional advantages that are detailed in the
paper, including better preservation of the benefits of high-quality
sampling patterns.)  With that tuned up, we (almost) have deterministic
output.&lt;/p&gt;

&lt;h2 id=&quot;those-pesky-splats&quot;&gt;Those Pesky Splats&lt;/h2&gt;

&lt;p&gt;One more thing… pbrt-v4’s output is not quite deterministic if a light
transport algorithm that traces paths starting from the light sources is
being used.  In that case, light path vertices are splatted into the image
at whichever pixel they are visible in; if multiple threads end up splatting
into the same pixel, then we are back to nondeterminism from unordered
floating-point addition.&lt;/p&gt;

&lt;p&gt;This issue could be addressed by having each thread splat into its own
image and then summing the images at the end, though that would incur a
cost in memory use that scales with the number of threads.  Alternatively,
we might use fixed-point rather than floating-point to store those pixel
values.  For now that issue is unaddressed; it rarely causes any trouble,
especially since those splatted values are accumulated in double precision
and generally converted all the way down to half-float precision for
storage.  Most of the time that loss of precision hides any sloppy sums.&lt;/p&gt;
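
&lt;p&gt;For the curious, here is a minimal sketch of what the fixed-point alternative might look like.  It is not something pbrt-v4 does; the class name &lt;code class=&quot;highlighter-rouge&quot;&gt;FixedPointSplat&lt;/code&gt; and the choice of 32 fractional bits are assumptions made for illustration.  The point is that integer addition is associative, so the total no longer depends on the order in which threads add their contributions.&lt;/p&gt;

```cpp
#include <atomic>
#include <cmath>
#include <cstdint>

// Hypothetical splat accumulator for one pixel channel, stored as a signed
// 64-bit fixed-point number with 32 fractional bits.
class FixedPointSplat {
  public:
    // Quantize once, then add as an integer.  Each contribution is rounded
    // independently of the others, so the total is order-independent.
    void Add(double v) {
        int64_t q = std::llround(v * kScale);
        sum.fetch_add(q, std::memory_order_relaxed);
    }

    double Value() const {
        return double(sum.load(std::memory_order_relaxed)) / kScale;
    }

  private:
    static constexpr double kScale = 4294967296.0;  // 2^32
    std::atomic<int64_t> sum{0};
};
```

&lt;p&gt;The trade-offs are a fixed range (the running total must stay below roughly \(2^{31}\) in magnitude with this layout) and a small quantization error on every splat, so the scale would have to be chosen with the expected range of splatted values in mind.&lt;/p&gt;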

&lt;h2 id=&quot;the-joys-of---debugstart&quot;&gt;The Joys of --debugstart&lt;/h2&gt;

&lt;p&gt;The greatest benefit of deterministic rendering has been the ability to
quickly iterate on bugs: you can add some logging code or more assertions,
recompile, and re-render, confident that the new code will see the same
inputs as triggered the bug.  Samplers that give exactly the same
samples at each pixel also mean that you can speed things up by just
rendering a crop window or even a single pixel as you’re chasing a bug.&lt;/p&gt;

&lt;p&gt;Even better, it was easy to go even further and add support for retracing
just a single offending ray path.  pbrt-v4 has a &lt;code class=&quot;highlighter-rouge&quot;&gt;CheckCallbackScope&lt;/code&gt; class
that uses RAII to register a callback function that will run if an
assertion fails or if the renderer crashes.  Here is how it is used in most
of pbrt’s CPU integrators:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;thread_local&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Point2i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadPixel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;thread_local&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadSampleIndex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;CheckCallbackScope&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StringPrintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Rendering failed at pixel (%d, %d) sample %d. Debug with &quot;&lt;/span&gt;
                        &lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;--debugstart %d,%d,%d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;threadPixel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadPixel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadSampleIndex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;threadPixel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadPixel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadSampleIndex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As rendering proceeds, each thread keeps its thread-local &lt;code class=&quot;highlighter-rouge&quot;&gt;threadPixel&lt;/code&gt; and
&lt;code class=&quot;highlighter-rouge&quot;&gt;threadSampleIndex&lt;/code&gt; variables up to date, and if the renderer aborts due to
an error, you get a message like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Rendering failed at pixel (915, 249) sample 83. Debug with &quot;--debugstart 915,249,83&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;at the bottom of the crash output.  If you then rerun pbrt passing it that
&lt;code class=&quot;highlighter-rouge&quot;&gt;--debugstart&lt;/code&gt; option, a &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/64c0a5cc0b29d6c6ffdacc53f93bc714e047e3b0/src/pbrt/cpu/integrators.cpp#L68&quot;&gt;specialized code
path&lt;/a&gt;
traces just that single ray path in the main thread of execution.  That
gives a simpler debugging context than launching a bunch of threads and
waiting for the bug to hit again; it’s delightfully helpful for bugs that
otherwise only happen after a substantial amount of time has gone by.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We’ve made it past “detecting rendering bugs” and have made our way to
“reliably replicating those bugs.”  Next time will be a few thoughts about
performance bugs before we get into actual debugging techniques.&lt;/p&gt;

&lt;h2 id=&quot;note&quot;&gt;note&lt;/h2&gt;
&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:padding&quot;&gt;
      &lt;p&gt;The attentive reader of &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/64c0a5cc0b29d6c6ffdacc53f93bc714e047e3b0/src/pbrt/util/hash.h#L121&quot;&gt;the &lt;code class=&quot;highlighter-rouge&quot;&gt;Hash()&lt;/code&gt; implementation&lt;/a&gt; will note that if a struct or class that
        has padding between elements is passed to it, the
        results may be nondeterministic since it hashes their in-memory contents
        directly. It would be nice to use a C++ SFINAE trick to get a
        compilation error in that case, but I’m not aware of a way to
        detect that at compile time. &lt;a href=&quot;#fnref:padding&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Making a renderer fully deterministic—the same input always giving exactly the same output—has a few tricky corners that were never all addressed in pbrt until the latest version. Achieving that determinism has all sorts of benefits for testing and debugging.</summary></entry><entry><title type="html">Debugging Your Renderer (4/n): End-to-end tests (or, “why did that image change?”)</title><link href="https://pharr.org/matt/blog/2021/12/19/debugging-renderers-end-to-end-tests.html" rel="alternate" type="text/html" title="Debugging Your Renderer (4/n): End-to-end tests (or, &quot;why did that image change?&quot;)" /><published>2021-12-19T00:00:00-08:00</published><updated>2021-12-19T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2021/12/19/debugging-renderers-end-to-end-tests</id><content type="html" xml:base="https://pharr.org/matt/blog/2021/12/19/debugging-renderers-end-to-end-tests.html">&lt;p&gt;Here we are, three posts into the meat of this series, and we’re still on
the topic of determining if the renderer is buggy in the first place—the
actual craft of debugging has not yet seen much discussion.  We’re getting
there—I promise—but I’m going to finish discussing ways of detecting
bugs before getting into fixing them.&lt;/p&gt;

&lt;p&gt;Beyond unit tests, I’ve also found that having a good set of end-to-end
rendering tests is of enormous benefit.  In this context, the idea of an
end-to-end test is simple: you render an image of a scene and then check
the image to make sure it is correct.&lt;/p&gt;

&lt;p&gt;There’s plenty of nuance in that sentence: which scene?  (And not just one,
right?)  How do you check whether the output is correct?  Needless to say,
it’s “many scenes,” and as we’ll see, verifying correctness from an image
can be as much art as science.  We’ll dig into all of those questions
today.&lt;/p&gt;

&lt;h2 id=&quot;building-a-library-of-test-scenes&quot;&gt;Building a Library of Test Scenes&lt;/h2&gt;

&lt;p&gt;I’ve been collecting scenes to use for testing pbrt for at least a decade;
there are upward of 600 of them in the test suite today.  Most of them
don’t make pretty pictures and some output very low resolution images. Some
are as small as \(10 \times 10\) pixels—nothing much to look at at all.
They can be split into a few categories:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Simple scenes with analytic solutions.&lt;/li&gt;
  &lt;li&gt;Scenes that target a single renderer feature.&lt;/li&gt;
  &lt;li&gt;Complex(ish) scenes.&lt;/li&gt;
  &lt;li&gt;Reproduction cases for user-reported bugs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each type is valuable.  Take the scenes with analytic
solutions: one such scene is a diffuse sphere with radius 1, a reflectance
of 0.5, and a point light with intensity \(\pi\) at its center.  Put a
camera inside that thing and render it with your path tracer: if your
pixels don’t all have a value very close to 1 (given sufficient samples),
you’ve got a bug.  Stop right there, fix it, and be happy you had such an
easy way to detect something was off.&lt;/p&gt;

&lt;p&gt;You can take that scene and easily make variations of it.  Replace that
single point light with four point lights with intensities that sum to
\(\pi\); that should be all ones as well.  Or take out the point light
and make the interior of the sphere emissive with spatially- and
directionally-uniform radiance of 0.5, leaving the diffuse reflectance at
0.5.  Once again, you should get pixels that are all 1.  That emissive
sphere you can make bigger or smaller; it should be all ones if you make a
variant with a different radius.&lt;/p&gt;

&lt;p&gt;Once you start thinking in terms of scenes where you can work out the
correct answer, there’s lots more you can do.  You could light a diffuse
quad with an infinite light source and then again with an emissive sphere
surrounding it; if both emit the same radiance, the two images should match.
You could test your bidirectional algorithms by putting a glass sphere with
an index of refraction of 1 around the quad; in principle, that should have
no effect.&lt;/p&gt;

&lt;p&gt;And then you can also make variants of those variants that exercise all of
the different sample generation algorithms and light transport algorithms;
each of those is just a small change to the scene description file, so
getting up to 600 scenes doesn’t require writing each one by hand.&lt;/p&gt;

&lt;p&gt;The analytic scenes rarely fail once you’ve gotten them working the first
time, but when they do, the debugging problem is a relatively easy
one—much nicer than “images of the Moana Island scene are too dark when
the bidirectional path tracer is used.”  For example, for the scene with a
single point light, every ray should return the same value—at each
intersection point, the reflected radiance due to direct lighting should be
0.5 and then the indirect radiance (also 0.5) should be scaled by 0.5.
(Expand out that series and you get your expected value of 1.)&lt;/p&gt;
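&lt;p&gt;For reference, here is that series written out.  This is a sketch that assumes an ideal Lambertian surface with reflectance \(\rho = 0.5\) and relies on the symmetry of the scene (every point on the sphere has the same outgoing radiance \(L\)); the symbols \(I\), \(r\), \(\rho\), \(L_\mathrm{d}\), and \(L\) are notation introduced here for illustration.  The point light of intensity \(I = \pi\) sits at the center of a sphere of radius \(r = 1\), so its light arrives at normal incidence and the directly reflected radiance is&lt;/p&gt;

&lt;p&gt;\[ L_\mathrm{d} = \frac{\rho}{\pi} \, \frac{I}{r^2} = \frac{0.5}{\pi} \cdot \pi = 0.5 . \]&lt;/p&gt;

&lt;p&gt;A diffuse surface lit by uniform incident radiance \(L_i\) over the whole hemisphere reflects \(\rho L_i\), so each further bounce scales the radiance by \(\rho\), which gives a geometric series:&lt;/p&gt;

&lt;p&gt;\[ L = L_\mathrm{d} \sum_{k=0}^{\infty} \rho^k = \frac{L_\mathrm{d}}{1 - \rho} = \frac{0.5}{1 - 0.5} = 1 . \]&lt;/p&gt;

&lt;p&gt;The emissive variant follows the same pattern with the emitted radiance of 0.5 in place of \(L_\mathrm{d}\); no radius appears in that version, which is why resizing the emissive sphere should leave the pixels at 1.&lt;/p&gt;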

&lt;p&gt;Of course, those scenes may all render correctly and you may well still
find that the Moana Island scene is too dark with your bidirectional
path tracer, but you’ve at least carved off the easy-to-fix cases in a way
that makes them easy to debug.&lt;/p&gt;

&lt;p&gt;For most of the renderer’s capabilities, it’s not too hard to come up with
a simple scene that targets that feature without exercising too many other
parts of the renderer.  Those are also useful to have in end-to-end
tests. As an example, pbrt’s test suite includes a scene comprised of a
single quad with a high-frequency texture viewed at an oblique angle.  The
BSDF is diffuse, it’s lit by a directional light, there’s no complex
visibility or multiple light scattering.  This is it:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;img width=&quot;400&quot; height=&quot;400&quot; src=&quot;/matt/blog/images/aa-perspective.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That scene is effectively a test of pbrt’s ray differentials and texture
filtering code.  If one makes a change to the renderer and then that scene
goes bad, you can make a good guess about where the bug lies from the
limited subset of the rendering code that runs in generating it.  In such a
case, if scenes without textures still render correctly then you have a
stronger hint, though if those are also broken, then you have a hint that
texture filtering isn’t your problem.  (Or, that you have multiple
problems.)&lt;/p&gt;

&lt;p&gt;Sometimes things only go wrong in the presence of complexity; a number of
scenes culled from the &lt;a href=&quot;https://github.com/mmp/pbrt-v4-scenes&quot;&gt;pbrt-v4-scenes
distribution&lt;/a&gt; and added to the
end-to-end tests take care of that.  When those scenes fail, it’s usually
the case that simpler ones do as well.  If not, it’s often worth trying to
simplify the more complex scene as much as possible while still hitting the
bug; that, too, is a source of more test scenes for the future.  (More on
that topic in a future post as well.)&lt;/p&gt;

&lt;p&gt;Finally, there are the scenes from user bug reports. I add all of those to
the test suite; not only are they all cases that testing previously wasn’t
rigorous enough to catch, but there’s no reason to risk the embarrassment
(on this end) and annoyance (on the bug reporter’s end) of that same bug
reappearing in the future due to a change to the renderer inadvertently
reintroducing it.&lt;/p&gt;

&lt;p&gt;There is a time versus coverage trade-off in assembling this collection of
scenes: the more scenes you have with the more pixels to render and the
more samples per pixel, the more you’re exercising the renderer.  Yet, the
more of all of that you have, the longer it takes to run the tests.  If
running them takes too long, you won’t run them as often as you should.
I’ve ended up tuning them to be about an hour of single-core CPU time
(though they run on multiple cores, so it’s just a few minutes of
wall-clock time).  As you add scenes and the total time to run all of them
increases, you can judiciously reduce the resolution of some of the tests
or dial down the sampling rate used when rendering them.&lt;/p&gt;

&lt;h2 id=&quot;does-everything-render-to-completion&quot;&gt;Does Everything Render to Completion?&lt;/h2&gt;

&lt;p&gt;So you have a few tens or hundreds of test scenes and, let’s hope, a script
to render all of them and save the images.  What now?  Run that script and
see what happens.&lt;/p&gt;

&lt;p&gt;Most of the corners of the renderer’s code end up being fairly well
exercised if you have hundreds of varied scenes designed to exercise
it.&lt;sup id=&quot;fnref:coverage&quot;&gt;&lt;a href=&quot;#fn:coverage&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; That’s good news for your assertions, as far as giving them
plenty of variety to assert about.  It’s also encouragement to add more
assertions; sometimes adding a new assertion and running through all of the
existing test scenes will unearth a new failure.  You might even add
expensive assertions for a single run-through of the test scenes to see if
they find anything, planning to debug if so and to remove them or demote
them to debug-only assertions when you’re done.&lt;/p&gt;

&lt;p&gt;Finding a failing assertion in that way really is a good thing, even though
you’ve found more work for yourself.  You’ve got yourself a debugging task
ahead of you but it’s not completely open ended, and it’s on your own
terms without the panic of a user reporting a serious bug where you have
no idea what the cause may be.  It’s also likely with a simpler scene than
a user would have been rendering if they encountered the bug later.&lt;/p&gt;

&lt;p&gt;Assertions aside, the renderer may crash for some or even all of the
scenes.  Same deal with that: a crash is not fun, but better to find it
yourself while running the tests and fix it before your users are bothered
by it.&lt;/p&gt;

&lt;p&gt;A good collection of test scenes is also good fodder for tools like
&lt;a href=&quot;https://valgrind.org&quot;&gt;valgrind&lt;/a&gt;,
&lt;a href=&quot;https://valgrind.org/docs/manual/hg-manual.html&quot;&gt;helgrind&lt;/a&gt;, and assorted
&lt;a href=&quot;https://clang.llvm.org/docs/AddressSanitizer.html&quot;&gt;sanitizers&lt;/a&gt;.  There’s a
much better chance of those sorts of tools finding something if you give
them a variety of rendering computations to examine.  Chasing down any
errors those report is also something you must do before proceeding when
you find them: there’s no way to know how much havoc lies in their wake, so
you might as well fix them once you’re aware of them, lest you spend hours
chasing down some other bug that turned out to be due to one of those.&lt;/p&gt;

&lt;h2 id=&quot;are-the-images-correct&quot;&gt;Are the Images Correct?&lt;/h2&gt;

&lt;p&gt;If all of the scenes render to completion, now you have a few hundred
images sitting on disk.  How do you know if each one is correct?&lt;/p&gt;

&lt;p&gt;For pbrt’s test scenes, I maintain a set of “golden” images that provide a
reference.&lt;sup id=&quot;fnref:gengold&quot;&gt;&lt;a href=&quot;#fn:gengold&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;  The test script then checks the output from the current version
of the renderer against the golden images.  How tricky could that be?  The
first hard problem is generating golden images in the first place.  The
second is determining if a rendered image is correct.  We’ll consider both
topics in turn.&lt;/p&gt;

&lt;p&gt;Creating an initial set of golden images is a bootstrapping problem.  For
the scenes with analytic solutions you can manually verify correctness via their
pixel values, but for the rest it’s not so easy.  I have partially been
able to sidestep that issue by assuming that the last released version of
pbrt is bug free and using its output as a starting point.  While pbrt is
surely not bug free, after it has been out for a few years enough people
have spent enough time with the code that it’s reasonable to assume it’s in
pretty good shape.&lt;/p&gt;

&lt;p&gt;For a different renderer, one might try using the output of pbrt or another
renderer as an initial reference, though that’s tricky business, with
differences in BSDF models, texture filtering, and details like rendering
in RGB versus using spectra.  One can at least make sure that one’s
renderer is in the right ballpark that way, if another renderer is both
trusted and well-understood and if it’s not too hard to render scenes in
both it and your own renderer.&lt;/p&gt;

&lt;p&gt;Another option is to gain confidence in candidate golden images via
experiments.  We’ll come back to this topic in more detail once we get to
debugging techniques, but to understand the idea, let’s consider that
texture filtering test from before.  Say that you’ve implemented ray
differentials and a texture filtering algorithm and can render images that
aren’t obviously wrong.  Lacking a verified solution, how can you become
more confident that they are correct?&lt;/p&gt;

&lt;p&gt;You might render the scene with no texture filtering but with many pixel
samples to get an antialiased image that way.  That’s something to compare
to.  You know that your implementation won’t match that perfectly, but if
it’s too far off you might be suspicious of your differentials’
correctness.  Another useful technique is to explore the parameter space:
render it once with your implementation, then again with your texture
filter widths half as wide as you think they should be, then again with
them twice as wide. You should see aliasing with the narrow filters and
blurring with the wide ones.  If so, you have some more confidence in your
implementation, and if not, you have something to dig into further.&lt;/p&gt;

&lt;p&gt;Here are some images that show the results of applying that approach for
the texture filtering test above; the images are as we would expect.  (The
images are presented using &lt;a href=&quot;https://jeri.io&quot;&gt;jeri&lt;/a&gt;; click on
them and hit ‘f’ to go full screen if necessary to see the differences.)&lt;/p&gt;

&lt;div class=&quot;card-img-top&quot; style=&quot;display:block; padding-top: 47%;  position:relative;&quot;&gt;
&lt;div id=&quot;aa-comparison&quot; style=&quot;position: absolute; top: 0; left: 0; right: 0; bottom: 0;&quot;&gt;&lt;/div&gt;&lt;/div&gt;
&lt;script&gt;
Jeri.renderViewer(document.getElementById('aa-comparison'), {
  title: 'aa-comparison', children: [
 { title: 'Antialiased', image: '/matt/blog/images/aa-perspective.png' },
 { title: '2x filter widths', image: '/matt/blog/images/aa-perspective-blurred.png' },
 { title: '1/2x filter widths', image: '/matt/blog/images/aa-perspective-aliased.png' },
 ]});
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;At minimum one may decree that the output of the renderer at some point in
time gives the golden images.  Going forward, any deviation in them should
be explained, either from fixing a bug or from a well-understood
improvement to the renderer.&lt;/p&gt;

&lt;h2 id=&quot;when-the-images-should-not-change-at-all&quot;&gt;When the Images Should Not Change at All&lt;/h2&gt;

&lt;p&gt;Given golden images, a change to the renderer, and a run of the end-to-end
tests, we have a set of new images that may or may not match the golden
images.  How one feels about that depends on the sort of change one has
made.  Here are a few representative cases where not a single pixel of a
single image should be different:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;A pooled memory allocator was introduced to optimize small memory
allocations.&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;An optimized routine for parsing text floating-point values in the scene
description was adopted.&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;The function that loads image texture maps has been parallelized to
reduce start-up time.&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For all of those cases, there’s no reasonable explanation for why anything
should change in the final output, yet sometimes you make a change like
that and find differences.  If it’s major differences, then presumably
you’ve broken something fundamental; the debugging problem in those cases
is often not too bad due to the wide impact.  Choose one of the
simplest scenes that went astray and take it from there.&lt;/p&gt;

&lt;p&gt;For minor differences, it’s also critical to understand what happened.  It
can be hard to be disciplined about that: if just one pixel in one scene
out of hundreds of scenes with perhaps billions of pixels changes after you
replace the float parser, it’s easy to tell yourself that a single float
was parsed differently and hey, quite possibly you just fixed a bug you
didn’t know you had.  Yet something more serious may be lurking; it may
just be that your tests only hit a buggy case once but other scenes would
hit it often.  If you don’t understand the root cause, you’re building the
rest of the system on sand.&lt;/p&gt;

&lt;p&gt;For the case of the float parser, it’d be crucial to track down which float
(or floats) went astray and why—keep both parsers around, call both for
each float parsed, and assert that both give the same result.  When they
disagree, figure out which one was correct.  Your assertion may never fire,
which would be “interesting” as well; it may be that the pixel change was
not due to a difference in parsing floats but was due to some other bug
that was tickled by your changes.  Those sorts of bugs aren’t fun to chase
down but are equally important to understand when you encounter them.&lt;/p&gt;

&lt;p&gt;Implicit in these imperative statements about no pixels changing has been
the assumption that the renderer is &lt;em&gt;deterministic&lt;/em&gt;—that rendering the
same scene gives exactly the same output image.  For now we will take that
as given.  Making the renderer so is tricky but worthwhile; that will be
the sole topic of the next post in this series.&lt;/p&gt;

&lt;h2 id=&quot;when-the-images-may-change&quot;&gt;When the Images May Change&lt;/h2&gt;

&lt;p&gt;Whenever changes are made to code involving ray tracing, other geometric
computations, or light transport algorithms, it’s almost inevitable that
images will change.  This brings us to the tricky question of “are those
changes ok, or suggestive that there is a bug?”&lt;/p&gt;

&lt;p&gt;To motivate this case, let’s consider a (real) example: making what is
believed to be an improvement to the algorithm that makes sure that rays
leaving bilinear patches do not incorrectly reintersect the patch.
Assuming that we had a reasonable algorithm for this previously, we
would expect very small changes in the images for every scene that has
bilinear patches in it, but would not expect any big image changes.
(Though we might hope to have a scene that shows a case where the current
algorithm is insufficient, in which case we would hope for significant
and visually evident improvement with it.)&lt;/p&gt;

&lt;p&gt;My testing script uses pbrt’s &lt;em&gt;imgtool&lt;/em&gt; program to compare the output
images to the golden images.  It prints nothing when they match exactly, so
if you’ve just changed the float parser, you might run the end-to-end
tests, wait for them to finish, and move along happily if nothing is
reported.  When there is a discrepancy, &lt;em&gt;imgtool&lt;/em&gt;’s output is like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/matt/blog/images/blp-imgtool.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That output is carefully crafted. The three lines in turn:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The news: the images are different.&lt;/li&gt;
  &lt;li&gt;The pathnames of the two images, relative to the current
directory. These share a line with nothing else on it so that it’s
easy to triple-click that line to select it, then type the name of an
image viewer in the shell, paste the selection, hit return, and then view the two
images.&lt;/li&gt;
  &lt;li&gt;Numerical details about the images and how they differ.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those details include the average value of all of the pixels in each image,
their percentage difference, and their mean squared difference.
Often those numbers alone are enough to indicate what’s going on.
If we saw something like the above for all of the test scenes that had
bilinear patches in them (minuscule differences in average pixel values and
MSE), we could be fairly confident that all was well.  It would still be
worth a quick glance at a few of the images, but there would be no need to
view all of them to feel good about the change.&lt;/p&gt;
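&lt;p&gt;As a sketch of what those three statistics measure, suppose each image is flattened to \(N\) values, \(a_i\) for the test image and \(b_i\) for the golden image.  The notation is introduced here for illustration, and the exact definitions &lt;em&gt;imgtool&lt;/em&gt; uses (for example, how color channels are combined or which image normalizes the percentage) are assumptions:&lt;/p&gt;

&lt;p&gt;\[ \bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i , \qquad \bar{b} = \frac{1}{N} \sum_{i=1}^{N} b_i , \]&lt;/p&gt;

&lt;p&gt;\[ \Delta_{\%} = 100 \cdot \frac{\bar{a} - \bar{b}}{\bar{b}} , \qquad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (a_i - b_i)^2 . \]&lt;/p&gt;

&lt;p&gt;The two are complementary: the difference in averages is blind to per-pixel errors that cancel out across the image, while the MSE registers every per-pixel deviation, including plain noise, regardless of sign.&lt;/p&gt;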

&lt;p&gt;With that workflow in mind, &lt;em&gt;imgtool&lt;/em&gt; offers some color in its output to
make it easier to see higher levels of error.  Here’s what it said about
another scene after I made that change:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/matt/blog/images/splash-imgtool.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That red text says “this seems a little high”, and indeed it is so—here
are the corresponding images:&lt;/p&gt;

&lt;div class=&quot;card-img-top&quot; style=&quot;display:block; width: 100%; padding-top: 47%;  position:relative;&quot;&gt;
&lt;div id=&quot;bad-splash&quot; style=&quot;position: absolute; top: 0; left: 0; right: 0; bottom: 0;&quot;&gt;&lt;/div&gt;&lt;/div&gt;
&lt;script&gt;
Jeri.renderViewer(document.getElementById('bad-splash'), {
  title: 'bad-splash', children: [
 { title: 'Test Image', image: '/matt/blog/images/run-splash.pbrt.png' },
 { title: 'Golden Image', image: '/matt/blog/images/golden-splash.pbrt.png' },
 ]});
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Something funny is happening at the boundary of the liquid at the top of
the cup; it is evident that one of the two images must be wrong, though it
isn’t obvious which one is.  Time to start debugging.&lt;/p&gt;

&lt;h2 id=&quot;using-statistics-to-your-advantage&quot;&gt;Using Statistics to Your Advantage&lt;/h2&gt;

&lt;p&gt;For the bilinear patch intersection example, the image statistics are
useful for giving a good first indicator of “all is well”, “something may
be fishy here”, or “things are Not Good.”  That is plenty useful, but when
one is making changes to Monte Carlo sampling code, those numbers have even
greater value.  Consider improving a BRDF importance sampling routine 
to better match the BRDF.  In that case, we hope for significant
image changes for the better thanks to lower error.  How do we distinguish
between an improvement in error and an incorrect result?&lt;/p&gt;

&lt;p&gt;Just looking at the images may not be enough.  Consider these three images
of the San Miguel scene where the first is the baseline reference and the
others correspond to two different changes to the renderer, one correct and
one buggy.  It’s not evident from just looking at the images which one
is wrong.&lt;/p&gt;

&lt;div class=&quot;card-img-top&quot; style=&quot;display:block; width: 100%; padding-top: 47%;  position:relative;&quot;&gt;
&lt;div id=&quot;sanmiguel&quot; style=&quot;position: absolute; top: 0; left: 0; right: 0; bottom: 0;&quot;&gt;&lt;/div&gt;&lt;/div&gt;
&lt;script&gt;
Jeri.renderViewer(document.getElementById('sanmiguel'), {
  title: 'sanmiguel', children: [
 { title: 'Reference', image: '/matt/blog/images/sanmiguel-ref.exr' },
 { title: 'Change A', image: '/matt/blog/images/sanmiguel-a.exr' },
 { title: 'Change B', image: '/matt/blog/images/sanmiguel-b.exr' },
 ]});
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;However, &lt;em&gt;imgtool&lt;/em&gt; has something interesting to report: the average pixel
value of “Change A” is 0.17% higher than the reference image, but the
average pixel value of “Change B” is 4.03% higher.  In the context of
unbiased Monte Carlo, a 4% change is most definitely a sign of something
going wrong.&lt;/p&gt;

&lt;p&gt;One way to think about why this is so is that if you’re using unbiased
Monte Carlo algorithms, rendering images of thousands of pixels, each with
tens of samples, then you have hundreds of thousands or even millions of
sample values that feed into that average.  If you have changed your
importance sampling routines (and your estimators don’t have ridiculously
high variance), then those average image values should be well locked
in if both “before” and “after” are bug-free.&lt;/p&gt;
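&lt;p&gt;A rough way to quantify “locked in” is the standard error of the image average.  The following is a sketch under simplifying assumptions: \(P\) pixels, \(n\) independent samples per pixel, an unbiased estimator with per-sample variance \(\sigma_p^2\) at pixel \(p\), and no correlation between pixels (reconstruction filters that share samples across pixels are ignored).  The notation is introduced here for illustration.  The variance of the image-wide average \(\bar{a}\) is then&lt;/p&gt;

&lt;p&gt;\[ \mathrm{Var}[\bar{a}] = \frac{1}{P^2} \sum_{p=1}^{P} \frac{\sigma_p^2}{n} = \frac{\overline{\sigma^2}}{P\,n} , \qquad \overline{\sigma^2} = \frac{1}{P} \sum_{p=1}^{P} \sigma_p^2 , \]&lt;/p&gt;

&lt;p&gt;so the standard error falls off as \(1/\sqrt{P\,n}\).  With purely illustrative numbers (not measurements of any pbrt scene), take \(P\,n = 10^6\) and a typical per-sample standard deviation equal to the image mean: the relative standard error of one image’s average is \(1/\sqrt{10^6} = 0.1\%\), and the difference between two independent renders has a standard error of about \(\sqrt{2} \cdot 0.1\% \approx 0.14\%\).  On that scale a 0.17% shift is unremarkable, while a 4% shift is more than 25 standard errors away.&lt;/p&gt;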

&lt;p&gt;That idea also explains why that San Miguel test has a fairly low sampling
rate—just 16 samples per pixel.  You often don’t need to render the whole
image to convergence to tell if the Monte Carlo bits have gone wrong; the
statistics over all of the pixels often tell the tale.&lt;/p&gt;

&lt;p&gt;But how do you know how much of a change is acceptable?  Is that 0.17%
something to worry about?  In practice, the answer depends on
how many samples you’re taking and how much variance there is in your
estimators.  For pbrt’s tests, I’ve learned to have a sense of what’s
expected, but that’s admittedly imprecise.  A much better way would be to
follow the ideas presented in Kartic Subr and Jim Arvo’s &lt;a href=&quot;http://www0.cs.ucl.ac.uk/staff/K.Subr/research.html#HypothesisMCEstimators&quot;&gt;paper on
applying proper statistical tests to these
tasks&lt;/a&gt;.
They show not only the right way to decide if two images have the same
mean, accounting for the number of samples taken in setting a threshold,
but also how to robustly determine the answer to questions like
“does image a have lower variance than image b?”&lt;/p&gt;
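&lt;p&gt;As a minimal sketch of what a sample-count-aware comparison can look like, here is the standard two-sample (Welch) statistic; it is offered as a generic illustration and is not necessarily the exact procedure used in the paper.  Suppose two estimators of the same quantity yield \(n_a\) and \(n_b\) samples with sample means \(\bar{x}_a\), \(\bar{x}_b\) and sample variances \(s_a^2\), \(s_b^2\) (all notation introduced here):&lt;/p&gt;

&lt;p&gt;\[ t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{s_a^2 / n_a + s_b^2 / n_b}} . \]&lt;/p&gt;

&lt;p&gt;If both estimators have the same mean and the sample counts are large, \(t\) is approximately standard normal, so \(\lvert t \rvert\) above roughly 2 happens by chance only about 5% of the time, and larger values are evidence that the means differ.  Because the threshold applies to \(t\) rather than to the raw difference, it tightens automatically as more samples are taken.&lt;/p&gt;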

&lt;p&gt;For all of these evaluations of images, it’s crucial that images are stored
in floating point, not clamped, and without any tone mapping or gamma
correction.  When you’re making images for people to look at, you’re more
than welcome to use 8-bit PNGs and run your pixels through the ACES curve
for a “filmic” look.  For the purposes of end-to-end tests, maintaining
good old linear values with their full dynamic range is the only thing that
allows you to reason about what’s going on with them statistically.&lt;/p&gt;

&lt;p&gt;Finally, even if the numbers look good, it’s still important to view the
images, or at least those with the greatest reported
differences.  A shortcoming of those image-wide statistics is that they
don’t indicate whether the error has some unsightly structure to it that is
sneaking under the radar.  One way to better automate that test would be to
also use a perceptual error metric like
&lt;a href=&quot;https://research.nvidia.com/publication/2020-07_FLIP&quot;&gt;ꟻlip&lt;/a&gt;, though
that requires high-quality reference images, which pbrt’s end-to-end tests
currently avoid in the interests of running more quickly.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This has turned into a longer post than I intended and there’s still plenty
more to say, especially about the tricky problem of having two rendered
images and trying to figure out what their differences signify.  We will
most certainly come back to that in following posts since it is frequently
integral to the renderer debugging process.&lt;/p&gt;

&lt;p&gt;The best thing about having a good set of tests—both unit and
end-to-end—is being able to iterate on code with confidence.  You can
refactor swaths of the system, you can clean up things that are a little
grungy, and if the tests are clear, you can feel confident about committing
those changes.  Sometimes you can try out speculative ideas—things where
you’re not sure if the idea is right—and quickly gather some empirical
data about whether the idea works or not.  If those indicators are
promising and you pursue your idea you should still find better ways to
validate it, but I’ve found that a quick yes/no can be a helpful guide.&lt;/p&gt;

&lt;p&gt;Next time we’ll go into the details of making a renderer deterministic,
which is one of the foundations of everything discussed today.  That post
will certainly be less to digest than this one was.&lt;/p&gt;

&lt;h1 id=&quot;notes&quot;&gt;notes&lt;/h1&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:coverage&quot;&gt;
      &lt;p&gt;The Right Thing to do would be to use a tool that measures code
         coverage, see which parts of the renderer never or rarely run
         given your test scenes, and to introduce new scenes
         intentionally to exercise that code.  Admittedly, I have not
         yet found that discipline for pbrt. &lt;a href=&quot;#fnref:coverage&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:gengold&quot;&gt;
      &lt;p&gt;Note that the golden images must be generated from scratch for
        each operating system and compiler used, as differences in
        details like precision in the system math library usually lead
        to minor image differences across different systems. &lt;a href=&quot;#fnref:gengold&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Still in the thick of the task of detecting the presence of bugs in a renderer in the first place, this time the focus is on the value of a large suite of test scenes. Soon soon we will turn to what to do about all of these bugs when we find them.</summary></entry><entry><title type="html">Debugging Your Renderer (3/n): Assertions (and on not sweeping things under the rug)</title><link href="https://pharr.org/matt/blog/2021/12/02/debugging-renderers-assertions.html" rel="alternate" type="text/html" title="Debugging Your Renderer (3/n): Assertions (and on not sweeping things under the rug)" /><published>2021-12-02T00:00:00-08:00</published><updated>2021-12-02T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2021/12/02/debugging-renderers-assertions</id><content type="html" xml:base="https://pharr.org/matt/blog/2021/12/02/debugging-renderers-assertions.html">&lt;p&gt;Today we’ll keep the discussion to the topic of runtime assertions in
renderers; next time it’ll be on to end-to-end tests, which will start
to lead us into a more image-focused view of graphics debugging that will
keep us busy for a while.&lt;/p&gt;

&lt;p&gt;A principle in the last post on &lt;a href=&quot;/matt/blog/2021/11/26/debugging-renderers-unit-tests.html&quot;&gt;unit testing for
renderers&lt;/a&gt; was
the idea that you’d like your debugging problem to be as simple as
possible; one way to achieve that is if bugs manifest themselves in a way
other than “some of these pixels don’t look right…”  While there will
always be plenty of that sort of bug, those are usually a much harder
debugging problem than a conventional one like “the program printed an
error and crashed.”  A good set of runtime assertions can be an effective
way to turn obscure bugs into more obvious ones.&lt;/p&gt;

&lt;p&gt;An assertion is a simple thing: a statement that a condition is always true
at some point in the execution of a program.  It seems that the original
idea of them dates to Goldstine and von Neumann in 1947.&lt;sup id=&quot;fnref:firstassert&quot;&gt;&lt;a href=&quot;#fn:firstassert&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; If
such a statement is ever found to be false, then a fundamental assumption
underlying the system’s implementation has been violated.  The
implications—to the performance of the program or to the correctness of
its output—may be wide-ranging and possibly impossible to recover from.
Assertions are a great way to catch little things early before they turn into
big things that are only evident much later.&lt;/p&gt;

&lt;p&gt;In contrast to unit tests, which just have to be fast enough to not be
annoying to run often, assertions must be efficient, since they often run
in the innermost loops of the renderer.  In return, they have the advantage
that they can check many more situations than a unit test. It turns
out that a myriad of unexpected edge cases come up as you trace billions of
rays in many different scenes.  Yet an assertion that has no chance of
firing is only a drag on overall performance without offering any value.
The art is to write the ones that you don’t think will ever fire but yet
sometimes do so.&lt;/p&gt;

&lt;p&gt;For a well-written general discussion of assertions, see &lt;a href=&quot;https://blog.regehr.org/archives/1091&quot;&gt;John Regehr’s
blog post on the topic&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;the-basics&quot;&gt;The Basics&lt;/h2&gt;

&lt;p&gt;While C++ provides an &lt;a href=&quot;https://en.cppreference.com/w/cpp/error/assert&quot;&gt;assert
macro&lt;/a&gt; in the standard
library, it has a few shortcomings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Assertions are either enabled or disabled, via the &lt;code class=&quot;highlighter-rouge&quot;&gt;NDEBUG&lt;/code&gt; macro. Often,
they are disabled completely for optimized builds, which in turn means that
they run rarely and do not catch many bugs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;When an assertion fails, only the text of the assertion (e.g., “x &amp;gt; 0”)
and its location in the source code are printed, without any further
context.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;pbrt-v4 therefore has its &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/c4cfd6679e436d512bed5b03fed33a1971d8ee6d/src/pbrt/util/check.h#L36&quot;&gt;own set of assertion
macros&lt;/a&gt;,
which are also integrated with pbrt’s runtime logging system.  pbrt’s
assertion macros are based on &lt;a href=&quot;https://github.com/google/glog#check-macros&quot;&gt;those in Google’s glog
package&lt;/a&gt;.  They include
assertions that are always enabled, even in release builds, and those that
are only for debug builds, where more costly checks may be acceptable.
They also provide much more helpful information than &lt;code class=&quot;highlighter-rouge&quot;&gt;assert()&lt;/code&gt; does when
an assertion fails.&lt;/p&gt;

&lt;p&gt;Beyond a basic Boolean assertion (&lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK()&lt;/code&gt;), there are separate assertions
for checking equality, inequality, and greater-than/less-than.  For
example, &lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK_GE()&lt;/code&gt; checks that the first value provided to it is greater
than or equal to the second.  Here is an example of its use in pbrt:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CHECK_GE(1 - pAbsorb - pScatter, -1e-6);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;There’s a bit of context packed into that simple check: we have two
probabilities, &lt;code class=&quot;highlighter-rouge&quot;&gt;pAbsorb&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;pScatter&lt;/code&gt;, and if you look at the code
&lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/c4cfd6679e436d512bed5b03fed33a1971d8ee6d/src/pbrt/cpu/integrators.cpp#L999&quot;&gt;before
it&lt;/a&gt;
you can see that the light transport algorithm has just computed three probabilities
where the third, &lt;code class=&quot;highlighter-rouge&quot;&gt;pNull&lt;/code&gt;, is &lt;code class=&quot;highlighter-rouge&quot;&gt;1 - pAbsorb - pScatter&lt;/code&gt;.  Thus, the assertion is
effectively making sure that we are using valid probabilities when
computing &lt;code class=&quot;highlighter-rouge&quot;&gt;pNull&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;More broadly, that check is in the context of pbrt’s code for sampling
volumetric scattering.  That code requires that the volumetric
representation provide a majorant that bounds the density of the volume
over a region of space.  The &lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK_GE()&lt;/code&gt; then is effectively checking that
the majorant is a valid bound.  Thus, it’s really a check on the
validity of the code that computes those bounds, which is &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/c4cfd6679e436d512bed5b03fed33a1971d8ee6d/src/pbrt/media.cpp#L552&quot;&gt;far away in the
system&lt;/a&gt;
from where the check is made.&lt;/p&gt;

&lt;p&gt;While that decoupling has the disadvantage that a failing assertion may
require searching to find the code actually responsible for the bug, the
advantage is that the check is made at every sample taken in every
volumetric medium that is provided to pbrt for rendering; it gives the
majorant computations a thorough workout.  That check has found many bugs
in that code since it was introduced; there are plenty of corner cases in
the majorant computations, especially when you’re doing trilinear
interpolation, which requires considering a larger footprint, and also
using the nested grid representation of
&lt;a href=&quot;https://dl.acm.org/doi/fullHtml/10.1145/3450623.3464653&quot;&gt;NanoVDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If that assertion fails, pbrt dumps more information than just the text of
the assertion:&lt;sup id=&quot;fnref:digits&quot;&gt;&lt;a href=&quot;#fn:digits&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[ tid 12129819 @     1.252s cpu/integrators.cpp:1004 ]
    FATAL Check failed: 1 - pAbsorb - pScatter &amp;gt;= -1e-6
        with 1 - pAbsorb - pScatter = -0.3336507, -1e-6 = -0.000001
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In addition to the id of the thread in which the assertion failed, we have
the elapsed time since rendering began (about 1.25 seconds here), the
location of the assertion in the source code, what was asserted, as well as
both of the values that were passed to &lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK_GE()&lt;/code&gt;.  Having those values
immediately at hand is often helpful.  In the best case, one can understand
the bug immediately, for example by seeing that an edge case that had been
assumed to be impossible actually happens in practice.  For this one,
knowing whether the value was slightly outside of the limit or far outside
of the limit (as it was here) may be a good starting point for further
investigation.&lt;/p&gt;
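
&lt;p&gt;To make the idea concrete, here is a minimal sketch of how a check macro can capture both operands and report them on failure.  It is not pbrt’s implementation: &lt;code class=&quot;highlighter-rouge&quot;&gt;SKETCH_CHECK_GE&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;FormatCheckFailure&lt;/code&gt; are names made up for this example, and the message layout only loosely resembles the log output above.&lt;/p&gt;

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper (not pbrt's code): builds a failure message that
// includes the source location, the asserted expression, and both values.
template <typename A, typename B>
std::string FormatCheckFailure(const char *file, int line, const char *aExpr,
                               const char *op, const char *bExpr, const A &a,
                               const B &b) {
    std::ostringstream os;
    // 9 significant digits are enough to tell any two 32-bit floats apart.
    os.precision(9);
    os << file << ":" << line << " Check failed: " << aExpr << " " << op << " "
       << bExpr << " with " << aExpr << " = " << a << ", " << bExpr << " = "
       << b;
    return os.str();
}

// Hypothetical macro: always enabled, independent of NDEBUG.  Each operand is
// evaluated exactly once.  Writing the test as !(a >= b) means that the check
// also fails if either operand is NaN.
#define SKETCH_CHECK_GE(a, b)                                                 \
    do {                                                                      \
        auto va_ = (a);                                                       \
        auto vb_ = (b);                                                       \
        if (!(va_ >= vb_)) {                                                  \
            std::fprintf(stderr, "%s\n",                                      \
                         FormatCheckFailure(__FILE__, __LINE__, #a, ">=", #b, \
                                            va_, vb_)                         \
                             .c_str());                                       \
            std::abort();                                                     \
        }                                                                     \
    } while (false)
```

&lt;p&gt;With something like this in place, &lt;code class=&quot;highlighter-rouge&quot;&gt;SKETCH_CHECK_GE(1 - pAbsorb - pScatter, -1e-6)&lt;/code&gt; would print the offending value before aborting.  A fuller version would presumably also log the thread id, a timestamp, and a stack trace, as pbrt’s output does.&lt;/p&gt;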

&lt;p&gt;A full stack trace then follows; that, too, can give a useful first pointer
for understanding the issue.  It is especially useful for getting
something out of bug reports from users when it’s not possible to reproduce a
bug locally, and also when pbrt is used for assignments in classes.  In
the latter case, the conversation often goes something like this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;“pbrt is buggy! It crashes when I call the function to normalize a vector.”&lt;/li&gt;
  &lt;li&gt;“That’s interesting–what does it print when it crashes?”&lt;/li&gt;
  &lt;li&gt;(pbrt’s output)&lt;/li&gt;
  &lt;li&gt;“That’s not a crash; it’s a failing assertion. The problem is that the
&lt;code class=&quot;highlighter-rouge&quot;&gt;foo()&lt;/code&gt; function that you added there is passing a degenerate vector to
the vector normalization routine.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given that students often don’t seem to read that output in the first
place, I’m not sure if any lessons are being learned about the value of
assertions through that exercise, but you can at least work through that
cycle much more quickly if it doesn’t require the student to fire up the
debugger to provide more information.&lt;/p&gt;

&lt;h2 id=&quot;resilience-versus-rigidity&quot;&gt;Resilience Versus Rigidity&lt;/h2&gt;

&lt;p&gt;When an assertion fails, a program generally terminates.  That’s a harsh
punishment, especially if the program is well into a lengthy computation.
One can treat failed assertions as exceptions and terminate just part of
the computation (and maybe just a small part, like a single ray path), or
one can also try to recover from the failing case and go on.  How to
approach all this is something of a philosophical question.&lt;/p&gt;

&lt;p&gt;A widely-accepted principle about assertions is that they should not be
used for error handling: invalid input from the user should never lead to
an assertion failure but rather should be caught sooner (and a helpful
error message printed, even if the program then terminates).  An assertion
failure should only represent an actual bug in the system: a mistake on the
programmer’s side, not on the user’s, even if something goofy provided by
the user is what tripped up the program.  That to me seems like an
unquestionably good principle.&lt;/p&gt;

&lt;p&gt;But even with assertions limited to errors in the implementation, what else
might one do when one fails?  One might try to recover, patching over the
underlying issue (for example, forcing the third probability to zero in the
majorant case), but that approach isn’t fully satisfying.  One issue is that the
code paths for the error cases will only run rarely, so they won’t be well
tested—it’s then hard to have confidence in their correctness.&lt;/p&gt;

&lt;p&gt;For a commercial product (or one that is not open source), not annoying
your users with an unexpected program termination is probably a good idea,
though I have to say that in my experience the error handling you get is
often not much better.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/matt/blog/images/illustrator.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;More optimistically, assertion failures represent useful data points.
Papering over them is ignoring evidence of a deeper issue.  Perhaps your
code for recovering from the failed assertion is running all the time and
there’s a massive bug lurking but you have no idea it exists in the first
place.&lt;/p&gt;

&lt;p&gt;So I have come to believe that the best approach is to be strict, at least
for a system like pbrt.  Include error handling code to deal with invalid
user input, add cases as necessary to make your algorithms general-purpose
and robust, but when things go wrong in a way that you hadn’t thought was
possible, don’t try to muddle through it—fail if a null vector is to be
normalized and abort if the majorants are seriously off.  Those sorts of
unexpected cases merit investigation and resolution.  By making them
impossible to ignore you reduce the chance of letting something serious
fester for a long time.  It’s an annoyance in the moment, but it makes the
system much more robust in the end.&lt;/p&gt;

&lt;h2 id=&quot;track-down-rare-failures&quot;&gt;Track Down Rare Failures(!)&lt;/h2&gt;

&lt;p&gt;About not letting things fester…  One of the reasons I’ve come to the
rigidity view is an experience I had with the &lt;a href=&quot;https://github.com/mmp/pbrt-v1&quot;&gt;first version of
pbrt&lt;/a&gt;.  That version was more on the
resilience side of things, or perhaps it was just negligence.  Over the
course of rendering the image below it would always print a handful of
warnings about rays having &lt;a href=&quot;https://en.wikipedia.org/wiki/NaN&quot;&gt;not-a-number
(NaN)&lt;/a&gt; values in their direction
vectors.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://www.pbrt.org/gallery/a22.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I expected that something obscure was occasionally going wrong in the
middle of BSDF sampling but I didn’t dig in for years after first seeing
those warnings.  Part of my laziness came from the (correct) assumption
that it would be painful debugging since the warnings didn’t appear until
rendering had gone on for some time.  The underlying bug didn’t seem
important to fix since it happened so rarely.&lt;/p&gt;

&lt;p&gt;Eventually I chased it down. As with many difficult bugs, &lt;a href=&quot;https://github.com/mmp/pbrt-v1/commit/024ef868cedb4c6adf9bc5bdbca1e4c759b950c3&quot;&gt;the
fix&lt;/a&gt;
was a single-character change: a greater than that should have been a
greater than or equals; the “equals” case otherwise led to a division by
zero.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;        // Handle total internal reflection for transmission
-       if (sint2 &amp;gt; 1.) return 0.;
+       if (sint2 &amp;gt;= 1.) return 0.;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;When I rendered that scene afterward, not only were the warnings gone, but
the entire rendering computation was \(1.25\times\) faster than it was
before.  I couldn’t understand why that would be so and spent hours trying
to figure out what was going on.  At first I assumed the speedup must be
due to something else, like a different setting for compiler optimizations,
but I found that it truly was entirely due to that one-character fix.&lt;/p&gt;

&lt;p&gt;Eventually I got to the bottom of it.  Here is where things were going
catastrophically wrong—with a few lines of code elided, this is the heart
of the &lt;a href=&quot;https://github.com/mmp/pbrt-v1/blob/9d361637cafcc9e6d82c2f3440e5f7e7279254df/accelerators/kdtree.cpp#L337&quot;&gt;kd-tree traversal code in
pbrt-v1&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;int axis = node-&amp;gt;SplitAxis();
float tplane = (node-&amp;gt;SplitPos() - ray.o[axis]) * invDir[axis];
// ...
if (tplane &amp;gt; tmax || tplane &amp;lt;= 0) {
    // visit first child node next
} else if (tplane &amp;lt; tmin) {
    // visit second child node next
} else {
    // enqueue second child to visit later and visit first child next
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Consider that code through the lens of not-a-number. There are two rules
to keep in mind: a calculation that includes a NaN will yield a NaN, and
any comparison that includes a NaN evaluates to false (the one exception is
&lt;code class=&quot;highlighter-rouge&quot;&gt;!=&lt;/code&gt;, which evaluates to true).  (Thus, the fun
idiom of testing &lt;code class=&quot;highlighter-rouge&quot;&gt;x == x&lt;/code&gt; as a way to check for a NaN.)  Above, &lt;code class=&quot;highlighter-rouge&quot;&gt;tplane&lt;/code&gt; will
be NaN since the inverse ray direction is NaN.  The condition in the first
“if” test will be false, since both comparisons include a NaN.  The
condition in the second “if” test will also be false.  In turn, the third
case is always taken and &lt;em&gt;every node of the kd-tree will be visited&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Thus, a NaN-direction ray is intersected with each and every primitive in
the scene.  For a complex scene, that’s a lot of intersection tests and
thus, the performance impact of just a handful of those rays was
substantial.  Good times.&lt;/p&gt;
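
&lt;p&gt;The fall-through is easy to reproduce in isolation.  The sketch below copies only the shape of the three-way branch from the excerpt above; the enum, the function names, and the constant are hypothetical, and it assumes IEEE 754 semantics (so no &lt;code class=&quot;highlighter-rouge&quot;&gt;-ffast-math&lt;/code&gt; or similar flags).&lt;/p&gt;

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Which child nodes the traversal would go on to process.
enum class Next { FirstChild, SecondChild, Both };

// A quiet NaN, standing in for a split distance computed from a NaN ray.
constexpr float kNaN = std::numeric_limits<float>::quiet_NaN();

// Same structure as the branch in the traversal excerpt: both conditions are
// false when tplane is NaN, so control reaches the final case.
Next ClassifySplit(float tplane, float tmin, float tmax) {
    if (tplane > tmax || tplane <= 0)
        return Next::FirstChild;
    else if (tplane < tmin)
        return Next::SecondChild;
    else
        return Next::Both;
}

// A stricter variant that refuses to carry on with a NaN.  The standard
// assert() is used only to keep the sketch short; it is compiled out when
// NDEBUG is defined, which is the shortcoming discussed earlier.
Next CheckedClassifySplit(float tplane, float tmin, float tmax) {
    assert(!std::isnan(tplane));
    return ClassifySplit(tplane, tmin, tmax);
}
```

&lt;p&gt;Calling &lt;code class=&quot;highlighter-rouge&quot;&gt;ClassifySplit(kNaN, 0.f, 10.f)&lt;/code&gt; returns &lt;code class=&quot;highlighter-rouge&quot;&gt;Next::Both&lt;/code&gt;, which is the case that makes the traversal visit every node.&lt;/p&gt;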

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Here we are with two posts in a row that are comprised of me arguing for a
particular way of doing things and then ending with a story about me not
practicing what I’m preaching.  One could take this to mean that I don’t
know what I’m talking about, or one could take it to mean that my pain has
the potential to be your gain.  Either way works for me.&lt;/p&gt;

&lt;p&gt;More generally, I’ve come to learn that if something seems a little stinky
or uncertain in code, it really is worth stopping to take the time to chase
down whether there is in fact something wrong.  You have in hand evidence
of a problem in a particular place in a system—that’s valuable.  If you
ignore it and there is a bug there, often that bug will later manifest
itself in a way that’s much more obscure, maybe not evidently connected to
that part of the system at all.  You end up spending hours chasing it down
just to discover that if you had investigated the questionable behavior
when you first encountered it, you’d have fixed the underlying issue much
earlier and much more easily.&lt;/p&gt;

&lt;h2 id=&quot;notes&quot;&gt;notes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:firstassert&quot;&gt;
      &lt;p&gt;Goldstine and von Neumann. 1948. &lt;a href=&quot;https://www.ias.edu/sites/default/files/library/pdfs/ecp/planningcodingof0103inst.pdf&quot;&gt;Planning and Coding of Problems
for an Electronic Computing
Instrument&lt;/a&gt;. Technical
Report, Institute for Advanced Study. &lt;a href=&quot;#fnref:firstassert&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:digits&quot;&gt;
      &lt;p&gt;To my previous frequent frustration, the &lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; macros in
       Google’s glog package do not print floating-point values with
       their full precision, which leads to error messages like &lt;code class=&quot;highlighter-rouge&quot;&gt;Check
       failed: x != 0 with x = 0&lt;/code&gt; being printed when &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt; is very small
       but not actually zero.  This is another reason pbrt provides its
       own &lt;code class=&quot;highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; macros. &lt;a href=&quot;#fnref:digits&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Some notes on productively detecting bugs when they occur during the course of rendering and a cautionary tale about what can happen when you ignore runtime errors.</summary></entry><entry><title type="html">Debugging Your Renderer (2/n): Unit Tests</title><link href="https://pharr.org/matt/blog/2021/11/26/debugging-renderers-unit-tests.html" rel="alternate" type="text/html" title="Debugging Your Renderer (2/n): Unit Tests" /><published>2021-11-26T00:00:00-08:00</published><updated>2021-11-26T00:00:00-08:00</updated><id>https://pharr.org/matt/blog/2021/11/26/debugging-renderers-unit-tests</id><content type="html" xml:base="https://pharr.org/matt/blog/2021/11/26/debugging-renderers-unit-tests.html">&lt;p&gt;Here we are, a year and a half after I posted an &lt;a href=&quot;/matt/blog/2020/04/26/debugging-intro.html&quot;&gt;introduction that was
full of talk&lt;/a&gt; about a
forthcoming series of blog posts about debugging renderers.  When I posted
that I already had a text file full of notes and had the idea that I’d get
through a series of 8 or so posts over the following few weeks.&lt;/p&gt;

&lt;p&gt;…and it’s been nothing but crickets after that setup.&lt;/p&gt;

&lt;p&gt;There’s no good reason for my poor follow-through, though this series did
turn into one of those things that got more daunting to return to the
longer time went by; I felt like the bar kept getting higher and that my
eventual postings would have to make up for the bait and switch.&lt;/p&gt;

&lt;p&gt;Now that I’m at it again, I can’t promise that these posts will make up for
the wait; in general, you get what you pay for around here.  But let’s
reset and try getting back into it.&lt;/p&gt;

&lt;p&gt;To get back in the right mood, here are a pair of images from the
first time I tried to implement Greg Ward’s irradiance caching algorithm,
back when I was in grad school:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img src=&quot;/matt/blog/images/irradCacheInfinity.png&quot; width=&quot;288&quot; height=&quot;192&quot; /&gt;
&lt;img src=&quot;/matt/blog/images/irradCache.png&quot; width=&quot;288&quot; height=&quot;192&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;In the left image (which was rendered from right to left for some reason),
there was a bug that caused energy to grow without bound as the cache was
populated (no doubt a missing factor of \(1/\pi\) that led to a feedback
loop).  I always liked how that image went from ok to a little too bright
to thermonuclear by the time it was halfway through.  The image on the
right is my eventual success, with a slightly different scene layout.&lt;/p&gt;

&lt;h2 id=&quot;avoiding-the-bad-place&quot;&gt;Avoiding The Bad Place&lt;/h2&gt;

&lt;p&gt;There’s nothing fun about an image that starts out ok and then goes bad or
your renderer crashing after it’s been running for an hour with a stack
trace 20 levels deep.  There’s lots to be unhappy about:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Things are broken, but they’re not utterly broken, which suggests that
the underlying bug will be subtle and thus difficult to track down.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There’s an enormous amount of state to reason about—the scene in all
its complexity, all of the derived data structures, and everything that
happened since the start of rendering until things evidently went wrong.
Any bit of it may hold the problem that led to disaster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;More specifically, the actual bug may be in code that ran long before the
bug became evident; some incorrect value computed earlier that messed
things up later, possibly in an indirect way.  This is a particular
challenge with algorithms that reuse earlier results, be it spatially,
temporally or otherwise.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It may be minutes or even hours into rendering before the bug manifests
itself; each time you think you’ve fixed it, you’ve got to again wait
that much longer to confirm that you’re right.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anything you can do to avoid that sad situation reduces the amount of time
you spend on gnarly debugging problems, which in turn makes you more
productive (and lets you have more fun, actually implementing fun new
things rather than trying to make the old things work correctly).  That
goal leads to the first principle of renderer debugging:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Try to make it a conventional debugging problem (“given these inputs,
this function produces this incorrect output”) and not an unbounded “this
image is wrong and I don’t know why” problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the best ways to have more bugs be in the first category is to
have a good suite of unit tests. There’s nothing glamorous about writing
unit tests, at least in the moment, but they can give you a lot in return
for not too much work.  Not only does a failing unit test immediately narrow
down the source of a bug to the few things that the test exercises, but it
generally gives you an easier debugging problem than a failure in the
context of the full renderer.&lt;/p&gt;

&lt;h2 id=&quot;starting-simple&quot;&gt;Starting Simple&lt;/h2&gt;

&lt;p&gt;A good unit test is crisp—easy to understand and just testing one thing.
Writing tests becomes more fun if you embrace that way of going about
it—it’s easy coding since the whole goal is to not be tricky, with the
idea that you want to minimize the chance that your test itself has bugs.
A good testing framework helps by making it easy to add tests; I’ve been
using &lt;a href=&quot;https://github.com/google/googletest&quot;&gt;googletest&lt;/a&gt; for years, but
there are plenty of others.&lt;/p&gt;

&lt;p&gt;It’s good to start out by testing the most obvious things you can think of.
That may be counter-intuitive—it’s tempting to start with devious tests
that poke all the edge cases.  However, if you think about it from the
perspective of encountering a failing test, then the simpler the test is,
the easier it is to reason about the correct behavior, and the easier
debugging will be.  (There is an analogy here to the old joke about the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Streetlight_effect&quot;&gt;drunk searching for his car keys under the street
light&lt;/a&gt;.)  Only once the
basics are covered in your tests is it worth getting more clever.  If your
simpler tests pass and only the more complex ones fail, then at least you
can assume that simple stuff is functioning correctly; that may help you
reason about why the harder cases have gone wrong.&lt;/p&gt;

&lt;p&gt;Here is an example of a simple one from pbrt-v4. pbrt provides an
&lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/util/parallel.h#L126&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;AtomicFloat&lt;/code&gt;&lt;/a&gt;
class that can atomically add values to a floating-point
variable.&lt;sup id=&quot;fnref:atomicfloat&quot;&gt;&lt;a href=&quot;#fn:atomicfloat&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; This test ensures that &lt;code class=&quot;highlighter-rouge&quot;&gt;AtomicFloat&lt;/code&gt; isn’t utterly
broken.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TEST(FloatingPoint, AtomicFloat) {
    AtomicFloat af(0);
    Float f = 0.;
    EXPECT_EQ(f, af);

    af.Add(1.0251);
    f += 1.0251;
    EXPECT_EQ(f, af);

    af.Add(2.);
    f += 2.;
    EXPECT_EQ(f, af);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The test is as simple as it could be: it performs a few additions and makes
sure that the result is the same as if a regular &lt;code class=&quot;highlighter-rouge&quot;&gt;float&lt;/code&gt; had been used.
It’s hard to imagine that this test would ever fail, but if it
did, jackpot! We have an easy case to reason about and trace through.&lt;/p&gt;
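
&lt;p&gt;For readers wondering what there is to get wrong in such a class, here is a minimal sketch of one common way to build an atomic float add: store the float’s bits in an atomic integer and retry with compare-and-exchange.  &lt;code class=&quot;highlighter-rouge&quot;&gt;SketchAtomicFloat&lt;/code&gt; is a hypothetical name, and this is an illustration under those assumptions rather than pbrt’s actual code.&lt;/p&gt;

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical class: keeps the float's bit pattern in an atomic integer and
// implements Add() with a compare-exchange retry loop.
class SketchAtomicFloat {
  public:
    explicit SketchAtomicFloat(float v = 0) : bits(ToBits(v)) {}

    // Reads the current value.
    operator float() const { return FromBits(bits.load()); }

    void Add(float v) {
        std::uint32_t oldBits = bits.load();
        std::uint32_t newBits;
        do {
            newBits = ToBits(FromBits(oldBits) + v);
            // If another thread changed the value in the meantime,
            // compare_exchange_weak fails and stores the current bits in
            // oldBits, so the sum is recomputed on the next iteration.
        } while (!bits.compare_exchange_weak(oldBits, newBits));
    }

  private:
    // memcpy is the portable way to reinterpret the bits of a float.
    static std::uint32_t ToBits(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof(u));
        return u;
    }
    static float FromBits(std::uint32_t u) {
        float f;
        std::memcpy(&f, &u, sizeof(f));
        return f;
    }

    std::atomic<std::uint32_t> bits;
};
```

&lt;p&gt;Even in something this small there are a few places for a slip: converting the value rather than reinterpreting its bits, or not retrying after a failed exchange.  A test as plain as the one above would catch the first of those immediately.&lt;/p&gt;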

&lt;p&gt;Here’s another example of a not-very-clever test from pbrt-v4. Most
of the sampling functions there now provide an inversion function that goes
from sampled values back to the original \([0,1]^n\) sample space.  Thus,
it’s worth checking that a round-trip brings you back to (more or less)
where you started.  The following test takes a bunch of random samples &lt;code class=&quot;highlighter-rouge&quot;&gt;u&lt;/code&gt;,
warps them to directions &lt;code class=&quot;highlighter-rouge&quot;&gt;dir&lt;/code&gt; on the hemisphere, then warps the directions
back to points &lt;code class=&quot;highlighter-rouge&quot;&gt;up&lt;/code&gt; in the canonical \([0,1]^2\) square, before checking
the result is pretty much back where it started.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TEST(Sampling, InvertUniformHemisphere) {
    for (Point2f u : Uniform2D(1000)) {
        Vector3f dir = SampleUniformHemisphere(u);
        Point2f up = InvertUniformHemisphereSample(dir);

        EXPECT_LT(std::abs(u.x - up.x), 1e-3);
        EXPECT_LT(std::abs(u.y - up.y), 1e-3);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;There’s not much to that test, but it’s a nice one to have in the bag.
Once it passes, you can feel pretty good about your
&lt;code class=&quot;highlighter-rouge&quot;&gt;InvertUniformHemisphereSample&lt;/code&gt; function, at least if you have independent
confidence that &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleUniformHemisphere&lt;/code&gt; works.  And how long does it take
to write?  No more than a minute or two.  Once it is passing, you can 
more confidently make improvements to the implementations of either of
those functions knowing that this test has a good chance of failing if you
mess something up.&lt;/p&gt;
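
&lt;p&gt;For anyone who wants to try the round-trip idea outside of pbrt, here is a self-contained sketch.  The types and function names are hypothetical, and the mapping used (height from the first coordinate, azimuth from the second) is one common parameterization that I am assuming for illustration; pbrt’s functions may differ in detail.&lt;/p&gt;

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Pt2 { double x, y; };

constexpr double kPi = 3.14159265358979323846;

// Maps u in [0,1]^2 to a direction on the unit hemisphere around +z:
// u.x gives the height z and u.y gives the azimuthal angle.
Vec3 SketchSampleUniformHemisphere(Pt2 u) {
    assert(u.x >= 0 && u.x <= 1 && u.y >= 0 && u.y <= 1);
    double z = u.x;
    double r = std::sqrt(std::max(0., 1 - z * z));
    double phi = 2 * kPi * u.y;
    return {r * std::cos(phi), r * std::sin(phi), z};
}

// Inverse of the mapping above: recovers u from a direction.
Pt2 SketchInvertUniformHemisphere(Vec3 w) {
    double phi = std::atan2(w.y, w.x);
    if (phi < 0) phi += 2 * kPi;  // atan2 returns angles in [-pi, pi]
    return {w.z, phi / (2 * kPi)};
}

// Returns the largest round-trip error over an n x n grid of sample points
// placed at the cell centers of the unit square.
double MaxRoundTripError(int n) {
    double maxErr = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            Pt2 u{(i + 0.5) / n, (j + 0.5) / n};
            Pt2 up =
                SketchInvertUniformHemisphere(SketchSampleUniformHemisphere(u));
            maxErr = std::max(
                {maxErr, std::abs(u.x - up.x), std::abs(u.y - up.y)});
        }
    }
    return maxErr;
}
```

&lt;p&gt;A regular grid is used here instead of random samples so that the sketch needs no random number generator.  Leaving out the wrap of negative angles in the inverse is the kind of mistake this check flags right away, since half of the round trips would then come back off by one in the second coordinate.&lt;/p&gt;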

&lt;p&gt;About succinctness in tests: that &lt;code class=&quot;highlighter-rouge&quot;&gt;Uniform2D&lt;/code&gt; in that test is a &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/util/sampling.h#L1075&quot;&gt;little
thing&lt;/a&gt;
I wrote purely to make unit tests more concise.  It’s crafted to be used
with C++ range-based &lt;code class=&quot;highlighter-rouge&quot;&gt;for&lt;/code&gt; loops and here generates 1000 uniformly
distributed 2D sample values to be looped over.  It and a handful of other
sample point generators save a few lines of code in each test that
otherwise needs a number of random values of some dimensionality and
pattern.  I’ve found that just about anything that reduces friction when
writing tests ends up being worthwhile in that each of those things
generally leads to more tests being written in the end.&lt;/p&gt;
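
&lt;p&gt;To give a feel for how little is needed for a helper like that, here is a sketch of a 1D generator that works with range-based &lt;code class=&quot;highlighter-rouge&quot;&gt;for&lt;/code&gt; loops.  The class name &lt;code class=&quot;highlighter-rouge&quot;&gt;SketchUniform1D&lt;/code&gt; is hypothetical and &lt;code class=&quot;highlighter-rouge&quot;&gt;std::mt19937&lt;/code&gt; is just a stand-in random number generator; pbrt’s own generators are implemented differently.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: yields n uniform random values when iterated over
// with a range-based for loop.
class SketchUniform1D {
  public:
    explicit SketchUniform1D(int n, std::uint32_t seed = 0)
        : n(n), seed(seed) {
        assert(n >= 0);
    }

    class Iterator {
      public:
        Iterator(int i, std::uint32_t seed) : i(i), rng(seed) {}
        // Only the position is compared, which is all that a range-based
        // for loop needs in order to know when to stop.
        bool operator!=(const Iterator &other) const { return i != other.i; }
        Iterator &operator++() {
            ++i;
            return *this;
        }
        // Each dereference draws a fresh value; a range-based for loop
        // dereferences exactly once per iteration.
        float operator*() { return dist(rng); }

      private:
        int i;
        std::mt19937 rng;
        std::uniform_real_distribution<float> dist{0.f, 1.f};
    };

    Iterator begin() const { return Iterator(0, seed); }
    Iterator end() const { return Iterator(n, seed); }

  private:
    int n;
    std::uint32_t seed;
};
```

&lt;p&gt;A test body then shrinks to &lt;code class=&quot;highlighter-rouge&quot;&gt;for (float u : SketchUniform1D(100)) { ... }&lt;/code&gt;.  Constructing a generator for the end iterator is wasteful, but for test code that seems like a fair trade for brevity.&lt;/p&gt;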

&lt;h2 id=&quot;the-challenge-of-sampling&quot;&gt;The Challenge of Sampling&lt;/h2&gt;

&lt;p&gt;One of the challenges in implementing a Monte Carlo renderer is that the
computation is statistical in nature; sometimes it’s hard to tell if a
given sample value is incorrect or if it’s a valid outlier.  Bugs often
only become evident in the aggregate with many samples.  That challenge
extends to writing unit tests—for example, given a routine to draw
samples from some distribution, how can we be sure the samples are in fact
from the expected distribution?&lt;/p&gt;

&lt;p&gt;The Right Thing to do is to apply proper statistical tests.  For example,
&lt;a href=&quot;http://rgl.epfl.ch/people/wjakob/&quot;&gt;Wenzel&lt;/a&gt; has written code that applies a
\(\chi^2\)-test to pbrt’s &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/bsdfs_test.cpp#L280&quot;&gt;BSDF sampling
routines&lt;/a&gt;.
Those tests recently helped him chase down and fix &lt;a href=&quot;https://github.com/mmp/pbrt-v4/commit/dfa1107459745b4d276c9bbdae73941cb269e077&quot;&gt;a tricky bug in pbrt’s
rough dielectric sampling
code&lt;/a&gt;. Much
respect for doing it the right way.&lt;/p&gt;

&lt;p&gt;My discipline is not always as strong as Wenzel’s, though there are some
more straightforward alternatives that are also effective.
For example, pbrt has many little sampling functions that
draw samples from some distribution.  An easy way to test them is to
evaluate the underlying function to create a tabularized distribution and
to confirm that both it and the sampling method to be tested more or less
generate the same samples with the same probabilities.  As an example, here is
an excerpt from the &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/util/sampling_test.cpp#L815&quot;&gt;test for sampling a trimmed
exponential&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    // Tabulate exp(-c x) over [0, xMax] and build a piecewise-constant
    // distribution from the tabulated values.
    auto exp = [&amp;amp;](Float x) { return std::exp(-c * x); };
    auto values = Sample1DFunction(exp, 32768, 16, 0, xMax);
    PiecewiseConstant1D distrib(values, 0, xMax);

    for (Float u : Uniform1D(100)) {
        // Sample and evaluate the PDF with the exact routines under test.
        Float sampledX = SampleTrimmedExponential(u, c, xMax);
        Float sampledProb = TrimmedExponentialPDF(sampledX, c, xMax);

        // The same u through the tabulated distribution should give
        // nearly the same sample and PDF value.
        Float discreteProb;
        Float discreteX = distrib.Sample(u, &amp;amp;discreteProb);
        EXPECT_LT(std::abs(sampledX - discreteX), 1e-2);
        EXPECT_LT(std::abs(sampledProb - discreteProb), 1e-2);
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;Sample1DFunction&lt;/code&gt; utility routine takes a function and evaluates it in
a specified number of buckets covering a specified range, returning a
vector of values. &lt;code class=&quot;highlighter-rouge&quot;&gt;PiecewiseConstant1D&lt;/code&gt; then computes the corresponding
piecewise-constant 1D distribution.  We then take samples using the exact
sampling routine and the piecewise-constant routine and ensure that each
sample value is approximately the same and each returned sample probability
is close as well.  (This test implicitly depends on both sampling
approaches warping uniform samples in the same order, so that values of
&lt;code class=&quot;highlighter-rouge&quot;&gt;u&lt;/code&gt; close to zero map to the lower end of the exponential’s domain and values
close to one map to the upper end, which is the case here.)&lt;/p&gt;
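
&lt;p&gt;The same idea can be sketched without pbrt at hand.  Everything in the
following is a stand-in written for illustration: &lt;code class=&quot;highlighter-rouge&quot;&gt;Tabulated&lt;/code&gt;,
&lt;code class=&quot;highlighter-rouge&quot;&gt;SampleTrimmedExp&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;TrimmedExpPDF&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;CompareSamplers&lt;/code&gt; are hypothetical
names rather than pbrt’s API, and the bucket count and the values of
&lt;code class=&quot;highlighter-rouge&quot;&gt;c&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;xMax&lt;/code&gt; are arbitrary choices.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;cmath&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;vector&amp;gt;

// Normalized density proportional to exp(-c x) on [0, xMax].
double TrimmedExpPDF(double x, double c, double xMax) {
    return c / (1 - std::exp(-c * xMax)) * std::exp(-c * x);
}

// Analytic inverse-CDF sampling of the same density.
double SampleTrimmedExp(double u, double c, double xMax) {
    return -std::log(1 - u * (1 - std::exp(-c * xMax))) / c;
}

// Piecewise-constant approximation of a function over [0, xMax].
struct Tabulated {
    std::vector&amp;lt;double&amp;gt; pdf, cdf;  // pdf per bucket; cdf at bucket edges
    double xMax;

    template &amp;lt;typename F&amp;gt;
    Tabulated(F f, int n, double xMax) : pdf(n), cdf(n + 1, 0.0), xMax(xMax) {
        double dx = xMax / n;
        for (int i = 0; i &amp;lt; n; ++i) {
            pdf[i] = f((i + 0.5) * dx);         // evaluate at bucket center
            cdf[i + 1] = cdf[i] + pdf[i] * dx;  // running integral
        }
        double total = cdf[n];
        for (int i = 0; i &amp;lt; n; ++i) pdf[i] /= total;
        for (int i = 0; i &amp;lt;= n; ++i) cdf[i] /= total;
    }

    // Invert the tabulated CDF: returns x and stores the PDF value there.
    double Sample(double u, double *p) const {
        int n = (int)pdf.size();
        int i = 0;
        while (i + 1 &amp;lt; n &amp;amp;&amp;amp; cdf[i + 1] &amp;lt;= u) ++i;  // linear search
        double t = (u - cdf[i]) / (cdf[i + 1] - cdf[i]);
        *p = pdf[i];
        return (i + t) * xMax / n;
    }
};

// Feeds the same u values to both samplers; returns the mismatch count.
int CompareSamplers(double c, double xMax) {
    Tabulated tab([c](double x) { return std::exp(-c * x); }, 4096, xMax);
    int failures = 0;
    for (int k = 0; k &amp;lt; 100; ++k) {
        double u = (k + 0.5) / 100;
        double x = SampleTrimmedExp(u, c, xMax);
        double p = TrimmedExpPDF(x, c, xMax);
        double tabP;
        double tabX = tab.Sample(u, &amp;amp;tabP);
        if (std::abs(x - tabX) &amp;gt; 1e-2 || std::abs(p - tabP) &amp;gt; 1e-2)
            ++failures;
    }
    return failures;
}

int main() {
    int failures = CompareSamplers(2.0, 3.0);
    std::printf(&quot;mismatches: %d\n&quot;, failures);
    return failures == 0 ? 0 : 1;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The tabulated sampler here knows nothing about exponentials, which is the
point of the approach: a mistake in the analytic inversion is unlikely to be
mirrored by a generic table lookup.&lt;/p&gt;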

&lt;p&gt;To be clear: &lt;code class=&quot;highlighter-rouge&quot;&gt;SampleTrimmedExponential&lt;/code&gt; could still be buggy even when that
test passes.  One might fret about those fairly large &lt;code class=&quot;highlighter-rouge&quot;&gt;1e-2&lt;/code&gt; epsilons used
for the equality tests, for example.  It is possible that the looseness of
those epsilons might mask something subtly wrong, but we can at least trust
that the function isn’t completely broken, off by a significant constant
factor or the like.&lt;/p&gt;

&lt;p&gt;Writing this sort of test requires trusting your functions for sampling
tabularized distributions, but those too have their own tests;
eventually one can be confident in all of the foundations.  For example,
&lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/util/sampling_test.cpp#L216&quot;&gt;this
one&lt;/a&gt;
compares those results to a case where the expected result can be worked
out by hand and ensures that they match.&lt;/p&gt;

&lt;h2 id=&quot;preserving-the-evidence&quot;&gt;Preserving the Evidence&lt;/h2&gt;

&lt;p&gt;Another good use for unit tests is for isolating bugs, both for debugging
them when they first occur and for ensuring that a subsequent change to the
system doesn’t inadvertently reintroduce them.&lt;/p&gt;

&lt;p&gt;Disney’s &lt;em&gt;Moana Island&lt;/em&gt; scene helped surface all sorts of bugs in pbrt;
many were fairly painful to debug since they took the form of “render
for a few hours before the crash happens.” For those, I found it useful to
turn them into small unit tests as soon as I could narrow down what was
going wrong.&lt;/p&gt;

&lt;p&gt;Here’s one for a ray-triangle intersection that went bad.  We have a
degenerate triangle (note that the x and z coordinates are all equal), and
so the intersection test should never return true. But for the specific ray
here, it once did, and then things went south from there.  Trying potential
fixes with a small test like this was a nice way to work through the issue
in the first place—it was easy to try a fix, recompile, and quickly see
if it worked.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TEST(Triangle, BadCases) {
    Transform identity;
    std::vector&amp;lt;int&amp;gt; indices{ 0, 1, 2 };
    std::vector&amp;lt;Point3f&amp;gt; p { Point3f(-1113.45459, -79.0496140, -56.2431908),
                             Point3f(-1113.45459, -87.0922699, -56.2431908),
                             Point3f(-1113.45459, -79.2090149, -56.2431908) };
    TriangleMesh mesh(identity, false, indices, p, {}, {}, {}, {});
    auto tris = Triangle::CreateTriangles(&amp;amp;mesh, Allocator());

    Ray ray(Point3f(-1081.47925, 99.9999542, 87.7701111),
            Vector3f(-32.1072998, -183.355865, -144.607635), 0.9999);

    EXPECT_FALSE(tris[0].Intersect(ray).has_value());
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;One thing to note when extracting failure cases like this is that it’s
critical to get &lt;a href=&quot;https://randomascii.wordpress.com/2013/02/07/float-precision-revisited-nine-digit-float-portability/&quot;&gt;every last
digit&lt;/a&gt;
of floating-point values: if the floats you test with aren’t precisely the
same as the ones that led to the bug, you may not hit the bug at all in a
test run.&lt;/p&gt;
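
&lt;p&gt;One way to capture every digit, sketched here under the assumption of IEEE
single-precision floats, is to print with
&lt;code class=&quot;highlighter-rouge&quot;&gt;std::numeric_limits&amp;lt;float&amp;gt;::max_digits10&lt;/code&gt; significant digits (nine for
&lt;code class=&quot;highlighter-rouge&quot;&gt;float&lt;/code&gt;), which is enough for the printed text to parse back to the
identical value.  The helper name &lt;code class=&quot;highlighter-rouge&quot;&gt;RoundTrips&lt;/code&gt; is made up for this
illustration.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cstdlib&amp;gt;
#include &amp;lt;limits&amp;gt;

// Hypothetical helper: print a float with enough digits to recover it
// exactly, parse the text back, and report whether the value survived.
bool RoundTrips(float value) {
    char text[64];
    std::snprintf(text, sizeof(text), &quot;%.*g&quot;,
                  std::numeric_limits&amp;lt;float&amp;gt;::max_digits10, value);
    std::printf(&quot;%s\n&quot;, text);
    return std::strtof(text, nullptr) == value;
}

int main() {
    // One of the coordinates from the degenerate triangle above.
    float y = -79.0496140f;
    return RoundTrips(y) ? 0 : 1;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;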

&lt;h2 id=&quot;never-defer-looking-into-a-failing-test&quot;&gt;Never Defer Looking into a Failing Test&lt;/h2&gt;

&lt;p&gt;A cautionary tale to wrap up: a few months ago a &lt;a href=&quot;https://github.com/mmp/pbrt-v4/issues/177&quot;&gt;bug
report&lt;/a&gt; about a failing unit
test in pbrt-v4 came in.  It had the following summary:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;ul&gt;
    &lt;li&gt;gcc-8.4 has stuck forever on ZSobolSampler.ValidIndices test&lt;/li&gt;
    &lt;li&gt;gcc-9.3 passed all tests&lt;/li&gt;
    &lt;li&gt;gcc-10.3 gives me the following message (in an eternal cycle) during tests&lt;/li&gt;
  &lt;/ul&gt;

  &lt;p&gt;&lt;tt&gt;/src/pbrt/samplers_test.cpp:182: Failure&lt;/tt&gt;&lt;br /&gt;
&lt;tt&gt;Value of: returnedIndices.find(index) == returnedIndices.end()&lt;/tt&gt;&lt;br /&gt;
&lt;tt&gt;  Actual: false&lt;/tt&gt;&lt;br /&gt;
&lt;tt&gt;  Expected: true&lt;/tt&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;ZSobolSampler&lt;/code&gt; implements &lt;a href=&quot;http://abdallagafar.com/publications/zsampler/&quot;&gt;Ahmed and Wonka’s blue noise
sampler&lt;/a&gt;, which is based on
permuting a set of low-discrepancy samples in a way that improves their
blue noise characteristics.  pbrt’s &lt;a href=&quot;https://github.com/mmp/pbrt-v4/blob/792aaaa08d97dbedf11a3bb23e246b6443d847b4/src/pbrt/samplers_test.cpp#L167&quot;&gt;ZSobolSampler.ValidIndices
test&lt;/a&gt;
essentially just checks that the permutation is correct by verifying that
the same sample isn’t returned for two different pixels.  That test had been
helpful when I first implemented that sampler, but it had been no trouble
for months when that bug report arrived.&lt;/p&gt;

&lt;p&gt;When the bug report came in, I took a quick look at that test and couldn’t
imagine how it would ever run forever.  No one else had reported anything
similar and so, to my shame, I assumed it must be a problem with the
compiler installation on the user’s system or some other one-off error.  I
didn’t look at it again for almost two months.&lt;/p&gt;

&lt;p&gt;When I gave it more attention, I immediately found that I could reproduce
the bug using those compilers, just as reported.  It was a gnarly bug—one
that disappeared when I recompiled with debugging symbols and even
disappeared with an optimized build with debugging symbols.  The bug would
randomly disappear if I added print statements to log the program’s
execution.  Eventually I thought to try
&lt;a href=&quot;https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html&quot;&gt;UBSan&lt;/a&gt;, and
it saved the day, identifying this line of code as the problem:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;int p = (MixBits(higherDigits ^ (0x55555555 * dimension)) &amp;gt;&amp;gt; 24) % 24;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;0x55555555&lt;/code&gt; is a signed integer and multiplying by &lt;code class=&quot;highlighter-rouge&quot;&gt;dimension&lt;/code&gt;, which was
an integer that starts at 0 and goes up from there, quickly led to
overflow, which is undefined behavior (UB) in C++.  In turn, &lt;em&gt;gcc&lt;/em&gt; was
presumably assuming that there was no UB in the program and optimizing
accordingly, leading in one case to an infinite loop and in another to a
bogus sample permutation.&lt;/p&gt;

&lt;p&gt;At least the fix was easy: all is fine with an unsigned integer, where
overflow is well-defined and simply wraps around:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;int p = (MixBits(higherDigits ^ (0x55555555u * dimension)) &amp;gt;&amp;gt; 24) % 24;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
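
&lt;p&gt;The wrapping behavior can be checked in isolation.  The following is a
small sketch, assuming a 32-bit &lt;code class=&quot;highlighter-rouge&quot;&gt;int&lt;/code&gt;; the function name
&lt;code class=&quot;highlighter-rouge&quot;&gt;ScaledDimension&lt;/code&gt; is invented for this example.  It compares the unsigned
product against the same product computed in 64 bits and reduced modulo
\(2^{32}\).&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;cstdio&amp;gt;

// Hypothetical helper mirroring the fixed expression: the multiply is done
// in unsigned arithmetic, so it wraps modulo 2^32 instead of overflowing.
uint32_t ScaledDimension(int dimension) {
    return 0x55555555u * static_cast&amp;lt;uint32_t&amp;gt;(dimension);
}

int main() {
    int mismatches = 0;
    for (int dimension = 0; dimension &amp;lt; 8; ++dimension) {
        // Reference: exact product in 64 bits, then keep the low 32 bits.
        uint64_t exact = uint64_t(0x55555555u) * uint64_t(dimension);
        uint32_t expected = static_cast&amp;lt;uint32_t&amp;gt;(exact &amp;amp; 0xFFFFFFFFu);
        if (ScaledDimension(dimension) != expected) ++mismatches;
        // With 32-bit signed operands, the product for dimension &amp;gt;= 2
        // would exceed INT_MAX (0x7FFFFFFF), which is undefined behavior.
        std::printf(&quot;%d: 0x%08x\n&quot;, dimension,
                    (unsigned)ScaledDimension(dimension));
    }
    return mismatches;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Compiling with &lt;code class=&quot;highlighter-rouge&quot;&gt;-fsanitize=undefined&lt;/code&gt;, which both gcc and clang support,
reports signed overflow of this kind at run time.&lt;/p&gt;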

&lt;p&gt;Leaving aside the joys of undefined behavior in C++, it was hard enough to
chase that bug down with it already narrowed down to a failing test.  If
the bug had been something like “images are slightly too dark with
gcc-10.3” (as could conceivably happen with repeated sample values,
depending on how they were being repeated), it surely would have been an
even longer and more painful journey. Score +1 for unit tests and -1 for
me.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We’re not done with testing! With the unit testing lecture over, next time
it will be on to some thoughts about writing effective assertions and how
end-to-end tests fit in for testing renderers.&lt;/p&gt;

&lt;h2 id=&quot;note&quot;&gt;Note&lt;/h2&gt;
&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:atomicfloat&quot;&gt;
      &lt;p&gt;That capability isn’t provided by the C++ standard library
            since floating-point addition is not associative, so
            different execution orders may give different results.
            For pbrt’s purposes, that’s not a concern, so &lt;code class=&quot;highlighter-rouge&quot;&gt;AtomicFloat&lt;/code&gt;
            provides that functionality through atomic compare/exchange
            operations. &lt;a href=&quot;#fnref:atomicfloat&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Returning, now with intention, to write up some thoughts about how to effectively debug a renderer.</summary></entry></feed>