Eccentric Developments

Bounding Volume Hierarchies

As I showed in the previous post about bringing back the triangles into the path tracer, the performance is extremely poor. The problem is not the triangles themselves, but the huge amount of them that are needed to create good looking objects.

To make some sense of why this happens, let's remember that for each pixel in the image, the render algorithm will test the primary ray against all objects in the scene, and when an intersection is found, another ray is randomly generated and checked for intersections against the scene again, this continues until the recursiveness limit is reached or a light or nothing at all is found.

To illustrate how many intersection tests are done for a single scene, the following is the formula for the worst case scenario:

f(m,n,o) = m * n * o

Where:

m is the number of pixels
n is the max depth of recursion
o is the number of objects

If we plug in the numbers for the torus scene on the current depth configuration (5), it give us f(307200, 5, 580) = 890_880_000, which is the total number of intersection tests per image. That is a lot of work, for a simple scene.

As with any speed improvement for any algorithm or work, what we need to do is decrease the amount of work done, in the case of the path tracer, we need to decrease the number of intersection tests done per ray. To achieve this kind of outcome, rendering engines use specialized data arragements called acceleration structures.

Acceleration structures reduce the time it takes to find a ray-object intersection by discarding as many objects as possible. Think about it like this: why would you test ray-object interections with the objects on the left of the screen (i.e, X < 320 for a 640px wide image) if you know your ray origin and direction are on the right (i.e, X > 320)? makes sense, right?

There are several different acceleration structures: kd-trees, octrees, binary space partitioning and bounding volume hierarchies.

In this post, I'll be writing about bounding volume hierarchies (BVH), which is the acceleration structure I am more familiar with. And in the next article, I'll explain a bit about octrees which I use in a very particular way.

Bounding Boxes

To undersand bounding volume hierarchies, the first thing to explore is the concept of bounding boxes.

A good analogy for bounding boxes are the book shelves that you find in libraries, they not only help keep the books in a tidy arragement, they also makes it much easier to find any book you are looking for.

Bounding boxes delimit the 3D space of a scene, containerizing the primitives in it, allowing faster intersection tests between rays and the scene.

There are several types of bounding boxes, the most common and also easier to understand are the Axis-Aligned Bounding Boxes ar AABB.

AABB are very simple in concept, they are a data structure composed of two points in 3D space that represent a box that encloses one or more primitives; with the dawback that they cannot be rotated and, most of the times, they will not be a tigh fit for the contained objects.

Nevertheless, they are useful as the memory required to store them is very low and the axis-aligment property makes it possible to do very fast ray intersections.

AABBs are used to form hiearchical structures that divide the space occupied by the objects in a 3D scene. This hierarchical structure is known as a Bounding Volume Hierarchy (BVH).

Bounding Volume Hierarchies

A BVH is analogous to a binary search tree, where each node is a bounding box (in our case an AABB) that in turn contains more space subdivisions in form of AABBs.

BVHs bottom out at AABBs that contain 3D primitives and no more space subdivisions, in other words, a leaf. It is at this point where the ray interestion tests happens against actual scene geometry.

To construct this acceleration structure, the algorithm to follow is composed of the following steps:

Calculate the AABB of the geometry in scope.
If the current geometry count is less than a given threshold, usually 4, mark this AABB as a leaf and stop the recursion.
Divide the primitives evenly along two space subdivisions.
Recursively generate the left child of this AABB, using the first half of the geometry and continuing from step 1.
Recursively generate the right child of this AABB, using the second half of the geometry and continuing from step 1.

The algorithm will stop when there are no more subdivisions to calculate.

Step 3 of the algorithm is the most critical and can be implemented different ways. In this entry, I'll show a naive partition that simply divides the scene primitives in halves without regard to any other property of the scene.

BVH Usage

Using the BVH is simple, having a ray that needs to be checked for intersections and a pointer to the root of the BVH, execute these steps:

Test the ray and current AABB for intersection.
If no intersection is found stop the recursion.
If the AABB is a leaf, add the contained primitives to a list to check later.
If the AABB is a node, move to the left child and go to step 1.
If the AABB is a node, move to the right child and go to step 1.

At the end of the BVH traversal, there will be a list of 3D primitives that we check the ray against for intersections.

Using a BVH this way is similar to a binary search tree, in the sense that, on a well balanced BVH, the expected running time is O(log(n)). Which make the rendering algorithm much faster.

Now Show Me The Code!

The following JavaScript code implements all the previous path tracing methods plus one called initializeBVH which does the following:

Implement the function to create an AABB out of a single triangle.
Initialize a function to merge two AABBs into a larger one.
Recursively build the BVH using a naive implementation.
Export a traceBvh function that will receive a ray and test it agains the BVH, returning the closest triangle found, if any.

In the worker function, you will notice I made some changes so it no longer has all the rendering functions inside it. Instead, the rendering functions are stringified and passed to the worker as part of the intitialization. During this part, the functions are parsed and the rendering pipeline is created.

All the next code is the reguar path tracing implementation, with some minor changes for readability, but other than that, it is all the same.

Finally, this is the main entry point for the path tracer, hit run to see it in action!

Results & Summary

Here are some numbers comparing the previous tracing implementation (brute force), and the new one using the BVH traversal:

Algorithm	Milliseconds per Frame
Brute Force	90300
BVH	2500

It is quite obvious there is a massive speedup of around 36x, even though it is not using multithreading nor SIMD vectorization.

The current naïve implementation works well because the the vast majority of triangles are added to the scene in an organized way and all of them are in the middle of the scene. If this was not the case, the speedup would not be as good, or maybe even worse.

A better algorithm for building the BVH and using a data structure not based on pointers (the tree), might help even more, I will explore those topics in later articles.

Final Note

You have seen me posting rendering speed numbers but be warned that those are only valid for the context of the current article.

I often use different configurations (hardware [not my work laptop, mind you], os, browser) from article to article when running the algorithms, as such, absolute running times cannot be compared as is, it is better to use speedup numbers.

Be sure though, that performance numbers in an article correspond to the same configuration, it would be disingenuous to do otherwise.

Enrique CR - 2024-07-31