View on GitHub

unity-nerf

Volumetric rendering of NeRF/PlenOctrees for Unity

SparseVoxelOctree Structure

This is the sparse octree data structure which we use to cache the spherical harmonic coefficients generated by NeRF-SH for efficient storage and lookup during rendering. This data structure acts as a bridge between our Python scripts, the Unity .NET runtime, and the GPU; as such, implementations are available in Python, C#, and HLSL with varying degrees of functionality. This page mostly concerns the C# implementation, though most of the information presented is true across implementations.

Conventions

Octrees are parameterized with a fixed resolution $(W, H, D)$ upon initialization. Voxels are then indexed using integer triplets $(x, y, z)$ where $0 \leq x \lt W$, $0 \leq y \lt H$, and $0 \leq z \lt D$. Voxels (and their corresponding nodes within the graph) are initialized when set, assuming the implementation supports mutable octrees.

To-do: Perhaps it would be beneficial to adopt the same strategy as N3Tree by requiring the user to specify a maximum depth for the octree instead of allowing for arbitrary resolutions. This way resolution is always restricted to powers of 2, which could significantly simplify the implementation of mipmaping and LOD down the line.

We use a right-handed XYZ coordinate system (with a vertical Z axis) when indexing voxels within the structure. This is unlike Unity's native coordinate system which is left-handed (with a vertical Y axis). Therefore care must be taken to consistently convert between coordinate systems when working within the engine, particularly in shader code.

Octants are sorted in colexicographic order with $(-, -, -)$ being considered the first octant. (This is equivalent to the memory layout of a C multidimensional array of the form A[2][2][2] when indexed using A[z > 0][y > 0][x > 0]). This scheme results in the following octant indices:

$x$ $y$ $z$ Index
$-$ $-$ $-$ 0
$+$ $-$ $-$ 1
$-$ $+$ $-$ 2
$+$ $+$ $-$ 3
$-$ $-$ $+$ 4
$+$ $-$ $+$ 5
$-$ $+$ $+$ 6
$+$ $+$ $+$ 7

Sampling

Octree traversal always starts from the root of the structure, moving down the leaves. So far this method has been fast enough to render all the scenes tested in real time, though there might exist more efficient traversal algorithms given our usage case. In particular, we could take advantage of the fact that samples are highly correlated since they're often taken along a ray.

To-do: Investigate the efficiency of different octree traversal algorithms.

Voxels are sampled using the nearest-neighbor method, though we plan on implementing bilinear and trilinear* interpolation in the near future. As of now, however, it is unclear if the quality benefits of these methods are worth their additional overhead.

* We call these methods bilinear/trilinear since this is the terminology traditionally used within the context of texture filtering in 2D. When working with voxels, however, these methods could be more aptly described as trilinear/quadrilinear since more samples are required (8/16 samples, respectively).

To-do: Investigate the quality/performance tradeoffs of bilinear and trilinear sampling.

Memory layout

In C# the data structure is implemented as a generic class, meaning voxels can be used to store arbitrary data types. Nodes are represented internally via a pair of lists: _nodeChildren and _nodeData. The first list, containing $8n$ integers, is used to encode the structure of the octree. For each node $i \in \mathbb{N}, 0 \leq i \lt n$ entries $8i$ through $8i + 7$ within this list contain the indices for each octant of that particular node, if they exist. The second list, with $n$ entries of the generic type T, stores the data for each node (including non-leaf nodes for LOD purpopses):

class SparseVoxelOctree<T>
{
	public readonly int Width;
	public readonly int Height;
	public readonly int Depth;
	
	private List<int> _nodeChildren;
	private List<T> _nodeData;

	// ...
}

To-do: At the moment we store the children for every single node in the octree. This is unnecessary, however, since we know that the nodes at the last level of the octree can't have children anyway. Not storing these nodes could save a significant amount of space since, statistically, the overwhelming majority of nodes are leaves. Doing this would require sorting nodes based on depth, which leads us into...

To-do: Sorting nodes based on depth. Doing this should make the nodes of each layer contiguous in memory. Implementing this would: a) allow for the optimization mentioned above, and b) simplify memory management for level-of-detail since reducing LOD level would require simply trimming off the last few entries in the nodes list.

As of now we store PlenOctree data using the same layout as the original N3Tree data structure in our prototype. This means that when using the SH16 format, for example, each voxel contains an array of 49 floating-point values corresponding to the 16 SH coefficients for each color channel (plus the density). The particular order for these coefficients within the array are determined by their implementation. This is unfortunate since it requires users to be familiar the layout of these coefficients when using our data structure.

To-do: Since our implementation is generic we have the capability of storing SH coefficients using a custom data structure, which would make the sampling process considerably less error-prone than it currently is. However, our serialization format makes storing and loading custom data types a bit of a hassle. Luckily, tweaking the serialization format to support this wouldn't be too difficult (more information in the relevant section).

Serialization

One of our main objectives with this data structure is to have good cross-platform compatibility. Ideally, creating new implementations (that is, "porting" it) for different languages/systems should be as straightforward as possible. To this end, our serialization format should be well supported in many different environments. After comparing a few free and open-source (FOSS) serialization protocols we landed on Protocol Buffers as the most appropriate format for our data structure. As a binary serialization format, it relies on a specification (in the form of a .proto file) written in a custom interface description language to parse the contents of the data structure. Below is the exact .proto file we use for our data structure:

syntax = "proto3";
package svo.protobuf;
option csharp_namespace = "SVO.Protobuf";

message SparseVoxelOctree {
	string type_url = 1;
	int32 width = 2;
	int32 height = 3;
	int32 depth = 4;
	repeated int32 node_children = 5 [packed=true];
	bytes node_data = 6;
}

In an effort to support generic types, our serialization format stores voxel data in binary form as a bytes array. The type_url field stores an URL/resource pointing to the .proto file describing the data format used within this array. A few basic types are recognized as "well-known types" and are assigned specific URLs within the Protocol Buffers specification (for example, boolean values can be encoded using the BoolValue message type with "type.googleapis.com/google.protobuf.BoolValue" as URL). Actually using this URL is still a work in progress in the current implementation (in practice we assume the data consists floating-point numbers).

To-do: Alter the serialization format to store the topology and data of the octree separately. That way, the use of type URLs wouldn't be necessary as node data could be stored as a standalone array, simplifying the serialization/deserialization process.