NeRF: Neural Radiance Fields for View Synthesis

Overview#

Wisp takes a pragmatic approach to its implementation of NeRF, and uses the name to refer to a family of works around neural radiance fields. Rather than following the specifics of Mildenhall et al. 2020, Wisp’s implementation is closer to works like Variable Bitrate Neural Fields (Takikawa et al. 2022) and Neural Sparse Voxel Fields (Liu et al. 2020), which rely on grid feature structures.

Differences to original NeRF paper#

The original paper of Mildenhall et al. 2020 did not assume any grid structure; grid-based methods have since gained popularity in the literature. Where possible, Wisp prioritizes interactivity. Accordingly, our implementation assumes a (configurable) grid structure, which is critical for interactive frame rates.

Another difference from the original NeRF is the coarse -> fine sampling scheme, which is not implemented here. Instead, Wisp uses sparse acceleration structures which avoid sampling within empty cells (this applies to Octrees, Codebooks, and Hash grids).
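To illustrate what "avoid sampling within empty cells" means, here is a minimal sketch that is not taken from Wisp. It reduces the problem to one dimension: the ray segment is split into equal cells, and samples are generated only inside cells marked as occupied. The function name `sample_occupied`, its parameters, and the fixed number of samples per cell are all assumptions made for this example.

```python
def sample_occupied(t_near, t_far, num_cells, occupied, samples_per_cell=2):
    """Return sample distances along a ray, skipping empty cells.

    The segment [t_near, t_far] is split into `num_cells` equal cells, a 1D
    stand-in for the cells a ray crosses in a sparse 3D structure.
    `occupied` is the set of cell indices that contain geometry.
    """
    cell_len = (t_far - t_near) / num_cells
    samples = []
    for i in range(num_cells):
        if i not in occupied:
            continue  # empty cell: no samples, no network queries
        start = t_near + i * cell_len
        for k in range(samples_per_cell):
            # Place samples at the midpoints of equal sub-intervals.
            samples.append(start + (k + 0.5) * cell_len / samples_per_cell)
    return samples


samples = sample_occupied(0.0, 8.0, num_cells=8, occupied={2, 3, 6})
print(samples)
```

Only three of the eight cells receive samples here, so the field is queried 6 times rather than 16.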

The neural field implemented in this app assumes a 3D coordinate input + view direction, and outputs density + RGB color.

Octree / Codebook#

The Octree & Codebook variants follow the implementation details of NGLOD-NeRF from Takikawa et al. 2022, which uses an octree both for accelerating raymarching and as a feature structure queried with trilinear interpolation.

Specifically, our implementation follows the implementation section of that paper, which discusses a modified lookup function that avoids artifacts: “any location where sparse voxels are allocated for the coarsest level in the multi-resolution hierarchy can be sampled”.

Simply put, the octree grid takes base_lod and num_lods arguments, where the occupancy structure is defined as octree levels 1 .. base_lod, and the features are defined for levels base_lod + 1 .. base_lod + num_lods - 1. The coarsest level used for raymarching here is base_lod.
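As a worked example of these level ranges, the sketch below enumerates them for a given pair of arguments. The helper `octree_levels` is hypothetical and is not part of the Wisp API; it reads the ranges stated above literally.

```python
def octree_levels(base_lod, num_lods):
    """Enumerate octree levels as described in the text (hypothetical helper).

    Occupancy levels: 1 .. base_lod
    Feature levels:   base_lod + 1 .. base_lod + num_lods - 1
    """
    occupancy_levels = list(range(1, base_lod + 1))
    feature_levels = list(range(base_lod + 1, base_lod + num_lods))
    return occupancy_levels, feature_levels


occupancy, features = octree_levels(base_lod=5, num_lods=4)
print("occupancy levels:", occupancy)
print("feature levels:  ", features)
print("coarsest raymarching level:", occupancy[-1])
```

With base_lod=5 and num_lods=4, the occupancy structure spans levels 1 to 5, the features span levels 6 to 8, and raymarching starts from level 5.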

See also this detailed report on Variable Bitrate Neural Fields and its usage with kaolin-wisp. The report was published in the Weights & Biases blog Fully-Connected.

Triplanar Grid#

The triplanar grid uses a simple AABB acceleration structure for raymarching, and a pyramid of triplanes in multiple resolutions.

This is an extension of the triplane described in Chan et al. 2021, with support for multi-level features.
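The following sketch shows one way a multi-level triplane lookup can work. It is an illustration under stated assumptions, not Wisp's implementation: each level stores three scalar feature planes under the hypothetical keys "xy", "xz" and "yz", the point is projected onto each plane, and the three bilinearly interpolated values are summed. Real grids store feature vectors, and the way features are combined within and across levels may differ.

```python
def bilerp(plane, u, v):
    """Bilinearly interpolate a square grid (res >= 2) at (u, v) in [0, 1]."""
    res = len(plane)
    x, y = u * (res - 1), v * (res - 1)
    x0 = min(int(x), res - 2)  # clamp so x0 + 1 stays inside the grid
    y0 = min(int(y), res - 2)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x0 + 1] * fx
    bottom = plane[y0 + 1][x0] * (1 - fx) + plane[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def triplane_features(pyramid, point):
    """Return one summed feature per pyramid level for a point in [0, 1]^3."""
    x, y, z = point
    feats = []
    for level in pyramid:
        feats.append(bilerp(level["xy"], x, y)
                     + bilerp(level["xz"], x, z)
                     + bilerp(level["yz"], y, z))
    return feats


def ramp(res):
    """Toy plane whose value rises linearly from 0 to 1 along its first axis."""
    return [[col / (res - 1) for col in range(res)] for _ in range(res)]


# A three-level pyramid with plane resolutions 2, 4 and 8.
pyramid = [{"xy": ramp(r), "xz": ramp(r), "yz": ramp(r)} for r in (2, 4, 8)]
feats = triplane_features(pyramid, (0.25, 0.5, 0.75))
print(feats)
```

Because the toy planes are linear ramps, every level returns the same value for a given point. With learned planes, the finer levels would contribute higher-frequency detail.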

Hash Grid#

The hash grid feature structure follows the multi-resolution hash grid implementation of Müller et al. 2022, backed by a fast CUDA kernel.
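For intuition, the spatial hash described in Müller et al. 2022 maps an integer grid vertex to a slot in a fixed-size feature table by XOR-ing the coordinates, each multiplied by a large prime, and taking the result modulo the table size. The plain-Python sketch below illustrates that formula only; it is not the CUDA kernel used by Wisp, and the function name `spatial_hash` is an assumption made for this example.

```python
# Per-dimension multipliers given in Müller et al. 2022.
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Hash an integer grid vertex (ix, iy, iz) into [0, table_size)."""
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        # Mask to emulate 32-bit unsigned wraparound arithmetic.
        h ^= (coord * prime) & 0xFFFFFFFF
    return h % table_size


table_size = 2 ** 19
for vertex in [(0, 0, 0), (1, 0, 0), (12, 7, 3)]:
    print(vertex, "->", spatial_hash(*vertex, table_size))
```

Distinct vertices can collide in the same slot. In the hash grid approach, such collisions are left for the optimization to resolve rather than handled explicitly.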

The default ray marching acceleration structure uses an octree, which implements the pruning scheme from the Instant-NGP paper to stay in sync with the feature grid.

Diagrams#

The NeRF app is made of the following building blocks:

An interactive exploration of the optimization process is available with the OptimizationApp.

How to Run#

RGB Data#

Synthetic objects are hosted on the original NeRF author’s Google Drive.

Training your own captured scenes is supported by preprocessing with Instant NGP’s colmap2nerf script.

NeRF (Octree)

cd kaolin-wisp
python3 app/nerf/main_nerf.py --config app/nerf/configs/nerf_octree.yaml --dataset-path /path/to/lego

NeRF (Triplanar)

cd kaolin-wisp
python3 app/nerf/main_nerf.py --config app/nerf/configs/nerf_triplanar.yaml --dataset-path /path/to/lego

NeRF (Hash)

cd kaolin-wisp
python3 app/nerf/main_nerf.py --config app/nerf/configs/nerf_hash.yaml --dataset-path /path/to/lego

Forward-facing scenes, like the fox scene from the Instant-NGP repository, are also supported.

Our code supports any “standard” NGP-format dataset that has been converted with the scripts from the instant-ngp library. We pass in the --multiview-dataset-format argument to specify the dataset type, which in this case is different from the RTMV dataset type used for the other examples.

The --mip argument controls how much the images are downscaled when they are loaded. This is useful for datasets with very high-resolution images, where it keeps system memory from being overloaded, but is usually not necessary for reasonably sized images like those in the fox dataset.
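As a rough guide to the effect of downscaling, the sketch below assumes that each mip level halves the image width and height, so the factor is 2 to the power of mip. That interpretation is an assumption and is not confirmed on this page; the helper `mip_resolution` and the example image size are hypothetical.

```python
def mip_resolution(width, height, mip):
    """Image size after `mip` halvings (assumed semantics of --mip)."""
    factor = 2 ** mip
    return max(1, width // factor), max(1, height // factor)


width, height = 4000, 3000  # hypothetical high-resolution capture
for mip in range(4):
    w, h = mip_resolution(width, height, mip)
    # Pixel count, and hence image memory, shrinks by 4x per level.
    print(f"mip={mip}: {w}x{h} ({w * h / 1e6:.2f} MP)")
```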

RGBD Data#

For datasets which contain depth data, Wisp speeds up optimization by pre-pruning the sparse acceleration structure, which allows faster convergence.
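The idea behind pre-pruning can be sketched as follows. This is an assumption-laden illustration, not Wisp's code: surface points are assumed to have already been recovered from the depth maps and normalized to the unit cube, and only the voxels containing at least one point are kept. The function `occupied_cells` and the sample points are hypothetical.

```python
def occupied_cells(points, resolution):
    """Map points in [0, 1]^3 to the set of voxel indices containing them."""
    cells = set()
    for point in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last voxel.
        idx = tuple(min(int(c * resolution), resolution - 1) for c in point)
        cells.add(idx)
    return cells


# Hypothetical surface points obtained from depth data.
points = [(0.10, 0.10, 0.10), (0.12, 0.10, 0.10), (0.90, 0.50, 0.20)]
resolution = 4
cells = occupied_cells(points, resolution)
print(f"kept {len(cells)} of {resolution ** 3} voxels:", sorted(cells))
```

All voxels outside this set can be dropped before training begins, so the raymarcher never places samples in regions the depth data marks as empty.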

RTMV data is available at the dataset project page.

The additional arguments below ensure that the raymarcher used is one which takes the pre-pruned sparse structure into account.