1. Introduction
This article is an extension of our publication at the 37th Computer Graphics & Visual Computing Gathering 2019 [
1].
Global illumination (GI) rendering based on Monte Carlo (MC) methods allows for the generation of images of astonishing realism that can often hardly be distinguished from real photographs. Even though these methods have been around for a long time, their computational complexity remains a major challenge. Ray-based approaches like
path tracing may require a considerable number of rays to be traced through a scene to determine an approximate solution for the Rendering Equation [
2]. Because of the stochastic nature of this process, it can take anywhere from mere seconds to hours until a noise-free result emerges. While the realism gained by accounting for GI is often considerable, rendering previews in current 3D modeling software often does not result in images with the same fidelity due to limited processing time and the use of different rendering methods. In recent years, numerous methods have been introduced that help to increase the visual quality by filtering the noise from images rendered with low sample counts. These methods often result in visually pleasing, noise-free images. However, rendering the GI of a scene usually involves computing the light transport at a large number of points that are not directly visible in image space. When moving through the scene, reusing this information for the computation of successive frames can increase visual quality and shorten rendering times at the same time.
We introduce the
HashCache, a hierarchical world-space caching method for GI rendering of static scenes, based on a linkless octree [
3]. Using a hash-based approach makes it possible to perform the reconstruction of cached illumination in constant time, depending only on the actual screen resolution (assuming that the visible geometry is known). This makes it well-suited for the exploration of static scenes. Despite only caching diffuse illumination, our system explicitly supports non-diffuse materials through a hybrid reconstruction scheme. This is an approximate final gathering step similar to Photon Mapping [
4,
5] and is performed before the actual reconstruction. For non-diffuse materials, this step is composed in a hybrid way: Rays are traced up to the first hitpoint that is interpreted as a diffuse material, where the pre-gathered information is then queried from the cache and modulated with the path throughput. This process is described in more detail in
Section 3.2,
Section 3.3 and
Section 3.4. Compared to precomputed radiance transfer, our preprocessing time is much shorter, as we only need to determine geometric cell occupations.
In order to reduce quantization artifacts, we employ a spatial jittering method inspired by Binder et al. [
6]. To increase image quality by reducing noise, we suggest a layered filtering framework, essentially projecting path-space filtering [
7] to image space. In order to demonstrate the practicability of our approach, we extend a basic cross-bilateral denoising filter by integrating it into our framework and adjusting it to the kind of noise present in our system, enabling it to filter the image content per light bounce. With this method, we aim especially at improving the visual quality of non-diffuse materials, compared to filtering only at the primary hitpoint without any information about the transport paths. Arbitrary image-space filtering methods may be integrated into the suggested framework in order to improve their handling of specular or glossy material types. Finally, we present image quality comparisons, performance benchmarks, and an analysis of memory requirements, showing the practicability of our approach. While maintaining interactive frame rates, the noise in the image can be reduced significantly. We show that our approach performs comparably to much higher sampling rates in path tracing regarding relative mean square error (relMSE) and multi-scale structural similarity (MS-SSIM) metrics.
2. Related Work
Caching samples is a proven tool for several computer graphics applications. An overview of some of the relevant work from this area has been published by Scherzer et al. [
8]. Currently, most methods try to exploit the temporal coherence in image space. However, caching in world space has the advantage of prolonging the validity of samples in the cases of view-dependent (dis-)occlusions and surfaces that are not directly visible. This is especially beneficial for methods that handle indirect GI. In addition to caching methods, filtering techniques have been introduced to allow for real-time rendering with low sampling rates while maintaining acceptable image quality. In the following section, we give an overview of the research relevant to our system, covering the fields of sample caching, interactive GI, and filtering, as we combine methods from these fields.
Early work by Ward et al. [
9] uses an octree to cache irradiance values in world space. This approach is easy to implement when rays are cast sequentially. However, updating a data structure is challenging when data are accessed in a parallel fashion. Bala et al. [
10] already presented a ray tracer suited for near-interactive scene editing in 1999. Their visualization is based on object-space radiance interpolation and a hierarchical data structure called
ray segment trees. The latter is introduced for tracking the dependencies between radiance interpolations for regions in world space, and helps with illumination updates triggered by scene manipulations. The
Render Cache by Walter et al. [
11] is an interactive caching and reprojection technique with adaptive sampling. In order to be efficient, only samples within the view frustum are reprojected from one frame to the next. GI computations on surfaces outside the current frame’s frustum are not cached at all. Ward and Simmons [
12] already worked on non-diffuse interactive global illumination in the same year, introducing their holodeck ray cache. The eponymous
holodeck is a four-dimensional data structure that provides a caching mechanism for interactive walk-throughs. Sample density is varied locally, while sampling happens on-demand and is implemented in a parallel fashion. While allowing for dynamically illuminated environments, the precomputed radiance transfer system for real-time rendering presented by Sloan et al. [
13] only supports low-frequency content. Spherical harmonics are used for representing illumination information for glossy and diffuse materials alike. In addition, a method is suggested for rendering soft shadows and caustics from rigidly moving objects onto dynamic receivers. Tole et al. [
14] introduced a caching scheme that supports the interactive computation of global illumination in dynamic scenes. Their
Shading Cache is an object-space hierarchical subdivision mesh that stores shading values at its vertices. Hardware-based interpolation and texture mapping make it possible to generate results at high frame rates, while the results are adaptively refined based on the interpolation error and camera or object motion. Images with a suitable quality are generated within tens of seconds, outperforming other systems that were available at the time.
The interactive rendering and display technique suggested by Bala et al. [
15] supports complex scenes with complex shading such as global illumination. Sparsely distributed samples and analytically computed edges are combined in a way that allows for generating images of a relatively high quality by relying on a compact edge-and-point image. The presented renderer supports scene interaction such as object manipulation and achieved a performance of 8 to 14 frames per second on a desktop PC at the time. The findings of Křivánek et al. [
16] are based on Ward et al.’s earlier work [
9]. The authors present a method for efficient global illumination that relies on sparse sampling, caching, and interpolation. More specifically, the older irradiance caching scheme is extended so that radiance, instead of irradiance, can be cached and interpolated. In work by Christensen et al. [
17], a sparse octree is suggested as a 3D Mipmap to store irradiance values. A brick structure is employed to store sparse samples for individual octree cells. Dietrich et al. [
18] propose a cache that employs a hash map as the spatial index structure to store shading and illumination without the need for a preprocessing step. While our presented work shares many similarities with this approach, the hashing mechanism by Dietrich et al. cannot be easily ported to highly parallel systems such as the GPU. Moreover, they provide neither a level-of-detail mechanism nor a method to filter the results. A method for temporal radiance caching that supports glossy global illumination in animated environments is presented by Gautron et al. [
19]. Their approach is built upon irradiance and radiance caching, while sparse temporal sampling and interpolation of indirect lighting are employed to reuse the computed information in succeeding frames. Lighting information is adaptively updated and flickering artifacts are strongly reduced. Their temporal interpolation approach is based on temporal gradients. According to the authors, one of the key advantages of their method is the straightforward implementation into any existing renderer. Brouillat et al. [
20] introduce the combination of photon mapping and irradiance caching. More specifically, their approach computes an irradiance cache from a photon map. This means that the advantage of photon mapping being view-independent is exploited to perform view-independent irradiance caching, while the actual rendering is done using radiance cache splatting. Radiance Caching by Křivánek et al. [
21] is a method for accelerating GI computation in scenes with low-frequency glossy bidirectional reflectance distribution functions (BRDFs) based on spherical harmonics. Higher-frequency content is supported in work by Omidvar et al. [
22], using Equivalent Area Light Sources. However, all of the methods presented so far are offline processes for non-interactive systems.
Multi-bounce indirect lighting, glossy reflections, arbitrary specular paths, and thus even caustics are supported in Wang et al.’s work [
23], which builds upon scattered data interpolation on the GPU. Interpolation is supported by k-means clustering and a subsequent final gathering step, where the photon map is approximated using lightcuts. Using this method, it is possible for the user to manipulate the viewed scene at interactive frame rates. Hachisuka and Jensen [
24] describe how to use spatial hashing for constructing photon maps on the GPU. Their method stores a single photon stochastically instead of storing lists or aggregations. This allows the approach to ignore hash collisions but limits the sample set size and expressiveness. Caching samples in world space has either been computationally demanding or has worked only for a limited set of samples because of memory requirements. Hence, there is a need for fast world-space sample-caching techniques that allow updating aggregated samples. Keeping GPU implementations in mind, it is crucial for these updates to be suited for parallel execution.
Crassin et al. [
25] provide a technique for real-time GI based on approximate cone tracing using a sparse voxel octree. However, the appearance differs between renderings generated with their method and unbiased results. A real-time approach for approximating GI is presented by Thiedemann et al. [
26]. While diffuse near-field GI is rendered at high visual fidelity, voxel-based visibility computation causes glossy reflections not to be handled well.
Ritschel et al. [
27] present a comprehensive summary of the major challenges in interactive GI. Their work includes the underlying theoretical aspects, phenomena, and methods for the actual rendering task. They also provide an overview of ratings regarding various aspects like ease of implementation, and give information about the transport paths that each method can handle. The radiance-caching method proposed by Scherzer et al. [
28] uses pre-filtered cache items based on MIP-maps as a substitute for spherical harmonics. The coefficient-dependent complexity of spherical harmonics is thus replaced by a constant-time lookup per pixel, improving performance by an order of magnitude when compared to radiance caching with Phong BRDFs. Fast approximations of joint bilateral filtering (as presented by Dammertz et al. [
29] and Bauszat et al. [
30]) and the use of adaptive manifolds [
31] also help with increasing the image quality and reducing noise. Mara et al. [
32] present a method for efficient density estimation for photon mapping on the GPU. While their work also contains information on using a hash map as their data structure, it is strictly limited to photon mapping. Our approach, however, can be used with several GI methods. An overview of filtering techniques for preview rendering is given by Schwenk [
33]. Intuitively editing the appearance in complex scenes (geometry- and lighting-wise alike) poses a major challenge. Nguyen et al. [
34] present a method that enables the user to freely manipulate surface light fields. Surface reflectance functions are then adapted to best fit the desired surface light field by changing shading parameters. In contrast to earlier approaches, manipulating the surface light field is possible by using a single color brush tool.
Our layered filtering approach is inspired by Keller et al.’s path-space filtering [
7]. Here, the contribution of actual light transport paths is smoothed before reconstruction. While this method performs a range search in path space, our approach effectively brings this filtering to screen-space in a way that is simple to implement. The general decomposition method for filtering specific light paths presented by Zimmer et al. [
35] is closely related to our approach. However, the decomposition in our algorithm is exclusively based on recursion depth, not directly taking specific material properties into account. This results in a more straightforward implementation.
Munkberg et al. [
36] present a method for caching illumination in texture space. While their approach avoids the issues of axis-aligned grids and strongly supports the use of material-specific filters, a global approach like ours makes it easier to account for arbitrary neighboring geometry in the filtering process. Chaitanya et al. [
37] present a technique for reconstructing image sequences based on autoencoders, motivated by recent advances in image restoration with deep convolutional networks. Another method from the same field is presented by Bako et al. [
38], with the focus put on high-end, production-quality renderings. The authors also provide a comparison of these two related approaches. A direct-to-indirect transport technique is described by Silvennoinen et al. [
39]. It is suited for mostly static scenes with fully dynamic lighting, cameras, and diffuse materials, with the incident radiance field being reconstructed from a sparse set of radiance probes. Global illumination is then computed by factorizing the direct-to-indirect transport into two parts—global and local—sampling the global transport with the aforementioned radiance probes and using the sampled radiance field for a reconstruction filter. While the achieved visual quality is convincing, this method requires a relatively long precomputation time of around one hour for the presented scenes. Our method is orders of magnitude faster at preprocessing the scene's geometry and building the hashing structure.
Schied et al. [
40,
41] suggest methods for generating temporally stable image sequences from GI at one sample per pixel. The effective sample count available to their approach is increased by accumulating samples temporally, while the denoising method itself is performed by using a hierarchical image-space wavelet filter. Note that Binder et al. [
6] also mention a way of filtering similar to ours as a possible future solution to overcome artifacts. The main contribution of their work is a fast path-space filtering method by using jittered spatial hashing for hiding quantization artifacts. While Binder et al. do not use caching, they also utilize a hash-based approach in order to optimize neighborhood search. Their jittering method is also applicable in our case for hiding quantization artifacts introduced by the discrete scene subdivision.
Binder et al. [
42] provide a massively parallel GPU implementation of path-space filtering using a hash table for searching nearby vertices. This is related to our layered filtering approach, as filtering in both cases does not happen only at the primary hitpoints; instead, vertices of paths are shared between pixels when filtering the image. In contrast to approaches like Binder et al.'s, our method projects the information available in path space to the according pixels in screen space, but puts the information from subsequent vertices into different layers, which are then filtered individually. Real-time glossy and specular reflections are improved by combining ray tracing with radiance probes and screen-space reflections in Hirvonen et al.'s work [
43], while Luksch et al. [
44] propose a system for incrementally updating baked global illumination solutions in order to avoid visual disturbances such as flickering or noise. Their many-light GI algorithm is combined with appropriate data structures and prioritization strategies in order to compute differential updates for illumination states. Wang et al. [
45] provide some novel work regarding light-field probes. Their non-uniform placement method has the goal of correctly sampling visibility information and eliminating superfluous probes at the same time. Probe placement relies on scene skeletons and a refinement based on gradient descent. Visibility is cached in a sparse voxel octree. Yalçıner and Sahillioğlu [
46] present a method for populating sparse voxel octrees in the presence of a high number of dynamic objects. Their pre-generated voxel data are transformed from model space to world space on demand, while common approaches voxelize dynamic objects per frame. An additional filtering method enables smooth transitions and reduces aliasing. The authors provide a real-world use case by implementing voxel cone tracing with their discretization method. Zhao et al. [
47] introduce an approach for improving the reconstruction of glossy interreflections, while also showing performance gains when compared to previous approaches. Their view-dependent radiance caching works directly with outgoing radiance at surfaces instead of incoming radiance distributions. In a recent paper, Huo et al. [
48] propose the use of quality and reconstruction networks with a large offline dataset for the adaptive sampling and reconstruction of first-bounce incident radiance fields. The reconstruction network is based on a convolutional neural network and performs its reconstruction in 4D space, while the quality network is based on deep reinforcement learning and guides the adaptive sampling process. Comparisons with state-of-the-art methods show visually convincing results.
All of the methods above greatly benefit from exploiting spatial and/or temporal coherence in image space. We argue that a world-space sample-caching technique can further improve the image quality of such filtering methods, especially in complex scenes with many occluding surfaces and arbitrary views.
3. Method
In this section, we give an overview of the employed cache structure and describe how it can be used to cache the data generated by stochastic rendering methods. Subsequently, we give more details on how samples are generated during the rendering process in order to reuse recursively generated hitpoints. Here, it is also shown how the actual cache updates are performed. Eventually, we provide information about the reconstruction process, including the support of non-diffuse materials, as well as our proposed layered filtering framework.
3.1. Cache Structure
Monte-Carlo-based (MC-based) rendering methods provide the means to solve Kajiya’s Rendering Equation [
2] numerically. In our implementation, a straightforward path tracer with next-event estimation and multiple-importance sampling is used for computing the illumination data. The path-tracing process generates millions of randomly and sparsely distributed hitpoints located on the scene geometry in each iteration; as only surface hitpoints are cached, participating media are not supported. Consequently, a data structure that allows for efficient caching of such data must support querying large amounts of randomly distributed keys at high performance. The core of our HashCache system is Choi et al.'s concept of a linkless octree [
3], consisting of a number of hash maps implemented with Alcantara’s Cuckoo Hashing [
49]. This hashing method allows for a worst-case constant lookup time, making it especially suitable for real-time previews. Cuckoo hashing resolves collisions by employing an additional hash function in order to compute two candidate indices in the hash table for one key. When a collision is detected on key insertion, the already-existing entry is replaced by the new entry. Then, the old entry is inserted at its alternative position, with potential collisions handled the same way iteratively until all entries have been successfully placed. Since this process may enter an infinite loop, this case is detected and the construction is then restarted with alternative hash functions. Although it would have been possible to use a plain grid instead of an octree, we chose the hierarchical approach for its inherent level-of-detail support. When rendering a scene from an arbitrary point of view using a non-hierarchical data structure, parts of it may be subsampled, resulting in aliasing artifacts. With a hierarchical data structure like the HashCache, it is possible to choose the hierarchy's level whose resolution most closely resembles the projected pixel size in object space, hiding subsampling artifacts effectively.
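To make the collision handling concrete, the following sketch shows the eviction loop of a cuckoo insertion on the GPU. It assumes that each table entry packs a 32-bit key and a 32-bit value into one 64-bit word so that an eviction is a single atomic exchange; the hash constants, the eviction limit, and the empty-slot sentinel are illustrative assumptions, not the actual parameters of [49].

```cuda
#include <cstdint>

typedef unsigned long long u64;

// Illustrative hash functions (multiplicative hashing with well-known primes).
__device__ uint32_t hash1(uint32_t k, uint32_t size) { return (k * 2654435761u) % size; }
__device__ uint32_t hash2(uint32_t k, uint32_t size) { return ((k ^ (k >> 16)) * 2246822519u) % size; }

// Table slots are initialized to ~0ull, meaning "empty".
// Returns true on success; on failure the caller rebuilds the table with
// different hash functions (the possible-cycle case described above).
__device__ bool cuckooInsert(u64* table, uint32_t size,
                             uint32_t key, uint32_t value, int maxEvictions)
{
    u64 entry = ((u64)key << 32) | value;   // pack key and value together
    uint32_t slot = hash1(key, size);
    for (int i = 0; i < maxEvictions; ++i) {
        u64 evicted = atomicExch(&table[slot], entry); // place entry, kick out old one
        if (evicted == ~0ull) return true;             // slot was free: done
        entry = evicted;                               // re-insert the evicted entry...
        uint32_t k = (uint32_t)(entry >> 32);
        uint32_t h1 = hash1(k, size);
        slot = (slot == h1) ? hash2(k, size) : h1;     // ...at its alternative slot
    }
    return false; // likely a cycle: restart construction with new hash functions
}
```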
While the hash-based octree representation is a compact structure, there still is a trade-off between memory consumption and access time. In order to construct the compact hash map, all cells occupied with geometry have to be marked at the highest resolution available in the octree. This information is determined by testing all grid cells within each triangle’s bounding box for an intersection with the triangle, resembling typical grid construction algorithms, such as in the work by Perard-Gayot et al. [
50]. Because of the large number of grid cells at high resolutions, we choose to represent each cell by a single bit in a field of 32-bit types. Each 32-bit chunk forms a block, which is subdivided spatially such that each of its 32 bits corresponds to exactly one cell. The implementation uses CUDA's atomic operations on the respective chunks, effectively yielding the number of occupied cells. For an illustration of our approach for determining occupied cells, see
Figure 1 and
Figure 2.
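As a minimal illustration of this occupancy pass, the following CUDA kernel marks, for each triangle, all cells inside the triangle's bounding box by setting one bit per cell with atomicOr. The exact per-cell triangle/box overlap test mentioned above is omitted for brevity, and all names and parameters are assumptions of this sketch.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Map a world-space position to integer cell coordinates, clamped to the grid.
__device__ int3 cellOf(float3 p, float3 sceneMin, float invCellSize, int res) {
    int x = min(max((int)((p.x - sceneMin.x) * invCellSize), 0), res - 1);
    int y = min(max((int)((p.y - sceneMin.y) * invCellSize), 0), res - 1);
    int z = min(max((int)((p.z - sceneMin.z) * invCellSize), 0), res - 1);
    return make_int3(x, y, z);
}

// One thread per triangle; a production version additionally runs an exact
// triangle/box test per cell before setting the occupancy bit.
__global__ void markOccupiedCells(const float3* a, const float3* b, const float3* c,
                                  int numTris, uint32_t* occupancyBits,
                                  float3 sceneMin, float invCellSize, int res)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTris) return;
    float3 lo = make_float3(fminf(a[t].x, fminf(b[t].x, c[t].x)),
                            fminf(a[t].y, fminf(b[t].y, c[t].y)),
                            fminf(a[t].z, fminf(b[t].z, c[t].z)));
    float3 hi = make_float3(fmaxf(a[t].x, fmaxf(b[t].x, c[t].x)),
                            fmaxf(a[t].y, fmaxf(b[t].y, c[t].y)),
                            fmaxf(a[t].z, fmaxf(b[t].z, c[t].z)));
    int3 cLo = cellOf(lo, sceneMin, invCellSize, res);
    int3 cHi = cellOf(hi, sceneMin, invCellSize, res);
    for (int z = cLo.z; z <= cHi.z; ++z)
        for (int y = cLo.y; y <= cHi.y; ++y)
            for (int x = cLo.x; x <= cHi.x; ++x) {
                uint32_t cell = (uint32_t)((z * res + y) * res + x);
                atomicOr(&occupancyBits[cell >> 5], 1u << (cell & 31)); // one bit per cell
            }
}
```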
During the hash map initialization, this number is used in combination with a space-usage factor to limit the actual memory requirements. We choose a moderate initial space-usage factor; if the hash map construction fails, another attempt is made with an increased factor until construction succeeds. This construction process is performed for each octree level, with cell indices being adapted accordingly. As the utilized hash map implementation is bound to 32-bit keys and an octree's extents are limited to powers of two, the maximum resolution representable by a single hash map is 1024^3 cells (10 bits per axis). Higher resolutions are represented by splitting space into multiple hash maps per octree level.
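One possible cell-to-key mapping under these constraints is a 30-bit Morton code, which stays within the 32-bit key limit for up to 1024^3 cells per hash map. The choice of Morton order is our assumption for illustration; any injective mapping of cell coordinates to 32-bit keys works.

```cuda
#include <cstdint>

// Spreads the lower 10 bits of v so that two zero bits separate consecutive
// bits (standard bit-interleaving trick for 30-bit 3D Morton codes).
__host__ __device__ uint32_t expandBits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton key for cell coordinates in [0, 1023]^3; fits a 32-bit key.
__host__ __device__ uint32_t cellKey(uint32_t x, uint32_t y, uint32_t z) {
    return (expandBits(x) << 2) | (expandBits(y) << 1) | expandBits(z);
}
```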
The values stored in the octree's underlying hash maps are actual indices into global data arrays. These arrays occupy exactly the space required to store all of the information that is computed throughout the process. Note that the presented implementation relies on caching only the outgoing diffuse illumination without any directional information other than the front and back of each cache cell, where the front is determined to be the inverse orientation of the first ray that hits any geometry within a cell. While it would be possible to store information for more directions, this would negatively influence storage requirements and performance. However, storing at least two directions is necessary, since infinitesimally thin geometric primitives may be illuminated differently on their two sides. To store more accurate GI information for those cases, we construct the arrays to contain the following data per cell:
Diffuse illumination for the front and back of each cell as six half values (96 bits);
compressed cell normal (32 bits);
currently accumulated number of samples (32 bits);
the frame index denoting when the cell has last been wiped (32 bits).
Thus, the total amount of memory required for the data of one cell is (12 + 4 + 4 + 4) Bytes = 24 Bytes. The reset information is required to rebuild the cache when illumination changes occur. Note that the diffuse illumination is not attenuated by the diffuse material color (albedo) at this point. Instead, this is accounted for during reconstruction, which allows for a higher-quality representation of spatial variation in the appearance of diffuse surfaces. In order to determine the front normal of each cell, which is required to discern the stored orientations of each cache cell, an atomic compare-and-swap is used to store the current normal in a cache cell if no normal is stored so far. All generated samples can then be assigned to the front or back by comparing their stored normals with the front normal.
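A possible memory layout for this per-cell payload, together with the compare-and-swap used to claim the front normal, could look as follows. Field names, the normal encoding, and the reservation of zero as "no normal stored yet" are assumptions of this sketch.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

struct CacheCell {
    __half   diffuseFront[3]; // outgoing diffuse illumination, front (48 bits)
    __half   diffuseBack[3];  // outgoing diffuse illumination, back  (48 bits)
    uint32_t packedNormal;    // compressed front normal              (32 bits)
    uint32_t sampleCount;     // accumulated number of samples        (32 bits)
    uint32_t lastWipeFrame;   // frame index of the last cache reset  (32 bits)
};                            // 12 + 4 + 4 + 4 bytes = 24 bytes per cell

// The first sample to arrive in a cell claims the front orientation; this
// sketch reserves the packed value 0 as "empty". Later samples compare their
// own normal against the stored one to decide between front and back.
__device__ void claimFrontNormal(CacheCell* cell, uint32_t packedNormal) {
    atomicCAS(&cell->packedNormal, 0u, packedNormal);
}
```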
There are no specific constraints for the number of triangles per octree cell (or, vice versa, the number of octree cells per triangle), as the required resolution largely depends on the lighting situation and the actual camera settings and position: for quick previews during the modeling of individual objects, lower resolutions may already yield satisfactory results.
3.2. Caching
While
Figure 3 already gives a general overview of the process described in this section, including the general data flow and algorithmic elements, a further description is given below.
During the caching process, rays are shot into the scene from the current camera view and traced along randomly generated paths $\bar{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_k)$, with $\mathbf{x}_i$ denoting the path's individual vertices located on scene surfaces, and $k$ being the maximal recursion depth. As we want to cache data not only for the first hitpoint (which would effectively only represent directly visible geometry), we compute illumination along subpaths with a maximum length of $d$ and store these for the first $n$ hitpoints. Thus, since all vertices of a path should account for the energy transported along the same number of consecutive vertices in order to provide consistent data, the maximum path length is $k = n + d$, and the indirect illumination contributed to each vertex $\mathbf{x}_i$ along the path has to be limited to the subpath vertices $\mathbf{x}_j$, $i \le j \le i + d$. This is illustrated in Figure 4.
As soon as the local illumination and the reflected direction $\omega_i$ for the current vertex $\mathbf{x}_i$ have been computed, the energy transported along the current path is updated by computing the throughput $T_i$ according to the locally evaluated BRDF. The first vertex along the current path that should still account for energy originating at the current vertex is at index $s = \max(1, i - d)$. In order to take into account the accumulated throughput for the current subpath from vertex $\mathbf{x}_i$ back to vertex $\mathbf{x}_s$, each preceding vertex $\mathbf{x}_j$, $s \le j < i$, is updated with the reflected local energy $L_i$ by computing the component-wise multiplication
$$L_j \leftarrow L_j + L_i \odot \prod_{m=j}^{i-1} T_m\,.$$
Here, the diffuse material color is not accounted for in vertex $\mathbf{x}_j$. It is instead taken into account after reconstruction in order to avoid loss of spatial variation in the appearance of diffuse materials. All vertices from each path that belong to a Lambertian material are stored in the respective arrays indexed by the hash map. This includes diffuse illumination values $L_j$, compressed normal vectors $\mathbf{n}_j$, the linear map index $m_j$ (only required if the hash map's resolution $R$ exceeds $1024^3$), and the linear cell index $c_j$, where $m_j$ and $c_j$ are necessary to store the data in our data structure correctly.
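A sketch of this backward update, assuming 1-based vertex indexing as in the text and hypothetical radiance[] and throughput[] arrays holding $L_j$ and $T_j$ per path vertex, could look as follows.

```cuda
#include <cuda_runtime.h>

__device__ float3 mul3(float3 a, float3 b) { return make_float3(a.x*b.x, a.y*b.y, a.z*b.z); }
__device__ float3 add3(float3 a, float3 b) { return make_float3(a.x+b.x, a.y+b.y, a.z+b.z); }

// After vertex i has computed its reflected local energy L_i, distribute it to
// the preceding vertices j >= max(1, i - d), attenuated by the throughput
// accumulated between x_j and x_i (all products are component-wise).
__device__ void propagateEnergy(float3* radiance, const float3* throughput,
                                float3 localEnergy, int i, int d)
{
    float3 T = make_float3(1.f, 1.f, 1.f);
    for (int j = i - 1; j >= max(1, i - d); --j) {
        T = mul3(T, throughput[j]);                        // T_j * ... * T_{i-1}
        radiance[j] = add3(radiance[j], mul3(localEnergy, T));
    }
}
```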
Now, the respective cells of the HashCache are updated with the newly computed light transport data. When updating the individual octree levels, the collected data are pre-accumulated before performing an update on the global data arrays in order to avoid synchronization issues. Pre-accumulation is implemented by first sorting the data using a radix sort approach and consecutively performing a reduction on the data with the global data index as the primary key and the binary orientation information (front or back) as the secondary key for both the sorting and reduction. In order to use the orientation information as the secondary key in the reduction, the individual sample's normal vectors have to be replaced with binary front/back information: 1, if the sample lies within the front-facing hemisphere, and –1 otherwise. If storage is not an issue, more directions could be represented, which may also allow for caching glossy materials. Afterwards, the data are coarsened for the preceding octree level and the process is repeated until all levels have been updated. As the radix sort, the reduction, and the per-level coarsening are all linear in the input size, the full octree update is in $\mathcal{O}(n)$, with $n$ being the number of updated cache cells.
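Using Thrust (which ships with CUDA and uses radix sort for primitive keys), the described pre-accumulation can be sketched as follows. The composite 64-bit key layout and all container names are assumptions of this sketch; the output vectors must be pre-sized to at least keys.size().

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

struct Float3Plus {
    __host__ __device__ float3 operator()(const float3& a, const float3& b) const {
        return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
};

// Sort samples by a composite key (cell index as primary part, front/back bit
// as secondary part) and reduce, so the later global update touches each
// (cell, side) pair exactly once. Returns the number of unique pairs.
size_t preAccumulate(thrust::device_vector<unsigned long long>& keys, // (cellIndex << 1) | isFront
                     thrust::device_vector<float3>& samples,
                     thrust::device_vector<unsigned long long>& outKeys,
                     thrust::device_vector<float3>& outSums)
{
    thrust::sort_by_key(keys.begin(), keys.end(), samples.begin()); // radix sort
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(), samples.begin(),
                                      outKeys.begin(), outSums.begin(),
                                      thrust::equal_to<unsigned long long>(),
                                      Float3Plus());
    return ends.first - outKeys.begin();
}
```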
3.3. Reconstruction
As rendering scenes with Lambertian materials exclusively may cause them to appear visually dull and unrealistic, our system provides the means for handling materials with glossy or specular properties. An overview of the process described in this section is given in
Figure 5, while a further description is given below.
For the reconstruction step, primary geometry hitpoints are determined for each individual pixel, with the exception of glossy and specular materials, where the specific rays are traced further until they eventually arrive at the maximal recursion depth, at a diffuse material, or at the background. For each path, the accumulated throughput is stored for the first $n$ vertices as $T_i$, together with the diffuse material color, the local normal, and the appropriate octree level (selected by projecting the pixel area in object space). The reconstruction is executed per level and accumulated in the image by selecting the correct orientation from each cell and multiplying the retrieved diffuse illumination value with $T_i$.
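The hybrid reconstruction loop for a single pixel might be sketched as follows, where Ray, Hit, trace, background, cacheLookup, and reflectOrRefract are hypothetical helpers standing in for the surrounding renderer.

```cuda
#include <cuda_runtime.h>

struct Ray { float3 origin, dir; };
struct Hit { bool valid, isDiffuse; float3 position, normal, albedo, brdfWeight; };

// Assumed helpers provided by the surrounding renderer:
__device__ Hit    trace(const Ray&);
__device__ float3 background(const Ray&);
__device__ float3 cacheLookup(float3 position, float3 normal); // pre-gathered diffuse illumination
__device__ Ray    reflectOrRefract(const Hit&, const Ray&);

__device__ float3 mulc(float3 a, float3 b) { return make_float3(a.x*b.x, a.y*b.y, a.z*b.z); }

__device__ float3 reconstructPixel(Ray ray, int maxDepth)
{
    float3 throughput = make_float3(1.f, 1.f, 1.f);
    for (int depth = 0; depth < maxDepth; ++depth) {
        Hit hit = trace(ray);
        if (!hit.valid)                    // left the scene: background radiance
            return mulc(throughput, background(ray));
        if (hit.isDiffuse)                 // diffuse surface: query the cache
            return mulc(throughput, mulc(hit.albedo,
                                         cacheLookup(hit.position, hit.normal)));
        throughput = mulc(throughput, hit.brdfWeight); // glossy/specular: keep tracing
        ray = reflectOrRefract(hit, ray);
    }
    return make_float3(0.f, 0.f, 0.f);     // path cut off at the maximal depth
}
```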
In order to reduce the blocky appearance caused by low cache resolutions, we employ a spatial jittering method to compute the actual cell index. This jittering method is based on the hitpoint's local tangent plane:
$$\mathbf{x}' = \mathbf{x} + s \, c \, \big( (\xi_1 - 0.5)\,\mathbf{t} + (\xi_2 - 0.5)\,\mathbf{b} \big).$$
Here, $\mathbf{x}$ and $\mathbf{x}'$ are the original and the jittered hitpoints, $\xi_1$ and $\xi_2$ are uniformly distributed random numbers, $\mathbf{t}$ and $\mathbf{b}$ are the tangent and the binormal, $c$ is the actual cell size, and $s$ is the user-adjustable scale of the jittering. Finally, a basic edge-aware cross-bilateral denoiser filters remaining noise for each depth layer individually, and also tries to fill holes where cache information is not available.
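A device function implementing the jitter above might look as follows, with the uniform random numbers (e.g., drawn via cuRAND) passed in by the caller.

```cuda
#include <cuda_runtime.h>

// Offset the hitpoint p inside its cell along tangent t and binormal b before
// computing the cell index; xi1 and xi2 are uniform in [0, 1).
__device__ float3 jitterHitpoint(float3 p, float3 t, float3 b,
                                 float cellSize, float scale,
                                 float xi1, float xi2)
{
    float ox = (xi1 - 0.5f) * cellSize * scale;
    float oy = (xi2 - 0.5f) * cellSize * scale;
    return make_float3(p.x + ox * t.x + oy * b.x,
                       p.y + ox * t.y + oy * b.y,
                       p.z + ox * t.z + oy * b.z);
}
```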
Figure 6 shows the effect of jittering and denoising in two areas: While the wall in the back shows more high-frequency noise, the statue in the front reveals quantization artifacts due to the great differences between neighboring cache values. However, such artifacts are efficiently removed by the spatial jittering.
Note that spatial jittering may result in slight artifacts when processing cells that do not have geometry in all of their neighboring cells on the respective tangent plane (cf.
Figure 7). This is mainly caused by the fact that our data structure does not support enhanced sparsity encoding, but rather relies on constrained access [
51] to avoid further memory consumption. Two cases may appear:
The hash key for the neighboring grid cell may map to a different cell that does belong to the scene's geometry. In such a case, visual artifacts may occur.
The hash key for the neighboring grid cell may yield an empty entry in the hash map. In this case, the irradiance value is set to the average of the pixel’s neighbors, i.e., invalid or unsampled pixels resulting from spatial jittering are filled in.
However, during our evaluation, we did not observe any major artifacts resulting from this. Thus, we decided not to include any way of querying a cell for its grid coordinates.
In
Section 3.4, we present how we extend our system with a filtering approach that accounts for multiple bounces of glossy and specular reflections and refractions in order to improve visual quality. Note that in our implementation, caching and reconstruction are independent of each other. The caching process can be executed with an arbitrary sampling scheme at freely selectable resolutions, while the reconstruction just retrieves the stored illumination values from the data structure. Thus, the caching process may actually rely on arbitrary distributions of rays throughout the scene, which also enables strategies like randomly or adaptively sampling the scene along camera paths or creating importance-based sampling schemes. Additionally, the separate caching step can be performed with arbitrary numbers of samples. In our case, we tested the caching performance at various resolutions, as shown in
Section 4.
3.4. Layered Filtering
While the plain octree reconstruction described so far may suffice in some scenarios, it is known from regular path tracing that convergence can be slow in many scenes. This makes it necessary to employ filtering methods in order to achieve noise-free images within an acceptable time frame. The kind of noise remaining in the generated images largely depends on the employed rendering method, while quantization artifacts are effectively reduced by the aforementioned spatial jittering method. However, it is important to note that the existing filtering methods developed for path tracing will not work for the kind of noise our approach exhibits, as noise scales with the distance to a surface because of the limited cache resolution. Our suggested filtering approach aims to increase visual fidelity under such circumstances, while explicitly accounting for glossy, specular, and refractive materials.
The main idea of our method is to split the traditional filtering step that is carried out on the final image into multiple steps by filtering each bounce of light in an individual layer (cf.
Figure 8). With a non-layered filtering approach, the scene information available to the actual algorithm is limited to the first hitpoint, and multiple bounces between reflective and refractive materials have to be processed with the information at hand. This may result in a loss of detail or the need for more samples in order to achieve satisfactory results.
At the core of our layered filtering approach is an arbitrary image-space denoising filter. In this exemplary case, this is a slightly extended cross-bilateral filter with a sparse sampling pattern based on the voxel filtering technique presented by Laine et al. [
52]. In contrast to suggesting a concrete filtering method, we present a layered filtering framework, which is inherently independent of the filtering method used. This way, more recent filtering approaches could also be integrated and adapted to further improve the results. In order to provide the necessary information to this filter for each bounce of illumination, we expand upon the stored data described in
Section 3.3. For each vertex $\mathbf{x}_i$ belonging to the path $\bar{x}$, the following information is stored:
Path segment length $t_i$,
accumulated path length $\hat{t}_i$, serving as an extended depth buffer,
geometric normal of the current vertex $\mathbf{n}_i$,
throughput for the next generated ray $T_i$,
shininess of the current material $s_i$,
reconstructed diffuse illumination $L_i$, and
diffuse material color (albedo) $\rho_i$.
The edge-stopping functions we use are defined as follows:
$$w_n(p, q) = \big(\max(0, \mathbf{n}_p \cdot \mathbf{n}_q)\big)^{\sigma_n}, \qquad
w_{\hat{t}}(p, q) = \exp\!\big(-|\hat{t}_p - \hat{t}_q| / \sigma_{\hat{t}}\big), \qquad
w_t(p, q) = \exp\!\big(-|t_p - t_q| / \sigma_t\big),$$
$$w_c(p, q) = \exp\!\big(-\lambda_c \, \Delta E(c_p, c_q) / \sigma_c\big), \qquad
w_l(p, q) = \exp\!\big(-|l_p - l_q| / \sigma_l\big),$$
with the final filter weight being the product of the individual functions.
As each bounce is filtered individually and the results are propagated, we omit the depth index $i$ in the description of the edge-stopping functions. The pixel indices are denoted $p$ and $q$. Shininess is required to distinguish diffuse and glossy materials and to apply adequate filter parameters. Albedo is required for preserving texture details, while throughput is required to apply the layered approach to non-diffuse materials properly. The parameters $\sigma_n$, $\sigma_{\hat{t}}$, $\sigma_t$, $\sigma_c$, and $\sigma_l$ are user-defined scaling factors for normals, accumulated depth, local depth, color differences, and luminance differences, respectively. The color-based edge-stopping function has an additional weighting factor $\lambda_c$. The color difference $\Delta E$ is computed in L*a*b* color space. Generally, user-definable parameters have been chosen empirically for the best visual impression. The values used for the evaluation are given in
Section 4. Note that non-Lambertian materials have a separate weighting factor for the color difference, which is not explicitly mentioned here.
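Assuming edge-stopping functions of the exponential form given above, the combined cross-bilateral weight of pixel q's contribution to pixel p can be evaluated as in this sketch; deltaE_Lab is an assumed helper for the L*a*b* color difference, and all parameter names mirror the scaling factors above.

```cuda
#include <cuda_runtime.h>

__device__ float deltaE_Lab(float3 cp, float3 cq); // assumed L*a*b* difference

// Combined cross-bilateral weight for pixel pair (p, q) in one layer:
// the product of the normal, accumulated-depth, local-depth, color, and
// luminance edge-stopping functions.
__device__ float bilateralWeight(float3 np, float3 nq,
                                 float tAccP, float tAccQ,   // accumulated path lengths
                                 float tP, float tQ,         // local segment lengths
                                 float3 cP, float3 cQ,       // colors
                                 float lP, float lQ,         // luminances
                                 float sigmaN, float sigmaTAcc, float sigmaT,
                                 float sigmaC, float sigmaL, float lambdaC)
{
    float ndot = fmaxf(0.f, np.x * nq.x + np.y * nq.y + np.z * nq.z);
    float wn  = powf(ndot, sigmaN);
    float wta = expf(-fabsf(tAccP - tAccQ) / sigmaTAcc);
    float wt  = expf(-fabsf(tP - tQ) / sigmaT);
    float wc  = expf(-lambdaC * deltaE_Lab(cP, cQ) / sigmaC);
    float wl  = expf(-fabsf(lP - lQ) / sigmaL);
    return wn * wta * wt * wc * wl;
}
```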
All aforementioned information is available on a per-pixel, per-layer basis, which means that there is an actual image per light bounce (which we refer to as a layer). While later bounces of individual paths may arrive at different points in the scene, this is partially accounted for by using the accumulated ray depth as a filter guide. Although this yielded satisfactory results in our tests, there may be scene arrangements and material properties that cause this to be an issue. In such cases, we suggest using hitpoint world coordinates or HashGrid cell coordinates as possible additional filter guides.
Consequently, we process these data in a per-layer fashion, starting at the maximum bounce, filtering the result, and propagating it to the previous vertex of the path. Each time the filtered result is propagated from layer $i$ to $i - 1$, it is multiplied with $T_{i-1}$ to account for the actual path throughput. The result is then added to the reconstructed diffuse illumination $L_{i-1}$, and the accumulated image is filtered and propagated again until the primary hitpoints are reached. Additionally, after each layer has been filtered, it is multiplied with the local albedo $\rho$ of the respective layer in order to account for high-frequency content, as it is often contained in diffuse textures and shaders.
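The per-layer propagation can be summarized in a short host-side sketch. The minimal Image type with component-wise operators and the filterLayer call wrapping the cross-bilateral filter from above are assumptions of this illustration.

```cuda
#include <vector>
#include <cstddef>

// Minimal stand-in image type; a real implementation stores RGB per pixel.
struct Image { std::vector<float> px; };

static Image operator*(const Image& a, const Image& b) {
    Image r = a;
    for (size_t i = 0; i < r.px.size(); ++i) r.px[i] *= b.px[i];
    return r;
}
static Image operator+(const Image& a, const Image& b) {
    Image r = a;
    for (size_t i = 0; i < r.px.size(); ++i) r.px[i] += b.px[i];
    return r;
}

Image filterLayer(const Image& in); // assumed: the cross-bilateral filter above

// Filter the deepest layer first, then repeatedly modulate by throughput,
// add into the next-shallower layer's reconstructed diffuse illumination,
// multiply by that layer's albedo, and filter again until layer 0 is reached.
Image filterLayered(std::vector<Image>& L,        // reconstructed diffuse illumination per layer
                    const std::vector<Image>& T,  // throughput per layer
                    const std::vector<Image>& albedo,
                    int maxBounce)
{
    Image result = filterLayer(L[maxBounce]) * albedo[maxBounce];
    for (int i = maxBounce; i > 0; --i) {
        L[i - 1] = L[i - 1] + result * T[i - 1];        // propagate layer i -> i-1
        result = filterLayer(L[i - 1]) * albedo[i - 1];
    }
    return result; // filtered image at the primary hitpoints
}
```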
Figure 9 shows the reconstructed diffuse illumination propagated from the third bounce to the secondary and then to the primary hitpoints.
Our approach to caching, filtering, and accumulating is essentially an approximate final gathering step split into two separate steps: For diffuse materials, the illumination is approximated by integrating the energy arriving at each octree cell in the caching phase. For all non-Lambertian materials, rays are traced until a diffuse material is hit in the reconstruction step. Then, the pre-gathered diffuse illumination is queried from the cache at these points. This hybrid approach is directly supported by our layered filtering method, which makes it possible to filter the illumination gathered in the octree cells separately based on local scene information, even if it is only indirectly visible in an image.
Figure 10 shows a comparison between traditional and layered filtering for the first three bounces.