1. Introduction
Employing computer programs and algorithms to generate 2D or 3D images is referred to as rendering. Rendering is a topic of striking relevance in computer graphics with practical impact on many heterogeneous disciplines, spanning engineering, simulators, video games and movie special effects. Collision detection and collision determination are key elements of rendering as they determine the distance between two objects and their possible intersection [
1]. Due to their widespread use in video representation of time-evolving systems, with tens of frames displayed per second, algorithms for rendering are expected to be very efficient [
2,
3]. Generally, to assess whether two complex objects collide, the distance between their respective bounding volumes is evaluated first. Common bounding volumes are cuboidal boxes, whose axes might or might not be aligned, or spheres. Due to their simple geometry, the collision between cuboids and/or spheres is computationally easier [
4,
5,
6,
7], thus enhancing the speed and efficiency of the overall rendering process [
2]. Collision detection algorithms are of utmost relevance in many heterogeneous applications spanning computer graphics for shape modelling and video games [
8,
9,
10,
11,
12], robotics to prevent potential collisions in man–robot interactions [
13,
14,
15,
16,
17], risk assessment associated to vessel collision [
18] or machining of sculptured surfaces [
19], and simulations of molecular or particle systems to estimate their thermodynamic properties [
20,
21].
Collision algorithms have also been key to address the thermodynamics of liquid and solid phases and their phase transition by early molecular simulation studies that employed the hard-sphere model [
22,
23,
24]. More recently, and following the seminal theory by Onsager on the isotropic-to-nematic transition of hard rods [
25], they were fundamental to confirm the crucial role of excluded volume effects in the formation of colloidal liquid crystal phases of anisotropic particles [
20]. Realising the practical impact of the particle shape on the design of nanomaterials triggered the blooming of biosynthetic [
26], chemical [
27] and physical [
28] experimental routes to manufacture precise building blocks with ad hoc properties, including lock-and-key particles [
29], fused spheres [
30], superballs [
31] and cuboids [
32,
33,
34,
35]. The appearance of these exotic shapes unveiled a realm of novel opportunities in nanomaterials science by offering an increasingly varied selection of morphologies for state-of-the-art applications spanning medicine (controlled drug delivery), smart materials (self-healing coatings) and photonics (light detection), among others. Often anticipating experimental evidence, computer simulations have significantly contributed to our comprehension of the effect of particle shape and interaction at the nanoscale on the material properties at the macroscale [
36,
37,
38,
39]. Understanding the fundamentals of such a complex correlation, which develops over orders of magnitude in length and time scales, dramatically depends on the existence of reliable force fields mimicking the interactions between particles. This is not always the case for most exotic particle shapes, whose force field is assumed to be described by mere excluded volume effects and thus only incorporates a hard-core interaction potential. Consequently, efficient and robust algorithms able to detect collisions and intersections between objects become essential to extract structural, thermodynamic and dynamic properties of such systems from a molecular simulation. In colloid science, cuboids are especially intriguing building blocks that can form a rich variety of liquid crystal phases [
40,
41,
42,
43,
44]. Incorporating guest spherical particles in these phases is relevant to understand phenomena of diffusion in crowded environments that display a significant degree of ordering.
In light of these considerations, which highlight the harmonious inter-disciplinary convergence of computer graphics and colloid science, here we report on the specific case of cuboid-sphere collision detection. In particular, we propose our own Oriented Cuboid Sphere Intersection (OCSI) algorithm to detect collisions in monodisperse systems of cuboids and spheres oriented in a 3D space. OCSI is found to be especially efficient when compared to the Quick Rejection First (QRI) and the Quick Rejection Intertwined (QRF) algorithms, and more user-friendly and easier to implement than the vectorized version of the algorithm that employs SIMD Streaming Extension (SSE) intrinsic functions [
7]. For simplicity, we refer to the vectorized version of the collision detection algorithm developed by Larsson et al. with the abbreviation “SSE-intr”, since it uses Intel
® intrinsic functions specific for SSE instruction set. In particular, QRI and QRF make use of a quick rejection test that discards overlaps if the minimum distance,
, between the surface of a cuboid and the centre of a sphere is larger than the sphere radius. As a result that this test depends on the cuboid size and shape, the efficiency of both QRI and QRF is expected to be determined, to some extent, by the specific sphere and cuboid geometry. By contrast, SSE-intr, which runs in parallel and is therefore significantly faster than QRI and QRF, does not need quick rejection tests and makes use of vectorization to estimate
. Our algorithm, available in C and Fortran 90, incorporates a few key elements (e.g., the absolute value to estimate the minimum distance and OpenMP directives to parallelize the code with no use of SIMD intrinsics) that make it faster than QRI and QRF and more versatile than SEE-intr. This paper is organised as follows. In
Section 2, we detail the theoretical framework of the cuboid-sphere intersection problem and the state-of-the-art in software implementation. In
Section 3, we describe the code that we have specifically developed to test each of the above-mentioned algorithms’ efficiency for cuboids of different shape and spheres of different size. The comparison between the algorithms is then discussed in
Section 4, while, in
Section 5, we draw our conclusions.
2. Algorithms
In geometry, a sphere
is identified by its radius,
R, and the position of its centre,
, with respect to a reference point. Similarly, a cuboid
can be defined by the extension of its thickness,
, length,
and width,
, the position of its centre of mass,
and the unit vectors
,
and
that indicate the orientation of its three orthogonal axes. As a result, all the points within the volume occupied by the cuboid can be indicated by a vector
that reads
where
T,
L and
W indicate, respectively, the cuboid thickness, length and width, whereas
is a scalar interval. With these essential definitions, the minimum distance,
, between the surface of a randomly oriented cuboid and the centre of a sphere can be calculated as follows:
where
and
is the Heaviside step function:
The interested reader is referred to
Appendix A for a formal derivation of Equation (
2). To the best of our knowledge, Arvo was the first to propose an algorithm detecting the intersection between a sphere and an axis-aligned cuboid, that is a cuboid whose orientation matches that of the reference axes [
5]. For this specific case, we assume that the cuboid thickness is aligned with the
x axis, i.e.,
, its length with the
y axis, i.e.,
and its width with the
z axis, i.e.,
. Following this assumption, Equation (
1) can be rewritten as
where
are the reference axes for
T,
L and
W, respectively, and
,
and
. Therefore, for an axis-aligned cuboid,
can be calculated as
By using the infimum and supremum of
, the terms in the above sum can be easily calculated:
if , then ,
if , then ,
if , then .
Consequently, the algorithm proposed by Arvo only requires the extreme values of
along with the sphere radius and position and detects cuboid-sphere collisions if
. An illustrative example of a pseudocode describing its main steps is reported in Algorithm 1.
Algorithm 1- Arvo |
- 1:
functionArvo() - 2:
▹ initialising minimum distance - 3:
for do - 4:
if () then - 5:
- 6:
else if () then - 7:
- 8:
end if - 9:
end for - 10:
if () return ▹ checking overlap - 11:
return - 12:
end function
|
The alignment of the cuboid unit vectors with the reference axes is a particular case of a more general scenario with the cuboid randomly oriented. Eventually, Arvo’s algorithm can also be applied to randomly oriented cuboids by performing a transformation of the vectors involved in the calculation of
in the reference frame of
. Rokne and Ratschek proposed to estimate
by employing interval analysis and reported a test to determine whether a point
is within a sphere delimited by four peripheral points [
6]. The algorithms proposed by Larsson and co-workers employ quick rejection overlap tests to enhance the efficiency of collision detection between a sphere and either an aligned or a randomly oriented cuboid [
7]. The pseudocode of these algorithms are reported in Algorithms 2 and 3, respectively. Both QRI and QRF are based on the implementation of a quick rejection test that immediately excludes an overlap if at least one of the summands in Equation (
2) or Equation (
5) is larger than
. For the general case of a randomly oriented cuboid, this condition reads
A geometrical representation of this condition is provided in
Figure 1, where a sphere
of radius
R and centred at
is at the distance
from the centre of mass of a cuboid
that is centred at
. For this specific arrangement, the left-hand side of Equation (
6) measures the distance of
from the face of
delimited by
T and
W and schematically identified by the vertical solid line of
Figure 1. QRI applies this rejection criterion within the loop that evaluates the minimum distance, precisely at lines 6 and 9 of Algorithm 2. By contrast, QRF performs the three quick rejection tests, one for each summand of Equation (
2), before the computation of the minimum distance, between lines 3 and 6 of Algorithm 3. In this case, the scalar products
are stored in line 4 and eventually employed to compute
in the following loop.
Algorithm 2 QRI |
- 1:
functionQRI() - 2:
▹ initialising minimum distance - 3:
for do - 4:
- 5:
if () then - 6:
if () return ▹ quick rejection test - 7:
- 8:
else if () then - 9:
if () return ▹ quick rejection test - 10:
- 11:
end if - 12:
end for - 13:
if () return ▹ checking overlap - 14:
return - 15:
end function
|
Algorithm 3 QRF |
- 1:
functionQRF() - 2:
▹ initialising minimum distance - 3:
for do - 4:
- 5:
if ( or - 5:
) return ▹ quick rejection test - 6:
end for - 7:
for do - 8:
if () then - 9:
- 10:
- 11:
else if () then - 12:
- 13:
- 14:
end if - 15:
end for - 16:
if () return ▹ checking overlap - 17:
return - 18:
end function
|
The different location of the quick rejection tests in QRI and QRF is expected to determine a difference in the efficiency of the two algorithms, which is analysed in detail in
Section 4. Additionally, QRI and QRF quick rejection tests depend on both
and
R, so these algorithms’ efficiency are expected to be influenced also by sphere and cuboid geometry. Finally, keeping in mind the potential application in computational colloid science, where crowded systems are usually simulated, the efficiency of QRI and QRF is also influenced by the system packing, which determines the probability for an attempted move to produce an overlap.
Larsson et al. also proposed a parallel version of Algorithm 1, generalised for randomly oriented cuboids and using SSE intrinsic functions (SSE-intr) [
7]. SSE is an instruction set available in x86 architectures; it uses 128-bit registers to process simple instructions on multiple data in parallel [
45]. By substituting the
if statements in lines 8 and 11 of Algorithm 3 to compute the minimum distance, with the
max and
min functions available in SSE instruction set, the computation of the minimum distance can be vectorized. This algorithm, running in parallel and thus significantly faster than QRI and QRF, does not need quick rejection tests. A pseudocode for this algorithm, here named after the SSE instruction set, is presented in Algorithm 4 for the general case of randomly oriented cuboids.
Algorithm 4 SSE-intr |
- 1:
functionSSE() - 2:
for do - 3:
▹ vectorising the dot product - 4:
end for - 5:
for do ▹ vectorising the cycle - 6:
- 7:
- 8:
end for - 9:
if return ▹ checking overlap - 10:
return - 11:
end function
|
Finally, we present our own algorithm, which incorporates a number of elements providing additional efficiency when compared to Algorithms 1, 2 and 3, and versatility when compared to Algorithm 4. A new element that significantly simplifies the algorithm is the use of the absolute value to estimate the minimum distance. In addition, we employed OpenMP directives for an SIMD parallelization of the two loops, one over the computation of the dot products of the distance of the centres of mass of the two particles with the orientation of the cuboid, and the other over the computation of the minimum distance, without using SSE intrinsic instructions. OpenMP is an application programming interface specification composed of compiler directives, library routines and environment variables for parallel programming in Fortran and C/C++. From version 4.0, it provides mechanisms to assist SIMD parallelization of loops [
46]. The advantage of avoiding explicit SIMD vectorization is the possibility to vectorize the algorithm using different instruction set architectures, such as the more modern Advanced Vector Extensions (AVX) instruction set [
47], by simply changing compiler settings during compilation. Moreover, in this way, vectorization of the algorithm can be assisted for different programming languages, e.g., Fortran, since SIMD intrinsic functions are available only in C and C++. Given the heterogeneous nature of the communities using collision-detection algorithms and their preference for likely different programming languages, an user-friendly code is a crucial advantage. Our algorithm, referred to as Oriented Cuboid Sphere Intersection (OCSI), proved to be efficient and functional for both C and Fortran 90 (F90). Its pseudocode is presented in Algorithm 5.
Algorithm 5 OCSI |
- 1:
functionOCSI() - 2:
for do ▹ this cycle is vectorised - 3:
- 4:
end for - 5:
for do ▹ this cycle is vectorised - 6:
- 7:
- 8:
end for - 9:
if return ▹ checking overlap - 10:
return - 11:
end function
|
3. Computational Details
To test the relative performance of the above algorithms, we have developed two versions of the same program in C and in F90 that detect collision between one cuboid and one sphere. For compatibility with the benchmark program by Larsson et al., all the floating point variables are expressed in 32-bit single precision. The dimensions of the cuboid and sphere are given in units of the cuboid thickness
T, which is our unit length, and do not change within the same detection-collision test. In particular, the colloid length and width are
and width
, respectively, whereas the sphere radius is
. For each of the cuboid shapes analysed, we have pondered the impact on the algorithms’ efficiency of changing the sphere radius between
and
. To control the value of the acceptance ratio, i.e., the percentage of random configurations that do not produce overlaps, the sphere
is generated within a spherocuboid. This spherocuboid, centred and oriented as
, results from the Minkowski addition [
48] of
and a sphere larger than
and whose diameter is optimized to obtain the desired acceptance rate. A dedicated program optimises the volume of the spherocuboid according to the target value of the acceptance ratio and the dimensions of both
and
, which are specified as input parameters. To generate a configuration,
is initially aligned with the reference axes and its centre is set as origin, while the centre of
is randomly positioned within the volume of the spherocuboid. Then, the reference system is randomly rotated and the cuboid-sphere overlap checked. For consistency, the section of the code that calls the overlap function is the same as that proposed by Larsson et al. [
7]. The time spent by each algorithm to detect collisions for a specific case of cuboid and sphere (in term of radius of the sphere and dimensions of the cuboid) is computed for 3 independent sets of
configurations and then averaged out. The efficiency of the algorithms has been assessed on a Lenovo ThinkCentre M920s Desktop PC, with 8 Gb of DDR4 RAM and Intel
® Core™ i5-8500 CPU @ 3.00GHz (Coffee Lake) CPU with 9 Mb of cache, with Ubuntu 18.04 Desktop version OS. In order to prove the versatility of our algorithm, we performed benchmarks using two different compilers. In particular, we compiled the F90 and C/C++ versions of the program using Intel
® Fortran and C Compilers version 19 [
49] and GNU Fortran and C++ Compilers version 10 [
50]. Both the compilers used OpenMP API 4.5 Specification for vectorization [
51]. In addition, for all the cases listed above, we compiled two versions of the same program, enabling the generation of SSE or AVX instructions. In this work, configurations of cuboids with
,
and spheres with
with an average acceptance ratio of 40% have been tested.
4. Results and Discussion
Due to the large number of benchmarks performed, we intended to report here the behaviour of the run-time efficiency of the algorithms with respect to the shape of the cuboid and the sphere only for the programs compiled using Intel
® C and Intel
® Fortran Compiler, enabling the use of AVX instruction set for SIMD parallelization. The dependence of the algorithms run-time with respect to the shape of the cuboid and the sphere is generally similar for all the compilers and the instruction sets specified during compilation. All the results obtained for the other cases are reported in the
Supplementary Information. The relative performance of each algorithm is assessed in
Figure 2 and
Figure 3 for codes written in C and F90, respectively.
Figure 2 offers a benchmark between QRI, QRF, SSE-intr and OCSI, while
Figure 3 only for QRI, QRF and OCSI, since SSE-intr uses specific Intel
® intrinsic functions: these sets of functions enable to use SIMD instructions (like SSE and AVX) without the need of an assembly code for vectorization, but they are available only for C programming language. Both figures report the run-times for detecting collisions between one cuboid of
and
and one sphere of radius
(frames
a), 0.5 (frames
b) and 5 (frames
c). The
random configurations tested per run have been produced at the constant acceptance ratio of 40%, which is within the usual range of values employed in Metropolis Monte Carlo simulations of hard-core particles [
52]. It is evident that SSE-intr and OCSI perform significantly better than QRI and QRF under the conditions specified here, although we have also tested cuboids of larger length and width (up to
) with the same acceptance ratio and observed very similar tendencies. The difference in performance is especially evident at
as SSE-intr and OCSI run-times are up to 5 and 6 times faster than QRF and QRI, respectively. In general, C codes show a better performance than F90 codes, although this difference is not substantial. Interestingly enough, SSE-intr and OCSI do not present any relevant dependence on the cuboid and sphere geometry, being the run-times basically constant across the whole range of dimensions. This is probably due to the SIMD parallelism implemented, different from QRI and QRF, which have to run in serial for their use of quick rejection tests (see lines 6 and 9 in QRI and line 5 in QRF). Basically, if the quick rejection test is true for the first dot product, the algorithms exit the loop with negative result (
and
do not overlap) with no need to compute the remaining two.
The geometry of both cuboid and sphere exhibits a very intriguing effect on the performance of QRI and QRF as the shape of the run-time surfaces in the
plane suggests. More specifically, for spheres with
(frames
b in
Figure 2 and
Figure 3) the time required for the collision detection decreases upon increasing the cuboid dimensions, with the shortest time detected at
(disk-like cuboid). Larger spheres, with
(frames
c in
Figure 2 and
Figure 3), induce a different performance resulting in an opposite concavity of the run-time surface as compared to that observed for smaller spheres. In this case, for the results obtained using Intel
® Compilers and specifying AVX instruction set during compilation, the slowest detection is measured at
and
for QRI and QRF in C/C++ program, respectively, and
and
for QRI and QRF in F90 program, respectively. For the parameters set in these benchmarks, in terms of acceptance ratio and shapes of cuboids and spheres, QRF is generally faster than QRI. The only exceptions to this tendency are observed for the C/C++ program compiled either with Intel
® C Compilers with AVX instructions (panel (a) of
Figure 2) or with GNU C++ Compiler with SSE instructions (see
Supplementary Information), in both cases when the spheres are especially small (
). The difference in performance between QRI and QRF might also be due to how data are stored and read by C/C++ and F90 compilers. As a matter of fact, Larsson and coworkers had already noticed that the run-times of QRI and QRF were very similar for acceptance ratios of approximately 50%, although their collision-detection method tests run-times for sets of configurations with cuboids and sphere of random dimensions [
7]. Despite the main differences between C /C++ and Fortran programming languages, the average run-time performance of QRI and QRF with respect to the radius of the sphere available in C is similar to the ones we translated and provide also in Fortran in our benchmark source code.
To more easily compare the efficiency of the algorithms tested, the run-times computed for each possible combination of cuboid and sphere size studied here have been averaged out for each value of
. The resulting averaged run-times for all these cases, which are 400 considering all the possible combinations of
and
of the cuboids, are reported in
Table 1 and
Table 2. For every averaged run-time reported in both tables, we evaluated also its standard deviation, which resulted to be less than 0.5 ms for all the cases. For comparison with benchmarks performed by Larsson et al., the run-times are reported with a precision of 1 ms [
7]. We stress that these average run-times should be taken as indicative values for QRI and QRF as their speed strongly depends on the cuboid geometry. We observe that QRI and QRF average run-times tend to be longer for larger spheres, with no significant difference between C/C++ and F90. In contrast, both SSE-intr and OCSI are completely insensible to the sphere radius as no relevant change in their average run-time is detected between
and 0.5.
Regarding the performance of OCSI, we notice that its C version is faster than the F90 version for both the compilers. Moreover, checking the optimization report of the two compilers, we observed that GNU compilers were not capable of vectorizing the first loop of OCSI over the dot products. This seems to be the reason why Intel® Compilers performed better in terms of run-time efficiency. Anyway, except the F90 version compiled with GNU Fortran Compiler, for all the other cases the average OCSI run-time to analyse cuboid–sphere configurations oscillates between 10 and 12 ms. Even for the worst-case scenario, OCSI is still faster than QRI and QRF.
We also notice that the performance of OCSI compiled using SSE and AVX instruction sets is very similar to that of SSE-intr, with our algorithm approximately 1 ms faster or slower than SSE-intr, depending on the compiler and the instruction sets called during compilation. This difference, per se, would not be especially significant if the overlap checks were limited to configurations. However, the typical number of configurations generated in Monte Carlo simulations of colloids is usually a few millions per particle, which are rarely less than a few thousands. Moreover, because colloids can be especially dense systems, one random configuration might generate more than a single collision. Consequently, it is reasonable to assume that, within a single Monte Carlo simulation, a collision-detection algorithm might be called between and times the configurations explored here. This would produce a difference of seconds between OCSI and SSE-intr, which is still not especially relevant.
The main advantage of using OCSI is that it is based on automatic vectorization and employs OpenMP libraries to be parallelized, making it a very user-friendly algorithm. Since SSE-intr uses intrinsic functions that are specific for SSE, this version of the algorithm for cuboid-sphere collision detection is limited only to that instruction set. Moreover, Intel® intrinsic functions are available only in C and cannot be used in Fortran, unless we compile a mixed C/Fortran program. In contrast, OCSI is based on automatic vectorization by the compiler, guided using OpenMP pragmas and directives. In this way, the algorithm can be extended to different instruction sets without changing the source code, simply specifying the instruction set during compilation. It can also implement vectorization in Fortran and extend its use to 64-bit floating point arithmetic, which is commonly used in molecular modelling. OCSI proved to be efficient for the most two common compilers, for two different programming languages and for two different instruction sets, SSE and AVX, highlighting its versatility with respect to SSE-intr.
5. Conclusions
In summary, we have benchmarked four different collision-detection algorithms that check the occurrence of overlaps between one cuboid and one sphere. Our analysis focused on a specific acceptance ratio, which is within the usual range applied to efficiently sample the configuration space of hard-core systems in Monte Carlo simulations [
52]. We notice that SSE-intr has been previously tested for different acceptance ratios and did not show relevant changes in performance [
7]. A similar tendency is also expected for OCSI, but should be confirmed by further tests. While QRI and QRF are observed to be geometry-dependent, SSE-intr and OCSI are basically insensible to the cuboid anisotropy and sphere radius and, thanks to automatic vectorization, they are also significantly faster. According to these results, we expect OCSI and SSE-intr run-times to be constant also for bigger spheres, while QRI and QRF run-times can show a different behaviour than the ones observed so far. To ascertain these tendencies, further tests should be performed. In particular, the OCSI algorithm proved to be especially valuable in terms of performance and simplicity of implementation in both C and F90. Intel ®compilers and GNU Compilers were not able to automatically vectorize QRI and QRF. Anyway, there are ways to perform conditional statements like the ones used in the quick rejection test implementing SIMD parallelism [
53]. Whether or not vectorized versions of QRF and QRI are possible and, if so, how efficient they would be as compared to the current versions requires further study.
It should be stressed that the method applied to generate the sphere around the cuboid is crucial to provide a robust comparison between different algorithms. The choice of the spherocuboid as a sampling volume allows to precisely set the desired acceptance ratio and guarantees that the algorithms are tested for all the possible positions of the sphere around or inside the cuboid. This is especially relevant to fairly assess the performance of QRI and QRF, due to their use of quick rejection tests. In Monte Carlo simulations, where the generation of configurations follows a different procedure, the performance of collision-detection algorithms, most likely affected by the degree of system order and packing, should be tested. Finally, it is important to mention that the OCSI algorithm allows for the calculation of the cuboid-sphere minimum distance, hence suggesting a future study to determine a suitable interaction potential beyond mere hard-core interactions. The formal proof reported here can also be useful to test the intersection of cuboids with particles of different shape. As a final note, we stress that our algorithm has been only tested for specific pairs of geometries (cuboids and spheres), while it might be relevant, in computational colloid science as well as in computer graphics, to assess its performance with other geometries. We are currently working on extending our algorithm to detect collisions between cuboids and oblate or prolate spherocylinders. It would also be especially interesting to investigate to what extent our methodology could be applied to more complex geometries, whose collision is generally detected by more sophisticated decomposition techniques, such as e.g., Voronoi diagrams [
54] or convex polygon triangulations [
55,
56]. We hope that our contribution might stimulate further research in this direction.