1. Introduction
A virtual TV set allows real actors and/or objects to be composited with computer-generated elements, or even full environments, in real time. Nowadays, this technique is used by most broadcast TV channels (even in live programs) and in cinema (in post-production) to place actors in locations where they are not, and even to make them interact with virtual objects or actors (Figure 1). The origin of this technology dates back to the 1930s, when a patent filed by Goldsmith [1] described the process of “insetting” the information from one television camera into the signal from another camera to produce a composite result. The term “inset” described the process in which an area of a live or recorded video signal (generally an actor) is used to delete the corresponding area from a second camera image (generally a real or synthetic background), thus insetting the first camera’s picture precisely into the “hole” created in the second camera’s signal.
Early television was monochrome, so brightness variations in the signal were used to delimit the areas to be erased. With the advent of color television, this technique evolved into chroma-keying [2], which uses hue differences to determine whether an area belongs to the background or the foreground of the image (Figure 1). It is a very efficient approach: the color of each pixel is compared with the chroma key (reference hue), using a tolerance threshold so that all similar tones, and not only identical ones, are erased. Its main disadvantage is that, because it is based on color, the key hue can appear only in the elements that must be deleted from the final composition; actors’ clothes or objects of the same tone cannot be preserved. For this reason, there have been many attempts since its beginnings to replace or improve chroma-keying using different approaches. For example, Yamashita et al. [3] proposed striped cycloramas painted in two different colors, Agata et al. [4] tried a checker pattern, and the company LEDchromaKEY uses a reflective screen that requires no additional illumination, as it works with an LED ring mounted around the camera lens [5]. Other approaches use depth information to separate background and foreground, using multiple cameras [6] or time-of-flight IR cameras [7]. However, the greater complexity of these approaches, along with their limited advantages over chroma-keying, means that chroma-keying remains the most common approach for compositing real-life productions.
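As an illustration of the pixel-by-pixel comparison described above, the following Python/NumPy sketch builds a hard matte by thresholding the Euclidean RGB distance between each pixel and the key color. The function names and the use of plain RGB distance are assumptions made for illustration; production keyers typically work in other color spaces and produce soft (partially transparent) mattes.

```python
import numpy as np


def chroma_key_matte(image, key_rgb, threshold):
    """True where a pixel is close enough to the key color to be erased.

    image:     H x W x 3 RGB array (e.g. 0-255 values)
    key_rgb:   reference key color on the same scale
    threshold: largest Euclidean RGB distance still treated as the key
    """
    diff = np.asarray(image, dtype=float) - np.asarray(key_rgb, dtype=float)
    return np.linalg.norm(diff, axis=-1) <= threshold


def composite(foreground, background, matte):
    """Replace the keyed-out pixels of the foreground with the background."""
    return np.where(matte[..., None], background, foreground)
```

Raising `threshold` erases a wider range of tones around the key, which is exactly the trade-off between tolerance and precision discussed in the next section.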
In a virtual TV set, the quality of the final integration depends both on the quality of the chroma-keying and on other elements, such as the interaction between the presenter and the virtual environment. In this paper, we focus on improving and simplifying the chroma-keying process by ensuring homogeneous illumination of the cyclorama surrounding the stage, keeping the hue variation along its surface as small as possible. Color differences may originate in the paint (e.g., humidity or a poor painting technique), in which case the best solution is to repaint the cyclorama, or in incorrect illumination that causes shadowed or over-lit areas and requires recalibrating the lights. The latter case is the most common, as lamps degrade over time and light positions may be altered by day-to-day work on the set. For this reason, the lighting configuration must be recalibrated periodically.
The calibration of the cyclorama illumination is generally performed by an expert who uses a photometer to take several samples across its surface and check that the readings are consistent. It is a tedious manual process that leaves both the sampling and the interpretation of the data to the expert, which makes it unviable for non-experts. Moreover, the need for an expert makes it an expensive solution, as one must be hired or a member of the virtual TV set staff must be trained for that purpose.
The main contribution of this paper is a protocol that makes it possible for non-experts to configure correct illumination for any cyclorama, using a visual tool that simplifies the detection of shadowed and over-lit areas.
The next section summarizes the background and related work to give an overview of the problem and of the current calibration process. Our solution and its implementation are then presented. Section 4 discusses the tests performed. Finally, the conclusions and future work are presented.
2. Background and Related Work
As explained earlier, one of the main elements of a virtual TV set is the monochrome cyclorama that surrounds the stage to allow the use of chroma-keying. Any color can be used as the key; the main requirement is that it differ from the hue of every element that must be preserved in the final composition. The most common chroma-key colors are saturated green and blue. Blue was traditionally more common with analog cameras, as it produced fewer reflections and is less prevalent in nature and in human skin. Nowadays, green prevails, as the sensors of digital video cameras are more sensitive to this hue [8]. This higher sensitivity of the green channel makes the captured image less noisy, reducing the amount of light needed for the cyclorama [9] and thus avoiding unwanted green reflections (spill) on the elements on the stage.
The green area of a stage may range from a simple screen to a flat backdrop or a complete cyclorama surrounding the stage (Figure 2 top). However, the most usual configuration in a professional virtual TV set is a green room with three walls (both sides and the back) and the floor. One of the main characteristics of this kind of cyclorama is the absence of sharp edges at the junctions between the walls and the floor, which avoids shadows that would produce artifacts in the final composition.
Besides the physical construction of the stage, the chroma-keyer itself must be considered to improve the chroma-keying process. It receives three video signals: the first from the cameras, the second with the computer-generated elements, and the third with a mask that indicates whether a virtual element is in front of or behind the real scene. It then uses these three signals to create a composite image combining real and synthetic elements, using the color information captured by the cameras and the position information in the mask. The chroma-keyer can be implemented in software (as a computer program) or in hardware (using dedicated devices). Regardless of its implementation, the process can be configured to optimize its behavior for each specific TV set; mainly, both the base hue to be erased and the threshold of accepted color variability can be adjusted. Increasing the threshold raises the tolerance to color variations, admitting poorer background illumination at the cost of less precision in the hue being erased (Figure 2a). As remarked before, to reduce the threshold and thus improve the performance of the chroma-keying process (Figure 2b), the cyclorama illumination must be as diffuse and uniform as possible over its whole surface. Shadows and highlights that cause tone variations should be avoided, as they produce noise and artifacts in the final composite image, forcing a higher threshold and thus reducing the precision of the process.
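The following sketch illustrates, under simplifying assumptions, how a keyer might combine the three signals described above, and why uneven illumination forces a larger threshold: the smallest threshold that still erases every sampled cyclorama pixel is the largest distance between those pixels and the key. The function names, the Euclidean RGB metric and the hard (binary) matte are hypothetical simplifications, not a description of any particular keyer.

```python
import numpy as np


def key_distance(image, key_rgb):
    """Per-pixel Euclidean RGB distance to the key color (assumed metric)."""
    diff = np.asarray(image, dtype=float) - np.asarray(key_rgb, dtype=float)
    return np.linalg.norm(diff, axis=-1)


def three_signal_composite(camera, cg, cg_in_front, key_rgb, threshold):
    """Hypothetical hard-matte keyer combining the three input signals.

    camera:      H x W x 3 camera signal
    cg:          H x W x 3 computer-generated signal
    cg_in_front: H x W boolean mask, True where a virtual element is in front
    """
    keyed_out = key_distance(camera, key_rgb) <= threshold  # cyclorama pixels
    show_cg = keyed_out | cg_in_front  # CG shows through the key or in front of it
    return np.where(show_cg[..., None], cg, camera)


def smallest_workable_threshold(background_samples, key_rgb):
    """Smallest threshold that still erases every sampled cyclorama pixel."""
    return float(key_distance(background_samples, key_rgb).max())
```

For example, a background whose samples stay within a few units of the key needs a threshold of only a few units, whereas a single shadowed sample pushes the required threshold up to its own distance from the key, and every tone within that distance is erased as well.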
Therefore, to adequately separate the background from the elements to be preserved in the final composition, the cyclorama lighting must be uniform and diffuse, avoiding highlights and shadows. As a general rule, the lighting should help increase the color difference between the actors and objects and the monochrome background [10]. To achieve this goal, the most common solution in commercial virtual TV sets consists of two separate lighting setups: one for the actors, props and scenery (usually based on the traditional configuration of frontal, side and back lighting) and another for the cyclorama. Fresnel spotlights are usually used for the frontal scene illumination [11], as they create good-quality lighting (neither too hard nor too soft). LED lighting is not recommended, for reasons such as its limited light quality and consistency. When a more complex effect is required, video projectors can also be used as light sources [12]. For the background, large white light sources are used, usually fluorescent tubes. The ideal configuration uses special tubes that emit saturated green or blue light (depending on the color of the cyclorama) and eliminate red [13]. To ensure that the scene is correctly separated from the background, the cyclorama should receive approximately 2/3 of the light intensity used for the actors, props and scenery [14].
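The 2/3 rule can be checked numerically from photometer readings. The sketch below is hypothetical: the function name, the lux values in the usage note and the 10% tolerance are illustrative assumptions, not values taken from the cited sources.

```python
from statistics import mean


def check_cyclorama_level(cyclorama_lux, subject_lux, target_ratio=2 / 3, tolerance=0.10):
    """Compare cyclorama photometer readings against the ~2/3 rule.

    cyclorama_lux: readings (lux) sampled across the cyclorama surface
    subject_lux:   illuminance (lux) measured on the acting area
    tolerance:     accepted relative deviation from the target (hypothetical value)
    Returns the target level, the mean cyclorama/subject ratio and the indices
    of the readings that deviate from the target by more than the tolerance.
    """
    target = target_ratio * subject_lux
    outliers = [i for i, lux in enumerate(cyclorama_lux)
                if abs(lux - target) / target > tolerance]
    return target, mean(cyclorama_lux) / subject_lux, outliers
```

With a subject illuminance of 1200 lux, for instance, the target for the cyclorama is 800 lux, and readings of 650 lux or 930 lux would be flagged as deviating by more than 10%.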
The traditional illumination calibration process which, as noted previously, is carried out by an expert, is iterative and tedious. It may take hours and must be repeated periodically, because the lighting changes over time due to lamp degradation, changes in the frontal lighting, displacement of the light sources, etc. These problems are especially relevant in mobile virtual TV sets and low-cost productions, where changes are more frequent and there is neither time nor money for an expert to perform a proper calibration.
Therefore, developing a custom computer tool (linked to a new calibration process) that eases the calibration of the cyclorama illumination by non-experts is an attractive solution. We have not found any paper in the literature that deals directly with this problem, but we have drawn inspiration from the field of computer visualization, where numerous techniques have been developed to extract the illumination characteristics of a real-world environment and use them in a virtual one. Generally, these techniques analyze the light distribution in the real environment and obtain a discrete number of samples that are able to simulate it in the virtual world. The methods used to perform this sampling range from recovering the reflectance properties of the surfaces from a set of photographs [15], using both the surface reflectance and the light source energy [16], using compact factored representations of the BRDF [17], performing stochastic ray tracing of direct illumination and importance sampling of the product of distant lighting and surface reflectance [18], introducing dynamic restriction of the sampling domain and balancing the number of samples [19], stratifying the maps into rectangular regions and estimating the contribution of each region using Monte Carlo integration [20], determining quadrature rules for computing the direct illumination of diffuse objects [21], using hierarchical stratification algorithms [22] and generating sampling patterns from the importance density [23], to subdividing the images into rectangular regions of similar luminance [24]. From this extensive list of methods, we take inspiration from Paul Debevec’s median cut algorithm for light probe sampling [24]. This is a well-known, fast algorithm of low computational cost that divides a High Dynamic Range (HDR) image of a real environment into regions of similar luminance (light energy) and computes a representative light source for each region, placing it at its centroid (the point that has equal luminance above and below it and to its left and right within the region). The source color is set to the average value of each of the three color channels over the pixels in the region. These point sources are then used to simulate the real light pattern in a virtual environment by mapping them onto a virtual sphere surrounding the synthetic elements. We have chosen this algorithm for its low computational cost, which allows interactive use, and for the simplicity of its underlying idea: dividing an image into 2^n regions with the same light energy.
As our goal is to achieve homogeneous illumination of the cyclorama, and the median cut sampling algorithm does not directly provide this information, we have reinterpreted its output (regions of similar luminance, centroids and representative colors). A homogeneously illuminated cyclorama analyzed with the median cut sampling algorithm should meet three main conditions:
All regions must have the same size. If the illumination is homogeneous, every region contains the same number of pixels.
The centroid of each region must be at its geometric center. If the illumination inside a region is homogeneous, the resulting centroid must coincide with the center of the region.
The color of the light sample associated with each region must be equal to the key color used in the chroma-keying process. If the illumination is constant, there are no differences between the representative colors of the regions.
These three conditions describe a perfect, homogeneous light distribution that is very unlikely to be fully achieved in practice. Therefore, the main objective of the cyclorama illumination calibration process must be to get as close as possible to this configuration. To achieve this goal, a new calibration process, presented in the next section, has been designed.
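The three conditions can be turned into numbers. The sketch below is one possible formulation, not necessarily the one implemented in the authors’ tool: it takes per-region areas, centroids, geometric centers and colors (for example, from a median cut implementation) and reports the coefficient of variation of the areas, the mean centroid offset as a percentage of the side of a square ideal region (an assumption about how “side” is measured), and the distance of the region colors to the key. Population standard deviations are used.

```python
import numpy as np


def homogeneity_report(areas, centroids, centers, colors, key_rgb):
    """Summarise how far a median-cut subdivision is from the ideal case.

    areas:     number of pixels of each region
    centroids: (y, x) luminance-weighted centroid of each region
    centers:   (y, x) geometric center of each region
    colors:    representative RGB color of each region
    key_rgb:   key color configured in the chroma-keyer
    """
    areas = np.asarray(areas, dtype=float)
    ideal_area = areas.mean()         # total area / number of regions
    ideal_side = np.sqrt(ideal_area)  # assumes a square ideal region

    # Condition 1: equal sizes -> coefficient of variation of the areas (%).
    area_cv = 100.0 * areas.std() / ideal_area

    # Condition 2: centroid at the geometric center -> offset in % of the ideal side.
    offsets = np.linalg.norm(np.asarray(centroids, float) - np.asarray(centers, float), axis=1)
    offset_pct = 100.0 * offsets / ideal_side

    # Condition 3: region colors equal to the key -> distance to the key, per-channel CV (%).
    colors = np.asarray(colors, dtype=float)
    key_distance = np.linalg.norm(colors - np.asarray(key_rgb, float), axis=1)
    channel_cv = 100.0 * colors.std(axis=0) / colors.mean(axis=0)

    return {
        "area_cv_pct": float(area_cv),
        "mean_offset_pct": float(offset_pct.mean()),
        "mean_key_distance": float(key_distance.mean()),
        "channel_cv_pct": channel_cv,
    }
```

In the ideal case described above, all three area, offset and key-distance figures would be zero; in practice they measure how far a given lighting setup is from that ideal.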
4. Tests
To prove the validity of the proposed calibration process, two sets of tests were performed. The first focused on correcting the real lighting conditions of a virtual TV set using our workflow (with an expert validating the results); the second consisted of altering the lighting conditions of a virtual TV set cyclorama to observe the results for known artifacts, creating a set of example cases to illustrate the user manual for non-experts. Both sets of tests focused on proving the effectiveness of the visual analysis of the generated images, as well as the correspondence between these visual results and the statistical data obtained. Among all the statistical data, we focused on the standard deviation of the area (number of pixels) of the regions. As all regions should have a very similar area, this standard deviation should tend to zero. To make its magnitude easier to understand, we interpreted the results using the coefficient of variation (standard deviation divided by the mean), expressed as a percentage. The process was applied dividing the images into 64, 128, 256, 512 and 1024 regions, to compare the results and verify whether the number of divisions had any impact on the analysis.
The study was carried out in a virtual TV set at the Faculty of Communication Sciences of the University of Santiago de Compostela (Spain). The set has a green cyclorama 5.10 m wide, 3.3 m long and 3 m high surrounding a manually illuminated acting area. At the beginning of the test process, the cyclorama lighting presented several visible artifacts caused by time and use since its last calibration. As previously mentioned, the first set of tests focused on correcting these artifacts.
The first iteration of the calibration process was performed to check the original state of the illumination. After applying the first two steps, a set of images was obtained for each number of subdivisions (Figure 8a). In these images, the differences between region areas are noticeable, with the regions on both sides slightly bigger than those in the center (Figure 8a left). This observation is confirmed by Figure 8a right, which represents the centroid correspondence: centroids accumulate at the top and center of the image, while their number decreases at the sides. From these two elements we can empirically conclude that there is too much light in the center and top of the cyclorama and that there are shadows on both sides. Figure 9 represents the number of ideal centroids with 0, 1, 2 or more than two corresponding real centroids. The percentage of ideal centroids with no correspondence is around 10% in all cases, and higher in the 512- and 1024-region configurations.
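One way to compute correspondence counts like those in Figure 9 is sketched below, assuming that each real centroid is matched to its nearest ideal centroid (for instance, the geometric centers of the regions that a perfectly uniform image would produce). The matching rule and the function name are assumptions made for illustration.

```python
import numpy as np


def correspondence_histogram(ideal_centroids, real_centroids):
    """Match every real centroid to its nearest ideal centroid and count matches.

    Returns the match count per ideal centroid and the fraction of ideal
    centroids with 0, 1, 2 and more than 2 corresponding real centroids.
    """
    ideal = np.asarray(ideal_centroids, dtype=float)
    real = np.asarray(real_centroids, dtype=float)
    # Pairwise distances (n_real x n_ideal): small enough for up to ~1024 regions.
    dist = np.linalg.norm(real[:, None, :] - ideal[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(ideal))
    summary = {
        "0": float(np.mean(counts == 0)),
        "1": float(np.mean(counts == 1)),
        "2": float(np.mean(counts == 2)),
        ">2": float(np.mean(counts > 2)),
    }
    return counts, summary
```

In a shadowed area, ideal centroids receive no matches (count 0), whereas in an over-lit area several real centroids collapse onto the same ideal centroid (count 2 or more).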
From a statistical point of view, the coefficient of variation of the region sizes goes from 25.5% when the image is divided into 64 regions to 27.6% for 1024 regions (Figure 10). This means that the typical deviation of the region sizes from the average area (which corresponds to the ideal area) lies between these two values. In summary, the size of the regions varies by roughly 26% to 28% of the ideal area, depending on the number of regions.
The average distance between the geometric centers of the regions and the centroids representing them is shown in Figure 9. These distances range from 4% to 2% of the side of the ideal region, with a coefficient of variation between 3% and 2%. According to this, the largest average distance between a centroid and its corresponding center is 6.67% of the side of the ideal region, a difference that is not noticeable in a visual examination of the results. This is due to the uniformity of the illumination inside the regions, as no large artifacts are contained within a single region: the smaller the region, the lower the probability of finding large luminance differences inside it.
The average color and standard deviation of each channel, on a 0–255 scale, are shown in Figure 10. From these data we can conclude that the amount of green is around three times the amount of red and four times the amount of blue, which is a good configuration for chroma-keying (as one channel stands out from the other two). The coefficient of variation is under 4% for red, under 2% for blue and under 8% for green, the dominant channel.
After this analysis, the illumination was modified so that the luminance increased on both sides of the cyclorama to match the amount of light in the center. An expert verified the new configuration using the traditional photometer procedure, confirming that the information obtained through the proposed calibration process is valid for detecting and correcting light variations in the cyclorama.
Then, a second iteration of the calibration process was performed to check the improvements and to establish reference data from an illumination validated by an expert. An example of the resulting images is presented in Figure 8b.
Visually, the results show an improvement in the uniformity of the regions, although a difference remains between the regions on both sides and those in the center. In terms of centroid correspondence (Figure 9), the results are similar.
Statistically, the coefficient of variation is reduced by between 4 and 6 percentage points (depending on the number of regions into which the image is subdivided), going from 19.7% for 64 regions to 23.3% for 1024 regions (Figure 10). This led us to suggest that a coefficient of variation under 25% may be acceptable for current industry standards of illumination quality.
The average distance between the geometric centers of the regions and the centroids representing them (Figure 9) is very similar to the original one, with a noticeably lower coefficient of variation in all cases. This reflects the increased uniformity of the luminance, although the values are so small that the differences are visually indistinguishable.
The average color and coefficient of variation (
Figure 10) show a proportional increase in the numbers, keeping the relations between the three channels.
Although an expert confirmed the validity of this lighting configuration, some small lighting differences were still visible in the images, so the illumination was modified again to explore whether the results could be improved beyond those obtained with the manual procedure. Thus, a third iteration of the process was performed. Some visual results are presented in Figure 8c. As these images show, the uniformity of the regions has increased, and the percentage of free centroids or centroids with more than one correspondence has decreased to under 10% in all cases (Figure 9).
Statistically, an improvement of between 5 and 8 percentage points is achieved, depending on the number of regions. As shown in Figure 10, the coefficient of variation of the region areas goes from 14.3% for 64 regions to 16.3% for 1024 regions. This represents an improvement of more than 10 percentage points with respect to the original worn-out configuration.
The average distance between the geometrical centers of the regions and the centroids representing them (
Figure 9) was reduced to under 2% in all cases but the 64 regions configuration (2.45%). The coefficient of variation is kept under 1.5% in all cases with the exception of the one with 64 subdivisions, in which it reaches 1.9%.
The average color and coefficient of variation do not show noticeable changes, as the proportions between the different colors are maintained.
Finally, the lights were recalibrated once more and a fourth iteration was performed, but it did not show any significant improvement, probably because the number and quality of the lights in the set did not allow a better configuration.
After proving the effectiveness of the calibration process in a standard situation, such as correcting the illumination of a cyclorama affected by use, it was necessary to test its ability to detect severe artifacts that could decisively affect chroma-keying quality. For this purpose, we created four configurations with distorted illumination and known characteristics to check whether both the visual and the statistical results were able to detect them.
The first artifact consisted of using a spotlight to create a heavily over-lit area in the right part of the cyclorama. Both the visual and the statistical results were analyzed. Visually, there is a concentration of small regions around the spotlight and many ideal centroids with more than one corresponding real centroid in that area (Figure 11a). This is corroborated by the data shown in Figure 12, where around 20% of the ideal centroids correspond to more than one real centroid, double the percentage of the previous tests, in which the artifacts were much subtler.
The coefficient of variation of the region areas reflects the large differences between the small regions on the over-lit surface and the big ones in the rest of the cyclorama. As a result, the values go from 41.4% for 64 regions to 45.8% for 1024 regions (Figure 13). This means that the region areas deviate from the ideal region size by almost half of it on average.
The average distance of the centroids from the geometric centers of the regions is also larger than in the previous tests (Figure 12). Moreover, the coefficient of variation of the distance increases considerably, reaching 4.2% in the 64-region configuration.
From the color point of view, the effect of this artifact is also noticeable in the increased variation of the three channels (Figure 13). With a mean green value of 148 and a standard deviation of 61.4, the coefficient of variation of the green channel is 41.5%. This implies that the concentrated light of the spotlight creates huge color differences between the over-lit areas and the shadowed ones.
The second artifact used the same spotlight but introduced a more centered distortion, to check the impact of the residual light around this area on the final result and the importance of the location of the light artifact. As can be seen in Figure 11c, the visual result is very similar to the previous one, with many small regions concentrated in the spotlight and bigger areas in the rest of the image. The behavior of the centroid correspondence is also very similar (Figure 12).
The statistical results do not show significant differences either, with a coefficient of variation of the region areas ranging from 43% for 64 regions to 46.5% for 1024 regions. The same happens with the color and the centroid distance (Figure 12 and Figure 13). Thus, displacing the spotlight does not make a relevant difference, as it creates the same artifact at a different point of the cyclorama without affecting the global results of the calibration process.
The third distortion consisted of using only two external, very diffuse actor lights to create a slightly over-lit area in the center of the image. The resulting images are very similar to those of the original time-affected configuration, with a central over-lit area (especially at the top of the cyclorama) and shadows on both sides (Figure 11d). The correspondence between real and ideal centroids ranges between 15% and 20% in all cases except the 64-region configuration, in which every ideal centroid has exactly one correspondent.
As expected, the statistical results are also very similar to those of the original configuration of the set, with a coefficient of variation ranging from 26.7% for 64 regions to 29.9% for 1024 regions. The three color channels keep their proportions (Figure 13). The coefficient of variation is lower than in the spotlight tests, as the artifact introduced is much subtler. The average centroid distance and its coefficient of variation are also very similar to those of the original configuration (Figure 12). This confirms the usefulness of the visual approach of the proposed calibration process, since visually similar images correspond to similar statistical results.
Finally, the fourth distortion used the diffuse actor lights to create the over-lit area in the center-left part of the cyclorama, to check whether the position of subtler variations had any influence on the results. The resulting images clearly show the over-lit areas on the left and at the top of the cyclorama. The main difference between this configuration and the previous one is the position of the light sources: whereas in the previous test the light came directly from a frontal position, in this case it comes diagonally from the right side of the stage, increasing the distance and, therefore, the illuminated area. However, the amount of light that reaches the cyclorama is the same, as the same light sources were used. As a result, the statistical data obtained are very similar in both cases, even though the artifacts introduced are visually different. The coefficient of variation in this case goes from 26.3% of the ideal area for 64 regions to 27.7% for 1024 regions. The percentage of ideal centroids with no correspondence goes from 10% to 15%, except in the 64-region configuration, in which, as in the previous test, every ideal centroid has exactly one correspondent (Figure 12). The average color and coefficient of variation are also very similar to those of the central diffuse light artifact (Figure 13). The average distance of the centroids from the geometric centers of the regions is also very similar, apart from the 64-region configuration, in which it decreases by more than 2% of the side of the ideal region. In any case, these differences are still too small to be noticeable at a glance.
5. Results and Conclusions
A calibration process to obtain homogeneous illumination of monochrome cycloramas has been developed. This workflow uses HDR images and a custom tool called CYIA which, through a new interpretation of the results of Paul Debevec’s median cut light probe sampling algorithm, makes it possible for a non-expert user to check the homogeneity of a cyclorama and detect the illumination artifacts that could affect the quality of the chroma-keying process. Figure 14 shows the importance of a homogeneous cyclorama illumination for chroma-keying and the improvement it brings to the quality of the resulting composite images.
Several tests have been performed using subdivisions into 64, 128, 256, 512 and 1024 regions to detect possible differences in the calibration results, from both a statistical and a visual point of view. Statistically, there is a common trend in the relation between the standard deviation of the region areas and the size of the ideal regions: the percentage of the ideal area represented by the standard deviation tends to increase slightly as the number of subdivisions grows. This is due to the decrease in the size of the regions, and the effect is proportional in all tests, with all graphs showing very similar slopes (Figure 10 and Figure 13). As this increase is similar in all cases, it can be disregarded, provided that two configurations are always compared using the same number of subdivisions.
From the visual point of view, the information tends to be clearer as the number of regions increases. However, when the regions become too small, size differences become less noticeable. In the range we used, the 512-region subdivision tends to be the clearest configuration for the visual interpretation of the results, as there are enough regions to easily detect accumulations, while they are still big enough to clearly show the size difference between over-lit and shadowed regions. The nearest-centroid correspondences do not show a direct relation between the number of regions and the percentage of ideal centroids with more than one correspondence, which shows that, from the statistical point of view, this measure is only useful when more obvious light artifacts are present in the cyclorama illumination. However, both types of images used in the tests proved very useful for visually detecting lighting artifacts. The subdivided image is very precise for large artifacts, in which the illumination differences create big accumulations of small regions in the over-lit areas of the cyclorama. On the other hand, the image showing the nearest-centroid correspondence eases the detection of more global variations, as in the shadowed areas many ideal centroids have no correspondent while in the over-lit areas each of them has several. This makes the images fast and easy to interpret and to use when designing the required changes in the lighting configuration.
The distance between the centroids and the geometric centers of the regions proved too small to be detected visually, as it represents, on average, less than 6% of the side of the ideal region even in the test cases with the largest artifacts. This is due to the monochromatic nature of cycloramas and to the subdivision process of the median cut algorithm, which makes the luminance inside each region tend to be homogeneous. Statistically, the difference is especially noticeable between the worst cases and the best ones.
From the point of view of the color analysis, the results show an increase in the standard deviation of each channel related to the magnitude of the artifact. Nevertheless, the proportions between the three channels remain stable in all tests. Although the color analysis does not show large differences between tests, its results should be used to configure the chroma-keyer: the key color from the average values obtained for the three channels, and the acceptance threshold from their standard deviation. This makes it possible to obtain the best chroma-keyer configuration for a specific cyclorama and illumination.
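As a sketch of how the color statistics might feed the chroma-keyer, the function below takes the mean of the region colors as the key and scales the per-channel standard deviation into a single tolerance radius. The factor `k` and the Euclidean radius are hypothetical choices; actual keyers expose their own parameters, which should be set according to their documentation.

```python
import numpy as np


def keyer_settings(region_colors, k=3.0):
    """Suggest a key color and a tolerance radius from per-region colors.

    region_colors: N x 3 array of representative RGB colors (one per region)
    k: hypothetical number of standard deviations to accept
    """
    colors = np.asarray(region_colors, dtype=float)
    key = colors.mean(axis=0)        # key color: mean of each channel
    spread = colors.std(axis=0)      # per-channel population standard deviation
    threshold = k * float(np.linalg.norm(spread))  # single Euclidean radius in RGB
    return key, threshold
```

A better-lit cyclorama yields a smaller `spread`, and therefore a tighter threshold that erases fewer tones close to the key.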
In summary, the tests show that dividing the images into 512 regions is enough, and that the most useful data for correctly interpreting illumination configuration errors are the coefficient of variation of the region sizes and the visual correspondence between ideal and real centroids.
The effectiveness of the proposed calibration process has been proved by recalibrating the illumination of a real virtual TV set cyclorama that had been altered by time and use. The first iteration revealed lighting differences across the cyclorama surface (through both the image-based visual approach and the statistical approach). This information was used to redirect the lights and modify their intensities in order to correct the illumination. An expert validated this new lighting configuration with the traditional method, by hand with a photometer. Even though the new illumination was good enough for proper chroma-keying, and the data obtained were set as a quality threshold (Table 1), the analysis of the images and the statistical results showed that the uniformity of the illumination could be improved. The lights were modified again using the information obtained, and the results of a third iteration showed that it could indeed be improved. This proved that the proposed calibration process allows the user to perform a finer-grained calibration than the traditional approach, thanks to its immediacy, its precision and the global view of the lighting conditions it provides, compared with the individual samples taken with a photometer. Moreover, it eases the visual interpretation of the results by non-expert users, as they only need to correctly interpret the meaning of the variations in region size and the relation between the positions of the ideal and real centroids. From the tests performed, we can summarize that, statistically, the coefficient of variation of the region size with respect to the ideal size should be under 25%, the average distance between the centroids and the geometric centers of the regions should be less than 3% of the length of the side of the ideal region, and the coefficient of variation of the chroma-key hue should be under 12% (Table 1).
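The acceptance limits summarized above (Table 1) can be applied as a simple pass/fail check. In this sketch, the names are hypothetical, and interpreting the chroma-key hue coefficient of variation as the coefficient of variation of the key channel (green, for a green cyclorama) is an assumption.

```python
# Limits summarised in the text (Table 1); all values are percentages.
TABLE1_LIMITS = {
    "area_cv_pct": 25.0,         # CV of region size w.r.t. the ideal size
    "centroid_offset_pct": 3.0,  # mean centroid distance, % of ideal side
    "key_channel_cv_pct": 12.0,  # CV of the chroma-key hue (assumed: key channel)
}


def meets_table1(area_cv_pct, centroid_offset_pct, key_channel_cv_pct, limits=TABLE1_LIMITS):
    """Return (passes, per-check results); each value must be strictly under its limit."""
    measured = {
        "area_cv_pct": area_cv_pct,
        "centroid_offset_pct": centroid_offset_pct,
        "key_channel_cv_pct": key_channel_cv_pct,
    }
    checks = {name: measured[name] < limits[name] for name in limits}
    return all(checks.values()), checks
```

For instance, the area coefficient of variation of 41.4% measured with the spotlight artifact (64 regions) fails the first limit on its own, regardless of the other two values.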
The effectiveness of the calibration process has also been tested by creating ad-hoc artifacts which have been detected easily using both the visual information and the statistical data. This confirms the effectiveness of the system in detecting light variations.
As a global conclusion, the calibration process has proved valid for checking the uniformity of the cyclorama illumination using only visual information, which speeds up the calibration and increases its precision. In cases in which the visual analysis is not enough, the statistical data make it possible to confirm the correctness and uniformity of the illumination. Currently, each iteration takes around 30 min, but this time could be reduced (we estimate by around 75%) by using a panoramic camera capable of taking 32-bit HDR images.