1. Introduction
Two-dimensional (2D) images are commonly used for product presentations in e-commerce because they can reveal the object’s texture and are easy to process. Because a single 2D image can display only a limited view of an object, one workaround is to capture hundreds of 2D images and present the object at any viewing angle via a web viewer [1]. However, storing and displaying so many images while maintaining high image quality imposes huge memory requirements. In addition, the actual three-dimensional (3D) shape and dimensions of the object cannot be obtained from this representation. 3D image-modeling technology reconstructs the 3D model of an object from multiple 2D images while maintaining its texture on the model (called a 3D textured model hereafter). If its texture quality can be made comparable to that of 2D images, this technology could replace 2D images for product presentations, because a 3D textured model requires less memory and can be freely oriented in 3D space.
Product presentation usually requires a dedicated photography device to capture high-quality object images with known position and orientation in 3D space. The object images can be obtained using a single-camera device, in which a digital single-lens reflex (DSLR) camera captures an object placed on a turntable, or a multi-camera device, in which several DSLR cameras mounted on an arm capture the object on the turntable from different angles. These devices can position the camera precisely so that the camera information can be calibrated, and the object on the turntable can be oriented to capture images from different views. They also provide a controlled environment, for example, a single background color and adjustable lighting, so that the object and the background can easily be separated. As these devices are already used in the field of product presentation, we use them as the image source for the 3D image-modeling technology.
3D image-modeling technology primarily involves the generation of two kinds of information, the 3D model of an object and its texture map. The former employs triangular meshes to describe the object’s surface geometry, and the latter describes its color information. There is a mapping between the 3D model and the texture map such that when the model is displayed in 3D space, accurate object texture can be displayed accordingly. Approaches to generating 3D models from multiple images can be classified into two groups: shape-from-silhouette (SFS) and shape-from-photoconsistency (SFP). The SFP approach has received extensive attention because it can simultaneously yield a 3D geometric model of an object and its texture map. The main idea of this method is to generate photo-consistent models that can reduce some measure of the discrepancy between different image projections of their surface vertices [
2,
3,
4]. The main advantage of the SFP approach is that it can generate fine surface details by using photometric and geometric information. However, the reliability of the SFP approach remains a problem because the texture quality can easily be affected by environmental factors such as noise in the colors, inaccuracies in camera calibration, non-Lambertian surfaces, and homogeneous object color.
By contrast, the SFS approach is a common method used to estimate an object’s shape from images of its silhouettes [
5,
6,
7]. This method is essentially based on the visual hull concept, in which the object’s shape is constructed by intersecting multiple sets of polygons derived from the silhouettes of multiple 2D images. With a sufficient number of images from different views, this method can yield an approximate model that describes the outline shape of an object. However, this model is not yet suitable for visualization for two reasons. First, the SFS method can produce virtual features on the 3D model, such as sharp edges and artifacts, which do not exist on the real object surface; some virtual features may be sufficiently large to affect the outline shape. Second, concavities on the object surface are often reconstructed as convex shapes because they are invisible in image silhouettes. Therefore, a quality improvement method must be implemented to remove virtual features while recovering the smoothness of the model [
8]. The removal of artifacts is particularly important because they are difficult to detect and eliminate.
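The visual hull idea behind SFS can be illustrated with a small voxel-carving sketch: a voxel is kept only if it projects inside every silhouette. This is a minimal illustration and not the method of any cited work; the helper name `carve_visual_hull`, the `projections` callables, and the unit-cube grid are all assumptions made for this example.

```python
import numpy as np

def carve_visual_hull(silhouettes, projections, grid_size=32):
    """Approximate the visual hull by voxel carving (hypothetical helper).

    silhouettes: list of 2D boolean masks (True = inside the object).
    projections: list of callables mapping (N,3) voxel centers to (N,2)
                 integer pixel coordinates (x, y) in the matching mask.
    Returns a boolean occupancy grid: a voxel survives only if it
    projects inside every silhouette.
    """
    # Voxel centers on a regular grid in the unit cube.
    axis = (np.arange(grid_size) + 0.5) / grid_size
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    centers = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

    occupied = np.ones(len(centers), dtype=bool)
    for mask, project in zip(silhouettes, projections):
        uv = project(centers)
        h, w = mask.shape
        # Voxels projecting outside the image are carved away.
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        keep = np.zeros(len(centers), dtype=bool)
        keep[inside] = mask[uv[inside, 1], uv[inside, 0]]
        occupied &= keep
    return occupied.reshape(grid_size, grid_size, grid_size)
```

With only a few views, the carved volume remains a coarse superset of the true shape, which is why concavities survive as convex regions, as noted above.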
Texture mapping generally includes multiple techniques, such as mesh partitioning, mesh parameterization, texture transferring, and correction and optimization, which are related to each other and affect the texture quality. Research in mesh partitioning can be summarized using several different approaches. Shamir [
9] categorized several methods of mesh partitioning according to segmentation type, partitioning technique, and segmentation criterion. Segmentation type refers to surface-type and part-type. Surface-type mesh partitioning is commonly used in texture mapping [
10,
11,
12] because it can prevent large distortion in mesh parameterization. Mangan et al. [
13,
14] and Lavoué et al. [
15] proposed a constant curvature watershed method to separate a mesh model into several regions. Other applications of surface-type partitioning include remeshing and simplification [
16], mesh morphing, and mesh collision detection [
17]. Part-type mesh partitioning is commonly used for part recognition on a mesh model composed of multiple parts. Mortara et al. [
18,
19] proposed a partitioning method by applying the curvature information at the transition of different parts to decompose a mesh model. Funkhouser et al. [
20] proposed another method that establishes a database of known parts for the separation of a mesh model. Partitioning techniques include region growing, hierarchical clustering, iterative clustering, and inference from a skeleton, which can be applied either alone or in combination. Segmentation criteria include the dihedral (or normal) angle, geodesic distance, and topological relationship, which can likewise be applied alone or in combination.
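Of the partitioning techniques listed above, region growing is the simplest to illustrate. The sketch below grows charts outward from seed faces using a normal-angle criterion; the function name `grow_charts`, the uniform angle threshold, and the fallback assignment for leftover faces are assumptions made for this toy example, not the cited algorithms.

```python
import numpy as np
from collections import deque

def grow_charts(normals, adjacency, seeds, angle_thresh_deg=45.0):
    """Region-growing segmentation sketch (assumed simplification).

    normals: (F,3) unit face normals; adjacency: list of neighbor-face
    index lists; seeds: one starting face per chart. A face joins a
    neighboring chart if its normal deviates from that chart's seed
    normal by less than the threshold.
    """
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    labels = np.full(len(normals), -1, dtype=int)
    queue = deque()
    for chart, s in enumerate(seeds):
        labels[s] = chart
        queue.append(s)
    while queue:
        f = queue.popleft()
        for nb in adjacency[f]:
            if labels[nb] == -1 and normals[nb] @ normals[seeds[labels[f]]] >= cos_thresh:
                labels[nb] = labels[f]
                queue.append(nb)
    # Faces too far from every seed normal fall back to the label of
    # any assigned neighbor, so that every face ends up in some chart.
    for f in range(len(normals)):
        if labels[f] == -1:
            assigned = [labels[nb] for nb in adjacency[f] if labels[nb] != -1]
            labels[f] = assigned[0] if assigned else 0
    return labels
```

A hierarchical or iterative-clustering method would instead revisit and re-optimize the seed placement, which is the direction taken by the chart growth method proposed later in this paper.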
Mesh parameterization methods have been classified according to distortion minimization, boundary condition, and numerical complexity [
21,
22]. Distortion minimization can be summarized based on three types: angle, area, and distance. For angle minimization, an objective function is formulated to minimize the distortion of 2D meshes on the UV domain. Several methods can be employed for angle minimization. Lévy et al. [
11] proposed a least-squares approximation of the Cauchy-Riemann equations to minimize both angle and area distortion on 2D meshes. Desbrun et al. [
23] presented an intrinsic parameterization to minimize angle distortion. These two methods allow free boundaries and have linear numerical complexity. Sheffer et al. [
24] optimized the angles on the UV domain based on angle-based flattening. This method sets constraints on the topology of the triangular meshes to preserve the correctness of the 2D meshes. Sheffer et al. [
25] proposed a hierarchical algorithm to improve the optimization efficiency for the case of huge triangular meshes, and Zayer et al. [
26] proposed a method that solves the optimization problem via a set of linear equations derived from the angle-based flattening approach with a set of specified constraints. In addition, barycentric mapping is commonly used for mapping 3D meshes onto the UV domain in mesh processing. Tutte [
27,
28] proposed an algorithm to embed a 3D mesh onto the UV domain by evaluating the barycentric position in terms of its neighboring meshes. Eck et al. [
29] proposed an algorithm to calculate the multiresolution form of a mesh via a barycentric map. Floater [
30] applied a “shape-preserving” condition to the barycentric map to preserve the shape of the 2D meshes on the UV domain. Floater [
31] and Floater et al. [
32] further applied mean-value weights to the barycentric map to preserve the shape of the 2D meshes. For all of the above-mentioned barycentric mappings, the boundary is fixed and the numerical complexity is linear, which is not suitable for texture editing. For texture mapping, a free-boundary method is more appropriate, as it ensures that the boundary of each island of 2D meshes stays close to the real profile, making texture editing easier. Some other approaches have focused on minimizing area distortion [
33] and distance [
34].
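Tutte-style barycentric mapping with a fixed boundary reduces to a small linear solve: the boundary loop is pinned to a convex polygon on the UV domain, and each interior vertex is placed at the average of its neighbors. The sketch below uses uniform weights and a dense solver, which are simplifying assumptions; shape-preserving or mean-value weights, as in Floater's work, only change the matrix coefficients.

```python
import numpy as np

def tutte_embedding(n_vertices, edges, boundary):
    """Tutte-style barycentric mapping sketch with uniform weights.

    Boundary vertices are pinned to a convex polygon (here, the unit
    circle on the UV domain); each interior vertex is the average of
    its neighbors, found by solving a dense linear system (fine for a
    toy mesh; a real implementation would use a sparse solver).
    """
    neighbors = [[] for _ in range(n_vertices)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    uv = np.zeros((n_vertices, 2))
    boundary = list(boundary)
    # Pin the boundary loop to the unit circle, preserving its order.
    for k, v in enumerate(boundary):
        t = 2.0 * np.pi * k / len(boundary)
        uv[v] = (np.cos(t), np.sin(t))

    interior = [v for v in range(n_vertices) if v not in set(boundary)]
    index = {v: i for i, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    rhs = np.zeros((len(interior), 2))
    for v in interior:
        A[index[v], index[v]] = len(neighbors[v])
        for nb in neighbors[v]:
            if nb in index:
                A[index[v], index[nb]] -= 1.0
            else:
                rhs[index[v]] += uv[nb]  # pinned boundary contribution
    uv[interior] = np.linalg.solve(A, rhs)
    return uv
```

Because the boundary is fixed to the circle regardless of its true 3D profile, island boundaries can be badly distorted, which is the limitation motivating free-boundary methods for texture mapping.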
For texture map generation, the main task is to solve the texture transferring problem. Niem et al. [
35] proposed a texture transferring method by identifying the most appropriate image source for a group of meshes. They also minimized the color inconsistency at the transition of two different groups and synthesized the invisible meshes using the color of neighboring pixels. Genç et al. [
36] proposed a method to extract and render the texture dynamically. The extraction was implemented by horizontally scanning the pixels and rendering every color onto the meshes. Baumberg [
37] proposed a blending method to handle the color difference between two different images. The images were separated into high and low bands; the low band images were averaged to minimize the color difference, whereas the high band images were kept to preserve the outline profile. In addition, texture synthesizing is commonly used to improve the transition between different textures. Efros et al. [
38] proposed an image quilting method to quilt together different texture patterns. They extended the boundary of each original pattern and calculated the minimum color difference on the overlapping area to find the new boundary between two patterns. Wei et al. [
39] proposed an algorithm to synthesize the texture pattern based on deterministic searching and use tree-structured vector quantization to improve the efficiency. These two approaches focus mainly on the transition synthesis between two texture patterns.
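Baumberg-style two-band blending can be sketched as follows: each source image is split into a low band, which is blended across sources to hide color offsets, and a high band, which is kept from the dominant source to preserve detail. The 3×3 box filter and the 0.5 weight cutoff below are assumptions standing in for the actual filter and weighting of the cited method.

```python
import numpy as np

def two_band_blend(img_a, img_b, weight_a):
    """Two-band blending sketch in the spirit of Baumberg's method.

    img_a, img_b: single-channel images of equal shape.
    weight_a:     per-pixel blend weight for img_a in [0, 1].
    """
    def low_pass(img):
        # 3x3 box filter with edge padding (an assumed stand-in for
        # the low-pass filter used in the cited work).
        padded = np.pad(img, 1, mode="edge")
        out = np.zeros_like(img, dtype=float)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                out += padded[1 + dy : 1 + dy + img.shape[0],
                              1 + dx : 1 + dx + img.shape[1]]
        return out / 9.0

    low_a, low_b = low_pass(img_a), low_pass(img_b)
    # Low bands are blended smoothly; the high-frequency residual is
    # taken from whichever image dominates at that pixel.
    high = np.where(weight_a >= 0.5, img_a - low_a, img_b - low_b)
    return weight_a * low_a + (1.0 - weight_a) * low_b + high
```

Averaging only the low band is what removes the visible color seam without the ghosting that full-image averaging would cause on misaligned detail.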
2. Problem Statement
For product presentations in e-commerce, texture quality is the most crucial issue to investigate because it directly affects the visualization effect. Ideally, the texture quality at any view in 3D space should perfectly match that of the corresponding 2D image. The actual texture on the 3D model, however, is usually worse than that of the 2D images, mainly because individual textures on the 3D model come from different image sources. A 3D model reconstructed using multiple images of an object is only an approximation of the object geometry, and the camera model and calibration method used to estimate the camera parameters might introduce additional errors in the position and orientation of the object images. These errors, combined with errors caused by the texture mapping process, can lead to discrepancies between the texture of the 3D model and the real object. Any defect in the 3D texture could negatively affect perceptions of the product being presented.
The following are typical problems involving the 3D texture:
Reduced texture resolution: The texture resolution at any view in 3D space is worse than that of the corresponding object image, primarily because of inappropriate scaling of the pixels between the real image domain and the texture mapping image domain.
Missing color on some mesh regions: All 2D meshes on the texture domain should ideally be color-filled, but some may be missed if they are beyond the boundary of the object image, primarily because of insufficient accuracy of the 3D model, especially for those meshes near the image silhouette.
Photo inconsistency at the transition of different image sources: Photo inconsistency usually occurs along the boundary of different groups of meshes, with each group textured by different image sources. This problem is the combined effect of insufficient accuracy of the 3D model and the camera parameters.
Thus, we develop a texture mapping algorithm that focuses on detecting and removing these problems.
The objective of this study is to develop a high-quality texture mapping algorithm that can be combined with a 3D modeling algorithm to generate the 3D textured model of an object for use in e-commerce product presentation. High-quality texture here means that the texture at any view in 3D space should be as close as possible to that of the corresponding 2D image, which mainly requires maintaining the texture resolution and eliminating photo-inconsistent errors at the transition between different image sources. A general texture mapping process comprising the following three techniques is proposed: mesh partitioning, mesh parameterization and packing, and texture transferring. Specific efforts are made at each step to eliminate problems that might affect the texture of the 3D model. To further reduce texture discrepancies owing to the insufficient accuracy of the 3D model and camera parameters, a correction and optimization algorithm is presented. The entire texture mapping process is fully automatic and is intended for use with all kinds of objects.
The main contributions of the proposed texture mapping method are as follows. First, we enhance the techniques for converting 3D meshes onto the UV domain so that the shape of most 2D meshes is preserved and the finest resolution can be obtained in texture transferring. The three main techniques for converting 3D meshes onto the UV domain are mesh partitioning, mesh parameterization, and packing. In the proposed mesh partitioning algorithm, a novel chart growth method partitions the 3D meshes iteratively so that each chart is as flat (disk-like) as possible, which reduces the error of the 2D meshes in mesh parameterization. In the proposed mesh parameterization algorithm, a novel conformal mapping method preserves the shape of the 2D meshes as close to that of the 3D meshes as possible. In the proposed packing method, all regions of 2D meshes are tightly packed in a rectangular area to acquire the finest resolution. Second, we propose an optimized texture transferring algorithm for generating the texture map, emphasizing the elimination of erroneous texture mapping owing to the insufficient accuracy of the 3D model and camera parameters, and the improvement of the texture resolution to be as close to that of the 2D object images as possible. The strategies used in the proposed algorithm are to: (1) increase the overall texture size in pixels; (2) increase the number of pixels occupied by each 2D mesh; (3) detect and fill in void meshes; and (4) perform texture blending at the boundaries of mesh islands. The first two operations improve the resolution of the final texture map, whereas the last two eliminate erroneous texture mapping. Several realistic examples are presented to verify the feasibility of the proposed texture mapping method, and the results are compared with those from commercial software.
3. Overview of the Proposed Method
The 3D textured model is created by covering a 3D model with a texture map that stores the color information of the object. The main idea of direct texture mapping is to generate the texture of the 3D model by directly using the object images.
Figure 1 shows the overall flowchart of the proposed texture mapping method. The input data are the 3D model of an object and multiple object images from different views (
Figure 1a). The original 3D model was generated from silhouettes of the object images using an SFS method. However, the surface quality of the original meshes was not satisfactory because of artifacts and virtual features affecting the outline shape, as well as the surface smoothness. A mesh optimization algorithm combining re-meshing, mesh smoothing, and mesh reduction was employed to eliminate the effect of the above-mentioned phenomena and yield an optimized mesh model [
8]. The model after mesh optimization served as the input of the proposed texture mapping algorithm.
In the proposed texture mapping algorithm, mesh partitioning is first implemented to subdivide the 3D model into several charts (
Figure 1b), each of which is later individually mapped onto the UV domain. Mesh partitioning is based on a chart growth method that assigns a weight to each mesh on the model and grows each chart of meshes one by one from a set of initial seed meshes. The seed meshes are optimized in an iterative process until all meshes have been clustered. This ensures that all charts are flat and have compact boundaries for easy mapping during mesh parameterization. Mesh parameterization and packing are then implemented to map the meshes of each chart and pack all 2D meshes onto the UV domain (
Figure 1c). An angle-preserving algorithm is proposed to optimize the mapping between the 3D and 2D domains, which can preserve the shape of most 2D meshes. Furthermore, all 2D meshes are tightly packed in a rectangular area to acquire the finest resolution when mapping the pixels from the image domain to the texture domain.
Next, texture transferring is implemented to extract pixels from the image domain, and place them on the texture domain appropriately (
Figure 1d). This procedure comprises three main steps: grouping the 3D meshes, extracting pixels from the image domain, and placing pixels onto the texture domain. We also propose a method to analyze the texture resolution. The proposed texture transferring algorithm ensures that the texture resolution can be set to the equivalent of the 2D images. Finally, we implement correction and optimization of the texture to eliminate erroneous color mapping that might occur due to the insufficient accuracy of the 3D model and camera parameters and to improve the photo consistency at the boundary of different image sources (
Figure 1e). Several photo-inconsistency problems are detected and solved one by one. The output texture map is saved in a universal data format (*.obj), which can be displayed with a website viewer (
Figure 1f).
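The pixel extraction and placement step of texture transferring can be illustrated for a single triangle: each texel inside the triangle on the texture domain is located by its barycentric coordinates, which identify the corresponding pixel in the image domain. The helper name `transfer_triangle` and the nearest-neighbor sampling are assumptions for this sketch, not the paper's exact resampling procedure.

```python
import numpy as np

def transfer_triangle(image, tri_img, tri_uv, texture):
    """Per-triangle texture transfer sketch (assumed simplification).

    tri_img: (3,2) triangle vertices in image pixel coordinates (x, y).
    tri_uv:  (3,2) the same triangle in texture pixel coordinates.
    For every texel inside tri_uv, barycentric coordinates locate the
    corresponding image pixel, whose color is copied (nearest neighbor).
    """
    h, w = texture.shape[:2]
    min_xy = np.floor(tri_uv.min(axis=0)).astype(int)
    max_xy = np.ceil(tri_uv.max(axis=0)).astype(int)
    a, b, c = tri_uv
    # 2x2 system whose solution gives barycentric coords (beta, gamma).
    m_inv = np.linalg.inv(np.array([b - a, c - a]).T)
    for y in range(max(min_xy[1], 0), min(max_xy[1], h)):
        for x in range(max(min_xy[0], 0), min(max_xy[0], w)):
            beta, gamma = m_inv @ (np.array([x + 0.5, y + 0.5]) - a)
            alpha = 1.0 - beta - gamma
            if min(alpha, beta, gamma) < 0.0:
                continue  # texel center lies outside the triangle
            # Same barycentric weights applied to the image-domain triangle.
            src = alpha * tri_img[0] + beta * tri_img[1] + gamma * tri_img[2]
            sx, sy = np.clip(src.astype(int), 0,
                             [image.shape[1] - 1, image.shape[0] - 1])
            texture[y, x] = image[sy, sx]
    return texture
```

A texel whose mapped source position falls outside the object silhouette is exactly the failure case addressed by the correction step, where the alpha channel is consulted and the front image replaced where necessary.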
5. Results and Discussion
The results of the texture map and 3D textured model for six examples are depicted in
Figure 10a–f, where the left and right images in each panel denote the 3D textured model and the texture map, respectively. The entire texture mapping process is performed automatically, with a 3D model and 16 object images from different views as inputs and the corresponding texture map as the output. The texture size for all six examples is 8192 × 8192. The proposed process includes the following key procedures: mesh partitioning, mesh parameterization and packing, texture transferring, and correction and optimization of the texture. The initial number of seeds in mesh partitioning is set to 10, and the final number of mesh islands generated for the six examples is 10–13. Each of the results in
Figure 10 can be demonstrated as a high-quality 3D textured model by applying the texture correction and optimization during the texture generation process. The results with and without texture correction and optimization are further discussed below.
The first optimization process is mesh island packing. When the AABB method is employed (
Figure 11a), the bounding box of each mesh island is larger, and the empty space inside each bounding box is also larger. When all of these bounding boxes are packed onto a UV map of fixed size, each mesh island is over-compressed and loses the texture resolution it should have. By contrast, when the OBB method is employed (
Figure 11b), each bounding box best fits its mesh island, so the space that a mesh island occupies is more compact. In addition, the previous resolution of the texture map was 4096 × 4096 pixels. To preserve the 5184 × 3456 resolution of the original images, a larger texture resolution of 8192 × 8192 is applied to enhance the quality of the final texture; the texture space is thus four times larger than before. Therefore, each mesh island can be allocated more pixel space when all bounding boxes are packed onto the same UV map.
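The benefit of OBB over AABB packing can be quantified by comparing bounding-box areas of a mesh island's 2D vertices. The sketch below approximates the minimum-area OBB by sampling rotation angles; an exact implementation would use rotating calipers on the convex hull, so the angle-sampling scheme here is an assumption for illustration.

```python
import numpy as np

def aabb_area(points):
    """Area of the axis-aligned bounding box of 2D points."""
    extent = points.max(axis=0) - points.min(axis=0)
    return extent[0] * extent[1]

def obb_area(points, n_angles=180):
    """Approximate minimum-area oriented bounding box by sampling
    rotation angles in [0, 90°); a simple stand-in for an exact
    rotating-calipers implementation."""
    best = np.inf
    for theta in np.linspace(0.0, np.pi / 2.0, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        # Rotate the points and measure the axis-aligned box.
        rot = points @ np.array([[c, -s], [s, c]])
        best = min(best, aabb_area(rot))
    return best
```

For an elongated island lying diagonally on the UV plane, the OBB area can be a fraction of the AABB area, which is precisely the pixel space recovered when the boxes are packed onto the shared UV map.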
Figure 12 depicts the distribution of the number of meshes over ranges of pixel counts for the following five cases: 8192 × 8192/OBB, 8192 × 8192/AABB, 4096 × 4096/OBB, 4096 × 4096/AABB, and the commercial 3DSOM software [43]. When the number of meshes with fewer pixels is reduced, the texture resolution is closer to that of the original images. It is evident that the texture resolution of the 8192 × 8192/OBB case is the best among the five cases because it has the fewest meshes with small pixel counts. In addition, the texture resolution of the 3DSOM software is the worst, as most of its meshes occupy fewer than 2000 pixels. Therefore, the texture resolution of the proposed method is better than that of the 3DSOM software.
Figure 13 depicts a local region of the texture for three cases, 3DSOM software, 4096 × 4096/AABB and 8192 × 8192/OBB. The result clearly indicates that the sharpness of the texture in
Figure 13c is better than that in
Figure 13a,b. The 3DSOM software blends the color with a low-pass filtered image, which results in a loss of texture resolution. This result indicates that the proposed method yields better texture resolution than the 3DSOM software.
The next optimization process is the elimination of texture defects caused by geometric error. The background color of the image might be wrongly extracted for some meshes near the image silhouette, resulting in white spots on the 3D textured model. The incorrect extraction occurs when meshes fall outside the image silhouette as they are projected onto the front image. Thus, we eliminate the influence of this error.
Figure 14 compares the 3DSOM software, the previous result, and the proposed result. For the previous result, no action was taken to deal with this problem; for the proposed result, the data in the alpha channel of each object image were employed to detect the problem, and the front image was then replaced where necessary. White background spots appear in both the 3DSOM result and the previous result, but they have been eliminated in the proposed result, and the color is more consistent in the boundary area. For e-commerce presentation, the color correctness is increased and the overall model-viewing experience is improved.
The final optimization process is blending the texture information in the image transition area. Because the texture information is extracted from different front images, the color along the boundary between two image sources might be inconsistent. The results before and after applying the proposed blending algorithm for a shoe and a cup are shown in
Figure 15a,b, respectively. The texture quality in the transition area has been improved, and the quality of the entire 3D textured model is therefore improved for the purpose of e-commerce presentation.