Local Color Voxel and Spatial Pattern for 3D Textured Recognition

Abstract: Retrieval of textured 3D models, which must account for shape, color, and pattern, remains a challenging research problem. Several approaches have been proposed, but voxel-based approaches have received little attention, even though a voxel representation preserves both geometry and texture information and maps all 3D models to the same dimensions. Based on this observation, we propose a novel voxel-based descriptor called the local color voxel pattern (LCVP), which describes the local pattern formed by a voxel and its neighbors. LCVP is computed around each voxel, and the resulting values characterize the pattern of each 3D model. LCVP quantizes the color of each voxel to generate a specific pattern and applies circular shifts and reflected circular shifts to the pattern. In addition, inspired by promising recent results in image processing, this paper implements spatial patterns based on the Weber descriptor and oriented gradients to extract a global spatial descriptor. Finally, these local and spatial features are combined with an established global feature approach, the multi Fourier spectral descriptor. For optimal retrieval, a rank combination is performed between the local and global approaches. Experiments on the SHREC'13 and SHREC'14 datasets show that the proposed method outperforms several state-of-the-art methods on some measures.


I. INTRODUCTION
3D textured model retrieval remains an active research topic because of the increasing number of 3D models with a variety of shapes, colors, and patterns. As a result, retrieval can no longer rely on shape alone but must also consider color and pattern. Earlier work addressed 3D shape retrieval with view-based, transform-based, and graph-based methods, which were then combined with a color histogram for 3D textured retrieval. To our knowledge, 3D textured retrieval based on voxel patterns has not been attempted, even though a 3D voxel grid preserves the model's information and at the same time equalizes the dimensions of different 3D models. This hypothesis arose from the observation that a 3D color voxel model is visually similar to the original model. Voxel generation was previously studied in [1]. Some analyses of voxel-based local patterns already exist, but they use only binary voxels to capture global shape and do not include color, as in [2]-[5], 3D LBP [6], lightweight binary voxels [7], and MFSD [8]. Other voxel-based approaches recover the full 3D shape from view-based 2.5D data and recognize multiple objects from a single image [9], [10]. With a different motivation and purpose, we study local extraction on a 3D color voxel grid to generate local features. The proposed framework is illustrated in Fig. 1. The main contributions of this paper are as follows. (1) Local Color Voxel Pattern (LCVP). First, the method renders a 3D color voxel grid by mapping all vertices to the voxels at their corresponding positions. A voxel may be crossed by several faces, so its color is obtained by averaging the colors of all vertices that cross it; the color is interpolated from the three vertices of each face that crosses the voxel. After the 3D color voxel grid is generated, the local pattern of each color voxel is analyzed, one voxel at a time, inside the 3x3x3 cube that surrounds it.
A cube has six sides. A side is called a surface if at least one of its voxels is non-zero. The LCVP formula is computed only on the surfaces of the cube. LCVP has two types: the first uses the corner voxels and the second uses the middle voxels. The voxel value is not used directly in the LCVP formula; it is first converted with a color level conversion equation, which ensures that different surface colors generate different patterns. To obtain rotation and reflection invariance, circular shifts and reflected circular shifts are applied to the voxel sequence, and the minimal value is chosen as the LCVP value. An LCVP histogram is then built. While computing LCVP, the method also builds a color histogram from the color of the center voxel of each cube.
(2) Rank combination process is a process that combines two ranks from different extraction methods, in this case they are LCVP (our method), weber histogram, oriented gradient and multi Fourier descriptor method. Each rank is built by using the same query model. If there is a member in the first rank and it is also member of second rank then it was called slice member. A sequence change will occur in the first rank. Slice member becomes the highest priority as member with the most similar to query and change the sequence in first rank. This process will be employed on each query. Using this approach, this can improve the performance. This paper is organized as follows. Section I reviews a related work. Section II introduces the methods such as building 3D color voxel, local voxel pattern (LCVP), multi Fourier spectral descriptor (MFSD) and weber method. Section II also introduces the optimization approach using a rank combination. Section III shows some experiments, results, and comparison with the state-of-the-art. Section IV provides some conclusions. In the last decade, the development of 3D shape retrieval is very fast. Due to some areas such as multimedia, computer graphic, computer vision, CAD use 3D models and require a reliable 3D shape retrieval. The development of algorithm began by paying attention to the characteristics of rigid 3D shape. Based on the representation of the shape descriptor in [11], the shape matching is divided into three categories: (1) feature based methods, (2) graph-based methods and (3) geometry-based methods. Some research are in featurebased methods such as [12]- [19]. The concept of global feature-based similarity has been refined recently by comparing the distribution of global features instead of the global features directly such as [20]- [21]. Spatial maps are a representation that captures the spatial location of an object. Some researchers in this domain are [22]- [25]. 
Local feature-based methods provide various ways to take into account the surface shape in the neighborhood of points on the boundary of the shape, such as [26]-[29]. Several related methods also appear in image analysis, such as [30]-[36]. Graph-based methods can be divided into three broad categories according to the type of graph used: model graphs [37]-[38], Reeb graphs [39]-[40], and skeletons [41]-[42]. Geometry-based methods are classified into four approaches: view-based similarity [43], volumetric error-based similarity [44]-[45], weighted point set-based similarity [46]-[47], and deformation-based similarity [48]-[49]. Well-known algorithms include Shape Distribution, Spherical Harmonic Descriptor, Light Field Descriptor [50], Elevation Descriptor (ED) [51], and Shape Impact Descriptor [52].
The next consideration moved to the characteristic of 3D shape non-rigid where in general some algorithms consider about local features, topological structures, isometry invariant global geometric properties, direct shape matching or canonical forms. An algorithm which considers local features and also insensitive to isometric transformation are Spin Images [53], Heat Kernel Signatures [54], salient local features (SIFTs) [55], Intrinsic Spin Images [56]. Others algorithm which considers topological structures are Multiresolution Reeb Graphs (MRGs) and skeletons matching technic. Algorithms which notice isometry invariant global geometric properties are employing Laplace-Beltrami spectra [57], eigenvalue from geodesic distance matrix [58], distribution of intrinsic distance including diffusion distance, geodesic distance, a curvature weighted distance [59] and employing canonical form [60].
Rapid development on 3D textured retrieval was triggered by algorithm contest such as SHREC'13 and SHREC '14. There are several algorithms in [61]. M. Abdelrahman et all described a 3D shape textured method by combining a geometric using scale invariant heated kernel signature (SI-HKS) and a photometric contribution. V Garro and A Giachetti used Histogram Area Projection (MAPT) [62]. We also participated in that contest by implementing combination methods such as local binary pattern, local ternary pattern, a histogram of oriented gradient and weber local descriptor. C. Li, A. Godil, A. Ben Hamza used the spectral geometry based framework for textured 3D shape representation and retrieval. This framework is based on the eigendecomposition of the Laplace-Beltrami operator (LBO), which provided a rich set of eigenbases that were invariant to isometric transformation. It consists of two main stages: (1) feature extraction by using spectral graph wavelet signature to capture geometry information and color histogram for texture information, (2) spatial sensitive shape comparison via intrinsic spatial pyramid matching. A.Tatsuma, M.Aono propose Multi Fourier Spectral Descriptor and Multiresolution Representation Local Binary Pattern Histogram (MRLBPH) which captured texture features of rendered image from a 3D model by analyzing multi-resolution representation using LBP. They first enclosed the 3D model within a unit geodesic sphere after normalizing the 3D via Point SVD. They got some number color buffer images rendered from 38 viewpoints. To obtain multiresolution representations, they applied a Gaussian filter with varying scale parameters to an image. S. Velasco-Forero proposed a method that basically computed two features: a shape and a color descriptor. A shape was represented by a geodesic distance matrix (GDM) [63], and a color was represented by a CIElab color histogram. 
The basic idea was to compute the average Earth Mover Distance (EMD) distance between RGB histogram for two given shapes. C.-X. Xu, and Y.-J Liu. They sampled the 3D model on its surface in N-dimensional space, which includes both geometric and textural information then these sampling points are optimally clustered. Geodesic distance is computed among the points then it got shape distribution of the model. D. Girgi proposed a Textured Shape Distribution (TSD) descriptor that was a color-aware variant on classical Shape Distribution. TSD consists of the distribution of mutual distances computed between points sampled over the surface mesh representing the 3D models. TSD descriptor employs geodesic distance instead of euclidean distance, and that geodesic distance is computed on the surface embedded in the three-dimensional color space. The vertices of the surface mesh are the (L,a,b) coordinates in the CIELab color space. The conclusion of the most common approach was to combine features of shape and texture.

II. MATERIAL AND METHOD
Starting from the early observation that a 3D voxel grid retains information about shape, color, and pattern, this study searches for a good formulation for texture and shape feature extraction. Several studies address 3D voxels, but they use binary voxels rather than color voxels, which is the point of difference here. We build on them and add new features to render a 3D color voxel grid. Once the voxel grid is built, we propose a novel formulation that extracts local features from each color voxel and all its neighbours in a cube. To extend the experiments, we also implement Fourier extraction from spatial images to capture the global shape of 3D models.

A. 3D Voxel
A 3D model consists of many faces (triangles), and each face consists of three vertices. Each vertex has an RGB value, and the color within a face is interpolated from the colors of its three vertices. Therefore, when rendering the 3D color voxel grid, we compute the color from each vertex. A face can cross one or more voxels, so every voxel crossed by the face must be considered. This is illustrated in Fig. 2 (a), and the calculation for a face (triangle) in three dimensions is illustrated in Fig. 2 (b). The details of the formulation are explained later. Conversely, a voxel can be crossed by one or more faces. If a voxel is crossed by more than one face, it is converted into octree voxels, as illustrated in Fig. 2. The calculation for 3D color voxel rendering is as follows: (1) find the mean point on the x, y, and z axes; (2) calculate the distance from each point to the mean point; (3) find the longest distance between any point and the mean point; (4) map each point into a voxel; (5) update the voxel with color information; (6) normalize the voxel color by the number of points that cross it. In this experiment the dimension is 64, meaning that each 3D object is mapped into a 64x64x64 voxel grid. Mapping all 3D models to one standard size removes the size differences between 3D models.
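The rendering steps listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it maps vertices alone (no face interpolation and no octree refinement), and the function name `voxelize`, the dictionary-based sparse grid, and the linear mapping of the bounding radius onto the grid are choices made for this sketch rather than details given in the paper.

```python
from collections import defaultdict

def voxelize(points, colors, dim=64):
    """Map colored 3D points into a dim^3 grid and average the colors per voxel.

    Hypothetical helper: returns a sparse grid {(ix, iy, iz): (r, g, b)}.
    """
    n = len(points)
    # Mean point on the x, y, and z axes.
    mean = [sum(p[a] for p in points) / n for a in range(3)]

    # Distance of a point to the mean point.
    def dist(p):
        return sum((p[a] - mean[a]) ** 2 for a in range(3)) ** 0.5

    # Longest distance to the mean point (fall back to 1.0 for a single point).
    radius = max(dist(p) for p in points) or 1.0

    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for p, c in zip(points, colors):
        # Map [-radius, radius] around the mean onto grid indices 0..dim-1.
        idx = tuple(
            min(dim - 1, int((p[a] - mean[a] + radius) / (2 * radius) * dim))
            for a in range(3)
        )
        # Accumulate the color information of every point falling in the voxel.
        for ch in range(3):
            sums[idx][ch] += c[ch]
        counts[idx] += 1

    # Normalize each voxel color by the number of points mapped to it.
    return {idx: tuple(s / counts[idx] for s in sums[idx]) for idx in sums}
```

Because every model is scaled by its own longest distance to the mean point, models of different physical size end up in the same grid, which is the property the text relies on.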

B. Local Color Voxel Pattern
The local pattern on the 3D surface is a fundamental property of a 3D model. Capturing the local pattern over the whole 3D surface therefore captures the model's texture information. On the other hand, a mesh contains a varying number of vertices and faces, so a voxel-based approach is chosen to simplify the computation.
In this paper, a new approach is proposed, both theoretically and computationally, to discriminate 3D textures effectively. The texture of a 3D model is determined by pattern and color, and members of the same texture class share pattern, color, and shape. The same pattern can appear on different 3D models and vice versa. The first step is therefore to capture the pattern and the distribution of color information. To distinguish color, a conventional color histogram is used; to distinguish pattern, a local color voxel pattern histogram is built. The work starts by constructing a 3x3x3 voxel cube with one center voxel. Every voxel is treated as a cube center if it meets certain requirements. Fig. 3 shows how a voxel cube is generated. Every colored voxel that has colored neighbours can potentially form a voxel cube. Splitting this cube gives three squares of 3x3 voxels. The voxels are then processed in two different ways. To make the two ways clear, the voxels are marked with different colors, yellow and blue. The yellow and blue voxels are used by the local color voxel pattern to capture the cube texture: the yellow voxels are processed by local color voxel pattern type 1, and the blue voxels by local color voxel pattern type 2.
By this voxel (red/yellow), we first determine color scale. Dim (dimension) is voxel dimension, and scale is color scale. For example, we will build a 3D object on voxel which has 64x64x64 dimension, so we provide value for dim is 64. Scale is a number that is used to divide color, for example, the scale is 8, so color (0-255) will have 8 scale level. "dist" is distance, this represents distance a voxel from the center. The final equation as follows : (1) In Fig. 3, we take one voxel become a cube center and its surrounding voxels as cube composer. By splitting this cube, we get three squares of 3x3 voxels on each. Yellow voxel and the blue voxel is employed by LCVP equation to capture cube texture. The yellow voxel generates LCVP type 1 while blue voxel generates LCVP type 2. We then implement LCVP operator which is denoted as , P parameter set quantization angle. LCVP histogram is a texture feature of 3D models. We start to derive color scale and to get rotation reflection invariant by defining texture T on each cube voxel surface. On each cube voxel surface, there are two types of texture/ type of pattern provided by symbols such as and . Vc stands for voxel corner and Vm for voxel middle. The pattern is focusing only on voxel corner (the yellow one) while the pattern is focusing only on voxel middle ( the blue one). Every pattern has four value as follows: The local pattern on 3D voxel surface is influenced by color (0-255) and its composition. So the color scale will be used than original color for computation's efficiency. N Scale value will be set empirically and be used to compute local color voxel pattern. Each cube consists of six sides (surfaces) where each surface has 9 voxels (3x3). Employing all voxels in one pattern will obtain very wide pattern and become ineffective for computation. That is the reason why it will be split into two types. Type one employs corner voxels while type two employs middle voxels. 
This split does not reduce accuracy; it makes the computation faster and more effective. The color of each voxel is converted to a color level: the level of a corner voxel is derived from the color value of corner voxel p, and the level of a middle voxel is derived from the color value of middle voxel p. We transform each voxel color against the scale to obtain a color scale voxel and, by defining a factor n, obtain a unique number that characterizes the spectral texture of the cube. Fig. 4 shows how the rotation- and reflection-invariant LCVP value is obtained: we choose the minimum value among the rotated patterns to obtain rotation invariance and the minimum value among the reflected patterns to obtain reflection invariance. For the final histogram, we add the two histograms bin by bin. In the final formula, $v_{cp}$ ($p = 0, 1, \ldots, P-1$) is the color scale value of corner voxel p, and $v_{mp}$ is the color scale value of middle voxel p. The symbol minROR rotation (minimal rotation) denotes the minimal value obtained after the rotation process, and minROR reflection denotes the minimal value obtained after the reflection process. An example is shown in Fig. 7. We start from an original cube (white voxels) in the middle and split it into type one (left side) and type two (right side). Next, we compute the scale value of each corner voxel and each middle voxel, and then compute the local color voxel pattern for both type one and type two. Each type has eight patterns, from which the minimal one is chosen. Finally, we add each result to the LCVP histogram:

Histogram[$\mathrm{LCVP}_1$] ++

Histogram[$\mathrm{LCVP}_2$] ++

Because every pattern has four members (m1, m2, m3, m4), LCVP also has four values, and the minimal value is chosen for every pattern. $\mathrm{LCVP}_1$ is the LCVP computed on the corner voxels, while $\mathrm{LCVP}_2$ is the LCVP computed on the middle voxels. Two histograms are built, one for $\mathrm{LCVP}_1$ and one for $\mathrm{LCVP}_2$. Each time LCVP is calculated for a voxel, the corresponding bins of the $\mathrm{LCVP}_1$ and $\mathrm{LCVP}_2$ histograms are incremented.
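To make the minimum-over-shifts idea concrete, the following Python sketch quantizes four voxel colors of one surface into color levels, forms the eight candidate patterns (four circular shifts of the sequence and four of its mirror image), and increments the bin of the smallest one. The exact LCVP equations are not reproduced here, so several parts are assumptions of this sketch: the uniform quantization in `color_level`, the base-`scale` positional code in `encode`, and all function names are hypothetical.

```python
def color_level(value, scale=8):
    """Quantize a 0-255 color value into one of `scale` levels (assumed uniform)."""
    return min(scale - 1, int(value) * scale // 256)

def encode(levels, scale):
    """Read a sequence of levels as the digits of a base-`scale` number (assumption)."""
    code = 0
    for level in levels:
        code = code * scale + level
    return code

def min_pattern(values, scale=8):
    """Smallest code over all circular shifts of the sequence and of its mirror."""
    levels = [color_level(v, scale) for v in values]
    candidates = []
    for seq in (levels, levels[::-1]):       # original order and reflected order
        for s in range(len(seq)):            # every circular shift
            candidates.append(encode(seq[s:] + seq[:s], scale))
    return min(candidates)

def lcvp_histograms(surfaces, scale=8):
    """Build the type 1 (corner) and type 2 (middle) histograms.

    surfaces: iterable of (corner_values, middle_values), four values each.
    """
    size = scale ** 4
    hist1, hist2 = [0] * size, [0] * size
    for corners, middles in surfaces:
        hist1[min_pattern(corners, scale)] += 1
        hist2[min_pattern(middles, scale)] += 1
    return hist1, hist2
```

With four members per pattern, the two loops generate exactly eight candidates per type, so a surface and any rotated or mirrored copy of it fall into the same histogram bin.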

C. Multi Fourier Spectral Descriptor
For shape features, we chose the Multi Fourier Spectral Descriptor (MFSD) of [8], which builds shape features from low-frequency Fourier spectra. The process starts by reading the 3D model and performing pose normalization. Pose normalization is essential because 3D models differ in size and orientation, while the descriptor relies on image processing. One pose normalization method is based on Point SVD. After pose normalization, 2D images are rendered to generate three kinds of images: a silhouette image, a contour image, and a depth buffer image. Periphery Enhanced Image (PEI) processing is also applied to enhance the peripheral shape of all images. All three images are converted into polar coordinates before the Fourier transform is applied, and the low-frequency spectra of each transformed image are retained. The combination of four different Fourier spectra is called the multi Fourier spectral descriptor. These features are used in this experiment. The process is shown in Fig. 7.

D. Weber Local Descriptor and Histogram Oriented Gradient
The next method is the Weber local descriptor. As explained in [35], it has two components: differential excitation (ζ) and orientation (θ). The construction of the Weber histogram is illustrated in Fig. 8 and Fig. 9. First, we read the 2D image from one viewpoint. A grayscale image, with a single channel, is used rather than a color image because global shape information does not require color. For every pixel, we compute the excitation value and the orientation value.
In excitation, there are two filters while in orientation gradient will be generated. The first filter in excitation calculates the differences between its neighbours and the center point as follows: where Vs_11 and Vs_10 are the output of the filter in Fig. 9.
In some previous papers, the idea of depicting an image by histogram has been used in the biologically plausible vision system. Motivated by this idea, we tried to compute excitation and orientation on every pixel against its neighbors.
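As an illustration of the two Weber components, the sketch below computes the differential excitation and the orientation for one interior pixel of a grayscale image stored as a list of rows. It follows the commonly used Weber local descriptor formulation (excitation as the arctangent of the summed neighbour differences divided by the center intensity, orientation from a vertical and a horizontal difference). The pairing of neighbours used for the two orientation differences, the use of `atan2`, the small `eps` guard against a zero center, and the name `weber_at` are assumptions of this sketch and may differ from the filters shown in the figures.

```python
import math

def weber_at(img, r, c, eps=1e-6):
    """Return (excitation, orientation) for the interior pixel img[r][c]."""
    center = img[r][c]
    # The eight neighbours of the 3x3 window around the pixel.
    neighbours = [
        img[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    # Sum of differences between each neighbour and the center pixel.
    diff_sum = sum(n - center for n in neighbours)
    # Differential excitation: relative intensity change, squashed by arctan.
    excitation = math.atan(diff_sum / (center + eps))
    # Orientation from a vertical and a horizontal intensity difference.
    v11 = img[r + 1][c] - img[r - 1][c]
    v10 = img[r][c - 1] - img[r][c + 1]
    orientation = math.atan2(v11, v10)
    return excitation, orientation
```

A Weber histogram would then be accumulated by quantizing both values for every pixel and counting the resulting (excitation, orientation) pairs.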
We also implement the Histogram of Oriented Gradients (HoG) [36] as a feature extraction method for capturing shape. The HoG representation captures the edge or gradient structure that is characteristic of local shape, and it has previously been applied to human detection. In this paper, we divide the image window into small spatial regions ("cells") and, for each cell, accumulate a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. Fig. 9 describes how the oriented gradient is calculated on each image. We divide each image into cells. Before computing the gradient, the color image is converted to grayscale so that only one color channel is processed. The oriented gradient is computed in a way similar to filtering. Two experiments were conducted with this method: the first used grayscale images and the second used images after Canny filtering, so that the computation focuses only on the periphery of the shape.
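The per-cell accumulation can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: the central-difference gradient, the unsigned 0 to 180 degree orientation range, the magnitude-weighted voting, the default of nine bins, and the function name `cell_histogram` are assumptions, and block normalization is omitted.

```python
import math

def cell_histogram(gray, r0, c0, cell=8, bins=9):
    """Histogram of gradient orientations for the cell whose top-left pixel is (r0, c0).

    gray: grayscale image as a list of rows. Border pixels are skipped because
    the central difference needs both neighbours.
    """
    hist = [0.0] * bins
    height, width = len(gray), len(gray[0])
    for r in range(max(r0, 1), min(r0 + cell, height - 1)):
        for c in range(max(c0, 1), min(c0 + cell, width - 1)):
            gx = gray[r][c + 1] - gray[r][c - 1]   # horizontal gradient
            gy = gray[r + 1][c] - gray[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(bins - 1, int(angle / (180.0 / bins)))
            hist[index] += magnitude               # vote weighted by magnitude
    return hist
```

Concatenating the histograms of all cells of an image gives one oriented-gradient feature vector per view.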

E. Color Image Histogram
3D models not only come with shape features but also come with texture features which represented by a different color on each model. To capture different texture, we also can utilize color composition as texture features. We use a simple color histogram which will quantize color based on distance from center pixel and range intensity. The following Table 1 describes our color quantization. For final descriptors, we combine all histograms to be one package. The result of our shape and texture descriptor formula as follows : (17) each part represents a feature of weber histogram, histogram of oriented gradients, spatial Fourier transform histogram and color histogram. This is a different approach from LCVP and MFSD. We will compare it with another approach.

F. Building Dissimilarity Matrix
The dissimilarity matrix (DM) is built by comparing the features of all 3D models. Each entry represents the dissimilarity between two 3D models: the smaller the value, the more similar the models. For the first experiment, there are three features: local color voxel pattern (LCVP), color histogram (CH), and multi Fourier spectral descriptor (MFSD). We intentionally combine the LCVP and CH matrices after normalizing them, but leave the MFSD matrix separate. The rank combination is then performed between the normalized (LCVP and CH) matrix and the MFSD matrix. To generate each DM, we use the Manhattan distance:

$d(x_i, x_j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$

where $d(x_i, x_j)$ is the distance between the i-th and j-th objects and $x_{ik}$ is the k-th of the n features of object i. If two objects are similar, their distance is close to zero; otherwise it is close to one. All features are compared between every pair of 3D objects. In the same way, three distances between objects are obtained, based on LCVP, Weber, and MFSD. We first build the dissimilarity matrices based on LCVP and Weber and then compute the normalized value between them. Finally, we obtain two dissimilarity matrices: one from LCVP and Weber, and one from MFSD.
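A minimal Python sketch of this step is given below. The Manhattan distance itself is standard; the max-based normalization and the equal-weight average used to merge two matrices are assumptions of this sketch, since the text only states that the matrices are normalized before being combined. The function names are hypothetical.

```python
def manhattan(x, y):
    """Manhattan (L1) distance between two feature vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))

def dissimilarity_matrix(features):
    """Pairwise Manhattan distances between all feature vectors."""
    n = len(features)
    return [[manhattan(features[i], features[j]) for j in range(n)]
            for i in range(n)]

def normalize(dm):
    """Scale a matrix into [0, 1] by its largest entry (assumed normalization)."""
    peak = max(max(row) for row in dm) or 1.0
    return [[v / peak for v in row] for row in dm]

def combine(dm_a, dm_b):
    """Average two normalized matrices entry by entry (assumed equal weights)."""
    a, b = normalize(dm_a), normalize(dm_b)
    return [[(x + y) / 2 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]
```

For example, `combine(dissimilarity_matrix(lcvp_features), dissimilarity_matrix(ch_features))` would give one merged matrix, where `lcvp_features` and `ch_features` stand for lists of per-model histograms.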

G. Rank Combination
We choose the first optimization by using a rank combination approach. Many rank combination approach can be cast as the problem of assigning scores to classes based on the ranks they receive from multiple constituent classifiers. Once assigned, then classes can be ordered based on their scores, generating a combined ranking. We use the following notation. There are K target classes and a collection of J component classifier algorithms . For any particular query, the output of algorithm (1),..., (K)), with (K) being the rank assigned by algorithm to class . A score function, for each class maps the rankings to a scalar where θ is a class in . As score functions are ultimately used to generate new rankings, they have the important property of being invariant to monotonic transformation.
where g is monotonically increasing and R* denotes the combined ranking derived from the class scores. In [64], they proposed the Mixed Group Ranks (MGR) score function which generalizes the Best Rank and Linear scores. It is a weight linear sum of the minimum rank of all subset of classifiers. (22) The MGR score function combines the democratic voting aspect of the Linear and Borda Scores with the emphasis on confident rankings of the Best Rank score. In [64], they describe the general category of score function that embodies these characteristics. Score functions that are both monotonic and quasiconvex (in the ranks assigned to a class) prefer lower ranks to bigger ranks and assign greater influence to smaller ranks. With non-negative weights, , MGR is both monotonic and quasioconvex, embodying these score properties.
Based on MGR concept, we implement this approach by comparing two ranks (first is LCVP rank and second is MFSD rank) which is built by employing two different dissimilarity matrix (DM). This process is started by building two ranks using the same query. Suppose that with being the rank assigned by algorithm LCVP to class . We then examine whether there is a same member situated in both ranks. If no one than there is no update in the first rank, otherwise this member will be the highest priority in the first rank. Final result of first rank will update LCVP distance matrix. In Fig.  10, it is shown how a rank be generated by both LCVP method and MFSD method using the same query model. The sequence number of models based on distance is shown on each rank. There are some sliced members, then comparison is performed and finally a final rank established. This result will be used to update distance matrix LCVP. The formula is shown as follows: (23) where is rank result within K numbers of query i. This result generated by computing dissimilarity matrix of each model in LCVP DM. While is list of dissimilarity matrix between query i to all models.
is k-th rank result of LCVP. In Fig. 10, we try to present our combination rank. First we build LCVP rank based on LCVP dissimilarity matrix (DM) using a query. At the same time, we also build MFSD rank based on MFSD DM using the same query. By looking into MFSD rank result one by one until end, we try to find the same answer from LCVP rank from the first until the end. Suppose we find the one similar answer, we then shift into first order for final answer. By using this final order, we recompute LCVP DM. We also try to capture our algorithm for combining rank.
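The promotion of shared members can be sketched as below. This is an illustration under assumptions rather than the authors' algorithm: the members shared by the first K entries of both ranks are moved to the front of the first rank, and their relative order is taken from the first rank. The window size `k`, that ordering choice, and the function names are hypothetical.

```python
def rank(dm_row):
    """Model indices sorted by increasing dissimilarity to the query."""
    return sorted(range(len(dm_row)), key=lambda j: dm_row[j])

def combine_ranks(first, second, k):
    """Promote members found in the top k of both ranks to the front of `first`."""
    shared = set(first[:k]) & set(second[:k])
    promoted = [m for m in first if m in shared]      # keep the first rank's order
    rest = [m for m in first if m not in shared]
    return promoted + rest
```

Here `first` would be the rank built from the LCVP dissimilarity matrix and `second` the rank built from the MFSD dissimilarity matrix for the same query; when no member is shared, the first rank is returned unchanged.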

III. RESULT AND DISCUSSION
In this section, several experiments are reported. To measure the performance of the system, we used the standard datasets from the Shape Retrieval Contest, SHREC'13 [65] and SHREC'14 [66]. The SHREC'13 dataset has 240 3D models grouped into 10 geometric classes and 33 texture classes. The SHREC'14 dataset has 572 3D models grouped into 16 geometric classes and 48 texture classes. Each dataset has a two-level ground truth: if two models share only shape, they are called relevant; if they share both shape and texture, they are called highly relevant.
The evaluation uses several measures: average precision-recall curves, Nearest Neighbour (NN), First Tier (FT), Second Tier (ST), and Average Dynamic Recall (ADR). At the end of the experiments, we compare the proposed method with the state of the art using these standard measures. The evaluation code comes from the SHREC organizers, which ensures that all methods are evaluated fairly.

A. Experimental Setup
For the experiments, we follow the rules established by the committee of the SHREC 2014 track on Retrieval and Classification on Textured 3D Objects. We also use the performance evaluation code provided by the committee to ensure a fair comparison. Detailed information can be found in [12]. The performance of the method is evaluated according to the following relevance scale: if a retrieved object shares both shape and texture with the query, it is highly relevant; if it shares only shape, it is relevant. The evaluation is based on the following measures: average precision-recall curves, Nearest Neighbour (NN), First Tier (FT), Second Tier (ST), and Average Dynamic Recall (ADR).
Average precision-recall curves. Precision is the fraction of retrieved items that are relevant to the query. Recall is the fraction of the items relevant to the query that are successfully retrieved. With A the set of all relevant objects and B the set of all retrieved objects, precision and recall are defined by:
$\mathrm{Precision} = \frac{|A \cap B|}{|B|}, \qquad \mathrm{Recall} = \frac{|A \cap B|}{|A|}$ (24)
Nearest Neighbour, First Tier, and Second Tier. These measures check the fraction of objects in the query's class that also appear within the top k retrieved objects.
Average dynamic recall. The idea is to measure how many of the items that should have appeared before or at a given position in the result list actually have appeared. The average dynamic recall (ADR) at a given position averages this measure up to that position. Precisely, for a given query, let A be the set of highly relevant items and let B be the set of relevant items. The ADR is computed from these sets as given in (25).
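The set-based measures can be illustrated with a short Python sketch. The official SHREC evaluation code was used for the reported numbers; the functions below are only a hypothetical re-statement. In particular, the cut-offs used for the tiers (the class size for First Tier and twice the class size for Second Tier, with the query itself excluded) follow the usual convention and are an assumption here.

```python
def precision_recall(relevant, retrieved):
    """Precision and recall of a retrieved list against the set of relevant items."""
    hits = len(set(relevant) & set(retrieved))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def tier_scores(ranking, relevant):
    """Nearest Neighbour, First Tier, and Second Tier for one query.

    ranking: retrieved items in rank order, query excluded.
    relevant: set of items in the query's class, query excluded.
    """
    size = len(relevant)
    nn = 1.0 if ranking[0] in relevant else 0.0
    ft = sum(1 for m in ranking[:size] if m in relevant) / size
    st = sum(1 for m in ranking[:2 * size] if m in relevant) / size
    return nn, ft, st
```

Averaging these per-query values over all queries gives the scores reported in a retrieval table.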

B. Relevant
First, we use global features to capture geometric information with the multi Fourier spectral descriptor (MFSD), which extracts features and generates a dissimilarity matrix between all models. Fig. 11 shows partial results on the SHREC'13 dataset. The query is on the left and the answers are on the right. The first query is a human; with the proposed method we retrieve similar models, namely other humans in different poses. The second and third queries are both trees, and the answers have similar shapes. Fig. 12 shows partial results on the SHREC'14 dataset. The queries (pliers and scissors) are on the left and the answers are on the right. Although some retrievals fail, the approach gives good results overall.
Fig. 11 Example of the 3D retrieval system in relevant mode using the SHREC'13 dataset
Fig. 12 Example of the 3D retrieval system in relevant mode using the SHREC'14 dataset

C. High Relevant
Second, we perform our local features called LCVP, then combine with MFSD features using a rank combination. In Fig. 13, we show partial result using dataset SHREC'13. Left side is a query and the right one is answers. The first query is round table, by using our proposed methods we can retrieve the similar one such round table. Similarly, the second and third query, queries are an ant and round table again. Now we can get answers with similar in shapes and texture. In Fig. 14, we also show partial result using dataset SHREC'14. Left sides are queries (study lamp, vase, face sculpture) and right sides are the answers. Once again, we get answers with similar in shapes and texture. We can compare additional performance we get after combine LCVP and MFSD  The result from other participant is obtained from the paper in [65] for SHREC'13 and paper in [66] for SHREC'14. The description of each method also can be found in a paper [65]. As we can see in Table 1 and Table 2 as a result of the experiment when using dataset SHREC'13. Table 1 shows the result in the relevant category. It means that the retrieval system is only focusing on shape. G1 is the best method because it uses geometric calculation using geodesic metric and eigenvalue. This method is best, but it needs time more. We have already tried this method but the computation time needs high specification machine, while our method no needs complex computation. Even though our method is not the best, but the result is still closed to the best Table 2 shows the result on the high relevant category. It means that the retrieval system is focused on shape and texture. The combination of LCVP and MFSD is our proposed method become the best in this category.
This dataset consists of 240 models. The 3D models in the same texture class mostly have similar geometry, while the texture is distinguished by differences in color, so the dataset is not too complex. Our approach works well in this case because it has a geometric element to determine shape and a color element to determine texture. Fig. 15 shows the comparison of all participants' approaches in the precision-recall evaluation.

The next experiment uses the SHREC 2014 dataset and compares our method with those of the other participants in that event. For comparison, we refer to [66]; the results are given in Table 3 and Table 4. The Gi method was proposed by D. Giorgi from the National Research Council (Italy). It is called Textured Shape Distribution (TSD), a color-aware variant of the classical Shape Distribution. TSD consists of the distribution of mutual distances computed between points sampled over the surface mesh representing the 3D model. TSD employs geodesic distance instead of Euclidean distance, and the geodesic distance is computed on the surface embedded in a three-dimensional color space. The Ve method was proposed by S. Velasco-Forero (ITWM Fraunhofer Institute, Germany). It also computes the geodesic distance matrix from the mesh information and uses a spectral representation of the geodesic distance as a descriptor. It uses color information in RGB space as a texture descriptor and finally combines the two. The GG method was proposed by V. Garro and A. Giachetti from the University of Verona (Italy). They compute the textured mesh difference based on the histogram of the Multiscale Area Projection Transform (MAPT). This method is based on a spatial map that encodes the likelihood of points inside the shape being centers of spherical symmetry. The LBG method was proposed by C. Li, A. Godil (NIST, USA) and A. Ben Hamza (Concordia University, Canada).
They employ spectral geometry for textured 3D shape representation and retrieval. This method is based on the eigendecomposition of the Laplace-Beltrami Operator (LBO), which provides a rich set of eigenbases that are invariant to isometric transformations. The TA method was proposed by A. Tatsuma, M. Aono and C. Sanada (Toyohashi University of Technology). They propose the Multi-resolution Representation Local Binary Pattern Histogram (MRLBPH). This method encloses a 3D model within a unit geodesic sphere after normalizing the model via Point SVD and renders images from 38 viewpoints. The HA method was proposed by Hero Yudo and M. Aono (Toyohashi University of Technology). They employ methods such as LBP and LTP in the spatial domain. The AEF method was proposed by M. Abdelrahman, M. El-Melegy and A. Farag from the University of Louisville (USA). They generate a shape descriptor using the scale-invariant heat kernel signature; the texture descriptor is based on the color histogram in RGB space. The XL method was proposed by C.-X. Xu and Y. J. Liu from Tsinghua University. They propose a sketch-based method, which belongs to the 3D image-based category but applies to specific cases such as the 3D CAD design process.

Table 3 and Table 4 show the experimental results on the SHREC'14 dataset, which consists of 572 models. Under the relevant criterion, our approach is the best, while under the high relevant criterion it ranks first only on the first-tier and second-tier measures. Fig. 16 shows the comparison of all participants' approaches in the precision-recall evaluation.

In this paper, a novel approach based on the local color voxel pattern, called LCVP, is proposed. It extracts 3D textured features by considering the pattern of a voxel relative to its neighbors. To increase performance, it is aggregated with spatial patterns, namely the Weber descriptor, the oriented gradient, and the multi Fourier spectral descriptor. A linear combination is chosen to accumulate the individual functions.
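A linear combination of several descriptor distances can be sketched as below. This is a hedged illustration only: the min-max normalization step, the weights and the distance values are assumptions made for the example and are not reported in this paper.

```python
def min_max(values):
    """Rescale a list of distances to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def linear_combination(distance_rows, weights):
    """Weighted sum of normalized per-descriptor distances for one query.

    `distance_rows[k][j]` is the distance from the query to model j under
    descriptor k; `weights[k]` is the (assumed) weight of descriptor k.
    """
    normalized = [min_max(row) for row in distance_rows]
    n = len(distance_rows[0])
    return [sum(w * row[j] for w, row in zip(weights, normalized))
            for j in range(n)]


# Made-up distances from one query to three models under three descriptors.
weber = [2.0, 4.0, 6.0]
gradient = [10.0, 30.0, 20.0]
fourier = [8.0, 0.0, 4.0]
combined = linear_combination([weber, gradient, fourier], [0.25, 0.25, 0.5])
print(combined)
```

Normalizing each descriptor before the weighted sum keeps a descriptor with a large numeric range from dominating the combined distance; in practice the weights would be tuned on the dataset.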
One process that should not be overlooked is pose normalization. It has a large impact on determining the most appropriate position for capturing the image, and it still leaves room for improving performance.
The combination of all approaches is further complemented by a rank combination between them to produce a more satisfactory result. To verify it, the approach was evaluated experimentally using standard evaluation measures. The experiments show that it outperforms several state-of-the-art methods. In the future, we wish to improve the proposed method by improving pose normalization and by considering non-rigid objects.

ACKNOWLEDGMENT
The author would like to thank DIKTI (Indonesian Higher Education) for the research funding.