Video-Based Stylized Rendering Using Frame Difference

In this paper, we propose video-based stylized rendering using frame difference. Stylized rendering of video frames suffers from a temporal coherence problem caused by differences between the previous and current frames. To reduce this problem, we generate reference maps from the temporal frame difference and use them in the correction and rendering steps. Correcting frames with reference maps reduces the flickering caused by the difference between the previous and current frames. We use a background map, an average map, and a quadtree-based summed area table as reference maps. Among these, the method using the quadtree-based summed area table removes flickering and popping effects in the background region in our experiments. In addition, post-blurring with bilateral filtering produces a smoother stylized rendering by removing unnecessary noise. The proposed stylized rendering system can be used in fields such as visual art, advertising, games, and film to generate stylized image content.


I. INTRODUCTION
Stylized rendering is a non-photorealistic rendering (NPR) approach that simulates artistic techniques such as pen-and-ink, sketch, cartoon, watercolor, and painterly rendering, in contrast to photorealistic rendering (PR) [1]. In recent years, many studies have sought to convert original frames into stylized frames in real time, working on video streams rather than still images. Stylized rendering of video streams has the problem that temporal coherence cannot be maintained because of differences between the previous and current frames.
Temporal coherence in sketch rendering involves both spatial and temporal aspects of feature lines: flatness, motion coherence, and temporal continuity [2]. If motion coherence and temporal continuity are satisfied but flatness is lacking, the rendered scene emphasizes its 3D appearance and does not resemble traditional hand-drawn animation. Keeping feature lines fixed in image space ensures flatness and temporal continuity but produces the shower door effect [3], because the motion of the feature lines has no correlation with the motion flow of the scene. Finally, if flatness and motion coherence are maintained but temporal continuity is neglected, the animated scene exhibits flickering and popping effects [4], [5], since the positions of feature lines vary randomly from frame to frame.
Sketch rendering algorithms can be classified into image-space and object-space algorithms [6]. Image-space algorithms [7]-[13] use image processing techniques to extract lines such as silhouettes, contours, or boundaries. The extracted lines may flicker because 3D geometry information is unavailable.
Object-space algorithms [14], [15] extract feature lines from the surfaces of 3D geometry models. These feature lines are easy to render with various widths and painting styles. However, without exact control of temporal coherence, the lines can produce artifacts such as sliding or popping.
In this paper, we propose an image-space temporal coherence algorithm based on pixel differences from frame to frame. To reduce flickering in the background, we propose correcting the original image with reference maps in the correction step. In addition, to improve the sketch effect, we apply post-blurring to the stylized image in the rendering step. This paper is organized as follows. Section II discusses still-image and video-based sketch rendering in detail. Section III presents the results of the proposed video-based sketch rendering. Finally, we present the conclusion and directions for future work.

A. Still Image Based Sketch Rendering
As shown in Fig. 1, the first step of sketch rendering converts the RGB image captured from a video file or web camera to grayscale. We then invert the grayscale image to obtain a negative image and blur the negative image. Finally, we combine the negative image and the blurred image using the color dodge algorithm [16]. The blurred image can be generated with Gaussian blur or bilateral filtering; the differences between the two are explained in the description of the rendering step. Fig. 2 shows the result of still-image sketch rendering using the color dodge algorithm.
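The four steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the common color dodge formulation in which the grayscale image is the base layer and the blurred negative is the blend layer, BT.601 luma weights for the grayscale conversion, and a Gaussian pre-blur. The function names and the default `sigma` are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 uint8 RGB image to float grayscale in [0, 255]."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights (assumed)
    return rgb.astype(np.float64) @ weights

def gaussian_blur(img, sigma=2.0):
    """Separable Gaussian blur with edge replication at the borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def color_dodge(base, blend):
    """Color dodge on a 0..255 scale: base * 255 / (255 - blend)."""
    denom = np.maximum(255.0 - blend, 1e-6)  # avoid division by zero
    return np.clip(base * 255.0 / denom, 0.0, 255.0)

def sketch(rgb, sigma=2.0):
    """Grayscale -> negative -> blur -> color dodge (assumed layer order)."""
    gray = to_gray(rgb)
    negative = 255.0 - gray
    blurred = gaussian_blur(negative, sigma)
    return np.rint(color_dodge(gray, blurred)).astype(np.uint8)
```

In flat regions the blurred negative equals the negative, so the dodge saturates to white; only pixels near intensity edges stay dark, which produces the pencil-line appearance.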
If we use a video instead of a still image, flickering and popping occur: the pixel value at a given position changes randomly because of color differences between the previous and current frames. Fig. 3 shows the flickering that results when the still-image sketch rendering method is applied to video. To reduce these problems, an algorithm for correcting the original image is required.
Fig. 3 The problems of temporal coherence in stylized rendering
Fig. 4 shows an overview of the proposed video-based sketch rendering system. To maintain temporal coherence by removing the flickering that occurs in the background, we propose correcting the original image with reference maps in the correction step and applying post-blurring in the rendering step. In the correction step, video frames are input from a video file or web camera, and the original frame is corrected using the frame difference between the original image and the reference map. In the rendering step, we convert the corrected frame to a grayscale image and invert it. We then blur the inverted frame using bilateral filtering and apply the color dodge algorithm to the blurred frame. To further reduce flickering and improve the sketch effect, post-blurring with bilateral filtering is applied again to the rendered frame. Each step is explained in detail in the following subsections.

C. Correction Step
To reduce the temporal artifacts of video-based sketch rendering, the original image must be corrected to compensate for differences between frames. In this study, we correct the original image using three reference maps: a background map, an average map, and a quadtree-based summed area table.
1) Background Map: A background map is an image from which the foreground has been removed, leaving only the background. It can be generated by capturing an image without foreground in advance, or by storing only the pixels with the minimum color difference between frames in a video recorded over a predetermined time [17], [18].
According to Eq. 2, a corrected image is generated using the frame difference between the original image and the background map.
2) Average Map: Because a background map requires capturing an image without foreground in advance or recording a video for a certain time, the background map must be regenerated whenever the camera viewpoint changes. To address this problem, we propose an average map that divides an image into grid areas and stores the average of each area. In this study, we divide an image into 2-by-2, 4-by-4, and 8-by-8 pixel areas. We maintain one average map for the reference frame and one for the current frame. The average map $A_M$ is calculated as shown in Eq. 3, where $d$ is the number of pixels along each coordinate axis of a divided area.
The corrected image using the average map is generated by calculating the color difference between the average map for the reference frame and the average map for the current frame, as shown in Eq. 4.
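The average-map correction can be sketched as follows, assuming single-channel frames whose height and width are multiples of $d$. A block is taken from the reference frame when its average differs from the reference average by less than a threshold, and from the current frame otherwise, in which case the reference average is also updated. The function names and the parameter `a_th` are hypothetical, and this is an illustration rather than the exact form of Eq. 3 and Eq. 4.

```python
import numpy as np

def average_map(img, d):
    """Mean of each d x d block; H and W are assumed to be multiples of d."""
    h, w = img.shape
    return img.reshape(h // d, d, w // d, d).mean(axis=(1, 3))

def correct_with_average_map(current, reference, ref_avg, d, a_th):
    """Return the corrected frame and the updated reference average map."""
    cur_avg = average_map(current, d)
    # A block counts as changed when its average difference reaches a_th.
    changed = np.abs(cur_avg - ref_avg) >= a_th
    # Expand the per-block decision to a per-pixel mask.
    mask = np.repeat(np.repeat(changed, d, axis=0), d, axis=1)
    corrected = np.where(mask, current, reference)
    # Only changed blocks overwrite the reference average map.
    new_ref_avg = np.where(changed, cur_avg, ref_avg)
    return corrected, new_ref_avg
```

Because the decision is made per block, a larger $d$ suppresses more noise but copies whole blocks at once, which is consistent with visible seams at coarse block sizes.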
The average map difference is the absolute value obtained by subtracting the corresponding pixel values of the average map for the current frame from those of the average map for the reference frame, and $A_C$ is the corrected image using the average map. If the difference between frames is smaller than the threshold value $A_{th}$, the corresponding pixel value of the corrected image is replaced with the pixel value of the reference frame. Otherwise, the pixel value of the corrected image is replaced with that of the current frame, and the pixel value of the average map for the reference frame is updated to the pixel value of the average map for the current frame. Fig. 6 shows the corrected images using average maps with 2-by-2, 4-by-4, and 8-by-8 pixel areas. As shown in Fig. 6-c, when the image is corrected using the 8-by-8 average map, changes between the previous and current frames are detected, but the pixels within each divided area are not properly reflected, so the seams between divided areas become noticeable.
Fig. 6 The correction of original image using average map: a. corrected image (2x2), b. corrected image (4x4), c. corrected image (8x8)
3) Quadtree-Based Summed Area Table: To resolve the seam problem of the average map, we propose image correction using a quadtree-based summed area table (QSAT). QSAT is a map that extends the summed area table (SAT) by applying it to a quadtree. First, we create an SAT, $SAT_M$, for the reference and current frames as shown in Eq. 5.
The average value of each region is obtained from the SAT. If the average difference between corresponding regions in the current and reference frames is greater than the threshold value $S_{th}$, the region is subdivided into four subregions. This process is repeated recursively until the average difference between corresponding regions is less than $S_{th}$ or the predefined smallest subregion size is reached. Fig. 7-a shows the result of image segmentation by quadtree using the SAT.
The corrected image using QSAT is generated by comparing each region of the quadtree, as shown in Fig. 7-b. If the average difference of a corresponding region between the current and reference frames is less than $S_{th}$, that region of the corrected image is replaced with the region from the reference frame. Otherwise, it is replaced with the region from the current frame.
Fig. 7 The correction of original image using QSAT: a. quadtree representation, b. corrected image using QSAT
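The QSAT correction described above can be sketched as follows, assuming single-channel float frames. The SAT gives each region mean in constant time, and the recursion subdivides a region only while its mean difference is at least the threshold. The names `qsat_correct`, `s_th`, and `min_size` are hypothetical, and the whole frame is assumed to be the root region, which the text does not specify.

```python
import numpy as np

def summed_area_table(img):
    """SAT with a zero row and column prepended to simplify region sums."""
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return sat

def region_mean(sat, y, x, h, w):
    """Mean of the h x w region with top-left corner (y, x), from four lookups."""
    total = sat[y + h, x + w] - sat[y, x + w] - sat[y + h, x] + sat[y, x]
    return total / (h * w)

def qsat_correct(current, reference, s_th, min_size=2):
    """Keep reference regions that changed little; copy current ones otherwise."""
    sat_cur = summed_area_table(current)
    sat_ref = summed_area_table(reference)
    corrected = reference.copy()

    def visit(y, x, h, w):
        diff = abs(region_mean(sat_cur, y, x, h, w)
                   - region_mean(sat_ref, y, x, h, w))
        if diff < s_th:
            return  # region is stable: keep the reference pixels
        if h <= min_size or w <= min_size:
            # Smallest subregion reached: take the current pixels.
            corrected[y:y + h, x:x + w] = current[y:y + h, x:x + w]
            return
        h2, w2 = h // 2, w // 2
        visit(y, x, h2, w2)
        visit(y, x + w2, h2, w - w2)
        visit(y + h2, x, h - h2, w2)
        visit(y + h2, x + w2, h - h2, w - w2)

    visit(0, 0, current.shape[0], current.shape[1])
    return corrected
```

In this sketch, a small change inside a large region can stay below the threshold at a coarse level and never be subdivided, so the choice of root region size and `s_th` matters in practice.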

D. Rendering Step
The rendering step of the proposed video-based sketch rendering method is similar to the conventional still-image sketch rendering method except for the blurring step [19]. To produce a natural sketch effect, we divide the blurring step into pre-blurring and post-blurring. If only pre-blurring is applied, the extracted feature lines do not clearly convey the sketch effect.
In this study, we use Gaussian blur and the bilateral filter. When sketch rendering is performed using Gaussian blur [20], the feature lines are extracted correctly, but the lines are thin and objects are not clearly displayed. When the bilateral filter [21] is used, line thickness is properly expressed, but unnecessary noise is also extracted. Fig. 8 compares the images obtained using Gaussian blur and the bilateral filter in pre-blurring.
Fig. 8 The comparison of Gaussian blur and bilateral filter: a. Gaussian blur, b. bilateral filter
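To make the difference between the two filters concrete, the following is a minimal brute-force bilateral filter in NumPy. Unlike Gaussian blur, each neighbor is weighted by both spatial distance and intensity difference, so smoothing does not cross strong edges. The parameter names and defaults (`radius`, `sigma_space`, `sigma_range`) are hypothetical and do not come from the paper.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_range=25.0):
    """Brute-force bilateral filter for a single-channel image."""
    img = img.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img)   # weighted sum of neighbor intensities
    norm = np.zeros_like(img)  # sum of weights, used for normalization
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image shifted by (dy, dx).
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: depends only on the offset.
            w_space = np.exp(-(dy * dy + dx * dx) / (2 * sigma_space ** 2))
            # Range weight: depends on the intensity difference.
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_range ** 2))
            weight = w_space * w_range
            acc += weight * shifted
            norm += weight
    return acc / norm
```

With a small `sigma_range`, neighbors across a strong edge receive almost no weight, so the edge is preserved, while small intensity fluctuations within a flat region are averaged out.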
When these blurring methods are applied to a video, the positions of the unnecessarily extracted noise change randomly in every frame. To reduce this problem, we add a post-blurring step to the sketch-rendered image. Fig. 9 shows the results obtained by combining Gaussian blur and bilateral filtering in pre-blurring and post-blurring. Fig. 9-a shows the result of applying Gaussian blur in pre-blurring and the bilateral filter in post-blurring: the noise has been removed, but the image is heavily blurred, so foreground objects and background are not clearly represented. Similarly, when Gaussian blur is applied in both pre-blurring and post-blurring, the image is very blurry (Fig. 9-b). When the bilateral filter is applied in pre-blurring and Gaussian blur in post-blurring, the unnecessary noise remains, as shown in Fig. 9-c. Finally, when the bilateral filter is used for both pre-blurring and post-blurring, the noise is removed and the resulting image is clearly displayed (Fig. 9-d). In this study, we use bilateral filtering in both pre-blurring and post-blurring because it gives the strongest sketch effect.
Table 1 shows the ratio of changed pixels, obtained by comparing corresponding pixels of the background part (the red box area in Fig. 10) between neighboring frames. $F_i$ denotes the final sketch-rendered image for the current frame, $F_{i-1}$ for the previous frame, and $F_{i+1}$ for the next frame. When the original image is used as is, the ratio of changed pixels between neighboring frames is highest, and the animated scene exhibits flickering and popping effects. When the corrected image using the background map is applied, the ratio of changed pixels is slightly reduced. When the corrected image using the average map is applied, the ratio improves further compared with the background map. Finally, when the corrected image using QSAT is applied, no pixels change in the background part between neighboring frames.
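A changed-pixel ratio of this kind could be computed as in the sketch below. The paper does not state how a changed pixel is defined, so the tolerance `tol`, the box layout `(y0, y1, x0, x1)`, and the function name are assumptions made for illustration.

```python
import numpy as np

def changed_pixel_ratio(frame_a, frame_b, box, tol=0):
    """Fraction of pixels inside box whose absolute difference exceeds tol.

    box is (y0, y1, x0, x1) with exclusive upper bounds (assumed layout).
    """
    y0, y1, x0, x1 = box
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    a = frame_a[y0:y1, x0:x1].astype(np.int32)
    b = frame_b[y0:y1, x0:x1].astype(np.int32)
    return float(np.mean(np.abs(a - b) > tol))
```

Evaluating the function on the pairs ($F_{i-1}$, $F_i$) and ($F_i$, $F_{i+1}$) over the same background box gives one ratio per pair of neighboring frames.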
The proposed QSAT method can remove flickering and popping effects. Fig. 11 shows the resulting images of video-based sketch rendering. The proposed sketch rendering method works well in both indoor and outdoor scenes.
In this paper, we proposed correcting the original image with reference maps (a background map, an average map, and QSAT) to reduce flickering in the background during the correction step. We also added post-blurring with bilateral filtering to reduce flickering and improve the sketch effect in the rendering step. The proposed rendering system can be used in fields such as visual art, advertising, games, and film to generate stylized image content.
When a corrected image is generated using QSAT, the boundaries between divided areas are sometimes not smoothly connected. We plan to address this problem by improving the proposed algorithm in future work.