In this paper, we propose a new lightweight hybrid video codec consisting of a conventional video codec (HEVC or VVC), a lossless image codec, and our new restoration network. The encoder is composed of a conventional video encoder and a lossless image encoder, and it transmits a lossy-compressed video bitstream along with a losslessly compressed reference frame. The decoder is constructed with the corresponding video/image decoders and a new restoration network, which enhances the compressed video in a two-step process. The first step uses a network trained on a video dataset to restore the details lost by the conventional encoder. The second step further enhances the video quality using a reference image, which is a losslessly compressed video frame. The reference image provides video-specific information that can be exploited to better restore the details of the compressed video. Experimental results show that the overall coding gain is comparable to recent top-tier neural codecs while requiring much less encoding time and lower complexity. Our code is available at https://github.com/myideaisgood/hybrid_video_compression_rhee.
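To illustrate the two-step decoder-side restoration described above, the following is a minimal PyTorch sketch. The module names (DetailRestoreNet, ReferenceFusionNet), the layer sizes, and the fusion by simple channel concatenation are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of a two-step decoder-side restoration: step 1 restores
# details lost by the conventional encoder, step 2 refines the result using
# a losslessly transmitted reference frame. All names and layer sizes here
# are illustrative assumptions.
import torch
import torch.nn as nn

class DetailRestoreNet(nn.Module):           # step 1: trained on a video dataset
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)               # residual restoration

class ReferenceFusionNet(nn.Module):          # step 2: uses the lossless reference frame
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, restored, reference):
        return restored + self.body(torch.cat([restored, reference], dim=1))

decoded = torch.rand(1, 3, 64, 64)            # frame from the conventional video decoder
reference = torch.rand(1, 3, 64, 64)          # losslessly compressed reference frame
step1 = DetailRestoreNet()(decoded)
enhanced = ReferenceFusionNet()(step1, reference)
print(enhanced.shape)
```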
Single-Image Super-Resolution (SISR) is a technique for creating high-resolution images from low-resolution ones. However, since low-resolution images lack high-frequency components compared to their high-resolution counterparts, recreating the missing information is the crucial task. To address this, we propose imposing a training loss directly in the frequency domain, in addition to the spatial domain. Specifically, we introduce an adversarial loss on training patches converted to the frequency domain by the Discrete Cosine Transform, using a discriminator consisting of a convolutional neural network. We also incorporate a Wavelet-domain High-Frequency Loss, which emphasizes the high-frequency spectrum. Our experiments demonstrate that this approach improves both quantitative and qualitative results.
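The sketch below only illustrates the idea of penalizing a super-resolution result in the DCT frequency domain. The paper's actual loss is adversarial, with a CNN discriminator on the DCT patches; here a plain L1 penalty on DCT coefficients, with an assumed extra weight on high frequencies, stands in as a simplified illustration.

```python
# Simplified frequency-domain penalty: compare the DCT spectra of an SR and
# HR patch, weighting high frequencies more strongly. This is a stand-in for
# the paper's adversarial DCT-domain loss.
import numpy as np
from scipy.fft import dctn

def dct_domain_loss(sr_patch, hr_patch, hf_weight=2.0):
    """L1 distance between DCT spectra, emphasizing high frequencies."""
    sr_dct = dctn(sr_patch, norm='ortho')
    hr_dct = dctn(hr_patch, norm='ortho')
    h, w = sr_patch.shape
    # Frequency index grows from (0, 0) (DC) toward (h-1, w-1) (highest frequency).
    fy, fx = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    weight = 1.0 + hf_weight * (fy + fx) / (h + w - 2)
    return np.mean(weight * np.abs(sr_dct - hr_dct))

hr = np.random.rand(48, 48)
sr = hr + 0.05 * np.random.randn(48, 48)      # stand-in for a network output
print(dct_domain_loss(sr, hr))
```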
JPEG is a widely used image compression standard that provides reasonable image quality over a wide range of compression rates. However, when an image is compressed with a low quality factor to increase the compression rate, a large loss is incurred in the frequency domain, which turns into visible artifacts in the image domain. Accordingly, removing artifacts from JPEG-compressed images has been an essential image restoration task. While most previous methods use the compression quality factor available in the header of the JPEG file, we note that this approach is not practical in real-world scenarios because the metadata of many compressed images is missing. To deal with this issue, we propose a new method based on a Deformable Offset Gating Network (DOGNet) and a Variational Autoencoder (VAE). We train the overall network in an end-to-end manner, where the role of the VAE is to guide the offsets of the deformable convolution to flexibly deal with images compressed with diverse and unknown quality factors. Extensive experiments validate that our method achieves better or comparable results to state-of-the-art methods in JPEG artifact removal.
Single-Image Super-Resolution methods typically assume that a low-resolution image is degraded from a high-resolution one through convolution with a “bicubic” kernel followed by downscaling. However, this induces a domain gap between training image datasets and real-world test images, which are down-sampled from images that underwent convolution with arbitrary unknown kernels. Hence, correct kernel estimation for a given real-world image is necessary for its better super-resolution. One of the kernel estimation methods, KernelGAN, locates the input image in the same domain as the high-resolution image for accurate estimation. However, using only a low-resolution image cannot fully exploit the high-frequency information in the original image. To increase the estimation accuracy, we adopt a super-resolved image for kernel estimation. We also use a flow-based kernel prior to obtain a reasonable super-resolved image and to stabilize the whole estimation process.
Video super-resolution (VSR) aims to generate a high-resolution (HR) video by reconstructing each frame from the corresponding low-resolution (LR) frame and its neighboring ones. However, it is challenging to generate a clean HR frame due to displacement between the frames. To alleviate this problem, most existing approaches explicitly align neighboring frames to the reference frame. However, these methods tend to generate noticeable artifacts in the aligned results. In this paper, we propose a detail-structure blending network (DSBN), which structurally aligns the adjacent frames by blending them in the deep-feature domain. The DSBN extracts deep features from the reference frame and each neighboring frame separately, and blends the extracted detail and structure information to obtain well-aligned results. Afterward, a simple reconstruction network generates the final HR frame from the aligned frames. Experimental results on benchmark datasets demonstrate that our method produces high-quality HR results even for videos with non-rigid motions and occlusions.
This paper proposes a blind detection method for fabric defects using multiple image features. The aim of the proposed method is to detect new types of fabric defects that have not been learned. For general learning of image features, we first learn the characteristics of image features of normal and defective image patches for various types of fabric structures and defects. The image features are frequency coefficients, a color histogram, and an edge-orientation histogram. The mean vectors of the three image features are calculated, and the vector-distance distributions of normal and defective patches are learned using a support vector machine. Since the decision boundary is determined from general distributions of image features, the proposed method is not restricted to the defect types and fabric structures used in the learning phase. In experiments with real fabric images and defect types that have not been learned, the proposed method detects fabric defects with 96.4% accuracy at a 0.4% failure rate, including false positive and false negative errors. This result outperforms typical CNN (Convolutional Neural Network) approaches.
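A minimal sketch of the distance-based decision described above follows: distances between a patch's feature vectors and the mean vectors of normal patches are fed to an SVM. The feature extraction is reduced to simple stand-ins (FFT coefficients, an intensity histogram, a gradient-orientation histogram), and the toy data and SVM kernel are assumptions.

```python
# Sketch: learn a decision boundary over feature-to-mean distances with an SVM.
import numpy as np
from sklearn.svm import SVC

def patch_features(patch):
    """Toy stand-ins for the three feature types (grayscale patch)."""
    freq = np.abs(np.fft.fft2(patch))[:4, :4].ravel()          # frequency coefficients
    hist, _ = np.histogram(patch, bins=16, range=(0.0, 1.0))   # intensity histogram
    gy, gx = np.gradient(patch)
    ori, _ = np.histogram(np.arctan2(gy, gx), bins=8)          # edge-orientation histogram
    return [freq, hist.astype(float), ori.astype(float)]

def distance_vector(patch, mean_vectors):
    """Distance of each feature to the corresponding normal-class mean vector."""
    return np.array([np.linalg.norm(f - m)
                     for f, m in zip(patch_features(patch), mean_vectors)])

# Toy data: "normal" patches are smooth, "defective" patches contain a streak.
rng = np.random.default_rng(0)
normals = [rng.random((32, 32)) * 0.1 + 0.5 for _ in range(40)]
defects = []
for _ in range(40):
    p = rng.random((32, 32)) * 0.1 + 0.5
    p[:, 14:18] = 1.0
    defects.append(p)

mean_vectors = [np.mean([patch_features(p)[i] for p in normals], axis=0) for i in range(3)]
X = np.array([distance_vector(p, mean_vectors) for p in normals + defects])
y = np.array([0] * len(normals) + [1] * len(defects))
clf = SVC(kernel='rbf').fit(X, y)             # decision boundary on distance distributions
print(clf.predict(X[:3]), clf.predict(X[-3:]))
```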
Single image super-resolution (SISR) is a classical problem in low-level vision, which aims to find a mapping from a single low-resolution (LR) input to the corresponding high-resolution (HR) output. Recently, deep networks have achieved great success in the SISR task. Most successful deep models consist of stacks of same-sized convolution filters, mostly 3×3. To cope with more variation in both local dependencies and global contexts between LR and HR images, we propose multi-kernel deep residual networks for SISR. Since larger kernels require more parameters, we adopt dilated convolution to enlarge the receptive field. We also adopt local feature fusion, global feature fusion, and local residual learning to control the multi-scale features, achieving better performance by accelerating convergence. Experimental results show that the proposed model yields improved performance.
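The following PyTorch sketch shows one plausible form of a multi-kernel residual block of the kind described above: parallel 3×3 convolutions with different dilation rates enlarge the receptive field without the parameter cost of larger kernels, a 1×1 convolution performs local feature fusion, and a local residual connection is added. The channel count and dilation rates are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a multi-kernel (dilated) residual block with local feature fusion
# and local residual learning. Channel counts and dilations are assumptions.
import torch
import torch.nn as nn

class MultiKernelResBlock(nn.Module):
    def __init__(self, ch=64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations])
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)   # local feature fusion
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [self.act(b(x)) for b in self.branches]
        return x + self.fuse(torch.cat(feats, dim=1))        # local residual learning

x = torch.rand(1, 64, 32, 32)
print(MultiKernelResBlock()(x).shape)
```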
A skew-estimation method using straight lines in document images is presented. Unlike conventional approaches exploiting the properties of text, we formulate the skew-estimation problem as an estimation task using straight lines in images and focus on robust and accurate line detection. To be precise, we adopt a block-based edge detector followed by a progressive line detector to take clues from a variety of sources such as text lines, boundaries of figures/tables, vertical/horizontal separators, and boundaries of text blocks. Extensive experiments on datasets of skewed images and competition results reveal that the proposed method works robustly and yields accurate skew-estimation results.
Conventional image stitching methods were developed under the assumption or condition that (1) the optical center of the camera is fixed (fixed-optical-center case) or (2) the camera captures a planar target (plane-target case). Hence, users should know or test which condition is more appropriate for the given set of images and then select the right algorithm, or try multiple stitching algorithms. We propose a unified framework for the image stitching and rectification problem that handles both cases. To be precise, we model each camera pose with six parameters (three for the rotation and three for the translation) and develop a cost function that reflects the registration errors on a reference plane. The designed cost function is effectively minimized via the Levenberg–Marquardt algorithm. For a given set of images, when the relative camera motions between the images are found to be large, the proposed method rectifies the images and then composes the rectified images; otherwise, the algorithm simply builds a visually pleasing result by selecting a viewpoint. Experimental results on synthetic and real images show that our method successfully performs stitching and metric rectification.
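The sketch below illustrates the pose-based cost minimization described above for the plane-target case: each camera is parameterized by a rotation vector and a translation (six parameters), points on the reference plane z = 0 are projected through H = K [r1 r2 t], and the reprojection residuals are minimized with Levenberg–Marquardt. The intrinsics, the single-camera setup, and the synthetic correspondences are illustrative assumptions, not the paper's full multi-image cost.

```python
# Sketch: recover a 6-dof camera pose by minimizing reprojection residuals of
# reference-plane points with the Levenberg-Marquardt algorithm.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # assumed intrinsics

def project(pose6, plane_pts):
    """Project reference-plane points (x, y on z = 0) with a 6-parameter pose."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    H = K @ np.column_stack([R[:, 0], R[:, 1], pose6[3:]])
    q = (H @ np.column_stack([plane_pts, np.ones(len(plane_pts))]).T).T
    return q[:, :2] / q[:, 2:3]

def residuals(pose6, plane_pts, observed):
    return (project(pose6, plane_pts) - observed).ravel()

# Synthetic example: recover one camera pose from plane-point correspondences.
true_pose = np.array([0.05, -0.02, 0.01, 0.1, 0.05, 2.0])
plane_pts = np.random.rand(30, 2) * 0.5
observed = project(true_pose, plane_pts)
init = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.5])
sol = least_squares(residuals, init, args=(plane_pts, observed), method='lm')
print(np.round(sol.x, 3))        # should approach true_pose
```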
A generalized wavelet-domain image fusion method is presented, which imposes weights on each of the wavelet coefficients to improve the conventional wavelet-domain approach. The weights are controlled in the least-squares sense to enhance the details while suppressing excessive high-frequency components. In experiments with IKONOS and QuickBird satellite data, we demonstrate that the proposed method performs comparably to or better than conventional methods in terms of various objective quality metrics.
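For illustration, the sketch below shows a weighted wavelet-domain fusion of a panchromatic image into a multispectral band: detail subbands are blended with per-subband weights. The fixed weight used here is only a stand-in for the least-squares-controlled weights of the proposed method.

```python
# Sketch: weighted wavelet-domain fusion (pan-sharpening style) with a fixed
# blending weight standing in for the least-squares-derived weights.
import numpy as np
import pywt

def fuse_band(ms_band, pan, weight=0.7, wavelet='db2'):
    """Blend detail subbands of an MS band with weighted pan details."""
    ms_a, (ms_h, ms_v, ms_d) = pywt.dwt2(ms_band, wavelet)
    pan_a, (pan_h, pan_v, pan_d) = pywt.dwt2(pan, wavelet)
    fused = (ms_a,
             ((1 - weight) * ms_h + weight * pan_h,
              (1 - weight) * ms_v + weight * pan_v,
              (1 - weight) * ms_d + weight * pan_d))
    return pywt.idwt2(fused, wavelet)

pan = np.random.rand(128, 128)     # panchromatic image (toy data)
ms = np.random.rand(128, 128)      # one up-sampled multispectral band (toy data)
print(fuse_band(ms, pan).shape)
```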
When the horizon or long edges are skewed in a photo, the photo may seem unstable unless the skew is an artistic intention, and hence we may wish to correct it. For the skew correction of faint as well as strong horizons, we propose a skew-estimation method for natural images. We first apply a long-block-based edge detector that can construct edge maps even when the edges are faint and/or the background is cluttered. We also propose a robust line-detection method that uses the generated edge map, based on the progressive probabilistic Hough transform followed by refinement steps. For each detected line, we define a weight and estimate the image skew from the weighted votes of the lines. Since all the pixels in the long blocks are used for the edge-map construction, the proposed method can find noisy or faint lines while rejecting curved or short lines. Experimental results show that the first salient angle corresponds to the image skew in most cases, and the skews are successfully corrected.
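The following sketch illustrates the weighted line-vote idea: segments are detected with the progressive probabilistic Hough transform, each segment votes for its angle with a weight, and the dominant angle is taken as the skew. Canny stands in for the long-block-based edge detector of the paper, and the length-based weights and 1-degree bins are assumptions.

```python
# Sketch: estimate image skew from length-weighted angle votes of Hough segments.
import numpy as np
import cv2

def estimate_skew(gray):
    edges = cv2.Canny(gray, 50, 150)                        # stand-in edge map
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                               minLineLength=40, maxLineGap=5)
    if segments is None:
        return 0.0
    votes = {}
    for x1, y1, x2, y2 in segments[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        angle = (angle + 90) % 180 - 90                     # fold into (-90, 90]
        length = np.hypot(x2 - x1, y2 - y1)                 # weight: segment length
        key = round(angle)                                  # 1-degree bins
        votes[key] = votes.get(key, 0.0) + length
    return max(votes, key=votes.get)                        # most-voted (salient) angle

# Toy image: a slightly tilted stripe pattern (about 2.9 degrees of skew).
img = np.zeros((200, 200), np.uint8)
for y in range(20, 200, 30):
    cv2.line(img, (0, y), (199, y + 10), 255, 2)
print(estimate_skew(img))
```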
This paper proposes an algorithm for the detection of pillars or posts in video captured by a single camera mounted on the front side of the rear-view mirror of a car. The main purpose of this algorithm is to complement the weakness of current ultrasonic parking assist systems, which cannot accurately find the position of pillars or recognize narrow posts. The proposed algorithm consists of three steps: straight-line detection, line tracking, and estimation of the 3D position of pillars. In the first step, strong lines are found by the Hough transform. The second step combines detection and tracking, and the third calculates the 3D position of each line by analyzing the trajectory of relative positions together with the camera parameters. Experiments on synthetic and real images show that the proposed method successfully locates and tracks the position of pillars, which helps the ultrasonic system correctly locate the edges of pillars. We believe the proposed algorithm can also serve as a basic element for vision-based autonomous driving systems.
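A minimal sketch of the first step (strong line detection with the Hough transform) is given below; pillar candidates are kept if they are close to vertical. The edge detector, thresholds, and verticality filter are illustrative assumptions, and line tracking and 3D position estimation are not shown.

```python
# Sketch: detect strong near-vertical lines as pillar/post candidates.
import numpy as np
import cv2

def detect_pillar_candidates(gray, max_tilt_deg=15):
    edges = cv2.Canny(gray, 80, 160)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=80)
    if lines is None:
        return []
    candidates = []
    for rho, theta in lines[:, 0]:
        tilt = abs(np.degrees(theta))              # theta = 0 means a vertical line
        if tilt < max_tilt_deg or tilt > 180 - max_tilt_deg:
            candidates.append((float(rho), float(theta)))
    return candidates

# Toy frame with two vertical "pillars".
frame = np.full((240, 320), 40, np.uint8)
cv2.rectangle(frame, (80, 0), (95, 239), 200, -1)
cv2.rectangle(frame, (220, 0), (240, 239), 200, -1)
print(detect_pillar_candidates(frame))
```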
We propose a new autofocus method for digital cameras based on the separation of color components from the incoming light rays and a measure of their disparity. To separate the color components, we place two apertures with red and blue color filters on them. This enables us to obtain two monochromatic images whose disparities are proportional to the distances of the objects from the camera. We also propose a new measure to find the disparity of these color components, because conventional disparity measures show low accuracy for pairs of images from different color channels. The measure is based on the observation that the overlap of images with disparity has many weak gradients, whereas the overlap with no disparity has a small number of strong gradients. One of the two images is shifted from left to right, and the measure is computed at each position. The position with the maximum measure is then taken as the disparity, and the direction and distance of focus are computed from the estimated disparity and the camera parameters. The proposed method has been implemented in a commercial compact camera, and it has been demonstrated that it finds the focus over a wide range of distance and illumination conditions.
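The sketch below illustrates the shift-and-score idea: one color-channel image is shifted horizontally over the other, and for each shift a gradient-based score of the overlaid image is computed; the shift with the maximum score is taken as the disparity. The concrete score used here (energy of strong gradients relative to total gradient energy) is an assumed stand-in for the measure proposed in the paper.

```python
# Sketch: estimate disparity between two color-channel images by maximizing a
# gradient-based score of their overlay over candidate horizontal shifts.
import numpy as np

def gradient_score(img, strong_thresh=0.6):
    """High when gradient energy is concentrated in a few strong gradients."""
    gx = np.abs(np.diff(img, axis=1))
    strong = gx[gx > strong_thresh]
    return strong.sum() / (gx.sum() + 1e-8)

def estimate_disparity(red_img, blue_img, max_shift=10):
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(blue_img, s, axis=1)      # horizontal shift of one image
        score = gradient_score((red_img + shifted) / 2.0)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

# Toy scene: a step edge seen through the two apertures with 4 px disparity.
base = np.zeros((64, 64)); base[:, 32:] = 1.0
red = base
blue = np.roll(base, 4, axis=1)
print(estimate_disparity(red, blue))                # expected: -4 (undoes the shift)
```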
This paper proposes a new colorization method based on chrominance blending. The blending weights are computed using the random walker algorithm, a soft segmentation technique that provides sharp probability transitions at object boundaries. As a result, the proposed method reduces color bleeding and provides improved colorization performance compared to conventional methods.
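A minimal sketch of probability-weighted chrominance blending follows: the random walker algorithm produces per-pixel probabilities for each scribble label, and the chrominance assigned to each label is blended with those probabilities. The toy luminance image, scribble positions, and chrominance values are illustrative assumptions.

```python
# Sketch: blend per-label chrominance with random-walker soft-segmentation
# probabilities.
import numpy as np
from skimage.segmentation import random_walker

gray = np.zeros((60, 60)); gray[:, 30:] = 0.8        # toy luminance with two regions
markers = np.zeros_like(gray, dtype=int)
markers[30, 5] = 1                                   # scribble for label 1 (left region)
markers[30, 55] = 2                                  # scribble for label 2 (right region)

# Probabilities of each pixel belonging to each scribble label, shape (2, H, W).
prob = random_walker(gray, markers, beta=130, return_full_prob=True)

label_chroma = np.array([[112.0, 72.0],              # (Cb, Cr) assigned to label 1
                         [85.0, 160.0]])             # (Cb, Cr) assigned to label 2
# Blend chrominance with the soft-segmentation probabilities.
chroma = np.tensordot(prob, label_chroma, axes=(0, 0))    # (H, W, 2)
print(chroma.shape, chroma[30, 5], chroma[30, 55])
```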
We propose a new method that classifies wafer images according to their defect types for automatic defect classification in semiconductor fabrication processes. Conventional image classifiers using global properties cannot be used for this problem, because the defects usually occupy very small regions of the images. Hence, the defects should first be segmented, and the shape of the segment and the features extracted from the region are used for classification. In other words, we need a classification-after-segmentation approach to use features from the small regions corresponding to the defects. However, the segmentation of scratch defects is not easy due to the shrinking-bias problem of conventional methods. We therefore propose a new Markov random field-based method for the segmentation of wafer images, and then design an AdaBoost-based classifier that uses the features extracted from the segmented local regions.
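The sketch below only illustrates the classification-after-segmentation idea: shape features are extracted from a segmented defect region and fed to an AdaBoost classifier. The MRF-based segmentation is not reproduced; a given binary mask and a small set of toy shape features (area, eccentricity, solidity) stand in as assumptions.

```python
# Sketch: classify segmented defect regions by shape features with AdaBoost.
import numpy as np
from skimage.measure import label, regionprops
from sklearn.ensemble import AdaBoostClassifier

def region_features(mask):
    """Shape features of the largest segmented region in a binary mask."""
    regions = regionprops(label(mask.astype(int)))
    r = max(regions, key=lambda r: r.area)
    return [r.area, r.eccentricity, r.solidity]

rng = np.random.default_rng(1)
masks, labels_list = [], []
for _ in range(30):                                  # toy "scratch" defects: thin lines
    m = np.zeros((64, 64), bool); c = rng.integers(10, 50); m[10:55, c:c + 2] = True
    masks.append(m); labels_list.append(0)
for _ in range(30):                                  # toy "particle" defects: blobs
    m = np.zeros((64, 64), bool); c = rng.integers(15, 45); m[c:c + 8, c:c + 8] = True
    masks.append(m); labels_list.append(1)

X = np.array([region_features(m) for m in masks])
clf = AdaBoostClassifier(n_estimators=50).fit(X, labels_list)
print(clf.predict(X[:2]), clf.predict(X[-2:]))
```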
We propose an algorithm for tracking facial feature points based on the block matching algorithm (BMA) with a new window shape that considers the characteristics of feature points and the scale/angle changes of the face. The window used in the proposed algorithm is the set of pixels on 8 radial lines at 0°, 45°, ... from the feature point, i.e., the window has the shape of a cross plus a 45°-rotated cross. This window shape is shown to be more efficient than the conventional rectangular window for tracking facial feature points, because the points and their neighborhoods usually do not belong to a rigid body. Since the feature points usually lie on edges of luminance or color change, at least one of the radial lines crosses the edge, which gives a distinct measure for tracking the point. The radial-line window also requires less computation than the rectangular window and is more readily adjusted with respect to scale and angle changes. For the estimation of scale changes, the facial region is segmented in each frame using normalized color, and the number of pixels in the facial region is compared.
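The following is a minimal sketch of block matching with a radial-line window: the window is the set of pixels along 8 radial lines (0°, 45°, 90°, ...) from the feature point, and the displacement minimizing the sum of absolute differences over these pixels is chosen. The line length, search range, and toy frames are illustrative assumptions.

```python
# Sketch: block matching where the "block" is 8 radial lines from the point.
import numpy as np

DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def radial_offsets(length=6):
    """Pixel offsets along 8 radial lines from the center point."""
    return [(r * dy, r * dx) for dy, dx in DIRS for r in range(1, length + 1)]

def sad(img_a, pt_a, img_b, pt_b, offsets):
    return sum(abs(float(img_a[pt_a[0] + dy, pt_a[1] + dx]) -
                   float(img_b[pt_b[0] + dy, pt_b[1] + dx]))
               for dy, dx in offsets)

def track_point(prev, curr, pt, search=5, length=6):
    offsets = radial_offsets(length)
    best, best_cost = pt, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = (pt[0] + dy, pt[1] + dx)
            cost = sad(prev, pt, curr, cand, offsets)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best

# Toy frames: a bright square whose corner shifts by (2, 3) pixels.
prev = np.zeros((64, 64)); prev[28:36, 28:36] = 1.0
curr = np.roll(np.roll(prev, 2, axis=0), 3, axis=1)
print(track_point(prev, curr, (28, 28)))              # expected: (30, 31)
```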
An algorithm is proposed for the detection of scene changes in video sequences. The algorithm is based on the comparison of several features that represent the characteristics of frames. More specifically, feature extraction is confined to several blocks containing strong edges, instead of the overall image as in conventional algorithms, in order to concentrate on the important colors and objects rather than the background. Several non-overlapping blocks of predefined size that contain strong edges are first found in the frame. Then, three kinds of features are extracted from the pixels in these blocks: the color histogram of the block pixels, the sum of absolute differences (SAD) between the blocks of the current and previous frames as in video coding, and the number of active blocks, i.e., blocks whose edge strength is larger than a given threshold. Dissolves and wipes are detected by comparing the block histograms, cuts are detected by the SAD, and fade-ins/outs are detected by the number of active blocks. Comparisons on several test sequences show that the color histogram of strong-edge blocks is a promising feature for detecting wipes and dissolves. Cut detection by the SAD of strong-edge blocks is shown to be comparable to conventional feature-based algorithms, and fade-ins/outs are easily detected with high precision by counting the number of active blocks.
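A minimal sketch of the three block-based features described above follows: strong-edge blocks are selected, and a histogram difference, the SAD between co-located blocks, and the number of active blocks are computed for consecutive frames. The block size, thresholds, and the use of grayscale instead of color histograms are assumptions; the threshold-based decision rules for cut/dissolve/fade are omitted.

```python
# Sketch: compute the three block-based features (histogram difference, SAD,
# active-block count) on strong-edge blocks of consecutive frames.
import numpy as np

BLOCK = 16

def strong_edge_blocks(gray, edge_thresh=8.0):
    """Top-left corners of blocks whose mean gradient magnitude exceeds a threshold."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    blocks = []
    for y in range(0, gray.shape[0] - BLOCK + 1, BLOCK):
        for x in range(0, gray.shape[1] - BLOCK + 1, BLOCK):
            if mag[y:y + BLOCK, x:x + BLOCK].mean() > edge_thresh:
                blocks.append((y, x))
    return blocks

def frame_features(prev, curr, blocks):
    hists, sads = [], []
    for y, x in blocks:
        p = prev[y:y + BLOCK, x:x + BLOCK].astype(float)
        c = curr[y:y + BLOCK, x:x + BLOCK].astype(float)
        hp, _ = np.histogram(p, bins=16, range=(0, 256))
        hc, _ = np.histogram(c, bins=16, range=(0, 256))
        hists.append(np.abs(hp - hc).sum())           # histogram difference (dissolve/wipe)
        sads.append(np.abs(p - c).mean())             # SAD (cut)
    active = len(strong_edge_blocks(curr))            # active-block count (fade in/out)
    return np.mean(hists), np.mean(sads), active

prev = (np.random.rand(128, 128) * 255).astype(np.uint8)   # toy frames simulating a cut
curr = (np.random.rand(128, 128) * 255).astype(np.uint8)
blocks = strong_edge_blocks(prev)
print(frame_features(prev, curr, blocks))
```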