Visualization of photo albums has recently attracted much attention, with the goal of organizing personal photo albums on mobile devices. Although there are numerous album management systems, visualizing a photo cluster remains a challenging issue. The most popular and reasonable way to visualize a photo album is to display a representative photo from each photo cluster. In this paper, we propose a method that selects the representative photo of a given photo cluster. To this end, three types of evaluations, namely aesthetic photo quality, visual similarity, and semantic importance, are conducted within each cluster. The photo with the highest score across the three evaluations is selected as the representative photo for visualization. From experimental results, we confirm that the proposed algorithm provides reliable organization results for various personal albums.
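The selection step described above can be sketched as a score combination followed by an argmax; the equal weighting below is an illustrative assumption, not the paper's formulation.

```python
def representative_photo(scores, weights=(1.0, 1.0, 1.0)):
    """Pick the representative photo of a cluster.

    scores: list of (aesthetic, similarity, semantic) tuples, one per photo.
    Returns the index of the photo with the highest weighted sum."""
    best_idx, best_val = 0, float("-inf")
    for i, s in enumerate(scores):
        total = sum(w * v for w, v in zip(weights, s))
        if total > best_val:
            best_idx, best_val = i, total
    return best_idx
```

Adjusting the weights lets one emphasize, say, semantic importance over aesthetics for a given album.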
Finding defects with automatic visual inspection techniques is an essential task in various industrial fields. Despite considerable research on this task, most previous methods remain vulnerable to ambiguities arising from the diverse shapes and sizes of defects. We introduce a simple yet powerful method to segment defects on various texture surfaces in an unsupervised manner. Specifically, our method is based on a multiscale scheme built on the phase spectrum of the Fourier transform. The proposed method can even handle one-dimensional elongated defect patterns (e.g., streaks caused by scratches), which are known to be difficult for previous methods. In contrast to traditional inspection methods limited to locating particular sorts of defects, our approach has the advantage that it can be applied to segmenting arbitrary defects, owing to the nonlinear diffusion involved in the multiscale scheme. Extensive experiments demonstrate that the proposed method provides much better defect segmentation results than several competitive methods in the literature.
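The core phase-spectrum idea can be illustrated in one dimension: keeping only the Fourier phase and inverting flattens regular texture while irregularities stand out. This is a minimal sketch; the actual method is 2-D and multiscale with nonlinear diffusion, which is omitted here.

```python
import cmath

def phase_only_saliency(signal):
    """Keep only the phase of the DFT and invert: uniform texture collapses
    toward zero while irregular points (defect candidates) stand out."""
    n = len(signal)
    # forward DFT (naive, fine for short demo signals)
    spec = [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]
    # discard magnitude, keep unit-magnitude phase
    phase = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in spec]
    # inverse DFT of the phase-only spectrum
    return [abs(sum(phase[k] * cmath.exp(2j * cmath.pi * k * t / n)
                    for k in range(n)) / n) for t in range(n)]

# Flat texture with one defect spike at index 8: saliency peaks there.
saliency = phase_only_saliency([5.0] * 8 + [20.0] + [5.0] * 7)
```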
In this paper, we present a novel convergence control method for toed-in stereo camera systems. The proposed
method automatically computes a convergence angle for both (i.e., left and right) cameras to locate a target
object at the image center. Unlike other image-based auto-convergence algorithms, the proposed method aims
at controlling the yaw angle of the stereo cameras, thus driving the disparity of the target object to zero while
capturing stereoscopic images. The proposed algorithm is based on the fact that an object at the convergence position
has zero disparity in stereoscopic images under the toed-in camera configuration. As a result, we can avoid
the accommodation-convergence conflict while watching the target object in stereoscopic images. Experimental
results demonstrate that the proposed method effectively estimates convergence angles for target objects at
different distances.
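For a toed-in rig, the geometry behind zero disparity at the convergence point is simple: if the target sits on the perpendicular bisector of the baseline, each camera must yaw inward by the angle below. This is a textbook-geometry sketch, not the paper's image-based estimation procedure.

```python
import math

def convergence_angle(baseline_m, target_distance_m):
    """Inward yaw angle (radians) for each camera of a toed-in stereo rig so
    that both optical axes intersect at the target, giving zero disparity.
    Assumes the target lies on the perpendicular bisector of the baseline."""
    return math.atan2(baseline_m / 2.0, target_distance_m)

# e.g., a 6.5 cm baseline with the target 2 m away
angle_deg = math.degrees(convergence_angle(0.065, 2.0))
```

Note that closer targets require a larger convergence angle, which is why the angle must be updated as the target distance changes.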
We propose a novel image-importance model for content-aware image resizing. In contrast to previous gradient magnitude-based approaches, we focus on the strength of gradient-domain statistics. The proposed scheme originates from a well-known property of the human visual system: human visual perception is highly adaptive and more sensitive to structural information in images than to non-structural information. We do not model the image structure explicitly, because image structure has diverse aspects that cannot easily be modeled from cluttered natural images. Instead, our method obtains the structural information in an image by exploiting gradient-domain statistics in an implicit manner. Extensive tests on a variety of cluttered natural images show that the proposed method is more effective than previous content-aware image-resizing methods and, unlike previous schemes, is very robust to images with cluttered backgrounds.
In this paper, a new autostereoscopic display method, named the Dual Layer Parallax Barrier (DLPB) method, is
introduced to overcome the limitation of the fixed viewing zone. Compared with conventional parallax barrier
methods, the proposed DLPB method uses moving parallax barriers to adapt the stereoscopic view to the
movement of the viewer. In addition, it provides seamless stereoscopic views without abrupt changes in 3D depth
perception at any eye position. We implement a prototype of the DLPB system, which consists of a switchable
dual-layered Twisted Nematic Liquid Crystal Display (TN-LCD) and a head tracker. The head tracker employs a video
camera for capturing images and calculates the angle between the eye-gazing direction and its projection onto the
display plane. According to the head tracker's control signal, the dual-layered TN-LCD adaptively switches the
direction of the viewing zone via a solid-state analog switch. The experimental results demonstrate that the proposed
autostereoscopic display maintains seamless 3D views even when the viewer's head is moving. Moreover, its extended use
in mobile devices such as portable multimedia players (PMPs), smartphones, and cellular phones is discussed as well.
This paper presents a method to estimate the number of people in crowded scenes without using explicit object
segmentation or tracking. The proposed method consists of three steps as follows: (1) extracting space-time interest
points using eigenvalues of the local spatio-temporal gradient matrix, (2) generating crowd regions based on space-time
interest points, and (3) estimating the crowd density based on multiple regression. In the experiments, the
efficiency and robustness of the proposed method are demonstrated using the PETS 2009 dataset.
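Step (3) above, mapping crowd features to a people count via multiple regression, can be sketched with a plain least-squares fit. The two features used below (e.g., interest-point count and crowd-region area) are illustrative assumptions about what such a regressor might consume.

```python
def fit_multiple_regression(X, y):
    """Least-squares fit y ~ b0 + b1*x1 + b2*x2 + ... via normal equations.
    X: list of feature rows; y: list of targets. Tiny Gaussian elimination,
    fine for the handful of coefficients used here."""
    rows = [[1.0] + list(r) for r in X]            # prepend intercept column
    k = len(rows[0])
    # normal equations A b = c, with A = R^T R and c = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    for i in range(k):                              # elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
            c[r] -= f * c[i]
    coef = [0.0] * k
    for i in reversed(range(k)):                    # back substitution
        coef[i] = (c[i] - sum(A[i][j] * coef[j]
                              for j in range(i + 1, k))) / A[i][i]
    return coef

def predict(coef, features):
    """Predicted crowd count for one frame's feature vector."""
    return coef[0] + sum(w * f for w, f in zip(coef[1:], features))
```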
User-friendliness and cost-effectiveness have contributed to the growing popularity of mobile phone cameras.
However, images captured by such mobile phone cameras are easily distorted by a wide range of factors, such
as backlight, over-saturation, and low contrast. Although several approaches have been proposed to solve the
backlight problems, most of them still suffer from distorted background colors and high computational complexity.
Thus, they are not deployable in mobile applications requiring real-time processing with very limited resources. In
this paper, we present a novel framework to compensate image backlight for mobile phone applications based on
an adaptive pixel-wise gamma correction which is computationally efficient. The proposed method is composed
of two sequential stages: 1) illumination condition identification and 2) adaptive backlight compensation. Input
images are first classified into facial and non-facial images to provide prior knowledge for identifying the
illumination condition. Then we further categorize the facial images into backlight and non-backlight images
based on local image statistics obtained from the corresponding face regions. We finally compensate for
the image backlight using an adaptive pixel-wise gamma correction method while effectively preserving global and
local contrast. To show the superiority of our algorithm, we compare the proposed method with other
state-of-the-art methods in the literature.
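The idea of pixel-wise gamma correction can be sketched as below: each pixel gets its own gamma derived from its luminance, so dark (backlit) regions are brightened while bright regions are left alone or compressed. The linear gamma mapping and its range are illustrative assumptions, not the paper's compensation function.

```python
def adaptive_gamma(luma, low=0.6, high=1.4):
    """Per-pixel gamma correction on an 8-bit luminance image (list of rows).
    Dark pixels receive gamma < 1 (brightening), bright pixels gamma > 1,
    interpolated from each pixel's own normalized luminance."""
    out = []
    for row in luma:
        new_row = []
        for v in row:
            x = v / 255.0
            g = low + (high - low) * x        # per-pixel gamma in [low, high]
            new_row.append(round(255.0 * (x ** g)))
        out.append(new_row)
    return out
```

Because the mapping is a single power function per pixel, it is cheap enough for the real-time mobile setting the paper targets.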
In this paper, we propose a novel image importance model for image
retargeting. The most widely used image importance model in existing
image retargeting methods is the L1- or L2-norm of the gradient
magnitude, which works well in non-complex environments. However, the
gradient magnitude based image importance model often leads to
severe visual distortions when the scene is cluttered or the
background is complex. In contrast to most previous approaches,
we focus on the strength of gradient domain statistics (GDS) for
more effective image retargeting, rather than on the gradient magnitude
itself. In our work, image retargeting is developed from the perspective
of human visual perception. We assume that human visual
perception is highly adaptive and more sensitive to structural
information in an image than to non-structural information. We
do not model the image structure explicitly since there are diverse
aspects of image structure. Instead, our method obtains the
structural information in an image by exploiting the gradient domain
statistics in an implicit manner. Experimental results show that the
proposed method is more effective than the previous image
retargeting methods.
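For reference, the gradient-magnitude baseline that the paper argues against is easy to state in code: the importance of a pixel is simply the L1-norm of its forward differences. The paper's GDS model is implicit and not reproduced here; this sketch only shows the baseline being improved upon.

```python
def gradient_importance(img):
    """Baseline importance map: L1-norm of horizontal and vertical forward
    differences at each pixel (edge pixels replicate their last neighbor).
    img: 2-D list of intensities."""
    h, w = len(img), len(img[0])
    imp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(img[y][min(x + 1, w - 1)] - img[y][x])
            gy = abs(img[min(y + 1, h - 1)][x] - img[y][x])
            imp[y][x] = gx + gy
    return imp
```

On cluttered backgrounds this map fires everywhere, which is exactly the failure mode motivating the statistics-based model.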
In this paper, we propose an automatic image browsing method based on image categorization to effectively browse
high-resolution images on small-display devices such as cellular phones and digital cameras. Based on a face detection
algorithm and spectrum analysis, input images are categorized into face images and non-face images. The non-face
images are further categorized into close-up view images and non-close-up view images. For the non-close-up view
images, we conduct further classification into images with a vanishing point and images without one. For
face images, the browsing path is determined by the face locations. For images with a vanishing point, the path is
decided along the vanishing lines, while for images without a vanishing point, we detect salient regions using
color variance and edges, and the browsing route is determined by the locations of the salient regions. We evaluate the
accuracy of the proposed image classification algorithm through experiments. A subjective evaluation is also conducted to
assess the proposed system for automatic image browsing. Experimental results indicate that our system increases
viewing satisfaction for small-display viewers.
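The saliency-then-route step for images without a vanishing point can be sketched as follows; intensity variance stands in for the paper's color-variance-and-edge measure, and the left-to-right ordering of salient blocks is an illustrative route heuristic, not the paper's path rule.

```python
def browsing_path(img, block=2, top_k=2):
    """Score each block by intensity variance and order the top_k most
    salient blocks left-to-right as a simple browsing route.
    Returns a list of (x, y) block origins."""
    h, w = len(img), len(img[0])
    scores = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [img[y][x] for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            m = sum(vals) / len(vals)
            var = sum((v - m) ** 2 for v in vals) / len(vals)
            scores.append((var, bx, by))
    salient = sorted(scores, reverse=True)[:top_k]
    return sorted((bx, by) for _, bx, by in salient)   # left-to-right route
```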
The scorebox plays an important role in understanding the content of sports videos. However, the tiny scorebox may make it
hard for small-display viewers to grasp the game situation. In this paper, we propose a novel
framework to extract the scorebox from sports video frames. We first extract candidates by using accumulated intensity
and edge information after a short learning period. Since various types of scoreboxes are inserted in sports videos,
multiple attributes need to be used for efficient extraction. Based on those attributes, the information gain is
computed, and the top three attributes in terms of information gain are selected as a three-dimensional feature vector
for a Support Vector Machine (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement
boards. The proposed method is tested on various sports videos, and the experimental results show the efficiency
and robustness of the proposed method.
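The attribute-ranking step can be sketched with the standard information-gain formula: entropy of the class labels minus the label entropy remaining after splitting on the attribute. The discrete attribute values and binary labels below are illustrative; the paper's actual attributes are scorebox-candidate properties.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def information_gain(attr_values, labels):
    """Information gain of one discrete attribute with respect to the class
    labels (scorebox vs. non-scorebox candidates in the paper's setting)."""
    n = len(labels)
    remainder = 0.0
    for v in set(attr_values):
        subset = [l for a, l in zip(attr_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder
```

Ranking all candidate attributes by this score and keeping the top three yields the three-dimensional feature vector fed to the SVM.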
Mobile IPTV is a multimedia service based on wireless networks with interactivity and mobility. Under mobile IPTV
scenarios, people can watch various contents whenever they want and even deliver their request to service providers
through the network. However, frequent changes in the wireless channel bandwidth may degrade the quality of service.
In this paper, we propose an objective video quality measure (VQM) for mobile IPTV services, focused on
jitter measurement. Jitter, the result of frame repetition during delays, is one of the most severe impairments in
video transmission over mobile channels. We first employ the YUV color space to compute the duration and occurrences of
jitter as well as the motion activity. The VQM is then modeled by combining these three factors with the results of a
subjective assessment. Since the proposed VQM is a no-reference (NR) model, it can be applied to real-time
applications. Experimental results show that the proposed VQM correlates highly with subjective evaluation.
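Counting jitter occurrences and duration from decoded frames can be sketched as repeated-frame detection: a frame whose mean absolute difference from its predecessor is near zero is treated as a repetition caused by delay. The threshold and the flat frame representation are assumptions for illustration.

```python
def jitter_stats(frames, diff_thresh=1.0):
    """Count jitter events and their total duration (in frames).
    frames: list of equal-length luminance (Y) vectors, one per frame.
    A run of consecutive repeated frames counts as one event."""
    events, duration, in_event = 0, 0, False
    for prev, cur in zip(frames, frames[1:]):
        mad = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mad < diff_thresh:              # repeated frame -> jitter
            duration += 1
            if not in_event:
                events += 1
                in_event = True
        else:
            in_event = False
    return events, duration
```

These two quantities, together with a motion-activity measure, are the kind of inputs the abstract's three-factor VQM combines.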
KEYWORDS: Video, Detection and tracking algorithms, Algorithm development, Video processing, RGB color model, Communication engineering, Cameras, Video compression, Visualization, 3D image processing
In this paper, we present a method for reducing the intensity of shadows cast on the ground in outdoor sports videos to
provide TV viewers with a better viewing experience. In the case of soccer videos taken by a long-shot camera
technique, it is difficult for viewers to discriminate the tiny objects (i.e., soccer ball and player) from the ground
shadows. The algorithm proposed in this paper comprises three modules: long-shot detection, shadow region
extraction, and shadow intensity reduction. We detect the shadow region on the ground by using the relationship between
Y and U values in YUV color space and then reduce the shadow components depending on the strength of the shadows.
Experimental results show that the proposed scheme offers useful tools to provide a more comfortable viewing
environment and is amenable to real-time performance even in a software based implementation.
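The shadow step can be illustrated with a simple Y-versus-U rule on YUV pixels: call a pixel shadow when its luminance Y is low while its U value stays relatively high, then lift its intensity. The thresholds and gain below are assumptions for illustration, not the paper's values.

```python
def reduce_ground_shadow(y_plane, u_plane, y_thresh=80, u_thresh=130, gain=1.5):
    """Brighten pixels flagged as ground shadow by a Y/U test.
    y_plane, u_plane: 2-D lists of 8-bit Y and U values of the same size."""
    out = []
    for yr, ur in zip(y_plane, u_plane):
        row = []
        for y, u in zip(yr, ur):
            if y < y_thresh and u > u_thresh:     # shadow test on Y vs. U
                y = min(255, int(y * gain))       # lift shadow intensity
            row.append(y)
        out.append(row)
    return out
```

A per-pixel multiplicative lift like this keeps the operation cheap enough for the real-time software implementation the abstract mentions.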
KEYWORDS: Image segmentation, Cameras, Digital cameras, Image processing algorithms and systems, 3D modeling, Imaging systems, Radon, Reconstruction algorithms, Communication engineering, Detection and tracking algorithms
In this paper, we present a novel method for removing foreground objects in multi-view images. Unlike conventional
methods, which locate the foreground objects interactively, we aim to develop an automated system. The proposed
algorithm consists of two modules: 1) object detection and removal, and 2) filling of the detected foreground region.
The depth information of the multi-view images is a critical cue adopted in this algorithm. By multi-view images, we
do not mean a multi-camera system; we use only one digital camera and take the photos by hand. Although this may
cause inaccurate matching results, the resulting coarse depth information is sufficient to detect and remove the
foreground object. The experimental results indicate that the proposed algorithm provides an effective tool that can
be used in applications such as digital cameras, photo-realistic scene generation, digital cinema, and so on.
KEYWORDS: Image segmentation, 3D image processing, Image analysis, 3D displays, Communication engineering, Image classification, Edge detection, Imaging systems, Digital image processing, Radio over Fiber
With the increasing demand for 3D content, the conversion of existing two-dimensional content to three-dimensional
content has gained wide interest in 3D image processing. For 2D-to-3D conversion, it is important to estimate the
relative depth map of a single-view image. In this paper, we propose an automatic conversion method that
estimates the depth information of a single-view image based on the degree of focus of segmented regions and then
generates a stereoscopic image. First, we conduct image segmentation to partition the image into homogeneous regions.
Then, we construct a higher-order statistics (HOS) map, which represents the spatial distribution of the high-frequency
components of the input image. The HOS is known to be well suited to detection and classification problems
because it can suppress Gaussian noise and preserve non-Gaussian information. We estimate a relative depth
map from these two cues and then refine the depth map by post-processing. Finally, a stereoscopic image is generated by
calculating the parallax value of each region using the generated depth map and the input image.
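The final step, synthesizing a stereo view by shifting pixels according to their depth-derived parallax, can be sketched as below. This is a minimal depth-image-based-rendering sketch with crude hole filling; a real system needs proper inpainting, and the shift scale is an assumption.

```python
def synthesize_right_view(img, depth, max_shift=3):
    """Generate a right-eye view by shifting each pixel horizontally by a
    parallax proportional to its depth (0 = far, 1 = near).
    img, depth: 2-D lists of the same size."""
    h, w = len(img), len(img[0])
    right = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            shift = int(round(depth[y][x] * max_shift))
            nx = x - shift                    # nearer pixels shift further
            if 0 <= nx < w:
                right[y][nx] = img[y][x]
        for x in range(w):                    # crude hole filling: copy the
            if right[y][x] is None:           # previous valid pixel
                right[y][x] = right[y][x - 1] if x > 0 else img[y][x]
    return right
```

Pairing this synthesized view with the original image yields the stereoscopic pair.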
KEYWORDS: Video, Multimedia, Mobile devices, LCDs, Visualization, Video processing, Detection and tracking algorithms, Cameras, Personal digital assistants, Motion models
Mobile devices have been transformed from voice communication tools into advanced tools for consuming multimedia content. The extensive use of such mobile devices entails watching multimedia content on small LCD panels. However, most video sequences are captured for normal viewing on standard TV or HDTV and, for cost reasons, are merely resized and delivered without additional editing. This may give small-display viewers an uncomfortable experience in understanding what is happening in a scene. For instance, in a soccer video sequence taken by a long-shot camera technique, the tiny objects (e.g., the soccer ball and players) may not be clearly visible on the small LCD panel. Thus, an intelligent display technique needs to be developed to provide small-display viewers with a better experience. To this end, one of the key technologies is to determine the region of interest (ROI), the part of the scene that viewers pay more attention to than other regions, and display the magnified ROI on the screen. In this paper, which extends our prior work, we focus on soccer video display for mobile devices and propose a fully automatic and computationally efficient method. Instead of taking a generic approach utilizing visually salient features, we take a domain-specific approach that exploits the attributes of soccer video. The proposed scheme consists of two stages: shot classification and ROI determination. The experimental results show that the proposed scheme offers useful tools for intelligent video display on multimedia mobile devices.
KEYWORDS: Video, Mobile devices, LCDs, Multimedia, Visualization, RGB color model, Cameras, Video processing, Multimedia signal processing, Image resolution
A fully automatic and computationally efficient method is proposed for the intelligent display of soccer video on small multimedia mobile devices. The rapid progress of multimedia signal processing has contributed to the extensive use of multimedia devices with small LCD panels. On these flourishing mobile devices with small displays, video sequences captured for normal viewing on standard TV or HDTV may give small-display viewers an uncomfortable experience in understanding what is happening in a scene. For instance, in a soccer video sequence taken by a long-shot camera technique, the tiny objects (e.g., the soccer ball and players) may not be clearly visible on the small LCD panel. Thus, an intelligent display technique is needed for small-display viewers. To this end, one of the key technologies is to determine the region of interest (ROI), the part of the scene that viewers pay more attention to than other regions. In this paper, we focus solely on soccer video display for mobile devices. Instead of taking visual saliency into account, we take a domain-specific approach that exploits the characteristics of soccer video. We propose a context-aware soccer video display scheme consisting of three stages: ground color learning, shot classification, and ROI determination. The experimental results show that the proposed scheme is capable of context-aware video display on mobile devices.
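A domain-specific shot classifier in the spirit of this pipeline can be sketched via the learned ground color: frames dominated by grass-colored pixels are long shots, others close-ups. The dominant-green test and threshold below are simplifications standing in for the paper's learned ground-color model.

```python
def classify_shot(rgb_pixels, green_ratio_thresh=0.5):
    """Label a frame as a long shot or close-up from the fraction of
    ground-colored (green-dominant) pixels.
    rgb_pixels: list of (r, g, b) tuples sampled from the frame."""
    green = sum(1 for r, g, b in rgb_pixels if g > r and g > b)
    ratio = green / len(rgb_pixels)
    return "long" if ratio >= green_ratio_thresh else "close-up"
```

Only long shots then proceed to ROI determination, since close-ups already show the objects at a usable size.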
This paper proposes a novel unsupervised video object segmentation algorithm for image sequences with low depth-of-field (DOF), a popular photographic technique that conveys the photographer's intention by keeping only an object-of-interest (OOI) in sharp focus. The proposed algorithm largely consists of two modules. The first module automatically extracts OOIs from the first frame by separating sharply focused OOIs from other out-of-focus foreground or background objects. The second module tracks the OOIs through the rest of the video sequence, with the aim of running the system in real time or, at least, semi-real time. The experimental results indicate that the proposed algorithm provides an effective tool that can serve as a basis for applications such as video analysis for virtual reality, immersive video systems, photo-realistic video scene generation, and video indexing systems.
We propose a novel algorithm to partition an image with low depth-of-field (DOF) into a focused object-of-interest (OOI) and a defocused background. The proposed algorithm consists of three steps. In the first step, we transform the low-DOF image into an appropriate feature space for the partition. This is done by computing higher-order statistics (HOS) for all pixels in the low-DOF image. Next, the obtained feature space, called the HOS map, is simplified by removing small dark holes and bright patches using a morphological filter by reconstruction. Finally, the OOI is extracted by applying region merging and adaptive thresholding to the simplified image. Experimental results show that the proposed method yields more accurate segmentation results than previous methods.
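A higher-order statistic of the kind the HOS map is built from can be sketched as the fourth-order central moment of each pixel's neighborhood: it stays low on smooth (defocused) areas and rises near focused, high-frequency structure. The neighborhood size is an assumption; the paper's exact HOS definition may differ.

```python
def hos_map(img, radius=1):
    """Fourth-order central moment of each pixel's (2r+1)x(2r+1)
    neighborhood, clipped at the image border. img: 2-D list."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            m = sum(vals) / len(vals)
            out[y][x] = sum((v - m) ** 4 for v in vals) / len(vals)
    return out
```

Thresholding and morphologically simplifying such a map is what separates the focused OOI from the defocused background.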
KEYWORDS: Video, Digital watermarking, Computer programming, Detection and tracking algorithms, Matrices, Multimedia, Video processing, Image processing, Genetic algorithms, Digital imaging
This paper proposes a novel sequence matching technique to detect copies of a video clip. This copy detection scheme can be used as either an alternative or a complementary approach to watermarking for copyright protection. It can also be used for media tracking. A challenge is that different digitizing and encoding processes give rise to several distortions, such as changes in brightness, changes in color, different blocky artifacts, changes in frame format, and so on. While several algorithms have been proposed, most of them focus on coping with signal distortions introduced by different encoding parameters. However, we note that techniques to deal with aspect-ratio conversions must also be taken into account, because such conversions are frequent in practice. To this end, each image frame is partitioned into a 2x2 grid by intensity averaging, and the partition values are stored for indexing and matching. Our spatio-temporal approach combines spatial matching of ordinal signatures obtained from the partitions of each frame and temporal matching of temporal signatures from the temporal trails of the partitions. The efficacy of the proposed method has been extensively tested, and the results show that the proposed scheme deals efficiently with various modifications.
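The spatial part of the signature can be sketched as follows: average the intensity of each 2x2 partition, take the rank permutation of the four averages, and compare permutations by L1 distance. Even frame dimensions are assumed for brevity.

```python
def ordinal_signature(frame):
    """2x2 ordinal signature of a frame (2-D list with even dimensions):
    the rank permutation of the four quadrant intensity averages."""
    h, w = len(frame), len(frame[0])
    half_h, half_w = h // 2, w // 2
    means = []
    for by in (0, half_h):
        for bx in (0, half_w):
            vals = [frame[y][x] for y in range(by, by + half_h)
                    for x in range(bx, bx + half_w)]
            means.append(sum(vals) / len(vals))
    order = sorted(range(4), key=lambda i: means[i])
    ranks = [0] * 4
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def signature_distance(sig_a, sig_b):
    """L1 distance between two rank permutations: small under brightness or
    aspect-ratio changes that preserve the quadrants' relative ordering."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

Because only the ordering of quadrant averages is kept, uniform brightness shifts and many format conversions leave the signature unchanged.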
Detecting unauthorized copies of digital media (images, audio, and video) is a basic requirement for Intellectual Property Right (IPR) protection. This paper proposes a novel method to detect copies of digital images. This copy detection scheme can be used as either an alternative or a complementary approach to watermarking. A test image is first reduced to an 8×8 sub-image by intensity averaging; then, the AC coefficients of its discrete cosine transform (DCT) are used to compute a distance to those generated from the query image whose copies the user wants to find. A challenge arises when copies are processed to avoid copy detection or to enhance image quality. We show that the ordinal measure of DCT coefficients, which is based on the relative ordering of AC magnitude values and uses distance metrics between two rank permutations, is robust to various modifications of the original image. An optimal threshold selection scheme using the maximum a posteriori (MAP) criterion is also addressed. Through simulations on a database of 40,000 images, we show the effectiveness of the proposed system.
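The ordinal DCT measure can be sketched as below for a block already reduced by intensity averaging. Two simplifications are assumed: a naive unnormalized DCT-II, and row-major AC ordering instead of the usual zig-zag scan, neither of which affects the rank-based comparison.

```python
import math

def dct2(block):
    """Naive, unnormalized 2-D DCT-II of an NxN block (fine for small N).
    Normalization is omitted because ranks are scale-invariant."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[u][v] = sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for y in range(n) for x in range(n))
    return out

def ordinal_dct_signature(block, k=9):
    """Rank permutation of the magnitudes of the first k AC coefficients
    (row-major order, skipping the DC term)."""
    coeffs = dct2(block)
    n = len(block)
    ac = [abs(coeffs[u][v]) for u in range(n) for v in range(n)
          if (u, v) != (0, 0)][:k]
    order = sorted(range(k), key=lambda i: ac[i])
    ranks = [0] * k
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks
```

Since only the relative ordering of AC magnitudes is kept, global contrast scaling leaves the signature untouched, which is the robustness property the abstract claims.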
KEYWORDS: Linear filtering, Video coding, Quantization, Image compression, Video, Computer programming, Visualization, Video compression, Image filtering, Digital filtering
In this paper, we present a method to reduce blocking and ringing artifacts in low bit-rate block-based video coding. For each block, its DC value and the DC values of the surrounding eight neighbor blocks are exploited to predict low-frequency AC coefficients. These predicted AC coefficients allow us to infer the spatial characteristics of a block before the quantization stage in the encoding system. They are used to classify each block into one of two categories: low-activity or high-activity. In the following post-processing stage, two kinds of low-pass filters are adaptively applied according to the classification of each block. This allows strong low-pass filtering in low-activity regions, where the blocking artifacts are most noticeable, and weak low-pass filtering in high-activity regions to reduce ringing noise as well as blocking artifacts without introducing undesired blur. In the former case, the blocking artifacts are reduced by one-dimensional (1-D) horizontal and vertical low-pass filters, with the horizontal/vertical direction selected depending on the absolute values of the predicted AC coefficients. In the latter case, deblocking and deringing are conducted by a single filter, which keeps the architecture simple. The TMN8 decoder for H.263+ is used to test the proposed method. The experimental results show that the proposed algorithm is efficient and effective in reducing ringing as well as blocking artifacts in low bit-rate block-based video coding.
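The activity-adaptive filtering idea can be sketched in one dimension: a strong 3-tap low-pass for low-activity data (where blocking is most visible) and a weak one for high-activity data (to avoid blur). The kernels and threshold below are assumptions for illustration, not the paper's filter design.

```python
def smooth_row(row, kernel):
    """1-D low-pass with a symmetric 3-tap kernel (center, side weights);
    border samples are replicated."""
    k0, k1 = kernel
    padded = [row[0]] + list(row) + [row[-1]]
    return [k1 * padded[i - 1] + k0 * padded[i] + k1 * padded[i + 1]
            for i in range(1, len(padded) - 1)]

def deblock_row(row, activity, low_thresh=10.0):
    """Adaptive filtering: low-activity data get a strong low-pass, since
    blocking is most noticeable there; high-activity data get a weak one
    to reduce ringing without undesired blur."""
    if activity < low_thresh:
        return smooth_row(row, (0.5, 0.25))   # strong smoothing
    return smooth_row(row, (0.8, 0.1))        # weak smoothing
```

In the described method, the activity estimate would come from the predicted AC coefficients, so no extra transform is needed at the decoder.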