In this investigation we study the effects of compression and frame rate reduction on the performance of four video
analytics (VA) systems in a low-complexity scenario, the Sterile Zone (SZ). Additionally, we identify the scene
parameters that most strongly affect the performance of these systems. The SZ scenario is a scene consisting of a grassed
area bounded by a fence that must not be crossed. The VA system needs to raise an alarm when an intruder (attack)
enters the scene. The work includes testing of the systems with uncompressed and compressed (using H.264/MPEG-4
AVC at 25 and 5 frames per second) footage, consisting of quantified scene parameters. The scene parameters include
descriptions of scene contrast, camera-to-subject distance, and attack portrayal. Additional footage containing only
distractions (no attacks) is also investigated. Results show that each system performed differently at each
compression/frame rate level, whilst overall, compression did not adversely affect the performance of the systems.
Frame rate reduction decreased performance, and scene parameters influenced the behavior of the systems differently.
Most false alarms were triggered by a distraction clip containing abrupt shadows cast through the fence.
Findings could contribute to the improvement of VA systems.
In this investigation we identify relationships between human and automated face recognition systems with respect to
compression. Further, we identify the scene parameters that most influence the performance of each recognition system.
The work includes testing of the systems with compressed Closed-Circuit Television (CCTV) footage, consisting of
quantified scene (footage) parameters. Parameters describe the content of scenes concerning camera-to-subject distance,
facial angle, scene brightness, and spatio-temporal busyness. These parameters have been previously shown to affect the
human visibility of useful facial information, but not much work has been carried out to assess the influence they have
on automated recognition systems. In this investigation, the methodology previously employed in the human
investigation is adopted, to assess performance of three different automated systems: Principal Component Analysis,
Linear Discriminant Analysis, and Kernel Fisher Analysis. Results show that the automated systems are more tolerant to
compression than humans. In automated systems, mixed brightness scenes were the most affected and low brightness
scenes were the least affected by compression. In contrast, for humans, low brightness scenes were the most affected and
medium brightness scenes the least affected. Findings have the potential to broaden the methods used for testing imaging
systems for security applications.
What is the best luminance contrast weighting-function for image quality optimization? Traditionally measured contrast
sensitivity functions (CSFs) have often been used as weighting-functions in image quality and difference metrics. Such
weightings have been shown to result in increased sharpness and perceived quality of test images. We suggest contextual
CSFs (cCSFs) and contextual visual perception functions (cVPFs) should provide bases for further improvement, since
these are directly measured from pictorial scenes, modeling threshold and suprathreshold sensitivities within the context
of complex masking information. Image quality assessment is understood to require detection and discrimination of
masked signals, making contextual sensitivity and discrimination functions directly relevant.
In this investigation, test images are weighted with a traditional CSF, cCSF, cVPF and a constant function. Controlled
mutations of these functions are also applied as weighting-functions, seeking the optimal spatial frequency band
weighting for quality optimization. Image quality, sharpness and naturalness are then assessed in two-alternative forced-choice
psychophysical tests. We show that maximal quality for our test images results from cCSFs and cVPFs, mutated
to boost contrast in the higher visible frequencies.
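As a rough illustration of frequency-band weighting, the sketch below applies a CSF-shaped weight to an image's Fourier spectrum. The `csf_weight` band-pass shape is a hypothetical stand-in for a measured cCSF or cVPF (the measured functions and their controlled mutations are not reproduced here); the DC term is kept at unity so mean luminance is preserved.

```python
import numpy as np

def csf_weight(f, peak=4.0):
    # Hypothetical CSF-shaped band-pass weight peaking at `peak` (arbitrary
    # frequency units); a stand-in for a measured cCSF/cVPF, not the real one.
    f = np.maximum(f, 1e-6)
    return (f / peak) * np.exp(1.0 - f / peak)

def apply_frequency_weighting(image, freq_scale=8.0):
    """Weight an image's Fourier spectrum by a CSF-like function.

    `freq_scale` maps digital frequency (cycles/pixel) onto the weight's
    frequency axis; the DC term stays at unity so mean luminance is preserved.
    """
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])
    fx = np.fft.fftfreq(image.shape[1])
    radial = np.hypot(*np.meshgrid(fy, fx, indexing="ij")) * freq_scale
    w = csf_weight(radial)
    w[0, 0] = 1.0  # preserve mean luminance
    return np.real(np.fft.ifft2(F * w))
```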
An evaluation of the change in perceived image contrast with changes in displayed image size was carried out. This was achieved using data from four psychophysical investigations, which employed techniques to match the perceived contrast of displayed images of five different sizes. A total of twenty-four S-shape polynomial functions were created and applied to every original test image to produce images with different contrast levels. The objective contrast related to each function was evaluated from the gradient of the mid-section of the curve (gamma). The manipulation technique took into account published gamma differences that produced a just-noticeable-difference (JND) in perceived contrast. The filters were designed to achieve approximately half a JND, whilst keeping the mean image luminance unaltered. The processed images were then used as test series in a contrast matching experiment. Sixty-four natural scenes, with varying scene content acquired under various illumination conditions, were selected from a larger set captured for the purpose. Results showed that the degree of change in contrast between images of different sizes varied with scene content but was not as important as equivalent perceived changes in sharpness [1].
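The contrast manipulation described above can be sketched with a simple S-shaped curve whose mid-tone gradient plays the role of gamma. The logistic form below is an illustrative stand-in for the paper's S-shape polynomial functions, and the mean-luminance correction mirrors the stated design constraint.

```python
import numpy as np

def s_curve(x, gamma):
    """S-shaped tone curve on [0, 1] with mid-tone gradient of roughly `gamma`.

    A logistic curve rescaled to map 0 -> 0, 0.5 -> 0.5, 1 -> 1; gamma > 1
    increases contrast, gamma < 1 reduces it. Illustrative stand-in only.
    """
    k = 4.0 * gamma                       # logistic slope giving ~gamma at x = 0.5
    y = 1.0 / (1.0 + np.exp(-k * (x - 0.5)))
    y0 = 1.0 / (1.0 + np.exp(k * 0.5))    # raw curve value at x = 0
    y1 = 1.0 / (1.0 + np.exp(-k * 0.5))   # raw curve value at x = 1
    return (y - y0) / (y1 - y0)

def adjust_contrast(image, gamma):
    # Apply the curve, then restore the original mean luminance,
    # as the paper's filters keep mean image luminance unaltered.
    out = s_curve(image, gamma)
    return out + (image.mean() - out.mean())
```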
This investigation examines the relationships between image fidelity, acceptability thresholds and scene content for images distorted by lossy compression. Scene characteristics of a sample set of images, with a wide range of representative scene content, were quantified, using simple measures (scene metrics), which had been previously found to correlate with global scene lightness, global contrast, busyness, and colorfulness. Images were compressed using the lossy JPEG 2000 algorithm to a range of compression ratios, progressively introducing distortion to levels beyond the threshold of detection. Twelve observers took part in a paired comparison experiment to evaluate the perceptibility threshold compression ratio. A further psychophysical experiment was conducted using the same scenes, compressed to higher compression ratios, to identify the level of compression at which the images became visually unacceptable. Perceptibility and acceptability thresholds were significantly correlated for the test image set; both thresholds also correlated with the busyness metric. Images were ranked for the two thresholds and were further grouped, based upon the relationships between perceptibility and acceptability. Scene content and the results from the scene descriptors were examined within the groups to determine the influence of specific common scene characteristics upon both thresholds.
This paper describes continuing research concerned with the measurement and modeling of human spatial contrast sensitivity and discrimination functions, using complex pictorial stimuli. The relevance of such functions in image quality modeling is also reviewed. Previously [1,2] we presented the choice of suitable contrast metrics, apparatus and laboratory set-up, the stimuli acquisition and manipulation, the methodology employed in the subjective tests and initial findings. Here we present our experimental paradigm, the measurement and modeling of the following visual response functions: i) Isolated Contrast Sensitivity Function (iCSF); ii) Contextual Contrast Sensitivity Function (cCSF); iii) Isolated Visual Perception Function (iVPF); iv) Contextual Visual Perception Function (cVPF). Results indicate that the measured cCSFs are lower in magnitude than the iCSFs and flatter in profile. Measured iVPFs, cVPFs and cCSFs are shown to have similar profiles. Barten's contrast detection model [3] was shown to successfully predict the iCSF. For a given frequency band, the reduction, or masking, of cCSF sensitivity compared with iCSF sensitivity is predicted from the linear amplification model (LAM) [4]. We also show that our extension of Barten's contrast discrimination model [1,5] is capable of describing iVPFs and cVPFs. We finally reflect on the possible implications of the measured and modeled profiles of the cCSF and cVPF for image quality modeling.
KEYWORDS: Visualization, Contrast sensitivity, Spatial frequencies, Data modeling, Signal detection, Image filtering, LCDs, Human vision and color perception, Mathematical modeling, Visual process modeling
Shape, form and detail define image structure in our visual world. These attributes are dictated primarily by local variations in luminance contrast. Defining human contrast sensitivity (threshold of contrast perception) and contrast discrimination (ability to differentiate between variations in contrast) directly from real complex scenes is of utmost relevance to our understanding of spatial vision. The design and evaluation of imaging equipment, used in both field operations and security applications, require a full description of strengths and limitations of human spatial vision. This paper is concerned with the measurement of the following four human contrast sensitivity functions directly from images of complex scenes: i) Isolated Contrast Sensitivity (detection) Function (iCSF); ii) Contextual Contrast Sensitivity (detection) Function (cCSF); iii) Isolated Visual Perception (discrimination) Function (iVPF) and iv) Contextual Visual Perception (discrimination) Function (cVPF). The paper also discusses the following areas: Barten's mathematical framework for modeling contrast sensitivity and discrimination; spatial decomposition of image stimuli to a number of spatial frequency bands (octaves); suitability of three different relevant image contrast metrics; experimental methodology for subjective tests; stimulus conditions. We finally present and discuss initial findings for all four measured sensitivities.
The aim of our research is to specify experimentally and further model spatial frequency response functions, which
quantify human sensitivity to spatial information in real complex images. Three visual response functions are measured:
the isolated Contrast Sensitivity Function (iCSF), which describes the ability of the visual system to detect any spatial
signal in a given spatial frequency octave in isolation, the contextual Contrast Sensitivity Function (cCSF), which
describes the ability of the visual system to detect a spatial signal in a given octave in an image and the contextual Visual
Perception Function (cVPF), which describes visual sensitivity to changes in suprathreshold contrast in an image. In this
paper we present relevant background, along with our first attempts to derive experimentally and further model the cVPF
and CSFs. We examine the contrast detection and discrimination frameworks developed by Barten, which we find
provide a sound starting position for our own modeling purposes. Progress is presented in the following areas:
verification of the chosen model for detection and discrimination; choice of contrast metrics for defining contrast
sensitivity; apparatus, laboratory set-up and imaging system characterization; stimuli acquisition and stimuli variations;
spatial decomposition; methodology for subjective tests. Initial iCSFs are presented and compared with ‘classical’
findings that have used simple visual stimuli, as well as with more recent relevant work in the literature.
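The spatial decomposition into octave-wide frequency bands might be sketched as follows. This is a generic FFT-based band split under our own assumptions, not the authors' exact procedure; the bands plus the low-frequency residual sum back to the original image.

```python
import numpy as np

def octave_bands(image, n_bands=3):
    """Split an image into octave-wide spatial-frequency bands plus a residual.

    Band i keeps radial frequencies in (f_max/2^(i+1), f_max/2^i]; the top
    band also keeps the corner frequencies above f_max. The returned bands
    partition the spectrum, so they sum exactly back to the original image.
    """
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])
    fx = np.fft.fftfreq(image.shape[1])
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    f_max = 0.5  # Nyquist frequency in cycles/pixel
    bands = []
    for i in range(n_bands):
        lo, hi = f_max / 2 ** (i + 1), f_max / 2 ** i
        # top octave keeps everything above `lo`, including diagonal corners
        mask = (r > lo) if i == 0 else (r > lo) & (r <= hi)
        bands.append(np.real(np.fft.ifft2(F * mask)))
    # low-frequency residual (includes the DC term)
    bands.append(np.real(np.fft.ifft2(F * (r <= f_max / 2 ** n_bands))))
    return bands
```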
The objective of this investigation is to produce recommendations for acceptable bit-rates of CCTV footage of people
onboard London buses. The majority of CCTV recorders on buses use a proprietary format based on the H.264/AVC
video coding standard, exploiting both spatial and temporal redundancy. Low bit-rates are favored in the CCTV industry
but they compromise the usefulness of the recorded imagery. In this context, usefulness is defined by the presence
of enough facial information remaining in the compressed image to allow a specialist to identify a person. The
investigation includes four steps: 1) Collection of representative video footage. 2) The grouping of video scenes based on
content attributes. 3) Psychophysical investigations to identify key scenes, which are most affected by compression. 4)
Testing of recording systems using the key scenes and further psychophysical investigations. The results are highly
dependent upon scene content. For example, very dark and very bright scenes were the most challenging to compress,
requiring higher bit-rates to maintain useful information. The acceptable bit-rates are also found to be dependent upon
the specific CCTV system used to compress the footage, presenting challenges in drawing conclusions about universal
‘average’ bit-rates.
This paper proposes a visual scene busyness indicator obtained from the properties of a full spatial segmentation of static images. A fast and effective region merging scheme is applied for this purpose. It uses a semi-greedy merging criterion and an adaptive threshold to control segmentation resolution. The core of the framework is a hierarchical parallel merging model and region reduction techniques. The segmentation procedure consists of the following phases: 1. algorithmic region merging, and 2. region reduction, which includes small segment reduction and enclosed region absorption. Quantitative analyses on standard benchmark data have shown the procedure to compare favourably to other segmentation methods. Qualitative assessment of the segmentation results indicates approximate semantic correlations between segmented regions and real world objects. This characteristic is used as a basis for quantifying scene busyness in terms of properties of the segmentation map and the segmentation process that generates it. A visual busyness indicator based on full colour segmentation is evaluated against conventional measures.
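As a toy illustration of deriving busyness from a segmentation map, the sketch below segments by simple gray-level quantisation and counts connected regions per pixel. This is a deliberately simplified stand-in for the paper's semi-greedy region-merging scheme; only the idea that more segmented regions imply a busier scene is carried over.

```python
import numpy as np

def busyness_indicator(image, levels=8):
    """Toy busyness score in [0, 1]: quantise gray levels, then count
    4-connected regions per pixel. More regions => busier scene.
    """
    q = np.minimum((image * levels).astype(int), levels - 1)
    h, w = q.shape
    labels = -np.ones((h, w), dtype=int)   # -1 marks unvisited pixels
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] >= 0:
                continue
            # flood-fill one region of equal quantised value
            stack = [(sy, sx)]
            labels[sy, sx] = n_regions
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                       and labels[ny, nx] < 0 and q[ny, nx] == q[y, x]:
                        labels[ny, nx] = n_regions
                        stack.append((ny, nx))
            n_regions += 1
    return n_regions / q.size
```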
In this paper an evaluation of the degree of change in the perceived image sharpness with changes in displayed image
size was carried out. This was achieved by collecting data from three psychophysical investigations that used techniques
to match the perceived sharpness of displayed images of three different sizes. The paper first describes a method
employed to create a series of frequency domain filters for sharpening and blurring. The filters were designed to achieve
one just-noticeable-difference (JND) in quality between images viewed from a certain distance and having a certain
displayed image size (and thus angle of subtense). During psychophysical experiments, the filtered images were used as
a test series for sharpness matching. For the capture of test-images, a digital SLR camera with a quality zoom lens was
used for recording natural scenes with varying scene content, under various illumination conditions. For the
psychophysical investigation, a total of sixty-four original test-images were selected and resized, using bi-cubic
interpolation, to three different image sizes, representing typical displayed sizes. Results showed that the degree of
change in sharpness between images of different sizes varied with scene content.
This study aims to introduce improvements in the predictions of device-dependent image quality metrics (IQMs). A
validation experiment was first carried out to test the success of such a metric, the Effective Pictorial Information
Capacity (EPIC), using results from subjective tests involving 32 test scenes replicated with various degrees of sharpness
and noisiness. The metric was found to be a good predictor when tested against average ratings but, as expected of
device-dependent metrics, it was less successful in predicting the perceived quality of individual, non-standard scenes with
atypical spatial and structural content. Improvement in predictions was attempted by using a modular image quality
framework and its implementation with the EPIC metric. It involves modeling a complicated set of conditions, including
classifying scenes into a small number of groups. The scene classification employed for the purpose uses objective scene
descriptors which correlate with subjective criteria on scene susceptibility to sharpness and noisiness. The
implementation thus allows automatic grouping of scenes and calculation of the metric values. Results indicate that
model predictions were improved. Most importantly, they were shown to correlate equally well with subjective quality
scales of standard and non-standard scenes. The findings indicate that a device-dependent, scene-dependent image
quality model can be achieved.
Psychophysical image quality assessments have shown that subjective quality depends upon the pictorial content of the
test images. This study is concerned with the nature of scene dependency, which causes problems in modeling and
predicting image quality. This paper focuses on scene classification to address this issue, using K-means clustering to
classify test scenes. The aim was to classify thirty-two original test scenes that were previously used in a psychophysical
investigation conducted by the authors, according to their susceptibility to sharpness and noisiness. The objective scene
classification involved: 1) investigation of various scene descriptors, derived to describe properties that influence image
quality, and 2) investigation of the degree of correlation between scene descriptors and scene susceptibility parameters.
Scene descriptors that correlated with scene susceptibility in sharpness and in noisiness are assumed to be useful in the
objective scene classification. The work successfully derived three groups of scenes. The findings indicate that there is a
potential for tackling the problem of sharpness and noisiness scene susceptibility when modeling image quality. In
addition, more extensive investigations of scene descriptors would be required at global and local image levels in order
to achieve sufficient accuracy of objective scene classification.
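The K-means grouping step can be sketched as follows, with one row per scene of objective descriptor values (the descriptor choice follows the paper; the clustering code is generic and initialises centers from the first k rows for determinism).

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal K-means for grouping scenes by descriptor vectors.

    X is an (n_scenes, n_descriptors) array, e.g. one row per scene of
    hypothetical [busyness, contrast, lightness] values. Illustrative only.
    """
    centers = X[:k].astype(float).copy()   # deterministic init: first k rows
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each scene to its nearest cluster center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned scenes
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```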
Sorting and searching operations used for the selection of test images strongly affect the results of image quality
investigations and require a high level of versatility. This paper describes the way that inherent image properties, which
are known to have a visual impact on the observer, can be used to provide support and an innovative answer to image
selection and classification. The selected image properties are intended to be comprehensive and to correlate with our
perception. Results from this work aim to lead to the definition of a set of universal scales of perceived image properties
that are relevant to image quality assessments.
The initial prototype built towards these objectives relies on global analysis of low-level image features. A
multidimensional system is built, based upon the global image features of: lightness, contrast, colorfulness, color
contrast, dominant hue(s) and busyness. The resulting feature metric values are compared against outcomes from
relevant psychophysical investigations to evaluate the success of the employed algorithms in deriving image features that
affect the perceived impression of the images.
Psychophysical scaling is commonly based on the assumption that the overall quality of images derives from the
assessment of individual attributes which the observer is able to recognise and separate, e.g. sharpness, contrast, etc.
However, the assessment of individual attributes is a subject of debate, since the attributes are unlikely to be independent
of each other.
This paper presents an experiment that was carried out to derive individual perceptual attribute interval scales from overall
image quality assessments, and thereby examine the weight of each individual attribute in the overall perceived quality. A
psychophysical experiment was undertaken by fourteen observers. Thirty-two original images were manipulated by adjusting
three physical parameters that altered image blur, noise and contrast. The data were then arranged by permutation, where
ratings for each individual attribute were averaged to examine the variation of ratings in the other attributes.
The results confirmed that one JND of added noise and one JND of added blurring reduced image quality more than did
one JND in contrast change. Furthermore, they indicated that the range of distortion introduced by blurring
covered the entire image quality scale, but the ranges of added noise and contrast adjustments were too small to
investigate their consequences across the full range of image quality. There were several interesting tradeoffs between noise,
blur and changes in contrast. Further work on the effect of (test) scene content was carried out to objectively reveal
which types of scenes were significantly affected by changes in each attribute.
This paper describes an investigation of changes in image appearance when images are viewed at different image sizes
on a high-end LCD device. Two digital image capturing devices of different overall image quality were used for
recording identical natural scenes with a variety of pictorial contents. From each capturing device, a total of sixty-four
captured scenes, including architecture, nature, portraits, still and moving objects and artworks under various
illumination conditions and recorded noise level were selected. The test set included some images where camera shake
was purposefully introduced. An achromatic version of the image set that contained only lightness information was
obtained by processing the captured images in CIELAB space. Rank order experiments were carried out to determine
which image attribute(s) were most affected when the displayed image size was altered. These evaluations were carried
out for both chromatic and achromatic versions of the stimuli. For the achromatic stimuli, attributes such as contrast,
brightness, sharpness and noisiness were rank-ordered by the observers in terms of the degree of change. The same
attributes, as well as hue and colourfulness, were investigated for the chromatic versions of the stimuli. Results showed
that sharpness and contrast were the two most affected attributes with changes in displayed image size. The ranking of
the remaining attributes varied with image content and illumination conditions. Further experiments were carried out to
link original scene content to the attributes that changed most with changes in image size.
Colour information is not faithfully maintained by a CCTV imaging chain. Since colour can play an important role in
identifying objects it is beneficial to be able to account accurately for changes to colour introduced by components in the
chain. With this information it will be possible for law enforcement agencies and others to work back along the imaging
chain to extract accurate colour information from CCTV recordings.
A typical CCTV system has an imaging chain that may consist of scene, camera, compression, recording media and
display. The response of each of these stages to colour scene information was characterised by measuring its response to
a known input. The main variables that affect colour within a scene are illumination and the colour, orientation and
texture of objects. The effects of illumination on the appearance of colour of a variety of test targets were tested using
laboratory-based lighting, street lighting, car headlights and artificial daylight. A range of typical cameras used in CCTV
applications, common compression schemes and representative displays were also characterised.
The paper is focused on the implementation of a modular color image difference model, as described in [1], with the aim of predicting visual difference magnitudes between pairs of uncompressed images and images compressed using lossy JPEG and JPEG 2000. The work involved programming each pre-processing step, processing each image file and deriving the error map, which was further reduced to a single metric. Three contrast sensitivity function implementations were tested; a
Laplacian filter was implemented for spatial localization, and the contrast-mask-based local contrast enhancement method suggested by Moroney was used for local contrast detection. The error map was derived using the CIEDE2000 color difference formula on a
pixel-by-pixel basis. A final single value was obtained by calculating the median value of the error map. This metric was then tested against relative quality differences between original and compressed images, derived from psychophysical investigations on the same dataset. The outcomes revealed a grouping of images, attributed to correlations between the busyness of the test scenes (an image property indicating the presence or absence of high frequencies) and the different clustered results. In conclusion, a method for accounting for the amount of detail in test scenes is required for a more accurate prediction of image quality.
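The error-map reduction step can be sketched as below. For brevity, the simple CIE76 Euclidean Lab distance stands in for the CIEDE2000 formula used in the paper; the median reduction matches the described pipeline.

```python
import numpy as np

def error_map_metric(lab_ref, lab_test):
    """Per-pixel colour-difference map reduced to one value by the median.

    Inputs are H x W x 3 arrays of CIELAB values. CIE76 (Euclidean Lab
    distance) is used here as a simple stand-in for CIEDE2000.
    """
    delta_e = np.linalg.norm(lab_ref - lab_test, axis=2)  # the error map
    return np.median(delta_e)
```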
The measurement of the MTF of JPEG 6b and similar compression systems has remained challenging due to their nonlinear
and non-stationary nature. Previous work has shown that it is possible to estimate the effective MTF of the system
by calculating an 'average' MTF using a noise-based technique. This measurement essentially provides an
approximation of the linear portion of the MTF and has been argued to be representative of the global pictorial effect
of the compression system.
This paper presents work that calculates an effective line spread function for the compression system by
utilizing the derived MTFs for JPEG 6b. These LSFs are then combined with estimates of the noise in the compression
system to yield an estimate for the Effective Pictorial Information Capacity (EPIC) of the system. Further modifications
are made to the calculations to allow for the size and viewing distances of the images to yield a simple image quality
metric. The quality metric is compared with previous data generated by Ford using Barten's Square Root Integral with
Noise and Jacobson and Topfer's Perceived Information Capacity. The metric is further tested against subjective results,
derived using categorical scaling methods, for a number of scenes that are subjected to various amounts of
photographic-type distortion. Despite its simplicity, EPIC is shown to correlate with results from subjective
experimentation. Further improvements are also considered.