- QoS Mapping for Fine Granular Scalability with Base Layer Scaling
- Masayuki Inoue (NTT Corporation, Japan); Katsuhiko Fukazawa (NTT Corporation, Japan); Shigetaro Iwatsu (NTT Corporation, Japan); Yoshiyuki Yashima (NTT Corporation, Japan)
Scalable video coding is receiving increasing attention. However, few papers address QoS mapping for scalable video coding. This paper uses Fine Granular Scalability (FGS) with base layer scaling, which offers application-level QoS control. An experiment is conducted that uses the DSIS method to subjectively assess Fine Granular Scalability. The results show that the maximum of the mean grading point fell as the base-layer Size value decreased. Our results also show that Fine Granular Scalability provides only SNR scalability, not subjective image quality scalability. A multiple regression analysis shows that we can establish QoS mapping between the user and application levels by using both application-level QoS parameters and a feasible human factor. Furthermore, we show that FGS can adjust to not only a wide range of channel capacities but also a wide variety of users by using the indicated QoS mapping.
Masayuki Inoue received the B.E. and M.E. degrees in electrical engineering from Tokyo University of Science in Japan in 1994 and 1996, respectively. He joined NTT Laboratories in 1996 and has been engaged in 3D CyberSpace communication systems. He is currently working on scalable video coding systems.
- Quality evaluation model using local features of still picture
- Yuukou Horita (University of Toyama, Japan)
An objective picture quality evaluation model that assesses coded still pictures without a reference is very useful for quality-oriented image compression. In this paper, a new objective no-reference (NR) picture quality evaluation model for JPEG is presented, which is easy to calculate and applicable to various image coding applications. The proposed model is based on local features of the picture, such as edge, flat and texture areas, and also on the blockiness, activity measures, and zero-crossing rate within each block of the picture. Our experiments on various picture distortion types indicate that it performs significantly better than the conventional model.
- An Adaptive Error Concealment Mechanism for H.264 Encoded Low-Resolution Video Streaming
- Olivia Nemethova (Vienna University of Technology, Austria); Ameen Al-Moghrabi (Vienna University of Technology, Austria); Markus Rupp (TU Wien, Austria)
The H.264 video codec is well suited for real-time, error-resilient transport over packet-oriented networks. In real-time communications, lost packets at the receiver cannot be avoided. Therefore, it is essential to design efficient error concealment methods that visually reduce the degradation caused by the missing information. Each method has its own quality of reconstruction. We implemented various efficient error concealment techniques and investigated their performance in different scenarios. As a result, we propose and evaluate an adaptive error concealment mechanism that achieves both good performance and low complexity, enabling deployment in mobile video streaming applications. This mechanism selects a suitable error concealment method according to the amount of instantaneous spatial and temporal information in the video sequence and according to the type of the frame.
- An Adaptive Color Transform Approach and its Application in 4:4:4 Video Coding
- Detlev Marpe (Fraunhofer HHI, Germany); Heiner Kirchhoffer (Fraunhofer HHI, Germany); Valeri George (Fraunhofer HHI, Germany); Peter Kauff (Fraunhofer HHI, Germany); Thomas Wiegand (HHI/FhG, Germany)
This paper deals with an approach that extends block-based video-coding techniques by an adaptive color space transform. The presented technique allows the encoder to switch between two given color space representations with the objective to maximize the overall rate-distortion gain. Simulations based on the current draft of the H.264/MPEG4-AVC 4:4:4 extensions demonstrate that our technique guarantees a rate-distortion performance equal to or better than that obtained when coding in either of the two fixed color spaces.
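The switching idea in this abstract can be illustrated with a minimal sketch. It assumes a Lagrangian cost J = D + λR as the selection criterion, which the abstract does not state; the names `rd_cost`, `choose_color_space`, and the labels "A" and "B" are hypothetical, and the distortion and rate figures are placeholders rather than measured values.

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed criterion)."""
    return distortion + lam * rate


def choose_color_space(candidates, lam):
    """Return the label of the candidate with the lowest cost.

    candidates maps a color space label to a (distortion, rate) pair
    obtained by trial-coding the same block in that color space.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Placeholder (distortion, rate) pairs for one block in two color spaces.
block = {"A": (10.0, 100), "B": (12.0, 80)}
print(choose_color_space(block, lam=0.2))
print(choose_color_space(block, lam=0.05))
```

Because the encoder takes the minimum over both candidates, the selected cost for each block can never exceed the cost of either fixed choice under the same criterion, which is consistent with the "equal to or better" behaviour the abstract reports.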
- Spatio-Temporal Filter for ROI Video Coding
- Linda Karlsson (Mid Sweden University, Sweden); Mårten Sjöström (Mid Sweden University, Sweden); Roger Olsson (Mid Sweden University, Sweden)
Reallocating resources within a video sequence to the regions of interest (ROI) increases the perceived quality at limited bandwidths. In this paper we combine a spatial filter with a temporal filter, both of which are codec and standard independent. This spatio-temporal filter removes resources from both the motion vectors and the prediction error with a computational complexity lower than that of the spatial filter by itself. This decreases the bit rate by 30-50% compared to coding the original sequence using H.264. The released bits can be used by the codec to increase the PSNR of the ROI by 1.58 to 4.61 dB, which is larger than for the spatial and temporal filters by themselves.
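The ROI PSNR figure quoted above is a PSNR restricted to the pixels inside the region of interest. A minimal sketch follows, assuming 8-bit samples stored as flat lists and a binary mask; the function name `roi_psnr` and the mask representation are illustrative assumptions, not taken from the paper.

```python
import math


def roi_psnr(original, decoded, mask, peak=255):
    """PSNR in dB computed only over pixels where mask is truthy."""
    squared_errors = [
        (o - d) ** 2 for o, d, m in zip(original, decoded, mask) if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no pixels")
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical inside the ROI
    return 10 * math.log10(peak ** 2 / mse)


# The large error on the last pixel lies outside the ROI and is ignored.
original = [100, 100, 100, 100]
decoded = [101, 99, 100, 120]
mask = [1, 1, 0, 0]
print(round(roi_psnr(original, decoded, mask), 2))
```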
- Pdf sharpening for multichannel predictive coders
- Omer Gerek (Anadolu University, Turkey)
Predictive coders that split the prediction decision into contexts depending on the local image behaviour have proved to be practically useful and successful in image coding applications. Such predictive coders can be called multichannel. LOCO is a simple, yet successful example of such coders. Due to its success, a fair amount of attention has been paid to improving multichannel predictive coders. The common task for these coders is to split the pixel layout around the pixel of interest into a list of contexts or prediction rules that specifically succeed in predicting the value in a reasonable way. The improvement proposed in this work is due to the well-known observation that the prediction error pdfs are not identically or evenly distributed for each channel output. Although several methods have been proposed to compensate for this situation, they mostly perturb the low-complexity behaviour. In this work, it is shown that a two-pass coder is a simple, yet efficient improvement that perfectly determines channel pdf bias amounts, and the adjustment produces up to 5% compression improvement over the test images.
- An Improved Error Concealment Strategy Driven by Scene Motion Properties for H.264/AVC Decoders
- Susanna Spinsante (Polytechnic University of Marche, Italy); Ennio Gambi (Università Politecnica delle Marche, Italy); Franco Chiaraluce (Universita' Politecnica delle Marche, Italy)
This paper deals with the possibility of improving the concealment effectiveness of an H.264 decoder by integrating a scene change detector. This way, the selected recovery strategy is driven by the detection of a change in the scene, rather than by the coding features of each frame. The scene detection algorithm under evaluation has been chosen from the technical literature, but a deep analysis of its performance, over a wide range of video sequences having different motion properties, has allowed the suggestion of simple but effective modifications, which provide better results in terms of final perceived video quality.
Susanna Spinsante received the “Laurea in Ingegneria Elettronica” from the Università Politecnica delle Marche in June 2002. In November 2002 she joined the Dipartimento di Elettronica, Intelligenza artificiale e Telecomunicazioni of the Università Politecnica delle Marche in Ancona, where she obtained her Ph.D. degree in Electronic Engineering and Telecommunication in December 2005. From August 2004 to February 2005 she was at the Department of Informatics, University of Bergen (Norway) under a FASTSEC Marie Curie Training Site contract on Coding Theory and Cryptography. At present, she is involved in research activities on video coding systems optimization, source and channel coding, and encryption and authentication algorithms. She has been an IEEE member since December 2003.
- H.264 Encoding of Videos with Large Number of Shot Transitions Using Long-Term Reference Pictures
- Nukhet Ozbek (Ege University, Turkey); A. Murat Tekalp (Koc University, Turkey)
Long-term reference prediction is an important feature of the H.264 standard, which provides a trade-off between gain and complexity. A simple long-term reference selection method is presented for videos with frequent shot/view transitions in order to optimize compression efficiency at the shot boundaries. Experimental results show up to 50% reduction in the number of bits, at the same PSNR, for frames at the border of transitions.
- Embedded Image Processing/Compression For High-Speed CMOS Sensor
- Romuald Mosqueron (University of Burgundy, France); Julien Dubois (university of Burgundy, France); Michel Paindavoine (Université de Bourgogne, France)
High-speed video cameras are powerful tools for investigating, for instance, biomechanics or the movements of mechanical parts in manufacturing processes. In the past years, the use of CMOS sensors instead of CCDs has made possible the development of high-speed video cameras offering digital outputs, readout flexibility and lower manufacturing costs. In this paper, we propose a high-speed camera based on a CMOS sensor with embedded processing. Two types of algorithms have been implemented. The first type is compression, which allows images to be transferred over a serial link output. The second type is dedicated to feature extraction, such as edge detection, marker extraction, image analysis, wavelet analysis and object tracking. These image processing algorithms have been implemented in an FPGA embedded inside the camera. This FPGA technology allows us to process 500 images per second in real time at a resolution of 1,280H × 1,024V. Keywords: CMOS Image Sensor, FPGA, Image Compression, High-speed Video.
- Improving Wyner-Ziv Video coding by block-based distortion estimation
- Zouhair Belkoura (Technische Universität Berlin, Germany); Thomas Sikora (Technische Universität Berlin, Germany)
Conventional video coding uses motion estimation to perform adaptive linear predictive encoding. Wyner-Ziv coding does not use predictive coding but performs motion estimation at the decoder. Recent work uses a difference signal at the encoder to estimate the prediction quality at the decoder. In this paper, we recognise that this operation constitutes a step in the motion estimation process. We exploit this information by omitting suitable blocks, effectively implementing linear predictive coding with deadzone quantisation for parts of the input signal. This modified Wyner-Ziv coding results in large bitrate reductions as well as significant decoding complexity decrease. At certain bitrates, our modified Wyner-Ziv codec outperforms conventional hybrid coding in an I-B-I-B setup.
Zouhair Belkoura received the MEng degree in Electrical and Electronic Engineering from Imperial College London (UK) in 2002. Since September 2002 he has been working as "Wissenschaftlicher Mitarbeiter" in the Communication Systems Group at TU Berlin (Germany), where he is also pursuing the Dr.-Ing. degree. His research interests currently include Channel Coding and (Distributed) Source Coding of video data.
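The block omission described in the Wyner-Ziv abstract above can be sketched as a per-block threshold on an encoder-side difference signal. The sketch below is an assumption-laden illustration: the sum of absolute differences (SAD) measure, the `threshold` parameter, and the function names are hypothetical, since the abstract does not specify how suitable blocks are identified.

```python
def block_sad(current, reference):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r) for c, r in zip(current, reference))


def blocks_to_code(current_blocks, reference_blocks, threshold):
    """Flag each block: True means code it, False means omit it.

    Omitted blocks are assumed to be reproduced from the reference at
    the decoder, which acts like a deadzone on small differences.
    """
    return [
        block_sad(cur, ref) >= threshold
        for cur, ref in zip(current_blocks, reference_blocks)
    ]


current = [[10, 10, 10, 10], [50, 60, 70, 80]]
reference = [[10, 11, 10, 9], [10, 10, 10, 10]]
print(blocks_to_code(current, reference, threshold=8))  # [False, True]
```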
- A low-complexity multiple description video coder based on 3D-transforms
- Andrey Norkin (Tampere University of Technology, Finland); Atanas Gotchev (Tampere University of Technology, Finland); Karen Egiazarian (Tampere University of Technology, Finland); Jaakko Astola (Tampere University of Technology, Finland)
The paper presents a multiple description (MD) video coder based on three-dimensional (3D) transforms. The coder has low computational complexity and high robustness to transmission errors and is targeted at mobile devices. The encoder represents the video sequence as a coarse sequence approximation (shaper), which is included in both descriptions, and a residual sequence (details), which is split between the two descriptions. The shaper is obtained by a block-wise pruned 3D-DCT. The residual sequence is coded by a 3D-DCT or a hybrid 3D-transform. The coding scheme is simple and yet outperforms some plain MD coders based on H.263 in a lossy environment, especially in the low-redundancy region.
Andrey Norkin received his M.Sc. degree in computer science from the Ural State Technical University, Ekaterinburg, Russia, in 2001 and Lic.Tech. degree in signal processing from Tampere University of Technology (TUT), Tampere, Finland, in 2005.
Currently, he is with the Institute of Signal Processing, TUT, where he is working as a researcher and towards a Ph.D. degree. His research interests include multiple description coding, image and video compression, stereoscopic and multi-view coding, and 3-D meshes.
- Adaptive Interpolation Algorithm for Fast and Efficient Video Encoding in H.264
- Gianluca Bailo (University of Genova, Italy); Massimo Bariani (University of Genova, Italy); Chiappori Andrea (University of Genova, Italy); Riccardo Stagnaro (University of Genova, Italy)
H.264/MPEG-4 AVC is the latest video-coding standard jointly developed by VCEG (Video Coding Experts Group) of ITU-T and MPEG (Moving Picture Experts Group) of ISO/IEC. It uses state-of-the-art video signal algorithms that provide enhanced efficiency, compared with previous standards, for a wide range of applications including video telephony, video conferencing, video surveillance, storage, streaming video, digital video editing and creation, digital cinema and others. In order to reduce the bitrate of the video signal in H.264, the ISO and ITU coding standards use a ¼ pel displacement resolution. A 6-tap Wiener filter is utilized to obtain half-pel samples, which are then averaged in order to achieve the quarter-pel interpolation. H.264 saves 50% of the bit rate at the same quality compared with existing video coding standards, but this result demands additional computational complexity. In this paper, we propose an algorithm that reduces the interpolation computation time. The goal is to adapt the H.264 ¼ pel interpolation to the complexity of the video stream to encode, on the basis of our motion detection algorithm. The proposed solution decreases the overall encoder complexity for both low- and high-complexity sequences. This paper illustrates the integration of the H.264 encoder with our motion detection algorithm for the development of an adaptive interpolation. The obtained results are compared with the jm86 standard interpolation using different quantization values.
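The interpolation described in this abstract can be sketched in one dimension. The sketch uses the H.264 luma half-sample taps (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, then averages an integer sample with the neighbouring half sample to get a quarter-sample value. It is a simplified illustration: the function names are hypothetical, borders are handled by clamping as an assumption, and the paper's adaptive scheme and motion detection are not reproduced.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1] for 8-bit samples."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), last)  # clamp indices at the borders
        acc += tap * row[j]
    # Round, divide by 32, and clip to the 8-bit range.
    return min(max((acc + 16) >> 5, 0), 255)


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the half sample to its right."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel(ramp, 3))     # 35
print(quarter_pel(ramp, 3))  # 33
```

Each half-sample value costs six multiply-accumulate operations per filtered dimension, which is the kind of load the adaptive approach in the abstract aims to avoid where it is not needed.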
- Efficient Image Registration With Subpixel Accuracy
- Irene Karybali (University of Patras, Greece); Emmanouil Psarakis (University of Patras, Greece); Kostas Berberidis (University of Patras, Greece); Georgios Evangelidis (, ? )
The contribution of this paper is twofold. First, a new spatial domain image registration technique with subpixel accuracy is presented. This technique is based on a double maximization of the correlation coefficient and provides a closed-form solution to the subpixel translation estimation problem. Second, an efficient iterative scheme for integer registration is proposed, which reduces significantly the number of searches, as compared to the exhaustive search. This scheme can be used as a pre-processing step in the sub-pixel accuracy technique, leading to lower computational complexity. Extensive simulation results have shown that the performance of the proposed technique compares very favorably with respect to existing ones.
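As a point of comparison for the integer registration step, the exhaustive search that the abstract mentions as a baseline can be sketched in one dimension: try every integer shift in a window and keep the one that maximizes the correlation coefficient over the overlapping samples. This is not the paper's iterative scheme or its closed-form subpixel solution; the function names and the 1-D setting are assumptions made for brevity.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def best_integer_shift(reference, target, max_shift):
    """Exhaustive search for the shift s that best matches target[i + s] to reference[i]."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(len(reference), len(target) - s)
        if hi - lo < 2:
            continue  # not enough overlap to correlate
        c = correlation_coefficient(reference[lo:hi], target[lo + s:hi + s])
        if best is None or c > best[1]:
            best = (s, c)
    return best


reference = [0, 1, 4, 9, 3, 2, 7, 5, 0, 1]
target = [0, 0] + reference[:-2]  # reference delayed by two samples
shift, score = best_integer_shift(reference, target, max_shift=3)
print(shift)  # 2
```

The search evaluates 2 * max_shift + 1 candidates in 1-D, and the count grows quadratically with the window size in 2-D, which motivates a scheme that reduces the number of searches.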
- H.264 video coding for low bit rate error prone channels: an application to Tetra systems
- Stefania Colonnese (Università "La Sapienza" di Roma, Italy); Alessandro Piccoli (University of Rome "La Sapienza", Italy); Claudio Sansone (University of Rome "La Sapienza", Italy); Gaetano Scarano (Università "La Sapienza" di Roma, Italy)
This work investigates an H.264 coding scheme for video transmission over packet networks characterized by heavy packet losses and low available bitrate, such as those encountered on channels that were originally designed for voice and limited data services. The H.264 resilient coding tools, such as Flexible Macroblock Ordering, Redundant Slices and Arbitrary Slice Ordering, are tuned here in order to adapt the application layer parameters to the physical layer characteristics. Due to the limited bandwidth, the tools are differentiated on a Region Of Interest (ROI). Moreover, the Redundant Slices tool is complemented by suitable application-level interleaving to counteract the bursty nature of the errors. The performance of the codec design choices is assessed on a TETRA communication channel, which is quite challenging due to both limited bandwidth and severe error conditions. However, the illustrated codec design criteria can be adopted in different low bit-rate, error-prone channels.
- Spatio-temporal selective extrapolation for 3-D signals applied to concealment in video communications
- Katrin Meisinger (University of Erlangen-Nuremberg, Germany); Sandra Martin (University of Erlangen-Nuremberg, Germany); André Kaup (University of Erlangen-Nürnberg, Germany)
In this paper we derive a frequency selective extrapolation method for three-dimensional signals. Extending a signal beyond a limited number of known samples is commonly referred to as signal extrapolation. We provide an extrapolation technique that estimates image areas by simultaneously exploiting spatial and temporal correlations of the video signal. Lost areas caused by transmission errors are concealed by extrapolation from the surroundings. Conventionally, the missing areas in the video sequence are estimated from either the spatial or the temporal surroundings. Our approach approximates the known signal by a weighted linear combination of 3-D basis functions from the spatial as well as the temporal direction and extrapolates it into the missing area. The algorithm is able to extrapolate smooth and structured areas and to inherently compensate motion and changes in luminance from frame to frame.
- A scalable SPIHT-based multispectral image compression
- Fouad Khelifi (Queen's University Belfast, United Kingdom); Ahmed Bouridane (Queen's University, United Kingdom); Fatih Kurugollu (Queen's University Belfast, United Kingdom)
This paper addresses the compression of multispectral images, which can be viewed, at the encoder side, as a three-dimensional (3D) data set characterized by a high correlation through the successive bands. Recently, the celebrated 3D-SPIHT (Set Partitioning In Hierarchical Trees) algorithm has been widely adopted in the literature for the coding of multispectral images because of its proven state-of-the-art performance. In order to exploit the spectral redundancy in the 3D wavelet transform domain, a new scalable SPIHT-based multispectral image compression technique is proposed. The rationale behind this approach is that image components in two consecutive transformed bands are significantly dependent in terms of zerotree locations in the 3D-DWT domain. Therefore, by joining the trees with the same location into the List of Insignificant Sets (LIS), a considerable number of bits can be saved in the sorting pass in comparison with the separate encoding of the transformed bands. Numerical experiments on two sample multispectral images show considerably better performance of the proposed technique when compared to the conventional 3D-SPIHT.
- Temporal And Spatial Scaling For Stereoscopic Video Compression
- Anil Aksay (Middle East Technical University, Turkey); Cagdas Bilen (Middle East Technical University, Turkey); Engin Kurutepe (Koc University, Turkey); Tanir Ozcelebi (Koc University, Turkey); Gozde Bozdagi Akar (Middle East Technical University, Turkey); Reha Civanlar (Koc University, Turkey); A. Murat Tekalp (Koc University, Turkey)
In stereoscopic video, it is well-known that compression efficiency can be improved, without sacrificing PSNR, by predicting one view from the other. Moreover, additional gain can be achieved by subsampling one of the views, since the Human Visual System can perceive high frequency information from the other view. In this work, we propose subsampling of one of the views by scaling its temporal rate and/or spatial size at regular intervals using a real-time stereoscopic H.264/AVC codec, and assess the subjective quality of the resulting videos using DSCQS test methodology. We show that stereoscopic videos can be coded at a rate about 1.2 times that of monoscopic videos with little visual quality degradation.
- Linear and Nonlinear Temporal Prediction Employing Lifting Structures for Scalable Video Coding
- Behcet Toreyin (Bilkent University, Turkey); Maria Trocan (ENST, France); Béatrice Pesquet (Ecole Nationale Supérieure des Télécommunications, France); Enis Çetin (Bilkent University, Turkey)
Scalable 3D video codecs based on wavelet lifting structures have recently attracted a lot of attention, due to their compression performance, which is comparable with that of state-of-the-art hybrid codecs. In this work, we propose a set of linear and nonlinear predictors for the temporal prediction step in the lifting implementation. The predictor uses pixels on the motion trajectories of the frames in a window around the pixel to be predicted to improve the quality of prediction. Experimental results show that the video quality as well as PSNR values are improved with the proposed prediction method.
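The role of the prediction step in a temporal lifting structure can be shown with a minimal sketch. It treats one pixel trajectory as a list of sample values, uses a fixed linear predictor (the average of the two neighbouring even-indexed samples) and a matching update step, and omits motion compensation entirely. These choices and all function names are assumptions for illustration; the linear and nonlinear predictors proposed in the paper are not reproduced here.

```python
def predict(even, i):
    """Predict odd sample i from its two even neighbours (clamped at the end)."""
    right = even[min(i + 1, len(even) - 1)]
    return (even[i] + right) / 2


def update(high, i):
    """Update term for even sample i from its two high-pass neighbours."""
    left = high[max(i - 1, 0)]
    return (left + high[i]) / 4


def lift_forward(samples):
    """Split an even-length trajectory into low-pass and high-pass parts."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    even, odd = samples[0::2], samples[1::2]
    high = [odd[i] - predict(even, i) for i in range(len(odd))]
    low = [even[i] + update(high, i) for i in range(len(even))]
    return low, high


def lift_inverse(low, high):
    """Undo lift_forward by reversing the update and predict steps."""
    even = [low[i] - update(high, i) for i in range(len(low))]
    odd = [high[i] + predict(even, i) for i in range(len(high))]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


trajectory = [10, 12, 14, 16, 18, 20]
low, high = lift_forward(trajectory)
print(high)                     # small residuals where prediction is good
print(lift_inverse(low, high))  # reconstructs the input
```

Because the inverse simply subtracts what the forward pass added, any predictor, linear or nonlinear, can be substituted in `predict` without losing invertibility, as long as it depends only on the even samples.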
- An H.264-Based Video Encoding Scheme for 3D TV
- Mahsa T. Pourazad (University of British Columbia, Canada); Panos Nasiopoulos (University of British Columbia, Canada); Rabab Ward (University of British Columbia, Canada)
This paper presents an H.264-based scheme for compressing 3D content captured by 3D depth range cameras. Existing MPEG-2 based schemes take advantage of the correlation between the 2D video sequence and its corresponding depth map sequence, and use the 2D motion vectors (MV) for the depth video sequence as well. This improves the speed of encoding the depth map sequence, but it results in an increase in the bitrate or a drop in the quality of the reconstructed 3D video. This is found to be due to the MVs of the 2D video sequence not being the best choice for encoding some parts of the depth map sequence containing sharp edges or corresponding to distant objects. To solve this problem, we propose an H.264-based method which re-estimates the MVs and re-selects the appropriate modes for these regions. Experimental results show that the proposed method enhances the quality of the encoded depth map sequence by an average of 1.77 dB. Finding the MVs of the sharp edge-included regions of the depth map sequence amounts to 30.64% of the computational effort needed to calculate MVs for the whole depth map sequence.