This document was downloaded over the internet from www.c-cube.com and slightly modified.
This chapter presents an overview of the Moving Picture Experts Group (MPEG) standard that is implemented by the CL480. The standard is officially known as ISO/IEC 11172, Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s. It is more commonly referred to as the MPEG-1 standard.
MPEG addresses the compression, decompression and synchronization of video and audio signals. The MPEG video algorithm can compress video signals to an average of about 1/2 to 1 bit per coded pixel. At a compressed data rate of 1.2 Mbps, a coded resolution of 352 x 240 at 30 Hz is often used, and the resulting video quality is comparable to VHS. Image quality can be significantly improved by using a higher data rate (for example, 2 Mbps), and hence less compression, without changing the coded resolution.
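The quoted data rates follow directly from the coded resolution, frame rate, and bits per coded pixel. A quick check of the arithmetic:

```python
# Rough arithmetic behind the data rates quoted above: a 352 x 240
# picture at 30 Hz gives about 2.5 million coded pixels per second.
width, height, fps = 352, 240, 30
pixels_per_second = width * height * fps        # 2,534,400 pixels/s

for bits_per_pixel in (0.5, 1.0):
    mbps = pixels_per_second * bits_per_pixel / 1e6
    print(f"{bits_per_pixel} bit/pixel -> {mbps:.2f} Mbps")
```

At 1/2 bit per pixel this comes to about 1.27 Mbps, matching the 1.2 Mbps figure; at 1 bit per pixel it is about 2.5 Mbps.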
MPEG Stream Structure
In its most general form, an MPEG system stream is made up of two layers:
The system layer contains timing and other information needed to demultiplex the audio and video streams and to synchronize audio and video during playback.
The compression layer includes the audio and video streams.
General Decoding Process
Figure 2-1 shows a generalized decoding system for the audio and video streams.
The system decoder extracts the timing information from the MPEG system stream and sends it to the other system components. (The Synchronization section has more information about the use of timing information for audio and video synchronization.) The system decoder also demultiplexes the video and audio streams from the system stream and then sends each to the appropriate decoder.
The video decoder decompresses the video stream as specified in Part 2 of the MPEG standard. (See Inter-Picture Coding section and Intra-picture (Transform) Coding section for more information about video compression.)
The audio decoder decompresses the audio stream as specified in Part 3 of the MPEG standard.
Figure 2-1 General MPEG Decoding System
Video Stream Data Hierarchy
The MPEG standard defines a hierarchy of data structures in the video stream as shown schematically in Figure 2-2.
Figure 2-2 MPEG Data Hierarchy
Video Sequence
A video sequence begins with a sequence header (and may contain additional sequence headers), includes one or more groups of pictures, and ends with an end-of-sequence code.
Group of Pictures (GOP)
A header and a series of one or more pictures intended to allow random access into the sequence.
Picture
The primary coding unit of a video sequence. A picture consists of three rectangular matrices representing luminance (Y) and two chrominance (Cb and Cr) values. The Y matrix has an even number of rows and columns. The Cb and Cr matrices are one-half the size of the Y matrix in each direction (horizontal and vertical).
Figure 2-3 shows the relative x-y locations of the luminance and chrominance components. Note that for every four luminance values, there are two associated chrominance values: one Cb value and one Cr value. (The location of the Cb and Cr values is the same, so only one circle is shown in the figure.)
Figure 2-3 Location of Luminance and Chrominance Values
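The 2:1 subsampling of the chrominance matrices in each direction can be expressed as a small helper (a sketch; the function name is illustrative, not part of the standard):

```python
# A sketch of 4:2:0 plane sizes: the Cb and Cr matrices are one-half
# the Y matrix size both horizontally and vertically, so four Y values
# share one Cb value and one Cr value.
def plane_shapes(width, height):
    assert width % 2 == 0 and height % 2 == 0  # Y has even rows/columns
    y_shape = (height, width)
    chroma_shape = (height // 2, width // 2)   # applies to both Cb and Cr
    return y_shape, chroma_shape

y, c = plane_shapes(352, 240)
print(y, c)   # (240, 352) (120, 176)
```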
Slice
One or more contiguous macroblocks. The order of the macroblocks within a slice is from left to right and top to bottom.
Slices are important in the handling of errors. If the bit stream contains an error, the decoder can skip to the start of the next slice. Having more slices in the bit stream allows better error concealment, but uses bits that could otherwise be used to improve picture quality.
Macroblock
A 16-pixel by 16-line section of luminance components and the corresponding 8-pixel by 8-line sections of the two chrominance components. See Figure 2-3 for the spatial location of luminance and chrominance components. A macroblock contains four Y blocks, one Cb block and one Cr block as shown in Figure 2-4. The numbers correspond to the ordering of the blocks in the data stream, with block 1 first.
Figure 2-4 Macroblock Composition
Block
An 8-pixel by 8-line set of values of a luminance or a chrominance component. Note that a luminance block corresponds to one-fourth as large a portion of the displayed image as does a chrominance block.
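The decomposition of a macroblock into its six blocks can be sketched as follows (the helper name and plane representation are illustrative):

```python
# A macroblock covers 16x16 luminance samples and the co-located 8x8
# chrominance samples; it is sent as four Y blocks, then Cb, then Cr.
def macroblock_blocks(y_plane, cb_plane, cr_plane, mb_row, mb_col):
    """Return the six 8x8 blocks of a macroblock in stream order."""
    y0, x0 = mb_row * 16, mb_col * 16
    blocks = []
    for dy in (0, 8):                      # Y blocks 1-4: left to right,
        for dx in (0, 8):                  # then top to bottom
            blocks.append([row[x0 + dx:x0 + dx + 8]
                           for row in y_plane[y0 + dy:y0 + dy + 8]])
    cy, cx = mb_row * 8, mb_col * 8        # chroma planes are half size
    for plane in (cb_plane, cr_plane):     # block 5: Cb, block 6: Cr
        blocks.append([row[cx:cx + 8] for row in plane[cy:cy + 8]])
    return blocks
```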
Audio Stream Data Hierarchy
The MPEG standard likewise defines a hierarchy of data structures for the audio stream. The MPEG audio stream, like the MPEG video stream, consists of a series of packets. Each audio packet contains an audio packet header and one or more audio frames as shown in Figure 2-5.
Figure 2-5 Audio Stream Structure
Each audio packet header contains the following information:
Packet start code - Identifies the packet as an audio packet.
Packet length - Indicates the number of bytes in the audio packet.
An audio frame contains the following information:
Audio frame header - Contains synchronization, ID, bit rate, and sampling frequency information.
Error-checking code - Contains error-checking information.
Audio data - Contains information used to reconstruct the sampled audio data.
Ancillary data - Contains user-defined data.
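The fixed-size audio frame header can be unpacked as a sequence of bit fields. The sketch below follows the 32-bit header layout of Part 3 of the standard for the fields named above; treat the field names and the subset shown as illustrative:

```python
# A sketch of unpacking the leading fields of a 32-bit MPEG-1 audio
# frame header, taken as an integer with the first transmitted bit
# in the most significant position.
def parse_audio_frame_header(word):
    return {
        "syncword":      (word >> 20) & 0xFFF,  # 12 bits, all ones
        "id":            (word >> 19) & 0x1,    # 1 = MPEG-1 audio
        "layer":         (word >> 17) & 0x3,    # 11=Layer I, 10=II, 01=III
        "protection":    (word >> 16) & 0x1,    # 0 = CRC follows header
        "bitrate_index": (word >> 12) & 0xF,    # index into bit-rate table
        "sampling_freq": (word >> 10) & 0x3,    # 00=44.1k, 01=48k, 10=32k
    }
```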
Inter-Picture Coding
Much of the information in a picture within a video sequence is similar to information in a previous or subsequent picture. The MPEG standard takes advantage of this temporal redundancy by representing some pictures in terms of their differences from other (reference) pictures, or what is known as inter-picture coding. This section describes the types of coded pictures and explains the techniques used in this process.
The MPEG standard specifically defines three types of pictures: intra, predicted, and bidirectional.
Intra pictures, or I-pictures, are coded using only information present in the picture itself. I-pictures provide potential random access points into the compressed video data. I-pictures use only transform coding (as explained in the Intra-picture (Transform) Coding section) and provide moderate compression. I-pictures typically use about two bits per coded pixel.
Predicted pictures, or P-pictures, are coded with respect to the nearest previous I- or P-picture. This technique is called forward prediction and is illustrated in Figure 2-6.
Like I-pictures, P-pictures serve as a prediction reference for B-pictures and future P-pictures. However, P-pictures use motion compensation (see the Motion Compensation section) to provide more compression than is possible with I-pictures. Unlike I-pictures, P-pictures can propagate coding errors because P-pictures are predicted from previous reference (I- or P-) pictures.
Figure 2-6 Forward Prediction
Bidirectional pictures, or B-pictures, are pictures that use both a past and future picture as a reference. This technique is called bidirectional prediction and is illustrated in Figure 2-7. B-pictures provide the most compression and do not propagate errors because they are never used as a reference. Bidirectional prediction also decreases the effect of noise by averaging two pictures.
Figure 2-7 Bidirectional Prediction
Video Stream Composition
The MPEG algorithm allows the encoder to choose the frequency and location of I-pictures. This choice is based on the application's need for random accessibility and the location of scene cuts in the video sequence. In applications where random access is important, I-pictures are typically used two times a second.
The encoder also chooses the number of B-pictures between any pair of reference (I- or P-) pictures. This choice is based on factors such as the amount of memory in the encoder and the characteristics of the material being coded. For example, a large class of scenes have two bidirectional pictures separating successive reference pictures. A typical arrangement of I-, P-, and B-pictures is shown in Figure 2-8 in the order in which they are displayed.
Figure 2-8 Typical Display Order of Picture Types
The MPEG encoder reorders pictures in the video stream to present the pictures to the decoder in the most efficient sequence. In particular, the reference pictures needed to reconstruct B-pictures are sent before the associated B-pictures. Figure 2-9 demonstrates this ordering for the first section of the example shown above.
Figure 2-9 Video Stream versus Display Ordering
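The reordering rule is simple: each B-picture must follow both of its reference pictures in the stream, so the encoder emits each reference (I- or P-picture) ahead of the B-pictures that are displayed before it. A sketch (picture labels are illustrative):

```python
# Convert display order to stream (coded) order: hold B-pictures back
# until the reference picture that follows them has been emitted.
def display_to_stream_order(display):
    stream, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)       # wait for the next reference
        else:
            stream.append(pic)          # emit the reference first...
            stream.extend(pending_b)    # ...then the held B-pictures
            pending_b = []
    return stream + pending_b

print(display_to_stream_order(
    ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]))
# -> ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```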
Motion Compensation
Motion compensation is a technique for enhancing the compression of P- and B-pictures by eliminating temporal redundancy. Motion compensation typically improves compression by about a factor of three compared to intra-picture coding. Motion compensation algorithms work at the macroblock level.
When a macroblock is compressed by motion compensation, the compressed file contains this information:
The spatial vector between the reference macroblock(s) and the macroblock being coded (motion vectors)
The content differences between the reference macroblock(s) and the macroblock being coded (error terms)
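Putting the two pieces together, the decoder reconstructs a forward-predicted block by fetching the reference samples displaced by the motion vector and adding the transmitted error terms. A sketch (names and the list-of-lists plane representation are illustrative):

```python
# Reconstruct one 8x8 block of a motion-compensated macroblock:
# prediction from the reference picture plus the decoded error terms.
def reconstruct_block(reference, top, left, motion_vector, error_terms):
    dy, dx = motion_vector
    out = []
    for r in range(8):
        row = []
        for c in range(8):
            predicted = reference[top + r + dy][left + c + dx]
            row.append(predicted + error_terms[r][c])   # add error term
        out.append(row)
    return out
```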
Not all information in a picture can be predicted from a previous picture. Consider a scene in which a door opens: The visual details of the room behind the door cannot be predicted from a previous frame in which the door was closed. When a case such as this arises--i.e., a macroblock in a P-picture cannot be efficiently represented by motion compensation--it is coded in the same way as a macroblock in an I-picture using transform coding techniques (see Intra-picture (Transform) Coding Section).
The difference between B- and P-picture motion compensation is that macroblocks in a P-picture use only the previous reference picture (I- or P-picture), while macroblocks in a B-picture can be coded using the previous reference picture, the next reference picture, or both.
Four codings are therefore possible for each macroblock in a B-picture:
Intra coding: no motion compensation
Forward prediction: the previous reference picture is used as a reference
Backward prediction: the next picture is used as a reference
Bidirectional prediction: two reference pictures are used, the previous reference picture and the next reference picture
Backward prediction can be used to predict uncovered areas that do not appear in previous pictures.
Intra-picture (Transform) Coding
The MPEG transform coding algorithm includes these steps:
Discrete cosine transform (DCT)
Quantization
Run-length encoding
Both image blocks and prediction-error blocks have high spatial redundancy. To reduce this redundancy, the MPEG algorithm transforms 8 x 8 blocks of pixels or 8 x 8 blocks of error terms from the spatial domain to the frequency domain with the Discrete Cosine Transform (DCT).
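The 8 x 8 forward DCT can be written directly from its textbook definition. The sketch below is the straightforward O(n^4) form; real decoders such as the CL480 use fast factorizations instead:

```python
import math

# Straightforward 8x8 forward DCT (DCT-II), directly from the
# definition: F(u,v) = (1/4) C(u) C(v) sum f(x,y) cos(...) cos(...),
# with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
def dct_8x8(block):
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# A spatially flat block has all its energy in the DC coefficient:
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0]))   # 800; every other coefficient is ~0
```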
Next, the algorithm quantizes the frequency coefficients. Quantization is the process of approximating each frequency coefficient as one of a limited number of allowed values. The encoder chooses a quantization matrix that determines how each frequency coefficient in the 8 x 8 block is quantized. Human perception of quantization error is lower for high spatial frequencies, so high frequencies are typically quantized more coarsely (i.e., with fewer allowed values) than low frequencies.
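In its simplest form, quantization divides each coefficient by its entry in the quantization matrix and rounds to the nearest integer; the decoder reverses this by multiplying back. A sketch (the uniform rounding rule and any example matrix are illustrative, not the standard's exact reconstruction rule):

```python
# Divide each frequency coefficient by its quantization-matrix entry
# and round: larger divisors leave fewer allowed values, which is how
# high frequencies are quantized more coarsely.
def quantize(coeffs, quant_matrix):
    return [[round(coeffs[u][v] / quant_matrix[u][v]) for v in range(8)]
            for u in range(8)]

# Dequantization recovers an approximation of the original coefficient,
# with an error of at most half the matrix entry.
def dequantize(levels, quant_matrix):
    return [[levels[u][v] * quant_matrix[u][v] for v in range(8)]
            for u in range(8)]
```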
The combination of DCT and quantization results in many of the frequency coefficients being zero, especially the coefficients for high spatial frequencies. To take maximum advantage of this, the coefficients are organized in a zigzag order to produce long runs of zeros (see Figure 2-10). The coefficients are then converted to a series of run-amplitude pairs, each pair indicating a number of zero coefficients and the amplitude of a non-zero coefficient. These run-amplitude pairs are then coded with a variable-length code, which uses shorter codes for commonly occurring pairs and longer codes for less common pairs.
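The zigzag scan and the run-amplitude conversion described above can be sketched as follows (helper names are illustrative):

```python
# The zigzag scan visits coefficients along anti-diagonals of
# increasing u + v, so the mostly-zero high-frequency coefficients
# group into long runs at the end of the scan.
def zigzag_order():
    order = []
    for s in range(15):                       # anti-diagonals u+v = 0..14
        diag = [(u, s - u) for u in range(8) if 0 <= s - u < 8]
        order.extend(diag if s % 2 else reversed(diag))
    return order

# Convert a scanned block into (zero-run, amplitude) pairs; the
# trailing run of zeros is implied rather than coded.
def run_amplitude_pairs(coeffs):
    pairs, run = [], 0
    for u, v in zigzag_order():
        if coeffs[u][v] == 0:
            run += 1                          # extend the run of zeros
        else:
            pairs.append((run, coeffs[u][v]))
            run = 0
    return pairs
```

Each pair is then mapped to a variable-length code, with short codes assigned to the most common pairs.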
Some blocks of pixels need to be coded more accurately than others. For example, blocks with smooth intensity gradients need accurate coding to avoid visible block boundaries. To deal with this inequality between blocks, the MPEG algorithm allows the amount of quantization to be modified for each macroblock of pixels. This mechanism can also be used to provide smooth adaptation to a particular bit rate.
Figure 2-10 Transform Coding Operations
Synchronization
The MPEG standard provides a timing mechanism that ensures synchronization of audio and video. The standard includes two parameters: the system clock reference (SCR) and the presentation time stamp (PTS).
The MPEG-specified "system clock" runs at 90 kHz. System clock reference and presentation time stamp values are coded in MPEG bit streams using 33 bits, which can represent any clock cycle within a period of more than 24 hours.
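The 33-bit width is what lets these time stamps span a full day of program material before wrapping:

```python
# Range of a 33-bit counter ticking at 90 kHz.
ticks = 2 ** 33                # distinct 33-bit values
seconds = ticks / 90_000       # 90,000 ticks per second
print(f"wraps after {seconds / 3600:.1f} hours")   # about 26.5 hours
```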
System Clock References
An SCR is a snapshot of the encoder system clock which is placed into the system layer of the bit stream, as shown in Figure 2-11. During decoding, these values are used to update the system clock counter in the CL480.
Figure 2-11 SCR Flow in MPEG System
Presentation Time Stamps
Presentation time stamps are samples of the encoder system clock that are associated with video or audio presentation units. A presentation unit is a decoded video picture or a decoded audio time sequence. The PTS represents the time at which the video picture is to be displayed or the starting playback time for the audio time sequence.
The decoder either skips or repeats picture displays to ensure that the PTS is within one picture's worth of 90 kHz clock ticks of the SCR when a picture is displayed. If the PTS is earlier (has a smaller value) than the current SCR, the decoder discards the picture. If the PTS is later (has a larger value) than the current SCR, the decoder repeats the display of the picture.
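The skip/repeat policy amounts to comparing the PTS against the SCR with a one-picture tolerance window. A sketch (the function name and the symmetric window are illustrative simplifications of the CL480's actual behavior):

```python
# Decide what to do with a decoded picture at display time, keeping
# the PTS within one picture period of the SCR.
def display_action(pts, scr, ticks_per_picture):
    if pts < scr - ticks_per_picture:
        return "skip"      # picture is late: discard it
    if pts > scr + ticks_per_picture:
        return "repeat"    # picture is early: repeat the current display
    return "display"       # within one picture period: show it

ticks = 90_000 // 30       # one picture period at 30 Hz = 3000 ticks
print(display_action(pts=100_000, scr=100_500, ticks_per_picture=ticks))
# -> display
```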