The patent's inventor is Duvivier, Christian L. (
This patent was filed on
From the background information supplied by the inventors, news correspondents obtained the following quote: "Video codecs (COmpressor-DECompressor) are compression algorithms designed to encode/compress and decode/decompress video data streams to reduce the size of the streams for faster transmission and smaller storage space. While lossy, video codecs attempt to maintain video quality while compressing the binary data of a video stream. Video codecs are typically implemented in both hardware and software. Examples of popular video codecs are MPEG-4, AVI, WMV, RM, RV, H.261, H.263, and H.264.
"A video stream is comprised of a sequence of video frames where each frame is comprised of multiple macroblocks. A video codec encodes each frame in the sequence by dividing the frame into slices or sub-portions, each slice containing an integer number of macroblocks. Each macroblock is typically a 16.times.16 array of luminance pixels, although other sizes of macroblocks are also possible. The number of macroblocks per slice (i.e., slice size) and number of slices per frame (i.e., slice number) is determined by the video codec. Typically, the video frame is divided into even sized slices so that each slice contains the same number of macroblocks. A slice can be measured by the percentage of the frame that the slice comprises. For example, a frame can be divided into five even slices where each slice comprises 20% of the frame.
"Frames are encoded in slices to allow the frame to be later decoded/decompressed using parallel multithread processing. In multithread processing, each thread performs a single task (such as decoding a slice) so that multiple tasks can be performed simultaneously, for example, by multiple central processing units (CPUs). By dividing a frame into multiple slices, two or more slices can be decoded/decompressed simultaneously by two or more threads/CPUs. Each slice is a considered a task unit that is put into a task list that is processed by a thread pool (a set of threads). A main thread (having the task of decoding an entire frame) and the thread pool need to synchronize after all the tasks in the task list have been processed (i.e., when all the slices of a frame have been decoded).
"There are, however, disadvantages to encoding a frame in slices as each slice has an amount of overhead. First, each slice requires a header that consumes memory and processing resources as it increases the encoding size and decoding time required for each frame. Second, predictive ability is lost across slice boundaries. Typically, macroblocks benefit from other macroblocks within the same slice in that information from other macroblocks can be used as predictive information for another macroblock. A macroblock in one slice, however, can not benefit from predictive information based on a macroblock in another slice. As such, the greater the number of slices per frame, the greater the amount of predictive loss per frame.
"The overhead of a frame slice must be considered when determining the slice size and slice number of a frame. Dividing a frame into fewer and larger slices reduces slice overhead but causes a higher typical idle time in the threads/CPUs that decode the slices (as discussed below in relation to FIGS. 1A-B). Whereas dividing a frame into numerous smaller slices causes a lower typical idle time in the threads/CPUs that decode the slices but increases slice overhead.
"FIG. 1A is a timing diagram illustrating the time required to decode two large slices comprising a video frame. A first slice is decoded by a first thread/CPU and a second slice is decoded by a second thread/CPU. The first and second slices each comprise 50% of the frame. Note that although the first and second slices are of equal size (i.e., contain the same number of macroblocks), due to processing variations, the first and second slices will be decoded at different rates so that the times for completing the decoding of the first and second slices vary. This is true even if it is assumed that the first and second slices have identical content (although typically the first and second slices have different content) and the first and second slices are processed by identical CPUs. Processing variations are caused, for example, by operating system and the other applications that are concurrently running on the system and 'stealing' processing cycles of the CPUs.
"Typically, each slice in the previous frame must be decoded before decoding of a next frame in the sequence can begin. This is due to the decoding methods of video codecs that use predictive information derived from previous frames thereby requiring the decoding of an entire previous frame before beginning the decoding of the next frame. As stated above, the main thread (having the task of decoding an entire frame) and the thread pool synchronize after all the slices of a frame have been decoded.
"As such, a thread/CPU (referred to herein as an 'idling' thread/CPU) that finishes decoding all of the slices assigned to the thread/CPU before other threads/CPUs experiences 'idle time,' i.e., a period of time that it does not decode a slice. 'Idle time' of a thread/CPU exists when the last slice in a frame to be decoded is in the process of being decoded by another thread/CPU and there are no additional slices in the frame to be decoded. In other words, when a thread in the thread pool cannot find a task (because the task list is empty), in order to synchronize with the other threads, it has to wait for the other threads to complete their respective tasks. In general, all but one thread/CPU in a set of threads/CPUs available for processing slices of a frame (referred to herein as decoding threads/CPUs) will experience 'idle time.' For example, for a set of four threads/CPUs, three of the four threads/CPUs will experience 'idle time' during the processing of a frame. The only thread/CPU in the set of threads/CPUs that will not experience 'idle time' (i.e., will always be busy) is the last thread/CPU to finish processing of all slices of the frame assigned to the thread/CPU (referred to herein as the 'non-idling' thread/CPU). The 'non-idling' thread/CPU in the set of threads/CPUs is random and varies for each frame.
"The duration of the 'idle time' of a thread/CPU begins when the thread/CPU finishes decoding the last slice assigned to the thread/CPU and ends when the last slice in the frame is decoded by the 'non-idling' thread/CPU (and hence the thread/CPU can begin decoding a slice of the next frame of the sequence). As such, the idle time of a CPU is determined, in large part, on the size of the last slice being decoded by the 'non-idling' thread/CPU: typically, the larger the size of the last slice, the longer the idle time of the CPU.
"In the example of FIG. 1A, there are two threads/CPUs available for decoding slices and each frame is divided into two slices each comprising 50% of the frame. Dividing a frame into such large slices reduces the amount of slice overhead but causes a higher typical idle time in the threads/CPUs. As shown in FIG. 1A, the first thread/CPU completes decoding of the slice before the second thread/CPU and experiences an idle time of duration x. In the example of FIG. 1B, a frame is divided into ten smaller slices each comprising 10% of the frame. Dividing a frame into such smaller slices reduces the typical idle time in the threads/CPUs but increases the amount of slice overhead. As shown in FIG. 1A, the first thread/CPU completes decoding all slices assigned to it before the second thread/CPU and experiences an idle time of duration y, where y is less than x.
"As such, there is a need for a method for determining the slice size of a frame in a multithread environment that both reduces slice overhead and reduces the typical idle time of the threads/CPUs decoding the slices.
"Also, in decoding an image frame, a deblocking/loop filter is used to reduce the appearance of macroblock borders in the image frame. As discussed above, a popular video codec is H.264. Typically however, during the filtering stage of the deblocking filter, macroblocks are processed/filtered sequentially with strict dependencies specified under the H.264 codec and are not processed/filtered in parallel using multithreading."
Supplementing the background information on this patent, VerticalNews reporters also obtained the inventor's summary information for this patent: "A method for dynamically determining frame slice sizes for a video frame in a multithreaded decoding environment is provided. In some embodiments, a frame of a video sequence is encoded and later decoded in uneven sized slices where the frame is divided into at least two different types of slices based on size, a large-type slice and a small-type slice. In some embodiments, a large-type slice is at least one and a half times larger than a small-type slice. In some embodiments, a large-type slice is at least two times larger than a small-type slice. In some embodiments, the large-type slices in total comprise 70-90% of the frame and the small-type slices in total comprise the remaining 10-30% of the frame. In some embodiments, slices of the same type may be different in size so that two large-type slices may have different sizes and/or two small-type slices may have different sizes. In some embodiments, the number of large-type slices is equal to the number of threads/CPUs that are available to decode the slices of the frame.
"In some embodiments, the large-type slices comprise slices of the frame configured to be assigned for decoding first, whereas small-type slices comprise slices of the frame configured to be assigned for decoding after large-type slices. In some embodiments, the large-type slices comprise the first/beginning slices of the frame where the small-type slices comprise the remainder of frame so that the large-type slices are assigned to threads/CPUs for decoding before the small-type slices.
"In some embodiments, the macroblock dependencies specified under the H.264 codec are manipulated in a way to allow multithreaded deblock filtering/processing of a video frame. In some embodiments, a first thread processes a first section of the frame and a second thread processes a second section in parallel, where the first section comprises macroblocks of the frame on one side of a diagonal line and the second section comprises macroblocks on the other side of the diagonal line. In some embodiments, the diagonal line is a line extending from a first corner of a sub-frame to a second corner of the sub-frame, the sub-frame comprising at least some of the blocks of the frame. In some embodiments, each section comprises one or more sub-sections, each sub-section of a section having an associated processing order that is determined by the position of the sub-section in the frame. In some embodiments, the frame is a luma frame having associated chroma frames where the chroma frames are processed during idle time experienced by the first and/or second thread in processing the luma frame."
For the URL and additional information on this patent, see: Duvivier, Christian L.. Multithread Processing of Video Frames. U.S. Patent Number 8804849, filed
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC