Introduction
[0022] FIG. 1B illustrates an example of conventional encoding of multi-resolution video 121. An encoder 122 encodes the high resolution video to generate a high resolution stream 124 of bits representing encoded high resolution video frames. To accommodate older hardware not configured for high resolution video, or to reduce the bandwidth required for transmission during bandwidth congestion, the high resolution video is down-sampled, as indicated at 123, and the resulting down-sampled video 121' is encoded, e.g., by another encoder 122', to generate a stream of bits 124' representing encoded down-sampled video frames.
[0023] On the decoder side, illustrated in FIG. 1C, a decoder 132 receives the high resolution stream 124 and decodes it to generate high resolution output 126 in the form of decoded high resolution video frames. Devices not equipped to decode the high resolution stream may ignore the high resolution stream and receive and decode the down-sampled stream 124', e.g., using a different decoder 132'.
[0024] Aspects of the present disclosure allow for efficient video compression without utilizing extensions to a video coding standard. The approach described herein allows for high picture quality with lower bit usage compared to the existing method of encoding multiple resolutions of the same content as separate bitstreams. Instead of creating a separate bitstream at a higher resolution, the encoder creates an enhancement stream that uses fewer bits. The decoder creates output video for display by combining a lower resolution base stream and the enhancement stream. The extra processing required to generate the output video could be performed efficiently on a graphics processing unit (GPU). The proposed approach is particularly advantageous if the average time needed to generate a high-resolution frame using the proposed approach is not higher than the time needed to decode a frame from a separate high-resolution bitstream.
[0025] According to aspects of the present disclosure, the proposed approach uses a combination of up-sampling of low resolution video and enhancement information. When low resolution video is up-sampled to high resolution, some sharpness is lost and, as a result, the video looks blurred. "Enhancement information" can be combined with up-sampled low resolution video to produce a high quality image for display. The edge enhancement data captures information that is lost when up-sampling low resolution video to high resolution. The edge enhancement information is related to pixel values that correspond to edges within an image. The combination of up-sampling with edge enhancement eliminates the need to store a separate higher resolution video bitstream; instead, only an enhancement stream needs to be stored, which requires fewer bits and therefore much less storage space. This approach is particularly advantageous if the amount of data required for the low resolution video plus edge enhancement information before encoding is less than or equal to the amount of data required for the high resolution video before encoding. Such situations may arise in embodiments wherein high resolution video is down-sampled to low resolution video and edge enhancement data is created from the high resolution video data before down-sampling. In certain implementations, down-sampling may involve an integer down-sample, e.g., eliminating alternate pixels. An example of an integer down-sample is down-sampling from 4K (2160P) to 1080P.
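The integer down-sample mentioned above can be sketched in a few lines; the array below is a small stand-in for a real frame, and the 2x factor matches the 4K-to-1080P example:

```python
import numpy as np

# Hypothetical 8-bit luma plane at "high" resolution (a small stand-in
# for a 3840x2160 frame).
high = np.arange(8 * 8, dtype=np.uint8).reshape(8, 8)

# Integer 2x down-sample by eliminating alternate pixels, as in the
# 4K (2160P) -> 1080P example: keep every second row and column.
low = high[::2, ::2]

print(low.shape)  # (4, 4)
```

Each retained pixel is an exact copy of a source pixel; no filtering is applied, which is what makes this an integer down-sample rather than a resampling filter.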
[0026] In some implementations, the enhancement information may be encoded in an existing video format that a decoder would recognize and an existing encoder would know how to encode. By way of example and not by way of limitation, the enhancement information may be encoded in a format that can be decoded by the existing Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC) decoders found commonly in devices that support video playback, and does not require devices to add support for extensions to the standard. This solution could also be used to reduce the CPU and GPU load for decoding high resolution video bitstreams in devices that include multiple decoders (e.g., both hardware and software-based decoders).
[0027] The enhancement stream need not be stored in a video format; however, doing so works quite well with existing hardware. In alternative implementations the edge enhancement information could be encoded, e.g., using JPEG compression or any other arithmetic coding standard.
[0028] Up-sampling in conjunction with enhancement information as described herein could also be applied to still images, e.g., where JPEG encoding or some other image compression standard is used to compress both the base and the enhancement information.
[0029] Raw video is represented by luminance (intensity) and chrominance (color) values. Many encoders use fewer bits to store the chrominance data so that more bits may be spent on luminance data, as the human eye is more sensitive to differences in luminance than chrominance.
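The savings from spending fewer bits on chrominance can be illustrated with a small calculation; the 4:2:0 layout used below is a common example of chroma subsampling, not one mandated by the text:

```python
# Rough per-frame sample counts for an 8-bit 1920x1080 frame, comparing
# full-resolution chroma (4:4:4) with the common 4:2:0 subsampling in
# which each chroma plane is stored at half resolution in each dimension.
w, h = 1920, 1080
luma = w * h                          # one full-resolution luma plane
chroma_444 = 2 * w * h                # two full-resolution chroma planes
chroma_420 = 2 * (w // 2) * (h // 2)  # two quarter-resolution chroma planes

samples_444 = luma + chroma_444
samples_420 = luma + chroma_420
print(samples_420 / samples_444)      # 0.5: half the raw samples
```

Halving the raw sample count before encoding is how the bit budget is shifted toward luminance, to which the human eye is more sensitive.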
[0030] In certain implementations, an enhancement information generation algorithm may analyze the images in video data to find edges within the image. Edge enhancement data may be determined by comparing an up-sampled version of a low resolution base image to the corresponding original high resolution image and determining the difference between the images. Up-sampling the low resolution image may use a standard algorithm, e.g., bilinear (fastest and lowest quality) or bicubic (better quality but slower). In certain embodiments, this comparison may be performed by the GPU. In alternative embodiments, this comparison may be performed by a CPU. In some cases, there is no edge enhancement information for a frame because there is no significant difference between the high resolution video and the up-sampled low resolution video. When such a scenario occurs in situations involving high resolution streaming, those frames for which there is not a significant difference could be encoded as original high resolution frames without edge enhancement information. In alternative embodiments, edge enhancement information may be determined as provided below and subsequently compared to a threshold; the result of such a comparison can then be used to determine whether or not to down-sample the frame before encoding.
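The frame-level comparison above can be sketched as follows, with a simple nearest-neighbour up-sampler standing in for the bilinear or bicubic filters named in the text, and an illustrative threshold:

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x up-sample; a stand-in for the bilinear or
    bicubic interpolation the text describes."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

# Hypothetical high-resolution luma plane and its 2x down-sampled base.
high = np.zeros((8, 8), dtype=np.int16)
high[3, 3] = 200               # a sharp detail that down-sampling loses
low = high[::2, ::2]

# Difference between the original and the up-sampled base image.
diff = np.abs(high - upsample2x(low))

# If no pixel differs significantly, the frame needs no enhancement data.
THRESHOLD = 16                 # illustrative value, not from the source
needs_enhancement = bool((diff > THRESHOLD).any())
print(needs_enhancement)       # True: the detail at (3, 3) was lost
```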
[0031] By way of example, and not by way of limitation, the enhancement information may be generated by determining a difference in pixel values (e.g., chroma values or luma values or both) between the up-sampled low resolution image and the original high resolution image and adding a midpoint pixel value (e.g., 128 for 8-bit). According to certain aspects of the present disclosure, the enhancement information may be created in such a way as to minimize the arithmetic difference between the input frame and an up-sampled version of the down-sampled frame. As used herein, the term "difference" refers to a difference in the mathematical sense, including but not limited to arithmetic difference (i.e., the result of a subtraction). Determining the difference may include other mathematical operations on the pixel values prior to subtraction, such as squaring, taking a square root, or multiplying by a scaling factor. Determining the difference may also include mathematical operations on the result of a subtraction. For example, in some implementations negative values resulting from a subtraction may be set to zero, and any values that exceed the maximum value for the number of bits may be set to the maximum value (e.g., for 8-bit pixels, values greater than 255 would be set to 255). Additionally, the same number of bits could be utilized to represent each lower resolution pixel, but fewer bits could be used to represent the edge enhancement data, as a large number of bits might not be needed to represent a small difference. By way of example, and not by way of limitation, a calculated 16-bit difference may have its value reduced to an 8-bit representation. Other examples of generating edge enhancement information include feature and edge detection methods such as the Sobel operator or the Roberts cross operator.
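The midpoint-offset difference with clamping described above might be sketched as follows; the function name and sample values are illustrative:

```python
import numpy as np

def make_enhancement(high, upsampled, midpoint=128):
    """Per-pixel enhancement: (high - upsampled) + midpoint, clamped to
    the 8-bit range, so that 'no difference' encodes as flat 128."""
    diff = high.astype(np.int16) - upsampled.astype(np.int16) + midpoint
    return np.clip(diff, 0, 255).astype(np.uint8)

high = np.array([[100, 130], [128, 255]], dtype=np.uint8)
up   = np.array([[100, 120], [200, 0]], dtype=np.uint8)

enh = make_enhancement(high, up)
print(enh)  # [[128 138], [ 56 255]] -- the 383 overflow clamps to 255
```

Because pixels with no difference encode as a flat 128, regions away from edges compress to very few bits in the enhancement stream.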
[0032] In certain implementations, the difference in the luminance values may be determined without regard for chrominance information that is lost when the up-sampling of the low resolution video is carried out. This frees up computational and memory resources, as no additional chrominance data is saved in such a process. It also increases the efficiency of the encoding process, as computing the differences for the chrominance values is no longer required. By way of example, and not by way of limitation, some of the luminance information may be stored in the chrominance channels in order to manipulate an encoder into encoding some of the luminance information as chrominance information while the rest of the luminance information remains luminance information. On the decoder side, the luminance information stored as chrominance information is moved back to the luminance information and the remaining chrominance information is ignored. Alternative embodiments allow for encoding the chrominance as a flat grey.
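The flat-grey-chroma alternative can be sketched as follows; the plane sizes assume a hypothetical 4:2:0 layout:

```python
import numpy as np

# Sketch of the "flat grey chroma" alternative: the enhancement frame
# carries only a luma plane, and both chroma planes of a hypothetical
# YUV 4:2:0 frame are filled with the neutral value 128 so the encoder
# spends almost no bits on them.
h, w = 8, 8
enh_luma = np.full((h, w), 128, dtype=np.uint8)  # luma enhancement plane
enh_luma[3, 3] = 200                             # an edge sample

u_plane = np.full((h // 2, w // 2), 128, dtype=np.uint8)  # flat grey
v_plane = np.full((h // 2, w // 2), 128, dtype=np.uint8)  # flat grey

# The decoder applies only the luma plane and ignores the chroma planes.
print(u_plane.min() == u_plane.max() == 128)  # True: chroma is flat
```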
[0033] In certain alternative embodiments, a filtering stage is added to make the edge enhancement information more suitable for video compression, e.g., by removing noisy pixels. Noisy pixels are, for example, isolated pixels that are of a much different value than surrounding pixels.
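One possible form of such a filter, assuming an illustrative neighbourhood test and threshold (neither is specified by the text):

```python
import numpy as np

def suppress_isolated_pixels(enh, midpoint=128, threshold=32):
    """Reset pixels that differ strongly from the midpoint while all of
    their neighbours stay near the midpoint -- isolated noise that
    costs bits but adds no visible edge. Values are illustrative."""
    out = enh.copy()
    h, w = enh.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            block = enh[y - 1:y + 2, x - 1:x + 2].astype(np.int16)
            centre = block[1, 1]
            neighbours = np.delete(block.ravel(), 4)  # drop the centre
            if (abs(centre - midpoint) > threshold
                    and np.all(np.abs(neighbours - midpoint) <= threshold)):
                out[y, x] = midpoint
    return out

enh = np.full((5, 5), 128, dtype=np.uint8)
enh[2, 2] = 250                 # isolated noisy pixel
clean = suppress_isolated_pixels(enh)
print(clean[2, 2])              # 128: the noisy pixel was removed
```

Genuine edges survive this test because their neighbouring pixels also deviate from the midpoint.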
[0034] In certain implementations, decoding performed on low resolution hardware may involve decoding the low resolution video and taking no action with the enhancement data. In alternative embodiments of the present invention, decoding performed on high resolution hardware may involve decoding the low resolution video and the enhancement data and performing the inverse of the comparison that generated the enhancement data, resulting in reconstituted high resolution video. The inverse comparison may be performed on either the GPU or the CPU. Up-sampling may use bilinear or bicubic interpolation, matching the algorithm that was used to generate the edge enhancement information.
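The decoder-side reconstruction might be sketched as the inverse of the midpoint-offset difference; a nearest-neighbour up-sampler stands in here for the matching bilinear or bicubic filter the text calls for:

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x up-sample; in practice this should match
    the filter used when the enhancement data was generated."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def reconstruct(low, enh, midpoint=128):
    """Inverse of the enhancement step: up-sample the base frame, add
    back (enhancement - midpoint), and clamp to the 8-bit range."""
    up = upsample2x(low).astype(np.int16)
    out = up + enh.astype(np.int16) - midpoint
    return np.clip(out, 0, 255).astype(np.uint8)

low = np.full((2, 2), 100, dtype=np.uint8)    # decoded base frame
enh = np.full((4, 4), 128, dtype=np.uint8)    # decoded enhancement frame
enh[1, 1] = 228                               # a restored edge detail (+100)

recon = reconstruct(low, enh)
print(recon[1, 1], recon[0, 0])               # 200 100
```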
[0035] In alternative embodiments of the present invention, two decoders are utilized. A first decoder may decode low resolution video, and a second decoder may decode the edge enhancement data. In certain embodiments, a hardware decoder may be used for the video and a software decoder may be used for the enhancement data. In alternative embodiments, two instances of a software decoder or two hardware decoders may be utilized. Furthermore, certain alternative embodiments may apply the encoding/decoding processes, methods, and devices described above to audio data.
[0036] In other alternative implementations, only one decoder might be utilized. In such cases encoding may be implemented with only one encoder, and the enhancement data may be encoded into the same bitstream as the encoded base video data. The encoded base video data does not reference any frames containing the enhancement data, and can be decoded independently, without decoding the enhancement data. By way of example, the slice headers could be used to determine whether data being decoded corresponds to the base video or the enhancement video, and if the enhancement data is not required, the rest of decoding may be skipped for that frame. In lower powered hardware, only the base video pictures are decoded. In higher powered hardware, all frames are decoded, and the final high resolution frame is reconstructed from the decoded base video and enhancement data.
[0037] The above-described processes, methods, and devices may alternatively be used to compress high resolution video for storage, as down-sampling high resolution video and storing the down-sampled video with corresponding edge enhancement data may require less storage space than simply storing the high resolution video.
[0038] In certain implementations, the edge enhancement algorithm may be used to determine if any frame has enough detail to be sent at high resolution (e.g., 4K resolution) and, if so, the frame may be encoded at high resolution without down-sampling. By way of example, and not by way of limitation, determining whether a frame has enough detail to be sent at high resolution may use metrics such as variance or a count of the total number of pixels that are not equal to some reference value, e.g., 128, and use thresholds established from empirical data to determine if the enhancement information is significant. If it is determined that the frame does not have enough detail to be sent at 4K resolution, the frame may be down-scaled to a lower resolution (e.g., 1080P) and encoded as a restructured frame containing low resolution pixels surrounded by pixels of uniform chroma and luma values, e.g., flat grey, along with parameters to indicate the section of the frame that contains the lower resolution data. On the decoder side, these restructured frames may be decoded by using the parameters to extract the down-scaled frame and then up-sampling the extracted down-scaled frame. Alternatively, if it is determined that sending the frame at high resolution (e.g., 4K resolution) is inefficient, the frame may be down-scaled to a lower resolution (e.g., 1080P), followed by generation of enhancement information and encoding of base frames and enhancement information. In such implementations, a decoder can up-sample the low resolution frames, combine them with enhancement information, and handle the 4K frames normally. By selectively sending some frames as low resolution, these embodiments are capable of reducing the bit stream size.
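The significance test described above might be sketched with the variance and pixel-count metrics; the threshold values below are illustrative placeholders for the empirically established ones the text calls for:

```python
import numpy as np

def enhancement_is_significant(enh, midpoint=128,
                               count_threshold=10, var_threshold=4.0):
    """Decide whether a frame's enhancement data carries enough detail
    to justify sending the frame at high resolution. Thresholds are
    illustrative; the text suggests establishing them empirically."""
    nonflat = int(np.count_nonzero(enh != midpoint))
    variance = float(np.var(enh.astype(np.float64)))
    return nonflat > count_threshold or variance > var_threshold

flat = np.full((16, 16), 128, dtype=np.uint8)  # no recoverable detail
busy = flat.copy()
busy[::2, ::2] = 180                           # many non-midpoint pixels

print(enhancement_is_significant(flat))  # False: down-scale this frame
print(enhancement_is_significant(busy))  # True: send at high resolution
```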