Introduction to Video Storage

Digital video is made up of many "frames," where each frame is essentially a photograph, displayed in rapid succession.  In addition, there are usually one or more audio tracks synchronized with the video, and possibly subtitle tracks as well.  There are a nearly innumerable number of ways to format, store, and transmit video.  Here we will break it down into its different aspects and components and cover some of the basics behind how it works.

Aspect Ratio and Resolution

Video typically comes in two different aspect ratios: 4:3 (the same as typical photographs) and 16:9 (or, in theaters, roughly 17:9).  4:3 was the common standard for television, but it has been replaced by HDTV, which uses the 16:9 aspect ratio.  The wider 16:9 ratio is generally considered better suited to human vision: because our eyes are set side by side, our field of view is much wider than it is tall, and the wider aspect ratio does a better job of filling that natural field of view.

Here are some of the common resolutions used for video and some of the names that go with them.

16:9 formats - HD TV and Internet Video
1920x1080 - 1080p, 1080i, hd1080
1280x720 - 720p, hd720
1024x576 - 576p, WSVGA
854x480 - 480p, hd480
640x360 - 360p
426x240 - 240p

4:3 formats - old TV and Internet Video
768x576 - 576i, PAL
640x480 - VGA, 480i, NTSC
320x240 - QVGA

17:9 formats - specifically for movies in theaters and film editing
4096x2160 - 4K
2048x1080 - 2K
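
Notice that a few of these resolutions only approximate their nominal ratio.  A quick way to check is to divide the width and height by their greatest common divisor.  Here is a minimal sketch in Python (the function name is just for illustration):

    from math import gcd

    def aspect_ratio(width, height):
        """Reduce a resolution to its simplest ratio, e.g. 1920x1080 -> 16:9."""
        d = gcd(width, height)
        return (width // d, height // d)

    # A few of the resolutions from the lists above:
    print(aspect_ratio(1920, 1080))  # (16, 9)
    print(aspect_ratio(640, 480))    # (4, 3)
    print(aspect_ratio(854, 480))    # (427, 240) -- only approximately 16:9
    print(aspect_ratio(4096, 2160))  # (256, 135) -- only approximately 17:9

This is also why the theatrical formats are described as "17:9": 256:135 is close to, but not exactly, 17:9.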

http://en.wikipedia.org/wiki/Aspect_ratio_(image)
http://es.wikipedia.org/wiki/Relación_de_aspecto

Interlacing

You may notice that many of the numeric names for these formats have a p or an i after the number of lines of resolution.  This indicates whether the video is progressive scan or interlaced scan.  Old, analog televisions used interlaced scan to cut in half the amount of data they needed to send.  This is commonly called interlacing, and it works by alternating which lines are broadcast: one field carries all the odd-numbered lines, then the next field carries all the even-numbered lines.  If this alternation happens fast enough, it tricks the human eye into seeing full frames.  With one single exception (1080i), all the modern video formats are progressive scan.  The 1080i format was really just a marketing gimmick to tell people they were getting 1080-line video on their TVs without having to process full 1080p video.  In reality, 720p video will look just as good as, if not better than, 1080i video, especially in high-motion action scenes.  When creating videos, interlacing should only be used if the video is primarily targeting analog broadcast television.  Digital video should always be progressive scan (it also compresses better).
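
To make the idea concrete, here is a minimal sketch (assuming each frame is a NumPy array with one row per scan line; the function name is our own) of how an interlaced frame separates into its two fields:

    import numpy as np

    def split_fields(frame):
        """Split an interlaced frame into its two fields: the even-numbered
        lines (0, 2, 4, ...) and the odd-numbered lines (1, 3, 5, ...)."""
        return frame[0::2], frame[1::2]

    # A toy 4x4 "frame" where every pixel holds its own line number:
    frame = np.repeat(np.arange(4), 4).reshape(4, 4)
    even, odd = split_fields(frame)
    print(even)  # lines 0 and 2
    print(odd)   # lines 1 and 3

A deinterlacer has to do the reverse: weave or interpolate the two fields back into full frames, which is part of why interlaced video is harder to process and compress.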

http://en.wikipedia.org/wiki/Interlaced_video
http://es.wikipedia.org/wiki/Exploración_entrelazada

Frame Rates

The number of frames shown per second, also known as the "frame rate," is another variable when creating video.  Typically in this course we will be using 30fps (frames per second), unless the source material is something different.

The faster the frame rate, the smoother the video will appear.  For a somewhat extreme example of the difference frame rate makes, watch these two examples of the same clip at 30fps and 5fps.

Here are some common frame rates in use, in frames per second:
23.976 - NTSC film (24/1.001)
24 - typical for film
25 - PAL television/film
29.97 - NTSC television (30/1.001)
30 - digital video
48 - high-speed film
60 - digital video
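
The odd-looking 23.976 and 29.97 figures are not arbitrary: they are the exact ratios 24000/1001 and 30000/1001, a leftover from how NTSC color broadcasting was engineered.  Here is a minimal sketch (using Python's exact fractions; the names are just for illustration) of why that tiny difference adds up over a long clip:

    from fractions import Fraction

    NTSC_TV   = Fraction(30000, 1001)  # ~29.97 fps
    NTSC_FILM = Fraction(24000, 1001)  # ~23.976 fps

    def frames_for_duration(seconds, rate):
        """Number of whole frames in a clip of the given length."""
        return int(seconds * rate)

    # One hour of video:
    print(frames_for_duration(3600, NTSC_TV))       # 107892 frames
    print(frames_for_duration(3600, Fraction(30)))  # 108000 frames

Over an hour, true 30fps accumulates 108 more frames than 29.97fps, which is why tools that assume the wrong rate slowly drift out of sync with the audio.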

http://en.wikipedia.org/wiki/Frame_rate
http://es.wikipedia.org/wiki/Imágenes_por_segundo

Pixel Formats

There are a lot of different formats used to store the pixels (also referred to as the color space), and it gets very complicated, so we're not going to go too far into it.  As a quick overview: when we worked on the photo unit, we assumed that everything was represented in terms of its RGB values (8 bits each, for a total of 24 bits).  That format is also common in video, but there are many more.  Most are defined in terms of YUV, which uses a luminance (brightness and darkness) channel and two chroma (color) channels; together these can roughly reproduce all the colors of 24-bit RGB.  However, when the human eye looks at an image, it mostly responds to the luminance, and although color is important, it doesn't take much color information to make an image seem fully colored.  Because the color information isn't as important to the eye, some of these formats discard part of it to save space.  Common formats you will see are YUV4:2:2 (16 bits per pixel), YUV4:1:1 (12 bits), and YUV4:2:0 (12 bits).  Don't worry about what the numbers mean, just be aware that they are out there.
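
That said, if you are curious where the 16- and 12-bit figures come from: the J:a:b notation describes a J-pixel-wide, two-line block, with J luma samples per row, 'a' chroma samples (for Cb and Cr each) in the first row, and 'b' additional chroma samples in the second row.  A minimal sketch of the arithmetic (the function name is our own):

    def bits_per_pixel(j, a, b, bits_per_sample=8):
        """Average bits per pixel for a J:a:b chroma subsampling scheme."""
        luma = 2 * j          # one luma sample per pixel, across two rows
        chroma = 2 * (a + b)  # Cb and Cr samples in the block
        return bits_per_sample * (luma + chroma) / (2 * j)

    print(bits_per_pixel(4, 4, 4))  # 24.0 -- no subsampling, same as 24-bit RGB
    print(bits_per_pixel(4, 2, 2))  # 16.0 -- YUV4:2:2
    print(bits_per_pixel(4, 1, 1))  # 12.0 -- YUV4:1:1
    print(bits_per_pixel(4, 2, 0))  # 12.0 -- YUV4:2:0

Halving or quartering the chroma information cuts 24 bits per pixel down to 16 or 12 with little visible difference, which is exactly the space savings these formats are after.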

http://en.wikipedia.org/wiki/YUV
http://es.wikipedia.org/wiki/YUV