Teaching Video to my Grandson

What is a pixel? “Pixel” is a made-up word, short for “picture element”. Displays (TVs, phones, etc.) are composed of scads of pixels arranged in rows and columns. The iPhone 3GS has 480 pixels in the wide direction. Under each one are 319 more pixels (in the narrow direction), making 320 rows in all. But what are they? Each pixel is composed of three tiny colored lights: a blue one, a red one and a green one. On some displays these lights are LEDs (light emitting diodes). Each light can be turned on in tiny increments from completely OFF to barely ON to fully bright ON. There are actually 256 different settings possible for each light. It takes one BYTE to store one of those 256 settings (a number between 0 and 255). It takes three BYTES to store the color information for a single pixel. It takes 3 x 480 x 320 = 460,800 BYTES to store the color information for a single picture on the iPhone 3GS display.
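If you like to check Grandpa's arithmetic, here is a small Python sketch of the pixel math above. The names (BYTES_PER_PIXEL, bytes_per_picture and so on) are ones I made up for this example; only the numbers come from the text.

```python
# Each colored light has 256 possible settings (0 to 255), which fits in one byte.
LEVELS_PER_LIGHT = 256
BYTES_PER_PIXEL = 3  # one byte each for red, green and blue

WIDTH = 480   # iPhone 3GS display, wide direction
HEIGHT = 320  # iPhone 3GS display, narrow direction


def bytes_per_picture(width, height):
    """Bytes needed to store the color of every pixel in one picture."""
    return BYTES_PER_PIXEL * width * height


picture_bytes = bytes_per_picture(WIDTH, HEIGHT)
print(picture_bytes)  # 460800

# Three lights with 256 settings each can mix this many different colors.
possible_colors = LEVELS_PER_LIGHT ** 3
print(possible_colors)  # 16777216
```

So one screenful is a bit less than half a million bytes, and each pixel can show any of about 16.7 million colors.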

What is video? Video is “moving pictures”. A series of pictures that changes many times a second will fool the eye into believing the image is moving. A movie in a theater uses 24 pictures per second. A television show (in the US) uses about 30 pictures per second. The iPhone 3GS video camera “takes” 30 pictures per second at 640 pixels x 480 pixels. A picture that is part of a video is called a FRAME. Each FRAME therefore contains 3 x 640 x 480 = 921,600 BYTES. A 30 FPS (frames per second) movie will use 30 x 3 x 640 x 480 BYTES each second, or 60 x 30 x 3 x 640 x 480 BYTES each minute. That’s a lot of BYTES: about 1,660 million bytes per minute, or ~100 billion bytes per hour. And that is only for what you SEE, not what you HEAR. The sound is stored separately, and the explanation for how that is done is a future topic. Fortunately sound takes much less space, so we can ignore it for now.
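Here is the same multiplication written as a short Python sketch, so you can see where the big numbers come from. The function name raw_video_bytes is my own invention for this example, and it only counts the picture data, not the sound.

```python
BYTES_PER_PIXEL = 3   # red, green and blue
WIDTH, HEIGHT = 640, 480
FPS = 30              # frames per second


def raw_video_bytes(seconds):
    """Bytes of un-shrunk picture data for a video of the given length."""
    bytes_per_frame = BYTES_PER_PIXEL * WIDTH * HEIGHT
    return bytes_per_frame * FPS * seconds


print(raw_video_bytes(1))     # 27648000     (one second)
print(raw_video_bytes(60))    # 1658880000   (one minute)
print(raw_video_bytes(3600))  # 99532800000  (one hour)
```

One hour comes out just under 100 billion bytes, which is several times more than a 16 GB iPhone can hold.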

But my iPhone only has 16 GB of storage. How can it store several hours of video? The computer chips in the iPhone shrink the movie while it is being recorded. And the savings are huge: the recording is almost 100 times smaller than the raw numbers would predict. How is this possible? This shrinking is called ENCODING. It follows several simple principles. For instance, your full name takes 14 letters to spell. The teacher puts your full name on your report card, but when she records your grades each week, she uses your initials (2 letters). By not writing the other 12 letters every time for you (and all the other kids in your class) she saves a lot of pencils over the school year. Encoding uses several techniques to “abbreviate” the movie data. When the movie is played (DECODING), the abbreviations are replaced with the full names, just like on your report card. Techniques like this are called entropy coding (one popular kind is named arithmetic coding), and they use a lot of tricky math. But that isn’t all. If you take a picture of the sky, there will be a lot of blue pixels, right? Let’s pretend you are a “pixel” in a picture and you’re wearing your school-blue sweatshirt. The kids (pixels) standing next to you are also wearing the same blue color that you are. What if we replace the RGB color with a short note that says “I’m the same color as the kid in front of me” (or to my left or right, and so on)? Assume this “note” takes fewer BYTES than your actual color data would. This technique of saving data is called “spatial compression” because information about the “space” (the kids near you) is used to reduce data.
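To make the “same color as the kid next to me” idea concrete, here is a toy Python sketch. It uses a very simple trick called run-length encoding: instead of writing the same color over and over, it writes the color once along with how many kids in a row are wearing it. Real video encoders use far more elaborate methods, so treat this only as an illustration. The function names and the sample colors are made up for the example.

```python
def encode_row(pixels):
    """Shrink a row of pixels into (color, how_many_in_a_row) pairs."""
    runs = []
    for color in pixels:
        if runs and runs[-1][0] == color:
            runs[-1][1] += 1          # same color as the kid to my left
        else:
            runs.append([color, 1])   # a new color starts here
    return [(color, count) for color, count in runs]


def decode_row(runs):
    """Un-shrink the pairs back into the full row of pixels."""
    pixels = []
    for color, count in runs:
        pixels.extend([color] * count)
    return pixels


SKY_BLUE = (90, 160, 255)   # made-up RGB values for the example
WHITE = (255, 255, 255)

row = [SKY_BLUE] * 6 + [WHITE] * 2 + [SKY_BLUE] * 4
runs = encode_row(row)

print(len(row), "pixels stored as", len(runs), "runs")  # 12 pixels stored as 3 runs
print(decode_row(runs) == row)                          # True
```

Twelve pixels shrink to three short notes, and decoding gives back exactly the row we started with.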

Now let’s think about time. If you take a movie of a fast car, the pixels in each frame will probably be different. But what if you take a movie of a person? The colors in his face don’t change from one second to the next. Take the previous example of kids standing around you and think about them in a high-rise building. On each floor of the building there are kids standing in rows and columns. Each kid represents a pixel in a frame. Each floor in the building is a different frame. A 30 story building is one second of iPhone video, displayed as the elevator shoots from the ground to the top floor at 30 frames per second. Your “color” might be the same as the “color” of the kid on the floor directly above or below you. A short note could say “use the color of the kid on the floor directly above me”. This would save space. In fact, it will save a lot of space if you think about how fast the elevator travels. This is called “temporal” compression. Temporal is a fancy word for “having to do with time”.
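Here is a toy Python sketch of the “same color as the kid on the next floor” idea. For each new frame it writes down only the pixels that changed since the frame before. This is a big simplification of what real encoders do, and the function names and color labels are ones I invented for the example.

```python
def encode_frame(previous, current):
    """Keep only the pixels that differ from the previous frame.

    Returns a dictionary of {pixel_position: new_color}.
    """
    return {
        position: new
        for position, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def decode_frame(previous, changes):
    """Rebuild the full frame from the previous frame plus the changes."""
    frame = list(previous)
    for position, color in changes.items():
        frame[position] = color
    return frame


# Two tiny 8-pixel "frames" of a face; only the person's eye blinks.
frame_1 = ["skin", "skin", "eye", "skin", "skin", "lips", "skin", "skin"]
frame_2 = ["skin", "skin", "lid", "skin", "skin", "lips", "skin", "skin"]

changes = encode_frame(frame_1, frame_2)
print(changes)                                    # {2: 'lid'}
print(decode_frame(frame_1, changes) == frame_2)  # True
```

Only one pixel out of eight had to be stored for the second frame. When nothing moves at all, the list of changes is empty.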

But wait, there’s more. When all the kids from all the classes are on the school ground, do you notice if one of them stayed home sick? Maybe you would notice if one of your friends was missing. But if hundreds of kids are on the field, you might not notice that a few kids you don’t know are missing. The same thing is true when your eye looks at an image that has millions of pixels. Some of them could be missing and you would never notice. The eye isn’t perfect when looking at lights and colors. Some colors are more noticeable than others. For instance, if there are a lot of kids wearing blue (your school color) jackets every day, then your eyes won’t notice if a few of them stayed home sick. If a few kids always wear green jackets, you would notice if one of them was absent. The engineers who invented video encoding programs use facts about how our eyes work to remove some of the color elements. This is called “sub-sampling”. Together these techniques reduce the amount of data required to store a movie by a factor of one hundred or more, allowing your iPhone to store several hours of video, and making a single movie small enough to send to YouTube, for instance. When you play an encoded movie, a reverse process takes place to put the video data back. This is called DECODING. When you play a movie, all of the abbreviations are replaced with full names, all of the little notes about the color of a kid near you in space or time are replaced with actual color values, and so on. This happens on a frame by frame basis. The frames are “shrunk” during encoding and “un-shrunk” just before they get sent to the display you are watching.
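Here is a toy Python sketch of sub-sampling. It assumes each pixel has already been split into a brightness number and a color label, which is roughly how real encoders prepare the data, although they use numbers for the color part too. The sketch keeps every brightness value but only every second color, and the decoder lets neighbors share a color. The function names and sample values are made up for the example.

```python
def subsample(row):
    """Keep every brightness value, but only every second color."""
    brightness = [b for b, _ in row]
    colors = [c for _, c in row[::2]]
    return brightness, colors


def rebuild(brightness, colors):
    """Give each pair of neighboring pixels the one color that was kept."""
    return [(b, colors[i // 2]) for i, b in enumerate(brightness)]


# (brightness, color) for four pixels in a row
row = [(200, "blue"), (198, "blue"), (120, "blue"), (119, "green")]

brightness, colors = subsample(row)
print(brightness)  # [200, 198, 120, 119]
print(colors)      # ['blue', 'blue']

print(rebuild(brightness, colors))
# [(200, 'blue'), (198, 'blue'), (120, 'blue'), (119, 'blue')]
```

Eight stored values became six, and the one green pixel came back blue. Unlike the abbreviations and the little notes, this trick throws a bit of information away for good, betting that your eye will not miss it.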

None of this was possible in a hand-held device when you were born. The computer chips weren’t fast enough. The storage chips weren’t big enough. And it would have cost a ton of money to make a high quality video camera that fits in your pocket. In ten more years, the quality will be higher and the device will be smaller and cheaper. When your kids are ten, they might take video movies of everything they see every second of every day. Their whole life might be recorded and wirelessly saved somewhere to watch in the future.
