Image Stabilization, the basics
Image stabilization for video can be divided into two categories: optical image stabilization, where the lens or the sensor is physically moved to counteract movements of the device, and electronic (or digital) image stabilization, where the recorded frames are moved and morphed to align with one another. The latter can be done in real time if the device movements are known, or afterwards in editing software, by analyzing and aligning blocks of pixels across frames.
First, it helps to know that two kinds of movements cause shake in a video. The first is lateral movement, where the recording device is moved sideways, up and down, or forwards and backwards. This is only a problem if the subject is very close by; for objects further away, lateral movements are hardly visible in the video. Lateral movements are also impossible to fully compensate for with image stabilization, because the vantage point of the camera changes: objects nearby move a lot, while objects in the background move far less.
The second type of movement, which has a far greater impact on the shake observed in a video, is rotational (or angular) movement. Shakes as small as 0.1 degree are already visible in an unstabilized video. However, because the vantage point of the camera does not change during a rotation, objects in the foreground and background are moved and warped by the same amount, so these rotational movements can be compensated for.
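To get a feel for the numbers, a simple pinhole-camera calculation shows how visible such a small rotation is. The frame width and field of view below are assumed values for illustration, not those of a specific device:

```swift
import Foundation

// Rough sketch: how many pixels does a small yaw rotation shift the image?
// Assumed values: a 1920-pixel-wide frame and a 60-degree horizontal field of view.
let frameWidthPx = 1920.0
let horizontalFOVDeg = 60.0

// Focal length expressed in pixels for a simple pinhole model.
let focalPx = (frameWidthPx / 2.0) / tan(horizontalFOVDeg * .pi / 360.0)

// A 0.1-degree yaw shake, as mentioned above.
let yawDeg = 0.1
let shiftPx = focalPx * tan(yawDeg * .pi / 180.0)

print(String(format: "A %.1f degree yaw shifts the image by about %.1f pixels", yawDeg, shiftPx))
// About 2.9 pixels for these numbers, which is clearly visible as jitter.
```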
Choosing the device
The ZeroShake stabilization technology can be applied to both optical and electronic image stabilization. A smartphone, however, lacks the space for a decent optical system. To create the ZeroShake app, the search began for a device with the following specifications:
- An on-board accelerometer, gyroscope and magnetometer, to capture the movements of the device accurately.
- A good-quality camera, capable of recording in at least HD resolution.
- A fast GPU and CPU, to process the frames in real time and run the motion filter.
- A solid development platform.
- A place at or near the top in units sold, since a stabilization profile has to be created for the device’s camera(s), which requires many experiments.
It was decided to develop the first version of the ZeroShake application for the iPhone 4S, as at that time it was one of the few devices meeting all of these requirements. During development, the iPhone 5 was released, and new stabilization profiles were subsequently created for that model as well.
Capturing motion data
An important part of adequately stabilizing a recording is obtaining accurate motion and orientation data. The iOS platform offers the Core Motion framework, which uses a sensor fusion filter to combine the sampled data from the accelerometer, gyroscope and magnetometer into a stable orientation value for the device, relative to different reference frames and expressed as Euler angles, a rotation matrix or a quaternion. The problem with this filter is that it is affected not only by rotational movements, but also by lateral ones: quick lateral movements influence the rotational output of the Core Motion filter.
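For reference, this is roughly what reading the fused attitude from Core Motion looks like; the reference frame and update interval below are illustrative choices. It is this output that turned out to be contaminated by quick lateral movements:

```swift
import CoreMotion

// Minimal sketch of Core Motion's fused attitude, not ZeroShake's own filter.
let motionManager = CMMotionManager()
motionManager.deviceMotionUpdateInterval = 1.0 / 100.0  // sample at roughly 100 Hz

motionManager.startDeviceMotionUpdates(using: .xMagneticNorthZVertical,
                                       to: .main) { motion, error in
    guard let motion = motion else { return }
    // Fused orientation from accelerometer, gyroscope and magnetometer,
    // available as Euler angles, a rotation matrix or a quaternion.
    let q = motion.attitude.quaternion
    print("attitude quaternion:", q.w, q.x, q.y, q.z, "at", motion.timestamp)
}
```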
For our purpose of image stabilization this was not acceptable, so another solution had to be found. Sampling rotational motion from the gyroscope alone did not work either, because gyroscopes have a bias, which results in drift when the data is integrated into an orientation. This is of no concern to other stabilization techniques, where the orientation of the device does not need to be known: those systems just stabilize the fast rotations and do not keep track of where the device is pointed. In the case of ZeroShake it is a concern, because the user is shown the device’s orientation and all motion within set limits has to be compensated for. A stable orientation (or attitude) of the device is required to make ZeroShake work, and it also enables future features such as smooth panning and pre-programmed panning.
So a custom-designed complementary motion filter was developed: one that provides accurate data about the device’s orientation, with minimal drift, and without being corrupted by lateral movements.
The motion data is described as a quaternion, with the center of the ZeroShake target area as the reference. This allows the device to be rotated into any position without the axes getting mixed up (as happens with Euler angles near gimbal lock). Because the reference point is known, accurate stabilization can be applied to all three rotational axes at the same time.
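The exact filter is ZeroShake’s own, but a minimal complementary-filter sketch in Swift (using the simd framework) shows the general idea. The gain, axis conventions and reference handling below are illustrative assumptions, not the app’s actual values:

```swift
import simd

/// Illustrative complementary filter: fast gyroscope integration, gently
/// corrected toward the accelerometer's gravity direction so the estimate
/// does not drift, expressed as a quaternion relative to a reference.
struct ComplementaryFilter {
    private(set) var orientation = simd_quatd(ix: 0, iy: 0, iz: 0, r: 1)
    private var reference = simd_quatd(ix: 0, iy: 0, iz: 0, r: 1)
    let accelGain = 0.02   // small gain: quick lateral jolts barely disturb the estimate

    mutating func update(gyro: simd_double3, accel: simd_double3, dt: Double) {
        // 1. Integrate the gyroscope rate (rad/s) over the sample interval.
        let angle = simd_length(gyro) * dt
        if angle > 0 {
            orientation = simd_normalize(orientation * simd_quatd(angle: angle, axis: simd_normalize(gyro)))
        }
        // 2. Tilt correction: rotate the measured gravity direction into the
        //    world frame and nudge it back toward world "up", which removes
        //    the slow gyro drift in pitch and roll.
        let up = simd_double3(0, 0, 1)
        let measuredUp = orientation.act(simd_normalize(accel))
        let correction = simd_quatd(from: measuredUp, to: up)
        let identity = simd_quatd(ix: 0, iy: 0, iz: 0, r: 1)
        orientation = simd_normalize(simd_slerp(identity, correction, accelGain) * orientation)
    }

    /// Lock the current orientation as the reference (the ZeroShake target area center).
    mutating func resetReference() { reference = orientation }

    /// Rotation from the reference to the current orientation: the quantity the stabilizer undoes.
    var deviationFromReference: simd_quatd { reference.inverse * orientation }
}
```

The small accelerometer gain is what keeps quick lateral jolts from corrupting the orientation estimate while still pulling the slow gyroscope drift back to zero; correcting yaw drift would additionally need the magnetometer.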
Stabilizing the image frames
A complex algorithm is used to calculate when and how much stabilization is required, and when the reference point needs to be reset. OpenGL and the GPUImage framework are used to move and morph the frames. GPUImage allows for much faster image processing by running it directly on the GPU. This makes it possible not only to show the ZeroShake UI markers on the screen in real time, but even to display a fully stabilized preview.
There are several steps to be taken in order to match a frame to the reference point. The most obvious is rotation of the entire image: rotate the device clockwise, and the image needs to be rotated back as a whole to counteract that movement. Similarly, rotational motions to the side or up and down can be compensated for by translating the image: if the camera is pointed to the right, the recorded object ends up to the left, so moving (translating) the frame to the right brings it back on reference. Although the vantage point of the camera does not change with rotational movements, the perspective does. Most image stabilization technologies do not bother with this, because they only stabilize small, quick movements. ZeroShake allows for much greater stabilization, including slow movements, and there the perspective change becomes obvious if it is not corrected. Because the device orientation is known, a term can simply be added to the rotation matrix to take the perspective into account. In order to accurately calculate the required translation and perspective warp, the exact field of view of the camera and the resolution of the recording must be known.
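All three corrections (the roll of the whole image, the translation, and the perspective term) can be folded into a single 3x3 warp built from the measured rotation, the field of view and the resolution. The sketch below uses a basic pinhole-camera model with illustrative parameter names; it is not ZeroShake’s internal implementation:

```swift
import Foundation
import simd

/// For a pure camera rotation, the frame can be realigned with a single
/// homography H = K * R * K⁻¹, where K is built from the field of view and
/// the recording resolution, and R undoes the measured rotation.
func stabilizingHomography(deviation: simd_quatd,
                           imageWidth: Double, imageHeight: Double,
                           horizontalFOVDeg: Double) -> simd_double3x3 {
    // Intrinsic matrix K from the field of view and the resolution (pinhole model).
    let f = (imageWidth / 2.0) / tan(horizontalFOVDeg * .pi / 360.0)
    let K = simd_double3x3(columns: (
        simd_double3(f, 0, 0),
        simd_double3(0, f, 0),
        simd_double3(imageWidth / 2, imageHeight / 2, 1)))

    // Rotation matrix that undoes the measured deviation from the reference.
    let R = simd_double3x3(deviation.inverse)

    return K * R * K.inverse
}
```

The third row of this matrix is what carries the perspective correction mentioned above, which is why the exact field of view and resolution have to be known.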
The rolling shutter effect
Most basic digital cameras and all smartphones today have a CMOS image sensor. This image sensor is small, cheap, and provides a good-quality image. However, unlike a CCD sensor, which captures the entire frame in one instant, a CMOS sensor captures the image row by row: it has a so-called rolling shutter. The rolling shutter effect makes it impossible to match the motion captured at a single point in time with all areas of the frame. Instead, each row in the frame should be compensated according to the precise movements of the device that occurred during the capture of that particular row. Therefore a mesh is created according to which the image is morphed. A delay between the motion data and the bottom row of the frame is applied to align the two, and separate delays are used for the other rows, depending on the speed of the rolling shutter of the sensor.
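A sketch of that per-row timing is shown below. The readout duration is an assumed, sensor-dependent value that in practice would come from the device’s stabilization profile; the bottom row is taken as the frame’s reference time, as described above:

```swift
import Foundation

// Rolling-shutter timing sketch: each row was captured a little earlier than
// the bottom row, spread evenly over the sensor's readout duration.
func rowCaptureTimes(bottomRowTime: TimeInterval,
                     readoutDuration: TimeInterval,  // assumed, e.g. on the order of 30 ms
                     rowCount: Int) -> [TimeInterval] {
    (0..<rowCount).map { row in
        // The top row was read out the whole readout duration before the bottom row.
        let fractionFromBottom = Double(rowCount - 1 - row) / Double(rowCount - 1)
        return bottomRowTime - fractionFromBottom * readoutDuration
    }
}
```

Each row (in practice, each band of rows) then gets the orientation belonging to its own capture time, and the resulting per-row corrections form the mesh the frame is morphed with.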
Matching the timing of motion data and image data
It is of course very important to use motion data that represents the device movements at the exact time a row of the image was captured. That is why our complementary motion filter runs at a very high frequency and pushes the sensors and CPU to deliver as many samples per second as possible. The two motion samples just before and after the actual capture of a frame row are found, and using spherical linear interpolation (slerp) the most probable orientation at that moment is calculated.
The discrepancy between motion time and frame capture time is further influenced by the exposure time of the frame, so an algorithm was designed to take this into account, continuously adapting the applied delay to the ambient light.
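A sketch of this lookup, covering both the slerp between the bracketing samples and a simple exposure adjustment, could look as follows. The sample type, the exposure handling and the helper names are illustrative; only simd_slerp itself is a standard call:

```swift
import Foundation
import simd

/// Illustrative motion sample as produced by a high-frequency filter.
struct MotionSample {
    let time: TimeInterval
    let orientation: simd_quatd
}

/// Estimate the device orientation at the moment a given image row was captured.
func orientation(at rowCaptureTime: TimeInterval,
                 exposureDuration: TimeInterval,
                 in samples: [MotionSample]) -> simd_quatd {
    // Shift the query to the middle of the row's exposure window, since light
    // is gathered over the whole exposure rather than at a single instant.
    let t = rowCaptureTime - exposureDuration / 2.0

    // Find the two motion samples that bracket the capture time.
    guard let upper = samples.firstIndex(where: { $0.time >= t }) else {
        return samples.last?.orientation ?? simd_quatd(ix: 0, iy: 0, iz: 0, r: 1)
    }
    guard upper > 0 else { return samples[upper].orientation }
    let a = samples[upper - 1], b = samples[upper]

    // Spherical linear interpolation between the bracketing orientations.
    let fraction = (t - a.time) / (b.time - a.time)
    return simd_slerp(a.orientation, b.orientation, fraction)
}
```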
Because it is impossible to capture future device movements, a small delay, hardly noticeable to the user, is built into the preview. It gives the system time to match frames and motion data as described above, while still showing a fully stabilized image on the screen.
Processing of the recording
Because frames are moved, warped and morphed during the stabilization process, a stabilized video would otherwise show black edges and corners. To prevent this, the frames are cropped, which makes the image appear slightly zoomed in. Cropping is kept to a minimum: because the maximum angles of allowed shake and sway are fixed, the precise maximum amount of cropping that is ever needed can be calculated.
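As a worked example with assumed numbers (not the app’s actual field of view or shake limits), the fixed crop factor follows directly from those angles:

```swift
import Foundation

// Sketch: how much to crop (zoom) so the maximum allowed shake never exposes
// black borders. Both angles below are assumed values for illustration.
let halfFOVDeg = 30.0      // half of an assumed 60-degree horizontal field of view
let maxShakeDeg = 3.0      // assumed maximum compensated shake angle

let halfFOV = halfFOVDeg * .pi / 180.0
let maxShake = maxShakeDeg * .pi / 180.0

// Compensating a shake of maxShake leaves only the content within
// (halfFOV - maxShake) usable, so scaling by this factor keeps the
// cropped frame filled at all times.
let zoomFactor = tan(halfFOV) / tan(halfFOV - maxShake)
print(String(format: "Fixed crop/zoom factor: %.3f", zoomFactor))  // about 1.13 for these numbers
```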
For faster processing of after-effect filters and for sharing videos over the internet, the resolution of the video is reduced. For videos saved to the device’s Camera Roll, whose resolution is not reduced, we managed to shrink the file size of HD and Full HD footage by a factor of 3 without a significant loss in image quality.
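As an illustration of how such a reduction is typically achieved on iOS, explicit compression settings can be handed to AVAssetWriter when re-encoding the footage; the bitrate below is an assumed value, not the one the app uses:

```swift
import AVFoundation

// Illustrative output settings for re-encoding Full HD footage at a lower
// average bitrate than the camera default (the bitrate is an assumption).
let settings: [String: Any] = [
    AVVideoCodecKey: AVVideoCodecType.h264,
    AVVideoWidthKey: 1920,
    AVVideoHeightKey: 1080,
    AVVideoCompressionPropertiesKey: [
        AVVideoAverageBitRateKey: 8_000_000   // roughly 8 Mbit/s
    ]
]
let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
```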
What to expect in the near future
ZeroShake relies heavily on the accuracy of numerous parameters that describe the device’s camera(s), the motion filter, and the matching of its output with the captured image data. As a result, over 20 parameters have to be set for each camera at each resolution. Finding the optimum values is a painstakingly long process, as changing one parameter affects others, so there is definitely still room for improvement in the stabilization of the current version of the ZeroShake app.
ZeroShake Version 1.x allows for full stabilization, as long as the user keeps the shake or sway within preset limits. In the future we will launch a version that incorporates some of the other features possible with ZeroShake, such as smooth panning and pre-programmed panning.
If you would like to share an idea on how to make the ZeroShake app even better, please let us know via the form on the suggestions page.