Design Ideas

From ElphelWiki

FPGA Theora Encoder for Videoconferencing

In 2005 Elphel implemented a subset of the Theora video encoder in the Xilinx® FPGA that is part of the Elphel Model 333 camera. The FPGA encoder is capable of compressing 1280x1024@30fps ([1], [2]), but the CPU in the camera was not fast enough for the job even with the hard part done in hardware. In the Model 333 camera, software was responsible for generating frame headers and for the Ogg encapsulation of the Theora bitstream provided by the FPGA. Knowing that Axis Communications AB was going to release a new, faster processor, we decided to wait for it before proceeding with Theora in the camera and used plain old Motion JPEG for a while.

Now the new Model 353 camera is tested and released to production. It has a brand new ETRAX FS processor, more memory and a larger FPGA, and it has already been tested in JPEG mode. So now is a perfect time to resurrect the Theora code in the camera and move forward.

The current FPGA implementation supports only INTRA and INTER NOMV frames - the goal was to provide efficient compression for scenes where the camera does not move (CCTV, videoconferencing) and a large part of the frame stays the same. To reduce the bandwidth further we need to use selective block encoding, so that if the camera is looking at an empty hallway there is no bitstream at all apart from the INTRA frames - just a header telling that no block was encoded.

The ability to selectively encode blocks is already in the FPGA code, but we never used it with the slow CPU: the encoded block map is part of the frame header, and the header is currently built by software before the video starts. To move further we need either to add FPGA code that generates frame headers or to make use of the faster processor and do it in software.
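The selection step could be sketched as follows. This is only an illustration in Python, not the camera code: it assumes grayscale frames stored as lists of rows, the 8x8 block size that Theora uses, and a hypothetical threshold on the sum of absolute differences (SAD) to decide whether a block has changed. The names coded_block_map and count_coded are made up for this example.

```python
BLOCK = 8  # Theora codes 8x8 pixel blocks


def coded_block_map(prev, curr, threshold=64):
    """Return a 2-D list of booleans, True where a block should be encoded.

    prev, curr: frames as lists of rows of integer luma values; both
    dimensions are assumed to be multiples of BLOCK.
    threshold: hypothetical limit on the per-block SAD (an assumption).
    """
    height, width = len(curr), len(curr[0])
    block_map = []
    for by in range(0, height, BLOCK):
        row_flags = []
        for bx in range(0, width, BLOCK):
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, by + BLOCK)
                for x in range(bx, bx + BLOCK)
            )
            row_flags.append(sad > threshold)
        block_map.append(row_flags)
    return block_map


def count_coded(block_map):
    """Number of blocks that would produce any bitstream in this frame."""
    return sum(flag for row in block_map for flag in row)
```

For a static scene count_coded() returns 0, which corresponds to the "header only" frame described above.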

Such a project requires both FPGA code development (the existing code is written in Verilog HDL) and driver/application code (usually in C). When I was writing and debugging the code for the original encoder of the 333 camera I had to do both, but it would be nice to do such development in a team.

There are many project ideas for Google SoC about Ogg Theora on the Xiph Foundation Wiki page.

AJAX Camera Interface with PHP/fastCGI

In March 2006 I published an article in LinuxDevices - "AJAX, LAMP, and liveDVD for a Linux-based camera". It was the result of an interesting project to create a camera web interface with sliders, semi-transparent overlays, an embedded MPlayer video plugin and other fancy features. It would be nice if you could create something like that with regular web development tools, but in that case I had to cheat - modify (and create new) CGI programs (mostly compiled programs in C, some - shell scripts) running in the camera and providing the server part of AJAX.

In the Model 353 camera there is more memory (64 MB system, 64 MB video buffer and 128 MB flash) and the CPU is three times faster. This makes it possible to expand the use of mainstream web development tools in the camera - to replace binary CGI programs running on the server (in the camera) with PHP code, which seems to run nicely in FastCGI mode in the camera with the lighttpd web server.

We plan to update the driver interface to simplify hardware interfacing with the PHP code (i.e. replace IOCTL with reads/writes) and unleash the creativity of web developers. Instead of having the API to the hardware as something given (in some cases - even "taken", not "given" - when the API is unpublished and has to be reverse-engineered) you'll be able to create the one of your dreams.

Or - of the dream of Digital Video enthusiasts? See:

LAMP-based DVR

There is not much use for a network video camera if the stream is not recorded somewhere, and attaching a hard drive to the camera is not always the best solution - disk storage would increase the size of the camera, the camera could be mounted outdoors, there could be a requirement that the data survive the destruction of the camera, and so on. An obvious solution is to use an off-camera digital video recorder (DVR) that shares the same LAN with the camera or, better yet, with a cluster of cameras, so it can record and play back video from multiple cameras.

When developing control software for the camera (AJAX, LAMP, and liveDVD for a Linux-based camera) I used a prototype DVR with very basic features implemented using LAMP technology - Camera, client, and the DVR. To avoid problems with cross-domain scripts (servers in the camera and DVR) I used a small trick - while video from the DVR was coming to the client directly, all control commands (low bandwidth) went through the camera used as a proxy.

The production software should rather be DVR-centric and support multiple cameras. It should organize video records and be able to serve requested ones with specified resolution and video format, transcoding from the Ogg+MJPEG/Ogg+Theora used for recording camera streams using MEncoder or a similar application.

It would also be very useful to have the capability of live transcoding of several (CPU power permitting) video streams while they are being recorded, for remote monitoring.

Another idea - make the DVR+cameras cluster look like (have the interface of) several lower resolution cameras with one of the established APIs, so that unmodified third-party CCTV software could be used to control Elphel high-resolution cameras.

Electronic Rolling Shutter Distortion Compensation

Most of the available CMOS image sensors use an Electronic Rolling Shutter (ERS), so the different lines of an image are exposed at different times. That leads to distortions that are most visible when the camera moves or rotates - i.e. vertical objects will look tilted when filmed sideways from a moving car or during panning. Fast moving objects are also distorted, but the moving camera effect is more annoying. Because of this effect camera manufacturers avoid this class of otherwise high performance and inexpensive sensors and use interline CCDs with true snapshot electronic shutters. Being able to compensate for the ERS distortion would make it possible to build high resolution, high frame rate and still inexpensive video cameras.

To some extent the effect of a moving camera can be compensated by post-processing of the video, estimating the movement of the camera by comparing consecutive frames and assuming that the accelerations (changes in camera movement/rotation speed) during a single frame were low. This seems to be implemented in Deshaker for VirtualDub.

The quality of correction could be higher if the movement of the camera was tracked with higher temporal resolution. There are other applications that require high resolution (and precision) imagery that could benefit from such a system. This can be tested by converting a camera (or one of the several sensors of the same camera - see sensor multiplexer board) into an "optical mouse". Regular optical mice have tiny cameras (usually with just 16x16 or 32x32 pixel resolution) running at a high frame rate and calculating the correlation between images. Something similar could be done with the Elphel reconfigurable hardware - run a small window on a regular sensor board and port/implement the correlation code in the FPGA.
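As a rough software model of the "optical mouse" step, the displacement between two small windows can be found by exhaustive search. The sketch below is an assumption-laden illustration: it uses the mean absolute difference as a simple stand-in for a true correlation measure, takes windows as lists of rows, and the name estimate_shift and the search range are hypothetical.

```python
def estimate_shift(ref, cur, max_shift=3):
    """Return (dx, dy) such that cur[y + dy][x + dx] best matches ref[y][x].

    ref, cur: equally sized small windows (for example 16x16) as lists of
    rows. The score is the mean absolute difference over the overlapping
    area, used here instead of correlation for simplicity.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Visit only positions where both windows have a pixel.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(cur[y + dy][x + dx] - ref[y][x])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Running this on every pair of consecutive small windows gives a motion track with much finer time steps than one estimate per full frame. The nested loops are independent of each other, which is what makes the search a candidate for an FPGA implementation.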

Supplementing recorded images/video from the main sensor with precise orientation/position of the camera during each line exposure will allow correction of the ERS distortion during post-processing.
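A minimal sketch of such a correction, assuming only horizontal image motion and a per-line displacement (in pixels, relative to the first line) already derived from the recorded orientation data. The function name, the fill value and the whole-pixel rounding are assumptions of this example; a real implementation would also handle rotation and sub-pixel interpolation.

```python
def correct_rolling_shutter(image, line_offsets, fill=0):
    """Shift every row back by the displacement recorded for that line.

    image: list of rows of pixel values.
    line_offsets[y]: how far (in pixels, to the right) the scene appeared
    displaced when line y was exposed, relative to line 0.
    fill: value for pixels with no source data after the shift.
    """
    corrected = []
    for row, offset in zip(image, line_offsets):
        shift = int(round(offset))
        width = len(row)
        new_row = [fill] * width
        for x in range(width):
            src = x + shift
            if 0 <= src < width:
                new_row[x] = row[src]
        corrected.append(new_row)
    return corrected


def offsets_for_constant_pan(lines, pixels_per_second, line_time):
    """Per-line offsets for a pan at constant speed (hypothetical helper)."""
    return [pixels_per_second * line_time * y for y in range(lines)]
```

With a constant pan a vertical pole is recorded as a slanted line; applying the offsets above straightens it again.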

Demosaic Algorithms in FPGA

What it is and why it is needed is explained in the Demosaicing article in Wikipedia. In Elphel cameras Bayer-encoded pixels are processed just in front of the compressor (JPEG/MJPEG, Ogg Theora). These compressors use 16x16 blocks of pixels converted to YCbCr (intensity and 2 color components). Currently we use 4:2:0, which means that the color (chroma) components have half the spatial resolution (in each direction) of the intensity (luma). So for each 4 input pixels the compressor needs 4 Y (luma) pixels and one each of Cb and Cr (chroma).
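The 4:2:0 arithmetic can be illustrated with a short sketch. It assumes the JFIF (full-range BT.601) conversion coefficients and simple averaging of each 2x2 group for chroma; the camera's FPGA code may use different coefficients and rounding, and the function names are made up for this example.

```python
def clamp(value):
    """Round to an integer and limit to the 0..255 range."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """JFIF-style conversion (an assumption, not taken from the camera code)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_420(rgb):
    """Convert a block of (r, g, b) tuples to Y, Cb and Cr planes.

    rgb: 2-D list with even width and height. The Y plane keeps full
    resolution; Cb and Cr are averaged over 2x2 groups, so each has half
    the resolution in each direction.
    """
    h, w = len(rgb), len(rgb[0])
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
    y_plane = [[clamp(p[0]) for p in row] for row in ycc]
    cb_plane, cr_plane = [], []
    for y in range(0, h, 2):
        cb_row, cr_row = [], []
        for x in range(0, w, 2):
            quad = [ycc[y][x], ycc[y][x + 1], ycc[y + 1][x], ycc[y + 1][x + 1]]
            cb_row.append(clamp(sum(p[1] for p in quad) / 4))
            cr_row.append(clamp(sum(p[2] for p in quad) / 4))
        cb_plane.append(cb_row)
        cr_plane.append(cr_row)
    return y_plane, cb_plane, cr_plane
```

For one 16x16 block this yields 256 Y samples and 64 samples each of Cb and Cr, i.e. 4 Y, 1 Cb and 1 Cr per 4 input pixels.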

Our first cameras used a very simple algorithm to calculate YCbCr from the Bayer pixels, and for each pixel it needed just a 3x3 block of neighbors. As these neighbors were also needed for the outer pixels of the 16x16 blocks, the FPGA had to read larger (18x18) overlapping blocks from the external memory (internal FPGA memory is much smaller and cannot hold the whole image). In the later FPGA code 20x20 blocks are read in (to make it possible to implement fancier demosaic algorithms), but such algorithms were never implemented - the outer pixels are discarded and still only the 3x3 neighborhood is used.
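The relation between the neighborhood size and the tile size can be sketched as below. This is a plain bilinear interpolation over a 3x3 window, offered as a guess at what a "very simple algorithm" looks like rather than the actual FPGA code; it outputs RGB instead of YCbCr, assumes an RGGB mosaic, and takes the pattern phase in tile coordinates for simplicity.

```python
def bayer_color(x, y):
    """Color of the sample at (x, y) for an assumed RGGB mosaic."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def demosaic_pixel(raw, x, y):
    """Bilinear estimate of (R, G, B) from the 3x3 window around (x, y)."""
    own = bayer_color(x, y)
    sums = {"R": 0, "G": 0, "B": 0}
    counts = {"R": 0, "G": 0, "B": 0}
    for ny in range(y - 1, y + 2):
        for nx in range(x - 1, x + 2):
            color = bayer_color(nx, ny)
            if color == own and (nx, ny) != (x, y):
                continue  # the pixel's own color comes from the center sample
            sums[color] += raw[ny][nx]
            counts[color] += 1
    return tuple(sums[c] / counts[c] for c in "RGB")


def tile_side(block=16, margin=1):
    """Side of the overlapping tile that must be read for one output block."""
    return block + 2 * margin


def demosaic_tile(tile, margin=1):
    """Demosaic the inner block of a tile, using the margin as neighbors only."""
    side = len(tile)
    return [
        [demosaic_pixel(tile, x, y) for x in range(margin, side - margin)]
        for y in range(margin, side - margin)
    ]
```

tile_side(16, 1) gives the 18x18 tile needed for 3x3 neighborhoods, and tile_side(16, 2) gives the 20x20 tile that would allow 5x5 neighborhoods.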

There are several algorithms that provide good results with fewer artifacts (see Wikipedia article) and these detailed descriptions:

So just implement one in the FPGA code of the camera? Or adapt those ideas to use 4:2:0 encoding and convert to direct Bayer->YCbCr conversion (not Bayer->RGB->YCbCr)?

Stereo Vision for Robots

For the 353 series of cameras we have developed a multiplexer board that can accommodate several sensor boards. As these boards are connected to the same FPGA it is rather easy to achieve complete synchronization of two sensors - i.e. just by skipping clock pulses to the sensors until their output frame sync pulses match. When a pair of sensors is mechanically aligned, some stereo processing can be performed on a line-by-line basis, storing the intermediate results in the attached SDRAM chip and then improving the results by combining 1-d correlation data from multiple lines.
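The line-by-line scheme might look like the following in software. It is a sketch under several assumptions: aligned sensors so that matching points lie on the same line, a sum of absolute differences over a small window as a stand-in for correlation, and a plain sum over lines as the way of combining 1-d data. All names and parameters are hypothetical.

```python
def line_costs(left, right, max_disp, half=2):
    """costs[x][d]: mismatch between left[x] and right[x - d] on one line.

    left, right: single image lines as lists of intensities.
    half: half-width of the 1-d comparison window (an assumption).
    Indices that fall outside the line are clamped to its ends.
    """
    width = len(left)
    costs = []
    for x in range(width):
        row = []
        for d in range(max_disp + 1):
            total = 0
            for k in range(-half, half + 1):
                xl = min(max(x + k, 0), width - 1)
                xr = min(max(x + k - d, 0), width - 1)
                total += abs(left[xl] - right[xr])
            row.append(total)
        costs.append(row)
    return costs


def disparity_from_lines(left_lines, right_lines, max_disp):
    """Sum the 1-d costs of several lines, then pick the best disparity."""
    width = len(left_lines[0])
    summed = [[0] * (max_disp + 1) for _ in range(width)]
    for left, right in zip(left_lines, right_lines):
        costs = line_costs(left, right, max_disp)
        for x in range(width):
            for d in range(max_disp + 1):
                summed[x][d] += costs[x][d]
    return [
        min(range(max_disp + 1), key=lambda d: summed[x][d])
        for x in range(width)
    ]
```

The summed table plays the role of the intermediate results that would be kept in SDRAM between lines.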

Visual Processing with the FPGA

Many machine vision algorithms can't be run on large image sizes in real time because of simple, parallelizable initial preprocessing passes; some of these could probably be combined with the color balancing and compression passes that the Elphel FPGA already performs. A lot of research and products are built on top of the OpenCV libraries (initially released by Intel and optimized for their processors but now under a BSD license and developed for several platforms; simple example). If parts of the library were reoptimized to take advantage of the FPGA, existing higher level code should get better performance with minimal tweaking.

There are some interesting algorithms to enhance images beyond the usual gamma/contrast corrections; see the Robotics Institute at CMU and their spin-off shadow illuminator for examples.

Real-time artistic visual filters should also be possible at higher resolutions and/or frame rates than currently possible. A classic example is the film Waking Life (example frame) or the newer A Scanner Darkly (trailer). These both had a lot of rotoscoping by hand, but good effects can be had automatically: gaussian blur, edge detect, blur more, and reduce the color palette to ~4-5.
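The automatic recipe above can be tried out in a few lines. The sketch below is a simplification with several assumptions: it works on grayscale images stored as lists of rows, uses a 3x3 box blur as a stand-in for the gaussian blur, detects edges with a simple gradient threshold, and reduces gray levels instead of a color palette. The function names and default values are made up.

```python
def box_blur(img):
    """3x3 box blur (a cheap stand-in for a gaussian blur)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def edges(img, threshold=40):
    """Mark pixels whose horizontal plus vertical gradient is large."""
    h, w = len(img), len(img[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            out[y][x] = abs(gx) + abs(gy) > threshold
    return out


def posterize(img, levels=4):
    """Reduce 0..255 values to the given number of levels."""
    step = 256 // levels
    return [
        [min(v // step, levels - 1) * step + step // 2 for v in row]
        for row in img
    ]


def cartoon(img, levels=4, threshold=40):
    """Blur, detect edges, blur more, posterize, then draw edges in black."""
    smooth = box_blur(img)
    outline = edges(smooth, threshold)
    flat = posterize(box_blur(smooth), levels)
    return [
        [0 if outline[y][x] else flat[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

Every step only looks at a small neighborhood of each pixel, which is the property that would let it be pipelined in the FPGA.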

Resources:

Gyro Image Stabilization

By integrating a 3-axis MEMS gyroscope (and/or accelerometers) into the developer board, the camera could try to shift exposure times to vibration/rotation-free instants on a frame-by-frame basis; this is apparently a technique used by digital still cameras to reduce blurring. Or the stability information could be encoded in every frame (EXIF?), so that good frames can be identified during post-processing. Or exposure times could be shortened when rotation is detected, to minimize blurriness at the expense of light information.
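The three options can be related to gyro readings with a small-angle estimate: image blur in pixels is roughly the angular rate (in radians per second) times the exposure time times the focal length expressed in pixels. The sketch below assumes that approximation; the function names, the one-pixel blur limit and the 1/30 s ceiling are hypothetical values for illustration.

```python
import math


def blur_pixels(rate_deg_s, exposure_s, focal_px):
    """Approximate motion blur for a rotation rate (small-angle estimate)."""
    return math.radians(abs(rate_deg_s)) * exposure_s * focal_px


def max_exposure(rate_deg_s, focal_px, max_blur_px=1.0, longest_s=1 / 30):
    """Longest exposure that keeps the blur under max_blur_px."""
    if rate_deg_s == 0:
        return longest_s
    limit = max_blur_px / (math.radians(abs(rate_deg_s)) * focal_px)
    return min(longest_s, limit)


def quietest_sample(gyro_samples):
    """Index of the (x, y, z) rate sample with the smallest magnitude.

    Could be used to pick the calmest instant for an exposure, or to tag
    each frame with a stability figure for post-processing.
    """
    return min(
        range(len(gyro_samples)),
        key=lambda i: math.sqrt(sum(c * c for c in gyro_samples[i])),
    )
```

For example, at 10 degrees per second and a focal length of 2000 pixels, max_exposure() limits the exposure to about 2.9 ms for one pixel of blur.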

MEMS sensors are getting cheap and I'm sure there are tons of other projects that could make use of the information. Maybe this could be left to a third-party USB or serial device?

Barcode reader

An FPGA-based FFT may be useful.

links:
http://google-code-updates.blogspot.com/2007/11/zxing-1d2d-barcode-decoding-source-code.html
http://www.dmoz.org/Computers/Software/Bar_Code/Decoding/
http://qrcode.sourceforge.jp/
GPL library: http://www.libdmtx.org/
Systems overview: http://www.adams1.com/pub/russadam/stack.html