Difference between revisions of "SoC"
Revision as of 08:54, 2 April 2007
Elphel was not accepted as a mentoring organization for SoC by Google this year, but we still plan to provide mentoring (and the stipends) for 1-3 students to work on the projects outlined in the Design Ideas below, or on your own ideas if we find them interesting.
We'll start accepting applications on April 2, 2007 - a week after the deadline for GSoC student applications.
More details will be posted, please stay tuned.
UPDATE: See below
Student applications
As promised - we are now accepting applications for the projects related to the Elphel open hardware.
Dates and deadlines
For the mentored work itself we will keep the same dates as GSoC (May 28 - August 31, 2007), but we start accepting applications later (today, April 2, 2007) and the application period will last longer - through April 30, 2007, a full 4 weeks. We will announce who has been selected a week after the end of the application period - on May 7, 2007.
The application period is longer because Elphel's area of interest is on the border between software and hardware (and of course it is free software and open hardware) - it is much less familiar to most of you than many software projects selected for GSoC. You may need more time to find out about us and our projects and to ask us additional questions (via email, IRC, forums or just a phone call).
General rules
Elphel is trying to reuse most of the GSoC procedures and stick to the same general rules (such as student eligibility, deadlines and amounts paid), which you can find in the FAQ at http://code.google.com/soc/. We will post here just the important differences - feel free to post questions in the discussion section of this wiki article.
T-shirts
Elphel will send you our T-shirts even if they may not be as cool as the GSoC ones.
Number of students Elphel can accept
We will accept just one to three students - it will depend on how impressed we are by your applications.
Application process
The application process is different. We do not offer any web forms to fill in; the main tool for submitting your proposals is this wiki - you start your proposal page here and place a link to it on this page. If your application is accepted, that page will become your project page. Just email us at soc@elphel.com with additional personal information (the same as required by GSoC), identifying your wiki project name/page. The discussion section of this article may be a good place to start.
Licensing
All the code developed for these projects will be released under GNU GPL v.2 (or later). We require copyright ownership of the code to be assigned to Elphel, Inc., while your name will stay as the author (or co-author if applicable). We do so to simplify future license upgrades for our products, e.g. when GPL v.3 is finalized.
Hardware for the projects
It is difficult (but not impossible) to work with the hardware if you do not have it in your hands. We plan to provide each of the students with our latest hardware (model 353 camera) that will remain yours after the summer is over. Depending on the country where you live there could be some import or customs difficulties (camera shipments are more regulated than T-shirts), so while we work on resolving them (hoping it will not take long) we will provide you with online access to individual cameras in our facility - the author of one of the first video streamers for the earlier Elphel cameras had only remote Internet access to the hardware.
Good luck !
Design Ideas
FPGA Theora Encoder for Videoconferencing
In 2005 Elphel implemented a subset of the Theora video encoder in the Xilinx® FPGA that is part of the Elphel model 333 camera. The encoder is capable of compressing 1280x1024@30fps ([1], [2]), but the CPU in the camera was not fast enough for the job even though the hard part was done by the hardware. In the model 333 camera the software was responsible for generating frame headers and for Ogg encapsulation of the Theora bitstream provided by the FPGA. Knowing that Axis Communications AB were going to release a new, faster processor, we decided to wait for it before proceeding with Theora in the camera and used plain old Motion JPEG for a while.
Now we have the new Model 353 camera tested and released to production - the camera has a brand new ETRAX FS, more memory and a larger FPGA, and is already tested in JPEG mode. So now is a perfect time to resurrect the Theora code in the camera and move forward.
The current FPGA implementation supports only INTRA and INTER NOMV frames - the goal was to provide efficient compression for scenes where the camera does not move (CCTV, videoconferencing) and a large part of the frame stays the same. To reduce the bandwidth further we need to use selective block encoding, so that if the camera is looking at an empty hallway there would be no bitstream at all apart from INTRA frames - just a header telling that no block was encoded.
The ability to selectively encode blocks is already in the FPGA code, but we never used it with the slow CPU: the encoded block map is part of the frame header, and the header is currently built by software before the video starts. To move further we need either to add FPGA code that generates frame headers or to make use of the faster processor and do it in software.
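To make the idea of an encoded block map more concrete, here is a minimal sketch in Python of how software could decide which blocks of an INTER frame need encoding. It is only an illustration: the function names, the 8x8 block size and the threshold are assumptions made for this example, and the result is a plain list of flags, not the block map format defined by the Theora bitstream or used by the Elphel FPGA code.

```python
def coded_block_map(prev, curr, block=8, threshold=4):
    """Return a 2-D list of booleans: True where a block changed enough to encode.

    prev, curr: equally sized 2-D lists of luma values (0..255).
    block and threshold are illustrative values, not taken from the FPGA code.
    """
    rows, cols = len(curr), len(curr[0])
    result = []
    for by in range(0, rows, block):
        row_flags = []
        for bx in range(0, cols, block):
            # Largest absolute pixel difference inside this block.
            worst = max(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, rows))
                for x in range(bx, min(bx + block, cols))
            )
            row_flags.append(worst > threshold)
        result.append(row_flags)
    return result


def coded_fraction(block_map):
    """Fraction of blocks that would be encoded in this frame."""
    flags = [flag for row in block_map for flag in row]
    return sum(flags) / len(flags)
```

For the empty hallway case every flag is False and the coded fraction is zero, so only the header would have to be sent for that frame.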
Such a project requires both FPGA code development (we use, and the rest of the code is written in, Verilog HDL) and driver/application code (usually in C). When I was writing (and debugging) the code for the original encoder of the 333 camera I had to do both, but it would be nice to do such development in a team.
There are many project ideas for Google SoC about Ogg Theora on the Xiph Foundation Wiki page.
AJAX Camera Interface with PHP/fastCGI
In March 2006 I published an article in LinuxDevices - "AJAX, LAMP, and liveDVD for a Linux-based camera". It was the result of an interesting project to create a camera web interface with sliders, semi-transparent overlays, an embedded MPlayer video plugin and other fancy features. It would be nice if you could create something like that with regular web development tools, but in that case I had to cheat - modify (and create new) CGI programs (mostly compiled C programs, some shell scripts) running in the camera and providing the server part of AJAX.
The Model 353 camera has more memory (64MB system, 64MB video buffer and 128MB flash) and its CPU is three times faster. This makes it possible to expand the use of mainstream web development tools in the camera - to replace binary CGI programs running on the server (in the camera) with PHP code, which seems to run nicely in fastCGI mode with the lightTPD web server.
We plan to update the driver interface to simplify hardware interfacing with the PHP code (e.g. replace IOCTL with reads/writes) and unleash the creativity of web developers. Instead of having the API to the hardware as something given (in some cases - even "taken", not "given" - when the API is unpublished and has to be reverse-engineered) you'll be able to create the API of your dreams.
Or - of the dream of Digital Video enthusiasts? See:
- HD_cinema_camera_Development_FAQ
- High Definition with Elphel model 333 camera
- Elphel 333 (practical thread)
- Elphel 333 HTML
LAMP-based DVR
A network video camera is not of much use if the stream is not recorded somewhere, and attaching a hard drive to the camera is not always the best solution - disk storage would increase the size of the camera, the camera could be mounted outdoors, there could be a requirement that the data should survive the destruction of the camera, and so on. The obvious solution is to use an off-camera digital video recorder (DVR) that shares the same LAN with the camera. Better yet, it can serve a cluster of cameras, so it can record and play back video from multiple cameras.
When developing control software for the camera (AJAX, LAMP, and liveDVD for a Linux-based camera) I used a prototype DVR with very basic features implemented using LAMP technology - Camera, client, and the DVR. To avoid problems with cross-domain scripts (servers in the camera and DVR) I used a small trick - while video from the DVR was coming to the client directly, all control commands (low bandwidth) went through the camera used as a proxy.
The production software should rather be DVR-centric and support multiple cameras. It should organize video records and be able to serve requested ones with the specified resolution and video format, using MEncoder or a similar application to transcode from the Ogg+MJPEG/Ogg+Theora used for recording camera streams.
It would also be very useful to have the capability of live transcoding of several (CPU power permitting) video streams being recorded for remote monitoring.
Another idea - make the DVR+cameras cluster look like (have the interface of) several lower resolution cameras with one of the established APIs, so that unmodified third-party CCTV software could be used to control Elphel high-resolution cameras.
Electronic Rolling Shutter Distortion Compensation
Most of the available CMOS image sensors use an Electronic Rolling Shutter (ERS), so the different lines of an image are exposed at different times. That leads to distortions that are most visible when the camera moves or rotates - e.g. vertical objects will look tilted when filmed sideways from a moving car or during panning. Fast moving objects are also distorted, but the moving camera effect is more annoying. Because of this effect camera manufacturers avoid this class of otherwise high performance and inexpensive sensors and use interline CCDs with true snapshot electronic shutters. Being able to compensate for the ERS distortion would make it possible to build high resolution, high frame rate and still inexpensive video cameras.
To some extent the effect of a moving camera can be compensated by post-processing the video: the movement of the camera is estimated by comparing consecutive frames, assuming that the accelerations (changes in camera movement/rotation speed) during a single frame were low. This seems to be implemented in Deshaker for VirtualDub.
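The post-processing idea can be illustrated with a minimal Python sketch. It assumes the simplest possible model: the camera pans horizontally at a constant, already known image velocity, each line is exposed one line time after the previous one, and the correction is rounded to whole pixels. The function name and its parameters are made up for this example, and this is not a description of how Deshaker works.

```python
def correct_rolling_shutter(frame, velocity_px_per_s, line_time_s, fill=0):
    """Undo the horizontal skew caused by a constant-velocity pan.

    frame: 2-D list of pixel values, line 0 is exposed first.
    velocity_px_per_s: assumed horizontal image velocity, positive when the
        scene content drifts to the right in later lines.
    line_time_s: assumed delay between the exposure of consecutive lines.
    """
    corrected = []
    for y, line in enumerate(frame):
        # Line y is exposed y * line_time_s later than line 0, so its content
        # is displaced by this many pixels relative to line 0.
        shift = round(velocity_px_per_s * y * line_time_s)
        width = len(line)
        new_line = [fill] * width
        for x in range(width):
            src = x + shift
            if 0 <= src < width:
                new_line[x] = line[src]
        corrected.append(new_line)
    return corrected
```

With a vertical bar that appears tilted by one pixel per line, calling the function with a velocity and line time whose product is one pixel per line puts the bar back into a single column. Pixels that have no source data are set to the fill value.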
The quality of the correction could be higher if the movement of the camera was tracked with higher temporal resolution. There are other applications requiring high resolution (and precision) imagery that could benefit from such a system. This can be tested by converting a camera (or one of the several sensors of the same camera - see [[10353#10359|sensor multiplexer board]]) into an "optical mouse". Regular optical mice have tiny cameras (usually with just 16x16 or 32x32 pixel resolution) running at a high frame rate and calculating the correlation between images. Something similar could be done with the Elphel reconfigurable hardware - run a small window on a regular sensor board and port/implement the correlation code in the FPGA.
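As a rough software model of the "optical mouse" idea, the Python sketch below finds the displacement between two small windows by an exhaustive search. It uses the mean absolute difference as the matching score instead of a true correlation, because that is simpler to show; the function name and the search range are assumptions of this example, and an FPGA implementation would be organized quite differently.

```python
def estimate_shift(prev, curr, max_shift=2):
    """Return (dx, dy) such that curr[y][x] best matches prev[y - dy][x - dx].

    prev, curr: equally sized small 2-D lists (e.g. 16x16 windows).
    max_shift must be smaller than the window size.
    """
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(rows):
                for x in range(cols):
                    py, px = y - dy, x - dx
                    if 0 <= py < rows and 0 <= px < cols:
                        total += abs(curr[y][x] - prev[py][px])
                        count += 1
            # Normalize by the overlap area so large shifts are not favored.
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Running this on consecutive windows and accumulating the results gives a camera trajectory with a time resolution set by the window frame rate.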
Supplementing recorded images/video from the main sensor with precise orientation/position of the camera during each line exposure will allow correction of the ERS distortion during post-processing.
Demosaic Algorithms in FPGA
What it is and why it is needed is explained in the Demosaicing article in Wikipedia. In Elphel cameras Bayer-encoded pixels are processed just in front of the compressor (JPEG/MJPEG, Ogg Theora). These compressors use 16x16 blocks of pixels converted to YCbCr (intensity and 2 color components). Currently we use 4:2:0, which means that the color (chroma) components have half the spatial resolution (in each direction) of the intensity (luma). So for each 4 input pixels the compressor needs 4 Y (luma) pixels and one each of Cb and Cr (chroma).
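The 4:2:0 arithmetic can be checked with a few lines of Python. The helper name is made up for this example and it assumes even block dimensions.

```python
def samples_420(width, height):
    """Count the Y, Cb and Cr samples produced for a width x height block in 4:2:0."""
    luma = width * height
    # Chroma has half the resolution in each direction.
    chroma = (width // 2) * (height // 2)
    return {"Y": luma, "Cb": chroma, "Cr": chroma}
```

For a 16x16 block this gives 256 Y, 64 Cb and 64 Cr samples, that is four 8x8 luma blocks and one 8x8 block for each chroma component.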
Our first cameras used a very simple algorithm to calculate YCbCr from the Bayer pixels, and for each pixel it needed just a 3x3 block of neighbors. As these neighbors were also needed for the outer pixels of the 16x16 blocks, the FPGA had to read larger (18x18) overlapping blocks from the external memory (the internal FPGA memory is much smaller and cannot hold the whole image). In the later FPGA code 20x20 blocks are read in to make it possible to implement fancier demosaic algorithms, but no such algorithm has been implemented yet - the outer pixels are discarded and still only the 3x3 neighborhood is used.
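The relation between the neighborhood size and the size of the block that has to be read can be sketched as follows. The function name and the coordinate convention are assumptions of this example; handling of the image borders (where the window sticks out of the image) is not shown.

```python
def read_window(block_x, block_y, block=16, kernel=3):
    """Return (left, top, size) of the square window to read for one output block.

    block_x, block_y: index of the output block in units of blocks.
    kernel: odd width of the pixel neighborhood used by the demosaic algorithm.
    """
    margin = kernel // 2          # extra pixels needed on every side
    left = block_x * block - margin
    top = block_y * block - margin
    size = block + 2 * margin     # 18 for a 3x3 kernel, 20 for a 5x5 kernel
    return left, top, size
```

A 20x20 window therefore leaves room for algorithms that look at up to a 5x5 neighborhood of each pixel.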
There are several algorithms that provide good results with fewer artifacts (see Wikipedia article) and these detailed descriptions:
So just implement one in the FPGA code of the camera? Or adapt those ideas to use 4:2:0 encoding and convert them to direct Bayer->YCbCr conversion (not Bayer->RGB->YCbCr)?
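To show what a direct Bayer->YCbCr step with 4:2:0 output looks like, here is a deliberately crude Python sketch that converts one 2x2 cell into four Y values and one Cb/Cr pair. It assumes an RGGB cell layout, borrows missing colors only from within the same cell and uses the JFIF (BT.601 full range) coefficients; none of this is taken from the Elphel FPGA code, and a real algorithm would interpolate from neighboring cells as well.

```python
def bayer_cell_to_ycbcr(r, g1, g2, b):
    """Convert one 2x2 RGGB cell to ([Y_r, Y_g1, Y_g2, Y_b], Cb, Cr)."""
    g_mean = (g1 + g2) / 2

    def luma(red, green, blue):
        return 0.299 * red + 0.587 * green + 0.114 * blue

    # One luma sample per pixel: the green pixels keep their own measured
    # value, the missing colors are borrowed from the rest of the cell.
    # With this crude scheme the R and B pixels end up with the same luma.
    y = [
        luma(r, g_mean, b),  # R pixel
        luma(r, g1, b),      # first G pixel
        luma(r, g2, b),      # second G pixel
        luma(r, g_mean, b),  # B pixel
    ]
    # One chroma pair for the whole cell, which is exactly 4:2:0.
    cb = 128 - 0.168736 * r - 0.331264 * g_mean + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g_mean - 0.081312 * b
    return y, cb, cr
```

No intermediate full-resolution RGB image is built, which is the point of the direct conversion: only the values the compressor actually consumes are calculated.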
Stereo Vision for Robots
For the 353 series of cameras we have developed a multiplexer board that can accommodate several sensor boards. As these boards are connected to the same FPGA, it is rather easy to achieve complete synchronization of two sensors - e.g. just by skipping clock pulses to the sensors until their output frame sync pulses match. When a pair of sensors is mechanically aligned, some stereo processing can be performed on a line-by-line basis, storing the intermediate results in the attached SDRAM chip and then improving the results by combining 1-d correlation data from multiple lines.
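The line-by-line approach can be modeled in software with the Python sketch below. It assumes aligned sensors (matching features lie on the same line), uses the mean absolute difference as a stand-in for the 1-d correlation and finds a single disparity for a group of lines; the function names are made up for this example.

```python
def line_costs(left_line, right_line, max_disparity):
    """Matching cost for each disparity d, comparing left[x] with right[x - d].

    max_disparity must be smaller than the line length.
    """
    width = len(left_line)
    costs = []
    for d in range(max_disparity + 1):
        diffs = [abs(left_line[x] - right_line[x - d]) for x in range(d, width)]
        costs.append(sum(diffs) / len(diffs))
    return costs


def disparity_from_lines(left_lines, right_lines, max_disparity):
    """Combine the 1-d costs of several lines and return the best disparity."""
    total = [0.0] * (max_disparity + 1)
    for left_line, right_line in zip(left_lines, right_lines):
        for d, cost in enumerate(line_costs(left_line, right_line, max_disparity)):
            total[d] += cost
    return min(range(len(total)), key=total.__getitem__)
```

Summing the costs over several lines before picking the minimum is what makes the estimate less sensitive to noise in any single line.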