Talk:Porting Theora to 353 cameras

From ElphelWiki

OK, I started a new page on the Theora project for 353 cameras. I started with a global approach and a list of modules that need to be changed or replaced on the 353 model. Feel free to correct me or add suggestions, as I'm not (yet...) an expert on Elphel cameras. I think the first step is to make an exhaustive list of the hardware blocks that need modifications and then detail what needs to be done in each block. --JeremViewsurf 03:48, 19 February 2008 (CST)


As I understand it, your first need is not streaming but producing a small video clip to be uploaded via FTP. This can be done more easily, with minimal non-FPGA modifications.

We will probably create a CVS branch for your port. To test it, we need to compile a standard 7.1.x firmware with everything sensor- or FPGA-related disabled, and change the FPGA bitstream. --Alexandre.Poltorak 07:53, 19 February 2008 (CST)


Do you mean that the system could handle software Ogg Theora encoding for a non-real-time application such as producing small video clips? I'm aware that it depends on the output resolution, frame rate and length of the clip, but could, for example, a 1-minute 1024x768 @ 25 fps clip be processed within a reasonable delay? I'll discuss with Viewsurf's development team to obtain more precise specs for the video output (length, resolution, frame rate and acceptable encoding delay), but if you can give me an idea of what the software side can handle, I would consider spending more time on software Theora transcoding instead of FPGA design. --JeremViewsurf 03:37, 20 February 2008 (CST)


No, sorry, I was not speaking of transcoding. It would take too much computing power and is not possible on the camera's CPU. The Theora encoding should be done in the FPGA. What I was speaking about is the software part. I think you do not need a streamer for the first steps: you only need to generate a clip in the FPGA memory and upload it to an FTP server. --Alexandre.Poltorak 08:32, 20 February 2008 (CST)




Some questions concerning changes in the FPGA:

  • Is it necessary to keep the histogram and timestamp blocks in a design that supports Theora only?
    • I believe the camera will be rather crippled without those features.--Andrey.filippov 10:24, 25 February 2008 (CST)
  • Is it possible to keep the channel 0, 1 and 3 designs without changes to the memory controller?
    • The Theora implementation used a different (more advanced) memory controller than the JPEG one; it achieved 95% efficiency on the Theora data structures, so you need the Theora memory controller. But if by "channel 0 design" you mean the gamma tables and histograms, then yes, they probably could be ported (with some changes, of course). As for channels 1 and 3, similar ones are available in the Theora 333 as well.--Andrey.filippov 10:24, 25 February 2008 (CST)
  • The 353 design uses 2 DMA FIFOs between the compressor and the system interface instead of one in the 333 design. Is it possible to keep only one 333 DMA FIFO in the final design?
    • Yes, of course, but this is a minor difference. More important is that in Theora that FIFO could make the compressor wait if the ETRAX DMA is not ready: at the beginning of a Theora frame there is a data "burst".--Andrey.filippov 10:24, 25 February 2008 (CST)
  • In the 353 design, all signals and block I/O concerning the 90° phase clock are commented out. Is it possible to uncomment them in order to restore the clock?
    • Maybe yes, more likely no. The 353 and 333 use different FPGAs.--Andrey.filippov 10:24, 25 February 2008 (CST)
  • Motion compensation does not seem to be implemented in the Theora compressor yet. Is it possible to add this feature to the compressor block later without major changes to the final design?
    • No, I'm afraid that is not possible. I believe we would need a faster FPGA for that, and likely faster/wider memory.--Andrey.filippov 10:24, 25 February 2008 (CST)

--JeremViewsurf 02:43, 25 February 2008 (CST)




I'm currently working on the FPGA control register, and I have a question concerning the I2C pads controlled by the control register (via dcr[16:21]). In the 333 design there are 2 I2C "channels", (sda0,scl0) and (sda1,scl1), and only one in the 353 design. Is it possible to keep a single I2C channel in the new design? In other words, what is the purpose of the I2C "channels" in each device? --JeremViewsurf 03:19, 3 March 2008 (CST)


The later 353 software also has 2 I2C channels, similar to the 333. One channel is designed to communicate with the sensor (and goes through the 30-pin flex cable), the other with additional boards that sit on top of the 353; that I2C bus uses 2 bits (0, 1) of the 12-bit GPIO of the 353 FPGA. It is currently used by multiple I2C peripherals on the 10369 board (IDE bus configuration, clock/calendar, temperature sensor/fan control, EEPROM, USB hub configuration). --Andrey.filippov 12:27, 5 March 2008 (CST)




I tried to implement the new design according to the specs I proposed in this article. I instantiated the Theora compressor and the 8-channel memory controller in place of the former JPEG compressor and memory controller, and adapted the registers and addresses (see article). Synthesis of this design with ISE completes without errors, but I expect more issues during place & route. It would be great if I could share this work on CVS to get some criticism and support; we discussed that with Alexandre, but I'm still waiting for an account. Before going further, I'm trying to run the simulation with Icarus, but I'm experiencing some issues with compilation:

channel0.v:154: error: operand of concatenation has indefinite width: ({1'b0, ntile_y[9:0]})+('sd1)
channel1.v:152: error: operand of concatenation has indefinite width: (ntile_x[3:0])+('sd1)
channel1.v:155: error: operand of concatenation has indefinite width: ({1'b0, ntile_y[9:0]})+('sd1)

I would appreciate some support concerning how to test and debug the new design since this seems to be a tough issue to deal with.


When simulating Theora I used the Silos III simulator on Windows; some work is needed to make it work with Icarus. First step: patch the Xilinx unisim library; then go through the code (maybe it needs a width for sd1?). The other problem that will come up: in Silos, tasks were re-entrant, each instance having its own set of local variables (I used that when simulating CPU interrupts); in Icarus they all share the same set.--Andrey.filippov 12:27, 5 March 2008 (CST) --JeremViewsurf 08:57, 5 March 2008 (CST)


I fixed the previous compilation errors in Icarus. In channel0.v and channel1.v, replace

channel0.v:154: ({1'b0, ntile_y[9:0]})+1
channel1.v:152: (ntile_x[3:0])+1
channel1.v:155: ({1'b0, ntile_y[9:0]})+1

with

channel0.v:154: ({1'b0, ntile_y[9:0]})+11'h1
channel1.v:152: (ntile_x[3:0])+4'h1
channel1.v:155: ({1'b0, ntile_y[9:0]})+11'h1

I can finally view some waveforms in gtkwave; however, I'm still using the old 353 testbench... Time to understand how the testbench is structured and to think about a smarter one to test the new design. --JeremViewsurf 03:13, 6 March 2008 (CST)




I'm working on the simulation of the 333 Theora design (x333t.tf), and it looks like there are three simulation modes for this design that can be switched with `define: MASTERMODE, MASTERMODE1, SINGLE_ST. What is the difference between these modes, and briefly, how does each one work? --JeremViewsurf 10:45, 10 March 2008 (CDT)

The source code is now available on the Elphel CVS: http://elphel.cvs.sourceforge.net/elphel/fpga/theora_353/. The synthesis of this code is OK, but it still has to be tested. I split the testbench into the main simulation sequence and the different tasks. The x353_sim.sh script is up to date, but the simulation itself (x353_1.tf) is inadequate to test the design (it is still the old MJPEG testbench). The testbench in the 6.3.9 firmware (333 Theora) seems a better fit, but I still need to know exactly how it works in order to adapt it to the 353 design. --JeremViewsurf 09:22, 11 March 2008 (CDT)




Some news concerning the simulation of Theora on the 353 :

  • I instantiated the x353t module with the sensor and the sdram.
  • I'm trying to go through the initialization sequence
  • The global initialization works (with glbl.v)
  • The DCM reset works (SDRAM clocks SDCLK_D and SDNCLK_D are working)
  • I haven't checked yet whether the table programming is OK
  • The SDRAM initialization sequence works:
Initialize SDRAM

>> 214455.000 ns >> cpu_wr(21, 00017fff)
At time 214556.000 ns PRE  : Addr[10] = 1, Bank = 11

>> 214605.000 ns >> cpu_wr(21, 00002002)
At time 214708.000 ns EMR  : Extended Mode Register
At time 214708.000 ns EMR  : Enable DLL

>> 214755.000 ns >> cpu_wr(21, 00000163)
At time 214860.000 ns LMR  : Load Mode Register
At time 214860.000 ns LMR  : Burst Length = 8
At time 214860.000 ns LMR  : CAS Latency = 2.5

>> 214905.000 ns >> cpu_wr(21, 00017fff)
At time 215004.000 ns PRE  : Addr[10] = 1, Bank = 11

>> 215055.000 ns >> cpu_wr(21, 00008000)
At time 215156.000 ns AREF : Auto Refresh

>> 215205.000 ns >> cpu_wr(21, 00008000)
At time 215308.000 ns AREF : Auto Refresh
testbench353t.i_mt46v16m16fg: at time 215308.000 ns MEMORY:  Power Up and Initialization Sequence is complete

>> 215355.000 ns >> cpu_wr(21, 00000063)
At time 215460.000 ns LMR  : Load Mode Register
At time 215460.000 ns LMR  : Burst Length = 8
At time 215460.000 ns LMR  : CAS Latency = 2.5

Initialize SDRAM done

I committed the changes to the CVS. --JeremViewsurf 05:20, 19 March 2008 (CDT)


I checked the table programming (quantization, Huffman, etc.); the procedure seems OK. I finally have data output on the SDA and SDD ports after replacing the sensorpix and sensortrig blocks in the 353 with those from the 333 Theora (with some modifications to the width of some signals). However, the timings seem completely different between the two sensors, and I get memory overflow errors during the simulation:

...
At time 1498748.000 ns WRITE: Bank = 3, Row = 0034, Col = 06d, Data = 0000
At time 1498748.000 ns ERROR: Memory overflow.
  Write to Address 180d06d with Data 0000 will be lost.
  You must increase the part_mem_bits parameter or define FULL_MEM.
At time 1498748.000 ns WRITE: Bank = 3, Col = 070
At time 1498752.000 ns WRITE: Bank = 3, Row = 0034, Col = 06e, Data = 0000
At time 1498752.000 ns ERROR: Memory overflow.
  Write to Address 180d06e with Data 0000 will be lost.
  You must increase the part_mem_bits parameter or define FULL_MEM.
...

--JeremViewsurf 11:46, 26 March 2008 (CDT)

Have you tried to increase "part_mem_bits"? It's in ddr_parameters.v --Oleg 20:06, 26 March 2008 (CDT)

I defined FULL_MEM in ddr.v to be sure; there are no more memory overflow errors. However, is the simulation still representative of reality? --JeremViewsurf 03:32, 27 March 2008 (CDT)




I have a question concerning the sensor in both the 333 and 353 designs. In the 333, the sensor has a 10-bit output that goes directly (through sensorpads) to a 10-bit input of the sensorpix module. However, in the 353, the sensor has a 12-bit output that goes to the sensorpads module and is processed into a 16-bit output that goes to the sensorpix module. What is the purpose of the extra bits in the 353 design? Is it possible to extract a 10-bit output from the 353 sensorpads module to feed a 333 sensorpix block? It looks like the higher 10 bits of the ipxd[15:0] signal are handled differently from the lower 6 bits (the latter can be forced to 0 with the cnven signal). Are these 10 bits equivalent to the 10-bit output of the 333 sensor?

sensorpads353.v
    ...
    ipxd_pre[ 5:0] <= cnven?6'b0:{ipxded[3:2],pxd14?ipxded[1:0]:2'h0,2'h0};
    ipxd_pre[15:6] <= ipxded[15:6];
    ipxd[15:0] <= ipxd_pre[15:0];
    ...

--JeremViewsurf 03:37, 31 March 2008 (CDT)


The 5 MPix sensor does have a 12-bit output, and all the bits are used during the "gamma"-table conversion to 8 bits. Moreover, the same FPGA code now supports 10347 boards (for Kodak KAI-11002/16000 CCD sensors) that produce 14-bit data.

After the data (10, 12 or 14 bits) reaches the FPGA, it can go two ways: through the LUT (with linear interpolation) for conversion to 8 bits, or raw, in a 2-bytes/pixel format where the pixel data is MSB-aligned (actually bit 14 is the highest; bit 15 is left 0 to allow subtraction without overflow). I believe you should keep that part (from the sensor interface to the SDRAM, including gamma tables and histograms) the same for Theora as it is now in the 353 (for JPEG).--Andrey.filippov 12:09, 31 March 2008 (CDT)




I restored sensorpix353 and sensortrig353 in the Theora design and tested the data path from the sensor to the DMA output. It looks like no data is coming out of mcontr channel 7 (after the token read buffer). The 333 Theora simulation has the same issue, although that design is obviously correct. This might be a flaw in the testbench (?), as I used roughly the same one for both designs. Moreover, I can't simulate beyond 2 ms, and I can't figure out why the simulation stops at that time. Maybe the second stage of the compressor processes data after 2 ms, so it is never reached by the simulator (?)

Datapath.jpg

--JeremViewsurf 05:17, 7 April 2008 (CDT)

Regarding "simulation over 2ms": $stop in line 639 of x353t.tf ends the simulation. Also, one file is missing in CVS: linear1028rgb.dat. Jeremy, could you upload it?--Oleg 11:44, 8 April 2008 (CDT)

The missing file is now in CVS; let me know if anything else is missing. If you want to check the set of signals in my simulation above, use pxdpath.sav in gtkwave. Before $stop in line 639 of x353t.tf there is now a 5 ms delay (#5000000) to let the simulation run, but it stops at 700 µs... --JeremViewsurf 02:24, 9 April 2008 (CDT)

I needed to copy 'defines333.vh' and the '*.dat' files from the theora/ folder.--Oleg 03:59, 10 April 2008 (CDT)

I'm now trying with a 5 ms delay and without $stop after the delay; the simulation is currently at 2.3 ms and still running! I looked at the waves: still no output after ch7 (after mcontr_tok_rd, to be more precise). --JeremViewsurf 03:44, 9 April 2008 (CDT)




At last, I have some output after memory controller channel 7, after 5 h 30 min of simulation... Basically, it seems that a whole frame has to be processed before there is any output. Maybe changing the nstx and nsty parameters (number of blocks in a row and rows in a frame) would reduce simulation time.

Finaldatapath.jpg

--JeremViewsurf 07:53, 11 April 2008 (CDT)

Jeremy, what frame size are you trying to simulate? You should try a minimal one: 32x128. Look at the 333 Theora testbench for more info. --Alexandre.Poltorak 14:28, 11 April 2008 (CDT)

I realized that I had not checked the frame dimensions and was trying to simulate 1280x1024 frames (init_dimensions(9,7)). I changed it to init_dimensions(1,1), thus setting nstx and nsty to 1. I now have output on ch7 after 2 ms, thanks. --JeremViewsurf 04:07, 14 April 2008 (CDT)




I carried out the synthesis of the design with ISE and am able to get the .bit file with some warnings (old constraints file from the 353: some timings not met, some clock warnings and, obviously, unused signals) but no errors. I started looking at the driver and made a first version of the definitions file x3x3.h (see CVS). Some macros at the end may be obsolete, as they control the 353 memory controller. --JeremViewsurf 08:32, 17 April 2008 (CDT)

Problem while compiling with the definitions file x3x3.h:

Making install in packages/os/linux-2.6 for crisv32-axis-linux-gnu
make[2]: Entering directory `/home/larrey/Elphel/353t_clean/elphel353-7.1.7.11/elphel353t/packages/os/linux-2.6-R1_4_2'
subdirs=
  MAKE    /home/larrey/Elphel/353t_clean/elphel353-7.1.7.11/elphel353t/os/linux-2.6 zImage
make[3]: Entering directory `/home/larrey/Elphel/353t_clean/elphel353-7.1.7.11/elphel353t/os/linux-2.6-tag--devboard-R2_10-4'
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CHK     include/linux/compile.h
  CC      arch/cris/arch-v32/drivers/elphel/fpgajtag.o
In file included from arch/cris/arch-v32/drivers/elphel/fpgajtag.c:95:
arch/cris/arch-v32/drivers/elphel/x3x3.h:963:45: "[" may not appear in macro parameter list
make[5]: *** [arch/cris/arch-v32/drivers/elphel/fpgajtag.o] Error 1

At the first occurrence of #include "x3x3.h", some 353 macros containing the "[" character, such as #define X313_IRQSTATE(port_csp0_addr[0x11],port_csp0_addr[0x11]), stop the compilation. --JeremViewsurf 10:48, 21 April 2008 (CDT)

Strange. Could you please use LANG=C, so others will be able to read the error messages too? :) Which GNU/Linux distribution do you use? Which version of gcc-cris? Also, it would be better to sync to the latest CVS; many bugs have been resolved since .11. --Alexandre.Poltorak 12:45, 21 April 2008 (CDT)

In http://elphel.cvs.sourceforge.net/elphel/elphel353-7.1/os/linux-2.6-tag--devboard-R2_10-4/arch/cris/arch-v32/drivers/elphel/x353.h?view=markup, line 1053: #define X313_IRQSTATE (port_csp0_addr[0x11],port_csp0_addr[0x11]), there is a space between "X313_IRQSTATE" and "(" that separates the macro name (a macro with 0 parameters) from the macro body. Where could that space character have disappeared in your example?--Andrey.filippov 14:22, 21 April 2008 (CDT)

Andrey is right, that was just a stupid space problem... By the way, I'll get the latest version of the source code, as I'm working on the driver (well, just trying to understand how it is structured right now...). Thanks for the quick answer. --JeremViewsurf 02:17, 22 April 2008 (CDT)




Working on the driver made me realize that the compressor never reached the end of a frame... It just stopped processing data after a while, and a bunch of signals switched to an undefined state. While working on this issue I realized that I had a problem with clock phases and data was being written to the wrong locations in the SDRAM. Now it looks fixed: I no longer have undefined signals, and the done_compress signal goes high after a frame is processed. The fixed files have been committed.

I am now dealing with interrupts, and I can't make the IRQ signal go high when done_compress is high, as it does in the 333 design. In the 333 design there's a simple interrupt manager that lets the IRQ through according to a mask value. However, I don't understand how the 353 interrupt vector block works... For now I have just connected the interrupt sources (done_compress, reset_compress, etc.) to the irq_in input, but obviously there is more to do to make that work. --JeremViewsurf 03:32, 22 May 2008 (CDT)

By the way, I tried using the old 333 interrupt manager block with the new design, and it seems to work in simulation (IRQ goes high for some time when the frame has finished processing). However, I'm stuck on how to modify the device driver for this design (R_IRQ_MASK0_SET used but undeclared). --JeremViewsurf 09:07, 22 May 2008 (CDT)