Difference between revisions of "Axis DMA Bug"

From ElphelWiki
Latest revision as of 02:02, 8 February 2012

About

There is a hardware bug in the Axis CPU/peripherals that prevents DMA from working properly on 10353 boards. Axis engineers said that the bug can be mitigated in software/drivers and gave some hints on how to do that.

Details

Andrey reported some of his findings in the article: http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Open-source-camera-records-geotagged-video-to-SATA-HDD/


Mechanical challenges turned out to be not the only ones waiting for me when I worked on connecting the CF cards to the camera. These cards were hanging when the CPU tried to read them using DMA mode (and the card identified itself as supporting DMA mode). I tried to find the problem, and used all the tools I had. I added a bunch of printk's to the driver source, tried different speed settings for the DMA, and finally used an oscilloscope to spy on the signals between the CF card and the CPU. What I found was that the card did actually send the data using DMA mode, but always only for two "sectors" (1024 bytes total), regardless of the number of blocks to transfer written to the corresponding register. Then it silently hung, without activating an IRQ line, even if it was asked to transfer just a single block. And the CPU was relying on that interrupt to continue with the processing of the data read from the CF card. Careful examination of the data on the IDE bus did not reveal any problems (I was expecting something specific to the ETRAX). The same CF card with the DMA mode disabled in the driver worked fine (but slower, of course), as did the IDE hard drive (or SATA through the bridge) with DMA enabled. Googling the issue showed that I'm not the first to have problems with CF cards and DMA. The driver itself had a blacklist for some of the devices that caused problems.


Next thing was to try different CF cards, and see if the problem persisted. I went to newegg.com and ordered nine more, choosing various brands and models. Only one of them, the Sandisk Extreme(R) III 2.0GB, worked. All others exhibited the same behavior described above. So I opened up the first card (QMemory 16GB) to see what kind of a controller chip they use. It turned out to be a Silicon Motion SM222TF. I saw the same problem in a Transcend 32GB card with a SM223TF controller. Exhausting all my ideas, I emailed to the customer support address of the chip manufacturer. At first they were trying to redirect me to the card manufacturer, then admitted that "we have some firmware issue with DMA mode in the past, and I am not sure if the card you have can support DMA mode properly." But when I sent the detailed description of the behavior of the chip, asking if there is any way to mitigate the problem, they just stopped responding to my emails completely.


I still think it is possible to make the cards based on these chips work in the DMA mode without any possible secret commands or other tricks (if they actually exist), by modifying the behavior of the driver. Make it limit the transfers to just 2 blocks (rather easy -- I already tried that). And, more importantly, remove the dependency on the IRQ generated at the end of the transfer, by using the internally (in ETRAX or system DMA controller) generated interrupt when the required number of bytes were transferred -- however, that is a much larger driver modification that I did not have a chance to work on. It should be applicable to other architectures too, not just to the ETRAX FS IDE controller. This is my excuse for going into such detail describing the nature of the CF cards DMA problems.
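
The simpler of the two driver changes described above, limiting each DMA transfer to two blocks, could look roughly like the sketch below. It is not taken from the actual ETRAX IDE driver: the `dma_chunk` structure, the `split_dma_request` function, and the `MAX_SECTORS_PER_DMA` constant are hypothetical names used only for illustration, and the two-sector limit is an assumption based on the oscilloscope observation quoted above. The sketch only shows how a larger request might be cut into pieces; it does not address the missing end-of-transfer IRQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the affected cards stall after two 512-byte sectors. */
#define MAX_SECTORS_PER_DMA 2u

/* Hypothetical descriptor for one DMA command; not a kernel structure. */
struct dma_chunk {
    uint32_t lba;   /* first sector of this chunk */
    uint32_t nsect; /* sectors in this chunk, 1..MAX_SECTORS_PER_DMA */
};

/*
 * Split a request of `nsect` sectors starting at `lba` into chunks of at
 * most MAX_SECTORS_PER_DMA sectors. Up to `cap` chunks are written to
 * `out`; the return value is the number of chunks the whole request needs,
 * which may exceed `cap`.
 */
size_t split_dma_request(uint32_t lba, uint32_t nsect,
                         struct dma_chunk *out, size_t cap)
{
    size_t count = 0;

    assert(out != NULL || cap == 0);
    while (nsect > 0) {
        uint32_t n = nsect < MAX_SECTORS_PER_DMA ? nsect : MAX_SECTORS_PER_DMA;

        if (count < cap) {
            out[count].lba = lba;
            out[count].nsect = n;
        }
        lba += n;
        nsect -= n;
        count++;
    }
    return count;
}
```

With this sketch, a five-sector read starting at sector 100 would be issued as three commands covering sectors 100-101, 102-103, and 104. A real driver would also have to wait for each chunk to complete before starting the next one.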


Devices support some of three transfer modes:

1) PIO (slow, and loads the CPU heavily)
2) DMA (fast)
3) UDMA (even faster)

Most CF cards support only modes 1 and 3 but respond as if they also support mode 2. The ETRAX processor has a bug in its UDMA implementation, so UDMA is disabled, which means that in the camera only modes 1 and 2 are available. When the camera detects the disk, it reads the device capabilities (1, 2, 3) and selects the fastest mode it can support, which is 2. At that point the CF card controller's false claim comes to light, and the camera fails to communicate with the disk. The driver contains a blacklist of such cards, and we have already added some cards to it: http://elphel.cvs.sourceforge.net/viewvc/elphel/elphel353-8.0/os/linux-2.6-tag--devboard-R2_10-4/drivers/ide/ide-dma.c?view=markup
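
The mode selection and blacklist check described above can be illustrated with the following sketch. It is a simplified model, not the code in ide-dma.c: the capability flags, the `select_mode` and `is_dma_blacklisted` functions, and the model strings in the list are all hypothetical placeholders. It assumes the host and the card each report a set of supported modes and that blacklisted cards are matched by their exact model string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum xfer_mode { MODE_PIO = 1, MODE_DMA = 2, MODE_UDMA = 3 };

/* Hypothetical capability bits, one per mode. */
#define CAP_PIO  (1u << 0)
#define CAP_DMA  (1u << 1)
#define CAP_UDMA (1u << 2)

/* Placeholder model strings; the real list lives in the driver source. */
static const char *const dma_blacklist[] = {
    "EXAMPLE CARD A",
    "EXAMPLE CARD B",
};

/* Return true if the model string matches a blacklist entry exactly. */
static bool is_dma_blacklisted(const char *model)
{
    size_t i;

    for (i = 0; i < sizeof(dma_blacklist) / sizeof(dma_blacklist[0]); i++) {
        if (strcmp(model, dma_blacklist[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Pick the fastest mode supported by both the card and the host.
 * Plain DMA is skipped for blacklisted cards, which then fall back to PIO.
 */
enum xfer_mode select_mode(const char *model, unsigned card_caps,
                           unsigned host_caps)
{
    unsigned common = card_caps & host_caps;

    if (common & CAP_UDMA)
        return MODE_UDMA;
    if ((common & CAP_DMA) && !is_dma_blacklisted(model))
        return MODE_DMA;
    return MODE_PIO;
}
```

On the camera, the host side would be passed as `CAP_PIO | CAP_DMA` because UDMA is disabled. A card that claims all three modes then gets mode 2 unless its model string is on the list, in which case it falls back to mode 1.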


Information and Sources to verify

http://developer.axis.com/wiki/lib/exe/fetch.php?media=axis:etrax_fs_errata-1.1.txt

http://mhonarc.axis.se/dev-etrax/msg00275.html <- Is this related to our bug?

http://mhonarc.axis.se/dev-etrax/threads.html has some more DMA threads (such as http://mhonarc.axis.se/dev-etrax/msg01157.html) that could be of interest.