Difference between revisions of "Poky migration from rocko to warrior"


Revision as of 11:06, 10 September 2019

Elphel's kernel tree

.
├── arch
│   └── arm
│       └── boot
│           └── dts/ # device trees for 393 cameras, considered tested
├── drivers
│   ├── ata
│   │   ├── ahci_elphel.c # tested reading and writing from/to SSD
│   │   └── libata-eh.c
│   ├── char
│   │   └── xilinx_devcfg.c # tested bitstream loading - brought back the old character device driver, which is simpler than the new FPGA Manager that can load only .bit.bin files
│   ├── clk
│   │   └── clk-si5338.c # chip found, no errors
│   ├── elphel
│   │   ├── circbuf.c # tested via recording
│   │   ├── clock10359.c
│   │   ├── command_sequencer.c # ok
│   │   ├── cxi2c.c
│   │   ├── detect_sensors.c
│   │   ├── elphel393-init.c # ok
│   │   ├── elphel393-mem.c # ok
│   │   ├── elphel393-pwr.c # ok
│   │   ├── exif393.c
│   │   ├── fpgajtag353.c
│   │   ├── framepars.c # ok
│   │   ├── gamma_tables.c # affects images which look ok
│   │   ├── histograms.c # displayed
│   │   ├── imu_log393.c
│   │   ├── jpeghead.c
│   │   ├── klogger_393.c
│   │   ├── lepton.c
│   │   ├── mt9f002.c
│   │   ├── mt9x001.c # sensor is programmed correctly
│   │   ├── multi10359.c
│   │   ├── pgm_functions.c # parameters are getting applied correctly (mt9p006)
│   │   ├── quantization_tables.c # images not broken
│   │   ├── sensor_common.c
│   │   ├── sensor_i2c.c
│   │   ├── x393.c
│   │   ├── x393_fpga_functions.c # ok
│   │   └── x393_videomem.c # also used in circbuf => recording => works
│   ├── misc
│   │   ├── ltc3589.c
│   │   └── vsc330x.c # switching between internal and external SSD ports works
│   ├── mmc
│   │   └── host
│   │       └── sdhci.c # this needed chip detect ORed with dat3: SDHCI_ANY_PRESENT = SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT
│   ├── mtd
│   │   └── nand # added functions to work with OTP, tested only reading
│   │       ├── nand_base.c
│   │       ├── nandchip-micron.c
│   │       └── pl35x_nand.c
│   ├── net
│   │   └── ethernet
│   │       └── cadence
│   │           └── macb_main.c # needed fixup for Atheros chip - disable SmartEEE
│   └── rtc
│       └── rtc-m41t80.c # updated to the latest version. Our only change is to ignore the oscillator failure at boot in m41t80_get_datetime().
├── helpers
│   └── si5338_register_map_dts.py # test it?
├── other
│   └── mem.py
└── patches
    ├── ahci.patch
    ├── drivers-elphel.patch
    ├── garmin_usb.c.patch
    └── libahci.patch
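
The sdhci.c comment in the tree above (chip detect ORed with dat3) can be sketched as a plain bit mask. This is only an illustration: the bit positions follow the SD Host Controller "Present State" register layout, the value given to SDHCI_DAT3_PRESENT is an assumption, and card_present() is a hypothetical helper that is not part of the driver.

```c
#include <stdint.h>

/* Present State register bits.
 * SDHCI_CARD_PRESENT matches the mainline sdhci.h value.
 * SDHCI_DAT3_PRESENT is an ASSUMED value (DAT[3] line level, bit 23). */
#define SDHCI_CARD_PRESENT 0x00010000u
#define SDHCI_DAT3_PRESENT 0x00800000u
#define SDHCI_ANY_PRESENT  (SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT)

/* Hypothetical helper: report a card if either the card-detect bit
 * or the DAT3 level bit is set in the Present State value. */
int card_present(uint32_t present_state)
{
    return (present_state & SDHCI_ANY_PRESENT) != 0;
}
```

With this mask a card is reported when only the DAT3 level is high, which is the case the original chip-detect check would miss.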

[SOLVED] Note 1: Bring back fpga char device

  • /dev/xdevcfg got retired by Xilinx - instead there's the FPGA 'Manager', which is unable to load a plain *.bit (only *.bin or *.bit.bin).
  • Solution:
Brought back the old driver (drivers/char/xilinx_devcfg.c, plus edits to Kconfig and Makefile) - it works as it used to.

[SOLVED] Note 2: Build php 5.6.40

  • php 5.6.40 is EOL and won't build - mysql supposedly moved its header files.
  • Solution:
Disabled the mysql extension.
Added to meta-elphel393/recipes-devtools/php/php_5.6.%.bbappend:
    PACKAGECONFIG[mysql] = "--without-mysql --without-mysqli --without-pdo-mysql"
    CFLAGS += " -ldl"

[SOLVED] Note 3: Entropy device hwrng

  • New package rng-tools is whining: Failed to init entropy source hwrng
  • Solution:
Leave as is for now. The full log is:
Initalizing available sources
Failed to init entropy source hwrng
Enabling JITTER rng support
Initalizing entropy source jitter
  • Comments:
    • Haven't found if Xilinx uses any driver for /dev/hwrng
    • TODO: Find out if the order of entropy sources can be changed
    • That lag at boot is really annoying - 5 secs?!!

[SOLVED] Note 4: PHP causing 'unsupported FP instruction in kernel mode'

  • Kernel Oops:
[   35.872118] BUG: unsupported FP instruction in kernel mode
[   35.877621] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[   35.883380] Modules linked in:
[   35.886498] CPU: 1 PID: 1756 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
[   35.893459] Hardware name: Xilinx Zynq Platform
[   35.897989] task: ee83f280 task.stack: ef1d6000
[   35.902527] PC is at vfp_reload_hw+0x30/0x44
[   35.906802] LR is at __und_usr_fault_32+0x0/0x8
[   35.911338] pc : [<c0102e10>]    lr : [<c010c280>]    psr: a0000013
[   35.917529] sp : ef1d7fb0  ip : 00000051  fp : 00000001
[   35.922813] r10: ef1d61f8  r9 : c010c308  r8 : ee9893c0
[   35.928040] r7 : 00000001  r6 : 00400100  r5 : c0138d08  r4 : ecd600f8
[   35.934569] r3 : c0c6c064  r2 : b67bde8c  r1 : ecd9a224  r0 : eeb00a40
[   35.941098] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   35.948241] Control: 18c5387d  Table: 2cda404a  DAC: 00000051
[   35.953993] Process php (pid: 1756, stack limit = 0xef1d6210)
[   35.959740] Stack: (0xef1d7fb0 to 0xef1d8000)
[   35.964020] 7fa0:                                     a5f43f50 a5f43e18 00000080 00000000
[   35.972269] 7fc0: 00000000 a5f43f4c b687b338 000000ae 00000000 bedcdfe4 00000001 a5f43ffc
[   35.980385] 7fe0: a5f43f50 a5f43d7c b676cf78 b67bde8c 60000010 ffffffff 00000000 00000000
[   35.988626] Code: 128aa080 e89a0162 e3110102 0a000003 (eee96a10) 
[   35.994724] ---[ end trace 06029778db6d2d90 ]---
[   35.999422] note: php[1756] exited with preempt_count 2
  • Unsupported floating point instruction in kernel?
  • Details:
- single sensor (MT9P006) on port 0
- at boot
- after the Oops the camera seems to be operating normally
- it occurs randomly (in under 50% of boots) and is easier to reproduce with reboot -f than with a power cycle
- autocampars.php runs at boot and sometimes causes this - it happens after 0 is written to initiate sensors' driver
- fpga is already programmed
- after mt9x001_pgm_initsensor() exit
- autocampars.php log seems to be ok and full
- tested with 2 boards
  • Causes?
 - kernel?
   - some race conditions?
   - huge variables on the stack overflow it in mt9x001.c:mt9x001_pgm_initsensor()
 - php? 
   - too old? the version 5.6.40 is EOL
   - got built with some package that is too new for it? Like it won't build with newer mysql
  • Solution?:
- Took arch/arm/vfp/vfpmodule.c from kernel 4.19 (the current kernel is 4.14). It didn't help. Roll back and check which php call causes it? It might also be a Linux driver.
- Try php 7.x.x - need to update the extension
- Try php 5.6.31 (the one that used to work) - Oops persists
- On the bright side, at least it's not a kernel panic
- switched from -mfloat-abi=softfp to -mfloat-abi=hard - the problem seems to go away - but is it 100%?
- used kmalloc instead of auto variable in mt9x001_pgm_initsensor() - no Oopses so far
  • More notes on debugging
- CONFIG_DEBUG_STACK_USAGE=y
  reports how many bytes are left on the stack for various processes. For that particular process (php) the "bytes left" were 4 on successful boots and
  ~1028 after a huge variable (of 1024 bytes) got moved to the heap.
- Also there's a warning in Eclipse about "frame size" being larger than 1024
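
The stack-to-heap change can be sketched in user space as below. This is a minimal illustration, not the actual mt9x001_pgm_initsensor() code: the function names, the buffer name and the fill pattern are hypothetical, and malloc()/free() stand in for what would be kmalloc(size, GFP_KERNEL)/kfree() in the driver.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_BYTES 1024 /* size of the "huge variable" from the note */

/* Before: the buffer is an auto variable, so 1024 bytes are taken
 * from the stack frame of the function (hypothetical example body). */
unsigned init_on_stack(void)
{
    unsigned char regs[TABLE_BYTES];

    memset(regs, 0x5a, sizeof regs);
    return (unsigned)regs[0] + regs[TABLE_BYTES - 1];
}

/* After: the buffer lives on the heap and only a pointer stays on the stack.
 * In kernel code this would be kmalloc()/kfree(); malloc()/free() are
 * user-space stand-ins. Returns 0 on success, -1 if allocation fails. */
int init_on_heap(unsigned *sum)
{
    unsigned char *regs = malloc(TABLE_BYTES);

    if (regs == NULL)
        return -1;
    memset(regs, 0x5a, TABLE_BYTES);
    *sum = (unsigned)regs[0] + regs[TABLE_BYTES - 1];
    free(regs);
    return 0;
}
```

Both variants compute the same result; the difference is only where the 1024 bytes come from, which is consistent with the "bytes left" figure growing by roughly the size of the moved variable.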

[SOLVED] Note 5: Bring up NAND OTP support

  • MAC is not read from NAND, displays the default: 00:0e:64:10:00:00
  • Problem?
[    3.639851] elphel393-init: Flash page read, code -95
  • Comments:
    • Look up what has changed.
  • Solution: (for xlnx_rebase_v4.14 branch of linux-xlnx):
In drivers/mtd/nand_base.c, nand_scan_tail() calls nand_manufacturer_init(),
which is mapped to a new driver, drivers/mtd/nand_micron.c.
So when it fails, the driver init fails and the mtd functions do not get assigned.
(And the driver (drivers/elphel/elphel393_init.c) that reads from the OTP area returns
-95, which is EOPNOTSUPP.)
As a quick fix we just need to fall through on that error.
That function exits with an error because it decides that it does not support
forcefully enabled on-die ECC. This still needs to be investigated.
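
The failure chain and the fall-through fix can be modelled in user space as follows. This is a simplified sketch and not the kernel code: struct model_mtd and all model_* functions are hypothetical stand-ins for the mtd structure, the manufacturer init, the scan tail and the OTP reader.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the mtd structure: only the OTP read hook. */
struct model_mtd {
    int (*read_otp)(unsigned char *buf, size_t len);
};

/* Placeholder OTP read: fills the buffer with zeros and succeeds. */
static int model_read_otp(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 0;
    return 0;
}

/* Stand-in for the manufacturer init that rejects the ECC setup. */
static int model_manufacturer_init(void)
{
    return -EOPNOTSUPP; /* -95 on Linux */
}

/* Model of the scan tail: without fall_through the error aborts the scan
 * and the hook is never assigned; with it the error is ignored. */
int model_scan_tail(struct model_mtd *mtd, int fall_through)
{
    int ret = model_manufacturer_init();

    if (ret && !fall_through)
        return ret;
    mtd->read_otp = model_read_otp;
    return 0;
}

/* Model of the OTP reader: an unassigned hook yields -EOPNOTSUPP. */
int model_otp_read(struct model_mtd *mtd, unsigned char *buf, size_t len)
{
    if (mtd->read_otp == NULL)
        return -EOPNOTSUPP;
    return mtd->read_otp(buf, len);
}
```

In this model, calling model_scan_tail() with fall_through set to 0 leaves read_otp unassigned, so model_otp_read() returns -EOPNOTSUPP, matching the "code -95" in the log; with fall_through set to 1 the read succeeds.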

[SOLVED] Note 6: udev - unknown group 'kvm'

  • Problem:
[    5.817352] udevd[1478]: starting version 3.2.7
[    5.918028] udevd[1478]: specified group 'kvm' unknown
[    5.986364] udevd[1479]: starting eudev-3.2.7
[    6.142897] udevd[1479]: specified group 'kvm' unknown
  • Solution:
KVM == Kernel-based Virtual Machine. Remove for now (and maybe forever)
.
└── udev
    ├── eudev
    │   └── 50-udev-default.rules
    └── eudev_3.2.7.bbappend
50-udev-default.rules - gets installed over the original file.

[SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver

  • Problem:
Ethernet driver's structure has changed: it was split into several files.
It lives at drivers/net/ethernet/cadence/
  • Solution:
For our Ethernet chip (Atheros 80xx) a fixup had to be added to disable SmartEEE.
It's a single function, a call and a couple of defines - all added back to the new driver structure.

[SOLVED] Note 8: u-boot update

  • update u-boot
  • solution:
Updated to 2019.07 mainstream u-boot
- converted our *.h (with params used to generate SPL header) to Kconfigs
- updated driver for NAND flash - tested both boot modes - mmc and nand

[SOLVED] Note 9: test camogm

  • test camogm
/var/state/camogm_cmd accepts only the first write - switch to polling?
With polling, the buffer overflows during recording, probably because the polling version does not work correctly.
Everything works in the version without polling after adding an EOF reset (clearerr(npipe)) right after reading from the pipe and checking feof().
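
The EOF reset can be sketched as below. This is a minimal illustration and not the camogm source: read_command() is a hypothetical helper, and the stream is assumed to be already opened on the command pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one command line from an open stream.
 * Returns 1 if a line was read into buf, 0 if nothing is available.
 * When the writer closes the pipe the stream hits EOF; without
 * clearerr() the EOF flag stays set and later reads keep failing. */
int read_command(FILE *npipe, char *buf, size_t len)
{
    assert(npipe != NULL && buf != NULL && len > 0);

    if (fgets(buf, (int)len, npipe) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0'; /* strip the line ending */
        return 1;
    }
    if (feof(npipe))
        clearerr(npipe); /* reset EOF so the next write can be read */
    return 0;
}
```

After a call that returns 0, feof() on the stream reports false again, so the next command written to the pipe can be picked up by the same stream.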

[SOLVED] Note 10: test streamer

  • test streamer
Streamer works

[SOLVED] Note 11: test AHCI driver

  • test ahci driver
  • results:
- SSD is detected and automounted
- write/read works

[SOLVED] Note 12: test raw recording

  • test recording on a raw partition
  • comments:
There was a typo in camogm_align.c - it was not aligning when it should have.
CHUNK_LEADER changed to CHUNK_HEADER in line 339:
...
if (chunks[CHUNK_HEADER].iov_len != 0){ // only if it is not TIFF
...

[SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference

  • Solution:
Forgot to pull the latest device tree with the Lepton description.
The old device tree didn't have the i2c configuration for Lepton, hence something returned NULL.
  • Original log:
framepars_operations elphel393-framepars@0: Configuring compressor DMA channels
circbuf elphel393-circbuf@0: Setting i2c drive mode for port 0
circbuf elphel393-circbuf@0: register_i2c_sensor()
detect_sensors elphel393-detect_sensors@0: detect_sensors_par2addr_init(): sensorPortConfig[0].sensor[0] = 0x44
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ecdb4000
[00000000] *pgd=00000000
Internal error: Oops - BUG: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1755 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
Hardware name: Xilinx Zynq Platform
task: ee80cd80 task.stack: ecda0000
PC is at register_i2c_sensor+0x244/0x2ac
LR is at 0x0
pc : [<c05a19e8>]    lr : [<00000000>]    psr: 60030013
sp : ecda1480  ip : ecda14a8  fp : 00000000
r10: c0ee625c  r9 : 000000fc  r8 : 00000000
r7 : 00000028  r6 : ecda14a8  r5 : c0c3ca58  r4 : 00000000
r3 : 00000000  r2 : c09b093a  r1 : ee973c91  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 18c5387d  Table: 2cdb404a  DAC: 00000051
Process php (pid: 1755, stack limit = 0xecda0210)
Stack: (0xecda1480 to 0xecda2000)
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt