Difference between revisions of "Poky migration from rocko to warrior"

From ElphelWiki
Latest revision as of 16:49, 3 October 2019

Elphel's kernel tree

.
├── arch
│   └── arm
│       └── boot
│           └── dts/ # device trees for 393 cameras, considered tested
├── drivers
│   ├── ata
│   │   ├── ahci_elphel.c # tested reading and writing from/to SSD
│   │   └── libata-eh.c
│   ├── char
│   │   └── xilinx_devcfg.c # tested bitstream loading; brought back the old character device driver, which is simpler than the new FPGA Manager that can load only .bit.bin files
│   ├── clk
│   │   └── clk-si5338.c # chip found, no errors
│   ├── elphel
│   │   ├── circbuf.c # tested via recording
│   │   ├── clock10359.c
│   │   ├── command_sequencer.c # ok
│   │   ├── cxi2c.c
│   │   ├── detect_sensors.c
│   │   ├── elphel393-init.c # ok
│   │   ├── elphel393-mem.c # ok
│   │   ├── elphel393-pwr.c # ok
│   │   ├── exif393.c
│   │   ├── fpgajtag353.c
│   │   ├── framepars.c # ok
│   │   ├── gamma_tables.c # affects images which look ok
│   │   ├── histograms.c # displayed
│   │   ├── imu_log393.c
│   │   ├── jpeghead.c
│   │   ├── klogger_393.c
│   │   ├── lepton.c
│   │   ├── mt9f002.c
│   │   ├── mt9x001.c # sensor is programmed correctly
│   │   ├── multi10359.c
│   │   ├── pgm_functions.c # parameters are getting applied correctly (mt9p006)
│   │   ├── quantization_tables.c # images not broken
│   │   ├── sensor_common.c
│   │   ├── sensor_i2c.c
│   │   ├── x393.c
│   │   ├── x393_fpga_functions.c # ok
│   │   └── x393_videomem.c # also used in circbuf => recording => works
│   ├── misc
│   │   ├── ltc3589.c
│   │   └── vsc330x.c # switching between internal and external SSD ports works
│   ├── mmc
│   │   └── host
│   │       └── sdhci.c # this needed chip detect ORed with dat3: SDHCI_ANY_PRESENT = SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT
│   ├── mtd
│   │   └── nand # added functions to work with OTP, tested only reading
│   │       ├── nand_base.c
│   │       ├── nandchip-micron.c
│   │       └── pl35x_nand.c
│   ├── net
│   │   └── ethernet
│   │       └── cadence
│   │           └── macb_main.c # needed fixup for Atheros chip - disable SmartEEE
│   └── rtc
│       └── rtc-m41t80.c # updated to latest version. Our changes only ignore Oscillator failure at boot at m41t80_get_datetime().
├── helpers
│   └── si5338_register_map_dts.py # test it?
├── other
│   └── mem.py
└── patches
    ├── ahci.patch
    ├── drivers-elphel.patch
    ├── garmin_usb.c.patch
    └── libahci.patch

[SOLVED] Note 1: Bring back fpga char device

  • /dev/xdevcfg was retired by Xilinx; its replacement, the FPGA Manager, cannot load a plain *.bit file (only *.bin or *.bit.bin).
  • Solution:
Brought back the old driver (drivers/char/xilinx_devcfg.c, plus edits to Kconfig and Makefile). It works as it used to.

[SOLVED] Note 2: Build php 5.6.40

  • php 5.6.40 is EOL and won't build: mysql has apparently moved its header files.
  • Solution:
Disabled the mysql extension.
Added to meta-elphel393/recipes-devtools/php/php_5.6.%.bbappend:
    PACKAGECONFIG[mysql] = "--without-mysql --without-mysqli --without-pdo-mysql"
    CFLAGS += " -ldl"

[SOLVED] Note 3: Entropy device hwrng

  • New package rng-tools is whining: Failed to init entropy source hwrng
  • Solution:
Leave as is for now. The full log is:
Initalizing available sources
Failed to init entropy source hwrng
Enabling JITTER rng support
Initalizing entropy source jitter
  • Comments:
    • Haven't found out whether Xilinx provides any driver for /dev/hwrng
    • TODO: Find out if the order of entropy sources can be changed
    • That lag at boot is really annoying - 5 secs?!!

[SOLVED] Note 4: PHP causing 'unsupported FP instruction in kernel mode'

  • Kernel Oops:
[   35.872118] BUG: unsupported FP instruction in kernel mode
[   35.877621] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[   35.883380] Modules linked in:
[   35.886498] CPU: 1 PID: 1756 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
[   35.893459] Hardware name: Xilinx Zynq Platform
[   35.897989] task: ee83f280 task.stack: ef1d6000
[   35.902527] PC is at vfp_reload_hw+0x30/0x44
[   35.906802] LR is at __und_usr_fault_32+0x0/0x8
[   35.911338] pc : [<c0102e10>]    lr : [<c010c280>]    psr: a0000013
[   35.917529] sp : ef1d7fb0  ip : 00000051  fp : 00000001
[   35.922813] r10: ef1d61f8  r9 : c010c308  r8 : ee9893c0
[   35.928040] r7 : 00000001  r6 : 00400100  r5 : c0138d08  r4 : ecd600f8
[   35.934569] r3 : c0c6c064  r2 : b67bde8c  r1 : ecd9a224  r0 : eeb00a40
[   35.941098] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   35.948241] Control: 18c5387d  Table: 2cda404a  DAC: 00000051
[   35.953993] Process php (pid: 1756, stack limit = 0xef1d6210)
[   35.959740] Stack: (0xef1d7fb0 to 0xef1d8000)
[   35.964020] 7fa0:                                     a5f43f50 a5f43e18 00000080 00000000
[   35.972269] 7fc0: 00000000 a5f43f4c b687b338 000000ae 00000000 bedcdfe4 00000001 a5f43ffc
[   35.980385] 7fe0: a5f43f50 a5f43d7c b676cf78 b67bde8c 60000010 ffffffff 00000000 00000000
[   35.988626] Code: 128aa080 e89a0162 e3110102 0a000003 (eee96a10) 
[   35.994724] ---[ end trace 06029778db6d2d90 ]---
[   35.999422] note: php[1756] exited with preempt_count 2
  • Unsupported floating point instruction in kernel?
  • Details:
- single sensor (MT9P006) on port 0
- at boot
- after the Oops the camera seems to operate normally
- it occurs randomly (in fewer than 50% of boots) and is easier to reproduce with reboot -f than with a power cycle
- autocampars.php runs at boot and sometimes causes this - it happens after 0 is written to initiate the sensor driver
- fpga is already programmed
- after mt9x001_pgm_initsensor() exit
- autocampars.php log seems to be ok and full
- tested with 2 boards
  • Causes?
 - kernel?
   - some race conditions?
   - huge variables on the stack overflow it in mt9x001.c:mt9x001_pgm_initsensor()
 - php? 
   - too old? the version 5.6.40 is EOL
   - got built with some package that is too new for it? Like it won't build with newer mysql
  • Solution?:
- Took arch/arm/vfp/vfpmodule.c from kernel 4.19 (the current kernel is 4.14). It didn't help. Roll back and check which php call caused it? It might also be a Linux driver.
- Try php 7.x.x - need to update the extension
- Try php 5.6.31 (the one that used to work) - the Oops persists
- On the bright side, at least it's not a kernel panic
- switched from -mfloat-abi=softfp to -mfloat-abi=hard - the problem seems to go away - but is it 100%?
- used kmalloc instead of an automatic variable in mt9x001_pgm_initsensor() - no Oopses so far
  • More notes on debugging
- CONFIG_DEBUG_STACK_USAGE=y
  reports how many bytes are left in the stack for various processes. For that particular process (php) the "bytes left" were 4 on successful boots and
  ~1028 after a huge variable (of 1024 bytes) got moved to the heap.
- Also, there's a warning in Eclipse about the "frame size" being larger than 1024
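
The stack-to-heap change described above can be sketched in user-space C. This is only an analogue: the driver uses kmalloc()/kfree(), while the sketch uses malloc()/free() so that it can run outside the kernel. The function name init_sensor_sketch, the table size and the fill pattern are hypothetical and are not taken from mt9x001.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size, chosen to match the ~1 KiB variable mentioned above. */
#define REG_TABLE_BYTES 1024

/*
 * Hypothetical stand-in for a sensor init routine.
 * Instead of declaring `uint8_t table[REG_TABLE_BYTES];` on the stack,
 * the buffer is allocated on the heap and released before returning.
 * In kernel code the equivalent calls would be kmalloc(..., GFP_KERNEL)
 * and kfree().
 * Returns 0 on success, -1 if the allocation fails.
 */
int init_sensor_sketch(unsigned int *sum_out)
{
    uint8_t *table = malloc(REG_TABLE_BYTES);
    unsigned int sum = 0;
    size_t i;

    if (table == NULL)
        return -1;

    /* Placeholder for real register data. */
    memset(table, 0x01, REG_TABLE_BYTES);
    for (i = 0; i < REG_TABLE_BYTES; i++)
        sum += table[i];

    free(table);
    *sum_out = sum;
    return 0;
}
```

With the buffer on the heap, the function's stack frame no longer carries the 1024 bytes, which is consistent with the "bytes left" figure growing by roughly that amount.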

[SOLVED] Note 5: Bring up NAND OTP support

  • MAC is not read from NAND, displays the default: 00:0e:64:10:00:00
  • Problem?
[    3.639851] elphel393-init: Flash page read, code -95
  • Comments:
    • Look up what has changed.
  • Solution: (for xlnx_rebase_v4.14 branch of linux-xlnx):
In drivers/mtd/nand_base.c, nand_scan_tail() calls nand_manufacturer_init(),
which is mapped to a new driver, drivers/mtd/nand_micron.c.
So when it fails, the driver init fails and the mtd functions do not get assigned.
(And the driver that reads from the OTP area (drivers/elphel/elphel393-init.c) returns
-95, which is EOPNOTSUPP.)
We just need to fall through for a quick fix.
The function exits with an error because it decides that it does not support
forcefully enabled on-die ECC. This needs to be investigated.
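
Kernel code reports failures as negative errno values, which is how the -95 in the log maps to EOPNOTSUPP (95 on Linux for ARM and x86). A minimal user-space sketch of decoding such a code follows; the helper names describe_kernel_error and is_not_supported are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <string.h>

/*
 * Returns a human-readable description for a kernel-style return code,
 * where failures are negative errno values (for example the -95 above).
 */
const char *describe_kernel_error(int code)
{
    if (code >= 0)
        return "no error";
    return strerror(-code);
}

/* Returns 1 if the code means "operation not supported" on this platform. */
int is_not_supported(int code)
{
    return code == -EOPNOTSUPP;
}
```

Comparing against the EOPNOTSUPP macro rather than the literal 95 keeps the check correct on platforms where the number differs.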

[SOLVED] Note 6: udev - unknown group 'kvm'

  • Problem:
[    5.817352] udevd[1478]: starting version 3.2.7
[    5.918028] udevd[1478]: specified group 'kvm' unknown
[    5.986364] udevd[1479]: starting eudev-3.2.7
[    6.142897] udevd[1479]: specified group 'kvm' unknown
  • Solution:
KVM == Kernel-based Virtual Machine. Removed the 'kvm' rule for now (and maybe forever):
.
└── udev
    ├── eudev
    │   └── 50-udev-default.rules
    └── eudev_3.2.7.bbappend
50-udev-default.rules - gets installed over the original file.

[SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver

  • Problem:
The Ethernet driver's structure has changed: it was split into several files.
It lives in drivers/net/ethernet/cadence/
  • Solution:
For our Ethernet chip (Atheros 80xx) a fixup had to be added to disable SmartEEE.
It's a single function, its call and a couple of defines - all added back to the new driver structure.

[SOLVED] Note 8: u-boot update

  • update u-boot
  • solution:
Updated to 2019.07 mainstream u-boot
- converted our *.h (with params used to generate SPL header) to Kconfigs
- updated driver for NAND flash - tested both boot modes - mmc and nand

[SOLVED] Note 9: test camogm

  • test camogm
/var/state/camogm_cmd accepts only the first write - switch to polling?
When switched to polling, the buffer overflows during recording, probably because the polling version does not work correctly.
Everything works in the version without polling after adding an EOF reset (clearerr(npipe)) right after reading from the pipe and checking feof().

[SOLVED] Note 10: test streamer

  • test streamer
Streamer works

[SOLVED] Note 11: test AHCI driver

  • test ahci driver
  • results:
- SSD is detected and automounted
- write/read works

[SOLVED] Note 12: test raw recording

  • test recording on a raw partition
  • comments:
There was a typo in camogm_align.c - it was not aligning when it should have.
CHUNK_LEADER changed to CHUNK_HEADER in line 339:
...
if (chunks[CHUNK_HEADER].iov_len != 0){ // only if it is not TIFF
...

[SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference

  • Solution:
Forgot to pull the latest device tree with the lepton description.
The old device tree didn't have the i2c configuration for lepton, hence something returned NULL.
  • Original log:
framepars_operations elphel393-framepars@0: Configuring compressor DMA channels
circbuf elphel393-circbuf@0: Setting i2c drive mode for port 0
circbuf elphel393-circbuf@0: register_i2c_sensor()
detect_sensors elphel393-detect_sensors@0: detect_sensors_par2addr_init(): sensorPortConfig[0].sensor[0] = 0x44
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ecdb4000
[00000000] *pgd=00000000
Internal error: Oops - BUG: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1755 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
Hardware name: Xilinx Zynq Platform
task: ee80cd80 task.stack: ecda0000
PC is at register_i2c_sensor+0x244/0x2ac
LR is at 0x0
pc : [<c05a19e8>]    lr : [<00000000>]    psr: 60030013
sp : ecda1480  ip : ecda14a8  fp : 00000000
r10: c0ee625c  r9 : 000000fc  r8 : 00000000
r7 : 00000028  r6 : ecda14a8  r5 : c0c3ca58  r4 : 00000000
r3 : 00000000  r2 : c09b093a  r1 : ee973c91  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 18c5387d  Table: 2cdb404a  DAC: 00000051
Process php (pid: 1755, stack limit = 0xecda0210)
Stack: (0xecda1480 to 0xecda2000)
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt

[SOLVED] Note 14: fixdep: Permission denied

  • Description:
- We've had this error for a while, probably since kernel 4.0
- usually happened when running do_compile_kernelmodules
- EXTRA_OEMAKE = "-s -w -B KCFLAGS='-v'"
- That -B forces all targets to be rebuilt, and we also have -j8 (in the PARALLEL_MAKE variable) for the parallel build
  - so during the parallel build fixdep gets rebuilt several times, and at some point
    one of the targets (e.g. sortextable or kallsyms) calls it while fixdep is being compiled and overwritten for another target (probably)
  - the exec rights are correct after the fact
  • Solution:
 Removed -B. It makes fixdep build only once and the problem is gone.
  • Note:
 ~$ make -h
   ...
   -B, --always-make           Unconditionally make all targets.
   ...
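
The note above only guesses at the mechanism. The sketch below shows one way such a window could arise, under the assumption that a rebuilt tool is first written without the execute bit and made executable only as the last step. It does not reproduce the kbuild race itself; the names exec_check and rebuild_helper_sketch are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 0 if `path` can be executed by the caller, otherwise the errno
 * value reported by access() (EACCES corresponds to "Permission denied").
 */
int exec_check(const char *path)
{
    return access(path, X_OK) == 0 ? 0 : errno;
}

/*
 * Mimics a build step that produces a helper in two stages (an assumption):
 * the file is created without execute permission, and only the final
 * chmod() makes it executable.
 * `mid_build_status` receives what a parallel job calling the helper
 * between the two stages would see.
 * Returns 0 on success, -1 on failure.
 */
int rebuild_helper_sketch(const char *path, int *mid_build_status)
{
    static const char stub[] = "#!/bin/sh\n";
    int fd;

    unlink(path); /* start from a fresh file; a missing file is not an error */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, stub, sizeof stub - 1) != (ssize_t)(sizeof stub - 1)) {
        close(fd);
        return -1;
    }
    close(fd);

    *mid_build_status = exec_check(path); /* window seen by a concurrent job */

    return chmod(path, 0755);
}
```

A check made after the function returns succeeds, which would match the observation that the exec rights are correct after the fact. Building the helper only once, as happens without -B, removes the window.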