Difference between revisions of "Poky migration from rocko to warrior"
From ElphelWiki
Latest revision as of 15:49, 3 October 2019
Contents
- 1 Elphel's kernel tree
- 2 [SOLVED] Note 1: Bring back fpga char device
- 3 [SOLVED] Note 2: Build php 5.6.40
- 4 [SOLVED] Note 3: Entropy device hwrng
- 5 [SOLVED] Note 4: PHP causing 'unsupported FP instruction in kernel mode'
- 6 [SOLVED] Note 5: Bring up NAND OTP support
- 7 [SOLVED] Note 6: udev - unknown group 'kvm'
- 8 [SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver
- 9 [SOLVED] Note 8: u-boot update
- 10 [SOLVED] Note 9: test camogm
- 11 [SOLVED] Note 10: test streamer
- 12 [SOLVED] Note 11: test AHCI driver
- 13 [SOLVED] Note 12: test raw recording
- 14 [SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference
- 15 [SOLVED] Note 14: fixdep: Permission denied
Elphel's kernel tree
.
├── arch
│   └── arm
│       └── boot
│           └── dts/ # device trees for 393 cameras, considering tested
├── drivers
│   ├── ata
│   │   ├── ahci_elphel.c # tested reading and writing from/to SSD
│   │   └── libata-eh.c
│   ├── char
│   │   └── xilinx_devcfg.c # tested bitstream loading - brought back the old character device driver, it's simpler this way than the new one FPGA manager that can load only .bit.bin files
│   ├── clk
│   │   └── clk-si5338.c # chip found, no errors
│   ├── elphel
│   │   ├── circbuf.c # tested via recording
│   │   ├── clock10359.c
│   │   ├── command_sequencer.c # ok
│   │   ├── cxi2c.c
│   │   ├── detect_sensors.c
│   │   ├── elphel393-init.c # ok
│   │   ├── elphel393-mem.c # ok
│   │   ├── elphel393-pwr.c # ok
│   │   ├── exif393.c
│   │   ├── fpgajtag353.c
│   │   ├── framepars.c # ok
│   │   ├── gamma_tables.c # affects images which look ok
│   │   ├── histograms.c # displayed
│   │   ├── imu_log393.c
│   │   ├── jpeghead.c
│   │   ├── klogger_393.c
│   │   ├── lepton.c
│   │   ├── mt9f002.c
│   │   ├── mt9x001.c # sensor is programmed correctly
│   │   ├── multi10359.c
│   │   ├── pgm_functions.c # parameters are getting applied correctly (mt9p006)
│   │   ├── quantization_tables.c # images not broken
│   │   ├── sensor_common.c
│   │   ├── sensor_i2c.c
│   │   ├── x393.c
│   │   ├── x393_fpga_functions.c # ok
│   │   └── x393_videomem.c # also used in circbuf => recording => works
│   ├── misc
│   │   ├── ltc3589.c
│   │   └── vsc330x.c # switching between internal and external SSD ports works
│   ├── mmc
│   │   └── host
│   │       └── sdhci.c # this needed chip detect ORed with dat3: SDHCI_ANY_PRESENT = SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT
│   ├── mtd
│   │   └── nand # added functions to work with OTP, tested only reading
│   │       ├── nand_base.c
│   │       ├── nandchip-micron.c
│   │       └── pl35x_nand.c
│   ├── net
│   │   └── ethernet
│   │       └── cadence
│   │           └── macb_main.c # needed fixup for Atheros chip - disable SmartEEE
│   └── rtc
│       └── rtc-m41t80.c # updated to latest version. Our changes only ignore Oscillator failure at boot at m41t80_get_datetime().
├── helpers
│   └── si5338_register_map_dts.py # test it?
├── other
│   └── mem.py
└── patches
    ├── ahci.patch
    ├── drivers-elphel.patch
    ├── garmin_usb.c.patch
    └── libahci.patch
[SOLVED] Note 1: Bring back fpga char device
- /dev/xdevcfg got retired by Xilinx - instead there's the FPGA 'Manager', which is unable to load a simple *.bit (only *.bin or *.bit.bin).
- Solution:
Brought back the old driver (drivers/char/xilinx_devcfg.c, plus edits to Kconfig and Makefile) - it works as it used to.
[SOLVED] Note 2: Build php 5.6.40
- php 5.6.40 - EOL and won't build - mysql supposedly moved header files.
- Solution:
Disabled mysql extension. Added to meta-elphel393/recipes-devtools/php/php_5.6.%.bbappend:
PACKAGECONFIG[mysql] = "--without-mysql --without-mysqli --without-pdo-mysql"
CFLAGS += " -ldl"
[SOLVED] Note 3: Entropy device hwrng
- New package rng-tools is whining: Failed to init entropy source hwrng
- Solution:
Leave as is for now. The full log is:
Initalizing available sources
Failed to init entropy source hwrng
Enabling JITTER rng support
Initalizing entropy source jitter
- Comments:
- Haven't found out whether Xilinx uses any driver for /dev/hwrng
- TODO: Find out if the order of entropy sources can be changed
- That lag at boot is really annoying - 5 secs?!!
[SOLVED] Note 4: PHP causing 'unsupported FP instruction in kernel mode'
- Kernel Oops:
[ 35.872118] BUG: unsupported FP instruction in kernel mode
[ 35.877621] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 35.883380] Modules linked in:
[ 35.886498] CPU: 1 PID: 1756 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
[ 35.893459] Hardware name: Xilinx Zynq Platform
[ 35.897989] task: ee83f280 task.stack: ef1d6000
[ 35.902527] PC is at vfp_reload_hw+0x30/0x44
[ 35.906802] LR is at __und_usr_fault_32+0x0/0x8
[ 35.911338] pc : [<c0102e10>] lr : [<c010c280>] psr: a0000013
[ 35.917529] sp : ef1d7fb0 ip : 00000051 fp : 00000001
[ 35.922813] r10: ef1d61f8 r9 : c010c308 r8 : ee9893c0
[ 35.928040] r7 : 00000001 r6 : 00400100 r5 : c0138d08 r4 : ecd600f8
[ 35.934569] r3 : c0c6c064 r2 : b67bde8c r1 : ecd9a224 r0 : eeb00a40
[ 35.941098] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 35.948241] Control: 18c5387d Table: 2cda404a DAC: 00000051
[ 35.953993] Process php (pid: 1756, stack limit = 0xef1d6210)
[ 35.959740] Stack: (0xef1d7fb0 to 0xef1d8000)
[ 35.964020] 7fa0: a5f43f50 a5f43e18 00000080 00000000
[ 35.972269] 7fc0: 00000000 a5f43f4c b687b338 000000ae 00000000 bedcdfe4 00000001 a5f43ffc
[ 35.980385] 7fe0: a5f43f50 a5f43d7c b676cf78 b67bde8c 60000010 ffffffff 00000000 00000000
[ 35.988626] Code: 128aa080 e89a0162 e3110102 0a000003 (eee96a10)
[ 35.994724] ---[ end trace 06029778db6d2d90 ]---
[ 35.999422] note: php[1756] exited with preempt_count 2
- Unsupported floating point instruction in kernel?
- Details:
  - single sensor (MT9P006) on port 0
  - at boot
  - after the Oops the camera seems to be operating normally
  - the appearance is random (but <50%) - easier to reproduce with reboot -f than with a power cycle
  - autocampars.php runs at boot and sometimes causes this - it happens after 0 is written to initiate sensors' driver
  - fpga is already programmed
  - after mt9x001_pgm_initsensor() exit
  - autocampars.php log seems to be ok and full
  - tested with 2 boards
- Causes?
  - kernel?
  - (crossed out) some race conditions?
  - huge variables in the stack overflow it at mt9x001.c:mt9x001_pgm_initsensor()
  - (crossed out) php?
    - too old? Version 5.6.40 is EOL
    - got built with some package that is too new for it? E.g. it won't build with the newer mysql
- Solution?:
  - (crossed out) Took arch/vfp/vfpmodule.c from kernel 4.19 (the current one was 4.14). It didn't work. Roll back and check which php call caused it? It also might be a Linux driver.
  - (crossed out) Try php 7.x.x - need to update the extension
  - (crossed out) Try php 5.6.31 (the one that used to work) - Oops persists
  - (crossed out) On the bright side, at least it's not a kernel panic
  - (crossed out) Switched from -mfloat-abi=softfp to -mfloat-abi=hard - the problem seems to go away - but is it 100%?
  - Used kmalloc instead of an auto variable in mt9x001_pgm_initsensor() - no Oopses so far
- More notes on debugging
  - CONFIG_DEBUG_STACK_USAGE=y reports how many bytes are left in the stack for various processes. For that particular process (php) the "bytes left" were 4 on successful boots and ~1028 after a huge variable (of 1024 bytes) was moved to the heap.
  - There's also a warning in Eclipse about the "frame size" being larger than 1024
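The stack-to-heap change noted above can be sketched in user space roughly as follows. This is not the driver code: the function names and REG_TABLE_BYTES are hypothetical, and malloc()/free() only stand in for the kernel's kmalloc(size, GFP_KERNEL)/kfree(), on the assumption that the large automatic variable was a plain 1024-byte buffer.

```c
#include <stdlib.h>
#include <string.h>

#define REG_TABLE_BYTES 1024 /* hypothetical size of the large buffer */

/* Before: the whole 1024-byte buffer lives in this function's stack frame. */
static int init_sensor_on_stack(unsigned char seed)
{
    unsigned char table[REG_TABLE_BYTES]; /* large automatic variable */

    memset(table, seed, sizeof(table));
    return table[REG_TABLE_BYTES - 1];
}

/* After: only a pointer is kept on the stack, the buffer is on the heap. */
static int init_sensor_on_heap(unsigned char seed)
{
    /* In kernel code this would be kmalloc(REG_TABLE_BYTES, GFP_KERNEL). */
    unsigned char *table = malloc(REG_TABLE_BYTES);
    int last;

    if (table == NULL)
        return -1; /* kernel code would typically return -ENOMEM */

    memset(table, seed, REG_TABLE_BYTES);
    last = table[REG_TABLE_BYTES - 1];
    free(table); /* kernel code: kfree(table) */
    return last;
}
```

Both variants return the same value; the difference is only where the 1024 bytes come from, which is what matters when the kernel stack has just a few bytes to spare.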
[SOLVED] Note 5: Bring up NAND OTP support
- MAC is not read from NAND, displays the default: 00:0e:64:10:00:00
- Problem?
[ 3.639851] elphel393-init: Flash page read, code -95
- Comments:
- Look up what has changed.
- Solution: (for xlnx_rebase_v4.14 branch of linux-xlnx):
In drivers/mtd/nand_base.c, nand_scan_tail() calls nand_manufacturer_init(), which is mapped to a new driver, drivers/mtd/nand_micron.c. When it fails, the driver init fails and the mtd functions do not get assigned. (The driver that reads from the OTP area, drivers/elphel/elphel393_init.c, then returns -95, which is EOPNOTSUPP.) As a quick fix we just need to fall through.
The function exits with an error because it decides that it does not support the forcefully enabled on-die ECC. This needs to be investigated.
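The "fall through" quick fix described above can be sketched as plain control flow. This is a user-space illustration only, not the actual change to nand_scan_tail(): the struct, the function-pointer type and all function names below are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the manufacturer-specific init hook. */
typedef int (*manufacturer_init_fn)(void);

/* Hypothetical stand-in for the per-chip function table. */
struct chip_ops {
    int otp_read_assigned; /* 1 once the OTP read function is hooked up */
};

/* Fails the same way the log above reports: -EOPNOTSUPP. */
static int init_rejecting_on_die_ecc(void)
{
    return -EOPNOTSUPP;
}

static int scan_tail_sketch(manufacturer_init_fn init, struct chip_ops *ops)
{
    int ret = init();

    if (ret == -EOPNOTSUPP) {
        /* Quick fix: report the error and continue instead of aborting. */
        fprintf(stderr, "manufacturer init returned %d, continuing\n", ret);
        ret = 0;
    }
    if (ret)
        return ret; /* any other error still aborts the scan */

    ops->otp_read_assigned = 1; /* the rest of the setup still runs */
    return 0;
}
```

With this shape the OTP functions get assigned even when the manufacturer init refuses the ECC configuration; the underlying ECC question stays open.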
[SOLVED] Note 6: udev - unknown group 'kvm'
- Problem:
[ 5.817352] udevd[1478]: starting version 3.2.7
[ 5.918028] udevd[1478]: specified group 'kvm' unknown
[ 5.986364] udevd[1479]: starting eudev-3.2.7
[ 6.142897] udevd[1479]: specified group 'kvm' unknown
- Solution:
KVM == Kernel-based Virtual Machine. Remove for now (and maybe forever)
.
└── udev
    ├── eudev
    │   └── 50-udev-default.rules
    └── eudev_3.2.7.bbappend
50-udev-default.rules - gets installed over the original file.
[SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver
- Problem:
The Ethernet driver's structure has changed: it was split into several files. It lives at drivers/net/ethernet/cadence/
- Solution:
For our Ethernet chip (Atheros 80xx) a fixup had to be added to disable SmartEEE. It's a single function, a call and a couple of defines - all added back to the new driver structure.
[SOLVED] Note 8: u-boot update
- update u-boot
- solution:
Updated to 2019.07 mainstream u-boot
- converted our *.h (with params used to generate SPL header) to Kconfigs
- updated driver for NAND flash
- tested both boot modes - mmc and nand
[SOLVED] Note 9: test camogm
- test camogm
/var/state/camogm_cmd accepts only the first write - switch to polling?
When switched to polling, the buffer overflows during recording, probably because the polling version does not work correctly.
Everything works in the version without polling after adding an EOF reset (clearerr(npipe)) right after reading from the pipe and checking feof().
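The EOF reset mentioned above can be illustrated with a small sketch. It is not the camogm source: read_command() is a hypothetical helper and only the stream name npipe is taken from the note. The point is that the end-of-file indicator of a stdio stream is sticky, so it has to be cleared with clearerr() before later writes can be read.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one command line from the stream.
 * Returns 1 if a line was read, 0 if nothing is available right now.
 * On end-of-file the EOF indicator is cleared so that a later call
 * can pick up data written after this one returned.
 */
static int read_command(FILE *npipe, char *buf, size_t size)
{
    if (fgets(buf, (int)size, npipe) != NULL) {
        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
        return 1;
    }
    if (feof(npipe))
        clearerr(npipe); /* reset EOF so the next read is attempted again */
    return 0;
}
```

Without the clearerr() call, a C library with a sticky EOF flag keeps returning "no data" after the first writer closes or pauses, which matches the "accepts only the first write" symptom.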
[SOLVED] Note 10: test streamer
- test streamer
Streamer works
[SOLVED] Note 11: test AHCI driver
- test ahci driver
- results:
- SSD is detected and automounted
- write/read works
[SOLVED] Note 12: test raw recording
- test recording on a raw partition
- comments:
There was a typo in camogm_align.c - it was not aligning when it should have. CHUNK_LEADER changed to CHUNK_HEADER in line 339:
...
if (chunks[CHUNK_HEADER].iov_len != 0){ // only if it is not TIFF
...
[SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference
- Solution:
Forgot to pull the latest device tree with the lepton description. The old device tree didn't have the i2c configuration for lepton, hence something returned NULL.
- Original log:
framepars_operations elphel393-framepars@0: Configuring compressor DMA channels
circbuf elphel393-circbuf@0: Setting i2c drive mode for port 0
circbuf elphel393-circbuf@0: register_i2c_sensor()
detect_sensors elphel393-detect_sensors@0: detect_sensors_par2addr_init(): sensorPortConfig[0].sensor[0] = 0x44
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ecdb4000
[00000000] *pgd=00000000
Internal error: Oops - BUG: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1755 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
Hardware name: Xilinx Zynq Platform
task: ee80cd80 task.stack: ecda0000
PC is at register_i2c_sensor+0x244/0x2ac
LR is at 0x0
pc : [<c05a19e8>] lr : [<00000000>] psr: 60030013
sp : ecda1480 ip : ecda14a8 fp : 00000000
r10: c0ee625c r9 : 000000fc r8 : 00000000
r7 : 00000028 r6 : ecda14a8 r5 : c0c3ca58 r4 : 00000000
r3 : 00000000 r2 : c09b093a r1 : ee973c91 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d Table: 2cdb404a DAC: 00000051
Process php (pid: 1755, stack limit = 0xecda0210)
Stack: (0xecda1480 to 0xecda2000)
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt
[SOLVED] Note 14: fixdep: Permission denied
- Description:
  - We've had this error for a while, probably since kernel 4.0
  - usually happened when running do_compile_kernelmodules
  - EXTRA_OEMAKE = "-s -w -B KCFLAGS='-v'"
  - -B forces a rebuild of all targets, and we also have -j8 (in the PARALLEL_MAKE variable) for the parallel build
  - so during the parallel build fixdep gets rebuilt several times, and at some point one of the targets (e.g. sortextable or kallsyms) calls it while fixdep is being compiled and overwritten for another target (probably)
  - the exec rights are correct after the fact
- Solution:
Removed -B. This makes fixdep build only once, and the problem is gone.
- Note:
~$ make -h
...
-B, --always-make Unconditionally make all targets.
...