Difference between revisions of "Poky migration from rocko to warrior"
From ElphelWiki
Revision as of 14:55, 8 August 2019
Contents
- 1 Elphel's kernel tree
- 2 [SOLVED] Note 1: Bring back fpga char device
- 3 [SOLVED] Note 2: Build php 5.6.40
- 4 [SOLVED] Note 3: Entropy device hwrng
- 5 - Note 4: PHP causing 'unsupported FP instruction in kernel mode'
- 6 [SOLVED] Note 5: Bring up NAND OTP support
- 7 [SOLVED] Note 6: udev - unknown group 'kvm'
- 8 [SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver
- 9 [SOLVED] Note 8: u-boot update
- 10 [SOLVED] Note 9: test camogm
- 11 [SOLVED] Note 10: test streamer
- 12 [SOLVED] Note 11: test AHCI driver
- 13 [SOLVED] Note 12: test raw recording
- 14 [SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference
Elphel's kernel tree
.
├── arch
│   └── arm
│       └── boot
│           └── dts/ # device trees for 393 cameras, considering tested
├── drivers
│   ├── ata
│   │   ├── ahci_elphel.c # tested reading and writing from/to SSD
│   │   └── libata-eh.c
│   ├── char
│   │   └── xilinx_devcfg.c # tested bitstream loading - brought back the old character device driver, it's simpler this way than the new one FPGA manager that can load only .bit.bin files
│   ├── clk
│   │   └── clk-si5338.c # chip found, no errors
│   ├── elphel
│   │   ├── circbuf.c # tested via recording
│   │   ├── clock10359.c
│   │   ├── command_sequencer.c # ok
│   │   ├── cxi2c.c
│   │   ├── detect_sensors.c
│   │   ├── elphel393-init.c # ok
│   │   ├── elphel393-mem.c # ok
│   │   ├── elphel393-pwr.c # ok
│   │   ├── exif393.c
│   │   ├── fpgajtag353.c
│   │   ├── framepars.c # ok
│   │   ├── gamma_tables.c # affects images which look ok
│   │   ├── histograms.c # displayed
│   │   ├── imu_log393.c
│   │   ├── jpeghead.c
│   │   ├── klogger_393.c
│   │   ├── lepton.c
│   │   ├── mt9f002.c
│   │   ├── mt9x001.c # sensor is programmed correctly
│   │   ├── multi10359.c
│   │   ├── pgm_functions.c # parameters are getting applied correctly (mt9p006)
│   │   ├── quantization_tables.c # images not broken
│   │   ├── sensor_common.c
│   │   ├── sensor_i2c.c
│   │   ├── x393.c
│   │   ├── x393_fpga_functions.c # ok
│   │   └── x393_videomem.c # also used in circbuf => recording => works
│   ├── misc
│   │   ├── ltc3589.c
│   │   └── vsc330x.c # switching between internal and external SSD ports works
│   ├── mmc
│   │   └── host
│   │       └── sdhci.c # this needed chip detect ORed with dat3: SDHCI_ANY_PRESENT = SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT
│   ├── mtd
│   │   └── nand # added functions to work with OTP, tested only reading
│   │       ├── nand_base.c
│   │       ├── nandchip-micron.c
│   │       └── pl35x_nand.c
│   ├── net
│   │   └── ethernet
│   │       └── cadence
│   │           └── macb_main.c # needed fixup for Atheros chip - disable SmartEEE
│   └── rtc
│       └── rtc-m41t80.c # updated to latest version. Our changes only ignore Oscillator failure at boot at m41t80_get_datetime().
├── helpers
│   └── si5338_register_map_dts.py # test it?
├── other
│   └── mem.py
└── patches
    ├── ahci.patch
    ├── drivers-elphel.patch
    ├── garmin_usb.c.patch
    └── libahci.patch
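The sdhci.c comment in the tree (card detect ORed with DAT3) can be illustrated with a minimal sketch. SDHCI_CARD_PRESENT is the usual Present State bit from sdhci.h; the value used here for SDHCI_DAT3_PRESENT is an assumption (the DAT[3] line level bit of the Present State register), and card_present() is a hypothetical helper, not the actual change in our sdhci.c:

```c
#include <assert.h>
#include <stdint.h>

#define SDHCI_CARD_PRESENT  0x00010000u  /* Present State: card inserted */
#define SDHCI_DAT3_PRESENT  0x00800000u  /* ASSUMED value: DAT[3] line level bit */
#define SDHCI_ANY_PRESENT   (SDHCI_CARD_PRESENT | SDHCI_DAT3_PRESENT)

/* Hypothetical helper: a card counts as present if either bit is set. */
static int card_present(uint32_t present_state)
{
    return (present_state & SDHCI_ANY_PRESENT) != 0;
}
```

With this mask a board without a working card-detect pin would still see the card as long as DAT3 is pulled high by the card.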
[SOLVED] Note 1: Bring back fpga char device
- /dev/xdevcfg got retired by Xilinx - instead there's the FPGA 'Manager', which is unable to load a simple *.bit (only *.bin or *.bit.bin).
- Solution:
Brought back the old driver (drivers/char/xilinx_devcfg.c, plus edits to Kconfig and Makefile) - it works as it used to
[SOLVED] Note 2: Build php 5.6.40
- php 5.6.40 - EOL and won't build - mysql supposedly moved header files.
- Solution:
Disabled mysql extension. Added to meta-elphel393/recipes-devtools/php/php_5.6.%.bbappend:
PACKAGECONFIG[mysql] = "--without-mysql --without-mysqli --without-pdo-mysql"
CFLAGS += " -ldl"
[SOLVED] Note 3: Entropy device hwrng
- New package rng-tools is whining: Failed to init entropy source hwrng
- Solution:
Leave as is for now. The full log is:
Initalizing available sources
Failed to init entropy source hwrng
Enabling JITTER rng support
Initalizing entropy source jitter
- Comments:
- Haven't found if Xilinx uses any driver for /dev/hwrng
- TODO: Find out if the order of entropy sources can be changed
- That lag at boot is really annoying - 5-10 seconds?!!
- Note 4: PHP causing 'unsupported FP instruction in kernel mode'
- Kernel Oops:
[ 35.872118] BUG: unsupported FP instruction in kernel mode
[ 35.877621] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 35.883380] Modules linked in:
[ 35.886498] CPU: 1 PID: 1756 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
[ 35.893459] Hardware name: Xilinx Zynq Platform
[ 35.897989] task: ee83f280 task.stack: ef1d6000
[ 35.902527] PC is at vfp_reload_hw+0x30/0x44
[ 35.906802] LR is at __und_usr_fault_32+0x0/0x8
[ 35.911338] pc : [<c0102e10>] lr : [<c010c280>] psr: a0000013
[ 35.917529] sp : ef1d7fb0 ip : 00000051 fp : 00000001
[ 35.922813] r10: ef1d61f8 r9 : c010c308 r8 : ee9893c0
[ 35.928040] r7 : 00000001 r6 : 00400100 r5 : c0138d08 r4 : ecd600f8
[ 35.934569] r3 : c0c6c064 r2 : b67bde8c r1 : ecd9a224 r0 : eeb00a40
[ 35.941098] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 35.948241] Control: 18c5387d Table: 2cda404a DAC: 00000051
[ 35.953993] Process php (pid: 1756, stack limit = 0xef1d6210)
[ 35.959740] Stack: (0xef1d7fb0 to 0xef1d8000)
[ 35.964020] 7fa0: a5f43f50 a5f43e18 00000080 00000000
[ 35.972269] 7fc0: 00000000 a5f43f4c b687b338 000000ae 00000000 bedcdfe4 00000001 a5f43ffc
[ 35.980385] 7fe0: a5f43f50 a5f43d7c b676cf78 b67bde8c 60000010 ffffffff 00000000 00000000
[ 35.988626] Code: 128aa080 e89a0162 e3110102 0a000003 (eee96a10)
[ 35.994724] ---[ end trace 06029778db6d2d90 ]---
[ 35.999422] note: php[1756] exited with preempt_count 2
- Unsupported floating point instruction in kernel?
- Details:
- single sensor (MT9P006) on port 0
- happens at boot
- after the Oops the camera seems to operate normally
- the appearance is random (but <50%) - easier to reproduce with reboot -f than with a power cycle
- autocampars.php runs at boot and sometimes causes this - it happens after 0 is written to initiate the sensors' driver
- fpga is already programmed
- happens after mt9x001_pgm_initsensor() exits
- autocampars.php log seems to be ok and complete
- tested with 2 boards
- Causes?
- hardware
  - power board?
  - system board? (probably not, because it was reproduced on 2 boards)
  - temperature?
- kernel?
  - some race conditions?
- php?
  - too old? the version 5.6.40 is EOL
  - got built with some package that is too new for it? E.g., it won't build with newer mysql
- Solution?:
- Took arch/arm/vfp/vfpmodule.c from kernel 4.19 (the current kernel is 4.14). It didn't work. Roll back and check which php call caused it? It might also be a linux driver.
- Try php 7.x.x - need to update the extension
- Try php 5.6.31 (the one that used to work) - Oops persists
- On the bright side, at least it's not a kernel panic
- TODO: keep an eye on this, because the real reason is not investigated
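One way to narrow this down is to decode the faulting instruction, i.e. the word in parentheses on the Code: line of the Oops (eee96a10). The sketch below assumes the ARM-mode VMSR (FMXR) encoding "cond 1110 1110 reg Rt 1010 0001 0000" and the VFP system register numbers as I recall them from the ARMv7-A manual; decode_vmsr() and vfp_sysreg_name() are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Name of the VFP system register selected by bits [19:16] of VMSR/VMRS. */
static const char *vfp_sysreg_name(unsigned reg)
{
    switch (reg) {
    case 0x0: return "FPSID";
    case 0x1: return "FPSCR";
    case 0x6: return "MVFR1";
    case 0x7: return "MVFR0";
    case 0x8: return "FPEXC";
    case 0x9: return "FPINST";
    case 0xa: return "FPINST2";
    default:  return "unknown";
    }
}

/* Returns 1 and fills reg/rt if insn is an ARM-mode VMSR (FMXR), else 0. */
static int decode_vmsr(uint32_t insn, unsigned *reg, unsigned *rt)
{
    assert(reg != NULL && rt != NULL);

    /* Ignore the condition field and the reg/Rt fields, match the rest. */
    if ((insn & 0x0ff00fffu) != 0x0ee00a10u)
        return 0;
    *reg = (insn >> 16) & 0xfu;   /* VFP system register number */
    *rt  = (insn >> 12) & 0xfu;   /* ARM core register being written from */
    return 1;
}
```

Under these assumptions eee96a10 decodes as a write of r6 to FPINST, which would match PC being inside vfp_reload_hw (restoring the VFP state), so the failing access is the kernel's own state restore rather than a floating point operation inside a driver. This is only a reading of the log, not a confirmed root cause.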
[SOLVED] Note 5: Bring up NAND OTP support
- MAC is not read from NAND, displays the default: 00:0e:64:10:00:00
- Problem?
[ 3.639851] elphel393-init: Flash page read, code -95
- Comments:
- Look up what has changed.
- Solution: (for xlnx_rebase_v4.14 branch of linux-xlnx):
In drivers/mtd/nand/nand_base.c, nand_scan_tail() calls nand_manufacturer_init(), which is mapped to a new driver, drivers/mtd/nand/nand_micron.c. When that call fails, the driver init fails and the mtd functions do not get assigned. (As a result the driver that reads from the OTP area, drivers/elphel/elphel393-init.c, gets -95, which is EOPNOTSUPP.) For a quick fix we just need to fall through.
The function exits with an error because it decides that it does not support forcefully enabled on-die ECC. This still needs to be investigated.
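The failure chain above can be modeled in a few lines. This is a user-space sketch, not the kernel code: struct mtd_sketch, scan_tail_sketch(), mtd_read_otp_sketch() and manuf_init_fails() are hypothetical stand-ins, and the -EINVAL returned by the fake manufacturer init is an arbitrary choice (the real return code is not recorded in these notes). It only assumes that the mtd layer reports EOPNOTSUPP when the OTP read hook was never assigned:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct mtd_info. */
struct mtd_sketch {
    int (*read_otp)(unsigned char *buf, size_t len);
};

/* Dummy OTP read that just fills the buffer. */
static int otp_read_sketch(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xff;
    return 0;
}

/* Reports EOPNOTSUPP when the hook was never assigned. */
static int mtd_read_otp_sketch(const struct mtd_sketch *mtd,
                               unsigned char *buf, size_t len)
{
    assert(mtd != NULL);
    if (!mtd->read_otp)
        return -EOPNOTSUPP;
    return mtd->read_otp(buf, len);
}

/* Stand-in for a manufacturer init that rejects the chip configuration. */
static int manuf_init_fails(void)
{
    return -EINVAL;   /* arbitrary error code for the sketch */
}

/* With fall_through set, a failed manufacturer init no longer aborts the scan. */
static int scan_tail_sketch(struct mtd_sketch *mtd, int (*manuf_init)(void),
                            int fall_through)
{
    int ret = manuf_init ? manuf_init() : 0;

    assert(mtd != NULL);
    if (ret) {
        if (!fall_through)
            return ret;   /* original behaviour: hooks stay unassigned */
        fprintf(stderr, "manufacturer init failed (%d), continuing\n", ret);
    }
    mtd->read_otp = otp_read_sketch;
    return 0;
}
```

Without the fall-through the OTP read returns -EOPNOTSUPP, which is the -95 seen in the elphel393-init message; with it the hook is assigned and the read succeeds. The price is that a real manufacturer init failure is only logged, which is why the ECC question still has to be investigated.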
[SOLVED] Note 6: udev - unknown group 'kvm'
- Problem:
[ 5.817352] udevd[1478]: starting version 3.2.7
[ 5.918028] udevd[1478]: specified group 'kvm' unknown
[ 5.986364] udevd[1479]: starting eudev-3.2.7
[ 6.142897] udevd[1479]: specified group 'kvm' unknown
- Solution:
KVM == Kernel-based Virtual Machine. Remove for now (and maybe forever)
.
└── udev
    ├── eudev
    │   └── 50-udev-default.rules
    └── eudev_3.2.7.bbappend
50-udev-default.rules - gets installed over the original file.
[SOLVED] Note 7: Add back fixup for Atheros to updated ethernet driver
- Problem:
The ethernet driver's structure has changed. It was split into several files. It lives at drivers/net/ethernet/cadence/
- Solution:
For our ethernet chip (Atheros 80xx) a fixup had to be added to disable SmartEEE. It's a single function, a call and a couple of defines - all added back to the new driver structure.
[SOLVED] Note 8: u-boot update
- update u-boot
- solution:
- Updated to 2019.07 mainstream u-boot
- converted our *.h (with params used to generate SPL header) to Kconfigs
- updated driver for NAND flash
- tested both boot modes - mmc and nand
[SOLVED] Note 9: test camogm
- test camogm
/var/state/camogm_cmd accepts only the first write - switch to polling?
When switched to polling, the buffer overflows during recording, probably because the polling version does not work correctly.
Everything works in the version without polling after adding an EOF reset (clearerr(npipe)) right after reading from the pipe and checking feof().
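The EOF reset can be sketched as below. read_command() is a hypothetical helper, not the actual camogm code; it assumes the command pipe is read through a stdio FILE * (named npipe, as in the note), where the end-of-file indicator stays set after a writer closes its end, so later writes are never read unless clearerr() is called:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Try to read one command line from the stream.
 * Returns 1 if a line was stored in buf, 0 if nothing is available right now.
 */
static int read_command(FILE *npipe, char *buf, size_t size)
{
    assert(npipe != NULL && buf != NULL && size > 1);

    if (fgets(buf, (int)size, npipe) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
        return 1;
    }
    if (feof(npipe))
        clearerr(npipe);   /* forget EOF so data from the next writer is seen */
    return 0;
}
```

Calling read_command() from the main loop returns the first command, then 0 while the pipe is idle, and picks up the next command once another writer sends it. Without the clearerr() call the second and all later writes would be ignored, which matches the "accepts only the first write" symptom.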
[SOLVED] Note 10: test streamer
- test streamer
Streamer works
[SOLVED] Note 11: test AHCI driver
- test ahci driver
- results:
- SSD is detected and automounted
- write/read works
[SOLVED] Note 12: test raw recording
- test recording on a raw partition
- comments:
There was a typo in camogm_align.c - it was not aligning when it should have. CHUNK_LEADER changed to CHUNK_HEADER in line 339:
...
if (chunks[CHUNK_HEADER].iov_len != 0){ // only if it is not TIFF
...
[SOLVED] Note 13: FLIR Lepton 3.5 sensor: NULL pointer dereference
- Solution:
Forgot to pull the latest device tree with the lepton description. The old device tree didn't have the i2c configuration for lepton, hence something returned NULL.
- Original log:
framepars_operations elphel393-framepars@0: Configuring compressor DMA channels
circbuf elphel393-circbuf@0: Setting i2c drive mode for port 0
circbuf elphel393-circbuf@0: register_i2c_sensor()
detect_sensors elphel393-detect_sensors@0: detect_sensors_par2addr_init(): sensorPortConfig[0].sensor[0] = 0x44
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ecdb4000
[00000000] *pgd=00000000
Internal error: Oops - BUG: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1755 Comm: php Not tainted 4.14.0-xilinx-v2018.3 #1
Hardware name: Xilinx Zynq Platform
task: ee80cd80 task.stack: ecda0000
PC is at register_i2c_sensor+0x244/0x2ac
LR is at 0x0
pc : [<c05a19e8>] lr : [<00000000>] psr: 60030013
sp : ecda1480 ip : ecda14a8 fp : 00000000
r10: c0ee625c r9 : 000000fc r8 : 00000000
r7 : 00000028 r6 : ecda14a8 r5 : c0c3ca58 r4 : 00000000
r3 : 00000000 r2 : c09b093a r1 : ee973c91 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d Table: 2cdb404a DAC: 00000051
Process php (pid: 1755, stack limit = 0xecda0210)
Stack: (0xecda1480 to 0xecda2000)
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt