Wednesday, November 17, 2021

[SOLVED] Signals not handled in Linux kernel init thread

Issue

I have the following problem with the Linux kernel: when a signal (exception) occurs in the init thread, while the kernel is loading drivers, it is not handled in any way and the system freezes. It is not even possible to use a debugger.

However, when the same signal is raised in a user process after the kernel has booted, it is caught and does not freeze the system:

# devmem2 0x51002104 w
/dev/mem opened.[  206.043479] 8<--- cut here ---
[  206.047808] Unhandled fault: asynchronous external abort (0x1211) at 0x00000000    
[  206.055149] pgd = b7e0d3b2
[  206.057865] [00000000] *pgd=a8fc7003, *pmd=00000000

Memory mapped at address 0xb6f2d000.
Bus error

In the pcie-keystone.c driver, there is the following signal handler:

#ifdef CONFIG_ARM
    /*
     * PCIe access errors that result into OCP errors are caught by ARM as
     * "External aborts"
     */
    hook_fault_code(17, ks_pcie_fault, SIGBUS, 0,
            "Asynchronous external abort");
#endif

Installing a similar handler in my code does not help. When the driver is loaded as a module after the init process has finished, the signal does not freeze the system either.

Could a Linux kernel expert advise whether there is a hidden option in the kernel config? Or is a piece of the platform initialization missing, so that the init thread is not fully configured and cannot handle signals? I am on an ARM AM5728 machine.


Solution

I came back to this problem some time ago. The root cause was that under some conditions no ARM exception is raised at all; the hardware simply freezes the memory bus and the CPU.

I reworked the pipe3 PCIe PHY driver so that it checks whether the APLL is locked before letting the PCIe core start initializing. This is mentioned in the AM57x TRM but is not implemented in the mainline Linux driver. I may send patches to LKML later.
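
The actual driver change is not shown here, so the following is only a user-space sketch of the general idea: poll a PLL status word until a lock bit is set, and give up after a bounded number of attempts so that the PCIe core is never started on an unlocked clock. The bit position, the function names and the mock register are all hypothetical; the real register layout has to be taken from the AM57x TRM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock bit; the real position must come from the AM57x TRM. */
#define APLL_LOCKED_BIT (1u << 0)

/* Hypothetical accessor type so the sketch can run against a mock register. */
typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll the status word until the lock bit is set or the poll budget is
 * exhausted. Returns true if the PLL reported lock, false on timeout.
 * A real driver would sleep or delay between reads.
 */
static bool wait_for_apll_lock(read_status_fn read_status, void *ctx,
                               unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (read_status(ctx) & APLL_LOCKED_BIT)
            return true;
    }
    return false;
}

/* Mock PLL (illustration only): reports lock on the Nth status read. */
struct mock_pll {
    unsigned int reads_until_lock;
    unsigned int reads;
};

static uint32_t mock_read_status(void *ctx)
{
    struct mock_pll *pll = ctx;

    pll->reads++;
    return pll->reads >= pll->reads_until_lock ? APLL_LOCKED_BIT : 0;
}
```

With this shape, the caller would start the PCIe core only when wait_for_apll_lock() returns true, and fail the probe with an error otherwise instead of touching the bus.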

The problem is very similar to this one: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/615543/am5728-pcie-access-hangs-dsp-and-arm

The reason the behavior differs between the init thread and a running system is runtime power management, which disables the PCIe core clock after the kernel has booted.



Answered By - Vir91