Issue
I am running an application on my intel rangeley board, which is having 3.14.29-rt22 kernel running on it. Application will run two threads with pri :39 each. for 1 and 2 msec periodically. Both the threads will be running in continuous while loop, which will be running only on core 0. After running sometime, around 10 min. When I press ctrl+c, it is giving logs below.
**INFO: rcu_preempt self-detected stall on CPU { 0} (t=21000 jiffies g=2362 c=2361 q=207)**
**sending NMI to all CPUs:
NMI backtrace for cpu 1**
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.29ltsi-rt22-yocto-preempt-rt+ #1
Hardware name: ADI Engineering RCC-VE/RCC-VE, BIOS ADI_RCCVE-01.00.00.04-nodebug 05/06/2015
task: ffff8802761a0000 ti: ffff8802761a8000 task.ti: ffff8802761a8000
RIP: 0010:[<ffffffff8100b451>] [<ffffffff8100b451>] native_read_tsc+0x1/0x20
RSP: 0018:ffff8802761abe28 EFLAGS: 00000003
RAX: 0000000000000000 RBX: ffffffff81e1acc0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffffff81e1acc0
RBP: ffff8802761abe38 R08: ffff8802761a8000 R09: 0000000000000001
R10: 0000000000000800 R11: 0000000000000000 R12: 000000000000003e
R13: 0000000000014e76 R14: ffff8802761abfd8 R15: ffff88027fc8cf00
FS: 0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fabcd23f000 CR3: 0000000269589000 CR4: 00000000001007e0
Stack:
ffff8802761abe38 ffffffff8100b4a9 ffff8802761abe60 ffffffff810a6b73
0000000000000001 ffff8802761abfd8 ffffffff81edc030 ffff8802761abec0
ffffffff810b01a5 ffffffffffffff10 ffffffff8103b906 0000000000000000
Call Trace:
[<ffffffff8100b4a9>] ? read_tsc+0x9/0x20
[<ffffffff810a6b73>] ktime_get+0x43/0xc0
[<ffffffff810b01a5>] __tick_nohz_idle_enter+0x25/0x480
[<ffffffff8103b906>] ? native_safe_halt+0x6/0x10
[<ffffffff810b064a>] tick_nohz_idle_enter+0x4a/0x80
[<ffffffff8109a626>] cpu_startup_entry+0x46/0x290
[<ffffffff81031597>] start_secondary+0x1b7/0x210
What can be the reason? Is it because I am continuously using CPU for longtime? When I print anything from thread on the console, this crash is not happening.
Solution
Yes, continuous using of CPU from a high priority thread for a long time (1ms is a large period from the view of scheduler) could be a reason of RCU stall.
From the documentation about RCU stall detector:
The following problems can result in RCU CPU stall warnings:
... A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might happen to preempt a low-priority task in the middle of an RCU read-side critical section. This is especially damaging if that low-priority task is not permitted to run on any other CPU, in which case the next RCU grace period can never complete, which will eventually cause the system to run out of memory and hang.
... A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that is running at a higher priority than the RCU softirq threads. This will prevent RCU callbacks from ever being invoked, and in a CONFIG_PREEMPT_RCU kernel will further prevent RCU grace periods from ever completing. Either way, the system will eventually run out of memory and hang.
Performing any system call (like write()
to console) from the high priority thread give kernel to perform some work targeted to system's maintainence.
Possibly, sched_yield will helps too.
Answered By - Tsyvarev Answer Checked By - Mildred Charles (WPSolving Admin)