Wednesday, May 25, 2022

[SOLVED] Linux kernel module crash debug: general protection fault: 0000 [#1] SMP

Issue

I have a kernel module for splitting incoming RTP packets and merging outgoing RTP packets. The program crashes about once every 2-3 days. It would be very convenient for me if it's possible to find the exact line where the module crashes. I have given the crash dump below. Is it possible to find the exact line in the code from the crash dump?

    PID: 1256   TASK: ffff88020fc71700  CPU: 0   COMMAND: "rtpproxy"
 #0 [ffff880212faf2f0] machine_kexec at ffffffff8103bb7a
 #1 [ffff880212faf360] crash_kexec at ffffffff810bb968
 #2 [ffff880212faf430] oops_end at ffffffff8169fad8
 #3 [ffff880212faf460] die at ffffffff81017808
 #4 [ffff880212faf490] do_general_protection at ffffffff8169f5d2
 #5 [ffff880212faf4c0] general_protection at ffffffff8169eef5
    [exception RIP: pkt_queue+388]
    RIP: ffffffffa00f3fa0  RSP: ffff880212faf578  RFLAGS: 00010292
    RAX: ffff8802110ae400  RBX: ffff880213a53f38  RCX: 00015d910000a20f
    RDX: 497d74565cede60c  RSI: 000000006df1ed57  RDI: 00000000e46e0cfc
    RBP: ffff880212faf728   R8: ffff880211a8b000   R9: ffff880212fafa60
    R10: ffff880212fafbc8  R11: 0000000000000293  R12: 00000000134ab2b4
    R13: 000000008386615c  R14: 00000000000000e3  R15: 00000000000000e3
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff880212faf730] obsf_tg at ffffffffa00f34a0 [xt_OBSF]
 #7 [ffff880212faf890] ipt_do_table at ffffffffa00e41a5 [ip_tables]
 #8 [ffff880212faf970] ipt_mangle_out at ffffffffa00dd129 [iptable_mangle]
 #9 [ffff880212faf9c0] iptable_mangle_hook at ffffffffa00dd1eb [iptable_mangle]
#10 [ffff880212faf9d0] nf_iterate at ffffffff815aded5
#11 [ffff880212fafa20] nf_hook_slow at ffffffff815adf85
#12 [ffff880212fafaa0] __ip_local_out at ffffffff815babb2
#13 [ffff880212fafac0] ip_local_out at ffffffff815babd6
#14 [ffff880212fafae0] ip_send_skb at ffffffff815bbefb
#15 [ffff880212fafb00] udp_send_skb at ffffffff815df1d1
#16 [ffff880212fafb50] udp_sendmsg at ffffffff815e0286
#17 [ffff880212fafc90] inet_sendmsg at ffffffff815eabc4
#18 [ffff880212fafcd0] sock_sendmsg at ffffffff8156a437
#19 [ffff880212fafe50] sys_sendto at ffffffff8156d91d
#20 [ffff880212faff80] system_call_fastpath at ffffffff816a7029
    RIP: 00007f17363b83a3  RSP: 00007ffff2965f90  RFLAGS: 00010213
    RAX: 000000000000002c  RBX: ffffffff816a7029  RCX: 00007ffff29ff99b
    RDX: 0000000000000020  RSI: 00007f1737da4378  RDI: 0000000000000006
    RBP: 0000000000000001   R8: 00007f1737da67a0   R9: 0000000000000010
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007f1737da4378
    R13: 0000000000000001  R14: 00007f1737da42a0  R15: 0000000000000000
    ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b

[157707.736203] general protection fault: 0000 [#1] SMP 
[157707.736955] CPU 0 
[157707.736973] Modules linked in:
[157707.737654]  arc4 xt_tcpudp xt_OBSF(O) iptable_mangle ip_tables x_tables ghash_clmulni_intel aesni_intel cryptd aes_x86_64 joydev hid_generic microcode ext2 usbhid psmouse hid serio_raw i2c_piix4 virtio_balloon lp parport mac_hid floppy
[157707.740018] 
[157707.740102] Pid: 1256, comm: rtpproxy Tainted: G           O 3.5.0-23-generic #35~precise1-Ubuntu Bochs Bochs
[157707.740102] RIP: 0010:[<ffffffffa00f3fa0>]  [<ffffffffa00f3fa0>] pkt_queue+0x184/0x48a [xt_OBSF]
[157707.740102] RSP: 0018:ffff880212faf578  EFLAGS: 00010292
[157707.740102] RAX: ffff8802110ae400 RBX: ffff880213a53f38 RCX: 00015d910000a20f
[157707.740102] RDX: 497d74565cede60c RSI: 000000006df1ed57 RDI: 00000000e46e0cfc
[157707.740102] RBP: ffff880212faf728 R08: ffff880211a8b000 R09: ffff880212fafa60
[157707.740102] R10: ffff880212fafbc8 R11: 0000000000000293 R12: 00000000134ab2b4
[157707.740102] R13: 000000008386615c R14: 00000000000000e3 R15: 00000000000000e3
[157707.740102] FS:  00007f1736ad9700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[157707.740102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[157707.740102] CR2: 00007fd8a39f8000 CR3: 0000000211ad7000 CR4: 00000000000407f0
[157707.740102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[157707.740102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[157707.740102] Process rtpproxy (pid: 1256, threadinfo ffff880212fae000, task ffff88020fc71700)
[157707.740102] Stack:
[157707.740102]  ffff880212faf5a8 0000000000015d91 134ab2b400000008 000008f58386615c
[157707.740102]  00015d910000a20f a080527800000014 3a78560000d1fa00 564812de1a006045
[157707.740102]  ffff880212faf618 ffffffff81872e20 0000000000000000 ffff880210ca9000
[157707.740102] Call Trace:
[157707.740102]  [<ffffffff8169e7de>] ? _raw_spin_lock+0xe/0x20
[157707.740102]  [<ffffffff815a0958>] ? sch_direct_xmit+0x88/0x1c0
[157707.740102]  [<ffffffff81090833>] ? update_cpu_power+0x63/0x100
[157707.740102]  [<ffffffff810909c3>] ? update_group_power+0xf3/0x100
[157707.740102]  [<ffffffff81090db2>] ? update_sd_lb_stats+0x3e2/0x5f0
[157707.740102]  [<ffffffffa00f34a0>] obsf_tg+0x9c0/0x133c [xt_OBSF]
[157707.740102]  [<ffffffff81090ff9>] ? find_busiest_group+0x39/0x4a0
[157707.740102]  [<ffffffff81091541>] ? load_balance+0xe1/0x4a0
[157707.740102]  [<ffffffffa00e41a5>] ipt_do_table+0x315/0x450 [ip_tables]
[157707.740102]  [<ffffffffa00dd129>] ipt_mangle_out+0x99/0x100 [iptable_mangle]
[157707.740102]  [<ffffffffa00dd1eb>] iptable_mangle_hook+0x5b/0x60 [iptable_mangle]
[157707.740102]  [<ffffffff815aded5>] nf_iterate+0x85/0xc0
[157707.740102]  [<ffffffff815b8e50>] ? ip_forward_options+0x200/0x200
[157707.740102]  [<ffffffff815adf85>] nf_hook_slow+0x75/0x150
[157707.740102]  [<ffffffff815b8e50>] ? ip_forward_options+0x200/0x200
[157707.740102]  [<ffffffff815babb2>] __ip_local_out+0xa2/0xb0
[157707.740102]  [<ffffffff815babd6>] ip_local_out+0x16/0x30
[157707.740102]  [<ffffffff815bbefb>] ip_send_skb+0x1b/0x50
[157707.740102]  [<ffffffff815df1d1>] udp_send_skb+0x111/0x2a0
[157707.740102]  [<ffffffff815b9070>] ? ip_setup_cork+0x150/0x150
[157707.740102]  [<ffffffff815e0286>] udp_sendmsg+0x316/0x960
[157707.740102]  [<ffffffff815eabc4>] inet_sendmsg+0x64/0xb0
[157707.740102]  [<ffffffff812f31b7>] ? apparmor_socket_sendmsg+0x17/0x20
[157707.740102]  [<ffffffff8156a437>] sock_sendmsg+0x117/0x130
[157707.740102]  [<ffffffff8119a510>] ? __pollwait+0xf0/0xf0
[157707.740102]  [<ffffffff8119a510>] ? __pollwait+0xf0/0xf0
[157707.740102]  [<ffffffff8119a510>] ? __pollwait+0xf0/0xf0
[157707.740102]  [<ffffffff8156b58d>] ? move_addr_to_user+0xbd/0xd0
[157707.740102]  [<ffffffff8156ce7a>] ? move_addr_to_kernel+0x5a/0xa0
[157707.740102]  [<ffffffff8156d91d>] sys_sendto+0x13d/0x190
[157707.740102]  [<ffffffff8103fcc9>] ? kvm_clock_read+0x19/0x20
[157707.740102]  [<ffffffff8103fcd9>] ? kvm_clock_get_cycles+0x9/0x10
[157707.740102]  [<ffffffff810a3bd7>] ? getnstimeofday+0x57/0xe0
[157707.740102]  [<ffffffff810a3cca>] ? do_gettimeofday+0x1a/0x50
[157707.740102]  [<ffffffff816a7029>] system_call_fastpath+0x16/0x1b
[157707.740102] Code: f7 f1 48 8b 8d 70 fe ff ff 4c 63 f2 41 89 d7 49 69 c6 68 01 00 00 48 01 c3 48 8b 83 58 01 00 00 48 2d 58 01 00 00 48 89 c2 eb 20 <44> 39 62 04 0f 85 c0 02 00 00 44 39 6a 08 0f 85 b6 02 00 00 48 
[157707.740102] RIP  [<ffffffffa00f3fa0>] pkt_queue+0x184/0x48a [xt_OBSF]
[157707.740102]  RSP <ffff880212faf578>

Solution

[157707.736203] general protection fault: 0000 [#1] SMP 

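This says that your code made an illegal memory access. On x86-64 a general protection fault (rather than a page fault) usually means a pointer held a non-canonical address, i.e. it was garbage (e.g. uninitialized or corrupted); a plain NULL dereference is normally reported as "unable to handle kernel NULL pointer dereference" on kernels of this age. Here the instruction marked <44> in the Code: line decodes to cmp %r12d,0x4(%rdx), and RDX is 497d74565cede60c, which is not a valid kernel address.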

[157707.740102] RIP: 0010:[<ffffffffa00f3fa0>]  [<ffffffffa00f3fa0>] pkt_queue+0x184/0x48a

This line reports the instruction pointer value at the moment your module crashed: it died inside the function named "pkt_queue", at offset 0x184 from the start of that function (the 0x48a after the slash is the total size of the function). The same value appears in the first crash dump as pkt_queue+388, since 388 in decimal = 0x184.

Now you can use objdump to disassemble your module together with its debug information, take the address of the function pkt_queue, and add 0x184 to it to get to the offending instruction.
Let's say your pkt_queue function appears (unrealistically, just for illustration) at address 0x01 in the objdump output; then you should look at address 0x184 + 0x01 = 0x185 in the disassembly to see what's going on.

objdump lets you view the source, the assembly and the line numbers together:

    objdump -S -l your_object_file.o

-S lists not only the assembly but also the corresponding source code, and -l labels it with file names and line numbers, assuming the module was compiled with debug symbols (-g).

Oh and for your future reference:
https://opensourceforu.com/2011/01/understanding-a-kernel-oops/

Answered By - Fingolfin
Answer Checked By - Clifford M. (WPSolving Volunteer)