Issue
Consider a very simple eBPF program of type BPF_PROG_TYPE_SOCKET_FILTER:
struct bpf_insn prog[] = {
    BPF_MOV64_IMM(BPF_REG_0, -1),
    BPF_EXIT_INSN(),
};
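For readers without the helper macros at hand, here is a minimal sketch of the same two instructions written as raw struct bpf_insn initializers. It assumes the Linux UAPI header linux/bpf.h is installed; the array name prog_raw is made up for illustration.

```c
#include <linux/bpf.h>   /* struct bpf_insn, BPF_ALU64, BPF_MOV, BPF_EXIT, BPF_REG_0 */

/* Raw encoding of "r0 = -1; exit". prog_raw is a hypothetical name. */
static const struct bpf_insn prog_raw[] = {
    /* 64-bit move of the immediate -1 into the return register r0 */
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = -1 },
    /* exit: the value in r0 becomes the program's return value */
    { .code = BPF_JMP | BPF_EXIT },
};
```

The socket filter hook reads the return value as an unsigned 32-bit quantity, which is why -1 shows up as 0xffffffff in the discussion below.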
The code snippets below, from net/core/filter.c and net/core/sock.c, show how the filter is invoked:
static inline int pskb_trim(struct sk_buff *skb, unsigned int len)
{
    return (len < skb->len) ? __pskb_trim(skb, len) : 0;
}
...
int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
{
    int err;
    ...
    if (filter) {
        pkt_len = bpf_prog_run_save_cb(filter->prog, skb);
        skb->sk = save_sk;
        err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
    }
    ...
    return err;
}
...
static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
{
    return sk_filter_trim_cap(sk, skb, 1);
}
Eventually sk_filter() will be called by sock_queue_rcv_skb(), i.e. a packet reaching the socket will be processed by the filter and queued if sk_filter() returns 0.
If I understand this code correctly, the return value in this case (0xffffffff) should result in the packet being dropped. However, my simple eBPF program, when attached to an AF_PACKET raw socket (bound to the lo interface), does not drop ICMP packets sent across the loopback interface. Does this have anything to do with eBPF, or with the behaviour of ICMP on the loopback interface?
UPDATE
As pchaigno has pointed out, socket filter programs deal with copies of packets. In my case, the eBPF application basically creates a tap socket (an AF_PACKET raw socket), which is handed ingress packets before they are delivered up to a protocol layer. I did some investigation in the kernel code and found that a received packet eventually lands in the __netif_receive_skb_core() function, which does the following:
list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
    if (pt_prev)
        ret = deliver_skb(skb, pt_prev, orig_dev);
    pt_prev = ptype;
}
This passes the packet to the AF_PACKET handler, which eventually runs the eBPF filter.
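To make the consequence concrete, here is a deliberately simplified userspace model of that loop. It is only a sketch: struct tap and model_receive() are hypothetical names, and the real kernel code deals with reference-counted skbs rather than counters. The point it illustrates is that each tap's filter decides only whether that tap sees the packet, while delivery to the protocol layer continues regardless.

```c
#include <stddef.h>

/* Hypothetical stand-in for one AF_PACKET tap socket. */
struct tap {
    unsigned int filter_ret; /* value the tap's socket filter returns */
    int delivered;           /* packets this tap actually queued */
};

/*
 * Model of one received packet: every tap runs its own filter on its own
 * reference to the packet. Returns 1 to indicate that the packet still
 * goes on to the protocol layer (e.g. ICMP), whatever the taps decided.
 */
static int model_receive(struct tap *taps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (taps[i].filter_ret != 0) /* 0 means "drop" for this tap only */
            taps[i].delivered++;
    }
    return 1;
}
```

Under this model, a tap whose filter returns 0 never queues anything, yet ping over lo keeps working, which matches what was observed.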
As a way to confirm that the eBPF filter actually filters on raw AF_PACKET sockets, one can dump the socket statistics as follows:
struct tpacket_stats stats;
...
len = sizeof(stats);
err = getsockopt(sock, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
These statistics indicate the filter's behaviour.
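A slightly fuller sketch of that call, wrapped in a helper, might look like the following. The helper name read_packet_stats() is made up; struct tpacket_stats and PACKET_STATISTICS come from linux/if_packet.h. Note that reading PACKET_STATISTICS resets the kernel's counters. My reading is that packets rejected by the filter are never counted in tp_packets, but treat that as an assumption to verify on your kernel.

```c
#include <linux/if_packet.h> /* struct tpacket_stats, PACKET_STATISTICS */
#include <sys/socket.h>      /* getsockopt, socklen_t, SOL_PACKET */

/*
 * Hypothetical helper: fetch the receive statistics of an AF_PACKET socket.
 * Returns 0 on success and -1 on failure (errno is set by getsockopt).
 */
static int read_packet_stats(int sock, struct tpacket_stats *stats)
{
    socklen_t len = sizeof(*stats);

    return getsockopt(sock, SOL_PACKET, PACKET_STATISTICS, stats, &len);
}
```

After sending a few pings over lo, compare stats.tp_packets between a run with the filter returning -1 and a run with it returning 0.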
Solution
It's the other way around: the packet is dropped if the program returns 0. A nonzero return value is the number of bytes to keep, so 0xffffffff keeps the whole packet. From the code:
* sk_filter_trim_cap - run a packet through a socket filter
* [...]
*
* Run the eBPF program and then cut skb->data to correct size returned by
* the program. If pkt_len is 0 we toss packet. If skb->len is smaller
* than pkt_len we keep whole skb->data. [...]
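The rule in that comment can be checked with a small userspace model of the two kernel functions quoted in the question. This is a sketch, not kernel code: struct fake_skb and the model_* functions are hypothetical, and the trim is assumed to always succeed (the real __pskb_trim() can fail).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: only its length matters here. */
struct fake_skb {
    unsigned int len;
};

/* Mirrors pskb_trim(): shorten the packet only if len is smaller. */
static int model_pskb_trim(struct fake_skb *skb, unsigned int len)
{
    if (len < skb->len)
        skb->len = len; /* the kernel calls __pskb_trim() here */
    return 0;
}

/* Mirrors the decision in sk_filter_trim_cap() for a given program result. */
static int model_sk_filter_trim_cap(struct fake_skb *skb, uint32_t prog_ret,
                                    unsigned int cap)
{
    unsigned int pkt_len = prog_ret;
    unsigned int keep = cap > pkt_len ? cap : pkt_len;

    /* 0 drops the packet; anything else keeps up to 'keep' bytes. */
    return pkt_len ? model_pskb_trim(skb, keep) : -EPERM;
}
```

With an 84-byte packet, a program result of 0xffffffff gives 0 (accept, length unchanged), a result of 20 gives 0 with the length cut to 20, and a result of 0 gives -EPERM (drop).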
Answered By - pchaigno