Prerequisites
- A working Kubernetes cluster for testing (a single-node kind or kubeadm cluster works perfectly)
- clang/llvm for compiling eBPF programs
- bpftool for inspecting loaded programs and maps
- A Go toolchain for building the plugin binary
What We’re Building
Our plugin will handle a single, well-defined scope: pod-to-pod connectivity on a single node. That means every pod gets an IP address, and any pod can reach any other pod on the same machine. We won't handle multi-node routing, services, or network policy; each of those is a project unto itself.
Here’s the architecture at a glance:

Each pod gets a veth pair: one end lives in the pod's network namespace, the other is attached to a bridge on the host. An eBPF program attached to the bridge's TC (traffic control) ingress hook inspects packets and forwards them to the correct pod.
Part 1: Understanding the CNI Specification
The Container Network Interface is simpler than most people expect. A CNI plugin is just a binary that the container runtime (containerd, CRI-O) invokes when a pod is created or destroyed. The runtime passes configuration via stdin as JSON and communicates the operation via environment variables.
The two operations we care about are ADD (set up networking for a new pod) and DEL (tear it down). The runtime sets CNI_COMMAND to indicate which one it wants.
Here are the key environment variables the runtime provides:
```
CNI_COMMAND      "ADD" or "DEL"
CNI_CONTAINERID  the container ID
CNI_NETNS        path to the container's network namespace (e.g. /proc/12345/ns/net)
CNI_IFNAME       the interface name the runtime wants inside the container (usually "eth0")
CNI_PATH         where to find CNI plugin binaries
```
And here’s what a minimal CNI config file looks like, typically placed in /etc/cni/net.d/:
```json
{
  "cniVersion": "1.0.0",
  "name": "simple-ebpf-cni",
  "type": "simple-ebpf-cni",
  "bridge": "cbr0",
  "subnet": "10.244.1.0/24",
  "gateway": "10.244.1.1"
}
```
The type field tells the runtime which binary to execute. It will look for a binary named simple-ebpf-cni in the CNI_PATH directory.
Part 2: IP Address Management (IPAM)
Before we can assign an IP to a pod, we need a dead-simple IPAM mechanism. Production CNI plugins use sophisticated IPAM backends, but we’ll use a file-based approach: a JSON file on disk that tracks which IPs in our subnet are allocated.
```go
// ipam.go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"sync"
)

const ipamFilePath = "/var/lib/cni/simple-ebpf-cni/ipam.json"

type IPAMState struct {
	mu         sync.Mutex
	Allocated  map[string]string `json:"allocated"` // IP -> containerID
	SubnetCIDR string            `json:"subnet"`
	GatewayIP  string            `json:"gateway"`
}

func loadIPAM() (*IPAMState, error) {
	state := &IPAMState{
		Allocated: make(map[string]string),
	}
	data, err := os.ReadFile(ipamFilePath)
	if err != nil {
		if os.IsNotExist(err) {
			return state, nil
		}
		return nil, err
	}
	if err := json.Unmarshal(data, state); err != nil {
		return nil, err
	}
	return state, nil
}

func (s *IPAMState) save() error {
	os.MkdirAll("/var/lib/cni/simple-ebpf-cni", 0755)
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(ipamFilePath, data, 0644)
}

func (s *IPAMState) allocate(containerID, subnet, gateway string) (net.IP, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.SubnetCIDR = subnet
	s.GatewayIP = gateway
	_, ipNet, err := net.ParseCIDR(subnet)
	if err != nil {
		return nil, fmt.Errorf("invalid subnet: %w", err)
	}
	gatewayIP := net.ParseIP(gateway)
	// Walk the subnet and find the first unallocated IP.
	// Skip the network address, the gateway, and the broadcast address.
	for ip := nextIP(ipNet.IP); ipNet.Contains(ip); ip = nextIP(ip) {
		if ip.Equal(gatewayIP) {
			continue
		}
		if isBroadcast(ip, ipNet) {
			continue
		}
		if _, taken := s.Allocated[ip.String()]; !taken {
			s.Allocated[ip.String()] = containerID
			if err := s.save(); err != nil {
				return nil, err
			}
			return ip, nil
		}
	}
	return nil, fmt.Errorf("no available IPs in subnet %s", subnet)
}

func (s *IPAMState) release(containerID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for ip, id := range s.Allocated {
		if id == containerID {
			delete(s.Allocated, ip)
		}
	}
	s.save()
}

func nextIP(ip net.IP) net.IP {
	next := make(net.IP, len(ip))
	copy(next, ip)
	for i := len(next) - 1; i >= 0; i-- {
		next[i]++
		if next[i] != 0 {
			break
		}
	}
	return next
}

func isBroadcast(ip net.IP, network *net.IPNet) bool {
	for i := range ip {
		if ip[i] != (network.IP[i] | ^network.Mask[i]) {
			return false
		}
	}
	return true
}
```
This is intentionally naive. There’s no file locking (a production plugin would use flock or an atomic rename), and it doesn’t handle edge cases gracefully. But it’s clear, readable, and sufficient for learning.
Part 3: The CNI Binary
Now for the main binary. When the container runtime calls our plugin, we need to do the following on ADD:
- Parse the CNI config from stdin.
- Allocate an IP address.
- Create a veth pair.
- Move one end into the pod’s network namespace and configure it.
- Attach the host end to our bridge.
- Load and attach the eBPF program.
- Return the result to the runtime as JSON.
```go
// main.go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"runtime"
	"strings"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/100"
	"github.com/containernetworking/cni/pkg/version"
	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

type NetConf struct {
	types.NetConf
	Bridge  string `json:"bridge"`
	Subnet  string `json:"subnet"`
	Gateway string `json:"gateway"`
}

func main() {
	skel.PluginMainFuncs(
		skel.CNIFuncs{
			Add:   cmdAdd,
			Del:   cmdDel,
			Check: cmdCheck,
		},
		version.All,
		"simple-ebpf-cni",
	)
}

func parseConfig(stdin []byte) (*NetConf, error) {
	conf := &NetConf{}
	if err := json.Unmarshal(stdin, conf); err != nil {
		return nil, fmt.Errorf("failed to parse config: %w", err)
	}
	if conf.Bridge == "" {
		conf.Bridge = "cbr0"
	}
	return conf, nil
}
```
The cmdAdd function is where the real work happens. Let’s walk through it in pieces.
Creating the Bridge
First, we ensure our bridge interface exists on the host:
```go
func ensureBridge(name string, gatewayIP net.IP, subnet *net.IPNet) (*netlink.Bridge, error) {
	br, err := netlink.LinkByName(name)
	if err == nil {
		return br.(*netlink.Bridge), nil
	}
	bridge := &netlink.Bridge{
		LinkAttrs: netlink.LinkAttrs{
			Name:   name,
			MTU:    1500,
			TxQLen: -1,
		},
	}
	if err := netlink.LinkAdd(bridge); err != nil {
		return nil, fmt.Errorf("failed to create bridge %s: %w", name, err)
	}
	addr := &netlink.Addr{
		IPNet: &net.IPNet{
			IP:   gatewayIP,
			Mask: subnet.Mask,
		},
	}
	if err := netlink.AddrAdd(bridge, addr); err != nil {
		return nil, fmt.Errorf("failed to assign address to bridge: %w", err)
	}
	if err := netlink.LinkSetUp(bridge); err != nil {
		return nil, fmt.Errorf("failed to bring up bridge: %w", err)
	}
	return bridge, nil
}
```
Creating and Configuring the Veth Pair
Next, we create a veth pair, move one end into the pod’s network namespace, and configure it:
```go
func setupVeth(netnsPath, ifName string, podIP net.IP, subnet *net.IPNet,
	gatewayIP net.IP, bridge *netlink.Bridge) (string, error) {
	// Lock the OS thread because we're switching network namespaces.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	hostVethName := "veth" + generateShortID()

	// Get the pod's network namespace.
	podNS, err := netns.GetFromPath(netnsPath)
	if err != nil {
		return "", fmt.Errorf("failed to open netns %s: %w", netnsPath, err)
	}
	defer podNS.Close()

	// Create the veth pair in the host namespace.
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{
			Name: hostVethName,
			MTU:  1500,
		},
		PeerName: ifName,
	}
	if err := netlink.LinkAdd(veth); err != nil {
		return "", fmt.Errorf("failed to create veth pair: %w", err)
	}

	// Move the peer end into the pod's namespace.
	peer, err := netlink.LinkByName(ifName)
	if err != nil {
		return "", fmt.Errorf("failed to find peer veth: %w", err)
	}
	if err := netlink.LinkSetNsFd(peer, int(podNS)); err != nil {
		return "", fmt.Errorf("failed to move veth to pod ns: %w", err)
	}

	// Attach the host end to the bridge and bring it up.
	hostVeth, err := netlink.LinkByName(hostVethName)
	if err != nil {
		return "", fmt.Errorf("failed to find host veth: %w", err)
	}
	if err := netlink.LinkSetMaster(hostVeth, bridge); err != nil {
		return "", fmt.Errorf("failed to attach veth to bridge: %w", err)
	}
	if err := netlink.LinkSetUp(hostVeth); err != nil {
		return "", fmt.Errorf("failed to bring up host veth: %w", err)
	}

	// Now configure the pod side. We need to enter the pod's namespace.
	originNS, _ := netns.Get()
	defer netns.Set(originNS)
	if err := netns.Set(podNS); err != nil {
		return "", fmt.Errorf("failed to enter pod ns: %w", err)
	}

	podVeth, err := netlink.LinkByName(ifName)
	if err != nil {
		return "", fmt.Errorf("failed to find veth in pod ns: %w", err)
	}
	addr := &netlink.Addr{
		IPNet: &net.IPNet{
			IP:   podIP,
			Mask: subnet.Mask,
		},
	}
	if err := netlink.AddrAdd(podVeth, addr); err != nil {
		return "", fmt.Errorf("failed to set pod IP: %w", err)
	}
	if err := netlink.LinkSetUp(podVeth); err != nil {
		return "", fmt.Errorf("failed to bring up pod veth: %w", err)
	}

	// Add a default route pointing at the bridge gateway.
	defaultRoute := &netlink.Route{
		Dst: nil, // default route
		Gw:  gatewayIP,
	}
	if err := netlink.RouteAdd(defaultRoute); err != nil {
		return "", fmt.Errorf("failed to add default route: %w", err)
	}
	return hostVethName, nil
}

func generateShortID() string {
	// In practice, derive this from the container ID for determinism.
	f, _ := os.ReadFile("/proc/sys/kernel/random/uuid")
	return strings.ReplaceAll(string(f[:8]), "-", "")
}
```
Wiring It Together in cmdAdd
```go
func cmdAdd(args *skel.CmdArgs) error {
	conf, err := parseConfig(args.StdinData)
	if err != nil {
		return err
	}
	_, subnet, err := net.ParseCIDR(conf.Subnet)
	if err != nil {
		return fmt.Errorf("invalid subnet %q: %w", conf.Subnet, err)
	}
	gatewayIP := net.ParseIP(conf.Gateway)

	// 1. Allocate an IP.
	ipam, err := loadIPAM()
	if err != nil {
		return fmt.Errorf("failed to load IPAM: %w", err)
	}
	podIP, err := ipam.allocate(args.ContainerID, conf.Subnet, conf.Gateway)
	if err != nil {
		return fmt.Errorf("IPAM allocation failed: %w", err)
	}

	// 2. Ensure the bridge exists.
	bridge, err := ensureBridge(conf.Bridge, gatewayIP, subnet)
	if err != nil {
		return err
	}

	// 3. Create veth pair and configure pod networking.
	hostVethName, err := setupVeth(args.Netns, args.IfName, podIP, subnet,
		gatewayIP, bridge)
	if err != nil {
		return err
	}

	// 4. Load and attach the eBPF program to the bridge.
	if err := loadAndAttachBPF(conf.Bridge); err != nil {
		// Log the error but don't fail; basic L2 forwarding
		// via the bridge will still work without the eBPF program.
		fmt.Fprintf(os.Stderr, "warning: eBPF attach failed: %v\n", err)
	}
	_ = hostVethName // Used for logging/debugging in a real plugin.

	// 5. Build and return the CNI result.
	result := &current.Result{
		CNIVersion: conf.CNIVersion,
		IPs: []*current.IPConfig{
			{
				Address: net.IPNet{
					IP:   podIP,
					Mask: subnet.Mask,
				},
				Gateway: gatewayIP,
			},
		},
	}
	return types.PrintResult(result, conf.CNIVersion)
}

func cmdDel(args *skel.CmdArgs) error {
	ipam, err := loadIPAM()
	if err != nil {
		return nil // Best-effort cleanup.
	}
	ipam.release(args.ContainerID)
	// The veth pair is automatically destroyed when the network
	// namespace is deleted, so we don't need to clean it up manually.
	return nil
}

func cmdCheck(args *skel.CmdArgs) error {
	return nil // Not implemented for this demo.
}
```
Part 4: The eBPF Program
We’ll write an eBPF program that attaches to the TC (traffic control) ingress hook on our bridge. The program inspects each incoming packet and can make forwarding decisions, collect metrics, or enforce policy.
For our minimal plugin, the eBPF program does two things: it counts packets per source IP (giving us basic observability), and it passes all traffic through (the bridge handles actual L2 forwarding). This is a starting point: you could extend it to do direct forwarding, bypass the bridge entirely, or enforce network policy.
Create a file called cni_tc.bpf.c:
```c
// cni_tc.bpf.c
// Compile with: clang -O2 -target bpf -c cni_tc.bpf.c -o cni_tc.bpf.o
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

// Map: source IP -> packet count.
// Useful for basic per-pod traffic observability.
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u32);   // source IPv4 address
	__type(value, __u64); // packet count
} pkt_count SEC(".maps");

// Map: destination IP -> destination ifindex.
// Populated from userspace by the CNI plugin to enable
// direct forwarding without relying on the bridge's FDB.
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 256);
	__type(key, __u32);   // destination IPv4 address
	__type(value, __u32); // ifindex of the destination veth
} pod_ifindex SEC(".maps");

SEC("tc")
int cni_tc_main(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;

	// Parse the Ethernet header.
	struct ethhdr *eth = data;
	if ((void *)(eth + 1) > data_end)
		return TC_ACT_OK;

	// We only care about IPv4.
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return TC_ACT_OK;

	// Parse the IP header.
	struct iphdr *ip = (void *)(eth + 1);
	if ((void *)(ip + 1) > data_end)
		return TC_ACT_OK;

	// --- Packet counting ---
	__u32 src_ip = ip->saddr;
	__u64 *count = bpf_map_lookup_elem(&pkt_count, &src_ip);
	if (count) {
		__sync_fetch_and_add(count, 1);
	} else {
		__u64 init_val = 1;
		bpf_map_update_elem(&pkt_count, &src_ip, &init_val, BPF_ANY);
	}

	// --- Optional: direct forwarding ---
	// If we have a known ifindex for the destination pod,
	// we can redirect the packet directly, bypassing the
	// bridge's normal forwarding path. This is the core
	// idea behind how Cilium achieves high performance.
	__u32 dst_ip = ip->daddr;
	__u32 *target_ifindex = bpf_map_lookup_elem(&pod_ifindex, &dst_ip);
	if (target_ifindex) {
		return bpf_redirect(*target_ifindex, 0);
	}

	// Fall through to normal bridge forwarding.
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
```
A few things to note about this program. The verifier, the kernel component that checks eBPF programs before loading them, requires that every pointer access is bounds-checked. That's why we compare (void *)(eth + 1) > data_end before dereferencing the Ethernet header, and similarly for the IP header. Without those checks, the verifier will reject the program.
The SEC("tc") annotation tells the loader this is a TC classifier program. The SEC(".maps") annotation on the structs defines BPF maps: key-value stores shared between the eBPF program running in the kernel and our userspace CNI binary.
Compile it:
```shell
clang -O2 -g -target bpf -c cni_tc.bpf.c -o cni_tc.bpf.o
```
Part 5: Loading the eBPF Program from Go
We need Go code to load the compiled eBPF object file and attach it to the bridge's TC ingress hook. We'll start with the simplest approach, shelling out to the tc command, and then sketch a native loader built on the cilium/ebpf library:
```go
// bpf.go
package main

import (
	"fmt"
	"os/exec"
)

const bpfObjectPath = "/opt/cni/bin/cni_tc.bpf.o"

func loadAndAttachBPF(bridgeName string) error {
	// For simplicity, we shell out to `tc` to attach the eBPF program.
	// A production plugin would use the cilium/ebpf library to do this
	// programmatically, which avoids the exec dependency and gives you
	// direct access to the BPF maps.

	// First, add a clsact qdisc to the bridge if it doesn't exist.
	// clsact is a special qdisc that lets us attach eBPF programs
	// to both ingress and egress without any queuing behavior.
	exec.Command("tc", "qdisc", "del", "dev", bridgeName,
		"clsact").Run() // Ignore error; it may not exist yet.
	out, err := exec.Command("tc", "qdisc", "add", "dev", bridgeName,
		"clsact").CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to add clsact qdisc: %s: %w",
			string(out), err)
	}

	// Attach our eBPF program to the ingress path of the bridge.
	out, err = exec.Command("tc", "filter", "add", "dev", bridgeName,
		"ingress", "bpf", "da", "obj", bpfObjectPath,
		"sec", "tc").CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to attach BPF program: %s: %w",
			string(out), err)
	}
	return nil
}
```
For a more robust approach, here’s a sketch of how you’d use the cilium/ebpf library directly. This gives you programmatic access to the BPF maps so you can populate the pod_ifindex map when new pods are created:
```go
// bpf_native.go (alternative approach using cilium/ebpf)
package main

import (
	"fmt"
	"net"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/vishvananda/netlink"
)

type BPFProgram struct {
	collection *ebpf.Collection
	pktCount   *ebpf.Map
	podIfindex *ebpf.Map
}

func loadBPFNative(bridgeName string) (*BPFProgram, error) {
	spec, err := ebpf.LoadCollectionSpec(bpfObjectPath)
	if err != nil {
		return nil, fmt.Errorf("load BPF spec: %w", err)
	}
	coll, err := ebpf.NewCollection(spec)
	if err != nil {
		return nil, fmt.Errorf("create BPF collection: %w", err)
	}
	prog := coll.Programs["cni_tc_main"]
	if prog == nil {
		coll.Close()
		return nil, fmt.Errorf("BPF program 'cni_tc_main' not found")
	}

	// Get the bridge interface.
	br, err := netlink.LinkByName(bridgeName)
	if err != nil {
		coll.Close()
		return nil, fmt.Errorf("find bridge: %w", err)
	}

	// Attach to TC ingress. TCX requires a recent kernel (6.6+); on
	// older kernels, fall back to the tc/clsact approach above. Also
	// note that a bpf_link detaches when its file descriptor closes,
	// so a short-lived CNI binary would need to pin the returned link
	// to keep the program attached after the process exits.
	_, err = link.AttachTCX(link.TCXOptions{
		Program:   prog,
		Attach:    ebpf.AttachTCXIngress,
		Interface: br.Attrs().Index,
	})
	if err != nil {
		coll.Close()
		return nil, fmt.Errorf("attach TC program: %w", err)
	}

	return &BPFProgram{
		collection: coll,
		pktCount:   coll.Maps["pkt_count"],
		podIfindex: coll.Maps["pod_ifindex"],
	}, nil
}

// RegisterPod adds a pod's IP-to-ifindex mapping so the eBPF program
// can forward packets directly to the correct veth.
func (b *BPFProgram) RegisterPod(podIP net.IP, ifindex uint32) error {
	key := ip4ToUint32(podIP)
	return b.podIfindex.Put(key, ifindex)
}

func ip4ToUint32(ip net.IP) uint32 {
	ip = ip.To4()
	return uint32(ip[0]) | uint32(ip[1])<<8 |
		uint32(ip[2])<<16 | uint32(ip[3])<<24
}
```
Part 6: Deployment
Building
```shell
# Compile the eBPF program.
clang -O2 -g -target bpf -c cni_tc.bpf.c -o cni_tc.bpf.o

# Build the CNI binary.
CGO_ENABLED=0 go build -o simple-ebpf-cni .
```
Installing on the Node
CNI plugins live in /opt/cni/bin/ and their configs live in /etc/cni/net.d/. Here’s a simple install script:
```shell
#!/bin/bash
set -euo pipefail

PLUGIN_NAME="simple-ebpf-cni"
CNI_BIN_DIR="/opt/cni/bin"
CNI_CONF_DIR="/etc/cni/net.d"

echo "Installing CNI plugin..."

# Copy the binary and BPF object.
sudo cp ${PLUGIN_NAME} ${CNI_BIN_DIR}/
sudo cp cni_tc.bpf.o ${CNI_BIN_DIR}/
sudo chmod +x ${CNI_BIN_DIR}/${PLUGIN_NAME}

# Write the CNI config. Using a .conflist is also fine,
# but a single .conf keeps things simple.
sudo tee ${CNI_CONF_DIR}/10-simple-ebpf.conf > /dev/null <<EOF
{
  "cniVersion": "1.0.0",
  "name": "simple-ebpf-cni",
  "type": "simple-ebpf-cni",
  "bridge": "cbr0",
  "subnet": "10.244.1.0/24",
  "gateway": "10.244.1.1"
}
EOF

echo "Done. Restart kubelet to pick up the new CNI config."
```
Deploying as a DaemonSet
In a real cluster, you’d package this as a DaemonSet with an init container that copies the binary and config onto each node. Here’s a minimal manifest:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: simple-ebpf-cni
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: simple-ebpf-cni
  template:
    metadata:
      labels:
        app: simple-ebpf-cni
    spec:
      hostNetwork: true
      tolerations:
        - operator: Exists
      initContainers:
        - name: install-cni
          image: your-registry/simple-ebpf-cni:latest
          command: ["/install.sh"]
          volumeMounts:
            - name: cni-bin
              mountPath: /opt/cni/bin
            - name: cni-conf
              mountPath: /etc/cni/net.d
          securityContext:
            privileged: true
      containers:
        - name: monitor
          image: your-registry/simple-ebpf-cni:latest
          command: ["sleep", "infinity"]
          # In a real plugin, this container would run a daemon that
          # watches for changes and manages the eBPF maps (e.g.,
          # updating pod_ifindex entries as pods come and go).
      volumes:
        - name: cni-bin
          hostPath:
            path: /opt/cni/bin
        - name: cni-conf
          hostPath:
            path: /etc/cni/net.d
```
Part 7: Testing
Once installed, create a couple of test pods and verify connectivity:
```shell
# Create two test pods.
kubectl run test-a --image=busybox --command -- sleep 3600
kubectl run test-b --image=busybox --command -- sleep 3600

# Wait for them to be running.
kubectl wait --for=condition=ready pod/test-a pod/test-b --timeout=60s

# Check that they got IPs from our subnet.
kubectl get pods -o wide

# Test connectivity from pod A to pod B.
POD_B_IP=$(kubectl get pod test-b -o jsonpath='{.status.podIP}')
kubectl exec test-a -- ping -c 3 "$POD_B_IP"
```
To inspect the eBPF packet counters, use bpftool on the node:
```shell
# Find the map ID.
sudo bpftool map show | grep pkt_count

# Dump the contents (IPs are in network byte order).
sudo bpftool map dump id <MAP_ID>
```
You can also inspect the TC filter attachment:
```shell
# Verify the eBPF program is attached to the bridge.
sudo tc filter show dev cbr0 ingress
```
Part 8: What’s Missing
This plugin handles the simplest possible case. Here’s what a production-grade eBPF CNI adds on top:
Multi-node networking. Our plugin only works on a single node. Real CNI plugins use VXLAN, Geneve, WireGuard, or BGP to route traffic between nodes. Cilium uses either VXLAN encapsulation or native routing with BGP depending on your configuration.
Services and load balancing. Kubernetes Services (ClusterIP, NodePort, LoadBalancer) require DNAT/SNAT to route traffic to backend pods. eBPF-based CNIs replace kube-proxy’s iptables rules with eBPF programs attached to the XDP or TC hook, which is significantly faster.
Network Policy. Kubernetes NetworkPolicy resources define which pods can talk to each other. Enforcing these in eBPF means matching packets against policy rules in the datapath, essentially building a firewall in BPF.
IPv6 and dual-stack. We only handled IPv4. Real deployments increasingly need IPv6 support.
Observability. Our packet counter is a start, but production systems emit flow logs, integrate with Hubble or Prometheus, and provide real-time visibility into traffic patterns.
Identity-based security. Cilium assigns each pod a cryptographic identity and enforces policies based on identity rather than IP addresses, which is more robust in dynamic environments where IPs are constantly recycled.