Write a program which schedules other programs.

Linux scheduling policy is usually buried deep inside the kernel. Even with sched_ext, the policy is typically implemented as an eBPF program compiled ahead of time.

I contributed to Lunatik as part of GSoC project in 2026.

luasched is a part of that project.

It is a Lunatik binding created with a different approach: the scheduler remains in eBPF, but task classification is delegated to a Lua script. Instead of rebuilding the scheduler whenever policy changes, we can simply edit a Lua file.

Motivation

Suppose we want:

  • Nginx workers to get low latency scheduling
  • Firefox background processes to receive larger time slices
  • Everything else to use the default policy

Traditionally this logic would be hardcoded inside the scheduler like so:

if (is_nginx(task))
    ...
else if (is_firefox(task))
    ...

There are two pain points:

  • Every change is in eBPF, and the eBPF code is a pain to write.
  • Every policy change requires recompilation.

With luasched, the scheduler asks Lua how a task should be treated.

if task:comm():match("^nginx") then
	ctx:dsq(REALTIME)
	ctx:slice_ns(10000000)
end

IMPORTANT

The scheduling mechanism stays in eBPF. The scheduling policy lives in Lua.

Architecture:

flowchart TD
A[enqueue task] ==> B["`sched_ext BPF
check pid inside eBPF map
if found, dispatch
if **not** found, invoke Lua
`"]
B --> G[(eBPF map)]
B ==>|Cache miss|D[bpf_luasched_run kfunc]
B -->|Cache hit|E[scx_bpf_dsq_insert]
D ==>|ctx:dsq, ctx:slice_ns|F[Lua runtime]
F ==> E

There are two parts to this scheduler:

  • Create an eBPF scheduler which handles the fast path.
  • Create a workload.lua policy handler which takes care of slow path.
  • The first time a task is seen, the scheduler invokes Lua.
  • Lua returns:
    • target dispatch queue (DSQ)
    • scheduling slice
  • The result is cached in a BPF hash map keyed by PID.
  • Subsequent enqueues avoid Lua entirely.

Dispatch Queues

The user needs to define (on eBPF side) dispatch queues, like so.

#define DSQ_REALTIME 0
#define DSQ_BATCH    1
#define DSQ_DEFAULT  2

During initialization (on eBPF side) these queues are created:

scx_bpf_create_dsq(DSQ_REALTIME, -1);
scx_bpf_create_dsq(DSQ_BATCH, -1);
scx_bpf_create_dsq(DSQ_DEFAULT, -1);

Tasks are inserted into one of these queues based on their Lua-assigned class.

The dispatcher prioritizes them in order:

flowchart LR
a[REALTIME] --> B[BATCH]
B --> C[DEFAULT]

This means latency-sensitive work can be serviced before background workloads.

eBPF scheduler

Now on eBPF side create a map for caching pids with enqueue decisions. Define an enqueue decision as the following struct:

struct task_class {
	s32 dsq;
	u64 slice_ns;
};

Now we create the map for caching the decisions

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, pid_t);
	__type(value, struct task_class);
} task_classes SEC(".maps");

The interesting part happens inside enqueue. If the task was previously classified, the cached result is reused.

void BPF_STRUCT_OPS(luasched_enqueue, struct task_struct *p, u64 enq_flags)
{
    pid_t pid = p->pid;
    struct task_class *cls;
    
    cls = bpf_map_lookup_elem(&task_classes, &pid);
    if (cls) {
    	scx_bpf_dsq_insert(p, cls->dsq, cls->slice_ns, 0);
    	return;
    }
    ...
    /* invoke Lua to determine the enqueue verdict */
}

If no cached result is found, Lua is invoked. The function is exposed as a kfunc

extern int bpf_luasched_run(
    const char *key, 
    size_t key__sz, 
    struct task_struct *task, 
    struct task_class *cls
) __ksym;

It recieves a pointer to struct task_class and modifies it according to a Lua policy

void BPF_STRUCT_OPS(luasched_enqueue, struct task_struct *p, u64 enq_flags)
{
    ...
    /* invoke Lua to determine the enqueue verdict */
    struct task_class received_cls = { .dsq = -1, .slice_ns = -1 };
    
    int ret = bpf_luasched_run(runtime, sizeof(runtime), p, &received_cls);
    
    bpf_map_update_elem(&task_classes, &pid, &received_cls, BPF_ANY);
    scx_bpf_dsq_insert(p, received_cls.dsq, received_cls.slice_ns, 0);
}

Writing Policy in Lua

The scheduler itself knows nothing about process names like nginx or firefox.

That knowledge lives entirely in Lua.

local policy = {
    { pattern = "^nginx", dsq = REALTIME, slice = 1000000 },
    { pattern = "^firefox", dsq = BATCH, slice = 10000000 },
}

On Lua we attach a handler to set the enqueue verdict

local sched = require("sched")
 
local function workload(ctx)
	local task = ctx:task()
	for _, rule in ipairs(policy) do
		if task:comm():match(rule.pattern) then
			ctx:dsq(rule.dsq)
			ctx:slice_ns(rule.slice)
			return
		end
	end
	ctx:dsq(DEFAULT)
	ctx:slice_ns(scx.SLICE_DFL)
end
 
sched.attach(workload)

Why Cache Results?

Calling into Lua on every enqueue would be expensive. A task’s command name rarely changes after startup. By classifying a task once and storing the result in a BPF map, the scheduler pays the Lua cost only once.

All future scheduling decisions become simple hash lookups.