Chapter 21: User I/O Subsystems

TTY/PTY, console/logging, input (evdev), audio (ALSA), display/graphics (DRM/KMS)

User I/O subsystems bridge the kernel to human-facing hardware: terminals (TTY/PTY), input devices (evdev), audio (ALSA-compatible), and display/graphics (DRM/KMS). Each subsystem presents a Linux-compatible userspace API while using UmkaOS-internal driver isolation and zero-copy paths where applicable.

21.1 TTY and PTY Subsystem

Tier assignment: The early-boot serial console (/dev/console, /dev/ttyS0) runs as Tier 0 (in-kernel, statically linked) because it is the diagnostic output path during boot before any driver isolation domains are available. The PTY subsystem (/dev/ptmx, /dev/pts/*) and non-boot serial drivers (/dev/ttyUSB*, /dev/ttyACM*) run as Tier 1 within the VFS isolation domain. Line discipline processing for PTYs is performed in the Tier 1 domain alongside the VFS and devpts pseudo-filesystem.

21.1.1 The Problem

Linux's TTY layer is a historical artifact designed for 300-baud hardware teletypes. It features monolithic locks (tty_mutex, termios_rwsem), synchronous line discipline processing (handling backspace and signals in the critical path), and a complex buffer management system that scales poorly to thousands of concurrent terminal sessions.

In modern systems, the TTY layer is primarily used for Pseudo-Terminals (PTYs) — the backends for SSH sessions, terminal emulators (GNOME Terminal, Alacritty), and container multiplexers (Docker, Kubernetes). The Linux PTY implementation requires every byte of terminal output to traverse the kernel data path, acquiring locks and waking sleeping processes, making it a significant bottleneck for high-density container logging and high-throughput terminal applications.

21.1.2 UmkaOS's Lock-Free Ring Architecture

UmkaOS completely rearchitects the TTY/PTY subsystem around lock-free, single-producer/single-consumer (SPSC) ring buffers, modeled on the KABI ring buffers used for storage and networking (Section 11.7) but with a simplified header.

The PTY Data Path: A PTY consists of a master side (/dev/ptmx, held by sshd or Docker) and a slave side (/dev/pts/N, held by the shell or containerized application).

In UmkaOS, a PTY pair shares a pair of mapped memory pages (8 KB total) containing two SPSC ring buffers (master-to-slave and slave-to-master). Each ring buffer occupies one 4 KB page, providing adequate buffer space for interactive terminal sessions and container logging.

/// PTY ring buffer header. 128 bytes (two cache lines), holding only the
/// head and tail indices.
///
/// This is a simplified SPSC ring buffer format (not the full DomainRingBuffer
/// from Section 11.6.2, which has 128 bytes of header for MPSC/broadcast support).
/// PTYs are always single-producer/single-consumer, so the compact header suffices.
///
/// Layout (128 bytes, cache-line padded to prevent false sharing):
///   - bytes [0..8]:    head index (write position, AtomicU64)
///   - bytes [8..64]:   padding (separate head from tail cache line)
///   - bytes [64..72]:  tail index (read position, AtomicU64)
///   - bytes [72..128]: padding (separate tail from data)
///   - bytes [128..4096]: data buffer (3968 bytes usable)
///
/// The 64-byte cache-line separation between head and tail eliminates false
/// sharing: the producer writes head (cache line 0) while the consumer writes
/// tail (cache line 1). Without padding, both fields would fit in 16 bytes
/// and share a cache line, causing cache-line bouncing at ~1M writes/s.
// Userspace boundary struct — mmap'd into userspace in zero-copy PTY mode. Layout is stable.
#[repr(C, align(64))]
pub struct PtyRingHeader {
    /// Write position (producer advances). Counts bytes written.
    /// Applied modulo data capacity to get buffer offset.
    ///
    /// AtomicU64 on 32-bit architectures (ARMv7, PPC32): the SPSC protocol
    /// only requires store-release (producer) and load-acquire (consumer) —
    /// no CAS or read-modify-write. On architectures where AtomicU64 is not
    /// natively lock-free, these are implemented as a u64 write + release
    /// fence (producer) and acquire fence + u64 read (consumer), which is
    /// correct for single-producer/single-consumer without hardware atomics.
    pub head: AtomicU64,
    /// Padding to push tail to the next 64-byte cache line.
    pub _pad_head: [u8; 56],
    /// Read position (consumer advances). Counts bytes read.
    /// Applied modulo data capacity to get buffer offset.
    pub tail: AtomicU64,
    /// Padding to fill the second cache line (64 bytes total per line).
    pub _pad_tail: [u8; 56],
}

// Static assertion: PtyRingHeader is exactly 128 bytes (2 cache lines).
// head(8) + _pad_head(56) + tail(8) + _pad_tail(56) = 128 bytes.
const _PTY_RING_HEADER_SIZE: () = assert!(
    core::mem::size_of::<PtyRingHeader>() == 128,
    "PtyRingHeader must be exactly 128 bytes (2 cache lines)"
);

/// Data capacity of each PTY ring after the 128-byte cache-line-padded header.
/// 4096 (page) - 128 (header) = 3968 usable bytes. With monotonic u64
/// counters, sentinel-based disambiguation is unnecessary (full when
/// `head - tail >= PTY_RING_DATA_SIZE`).
pub const PTY_RING_DATA_SIZE: usize = 4096 - 128;  // 3968

/// PTY ring buffer page. 4 KB total, 3968 bytes usable data.
/// Aligned to page boundary for direct mmap() into userspace.
///
/// The producer writes at (head % PTY_RING_DATA_SIZE + 128), advancing head.
/// The consumer reads at (tail % PTY_RING_DATA_SIZE + 128), advancing tail.
///
/// **Full/empty detection**: Since head and tail are monotonically increasing
/// u64 counters (never reset), full/empty is detected by simple subtraction:
/// - Empty when `head == tail`
/// - Full when `head - tail >= PTY_RING_DATA_SIZE`
/// - Available for write: `PTY_RING_DATA_SIZE - (head - tail)`
/// - Available for read: `head - tail`
/// Buffer offsets are derived by `head % PTY_RING_DATA_SIZE + 128` and
/// `tail % PTY_RING_DATA_SIZE + 128`.
/// The u64 counters will not wrap in practice (2^64 bytes = 18 exabytes).
// Userspace boundary struct — mmap'd into userspace in zero-copy PTY mode. Layout is stable.
#[repr(C, align(4096))]
pub struct PtyRingPage {
    /// Ring buffer header (128 bytes, 2 cache lines).
    pub header: PtyRingHeader,
    /// Data buffer (PTY_RING_DATA_SIZE bytes).
    pub data: [u8; PTY_RING_DATA_SIZE],
}
// PtyRingPage: 128 (header) + 3968 (data) = 4096 bytes (one page).
// mmap'd into userspace — boundary struct.
const _: () = assert!(core::mem::size_of::<PtyRingPage>() == 4096);

/// The reverse-direction ring (slave→master) is a separate page allocation.
/// Same layout as PtyRingPage. This design allows each direction to be
/// mapped independently if needed, and avoids a single 8 KB allocation
/// that would exceed page granularity.
// Userspace boundary struct — mmap'd into userspace in zero-copy PTY mode. Layout is stable.
#[repr(C, align(4096))]
pub struct PtyRingPageReverse {
    /// Ring buffer header (128 bytes, 2 cache lines).
    pub header: PtyRingHeader,
    /// Data buffer (PTY_RING_DATA_SIZE bytes).
    pub data: [u8; PTY_RING_DATA_SIZE],
}
// PtyRingPageReverse: same layout as PtyRingPage = 4096 bytes.
const _: () = assert!(core::mem::size_of::<PtyRingPageReverse>() == 4096);

/// Terminal state shared between master and slave.
/// Stored in a separate small allocation (not a full page) within a per-master
/// state arena. Multiple PTYs from the same master share a single arena,
/// amortizing the page allocation overhead.
///
/// Total size: 32 bytes (8-byte aligned, fits in half a cache line).
/// Not cache-line padded (64 bytes) — multiple AtomicTtyState structs
/// are packed into a shared 4 KB arena; padding to 64 bytes would halve
/// the arena capacity from 128 to 64 PTYs per page.
/// Layout: termios_flags(4) + winsize_seq(4) + winsize_data(8) + flow_control(1)
///         + zero_copy_enabled(1) + _pad(14) = 32.
#[repr(C, align(8))]
pub struct AtomicTtyState {
    /// Terminal flags (ICANON, ECHO, ISIG, etc.) as bit positions.
    /// Modified atomically via compare-and-swap.
    pub termios_flags: AtomicU32,
    /// Window size (rows, columns). Modified via seqlock protocol.
    /// Layout: [seq_counter: AtomicU32 (4 bytes), winsize: Winsize (8 bytes)]
    pub winsize_seq: AtomicU32,
    pub winsize_data: UnsafeCell<Winsize>,
    /// Flow control state (stopped/running).
    pub flow_control: AtomicBool,
    /// Zero-copy mode enabled flag. Set by mutual consent handshake.
    pub zero_copy_enabled: AtomicBool,
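    /// Padding to 32 bytes so that slots pack evenly into the shared arena
    /// (the struct is deliberately not padded to a full cache line, see above).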
    _pad: [u8; 14],
}
const_assert!(core::mem::size_of::<AtomicTtyState>() == 32);

/// Window size structure (matches POSIX struct winsize from <sys/ioctl.h>).
/// Used by TIOCGWINSZ/TIOCSWINSZ ioctls.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Winsize {
    pub ws_row: u16,
    pub ws_col: u16,
    pub ws_xpixel: u16,
    pub ws_ypixel: u16,
}
const_assert!(core::mem::size_of::<Winsize>() == 8);
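The full/empty arithmetic described for `PtyRingPage` above can be illustrated with a small userspace model. This is a minimal sketch, not the kernel implementation: `DemoRing`, `push`, and `pop` are hypothetical names, the cache-line padding is omitted, and the ring lives in ordinary memory rather than a mapped page. It only shows how monotonic `u64` counters give full/empty detection by subtraction, with the buffer offset derived by modulo.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, Ordering};

/// Data capacity per ring, as in the text: one 4096-byte page minus the 128-byte header.
pub const PTY_RING_DATA_SIZE: usize = 4096 - 128;

/// Hypothetical stand-in for `PtyRingPage` (no padding, not page-aligned).
pub struct DemoRing {
    head: AtomicU64, // total bytes ever written; only the producer stores to it
    tail: AtomicU64, // total bytes ever read; only the consumer stores to it
    data: UnsafeCell<[u8; PTY_RING_DATA_SIZE]>,
}

// SAFETY: sound only under the SPSC contract (one thread calls `push`,
// one thread calls `pop`); the two sides never touch the same bytes.
unsafe impl Sync for DemoRing {}

impl DemoRing {
    pub fn new() -> Self {
        DemoRing {
            head: AtomicU64::new(0),
            tail: AtomicU64::new(0),
            data: UnsafeCell::new([0; PTY_RING_DATA_SIZE]),
        }
    }

    /// Producer side: copies as many bytes as currently fit and returns that count.
    pub fn push(&self, src: &[u8]) -> usize {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = PTY_RING_DATA_SIZE - (head - tail) as usize;
        let n = src.len().min(free);
        let buf = self.data.get() as *mut u8;
        for (i, &b) in src[..n].iter().enumerate() {
            let off = ((head + i as u64) % PTY_RING_DATA_SIZE as u64) as usize;
            // SAFETY: `off` is in bounds and the slot is free (not readable by the consumer).
            unsafe { buf.add(off).write(b) };
        }
        // Publish the bytes: store-release pairs with the consumer's load-acquire.
        self.head.store(head + n as u64, Ordering::Release);
        n
    }

    /// Consumer side: copies up to `dst.len()` available bytes and returns that count.
    pub fn pop(&self, dst: &mut [u8]) -> usize {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let avail = (head - tail) as usize;
        let n = dst.len().min(avail);
        let buf = self.data.get() as *const u8;
        for (i, slot) in dst[..n].iter_mut().enumerate() {
            let off = ((tail + i as u64) % PTY_RING_DATA_SIZE as u64) as usize;
            // SAFETY: `off` is in bounds and the byte was published by the producer.
            *slot = unsafe { buf.add(off).read() };
        }
        // Release the slots back to the producer.
        self.tail.store(tail + n as u64, Ordering::Release);
        n
    }
}
```

Because the counters never reset, a push into a full ring simply returns 0 and a pop from an empty ring returns 0; no sentinel slot is needed.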

PtyPair struct — the kernel's handle for one PTY master+slave pair:

/// Kernel-side state for a single PTY pair (master + slave).
///
/// # Safety
/// The three raw pointer fields (`master_tx`, `slave_tx`, `state`) point to
/// slab-allocated objects exclusively owned by this `PtyPair`. They are:
/// - Allocated in `PtyPair::new()` from the PTY slab pool (rings: 4 KiB page
///   each; state: 32-byte slot from `pty_state_slab()`).
/// - Valid for the entire lifetime of the `PtyPair` (until `Drop`).
/// - ALL three are freed in `Drop::drop()` by returning to their respective
///   slab pools (`pty_slab_pool` for rings, `pty_state_slab` for state).
/// No aliasing occurs: only this `PtyPair` and the zero-copy mapped processes
/// (if enabled) hold references to the physical pages. The kernel retains
/// ownership and revokes user mappings on process exit or exec.
pub struct PtyPair {
    /// Master→slave ring buffer (1 page, 4 KiB). See Safety on `PtyPair`.
    /// `Option` so that `Drop` can `.take()` to free the page exactly once.
    pub master_tx: Option<*mut PtyRingPage>,
    /// Slave→master ring buffer (1 page, 4 KiB). See Safety on `PtyPair`.
    /// `Option` so that `Drop` can `.take()` to free the page exactly once.
    pub slave_tx: Option<*mut PtyRingPageReverse>,
    /// Shared atomic terminal state (termios flags, winsize, flow control).
    /// See Safety on `PtyPair`.
    pub state: *mut AtomicTtyState,
    /// Mount namespace ID of the process that opened the master fd.
    /// Used for zero-copy security checks (see restriction 3 below).
    pub owner_mnt_ns_id: MntNsId,
    /// PTY index (/dev/pts/N) within the devpts mount.
    pub pts_index: u32,
    /// Initial termios snapshot used during zero-copy negotiation.
    /// For PTY devices, the authoritative termios state is `TtyPort.termios`,
    /// updated by `tcsetattr()`. `AtomicTtyState.termios_flags` is a lock-free
    /// cache of the hot-path flags, updated atomically on every `tcsetattr()`.
    pub termios: SpinLock<Termios>,
    /// Line discipline attached to this PTY (default: N_TTY).
    pub ldisc: AtomicU8,
    /// Control ring for out-of-band signal delivery in zero-copy mode.
    ///
    /// **Lifecycle**: Allocated as a single 4 KB page via `alloc_page()`
    /// when the terminal emulator enables zero-copy mode (ioctl
    /// `PTY_REQ_DIRECT`). The page is mapped read-write into the terminal
    /// emulator's address space; the kernel reads the entries and writes
    /// only `read_idx`.
    ///
    /// **Deallocation**: On PTY close (`tty_release()`), the kernel:
    /// 1. Unmaps the control ring page from the terminal emulator's VMA
    ///    (if still mapped — the process may have already exited).
    /// 2. Drains any pending events from the ring (no-op if empty).
    /// 3. Frees the page via `free_page()`.
    /// The `Option` is set to `None` after deallocation.
    /// If the terminal emulator process exits first, the VMA teardown
    /// in `exit_mmap()` unmaps its side; the kernel retains the page
    /// until `tty_release()` runs (triggered by the last fd close).
    pub control_ring: Option<*mut PtyControlRing>,
}

/// Drop impl returns all allocated pages/slots to the PTY slab pool.
/// Runs when the last `Arc<PtyPair>` reference is dropped (both master
/// and slave fds closed, XArray entry removed).
impl Drop for PtyPair {
    fn drop(&mut self) {
        // Order: unmap userspace mappings first (if still mapped), then
        // return pages/slots. The slab pool is per-NUMA-node for locality.
        if let Some(ring) = self.master_tx.take() {
            // SAFETY: ring was allocated from PTY slab in PtyPair::new()
            // and is exclusively owned by this PtyPair (no aliasing).
            unsafe { pty_slab_pool().free_page(ring as *mut u8) };
        }
        if let Some(ring) = self.slave_tx.take() {
            // SAFETY: same invariant as master_tx above.
            unsafe { pty_slab_pool().free_page(ring as *mut u8) };
        }
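        if let Some(ctrl) = self.control_ring.take() {
            // SAFETY: the control ring page is exclusively owned by this
            // PtyPair. It is normally freed (and set to `None`) earlier by
            // `tty_release()`, so this branch runs only if that did not happen.
            unsafe { pty_slab_pool().free_page(ctrl as *mut u8) };
        }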
        // Free the AtomicTtyState slot back to the state arena.
        // SAFETY: `state` was allocated from the PTY state slab in
        // PtyPair::new() and is exclusively owned by this PtyPair.
        // The 32-byte slot is returned to the per-NUMA slab allocator.
        if !self.state.is_null() {
            unsafe { pty_state_slab().free(self.state as *mut u8, 32) };
        }
    }
}

Memory layout: A PTY pair consists of three shared memory regions:
1. Master→slave ring (1 page, 4 KB): Written by master, read by slave
2. Slave→master ring (1 page, 4 KB): Written by slave, read by master
3. State arena (shared across PTYs from same master): Contains multiple AtomicTtyState structs (32 bytes each). A 4 KB arena supports up to 128 PTYs.

Seqlock protocol for window size (see Section 3.6 for the formal SeqLock<T> specification): Reads use the standard seqlock pattern:
1. Read winsize_seq with Acquire ordering. If odd, retry (writer in progress).
2. Read winsize_data with Relaxed ordering (protected by the epoch fence).
3. Read winsize_seq again with Acquire ordering. If changed, retry.

Writes: acquire the TTY write mutex, store winsize_seq odd with Release ordering, update winsize_data with Relaxed, store winsize_seq even with Release. The Acquire/Release pairs on winsize_seq ensure that data reads are fenced by the epoch loads on all architectures (including ARM/RISC-V/PPC with weak ordering).

Writer serialization: Concurrent TIOCSWINSZ callers must acquire the TTY write mutex before entering the seqlock write section (incrementing winsize_seq to odd). Without this, two concurrent writers can interleave their begin/end increments, leaving winsize_seq in an odd (permanently-locked) state and corrupting winsize_data. The reader path (TIOCGWINSZ) requires no mutex — pure seqlock retry is sufficient.
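The reader and writer steps above can be sketched in userspace Rust. This is an illustration under assumptions, not the `SeqLock<T>` from Section 3.6: `WinsizeCell` is a hypothetical name, the window size is stored as four `AtomicU16` values so that Relaxed data accesses are well-defined in Rust, a `std::sync::Mutex` stands in for the TTY write mutex, and the two explicit `fence` calls are a conservative addition by this sketch on top of the Acquire/Release accesses named in the text.

```rust
use std::sync::atomic::{fence, AtomicU16, AtomicU32, Ordering};
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Winsize {
    pub ws_row: u16,
    pub ws_col: u16,
    pub ws_xpixel: u16,
    pub ws_ypixel: u16,
}

/// Hypothetical seqlock-protected window size cell.
pub struct WinsizeCell {
    seq: AtomicU32,          // even = stable, odd = write in progress
    data: [AtomicU16; 4],    // row, col, xpixel, ypixel
    write_lock: Mutex<()>,   // stands in for the TTY write mutex
}

impl WinsizeCell {
    pub fn new(ws: Winsize) -> Self {
        WinsizeCell {
            seq: AtomicU32::new(0),
            data: [
                AtomicU16::new(ws.ws_row),
                AtomicU16::new(ws.ws_col),
                AtomicU16::new(ws.ws_xpixel),
                AtomicU16::new(ws.ws_ypixel),
            ],
            write_lock: Mutex::new(()),
        }
    }

    /// Reader path (TIOCGWINSZ analogue): lock-free, retries on a concurrent write.
    pub fn read(&self) -> Winsize {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                std::hint::spin_loop(); // writer in progress
                continue;
            }
            let ws = Winsize {
                ws_row: self.data[0].load(Ordering::Relaxed),
                ws_col: self.data[1].load(Ordering::Relaxed),
                ws_xpixel: self.data[2].load(Ordering::Relaxed),
                ws_ypixel: self.data[3].load(Ordering::Relaxed),
            };
            // Keep the data loads ordered before the second sequence read.
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Acquire) == s1 {
                return ws;
            }
        }
    }

    /// Writer path (TIOCSWINSZ analogue): serialized by the mutex.
    pub fn write(&self, ws: Winsize) {
        let _guard = self.write_lock.lock().unwrap();
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Release); // odd: write begins
        // Keep the data stores ordered after the odd sequence store.
        fence(Ordering::Release);
        self.data[0].store(ws.ws_row, Ordering::Relaxed);
        self.data[1].store(ws.ws_col, Ordering::Relaxed);
        self.data[2].store(ws.ws_xpixel, Ordering::Relaxed);
        self.data[3].store(ws.ws_ypixel, Ordering::Relaxed);
        self.seq.store(s.wrapping_add(2), Ordering::Release); // even: write done
    }
}
```

Holding the mutex for the whole write section is what keeps the sequence counter from being left odd by two interleaved writers.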

When the slave application calls write() to stdout, the UmkaOS syscall interface (umka-sysapi) writes the data directly into the slave_tx ring buffer. If the master application is polling via epoll() or io_uring, the kernel signals the eventfd associated with the ring.

Zero-Copy PTYs for Containers: For high-density container environments, UmkaOS supports a zero-copy PTY mode. If both the master and slave processes explicitly request it via an UmkaOS-specific ioctl(PTY_REQ_DIRECT), the kernel maps the PtyRingPage directly into the address spaces of both processes. The master and slave can then exchange terminal data entirely in userspace, bypassing the kernel data path completely. The kernel is only invoked to handle buffer full/empty wakeups (via futex). This allows a single node to stream gigabytes of container logs per second with near-zero CPU overhead.

Zero-copy mode restrictions:
- Raw mode only: Zero-copy mode requires the PTY to be in raw mode (ICANON flag clear in termios). The kernel's asynchronous TTY worker thread (Section 21.1) is bypassed, so no inline line discipline processing occurs. Applications receive raw bytes without backspace handling or line buffering. Signal generation is handled out-of-band via the control ring (see next bullet).
- Signal generation via control ring: Because the kernel data path is bypassed, inline byte-stream interception cannot detect control characters. Instead, zero-copy PTY uses a dedicated control ring for out-of-band signal delivery (see Section 21.1 below). POSIX semantics (Ctrl+C → SIGINT, Ctrl+\ → SIGQUIT, Ctrl+Z → SIGTSTP) are preserved.
- No echo processing: Local echo (ECHO flag) is disabled automatically when zero-copy mode is activated. The master must implement echo if required.
- Termios changes require renegotiation: If either side calls tcsetattr() to change terminal settings, the kernel automatically disables zero-copy mode and falls back to kernel-mediated mode. To re-enable, both sides must repeat the consent handshake.

Security Model for Zero-Copy PTYs:

Trust boundary note: Zero-copy mode creates a shared-memory channel between master and slave. The master process can directly read all slave terminal output without kernel mediation. Zero-copy mode requires mutual trust between master and slave and is not suitable for security-isolation boundaries (e.g., between different security domains, privilege levels, or container trust zones).

Zero-copy PTY mode requires explicit security checks before enabling direct memory sharing:

1. Capability requirement: The master side (the process requesting zero-copy mode) must hold CAP_TTY_DIRECT (defined in Section 9.2). This capability grants permission to bypass the kernel's TTY data path security checks. Container runtimes (Docker, containerd) typically hold this capability; unprivileged processes do not.
2. Mutual consent: Both master and slave must explicitly agree to zero-copy mode. The PtyDirectParams structure passed to PTY_REQ_DIRECT is:

/// Parameters for PTY_REQ_DIRECT ioctl.
/// Layout: C-compatible, 64-byte fixed size (padding ensures ABI stability).
#[repr(C)]
pub struct PtyDirectParams {
    /// Random 64-bit nonce generated by the master. The slave must echo
    /// this value in its PTY_ACK_DIRECT ioctl to prove consent.
    /// The kernel verifies nonce equality. Generated via `getrandom(2)`.
    pub nonce: u64,

    /// Timeout for slave acknowledgement in milliseconds.
    /// If the slave does not call PTY_ACK_DIRECT within this window,
    /// PTY_REQ_DIRECT returns -ETIMEDOUT. Range: 100-30000 ms.
    /// Default (0): kernel uses 5000 ms.
    pub timeout_ms: u32,

    /// Requested ring buffer size for the shared data ring (bytes).
    /// Must be a power of two in [4096, 4194304] (4 KB to 4 MB).
    /// Default (0): kernel uses 65536 bytes (64 KB, matching pipe default).
    pub ring_size_bytes: u32,

    /// Flags. Currently reserved, must be 0.
    pub flags: u64,

    /// On success, filled by the kernel with the file descriptor for
    /// the shared ring mmap. The caller maps this fd to access the ring.
    /// Negative value on failure.
    pub ring_fd: i32,

    /// Padding to 64 bytes for ABI stability.
    pub _pad: [u8; 36],
}
const_assert!(core::mem::size_of::<PtyDirectParams>() == 64);

Error codes for PTY_REQ_DIRECT:
- -EPERM: caller lacks CAP_TTY_DIRECT
- -EINVAL: timeout_ms or ring_size_bytes out of range, or flags != 0
- -ETIMEDOUT: slave did not acknowledge within timeout_ms
- -EBUSY: zero-copy mode already active on this PTY
- -ENOMEM: ring buffer allocation failed

Error codes for PTY_ACK_DIRECT:
- -ENOENT: no pending PTY_REQ_DIRECT request on this slave fd
- -EINVAL: nonce mismatch (wrong value supplied)

   - Master requests via ioctl(fd, PTY_REQ_DIRECT, &params) where params includes a nonce
   - Slave acknowledges via ioctl(slave_fd, PTY_ACK_DIRECT, nonce) within a timeout window
   - If the slave never acknowledges, the request fails with -ETIMEDOUT
   - This prevents a malicious master from forcing zero-copy mode on an unsuspecting slave
3. Same mount namespace constraint: Both processes must share the same mount namespace as the PTY owner. The check is:

// Zero-copy PTY access requires same mount namespace as the PTY owner.
// Mount namespace is stable for a process's lifetime (cannot be changed
// after unshare(CLONE_NEWNS)), unlike cgroup membership which can be
// changed after the zero-copy channel is established (TOCTOU bypass).
if current_task().mnt_ns_id == pty.owner_mnt_ns_id {
    enable_zero_copy_for_pair(master_fd, slave_fd)
} else {
    Err(Error::PermissionDenied)
}

Mount namespace is the correct isolation boundary for PTY zero-copy: it is immutable after unshare(CLONE_NEWNS) and correctly scopes to a container boundary. Using cgroup membership would be vulnerable to cgroup migration attacks — a process can be moved between cgroups by any holder of CAP_SYS_ADMIN, creating a TOCTOU bypass where the check passes but the process is subsequently migrated out of the container's cgroup scope before the zero-copy channel is used. Mount namespace membership, by contrast, is fixed for the lifetime of the process after the initial unshare() call and cannot be changed by any external actor.

The PtyPair struct stores owner_mnt_ns_id: MntNsId (not owner_cgroup) for this check. The MntNsId is recorded when the PTY master fd is opened (at posix_openpt() time) and never updated. Processes without CAP_SYS_ADMIN cannot change their mount namespace after creation.

4. Memory isolation guarantee: The PtyRingPage is mapped with PROT_READ | PROT_WRITE into both processes, but the kernel retains a back-reference to the physical pages. If either process exits or execs a binary with elevated capability grants (Section 9.2), the kernel immediately revokes the direct mapping and falls back to standard ring-buffer mode. This prevents privilege escalation via persistent shared memory.
5. Audit logging: Successful zero-copy mode activation generates an audit event (Section 20.2) with both PIDs and the PTY device identifier, enabling post-incident forensics.

The fallback path (when zero-copy is not requested or denied) uses the standard kernel-mediated ring buffer with full security checks on every data transfer.
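The parameter rules and error codes listed for PTY_REQ_DIRECT and PTY_ACK_DIRECT above can be summarized as a pair of pure functions. This is a sketch under assumptions: `DirectError`, `Effective`, `validate_req_direct`, and `check_ack` are hypothetical names, the enum variants stand in for the corresponding errno values, and the order in which the checks run is an assumption (the text does not specify it). Timeout handling and ring allocation are out of scope here.

```rust
/// Hypothetical error type; variants map to -EPERM, -EINVAL, -EBUSY, -ENOENT.
#[derive(Debug, PartialEq, Eq)]
pub enum DirectError {
    Perm,
    Inval,
    Busy,
    NoEnt,
}

/// Effective parameters after defaults are applied (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Effective {
    pub timeout_ms: u32,
    pub ring_size_bytes: u32,
}

/// Validates the caller-supplied PtyDirectParams fields.
pub fn validate_req_direct(
    has_cap_tty_direct: bool,
    zero_copy_active: bool,
    timeout_ms: u32,
    ring_size_bytes: u32,
    flags: u64,
) -> Result<Effective, DirectError> {
    if !has_cap_tty_direct {
        return Err(DirectError::Perm);
    }
    if flags != 0 {
        return Err(DirectError::Inval); // flags are reserved and must be 0
    }
    // 0 selects the kernel default of 5000 ms; otherwise 100..=30000 ms.
    let timeout_ms = match timeout_ms {
        0 => 5000,
        100..=30_000 => timeout_ms,
        _ => return Err(DirectError::Inval),
    };
    // 0 selects the 64 KB default; otherwise a power of two in [4 KB, 4 MB].
    let ring_size_bytes = match ring_size_bytes {
        0 => 65_536,
        n if n.is_power_of_two() && (4096..=4_194_304).contains(&n) => n,
        _ => return Err(DirectError::Inval),
    };
    if zero_copy_active {
        return Err(DirectError::Busy);
    }
    Ok(Effective { timeout_ms, ring_size_bytes })
}

/// Checks the slave's acknowledgement against the pending request's nonce.
pub fn check_ack(pending_nonce: Option<u64>, supplied: u64) -> Result<(), DirectError> {
    match pending_nonce {
        None => Err(DirectError::NoEnt),            // no PTY_REQ_DIRECT pending
        Some(n) if n == supplied => Ok(()),         // consent proven
        Some(_) => Err(DirectError::Inval),         // nonce mismatch
    }
}
```

Keeping the validation free of side effects makes the defaulting rules easy to test in isolation from the handshake state machine.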

21.1.2.1.1 Signal Generation in Zero-Copy Mode

Problem: In standard PTY mode, the kernel's line discipline (N_TTY) reads every byte written to the PTY master, detects control characters (INTR=0x03 → SIGINT, QUIT=0x1C → SIGQUIT, SUSP=0x1A → SIGTSTP, EOF=0x04), and delivers signals to the foreground process group. In zero-copy mode (PTY_REQ_DIRECT), the terminal emulator writes directly to the shared ring buffer without a kernel read path — so the kernel cannot intercept control characters inline.

Solution — Sentinel ring for control characters:

Zero-copy PTY uses a dual-ring design:

- Data ring (shared mmap, zero-copy): carries printable characters. The terminal emulator writes here at full speed.
- Control ring (small kernel-visible ring, 64 entries): carries out-of-band events. The terminal emulator writes here when it detects a control character.

/// Out-of-band control event sent from the terminal emulator to the kernel
/// via the control ring. Each variant corresponds to a POSIX signal or
/// terminal state change that the kernel must process.
// Size: 16 bytes per entry under #[repr(C, u8)] due to FlushTo { offset: u64 } alignment.
// The discriminant occupies the first byte; the largest variant (FlushTo/WindowResize)
// determines the enum size.
#[repr(C, u8)]
pub enum PtyControlEvent {
    /// Terminal emulator detected INTR character (default: Ctrl+C = 0x03).
    /// Kernel delivers SIGINT to foreground process group.
    SignalIntr = 1,
    /// Terminal emulator detected QUIT character (default: Ctrl+\ = 0x1C).
    /// Kernel delivers SIGQUIT to foreground process group.
    SignalQuit = 2,
    /// Terminal emulator detected SUSP character (default: Ctrl+Z = 0x1A).
    /// Kernel delivers SIGTSTP to foreground process group.
    SignalSusp = 3,
    /// Terminal window resized. Kernel delivers SIGWINCH and updates winsize.
    WindowResize { cols: u16, rows: u16, xpixel: u16, ypixel: u16 } = 4,
    /// Terminal emulator detected EOF (default: Ctrl+D = 0x04).
    /// Kernel sets hangup condition on PTY slave.
    Eof = 5,
    /// Flush the data ring up to this byte offset (for atomic command delivery).
    FlushTo { offset: u64 } = 6,
}
const_assert!(core::mem::size_of::<PtyControlEvent>() == 16);

Control ring layout:

/// Written to the control ring page (mapped read-write by terminal emulator).
/// The control ring occupies a single 4 KB page, separate from the data ring
/// pages. The terminal emulator writes events; the kernel drains them.
#[repr(C)]
pub struct PtyControlRing {
    /// Write index (terminal emulator advances).
    ///
    /// **Memory ordering**: Terminal emulator stores with `Release` after
    /// writing the event entry. Kernel loads with `Acquire` before reading
    /// the entry, ensuring the event data is visible before consumption.
    pub write_idx: AtomicU32,
    /// Padding to separate from kernel's read_idx (avoid false sharing).
    _pad: [u8; 60],
    /// Read index (kernel advances).
    ///
    /// **Memory ordering**: Kernel stores with `Release` after processing
    /// the event. Terminal emulator loads with `Acquire` before checking
    /// free slots, ensuring the slot is fully consumed before reuse.
    pub read_idx: AtomicU32,
    /// Ring entries. 64 entries × 16 bytes = 1024 bytes.
    /// Total struct: 4 + 60 + 4 + 4 + 1024 = 1096 bytes (fits in a 4 KB page
    /// with room for additional metadata). The extra 4 bytes are implicit
    /// padding after read_idx: PtyControlEvent contains a u64, so `entries`
    /// is 8-byte aligned and starts at offset 72, not 68 (on targets where
    /// u64 is 8-byte aligned). PtyControlEvent entries are NOT
    /// cache-line aligned because the control ring is a low-frequency path
    /// (human input rate: <1000 events/sec). The 60-byte pad between
    /// write_idx and read_idx prevents false sharing between the userspace
    /// producer and the kernel consumer on the hot indices only.
    /// Note: read_idx (kernel-written) shares a cache line with entries[0..3]
    /// (terminal-emulator-written). At <1000 events/sec this false sharing
    /// adds negligible overhead and is not worth the 60-byte padding cost.
    pub entries: [PtyControlEvent; 64],
}
const_assert!(core::mem::size_of::<PtyControlRing>() == 1096);

Terminal emulator protocol: When the terminal emulator detects a control character in the input stream (from the physical keyboard), it:

1. Writes the control character's PtyControlEvent to the control ring at index write_idx % 64.
2. Increments write_idx with Release ordering.
3. Triggers the kernel via write(ctl_fd, &SIG_NOTIFY, 1), a 1-byte write to a dedicated control file descriptor that carries no data and only wakes the kernel.

The kernel, on receiving the ctl_fd write:

1. Drains the control ring: reads entries[read_idx % 64] (a 16-byte slot), inspects the u8 discriminant at offset 0 of the slot; values outside the valid range [1, 6] are silently discarded (the entry is skipped and read_idx advances). This prevents undefined behavior from a malicious or buggy terminal emulator writing invalid discriminants into the user-mapped control ring page.
2. For each SignalIntr/SignalQuit/SignalSusp: calls kill_pgrp(slave_pgrp, sig, 1) to deliver the signal to the foreground process group.
3. For WindowResize: updates PtyState.winsize and delivers SIGWINCH to the foreground process group.
4. For Eof: sets the hangup condition on the PTY slave, waking any blocked readers with zero-length reads.
5. Advances read_idx with Release ordering.
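The drain steps above, in particular the discriminant check, can be sketched as follows. This is a hedged illustration rather than the kernel code: slots are modeled as raw `[u8; 16]` arrays instead of `PtyControlEvent` values (so an invalid tag can never become an invalid Rust enum), `Event`, `decode_slot`, and `drain` are hypothetical names, payload fields are assumed to start at byte offset 8 (which is what `#[repr(C, u8)]` yields when the payload union is 8-byte aligned), and the cap on pending entries is an assumption based on the 64-entry ring size.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

pub const RING_ENTRIES: usize = 64;

/// Decoded, validated form of a control ring slot (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    SignalIntr,
    SignalQuit,
    SignalSusp,
    WindowResize { cols: u16, rows: u16, xpixel: u16, ypixel: u16 },
    Eof,
    FlushTo { offset: u64 },
}

/// Validates the discriminant at offset 0 before interpreting the payload.
/// Returns `None` for tags outside [1, 6].
pub fn decode_slot(slot: &[u8; 16]) -> Option<Event> {
    let u16_at = |o: usize| u16::from_ne_bytes([slot[o], slot[o + 1]]);
    match slot[0] {
        1 => Some(Event::SignalIntr),
        2 => Some(Event::SignalQuit),
        3 => Some(Event::SignalSusp),
        4 => Some(Event::WindowResize {
            cols: u16_at(8),
            rows: u16_at(10),
            xpixel: u16_at(12),
            ypixel: u16_at(14),
        }),
        5 => Some(Event::Eof),
        6 => {
            let mut b = [0u8; 8];
            b.copy_from_slice(&slot[8..16]);
            Some(Event::FlushTo { offset: u64::from_ne_bytes(b) })
        }
        _ => None, // invalid discriminant: skipped by the caller
    }
}

/// Consumer-side drain: returns the valid events and advances `read_idx`.
pub fn drain(
    write_idx: &AtomicU32,
    read_idx: &AtomicU32,
    slots: &[[u8; 16]; RING_ENTRIES],
) -> Vec<Event> {
    // Acquire pairs with the producer's Release increment of write_idx.
    let w = write_idx.load(Ordering::Acquire);
    let mut r = read_idx.load(Ordering::Relaxed);
    // The producer is untrusted: never process more than one ring's worth.
    let pending = w.wrapping_sub(r).min(RING_ENTRIES as u32);
    let mut out = Vec::new();
    for _ in 0..pending {
        // Copy the slot first so validation and decoding see one snapshot.
        let slot = slots[(r as usize) % RING_ENTRIES];
        if let Some(ev) = decode_slot(&slot) {
            out.push(ev);
        }
        r = r.wrapping_add(1);
    }
    // Release the consumed slots back to the producer.
    read_idx.store(r, Ordering::Release);
    out
}
```

In a real handler each returned event would be dispatched (signal delivery, window size update, hangup); the sketch returns them so the validation logic can be checked on its own.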

Security: The control ring is in a user-mapped page. A malicious terminal emulator could spam SIGINT events, but: (1) signals can only be delivered to processes in the session the PTY controls — cross-session delivery is impossible; (2) rate limiting: at most 64 control events per ctl_fd write (ring size); (3) the mapping is per-PTY, allocated only in zero-copy mode. A compromised terminal emulator already has full control over the PTY master side (it can close the fd, inject arbitrary bytes, resize the window), so the control ring does not expand the attack surface.

SIGINT rate limiting: PTY SIGINT rate limiting is applied per PTY slave device, not per master FD. Multiple master FDs opened to the same slave share one token bucket. This prevents a misbehaving terminal emulator from bypassing the rate limit by opening N master FDs (each with its own bucket) and interleaving SIGINT injections across them.

- Token bucket: capacity = 100, refill_rate = 1000 tokens/second
- Each injected SIGINT, SIGQUIT, SIGTSTP, or SIGHUP consumes 1 token
- When the bucket is empty, excess signals are dropped silently
The terminal emulator is expected to coalesce input events; the rate limit prevents a malicious or buggy terminal emulator from flooding the foreground process group. 1000 signals/second is far above any legitimate interactive use.

Rate limiting state is stored in PtySlaveState (not in per-fd structures):

pub struct PtySlaveState {
    // ... existing fields ...
    /// Shared signal rate limiter (SIGINT, SIGQUIT, SIGTSTP, SIGHUP) for this slave device.
    /// All master FDs to this slave share this bucket.
    /// Capacity: 100 signals; refill rate: 1000 signals/second.
    pub signal_token_bucket: TokenBucket,
}

When any master FD injects one of these signals to this slave, one token is deducted from PtySlaveState::signal_token_bucket. If the bucket is empty, the signal is dropped silently.

Token bucket lifetime: same as PTY slave device lifetime — NOT tied to any specific master FD's lifetime.
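The token bucket parameters above (capacity 100, refill 1000 tokens/second, one token per injected signal) can be sketched as a small standalone type. The internal layout of the real `TokenBucket` is not given in this section, so the fields, the explicit `now_ns` timestamp parameter, and the `try_consume` method are assumptions made for illustration; a kernel version would also need atomics or a lock, which this single-threaded sketch omits.

```rust
/// Hypothetical token bucket; time is passed in explicitly as nanoseconds.
pub struct TokenBucket {
    capacity: u32,
    ns_per_token: u64,
    tokens: u32,
    last_refill_ns: u64,
}

impl TokenBucket {
    /// Creates a full bucket. `refill_per_sec` must be in 1..=1_000_000_000.
    pub fn new(capacity: u32, refill_per_sec: u32, now_ns: u64) -> Self {
        assert!(refill_per_sec > 0 && refill_per_sec <= 1_000_000_000);
        TokenBucket {
            capacity,
            ns_per_token: 1_000_000_000 / refill_per_sec as u64,
            tokens: capacity,
            last_refill_ns: now_ns,
        }
    }

    /// Returns true if a signal may be delivered, false if it must be dropped.
    pub fn try_consume(&mut self, now_ns: u64) -> bool {
        let elapsed = now_ns.saturating_sub(self.last_refill_ns);
        let refill = elapsed / self.ns_per_token;
        if refill > 0 {
            // Add whole tokens only, and never exceed the capacity.
            self.tokens = (self.tokens as u64 + refill).min(self.capacity as u64) as u32;
            // Advance by the time that produced whole tokens, keeping the remainder.
            self.last_refill_ns += refill * self.ns_per_token;
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}
```

With `TokenBucket::new(100, 1000, now)`, a burst of 100 signals passes immediately, after which one further signal is admitted per elapsed millisecond.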

Compatibility: Applications using standard read()/write() on the PTY master continue to work unchanged — signal generation is handled by the kernel's line discipline (Section 21.1). The control ring is only allocated when zero-copy mode is activated via PTY_REQ_DIRECT ioctl. Falling back from zero-copy mode (due to tcsetattr() or process exit) automatically returns to kernel-mediated signal generation.

21.1.2.1.2 XON/XOFF Flow Control in Zero-Copy Mode

Problem: Classical TTY processes XON/XOFF software flow control by scanning each byte as it passes through the line discipline — when XOFF (Ctrl-S, 0x13) is seen, output is paused; when XON (Ctrl-Q, 0x11) is seen, output resumes. This is fundamentally incompatible with zero-copy: you cannot scan a buffer you are not copying. The solution is a two-layer architecture that preserves the zero-copy property for bulk data while enforcing POSIX flow control semantics.

Layer 1 — Data path (zero-copy): In zero-copy mode, the slave writes directly to the master's ring buffer without scanning for XON/XOFF characters. This preserves the zero-copy property for bulk data (container logs, remote shell output, etc.).

Layer 2 — Control path (XON/XOFF scanning): XON/XOFF scanning is performed only when IXON or IXOFF is set in termios.c_iflag. The scan happens at the ring buffer consumer (master read) side — bytes are examined as the master application reads them, not as the slave writes them. The flow control state is communicated back to the slave writer via an atomic flag in AtomicTtyState.
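The Layer 2 idea, scanning on the consumer side and reporting the result through an atomic flag, can be sketched as below. This is only an illustration under assumptions: `scan_flow_control` and `writer_may_send` are hypothetical names, a bare `AtomicBool` stands in for the flow-control flag in `AtomicTtyState`, the `ixon` parameter stands in for the termios check, and whether the XON/XOFF bytes are also removed from the stream is left out of scope.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Default VSTART (XON, Ctrl-Q) and VSTOP (XOFF, Ctrl-S) characters.
pub const DEFAULT_VSTART: u8 = 0x11;
pub const DEFAULT_VSTOP: u8 = 0x13;

/// Consumer-side scan: examines bytes as the master reads them and updates
/// the shared `stopped` flag. Does nothing when IXON is not set.
pub fn scan_flow_control(chunk: &[u8], ixon: bool, vstart: u8, vstop: u8, stopped: &AtomicBool) {
    if !ixon {
        return; // bulk path: no per-byte work at all
    }
    for &b in chunk {
        if b == vstop {
            stopped.store(true, Ordering::Release);
        } else if b == vstart {
            stopped.store(false, Ordering::Release);
        }
    }
}

/// Writer-side check: the slave consults the flag before writing to the ring.
pub fn writer_may_send(stopped: &AtomicBool) -> bool {
    !stopped.load(Ordering::Acquire)
}
```

The writer pays one atomic load per write, and the per-byte scan cost is incurred only by the reader and only when software flow control is enabled.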

/// Flow control state for one side of a PTY (master or slave).
///
/// Manages XON/XOFF (software flow control, IXON/IXOFF termios flags).
///
/// **PTY TIOCMGET/TIOCMSET behavior**: Linux PTYs do NOT implement `tiocmget` —
/// `TIOCMGET` returns `-ENOTTY` (errno 25) on Linux PTYs because none of the
/// PTY `tty_operations` structs (`ptm_unix98_ops`, `pty_unix98_ops`) set a
/// `tiocmget` function pointer. UmkaOS matches this: PTY ioctl dispatch returns
/// `-ENOTTY` for `TIOCMGET`/`TIOCMSET`/`TIOCMBIS`/`TIOCMBIC`. The `modem_signals`
/// field below is used internally for XON/XOFF flow control state only; it is
/// NOT exposed via TIOCMGET for PTY devices. Physical serial ports (via
/// `SerialTtyOps` KABI) DO implement TIOCMGET and return real modem signal state.
///
/// XON character: `termios.c_cc[VSTART]` (default Ctrl-Q = 0x11).
/// XOFF character: `termios.c_cc[VSTOP]`  (default Ctrl-S = 0x13).
///
/// All fields are atomic so the master consumer and slave writer can read/write
/// without holding a lock on the hot data path.
/// Kernel-internal, Rust-managed layout. Not ABI.
pub struct PtyFlowControlState {
    /// True if this side is currently in XOFF state (transmission suspended).
    /// Set when the read buffer crosses `rx_high_watermark`; cleared when it
    /// drops below `rx_low_watermark`. The slave write path checks this before
    /// writing to the ring.
    pub tx_stopped: AtomicBool,

    /// True if the remote side (peer) is in XOFF state (we must stop sending).
    /// Set when we receive an XOFF character or when the peer's `tx_stopped` is true.
    pub peer_stopped: AtomicBool,

    /// Number of bytes currently in the receive buffer for this side.
    pub rx_bytes: AtomicU32,

    /// High watermark: when `rx_bytes` exceeds this, send XOFF to the peer.
    /// Default: 3/4 of the receive buffer capacity.
    pub rx_high_watermark: u32,

    /// Low watermark: when `rx_bytes` drops below this (after XOFF was sent),
    /// send XON to the peer to resume transmission.
    /// Default: 1/4 of the receive buffer capacity.
    pub rx_low_watermark: u32,

    /// Total receive buffer capacity in bytes. Set at PTY creation.
    pub rx_capacity: u32,

    /// Simulated modem control signals using Linux TIOCM_* bit positions.
    /// AtomicU16 to accommodate DSR (bit 8 = 0x100). Bits used:
    /// bit 1: TIOCM_DTR (0x002), bit 2: TIOCM_RTS (0x004),
    /// bit 5: TIOCM_CTS (0x020), bit 6: TIOCM_CAR (0x040),
    /// bit 7: TIOCM_RNG (0x080), bit 8: TIOCM_DSR (0x100).
    /// No translation needed: on physical serial ports, TIOCMGET returns the
    /// raw value. PTYs never expose it (TIOCMGET returns -ENOTTY, see above).
    pub modem_signals: AtomicU16,

    /// Number of XON characters sent to the peer (telemetry).
    pub xon_sent: AtomicU32,

    /// Number of XOFF characters sent to the peer (telemetry).
    pub xoff_sent: AtomicU32,

    /// If true, software flow control (XON/XOFF) is enabled for this side.
    /// Matches the IXON/IXOFF termios flags.
    pub sw_flow_enabled: AtomicBool,

    /// Whether `IXON` is currently active (derived from `termios.c_iflag`).
    /// When false, the master consumer skips XON/XOFF scanning entirely.
    pub ixon_enabled: AtomicBool,

    /// Whether `IXOFF` is currently active.
    /// When true, the kernel sends XOFF/XON to the slave based on ring fill level.
    pub ixoff_enabled: AtomicBool,

    /// Whether `IXANY` is set: any character from master resumes output.
    pub ixany_enabled: AtomicBool,

    /// Tracks whether an XOFF has been injected into the slave for IXOFF
    /// threshold enforcement. True from the moment XOFF is injected until XON
    /// is injected (when the ring drains below `rx_low_watermark`).
    pub ixoff_sent: AtomicBool,

    /// XOFF character value (default 0x13 = Ctrl-S). From termios.c_cc[VSTOP].
    pub xoff_char: u8,

    /// XON character value (default 0x11 = Ctrl-Q). From termios.c_cc[VSTART].
    pub xon_char: u8,
}

impl PtyFlowControlState {
    /// Default watermarks: high = 3/4 capacity, low = 1/4 capacity.
    pub fn new(capacity: u32) -> Self {
        Self {
            tx_stopped: AtomicBool::new(false),
            peer_stopped: AtomicBool::new(false),
            rx_bytes: AtomicU32::new(0),
            rx_high_watermark: capacity * 3 / 4,
            rx_low_watermark: capacity / 4,
            rx_capacity: capacity,
            // PTY-specific default: no modem signals. TIOCMGET returns -ENOTTY
            // (matching Linux, which has no tiocmget op for PTYs). Physical
            // serial drivers derive their initial modem state from hardware
            // (typically DTR+RTS+CAR).
            modem_signals: AtomicU16::new(0x000),
            xon_sent: AtomicU32::new(0),
            xoff_sent: AtomicU32::new(0),
            sw_flow_enabled: AtomicBool::new(false),
            ixon_enabled: AtomicBool::new(false),
            ixoff_enabled: AtomicBool::new(false),
            ixany_enabled: AtomicBool::new(false),
            ixoff_sent: AtomicBool::new(false),
            xoff_char: 0x13, // Ctrl-S
            xon_char: 0x11,  // Ctrl-Q
        }
    }

    /// Called when `rx_bytes` increases. Returns true if XOFF should be sent to peer.
    pub fn on_rx(&self, added: u32) -> bool {
        let new = self.rx_bytes.fetch_add(added, Ordering::Relaxed) + added;
        if self.sw_flow_enabled.load(Ordering::Relaxed)
            && new > self.rx_high_watermark
            && !self.tx_stopped.swap(true, Ordering::Release)
        {
            self.xoff_sent.fetch_add(1, Ordering::Relaxed);
            return true; // caller should inject XOFF into the peer's write path
        }
        false
    }

    /// Called when `rx_bytes` decreases. Returns true if XON should be sent to peer.
    /// Uses a CAS loop (not `fetch_sub`) to prevent underflow wrapping:
    /// `AtomicU32::fetch_sub` wraps modulo 2^32 if `consumed > current`,
    /// leaving a huge bogus byte count that would corrupt the flow control
    /// state irreversibly. The CAS loop loads the current value, clamps the
    /// subtraction at zero, and retries on contention with concurrent
    /// `on_rx()` increments.
    pub fn on_tx(&self, consumed: u32) -> bool {
        let (prev, new) = loop {
            let current = self.rx_bytes.load(Ordering::Relaxed);
            let clamped = current.saturating_sub(consumed);
            match self.rx_bytes.compare_exchange_weak(
                current, clamped, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(old) => break (old, clamped),
                Err(_) => continue,  // contention with on_rx(); retry
            }
        };
        debug_assert!(consumed <= prev, "on_tx: consumed {} > rx_bytes {}", consumed, prev);
        if self.sw_flow_enabled.load(Ordering::Relaxed)
            && new < self.rx_low_watermark
            && self.tx_stopped.swap(false, Ordering::Release)
        {
            self.xon_sent.fetch_add(1, Ordering::Relaxed);
            return true; // caller should inject XON into the peer's write path
        }
        false
    }
}
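
The watermark hysteresis is easier to follow with concrete numbers. The following is a minimal sketch, not the kernel type: `Hysteresis` and `FlowAction` are hypothetical names, software flow control is assumed to be enabled, and `fetch_update` is used as a compact stand-in for the hand-written CAS loop above.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Reduced stand-in for `PtyFlowControlState` (hypothetical): only the
/// fields that take part in the watermark hysteresis.
pub struct Hysteresis {
    rx_bytes: AtomicU32,
    tx_stopped: AtomicBool,
    high: u32,
    low: u32,
}

/// What the caller would inject into the peer's write path.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FlowAction {
    None,
    SendXoff,
    SendXon,
}

impl Hysteresis {
    pub fn new(capacity: u32) -> Self {
        Self {
            rx_bytes: AtomicU32::new(0),
            tx_stopped: AtomicBool::new(false),
            high: capacity * 3 / 4,
            low: capacity / 4,
        }
    }

    /// Bytes arrived: XOFF is requested once, on the upward crossing.
    pub fn on_rx(&self, added: u32) -> FlowAction {
        let new = self.rx_bytes.fetch_add(added, Ordering::Relaxed) + added;
        if new > self.high && !self.tx_stopped.swap(true, Ordering::Release) {
            FlowAction::SendXoff
        } else {
            FlowAction::None
        }
    }

    /// Bytes consumed: XON is requested once, on the downward crossing.
    pub fn on_tx(&self, consumed: u32) -> FlowAction {
        // The closure always returns Some, so fetch_update cannot fail;
        // saturating_sub clamps the counter at zero instead of wrapping.
        let prev = self
            .rx_bytes
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(consumed))
            })
            .unwrap();
        let new = prev.saturating_sub(consumed);
        if new < self.low && self.tx_stopped.swap(false, Ordering::Release) {
            FlowAction::SendXon
        } else {
            FlowAction::None
        }
    }
}
```

With a 4096-byte buffer the watermarks are 3072 and 1024: XOFF is requested when the count first exceeds 3072, and XON only once it falls below 1024, so a count that hovers around either threshold does not produce a stream of control bytes.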

Slave write path (when ixon_enabled is set):

fn pty_slave_write(ring: &PtyRingPage, flow: &PtyFlowControlState, data: &[u8]):
  1. if flow.tx_stopped.load(Acquire):
       // Block until master sends XON (or zero-copy mode is exited).
       wait_event(&flow.write_waitq, !flow.tx_stopped.load(Relaxed))
  2. Write `data` to ring buffer (zero-copy; no character scanning).
  3. Signal master via eventfd (data available).

The slave never scans bytes — it only checks the tx_stopped flag before each write(). The wait is on a standard wait queue; wakeup is delivered by the master consumer path when XON is detected.

Master read path (consumer side, when ixon_enabled):

fn pty_master_read(ring: &PtyRingPage, flow: &PtyFlowControlState, buf: &mut [u8]):
  let xon  = flow.xon_char;   // plain u8, set by tcsetattr
  let xoff = flow.xoff_char;  // plain u8, set by tcsetattr
  let ixany = flow.ixany_enabled.load(Relaxed);

  for each byte `b` consumed from the ring:
    if b == xoff && flow.ixon_enabled.load(Relaxed):
      flow.tx_stopped.store(true, Release)
      // Wake slave to re-check stopped state on next write attempt.
      wake_up(&flow.write_waitq)
      // XON/XOFF bytes are NOT delivered to the master application (POSIX).
      continue
    elif b == xon && flow.ixon_enabled.load(Relaxed) && !ixany:
      flow.tx_stopped.store(false, Release)
      wake_up(&flow.write_waitq)  // Unblock paused slave writers.
      continue
    elif flow.tx_stopped.load(Relaxed) && ixany:
      // IXANY: any character from master resumes paused output.
      flow.tx_stopped.store(false, Release)
      wake_up(&flow.write_waitq)
      // The character itself IS delivered to master (unlike plain XON).
      buf.push(b)
    else:
      buf.push(b)

POSIX character-stripping rules:

- When IXON is set and IXANY is not set: XON (VSTART) and XOFF (VSTOP) bytes are consumed by the flow control layer and not delivered to the master application. This matches POSIX termios(3) semantics.
- When IXANY is set: any character received from the master resumes paused output; only VSTOP pauses. The character that resumed output IS passed to the master application (it is not a dedicated control byte in this mode).
- When IXOFF is set: the kernel automatically injects XOFF (VSTOP) into the slave's input stream when the slave-to-master ring reaches 75% capacity, and injects XON (VSTART) when the ring drains below 25% capacity. This back-pressures the slave from the kernel side without application involvement.
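
The stripping rules can be checked against a small userspace model of the consumer-side scan. This is a sketch that mirrors the master read pseudocode branch by branch; `FlowCfg` and `scan_flow_control` are hypothetical names, the stopped flag is a plain `bool` rather than an atomic, and wake-ups are omitted.

```rust
/// Termios-derived settings for the scan (hypothetical helper type).
#[derive(Clone, Copy)]
pub struct FlowCfg {
    pub ixon: bool,
    pub ixany: bool,
    pub xon: u8,  // c_cc[VSTART]
    pub xoff: u8, // c_cc[VSTOP]
}

/// Scans `input` as the master-side consumer would and returns the bytes
/// that reach the master application. `stopped` models `tx_stopped`.
pub fn scan_flow_control(cfg: FlowCfg, input: &[u8], stopped: &mut bool) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for &b in input {
        if cfg.ixon && b == cfg.xoff {
            *stopped = true; // VSTOP pauses output and is stripped
            continue;
        }
        if cfg.ixon && b == cfg.xon && !cfg.ixany {
            *stopped = false; // VSTART resumes output and is stripped
            continue;
        }
        if *stopped && cfg.ixany {
            *stopped = false; // IXANY: any byte resumes paused output
        }
        out.push(b); // ordinary data, or the resuming byte under IXANY
    }
    out
}
```

For example, with IXON set and IXANY clear, the input `a b XOFF c d XON e` is delivered as `abcde` and leaves the writer unblocked; with IXANY set, the byte that follows XOFF both resumes output and is delivered.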

IXOFF kernel-side injection:

fn pty_check_ixoff_thresholds(ring: &PtyRingPage, flow: &PtyFlowControlState):
  let used = ring.header.head.load(Relaxed) - ring.header.tail.load(Relaxed);
  let capacity = PTY_RING_DATA_SIZE as u64;  // 3968 bytes
  if flow.ixoff_enabled.load(Relaxed):
    if used >= (capacity * 3 / 4) && !flow.ixoff_sent.load(Relaxed):
      inject_byte_to_slave(ring, flow.xoff_char)  // plain u8
      flow.ixoff_sent.store(true, Release)
    elif used <= (capacity / 4) && flow.ixoff_sent.load(Relaxed):
      inject_byte_to_slave(ring, flow.xon_char)   // plain u8
      flow.ixoff_sent.store(false, Release)

This check runs on the master consumer path after each read batch; it does not require a background timer or dedicated thread.
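
As a worked example of the thresholds: with 3968 data bytes, XOFF is injected at 2976 bytes used and XON at 992 or fewer. The sketch below restates the pseudocode as a pure function so the arithmetic can be tested. `ixoff_decision` is a hypothetical name, and treating `head` and `tail` as free-running counters (hence the wrapping subtraction) is an assumption not stated in the source.

```rust
/// Data capacity of the ring, as quoted in the pseudocode above.
pub const PTY_RING_DATA_SIZE: u64 = 3968;

/// Returns the control byte to inject into the slave's input, if any.
/// `ixoff_sent` is updated in place, mirroring `flow.ixoff_sent`.
pub fn ixoff_decision(
    head: u64,
    tail: u64,
    ixoff_enabled: bool,
    ixoff_sent: &mut bool,
    xon: u8,
    xoff: u8,
) -> Option<u8> {
    if !ixoff_enabled {
        return None;
    }
    // Assumption: head/tail are free-running byte counters, so wrapping
    // subtraction gives the fill level even after a counter overflows.
    let used = head.wrapping_sub(tail);
    if used >= PTY_RING_DATA_SIZE * 3 / 4 && !*ixoff_sent {
        *ixoff_sent = true;
        Some(xoff) // ring is at least 75% full: ask the slave to pause
    } else if used <= PTY_RING_DATA_SIZE / 4 && *ixoff_sent {
        *ixoff_sent = false;
        Some(xon) // ring drained to 25% or less: let the slave resume
    } else {
        None
    }
}
```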

Termios change interaction: XON/XOFF mode is part of termios.c_iflag. When tcsetattr() is called while zero-copy mode is active:

- If only IXON/IXOFF/IXANY bits change, zero-copy mode remains active. AtomicTtyState is updated in place; the consumer and producer paths pick up the new values on their next iteration.
- If ICANON is re-enabled or any flag incompatible with zero-copy is set, zero-copy mode falls back to kernel-mediated mode (as documented in the zero-copy restrictions above). The tx_stopped flag is cleared during the transition to prevent the slave from blocking indefinitely after the fallback.

OPOST interaction: Zero-copy mode is active only when OPOST is clear in termios.c_oflag. When OPOST is enabled, output processing (newline translation ONLCR, tab expansion, etc.) is required on each byte — this is fundamentally incompatible with zero-copy. Setting OPOST forces the copy path for output processing; zero-copy mode is automatically suspended until OPOST is cleared again.
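
The termios rules above amount to a small decision function. The following is a sketch under stated assumptions: `Transition` and `on_tcsetattr` are hypothetical names, only the two flags named in this section (ICANON and OPOST) are checked, and the full list of zero-copy restrictions documented earlier is not reproduced here.

```rust
pub const OPOST: u32 = 0o000001; // c_oflag: enable output processing
pub const ICANON: u32 = 0o000002; // c_lflag: canonical mode (Linux value)

/// Outcome of a tcsetattr() call made while zero-copy mode is active.
#[derive(Debug, PartialEq, Eq)]
pub enum Transition {
    StayZeroCopy,
    FallBackToKernel,
}

/// Decides whether zero-copy mode survives the new termios settings.
/// Changes limited to IXON/IXOFF/IXANY never reach the fallback branch,
/// because those bits live in c_iflag and are not inspected here.
pub fn on_tcsetattr(new_oflag: u32, new_lflag: u32, tx_stopped: &mut bool) -> Transition {
    if new_oflag & OPOST != 0 || new_lflag & ICANON != 0 {
        // Clear the stop flag so a paused slave writer is not left
        // blocked once the kernel-mediated path takes over.
        *tx_stopped = false;
        Transition::FallBackToKernel
    } else {
        Transition::StayZeroCopy
    }
}
```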

Overhead: The XON/XOFF consumer-side check adds approximately 2 ns per byte on x86-64 (one atomic byte load per byte consumed, branch predicted not-taken for bulk data where flow control is inactive). For bulk container logging, where IXON is typically not set, there is effectively zero overhead (the ixon_enabled atomic check short-circuits the entire scanning path). For interactive terminals where XON/XOFF flow control is active, the per-byte overhead is acceptable and consistent with the terminal's interactive (non-bulk) nature.

21.1.3 Character Device Registration¶

TTY devices register with the VFS character device subsystem (Section 14.5) during subsystem init. The TTY layer uses three well-known Linux major numbers: 4 and 5 for TTY devices, and 136 for PTY slaves:

Major	Minor range	Device nodes	Description
4	0–63	`/dev/tty0`–`/dev/tty63`	Virtual consoles (VTs)
4	64–255	`/dev/ttyS0`–`/dev/ttyS191`	Serial ports (ttySN = minor 64+N)
5	0	`/dev/tty`	Controlling terminal (current process)
5	1	`/dev/console`	System console
5	2	`/dev/ptmx`	PTY master multiplexer
136	0–1048575	`/dev/pts/N`	PTY slave devices (devpts, up to 1M PTYs)

/// Called from tty_subsystem_init() during boot Phase 5.3+ (after Tier 1 driver loading).
fn tty_register_chrdevs() {
    // Major 4: VTs (minors 0-63) + serial ports (minors 64-255)
    register_chrdev_region(ChrdevRegion {
        major: 4,
        minor_base: 0,
        minor_count: 256,  // 0-63 = VTs, 64-255 = serial (ttyS0 = minor 64)
        fops: &TTY_FOPS,
        name: "tty",
    }).expect("TTY major 4 registration");

    // /dev/tty, /dev/console, /dev/ptmx: major 5, minors 0-2
    register_chrdev_region(ChrdevRegion {
        major: 5,
        minor_base: 0,
        minor_count: 3,
        fops: &TTY_FOPS,
        name: "tty_misc",
    }).expect("TTY major 5 registration");

    // PTY slaves: major 136, devpts (dynamically allocated minors, up to 1M)
    register_chrdev_region(ChrdevRegion {
        major: 136,
        minor_base: 0,
        minor_count: 1_048_576,  // 2^20 = 1M PTYs via devpts
        fops: &PTY_SLAVE_FOPS,
        name: "pts",
    }).expect("PTY slave registration (devpts)");
}

TTY_FOPS dispatches open() to the appropriate TTY driver based on the minor number (serial driver, VT driver, or PTY master allocator). PTY_SLAVE_FOPS delegates to the PtyPair's slave-side ring buffer. Minor-to-driver lookup uses the per-driver TtyDriver registry (an XArray<Arc<TtyDriver>> keyed by minor range).
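
The major/minor table in this section can be expressed as a lookup, which makes the `ttySN = minor 64+N` offset explicit. This is an illustrative sketch only: `tty_node_name` is a hypothetical helper, not part of the TTY core, and it returns path strings rather than resolving a `TtyDriver`.

```rust
/// Maps a (major, minor) pair to the device node named in the
/// registration table, or None if the pair is outside the TTY ranges.
pub fn tty_node_name(major: u32, minor: u32) -> Option<String> {
    match (major, minor) {
        (4, 0..=63) => Some(format!("/dev/tty{}", minor)), // virtual consoles
        (4, 64..=255) => Some(format!("/dev/ttyS{}", minor - 64)), // serial ports
        (5, 0) => Some("/dev/tty".to_string()),     // controlling terminal
        (5, 1) => Some("/dev/console".to_string()), // system console
        (5, 2) => Some("/dev/ptmx".to_string()),    // PTY master multiplexer
        (136, 0..=1_048_575) => Some(format!("/dev/pts/{}", minor)), // PTY slaves
        _ => None,
    }
}
```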

Serial device node creation (devtmpfs): When a serial port driver (8250/16550, PL011, etc.) probes a UART, it calls tty_register_device(driver, port_index). This calls devtmpfs_create_node() (Section 14.5) to create /dev/ttyS<N> (major 4, minor 64+N) in the devtmpfs filesystem. The device node inherits the standard permissions (0660, root:dialout) from the ChrdevRegion registration. On serial port removal (hot-unplug or driver unbind), tty_unregister_device() calls devtmpfs_remove_node() to remove the device node. VT device nodes (/dev/tty0–/dev/tty63) are created statically at boot by tty_register_chrdevs() and are never removed.

21.1.4 The devpts Pseudo-Filesystem¶

devpts is the pseudo-filesystem that provides /dev/pts/* device nodes for PTY slave devices. It is the kernel component that bridges open(/dev/ptmx) to the creation of a numbered /dev/pts/N inode visible to userspace. Without devpts, containers cannot have isolated PTY namespaces — Docker, Kubernetes pods, and unshare --mount all depend on per-mount-namespace devpts instances.

21.1.4.1 Filesystem Type and Superblock¶

devpts registers as a filesystem type (fs_type = "devpts") with the VFS (Section 14.1). Each mount creates an independent superblock with its own PTY index allocator and inode set:

/// devpts superblock — one per mount instance.
pub struct DevptsSuperblock {
    /// Per-instance PTY index allocator. Bitmap-based, O(1) alloc/free.
    /// Size is determined by the `max` mount option (default: 1048576).
    pub index_bitmap: SpinLock<DynBitmap>,
    /// Maximum PTY index for this instance (from `max=` mount option).
    /// Range: 1–1048576. Default: 1048576 (2^20, matching Linux).
    pub max_ptys: u32,
    /// Permission mode for `/dev/pts/ptmx` within this mount.
    /// From `ptmxmode=` mount option. Default: 0o000 (disabled).
    pub ptmx_mode: u16,
    /// UID assigned to newly created PTY slave inodes.
    /// From `uid=` mount option. Default: UID of the mounting process.
    pub default_uid: Uid,
    /// GID assigned to newly created PTY slave inodes.
    /// From `gid=` mount option. Default: GID of group "tty" (typically 5).
    pub default_gid: Gid,
    /// Permission mode for newly created PTY slave inodes.
    /// From `mode=` mount option. Default: 0o620 (owner rw, group w).
    pub default_mode: u16,
    /// Back-reference to the mount namespace that owns this instance.
    pub mnt_ns_id: MntNsId,
    /// Active PTY pairs keyed by pts_index. Used for inode lookup on
    /// `open("/dev/pts/N")` and for teardown on unmount.
    pub active_ptys: XArray<Arc<PtyPair>>,
}

21.1.4.2 Mount Options¶

devpts supports the following mount options, matching Linux's fs/devpts/inode.c:

Option	Type	Default	Description
`newinstance`	flag	(required for namespaced mounts)	Creates a new, isolated devpts instance. Without this flag, the mount joins the legacy singleton instance (compat only). Container runtimes always pass `newinstance`.
`max`	u32	1048576	Maximum number of PTYs allocatable on this instance. Range: 1–1048576.
`ptmxmode`	octal	0o000	Permission mode for the `/dev/pts/ptmx` node within this mount. Set to `0o666` to allow unprivileged PTY allocation inside containers (the standard container runtime configuration).
`mode`	octal	0o620	Permission mode for newly created `/dev/pts/N` slave inodes.
`uid`	u32	caller UID	Owner UID for new PTY slave inodes.
`gid`	u32	GID of "tty" group	Group GID for new PTY slave inodes. Typically 5 (`tty`).

/// Parsed devpts mount options.
pub struct DevptsMountOpts {
    /// True if `newinstance` was specified. Required for namespace-scoped mounts.
    pub new_instance: bool,
    /// Maximum PTY count for this instance.
    pub max: u32,
    /// Permission mode for `/dev/pts/ptmx`.
    pub ptmx_mode: u16,
    /// Permission mode for `/dev/pts/N` slave nodes.
    pub mode: u16,
    /// Owner UID for slave nodes.
    pub uid: Uid,
    /// Group GID for slave nodes.
    pub gid: Gid,
}

/// Parse devpts mount option string.
/// Returns error on invalid option names or out-of-range values.
fn devpts_parse_mount_opts(data: &[u8]) -> Result<DevptsMountOpts, Errno> {
    // Parse comma-separated key=value pairs.
    // Unrecognized options return -EINVAL (Linux compat).
    // ...
}
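
Since the parser body is elided above, the following userspace sketch shows one way the option string could be handled. It is an assumption-laden illustration, not the kernel implementation: `Opts`, `ParseError`, `parse_devpts_opts`, and `parse_mode` are hypothetical names, `uid`/`gid` are plain `u32` values left as `None` when unspecified, and `ParseError::Invalid` stands in for `-EINVAL`.

```rust
/// Parsed options (hypothetical mirror of `DevptsMountOpts`).
#[derive(Debug, PartialEq, Eq)]
pub struct Opts {
    pub new_instance: bool,
    pub max: u32,
    pub ptmx_mode: u16,
    pub mode: u16,
    pub uid: Option<u32>, // None: resolve the default at mount time
    pub gid: Option<u32>,
}

/// Stand-in for -EINVAL.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Invalid,
}

/// Parses an octal permission mode such as "0666".
fn parse_mode(v: &str) -> Result<u16, ParseError> {
    let m = u16::from_str_radix(v, 8).map_err(|_| ParseError::Invalid)?;
    if m > 0o777 { Err(ParseError::Invalid) } else { Ok(m) }
}

/// Parses a comma-separated devpts option string.
pub fn parse_devpts_opts(data: &str) -> Result<Opts, ParseError> {
    // Defaults taken from the mount option table above.
    let mut o = Opts {
        new_instance: false,
        max: 1_048_576,
        ptmx_mode: 0o000,
        mode: 0o620,
        uid: None,
        gid: None,
    };
    for item in data.split(',').filter(|s| !s.is_empty()) {
        let (key, val) = match item.split_once('=') {
            Some((k, v)) => (k, Some(v)),
            None => (item, None),
        };
        match (key, val) {
            ("newinstance", None) => o.new_instance = true,
            ("max", Some(v)) => {
                let n = v.parse::<u32>().map_err(|_| ParseError::Invalid)?;
                if !(1..=1_048_576).contains(&n) {
                    return Err(ParseError::Invalid);
                }
                o.max = n;
            }
            ("ptmxmode", Some(v)) => o.ptmx_mode = parse_mode(v)?,
            ("mode", Some(v)) => o.mode = parse_mode(v)?,
            ("uid", Some(v)) => o.uid = Some(v.parse::<u32>().map_err(|_| ParseError::Invalid)?),
            ("gid", Some(v)) => o.gid = Some(v.parse::<u32>().map_err(|_| ParseError::Invalid)?),
            _ => return Err(ParseError::Invalid), // unrecognized option
        }
    }
    Ok(o)
}
```

Parsing the option string that container runtimes typically pass, `newinstance,ptmxmode=0666,mode=0620,gid=5`, yields `new_instance = true`, `ptmx_mode = 0o666`, `mode = 0o620`, and `gid = Some(5)`, with `max` left at its default.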

21.1.4.3 Namespace Scoping¶

Since Linux 4.7 (commit eedf265a), every devpts mount is an independent instance, so each mount namespace that mounts devpts gets its own; Linux still accepts the newinstance option but no longer requires it (Section 17.1). UmkaOS adopts per-mount instances as the sole mode for new mounts. The legacy single-instance mode exists only for the initial root namespace's boot-time mount (compatibility with init scripts that predate newinstance).

Namespace isolation guarantees:

PTY indices are local to each devpts instance. Two containers can both have /dev/pts/0 without conflict — they refer to different DevptsSuperblock instances.
open("/dev/pts/N") resolves through the calling process's mount namespace. A process in namespace A cannot access PTY slave nodes from namespace B's devpts mount.
When a mount namespace is destroyed, its devpts superblock is torn down: all active PTY pairs receive a hangup condition on the slave side, and the index bitmap is freed.

Container runtime integration:

OCI container runtimes (runc, crun) perform the following devpts setup during container creation, which UmkaOS supports identically to Linux:

unshare(CLONE_NEWNS) — create new mount namespace.
mount("devpts", "/dev/pts", "devpts", 0, "newinstance,ptmxmode=0666,mode=0620,gid=5") — mount a fresh devpts instance.
bind_mount("/dev/pts/ptmx", "/dev/ptmx") — ensure /dev/ptmx inside the container points to this instance's multiplexer, not the host's.

21.1.4.4 PTY Allocation via `/dev/ptmx`¶

When a process opens /dev/ptmx (or /dev/pts/ptmx inside a container), the kernel allocates a new PTY pair from the devpts instance associated with the caller's mount namespace:

/// Called when userspace opens /dev/ptmx (major 5, minor 2) or
/// /dev/pts/ptmx (the per-instance ptmx node).
///
/// Returns a file descriptor for the PTY master side.
fn devpts_ptmx_open(file: &mut File) -> Result<(), Errno> {
    // 1. Resolve the devpts superblock from the mount point.
    //    For /dev/ptmx: follow the bind mount to find the real devpts instance.
    //    For /dev/pts/ptmx: the superblock is the parent directory's mount.
    let sb = devpts_resolve_superblock(file)?;

    // 2. Allocate a PTY index from the per-instance bitmap.
    let pts_index = {
        let mut bitmap = sb.index_bitmap.lock();
        let idx = bitmap.find_first_zero()
            .ok_or(Errno::ENOSPC)?;  // all PTY slots full
        if idx >= sb.max_ptys as usize {
            return Err(Errno::ENOSPC);
        }
        bitmap.set(idx);
        idx as u32
    };

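    // 3. Create the PtyPair (ring buffers, termios state, flow control).
    //    On failure, return the index to the bitmap so it is not leaked.
    let pty = match PtyPair::new(pts_index, current_task().mnt_ns_id) {
        Ok(pty) => pty,
        Err(e) => {
            sb.index_bitmap.lock().clear(pts_index as usize);
            return Err(e);
        }
    };

    // 4. Create the /dev/pts/N inode in this devpts instance.
    if let Err(e) = devpts_create_slave_inode(&sb, pts_index, &pty) {
        sb.index_bitmap.lock().clear(pts_index as usize);
        return Err(e);
    }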

    // 5. Register the PtyPair in the superblock's active set.
    sb.active_ptys.store(pts_index as u64, pty.clone());

    // 6. Set up the master file descriptor.
    file.private_data = FilePrivateData::PtyMaster(pty);
    Ok(())
}

Index allocation: The DynBitmap is a dynamically-sized bitmap (allocated at mount time based on max option). find_first_zero() scans for the lowest available index — O(N/64) in the worst case (scanning 64-bit words), O(1) amortized with a cached hint of the last-freed position. The bitmap is protected by a SpinLock because PTY allocation is a warm-path operation (not per-packet/per-syscall) and contention is low.
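
A word-at-a-time scan with a hint can be sketched as follows. This is an illustration, not the `DynBitmap` implementation: `IndexBitmap` is a hypothetical type, locking is omitted, and the hint here is a variant of the one described above. It tracks the lowest word that may contain a free bit, which preserves the lowest-index-first guarantee.

```rust
/// Bitmap index allocator (hypothetical stand-in for `DynBitmap`).
pub struct IndexBitmap {
    words: Vec<u64>,
    max: usize,
    /// Invariant: every word below `hint` is completely full.
    hint: usize,
}

impl IndexBitmap {
    pub fn new(max: usize) -> Self {
        Self { words: vec![0; (max + 63) / 64], max, hint: 0 }
    }

    /// Allocates the lowest free index, or None if all `max` slots are used.
    pub fn alloc(&mut self) -> Option<usize> {
        for w in self.hint..self.words.len() {
            // Lowest zero bit of the word; 64 means the word is full.
            let bit = (!self.words[w]).trailing_zeros() as usize;
            if bit < 64 {
                let idx = w * 64 + bit;
                if idx >= self.max {
                    return None; // only padding bits past `max` remain
                }
                self.words[w] |= 1u64 << bit;
                self.hint = w; // words before `w` were just seen to be full
                return Some(idx);
            }
        }
        None
    }

    /// Returns an index to the pool and pulls the hint back if needed.
    pub fn free(&mut self, idx: usize) {
        self.words[idx / 64] &= !(1u64 << (idx % 64));
        self.hint = self.hint.min(idx / 64);
    }
}
```

Each step of the scan examines 64 indices at once, so even a full 1M-entry bitmap needs at most 16384 word reads, and the hint keeps the common case to a single word.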

Slave inode creation: devpts_create_slave_inode() creates a character device inode with major 136 and minor pts_index (matching the single major-136 region of 2^20 minors registered in Section 21.1.3), owned by sb.default_uid / sb.default_gid with mode sb.default_mode. The inode is inserted into the devpts directory so that readdir("/dev/pts") and stat("/dev/pts/N") work correctly.
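
A stat() on a slave node reports the device number in Linux's 32-bit userspace dev_t encoding, where the major occupies bits 8 to 19 and the minor is split around it. The sketch below packs and unpacks that encoding using the major and minor values from the registration table in Section 21.1.3. `encode_dev`, `decode_dev`, and `pts_dev` are hypothetical helper names.

```rust
/// PTY slave major from the registration table.
pub const PTY_SLAVE_MAJOR: u32 = 136;

/// Packs (major, minor) into the 32-bit Linux userspace dev_t layout:
/// minor bits 0-7 in bits 0-7, major in bits 8-19, minor bits 8-19 in
/// bits 20-31.
pub fn encode_dev(major: u32, minor: u32) -> u32 {
    (minor & 0xff) | (major << 8) | ((minor & !0xff) << 12)
}

/// Inverse of `encode_dev`: returns (major, minor).
pub fn decode_dev(dev: u32) -> (u32, u32) {
    let major = (dev & 0x000f_ff00) >> 8;
    let minor = (dev & 0xff) | ((dev >> 12) & 0x000f_ff00);
    (major, minor)
}

/// Device number for /dev/pts/N, with the PTY index used as the minor.
pub fn pts_dev(pts_index: u32) -> u32 {
    encode_dev(PTY_SLAVE_MAJOR, pts_index)
}
```

The split minor field is what lets a single 12-bit major cover all 2^20 PTY indices.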

21.1.4.5 PTY Teardown¶

When the PTY master file descriptor is closed (last reference dropped):

/// Called when the last reference to the PTY master fd is dropped.
fn devpts_ptmx_release(file: &File) {
    let pty = file.private_data.as_pty_master();
    let sb = devpts_resolve_superblock_from_pty(pty);

    // 1. Send hangup to the slave side (wake blocked readers with EIO).
    pty.hangup_slave();

    // 2. Remove the /dev/pts/N inode from the devpts directory.
    devpts_remove_slave_inode(&sb, pty.pts_index);

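    // 3. Remove from the active PTY set. This must happen before the index
    //    is freed: otherwise a concurrent open() could reuse the index and
    //    have its freshly stored entry removed here.
    sb.active_ptys.remove(pty.pts_index as u64);

    // 4. Free the PTY index back to the bitmap.
    {
        let mut bitmap = sb.index_bitmap.lock();
        bitmap.clear(pty.pts_index as usize);
    }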

    // 5. PtyPair is dropped when Arc refcount reaches zero
    //    (ring buffer pages are freed, state arena slot is released).
}

21.1.4.6 `TIOCGPTPEER` — Open Slave from Master FD¶

Linux 4.13 added ioctl(master_fd, TIOCGPTPEER, flags) which returns an open file descriptor to the PTY slave without requiring the caller to know the slave's path. This is critical for containers where the slave path in the host's filesystem namespace may differ from the container's view:

/// TIOCGPTPEER ioctl value (Linux ABI).
pub const TIOCGPTPEER: u32 = 0x5441;

/// Handle TIOCGPTPEER: open the slave side of a PTY from its master fd.
///
/// This avoids the race condition in the traditional open("/dev/pts/N") path
/// and works correctly across mount namespaces (the slave fd is opened in
/// the devpts instance of the master, not the caller's mount namespace).
fn pty_ioctl_tiocgptpeer(master: &PtyPair, flags: u32) -> Result<FileDesc, Errno> {
    let sb = devpts_resolve_superblock_from_pty(master);
    // Existence check only: fail with EIO if the pair was already torn down.
    let _pair = sb.active_ptys.load(master.pts_index as u64)
        .ok_or(Errno::EIO)?;
    let open_flags = OpenFlags::from_bits_truncate(flags);
    let slave_file = devpts_open_slave_inode(&sb, master.pts_index, open_flags)?;
    Ok(current_task().fd_table.install(slave_file)?)
}

21.1.5 Asynchronous Line Disciplines (N_TTY)¶

The line discipline (N_TTY) translates raw characters into canonical input (handling backspace, line buffering) and generates signals (translating Ctrl+C into SIGINT).

In Linux, this processing happens synchronously during the write() or read() syscall, while holding the tty_mutex.

In UmkaOS, line discipline processing is asynchronous and decoupled from the data path:

1. When the user types Ctrl+C, the raw byte (0x03) is placed into the master_tx ring buffer.
2. The kernel's asynchronous TTY worker thread (running in UmkaOS Core) consumes the raw ring, processes the line discipline rules based on the termios state, and pushes the processed output to the canonical ring (or generates the SIGINT signal to the foreground process group).
3. The foreground application reads from the canonical ring.

Because the TTY worker thread is the sole consumer of the raw ring and the sole producer of the canonical ring, it operates entirely lock-free.
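
The worker's per-byte job can be illustrated with a deliberately small model of canonical processing. This is a sketch, not the N_TTY implementation: `CanonState`, `LdiscEvent`, and `ldisc_process` are hypothetical names, only three special characters are handled, and the Linux default values VINTR = 0x03 and VERASE = 0x7f are hard-coded rather than read from termios.

```rust
/// Output of the line discipline for one batch of raw bytes.
#[derive(Debug, PartialEq, Eq)]
pub enum LdiscEvent {
    /// A completed canonical line, including its trailing newline.
    Line(Vec<u8>),
    /// Ctrl+C was seen: signal the foreground process group.
    Sigint,
}

/// Partially typed line carried between batches.
#[derive(Default)]
pub struct CanonState {
    pending: Vec<u8>,
}

/// Consumes raw bytes (as the sole reader of the raw ring would) and
/// returns what would be pushed to the canonical ring or signalled.
pub fn ldisc_process(state: &mut CanonState, raw: &[u8]) -> Vec<LdiscEvent> {
    let mut events = Vec::new();
    for &b in raw {
        match b {
            0x03 => {
                // VINTR: discard the unfinished line and raise SIGINT.
                state.pending.clear();
                events.push(LdiscEvent::Sigint);
            }
            0x7f => {
                // VERASE: remove the last typed byte, if any.
                state.pending.pop();
            }
            b'\n' => {
                // End of line: hand the whole line to the reader.
                state.pending.push(b'\n');
                events.push(LdiscEvent::Line(std::mem::take(&mut state.pending)));
            }
            _ => state.pending.push(b),
        }
    }
    events
}
```

Because the state lives entirely in the worker, a line typed across several wake-ups is assembled without any lock shared with the reader or the writer.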

Async TTY Worker Thread Configuration:

Count: One worker thread per physical CPU socket (NUMA node), not per CPU. Named tty_worker/{socket_id}. Rationale: TTY throughput is not CPU-intensive (character processing + application wakeup); socket-scoped workers provide NUMA locality without per-CPU overhead.
Priority: SCHED_OTHER (normal timesharing) at nice -5. This gives TTY processing a small priority boost over typical user tasks (nice 0) without impacting RT workloads. Interactive terminal responsiveness is maintained because terminal input wakeup latency is dominated by the nice-level scheduling latency (~0.5–2ms), not TTY processing time.
Starvation handling: The worker thread checks tty_queue.len() at each wakeup. If the queue has grown to >80% of its capacity (TTY_ASYNC_QUEUE_SIZE = 4096 entries), the worker temporarily raises its scheduling priority to SCHED_OTHER nice=-15 until the queue drains below 50%. This prevents input drop under heavy load without permanently occupying a high-priority slot.
Queue overflow: If the async queue reaches 100% capacity (4096 unprocessed TTY events), new input is dropped and tty_drop_count is incremented. A warning is logged to the kernel ring buffer and exposed via umkafs at /ukfs/kernel/tty/drop_count. Drop recovery: the worker thread processes the queue as fast as possible, then resets tty_drop_count to 0 when the queue clears.
Shutdown: The worker thread is a kthread; it joins cleanly via kthread_stop() during system shutdown after all TTY devices have been closed.
Wake mechanism: The worker thread sleeps on a per-NUMA-node WaitQueue between drain passes. The pending-data flag and wake call are issued from interrupt context (serial UART IRQ, PTY write path) — both paths must be IRQ-safe.
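
The starvation-handling rule is a two-threshold hysteresis on queue depth. A minimal sketch, assuming the nice values and queue size quoted in the list above: `WorkerPrio` and `on_wakeup` are hypothetical names, and the actual scheduler call is left out.

```rust
pub const TTY_ASYNC_QUEUE_SIZE: usize = 4096;
pub const NICE_NORMAL: i8 = -5;
pub const NICE_BOOSTED: i8 = -15;

/// Tracks whether the worker is currently running at boosted priority.
pub struct WorkerPrio {
    boosted: bool,
}

impl WorkerPrio {
    pub fn new() -> Self {
        Self { boosted: false }
    }

    /// Called at each wakeup with the current queue length; returns the
    /// nice value the worker should run at.
    pub fn on_wakeup(&mut self, queue_len: usize) -> i8 {
        let boost_above = TTY_ASYNC_QUEUE_SIZE * 80 / 100; // 3276 entries
        let relax_below = TTY_ASYNC_QUEUE_SIZE * 50 / 100; // 2048 entries
        if !self.boosted && queue_len > boost_above {
            self.boosted = true; // queue is more than 80% full
        } else if self.boosted && queue_len < relax_below {
            self.boosted = false; // queue has drained below 50%
        }
        if self.boosted { NICE_BOOSTED } else { NICE_NORMAL }
    }
}
```

The gap between the two thresholds keeps the worker from flapping between nice levels when the queue depth hovers near 80%.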

/// Per-NUMA-node TTY worker state. Indexed by NUMA node ID.
/// Boot-allocated: `nr_numa_nodes` entries discovered from ACPI SRAT / device tree.
/// `Box<[TtyWorkerState]>` because NUMA node count is runtime-discovered —
/// a static `[TtyWorkerState; N]` would require a compile-time constant.
/// Initialized once during TTY subsystem init; never resized.
pub static TTY_WORKER_STATES: OnceCell<Box<[TtyWorkerState]>> = OnceCell::new();

/// Drain all pending TTY ring buffers on the given NUMA node.
/// Called from the TTY worker main loop after `has_pending` is observed true.
/// Iterates all TTY devices homed on `numa_id`, dequeues characters from
/// each device's input ring buffer, and dispatches them through the line
/// discipline (n_tty canonical/raw processing, echo, signal generation).
/// Returns when all rings on this node are empty.
pub fn tty_drain_rings(numa_id: usize) { /* ... process all pending writes ... */ }

/// TTY driver descriptor. Each driver registers with the TTY core at init
/// time and is stored in DRIVERS: XArray<Arc<TtyDriver>> keyed by major
/// number. Minor-to-driver lookup resolves the appropriate driver for
/// `open(/dev/ttyS*, /dev/tty*, /dev/pts/*)` operations.
pub struct TtyDriver {
    /// Human-readable driver name (e.g., "serial", "pty_master", "pty_slave").
    pub name: &'static str,
    /// Major device number (e.g., 4 for /dev/ttyS*, 136 for /dev/pts/*).
    /// u16 matches Linux's `MAJOR()` range (0-4095, 12-bit) and the dev_t
    /// encoding where major occupies bits [8:19] (12 bits).
    pub major: u16,
    /// First minor device number owned by this driver.
    pub minor_start: u32,
    /// Number of device instances managed by this driver.
    pub num_devices: u32,
    /// Driver operations (open, close, write, ioctl, set_termios, etc.).
    pub ops: &'static dyn TtyOps,
}

/// State for one NUMA node's TTY worker thread.
pub struct TtyWorkerState {
    /// Wait queue: worker sleeps here when no TTY on this node has pending data.
    pub wq:          WaitQueue,
    /// Set to true (Release) before each `wq.notify_one()`. Cleared (Relaxed)
    /// at the start of each drain pass. Prevents missed wake-ups in the race
    /// where data arrives after the drain but before the worker re-enters sleep.
    pub has_pending: AtomicBool,
    /// The kthread handle for this worker.
    pub thread:      TaskRef,
}

/// Called from interrupt/IRQ context (serial UART IRQ, PTY write) when new data
/// is placed into any TTY input ring on `numa_id`.
/// IRQ-safe: uses only atomics and the IRQ-safe WaitQueue notify path.
pub fn tty_worker_wake(numa_id: usize) {
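    // OnceCell must be unwrapped before indexing; set during TTY subsystem init.
    let state = &TTY_WORKER_STATES.get().expect("TTY worker states initialized")[numa_id];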
    state.has_pending.store(true, Release);  // (1) publish: data is ready
    state.wq.notify_one();                   // (2) wake: interrupt worker sleep
}

/// TTY worker main loop (kthread).
pub fn tty_worker_main(numa_id: usize) -> ! {
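    // OnceCell must be unwrapped before indexing; set during TTY subsystem init.
    let state = &TTY_WORKER_STATES.get().expect("TTY worker states initialized")[numa_id];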
    loop {
        // Wait until has_pending is true. The Acquire load pairs with the
        // Release store in `tty_worker_wake()`, ensuring all ring data written
        // before the wake call is visible here after the load.
        state.wq.wait_until(|| state.has_pending.load(Acquire));
        state.has_pending.store(false, Relaxed);  // clear before drain
        tty_drain_rings(numa_id);
        // If new data arrived during the drain (race: producer stored data,
        // set has_pending=true, called notify_one AFTER we cleared it but
        // BEFORE tty_drain_rings completed), the loop continues immediately
        // because has_pending was set again. No data is lost.
    }
}

The Release/Acquire pair on has_pending closes the classic "missed wake-up" race: any data written to a ring before tty_worker_wake() is called is guaranteed visible to the worker after it observes has_pending = true.
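
The flag-then-wake protocol can be exercised in userspace with standard library primitives. The sketch below is an analogue, not the kernel code: `thread::park`/`unpark` stand in for the `WaitQueue`, a counter stands in for the ring data, the `shutdown` flag exists only so the demo terminates, and the worker uses a single `swap` to test and clear the flag, a small variation on the separate load and store in the listing.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// State shared between the producer and the single worker.
struct Shared {
    has_pending: AtomicBool, // "work may be available" flag
    produced: AtomicUsize,   // stands in for data written to the rings
    shutdown: AtomicBool,    // demo-only: lets the worker exit
}

/// Producer side: publish the flag first, then wake the worker.
fn wake(shared: &Shared, worker: &thread::Thread) {
    shared.has_pending.store(true, Ordering::Release);
    worker.unpark();
}

/// Runs one worker and one producer; returns the item count the worker
/// observed when it shut down.
pub fn run_demo(items: usize) -> usize {
    let shared = Arc::new(Shared {
        has_pending: AtomicBool::new(false),
        produced: AtomicUsize::new(0),
        shutdown: AtomicBool::new(false),
    });

    let worker = {
        let s = Arc::clone(&shared);
        thread::spawn(move || loop {
            // park() may return spuriously, so the flag (not the wake-up
            // itself) decides whether the worker proceeds.
            while !s.has_pending.swap(false, Ordering::Acquire) {
                thread::park();
            }
            // A real worker would drain the rings here. Anything written
            // before the flag was set is visible after the Acquire swap.
            if s.shutdown.load(Ordering::Acquire) {
                return s.produced.load(Ordering::Relaxed);
            }
        })
    };

    for _ in 0..items {
        shared.produced.fetch_add(1, Ordering::Relaxed);
        wake(&shared, worker.thread());
    }
    shared.shutdown.store(true, Ordering::Release);
    wake(&shared, worker.thread());
    worker.join().expect("worker panicked")
}
```

If the producer sets the flag after the worker has cleared it, the next `swap` sees `true` and the worker runs another pass instead of sleeping, which is the same property the kernel loop relies on.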

21.1.6 Serial TTY — Full POSIX termios and Modem Control¶

This section answers: "can minicom run on UmkaOS?"

The POSIX termios interface controls the serial line discipline: character size, baud rate, parity, flow control, canonical vs raw mode, and modem control signals. It applies to both serial UART ports (/dev/ttyS0, /dev/ttyUSB0) and to PTYs (via the PTY slave). The preceding sub-sections ("The Problem" through "Character Device Registration") cover PTY; this section covers the serial-specific parts needed for programs like minicom, picocom, and screen.

21.1.6.1 struct termios¶

The full POSIX struct termios as exposed to userspace (Linux asm-generic/termbits.h layout, required for binary compat):

/// POSIX struct termios — character device terminal settings.
/// Layout matches Linux's `struct termios2` for TCGETS2/TCSETS2 ioctls.
/// The kernel-internal representation is `KernelTermios`; this is the
/// userspace-visible layout placed at ioctl argument pointers.
#[repr(C)]
pub struct Termios {
    /// Input mode flags.
    pub c_iflag: u32,
    /// Output mode flags.
    pub c_oflag: u32,
    /// Control mode flags.
    pub c_cflag: u32,
    /// Local mode flags.
    pub c_lflag: u32,
    /// Line discipline index (N_TTY = 0).
    pub c_line:  u8,
    /// Special character array (NCCS = 19 for Linux/POSIX).
    pub c_cc:    [u8; 19],
    /// Input baud rate (encoded as Bxxx constant OR an actual numeric rate
    /// when using TCSETS2/BOTHER — see §21.1.4.2).
    pub c_ispeed: u32,
    /// Output baud rate.
    pub c_ospeed: u32,
}
// Termios: u32(4)*4 + u8(1) + [u8;19](19) + u32(4)*2 = 44 bytes.
// Userspace ABI struct (TCGETS2/TCSETS2 ioctl argument pointer).
const_assert!(core::mem::size_of::<Termios>() == 44);

// c_iflag bits
pub const IGNBRK:  u32 = 0o000001; // Ignore BREAK condition
pub const BRKINT:  u32 = 0o000002; // BREAK → SIGINT to foreground process group
pub const IGNPAR:  u32 = 0o000004; // Ignore framing and parity errors
pub const PARMRK:  u32 = 0o000010; // Mark parity and framing errors with 0xFF 0x00
pub const INPCK:   u32 = 0o000020; // Enable input parity checking
pub const ISTRIP:  u32 = 0o000040; // Strip 8th bit from input characters
pub const INLCR:   u32 = 0o000100; // Translate NL to CR on input
pub const IGNCR:   u32 = 0o000200; // Ignore CR on input
pub const ICRNL:   u32 = 0o000400; // Translate CR to NL on input (unless IGNCR)
pub const IUCLC:   u32 = 0o001000; // Map uppercase to lowercase (obsolete, not POSIX)
pub const IXON:    u32 = 0o002000; // Enable XON/XOFF flow control on output
pub const IXANY:   u32 = 0o004000; // Any character restarts output stopped by XOFF
pub const IXOFF:   u32 = 0o010000; // Enable XON/XOFF flow control on input
pub const IMAXBEL: u32 = 0o020000; // Ring bell when input queue is full
pub const IUTF8:   u32 = 0o040000; // Input is UTF-8; affects erase in canonical mode

// c_oflag bits
pub const OPOST:   u32 = 0o000001; // Enable output processing
pub const OLCUC:   u32 = 0o000002; // Map lowercase to uppercase (obsolete)
pub const ONLCR:   u32 = 0o000004; // Map NL to CR-NL on output
pub const OCRNL:   u32 = 0o000010; // Map CR to NL on output
pub const ONOCR:   u32 = 0o000020; // No CR output at column 0
pub const ONLRET:  u32 = 0o000040; // NL performs CR function
pub const OFILL:   u32 = 0o000100; // Use fill characters for delay
pub const OFDEL:   u32 = 0o000200; // Fill char is DEL (otherwise NUL)

// c_cflag bits
pub const CBAUD:   u32 = 0o010017; // Baud rate mask (use BOTHER for non-standard rates)
pub const BOTHER:  u32 = 0o010000; // Non-standard baud rate (rate in c_ispeed/c_ospeed)
pub const CS5:     u32 = 0o000000; // 5-bit characters
pub const CS6:     u32 = 0o000020; // 6-bit characters
pub const CS7:     u32 = 0o000040; // 7-bit characters
pub const CS8:     u32 = 0o000060; // 8-bit characters
pub const CSIZE:   u32 = 0o000060; // Character size mask
pub const CSTOPB:  u32 = 0o000100; // 2 stop bits (1 if not set)
pub const CREAD:   u32 = 0o000200; // Enable receiver
pub const PARENB:  u32 = 0o000400; // Enable parity generation on output and checking on input
pub const PARODD:  u32 = 0o001000; // Odd parity (even if not set)
pub const HUPCL:   u32 = 0o002000; // Hang up on last close (de-assert DTR/RTS)
pub const CLOCAL:  u32 = 0o004000; // Ignore modem status lines
pub const CRTSCTS: u32 = 0o020000000000; // Enable RTS/CTS hardware flow control

// c_lflag bits
pub const ISIG:    u32 = 0o000001; // Generate signal when INTR/QUIT/SUSP received
pub const ICANON:  u32 = 0o000002; // Canonical mode (line-by-line)
pub const XCASE:   u32 = 0o000004; // Fold uppercase (obsolete)
pub const ECHO:    u32 = 0o000010; // Echo input characters
pub const ECHOE:   u32 = 0o000020; // ERASE erases preceding character
pub const ECHOK:   u32 = 0o000040; // KILL erases current line
pub const ECHONL:  u32 = 0o000100; // Echo NL even if ECHO is not set
pub const NOFLSH:  u32 = 0o000200; // No flush on INTR, QUIT, or SUSP
pub const TOSTOP:  u32 = 0o000400; // Send SIGTTOU for background write attempts
pub const ECHOCTL: u32 = 0o001000; // Echo control chars as ^X
pub const ECHOPRT: u32 = 0o002000; // Echo erased chars (hardcopy terminal style)
pub const ECHOKE:  u32 = 0o004000; // KILL erases by echoing spaces
pub const FLUSHO:  u32 = 0o010000; // Output is being flushed
pub const PENDIN:  u32 = 0o040000; // Re-print pending input at next read/newline
pub const IEXTEN:  u32 = 0o100000; // Enable implementation-defined input processing

// c_cc indices (NCCS = 19)
pub const VINTR:    usize = 0;  // Interrupt (default ^C = 0x03)
pub const VQUIT:    usize = 1;  // Quit (default ^\ = 0x1C)
pub const VERASE:   usize = 2;  // Erase (default ^H/DEL)
pub const VKILL:    usize = 3;  // Kill line (default ^U)
pub const VEOF:     usize = 4;  // End-of-file (canonical, default ^D)
pub const VTIME:    usize = 5;  // Timeout for non-canonical read (tenths of second)
pub const VMIN:     usize = 6;  // Min chars for non-canonical read
pub const VSWTC:    usize = 7;  // Switch (not POSIX; 0 in Linux)
pub const VSTART:   usize = 8;  // Resume output (XON, default ^Q)
pub const VSTOP:    usize = 9;  // Pause output (XOFF, default ^S)
pub const VSUSP:    usize = 10; // Suspend (default ^Z)
pub const VEOL:     usize = 11; // Additional end-of-line (canonical)
pub const VREPRINT: usize = 12; // Reprint pending input (default ^R)
pub const VDISCARD: usize = 13; // Toggle discard output (default ^O)
pub const VWERASE:  usize = 14; // Word erase (default ^W)
pub const VLNEXT:   usize = 15; // Literal next (default ^V)
pub const VEOL2:    usize = 16; // Second end-of-line (default NUL = disabled)
// indices 17, 18 are padding (unused)
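
As a usage illustration of the flag constants above, the following is a minimal sketch of switching a `Termios` into raw mode. The helper name `make_raw` is hypothetical; the flag set it clears mirrors what glibc's `cfmakeraw(3)` does, and only the constants needed here are repeated so the snippet stands alone.

```rust
// Subset of the constants defined above, repeated so the sketch is self-contained.
pub const IGNBRK: u32 = 0o000001;
pub const BRKINT: u32 = 0o000002;
pub const PARMRK: u32 = 0o000010;
pub const ISTRIP: u32 = 0o000040;
pub const INLCR: u32 = 0o000100;
pub const IGNCR: u32 = 0o000200;
pub const ICRNL: u32 = 0o000400;
pub const IXON: u32 = 0o002000;
pub const OPOST: u32 = 0o000001;
pub const CSIZE: u32 = 0o000060;
pub const CS8: u32 = 0o000060;
pub const PARENB: u32 = 0o000400;
pub const ISIG: u32 = 0o000001;
pub const ICANON: u32 = 0o000002;
pub const ECHO: u32 = 0o000010;
pub const ECHONL: u32 = 0o000100;
pub const IEXTEN: u32 = 0o100000;
pub const VTIME: usize = 5;
pub const VMIN: usize = 6;

#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Termios {
    pub c_iflag: u32,
    pub c_oflag: u32,
    pub c_cflag: u32,
    pub c_lflag: u32,
    pub c_line: u8,
    pub c_cc: [u8; 19],
    pub c_ispeed: u32,
    pub c_ospeed: u32,
}

/// Hypothetical helper: put `t` into raw mode (8-bit clean, no echo,
/// no canonical processing, no signal generation), like cfmakeraw(3).
pub fn make_raw(t: &mut Termios) {
    t.c_iflag &= !(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    t.c_oflag &= !OPOST;
    t.c_lflag &= !(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t.c_cflag &= !(CSIZE | PARENB);
    t.c_cflag |= CS8;
    // Block until at least one byte is available, with no inter-byte timer.
    t.c_cc[VMIN] = 1;
    t.c_cc[VTIME] = 0;
}
```

A terminal program would typically fetch the current settings with TCGETS2, apply a helper like this, and write the result back with TCSETS2.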

21.1.6.2 Baud Rate Setting¶

Standard baud rates are encoded as Bxxx constants in c_cflag & CBAUD. Non-standard rates set BOTHER in c_cflag and carry the numeric rate in c_ispeed/c_ospeed, via the TCSETS2/TCGETS2 ioctls (Linux 2.6.20+, struct termios2):

/// Standard baud rate constants (c_cflag bits 0-3 plus extension bit 12, masked by CBAUD).
/// Values are in octal to match Linux `include/uapi/asm-generic/termbits.h`
/// where they are defined as `#define B9600 0000015` etc. Octal notation
/// makes the bit-field encoding clearer (each octal digit = 3 bits).
pub const B0:      u32 = 0o000000; // Hang up (de-assert DTR)
pub const B50:     u32 = 0o000001;
pub const B75:     u32 = 0o000002;
pub const B110:    u32 = 0o000003;
pub const B134:    u32 = 0o000004;
pub const B150:    u32 = 0o000005;
pub const B200:    u32 = 0o000006;
pub const B300:    u32 = 0o000007;
pub const B600:    u32 = 0o000010;
pub const B1200:   u32 = 0o000011;
pub const B1800:   u32 = 0o000012;
pub const B2400:   u32 = 0o000013;
pub const B4800:   u32 = 0o000014;
pub const B9600:   u32 = 0o000015;
pub const B19200:  u32 = 0o000016;
pub const B38400:  u32 = 0o000017;
pub const B57600:  u32 = 0o010001;
pub const B115200: u32 = 0o010002;
pub const B230400: u32 = 0o010003;
pub const B460800: u32 = 0o010004;
pub const B500000: u32 = 0o010005;
pub const B576000: u32 = 0o010006;
pub const B921600: u32 = 0o010007;
pub const B1000000:u32 = 0o010010;
pub const B1152000:u32 = 0o010011;
pub const B1500000:u32 = 0o010012;
pub const B2000000:u32 = 0o010013;
pub const B2500000:u32 = 0o010014;
pub const B3000000:u32 = 0o010015;
pub const B3500000:u32 = 0o010016;
pub const B4000000:u32 = 0o010017;
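
To make the Bxxx/BOTHER split concrete, here is a small sketch of how a numeric rate might be folded into `c_cflag`. The function names `encode_baud` and `decode_baud` and the abbreviated `STANDARD` table are illustrative assumptions, not part of the UmkaOS API; a full implementation would list every constant above.

```rust
pub const CBAUD: u32 = 0o010017;
pub const BOTHER: u32 = 0o010000;

/// (numeric rate, Bxxx code) for a few standard rates; abbreviated on purpose.
const STANDARD: &[(u32, u32)] = &[
    (9600, 0o000015),
    (19200, 0o000016),
    (38400, 0o000017),
    (57600, 0o010001),
    (115200, 0o010002),
    (921600, 0o010007),
];

/// Returns the updated c_cflag and the value to store in c_ispeed/c_ospeed.
/// Rates missing from the table fall back to BOTHER.
pub fn encode_baud(c_cflag: u32, rate: u32) -> (u32, u32) {
    let code = STANDARD
        .iter()
        .find(|&&(r, _)| r == rate)
        .map(|&(_, c)| c)
        .unwrap_or(BOTHER);
    ((c_cflag & !CBAUD) | code, rate)
}

/// Recovers the numeric rate; with BOTHER the rate comes from c_ospeed.
pub fn decode_baud(c_cflag: u32, c_ospeed: u32) -> Option<u32> {
    let code = c_cflag & CBAUD;
    if code == BOTHER {
        return Some(c_ospeed);
    }
    STANDARD.iter().find(|&&(_, c)| c == code).map(|&(r, _)| r)
}
```

Clearing `CBAUD` before OR-ing in the new code matters: the extended rates share bit 12 with BOTHER, so stale bits would otherwise decode as a different rate.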

UmkaOS ioctls for terminal settings:

- TCGETS (0x5401): get struct termios (legacy layout: 19 c_cc entries, no c_ispeed/c_ospeed)
- TCSETS (0x5402): set immediately
- TCSETSW (0x5403): set after drain (wait for output to flush)
- TCSETSF (0x5404): set after flush (drain output + flush input)
- TCGETS2 (0x802C542A): get struct termios2 (19 c_cc entries, supports BOTHER)
- TCSETS2 (0x402C542B): set via termios2 (supports non-standard baud)
- TCSETSW2 / TCSETSF2: drain/flush variants of TCSETS2
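
The 32-bit termios2 request numbers are not arbitrary: they follow the Linux asm-generic `_IOC` encoding (direction in bits 30-31, argument size in bits 16-29, type character in bits 8-15, sequence number in bits 0-7). The sketch below reproduces that arithmetic; the helper `ioc` and the `IOC_*` names are local to this example.

```rust
// Direction values used by the asm-generic _IOC encoding.
const IOC_WRITE: u32 = 1; // userspace writes the argument to the kernel (_IOW)
const IOC_READ: u32 = 2; // userspace reads the argument from the kernel (_IOR)

/// Builds an ioctl request number: dir | size | type | nr.
pub const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | (size << 16) | ((ty as u32) << 8) | nr as u32
}

/// sizeof(struct termios2), matching the 44-byte `Termios` layout.
pub const TERMIOS2_SIZE: u32 = 44;

pub const TCGETS2: u32 = ioc(IOC_READ, b'T', 0x2A, TERMIOS2_SIZE);
pub const TCSETS2: u32 = ioc(IOC_WRITE, b'T', 0x2B, TERMIOS2_SIZE);
pub const TCSETSW2: u32 = ioc(IOC_WRITE, b'T', 0x2C, TERMIOS2_SIZE);
pub const TCSETSF2: u32 = ioc(IOC_WRITE, b'T', 0x2D, TERMIOS2_SIZE);
```

The legacy requests (0x5401 to 0x5404) predate this scheme, which is why they carry only the type character `'T'` (0x54) and a sequence number.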

21.1.6.3 Modem Control Lines¶

/// Modem control line bits (TIOCMGET/TIOCMSET/TIOCMBIS/TIOCMBIC).
pub const TIOCM_LE:  u32 = 0x001; // Line Enable (DSR in LE role)
pub const TIOCM_DTR: u32 = 0x002; // Data Terminal Ready (output)
pub const TIOCM_RTS: u32 = 0x004; // Request To Send (output)
pub const TIOCM_ST:  u32 = 0x008; // Secondary Transmit (rare)
pub const TIOCM_SR:  u32 = 0x010; // Secondary Receive (rare)
pub const TIOCM_CTS: u32 = 0x020; // Clear To Send (input)
pub const TIOCM_CAR: u32 = 0x040; // Carrier Detect (input, alias DCD)
pub const TIOCM_RNG: u32 = 0x080; // Ring Indicator (input)
pub const TIOCM_DSR: u32 = 0x100; // Data Set Ready (input)
pub const TIOCM_CD:  u32 = TIOCM_CAR;
pub const TIOCM_RI:  u32 = TIOCM_RNG;
pub const TIOCM_OUT1:u32 = 0x2000;
pub const TIOCM_OUT2:u32 = 0x4000;
pub const TIOCM_LOOP:u32 = 0x8000;

/// Modem control ioctls.
/// TIOCMGET: read current modem line state → *argp = u32 bitmask
/// TIOCMSET: set modem lines → *argp = u32 bitmask (replaces all writable bits)
/// TIOCMBIS: set individual bits → *argp = u32 bitmask (OR into current)
/// TIOCMBIC: clear individual bits → *argp = u32 bitmask (AND NOT into current)
pub const TIOCMGET:  u32 = 0x5415;
pub const TIOCMSET:  u32 = 0x5418;
pub const TIOCMBIS:  u32 = 0x5416;
pub const TIOCMBIC:  u32 = 0x5417;

/// TIOCMIWAIT: wait for modem line state change.
/// *argp = bitmask of lines to wait on (TIOCM_CAR|TIOCM_DSR|TIOCM_RI|TIOCM_CTS).
/// Blocks until any of the specified lines changes. Returns 0 on change, EINTR on signal.
pub const TIOCMIWAIT: u32 = 0x545C;

/// TIOCGICOUNT: get modem line interrupt counters (cumulative; diff two snapshots to count transitions).
/// *argp = struct serial_icounter_struct { cts, dsr, rng, dcd, rx, tx, frame, overrun, parity, brk, ... }
pub const TIOCGICOUNT: u32 = 0x545D;
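
The three write-side ioctls differ only in how the argument is combined with the current line state. A minimal sketch follows, assuming that only the output lines are writable (the `WRITABLE` mask mirrors the set Linux accepts; the function name `apply_mctrl` is hypothetical):

```rust
pub const TIOCM_DTR: u32 = 0x002;
pub const TIOCM_RTS: u32 = 0x004;
pub const TIOCM_CTS: u32 = 0x020;
pub const TIOCM_CAR: u32 = 0x040;
pub const TIOCM_OUT1: u32 = 0x2000;
pub const TIOCM_OUT2: u32 = 0x4000;
pub const TIOCM_LOOP: u32 = 0x8000;

pub const TIOCMBIS: u32 = 0x5416;
pub const TIOCMBIC: u32 = 0x5417;
pub const TIOCMSET: u32 = 0x5418;

/// Assumption: input lines (CTS, DCD, RI, DSR) are read-only and are
/// silently ignored when userspace tries to set or clear them.
const WRITABLE: u32 = TIOCM_DTR | TIOCM_RTS | TIOCM_OUT1 | TIOCM_OUT2 | TIOCM_LOOP;

/// Computes the new modem line state for a TIOCMSET/TIOCMBIS/TIOCMBIC call.
/// Returns None for any other command.
pub fn apply_mctrl(current: u32, cmd: u32, arg: u32) -> Option<u32> {
    let arg = arg & WRITABLE;
    match cmd {
        TIOCMSET => Some((current & !WRITABLE) | arg), // replace all writable bits
        TIOCMBIS => Some(current | arg),               // OR into current
        TIOCMBIC => Some(current & !arg),              // AND NOT into current
        _ => None,
    }
}
```

The result would then be handed to the driver's `set_mctrl` and mirrored into `TtyPort::modem_status`.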

21.1.6.4 Serial-Specific ioctls¶

/// TIOCEXCL: put tty into exclusive mode.
/// Subsequent open() calls on the device fail with EBUSY.
/// Required by minicom for exclusive serial port access.
pub const TIOCEXCL:  u32 = 0x540C;
/// TIOCNXCL: clear exclusive mode.
pub const TIOCNXCL:  u32 = 0x540D;
/// TIOCGEXCL: check if in exclusive mode (Linux 3.8+). *argp = int (1 = exclusive).
pub const TIOCGEXCL: u32 = 0x80045440;

/// TIOCGSERIAL: get serial port info (struct serial_struct, Linux ABI compat).
pub const TIOCGSERIAL: u32 = 0x541E;
/// TIOCSSERIAL: set serial port info.
pub const TIOCSSERIAL: u32 = 0x541F;

/// struct serial_struct (Linux ABI — must match exactly for compat).
/// minicom uses TIOCGSERIAL to detect and set ASYNC_LOW_LATENCY.
// Userspace ABI — matches Linux struct serial_struct (TIOCGSERIAL/TIOCSSERIAL). Layout frozen.
#[repr(C)]
pub struct SerialStruct {
    pub type_:         i32,   // PORT_16550A etc.
    pub line:          i32,   // tty line number
    pub port:          u32,   // I/O port address
    pub irq:           i32,
    pub flags:         i32,   // ASYNC_LOW_LATENCY = 0x2000, ASYNC_SKIP_TEST = 0x0040
    pub xmit_fifo_size:i32,
    pub custom_divisor:i32,
    pub baud_base:     i32,   // base baud rate (usually 115200 or clock/16)
    pub close_delay:   u16,   // delay before fully closed (hundredths of a second)
    pub io_type:       u8,
    pub reserved_char: [u8; 1],
    pub hub6:          i32,
    pub closing_wait:  u16,   // delay before close (hundredths of a second; ASYNC_CLOSING_WAIT_NONE=0xFFFF)
    pub closing_wait2: u16,
    pub iomem_base:    usize, // MMIO base (Linux: `unsigned char *`; usize for 32/64-bit ABI compat)
    pub iomem_reg_shift: u16,
    pub port_high:     u32,
    pub iomap_base:    usize, // Linux: `unsigned long`; usize for 32/64-bit ABI compat
}
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SerialStruct>() == 72);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(core::mem::size_of::<SerialStruct>() == 60);
// **compat_ioctl**: On 64-bit kernels running 32-bit processes, `iomem_base`
// and `iomap_base` are `usize` (4 bytes in the 32-bit struct, 8 bytes in the
// 64-bit struct). The compat_ioctl handler for TIOCGSERIAL/TIOCSSERIAL must
// translate between the 32-bit and 64-bit layouts:
// - On TIOCGSERIAL (get): copy the 64-bit struct, truncating iomem_base and
//   iomap_base to the lower 32 bits (MMIO addresses in the 32-bit compat
//   address space are always <4 GiB).
// - On TIOCSSERIAL (set): zero-extend iomem_base and iomap_base from 32 to
//   64 bits. This matches Linux's `compat_serial_struct` handling.
// compat (32-bit) layout: same fields as SerialStruct but with iomem_base: u32
// and iomap_base: u32, giving total size 60 bytes.
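
The compat translation described in the comment above can be sketched as follows. This is an illustration under stated assumptions: the generic `SerialStructAbi<A>` type, the aliases, and the `to_compat`/`from_compat` helpers are hypothetical names introduced here so that both layouts can be expressed on any host; they are not the UmkaOS definitions.

```rust
/// Serial port info with the address width as a parameter:
/// A = u32 gives the 60-byte compat layout, A = u64 the native 64-bit one.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct SerialStructAbi<A> {
    pub type_: i32,
    pub line: i32,
    pub port: u32,
    pub irq: i32,
    pub flags: i32,
    pub xmit_fifo_size: i32,
    pub custom_divisor: i32,
    pub baud_base: i32,
    pub close_delay: u16,
    pub io_type: u8,
    pub reserved_char: [u8; 1],
    pub hub6: i32,
    pub closing_wait: u16,
    pub closing_wait2: u16,
    pub iomem_base: A,
    pub iomem_reg_shift: u16,
    pub port_high: u32,
    pub iomap_base: A,
}

pub type SerialStruct32 = SerialStructAbi<u32>;
pub type SerialStruct64 = SerialStructAbi<u64>;

impl<A: Copy> SerialStructAbi<A> {
    /// Copies every field unchanged except the two address-width fields,
    /// which are converted with `f`.
    fn map_addr<B>(&self, f: impl Fn(A) -> B) -> SerialStructAbi<B> {
        SerialStructAbi {
            type_: self.type_,
            line: self.line,
            port: self.port,
            irq: self.irq,
            flags: self.flags,
            xmit_fifo_size: self.xmit_fifo_size,
            custom_divisor: self.custom_divisor,
            baud_base: self.baud_base,
            close_delay: self.close_delay,
            io_type: self.io_type,
            reserved_char: self.reserved_char,
            hub6: self.hub6,
            closing_wait: self.closing_wait,
            closing_wait2: self.closing_wait2,
            iomem_base: f(self.iomem_base),
            iomem_reg_shift: self.iomem_reg_shift,
            port_high: self.port_high,
            iomap_base: f(self.iomap_base),
        }
    }
}

/// TIOCGSERIAL for a 32-bit caller: truncate addresses to the low 32 bits.
pub fn to_compat(native: &SerialStruct64) -> SerialStruct32 {
    native.map_addr(|a| a as u32)
}

/// TIOCSSERIAL from a 32-bit caller: zero-extend addresses to 64 bits.
pub fn from_compat(compat: &SerialStruct32) -> SerialStruct64 {
    compat.map_addr(u64::from)
}
```

Routing both directions through one field-by-field copy keeps the two layouts from drifting apart if a field is ever added.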

21.1.6.5 Line Discipline Switching¶

/// TIOCSETD: set line discipline. *argp = int (discipline number).
pub const TIOCSETD: u32 = 0x5423;
/// TIOCGETD: get current line discipline. *argp = int.
pub const TIOCGETD: u32 = 0x5424;

/// Registered line disciplines.
pub const N_TTY:   i32 = 0;  // Default: terminal line discipline
pub const N_SLIP:  i32 = 1;  // SLIP (Serial Line Internet Protocol)
pub const N_MOUSE: i32 = 2;  // Mouse driver (obsolete)
pub const N_PPP:   i32 = 3;  // PPP (Point-to-Point Protocol) — used by pppd
pub const N_STRIP: i32 = 4;  // STRIP (Metricom Striper) — obsolete
pub const N_AX25:  i32 = 5;  // AX.25 packet radio — unused on UmkaOS
pub const N_X25:   i32 = 6;  // X.25 async — unused on UmkaOS
pub const N_6PACK: i32 = 7;  // 6PACK packet radio — unused
pub const N_MASC:  i32 = 8;  // Reserved
pub const N_R3964: i32 = 9;  // Simatic R3964
pub const N_PROFIBUS_FDL: i32 = 10; // Profibus — industrial
pub const N_IRDA:  i32 = 11; // IrDA — legacy
pub const N_SMSBLOCK: i32 = 12; // SMS block protocol
pub const N_HDLC:  i32 = 13; // HDLC sync — used by isdn/WAN drivers
pub const N_SYNC_PPP: i32 = 14; // Sync PPP
pub const N_HCI:   i32 = 15; // Bluetooth HCI via UART (H4 protocol)

/// Per-TTY port state. Represents a single TTY device instance (serial port,
/// PTY slave, VT console). Passed to line discipline methods as the context
/// for all TTY operations. One `TtyPort` exists per open TTY device.
///
/// **Relationship to `PtyPair`**: For PTY devices, `TtyPort` is the
/// line-discipline-facing interface (termios, input/output rings, wait queues),
/// while `PtyPair` is the zero-copy data transport (shared ring pages, control
/// ring). A PTY slave's `TtyPort.driver_data` points to the owning `PtyPair`.
/// The `TtyPort` handles canonical processing (echo, line editing); the `PtyPair`
/// handles master-slave data transport. Serial ports have `TtyPort` only (no
/// `PtyPair`). This separation prevents serial-port code from depending on PTY
/// ring structures and vice versa.
pub struct TtyPort {
    /// Major/minor device number for this TTY.
    pub dev: DevNum,
    /// Current termios settings (baud rate, c_lflag, c_iflag, c_oflag, c_cflag).
    pub termios: SpinLock<Termios>,
    /// Line discipline currently active on this port (N_TTY by default).
    pub ldisc: Arc<dyn LineDisciplineOps>,
    /// Line discipline ID (N_TTY=0, N_PPP=3, etc.).
    pub ldisc_id: i32,
    /// Input ring buffer: serial IRQ / PTY write path pushes bytes here.
    pub input_ring: SpscRing<u8, 4096>,
    /// Output ring buffer: application write path pushes bytes here.
    pub output_ring: SpscRing<u8, 4096>,
    /// Wait queue for readers blocked on empty input.
    pub read_wait: WaitQueueHead,
    /// Wait queue for writers blocked on full output buffer.
    pub write_wait: WaitQueueHead,
    /// True if TIOCEXCL has been set (exclusive access mode).
    pub exclusive: AtomicBool,
    /// Modem control line state (DTR, RTS, CTS, DCD, RI, DSR).
    pub modem_status: AtomicU32,
    /// Session ID of the controlling process (for SIGHUP on hangup).
    pub session: AtomicU64,
    /// Foreground process group (for SIGINT/SIGTSTP delivery).
    pub pgrp: AtomicU64,
    /// NUMA node this TTY worker is assigned to.
    pub numa_node: u16,
}

/// LineDisciplineOps trait — implemented by each line discipline.
pub trait LineDisciplineOps: Send + Sync {
    /// Called when characters arrive from the driver.
    fn receive_buf(&self, tty: &TtyPort, buf: &[u8], flags: &[u8]);
    /// Called when the application reads from the tty.
    fn read(&self, tty: &TtyPort, buf: &mut [u8]) -> Result<usize, KernelError>;
    /// Called when the application writes to the tty.
    fn write(&self, tty: &TtyPort, buf: &[u8]) -> Result<usize, KernelError>;
    /// Handle ioctl (discipline-specific, e.g., PPPIOCGUNIT for N_PPP).
    fn ioctl(&self, tty: &TtyPort, cmd: u32, arg: usize) -> Result<i32, KernelError>;
    /// Called when line discipline is opened.
    fn open(&self, tty: &TtyPort) -> Result<(), KernelError>;
    /// Called when line discipline is closed.
    fn close(&self, tty: &TtyPort);
}

Note — TIOCSETD behavior (D24): Line disciplines are not stacked. TIOCSETD replaces the current discipline with a new one; a TTY has exactly one active line discipline at any time (no STREAMS-style stacking, matching Linux behavior).

ioctl(fd, TIOCSETD, &ldisc_id): calls the old discipline's close(), then the new discipline's open(). Returns EINVAL if ldisc_id >= N_LDISC_MAX (30) or the discipline is not registered.

ioctl(fd, TIOCGETD, &ldisc_id): returns the ID of the current line discipline.

N_TTY (ID 0) is always registered and is the fallback if a custom discipline's open() fails.

N_LDISC_MAX = 30: system-wide limit on the number of distinct registered discipline types (not on simultaneous TTY instances), matching Linux.
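
The TIOCSETD rules above can be summarized in a short sketch. It is deliberately simplified: the `Ldisc` trait carries only `open`/`close`, the registry is a plain `HashMap`, and `StubLdisc`, `Tty`, `LdiscError`, and `set_ldisc` are hypothetical names used for illustration rather than the real `TtyPort`/`LineDisciplineOps` types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub const N_TTY: i32 = 0;
pub const N_LDISC_MAX: i32 = 30;

#[derive(Debug, PartialEq)]
pub enum LdiscError {
    Invalid,    // maps to EINVAL
    OpenFailed, // the new discipline's open() refused
}

/// Reduced stand-in for LineDisciplineOps.
pub trait Ldisc {
    fn open(&self) -> Result<(), LdiscError>;
    fn close(&self);
}

/// Test discipline whose open() outcome is configurable.
pub struct StubLdisc {
    pub open_ok: bool,
}

impl Ldisc for StubLdisc {
    fn open(&self) -> Result<(), LdiscError> {
        if self.open_ok { Ok(()) } else { Err(LdiscError::OpenFailed) }
    }
    fn close(&self) {}
}

pub struct Tty {
    pub ldisc_id: i32,
    pub ldisc: Arc<dyn Ldisc>,
}

/// Replace (never stack) the active line discipline.
pub fn set_ldisc(
    tty: &mut Tty,
    registry: &HashMap<i32, Arc<dyn Ldisc>>,
    id: i32,
) -> Result<(), LdiscError> {
    if !(0..N_LDISC_MAX).contains(&id) {
        return Err(LdiscError::Invalid);
    }
    let new = registry.get(&id).ok_or(LdiscError::Invalid)?.clone();
    tty.ldisc.close(); // old discipline first
    match new.open() {
        Ok(()) => {
            tty.ldisc = new;
            tty.ldisc_id = id;
            Ok(())
        }
        Err(e) => {
            // Fall back to N_TTY, which is always registered.
            let n_tty = registry.get(&N_TTY).expect("N_TTY registered").clone();
            let _ = n_tty.open();
            tty.ldisc = n_tty;
            tty.ldisc_id = N_TTY;
            Err(e)
        }
    }
}
```

Validation happens before the old discipline is closed, so a bad `ldisc_id` leaves the TTY untouched.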

21.1.6.6 SerialTtyOps KABI¶

Hardware serial UART drivers implement SerialTtyOps:

/// KABI vtable for a serial UART driver.
/// Transport: T1 (ring buffer + MPK domain switch).
#[repr(C)]
pub struct SerialTtyOps {
    pub vtable_size: usize,
    /// Apply new termios settings to hardware (baud rate, framing, flow control).
    pub set_termios: unsafe extern "C" fn(
        ctx:     *mut c_void,
        new:     *const Termios,
        old:     *const Termios,
    ),
    /// Get current modem control line state (returns TIOCM_* bitmask).
    pub get_mctrl: unsafe extern "C" fn(ctx: *mut c_void) -> u32,
    /// Set modem control output lines (DTR, RTS).
    pub set_mctrl: unsafe extern "C" fn(ctx: *mut c_void, mctrl: u32),
    /// Send a BREAK condition for `duration_ms` milliseconds.
    pub send_break: unsafe extern "C" fn(ctx: *mut c_void, duration_ms: u32),
    /// Start transmitting (driver was stopped by throttle/stop_tx, now resume).
    pub start_tx: unsafe extern "C" fn(ctx: *mut c_void),
    /// Stop transmitting (XOFF received or output buffer full).
    pub stop_tx: unsafe extern "C" fn(ctx: *mut c_void),
    /// Enable/disable receiver (CREAD flag).
    pub set_rx_enabled: unsafe extern "C" fn(ctx: *mut c_void, enabled: u8), // 0 = disabled, 1 = enabled
    /// Wait for modem line changes (blocking; interruptible).
    pub wait_mctrl_change: unsafe extern "C" fn(
        ctx:          *mut c_void,
        wait_mask:    u32,
        timeout_ms:   u32,
    ) -> u32,
    /// Get serial port static info (for TIOCGSERIAL).
    pub get_serial: unsafe extern "C" fn(ctx: *mut c_void, out: *mut SerialStruct),
    /// Set serial port parameters (for TIOCSSERIAL).
    pub set_serial: unsafe extern "C" fn(ctx: *mut c_void, new: *const SerialStruct) -> i32,
}
// SerialTtyOps: vtable_size(usize) + 10 fn pointers.
// KABI vtable — size is pointer-width dependent.
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SerialTtyOps>() == 88);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(core::mem::size_of::<SerialTtyOps>() == 44);

21.1.6.7 Break Handling ioctls¶

/// TCSBRK: send a BREAK. If arg==0, send 0.25s break; if arg!=0, drain output.
pub const TCSBRK:  u32 = 0x5409;
/// TCSBRKP: send break of arg*0.1s (POSIX break).
pub const TCSBRKP: u32 = 0x5425;
/// TIOCSBRK: start sending BREAK (until TIOCCBRK or TCSBRK arg=0).
pub const TIOCSBRK: u32 = 0x5427;
/// TIOCCBRK: stop sending BREAK.
pub const TIOCCBRK: u32 = 0x5428;
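
A sketch of how the two timed-break requests might be decoded before calling the driver's `send_break`. The `BreakAction` enum and `decode_break` function are hypothetical; treating `TCSBRKP` with `arg == 0` as a 250 ms break is an assumption borrowed from Linux behavior.

```rust
pub const TCSBRK: u32 = 0x5409;
pub const TCSBRKP: u32 = 0x5425;

#[derive(Debug, PartialEq)]
pub enum BreakAction {
    /// Assert BREAK for this many milliseconds.
    SendBreakMs(u32),
    /// No BREAK; just wait for pending output to drain (tcdrain semantics).
    DrainOutput,
}

/// Maps (cmd, arg) to the action the TTY core should take.
/// Returns None for commands this helper does not handle.
pub fn decode_break(cmd: u32, arg: u32) -> Option<BreakAction> {
    match (cmd, arg) {
        (TCSBRK, 0) => Some(BreakAction::SendBreakMs(250)),
        (TCSBRK, _) => Some(BreakAction::DrainOutput),
        // Assumption: a zero argument selects the default 250 ms break.
        (TCSBRKP, 0) => Some(BreakAction::SendBreakMs(250)),
        // arg is in tenths of a second.
        (TCSBRKP, n) => Some(BreakAction::SendBreakMs(n.saturating_mul(100))),
        _ => None,
    }
}
```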

21.1.6.8 minicom Compatibility¶

minicom requires the following kernel features to operate correctly:

Feature	UmkaOS mechanism
Open serial port exclusively	`TIOCEXCL` → sets `TtyPort::exclusive` flag
Set baud rate (e.g., 115200)	`TCSETS2` with `BOTHER` or `TCSETS` with `B115200`
Hardware flow control	`CRTSCTS` flag → `set_mctrl(TIOCM_RTS)` + hardware CTS monitoring
Software flow control	`IXON`/`IXOFF` handled in N_TTY line discipline
Raw mode (no echo, no canon)	`c_lflag &= ~(ICANON|ECHO|ECHOE|ISIG)`
Non-blocking I/O with timeout	`VMIN=0, VTIME=10` (1-second timeout per read)
Modem control (dial)	`TIOCMBIS(TIOCM_DTR|TIOCM_RTS)` to assert DTR/RTS
Wait for DCD (carrier detect)	`TIOCMIWAIT(TIOCM_CAR)`
TIOCGSERIAL (low-latency mode)	`TIOCSSERIAL` with `ASYNC_LOW_LATENCY` flag
Z-modem file transfer	Userspace protocol (e.g., `rz`/`sz`) over a raw 8-bit line; `TIOCSETD(N_HDLC)` remains available for HDLC-based protocols

All of these are implemented in UmkaOS. minicom, picocom, screen, and cu all work correctly.

21.1.7 Serial Service Provider (Cluster-Wide Serial Access)¶

Provider model: Serial service can be host-proxy (host kernel manages the UART and forwards bytes) or device-native (a serial controller with Tier M firmware provides the service directly). The wire protocol (SerialServiceOpcode) is identical in both cases. Sharing model: exclusive (one peer at a time per serial port).

A node with a physical serial port can provide it as a cluster capability service. Any peer in the cluster can discover and use the serial port as if it were locally attached. This is the serial/TTY instantiation of the capability service provider model (Section 5.7).

Use cases:

- Out-of-band management consoles (serial-connected switches, PDUs, UPS)
- Industrial/embedded clusters (PLCs, sensors, GPS receivers, modems)
- Debug consoles (kernel serial output from remote nodes)
- Legacy equipment management (storage controllers, network appliances)

// umka-user-io/src/serial_service_provider.rs

/// Provides a local serial port as a cluster service.
pub struct SerialServiceProvider {
    /// Local serial device being served.
    device: SerialDeviceHandle,
    /// Service instance identifier.
    service_id: ServiceInstanceId,
    /// Service endpoint on the peer protocol.
    endpoint: PeerServiceEndpoint,
    /// Current serial configuration (baud, parity, etc.).
    config: TermiosConfig,
    /// Connected client (at most one — serial is exclusive).
    client: Option<PeerId>,
}

PeerCapFlags: SERIAL_PORT (bit 9) — advertised by peers that provide serial port access.

ServiceId: ServiceId("serial", 1).

PeerServiceDescriptor.properties (32 bytes):

#[repr(C)]
pub struct SerialPortProperties {
    /// Port name on the serving host (e.g., "ttyS0", "ttyUSB0").
    pub port_name: [u8; 16],
    /// Maximum supported baud rate.
    pub max_baud: u32,
    /// Capabilities bitmask.
    /// bit 0: hardware flow control (RTS/CTS)
    /// bit 1: modem control signals (DTR/DSR/DCD/RI)
    /// bit 2: RS-485 mode
    pub capabilities: u32,
    pub _pad: [u8; 8],
}
// SerialPortProperties: [u8;16](16) + u32(4) + u32(4) + [u8;8](8) = 32 bytes.
// Wire struct (PeerServiceDescriptor.properties payload).
const_assert!(core::mem::size_of::<SerialPortProperties>() == 32);

Wire protocol — four opcodes via ServiceMessage/ServiceResponse:

#[repr(u16)]
pub enum SerialServiceOpcode {
    /// Client → provider: transmit bytes.
    /// Payload: raw bytes (up to 224 bytes per entry, continuation for more).
    TxData       = 0x0001,
    /// Provider → client: received bytes from serial port.
    /// Payload: raw bytes. Sent as data arrives (no batching delay).
    RxData       = 0x0002,
    /// Client → provider: set serial configuration.
    /// Payload: SerialConfig (baud, data bits, parity, stop bits, flow control).
    SetConfig    = 0x0010,
    /// Client → provider: set/get modem control lines.
    /// Payload: ModemControl (DTR, RTS, read DCD/DSR/RI/CTS).
    ModemControl = 0x0020,
}

Serial service messages use the standard ServiceMessage/ServiceResponse framing from the peer protocol (Section 5.1). Each message contains a ServiceMessage header (opcode, sequence_number, payload_length) followed by an opcode-specific payload:

Opcode	Dir	Payload	Details
`TxData` (0x0001)	Client->Provider	Raw bytes, 1-224 B per entry	Continuation entries for data > 224 bytes. No application-level sequence numbering within the byte stream — order is guaranteed by the RDMA RC QP (in-order delivery).
`RxData` (0x0002)	Provider->Client	Raw bytes or error-prefixed bytes	When error flags are present, payload uses per-character framing: normal byte = `[byte]`; error byte = `[0xFF] [error_flag] [byte]`. A literal 0xFF in data is escaped as `[0xFF] [0x00] [0xFF]`. This follows the Linux `PARMRK` marker convention (0xFF prefix) but carries an explicit error flag in the second byte; Linux itself escapes a literal 0xFF as `[0xFF] [0xFF]`. See the error reporting table below.
`SetConfig` (0x0010)	Client->Provider	`SerialConfig` struct (16 bytes)	Response: `ServiceResponse` with status 0 (success) or `-EINVAL` (unsupported configuration). Synchronous — client blocks until response.
`ModemControl` (0x0020)	Bidirectional	`ModemControlPayload` (4 bytes)	Client sends to set DTR/RTS. Provider sends asynchronously when DCD/DSR/RI/CTS change.
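
The 224-byte limit on `TxData` means a larger write is split across several ring entries. A minimal sketch of that split is shown below; `MAX_TX_PAYLOAD` and `tx_entries` are names chosen for this example, and the `ServiceMessage` header and continuation marking are intentionally left out because their layout is defined elsewhere.

```rust
/// Maximum TxData payload per ring entry, per the table above.
pub const MAX_TX_PAYLOAD: usize = 224;

/// Splits a write into per-entry payloads of at most 224 bytes each.
/// Ordering is preserved; the transport is assumed to deliver in order.
pub fn tx_entries(data: &[u8]) -> impl Iterator<Item = &[u8]> {
    data.chunks(MAX_TX_PAYLOAD)
}
```

For example, a 500-byte write yields entries of 224, 224, and 52 bytes, and an empty write yields no entries at all.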

/// Modem control payload for the ModemControl opcode. 4 bytes.
/// Direction: bidirectional. When sent by the client, bits 1-2 (DTR, RTS)
/// are commands. When sent by the provider, all bits reflect current
/// physical line state.
#[repr(C)]
pub struct ModemControlPayload {
    /// Bitmask of modem signal states. Layout matches Linux TIOCM_* constants.
    pub signals: u32,
}
// ModemControlPayload: u32(4) = 4 bytes. Wire struct.
const_assert!(core::mem::size_of::<ModemControlPayload>() == 4);

/// Serial line configuration. 16 bytes.
#[repr(C)]
pub struct SerialConfig {
    pub baud_rate: u32,       // e.g., 115200
    pub data_bits: u8,        // 5, 6, 7, or 8
    pub parity: u8,           // 0=none, 1=odd, 2=even
    pub stop_bits: u8,        // 1 or 2
    pub flow_control: u8,     // 0=none, 1=XON/XOFF, 2=RTS/CTS
    pub flags: u8,            // bit 0: BREAK active (1=assert, 0=deassert)
    pub _pad: [u8; 7],
}
// SerialConfig: u32(4) + u8(1)*4 + u8(1) + [u8;7](7) = 16 bytes.
// Wire struct (SetConfig opcode payload).
const_assert!(core::mem::size_of::<SerialConfig>() == 16);

Capability gating: Remote serial access requires CAP_SERIAL_REMOTE (Section 9.1). Checked at ServiceBind time.

Exclusive access: Serial ports are inherently single-client. If a second peer tries to bind while a client is connected, ServiceBind returns CapResponseStatus::Busy. The existing client must ServiceUnbind first.

Enforcement: the SerialServiceProvider.client field (Option<PeerId>) is protected by the ServiceBind lock in the peer protocol layer (Section 5.1). When a ServiceBind arrives:

Acquire ServiceBind lock (per-service spinlock).
Check client field: if Some(_), return CapResponseStatus::Busy.
If None: set client = Some(new_peer_id), return CapResponseStatus::Ok.
Release lock.

ServiceUnbind and peer failure clear the client field under the same lock. No CAS retry loop is needed — the spinlock serializes all bind/unbind operations.
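
The bind/unbind serialization can be sketched as below. `std::sync::Mutex` stands in for the per-service spinlock, and `SerialBindState` and the `u64` `PeerId` alias are simplifications made for this example rather than the real peer-protocol types.

```rust
use std::sync::Mutex;

pub type PeerId = u64; // placeholder for the real peer identifier type

#[derive(Debug, PartialEq)]
pub enum CapResponseStatus {
    Ok,
    Busy,
}

/// Holds the single client slot for one serial service instance.
pub struct SerialBindState {
    client: Mutex<Option<PeerId>>,
}

impl SerialBindState {
    pub fn new() -> Self {
        SerialBindState { client: Mutex::new(None) }
    }

    /// ServiceBind: succeeds only if no client is currently attached.
    pub fn bind(&self, peer: PeerId) -> CapResponseStatus {
        let mut client = self.client.lock().unwrap();
        match *client {
            Some(_) => CapResponseStatus::Busy,
            None => {
                *client = Some(peer);
                CapResponseStatus::Ok
            }
        }
    }

    /// ServiceUnbind or peer failure: clear the slot if `peer` owns it.
    pub fn unbind(&self, peer: PeerId) {
        let mut client = self.client.lock().unwrap();
        if *client == Some(peer) {
            *client = None;
        }
    }
}
```

Because the check and the update happen under one lock acquisition, two peers racing to bind cannot both observe `None`.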

Latency: Byte-stream I/O at serial baud rates (115200 baud = ~11.5 KB/s max with 8N1 framing, 10 bits per character) is negligible compared to RDMA bandwidth. The dominant latency is the RDMA RTT (~3-5 us) per TxData/RxData message, which is invisible at serial speeds. At 115200 baud, one character takes ~87 us on the wire -- the RDMA hop adds <6% latency.

Drain protocol: On graceful shutdown (Section 5.8), the serial service provider sends ServiceDrainNotify to the connected client. The client closes the PTY and reconnects to an alternative peer (if alternative_peer is set) or loses access. No data buffering needed -- serial is real-time, no writeback.

21.1.7.1 Serial Service Client (Consuming Peer)¶

On the consuming peer, the serial service client bridges the remote serial port into the local TTY subsystem via a PTY pair. Applications interact with the PTY slave and see a standard terminal device.

/// Client-side state for a bound remote serial port. One instance per
/// active ServiceBind to a serial service provider.
///
/// Tier assignment: Tier 1 (runs in the TTY/VFS isolation domain).
pub struct SerialServiceClient {
    /// ServiceBind connection to the remote serial port provider.
    connection: ServiceBindHandle,
    /// Peer providing the serial port.
    peer_id: PeerId,
    /// PTY master file descriptor. The client kernel thread reads/writes
    /// this fd to bridge data between the PTY and the ServiceMessage ring.
    pty_master_fd: FileDescriptor,
    /// PTY slave index (the N in /dev/pts/N). Used for symlink creation.
    pty_slave_index: u32,
    /// Shadow copy of the current serial configuration. Updated when the
    /// client sends SetConfig to the provider, so the client can answer
    /// local termios queries without a round trip.
    config_shadow: SerialConfig,
    /// Last known modem control line state from the provider.
    /// Updated on each ModemControl message from the provider. Read by TIOCMGET ioctl.
    modem_status: AtomicU32,
    /// Bridge thread handle. Runs the PTY-to-service pump loop.
    bridge_thread: KernelThreadHandle,
    /// Shutdown flag. Set to signal the bridge thread to exit.
    shutdown: AtomicBool,
}

/// Modem control line state. Uses Linux TIOCM_* bitmask values directly.
/// TIOCMGET returns the raw `signals` field without translation.
#[repr(C)]
pub struct ModemControlState {
    /// Bitmask of modem signal states using Linux TIOCM_* bit positions:
    /// bit 1:  TIOCM_DTR (0x002) — Data Terminal Ready (set by client)
    /// bit 2:  TIOCM_RTS (0x004) — Request To Send (set by client)
    /// bit 5:  TIOCM_CTS (0x020) — Clear To Send
    /// bit 6:  TIOCM_CAR (0x040) — Carrier Detect / DCD
    /// bit 7:  TIOCM_RNG (0x080) — Ring Indicator
    /// bit 8:  TIOCM_DSR (0x100) — Data Set Ready
    pub signals: u32,
}
// ModemControlState: u32(4) = 4 bytes. Wire struct (TIOCMGET result).
const_assert!(core::mem::size_of::<ModemControlState>() == 4);

PTY bridge architecture: The kernel creates a PTY pair at ServiceBind time. A dedicated kernel thread (serial_bridge_{N}) runs a pump loop:

RX path (provider -> user): The bridge thread polls the ServiceMessage ring for incoming RxData messages. Received bytes are written to the PTY master fd. The PTY slave's line discipline processes them (echo, canonical mode, signal generation) before delivering to the reading application.
TX path (user -> provider): The bridge thread reads from the PTY master fd (which receives bytes written by applications to the PTY slave). Read bytes are packed into TxData ServiceMessage entries and sent to the provider. Up to 224 bytes per ring entry; larger writes use continuation entries.
Event loop: The bridge thread uses poll() on both the PTY master fd and the ServiceMessage ring's eventfd, waking on either direction having data. This avoids busy-waiting and keeps CPU usage at zero when idle.

termios forwarding: When an application sets terminal attributes on the PTY slave (tcsetattr(), stty), the PTY layer generates a TCSETS notification on the master side. The bridge thread detects this by checking termios state after each wake, compares against config_shadow, and sends a SetConfig message to the provider for any changed parameters (baud rate, parity, stop bits, flow control). The provider applies the configuration to the physical UART.
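
One plausible way to derive the `SetConfig` payload from termios flags is sketched below. The function `serial_config_from` is hypothetical, the baud rate is assumed to have been decoded to a numeric value already, and preferring RTS/CTS when both hardware and software flow control are requested is a choice made for this example only.

```rust
pub const IXON: u32 = 0o002000;
pub const IXOFF: u32 = 0o010000;
pub const CSIZE: u32 = 0o000060;
pub const CS5: u32 = 0o000000;
pub const CS6: u32 = 0o000020;
pub const CS7: u32 = 0o000040;
pub const CS8: u32 = 0o000060;
pub const CSTOPB: u32 = 0o000100;
pub const PARENB: u32 = 0o000400;
pub const PARODD: u32 = 0o001000;
pub const CRTSCTS: u32 = 0o020000000000;

#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct SerialConfig {
    pub baud_rate: u32,
    pub data_bits: u8,
    pub parity: u8,       // 0=none, 1=odd, 2=even
    pub stop_bits: u8,    // 1 or 2
    pub flow_control: u8, // 0=none, 1=XON/XOFF, 2=RTS/CTS
    pub flags: u8,
    pub _pad: [u8; 7],
}

/// Translates termios flag words plus a numeric baud rate into the wire config.
pub fn serial_config_from(c_iflag: u32, c_cflag: u32, baud: u32) -> SerialConfig {
    let data_bits = match c_cflag & CSIZE {
        CS5 => 5,
        CS6 => 6,
        CS7 => 7,
        _ => 8,
    };
    let parity = if c_cflag & PARENB == 0 {
        0
    } else if c_cflag & PARODD != 0 {
        1
    } else {
        2
    };
    let flow_control = if c_cflag & CRTSCTS != 0 {
        2
    } else if c_iflag & (IXON | IXOFF) != 0 {
        1
    } else {
        0
    };
    SerialConfig {
        baud_rate: baud,
        data_bits,
        parity,
        stop_bits: if c_cflag & CSTOPB != 0 { 2 } else { 1 },
        flow_control,
        ..Default::default()
    }
}
```

Change detection then reduces to comparing the freshly derived value with `config_shadow` and sending `SetConfig` only when they differ.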

Config_shadow synchronization: The bridge thread is single-threaded and owns all config_shadow mutations. The sequence is:

Detect termios change on PTY master.
Copy new config into a local variable (NOT into config_shadow yet).
Send SetConfig to provider with the new config.
Wait for ServiceResponse (synchronous — blocks the bridge thread).
On success: update config_shadow to the new config.
On failure (-EINVAL): revert PTY master termios to config_shadow values via tcsetattr() and return EINVAL to the application.

No CAS needed — single writer (bridge thread), atomic readers (TIOCMGET reads modem_status with Acquire ordering). This eliminates the race where config_shadow could temporarily hold a config that the provider rejected.
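
The commit-on-success ordering can be illustrated with a small sketch. Everything here is a stand-in: `Provider` abstracts the synchronous `SetConfig` round trip, `StubProvider` rejects rates above a limit, and the trimmed `SerialConfig` omits the wire padding. The point is only that `config_shadow` changes after the provider has accepted the new configuration, never before.

```rust
pub const EINVAL: i32 = 22;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SerialConfig {
    pub baud_rate: u32,
    pub data_bits: u8,
    pub parity: u8,
    pub stop_bits: u8,
    pub flow_control: u8,
}

/// Stand-in for the synchronous SetConfig request/response exchange.
pub trait Provider {
    fn set_config(&mut self, cfg: &SerialConfig) -> Result<(), i32>;
}

/// Test provider that accepts any rate up to `max_baud`.
pub struct StubProvider {
    pub max_baud: u32,
}

impl Provider for StubProvider {
    fn set_config(&mut self, cfg: &SerialConfig) -> Result<(), i32> {
        if cfg.baud_rate == 0 || cfg.baud_rate > self.max_baud {
            Err(-EINVAL)
        } else {
            Ok(())
        }
    }
}

pub struct Bridge {
    pub config_shadow: SerialConfig,
}

impl Bridge {
    /// Handles a detected termios change. Returns the configuration the PTY
    /// should reflect afterwards: the new one on success, the old shadow on
    /// rejection (the caller then reverts the PTY termios to it).
    pub fn on_termios_change(
        &mut self,
        provider: &mut dyn Provider,
        requested: SerialConfig,
    ) -> SerialConfig {
        if requested == self.config_shadow {
            return self.config_shadow; // nothing changed
        }
        let pending = requested; // held locally, not yet in the shadow
        if provider.set_config(&pending).is_ok() {
            self.config_shadow = pending; // commit only after acceptance
        }
        self.config_shadow
    }
}
```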

Modem status: The provider sends ModemControl messages asynchronously when physical modem control lines change state (DCD drop on disconnect, RI pulse on incoming call, DSR/CTS transitions). The bridge thread receives these and updates modem_status atomically. Applications querying TIOCMGET read the cached modem_status without a network round trip. Setting modem lines (TIOCMSET/TIOCMBIS/TIOCMBIC for DTR/RTS) generates a ModemControl message to the provider.

BREAK forwarding: When an application sends a break condition (tcsendbreak(), TCSBRK ioctl), the bridge thread sends a SetConfig message with flags bit 0 set (BREAK active). The provider asserts BREAK on the physical serial line. A second SetConfig with flags bit 0 clear deasserts BREAK. For timed breaks (tcsendbreak(fd, duration)), the bridge thread sends assert, sleeps for the requested duration (clamped to 250-500 ms per POSIX convention), then sends deassert. The flags field is separate from flow_control to avoid overloading flow control semantics with unrelated signaling.

Error reporting: The provider includes error flags in RxData messages when the physical UART detects line errors. Each affected character is preceded by a 0xFF marker and a one-byte error flag that encodes the error type:

Error Flag	Value	TTY Flag	Meaning
None	`0x00`	`TTY_NORMAL`	Normal character
Parity	`0x01`	`TTY_PARITY`	Parity error on this character
Framing	`0x02`	`TTY_FRAME`	Framing error (missing stop bit)
Overrun	`0x04`	`TTY_OVERRUN`	UART receive buffer overrun
Break	`0x08`	`TTY_BREAK`	Break condition detected

When error flags are present, RxData payload uses the Linux PARMRK-style encoding: 0xFF, error_flag, character. The bridge thread injects these into the PTY master with the appropriate TTY flags so the line discipline can deliver PARMRK-encoded errors to applications that have PARMRK set in termios, or replace erroneous characters with \0 for applications that have IGNPAR clear and PARMRK clear.
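
As a rough sketch of the provider-side encoding, the function below appends one received character to an RxData payload using the flag values from the table. `LineError` and `encode_rx_char` are hypothetical names. How a literal 0xFF data byte is escaped is not specified in this section, so the sketch passes it through unchanged.

```rust
/// Error flag values from the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum LineError {
    None = 0x00,
    Parity = 0x01,
    Framing = 0x02,
    Overrun = 0x04,
    Break = 0x08,
}

/// Append one received character to an RxData payload.
/// Clean characters are copied as-is; flagged characters become the
/// three-byte sequence 0xFF, error_flag, character.
pub fn encode_rx_char(out: &mut Vec<u8>, ch: u8, err: LineError) {
    if err == LineError::None {
        out.push(ch);
    } else {
        out.extend_from_slice(&[0xFF, err as u8, ch]);
    }
}
```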

Device naming: The client creates a symlink /dev/ttyRemote{N} pointing to the PTY slave /dev/pts/{M}. The symlink is created via a sysfs device registration under /sys/class/tty/ttyRemote{N}/ with attributes:

peer: peer ID of the provider node
port: provider-side port name (from SerialPortProperties.port_name)
speed: current baud rate

Discovery: ls /sys/class/tty/ttyRemote*/ lists all remote serial ports. Udev rules can create additional symlinks (e.g., /dev/serial/by-peer/).

Line discipline: The line discipline (N_TTY, N_SLIP, N_HDLC, etc.) always runs on the client side, in the PTY slave's processing path. The provider always sends and receives raw bytes -- it never interprets line editing, signal generation, or protocol framing. This avoids split-brain where both sides attempt line discipline processing, and ensures that stty settings on the client are authoritative.

Reconnection: If the provider disconnects (peer failure, ServiceDrainNotify, or RDMA link error), the bridge thread enters a reconnection loop:

The PTY stays open — applications don't see immediate errors. Reads block, writes buffer locally (bounded: 4 KB, matching typical serial buffer size).
The bridge thread attempts to re-bind to the same service on the same peer (or alternative_peer if specified in ServiceDrainNotify).
On successful reconnect: re-send the last SetConfig from config_shadow to restore serial parameters, then drain the write buffer to the provider.
After reconnect_timeout_sec seconds (default: 30, configurable via sysfs at /sys/class/tty/ttyRemote{N}/reconnect_timeout) without successful reconnection: complete all pending reads with -EIO, discard write buffer. The PTY remains open but all subsequent operations return -EIO until a new provider connection is established. The configurable range is 5-300 seconds; values outside this range are clamped.
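On provider return (peer re-joins cluster with same serial service): automatic rebind. Applications do not need to reopen the device: if the rebind happens before the timeout they see no error at all, otherwise -EIO stops once the rebind completes.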
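
The reconnection behaviour can be modelled as a small state machine. The sketch below is illustrative only: `RemotePort`, `LinkState`, and `tick` are hypothetical names, time is advanced manually, and accepting a short write when the 4 KB buffer is full is an assumption (the text only states that the buffer is bounded).

```rust
use std::collections::VecDeque;

pub const WRITE_BUF_CAP: usize = 4096; // bounded local write buffer
pub const EIO: i32 = 5;

/// Clamp the sysfs-configured timeout to the documented 5-300 s range.
pub fn clamp_reconnect_timeout(secs: u32) -> u32 {
    secs.clamp(5, 300)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkState {
    Connected,
    Reconnecting { elapsed_sec: u32 },
    Failed,
}

pub struct RemotePort {
    pub state: LinkState,
    pub timeout_sec: u32,
    pub pending: VecDeque<u8>,
}

impl RemotePort {
    pub fn new(timeout_sec: u32) -> Self {
        RemotePort {
            state: LinkState::Connected,
            timeout_sec: clamp_reconnect_timeout(timeout_sec),
            pending: VecDeque::new(),
        }
    }

    pub fn on_disconnect(&mut self) {
        self.state = LinkState::Reconnecting { elapsed_sec: 0 };
    }

    /// Returns the number of bytes accepted, or -EIO after the timeout.
    pub fn write(&mut self, data: &[u8]) -> Result<usize, i32> {
        match self.state {
            LinkState::Connected => Ok(data.len()), // would go to the provider
            LinkState::Failed => Err(-EIO),
            LinkState::Reconnecting { .. } => {
                let room = WRITE_BUF_CAP - self.pending.len();
                let n = room.min(data.len());
                self.pending.extend(&data[..n]);
                Ok(n)
            }
        }
    }

    /// Advance the reconnect timer; on expiry discard the write buffer.
    pub fn tick(&mut self, secs: u32) {
        if let LinkState::Reconnecting { elapsed_sec } = self.state {
            let elapsed = elapsed_sec.saturating_add(secs);
            if elapsed >= self.timeout_sec {
                self.pending.clear();
                self.state = LinkState::Failed;
            } else {
                self.state = LinkState::Reconnecting { elapsed_sec: elapsed };
            }
        }
    }

    /// Rebind succeeded: return the buffered bytes to drain to the provider.
    pub fn on_reconnect(&mut self) -> Vec<u8> {
        self.state = LinkState::Connected;
        self.pending.drain(..).collect()
    }
}
```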

Window size (TIOCGWINSZ/TIOCSWINSZ): Not forwarded to the provider. Serial ports have no concept of terminal window dimensions — window size is a property of the PTY slave, managed entirely on the client side by terminal emulators. This is correct behavior: the provider deals with a physical UART, not a terminal.

21.2 Console Framework and Kernel Logging¶

Tier assignment: The console framework (log ring buffer, backend dispatch, console= parsing) runs as Tier 0 Evolvable — in the Core domain but live-replaceable via EvolvableComponent. It must be callable from any kernel context (interrupt, NMI, panic) without domain crossings. Console backends are Tier 1 Evolvable (serial driver, netconsole) except for the emergency serial output which is Tier 0 static (non-evolvable, panic-safe, already exists in arch::current::serial).

KABI interface name: console_backend_v1 (in interfaces/console_backend.kabi).

21.2.1 Kernel Log Ring Buffer¶

The kernel log ring buffer (klog) is the central store for all kernel diagnostic messages. It replaces Linux's printk ring buffer with a lock-free, NMI-safe, multi-producer design. All kernel subsystems write here; console backends read from here.

21.2.1.1 Log Levels¶

// umka-core/src/klog/mod.rs

/// Kernel log levels. Numerically compatible with Linux syslog(2) severity.
#[repr(u8)]
pub enum KlogLevel {
    /// System is unusable (panic imminent).
    Emerg   = 0,
    /// Action must be taken immediately.
    Alert   = 1,
    /// Critical conditions (hardware failure, driver crash).
    Crit    = 2,
    /// Error conditions (recoverable failures).
    Err     = 3,
    /// Warning conditions (degraded operation).
    Warning = 4,
    /// Normal but significant events (driver loaded, device detected).
    Notice  = 5,
    /// Informational messages (boot progress, configuration).
    Info    = 6,
    /// Debug-level messages (disabled by default in production).
    Debug   = 7,
}

21.2.1.2 Log Entry Format¶

Each log entry is a descriptor (fixed-size metadata) plus variable-length text stored in a separate data ring. This two-ring design avoids wasting space on short messages and supports messages up to 1024 bytes without fragmentation.

/// Descriptor ring entry — fixed 64 bytes, cache-line aligned.
/// Writers claim a slot by atomic fetch_add on the global sequence counter,
/// then fill the descriptor and mark it committed. Readers skip uncommitted slots.
#[repr(C, align(64))]
pub struct KlogDescriptor {
    /// Monotonically increasing sequence number. Assigned by atomic
    /// fetch_add on `KLOG_RING.next_seq`. Never wraps within 50-year
    /// lifetime (u64 at 10M messages/sec = 58,000 years).
    pub seq: u64,

    /// Timestamp in nanoseconds since boot. Source:
    /// `arch::current::cpu::read_timestamp_ns()`. In NMI context, this
    /// may use a less precise source (TSC without interpolation).
    pub timestamp_ns: u64,

    /// Offset into the data ring where message text begins.
    pub data_offset: u32,

    /// Length of the message text in bytes (0..=1024).
    pub text_len: u16,

    /// Length of the subsystem prefix within text (e.g., 3 for "net").
    /// Text format: "{subsystem}: {message}". If subsystem_len == 0,
    /// no prefix is present.
    pub subsystem_len: u8,

    /// Log level (KlogLevel).
    pub level: u8,

    /// Syslog facility (0=kern, always 0 for kernel messages).
    /// Stored for syslog(2) / /dev/kmsg compatibility.
    pub facility: u8,

    /// Flags.
    pub flags: KlogFlags,

    /// CPU that generated this message.
    pub cpu: u16,

    /// PID of the logging task. 0 for interrupt/NMI/idle context.
    pub pid: u32,

    /// Descriptor state. Writers set to COMMITTED after filling all
    /// fields. Readers skip entries that are not COMMITTED.
    /// On wrap, the reclaimer sets old entries to FREE.
    pub state: AtomicU8,

    /// Padding to 64 bytes. Fields end at offset 33; align(64) requires
    /// 31 bytes of explicit padding to fill the cache line.
    _pad: [u8; 31],
}
const_assert!(core::mem::size_of::<KlogDescriptor>() == 64);

bitflags! {
    /// Per-entry flags.
    pub struct KlogFlags: u8 {
        /// Continuation of the previous message (no newline between).
        const CONT    = 1 << 0;
        /// Message includes a trailing newline.
        const NEWLINE = 1 << 1;
        /// Written from NMI context (may have imprecise timestamp).
        const NMI     = 1 << 2;
        /// Written during panic (after panic path entered).
        const PANIC   = 1 << 3;
    }
}

/// Descriptor states.
#[repr(u8)]
pub enum KlogDescState {
    /// Slot is free (available for writers).
    Free      = 0,
    /// Slot is being written (writer claimed it but hasn't finished).
    Reserved  = 1,
    /// Slot is committed and readable.
    Committed = 2,
}
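
The layout arithmetic in the padding comment (fields end at offset 33, 31 bytes of padding) can be checked with a standalone mirror of the struct. `KlogDescriptorLayout` is a hypothetical copy for illustration in which `flags` is a plain `u8`, since `KlogFlags` is a `u8`-backed bitflags type.

```rust
use std::sync::atomic::AtomicU8;

/// Field-for-field mirror of KlogDescriptor, used only to check offsets.
#[repr(C, align(64))]
pub struct KlogDescriptorLayout {
    pub seq: u64,          // offset 0
    pub timestamp_ns: u64, // offset 8
    pub data_offset: u32,  // offset 16
    pub text_len: u16,     // offset 20
    pub subsystem_len: u8, // offset 22
    pub level: u8,         // offset 23
    pub facility: u8,      // offset 24
    pub flags: u8,         // offset 25
    pub cpu: u16,          // offset 26
    pub pid: u32,          // offset 28
    pub state: AtomicU8,   // offset 32, so fields end at 33
    pub _pad: [u8; 31],    // 33 + 31 = 64
}

/// Byte offset of `state` inside the descriptor, measured at runtime.
pub fn state_offset() -> usize {
    let d = KlogDescriptorLayout {
        seq: 0,
        timestamp_ns: 0,
        data_offset: 0,
        text_len: 0,
        subsystem_len: 0,
        level: 0,
        facility: 0,
        flags: 0,
        cpu: 0,
        pid: 0,
        state: AtomicU8::new(0),
        _pad: [0; 31],
    };
    let base = &d as *const KlogDescriptorLayout as usize;
    let field = &d.state as *const AtomicU8 as usize;
    field - base
}
```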

21.2.1.3 Ring Buffer Structure¶

/// The kernel log ring buffer. Two-ring design: a descriptor ring (fixed-size
/// entries) and a data ring (variable-length message text). The descriptor ring
/// is indexed by `seq % KLOG_DESC_COUNT`. The data ring is a byte-level
/// circular buffer with offsets stored in descriptors.
///
/// Concurrency model:
/// - **Writers** (any CPU, any context including NMI): claim a sequence number
///   via `AtomicU64::fetch_add(1, Relaxed)` on `next_seq`, write descriptor +
///   data, mark descriptor as COMMITTED.
/// - **Readers** (console backends, /dev/kmsg, pstore): track their own
///   `read_seq` and iterate forward, skipping FREE/RESERVED slots.
/// - **Reclaimer**: when the descriptor ring is full, the writer whose
///   `fetch_add` returns a seq that would overwrite a COMMITTED slot must
///   first mark that slot (and its data range) as FREE. Oldest messages
///   are silently lost (ring semantics).
///
/// NMI safety: no locks anywhere. Writers claim a sequence number with a
/// single `fetch_add` on `next_seq`, so there is no CAS retry loop to spin
/// in. Data ring writes use the descriptor's `data_offset` + `text_len`
/// to claim a contiguous region (computed from the data ring's own atomic
/// write cursor). Worst case under NMI preemption: a RESERVED descriptor is
/// never committed; readers skip it, and it is eventually reclaimed.

/// Descriptor ring capacity. Power of 2 for fast modular indexing.
/// 4096 entries × 64 bytes = 256 KB descriptor ring.
const KLOG_DESC_COUNT: usize = 4096;

/// Data ring capacity. Sized for ~4096 average-length messages.
/// 256 KB data ring. Total klog memory: 512 KB (256 KB desc + 256 KB data).
const KLOG_DATA_SIZE: usize = 256 * 1024;

/// Maximum message text length. Messages longer than this are truncated.
const KLOG_MAX_TEXT: usize = 1024;

pub struct KlogRing {
    /// Descriptor ring (fixed-size, indexed by seq % KLOG_DESC_COUNT).
    pub descs: [KlogDescriptor; KLOG_DESC_COUNT],

    /// Data ring (circular byte buffer for variable-length message text).
    pub data: [u8; KLOG_DATA_SIZE],

    /// Next sequence number to assign. Writers fetch_add(1) to claim.
    pub next_seq: AtomicU64,

    /// Next write offset in the data ring. Writers fetch_add(text_len)
    /// to claim a contiguous region. Wraps modulo KLOG_DATA_SIZE.
    pub data_write_pos: AtomicU32,

    /// Console sequence: the lowest seq that has not yet been delivered to
    /// all console backends (every earlier entry has been). Used by the
    /// console dispatcher to know where to start reading after a new
    /// backend registers.
    pub console_seq: AtomicU64,

    /// Current default log level for console output (messages with level
    /// > console_loglevel are not dispatched to console backends, but
    /// are still stored in the ring for /dev/kmsg readers).
    pub console_loglevel: AtomicU8,
}

/// Global klog ring. Allocated from slab at Phase 1.3 (post-slab-init).
/// Before that, all logging goes to the early log ring
/// ([Section 2.3](02-boot-hardware.md#boot-init-cross-arch--early-boot-log-ring)).
pub static KLOG_RING: OnceCell<&'static KlogRing> = OnceCell::new();
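
The slot-claim and state-transition logic can be illustrated with a much smaller ring. This is a simplified sketch under stated assumptions: `MiniRing` and `Slot` are hypothetical, there are four slots instead of 4096, a single `u64` payload stands in for the data ring, and the reader does not re-validate against a concurrent overwrite. It shows the index arithmetic and the FREE / RESERVED / COMMITTED transitions only.

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};

const FREE: u8 = 0;
const RESERVED: u8 = 1;
const COMMITTED: u8 = 2;

/// Tiny capacity for the example; the real descriptor ring has 4096 slots.
pub const SLOTS: usize = 4;

pub struct Slot {
    state: AtomicU8,
    seq: AtomicU64,
    payload: AtomicU64, // stand-in for data_offset/text_len + message text
}

pub struct MiniRing {
    slots: [Slot; SLOTS],
    next_seq: AtomicU64,
}

impl MiniRing {
    pub fn new() -> Self {
        MiniRing {
            slots: std::array::from_fn(|_| Slot {
                state: AtomicU8::new(FREE),
                seq: AtomicU64::new(0),
                payload: AtomicU64::new(0),
            }),
            next_seq: AtomicU64::new(0),
        }
    }

    /// Claim a sequence number, fill the slot, and commit it.
    pub fn write(&self, payload: u64) -> u64 {
        let seq = self.next_seq.fetch_add(1, Ordering::Relaxed);
        let slot = &self.slots[(seq as usize) % SLOTS];
        // Reclaim: an older committed entry in this slot is dropped.
        if slot.state.load(Ordering::Acquire) == COMMITTED {
            slot.state.store(FREE, Ordering::Release);
        }
        slot.state.store(RESERVED, Ordering::Release);
        slot.seq.store(seq, Ordering::Relaxed);
        slot.payload.store(payload, Ordering::Relaxed);
        slot.state.store(COMMITTED, Ordering::Release);
        seq
    }

    /// Read entry `seq` if it is committed and has not been overwritten.
    pub fn read(&self, seq: u64) -> Option<u64> {
        let slot = &self.slots[(seq as usize) % SLOTS];
        if slot.state.load(Ordering::Acquire) != COMMITTED {
            return None;
        }
        if slot.seq.load(Ordering::Relaxed) != seq {
            return None; // slot now holds a different (newer) entry
        }
        Some(slot.payload.load(Ordering::Relaxed))
    }
}
```

With four slots, writing six entries overwrites sequence numbers 0 and 1, which is the "oldest messages are silently lost" behaviour described in the concurrency model.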

21.2.1.4 Early Boot Ring Transition¶

Before slab init (Phases 0.x–1.2), the early log ring (Section 2.3) stores boot diagnostics as raw text in a 64 KB BSS buffer. At Phase 1.3 (post-slab-init):

Allocate KlogRing from slab (512 KB: 256 KB descriptors + 256 KB data).
Replay all early log entries into KlogRing as KlogLevel::Info with reconstructed timestamps (boot-relative offsets from total_written).
Set KLOG_RING via OnceCell::set().
Redirect early_log() to call klog() (the flag set by early_log_replay() already handles this — see Section 2.3).
The early log ring BSS memory can be reclaimed after replay.

21.2.1.5 Writer Interface¶

/// Write a message to the kernel log ring buffer.
///
/// Safe to call from any context: process, softirq, hardirq, NMI.
/// Messages longer than KLOG_MAX_TEXT (1024 bytes) are truncated.
///
/// This is the `printk` equivalent. All kernel subsystems call this.
pub fn klog(level: KlogLevel, subsystem: &str, msg: &str);

/// Formatted variant (format string + args, no heap allocation).
/// Uses a per-CPU scratch buffer (1024 bytes) for formatting.
/// In NMI context, uses a separate NMI scratch buffer to avoid
/// corrupting the interrupted CPU's buffer.
pub fn klog_fmt(level: KlogLevel, subsystem: &str, fmt: core::fmt::Arguments<'_>);
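
One way to format without heap allocation is a fixed-size buffer that implements `core::fmt::Write` and truncates at the message limit. The sketch below is an assumption about how such a scratch buffer could look; `ScratchBuf` and `format_message` are hypothetical names, and truncation is done on a byte boundary, so a multi-byte UTF-8 character at the cut may be split.

```rust
use core::fmt::{self, Write};

pub const KLOG_MAX_TEXT: usize = 1024;

/// Fixed-capacity formatting target; no heap allocation.
pub struct ScratchBuf {
    buf: [u8; KLOG_MAX_TEXT],
    len: usize,
}

impl ScratchBuf {
    pub fn new() -> Self {
        ScratchBuf { buf: [0; KLOG_MAX_TEXT], len: 0 }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}

impl Write for ScratchBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Copy what fits and silently drop the rest (truncation, not error).
        let room = KLOG_MAX_TEXT - self.len;
        let n = room.min(s.len());
        self.buf[self.len..self.len + n].copy_from_slice(&s.as_bytes()[..n]);
        self.len += n;
        Ok(())
    }
}

/// Build "{subsystem}: {message}" into a scratch buffer.
pub fn format_message(subsystem: &str, args: fmt::Arguments<'_>) -> ScratchBuf {
    let mut out = ScratchBuf::new();
    if !subsystem.is_empty() {
        let _ = out.write_str(subsystem);
        let _ = out.write_str(": ");
    }
    let _ = out.write_fmt(args);
    out
}
```

A caller would pass `format_args!("link up, {} Mbps", speed)` as the last argument, mirroring the `klog_fmt` signature.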

21.2.1.6 Reader Interface¶

/// A klog reader tracks its position in the ring via `read_seq`.
/// Multiple independent readers can exist (console dispatcher,
/// /dev/kmsg file descriptors, pstore dumper).
pub struct KlogReader {
    /// Next sequence number to read. Initialized to `KLOG_RING.console_seq`
    /// for new readers (skip already-delivered messages) or to 0 for
    /// /dev/kmsg readers that seek to offset 0 with `SEEK_SET` (read full ring).
    pub read_seq: u64,
}

impl KlogReader {
    /// Read the next committed entry. Returns `None` if no new entries.
    /// Skips FREE and RESERVED descriptors (treats them as gaps).
    /// If the reader has fallen behind and entries were overwritten,
    /// advances `read_seq` to the oldest available entry and sets
    /// `KlogReadResult::gap` to the number of lost messages.
    pub fn next(&mut self) -> Option<KlogReadResult>;
}

pub struct KlogReadResult {
    /// The descriptor (metadata).
    pub desc: KlogDescriptor,
    /// The message text (copied from data ring).
    pub text: ArrayVec<u8, KLOG_MAX_TEXT>,
    /// Number of messages lost due to ring wrap since last read.
    /// 0 in normal operation.
    pub gap: u64,
}
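
The gap computation for a reader that has fallen behind can be sketched as plain arithmetic. `resync` is a hypothetical helper, and it considers only the descriptor ring capacity; it assumes the data ring has not evicted text earlier than the descriptor ring did.

```rust
/// Given a reader position, the ring's next sequence number, and the
/// descriptor ring capacity, return (new_read_seq, gap).
pub fn resync(read_seq: u64, next_seq: u64, capacity: u64) -> (u64, u64) {
    // Oldest sequence number that can still be present in the ring.
    let oldest = next_seq.saturating_sub(capacity);
    if read_seq < oldest {
        (oldest, oldest - read_seq) // skipped entries are reported as the gap
    } else {
        (read_seq, 0)
    }
}
```

For example, a reader at sequence 10 with `next_seq` = 5000 and a 4096-entry ring resumes at 904 and reports a gap of 894.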

21.2.1.7 /dev/kmsg Interface¶

The kernel log ring is exposed to userspace as /dev/kmsg (major 1, minor 11), compatible with Linux's /dev/kmsg format:

read(): Returns the next log entry in the format: <priority>,<seq>,<timestamp_us>,<flags>;<text>\n where priority = facility * 8 + level, matching syslog(2).
write(): Injects a user-supplied message at KlogLevel::Info (or level parsed from <N> prefix). Used by logger(1) and systemd-journald.
poll(): POLLIN when new entries are available after the reader's read_seq.
lseek(fd, 0, SEEK_SET): Reset reader to oldest available entry.
lseek(fd, 0, SEEK_END): Move reader past the newest entry (skip history).
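
From the consumer side, a record returned by read() can be split back into its fields. The parser below is a userspace-style sketch; `KmsgRecord` and `parse_kmsg` are hypothetical names, and the flags field is returned as an uninterpreted string because its encoding is not defined in this section.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct KmsgRecord<'a> {
    pub facility: u32,
    pub level: u8,
    pub seq: u64,
    pub timestamp_us: u64,
    pub flags: &'a str,
    pub text: &'a str,
}

/// Parse one "<priority>,<seq>,<timestamp_us>,<flags>;<text>\n" record.
pub fn parse_kmsg(line: &str) -> Option<KmsgRecord<'_>> {
    let (header, text) = line.split_once(';')?;
    let mut fields = header.split(',');
    let priority: u32 = fields.next()?.parse().ok()?;
    let seq: u64 = fields.next()?.parse().ok()?;
    let timestamp_us: u64 = fields.next()?.parse().ok()?;
    let flags = fields.next()?;
    Some(KmsgRecord {
        // priority = facility * 8 + level, so invert with / and %.
        facility: priority / 8,
        level: (priority % 8) as u8,
        seq,
        timestamp_us,
        flags,
        text: text.trim_end_matches('\n'),
    })
}
```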

21.2.1.8 syslog(2) Syscall Compatibility¶

The syslog(2) syscall (not to be confused with the C library's syslog(3)) provides Linux-compatible access to the log ring:

Command	Description
`SYSLOG_ACTION_READ` (2)	Read from ring, blocking. Requires `CAP_SYSLOG`.
`SYSLOG_ACTION_READ_ALL` (3)	Read entire ring (non-destructive).
`SYSLOG_ACTION_READ_CLEAR` (4)	Read and clear ring.
`SYSLOG_ACTION_CLEAR` (5)	Clear ring (advance console_seq).
`SYSLOG_ACTION_CONSOLE_OFF` (6)	Disable console output.
`SYSLOG_ACTION_CONSOLE_ON` (7)	Enable console output.
`SYSLOG_ACTION_CONSOLE_LEVEL` (8)	Set console_loglevel.
`SYSLOG_ACTION_SIZE_UNREAD` (9)	Return bytes available.
`SYSLOG_ACTION_SIZE_BUFFER` (10)	Return total ring buffer size.

Commands that read or clear require CAP_SYSLOG (Linux capability bit 34).

21.2.2 Console Framework¶

The console framework dispatches log messages from the klog ring buffer to registered console backends. It is the kernel's fan-out mechanism: a single log message is delivered to every active backend (serial console, VGA text, netconsole, etc.).

Evolvable: The console framework implements EvolvableComponent. Its state is the backend list, console_loglevel, and per-backend read positions. Live evolution swaps the dispatch logic; backends are not disturbed. The framework has no hot-path callers (log dispatch is warm-path: bounded by I/O throughput, not CPU), so the EvolvableComponent overhead is acceptable.

21.2.2.1 ConsoleBackend Trait¶

// umka-core/src/console/mod.rs — console backend contract

/// A console backend receives formatted log messages from the klog ring
/// and outputs them to a specific device (serial port, network, VGA).
///
/// Backends register via `console_register()` and are called by the
/// console dispatcher thread. Multiple backends can be active
/// simultaneously (fan-out).
pub trait ConsoleBackend: Send + Sync {
    /// Write a log message to this console backend. Called from the
    /// console dispatcher thread (process context, preemptible).
    ///
    /// `text` is the formatted message including subsystem prefix and
    /// newline. The backend must not assume any particular encoding
    /// (UTF-8 text is typical but not guaranteed for binary dmesg).
    ///
    /// Returns `Ok(())` on success, `Err(ConsoleError)` on failure.
    /// Persistent failures cause the framework to deregister the backend
    /// after `CONSOLE_MAX_ERRORS` (16) consecutive errors.
    fn write(&self, text: &[u8], meta: &KlogDescriptor) -> Result<(), ConsoleError>;

    /// Emergency write — called during panic with IRQs disabled, possibly
    /// from NMI context. Must be lock-free and allocation-free.
    ///
    /// Backends that cannot safely write in panic context should return
    /// `Err(ConsoleError::NotAvailable)` immediately. The framework will
    /// continue to the next backend in the priority chain.
    ///
    /// Default implementation returns `NotAvailable`.
    fn emergency_write(&self, text: &[u8]) -> Result<(), ConsoleError> {
        Err(ConsoleError::NotAvailable)
    }

    /// Return this backend's priority. Lower values = higher priority.
    /// Used for ordering during panic fallback (try high-priority backends
    /// first). Standard priorities:
    /// - 0–9: Emergency/Tier 0 backends (serial, VGA)
    /// - 10–19: Tier 1 backends (serial driver, netconsole)
    /// - 20–29: Tier 2 backends (userspace log aggregators)
    fn priority(&self) -> u8;

    /// Human-readable name for this backend (e.g., "ttyS0", "netcon0").
    fn name(&self) -> &str;

    /// Optional: backend-specific setup invoked when `console=` parameters
    /// are parsed. `options` is the part after the device name and comma
    /// (e.g., "115200n8" for `console=ttyS0,115200n8`).
    ///
    /// Default implementation ignores options.
    fn setup(&self, _options: &str) -> Result<(), ConsoleError> {
        Ok(())
    }
}

pub enum ConsoleError {
    /// Backend cannot write (hardware not ready, network down, etc.).
    NotAvailable,
    /// Transient I/O error (retry may succeed).
    IoError,
    /// Backend is permanently failed (deregister it).
    Failed,
}

21.2.2.2 Backend Registration¶

/// Maximum number of simultaneously active console backends.
/// Matches Linux's MAX_CMDLINECONSOLES (8).
const CONSOLE_MAX_BACKENDS: usize = 8;

/// Register a console backend. The backend is appended to the active
/// list and begins receiving log messages from the current klog position.
///
/// If `CONSOLE_MAX_BACKENDS` are already registered, returns
/// `Err(ConsoleError::Failed)`.
///
/// Called from driver init context (warm path, may allocate).
pub fn console_register(
    backend: &'static dyn ConsoleBackend,
) -> Result<(), ConsoleError>;

/// Deregister a console backend. The backend stops receiving messages.
/// Called during driver unload or on persistent backend failure.
pub fn console_deregister(backend: &'static dyn ConsoleBackend);

21.2.2.3 Console Dispatcher Thread¶

The console framework runs a dedicated kernel thread (klogd) that reads from the klog ring buffer and dispatches messages to all registered backends:

/// Console dispatcher. Runs as a kernel thread started at Phase 2.8
/// (post-workqueue-init). Before this thread starts, log messages are
/// stored in the klog ring but not dispatched — they accumulate and are
/// delivered in a burst when the thread starts.
///
/// Priority: SCHED_OTHER, nice -5 (same as TTY workers). Elevated to
/// nice -15 if dispatch falls behind (> 256 undispatched entries).
fn klogd_main() -> ! {
    let mut reader = KlogReader::new_from_console_seq();
    loop {
        // Wait for new entries.
        KLOG_RING.wait_for_entries(&reader);

        // Dispatch all available entries to all backends.
        while let Some(entry) = reader.next() {
            // Skip entries less severe than console_loglevel (numerically higher level).
            if entry.desc.level > KLOG_RING.console_loglevel.load(Relaxed) {
                continue;
            }
            // Fan-out to all registered backends.
            for backend in console_backends() {
                let _ = backend.write(&entry.text, &entry.desc);
            }
        }
    }
}

During panic, the dispatcher thread is bypassed. The panic path calls emergency_write() directly on each backend (see §Panic Console Path below).
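
The dispatcher loop above discards write results, while the ConsoleBackend contract says a backend is deregistered after `CONSOLE_MAX_ERRORS` consecutive errors. One possible way to track that is sketched below. It is not the framework's code: `Backend`, `Registered`, and `fan_out` are hypothetical, and the trait is reduced to a single `write` method that takes `&mut self`.

```rust
pub const CONSOLE_MAX_ERRORS: u32 = 16;

/// Reduced stand-in for ConsoleBackend (illustration only).
pub trait Backend {
    fn write(&mut self, text: &[u8]) -> Result<(), ()>;
}

pub struct Registered {
    pub backend: Box<dyn Backend>,
    pub consecutive_errors: u32,
}

/// Deliver one message to every backend. A success resets the error
/// counter; a backend reaching CONSOLE_MAX_ERRORS consecutive failures is
/// removed. Returns the number of backends deregistered by this call.
pub fn fan_out(backends: &mut Vec<Registered>, text: &[u8]) -> usize {
    let before = backends.len();
    backends.retain_mut(|r| match r.backend.write(text) {
        Ok(()) => {
            r.consecutive_errors = 0;
            true
        }
        Err(()) => {
            r.consecutive_errors += 1;
            r.consecutive_errors < CONSOLE_MAX_ERRORS
        }
    });
    before - backends.len()
}

/// Toy backend that always fails.
pub struct Failing;

impl Backend for Failing {
    fn write(&mut self, _text: &[u8]) -> Result<(), ()> {
        Err(())
    }
}

/// Toy backend that counts delivered messages.
pub struct Counting {
    pub lines: usize,
}

impl Backend for Counting {
    fn write(&mut self, _text: &[u8]) -> Result<(), ()> {
        self.lines += 1;
        Ok(())
    }
}
```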

21.2.2.4 Log Level Filtering¶

Console output is filtered by console_loglevel (default: KlogLevel::Info = 6). Messages with level > console_loglevel are suppressed from console backends but remain in the klog ring for /dev/kmsg readers.

Controllable via:

Boot parameter: umka.loglevel=N (0–7)
syslog(2): SYSLOG_ACTION_CONSOLE_LEVEL
umkafs: /ukfs/kernel/console_loglevel (read-write)

21.2.3 Kernel Command Line Console Parameters¶

21.2.3.1 console= Syntax¶

The console= boot parameter selects which console backends are active and configures their hardware parameters. Syntax is Linux-compatible:

console=<device>[,<options>]

Multiple console= parameters can be specified; all named backends receive output. The last console= device becomes the primary console (/dev/console points to it), matching Linux behavior.

Supported device specifiers:

Device	Backend	Options Format	Example
`ttyS<N>`	Serial port N	`[baudrate][parity][bits][flow]`	`console=ttyS0,115200n8`
`ttyS<N>`	Serial port N	(no options = 115200,8N1)	`console=ttyS1`
`uart[8250],io,<addr>`	8250 UART at I/O port	`[,baudrate]`	`console=uart,io,0x3f8,115200`
`uart[8250],mmio,<addr>`	8250 UART at MMIO addr	`[,baudrate]`	`console=uart,mmio,0x09000000`
`hvc<N>`	Hypervisor console N	(none)	`console=hvc0`
`netcon<N>`	Netconsole target N	`@<src_ip>/<dev>,@<dst_ip>/<dst_mac>`	See §Netconsole
`null`	Discard output	(none)	`console=null`

Options parsing for ttyS:

baudrate: 300, 1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200,
          230400, 460800, 500000, 576000, 921600, 1000000, 1500000,
          2000000, 3000000, 4000000  (default: 115200)
parity:   n = none, o = odd, e = even  (default: n)
bits:     7 or 8  (default: 8)
flow:     r = RTS/CTS flow control  (default: none)

Example: console=ttyS1,9600e7r → serial port 1, 9600 baud, even parity, 7 data bits, RTS/CTS flow control.
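
A parser for this options string might look like the sketch below. The name `parse_serial_options` matches the helper called by the serial backend later in this chapter, but the signature, the `SerialOptions` type, and the error strings are assumptions. The fields are treated as strictly positional in the order baudrate, parity, bits, flow.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Parity {
    None,
    Odd,
    Even,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SerialOptions {
    pub baud: u32,
    pub parity: Parity,
    pub bits: u8,
    pub rtscts: bool,
}

/// Baud rates accepted by the options grammar above.
const BAUD_RATES: &[u32] = &[
    300, 1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200, 230400, 460800,
    500000, 576000, 921600, 1000000, 1500000, 2000000, 3000000, 4000000,
];

pub fn parse_serial_options(s: &str) -> Result<SerialOptions, &'static str> {
    // Defaults: 115200 baud, no parity, 8 data bits, no flow control.
    let mut opts = SerialOptions { baud: 115200, parity: Parity::None, bits: 8, rtscts: false };

    // Leading ASCII digits form the baud rate.
    let digits = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits > 0 {
        let baud: u32 = s[..digits].parse().map_err(|_| "baud rate out of range")?;
        if !BAUD_RATES.contains(&baud) {
            return Err("unsupported baud rate");
        }
        opts.baud = baud;
    }

    let mut rest = s[digits..].chars();
    if let Some(c) = rest.next() {
        opts.parity = match c {
            'n' => Parity::None,
            'o' => Parity::Odd,
            'e' => Parity::Even,
            _ => return Err("bad parity"),
        };
    }
    if let Some(c) = rest.next() {
        opts.bits = match c {
            '7' => 7,
            '8' => 8,
            _ => return Err("bad data bits"),
        };
    }
    if let Some(c) = rest.next() {
        if c != 'r' {
            return Err("bad flow control");
        }
        opts.rtscts = true;
    }
    if rest.next().is_some() {
        return Err("trailing characters");
    }
    Ok(opts)
}
```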

21.2.3.2 earlycon= Syntax¶

The earlycon= parameter configures the Tier 0 emergency serial console that operates before the full console framework is available. Unlike console=, earlycon= configures the arch::current::serial layer directly — it does not register a ConsoleBackend.

earlycon=<type>[,<baudrate>]

Type	Hardware	Platforms
`uart8250,io,<port>`	16550 UART at I/O port	x86-64
`uart8250,mmio,<addr>`	16550 UART at MMIO address	RISC-V, PPC32
`pl011,mmio,<addr>`	ARM PL011 UART	AArch64, ARMv7
`sbi`	SBI console calls	RISC-V
`opal`	OPAL firmware calls	PPC64LE
`sclp`	SCLP console	s390x

Without earlycon=, the Tier 0 serial uses platform defaults (COM1/0x3F8 on x86-64, DTB stdout-path on DT platforms). The earlycon= parameter overrides these defaults for non-standard hardware configurations.

21.2.3.3 Boot Parameter Registration¶

Console parameters are registered in the boot parameter registry (Section 20.9):

Parameter	Schema	Description
`console`	String (multi)	Console backend device + options
`earlycon`	String	Early console type + address
`umka.loglevel`	u8 (0–7)	Default console log level
`umka.log_buf_len`	Size	Klog ring effective capacity, up to compile-time max of 512K. To increase beyond the default, reconfigure `KLOG_DESC_COUNT` and `KLOG_DATA_SIZE` at compile time.

21.2.4 Serial Console Backend¶

The serial console backend connects the console framework to physical serial ports via the Tier 1 UART driver. It bridges the gap between the kernel's log ring and the hardware UART, handling baud rate configuration, port selection, and the Tier 0/Tier 1 transition during boot.

21.2.4.1 Architecture¶

┌─────────────────────────────────────────────────────┐
│  klog ring buffer (Tier 0 Evolvable)                │
│    ↓ klogd dispatcher thread                        │
│  ┌───────────────────────────────────────────┐      │
│  │ Console Framework (Tier 0 Evolvable)      │      │
│  │   fan-out to all registered backends      │      │
│  └──────┬────────────────┬───────────────────┘      │
│         │                │                          │
│  ┌──────▼──────┐  ┌──────▼──────────┐               │
│  │ Serial      │  │ Netconsole      │  (other       │
│  │ Console     │  │ Backend         │  backends)    │
│  │ Backend     │  │ (Tier 1)        │               │
│  │ (Tier 1)    │  └──────┬──────────┘               │
│  └──────┬──────┘         │                          │
│         │ KABI T1        │ UDP via umka-net          │
│  ┌──────▼──────┐         │                          │
│  │ UART Driver │  ┌──────▼──────────┐               │
│  │ (Tier 1)    │  │ NIC Driver      │               │
│  │ 16550/PL011 │  │ (Tier 1)        │               │
│  └─────────────┘  └─────────────────┘               │
│                                                     │
│  ┌─────────────────────────────────────────────┐    │
│  │ Emergency Serial (Tier 0 Static)            │    │
│  │ arch::current::serial::puts()               │    │
│  │ Panic-only fallback. No KABI, no isolation. │    │
│  └─────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────┘

21.2.4.2 Tier 1 Serial Console¶

The serial console backend is a thin adapter between the console framework and the Tier 1 UART driver exposed via the SerialTtyOps KABI (Section 21.1):

/// Serial console backend. Wraps a Tier 1 UART driver's KABI handle.
pub struct SerialConsoleBackend {
    /// KABI service handle to the Tier 1 UART driver.
    service: KabiServiceHandle,
    /// Which serial port this backend drives (0 = ttyS0, 1 = ttyS1, ...).
    port_index: u8,
    /// Human-readable port name: "ttyS0", "ttyS1", etc.
    /// Computed from `port_index` at construction: `format!("ttyS{}", port_index)`.
    port_name: ArrayString<8>,
    /// Configured baud rate (from console= parameter or default 115200).
    baud_rate: u32,
    /// Whether this is the primary console (/dev/console target).
    is_primary: bool,
}

impl ConsoleBackend for SerialConsoleBackend {
    fn write(&self, text: &[u8], _meta: &KlogDescriptor) -> Result<(), ConsoleError> {
        // KABI T1 call to UART driver's transmit function.
        // Domain switch to UART driver's isolation domain, write bytes
        // to hardware TX FIFO, domain switch back.
        self.service.call(SerialTtyOps::TX_DATA, text)
            .map_err(|_| ConsoleError::IoError)
    }

    fn emergency_write(&self, text: &[u8]) -> Result<(), ConsoleError> {
        // During panic: domain isolation is revoked (PKRU=0 / all
        // permissions). Call the UART driver's emergency path directly
        // as a T0 call (no ring buffer, no domain switch).
        // The driver's emergency_write must be lock-free and poll the
        // UART TX-ready bit directly.
        unsafe { self.service.emergency_call(SerialTtyOps::TX_DATA, text) }
            .map_err(|_| ConsoleError::IoError)
    }

    fn priority(&self) -> u8 { 10 }

    fn name(&self) -> &str {
        // Returns "ttyS0", "ttyS1", etc.
        // Name stored inline (ArrayString<8>).
        &self.port_name
    }

    fn setup(&self, options: &str) -> Result<(), ConsoleError> {
        // Parse "115200n8r" format and configure UART via KABI.
        let config = parse_serial_options(options)?;
        self.service.call(SerialTtyOps::SET_TERMIOS, &config)
            .map_err(|_| ConsoleError::IoError)
    }
}

21.2.4.3 Port Discovery¶

Serial port discovery uses the same mechanisms as the TTY subsystem:

ACPI platforms (x86-64, AArch64 servers): Serial ports enumerated from ACPI SPCR (Serial Port Console Redirection Table) and ACPI namespace \_SB device entries with _HID = PNP0501 (16550) or ARMH0011 (PL011).
DT platforms: Serial ports discovered from /serial@<addr> nodes or aliases (serial0, serial1, ...). The stdout-path property in /chosen identifies the default console port.
x86-64 legacy: COM1–COM4 at standard I/O ports (0x3F8, 0x2F8, 0x3E8, 0x2E8) are probed if no ACPI SPCR is present.

The port discovery order determines the ttyS<N> numbering: the device matching stdout-path (DT) or SPCR (ACPI) is always ttyS0.

21.2.4.4 Boot Transition: Tier 0 → Tier 1¶

During boot, serial console output transitions from the Tier 0 emergency serial to the Tier 1 UART driver:

Boot Phase	Serial Output Path	Notes
0.1–1.2	`arch::current::serial::puts()` (Tier 0 static)	Hardcoded port, 115200 8N1
1.3–2.7	`early_log()` → klog ring (stored, not dispatched)	klogd not yet running
2.8	klogd starts, reads klog ring, dispatches to emergency serial backend	Emergency serial registered as ConsoleBackend with priority 5
5.3	Tier 1 UART driver loads, registers SerialConsoleBackend	Priority 10; emergency serial backend remains as fallback
5.3+	klogd dispatches to Tier 1 serial backend	Full baud rate / port config applied
Panic	Framework calls `emergency_write()` on all backends → falls through to Tier 0 `arch::current::serial::puts()`	See §Panic Console Path

Emergency serial as ConsoleBackend: Between Phase 2.8 and Phase 5.3, the Tier 0 emergency serial is wrapped in a minimal ConsoleBackend adapter:

/// Tier 0 emergency serial wrapped as a ConsoleBackend.
/// Active from Phase 2.8 until a Tier 1 UART driver takes over.
/// Remains registered as a fallback even after Tier 1 registration.
struct EmergencySerialBackend;

impl ConsoleBackend for EmergencySerialBackend {
    fn write(&self, text: &[u8], _meta: &KlogDescriptor) -> Result<(), ConsoleError> {
        for &b in text {
            arch::current::serial::putb(b);
        }
        Ok(())
    }

    fn emergency_write(&self, text: &[u8]) -> Result<(), ConsoleError> {
        // Same as write() — already lock-free and allocation-free.
        self.write(text, &KlogDescriptor::ZERO)
    }

    fn priority(&self) -> u8 { 5 } // Higher priority than Tier 1 backends

    fn name(&self) -> &str { "earlycon" }
}

21.2.5 Netconsole¶

Netconsole sends kernel log messages over UDP to a remote log collector. It provides remote kernel debugging without physical serial access — critical for development on real hardware and for production monitoring of headless systems.

21.2.5.1 Design Constraints¶

Tier 1 Evolvable: netconsole lives in the umka-net Tier 1 domain. It is a network subsystem consumer, not a Core component.
Available only after Phase 5.3: requires the network stack (Phase 4.6), a NIC driver (Phase 5.3), and a configured IP address.
Not the primary console: netconsole supplements serial/VGA, it does not replace them. If the network is down, other backends continue working.
Panic path: uses pre-allocated resources and direct NIC access to transmit final messages when the kernel is dying (see §Panic Transmit Path).

21.2.5.2 Target Configuration¶

Each netconsole target is a remote UDP endpoint that receives kernel log messages. Up to 4 targets can be configured simultaneously.

/// Maximum number of simultaneous netconsole targets.
const NETCONSOLE_MAX_TARGETS: usize = 4;

/// A netconsole target: a remote host receiving kernel log messages via UDP.
pub struct NetconsoleTarget {
    /// Target name (for configfs identification, e.g., "target0").
    pub name: ArrayString<16>,

    /// Source IP address (0.0.0.0 = auto-select based on routing).
    pub src_ip: IpAddr,
    /// Source UDP port (default: 6665).
    pub src_port: u16,
    /// Network device name to use for transmission (e.g., "eth0").
    /// Empty string = auto-select based on routing.
    pub dev_name: ArrayString<16>,

    /// Destination IP address (required).
    pub dst_ip: IpAddr,
    /// Destination UDP port (default: 6666).
    pub dst_port: u16,
    /// Destination MAC address (required for same-subnet targets;
    /// ff:ff:ff:ff:ff:ff for broadcast; resolved via ARP for routed targets).
    pub dst_mac: [u8; 6],

    /// Whether this target is enabled (can be toggled at runtime).
    pub enabled: AtomicBool,

    /// Minimum log level to send to this target (default: KlogLevel::Info).
    /// Messages with level > this value are not sent.
    pub loglevel: AtomicU8,

    /// Extended message format (include metadata headers). Default: true.
    pub extended: bool,

    /// Pre-allocated panic transmit resources (see §Panic Transmit Path).
    pub panic_tx: Option<PanicTxResources>,
}

Boot parameter configuration:

netconsole=[+][src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]

The + prefix enables extended message format. Examples:

# Basic: send to 10.0.0.1 port 6666, auto-select source
netconsole=@/,@10.0.0.1/

# Extended format, from eth0, to specific MAC
netconsole=+@10.0.0.2/eth0,6666@10.0.0.1/aa:bb:cc:dd:ee:ff

# Multiple targets (multiple parameters)
netconsole=@/,@10.0.0.1/  netconsole=@/,@10.0.0.2/
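
The grammar above can be parsed with a handful of `split_once` calls. The following is a minimal hosted (std) sketch, not the kernel's parser: `ParsedTarget`, `parse_netconsole`, and the helper functions are hypothetical names introduced for illustration, and the default ports are the ones stated in the `NetconsoleTarget` field comments.

```rust
use std::net::IpAddr;

/// Hypothetical parsed form of one `netconsole=` value; a real parser
/// would populate a `NetconsoleTarget` instead.
#[derive(Debug, PartialEq)]
pub struct ParsedTarget {
    pub extended: bool,
    pub src_port: u16,
    pub src_ip: Option<IpAddr>,
    pub dev_name: Option<String>,
    pub dst_port: u16,
    pub dst_ip: IpAddr,
    pub dst_mac: Option<[u8; 6]>,
}

/// Empty port field means "use the default".
fn parse_port(s: &str, default: u16) -> Option<u16> {
    if s.is_empty() { Some(default) } else { s.parse().ok() }
}

/// Empty IP field means "auto-select"; a non-empty field must parse.
fn parse_opt_ip(s: &str) -> Option<Option<IpAddr>> {
    if s.is_empty() { Some(None) } else { s.parse::<IpAddr>().ok().map(Some) }
}

/// Parses exactly six colon-separated hex bytes.
fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let mut mac = [0u8; 6];
    let mut parts = s.split(':');
    for byte in mac.iter_mut() {
        *byte = u8::from_str_radix(parts.next()?, 16).ok()?;
    }
    if parts.next().is_some() { None } else { Some(mac) }
}

/// Parses the text after `netconsole=`. Returns `None` on malformed input
/// or when the required target IP is missing.
pub fn parse_netconsole(value: &str) -> Option<ParsedTarget> {
    let (extended, rest) = match value.strip_prefix('+') {
        Some(r) => (true, r),
        None => (false, value),
    };
    let (src, dst) = rest.split_once(',')?;
    let (src_port, src_rest) = src.split_once('@')?;
    let (src_ip, dev) = src_rest.split_once('/')?;
    let (dst_port, dst_rest) = dst.split_once('@')?;
    let (dst_ip, dst_mac) = dst_rest.split_once('/')?;
    Some(ParsedTarget {
        extended,
        src_port: parse_port(src_port, 6665)?,
        src_ip: parse_opt_ip(src_ip)?,
        dev_name: if dev.is_empty() { None } else { Some(dev.to_string()) },
        dst_port: parse_port(dst_port, 6666)?,
        dst_ip: dst_ip.parse::<IpAddr>().ok()?,
        dst_mac: if dst_mac.is_empty() { None } else { Some(parse_mac(dst_mac)?) },
    })
}
```

With this sketch, `parse_netconsole("@/,@10.0.0.1/")` yields a non-extended target with default ports and no source IP, device, or MAC, while a value with an empty target IP is rejected.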

21.2.5.3 configfs Runtime Configuration¶

Netconsole targets can be added, modified, and removed at runtime via configfs (mounted at /sys/kernel/config), under /sys/kernel/config/netconsole/:

/sys/kernel/config/netconsole/
├── target0/
│   ├── enabled          # 0 or 1
│   ├── dev_name         # "eth0"
│   ├── local_ip         # "10.0.0.2"
│   ├── local_port       # "6665"
│   ├── remote_ip        # "10.0.0.1"
│   ├── remote_port      # "6666"
│   ├── remote_mac       # "aa:bb:cc:dd:ee:ff"
│   ├── extended         # 0 or 1
│   └── loglevel         # 0-7
└── target1/
    └── ...

Creating a directory creates a target; removing the directory removes it. Writes to parameter files reconfigure the target atomically (the target is briefly disabled during reconfiguration, then re-enabled). The configfs interface matches Linux's netconsole configfs layout for tooling compatibility.

21.2.5.4 Message Format¶

Basic format (one UDP datagram per log message):

<priority>message text\n

Where priority = facility * 8 + level (syslog encoding). Example: <6>eth0: link up, 1000 Mbps\n

Extended format (enabled by + prefix or extended=1):

<level>,<seq>,<timestamp_us>,<flags>;message text\n
 SUBSYSTEM=<subsystem>\n
 CPU=<cpu>\n
 PID=<pid>\n

Extended format adds structured metadata as key=value continuation lines, matching Linux's extended netconsole format. This enables log aggregators (syslog-ng, rsyslog, Loki) to parse and index kernel messages without regex-based extraction.
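
As an illustration of the two layouts, the sketch below builds both strings on a hosted (std) target. `MsgMeta` is a hypothetical stand-in for the fields of `KlogDescriptor` that the formats need; the real descriptor is defined elsewhere and may differ. The sketch assumes the angle brackets in the extended-format template are placeholders (as in Linux's extended format) and that an empty flag set is written as `-`.

```rust
/// Hypothetical subset of per-message metadata used by the formatters.
pub struct MsgMeta<'a> {
    pub facility: u8,
    pub level: u8,
    pub seq: u64,
    pub timestamp_us: u64,
    pub flags: &'a str, // assumption: "-" when no flags are set
    pub subsystem: &'a str,
    pub cpu: u32,
    pub pid: u32,
}

/// Basic format: `<priority>message text\n`, priority = facility * 8 + level.
pub fn format_basic(text: &str, m: &MsgMeta) -> String {
    let priority = u32::from(m.facility) * 8 + u32::from(m.level);
    format!("<{}>{}\n", priority, text)
}

/// Extended format: header fields, `;`, message, then one
/// space-prefixed KEY=value continuation line per metadata item.
pub fn format_extended(text: &str, m: &MsgMeta) -> String {
    format!(
        "{},{},{},{};{}\n SUBSYSTEM={}\n CPU={}\n PID={}\n",
        m.level, m.seq, m.timestamp_us, m.flags, text, m.subsystem, m.cpu, m.pid
    )
}
```

A kernel implementation would write into a fixed buffer rather than allocate a `String`; the allocation here only keeps the sketch short.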

21.2.5.5 Normal Transmit Path¶

During steady-state operation, netconsole transmits via the standard network stack:

klogd thread
  → ConsoleBackend::write() on NetconsoleBackend
    → for each enabled target:
      → build UDP datagram (NetBuf) with message payload
      → udp_sendmsg() via umka-net (Tier 1 domain)
        → route_lookup() → ip_output() → NetDevice::dispatch_xmit()
          → KABI T1 ring to NIC driver → hardware TX

This is an ordinary UDP send through the full network stack. No special bypass is needed for normal operation. The transmit path inherits all standard networking features: routing, ARP resolution, VLAN tagging, checksum offload.

Rate limiting: Netconsole limits transmission to 1000 messages/second per target (token bucket, capacity 100, refill 1000/sec). Excess messages are silently dropped. This prevents a logging storm from saturating the network link. The rate limit is per-target and configurable via configfs.

impl ConsoleBackend for NetconsoleBackend {
    fn write(&self, text: &[u8], meta: &KlogDescriptor) -> Result<(), ConsoleError> {
        for target in &self.targets {
            if !target.enabled.load(Relaxed) {
                continue;
            }
            if meta.level > target.loglevel.load(Relaxed) {
                continue;
            }
            if !target.rate_limiter.try_acquire() {
                continue; // Rate limited, drop silently.
            }
            let payload = if target.extended {
                format_extended(text, meta)
            } else {
                format_basic(text, meta)
            };
            // UDP send via umka-net. Errors are silently ignored
            // (netconsole is best-effort).
            let _ = self.udp_send(&target, &payload);
        }
        Ok(())
    }

    fn priority(&self) -> u8 { 15 }

    fn name(&self) -> &str { "netcon" }
}
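
The `rate_limiter.try_acquire()` call above relies on a limiter that is not defined in this section (and is not listed among the `NetconsoleTarget` fields shown earlier). A token bucket with the stated parameters (capacity 100, refill 1000/sec) could look like the following sketch. The `TokenBucket` type is hypothetical; unlike the backend code it takes `&mut self` and an explicit timestamp so that it can be exercised outside the kernel, whereas a real implementation would presumably use atomics and the kernel clock.

```rust
/// Hypothetical token bucket: at most `capacity` messages in a burst,
/// refilled at a fixed rate.
pub struct TokenBucket {
    capacity: u64,
    ns_per_token: u64,
    tokens: u64,
    last_ns: u64,
}

impl TokenBucket {
    /// `refill_per_sec` must be non-zero. The bucket starts full.
    pub fn new(capacity: u64, refill_per_sec: u64, now_ns: u64) -> Self {
        assert!(refill_per_sec > 0);
        Self {
            capacity,
            ns_per_token: 1_000_000_000 / refill_per_sec,
            tokens: capacity,
            last_ns: now_ns,
        }
    }

    /// Returns true if one message may be sent at time `now_ns`.
    pub fn try_acquire(&mut self, now_ns: u64) -> bool {
        let elapsed = now_ns.saturating_sub(self.last_ns);
        let earned = elapsed / self.ns_per_token;
        if earned > 0 {
            self.tokens = (self.tokens + earned).min(self.capacity);
            // Advance only by the time actually converted into tokens,
            // so fractional progress toward the next token is kept.
            self.last_ns += earned * self.ns_per_token;
        }
        if self.tokens == 0 {
            return false; // caller drops the message
        }
        self.tokens -= 1;
        true
    }
}
```

With `TokenBucket::new(100, 1000, 0)`, a burst of 150 calls at the same instant admits 100 messages and drops 50; one further message is admitted for each millisecond that passes.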

21.2.5.6 Panic Transmit Path¶

During kernel panic, the normal network stack (umka-net) may be dead. The netconsole panic path bypasses the entire Tier 1 network stack and NIC driver isolation to transmit final messages directly via pre-allocated hardware resources.

Design: Each netconsole target pre-allocates a "panic TX slot" during normal operation. This slot contains everything needed to transmit one UDP datagram without any allocation, locking, or domain switching:

/// Pre-allocated resources for panic-time netconsole transmission.
/// Allocated during target setup (warm path). Used during panic (NMI-safe).
pub struct PanicTxResources {
    /// DMA-coherent buffer for the panic message. Pre-allocated, pre-mapped
    /// in the NIC's IOMMU domain. Contains a pre-built Ethernet + IP + UDP
    /// header; only the UDP payload and lengths need updating at panic time.
    pub tx_buf: CoherentDmaBuf,

    /// Pre-built Ethernet header (dst MAC, src MAC, EtherType 0x0800).
    pub eth_header: [u8; 14],

    /// Pre-built IPv4 header (src IP, dst IP, protocol=UDP).
    /// total_length and header checksum are updated at panic time; all
    /// other fields (including TTL) are fixed when the header is pre-built.
    /// **IPv4-only**: Panic-time netconsole uses only IPv4 (20-byte fixed header).
    /// IPv6 is not supported on the panic path because: (1) IPv6 headers are
    /// 40 bytes + variable extension headers, increasing complexity in NMI context;
    /// (2) IPv6 requires neighbor discovery which cannot run during panic; (3) most
    /// datacenter monitoring infrastructure supports IPv4. An IPv6 netconsole target
    /// configuration is rejected at setup time with -EAFNOSUPPORT.
    pub ip_header: [u8; 20],

    /// Pre-built UDP header (src port, dst port).
    /// Length and checksum are updated at panic time.
    pub udp_header: [u8; 8],

    /// Maximum payload size (MTU - headers). Panic messages longer than
    /// this are truncated (no fragmentation in panic path).
    pub max_payload: u16,

    /// NIC driver's panic transmit function. This is a raw function pointer
    /// (not a KABI vtable call) that directly programs the NIC hardware to
    /// transmit the pre-allocated DMA buffer. The function must:
    /// - Be lock-free and allocation-free
    /// - Not depend on NAPI, softirqs, or the network stack
    /// - Write a TX descriptor to the NIC's hardware TX ring
    /// - Poke the NIC's doorbell register
    /// - Optionally poll for TX completion (best-effort)
    ///
    /// The NIC driver registers this function during its init if it
    /// supports panic polling (not all drivers do).
    pub panic_xmit: Option<unsafe fn(buf_dma_addr: u64, len: u32)>,
}

Panic transmit procedure:

impl NetconsoleBackend {
    /// Called from the panic console path with IRQs disabled and all
    /// isolation domains revoked (PKRU=0 on x86-64).
    fn panic_transmit(&self, text: &[u8]) {
        for target in &self.targets {
            let Some(ref ptx) = target.panic_tx else { continue };
            let Some(panic_xmit) = ptx.panic_xmit else { continue };

            // 1. Copy pre-built headers + message payload into the
            //    pre-allocated DMA buffer. No allocation, just memcpy.
            let payload_len = text.len().min(ptx.max_payload as usize);
            let total_len = 14 + 20 + 8 + payload_len; // eth + ip + udp + payload

            unsafe {
                let buf = ptx.tx_buf.as_mut_ptr();
                // Ethernet header (pre-built, includes dst/src MAC).
                core::ptr::copy_nonoverlapping(
                    ptx.eth_header.as_ptr(), buf, 14,
                );
                // IPv4 header (update total_length + checksum).
                let mut ip = ptx.ip_header;
                ip[2..4].copy_from_slice(
                    &((20 + 8 + payload_len) as u16).to_be_bytes(),
                );
                update_ip_checksum(&mut ip);
                core::ptr::copy_nonoverlapping(ip.as_ptr(), buf.add(14), 20);
                // UDP header (update length, zero checksum — allowed for IPv4).
                let mut udp = ptx.udp_header;
                udp[4..6].copy_from_slice(
                    &((8 + payload_len) as u16).to_be_bytes(),
                );
                udp[6..8].copy_from_slice(&[0, 0]); // Checksum = 0 (optional in IPv4 UDP).
                core::ptr::copy_nonoverlapping(udp.as_ptr(), buf.add(34), 8);
                // Payload.
                core::ptr::copy_nonoverlapping(
                    text.as_ptr(), buf.add(42), payload_len,
                );

                // 2. Transmit via direct NIC hardware poke.
                //    Isolation domains are already revoked — this is a
                //    direct function call into the NIC driver's code.
                panic_xmit(ptx.tx_buf.dma_addr(), total_len as u32);
            }
        }
    }
}
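
`update_ip_checksum()` is called above but not shown. A minimal sketch, assuming it implements the standard IPv4 header checksum (one's-complement sum of the ten 16-bit header words with the checksum field treated as zero), is:

```rust
/// Recomputes the IPv4 header checksum in place for a 20-byte header
/// (no IP options). Allocation-free and lock-free, so it is usable on
/// the panic path.
pub fn update_ip_checksum(header: &mut [u8; 20]) {
    // The checksum field (bytes 10..12) counts as zero while summing.
    header[10] = 0;
    header[11] = 0;

    let mut sum: u32 = 0;
    for word in header.chunks_exact(2) {
        sum += u32::from(u16::from_be_bytes([word[0], word[1]]));
    }
    // Fold the carries back into the low 16 bits (one's-complement add).
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    let checksum = !(sum as u16);
    header[10..12].copy_from_slice(&checksum.to_be_bytes());
}
```

For the header `45 00 00 73 00 00 40 00 40 11 .. .. c0 a8 00 01 c0 a8 00 c7`, this produces the checksum bytes `b8 61`.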

NIC driver contract for panic polling:

NIC drivers that support panic transmit must implement and register a panic_xmit function with these constraints:

Lock-free: must not acquire any lock (spinlock, mutex, RCU).
Allocation-free: must not call slab, buddy, or any allocator.
No NAPI/softirq: must not schedule softirqs or NAPI.
Pre-reserved TX descriptor: the driver reserves one TX descriptor slot at init time exclusively for panic use. This slot is never used for normal traffic.
Direct hardware access: writes the TX descriptor and pokes the NIC's doorbell register directly (MMIO write).
Best-effort completion: optionally polls the TX completion status for up to 100 μs. If the NIC doesn't confirm transmission, the function returns anyway (panic path cannot block indefinitely).

The panic_xmit function is registered via the NetDeviceOps KABI extension:

/// Extension to NetDeviceOps for panic-capable NIC drivers.
/// Optional — drivers that don't support panic polling leave this as None.
pub trait NetDevicePanicOps {
    /// Register a panic transmit function and pre-allocate a TX slot.
    /// Called once during driver init. The returned registration carries
    /// the pre-mapped DMA buffer that panic_xmit will transmit from.
    fn register_panic_tx(&self) -> Option<PanicTxRegistration>;
}

pub struct PanicTxRegistration {
    /// DMA-coherent buffer for panic TX (pre-allocated, pre-mapped).
    pub buf: CoherentDmaBuf,
    /// Function pointer for lock-free panic transmit.
    pub panic_xmit: unsafe fn(buf_dma_addr: u64, len: u32),
}

Which NIC drivers support panic polling:

Driver	Panic TX	Notes
virtio-net	Yes	Single TX descriptor write + `VIRTIO_PCI_QUEUE_NOTIFY`
e1000/e1000e	Yes	Single TX descriptor write + tail pointer update
igb/ixgbe/ice	Yes	Single TX descriptor write + doorbell
mlx5 (ConnectX)	Best-effort	Requires WQE posting; may fail if WQ is corrupted
bnxt (Broadcom)	Yes	Single TX BD write + doorbell

Drivers that do not support panic TX simply don't register panic_xmit. The netconsole backend skips their targets during panic; the message is still attempted on the remaining targets, and the serial backends, which run earlier in the panic priority order, have already emitted it.

21.2.6 Panic Console Path¶

When the kernel panics, the normal klogd dispatcher thread stops. The panic handler takes over console output directly, bypassing all normal dispatch mechanisms. This path must work even when the scheduler is dead, locks are held, and Tier 1 drivers have crashed.

21.2.6.1 Procedure¶

panic() enters:
  1. Set PANIC flag (AtomicBool, globally visible).
  2. Stop all other CPUs (NMI IPI on x86, FIQ on AArch64).
  3. Revoke all isolation domains:
     - x86-64: WRPKRU(0) — all memory accessible.
     - AArch64 POE: MSR POR_EL0 with all-permission overlay.
     - ARMv7: MCR DACR with all-manager bits.
     - Other architectures: no action needed (no fast isolation).
  4. Write panic message to klog ring (KlogFlags::PANIC set).
  5. Call emergency_write() on each registered ConsoleBackend,
     in priority order (lowest priority number first):
     a. EmergencySerialBackend (priority 5) — direct UART poke.
     b. SerialConsoleBackend (priority 10) — calls UART driver's
        emergency path (domain already revoked, so T0 direct call).
     c. NetconsoleBackend (priority 15) — panic_transmit() via
        pre-allocated DMA resources and direct NIC hardware poke.
  6. Errors from any backend are silently ignored; next backend is tried.
  7. After all backends attempted:
     - Call pstore_kmsg_dump() to persist the log ring to non-volatile
       storage ([Section 20.7](20-observability.md#pstore-panic-log-persistence--panic-handler-integration)).
     - Execute panic action (halt, reboot, or kexec to crash kernel).

21.2.6.2 Domain Revocation During Panic¶

Revoking isolation domains during panic is safe because:

All other CPUs are stopped (NMI IPI / FIQ). No concurrent access.
The kernel is dying — isolation's purpose (crash containment) is moot.
Tier 1 driver code becomes directly callable as T0 (no ring buffer, no capability check, no domain switch overhead).
Pre-allocated resources (panic TX DMA buffers) are already mapped in the device's IOMMU domain — no IOMMU reprogramming needed.

The domain revocation is a single instruction per architecture:

Architecture	Instruction	Effect
x86-64	`WRPKRU(0)`	All 16 protection keys accessible
AArch64 POE	`MSR POR_EL0, all-RWX`	All permission overlays grant full access
ARMv7	`MCR p15, DACR, 0xFFFFFFFF`	All 16 domains set to Manager
PPC32	No action	Segment registers already kernel-mode
PPC64LE	No action	Radix PID already kernel
RISC-V/s390x/LoongArch64	No action	No fast isolation to revoke

21.2.6.3 Panic Output Deduplication¶

Both the serial emergency backend and the Tier 1 serial backend may target the same physical UART. To avoid duplicated output during panic:

The Tier 1 SerialConsoleBackend checks whether its port matches the emergency serial port. If so, emergency_write() returns ConsoleError::NotAvailable to let the higher-priority emergency backend handle it.
This check uses the port's base address (I/O port or MMIO address), which is known at registration time. No locking required.
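
The priority-ordered dispatch and the base-address deduplication check can be sketched together as follows. This is a hosted (std) model, not the kernel code: `PanicBackend`, `Uart`, and `panic_dispatch` are hypothetical names, the output `Vec` stands in for the UART, and a real panic path would iterate a list that is already sorted at registration time.

```rust
#[derive(Debug, PartialEq)]
pub enum ConsoleError {
    NotAvailable,
}

/// Hypothetical slice of the backend interface used during panic.
pub trait PanicBackend {
    fn priority(&self) -> u8;
    fn emergency_write(&mut self, text: &[u8]) -> Result<(), ConsoleError>;
}

/// Model of a serial backend. `emergency_base` is the base address of the
/// emergency serial port as known at registration time (None for the
/// emergency backend itself).
pub struct Uart {
    pub base: usize,
    pub prio: u8,
    pub emergency_base: Option<usize>,
    pub out: Vec<u8>,
}

impl PanicBackend for Uart {
    fn priority(&self) -> u8 {
        self.prio
    }

    fn emergency_write(&mut self, text: &[u8]) -> Result<(), ConsoleError> {
        // Deduplication: same physical port as the emergency backend,
        // so defer to it instead of printing the message twice.
        if self.emergency_base == Some(self.base) {
            return Err(ConsoleError::NotAvailable);
        }
        self.out.extend_from_slice(text);
        Ok(())
    }
}

/// Tries every backend, lowest priority number first. Errors are ignored;
/// the return value is the number of backends that accepted the message.
pub fn panic_dispatch(backends: &mut [&mut dyn PanicBackend], text: &[u8]) -> usize {
    backends.sort_unstable_by_key(|b| b.priority()); // in place, no allocation
    let mut delivered = 0;
    for backend in backends.iter_mut() {
        if backend.emergency_write(text).is_ok() {
            delivered += 1;
        }
    }
    delivered
}
```

If both backends are registered on the same base address (for example 0x3f8), only the priority-5 emergency backend emits the message and the priority-10 backend reports `NotAvailable`.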

21.2.7 Boot Phase Integration¶

Console-related initialization is woven into the existing boot phase ordering (Section 2.3):

Phase	Action	Component
0.1	`arch::current::serial::init()` — hardcoded UART init	Tier 0 static
0.15	`early_log_init()` — 64 KB BSS ring available	Early log ring
0.x–1.2	All output via `serial::puts()` + `early_log()`	Tier 0 static
1.3	Allocate KlogRing (512 KB), replay early log entries	Klog ring
2.8	Start `klogd` thread, register `EmergencySerialBackend`	Console framework
2.8	Parse `console=` and `earlycon=` from kernel command line	Console framework
4.6	`net_init()` — network stack available (but no NIC yet)	umka-net
5.3	Tier 1 UART driver loads → `SerialConsoleBackend` registered	Serial backend
5.3	Tier 1 NIC driver loads → `NetconsoleBackend` registered (if configured)	Netconsole
5.3+	Full console operation: klogd → fan-out to all backends	Steady state

21.2.7.1 Evolution¶

All console components (framework, serial backend, netconsole backend) are EvolvableComponent and can be live-replaced:

Console framework evolution: new dispatch logic swapped via AtomicPtr vtable swap. Backend list and klog ring (Nucleus-adjacent data) are preserved. Downtime: ~1 μs (stateless policy swap pattern).
Serial backend evolution: new UART driver binary loaded, bilateral KABI exchange re-established. The serial port hardware state is preserved by the driver's export_state() / import_state() (baud rate, flow control, FIFO thresholds). Downtime: ~50–150 ms (standard Tier 1 evolution).
Netconsole evolution: new netconsole module swapped. Target list and panic TX resources are preserved via state serialization. UDP socket is re-created in the new module. Downtime: ~50–150 ms.

During evolution of any console component, the emergency serial backend (Tier 0 static, non-evolvable) continues operating as a fallback.

21.3 Input Subsystem (evdev)¶

Linux's evdev interface (/dev/input/eventX) is the standard for delivering keyboard, mouse, touch, and joystick events to userspace (Wayland compositors, X11).

21.3.1 Tier 2 Input Drivers¶

In UmkaOS, modern input drivers (USB HID, Bluetooth HID, I2C touchscreens) run in Tier 2 (Ring 3, process-isolated) (Section 11.3). An input driver's only responsibility is to parse hardware-specific reports and translate them into standardized input_event structs.

The driver communicates with umka-core via a shared memory ring established during driver registration (umka_driver_register, Section 12.2).

/// Internal kernel input event representation.
/// Uses 64-bit time fields for y2038 safety across all architectures.
///
/// **32-bit compatibility**: The userspace-visible `struct input_event` exposed
/// via `/dev/input/eventX` uses Linux-compatible layout that varies by architecture:
/// - 64-bit platforms: time_sec (u64), time_usec (u64), type (u16), code (u16), value (i32) = 24 bytes
/// - 32-bit platforms: time_sec (u32), time_usec (u32), type (u16), code (u16), value (i32) = 16 bytes
///
/// The `umka-sysapi` layer translates from this internal format to the
/// architecture-specific Linux input_event layout when copying to userspace.
/// This translation is zero-cost on 64-bit platforms (direct copy) and
/// requires field truncation/conversion on 32-bit platforms.
///
/// **Y2038 on 32-bit**: The 32-bit compat path preserves the Linux ABI
/// (u32 timestamps), which wraps in 2038. Linux solved y2038 for input
/// events by redefining `struct input_event` timestamp fields as
/// `__kernel_ulong_t` (unsigned 32-bit) in v5.0 (commit 152194fe9c3f),
/// extending the wrap date to 2106. UmkaOS follows the same approach: the
/// 32-bit compat layer uses unsigned timestamp fields (u32 sec, u32 usec),
/// matching Linux v5.0+ ABI. No separate ioctl is needed for input events.
#[repr(C)]
pub struct InputEvent {
    /// Event timestamp in seconds since boot (CLOCK_MONOTONIC).
    /// 64-bit for y2038 safety. Truncated to u32 at `copy_to_user` time during `read(2)`
    /// on `/dev/input/eventX` for 32-bit processes (via the `umka-sysapi` read path);
    /// the truncation point is the kernel→userspace copy, not ioctl registration or
    /// ring-buffer insertion.
    pub time_sec: u64,
    /// Event timestamp microseconds component.
    /// 64-bit for consistency with time_sec. Truncated to u32 at `copy_to_user` on the
    /// 32-bit compat read path (same truncation point as time_sec).
    pub time_usec: u64,
    /// Event type (EV_KEY, EV_REL, EV_ABS, etc.).
    pub type_: u16,
    /// Event code (key code, relative axis, absolute axis, etc.).
    pub code: u16,
    /// Event value (key state, relative delta, absolute position, etc.).
    pub value: i32,
}
// InputEvent: u64(8) + u64(8) + u16(2) + u16(2) + i32(4) = 24 bytes.
// Layout matches the 64-bit Linux struct input_event delivered via read(2) on
// /dev/input/eventX; 32-bit processes receive the translated 16-byte layout.
const_assert!(core::mem::size_of::<InputEvent>() == 24);
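
The 32-bit translation described in the comments above amounts to narrowing the two timestamp fields. A minimal sketch, assuming a compat struct named `InputEventCompat32` and a helper `to_compat32` (both hypothetical; `InputEvent` is repeated here only so the example is self-contained):

```rust
/// Internal representation (mirrors the struct above).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InputEvent {
    pub time_sec: u64,
    pub time_usec: u64,
    pub type_: u16,
    pub code: u16,
    pub value: i32,
}

/// Hypothetical 16-byte layout handed to 32-bit processes:
/// unsigned 32-bit seconds and microseconds, then type/code/value.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InputEventCompat32 {
    pub sec: u32,
    pub usec: u32,
    pub type_: u16,
    pub code: u16,
    pub value: i32,
}

/// Narrows the timestamp at the kernel-to-userspace copy. Seconds wrap
/// modulo 2^32; microseconds are always below 1_000_000, so they fit.
pub fn to_compat32(ev: &InputEvent) -> InputEventCompat32 {
    InputEventCompat32 {
        sec: ev.time_sec as u32,
        usec: ev.time_usec as u32,
        type_: ev.type_,
        code: ev.code,
        value: ev.value,
    }
}
```

The 64-bit path needs no such step because the internal struct already has the 24-byte layout.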

When a user presses a key, the Tier 2 USB HID driver pushes an InputEvent into the shared ring and calls umka_driver_complete (Section 12.3). The UmkaOS Core's input multiplexer (umka-input) wakes up, reads the event, and copies it to all open file descriptors for the corresponding /dev/input/eventX node.

Input event ring buffer protocol:

Each /dev/input/eventX device uses a single-producer ring buffer for kernel → userspace event delivery, with one independent read cursor per open file descriptor:

Ring capacity: Computed at open() time from dev.hint_events_per_packet * EVDEV_BUF_PACKETS, minimum EVDEV_MIN_BUFFER_SIZE = 64 events, rounded up to the next power of two — matching Linux's evdev_compute_buffer_size(). Immutable for the lifetime of the fd. UmkaOS extension: EVIOCSBUFSIZE ioctl allows resizing (minimum 64, maximum 4096 events); see "UmkaOS Extensions" below.
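Synchronization: The kernel publishes events through an AtomicU32 write index; each reader tracks its own read index. Both are free-running u32 counters, and the slot for an index is the index modulo ring capacity. Memory ordering: the kernel advances the write index with a Release store after writing the event data; the reader loads the write index with Acquire before reading events.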
Batching: The kernel batches all events between two EV_SYN / SYN_REPORT markers as a single atomic update — the write index advances once after the complete event group. Userspace never sees a partial multi-axis touch event.
Overflow: When the ring is full (write_idx - read_idx >= capacity), the oldest unread events are dropped. The kernel injects EV_SYN / SYN_DROPPED to notify userspace. Userspace must re-sync all device state on receiving SYN_DROPPED (re-read axis values, key states).
Poll integration: poll() / epoll_wait() returns EPOLLIN when write_idx != read_idx.
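
The capacity rule and the index arithmetic in the list above can be sketched as follows. The constant `EVDEV_BUF_PACKETS = 8` is the value used by Linux's evdev.c; `compute_buffer_size`, `slot`, `classify`, and `ReadState` are hypothetical names for illustration. The sketch assumes free-running `u32` indices, which is what the `write_idx - read_idx >= capacity` test implies.

```rust
pub const EVDEV_MIN_BUFFER_SIZE: u32 = 64;
pub const EVDEV_BUF_PACKETS: u32 = 8; // value from Linux drivers/input/evdev.c

/// Ring capacity in events: hint * packets, at least 64, rounded up to
/// the next power of two.
pub fn compute_buffer_size(hint_events_per_packet: u32) -> u32 {
    hint_events_per_packet
        .saturating_mul(EVDEV_BUF_PACKETS)
        .max(EVDEV_MIN_BUFFER_SIZE)
        .checked_next_power_of_two()
        .unwrap_or(1 << 31)
}

/// Slot for a free-running index. `capacity` must be a power of two.
pub fn slot(idx: u32, capacity: u32) -> u32 {
    idx & (capacity - 1)
}

#[derive(Debug, PartialEq)]
pub enum ReadState {
    /// write_idx == read_idx: poll() would not report EPOLLIN.
    Empty,
    /// This many events are readable.
    Pending(u32),
    /// The writer has overwritten unread events: report SYN_DROPPED.
    Overrun,
}

/// Classifies a reader's position. Wrapping subtraction gives the number
/// of unread events even after the u32 counters wrap.
pub fn classify(write_idx: u32, read_idx: u32, capacity: u32) -> ReadState {
    let pending = write_idx.wrapping_sub(read_idx);
    if pending == 0 {
        ReadState::Empty
    } else if pending > capacity {
        ReadState::Overrun
    } else {
        ReadState::Pending(pending)
    }
}
```

For example, a device with `hint_events_per_packet = 9` gets a 128-event ring (72 rounded up), and a reader at index `u32::MAX - 1` with the writer at 3 has 5 events pending.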

Because the input driver is a standard Tier 2 process, a crash in the complex USB HID parsing logic simply restarts the driver process (~10 ms recovery) without dropping subsequent keystrokes.

21.3.2 Input Device Registration¶

Device class drivers (USB HID, Bluetooth HID, I2C touchscreen, camera button, gamepad, etc.) register as input devices to emit events through the evdev interface (/dev/input/eventX). Registration connects hardware input sources to the userspace-visible evdev nodes.

/// Input device descriptor. Registered by device class drivers to connect
/// hardware input sources to the evdev userspace interface.
///
/// Each `InputDevice` represents one logical input source (e.g., one USB
/// keyboard, one touchpad). A single physical device may register multiple
/// `InputDevice` instances if it exposes multiple logical input paths
/// (e.g., a keyboard with an integrated touchpad registers one InputDevice
/// for keys and another for pointer events).
pub struct InputDevice {
    /// Human-readable device name (e.g., "USB Keyboard", "PS/2 Mouse").
    /// Exposed to userspace via `/sys/class/input/eventN/device/name` and
    /// the `EVIOCGNAME` ioctl.
    pub name: ArrayString<64>,

    /// Physical path (e.g., "usb-0000:00:14.0-1/input0").
    /// Identifies the hardware topology path. Exposed via `EVIOCGPHYS`.
    pub phys: ArrayString<64>,

    /// Device identity (bus type, vendor, product, version).
    /// Exposed via `EVIOCGID` ioctl. Userspace udev rules match on these
    /// fields to apply device-specific configuration.
    pub id: InputId,

    /// Capability bitmask: which event types this device can produce.
    /// Bit positions match the Linux EV_* constants:
    ///   EV_SYN=0x00, EV_KEY=0x01, EV_REL=0x02, EV_ABS=0x03,
    ///   EV_MSC=0x04, EV_SW=0x05, EV_LED=0x11, EV_SND=0x12,
    ///   EV_REP=0x14, EV_FF=0x15.
    /// Queried by userspace via `EVIOCGBIT(0, ...)`.
    pub ev_bits: u32,

    /// Per-event-type capability bitmaps. These detail WHICH codes within
    /// each event type the device supports (e.g., which KEY_* codes for
    /// EV_KEY, which REL_* axes for EV_REL). Queried via `EVIOCGBIT(type, ...)`.
    ///
    /// Stored as a fixed-size array of bitmaps. Only the types set in
    /// `ev_bits` have meaningful data; others are zeroed.
    pub key_bits: [u64; 12],   // 768 bits, covers KEY_MAX=0x2FF
    pub rel_bits: u32,          // REL_MAX=0x0F (16 bits needed)
    pub abs_bits: u64,          // ABS_MAX=0x3F (64 bits needed)

    /// Per-device broadcast event ring (single-producer, multi-reader).
    /// Sized for 64 events of backlog. The ring is allocated from slab
    /// memory during `input_register_device()` and freed on unregister.
    ///
    /// For Tier 2 drivers: the ring is in shared memory mapped into both
    /// the driver process and Core. For Tier 1 drivers: the ring is in
    /// Core memory, written via KABI ring buffer protocol.
    /// **Ring ownership**: This is the device-level ring populated by the input
    /// driver (producer). The write cursor advances unconditionally (never
    /// blocks on slow readers). Each `EvdevClient` (per-fd) has an independent
    /// `read_idx` cursor into this shared ring — the ring itself is NOT
    /// duplicated per fd. All clients read from the same ring; slow clients
    /// whose `read_idx` falls behind `write_idx` by more than the ring capacity
    /// lose events (oldest-first drop, reported via `SYN_DROPPED`).
    /// `BroadcastRing` differs from `SpscRing` in that the writer never
    /// waits for any reader to advance — readers independently track progress.
    /// See [Section 17.3](17-containers.md#posix-ipc--broadcastring) for the `BroadcastRing<T, N>` definition.
    pub event_ring: BroadcastRing<InputEvent, 64>,

    /// Per-device staging counter for atomic SYN_REPORT batching.
    /// Tracks the number of non-SYN events written since the last
    /// SYN_REPORT. Reset to 0 after each SYN_REPORT advances write_idx.
    /// Not user-visible; internal bookkeeping for the batching protocol
    /// described in `input_report_event()`. The staging write position
    /// is derived as `write_idx + staging_count`.
    pub staging_count: u32,
}

/// Input device identity. Matches the Linux `struct input_id` layout
/// exactly (8 bytes, no padding) for binary compatibility with the
/// `EVIOCGID` ioctl.
#[repr(C)]
pub struct InputId {
    /// Bus type: BUS_USB=0x03, BUS_BLUETOOTH=0x05, BUS_I2C=0x18,
    /// BUS_HOST=0x19, BUS_VIRTUAL=0x06, etc. Full list in Linux
    /// `include/uapi/linux/input.h`.
    pub bustype: u16,
    /// Vendor ID (USB VID, Bluetooth SIG company ID, etc.).
    pub vendor: u16,
    /// Product ID (USB PID, etc.).
    pub product: u16,
    /// Device version number (driver-defined).
    pub version: u16,
}
const_assert!(core::mem::size_of::<InputId>() == 8);

/// Opaque handle returned by `input_register_device()`. The driver retains
/// this handle to report events and must pass it to `input_unregister_device()`
/// on teardown. Internally, this is an index into the global `INPUT_DEVICES`
/// XArray (integer-keyed, O(1) lookup).
pub struct InputHandle(u32);

/// Register an input device. Returns a handle for event reporting.
///
/// Side effects:
/// 1. Allocates a minor number from the `INPUT_MINOR_POOL` (static range
///    64..95 first, then dynamic range 256..1023).
/// 2. Creates `/dev/input/eventN` via devtmpfs
///    ([Section 14.17](14-vfs.md#pipes-and-fifos)).
/// 3. Inserts the device into the global `INPUT_DEVICES` XArray
///    (keyed by minor number).
/// 4. Emits a `KOBJ_ADD` uevent for udev/eudevd to process
///    (creates symlinks like `/dev/input/by-id/...`).
///
/// # Errors
/// - `InputError::MinorExhausted`: every minor in the static and dynamic ranges is in use.
/// - `InputError::DevtmpfsError`: failed to create the device node.
pub fn input_register_device(dev: InputDevice) -> Result<InputHandle, InputError>

/// Unregister an input device (on driver unload or device disconnect).
///
/// Side effects:
/// 1. Removes `/dev/input/eventN` from devtmpfs.
/// 2. Wakes any blocked `read()` / `poll()` waiters with `ENODEV`.
/// 3. Removes the device from the `INPUT_DEVICES` XArray.
/// 4. Emits a `KOBJ_REMOVE` uevent.
/// 5. Releases the minor number back to `INPUT_MINOR_POOL`.
///
/// Any `EvdevClient` file descriptors still open on the device node
/// continue to exist but return `ENODEV` on subsequent `read()` / `ioctl()`.
pub fn input_unregister_device(handle: InputHandle)

/// Report a single input event. Called from device interrupt handler or
/// polling callback. Lock-free write to the device's event ring
/// (`BroadcastRing`, single producer).
///
/// **Atomic batching protocol**: Events are written to the ring data area
/// but the `write_idx` is NOT advanced until `EV_SYN / SYN_REPORT` is
/// reported, atomically committing all pending events in the batch.
/// Userspace never sees a partial multi-axis touch or partial key+syn pair.
///
/// Non-SYN events write to `ring.data[staging_idx]` and increment a
/// per-device `staging_count`. When `SYN_REPORT` arrives, the write_idx
/// is advanced by `staging_count + 1` (including the SYN event itself)
/// with a single `Release` store, making the entire batch visible atomically.
///
/// # Arguments
/// - `handle`: the device handle from `input_register_device()`.
/// - `type_`: event type (EV_KEY, EV_REL, EV_ABS, etc.).
/// - `code`: event code (KEY_A, REL_X, ABS_MT_POSITION_X, etc.).
/// - `value`: event value (1=press, 0=release for keys; delta for relative;
///   absolute position for absolute axes).
pub fn input_report_event(handle: &InputHandle, type_: u16, code: u16, value: i32)

/// Convenience: report a key press/release event with automatic SYN_REPORT.
///
/// Generates two events atomically written to the ring:
/// 1. `EV_KEY / code / (1 if pressed, 0 if released)`
/// 2. `EV_SYN / SYN_REPORT / 0`
///
/// For multi-event reports (e.g., multi-touch), drivers should use
/// `input_report_event()` directly and send `SYN_REPORT` once after
/// all axis values are written.
pub fn input_report_key(handle: &InputHandle, code: u16, pressed: bool) {
    input_report_event(handle, EV_KEY, code, if pressed { 1 } else { 0 });
    input_report_event(handle, EV_SYN, SYN_REPORT, 0);
}
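
The atomic batching protocol described for `input_report_event()` can be modelled on the writer side as below. This is a sketch under assumptions: `StagedRing` and `Event` are hypothetical, the ring capacity `N` is assumed to be a power of two, and the sketch does not handle a batch larger than the ring or the per-reader overflow reporting.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const EV_SYN: u16 = 0x00;
const SYN_REPORT: u16 = 0;

#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Event {
    pub type_: u16,
    pub code: u16,
    pub value: i32,
}

/// Writer-side model: events are staged past `write_idx` and become
/// visible to readers only when SYN_REPORT commits the batch.
pub struct StagedRing<const N: usize> {
    data: [Event; N],
    write_idx: AtomicU32,
    staging_count: u32,
}

impl<const N: usize> StagedRing<N> {
    pub fn new() -> Self {
        Self {
            data: [Event::default(); N],
            write_idx: AtomicU32::new(0),
            staging_count: 0,
        }
    }

    pub fn report(&mut self, type_: u16, code: u16, value: i32) {
        // Single producer, so a Relaxed load of our own index suffices.
        let base = self.write_idx.load(Ordering::Relaxed);
        // Staging position = write_idx + staging_count.
        let pos = base.wrapping_add(self.staging_count) as usize % N;
        self.data[pos] = Event { type_, code, value };

        if type_ == EV_SYN && code == SYN_REPORT {
            // Commit staged events plus the SYN event with one Release
            // store, so readers see the whole batch or none of it.
            let commit = base.wrapping_add(self.staging_count + 1);
            self.write_idx.store(commit, Ordering::Release);
            self.staging_count = 0;
        } else {
            self.staging_count += 1;
        }
    }

    /// What a reader observes as the published write index.
    pub fn committed(&self) -> u32 {
        self.write_idx.load(Ordering::Acquire)
    }
}
```

Reporting `ABS_MT_POSITION_X` and `ABS_MT_POSITION_Y` leaves `committed()` unchanged; the following `SYN_REPORT` advances it by three in one step.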

evdev layer integration:

input_register_device() allocates /dev/input/eventN (major = 13, minor = EVDEV_MINOR_BASE + device index). Static range: minors 64-95 (first 32 devices); dynamic overflow: minors 256-1023 (shared with other input handlers, matching Linux INPUT_FIRST_DYNAMIC_DEV=256, INPUT_MAX_CHAR_DEVICES=1024). The character device is registered with the VFS via register_chrdev_region() (Section 14.5) with evdev_fops as the file operations.
open(): allocates a per-fd EvdevClient struct containing an independent read position into the device's event ring. Multiple userspace processes can open the same /dev/input/eventX simultaneously; each gets its own EvdevClient with an independent read cursor.
read(): dequeues events from the device's event_ring starting at the client's read position. Blocks (interruptibly) if the ring is empty. Returns events in struct input_event format (architecture-specific layout, see the InputEvent struct above for the compat translation).
poll(): returns EPOLLIN when event_ring.write_idx != client.read_idx.
ioctl(): supports the Linux evdev ioctl set, including EVIOCGVERSION, EVIOCGID, EVIOCGNAME, EVIOCGPHYS, EVIOCGBIT(type, ...), EVIOCGABS(axis), EVIOCGRAB, EVIOCREVOKE, EVIOCSCLOCKID. UmkaOS extension: EVIOCSBUFSIZE (see below).
Grab semantics (EVIOCGRAB): when a client grabs the device, all other clients stop receiving events (no events are delivered to them while the grab is held). Only one grab is active per device. Used by Wayland compositors to claim exclusive input.
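
The two-tier minor allocation described in the first item above (static minors 64-95 first, then dynamic minors 256-1023) could be implemented with a 1024-bit bitmap. The sketch below is an assumption about how `TwoTierMinorAllocator` might behave, not its actual definition: it takes `&mut self` for simplicity, whereas the allocator behind a `static` would need interior mutability.

```rust
/// Sketch of a two-tier minor allocator over minors 0..1024.
pub struct TwoTierMinorAllocator {
    static_base: u32,
    static_count: u32,
    dyn_start: u32,
    dyn_end: u32, // exclusive
    used: [u64; 16], // 1024 bits, one per minor
}

impl TwoTierMinorAllocator {
    pub const fn new(static_base: u32, static_count: u32, dyn_start: u32, dyn_end: u32) -> Self {
        Self { static_base, static_count, dyn_start, dyn_end, used: [0; 16] }
    }

    /// Marks `minor` as used; returns false if it was already taken.
    fn test_and_set(&mut self, minor: u32) -> bool {
        let (word, bit) = ((minor / 64) as usize, minor % 64);
        if self.used[word] & (1u64 << bit) != 0 {
            return false;
        }
        self.used[word] |= 1u64 << bit;
        true
    }

    /// Lowest free minor in the static range, else lowest free minor in
    /// the dynamic range, else None (all minors exhausted).
    pub fn alloc(&mut self) -> Option<u32> {
        let static_range = self.static_base..self.static_base + self.static_count;
        let dynamic_range = self.dyn_start..self.dyn_end;
        static_range.chain(dynamic_range).find(|&m| self.test_and_set(m))
    }

    /// Returns a minor to the pool.
    pub fn free(&mut self, minor: u32) {
        self.used[(minor / 64) as usize] &= !(1u64 << (minor % 64));
    }
}
```

With `TwoTierMinorAllocator::new(64, 32, 256, 1024)`, the first 32 allocations return 64 through 95 and the 33rd returns 256, giving 800 usable minors in total.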

UmkaOS Extensions (not present in Linux evdev):

EVIOCSBUFSIZE: Allows userspace to resize the per-client event buffer after open(). Linux computes the buffer size once at open() time via evdev_compute_buffer_size() and provides no mechanism to change it. UmkaOS adds EVIOCSBUFSIZE as an extension ioctl. The ioctl number uses bit 31 set (0x80000000 | _IOW('E', 0x90, u32)) to avoid collision with any current or future Linux evdev ioctl. Range: minimum 64 events, maximum 4096 events. A Linux application that does not call this ioctl sees identical behavior to Linux (buffer computed at open, immutable).
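
As a worked example of the encoding, the ioctl number can be computed with the generic Linux ioctl bit layout (number in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31, as in `include/uapi/asm-generic/ioctl.h`). The helper names `ioc` and `iow` are hypothetical; architectures that use a different ioctl layout would produce a different value.

```rust
pub const IOC_WRITE: u32 = 1;
pub const IOC_READ: u32 = 2;

/// Generic ioctl encoding: dir[31:30] | size[29:16] | type[15:8] | nr[7:0].
pub const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | (size << 16) | ((ty as u32) << 8) | (nr as u32)
}

/// Equivalent of the C macro `_IOW(ty, nr, T)` with `size = sizeof(T)`.
pub const fn iow(ty: u8, nr: u8, size: u32) -> u32 {
    ioc(IOC_WRITE, ty, nr, size)
}

/// 0x80000000 | _IOW('E', 0x90, u32), as given in the text above.
pub const EVIOCSBUFSIZE: u32 = 0x8000_0000 | iow(b'E', 0x90, 4);
```

Under this layout `_IOW('E', 0x90, u32)` is `0x40044590` and `EVIOCSBUFSIZE` is `0xC0044590`. Bit 31 is the `_IOC_READ` direction bit in the generic layout, so the resulting value is numerically the same as `_IOWR('E', 0x90, u32)`.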

Global input device registry:

/// Evdev minor range constants (matching Linux drivers/input/evdev.c).
const EVDEV_MINOR_BASE: u32 = 64;
const EVDEV_MINORS: u32 = 32;

/// Global input device table. XArray keyed by minor number (64..95
/// static, 256+ dynamic). O(1) lookup for evdev open/read/ioctl paths.
static INPUT_DEVICES: LazyLock<XArray<InputDeviceEntry>> =
    LazyLock::new(|| XArray::new());

/// Minor number allocator for /dev/input/eventN devices.
/// Two-tier: first allocates from static range 64-95 (32 devices),
/// then overflows to dynamic range 256-1023 (matching Linux's
/// input_get_new_minor() scheme).
static INPUT_MINOR_POOL: LazyLock<TwoTierMinorAllocator> =
    LazyLock::new(|| TwoTierMinorAllocator::new(EVDEV_MINOR_BASE, EVDEV_MINORS, 256, 1024));

/// Per-fd state for an open evdev file descriptor. Each `open()` on
/// `/dev/input/eventN` allocates one `EvdevClient`. Multiple processes
/// (or multiple fds in one process) each get independent read cursors.
pub struct EvdevClient {
    /// Read cursor into the device's event ring. Tracks the position of the
    /// next unread event for this client. Updated on `read()`.
    /// **Longevity**: u32 with modular arithmetic. At 1000 events/sec, wraps
    /// after ~49.7 days. Modular u32 subtraction (`write_idx - read_idx`)
    /// correctly computes pending event count regardless of wrap, so wrap
    /// does not cause incorrect ring behavior. A client stalled for >49 days
    /// would see SYN_DROPPED on resume (ring overflow detection).
    pub read_idx: u32,
    /// Client-specific event mask: filters which event types are delivered.
    /// Set via `EVIOCSMASK` ioctl. Default: all events.
    pub evmask: [u64; 4],  // Bitmap covering EV_SYN..EV_MAX (0x1f)
    /// Clock ID for event timestamps: `CLOCK_REALTIME` (default),
    /// `CLOCK_MONOTONIC`, or `CLOCK_BOOTTIME`. Set via `EVIOCSCLOCKID`
    /// ioctl. `CLOCK_BOOTTIME` includes suspend time (Linux 4.17+);
    /// used by input libraries for gesture timeout calculations that
    /// survive suspend/resume. Any other clock ID returns `-EINVAL`.
    pub clock_id: i32,
    /// Buffer size for this client (events). Computed at `open()` from
    /// `evdev_compute_buffer_size()`. It stays fixed for the fd lifetime
    /// (matching Linux) unless the client resizes it via the `EVIOCSBUFSIZE`
    /// extension ioctl (see "UmkaOS Extensions" above). Events beyond this
    /// limit are dropped (oldest first).
    pub buffer_size: u32,
    /// Link in the InputDeviceEntry.clients intrusive list.
    /// **RCU note**: The clients list is iterated under RCU read-side lock
    /// during event broadcast (IRQ context → input_event() → iterate clients).
    /// Client addition (open) and removal (close) are serialized by the
    /// device's `clients_lock` mutex and use `list_add_rcu()` / `list_del_rcu()`
    /// + `synchronize_rcu()` to ensure safe concurrent iteration.
    pub link: IntrusiveListNode,
    /// Wait queue entry for blocking read/poll.
    pub wait: WaitQueueEntry,
    /// True if this client has been revoked (device removed while fd open).
    /// Subsequent read/ioctl returns ENODEV.
    pub revoked: bool,
}

/// Per-device state stored in the INPUT_DEVICES XArray.
pub struct InputDeviceEntry {
    /// The registered device descriptor.
    pub dev: InputDevice,
    /// List of open EvdevClient instances (for event fan-out and grab tracking).
    ///
    /// **Policy exception**: Intrusive list used here (instead of ring) because
    /// N is small and bounded (<8).
    ///
    /// **IRQ path note**: The `input_event()` broadcast path iterates this
    /// list under RCU read-side lock in IRQ context. Iteration is O(N) in the
    /// number of open clients. Typical N: 1-3 (one compositor + optional
    /// libinput debug fd). Maximum expected N: <8 per device (a process per
    /// open fd; evdev devices rarely have more than a handful of readers).
    /// For this small N, the intrusive list has acceptable cache locality
    /// (clients are allocated close in time from the same slab page).
    /// This matches Linux's `evdev_event()` implementation which uses
    /// `struct list_head` iterated under RCU for the same fan-out pattern.
    pub clients: SpinLock<IntrusiveList<EvdevClient>>,
    /// Currently grabbing client (if any). Only this client receives events.
    pub grab: AtomicPtr<EvdevClient>,
}
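
The cursor arithmetic and grab filtering described in the comments above can be sketched as follows. This is a minimal model, not the UmkaOS implementation: the function names are hypothetical, clients are identified by a plain index instead of an `AtomicPtr<EvdevClient>`, and the wrap-safe subtraction is only meaningful while fewer than 2^32 events separate the two indices.

```rust
/// Events written but not yet read. Wrap-safe because u32 subtraction is
/// performed modulo 2^32.
pub fn pending_events(write_idx: u32, read_idx: u32) -> u32 {
    write_idx.wrapping_sub(read_idx)
}

/// True when the writer has produced more unread events than the client's
/// buffer can hold; the client would then be sent SYN_DROPPED.
pub fn overflowed(write_idx: u32, read_idx: u32, buffer_size: u32) -> bool {
    pending_events(write_idx, read_idx) > buffer_size
}

/// Grab filter for the fan-out loop: with no grab every client receives the
/// event; with a grab only the grabbing client does.
pub fn should_deliver(grab_owner: Option<usize>, client: usize) -> bool {
    match grab_owner {
        None => true,
        Some(owner) => owner == client,
    }
}
```

For example, a writer at index 5 and a reader at `u32::MAX - 2` are 8 events apart, even though the write index has wrapped.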

21.3.3 Secure VT Switching and Panic Console

The Virtual Terminal (VT) subsystem provides the emergency text console and the mechanism for switching between graphical sessions (Ctrl+Alt+F1-F6).

In Linux, the VT subsystem is deeply entangled with the console driver, input layer, and DRM.

In UmkaOS, the VT subsystem is a minimal state machine inside umka-input:

1. Normal Operation: umka-input routes all input_event structs to the active Wayland compositor (the process holding the DRM master node).
2. VT Switch Detected: When umka-input detects a VT switch chord (e.g., Ctrl+Alt+F1), it immediately revokes the DRM master capability from the current compositor and pauses input event delivery to that process.
3. Panic Console Handoff: If the system panics, UmkaOS Core forcefully reclaims the display hardware from the Tier 1 DRM driver. It resets the display controller to a known-safe text mode (or simple framebuffer mode) using a minimal, statically linked Tier 0 VGA/EFI driver, and dumps the panic log. The complex Tier 1 DRM driver is completely bypassed during a panic to ensure the log is always visible, even if the GPU state machine is deadlocked.

21.3.3.1 VT Data Structures

// umka-core/src/vt/mod.rs

/// Maximum number of virtual consoles (matching Linux MAX_NR_CONSOLES = 63;
/// serial lines occupy indices 64+).
pub const MAX_NR_CONSOLES: usize = 63;

/// Global VT state. Singleton, initialized at boot.
pub struct VtState {
    /// Currently active VT number (1-based; default 1 at boot).
    /// Updated atomically during VT switch. 0 = no active VT (headless boot).
    pub active_vt: AtomicU8,
    /// Per-VT console state. Index 0 = VT 1, index 62 = VT 63.
    /// Each entry is independently locked to allow concurrent access
    /// to different VTs (e.g., background login on VT 2 while VT 1 is active).
    pub consoles: [SpinLock<VtConsole>; MAX_NR_CONSOLES],
}

/// TTY device state — canonical definition is `TtyPort` in
/// [Section 21.1](#tty-and-pty-subsystem--ttyport-core-struct). `TtyPort` includes
/// all fields listed here (dev, termios, ldisc, winsize, session, pgrp)
/// plus additional state (read/write buffers, driver_data, etc.).
/// VtConsole references `TtyPort` directly.
pub type TtyStruct = TtyPort;

/// DRM master handle. Grants exclusive modesetting access to a DRM device.
/// Only one DRM master is active per VT at a time.
pub struct DrmMaster {
    /// Authentication magic number (for legacy DRM auth protocol).
    pub auth_magic: u32,
    /// Unique identifier string for this master (set via DRM_IOCTL_SET_UNIQUE).
    pub unique: ArrayString<64>,
    /// File descriptor of the DRM device (/dev/dri/card0).
    pub master_fd: i32,
    /// Whether this master is currently the active master (has modesetting rights).
    pub is_active: bool,
}

/// Per-VT console state.
pub struct VtConsole {
    /// Controlling session (the login session or Wayland compositor owning this VT).
    /// `None` if the VT is unused.
    pub session_id: Option<SessionId>,
    /// Associated TTY device (e.g., `/dev/tty1`). `None` for graphical-only VTs.
    pub tty: Option<Arc<TtyStruct>>,
    /// Display mode.
    pub mode: VtMode,
    /// Keyboard input mode.
    pub kbd_mode: KbdMode,
    /// DRM master handle for this VT (the Wayland compositor's DRM master fd).
    /// `None` for text-mode VTs or VTs without a graphical session.
    /// On VT switch, the old VT's DRM master is revoked and the new VT's is granted.
    pub drm_master: Option<Arc<DrmMaster>>,
}

/// VT display mode (matches Linux KD_TEXT / KD_GRAPHICS).
#[repr(u32)]
pub enum VtMode {
    /// Text mode: kernel renders text console (fbcon or VGA text).
    KdText     = 0x00,
    /// Graphics mode: userspace (Wayland compositor) owns the display.
    /// Kernel does not write to the framebuffer.
    KdGraphics = 0x01,
}

/// Keyboard input mode (matches Linux `K_RAW` / `K_XLATE` / `K_MEDIUMRAW` /
/// `K_UNICODE` / `K_OFF` from `include/uapi/linux/kd.h`).
#[repr(u32)]
pub enum KbdMode {
    /// Raw scancode mode: scancodes passed directly to userspace.
    KRaw       = 0x00,
    /// Translated mode: scancodes → keysyms via keymap.
    KXlate     = 0x01,
    /// Medium-raw mode: scancodes with key up/down encoding.
    KMediumRaw = 0x02,
    /// Unicode mode: scancodes → UTF-8 via keymap (default for text VTs).
    KUnicode   = 0x03,
    /// Off mode: keyboard input disabled. Wayland compositors (wlroots, KWin,
    /// Mutter) set this via `ioctl(KDSKBMODE, K_OFF)` when taking VT control.
    /// Without this variant, `KDSKBMODE(4)` returns `-EINVAL`, preventing
    /// Wayland session startup.
    KOff       = 0x04,
}

VT switch protocol: When a VT switch is triggered (by ioctl(VT_ACTIVATE, n) or the keyboard chord Ctrl+Alt+F1..F12):

1. Validate target: Ensure 1 <= n <= MAX_NR_CONSOLES and the target VT exists.
2. Revoke old VT's DRM master: If the old VT has a drm_master, call drm_master_revoke(), which sets the master's is_active flag to false, disabling modesetting ioctls. The old compositor's pending atomic commits are rejected with -EACCES.
3. Signal old session: Send SIGUSR1 to the old VT's controlling session (if VT_SETMODE was called with VT_PROCESS mode, enabling cooperative switching). If the old session does not acknowledge within 5 seconds, the switch proceeds forcibly (matching Linux vt_reset() timeout behavior).
4. Update active_vt: Atomically store the new VT number.
5. Grant new VT's DRM master: If the new VT has a drm_master, call drm_master_grant(), which sets is_active to true and triggers a full modeset restore (the compositor's last committed atomic state is replayed).
6. Signal new session: Send SIGUSR2 to the new VT's controlling session.
7. Redirect input: umka-input updates its routing to deliver input_event structs to the new VT's session.
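
The ordering of the switch protocol can be sketched as a pure planning function. This is an illustrative model under stated assumptions: `VtStep`, `VtSwitchError`, and `plan_vt_switch` are hypothetical names, the 5-second acknowledgement timeout and the "target VT exists" check are omitted, and a real kernel path would not allocate a `Vec`.

```rust
pub const MAX_NR_CONSOLES: usize = 63;

/// One externally visible action of the switch protocol, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VtStep {
    RevokeDrmMaster(u8),
    SignalOldSession(u8), // SIGUSR1, only in VT_PROCESS mode
    SetActiveVt(u8),
    GrantDrmMaster(u8),
    SignalNewSession(u8), // SIGUSR2
    RedirectInput(u8),
}

#[derive(Debug, PartialEq, Eq)]
pub enum VtSwitchError {
    InvalidTarget,
}

/// `has_drm_master[i]` tells whether VT `i + 1` owns a DRM master.
/// `old == 0` means "no active VT" (headless boot).
pub fn plan_vt_switch(
    old: u8,
    new: u8,
    old_is_vt_process: bool,
    has_drm_master: &[bool; MAX_NR_CONSOLES],
) -> Result<Vec<VtStep>, VtSwitchError> {
    if new == 0 || usize::from(new) > MAX_NR_CONSOLES {
        return Err(VtSwitchError::InvalidTarget);
    }
    // VT numbers are 1-based; out-of-range lookups count as "no master".
    let master = |vt: u8| {
        has_drm_master
            .get(usize::from(vt).wrapping_sub(1))
            .copied()
            .unwrap_or(false)
    };
    let mut steps = Vec::new();
    if old != 0 {
        if master(old) {
            steps.push(VtStep::RevokeDrmMaster(old));
        }
        if old_is_vt_process {
            steps.push(VtStep::SignalOldSession(old));
        }
    }
    steps.push(VtStep::SetActiveVt(new));
    if master(new) {
        steps.push(VtStep::GrantDrmMaster(new));
    }
    steps.push(VtStep::SignalNewSession(new));
    steps.push(VtStep::RedirectInput(new));
    Ok(steps)
}
```

Switching from a graphical VT 1 to a text-mode VT 2 yields a revoke and release signal for VT 1, followed by the activation, acquire signal, and input redirect for VT 2, with no grant step because VT 2 has no DRM master.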

The keyboard chord (Ctrl+Alt+Fn) is intercepted in the umka-input keyboard processing path before events reach userspace. In KD_GRAPHICS mode, the chord is only honored if the compositor has not set K_OFF via KDSKBMODE (Wayland compositors typically set K_OFF and handle VT switching cooperatively via logind's TakeControl/ReleaseControl D-Bus protocol).

Panic console handoff procedure:

The full panic console path — including domain revocation, backend priority chain, netconsole panic transmit, and per-architecture isolation teardown — is specified in Section 21.2. Summary:

1. IRQs already disabled by the panic path before reaching this code.
2. Revoke all isolation domains (WRPKRU(0) on x86-64, equivalent on other architectures). All Tier 1 driver code becomes directly callable.
3. Call emergency_write() on each registered ConsoleBackend in priority order: emergency serial (priority 5), Tier 1 serial (priority 10), netconsole (priority 15). Failures silently ignored.
4. Fall through to Tier 0 emergency console (arch::current::serial::puts()) if all backends fail. Architecture-specific:
   - x86-64: COM1 serial (UART 16550, I/O port 0x3F8)
   - AArch64/ARMv7: PL011 UART (MMIO, base address from DTB)
   - RISC-V: SBI console extension (sbi_console_putchar)
   - PPC32/PPC64LE: OpenFirmware/OPAL console (opal_write)
   - s390x: SCLP console
   - LoongArch64: NS16550 UART
5. pstore persistence: pstore_kmsg_dump() writes log ring to non-volatile storage (Section 20.7).

The Tier 0 console path is entirely lock-free and allocation-free. It MUST work unconditionally at panic time, including when the panic was caused by a Tier 1 driver crash, memory corruption, or scheduler deadlock.
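
The backend chain with its Tier 0 fall-through might look roughly like the sketch below. The trait shape, `panic_write`, and `FixedBackend` are assumptions made for illustration (the authoritative `ConsoleBackend` definition is in Section 21.2), and the slice is assumed to be pre-sorted by priority at registration time so that the panic path itself neither sorts nor allocates.

```rust
/// Assumed minimal shape of a console backend for this sketch.
pub trait ConsoleBackend {
    fn priority(&self) -> u32;
    fn emergency_write(&self, msg: &[u8]) -> Result<(), ()>;
}

/// Writes `msg` to every backend in priority order, ignoring failures.
/// If no backend succeeds, falls through to the Tier 0 console callback.
/// Returns the number of backends that accepted the message.
pub fn panic_write(
    sorted_backends: &[&dyn ConsoleBackend],
    msg: &[u8],
    mut tier0_puts: impl FnMut(&[u8]),
) -> usize {
    debug_assert!(sorted_backends
        .windows(2)
        .all(|w| w[0].priority() <= w[1].priority()));
    let mut delivered = 0;
    for backend in sorted_backends {
        if backend.emergency_write(msg).is_ok() {
            delivered += 1;
        }
    }
    if delivered == 0 {
        tier0_puts(msg);
    }
    delivered
}

/// Test double: a backend that always succeeds or always fails.
pub struct FixedBackend {
    pub priority: u32,
    pub works: bool,
}

impl ConsoleBackend for FixedBackend {
    fn priority(&self) -> u32 {
        self.priority
    }
    fn emergency_write(&self, _msg: &[u8]) -> Result<(), ()> {
        if self.works { Ok(()) } else { Err(()) }
    }
}
```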

21.4 Audio Architecture (ALSA Compatibility)

Linux's Advanced Linux Sound Architecture (ALSA) provides the /dev/snd/pcmC0D0p interfaces for audio playback and capture.

21.4.1 ALSA PCM as DMA Rings

Audio devices are uniquely suited for UmkaOS's architecture because audio playback is fundamentally a ring buffer problem. Modern audio interfaces (Intel HDA, USB Audio Class 2.0) operate by reading PCM audio samples from a host memory ring buffer via DMA.

In UmkaOS, audio drivers (Tier 1 by default) do not implement complex ALSA state machines. Instead, an UmkaOS audio driver simply allocates an IOMMU-fenced DMA buffer (Section 4.14, umka_driver_dma_alloc) and programs the hardware to consume it.

When a userspace audio server (PipeWire or PulseAudio) opens the ALSA PCM node, umka-sysapi directly maps the hardware's DMA ring buffer into the PipeWire process's address space.

The Audio Data Path:

1. PipeWire writes PCM audio samples directly into the mapped DMA buffer in userspace.
2. PipeWire updates the ring buffer's "appl_ptr" (application pointer) in the shared memory control page.
3. The audio hardware consumes the samples via DMA and generates a period interrupt.
4. The kernel handles the interrupt, updates the "hw_ptr" (hardware pointer) in the shared control page, and wakes PipeWire via a futex.

Zero-Copy Routing: This architecture is purely zero-copy. The audio samples never pass through kernel memory, and the kernel never executes a copy_from_user(). The kernel's only role in the audio data path is servicing the period interrupt: it updates hw_ptr in the shared control page and wakes PipeWire via the futex.
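
The pointer bookkeeping behind this data path can be modelled in a few lines. The sketch assumes that `hw_ptr` and `appl_ptr` are free-running frame counters (real ALSA pointers wrap at a boundary value), and the type `PcmPointers` and its methods are hypothetical. The futex wake is left out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Shared playback pointers plus the ring size in frames.
pub struct PcmPointers {
    pub hw_ptr: AtomicU64,
    pub appl_ptr: AtomicU64,
    pub buffer_frames: u64,
}

impl PcmPointers {
    pub fn new(buffer_frames: u64) -> Self {
        Self {
            hw_ptr: AtomicU64::new(0),
            appl_ptr: AtomicU64::new(0),
            buffer_frames,
        }
    }

    /// Frames written by the application but not yet consumed by hardware.
    pub fn queued(&self) -> u64 {
        self.appl_ptr
            .load(Ordering::Acquire)
            .wrapping_sub(self.hw_ptr.load(Ordering::Acquire))
    }

    /// Free frames the application may still write (playback direction).
    pub fn playback_avail(&self) -> u64 {
        self.buffer_frames.saturating_sub(self.queued())
    }

    /// Userspace side: publish up to `frames` newly written frames.
    /// Returns how many frames were actually accepted.
    pub fn commit(&self, frames: u64) -> u64 {
        let n = frames.min(self.playback_avail());
        self.appl_ptr.fetch_add(n, Ordering::Release);
        n
    }

    /// Kernel side: one period has been consumed by the hardware.
    pub fn on_period_interrupt(&self, period_frames: u64) {
        self.hw_ptr.fetch_add(period_frames, Ordering::Release);
    }
}
```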

21.4.1.1 Xrun Handling (D25)

An xrun is a buffer underrun (playback) or overrun (capture) — the application failed to keep up with the real-time audio stream.

Underrun (playback: application fails to refill the DMA ring before the hardware consumes it):

- The hardware continues running; the DMA ring outputs silence (zero samples) for the duration of the underrun. No explicit silence padding by the kernel is required: the hardware or DMA zeroes the consumed region.
- The PCM state transitions to SNDRV_PCM_STATE_XRUN.
- The next write() / snd_pcm_writei() call from the application returns -EPIPE.
- The application must call snd_pcm_recover() or snd_pcm_prepare() to restart playback.

Overrun (capture: application fails to drain the DMA ring before it fills):

- Incoming samples overwrite the oldest samples in the circular buffer; the oldest samples are silently dropped.
- The PCM state transitions to SNDRV_PCM_STATE_XRUN.
- The next read() / snd_pcm_readi() call returns -EPIPE.
- The application must call snd_pcm_recover() or snd_pcm_prepare() to restart capture.

Recovery: snd_pcm_recover(pcm, -EPIPE, silent) calls snd_pcm_prepare() internally, returning the stream to the PREPARED state; playback then restarts on the next write once the start threshold is reached. The silent parameter suppresses error logging for expected xruns (e.g., during transient CPU load spikes).

No automatic recovery: UmkaOS does not silently recover from xruns on behalf of the application. The application is responsible for detecting -EPIPE and calling recover. This matches Linux ALSA behavior.
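
The playback underrun rules can be captured in a small state machine. This is a simplified sketch rather than the UmkaOS code: the `Playback` type is hypothetical, the stream is assumed to start on the first write, and the underrun condition (`hw_ptr >= appl_ptr`) is modelled on ALSA's default stop threshold.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PcmState {
    Prepared,
    Running,
    Xrun,
}

pub const EPIPE: i32 = 32;

pub struct Playback {
    pub state: PcmState,
    pub hw_ptr: u64,
    pub appl_ptr: u64,
}

impl Playback {
    pub fn new() -> Self {
        Self { state: PcmState::Prepared, hw_ptr: 0, appl_ptr: 0 }
    }

    /// Period interrupt: the hardware consumed `period_frames`. If it has
    /// caught up with the application pointer, the stream enters XRUN.
    pub fn on_period(&mut self, period_frames: u64) {
        if self.state != PcmState::Running {
            return;
        }
        self.hw_ptr += period_frames;
        if self.hw_ptr >= self.appl_ptr {
            self.state = PcmState::Xrun;
        }
    }

    /// Application write: fails with -EPIPE while in XRUN, as described
    /// above. There is no automatic recovery.
    pub fn write(&mut self, frames: u64) -> Result<u64, i32> {
        if self.state == PcmState::Xrun {
            return Err(-EPIPE);
        }
        self.appl_ptr += frames;
        self.state = PcmState::Running; // assumption: start on first write
        Ok(frames)
    }

    /// Explicit recovery step requested by the application.
    pub fn prepare(&mut self) {
        self.hw_ptr = 0;
        self.appl_ptr = 0;
        self.state = PcmState::Prepared;
    }
}
```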

21.4.2 Audio Driver Tier Policy and Resilience

Audio drivers run in Tier 1 by default, as required for professional audio workloads with <5ms latency budgets where period interrupts fire every 1.3–42.7ms. This is consistent with the authoritative tier assignment in Section 13.4.

For consumer/desktop configurations where crash resilience is prioritized over latency, audio drivers may be optionally demoted to Tier 2. The demotion adds ~20–50μs syscall overhead per interrupt, which is acceptable at ≥10ms buffer periods but unacceptable for professional RT audio.

Audio drivers (especially USB Audio and complex DSPs) are prone to state machine bugs. Regardless of tier, an audio driver crash is seamlessly contained via the standard driver crash recovery mechanism (Section 11.9).

When an audio driver process crashes, the kernel's device registry (Section 11.4) revokes its MMIO mappings, leaving the DMA ring buffer intact. The registry restarts the driver process. The new driver instance re-initializes the hardware and binds back to the existing DMA ring buffer. PipeWire experiences a brief audio glitch but does not need to close and reopen the ALSA device, as the memory mapping remains valid throughout the recovery process.

Recovery time breakdown: The ~10-20ms total glitch comprises: (a) crash detection via page fault on revoked MMIO mapping (~0, synchronous), (b) driver process restart including ELF load and re-initialization (~2-5ms), (c) hardware re-initialization including codec probe and DMA ring rebind (~5-15ms depending on hardware; USB Audio Class devices are at the high end due to USB control transfer latency). The glitch duration corresponds to 1-2 audio periods at typical buffer sizes (≥10ms periods). Professional RT configurations with ≤2ms periods may experience 5-10 or more dropped periods.

The 5–15 ms hardware re-init figure applies when the device supports soft reset — firmware reload without a USB port cycle. Full USB port reset requires T_RSTRCY ≥ 10 ms per USB 2.0 §11.2.6.2 (and ≥100 ms for USB 1.1 Full Speed devices), making full reset recovery 10–300 ms depending on USB version and device speed. Whether a device supports soft reset is detected at driver load time via AudioDevice::probe_soft_reset() and recorded in AudioDeviceCaps. Devices not supporting soft reset incur the full port-reset recovery time on crash reload.
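
The relationship between glitch duration and lost periods is simple integer arithmetic, sketched below. The helper names are hypothetical, and the 64-frame and 2048-frame periods at 48 kHz are an assumed reading of the 1.3–42.7 ms interrupt range quoted earlier in this section.

```rust
/// Period duration in microseconds (rounded down).
pub fn period_us(frames: u64, rate_hz: u64) -> u64 {
    frames * 1_000_000 / rate_hz
}

/// Number of periods that elapse, fully or partially, during a glitch.
pub fn dropped_periods(glitch_us: u64, period_us: u64) -> u64 {
    (glitch_us + period_us - 1) / period_us
}
```

With a 10 ms period, a 10-20 ms recovery costs one or two periods; with a 2 ms period, the same recovery spans five to ten.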

21.4.3 Audio Device Trait

Interface contract: Section 13.4 (AudioDriver trait, audio_device_v1 KABI). This section specifies the Intel HDA, USB Audio Class, and HDMI/DP audio endpoint implementations of that contract. Tier decision and ALSA compat approach are authoritative in Section 13.4.

Architecture: Native UmkaOS audio driver framework with ALSA compatibility in umka-sysapi. The kernel provides a clean, low-latency PCM interface via the AudioDriver trait (Section 13.4). umka-sysapi translates snd_pcm_*/snd_ctl_* ioctls to native calls, enabling existing applications (PipeWire, PulseAudio, JACK) to work unmodified.

Audio types (AudioDeviceId, PcmDirection, PcmFormat, PcmParams, PcmStreamHandle) are defined canonically in Section 13.4. This section uses them for PCM stream management and the ALSA-compatible userspace interface.

/// PCM stream (active playback or capture).
pub struct PcmStream {
    /// Stream handle (opaque to the kernel; used as a key by the driver).
    pub handle: PcmStreamHandle,
    /// Handle to the registered AudioDriver vtable (from device registry).
    /// Used to dispatch start_stream/stop_stream back to the owning driver.
    pub driver_handle: KabiDriverHandle<AudioDriverVTable>,
    /// Parameters.
    pub params: PcmParams,
    /// DMA buffer (ring buffer, mapped into userspace via umka-sysapi).
    pub dma_buffer: DmaBufferHandle,
    /// Hardware pointer (read position for playback, write position for capture).
    /// Updated by hardware via DMA or interrupt. Atomic for lock-free read from userspace.
    pub hw_ptr: Arc<AtomicU64>,
    /// Application pointer (write position for playback, read position for capture).
    /// Updated by userspace (PipeWire, ALSA lib).
    pub appl_ptr: Arc<AtomicU64>,
}

impl PcmStream {
    /// Start the stream (begin DMA).
    ///
    /// Programs the hardware DMA engine to transfer audio data between the
    /// ring buffer and the codec. For playback, the hardware reads from
    /// `dma_buffer[hw_ptr..appl_ptr]`. For capture, the hardware writes to
    /// `dma_buffer[hw_ptr..]`.
    ///
    /// The caller (ALSA compat layer or PipeWire bridge) must ensure sufficient
    /// data is buffered before calling start (playback) or that the buffer has
    /// space (capture). Returns `AudioError::Underrun` or `AudioError::Overrun`
    /// if preconditions are not met.
    ///
    /// This method delegates to the driver via the `AudioDriver` KABI trait
    /// ([Section 13.4](13-device-classes.md#audio-subsystem)). The driver configures DMA scatter-gather from
    /// `self.dma_buffer` and sets the RUN bit in the stream descriptor register.
    pub fn start(&self) -> Result<(), AudioError> {
        // The actual hardware programming is performed by the AudioDriver
        // implementation behind the KABI vtable. The PcmStream is a handle
        // that the driver created in open_pcm(); the driver retains the
        // hardware references needed to configure DMA and stream registers.
        // The handle is passed back to the driver via the KABI start_stream()
        // call, which matches self.handle to the driver's internal state.
        //
        // Dispatch via the device registry: look up the AudioDriver vtable for
        // the device this stream belongs to (stored in self.driver_handle on open),
        // then call start_stream via the vtable pointer.
        let vtable = self.driver_handle.vtable();
        // SAFETY: vtable pointer is valid for the lifetime of the registered driver.
        unsafe { (vtable.start_stream)(self.handle) }
    }

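    /// Stop the stream (pause DMA).
    ///
    /// Clears the RUN bit immediately (the `false` argument requests an
    /// immediate stop: samples still queued in the ring are not played out),
    /// waits for the in-flight DMA transfer to quiesce (up to 1 period), and
    /// resets the hardware pointer. The DMA buffer remains mapped, so the
    /// stream can be restarted without re-opening.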
    pub fn stop(&self) -> Result<(), AudioError> {
        let vtable = self.driver_handle.vtable();
        // SAFETY: vtable pointer is valid for the lifetime of the registered driver.
        unsafe { (vtable.stop_stream)(self.handle, false) } // immediate stop, no drain
    }
}

/// Mixer control (volume slider, mute toggle, input source selector).
// kernel-internal, not KABI — internal mixer state, translated to snd_ctl_elem_value
// at the ioctl boundary. Never exposed directly to userspace.
#[repr(C)]
pub struct MixerControl {
    /// Control ID (for set_mixer_control).
    pub id: u32,
    /// Control type.
    pub control_type: MixerControlType,
    /// Name (e.g., "Master Playback Volume").
    pub name: [u8; 64],
    /// Min value (for volume controls).
    pub min: i32,
    /// Max value (for volume controls).
    pub max: i32,
    /// For Enum-type controls: number of valid items (0..num_enum_items-1).
    /// The `value` field must be in range [0, num_enum_items). For non-enum
    /// controls, this field is 0. Used for range validation on set_mixer_control:
    /// attempts to set an enum value >= num_enum_items return -EINVAL.
    pub num_enum_items: u32,
    /// Current value. Signed per ALSA `snd_ctl_elem_value` (signed long values).
    /// Volume controls use negative dB offsets; mute is 0.
    pub value: AtomicI32,
}

/// Mixer control type.
#[repr(u32)]
pub enum MixerControlType {
    /// Volume (integer range, min..max).
    Volume = 0,
    /// Mute (boolean, 0=unmuted, 1=muted).
    Mute = 1,
    /// Enumeration (e.g., input source: "Mic", "Line In", "CD").
    Enum = 2,
}

21.4.3.1 PCM DMA Buffer Lifecycle

The PcmStream.dma_buffer field references a coherent DMA buffer allocated and managed by the kernel on behalf of the audio driver. The lifecycle is:

Allocation: When userspace opens a PCM device and issues SNDRV_PCM_IOCTL_HW_PARAMS, the kernel allocates a DMA buffer via dma_alloc_coherent() (Section 4.14). The buffer size is periods * period_size_bytes, derived from the negotiated PcmParams. Constraints: minimum 2 periods (required for double-buffering — the hardware reads one period while the application fills the next), maximum 1 MiB per stream (prevents a single PCM device from exhausting DMA-capable memory; professional multi-channel configurations at 192 kHz / 32-bit with 8 periods fit within this limit). The IOMMU mapping is established at allocation time, restricting DMA to the allocated region only (Section 4.14 IOMMU integration).
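
The sizing constraints above can be checked with a short validation routine. This is a sketch only: `PcmGeometry` and its fields are hypothetical stand-ins for the relevant parts of `PcmParams` (defined in Section 13.4 and not reproduced here).

```rust
pub const MAX_PCM_BUFFER_BYTES: usize = 1 << 20; // 1 MiB per stream
pub const MIN_PERIODS: usize = 2; // double-buffering minimum

#[derive(Debug, PartialEq, Eq)]
pub enum BufferError {
    TooFewPeriods,
    TooLarge,
}

/// Hypothetical subset of the negotiated hardware parameters.
pub struct PcmGeometry {
    pub channels: usize,
    pub bytes_per_sample: usize,
    pub period_frames: usize,
    pub periods: usize,
}

/// Returns `periods * period_size_bytes` if it satisfies both constraints.
pub fn dma_buffer_bytes(g: &PcmGeometry) -> Result<usize, BufferError> {
    if g.periods < MIN_PERIODS {
        return Err(BufferError::TooFewPeriods);
    }
    // checked_mul guards against overflow from hostile parameters.
    let total = g
        .channels
        .checked_mul(g.bytes_per_sample)
        .and_then(|b| b.checked_mul(g.period_frames))
        .and_then(|b| b.checked_mul(g.periods))
        .ok_or(BufferError::TooLarge)?;
    if total > MAX_PCM_BUFFER_BYTES {
        Err(BufferError::TooLarge)
    } else {
        Ok(total)
    }
}
```

For instance, 8 channels of 32-bit samples with 1024-frame periods and 8 periods need 256 KiB, well inside the 1 MiB cap.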

Userspace mmap: The DMA buffer is exposed to userspace via mmap() on the PCM file descriptor at three well-known offsets (matching Linux ALSA ABI):

| Offset | Constant | Content |
|---|---|---|
| `0x0000_0000` | `SNDRV_PCM_MMAP_OFFSET_DATA` | PCM sample data (DMA ring buffer) |

Status and control page offsets are architecture-dependent (matching Linux include/uapi/sound/asound.h). 64-bit platforms use the NEW offsets introduced in Linux v5.x, which select __snd_pcm_mmap_status64 / __snd_pcm_mmap_control64 with 64-bit timestamps (y2038-safe). 32-bit platforms use the OLD offsets for backward compatibility with 32-bit timestamp layouts.

| Architecture | `SNDRV_PCM_MMAP_OFFSET_STATUS` | `SNDRV_PCM_MMAP_OFFSET_CONTROL` | Layout |
|---|---|---|---|
| x86-64, AArch64, RISC-V 64, PPC64LE, s390x, LoongArch64 | `0x8200_0000` (NEW) | `0x8300_0000` (NEW) | `snd_pcm_mmap_status64` (64-bit `tstamp`) |
| ARMv7, PPC32 | `0x8000_0000` (OLD) | `0x8100_0000` (OLD) | `snd_pcm_mmap_status` (32-bit `tstamp`) |

For reference, Linux defines all four constants in include/uapi/sound/asound.h:

- SNDRV_PCM_MMAP_OFFSET_STATUS_OLD = 0x8000_0000
- SNDRV_PCM_MMAP_OFFSET_CONTROL_OLD = 0x8100_0000
- SNDRV_PCM_MMAP_OFFSET_STATUS_NEW = 0x8200_0000 (default on 64-bit)
- SNDRV_PCM_MMAP_OFFSET_CONTROL_NEW = 0x8300_0000 (default on 64-bit)

UmkaOS accepts both OLD and NEW offsets on 64-bit platforms for backward compatibility (applications compiled against older headers still work). The mmap() handler checks the offset value and selects the appropriate status/control page layout. On 32-bit targets, only the OLD offsets are accepted; NEW offsets return EINVAL (no 64-bit timestamp support in the 32-bit ABI).

These offsets are defined as u32 constants. On LP64 platforms, they are zero-extended to off_t (i64) for the mmap() offset parameter.
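
A possible shape for the offset dispatch in the mmap() handler is sketched below. The constant values are the ones listed above; `MmapRegion` and `classify_mmap_offset` are hypothetical names, and treating any other offset as `EINVAL` is an assumption.

```rust
pub const SNDRV_PCM_MMAP_OFFSET_DATA: u32 = 0x0000_0000;
pub const SNDRV_PCM_MMAP_OFFSET_STATUS_OLD: u32 = 0x8000_0000;
pub const SNDRV_PCM_MMAP_OFFSET_CONTROL_OLD: u32 = 0x8100_0000;
pub const SNDRV_PCM_MMAP_OFFSET_STATUS_NEW: u32 = 0x8200_0000;
pub const SNDRV_PCM_MMAP_OFFSET_CONTROL_NEW: u32 = 0x8300_0000;
pub const EINVAL: i32 = 22;

#[derive(Debug, PartialEq, Eq)]
pub enum MmapRegion {
    Data,
    Status { new_layout: bool },
    Control { new_layout: bool },
}

/// Maps an mmap() offset (an i64 `off_t`) to the region it selects.
/// NEW offsets are only honoured on 64-bit targets.
pub fn classify_mmap_offset(offset: i64, target_is_64bit: bool) -> Result<MmapRegion, i32> {
    // The constants are u32 values zero-extended to off_t, so anything
    // negative or above u32::MAX cannot match.
    if offset < 0 || offset > i64::from(u32::MAX) {
        return Err(-EINVAL);
    }
    match offset as u32 {
        SNDRV_PCM_MMAP_OFFSET_DATA => Ok(MmapRegion::Data),
        SNDRV_PCM_MMAP_OFFSET_STATUS_OLD => Ok(MmapRegion::Status { new_layout: false }),
        SNDRV_PCM_MMAP_OFFSET_CONTROL_OLD => Ok(MmapRegion::Control { new_layout: false }),
        SNDRV_PCM_MMAP_OFFSET_STATUS_NEW if target_is_64bit => {
            Ok(MmapRegion::Status { new_layout: true })
        }
        SNDRV_PCM_MMAP_OFFSET_CONTROL_NEW if target_is_64bit => {
            Ok(MmapRegion::Control { new_layout: true })
        }
        _ => Err(-EINVAL),
    }
}
```

Note that casting a `u32` constant to `i64` zero-extends, so `0x8000_0000` stays positive as an `off_t`.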

The status and control pages are single 4 KiB pages shared between kernel and userspace. hw_ptr and appl_ptr are updated atomically (64-bit atomic stores on 64-bit architectures; 32-bit platforms use SeqLock or doubleword CAS). The SNDRV_PCM_IOCTL_SYNC_PTR ioctl provides an explicit synchronization path for applications that do not mmap the status/control pages.

Per-architecture DMA coherency:

| Platform | DMA Coherency | Buffer Mapping |
|---|---|---|
| x86-64 | Hardware-coherent (all PCIe devices) | Normal cacheable WB mapping |
| AArch64 (with CCI/CMN) | Hardware-coherent | Normal cacheable mapping |
| AArch64 (without CCI) | Non-coherent | Device-nGnRnE (uncached) or explicit cache maintenance via `dma_sync_for_cpu` / `dma_sync_for_device` |
| ARMv7 | Non-coherent (typical) | Uncached mapping (`MT_UNCACHED`) or explicit `dma_sync_*` barriers |
| RISC-V | Platform-dependent | Non-coherent platforms use uncached mappings; coherent platforms (with IOPMP or AIA IOMMU) use cacheable |
| PPC32/PPC64LE | Hardware-coherent (cache-inhibited via WIMG bits) | Guarded + cache-inhibited mapping (`WIMG=0101`) |

On non-coherent platforms, the dma_alloc_coherent() path in Section 4.14 automatically selects uncached mappings, so audio drivers need no explicit cache management. The userspace mmap inherits the same caching attributes as the kernel mapping.

Teardown: On SNDRV_PCM_IOCTL_HW_FREE or PCM device close (close(fd)):

1. DMA engine is stopped (RUN bit cleared, wait for current period to complete).
2. Userspace mmap is revoked (VMA removed from the process address space).
3. IOMMU mapping is removed (the device can no longer DMA to/from the buffer).
4. DMA buffer is freed via dma_free_coherent().

Crash recovery: If a Tier 1 audio driver crashes (Section 11.9), the kernel forcibly reclaims the DMA region. The IOMMU mapping is revoked immediately, preventing the crashed driver's hardware from issuing further DMA. The userspace mmap remains valid: the pages are still mapped, but DMA has stopped, so PipeWire sees silence. The restarted driver re-initializes hardware and rebinds to the existing DMA buffer, resuming playback with a brief glitch (see Section 21.4.2).

21.4.4 Intel HDA Driver Model

Intel High Definition Audio (HDA) is the dominant audio controller on Intel and AMD x86 platforms. The HDA spec defines:

- HDA controller: PCI device (class 0x0403), exposes MMIO registers for command/response, DMA buffer descriptors, interrupt status.
- Codecs: Audio chips connected via the HDA link (typically 1-2 codecs: one for analog audio, one for HDMI/DP audio). Each codec has a tree of widgets (nodes: DAC, ADC, mixer, pin, amplifier).

// umka-hda-driver/src/lib.rs (Tier 1 driver, optionally Tier 2)

/// Maximum number of codecs on a single HDA link (the HDA spec allows codec
/// addresses 0-14, i.e. up to 15 codecs).
pub const MAX_HDA_CODECS: usize = 15;

/// Maximum concurrent PCM streams per controller (limited by HDA stream
/// descriptor count; typical controllers support 4-16 bidirectional streams).
pub const MAX_HDA_STREAMS: usize = 16;

/// HDA controller state.
/// Uses fixed-capacity arrays to avoid heap allocation during audio playback.
/// Stream open/close modifies the array in-place without reallocation.
pub struct HdaController {
    /// PCI device.
    pub pci_dev: PciDevice,
    /// MMIO base address (from BAR0).
    pub mmio: *mut HdaRegisters,
    /// Codecs discovered on the HDA link.
    pub codecs: ArrayVec<HdaCodec, MAX_HDA_CODECS>,
    /// Active PCM streams.
    pub streams: ArrayVec<Arc<HdaPcmStream>, MAX_HDA_STREAMS>,
}

/// HDA codec (represents one audio chip on the HDA link).
pub struct HdaCodec {
    /// Codec address (0-14).
    pub addr: u8,
    /// Vendor ID (from root node).
    pub vendor_id: u32,
    /// Function groups discovered via GET_SUBORDINATE_NODE_COUNT on root node.
    /// Bounded by HDA spec: max 1 Audio Function Group + 1 Modem Function Group per codec.
    pub function_groups: ArrayVec<HdaFunctionGroup, 4>,
}

/// HDA function group (container for related widgets within a codec).
pub struct HdaFunctionGroup {
    /// Node ID (NID) of this function group.
    pub nid: u8,
    /// Widgets within this function group.
    /// Bounded by HDA spec: max 255 widgets per function group (NID range 8-bit).
    pub widgets: ArrayVec<HdaWidget, 256>,
}

/// HDA widget (node in codec's audio routing graph).
pub struct HdaWidget {
    /// Node ID (NID).
    pub nid: u8,
    /// Widget type (output, input, mixer, selector, pin, etc.).
    /// Decoded from bits [23:20] of the Audio Widget Capabilities parameter
    /// returned by the GET_PARAMETER verb (parameter ID 0x09).
    pub widget_type: HdaWidgetType,
    /// Capabilities (from GET_PARAMETER verb).
    pub capabilities: u32,
}

/// HDA widget type.
#[repr(u8)]
pub enum HdaWidgetType {
    /// Audio output (DAC - Digital-to-Analog Converter).
    AudioOut = 0,
    /// Audio input (ADC - Analog-to-Digital Converter).
    AudioIn = 1,
    /// Mixer (combines multiple inputs).
    Mixer = 2,
    /// Selector (mux: selects one of multiple inputs).
    Selector = 3,
    /// Pin (physical connector: headphone jack, speaker, mic).
    Pin = 4,
    /// Power widget.
    Power = 5,
    /// Volume knob.
    VolumeKnob = 6,
    /// Vendor-specific.
    VendorDefined = 15,
}

impl HdaWidgetType {
    /// Decode widget type from the Audio Widget Capabilities parameter (bits [23:20]).
    /// Per HDA spec section 7.3.4.6: bits [23:20] encode the widget type.
    pub fn from_caps(caps: u32) -> Self {
        match (caps >> 20) & 0xF {
            0 => Self::AudioOut,
            1 => Self::AudioIn,
            2 => Self::Mixer,
            3 => Self::Selector,
            4 => Self::Pin,
            5 => Self::Power,
            6 => Self::VolumeKnob,
            15 => Self::VendorDefined,
            _ => Self::VendorDefined, // Unknown types treated as vendor-defined
        }
    }
}

impl HdaController {
    /// Send a verb (command) to a codec. Returns the response.
    /// HDA verbs use CORB (Command Outbound Ring Buffer) and RIRB (Response Inbound Ring Buffer).
    pub fn send_verb(&self, codec_addr: u8, nid: u8, verb: u32) -> Result<u32, HdaError> {
        // Encode the command: bits [31:28] = codec_addr, [27:20] = nid,
        // [19:0] = verb payload.
        let command = ((codec_addr as u32) << 28) | ((nid as u32) << 20) | (verb & 0xF_FFFF);
        // Write the command to the next CORB slot and advance the CORB write pointer.
        let wp = self.corb_advance_wp();
        // SAFETY (assumed invariant): `corb_base` points to the CORB DMA region and
        // `corb_advance_wp()` keeps `wp` within the ring.
        unsafe { self.corb_base.add(wp).write_volatile(command) };
        // Poll the RIRB write pointer until the response arrives, signaled by
        // interrupt or polling (timeout: 1 ms).
        let response = self.rirb_poll_response(core::time::Duration::from_millis(1))?;
        Ok(response)
    }

    /// Probe codecs on the HDA link.
    pub fn probe_codecs(&mut self) -> Result<(), HdaError> {
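        // Read STATESTS register to discover codec addresses (bit set = codec present).
        // SAFETY: `mmio` maps the controller's BAR0 register block for the lifetime
        // of `self`. The read is volatile because STATESTS is a hardware register.
        let statests = unsafe { core::ptr::addr_of!((*self.mmio).statests).read_volatile() };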
        for addr in 0..15 {
            if (statests & (1 << addr)) != 0 {
                // Codec present: read vendor ID, build widget tree.
                let vendor_id = self.send_verb(addr, 0, VERB_GET_VENDOR_ID)?;
                let codec = self.build_codec(addr, vendor_id)?;
                self.codecs.push(codec);
            }
        }
        Ok(())
    }

    /// Build widget tree for a codec (enumerate all nodes, parse capabilities).
    fn build_codec(&self, addr: u8, vendor_id: u32) -> Result<HdaCodec, HdaError> {
        // Enumeration proceeds in three levels:
        //   1. GET_SUBORDINATE_NODE_COUNT on the root (NID 0) discovers function groups.
        //   2. GET_SUBORDINATE_NODE_COUNT on each function group discovers its widgets.
        //   3. GET_PARAMETER on each widget reads its capabilities.
        let sub = self.send_verb(addr, 0, VERB_GET_SUBORDINATE_NODE_COUNT)?;
        let fg_start = (sub >> 16) as u8;
        let fg_count = (sub & 0xFF) as u8;
        let mut codec = HdaCodec { addr, vendor_id, function_groups: ArrayVec::new() };
        for fg_nid in fg_start..fg_start + fg_count {
            // Each function group: enumerate child widgets.
            let fg_sub = self.send_verb(addr, fg_nid, VERB_GET_SUBORDINATE_NODE_COUNT)?;
            let w_start = (fg_sub >> 16) as u8;
            let w_count = (fg_sub & 0xFF) as u8;
            let mut widgets = ArrayVec::new();
            for w_nid in w_start..w_start + w_count {
                let caps = self.send_verb(addr, w_nid, VERB_GET_PARAMETER(PARAM_AUDIO_WIDGET_CAP))?;
                let wtype = HdaWidgetType::from_caps(caps);
                widgets.push(HdaWidget { nid: w_nid, widget_type: wtype, capabilities: caps });
            }
            codec.function_groups.push(HdaFunctionGroup { nid: fg_nid, widgets });
        }
        Ok(codec)
    }
}
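The bit packing used by send_verb() and the response split used by build_codec() can be checked in isolation. The following is a minimal sketch, not part of the driver: the free functions `encode_corb_command` and `decode_subordinate_nodes` are illustrative names, and the field positions are taken from the comments in the listing above.

```rust
/// Pack a codec address, node ID, and 20-bit verb into one 32-bit CORB entry.
/// Layout (from the listing above): [31:28] codec address, [27:20] NID, [19:0] verb.
pub fn encode_corb_command(codec_addr: u8, nid: u8, verb: u32) -> u32 {
    ((codec_addr as u32 & 0xF) << 28) | ((nid as u32) << 20) | (verb & 0xF_FFFF)
}

/// Split a GET_SUBORDINATE_NODE_COUNT response into
/// (first child NID, number of children), as build_codec() does.
pub fn decode_subordinate_nodes(response: u32) -> (u8, u8) {
    let start = ((response >> 16) & 0xFF) as u8; // bits [23:16]
    let count = (response & 0xFF) as u8;         // bits [7:0]
    (start, count)
}
```

For example, a verb value of 0xF0004 sent to NID 0x14 on codec 2 encodes to 0x214F0004, and a response of 0x00020004 describes four child nodes starting at NID 2.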

DMA buffer descriptor list (BDLIST): HDA uses a scatter-gather DMA model. Each PCM stream has a BDLIST (Buffer Descriptor List) in host memory, containing entries like:

/// HDA Buffer Descriptor List Entry (BDL entry).
#[repr(C)]
pub struct HdaBdlEntry {
    /// Physical address of buffer segment.
    pub addr: u64,
    /// Length of buffer segment in bytes.
    pub length: u32,
    /// IOC (Interrupt On Completion) flag. Bit 0 only; upper 31 bits reserved per HDA
    /// spec §4.4.3 and must be written as zero. Set bit 0 to 1 to generate an interrupt
    /// when this segment completes; set to 0 for no interrupt on this entry.
    pub ioc: u32,
}
// HdaBdlEntry: u64(8) + u32(4) + u32(4) = 16 bytes.
// Hardware-facing struct — HDA controller reads BDL entries via DMA.
const_assert!(core::mem::size_of::<HdaBdlEntry>() == 16);

The HDA controller DMA engine walks the BDLIST, fetching audio data from the buffers, and generates an interrupt when ioc=1 entries complete (every period).

HDA PCM stream state: Each active PCM stream on an HDA controller is represented by HdaPcmStream, which binds a generic PcmStream (Section 21.4) to HDA-specific hardware state:

/// HDA PCM stream — binds a generic PcmStream to HDA controller hardware.
/// One instance per active playback or capture stream on the HDA controller.
/// Referenced by `HdaController.streams` (max `MAX_HDA_STREAMS` = 16 per controller).
pub struct HdaPcmStream {
    /// Parent PCM stream (generic ALSA state: params, DMA buffer, hw_ptr/appl_ptr).
    pub pcm: Arc<PcmStream>,
    /// HDA stream descriptor index (0-based, max 30 per controller).
    /// The HDA spec allocates stream descriptors in MMIO space at offset
    /// 0x80 + (stream_idx * 0x20). Typical controllers expose 4-16 descriptors.
    pub stream_idx: u8,
    /// HDA stream tag (1-15, assigned by the controller driver at stream open time).
    /// The tag is written into the codec's converter widget via the
    /// SET_CHANNEL_STREAMID verb and into the stream descriptor's CTL register.
    /// Tag 0 is reserved (means "stream not running" per HDA spec §3.3.35).
    pub stream_tag: u8,
    /// Buffer Descriptor List (BDL): scatter-gather DMA entries.
    /// Pre-allocated coherent DMA buffer of 32 entries. (The HDA spec permits
    /// up to 256 entries per BDL, and Linux sizes its list accordingly via
    /// `AZX_MAX_BDL_ENTRIES`; UmkaOS caps the list at 32.) Each entry points
    /// to a page-aligned segment of the PCM DMA buffer. The hardware reads
    /// BDL entries sequentially, wrapping at `bdl_count`.
    pub bdl: DmaCoherentBuf<[HdaBdlEntry; 32]>,
    /// Number of active BDL entries (1..=32). Set during hw_params based on
    /// buffer size and page alignment.
    pub bdl_count: u8,
    /// Codec DAC/ADC widget node ID in the HDA codec graph.
    /// Identified during codec probe by walking the widget tree from pin
    /// widgets back to converter widgets.
    pub codec_node: HdaNodeId,
    /// Codec address (0-14) on the HDA link. Combined with `codec_node`
    /// to address verbs for this stream's converter widget.
    pub codec_addr: u8,
    /// Link Position In Buffer register offset (MMIO, per-stream).
    /// Read by the interrupt handler to update `pcm.hw_ptr`. Located at
    /// stream descriptor base + 0x04 (LPIB register, HDA spec §3.3.37).
    pub lpib_offset: usize,
    /// DMA channel assignment (controller-internal; maps to stream descriptor).
    pub dma_channel: u8,
}

/// HDA codec node identifier.
// kernel-internal, not KABI — internal codec addressing.
#[repr(C)]
pub struct HdaNodeId {
    /// Node ID (NID) within the codec (0-127 per HDA spec).
    pub nid: u8,
}

The BDL entries point to segments of the PCM DMA buffer allocated in Section 21.4. At SNDRV_PCM_IOCTL_PREPARE time, the driver populates the BDL: each entry's addr field is set to the physical address of a page-aligned buffer segment, length to the segment size (typically one page = 4096 bytes), and ioc bit 0 is set on the entries that end a period so that the hardware raises an interrupt at each period boundary. The BDL physical address is written to the stream descriptor's BDLPL/BDLPU registers (lower/upper 32 bits), and the index of the last valid entry (bdl_count - 1, not the entry count) to the LVI (Last Valid Index) register. On SNDRV_PCM_IOCTL_START, the driver sets the RUN bit in the stream descriptor's CTL register, and the hardware begins DMA.
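The BDL population step can be sketched as follows. This is an illustration under simplifying assumptions, not the driver's code: the buffer is treated as physically contiguous, `BdlEntry` mirrors HdaBdlEntry, `build_bdl` is a hypothetical helper, and the result is returned in a `Vec` so the sketch runs in userspace. The lower bound of two entries reflects my reading of the HDA spec's minimum list length.

```rust
/// Mirror of HdaBdlEntry for this sketch.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BdlEntry {
    pub addr: u64,
    pub length: u32,
    pub ioc: u32,
}

/// Build a BDL for a buffer of `buffer_bytes` split into `seg_bytes` segments.
/// IOC is set on every entry that ends on a period boundary.
/// Returns the entries and the value to program into LVI (last index).
pub fn build_bdl(
    buf_phys: u64,
    buffer_bytes: u32,
    period_bytes: u32,
    seg_bytes: u32,
) -> Option<(Vec<BdlEntry>, u8)> {
    if seg_bytes == 0 || period_bytes == 0 || buffer_bytes == 0 {
        return None;
    }
    // Segments must tile both the period and the whole buffer exactly.
    if buffer_bytes % seg_bytes != 0 || period_bytes % seg_bytes != 0 {
        return None;
    }
    let count = buffer_bytes / seg_bytes;
    // Assumed limits: at least 2 entries, at most the 32 pre-allocated slots.
    if count < 2 || count > 32 {
        return None;
    }
    let mut entries = Vec::with_capacity(count as usize);
    for i in 0..count {
        let end_offset = (i + 1) * seg_bytes;
        entries.push(BdlEntry {
            addr: buf_phys + (i as u64) * (seg_bytes as u64),
            length: seg_bytes,
            ioc: if end_offset % period_bytes == 0 { 1 } else { 0 },
        });
    }
    // LVI holds the index of the last valid entry, not the entry count.
    Some((entries, (count - 1) as u8))
}
```

With a 16 KiB buffer, 8 KiB periods, and 4 KiB segments, this yields four entries with IOC set on the second and fourth, and LVI = 3.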

21.4.5 USB Audio Class 2.0 Driver Model¶

USB Audio Class (UAC) 2.0 devices are the dominant class of bus-powered and professional USB audio interfaces (studio DACs, microphones, multichannel interfaces). The UAC 2.0 spec (USB Device Class Definition for Audio Devices, Release 2.0) defines isochronous endpoints for PCM streaming and control requests for sample rate, volume, and mute.

Tier assignment: USB Audio drivers run as Tier 1 by default (same as HDA). USB Audio is crash-prone due to the complexity of device-specific quirks and the asynchronous nature of USB transfers. Crash recovery follows the standard driver restart mechanism (Section 11.9), with the added cost of a USB port reset (10–300ms depending on USB version; see Section 21.4).

// umka-usb-audio-driver/src/lib.rs (Tier 1 driver)

/// Maximum isochronous endpoints per USB Audio interface (UAC 2.0 §4.9).
/// Most devices expose 1 playback + 1 capture endpoint; multichannel
/// interfaces may expose up to 4 (e.g., 2 stereo pairs or 1 multichannel).
pub const MAX_UAC_ENDPOINTS: usize = 8;

/// Maximum alternate settings per streaming interface (USB spec §9.6.5).
/// Each altsetting represents a different sample format/rate/channel count.
pub const MAX_UAC_ALTSETTINGS: usize = 16;

/// USB Audio Class 2.0 controller state.
pub struct UacDevice {
    /// USB device handle (from USB core driver framework).
    pub usb_dev: UsbDeviceHandle,
    /// Audio Control (AC) interface number (bInterfaceNumber from
    /// the AC Interface Header Descriptor, UAC 2.0 §4.7.2).
    pub ac_interface: u8,
    /// Audio Streaming (AS) interfaces discovered during probe.
    /// Each AS interface corresponds to one playback or capture endpoint.
    pub as_interfaces: ArrayVec<UacStreamInterface, MAX_UAC_ENDPOINTS>,
    /// Clock source entity ID (from Clock Source Descriptor, UAC 2.0 §4.7.2.1).
    /// Used for sample rate control via SET_CUR/GET_CUR on the Clock Frequency
    /// control (CS = 0x01, CN = 0x01).
    pub clock_source_id: u8,
    /// Device uses asynchronous synchronization (explicit feedback endpoint).
    /// Async mode devices provide a feedback endpoint (UAC 2.0 §3.16.2.2)
    /// that reports the actual sample rate to the host for clock drift correction.
    pub async_mode: bool,
    /// Feedback endpoint address (valid only if `async_mode == true`).
    /// The host reads 10.14 or 16.16 fixed-point feedback values from this
    /// endpoint to adjust the number of samples per microframe.
    pub feedback_ep: Option<u8>,
    /// Device quirks (vendor-specific workarounds).
    pub quirks: UacQuirks,
}

/// Per-streaming-interface state (one per playback or capture endpoint).
pub struct UacStreamInterface {
    /// USB interface number (bInterfaceNumber).
    pub interface_num: u8,
    /// Direction: playback (OUT endpoint) or capture (IN endpoint).
    pub direction: PcmDirection,
    /// Endpoint address (bEndpointAddress from the AS Isochronous
    /// Audio Data Endpoint Descriptor, UAC 2.0 §4.10.1.2).
    pub endpoint_addr: u8,
    /// Alternate settings (each defines a format/rate/channel combination).
    /// Altsetting 0 is always zero-bandwidth (no active streaming).
    pub altsettings: ArrayVec<UacAltsetting, MAX_UAC_ALTSETTINGS>,
    /// Currently selected alternate setting index (0 = idle).
    pub current_altsetting: u8,
}

/// One alternate setting for a USB Audio streaming interface.
pub struct UacAltsetting {
    /// Alternate setting number (bAlternateSetting).
    pub altsetting_num: u8,
    /// Sample format (from Format Type I Descriptor, UAC 2.0 §4.9.2).
    pub format: PcmFormat,
    /// Number of channels (bNrChannels).
    pub channels: u8,
    /// Supported sample rates. UAC 2.0 devices report rates via the
    /// Clock Source's frequency control (GET_RANGE request returns a
    /// list of discrete rates or a continuous range).
    pub sample_rates: ArrayVec<u32, 16>,
    /// Maximum packet size in bytes (wMaxPacketSize from endpoint descriptor).
    /// Determines the URB buffer size for isochronous transfers.
    pub max_packet_size: u16,
    /// Packets per microframe (1, 2, or 3 for high-speed; encoded in
    /// bits [12:11] of wMaxPacketSize). Determines bandwidth reservation.
    pub packets_per_microframe: u8,
}

bitflags! {
    /// Device-specific quirks for USB Audio devices that deviate from the
    /// UAC 2.0 spec. Discovered at probe time via USB VID/PID lookup table.
    pub struct UacQuirks: u32 {
        /// Device reports incorrect clock frequency (apply host-side rate detection).
        const BROKEN_CLOCK       = 0x0001;
        /// Device requires SET_CUR for sample rate before SET_INTERFACE
        /// (the reverse of the default order used by uac_start_streaming).
        const RATE_BEFORE_FORMAT = 0x0002;
        /// Device stalls on GET_RANGE for clock frequency (use fixed rate list).
        const NO_CLOCK_RANGE     = 0x0004;
        /// Feedback endpoint returns values in 10.14 format (USB 2.0 Full Speed)
        /// even on High Speed where 16.16 is expected.
        const FEEDBACK_10_14     = 0x0008;
        /// Device requires explicit clock source selection via
        /// Clock Selector SET_CUR before streaming starts.
        const EXPLICIT_CLOCK_SEL = 0x0010;
    }
}
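The `max_packet_size` and `packets_per_microframe` fields of UacAltsetting both come from the endpoint descriptor's wMaxPacketSize. A minimal sketch of that decoding for high-speed endpoints follows; the function names are illustrative, and the bit positions are those given in the field comments above (bits [10:0] payload size, bits [12:11] additional transactions).

```rust
/// Decode wMaxPacketSize for a high-speed isochronous endpoint.
/// Returns (payload bytes per packet, packets per microframe), or None
/// when bits [12:11] hold the reserved value 0b11.
pub fn decode_max_packet_size(w_max_packet_size: u16) -> Option<(u16, u8)> {
    let payload = w_max_packet_size & 0x07FF;               // bits [10:0]
    let additional = ((w_max_packet_size >> 11) & 0x3) as u8; // bits [12:11]
    if additional == 3 {
        return None;
    }
    Some((payload, additional + 1))
}

/// URB buffer size as described in step 3 of uac_start_streaming:
/// max_packet_size * packets_per_microframe bytes.
pub fn urb_buffer_bytes(w_max_packet_size: u16) -> Option<u32> {
    decode_max_packet_size(w_max_packet_size)
        .map(|(payload, packets)| payload as u32 * packets as u32)
}
```

A descriptor value of 0x1400 decodes to 1024-byte packets, three per microframe, giving a 3072-byte URB buffer.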

Isochronous URB submission: USB Audio streaming uses isochronous USB transfers (guaranteed bandwidth, no retransmission). The driver submits URBs (USB Request Blocks) in a double-buffering pattern: while one URB is being consumed by the hardware, the next is being filled by the host. For playback, the host writes PCM samples into URB buffers; for capture, the host reads samples from completed URBs.

uac_start_streaming(dev: &UacDevice, iface: &UacStreamInterface, params: &PcmParams):
  1. Select alternate setting matching params (format, rate, channels).
     usb_set_interface(dev.usb_dev, iface.interface_num, altsetting_num)
  2. Set sample rate on clock source:
     usb_control_msg(SET_CUR, Clock Frequency Control, clock_source_id, rate)
  3. Allocate URB ring (double-buffer: 2 URBs for low-latency, up to 4 for robustness).
     Each URB buffer = max_packet_size * packets_per_microframe bytes.
     URB buffers allocated via umka_driver_dma_alloc (coherent DMA for USB HCI).
  4. Submit initial URBs to USB HCI (host controller interface).
  5. On URB completion interrupt:
     - Playback: copy next period's samples from PcmStream.dma_buffer to new URB,
       resubmit URB. Update hw_ptr by the number of frames transferred.
     - Capture: copy received samples from completed URB to PcmStream.dma_buffer,
       resubmit URB. Update hw_ptr.
     - If async_mode: read feedback endpoint, adjust samples-per-packet to match
       device clock (prevents drift-induced xruns on long playback sessions).
  6. Wake PipeWire/ALSA waiter via futex on hw_ptr update.

Clock drift correction (async mode): USB Audio async devices have their own crystal oscillator. The host and device clocks drift relative to each other (~50-200 ppm). Without correction, this causes periodic xruns every few minutes. The feedback endpoint reports the device's actual consumption rate as a fixed-point value. The driver adjusts the number of samples per USB microframe (125μs at high speed) to track the device clock: if the device is consuming faster, the driver sends one extra sample per N microframes; if slower, it skips one. This adjustment is invisible to userspace — PipeWire sees a steady hw_ptr advance.
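The per-packet adjustment can be sketched with a fractional accumulator. This is an illustration only: `PacketSizer` and the two conversion functions are hypothetical names, both feedback formats are normalized to 16.16 fixed point, and range checking of the feedback value (which a real driver would need) is omitted.

```rust
/// Convert a 3-byte little-endian 10.14 feedback value to 16.16.
pub fn feedback_10_14_to_q16(raw: [u8; 3]) -> u32 {
    u32::from_le_bytes([raw[0], raw[1], raw[2], 0]) << 2
}

/// Convert a 4-byte little-endian 16.16 feedback value (already 16.16).
pub fn feedback_16_16_to_q16(raw: [u8; 4]) -> u32 {
    u32::from_le_bytes(raw)
}

/// Decides how many frames go into each packet so that the long-run
/// average matches the rate reported by the device.
pub struct PacketSizer {
    rate_q16: u32, // frames per packet, 16.16 fixed point
    acc: u32,      // fractional remainder carried between packets (< 1.0)
}

impl PacketSizer {
    pub fn new(rate_q16: u32) -> Self {
        Self { rate_q16, acc: 0 }
    }

    /// Called when a new feedback value has been read.
    pub fn apply_feedback(&mut self, rate_q16: u32) {
        self.rate_q16 = rate_q16;
    }

    /// Frames to place in the next packet. The fractional part accumulates
    /// and periodically produces one extra frame.
    pub fn next_packet_frames(&mut self) -> u32 {
        let total = self.acc as u64 + self.rate_q16 as u64;
        self.acc = (total & 0xFFFF) as u32;
        (total >> 16) as u32
    }
}
```

With a reported rate of 44.125 frames per packet, the sizer emits seven packets of 44 frames followed by one of 45, so eight packets carry exactly 353 frames.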

21.4.6 HDMI/DP Audio Endpoint Model¶

HDMI and DisplayPort carry audio alongside video. On most x86 systems, HDMI/DP audio appears as a secondary codec on the Intel HDA link (the GPU's HDA controller, separate from the PCH's analog audio HDA controller). On systems with discrete GPUs (NVIDIA, AMD), the GPU exposes its own HDA controller on the PCI bus.

Architecture: HDMI/DP audio is not a separate driver — it is a specialization of the HDA driver model (Section 21.4). The HDA codec probe discovers HDMI/DP pin widgets (widget type = Pin, pin config indicates digital output with HDMI/DP connection type). Each HDMI/DP pin maps to one audio endpoint on a physical connector.

/// HDMI/DP audio endpoint state (extension of HdaPcmStream for digital outputs).
/// One instance per HDMI/DP connector that has an active audio stream.
pub struct HdmiDpAudioEndpoint {
    /// Parent HDA PCM stream (reuses HDA DMA infrastructure).
    pub hda_stream: Arc<HdaPcmStream>,
    /// Pin widget NID for this HDMI/DP output.
    pub pin_nid: u8,
    /// Codec address on the HDA link.
    pub codec_addr: u8,
    /// ELD (EDID-Like Data): audio capabilities reported by the connected
    /// display/AV receiver. Parsed from the ELD buffer obtained via the
    /// GET_HDMI_ELD verb (vendor-specific) or HDA spec standard ELD retrieval.
    /// Contains: supported audio formats, sample rates, channel counts,
    /// speaker allocation, display name.
    pub eld: HdmiEld,
    /// Current audio infoframe (HDMI Audio InfoFrame or DP Secondary Data Packet).
    /// Sent to the sink to describe the active audio stream format.
    /// Updated on stream start and format change.
    pub audio_infoframe: AudioInfoFrame,
    /// Connection state (hot-plug detect status).
    pub connected: AtomicBool,
}

/// ELD (EDID-Like Data) parsed from the connected HDMI/DP sink.
/// Contains the audio capabilities negotiated during display hot-plug.
pub struct HdmiEld {
    /// ELD version (typically 0x02 for CEA-861-D and later).
    pub eld_ver: u8,
    /// Monitor name (from EDID, up to 16 bytes, null-terminated).
    pub monitor_name: [u8; 16],
    /// Number of Short Audio Descriptors (SADs) from the sink's EDID.
    /// Each SAD describes one supported audio format (codec, channels, rates).
    pub sad_count: u8,
    /// Short Audio Descriptors. Maximum 15 per CEA-861 spec.
    pub sads: ArrayVec<ShortAudioDescriptor, 15>,
    /// Speaker Allocation Data Block (from EDID). Bitfield indicating which
    /// speaker positions the sink supports (FL/FR, C, LFE, RL/RR, etc.).
    /// Used to configure channel mapping in the Audio InfoFrame.
    pub speaker_alloc: u8,
}

/// CEA-861 Short Audio Descriptor (3 bytes, parsed from EDID).
pub struct ShortAudioDescriptor {
    /// Audio format code (1 = LPCM, 2 = AC-3, 7 = DTS, 11 = DTS-HD, etc.).
    /// See CEA-861 Table 37.
    pub format_code: u8,
    /// Maximum number of channels minus 1 (0 = mono, 1 = stereo, 7 = 8ch).
    pub max_channels: u8,
    /// Supported sample rates (bitfield: bit 0 = 32kHz, 1 = 44.1kHz,
    /// 2 = 48kHz, 3 = 88.2kHz, 4 = 96kHz, 5 = 176.4kHz, 6 = 192kHz).
    pub sample_rates: u8,
    /// For LPCM: supported bit depths (bit 0 = 16-bit, 1 = 20-bit, 2 = 24-bit).
    /// For compressed formats: maximum bitrate / 8 kbit/s.
    pub format_specific: u8,
}

/// HDMI Audio InfoFrame (CEA-861 §6.6.1) or DP Secondary Data Packet.
/// Describes the audio stream format to the sink.
#[repr(C)]
pub struct AudioInfoFrame {
    /// Coding type (0 = refer to stream header, 1 = IEC 60958 PCM).
    pub coding_type: u8,
    /// Channel count minus 1 (0 = refer to stream header, 1-7 = 2-8 channels).
    pub channel_count: u8,
    /// Sample frequency (0 = refer to stream, 1 = 32kHz, 2 = 44.1kHz, 3 = 48kHz, etc.).
    pub sample_freq: u8,
    /// Sample size (0 = refer to stream, 1 = 16-bit, 2 = 20-bit, 3 = 24-bit).
    pub sample_size: u8,
    /// Channel/speaker allocation (CA field, CEA-861 Table 28).
    /// Determines the mapping of PCM channels to physical speakers.
    pub channel_allocation: u8,
    /// Level shift value (0-15 dB, for downmix).
    pub level_shift: u8,
    /// Downmix inhibit flag.
    pub downmix_inhibit: u8, // 0 = inhibit off, 1 = inhibit on
}
// AudioInfoFrame: u8(1)*7 = 7 bytes.
// Hardware-facing struct — HDMI/DP Audio InfoFrame packet fields.
const_assert!(core::mem::size_of::<AudioInfoFrame>() == 7);
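Each 3-byte Short Audio Descriptor in the ELD maps onto the ShortAudioDescriptor fields above. A minimal parsing sketch follows, assuming the usual CEA-861 byte layout (format code in byte 0 bits [6:3], channel count minus one in bits [2:0], rate bitfield in byte 1). `Sad`, `parse_sad`, and `supports_rate` are illustrative names.

```rust
/// Parsed Short Audio Descriptor (same fields as ShortAudioDescriptor).
#[derive(Debug, PartialEq, Eq)]
pub struct Sad {
    pub format_code: u8,
    pub max_channels: u8, // channel count minus 1
    pub sample_rates: u8, // bitfield, bit 0 = 32 kHz ... bit 6 = 192 kHz
    pub format_specific: u8,
}

/// Parse one raw 3-byte SAD.
pub fn parse_sad(raw: [u8; 3]) -> Sad {
    Sad {
        format_code: (raw[0] >> 3) & 0x0F,
        max_channels: raw[0] & 0x07,
        sample_rates: raw[1] & 0x7F,
        format_specific: raw[2],
    }
}

/// Check whether the sink advertises `rate_hz` in this descriptor.
pub fn supports_rate(sad: &Sad, rate_hz: u32) -> bool {
    const RATES: [u32; 7] = [32_000, 44_100, 48_000, 88_200, 96_000, 176_400, 192_000];
    RATES
        .iter()
        .position(|&r| r == rate_hz)
        .map_or(false, |bit| sad.sample_rates & (1 << bit) != 0)
}
```

The bytes 0x09, 0x07, 0x07 describe stereo LPCM at 32/44.1/48 kHz with 16/20/24-bit depths; a hw_params request for 96 kHz against that descriptor would not match.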

Hot-plug and ELD update: When a display is connected or disconnected, the codec sends an unsolicited response for the HDMI/DP pin widget, which the HDA controller delivers as an interrupt. The driver handles this by:
1. Reading the pin sense register (GET_PIN_SENSE verb) to determine connection state.
2. If connected: reading the ELD buffer from the codec (GET_HDMI_ELD), parsing the sink's audio capabilities, and updating HdmiDpAudioEndpoint.eld.
3. Notifying the ALSA control interface via snd_ctl_notify() (jack detection event), which PipeWire/PulseAudio monitors to update available audio sinks.
4. If disconnected while streaming: stopping the active PCM stream (triggers xrun in the application) and clearing the ELD.

Audio InfoFrame programming: Before starting an HDMI/DP audio stream, the driver programs the Audio InfoFrame via the HDA codec's Digital Converter verb set (SET_DIGI_CONVERT_1/2, vendor-specific InfoFrame verbs). The InfoFrame must match the actual PCM stream format; a mismatch causes the sink to mute or produce noise. The driver validates that the requested format is supported by the sink's ELD (SAD list) before programming the stream — unsupported formats are rejected at hw_params time with EINVAL.
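For HDMI, the unpacked AudioInfoFrame fields have to be packed into the on-wire InfoFrame bytes before they are handed to the codec. The sketch below assumes the CEA-861 Audio InfoFrame layout (type 0x84, version 1, length 10, one checksum byte); `AudioInfoFrameFields` and `pack_hdmi_audio_infoframe` are hypothetical names, and the codec-specific verbs that transfer the bytes are not shown. DisplayPort uses a different transport and is not covered.

```rust
/// Unpacked fields, mirroring the AudioInfoFrame struct above.
pub struct AudioInfoFrameFields {
    pub coding_type: u8,
    pub channel_count: u8, // channel count minus 1
    pub sample_freq: u8,
    pub sample_size: u8,
    pub channel_allocation: u8,
    pub level_shift: u8,
    pub downmix_inhibit: bool,
}

/// Pack into 3 header bytes + checksum + 10 payload bytes.
pub fn pack_hdmi_audio_infoframe(f: &AudioInfoFrameFields) -> [u8; 14] {
    let mut pkt = [0u8; 14];
    pkt[0] = 0x84; // InfoFrame type: Audio
    pkt[1] = 0x01; // version
    pkt[2] = 0x0A; // payload length (10 bytes)
    // pkt[3] is the checksum, filled in last.
    pkt[4] = ((f.coding_type & 0x0F) << 4) | (f.channel_count & 0x07);
    pkt[5] = ((f.sample_freq & 0x07) << 2) | (f.sample_size & 0x03);
    pkt[6] = 0; // format-dependent, zero for PCM
    pkt[7] = f.channel_allocation;
    pkt[8] = ((f.downmix_inhibit as u8) << 7) | ((f.level_shift & 0x0F) << 3);
    // Remaining payload bytes stay zero (reserved).
    // Checksum makes the byte sum of the whole packet zero modulo 256.
    let sum = pkt.iter().fold(0u8, |acc, &b| acc.wrapping_add(b));
    pkt[3] = 0u8.wrapping_sub(sum);
    pkt
}
```

A stereo stream with every other field left at "refer to stream header" packs to a frame beginning 84 01 0A 70 01.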

Multi-display audio: Systems with multiple HDMI/DP outputs (common on discrete GPUs) expose one HdmiDpAudioEndpoint per connector. Each endpoint is independently controllable — different displays can play different audio streams simultaneously. The endpoints share the GPU's HDA controller but use separate stream descriptors.

Cross-references:
- HDA driver model (DMA, BDL, codec verbs): Section 21.4
- AudioDriver trait and KABI contract: Section 13.4
- Jack detection events: Section 21.4
- DMA buffer allocation: Section 4.14

21.4.7 PipeWire Integration¶

Section 21.4 defines PipeWire ring buffers for audio routing in userspace. The integration:
1. The kernel provides raw PCM streams (Section 21.4 PcmStream): a DMA ring buffer that hardware directly reads/writes.
2. PipeWire runs in userspace (Tier 2) and implements the audio graph (mixing, routing, resampling, effects).
3. Zero-copy path: PipeWire's "audio device" node directly mmaps the kernel PCM DMA buffer. PipeWire writes mixed samples into the buffer at appl_ptr and advances the pointer; the kernel driver sees the update and programs the hardware to consume up to appl_ptr.

Low-latency timer: PipeWire needs a periodic callback to refill the buffer every period. The kernel provides a timer (HPET or TSC-deadline APIC timer, configured to fire every period_frames / rate seconds, e.g., 1ms for 48-frame periods at 48kHz). Timer interrupt wakes PipeWire, which renders the next period's samples.
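The timer interval follows directly from the period size and sample rate. A small sketch of that arithmetic, with `period_ns` as an illustrative helper name:

```rust
/// Timer interval for one period, in nanoseconds:
/// period_frames / rate_hz seconds, rounded down.
/// Returns None for a zero sample rate.
pub fn period_ns(period_frames: u32, rate_hz: u32) -> Option<u64> {
    if rate_hz == 0 {
        return None;
    }
    Some(period_frames as u64 * 1_000_000_000 / rate_hz as u64)
}
```

For 48-frame periods at 48 kHz this gives 1,000,000 ns (1 ms), matching the example above; 256-frame periods at 44.1 kHz give roughly 5.8 ms.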

21.4.8 Character Device Registration¶

ALSA devices register with the VFS character device subsystem (Section 14.5) during audio subsystem init. Linux assigns a single well-known major:

Major	Minor range	Device nodes	Description
116	0 + 32×C	`/dev/snd/controlC{C}`	Mixer/control per card
116	1	`/dev/snd/seq`	MIDI sequencer
116	33	`/dev/snd/timer`	ALSA timer (`SNDRV_MINOR_TIMER = 33`)
116	4 + 32×C + D	`/dev/snd/hwC{C}D{D}`	Hardware-specific access (`SNDRV_MINOR_HWDEP = 4`)
116	16 + 32×C + D	`/dev/snd/pcmC{C}D{D}p`	PCM playback
116	24 + 32×C + D	`/dev/snd/pcmC{C}D{D}c`	PCM capture

Where C = card index (0–7 in static mode) and D = device index (0–3 for hwdep, 0–7 for PCM playback and capture). General minor formula: minor = base_offset + 32 * card_index + device_index, where base_offset is 0 (control), 4 (hwdep), 16 (PCM playback), or 24 (PCM capture). Special devices: seq = 1, timer = 33 (both card-independent). This formula matches the static minor scheme in Linux sound/core/sound.c (snd_find_free_minor() with CONFIG_SND_DYNAMIC_MINORS=n). Each card occupies a stride of 32 minors, so static mode supports at most 8 cards (minors 0-255, SNDRV_OS_MINORS=256). UmkaOS uses the static formula by default for deterministic minor assignment. Dynamic mode (CONFIG_SND_DYNAMIC_MINORS=y), which supports more than 8 cards (up to 32), is available as a boot option for larger configurations; the chrdev region reserves 1024 minors (32 cards × 32 minors) to cover it.

/// Called from snd_subsystem_init() during boot Phase 5.3+ (after Tier 1 driver loading).
fn snd_register_chrdev() {
    register_chrdev_region(ChrdevRegion {
        major: 116,
        minor_base: 0,
        minor_count: 1024,  // 32 cards × 32 minors per card (CONFIG_SND_MAX_CARDS=32)
        fops: &SND_FOPS,
        name: "snd",
    }).expect("ALSA major 116 registration");
}

devtmpfs node creation: When a sound card driver registers via snd_card_register(), the ALSA core calls devtmpfs_create_node() for each PCM, control, and hwdep device, which triggers devtmpfs to create the corresponding /dev/snd/* nodes automatically. The nodes are removed when the card is unregistered (driver unload or device removal). No udev rule is required for basic node creation — devtmpfs handles it in-kernel.

SND_FOPS.open() decodes the minor number to determine the device type (control, PCM, sequencer, timer, hwdep) and dispatches to the appropriate subsystem handler. Per-card devices are looked up via minor / 32 for the card index and minor % 32 for the device type within the card.
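The decode step can be sketched as below. This is a userspace illustration, not the SND_FOPS implementation: `SndNode` and `decode_minor` are hypothetical names, the per-type slot ranges are inferred from the base offsets in the table (hwdep 4-7, PCM playback 16-23, PCM capture 24-31), and slots the table does not list are reported as unknown.

```rust
/// Device type selected by a minor number under major 116.
#[derive(Debug, PartialEq, Eq)]
pub enum SndNode {
    Control { card: u8 },
    Sequencer,
    Timer,
    Hwdep { card: u8, dev: u8 },
    PcmPlayback { card: u8, dev: u8 },
    PcmCapture { card: u8, dev: u8 },
}

/// Decode a static-scheme minor number.
pub fn decode_minor(minor: u32) -> Option<SndNode> {
    // Card-independent special devices come first: they would otherwise
    // be mistaken for slot 1 of card 0 and card 1.
    match minor {
        1 => return Some(SndNode::Sequencer),
        33 => return Some(SndNode::Timer),
        _ => {}
    }
    let card = u8::try_from(minor / 32).ok()?;
    let slot = (minor % 32) as u8;
    match slot {
        0 => Some(SndNode::Control { card }),
        4..=7 => Some(SndNode::Hwdep { card, dev: slot - 4 }),
        16..=23 => Some(SndNode::PcmPlayback { card, dev: slot - 16 }),
        24..=31 => Some(SndNode::PcmCapture { card, dev: slot - 24 }),
        _ => None, // slot not listed in the table above
    }
}
```

Minor 83 (16 + 32×2 + 3), for instance, resolves to PCM playback device 3 on card 2, which corresponds to /dev/snd/pcmC2D3p.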

21.4.9 ALSA PCM Compatibility Ioctls¶

The umka-sysapi layer translates Linux ALSA PCM ioctls on /dev/snd/pcmC*D*p and /dev/snd/pcmC*D*c file descriptors to native UmkaOS audio calls. All ioctl numbers use the Linux encoding: magic 'A' (0x41), with _IO, _IOR, _IOW, _IOWR direction/size encoding. The following table lists the ioctls that umka-sysapi must handle for ALSA application compatibility (PipeWire, PulseAudio, JACK, aplay/arecord):

Ioctl	Macro	Nr	Dir	Description
`SNDRV_PCM_IOCTL_PVERSION`	`_IOR('A', 0x00, int)`	0x00	R	Protocol version
`SNDRV_PCM_IOCTL_INFO`	`_IOR('A', 0x01, snd_pcm_info)`	0x01	R	Stream info (card, device, subdevice, name)
`SNDRV_PCM_IOCTL_TSTAMP`	`_IOW('A', 0x02, int)`	0x02	W	Set timestamp mode (deprecated; use TTSTAMP)
`SNDRV_PCM_IOCTL_TTSTAMP`	`_IOW('A', 0x03, int)`	0x03	W	Set timestamp type (monotonic, monotonic_raw)
`SNDRV_PCM_IOCTL_HW_REFINE`	`_IOWR('A', 0x10, snd_pcm_hw_params)`	0x10	RW	Refine hardware parameter space (intersection)
`SNDRV_PCM_IOCTL_HW_PARAMS`	`_IOWR('A', 0x11, snd_pcm_hw_params)`	0x11	RW	Set hardware parameters (format, rate, channels, buffer size)
`SNDRV_PCM_IOCTL_HW_FREE`	`_IO('A', 0x12)`	0x12	—	Free hardware resources (DMA buffer)
`SNDRV_PCM_IOCTL_SW_PARAMS`	`_IOWR('A', 0x13, snd_pcm_sw_params)`	0x13	RW	Set software parameters (avail_min, start_threshold, stop_threshold)
`SNDRV_PCM_IOCTL_STATUS`	`_IOR('A', 0x20, snd_pcm_status)`	0x20	R	Get stream status (state, hw_ptr, tstamp, delay)
`SNDRV_PCM_IOCTL_DELAY`	`_IOR('A', 0x21, snd_pcm_sframes_t)`	0x21	R	Get current delay in frames
`SNDRV_PCM_IOCTL_HWSYNC`	`_IO('A', 0x22)`	0x22	—	Synchronize hw_ptr with hardware
`SNDRV_PCM_IOCTL_SYNC_PTR`	`_IOWR('A', 0x23, snd_pcm_sync_ptr)`	0x23	RW	Sync hw_ptr/appl_ptr (mmap mode; combined status+control update)
`SNDRV_PCM_IOCTL_CHANNEL_INFO`	`_IOR('A', 0x32, snd_pcm_channel_info)`	0x32	R	Per-channel mmap offset/stride info
`SNDRV_PCM_IOCTL_PREPARE`	`_IO('A', 0x40)`	0x40	—	Prepare stream for playback/capture (reset pointers)
`SNDRV_PCM_IOCTL_RESET`	`_IO('A', 0x41)`	0x41	—	Reset stream (stop + clear buffer)
`SNDRV_PCM_IOCTL_START`	`_IO('A', 0x42)`	0x42	—	Start DMA (begin playback/capture)
`SNDRV_PCM_IOCTL_DROP`	`_IO('A', 0x43)`	0x43	—	Stop immediately (discard pending frames)
`SNDRV_PCM_IOCTL_DRAIN`	`_IO('A', 0x44)`	0x44	—	Stop after all pending data played/captured
`SNDRV_PCM_IOCTL_PAUSE`	`_IOW('A', 0x45, int)`	0x45	W	Pause/resume (arg: 1=pause, 0=resume)
`SNDRV_PCM_IOCTL_REWIND`	`_IOW('A', 0x46, snd_pcm_uframes_t)`	0x46	W	Rewind appl_ptr by N frames
`SNDRV_PCM_IOCTL_RESUME`	`_IO('A', 0x47)`	0x47	—	Resume from suspend (power management)
`SNDRV_PCM_IOCTL_XRUN`	`_IO('A', 0x48)`	0x48	—	Force xrun state (testing)
`SNDRV_PCM_IOCTL_FORWARD`	`_IOW('A', 0x49, snd_pcm_uframes_t)`	0x49	W	Advance appl_ptr by N frames
`SNDRV_PCM_IOCTL_WRITEI_FRAMES`	`_IOW('A', 0x50, snd_xferi)`	0x50	W	Write interleaved frames (non-mmap path)
`SNDRV_PCM_IOCTL_READI_FRAMES`	`_IOR('A', 0x51, snd_xferi)`	0x51	R	Read interleaved frames (non-mmap path)
`SNDRV_PCM_IOCTL_WRITEN_FRAMES`	`_IOW('A', 0x52, snd_xfern)`	0x52	W	Write non-interleaved frames
`SNDRV_PCM_IOCTL_READN_FRAMES`	`_IOR('A', 0x53, snd_xfern)`	0x53	R	Read non-interleaved frames
`SNDRV_PCM_IOCTL_LINK`	`_IOW('A', 0x60, int)`	0x60	W	Link two PCM streams (synchronized start/stop)
`SNDRV_PCM_IOCTL_UNLINK`	`_IO('A', 0x61)`	0x61	—	Unlink PCM streams

All struct sizes in the ioctl encoding match the Linux sizeof() on the target architecture (LP64 for 64-bit, ILP32 for 32-bit). The umka-sysapi 32-bit compat layer handles compat_ioctl translation for snd_pcm_hw_params, snd_pcm_sw_params, snd_pcm_status, and snd_pcm_sync_ptr (which contain pointer-sized fields that differ between 32-bit and 64-bit ABIs).
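Because the struct size is part of the ioctl number, a size mismatch changes the number itself. The sketch below reproduces the encoding under the asm-generic layout (2 direction bits, 14 size bits, 8 type bits, 8 number bits), which is an assumption that holds for x86 and ARM; some architectures use different field widths. The `ioc` helper and constants are illustrative names.

```rust
pub const IOC_NONE: u32 = 0;
pub const IOC_WRITE: u32 = 1;
pub const IOC_READ: u32 = 2;

/// Build an ioctl request number: dir[31:30] | size[29:16] | type[15:8] | nr[7:0].
pub const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | ((size & 0x3FFF) << 16) | ((ty as u32) << 8) | nr as u32
}

/// _IO('A', 0x40)
pub const SNDRV_PCM_IOCTL_PREPARE: u32 = ioc(IOC_NONE, b'A', 0x40, 0);
/// _IOR('A', 0x00, int)
pub const SNDRV_PCM_IOCTL_PVERSION: u32 = ioc(IOC_READ, b'A', 0x00, 4);
/// _IOWR('A', 0x11, snd_pcm_hw_params), 608 bytes on LP64
pub const SNDRV_PCM_IOCTL_HW_PARAMS_LP64: u32 = ioc(IOC_READ | IOC_WRITE, b'A', 0x11, 608);
/// _IOWR('A', 0x13, snd_pcm_sw_params), 136 bytes on LP64
pub const SNDRV_PCM_IOCTL_SW_PARAMS_LP64: u32 = ioc(IOC_READ | IOC_WRITE, b'A', 0x13, 136);
```

Under these assumptions HW_PARAMS encodes to 0xC2604111 on a 64-bit target; the 32-bit variant, with a 604-byte struct, encodes to a different number, which is why the compat layer must translate it explicitly.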

snd_pcm_hw_params ABI struct — the primary parameter negotiation struct used by HW_REFINE and HW_PARAMS. Must match Linux exactly (608 bytes on 64-bit, 604 bytes on 32-bit due to snd_pcm_uframes_t = unsigned long).

Size derivation (64-bit):
- flags: 4
- masks[3] + mres[5]: 8 × 32 = 256
- intervals[12] + ires[9]: 21 × 12 = 252
- rmask..rate_den: 6 × 4 = 24
- fifo_size: 8 (unsigned long on LP64)
- sync[16] + reserved[48]: 64
- Total: 608 (compile-time assert: assert!(size_of::<SndPcmHwParams>() == 608))

/// Linux ABI: include/uapi/sound/asound.h
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SndPcmHwParams>() == 608);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(core::mem::size_of::<SndPcmHwParams>() == 604);
// kernel-internal, not KABI
#[repr(C)]
pub struct SndPcmHwParams {
    pub flags:       u32,                   // 4 bytes
    /// Bitmask parameters: ACCESS (0), FORMAT (1), SUBFORMAT (2).
    /// Index = SNDRV_PCM_HW_PARAM_x - SNDRV_PCM_HW_PARAM_FIRST_MASK.
    pub masks:       [SndMask; 3],          // 3 × 32 = 96 bytes
    /// Reserved masks for future mask-type parameters.
    pub mres:        [SndMask; 5],          // 5 × 32 = 160 bytes
    /// Interval parameters: SAMPLE_BITS (8) through TICK_TIME (19).
    /// Index = SNDRV_PCM_HW_PARAM_x - SNDRV_PCM_HW_PARAM_FIRST_INTERVAL.
    pub intervals:   [SndInterval; 12],     // 12 × 12 = 144 bytes
    /// Reserved intervals for future interval-type parameters.
    pub ires:        [SndInterval; 9],      // 9 × 12 = 108 bytes
    /// Request mask: which params the caller wants to set.
    pub rmask:       u32,                   // 4 bytes
    /// Changed mask: which params were actually changed by REFINE.
    pub cmask:       u32,                   // 4 bytes
    /// Info flags (SNDRV_PCM_INFO_*).
    pub info:        u32,                   // 4 bytes
    /// Most significant bits of sample (for formats < 32 bits).
    pub msbits:      u32,                   // 4 bytes
    /// Rate numerator (for exact rational rates).
    pub rate_num:    u32,                   // 4 bytes
    /// Rate denominator.
    pub rate_den:    u32,                   // 4 bytes
    /// Hardware FIFO size in frames.
    pub fifo_size:   usize,                 // snd_pcm_uframes_t (8 on LP64, 4 on ILP32)
    /// Hardware synchronization ID (shared across linked streams).
    pub sync:        [u8; 16],              // 16 bytes
    pub _reserved:   [u8; 48],              // 48 bytes
}

/// Bitmask type for format/access/subformat masks (SNDRV_MASK_MAX = 256 bits).
/// Size: 32 bytes.
#[repr(C)]
pub struct SndMask {
    pub bits: [u32; 8], // (SNDRV_MASK_MAX + 31) / 32 = 8
}
// SndMask: [u32;8] = 32 bytes. Userspace ABI sub-struct within SndPcmHwParams.
const_assert!(core::mem::size_of::<SndMask>() == 32);

/// Interval constraint: [min, max] with openmin/openmax/integer/empty bitflags.
/// Size: 12 bytes (NOT 16 — Linux uses C bitfields, not a separate flags word).
///
/// The bitfield word packs four single-bit flags into one u32:
///   bit 0: openmin (min is exclusive)
///   bit 1: openmax (max is exclusive)
///   bit 2: integer (only integer values allowed)
///   bit 3: empty   (interval is empty / no valid values)
#[repr(C)]
pub struct SndInterval {
    pub min:     u32,
    pub max:     u32,
    /// Packed bitflags: openmin(0), openmax(1), integer(2), empty(3).
    /// Only the low 4 bits are meaningful; upper 28 bits are padding
    /// (matching the C bitfield layout where the compiler packs four
    /// 1-bit fields into a single 32-bit storage unit).
    pub flags:   u32,
}
// SndInterval: u32(4)*3 = 12 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SndInterval>() == 12);
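How the packed flags and mask words are interpreted can be sketched with a few helpers. These are illustrative only: `Interval`, `contains`, `mask_set`, and `mask_test` are hypothetical names, and the flag bit positions follow the SndInterval comment above (which assumes the little-endian bitfield layout).

```rust
pub const OPENMIN: u32 = 1 << 0;
pub const OPENMAX: u32 = 1 << 1;
pub const INTEGER: u32 = 1 << 2;
pub const EMPTY: u32 = 1 << 3;

/// Same layout as SndInterval.
#[repr(C)]
pub struct Interval {
    pub min: u32,
    pub max: u32,
    pub flags: u32,
}

impl Interval {
    /// True if `v` lies inside the interval, honoring open bounds.
    pub fn contains(&self, v: u32) -> bool {
        if self.flags & EMPTY != 0 {
            return false;
        }
        let above = if self.flags & OPENMIN != 0 { v > self.min } else { v >= self.min };
        let below = if self.flags & OPENMAX != 0 { v < self.max } else { v <= self.max };
        above && below
    }
}

/// Set bit `val` in a 256-bit mask (same layout as SndMask.bits).
/// Values of 256 or more are ignored.
pub fn mask_set(bits: &mut [u32; 8], val: u32) {
    if val < 256 {
        bits[(val / 32) as usize] |= 1 << (val % 32);
    }
}

/// Test bit `val` in a 256-bit mask.
pub fn mask_test(bits: &[u32; 8], val: u32) -> bool {
    val < 256 && bits[(val / 32) as usize] & (1 << (val % 32)) != 0
}
```

A rate interval of [8000, 192000) with OPENMAX set accepts 8000 and 48000 but rejects 192000.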

snd_pcm_sw_params ABI struct — software parameter control:

// Userspace ABI struct — copied to/from userspace via SNDRV_PCM_IOCTL_SW_PARAMS.
// Matches Linux `struct snd_pcm_sw_params` layout from `include/uapi/sound/asound.h`.
// The kernel MUST zero the struct before filling and copying to userspace to prevent
// information disclosure through implicit padding bytes.
#[repr(C)]
pub struct SndPcmSwParams {
    pub tstamp_mode:      i32,     // SNDRV_PCM_TSTAMP_NONE/ENABLE
    pub period_step:      u32,     // step between periods (usually 1)
    pub sleep_min:        u32,     // deprecated, must be 0
    // Explicit padding: `#[repr(C)]` alignment rules insert 4 bytes between
    // `sleep_min` (u32, offset 8, ending at 12) and `avail_min` (usize, offset 16 on LP64).
    // Making this explicit prevents information disclosure of uninitialized
    // kernel memory when the struct is copied to userspace.
    #[cfg(target_pointer_width = "64")]
    pub _pad0:            [u8; 4], // offset 12-15 (LP64 only)
    pub avail_min:        usize,   // min frames avail before wakeup
    pub xfer_align:       usize,   // deprecated, must be 0
    pub start_threshold:  usize,   // frames written before auto-start
    pub stop_threshold:   usize,   // frames available before auto-stop (xrun)
    pub silence_threshold: usize,  // silence frames threshold
    pub silence_size:     usize,   // silence fill size
    pub boundary:         usize,   // ring buffer boundary (buffer_size * n)
    pub proto:            u32,     // protocol version
    pub tstamp_type:      u32,     // SNDRV_PCM_TSTAMP_TYPE_*
    pub _reserved:        [u8; 56],
}
// SndPcmSwParams: Userspace ABI struct (SNDRV_PCM_IOCTL_SW_PARAMS).
// Seven usize fields: avail_min, xfer_align, start_threshold, stop_threshold,
// silence_threshold, silence_size, boundary.
// 64-bit: i32(4) + u32(4) + u32(4) + _pad0(4) + usize(8)*7 + u32(4) + u32(4) + [u8;56] = 136.
// 32-bit: i32(4) + u32(4) + u32(4) + usize(4)*7 + u32(4) + u32(4) + [u8;56] = 104.
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SndPcmSwParams>() == 136);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(core::mem::size_of::<SndPcmSwParams>() == 104);

21.4.10 Jack Detection¶

HDA codecs support unsolicited responses (jack detection events): when a headphone is plugged/unplugged, the codec sends an event to the controller.

impl HdaController {
    /// Enable unsolicited response for a pin widget (jack detection).
    pub fn enable_jack_detect(&self, codec_addr: u8, pin_nid: u8) -> Result<(), HdaError> {
        // Send SET_UNSOLICITED_ENABLE verb to pin widget.
        // SET_UNSOLICITED_ENABLE (verb 0x708): bit 7 = enable, bits [5:0] = tag (bit 6 is reserved).
        // NID is encoded by send_verb() into CORB bits [27:20]; do NOT embed it in the verb payload.
        let verb = VERB_SET_UNSOLICITED_ENABLE | (1 << 7); // enable=1, tag=0
        self.send_verb(codec_addr, pin_nid, verb)?;
        Ok(())
    }

    /// Handle unsolicited response interrupt (jack detection event).
    pub fn handle_unsolicited_response(&self, codec_addr: u8, response: u32) {
        // Parse response: extract pin NID, jack state (connected/disconnected).
        let pin_nid = (response >> 4) & 0xFF;
        let connected = (response & 0x1) != 0;

        // Post event to userspace via event ring buffer.
        umka_event::post_event(Event::AudioJackChanged {
            device_id: self.device_id(),
            codec_addr,
            pin_nid: pin_nid as u8,
            connected,
        });
    }
}
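The bit layout described in the comments above can be sketched as plain bit arithmetic. This is an illustration under stated assumptions, not the `send_verb()` implementation: the function names are hypothetical, and the actual value and encoding of `VERB_SET_UNSOLICITED_ENABLE` in UmkaOS are not shown in this section. The field positions follow the HDA specification: codec address in CORB bits [31:28], NID in [27:20], a 12-bit verb ID in [19:8], and an 8-bit payload in [7:0]. An unsolicited response carries the tag that was programmed at enable time in bits [31:26].

```rust
/// Verb ID for "Set Unsolicited Response" (12-bit verb, 8-bit payload).
pub const VERB_ID_SET_UNSOL: u32 = 0x708;

/// Payload byte: bit 7 enables reporting, bits [5:0] carry the tag.
pub fn unsol_payload(enable: bool, tag: u8) -> u8 {
    ((enable as u8) << 7) | (tag & 0x3F)
}

/// Assemble a CORB command word from its four fields (hypothetical helper).
pub fn corb_command(codec_addr: u8, nid: u8, verb_id: u32, payload: u8) -> u32 {
    ((codec_addr as u32 & 0xF) << 28)
        | ((nid as u32) << 20)
        | ((verb_id & 0xFFF) << 8)
        | payload as u32
}

/// Extract the 6-bit tag from an unsolicited response word.
pub fn unsol_response_tag(response: u32) -> u8 {
    (response >> 26) as u8
}
```

For example, enabling reporting with tag 5 on NID 0x21 of codec 0 yields the command word `0x0217_0885`. Assigning a distinct non-zero tag per pin is one way a handler could tell which widget raised the event; whether UmkaOS does so is an assumption of this sketch.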

Audio routing policy: Default device selection, per-app routing, and volume control are handled by PipeWire in userspace. The kernel provides the DMA ring buffers and jack detection events.

21.4.11 Architectural Decision¶

Audio: Native UmkaOS framework + ALSA compat

The kernel provides a native PCM interface with a clean ABI. umka-sysapi translates ALSA ioctls to native calls, so existing applications (PipeWire, PulseAudio, JACK) work unmodified. This gives a clean kernel API together with full userspace compatibility.

21.4.12 ALSA MIDI Sequencer¶

The ALSA sequencer provides a kernel-internal MIDI event bus. Applications connect ports and route MIDI events between synthesizers, hardware MIDI interfaces, and software instruments. It is distinct from raw MIDI device I/O, which goes through raw devices such as /dev/snd/midiC0D0.

21.4.12.1 Architecture¶

┌────────────────────────────────────────────────────────┐
│                   snd_seq Core                         │
│                                                        │
│  Clients:  [app A]  [app B]  [snd_seq_dummy]  [hw]    │
│               │        │           │            │      │
│  Ports:    [128:0]  [129:0]     [14:0]      [20:0]    │
│               │        │           │            │      │
│  Subscriptions (routing graph — many-to-many)          │
│               └────────┴───────────┘────────────┘      │
│  Queues:   [Q0: real-time]  [Q1: MIDI tick-based]      │
│               │                    │                   │
│  Timer:    snd_hrtimer (CLOCK_MONOTONIC)                │
└────────────────────────────────────────────────────────┘
        ↕ /dev/snd/seq

21.4.12.2 Data Structures¶

/// Maximum MIDI ports per sequencer client (port IDs 0-255).
pub const SEQ_MAX_PORTS_PER_CLIENT: usize = 256;

/// MIDI event FIFO depth per sequencer client output queue.
/// 256 events × ~28 bytes each ≈ 7 KB per client — fixed, no heap allocation.
pub const SEQ_CLIENT_FIFO_DEPTH: usize = 256;

/// ALSA sequencer client (one per application or hardware source).
///
/// The `RingBuf<SeqEvent, SEQ_CLIENT_FIFO_DEPTH>` type is defined in Section 11.5
/// (umka-driver-sdk ring buffer). If not yet imported in this context, add
/// `use umka_driver_sdk::ring::RingBuf;`.
pub struct SeqClient {
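    /// Client number. Linux-compatible numbering, consistent with the diagram
    /// above: 0-15 = system/kernel clients (snd_seq_dummy is client 14),
    /// 16-127 = kernel driver clients, 128-191 = userspace clients.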
    pub client_id:   u8,
    /// Client type.
    pub type_:       SeqClientType,
    /// Client name (for display in aconnect etc.).
    pub name:        [u8; 64],
    /// Port table indexed directly by port ID (0-255). O(1) access by port_id.
    /// Option<Arc> allows sparse allocation — clients need not use all 256 ports.
    /// **Size**: 256 × 8 bytes = 2048 bytes inline. This is acceptable because
    /// SeqClient is heap-allocated (Arc<SeqClient>) — it is NOT a stack variable.
    /// The inline array avoids a second heap allocation and pointer indirection
    /// on every port lookup (hot path for MIDI event routing).
    pub ports:       [Option<Arc<SeqPort>>; SEQ_MAX_PORTS_PER_CLIENT],
    /// Output event ring buffer (kernel→client direction). Fixed-size, no heap allocation.
    /// When full, new events are dropped and `lost` is incremented.
    pub fifo:        Mutex<RingBuf<SeqEvent, SEQ_CLIENT_FIFO_DEPTH>>,
    /// Count of dropped events due to full FIFO. Monotonically increasing.
    pub lost:        AtomicU64,
}

pub enum SeqClientType {
    /// Kernel client (e.g., hardware MIDI driver, snd_seq_dummy).
    Kernel,
    /// Userspace application connected via /dev/snd/seq.
    User,
}

/// A subscription connecting a sender port to a receiver port.
/// Created by `SNDRV_SEQ_IOCTL_SUBSCRIBE_PORT`.
pub struct SeqSubscription {
    /// Sender port address (client_id, port_id).
    pub sender: SeqAddr,
    /// Destination port address.
    pub dest: SeqAddr,
    /// Subscription flags (e.g., exclusive, timestamp).
    pub flags: u32,
}

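/// Sender/destination address for ALSA sequencer subscriptions and events.
/// Matches Linux `struct snd_seq_addr` (2 bytes). `#[repr(C)]` is required
/// because the type is embedded in the `SeqEvent` ABI struct.
#[repr(C)]
pub struct SeqAddr {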
    pub client_id: u8,
    pub port_id: u8,
}

/// Maximum subscriptions per sequencer port direction (read or write).
/// 64 is sufficient because: ALSA sequencer ports in practice have ≤10
/// subscriptions (typically 1-3 for a MIDI instrument chain). The bound
/// prevents unbounded heap allocation under the RwLock — subscription
/// add/remove is a warm path (user-initiated connect/disconnect), not a
/// hot path, so the ArrayVec overhead is negligible. If a client attempts
/// to exceed 64 subscriptions on a single port, `snd_seq_subscribe_port()`
/// returns `-ENOSPC`.
pub const MAX_SUBS_PER_PORT: usize = 64;

/// ALSA sequencer port.
pub struct SeqPort {
    pub port_id:     u8,
    pub client_id:   u8,
    pub name:        [u8; 64],
    /// Port capability flags.
    pub capability:  SeqPortCapability,
    /// Port type flags.
    pub type_:       SeqPortType,
    /// Subscriber list: ports that send TO this port (WRITE direction).
    pub write_subs:  RwLock<ArrayVec<SeqSubscription, MAX_SUBS_PER_PORT>>,
    /// Subscriber list: ports this port sends TO (READ direction).
    pub read_subs:   RwLock<ArrayVec<SeqSubscription, MAX_SUBS_PER_PORT>>,
    /// Per-port kernel client callback (for kernel clients).
    pub kernel_fn:   Option<fn(port: &SeqPort, event: &SeqEvent)>,
}

bitflags! {
    pub struct SeqPortCapability: u32 {
        const READ        = 1 << 0; // Other ports may receive from this port
        const WRITE       = 1 << 1; // Other ports may send to this port
        const SYNC_READ   = 1 << 2; // Obsolete
        const SYNC_WRITE  = 1 << 3; // Obsolete
        const DUPLEX      = 1 << 4; // Full-duplex port
        const SUBS_READ   = 1 << 5; // Read subscriptions to this port are allowed
        const SUBS_WRITE  = 1 << 6; // Write subscriptions to this port are allowed
        const NO_EXPORT   = 1 << 7; // Third-party clients may not create subscriptions (routing not allowed)
    }
}

bitflags! {
    pub struct SeqPortType: u32 {
        const SPECIFIC    = 1 << 0; // Hardware-specific (not a standard MIDI port)
        const MIDI_GENERIC = 1 << 1; // Standard MIDI port
        const MIDI_GM     = 1 << 2; // General MIDI compatible
        const MIDI_GS     = 1 << 3; // Roland GS compatible
        const MIDI_XG     = 1 << 4; // Yamaha XG compatible
        const MIDI_MT32   = 1 << 5; // Roland MT-32 compatible
        const MIDI_GM2    = 1 << 6; // General MIDI 2 compatible
        const SYNTH       = 1 << 10; // Software synthesizer
        const DIRECT_SAMPLE = 1 << 11; // Sampling synthesizer
        const SAMPLE      = 1 << 12; // Sample player
        const HARDWARE    = 1 << 16; // Hardware port (MIDI interface)
        const SOFTWARE    = 1 << 17; // Software port (application)
        const SYNTHESIZER = 1 << 18; // Synthesizer
        const PORT        = 1 << 19; // Port connector (MIDI port on a hardware device)
        const APPLICATION = 1 << 20; // Application (sequencer, arpeggiator, etc.)
    }
}
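The drop-on-full behaviour documented for `SeqClient::fifo` and `SeqClient::lost` can be sketched as follows. This is only an illustration: `ClientFifo` is a hypothetical name, and a `VecDeque` with a manual capacity check stands in for the fixed-size `RingBuf` from the driver SDK so that the example runs on a hosted toolchain.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Same depth as SEQ_CLIENT_FIFO_DEPTH in the text.
pub const FIFO_DEPTH: usize = 256;

/// Bounded per-client event FIFO with a monotonically increasing drop counter.
pub struct ClientFifo<T> {
    queue: Mutex<VecDeque<T>>,
    lost: AtomicU64,
}

impl<T> ClientFifo<T> {
    pub fn new() -> Self {
        Self {
            queue: Mutex::new(VecDeque::with_capacity(FIFO_DEPTH)),
            lost: AtomicU64::new(0),
        }
    }

    /// Enqueue an event. When the FIFO is full the event is dropped,
    /// `lost` is incremented, and `false` is returned.
    pub fn push(&self, ev: T) -> bool {
        let mut q = self.queue.lock().unwrap();
        if q.len() >= FIFO_DEPTH {
            self.lost.fetch_add(1, Ordering::Relaxed);
            return false;
        }
        q.push_back(ev);
        true
    }

    /// Dequeue the oldest event, if any.
    pub fn pop(&self) -> Option<T> {
        self.queue.lock().unwrap().pop_front()
    }

    /// Number of events dropped so far.
    pub fn lost(&self) -> u64 {
        self.lost.load(Ordering::Relaxed)
    }
}
```

Dropping the newest event rather than overwriting the oldest keeps already-queued events in order, which matters for paired messages such as note-on followed by note-off.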

21.4.12.3 MIDI Event¶

/// ALSA sequencer event (matches struct snd_seq_event, 28 bytes).
#[repr(C)]
pub struct SeqEvent {
    /// Event type (see SeqEventType enum).
    pub type_:   u8,
    /// Flags: timestamp format, data format.
    pub flags:   u8,
    /// Tag (for application use).
    pub tag:     u8,
    /// Queue ID (for scheduled events; SNDRV_SEQ_QUEUE_DIRECT = 253 for immediate).
    pub queue:   u8,
    /// Timestamp (union: tick or real-time depending on flags).
    pub time:    SeqTimestamp,
    /// Source port (client_id, port_id).
    pub source:  SeqAddr,
    /// Destination port (client_id, port_id; client_id = SNDRV_SEQ_ADDRESS_SUBSCRIBERS = 254 delivers to all subscribers).
    pub dest:    SeqAddr,
    /// Event data (union of MIDI event types).
    pub data:    SeqEventData,
}
const_assert!(core::mem::size_of::<SeqEvent>() == 28);

/// Timestamp union (8 bytes). Matches Linux `union snd_seq_timestamp`.
#[repr(C)]
pub union SeqTimestamp {
    /// MIDI tick timestamp (SNDRV_SEQ_TIME_STAMP_TICK flag).
    pub tick: u32,
    /// Real-time timestamp (SNDRV_SEQ_TIME_STAMP_REAL flag).
    pub time: SeqRealTime,
}

#[repr(C)]
pub struct SeqRealTime {
    pub tv_sec:  u32,
    pub tv_nsec: u32,
}
// SeqRealTime: u32(4)*2 = 8 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqRealTime>() == 8);

/// Sequencer event data union (12 bytes, matching Linux `union snd_seq_event_data`).
/// All variants are exactly 12 bytes or smaller (padded to 12 by the union).
/// The active variant is determined by `SeqEvent.type_`.
#[repr(C)]
pub union SeqEventData {
    /// Note events: NOTE_ON, NOTE_OFF, KEYPRESS, NOTE.
    pub note:    SeqEvNote,       // 8 bytes (padded to 12 by union)
    /// Control events: CONTROLLER, PGMCHANGE, PITCHBEND, CHANPRESS.
    pub control: SeqEvCtrl,       // 12 bytes
    /// Raw 8-bit data (inline SYSEX, up to 12 bytes).
    pub raw8:    SeqEvRaw8,       // 12 bytes
    /// Raw 32-bit data.
    pub raw32:   SeqEvRaw32,      // 12 bytes
    /// Extended data pointer (SYSEX > 12 bytes, bounce-buffered events).
    pub ext:     SeqEvExt,        // 12 bytes (packed: len(4) + addr(8))
    /// Queue control: start/stop/tempo/position.
    pub queue:   SeqEvQueue,      // 12 bytes
    /// Address (for announce events: CLIENT_START, PORT_START, etc.).
    pub addr:    SeqAddr,         // 2 bytes (padded to 12 by union)
    /// Port subscription: PORT_SUBSCRIBED, PORT_UNSUBSCRIBED.
    pub connect: SeqEvConnect,    // 4 bytes (padded to 12 by union)
    /// Result/echo: ECHO, OSS, RESULT events.
    pub result:  SeqEvResult,     // 8 bytes (padded to 12 by union)
}
const_assert!(core::mem::size_of::<SeqEventData>() == 12);

/// Note event data (8 bytes; union pads to 12).
/// Matches Linux `struct snd_seq_ev_note`.
#[repr(C)]
pub struct SeqEvNote {
    /// MIDI channel (0-15).
    pub channel:      u8,
    /// Note number (0-127).
    pub note:         u8,
    /// Velocity (0-127; NOTE_OFF: release velocity).
    pub velocity:     u8,
    /// Off-velocity (for compound NOTE event; ignored for NOTE_ON/NOTE_OFF).
    pub off_velocity: u8,
    /// Duration in ticks (for compound NOTE event; ignored for NOTE_ON/NOTE_OFF).
    pub duration:     u32,
}
// SeqEvNote: u8(1)*4 + u32(4) = 8 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvNote>() == 8);

/// Control change event data (12 bytes).
/// Matches Linux `struct snd_seq_ev_ctrl`.
#[repr(C)]
pub struct SeqEvCtrl {
    /// MIDI channel (0-15).
    pub channel: u8,
    /// Reserved padding (3 bytes).
    pub _pad:    [u8; 3],
    /// Controller number (CC# for CONTROLLER), program number (PGMCHANGE), etc.
    pub param:   u32,
    /// Controller value; pitch bend range is -8192..+8191.
    pub value:   i32,
}
// SeqEvCtrl: u8(1) + [u8;3](3) + u32(4) + i32(4) = 12 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvCtrl>() == 12);

/// Raw 8-bit event data (12 bytes).
/// Matches Linux `struct snd_seq_ev_raw8`.
#[repr(C)]
pub struct SeqEvRaw8 {
    /// Raw byte data.
    pub d: [u8; 12],
}
// SeqEvRaw8: [u8;12] = 12 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvRaw8>() == 12);

/// Raw 32-bit event data (12 bytes).
/// Matches Linux `struct snd_seq_ev_raw32`.
#[repr(C)]
pub struct SeqEvRaw32 {
    /// Raw 32-bit words.
    pub d: [u32; 3],
}
// SeqEvRaw32: [u32;3] = 12 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvRaw32>() == 12);

/// Extended event data pointer (12 bytes, packed).
/// Matches Linux `struct snd_seq_ev_ext` (`__attribute__((packed))`).
/// Used for SYSEX messages and other variable-length data that exceeds
/// the 12-byte inline limit. The kernel copies the extended data into a
/// bounce buffer; `addr` points to kernel memory (not directly to userspace).
// kernel-internal, not KABI
#[repr(C, packed)]
pub struct SeqEvExt {
    /// Length of extended data in bytes.
    pub len: u32,
    /// Address of extended data buffer (kernel address).
    /// Uses usize to match Linux `void *ptr` — 8 bytes on 64-bit, 4 bytes
    /// on 32-bit. SeqEvExt is 12 bytes on 64-bit (4+8), 8 bytes on 32-bit
    /// (4+4); the union is always 12 bytes from other larger variants.
    pub addr: usize,
}
// SeqEvExt (packed): u32(4) + usize. 64-bit: 12 bytes; 32-bit: 8 bytes.
// Userspace ABI sub-struct (Linux __attribute__((packed))).
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SeqEvExt>() == 12);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(core::mem::size_of::<SeqEvExt>() == 8);

/// Queue control event data (12 bytes).
/// Matches Linux `struct snd_seq_ev_queue_control`.
/// Used for queue start/stop/continue, tempo changes, and position updates.
#[repr(C)]
pub struct SeqEvQueue {
    /// Affected queue ID.
    pub queue:    u8,
    /// Reserved padding.
    pub _pad:     [u8; 3],
    /// Parameter value union (8 bytes). Interpretation depends on event type:
    /// - TEMPO: `value` = microseconds per quarter note
    /// - SETPOS_TICK: `position` = tick position
    /// - SETPOS_TIME: `time` = real-time position
    /// - QUEUE_SKEW: `skew` = skew value/base pair
    pub param:    SeqQueueParam,
}
// SeqEvQueue: u8(1) + [u8;3](3) + SeqQueueParam(8) = 12 bytes.
// Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvQueue>() == 12);

/// Queue control parameter union (8 bytes).
/// Matches the inner union of Linux `struct snd_seq_ev_queue_control.param`.
#[repr(C)]
pub union SeqQueueParam {
    /// Affected value (e.g., tempo in microseconds per quarter note).
    pub value:    i32,
    /// Timestamp (for position set operations).
    pub time:     SeqTimestamp,
    /// Sync position in ticks.
    pub position: u32,
    /// Queue skew (value/base pair for tempo scaling).
    pub skew:     SeqQueueSkew,
    /// Raw access (two 32-bit words).
    pub d32:      [u32; 2],
    /// Raw access (eight bytes).
    pub d8:       [u8; 8],
}

/// Queue skew parameters (8 bytes).
/// Matches Linux `struct snd_seq_queue_skew`.
#[repr(C)]
pub struct SeqQueueSkew {
    /// Skew numerator.
    pub value: u32,
    /// Skew denominator.
    pub base:  u32,
}
// SeqQueueSkew: u32(4)*2 = 8 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqQueueSkew>() == 8);

/// Port connection event data (4 bytes; union pads to 12).
/// Matches Linux `struct snd_seq_connect`.
/// Used for PORT_SUBSCRIBED and PORT_UNSUBSCRIBED events.
#[repr(C)]
pub struct SeqEvConnect {
    /// Sender address (client, port).
    pub sender: SeqAddr,
    /// Destination address (client, port).
    pub dest:   SeqAddr,
}
// SeqEvConnect: SeqAddr(2)*2 = 4 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvConnect>() == 4);

/// Result/echo event data (8 bytes; union pads to 12).
/// Matches Linux `struct snd_seq_result`.
/// Used for ECHO (loopback timing measurement) and RESULT (operation outcome) events.
#[repr(C)]
pub struct SeqEvResult {
    /// Processed event type (the original event type that produced this result).
    pub event:  i32,
    /// Result code (0 = success, negative = error).
    pub result: i32,
}
// SeqEvResult: i32(4)*2 = 8 bytes. Userspace ABI sub-struct.
const_assert!(core::mem::size_of::<SeqEvResult>() == 8);
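To make the 28-byte layout concrete, the sketch below fills a raw event buffer for a direct (unscheduled) note-on addressed to all subscribers. It is an illustration only: `encode_note_on` is a hypothetical helper, and the constants are the Linux values (`SNDRV_SEQ_EVENT_NOTEON` = 6, `SNDRV_SEQ_QUEUE_DIRECT` = 253, `SNDRV_SEQ_ADDRESS_SUBSCRIBERS` = 254, `SNDRV_SEQ_ADDRESS_UNKNOWN` = 253). Byte offsets follow the field order of `SeqEvent`: header at 0..4, timestamp at 4..12, source at 12..14, destination at 14..16, data union at 16..28.

```rust
pub const EVENT_NOTEON: u8 = 6;
pub const QUEUE_DIRECT: u8 = 253;
pub const ADDRESS_SUBSCRIBERS: u8 = 254;
pub const ADDRESS_UNKNOWN: u8 = 253;

/// Build the wire image of a direct note-on event (hypothetical helper).
pub fn encode_note_on(
    src_client: u8,
    src_port: u8,
    channel: u8,
    note: u8,
    velocity: u8,
) -> [u8; 28] {
    let mut ev = [0u8; 28];
    ev[0] = EVENT_NOTEON; // type
    // ev[1] = flags and ev[2] = tag stay zero.
    ev[3] = QUEUE_DIRECT; // deliver immediately, no queue
    // ev[4..12] is the timestamp union; unused for direct delivery.
    ev[12] = src_client;
    ev[13] = src_port;
    ev[14] = ADDRESS_SUBSCRIBERS; // destination client: all subscribers
    ev[15] = ADDRESS_UNKNOWN; // destination port is not used in this mode
    ev[16] = channel & 0x0F; // SeqEvNote.channel
    ev[17] = note & 0x7F; // SeqEvNote.note
    ev[18] = velocity & 0x7F; // SeqEvNote.velocity
    // ev[19] off_velocity and ev[20..24] duration are ignored for NOTE_ON;
    // ev[24..28] is union padding.
    ev
}
```

Writing the buffer byte by byte sidesteps unions entirely, which makes it a convenient cross-check for the `const_assert!` sizes above.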

21.4.12.4 Event Types¶

Key event types (SNDRV_SEQ_EVENT_*):

Type	Value	Description
NOTE_ON	6	Note On (channel, note, velocity)
NOTE_OFF	7	Note Off (channel, note, velocity)
KEYPRESS	8	Key Pressure / Aftertouch
CONTROLLER	10	Control Change (CC# 0-127)
PGMCHANGE	11	Program Change
CHANPRESS	12	Channel Pressure
PITCHBEND	13	Pitch Bend (-8192 to +8191)
SONGPOS	20	Song Position Pointer
SONGSEL	21	Song Select
QFRAME	22	MIDI Quarter Frame (MTC)
START	30	MIDI Start
CONTINUE	31	MIDI Continue
STOP	32	MIDI Stop
CLOCK	36	MIDI Clock
RESET	41	Reset to power-on state
SENSING	42	Active Sensing
ECHO	50	Echo back to sender (for timing measurement)
PORT_SUBSCRIBED	66	Port subscription created
PORT_UNSUBSCRIBED	67	Port subscription deleted
SYSEX	130	System Exclusive (extended data format)
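As an illustration of how raw MIDI bytes relate to the event numbers in the table, the sketch below maps a channel-voice status byte to its sequencer event type and converts the two 7-bit pitch-bend data bytes into the signed value stored in `SeqEvCtrl.value`. The function names are hypothetical, and running status, system messages, and the "note-on with velocity 0" convention are deliberately left out.

```rust
/// Map a MIDI channel-voice status byte to (event type, channel).
/// Returns None for data bytes and for system messages (0xF0-0xFF).
pub fn classify_status(status: u8) -> Option<(u8, u8)> {
    let event = match status & 0xF0 {
        0x80 => 7,  // NOTE_OFF
        0x90 => 6,  // NOTE_ON
        0xA0 => 8,  // KEYPRESS
        0xB0 => 10, // CONTROLLER
        0xC0 => 11, // PGMCHANGE
        0xD0 => 12, // CHANPRESS
        0xE0 => 13, // PITCHBEND
        _ => return None,
    };
    Some((event, status & 0x0F))
}

/// Convert the 14-bit pitch-bend value (LSB first on the wire) to the
/// signed range -8192..=8191, where 0 means "no bend".
pub fn pitch_bend_value(lsb: u8, msb: u8) -> i32 {
    (((msb as i32 & 0x7F) << 7) | (lsb as i32 & 0x7F)) - 8192
}
```

The centre position on the wire is LSB 0x00, MSB 0x40, which this conversion maps to 0.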

21.4.12.5 Queues and Timers¶

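/// A scheduled event in a sequencer queue. Events are kept ordered by
/// timestamp so the queue can dispatch the earliest event first.
pub struct ScheduledEvent {
    /// Delivery timestamp (ticks or nanoseconds depending on queue mode).
    pub timestamp: u64,
    /// The sequencer event payload.
    pub event: SeqEvent,
    /// Destination port address for delivery.
    pub dest: SeqAddr,
}
// Ordering and equality consider the timestamp only. `PartialOrd` requires
// `PartialEq`, so both are implemented.
impl PartialEq for ScheduledEvent {
    fn eq(&self, other: &Self) -> bool {
        self.timestamp == other.timestamp
    }
}
impl PartialOrd for ScheduledEvent {
    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
        // Ascending order: earlier timestamps sort first, matching the sorted
        // `ArrayVec` in `SeqQueue` (earliest event at index 0).
        self.timestamp.partial_cmp(&other.timestamp)
    }
}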

/// Maximum scheduled events per sequencer queue. MIDI workloads rarely
/// schedule more than a few hundred events ahead; 1024 is generous.
/// Events beyond this limit are dropped with an ENOMEM error to the client.
pub const SEQ_QUEUE_MAX_EVENTS: usize = 1024;

/// Sequencer queue (schedules events for future delivery).
pub struct SeqQueue {
    pub queue_id:  u8,
    /// Queue owner client (only owner can start/stop/set tempo).
    pub owner:     u8,
    /// Running state.
    pub running:   AtomicBool,
    /// Tempo in microseconds per quarter note (default 500000 = 120 BPM).
    pub tempo_us:  AtomicU32,
    /// Timing resolution in pulses per quarter note (PPQ).
    pub ppq:       u32,  // Pulses Per Quarter note (default 96)
    /// Current position in ticks.
    pub tick:      AtomicU64,
    /// Current real-time position.
    pub real_time: AtomicU64, // nanoseconds
    /// Scheduled events, kept sorted by timestamp (earliest first).
    ///
    /// Uses a fixed-capacity `ArrayVec` backing to avoid heap allocation on
    /// the event push path. `BinaryHeap` allocates from the heap on every
    /// `push()` that triggers a grow, which is unacceptable under `Mutex`
    /// on the sequencer's warm path. The `ArrayVec` is sorted manually:
    /// insert via binary search + shift, extract-min from position 0.
    /// The capacity `SEQ_QUEUE_MAX_EVENTS` (1024) bounds memory to
    /// ~40 KiB per queue (40 bytes per `ScheduledEvent` on 64-bit:
    /// u64(8) + SeqEvent(28) + SeqAddr(2), rounded up to 8-byte alignment).
    // O(N) insert/extract is acceptable for MIDI event rates (~1K events/sec).
    // A custom binary min-heap on ArrayVec backing would give O(log N) but
    // adds complexity for negligible gain at this scale.
    pub events:    Mutex<ArrayVec<ScheduledEvent, SEQ_QUEUE_MAX_EVENTS>>,
    /// hrtimer for next scheduled event.
    pub timer:     HrTimer,
}

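Queue start, stop, and continue: Linux clients request these by sending SNDRV_SEQ_EVENT_START, SNDRV_SEQ_EVENT_STOP, and SNDRV_SEQ_EVENT_CONTINUE events to the system timer port (client 0, port 0), which is what alsa-lib's snd_seq_start_queue() and its siblings do, so this path must be honored for unmodified applications. The SNDRV_SEQ_IOCTL_START_QUEUE, SNDRV_SEQ_IOCTL_STOP_QUEUE, and SNDRV_SEQ_IOCTL_CONTINUE_QUEUE ioctls have no counterpart in Linux's asequencer.h. Tempo changes use SNDRV_SEQ_IOCTL_SET_QUEUE_TEMPO.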
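The sorted-insert scheme described for `SeqQueue::events`, together with the tick-to-time relation implied by `tempo_us` and `ppq`, can be sketched as follows. This is a simplified illustration: `Scheduled`, `SortedQueue`, and `ticks_to_ns` are hypothetical names, a `Vec` with a capacity check stands in for `ArrayVec`, and the payload is reduced to a `u32`.

```rust
pub const QUEUE_MAX_EVENTS: usize = 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scheduled {
    pub timestamp: u64,
    pub payload: u32,
}

/// Bounded queue kept in ascending timestamp order.
pub struct SortedQueue {
    events: Vec<Scheduled>,
}

impl SortedQueue {
    pub fn new() -> Self {
        Self { events: Vec::with_capacity(QUEUE_MAX_EVENTS) }
    }

    /// Binary-search the insertion point, then shift. Events with equal
    /// timestamps keep their arrival order. A full queue rejects the event.
    pub fn insert(&mut self, ev: Scheduled) -> Result<(), Scheduled> {
        if self.events.len() >= QUEUE_MAX_EVENTS {
            return Err(ev);
        }
        let idx = self.events.partition_point(|e| e.timestamp <= ev.timestamp);
        self.events.insert(idx, ev);
        Ok(())
    }

    /// Remove and return the earliest event (index 0).
    pub fn pop_earliest(&mut self) -> Option<Scheduled> {
        if self.events.is_empty() {
            None
        } else {
            Some(self.events.remove(0))
        }
    }
}

/// Duration of `ticks` MIDI ticks in nanoseconds. Assumes `ppq` is non-zero
/// and that the product does not overflow u64.
pub fn ticks_to_ns(ticks: u64, tempo_us: u32, ppq: u32) -> u64 {
    ticks * tempo_us as u64 * 1_000 / ppq as u64
}
```

With the defaults from the struct (500000 µs per quarter note, 96 PPQ), one tick lasts about 5.2 ms and 96 ticks last exactly 0.5 s.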

21.4.12.6 /dev/snd/seq Interface¶

ioctls on /dev/snd/seq (one fd per client):

ioctl	Description
`SNDRV_SEQ_IOCTL_PVERSION`	Get sequencer version
`SNDRV_SEQ_IOCTL_CLIENT_ID`	Get caller's client ID
`SNDRV_SEQ_IOCTL_SYSTEM_INFO`	Get max_queues, max_clients, max_ports, max_channels
`SNDRV_SEQ_IOCTL_CREATE_PORT`	Create a new port
`SNDRV_SEQ_IOCTL_DELETE_PORT`	Delete a port
`SNDRV_SEQ_IOCTL_GET_PORT_INFO`	Get port info (name, capability, type)
`SNDRV_SEQ_IOCTL_SET_PORT_INFO`	Set port info
`SNDRV_SEQ_IOCTL_SUBSCRIBE_PORT`	Create subscription (routing)
`SNDRV_SEQ_IOCTL_UNSUBSCRIBE_PORT`	Remove subscription
`SNDRV_SEQ_IOCTL_CREATE_QUEUE`	Create event queue
`SNDRV_SEQ_IOCTL_DELETE_QUEUE`	Delete queue
`SNDRV_SEQ_IOCTL_GET_QUEUE_STATUS`	Get queue running state
`SNDRV_SEQ_IOCTL_GET_QUEUE_TEMPO`	Get BPM/PPQ
`SNDRV_SEQ_IOCTL_SET_QUEUE_TEMPO`	Set BPM/PPQ
`SNDRV_SEQ_IOCTL_START_QUEUE`	Start queue timer
`SNDRV_SEQ_IOCTL_STOP_QUEUE`	Stop queue timer
`SNDRV_SEQ_IOCTL_CONTINUE_QUEUE`	Continue queue from pause
`SNDRV_SEQ_IOCTL_RUNNING_MODE`	Toggle real-time vs tick scheduling
`SNDRV_SEQ_IOCTL_GET_CLIENT_INFO`	Get client metadata
`SNDRV_SEQ_IOCTL_SET_CLIENT_INFO`	Set client name etc.

Read/write on the fd: each read() returns one or more SeqEvent structs; write() sends events to destination ports immediately (queue=SNDRV_SEQ_QUEUE_DIRECT) or schedules them (queue=Q0/Q1 with timestamp). O_NONBLOCK supported.

21.4.12.7 snd_seq_dummy — Loopback Client¶

snd_seq_dummy creates one kernel client (client 14, "Midi Through") with two ports: port 0 (writable by apps, readable by output devices) and port 1 (reverse). All events written to port 0 are echoed back to all subscribers of port 0. This provides a software MIDI loopback for virtual instruments.

21.4.12.8 Linux Compatibility¶

/dev/snd/seq character device (major 116, minor 1): same as Linux ALSA
ioctl codes identical to Linux ALSA include/uapi/sound/asequencer.h
struct snd_seq_event binary layout identical
aconnect(1), aplaymidi(1), aseqdump(1) work without modification
JACK and PipeWire MIDI ports connect via snd_seq (JACK uses seq_midi_event translation)
Timidity++, FluidSynth, and other software synthesizers use /dev/snd/seq directly

21.4.13 ALSA Timer Interface¶

The ALSA timer interface (/dev/snd/timer, major 116, minor 33) provides high-resolution timer services to userspace audio applications. Required by MIDI sequencer clients for tempo-accurate event scheduling and by some audio frameworks for synchronized clock sources.

Ioctls (magic 'T', matching Linux include/uapi/sound/asound.h):

Ioctl	Nr	Direction	Description
`SNDRV_TIMER_IOCTL_PVERSION`	0x00	R	Protocol version (`u32`)
`SNDRV_TIMER_IOCTL_NEXT_DEVICE`	0x01	RW	Enumerate next timer device
`SNDRV_TIMER_IOCTL_GINFO`	0x03	RW	Get timer general info
`SNDRV_TIMER_IOCTL_GPARAMS`	0x04	W	Set timer general parameters
`SNDRV_TIMER_IOCTL_GSTATUS`	0x05	RW	Get timer general status
`SNDRV_TIMER_IOCTL_SELECT`	0x10	W	Select timer by ID
`SNDRV_TIMER_IOCTL_INFO`	0x11	R	Get selected timer info
`SNDRV_TIMER_IOCTL_PARAMS`	0x12	W	Set selected timer parameters
`SNDRV_TIMER_IOCTL_STATUS`	0x14	R	Get selected timer status (64-bit)
`SNDRV_TIMER_IOCTL_START`	0xA0	None	Start selected timer
`SNDRV_TIMER_IOCTL_STOP`	0xA1	None	Stop selected timer
`SNDRV_TIMER_IOCTL_CONTINUE`	0xA2	None	Continue (unpause) timer
`SNDRV_TIMER_IOCTL_PAUSE`	0xA3	None	Pause timer
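The Nr and Direction columns combine with the magic byte and the argument size into the 32-bit request code that userspace passes to ioctl(2). The sketch below reproduces that encoding for three of the timer ioctls. It assumes the generic Linux `_IOC` layout (direction in bits [31:30], size in [29:16], magic in [15:8], number in [7:0]) used on x86, ARM, and RISC-V; a few architectures such as PowerPC and MIPS use a different layout. The constant and function names are hypothetical.

```rust
const IOC_NONE: u32 = 0;
const IOC_WRITE: u32 = 1;
const IOC_READ: u32 = 2;

/// Generic `_IOC(dir, type, nr, size)` encoding.
pub const fn ioc(dir: u32, magic: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | ((size & 0x3FFF) << 16) | ((magic as u32) << 8) | nr as u32
}

/// _IOR('T', 0x00, int): the argument is a 4-byte integer.
pub const TIMER_PVERSION: u32 = ioc(IOC_READ, b'T', 0x00, 4);
/// _IOWR('T', 0x01, struct snd_timer_id): the argument is the 20-byte SndTimerId.
pub const TIMER_NEXT_DEVICE: u32 = ioc(IOC_READ | IOC_WRITE, b'T', 0x01, 20);
/// _IO('T', 0xa0): no argument.
pub const TIMER_START: u32 = ioc(IOC_NONE, b'T', 0xA0, 0);
```

Because the argument size is part of the request code, any ABI struct whose size differs from its Linux counterpart produces a different code and the ioctl is not recognized, which is why the size assertions in this chapter matter.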

/// Timer device identifier. Matches Linux `struct snd_timer_id`.
#[repr(C)]
pub struct SndTimerId {
    /// Device class (SNDRV_TIMER_CLASS_*).
    pub dev_class: i32,
    /// Device subclass (SNDRV_TIMER_SCLASS_*).
    pub dev_sclass: i32,
    /// Card number (-1 for global timers).
    pub card: i32,
    /// Device number (timer index within card).
    pub device: i32,
    /// Subdevice number.
    pub subdevice: i32,
}
// SndTimerId: i32(4)*5 = 20 bytes. Userspace ABI struct (ALSA timer ioctl).
const_assert!(core::mem::size_of::<SndTimerId>() == 20);

/// Timer general info. Matches Linux `struct snd_timer_ginfo`.
// kernel-internal, not KABI
#[repr(C)]
pub struct SndTimerGinfo {
    /// Timer identifier (input).
    pub tid: SndTimerId,
    /// Timer flags (output).
    pub flags: u32,
    /// Card number (output).
    pub card: i32,
    /// Timer ID string (output, NUL-terminated).
    pub id: [u8; 64],
    /// Timer name string (output, NUL-terminated).
    pub name: [u8; 80],
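    /// Reserved (`unsigned long` in Linux, hence pointer-width).
    pub reserved0: usize,
    /// Resolution in nanoseconds (output; `unsigned long` in Linux).
    pub resolution: usize,
    /// Minimum resolution in nanoseconds (output; `unsigned long` in Linux).
    pub resolution_min: usize,
    /// Maximum resolution in nanoseconds (output; `unsigned long` in Linux).
    pub resolution_max: usize,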
    /// Number of active clients (output).
    pub clients: u32,
    /// Reserved for future use.
    pub reserved: [u8; 32],
}
// SndTimerGinfo: SndTimerId(20) + u32(4) + i32(4) + [u8;64] + [u8;80] + 4pad
//   + 8-byte pointer-width fields * 4 + u32(4) + [u8;32] + 4pad_trailing = 248 bytes on 64-bit.
// Userspace ABI struct (SNDRV_TIMER_IOCTL_GINFO).
#[cfg(target_pointer_width = "64")]
const _: () = assert!(core::mem::size_of::<SndTimerGinfo>() == 248);

/// Timer device class constants.
pub const SNDRV_TIMER_CLASS_NONE: i32 = -1;
pub const SNDRV_TIMER_CLASS_SLAVE: i32 = 0;
pub const SNDRV_TIMER_CLASS_GLOBAL: i32 = 1;
pub const SNDRV_TIMER_CLASS_CARD: i32 = 2;
pub const SNDRV_TIMER_CLASS_PCM: i32 = 3;

/// Global timer device IDs.
pub const SNDRV_TIMER_GLOBAL_SYSTEM: i32 = 0;
pub const SNDRV_TIMER_GLOBAL_RTC: i32 = 1;
pub const SNDRV_TIMER_GLOBAL_HPET: i32 = 2;
pub const SNDRV_TIMER_GLOBAL_HRTIMER: i32 = 3;

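read(2) returns SndTimerRead or SndTimerTread events. As in Linux, the timestamped SndTimerTread format is selected with SNDRV_TIMER_IOCTL_TREAD, and the filter field of SNDRV_TIMER_IOCTL_PARAMS selects which event types are delivered in that mode. Events report timer ticks with nanosecond resolution for tempo synchronization.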

21.4.14 ALSA Hardware-Dependent (hwdep) Interface¶

The ALSA hwdep interface (/dev/snd/hwC{C}D{D}, minor = 4 + 32*C + D) provides hardware-specific access for firmware upload, DSP programming, and direct hardware register access. Used by audio devices with proprietary DSP firmware (e.g., USB audio DSP devices, HD Audio codecs with firmware patches).
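The minor-number formula can be written out as a small helper. This is a sketch under assumptions borrowed from Linux's static minor layout, not a statement about UmkaOS limits: at most 4 hwdep devices per card (minors 4-7 within each 32-minor block, since raw MIDI starts at offset 8) and at most 8 cards (256 minors / 32). The function names are hypothetical.

```rust
pub const MINOR_HWDEP_BASE: u32 = 4;
pub const MINORS_PER_CARD: u32 = 32;
pub const MAX_HWDEP_PER_CARD: u32 = 4; // assumption: Linux static layout
pub const MAX_CARDS: u32 = 8; // assumption: 256 minors / 32 per card

/// minor = 4 + 32*C + D, or None when the indices fall outside the
/// assumed static layout.
pub fn hwdep_minor(card: u32, device: u32) -> Option<u32> {
    if card >= MAX_CARDS || device >= MAX_HWDEP_PER_CARD {
        return None;
    }
    Some(MINOR_HWDEP_BASE + MINORS_PER_CARD * card + device)
}

/// Device node path following the hwC{C}D{D} naming pattern.
pub fn hwdep_path(card: u32, device: u32) -> String {
    format!("/dev/snd/hwC{}D{}", card, device)
}
```

For card 1, device 2 this gives minor 38 and the path `/dev/snd/hwC1D2`.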

Ioctls (magic 'H', matching Linux include/uapi/sound/asound.h):

Ioctl	Nr	Direction	Description
`SNDRV_HWDEP_IOCTL_PVERSION`	0x00	R	Protocol version (`u32`)
`SNDRV_HWDEP_IOCTL_INFO`	0x01	R	Get device info (`SndHwdepInfo`)
`SNDRV_HWDEP_IOCTL_DSP_STATUS`	0x02	R	Get DSP load status
`SNDRV_HWDEP_IOCTL_DSP_LOAD`	0x03	W	Upload firmware to DSP

/// Hwdep device info. Matches Linux `struct snd_hwdep_info`.
#[repr(C)]
pub struct SndHwdepInfo {
    /// Device index within card.
    pub device: u32,
    /// Card number.
    pub card: i32,
    /// Hwdep device ID string (NUL-terminated).
    pub id: [u8; 64],
    /// Hwdep device name string (NUL-terminated).
    pub name: [u8; 80],
    /// Interface type (SNDRV_HWDEP_IFACE_*).
    pub iface: i32,
    /// Reserved.
    pub reserved: [u8; 64],
}
// SndHwdepInfo: u32(4) + i32(4) + [u8;64] + [u8;80] + i32(4) + [u8;64] = 220 bytes.
// Userspace ABI struct (SNDRV_HWDEP_IOCTL_INFO).
const_assert!(core::mem::size_of::<SndHwdepInfo>() == 220);

/// DSP firmware image header. Matches Linux `struct snd_hwdep_dsp_image`
/// (in `include/uapi/sound/asound.h`).
///
/// Linux uses `unsigned char __user *image`, `size_t length`, and
/// `unsigned long driver_data` — all pointer-width types. UmkaOS uses
/// `usize` to match: on 64-bit systems these are 8 bytes, on 32-bit
/// systems 4 bytes. The ioctl number encodes the struct size, so the
/// size MUST match the target architecture exactly.
///
/// Verified against torvalds/linux master `include/uapi/sound/asound.h`.
#[repr(C)]
pub struct SndHwdepDspImage {
    /// DSP block index (for multi-part firmware).
    pub index: u32,
    /// Firmware image name (NUL-terminated, for diagnostics).
    pub name: [u8; 64],
    // On 64-bit, `#[repr(C)]` inserts 4 bytes of padding here: `index` and
    // `name` end at offset 68, and `image` is 8-byte aligned (offset 72).
    // On 32-bit there is no padding because offset 68 is already 4-byte aligned.
    /// Pointer to firmware data in userspace (`unsigned char __user *`).
    /// Copied via `copy_from_user` — never dereferenced directly.
    pub image: usize,
    /// Length of firmware data in bytes (`size_t`).
    pub length: usize,
    /// Driver-specific flags (`unsigned long`).
    pub driver_data: usize,
}
// Size depends on pointer width:
//   64-bit: 4 (index) + 64 (name) + 4 (padding) + 8 (image) + 8 (length) + 8 (driver_data) = 96
//   32-bit: 4 (index) + 64 (name) + 4 (image) + 4 (length) + 4 (driver_data) = 80
#[cfg(target_pointer_width = "64")]
const_assert!(core::mem::size_of::<SndHwdepDspImage>() == 96);
#[cfg(target_pointer_width = "32")]
const_assert!(core::mem::size_of::<SndHwdepDspImage>() == 80);

/// Hwdep interface type constants.
pub const SNDRV_HWDEP_IFACE_OPL2: i32 = 0;
pub const SNDRV_HWDEP_IFACE_OPL3: i32 = 1;
pub const SNDRV_HWDEP_IFACE_OPL4: i32 = 2;
pub const SNDRV_HWDEP_IFACE_SB16CSP: i32 = 3;
pub const SNDRV_HWDEP_IFACE_EMU10K1: i32 = 4;
pub const SNDRV_HWDEP_IFACE_USX2Y: i32 = 10;
pub const SNDRV_HWDEP_IFACE_EMUX_WAVETABLE: i32 = 11;
pub const SNDRV_HWDEP_IFACE_BLUETOOTH: i32 = 12;
pub const SNDRV_HWDEP_IFACE_FW_DICE: i32 = 18;
pub const SNDRV_HWDEP_IFACE_FW_FIREWORKS: i32 = 19;
pub const SNDRV_HWDEP_IFACE_FW_BEBOB: i32 = 20;
pub const SNDRV_HWDEP_IFACE_FW_OXFW: i32 = 21;
pub const SNDRV_HWDEP_IFACE_FW_DIGI00X: i32 = 22;
pub const SNDRV_HWDEP_IFACE_FW_TASCAM: i32 = 23;
pub const SNDRV_HWDEP_IFACE_FW_MOTU: i32 = 25;
pub const SNDRV_HWDEP_IFACE_FW_FIREFACE: i32 = 26;

The read(2) and write(2) operations on hwdep fds are driver-specific (raw byte transfer to/from device). The firmware upload path uses SNDRV_HWDEP_IOCTL_DSP_LOAD which copies via copy_from_user() and passes to the driver's dsp_load() callback.

21.4.15 ALSA Control Interface¶

The ALSA control interface (/dev/snd/controlC*) exposes mixer controls, jack detection events, and card-level information. Required by PipeWire, PulseAudio, amixer, and alsamixer.

Ioctls (magic 'U', matching Linux include/uapi/sound/asound.h):

Ioctl	Nr	Description
`SNDRV_CTL_IOCTL_PVERSION`	0x00	Protocol version
`SNDRV_CTL_IOCTL_CARD_INFO`	0x01	Card info (id, driver, name, longname, components)
`SNDRV_CTL_IOCTL_ELEM_LIST`	0x10	List all control elements (count + offset pagination)
`SNDRV_CTL_IOCTL_ELEM_INFO`	0x11	Get element info (type, access flags, value range)
`SNDRV_CTL_IOCTL_ELEM_READ`	0x12	Read element value
`SNDRV_CTL_IOCTL_ELEM_WRITE`	0x13	Write element value
`SNDRV_CTL_IOCTL_ELEM_LOCK`	0x14	Lock element (exclusive write access)
`SNDRV_CTL_IOCTL_ELEM_UNLOCK`	0x15	Unlock element
`SNDRV_CTL_IOCTL_SUBSCRIBE_EVENTS`	0x16	Enable/disable event subscription
`SNDRV_CTL_IOCTL_ELEM_ADD`	0x17	Add user-defined control element
`SNDRV_CTL_IOCTL_ELEM_REPLACE`	0x18	Replace user-defined control element
`SNDRV_CTL_IOCTL_ELEM_REMOVE`	0x19	Remove user-defined control element
`SNDRV_CTL_IOCTL_TLV_READ`	0x1A	Read TLV data for dB scale
`SNDRV_CTL_IOCTL_TLV_WRITE`	0x1B	Write TLV data
`SNDRV_CTL_IOCTL_TLV_COMMAND`	0x1C	TLV command (volatile data)
`SNDRV_CTL_IOCTL_HWDEP_NEXT_DEVICE`	0x20	Enumerate hwdep devices
`SNDRV_CTL_IOCTL_HWDEP_INFO`	0x21	Hwdep device info
`SNDRV_CTL_IOCTL_PCM_NEXT_DEVICE`	0x30	Enumerate PCM devices
`SNDRV_CTL_IOCTL_PCM_INFO`	0x31	PCM device info
`SNDRV_CTL_IOCTL_PCM_PREFER_SUBDEVICE`	0x32	Set preferred PCM subdevice

Key ABI structs:

/// Element info (type, access, count, value range).
/// Userspace ABI struct — matches Linux `struct snd_ctl_elem_info`.
/// Full size depends on SndCtlElemId and SndCtlElemInfoValue definitions;
/// const_assert deferred to implementation (where those types are fully defined).
// kernel-internal, not KABI
#[repr(C)]
pub struct SndCtlElemInfo {
    pub id:        SndCtlElemId,     // numid + iface + name
    pub elem_type: SndCtlElemType,   // BOOLEAN, INTEGER, ENUMERATED, BYTES, etc.
    pub access:    u32,              // SNDRV_CTL_ELEM_ACCESS_* flags
    pub count:     u32,              // number of values per element
    pub owner:     i32,              // PID of locking process (-1 = unlocked, as in Linux)
    pub value:     SndCtlElemInfoValue, // union: integer{min,max,step}, enumerated{items,names}
    pub _reserved: [u8; 64],
}

/// Element value (read/write payload).
/// Userspace ABI struct — matches Linux `struct snd_ctl_elem_value`.
/// const_assert deferred to implementation (where SndCtlElemId/SndCtlElemValueData are defined).
// kernel-internal, not KABI
#[repr(C)]
pub struct SndCtlElemValue {
    pub id:    SndCtlElemId,
    pub value: SndCtlElemValueData, // union: integer[128], integer64[64], enumerated[128], bytes[512]
    pub _reserved: [u8; 128],
}

Jack detection events: When a jack state changes (headphone insert/remove), the driver calls snd_jack_report(jack, status). The control interface delivers an SNDRV_CTL_EVENT_MASK_VALUE event on the jack's control element. PipeWire detects this via poll() on the control fd and routes audio accordingly.

21.4.15.1 D-Bus Bridge Schema for Audio¶

The ALSA subsystem declares D-Bus interface schemas for the D-Bus bridge service (Section 11.11). The bridge handles transport, connection management, and bus registration; the audio subsystem only declares the schema. User-space audio management tools (e.g., pavucontrol, gnome-control-center) use these interfaces for volume control and card enumeration without opening /dev/snd/* devices directly.

dbus_interface "org.umkaos.Audio1.Mixer" {
    /// Get the current volume for a named control element (e.g., "Master Playback Volume").
    /// Returns the volume as a percentage (0-100) normalized from the element's
    /// hardware min/max range. The element name matches the ALSA mixer element
    /// name reported by `SNDRV_CTL_IOCTL_ELEM_LIST`.
    @dbus_method("GetVolume")
    fn get_volume(card_index: u32, element_name: &str) -> Result<u32>;

    /// Set the volume for a named control element.
    /// `volume_pct` is clamped to [0, 100] and mapped linearly to the element's
    /// hardware range. Requires the caller to have audio device access
    /// (membership in the `audio` group or `CAP_SYS_ADMIN`).
    @dbus_method("SetVolume")
    fn set_volume(card_index: u32, element_name: &str, volume_pct: u32) -> Result<()>;

    /// Get the mute state of a named control element.
    /// Returns `true` if muted. Elements without a mute switch return `false`.
    @dbus_method("GetMute")
    fn get_mute(card_index: u32, element_name: &str) -> Result<bool>;

    /// Set the mute state of a named control element.
    @dbus_method("SetMute")
    fn set_mute(card_index: u32, element_name: &str, muted: bool) -> Result<()>;

    /// Emitted when any mixer element's value changes (volume, mute, or switch).
    /// Bridges the ALSA `SNDRV_CTL_EVENT_MASK_VALUE` event to D-Bus.
    @dbus_signal("VolumeChanged")
    fn volume_changed(card_index: u32, element_name: &str, volume_pct: u32, muted: bool);
}

dbus_interface "org.umkaos.Audio1.Card" {
    /// Return information about a sound card.
    /// Fields match `SNDRV_CTL_IOCTL_CARD_INFO`: id, driver, name, longname, mixername.
    @dbus_method("GetInfo")
    fn get_info(card_index: u32) -> Result<AudioCardInfo>;

    /// Return the operational status of the card.
    /// `online`: card is present and functional.
    /// `suspended`: card is in runtime PM suspend.
    /// `disconnected`: card was hot-unplugged.
    @dbus_method("GetStatus")
    fn get_status(card_index: u32) -> Result<AudioCardStatus>;

    /// List all available sound cards.
    /// Returns an array of (card_index, card_id_string) pairs.
    /// **Bound**: Maximum 32 sound cards (SNDRV_CARDS = 32 with CONFIG_SND_DYNAMIC_MINORS=y).
    /// The Vec is bounded by SNDRV_CARDS; exceeding 32 cards returns the first 32.
    @dbus_method("ListCards")
    fn list_cards() -> Result<Vec<(u32, String)>>;

    /// Emitted when a card is added or removed (hot-plug events).
    @dbus_signal("CardChanged")
    fn card_changed(card_index: u32, event: AudioCardEvent);
}

dbus_interface "org.umkaos.Audio1.Stream" {
    /// Get the current PCM stream state (OPEN, SETUP, PREPARED, RUNNING, etc.).
    @dbus_method("GetState")
    fn get_state(card_index: u32, device: u32, subdevice: u32) -> Result<u32>;

    /// Get stream parameters: (rate_hz, channels, format).
    @dbus_method("GetParams")
    fn get_params(card_index: u32, device: u32, subdevice: u32) -> Result<(u32, u32, u32)>;

    /// Emitted when a PCM stream transitions state (e.g., PREPARED → RUNNING).
    @dbus_signal("StateChanged")
    fn state_changed(card_index: u32, device: u32, subdevice: u32, new_state: u32);
}

dbus_interface "org.umkaos.Audio1.Jack" {
    /// Get the current state of a named jack (headphone, line-out, etc.).
    /// Returns true if a plug is inserted.
    @dbus_method("GetState")
    fn get_state(card_index: u32, jack_name: &str) -> Result<bool>;

    /// List all jacks on the given card.
    /// Returns Vec of (jack_name, connected) tuples.
    @dbus_method("List")
    fn list(card_index: u32) -> Result<Vec<(String, bool)>>;

    /// Emitted when a jack's plug state changes (insert/remove).
    /// Desktop audio managers (PipeWire, PulseAudio) use this signal
    /// to reroute audio streams when headphones are plugged/unplugged.
    @dbus_signal("JackStateChanged")
    fn jack_state_changed(card_index: u32, jack_name: &str, connected: bool);
}

AudioCardInfo, AudioCardStatus, and AudioCardEvent are D-Bus struct types derived from the ALSA control ABI structs (SndCtlCardInfo, SndCtlElemValue). The bridge translates between the kernel's ioctl-based ALSA control interface and the D-Bus wire format; no new kernel data paths are introduced.
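
The linear percentage mapping described for GetVolume and SetVolume can be sketched as follows. This is a minimal illustration, not the bridge's actual code: the function names are hypothetical, and round-to-nearest is an assumption (the schema only says the mapping is linear and clamped).

```rust
/// Map a percentage (clamped to 0..=100) linearly onto the element's
/// hardware range `[min, max]`, rounding to the nearest raw step.
pub fn pct_to_raw(pct: u32, min: i64, max: i64) -> i64 {
    if max <= min {
        return min; // degenerate range: only one legal value
    }
    let pct = pct.min(100) as i64;
    min + (pct * (max - min) + 50) / 100
}

/// Normalize a raw hardware value to a percentage in 0..=100.
/// Out-of-range raw values are clamped to the hardware range first.
pub fn raw_to_pct(raw: i64, min: i64, max: i64) -> u32 {
    if max <= min {
        return 0;
    }
    let span = max - min;
    let raw = raw.clamp(min, max);
    (((raw - min) * 100 + span / 2) / span) as u32
}
```

Because hardware ranges rarely have exactly 101 steps, the two functions are not exact inverses; a caller that needs a stable round trip should keep the raw value as the source of truth.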

Note on D-Bus schema types: The schema above uses logical types (&str, String, Vec<...>) that correspond to D-Bus wire types (STRING, ARRAY). The D-Bus bridge service (Section 11.11) handles serialization between these logical types and the fixed-size repr(C) ring buffer entries used internally. The kernel-side KABI ring messages use [u8; 64] NUL-terminated strings and bounded arrays; the bridge performs the translation at the D-Bus protocol boundary.
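
As a rough sketch of the translation at the protocol boundary, the conversion between a logical string and a `[u8; 64]` NUL-terminated field might look like the code below. The helper names and the choice to reject (rather than truncate) oversized or unterminated names are assumptions made for illustration.

```rust
/// Size of the fixed string field used in the ring messages.
pub const NAME_LEN: usize = 64;

/// Encode a logical string into a fixed NUL-terminated buffer.
/// Returns `None` if the string does not leave room for the terminator
/// or contains an interior NUL byte.
pub fn encode_name(s: &str) -> Option<[u8; NAME_LEN]> {
    let bytes = s.as_bytes();
    if bytes.len() >= NAME_LEN || bytes.contains(&0) {
        return None;
    }
    let mut buf = [0u8; NAME_LEN];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

/// Decode a fixed buffer back into a string slice.
/// Returns `None` if no terminator is present or the bytes are not UTF-8.
pub fn decode_name(buf: &[u8; NAME_LEN]) -> Option<&str> {
    let len = buf.iter().position(|&b| b == 0)?;
    core::str::from_utf8(&buf[..len]).ok()
}
```

Rejecting unterminated buffers on decode means a corrupted ring entry cannot cause the bridge to read past the field.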

21.5 Display and Graphics (DRM/KMS)¶

The Direct Rendering Manager (DRM) and Kernel Mode Setting (KMS) subsystems manage GPUs, display outputs, and hardware-accelerated rendering.

21.5.1 DRM as a Tier 1 Subsystem¶

DRM device numbering (Linux ABI-compatible):

Major number: 226 (DRM_MAJOR, assigned by LANANA).
Primary nodes (/dev/dri/cardN): minor N (0, 1, 2, ..., max 63). One per GPU/display controller. Supports modesetting (requires DRM master).
Render nodes (/dev/dri/renderDN): one per GPU, minors 128–191. As on Linux, the node name carries the minor number itself, so the first render node is /dev/dri/renderD128. Supports unprivileged GPU compute and rendering (no modesetting, no DRM master required). Minor numbering follows Linux's 64 * DRM_MINOR_RENDER scheme (enum: PRIMARY=0, CONTROL=1, RENDER=2).
Allocation: device registry assigns the next free minor in the appropriate range when a DRM driver registers via drm_dev_register().
/dev/dri/ directory is created by devtmpfs. Symlinks in /dev/dri/by-path/ map bus addresses to card/render nodes.

Character device registration (Section 14.5):

/// Called from drm_subsystem_init() during boot Phase 5.3+ (after Tier 1 driver loading).
fn drm_register_chrdev() {
    register_chrdev_region(ChrdevRegion {
        major: 226,
        minor_base: 0,
        minor_count: 256,  // 64 primary + 64 control + 64 render + 64 reserved
        fops: &DRM_FOPS,
        name: "dri",
    }).expect("DRM major 226 registration");
}

DRM_FOPS.open() determines the node type from the minor number: 0–63 = primary (card), 64–127 = control (legacy, usually disabled), 128–191 = render. It then locates the DrmDevice instance registered by the GPU driver and creates a per-open DrmFile state (GEM handle namespace, DRM master status, authentication token). Render node opens bypass DRM master authentication checks, matching Linux behavior.
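
The minor-number decoding performed at open time can be sketched as below. The `MinorClass` enum and `classify_minor` function are hypothetical names introduced for illustration; only the ranges come from the text above.

```rust
/// Node class derived from the DRM minor number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MinorClass {
    Primary,
    Control,
    Render,
}

/// Decode a minor number into (class, per-class device index).
/// Minors outside 0..=191 are not assigned to any node class.
pub fn classify_minor(minor: u32) -> Option<(MinorClass, u32)> {
    match minor {
        0..=63 => Some((MinorClass::Primary, minor)),
        64..=127 => Some((MinorClass::Control, minor - 64)),
        128..=191 => Some((MinorClass::Render, minor - 128)),
        _ => None,
    }
}

/// Render nodes skip the DRM master authentication check.
pub fn needs_auth_check(class: MinorClass) -> bool {
    class != MinorClass::Render
}
```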

/// Per-open state for a DRM device file descriptor. One per `open()` on
/// `/dev/dri/cardN` or `/dev/dri/renderDN`.
pub struct DrmFile {
    /// The DRM device this file belongs to.
    pub device: Arc<DrmDevice>,
    /// GEM handle namespace for this fd. Maps userspace u32 handles to
    /// kernel GEM objects. XArray keyed by handle (integer key, O(1) lookup).
    pub gem_handles: XArray<Arc<GemObject>>,
    /// Next GEM handle to allocate (monotonically increasing per-fd).
    pub next_handle: AtomicU32,
    /// True if this fd is the DRM master (can perform modesetting).
    /// Only one fd per DRM device can be master at a time.
    pub is_master: bool,
    /// DRM authentication state. Non-master clients must authenticate
    /// via the DRM_IOCTL_GET_MAGIC / DRM_IOCTL_AUTH_MAGIC handshake with the
    /// DRM master before submitting GPU commands on primary nodes.
    /// Render nodes skip authentication entirely.
    pub authenticated: bool,
    /// Minor type: Primary (modesetting), Render (compute/render only).
    pub minor_type: DrmMinorType,
    /// Client capabilities negotiated via DRM_IOCTL_SET_CLIENT_CAP.
    /// Tracks DRM_CLIENT_CAP_STEREO_3D, DRM_CLIENT_CAP_UNIVERSAL_PLANES,
    /// DRM_CLIENT_CAP_ATOMIC, DRM_CLIENT_CAP_WRITEBACK_CONNECTORS.
    pub client_caps: u32,
    /// Event queue for this fd (VBlank events, page flip completions).
    /// Stores full `DrmEventVblank` (32 bytes each) — the largest DRM event type.
    /// `DrmEvent` is only the 8-byte ABI header; storing it would lose the payload.
    /// All current DRM event types are 32 bytes (DrmEventVblank). If future event
    /// types differ in size, change to a byte-level ring with length-prefixed entries.
    /// Read via `read()` on the DRM fd. Consumer side (`read()`) acquires
    /// with IRQs disabled (`spin_lock_irqsave`) to prevent deadlock with
    /// the VBlank IRQ handler producer.
    pub event_queue: SpinLock<BoundedRing<DrmEventVblank, 256>>,
    /// Wait queue for poll/select/epoll on this DRM fd.
    pub waiters: WaitQueueHead,
}

pub enum DrmMinorType {
    Primary,  // /dev/dri/cardN (modesetting + render)
    Render,   // /dev/dri/renderDN (render/compute only, no modesetting)
}

21.5.1.1 GEM Buffer Objects¶

The Graphics Execution Manager (GEM) provides the fundamental GPU memory allocation and tracking layer. Every GPU buffer — dumb framebuffers, render targets, textures, command buffers — is represented as a GemObject. Userspace references GEM objects through per-fd u32 handles stored in the DrmFile::gem_handles XArray (integer-keyed, O(1) lookup). The kernel holds Arc<GemObject> so that a single buffer can be shared across multiple handles (via DRM_IOCTL_PRIME_FD_TO_HANDLE import) and across processes (via DMA-BUF export).

/// A GEM buffer object — the fundamental GPU memory allocation unit.
/// Each GemObject is reference-counted (Arc) and tracked in a per-file
/// handle table (XArray<Arc<GemObject>>, integer-keyed by handle).
// kernel-internal, not KABI — never crosses compilation boundary or ioctl interface.
#[repr(C)]
pub struct GemObject {
    /// Size of the allocation in bytes (page-aligned, immutable after creation).
    pub size: usize,
    /// Backing memory type. Determines where the physical pages live.
    pub backing: GemBacking,
    /// DMA address for GPU access (populated after pin/map via
    /// [Section 4.14](04-memory.md#dma-subsystem) `umka_driver_dma_map_sg`). `None` until the
    /// buffer is bound to a GPU address space.
    pub dma_addr: Option<DmaAddr>,
    /// Fake offset for userspace mmap via `DRM_IOCTL_MODE_MAP_DUMB`.
    /// Unique per-device, allocated from a per-DrmDevice `Idr`.
    /// Userspace passes this as the `offset` argument to `mmap()` on
    /// the DRM fd; the DRM fault handler resolves it to physical pages.
    pub mmap_offset: u64,
    /// Import reference count — tracks how many DMA-BUF importers hold
    /// this object. Incremented on `DRM_IOCTL_PRIME_FD_TO_HANDLE` (import),
    /// decremented when the importing `DrmFile` closes its handle. When
    /// this reaches zero and no local handles remain, the GEM object is
    /// eligible for destruction.
    pub import_count: AtomicU32,
    /// Global name for `DRM_IOCTL_GEM_FLINK` legacy sharing. 0 = unnamed.
    /// Flink names are allocated from a per-DrmDevice `Idr` and stored in
    /// `DrmDevice::flink_table: XArray<Arc<GemObject>>`. Deprecated in
    /// favour of DMA-BUF / PRIME, but required for Xorg DDX compatibility.
    pub flink_name: u32,
    /// Reservation object for implicit fencing (shared with DMA-BUF layer).
    /// Tracks read/write fences from GPU command submissions so that
    /// cross-device synchronisation (e.g., GPU render → display scanout)
    /// waits for the correct operations to complete.
    pub resv: ReservationObject,
}

/// Backing memory type for a GEM buffer.
pub enum GemBacking {
    /// System RAM pages (default for dumb buffers and most render targets).
    /// Pages are allocated from the physical allocator ([Section 4.2](04-memory.md#physical-memory-allocator))
    /// at buffer creation time. The page array is heap-allocated with a known
    /// upper bound: `size / PAGE_SIZE` entries. Uses `Box<[PageRef]>` (not Vec)
    /// because the size is fixed at creation and never grows — `Box<[T]>` avoids
    /// the 8-byte capacity field overhead of Vec and signals immutable length.
    Pages(Box<[PageRef]>),
    /// VRAM carved from a PCI BAR region (discrete GPUs with dedicated memory).
    /// `bar_offset` is relative to the BAR base; the driver's VRAM allocator
    /// (a simple buddy or best-fit allocator over the BAR range) manages
    /// sub-allocation.
    Vram { bar_offset: u64, bar_index: u8 },
    /// Imported DMA-BUF from another device (cross-device zero-copy sharing).
    /// The GemObject does not own the backing pages — the exporting device
    /// does. `sg_table` caches the scatter-gather mapping obtained from
    /// `DmaBuf::map_attachment()` for the lifetime of the import.
    DmaBuf { dmabuf: Arc<DmaBuf>, sg_table: DmaSgl },
}

/// Reservation object — tracks implicit DMA fences on a shared buffer.
/// One per GemObject (and per DMA-BUF). Writers add exclusive fences;
/// readers add shared fences. A fence is signalled when the GPU completes
/// the associated command buffer submission.
pub struct ReservationObject {
    /// Lock protecting fence list mutations. Readers use RCU for
    /// lock-free fence inspection on the scanout/display path.
    pub lock: SpinLock<()>,
    /// Exclusive (write) fence — only one writer at a time.
    pub fence_excl: Option<Arc<DmaFence>>,
    /// Shared (read) fences — multiple concurrent readers allowed.
    /// Bounded: a single buffer is rarely read by more than 4 consumers
    /// simultaneously (display, video encoder, compositor, second GPU).
    pub fence_shared: ArrayVec<Arc<DmaFence>, 8>,
}

Handle lifecycle: DRM_IOCTL_GEM_OPEN or DRM_IOCTL_PRIME_FD_TO_HANDLE inserts an Arc<GemObject> into DrmFile::gem_handles at the next available handle slot (returned to userspace as a u32). DRM_IOCTL_GEM_CLOSE removes the handle entry and decrements the Arc refcount. When the last Arc reference drops (no handles, no DMA-BUF exports, no active GPU submissions referencing it), GemObject::drop() frees the backing memory: Pages are returned to the physical allocator, Vram is returned to the BAR sub-allocator, and DmaBuf detaches the scatter-gather mapping.
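
The handle lifecycle can be modeled in a few lines. This sketch substitutes a `BTreeMap` for the XArray and a plain counter for the atomic `next_handle`; it also assumes handle 0 is never issued (as in Linux GEM). The `GemObject` here is a one-field stand-in, not the full struct above.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Stand-in for the real GemObject; only the size is modeled.
pub struct GemObject {
    pub size: usize,
}

/// Per-fd handle namespace (BTreeMap used here in place of the XArray).
pub struct HandleTable {
    handles: BTreeMap<u32, Arc<GemObject>>,
    next_handle: u32,
}

impl HandleTable {
    pub fn new() -> Self {
        // Assumption: handle 0 is reserved as "invalid".
        Self { handles: BTreeMap::new(), next_handle: 1 }
    }

    /// Open/import path: store a reference and return the new handle.
    /// Returns `None` once the monotonic handle space is exhausted.
    pub fn insert(&mut self, obj: Arc<GemObject>) -> Option<u32> {
        let handle = self.next_handle;
        self.next_handle = handle.checked_add(1)?;
        self.handles.insert(handle, obj);
        Some(handle)
    }

    /// Lookup path: clone the Arc so the caller holds its own reference.
    pub fn lookup(&self, handle: u32) -> Option<Arc<GemObject>> {
        self.handles.get(&handle).cloned()
    }

    /// GEM_CLOSE path: drop this table's reference. Returns false for
    /// unknown handles. The object is freed when the last Arc drops.
    pub fn close(&mut self, handle: u32) -> bool {
        self.handles.remove(&handle).is_some()
    }
}
```

Two tables holding the same object model the PRIME import case: closing the handle in one fd leaves the buffer alive for the other.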

GPUs are complex, high-bandwidth devices that require aggressive memory management (GART/TTM) and rapid command submission. Therefore, UmkaOS GPU drivers (e.g., umka-amdgpu, umka-i915) operate in Tier 1 (Ring 0, MPK-isolated) (Section 11.3). Full implementation details covering display device models, atomic modesetting, framebuffer objects, and scanout planes are specified in Section 21.5.3 through Section 21.5.6.

The GPU driver runs in a dedicated hardware memory domain. It receives command buffers from userspace (Mesa/Vulkan) via shared memory rings. The driver validates the command buffers (ensuring they don't contain malicious GPU memory writes) and submits them to the hardware command rings.

Because the driver is MPK-isolated, a bug in the complex command validation logic (a frequent source of Linux CVEs) cannot corrupt UmkaOS Core memory or the page cache. If the GPU driver faults, it is reloaded (~50-150ms). Userspace rendering contexts are lost (triggering a VK_ERROR_DEVICE_LOST in Vulkan applications), but the system remains stable.

21.5.2 DMA-BUF and Secure File Descriptor Passing¶

Modern Linux graphics rely entirely on DMA-BUF: a mechanism for sharing hardware-backed memory buffers between different devices and processes (e.g., sharing a rendered frame from the GPU to the Wayland compositor, or from a V4L2 webcam to the GPU).

In Linux, a DMA-BUF is represented as a standard file descriptor. Passing the file descriptor over a UNIX domain socket grants access to the underlying memory.

UmkaOS's DMA-BUF Implementation: UmkaOS implements DMA-BUF using the core Capability System (Section 9.1).

1. When the GPU driver allocates a framebuffer, it creates an UmkaOS Memory Object and mints a Capability Token granting MEM_READ | MEM_WRITE access.
2. umka-sysapi wraps this Capability Token in a synthetic file descriptor.
3. When the Wayland client passes the file descriptor to the compositor over AF_UNIX (using SCM_RIGHTS), the kernel securely delegates the Capability Token to the compositor's capability space.
4. The compositor uses the Capability Token to map the framebuffer into its own address space, or passes it back to the GPU driver to queue a page flip (KMS).
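
The delegation in step 3 can be pictured with a toy model. Everything below is hypothetical: the `Capability` and `CapSpace` types, the bit values of `MEM_READ` and `MEM_WRITE`, and the rights mask (narrowing on delegation is an assumption, not something the steps above specify). The real Capability System is defined in Section 9.1.

```rust
use std::collections::HashMap;

// Assumed bit values, for illustration only.
pub const MEM_READ: u32 = 1 << 0;
pub const MEM_WRITE: u32 = 1 << 1;

/// Toy capability: which memory object, and what the holder may do with it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Capability {
    pub object_id: u64,
    pub rights: u32,
}

/// Toy per-process capability space indexed by synthetic fd.
#[derive(Default)]
pub struct CapSpace {
    slots: HashMap<i32, Capability>,
    next_fd: i32,
}

impl CapSpace {
    /// Wrap a capability in a new synthetic file descriptor.
    pub fn install(&mut self, cap: Capability) -> i32 {
        let fd = self.next_fd;
        self.next_fd += 1;
        self.slots.insert(fd, cap);
        fd
    }

    pub fn get(&self, fd: i32) -> Option<Capability> {
        self.slots.get(&fd).copied()
    }
}

/// Model of SCM_RIGHTS passing: copy the sender's capability into the
/// receiver's space. Rights can only be narrowed by `mask`, never widened.
pub fn delegate(sender: &CapSpace, fd: i32, receiver: &mut CapSpace, mask: u32) -> Option<i32> {
    let cap = sender.get(fd)?;
    let narrowed = Capability { object_id: cap.object_id, rights: cap.rights & mask };
    Some(receiver.install(narrowed))
}
```

The sender keeps its own descriptor after delegation, mirroring how SCM_RIGHTS duplicates rather than moves a file descriptor.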

By backing DMA-BUF file descriptors with cryptographic Capability Tokens, UmkaOS guarantees that memory access rights cannot be forged or leaked, and seamlessly supports distributed graphics rendering (Section 5.1) where the compositor and the rendering client exist on different physical nodes in the cluster.

21.5.3 Display Device Model¶

Interface contract: Section 13.3 (DisplayDriver trait, display_device_v1 KABI). This section specifies the Intel i915, AMD DCN, and embedded display pipeline implementations of that contract. Tier decision and atomic modesetting requirement are authoritative in Section 13.3.

Tier: Tier 1 for integrated GPUs (Intel i915, AMD amdgpu iGPU). Tier 2 only for fully offloaded display (USB DisplayLink, network display servers).

// umka-core/src/display/mod.rs

/// Display device handle.
// kernel-internal, not KABI — opaque handle, never exposed to userspace.
#[repr(C)]
pub struct DisplayDeviceId(u64);

/// Display connector type. Values from Linux 6.12 include/uapi/drm/drm_mode.h.
/// Binary compatibility requires exact value matches.
#[repr(u32)]
pub enum ConnectorType {
    Unknown     = 0,
    VGA         = 1,
    DVII        = 2,
    DVID        = 3,
    DVIA        = 4,
    Composite   = 5,
    SVIDEO      = 6,
    LVDS        = 7,
    Component   = 8,
    NinePinDIN  = 9,
    DisplayPort = 10,
    HDMIA       = 11,
    HDMIB       = 12,
    TV          = 13,
    EDP         = 14,
    VIRTUAL     = 15,
    DSI         = 16,
    DPI         = 17,
    WRITEBACK   = 18,
    SPI         = 19,
    USB         = 20,
}

/// Display connector state. Matches Linux `enum drm_connector_status` in
/// `include/drm/drm_connector.h`. Value 0 is unused in Linux.
/// EDID availability is tracked separately in `ConnectorProps` — it is
/// orthogonal to the connection state (a connected display may lack EDID
/// if the DDC channel is broken).
///
/// Verified against torvalds/linux master `include/drm/drm_connector.h`:
///   connector_status_connected = 1
///   connector_status_disconnected = 2
///   connector_status_unknown = 3
#[repr(u32)]
pub enum ConnectorState {
    /// Display attached and sink detected (digital: HPD asserted; analog:
    /// load detected). EDID may or may not have been read successfully.
    Connected = 1,
    /// No display attached. For digital outputs (DP, HDMI) this means HPD
    /// is deasserted. For analog (VGA) this means no load detected.
    Disconnected = 2,
    /// Connection status could not be reliably determined. The connector
    /// should be treated as potentially connected; the compositor may
    /// attempt to light it up with fallback modes from the connector's
    /// mode list.
    Unknown = 3,
}

/// Display connector.
///
/// Mutable connector properties (EDID, modes, active mode) are grouped into
/// a single `ConnectorProps` snapshot, swapped atomically via RCU during
/// hotplug or modeset. This eliminates per-field RwLock overhead and ensures
/// readers always see a consistent snapshot (no half-updated EDID + stale
/// mode list). Connector state and DPMS are independent atomic fields
/// because they change on different paths (hotplug IRQ vs userspace ioctl).
pub struct DisplayConnector {
    /// Connector ID (unique per display device).
    pub id: u32,
    /// Connector type.
    pub connector_type: ConnectorType,
    /// Current state (connected, disconnected). Updated atomically by
    /// hotplug IRQ handler — no lock needed.
    pub state: AtomicU32, // ConnectorState
    /// DPMS (Display Power Management Signaling) state.
    pub dpms: AtomicU32, // DpmsState
    /// Mutable connector properties. Updated during hotplug (EDID read,
    /// mode list rebuild) and modeset (active_mode change). RCU-protected:
    /// readers (userspace mode queries, compositor enumeration) are lock-free;
    /// writers (hotplug handler, atomic commit) clone-and-swap.
    pub props: RcuPtr<Arc<ConnectorProps>>,
    /// Back-reference to the parent display device's driver operations and
    /// opaque driver context. Used by connector methods (VRR, DPMS, hotplug)
    /// to call into the hardware driver via `DisplayHwOps` function pointers.
    pub driver: DisplayDriverRef,
}

/// Reference to a display driver's operations table and opaque context.
/// Stored in each `DisplayConnector` to allow connector-level methods
/// (e.g., VRR enable, DPMS control) to call back into the driver without
/// traversing the parent `DisplayDevice`.
pub struct DisplayDriverRef {
    /// Driver hardware operations vtable.
    pub hw_ops: &'static DisplayHwOps,
    /// Opaque driver context passed as first argument to all `DisplayHwOps` functions.
    pub ctx: *mut c_void,
}

/// Immutable snapshot of connector properties. Created during hotplug
/// (EDID parse → mode list → props swap) or atomic commit (active_mode
/// change). Freed after RCU grace period when superseded.
pub struct ConnectorProps {
    /// EDID data. Fixed-size buffer avoids heap allocation during hotplug.
    /// EDID blocks are 128 bytes each; the base block plus one extension is
    /// 256 bytes, and 512 bytes holds the base block plus three extension
    /// blocks (e.g., CTA-861 and DisplayID extensions).
    pub edid: Option<ArrayVec<u8, 512>>,
    /// Supported display modes (parsed from EDID or driver-provided fallbacks).
    /// Typical displays advertise 10-40 modes; 64 is sufficient for 8K panels
    /// with multiple refresh rates.
    pub modes: ArrayVec<DisplayMode, 64>,
    /// Currently active mode (if connected and enabled).
    pub active_mode: Option<DisplayMode>,
}

/// Display mode (resolution, refresh rate).
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct DisplayMode {
    /// Horizontal resolution in pixels. **Kernel-internal type**: u16 is sufficient
    /// for display resolutions (max 65535; 8K = 7680). Current Linux stores
    /// hdisplay/vdisplay as `u16` in `drm_display_mode` (older kernels used `int`), and
    /// the ABI-facing `drm_mode_modeinfo` uses `__u16`. UmkaOS uses u16 for the
    /// kernel-internal type to match the ABI type and save space.
    pub hdisplay: u16,
    /// Vertical resolution in pixels (same u16 rationale as hdisplay).
    pub vdisplay: u16,
    /// Refresh rate in millihertz (60000 = 60.000 Hz).
    pub vrefresh_mhz: u32,
    /// Flags (interlaced, VRR capable, preferred mode).
    pub flags: u32,
    /// Pixel clock in kHz (for driver use, validates mode is achievable).
    pub clock_khz: u32,
    /// Horizontal timings (front porch, sync, back porch).
    /// u32 is deliberately wider than Linux DRM, where current `drm_display_mode` and the
    /// ABI-facing `drm_mode_modeinfo` both store these timings as 16-bit values.
    /// Required for 8K@120Hz+ with extended VRR blanking where htotal can exceed 65535.
    pub hsync_start: u32,
    pub hsync_end: u32,
    pub htotal: u32,
    /// Vertical timings (front porch, sync, back porch).
    pub vsync_start: u32,
    pub vsync_end: u32,
    pub vtotal: u32,
}
// DisplayMode: u16(2)*2 + u32(4)*9 = 40 bytes.
// Kernel-internal display mode representation. The ioctl layer translates to/from
// Linux's drm_mode_modeinfo (u16 timing fields) for ABI compatibility. u32 timing
// fields allow htotal > 65535 for 8K@120Hz+ with extended VRR blanking.
//
// **Ioctl translation policy**: When copying to userspace `drm_mode_modeinfo`
// (e.g., `DRM_IOCTL_MODE_GETCONNECTOR`), timing fields exceeding `u16::MAX`
// cause the mode to be omitted from the userspace mode list. The kernel logs
// `klog(Info, "DRM: mode {}x{}@{}Hz omitted from userspace list (htotal={} > u16::MAX)",
// ...)` for discoverability. Such modes are accessible only via the UmkaOS-native
// atomic modesetting interface (Phase 4).
const_assert!(core::mem::size_of::<DisplayMode>() == 40);

/// Display mode flags.
pub mod mode_flags {
    /// Interlaced mode.
    pub const INTERLACED: u32 = 1 << 0;
    /// Variable Refresh Rate (VRR) capable (FreeSync, G-Sync, HDMI VRR).
    pub const VRR: u32 = 1 << 1;
    /// Preferred mode (from EDID).
    pub const PREFERRED: u32 = 1 << 2;
}

/// DPMS (Display Power Management Signaling) state.
#[repr(u32)]
pub enum DpmsState {
    /// Display on, normal operation.
    On = 0,
    /// Display standby (monitor sleeps, can wake instantly).
    Standby = 1,
    /// Display suspend (lower power than standby).
    Suspend = 2,
    /// Display off (lowest power, may take 1-2 seconds to wake).
    Off = 3,
}
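
The ioctl translation policy for `DisplayMode` (omit any mode whose timings do not fit in 16 bits) can be sketched as follows. `UserModeTimings` is a hypothetical stand-in for the timing fields of `drm_mode_modeinfo`, not the full ABI struct, and the reduced `DisplayMode` below carries only the fields the translation touches. Rounding the refresh rate to whole hertz is an assumption.

```rust
use std::convert::TryFrom;

/// Reduced copy of the kernel-internal mode (timing fields only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DisplayMode {
    pub hdisplay: u16,
    pub vdisplay: u16,
    pub vrefresh_mhz: u32,
    pub hsync_start: u32,
    pub hsync_end: u32,
    pub htotal: u32,
    pub vsync_start: u32,
    pub vsync_end: u32,
    pub vtotal: u32,
}

/// Hypothetical stand-in for the 16-bit timing fields seen by userspace.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UserModeTimings {
    pub hdisplay: u16,
    pub hsync_start: u16,
    pub hsync_end: u16,
    pub htotal: u16,
    pub vdisplay: u16,
    pub vsync_start: u16,
    pub vsync_end: u16,
    pub vtotal: u16,
    pub vrefresh_hz: u32,
}

/// Narrow one mode; `None` means "omit from the userspace list".
pub fn to_user_timings(m: &DisplayMode) -> Option<UserModeTimings> {
    Some(UserModeTimings {
        hdisplay: m.hdisplay,
        hsync_start: u16::try_from(m.hsync_start).ok()?,
        hsync_end: u16::try_from(m.hsync_end).ok()?,
        htotal: u16::try_from(m.htotal).ok()?,
        vdisplay: m.vdisplay,
        vsync_start: u16::try_from(m.vsync_start).ok()?,
        vsync_end: u16::try_from(m.vsync_end).ok()?,
        vtotal: u16::try_from(m.vtotal).ok()?,
        // Millihertz to hertz, rounded to nearest.
        vrefresh_hz: m.vrefresh_mhz.saturating_add(500) / 1000,
    })
}

/// Build the list reported to userspace, silently skipping oversized modes
/// (the real path would also emit the klog message described above).
pub fn userspace_mode_list(modes: &[DisplayMode]) -> Vec<UserModeTimings> {
    modes.iter().filter_map(to_user_timings).collect()
}
```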

21.5.4 Atomic Modesetting Protocol¶

UmkaOS uses an atomic modesetting model (same as Linux DRM atomic). Changes to the display configuration (resolution, framebuffer, connector enable/disable) are batched into a single atomic transaction. Either all changes apply or none do. This eliminates tearing and half-configured states.

// umka-core/src/display/atomic.rs

/// Atomic modesetting request.
///
/// Uses fixed-capacity `ArrayVec` instead of `Vec` to avoid heap allocation
/// on every display frame. The bounds are hardware-limited: no display
/// controller has more than `MAX_CONNECTORS` (8) connectors or `MAX_PLANES`
/// (32) planes. At 60fps+ with multiple displays, eliminating per-frame
/// heap allocation avoids allocator contention on the latency-sensitive
/// commit path.
pub struct AtomicModeset {
    /// Connector changes (enable, disable, mode change).
    pub connectors: ArrayVec<ConnectorUpdate, MAX_CONNECTORS>,
    /// Plane changes (scanout buffer, position, scaling).
    pub planes: ArrayVec<PlaneUpdate, MAX_PLANES>,
    /// Flags (test-only, allow modeset, async).
    pub flags: AtomicFlags,
}

/// Connector update (part of atomic transaction).
pub struct ConnectorUpdate {
    /// Connector ID.
    pub connector_id: u32,
    /// New mode (None = disable connector).
    pub mode: Option<DisplayMode>,
    /// CRTC to attach this connector to (if enabling).
    pub crtc_id: Option<u32>,
}

/// Plane update (part of atomic transaction).
pub struct PlaneUpdate {
    /// Plane ID.
    pub plane_id: u32,
    /// Framebuffer handle (None = disable plane).
    pub fb: Option<FramebufferHandle>,
    /// Source rectangle in framebuffer (for scaling/cropping).
    pub src: Rectangle,
    /// Destination rectangle on screen.
    pub dst: Rectangle,
}

/// Atomic modesetting flags.
pub mod atomic_flags {
    /// Test-only (validate but don't apply; used by compositors to check if mode is possible).
    pub const TEST_ONLY: u32 = 1 << 0;
    /// Allow modeset (may cause visible glitch; only allow during VT switch or initial setup).
    pub const ALLOW_MODESET: u32 = 1 << 1;
    /// Async flip (flip on next vblank, don't wait; lower latency).
    pub const ASYNC: u32 = 1 << 2;
}

/// Rectangle (for plane src/dst).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Rectangle {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}
// Rectangle: u32(4)*4 = 16 bytes.
// Used in atomic modesetting plane source/destination parameters.
const_assert!(core::mem::size_of::<Rectangle>() == 16);

Atomic commit flow:

1. Wayland compositor builds an AtomicModeset transaction: "attach framebuffer FB123 to primary plane, set mode to 1920x1080@60Hz on connector 0, disable connector 1".
2. Compositor calls ioctl(dri_fd, UMKA_DRM_ATOMIC_COMMIT, &atomic_modeset) (via umka-sysapi DRM emulation).
3. Kernel validates the transaction:
   - Mode is supported by the connector (in the modes list from EDID).
   - Framebuffer format is supported by the plane (RGB888, XRGB8888, NV12, etc.).
   - Bandwidth is achievable (pixel clock within limits, memory bandwidth sufficient).
4. If valid, kernel programs the display controller hardware (Intel i915 writes to plane registers, GGT, pipe config; AMD writes to DCN registers).
5. Hardware scans out the new framebuffer on the next vblank (tear-free).
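
The all-or-nothing property of the commit flow comes from separating validation from application. The sketch below shows that two-phase shape for connector updates only; the types are simplified stand-ins (`Vec` instead of `ArrayVec`, a three-field `Mode`), and `CommitError` is a hypothetical error type.

```rust
/// Same bit as `atomic_flags::TEST_ONLY` above.
pub const TEST_ONLY: u32 = 1 << 0;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub w: u16,
    pub h: u16,
    pub refresh_mhz: u32,
}

pub struct Connector {
    pub id: u32,
    pub modes: Vec<Mode>,
    pub active: Option<Mode>,
}

pub struct ConnectorUpdate {
    pub connector_id: u32,
    pub mode: Option<Mode>, // None = disable
}

#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    UnknownConnector(u32),
    UnsupportedMode(u32),
}

pub fn atomic_commit(
    conns: &mut [Connector],
    updates: &[ConnectorUpdate],
    flags: u32,
) -> Result<(), CommitError> {
    // Phase 1: validate every update without modifying any state.
    for u in updates {
        let conn = conns
            .iter()
            .find(|c| c.id == u.connector_id)
            .ok_or(CommitError::UnknownConnector(u.connector_id))?;
        if let Some(mode) = u.mode {
            if !conn.modes.contains(&mode) {
                return Err(CommitError::UnsupportedMode(u.connector_id));
            }
        }
    }
    // TEST_ONLY: report that the configuration is possible, change nothing.
    if flags & TEST_ONLY != 0 {
        return Ok(());
    }
    // Phase 2: apply. Cannot fail, because phase 1 checked everything.
    for u in updates {
        if let Some(conn) = conns.iter_mut().find(|c| c.id == u.connector_id) {
            conn.active = u.mode;
        }
    }
    Ok(())
}
```

A transaction with one invalid update leaves every connector untouched, which is the behavior compositors rely on when probing configurations with TEST_ONLY.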

21.5.5 Framebuffer Objects¶

A framebuffer is a region of GPU memory containing pixel data. The display controller's scanout engine reads from the framebuffer via DMA and sends pixels to the monitor.

// umka-core/src/display/framebuffer.rs

/// Framebuffer handle (opaque to userspace).
// kernel-internal, not KABI — opaque handle type.
#[repr(C)]
pub struct FramebufferHandle(u64);

/// Framebuffer format (pixel layout).
#[repr(u32)]
pub enum FramebufferFormat {
    /// 32bpp XRGB (X=unused, R=red, G=green, B=blue; 8 bits each).
    Xrgb8888 = 0x34325258,
    /// 32bpp ARGB (with alpha channel).
    Argb8888 = 0x34325241,
    /// 24bpp RGB (no alpha, no padding).
    Rgb888 = 0x34324752,
    /// 16bpp RGB565.
    Rgb565 = 0x36314752,
    /// YUV 4:2:0 semi-planar (NV12: Y plane plus interleaved CbCr plane, for video).
    Nv12 = 0x3231564e,
}

/// Framebuffer descriptor.
pub struct Framebuffer {
    /// Handle.
    pub handle: FramebufferHandle,
    /// Width in pixels.
    pub width: u32,
    /// Height in pixels.
    pub height: u32,
    /// Pixel format.
    pub format: FramebufferFormat,
    /// Pitch (bytes per row; may be larger than width * bpp if aligned).
    pub pitch: u32,
    /// GPU memory object backing this framebuffer (for DMA-BUF export, Section 4.3/Section 21.5.2).
    pub mem_obj: MemoryObjectHandle,
}
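
The `FramebufferFormat` discriminants are DRM FourCC codes: four ASCII characters packed little-endian into a u32. The sketch below reproduces that packing and adds a pitch helper showing why the pitch can exceed width times bytes per pixel. The function names and the 64-byte alignment in the usage are illustrative assumptions; real alignment requirements are hardware-specific.

```rust
/// Pack four ASCII characters into a FourCC code (first character in the
/// least significant byte).
pub const fn fourcc(a: u8, b: u8, c: u8, d: u8) -> u32 {
    (a as u32) | ((b as u32) << 8) | ((c as u32) << 16) | ((d as u32) << 24)
}

/// Recover the four characters from a FourCC code.
pub fn fourcc_chars(code: u32) -> [u8; 4] {
    code.to_le_bytes()
}

/// Row pitch in bytes: width * bytes-per-pixel, rounded up to `align`.
/// Returns `None` on arithmetic overflow or a zero alignment.
pub fn pitch_bytes(width: u32, bytes_per_pixel: u32, align: u32) -> Option<u32> {
    if align == 0 {
        return None;
    }
    let row = width.checked_mul(bytes_per_pixel)?;
    let padded = row.checked_add(align - 1)?;
    Some(padded / align * align)
}
```

For example, a 1366-pixel-wide XRGB8888 row is 5464 bytes of pixel data but occupies 5504 bytes when rows are aligned to 64 bytes.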

Framebuffer allocation: Compositor allocates GPU memory (via the GPU driver, Section 9.1/Section 22.1), renders the desktop into it (via Vulkan/OpenGL), then creates a framebuffer object pointing to that memory and passes it to the display subsystem for scanout.

21.5.6 Scanout Planes¶

Modern display controllers have multiple planes (hardware overlays) that can scan out independent framebuffers simultaneously:

- Primary plane: The desktop/window contents (always present).
- Cursor plane: The mouse cursor (small, can be moved with no desktop re-render).
- Overlay planes: Video playback windows (compositor passes video framebuffer directly to hardware, zero-copy).

// umka-core/src/display/plane.rs

/// Display plane (hardware overlay).
///
/// Plane state (framebuffer, position, scaling) is grouped into a single
/// `PlaneState` snapshot, swapped atomically via RCU during atomic commit.
/// This eliminates the need to acquire three separate RwLocks (fb, src, dst)
/// and guarantees readers see a consistent plane configuration.
pub struct DisplayPlane {
    /// Plane ID (unique per display device).
    pub id: u32,
    /// Plane type.
    pub plane_type: PlaneType,
    /// Bitmask of CRTCs this plane can be assigned to. Bit N set means this
    /// plane is compatible with the CRTC at index N in `DisplayDevice::crtcs`.
    /// Required for atomic commit validation: the kernel rejects commits that
    /// assign a plane to a CRTC not in its `possible_crtcs` set.
    pub possible_crtcs: u32,
    /// Supported framebuffer formats (immutable after probe).
    // ArrayVec<_, 16>: KABI-stable (no heap pointer, no runtime allocation).
    // 16 format slots cover all known display hardware (typical: 4–12 formats per plane).
    // Cannot use Vec across KABI boundary (Rust Vec layout not guaranteed stable).
    pub formats: ArrayVec<FramebufferFormat, 16>,
    /// Current plane state. Replaced atomically during modeset commit.
    /// Readers (vblank handlers, userspace queries) get a consistent
    /// snapshot via `rcu_read_lock()` — no per-field locking.
    pub state: RcuPtr<Arc<PlaneState>>,
}

/// Immutable snapshot of plane state. Created during atomic commit and
/// swapped via RCU. Freed after grace period when superseded.
pub struct PlaneState {
    /// Current framebuffer attached (None = plane disabled).
    pub fb: Option<FramebufferHandle>,
    /// Source rectangle in framebuffer (for scaling/cropping).
    pub src: Rectangle,
    /// Destination rectangle on screen.
    pub dst: Rectangle,
}

/// Plane type.
#[repr(u32)]
pub enum PlaneType {
    /// Primary plane (desktop contents).
    Primary = 0,
    /// Cursor plane (mouse cursor, small, high-priority).
    Cursor = 1,
    /// Overlay plane (video, additional window).
    Overlay = 2,
}

Cursor plane optimization: Moving the cursor only requires updating the cursor plane's dst rectangle. The compositor does NOT need to re-render the desktop or flip the primary plane. This is why the cursor can follow the display's refresh rate (for example, 144 Hz) even when the compositor redraws the desktop at only 60 Hz.

21.5.7 Hotplug Detection¶

When a display is connected (USB-C DP Alt Mode, HDMI, etc.), the display controller raises an interrupt. The driver handles hotplug in the interrupt handler:

// umka-core/src/display/hotplug.rs

/// Per-display-controller device state.
/// One instance per display controller (e.g., one per i915 GPU, one per
/// DisplayPort MST hub). Created by the display driver during probe.
pub struct DisplayDevice {
    /// Device registry handle for this display controller.
    pub device_id: DisplayDeviceId,
    /// All connectors attached to this controller (HDMI, DP, eDP, etc.).
    pub connectors: ArrayVec<DisplayConnector, MAX_CONNECTORS>,
    /// Hardware display planes available for composition.
    pub planes: ArrayVec<DisplayPlane, MAX_PLANES>,
    /// MMIO base address for display controller registers.
    pub mmio_base: u64,
    /// IRQ number for hotplug/vblank interrupts.
    pub irq: u32,
}

impl DisplayDevice {
    /// Hotplug interrupt handler (runs in Tier 1 driver domain).
    pub fn handle_hotplug_interrupt(&self) {
        // Scan all connectors for state changes.
        for connector in &self.connectors {
            let new_state = self.read_connector_state(connector.id);
            let old_state = connector.state.load(Ordering::Acquire);

            if new_state as u32 != old_state {
                connector.state.store(new_state as u32, Ordering::Release);

                if new_state as u32 == ConnectorState::Connected as u32 {
                    // Display connected: read EDID, parse modes.
                    // Build a new ConnectorProps snapshot and swap it into
                    // the RCU pointer atomically; readers are always lock-free.
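                    if let Ok(edid) = self.read_edid(connector.id) {
                        // parse_edid() returns a Result. On a bad header or
                        // checksum, keep the raw EDID but publish no modes.
                        let modes = parse_edid(&edid).unwrap_or_else(|_| ArrayVec::new());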
                        let new_props = Arc::new(ConnectorProps {
                            edid: Some(edid),
                            modes,
                            active_mode: None,
                        });
                        connector.props.swap(new_props);
                    }
                    // Post hotplug event to userspace.
                    self.post_hotplug_event(connector.id, HotplugEventType::Connected);
                } else {
                    // Display disconnected: replace props with an empty snapshot.
                    let new_props = Arc::new(ConnectorProps {
                        edid: None,
                        modes: ArrayVec::new(),
                        active_mode: None,
                    });
                    connector.props.swap(new_props);
                    self.post_hotplug_event(connector.id, HotplugEventType::Disconnected);
                }
            }
        }
    }
}

Compositor response: When the compositor receives a hotplug event (via the event ring buffer), it:

1. Re-enumerates connectors and modes (ioctl(DRM_IOCTL_MODE_GETRESOURCES)).
2. Decides how to configure the new display (extended desktop, mirror, ignore).
3. Allocates new framebuffers (if needed) for the new resolution.
4. Submits an atomic modesetting request to enable the new connector.

21.5.7.1.1 parse_edid() — EDID Parsing Specification¶

parse_edid() is called during hotplug handling (see handle_hotplug_interrupt above) to convert raw EDID bytes read from the monitor's I2C DDC bus into a list of display modes.

/// Parse an EDID (Extended Display Identification Data) blob into display modes.
///
/// Supports EDID 1.0–1.4 (128 bytes) and E-EDID (DisplayID, CTA-861 extensions,
/// up to 512 bytes). Input is the raw bytes read from the monitor's I2C DDC bus.
///
/// # Algorithm
///
/// 1. **Header validation**: First 8 bytes must be `[0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00]`.
///    Return `Err(EdidError::InvalidHeader)` if not.
///
/// 2. **Checksum**: Sum all 128 bytes; result must be 0 (mod 256).
///    Return `Err(EdidError::BadChecksum)` if not.
///
/// 3. **Established timings** (bytes 35–37, a 24-bit bitmap of well-known modes;
///    the low 7 bits of byte 37 are manufacturer-reserved):
///    Bit map to modes: bit 7 of byte 35 = 720×400@70Hz, bit 6 = 720×400@88Hz, ...
///    (See VESA EDID standard Table 3.20 for full mapping.)
///    Add each set bit as a `DisplayMode` to the output list.
///
/// 4. **Standard timing descriptors** (bytes 38–53, 8 entries × 2 bytes):
///    Each entry encodes horizontal active pixels and aspect ratio + refresh rate.
///    Skip entries equal to `0x0101` (unused).
///    Formula: `h_active = (byte0 + 31) * 8; v_active = h_active / aspect_ratio;
///               refresh = (byte1 & 0x3F) + 60`
///
/// 5. **Detailed timing descriptors** (bytes 54–125, 4 × 18-byte blocks):
///    Each 18-byte block is either a monitor descriptor (bytes 0–1, the pixel
///    clock field, are 0x0000) or a detailed timing (pixel clock non-zero).
///    Testing only the first byte is not enough: a valid pixel clock can have a
///    zero low byte. Detailed timings encode pixel clock, h/v active,
///    h/v blanking, sync polarity, and flags for interlaced/stereo.
///    Parse each detailed timing as a `DisplayMode`.
///
/// 6. **CEA/CTA extensions** (each extension block is 128 bytes, same checksum rule):
///    Tag byte 0x02 = CEA-861 extension. Parse Video Data Block (tag=2), short
///    video descriptors (SVDs), and native mode indicator. VIC (Video Identification
///    Code) → mode lookup table (CEA-861-F Table 1).
///
/// # Output
///
/// Returns `ArrayVec<DisplayMode, 64>`: up to 64 modes. Modes are sorted by
/// descending priority: detailed timings first (native mode = the CEA short
/// video descriptor with its native flag, bit 7, set, or else the first
/// detailed timing), then established timings, then standard timings.
/// Duplicate modes (same h×v×refresh) are deduplicated; the one from the highest-
/// priority source is kept.
///
/// # Error handling
/// Returns `Err(EdidError)` only when the buffer is shorter than 128 bytes or the
/// base 128-byte block fails the header/checksum check. Invalid or unrecognized
/// descriptor blocks are skipped silently (a partial mode list is better than no
/// modes at all).
pub fn parse_edid(raw: &[u8]) -> Result<ArrayVec<DisplayMode, 64>, EdidError>;

/// Errors returned by `parse_edid()`.
#[derive(Debug)]
pub enum EdidError {
    /// Buffer too short (< 128 bytes).
    TooShort,
    /// Magic header bytes wrong (first 8 bytes are not the EDID header pattern).
    InvalidHeader,
    /// Checksum over 128 bytes != 0 mod 256.
    BadChecksum,
}
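
As a rough illustration of steps 1, 2, and 4 of the algorithm above, the following sketch validates the base block and decodes one standard timing. It is not the UmkaOS implementation: the function names `validate_base_block` and `decode_standard_timing` are hypothetical, the error enum is redeclared locally so the example is self-contained, and aspect-ratio code 0b00 is assumed to mean 16:10 (its EDID 1.3+ meaning).

```rust
/// Local copy of the error variants listed above (redeclared for a standalone example).
#[derive(Debug, PartialEq, Eq)]
pub enum EdidError {
    TooShort,
    InvalidHeader,
    BadChecksum,
}

const EDID_HEADER: [u8; 8] = [0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00];

/// Steps 1 and 2: check length, header pattern, and checksum of the base block.
pub fn validate_base_block(raw: &[u8]) -> Result<(), EdidError> {
    if raw.len() < 128 {
        return Err(EdidError::TooShort);
    }
    if raw[..8] != EDID_HEADER {
        return Err(EdidError::InvalidHeader);
    }
    // All 128 bytes, including the checksum byte at offset 127, must sum to 0 mod 256.
    let sum = raw[..128].iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    if sum != 0 {
        return Err(EdidError::BadChecksum);
    }
    Ok(())
}

/// Step 4: decode one 2-byte standard timing into (h_active, v_active, refresh_hz).
/// Returns `None` for the unused marker 0x01 0x01.
/// Assumption: aspect code 0b00 is treated as 16:10 (EDID 1.3 and later).
pub fn decode_standard_timing(byte0: u8, byte1: u8) -> Option<(u32, u32, u32)> {
    if byte0 == 0x01 && byte1 == 0x01 {
        return None;
    }
    let h_active = (u32::from(byte0) + 31) * 8;
    // Bits 7..6 of byte1 select the aspect ratio as (horizontal, vertical).
    let (num, den) = match byte1 >> 6 {
        0b00 => (16, 10),
        0b01 => (4, 3),
        0b10 => (5, 4),
        _ => (16, 9),
    };
    let v_active = h_active * den / num;
    let refresh_hz = u32::from(byte1 & 0x3F) + 60;
    Some((h_active, v_active, refresh_hz))
}
```

For example, the standard timing bytes `0xD1 0xC0` decode to 1920×1080 at 60 Hz under these rules.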

21.5.8 Panel Self-Refresh (PSR)¶

When the compositor has not updated the framebuffer (static desktop), the display controller can enter Panel Self-Refresh mode:

- The monitor's internal controller (eDP panel, DP monitor with PSR support) caches the last frame.
- The GPU's scanout engine stops reading from VRAM (memory bandwidth saved).
- The GPU's memory controller enters a low-power state (watts saved).

When the compositor updates the framebuffer (user moves the mouse, window animates), the display driver detects the change (via atomic commit) and exits PSR mode, resuming scanout.

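Power savings: PSR typically saves on the order of 1-2 W on a laptop while the screen is static (for example, while reading a document or leaving the desktop idle). Video playback updates the framebuffer every frame, so it keeps the panel out of PSR. For typical office workloads this can extend battery life by roughly 10-15%.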

21.5.9 Variable Refresh Rate (VRR)¶

Modern monitors support VRR (FreeSync, G-Sync, HDMI VRR): the display refreshes at variable intervals (e.g., 40-144 Hz) synchronized with the compositor's render rate. This eliminates tearing without vsync's fixed-cadence latency.

VrrMode enum — the mode selector passed to the hardware driver:

// umka-core/src/display/vrr.rs

/// Variable Refresh Rate mode selector.
///
/// Passed to [`DisplayHwOps::set_vrr_mode`] (Section 21.5.12) to program the
/// display controller's Adaptive-Sync (DP), HDMI VRR, or FreeSync registers.
#[repr(u32)]
pub enum VrrMode {
    /// VRR disabled — the display runs at a fixed refresh rate determined by the
    /// active `DisplayMode`.
    Disabled = 0,
    /// VRR enabled — the display refreshes at variable intervals within the range
    /// reported by [`DisplayHwOps::get_vrr_range`] (Section 21.5.12).
    Enabled = 1,
}

Call path: DisplayConnector::set_vrr() validates the active mode, then delegates to the hardware driver via the DisplayHwOps::set_vrr_mode function pointer defined in Section 21.5.12. The set_vrr_mode field is Option<fn>: drivers that do not support VRR leave it None, and the call returns DisplayError::VrrNotSupported.

impl DisplayConnector {
    /// Enable or disable VRR on this connector.
    ///
    /// Checks that the active mode advertises VRR capability and that the
    /// hardware driver provides a `set_vrr_mode` implementation.
    pub fn set_vrr(&self, enabled: bool) -> Result<(), DisplayError> {
        // Read the RCU-protected ConnectorProps snapshot (lock-free; safe in interrupt context).
        let props = self.props.read();
        // Borrow the mode rather than moving it out of the shared snapshot.
        let mode = props.active_mode.as_ref().ok_or(DisplayError::NoActiveMode)?;
        if (mode.flags & mode_flags::VRR) == 0 {
            return Err(DisplayError::VrrNotSupported);
        }
        let vrr_mode = if enabled { VrrMode::Enabled } else { VrrMode::Disabled };
        let set_vrr = self.driver.hw_ops.set_vrr_mode
            .ok_or(DisplayError::VrrNotSupported)?;
        // SAFETY: driver context pointer is valid for the lifetime of the DisplayDevice.
        let rc = unsafe { set_vrr(self.driver.ctx, self.id, vrr_mode) };
        IoResultCode::into_result(rc).map_err(|_| DisplayError::HwError)
    }
}

Range query: Compositors must know the monitor's supported VRR range to clamp their render rate and decide whether to enable Low Framerate Compensation (LFC) below min_mhz. The range is obtained via DisplayHwOps::get_vrr_range (also in Section 21.5.12), which reads the range from EDID/DisplayID (DP Adaptive-Sync), HDMI Forum VSDB, or vendor extensions.
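
To make the clamping and LFC decision concrete, here is a minimal sketch of how a compositor might pick a refresh rate from the reported range. The function name `vrr_refresh_for` is hypothetical, and the sketch assumes that `min_mhz`/`max_mhz` are expressed in millihertz and that LFC works by scanning out each rendered frame an integer number of times.

```rust
/// Pick the panel refresh rate (millihertz) for a given render rate.
/// Returns `(refresh_mhz, repeats)`, where `repeats` is how many times each
/// rendered frame is scanned out, or `None` if no integer multiple fits.
/// Assumption: all rates are in millihertz (48_000 = 48 Hz).
pub fn vrr_refresh_for(render_mhz: u32, min_mhz: u32, max_mhz: u32) -> Option<(u32, u32)> {
    if render_mhz == 0 || min_mhz == 0 || min_mhz > max_mhz {
        return None;
    }
    if render_mhz >= max_mhz {
        // Rendering faster than the panel can refresh: clamp to the maximum.
        return Some((max_mhz, 1));
    }
    if render_mhz >= min_mhz {
        // Inside the VRR window: refresh exactly when a frame is ready.
        return Some((render_mhz, 1));
    }
    // Below the window: LFC repeats each frame so the effective refresh rate
    // lands back inside [min_mhz, max_mhz]. repeats = ceil(min / render).
    let repeats =
        ((u64::from(min_mhz) + u64::from(render_mhz) - 1) / u64::from(render_mhz)) as u32;
    let refresh = render_mhz.checked_mul(repeats)?;
    if refresh <= max_mhz {
        Some((refresh, repeats))
    } else {
        None
    }
}
```

With a 48-144 Hz panel, a 30 fps render loop maps to a 60 Hz refresh with each frame shown twice. A narrow 48-60 Hz panel cannot compensate a 35 fps loop, because doubling overshoots the maximum.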

Compositor use: Wayland compositors (KWin, Mutter, wlroots) query the VRR range, enable VRR via the atomic modesetting commit (Section 21.5), and schedule presentation to match the compositor's render loop (unlocked framerate, no vsync wait).

21.5.10 VBlank Handling and Synchronization¶

VBlank (vertical blanking interval) is the fundamental display timing primitive. The display controller generates a VBlank interrupt at the start of each blanking interval (between the last scanline of one frame and the first scanline of the next). All page flips, cursor moves, and mode changes are synchronized to VBlank to avoid tearing.

// umka-core/src/display/vblank.rs

/// Kernel-internal VBlank ring buffer event. NOT the DRM ABI struct (see
/// DrmEventVblank for the ABI-facing version). The event_type/length header
/// is for the internal event ring multiplexing, not for userspace consumption.
// kernel-internal, not KABI — internal event ring format, not userspace-facing.
#[repr(C)]
pub struct VblankEvent {
    /// Event type discriminant (for the generic event ring).
    pub event_type: u32,            // = DRM_EVENT_VBLANK (0x01)
    /// Size of this event struct in bytes.
    pub length: u32,                // = 32
    /// Monotonic timestamp (ns) at which VBlank occurred (from CLOCK_MONOTONIC).
    pub timestamp_ns: u64,
    /// VBlank sequence counter (monotonically increasing, wraps at u64::MAX).
    pub sequence: u64,
    /// CRTC ID that generated this VBlank.
    pub crtc_id: u32,
    /// Padding.
    pub _pad: u32,
}

/// Per-CRTC VBlank tracking state (kernel-internal).
pub struct VblankState {
    /// Monotonically increasing VBlank counter.
    pub count: AtomicU64,
    /// Timestamp of the most recent VBlank (ns, CLOCK_MONOTONIC).
    pub last_timestamp_ns: AtomicU64,
    /// Wait queue for threads blocked on VBlank (epoll_wait, ioctl WAITVBLANK).
    pub waiters: WaitQueue,
    /// Number of userspace clients requesting VBlank events on this CRTC.
    /// When zero, the kernel masks the VBlank interrupt to save power.
    pub event_refcount: AtomicU32,
    /// Whether this CRTC is actively scanning out (false during DPMS off/suspend).
    pub enabled: AtomicBool,
}

VBlank interrupt handler (runs in Tier 1 driver domain):

VBlank IRQ fires:
  1. Read hardware VBlank status register, acknowledge interrupt.
  2. Increment vblank_state.count (atomic).
  2a. Copy `vblank_state.count.load(Relaxed) as u32` to `DrmEventVblank.sequence`
      (truncation is ABI-required; see longevity comment at DrmEventVblank definition).
  3. Store current timestamp in vblank_state.last_timestamp_ns (atomic).
  4. If a page flip was pending (committed via atomic modesetting with !ASYNC flag):
     a. The hardware has latched the new framebuffer address — the flip is complete.
     b. Post a PAGE_FLIP_COMPLETE event to the compositor's event ring buffer.
     c. Release the old framebuffer's reference (it is no longer being scanned out).
  5. If vblank_state.event_refcount > 0:
     a. Post a VblankEvent to each subscribed client's event ring buffer.
  6. Wake all threads on vblank_state.waiters.
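
Steps 2, 2a, 3, and 5 of the handler can be sketched with standard atomics as follows. This is an illustration only: the struct is a trimmed-down stand-in for `VblankState` (no wait queue), `VblankTick` and `on_vblank_irq` are hypothetical names, and the `Relaxed` orderings are an assumption rather than a statement about the real handler.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Trimmed-down stand-in for `VblankState` (wait queue and `enabled` omitted).
pub struct VblankState {
    pub count: AtomicU64,
    pub last_timestamp_ns: AtomicU64,
    pub event_refcount: AtomicU32,
}

/// What the handler decided for one interrupt (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct VblankTick {
    /// Full 64-bit kernel-internal sequence number.
    pub sequence: u64,
    /// Sequence truncated to the 32-bit ABI field.
    pub abi_sequence: u32,
    /// Whether any client is subscribed and an event should be posted.
    pub post_event: bool,
}

pub fn on_vblank_irq(state: &VblankState, now_ns: u64) -> VblankTick {
    // Step 2: fetch_add returns the previous value, so add 1 for the new count.
    let sequence = state.count.fetch_add(1, Ordering::Relaxed).wrapping_add(1);
    // Step 2a: the userspace ABI only carries the low 32 bits.
    let abi_sequence = sequence as u32;
    // Step 3: record when this VBlank happened.
    state.last_timestamp_ns.store(now_ns, Ordering::Relaxed);
    // Step 5: only post events if at least one client subscribed.
    let post_event = state.event_refcount.load(Ordering::Relaxed) > 0;
    VblankTick { sequence, abi_sequence, post_event }
}
```

The internal counter keeps counting past `u32::MAX` while the ABI value wraps to 0, which is the truncation that step 2a describes.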

VBlank event delivery to userspace: Compositors subscribe to VBlank events via ioctl(drm_fd, UMKA_DRM_CRTC_ENABLE_VBLANK, crtc_id). Events are delivered through the DRM file descriptor's event ring buffer (readable via read(2) or epoll). When the last subscriber unsubscribes, the kernel masks the VBlank interrupt to avoid unnecessary IRQ overhead on idle displays.

VBlank-synchronized page flips: When the compositor submits an atomic commit without the ASYNC flag, the kernel programs the new framebuffer address into a shadow register. The hardware latches the shadow register on the next VBlank, atomically switching scanout to the new framebuffer. The compositor blocks (or polls) until the PAGE_FLIP_COMPLETE event confirms the flip.

VBlank Event Ring Specification:

/// Per-CRTC VBlank event ring. Written by the display interrupt handler;
/// read by compositors and frame-synchronization tools.
pub struct VblankEventRing {
    /// Ring buffer entries. Fixed size: 64 events × 32 bytes = 2 KiB per CRTC.
    /// 64 entries provide ~1 second of headroom at 60 Hz (64 / 60 ≈ 1.07 s): the
    /// compositor can stall for that long before the oldest event is overwritten.
    pub ring: RingBuffer<VblankEvent, 64>,
    /// Subscribers waiting for the next VBlank (list of tasks blocked on poll/select
    /// or io_uring POLL_ADD for the CRTC's event fd).
    pub waiters: WaitQueue,
    /// Monotonic VBlank counter. Wraps at u64::MAX (~4.9 billion years at 120Hz).
    pub sequence: AtomicU64,
    /// Timestamp of the last VBlank (CLOCK_MONOTONIC nanoseconds).
    pub last_vblank_ns: AtomicU64,
}

Userspace DRM event structs — the ABI format delivered via read() on the DRM device fd. Must match Linux include/uapi/drm/drm.h exactly:

/// Base DRM event header (8 bytes).
/// Userspace ABI — matches Linux `struct drm_event` from `include/uapi/drm/drm.h`.
#[repr(C)]
pub struct DrmEvent {
    /// Event type: DRM_EVENT_VBLANK (0x01), DRM_EVENT_FLIP_COMPLETE (0x02),
    /// DRM_EVENT_CRTC_SEQUENCE (0x03).
    pub event_type: u32,
    /// Total length of this event including header (e.g., 32 for DrmEventVblank).
    pub length:     u32,
}
// DrmEvent: u32(4)*2 = 8 bytes.
// Userspace ABI struct — delivered via read(2) on DRM device fd.
const_assert!(core::mem::size_of::<DrmEvent>() == 8);

/// VBlank / page-flip completion event (32 bytes, matches Linux drm_event_vblank).
// Userspace ABI struct — matches Linux struct drm_event_vblank from
// include/uapi/drm/drm.h. Delivered to userspace via read(drm_fd) to
// Wayland compositors, Mesa, and every DRM client. Do NOT modify layout.
#[repr(C)]
pub struct DrmEventVblank {
    pub base:       DrmEvent,
    /// User-provided data from DRM_IOCTL_PAGE_FLIP or DRM_IOCTL_WAIT_VBLANK.
    pub user_data:  u64,
    /// Timestamp (seconds since epoch or since boot, depending on clock source).
    /// **ABI-constrained**: u32 matches Linux `drm_event_vblank.tv_sec`.
    /// With CLOCK_MONOTONIC (default since Linux 4.15): wraps in ~136 years
    /// from boot — well within 50-year uptime target. With CLOCK_REALTIME:
    /// wraps in 2106 (Y2106 problem, same as Linux). Clock source selected
    /// via `DRM_CAP_TIMESTAMP_MONOTONIC` capability query.
    pub tv_sec:     u32,
    /// Timestamp (microseconds).
    pub tv_usec:    u32,
    /// VBlank sequence counter.
    /// **ABI-constrained**: u32 matches Linux `drm_event_vblank.sequence`.
    /// At 120 Hz, wraps in ~414 days. DRM userspace (Mesa, Weston) handles
    /// wrap via unsigned arithmetic comparison. Not a correctness issue.
    pub sequence:   u32,
    /// CRTC ID that generated this event.
    /// Historically named `reserved` for DRM_EVENT_VBLANK (type 0x01);
    /// formally `crtc_id` since DRM_EVENT_CRTC_SEQUENCE (type 0x03).
    /// Modern Linux drivers fill this for all event types.
    pub crtc_id:    u32,
}
const _: () = assert!(core::mem::size_of::<DrmEventVblank>() == 32);

When delivering VBlank events to userspace via read(drm_fd), the kernel translates VblankEvent (internal) to DrmEventVblank (ABI): timestamp_ns splits into tv_sec = timestamp_ns / 1_000_000_000 and tv_usec = (timestamp_ns % 1_000_000_000) / 1_000; sequence truncates to u32; user_data is filled from the subscription's page-flip or vblank-wait ioctl request.
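
The translation described above can be sketched as a plain conversion function. The struct layouts repeat the definitions from this section so the example stands alone. The function name `to_abi_event` is hypothetical, and `user_data` is passed in as a parameter because it comes from the subscription, not from the internal event.

```rust
use std::mem::size_of;

pub const DRM_EVENT_VBLANK: u32 = 0x01;

/// Kernel-internal event (same layout as `VblankEvent` above).
#[repr(C)]
pub struct VblankEvent {
    pub event_type: u32,
    pub length: u32,
    pub timestamp_ns: u64,
    pub sequence: u64,
    pub crtc_id: u32,
    pub _pad: u32,
}

/// ABI header (same layout as `DrmEvent` above).
#[repr(C)]
pub struct DrmEvent {
    pub event_type: u32,
    pub length: u32,
}

/// ABI event (same layout as `DrmEventVblank` above).
#[repr(C)]
pub struct DrmEventVblank {
    pub base: DrmEvent,
    pub user_data: u64,
    pub tv_sec: u32,
    pub tv_usec: u32,
    pub sequence: u32,
    pub crtc_id: u32,
}

/// Convert an internal event into the userspace ABI form.
pub fn to_abi_event(ev: &VblankEvent, user_data: u64) -> DrmEventVblank {
    DrmEventVblank {
        base: DrmEvent {
            event_type: ev.event_type,
            length: size_of::<DrmEventVblank>() as u32,
        },
        user_data,
        // Whole seconds, truncated to the 32-bit ABI field.
        tv_sec: (ev.timestamp_ns / 1_000_000_000) as u32,
        // Remaining nanoseconds expressed as microseconds (always < 1_000_000).
        tv_usec: ((ev.timestamp_ns % 1_000_000_000) / 1_000) as u32,
        // Low 32 bits of the 64-bit internal counter.
        sequence: ev.sequence as u32,
        crtc_id: ev.crtc_id,
    }
}
```

A timestamp of 5,123,456,789 ns becomes `tv_sec = 5`, `tv_usec = 123_456`, and an internal sequence of 2^32 + 7 is delivered as 7.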

The kernel writes DrmEventVblank entries into the DRM fd's read buffer. Compositors read them via read(drm_fd, buf, sizeof(DrmEventVblank)). Multiple events may be coalesced in one read() call; the length field allows parsing sequential events.

Subscription mechanism:

/// Request a VBlank notification for a specific CRTC.
/// Returns a subscription that becomes readable when the next VBlank fires.
/// Equivalent to DRM_IOCTL_WAIT_VBLANK with DRM_VBLANK_EVENT flag.
pub fn drm_vblank_subscribe(crtc_id: u32) -> Result<VblankSubscription, DrmError>;

pub struct VblankSubscription {
    /// Event fd: readable when VBlank fires (or immediately if missed_vblanks > 0).
    pub event_fd: EventFd,
    /// If > 0, this subscription was registered late and missed this many VBlanks.
    /// The first read from event_fd will return immediately to signal the miss.
    pub missed_vblanks: u32,
}

Overflow behavior: When the 64-entry ring overflows (compositor not reading fast enough):

- The oldest event is overwritten (ring buffer semantics; latest events take priority).
- The missed_count field on the subscription's next event is set to the number of overwritten events.
- The compositor can detect overflow by checking event.sequence != last_sequence + 1.
- No compositor blocking: the ring is lock-free (SPSC; interrupt handler is the single producer, compositor is the single consumer per subscription). Overflow silently drops old events and never blocks the interrupt handler.
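
On the consumer side, the sequence-gap check can be written so that it also survives the 32-bit wrap of the ABI counter. This is a small sketch with a hypothetical helper name, `missed_between`. It assumes the compositor remembers the sequence number of the last event it read.

```rust
/// Number of VBlank events lost between two consecutively read events,
/// computed on the 32-bit ABI sequence counter.
///
/// Wrapping arithmetic keeps the result correct across the u32 wrap.
/// A result of 0 means `sequence` directly follows `last_sequence`.
pub fn missed_between(last_sequence: u32, sequence: u32) -> u32 {
    sequence.wrapping_sub(last_sequence).wrapping_sub(1)
}
```

For instance, reading sequence 15 after sequence 10 reports four lost events, and reading 0 after `u32::MAX` reports none.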

Interrupt path: The display interrupt handler directly writes to VblankEventRing.ring using the interrupt-safe SPSC write path (no allocation, O(1)). Then calls WaitQueue::wake_all(&ring.waiters) to wake subscribed compositors.

21.5.11 Multi-Monitor Coordination¶

A display controller typically has multiple CRTCs (CRT Controllers — the name is historical; they drive flat panels too). Each CRTC is an independent timing generator that scans out one framebuffer to one or more connectors. The mapping is:

Display Pipeline:
  Planes → CRTC → Encoder → Connector → Monitor
             │
             ├── Each CRTC has independent timing (mode, refresh rate, VBlank)
             ├── Each CRTC owns a set of planes (primary + optional cursor/overlay)
             └── Multiple connectors can share a CRTC (clone/mirror mode)

21.5.11.1.1 Color Management: GammaLut and CrtcColorProperties¶

Before the CRTC structs, this section defines the color management types used by CrtcState. These types represent the CRTC-level display pipeline color correction stages: de-gamma (linearization), CTM (color space conversion), and gamma (re-gamma for display encoding). They match the Linux DRM color management ABI so that Wayland compositors and color management tools (colord, icc-profiles) work unmodified.

Display color pipeline (stages applied in hardware order):

Plane pixels (encoded, e.g., sRGB)
  → per-plane tone mapping (optional, plane-level property)
  → alpha compositing / blending
  → CRTC degamma LUT  (encoded → linear light, using degamma_lut)
  → CTM               (color space conversion, e.g., sRGB → display native)
  → CRTC gamma LUT    (linear light → display encoding, using gamma_lut)
  → scanout to panel

// umka-core/src/display/color.rs

/// Single entry in a hardware gamma lookup table.
///
/// Maps one input intensity level to per-channel output intensities.
/// The hardware applies the LUT per channel: `R_out = lut.red[R_in >> shift]`,
/// where `shift` accounts for the difference between input bit depth (e.g., 10-bit
/// pipe) and the LUT size (e.g., 256 entries → shift = 2 for a 10-bit pipe).
///
/// Layout matches Linux's `struct drm_color_lut` (see `include/uapi/drm/drm_mode.h`)
/// for binary ABI compatibility with Wayland compositors, Xorg, and color
/// management daemons that set gamma via `DRM_IOCTL_MODE_SETGAMMA` or the
/// atomic `DRM_IOCTL_MODE_ATOMIC` with the `GAMMA_LUT` CRTC property blob.
#[repr(C)]
pub struct GammaLutEntry {
    /// Red channel output value (16-bit, linear, range 0..=65535).
    pub red: u16,
    /// Green channel output value (16-bit, linear, range 0..=65535).
    pub green: u16,
    /// Blue channel output value (16-bit, linear, range 0..=65535).
    pub blue: u16,
    /// Reserved for alignment; must be zero.
    /// (Matches the padding in `struct drm_color_lut` for ABI compatibility.)
    pub _reserved: u16,
}
// GammaLutEntry: u16(2)*4 = 8 bytes.
// Userspace ABI struct — matches Linux `struct drm_color_lut`.
const_assert!(core::mem::size_of::<GammaLutEntry>() == 8);

/// Color correction gamma LUT (Look-Up Table).
/// Pre-allocated at display device initialization to the hardware's LUT capacity.
/// `Box<[GammaLutEntry]>` over `Vec<GammaLutEntry>`: allocated once at init,
/// never resized. Atomic modeset context writes into the pre-allocated slice
/// without risk of allocation failure.
///
/// Used for both CRTC-level LUT stages, which run after plane blending (see
/// the pipeline above): the de-gamma LUT (linearizes encoded pixels ahead of
/// the CTM) and the gamma LUT (re-encodes linear light into display gamma).
/// The number of entries
/// is hardware-dependent — query via `CrtcProperties::gamma_lut_size` (the
/// read-only `GAMMA_LUT_SIZE` DRM CRTC property).
///
/// The kernel validates that `count <= entries.len()` on every atomic
/// commit that includes a `GAMMA_LUT` or `DEGAMMA_LUT` property update.
/// Mismatched sizes are rejected with `-EINVAL`.
///
/// **Default (linear) LUT**: When `CrtcColorProperties::gamma_lut` is `None`,
/// the hardware applies a linear identity mapping: entry `i` maps to
/// `i * 65535 / (size - 1)` per channel. This is the power-on default and the
/// behavior when color management is not requested.
///
/// **Allocation**: The `Box<[GammaLutEntry]>` is allocated once in
/// `drm_device_init()`, sized to the LUT capacity that the driver reports for
/// the display controller (the value exposed to userspace as `GAMMA_LUT_SIZE`).
/// After initialization the slice is never reallocated; atomic modesetting
/// paths only update entries within the already-allocated slice, so there is
/// no allocation failure path in the atomic commit code.
pub struct GammaLut {
    /// Pre-allocated LUT entries. Capacity = hardware LUT size (typically 256
    /// or 1024 per channel, reported by the driver at init and exposed to
    /// userspace as `GAMMA_LUT_SIZE`).
    /// Entries in order from darkest (index 0, input = black) to
    /// brightest (index count-1, input = full intensity).
    pub entries: Box<[GammaLutEntry]>,
    /// Number of valid entries (≤ entries.len()). Must match the hardware's
    /// `GAMMA_LUT_SIZE` property for the target CRTC on every atomic commit.
    pub count: u32,
}

/// 3×3 color transform matrix (CTM) in S31.32 fixed-point format.
///
/// Applied between the de-gamma and gamma stages to convert between color spaces
/// (e.g., sRGB → DCI-P3, BT.709 → BT.2020, or ICC profile adjustments).
///
/// Entry `matrix[i][j]` is the contribution of input channel `j` to output
/// channel `i`, where channels are ordered R=0, G=1, B=2. The fixed-point
/// format is S31.32 sign-magnitude (not two's complement): bit 63 is the sign,
/// bits 62..32 are the integer part, and bits 31..0 are the fractional part.
/// The `i64` fields therefore hold raw bit patterns, not arithmetic values.
/// This matches the Linux DRM CTM property blob
/// layout (`struct drm_color_ctm`, `include/uapi/drm/drm_mode.h`).
///
/// Identity matrix (no color conversion):
/// ```
/// matrix = [[1<<32, 0, 0],
///            [0, 1<<32, 0],
///            [0, 0, 1<<32]]
/// ```
#[repr(C)]
pub struct ColorTransformMatrix {
    /// Row-major 3×3 matrix. `matrix[output_channel][input_channel]`.
    pub matrix: [[i64; 3]; 3],
}
// ColorTransformMatrix: 9 × i64(8) = 72 bytes.
// Userspace ABI struct — matches Linux `struct drm_color_ctm`.
const_assert!(core::mem::size_of::<ColorTransformMatrix>() == 72);

/// CRTC color management properties.
///
/// Grouped as a sub-struct within `CrtcState` so that all color properties
/// are updated atomically as part of an RCU-swapped state snapshot. This
/// prevents a race where gamma is updated but the CTM is not yet applied,
/// which would briefly produce incorrect colors on a live display.
pub struct CrtcColorProperties {
    /// Gamma LUT for post-blending correction (CRTC-level re-encoding).
    /// Applied after the CTM, converts linear light to display-encoded values.
    /// `None` means linear (no gamma correction — hardware applies identity LUT).
    /// Set via the atomic `GAMMA_LUT` CRTC property blob.
    pub gamma_lut: Option<GammaLut>,
    /// De-gamma LUT for linearization (CRTC-level).
    /// Applied after plane blending and before the CTM (see the pipeline
    /// above); converts encoded (e.g., sRGB) pixels to linear light so that
    /// the CTM operates on linear values.
    /// `None` means input is treated as linear (no de-gamma applied).
    /// Set via the atomic `DEGAMMA_LUT` CRTC property blob.
    pub degamma_lut: Option<GammaLut>,
    /// Color transform matrix (CTM) for color space conversion.
    /// Applied between de-gamma and gamma stages.
    /// `None` means identity (no color space conversion).
    /// Set via the atomic `CTM` CRTC property blob.
    pub ctm: Option<ColorTransformMatrix>,
}

Linux DRM ABI compatibility:

- GammaLutEntry layout matches struct drm_color_lut exactly (field order and sizes are identical, including the 16-bit reserved padding field).
- ColorTransformMatrix layout matches struct drm_color_ctm (nine S31.32 values in row-major order).
- Gamma and de-gamma LUTs are set as blob properties via DRM_IOCTL_MODE_ATOMIC with the CRTC property names GAMMA_LUT and DEGAMMA_LUT.
- The read-only CRTC property GAMMA_LUT_SIZE (and DEGAMMA_LUT_SIZE if the hardware has a separate de-gamma LUT) reports the hardware LUT size in entries.
- The legacy DRM_IOCTL_MODE_SETGAMMA interface (which passes a simple 256-entry RGB table) is translated internally to a GammaLut with count = 256 (written into a pre-allocated slice of at least 256 entries) and applied as the gamma_lut property with degamma_lut = None, matching Linux behavior.
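
Two of the encodings above are easy to get subtly wrong, so a short sketch may help: filling a pre-allocated slice with the identity ramp `i * 65535 / (size - 1)`, and encoding a CTM coefficient as an S31.32 value. The helper names `fill_identity_lut` and `to_s31_32` are hypothetical. The CTM encoder assumes the sign-magnitude convention that Linux documents for `struct drm_color_ctm`, and returns the raw bit pattern as `u64`.

```rust
/// Same layout as `GammaLutEntry` above (redeclared for a standalone example).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GammaLutEntry {
    pub red: u16,
    pub green: u16,
    pub blue: u16,
    pub _reserved: u16,
}

/// Fill an already-allocated LUT slice with the linear identity ramp.
/// No allocation happens here, in line with the pre-allocation rule above.
pub fn fill_identity_lut(entries: &mut [GammaLutEntry]) {
    let size = entries.len() as u64;
    for (i, entry) in entries.iter_mut().enumerate() {
        // A one-entry LUT has no ramp; map it to 0 to avoid dividing by zero.
        let v = if size > 1 {
            (i as u64 * 65535 / (size - 1)) as u16
        } else {
            0
        };
        *entry = GammaLutEntry { red: v, green: v, blue: v, _reserved: 0 };
    }
}

/// Encode a coefficient as S31.32 sign-magnitude: bit 63 is the sign,
/// the remaining 63 bits hold |value| * 2^32.
/// Assumption: sign-magnitude, as documented for Linux `drm_color_ctm`.
pub fn to_s31_32(value: f64) -> u64 {
    // `as u64` saturates for out-of-range floats; the mask keeps bit 63 free.
    let magnitude = (value.abs() * 4_294_967_296.0).round() as u64 & 0x7FFF_FFFF_FFFF_FFFF;
    if value < 0.0 {
        magnitude | (1u64 << 63)
    } else {
        magnitude
    }
}
```

Because the spec's `ColorTransformMatrix` stores `i64`, a caller would cast the result with `as i64` to store the same bit pattern; a negative coefficient such as -0.5 must not be written as an ordinary two's-complement negative number.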

// umka-core/src/display/crtc.rs

/// CRTC (display timing generator).
///
/// All mutable CRTC properties (mode, plane assignments, connectors, gamma)
/// are grouped into a single `CrtcState` snapshot, swapped atomically via
/// RCU during modeset commit. This eliminates four separate RwLocks and
/// guarantees readers see a fully consistent CRTC configuration — no
/// half-applied modeset where `active_mode` is updated but `planes` is stale.
pub struct Crtc {
    /// CRTC index (0..num_crtcs-1, unique per display device).
    pub id: u32,
    /// VBlank tracking for this CRTC (independent lifecycle, not part
    /// of modeset state — vblank counters increment continuously).
    pub vblank: VblankState,
    /// Current CRTC state. Replaced atomically during modeset commit.
    /// VBlank handlers and userspace queries read lock-free via RCU.
    pub state: RcuPtr<Arc<CrtcState>>,
}

/// Immutable snapshot of CRTC configuration. Created during atomic commit
/// and swapped via RCU. Freed after grace period when superseded.
pub struct CrtcState {
    /// Current display mode (None = CRTC disabled).
    pub active_mode: Option<DisplayMode>,
    /// Planes assigned to this CRTC.
    pub planes: ArrayVec<u32, MAX_PLANES_PER_CRTC>,
    /// Connectors currently routed to this CRTC.
    pub connectors: ArrayVec<u32, MAX_CONNECTORS_PER_CRTC>,
    /// Color management properties (degamma LUT, CTM, gamma LUT).
    /// All three stages are updated atomically as part of this state snapshot.
    /// See `CrtcColorProperties` and the color pipeline diagram above.
    pub color: CrtcColorProperties,
}

/// Maximum CRTCs per display device (i915 = 4, AMD = 6, typical).
pub const MAX_CRTCS: usize = 8;
/// Maximum planes per CRTC (primary + cursor + overlays).
pub const MAX_PLANES_PER_CRTC: usize = 8;
/// Maximum connectors per CRTC (for clone/mirror).
pub const MAX_CONNECTORS_PER_CRTC: usize = 4;
/// Maximum connectors per display device.
pub const MAX_CONNECTORS: usize = 8;
/// Maximum planes per display device.
pub const MAX_PLANES: usize = 32;

Plane-to-CRTC assignment: Not all planes can drive all CRTCs. Each plane has a possible_crtcs bitmask (set by the driver during probe) indicating which CRTCs it can be attached to. The atomic commit validator checks this constraint. Example: on an Intel Gen12 GPU, the cursor plane for pipe A cannot be assigned to pipe B.
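
The bitmask check itself is small. The sketch below shows one way the validator could test it, using hypothetical helper names (`plane_supports_crtc`, `validate_plane_assignments`) and a simplified tuple in place of the real commit structures.

```rust
/// True if bit `crtc_index` is set in the plane's `possible_crtcs` mask.
pub fn plane_supports_crtc(possible_crtcs: u32, crtc_index: usize) -> bool {
    // Indices >= 32 cannot be represented in a u32 mask (and would overflow the shift).
    crtc_index < 32 && (possible_crtcs & (1u32 << crtc_index)) != 0
}

/// Check every requested (plane_id, possible_crtcs, target_crtc_index) triple.
/// Returns the id of the first plane whose target CRTC is not allowed.
pub fn validate_plane_assignments(assignments: &[(u32, u32, usize)]) -> Result<(), u32> {
    for &(plane_id, possible_crtcs, crtc_index) in assignments {
        if !plane_supports_crtc(possible_crtcs, crtc_index) {
            return Err(plane_id);
        }
    }
    Ok(())
}
```

A cursor plane with mask `0b0001` can only be attached to CRTC 0, so a commit that routes it to CRTC 1 is rejected before any hardware is touched.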

Bandwidth validation: When an atomic commit enables multiple CRTCs at high resolutions, the kernel validates that the total scanout bandwidth does not exceed the display controller's memory bandwidth limit:

bandwidth_check(commit):
  total_bw = 0
  for each active CRTC in commit:
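    mode = crtc.active_mode
    // framebuffer: assumed to be the one attached to this CRTC's primary plane
    bytes_per_px = framebuffer.format.bytes_per_pixel()
    total_bw += mode.clock_khz * 1000 * bytes_per_px  // bytes/sec; needs 64-bit arithmetic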
  if total_bw > display_device.max_scanout_bandwidth:
    return Err(DisplayError::InsufficientBandwidth)

This prevents configurations like 4x 4K@120Hz on a controller that can only sustain 2x 4K@120Hz, which would cause visual corruption or FIFO underruns.
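
A direct Rust rendering of the pseudocode shows why the accumulator should be 64-bit: two 4K@60 outputs at 4 bytes per pixel already exceed `u32::MAX` bytes per second. The types here are simplified, hypothetical stand-ins (`CrtcRequest` is not a spec type), and only one framebuffer per CRTC is counted, as in the pseudocode.

```rust
/// Simplified per-CRTC input to the check (hypothetical stand-in type).
pub struct CrtcRequest {
    /// Pixel clock of the requested mode, in kHz.
    pub clock_khz: u32,
    /// Bytes per pixel of the framebuffer being scanned out.
    pub bytes_per_pixel: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DisplayError {
    InsufficientBandwidth,
}

/// Sum scanout bandwidth (bytes/sec) over all active CRTCs and compare it
/// against the controller limit. Returns the total on success.
pub fn bandwidth_check(
    active: &[CrtcRequest],
    max_scanout_bandwidth: u64,
) -> Result<u64, DisplayError> {
    let mut total_bw: u64 = 0;
    for crtc in active {
        // kHz * 1000 = pixels/sec; times bytes per pixel = bytes/sec.
        let bw = u64::from(crtc.clock_khz)
            .saturating_mul(1000)
            .saturating_mul(u64::from(crtc.bytes_per_pixel));
        total_bw = total_bw.saturating_add(bw);
    }
    if total_bw > max_scanout_bandwidth {
        return Err(DisplayError::InsufficientBandwidth);
    }
    Ok(total_bw)
}
```

Saturating arithmetic means a nonsensical request can only push the total up to the maximum, where it fails the limit check, and never wraps around to a small value that would pass.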

Independent timing: Each CRTC runs at its own refresh rate. A laptop with a 120Hz internal panel (eDP) and a 60Hz external monitor (HDMI) has two CRTCs with independent VBlank timing. The compositor receives separate VBlank events for each and renders at independent cadences.

21.5.12 Display Register Abstraction¶

Display drivers access hardware via MMIO-mapped registers. To maintain the tier isolation model and support multiple display controller families, register access is abstracted behind a per-driver operations table:

// umka-core/src/display/hw.rs

/// Display hardware operations, implemented by each display driver
/// (i915, amdgpu, nouveau, etc.) and passed to the display core during probe.
/// This is a KABI vtable that crosses the Tier 1 driver boundary, so every
/// entry is a C-ABI function pointer rather than a Rust trait method.
#[repr(C)]
pub struct DisplayHwOps {
    /// Write a 32-bit value to a display register (MMIO offset from base).
    pub reg_write32: unsafe extern "C" fn(ctx: *mut c_void, offset: u32, value: u32),
    /// Read a 32-bit value from a display register.
    pub reg_read32: unsafe extern "C" fn(ctx: *mut c_void, offset: u32) -> u32,
    /// Program a CRTC's timing generator with the given mode.
    /// The driver translates DisplayMode into hardware-specific register values
    /// (PLL dividers, pipe timings, sync polarities).
    pub crtc_set_mode: unsafe extern "C" fn(
        ctx: *mut c_void,
        crtc_id: u32,
        mode: *const DisplayMode,
    ) -> IoResultCode,
    /// Enable/disable a CRTC's timing generator.
    /// `enable`: 1 = enable, 0 = disable. u8 instead of bool for stable
    /// C ABI (bool size is implementation-defined across compilers).
    pub crtc_enable: unsafe extern "C" fn(
        ctx: *mut c_void,
        crtc_id: u32,
        enable: u8,
    ) -> IoResultCode,
    /// Program a plane's scanout address and position.
    pub plane_update: unsafe extern "C" fn(
        ctx: *mut c_void,
        plane_id: u32,
        fb: *const Framebuffer,
        src: *const Rectangle,
        dst: *const Rectangle,
    ) -> IoResultCode,
    /// Commit all pending register writes atomically (latch on next VBlank).
    /// Called after crtc_set_mode/plane_update to apply changes together.
    pub commit_flush: unsafe extern "C" fn(ctx: *mut c_void) -> IoResultCode,
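    /// Read EDID from a connector over its DDC (I2C) bus. The driver writes at
    /// most `edid_buf_size` bytes to `out_edid` and stores the count in `*out_edid_len`.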
    pub read_edid: unsafe extern "C" fn(
        ctx: *mut c_void,
        connector_id: u32,
        out_edid: *mut u8,
        edid_buf_size: u32,
        out_edid_len: *mut u32,
    ) -> IoResultCode,
    /// Read connector hotplug state (connected/disconnected).
    pub read_connector_state: unsafe extern "C" fn(
        ctx: *mut c_void,
        connector_id: u32,
    ) -> ConnectorState,
    /// Acknowledge VBlank interrupt. Returns the CRTC ID that generated it.
    pub ack_vblank: unsafe extern "C" fn(
        ctx: *mut c_void,
        out_crtc_id: *mut u32,
    ) -> IoResultCode,
    /// Set DPMS power state on a connector.
    pub set_dpms: unsafe extern "C" fn(
        ctx: *mut c_void,
        connector_id: u32,
        state: DpmsState,
    ) -> IoResultCode,
    /// Enable/disable VRR (Adaptive-Sync/FreeSync/HDMI VRR) on a connector.
    /// Returns `IO_NOT_SUPPORTED` if monitor does not advertise VRR capability.
    pub set_vrr_mode: Option<unsafe extern "C" fn(
        ctx: *mut c_void,
        connector_id: u32,
        mode: VrrMode,
    ) -> IoResultCode>,
    /// Query the Variable Refresh Rate range supported by the connected monitor.
    /// The driver reads the VRR range from EDID/DisplayID (DP Adaptive-Sync),
    /// HDMI Forum VSDB, or vendor extensions (FreeSync). Compositors need this
    /// to clamp their render rate within the supported range and to decide
    /// whether to enable Low Framerate Compensation (LFC) below `min_mhz`.
    /// Returns `IO_NOT_SUPPORTED` if VRR is not advertised by the monitor.
    pub get_vrr_range: Option<unsafe extern "C" fn(
        ctx: *mut c_void,
        connector_id: u32,
        out_min_mhz: *mut u32,
        out_max_mhz: *mut u32,
    ) -> IoResultCode>,
}
// 12 fn ptrs × 8 bytes on 64-bit targets (Option<extern "C" fn> is pointer-sized).
const_assert!(core::mem::size_of::<DisplayHwOps>() == 96);

The display core (generic, hardware-independent code) calls DisplayHwOps methods to program the hardware. Each driver (i915, amdgpu, etc.) provides its own DisplayHwOps implementation that translates generic operations into hardware-specific register writes. This is the same VTable pattern used by all UmkaOS KABI interfaces (Section 12.1).

Register access isolation: Display drivers run in Tier 1. Their MMIO regions are mapped into the driver's isolation domain. The reg_write32/reg_read32 functions access MMIO directly (no syscall overhead). The display core, running in umka-core's domain, calls the driver's DisplayHwOps via a domain switch (~23-80 cycles).
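
To illustrate the vtable pattern, here is a minimal sketch of a driver-side implementation of the two register accessors. Everything in it is hypothetical: `MiniHwOps` is a two-entry stand-in for DisplayHwOps, `MockRegs` replaces real MMIO with an in-memory array, and `set_bits` is an invented helper showing how a caller would go through the function pointers. A real driver would use volatile MMIO accesses, and the domain switch is not modeled.

```rust
use core::ffi::c_void;

/// Hypothetical driver state: a fake register file instead of an MMIO mapping.
pub struct MockRegs {
    pub regs: [u32; 64],
}

/// Reduced two-entry version of the ops table, for illustration only.
#[repr(C)]
pub struct MiniHwOps {
    pub reg_write32: unsafe extern "C" fn(ctx: *mut c_void, offset: u32, value: u32),
    pub reg_read32: unsafe extern "C" fn(ctx: *mut c_void, offset: u32) -> u32,
}

/// Writes `value` at byte `offset`; out-of-range offsets are ignored.
pub unsafe extern "C" fn mock_write32(ctx: *mut c_void, offset: u32, value: u32) {
    // SAFETY: the caller guarantees `ctx` points to a live MockRegs.
    let dev = unsafe { &mut *(ctx as *mut MockRegs) };
    if let Some(slot) = dev.regs.get_mut((offset / 4) as usize) {
        *slot = value;
    }
}

/// Reads the register at byte `offset`; out-of-range offsets read as 0.
pub unsafe extern "C" fn mock_read32(ctx: *mut c_void, offset: u32) -> u32 {
    // SAFETY: the caller guarantees `ctx` points to a live MockRegs.
    let dev = unsafe { &*(ctx as *const MockRegs) };
    dev.regs.get((offset / 4) as usize).copied().unwrap_or(0)
}

/// Read-modify-write through the vtable, as generic core code might do.
///
/// # Safety
/// `ctx` must be the context pointer that `ops` expects.
pub unsafe fn set_bits(ops: &MiniHwOps, ctx: *mut c_void, offset: u32, mask: u32) -> u32 {
    unsafe {
        let old = (ops.reg_read32)(ctx, offset);
        (ops.reg_write32)(ctx, offset, old | mask);
        (ops.reg_read32)(ctx, offset)
    }
}
```

The caller never sees the register layout; it only holds the table and an opaque `ctx`, which is what allows the same core code to drive different controller families.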

21.5.13 DRM/KMS Compatibility Interface¶

Userspace compositors (Wayland compositors, Xwayland, mpv) interact with the display subsystem via Linux DRM/KMS ioctl() calls on /dev/dri/card* device nodes. UmkaOS's umka-sysapi layer (Section 19.1) translates these ioctls into UmkaOS-native display operations.

Supported DRM ioctls (minimum viable set for Wayland compositors). All use DRM_IOCTL_BASE = 'd' (0x64). Definitions from include/uapi/drm/drm.h:

ioctl	Linux definition	UmkaOS handler	Description
`DRM_IOCTL_MODE_GETRESOURCES`	`DRM_IOWR(0xA0, drm_mode_card_res)`	`display_get_resources()`	Enumerate CRTCs, connectors, encoders
`DRM_IOCTL_MODE_GETCRTC`	`DRM_IOWR(0xA1, drm_mode_crtc)`	`display_get_crtc()`	Get current CRTC mode and framebuffer
`DRM_IOCTL_MODE_SETCRTC`	`DRM_IOWR(0xA2, drm_mode_crtc)`	`display_legacy_set_crtc()`	Legacy mode setting (translated to atomic internally)
`DRM_IOCTL_MODE_GETENCODER`	`DRM_IOWR(0xA6, drm_mode_get_encoder)`	`display_get_encoder()`	Get encoder↔CRTC mapping
`DRM_IOCTL_MODE_GETCONNECTOR`	`DRM_IOWR(0xA7, drm_mode_get_connector)`	`display_get_connector()`	Get connector properties and supported modes
`DRM_IOCTL_MODE_RMFB`	`DRM_IOWR(0xAF, unsigned int)`	`display_remove_framebuffer()`	Destroy framebuffer object
`DRM_IOCTL_MODE_PAGE_FLIP`	`DRM_IOWR(0xB0, drm_mode_crtc_page_flip)`	`display_page_flip()`	Flip primary plane (translated to atomic commit)
`DRM_IOCTL_MODE_CREATE_DUMB`	`DRM_IOWR(0xB2, drm_mode_create_dumb)`	`display_create_dumb()`	Allocate a dumb scanout buffer
`DRM_IOCTL_MODE_MAP_DUMB`	`DRM_IOWR(0xB3, drm_mode_map_dumb)`	`display_map_dumb()`	Obtain mmap offset for a dumb buffer
`DRM_IOCTL_MODE_DESTROY_DUMB`	`DRM_IOWR(0xB4, drm_mode_destroy_dumb)`	`display_destroy_dumb()`	Release a dumb buffer
`DRM_IOCTL_MODE_ADDFB2`	`DRM_IOWR(0xB8, drm_mode_fb_cmd2)`	`display_add_framebuffer()`	Create framebuffer object from DMA-BUF / GEM handle
`DRM_IOCTL_MODE_ATOMIC`	`DRM_IOWR(0xBC, drm_mode_atomic)`	`display_atomic_commit()`	Full atomic modesetting
`DRM_IOCTL_MODE_CREATEPROPBLOB`	`DRM_IOWR(0xBD, drm_mode_create_blob)`	`display_create_blob()`	Create property blob (for gamma LUTs, HDR metadata)
`DRM_IOCTL_MODE_DESTROYPROPBLOB`	`DRM_IOWR(0xBE, drm_mode_destroy_blob)`	`display_destroy_blob()`	Destroy property blob
`DRM_IOCTL_GET_MAGIC`	`DRM_IOR(0x02, drm_auth)`	`drm_get_magic()`	Client obtains a magic token for master authentication
`DRM_IOCTL_AUTH_MAGIC`	`DRM_IOW(0x11, drm_auth)`	`drm_auth_magic()`	Master authenticates a client's magic token, granting rendering access
`DRM_IOCTL_SET_MASTER`	`DRM_IO(0x1E)`	`drm_set_master()`	Acquire DRM master status; returns EPERM if another fd is already master
`DRM_IOCTL_DROP_MASTER`	`DRM_IO(0x1F)`	`drm_drop_master()`	Release DRM master status, revoking modesetting privileges
`DRM_IOCTL_PRIME_HANDLE_TO_FD`	`DRM_IOWR(0x2d, drm_prime_handle)`	`dma_buf_export()`	Export GEM handle as DMA-BUF fd
`DRM_IOCTL_PRIME_FD_TO_HANDLE`	`DRM_IOWR(0x2e, drm_prime_handle)`	`dma_buf_import()`	Import DMA-BUF fd as GEM handle
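
The "Linux definition" column expands to a 32-bit request number. The sketch below reproduces that encoding using the asm-generic layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction), which is what x86 and AArch64 use; a few architectures (for example PowerPC and MIPS) use a different direction/size split, so treat the constants as an assumption about the target. The helper names mirror the C macros but are otherwise illustrative.

```rust
// asm-generic ioctl field layout (assumed target: x86-64 / AArch64).
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;

const IOC_NONE: u32 = 0;
const IOC_WRITE: u32 = 1; // userspace writes, kernel reads
const IOC_READ: u32 = 2; // kernel writes, userspace reads

/// DRM ioctl type byte, 'd' (0x64).
pub const DRM_IOCTL_BASE: u32 = b'd' as u32;

/// Equivalent of the C `_IOC(dir, type, nr, size)` macro.
pub const fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT) | (ty << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT)
}

/// DRM_IO(nr): no argument.
pub const fn drm_io(nr: u32) -> u32 {
    ioc(IOC_NONE, DRM_IOCTL_BASE, nr, 0)
}

/// DRM_IOR(nr, T): kernel returns data; `size` is sizeof(T).
pub const fn drm_ior(nr: u32, size: u32) -> u32 {
    ioc(IOC_READ, DRM_IOCTL_BASE, nr, size)
}

/// DRM_IOW(nr, T): userspace passes data; `size` is sizeof(T).
pub const fn drm_iow(nr: u32, size: u32) -> u32 {
    ioc(IOC_WRITE, DRM_IOCTL_BASE, nr, size)
}

/// DRM_IOWR(nr, T): data flows both ways; `size` is sizeof(T).
pub const fn drm_iowr(nr: u32, size: u32) -> u32 {
    ioc(IOC_READ | IOC_WRITE, DRM_IOCTL_BASE, nr, size)
}
```

For example, `drm_iowr(0xB2, 32)` yields 0xC02064B2 for DRM_IOCTL_MODE_CREATE_DUMB and `drm_io(0x1E)` yields 0x0000641E for DRM_IOCTL_SET_MASTER. Because the argument size is part of the request number, the struct size assertions in this section also protect the ioctl numbers.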

Dumb buffer ABI structs (match Linux include/uapi/drm/drm_mode.h):

/// Request/response for DRM_IOCTL_MODE_CREATE_DUMB (0xB2).
/// Userspace fills width, height, bpp; kernel fills handle, pitch, size.
#[repr(C)]
pub struct DrmModeCreateDumb {
    pub height: u32,
    pub width: u32,
    pub bpp: u32,       // bits per pixel (must be multiple of 8)
    pub flags: u32,     // currently unused, must be zero
    pub handle: u32,    // [out] GEM handle
    pub pitch: u32,     // [out] bytes per row (may exceed width*bpp/8 for alignment)
    pub size: u64,      // [out] total buffer size in bytes
}
// DrmModeCreateDumb: u32(4)*6 + u64(8) = 32 bytes.
// Userspace ABI struct — DRM_IOCTL_MODE_CREATE_DUMB argument.
const_assert!(core::mem::size_of::<DrmModeCreateDumb>() == 32);

/// Request for DRM_IOCTL_MODE_MAP_DUMB (0xB3).
/// Returns an mmap offset for the buffer identified by `handle`.
#[repr(C)]
pub struct DrmModeMapDumb {
    pub handle: u32,
    pub pad: u32,
    pub offset: u64,    // [out] fake offset to pass to mmap()
}
// DrmModeMapDumb: u32(4) + u32(4) + u64(8) = 16 bytes.
// Userspace ABI struct — DRM_IOCTL_MODE_MAP_DUMB argument.
const_assert!(core::mem::size_of::<DrmModeMapDumb>() == 16);

/// Request for DRM_IOCTL_MODE_DESTROY_DUMB (0xB4).
#[repr(C)]
pub struct DrmModeDestroyDumb {
    pub handle: u32,
}
// DrmModeDestroyDumb: u32(4) = 4 bytes.
// Userspace ABI struct — DRM_IOCTL_MODE_DESTROY_DUMB argument.
const_assert!(core::mem::size_of::<DrmModeDestroyDumb>() == 4);
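
How the kernel might fill the output fields of a create-dumb request can be sketched as follows. This is an illustration under stated assumptions, not the UmkaOS handler: the 64-byte pitch alignment and 4096-byte page size are hypothetical values (real alignment is hardware-specific), `fill_create_dumb` is an invented name, and GEM handle allocation is omitted. The struct is re-declared only to keep the block self-contained.

```rust
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct DrmModeCreateDumb {
    pub height: u32,
    pub width: u32,
    pub bpp: u32,
    pub flags: u32,
    pub handle: u32,
    pub pitch: u32,
    pub size: u64,
}

/// Assumed row alignment in bytes (hypothetical; hardware-dependent).
const PITCH_ALIGN: u32 = 64;
/// Assumed page size in bytes.
const PAGE_SIZE: u64 = 4096;
/// -EINVAL, returned for malformed requests.
const NEG_EINVAL: i32 = -22;

/// Validates the input fields and fills `pitch` and `size`.
/// `handle` is left untouched because allocation is out of scope here.
pub fn fill_create_dumb(req: &mut DrmModeCreateDumb) -> Result<(), i32> {
    if req.width == 0 || req.height == 0 || req.bpp == 0 || req.bpp % 8 != 0 || req.flags != 0 {
        return Err(NEG_EINVAL);
    }
    // Unpadded row length in bytes, with overflow checks.
    let row = req.width.checked_mul(req.bpp / 8).ok_or(NEG_EINVAL)?;
    // Round the row up to the alignment boundary.
    let pitch = row.checked_add(PITCH_ALIGN - 1).ok_or(NEG_EINVAL)? & !(PITCH_ALIGN - 1);
    // Total size, rounded up to whole pages so the buffer can be mmap()ed.
    let raw = u64::from(pitch) * u64::from(req.height);
    let size = raw.checked_add(PAGE_SIZE - 1).ok_or(NEG_EINVAL)? & !(PAGE_SIZE - 1);
    req.pitch = pitch;
    req.size = size;
    Ok(())
}
```

With these assumptions a 1366×768 buffer at 32 bpp gets a pitch of 5504 bytes rather than the unpadded 5464, which is why userspace must always use the returned pitch instead of computing width*bpp/8 itself.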

DRM master state machine:

The DRM master is the file descriptor that holds exclusive modesetting privileges on a DRM device (KMS operations require master status).

    ┌──────────────────────────────────────────────────────┐
    │ First open() on /dev/dri/cardN → auto-acquire master │
    └──────────────────┬───────────────────────────────────┘
                       ▼
              ┌────────────────┐
              │  fd is MASTER  │◄──── DRM_IOCTL_SET_MASTER (requires
              │  (modesetting  │      CAP_SYS_ADMIN or no current master)
              │   permitted)   │
              └───────┬────────┘
                      │
        DRM_IOCTL_DROP_MASTER / VT switch away
                      │
                      ▼
              ┌────────────────┐
              │  fd is NON-    │
              │  MASTER        │
              │  (render only) │
              └────────────────┘

First opener: The first open() on a primary node (/dev/dri/cardN) automatically acquires DRM master. Subsequent openers are non-master by default.
VT switch: When the user switches to a different virtual terminal, the VT subsystem calls drm_drop_master() on the current compositor's fd and drm_set_master() on the incoming one. This ensures only the active VT's compositor can modeset.
Wayland/X11 compositors hold master for the lifetime of their session. Clients render via render nodes (/dev/dri/renderDN) which never require master.
Client authentication (legacy): A non-master client on a primary node calls DRM_IOCTL_GET_MAGIC to obtain a magic token (u32), passes it to the master process via an out-of-band channel (e.g., a Unix socket), and the master calls DRM_IOCTL_AUTH_MAGIC with that token to grant the client rendering access. This mechanism predates render nodes and is used by legacy X11 DRI2 clients.
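
The master transitions described above can be modeled as a small state machine. The sketch below is a simplified illustration: `DrmDevice`, `MasterError`, and the use of small integers as fd identifiers are hypothetical, the CAP_SYS_ADMIN path from the diagram is deliberately left out, and magic-token authentication is not modeled.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum MasterError {
    /// Another fd already holds master.
    PermissionDenied,
    /// The caller tried to drop master without holding it.
    NotMaster,
}

/// Hypothetical per-device master bookkeeping.
#[derive(Debug, Default)]
pub struct DrmDevice {
    master: Option<u32>,
    next_fd: u32,
}

impl DrmDevice {
    pub fn new() -> Self {
        Self::default()
    }

    /// Opens the primary node; the first opener auto-acquires master.
    pub fn open(&mut self) -> u32 {
        let fd = self.next_fd;
        self.next_fd += 1;
        if self.master.is_none() {
            self.master = Some(fd);
        }
        fd
    }

    pub fn is_master(&self, fd: u32) -> bool {
        self.master == Some(fd)
    }

    /// Succeeds if there is no master or the caller already is master.
    pub fn set_master(&mut self, fd: u32) -> Result<(), MasterError> {
        match self.master {
            None => {
                self.master = Some(fd);
                Ok(())
            }
            Some(cur) if cur == fd => Ok(()),
            Some(_) => Err(MasterError::PermissionDenied),
        }
    }

    pub fn drop_master(&mut self, fd: u32) -> Result<(), MasterError> {
        if self.is_master(fd) {
            self.master = None;
            Ok(())
        } else {
            Err(MasterError::NotMaster)
        }
    }

    /// VT switch: revoke master from the outgoing fd, grant it to the incoming one.
    pub fn vt_switch(&mut self, outgoing: u32, incoming: u32) -> Result<(), MasterError> {
        let _ = self.drop_master(outgoing); // outgoing may already be non-master
        self.set_master(incoming)
    }
}
```

Keeping at most one master per device in a single `Option` makes the invariant "only the active VT's compositor can modeset" structural rather than something each ioctl handler has to re-check.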

Error mapping: UmkaOS DisplayError variants map to Linux errno values:

/// Display subsystem error codes. Each variant maps to exactly one Linux errno,
/// and no two variants share an errno (the `#[repr(i32)]` discriminants must be
/// distinct). Internal code dispatches on the variant; the errno value is only
/// used at the userspace ABI boundary (DRM ioctl return).
///
/// **Convention**: Discriminant values use the **negative** of the Linux errno,
/// following the standard kernel-internal convention where functions return
/// `-EFOO` on failure. The syscall return path (`umka-sysapi`) passes the
/// negative value directly to userspace via the register ABI; glibc then
/// negates it, stores it in `errno`, and returns -1. This matches Linux
/// kernel behavior (`return -EINVAL;` in C kernel code).
#[repr(i32)]
pub enum DisplayError {
    /// Permission denied (DRM_MASTER required for modesetting).
    PermissionDenied     = -1,   // -EPERM
    /// Connector not found.
    ConnectorNotFound    = -2,   // -ENOENT
    /// CRTC not found.
    CrtcNotFound         = -6,   // -ENXIO
    /// Mode not supported by connector.
    ModeNotSupported     = -22,  // -EINVAL
    /// Bandwidth exceeded for display controller.
    InsufficientBandwidth = -28, // -ENOSPC
    /// Framebuffer format not supported by plane.
    FormatNotSupported   = -61,  // -ENODATA
    /// No active mode on connector (VRR without mode set).
    NoActiveMode         = -71,  // -EPROTO
    /// VRR not supported by connector/mode.
    VrrNotSupported      = -95,  // -EOPNOTSUPP
    /// Atomic test failed (TEST_ONLY flag).
    AtomicTestFailed     = -125, // -ECANCELED
}
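
Because the discriminants already hold the negated errno, the conversion at the ioctl boundary reduces to a cast. The following sketch shows one way to express it; the enum is re-declared to keep the block self-contained, while `as_neg_errno`, `errno`, and `ioctl_return` are hypothetical helper names rather than part of the specified API.

```rust
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisplayError {
    PermissionDenied = -1,       // -EPERM
    ConnectorNotFound = -2,      // -ENOENT
    CrtcNotFound = -6,           // -ENXIO
    ModeNotSupported = -22,      // -EINVAL
    InsufficientBandwidth = -28, // -ENOSPC
    FormatNotSupported = -61,    // -ENODATA
    NoActiveMode = -71,          // -EPROTO
    VrrNotSupported = -95,       // -EOPNOTSUPP
    AtomicTestFailed = -125,     // -ECANCELED
}

impl DisplayError {
    /// Kernel-internal form: the negative errno, as returned to the syscall path.
    pub const fn as_neg_errno(self) -> i32 {
        self as i32
    }

    /// Positive errno, as userspace sees it in `errno` after libc negates it.
    pub const fn errno(self) -> i32 {
        -(self as i32)
    }
}

/// Hypothetical boundary helper: maps a handler result to the raw ioctl return value.
pub fn ioctl_return(result: Result<u32, DisplayError>) -> i64 {
    match result {
        Ok(value) => i64::from(value),
        Err(err) => i64::from(err.as_neg_errno()),
    }
}
```

A handler that fails bandwidth validation would therefore return -28 from the ioctl path, and userspace would observe errno 28 (ENOSPC).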

Legacy compatibility: Older applications use DRM_IOCTL_MODE_SETCRTC and DRM_IOCTL_MODE_PAGE_FLIP (non-atomic). UmkaOS translates these into atomic commits internally — SETCRTC becomes an atomic commit with ALLOW_MODESET, PAGE_FLIP becomes an atomic commit with only the primary plane updated. This matches the approach used by modern Linux DRM drivers (i915, amdgpu) which internally implement legacy ioctls as wrappers around atomic.
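
The translation can be sketched as a pure function from a legacy request to an atomic commit description. The two flag constants use the values from Linux's drm_mode.h; `LegacyRequest`, `AtomicCommit`, and `legacy_to_atomic` are hypothetical, heavily reduced types for illustration, and the real commit object would carry full property sets rather than a single framebuffer ID.

```rust
/// Flag values as defined in Linux include/uapi/drm/drm_mode.h.
pub const DRM_MODE_PAGE_FLIP_EVENT: u32 = 0x01;
pub const DRM_MODE_ATOMIC_ALLOW_MODESET: u32 = 0x0400;

/// Hypothetical, reduced view of the two legacy requests.
pub enum LegacyRequest {
    SetCrtc { crtc_id: u32, fb_id: u32 },
    PageFlip { crtc_id: u32, fb_id: u32, flags: u32 },
}

/// Hypothetical, reduced atomic commit description.
#[derive(Debug, PartialEq, Eq)]
pub struct AtomicCommit {
    pub crtc_id: u32,
    /// Framebuffer to attach to the CRTC's primary plane.
    pub primary_fb_id: u32,
    /// Atomic commit flags.
    pub flags: u32,
    /// True if the commit may reprogram the CRTC's mode.
    pub touches_mode: bool,
}

/// Maps a legacy ioctl request to the equivalent atomic commit.
pub fn legacy_to_atomic(req: &LegacyRequest) -> AtomicCommit {
    match *req {
        // SETCRTC may change the mode, so the commit needs ALLOW_MODESET.
        LegacyRequest::SetCrtc { crtc_id, fb_id } => AtomicCommit {
            crtc_id,
            primary_fb_id: fb_id,
            flags: DRM_MODE_ATOMIC_ALLOW_MODESET,
            touches_mode: true,
        },
        // PAGE_FLIP only swaps the primary plane's framebuffer; the
        // completion-event request, if any, is carried over unchanged.
        LegacyRequest::PageFlip { crtc_id, fb_id, flags } => AtomicCommit {
            crtc_id,
            primary_fb_id: fb_id,
            flags: flags & DRM_MODE_PAGE_FLIP_EVENT,
            touches_mode: false,
        },
    }
}
```

Routing both paths through one commit type means legacy clients get the same validation (plane/CRTC compatibility, bandwidth) as atomic clients.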

21.5.14 Architectural Decision¶

Display: Wayland-only + Xwayland

UmkaOS's KMS interface (Section 21.5) is Wayland-native: DRM atomic modesetting, with DMA-BUF sharing via capabilities. X11 applications are supported through Xwayland, the same approach taken by Fedora and Ubuntu 22.04+. There is no native X11 server support, because the X11 protocol is a roughly 40-year-old design with known security weaknesses (MIT-MAGIC-COOKIE-1 authentication, unrestricted window snooping). Xwayland provides compatibility for legacy applications without compromising security.