Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=808

In Linux >=4.4, when the CONFIG_BPF_SYSCALL config option is set and the
kernel.unprivileged_bpf_disabled sysctl is not explicitly set to 1 at runtime,
unprivileged code can use the bpf() syscall to load eBPF socket filter programs.
These conditions are fulfilled in Ubuntu 16.04.
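
The sysctl condition above can be checked from userspace. The following is a
minimal sketch, assuming the sysctl is exposed through procfs at
/proc/sys/kernel/unprivileged_bpf_disabled; the helper names
parse_sysctl_value() and unprivileged_bpf_allowed() are hypothetical and not
part of the original report. It does not check CONFIG_BPF_SYSCALL directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed procfs path of the sysctl named in the text. */
#define BPF_SYSCTL_PATH "/proc/sys/kernel/unprivileged_bpf_disabled"

/*
 * Parse the textual sysctl value. Returns the integer value,
 * or -1 if the text does not start with a number.
 */
int parse_sysctl_value(const char *text)
{
	char *end;
	long value = strtol(text, &end, 10);

	if (end == text)
		return -1;
	return (int)value;
}

/*
 * Returns 1 if the sysctl is present and 0 (unprivileged bpf() allowed),
 * 0 if it is nonzero (disabled), and -1 if it cannot be read, for
 * example because the running kernel does not provide it.
 */
int unprivileged_bpf_allowed(void)
{
	char buf[32];
	FILE *fp = fopen(BPF_SYSCTL_PATH, "r");
	int value;

	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	value = parse_sysctl_value(buf);
	if (value < 0)
		return -1;
	return value == 0;
}
```

A return value of 1 only indicates that the sysctl does not block unprivileged
use; whether bpf() is available at all still depends on the kernel build.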

When an eBPF program is loaded using bpf(BPF_PROG_LOAD, ...), the first
function that touches the supplied eBPF instructions is
replace_map_fd_with_map_ptr(), which looks for instructions that reference eBPF
map file descriptors and looks up pointers for the corresponding map files.
This is done as follows:

/* look for pseudo eBPF instructions that access map FDs and
 * replace them with actual map pointers
 */
static int replace_map_fd_with_map_ptr(struct verifier_env *env)
{
	struct bpf_insn *insn = env->prog->insnsi;
	int insn_cnt = env->prog->len;
	int i, j;

	for (i = 0; i < insn_cnt; i++, insn++) {
		[checks for bad instructions]

		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
			struct bpf_map *map;
			struct fd f;

			[checks for bad instructions]

			f = fdget(insn->imm);
			map = __bpf_map_get(f);
			if (IS_ERR(map)) {
				verbose("fd %d is not pointing to valid bpf_map\n",
					insn->imm);
				fdput(f);
				return PTR_ERR(map);
			}

			[...]
		}
	}

	[...]
}

__bpf_map_get contains the following code:

/* if error is returned, fd is released.
 * On success caller should complete fd access with matching fdput()
 */
struct bpf_map *__bpf_map_get(struct fd f)
{
	if (!f.file)
		return ERR_PTR(-EBADF);
	if (f.file->f_op != &bpf_map_fops) {
		fdput(f);
		return ERR_PTR(-EINVAL);
	}

	return f.file->private_data;
}

The problem is that when the caller supplies a file descriptor number referring
to a struct file that is not an eBPF map, both __bpf_map_get() and
replace_map_fd_with_map_ptr() will call fdput() on the struct fd. If
__fget_light() detected that the file descriptor table is shared with another
task and therefore the FDPUT_FPUT flag is set in the struct fd, this will cause
the reference count of the struct file to be over-decremented, allowing an
attacker to create a use-after-free situation where a struct file is freed
although there are still references to it.
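
The reference-count arithmetic can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct model_file,
struct model_fd and the model_*() functions are hypothetical stand-ins for
struct file, struct fd, fdget(), fdput() and __bpf_map_get(), and they model
only the counting, not the real kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel types; not kernel code. */
struct model_file {
	long refcount;	/* models the struct file reference count */
	bool is_map;	/* models f_op == &bpf_map_fops */
};

struct model_fd {
	struct model_file *file;
	bool need_put;	/* models the FDPUT_FPUT flag */
};

/* Models fdget(): takes a reference only if the fd table is shared. */
struct model_fd model_fdget(struct model_file *file, bool table_shared)
{
	struct model_fd f = { file, table_shared };

	assert(file->refcount > 0);	/* a live file holds at least one ref */
	if (table_shared)
		file->refcount++;
	return f;
}

/* Models fdput(): drops the reference only if fdget() took one. */
void model_fdput(struct model_fd f)
{
	if (f.need_put)
		f.file->refcount--;
}

/* Models __bpf_map_get(): on a non-map file it releases the fd itself. */
bool model_map_get(struct model_fd f)
{
	if (!f.file->is_map) {
		model_fdput(f);
		return false;
	}
	return true;
}

/*
 * Models the caller. With extra_put set, the error path repeats the
 * fdput() that model_map_get() already performed. Returns the
 * reference count after the call.
 */
long model_load(struct model_file *file, bool table_shared, bool extra_put)
{
	struct model_fd f = model_fdget(file, table_shared);

	if (!model_map_get(f)) {
		if (extra_put)
			model_fdput(f);	/* second put on the same struct fd */
		return file->refcount;
	}
	model_fdput(f);
	return file->refcount;
}
```

In this model, a non-map file that starts with one reference ends at zero when
the table is shared and extra_put is set, while the same call with an unshared
table leaves the count untouched because the flag is never set.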

A simple proof of concept that causes oopses/crashes on a kernel compiled with
memory debugging options is attached as crasher.tar.

One way to exploit this issue is to create a writable file descriptor, start a
write operation on it, wait for the kernel to verify the file's writability,
then free the writable file and open a read-only file that is allocated in the
same place before the kernel writes into the freed file, allowing an attacker
to write data to a read-only file. By e.g. writing to /etc/crontab, root
privileges can then be obtained.

There are two problems with this approach:

The attacker should ideally be able to determine whether a newly allocated
struct file is located at the same address as the previously freed one. Linux
provides a syscall that performs exactly this comparison for the caller:
kcmp(getpid(), getpid(), KCMP_FILE, uaf_fd, new_fd).

In order to make exploitation more reliable, the attacker should be able to
pause code execution in the kernel between the writability check of the target
file and the actual write operation. This can be done by abusing the writev()
syscall and FUSE: The attacker mounts a FUSE filesystem that artificially delays
read accesses, then mmap()s a file containing a struct iovec from that FUSE
filesystem and passes the result of mmap() to writev(). (Another way to do this
would be to use the userfaultfd() syscall.)
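
The kcmp() comparison from the first point above could be wrapped roughly as
follows. This is a sketch, not the code from exploit.tar: the helper name
same_open_file() is hypothetical, and the call is made through syscall()
on the assumption that the C library provides no kcmp() wrapper. It can fail
with an error on kernels built without kcmp support or under a restrictive
seccomp policy.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if fd_a and fd_b refer to the same struct file in the
 * calling process, 0 if they do not, and -1 on error (errno is set
 * by the failed syscall).
 */
int same_open_file(int fd_a, int fd_b)
{
	pid_t self = getpid();
	long ret = syscall(SYS_kcmp, self, self, KCMP_FILE,
			   (unsigned long)fd_a, (unsigned long)fd_b);

	if (ret < 0)
		return -1;
	return ret == 0;	/* kcmp() returns 0 when the objects are equal */
}
```

For two descriptors produced by dup() the helper is expected to report a match,
while the two ends of a pipe are distinct files and should not match.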

writev() calls do_writev(), which looks up the struct file * corresponding to
the file descriptor number and then calls vfs_writev(). vfs_writev() verifies
that the target file is writable, then calls do_readv_writev(), which first
copies the struct iovec from userspace using import_iovec(), then performs the
rest of the write operation. Because import_iovec() performs a userspace memory
access, it may have to wait for pages to be faulted in - and in this case, it
has to wait for the attacker-owned FUSE filesystem to resolve the page fault,
allowing the attacker to suspend code execution in the kernel at that point
for an arbitrary amount of time.
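
To show that the struct iovec array is ordinary user memory that the kernel has
to read during writev(), here is a deliberately benign sketch. It places the
array in an anonymous mmap()ed page rather than in a FUSE-backed mapping, so
the page is already populated when writev() runs and no delay occurs. The
helper name writev_with_mapped_iovec() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Writes len bytes from data to fd with writev(), keeping the
 * struct iovec array in a freshly mmap()ed page instead of on the
 * stack. Returns the writev() result, or -1 if the mapping fails.
 */
ssize_t writev_with_mapped_iovec(int fd, const void *data, size_t len)
{
	struct iovec *iov;
	ssize_t written;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	iov = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (iov == MAP_FAILED)
		return -1;

	/* Storing to the page faults it in before writev() is called. */
	iov[0].iov_base = (void *)data;
	iov[0].iov_len = len;

	/* The kernel copies iov[] out of this mapping before writing. */
	written = writev(fd, iov, 1);

	munmap(iov, page);
	return written;
}
```

With a mapping whose contents are supplied lazily by another process, the same
copy step is where the kernel would block until the page becomes available.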

An exploit that puts all this together is in exploit.tar. Usage:

user@host:~/ebpf_mapfd_doubleput$ ./compile.sh
user@host:~/ebpf_mapfd_doubleput$ ./doubleput
starting writev
woohoo, got pointer reuse
writev returned successfully. if this worked, you'll have a root shell in <=60 seconds.
suid file detected, launching rootshell...
we have root privs now...
root@host:~/ebpf_mapfd_doubleput# id
uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),999(vboxsf),1000(user)

This exploit was tested on an Ubuntu 16.04 Desktop system.

Fix: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8358b02bf67d3a5d8a825070e1aa73f25fb2e4c7

Proof of Concept: https://bugs.chromium.org/p/project-zero/issues/attachment?aid=232552
Exploit-DB Mirror: https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/39772.zip