Background
I competed in the HackTheBox University CTF this year with the Flinders University Cybersecurity Society. The University CTF is an international competition in which university students compete as a team with their peers. The competition had 955 teams this year.
There were six types of puzzles in the CTF. I focused my efforts on the Reverse Engineering challenges, and was able to solve two of them.
A reverse engineering challenge involves finding a secret flag hidden inside an executable application. Imagine being given a lock, and figuring out how it works so you can build the key; in this case, building the key would be getting the flag.
The first challenge, WindowOfOpportunity, was rated easy. However, the second challenge, BioBundle, which was ranked medium, let me experience the thrill of the hunt.
This writeup aims to showcase my thought process as I tackled BioBundle. I cover everything I tried, not just what worked. Hopefully this will be an educational and entertaining read.
If you are just after the solution, check out HackTheBox’s official writeup.
Part 1 - Initial Investigation
Using file on BioBundle shows that it is an unstripped, 64-bit ELF.
When you run BioBundle, nothing happens until you enter a line of text. Then it prints:
[x] Critical Failure
It’s a safe assumption that if you type in the flag, it’ll tell you that you got it correct. Otherwise, it will just say that you’ve got it wrong.
Running BioBundle with ltrace or strace provides no useful information to aid with the challenge. However, it is still worth trying ltrace and strace; sometimes an easy challenge will just compare your input with the flag, and ltrace or strace can show you the string comparison, revealing the flag.
Running strings on the BioBundle executable was interesting. There were the usual strings you’d expect to find in an executable, and some strings specific to BioBundle.
A part of the output of strings:
... SNIP ...
[*] Untangled the bundle
[x] Critical Failure
;*3$"
Hr{q56677777777747 76777g'777777w7777777
7777777777w7
7>7w7,7-767773777777777777777777777777777
3777777
37777777'777777677727777'7777777'7777777'777777
6777777
67777777'777777677737777
7777777
7777777
777777
7777777
77777777'77777767771777'
777777' 777777' 777777/5777777
57777777'77777757771777
777777
777777
777777
6777777
6777777?777777737773777
5777777
5777777
... SNIP ...
Now, it’s obvious that the first two lines are just output to be printed to the user. But what’s the deal with all these 7s?
There are entire blocks of text from strings that are JUST 7s.
... SNIP ...
5t:15@;0?77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777&777777
... SNIP ...
There’s clearly something funny about this executable. What are all the 7s?
We are not going to find anything more without moving into reverse engineering. It’s time to open Ghidra.
Part 2 - Go Go Gadget: Ghidra
After opening the file with Ghidra, and running all analysers, we are greeted with the main() function. For the sake of brevity, I’m going to assume that you are at least somewhat familiar with basic C or C++.
Here’s the main() function as shown in Ghidra:
undefined8 main(void)
{
  int iVar1;
  undefined8 uVar2;
  size_t sVar3;
  char local_98 [128];
  code *local_18;
  long local_10;
  local_10 = get_handle();
  local_18 = (code *)dlsym(local_10,&DAT_00102019);
  if (local_18 == (code *)0x0) {
    uVar2 = 0xffffffff;
  }
  else {
    fgets(local_98,0x7f,stdin);
    sVar3 = strcspn(local_98,"\n");
    local_98[sVar3] = '\0';
    iVar1 = (*local_18)(local_98);
    if (iVar1 == 0) {
      puts("[x] Critical Failure");
    }
    else {
      puts("[*] Untangled the bundle");
    }
    uVar2 = 0;
  }
  return uVar2;
}
Looking through the main() function, it seems rather straightforward, except for one part at the beginning (get_handle()). Let’s skip over that for now, and reverse the rest of the main() function.
Knowing C, the main function will return an int. We can correct this type, and assume that the value that gets returned, uVar2, is also an int. I’ll also rename uVar2 to return_code.
local_98 is a char[], which will be a C-string. For those unaware, a C-string is a character array with its end denoted with a null-byte (0x00).
This local_98 is used in an fgets call reading from stdin, so this will contain the user input.
To make it simpler, I’ll convert the number in fgets to decimal, and rename local_98 to user_input. From the above decompilation, some may be wondering why fgets is passed 127 when user_input is a char[] with 128 characters. fgets reads at most one character fewer than the size it is given and then appends the null-byte itself, so the input and its terminating null-byte always fit inside the char[] with room to spare.
Let’s look at the next two lines:
sVar3 = strcspn(user_input,"\n");
user_input[sVar3] = '\0';
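This uses strcspn, which is a function that takes two arguments: the first, a C-string to search, and the second, a C-string containing the characters to search for. It returns the index of the first character in the first string that matches any character in the second string (or the length of the first string if nothing matches). Here, that is the position of the newline, which is then overwritten with a null-byte.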
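To make the behaviour of strcspn concrete, here is a rough Python model of it. This is only a sketch for illustration: the function names are my own, and it works on Python bytes, not on real C-strings.

```python
def strcspn(s: bytes, reject: bytes) -> int:
    """Rough model of C's strcspn: length of the prefix of s with no byte from reject."""
    for index, byte in enumerate(s):
        if byte in reject:
            return index
    return len(s)


def strip_newline(line: bytes) -> bytes:
    # Same effect as: user_input[strcspn(user_input, "\n")] = '\0';
    return line[:strcspn(line, b"\n")]


print(strip_newline(b"some guess\n"))  # b'some guess'
print(strip_newline(b"no newline"))    # b'no newline'
```

Note that the input is left untouched when there is no newline, because strcspn then returns the full length and the null-byte lands on the existing terminator.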
You don’t need to memorise every function call. The man pages are a great resource; just search them for the function, and you’ll find some documentation.
Now, our decompilation looks a little more readable. However, there are still a few big questions: What is local_10? What does get_handle() do? What is local_18? Where are all the 7s?
int main(void)
{
  int return_code;
  size_t end_of_user_input;
  char user_input [128];
  code *local_18;
  long local_10;
  local_10 = get_handle();
  local_18 = (code *)dlsym(local_10,&DAT_00102019);
  if (local_18 == (code *)0x0) {
    return_code = -1;
  }
  else {
    fgets(user_input,127,stdin);
    end_of_user_input = strcspn(user_input,"\n");
    user_input[end_of_user_input] = '\0';
    return_code = (*local_18)(user_input);
    if (return_code == 0) {
      puts("[x] Critical Failure");
    }
    else {
      puts("[*] Untangled the bundle");
    }
    return_code = 0;
  }
  return return_code;
}
Tracing back through this code, it’s clear that we need to solve the mystery of local_18. There is a call to dlsym, so let’s check the man pages.
The function dlsym() takes a “handle” of a dynamic loaded shared object returned by dlopen(3) along with a null-terminated symbol name, and returns the address where that symbol is loaded into memory. If the symbol is not found, in the specified object or any of the shared objects that were automatically loaded by dlopen(3) when that object was loaded, dlsym() returns NULL. (The search performed by dlsym() is breadth first through the dependency tree of these shared objects.)
Ok, it seems that get_handle() gives us a handle to a dynamically loaded library. dlsym then looks up a symbol in that library, and the address it returns is stored in local_18 to execute later.
Making an educated guess, we can assume that local_18 is a function pointer to the function that compares our user_input with the flag.
Now, we have figured out what main() does:
- Load a dynamic library.
- Pull a specific symbol from that library (most likely the function to check the flag).
- Null-terminate the user_input.
- Run that loaded function with the user_input.
- Tell the user Critical Failure unless they entered the correct flag.
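For readers more comfortable in Python than C, the dlopen and dlsym pair can be tried out with the ctypes module. This is only a sketch of the general mechanism, not BioBundle’s code: it assumes a Linux (or similar) system where ctypes.CDLL(None) behaves like dlopen(NULL), and it looks up strlen purely as a stand-in symbol.

```python
import ctypes

# dlopen(NULL): a handle to the already-loaded program and its libraries.
# Assumption: a Linux-like platform where the C library is visible this way.
handle = ctypes.CDLL(None)

# Attribute access performs the dlsym-style lookup by symbol name.
# A missing symbol raises AttributeError, where dlsym would return NULL.
strlen = handle.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# Calling through the looked-up address, like (*local_18)(user_input).
print(strlen(b"BioBundle"))  # 9
```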
Part 3 - Getting a handle on get_handle()
Let’s have a look at the get_handle() function:
long get_handle(void)
{
  long lVar1;
  undefined8 *puVar2;
  byte bVar3;
  byte local_1029;
  undefined8 local_1028;
  undefined8 local_1020;
  undefined8 local_1018 [511];
  long local_20;
  int local_14;
  ulong local_10;
  bVar3 = 0;
  local_14 = memfd_create(&DAT_00102004,0);
  if (local_14 == -1) {
    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  for (local_10 = 0; local_10 < 0x3e08; local_10 = local_10 + 1) {
    local_1029 = __[local_10] ^ 0x37;
    write(local_14,&local_1029,1);
  }
  local_1028 = 0;
  local_1020 = 0;
  puVar2 = local_1018;
  for (lVar1 = 0x1fe; lVar1 != 0; lVar1 = lVar1 + -1) {
    *puVar2 = 0;
    puVar2 = puVar2 + (ulong)bVar3 * -2 + 1;
  }
  sprintf((char *)&local_1028,"/proc/self/fd/%d",local_14);
  local_20 = dlopen(&local_1028,1);
  if (local_20 == 0) {
    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  return local_20;
}
This looks complicated, but we can break it down step by step.
memfd
Skipping over the variables, we see that local_14 gets set to the result of some memfd_create() function. Let’s find out what this function does. We can check the man pages:
memfd_create() creates an anonymous file and returns a file descriptor that refers to it. The file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on. However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all references to the file are dropped, it is automatically released. Anonymous memory is used for all backing pages of the file. Therefore, files created by memfd_create() have the same semantics as other anonymous memory allocations such as those allocated using mmap(2) with the MAP_ANONYMOUS flag.
To summarise, this function creates an in-memory file, and a file descriptor to access it. It takes two arguments: first, a C-string for the name, and second, an unsigned int for optional flags. Like a lot of C functions, it returns -1 on failure, which explains the if statement below.
Now we can rename local_14 to in_mem_file.
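Python exposes the same system call as os.memfd_create, which makes it easy to experiment with. The sketch below is my own illustration under a few assumptions: it needs Linux and Python 3.8 or newer, and the name "demo" and the helper memfd_roundtrip are made up for the example. It writes some bytes into an anonymous file, reads them back, and also returns the /proc/self/fd path through which the process can refer to that descriptor.

```python
import os


def memfd_roundtrip(data: bytes, name: str = "demo"):
    """Write data into an anonymous in-memory file and read it back."""
    fd = os.memfd_create(name, 0)  # returns a file descriptor, raises OSError on failure
    try:
        os.write(fd, data)
        # The path a process can use to refer to its own descriptor.
        path = f"/proc/self/fd/{fd}"
        return path, os.pread(fd, len(data), 0)
    finally:
        os.close(fd)


if hasattr(os, "memfd_create"):  # only present on Linux builds of Python
    print(memfd_roundtrip(b"not a real library"))
```

Nothing ever touches the disk here, which is why the file disappears as soon as the descriptor is closed.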
The for loop
Now there is a for loop that goes from 0 up to 0x3e08 (15880 in decimal). We can see that local_10 is the iterator, which is then used to index an as yet unknown array denoted as __. I’ll rename local_10 to iterator. I could name this variable simply i. However, when reverse engineering something, I find it helpful to use more specific names.
Now inside that for loop we see that some byte variable, local_1029, is set to __[iterator] ^ 0x37. This immediately looks like a simple XOR cipher. Let’s rename local_1029 to deciphered_byte.
Finally, we write the deciphered data to the in_mem_file. Storing this deciphered data in in_mem_file means this is probably what we’re looking for. However, let’s finish with get_handle() first.
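A single-byte XOR cipher like this is easy to model. The sketch below is my own illustration, not code from the challenge; xor_bytes and the sample plaintext are made-up names and data. It shows the property that matters for us: applying the same key a second time gives back the original bytes, so the loop that deciphers the data is also the one that could have enciphered it.

```python
KEY = 0x37  # the constant XORed into every byte in get_handle()


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with a single-byte key, like the loop body in get_handle()."""
    return bytes(b ^ key for b in data)


plaintext = b"example"
ciphertext = xor_bytes(plaintext)

print(ciphertext != plaintext)             # True
print(xor_bytes(ciphertext) == plaintext)  # True: the same key undoes the cipher
```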
The second for loop
Continuing with the function, two variables are set to 0, and puVar2 is set to local_1018, which is an array, but I don’t know what it does. Then, we get into a second for loop.
This for loop starts at 0x1fe, which I’ll convert to decimal (510), and goes until it is 0. The iterator is lVar1, which I’ll rename to iterator2. It’s worth noting that iterator2 is decremented by adding -1. I’ll assume that this is a decompilation artifact and move on. It sets the value that puVar2 points to, which is an element of our unknown array, to 0. Then puVar2 is set to puVar2 + (ulong)bVar3 * -2 + 1. Looking at bVar3, it’s set to 0 at the top of the function, and it is never updated. Let’s rename it to zero. I’ll also rename local_1018 to some_array, and puVar2 to ptr_to_some_array.
Looking at the equation again, we see that ptr_to_some_array = ptr_to_some_array + (ulong)zero * -2 + 1. That is one of the more verbose ways of simply writing ptr_to_some_array = ptr_to_some_array + 1 (or ptr_to_some_array++).
ptr_to_some_array and some_array are not used again in this function. For those unaware, in C, an array access arr[n] is the same as using *(arr + n). With that in mind, this loop is setting the value that ptr_to_some_array points to, which is an element in some_array, to 0, then it increments the pointer, moving to the next element.
To summarise, this loop counts down from 510 to 0, by adding -1, each time setting an element of some_array to 0 (accessing it via ptr_to_some_array), and moving to the next element. It never uses iterator2 as an index (for example, some_array[iterator2] = 0).
This loop does nothing towards checking the flag: all it does is zero memory, most likely clearing the buffer that sprintf fills in later. Working through it felt like a waste of time.
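One way to sanity-check what the loop touches is to replay it in Python. The sketch below is my own model, with a list standing in for local_1018 and an index standing in for the pointer; the sentinel value 0xAA and the function name are made up. It also adds up the memory involved: the two variables zeroed just before the loop plus the 510 elements come to 512 eight-byte values, or 4096 bytes, which would be consistent with a compiler zero-initialising a buffer. That last part is an inference on my side, not something the binary confirms.

```python
QWORD = 8  # size in bytes of an undefined8


def simulate_second_loop(length: int = 511, count: int = 0x1fe, zero: int = 0):
    """Replay the decompiled loop on a list standing in for local_1018."""
    some_array = [0xAA] * length  # sentinel values so the writes are visible
    index = 0                     # plays the role of puVar2
    remaining = count             # plays the role of lVar1
    while remaining != 0:
        some_array[index] = 0
        index = index + zero * -2 + 1  # with zero == 0 this is just index + 1
        remaining -= 1
    return some_array


result = simulate_second_loop()
print(result.count(0))        # 510 elements were zeroed
print(result[510])            # 170: the last element is never written
print((2 + 0x1fe) * QWORD)    # 4096 bytes zeroed in total
```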
The last section
After wasting our time on what seems to be a red herring of a for loop, let’s look at the last bit of the code:
sprintf((char *)&local_1028,"/proc/self/fd/%d",in_mem_file);
local_20 = dlopen(&local_1028,1);
if (local_20 == 0) {
  /* WARNING: Subroutine does not return */
  exit(-1);
}
return local_20;
So we sprintf /proc/self/fd/<in_mem_file> into local_1028 as a char*. If you are unaware of sprintf, I’ll leave it as an exercise: look it up in the man pages.
Then, we use dlopen on the address of local_1028, and store the result in local_20. Checking the man pages, we see that it loads a dynamic shared object (library) from a filename, and again, takes flags as the second argument. dlopen returns a handle to the loaded object.
Let’s rename local_1028 to in_mem_file_filename, and change the type to be a char*. I’ll also rename local_20 to be dynamic_library_handle.
Now we can see that at the end of this function, if dlopen fails, we just exit. Otherwise, we return the handle to the dynamic library.
Putting it together
Now that we’ve added better types and names for the variables, the code is easier to read.
long get_handle(void)
{
  long iterator2;
  undefined8 *ptr_to_some_array;
  byte zero;
  byte deciphered_byte;
  char *in_mem_file_filename;
  undefined8 local_1020;
  undefined8 some_array [511];
  long dynamic_library_handle;
  int in_mem_file;
  ulong iterator;
  zero = 0;
  in_mem_file = memfd_create(&DAT_00102004,0);
  if (in_mem_file == -1) {
    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  for (iterator = 0; iterator < 15880; iterator = iterator + 1) {
    deciphered_byte = __[iterator] ^ 0x37;
    write(in_mem_file,&deciphered_byte,1);
  }
  in_mem_file_filename = (char *)0x0;
  local_1020 = 0;
  ptr_to_some_array = some_array;
  for (iterator2 = 510; iterator2 != 0; iterator2 = iterator2 + -1) {
    *ptr_to_some_array = 0;
    ptr_to_some_array = ptr_to_some_array + (ulong)zero * -2 + 1;
  }
  sprintf((char *)&in_mem_file_filename,"/proc/self/fd/%d",in_mem_file);
  dynamic_library_handle = dlopen(&in_mem_file_filename,1);
  if (dynamic_library_handle == 0) {
    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  return dynamic_library_handle;
}
Having a look at the code again, we can finally make sense of it:
- Create an in-memory file.
- Take data from __, decipher the data with 0x37 as the key, and store it in the in-memory file.
- Get the filename for the in-memory file.
- Load the in-memory file as a dll.
- Return the handle to the loaded dll.
Part 4 - Extracting the data from __
Looking at the disassembly in Ghidra, we can see that there is a bunch of data under the __ label.
I manually deciphered the first 4 bytes (0x48, 0x72, 0x7b, 0x71). The first byte, now 0x7f, was not ASCII, but the next 3 were: 'E', 'L', 'F'. In one of my less intelligent moments, I thought, “Huh, that looks just like an ELF header. What a coincidence?” And promptly went back to staring at the red herring for loop. However, after a few minutes, I realized that it probably wasn’t a coincidence.
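The manual deciphering of those four bytes can be reproduced in a couple of lines of Python. This is just a check of the arithmetic, using the first four values that appear under the __ label:

```python
first_four = bytes([0x48, 0x72, 0x7b, 0x71])  # first bytes listed under __
deciphered = bytes(b ^ 0x37 for b in first_four)

print(deciphered)                # b'\x7fELF'
print(deciphered == b"\x7fELF")  # True: the magic number at the start of every ELF file
```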
Starting at the beginning of __, I used Ghidra to select the next 15880 bytes, the same number as the upper bound of the for loop that deciphers this data. I copied it as a Python list, and put it into a script to decipher all the data and save the resulting deciphered data to a file called output.dll.
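# Bytes copied from the __ label in Ghidra (most of them elided here).
arr = [ 0x48, 0x72, 0x7b, 0x71, 0x35, 0x36, 0x36, 0x37,
        # ... SNIP ...
        0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37 ]
# XOR every byte with the key used in get_handle().
deciphered_data = [b ^ 0x37 for b in arr]
with open('output.dll', 'wb') as f:
    f.write(bytearray(deciphered_data))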
Checking output.dll with file shows that it is an ELF 64-bit LSB shared object. This should contain the function that checks our input with the flag.
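Copying bytes out of Ghidra worked, but the same extraction could probably be done straight from the file. The sketch below is an alternative I did not use during the CTF, and it rests on two assumptions: that the blob is stored contiguously in the executable, and that the first occurrence of the XORed ELF magic marks the start of __. The function name extract_blob and the file names in the usage comment are placeholders.

```python
from typing import Optional

KEY = 0x37
BLOB_LENGTH = 0x3e08  # 15880, the bound of the deciphering loop
XORED_MAGIC = bytes(b ^ KEY for b in b"\x7fELF")  # b'Hr{q'


def extract_blob(binary: bytes) -> Optional[bytes]:
    """Find the XORed ELF magic and decipher up to BLOB_LENGTH bytes from there."""
    start = binary.find(XORED_MAGIC)
    if start == -1:
        return None  # no enciphered ELF header found
    blob = binary[start:start + BLOB_LENGTH]
    return bytes(b ^ KEY for b in blob)


# Hypothetical usage; the file names are placeholders:
# with open("BioBundle", "rb") as f:
#     library = extract_blob(f.read())
# with open("output.dll", "wb") as f:
#     f.write(library)
```

If the first match turned out not to be the start of the blob, file would complain about the output, and falling back to Ghidra’s address for __ would be the safer route.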
Before we move on, we can solve another mystery: Where did all the 7s go?
Looking at a snippet of the end of the bytes contained in __, we see that there are a lot of 0x37 bytes.
... SNIP ...
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xe2, 0x37, 0x37,
0x37, 0x36, 0x37, 0x37, 0x37, 0x34, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x17, 0x77, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x17, 0x07, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xec, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37,
0x37, 0x34, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x1f, 0x77, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37,
0x1f, 0x07, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xd7, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x07, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x1f, 0x07, 0x37, 0x37,
0x37, 0x37, 0x37, 0x37, 0x10, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37,
0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x35, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37,
... SNIP ...
When we decipher this data, by XORing it with 0x37, they all become null-bytes. However, in their current form, they correspond to 7 in ASCII.
>>> chr(0x37)
'7'
Now, we’re down to our last mystery: What is the flag? To answer this, we’ll need to reverse engineer the dll we extracted. Running strings on the dll shows that the flag is almost readable.
... SNIP ...
GLIBC_2.2.5
u/UH
HTB{st4tH
1c_l1b5_H
but_c00l
;*3$"
GCC: (Debian 10.2.1-6) 10.2.1 20210110
... SNIP ...
We’ll need to use Ghidra again. After opening the dll, and running all analysers, there is an interesting looking function called only _.
bool _(char *param_1)
{
  int iVar1;
  undefined8 local_48;
  undefined8 local_40;
  undefined8 local_38;
  undefined8 local_30;
  undefined8 local_28;
  undefined8 local_20;
  undefined8 local_18;
  undefined8 local_10;
  local_48 = 0x743474737b425448;
  local_40 = 0x5f3562316c5f6331;
  local_38 = 0x6c3030635f747562;
  local_30 = 0x7d7233;
  local_28 = 0;
  local_20 = 0;
  local_18 = 0;
  local_10 = 0;
  iVar1 = strcmp(param_1,(char *)&local_48);
  return iVar1 == 0;
}
This function takes a C-string, does a string compare between that C-string and some magic values inside the function, and returns the result. This looks like what we’re after.
Based on the type cast in the strcmp, we can change the type of local_48 to a char[32]. Now we see that the function has changed significantly. While we’re tweaking the function, I’ll rename local_48 to flag, param_1 to user_input, and iVar1 to result.
bool _(char *user_input)
{
  int result;
  char flag [32];
  flag[0] = 'H';
  flag[1] = 'T';
  flag[2] = 'B';
  flag[3] = '{';
  flag[4] = 's';
  flag[5] = 't';
  flag[6] = '4';
  flag[7] = 't';
  flag[8] = '1';
  flag[9] = 'c';
  flag[10] = '_';
  flag[11] = 'l';
  flag[12] = '1';
  flag[13] = 'b';
  flag[14] = '5';
  flag[15] = '_';
  flag[16] = 'b';
  flag[17] = 'u';
  flag[18] = 't';
  flag[19] = '_';
  flag[20] = 'c';
  flag[21] = '0';
  flag[22] = '0';
  flag[23] = 'l';
  flag[24] = '3';
  flag[25] = 'r';
  flag[26] = '}';
  flag[27] = '\0';
  flag[28] = '\0';
  flag[29] = '\0';
  flag[30] = '\0';
  flag[31] = '\0';
  result = strcmp(user_input,flag);
  return result == 0;
}
We can now see the flag in plain text:
HTB{st4t1c_l1b5_but_c00l3r}
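As a cross-check that does not rely on retyping the variable in Ghidra, the magic values from the first decompilation can be decoded by hand. The sketch below takes the four non-zero constants, lays each one out as eight little-endian bytes (x86-64 stores the least significant byte first), and reads the result as ASCII up to the first null-byte.

```python
# The four non-zero constants assigned in the decompiled _ function.
constants = [
    0x743474737b425448,
    0x5f3562316c5f6331,
    0x6c3030635f747562,
    0x7d7233,
]

# Little-endian: the least significant byte comes first in memory.
raw = b"".join(value.to_bytes(8, "little") for value in constants)
flag = raw.split(b"\x00", 1)[0].decode("ascii")

print(flag)  # HTB{st4t1c_l1b5_but_c00l3r}
```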
Conclusion
BioBundle was a fun reverse engineering challenge. This was my first time reverse engineering an executable using a shared library. I had to consult the man pages quite a bit, but I learned a lot from this challenge.
I haven’t practiced reverse engineering much, but I fared well in this challenge due to my experience with C/C++ and microcontrollers. I want to learn more about reverse engineering because of this CTF.
In the end, Flinders University placed 258th out of 955 teams. I was able to secure 2 reverse engineering flags, and a user fullpwn flag. I was quite surprised by this result, but it shows that sometimes all you need is a little determination and a willingness to learn as you go.