Background

I competed in the HackTheBox University CTF this year with the Flinders University Cybersecurity Society. The University CTF is an international competition for university students to compete as a team with their peers. The competition had 955 teams this year.

There were six types of puzzles in the CTF. I focused my efforts on the Reverse Engineering challenges, and was able to solve 2 of them.

A reverse engineering challenge involves finding a secret flag hidden inside an executable application. Imagine being given a lock, and figuring out how it works so you can build the key; in this case, building the key would be getting the flag.

The first challenge, WindowOfOpportunity, was rated easy. However, the second challenge, BioBundle, which was ranked medium, let me experience the thrill of the hunt.

This writeup aims to showcase my thought process as I tackled BioBundle. I cover everything I tried, not just what worked. Hopefully this will be an educational and entertaining read.

If you are just after the solution, check out HackTheBox’s official writeup.

Part 1 - Initial Investigation

Using file on BioBundle shows that it is an unstripped, 64-bit ELF. When you run BioBundle, nothing happens until you enter a line of text. Then it prints:

[x] Critical Failure

It’s a safe assumption that if you type in the flag, it’ll tell you that you got it correct. Otherwise, it will just say that you’ve got it wrong.

Running BioBundle with ltrace or strace provides no useful information to aid with the challenge. However, it is still worth trying ltrace and strace; sometimes an easy challenge will just compare your input with the flag, and ltrace or strace can show you the string comparison, revealing the flag.

Running strings on the BioBundle executable was interesting. There were the usual strings you’d expect to find in an executable, and some strings specific to BioBundle.

A part of the output of strings:

... SNIP ...
[*] Untangled the bundle
[x] Critical Failure
;*3$"
Hr{q56677777777747	76777g'777777w7777777
7777777777w7
7>7w7,7-767773777777777777777777777777777
3777777
37777777'777777677727777'7777777'7777777'777777
6777777
67777777'777777677737777
7777777
7777777
777777
7777777
77777777'77777767771777'
777777'	777777'	777777/5777777
57777777'77777757771777
777777
	777777
	777777
6777777
6777777?777777737773777
5777777
5777777
... SNIP ...

Now, it’s obvious that the first two lines are just output to be printed to the user. But what’s the deal with all these 7s?

There are entire blocks of text from strings that are JUST 7s.

... SNIP ...
5t:15@;0?77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777&777777
... SNIP ...

There’s clearly something funny about this executable. What are all the 7s?

We are not going to find anything more without moving into reverse engineering. It’s time to open Ghidra.

Part 2 - Go Go Gadget: Ghidra

After opening the file with Ghidra, and running all analysers, we are greeted with the main() function. For the sake of brevity, I’m going to assume that you are at least somewhat familiar with basic C or C++.

Here’s the main() function as shown in Ghidra:

undefined8 main(void)
{
  int iVar1;
  undefined8 uVar2;
  size_t sVar3;
  char local_98 [128];
  code *local_18;
  long local_10;
  
  local_10 = get_handle();
  local_18 = (code *)dlsym(local_10,&DAT_00102019);
  if (local_18 == (code *)0x0) {
    uVar2 = 0xffffffff;
  }
  else {
    fgets(local_98,0x7f,stdin);
    sVar3 = strcspn(local_98,"\n");
    local_98[sVar3] = '\0';
    iVar1 = (*local_18)(local_98);
    if (iVar1 == 0) {
      puts("[x] Critical Failure");
    }
    else {
      puts("[*] Untangled the bundle");
    }
    uVar2 = 0;
  }
  return uVar2;
}

Looking through the main() function, it seems rather straightforward, except for one part at the beginning (get_handle()). Let’s skip over that for now, and reverse the rest of the main() function.

Knowing C, the main function will return an int. We can correct this type, and assume that the value that gets returned, uVar2, is also an int. I’ll also rename uVar2 to return_code.

local_98 is a char[], which will be a C-string. For those unaware, a C-string is a character array with its end denoted with a null-byte (0x00). This local_98 is used in an fgets call reading from stdin, so this will contain the user input.

To make it simpler, I’ll convert the number in fgets to decimal, and rename local_98 to user_input. From the above decompilation, some may be wondering why fgets is given a size of 127 when user_input is a char[] with 128 characters. fgets reads at most one less than the size it is given and then appends a null-byte itself, so at most 126 characters of input plus the terminator are stored, which fits comfortably in the 128-byte array.

Let’s look at the next two lines:

sVar3 = strcspn(user_input,"\n");
user_input[sVar3] = '\0';

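This uses strcspn, which is a function that takes two arguments: the first is a C-string to search, and the second is a C-string containing the set of characters to look for. It returns the number of characters at the start of the first string that are not in that set, which is the index of the first matching character (or the length of the string if there is no match). Here, that is the position of the newline, which is then overwritten with a null-byte.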
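To make that behaviour concrete, here is a rough Python stand-in for strcspn. The helper names and the sample inputs are made up for illustration; this is a sketch of the idea, not the C library's implementation.

```python
def strcspn(s: bytes, reject: bytes) -> int:
    """Length of the leading part of s that contains no byte from reject."""
    for index, byte in enumerate(s):
        if byte in reject:
            return index
    # No rejected byte found: the whole string counts, as in C.
    return len(s)


def strip_newline(line: bytes) -> bytes:
    """Cut the line at the first newline, as main() does with strcspn."""
    return line[:strcspn(line, b"\n")]


cleaned = strip_newline(b"my guess\n")
```

If the input has no newline at all, the index equals the length of the string, so the write lands on the existing terminator and nothing changes.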

You don’t need to memorise every function call. The man pages are a great resource; just search them for the function, and you’ll find some documentation.

Now, our decompiled code looks a little more readable. However, there are still a few big questions: What is local_10? What does get_handle() do? What is local_18? Where are all the 7s?

int main(void)
{
  int return_code;
  size_t end_of_user_input;
  char user_input [128];
  code *local_18;
  long local_10;
  
  local_10 = get_handle();
  local_18 = (code *)dlsym(local_10,&DAT_00102019);
  if (local_18 == (code *)0x0) {
    return_code = -1;
  }
  else {
    fgets(user_input,127,stdin);
    end_of_user_input = strcspn(user_input,"\n");
    user_input[end_of_user_input] = '\0';
    return_code = (*local_18)(user_input);
    if (return_code == 0) {
      puts("[x] Critical Failure");
    }
    else {
      puts("[*] Untangled the bundle");
    }
    return_code = 0;
  }
  return return_code;
}

Tracing back through this code, it’s clear that we need to solve the mystery of local_18. There is a call to dlsym, so let’s check the man pages.

The function dlsym() takes a “handle” of a dynamic loaded shared object returned by dlopen(3) along with a null-terminated symbol name, and returns the address where that symbol is loaded into memory. If the symbol is not found, in the specified object or any of the shared objects that were automatically loaded by dlopen(3) when that object was loaded, dlsym() returns NULL. (The search performed by dlsym() is breadth first through the dependency tree of these shared objects.)

Ok, it seems that get_handle() gives us a handle to a dynamically loaded library. dlsym then accesses a certain part of that library and it’s stored in local_18 to execute later. Making an educated guess, we can assume that local_18 is a function pointer to the function that compares our user_input with the flag.

Now, we have figured out what main() does:

  1. Load a dynamic library.
  2. Pull a specific symbol from that library (most likely the function to check the flag).
  3. Read a line of user_input and null-terminate it at the newline.
  4. Run that loaded function with the user_input.
  5. Tell the user Critical Failure unless they entered the correct flag.
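The load-then-look-up pattern in steps 1, 2 and 4 can be sketched in Python with ctypes, which is built on dlopen and dlsym on Linux. This is only an illustration under that assumption: it looks up strlen from the C library already loaded into the process, not anything from BioBundle.

```python
import ctypes

# CDLL(None) is roughly dlopen(NULL): a handle to the symbols already
# loaded into the current process (assumption: Linux or a similar POSIX system).
handle = ctypes.CDLL(None)

# Attribute access performs the dlsym-style lookup by symbol name.
strlen = handle.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# Call through the looked-up function, as main() does with local_18.
length = strlen(b"BioBundle")
```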

Part 3 - Getting a handle on get_handle()

Let’s have a look at the get_handle() function:

long get_handle(void)
{
  long lVar1;
  undefined8 *puVar2;
  byte bVar3;
  byte local_1029;
  undefined8 local_1028;
  undefined8 local_1020;
  undefined8 local_1018 [511];
  long local_20;
  int local_14;
  ulong local_10;
  
  bVar3 = 0;
  local_14 = memfd_create(&DAT_00102004,0);
  if (local_14 == -1) {
                    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  for (local_10 = 0; local_10 < 0x3e08; local_10 = local_10 + 1) {
    local_1029 = __[local_10] ^ 0x37;
    write(local_14,&local_1029,1);
  }
  local_1028 = 0;
  local_1020 = 0;
  puVar2 = local_1018;
  for (lVar1 = 0x1fe; lVar1 != 0; lVar1 = lVar1 + -1) {
    *puVar2 = 0;
    puVar2 = puVar2 + (ulong)bVar3 * -2 + 1;
  }
  sprintf((char *)&local_1028,"/proc/self/fd/%d",local_14);
  local_20 = dlopen(&local_1028,1);
  if (local_20 == 0) {
                    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  return local_20;
}

This looks complicated, but we can break it down step by step.

memfd

Skipping over the variables, we see that local_14 gets set to the result of some memfd_create() function. Let’s find out what this function does.

We can check the man pages:

memfd_create() creates an anonymous file and returns a file descriptor that refers to it. The file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on. However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all references to the file are dropped, it is automatically released. Anonymous memory is used for all backing pages of the file. Therefore, files created by memfd_create() have the same semantics as other anonymous memory allocations such as those allocated using mmap(2) with the MAP_ANONYMOUS flag.

To summarise, this function creates an in-memory file, and a file descriptor to access it. It takes two arguments, first, a C-string for the name, and second, an unsigned int for optional flags. Like a lot of C functions, it returns -1 on failure, which explains the if statement below.

Now we can rename local_14 to in_mem_file.

The for loop

Now there is a for loop, that goes from 0, up to 0x3e08 (15880 in decimal). We can see that local_10 is the iterator, that then accesses an as yet unknown array denoted as __. I’ll rename local_10 to iterator. I could name this variable simply i. However, when reverse engineering something, I find it helpful to use more specific names.

Now inside that for loop we see that some byte variable, local_1029, is set to __[iterator] ^ 0x37. This immediately looks like a simple XOR cipher. Let’s rename local_1029 to deciphered_byte.

Finally, we write the deciphered data to the in_mem_file. Storing this deciphered data in in_mem_file means this is probably what we’re looking for. However, let’s finish with get_handle() first.
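A single-byte XOR cipher is its own inverse, so the same operation both hides and reveals the data. Here is a minimal sketch of the loop in Python; the function name and the sample plaintext are made up, and only the key 0x37 comes from the binary.

```python
KEY = 0x37  # the constant from the loop in get_handle()


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


# Made-up sample data, not taken from the binary.
plaintext = b"shared object"
enciphered = xor_bytes(plaintext)
roundtrip = xor_bytes(enciphered)  # applying it twice restores the input
```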

The second for loop

Continuing with the function, two variables are set to 0, and puVar2 is set to local_1018, which is an array whose purpose I don’t know yet. Then, we get into a second for loop.

This for loop starts at 0x1fe, which I’ll convert to decimal (510), and goes until it is 0. The iterator is lVar1, which I’ll rename to iterator2. It’s worth noting that iterator2 is decremented by adding -1. I’ll assume that this is a decompilation artifact and move on. It sets the value that puVar2 points to, an element of our unknown array, to 0. Then puVar2 is set to puVar2 + (ulong)bVar3 * -2 + 1. Looking at bVar3, it’s set to 0 at the top of the function, and it is never updated. Let’s rename it to zero. I’ll also rename local_1018 to some_array, and puVar2 to ptr_to_some_array.

Looking at the equation again, we see that ptr_to_some_array = ptr_to_some_array + (ulong)zero * -2 + 1. That is one of the more verbose ways of simply writing ptr_to_some_array = ptr_to_some_array + 1 (or ptr_to_some_array++). ptr_to_some_array and some_array are not used again in this function. For those unaware, in C, an array access arr[n] is the same as *(arr + n). With that in mind, this loop sets the value that ptr_to_some_array points to, which is an element of some_array, to 0, then increments the pointer, moving to the next element.
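The simplification is easy to check by simulating the loop. In this sketch a list index stands in for the pointer, and the array starts with placeholder values (an assumption, chosen so the zeroed elements are visible).

```python
zero = 0                     # bVar3: set once, never changed
some_array = [0xAA] * 511    # placeholder contents (assumption)
ptr = 0                      # index standing in for ptr_to_some_array

iterator2 = 510
while iterator2 != 0:
    some_array[ptr] = 0
    ptr = ptr + zero * -2 + 1   # with zero == 0 this is just ptr + 1
    iterator2 -= 1

zeroed = some_array.count(0)
```

The pointer ends up 510 elements further along and the first 510 elements are zero, which is what a plain forward loop from 0 to 509 would do.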

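To summarise, this loop counts down from 510 to 0, by adding -1, each time setting an element of some_array to 0 (accessing it via ptr_to_some_array), and moves to the next element. It never uses iterator2 as an index (for example, some_array[iterator2] = 0). At first glance this loop looks like it does nothing useful, as if it was written to waste your time while you figure that out. A more mundane explanation is also likely: local_1028, local_1020 and local_1018 sit next to each other on the stack, so this may simply be the compiler zero-initialising the buffer that sprintf later writes the filename into. Either way, it has no bearing on the flag.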

The last section

After wasting our time with a seemingly red herring for loop, let’s look at the last bit of the code:

  sprintf((char *)&local_1028,"/proc/self/fd/%d",in_mem_file);
  local_20 = dlopen(&local_1028,1);
  if (local_20 == 0) {
                    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  return local_20;

So we sprintf /proc/self/fd/<in_mem_file> into local_1028 as a char*. If you are unaware of sprintf, I’ll leave it as an exercise: look it up in the man pages.

Then, we use dlopen on the address of local_1028, and store that in local_20. Checking the man pages, we see that it loads a dynamic shared object (library) from a filename, and again, takes flags as the second argument. dlopen returns a handle to the loaded object.

Let’s rename local_1028 to in_mem_file_filename, and change the type to be a char*. I’ll also rename local_20 to be dynamic_library_handle.

Now we can see that at the end of this function, if dlopen fails, we just exit. Otherwise, we return the handle to the dynamic library.

Putting it together

Now that we’ve added better types and names for the variables, the code is easier to read.

long get_handle(void)
{
  long iterator2;
  undefined8 *ptr_to_some_array;
  byte zero;
  byte deciphered_byte;
  char *in_mem_file_filename;
  undefined8 local_1020;
  undefined8 some_array [511];
  long dynamic_library_handle;
  int in_mem_file;
  ulong iterator;
  
  zero = 0;
  in_mem_file = memfd_create(&DAT_00102004,0);
  if (in_mem_file == -1) {
                    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  for (iterator = 0; iterator < 15880; iterator = iterator + 1) {
    deciphered_byte = __[iterator] ^ 0x37;
    write(in_mem_file,&deciphered_byte,1);
  }
  in_mem_file_filename = (char *)0x0;
  local_1020 = 0;
  ptr_to_some_array = some_array;
  for (iterator2 = 510; iterator2 != 0; iterator2 = iterator2 + -1) {
    *ptr_to_some_array = 0;
    ptr_to_some_array = ptr_to_some_array + (ulong)zero * -2 + 1;
  }
  sprintf((char *)&in_mem_file_filename,"/proc/self/fd/%d",in_mem_file);
  dynamic_library_handle = dlopen(&in_mem_file_filename,1);
  if (dynamic_library_handle == 0) {
                    /* WARNING: Subroutine does not return */
    exit(-1);
  }
  return dynamic_library_handle;
}

Having a look at the code again, we can finally make sense of it:

  1. Create an in-memory file.
  2. Take data from __, decipher the data with 0x37 as the key, and store it in the in-memory file.
  3. Get the filename for the in-memory file.
  4. Load the in-memory file as a dll.
  5. Return the handle to the loaded dll.

Part 4 - Extracting the data from __

Looking at the disassembly in Ghidra, we can see that there is a bunch of data under the __ label.

I manually deciphered the first 4 bytes (0x48, 0x72, 0x7b, 0x71). The first byte, now 0x7f, was not ASCII, but the next 3 were: 'E', 'L', 'F'. In one of my less intelligent moments, I thought, “Huh, that looks just like an ELF header. What a coincidence!” and promptly went back to staring at the red herring for loop. However, after a few minutes, I realized that it probably wasn’t a coincidence.
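The manual check takes only a few lines of Python. This sketch uses the first four bytes as they appear at the start of the extracted array and compares the result against the standard ELF magic number.

```python
first_bytes = bytes([0x48, 0x72, 0x7b, 0x71])  # start of __ as shown by Ghidra
deciphered = bytes(b ^ 0x37 for b in first_bytes)

# Every ELF file starts with 0x7f followed by the letters "ELF".
is_elf = deciphered == b"\x7fELF"
```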

Starting at the beginning of __, I used Ghidra to select the next 15880 bytes, the same number as the bound of the for loop that deciphers this data. I copied it as a Python list, and put it into a script to decipher all the data, and save that resulting deciphered data to a file called output.dll.

arr = [ 0x48, 0x72, 0x7b, 0x71, 0x35, 0x36, 0x36, 0x37,
      ... SNIP ...
  0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37 ]

deciphered_data = [b ^ 0x37 for b in arr]

with open('output.dll', 'wb') as f:
    f.write(bytearray(deciphered_data))

Checking output.dll with file shows that it is an ELF 64-bit LSB shared object. This should contain the function that checks our input with the flag.
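For the curious, the fields that file reports live in the first few bytes of the ELF header. The following is a minimal sketch of reading them by hand; describe_elf is a hypothetical helper name, and it handles only the handful of values listed in the tables.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_ENDIAN = {1: "LSB", 2: "MSB"}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def describe_elf(header: bytes) -> str:
    """Summarise the first 18 bytes of an ELF file, roughly like `file` does."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    endian = ELF_ENDIAN.get(header[5], "unknown byte order")
    # e_type is a 16-bit field at offset 16, stored in the file's byte order.
    fmt = "<H" if header[5] == 1 else ">H"
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return f"ELF {elf_class} {endian} {ELF_TYPES.get(e_type, 'unknown type')}"
```

Passing the first 18 bytes of output.dll to describe_elf should give a summary along the same lines as file. Note that type 3 also covers position-independent executables, so file applies extra heuristics that this sketch does not.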

Before we move on, we can solve another mystery: Where did all the 7s go? Looking at a snippet of the end of the bytes contained in __, we see that there are a lot of 0x37 bytes.

... SNIP ...
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xe2, 0x37, 0x37, 
0x37, 0x36, 0x37, 0x37, 0x37, 0x34, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x17, 0x77, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x17, 0x07, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xec, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 
0x37, 0x34, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x1f, 0x77, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x1f, 0x07, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x3f, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0xd7, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x07, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x1f, 0x07, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x37, 0x10, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x37, 0x37, 
0x37, 0x37, 0x36, 0x37, 0x37, 0x37, 0x35, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 0x37, 
... SNIP ...

When we decipher this data, by XORing it with 0x37, they all become null-bytes. However, in their current form, they correspond to 7 in ASCII.

>>> chr(0x37)
'7'
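This also explains why strings picked the padding up in the first place. The sketch below is a rough stand-in for strings (the real tool has more rules) applied to a made-up block of 16 null bytes, before and after XORing with 0x37.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Rough stand-in for `strings`: runs of printable ASCII bytes."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


padding = bytes(16)                              # null padding, as in the library
enciphered = bytes(b ^ 0x37 for b in padding)    # how it is stored inside __

runs_plain = printable_runs(padding)
runs_enciphered = printable_runs(enciphered)
```

Null bytes are not printable, so the plain padding yields nothing, while the enciphered padding is one long run of the character 7.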

Now, we’re down to our last mystery: What is the flag? To answer this, we’ll need to reverse engineer the dll we extracted. Running strings on the dll shows that the flag is almost readable.

... SNIP ...
GLIBC_2.2.5
u/UH
HTB{st4tH
1c_l1b5_H
but_c00l
;*3$"
GCC: (Debian 10.2.1-6) 10.2.1 20210110
... SNIP ...

We’ll need to use Ghidra again. After opening the dll, and running all analysers, there is an interesting looking function called only _.

bool _(char *param_1)
{
  int iVar1;
  undefined8 local_48;
  undefined8 local_40;
  undefined8 local_38;
  undefined8 local_30;
  undefined8 local_28;
  undefined8 local_20;
  undefined8 local_18;
  undefined8 local_10;
  
  local_48 = 0x743474737b425448;
  local_40 = 0x5f3562316c5f6331;
  local_38 = 0x6c3030635f747562;
  local_30 = 0x7d7233;
  local_28 = 0;
  local_20 = 0;
  local_18 = 0;
  local_10 = 0;
  iVar1 = strcmp(param_1,(char *)&local_48);
  return iVar1 == 0;
}

This function takes a C-string, does a string compare between that C-string and some magic values inside the function, and returns the result. This looks like what we’re after.

Based on the type cast in the strcmp, we can change the type of local_48 to a char[32]. Now we see that the function has changed significantly. While we’re tweaking the function, I’ll rename local_48 to flag, param_1 to user_input, and iVar1 to result.

bool _(char *user_input)
{
  int result;
  char flag [32];
  
  flag[0] = 'H';
  flag[1] = 'T';
  flag[2] = 'B';
  flag[3] = '{';
  flag[4] = 's';
  flag[5] = 't';
  flag[6] = '4';
  flag[7] = 't';
  flag[8] = '1';
  flag[9] = 'c';
  flag[10] = '_';
  flag[11] = 'l';
  flag[12] = '1';
  flag[13] = 'b';
  flag[14] = '5';
  flag[15] = '_';
  flag[16] = 'b';
  flag[17] = 'u';
  flag[18] = 't';
  flag[19] = '_';
  flag[20] = 'c';
  flag[21] = '0';
  flag[22] = '0';
  flag[23] = 'l';
  flag[24] = '3';
  flag[25] = 'r';
  flag[26] = '}';
  flag[27] = '\0';
  flag[28] = '\0';
  flag[29] = '\0';
  flag[30] = '\0';
  flag[31] = '\0';
  result = strcmp(user_input,flag);
  return result == 0;
}

We can now see the flag in plain text:

HTB{st4t1c_l1b5_but_c00l3r}
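As a cross-check outside Ghidra, the four constants from the first decompilation of _ can be decoded by hand. This sketch assumes the values are stored little-endian, which is the case on x86-64.

```python
import struct

# The four 64-bit values assigned in the first decompilation of _().
constants = [
    0x743474737b425448,
    0x5f3562316c5f6331,
    0x6c3030635f747562,
    0x7d7233,
]

# Pack each value as a little-endian 64-bit integer to get its bytes
# in the order they sit in memory.
raw = b"".join(struct.pack("<Q", value) for value in constants)

# strcmp stops at the first null byte, so cut the buffer there.
decoded = raw.split(b"\x00", 1)[0].decode("ascii")
```

The decoded string matches the flag above, which also explains why strings showed it in three 8-character pieces: each piece is one 64-bit immediate in the machine code.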

Conclusion

BioBundle was a fun reverse engineering challenge. This was my first time reverse engineering an executable using a shared library. I had to consult the man pages quite a bit, but I learned a lot from this challenge.

I haven’t practiced reverse engineering much, but I fared well in this challenge due to my experience with C/C++ and microcontrollers. I want to learn more about reverse engineering because of this CTF.

In the end, Flinders University placed 258th out of 955 teams. I was able to secure 2 reverse engineering flags, and a user fullpwn flag. I was quite surprised by this result, but it shows that sometimes all you need is a little determination and a willingness to learn as you go.