
Posts by Peter Goodman

I'd love to hear about your solutions 😀

1 year ago 2 0 1 0

I'm delighted to announce that in the new year I'll be joining @hex-rays.bsky.social as a C++ developer! IDA Pro and the Hex Rays decompiler are indispensable tools for reverse engineers -- I can't wait to work on these products and join another top notch engineering team.

1 year ago 12 0 0 0

How do you see the balance of value brought to the table between new models just being better vs. whatever smarts are encoded in the harnessing of those models?

1 year ago 0 0 0 0

📌

1 year ago 0 0 0 0

📌

1 year ago 0 0 0 0

16/16 GRR and microx were my first major contributions at Trail of Bits, and represented a continuation of my DBT research from my M.Sc. at the University of Toronto. They were super fun projects to create and work on, and I'm extremely proud of both of them.

1 year ago 0 0 0 0

15/16 What was humbling was that UDB itself was a record/replay x86-64 dynamic binary translator (DBT). So while I was toiling away trying to get my DBT working for DECREE, I was relying on their much more general system to debug mine!

1 year ago 0 0 1 0

14/16 As you can imagine, debugging a dynamic binary translator can be tricky; when things go wrong, your debugger isn't as helpful because there's no debug information for just-in-time translated code. UndoDB's time-travelling debugger, UDB, was a productivity multiplier (undo.io/products/udb).

1 year ago 0 0 1 0

13/16 At the time, the Unicorn engine didn't provide fine-grained information about instruction dependencies, and it was very crashy. Our attempts to use it had us concretizing any symbolic bytes in big swaths of the stack, artificially limiting the futures that the symbolic executor could explore.

1 year ago 0 0 1 0
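The concretization cost described above can be sketched in toy Python: every symbolic stack byte that gets pinned to a single concrete value throws away the other 255 futures for that byte. The classes and names below are hypothetical stand-ins, not pysymemu's real API.

```python
# Toy sketch of concretization (hypothetical, not pysymemu's real API).
class SymByte:
    """A byte whose value the emulator doesn't know concretely."""
    def __init__(self, name):
        self.name = name

def concretize(memory, addr, size, pick=lambda sym: 0):
    """Pin every symbolic byte in [addr, addr+size) to one concrete
    value so code can execute natively. Each choice discards the
    other 255 possible futures for that byte."""
    lost_futures = 0
    for a in range(addr, addr + size):
        if isinstance(memory.get(a), SymByte):
            memory[a] = pick(memory[a])
            lost_futures += 255
    return lost_futures

# A "big swath of the stack" that is entirely symbolic:
stack = {0x1000 + i: SymByte(f"s{i}") for i in range(8)}
print(concretize(stack, 0x1000, 8))  # 2040 discarded alternatives
```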

12/16 Fun segue: our pysymemu fork used microx (github.com/lifting-bits...), my fourth binary translator, to *natively* execute instructions that didn't have symbolic python models. Microx allowed us to minimize how much symbolic state had to be concretized when executing instructions natively.

1 year ago 0 0 1 0
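The fallback pattern in the post above can be sketched as a dispatch loop: prefer a symbolic model when one exists, and drop to native execution only for unmodeled instructions. Everything here (`SYMBOLIC_MODELS`, `native_step`) is a made-up stand-in, not microx's actual interface.

```python
# Dispatch sketch: symbolic model when available, native fallback
# otherwise. SYMBOLIC_MODELS and native_step are stand-ins, not
# microx's real API.
def sym_add(state, dst, src):
    state[dst] = state[dst] + state[src]  # would stay symbolic in a real engine

SYMBOLIC_MODELS = {"add": sym_add}

def native_step(state, op, dst, src):
    """Pretend-native execution: concretize operands, run for real."""
    table = {"xor": lambda a, b: a ^ b, "mul": lambda a, b: a * b}
    state[dst] = table[op](state[dst], state[src])

def step(state, op, dst, src):
    model = SYMBOLIC_MODELS.get(op)
    if model is not None:
        model(state, dst, src)            # symbolic path
    else:
        native_step(state, op, dst, src)  # native fallback

regs = {"eax": 6, "ebx": 7}
step(regs, "add", "eax", "ebx")  # modeled
step(regs, "mul", "eax", "ebx")  # unmodeled -> native
print(regs["eax"])  # (6 + 7) * 7 = 91
```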

11/16 GRR's snapshots could also be shared with a custom CGC-specific fork of pysymemu (github.com/feliam/pysym...). Fun fact: pysymemu evolved into the Manticore symbolic executor (github.com/trailofbits/...). This sharing allowed the fuzzer and symbolic execution components to "blindly" cooperate.

1 year ago 0 0 1 0

10/16 Another cool thing was that GRR was deterministic and could produce and resume from program snapshots. The original motivation of this feature was to skip to the first read(2) system call, avoiding deterministic program setup costs.

1 year ago 0 0 1 0
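The snapshot/resume idea above can be sketched with a toy `Emulator` class standing in for GRR: pay the deterministic setup cost once, snapshot, then start every fuzzing iteration from the saved state.

```python
import copy

# Toy sketch of deterministic snapshot/resume; Emulator is a stand-in
# for GRR, not its real interface.
class Emulator:
    def __init__(self):
        self.state = {"pc": 0, "mem": {}}

    def run_until_first_read(self):
        """Pretend the deterministic program setup ran up to the
        first read(2) system call."""
        self.state["pc"] = 100
        self.state["mem"][0] = b"setup-done"

    def snapshot(self):
        return copy.deepcopy(self.state)

    def resume(self, snap):
        self.state = copy.deepcopy(snap)

emu = Emulator()
emu.run_until_first_read()   # setup cost paid once
snap = emu.snapshot()
for _ in range(3):           # each fuzz iteration restarts at the read(2)
    emu.resume(snap)
    assert emu.state["pc"] == 100
print("resumed 3 times from one snapshot")
```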

9/16 GRR was a fairly effective fuzzer, but the fuzzer logic wasn't nearly as smart as contemporaries such as AFL. Where GRR's fuzzer shone was that it could operate on the whole input or on individual system calls, doing things like repeating or swapping inputs at a finer granularity.

1 year ago 0 0 1 0
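The finer-grained mutation idea can be sketched by keeping a test case as one byte string per system call, so mutators can repeat or swap whole syscall inputs as well as flip bits. This is illustrative only; these aren't GRR's actual mutators.

```python
import random

def mutate(inputs, rng):
    """Mutate a test case kept as one byte string per read()-style
    system call, rather than one flat blob."""
    out = list(inputs)
    op = rng.choice(("repeat", "swap", "flip"))
    i = rng.randrange(len(out))
    if op == "repeat":                    # replay one syscall's input twice
        out.insert(i, out[i])
    elif op == "swap" and len(out) > 1:   # reorder two syscalls' inputs
        j = rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    else:                                 # classic bit flip inside one input
        buf = bytearray(out[i])
        if buf:
            buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
        out[i] = bytes(buf)
    return out

rng = random.Random(1)
print(mutate([b"GET ", b"/index", b"\r\n"], rng))
```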

8/16 Faithfully emulating DECREE meant doing a lot of weird testing. One fun discovery was that write(2) will avoid returning an EFAULT as long as a minimum number of bytes have been read (github.com/lifting-bits...).

1 year ago 0 0 1 0

7/16 DECREE, as released by DARPA, was implemented as a Linux kernel fork that loaded CGC binaries (really: slightly tweaked ELFs) and used a custom system call personality table that restricted loaded binaries to just those few system calls.

1 year ago 0 0 1 0

6/16 To get Radamsa to work as a function meant compiling the Scheme to C using the OWL Lisp compiler, then patching that horrible output so that I could track its memory allocations and network calls, and invoke its main function as though it were any other normal function in a program.

1 year ago 0 0 1 0

5/16 Another fun thing about GRR was that Radamsa (gitlab.com/akihe/radamsa) was embedded as a callable function. If you're familiar with Radamsa then this may come as a surprise -- Radamsa is written in a Scheme dialect, and normally sends inputs over the network.

1 year ago 0 0 2 0

4/16 One cool thing is that GRR could handle self-modifying DECREE binaries, which made opening the code cache in IDA Pro or Binary Ninja fun, because you could browse the evolution of those code modifications.

1 year ago 0 0 1 0

3/16 GRR translated x86 into x86-64, so that one or more DECREE binaries could run in 4 GiB (32-bit) memory spaces within its own much larger 64-bit address space. Translated code could be instrumented for code coverage, and cached to disk to amortize translation costs across GRR runs.

1 year ago 0 0 1 0

2/16 DECREE programs are basically simplified 32-bit x86 Linux programs -- they can use only six or so system calls. GRR used dynamic binary translation, a just-in-time translation technique that rewrote the target program's machine code while it was running!

1 year ago 0 0 1 0
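The translate-on-first-touch loop at the heart of a DBT can be sketched in a few lines of Python, with closures standing in for the x86-64 code GRR actually emitted.

```python
# Toy sketch of a dynamic-binary-translation loop: translate one
# basic block of guest code on first touch, cache it, and run through
# the cache thereafter. (Illustrative only; GRR's real translator
# emitted x86-64 machine code, not Python closures.)
def translate(guest_program, pc):
    op, arg, next_pc = guest_program[pc]
    def block():
        print(f"{op} {arg}")  # stand-in for executing translated code
        return next_pc
    return block

def run(guest_program, entry):
    cache = {}  # guest PC -> translated block
    pc = entry
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate(guest_program, pc)  # JIT on cache miss
        pc = cache[pc]()  # execute block; returns next guest PC
    return cache

prog = {0: ("mov", 1, 4), 4: ("add", 2, 8), 8: ("halt", 0, None)}
run(prog, 0)  # prints: mov 1, add 2, halt 0
```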

1/16 One of my first creations at Trail of Bits was GRR (github.com/lifting-bits...), an all-in-one emulator and fuzzer for programs running on the DECREE operating system used in DARPA's Cyber Grand Challenge.

1 year ago 11 1 1 0

15/15 Thanks (and sorry!) to the team of people who helped/suffered along the way! Also thanks to DARPA for funding this research through Sergey Bratus' Assured Micro-Patching (AMP) program.

1 year ago 0 0 0 0

14/15 In summary, Dr. Lojekyll was one of the most fun projects I created at Trail of Bits. It was also the most brutal to debug. I learned that debugging declarative languages is hard, and debugging in-progress/broken compilers for declarative languages is harder.

1 year ago 0 0 1 0

13/15 I always saw automated database factorization and nesting as the ultimate solution to the intermediate relation explosion problem, but we never had the time to address it, and Dr. Lojekyll's codebase was not flexible enough to make experimental extensions easy.

1 year ago 0 0 1 0

12/15 Using micro-databases on both sides of a client/server architecture ended up being fun: the server could do the heavyweight computations, then keep a thin client up-to-date with its differentials. Micro-databases could also be used inside stateful functors, allowing for database nesting.

1 year ago 0 0 1 0

11/15 Micro-databases were originally motivated to help a human solve Dr. Lojekyll's intermediate relation explosion problem. In Dr. Lojekyll, the need to do top-down execution meant a subset of intermediate relations (along with the named ones) had to be saved. That caused a lot of redundancy.

1 year ago 0 0 1 0

10/15 Another learning was micro-databases -- my codegen produced C++/Python classes, after all. I could separate out a small part of my Datalog program, compile it to a C++/Python class, and instantiate and destroy those on-demand.

1 year ago 0 0 1 0
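A hypothetical sketch of the micro-database idea above: because codegen emits an ordinary class, a small slice of a Datalog program can be instantiated, fed facts, queried, and destroyed on demand. The class below is hand-written reachability, not real Dr. Lojekyll output.

```python
# Stand-in for a compiled C++/Python class emitted by a Datalog
# compiler (hypothetical; not Dr. Lojekyll's actual codegen).
class ReachabilityDB:
    def __init__(self):
        self.paths = set()  # materialized path(x, y) facts

    def add_edge(self, x, y):
        """Incrementally maintain transitive closure as facts arrive."""
        frontier = {(x, y)}
        while frontier:
            fact = frontier.pop()
            if fact in self.paths:
                continue
            self.paths.add(fact)
            a, b = fact
            # join the new fact against existing paths on both sides
            frontier |= {(a, d) for (c, d) in self.paths if c == b}
            frontier |= {(c, b) for (c, d) in self.paths if d == a}

    def reachable(self, x, y):
        return (x, y) in self.paths

db = ReachabilityDB()          # instantiate on demand...
db.add_edge("a", "b")
db.add_edge("b", "c")
print(db.reachable("a", "c"))  # True
del db                         # ...and destroy when finished
```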

9/15 There were other learnings for me on this project, like how typical compiler optimizations such as common subexpression elimination can't just be applied to same-shaped dataflow system operators, because operations over values and sets of values have different semantics!

1 year ago 0 0 1 0
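One way to picture the hazard (my toy reading of the post above, not Dr. Lojekyll code): two structurally identical union operators are not common subexpressions if one consumer expects multiset semantics and the other set semantics.

```python
# Two "same-shaped" unions over the same inputs that CSE must not
# merge: one keeps duplicates (multiset), the other deduplicates.
from collections import Counter

a, b = [1, 2, 2], [2, 3]

bag_union = a + b                    # multiset semantics: keeps duplicates
set_union = sorted(set(a) | set(b))  # set semantics: deduplicates

print(Counter(bag_union)[2])  # 3 copies of 2 survive in the bag
print(set_union)              # [1, 2, 3]
```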

8/15 This required codegen to produce traditional bottom-up, fixpoint-style procedural code, as well as top-down "double checking" code. The same IR was used to represent both cases, allowing me to retarget codegen to languages like C++ and Python. See slide 41: www.petergoodman.me/docs/dr-loje...

1 year ago 1 0 1 0
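The two execution modes described above can be sketched for transitive closure: a bottom-up fixpoint that derives all `path` facts, plus an on-demand top-down check of a single fact. This is toy Python, not Dr. Lojekyll's generated code.

```python
# Toy sketch of bottom-up fixpoint vs. top-down "double checking"
# for path(x, z) :- edge(x, z); path(x, z) :- path(x, y), edge(y, z).
edges = {(1, 2), (2, 3), (3, 4)}

def bottom_up(edges):
    """Fixpoint: keep deriving path facts until nothing new appears."""
    path = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(path):
            for (y2, z) in edges:
                if y == y2 and (x, z) not in path:
                    path.add((x, z))
                    changed = True
    return path

def top_down_check(edges, x, z, seen=None):
    """Re-derive one fact on demand by recursing over the rules."""
    seen = seen or set()
    if (x, z) in edges:
        return True
    for (x2, y) in edges:
        if x2 == x and y not in seen:
            if top_down_check(edges, y, z, seen | {y}):
                return True
    return False

paths = bottom_up(edges)
print((1, 4) in paths, top_down_check(edges, 1, 4))  # True True
```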