I am a cyber security researcher specializing in automated binary analysis - AMA

Hexorg@beehaw.org · 1 year ago

KNova@links.dartboard.social · 1 year ago

Are there any trends in the world of security research or malware analysis that the security-minded should be paying attention to?

Hexorg@beehaw.org · 1 year ago

Concolic execution research is advancing, though slowly. At CGC (DARPA's Cyber Grand Challenge) we showed that automation could find plenty of 90s-era vulnerabilities. At the same time, at the end of CGC the best machine was pitted against humans in the DEF CON Capture the Flag, and it placed second-to-last. So old-school vulnerabilities can now be found automatically, but we now also have general-purpose mitigations for them, like no-execute memory pages, stack canaries, and address-space layout randomization. Once automation is able to reason about those general-purpose mitigations, we will probably see many zero-days in existing code bases. I think that day is about 10 years away.
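Concolic (concrete + symbolic) execution runs a program on a real input while recording the condition of every branch it takes, then negates one recorded condition and asks a solver for an input that drives execution down the other side. Here is a minimal Python sketch under heavy simplifying assumptions: the program under test (`target`) is a hypothetical toy instrumented by hand rather than a real binary, branch conditions are limited to `input[i] == value`, and the tiny `solve` function is a stand-in for a real SMT solver such as Z3.

```python
from typing import Dict, List, Optional, Set, Tuple

# One recorded branch: (input index, compared value, whether input[index] == value held)
Branch = Tuple[int, int, bool]


def target(data: bytearray, trace: List[Branch]) -> str:
    """Hypothetical program under test, hand-instrumented to log each branch."""
    def check(i: int, v: int) -> bool:
        taken = data[i] == v
        trace.append((i, v, taken))
        return taken

    if check(0, ord("B")):
        if check(1, ord("U")):
            if check(2, ord("G")):
                return "crash"
    return "ok"


def solve(constraints: List[Branch], seed: bytearray) -> Optional[bytearray]:
    """Toy stand-in for an SMT solver: satisfies a conjunction of
    input[i] == v / input[i] != v constraints, starting from `seed`."""
    out = bytearray(seed)
    eq: Dict[int, int] = {}
    ne: Dict[int, Set[int]] = {}
    for i, v, must_equal in constraints:
        if must_equal:
            if eq.get(i, v) != v:
                return None  # input[i] cannot equal two different values
            eq[i] = v
        else:
            ne.setdefault(i, set()).add(v)
    for i, v in eq.items():
        if v in ne.get(i, set()):
            return None  # input[i] == v and input[i] != v: unsatisfiable
        out[i] = v
    for i, banned in ne.items():
        if i not in eq and out[i] in banned:
            out[i] = next(b for b in range(256) if b not in banned)
    return out


def concolic_explore(seed: bytes, max_runs: int = 50) -> Dict[bytes, str]:
    """Run concretely, then flip each recorded branch to generate new inputs."""
    worklist = [bytearray(seed)]
    seen_paths: Set[Tuple[Branch, ...]] = set()
    results: Dict[bytes, str] = {}
    runs = 0
    while worklist and runs < max_runs:
        data = worklist.pop()
        trace: List[Branch] = []
        outcome = target(data, trace)  # concrete run
        runs += 1
        path = tuple(trace)
        if path in seen_paths:
            continue  # this input followed a path we already explored
        seen_paths.add(path)
        results[bytes(data)] = outcome
        # Symbolic step: keep the first k branches, negate branch k, solve.
        for k, (i, v, taken) in enumerate(trace):
            new_input = solve(trace[:k] + [(i, v, not taken)], data)
            if new_input is not None:
                worklist.append(new_input)
    return results


results = concolic_explore(b"AAA")
print(results)
```

Starting from the uninteresting seed `b"AAA"`, each run peels off one more comparison, so the search reaches the crashing input `b"BUG"` without guessing all three bytes at once. Real engines face the problems the answer describes (mitigations, loops, libraries), which this toy sidesteps entirely.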

strudel6242@beehaw.org · 1 year ago

What was your journey like getting into this as a career? What have been some of the toughest challenges you’ve faced as a researcher? Why did you specialise in automated binary analysis?

Hexorg@beehaw.org · 1 year ago

I think I technically started by trying to cheat in Diablo 1 using cheat’o’matic when I was 12😅 Then I started learning programming, and I got a bachelor's in electrical engineering, which brought my understanding close to the wires inside the CPU. Then I got my PhD in engineering with a concentration in cyber security. I think my toughest challenge was just the sheer amount of domain-specific research there is in binary analysis. For example, preventing stack overflows, SQL injections, cross-site scripting, and unauthorized access are all completely disjoint problems.

One DARPA PM said that binary analysis feels like scanning a whole baseball field with an electron tunneling microscope and trying to figure out the rules of baseball based on the scans.

RoaringSilence@kbin.social · 1 year ago

Thanks for doing this ama!

Without revealing too much, who are your customers, or is it purely research-based?

A second question: is vulnerable code often the result of using certain programming languages that have “known” problems, or do the problems come mostly from bad coding habits?

Hexorg@beehaw.org · 1 year ago

I was associated with this so you can infer clients from there.

Overall, no: even memory-safe languages can let you write vulnerable code. Heck, even SQL, which is a database query language, can have SQL injections. Developers write code that has to reason over infinitely many possible inputs. We can’t reason over infinite data, so we make assumptions about it. Vulnerabilities happen when those assumptions can be broken. Theoretically, if you formalize all of your assumptions you can have a computer check whether they hold, but what if you forgot to list an assumption? There is an infinite number of possible assumptions too, so even fully formalized approaches can’t help you 100% (though they can make your code a lot more resilient).

Better coding practices essentially help developers manage assumptions better. But what happens if the requirements change and you didn’t account for the old assumptions in the new code? Or what if you’re a new developer and you don’t know what assumptions the code holds? It’s hard. Automation can make it easier, but I doubt we’ll ever have 100% non-vulnerable code.
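As an illustration of how a memory-safe language still lets a broken assumption become a vulnerability, here is a minimal Python sketch using the standard `sqlite3` module. The `users` table, its contents, and the function names are hypothetical. The unsafe lookup silently assumes the name never contains a quote; the safe one uses parameter binding so the input is always treated as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical in-memory table, just for the demonstration
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Hidden assumption: `name` never contains a single quote.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameter binding: the driver never parses `name` as SQL.
    return [row[0] for row in conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"
print(lookup_unsafe(conn, payload))  # the injected OR clause matches every row
print(lookup_safe(conn, payload))    # no user is literally named that
```

No memory corruption is involved: the interpreter does exactly what the code says, and the flaw is purely the unstated assumption about the input.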
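One way to make assumptions visible to the next developer, and checkable by a computer, is to write them down as explicit checks and then test them mechanically. Below is a minimal sketch for a hypothetical record format (one length byte followed by that many payload bytes). The checker is only a bounded exhaustive search over a small alphabet, far short of formal verification, and, as the answer points out, it can only check the properties someone thought to write down.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Parser = Callable[[bytes], Tuple[int, bytes]]


def parse_record_old(record: bytes) -> Tuple[int, bytes]:
    # Only "non-empty" was written down; "declared length fits" was not.
    if not record:
        raise ValueError("empty record")
    length = record[0]
    return length, record[1:1 + length]  # slicing silently truncates


def parse_record(record: bytes) -> Tuple[int, bytes]:
    # Both assumptions are now explicit and enforced.
    if not record:
        raise ValueError("empty record")
    length = record[0]
    if len(record) - 1 < length:
        raise ValueError(f"declared length {length} exceeds {len(record) - 1} bytes")
    return length, record[1:1 + length]


def find_counterexample(parser: Parser, max_len: int = 3,
                        alphabet: Tuple[int, ...] = (0, 1, 2, 255)) -> Optional[bytes]:
    """Try every record of up to max_len bytes drawn from `alphabet`.
    Property: the parser either rejects with ValueError or returns a
    payload of exactly the declared length. Returns the first violation."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            rec = bytes(tup)
            try:
                length, payload = parser(rec)
            except ValueError:
                continue  # an explicit, documented rejection is acceptable
            except Exception:
                return rec  # any other crash violates the property
            if len(payload) != length:
                return rec
    return None


print(find_counterexample(parse_record_old))
print(find_counterexample(parse_record))
```

The old parser's missing assumption shows up immediately as the one-byte record `b"\x01"`, which declares a payload it does not contain. In C, the same gap would be an out-of-bounds read rather than a silent truncation.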

PenguinCoder@beehaw.org · 1 year ago

I’m an incident responder/malware analyst. I mostly do static analysis and reverse engineering. What would you say is the benefit of your research and this kind of binary analysis compared to other offerings? What do you do about highly obfuscated binaries, or ‘benign’-looking binaries that aren’t?

Hexorg@beehaw.org · 1 year ago

I’m not too sure about the chain of command during incident response. Theoretically, this research is going to make finding vulnerabilities and attack vectors easier. Once you have the malicious binary (and once we’ve solved some remaining problems), you can ask “what input caused this malicious binary to call ptrace?” and the automation will answer “if socket X reads ‘write \0\0\0 to stdin of pid 3738’, then the binary will eventually call ptrace”. The analysis is dynamic and works on stripped binaries, so obfuscation generally isn’t a concern. Currently the biggest challenge is variable-sized loops where the size is symbolic (i.e., whether the path reaches ptrace depends on the iteration count). The automation needs domain-specific knowledge to reason about variable-sized loops (e.g., it needs to be taught how to invert strlen()).
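To make the strlen() point concrete, here is a minimal Python sketch, not taken from any real engine. A naive symbolic executor forks on `buf[i] == 0` at every loop iteration, so a buffer of N bytes yields N separate paths, one per possible length. A hand-written summary of what strlen() means inverts it directly instead. The function names, the buffer size, and the filler byte are all hypothetical.

```python
from typing import List


def strlen(buf: bytes) -> int:
    """Byte-at-a-time loop mirroring C strlen; assumes a NUL terminator exists."""
    n = 0
    while buf[n] != 0:
        n += 1
    return n


def naive_strlen_paths(buf_size: int) -> List[List[str]]:
    """What a naive symbolic executor sees: each iteration forks on
    'buf[i] == 0', giving one path (one constraint set) per possible length."""
    paths = []
    for n in range(buf_size):
        constraints = [f"buf[{i}] != 0" for i in range(n)]
        constraints.append(f"buf[{n}] == 0")
        paths.append(constraints)
    return paths


def invert_strlen(target_len: int, buf_size: int, filler: int = 0x41) -> bytes:
    """Hand-written summary: strlen(buf) == target_len exactly when bytes
    0..target_len-1 are non-zero and byte target_len is zero."""
    if not 0 <= target_len < buf_size:
        raise ValueError("target length must leave room for the NUL terminator")
    if filler == 0:
        raise ValueError("filler byte must be non-zero")
    return bytes([filler]) * target_len + bytes(buf_size - target_len)


print(len(naive_strlen_paths(64)))        # one path per possible length
print(strlen(invert_strlen(37, 64)))      # summary hits the target directly
```

With the summary, "which input makes strlen() return 37" becomes a single constraint instead of a search over dozens of loop paths. Every library function with a data-dependent loop needs its own summary of this kind, which is the domain-specific knowledge the answer refers to.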

StringTheory@beehaw.org · 1 year ago

deleted by creator

Hexorg@beehaw.org · 1 year ago

Uh… I think I agree, but… wrong thread?

StringTheory@beehaw.org · 1 year ago

Yes, thank you! My screen hiccuped and I don’t know how my comment landed here!