Break Into Tool Dev / RE
This article is designed for people who want to break into reverse engineering and tool development. This breakdown is effectively the pipeline that has gotten a large number of candidates jobs in these two disciplines.
Tool development: the art and science of systematically building software that breaks other software to find and test its vulnerabilities, which leads to protecting against those same vulnerabilities. Figure out how and why something breaks so that it can be fixed before the bad guys break it.
Reverse engineering (RE): The art and science of looking deeply at a tool or piece of malware to figure out how it works and use that information to fix the software and better identify and defend against attacks.
Yes, tool development and reverse engineering are two different disciplines. However, they overlap substantially, and the fundamentals of both are generally built on the C language. So learn it. Example: You write and compile a program and it doesn’t work. You troubleshoot it and it still doesn’t work. So you step through it instruction by instruction in a debugger, or pull it into a decompiler, to verify that the compiled program is what you expect to see. Boom: you just reverse engineered your own program. On the flip side, you are reversing something and you don’t understand what you’re looking at, so you pull up a developer book to look up what some esoteric instruction does. Boom: you just did some developer work.
Each STEP is broken into two sections: macro and micro. The macro section covers the concepts and background information that frame WHAT you do in the micro section, which is more hands-on, and WHY you do it. Throughout each step, consider posting your code and results to GitHub, as some employers want to see a lot of green (commits).
If you have a basic background or understanding in computer science, this won’t take terribly long. If you’re starting from scratch, YOU CAN DO IT, but it can take up to a year or more depending on your situation, grit, and ability to teach yourself new content. If you’re new to this sort of thing but are relatively technically-minded and comfortable with asking WHY 30 times a day and only figuring out an answer four times a day, you got this.
— — — — — — — — — — — — — — — — — — — — — — — — — —
STEP 1: The goal of this exercise is to understand the C language fundamentals to a 30% solution. Understanding will increase in additional steps. You will also use some tools that are fundamental to progress such as gcc, Ghidra, and IDA Free.
Step 1 Macro: This orients you to the C language.
- Start by downloading the C Cheatsheet from the Rainier Cyber GitHub. Print a copy and keep it on you at all times so you can reference it and make notes. https://github.com/Rainier-Cyber/C-cheatsheet. This cheatsheet is designed to take a modern approach to the C language, with definitions and language that make sense to current audiences. There are also multiple code examples that reinforce the learning concepts. Huge shout-out to Sean Eyre, who built most of the cheatsheet!
Step 1 Knowledge:
- Disassembler: A disassembler is a software tool which transforms machine code into a human readable mnemonic representation called assembly language.
- Debugger: Debuggers allow the user to view and change the running state of a program.
- Decompiler: Software used to reverse the process of compilation. Decompilers take a binary program file as input and output an approximation of the same program expressed in a structured higher-level language.
- Learn the following sections: Code, Comments, Data Types, Casting, Structs/Arrays/Pointers, Functions, Operations, Statements, Key Words, IO, Memory, Strings, and Compiling
- THERE ARE NO STRINGS IN C! There are arrays of characters. Text in an array must end with a null terminator ('\0'), not a newline. That zero byte is how functions like strlen() and printf() know where the text stops.
- What is the difference between strcpy() and strncpy()?
- How does malloc() work? Eventually you will write your own version of strncpy() and malloc()
Step 1 Macro Option A: If you don’t have a background in programming or the C Cheatsheet scares you, take a C programming course. Here’s one that some friends have enjoyed on Udemy (not an affiliate): https://www.udemy.com/course/c-programming-for-beginners-/. Print out the cheatsheet, keep it open, and take notes on it throughout the course.
Step 1 Macro Option B: You have a background in programming and you once invited malloc to your birthday party and it showed up. Start by again reviewing the C Cheatsheet above.
Step 1 Micro — You really should be doing this in a virtual machine (VM). It’s good practice. If you don’t know how, go to YouTube. There are plenty of videos that will show you how to spin up a Linux VM. Go find them. This intro is about learning RE and tool dev, not VM fundamentals.
- Install Ghidra from https://ghidra-sre.org/
- Note: to render GitHub HTML pages in the browser, prepend https://htmlpreview.github.io/? to the page’s https://github.com URL, or download the GitHub repository
- Installation guide: https://ghidra-sre.org/InstallationGuide.html#Install
- Ghidra Cheat Sheet 1: https://htmlpreview.github.io/
- Ghidra Cheat Sheet 2: https://github.com/NationalSecurityAgency/ghidra/blob/master/GhidraDocs/CheatSheet.html
- Ghidra Classes on NSA’s GitHub page: https://github.com/NationalSecurityAgency/ghidra/tree/master/GhidraDocs/GhidraClass
- Download IDA Free: https://www.hex-rays.com/products/ida/support/download.shtml
- Write a quick program in C (possibly copy an example from the GitHub page above), compile it with gcc, get it to run and make sure it runs as expected, decompile it in Ghidra, and debug it in IDA Free. See if the code decompiled and debugged as expected. Does the decompiled code look the same as what you wrote? When debugging your program in IDA Free, watch the registers change, watch the stack, etc. This step is a bit of a jump because you have to put it all together. If you can’t figure it out, go watch YouTube on writing/compiling C programs or look at this link which probably won’t be around forever. You need to teach yourself how to learn. Now talk about the stuff you just did: become conversational in those aspects.
- At the end of STEP 1, you a) know the basics of the C language, b) know how to write and compile a basic program, and c) are familiar with gcc, Ghidra, and IDA Free.
— — — — — — — — — — — — — — — — — — — — — — — — — —
STEP 2: Dive deeper into what’s going on lower in the stack, and start a GitHub account if you don’t have one yet. Now that you have at least a cursory understanding of C, it’s time to get the age-old beauty AoE, Hacking: The Art of Exploitation by Jon Erickson. I really need to make these affiliate links. For now, just buy me a beer if this is useful.
S2 Macro: Know buffer overflows at a macro level. Focus on the different types of buffer overflows and their general implementations.
- Buffer overflows at the high level, ASLR, and stack canaries — https://en.wikipedia.org/wiki/Stack_buffer_overflow
- Buffer overflows at a technical level — specific registers that might be involved
- Learn about ROP, ROP chains, and stack smashing: https://en.wikipedia.org/wiki/Return-oriented_programming
- Read this whole article: https://hshrzd.wordpress.com/how-to-start/
- If you want to go deep fast, read Forrest Orr’s 2020 blog post titled A Modern Exploration of Windows Memory Corruption Exploits — Part I: Stack Overflows
S2 Micro: Dive deep into the following to understand and explain ways to defeat buffer overflows. You should be able to walk up to a whiteboard and explain these concepts on a whim.
- DEP (Data Execution Prevention) and, more broadly, executable-space protection. DEP is the Windows name for it, but the community often says DEP when it means executable-space protection in general, on any OS
- Compiler learning: Write a simple program on https://godbolt.org/ and look at how different compilers change your code when they compile. THIS IS A HUGE DEAL and knowing this concept will save you tons and tons of time and frustration later!
- Make a GitHub account, learn how to branch.
- Start looking at strcpy() versus strncpy() a bit deeper and start thinking about how you would write strncpy() from scratch
- Look into IDEs or just use vim/nano. Up to you. Benefits to both. Oh, and the Linux-based editors now work on Windows via WSL. If you didn’t understand that, don’t worry and just use your favorite notepad tool and read the C cheatsheet.
- Pick up a copy of John Mongan’s Programming Interviews Exposed (not an affiliate). Learn about Big-O notation and some of the classic fizz-buzz type examples. Yes, they’re still used all the time.
- Start practicing interviewing.
— — — — — — — — — — — — — — — — — — — — — — — — — —
STEP 3 — This looks like a short section, but it’s a bit longer than you might realize. This is often where both disciplines start to split their focus between embedded, OS, and mobile. At this point, you should have proven that you can teach yourself things. So go out and find the information you need to understand and apply the following concepts:
- Go back and learn pointers well and start using them in your practice challenges if you haven’t already. Write 3–4 small (50 lines or less) programs that use pointers. Repeat until you can walk up to a markerboard and diagram how you use pointers ad nauseam.
- Rewrite strncpy() from scratch. Focus on how strncpy() differs from strcpy(). Don’t just call strncpy() from your program. Literally take an array of characters, check for bounds, move it somewhere else, and validate that it worked. That’s a simple explanation. There’s more that goes into it. Look at the C Cheatsheet for more info.
- Understand OS-level sandboxing
- TPM — trusted platform module
- SMEP — supervisor mode execution prevention
- PAC — pointer authentication codes
- More for RE — Work through Malware Unicorn 101/102
- More for Tool Dev — Learn about dependencies the hard way: Find the source code to Google Chrome and compile it
- Bonus: Pick 2–3 Level 1 crackmes.one challenges or a couple Level 2
— — — — — — — — — — — — — — — — — — — — — — — — — —
Step 4
- Find your router’s firmware and binwalk it. Look for flaws.
- Rewrite memcpy in C from scratch
- For Mac/Android folks: Pick your favorite book by Jonathan Levin.
- Know and understand S-box: https://en.wikipedia.org/wiki/S-box
- Know and understand Rijndael S-Box: https://en.wikipedia.org/wiki/Rijndael_S-box
- Learn about the three types of bugs: Syntactical, Logical, and Runtime
- Learn about the four conditions that must all hold for a deadlock to happen: mutual exclusion, hold and wait, no preemption, and circular wait
— — — — — — — — — — — — — — — — — — — — — — — — — —
Step 5 — now we’re getting crazy. At this point, you’ve probably already gotten a job.
- Write your own (very basic) OS for a Raspberry Pi
If you have an applied understanding of computer science, this won’t take terribly long. If you’re technically-minded but starting from scratch, YOU CAN DO IT! If you have no background whatsoever in tech, YOU CAN DO IT [but you’ll probably want to go back and take some computer science fundamentals first, like data structures and algorithms]. I’ve seen people get through Step 4 in as little as three weeks. Those people usually have a BS in computer science, some formal, applied education through the DoD/Military, or deep math, engineering, or physics backgrounds. For sharp, technical people without that background or experience, doing this on nights and weekends with a couple of kids running around, this can take up to a year. Priorities are a thing! Be gracious and forgiving to yourself as you go along this journey. This isn’t easy. In fact, the journey can really suck. Take breaks, take notes, and please, for the love of all that is good, comment your code!
As always, this is a living document and I take feedback. The goal here is that rising tides lift all ships: help others help themselves. Give back and you get back. #sharingiscaring.