Break in to Tool Dev / RE

Stephen C. Semmelroth
Sep 9, 2020
So you want to be a tool developer or start reverse engineering? You can. Here’s how. Photo by Kelly Sikkema.

This article is designed for people who want to break into reverse engineering and tool development. The breakdown below is effectively the pipeline that has gotten a large number of candidates jobs in these two disciplines.

Tool development: the art and science of systematically building software that can break other software in order to find and test its vulnerabilities, which leads to protecting against those same vulnerabilities. Figure out how and why something breaks so that it can be fixed before the bad guys break it.

Reverse engineering (RE): The art and science of looking deeply at a tool or piece of malware to figure out how it works and use that information to fix the software and better identify and defend against attacks.

Yes, tool development and reverse engineering are two different disciplines. However, they overlap substantially, and the fundamentals of both generally center on the C language. So learn it. Example: You write and compile a program and it doesn’t work. You troubleshoot it and it still doesn’t work. So you pull it into a debugger and step through the program instruction by instruction, or open it in a decompiler, to verify that the compiled program is what you expect to see. Boom: you just reverse engineered your own program. On the flip side, you are reversing something and you don’t understand what you’re looking at, so you pull up a developer book to look up what some esoteric command does. Boom: you just did some developer work.

Each STEP is broken into two sections: macro and micro. The macro section covers the concepts and background information that frame WHAT you do in the micro portion and WHY you do it. The micro portion is more hands-on. Throughout each step, consider posting your code and results to GitHub, as some employers want to see a lot of green (commits).

If you have a basic background or understanding in computer science, this won’t take terribly long. If you’re starting from scratch, YOU CAN DO IT, but it can take up to a year or more depending on your situation, grit, and ability to teach yourself new content. If you’re new to this sort of thing but are relatively technically-minded and comfortable with asking WHY 30 times a day and only figuring out an answer four times a day, you got this.

— — — — — — — — — — — — — — — — — — — — — — — — — —

STEP 1: The goal of this exercise is to understand the C language fundamentals to a 30% solution. Understanding will increase in additional steps. You will also use some tools that are fundamental to progress such as gcc, Ghidra, and IDA Free.

Step 1 Macro: This orients you to the C language.

  • Start by downloading the C Cheatsheet from the Rainier Cyber GitHub: https://github.com/Rainier-Cyber/C-cheatsheet. Print a copy and keep it on you at all times so you can reference it and make notes. This cheatsheet is designed to take a modern approach to the C language, with definitions and language that make sense to current audiences. There are also multiple code examples that reinforce the concepts. Huge shout out to Sean Eyre, who built most of the cheatsheet!
C Cheatsheet designed and compiled by Sean Eyre and Stephen C. Semmelroth

Step 1 Knowledge:

  • Disassembler: A disassembler is a software tool which transforms machine code into a human readable mnemonic representation called assembly language.
  • Debugger: Debuggers allow the user to view and change the running state of a program.
  • Decompiler: Software used to revert the process of compilation. Decompilers take a binary program file as input and output the same program expressed in a structured higher-level language.
  • Learn the following sections: Code, Comments, Data Types, Casting, Structs/Arrays/Pointers, Functions, Operations, Statements, Key Words, IO, Memory, Strings, and Compiling
  • THERE ARE NO STRINGS IN C! There are arrays of characters. End the array with a null terminator ('\0') so that functions know where the text stops. A newline ('\n') is just another character and does not end the string.
  • What is the difference between strcpy and strncpy?
  • How does malloc() work? Eventually you will write your own version of strncpy and malloc().

Step 1 Macro Option A: If you don’t have a background in programming or the C Cheatsheet scares you, take a C Programming course if you need to. Here’s one that some friends have enjoyed on Udemy (not an affiliate): https://www.udemy.com/course/c-programming-for-beginners-/. Print out the cheatsheet and keep it open and take notes on it throughout the course.

Step 1 Macro Option B: You have a background in programming and you once invited malloc to your birthday party and it showed up. Start by again reviewing the C Cheatsheet above.

Step 1 Micro — You really should be doing this in a virtual machine (VM). It’s good practice. If you don’t know how, go to YouTube. There are plenty of videos that will show you how to spin up a Linux VM. Go find them. This intro is about learning RE and tool dev, not VM fundamentals.

At the end of STEP 1, you now a) know the basics of the C language, b) know how to write and compile a basic program, and c) are familiar with gcc, Ghidra, and IDA Free.

— — — — — — — — — — — — — — — — — — — — — — — — — —

STEP 2: Dig into what is going on lower in the stack, and start a GitHub account if you don’t have one yet. Now that you have at least a cursory understanding of C, it’s time to get the age-old beauty, AoE: Hacking: The Art of Exploitation by Jon Erickson. I really need to make these affiliate links. For now, just buy me a beer if this is useful.

S2 Macro: Know buffer overflows at a macro level. Focus on different types of buffer overflows and their general implementations

S2 Micro: Dive deep into the following to understand and explain ways to defeat buffer overflows. You should be able to walk up to a whiteboard and explain these concepts on a whim.

  • DEP (Data Execution Prevention) and, more broadly, executable space protection (ESP). DEP is the Windows-specific name, but the community often says DEP when it means ESP in general.
  • Compiler learning: Write a simple program on https://godbolt.org/ and look at how different compilers change your code when they compile. THIS IS A HUGE DEAL and knowing this concept will save you tons and tons of time and frustration later!
  • Make a GitHub account, learn how to branch.
  • Start looking at strcpy versus strncpy a bit deeper and start thinking about how you would write strncpy from scratch
  • Look into IDEs or just use vim/nano. Up to you. Benefits to both. Oh, and the Linux-based editors now work on Windows via WSL. If you didn’t understand that, don’t worry and just use your favorite notepad tool and read the C cheatsheet.
  • Pick up a copy of John Mongan’s Programming Interviews Exposed (not an affiliate). Learn about Big-O notation and some of the classic fizz-buzz type examples. Yes, they’re still used all the time.
  • Start practicing interviewing.

— — — — — — — — — — — — — — — — — — — — — — — — — —

STEP 3 — This looks like a short section, but it’s a bit longer than you might realize. This is often where both disciplines start to split their focus between embedded, OS, and mobile. At this point, you should have proven that you can teach yourself things. So go out and find the information you need to understand and apply the following concepts:

  • Go back and learn pointers well and start using them in your practice challenges if you haven’t already. Write 3–4 small (50 lines or fewer) programs that use pointers. Repeat until you can walk up to a markerboard and diagram how you use pointers ad nauseam.
  • Rewrite strncpy from scratch. Focus on how strncpy differs from strcpy. Don’t just call strncpy from your program. Literally take an array of characters, check for bounds, move it somewhere else, and validate that it worked. That’s a simple explanation. There’s more that goes into it. Look at the C Cheatsheet for more info.
  • Understand OS-level sandboxing
  • TPM — trusted platform module
  • SMEP — supervisor mode execution protection
  • PAC — pointer authentication codes
  • More for RE — Work through Malware Unicorn 101/102
  • More for Tool Dev — Learn about dependencies the hard way: Find the source code to Google Chrome and compile it
  • Bonus: Pick 2–3 Level 1 crackmes.one challenges or a couple of Level 2

— — — — — — — — — — — — — — — — — — — — — — — — — —

Step 4

— — — — — — — — — — — — — — — — — — — — — — — — — —

Step 5 — now we’re getting crazy. At this point, you’ve probably already gotten a job.

  • Write your own (very basic) OS for a Raspberry Pi
Because a) Pis are cool and b) OS fundamentals. Photo by Harrison Broadbent

If you have an applied understanding of computer science, this won’t take terribly long. If you’re technically minded but starting from scratch, YOU CAN DO IT! If you have no background whatsoever in tech, YOU CAN DO IT [but you’ll probably want to go back and take some computer science fundamentals first, like data structures and algorithms]. I’ve seen people get through Step 4 in as little as three weeks. Those people usually have a BS in computer science, some formal, applied education through the DoD/Military, or deep math, engineering, or physics backgrounds. For sharp, technical people without that background or experience, doing this on nights and weekends with a couple of kids running around, this can take up to a year. Priorities are a thing! Be gracious and forgiving to yourself as you go along this journey. This isn’t easy. In fact, the journey can really suck. Take breaks, take notes, and please, for the love of all that is good, comment your code!

As always, this is a living document and I take feedback. The goal here is that rising tides lift all ships: help others help themselves. Give back and you get back. #sharingiscaring.

Stephen C. Semmelroth

VP Cyber at StrataCore. I talk to the bits so the customers don’t have to.