Debugging/Reversing NT System Binaries

Source: http://www.openrce.org/blog/view/292
Author: Alex Ionescu

The author is a ReactOS kernel developer. Alex Ionescu also has his own site at http://www.alex-ionescu.com, which has plenty of good material. More about him:
http://advdbg.org/blogs/advdbg_system/articles/268.aspx
http://advdbg.org/blogs/advdbg_system/articles/369.aspx
http://advdbg.org/ is a good site too; several lectures in the course series 深入研究Windows内部原理 ("In-Depth Study of Windows Internals") were given by the site's maintainer.

Translator's notes on the original text below:
* On running a checked build live on your system (section 1): the translator reads this as replacing only the files you need to analyze and debug with their checked-build versions.
* The pdbPlus plugin for IDA mentioned in section 2 can be found at http://www.datarescue.com/idabase/freefiles/PDBPlus.zip
* On "volatile" (section 5): volatile qualifies a variable whose value may be changed from outside, so accesses to it must not be cached in a register and the value has to be re-read on every use. The keyword is common in multithreaded code, where the same variable may be modified by several threads and the program uses that variable to synchronize them.
Here are some tips I thought I'd share in a blog entry. Some of these may seem fairly obvious, but I've come across many reverse engineers who are not aware of the wealth of resources available for easier NT reversing and debugging. Feel free to message me any additional resources so that I may add them.

1) Checked builds.

This is your first priority. If you are reversing a retail binary, STOP NOW. You are missing out on a wealth of debugging messages, assertions and easier-to-read code. Here are some of the advantages of checked builds:

* Mostly FREE for ANY NT OS. That's right, if you want to compare code across NT versions, you don't need to carry your 15 CDs of every version released (or worse, beg around the Internet). NT Service Packs contain all the core system files you're likely to reverse, and their checked builds are free to download. Granted, you will be missing out on the retail versions, but now you don't need to buy Windows 2003 to reverse a Windows 2003 binary.

* Much, much, much easier code to read. Checked builds are not built with OMAP, the compiler technology which splits functions up into chunks and reorganizes them for better CPU caching. That means that functions are linear and a breeze to reverse.

* Debug prints. These are just awesome. Microsoft developers are telling you what's going on in their code, so you don't have to guess. Sometimes you can even find warnings (e.g. "This will crash if the user sends RTL_FOO!!"), unfixed bugs, etc. Some binaries have entire built-in dumping functions, such as Dbg_DumpSomeStructure, which will graphically print out some huge structure that you don't need to reverse anymore. Debug prints can also give you valuable flag names, constants and so on.

* Assertions. Good Microsoft code (especially core/system-level) is filled with assertions. These assertions are actually C code, and more often than not will give you the name of a flag, structure member, or other symbolic names which are not public. While reversing a file once, I was able to find the name (and thus function) of about 18 fields out of a 25-field structure, merely by reading the assertions.

* Run-time profiling, debugging, or other helpful functions. If you are feeling curious, you can actually try using a checked build live on your system (I recommend only the specific binary/set, however). Coupled with WinDBG, this could give you new ways to analyze the binary, create complex debug logs, and even use built-in profiling/timing code if your reversing project is performance related, or if you're just curious. Again, only in a checked build.

* Tracing and protection. This applies more to testing your own code, but checked builds also enable many tracing options in the kernel, which can be useful for reversing. For example, you can track a heap block, or any kernel object, and see a list of all acquires/releases, creators and users, which can sometimes be more useful than putting a memory breakpoint on a structure.

OK, so where to get them? A good place for up-to-date links is OSR's site: http://www.osronline.com/article.cfm?id=259

2) PDBs (Symbols).

Perhaps I should've put this first, because it really is even more basic, but I'm going at this in logical order. PDBs. Symbols. Debug Databases. Whatever you want to call them, you should not be reversing without them. In their most basic form, they will give you the internal name of every function in your binary (except statics), as well as global variables. With an OMAP binary, they also contain special information to link chunked functions together. This means that your call 080854 just became call AdvapipGenerateHash, making your job a lot easier. With something like the HAL or the kernel, PDBs also contain a wealth of structures not publicly documented in the WDK/PSDK. Unfortunately, IDA doesn't parse them, but if you use the pdbPlus plugin (available on the site), IDA will automatically add them to its structure database.

3) WinDBG.

The Debugging Tools for Windows (Windows Debugger/WinDBG) is an extremely valuable tool, not for its disassembler, but for the myriad of extensions that it provides, which also have built-in code to dump structures which are unavailable anywhere else. For example, two of its extensions are able to dump CSR_PROCESS and CSR_THREAD, which are the structures used by CSRSS and are not documented anywhere. Again, having access to structures and symbolic/flag/constant names can go a long way toward understanding what a function does.

4) Information.

Now that you're all set up with the tools and binary, there is one more thing you should do: learn, read, and get acquainted with what you're going to debug/reverse. Read all the documentation available, browse internet sites, see what others have discovered. But please don't post excitedly that "omfg, if you set fs:18h+874h & 0x5 >> 3 you get a bugcheck" when you haven't taken the time to understand what fs:18h is in the first place, what member of the TEB 874h is, and what the 0x5 flag's symbolic name/meaning is. Because you might as well have discovered that setting NtCurrentTeb()->CrashMode & PS_CRASH_IMMEDIATELY crashes, which isn't really interesting to know, or at the very least makes a lot more sense if presented that way.

Bonus:

5) Code quality.

Apart from not producing un-symbolized crap like the example above, it's a good idea to consider the following:

* Comment your code. Properly. Extensively. If I see some discovery or exploit that is poorly commented, I'm not going to assume you were lazy, since you spent all this time reversing it. I'm going to assume you don't really know why your code is doing what it's doing.

* Portability. More often than not, low-level NT code is completely unportable. While this is by design in many cases, it wouldn't hurt to try remaining as compatible as possible. Don't do system calls by using sysenter directly. Yes, yes, you look cool and it's 2 cycles faster, but it also won't run on my service pack, since system call numbers are not stable across builds. And again, don't hard-code offsets. Use actual structures/headers, which may be versioned, so that if I want a compatible version on 2003, I can just build it by using NTDDI_VERSION.

* Thread safety, multi-processor. Again, much low-level NT code seems to be saying "oh well, I'm hacking sh*t anyways, who cares if I do it badly and I don't respect actual coding methodology". Before you publish your code, try to test it on a variety of systems. Try to profile it and stress it. Identify potential race conditions and fix them. Make sure your code is thread-safe and multiprocessor compatible. Just because you're running on a uniprocessor machine doesn't mean everyone else is. There are a great number of things you need to start worrying about in NT kernel mode when you're on a multiprocessor machine. Use "volatile" in C when needed. Use Interlocked operations when required. Don't change a pointer that could be read by another thread at the same time. Don't do CPU-level modifications without synchronizing them across all CPUs (learn about IPIs). Each CPU has its own IDT, GDT, etc. Remember that before you only hook one.

* 64-bit. Again, just because you only have a 32-bit machine doesn't mean you don't have to make your code as compatible with 64-bit as possible. Sure, that's sometimes impossible, but at least use /Wp64 so you get warned about obvious 64-bit incompatibilities and broken code. Minimize your use of assembly if possible. Version 14 of MSC (in the WDK or MSVC 2005) has many intrinsics that are portable when recompiled, including stuff like getting the return address, reading eflags, setting/reading/writing fs/gs/dr*/cr*, etc.

That's all I can think of for the moment, and I hope nobody takes offense. All the examples I've given came off the top of my head and I'm not targeting anyone in particular; these are just some considerations.
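The advice above about symbolic names and hard-coded offsets can be illustrated with a minimal C sketch. Everything named DEMO_* here is hypothetical and invented for illustration; it does not describe any real Windows structure. DEMO_TARGET_VERSION only stands in for the role NTDDI_VERSION plays in the real headers, and the 0x5 flag value is borrowed from the made-up example in section 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the role NTDDI_VERSION plays in real headers. */
#ifndef DEMO_TARGET_VERSION
#define DEMO_TARGET_VERSION 2
#endif

/* Hypothetical flag; 0x5 is borrowed from the made-up example in the text. */
#define DEMO_CRASH_IMMEDIATELY 0x5u

/* Hypothetical versioned structure, NOT a real Windows layout. */
typedef struct DEMO_THREAD_BLOCK {
    uint32_t Self;
    uint32_t Reserved[3];
#if DEMO_TARGET_VERSION >= 2
    uint32_t AddedInVersion2;   /* new field shifts everything after it */
#endif
    uint32_t CrashMode;
} DEMO_THREAD_BLOCK;

/* Brittle: 0x10 is where CrashMode lives only in the version 1 layout. */
static uint32_t crash_mode_by_offset(const void *block)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)block + 0x10, sizeof value);
    return value;
}

/* Robust: the compiler derives the offset from the versioned definition,
   and the flag test reads as a name instead of a magic number. */
static int crashes_immediately(const DEMO_THREAD_BLOCK *block)
{
    assert(block != NULL);
    return (block->CrashMode & DEMO_CRASH_IMMEDIATELY) != 0;
}
```

Built with DEMO_TARGET_VERSION set to 2, crash_mode_by_offset silently reads AddedInVersion2 instead of CrashMode, while crashes_immediately keeps working after a plain recompile against the newer definition.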
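    mov ecx, DWORD PTR _size$[esp-4]   ; ecx = size (byte count)
    push esi
    mov esi, DWORD PTR _s$[esp]        ; esi = source pointer
    mov eax, ecx                       ; keep a copy of the byte count
    push edi
    mov edi, DWORD PTR _d$[esp+4]      ; edi = destination pointer
    shr ecx, 2                         ; ecx = number of whole dwords
    rep movsd                          ; copy 4 bytes at a time
    mov ecx, eax
    and ecx, 3                         ; ecx = leftover bytes (0..3)
    rep movsb                          ; copy the tail byte by byte
    pop edi
    pop esi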
Because we do not know whether size is a multiple of 4, the tail is handled with and ecx, 3 / rep movsb.
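As a rough C equivalent of the listing above, the sketch below copies whole dwords first and then the leftover bytes. The function name copy_dwords_then_bytes is a hypothetical one chosen for illustration; it mirrors the strategy of the listing and is not the compiler's or the CRT's actual memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies size bytes from s to d (regions must not
   overlap) using the same two-phase strategy as the assembly listing. */
static void *copy_dwords_then_bytes(void *d, const void *s, size_t size)
{
    unsigned char *dst = d;
    const unsigned char *src = s;
    size_t dwords = size >> 2;   /* shr ecx, 2 : number of whole dwords */
    size_t tail = size & 3;      /* and ecx, 3 : leftover bytes, 0..3   */

    assert(d != NULL && s != NULL);

    /* rep movsd: move 4 bytes per step. A fixed-size memcpy through a
       temporary avoids alignment and aliasing problems in portable C. */
    while (dwords--) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        src += sizeof word;
        dst += sizeof word;
    }

    /* rep movsb: move the remaining bytes one at a time. */
    while (tail--) {
        *dst++ = *src++;
    }
    return d;
}
```

For a 7-byte copy, for instance, the first loop runs once (4 bytes) and the second loop moves the remaining 3 bytes, which is exactly the split the shr/and pair computes.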
Reverse engineering is the process of extracting the knowledge or design blueprints from anything man-made. The concept has been around since long before computers or modern technology, and probably dates back to the days of the industrial revolution. It is very similar to scientific research, in which a researcher attempts to work out the "blueprint" of the atom or the human mind. The difference is that in reverse engineering the artifact being investigated is man-made, whereas in conventional scientific research it is a natural phenomenon.