This week I’ve been researching static analysis tools for the C language that suit the needs of the project I’m currently working on at LSD-FI-UPM.
Here is a summary of what I’ve found out.
CIL: Intermediate Language and Tools for Analysis and Transformations of C Programs
It’s suitable, but the transformations have to be written in OCaml.
From the CIL Documentation:
The most common way to use CIL is to write an Ocaml module containing your analysis and transformation, which you then link into our boilerplate driver application called cilly.
It’s also possible to use it as a library, but the API is likewise meant for OCaml projects.
CIL is able to handle big projects through a module called the whole-program merger. The documentation doesn’t explicitly say whether it can handle shared libraries, but it seems so, since it has been used on some very big projects like the Linux kernel, the GCC compiler and the Apache web server.
It has a Control Flow Graph module and a Data-flow Analysis module, but the paper says that they have not been exercised as much as the other parts of CIL.
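To make it concrete what a CFG plus a data-flow analysis gives you, here is a minimal sketch in Python. It does not use CIL or its API; the four-block CFG, the block names and the `liveness` function are all made up for illustration. It computes which variables are live at the entry and exit of each basic block by iterating to a fixed point, which is the general shape of most data-flow analyses.

```python
# Toy CFG (hypothetical, hand-written) for the C fragment:
#   B0: x = 1; y = 2;
#   B1: if (x < 10) goto B2; else goto B3;
#   B2: x = x + y; goto B1;
#   B3: return x;
cfg = {
    "B0": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B1"],
    "B3": [],
}

# Per block: variables read before being written (use) and variables written (defs).
use = {"B0": set(), "B1": {"x"}, "B2": {"x", "y"}, "B3": {"x"}}
defs = {"B0": {"x", "y"}, "B1": set(), "B2": {"x"}, "B3": set()}


def liveness(cfg, use, defs):
    """Backward data-flow analysis: iterate until nothing changes."""
    live_in = {b: set() for b in cfg}
    live_out = {b: set() for b in cfg}
    changed = True
    while changed:
        changed = False
        for block in cfg:
            # A variable is live at the exit if any successor needs it.
            out = set()
            for succ in cfg[block]:
                out |= live_in[succ]
            # It is live at the entry if the block reads it, or if it is
            # live at the exit and the block does not overwrite it.
            inn = use[block] | (out - defs[block])
            if out != live_out[block] or inn != live_in[block]:
                live_out[block], live_in[block] = out, inn
                changed = True
    return live_in, live_out


live_in, live_out = liveness(cfg, use, defs)
for block in cfg:
    print(block, "in:", sorted(live_in[block]), "out:", sorted(live_out[block]))
```

A real tool would build the `cfg`, `use` and `defs` tables from the parsed C code; here they are written by hand so that the analysis itself stays visible.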
This tool is written using CIL, so it has the same drawback: extensions have to be written in OCaml.
Clang is the C/C++ front-end of the LLVM compiler. It’s under heavy development, and since its intended use is real-world compiling, we can expect to have a very good and complete tool. Its development is funded by Apple Inc. Clang is built with a library-based architecture that makes it relatively easy to adapt it and build new tools with it. These are some of its layers:
- libast – Provides classes to represent the C AST, the C type system, builtin functions, and various helpers for analyzing and manipulating the AST (visitors, pretty printers, etc).
- libsema – Semantic Analysis. This provides a set of parser actions to build a standardized AST for programs.
- librewrite – Editing of text buffers (important for code rewriting transformation, like refactoring).
- libcodegen – Lower the AST to LLVM IR for optimization & code generation.
It failed to compile on my machine at the lab, so I compiled it on my laptop and was able to generate CFGs for parts of a program, but not for the whole program. Also, I wasn’t able to control exactly which parts of the program the CFGs were generated for.
C-Breeze is a tool intended for compiler construction. It seems to suit the project’s needs, but it’s not available for download on the project page; one needs to contact the authors in order to get the tool. In the CIL paper, the author says that C-Breeze doesn’t support analyzing programs that span multiple files, but I haven’t checked this.
The documentation of this tool is a bit confusing. It seems to do a lot of stuff; the documentation mentions ASTs, but it doesn’t say anything about CFGs or call graphs, so I’m not sure whether it’s suitable for the project or not. In the CIL paper, the author says it can’t handle many of the GCC extensions, so it can’t analyze real-world programs.
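For reference, this is roughly what a call graph is and why a whole-program one is useful. The sketch below is plain Python and is not the output of any of the tools above; the function names in `call_graph` and the helper `reachable_from` are invented for the example. Each key is a function and its list holds the functions it calls; walking the graph from `main` tells us which functions can ever run.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["log"],
    "run": ["step", "log"],
    "step": ["step"],            # direct recursion
    "log": [],
    "unused_helper": ["log"],    # never called from main
}


def reachable_from(graph, root):
    """Breadth-first walk over the call graph starting at root."""
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


reachable = reachable_from(call_graph, "main")
dead = set(call_graph) - reachable
print("reachable:", sorted(reachable))
print("never called:", sorted(dead))
```

If a tool only sees one source file at a time, edges to functions defined in other files are missing, which is why multi-file support matters so much in the comparison above.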
Right now, it seems to me that LLVM/Clang is the best choice. I still have to figure out how to generate call graphs using it, but I think it’s possible. Another choice would be CIL, but since I’ve never used OCaml and don’t have a strong background in functional languages, I prefer to avoid it. If someone has any suggestions for other tools, or pointers to more detailed info regarding these tools (especially LLVM/Clang), please leave a comment.