Flexible static analysis for C++ code bases
The importance of static code analysis is already a well known thing in the domain of software development. There are plenty of useful and less useful tools for the purpose, especially in the case of C++. However, even if in general the quality of these softwares is adequate they usually suffer from the inability for extending or customizing behavior. Also, a usual problem arises from the fact that the C++ language syntax is overwhelmingly complex and it makes the code parser of any static analysis tool a nightmare. In this article I would like to present a tool called CppDepend that gracefully solves the aforementioned problems primarily focusing on providing an interface that enables 100% adaptability and extensibility for creating customized metrics that are relevant or applicable in a particular domain.
Why static code analysis?
Analysis of computer software, in particular verification and validation, is a very important factor in professional software development. The process behind itself can come in different forms. Generally all kind of verification and validation techniques can be categorized in two major groups: static analysis and dynamic analysis. The key difference between the two is while dynamic analysis verifies the execution of the code, static analysis strictly works on the code base itself.
Well, there are thousands of reasons why using a static code analysis tool makes any benefits to a particular software development process. If you ask various people they will all have their own reasons and rationale behind that. Just to mention my favorites here is a brief excerpt from the long list:
- Find coding errors before executing a single line of code. This is important as it does not require the project to be built or executed as in many cases these two additional phases can be quite expensive from both time and budget point of view.
- Identifies parts of the code that seem to be difficult to maintain or do not conform to various policies of a particular company or organization. This provides us the benefit to move towards a sustainable development by heavily reducing maintenance costs.
- Provides us miscellaneous metrics about our code that can have key importance in measuring the quality of the code base.
If you can’t measure it, you can’t improve it – Lord Kelvin
Many people still think that code metrics are overrated. Even if at first sight it seems to be true for micro-projects its importance becomes very obvious when one mets a large code bases specifically talking about situations when legacy code is inherited from earlier software developer generations. When the magnitude of the software goes out of the limits a programmer is capable to keep in mind (this means the 99% of software products) code metrics provide great value to identify “hot spots” in the code base, no matter what actual situation we are talking about.
Also, making decision about whether the evolution of the software goes in the right direction is very difficult if not impossible without ways of measuring the quality of the code. The most naive solution for this problem is to measure the amount of bug reports reported over time, however, code metrics provide a much more sophisticated way of measuring the quality by different aspects and on different levels.
During my career, as a software developer, I also faced many situations when the inspection of the legacy code was necessary in order to introduce new functionalities. Unfortunately, in most of the cases, due to the lack of an adequate static code analyst, this required developers to read and manually inspect the code in order to solve the particular problem. I can tell you that it’s not a joyful duty. Just to mention some of the most critical situations that current developers meet regarding to the topic:
Removing dependencies on deprecated features. This is a thing that each software development faces from time to time. This time interval is usually relatively low, as we talk about few years which can be called quite often compared to other industries. Just think about situations when one migrates to a new version of a third party library that the whole software depends on. As a recent event, we can talk about the release of version 3 of the OpenGL specification. CAD software developer companies faced a huge challenge by being forced to adopt the new features as the old ones became deprecated and obsolete. Actually they were quite lucky that vendors denied to drop features from their implementations. Using a code analyst one can easily identify the modules that needs to be modified in order to adopt to the latest changes.
Introducing multiprocessing. This is also a very imminent problem that every software development company will face sooner or later. Code bases inherited from the previous decades were not prepared to handle concurrent execution of the code thus making big headaches to software architects to redesign the code in order to be SMP compliant, especially when dealing with multi-core processors. I’ve also faced this situation during my career and it was a painful lesson that code analyzing possibilities have a great importance. Before inspecting carefully the whole code base it is very difficult to identify the possible problems that may arise by the introduction of multiprocessing. Automatic inspection of the code can be a very handy tool for minimizing the required efforts.
What makes up a good static code analysis tool?
There are many different aspects that affect how good a particular static code analysis tool is. In many situations having competing alternatives for this purpose is at a premium. Fortunately, this is not the case regarding to C++ as being a well supported programming language from the community. However, in order to choose a suitable alternative we have to collect our requirements:
- Correctness – It must correctly analyze the code. This is a very basic requirement against any software development tool. While this seems to be a completely obvious requirement and one expects that tools behave as expected from this point of view, most of such tools for C++ do not conform to this principle. Those who know the C++ language standard know well that writing a good parser for it is almost impossible.
- Usefulness – There is no sense in using a static code analyst if we don’t get any benefits from it. The reports generated by the analyst should provide useful information that are directly applicable in a particular use case. One typical example that I also faced quite often is that when one analyses legacy code and gets a report about thousands of problematic code parts. These reports are almost impossible to be handled and it makes headaches to the developers even to answer the very simple question: where to start?
- Customizability – This requirement directly relates to the previous one. By examining the previous example if there would be some customization possibility to get reports only about the 10 most problematic module it would be much easier to handle it. However, this requirement goes far beyond this. As an example, beside the build-in metrics of the analysis tool, it should provide means to add or modify metrics in order to have more relevant measures about the code fitting a particular domain or use case.
We’ve just mentioned three requirements explicitly and we already heavily reduced the number of alternatives…
CppDepend as a flawless alternative
Recently I’ve got a request to review a C++ static code analyst tool called CppDepend. After having a brief eye shot on the product I realized that it deserves a thorough inspection as it features a revolutionary technology called CQL that I will talk about a bit later in the article.
CppDepend was developed in partnership with NDepend, it was released six months ago having a two years development history by a very small team of experts. Actually it is accompanied with it’s brothers NDepend and XDepend that accomplish the same job for .NET and Java projects respectively.
We are talking about a Windows application that has tight integration with Visual Studio projects but also provides ways to be applicable in case of projects built with other development tool-set. Beside it is a command-line static code analysis tool for the C++ language, it provides a powerful GUI tool for visual inspection of different aspects of the code base thus enabling increased productivity and ease of use.
Lets have our first sight on the tool by using the visual interface to analyse a sample code base that will be in our case the source code of SFML.
Setting up the basic configuration for an analysis project is very straightforward. Beside that, the code analysis itself is surprisingly fast. While testing, the longest time it took was in case when I parsed the code of the Bullet Physics Library but even that didn’t required a minute on my system.
The visual controls themselves sometimes lack of good responsiveness due to the complex structures and relationships presented by them but we soon forgive CppDepend this minor issue when we take a closer look at the navigation possibilities offered by the tool.
At first sight, the user interface seems to be a bit overcomplicated but we soon realize that each and every element of it is made by purpose in order to provide as much freedom in navigation as possible. Just to mention the most interesting ones here’s the explanation of the purpose of the graphical figures at the top right part of the GUI:
- At top left we see a graphical representation of the currently selected code metric. It shows the magnitude of the result of the metric according to the selected level of granularity. We can easily visualize here as an example how the size of different classes of our project compare to each other.
- At middle left is the dependency matrix of our solution. We can easily find “hot spots” in our code regarding to coupling, by default, on project level. The granularity of the table can be easily changed in a non-proportional way from project level down to method level. I used the word “non-proportional” by intension as we can examine dependency even between a method and a foreign project thus providing additional flexibility over how fine grained we would like to have our numbers.
- My favorite is in the middle, called dependency graph. It can present the dependencies between different software elements from project level down to method level, as usual, by means of a graph that is very convenient for human inspection.
The whole user interface is designed in a way that each time we point on a particular element it shows convenient information about that particular element and its environment, no matter if we talk about the metrics view, the dependency graph or matrix.
Beside the tools for navigation and easy visualization, the GUI provides a collection of built-in reports about different aspects of the code. One of the first thing everybody would try out from these is the query called “Quick summary of methods to refactor”. This is exactly the answer what the developer would like to have for the question “where to start?” that I mentioned earlier.
To emphasize even more the fact that how convenient is the user interface, when one selects a particular query it will immediately show the results by means of a list of classes, methods or whatever, but beside this, the code elements in question are immediately highlighted in the relevant graphical views as well.
Maybe I already convinced most of you that CppDepend is a tool that deserves attention as being a valuable tool in good hands but I haven’t even talked about the most interesting feature that really makes it a uniquely powerful software.
The power of extensibility
I have often brought to relief the importance of extensibility and customizability of a static code analyst. This, in fact, is not just my craze but it is an important factor in the decision of most software developers out there. Being able to get some common metrics about the code is one thing, having the possibility to define own metrics and analysis criterias is another…
The power of CppDepend is behind a revolutionary technology that provides us an interface to retrieve information about the code that is relevant for us as easy as querying a relational database. The apparatus in our hand to achieve this is the Code Query Language (CQL). CppDepend actually builds some internal database structure from the source code and provides us an SQL-like language to make queries that fetches reports from this internal database. Those who are already familiar with SQL will adore this feature. Just to illustrate how easy it is to use CQL in order to build custom queries, let’s query the classes that have more than 20 methods is as simple as the following line of CQL code:
SELECT TYPES WHERE NbMethods > 20
Simple, isn’t it? For further details, please refer to the specification of the Code Query Language: http://www.cppdepend.com/CQL.htm
This means that the software developers have complete freedom over how they define the metrics that indicate whether the code quality reaches the levels required by company policies or individual needs. It is also useful to solve the problems arising from the sample situations I’ve mentioned earlier, namely the problem with dependency on deprecated features and the introduction of multiprocessing, by easily and clearly identifying the modules that need to be changed even in situations when the code base is extremely huge and traditional ways for identifying affected modules are not applicable or simply not feasible.
Well, I’ve already talked enough about the abilities of CppDepend regarding to usefulness and customizability, however, I’ve barely touched the topic of correctness. As I’ve already mentioned, parsing C++ code correctly is not as easy as it may look like. For this purpose I’ve prepared a bunch of template heavy libraries like GLM and GoogleMock to check how well CppDepend handles code bases when it comes to awkward features of the C++ language.
Even though generally static analyst tools does not provide too much useful information about such project, due to their special nature, it still looked convenient to try to make parsed these libraries by CppDepend in order to have a picture about how it would handle huge projects that also take advantage of the templating mechanisms of C++. I have to say that the results are very promising as it had problems only with GoogleMock but the developers were already informed about the problem I’ve encountered.
The dark side of the story
While CppDepend is an excellent tool for software developers working under Windows, especially if they use Visual Studio, I would like to see a cross-platform version of CppDepend in the future, at least for Linux and MacOSX.
Also, CppDepend does not come for free but at a reasonable price. Even though most probably individuals and hobbyists would not consider buying it, for enterprises, even for small ones, the price of the tool will most probably pay back soon by heavily decreasing short- and long-run maintenance costs of the development.
A clever static code analyst tool is nowadays a must for every software development company that deals with code whose size have already ran over a certain threshold but it is also good to use one from the very beginning of a new project. Selecting a particular tool for this purpose is the choice of the enterprise, still, the requirements against such a software are usually the same.
CppDepend proved to me of being a valuable software in the tool-chain of every C++ programmer using Windows as primary development platform. If you are still not convinced then check out the full feature list on the official site.
Even if you are not interested in using CppDepend or in static analysis tools at all, you should still take a look at CQL and the great idea behind it as it is a perfect example how a solution for a well discussed problem can ascend to new levels by adopting good practices from other domains, in this case from relational databases and related technologies.
|Print article||This entry was posted by Daniel Rákos on March 2, 2010 at 5:12 pm, and is filed under General, Programming. Follow any responses to this post through RSS 2.0. You can leave a response or trackback from your own site.|
No comments yet.
No trackbacks yet.
about 2 years ago - 16 comments
In this article, I would like to present you an edge detection algorithm that shares similar performance characteristics like the well-known Sobel operator but provides slightly better edge detection and can be seamlessly extended with little to no performance overhead to also detect corners alongside with edges. The algorithm works on a 3×3 texel footprint…
about 3 years ago - 6 comments
Dynamic geometry level-of-detail (LOD) algorithms are very popular and powerful algorithms that provide a great level of rendering performance optimization while preserving detail by using less detailed geometry for objects that are far away, too small or otherwise less significant in the quality of the final rendering. Many of these are used since the very…
about 3 years ago - 29 comments
Hierarchical-Z is a well known and standard feature of modern GPUs that allows them to speed up depth testing by rejecting large group of incoming fragments using a reduced and compressed version of the depth buffer that resides in on-chip memory. The technique presented in this article uses the same basic idea to allow batched…
about 3 years ago - 18 comments
OpenGL 3.0 capable GPUs introduced a level of processing power and programming flexibility that isn’t comparable with any earlier generations. After that, OpenGL 4.0 and the hardware supporting it even further pushed the limits of what previously seemed to be impossible. Thanks to these features nowadays more and more possibilities are available for the graphics…
about 3 years ago - 55 comments
Gaussian blur is an image space effect that is used to create a softly blurred version of the original image. This image then can be used by more sophisticated algorithms to produce effects like bloom, depth-of-field, heat haze or fuzzy glass. In this article I will present how to take advantage of the various properties…
about 3 years ago - 16 comments
A few months ago I’ve presented an object culling mechanism that I’ve named Instance Cloud Reduction (ICR) in the article Instance culling using geometry shaders. The technique targets the first generation of OpenGL 3 capable cards and takes advantage of geometry shaders’ capability to reduce the emitted geometry amount in order to get to a…
about 3 years ago - 23 comments
Nowadays comprehensive testing is a must for any software product. However, it isn’t such a general rule when it comes to graphics applications. Many developers face difficulties when they have to test their rendering codes. Manual tests and visual feedback is sometimes satisfactory but if one would like to have automated regression tests usual approaches…
about 3 years ago - No comments
Those who worked enough with C or other procedure oriented languages know how much flexibility callbacks provide. The simplest example is the qsort function of the C standard library. It is also not unintentional that many libraries, windowing system APIs and operating system APIs also highly rely on callbacks to pass a particular task over…
about 3 years ago - 50 comments
Since the appearance of Shader Model 4.0 people wonder how to take advantage of the newly introduced programmable pipeline stage. The most important feature enabled by geometry shaders is that one can change the amount of emitted primitives inside the pipeline. The first thing that a naive developer would try to do with it is…
about 3 years ago - 6 comments
Previously I talked about how one can easily take advantage of multiprocessing using OpenMP. Even if the C pragmas introduced by the parallel programming API standard is very straightforward for simple programs, it simply doesn’t fit nicely in a complex C++ application that is built from the ground with the OOP in mind. To smoothly…