Current Entries | Archives   RSS

 


How Visual Lint parses projects and makefiles

Thursday 6th November, 2014


Code analysis tools can require a lot of configuration to be useful. Whilst some (e.g. Vera++ or cpplint) need very little configuration to make use of effectively, others such as PC-lint (and even, to a lesser extent, CppCheck) may need to be fed pretty much the same configuration as the compiler itself to be useful. As a result the command lines you need to use with some analysis tools are pretty complex in practice (which is of course a disincentive to using them, and so where Visual Lint comes in. But I digress...).

In fact, more capable code analysis tools may need to be given even more information than the compiler itself to generate meaningful results - for example they may need to be configured with the built-in preprocessor settings of the compiler itself. In a PC-lint analysis configuration, this is part of the job compiler indirect files such as co-msc110.lnt do.

As a result of the above, a product such as Visual Lint which effectively acts as a "front end" to complex (and almost infinitely configurable) code analysis tools like PC-lint needs to care about the details of how your compiler actually sees your code every bit as much as the analysis tool itself does.

What this means in practice is that Visual Lint needs be able to determine the properties of each project it sees and understand how they are affected by the properties of project platforms, configurations and whatever compiler(s) a project uses. When it generates an analysis command line, it may need to reflect those properties so that the analysis tool sees the details of the preprocessor symbols, include folders etc. each file in the project would normally be compiled with - including not only the properties exposed in the corresponding project or makefile, but also built-in symbols etc. normally provided by the corresponding compiler.

That's a lot of data to get right, and inevitably sometimes there will be edge cases where don't quite get it right the first time. It goes without saying that if you find one of those edge cases - please tell us!

So, background waffle over - in this post I'm going to talk about one of the things Visual Lint does - parsing project files to identify (among other things) the preprocessor symbols and include folders for each file in order to be able to reflect this information in the analysis tool configuration.

When analysing with CppCheck, preprocessor and include folder data read in this way can be included on the generated command line as -D and -I directives. Although we could do the same with PC-lint, it is generally better to write the preprocessor and include folder configuration to an indirect ("project.lnt") file which also includes details of which implementation (.c, .cpp, .cxx etc.) files are included in the project -as well as any other project specific options. For example:

// Generated by Visual Lint 4.5.6.233 from file: SourceVersioner_vs90.vcproj
// -dConfiguration=Release|Win32

//
-si4 -sp4                     // Platform = "Win32"
                              //
+ffb                          // ForceConformanceInForLoopScope = "TRUE"
-D_UNICODE;UNICODE            // CharacterSet = "1"
-DWIN32;NDEBUG;_CONSOLE       // PreprocessorDefinitions = "WIN32;NDEBUG;_CONSOLE"
-D_CPPRTTI                    // RuntimeTypeInfo = "TRUE"
-D_MT                         // RuntimeLibrary = "0"
                              //
                              // AdditionalIncludeDirectories = ..\Include"
-save -e686                   //
  -i"..\Include"              //
-restore                      //
                              //
                              // SystemIncludeDirectories = "
                              //   F:\Projects\Libraries\boost\boost_1_55_0;
                              //   C:\Program Files (x86)\
                              //       Microsoft Visual Studio 9.0\VC\include;

                              //   C:\Program Files (x86)\
                              //       Microsoft Visual Studio 9.0\VC\atlmfc\include;

                              //   C:\Program Files\
                              //       Microsoft SDKs\Windows\v6.0A\include;

                              //   F:\Projects\Libraries\Wtl\8.1\Include"
                              //
-save -e686                   //
 +libdir(F:\Projects\Libraries\boost\boost_1_55_0)
 +libdir("C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include")
 +libdir("C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\atlmfc\include")
 +libdir("C:\Program Files\Microsoft SDKs\Windows\v6.0A\include")
 +libdir("C:\Program Files\Microsoft SDKs\Windows\v6.0A\include")
 +libdir(F:\Projects\Libraries\Wtl\8.1\Include)
 -iF:\Projects\Libraries\boost\boost_1_55_0
 -i"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include"
 -i"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\atlmfc\include"
 -i"C:\Program Files\Microsoft SDKs\Windows\v6.0A\include"
 -iF:\Projects\Libraries\Wtl\8.1\Include
-restore                      //
                              //
SourceVersioner.cpp           // RelativePath = "SourceVersioner.cpp"
SourceVersionerImpl.cpp       // RelativePath = "SourceVersionerImpl.cpp"
stdafx.cpp                    // RelativePath = "stdafx.cpp"
Shared\FileUtils.cpp          // RelativePath = "Shared\FileUtils.cpp"
Shared\FileVersioner.cpp      // RelativePath = "Shared\FileVersioner.cpp"
Shared\PathUtils.cpp          // RelativePath = "Shared\PathUtils.cpp"
Shared\ProjectConfiguration.cpp
                              // RelativePath = "Shared\ProjectConfiguration.cpp"
Shared\ProjectFileReader.cpp  // RelativePath = "Shared\ProjectFileReader.cpp"
Shared\SolutionFileReader.cpp // RelativePath = "Shared\SolutionFileReader.cpp"
Shared\SplitPath.cpp          // RelativePath = "Shared\SplitPath.cpp"
Shared\StringUtils.cpp        // RelativePath = "Shared\StringUtils.cpp"
Shared\XmlUtils.cpp           // RelativePath = "Shared\XmlUtils.cpp"

 

A file like this is written for every project configuration we analyse, but the mechanism used to actually read the configuration data from projects varies slightly depending on the project type and structure.

In the case of the Visual Studio 2002, 2003, 2005 and 2008 .vcproj files Visual Lint was originally designed to work with, this is straightforward as the project file contains virtually all of the information needed in a simple to read, declarative form. Hence to parse a .vcproj file we simply load its contents into an XML DOM object and query the properties in a straightforward manner. Built-in preprocessor symbols such as _CPPUNWIND are defined by reading the corresponding properties (EnableExceptions in the case above) or inferred from the active platform etc.

Similarly, for Visual C++ 6.0 and eMbedded Visual C++ 4.0 project files we just parse the compiler command lines in the .dsp or .vcp file directly. This is pretty straightforward as well as although .dsp and .vcp files are really makefiles they have a very predictable structure. Some other development environments (e.g. Green Hills MULTI or CodeVisionAVR) have bespoke project file formats which are generally easy to parse using conventional techniques.

Visual Studio 2010, 2012 and 2013 .vcxproj project files are far more tricky, as the MSBuild XML format they use is effectively a scripting language rather than a declarative format. To load them, we effectively have to execute them (but obviously without running the commands they contain).

As you can imagine this is rather tricky! We basically had to write a complete MSBuild parsing engine* for this task, which effectively executes the MSBuild script in memory with defined parameters to identifiy its detailed properties.

* Although there are MSBuild APIs which could conceivably help, there are implemented in managed code - which we can't use in a plug-in environment due to .NET framework versioning restrictions. Instead, our MSBuild parsing engine is written natively in C++.

To work out the parameters to apply to the MSBuild script, we prescan the contents of the .vcxproj file to identify the configurations and platforms available. Once we have those, we can run the script in memory and inspect the properties it generates for the required build configuration. This process is also used with CodeGear C+ and Atmel Studio projects, both of which are MSBuild based.

Eclipse projects (.project and .cproject files) are also XML based and are not too difficult to parse, but whereas it is straightforward to work out how things are defined in a .vcproj file, Eclipse project files are much less clearly defined and more variable (not surprising as they effectively represent serialised Java object data).

To make matters worse, each compiler toolchain can have its own sub-format, so things can change in very subtle ways between based projects for different Eclipse based environments such as Eclipse/CDT, QNX Momentics and CodeWarrior. In addition, Eclipse C/C++ projects come in two flavours - managed builder and makefile.

Of the two, managed builder projects are easy to deal with - for example to determine the built-in settings of the compiler in a managed builder we read the scanner (*.sc) file produced by the IDE when it builds the project, and add that data to the configuration data we have been able to read from the project file.

The scanner (.sc) file is a simple XML file located within the Eclipse workspace .metadata folder. Here's an extract to give you an idea what it looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?scdStore version="2"?>
<scannerInfo id="org.eclipse.cdt.make.core.discoveredScannerInfo">
<instance id="cdt.managedbuild.config.gnu.mingw.exe.debug.116390618;
  cdt.managedbuild.config.gnu.mingw.exe.debug.116390618.;
  cdt.managedbuild.tool.gnu.cpp.compiler.mingw.exe.debug.1888704458;
  cdt.managedbuild.tool.gnu.cpp.compiler.input.391761176">
<collector id="org.eclipse.cdt.make.core.PerProjectSICollector">
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++/mingw32"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++/backward"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/../../../../include"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include-fixed"/>
<definedSymbol symbol="__STDC__=1"/>
<definedSymbol symbol="__cplusplus=1"/>
<definedSymbol symbol="__STDC_HOSTED__=1"/>
<definedSymbol symbol="__GNUC__=4"/>
<definedSymbol symbol="__GNUC_MINOR__=5"/>
<definedSymbol symbol="__GNUC_PATCHLEVEL__=2"/>
<definedSymbol symbol="__GNUG__=4"/>
<definedSymbol symbol="__SIZE_TYPE__=unsigned int"/>
    .
    .
    .
</collector>
</instance>
</scannerInfo>

Unfortunately scanner files are not generated for makefile projects, and the enclosing project files do not give details of the preprocessor symbols and include folders needed to analyse them. So how do we do it?

The answer is similar to the way we load MSBuild projects - by running the makefile without executing the commands within. In this case however the make tools themselves can (fortunately!) lend a hand, as both NMake and GNU Make have an option which runs the makefile and echoes the commands which would be executed without actually running them. For NMake this is /n and GNU Make -n or --just-print.

The /B (NMake) -B (GNU Make) switch can also be used to ensure that all targets are executed regardless of the build status of the project (otherwise we wouldn't be able to read anything if the project build was up to date).

If we run a makefile with these switches and capture the output we can then parse the compiler command lines themselves and extract preprocessor symbols etc. from it. As so many compilers have similar command line switches for things like preprocessor symbols and include folders this is generally a pretty straightforward matter. Many variants of GCC also have command line options which can report details of built-in preprocessor symbols and system include folders, which we can obviously make use of - so the fact that many embedded C/C++ compilers today are based on GCC is very useful indeed.

There's a lot of detail I've not covered here, but I hope you get the general idea. It follows from the above that adding support for new project file types to Visual Lint is generally not difficult - provided we have access to representative project files and (preferably) a test installation of the IDE in question. If there is a project file type you would like us to look at, please tell us.

Posted by Anna at 12:32pm | Get Permalink