I've just released lscp, a lightweight source code preprocessor. Check it out on GitHub.
This is one of the many tools I've written for my research on applying information retrieval (IR) models to software repositories. So many people have asked me for a copy of the tool that I decided to clean it up a bit and make it available to the world.
See the GitHub page for a detailed description and usage instructions. Feel free to fork it and extend it, or to report any bugs you find and any feature requests on the issue tracker.