I've just released lscp, a lightweight source code preprocessor. Check it out on GitHub.
This is one of the many tools I've written for my research on applying IR models to software repositories. So many people have asked me for a copy of the tool that I decided to clean it up a bit and make it accessible to the world.
Check out the GitHub page for a detailed description and usage instructions. Feel free to fork it and extend it, or to report any bugs or feature requests in the issue tracker.
Saturday, 8 September 2012
Wednesday, 5 September 2012
A Linux one-liner to find all the acronyms in your LaTeX files
At the beginning of my PhD thesis, I include a List of Acronyms. Of course, I would like to be sure that my list is comprehensive: I don't want any strange acronyms to appear in the text of my thesis without first appearing in that list. But how can I easily identify all of the acronyms in the LaTeX source without having to read all 244 pages manually?
grep to the rescue, again
As in most other areas of my life, this problem can be easily solved with a Linux one-liner centered around grep:
cat *.tex | grep -wo "[A-Z]\{2,10\}" | sort | uniq -c | sort -gr
Let's take a look at the pipeline:
- The cat *.tex writes all of my LaTeX source to standard output.
- The grep -wo "[A-Z]\{2,10\}" matches whole words (the -w flag) made up of between 2 and 10 upper case letters. The -o flag prints only the match, not the entire line.
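- The first sort groups identical acronyms together, which the next step needs because uniq only collapses adjacent duplicate lines.
- The uniq collapses the duplicates, and the -c flag prefixes each remaining line with its count.
- Finally, the second sort orders the entries numerically by count (-g) and reverses the result (-r), so the most frequent acronyms come first.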
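To see the stages working together, here is a small sketch that runs the pipeline on a throwaway file. The file name sample.tex and its three sentences are made up for illustration, and the pattern is written as an interval applied directly to the character class.

```shell
# Hypothetical sample input: sample.tex and its contents are invented for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'We compare LDA and LSI against the VSM baseline.' \
  'LDA outperforms LSI on most queries.' \
  'LDA is also slower to train.' > "$tmpdir/sample.tex"

# Extract whole-word runs of 2 to 10 capitals, group them, count them,
# and rank them from most to least frequent.
result=$(cat "$tmpdir"/*.tex | grep -wo "[A-Z]\{2,10\}" | sort | uniq -c | sort -gr)
echo "$result"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

On this input the ranking should come out as LDA (3), LSI (2), then VSM (1). The leading "We" is ignored because only a single capital letter is followed by lower case, so it never forms a whole word of two or more capitals.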
Here's the output on my thesis:
292 IR
241 LDA
166 LSI
125 VSM
87 TCP
80 EM
35 APFD
34 SUT
29 HSD
22 TOPIC
22 II
18 MALLET
16 LOC
14 PS
14 OR
14 IDE
12 CALLG
11 ICA
10 RNDM
10 MAP
10 KL
10 CS
...
Note that this command works with any text file; it is not unique to LaTeX. Just change the files passed to cat.
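Since the goal is to catch acronyms that are missing from the List of Acronyms, one possible follow-up is to compare the extracted acronyms against that list. The sketch below assumes the list is also available as a plain text file with one acronym per line; the names thesis.tex, acronyms.txt, found.txt, and known.txt, as well as the sample contents, are hypothetical.

```shell
# Hypothetical inputs: a tiny "thesis" and a list of already-defined acronyms.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'We compare LDA and LSI against the VSM baseline.' \
  'The SUT runs inside an IDE.' > "$tmpdir/thesis.tex"
printf '%s\n' 'LDA' 'LSI' 'VSM' > "$tmpdir/acronyms.txt"

# Acronyms used in the text, deduplicated and sorted (sort -u does both).
cat "$tmpdir"/*.tex | grep -wo "[A-Z]\{2,10\}" | sort -u > "$tmpdir/found.txt"

# comm expects both inputs to be sorted the same way.
sort -u "$tmpdir/acronyms.txt" > "$tmpdir/known.txt"

# -23 hides lines unique to the second file and lines common to both,
# leaving only acronyms that appear in the text but not in the list.
missing=$(comm -23 "$tmpdir/found.txt" "$tmpdir/known.txt")
echo "$missing"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

With these sample files, the undefined acronyms reported should be IDE and SUT. On a real thesis the result will still need a manual pass, since entries such as Roman numerals or capitalized words also match the pattern.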