One prominent feature of an IDE is a built-in system for managing files, both the elementary functions like moving, renaming, and deleting, and ones more specific to development, like compiling and checking syntax. It may also be useful to have operations on sets of files, such as finding files of a certain extension or size, or searching files for specific patterns. In this first article, I’ll explore some useful ways to use tools that will be familiar to most Linux users for the purposes of working with sets of files in a project.
Listing files
ls is probably one of the first commands an administrator will learn
for getting a simple list of the contents of the directory. Most administrators
will also know about the -a and -l switches, to show all files including
dot files and to show more detailed data about files in columns, respectively.
There are a few other switches to ls which are a bit less frequently used,
and turn out to be very useful for programming:
-t: List files in order of last modification date, newest first. This is useful for very large directories when you want to get a quick list of the most recent files changed, maybe piped through head or sed 10q. Probably most useful combined with -l. If you want the oldest files, you can add -r to reverse the list.
-X: Group files by extension; handy for polyglot code, to group header files and source files separately, or to separate source files from directories or build files.
-v: Naturally sort version numbers in filenames.
-S: Sort by file size.
-R: List files recursively. This one is good combined with -l and piped through a pager like less.
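As a small illustration of the switches above, here is a sketch that builds a throwaway directory and compares a few sort orders. The file names and timestamps are made up for the example, and it assumes GNU ls, since -v is not available everywhere.

```shell
# Scratch directory with hypothetical file names; assumes GNU ls.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Give each file an explicit modification time (CCYYMMDDhhmm).
touch -t 202001010000 release-1.2.tar
touch -t 202002010000 release-1.9.tar
touch -t 202003010000 release-1.10.tar

# Plain ls compares character by character, so 1.10 lands before 1.2.
plain=$(LC_ALL=C ls)

# -v compares runs of digits as numbers: 1.2, then 1.9, then 1.10.
natural=$(LC_ALL=C ls -v)

# -t puts the most recently modified file first; -r reverses that.
newest=$(ls -t | head -n 1)
oldest=$(ls -tr | head -n 1)

printf 'plain:\n%s\n\nnatural:\n%s\n\n' "$plain" "$natural"
printf 'newest: %s\noldest: %s\n' "$newest" "$oldest"
```

The version numbers only need to appear somewhere in the filename; -v does not look at file contents.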
Since the listing is text like anything else, you could, for example, pipe the
output of this command into a vim process, so you could add explanations of
what each file is for and save it as an inventory file or add it to a README:
$ ls -XR | vim -
This kind of stuff can even be automated by make with a little work, which
I’ll cover in another article later in the series.
Finding files
Funnily enough, you can get a complete list of files including relative paths
by simply typing find with no arguments, though it’s usually a good idea to
pipe it through sort:
$ find | sort
.
./Makefile
./README
./build
./client.c
./client.h
./common.h
./project.c
./server.c
./server.h
./tests
./tests/suite1.pl
./tests/suite2.pl
./tests/suite3.pl
./tests/suite4.pl
If you want an ls -l style listing, you can add -ls as the action to find
results:
$ find -ls | sort -k 11
1155096 4 drwxr-xr-x 4 tom tom 4096 Feb 10 09:37 .
1155152 4 drwxr-xr-x 2 tom tom 4096 Feb 10 09:17 ./build
1155155 4 -rw-r--r-- 1 tom tom 2290 Jan 11 07:21 ./client.c
1155157 4 -rw-r--r-- 1 tom tom 1871 Jan 11 16:41 ./client.h
1155159 32 -rw-r--r-- 1 tom tom 30390 Jan 10 15:29 ./common.h
1155153 24 -rw-r--r-- 1 tom tom 21170 Jan 11 05:43 ./Makefile
1155154 16 -rw-r--r-- 1 tom tom 13966 Jan 14 07:39 ./project.c
1155080 28 -rw-r--r-- 1 tom tom 25840 Jan 15 22:28 ./README
1155156 32 -rw-r--r-- 1 tom tom 31124 Jan 11 02:34 ./server.c
1155158 4 -rw-r--r-- 1 tom tom 3599 Jan 16 05:27 ./server.h
1155160 4 drwxr-xr-x 2 tom tom 4096 Feb 10 09:29 ./tests
1155161 4 -rw-r--r-- 1 tom tom 288 Jan 13 03:04 ./tests/suite1.pl
1155162 4 -rw-r--r-- 1 tom tom 1792 Jan 13 10:06 ./tests/suite2.pl
1155163 4 -rw-r--r-- 1 tom tom 112 Jan 9 23:42 ./tests/suite3.pl
1155164 4 -rw-r--r-- 1 tom tom 144 Jan 15 02:10 ./tests/suite4.pl
Note that in this case I have to specify to sort that it should sort by the
11th column of output, the filenames; this is done with the -k option.
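The same -k option works for any other column of the listing. As a rough sketch, assuming a find that supports -ls and user and group names without spaces, this sorts a few hypothetical files by the size in column 7:

```shell
# Scratch directory with hypothetical files of known sizes.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%100s' '' > large.txt    # 100 bytes of padding
printf '%10s'  '' > medium.txt   # 10 bytes of padding
printf 'x'        > small.txt    # 1 byte

# Column 7 of the -ls listing is the size in bytes. "7,7n" restricts the
# sort key to that single column and compares it numerically.
# awk then keeps only the last column, the filename.
by_size=$(find . -type f -ls | sort -k 7,7n | awk '{ print $NF }')
printf '%s\n' "$by_size"
```

Without the n modifier the sizes would be compared as text, which puts 100 before 10 only by luck and gets larger listings wrong.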
find has a complex filtering syntax all of its own; the following examples
show some of the most useful filters you can apply to retrieve lists of certain
files:
find -name '*.c': Find files with names matching a shell-style pattern. Use -iname for a case-insensitive search.
find -path '*test*': Find files with paths matching a shell-style pattern. Use -ipath for a case-insensitive search.
find -mtime -5: Find files edited within the last five days. You can use +5 instead to find files last edited more than five days ago.
find -newer server.c: Find files more recently modified than server.c.
find -type d: Find directories. For files, use -type f; for symbolic links, use -type l.
Note, in particular, that all of these can be combined, for example to find C source files edited in the last two days:
$ find -name '*.c' -mtime -2
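Filters written side by side are ANDed together, but they can also be ORed and negated. The following is a minimal sketch in a scratch directory; the file layout is invented for the example, loosely modelled on the listing earlier in this article.

```shell
# Scratch project with hypothetical files.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir build tests
touch client.c client.h server.c build/client.o build/generated.c tests/suite1.pl

# Adjacent tests are ANDed. -o means OR and binds more loosely than AND,
# so the escaped parentheses are needed to group the two -name tests.
# "! -path" then drops anything that lives under ./build.
sources=$(find . -type f \( -name '*.c' -o -name '*.h' \) ! -path './build/*' | LC_ALL=C sort)
printf '%s\n' "$sources"
```

The parentheses have to be escaped or quoted because the shell would otherwise treat them as its own syntax.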
By default, the action find takes for search results is simply to list them
on standard output, but there are several other useful actions:
-ls: Provide an ls -l style listing, as above
-delete: Delete matching files
-exec: Run an arbitrary command line on each file, replacing {} with the appropriate filename, and terminated by \;; for example:
$ find -name '*.pl' -exec perl -c {} \;
It might be a bit more straightforward to use xargs in most cases, though, to turn the printed results into arguments for a command:
$ find -name '*.pl' | xargs perl -c
-print0: If you’re dealing with filenames with spaces and intend to pipe results to xargs as above, use this to separate the results with a null character rather than a newline, along with the -0 option for xargs, which then splits its input on null characters only rather than on whitespace:
$ find -name '*.jpg' -print0 | xargs -0 jpegoptim
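To see why the null separator matters, here is a small sketch that counts how many arguments a command actually receives. The filenames are hypothetical, and the `sh -c 'echo "$#"' sh` helper is only a stand-in for a real command such as jpegoptim.

```shell
# Scratch directory with one awkward filename and one ordinary one.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'holiday photo.jpg' plain.jpg

# Unsafe: xargs splits its input on blanks and newlines, so the first
# name arrives as two arguments, "./holiday" and "photo.jpg".
unsafe=$(find . -name '*.jpg' | xargs sh -c 'echo "$#"' sh)

# Safe: null-separated names survive intact.
safe=$(find . -name '*.jpg' -print0 | xargs -0 sh -c 'echo "$#"' sh)

# Also safe, with no pipe at all: "{} +" hands the results to a single
# command invocation, much as xargs would.
direct=$(find . -name '*.jpg' -exec sh -c 'echo "$#"' sh {} +)

printf 'unsafe=%s safe=%s direct=%s\n' "$unsafe" "$safe" "$direct"
# prints: unsafe=3 safe=2 direct=2
```

The `-exec ... {} +` form is worth knowing as an alternative when the command accepts several filenames at once, since it avoids the quoting question entirely.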
One trick I find myself using often is using find to generate lists of files
that I then edit in vertically split Vim windows:
$ vim -O $(find . -name '*.c')
Searching files
More often than attributes of a set of files, however, you want to find files
based on their contents, and it’s no surprise that grep, in particular
grep -R, is useful here. This searches the current directory tree recursively
for anything matching ‘someVar’:
$ grep -FR 'someVar' .
Don’t forget the case-insensitivity flag, -i, either, since by default grep
is case-sensitive:
$ grep -iR 'somevar' .
Also, you can print a list of files that match without printing the matches
themselves with grep -l, which again is very useful for building a list of
files to edit in your chosen text editor:
$ vim -O $(grep -lR 'somevar' .)
If you’re using version control for your project, the results often include
matches from metadata in the .svn, .git, or .hg directories. This is dealt
with easily enough by excluding (grep -v) anything matching an appropriate
fixed (grep -F) string:
$ grep -R 'someVar' . | grep -vF '.svn'
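One caveat with the pipeline above is that it also discards genuine matches whose text happens to contain ‘.svn’. If your grep supports it, an alternative is to skip the directory during the search itself. This sketch assumes GNU grep's --exclude-dir option and uses a made-up project layout:

```shell
# Scratch project with a fake metadata directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir .svn src
echo 'int someVar = 1;' > src/main.c
echo 'someVar cached'   > .svn/entries

# --exclude-dir (assumed: GNU grep) skips the metadata directory during
# the walk, so its files are never read. -l lists matching files only,
# and -F treats the pattern as a fixed string.
matches=$(grep -R -l -F --exclude-dir=.svn 'someVar' .)
printf '%s\n' "$matches"
```

The option can be repeated to skip .git and .hg in the same command.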
With all this said, there’s a very popular alternative to grep called
ack, which excludes this sort of stuff for you by default. It also allows you
to use Perl-compatible regular expressions (PCRE), which are a favourite for
many hackers. It has a lot of utilities that are generally useful for working
with source code, so while there’s nothing wrong with good old grep since you
know it will always be there, if you can install ack I highly recommend it.
There’s a Debian package called ack-grep, and being a Perl script it’s
otherwise very simple to install.
Unix purists might be displeased with my even mentioning a relatively new Perl
script alternative to classic grep, but I don’t believe that the Unix
philosophy or using Unix as an IDE is dependent on sticking to the same classic
tools when alternatives with the same spirit that solve new problems are
available.
File metadata
The file tool gives you a one-line summary of what kind of file you’re
looking at, based on its contents, such as headers and magic numbers, rather
than its extension. This is very handy used with find and xargs when
examining a set of unfamiliar files:
$ find | xargs file
.: directory
./hanoi: Perl script, ASCII text executable
./.hanoi.swp: Vim swap file, version 7.3
./factorial: Perl script, ASCII text executable
./bits.c: C source, ASCII text
./bits: ELF 32-bit LSB executable, Intel 80386, version ...
Matching files
As a final tip for this section, I’d suggest learning a bit about pattern matching and brace expansion in Bash, which you can do in my earlier post entitled Bash shell expansion.
All of the above make the classic UNIX shell into a pretty powerful means of managing files in programming projects.
The default separator for xargs is the newline, not spaces. ‘find … | xargs …’ is fine unless you have filenames that consist of more than one line. :) (Yes it’s possible.)
Just be careful when putting filenames in shell variables. You have to double-quote variable expansion or the shell will do extra processing like word-splitting after the variable has been substituted for its value. (E.g. if I have the filename “foo bar” in $myfile, I have to use e.g. ‘rm "$myfile"’ for it to work. ‘rm $myfile’ will result in ‘rm’ receiving the two words in “foo bar” separately, and thus trying to remove two files named “foo” and “bar”.)
‘find | xargs file’ can actually fail even with single-line filenames if they contain spaces. To handle this case and the case of newlines do ‘find -print0 | xargs -0 file’ (thanks to the writers of the man pages for find and xargs for this).
Sorry but this isn’t quite right, at least on my system; xargs definitely uses whitespace to tokenise:
However, your advice about double-quoting variable expansion is good and a worthy pitfall to watch out for, thanks for your contribution.
$ grep -R 'someVar' . | grep -vF '.svn'
There is a tool called ‘ack’ designed for this purpose:
“ack is a tool like grep, designed for programmers with large trees of heterogeneous source code.” – http://betterthangrep.com/
Yes, you might note I do mention Ack a bit later in the article. I’m a convert myself!
I’m surprised that you didn’t mention cscope for doing searches. If you have a large code base, doing a Grep or ack will take a long time each time you execute it because it recursively hits the directory tree every time. Cscope indexes files and makes it easy to do search and replaces in c-style, perl, and python.
I used to use Grep until I switched to cscope. Haven’t gone back, except on windows development.
I didn’t actually know about cscope. It looks great. I’ll look into it and maybe add some references to the article. Thanks for the tip!
Sometimes it is easier to browse files; in Vim you can use :40Sex!
Suppose there is a large Java project and a user wants to rename some class and move it to another package. In order to perform this correctly, a number of changes are required: rename the file containing the class, rename the class, rename all the references to this class (in Java and XML files), move the file to another directory that corresponds to the new package, update all the imports of this class inside Java files, and finally record the move in the version control system. Can you please describe how you would do this using Unix as an IDE? How can all this command line stuff help to perform such a high-level operation?
The argument in this series is not that you should use UNIX as an IDE for every kind of programming. The case you’ve outlined there is certainly better suited to a Java IDE. The point of the series is discussing the features that make UNIX a capable development environment, not the be-all and end-all.
So with all these UNIX command line tools and text editors it is not possible to automate such high-level operations as moving/renaming classes in a project? Is UNIX as an IDE not capable of such things? I just want to understand the capabilities of UNIX as an IDE.
There’s no reason you couldn’t automate it yourself to some extent using tools like Cscope, but it doesn’t tend to be a built-in feature on things like text editors. It might be better to think of UNIX less in terms of a blow-by-blow feature comparison and more in terms of being a programmable development environment. Much less of the behaviour is prescribed. It’s not necessarily a better way to do things, just different and preferable to a lot of people.
Choice of language helps a lot. If you develop in Java or C# on Linux you won’t find it as easy as you would developing in, say, C or Python. It’s not really an ideological war; it really does come down to choice. If I had to develop in C# for my day job (I don’t), I would almost certainly use Visual Studio, though probably with something like viemu to make the text editor behave like vi.
Great series! :-) I generally substitute du(1) and cut(1) for find(1), usually using du -a path | cut -f 2. I don’t have to remember the many flags for find that can usually be done simply using other basic shell commands.
Could you show an example use of ‘ls -v’? The man pages don’t have one and googling didn’t turn up anything, so this seems like a hidden gem. In particular, does it expect version numbers in the filename or contents? Do the versions have to have a specific format?
Thanks!
The ‘X’ and ‘v’ options to ls are GNU extensions, as is the option of leaving out the path in find. ack actually uses Perl regular expressions, not the no longer accurately named Perl-compatible regular expressions.