Elegant Awk usage | Arabesque

For many system administrators, Awk is used only as a way to print specific columns of data from programs that generate columnar output, such as netstat or ps. For example, to get a list of all the IP addresses and ports with open TCP connections on a machine, one might run the following:

# netstat -ant | awk '{print $5}'

This works pretty well, but among the data you actually wanted it also includes the fifth word of the opening explanatory note, and the heading of the fifth column:

and
Address
0.0.0.0:*
205.188.17.70:443
172.20.0.236:5222
72.14.203.125:5222

There are varying ways to deal with this.

Matching patterns

One common way is to pipe the output further through a call to grep, perhaps to only include results with at least one number:

# netstat -ant | awk '{print $5}' | grep '[0-9]'

In this case, it’s instructive to use the awk call a bit more intelligently by setting a regular expression which the applicable line must match in order for that field to be printed, with the standard / characters as delimiters. This eliminates the need for the call to grep:

# netstat -ant | awk '/[0-9]/ {print $5}'

We can further refine this by ensuring that the regular expression should only match data in the fifth column of the output, using the ~ operator:

# netstat -ant | awk '$5 ~ /[0-9]/ {print $5}'

Skipping lines

Another approach you could take to strip the headers out might be to use sed to skip the first two lines of the output:

# netstat -ant | awk '{print $5}' | sed 1,2d

However, this can also be incorporated into the awk call, using the NR variable and making it part of a conditional checking the line number is greater than two:

# netstat -ant | awk 'NR>2 {print $5}'

Combining and excluding patterns

Another common idiom on systems that don’t have the special pgrep command is to filter ps output for a string, but exclude the grep process itself from the output with grep -v grep:

# ps -ef | grep apache | grep -v grep | awk '{print $2}'

If you’re using Awk to get columnar data from the output, in this case the second column containing the process ID, both calls to grep can instead be incorporated into the awk call:

# ps -ef | awk '/apache/ && !/awk/ {print $2}'

Again, this can be further refined if necessary to ensure you’re only matching the expressions against the command name by specifying the field number for each comparison:

# ps -ef | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'

If you’re used to using Awk purely as a column filter, the above might help to increase its utility for you and allow you to write shorter and more efficient command lines. The Awk Primer on Wikibooks is a really good reference for using Awk to its fullest for the sorts of tasks for which it’s especially well-suited.