For many system administrators, Awk is used only as a way to print specific
columns of data from programs that generate columnar output, such as netstat
or ps. For example, to get a list of all the IP addresses and ports with open
TCP connections on a machine, one might run the following:
# netstat -ant | awk '{print $5}'
This works pretty well, but along with the data you actually wanted, the output also includes the fifth word of the opening explanatory note and the fifth word of the column heading line:
and
Address
0.0.0.0:*
205.188.17.70:443
172.20.0.236:5222
72.14.203.125:5222
There are several ways to deal with this.
Matching patterns
One common way is to pipe the output further through a call to grep, perhaps
to include only results that contain at least one digit:
# netstat -ant | awk '{print $5}' | grep '[0-9]'
In this case, it’s instructive to use the awk call a bit more intelligently
by setting a regular expression which the applicable line must match in order
for that field to be printed, with the standard / characters as delimiters.
This eliminates the need for the call to grep:
# netstat -ant | awk '/[0-9]/ {print $5}'
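To check that the two forms really are equivalent, here is a minimal sketch that runs both against canned input. The `sample_netstat` function and the addresses in it are hypothetical stand-ins for real `netstat -ant` output, used only so the result is the same on any machine.

```shell
# Hypothetical stand-in for `netstat -ant`; the rows are made-up sample data.
sample_netstat() {
    printf '%s\n' \
        'Active Internet connections (servers and established)' \
        'Proto Recv-Q Send-Q Local Address           Foreign Address         State' \
        'tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN' \
        'tcp        0      0 172.20.0.10:51234       205.188.17.70:443       ESTABLISHED'
}

# Two filters: awk prints field five, then grep drops the header words.
sample_netstat | awk '{print $5}' | grep '[0-9]'

# One filter: the /[0-9]/ pattern decides whether the action runs at all.
sample_netstat | awk '/[0-9]/ {print $5}'
```

Both commands print the same two addresses, because neither header line in this sample contains a digit.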
We can further refine this by ensuring that the regular expression is matched
only against the fifth column of the output, using the ~ operator:
# netstat -ant | awk '$5 ~ /[0-9]/ {print $5}'
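The difference between a whole-line match and a field match shows up as soon as the pattern could appear in more than one column. The sketch below assumes a hypothetical `sample_netstat` helper with made-up rows, and a hypothetical goal of listing only remote endpoints on port 443.

```shell
# Hypothetical stand-in for `netstat -ant` (header lines omitted for brevity).
sample_netstat() {
    printf '%s\n' \
        'tcp        0      0 10.0.0.5:443            198.51.100.7:50312      ESTABLISHED' \
        'tcp        0      0 10.0.0.5:41000          205.188.17.70:443       ESTABLISHED' \
        'tcp        0      0 10.0.0.5:41002          72.14.203.125:5222      ESTABLISHED'
}

# Whole-line match: the first row also matches, because 443 is its LOCAL port.
sample_netstat | awk '/:443/ {print $5}'

# Field match: only rows whose fifth field ends in :443 are printed.
sample_netstat | awk '$5 ~ /:443$/ {print $5}'
```

The first command prints two addresses, including 198.51.100.7:50312, which is not on port 443 at all. The second prints only 205.188.17.70:443.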
Skipping lines
Another approach you could take to strip the headers out might be to use sed
to skip the first two lines of the output:
# netstat -ant | awk '{print $5}' | sed 1,2d
However, this can also be incorporated into the awk call, using the NR
variable (the number of the current input line) in a condition that checks
that the line number is greater than two:
# netstat -ant | awk 'NR>2 {print $5}'
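As a quick sanity check, the sketch below compares the sed and NR versions on canned input. `sample_netstat` is a hypothetical helper with made-up rows. Both versions assume the header is exactly two lines long, as in the output shown earlier.

```shell
# Hypothetical stand-in for `netstat -ant`: two header lines, then data rows.
sample_netstat() {
    printf '%s\n' \
        'Active Internet connections (servers and established)' \
        'Proto Recv-Q Send-Q Local Address           Foreign Address         State' \
        'tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN' \
        'tcp        0      0 172.20.0.10:51234       205.188.17.70:443       ESTABLISHED'
}

# sed deletes the first two lines of awk's output.
sample_netstat | awk '{print $5}' | sed 1,2d

# awk skips the first two lines of its input by itself.
sample_netstat | awk 'NR>2 {print $5}'
```

Because NR is an ordinary number, the same idea extends to other conditions, such as `NR==3` to pick out a single line.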
Combining and excluding patterns
Another common idiom on systems that don’t have the special pgrep command is
to filter ps output for a string, but exclude the grep process itself from
the output with grep -v grep:
# ps -ef | grep apache | grep -v grep | awk '{print $2}'
If you’re using Awk to get columnar data from the output, in this case the
second column containing the process ID, both calls to grep can instead be
incorporated into the awk call:
# ps -ef | awk '/apache/ && !/awk/ {print $2}'
Again, this can be further refined if necessary to ensure you’re only matching the expressions against the command name by specifying the field number for each comparison:
# ps -ef | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
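The two variants can give different answers when the search string appears in a process's arguments rather than its command name. The sketch below uses a hypothetical `sample_ps` helper with made-up rows, and assumes a `ps -ef` layout in which the command name is the eighth field.

```shell
# Hypothetical stand-in for `ps -ef`; PIDs and commands are made-up sample data.
sample_ps() {
    printf '%s\n' \
        'UID        PID  PPID  C STIME TTY          TIME CMD' \
        'root      1201     1  0 09:00 ?        00:00:01 /usr/sbin/apache2 -k start' \
        'www-data  1210  1201  0 09:00 ?        00:00:00 /usr/sbin/apache2 -k start' \
        'tom       4242  4100  0 09:30 pts/0    00:00:00 vim apache.conf' \
        'tom       4250  4100  0 09:31 pts/0    00:00:00 awk /apache/ && !/awk/ {print $2}'
}

# Whole-line match: excludes the awk process, but still catches the editor
# that merely has "apache" in its arguments.
sample_ps | awk '/apache/ && !/awk/ {print $2}'

# Field match: only processes whose command name (field 8) contains "apache".
sample_ps | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
```

On this sample the first command prints 1201, 1210 and 4242, while the second prints only 1201 and 1210.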
If you’re used to using Awk purely as a column filter, the above might help to increase its utility for you and allow you to write shorter and more efficient command lines. The Awk Primer on Wikibooks is a really good reference for using Awk to its fullest for the sorts of tasks for which it’s especially well-suited.