For many system administrators, Awk is used only as a way to print specific
columns of data from programs that generate columnar output, such as netstat
or ps. For example, to get a list of all the IP addresses and ports with open
TCP connections on a machine, one might run the following:
# netstat -ant | awk '{print $5}'
This works pretty well, but along with the data you actually wanted, the output also includes the fifth word of the opening explanatory note and the fifth word of the column heading line (the "Address" in "Local Address"):
and
Address
0.0.0.0:*
205.188.17.70:443
172.20.0.236:5222
72.14.203.125:5222
There are various ways to deal with this.
Matching patterns
One common way is to pipe the output further through a call to grep, perhaps
to only include results with at least one number:
# netstat -ant | awk '{print $5}' | grep '[0-9]'
In this case, it’s instructive to make the awk call do more of the work by
giving it a regular expression, delimited by the standard / characters, that
a line must match before its field is printed. This eliminates the need for
the call to grep:
# netstat -ant | awk '/[0-9]/ {print $5}'
We can further refine this by ensuring that the regular expression is only
matched against the fifth column of the output, using the ~ operator:
# netstat -ant | awk '$5 ~ /[0-9]/ {print $5}'
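To see the difference between the unfiltered and the field-matched versions without depending on a live machine, here is a minimal sketch that feeds awk a canned sample. The sample_netstat function and the addresses in it are hypothetical stand-ins for real netstat -ant output, not something netstat provides.

```shell
# Hypothetical stand-in for `netstat -ant` output, so the example
# behaves the same on any machine (addresses are illustrative only).
sample_netstat() {
    cat <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 172.20.0.10:51234       205.188.17.70:443       ESTABLISHED
EOF
}

# Unfiltered: the two header lines contribute "and" and "Address".
sample_netstat | awk '{print $5}'

# Filtered: only lines whose fifth field contains a digit are printed.
sample_netstat | awk '$5 ~ /[0-9]/ {print $5}'
# prints:
#   0.0.0.0:*
#   205.188.17.70:443
```

Matching against $5 rather than the whole line means a digit elsewhere on a header line could not let that line slip through.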
Skipping lines
Another approach you could take to strip the headers out might be to use sed
to skip the first two lines of the output:
# netstat -ant | awk '{print $5}' | sed 1,2d
However, this can also be incorporated into the awk call, using the NR
variable, which holds the number of the current input line, in a condition
that checks that the line number is greater than two:
# netstat -ant | awk 'NR>2 {print $5}'
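As a rough illustration of how NR behaves, the sketch below prints the line number next to each field that survives the condition. The sample_netstat function is a hypothetical stand-in for real netstat -ant output, and the approach assumes there are exactly two header lines.

```shell
# Hypothetical stand-in for `netstat -ant` output (illustrative data).
sample_netstat() {
    cat <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 172.20.0.10:51234       205.188.17.70:443       ESTABLISHED
EOF
}

# NR counts input lines starting from 1, so NR>2 skips the first two
# lines regardless of what they contain.
sample_netstat | awk 'NR>2 {print NR ": " $5}'
# prints:
#   3: 0.0.0.0:*
#   4: 205.188.17.70:443
```

Because the condition is purely positional, it would drop real data or keep header text if the number of header lines were different; the pattern-based version does not have that dependency.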
Combining and excluding patterns
Another common idiom on systems that don’t have the special pgrep command is
to filter ps output for a string, but exclude the grep process itself from
the output with grep -v grep:
# ps -ef | grep apache | grep -v grep | awk '{print $2}'
If you’re using Awk to get columnar data from the output, in this case the
second column containing the process ID, both calls to grep can instead be
incorporated into the awk call:
# ps -ef | awk '/apache/ && !/awk/ {print $2}'
Again, this can be further refined if necessary to ensure you’re only matching the expressions against the command name by specifying the field number for each comparison:
# ps -ef | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
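The difference between the whole-line and the per-field versions can be sketched against canned input. The sample_ps function and its process list are hypothetical, and the example assumes a ps -ef layout in which the command name is the eighth field.

```shell
# Hypothetical stand-in for `ps -ef` output; assumes the command
# name lands in field 8, as in the article's example.
sample_ps() {
    cat <<'EOF'
UID        PID  PPID  C STIME TTY          TIME CMD
root      1234     1  0 09:00 ?        00:00:01 /usr/sbin/apache2 -k start
www-data  1240  1234  0 09:00 ?        00:00:00 /usr/sbin/apache2 -k start
tom       2001  1999  0 09:05 pts/0    00:00:00 awk /apache/ && !/awk/ {print $2}
tom       2002  1999  0 09:05 pts/0    00:00:00 vim apache.conf
EOF
}

# Whole-line match: also catches the editor whose argument mentions apache.
sample_ps | awk '/apache/ && !/awk/ {print $2}'
# prints:
#   1234
#   1240
#   2002

# Field match: only processes whose command name contains "apache".
sample_ps | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
# prints:
#   1234
#   1240
```

On a system where ps prints its columns differently, the field number would need adjusting.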
If you’re used to using Awk purely as a column filter, the above might help to increase its utility for you and allow you to write shorter and more efficient command lines. The Awk Primer on Wikibooks is a really good reference for using Awk to its fullest for the sorts of tasks for which it’s especially well-suited.
You probably already know this, but your examples of combining and excluding patterns could be simplified a step further with this little trick:
ps -ef | awk '$8 ~ /[a]pache/ {print $2}'
That way, the awk process’s own command string includes the square brackets, so it won’t be matched by the regexp.
BTW, nice blog!
In your last two examples, you are filtering out the grep command, which you’re no longer using. So you might want to change that !/grep/ to !/awk/. Also keep in mind the comment above, which is somewhat cleaner, but harder to use in scripts or functions where the search string comes from a variable.
By the way: I find a lot of useful information on your blog, nice work.
Corrected! Thank you.