Don't Be Awkward, Use AWK!

Thursday, May 23

10:29PM

After providing a tutorial on grep and a brief peek into sed, we move on to another great Linux command-line utility, one that beginners rarely use, perhaps because it is really a full-blown programming language and so conjures up images of impenetrable complexity.

Hopefully, this article will show you that AWK is not to be feared, but loved. It is extraordinarily useful, after all.

AWK (the name comes from the surnames of its developers: Aho, Weinberger and Kernighan) is a pattern-scanning and text-processing language that specializes in manipulating textual data. The output of most, if not all, command-line utilities is text, and AWK provides a way to filter and process all those rows and columns of data. It features C-like syntax for arithmetic operations, functions for working with strings, regular expressions and associative arrays.

There are several versions of the utility, including the AT&T original, an improved AT&T version called NAWK, and the Free Software Foundation's GAWK (GNU AWK), which is found on most Linux distributions today. The examples given here were written with GAWK in mind, but since GAWK respects the POSIX AWK specification, any standard AWK program should run on it without issues. AWK can be compiled for Windows as well.

The simplest way to explain AWK is to show it in action. Here's a sample:

awk 'length > 100' /var/log/dmesg.log

This snippet prints the lines of the file that are longer than one hundred characters. AWK is a data-driven language: you specify a pattern to look for in the data, followed by the action to perform on every record that matches. Here the pattern is length > 100 and the action is left out, so AWK falls back to its default action of printing the line. This is unlike procedural programming languages, where you'd specify every single step from start to result.
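To make the pattern-and-action structure concrete, here is a minimal sketch with both parts written out. The three input lines are made up for illustration, and the threshold of 10 is an arbitrary choice so that the sample stays short.

```shell
# Hypothetical sample input: three made-up lines fed in through a pipe.
# Pattern: length > 10. Action: print the matching line with a prefix.
result=$(printf '%s\n' 'short' 'a somewhat longer line' 'tiny' |
  awk 'length > 10 { print "long: " $0 }')
echo "$result"
```

Only the second line is longer than ten characters, so it is the only one that reaches the action.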

The function used here, length, is one of AWK's string manipulation functions; called without an argument, it returns the length of the current line. Others include index (find the position of a substring), substr (extract a portion of a string) and tolower/toupper (easy case conversion; these are part of the POSIX specification, so they are not limited to GAWK). Of course, there are also numeric functions, and you can define your own.
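As a rough illustration of these string functions, the sketch below runs them on a made-up input string. The function shout is a hypothetical user-defined helper invented for this example; it is not part of AWK.

```shell
# Hypothetical input string; shout() is an invented helper, not a built-in.
result=$(echo 'Hello, World' | awk '
  function shout(s) { return toupper(s) "!" }
  {
    print index($0, "World")   # position of the substring "World"
    print substr($0, 1, 5)     # five characters starting at position 1
    print tolower($0)          # whole line in lower case
    print shout($2)            # second field, upper-cased, with "!" appended
  }')
echo "$result"
```

Note that positions in AWK strings start at 1, not 0.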

ls -l | awk '{print $1 " " $9}'

The output of ls -l that we're piping into AWK is the long-format listing of information about files in a directory. Let's assume you don't want all that information, but just certain columns, such as permissions and file names. AWK operates on records (by default, lines of text) and fields, which are the pieces of a record separated by a delimiter (by default, any run of spaces or tabs). The command above therefore prints fields 1 and 9, the permissions and the file name, which is exactly what we wanted (a file name containing spaces would spill over into further fields). The quoted string between $1 and $9 is there for formatting: it places a literal space between the two fields, which would otherwise be printed back to back.
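The delimiter does not have to be whitespace. As a small sketch, the -F option sets a different field separator; the two colon-separated lines below are invented sample data, not taken from a real system file.

```shell
# Hypothetical colon-separated records; -F: makes ":" the field separator.
result=$(printf '%s\n' 'alice:x:1000:/home/alice' 'bob:x:1001:/home/bob' |
  awk -F: '{ print $1 " " $4 }')   # first and fourth field, joined by a space
echo "$result"
```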

There are a number of built-in variables in AWK at your disposal. Suppose there were too many fields to count in the above example, but you just wanted the first and the last one. You could use NF, which stands for Number (of) Fields:

ls -l | awk '{print $1, $NF}'

NF holds the number of fields in the current record, so $NF refers to the last field. Notice the quotes between the two fields are gone. The comma tells print to separate the two values with the contents of the OFS (Output Field Separator) variable, which is a single space by default.
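Setting OFS explicitly shows what the comma does. In this sketch, the input line and the " -> " separator are arbitrary choices for illustration; BEGIN is a special pattern whose action runs before any input is read.

```shell
# Hypothetical input; OFS is set to " -> " before the first record is read.
result=$(echo 'one two three' |
  awk 'BEGIN { OFS = " -> " } { print $1, $NF; print $1 $NF }')
echo "$result"
```

The first print uses a comma, so OFS appears between the fields; the second has no comma, so the two fields are simply concatenated.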

If you'd like to see only records with five or more fields, you'd type:

awk 'NF >= 5' /var/log/dmesg.log

Another useful one to know is NR (Number of Records), which holds the number of the record currently being processed and so enables you to print out only certain lines:

awk '{ if (NR > 250 && NR < 300) print $0}' /var/log/dmesg.log

This prints lines 251 through 299, since both comparisons are strict. Field $0 means the whole line.
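The same kind of range can also be written as a bare pattern, without the if statement. The sketch below uses eight made-up one-letter lines and a smaller range so the result is easy to check by eye.

```shell
# Hypothetical input: eight lines, "a" through "h".
# The condition on NR is used directly as the pattern; NR is printed as a prefix.
result=$(printf '%s\n' a b c d e f g h |
  awk 'NR > 3 && NR < 7 { print NR ": " $0 }')
echo "$result"
```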

As mentioned, you can use regular expressions in conjunction with AWK:

awk '/kernel/ { n++ }; END { print n+0 }' /var/log/dmesg.log

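This tests every line against the regular expression /kernel/ and increases the variable n by one whenever a line matches. END is a special pattern whose action runs only after all input has been processed; without it, the current value would be printed for every line in the file. Writing n+0 forces n to be treated as a number, so the output is 0 rather than an empty line when nothing matches.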
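The same counting idea extends to the associative arrays mentioned earlier: instead of one counter, keep one per key. This is only a sketch; the three log-like lines are invented, and count is an arbitrary array name.

```shell
# Hypothetical log-like input; the text before the first ":" is used as the key.
result=$(printf '%s\n' 'kernel: boot' 'usb: device found' 'kernel: ready' |
  awk -F: '{ count[$1]++ }
           END { print "kernel", count["kernel"]+0; print "usb", count["usb"]+0 }')
echo "$result"
```

The keys are printed explicitly here because the order in which a for (key in count) loop visits the elements is not specified.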

Hopefully these examples have given you at least a hint of AWK's usefulness. It can be used for everything from numbering and calculation to generating reports, manipulating strings, managing small personal databases, sorting data, extracting information, and more. It would not be an exaggeration to say it's brimming with functionality. As with sed, examples, tutorials and manuals are available online for free.


Submitted by Ivana Isadora on May 23, 2013 at 10:29 PM in Linux, Programming, TekSocial, Terminal, Tools; tagged Linux, coding, command-line, grep, programming, tools
