Learning AWK Programming
上QQ阅读APP看书,第一时间看更新

Workflow of AWK

Having a basic knowledge of the AWK interpreter workflow will help you to better understand AWK and will result in more efficient AWK program development. Hence, before getting your hands dirty with AWK programming, you need to understand its internals. The AWK workflow can be summarized as shown in the following figure:

Figure 1.1:  AWK workflow

Let's take a look at each operation:

  • READ OPERATION: AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory. It works on text input, which can be a file, the standard input stream, or from a pipe, which it further splits into records and fields:
    • Records: An AWK record is a single, continuous data input that AWK works on. Records are bounded by a record separator, whose value is stored in the RS variable. The default value of RS is set to a newline character. So, the lines of input are considered records for the AWK interpreter. Records are read continuously until the end of the input is reached. Figure 1.2  shows how input data is broken into records and then goes further into how it is split into fields:
Figure 1.2: AWK input data is split into records with the record separator
    • Fields: Each record can further be broken down into individual chunks called fields. Like records, fields are bounded. The default field separator is any amount of whitespace, including tab and space characters. So by default, lines of input are further broken down into individual words separated by whitespace. You can refer to the fields of a record by a field number, beginning with 1. The last field in each record can be accessed by its number or with the NF special variable, which contains the number of fields in the current record, as shown in Figure 1.3:
Figure 1.3: Records are split into fields by a field separator
  • EXECUTE OPERATION: All AWK commands are applied sequentially on the input (records and fields). By default, AWK executes commands on each record/line. This behavior of AWK can be restricted by the use of patterns.
  • REPEAT OPERATION: The process of read and execute is repeated until the end of the file is reached.

The following flowchart depicts the workflow of the AWK interpreter:

Figure 1.4:   Workflow of the AWK interpreter