The previous subsection discussed the use of single characters or
simple strings as the value of FS.
More generally, the value of FS may be a string containing any
regular expression. In this case, each match in the record for the regular
expression separates fields. For example, the assignment:
FS = ", \t"
makes every part of an input line that consists of a comma followed by a space and a tab into a field separator. (`\t' is an escape sequence that stands for a tab; see section Escape Sequences, for the complete list of similar escape sequences.)
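As a minimal sketch of this assignment in use, the following snippet feeds awk one made-up record whose fields are separated by a comma, a space, and a tab. The input text and the shell variable name `second' are illustrative assumptions, not part of the original example.

```shell
# Hypothetical input: three fields, each separated by comma + space + tab.
# printf expands \t to a real tab before awk sees the record.
second=$(printf 'a, \tb, \tc\n' | awk 'BEGIN { FS = ", \t" } { print $2 }')

# The second field is the text between the first and second separators.
echo "$second"    # prints: b
```

Because this value of FS is longer than one character, awk treats it as a regular expression; here it happens to contain no special characters, so it matches the three-character sequence literally.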
For a less trivial example of a regular expression, suppose you want
single spaces to separate fields the way single commas were used above.
You can set FS to "[ ]" (left bracket, space, right
bracket). This regular expression matches a single space and nothing else
(see section Regular Expressions).
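To illustrate the effect of "[ ]", the sketch below counts the fields in one sample record that contains two adjacent spaces. The record and the variable names are assumptions chosen for the illustration.

```shell
# Hypothetical record: two spaces between 'a' and 'b', one between 'b' and 'c'.
# With FS = "[ ]", every single space is a separator, so the two adjacent
# spaces enclose an empty field: "a", "", "b", "c".
single_space_nf=$(echo 'a  b c' | awk 'BEGIN { FS = "[ ]" } { print NF }')

# With the default FS (a single space), runs of whitespace count as one
# separator: "a", "b", "c".
default_nf=$(echo 'a  b c' | awk '{ print NF }')

echo "$single_space_nf $default_nf"    # prints: 4 3
```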
There is an important difference between the two cases of `FS = " "'
(a single space) and `FS = "[ \t\n]+"' (left bracket, space,
backslash, "t", backslash, "n", right bracket, which is a regular
expression matching one or more spaces, tabs, or newlines). For both
values of FS, fields are separated by runs of spaces, tabs
and/or newlines. However, when the value of FS is " ",
awk will first strip leading and trailing whitespace from
the record, and then decide where the fields are.
For example, the following pipeline prints `b':
$ echo ' a b c d ' | awk '{ print $2 }'
-| b
However, this pipeline prints `a', because the leading space now counts as a field separator (note the spaces around each letter):
$ echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" }
> { print $2 }'
-| a
In this case, the first field is null, or empty.
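One way to see the empty first field is to compare field counts. The following sketch uses a sample record with a single leading space and no trailing space; the record and the variable names are assumptions made for the illustration.

```shell
# Default FS: leading whitespace is stripped, so the fields are a, b, c, d.
default_nf=$(echo ' a b c d' | awk '{ print NF }')

# Regular expression FS: the leading space is a separator, so an empty
# field precedes 'a' and the fields are "", a, b, c, d.
regex_nf=$(echo ' a b c d' | awk 'BEGIN { FS = "[ \t]+" } { print NF }')

echo "$default_nf $regex_nf"    # prints: 4 5
```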
The stripping of leading and trailing whitespace also comes into
play whenever $0 is recomputed. For instance, study this pipeline:
$ echo ' a b c d' | awk '{ print; $2 = $2; print }'
-|  a b c d
-| a b c d
The first print statement prints the record as it was read,
with leading whitespace intact. The assignment to $2 rebuilds
$0 by concatenating $1 through $NF together,
separated by the value of OFS. Since the leading whitespace
was ignored when finding $1, it is not part of the new $0.
Finally, the last print statement prints the new $0.
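The rebuilding of $0 is easier to see when OFS is set to something other than a space. The sketch below assumes a sample record with three leading spaces and a hyphen as the output separator; both are illustrative choices.

```shell
# Assigning $2 to itself forces awk to rebuild $0 from $1 through $NF,
# joined by OFS. The leading spaces were never part of any field, so
# they do not appear in the rebuilt record.
rebuilt=$(echo '   a b c d' | awk 'BEGIN { OFS = "-" } { $2 = $2; print }')

echo "$rebuilt"    # prints: a-b-c-d
```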