comp.lang.awk FAQ

Last-modified: 1996-Sep-24

Frequently Asked Questions == FAQ


========================================================================

Contents:

   1. Disclaimer
   2. Can you answer my awk question?
   3. How can I add a FAQ and its answer to the FAQ list?
   4. What is awk?
   5. What well-maintained awk-compatible languages are there?
     5.1 nawk
     5.2 gawk
     5.3 mawk
     5.4 tawk
     5.5 mksawk
     5.6 awkcc
     5.7 awk2c
     5.8 a2p
   6. Where can I buy awk?
     6.1 AT&T (awk, awkcc)
     6.2 Thompson Automation (tawk)
     6.3 MKS (awk, can generate standalone interpreted .exe)
   7. Where can I get awk for free?  For what platforms?
     7.0 meta-answer
     7.1 the one true awk
     7.2 gawk
       7.2.1 gawk precompiled for MS-DOS or OS/2
       7.2.2 gawk precompiled for Macintosh
       7.2.3 jgawk (Japanese gawk)
       7.2.4 gawk.dll
     7.3 mawk
     7.4 awk2c
     7.5 awkcc
     7.6 various old binary-only distributions for MSDOS
   8. Why would anyone still use awk instead of perl?
   9. How can I learn awk?
  10. What are some other awk resources?
  11. How do I report a bug in gawk?
  12. What's wrong with gawk on Digital's OSF/1?
  13. How can I access shell or environment variables in an awk script?
    13.1 Environment variables in general
    13.2 Quoting
    13.3 ENVIRON and "env"|
    13.4 exporting environment variables back to the parent process
  14. Is there an easy way to determine if you have oawk or nawk?
  15. How can awk test for the existence of a file?
  16. How many elements were created by split()?
  17. How do I use tolower() in the SunOS nawk?
  98. Miscellaneous
  99. Credits


========================================================================

1. Disclaimer

Read at your own risk.  The current, previous, or original authors
make no claim as to fitness for any purpose or absence of any errors,
and offer no warranty.  Do not eat.


========================================================================

2. Can you answer my awk question?

Probably not.  Please don't mail it to me.

Read the FAQ, and the materials pointed to by it, and if you can't find
an answer there, by all means post to the newsgroup.

A FAQ list is intended to reduce traffic on a newsgroup, not eliminate it.

========================================================================

3. How can I add a FAQ and its answer to the FAQ list?

Mail BOTH of them to me.  Then I can add them to the FAQ and it should
help people who have that same question later, as well as everyone who
reads the group, because they won't see it asked and answered so often.

I do not work on this FAQ every day, but I will try to get updates
incorporated in a timely manner.

========================================================================

4. What is awk?

awk is a programming language, named after its three original authors:

  Alfred V. Aho
  Brian W. Kernighan
  Peter J. Weinberger

In their book, _The AWK Programming Language_, they write:

``
  Awk is a convenient and expressive programming language that can be
  applied to a wide variety of computing and data-manipulation tasks.
''

The title of the book uses `AWK', but the contents of the book use `awk'
(except at the beginning of sentences, as above).  I will attempt to do
the same (except perhaps at the beginning of sentences, as above).

========================================================================

5. What well-maintained awk-compatible languages are there?

  5.1 nawk
    AT&T's `new awk' -- probably nobody uses the `old awk' anymore.
    interpreter
    might NOT be well-maintained

  5.2 gawk
    from the GNU project
    interpreter

  5.3 mawk
    from Michael Brennan <mailto:brennan@whidbey.com>
    interpreter

  5.4 tawk
    from Thompson Automation
    interpreter
    compiler

  5.5 mksawk
    interpreter
    compiler
    from Mortice Kern Systems

    an old version of mksawk is shipped as `nawk' on Ultrix and
    OSF/1.

  5.6 awkcc
    translator to C
    might NOT be well-maintained

  5.7 awk2c
    translator to C
    uses GNU awk libraries extensively, and is subject to GPL

  5.8 a2p
    translator to Perl
    comes with Perl
    doesn't handle multiple concatenations in a row:  e.g., "x" "y" "z"
      -> must be in pairs:  i.e.,  ( "x" "y" ) "z"

========================================================================

6. Where can I buy awk?

6.1 AT&T (awk, awkcc)

  _The AWK Programming Language_ says:
    phone
      +1 201 522 6900 [is this number still valid?]
    and login as `guest'.

  <http://www.unipress.com/att/new/awk.html>
  <http://www.unipress.com/att/new/awkcc.html>
  <http://www.bell-labs.com/org/ssg/new/awk.html>
  <http://www.bell-labs.com/org/ssg/new/awkcc.html>

  these versions might NOT be well-maintained

6.2 Thompson Automation (tawk)

  <http://www.tasoft.com/~thompson/tawk.html>

  <http://www.teleport.com/~thompson/>

  Thompson Automation Software
  5616 SW Jefferson
  Portland, OR   97221
  USA

  North America: 800-944-0139
  Phone: +1 503 224 1639
  Fax: +1 503 224 3230

6.3 MKS (awk, can generate standalone interpreted .exe)

  <http://www.mks.ca/solution/tk/>

  Mortice Kern Systems
  185 Columbia Street W
  Waterloo, ON
  N2L 5Z5
  Canada

  North America: 800-265-2797
  Phone: +1 519 884 2251
  Fax: +1 519 884 8861

========================================================================

7. Where can I get awk for free?  For what platforms?

  7.0 meta-answer
    Obtaining Awk and Perl
    <http://uts.cc.utexas.edu/~churchh/awk-perl.html>

  7.1 the one true awk
    <http://plan9.bell-labs.com/who/bwk/awk.sh>
    <ftp://netlib.bell-labs.com/netlib/research/awk.bundle.Z>

    This is the version of awk described in "The AWK Programming Language",
    by A. V. Aho, B. W. Kernighan, and P. J. Weinberger
    (Addison-Wesley, 1988, ISBN 0-201-07981-X).
    Changes, mostly bug fixes, are listed in FIXES.

  7.2 gawk
    ftp.gnu.ai.mit.edu/pub/gnu/gawk*
    <ftp://ftp.gnu.ai.mit.edu/pub/gnu/>
    e.g.,
      <ftp://ftp.gnu.ai.mit.edu/pub/gnu/gawk-3.0.0.tar.gz>

      7.2.1 gawk precompiled for MS-DOS or OS/2

        32bit DOS (djgpp) and 16bit OS/2 and DOS (msc) versions are
        part of the GNUish project:

        <http://www.coast.net/SimTel/gnu/gnuish.html>      (US)
        <http://www.simtel.net/simtel.net/>                (US)
        <ftp://ftp.simtel.net/pub/simtelnet/gnu/gnuish/>
        <http://www.leo.org/pub/comp/platforms/pc/gnuish/> (Germany)
        <http://wuarchive.wustl.edu/systems/msdos/gnuish/> (US)

        32bit OS/2 and DOS (emx) versions:

        <http://www.leo.org/pub/comp/os/os2/gnu/script/gnuawk.zip> (Germany)
        <ftp://ftp-os2.cdrom.com/pub/os2/lang/gnuawk.zip>          (US)

      7.2.2 gawk precompiled for Macintosh

        <ftp://ftp.funet.fi/pub/mac/programming/>
        <ftp://ezinfo.ethz.ch/mac/programming/>
        <ftp://ftp.uwtc.washington.edu/pub/Mac/Programming/>
        <ftp://ftp.eos.hokudai.ac.jp/pub/mac/util/Gawk/>
        <ftp://ftp.fu-berlin.de/mac/lang/MPW/>
        <ftp://pascal.zedat.fu-berlin.de/mac/lang/MPW/>
        <ftp://ftp.cs.tu-berlin.de/pub/mac/lang/MPW/>
        <ftp://nic.switch.ch/software/mac/src/mpw_c/>

      7.2.3 jgawk (Japanese gawk)

        <ftp://ftp.eos.hokudai.ac.jp/pub/mac/util/jgawk/>

      7.2.4 gawk.dll

        <http://www.muc.de/~walkerj/>
          Gawk 2.15.2 plus extensions
          + Read/Write functions for INI files
          + Read-only functions for DBF files

  7.3 mawk
    <ftp://ftp.whidbey.net/pub/brennan/>
      e.g.,
      <ftp://ftp.whidbey.net/pub/brennan/mawk1.3.2.tar.gz>

    <ftp://oxy.edu/public/>
    e.g.,
      <ftp://oxy.edu/public/mawk1.2.2.tar.gz>
      <ftp://oxy.edu/public/mawk1.3beta.tar.gz>

  7.4 awk2c
    <ftp://sunsite.unc.edu/pub/Linux/utils/text/awk2c050.tgz>

  7.5 awkcc [binary distribution only?!  for what platform?  DOS?]
    <http://www.mks.ca/files/support/unsupported/awkcc.tgz>
    binary file, not marked as such -- your browser may fumble it.
    `lynx -source http://... > awkcc.tgz' works

  7.6 various old binary-only distributions for MSDOS
    ftp.coast.net:/pub/SimTel/msdos/awk/
    <http://www.coast.net/SimTel/msdos/awk.html>

========================================================================

8. Why would anyone still use awk instead of perl?

  A valid question, since perl can do nearly everything awk can.  However:

  - awk is simpler
  - you may already know awk well enough
  - you may already have awk installed
  - awk can be smaller, thus much quicker to execute for small programs
  - awk variables don't have `$' in front of them :-)

========================================================================

9. How can I learn awk?

  The commercial vendors of DOS versions (MKS and Thompson) each have
  their own well written books with examples.  [available separately?]

  English Book:

      _The AWK Programming Language_, by Aho, Kernighan and Weinberger,
      who invented the language. Published by Addison-Wesley. Lots of
      good material in not a lot of space.  A little out of date
      w.r.t. POSIX awk.

      ISBN 0-201-07981-X

      <http://www.heg-school.aw.com/cseng/authors/aho/awk/awk.html>

  English Book:

      _Effective AWK Programming_ by Arnold Robbins.  Published by
      SSC (+1 206-FOR-UNIX, <http://www.ssc.com>, <mailto:sales@ssc.com>).
      Also published by the FSF as "The GNU AWK User's Guide"; Texinfo
      source is included with the gawk distribution, so you can also
      print this yourself.

      ISBN 0-916151-88-3

      Russell recommends buying the book instead of printing it all
      out, for three reasons:

        1. it's probably cheaper than using your own toner and paper.

        2. some money goes back to help further development, both to
           Arnold Robbins (only if you buy from SSC) and the Free
           Software Foundation (if you buy from either SSC or the FSF).

        3. it helps convince publishers that we _like_ having full
           documentation available on-line (e.g., for searching), but
           will still pay for a compact, bound copy

      information, including an errata list, is on the web site.

      <http://www.ssc.com/ssc/eap/>

  English Book:

      _Sed & Awk_, by Dale Dougherty, published by O'Reilly &
      Associates.  A nice introduction to sed and awk, showing how
      they relate to each other. However, the first edition is full
      of typos and out-and-out mistakes. A second edition is supposedly
      in the works.

      [does anyone have an errata list?  I'm told it would be `too huge'.]

      <http://www.ora.com/catalog/sed/noframes.html>
      <http://www.ora.com/catalog/covers/sedawk-t.gif>

      ISBN 0-937175-59-5

  German Book:

      Awk und Sed, by Helmut Herold.

      <http://www.addison-wesley.de/KD/BS/685.html>

      [ISBN ?]

      [any other information?]

  Web Site:

      <http://www.cs.hmc.edu/FAQ/qref/awk.html>

      Getting started with Awk

  Web Site:

      <http://www.uga.edu/~ucns/sss/unix/awk.html>

      Awk introduction

  Web Site:

      <http://www.mbnet.mb.ca/~natewild/awk/awk.html>

      Information about Tawk; Awk sample source code

  Web Site:

      <http://www.ssc.com/lj/issue25/1156.html>

      Introduction to Gawk

  Web Site:

      <http://wire.xenitec.on.ca:457/OSUserG/_Learning_awk.html>

      [ unseen - isn't altavista handy for this kind of thing? ]

========================================================================

10. What are some other awk resources?

  Unix and awk courseware:

      <http://www.cit.ac.nz/smac/os202/default.htm>

  Awk course

      <http://www.ee.ic.ac.uk/course/advanced/awk/awk.html>

  Developer information on awk

      <http://www.devinfo.com/languages/awk/>

  Spatial Analysis with Awk (course)

      <http://www.udel.johnmack/frec682/682awk.html>

  Debugger and Assertion Checker for Awk

      <http://www.irisa.fr/EXTERNE/manifestations/AADEBUG95/Abstracts/auguston2.html>

  Free Compilers and Interpreters List

      <http://www.idiom.com/free-compilers/LANG/awk-1.html>

  Voicenet.com awk page

      <http://www.voicenet.com/tech/comp/prog/awk/>

  Four awk implementations for MS-DOS:  How do they compare?

      <http://www.voicenet.com/tech/comp/prog/awk/awk2.rev>

  Gawk 3 manual

      <http://hill.ucs.ualberta.ca/Documentation/gawk-3.0.0/gawk_toc.html>

  Unix Vault

      <http://www.nda.com/~jblaine/vault/>

  Yahoo's awk links

      <http://www.yahoo.com/Computers_and_Internet/Programming_Languages/Awk/>

      [ I've tried multiple times to change alt.lang.awk to comp.lang.awk ]

  NMT awk information [ who is NMT again? ]

      <http://www.nmt.edu/tcc/help/lang/awk.html>
      <http://www.nmt.edu/bin/man?awk>
      <http://www.nmt.edu/bin/man?gawk>

========================================================================

11. How do I report a bug in gawk?

This is described in great detail in the gawk documentation. In brief,

   1. Make sure what you've discovered is really a bug by checking
      the documentation and, if possible, comparing with nawk and mawk.

   2. Cut the program and data down to the smallest test case that
      still illustrates the bug.

   3. Send mail to <mailto:bug-gnu-utils@prep.ai.mit.edu>, with a carbon
      copy to <mailto:arnold@gnu.ai.mit.edu>.

   4. Do not just post in comp.lang.awk; Arnold reads the group only
      sporadically.


========================================================================

12. What's wrong with gawk on Digital's OSF/1?

The version of gawk shipped with OSF/1 is very old, based on gawk 2.14.
Get the current version from a GNU mirror near you, and if you still
have a problem, report it as per the directions in the gawk documentation.

========================================================================

13. How can I access shell or environment variables in an awk script?

13.1 Environment variables in general

Answer 1:  Use "alternate quoting", e.g.

        awk -F: '$1 ~ /'"$USER"'/ {print $5}' /etc/passwd
                ^^^^^^^^*******^^^^^^^^^^^^^^

        Any Unix shell will pass the underlined part to awk as one long
        argument (with embedded spaces), for instance:

        $1 ~ /bwk/ {print $5}

        Note that there must not be any spaces between the quoted
        parts.  Otherwise, you wouldn't end up with a single, long script
        argument, because Unix shells break arguments on spaces
        (unless they are escaped with \ or in '' or "", as the example shows).

Answer 2:  RTFM to see if and how your awk supports variable definitions
           on the command line, e.g.,

        awk -F: -v name="$USER"   '$1 ~ name {print $5}'  /etc/passwd

Answer 3:  RTFM to see if your awk can access environment variables.
           Then perhaps

        awk -F:  '$1 ~ ENVIRON["USER"]  {print $5}'  /etc/passwd

        Always remember for your /bin/sh scripts that it's easy to put
        things into the environment for a single command run:

        name=felix age=56 awk '... ENVIRON["name"] .....'

        this also works with ksh and some other shells.

The first approach is extremely portable, but doesn't work with awk
"-f" script files.  In that case, it's better to use a shell script
and stretch a long awk command argument in '...' across multiple lines
if need be.

Also note: /bin/csh requires a \ before an embedded newline; /bin/sh
does not.
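
As a rough sketch of that last suggestion, here is a small /bin/sh
fragment with the awk program stretched over several lines and a shell
variable spliced in by "alternate quoting".  The variable `user' and the
two sample records are made up for illustration; they stand in for
$USER and /etc/passwd.

```shell
user=bwk
out=$(printf '%s\n' 'bwk:x:1:1:Brian Kernighan:/home/bwk:/bin/sh' \
                    'aho:x:2:2:Alfred Aho:/home/aho:/bin/sh' |
awk -F: '
    # The shell variable is spliced in between two quoted pieces.
    $1 == "'"$user"'" {
        print $5
    }
')
echo "$out"
```

Keep apostrophes out of comments inside the '...' argument; one stray
quote ends the argument early.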


13.2 Quoting

Quoting can be such a headache for the novice, in shell programming,
and especially in awk.  Each of these prints the sentence with its
embedded single quote:

    (see below for a verbose explanation of the first one, with 7 quotes)

    awk 'BEGIN { q="'"'"'";print "Never say can"q"t."; exit }'
    nawk -v q="'" 'BEGIN { print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say \"can"q"t.\""; exit }'
    [Others?]

and

    cat <<@@ > foo.awk
    { print "Never say can't." }
    @@
    awk -f foo.awk; rm foo.awk

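But not this, because the shell does not treat \ as an escape inside
'...', so the ' after the \ ends the quoted argument early:

    awk '{ q="\'"; print "Never say \"can"q"t.\""; exit }'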


explanation of the 7-quote example:

note that it is quoted three different ways:

    awk 'BEGIN { q="'
                     "'"
                        '";print "Never say can"q"t."; exit }'

and that argument comes out as the single string (with embedded spaces)

    BEGIN { q="'";print "Never say can"q"t."; exit }

which is the same as

    BEGIN { q="'"; print "Never say can" q "t."; exit }
                          ^^^^^^^^^^^^^  ^  ^^
                          |           |  |  ||
                          |           |  |  ||
                          vvvvvvvvvvvvv  |  ||
                          Never say can  v  ||
                                         '  vv
                                            t.

which gets you

                          Never say can't.
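
One more variant, offered as a sketch rather than gospel: POSIX awks
accept octal escapes in string constants, and \047 is the single quote,
so no shell quoting tricks are needed at all.  Very old awks may not
understand the escape.

```shell
out=$(awk 'BEGIN { print "Never say can\047t." }')
echo "$out"
```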


13.3 ENVIRON and "env"|

   Modern versions of new awk (gawk, mawk, Bell Labs awk, any POSIX awk)
   all provide an array named ENVIRON.  The array is indexed by environment
   variable name; each element holds that variable's value.  For instance,
   ENVIRON["HOME"] might be "/home/joeuser".  To print out all the names
   and values, use a simple loop:

        for (i in ENVIRON)
                printf("ENVIRON[\"%s\"] = \"%s\"\n", i, ENVIRON[i])


   What if my awk doesn't have ENVIRON?

   Short answer, get a better awk. There are three freely available versions.

   Longer answer, you can use a pipe from the `env' or `printenv' commands,
   but this is less pretty, and may be a problem if the values contain
   newlines:

        while (("env" | getline line) > 0) {
                eq = index(line, "=")
                if (eq == 0)
                        continue    # no "=": not a NAME=value line
                name = substr(line, 1, eq - 1)
                value = substr(line, eq + 1)    # may itself contain "="
                env[name] = value
        }
        close("env")

13.4 exporting environment variables back to the parent process

   How can I put values into the environment of the program that
   called my awk program?

   Short answer, you can't. Unix ain't Plan 9, and you can't tweak
   the parent's address space.

   Longer answer, write the results in a form the shell can parse
   to a temporary file, and have the shell "source" the file after
   running the awk program.  (\047 is the octal escape for a single
   quote, which cannot appear literally inside the '...' argument.)

        awk 'BEGIN { printf("NEWVAR=\047%s\047\n", somevalue) }' > /tmp/awk.$$
        . /tmp/awk.$$        # sh/ksh/bash/pdksh/zsh etc

   Csh syntax left as an exercise for the reader.
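
   A variation that skips the temporary file, for Bourne-style shells
   with $(...) command substitution: let the shell `eval' what awk
   prints.  This is a different technique from the one above, shown only
   as a sketch; the value "some value" is a placeholder, and a value
   containing a single quote would break the quoting.

```shell
eval "$(awk 'BEGIN { printf("NEWVAR=\047%s\047\n", "some value") }')"
echo "$NEWVAR"
```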

========================================================================

14. Is there an easy way to determine if you have oawk or nawk?

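The following in a BEGIN rule will do the trick.  Old awk has no ARGC
variable, so it compares equal to 0 there; new awk always sets it to
at least 1.

        if (ARGC == 0)
                print "old awk"
        else
                print "new awk"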

========================================================================

15. How can awk test for the existence of a file?

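The best way is to simply try to read from the file.  getline returns
-1 if the file cannot be opened, 0 at end of file, and 1 if it read a
line, so any result >= 0 means the file exists and can be read.

        function exists(file,        dummy, ret)
        {
                if ((getline dummy < file) >= 0)
                        # file exists and can be read (it may be empty)
                        ret = 1
                else
                        ret = 0
                close(file)
                return ret
        }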

[ I've read reports that earlier versions of mawk would write to stderr
as well as getline returning <0 -- is this still true? ]
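
A possible way to try this out from the shell, as a sketch only.  This
copy of the function tests for >= 0, so that an empty file still counts
as existing.  The scratch file name is made up for the demonstration,
and stderr is discarded in case your awk complains about the missing
file.

```shell
tmp=${TMPDIR:-/tmp}/exists-demo.$$
echo "some text" > "$tmp"
out=$(awk '
function exists(file,    dummy, ret)
{
    # getline returns -1 if the file cannot be opened
    ret = ((getline dummy < file) >= 0)
    close(file)
    return ret
}
BEGIN {
    # Report on each file named on the command line.
    for (i = 1; i < ARGC; i++)
        print (exists(ARGV[i]) ? "yes" : "no")
}' "$tmp" "$tmp.no-such-file" 2>/dev/null)
rm -f "$tmp"
echo "$out"
```

The program has only a BEGIN rule, so awk never tries to read the named
files as ordinary input.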

========================================================================

16. How many elements were created by split()?

When I do a split on a field, i.e.,

        split($1,x,"string")

   how can I find out how many elements x has (I mean other than testing
   for null string or doing a for (n in x) test)?

Use the return value of split():

        n = split($1, x, "string")
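
For instance, n can then drive a loop over the pieces in order, which a
for (i in x) loop does not guarantee.  A small sketch, with made-up
input and ":" as the separator:

```shell
out=$(echo "a:b:c" | awk '{
    n = split($1, x, ":")
    for (i = 1; i <= n; i++)
        printf "%d of %d: %s\n", i, n, x[i]
}')
echo "$out"
```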

========================================================================

17. How do I use tolower() in the SunOS nawk?

I want to use the tolower() function with SunOS nawk, but all I get is

        nawk: calling undefined function tolower

The SunOS nawk is from a time before awk acquired the tolower() and
toupper() functions. Either use one of the freely available awks, or
write your own function to do it using index, substr, and gsub.

An example of such a function is in O'Reilly's _Sed & Awk_.
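
One way such a function might look, using only index and substr.  This
is a sketch, not the version from the book, and it has not been tried on
the SunOS nawk itself.  The name my_tolower is made up so that it cannot
clash with a built-in; only the 26 ASCII letters are handled.

```shell
out=$(echo "Hello, AWK World" | awk '
function my_tolower(s,    i, c, pos, out)
{
    out = ""
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        # position of c among the capitals, or 0 if it is not one
        pos = index("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c)
        out = out (pos ? substr("abcdefghijklmnopqrstuvwxyz", pos, 1) : c)
    }
    return out
}
{ print my_tolower($0) }')
echo "$out"
```

Swap the two alphabet strings to get a toupper.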

========================================================================

98. Miscellaneous


========================================================================

99. Credits

I expect most of the information in this FAQ to be supplied by people
other than myself -- it's just going to work better that way.  The
newsgroup readers have a LOT more awk experience than I ever will
(unless I multiply myself by a few thousand, which is not legal with
today's tax laws).


These people have contributed to the well-being of the FAQ:

  arnold@gnu.ai.mit.edu (Arnold D. Robbins)
  100335.120@CompuServe.COM (James G. Walker)
  jland@worldnet.att.net (Jim Land)
  yuli.barcohen@telrad.co.il (Yuli Barcohen)
  johnd@mozart.inet.co.th (John DeHaven)
  amnonc@mercury.co.il (Amnon Cohen)
  saguyami@post.tau.ac.il (Shay)
  hankedr@mail.auburn.edu (Darrel Hankerson)
  mark@ispc001.demon.co.uk (Mark Katz)
  brennan@whidbey.com (Michael D. Brennan)
  neitzel@gaertner.de (Martin Neitzel)
  pjf@osiris.cs.uoguelph.ca (Peter Jaspers-Fayer)
  dmckeon@swcp.com (Denis McKeon)

Thanks.

========================================================================

thus endeth the awk FAQ.