GAWK: Effective AWK Programming

A User's Guide for GNU Awk

Edition 3

March, 2001

Arnold D. Robbins


(1)

These commands are available on POSIX-compliant systems, as well as on traditional Unix-based systems. If you are using some other operating system, you still need to be familiar with the ideas of I/O redirection and pipes.

(2)

Often, these systems use @command{gawk

(3)

All such differences appear in the index under the heading "differences between @command{gawk

(4)

GNU stands for "GNU's not Unix."

(5)

The terminology "GNU/Linux" is explained in the section Glossary.

(6)

Although we generally recommend the use of single quotes around the program text, double quotes are needed here in order to put the single quote into the message.
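As a brief illustration of this quoting, here is a minimal sketch. It assumes a POSIX-style shell and an awk on the search path; the variable name message is only for illustration.

```shell
# The outer double quotes let the apostrophe in "Don't" pass through the shell.
# The inner double quotes are backslash-escaped so that awk still sees a string constant.
message=$(awk "BEGIN { print \"Don't Panic!\" }")
echo "$message"    # prints: Don't Panic!
```

With single quotes around the program, the apostrophe would end the quoted text early, which is why double quotes are used in this one case.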

(7)

The `#!' mechanism works on Linux systems, systems derived from the 4.4-Lite Berkeley Software Distribution, and most commercial Unix systems.

(8)

The line beginning with `#!' lists the full file name of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. The first argument in the list is the full file name of the @command{awk

(9)

In the C shell (@command{csh

(10)

On some very old systems, you may need to use `ls -lg' to get this output.

(11)

The `?' and `:' referred to here belong to the three-operand conditional expression described in section Conditional Expressions. Splitting lines after `?' and `:' is a minor @command{gawk

(12)

In other literature, you may see a character list referred to as a character set, a character class, or a bracket expression.

(13)

Use two backslashes if you're using a string constant with a regexp operator or function.
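The following sketch shows the doubled backslash in a string constant used with the `~' operator. The test strings and the variable name result are hypothetical, chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    # In a string constant, "a\\.b" reaches the regexp matcher as a\.b,
    # so the dot is matched literally rather than as "any character".
    if ("a.b" ~ "a\\.b") print "a.b matches"
    if ("axb" ~ "a\\.b") print "axb matches"
}')
echo "$result"    # prints: a.b matches
```

Only the first test succeeds, because the escaped dot no longer matches the `x'.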

(14)

Experienced C and C++ programmers will note that it is possible to set IGNORECASE for the pattern of a single rule, using something like `IGNORECASE = 1 && /foObAr/ { ... }' and `IGNORECASE = 0 || /foobar/ { ... }'. However, this is somewhat obscure and we don't recommend it.

(15)

At least that we know about.

(16)

In POSIX @command{awk

(17)

The @command{sed

(18)

Older versions of @command{gawk

(19)

The technical terminology is rather morbid. The finished child is called a "zombie," and cleaning up after it is referred to as "reaping."

(20)

The internal representation of all numbers, including integers, uses double-precision floating-point numbers. On most modern systems, these are in IEEE 754 standard format.
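One visible consequence, sketched here under the assumption that the awk in use stores numbers as IEEE 754 doubles (and is not running in an arbitrary-precision mode): integers are exact only up to 2^53. The variable names are illustrative.

```shell
result=$(awk 'BEGIN {
    big = 2^53    # 9007199254740992; every integer up to this value is exact
    # Adding 1 produces a value a double cannot hold, so it rounds back to big.
    if (big + 1 == big)
        print "2^53 + 1 is not representable"
    else
        print "2^53 + 1 is distinct"
}')
echo "$result"
```

On such a system the first message is printed, since the sum rounds to the nearest representable value.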

(21)

Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.

(22)

The POSIX standard is under revision. The revised standard's rules for typing and comparison are the same as just described for @command{gawk

(23)

The original version of @command{awk

(24)

In POSIX @command{awk

(25)

Some early implementations of Unix @command{awk

(26)

Thanks to Michael Brennan for pointing this out.

(27)

The C version of rand is known to produce fairly poor sequences of random numbers. However, nothing requires that an @command{awk

(28)

Computer-generated random numbers are not truly random. They are technically known as "pseudo-random." This means that while the numbers in a sequence appear to be random, you can in fact generate the same sequence of random numbers over and over again.
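A small sketch of this repeatability: two separate runs seeded with the same value produce the same numbers. The seed 42 and the variable names are arbitrary choices for illustration, and the actual values printed depend on the awk implementation.

```shell
# Seed the generator identically in two independent awk processes.
first=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')
second=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')

echo "$first"
echo "$second"
# Both lines are identical because the seed fully determines the sequence.
[ "$first" = "$second" ] && echo "same sequence"
```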

(29)

Unless you use the @option{--non-decimal-data

(30)

This is different from C and C++, where the first character is number zero.
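For instance, in this short sketch (the string and variable names are hypothetical), both substr and index count characters starting from one:

```shell
result=$(awk 'BEGIN {
    s = "gawk"
    print substr(s, 1, 1)    # the first character is at position 1, not 0
    print index(s, "g")      # the position of "g" is reported as 1
}')
echo "$result"
```

This prints `g' and then `1'.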

(31)

This consequence was certainly unintended.

(32)

As this Info file was being finalized, we learned that the POSIX standard will not use these rules. However, it was too late to change @command{gawk

(33)

A program is interactive if the standard output is connected to a terminal device.

(34)

See section Glossary, especially the entries for "Epoch" and "UTC."

(35)

The GNU @command{date

(36)

Occasionally there are minutes in a year with a leap second, which is why the seconds can go up to 60.

(37)

As this is a recent standard, not every system's strftime necessarily supports all of the conversions listed here.

(38)

If you don't understand any of this, don't worry about it; these facilities are meant to make it easier to "internationalize" programs. Other internationalization features are described in @ref{Internationalization, ,Internationalization with @command{gawk

(39)

This is because ISO C leaves the behavior of the C version of strftime undefined and @command{gawk

(40)

This example shows that 0's come in on the left side. For @command{gawk

(41)

For some operating systems, the @command{gawk

(42)

Americans use a comma every three decimal places and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 vs. 1.234,56.

(43)

Eventually, the @command{xgettext

(44)

This example is borrowed from the GNU gettext manual.

(45)

This is good fodder for an "Obfuscated @command{awk

(46)

Perhaps it would be better if it were called "Hippy." Ah, well.

(47)

This is very different from the same operator in the C shell, @command{csh

(48)

Not recommended.

(49)

Your version of @command{gawk

(50)

The effects are not identical. Output of the transformed record will be in all lowercase, while IGNORECASE preserves the original contents of the input record.
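The difference can be sketched as follows. Because IGNORECASE is specific to gawk, the second command uses a portable stand-in (matching against a lowercased copy) that, like IGNORECASE, leaves the record itself untouched. The input text and variable names are assumptions for illustration.

```shell
# Transforming the record: the match succeeds, but the output is lowercase.
transformed=$(echo "FooBar" | awk '{ $0 = tolower($0) } /foobar/ { print }')

# Matching against a lowercased copy: the record keeps its original case.
preserved=$(echo "FooBar" | awk 'tolower($0) ~ /foobar/ { print }')

echo "$transformed"    # prints: foobar
echo "$preserved"      # prints: FooBar
```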

(51)

While all the library routines could have been rewritten to use this convention, this was not done, in order to show how my own @command{awk

(52)

@command{gawk

(53)

http://mathworld.wolfram.com/CliffRandomNumberGenerator.html

(54)

ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. If your system uses these extensions, you can simplify _ord_init to loop from 0 to 255.
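A minimal sketch of such a loop follows. It is not the library's own code: the array name ord_table is hypothetical, and the sketch assumes that sprintf with `%c' yields one distinct character for each value from 0 to 255 on your system.

```shell
result=$(awk 'BEGIN {
    # Hypothetical stand-in for the initialization loop:
    # map each character to its numeric code.
    for (i = 0; i <= 255; i++)
        ord_table[sprintf("%c", i)] = i
    print ord_table["A"], ord_table["z"]
}')
echo "$result"    # prints: 65 122
```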

(55)

It would be nice if @command{awk

(56)

This function was written before @command{gawk

(57)

It is often the case that password information is stored in a network database.

(58)

It also introduces a subtle bug; if a match happens, we output the translated line, not the original.

(59)

@command{wc

(60)

On some older System V systems, @command{tr

(61)

This program was written before @command{gawk

(62)

"Real world" is defined as "a program actually used to get something done."

(63)

On some very old versions of @command{awk

(64)

http://cm.bell-labs.com/who/bwk

(65)

This version is edited slightly for presentation. The complete version can be found in `extension/filefuncs.c' in the @command{gawk

(66)

Compiled programs are typically written in lower-level languages such as C, C++, Fortran, or Ada, and then translated, or compiled, into a form that the computer can execute directly.

(67)

http://www.validgh.com/goldberg/paper.ps

(68)

Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.


This document was generated on 23 October 2001 using the texi2html translator version 1.54.