
#### 6.1.4 Conversion of Strings and Numbers

Strings are converted to numbers and numbers are converted to strings, if the context of the `awk` program demands it. For example, if the value of either `foo` or `bar` in the expression ‘foo + bar’ happens to be a string, it is converted to a number before the addition is performed. If numeric values appear in string concatenation, they are converted to strings. Consider the following:

```
two = 2; three = 3
print (two three) + 4
```

This prints the (numeric) value 27. The numeric values of the variables `two` and `three` are converted to strings and concatenated together. The resulting string is converted back to the number 23, to which 4 is then added.

If, for some reason, you need to force a number to be converted to a string, concatenate that number with the empty string, `""`. To force a string to be converted to a number, add zero to that string. A string is converted to a number by interpreting any numeric prefix of the string as numerals: `"2.5"` converts to 2.5, `"1e3"` converts to 1000, and `"25fix"` has a numeric value of 25. Strings that can’t be interpreted as valid numbers convert to zero.
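Both idioms can be tried directly from the shell. This is a minimal sketch that assumes a POSIX-compatible `awk` (such as `gawk`) is on the `PATH`. The shell variable `out` is only an illustrative name. The script should print `0.333333`, `8`, `2.5`, `1000`, `25`, and `0`, one per line.

```shell
# Sketch: force conversions explicitly (assumes a POSIX awk on PATH).
out=$(awk 'BEGIN {
    s = (1 / 3) ""      # number -> string: concatenate with ""
    print s             # 0.333333, formatted with the default CONVFMT
    print length(s)     # 8: the string keeps only six significant digits
    print "2.5" + 0     # string -> number: the whole string is numeric
    print "1e3" + 0     # exponent notation gives 1000
    print "25fix" + 0   # only the numeric prefix 25 is used
    print "fix25" + 0   # no numeric prefix, so the value is 0
}')
printf '%s\n' "$out"
```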

The exact manner in which numbers are converted into strings is controlled by the `awk` built-in variable `CONVFMT` (see Built-in Variables). Numbers are converted using the `sprintf()` function with `CONVFMT` as the format specifier (see String Functions).

`CONVFMT`’s default value is `"%.6g"`, which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly.(32)
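The effect of raising the precision shows up clearly with a value such as 0.1, which has no exact binary representation. This sketch assumes a POSIX `awk` on the `PATH`. The variable names `s6` and `s17` are illustrative. The last line checks that converting a number to a string gives the same text as calling `sprintf()` with `CONVFMT` as the format.

```shell
# Sketch: default CONVFMT versus "%.17g" (assumes a POSIX awk on PATH).
out=$(awk 'BEGIN {
    x = 0.1
    s6 = x ""                            # default CONVFMT "%.6g" gives 0.1
    CONVFMT = "%.17g"
    s17 = x ""                           # now gives 0.10000000000000001
    print s6
    print s17
    print (s17 == sprintf(CONVFMT, x))   # 1: same text as sprintf(CONVFMT, x)
}')
printf '%s\n' "$out"
```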

Strange results can occur if you set `CONVFMT` to a string that doesn’t tell `sprintf()` how to format floating-point numbers in a useful way. For example, if you forget the ‘%’ in the format, `awk` converts all numbers to the same constant string.
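Here is a sketch of that mistake, assuming a POSIX `awk` on the `PATH`. `CONVFMT` is deliberately set to `".6g"` instead of `"%.6g"`. Every non-integer value then converts to the literal text `.6g`, while integers are unaffected. Standard error is discarded in case the `awk` in use warns about the malformed format.

```shell
# Sketch of a hypothetical broken CONVFMT: the % is missing.
out=$(awk 'BEGIN {
    CONVFMT = ".6g"     # no % conversion, so the format text is copied as-is
    x = 3.5
    y = 2.25
    n = 12
    print x ""          # .6g
    print y ""          # .6g again: every non-integer looks the same
    print n ""          # 12: integer values do not use CONVFMT
}' 2>/dev/null)
printf '%s\n' "$out"
```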

As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of `CONVFMT` may be. Given the following code fragment:

```
CONVFMT = "%2.2f"
a = 12
b = a ""
```

`b` has the value `"12"`, not `"12.00"`. (d.c.)
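To see the special case next to the general rule, the fragment above can be extended with a non-integer value. This is a sketch assuming a POSIX `awk` on the `PATH`. The extra variables `x` and `y` are illustrative. The integer still converts to `"12"`, while 12.5 goes through `CONVFMT` and becomes `"12.50"`.

```shell
# Sketch: integers bypass CONVFMT, other values do not (assumes a POSIX awk).
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""            # integral value: converted as the integer 12
    x = 12.5
    y = x ""            # non-integral value: formatted with CONVFMT as 12.50
    print b
    print y
}')
printf '%s\n' "$out"
```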

Prior to the POSIX standard, `awk` used the value of `OFMT` for converting numbers to strings. `OFMT` specifies the output format to use when printing numbers with `print`. `CONVFMT` was introduced in order to separate the semantics of conversion from the semantics of printing. Both `CONVFMT` and `OFMT` have the same default value: `"%.6g"`. In the vast majority of cases, old `awk` programs do not change their behavior. However, these semantics for `OFMT` are something to keep in mind if you must port your new-style program to older implementations of `awk`. Rather than changing your programs, we recommend that you port `gawk` itself. See Print for more information on the `print` statement.
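Giving `OFMT` and `CONVFMT` different values makes the separation visible. This sketch assumes a POSIX `awk` on the `PATH`, and the formats chosen are arbitrary. `print` formats the numeric value of `x` with `OFMT`, while concatenating `x` with `""` converts it to a string using `CONVFMT`.

```shell
# Sketch: OFMT controls print, CONVFMT controls conversion (assumes a POSIX awk).
out=$(awk 'BEGIN {
    OFMT = "%.2f"       # used by print for non-integer numeric values
    CONVFMT = "%.3f"    # used when a number is converted to a string
    x = 3.14159
    s = x ""            # conversion to string uses CONVFMT: 3.142
    print x             # print formats the number with OFMT: 3.14
    print s             # already a string, printed unchanged: 3.142
}')
printf '%s\n' "$out"
```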

And, once again, where you are can matter when it comes to converting between numbers and strings. In Locales, we mentioned that the local character set and language (the locale) can affect how `gawk` matches characters. The locale also affects numeric formats. In particular, for `awk` programs, it affects the decimal point character. The `"C"` locale, and most English-language locales, use the period character (‘.’) as the decimal point. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character.

The POSIX standard says that `awk` always uses the period as the decimal point when reading the `awk` program source code, and for command-line variable assignments (see Other Arguments). However, when interpreting input data, for `print` and `printf` output, and for number-to-string conversion, the local decimal point character is used. (d.c.) Here are some examples indicating the difference in behavior, on a GNU/Linux system:

```
$ export POSIXLY_CORRECT=1                        # Force POSIX behavior
$ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3.14159
$ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3,14159
$ echo 4,321 | gawk '{ print $1 + 1 }'
-| 5
$ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
-| 5,321
```

The ‘en_DK.utf-8’ locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal `"C"` locale, `gawk` treats ‘4,321’ as ‘4’, while in the Danish locale, it’s treated as the full number, 4.321.

Some earlier versions of `gawk` fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, since their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the `--use-lc-numeric` option (see Options) to force `gawk` to use the locale’s decimal point character. (`gawk` also uses the locale’s decimal point character when in POSIX mode, either via `--posix` or the `POSIXLY_CORRECT` environment variable, as shown previously.)

Table 6.1 describes when the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.

| Feature | Default | `--posix` or `--use-lc-numeric` |
|---|---|---|
| `%'g` | Use locale | Use locale |
| `%g` | Use period | Use locale |
| Input | Use period | Use locale |
| `strtonum()` | Use period | Use locale |

Table 6.1: Locale decimal point versus a period

Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way `gawk` converts some special string values to numbers. The details are presented in POSIX Floating Point Problems.

### (32)

Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.
