A truly International Standard
Peter J. Knaggs
Computing and Information Systems,
University of Paisley, High Street,
Paisley, Scotland. PA1 2BE
pjk@bcs.org.uk
Abstract
When ANS Forth was developed a number of issues where unaddressed. The ability
to provide multilingual programs being one of them. This is understandable, as
they where developing an American Standard. When this standard was
accepted by the ISO as an international standard they have asked the ANS Forth
committee to address the issues of international programming.
1. Background
When the American National Standards (ANS) X3/J14 committee where considering
the ANS Forth standard (X3.215-1994) they took a very "open"
attitude to this. When the final document was accepted in 1994 it was probably
the most openly discussed standard to date, and most defiantly the most openly
discussed Forth standard. There are a number of areas the committee did not
address itself to, one of these being 'internationalisation'. The
committee did receive at least one proposal on this matter, but correctly
decided that they where dealing with an American standard and thus need not
open this particular can of worms.
In 1997 the International Organisation for Standardisation (ISO) and
International Electrotechnical Commission (IEC) accepted ANS Forth as an
"International Standard" (ISO/IEC 15145:1997(E)). When accepting the
standard the British Standards Institute (BSI) made comments concerning the
lack of internationalisation and has asked the X3/J14 technical committee (TC)
to address this issue when it reconvenes in 1998. To quote from the
"Disposition of Comments Report" (International Organization
for Standardization 1997):
Disposition of Comments Report
ISO/IEC DIS 15145 - Programming Language Forth
The editorial comments from the Netherlands and the ISO Central
Secretariat have been incorporated in the revised DIS.
The comments from the United Kingdom concerning Internationalization
issues and requirements for embedded systems programmed in Forth will
be addressed when the TC reconvenes in 1998. At that time, we will
consider these and other needs that have arisen in this evolving
technology.
This paper looks at some of the issues that will need to be addressed in the
forthcoming TC meeting. A topic of interest to the European Forth community
(Borrell 1990; Nelson 1995; Nelson and Shermer 1996).
2. Issues
There are in essence four areas of regional variance.
2.1. Date Format
There are two separate date formats the need to be specified, the 'short' and
'long' formats. There are three areas of difference for the 'short' date
format:
- The date separator - the character that is used to separate the
day, month and year fields.
- The order of the day, month and year fields.
- The size of the three fields. For the day and month fields we need to
know if the leading zero should be displayed. For the year field it
is necessary to know if the century should be displayed or not.
Some countries prefer the month to be displayed as a three letter abbreviation
rather than a number. This option has not been catered for in the above
description.
In the 'long' format the names of the day and month (in both abbreviated and
full form) may be displayed, and are thus subject to language differences.
There are five additional formatting considerations:
- Whether the day of month is displayed, if so then whether it is shown
with or without a leading zero.
- Whether the day of week is shown, if so then in abbreviated or full form.
- Whether the month is shown as a number (with or without leading zero),
in abbreviated or full form.
- Whether the year is shown in full (with century) or abbreviated (without
century) form.
- The order of the fields (day of week, day of month, month and year)
including any separating text, such as commas or spaces.
2.2. Time Format
The time format is the least complex of all the variants to be considered.
In many ways it is similar to the short date format. There are only five
formatting considerations to be taken into account.
- The time separator - the character that is used to separate the
hour, minute and seconds fields.
- The order of the hour, minute and seconds fields.
- Whether the time is displayed as 24- or 12-hour clock.
- The pre- and post- meridian indicators.
- When using the 12-hour clock, whether the meridian indicator is before
or after the time.
Technically we should allow for the hour, minute and seconds fields to be
displayed with or without a leading zero, however no county (to the authors
knowledge) allows the minute and seconds fields to be displayed without a
leading zero. The hours field is normally dependent on the method of display,
that is to say in the 12-hour clock the leading zero is not displayed, while
in the 24-hour clock it is.
2.3. Number Format
There are a surprising number of formatting considerations that need to be
taken into account when displaying a number. First let us consider the
unsigned integer:
- The group separator - the character that is used to separate
large numbers into smaller groups for ease of reading.
- The size of the groupings.
Whilst is has become traditional in the computing world to display large
numbers without thought of formatting, it is traditional in the non-computing
world to group large numbers into smaller groupings in order to ease reading.
For example, in the UK it is traditional to provide groupings of three digits
separated by a comma:
Computing Display: | 4294967295
|
UK Non-computing Display: | 4,294,967,295
|
For smaller numbers this is not an issue (numbers <10000). For larger
numbers it becomes important that the user (not necessarily computer literate)
can interpret the number correctly at a glance, thus the grouping becomes an
issue.
Let us now turn to the signed integer, in addition to the considerations for
the unsigned integer, there are:
- The positive and negative sign symbols.
- Whether the sign symbol is displayed before or after the number.
- Whether there is a space between the sign symbol and the number.
- Whether the positive symbols is displayed for positive numbers.
Note that some countries allow negative numbers to be displayed in parentheses
(i.e., "(n)"). This option is not accounted for in the above
description.
Moving on to floating point numbers, the following considerations are added to
those for a signed integer:
- The decimal separator.
- Whether a leading zero should be displayed for numbers less than 1 or
not.
This makes the assumption that the least number of significant digits are
shown in the fractional part of the number. Indeed if the fractional part is
zero then it not shown at all. I.e., that "1.50" is displayed as
"1.5" and that "1.0" is displayed as "1". This
assumption may be overcome by specifying a minimum number of decimal places.
Thus the set of considerations necessary to correctly display a (signed
floating point) number are:
- The group separator.
- The size of the groupings.
- The positive and negative sign symbols.
- Whether the sign symbol is displayed before or after the number.
- Whether there is a space between the sign symbol and the number.
- Whether the positive symbols is displayed for positive numbers.
- The decimal separator.
- Whether a leading zero should be displayed for numbers less than 1 or
not.
- No of decimal places (optional).
Note that in the UK the group and decimal separators are the comma and decimal
point respectively. In many European countries these are reversed.
2.4. Currency Format
Most if not all currency systems (currently known to the author) are now based
on the decimal system, thus the currency format automatically inherits the
nine formatting considerations for floating point numbers in addition to
having its own considerations:
- The Currency Symbol.
- Whether the currency symbol is displayed before or after the value.
- Whether there is a space between the currency symbol and the value.
- Whether the sign appears before or after the currency symbol.
Some countries allow negative currency values to be displayed in parentheses
(I.e., "(£n)"), this option is not accounted for in
this description.
Given these thirteen formatting considerations (one could even add a
"negatives in parentheses" option, making it fourteen
considerations) it should be possible to display any decimal based currency in
the format of the target country.
3. Addressing the Issues
A quick survey of existing systems show two fundamental methods of addressing
these issues. In Unix/Posix (Sun Microsystems, Inc. 1994) the
environ
function returns a data structure which holds the setting
for each configuration option for a given country. The number of configuration
options provided for are more limited than those outline in section 2. In
Windows ('95 and NT) (Microsoft 1995) and OS/2 (IBM
1994) a set of special display routines are provided to display the
date/time/number/currency in the required format. These routines take all of
the identified considerations into account when displaying the object.
In this section a method of addressing these issues is presented which lies
between these two extremes.
There are a number of ways the interface could be implemented. One could
define the display words as deferred words, resolving to the actual display
routine once the location has been selected. The display words could access a
jump table to access the correct display routine for the given country code.
One could define the words to operate based on an internal data structure,
somewhat like the Unix system, changing the values in the data structure when
the country location is set. Alternatively, one could extend the
ENVIRONMENT?
table to include the fourteen considerations, and
define the display words to operate off this table.
The two table methods have the advantage of allowing for "user
defined" environment, that may not have been taken into account in the
pre-configured country settings. Provided that the configuration options
(formatting considerations) are accessible to the user/programmer.
3.1. Setting Location
A "country indicator" is used to identify the country concerned.
This can be either a country code (such as 44
for the UK) or a
language string ("en-GB
" for "English [UK]").
The language string is more flexible, and allows for Dutch speaking Belgium
(nl-BE
) and French speaking Belgium (fr-BE
).
However, it may not be as specific as the numerical codes. Both the country
code and the language string systems are supported by ISO standards.
A word is also provided to extract the current country setting in order to
allow the application to provide user messages in the appropriate language.
(See Borrell (1990) or Nelson (1995) for comments on multilingual
programming.)
SET-COUNTRY
( country - )
- This will take a country indicator and set the display words to display
their object in the default format required for the indicated country.
GET-COUNTRY
( - country )
- Will return the country indicator the system is currently using.
3.2. Date/Time Format
A number of new words are defined to take a parameter (the date or time) and
display this in a country specific manner. The parameter may be a day, month,
year or a hour, minute, second set or a Julian date/time value.
Using a Julian date/time value will allow the system to produce a single value
to indicate the date/time of any item. The date/time display words would
interpret this value for display using the current country settings. However,
a number of additional Julian value handling words would need to be
introduced.
Alternatively, using the day, month, year or the hour, minute, second sets
will allow direct interfacing to the current TIME&DATE
word.
Thus the sequence:
TIME&DATE .DATE SPACE .TIME
will display the current date/time in the format required by the current
country setting.
.DATE
( day month year - ) or ( Julian - )
- This can be defined to take either a Julian date/time value or a day,
month, year set and display the date in the current 'short' date format.
.FULLDATE
( day month year - ) or ( Julian - )
- Like
.DATE
this can be defined to take either a Julian
date/time value or a day, month, year set and will display the given date in
the current 'long' date format.
.TIME
( hour minute second - ) or ( Julian - )
- Take a hour, minute, second set or a Julian date/time value and display
the given time using the current format.
3.3. Number Format
Either the current numeric display words (.
, .R
,
U.
, U.R
, D.
, D.R
,
F.
) should be modified to display the number according to the
format options or, a more viable alternative would be to, define a number of
new words to display numbers in a country specific manner:
.NUM
( n - )
- Display the signed integer n in a country specific manner.
.RIGHT
( n1 n2 - )
- Display the signed integer n1 in a country specific
manner, right aligned in a field n2 characters wide.
U.NUM
( u - )
- Display the unsigned integer u in a country specific manner.
U.RIGHT
( u n - )
- Display the unsigned integer u in a country specific manner, right
aligned in a field n characters wide.
D.NUM
( d - )
- Display the signed double number d in a country specific manner.
D.RIGHT
( d n - )
- Display the signed double number d in a country specific manner,
right aligned in a field n characters wide.
.FLOAT
( F: r - ) or ( r - )
- Display the floating point number r in a country specific manner.
Note that in all cases the number is automatically displayed in decimal no
matter the value of BASE
. The BASE
variable is not
effected by any of these words.
Where a number is to be right aligned, if the number of characters required to
display the number is greater than the number of characters provided then all
digits of the number are displayed with no leading spaces in a field as wide
as necessary.
3.4. Currency Format
In an attempt to provide a single method of handling all currencies, the
currency value should be an integer value treated as a fixed point number
with the least significant digit representing the smallest currency unit.
I.e., 1 indicates 1 pence, while 100 represents £1 (100 pence).
With this definition in mind we have the following word definitions:
$.
( n - )
- Display the signed number n as a fixed point currency value in a
country specific manner.
$.RIGHT
( n1 n2 - )
- Display the signed number n1 as a fixed point currency
value in a country specific manner, right aligned in a field
n2 characters wide.
D$.
( d - )
- Display the signed double number d as a fixed point currency value
in a country specific manner.
D$.RIGHT
( d n - )
- Display the signed double number d as a fixed point currency value
in a country specific manner, right aligned in a field n characters
wide.
F$.
( F: r - ) or ( r - )
- Display the floating point number r as a currency value in a country
specific manner.
F$.RIGHT
( n - ) ( F: r - )
or ( r n - )
- Display the floating point number r as a currency value in a country
specific manner, right aligned in a field n characters wide.
4. Summary
Section 1 provides an introduction to the problem of internationalisation and
the Forth standard's position relating international or multilingual
programming. Section 2 identifies some twenty seven formatting considerations
required to support the provision of multilingual application. These relate to
the display of the date, time, numbers, and currency.
Section 3 presents a method of addressing the issues identified in the previous
section. There are a number of points of interest raised in this section.
- There are a number of different methods available to implement the
interface outlined in section 3, however, the preferred method is the
use of a table of the formatting options. The table can be loaded with
pre-defined defaults for a set of known countries. However, if the
table is available to the user, it would be possible to define
additional environments that are not catered for in the pre-defined
setting.
- A country code is required to identify a set of pre-defined values. This
may be either a country code (
44
for the UK) or a
language/region string ("en-GB
" for "English
[UK]"). Both methods are supported by ISO definitions. It is not
known which is more specific.
- The concept of using a Julian date/time value to indicate a date/time is
assumed in section 3.2. This would allow a system to provide a single
value to indicate the date/time and provides a flexible method of
handling date/time that is totally system independent.
It should be noted that the introduction of a Julian date/time value
will require a the definition of a number of additional Julian value
handling words.
Section 3.2 provide an alternative definition of the date/time display
words based on the current system without the Julian value facility.
- Currency is defined to be represented as a fixed point integer value,
with the least significant value representing the smallest unit of
currency. Two words are defined to convert an integer value into the
currency display required for the given country.
A currency display word is also defined which takes a floating point
number and displays this as a currency in the format required by the
current country setting. This allows floating point numbers to be used
as an alternative to the fixed point integer representation.
References
- Borrell, R. (1990, September).
- Deferred Language Translation.
In euroFORML'90 Conference Proceedings, MPE Ltd, 133 Hill Lane,
Southampton SO1 5AF UK. Forth Interest Group.
- IBM (1994).
- User's Guide to OS/2 Warp.
International Business Machines Corporation.
- ISO (1997).
- Disposition of Comments Report:
ISO/IEC DIS 15145 - Programming Language Forth.
comp.lang.forth
.
- Microsoft (1995).
- On-Line Manuals. Microsoft.
- Nelson, N. J. (1995, October).
- Internationalization of Windows applications using Forth.
In euroForth '95 Conference Proceedings,
DELTA-t, Adenauerallee 54, D-20097 Hamburg, GERMANY.
Forth Interest Group.
- Nelson, N. J. and K. Shermer (1996, October).
- Far eastern Forth.
In euroForth '96 Conference Proceedings,
DELTA-t, Adenauerallee 54, D-20097 Hamburg, GERMANY.
Forth Interest Group.
- Sun Microsystems, Inc. (1994).
- On-Line Manuals.
Sun Microsystems, Inc.