From Newsgroup: comp.lang.forth
"Kerr-Mudd, John" <admin@127.0.0.1> writes:
> Unicode?
Here's an example from Figure 1 of our 2005 paper [ertl&paysan05]:
: +L+E+c+o+L ." a+ea+#a+Oa+Ua+|a+Oa+Ua+ua+#" ;
cr +L+E+c+o+L
On Gforth-0.7.3 (from 2008), development Gforth, lxf, and vfx64 this prints
a+ea+#a+Oa+Ua+|a+Oa+Ua+ua+#
as intended. On iForth-5.1 mini this produces the intended output, but
pasting the code into iForth shows the code as:
FORTH> : N++N++N++N++N++N++N++N++N++N++ ." N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++N++" ; ok
FORTH> cr N++N++N++N++N++N++N++N++N++N++
SwiftForth 4.0.0-RC89 shows the input as intended, but reports an error
on the definition and, consequently, also on the use of the word. If I
instead write
: foo ." a+ea+#a+Oa+Ua+|a+Oa+Ua+ua+#" ;
foo
SwiftForth 4.0.0-RC89 produces the intended output (a+ea+#a+Oa+Ua+|a+Oa+Ua+ua+#).
In any case, several Forth systems handle Unicode (in UTF-8 form)
fine, including in word names. This is mainly due to the great design
of UTF-8, but it does answer your question.
The xchar wordset (an early version of which is proposed in the paper)
has been standardized in Forth-2012, but you actually rarely need to
use words from it.
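
To illustrate the point about rarely needing xchar words, here is a
minimal sketch (not taken from the paper). It assumes a Forth-2012
system with the xchar wordset that is running with UTF-8 encoding;
xc-count and demo are hypothetical names made up for this example, and
the test string is arbitrary.

```forth
\ Sketch only: xc-count is a hypothetical helper, not a standard word.
\ It counts extended characters by stepping with XCHAR+ (xchar wordset).
: xc-count ( c-addr u -- n )
  0 >r                  \ keep the running count on the return stack
  over + swap           ( end-addr addr )
  begin
    2dup u>             \ continue while addr is below end-addr
  while
    xchar+              \ advance over one (possibly multi-byte) xchar
    r> 1+ >r
  repeat
  2drop r> ;

: demo ( -- )
  s" grüße" 2dup type cr   \ plain string words need no xchar support
  dup . cr                 \ length in chars (bytes under UTF-8)
  xc-count . cr ;          \ length in xchars
demo
```

Under the stated assumptions, the string occupies 7 bytes but contains
5 xchars, so the two counts should differ. Only xc-count, which looks
at individual characters, uses an xchar word; S" and TYPE just pass the
bytes through unchanged.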
I used xterm for this work, which showed the Hebrew text
left-to-right, while Emacs 28.2 shows the two input lines
right-to-left (including the parts that are in a left-to-right script;
mixing the two results in tough layout problems).
@InProceedings{ertl&paysan05,
author = {M. Anton Ertl and Bernd Paysan},
title = {Xchars or {Unicode} in {Forth}},
crossref = {euroforth05},
pages = {16--20},
url = {https://www.complang.tuwien.ac.at/papers/ertl%26paysan05.ps.gz},
pdfurl = {https://www.complang.tuwien.ac.at/anton/euroforth2005/papers/ertl%26paysan05.pdf},
OPTnote = {not refereed},
abstract = {When dealing with different scripts at the same time
(e.g., Latin, Greek, Cyrillic), or with Chinese
ideograms, 8-bit fixed-width characters are too
narrow. However, many Forth programs have an
environmental dependency on $\code{1 chars}=1$, so
just making Forth characters wider would cause quite
a lot of portability problems. We propose to add
xchars for dealing with potentially wider,
variable-width characters. This extension is
relatively painless, requiring changes in only those
program parts that work with individual characters,
if they should work with the extended characters;
uses of string words need no changes to work with
extended characters. The xchar words can also be
implemented on 8-bit-only Forth systems, so programs
written to use xchars can also work on such
systems.}
}
@Proceedings{euroforth05,
title = {21st EuroForth Conference},
booktitle = {21st EuroForth Conference},
year = {2005},
key = {EuroForth'05},
editor = {M. Anton Ertl},
url = {https://www.complang.tuwien.ac.at/anton/euroforth2005/papers/proceedings.pdf}
}
- anton
--
M. Anton Ertl
http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs:
http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard:
https://forth-standard.org/
EuroForth 2025 CFP:
http://www.euroforth.org/ef25/cfp.html
EuroForth 2025 registration:
https://euro.theforth.net/