ECL implements all stream types described in the ANSI Common Lisp standard. Additionally, when configured with the option --enable-clos-streams, ECL includes a version of Gray streams in which any object that implements the appropriate methods (stream-input-p, stream-read-char, etc.) is a valid argument for functions that expect streams, such as read, print, etc.
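As a minimal sketch of such a user-defined stream, the following class serves characters from a string. It assumes that the Gray stream symbols live in a package named gray, as in ECL builds with CLOS streams enabled; the names string-source, source-text, source-pos and read-all-chars are hypothetical and chosen only for this illustration.

```lisp
;; Assumption: the Gray stream API is exported from the GRAY package.
(defclass string-source (gray:fundamental-character-input-stream)
  ((text :initarg :text :reader source-text)
   (pos  :initform 0    :accessor source-pos)))

;; Return the next character, or :EOF once the text is exhausted.
(defmethod gray:stream-read-char ((s string-source))
  (let ((pos (source-pos s)))
    (if (< pos (length (source-text s)))
        (prog1 (char (source-text s) pos)
          (incf (source-pos s)))
        :eof)))

;; Step back one character so that PEEK-CHAR and the reader can look ahead.
(defmethod gray:stream-unread-char ((s string-source) ch)
  (declare (ignore ch))
  (decf (source-pos s))
  nil)

;; Hypothetical helper: drain a stream with the standard READ-CHAR.
(defun read-all-chars (stream)
  (loop for ch = (read-char stream nil nil)
        while ch
        collect ch))
```

An instance created with (make-instance 'string-source :text "abc") can then be passed to read-char or read like any built-in stream.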
ECL distinguishes between two kinds of streams: character streams and byte streams. Character streams only accept and produce characters, written or read one at a time with write-char or read-char, or in chunks with write-sequence or any of the Lisp printer functions. Character operations are conditioned by the external format, as described in Section 19.1.3.
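The three ways of writing to a character stream mentioned above can be sketched with a string output stream, which avoids touching the file system. The function name demo-char-output is hypothetical.

```lisp
(defun demo-char-output ()
  "Collect output written one character at a time, in a chunk, and via the printer."
  (with-output-to-string (out)
    (write-char #\H out)          ; a single character
    (write-sequence "ello" out)   ; a chunk of characters
    (princ 42 out)))              ; a Lisp printer function
```

Calling (demo-char-output) returns the string "Hello42".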
ANSI Common Lisp also supports binary streams, in which input and output are performed in chunks of bits. Binary streams are created with the function open, passing a subtype of integer as the :element-type argument; the implementation is free to round that integer type up to the closest size it supports. In particular, ECL rounds the size up to a multiple of a byte. For example, the form (open "foo.bin" :direction :output :element-type '(unsigned-byte 13)) will open the file foo.bin for writing, using 16-bit words as the element type.
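One way to observe the rounding is to ask the stream for its actual element type after opening it. The sketch below uses only standard functions; the name rounded-element-type and the path passed to it are hypothetical, and the exact type returned depends on the implementation.

```lisp
(defun rounded-element-type (path)
  "Open PATH for 13-bit output, write one element, and return the element
type that the implementation actually chose for the stream."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :element-type '(unsigned-byte 13))
    (write-byte 5000 out)          ; 5000 < 2^13, so it fits the requested type
    (stream-element-type out)))
```

Under the rounding rule described above, ECL would be expected to report a 16-bit type here; whatever the implementation returns, it must be able to hold every (unsigned-byte 13) value.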
An external format is an encoding for characters that maps character codes to a sequence of bytes, in a one-to-one or one-to-many fashion. External formats are also known as "character encodings" in the programming world and are an essential ingredient to be able to read and write text in different languages and alphabets.
ECL has extensive support for external formats, covering the usual codepages from the Windows and Unix worlds as well as the more recent UTF-8, UCS-2 and UCS-4 formats, with big- and little-endian variants of the UCS formats and different encodings for the newline character.
However, the set of supported external formats depends on the size of the space of character codes. When ECL is built with Unicode support (the default option), it can represent all known characters from all known codepages, and thus all external formats are supported. When ECL is built with the restricted character set, it can only use one codepage (the one provided by the C library), with a few variants for the representation of end-of-line characters.
In ECL, an external format designator is defined recursively as either a symbol or a list of symbols. The grammar is as follows:
external-format-designator := symbol | ( {symbol}+ )
The table of known symbols is shown below. Note how some symbols (:cr, :little-endian, etc.) just modify other external formats.
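The shape required by this grammar can be checked with a small predicate. This is only a sketch: the function external-format-designator-p is hypothetical, not part of ECL, and it tests the syntax alone, not whether each symbol is a known external format.

```lisp
(defun external-format-designator-p (object)
  "Return true if OBJECT is a symbol or a non-empty proper list of symbols."
  (or (symbolp object)                      ; a single symbol (NIL is a symbol too)
      (and (consp object)                   ; a non-empty list ...
           (null (cdr (last object)))       ; ... that is not dotted ...
           (every #'symbolp object))))      ; ... and contains only symbols
```

For instance, both :utf-8 and (:utf-8 :crlf) satisfy the predicate, while the string "utf-8" does not.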
Table 19.1. Stream external formats
Symbols | Codepage or encoding | Unicode required |
---|---|---|
:cr | #\Newline is Carriage Return | No |
:crlf | #\Newline is Carriage Return followed by Linefeed | No |
:lf | #\Newline is Linefeed | No |
:little-endian | Modify UCS to use little endian encoding. | No |
:big-endian | Modify UCS to use big endian encoding. | No |
:utf-8 ext:utf8 | Unicode UTF-8 | Yes |
:ucs-2 ext:ucs2 ext:utf-16 ext:utf16 ext:unicode | UCS-2 encoding with BOM. | Yes |
:ucs-2le ext:ucs2le ext:utf-16le | UCS-2 with little-endian encoding | Yes |
:ucs-2be ext:ucs2be ext:utf-16be | UCS-2 with big-endian encoding | Yes |
:ucs-4 ext:ucs4 ext:utf-32 ext:utf32 | UCS-4 encoding with BOM. | Yes |
:ucs-4le ext:ucs4le ext:utf-32le | UCS-4 with little-endian encoding | Yes |
:ucs-4be ext:ucs4be ext:utf-32be | UCS-4 with big-endian encoding | Yes |
ext:iso-8859-1 ext:iso8859-1 ext:latin-1 ext:cp819 ext:ibm819 | Latin-1 encoding | Yes |
ext:iso-8859-2 ext:iso8859-2 ext:latin-2 ext:latin2 | Latin-2 encoding | Yes |
ext:iso-8859-3 ext:iso8859-3 ext:latin-3 ext:latin3 | Latin-3 encoding | Yes |
ext:iso-8859-4 ext:iso8859-4 ext:latin-4 ext:latin4 | Latin-4 encoding | Yes |
ext:iso-8859-5 ext:cyrillic | Cyrillic encoding | Yes |
ext:iso-8859-6 ext:arabic ext:asmo-708 ext:ecma-114 | Arabic encoding | Yes |
ext:iso-8859-7 ext:greek8 ext:greek ext:ecma-118 | Greek encoding | Yes |
ext:iso-8859-8 ext:hebrew | Hebrew encoding | Yes |
ext:iso-8859-9 ext:latin-5 ext:latin5 | Latin-5 encoding | Yes |
ext:iso-8859-10 ext:iso8859-10 ext:latin-6 ext:latin6 | Latin-6 encoding | Yes |
ext:iso-8859-13 ext:iso8859-13 ext:latin-7 ext:latin7 | Latin-7 encoding | Yes |
ext:iso-8859-14 ext:iso8859-14 ext:latin-8 ext:latin8 | Latin-8 encoding | Yes |
ext:iso-8859-15 ext:iso8859-15 ext:latin-9 ext:latin9 | Latin-9 encoding | Yes |
ext:dos-cp437 ext:ibm-437 | IBM CP 437 | Yes |
ext:dos-cp850 ext:ibm-850 ext:cp850 | IBM CP 850 | Yes |
ext:dos-cp852 ext:ibm-852 | IBM CP 852 | Yes |
ext:dos-cp855 ext:ibm-855 | IBM CP 855 | Yes |
ext:dos-cp860 ext:ibm-860 | IBM CP 860 | Yes |
ext:dos-cp861 ext:ibm-861 | IBM CP 861 | Yes |
ext:dos-cp862 ext:ibm-862 ext:cp862 | IBM CP 862 | Yes |
ext:dos-cp863 ext:ibm-863 | IBM CP 863 | Yes |
ext:dos-cp864 ext:ibm-864 | IBM CP 864 | Yes |
ext:dos-cp865 ext:ibm-865 | IBM CP 865 | Yes |
ext:dos-cp866 ext:ibm-866 ext:cp866 | IBM CP 866 | Yes |
ext:dos-cp869 ext:ibm-869 | IBM CP 869 | Yes |
ext:windows-cp932 ext:windows-932 ext:cp932 | Windows CP 932 | Yes |
ext:windows-cp936 ext:windows-936 ext:cp936 | Windows CP 936 | Yes |
ext:windows-cp949 ext:windows-949 ext:cp949 | Windows CP 949 | Yes |
ext:windows-cp950 ext:windows-950 ext:cp950 | Windows CP 950 | Yes |
ext:windows-cp1250 ext:windows-1250 ext:ms-ee | Windows CP 1250 | Yes |
ext:windows-cp1251 ext:windows-1251 ext:ms-cyrl | Windows CP 1251 | Yes |
ext:windows-cp1252 ext:windows-1252 ext:ms-ansi | Windows CP 1252 | Yes |
ext:windows-cp1253 ext:windows-1253 ext:ms-greek | Windows CP 1253 | Yes |
ext:windows-cp1254 ext:windows-1254 ext:ms-turk | Windows CP 1254 | Yes |
ext:windows-cp1255 ext:windows-1255 ext:ms-hebr | Windows CP 1255 | Yes |
ext:windows-cp1256 ext:windows-1256 ext:ms-arab | Windows CP 1256 | Yes |
ext:windows-cp1257 ext:windows-1257 ext:winbaltrim | Windows CP 1257 | Yes |
ext:windows-cp1258 ext:windows-1258 | Windows CP 1258 | Yes |
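The designators in the table are passed as the :external-format argument of open or with-open-file. The sketch below combines an encoding with a newline modifier, using the list form of the grammar given earlier, and then reads the file back as raw bytes to see the effect. It assumes an ECL build with Unicode support; the helper names write-crlf-utf8 and file-bytes and any path passed to them are hypothetical.

```lisp
(defun write-crlf-utf8 (path text)
  "Write TEXT plus a newline to PATH as UTF-8 with CR+LF line endings."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format '(:utf-8 :crlf))
    (write-line text out)))

(defun file-bytes (path)
  "Return the contents of PATH as a list of 8-bit bytes."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop for b = (read-byte in nil nil)
          while b
          collect b)))
```

After writing a short string, inspecting the result of file-bytes shows how the single #\Newline character written by write-line is encoded as a carriage return followed by a linefeed.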