Subject: Re: Clean punctuation problem
From: rpw3@rpw3.org (Rob Warnock)
Date: Wed, 11 Apr 2007 21:36:50 -0500
Newsgroups: comp.lang.lisp
Message-ID: <dp6dnRflv9_fBoDbnZ2dnUVZ_jqdnZ2d@speakeasy.net>
Espen Vestre  <espen@vestre.net> wrote:
+---------------
| Szymon <delme.ssbm2@o2.pl> writes:
| > hm, there are/was computers with variable BYTE length.
| > There are/was computers with 4, 6, 7, 9, ... (?) bit bytes.
| > and so on.
| 
| You forgot 5 bits, which was/is used in TELEX communication.
| And TOPS-10 used five-bit bytes in its file name character set!
+---------------

Actually, TOPS-10 used *six*-bit bytes for filenames [uppercase
alpha, digits, and some punctuation, with 6 chars for name + 3 for
type] and for system call names. SIXBIT used exactly the same
characters as ASCII codes #\Space through #\_ (underscore), but
with 32 subtracted from each code. That is, 0 was a SIXBIT #\Space,
and 63 (#o77) was #\_. The following has a table:

    http://nemesis.lonestar.org/reference/telecom/codes/sixbit.html
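
The "ASCII minus 32" rule can be sketched in a few lines of Python.
This is only an illustration, not anything TOPS-10 itself ran: the
function names are made up here, and packing a name left-justified
and space-padded into a 36-bit word is an assumption about the usual
convention rather than something stated above.

```python
# Sketch only: SIXBIT code = ASCII code - 32, for ASCII 32 (space) .. 95 (_).
# All function names are hypothetical, chosen for this example.

def sixbit_encode(ch):
    """Return the 6-bit code (0..63) for one character."""
    code = ord(ch) - 32
    if not 0 <= code <= 63:
        raise ValueError("not representable in SIXBIT: %r" % ch)
    return code

def sixbit_decode(code):
    """Return the character for a 6-bit code (0..63)."""
    if not 0 <= code <= 63:
        raise ValueError("not a SIXBIT code: %r" % code)
    return chr(code + 32)

def sixbit_word(name):
    """Pack up to six characters into one 36-bit word.

    Assumption: left-justified, padded on the right with spaces (code 0).
    """
    if len(name) > 6:
        raise ValueError("at most six characters fit in a 36-bit word")
    word = 0
    for ch in name.ljust(6):
        word = (word << 6) | sixbit_encode(ch)
    return word
```

Lowercase letters fall outside the 32..95 range, so sixbit_encode("a")
raises ValueError, which matches the uppercase-only filename rule.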

All filenames, that is, except on DECtape, where there was a
compressed encoding called RADIX50. The "50" in RADIX50 is actually
octal, #o50 or 40 decimal, so "RADIX50" is actually a radix-40 format,
and supports *only* uppercase alpha, digits, #\., #\$, #\%, and space
[meaning NUL]. The table here:

    http://nemesis.lonestar.org/reference/telecom/codes/radix50.html

has the correct character codes, but doesn't tell you how they
were packed into words. The RADIX50 code used on the PDP-11
packed three RADIX50 characters into 16 bits [I think]:
(+ (* 1600 c1) (* 40 c2) c3), where "c1" is the leftmost of
the three. So "A2" [padded with a trailing space, i.e., codes
11, 3, and 0 from that table] would be encoded as 17720 (#x4538).
On the PDP-10, you could pack six RADIX50 characters into 32 bits
[since 40^6 is just under 2^32], leaving 4 bits for some flags
they just *had* to squeeze into the PDP-10 DECtape directories
for some arcane reason that escapes me at the moment...  ;-}
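
Here is a minimal Python sketch of that base-40 packing, for anyone
who wants to check the arithmetic. The names R50_CHARS, r50_pack, and
r50_unpack are invented for this example, and the character order
[space, digits, letters, then . $ %] is an assumption chosen to
reproduce the "A2" value above; padding short strings with trailing
spaces is likewise an assumption.

```python
# Sketch only. Assumed code table: space=0, 0-9 = 1..10, A-Z = 11..36,
# then "." "$" "%" = 37..39 (40 codes in all).
R50_CHARS = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.$%"

def r50_pack(text, width):
    """Pack `text` as `width` base-40 digits, leftmost character first.

    Short strings are padded on the right with spaces (an assumption).
    str.index raises ValueError for characters outside the table.
    """
    if len(text) > width:
        raise ValueError("too many characters for this width")
    value = 0
    for ch in text.ljust(width):
        value = value * 40 + R50_CHARS.index(ch)
    return value

def r50_unpack(value, width):
    """Inverse of r50_pack: recover `width` characters from the integer."""
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 40)
        chars.append(R50_CHARS[digit])
    return "".join(reversed(chars))
```

With width 3 this is the same as (+ (* 1600 c1) (* 40 c2) c3), and
r50_pack("A2", 3) gives 17720. With width 6 the largest possible value
is 40^6 - 1 = 4095999999, which still fits in 32 bits.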

ISTR that RADIX50 was also used inside the PDP-10 assembler,
MACRO-10, and in the linker's symbol table format, again because
you could pack six chars & those extra four flag bits into a
36-bit word.
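
As a rough illustration of "six chars plus four flag bits in one
36-bit word", here is a Python sketch. Everything specific in it is an
assumption made for the example: the function names, the character
table order, the trailing-space padding, and in particular putting the
flags in the top four bits with the RADIX50 value in the low 32. Check
the MACRO-10 documentation before relying on any of those details.

```python
# Sketch only; layout assumed: [4 flag bits][32-bit RADIX50 value].
R50_CHARS = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.$%"

def symbol_word(flags, name):
    """Combine a 4-bit flag field and up to six RADIX50 characters."""
    if not 0 <= flags <= 0xF:
        raise ValueError("flags must fit in 4 bits")
    if len(name) > 6:
        raise ValueError("at most six characters")
    value = 0
    for ch in name.ljust(6):          # pad with spaces (assumption)
        value = value * 40 + R50_CHARS.index(ch)
    return (flags << 32) | value      # value < 40**6 < 2**32

def split_symbol_word(word):
    """Return (flags, six-character name) from a 36-bit word."""
    flags = word >> 32
    value = word & 0xFFFFFFFF
    chars = []
    for _ in range(6):
        value, digit = divmod(value, 40)
        chars.append(R50_CHARS[digit])
    return flags, "".join(reversed(chars))
```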



-Rob

p.s. For those not familiar, the DEC PDP-10 hardware provided
byte access (including auto-incrementing) for all sizes of
bytes from 1 through 36 bits. The only restriction was that
bytes had to fit within the 36-bit PDP-10 words. When the byte
size didn't divide 36 evenly, the remaining bits were "wasted". E.g.,
the most common format for English plaintext was 7-bit ASCII,
stored five bytes per 36-bit word, with the least significant
bit of each word unused [except... well, that's another story].
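
The five-per-word ASCII layout can be sketched in Python as follows.
This only models the bit arithmetic, not the PDP-10 byte instructions
themselves. The function names are made up, and padding a short final
word with NULs is an assumption.

```python
# Sketch only: models 36-bit words as Python integers.
WORD_BITS = 36

def bytes_per_word(size):
    """Return (bytes that fit in one word, leftover "wasted" bits)."""
    if not 1 <= size <= WORD_BITS:
        raise ValueError("byte size must be 1..36 bits")
    return WORD_BITS // size, WORD_BITS % size

def pack_ascii7(text):
    """Pack 7-bit characters five to a word, leftmost character highest.

    The least significant bit of every word is left clear (unused).
    A short final group is padded with NULs (an assumption).
    """
    words = []
    for i in range(0, len(text), 5):
        word = 0
        for ch in text[i:i + 5].ljust(5, "\0"):
            code = ord(ch)
            if code > 127:
                raise ValueError("not 7-bit ASCII: %r" % ch)
            word = (word << 7) | code
        words.append(word << 1)       # skip over the unused low bit
    return words

def unpack_ascii7(words):
    """Inverse of pack_ascii7; trailing NUL padding is stripped."""
    chars = []
    for word in words:
        for shift in (29, 22, 15, 8, 1):
            chars.append(chr((word >> shift) & 0x7F))
    return "".join(chars).rstrip("\0")
```

For example, bytes_per_word(7) returns (5, 1): five bytes and one
leftover bit, whereas 6-bit and 9-bit bytes fill the word exactly.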

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607