Expanding the MC6847: Deciphering Fonts

Searching for Information

As I was trying to decipher how to leverage an external character font ROM with the MC6847, I searched for existing projects and products in this space. My web searches first found Ed Snider’s (Zippster of ZippsterZone) reproduction of the Green Mountain Micro (GMM) “lowerkit” product from “back in the day”. Ed shared a bit of technical detail and project progress, but no schematics or additional information that might answer questions on how the circuit worked. I also found a reproduction of the lowercase daughterboard for the Dragon 200e (which also works for the Dragon 32/64) produced by DragonPlus Electronics. As with Ed’s page, DragonPlus offered some technical information (build instructions) but no schematic. Still, I had a start.

After changing some search terms, I found a few downloads that purported to be for the “lowerkit” product on a TRS-80 Model 1 web site. Although I was confused about why files for a CoCo product were on a web archive for a different machine, I downloaded them anyway, and searching for those same filenames turned up related files on a few other web archives. In the end, I found:

Initial Results

Extracting all of the files, I found a number of binary files in the archives, but they seemed too big (3kB), whereas the datasheet suggested a 2kB (11 address lines) EPROM.

Loading the files into a hex editor suggested font data, so I grabbed a quick file-reading C program template from the Internet and modified it to spit out the data as a text-based bitmap file, with ‘*’ for a set bit and ‘ ’ for a clear one.

Running chgmath.bin through my simple program provided encouraging results:

Byte: 0000 05 '     * *'
Byte: 0001 06 '     ** '
Byte: 0002 43 ' *    **'
Byte: 0003 48 ' *  *   '
Byte: 0004 47 ' *   ***'
Byte: 0005 4D ' *  ** *'
Byte: 0006 41 ' *     *'
Byte: 0007 54 ' * * *  '
Byte: 0008 01 '       *'
Byte: 0009 00 '        '
Byte: 000A 00 '        '
Byte: 000B 52 ' * *  * '
Byte: 000C 00 '        '
Byte: 000D 00 '        '
Byte: 000E 08 '    *   '
Byte: 000F 14 '   * *  '
Byte: 0010 14 '   * *  '
Byte: 0011 22 '  *   * '
Byte: 0012 22 '  *   * '
Byte: 0013 41 ' *     *'
Byte: 0014 7F ' *******'
Byte: 0015 00 '        '
Byte: 0016 00 '        '
Byte: 0017 00 '        '
Byte: 0018 00 '        '
Byte: 0019 00 '        '
Byte: 001A 00 '        '
Byte: 001B 00 '        '
Byte: 001C 7F ' *******'
Byte: 001D 41 ' *     *'
Byte: 001E 20 '  *     '
Byte: 001F 10 '   *    '
Byte: 0020 08 '    *   '
Byte: 0021 10 '   *    '
Byte: 0022 20 '  *     '
Byte: 0023 41 ' *     *'
Byte: 0024 7F ' *******'
Byte: 0025 00 '        '
Byte: 0026 00 '        '
Byte: 0027 00 '        '
Byte: 0028 00 '        '
Byte: 0029 00 '        '
Byte: 002A 00 '        '
Byte: 002B 00 '        '
Byte: 002C 06 '     ** '
Byte: 002D 09 '    *  *'
Byte: 002E 08 '    *   '
Byte: 002F 08 '    *   '
Byte: 0030 08 '    *   '
Byte: 0031 08 '    *   '
Byte: 0032 08 '    *   '
Byte: 0033 08 '    *   '
Byte: 0034 08 '    *   '
Byte: 0035 08 '    *   '
Byte: 0036 48 ' *  *   '
Byte: 0037 30 '  **    '
Byte: 0038 00 '        '
Byte: 0039 00 '        '
Byte: 003A 00 '        '
Byte: 003B 00 '        '
...
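
For readers who want to reproduce a dump like the one above, here is a minimal sketch. It is written in Python rather than the C program mentioned in the post, and the function names `render_byte` and `dump` are invented for illustration. The only assumption is that each byte is drawn most significant bit first, which matches the listing.

```python
def render_byte(value: int) -> str:
    """Return 8 characters, most significant bit first: '*' if set, ' ' if clear."""
    return "".join("*" if value & (0x80 >> bit) else " " for bit in range(8))


def dump(data: bytes) -> list[str]:
    """Format each byte as: Byte: <offset> <hex value> '<bitmap row>'."""
    return [
        f"Byte: {offset:04X} {value:02X} '{render_byte(value)}'"
        for offset, value in enumerate(data)
    ]
```

Calling `dump(open("chgmath.bin", "rb").read())` and printing each returned line should give output in the same shape as the listing.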

The first 2 bytes of each file were 0x05, 0x06, and the next few bytes looked like ASCII (and they are: the bytes above spell out CHGMAT, which looks like a 6 byte filename). There are 4 more bytes that don’t look like font data, and then font glyph bitmap data in 16 byte chunks.

That answers one question about the datasheet schematic. The row counter address lines are the lowest 4 bits of the font ROM address. I thought they were, since it makes creating the ROM easier (all of the data for each character glyph’s bitmap resides in contiguous memory locations), but it “wastes” 4 bytes per font glyph: the lower 4 bits of the address will never go above 11, according to the datasheet, but the next bitmap will be 16 bytes above the previous one.
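
The address layout described above can be sketched as follows. The names `rom_address` and `glyph_rows` are hypothetical, and the sketch assumes the character code occupies the address bits directly above the 4 row-counter bits.

```python
ROWS_USED = 12      # the row counter only runs 0..11, per the datasheet
ROWS_PER_SLOT = 16  # 4 address bits are reserved for the row counter


def rom_address(char_code: int, row: int) -> int:
    """Combine a character code and a row number into a font ROM address."""
    if not 0 <= row < ROWS_PER_SLOT:
        raise ValueError("row must fit in 4 bits")
    return (char_code << 4) | row


def glyph_rows(rom: bytes, char_code: int) -> bytes:
    """Return the 12 bitmap rows of one glyph, skipping the 4 unused bytes."""
    base = rom_address(char_code, 0)
    return rom[base:base + ROWS_USED]
```

With this layout, glyph 1 starts at address 16 even though glyph 0 only uses addresses 0 through 11, which is where the 4 "wasted" bytes per glyph come from.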

Looking at the rest of the files, some have odd data in bytes 12, 13, 14, and 15 of each character glyph, and there are extra bytes in the file every so often, but I felt like I had made some progress. I decided to create a GitHub repository at https://github.com/go4retro/lowerkit and place my current progress in the charsets directory. I took a stab at trying to clean up the parsing functions and remove some of the extraneous data in the file, noticing that 4 extra bytes appeared every $102 bytes in the file. Theorizing those were some type of “sector” or “block” structure, I parsed them out of the font stream and the results looked even better, save for some oddities at the end of each file.
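
A rough sketch of that first-pass cleanup might look like the following. It is not the code in the repository. It assumes an 8 byte file header (0x05, 0x06, and the 6 byte name) and that every chunk is exactly $102 bytes long, beginning with the 4 non-font bytes. The name `strip_markers` and the constants are made up for illustration.

```python
HEADER_LEN = 8     # assumed: 0x05, 0x06, then the 6 byte name
MARKER_LEN = 4     # the 4 non-font bytes at the start of each chunk
CHUNK_LEN = 0x102  # distance from one group of extra bytes to the next


def strip_markers(raw: bytes) -> bytes:
    """Drop the file header and the 4 extra bytes that lead each chunk."""
    out = bytearray()
    pos = HEADER_LEN
    while pos < len(raw):
        out += raw[pos + MARKER_LEN:pos + CHUNK_LEN]
        pos += CHUNK_LEN
    return bytes(out)
```

Because the sketch treats every chunk as full length, a shorter final chunk or trailing record would still leave stray bytes at the end of the output.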

Deciphering the File Format

Since the files were in a TRS-80 Model 1 archive, I reached out to the TANDY Discord server, relating the oddity about control characters appearing every $102 bytes in the file. George Phillips (of TRS80GP emulator fame) responded that the files were encoded in the TRSDOS /CMD format, and that his TRLD utility could parse the files. I grabbed the utility, followed his directions, and the resulting binary files appeared!

The only issue was that they didn’t appear alone. The TRLD utility extracted the data, but saved the ROM data in what appeared to be a larger memory dump, which makes sense, given how the file format works (it has blocks of data, each of which contains a load address and length, and they could appear anywhere). George’s utility takes the safe approach and parses the file into a blank memory map, and then saves it all to disk. However, since these font ROM file blocks are contiguous, there’s no need to save off the entire map.
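
To make the block structure concrete, here is a minimal sketch of a /CMD record walker. It is neither TRLD nor Jim Lawless’s code. The record layout is an assumption based on common descriptions of the format: a type byte and a length byte, where type 0x01 is a load block with a 2 byte little-endian load address, type 0x02 is the transfer address that ends the file, and type 0x05 is a name header. In a load block the length counts the address bytes, and values 0 to 2 are taken to mean 256 to 258. The function name `parse_cmd` is hypothetical.

```python
def parse_cmd(raw: bytes):
    """Walk /CMD records; return (name, [(load_address, data), ...], transfer)."""
    name, blocks, transfer = None, [], None
    pos = 0
    while pos + 2 <= len(raw):
        rtype, length = raw[pos], raw[pos + 1]
        pos += 2
        if rtype == 0x01:                      # load block
            if length < 3:                     # 0..2 stand for 256..258
                length += 256
            addr = int.from_bytes(raw[pos:pos + 2], "little")
            blocks.append((addr, raw[pos + 2:pos + length]))
            pos += length
        elif rtype == 0x02:                    # transfer address, end of file
            transfer = int.from_bytes(raw[pos:pos + 2], "little")
            break
        else:                                  # header or comment record
            size = length or 256               # assumed: 0 means 256 here
            if rtype == 0x05:
                name = raw[pos:pos + size].decode("ascii", "replace")
            pos += size
    return name, blocks, transfer
```

Under these assumptions, the bytes 01 00 00 52 in the earlier dump read as a load block of 254 data bytes at $5200, and 4 header bytes plus 254 data bytes is exactly the $102 spacing noticed earlier.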

To be fair, George noted it would do this, and suggested I could either look at the first block’s load address in the file (using the diagnostic dump parameter of the utility) and then strip off that many bytes from the file, or dig into the trld source and extract the parse engine to roll my own. While I was debating which to do, I found a blog posting about parsing CMD files from Jim Lawless, a friend from many years back, complete with source code.

Jim’s utility didn’t parse the data per se, but it did parse the structure, and the source was available, so I just lifted my functions from my quick utility and inserted them into this new source. The resulting output was clean, so I then added in an optional parameter to save off the binary data as a contiguous file, suitable for burning to an EPROM.
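
Saving the blocks as one contiguous image could be sketched like this. The function name `flatten_blocks` and the (load address, data) tuple shape are assumptions for illustration, not the interface of the actual utility. The sketch refuses to continue if the blocks leave a gap or overlap, since a ROM image built from them would then be misaligned.

```python
def flatten_blocks(blocks) -> bytes:
    """Join (load_address, data) blocks into one image; reject gaps or overlaps."""
    image = bytearray()
    expected = None
    for addr, data in sorted(blocks, key=lambda block: block[0]):
        if expected is not None and addr != expected:
            raise ValueError(
                f"block at {addr:04X} does not follow the previous one "
                f"(expected {expected:04X})"
            )
        image += data
        expected = addr + len(data)
    return bytes(image)
```

The result can be written out with `open("font.bin", "wb").write(flatten_blocks(blocks))`, where `font.bin` is just a placeholder name.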

The LowerKit Fonts

To make it easier to compare fonts and allow a better way to show what’s in them, I modified one of my simple utilities to brute force create some SVG files, shown below. I apologize for nothing, it was simply a means to an end 🙂
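
In the same brute-force spirit, a glyph can be turned into SVG by emitting one square per set bit. This is only a sketch of the idea and not the code in the drawsvg directory. The function name `glyph_to_svg` and the `cell` size parameter are invented for the example.

```python
def glyph_to_svg(rows: bytes, cell: int = 10) -> str:
    """Draw one filled square per set bit, most significant bit on the left."""
    rects = []
    for y, row in enumerate(rows):
        for x in range(8):
            if row & (0x80 >> x):
                rects.append(
                    f'<rect x="{x * cell}" y="{y * cell}" '
                    f'width="{cell}" height="{cell}"/>'
                )
    width, height = 8 * cell, len(rows) * cell
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )
```

Feeding it the 12 rows of one glyph produces a small standalone image. A full font sheet would need an x and y offset per glyph, which is left out here.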

CHGAPL

CHGEURO

CHGGREEK

CHGKATA

CHGKAT2

Internal name is KATA, just like the one above

CHGMATH

CHGSCR

CHGSMALL

CHGNEW

The SVG images of the fonts are at: https://github.com/go4retro/lowerkit/tree/main/charsets/drawsvg and font binaries are at: https://github.com/go4retro/lowerkit/tree/main/charsets/mkroms.

I extracted all of the binary files from the various archives, but it looks like some are duplicates:

  • CHGAPL2 -> CHGAPL
  • LITTLE2 -> SMALL
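
One way to spot duplicates like these is to group the extracted binaries by a hash of their contents. The sketch below is an illustration rather than the method used for the list above, and the name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

The input is a mapping of file name to file contents, so the caller decides which directory or archive members to read.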

Discovering More Fonts!

While working on the font files themselves, I looked at the BAS files in the various archives, which appear to be Model 1 BASIC files. However, after perusing a few of them, I noticed that they appeared to hold a copy of a font file within the BASIC program. I quickly copied out the BAS files, trimmed off everything except the BASIC ROM data, and compiled a second parser to make sense of this data. CHGEURO.BAS, for example, contains data looking like this:

XXXXXXX,XXXXX,XXXXXXX,XXX,XXXXXX,XX,XXXXX,XXXXX,XX,XXXXXXX,XXXXXXX,XXXXXXX ...

Each row is 7 (and sometimes 8) bits of font data, followed by a comma, then the next row, and so on. Every so often, there are 6 extra bytes of data that are not font bitmap data, so I had to decipher that. But it was not hard, and the data above became this:

'       '
' * * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, 79, 6B, 6E, 00, 88
' * '
' * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, DE, 6B, 78, 00, 88

I have a feeling the extra data is internal BASIC linkage information, but it was easy to discard.
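
One possible reading of those 6 bytes, assuming the files use the common Microsoft-style tokenized BASIC line layout (an assumption, not something confirmed in this post), is: a zero byte ending the previous line, a 2 byte little-endian pointer to the next line, a 2 byte little-endian line number, and one statement token. The sketch below decodes the two examples shown above on that basis. The function name `decode_line_link` is hypothetical.

```python
def decode_line_link(extra: bytes) -> dict:
    """Split 6 'ignored' bytes into terminator, next-line pointer, line number, token."""
    if len(extra) != 6 or extra[0] != 0x00:
        raise ValueError("expected 6 bytes starting with a 0x00 line terminator")
    return {
        "next_line_ptr": int.from_bytes(extra[1:3], "little"),
        "line_number": int.from_bytes(extra[3:5], "little"),
        "token": extra[5],
    }
```

Under that reading, the two groups decode to line numbers 110 and 120, which is at least consistent with consecutive BASIC lines numbered in steps of 10, with the trailing 0x88 being the token that starts each line.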

Most of the files match up to the font data files I parsed, but there were a few additional ones:

CHGDEVLR

CHGDEVL2

Fonts Left to Find

The manual references:

  • ASCII Shifted with Greek (I think that’s CHGGREEK above)
  • ASCII Shifted with Kata Kana (CHGKATA)
  • ASCII Shifted with APL Character Set (CHGAPL)
  • Kata Kana Shifted with ASCII (CHGKAT2)
  • ASCII Shifted with Cyrillic
  • ASCII Shifted with Arabic
  • ASCII Shifted with Hebrew
  • ASCII Shifted with French Characters
  • ASCII Shifted with European character variants (CHGEURO)
  • ASCII Shifted with Math symbols and operators (CHGMATH)

So, it looks like a few are left to find. Let me know if you own or have archived the Model 1 programs that would include these files, or if you have the ROM images (or the ROMs themselves; I can image them here).