Expanding the MC6847: Deciphering Fonts

Searching for Information

As I was trying to decipher how to leverage an external character font ROM with the MC6847, I searched for existing projects and products in this space. My web searches first found Ed Snider’s (Zippster of ZippsterZone) reproduction of the Green Mountain Micro (GMM) “lowerkit” product from “back in the day”. Ed shared a bit of technical detail and project progress, but no schematics or additional information that might answer questions on how the circuit worked. I also found a reproduction of the lowercase daughterboard for the Dragon 200e (which also works for the Dragon 32/64) produced by DragonPlus Electronics. As with Ed’s page, DragonPlus offered some technical information (build instructions) but no schematic. Still, I had a start.

After changing some search terms, I found a few downloads on a TRS-80 Model 1 web site that purported to be for the “lowerkit” product. I was confused about why files for a CoCo product were on a web archive for a different machine, but I downloaded them anyway, and searching for those same filenames turned up related files on a few other web archives. In the end, I found:

Initial Results

Extracting all of the files, I found a number of binary files in the archives, but their size (3kB) seemed too big. The datasheet suggested a 2kB (11 address lines) EPROM.

Loading the files into a hex editor suggested font data, so I grabbed a quick file-reading C program template from the Internet and modified it to spit out the data as a text-based bitmap file, with “*” for a set bit and “ ” (a space) for a clear one.
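Something along these lines would do the job. This is a minimal C sketch, not the program I actually used, and byte_to_bitmap and dump_bitmap are made-up names. It assumes bit 7 is the leftmost pixel and mimics the “Byte: offset value 'bitmap'” line format of the listing that follows.

```c
#include <stdio.h>

/* Render one font byte as 8 characters, most significant bit first:
 * '*' for a set bit, ' ' for a clear one. out must hold 9 chars. */
void byte_to_bitmap(unsigned char b, char out[9])
{
    for (int bit = 0; bit < 8; bit++)
        out[bit] = (b & (0x80u >> bit)) ? '*' : ' ';
    out[8] = '\0';
}

/* Print every byte read from `in` as: offset, hex value, quoted bitmap.
 * Returns the number of bytes processed. */
long dump_bitmap(FILE *in, FILE *out)
{
    char row[9];
    long offset = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        byte_to_bitmap((unsigned char)c, row);
        fprintf(out, "Byte: %04lX %02X '%s'\n",
                (unsigned long)offset, (unsigned)c, row);
        offset++;
    }
    return offset;
}
```

Calling dump_bitmap(fp, stdout) on a file opened with fopen(name, "rb") prints one line per byte.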

Running chgmath.bin through my simple program provided encouraging results:

Byte: 0000 05 '     * *'
Byte: 0001 06 '     ** '
Byte: 0002 43 ' *    **'
Byte: 0003 48 ' *  *   '
Byte: 0004 47 ' *   ***'
Byte: 0005 4D ' *  ** *'
Byte: 0006 41 ' *     *'
Byte: 0007 54 ' * * *  '
Byte: 0008 01 '       *'
Byte: 0009 00 '        '
Byte: 000A 00 '        '
Byte: 000B 52 ' * *  * '
Byte: 000C 00 '        '
Byte: 000D 00 '        '
Byte: 000E 08 '    *   '
Byte: 000F 14 '   * *  '
Byte: 0010 14 '   * *  '
Byte: 0011 22 '  *   * '
Byte: 0012 22 '  *   * '
Byte: 0013 41 ' *     *'
Byte: 0014 7F ' *******'
Byte: 0015 00 '        '
Byte: 0016 00 '        '
Byte: 0017 00 '        '
Byte: 0018 00 '        '
Byte: 0019 00 '        '
Byte: 001A 00 '        '
Byte: 001B 00 '        '
Byte: 001C 7F ' *******'
Byte: 001D 41 ' *     *'
Byte: 001E 20 '  *     '
Byte: 001F 10 '   *    '
Byte: 0020 08 '    *   '
Byte: 0021 10 '   *    '
Byte: 0022 20 '  *     '
Byte: 0023 41 ' *     *'
Byte: 0024 7F ' *******'
Byte: 0025 00 '        '
Byte: 0026 00 '        '
Byte: 0027 00 '        '
Byte: 0028 00 '        '
Byte: 0029 00 '        '
Byte: 002A 00 '        '
Byte: 002B 00 '        '
Byte: 002C 06 '     ** '
Byte: 002D 09 '    *  *'
Byte: 002E 08 '    *   '
Byte: 002F 08 '    *   '
Byte: 0030 08 '    *   '
Byte: 0031 08 '    *   '
Byte: 0032 08 '    *   '
Byte: 0033 08 '    *   '
Byte: 0034 08 '    *   '
Byte: 0035 08 '    *   '
Byte: 0036 48 ' *  *   '
Byte: 0037 30 '  **    '
Byte: 0038 00 '        '
Byte: 0039 00 '        '
Byte: 003A 00 '        '
Byte: 003B 00 '        '
...

The first 2 bytes of each file were 0x05, 0x06, and the next few bytes looked like ASCII (and they are: the bytes above spell out CHGMAT, which looks like a 6-byte filename). There are 4 more bytes that don’t look like font data, and then font glyph bitmap data in 16-byte chunks.

That answers one question about the datasheet schematic: the row counter address lines are the lowest 4 bits of the font ROM address. I suspected as much, since it makes creating the ROM easier (all of the bitmap data for each character glyph resides in contiguous memory locations), but it “wastes” 4 bytes per glyph. According to the datasheet, the lower 4 bits of the address never go above 11, yet each bitmap starts 16 bytes above the previous one.
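To make that arithmetic concrete, here’s a small C sketch. It assumes the character code drives the upper address lines and the 4-bit row counter drives A0-A3, with 128 glyphs (7 code bits) filling a 2kB part; rom_offset, rom_size, and unused_bytes are hypothetical helper names.

```c
/* Rows the MC6847 scans per character vs. slots the 4 row bits address. */
#define ROWS_USED 12u
#define ROW_SLOTS 16u

/* Byte offset of one glyph row: code on the upper lines, row on A0-A3. */
unsigned rom_offset(unsigned code, unsigned row)
{
    return (code << 4) | (row & 0x0Fu);
}

/* Total ROM bytes needed for a given number of glyphs. */
unsigned rom_size(unsigned glyphs)
{
    return glyphs * ROW_SLOTS;
}

/* Bytes the VDG never reads: rows 12-15 of every glyph. */
unsigned unused_bytes(unsigned glyphs)
{
    return glyphs * (ROW_SLOTS - ROWS_USED);
}
```

With 128 glyphs, rom_size(128) is 2048 and unused_bytes(128) is 512, so a quarter of the part holds rows the VDG never displays.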

Looking at the rest of the files, some have odd data in bytes 12, 13, 14, and 15 of each character glyph, and there are extra bytes in the file every so often, but I felt like I had made some progress. I decided to create a GitHub repository at https://github.com/go4retro/lowerkit and place my current progress in the charsets directory. I took a stab at cleaning up the parsing functions and removing some of the extraneous data in the file, noticing that 4 extra bytes appeared every $102 bytes. Theorizing those were some type of “sector” or “block” structure, I parsed them out of the font stream, and the results looked even better, save for some oddities at the end of each file.
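Here’s roughly what that first heuristic looks like as a C sketch. It assumes an 8-byte header (the 05 06 and 6-byte name from the CHGMATH dump) followed by fixed $102-byte blocks, each starting with 4 non-font bytes. strip_blocks is a made-up name, and this isn’t the code in the repository.

```c
#include <stddef.h>

#define HEADER_LEN 8u      /* 05 06 + 6-byte name, as in the CHGMATH dump */
#define BLOCK_LEN  0x102u  /* observed spacing of the 4 "extra" bytes     */
#define BLOCK_SKIP 4u      /* non-font bytes at the start of each block   */

/* Copy font bytes from a raw file image into out, dropping the header
 * and the first 4 bytes of every $102-byte block. Returns the number
 * of bytes written (never more than out_len). */
size_t strip_blocks(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t out_len)
{
    size_t n = 0;

    for (size_t i = HEADER_LEN; i < in_len && n < out_len; i++) {
        size_t pos = (i - HEADER_LEN) % BLOCK_LEN;
        if (pos < BLOCK_SKIP)
            continue;           /* block header bytes: not font data */
        out[n++] = in[i];
    }
    return n;
}
```

A fixed stride can’t tell where the load data really ends, so a shorter final block or any trailing record would leak into the output, which may account for the oddities at the end of each file.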

Deciphering the File Format

Since the files were in a TRS-80 Model 1 archive, I reached out to the TANDY Discord server, relating the oddity about control characters appearing every $102 bytes in the file. George Phillips (of TRS80GP emulator fame) responded that the files were encoded in the TRSDOS /CMD format and that his TRLD utility could parse them. I grabbed the utility, followed his directions, and the resulting binary files appeared!

The only issue was that they didn’t appear alone. The TRLD utility extracted the data but saved the ROM data inside what appeared to be a larger memory dump. That makes sense given how the file format works: it consists of blocks of data, each with a load address and length, and those blocks could land anywhere in memory. George’s utility takes the safe approach, parsing the file into a blank memory map and then saving it all to disk. However, since these font ROM file blocks are contiguous, there’s no need to save off the entire map.
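For reference, here’s a C sketch of a record-walking parser that writes contiguous blocks straight into a ROM image. It is neither TRLD nor Jim’s code, and cmd_extract is a hypothetical name. It assumes the commonly documented /CMD record layout: type 0x05 carries the module name, type 0x01 is a load block whose length byte counts a little-endian load address plus data (with 0, 1, and 2 meaning 256, 257, and 258), and type 0x02 holds the transfer address that ends the load data. Under that reading, the 01 00 00 52 in the CHGMATH dump is a 254-byte block loaded at $5200, and each such record spans 2 + 256 = $102 bytes, matching the spacing I had noticed.

```c
#include <stddef.h>
#include <string.h>

#define REC_LOAD     0x01  /* len, addr lo, addr hi, data...      */
#define REC_TRANSFER 0x02  /* len (2), entry address lo, hi       */
#define REC_HEADER   0x05  /* len, module name (skipped here)     */

/* Copy every load block into rom[] at (load address - first address).
 * Stores the first load address in *base_out. Returns the number of
 * bytes spanned, or -1 for a truncated or non-contiguous file. */
long cmd_extract(const unsigned char *buf, size_t len,
                 unsigned char *rom, size_t rom_len, unsigned *base_out)
{
    size_t i = 0;
    unsigned base = 0, next = 0;
    int first = 1;

    while (i + 2 <= len) {
        unsigned type = buf[i];
        size_t rlen = buf[i + 1];
        i += 2;
        if (type == REC_TRANSFER)
            break;                      /* entry point: load data is done */
        if (type == REC_LOAD && rlen < 3)
            rlen += 256;                /* 0, 1, 2 encode 256, 257, 258 */
        if (i + rlen > len)
            return -1;                  /* truncated record */
        if (type == REC_LOAD) {
            unsigned addr = buf[i] | (buf[i + 1] << 8);
            size_t n = rlen - 2;        /* length includes the address */
            if (first) {
                base = next = addr;
                first = 0;
            }
            if (addr != next || (size_t)(addr - base) + n > rom_len)
                return -1;              /* gap or overflow: not a simple ROM */
            memcpy(rom + (addr - base), buf + i + 2, n);
            next = addr + (unsigned)n;
        }
        i += rlen;                      /* REC_HEADER and others skipped */
    }
    if (base_out)
        *base_out = base;
    return first ? 0 : (long)(next - base);
}
```

Bailing out on any gap keeps the sketch honest: a file whose blocks aren’t contiguous needs a real memory map, which is exactly why TRLD takes the approach it does.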

To be fair, George noted it would do this, and suggested I could either look at the first block’s load address in the file (using the utility’s diagnostic dump parameter) and then strip off that many bytes from the file, or dig into the trld source and extract the parse engine to roll my own. While I was debating which to do, I found a blog post about parsing CMD files, complete with source code, from Jim Lawless, a friend from many years back.

Jim’s utility didn’t parse the data per se, but it did parse the structure, and the source was available, so I just lifted the functions from my quick utility and inserted them into this new source. The resulting output was clean, so I then added an optional parameter to save off the binary data as a contiguous file, suitable for burning to an EPROM.

The LowerKit Fonts

To make it easier to compare fonts and show what’s in them, I modified one of my simple utilities to brute-force create some SVG files, shown below. I apologize for nothing; it was simply a means to an end 🙂

CHGAPL

CHGAPL Font

CHGEURO

CHGGREEK

CHGKATA

CHGKAT2

Its internal name is KATA, just like the one above.

CHGMATH

CHGSCR

CHGSMALL

CHGNEW

The SVG images of the fonts are at: https://github.com/go4retro/lowerkit/tree/main/charsets/drawsvg and font binaries are at: https://github.com/go4retro/lowerkit/tree/main/charsets/mkroms.

I extracted all of the binary files from the various archives, but it looks like some are duplicates:

  • CHGAPL2 -> CHGAPL
  • LITTLE2 -> SMALL

Discovering More Fonts!

While working on the font files themselves, I looked at the BAS files in the various archives, which appear to be Model 1 BASIC files. After perusing a few of them, I noticed they appeared to hold a copy of a font file within the BASIC program. I quickly copied out the BAS files, trimmed off everything except the ROM data embedded in the BASIC program, and compiled a second parser to make sense of it. CHGEURO.BAS, for example, contains data looking like this:

XXXXXXX,XXXXX,XXXXXXX,XXX,XXXXXX,XX,XXXXX,XXXXX,XX,XXXXXXX,XXXXXXX,XXXXXXX ...

Each entry is 7 (and sometimes 8) bits of font data, followed by a comma, then the next row, and so on. Every so often, there are 6 extra bytes that are not font bitmap data, so I had to decipher those. But it was not hard, and the data above became this:

'       '
' * * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, 79, 6B, 6E, 00, 88
' * '
' * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, DE, 6B, 78, 00, 88

I have a feeling the extra data is internal BASIC linkage information, but it was easy to discard.
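The numbers do fit that hunch. Assuming the usual Level II BASIC tokenized-line layout (a guess, not something I have confirmed against these files), the 00 ends the previous line, the next two bytes are a little-endian pointer to the following line, the two after that are the line number, and 0x88 is, I believe, the Level II token for DATA. The C sketch below decodes the skipped bytes that way; struct basic_link and decode_link are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical view of the 6 bytes found between rows of font data. */
struct basic_link {
    uint16_t next_line;   /* address of the following program line */
    uint16_t line_number; /* BASIC line number of the new line      */
    uint8_t  token;       /* first token on the new line            */
};

/* b[0] is the 00 that terminates the previous line; both 16-bit
 * fields are little-endian, as the Z80 stores them. */
int decode_link(const uint8_t b[6], struct basic_link *out)
{
    if (b[0] != 0x00)
        return -1;        /* not a line boundary */
    out->next_line   = (uint16_t)(b[1] | (b[2] << 8));
    out->line_number = (uint16_t)(b[3] | (b[4] << 8));
    out->token       = b[5];
    return 0;
}
```

Read this way, the two samples above are lines 110 and 120, with next-line pointers of $6B79 and $6BDE, so line 120 would occupy $65 (101) bytes of memory.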

Most of the files match up to the font data files I parsed, but there were a few additional ones:

CHGDEVLR

CHGDEVL2

Fonts Left to Find

The manual references:

  • ASCII Shifted with Greek (I think that’s CHGGREEK above)
  • ASCII Shifted with Kata Kana (CHGKATA)
  • ASCII Shifted with APL Character Set (CHGAPL)
  • Kata Kana Shifted with ASCII (CHGKAT2)
  • ASCII Shifted with Cyrillic
  • ASCII Shifted with Arabic
  • ASCII Shifted with Hebrew
  • ASCII Shifted with French Characters
  • ASCII Shifted with European character variants (CHGEURO)
  • ASCII Shifted with Math symbols and operators (CHGMATH)

So, it looks like a few are left to find. Let me know if you own or have archived the Model 1 programs that would include these files, or if you have the ROM images (or the ROMs themselves; I can image them here).

How Low Can You Go?

Understanding external character ROM functionality on the Motorola MC6847 Video Display Generator (VDG)

I’ll be honest, I rarely use lowercase characters on any of the classic computers I use. Back in the day, of course, I needed them when writing posts on BBS systems, composing school papers, and writing general correspondence (my handwriting was never great). Still, they were always there when I needed them.

Sometimes, they came stock with the system. Other systems needed a bit of help, usually a character ROM swap or an updated video IC popped in. But some required more complex upgrades, like adding lowercase to the TANDY Color Computer. Though later models offered the updated video IC option or a newly designed video IC (CoCo 3), early models took a different approach.

Up until the CoCo 3, the CoCo utilized variants of the Motorola MC6847 Video Display Generator, commonly called the VDG. First offered in 1978, the VDG was an early video display chip with limited graphics capabilities. The original MC6847 offered only 64 characters, made up of uppercase alphanumeric characters and the requisite punctuation and related characters. A later variant, the MC6847T1, included lowercase characters but was not a drop-in replacement. Both variants utilized an 8-pixel-wide by 12-pixel-tall character matrix, but included only a 5×7 (5 wide by 7 tall) font in ROM.

However, both variants offered the ability to utilize an external character ROM, as shown on page 23 of the MC6847 datasheet:

MC6847 external character ROM support

Notice the portion in the lower right. I will admit the connection from the display memory data lines to the character ROM address lines confused me for a while, but a quick inquiry on the 6502.org discussion forum saw Rob Finch clear it up. The display memory provides the character value to the ROM, and the row counter then selects which of the character’s 12 rows is being output. The buffers and Boolean logic serve to disable the external ROM if either the internal ROM or semigraphics mode is selected by the programmer.
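That gating can be modeled as a small truth table. The C sketch below follows my reading of the datasheet’s mode table, so check it against your copy. A/G low selects the alphanumeric and semigraphic modes, A/S high selects semigraphics, and INT/EXT high selects the external ROM in alphanumeric mode (and SG6 rather than SG4 in semigraphics mode). The enum and char_source are hypothetical names.

```c
#include <stdbool.h>

/* Which source supplies pixels for a character cell (illustrative). */
enum vdg_char_source {
    SRC_INTERNAL_ROM,
    SRC_EXTERNAL_ROM,
    SRC_SEMIGRAPHICS,
    SRC_GRAPHICS
};

/* The external ROM is used only in alphanumeric mode with INT/EXT high;
 * semigraphics or the internal ROM leave it disabled. */
enum vdg_char_source char_source(bool ag, bool as, bool int_ext)
{
    if (ag)
        return SRC_GRAPHICS;        /* full graphics modes */
    if (as)
        return SRC_SEMIGRAPHICS;    /* SG4 or SG6, external ROM off */
    return int_ext ? SRC_EXTERNAL_ROM : SRC_INTERNAL_ROM;
}
```

Only one of the eight input combinations routes display memory through the external ROM, which is what the buffers and gates in the schematic enforce.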

Now that I understood how the ROM was used, I turned my attention to the counter component at the bottom of the diagram. Page 13 of the datasheet showed more detail, but how it worked was somewhat confusing:

Row Counter

I understand the idea of the counter, but preloading it with 9 (P3 and P0 high, P2/P1 low) made no sense. For reference, *RP is Row Preset, *FS is Frame Sync, and *HS is Horizontal Sync. *HS triggers once per line, *FS triggers once per frame, and *RP triggers every 12th *HS, BUT there is a note in the datasheet:

Mind you, all datasheets take a bit to understand, but the timing diagram showing the relationship among these 3 signals was hard to grasp from the technical document. I could pore over it some more, but I thought it might be better to see if I could find an existing external ROM product sold back in the day. And, of course, there were a few, but the first I found was Green Mountain Micro’s “lowerkit” solution, which provided lowercase characters (and other goodies) for the CoCo. The next step was to find out more about this product and how it worked.