Blog

Expanding the MC6847: Deciphering the Interface, Part 2

As noted in a previous post, along with finding that Green Mountain Micro (GMM) had created a lowercase modification for the CoCo called “lowerkit”, Eurohard SA also created a lowercase update for the Dragon 200e for the Spanish market, which I think was planned or actually produced as an official part of the Dragon 200e design. Currently, DragonPlus Electronics offers a reproduction of this solution in their store. As I mentioned previously, I did not find a schematic of this solution, though I did find pictures of the PCB on the World of Dragon archives. Since the lowerkit effort had been successful, I decided to reverse engineer the PCB from the pics on the site.

Now, unlike the “lowerkit” option, this PCB used the datasheet-recommended 74LS161 counter and contained fewer components, which made the work easier.

The header and the 6847 IC were easy, as was the 74LS161 counter. Based on a note on the DragonPlus web site talking about their effort to create the reproduction, I determined the EPROM was a 2532. Looking at the 74LS244 buffer IC in light of the “lowerkit” design, I surmised that this design used the output enable line on the EPROM to disable it from the VDG data bus, meaning only the data coming from the header data pins needed to be buffered/disabled. This saves an IC over the dual 74LS157 option on “lowerkit”.

Also knowing that this design put the counter as the top 4 bits of the EPROM address bus, most of this design was easy to reconstruct. Because the design was tailored to the Dragon, which included a MC6883 Synchronous Address Multiplexer (SAM), none of the MC6847 address lines were bridged to the header except DA0 (as they are not connected to anything on the motherboard). The design took shape, with the exception of the 74LS00 quad NAND gate.

Having the IC in the photo hindered the effort, as there were no traces going into/out of the IC on the top. The reverse shows a few, though, and there are a few adjacent pins that appear to be bridged. Working back from how I thought the design should behave, I was able to reconstruct the 74LS00 wiring:

On the 74LS00 (top left IC, reversed pinout on bottom photo), we can follow the trace on pin 1 (top right of IC) to A*/S (Alpha Semigraphics). Likewise, pin 3 ends up at INT*/EXT, which makes sense. If we are requesting alphanumeric glyphs, use the external ROM; otherwise, use the internal one. On the other side, pin 13 is connected to A*/G, and pin 9 is connected to the EPROM output enable line (OE*). Finally, pin 8 is connected to the enable line on the 74LS244, which must be low for operation. That makes sense, as either the EPROM OE* should be low OR the buffer enable, but never both at the same time. If you squint, you can just see that pins 9 and 10 are connected via a trace (the slightly silver part in between). The same thing shows up between 1 and 2, 3 and 4, and 12 and 13. 1/2, 9/10, and 12/13 are all inputs, meaning all 3 of those gates are being used as simple inverters (you can easily turn any NOR or NAND gate into a NOT gate by tying the inputs to the same signal). 3 is an output, and it being tied to 4 means that A*/S is being inverted and sent to a second NAND gate. But, that’s all we can see from the PCB photo. The rest requires inferring what logic is needed.

With pins 9 and 10 going to the EPROM OE* line, that means it should be low when we are wanting alphanumerics. And, we actually only want the EPROM when both A*/S and A*/G are low. But, we know both signals are being inverted for some reason. So, let’s take their inverted versions (A/S* and A/G*) which should both be 1 for external EPROM use. We also know that the inversion of A*/S (pin 3) is being fed to pin 4. Let’s assume the output of the other inversion (pin 11) goes to pin 5 (the other input of the second NAND gate). That means pin 6 will be low when the EPROM needs to be enabled, so OE* should be connected to pin 6. But, we only see it connected to pin 9 and then bridged to pin 10. That means it must connect to pin 6 under the IC:
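The inferred wiring can be sanity-checked with a small truth table. This is only a sketch of the reasoning above, written in Python: the pin 11 to pin 5 link is the assumption made in the text, and the function and key names are made up for illustration.

```python
def nand(a: int, b: int) -> int:
    """One 74LS00 gate: the output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def glue_logic(as_n: int, ag_n: int) -> dict:
    """Model the reconstructed 74LS00 wiring.

    as_n is the level on A*/S and ag_n the level on A*/G
    (0 = alphanumeric requested).
    """
    pin3 = nand(as_n, as_n)    # pins 1+2 bridged: inverted A*/S, seen going to INT*/EXT
    pin11 = nand(ag_n, ag_n)   # pins 12+13 bridged: inverted A*/G
    pin6 = nand(pin3, pin11)   # pins 4 and 5 (pin 5 link is assumed): EPROM OE*
    pin8 = nand(pin6, pin6)    # pins 9+10 bridged: inverted OE*, buffer enable
    return {"int_ext": pin3, "eprom_oe_n": pin6, "buffer_en_n": pin8}


for as_n in (0, 1):
    for ag_n in (0, 1):
        print(f"A*/S={as_n} A*/G={ag_n} -> {glue_logic(as_n, ag_n)}")
```

In this model the EPROM output is enabled only when both inputs are low, and the EPROM and the 74LS244 are never enabled together, which matches the behavior the design needs.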

The only remaining item is address line 7 on the EPROM. Since there are 128 characters in a font (the upper 128 spots are semigraphics), we expect and see the lowest 7 bits of the MC6847 data lines coming from the SAM go to the lowest 7 address lines of the EPROM. We know, because of the font ROM configuration, address lines 8-11 go to the counter. That leaves address line 7 open. But, if we trace the EPROM pin 1 (address line 7), we see it goes to the INT*/EXT line from the header. That means that changing the value of that pin on the MC6847 socket (which comes from the PIA) will shift between 2 ROM font sets. That solves the last remaining mystery. Adding in the capacitors, a simple jumper block to select inverse characters, and the design is complete.
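Put together, the EPROM address on this board can be written as a single expression. The Python sketch below is an illustration of the wiring described above, not something taken from the original design files, and the function and parameter names are hypothetical.

```python
def dragon_eprom_address(char_code: int, font_select: int, row: int) -> int:
    """Compose the 12 bit EPROM address for one row of one glyph.

    A0-A6   <- low 7 bits of the character code
    A7      <- INT*/EXT from the header (picks 1 of 2 fonts in the set)
    A8-A11  <- 4 bit row counter
    """
    a0_a6 = char_code & 0x7F
    a7 = font_select & 0x01
    a8_a11 = row & 0x0F
    return (a8_a11 << 8) | (a7 << 7) | a0_a6


# Row 9 of character 2 in the first font of a set:
print(f"{dragon_eprom_address(0x02, 0, 9):04X}")
```

With address line 7 used as a font select, successive rows of the same glyph land 256 bytes apart, and each 4kB set holds two 128 character fonts.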

However, just like the “lowerkit” design and its 2716 EPROM, the 2532 EPROM is hard to find and hard to program. As well, this design is quite a large PCB, making fitment a concern. We should update the design to make it both smaller and shift the EPROM to a newer part, which will no doubt hold more. The larger EPROM then demands some switches to select which font to use. DragonPlus Electronics came to the same conclusion, modifying the original design to take up less space, switching to surface mount parts in the process.

I felt like the design could be shrunk while still keeping through hole parts, though placement would be tight:

I made one “executive” decision on this update, which I am sure will anger some purists. While having the free space self-contained in one spot of the ROM is tidy, I don’t see any useful value for it beyond stowing some explanatory text or copyright, and it makes understanding the ROM data much more difficult. So, I placed the 4 counter address lines at the bottom, meaning each font glyph bitmap will occupy 16 consecutive addresses in the EPROM. Since the files will be Open Source, others can revert that change if it offends them.

Since I didn’t want to find and program a 2532 EPROM, I decided to just manufacture and test the updated PCB, which works perfectly fine:

I can probably squeeze the EPROM closer to the header to reduce PCB size, but the current PCB is 50mm by 58mm (1.95″ x 2.3″), so further optimizations short of moving to SMT are left as an exercise for the reader.

The Heresy! Here’s the Dragon 200e lowercase kit on a CoCo 1

As always, source files, artwork, designs, etc. are available at my GitHub repository: https://github.com/go4retro/dragon200e-lowercase

Expanding the MC6847: Deciphering Fonts, Part 2

Another Player Enters the Game

While searching for TANDY Color Computer external font solutions, I found information on a similar solution that was created for the Dragon 200e. As a TANO Dragon owner (basically, an imported Dragon Data Dragon 64), I was aware of the Dragon line, but the 200 (and 200e) were new to me.

Having owned a Color Computer first, I want to be careful not to malign the Dragon systems. Both share a base architecture, being 6809/6847 based, with cartridge ports that are physically and electrically compatible. The Dragon, though, sports extra features like composite video output, a Centronics parallel printer port, and an actual hardware (6551) based RS232 port. Concerning base functionality, BASIC variants and keyboard layouts do differ. The Dragon 200 (and 200e) mirror the Dragon 64, but offer an updated exterior, and the 200e includes localized character fonts for the Spanish market.

Since the 200e lowercase board was an OEM item, I wasn’t lucky enough to find much information on the unit, but I did find that John Whitworth of DragonPlus Electronics offers a reproduction kit of the original design, available here: Lower Case Text Upgrade for Dragon 32/64/200 – Kit Version (2023) – DragonPlus Electronics. He also shared a copy of the 32 fonts you can burn into a 27C256 (16 fonts) or 27C512 (all 32 fonts) here: Dragon Lower Case Daughterboard Downloads – DragonPlus Electronics.

Initial Results

Since I already created some utilities for font extraction and visualization, I applied them to the font pack on DragonPlus’ web site, and the results were… gibberish. Since the file was an actual ROM binary image, there was no pesky file format to parse, but the output of my text view of the font data showed nothing of value. On a hunch, I wondered if, unlike the lowerkit solution, where each font glyph bitmap is stored in consecutive locations, this solution spread them across the binary image. This is why software engineers sometimes need to understand the hardware. If the designers put the row counter on the upper address lines of the EPROM, consecutive bytes of a font glyph would appear every 128 or 256 bytes. A quick rework of my textual display program showed little at 128 byte offsets, but 256 byte offsets yielded recognizable font data:

Byte: 0000 FF '********'
Byte: 0100 FF '********'
Byte: 0200 FF '********'
Byte: 0300 E3 '***   **'
Byte: 0400 DD '** *** *'
Byte: 0500 D9 '** **  *'
Byte: 0600 D5 '** * * *'
Byte: 0700 DB '** ** **'
Byte: 0800 DF '** *****'
Byte: 0900 E3 '***   **'
Byte: 0A00 FF '********'
Byte: 0B00 FF '********'
Byte: 0C00 FF '********'
Byte: 0D00 FF '********'
Byte: 0E00 FF '********'
Byte: 0F00 FF '********'
Byte: 0001 00 '        '
Byte: 0101 00 '        '
Byte: 0201 00 '        '
Byte: 0301 00 '        '
Byte: 0401 00 '        '
Byte: 0501 0C '    **  '
Byte: 0601 02 '      * '
Byte: 0701 1E '   **** '
Byte: 0801 22 '  *   * '
Byte: 0901 1D '   *** *'
Byte: 0A01 00 '        '
Byte: 0B01 00 '        '
Byte: 0C01 FF '********'
Byte: 0D01 FF '********'
Byte: 0E01 FF '********'
Byte: 0F01 FF '********'
Byte: 0002 00 '        '
Byte: 0102 00 '        '
Byte: 0202 00 '        '
Byte: 0302 20 '  *     '
Byte: 0402 20 '  *     '
Byte: 0502 3C '  ****  '
Byte: 0602 22 '  *   * '
Byte: 0702 22 '  *   * '
Byte: 0802 22 '  *   * '
Byte: 0902 5C ' * ***  '
Byte: 0A02 00 '        '
Byte: 0B02 00 '        '
Byte: 0C02 FF '********'
Byte: 0D02 44 ' *   *  '
Byte: 0E02 FF '********'
Byte: 0F02 FF '********'
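The strided view above is easy to reproduce. Here is a minimal Python sketch of the idea (not the actual display program used for the listing); the helper names and the synthetic test image are made up for illustration, with the bitmap values copied from the third glyph in the listing.

```python
def glyph_rows(image: bytes, char_index: int, stride: int = 256, rows: int = 16) -> list:
    """Return (offset, byte) pairs for one glyph whose rows sit `stride` bytes apart."""
    return [(char_index + r * stride, image[char_index + r * stride])
            for r in range(rows)]


def render(byte: int) -> str:
    """Render one bitmap byte, most significant bit first, '*' for a set bit."""
    return "".join("*" if byte & (0x80 >> bit) else " " for bit in range(8))


# Synthetic 4kB image holding only the visible rows of the glyph at index 2.
image = bytearray(b"\xff" * 4096)
sample = [0x00, 0x00, 0x00, 0x20, 0x20, 0x3C, 0x22, 0x22, 0x22, 0x5C, 0x00, 0x00]
for row, value in enumerate(sample):
    image[2 + row * 256] = value

for offset, value in glyph_rows(bytes(image), 2):
    print(f"Byte: {offset:04X} {value:02X} '{render(value)}'")
```

Changing `stride` to 128 on a real image is a quick way to see why that guess produced little of value.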

I wasn’t sure why the data was organized this way, as it makes it more difficult to create a font glyph, but I set that question aside for a bit. The main question was “why 256 byte spans”? Specifically, while the MC6847 supports a full 256 glyph font, the Dragon and the CoCo map semigraphics characters into the upper half of the space, which implies a font set need only be 128 characters in size. And, when I textualized the font data, it became clear font sets in the binary image only represented 128 characters. I also wondered a bit about non $FF values in the trailing 4 bytes of every font glyph, the 4 bytes that would never be visible on screen.

Since the original OEM design used a 4kB EPROM, I surmised font sets were 4kB in size, and I split the DragonPlus binary into 4kB chunks. I then extracted each 128 character font from the resulting files, resulting in 32 2kB font binaries. Finally, I used my font visualization solution from the lowerkit project (https://github.com/go4retro/lowerkit) to create some nice graphical views of the font data.
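The split and extraction steps can be sketched in a few lines of Python. This is an illustration only, not the converter in the repository. It assumes the row counter sits on address bits 8-11 and that bit 7 picks 1 of 2 fonts in each 4kB set, and it writes each font out with 16 consecutive bytes per glyph; that output layout and all names are assumptions.

```python
SET_SIZE = 4096   # one 2532-sized font set
ROWS = 16         # row counter values per glyph (12 visible + 4 spare)
CHARS = 128       # glyphs per font


def split_sets(image: bytes) -> list:
    """Cut a combined ROM image into 4kB font sets."""
    return [image[i:i + SET_SIZE] for i in range(0, len(image), SET_SIZE)]


def extract_font(font_set: bytes, which: int) -> bytes:
    """Gather one 128 glyph font from a set into 16 consecutive bytes per glyph."""
    out = bytearray(CHARS * ROWS)
    for char in range(CHARS):
        for row in range(ROWS):
            out[char * ROWS + row] = font_set[(row << 8) | (which << 7) | char]
    return bytes(out)


# Synthetic 8kB image: two blank sets, with one marker byte in the second set.
demo = bytearray(b"\xff" * (2 * SET_SIZE))
demo[SET_SIZE + ((9 << 8) | (1 << 7) | 2)] = 0x5C   # set 1, font 1, glyph 2, row 9
sets = split_sets(bytes(demo))
font = extract_font(sets[1], 1)
print(len(sets), len(font), hex(font[2 * ROWS + 9]))
```

Run over a 64kB image, this kind of loop yields 16 sets and 32 fonts of 2kB each.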

Secret Data

Of the remaining unanswered questions, I didn’t have enough information on why the font data was offset by 256 bytes when the fonts were 128 characters in length. But, the non $ff data starting at $0d00 in the various files does have an answer, and it somewhat explains why the binary is laid out in this manner.

If you’ll recall from a previous article, the easiest way to organize a font glyph bitmap for the MC6847 is to consider each one 16 bytes in size, even though only 12 bytes will be used. Doing so makes the electronics and the math easier. But, that leaves 4 bytes per character unused. On the lowerkit ROM image, those 4 bytes appear after every 12 bytes of font bitmap data. But, by laying out the Dragon font binary as shown above, all of the “spare” bytes are grouped in the last 1kB of each 4kB font set. Thus, starting at $0c00 in each file (row 12 of 16), there’s blank space.

Well, it should be blank space, but it turns out Eurohard SA (the group that engineered the Dragon 200e font localization daughterboard and brought the Dragon 200 and 200e to the Spanish market) put some non bitmap data there. While converting the ROMs, I asked in the World of Dragon forum if there was a copy of the original 200e font ROM binary to ensure my efforts were on the right path, and robcfg quickly linked to it. I checked in the supposed blank area of the original ROM image, and I found the following (visible via a hex editor):

(C)  1985  EUROHARD S.A.  -  DRAGON 200   AUTOR:  JORDI  PALET  MARTINEZ

Personally, I think the same thing could have been done by putting the data in the first byte of the 4 byte blank space of the first few characters, but the method Eurohard SA used made it easier to see in the file, perhaps as a deterrent to copy. In any event, it was interesting to find. John of DragonPlus Electronics continued the practice, identifying each font set in the same manner.
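A hex editor is enough to spot this text, but pulling it out can also be scripted. The Python sketch below assumes the text is plain ASCII stored somewhere in the spare rows ($0C00-$0FFF of a 4kB set) with $FF as filler. The function name and the demo string are made up, and the exact placement of the text in the real ROM is not something this sketch relies on.

```python
def hidden_text(font_set: bytes, start: int = 0x0C00, end: int = 0x1000) -> str:
    """Collect printable ASCII bytes from the spare-row region of one font set."""
    region = font_set[start:end]
    return "".join(chr(b) for b in region if 0x20 <= b < 0x7F).strip()


# Synthetic 4kB set: blank ($FF) everywhere except a short message at $0D00.
demo_set = bytearray(b"\xff" * 4096)
message = b"(C) 1985 EXAMPLE"
demo_set[0x0D00:0x0D00 + len(message)] = message

print(hidden_text(bytes(demo_set)))
```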

The DragonPlus Lowercase Mod Fonts

Below are the 16 font sets in DragonPlus’ font bitmap binary for the reproduction daughtercard, arranged in 2 font groupings. As well, I extracted the non bitmap data from each font set.

Set 0

MC6847 Font
MC6847
MC6847
ORIGINAL MC6847 CHARACTER SET FOR COMPATIBILITY

ORIGINAL MC6847 CHARACTER SET FOR COMPATIBILITY

Set 1

D32/64 compatible w. lower case – based on 200E character set
Original 200E character set 1.
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH LOWER CASE - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH INVERSE CHARS - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE

Set 2

Original 200E character set 1.
Original 200E character set 1.
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH INVERSE CHARACTERS - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH INVERSE CHARS - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE

Set 3

Original 200E character set 0.
Original 200E character set 1.
(C)  1985  EUROHARD S.A.  -  DRAGON 200   AUTOR:  JORDI  PALET  MARTINEZ - ORIGINAL DRAGON 200E CHARACTER SET
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH INVERSE CHARS - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE

Set 4

Spectrum w. lower case
Spectrum w. inverse characters.
ZX SPECTRUM CHARACTER SET WITH LOWER CASE - TAKEN FROM MAME ROM SET
ZX SPECTRUM CHARACTER SET WITH INVERSE CHARACTERS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 5

Spectrum w. inverse characters
Original 200E character set 1.
ZX SPECTRUM CHARACTER SET WITH INVERSE CHARACTERS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
DRAGON 32/64/200 COMPATIBLE CHARACTER SET WITH INVERSE CHARACTERS - BASED ON 200E CHARACTER SET - MODIFIED BY DUBLE

Set 6

TI99/4A w. lower case
TI99/4A w. inverse characters.
TI99/4A CHARACTER SET WITH SMALL CAPS FOR LOWER CASE. TAKEN FROM MAME ROM SET
TI99/4A CHARACTER SET WITH INVERSE CHARACTERS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 7

TI99/4A w. inverse characters
Original 200E character set 1.
TI99/4A CHARACTER SET WITH INVERSE CHARACTERS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
D32/64/200 COMPATIBLE WITH INVERSE CHARS - BASED ON 200E CHARSET - MODIFIED BY DUBLE

Set 8

CGA Light for 200E – with accented chars etc.
Original 200E character set 0.
CGA LIGHT CHARACTER SET WITH ACCENTS AND LOWER CASE - 200E COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
(C)  1985  EUROHARD S.A.  -  DRAGON 200   AUTOR:  JORDI  PALET  MARTINEZ

Set 9

CGA Light for 200E – with accented chars etc.
CGA Light for D32/64 w. inverse chars.
CGA LIGHT CHARACTER SET WITH ACCENTS AND LOWER CASE - 200E COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
CGA LIGHT WITH INVERSE CHARS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 10

CGA Light for D32/64 w. lower case
CGA Light for D32/64 w. inverse chars.
CGA LIGHT WITH LOWER CASE - 32/64/200 COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
CGA LIGHT WITH INVERSE CHARS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 11

CGA Light for D32/64 w. inverse chars
D32/64 compatible w. inv. Chars – based on 200E character set.
CGA LIGHT WITH INVERSE CHARS - 32/64/200 COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
D32/64/200 COMPATIBLE WITH INVERSE CHARS - BASED ON 200E CHARSET - MODIFIED BY DUBLE

Set 12

CGA Bold for 200E – with accented chars etc.
Original 200E character set 0.
CGA BOLD CHARACTER SET WITH ACCENTS AND LOWER CASE - 200E COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
(C)  1985  EUROHARD S.A.  -  DRAGON 200   AUTOR:  JORDI  PALET  MARTINEZ

Set 13

CGA Bold for 200E – with accented chars etc.
Original 200E character set 1.
CGA BOLD CHARACTER SET WITH ACCENTS AND LOWER CASE - 200E COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
CGA BOLD WITH INVERSE CHARS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 14

CGA Bold for D32/64 w. lower case
CGA Bold for D32/64 w. inverse chars.
CGA BOLD WITH LOWER CASE - 32/64/200 COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
CGA BOLD WITH INVERSE CHARS - TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE

Set 15

CGA Bold for D32/64 w. inverse chars
D32/64 compatible w. inv. Chars – based on 200E character set.
CGA BOLD WITH INVERSE CHARS - 32/64/200 COMPATIBLE. TAKEN FROM MAME ROM SET AND MODIFIED BY DUBLE
D32/64/200 COMPATIBLE WITH INVERSE CHARS - BASED ON 200E CHARSET - MODIFIED BY DUBLE

Some of the fonts are repeated in the binary image, probably to allow flexibility in choosing the combination most desired.

The SVG images of the fonts are at: https://github.com/go4retro/dragon200e-lowercase/tree/main/charsets/drawsvg and font binaries are at: https://github.com/go4retro/dragon200e-lowercase/tree/main/charsets/fixrom.

Pedantic Inconsistencies

Not that it matters, but John’s font binary contains what appear to be some font description inconsistencies.

  • The second font in Set 1 claims it is a modified Dragon 200e set, but, except for the hidden text data, the font information is the same as the original Dragon 200e second font.
  • Likewise, though the second fonts in sets 2, 3, 5, and 7 claim to be modified, they appear to be the original second font from the Dragon 200e mod.
  • The second font in set 13 claims to be a CGA bold font, but is actually another copy of the original Dragon 200e second font.

It’s possible the second font was modified, but then reverted, prior to the ROM merge. On the CGA font, that’s more confusing, and I’d appreciate validation of my efforts, which I’ve placed on GitHub at https://github.com/go4retro/dragon200e-lowercase.
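For anyone wanting to validate those comparisons, one approach is to fingerprint only the visible rows of each font so that differences in the hidden text are ignored. The Python sketch below assumes each font has already been extracted to a layout with 16 consecutive bytes per glyph, of which the first 12 are visible; the function names and demo data are hypothetical.

```python
import hashlib


def visible_rows(font: bytes, rows_per_glyph: int = 16, visible: int = 12) -> bytes:
    """Keep only the rows the VDG displays, dropping the 4 spare bytes per glyph."""
    return b"".join(font[i:i + visible] for i in range(0, len(font), rows_per_glyph))


def fingerprint(font: bytes) -> str:
    """Hash of the visible bitmap data only."""
    return hashlib.sha256(visible_rows(font)).hexdigest()


# Two synthetic fonts that differ only in a spare byte, and one that differs visibly.
font_a = bytearray(2048)
font_b = bytearray(2048)
font_b[13] = 0x44          # spare row 13 of glyph 0 (hidden text area)
font_c = bytearray(2048)
font_c[5] = 0x3C           # visible row 5 of glyph 0

print(fingerprint(bytes(font_a)) == fingerprint(bytes(font_b)))
print(fingerprint(bytes(font_a)) == fingerprint(bytes(font_c)))
```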

Expanding the MC6847: Deciphering the Interface

Having found some external fonts used with a MC6847 lowercase modification from “back in the day” (the lowerkit product from Green Mountain Micro) and converting them into ROM-burnable images, I determined a ROM font was composed of 12 bytes of font bitmap data, with 4 bytes of “padding” to align font data on 16 byte boundaries. This, in turn, helped me understand how the font ROM was connected to the MC6847. Specifically:

As I mentioned in a previous article, notice the External Character Generator ROM, with “Row A” and “Char. Add” going into the left side of the ROM. It makes sense that the 4 bit output of the Row Counter (shown elsewhere in the data sheet) and the value of the character desired at a specific screen location would be fed into the ROM address lines, but it doesn’t show if the 4 bits of the counter are the lowest 4 bits of the address or not. Well, given how the “lowerkit” fonts were arranged, the counter must be at the lowest 4 bits. It makes sense, as then the entire bitmap for a single character would be in consecutive memory locations, but it pays to be certain.

With that knowledge, I felt I had gone about as far as I could with poring over datasheets and understanding font bitmap files. It was time to build a circuit.

With the knowledge of the GMM “lowerkit” product in hand, I asked for assistance in the CoCo Mailing List, and Tim Lindner responded in the next message with a link to the product manual. Since the “lowerkit” manual not only helpfully included a full schematic but also included PCB artwork scans (We owe a debt of gratitude to hardware and software developers of the 1980s that put such technical detail in manuals!), it was possible to lay out a new PCB. And, so I did. Page 22 (30 in the PDF) showed a schematic, but it used an entirely different set of TTL ICs:

Interestingly, lowerkit’s design utilized a different configuration for the counter. There was still a counter involved (74LS93), and some of the logic I realized was to provide some additional features, but the 74LS73 and the 74LS93 together confused me. Still, it was a working design, and perhaps it was better than the datasheet option (in general, designs in a datasheet are workable, but may not be optimal).

With all of this, the bulk of the work was trivial. Lay out the schematic and then create a PCB, tracing over the shots of the PCB artwork to line things up. While, for a first attempt, it’s always best to use the exact same layout, there was 1 large issue with the original design. Lowerkit used a 24 pin EPROM to store the font bitmap data. 24 pin EPROMs (2516 or 2716, along with 2532 and 2732) are very difficult to source now and they are hard to program. 28 pin JEDEC standard pinout EPROMs are much easier to find (27C64-27C512) and they offer the option of larger storage. So, I extended the PCB a bit lower to support the 28 pin 27C256, which offered storage for 16 different font bitmap sets. However, this, in turn, forced me to make another small change, the switches on the right hand side, to select which font to use. Along the way, Ed Snider (who I previously mentioned had also created a reproduction of this unit some years back) popped up in the mailing list and shared the schematic for his initial reverse engineering effort. I checked my design against his schematic to ensure I didn’t miss anything.

You’d think, after 20 years of PCB design, I’d check the layout before sending off to jlcpcb.com, the PCB manufacturer I use. But no, it was late, I was “sure” all was correct, and I hurried the design off. Haste makes waste, as they say:

Notice the metal RF shield “wall” on the left side. Yep, my attempt to quickly extend the PCB to support a 28 pin EPROM caused a fitment issue. It’s OK, though, as this was just a prototype and I’m used to creating PCB socket “risers” from stacking IC sockets up to varying heights. Thus, this was a quick issue to solve.

After stuffing the PCB and mounting it in the 6847 socket (with my riser), turning on the machine yielded this:

That’s a nice looking font (this is the font called “NEW” in the ROM sets I found). But, I noticed something wrong. The first row of characters had the top lines cut off. A quick set of tests showed that the font data is fine (Notice the ‘E’ in the first row is chopped off, but the ‘E’ in the third line is fine). It has to be the schematic. Time to break out the logic analyzer.

While I have a very capable HP logic analyzer that can analyze dozens of signals, I surmised the issue here was with the counter and the weird 7473 flip flop circuit in the design, which only requires a few signals be watched. This meant my USB-based 16 channel unit would suffice to look at the signals at the beginning of the screen frame:

Hmmm, the counter never stops counting, and until the first negative pulse of the *RP (Row Preset) signal, the counter is counting from 0-15.

Let’s look back at the lowerkit schematic:

Don’t worry if you’re not good at reading schematics, I will walk you through it. We’re focused on the 74(LS)93 R signals (pins 2 and 3). Those clear the counter. So, when a frame starts, the flip flop is cleared (pin 2 on the 7473) as the *FS signal goes low (CLR will keep the flip flop cleared the entire time *FS is low). When a flip flop is cleared, the Q output becomes 0. !Q (not Q) thus becomes 1. And, we see this on the analyzer output. *FS (first line, in red) goes low, and the !Q (line 7, pale orange) goes high at the same time. Now, looking at the schematic again, with the !Q high, the 74125 (the little triangle next to the 7493 R pins) buffer is disabled, meaning the signal going into it will not be sent on. That line is RP (Row Preset, but inverted via the inverter just above and to the left of the 7473 box). But, if the signal is not being sent on, what happens to the R inputs on the counter?

To answer that, let’s talk about buffers, like the 74(ls)125. When a buffer is enabled, the input is sent to the output. Makes sense. When disabled, the input is not sent to the output, and the output is put into a state called “high impedance”, or Hi-Z. Basically, the pin looks like a very high resistance resistor to the circuit, and it shouldn’t be considered a 1 or 0. But, in this case, when the buffer is disabled, we want the R pins to be high, as that keeps the counter at 0.

Often, when you need to “weakly” bias an input pin high, you tie it to 5V (or whatever voltage the circuit is using) through a 4.7K-10K resistor. This trickles just a tiny bit of current (5 V / 10 kΩ = 0.5 mA) into the input pin, “weakly” pulling it to ‘1’. When another signal tries to pull the pin low, it does so easily because the weak bias can’t overpower the much stronger signal. I checked, and there’s no bias resistor on this pin in the original PCB or the schematic. In reality, in many IC families, like the “Low Power Schottky” or “LS” family, input pins tend to “float” high, due to the way inputs are fabricated. But, it’s not guaranteed.

Still, if you put a 74ls93 there, it should have worked. And, it probably would have, had I used a 74ls93. But, as a general rule, I don’t buy 74lsXXX parts anymore. I tend to buy High Speed CMOS TTL compatible (HCT) or Advanced High Speed CMOS TTL compatible (AHCT) parts, which is what I used here. And, surprise, 74HCT93 R input pins do not “float” high. They quickly settle to 0, which meant the counter was never being cleared while the flip flop was in the “cleared” status.
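The failure mode is easy to model. The Python sketch below treats the disabled buffer output as "no driver" and lets the R inputs take whatever level the part family tends to float to. The float levels used are the behavior described above (LS tends to float high, the HCT part settled low), not datasheet guarantees, and all names are made up for illustration.

```python
from typing import Optional


def buffer_74125(data: int, enable_n: int) -> Optional[int]:
    """Tri-state buffer: pass data when enable_n is 0, otherwise Hi-Z (None)."""
    return data if enable_n == 0 else None


def reset_level(driven: Optional[int], floats_to: int, pull_up: bool) -> int:
    """Level seen by the counter R inputs."""
    if driven is not None:
        return driven          # a real driver always wins over a weak bias
    if pull_up:
        return 1               # bias resistor holds the undriven pin high
    return floats_to           # otherwise the pin drifts wherever the family tends to


def counter_held_in_reset(q_n: int, rp: int, floats_to: int, pull_up: bool) -> bool:
    """True when the 7493 R inputs are high, which keeps the count at 0."""
    return reset_level(buffer_74125(rp, q_n), floats_to, pull_up) == 1


def pull_up_current_ma(volts: float, ohms: float) -> float:
    """Worst-case current through the bias resistor when the pin is pulled low."""
    return volts * 1000 / ohms


# Flip flop cleared (!Q = 1), so the buffer is disabled:
print("LS, no resistor: ", counter_held_in_reset(1, 0, floats_to=1, pull_up=False))
print("HCT, no resistor:", counter_held_in_reset(1, 0, floats_to=0, pull_up=False))
print("HCT, pull-up:    ", counter_held_in_reset(1, 0, floats_to=0, pull_up=True))
print("10K at 5V (mA):  ", pull_up_current_ma(5, 10000))
```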

A quick bias resistor to the R inputs helped matters immensely:

Notice the counter doing nothing until DA0 starts toggling? That’s expected behavior. The screen echoes this…

Yes, I should have used the exact parts specified by the original design, so ultimately this is my own problem. But, even if I had purchased LS TTL parts and the problem appeared, I’d not be too hasty to blame Dennis Bathory-Kitsz, the designer. The “proper” solution here (at least for this signal) was to output 1 IF (!Q=0 AND *RP=0) OR (!Q=1). You could rewrite that as R = Q*RP+!Q*!Q and then implement that with 3 NAND gates, but that just adds another IC to the design, increasing cost and creating more layout complexity. And, there may be other permutations, but I bet they all require another IC. Dennis had an extra 74ls125 buffer available after the rest of the design was finished, so he made use of it. It’s creative, and a quick bias resistor ensures the logic is sound. And, before you ask, “Why not always use bias resistors?”, note that such resistors don’t pull the inputs up as fast as a logic signal. For classic 1970’s and 1980’s systems, it’s often fast enough, but your mileage may vary.
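The gate-level alternative can be checked exhaustively. In the Python sketch below, `rp` is the active-high (already inverted) Row Preset and `q` is the flip flop Q output; the function names are made up. As a side observation from Boolean algebra, not something taken from the original design, the expression Q*RP + !Q reduces to RP + !Q, which is a single NAND of Q and the raw *RP signal. That would still need a gate the board does not have spare, so the conclusion about an extra IC stands.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def r_as_written(q: int, rp: int) -> int:
    """R = Q*RP + !Q*!Q, straight from the condition in the text."""
    not_q = 1 - q
    return (q & rp) | (not_q & not_q)


def r_three_nands(q: int, rp: int) -> int:
    """The same sum of products built from 3 NAND gates (!Q comes from the 7473)."""
    not_q = 1 - q
    return nand(nand(q, rp), nand(not_q, not_q))


def r_single_nand(q: int, rp_n: int) -> int:
    """Simplified form: NAND of Q and the raw, active-low *RP."""
    return nand(q, rp_n)


for q, rp in product((0, 1), repeat=2):
    print(q, rp, r_as_written(q, rp), r_three_nands(q, rp), r_single_nand(q, 1 - rp))
```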

As it is, the “failure” was most educational. By varying that starting value of the counter, you can “shift” how characters are displayed on the screen. That suggests lots of cool ideas, so we’ll keep that in mind as we continue to explore external character generation.

I took a bit of time to look at all the fonts Dennis created, by flipping through the ROMs via the DIP switches. Since there’s no harm in doing so while the VDG is using the ROM, you can put some characters on the screen and then see how each font set renders the text. I also looked at the design a bit more and realized data line 7 is not connected to the EPROM at all. The VDG always gets the high bit from the incoming data. Since D7 is also connected on the CoCo to the Alpha/Semigraphics signal, that means every external font bitmap has bit 7 cleared. I checked the fonts I assembled, and almost all of the high bits were clear, except for CHGSCR:

the last character in CHGNEW:

And CHGDEVL2, which appears to mirror CHGSCR.
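A check like this is simple to automate. The Python sketch below scans a font for glyphs that have bit 7 set in any visible row. It assumes any file header has already been stripped and that each glyph occupies 16 bytes with 12 visible rows; the function name and demo data are hypothetical.

```python
def glyphs_with_bit7(font: bytes, rows_per_glyph: int = 16, visible: int = 12) -> list:
    """Return the indexes of glyphs with bit 7 set in any visible row."""
    hits = []
    for glyph in range(len(font) // rows_per_glyph):
        base = glyph * rows_per_glyph
        if any(b & 0x80 for b in font[base:base + visible]):
            hits.append(glyph)
    return hits


# Synthetic font: glyph 3 uses the high bit in a visible row,
# glyph 5 only has it set in a spare row, which is ignored.
demo_font = bytearray(128 * 16)
demo_font[3 * 16 + 4] = 0xC0
demo_font[5 * 16 + 13] = 0xFF

print(glyphs_with_bit7(bytes(demo_font)))
```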

The design uses DD7 (which is, as I noted, connected to the *Alpha/Semigraphics pin externally on the CoCo PCB) to select whether the internal or external ROMs should be used. It then uses 2 74ls157 4 bit A/B selector ICs to pick which data to send to the VDG, just like in the VDG diagram above. It uses 7 of those A/B inputs for the internal/external data, and 1 to pass a 1 or 0 out to various logic gates. Since the last A/B gate is used for this purpose, it’s not available for font data.

Looking at the schematic, there are a few things that I’d consider changing. As noted, the VDG datasheet uses a preloadable 74*161 4 bit counter to handle the low address lines of the font ROM, and this design uses a 74*93 and 74*73, so I’d consider swapping that design. And, from pictures I saw online, later revisions of the “lowerkit” product did just that. Here’s Revision 1 (as illustrated in the manual) compared to LowerKit 3C Revision H (thanks to Al Hartman for his flickr stream, where I found these examples):

Notice the 74ls161 on the second PCB? That would eliminate the issue I found, as the 7473 is gone from the design.

I also think data line 7 from the ROM can be fed into the VDG via the A/B selector ICs. I think the logic that selects a 1 or 0 through the A/B selector IC can be replaced by the now spare 74ls125 buffer (which we no longer need for the counter logic), feeding DD7 into the various logic cells.

Finally, I’d replace the 74ls157 A/B selectors with a 74ls244/245 8 bit buffer/transceiver. It turns out the EPROM already has an 8 bit buffer inside (RAMs and ROMs have an output enable pin for just this purpose). Data Line 7 or the *A/S line could be used to enable the EPROM, and the inverted version of that signal could be used to enable the 8 bit buffer. That would reduce the parts count by yet 1 more IC.

But, as Dennis told me, he was an early creator in this space, and there wasn’t a lot of guidance out there at the time.

If you’d like to check out the schematic and the PCB design, it’s on the GitHub repository.

Expanding the MC6847: Deciphering Fonts

Searching for Information

As I was trying to decipher how to leverage an external character font ROM with the MC6847, I searched for existing project/products in this space. My web searches first found Ed Snider’s (Zippster of ZippsterZone) reproduction of the Green Mountain Micro (GMM) “lowerkit” product from “back in the day”. Ed shared a bit of technical detail and project progress, but no schematics or additional information that might answer questions on how the circuit worked. I also found a reproduction of the lowercase daughterboard for the Dragon 200e (which also works for the Dragon 32/64) produced by DragonPlus Electronics. Like with Ed’s page, DragonPlus offered some technical information (build instructions) but no schematic. Still, I had a start.

After changing some search terms, I found a few downloads that purported to be for the “lowerkit” product on a TRS-80 Model 1 web site. Confused why files for a CoCo product were on a web archive for a different machine, I still downloaded the files, and searching for those same filenames found related files on a few other web archives. At the end, I found:

Initial Results

Extracting all of the files, I found a number of binary files in the archives, but the size seemed too big (3kB), whereas the datasheet suggested a 2kB (11 address lines) EPROM.

Loading the files into a hex editor suggested font data, so I grabbed a quick file reading C program template from the Internet and modified it to spit out the data as a text-based bitmap file, with “*” for a set bit, and ‘ ‘ for a clear one.
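The original tool was a quick C program; the Python sketch below shows the same idea in a few lines and is only an approximation of it. The function names are made up, and the demo bytes are the first 8 bytes of the file shown in the listing that follows.

```python
def bitmap_text(byte: int) -> str:
    """Render one byte as 8 characters, most significant bit first."""
    return "".join("*" if byte & (1 << bit) else " " for bit in range(7, -1, -1))


def dump(data: bytes) -> list:
    """One line per byte: offset, hex value, and the text bitmap."""
    return [f"Byte: {offset:04X} {byte:02X} '{bitmap_text(byte)}'"
            for offset, byte in enumerate(data)]


for line in dump(bytes.fromhex("05064348474d4154")):
    print(line)
```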

Running chgmath.bin through my simple program provided encouraging results:

Byte: 0000 05 '     * *'
Byte: 0001 06 ' ** '
Byte: 0002 43 ' * **'
Byte: 0003 48 ' * * '
Byte: 0004 47 ' * ***'
Byte: 0005 4D ' * ** *'
Byte: 0006 41 ' * *'
Byte: 0007 54 ' * * * '
Byte: 0008 01 ' *'
Byte: 0009 00 ' '
Byte: 000A 00 ' '
Byte: 000B 52 ' * * * '
Byte: 000C 00 ' '
Byte: 000D 00 ' '
Byte: 000E 08 ' * '
Byte: 000F 14 ' * * '
Byte: 0010 14 ' * * '
Byte: 0011 22 ' * * '
Byte: 0012 22 ' * * '
Byte: 0013 41 ' * *'
Byte: 0014 7F ' *******'
Byte: 0015 00 ' '
Byte: 0016 00 ' '
Byte: 0017 00 ' '
Byte: 0018 00 ' '
Byte: 0019 00 ' '
Byte: 001A 00 ' '
Byte: 001B 00 ' '
Byte: 001C 7F ' *******'
Byte: 001D 41 ' * *'
Byte: 001E 20 ' * '
Byte: 001F 10 ' * '
Byte: 0020 08 ' * '
Byte: 0021 10 ' * '
Byte: 0022 20 ' * '
Byte: 0023 41 ' * *'
Byte: 0024 7F ' *******'
Byte: 0025 00 ' '
Byte: 0026 00 ' '
Byte: 0027 00 ' '
Byte: 0028 00 ' '
Byte: 0029 00 ' '
Byte: 002A 00 ' '
Byte: 002B 00 ' '
Byte: 002C 06 ' ** '
Byte: 002D 09 ' * *'
Byte: 002E 08 ' * '
Byte: 002F 08 ' * '
Byte: 0030 08 ' * '
Byte: 0031 08 ' * '
Byte: 0032 08 ' * '
Byte: 0033 08 ' * '
Byte: 0034 08 ' * '
Byte: 0035 08 ' * '
Byte: 0036 48 ' * * '
Byte: 0037 30 ' ** '
Byte: 0038 00 ' '
Byte: 0039 00 ' '
Byte: 003A 00 ' '
Byte: 003B 00 ' '
...

The first 2 bytes of each file were 0x05, 0x06, and the next few bytes looked like ASCII (and they are: the bytes above spell out CHGMAT, which looks like a 6 byte filename). There are 4 more bytes that don’t look like font data, and then font glyph bitmap data in 16 byte chunks.

That answers one question about the datasheet schematic. The row counter address lines are the lowest 4 bits of the font ROM address. I thought they were, since it makes creating the ROM easier (all of the data for each character glyph’s bitmap resides in contiguous memory locations), but it “wastes” 4 bytes per font glyph (the lower 4 bits of the address will never go above 11, according to the datasheet, but the next bitmap will be 16 bytes above the previous one).
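The layout described above can be sketched in a few lines of Python. This is only an illustration of the address arithmetic, not code from the project; the names `rom_address` and `glyph_rows` are made up for the example, and it assumes the row counter runs from 0 to 11 as the text describes.

```python
ROWS_PER_GLYPH = 12  # rows the VDG displays per character cell
SLOT_SIZE = 16       # bytes reserved per glyph, since the row counter owns 4 address bits


def rom_address(char_code: int, row: int) -> int:
    """Font ROM address for one row: character code in the upper bits, row in the low 4."""
    if not 0 <= row < ROWS_PER_GLYPH:
        raise ValueError("row must be 0..11")
    return (char_code << 4) | row


def glyph_rows(rom: bytes, char_code: int) -> bytes:
    """Return the 12 displayed bytes of a glyph, skipping the 4 unused bytes of its slot."""
    base = char_code << 4
    return rom[base:base + ROWS_PER_GLYPH]
```

So glyph 1 starts at address 16 even though glyph 0 only uses addresses 0 through 11, which is where the 4 “wasted” bytes per glyph come from.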

Looking at the rest of the files, some have odd data in bytes 12, 13, 14, and 15 of each character glyph, and there are extra bytes in the file every so often, but I felt like I had made some progress. I decided to create a GitHub repository at https://github.com/go4retro/lowerkit and place my current progress in the charsets directory. I took a stab at trying to clean up the parsing functions and remove some of the extraneous data in the file, noticing that 4 extra bytes appeared every $102 bytes in the file. Theorizing those were some type of “sector” or “block” structure, I parsed them out of the font stream and the results looked even better, save for some oddities at the end of each file.

Deciphering the File Format

Since the files were in a TRS80 Model 1 archive, I reached out to the TANDY Discord server, relating the oddity about control characters appearing every $102 bytes in the file. George Phillips (of TRS80GP emulator fame) responded that the files were encoded in the TRSDOS /CMD format, and that his TRLD utility could parse the files. I grabbed the utility, followed his directions, and the resulting binary files appeared!

The only issue was that they didn’t appear alone. The TRLD utility extracted the data, but saved the ROM data in what appeared to be a larger memory dump, which makes sense, given how the file format works (it has blocks of data, which contain a load address and length, and they could appear anywhere). George’s utility takes the safe approach and parses the file into a blank memory map, and then saves it all to disk. However, since these font ROM file blocks are contiguous, there’s no need to save off the entire map.
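To make the block structure concrete, here is a rough Python sketch of a /CMD reader. It is not the TRLD code or Jim Lawless’s parser. It rests on these assumptions about the format: each record is a type byte and a length byte; type $01 is a load block whose length covers a 2-byte little-endian load address plus the data, with lengths 0, 1, and 2 standing for 256, 257, and 258; type $02 is the transfer address that ends the file; anything else (such as the $05 name record) is skipped. If that reading is right, the 01 00 00 52 after the filename in the dump above would be a load block header, and a full block would be 2 + 256 = $102 bytes, matching the spacing noticed earlier. The function names are invented for this example.

```python
def parse_cmd(data: bytes) -> list[tuple[int, bytes]]:
    """Return (load_address, payload) for each load block in a /CMD image."""
    blocks = []
    pos = 0
    while pos + 2 <= len(data):
        rec_type, length = data[pos], data[pos + 1]
        pos += 2
        if rec_type == 0x01:        # load block: address (2 bytes), then data
            if length < 3:          # assumed rule: 0, 1, 2 mean 256, 257, 258
                length += 256
            if pos + 2 > len(data):
                break               # truncated file
            address = data[pos] | (data[pos + 1] << 8)
            blocks.append((address, data[pos + 2:pos + length]))
            pos += length
        elif rec_type == 0x02:      # transfer address: end of the load module
            break
        else:                       # name/comment records: skip the body
            pos += length or 256    # assumed: a zero length means 256 here too
    return blocks


def join_contiguous(blocks: list[tuple[int, bytes]]) -> bytes:
    """Concatenate payloads, insisting each block starts where the previous one ended."""
    out = bytearray()
    expected = None
    for address, payload in blocks:
        if expected is not None and address != expected:
            raise ValueError(f"gap before block at ${address:04X}")
        out += payload
        expected = address + len(payload)
    return bytes(out)
```

Because the font blocks are contiguous, `join_contiguous(parse_cmd(data))` yields just the font bytes without the surrounding memory map.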

To be fair, George noted it would do this, and suggested I could either look at the first block’s load address in the file (using the diagnostic dump parameter of the utility) and then strip off that many bytes from the file, or dig into the trld source and extract the parse engine to roll my own. While I was debating which to do, I found a blog posting about parsing CMD files from Jim Lawless, a friend from many years back, complete with source code.

Jim’s utility didn’t parse the data per se, but it did parse the structure, and the source was available, so I just lifted my functions from my quick utility and inserted them into this new source. The resulting output was clean, so I then added in an optional parameter to save off the binary data as a contiguous file, suitable for burning to an EPROM.

The LowerKit Fonts

To make it easier to compare fonts and allow a better way to show what’s in them, I modified one of my simple utilities to brute force create some SVG files, shown below. I apologize for nothing, it was simply a means to an end 🙂

CHGAPL

CHGAPL Font

CHGEURO

CHGGREEK

CHGKATA

CHGKAT2

Internal name is KATA, just like the one above

CHGMATH

CHGSCR

CHGSMALL

CHGNEW

The SVG images of the fonts are at: https://github.com/go4retro/lowerkit/tree/main/charsets/drawsvg and font binaries are at: https://github.com/go4retro/lowerkit/tree/main/charsets/mkroms.

I extracted all of the binary files from the various archives, but it looks like some are duplicates:

  • CHGAPL2 -> CHGAPL
  • LITTLE2 -> SMALL

Discovering More Fonts!

While working on the font files themselves, I looked at the BAS files in the various archives, which appear to be Model 1 BASIC files. However, after perusing a few of them, I noticed it appeared they hold a copy of a font file within the BASIC program. I quickly copied out the BAS files, trimmed off everything except the BASIC ROM data, and compiled a second parser to make sense of this data. CHGEURO.BAS, for example, contains data looking like this:

XXXXXXX,XXXXX,XXXXXXX,XXX,XXXXXX,XX,XXXXX,XXXXX,XX,XXXXXXX,XXXXXXX,XXXXXXX ...

7 (and sometimes 8) bits of font data, a comma, next row, etc. Every so often, there are 6 extra bytes of data that are not font bitmap data, so I had to decipher that. But, it was not hard, and that data above became this:

'       '
' * * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, 79, 6B, 6E, 00, 88
' * '
' * '
' '
' **** '
' * '
' ***** '
'* * '
'* * '
' **** *'
' '
' '
' ' Ignoring 00, DE, 6B, 78, 00, 88

I have a feeling the extra data is internal BASIC linkage information, but it was easy to discard.

Most of the files match up to the font data files I parsed, but there were a few additional ones:

CHGDEVLR

CHGDEVL2

Fonts Left to Find

In the manual, it references:

  • ASCII Shifted with Greek (I think that’s CHGGREEK above)
  • ASCII Shifted with Kata Kana (CHGKATA)
  • ASCII Shifted with APL Character Set (CHGAPL)
  • Kata Kana Shifted with ASCII (CHGKAT2)
  • ASCII Shifted with Cyrillic
  • ASCII Shifted with Arabic
  • ASCII Shifted with Hebrew
  • ASCII Shifted with French Characters
  • ASCII Shifted with European character variants (CHGEURO)
  • ASCII Shifted with Math symbols and operators (CHGMATH)

So, it looks like a few are left to find. Let me know if you own or have archived the Model 1 programs that would include these files, or if you have the ROM images (or the ROMs themselves; I can image them here).

How Low Can You Go?

Understanding external character ROM functionality on the Motorola MC6847 Video Display Generator (VDG)

I’ll be honest, I rarely use lowercase characters on any of the classic computers I use. Back in the day, of course, I needed them when writing posts on BBS systems, composing school papers, and writing general correspondence (my handwriting was never great). Still, they were always there when I needed them.

Sometimes, they came stock with the system. Others needed a bit of help, usually with a character ROM swap or popping in an updated video IC. But, some required more complex upgrades, like adding lowercase to the TANDY Color Computer. Though later models offered the updated video IC option or a newly designed video IC (CoCo 3), early models took a different approach.

Up until the CoCo 3, the CoCo utilized variants of the Motorola MC6847 Video Display Generator, commonly called the VDG. The VDG was an early video display entry, with limited graphics capabilities, first offered in 1978. The original MC6847 offered only 64 characters, comprising uppercase alphanumeric characters and the requisite punctuation and related characters. A later T1 variant included lowercase characters but was not a drop-in replacement. Both variants utilized an 8 pixel wide by 12 pixel tall character matrix, but included only a 7×5 font in ROM.

However, both variants offered the ability to utilize an external character ROM, as shown on page 23 of the MC6847 datasheet:

MC6847 External character ROM support

Notice the portion in the lower right. I will admit the connection from the display memory data lines to the address lines of the character ROM confused me for a while, but a quick inquiry on the 6502.org discussion forum saw Rob Finch clear it up, noting that the display memory would provide the character value to the ROM, and the Row Counter would then select which of the 12 rows of that character’s bitmap the ROM outputs. The buffers and boolean logic serve to disable the external ROM if either the internal ROM or the semigraphics mode is selected by the programmer.

Now understanding how the ROM was used, I turned my attention to the counter component at the bottom of the diagram. Page 13 of the datasheet showed more detail, but it was somewhat confusing to understand how it worked:

Row Counter

I understand the idea of the counter, but preloading it with 9 (P3 and P0 high, P2/P1 low) made no sense. For reference, *RP is Row Preset, while *FS is Frame Sync and *HS is Horizontal Sync. *HS will trigger once per line, while *FS will trigger once per frame, and *RP triggers every 12th *HS, BUT there is a note in the datasheet:

Mind you, all datasheets take a bit to understand, but the timing diagram showing the relationship among these 3 signals was hard to grasp from the technical document. I could pore over it some more, but I thought it might be better to see if there was an existing external ROM product sold back in the day I could find. And, of course, there were a few, but the first I found was Green Mountain Micro’s “lowerkit” solution, which provided lowercase characters (and other goodies) for the CoCo. The next step was to find out more information about this product and how it worked.

CoCo DMA: All Charged Up!

Addressing the Color Computer 3 DMA operation challenges via hardware modification, while pretty simple, limits initial adoption. It places the technique into a classic “chicken and egg” situation. The CoCo3 provides an ideal platform for DMA capabilities, but owners will only perform hardware modification if highly desired peripheral capability demands it. Hardware designers, for their part, prefer to focus energy on innovative peripheral design that targets common CoCo3 hardware configurations. What if we could shortcut the process by implementing a solution that does not require hardware alteration?

Return your attention to the portion of the schematic showing the M68B09E data bus and the 74LS245 bus transceiver, the root of the challenge. As we have noted, the transceiver is always tied to the memory data bus (pins 2-9 in the schematic) and outputs data onto the memory bus during any write cycle. How might we work around this constraint or, failing that, leverage this fact? One idea often considered attempts to “race” the ‘245 transceiver. Essentially, place the data you want to store on the bus, delay issuing the write until the last possible moment, enable the write line, which then enables the ‘245 buffer, and then hope that the memory latches your data before the ‘245 flips over and places its data on the bus.

Figure 1: CoCo3 CPU Data Bus Schematic

First, let’s review DRAM access by referencing the timing diagram for a Texas Instruments 4464 64kbx4 Dynamic Random Access Memory (DRAM) in Figure 2, of the same type often installed in the CoCo3. For various reasons (pin count reduction, moving the multiplexing function out of each IC into a common area, supporting faster memory access options), DRAMs expose multiplexed addressing pins. Think of a DRAM as a matrix of memory cells arranged in a row/column configuration. To read or write DRAM memory, one places the row portion of the address onto the multiplexed address lines, and activates the “row address strobe” (RAS) signal. The DRAM latches the row value and calls up that row in the memory matrix. A short time later, one places the column portion of the address onto the same pins and activates the “column address strobe” (CAS) signal. The DRAM then reads or writes that portion of the matrix.
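As a small illustration of the multiplexing, the Python sketch below splits a 16-bit cell address into the two 8-bit halves a 64K-deep part would see on its address pins. Which address bits a given board routes to the row and which to the column is a design choice, so the upper/lower split used here is only an assumption, and the function names are hypothetical.

```python
ADDRESS_PINS = 8  # a 64K-deep DRAM needs 16 address bits, presented as two 8-bit halves


def split_address(address: int) -> tuple[int, int]:
    """Split a 16-bit cell address into (row, column) halves."""
    mask = (1 << ADDRESS_PINS) - 1
    row = (address >> ADDRESS_PINS) & mask  # assumed: upper half is the row
    column = address & mask                 # assumed: lower half is the column
    return row, column


def access_sequence(address: int) -> list[tuple[str, int]]:
    """List the strobes in order, with the value on the multiplexed pins at each one."""
    row, column = split_address(address)
    return [("RAS", row), ("CAS", column)]
```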

Figure 2: DRAM Write Timing

There are rules to follow. Notice the “write” signal, denoted by a W with a line over it (active low). Right above it and below it, focus attention on the arrows showing th(CLW) and tw(W). th(CLW) illustrates the amount of time after the CAS line falls until the write line can go inactive (essentially, the time it takes for the actual memory write to occur). tw(W) illustrates the duration the write signal must be held low. For a 150nS (nanosecond) speed grade DRAM, tw(W) and th(CLW) are both specified as 45nS minimum (the timing diagram misleadingly suggests the write line must be pulled low before CAS goes low, which is not true).

On a CoCo3 running in FAST mode, we have ~280nS to perform a CPU write (560nS for a complete cycle at 1.78MHz, and the CPU gets half of that). Assuming the write will complete at the end of the CPU cycle, that means we need to enable the write function 45 nS before that, or at 235nS. This poses a problem. Assuming that the address starts getting set up at the beginning of the CPU portion of the clock cycle and we don’t immediately activate the write line, the DRAM will perform a read activity, culminating in valid data on the DRAM data lines 150nS after the start of the cycle. We weren’t going to activate the write line until 235nS into the cycle, so now the data on the memory bus (from the DRAM) will be fighting the data we’ve placed on the bus. If that isn’t enough of an issue, when we activate the write line, the 74LS245 will enable data on its output lines approximately 25nS after doing so. But, we need stable data for 45nS after activating the write line. The good news is that we’re no longer fighting the DRAM on the bus, but we end up fighting the ‘245 on the bus for 20nS before the end of the cycle (45ns – 25ns).
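The arithmetic in that paragraph is easy to lose track of, so here it is as a short Python sketch. All default values are the figures quoted above; the 85nS DRAM contention figure is not stated in the text but follows from subtracting the 150nS read access time from the 235nS write start. The function name is made up for this example.

```python
def write_race_budget(
    cpu_window_ns: int = 280,       # CPU half of a ~560nS FAST-mode cycle
    write_pulse_ns: int = 45,       # tw(W) minimum for a 150nS part
    dram_read_valid_ns: int = 150,  # DRAM drives read data this long after cycle start
    ls245_enable_ns: int = 25,      # approximate '245 output enable delay
) -> dict[str, int]:
    """Work out when the write must start and how long each bus fight lasts."""
    write_start = cpu_window_ns - write_pulse_ns
    return {
        "write_start_ns": write_start,
        # DRAM read data collides with our data until the write line goes active.
        "dram_contention_ns": max(0, write_start - dram_read_valid_ns),
        # The '245 then collides with our data for the remainder of the write pulse.
        "ls245_contention_ns": max(0, write_pulse_ns - ls245_enable_ns),
    }


if __name__ == "__main__":
    for name, value in write_race_budget().items():
        print(f"{name}: {value}")
```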

Perhaps, if we choose not to fight the 74LS245, we may be able to leverage it to help our cause. Enter Darren Atkinson once again. While I was preparing and testing the hardware solution for this challenge, Darren (who had started conversing with me on the topic some weeks ago) started considering options and dropped a deceptively simple email to me on the topic a week ago. Between us, we have refined this idea, which I now present to the public on behalf of Darren and myself.

The essential idea: We can’t keep the 74LS245 off the memory bus during a write cycle, but what if we can leverage it to put the data we want onto the bus?

Let’s illustrate the CoCo1/2/Dragon timing diagram. I should take a moment here to sing the praises of the WaveDrom (https://wavedrom.com) Online Timing Diagram Editor:

Figure 3: CoCo1/2 DMA Write Example

Shown here, the address and data lines are activated by the DMA engine during the entire computer cycle, along with the enabled R/W line. The CPU essentially removes itself from the bus during the cycle. If we consider the Color Computer 3, we know from Figure 1 that the 74LS245 will output data onto the bus during a write cycle, but we also know that the memory is not connected to the CPU during the first half of the cycle (the GIME accesses memory during this time). What if we could store a value on the one side of the ‘245 during the first part of the clock cycle, and then have the transceiver push that value onto the bus during the latter half of the cycle? That would be awesome, except the only thing connected to the other side of the bus transceiver is the CPU, now removed from the bus, and some short PCB traces. It doesn’t look too promising.

Since you know where this is going, let’s dig a bit deeper into electronics. In hardware design, we like to think of computers as digital systems, where the only thing that matters is high or low, 1 or 0, +5Volt or ground. That’s great, but digital computers are inherently analog in nature, regardless of how much we want to ignore that. In analog circuits, everything has an inherent capacitance, or the ability to hold a voltage charge for a period of time. As well, everything has an inherent resistance, or a desire to slow down the flow of electrons. In fact, this capacitance and resistance lies at the heart of DRAM. Unlike static RAM, where the memory cell holds its value until power disappears, a DRAM cell is basically a small capacitor, holding a bit of charge (or not) that represents the value desired in that memory bit location. The inherent resistance in the DRAM cell slowly “bleeds” off the voltage in the capacitor, which is why DRAM must be “refreshed” every so often. Delay the refresh, the resistor bleeds off too much charge, and the memory is lost.

What if we treat the external pins of the CPU data bus, the small traces between it and the 74LS245, and the connected pins of the bus transceiver as a set of 8 small memory cells? They are crude and they will lose their value very fast, but perhaps they will hold a value long enough for us to accomplish our goals.

Let’s revise our timing diagram with this idea. Instead of placing data on the data bus for the entire cycle, let’s place data onto the bus during the first quarter of the cycle, while we also hold the read line high. With the CPU tri-stated (though the CPU pins are still physically connected to the bus, and so can participate in our memory cell idea), the bus transceiver dutifully captures that data and places it onto the now tri-stated data pins of the CPU (and the traces between the bus transceiver and the CPU). Then, a quarter cycle later, the DMA engine pulls the data off the bus, and signals the system to write data, placing the correct address onto the address bus at that time as well. At this point, the 74LS245 bus transceiver should dutifully read the CPU data bus pins, which now hold the residual charge placed there moments before. The transceiver, locked into the “write” mode, will then amplify that charge and place it onto the memory bus, where the DRAM can access it.

Figure 4: No-Mod CoCo3 DMA Timing Diagram

Outcome: Success! It turns out that our little 8 bit memory cell can charge up in 140nS or less and will hold its value for at least 840nS (3/4 of a cycle in SLOW mode).
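For reference, both figures fall out of the cycle lengths. The short Python sketch below uses the ~560nS FAST-mode cycle quoted earlier and assumes a SLOW-mode cycle of twice that (1120nS), which is what the 840nS figure implies; the helper names are illustrative only.

```python
FAST_CYCLE_NS = 560                 # FAST-mode cycle quoted earlier in this post
SLOW_CYCLE_NS = 2 * FAST_CYCLE_NS   # assumed: SLOW mode runs at half the FAST clock


def charge_window_ns(cycle_ns: int) -> int:
    """First quarter of the cycle, when the DMA engine drives data onto the bus."""
    return cycle_ns // 4


def hold_time_ns(cycle_ns: int) -> int:
    """Remaining three quarters, during which the stray capacitance must keep the value."""
    return cycle_ns - charge_window_ns(cycle_ns)


if __name__ == "__main__":
    print("shortest charge window:", charge_window_ns(FAST_CYCLE_NS), "nS")
    print("longest hold time:", hold_time_ns(SLOW_CYCLE_NS), "nS")
```

The worst cases come from opposite modes: the charge window is shortest in FAST mode, while the hold time is longest in SLOW mode.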

Let’s step back and look at the entire DMA process, both for reads and also for writes, with special emphasis on the changes needed for the CoCo3. As you will recall, all Color Computers share the same read operations, placing the system into DMA mode and reading memory:

Figure : Color Computer 4-Byte DMA Read Example

For DMA write operations on the CoCo 1 and 2, things change very little. We place data on the data bus and signal a write to memory:

Figure : Color Computer 1 and 2 4-Byte DMA Write Example

Finally, let’s show how this changes slightly to accommodate the CoCo3:

Figure : Color Computer 3 4-Byte DMA Write Example

Astute readers have already started wondering: Why is $ffxx placed on the address bus at the beginning of the cycle? Why can’t the desired memory address be placed on the address bus for the entire cycle? To answer that, let’s look at all the items that can place values on the data bus. On the CoCo3, the bus transceiver, the two (2) Peripheral Interface Adapters (PIA), the cartridge port, and the GIME can output data on the bus. Let’s further focus on the first quarter of the CPU cycle, when our DMA engine has placed data on the bus and a read operation has been requested. We already know the 74LS245 bus transceiver is pulling data from the bus, not writing to it. The PIAs specifically gate their reads with the E clock and tri-state the data bus during the low half of the E clock. That leaves the GIME as a possibility.

Interestingly, the data path from the CPU to RAM and the data path from RAM to the CPU (or our DMA Engine) differ, which may surprise some. During write cycles, there are two bus buffers (74LS244) that shepherd data from the data bus into one (1) of the two (2) memory banks (the odd memory locations are in one bank, while the even ones are in another). But, all RAM read accesses pass through the GIME en route to the CPU. Page 36 stipulates the GIME controls “granting access to the processor during the high time of E (CPU portion)”, which suggests the GIME acts like the PIAs. However, experimentation suggests this is slightly incorrect (or at least somewhat vague).

Since all RAM read accesses travel through the GIME data pins to the CPU, the GIME is allowed to place data on the data bus during any RAM read access. When the Color Computer 3 is configured for an all-RAM memory map, only read accesses to $ff00-$ff7f (I/O region) or $fff0-$ffff (CPU vectors) would be serviced elsewhere. The former would be handled by the PIAs and the cartridge port peripherals, while the latter always reads from onboard ROM. Normally, we would expect the data pins to be inactive during the low portion of the E clock cycle, when the CPU is not accessing the bus. However, possibly due to the need to reduce logic complexity, the GIME tri-states its data bus anytime a write is requested or the above address blocks are accessed, regardless of the state of the E clock cycle. Thus, to ensure the GIME stays off the data bus during the read portion of our DMA write activity, we select an address in one of these areas. We did not experimentally test, but we assume that if a memory map with ROM is selected, all ROM memory locations will also force the GIME to tri-state its data pins (the DMA engine has to select a suitable address given the most restrictive set of requirements/memory map).

As a further note, there is one other portion of the memory map that forces the GIME off the bus during a read cycle. The data sheet for the MC6883 Synchronous Address Multiplexer (SAM), which the GIME partially emulates, denotes locations $ffe0-$ffef as “Reserved for future MPU enhancements. Do Not Use!” (emphasis in original document). To validate this information, a special version of the DMA Engine was created that allowed the “read address” value to be manipulated by the CoCo3 and all 65536 address values were tested.
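Pulling the address ranges from the last two paragraphs together, a DMA engine’s choice of “parking” address could be checked with something like the Python sketch below. It only encodes the three ranges named in the text for the all-RAM memory map; the names are hypothetical, and it says nothing about addresses the text does not discuss.

```python
# Ranges where, per the text, the GIME keeps its data pins tri-stated during a read
# (all-RAM memory map assumed).
GIME_OFF_BUS_RANGES = [
    (0xFF00, 0xFF7F),  # I/O region (PIAs, cartridge port peripherals)
    (0xFFE0, 0xFFEF),  # SAM "reserved for future MPU enhancements" block
    (0xFFF0, 0xFFFF),  # CPU vectors
]


def gime_off_bus_on_read(address: int) -> bool:
    """True if a read at this address should leave the GIME off the data bus."""
    return any(low <= address <= high for low, high in GIME_OFF_BUS_RANGES)


def count_safe_addresses() -> int:
    """Sweep all 65536 addresses, mirroring the exhaustive hardware test in spirit."""
    return sum(gime_off_bus_on_read(address) for address in range(0x10000))
```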

We must perform more testing to ensure correct operation across various CoCo3 variations. As well, though we have tested with both MC6809E and HD63C09E CPUs, using stock memory, DRAM-based memory expansions, and SRAM-based memory expansions, various Dynamic Address Translation (DAT) logic options like those found in the Boomerang E2 and Triad+ have yet to be assessed. Initial results, though, look promising, and we are eager for others to replicate Darren’s and my findings.

I would like to take this opportunity to once again thank Darren Atkinson for discovering this solution. Though we both contributed to the final implementation, I can honestly say I would not have considered this path had he not done the initial investigation. As well, I appreciate that, unlike years ago when electronic testing equipment and design tools and services were completely out of the hobbyist’s reach, items like surplus test equipment and PCB CAD software and manufacturing services allow easy investigation into ideas like these.

Now, since we have conquered all of the Color Computer variants, can we do anything else with this DMA idea?

CoCo DMA: Missing Without a Trace

Radio Shack’s introduction of the Color Computer 3 introduced extensive changes to the Color Computer design. Gone were the 6847 VDG and 6883 SAM, functionally replaced and enhanced with the TCC1014 Advanced Color Video Chip (ACVC) which most folks reference as the GIME. Other additions included an additional 64kB of RAM with an option to upgrade all the way to 512kB, support for an additional joystick fire button, and composite and RGB video output options. Almost every addition brought new capabilities or an expansion of capabilities to the Color Computer…

Figure 1: 74LS245 Pinout

Interestingly, for a machine design that I understand focused on parts and cost reduction within TANDY, 1 part was added to the CPU’s data path along the memory bus, an addition that renders our previous work unusable, an addition that breaks the very idea of sharing the bus: the 74LS245 octal bus transceiver. Ironically, the ‘245 transceiver was designed to allow bus sharing. As the name suggests, it contains 8 logic elements, each able to pass one signal from the ‘A’ pin to the ‘B’ pin or vice versa, depending on the state of the “direction” pin. More importantly, when an “enable” pin is de-activated, the logic element effectively “removes” the signal from the pin on the IC. In the industry, the pin is said to be in high impedance mode or “Hi-Z” mode. Such pins are commonly called “tri-state” pins, as they can either output 0, 1, or Hi-Z. The mechanics vary, but the term implies the logic element effectively places such a large resistance in between the signal and the pin that the rest of the circuit can essentially ignore or will have no awareness of the signal. The 8 elements, Hi-Z operation, and direction function fit the demands of an 8-bit CPU bidirectional data bus perfectly. We can’t blame the IC itself for this issue, so why does the inclusion of this IC in the Color Computer 3 design render bus sharing unusable?

Let’s peruse the CoCo3 schematic, located on page 103 (or 127 for PAL) of the TANDY Color Computer 3 Service Manual. A snippet is included below in Figure 2. The 74LS245 connects directly to the MC6809E CPU data bus, with the DIR line connected to the CPU R/W line. During normal operation, the CPU drives the ‘B’ lines during a write action, and the ‘245 reads those lines and drives the ‘A’ lines with the same value. On a read activity, the ‘245 reads the data bus ‘A’ lines and drives the ‘B’ lines with that value, which the CPU then reads on its data pins and processes.

Figure 2: CoCo3 CPU Data Bus Schematic

So far, so good. And, all works fine when we perform a DMA-based read action. After halting the CPU, The DMA engine places an address on the bus, and raises the R/W line to read data from memory, just like the CPU. Doing its part, the ‘245 transceiver pulls the data from the ‘A’ pins and places it on the ‘B’ pins, which are connected to the CPU. It’s a nice gesture, though unneeded, since the CPU isn’t running.

Writing, though, is where the issues arrive. Let’s run through the scenario. After halting the CPU, the DMA engine places an address on the address bus, places some data on the data bus, and pulls the R/W line low to signal a memory write. The ‘245, assuming the CPU is trying to write some data to memory, grabs the values on the ‘B’ pins and places the results on the ‘A’ pins, which are connected directly to the shared data bus. The first problem: The CPU is halted, so it is not outputting any data on the CPU data pins. The ‘245 is effectively pulling residual charge from the tri-stated CPU data bus, cleaning it up and driving the data bus. The second problem: Someone else is already driving the bus: the DMA engine. At the very least, the data bus represents an indeterminate composite of the expected value from the DMA engine and the unknown value from the CPU data pins. More importantly (and more concerning), one output might be attempting to drive a data line low while the other tries to drive it high. In some sense, this is like connecting a battery’s positive post to its negative post. While the amperage involved is much lower, it’s no less damaging to the ICs.

Focus attention on pin 19 of the 74LS245, the “enable” pin, labeled “G”. (I have not yet been able to determine why “G” was used, maybe “gate”. Other versions of the IC use OE, which means “output enable”). Regardless of the name, the bar over the name indicates that it is “active low”. A logic zero (0) on the pin will enable the function. Ground the pin, and the data passes from A to B or B to A. Connect the pin to a “one” (1) signal and all transceiver channels switch into Hi-Z mode. Notice in Figure 2 that the enable line is connected to a symbol that just ends. That symbol is called “chassis ground”, representing a wire connected to a large sheet of metal (the “chassis” of old electronic equipment was made of metal). I won’t go into the specifics of various grounding options (signal ground, earth ground, and chassis ground), but we can all agree this pin is permanently tied to zero (0). Thus, the transceiver is permanently enabled. Good for the CPU, bad for DMA.
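The transceiver’s behavior, as described over the last few paragraphs, can be modeled in a few lines of Python. This is a purely logical sketch (no timing, no electrical detail), with DIR wired to R/W as in the schematic; the function name and the use of `None` to stand for Hi-Z are choices made for this example.

```python
from typing import Optional, Tuple


def ls245(
    a_side: int, b_side: int, dir_high: bool, enable_n: int
) -> Tuple[Optional[int], Optional[int]]:
    """Return (value driven onto A, value driven onto B); None means Hi-Z."""
    if enable_n:              # enable pin high: every channel is Hi-Z
        return None, None
    if dir_high:              # R/W high (read): copy memory bus (A) to CPU pins (B)
        return None, a_side
    return b_side, None       # R/W low (write): copy CPU pins (B) to memory bus (A)


if __name__ == "__main__":
    # Enable grounded, DMA write: the '245 drives the memory bus with whatever sits on B.
    print(ls245(a_side=0x42, b_side=0x99, dir_high=False, enable_n=0))
    # Enable held high: the '245 stays off both buses.
    print(ls245(a_side=0x42, b_side=0x99, dir_high=False, enable_n=1))
```

With the enable pin grounded, the second case can never happen, which is exactly the problem for a DMA write.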

On previous Color Computer designs, the 74LS245 was not even included. I don’t know exactly why the functionality was added to the CoCo3, but we can theorize. While dedicated bus transceivers like the ‘245 clean up marginal signals, it’s highly doubtful that signals originating from the MC6809E showed any integrity loss. Bus transceivers like the 74LS245 can also act as “buffers”, protecting (in this case) the CPU from dangerous voltages or spikes on the data bus. Again, this seems dubious, since dangerous voltages on the address bus or other control lines could just as easily damage the CPU, and no buffers are placed on those lines. However, bus transceivers are also used to increase the “drive” capability of a signal. Think of the bus transceiver like an 8 channel amplifier. Each IC connected to the bus uses a bit of power, which must be supplied by the IC driving the bus. ICs thus often specify their drive characteristics in terms of Transistor to Transistor Logic (TTL) “load”, a current requirement of 1.6 milliampere (mA). A standard TTL IC input comprises one (1) TTL load, and a TTL output can typically drive ten (10) TTL loads (16mA). In electronics, this is called the “fanout” rating. Most ICs provide their fanout capability, and the MC6809E specifies a fanout capability of 4 LSTTL loads. A LSTTL (Low Power Schottky TTL), as the name suggests, requires less power than standard TTL. As such, an LSTTL load is equivalent to ¼ TTL load, or 0.4mA. Even if we simplify and consider everything on the CoCo3 data bus as requiring only 1 LSTTL load (0.4mA) each, the CoCo3 attaches 6 items besides the cartridge port and CPU to the databus, which exceeds the 4 LSTTL limit. For comparison, the CoCo1 attaches 5 items, again not including the CPU and the cartridge port. 
While the MC6809 can probably drive more than 4 LSTTL loads, and CMOS devices like the GIME probably consume less than 1 LSTTL of load, the ‘245 was most likely placed into the design to address this load issue. Now, some will ask, “But, what about the need to boost the drive on the address lines”? It turns out that the address lines are not connected to as many devices (excepting the cartridge port, most address lines only connect to GIME and the ROM, with A0,A1, and A5 signals connecting to 2 more devices).
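The load argument boils down to a little arithmetic, sketched here in Python using the simplified figures above (every device counted as exactly one LSTTL load, which the text itself notes is a rough approximation). Currents are kept in microamps to avoid floating point rounding; the names are illustrative.

```python
TTL_LOAD_UA = 1600                 # one standard TTL load: 1.6mA
LSTTL_LOAD_UA = TTL_LOAD_UA // 4   # one LSTTL load: 0.4mA
CPU_FANOUT_LSTTL = 4               # MC6809E rated drive, in LSTTL loads


def bus_load_ua(device_count: int, loads_per_device: int = 1) -> int:
    """Total current the bus driver must supply, in microamps."""
    return device_count * loads_per_device * LSTTL_LOAD_UA


def exceeds_cpu_fanout(device_count: int) -> bool:
    """True if the attached devices ask for more than the CPU's rated drive."""
    return bus_load_ua(device_count) > CPU_FANOUT_LSTTL * LSTTL_LOAD_UA


if __name__ == "__main__":
    for name, count in (("CoCo1", 5), ("CoCo3", 6)):
        print(name, bus_load_ua(count), "uA, over rated fanout:", exceeds_cpu_fanout(count))
```

By this simplified count, both machines ask for more than the rated 1.6mA (2.0mA for the CoCo1, 2.4mA for the CoCo3), which fits the observation that the CPU can probably drive more than its rated 4 LSTTL loads.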

We’ve determined the bus transceiver is designed to allow bus sharing, and our theory suggests it was a useful addition to the CoCo3 design. So, how should we proceed? As the title of this article suggests, the problem lies in a missing wire in the schematic. Such lines, called printed circuit board (PCB) traces or just traces, connect all of the IC pins in the CoCo3 design. As noted previously, we need one connected to the “enable” pin of the bus transceiver, one that will be zero (0) when the transceiver should be working, and one (1) when the transceiver should be turned off.

Figure 3: MPU States

It turns out that the MC6809E CPU “BA” (Bus Available) line appears to offer the correct operation and polarity. When zero (0), the CPU is utilizing the bus, and while one (1), the bus is “available” for others. Tying the “enable” line to the CPU BA line should allow all existing functionality to continue while enabling CoCo3 DMA operation. That’s the good news. The bad news: adding this trace in some fashion requires hardware modification. Take heart, though, there are a few different ways to proceed.

For reference, let’s look at an NTSC CoCo3 PCB in Figure 4, showing the 74LS245 bus transceiver. You will notice the transceiver sits in between the CPU and the cartridge port (the PAL PCB is slightly different):

Figure 4: CoCo3 PCB

The “enable” pin on the 74LS245 has been denoted with the red arrow, as has the BA pin on the MC6809E (this particular CoCo3 has had the CPU removed and a socket installed).

NOTE: While testing suggests this modification will not adversely affect any current CoCo3 usage, more testing is needed to conclusively prove the point. The below instructions are designed for those who want to modify and confirm operation. Make these modifications at your own risk.

Option 1:

This option requires the least amount of work, but also looks the least pleasing. Simply snip the 74LS245 pin 19 at the spot where it goes into the PCB, bend it up off the PCB, and solder a small wire from it to pin 6 of the CPU. To return to stock, simply remove the wire, bend the pin down, and resolder to the stub in the PCB. Alternatively, desolder and replace the 74LS245.

Option 2:

This option looks neater, but does require cutting a trace.

For this option, remove the PCB from the case, remove the little retaining clips on the shield under the board, and locate pin 19 of the 74LS245. Cut the trace going to that pin, and then solder a wire from pin 19 to pin 6 of the CPU on the bottom of the PCB. To return to stock, unsolder the wire and create a small solder bridge over the cut trace.

Option 3:

This option requires the most work, but it is easiest to convert back to stock.

Figure 5: DMAEnabler PCB

Desolder and socket the 74LS245. Build and populate the DMAEnabler PCB shown in Figure 5, installing the ‘245 into the adapter PCB and then installing the adapter into the CoCo3. Attach a test grabber lead to the BA pin on the adapter PCB and clip the test lead to pin 6 of the CPU. (Enterprising PCB designers could also create a dual-header PCB that fits in both the CPU and the 74LS245 sockets, removing the need for the test lead.)

With one of these alterations, normal CoCo3 operation should continue as before, and DMA functionality will be enabled. As teased in last week’s article, with this modification in place, our test application now transfers data from the CoCo3 to external RAM and back:

Figure 6: CoCo3 Transfer to External RAM

While running in FAST mode (poke &hffd9,0), we can see the double read of $ff67/68 trigger the DMA transfer, which then pulls data from $4000-$4003 at ~2MBps.

Figure 7: External RAM transfer to CoCo3

Going the other way, the two cycle $ff67/68 read triggers the engine to push our data to $5000-$5003, again at ~2MBps.

Given that the addition of a single PCB trace into the CoCo3 circuit board would have added no cost, and that the inclusion of that trace appears to maintain full compatibility with earlier CoCo systems while also fully enabling bus sharing, one can only assume TANDY either did not foresee DMA or other bus sharing activities being needed on the cartridge bus or did not care to support such functions. Alas, requiring a hardware modification to support CoCo3 bus sharing severely limits adoption of this technique. At this point, folks socketing their CPU to upgrade to the Hitachi 63C09 CPU could also start socketing the 74LS245 or otherwise asking that this modification be performed, but the majority of CoCo3 units remain unable to utilize this DMA technique.

Still, one cannot help but wonder… What if there were a way to coax an unmodified CoCo3 into sharing the bus? 🙂

CoCo DMA: Invisible RAM

Now that we’ve learned a bit about how the 6809E wants to be treated during a direct memory access (DMA) request, we can put all we’ve learned into action. But, before we launch into the details, let’s address some questions that arose during previous article discussions:

Some asked why the logic doesn’t just watch for MC6809E signals BA and BS to both equal 1 (this condition indicates that the CPU is halted), commencing DMA actions at that point. This is my fault. As I started the project effort, I didn’t make it clear that I am trying to determine what DMA capabilities exist at the external side of the TANDY Color Computer expansion port (game cartridge port). I did comment about the expansion port initially, but I should have called that out in more detail. It’s worthwhile to also answer how one enables DMA when inside the computer, but that effort requires modifying the internals of the machine (at least plugging and unplugging ICs, which might not be socketed), and thus trades design complexity for end user complexity. The lowest entry point for DMA exploration (as with most things) is the expansion port, so I’ve concentrated my efforts there. But, for the record, we now know that an internal DMA engine need only pull HALT low, watch for BA=BS=1, and then transfer data as desired.

Others asked if halting the CPU and performing data transfers can really be called “DMA”. Instead of just answering the question by articulating the literal definition of DMA, I believe the question speaks more about how the term “DMA” has evolved over the years. In the beginning, DMA actions made no promises about CPU activity. The act of offloading memory access from the CPU and the act of running the CPU while that alternative access occurred were completely different efforts. In fact, Motorola manufactured and sold a DMA controller IC (the 6844 DMAC) that performed DMA actions by stopping the CPU (in various ways). That said, in today’s IBM PC-based world, where the memory bus, the peripheral bus, and the CPU bus are all separated and tied together with “bridge” ICs, it’s expected that a DMA action will not impact CPU operations. That’s the question being asked. I still believe the answer is “yes”, not only because of the literal definition but also that period-correct implementations would have stopped the CPU. But, I will agree that performing data transfers while not impacting CPU performance would be ideal. The conversation did bring up some neat ideas on how one might share the memory bus, not stop the CPU and perform DMA activities, which we can consider in later installments.

Concerning our primary objective, we now know how to safely stop the CPU and gain control of the bus, and we also know how to manipulate the bus to transfer data from one place to another. But, transferring a single piece of data is boring, so let’s add some more value. On the Color Computer 1 and 2, a fully expanded machine would include 64kB of RAM. Expanding RAM beyond 64kB typically takes 1 of 2 paths:

  • Internal RAM expansion. While CoCo systems were being manufactured, some vendors offered internal memory expansion options that “paged” RAM in 32kB banks. The RAM was easy to access, but the granularity was poor (if you needed 1 byte from another bank, you had to swap out 32kB of code and/or data to access it). Internal expansion solutions also had to contend with motherboard layout differences and lack of socketed ICs.
  • External RAM expansion using the cartridge port. Since the entire address and data bus resides on the expansion port, we can place additional RAM on the bus there. However, due to technical reasons, only 32kB of internal memory can be protected from external memory writes, and I believe external memory cannot be seen by the video subsystem. The currently produced MOOH expansion memory and SD card interface by Tormod Volden represents this category.

There may be value in supporting a third option: DMA memory expansion. Place a large amount of memory external to the machine and enable DMA actions to swap that external memory with the data inside the machine.

Pros:

  • 1 byte granularity. If you need to map in 1 data value, DMA expansion can support that. No 16kB or 32kB banking granularity.
  • No restrictions on memory location. Map in new data anywhere internal RAM exists.
  • No restrictions for video. Map in new data anywhere and the MC6847 VDG can see it.

Cons:

  • Slower than banked memory. Each mapped byte takes ~1uS to map.
  • Still limited by internal RAM size. If the CoCo has 4kB of RAM, one can only map in 4kB of data at a time.
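To put the “~1uS per byte” cost in perspective, here is a rough estimate in Python. It is a sketch that assumes one byte moves per E clock cycle on a machine running at roughly 0.89MHz, and it ignores the few cycles spent entering and leaving the HALT state; the function name and the sample sizes are mine.

```python
# Rough DMA transfer time, assuming one byte moves per E clock cycle.
E_CLOCK_HZ = 0.89e6   # approximate CoCo 1/2 CPU clock (assumption)


def transfer_time_us(byte_count, clock_hz=E_CLOCK_HZ):
    """Microseconds needed to move byte_count bytes at one byte per cycle."""
    return byte_count * 1e6 / clock_hz


# Illustrative sizes: a single byte, the 512 byte text screen, and 4kB.
for label, size in (("1 byte", 1), ("512 byte text screen", 512), ("4kB", 4096)):
    print(f"{label}: {transfer_time_us(size):.0f}uS")
```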

Still, in spite of the potential drawbacks, we should implement the idea, if only to use as a stepping stone to more expansive capabilities.

We’ll implement this invisible memory idea by adding some IO registers to our test logic:

  • 3 bytes to hold the memory location in external memory we wish to transfer to/from
  • 2 bytes to hold the memory location in internal memory we wish to transfer from/to
  • 2 bytes to hold the length of memory to transfer
  • 1 byte to configure what type of transfer we want (and some other flags), like:
    • Transfer from internal memory to external?
    • Transfer from external memory to internal?
    • Use a constant address for internal memory (don’t increment internal address after each transfer)?
    • Use a constant address for external memory (don’t increment external address after each transfer)?
    • Enable DMA transfers?

As presented during the last discussion, let’s use the 2 byte “length” parameter as our trigger to start a DMA transfer. This allows the programmer to make most efficient use of code, by leveraging the 16 bit length register as both information sharing and process initiation.

Let’s get into the Verilog code now:

always @(negedge e_cpu or negedge _reset_cpu)
begin
   if(!_reset_cpu)
   begin
      flag_write <= 0;
      flag_mem_hold <= 0;
      flag_sys_hold <= 0;
      flag_active <= 0;
   end
   else if(ce_ctrl & !r_w_cpu)
   begin
      flag_write <= data_cpu[0];
      flag_mem_hold <= data_cpu[5];
      flag_sys_hold <= data_cpu[6];
      flag_active <= data_cpu[7];
   end     
end

  • flag_write = Are we doing a read from internal memory or a write to internal memory?
  • flag_mem_hold = Do not increment the external memory address after every transfer
  • flag_sys_hold = Do not increment the internal memory address after every transfer
  • flag_active = Enable DMA engine

This code just sets the operational flags from the various data bits, or resets them during a reset activity.
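From the programmer's side, the control byte can be assembled from the same bit positions the Verilog reads (data_cpu[0], [5], [6], and [7]). The Python helper below is a sketch for illustration; the constant and function names are hypothetical and not part of the project.

```python
# Bit positions taken from the Verilog above.
FLAG_WRITE = 1 << 0      # 1 = write to internal memory, 0 = read from it
FLAG_MEM_HOLD = 1 << 5   # do not increment the external address
FLAG_SYS_HOLD = 1 << 6   # do not increment the internal address
FLAG_ACTIVE = 1 << 7     # enable the DMA engine


def control_byte(write=False, mem_hold=False, sys_hold=False, active=True):
    """Build the value to store in the control register."""
    value = 0
    if write:
        value |= FLAG_WRITE
    if mem_hold:
        value |= FLAG_MEM_HOLD
    if sys_hold:
        value |= FLAG_SYS_HOLD
    if active:
        value |= FLAG_ACTIVE
    return value


print(control_byte())            # engine on, internal -> external
print(control_byte(write=True))  # engine on, external -> internal
```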

always @(posedge e_cpu or negedge _reset_cpu)
begin
   if(!_reset_cpu)
      begin
         flag_halt <= 0;
         flag_knock <= 0;
         flag_run <= 0;
      end
   else if(!flag_dma & ce_knock)
      begin
         flag_halt <= 1;
         flag_knock <= 1;
      end
   else if(!flag_dma & ce_knock2 & flag_knock)
      begin
         flag_knock <= 0;
         flag_run <= 1;
      end
   else if(!flag_dma & !ce_knock2 && flag_knock)
      begin
         flag_knock <= 0;
         flag_halt <= 0;
      end
   else if(flag_dma && (!len))
      begin
         flag_halt <= 0;
         flag_knock <= 0;
         flag_run <= 0;
      end
end

  • flag_knock = access to $ff67
  • flag_run = access to $ff68 after an immediately preceding access to $ff67

This represents a finite state machine with 3 states (IDLE, KNOCK, RUN). I will later optimize this to use a 2 bit “state” value instead of 3 binary bits. Still, I think you can see the sequence. On reset, reset the flags. If we see an access to $ff67 (ce_knock), set flag_knock and flag_halt. If we next see $ff68 (ce_knock2), set the run flag; if we do not, clear flag_knock and release HALT. The additional check for flag_dma simply prevents retriggering a DMA activity while performing a DMA activity that accesses $ff67:68 😊.
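For readers who do not speak Verilog, the same knock sequence can be modeled in Python. This is a behavioral sketch of the rising-edge block only: the class and method names are mine, and reset handling and the flag_active qualifier are left out.

```python
class KnockFsm:
    """Behavioral model of the IDLE -> KNOCK -> RUN trigger logic."""

    def __init__(self):
        self.halt = False
        self.knock = False
        self.run = False

    def rising_e(self, ce_knock=False, ce_knock2=False, dma=False, length=1):
        """Evaluate one rising edge of E, mirroring the if/else chain."""
        if not dma and ce_knock:                        # first access ($ff67)
            self.halt, self.knock = True, True
        elif not dma and ce_knock2 and self.knock:      # $ff68 right after
            self.knock, self.run = False, True
        elif not dma and not ce_knock2 and self.knock:  # knock not completed
            self.knock, self.halt = False, False
        elif dma and length == 0:                       # transfer finished
            self.halt = self.knock = self.run = False


# A 16-bit access to $ff67:68 arms the transfer...
good = KnockFsm()
good.rising_e(ce_knock=True)
good.rising_e(ce_knock2=True)
print(good.halt, good.run)   # HALT stays requested and run is set

# ...while a lone access to $ff67 is abandoned on the next cycle.
lone = KnockFsm()
lone.rising_e(ce_knock=True)
lone.rising_e()
print(lone.halt, lone.run)   # HALT is released and run stays clear
```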

always @(negedge e_cpu)
begin
   flag_dma <= flag_active & flag_run;
end

We quantize DMA actions to begin and end on the falling edge of E, and DMA can only occur if the trigger criteria are met and the DMA engine is enabled.

always @(*)
begin
   if(e_cpu & r_w_cpu & ce_addre)
      data_cpu_out = address_mem_out[23:16];
   else if(e_cpu & r_w_cpu & ce_addrh)
      data_cpu_out = address_mem_out[15:8];
   else if(e_cpu & r_w_cpu & ce_addrl)
      data_cpu_out = address_mem_out[7:0];
   else if(e_cpu & r_w_cpu & ce_addrh_sys)
      data_cpu_out = address_sys[15:8];
   else if(e_cpu & r_w_cpu & ce_addrl_sys)
      data_cpu_out = address_sys[7:0];
   else if(e_cpu & r_w_cpu & ce_lenh)
      data_cpu_out = len[15:8];
   else if(e_cpu & r_w_cpu & ce_lenl)
      data_cpu_out = len[7:0];
   else if(flag_dma & !r_w_cpu)
      data_cpu_out = data_mem;
   else
      data_cpu_out = 8'bz;
end

This code looks complicated, but it’s just allowing the developer to read various values. The only condition of interest is the one where flag_dma is active and R/W is 0 (a write to internal memory). In this case, we want to bridge the external memory data bus to the internal memory data bus.

always @(*)
begin
   if(!flag_write & flag_dma)
      data_mem_out = data_cpu;
   else
      data_mem_out = 8'bz;
end

Conversely, if we are transferring data from internal memory to external, bridge the CoCo data bus to the external memory data bus.

always @(negedge e_cpu or negedge _reset_cpu)
begin
   if(!_reset_cpu)
      address_mem_out <= 0;
   else if(ce_addre & !r_w_cpu)
      address_mem_out[23:16] <= data_cpu;
   else if(ce_addrh & !r_w_cpu)
      address_mem_out[15:8] <= data_cpu;
   else if(ce_addrl & !r_w_cpu)
      address_mem_out[7:0] <= data_cpu;
   else if(flag_dma & !flag_mem_hold)
      address_mem_out <= address_mem_out + 1;
end

Again, the Verilog looks complicated but is not. We’re simply storing the various pieces of the starting memory address via writes from the CoCo, in 8 bit chunks. During a DMA activity without the address being held, we increment during each falling edge of E.

always @(negedge e_cpu or negedge _reset_cpu)
begin
   if(!_reset_cpu)
      address_sys <= 0;
   else if(ce_addrh_sys & !r_w_cpu)
      address_sys[15:8] <= data_cpu;
   else if(ce_addrl_sys & !r_w_cpu)
      address_sys[7:0] <= data_cpu;
   else if(flag_dma & !flag_sys_hold)
      address_sys <= address_sys + 1;
end

Same story here. We store the internal starting memory address in registers in 8 bit chunks, and we increment the counter by 1 during each cycle if the internal memory address is not being locked into position.

always @(negedge e_cpu or negedge _reset_cpu)
begin
   if(!_reset_cpu)
      len <= 0;
   else if(ce_lenh & !r_w_cpu)
      len[15:8] <= data_cpu;
   else if(ce_lenl & !r_w_cpu)
      len[7:0] <= data_cpu;
   else if(flag_run)
      len <= len - 1;
end

Finally, perform the same action for the length, storing in 8 bit chunks and decrementing while the DMA activity is occurring.

always @(*)
begin
   if(flag_dma)
      begin
         address_cpu_out = address_sys;
         r_w_cpu_out = !flag_write;
      end
   else
      begin
         address_cpu_out = 16'bz;
         r_w_cpu_out = 'bz;
      end
end

Normally, we don’t mess with the address bus, preferring to read it for information. But, during a DMA cycle, we need to place an address on the CoCo bus.

assign _halt = (flag_active & flag_halt ? 0 : 'bz);
assign _ce_ram =!flag_dma;
assign _we_mem =!(flag_dma & !flag_write);

One would think we could use flag_run to configure HALT, but flag_run only goes active after both trigger accesses have happened. Thus, using flag_halt lets us start the HALT condition after the first trigger (access to $ff67) if the DMA engine is active. External memory is selected only during DMA cycles, while the !WE signal to external memory is only enabled if we are reading from internal memory (this signal is overqualified, in that the external memory won’t be active unless flag_dma is active, which means this signal could be simplified to assign _we_mem = flag_write;).

After compiling and downloading the firmware into our test cartridge (which contains 512kB of static RAM), let’s see what we can do. We’ll enable the DMA engine (128 => $ff69), set the external address to $000000, internal address to $4000, and length to $0004:

I’d like to interrupt for a second and express appreciation to David Wood (jbevren on IRC and Discord) for pointing me to the VNC server capabilities on my HP logic analyzer. Starting VNC allows me to capture better screenshots and remotely control the logic analyzer.

In this case, we performed a read from $ff67:68, since the program was written primarily in BASIC with a small EXEC to perform LDD $ff67:rts. We can see the $ff67:68 reads, the HALT line going low after the $ff67 access, the wait for $ff68 access, and then 4 reads from internal memory. Before issuing the DMA transfer, I populated $4000-$400a with the ascending values 0-10, and we see them pulled across the data bus in the trace above.

Now, let’s go the other way, transferring those external values back into internal memory at $5000 by turning the DMA engine on, switching transfer direction (129 => $ff69), setting the external address to $000000, setting the internal address to $5000, and keeping the length at $0004:

Again, we see the trigger condition $ff67:68, the HALT condition, and the transfer of our 4 data values from external memory to internal locations, and we see the address incrementing with each transfer. Additional tests show “pinning” an address works as well, which can be useful in the following situations:

  • setting memory to a constant value at 1byte/us by pinning the source address at a location that contains that value
  • Dumping data to the Orchestra 90 or other CoCoSDC at 1MB/s (Yes, the DMA action works even if both the DMA device and the peripheral are external to the CoCo!)
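The transfer behavior, including address pinning, can be summarized with a small Python model. It is a sketch, not the firmware: it moves one byte per loop iteration, ignores the HALT handshake and exact cycle timing, and the function name, the buffer sizes, and the fill example values are my own choices.

```python
def dma_transfer(internal, external, sys_addr, mem_addr, length,
                 write=False, sys_hold=False, mem_hold=False):
    """Move `length` bytes; write=True means external -> internal."""
    for _ in range(length):
        if write:
            internal[sys_addr] = external[mem_addr]
        else:
            external[mem_addr] = internal[sys_addr]
        if not sys_hold:                       # 16-bit internal address
            sys_addr = (sys_addr + 1) & 0xFFFF
        if not mem_hold:                       # 24-bit external address
            mem_addr = (mem_addr + 1) & 0xFFFFFF
    return sys_addr, mem_addr


internal = bytearray(0x10000)        # 64kB address space of the CoCo
external = bytearray(512 * 1024)     # 512kB of cartridge RAM
internal[0x4000:0x400B] = bytes(range(11))

# Copy 4 bytes out to external RAM, then back in at $5000.
dma_transfer(internal, external, 0x4000, 0x000000, 4)
dma_transfer(internal, external, 0x5000, 0x000000, 4, write=True)

# Constant fill: pin the external (source) address on a single byte.
external[0x100] = 0x60
dma_transfer(internal, external, 0x0400, 0x100, 512,
             write=True, mem_hold=True)
print(list(internal[0x5000:0x5004]), hex(internal[0x05FF]))
```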

This invisible memory implementation nicely illustrates the DMA capability available on the TANDY Color Computer 1 and 2. Perhaps this knowledge triggers a fellow enthusiast to develop software that will play to the advantages of this memory expansion option. To that end, all resources associated with this effort have been placed under the Creative Commons Share-Alike 4.0 license and uploaded to a GitHub project repository:

https://github.com/go4retro/PhantomRAM/

If interest warrants, PhantomRAM can be turned into a technology solution, though that is outside the scope of this research effort.

Next time, we turn our attention to the 3rd system in the Color Computer lineup and the challenges it poses concerning DMA functionality. Until next time, I present the following two screen shots (hint: check the timestamps 😊)

CoCo DMA: “Fighting on the bus”

Picking up from last time, we were able to successfully place a byte into the internal CoCo memory from the cartridge port without the use of the CPU by utilizing a direct memory transfer procedure. However, after the transfer, BASIC programs would stop with errors at times, machine language programs would simply lock up, and the IO registers of the cartridge device would be corrupted. Clearly, our initial implementation has issues. Since education and understanding drive this effort, we need to dig deeper into the actual bus activity. For that, we must turn to the digital logic designer’s tool of choice: the logic analyzer.

For anyone who has even a passing interest in digital circuitry, I (and so many others) strongly recommend obtaining a 10-20MHz dual channel oscilloscope. A multimeter may be the first tool purchased, but an economical dual channel scope should be next on the list. That said, while oscilloscopes are great to see transients and strange signals (like NTSC or audio), they don’t handle digital logic investigation as well. Enter the logic analyzer. Instead of trying to replicate the shape of a signal like a scope, the LA simply detects whether a signal is 0 or 1 (typically using TTL voltage levels as a reference, where 0 = 0-0.8V, and 1 = 2.0V-5V). It performs this limited action across many channels, as opposed to the 2 or 4 of a scope. After a scope, I strongly recommend obtaining a small 8-channel USB-based analyzer. They are inexpensive, easy to use, and 8 channels supports simple parallel testing and a plethora of serial protocol testing (RS232, SPI, I2C, etc.). That said, debugging a single board computer or large interface card can take a long time with 8 (or even 16 or 34) analyzer channels. Thus, I also keep a larger professional grade logic analyzer (it used to be the only option, before the USB analyzer options came on the market). Found on eBay from test equipment manufacturers like Tektronix and HP/Agilent/Keysight at reasonable pricing, these units support dozens and sometimes hundreds of analysis channels at frequencies far beyond what the 1980’s computer enthusiast will regularly see.

HP/Agilent/Keysight 16702A Logic Analysis System
3M 40 pin Test Clip

I’ve recently upgraded my bench analyzer from a 1980’s era HP 1650b (great unit, cheap to buy and own, now passed onto a fellow hardware designer to continue its usefulness) to a 2000’s era HP modular system (HP 16702a). This project gives me my first opportunity to learn how this unit works. I first connect the individual analyzer channels to the various 6809E signals using a 40-pin 3M Test Clip. I highly recommend adding this to your toolbox for signal inspection (the units are expensive if purchased new, but they never wear out, and eBay has more reasonable pricing), as they simplify moving the signal investigation to a new IC or new system. Since we want to investigate all of the bus activity, I place address lines 0-15 under test, as well as data lines 0-7, R/W (read/write), HALT, and the clock signals (E and Q). As we want to know what happens after the access of $ff61, we set a trigger on access to that memory location.

To more accurately pinpoint the issue, we’ll use a hastily (and terribly, I’ll add) written 6809 assembly application to exercise the test device and which exhibits the crashing behavior. A snippet is included below:

   lda   LOC        * grab initial data at $4000
   jsr   CONVERT    * convert binary to hexadecimal and return in D
   std   SCREEN     * store at $0400
   lda   DMA        * execute the DMA action ($ff61)
   lda   DATA       * grab data at $ff60 (the external IO data location)
   jsr   CONVERT    * convert binary to hexadecimal and return in D
   std   SCREEN+2   * store at $0402
   lda   LOC        * grab data at $4000 (the internal IO data location)
   jsr   CONVERT    * convert binary to hexadecimal and return in D
   std   SCREEN+4   * store at $0404

Since the application originates at $0e00, the lda DATA instruction after our $ff61 access resides at $0e4c. When we run the test program, the logic analyzer triggers and the program crashes. But, the evidence surfaces. On the logic analyzer, we see the following:

Address  Data  R/W  HALT  Notes
0e49     b6    1    1
0e4a     ff    1    1
0e4b     61    1    1
ffff     27    1    1     Dead Cycle
ff61     27    1    0     Nothing at $ff61, but bus rests at $27
4000     2d    0    1     Our data write ($2d = 45). We sample on falling E, so HALT=1
0e4d     ff    1    1     Why did we jump to $0e4d?
Folks probably start to see what is going on, but it pays to be sure. The Verilog is modified to not activate the address bus or R/W line, hold HALT low forever, and the test is repeated. Here is the result:

Infinite HALT condition

The problem begins to show itself. Even though the device has pulled HALT low before the end of the instruction execution, the CPU reads and executes one more instruction before releasing the bus. We can tell because of the state of the BS and BA lines. The 6809 datasheet notes that BS=BA=1 signifies a HALT condition in the CPU. Further attempts to pull HALT low earlier in the $ff61 read cycle make no difference.

Darren Atkinson (of CoCoSDC design fame) emailed after hearing about the project effort, asking about HALT line triggering. I initially misread his email as inquiring whether I had pulled HALT low early enough in the instruction cycle. But, Darren responded again and pointed to a key portion of the datasheet I had misinterpreted:

6809E Datasheet HALT Condition Specifics

Hint: It’s the text at the top of the page. I noticed the “2nd to Last Cycle of Current Instruction” notation, but interpreted it to be illustrating to the reader that activating the HALT line before the last cycle of an instruction would not cause an immediate reaction. From my days working with the 6502, I knew that 8-bit processors are not designed to maintain an intermediate instruction state for any length of time. The CPU assumes that once an instruction is started, it must complete before anything else will be processed. While this doesn’t seem to be a concern for the HALT condition (just simply stop the processor, wait for HALT to become inactive, and continue on), it makes sense that HALT would use the same sense logic and internal handling logic as interrupts, and stopping the CPU in mid-instruction to handle an interrupt would require saving an intermediate instruction state. Thus, for that reason, the simpler 8-bit CPUs just don’t do that. Once an instruction opcode is fetched, the CPU will fully execute the current instruction before handling any event. If the datasheet showed the HALT line going low on the last cycle, it could imply that execution would stop at the end of the current instruction cycle. Thus, I thought this text was simply reinforcing the text describing the HALT pin: “A low level on this input pin will cause the MPU to stop running at the end of the present instruction and remain halted indefinitely without loss of data”.

However, my assessment was plain wrong. As Darren pointed out, the “2nd to Last Cycle of Current Instruction” text carries crucial significance. As well, it’s the key to the problem we are experiencing. The Verilog is activating HALT on the last cycle of the current instruction, which is actually too late. The CPU moves ahead to read and process the next instruction, only then latching and acting on the HALT condition. I wish this prerequisite had been noted somewhere in the datasheet, but I checked the 6809E, the 6809, the 63C09E and the 63C09 datasheets online and in my possession and found no mention of the constraint. That said, a Facebook commenter also pointed out this requirement, so perhaps everyone in CoCo land knows this tidbit of information.

Now that we know the issue, how do we solve it? I first added a NOP to the code, thinking that doing so would allow the next real instruction to be read correctly (the lda $ff60). However, before testing, I quickly realized that would not work. Since the CPU is still reading an opcode, it would interpret any data on the data bus during that cycle as the next opcode, potentially altering the program anyway. I next added Verilog code to wait an arbitrary 8 cycles after HALT activation before accessing memory:

HALT with 8 wait states
  • The read from $ff61 happens, and HALT is activated
  • A NOP is read
  • The next instruction is read (the second cycle of a NOP instruction appears to read and discard the next opcode)
  • Then the CPU is halted (as shown by the BS=BA=1 condition)
  • The Verilog waits a few more cycles (shown as $ffff:$27 read cycles)
  • Then the data is transferred to $4000 (in this test, the written value was $27) while HALT is deactivated.
  • The CPU takes a cycle to acknowledge the deactivation
  • The CPU takes another cycle to prepare for startup
  • Then, the $b6 opcode (of lda $ff60 instruction) is read

Success! BASIC test applications begin running to completion with no unexpected errors, and the test machine language application no longer crashes the machine. Instead, it runs to completion and illustrates that all 256 values can be transferred from outside the machine to internal memory locations.

That said, adding dead cycles to the DMA engine is far from ideal. In fact, after Darren noted that a full register stack-up could take 20+ cycles on a 63C09, I quickly found the dead cycle wait idea untenable. Clearly, we need to find a better way to trigger a DMA transfer. I quickly sketched out a few requirements and some preferences:

  1. The action must pull HALT low prior to the second to last instruction cycle
  2. The condition must not occur over two consecutive opcodes (interrupt could occur in between)
  3. The condition must not require scanning the databus for specific opcode/operand sequences (i.e. watch the databus for $b6,$ff,$61)
    1. Doing so requires constantly activating the SLENB line when installed in an MPI, since the data bus is hidden from MPI slots unless an IO access is made or SLENB is activated. And, SLENB being activated on arbitrary memory locations would cause other problems
    2. There’s no guarantee such a sequence would never occur in any other way.
  4. Ideally, the “trigger” action should be an instruction that would be otherwise needed (so as to not waste any time performing some action JUST to start the transfer)
  5. If a memory address trigger, the address should be constant

I just as quickly decided that CPU instructions that perform 2 memory accesses in a single instruction would ideally suit these constraints. HALT could be triggered to go low on the first memory access, and if the second memory access did not occur 1 or 2 cycles later, the HALT condition would become inactive and the system would not perform the transfer. If the condition was met, the system would initiate a transfer after the second memory access. I first gravitated to INC $ff61, which performs a (R)ead-(M)odify-(W)rite action (reading the memory location, adding 1 to the value, and writing it back to memory). Since INC $ff61 performs a read cycle, then takes an internal cycle to perform arithmetic (ALU) operations, and finally writes the new value back, the Verilog state engine needed to detect the initial access, wait a cycle, and then detect the second access. That’s not hard to support, but it’s wasteful. Supporting preference #4 proves the larger challenge. Rarely would someone need to increment a value in the DMA engine register set. While I was debating options, fellow enthusiast and NitrOS-9 developer L. Curtis Boyle suggested LDD, which performs 2 consecutive memory accesses in 2 consecutive cycles. While LDD may not be of great use, STD (store D) would often be used to place a 16 bit value in the registers, as would other 16-bit memory store operations. Since it also simplifies the Verilog, we will utilize it as the DMA transfer start mechanism. In essence:

  • Once the lower IO register is accessed, bring HALT low and set flags
  • In the immediate next cycle:
    • If the next higher IO address is accessed in the next cycle, continue to hold HALT low and prepare for a DMA transfer
    • If not, release the HALT line and do not prepare for a DMA transfer

If the first condition is met but not the second, the CPU will likely notice the HALT condition and stall for 2 cycles after the current instruction, but no harm will arise. To prevent accidental DMA activity while setting register values, we should also add a bit in the eventual control register to enable or disable DMA transfers.

More testing is needed to understand other issues, if any, with assuming a transfer can begin immediately following an LDD/STD type instruction. Since Darren Atkinson already broached the subject of interrupt handling, he performed a test that activated NMI and HALT at the same time, attempting to understand which behavior took precedence. Darren utilized the following circuit and program snippet:

By organizing the stack right above screen memory and driving NMI and HALT low with an SCS access to $ff40, NMI precedence would show as characters on screen (stack values). When executed, no such values appeared, which strongly suggests HALT takes precedence over interrupts.

Now that we have verified the ability to safely transfer data from outside the CoCo to internal memory without disrupting program execution and a way to reliably and atomically trigger such a transfer, we will put all of the pieces together into action. Many thanks to those following along on this journey so far, and a special thanks to Mr. Atkinson for his insights.

CoCo DMA Early Efforts

As referenced in http://www.go4retro.com/2020/02/26/direct-memory-access-possibilities-on-the-tandy-color-computer/, I am attempting to coax some direct memory access functionality from a TANDY Color Computer. I’ll be utilizing a TANDY Color Computer 1 for initial testing, as I feel it most closely matches the Motorola 6809 reference architecture and this year is the 40th anniversary of the machine’s introduction.

As noted in the last article, the !HALT signal is key to enabling DMA functionality (if possible at all) on the 6809. The datasheet stipulates that this line must fall no later than 200nS (for a 1MHz CPU) before the falling edge of the Q clock to ensure correct operation. As well, the signal must rise 200nS prior to the Q clock falling edge of the last DMA cycle. It turns out that, for a ~1MHz system, each of the 4 clock phases exhibited by the two clock signals (E and Q) last 250nS:

  • eq = 250nS
  • eQ = 250nS
  • EQ = 250nS
  • Eq = 250nS

Thus, to ensure our signal becomes active 200+nS before the fall of Q, we can simply enable the signal at the rising edge of E. Coupled with the fact that the CoCo 1 runs a bit slower than 1MHz (.89MHz), we have over 40nS of buffer.
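
To put numbers on that margin, the arithmetic can be run in any Verilog simulator. This is a simulation-only sketch (not synthesizable); the module name is made up, the 0.89MHz figure is the rounded clock quoted above, and it assumes !HALT is asserted exactly at the rising edge of E, one clock phase ahead of the fall of Q. Under those assumptions each phase works out to roughly 281nS, which leaves about 81nS of slack beyond the 200nS requirement before any device latency is subtracted.

```verilog
// Simulation-only sketch: estimates the !HALT setup margin for a CoCo 1.
// Assumes HALT is asserted at the rising edge of E, one phase before Q falls.
module halt_margin;
    real f_mhz;      // CPU clock in MHz (rounded figure from the text)
    real phase_ns;   // duration of one of the four E/Q phases
    real margin_ns;  // slack beyond the 200nS datasheet requirement

    initial
    begin
        f_mhz     = 0.89;
        phase_ns  = 1000.0 / (4.0 * f_mhz);
        margin_ns = phase_ns - 200.0;
        $display("phase  = %0.1f nS", phase_ns);
        $display("margin = %0.1f nS", margin_ns);
    end
endmodule
```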

As previously noted, I am utilizing a Complex Programmable Logic Device (CPLD) to simplify the hardware portion of these tests. To make the device perform work, I write the logic in the Verilog Hardware Description Language (HDL). Since I develop in the C programming language, I chose Verilog since it resembles C in some respects (functions, assignment, case statements, etc.). One can also program in VHDL, which is often compared to Ada (its verbosity reminds me of COBOL, actually), but these comparisons can spark religious debates. Suffice it to say that Verilog works well for me, and others may find other solutions. Cue the Verilog writing-style comments 🙂

I’ll dispense with the module definition, since it’s pretty standard (it resembles a C function definition), and get right to the important parts:

assign ce_reg = (address_cpu[15:4] == 12'hff6);
assign ce_test = ce_reg & (address_cpu[3:0] == 0);
assign ce_start = ce_reg & (address_cpu[3:0] == 1);

I’ve situated the hardware registers in the CoCo IO range, and moved them out of the way of a floppy drive controller. This allows me to test my hardware with a floppy drive using a simple ‘Y’ cable. We’ll set up a storage register at $ff60 for a value to DMA back to the CoCo, and we’ll use $ff61 to start a transfer.

register #(.WIDTH(8)) reg_test(
    q_cpu,
    !_reset_cpu,
    ce_test & !r_w_cpu,
    data_cpu,
    data_test
);

This just points to a user-developed module, defined elsewhere, that creates little 74LS574-like devices to hold data. Here it is clocked on the falling edge of Q, uses data_cpu as the input, and places the result in data_test.
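
The register module itself is not shown in this post, so the following is only a guess at its shape, inferred from the port order in the instantiation above (clock, reset, write enable, data in, data out). The port names and the asynchronous, active-high reset are assumptions.

```verilog
// Hypothetical sketch of the helper module; the real one is defined elsewhere.
module register #(parameter WIDTH = 8) (
    input  wire             clock,   // data is latched on the falling edge
    input  wire             reset,   // assumed active high and asynchronous
    input  wire             enable,  // write enable
    input  wire [WIDTH-1:0] d,       // data in
    output reg  [WIDTH-1:0] q        // stored value
);
    always @(negedge clock or posedge reset)
    begin
        if(reset)
            q <= {WIDTH{1'b0}};
        else if(enable)
            q <= d;
    end
endmodule
```

Passing !_reset_cpu as the reset argument fits this picture: the CoCo reset line is active low, so inverting it yields an active-high reset for the register.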

always @(*)
begin
    if(e_cpu & r_w_cpu & ce_test)
        data_cpu_out = data_test;
    else if(flag_dma)
        data_cpu_out = data_test;
    else
        data_cpu_out = 8'bz;
end

When one has a bunch of assignment options in Verilog, this is an ideal way to compose the logic. Essentially, since I dislike “write-only” registers, the first condition checks if someone is trying to read the value from $ff60 or not. The second condition is our DMA transfer condition, and the third (default) condition places the output lines in a tri-stated configuration. Yes, I could combine the first and second conditions, but I’ll not do that.

always @(posedge e_cpu)
begin
    if(ce_start)
        flag_halt <= 1;
    else
        flag_halt <= 0;
end

As discussed earlier, we want !HALT to go low at the rising edge of E, and go high in the same spot when the DMA cycle is concluding. Since we’re only doing a single DMA transfer, this code works fine.

assign _halt = (flag_halt ? 0 : 'bz);

We’ll use our flag to push the !HALT line low or keep it tri-stated (!HALT is like !IRQ, !FIRQ, and !NMI, in that they are all “wired-or” type signals).

always @(negedge e_cpu)
begin
    if(flag_halt)
        flag_dma <= 1;
    else
        flag_dma <= 0;
end

While we have to handle the !HALT line essentially in the middle of the cycle prior to the DMA, we can’t start the actual DMA cycle that early. This Verilog allows us to position the DMA cycle at the beginning and ending boundaries of the E cycle, which defines the CPU clock cycle.
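
A small simulation-only testbench can make the half-cycle offset between the two flags visible. This sketch copies the two always blocks above in condensed form and drives them with an idealized E clock; the testbench name, the 1120nS period (approximating 0.89MHz), and the stimulus timing are assumptions for illustration.

```verilog
`timescale 1ns/1ns
// Simulation-only sketch: shows flag_dma trailing flag_halt by half an E cycle.
module halt_dma_tb;
    reg e_cpu     = 0;
    reg ce_start  = 0;   // stands in for the $ff61 address decode
    reg flag_halt = 0;
    reg flag_dma  = 0;

    // Idealized E clock, 1120nS period (assumed, roughly 0.89MHz)
    always #560 e_cpu = ~e_cpu;

    // Condensed copies of the two blocks from the text
    always @(posedge e_cpu) flag_halt <= ce_start;
    always @(negedge e_cpu) flag_dma  <= flag_halt;

    initial
    begin
        $monitor("t=%0d e=%b ce_start=%b halt=%b dma=%b",
                 $time, e_cpu, ce_start, flag_halt, flag_dma);
        #1120 ce_start = 1;   // trigger address present for one CPU cycle
        #1120 ce_start = 0;
        #3360 $finish;
    end
endmodule
```

In a simulator, flag_halt should go high on a rising edge of E and flag_dma should follow on the next falling edge, then stay high for one full E cycle.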

always @(*)
begin
    if(flag_dma)
    begin
        address_cpu_out = 16'h4000;
        r_w_cpu_out = 0;
    end
    else
    begin
        address_cpu_out = 16'bz;
        r_w_cpu_out = 'bz;
    end
end

If we’re in a DMA cycle, place $4000 on the address bus (arbitrary testing address) and set R/W to 0. Otherwise, tri-state both sets of signals.

All that remains is to compile the Verilog into a suitable JEDEC bit file and download it into the Xilinx 95288XL-6 144-pin, 5-volt-tolerant CPLD. Yes, it’s overkill for this test (just a few IO would have been plenty), but it’s what was close at hand, and using a large part for initial development allows one to focus on the code rather than the size of the code. This unit has a 6nS latency, which must be added to all signal timings.

On the CoCo 1 side, we need to write a small test program. In this case, let’s just place 45 in the CPLD register, issue the DMA cycle, and see if it made its way to $4000.

1 poke &hff60, 45: poke &h4000,0
2 ? peek(&hff60);
3 a = peek(&hff61): rem don't care, just need something to hit that address
4 ? peek (&h4000)

After downloading the Verilog firmware, writing the CoCo BASIC program, and executing, we see the following on the logic analyzer:

Figure 1: 6809 DMA Example

We see the !HALT line going low right as the E clock goes high, and then we see the R/W line falling just as the previous cycle ends and the next cycle starts. In the midst of the DMA cycle, we see the !HALT line rise, which meets the timing requirements.

The results? Promising. After running the test, location $4000 contains the expected 45, so the write completed successfully. However, all is not right as yet.

  1. After a transfer, the value in $ff60 is corrupted. Not sure why that would be, but I suspect it is related to #2.
  2. For some test values, BASIC will return an error during the A=peek(&hff61) line and stop the program.

Removing the line that places the test data on the data bus during a DMA cycle eliminates the issue, which means the mere presence of data on the data bus during a DMA cycle is a problem. The next step is to wire up the CPU to a larger logic analyzer, one that can capture all of the address and data lines of the 6809 at one time. The good news is that I have just recently purchased an HP 16702A Logic Analysis System economically (from eBay) and outfitted it with 2 16717A 68-channel 333MHz timing logic analysis cards (136 logic channels in total, though 1 card will suffice for this CPU). Now, I just need to learn how to use it (it’s a sight more complex than my older HP 1650B logic analyzer). Off to debug!