CoCo DMA Early Efforts

As referenced in, I am attempting to coax some direct memory access functionality from a TANDY Color Computer. I’ll be utilizing a TANDY Color Computer 1 for initial testing, as I feel it most closely matches the Motorola 6809 reference architecture and this year is the 40th anniversary of the machine’s introduction.

As noted in the last article, the !HALT signal is key to enabling DMA functionality (if possible at all) on the 6809. The datasheet stipulates that this line must fall no later than 200nS (for a 1MHz CPU) before the falling edge of the Q clock to ensure correct operation. As well, the signal must rise 200nS prior to the Q clock falling edge of the last DMA cycle. It turns out that, for a ~1MHz system, each of the 4 clock phases exhibited by the two clock signals (E and Q) last 250nS:

  • eq = 250nS
  • eQ = 250nS
  • EQ = 250nS
  • Eq = 250nS

Thus, to ensure our signal becomes active 200+nS before the fall of Q, we can simply enable the signal at the rising edge of E. Coupled with the fact that the CoCo 1 runs a bit slower than 1MHz (.89MHz), we have over 40nS of buffer.

As previously noted, I am utilizing a Complex Programmable Logic Device to simplify the hardware portion of these tests. To make the device perform work, I write the logic in the Verilog Hardware Description Language (HDL). Since I develop in the C programming language, I chose Verilog since it resembles C in some respects (functions, assignment, case statements, etc.). One can also program in VHDL, which is often compared to ADA (its verbosity reminds me of COBOL, actually), but these can spark religious debates. Suffice it to say that Verilog works well for me, and others may find other solutions. Cue the Verilog writing-style comments 🙂

I’ll dispense with the module definition, since it’s pretty standard (it resembles a C function definition), and get right to the important parts:

assign ce_reg = (address_cpu[15:4] == 12'hff6);
assign ce_test = ce_reg & (address_cpu[3:0] == 0);
assign ce_start = ce_reg & (address_cpu[3:0] == 1);

I’ve situated the hardware registers in the CoCo IO range, and moved them out of the way of a floppy drive controller. This allows me to test my hardware with a floppy drive using a simple ‘Y’ cable. We’ll set up a storage register at $ff60 for a value to DMA back to the CoCo, and we’ll use $ff61 to start a transfer.

register #(.WIDTH(8)) reg_test(
ce_test & !r_w_cpu,

This just points to a user developed module defined elsewhere that creates little 74ls574-like devices to hold data, clocked on the falling edge of Q, using data_cpu as the input, and placing the result in data_test.

always @(*)
if(e_cpu & r_w_cpu & ce_test)
data_cpu_out = data_test;
else if(flag_dma)
data_cpu_out = data_test;
data_cpu_out = 8'bz;

When one has a bunch of assignment options in Verilog, this is an ideal way to compose the logic. Essentially, since I dislike “write-only” registers, the first condition checks if someone is trying to read the value from $ff60 or not. The second condition is our DMA transfer condition, and the third (default) condition places the output lines in a tri-stated configuration. Yes, I could combine the first and second conditions, but I’ll not do that.

always @(posedge e_cpu)
flag_halt <= 1;
flag_halt <= 0;

As discussed earlier, we want !HALT to go low at the rising edge of E, and go high in the same spot when the DMA cycle is concluding. Since we’re only doing a single DMA transfer, this code works fine.

assign _halt = (flag_halt ? 0 : 'bz);

We’ll use our flag to push the !HALT line low or keep it tristated (!HALT is like !IRQ and !FIRQ and !NMI, in that they are all “wired-or” type signals.

always @(negedge e_cpu)
flag_dma <= 1;
flag_dma <= 0;

While we have to handle the !HALT line essentially in the middle of the cycle prior to the DMA, we can’t start the actual DMA cycle that early. This Verilog allows us to position the DMA cycle at the beginning and ending boundaries of the E cycle, which defines the CPU clock cycle.

always @(*)
address_cpu_out = 16'h4000;
r_w_cpu_out = 0;
address_cpu_out = 16'bz;
r_w_cpu_out = 'bz;

If we’re in a DMA cycle, place $4000 on the address bus (arbitrary testing address) and set R/W to 0. Otherwise, tri-state both sets of signals.

All that remains is to compile the Verilog into a suitable JEDEC bit file and download into the Xilinx 95288XL-6 144 pin CPLD 5 volt tolerant CPLD. Yes, it’s overkill for this test (just a few IO would have been plenty), but it’s what was close at hand, and using a large part for initial development allows on to focus on the code rather than the size of the code. This unit has a 6nS latency, which must be added to all signal timings.

On the CoCo1 side, we need to write a small test program. In this case, let’s just place 45 in the CPLD register, issue the DMA cycle, and see if it made it’s way to $4000.

1 poke &hff60, 45: poke &h4000,0
2 ? peek(&hff60);
3 a = peek(&hff61): rem don't care, just need something to hit that address
4 ? peek (&h4000)

After downloading the Verilog firmware, writing the CoCo BASIC program, and executing, we see the following on the logic analyzer:

Figure 1: 6809 DMA Example

We see the !HALT line going low right as the E clock goes high, and then we see the R/W line falling just as the previous cycle end and the next cycle starts. In the midst of the DMA cycle, we see the !HALT line rise, which meets the timing requirements.

The results? Promising. After running the test, location $4000 contains the expected 45, so the write completed successfully. However, all is not right as yet.

  1. After a transfer, the value in $ff60 is corrupted. Not sure why that would be, but I suspect it related to #2
  2. For some test values, BASIC will return an error during the A=peek(&hff61) line and stop the program.

Removing the line that places the test data on the data bus during a DMA cycle eliminates the issue, which means the mere presence of data on the data bus during a DMA cycle is an issue. The next step is to wire up the CPU to a larger logic analyzer, one that can consume all of the address and data lines of the 6809 at one time. The good news is that I have just recently purchased a HP 16702A Logic Analysis System economically (from eBay) and outfitted it with 2 16717A 68-channel 333MHz timing logic analysis cards (134 logic channels in total, though 1 will suffice for this CPU). Now, I just need to learn how to use it (it’s a sight more complex than my older HP 1650B logic analyzer). Off to debug!

Direct Memory Access Possibilities on the TANDY Color Computer

As part of my continuing efforts to understand direct memory access (DMA) capabilities of various 80’s home computer systems, I decided I should figure out what, if any, DMA capabilities are possible on the TANDY Color Computer systems. There are 3 essential models in the lineup, though from a technical perspective, I feel there are 2 main variants: The CoCo1 and 2, which share very similar features and capabilities (mainly, the difference is in some cost reduced circuitry in the 2 and more memory in the later machines) and the CoCo3, which contains a more capable video processor, substantially more memory, and a memory management unit (MMU). Most folks also consider the Dragon Data Dragon machines as part of this lineup, and those are roughly similar to the CoCo1/2 systems (both systems seem to be based on a Motorola reference design. The CP1400 might also qualify, but I don’t have one and know little about the design, so I will consider the 4 above machines inclusively

A pre-requisite for DMA operation is an ability to stop the running CPU in some fashion and a way to access internal memory and/or IO from the cartridge or expansion port. The VIC-20, for example, has no way to stop the running CPU, so DMA is not possible. However, TANDY (and Dragon Data) helpfully provided a pin on the expansion port that will temporarily shut off the internal Motorola M6809E CPU and remove its address and data signals from the system bus (this part is essential, because if the CPU is not running but it still outputs address and data on the system bus, the bus is not available for other users). On the M6809E, that signal is called !HALT and is active low. To “halt” the 6809 CPU, simply bring this line low on the expansion port during a cycle. Of course, nothing is that simple, so here’s some more detail:

Figure 8: HALT behavior

!HALT can be pulled low anytime, but will only take effect if it falls 200nS or more before the falling edge of Q (Q is 1 half of the 4 phase clock system used by the 6809 and typically is high during the middle of the CPU clock “cycle”). This consideration is called tPCS on the timing diagram, and is 200nS for a 1MHz CPU, 140nS for a 1.5MHz, and 100nS for a 2MHz CPU. As well, one can bring !HALT high at a later time, but it again must occur tPCS nS before the fall of the Q clock signal. Still, if one adheres to those rules, the entire address and data bus (on the CoCo1 and 2, anyway) is available at the expansion port for reading from and writing data to the internal memory.

One must crawl, then walk, and then run in these efforts, so the first thought is to figure out the !HALT signalling and attempt to transfer a single value from the expansion port to an internal memory location. Testing would then involve setting the internal memory location to a value other than the one expected, triggering the DMA functionality, and then looking at the internal memory location. If the value has been changed, one can be confident a DMA transfer has occurred. Once that is in place, it can be used as a foundation to expand the functionality to support the transfer of many data values to many locations, and then digress into the various useful DMA use cases (fast floppy emulator, extra memory on demand, feeding audio data to a DAC, etc.). But, it’s a tall order to transfer that first byte, so let’s focus on the task at hand.

Many years ago, I would approach this by grabbing a perfboard and wiring up some 74LS TTL logic ICs on the board to approximate the functionality desired. Of course, it would be wrong, so many hours of rewiring and soldering up new ICs would be involved. Now, though, as much as some retro enthusiasts object, I utilize a suitable Complex Programmable Logic Device (CPLD) attached to a small cartridge PCB to efficiently design the required logic. While soldering and deciding on the correct TTL IC has value, if you’re trying to make something work, those activities can get in the way of the design process. A CPLD allows one to change the design simply by modifying the “software”, downloading it into the IC, and testing. Once the design is working, one can translate back into TTL ICs if desired, or (in many of my recent designs), simply continue to utilize a CPLD. CPLDs are the children of PLDs like the 16v8 PAL used in the 1980’s, and some CPLD variants were available in the era of these machines. These complex devices can replace dozens of ICs in a design, decreasing finished unit costs consderibly and bringing these types of functions well within the grasp of hobbyists to manufacture and enthusiasts to buy. Since I’ve always desired to create economical solutions for these classic machines, the CPLD provides significant value. Still, for the moment, I’m most interested in the design efficiencies, especially since there’s no guarantee anything will work in the end.