RC 2015/7 – Home Straight

Engineer’s Log, Labdate 0730.12. We finally reached Warp 5.0! Too bad this starship has been shut down long since.

Yesterday I met target #6 of my Retrochallenge project: A very basic Graphical Processing Unit has been implemented. Besides its primary tasks (framebuffer handling, host communication, set pixel at x/y position) the microcontroller features a DrawLine command now. This results in a tremendous speedup:


We went from 159 seconds (see previous post) to 0.62 seconds! Instead of sending 1776 single SetPixel commands to the HRE we’re now sending only 4 DrawLine commands. This way we overcome the poor BASIC performance.

The acceleration would be even more impressive if we would compare diagonal lines because in that case we’d have to do some additional calculation in BASIC. I implemented a complete Bresenham line drawing algorithm in the HRE firmware so diagonal lines are supported.

The microcontroller has still plenty of flash memory left for future expansion (about 79%). But I have attained all goals for this Retrochallenge and prefer to spend more time in bed in the upcoming nights. 😉

I’m going to write a demo and to shoot a video in the remaining hours until RC2015/07 ends – provided that Murphy won’t bug me.

RC 2015/7 – Dances With Pixels #3

Engineer’s Log, Labdate 0728.20. Warp 4.8! We’re getting close.

After a close fight against Murphy I finally won the day: The Commodore CBM 8296 is talking to the HRE via the userport. This small BASIC program demonstrates how the communication is handled on the CBM side. It draws a border around the screen in this order:

(x, y)    0, 0 –> 639, 0 –> 639, 249 –> 0, 249 –> 0, 0


Parts of the program have been squeezed by some methods we used in the early 80th to reduce execution time. These tricks reduced not only the readability of the program but also the runtime from 206 seconds …
… to 159 seconds …
which is still very slow.

If my cell phone cam is able to shoot a video of the program output there will be a final comparison of this pixel by pixel output and the accelerated line output I plan to implement tomorrow.

RC 2015/7 – Dances With Pixels #2

Engineer’s Log, Labdate 0726.15. We are traveling at warp 4.5 now. Scott and Spock are still trying to reach warp 5.

Found the missing pixels. They have been buried under some missing NOP instructions. Hmm, ok, it’s nearly impossible to bury anything under something that is missing but I managed it somehow. 😉

Now HRE has full resolution of 640 x 250 = 160,000 pixels!

A simple interface should read commands from the CBM userport. How do we implement this feature? It depends on the time between successive commands and on the transmission time for a single command. I’ll write a small transmit loop in BASIC and measure the execution time. A datassette has the honorable duty of being responsible for a save and reliable storage.

I don’t care about command structure at this stage. All I want to measure is the time for the fastest possible change of signal levels at the userport (BASIC only). Will the execution speed change if we put two BASIC statements in a single line?

1000 POKE 59459,255
1010 POKE 59471,255:POKE 59471,0
1020 POKE 59471,255
1030 POKE 59471,0
1040 GOTO 1010

usrport01Wow! 7.281 milliseconds! What a blazing speed! That equates to a bandwidth of more than 137.343 bytes/s! Eat your heart out Nvidia!

Putting two statements in a single line doesn’t make any difference.

Ok, let’s assume the 6502 is going to train for CPU olympics. Hence, further calculation will be based on a value of 5 ms to be on the safe side. If we sample the userport at each HSYNC interrupt we get more than 80 samples per level change. That’s more than adequate. Currently there are about 170 free cycles at the beginning of the HSYNC ISR so I intend to handle the userport communication there. It’ll be somewhat challenging because the HSNC ISR must be cycle exact. So what? May I remind you that this is a retroCHALLENGE entry?

Now we should think the protocol through. The absolute minimum number of bytes per command (without considering compression) will be four:

1  byte for instruction (e.g.  1 = SetPixel, 2 = MoveCursor, 3 = DrawLine, …)
2 bytes for X coordinate
1 byte for Y coordinate

Unfortunately the HRE wouldn’t be able to identify the single bytes. For example the command SetPixel(256,1) will translate to $01 $01 $01 $01. Not a single bit does change during the whole data transfer! How do we separate the bytes?

We’ll use only 5 bits of a byte for data and 3 bits for synchronisation. That adds some complexity to the protocol which reduces bandwith even more but we can strive for speed in an upcoming Retrochallenge.  😉

The following table of the three most significant bits shows the unique coding of each byte:

000 => command
001 => X-coordinate low
010 => X-coordinate high
011 => Y-coordinate low
100 => Y-ccordinate high
101 => future expansion
110 => future expansion
111 => future expansion

So we have to transmit a total of five bytes per command.

Looking forward to my first non trivial BASIC program since the 80th.  🙂


RC 2015/7 – Dances with Pixels #1

Engineer’s Log, Labdate 0725.22. Science Officer Spock finally managed to persuade the computer to accept the modification. We are traveling at warp 4.4 now. Warp 5 should be possible, though.

Ok, I swapped pins PE0 and PD0 to free an interrupt for VSYNC detection and added the interrupt service routine. Luckily vertical synchronisation worked right from the start. 🙂

Sadly, tracking down the jitter problem isn’t that easy. At first glance it appeared to be a small difference in frequencies (main clock CBM vs. main clock HRE) so I tried to run the HRE on the CBM’s main clock and it improved somewhat. The jitter disappeared for one or two minutes to be precise – then returned. Cooling spray helped to locate the source of the problem: It’s the hex inverter chip (UD3) that forms the oscillator circuit. Really? I carefully examined the CBM text on the screen and there wasn’t any jitter at all. Only the HRE output showed jitter. Then I touched UD3 with my fingers and the jitter changed or disappeared. Unfortunately I have no spare finger to add to the circuit…

My provisional conclusion: The jitter is caused by the marvellous wiring and perfect grounding/shielding of the HRE board <ahem>.  As we are pinched for time I won’t design a proper PCB to solve this problem. Rearranging the cabling helps a bit…

A screen full of equally spaced vertical lines is of avail when checking horizontal sync but otherwise quite boring. A screen full of freaking awesome vertical lines would require variable spacing! Easy! Just change the bit pattern and … fail.

For the first test I wrote a bit pattern to the external SRAM and imported the data in the next cycle to the shift register then clocked the bits out of the shift register to the monitor.

LDI  r25, 0b01111110  // load register 25 with bit pattern
ST    $8000, r25             //  store register 25 value into SRAM
OUT CtrlPort, r19         //  load data into shift register
OUT CtrlPort, r20        //  start clocking bits out of shift register
NOP                               //  waste one cycle
NOP                               //  waste one cycle

These six instructions take 8 cycles which is the time that is needed to clock 8 bits (pixel) out of the shift register. In other words 8 pixels are loaded and sent out to the monitor in 8 cycles thus allowing for a pixel frequency of 16 MHz.

Along with changing bit patterns for variable line spacing I coded a complete framebuffer handling as it will be needed for a graphic extension. Now I was reading the bit patterns from SRAM and then transferring them to the shift register. It turns out that the byte that has just been read from SRAM stays valid on the data bus for only one clock cycle (when /RD is LOW). The following OUT instruction cannot see this data anymore thus the transferred byte is not correct.

First idea was to send the data (that has been read into a microcontroller register) out to another port and import into shift register from that port. That would work if we had a free port.

Second idea was to handle the external SRAM manually but that would take much more clock cycles and reduce the horizontal resolution.

Third time is a charm: Enable XMEM, read bit pattern from external SRAM, disable XMEM, enable manual output on port A, send bit pattern to port A, import into shift register, start shifting. That results in 9 clock cycles which is still one cycle too many <Ҥ$+#%&/()!!!!> Fortunately the XMEM interface overrides any conflicting port definition if enabled and restores the old definition if disabled. So I could remove the port definition from the framebuffer read sequence and saved a cycle.

Meanwhile I wasn’t positive about vertical lines at all and programmed a bunch of sinewaves instead:hre02

After some time of debugging the sinewaves appeared in a more pleasant fashion:hre03
Men are like little kids so I played with my new toy:hre04Border, lines and sinewaves have been computed by the HRE microcontroller. Text has been typed on the CBM keyboard. Due to the fact that the HRE has a separate video memory one can type and edit text/programs as usual without destroying the graphics. Current resolution is 637 x 250 pixels. Hopefully an additional debugging session will find the lost three pixels per line.

Implementation of the userport interface will be next. Stay tuned!

RC 2015/7 – Dances with Bits #2

Engineer’s Log, Labdate 0720.23. The Klingons wonder how we manage to stay combat-ready even if they prevent us from getting any sleep. I instructed McCoy to brew up some more of our secret weapon and to hand thermos out to the entire crew.

Yeah! After a slight hardware modification the HRE firmware can now display vertical lines that are aligned hre01with CBM characters.  Nothing to write home about but every little counts.

Vertical synchronisation will be next. Unfortunately I forgot to arrange for VSYNC detection. Oops! There is no pin with interrupt support left. Swapping PE0 / PD0 might help but that’s a task for tomorrow.

And I’ll hunt the jitter down … tomorrow.

Need some sleep badly…


RC 2015/7 – Dances with Bits #1

Engineer’s Log, Labdate 0719.21. Mr. Spock remarks that our computer underperforms even the human brain. That’s nice to hear.

I already had a look at LunaAVR about a year ago but haven’t used it for a real project until now. That’s simply because I haven’t worked on a private microcontroller project last year. LunaAVR has a nice IDE and comes with more than 300 pages of documentation as well as many example programs. There seems to be a small but informed and responsive community. After all it might be the right tool for the job.

Last night I tinkered a bit with the new release. It still lacks support for the external memory interface that I want to use. No big deal, though. Flipping some bits should turn XMEM on (it’s an Atmel term for eXternal MEMory)  and memory access will be handled by assembler anyway.

Where do we start? The CBM 8296 should behave as usual until a special command is issued to activate the HRE. For that purpose we have to set pin 9 of the shift register (output) to high level and pin 1 of the multiplexer (A/B switch) to low level so that only the CBM video signal is routed to the monitor. A 10k pulldown resistor takes care of the multiplexer but the shift register has no tri-state output and therefore the microcontroller must initialize the chip. While we’re at it: a subroutine for switching three video modes (CBM only / HRE only / both) would be nice.

Video synchronisation will be next.

IF you haven’t tinkered with video signals
THEN you might find the following (greatly simplified) abstract of avail
ELSE skip next four paragraphs

The CBM has a CRT (Cathode Ray Tube) monitor that sends an electron beam onto a fluorescent screen. The electron beam moves continually across the screen area in a fixed pattern starting at the upper left corner. It then moves horizontally to the right screen border, then back to the left border and down by the width of a pixel. This movement repeats until the beam reaches the bottom right corner of the screen. It then moves back to the upper left corner and the whole process starts again. In other words, the beam moves from left to right and line by line from top to bottom – just like you move your eyes while reading this text.

Three digital signals control the beam. In this text they are labeled VIDEO, HSYNC and VSYNC. One can’t see the beam as long as VIDEO is HIGH. If VIDEO is LOW then the fluorescent coating lights up at the current position of the beam. If HSYNC (Horizontal SYNChronisation) is HIGH the beam passes from left to right along the screen. If HSYNC is LOW the beam moves one line down and back to the left. HSYNC is like Carriage Return and Line Feed for the beam. While VSYNC (Vertical SYNChronisation) is HIGH the beam moves down the screen line by line. The beam returns to its home position (upper left corner) when VSYNC is LOW.

How do we display useful information? By precisely controlling the LOW periods of VIDEO. A long LOW period gives a long horizontal line and a short LOW period gives a short line. Our horizontal timing reference is the 16 MHz pixel clock of the CBM system. 16 Mhz pixel clock means that a single pixel has a duration of 1 / 16,000,000 s = 62.5 ns. Let’s assume we want to display a horizontal line of 8 pixels width. We then need to drive VIDEO LOW for 8 * 62.5 ns = 500 ns. A complete set of pixel information that can be displayed on the screen at a time is called a frame. The vertical timing reference is called frame rate (or refresh rate). It tells us how many frames per second will be displayed, that is how many VSYNC pulses occur per second. My CBM system uses a frame rate of 50 Hz.

Back in the days most computer monitors had quite simple (aka cheap) circuits inside. This tended to result in performance variations within a production lot and in temperature dependency of the display. Imagine they would have used the whole screen area to draw pixels and the start position of the beam moves slightly to the left while the monitor heats up. Some pixels would disapear behind the chassis of the screen. Thus borders were introduced to the timings.

hsync2v640This oscillogram shows a line of 640 bright pixels between two HSYNC pulses. The LOW period is 40 µs which is just what we expect (40 µs = 40,000 ns; 40,000 ns / 640 = 62.5 ns; 1/62.5 ns = 16 MHz). The HIGH period of 19 µs forms the horizontal borders. It takes about 16 µs from HSYNC going LOW to VIDEO_OUT going LOW and about 2.9 µs from VIDEO_OUT going HIGH to HSYNC going LOW. These periods define the horizontal position of the visible area.

To synchronize the HRE output with the CBM output I’ll use an interrupt that is triggered by HSYNC. Due to the fact that we have to be cycle exact in our timing we must take interrupt latency into account. The ATmega completes any ongoing instruction before an interrupt can be serviced. Since execution times vary from 1 to 4 cycles per instruction, an interrupt service routine has a random entry time. The solution to this problem is to put the µC into sleep mode. It then always executes the exact same steps when it wakes up and these will take always the exact same time to execute.


RC 2015/7 – Off Topic

Looking for debugging fun? How about this: Connect the RESET line of your microcontroller to an output pin that occasionally changes its level.

Yep. I’ve been there. Don’t ask me why. Maybe sunstroke while soldering. So much for “Hardware has no severe faults.”

RC 2015/7 – Enter Keyboard

Engineer’s Log, Labdate 0715.17. Warp engines are up and running again. Mr. Spock has a lively discussion with the computer on the options to unleash the engines’ full capability.

The soldering iron has cooled down bang on time. Half-time for the Retrochallenge. But there is no time for wild partying. Just one deep breath while downloading the recent version of LunaAVR which will be the programming language for this project. I like to try new languages and this tool sounds promising. One important feature is the inline assembler. When it comes to tight timing requirements I prefer assembler to any compiler optimization.

The LunaAVR IDE will work shoulder to shoulder with an AVR JTAGICE mkII in programming the ATmega162 microcontroller. This is my “hello world” program to test the build chain, the main clock and the microcontroller:


… compiler output …


… programmer output …


… and the µC toggles the LED in 1 s intervals.

All systems nominal! 🙂

RC 2015/7 – Power Up

Engineer’s Log, Labdate 0715.14. We have stopped our engines for maintenance and upgrade. Mr. Scott is supervising the initial startup procedure of the new warp extension.

kitTadaaaah!!! Here it is: The Double Retro Kit! It’s double retro because it has been built entirely from parts out of my junkbox. 😉  I used my last card connector that fit the CBM/PET boards. In case you know a source where one can still buy these connectors, please drop me a line.

The 1:1 adapter from user port to pin header hasn’t been finished yet. The whole thing will have a bath in epoxy sometime.


bmowYep! There are some very good reasons why not to build it like this. We have to handle high-speed digital signals (pixel stream, fast edges of 74AC00 and 74ALS573) and if this project were meant to be a consumer product, a proper PCB would be required. I consider this project to be a proof of concept and hope to get along with the direct wiring. I’m prepared to fix some weird issues that might pop up, though.

Ok, let’s populate the board step by step and do some checking. Maybe we can prevent it from catching fire. Please feel free to skip the rest of this post as it might be quite boring to read unless you are interested in the engineering part of the story (aka watching me putting my foot in my mouth).

Step 1: Limit power supply to 40 mA; check current consumption with empty sockets.
Target: < 1 mA
Actual: < 1 mA  (passed)

I’d like to see if the CBM video signal can pass the HRE board without interference. It enters the HRE at pin 2 of the multiplexer (74HC257) and is fed to the NAND (74AC00) afterwards. Let’s check that path.

Step 2: Insert 74HC257; check current consumption and voltages.
Target: < 1 mA
Actual: < 1 mA  (passed)
Target:  5.0 V
Actual:  5.0 V  (passed)

Step 3: Insert 74AC00; check current consumption and voltages.
Target: < 1 mA
Actual:  4 to 32 mA  (failed)
Target:  5.0 V
Actual:  5.0 V  (passed)

It seems as if we have some unwanted oscillation here. That’s ok because of currently floating inputs at pin 9 and pin 10. Does the current stabilize if we apply some pulldown resistors? Fortunately, yes. The current drops below 1 mA. (passed)

Step 4: Connect HRE to user port; start CBM; check video signal.
Target:  The signal should look quite the same after having passed the HRE.
Actual:  The output signal looks pretty good and there is vidinoutno perceptible difference on the screen. You’d expect to see a pulse width of 62.5 ns but I connected the probes with hooks and ground leads, which isn’t the right method for precise measurements at high frequencies (edit: and I had the bandwidth limited to 20 MHz). Good enough for this quick check, though. The output signal is about 20 ns late, which is caused by the propagation delay of the multiplexer and the NAND gates and is within the expected range (74HC257 typ. 10 ns @ 25°C; 74AC00 typ. 5 ns @ 25°C; 10 ns + 2 * 5 ns = 20 ns).  (passed)

Step 5: Limit power supply to 100 mA; Insert the remaining chips; check current consumption and voltages.
Target:  < 100 mA (all chips inserted)
Actual:  roughly 65 mA  (passed)
Target:  5.0 V
Actual:  4.97 V  (passed)

Yippee! So far so good. Hardware has no severe faults. Now I’m going to setup the software development environment …

RC 2015/7 – Aaarrgh!!!!

Engineer’s Log, Labdate 0713.10. I am at a loss for words. – Mr. Spock comments: “Fascinating is a word I use for the unexpected. In this case, I should think interesting would suffice.”

Isn’t this a nice connecting cable to the user port pin header?upcable1