My entry to Retrochallenge 2015/07 was quite successful but left room for improvement. So I’ll take a new attempt based on the experience with the previous entry:
The project for Winter Warmup will be a follow-up to HRE aka High Resolution Extension aka monochrome 648 x 256 pixel graphics board for Commodore PET/CBM 8000 series:
Goal #1: replace the breadboard by a printed circuit board
Goal #2: improve firmware (character support; more sophisticated data transfer protocol; circles)
Goal #3: implement CBM control software as a BASIC extension
Goal #4: add color capabilities in the remaining time (no, just kidding).
Talking about goal #4: What happens when you leave a nerd alone with a full pint of beer and a crazy idea on New Year’s Eve? Yep! He will come up with an even more insane idea!
So this is goal #4 defined and explained in detail:
– add VGA port
– add color mode with 160 x 200 pixel resolution at 8 colors
– add firmware functions similar to monochrome mode
– buy ticket valid for 10 meetings with psychologist
EDIT: Unfortunately I haven’t had time to get this project going. It’s still in my mind, though. To be continued some day …
A month has past since the end of Retrochallenge 2015/07 and … I miss the feeling! Yep, I miss the Retrochallenge feeling. That is, working on a challenging project just for fun. Specifically working on interesting stuff for ancient computers. Ok, I assume you’re one of the 0.0236986301 ppm of world population who do understand what I’m talking about if you’re reading this at all.
Therefore I won’t wait for RC 2016/01. I’ll continue to work on my RC 2015/07 project and turn the proof of concept into a working prototype. To be honest I already restarted the project and didn’t tell you. 😉
Finding the Vsync bug that showed up in the demo video (RC 2015/7 final post) has been the first task. As I have mentioned before it is critical to count every single cycle within the video output routines. And what happens when you work after work ’til late night? You sometimes can’t count anymore! The error was caused by a not equally timed program path. There is a reason why they call an oscilloscope “electronic engineer’s best friend”. Would have been clever to verify the counted cycles with my friend somewhat earlier…
Optimizing the code gave me some additional free cycles so I could increase the resolution from 640 by 250 to 648 by 256. It’s nothing to write home about but the extended resolution allows to draw small symbols outside the CBM text region which might come in handy in certain cases.
One page consumes 648 * 256 / 8 = 20,736 bytes of external SRAM. The size of the installed SRAM is 131,072 bytes (128k * 8 bit). How do we make use of the remaining memory? Several framebuffers! The ATmega162 can address 64k of external SRAM minus 1k internal SRAM minus 256 bytes registers. These 64,256 bytes contain 3 framebuffers and leave 2k for future ideas. The other half of the SRAM chip can be reached by banking. That equals to six independend framebuffers and two 2k free memory blocks in total.
The framebuffers can be used in two modes. In mode 0 all six buffers are orthogonal. In mode 1 only the three buffers in bank 0 are available for drawing while the three buffers in bank 1 are used to implement a second ‘color’. You may call this lower level of brightness ‘dark green’ as well as ‘transparent green’. The following image shows the extended resolution by a border drawn outside the CBM display real estate and explains the meaning of ‘dark/transparent green’.
Please take a closer look at the characters ‘T’ in the bottom half of the screen and ‘1’ in ‘Line 21’ as well as the disruption of the right border line. These seem to be side effects of the wild wiring – sometimes present, sometimes absent. EDIT: No, it’s a problem of the CBM mainboard that persists even if I disconnect the HRE completely. Talking about the characters only. The line disruption is most likely a problem of the HRE. And no, it’s not the dust on my camera.
To get the second ‘color’ two associated framebuffers at the same address on different memory banks are used. The video output routine switches between these two buffers at a frequency of 25 Hz. If a pixel at a specific location is set in both framebuffers it will be displayed at full brightness. If it is set in one framebuffer only it will be displayed at half brightness. Due to the fact that CBM text is always displayed at full brightness it will shine through in areas of ‘dark green’.
Unfortunately the applicability of this second ‘color’ is very limited: A refresh rate of 25 Hz implies severe flickering. Adjusting the potentiometer at the back of the screen to reduce overall brightness might help a bit but won’t provide a good display quality.
I plan to modify the user port communication but haven’t come up with a better solution. A first attempt to implement a more sophisticated protocol killed the balanced timing.
Ok, Murphy did bug me – as always. I eventually finished a short crapy video that is currently uploading to my Youtube channel. It’s my first video ever and I have no idea if it will be of acceptable quality, but I’m sure you will enjoy my stuttering:
All aims of this RC project have been accomplished. 🙂 I’m astonished that the wild wiring hasn’t caused any severe problems. Yes, there is some jitter and maybe it’s a result of not having a proper PCB – maybe. And there is a loose contact. That’s it. Not half bad!
And the firmware? Well, as you can see in the video there is an issue with synchronisation when line mode is used. Unfortunately I failed to debug it this day. Drawing procedures that have a “longer” execution time will trigger the error. That’s confusing because both sync routines have been implemented as interrupt service routines (ISR). Some investigation is required here…
Ideas for future development? Yes, sure! Character generator, page switching, sprites, DrawRectangle, DrawCircle, …
Retrochallenge 2015/07 has been great fun! Thanks for having me! Looking forward to RC2016/01! 🙂
Engineer’s Log, Labdate 0730.12. We finally reached Warp 5.0! Too bad this starship has been shut down long since.
Yesterday I met target #6 of my Retrochallenge project: A very basic Graphical Processing Unit has been implemented. Besides its primary tasks (framebuffer handling, host communication, set pixel at x/y position) the microcontroller features a DrawLine command now. This results in a tremendous speedup:
We went from 159 seconds (see previous post) to 0.62 seconds! Instead of sending 1776 single SetPixel commands to the HRE we’re now sending only 4 DrawLine commands. This way we overcome the poor BASIC performance.
The acceleration would be even more impressive if we would compare diagonal lines because in that case we’d have to do some additional calculation in BASIC. I implemented a complete Bresenham line drawing algorithm in the HRE firmware so diagonal lines are supported.
The microcontroller has still plenty of flash memory left for future expansion (about 79%). But I have attained all goals for this Retrochallenge and prefer to spend more time in bed in the upcoming nights. 😉
I’m going to write a demo and to shoot a video in the remaining hours until RC2015/07 ends – provided that Murphy won’t bug me.
Engineer’s Log, Labdate 0728.20. Warp 4.8! We’re getting close.
After a close fight against Murphy I finally won the day: The Commodore CBM 8296 is talking to the HRE via the userport. This small BASIC program demonstrates how the communication is handled on the CBM side. It draws a border around the screen in this order:
(x, y) 0, 0 –> 639, 0 –> 639, 249 –> 0, 249 –> 0, 0
Parts of the program have been squeezed by some methods we used in the early 80th to reduce execution time. These tricks reduced not only the readability of the program but also the runtime from 206 seconds …
… to 159 seconds …
which is still very slow.
If my cell phone cam is able to shoot a video of the program output there will be a final comparison of this pixel by pixel output and the accelerated line output I plan to implement tomorrow.
Engineer’s Log, Labdate 0726.15. We are traveling at warp 4.5 now. Scott and Spock are still trying to reach warp 5.
Found the missing pixels. They have been buried under some missing NOP instructions. Hmm, ok, it’s nearly impossible to bury anything under something that is missing but I managed it somehow. 😉
Now HRE has full resolution of 640 x 250 = 160,000 pixels!
A simple interface should read commands from the CBM userport. How do we implement this feature? It depends on the time between successive commands and on the transmission time for a single command. I’ll write a small transmit loop in BASIC and measure the execution time. A datassette has the honorable duty of being responsible for a save and reliable storage.
I don’t care about command structure at this stage. All I want to measure is the time for the fastest possible change of signal levels at the userport (BASIC only). Will the execution speed change if we put two BASIC statements in a single line?
1000 POKE 59459,255
1010 POKE 59471,255:POKE 59471,0
1020 POKE 59471,255
1030 POKE 59471,0
1040 GOTO 1010
Putting two statements in a single line doesn’t make any difference.
Ok, let’s assume the 6502 is going to train for CPU olympics. Hence, further calculation will be based on a value of 5 ms to be on the safe side. If we sample the userport at each HSYNC interrupt we get more than 80 samples per level change. That’s more than adequate. Currently there are about 170 free cycles at the beginning of the HSYNC ISR so I intend to handle the userport communication there. It’ll be somewhat challenging because the HSNC ISR must be cycle exact. So what? May I remind you that this is a retroCHALLENGE entry?
Now we should think the protocol through. The absolute minimum number of bytes per command (without considering compression) will be four:
1 byte for instruction (e.g. 1 = SetPixel, 2 = MoveCursor, 3 = DrawLine, …)
2 bytes for X coordinate
1 byte for Y coordinate
Unfortunately the HRE wouldn’t be able to identify the single bytes. For example the command SetPixel(256,1) will translate to $01 $01 $01 $01. Not a single bit does change during the whole data transfer! How do we separate the bytes?
We’ll use only 5 bits of a byte for data and 3 bits for synchronisation. That adds some complexity to the protocol which reduces bandwith even more but we can strive for speed in an upcoming Retrochallenge. 😉
The following table of the three most significant bits shows the unique coding of each byte:
000 => command
001 => X-coordinate low
010 => X-coordinate high
011 => Y-coordinate low
100 => Y-ccordinate high
101 => future expansion
110 => future expansion
111 => future expansion
So we have to transmit a total of five bytes per command.
Looking forward to my first non trivial BASIC program since the 80th. 🙂
Engineer’s Log, Labdate 0725.22. Science Officer Spock finally managed to persuade the computer to accept the modification. We are traveling at warp 4.4 now. Warp 5 should be possible, though.
Ok, I swapped pins PE0 and PD0 to free an interrupt for VSYNC detection and added the interrupt service routine. Luckily vertical synchronisation worked right from the start. 🙂
Sadly, tracking down the jitter problem isn’t that easy. At first glance it appeared to be a small difference in frequencies (main clock CBM vs. main clock HRE) so I tried to run the HRE on the CBM’s main clock and it improved somewhat. The jitter disappeared for one or two minutes to be precise – then returned. Cooling spray helped to locate the source of the problem: It’s the hex inverter chip (UD3) that forms the oscillator circuit. Really? I carefully examined the CBM text on the screen and there wasn’t any jitter at all. Only the HRE output showed jitter. Then I touched UD3 with my fingers and the jitter changed or disappeared. Unfortunately I have no spare finger to add to the circuit…
My provisional conclusion: The jitter is caused by the marvellous wiring and perfect grounding/shielding of the HRE board <ahem>. As we are pinched for time I won’t design a proper PCB to solve this problem. Rearranging the cabling helps a bit…
A screen full of equally spaced vertical lines is of avail when checking horizontal sync but otherwise quite boring. A screen full of freaking awesome vertical lines would require variable spacing! Easy! Just change the bit pattern and … fail.
For the first test I wrote a bit pattern to the external SRAM and imported the data in the next cycle to the shift register then clocked the bits out of the shift register to the monitor.
LDI r25, 0b01111110 // load register 25 with bit pattern
ST $8000, r25 // store register 25 value into SRAM
OUT CtrlPort, r19 // load data into shift register
OUT CtrlPort, r20 // start clocking bits out of shift register
NOP // waste one cycle
NOP // waste one cycle
These six instructions take 8 cycles which is the time that is needed to clock 8 bits (pixel) out of the shift register. In other words 8 pixels are loaded and sent out to the monitor in 8 cycles thus allowing for a pixel frequency of 16 MHz.
Along with changing bit patterns for variable line spacing I coded a complete framebuffer handling as it will be needed for a graphic extension. Now I was reading the bit patterns from SRAM and then transferring them to the shift register. It turns out that the byte that has just been read from SRAM stays valid on the data bus for only one clock cycle (when /RD is LOW). The following OUT instruction cannot see this data anymore thus the transferred byte is not correct.
First idea was to send the data (that has been read into a microcontroller register) out to another port and import into shift register from that port. That would work if we had a free port.
Second idea was to handle the external SRAM manually but that would take much more clock cycles and reduce the horizontal resolution.
Third time is a charm: Enable XMEM, read bit pattern from external SRAM, disable XMEM, enable manual output on port A, send bit pattern to port A, import into shift register, start shifting. That results in 9 clock cycles which is still one cycle too many <“§$+#%&/()!!!!> Fortunately the XMEM interface overrides any conflicting port definition if enabled and restores the old definition if disabled. So I could remove the port definition from the framebuffer read sequence and saved a cycle.
After some time of debugging the sinewaves appeared in a more pleasant fashion:
Men are like little kids so I played with my new toy:Border, lines and sinewaves have been computed by the HRE microcontroller. Text has been typed on the CBM keyboard. Due to the fact that the HRE has a separate video memory one can type and edit text/programs as usual without destroying the graphics. Current resolution is 637 x 250 pixels. Hopefully an additional debugging session will find the lost three pixels per line.
Implementation of the userport interface will be next. Stay tuned!
Engineer’s Log, Labdate 0720.23. The Klingons wonder how we manage to stay combat-ready even if they prevent us from getting any sleep. I instructed McCoy to brew up some more of our secret weapon and to hand thermos out to the entire crew.
Vertical synchronisation will be next. Unfortunately I forgot to arrange for VSYNC detection. Oops! There is no pin with interrupt support left. Swapping PE0 / PD0 might help but that’s a task for tomorrow.
And I’ll hunt the jitter down … tomorrow.
Need some sleep badly…