The QTCore-A1 is a basic accumulator-based 8-bit microarchitecture (with an emphasis on the micro). It is a Von Neumann design (shared data and instruction memory).
Although primarily designed for Tiny Tapeout 3, it is parameterized and may be synthesized to FPGAs (and a project file for CMOD A7 is provided on the project GitHub).
Probably the most interesting thing about this design is that all functional Verilog beyond the Tiny Tapeout wrapper was written by GPT-4, i.e. not a human! The author (Hammond Pearce) developed with GPT-4 first the ISA, then the processor, fully conversationally. Hammond wrote the test-benches to validate the design, and then had the appropriate back-and-forth with GPT-4 to have it fix all bugs. For your interest, we will provide all conversation logs in the project repository.
The architecture defines a processor with the following components:
In order to interact with the processor, all registers are connected via one large scan chain. As such, you can use an external microcontroller's SPI peripheral to read and write the status of the processor. We use the SPI clock SPI_SCK as the clock to drive both the QTCore-A1 and the scan chain. You choose which you are using by asserting the appropriate chip select. This makes it quite easy to interact with the processor.
The ISA description follows. For your convenience, we also provide an assembler in Python (also written by GPT-4) which makes it quite easy to write simple programs.
LDA: Load Accumulator with memory contents
STA: Store Accumulator to memory
ADD: Add memory contents to Accumulator
SUB: Subtract memory contents from Accumulator
AND: AND memory contents with Accumulator
OR: OR memory contents with Accumulator
XOR: XOR memory contents with Accumulator
JMP: Jump to memory address
JSR: Jump to Subroutine (save address to ACC)
BEQ_FWD: Branch if equal, forward (branch if ACC == 0)
BEQ_BWD: Branch if equal, backward (branch if ACC == 0)
BNE_FWD: Branch if not equal, forward (branch if ACC != 0)
BNE_BWD: Branch if not equal, backward (branch if ACC != 0)
HLT: Halt the processor until reset
SHL: Shift Accumulator left
SHR: Shift Accumulator right
SHL4: Shift Accumulator left by 4 bits
ROL: Rotate Accumulator left
ROR: Rotate Accumulator right
LDAR: Load Accumulator via indirect memory access (ACC as ptr)
DEC: Decrement Accumulator
CLR: Clear (Zero) Accumulator
INV: Invert (NOT) Accumulator
Writing assembly programs for QTCore-A1 is simplified by the assembler produced by GPT-4. First, we define two additional meta-instructions:
NOP: Do nothing
DATA: Define raw data to be loaded at the current address
[address]: [mnemonic] [optional operand]
An interesting example program is presented. This assumes a button is connected to the general purpose input, and some LEDs are connected to the LED output.
We assume this file is called test_btn_led.asm
:
; This program tests the btn and LEDs
; It will wait for low->high transitions on the button input.
; After receiving this, it will toggle a set of LEDs.
;
; BTNs and LEDS are at address 17, btn at LSB
0: LDA 17 ; load the btn and LEDS
1: AND 16 ; mask the btn
2: BNE_BWD ; if btn&1 is not zero then branch back to 0
3: LDA 17 ; load the btn and LEDS
4: AND 16 ; mask the btn
5: BEQ_BWD ; if btn&1 is zero then branch back to 3
;
; the button has now done a transition from low to high
;
6: LDA 14 ; load the counter toggle
7: XOR 16 ; toggle the counter using the btn mask
8: STA 14 ; store the counter value
9: ADDI 14 ; get the counter value offset
; (if 0, will be 14 (which is 0), if 1, 15)
10: LDAR ; load the counter LED pattern
11: STA 17 ; store the LED pattern
12: CLR
13: JMP
;
; data
;
14: DATA 0; toggle and LED pattern 0
15: DATA 24; LED pattern of ON
; (test for led_out=value 12, since 24>>1 == 12)
16: DATA 1 ; btn and counter mask
To compile this program, we would invoke the assembler (provided in the associated repository) as follows:
$ ./assembler.py test_btn_led.asm
The assembler will generate the following files for us:
test_btn_led.bin
: This is a binary representation of the assembly program. It can be used, or we can use one of the helper formats...test_btn_led.memarray.v
: This provides the binary ready for use in a Verilog test bench to directly write to the memory of the processor.test_btn_led.scanchain.v
: This provides the binary ready for use in the provided Verilog test bench which loads and unloads programs via the scan chain.test_btn_led.c
: This provides the binary as an array suitable for use in a C program on an external microcontroller which can load it into the processor via SPI.The processor executes all instructions via 2 stages (multi-cycle).
The timing is as follows. Note the branch instructions are +2/-3 due to the already-incremented PC in the fetch stage.
FETCH cycle (all instructions)
EXECUTE cycle
For Immediate Data Manipulation Instructions:
For Instructions with Variable-Data Operands:
For Control and Branching Instructions:
For Data Manipulation Instructions:
Address | Description |
---|---|
0-16 | 17 bytes of general purpose Instruction/Data memory |
17 | I/O: {OUT[7:1], IN[0]} |
18 | Constant value: 19 (See "7seg" note below) |
19 | Constant value: Bits for 7seg "0" |
20 | Constant value: Bits for 7seg "1" |
21 | Constant value: Bits for 7seg "2" |
22 | Constant value: Bits for 7seg "3" |
23 | Constant value: Bits for 7seg "4" |
24 | Constant value: Bits for 7seg "5" |
25 | Constant value: Bits for 7seg "6" |
26 | Constant value: Bits for 7seg "7" |
27 | Constant value: Bits for 7seg "8" |
28 | Constant value: Bits for 7seg "9" |
29-31 | Constant value: 1 |
Tiny Tapeout 3 has a 7-segment display on the board. To make it useful with the QTCore-A1, the processor helpfully includes the bit patterns stored at addresses 19-28. The easiest way to make use of these is with the LDAR instruction. Here's an example snippet, which assumes a value between 0 and 9 is stored at address 16:
0: LDA 16 ; load the value we want to display
1: ADD 18 ; add it to the constant 19 to get the 7 segment pattern offset
2: LDAR ; load the 7 segment pattern
3: STA 17 ; emit the 7 segment pattern
The processor will not run unless a program is scanned into it. Fortunately, this is easy using the SPI peripheral of an external microcontroller.
The QTCore-A1 is designed to be connected to an SPI peripheral along with two chip selects. This is because we use the SPI SCK as the clock for both the scan chain and the processor.
Connect the I/O according to the provided wiring table. Then, set the SPI peripheral to the following settings:
During the scan chain mode (when the scan enable is low), the entire processor acts as a giant shift register, with the bits in the following order:
All elements of the scan chain are presented MSB first.
scan_chain[2:0]
- 3-bit state registerscan_chain[7:3]
- 5-bit PCscan_chain[15:8]
- 8-bit Instruction Registerscan_chain[23:16]
- 8-bit Accumulatorscan_chain[31 -: 8]
- Memory[0]scan_chain[39 -: 8]
- Memory[1]scan_chain[47 -: 8]
- Memory[2]scan_chain[55 -: 8]
- Memory[3]scan_chain[63 -: 8]
- Memory[4]scan_chain[71 -: 8]
- Memory[5]scan_chain[79 -: 8]
- Memory[6]scan_chain[87 -: 8]
- Memory[7]scan_chain[95 -: 8]
- Memory[8]scan_chain[103 -: 8]
- Memory[9]scan_chain[111 -: 8]
- Memory[10]scan_chain[119 -: 8]
- Memory[11]scan_chain[127 -: 8]
- Memory[12]scan_chain[135 -: 8]
- Memory[13]scan_chain[143 -: 8]
- Memory[14]scan_chain[151 -: 8]
- Memory[15]scan_chain[159 -: 8]
- Memory[16]scan_chain[167 -: 8]
- Memory[17] (the IO register)C code to load the previously-discussed example program (e.g. via the STM32 HAL) is provided:
//The registers are presented in reverse order to
// the table as we load them MSB first.
uint8_t program_led_btn[21] = {
0b00000000, //IOREG
0b00000001, //MEM[16]
0b00011000, //MEM[15]
0b00000000, //MEM[14]
0b11110000, //MEM[13]
0b11111101, //MEM[12]
0b00110001, //MEM[11]
0b11111011, //MEM[10]
0b11101110, //MEM[9]
0b00101110, //MEM[8]
0b11010000, //MEM[7]
0b00001110, //MEM[6]
0b11110011, //MEM[5]
0b10010000, //MEM[4]
0b00010001, //MEM[3]
0b11110101, //MEM[2]
0b10010000, //MEM[1]
0b00010001, //MEM[0]
0b00000000, //ACC
0b00000000, //IR
0b00000001 //PC[5bit], CU[3bit]
};
//... this will go in the SPI peripheral initialization code
hspi1.Init.Mode = SPI_MODE_MASTER;
hspi1.Init.Direction = SPI_DIRECTION_2LINES;
hspi1.Init.DataSize = SPI_DATASIZE_8BIT;
hspi1.Init.CLKPolarity = SPI_POLARITY_LOW;
hspi1.Init.CLKPhase = SPI_PHASE_1EDGE;
hspi1.Init.NSS = SPI_NSS_SOFT;
//...
hspi1.Init.FirstBit = SPI_FIRSTBIT_MSB;
hspi1.Init.TIMode = SPI_TIMODE_DISABLE;
//...
//... this will go in your main or similar
uint8_t *program = program_led_btn;
//on the first run, scan_out will give us the reset value of the
// processor
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//we can check if the program loaded correctly by immediately scanning
// it back in again, it will be unloaded to scan_out
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 0);
HAL_Delay(100);
HAL_SPI_TransmitReceive(&hspi1, program, scan_out, 21, HAL_MAX_DELAY);
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_SCAN_CS_GPIO_Port, SPI_SCAN_CS_Pin, 1);
//Check if the program loaded correctly
for(int i = 0; i < 21; i++) {
if(program[i] != scan_out[i]) {
while(1); //it failed
}
}
Once the program is loaded (using the above code or similar), we can run a program. This is as easy as providing a clock signal to the processor and setting the processor enable line high. We do this using the SPI as follows:
uint8_t dummy; //a dummy value
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);
//...
//run this in a loop
HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);
The processor will ignore any value being shifted in on the MOSI data line during operation. However, it does provide a nice feature in that the processor will emit the current value of the processor_halt signal on the MISO line. This means that we can improve the code to run this process in a loop and catch when the program reaches a HLT instruction:
HAL_Delay(100);
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 0);
dummy = 0;
while(1) {
HAL_SPI_TransmitReceive(&hspi1, &dummy, &dummy, 1, HAL_MAX_DELAY);
if(dummy == 0xFF)
break;
}
HAL_GPIO_WritePin(SPI_PROC_CS_GPIO_Port, SPI_PROC_CS_Pin, 1);
HAL_Delay(100);
Of course, the example program does not HLT, so we will not reach this point in this code. But, it works for other programs that do contain a HLT instruction.
Once you have the provided example program running, you will be able to press the button and see how the LEDs are toggled.
Any microcontroller with an SPI peripheral
# | Input | Output |
---|---|---|
0 | clock - connect to an SPI SCK. | general purpose output 0 (e.g. LED segment a). This output comes from the I/O register bit 1. |
1 | reset (active high) | general purpose output 1 (e.g. LED segment b). This output comes from the I/O register bit 2. |
2 | scan enable (active low) - connect to an SPI chip select. | general purpose output 2 (e.g. LED segment c). This output comes from the I/O register bit 3. |
3 | processor enable (active low) - connect to an SPI chip select. | general purpose output 3 (e.g. LED segment d). This output comes from the I/O register bit 4. |
4 | scan data in - connect to an SPI MOSI. | general purpose output 4 (e.g. LED segment e). This output comes from the I/O register bit 5. |
5 | general purpose input (e.g. Button). This input will be provided to the I/O register bit 0. | general purpose output 5 (e.g. LED segment f). This output comes from the I/O register bit 6. |
6 | general purpose output 6 (e.g. LED segment g). This output comes from the I/O register bit 7. | |
7 | scan data out - connect to an SPI MISO. |