The Reference Nucleus
Authors: Lorenz Sommer 2024, Johannes Hofmann 2025, Niklas Sirch 2025, Tristan Kundrat 2026
This chapter gives an overview of the reference nucleus’ capabilities and details its submodules.
Overview
The main purpose of the reference nucleus is to provide a well-documented and minimal implementation of the RV32 RISC-V ISA.
It is minimal in the sense that it does not contain any pipelines and does not support multi-nucleus architectures.
Capabilities
The reference nucleus can run most programs compiled by the RISC-V GCC toolchain, including programs that use external peripherals and interrupt-driven programming models.
It implements the following extensions of the RISC-V ISA:
I (Base integer instruction set)
A (Atomic instructions)
M (Multiplication, division and remainder instructions)
Zicsr (Control and Status Register instructions)
Zifencei (Instruction-fetch fence instructions) - only implemented as a NOP operation as the reference nucleus is single core and does not reorder memory operations.
Memory
The reference nucleus is intended for use with byte-addressable memory.
It does not access memory directly. Instead, it interfaces with a so-called “membrana” to access memory.
Communication with the membrana is split into two ports, the instruction port for instruction fetching, and the data
port for load and store operations. For these ports, a custom communication protocol has been created.
The membrana is connected to the internal system bus and acts as an interface for the nucleus’ requests to the system bus.
RISC-V Interrupt Support
The reference nucleus implements machine mode interrupt handling. The system supports the three standard machine mode interrupt types:
Machine External Interrupt (MEIP) - Highest priority interrupt from external peripheral devices
Machine Timer Interrupt (MTIP) - Medium priority interrupt for time-based scheduling and periodic tasks
Machine Software Interrupt (MSIP) - Lowest priority interrupt for inter-processor communication
The interrupt system includes a complete CSR (Control and Status Register) implementation with all required machine mode registers: MIP, MIE, MSTATUS, MTVEC, MEPC, MCAUSE, and MTVAL. A hardware trap handling state machine manages interrupt entry and exit sequences.
The nucleus integrates with CLINT (Core Local Interruptor) modules that provide memory-mapped timer and software interrupt control at standard RISC-V addresses (base address 0x2000000).
Interrupt handling adds additional states to the controller’s finite state machine, including trap entry sequences for context saving and the MRET instruction for interrupt return. When an interrupt occurs, the hardware automatically saves the current PC to MEPC, updates MCAUSE with the interrupt type, disables global interrupts, and transfers control to the trap handler specified in MTVEC.
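The entry and return sequence described above can be sketched as a small software model. This is an illustrative sketch, not the nucleus' SystemC implementation: the `TrapState` struct and the function names are hypothetical, MTVEC is assumed to be in direct mode, and the bit positions and cause codes are those of the RISC-V privileged specification.

```cpp
#include <cstdint>

// Hypothetical software model of the state touched by trap entry and MRET.
struct TrapState {
    uint32_t pc = 0;
    uint32_t mstatus = 0;
    uint32_t mepc = 0;
    uint32_t mcause = 0;
    uint32_t mtvec = 0;
};

constexpr uint32_t MSTATUS_MIE = 1u << 3;        // global machine interrupt enable
constexpr uint32_t MSTATUS_MPIE = 1u << 7;       // MIE value saved on trap entry
constexpr uint32_t MCAUSE_INTERRUPT = 1u << 31;  // set for interrupts, clear for exceptions

// Standard machine-level interrupt cause codes.
enum InterruptCause : uint32_t { kSoftware = 3, kTimer = 7, kExternal = 11 };

// Trap entry for an interrupt (assumption: MTVEC in direct mode).
inline void enter_interrupt(TrapState& s, InterruptCause cause) {
    s.mepc = s.pc;                        // save the current PC
    s.mcause = MCAUSE_INTERRUPT | cause;  // record the interrupt type
    // MPIE <- MIE, then MIE <- 0 (global interrupts disabled)
    if (s.mstatus & MSTATUS_MIE) {
        s.mstatus |= MSTATUS_MPIE;
    } else {
        s.mstatus &= ~MSTATUS_MPIE;
    }
    s.mstatus &= ~MSTATUS_MIE;
    s.pc = s.mtvec & ~0x3u;               // jump to the trap handler base address
}

// MRET: restore the interrupt enable and return to the saved PC.
inline void execute_mret(TrapState& s) {
    if (s.mstatus & MSTATUS_MPIE) {
        s.mstatus |= MSTATUS_MIE;
    } else {
        s.mstatus &= ~MSTATUS_MIE;
    }
    s.mstatus |= MSTATUS_MPIE;
    s.pc = s.mepc;
}
```

In the hardware, these steps are spread over several controller states rather than executed in one function call.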
Working principle
The reference nucleus utilizes a simple fetch-decode-execute workflow.
The decode step is always executed within one clock cycle. All instructions that are not load or store
instructions also execute within one clock cycle. The fetch step is an IPort memory transaction and takes
3 clock cycles to complete. Load operations require 4 clock cycles, store operations require 3.
For the time being, the ECALL and EBREAK operations halt the processor.
The FENCE operation is implemented as a NOP operation, as the nucleus does not reorder memory operations.
All module outputs are buffered in registers, limiting combinational logic depth and ensuring clean timing.
Fig. 2 reference nucleus block diagram
Arithmetic Logic Unit (ALU)
Fig. 3 ALU module symbol
SC_MODULE(m_alu)
The ALU (Arithmetic Logic Unit) is a circuit that performs arithmetic and logical operations.
- Authors
Lorenz Sommer, Niklas Sirch, Daniel Sommerfeldt, Tristan Kundrat
- Input ports:
| Port name | Width in bits | Description |
|---|---|---|
| a | 32 | Operand A. |
| b | 32 | Operand B. |
| funct3 | 3 | 3-bit control signal that selects the operation to be performed. Derived from IR[14:12]. |
| funct7 | 7 | Decides between ADD/SUB, SRA/SRL operations. Derived from IR[31:25]. |
| force_add | 1 | If set, forces the ALU to add the operands. |
| force_amo | 1 | If set, forces the ALU to select the operation from funct5. funct5 (IR[31:27]) is contained in funct7. |
| alu_mode | 3 | Selects mode of operation for ALU. Either Idle, Reg-Reg operations or Reg-Imm operations. |
| clk | 1 | Clock signal for multiplication and division |
| reset | 1 | Reset signal for multiplication and division |
- Output ports:
| Port name | Width in bits | Description |
|---|---|---|
| y | 32 | Result of the logical and arithmetic operation performed on operands A and B. |
| equal | 1 | Status signal, 1 if a == b, else 0. |
| less | 1 | Status signal, 1 if a < b, else 0. |
| lessu | 1 | Status signal, 1 if a < b, else 0 (unsigned). |
| valid | 1 | Status signal, 1 when calculation finished, else 0. When calculating a number using the multiplication or division module, the valid signal of the ALU corresponds to the valid signal of the calculating module. When calculating any RV32I instruction, the logic will be purely combinatorial and the valid signal will always be 1. |
- Operation table:
Given the input signals funct3 and funct7, the ALU performs the following operations on the operands A and B.
| Operation | Shorthand | funct3 | funct7 | Description |
|---|---|---|---|---|
| ADD | y = a + b | 0x0 | 0x0 | Regular addition |
| SUB | y = a - b | 0x0 | 0x20 | Two’s complement of B is added to A. R-Type only. |
| AND | y = a & b | 0x7 | 0x0 | Bitwise AND |
| OR | y = a OR b | 0x6 | 0x0 | Bitwise OR |
| XOR | y = a ^ b | 0x4 | 0x0 | Bitwise XOR |
| SLL | y = a << b | 0x1 | 0x0 | Shift left logical |
| SRL | y = a >> b | 0x5 | 0x0 | Shift right logical |
| SRA | y = a >> b | 0x5 | 0x20 | Shift right arithmetic (sign extends) |
| SLT | y = a < b ? 1 : 0 | 0x2 | 0x0 | Set less than |
| SLTU | y = a < b ? 1 : 0 | 0x3 | 0x0 | Set less than unsigned (zero extends) |
| MUL | y = (int)(a * b) | 0x0 | 0x01 | multiplication (lower half) |
| MULH | y = ((a * b) >> sizeof(int)) | 0x1 | 0x01 | signed*signed multiplication (upper half) |
| MULHU | y = (unsigned int)((a * b) >> sizeof(int)) | 0x3 | 0x01 | unsigned*unsigned multiplication (upper half) |
| MULHSU | y = (int)((a * b) >> sizeof(int)) | 0x2 | 0x01 | signed*unsigned multiplication (upper half) |
| DIV | y = b == 0 ? -1 : rtz(a / b) | 0x4 | 0x01 | signed/signed, rounding towards zero (rtz), dividing with 0 gives -1. |
| DIVU | y = b == 0 ? 2^sizeof(int)-1 : rtz(a / b) | 0x5 | 0x01 | unsigned/unsigned, rounding towards zero (rtz), dividing with 0 gives 2^sizeof(int)-1. |
| REM | y = b == 0 ? a : a mod b | 0x6 | 0x01 | signed mod signed, dividing with 0 gives a. a == -2^(sizeof(int) - 1) && b == -1 gives 0. |
| REMU | y = b == 0 ? a : a mod b | 0x7 | 0x01 | unsigned mod unsigned, dividing with 0 gives a. |
- Notes:
The funct3_in signal is not mapped uniquely. For example, the ADD and SUB operation share the same funct3 value and are distinguished by the funct7 signal. The same goes for the SRL and SRA operations.
The ALU is indirectly used to perform branch and jump instructions as well as the LUI and AUIPC instructions.
The ALU is also used to perform AMO operations, which are determined by the funct5 field (contained inside funct7) of the instruction.
For AMOs the ALU also performs the unique swap, min(u) and max(u) operations.
R&I-Type instructions
The ALU can handle R- and I-Type instructions. The two instruction types differ in a few details that the ALU has to
account for, most notably that there is no I-Type subi instruction.
As seen in the operation table, funct7 decides whether subtraction is performed instead of
addition and whether to shift logically or arithmetically. According to the RISC-V
specification, the only relevant values of the funct7 block are 0x0 and 0x20 for the base RV32I instruction set
and 0x01 for the M extension. Additionally, even though I-Type instructions get decoded into their internal immediate value,
the position of the funct7 block remains the same.
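As a rough illustration of the R-/I-Type distinction, the following sketch models the base RV32I part of the operation table in plain C++. The function name `alu_rv32i` and the `is_imm` flag (standing in for the Reg-Imm setting of alu_mode) are assumptions made for this example; the M-Extension operations (funct7 = 0x01) are left out.

```cpp
#include <cstdint>

// Sketch of the RV32I operation select. For I-Type instructions (is_imm),
// IR[31:25] belongs to the immediate, so it must not turn ADDI into a
// subtraction; it is only interpreted as funct7 for shifts.
inline uint32_t alu_rv32i(uint32_t a, uint32_t b, uint32_t funct3,
                          uint32_t funct7, bool is_imm) {
    const uint32_t shamt = b & 0x1F;      // only the low 5 bits select the shift amount
    const bool alt = (funct7 == 0x20);    // SUB / SRA variant
    switch (funct3 & 0x7) {
        case 0x0: return (alt && !is_imm) ? a - b : a + b;  // ADD / SUB
        case 0x1: return a << shamt;                        // SLL
        case 0x2: return static_cast<int32_t>(a) < static_cast<int32_t>(b) ? 1u : 0u;  // SLT
        case 0x3: return a < b ? 1u : 0u;                   // SLTU
        case 0x4: return a ^ b;                             // XOR
        case 0x5: {
            uint32_t result = a >> shamt;                   // SRL
            if (alt && (a & 0x80000000u) && shamt != 0) {
                result |= ~(0xFFFFFFFFu >> shamt);          // SRA: replicate the sign bit
            }
            return result;
        }
        case 0x6: return a | b;                             // OR
        default:  return a & b;                             // AND (0x7)
    }
}
```

For example, `addi` with the immediate 0x400 has the bit pattern 0x20 in IR[31:25], yet the sketch still adds because `is_imm` is set.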
Address calculation
Next to R- and I-Type instructions, the ALU is also indirectly used to perform address calculations for the load, store,
branch and jump instructions. It also performs the additions necessary for the LUI and AUIPC instructions. All of
the previously named instructions use the force_add flag to force the ALU to perform addition. Depending on the
instruction, the operands are selected accordingly by the control unit (controller). Possible sources for operand A
are rs1 and pc_out (current program counter). Possible sources for operand B are rs2 and imm_out (immediate value).
Atomic Memory Operations (AMOs)
The ALU is also used to perform atomic memory operations (AMOs) from the RISC-V A-Extension.
When the controller sets the force_amo flag, the ALU will select the operation from
the funct5 (contained inside funct7) block of the current instruction.
The five A-Extension specific operations swap, min, minu, max and maxu are implemented in
the ALU as well.
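The funct5-based selection can be sketched as follows. The encodings are the standard RV32A funct5 values; the function name `amo_alu` and the operand names `mem_value` and `rs2_value` are hypothetical, and which ALU operand carries which value in the nucleus is not specified here.

```cpp
#include <algorithm>
#include <cstdint>

// funct5 (IR[31:27]) encodings of the RV32A read-modify-write instructions.
enum AmoOp : uint32_t {
    AMOADD = 0x00, AMOSWAP = 0x01, AMOXOR = 0x04, AMOOR = 0x08, AMOAND = 0x0C,
    AMOMIN = 0x10, AMOMAX = 0x14, AMOMINU = 0x18, AMOMAXU = 0x1C
};

// Sketch: compute the value written back to memory by an AMO.
inline uint32_t amo_alu(uint32_t mem_value, uint32_t rs2_value, uint32_t funct7) {
    const uint32_t funct5 = (funct7 >> 2) & 0x1F;  // drop the aq/rl bits IR[26:25]
    const int32_t sm = static_cast<int32_t>(mem_value);
    const int32_t sr = static_cast<int32_t>(rs2_value);
    switch (funct5) {
        case AMOSWAP: return rs2_value;
        case AMOADD:  return mem_value + rs2_value;
        case AMOXOR:  return mem_value ^ rs2_value;
        case AMOOR:   return mem_value | rs2_value;
        case AMOAND:  return mem_value & rs2_value;
        case AMOMIN:  return static_cast<uint32_t>(std::min(sm, sr));
        case AMOMAX:  return static_cast<uint32_t>(std::max(sm, sr));
        case AMOMINU: return std::min(mem_value, rs2_value);
        case AMOMAXU: return std::max(mem_value, rs2_value);
        default:      return mem_value;  // LR/SC and reserved encodings: no ALU operation here
    }
}
```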
M-/Zmmul-Extension
The M-Extension (multiplication, division and remainder instructions) is handled by two modules: one for multiplication and one for division/remainder.
Multiplication and division module
Currently, multiplication is implemented as a simple shift-and-add multiplier
and division using a shift-and-subtract divider.
Calculation modules for the M-Extension can be found under alu/m_extension/mul
and alu/m_extension/div under the nucleus_ref folder.
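To illustrate the two algorithms, here is a minimal software sketch of an unsigned shift-and-add multiplier and a restoring shift-and-subtract divider. It shows the general technique only and is not taken from the module sources; the names `shift_add_mul`, `shift_sub_div` and `DivResult` are made up for this example. Each loop iteration corresponds roughly to one step of a sequential hardware implementation.

```cpp
#include <cstdint>

// Shift-and-add: add the shifted multiplicand for every set bit of the multiplier.
inline uint64_t shift_add_mul(uint32_t a, uint32_t b) {
    uint64_t acc = 0;
    uint64_t addend = a;
    for (int i = 0; i < 32; ++i) {
        if (b & 1u) {
            acc += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return acc;  // full 64-bit product; MUL uses the lower half, MULHU the upper half
}

struct DivResult {
    uint32_t quotient;
    uint32_t remainder;
};

// Restoring shift-and-subtract division (unsigned). The divisor is assumed to be
// non-zero; division by zero is an edge case that needs its own handling.
inline DivResult shift_sub_div(uint32_t dividend, uint32_t divisor) {
    uint64_t rem = 0;
    uint32_t quo = 0;
    for (int i = 31; i >= 0; --i) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  // bring down the next dividend bit
        if (rem >= divisor) {                       // subtract only if it fits
            rem -= divisor;
            quo |= (1u << i);
        }
    }
    return {quo, static_cast<uint32_t>(rem)};
}
```

Signed variants are commonly built on top of such unsigned cores by taking absolute values first and fixing up the sign of the result afterwards.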
Every division and multiplication module has the following structure:
Fig. 4 Multiplication/Division Module Block Diagram
Also, reset and clock signals are connected, if the calculation is not purely combinatorial.
The signals have the following meaning:
| Signal name | Explanation |
|---|---|
| a_in | Value for calculation |
| b_in | Value for calculation |
| funct3_in | Funct3 from instruction. Contains which calculation should be made |
| start_in | Must be set to High, when the inputs a_in and b_in are valid numbers and calculation should start. |
| y_out | Multiplication: \(a_{in} * b_{in} = y_{out}\) |
| valid_out | High, when calculation is done and the y_out signal is valid. |
Edge cases (for division)
Division should always round towards zero. The following edge cases should be handled like so:
| Condition | Dividend (a) | Divisor (b) | DIVU | REMU | DIV | REM |
|---|---|---|---|---|---|---|
| Division by zero | \(x\) | \(0\) | \(2^{32} - 1\) | \(x\) | \(-1\) | \(x\) |
| Overflow (signed only) | \(-2^{31}\) | \(-1\) | - | - | \(-2^{31}\) | \(0\) |
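A minimal sketch of the expected results for the four division and remainder instructions, including the edge cases, can serve as a reference model when testing a replacement division module. The function names are hypothetical; operands and results are passed as raw 32-bit register values.

```cpp
#include <cstdint>
#include <limits>

// DIV: signed division, rounding towards zero.
inline uint32_t rv32_div(uint32_t a, uint32_t b) {
    const int32_t sa = static_cast<int32_t>(a);
    const int32_t sb = static_cast<int32_t>(b);
    if (sb == 0) return 0xFFFFFFFFu;  // division by zero gives -1
    if (sa == std::numeric_limits<int32_t>::min() && sb == -1) return a;  // overflow gives -2^31
    return static_cast<uint32_t>(sa / sb);  // C++ integer division truncates towards zero
}

// DIVU: unsigned division, division by zero gives 2^32 - 1.
inline uint32_t rv32_divu(uint32_t a, uint32_t b) {
    return b == 0 ? 0xFFFFFFFFu : a / b;
}

// REM: signed remainder, division by zero gives the dividend.
inline uint32_t rv32_rem(uint32_t a, uint32_t b) {
    const int32_t sa = static_cast<int32_t>(a);
    const int32_t sb = static_cast<int32_t>(b);
    if (sb == 0) return a;
    if (sa == std::numeric_limits<int32_t>::min() && sb == -1) return 0;  // overflow gives 0
    return static_cast<uint32_t>(sa % sb);
}

// REMU: unsigned remainder, division by zero gives the dividend.
inline uint32_t rv32_remu(uint32_t a, uint32_t b) {
    return b == 0 ? a : a % b;
}
```

The explicit overflow checks matter in C++ as well, because evaluating the most negative `int32_t` divided by -1 directly is undefined behavior.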
Changing out the calculation modules
Signals to the multiplication and division module are connected in the alu.cpp
file. To change out the multiplication and/or division module with a different
version, multiple steps have to be taken. In this example, the new multiplication
module is called mult_new and the new division module divi_new.
First change the sources in the nucleus_ref/Makefile:
# ...
# Module sources that are used for simulation and for synthesis ...
# Change the sources for the multiplication and division modules here.
MODULE_SRC := $(MODULE).cpp \
alu/alu.cpp \
alu/m_extension/mul/mult_new.cpp \
alu/m_extension/div/divi_new.cpp \
# ...
TESTBENCH_ALU_SRC := alu/alu.cpp \
alu/m_extension/mul/mult_new.cpp \
alu/m_extension/div/divi_new.cpp \
alu/alu_tb/alu_tb.cpp
# ...
Next, change the submodules in the nucleus_ref/alu/alu.h header file:
// ...
/* Submodules */
#if PN_CFG_ALU_ENABLE_ZMMUL_EXTENSION == 1
// Change class here, if you want to use a different multiplication module.
// Also change imports accordingly.
class m_mult_new* multiplication_module;
#endif
#if PN_CFG_ALU_ENABLE_M_EXTENSION == 1
// Change class here, if you want to use a different division module.
// Also change imports accordingly.
class m_divi_new* division_module;
#endif
// ...
And finally, the imports and definitions in the nucleus_ref/alu/alu.cpp file:
// ...
// change the following import to change out the multiplication module
#include "m_extension/mul/mult_new.h"
// change the following import to change out the division module
#include "m_extension/div/divi_new.h"
// ...
void m_alu::init_submodules()
{
#if PN_CFG_ALU_ENABLE_ZMMUL_EXTENSION == 1
// change this assign according to the manual, if you want to replace
// the multiplication module.
multiplication_module = sc_new<m_mult_new>("mult_new");
// ...
#endif
#if PN_CFG_ALU_ENABLE_M_EXTENSION == 1
// change this assign according to the manual, if you want to replace
// the division module.
division_module = sc_new<m_divi_new>("divi_new");
// ...
#endif
}
Enabling and disabling the M-/Zmmul-Extension
The M-/Zmmul-Extension can be enabled or disabled at build time.
To enable the entire M-Extension, set the variable PN_MARCH in the piconut-config.mk
of the system using this nucleus to rv32im_zicsr.
The PN_CFG_ALU_ENABLE_M_EXTENSION variable in the same file also has to be set to 1.
As an example, look at the file piconut/systems/refdesign/piconut-config.mk:
## RISC-V ISA and extensions ...
## Changed -march argument for gcc. Comment following lines, if M and Zmmul
## extensions are not used.
# Uncomment following line, if only using Zmmul extension
# PN_MARCH ?= rv32i_zicsr_zmmul
# Uncomment following line, if using M extension
PN_MARCH ?= rv32im_zicsr
# ...
# M-Extension ...
# Enable (1) or Disable (0) M-Extension
PN_CFG_ALU_ENABLE_M_EXTENSION ?= 1
# Enable (1) or Disable (0) Zmmul-Extension
PN_CFG_ALU_ENABLE_ZMMUL_EXTENSION ?= 1
To enable only the Zmmul-Extension, the system's piconut-config.mk needs the following:
## RISC-V ISA and extensions ...
## Changed -march argument for gcc. Comment following lines, if M and Zmmul
## extensions are not used.
# Uncomment following line, if only using Zmmul extension
PN_MARCH ?= rv32i_zicsr_zmmul
# Uncomment following line, if using M extension
# PN_MARCH ?= rv32im_zicsr
# ...
# M-Extension ...
# Enable (1) or Disable (0) M-Extension
PN_CFG_ALU_ENABLE_M_EXTENSION ?= 0
# Enable (1) or Disable (0) Zmmul-Extension
PN_CFG_ALU_ENABLE_ZMMUL_EXTENSION ?= 1
To disable both extensions, comment out the PN_MARCH lines in the system's piconut-config.mk
and set both config enables to zero:
## RISC-V ISA and extensions ...
## Changed -march argument for gcc. Comment following lines, if M and Zmmul
## extensions are not used.
# Uncomment following line, if only using Zmmul extension
# PN_MARCH ?= rv32i_zicsr_zmmul
# Uncomment following line, if using M extension
# PN_MARCH ?= rv32im_zicsr
# ...
# M-Extension ...
# Enable (1) or Disable (0) M-Extension
PN_CFG_ALU_ENABLE_M_EXTENSION ?= 0
# Enable (1) or Disable (0) Zmmul-Extension
PN_CFG_ALU_ENABLE_ZMMUL_EXTENSION ?= 0
Keep in mind that the PN_CFG_... variables may also be set by the piconut-config.mk in the root
directory of this project, so comment those lines out or change them accordingly.
Scalar Cryptography
The reference nucleus includes a scalar cryptography module that implements parts of the RISC-V Scalar Cryptography Extension.
The module currently supports the following extensions:
Zkne for RV32 (NIST Encryption = AES)
Zknd for RV32 (NIST Decryption = AES)
This module is integrated into the ALU and is started when a crypto instruction is executed. Like multiplication, this module operates over multiple cycles.
SC_MODULE(m_scalar_crypto)
Scalar Cryptography Module implementing parts of the RISC-V Scalar Cryptography Extension.
At the moment, this module supports AES encryption and decryption instructions as defined as Zkne and Zknd extensions for RV32. This module can be extended to support additional scalar cryptographic instructions.
This module implements a simple state machine to interact with the ALU similar to other ALU modules like multiplication.
- Ports:
- Parameters:
clk – [in] Clock of the module.
reset – [in] Reset of the module.
rs1_in – [in] First source register input.
rs2_in – [in] Second source register input.
funct7_in – [in] Function 7 field input.
funct3_in – [in] Function 3 field input.
start_in – [in] Start signal input.
valid_out – [out] Output signal indicating valid result.
res_out – [out] Result output.
AES Instructions (Zkne, Zknd)
Algorithm
The Advanced Encryption Standard (AES) is the standard for symmetric encryption specified by the National Institute of Standards and Technology (NIST).
AES is a round-based cipher. The number of rounds depends on the key size, and the algorithm includes a special final round. These operations implement the core principles of symmetric-key cryptography: confusion (making the output significantly different from the input) and diffusion (spreading input bits across the output), combined with the addition of the round key. Decryption applies the inverse operations in reverse order. The basic operations of AES encryption, shown in the figure below, are:
SubBytes - Substitution of bytes via a lookup table (S-box)
ShiftRows - Shifting of rows in the state matrix
MixColumns - Mixing of columns in the state matrix via matrix multiplication
AddRoundKey - XORing the state matrix with a round key
Fig. 5 AES algorithm overview (original by John Savard, CC0)
Instructions
The Zkne and Zknd extensions implement AES encryption and decryption instructions, respectively. They are split because some cryptographic modes do not require decryption support.
There are four instructions implemented for AES encryption and decryption:
aes32esmi - Middle-round AES encryption
aes32dsmi - Middle-round AES decryption
aes32esi - Final-round AES encryption
aes32dsi - Final-round AES decryption
The instructions can be used to construct AES in software and also assist with key schedule generation. More details can be found in the Software Chapter.
The implementation of Zkne and Zknd in the reference nucleus is simple but performant at a clock speed of 40 MHz.
More elaborate implementations focus on side-channel resistance, a smaller footprint and support for faster clock speeds, but for the purpose of this reference nucleus this implementation is sufficient.
For the exact implementation please refer to the commented source code.
Regfile
The regfile component of the nucleus is used to store temporary values that are needed for calculations and program flow. It contains 32 32-bit registers.
| Port name | Signal width | Description |
|---|---|---|
| data_in | 32 | Data input |
| select_in | 5 | Selects the register in which input data is stored |
| rs1_select_in | 5 | Selects which register's value is driven to the output port rs1_out |
| rs2_select_in | 5 | Selects which register's value is driven to the output port rs2_out |
| en_load_in | 1 | Control signal to enable/disable storing of input data with the next rising edge. |

| Port name | Signal width | Description |
|---|---|---|
| rs1_out | 32 | Output of the register selected by rs1_select_in |
| rs2_out | 32 | Output of the register selected by rs2_select_in |
Fig. 6 Regfile module symbol
The register x0 holds the permanent value 0x00000000 according to the RISC-V specification.
The reset value of all registers is 0x00000000.
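The behavior of the regfile can be summarized in a small software model. This is a sketch under assumptions: the class name `RegfileModel` and its methods are hypothetical, and whether the hardware ignores writes to x0 or forces reads of x0 to zero is an implementation detail not covered here.

```cpp
#include <array>
#include <cstdint>

// Software model of a 32 x 32-bit register file with x0 hardwired to zero.
class RegfileModel {
public:
    // Models one rising clock edge: stores data_in into the register chosen by
    // select_in if en_load_in is set. Writes to x0 are dropped.
    void clock_edge(uint32_t data_in, uint32_t select_in, bool en_load_in) {
        const uint32_t index = select_in & 0x1F;
        if (en_load_in && index != 0) {
            regs_[index] = data_in;
        }
    }

    // Models the rs1/rs2 read ports.
    uint32_t read(uint32_t select) const { return regs_[select & 0x1F]; }

    // Reset value of all registers is 0x00000000.
    void reset() { regs_.fill(0); }

private:
    std::array<uint32_t, 32> regs_{};
};
```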
Program Counter (PC)
The program counter module is used to store the address of the current instruction at any given time. Its content
can be thought of as the current position within the program. The program counter is incremented by 0x4 with the next
rising edge, once the controller sets the s_pc_inc4 control signal. Its internal value can be overwritten with the value
at its input port with the next rising edge if s_pc_ld_en is set. This is used to perform jump and branch instructions.
PC Interrupt
The PC module supports RISC-V interrupt and exception handling by providing mechanisms to:
Load trap handler addresses from the MTVEC CSR during interrupt/exception entry
Restore return addresses from the MEPC CSR during MRET instruction execution
PC Control Priority
The PC module implements a priority-based loading mechanism where interrupt/exception handling takes precedence over normal program flow control. The priority order (highest to lowest) is:
Debug level exit - Loads address from DPC CSR
Debug level entry - Loads debug handler start address
Trap handler entry - Loads trap handler address from MTVEC CSR
Interrupt return (MRET) - Loads return address from MEPC CSR
Normal load - Loads address from pc_in (jump/branch instructions)
Increment - Increments PC by 4 (normal instruction sequence)
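The priority order above amounts to an if/else chain. The sketch below models it in plain C++ using the port names of the PC module. The struct `PcInputs`, the function `next_pc` and the `debug_handler_start` parameter are assumptions for illustration, since the actual debug handler start address is not stated here.

```cpp
#include <cstdint>

// Inputs of the PC module, named after its ports.
struct PcInputs {
    uint32_t pc_in = 0;
    bool inc_in = false;
    bool en_load_in = false;
    bool debug_level_enter_in = false;
    bool debug_level_leave_in = false;
    uint32_t csr_dpc_in = 0;
    bool trap_handler_enter_in = false;
    uint32_t csr_mtvec_in = 0;
    bool mret_in = false;
    uint32_t csr_mepc_in = 0;
};

// Computes the PC value after the next rising edge.
// debug_handler_start is a hypothetical parameter for the debug handler address.
inline uint32_t next_pc(uint32_t pc, const PcInputs& in, uint32_t debug_handler_start) {
    uint32_t next = pc;
    if (in.debug_level_leave_in) {
        next = in.csr_dpc_in;           // highest priority: debug level exit
    } else if (in.debug_level_enter_in) {
        next = debug_handler_start;     // debug level entry
    } else if (in.trap_handler_enter_in) {
        next = in.csr_mtvec_in;         // trap handler entry
    } else if (in.mret_in) {
        next = in.csr_mepc_in;          // interrupt return
    } else if (in.en_load_in) {
        next = in.pc_in;                // jump/branch
    } else if (in.inc_in) {
        next = pc + 4;                  // normal instruction sequence
    }
    return next & ~0x3u;                // instructions are word aligned
}
```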
| Port name | Signal width | Description |
|---|---|---|
| pc_in | 32 | Data input for when the PC is to be manipulated by jump and branch instructions. |
| inc_in | 1 | Control signal. When set, the PC is incremented by 0x4. |
| en_load_in | 1 | Control signal. Enables loading of a new value with the next rising edge. |
| debug_level_enter_in | 1 | Debug level entry signal. Loads debug handler start address. |
| debug_level_leave_in | 1 | Debug level exit signal. Loads address from DPC CSR. |
| csr_dpc_in | 32 | Debug Program Counter from CSR module. |
| trap_handler_enter_in | 1 | Trap handler entry signal for interrupt/exception handling. |
| csr_mtvec_in | 32 | Machine Trap Vector from CSR module containing trap handler base address. |
| mret_in | 1 | Machine Return instruction signal for interrupt/exception return. |
| csr_mepc_in | 32 | Machine Exception Program Counter from CSR module containing return address. |
Note
Program counter internal register
The internal register of the program counter is 30 bits wide. The lowest two bits of any given address pertaining
to the main program in memory can be omitted because every instruction is naturally word aligned and 4 bytes in size.
To maintain a level of consistency for the program counter's interactions with other modules and the instruction port
interface, the output and input ports are kept at a width of 32 bits. The lowest two bits are simply set to a constant 0 at the output and discarded at the input. Instead of
every connected module having to append these two zeroes at its inputs, this is done once at the PC's output. The same goes for the PC input port.
Interrupt and Exception Handling
The PC module plays a critical role in RISC-V interrupt and exception handling:
Trap Entry: When an interrupt or exception occurs, the controller asserts trap_handler_enter_in, causing the PC
to load the trap handler address from the MTVEC CSR. This automatically transfers control to the interrupt service routine.
Trap Exit: The MRET (Machine Return) instruction asserts mret_in, causing the PC to load the return address
from the MEPC CSR, restoring execution to the point where the interrupt occurred.
Priority Handling: The PC module ensures that interrupt/exception control takes precedence over normal program flow, preventing race conditions during trap handling sequences.
| Port name | Signal width | Description |
|---|---|---|
| pc_out | 32 | Constant output of the internal register. |
Fig. 7 Program counter module symbol
Instruction Register (IR)
The instruction register module stores the current instruction in its internal register and provides its content to the rest of the nucleus via a constant output.
| Port name | Signal width | Description |
|---|---|---|
| ir_in | 32 | Input data. Internal register takes on this value at the next rising edge. |
| en_load_in | 1 | Control signal. Enables loading of a new value with the next rising edge. |

| Port name | Signal width | Description |
|---|---|---|
| ir_out | 32 | Constant output of the internal register. |
The IR input port data_in is connected to the IPort rdata signal.
Fig. 8 Instruction register module symbol
Immediate Generator (immgen)
The immediate generator (immgen) module decodes the current instruction and generates an immediate value from it. All instruction types except R-Type carry an immediate value, a constant encoded directly in the instruction word.
| Port name | Signal width | Description |
|---|---|---|
| data_in | 32 | Instruction word from which an immediate value is to be generated. |

| Port name | Signal width | Description |
|---|---|---|
| imm_out | 32 | Immediate value generated from an instruction word. |
Immediate values are decoded differently and serve a different purpose depending on the instruction and instruction type. Decoding and generation follow the RISC-V specification, page 44, section “Immediate Encoding Variants”.
Instruction Type |
Resulting immediate value |
|---|---|
I-Type |
|
S-Type |
|
B-Type |
|
U-Type |
|
J-Type |
|
How to read the immediate encoding table
In the table above, inst stands for the current instruction word from which an immediate value is to be generated.
The operator + in this case stands for concatenation of bits/bit-ranges.
Additionally, syntax like (inst[31])[31:12] resolves as: “fill the range [31:12] in the output value with inst[31]”.
The immediate generator module detects the correct decoding variant by evaluating the opcode field of the current
instruction. Groups of instructions in the RV32I subset (load/store/immediate/branch/jump/upper immediate) do not
always use the same instruction type format across the board.
Instruction type |
Instructions |
|---|---|
I-Type |
All “immediate arithmetic” instructions, all |
S-Type |
All |
B-Type |
All |
U-Type |
|
J-Type |
|
Note
The R-Type instruction format is not listed, since it does not contain an immediate value.
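As an illustration of opcode-based immediate decoding, the following sketch implements the five immediate encoding variants of the RISC-V base ISA in plain C++. The function names `sign_extend` and `immgen` are chosen for this example, and the set of handled opcodes is limited to the standard RV32I load, store, immediate arithmetic, branch, jump and upper immediate opcodes; the module itself may handle additional cases.

```cpp
#include <cstdint>

// Sign-extends the low `bits` bits of `value` (higher bits must be zero).
inline uint32_t sign_extend(uint32_t value, unsigned bits) {
    const uint32_t sign = 1u << (bits - 1);
    return (value ^ sign) - sign;
}

// Generates the immediate value for an RV32I instruction word.
inline uint32_t immgen(uint32_t inst) {
    switch (inst & 0x7F) {  // opcode field, inst[6:0]
        case 0x03:          // LOAD
        case 0x13:          // OP-IMM
        case 0x67:          // JALR
            return sign_extend(inst >> 20, 12);                              // I-Type
        case 0x23:          // STORE
            return sign_extend(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12);  // S-Type
        case 0x63: {        // BRANCH
            const uint32_t imm = ((inst >> 31) << 12) | (((inst >> 7) & 0x1) << 11) |
                                 (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1);
            return sign_extend(imm, 13);                                     // B-Type
        }
        case 0x37:          // LUI
        case 0x17:          // AUIPC
            return inst & 0xFFFFF000u;                                       // U-Type
        case 0x6F: {        // JAL
            const uint32_t imm = ((inst >> 31) << 20) | (((inst >> 12) & 0xFF) << 12) |
                                 (((inst >> 20) & 0x1) << 11) | (((inst >> 21) & 0x3FF) << 1);
            return sign_extend(imm, 21);                                     // J-Type
        }
        default:
            return 0;       // R-Type and unhandled opcodes: no immediate
    }
}
```

For instance, the instruction word 0xFFF00093 (`addi x1, x0, -1`) yields 0xFFFFFFFF.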
Fig. 9 Immediate generator module symbol
Byteselector
The byteselector module generates the bsel signal. This signal is forwarded to the data port interface as
well as modules that handle memory access. It is necessary for the implementation of byte and halfword load/
store commands.
Port name |
Signal width |
Description |
|---|---|---|
adr_in |
2 |
Lowest two bits of the |
funct3_in |
3 |
|
Port name |
Signal width |
Description |
|---|---|---|
byteselect_out |
4 |
Outgoing byteselect signal. |
invalid_out |
1 |
Set if address is misaligned. |
Because memory accesses always use a full 32-bit word aligned address (lowest two bits are 0) it is necessary to
generate an additional signal to allow for loading/storing of individual bytes and halfwords. This bsel signal
signifies which of the 4 bytes of a 32-bit word at a given address are to be loaded from or stored to. The bsel signal
is 4 bits wide and each bit represents a single byte of a 32-bit word.
When an address is calculated by the ALU, only the upper 30 bits are forwarded to the data port interface.
Memory accesses must always be word aligned and thus the lowest two bits of any address at the data port must always be 00.
This is because the main memory is byte-addressable and therefore 32-bit words sit in memory in intervals of 0x4,
leading to the lowest two bits always being 00. For the purpose of generating the bsel signal, the lowest two bits of the
effective address - along with the funct3 block of the current instruction - lead to the following bsel signal assignments.
adr_in |
funct3_in |
bsel_out |
|---|---|---|
|
lb/lbu/sb |
|
|
lb/lbu/sb |
|
|
lb/lbu/sb |
|
|
lb/lbu/sb |
|
|
lh/lhu/sh |
|
|
lh/lhu/sh |
|
|
lh/lhu/sh |
invalid, |
|
lh/lhu/sh |
invalid, |
|
lw |
|
|
lw |
invalid, |
|
lw |
invalid, |
|
lw |
invalid, |
Note
The nucleus currently only supports loads and stores that stay within one aligned 32-bit word. It is not possible to load/store data
beyond the 32-bit boundary of any address. Disjointed loads (for example bsel 0101) are not supported either.
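The bsel generation can be sketched as a pure function of the two lowest address bits and funct3. This is an assumption-laden sketch rather than the module's code: the names `ByteSelect` and `byteselect` are hypothetical, halfword accesses are assumed valid only at offsets 0 and 2, and the bsel value of 0 reported for invalid accesses is a placeholder.

```cpp
#include <cstdint>

struct ByteSelect {
    uint8_t bsel;   // one bit per byte of the 32-bit word, bit 0 = lowest byte
    bool invalid;   // set if the access is misaligned
};

// adr_in: lowest two bits of the effective address, funct3_in: funct3 of the load/store.
inline ByteSelect byteselect(uint32_t adr_in, uint32_t funct3_in) {
    const uint32_t offset = adr_in & 0x3;
    switch (funct3_in & 0x3) {  // funct3[2] only distinguishes signed from unsigned loads
        case 0x0:               // lb/lbu/sb: any byte offset is allowed
            return {static_cast<uint8_t>(1u << offset), false};
        case 0x1:               // lh/lhu/sh: halfword aligned only
            if (offset & 0x1) return {0, true};
            return {static_cast<uint8_t>(0x3u << offset), false};
        case 0x2:               // lw/sw: word aligned only
            if (offset != 0) return {0, true};
            return {0xF, false};
        default:
            return {0, true};   // no such access width in RV32
    }
}
```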
Examples
To further illustrate the interaction between the effective address and the bsel signal, consider the following examples:
Assume the following memory layout.
Address |
Value |
|---|---|
|
|
|
|
Example 1
Let the registers x1 and x2 contain the value 0.
Let the current instruction be lb x1, 1(x2).
This reads as “load the value of the byte at address 0x00000000 offset by 0x1 into x1”.
This leads to an effective target address of 0x00000001.
The targeted byte is therefore 0x34.
The address at the dport interface will be 0x00000000. (Lowest two bits always 00!)
The byteselector will evaluate the funct3 block (here: 0x0) and the lowest two bits of the effective address (here: 0x1), resulting in a bsel value of 0010.
After the memory access transaction is complete, the register x1 will contain the value 0x00000034.
Example 2
Let the register x1 contain the value 0x0 and register x2 contain the value 0x00000004.
Let the current instruction be lh x1, 2(x2).
This reads as “load the value of the halfword at address 0x00000004 offset by 0x2 into x1”.
This leads to an effective address of 0x00000006.
The targeted halfword is therefore 0xBABE.
The address at the dport interface will be 0x00000004. (Lowest two bits always 00!)
The byteselector will evaluate the funct3 block (here: 0x1) and the lowest two bits of the effective address (here: 0x2), resulting in a bsel value of 1100.
After the memory transaction is complete, the register x1 will contain the value 0xFFFFBABE, because lh sign-extends the loaded halfword and the most significant bit of 0xBABE is set. With lhu, the result would be 0x0000BABE.
Note
The examples above only demonstrate the generation and function of the bsel signal. A full memory transaction involves
additional modules.
Fig. 10 Byteselector module symbol
Extender
The extender module prepares incoming data requested by the nucleus via load instructions. It ensures relevant data is in the correct position within the data word and sign extends it, if needed.
| Port name | Signal width | Description |
|---|---|---|
| data_in | 32 | Incoming data word to be processed |
| funct3_in | 3 | funct3 block of the current instruction |
| bsel_in | 4 | Byteselect signal generated by the byteselector. |

| Port name | Signal width | Description |
|---|---|---|
| extend_out | 32 | Processed data word |
When the nucleus requests data from memory via a load instruction, the membrana will always provide a 32-bit
data word. If the load instruction requests a full 32-bit word (lw instruction), no further steps have to be taken and
the full data word is stored in the target internal register. This changes when the nucleus requests to load
a byte or halfword via the lb or lh instructions respectively. The membrana will still provide a 32-bit word,
with the relevant data occupying either any one of the four bytes within the full word (lb) or two consecutive bytes
(lh). The purpose of this module is to resolve this arrangement and to reorient the data to form a new 32-bit word
that represents it. For this, the extender module uses the bsel signal already provided by the
Byteselector, buffered through the bsel register. The bsel signal represents a
mask where each of its bits corresponds to a byte in the incoming data word.
Incoming data |
|
Output |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In addition to rearranging the relevant bytes or halfwords, the extender module
is responsible for applying the correct extension to the loaded data. For lb
and lh instructions, sign extension is performed: the module inspects the most
significant bit of the selected byte or halfword and, if it is set, fills the
upper bits of the output word with ones. For lbu and lhu instructions, zero
extension is applied, ensuring the upper bits are cleared. This behavior is
determined by evaluating the funct3 field of the current instruction.
For A-Extension instructions, write-backs from the rdata register require the extender module to
use the bsel value stored in the bsel register, which holds the bsel from the most recent load
instruction. This is necessary because, during A-Extension operations, the address is not calculated
by the ALU, so the current bsel would not reflect the correct byte selection.
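The rearranging and extension steps can be sketched in plain C++. The function name `extend` is hypothetical, and the sketch assumes that bsel bit i corresponds to data bits [8i+7:8i] (little-endian byte lanes) and that funct3[2] marks the unsigned load variants, as in the RV32I load encodings.

```cpp
#include <cstdint>

// Moves the selected byte/halfword to the low bits and sign- or zero-extends it.
inline uint32_t extend(uint32_t data_in, uint32_t funct3_in, uint32_t bsel_in) {
    // Find the lowest selected byte lane.
    unsigned lane = 0;
    while (lane < 4 && ((bsel_in >> lane) & 0x1u) == 0) {
        ++lane;
    }
    if (lane == 4) {
        return 0;  // nothing selected (placeholder behavior for this sketch)
    }
    const uint32_t shifted = data_in >> (8 * lane);
    const bool is_unsigned = (funct3_in & 0x4) != 0;  // lbu/lhu
    switch (funct3_in & 0x3) {
        case 0x0: {  // byte
            const uint32_t byte = shifted & 0xFFu;
            return (!is_unsigned && (byte & 0x80u)) ? (byte | 0xFFFFFF00u) : byte;
        }
        case 0x1: {  // halfword
            const uint32_t half = shifted & 0xFFFFu;
            return (!is_unsigned && (half & 0x8000u)) ? (half | 0xFFFF0000u) : half;
        }
        default:     // word: pass through unchanged
            return data_in;
    }
}
```

With a data word of 0xBABE0000 and bsel 1100, `lh` (funct3 0x1) yields 0xFFFFBABE while `lhu` (funct3 0x5) yields 0x0000BABE.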
Fig. 11 Extender module symbol
Datahandler
The datahandler module prepares outgoing data for the Dport interface. It ensures data within the outgoing 32-bit word is in the correct position.
| Port name | Signal width | Description |
|---|---|---|
| data_in | 32 | Incoming data word to be processed |
| bsel_in | 4 | Byteselect signal generated by the byteselector |

| Port name | Signal width | Description |
|---|---|---|
| data_out | 32 | Processed data word |
Data stored in any internal register (the regfile) occupies the low bits of the register, as required by the RISC-V
specification. However, since sb and sh instructions store individual bytes and halfwords to a specific byte- or
halfword-aligned position within a memory word, the outgoing data must be rearranged accordingly.
Essentially, the datahandler module performs the inverse operation of the
extender module: it uses the bsel signal to determine which position within the outgoing
data word the incoming data should occupy. Unlike the extender module, it does not perform
any sign or zero extension.
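The repositioning can be illustrated with a short C++ sketch. The function name `position_store_data` is hypothetical, and the sketch assumes that bsel carries one bit per byte lane (bit 0 is the lowest byte) and that the receiver uses bsel to ignore the unselected lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the module's real interface.
// Assumption: bsel has one bit per byte lane, bit 0 being the lowest byte.
std::uint32_t position_store_data(std::uint32_t data_in, std::uint8_t bsel) {
    assert(bsel != 0 && bsel <= 0xF);

    // Find the lowest selected byte lane.
    unsigned lane = 0;
    while (((bsel >> lane) & 1u) == 0) {
        ++lane;
    }

    // Shift the register's low bits up into the selected lane.
    // No sign or zero extension takes place.
    return data_in << (8 * lane);
}
```

With these assumptions, storing the byte 0xAB with bsel 0100 places it in bits 23..16 of the outgoing word (0x00AB0000), while a full-word store with bsel 1111 leaves the data unchanged.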
Incoming data |
|
Output |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fig. 12 Datahandler module symbol
Controller
The controller (also known as the control unit) is a finite state machine that sets a number of control signals which control signal flow and toggle functions of other submodules. It reads status signals from various parts of the nucleus and evaluates them at specific decision points to facilitate the overall function of executing instructions by setting control signals accordingly. The controller also implements interrupt and trap handling, managing the multi-state sequences required for interrupt entry, context saving, and interrupt return (MRET) operations.
-
SC_MODULE(m_controller)
Main state machine of the processor. Reads status signals and controls module behavior via control signals.
The controller handles interrupt processing according to RISC-V specifications:
Monitors interrupt pending signals from the CSR module
Implements trap entry sequence (STATE_TRAP_ENTRY, STATE_TRAP_SAVE_PC, etc.)
Handles MRET instruction for returning from machine-mode exceptions/interrupts
Coordinates with CSR module for saving/restoring processor state during traps
- Notes
The syntax “s_<signal_name>” and “c_<signal_name>” is used to denote status and control signals, respectively. Status signals are outputs of various modules, while control signals are outputs of the controller.
“Current instruction” refers to the value held by the instruction register at the current cycle.
- Parameters:
clk – [in] Clock signal
reset – [in] Reset signal
s_instruction_in<32> – [in] The current instruction
s_alu_less_in<1> – [in] High if ALU operand A is less than operand B
s_alu_lessu_in<1> – [in] High if ALU operand A is less than operand B (unsigned)
s_alu_equal_in<1> – [in] High if ALU operand A is equal to operand B
s_alu_valid_in<1> – [in] High if ALU is finished with calculation
s_dport_ack_in<1> – [in] Data port acknowledge signal
s_iport_ack_in<1> – [in] Instruction port acknowledge signal
s_debug_haltrequest_in<1> – [in] Halt request for debugging
s_debug_step_in<1> – [in] Debug stepping active
s_interrupt_pending_in<1> – [in] Combined interrupt pending signal from CSR module
s_mip_msip_in<1> – [in] Machine Software Interrupt Pending from CSR
s_mie_msie_in<1> – [in] Machine Software Interrupt Enable from CSR
s_mip_mtip_in<1> – [in] Machine Timer Interrupt Pending from CSR
s_mie_mtie_in<1> – [in] Machine Timer Interrupt Enable from CSR
s_mip_meip_in<1> – [in] Machine External Interrupt Pending from CSR
s_mie_meie_in<1> – [in] Machine External Interrupt Enable from CSR
s_mstatus_mie_in<1> – [in] Machine Status Global Interrupt Enable from CSR
c_iport_stb_out<1> – [out] Instruction port strobe signal
c_dport_stb_out<1> – [out] Data port strobe signal
c_dport_we_out<1> – [out] Data port write enable signal
c_dport_lrsc_out<1> – [out] Data port load-reserved/store-conditional signal
c_dport_amo_out<1> – [out] Data port atomic memory operation signal
c_ld_en_rdata_out<1> – [out] Enable register for loading rdata
c_ld_en_wdata_out<1> – [out] Enable register for loading wdata
c_ld_en_adr_bsel_out<1> – [out] Enable register for loading address bsel
c_reg_ldpc_out<1> – [out] Enables register file to load value from program counter
c_reg_ldmem_out<1> – [out] Enables register file to load value from memory
c_reg_ldimm_out<1> – [out] Enables register file to load immediate value
c_reg_ldalu_out<1> – [out] Enables register file to load value from ALU
c_reg_ldcsr_out<1> – [out] Enables register file to load value from CSR-bus read signal
c_reg_ld_en_out<1> – [out] Enables register file to load a value
c_alu_pc_out<1> – [out] Direct program counter to be ALU operand A
c_alu_imm_out<1> – [out] Direct immediate value to be ALU operand B
c_alu_rdata_reg_out<1> – [out] ALU rdata register output enable
c_rs1_adr_bsel_out<1> – [out] ALU operand B is zero/address select
c_alu_out_to_wdata_out<1> – [out] ALU output to wdata
c_force_add_out<1> – [out] Force ALU to perform addition
c_force_amo_out<1> – [out] Force atomic memory operation
c_alu_mode_out<3> – [out] ALU mode output
c_pc_inc4_out<1> – [out] Increments program counter by 4 on next rising edge
c_pc_ld_en_out<1> – [out] Enables program counter to load a new value
c_ir_ld_en_out<1> – [out] Enables instruction register to load new value from memory
c_debug_haltrequest_ack_out<1> – [out] Acknowledge halt request for debugging
c_debug_level_enter_ebreak_out<1> – [out] Debug level enter request caused by ebreak
c_debug_level_enter_haltrequest_out<1> – [out] Debug level enter request caused by halt request
c_debug_level_enter_step_out<1> – [out] Debug level enter request caused by step
c_debug_level_leave_out<1> – [out] Debug level leave request (triggered by dret)
c_csr_bus_adr_out<PN_CFG_CSR_BUS_ADR_WIDTH> – [out] CSR-bus address
c_csr_bus_write_en_out<1> – [out] CSR-bus write enable
c_csr_bus_read_en_out<1> – [out] CSR-bus read enable
c_csr_imm_en_out<1> – [out] CSR immediate value enable
c_csr_imm_out<5> – [out] CSR immediate value
c_csr_write_mode_out<2> – [out] CSR write mode
c_csr_bus_wdata_out<PN_CFG_CSR_BUS_DATA_WIDTH> – [out] CSR-bus write data
c_csr_interrupt_out<1> – [out] Signal for interrupt handling to CSR module
c_csr_mret_out<1> – [out] Signal for MRET instruction execution to CSR module
c_trap_handler_enter_out<1> – [out] Signal to initiate trap handler entry sequence
The controller operates as a finite state machine with 53 distinct internal states, transitioning between them based on current status signals and instruction flow.
The diagram below provides an overview of the controller’s state machine and its transitions.
Fig. 13 Controller state machine diagram
| State name | Description |
|---|---|
| | Initial state. Resets all control signals. |
| | Instruction port strobe is set high. |
| | Await the instruction port acknowledge signal. |
| | Instruction fetched. Decode the current instruction. |
| | Execute ALU instructions, PC advances by |
| | Execute ALU instructions, PC does not change. |
| | Execute immediate ALU operations (I-Type). |
| | Execute immediate ALU shift instructions. |
| | Execute upper immediate instructions. |
| | Execute add upper immediate to PC instructions. |
| | No operation, PC advances by |
| | Calculate branch target address and set PC if condition is met. |
| | Do not branch, PC advances by |
| | Jump and link instruction. Calculate address and set PC. |
| | Jump and link register instruction. Calculate address and set PC. |
Load and Store
The load and store procedures (and related states) are subject to optimization and currently focus on eliminating potential edge cases to ensure proper function.
| State name | Description |
|---|---|
| | Wait until the data port acknowledge signal is low to ensure no conflict with an ongoing transaction. Then go to |
| | Set data port strobe signal high to begin transaction. Calculate the target address and increment PC. Go to |
| | Await data port acknowledge signal. If high, go to |
| | Data port acknowledge signal is high. Enable regfile to load data and go to |
| | Wait until the data port acknowledge signal is low to ensure no conflict with an ongoing transaction. Then go to |
| | Set data port strobe signal high to begin transaction. Calculate the target address and increment PC. Set write enable signal high. Go to |
| | Await data port acknowledge signal. If high, go to |
The interrupt and trap handling procedures extend the controller with additional states for interrupt processing and CSR instruction execution.
| State name | Description |
|---|---|
| | Initial trap detection. Save current PC to MEPC register. |
| | Determine interrupt type by priority and save to MCAUSE register with interrupt flag set. |
| | Read current MSTATUS register to preserve state. |
| | Save current MIE bit to MPIE field and clear MIE bit to disable global interrupts. |
| | Read trap handler base address from MTVEC register. |
| | Transfer control to interrupt handler by loading trap handler address into PC. |
| | Execute machine return (MRET) instruction: restore MIE from MPIE, load PC from MEPC. |
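The effect of the trap entry states and of MRET on the CSR contents can be summarised in a small C++ sketch. It collapses the multi-state sequence into two functions and is not the controller's actual code: the names `CsrState`, `Pending`, `trap_entry` and `mret` are hypothetical. The bit positions and cause codes are those of the RISC-V privileged specification, the priority order (external, then timer, then software) follows this chapter's overview, and the sketch assumes that MTVEC holds a plain handler address (direct mode).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t MSTATUS_MIE      = 1u << 3;   // global interrupt enable
constexpr std::uint32_t MSTATUS_MPIE     = 1u << 7;   // previous value of MIE
constexpr std::uint32_t MCAUSE_INTERRUPT = 1u << 31;  // interrupt flag in mcause
constexpr std::uint32_t CAUSE_MSI = 3, CAUSE_MTI = 7, CAUSE_MEI = 11;

// Hypothetical containers for this sketch only.
struct CsrState { std::uint32_t mstatus, mepc, mcause, mtvec; };
struct Pending  { bool meip, mtip, msip; };  // pending AND enabled sources

// Performs the trap entry steps and returns the new PC.
std::uint32_t trap_entry(CsrState& csr, std::uint32_t pc, Pending p) {
    assert(p.meip || p.mtip || p.msip);

    csr.mepc = pc;  // save the current PC

    // Priority as stated in this chapter: external > timer > software.
    const std::uint32_t cause = p.meip ? CAUSE_MEI : (p.mtip ? CAUSE_MTI : CAUSE_MSI);
    csr.mcause = MCAUSE_INTERRUPT | cause;

    // Copy MIE into MPIE, then clear MIE to disable interrupts globally.
    const std::uint32_t mpie = (csr.mstatus & MSTATUS_MIE) ? MSTATUS_MPIE : 0u;
    csr.mstatus = (csr.mstatus & ~(MSTATUS_MIE | MSTATUS_MPIE)) | mpie;

    return csr.mtvec;  // assumption: direct mode, MTVEC is the handler address
}

// Restores MIE from MPIE and returns the PC to continue at.
std::uint32_t mret(CsrState& csr) {
    const std::uint32_t mie = (csr.mstatus & MSTATUS_MPIE) ? MSTATUS_MIE : 0u;
    csr.mstatus = (csr.mstatus & ~MSTATUS_MIE) | mie;
    return csr.mepc;
}
```

A timer interrupt taken at PC 0x80 with MTVEC set to 0x100 leaves 0x80 in MEPC and 0x80000007 in MCAUSE; a later `mret` returns 0x80 and re-enables interrupts.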
Note
The controller checks for a pending interrupt condition in several states: during the instruction_finished() routine as well as in the IPORT_STB and AWAIT_IPORT_ACK states. If an interrupt is pending in any of these, the state machine immediately transitions to STATE_TRAP_ENTRY to initiate the trap handling sequence.
Note
While load and store procedures are essentially the same, the load procedure requires an additional state to
avoid erroneous behavior when the target register and the source register (which holds the target address) are the same.
The solution is to delay the signals that allow the regfile to load a new value (c_reg_ldmem, c_reg_ld_en)
and to enable them only after the bus transaction is complete and the data present at the regfile input is valid.
There is potential for skipping states when certain conditions are met; this has not been explored in this implementation and is subject to future optimization.
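The load sequence and the delayed register-file enable can be sketched as a small transition function. The state names used here (`WaitIdle`, `Strobe`, `AwaitAck`, `WriteBack`, `Done`) are hypothetical stand-ins for the controller's load states, and the sketch models only the transitions and the point at which the regfile may load, not the remaining control signals.

```cpp
#include <cassert>

// Hypothetical state names for this sketch only.
enum class LoadState { WaitIdle, Strobe, AwaitAck, WriteBack, Done };

// Next-state function of the load sequence, driven by the data port acknowledge.
LoadState next_load_state(LoadState state, bool dport_ack) {
    switch (state) {
    case LoadState::WaitIdle:   // wait until no transaction is in flight
        return dport_ack ? LoadState::WaitIdle : LoadState::Strobe;
    case LoadState::Strobe:     // strobe set, address calculated, PC incremented
        return LoadState::AwaitAck;
    case LoadState::AwaitAck:   // wait for the acknowledge of this transaction
        return dport_ack ? LoadState::WriteBack : LoadState::AwaitAck;
    case LoadState::WriteBack:  // extra state: data is valid, regfile may load
        return LoadState::Done;
    case LoadState::Done:
        return LoadState::Done;
    }
    return state;
}

// c_reg_ldmem and c_reg_ld_en are only asserted once the transaction is complete.
bool regfile_load_enabled(LoadState state) {
    return state == LoadState::WriteBack;
}
```

Because the register file is written only in the final step, the source register still holds the target address while the address is being calculated, even if it is also the destination register.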
Branching
Because the BRANCH and DONT_BRANCH transition edges would clutter the diagram unnecessarily, their conditions
are detailed in the following table.
Note
The table is meant to be read such that if the condition in the “Condition” column is met, the controller transitions
into the BRANCH state; otherwise it transitions into the DONT_BRANCH state.
Branch Type and opcode |
Condition |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
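As an illustration of how such a decision can be derived from the ALU status signals, the following C++ sketch maps the funct3 field of a branch instruction to a taken/not-taken result. The function name `branch_taken` is hypothetical; the funct3 encodings are the standard RV32I ones, and the three flags correspond to s_alu_equal_in, s_alu_less_in and s_alu_lessu_in. It is a sketch of the principle, not a transcription of the controller's conditions.

```cpp
#include <cassert>
#include <cstdint>

// funct3 encodings of the RV32I branch instructions.
enum BranchFunct3 : std::uint8_t {
    BEQ = 0b000, BNE = 0b001, BLT = 0b100, BGE = 0b101, BLTU = 0b110, BGEU = 0b111
};

// Hypothetical helper: true selects BRANCH, false selects DONT_BRANCH.
bool branch_taken(std::uint8_t funct3, bool alu_equal, bool alu_less, bool alu_lessu) {
    switch (funct3) {
    case BEQ:  return alu_equal;
    case BNE:  return !alu_equal;
    case BLT:  return alu_less;    // signed comparison
    case BGE:  return !alu_less;
    case BLTU: return alu_lessu;   // unsigned comparison
    case BGEU: return !alu_lessu;
    default:   return false;       // reserved funct3 values never branch here
    }
}
```

For operands A = -1 and B = 1, the ALU would report less = 1 and lessu = 0, so blt branches while bltu does not.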
A-Extension
The A-Extension (atomic instructions) is handled by a dedicated set of controller states. These states coordinate the execution of atomic memory operations (AMOs), as well as load-reserved (LR) and store-conditional (SC) instructions.
Even though the reference nucleus does not support multi-core systems, the A-Extension is implemented to allow for future expansion.
For AMOs in general, the controller first performs a load from memory, then
executes the specified atomic operation in the ALU, and finally writes the
result back to memory. To ensure that all possible register combinations are
supported, the PicoNut has to keep the old value in the rdata register before writing
it to the rd register. To support multicore systems in the future, a dport_amo
flag is set to indicate that the current transaction is an AMO. A membrana module
has to guarantee that no other core can access the same memory address during
that time.
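Why the old value has to be buffered can be seen in a toy model of amoadd.w. The `Core` struct, its tiny word-indexed memory and the function `amoadd_w` are hypothetical and exist only for this sketch; the order of the four steps follows the description above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model for this sketch only, not the nucleus' data path.
struct Core {
    std::array<std::uint32_t, 32> regs{};
    std::array<std::uint32_t, 16> mem{};  // 16 words of memory
    std::uint32_t rdata = 0;              // models the rdata register
    std::uint32_t wdata = 0;              // models the wdata register
};

// amoadd.w rd, rs2, (rs1)
void amoadd_w(Core& c, unsigned rd, unsigned rs1, unsigned rs2) {
    const std::uint32_t index = c.regs[rs1] / 4;  // byte address to word index
    assert(index < c.mem.size());

    c.rdata = c.mem[index];             // 1. AMO load: old value is buffered
    c.wdata = c.rdata + c.regs[rs2];    // 2. ALU operation, rs2 still unmodified
    c.mem[index] = c.wdata;             // 3. AMO store
    if (rd != 0) {
        c.regs[rd] = c.rdata;           // 4. write the old value to rd last
    }
}
```

If rd and rs2 are the same register, writing rd before step 2 would destroy the operand. With the old value held in rdata, `amoadd.w x6, x6, (x5)` on a memory word containing 32 and x6 = 10 stores 42 and leaves 32 in x6.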
For LR/SC instructions, the controller indicates via the dport_lrsc signal that
a load-reserved or store-conditional operation is being performed. For a
store-conditional operation, the membrana has to return ‘0’ over rdata if the
store succeeded (that is, the reservation made by the load-reserved operation
was still valid), or ‘1’ if it did not. The controller
then writes this result to the rd register.
Note
Section 13.3 (LR/SC Eventual Success) of the unprivileged RISC-V ISA specification (20250508) explains under which conditions a store-conditional operation is successful.
Two notes on this:
The PicoNut does not enforce the limit of 16 instructions between the LR and SC instructions, as this is not mandatory but only an optimization for caching.
In line with the spec, interrupts do not invalidate reservations. Note, however, that software can invalidate a reservation during a preemptive context switch by performing a store-conditional to a dummy location.
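The expected behaviour of a membrana with respect to LR/SC, including the dummy store-conditional mentioned above, can be modelled in a few lines of C++. The class `ToyMembrana` is hypothetical and tracks a single reservation for a single core; a real membrana may define its reservation set differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical single-core model for this sketch only.
class ToyMembrana {
public:
    // LR: return the word and remember the reserved address.
    std::uint32_t load_reserved(std::uint32_t addr) {
        reservation_ = addr;
        return mem_[addr];
    }

    // SC: returns 0 on success and 1 on failure, as delivered over rdata.
    std::uint32_t store_conditional(std::uint32_t addr, std::uint32_t value) {
        const bool ok = reservation_.has_value() && *reservation_ == addr;
        reservation_.reset();  // every SC invalidates the reservation
        if (ok) {
            mem_[addr] = value;
        }
        return ok ? 0u : 1u;
    }

    std::uint32_t load(std::uint32_t addr) { return mem_[addr]; }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> mem_;
    std::optional<std::uint32_t> reservation_;
};
```

In this model, a store-conditional that follows a load-reserved to the same address returns 0. If a context switch issues a store-conditional to a dummy address in between, that dummy access fails and clears the reservation, so the interrupted sequence's own store-conditional returns 1 and has to be retried.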
The following table describes each state and its function:
| State name | Description |
|---|---|
| | Waits until the data port acknowledge signal is low, ensuring no ongoing transaction. Proceeds to |
| | Initiates a LOAD transaction by setting the data port strobe signal high and asserts |
| | Awaits data port acknowledge signal. Upon acknowledgment, transitions to |
| | Waits until the data port acknowledge signal is low. Proceeds to |
| | Initiates a STORE transaction by setting the data port strobe signal high and asserts |
| | Awaits data port acknowledge signal. Upon acknowledgment, transitions to |
| | Waits until the data port acknowledge signal is low. Proceeds to |
| | Initiates an AMO LOAD transaction by setting the data port strobe signal high. Advances to |
| | Awaits data port acknowledge signal. Upon acknowledgment, transitions to |
| | Executes the AMO operation in the ALU. The result is written directly to the wdata register. Proceeds to |
| | Waits until the data port acknowledge signal is low. Proceeds to |
| | Initiates an AMO STORE transaction by setting the data port strobe signal high. Advances to |
| | Awaits data port acknowledge signal. Upon acknowledgment, transitions to |
| | Shared state for all A-Extension instructions. Loads the prestored data (AMO), the loaded data (LR), or the status (SC) from the rdata register into the |
CSR Master
Fig. 14 CSR-Master module symbol
-
SC_MODULE(m_csr_master)
The CSR master module is an interface between the controller and the CSR-bus. It helps generate the correct write data on the CSR-bus (csr_bus_wdata_out). The csr_bus_wdata_out signal can be based either on a general-purpose register (GPR) or on an immediate value. By default the input register is used as the source. To use an immediate value, imm_en_in must be set to 1 and the desired immediate value must be applied at imm_in. The immediate value is zero-extended to match PN_CFG_CSR_BUS_DATA_WIDTH.
There are three write modes selected by the write_mode signal:
Write: This mode sets csr_bus_wdata_out to the source value.
Set: In this mode, csr_bus_wdata_out is set to the source value and OR-masked with the value of csr_bus_rdata_in.
Clear: In this mode, csr_bus_wdata_out is set to the source value and AND-masked with the value of csr_bus_rdata_in.
Write mode signal decoding:
| Signal | Meaning |
|---|---|
| 00 | Write |
| 01 | Set (or-mask) |
| 10 | Clear (and-mask) |
| 11 | Reserved |
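The write data generation can be sketched as a pure function. This is an illustration under assumptions rather than the module's code: the name `csr_wdata` is hypothetical, the bus width is taken to be 32 bits, and the Clear case assumes that the source is inverted before the AND, as the semantics of the RISC-V csrrc/csrrci instructions require (the description above only speaks of an AND-mask).

```cpp
#include <cassert>
#include <cstdint>

// Encoding taken from the write mode table; 0b11 is reserved.
enum class CsrWriteMode : std::uint8_t { Write = 0b00, Set = 0b01, Clear = 0b10 };

// Hypothetical helper, assuming a 32-bit CSR-bus.
std::uint32_t csr_wdata(std::uint32_t source_reg, bool imm_en, std::uint8_t imm,
                        CsrWriteMode mode, std::uint32_t csr_rdata) {
    // The 5-bit immediate is zero-extended; otherwise the register is the source.
    const std::uint32_t source =
        imm_en ? static_cast<std::uint32_t>(imm & 0x1Fu) : source_reg;

    switch (mode) {
    case CsrWriteMode::Write:
        return source;
    case CsrWriteMode::Set:
        return csr_rdata | source;   // set the bits given in source
    case CsrWriteMode::Clear:
        return csr_rdata & ~source;  // assumption: csrrc semantics (clear those bits)
    }
    return source;  // reserved mode: treated like Write in this sketch
}
```

With a current CSR value of 0xFF, clearing with the immediate 8 yields 0xF7, and setting 0xF0 on a CSR value of 0x0F yields 0xFF.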
- Ports:
- Parameters:
csr_bus_rdata_in – [in] <PN_CFG_CSR_BUS_DATA_WIDTH> Csr bus read data.
source_reg_in – [in] <32> Source register, that is written to the csr bus.
imm_en_in – [in] Enable signal for immediate value generation.
imm_in – [in] <5> Immediate value.
write_mode_in – [in] <2> Write mode of the next write operation to the csr bus.
csr_bus_wdata_out – [out] <PN_CFG_CSR_BUS_DATA_WIDTH> Csr bus write data.
CSR
Fig. 15 CSR module symbol
-
SC_MODULE(m_csr)
This module implements the CSRs that take effect within the nucleus itself, for example for debugging or processor status. The registers are connected to the CSR-bus for basic read/write operations. The read/write protection, if it exists, is implemented separately for each register in its own CThread.
CSRs present in this module:
| Address | Name |
|---|---|
| 0x300 | mstatus |
| 0x301 | misa |
| 0x304 | mie |
| 0x305 | mtvec |
| 0x341 | mepc |
| 0x342 | mcause |
| 0x343 | mtval |
| 0x344 | mip |
| 0x7b0 | dcsr |
| 0x7b1 | dpc |
| 0x7b2 | dscratch0 |
| 0x7b3 | dscratch1 |
Supported interrupt sources:
MSIP: Machine Software Interrupt (via CLINT)
MTIP: Machine Timer Interrupt (via CLINT)
MEIP: Machine External Interrupt (platform-specific)
- Interrupt Handling:
This module implements RISC-V machine-mode interrupt handling according to the specification.
MIE (0x304): Machine Interrupt Enable register controls which interrupt types are enabled
MTVEC (0x305): Machine Trap Vector register contains the base address of the interrupt handler
MEPC (0x341): Machine Exception Program Counter stores the return address for exceptions/interrupts
MCAUSE (0x342): Machine Cause register identifies the exception/interrupt type
MTVAL (0x343): Machine Trap Value register provides additional trap information
MIP (0x344): Machine Interrupt Pending register shows pending interrupt sources
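How these registers combine into a single pending indication can be sketched as follows. The bit positions are those of the RISC-V privileged specification; the function name `interrupt_pending` is hypothetical, and whether the module's interrupt_pending_out already includes the global MSTATUS.MIE gate is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions shared by mip and mie (RISC-V privileged specification).
constexpr std::uint32_t IRQ_MSI = 1u << 3;   // machine software interrupt
constexpr std::uint32_t IRQ_MTI = 1u << 7;   // machine timer interrupt
constexpr std::uint32_t IRQ_MEI = 1u << 11;  // machine external interrupt
constexpr std::uint32_t MSTATUS_MIE_BIT = 1u << 3;  // global interrupt enable

// Hypothetical helper: true if at least one enabled machine interrupt is pending
// while interrupts are globally enabled.
bool interrupt_pending(std::uint32_t mstatus, std::uint32_t mie, std::uint32_t mip) {
    const std::uint32_t machine_irqs = IRQ_MSI | IRQ_MTI | IRQ_MEI;
    const bool globally_enabled = (mstatus & MSTATUS_MIE_BIT) != 0u;
    const bool source_active = (mie & mip & machine_irqs) != 0u;
    return globally_enabled && source_active;
}
```

A pending timer interrupt (MIP.MTIP set) is therefore only reported when both MIE.MTIE and MSTATUS.MIE are set.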
- Ports:
- Parameters:
clk – [in] Clock of the module.
reset – [in] Reset of the module.
csr_bus_read_en_in – [in] CSR-bus read enable.
csr_bus_write_en_in – [in] CSR-bus write enable.
csr_bus_adr_in – [in] <PN_CFG_CSR_BUS_ADR_WIDTH> CSR-bus address.
csr_bus_wdata_in – [in] <PN_CFG_CSR_BUS_DATA_WIDTH> CSR-bus write data.
csr_bus_rdata_out – [out] <PN_CFG_CSR_BUS_DATA_WIDTH> CSR-bus read data.
pc_in – [in] <32> Program counter.
debug_level_enter_ebreak_in – [in] Debug level enter request caused by ebreak.
debug_level_enter_haltrequest_in – [in] Debug level enter request caused by halt request.
debug_level_enter_step_in – [in] Debug level enter request caused by step.
debug_level_leave_in – [in] Debug level leave request.
dpc_out – [out] <32> Full CSR Debug-Program-Counter.
debug_level_enter_out – [out] Debug level enter request.
debug_step_out – [out] Debug step signal.
msip_in – [in] Machine Software Interrupt Pending from CLINT.
mtip_in – [in] Machine Timer Interrupt Pending from CLINT.
meip_in – [in] Machine External Interrupt Pending (platform-specific).
mret_in – [in] Machine Return instruction execution signal.
interrupt_in – [in] General interrupt signal.
interrupt_pending_out – [out] Combined interrupt pending output signal.
mip_msip_out – [out] Machine Software Interrupt Pending output.
mie_msie_out – [out] Machine Software Interrupt Enable output.
mstatus_mie_out – [out] Machine Status Global Interrupt Enable output.
mtvec_trap_address_out – [out] Machine Trap Vector address output.
mip_mtip_out – [out] Machine Timer Interrupt Pending output.
mip_meip_out – [out] Machine External Interrupt Pending output.
mie_mtie_out – [out] Machine Timer Interrupt Enable output.
mie_meie_out – [out] Machine External Interrupt Enable output.
mepc_out – [out] Machine Exception Program Counter output.