2. Hardware Overview

The hardware is structured in three hierarchical levels:

The System level
The IP Core level
The Engine Core level

2.1. System Level

2.1.1. Description

The figure gives an overview of the system level of the project. The accelerator starts with a host PC that passes input data, weights, and configuration to the service processor over the Ethernet interface. The service processor runs C and Python scripts on an installed PetaLinux to drive the accelerator over a direct connection to the THANNA IP core. Data such as parameters, inputs, and outputs are stored in and loaded from the external DDR memory. The programmable THANNA IP core also has direct access to the memory, which enables the exchange of large amounts of data. Because the CPU and the IP core use separate bus connections to the DDR, each bus can perform either a read or a write access at any given time, but not both. Cache coherency is enabled to synchronize CPU and IP core accesses to the DDR and to avoid data loss. The components on the Zybo Z7-20 are connected via 32-bit data buses, which means that each bus can transfer 4 bytes at a time.
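To make the bus-width figure concrete, the following minimal Python sketch computes how many 32-bit bus transfers a payload of a given size requires. The names `BUS_WIDTH_BYTES` and `transfers_needed` are illustrative and not part of the project, and the sketch ignores burst handling and protocol overhead.

```python
BUS_WIDTH_BITS = 32
BUS_WIDTH_BYTES = BUS_WIDTH_BITS // 8  # 4 bytes per transfer


def transfers_needed(payload_bytes: int) -> int:
    """Return the number of bus transfers needed to move payload_bytes."""
    if payload_bytes < 0:
        raise ValueError("payload size must be non-negative")
    # Ceiling division: a partially filled last word still costs one transfer.
    return -(-payload_bytes // BUS_WIDTH_BYTES)
```

For example, nine 32-bit weights (36 bytes) need `transfers_needed(36)` = 9 transfers, while a 10-byte payload needs 3 because the last word is only partially filled.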

2.1.2. Block Design

The hardware on the FPGA board is connected via the block design, which can be opened and adapted with the Vivado editor. If persistent changes are made to the block design file in Vivado, copy the file from out/system/design_1.bd to board_files/<<name_of_fpga_device>>/design_1.bd.
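The copy step could be scripted roughly as follows. This is a sketch under assumptions: the helper `persist_block_design` is hypothetical, `project_root` is assumed to be the directory that contains both out/ and board_files/, and the device folder name is passed in by the caller because it depends on the board in use.

```python
import shutil
from pathlib import Path


def persist_block_design(project_root: Path, device_name: str) -> Path:
    """Copy the generated block design into the board_files folder of a device."""
    source = project_root / "out" / "system" / "design_1.bd"
    target_dir = project_root / "board_files" / device_name
    if not source.is_file():
        raise FileNotFoundError(f"block design not found: {source}")
    # Refuse to create a new device folder so that a mistyped name is noticed.
    if not target_dir.is_dir():
        raise NotADirectoryError(f"unknown device folder: {target_dir}")
    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / "design_1.bd"))
```

The function returns the path of the copied file, which makes it easy to log or verify the result.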

With the block design editor in Vivado one can:

integrate new peripheral components or IP cores
configure the current IP Core through parameters
generate the bitstream to program the FPGA board

2.2. IP Core Level

2.2.1. Description

The IP Core level describes the data interfaces between the peripherals and the main engines. The following diagram presents an overview of the THANNA_IP_CORE:

The THANNA IP core and the service processor are directly connected over the AXI_SLAVE component. Both also share access to the external DDR memory over the AXI_MASTER and MASTER_CONTROLLER. The data flow and the command flow are kept strictly separated at the boundary between the IP core and the host processor, since they have different requirements. The data flow includes the inputs, intermediate results, and parameters. The command flow consists of instructions, responses, and debug information. Within the IP core, the SLAVE_CONTROLLER module distributes the command flow to the right engine. The number and types of engines deployed to the FPGA are fully customizable, as long as they are connected correctly and follow the rules of the interfaces.
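The command distribution can be pictured with a small software model. The sketch below is only an illustration: the `Command` fields, the `Engine` class, and the `SlaveControllerModel` class are hypothetical and do not reflect the actual command format or register layout of the IP core.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    engine_id: int  # hypothetical: index of the target engine core
    opcode: int     # hypothetical: engine-specific instruction code


@dataclass
class Engine:
    name: str
    received: list = field(default_factory=list)

    def execute(self, command: Command) -> str:
        # Record the opcode and return a response for the command flow.
        self.received.append(command.opcode)
        return f"{self.name}: executed opcode {command.opcode}"


class SlaveControllerModel:
    """Software stand-in for the command distribution of SLAVE_CONTROLLER."""

    def __init__(self, engines: dict):
        self.engines = engines  # maps engine_id to an Engine

    def dispatch(self, command: Command) -> str:
        engine = self.engines.get(command.engine_id)
        if engine is None:
            return f"error: no engine with id {command.engine_id}"
        return engine.execute(command)
```

With `controller = SlaveControllerModel({0: Engine("AXI_TEST"), 1: Engine("SINGLE_ENGINE_CORE")})`, the call `controller.dispatch(Command(engine_id=0, opcode=2))` reaches only the first engine. Adding another engine type amounts to adding one more entry to the dictionary, which mirrors the customizable engine set described above.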

2.2.2. Modules

MAIN: Top-level file for deployment on the FPGA device
MAIN_SIM: Top-level file for the simulation on the FPGA device
AXI_TEST: Example engine core that tests the AXI interfaces
SLAVE_CONTROLLER: Command transfer logic
AXI_SLAVE: Vivado AXI interface module to the host CPU
MASTER_CONTROLLER: Write / Read and engine core DDR access prioritization
AXI_MASTER: Vivado AXI interface module to the DDR

2.2.3. Interfaces

Hardware Interfaces:

SLAVE_CONTROLLER_INTERFACE: Command transfer logic
MASTER_CONTROLLER_INTERFACE: Write / Read and engine core DDR access prioritization

Software Interfaces:

MASTER_CONTROLLER_INTERFACE: Interface to the host CPU, which includes command handshakes and core targeting
AXI_TEST_INTERFACE: Interface to the host CPU that explains all provided command types

2.3. Engine Core Level

2.3.1. Description

Each engine core can receive a custom set of control commands from the host CPU. It can also read data from and write data to the external memory. Beyond that, it is free to do anything with the data and commands.

The AXI_TEST engine core serves as a demo engine core that tests the interface functionality. It accepts essentially three control commands:

read data from the DDR
increment the data
write the data back
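The three commands can be mimicked in software to see the expected effect on memory. The following behavioral sketch rests on several assumptions that the text does not state: data is treated as little-endian unsigned 32-bit words, "increment" adds one to each word with wraparound, and the class `AxiTestModel` with its `address` and `word_count` parameters is hypothetical.

```python
import struct


class AxiTestModel:
    """Behavioral model of the three AXI_TEST commands on 32-bit words."""

    def __init__(self, ddr: bytearray):
        self.ddr = ddr    # stands in for the external DDR memory
        self.buffer = []  # words currently held inside the engine core

    def read(self, address: int, word_count: int) -> None:
        # Command 1: load word_count words from the DDR into the engine.
        raw = self.ddr[address:address + 4 * word_count]
        self.buffer = list(struct.unpack(f"<{word_count}I", raw))

    def increment(self) -> None:
        # Command 2: add one to every word, wrapping at 32 bits (assumed).
        self.buffer = [(word + 1) & 0xFFFFFFFF for word in self.buffer]

    def write_back(self, address: int) -> None:
        # Command 3: store the buffered words back into the DDR.
        raw = struct.pack(f"<{len(self.buffer)}I", *self.buffer)
        self.ddr[address:address + len(raw)] = raw
```

Running `read`, `increment`, and `write_back` in sequence on the same address leaves every affected word increased by one, which is a simple check that both the read and the write path work.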

The SINGLE_ENGINE_CORE is intended to be a lightweight but scalable engine core that can compute all kinds of layers within a neural network. The currently supported layers are:

convolutional layers
fully connected layers

In addition it can perform:

a ReLU activation function
max pooling
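For reference, the listed operations can be written down as a plain Python software model. This sketch only illustrates the mathematics of each operation; it says nothing about how the SINGLE_ENGINE_CORE implements them in hardware. The function names, the single-channel layout, the stride of one, the absence of padding, and the default pooling window of 2 are all assumptions made for the example.

```python
def relu(values):
    """Clamp negative values to zero."""
    return [max(0.0, v) for v in values]


def fully_connected(inputs, weights, biases):
    """Dense layer: weights[j][i] connects input i to output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def conv2d(image, kernel):
    """Single-channel 2D convolution layer (cross-correlation), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]
```

A model like this can serve as a golden reference: the same inputs and weights are fed to the software functions and to the accelerator, and the two outputs are compared.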