2. Hardware Overview

The hardware can be structured in three hierarchical levels.

  • The System level

  • The IP Core level

  • The Engine Core level

2.1. System Level

2.1.1. Description

The figure gives an overview of the overall system level of the project. The accelerator starts with a host PC, which passes input data, weights, and configuration to the service processor over the Ethernet interface. The service processor runs C and Python scripts on an installed PetaLinux to drive the accelerator over a direct connection to the THANNA IP core. Data such as parameters, inputs, and outputs are stored in and loaded from the external DDR memory. The programmable THANNA IP core also has direct access to the memory, which enables the exchange of large amounts of data. Because the CPU and the IP core have separate bus connections to the DDR, each bus can perform either a read or a write access at a time, but not both simultaneously. Cache coherency is enabled to synchronize CPU and IP core accesses to the DDR and avoid data loss. The components on the Zybo Z7-20 are connected via 32-bit data buses, which means that each bus is able to transfer 4 bytes at a time.
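Since each 32-bit bus moves 4 bytes per transfer, host-side scripts that stage data in DDR typically pack buffers into word-aligned 32-bit chunks. A minimal sketch of such packing (the helper name is hypothetical, not part of the project):

```python
import struct

def pack_words(data: bytes) -> list:
    """Pad a byte buffer to a 4-byte boundary and split it into
    little-endian 32-bit words, matching the 32-bit bus width."""
    padded = data + b"\x00" * (-len(data) % 4)
    return [struct.unpack_from("<I", padded, i)[0]
            for i in range(0, len(padded), 4)]

# 5 bytes become two bus words; the last word is zero-padded.
words = pack_words(b"\x01\x02\x03\x04\x05")
```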

2.1.2. Block Design

The hardware on the FPGA board is connected via the block design, which can be opened and adapted with the Vivado editor. If persistent changes to the block design file are made in Vivado, copy the file from out/system/design_1.bd to board_files/<<name_of_fpga_device>>/design_1.bd.
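The copy step above can be scripted; a sketch following the paths described in this section (the function name and the device-directory argument are illustrative, not part of the project):

```python
from pathlib import Path
import shutil

def persist_block_design(repo_root, device_name):
    """Copy the generated block design from the build output back into
    the board files so the change survives a rebuild."""
    src = Path(repo_root) / "out" / "system" / "design_1.bd"
    dst = Path(repo_root) / "board_files" / device_name / "design_1.bd"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst)
    return dst
```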

With the block design editor in Vivado one can:

  • integrate new peripheral components or IP cores

  • configure the current IP Core through parameters

  • generate the bitstream to program the FPGA board

2.2. IP Core Level

2.2.1. Description

The IP Core level describes the data interfaces between the peripheral components and the main engines. The following diagram presents an overview of the THANNA_IP_CORE:

The THANNA IP Core and the service processor are directly connected over the AXI_SLAVE component. Both also share access to the external DDR memory via the AXI_MASTER and MASTER_CONTROLLER. The data flow and the command flow are kept strictly separated at the intersection of the IP core and the host processor, since they have different requirements. The data flow includes the inputs, intermediate results, and parameters. The command flow consists of instructions, responses, and debug information. Within the IP core, the command flow is distributed to the right engine by the SLAVE_CONTROLLER module. The number and types of engines deployed to the FPGA are fully customizable, as long as they are connected correctly and follow the rules of the interfaces.
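One way to picture how a command word can carry an engine target so the SLAVE_CONTROLLER can route it: pack the target and opcode into fixed bit fields. The field layout below is purely hypothetical, chosen only to illustrate the idea:

```python
def encode_command(engine_id: int, opcode: int, payload: int) -> int:
    """Pack a 32-bit command word: engine target in the top byte,
    opcode in the next byte, payload in the lower 16 bits.
    (Hypothetical layout, not the real THANNA command format.)"""
    assert 0 <= engine_id < 256 and 0 <= opcode < 256 and 0 <= payload < (1 << 16)
    return (engine_id << 24) | (opcode << 16) | payload

def decode_command(word: int):
    """Split a command word back into (engine_id, opcode, payload)."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, word & 0xFFFF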

2.2.2. Modules

  • MAIN: Top level file for deployment on the FPGA device

  • MAIN_SIM: Top level file for simulating the design

  • AXI_TEST: Example engine core that tests the AXI interfaces

  • SLAVE_CONTROLLER: Command transfer logic

  • AXI_SLAVE: Vivado AXI interface module to the host-cpu

  • MASTER_CONTROLLER: Write / Read and engine core DDR access prioritization

  • AXI_MASTER: Vivado AXI interface module to the DDR
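The DDR access prioritization performed by the MASTER_CONTROLLER can be pictured as a fixed-priority arbiter over the engine cores' requests. The real hardware policy may differ; this is only an illustrative sketch:

```python
def arbitrate(requests):
    """Grant the lowest-numbered engine core that is requesting DDR
    access; return None when no core requests.
    (Illustrative fixed-priority policy, not the verified hardware.)"""
    for core_id, requesting in enumerate(requests):
        if requesting:
            return core_id
    return None
```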

2.2.3. Interfaces

Hardware Interfaces:

  • SLAVE_CONTROLLER_INTERFACE: Command transfer between the SLAVE_CONTROLLER and the engine cores

  • MASTER_CONTROLLER_INTERFACE: Write / Read and engine core DDR access prioritization

Software Interfaces:

  • MASTER_CONTROLLER_INTERFACE: Interface to the host CPU which includes command handshakes and core targeting

  • AXI_TEST_INTERFACE: Interface to the host CPU that explains all provided command types

2.3. The Engine Core Level

2.3.1. Description

Each engine core is able to receive a custom set of control commands from the host CPU and can receive data from and send data to the external memory. Beyond that, it is free to do anything with the data and commands.

The AXI_TEST engine core serves as a demo engine core that tests the interface functionality. It can receive essentially three control commands:

  • read data from the DDR

  • increment the data

  • write the data back
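The three commands form a read-increment-write round trip. A software model of that behavior, with a bytearray standing in for the DDR (class and register names are simplified assumptions, not the hardware implementation):

```python
import struct

class AxiTestModel:
    """Behavioral model of the AXI_TEST engine's three commands.
    A bytearray plays the role of the external DDR memory."""

    def __init__(self, ddr: bytearray):
        self.ddr = ddr
        self.reg = 0  # internal data register (assumed)

    def cmd_read(self, addr: int):
        """Read one 32-bit word from 'DDR' into the register."""
        self.reg = struct.unpack_from("<I", self.ddr, addr)[0]

    def cmd_increment(self):
        """Increment the register, wrapping at 32 bits."""
        self.reg = (self.reg + 1) & 0xFFFFFFFF

    def cmd_write(self, addr: int):
        """Write the register back to 'DDR'."""
        struct.pack_into("<I", self.ddr, addr, self.reg)
```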

The SINGLE_ENGINE_CORE is supposed to be a lightweight but scalable engine core that can compute all kinds of layers within a neural network. The currently supported layers are:

  • convolutional layers

  • fully connected layers

In addition it can perform:

  • a ReLU activation function

  • max pooling
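For reference, the four supported operations computed in plain Python. This is a functional sketch of what the engine core computes, not its hardware implementation; convolution is written as cross-correlation without padding, as is common in neural-network accelerators:

```python
def relu(xs):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in xs]

def fully_connected(x, weights, bias):
    """Dense layer: one dot product plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def conv2d_valid(img, kernel):
    """2-D convolution (cross-correlation) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def maxpool2x2(img):
    """2x2 max pooling with stride 2."""
    return [[max(img[i][j], img[i][j + 1],
                 img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]
```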