



## Leveraging Open-Source Frameworks in Commercial FPGA Development: A Case Study with SpinalHDL

#### Krzysztof Czyż, PhD

CTO @ embevity krzysztof.czyz@embevity.com

#### Mateusz Maciąg

architect @ embevity mateusz.maciag@embevity.com

- How to improve FPGA design services?
- SpinalHDL: a tool that speeds up gateware development.
- VexRiscv/NaxRiscv: a highly configurable soft core.
- Pros and cons of using SpinalHDL in commercial projects.





#### Embevity: what do we do?









Original idea → design, prototyping and testing → product





The SLVS-to-CSI bridge

How fast could you implement it?

#### SpinalHDL

- high-level hardware description language
- hosted on top of Scala
- focused on RTL description
- interoperable with existing tools:
  - it generates VHDL/Verilog files (as an output netlist)
  - it can integrate VHDL/Verilog IP as a blackbox
- open source, started in December 2014 by Charles Papon
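
The interoperability points above can be sketched in a few lines. This is a minimal illustration, not code from the project: `LegacyFilter`, `legacy_filter` and `Top` are hypothetical names, and the wrapped Verilog module is assumed to exist elsewhere.

```scala
import spinal.core._

// Hypothetical wrapper around an existing Verilog module named "legacy_filter".
class LegacyFilter extends BlackBox {
  val io = new Bundle {
    val clk  = in Bool()
    val din  = in UInt(8 bits)
    val dout = out UInt(8 bits)
  }
  noIoPrefix()                        // port names match the Verilog module (no "io_" prefix)
  mapCurrentClockDomain(io.clk)       // drive clk from the enclosing clock domain
  setDefinitionName("legacy_filter")  // name of the external module to instantiate
}

class Top extends Component {
  val io = new Bundle {
    val din  = in UInt(8 bits)
    val dout = out UInt(8 bits)
  }
  val filter = new LegacyFilter
  filter.io.din := io.din
  io.dout := RegNext(filter.io.dout) init(0)  // register the blackbox output
}

object TopVerilog extends App {
  SpinalVerilog(new Top)  // emits Verilog; SpinalVhdl(new Top) emits VHDL instead
}
```

The generated file instantiates `legacy_filter` by name, so the original Verilog source only has to be added to the vendor project alongside it.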






#### SpinalHDL – basic concepts



#### Registers and state machines






#### Flows and streams

```
val io = new Bundle {
  val userRx = master Flow (Bits(spiWordBits))
  val userTx = slave Stream (Bits(spiWordBits))
}
// Default: nothing is pushed on the Flow unless a state says otherwise
io.userRx.setIdle()
val user: State = new State {
  whenIsActive {
    when(spiDev.io.rx.valid) {
      io.userRx << spiDev.io.rx
// ...
// Drop TX words while discarding; pass them on only in the `user` state
io.userTx
  .throwWhen(spiTxDiscarding)
  .continueWhen(fsm.isActive(fsm.user)) >>
spiDev.io.tx
spiDev.io.txBusEnable := spiMisoAllowed
// ...
```
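
A self-contained variant of the same idea, as a sketch only: `GatedLink` and its port names are made up for illustration. A `Stream` carries `valid`, `ready` and `payload` and can be back-pressured; a `Flow` carries only `valid` and `payload`.

```scala
import spinal.core._
import spinal.lib._

// Hypothetical component: forwards words only while `enable` is high
// and silently drops them while `discard` is high.
class GatedLink extends Component {
  val io = new Bundle {
    val enable  = in Bool()
    val discard = in Bool()
    val input   = slave Stream (Bits(8 bits))   // valid/ready/payload
    val output  = master Stream (Bits(8 bits))
    val monitor = master Flow (Bits(8 bits))    // valid/payload, no back-pressure
  }

  io.input
    .throwWhen(io.discard)       // consume and drop transactions while discard is set
    .continueWhen(io.enable) >>  // otherwise stall the source until enable is set
    io.output

  // Mirror every transferred word onto the Flow.
  io.monitor.valid   := io.output.fire
  io.monitor.payload := io.output.payload
}
```

Each chained call returns a new `Stream`, so the gating logic reads top to bottom without any hand-written handshake signals.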







#### The redesign of the optical interrogator








#### Situation overview:

- FPGAs used so far for data transfer only (< 5% usage)
- the entire computational load falls on the central unit
- hundreds of units manufactured
- client decision: offload expensive computations to FPGAs on the cards





The signal processing algorithm, by its nature, requires a soft-core CPU.

Because resources are very limited, a *highly* configurable implementation that can be fine-tuned is needed.



#### **Chosen solution**

*VexRiscv* – a SpinalHDL implementation of the RISC-V architecture.

#### VexRiscv / NaxRiscv – the frosting on the SpinalHDL cake

- RV32I[M][A][F][D][C] instruction set (bracketed extensions are optional), pipeline from 2 to 5+ stages
- Plugin-based design
- Interface agnostic (AXI4, Avalon, Wishbone)
- Tested with Linux, Zephyr OS, FreeRTOS

### Custom CPU construction

The entire CPU configuration is done in a single place

Each plugin provides many options to finely tune the implementation.

**Examples:**

- Set non-cacheable address range
- Instruction decoding details
- ALU implementation details
- Full barrel shifter vs. simple shifter
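
As a rough illustration of "configuration in a single place", the sketch below follows the smallest-core generator shipped in the VexRiscv repository, written from memory. Parameter names may differ between VexRiscv versions, `GenCustomCpu` is a hypothetical name, and this is not the configuration used in the project.

```scala
import spinal.core._
import vexriscv.plugin._
import vexriscv.{plugin, VexRiscv, VexRiscvConfig}

object GenCustomCpu extends App {
  // The whole CPU is described by one list of plugins and their options.
  def cpu() = new VexRiscv(
    config = VexRiscvConfig(
      plugins = List(
        new IBusSimplePlugin(
          resetVector = 0x80000000L,
          cmdForkOnSecondStage = false,
          cmdForkPersistence = false,
          prediction = NONE,
          catchAccessFault = false,
          compressedGen = false
        ),
        new DBusSimplePlugin(
          catchAddressMisaligned = false,
          catchAccessFault = false
        ),
        new CsrPlugin(CsrPluginConfig.smallest),
        new DecoderSimplePlugin(catchIllegalInstruction = false),
        new RegFilePlugin(regFileReadyKind = plugin.SYNC, zeroBoot = false),
        new IntAluPlugin,
        new SrcPlugin(separatedAddSub = false, executeInsertion = false),
        // Area/speed trade-off: swap for FullBarrelShifterPlugin to shift in one cycle.
        new LightShifterPlugin,
        new HazardSimplePlugin(
          bypassExecute = false,
          bypassMemory = false,
          bypassWriteBack = false,
          bypassWriteBackBuffer = false,
          pessimisticUseSrc = false,
          pessimisticWriteRegFile = false,
          pessimisticAddressMatch = false
        ),
        new BranchPlugin(earlyBranch = false, catchAddressMisaligned = false)
      )
    )
  )

  SpinalVerilog(cpu())
}
```

Trading area for speed is then a one-line change in the list, for example replacing the light shifter with the full barrel shifter.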




#### Custom bus configuration



- Peripheral attachment
- Bus pipelining fine-tuning (scary Scala operators)

Bus connections:

```
// Map each AXI slave to (base address, size) on the crossbar
axiCrossbar.addSlaves(
    ram.io.axi        -> (0x8000000L, onChipRamSize),
    sdramCtrl.io.axi  -> (0x40000000L, sdramLayout.capacity),
    apbBridge0.io.axi -> (0xf0000000L, 1 MB),
    apbBridge1.io.axi -> (0xf8000000L, (1 << apb1BusConfig.addressWi…
```

```
// Choose the pipelining of every channel between the crossbar and the SDRAM controller
axiCrossbar.addPipelining(sdramCtrl.io.axi)((crossbar, ctrl) => {
  crossbar.sharedCmd.halfPipe() >>   ctrl.sharedCmd
  crossbar.writeData            >/-> ctrl.writeData
  crossbar.writeRsp             <<   ctrl.writeRsp
  crossbar.readRsp              <<   ctrl.readRsp
})
```
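
The "scary" operators are shorthand for `Stream` pipelining stages from the SpinalHDL library. A small sketch may help decode them; `PipeDemo` and its ports are hypothetical.

```scala
import spinal.core._
import spinal.lib._

class PipeDemo extends Component {
  val io = new Bundle {
    val a, b, c = slave Stream (UInt(8 bits))
    val x, y, z = master Stream (UInt(8 bits))
  }

  // Registers valid and payload (cuts the master-to-slave paths).
  io.a >-> io.x            // same as: io.x << io.a.m2sPipe()

  // Additionally registers ready (cuts the paths in both directions).
  io.b >/-> io.y           // same as: io.y << io.b.s2mPipe().m2sPipe()

  // Cuts all paths with less logic, at the cost of half the bandwidth.
  io.c.halfPipe() >> io.z
}
```

Plain `>>` and `<<` connect two streams combinationally, so the choice of operator per AXI channel is exactly the timing versus area versus throughput trade-off made in the crossbar snippet.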

#### Custom plugins



**Problem:** a custom booting scheme requires telling the CPU not to execute any code *until the firmware is loaded*.

**Solution:** `BootHoldOnPlugin`
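
The talk does not show the plugin's source, so the following is only a guess at its general shape. It assumes the `IBusFetcher` service and its `haltIt()` method, which the upstream debug plugin uses to stop the CPU; `BootHoldSketchPlugin` and the `bootHold` pin are hypothetical names.

```scala
import spinal.core._
import vexriscv._
import vexriscv.plugin._

// Illustrative only: not the actual BootHoldOnPlugin.
class BootHoldSketchPlugin extends Plugin[VexRiscv] {
  var bootHold: Bool = null

  override def setup(pipeline: VexRiscv): Unit = {
    // Extra input pin on the CPU, held high while the firmware is being loaded.
    bootHold = in(Bool()).setName("bootHold")
  }

  override def build(pipeline: VexRiscv): Unit = {
    // Assumption: the instruction fetcher exposes a halt request as a service.
    val fetcher = pipeline.service(classOf[IBusFetcher])
    when(bootHold) {
      fetcher.haltIt()  // no instruction is fetched, so none is executed
    }
  }
}
```

Adding the behaviour to a CPU is then a matter of appending `new BootHoldSketchPlugin` to the plugin list; no existing plugin has to be modified.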

### Simulation capabilities



Need to simulate gateware and firmware together?

Just launch a Verilator simulation and connect to your RISC-V core with OpenOCD and GDB.
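
A sketch of such a setup, loosely modelled on the Murax simulation shipped with the VexRiscv repository. It assumes the `Murax` demo SoC and the `JtagTcp` simulation helper from the SpinalHDL library, plus a Verilator installation and an OpenOCD build able to talk to the TCP bridge. `SocSimWithJtag` is a hypothetical name.

```scala
import spinal.core._
import spinal.core.sim._
import spinal.lib.com.jtag.sim.JtagTcp
import vexriscv.demo.{Murax, MuraxConfig}

object SocSimWithJtag extends App {
  SimConfig.allOptimisation
    .compile(new Murax(MuraxConfig.default))
    .doSimUntilVoid { dut =>
      // Simulation time is counted in picoseconds here.
      val mainClkPeriod = (1e12 / dut.config.coreFrequency.toDouble).toLong
      val jtagClkPeriod = mainClkPeriod * 4

      // Drive the SoC clock and reset.
      val clockDomain = ClockDomain(dut.io.mainClk, dut.io.asyncReset)
      clockDomain.forkStimulus(mainClkPeriod)

      // Expose the simulated JTAG pins over TCP so a debugger can attach.
      JtagTcp(jtag = dut.io.jtag, jtagClkPeriod = jtagClkPeriod)

      // Keep the UART receive line idle.
      dut.io.uart.rxd #= true
    }
}
```

With the simulation running, OpenOCD attaches to the bridge and GDB attaches to OpenOCD, so firmware can be loaded, stepped and inspected exactly as on real hardware.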

*Screenshot: VS Code debugger stopped at a breakpoint in `main.cpp` of a simple Fibonacci example (`n = 15`), showing the Variables, Watch, Call Stack and Breakpoints panels.*

2/19/2024

```
BOOT
SDRAM : MODE REGISTER DEFINITION CAS=3 burstLength=0
CONNECTED
SDRAM : MODE REGISTER DEFINITION CAS=3 burstLength=0
SDRAM : MODE REGISTER DEFINITION CAS=3 burstLength=0

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Please give the number of runs through the benchmark:
Execution starts, 200 runs through Dhrys…

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    210
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          1073765508
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          1073765508
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING
```

Clock cycles = 97424; DMIPS per MHz: 1.16
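
The reported score can be cross-checked from the run count and cycle count above. The sketch assumes the conventional Dhrystone reference of 1757 Dhrystones per second for 1 DMIPS (the VAX 11/780 score); `DmipsPerMhz` is a hypothetical helper name.

```scala
object DmipsPerMhz {
  // 1757 Dhrystones/s (VAX 11/780) is conventionally defined as 1 DMIPS.
  val VaxDhrystonesPerSecond = 1757.0

  // DMIPS/MHz from the number of benchmark runs and the clock cycles they took.
  def dmipsPerMhz(runs: Long, cycles: Long): Double = {
    val dhrystonesPerSecondAt1MHz = runs.toDouble / cycles * 1e6
    dhrystonesPerSecondAt1MHz / VaxDhrystonesPerSecond
  }

  def main(args: Array[String]): Unit = {
    val score = dmipsPerMhz(runs = 200, cycles = 97424)
    println(f"DMIPS per MHz: $score%.4f")
  }
}
```

For 200 runs in 97424 cycles this gives roughly 1.168, which matches the 1.16 printed by the benchmark once the remaining digits are truncated.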



#### SpinalHDL in commercial projects: cons

- Documentation needs many more use-case examples; you cannot avoid reading the library implementation (there is a Workshop GitHub repository, though).
- The SpinalHDL core is mature and quite robust, but library components can have serious problems (we found a buggy SPI slave peripheral).
- Conventions over syntax: Scala's freedom to define and overload literally any operator makes the code a bit harder to understand.
- Some of the high-level abstractions make the learning curve a bit steeper.



#### SpinalHDL in commercial projects: pros

- An efficient way of describing hardware: no need to deal with implementation details. The time savings can be impressive.
- There is no logic overhead in the generated code.
- SpinalHDL is interoperable with VHDL and Verilog.
- Simulation with Verilator lets you test not only your design, but also the firmware running on the simulated design.
- Large SpinalHDL standard library.
- Open-source tool with a license that permits use in commercial applications.
- Responsiveness of SpinalHDL's creator, Charles Papon.



## Thank you!

contact@embevity.com www.embevity.com













