RISC-V Server spec (for Fedora Koji System, as builder)

Hi all,
FYI, StarFive will have a new chip called the JH7110 (4-core) which has PCIe Gen2. They want to make a RISC-V server for Linux distros. Also consider the new chip from the Chinese Academy of Sciences, and other potential server-level RISC-V chips.

They may want a spec for RISC-V, so can we lead them towards a server spec? We could add what we need, like a Fedora Koji builder…

Maybe we can start a document for it. What do you think? Any references?

Many thanks for your help :slight_smile:

Hi,

I believe the requirements are something like this:

  • enough memory per CPU hw thread, 4 GB per hart would be OK for now

  • fast local I/O for storage, a 250 GB NVMe drive would probably be the
    easiest, once PCIe is available from the SoC

  • fast remote I/O, at least 1 Gbit/s network

  • remote controls for power plus remote console, could be solved with
    PiKVM(?)

  • ideally all packed in 1U/2U server chassis
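
A quick way to sanity-check the memory requirement above on an existing Linux box (the 4 GB per hart figure is just the guideline from this post, not a standard):

```shell
#!/bin/sh
# Compare installed RAM against the "4 GB per hardware thread" guideline.
harts=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
echo "harts=${harts} mem_gb=${mem_gb}"
if [ "$mem_gb" -ge $((harts * 4)) ]; then
    echo "OK: at least 4 GB per hart"
else
    echo "LOW: less than 4 GB per hart"
fi
```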

      Dan
    

As a short-cut: “HiFive Unmatched ++”.

ATX / ITX board with standard power header, plenty of RAM, NVMe disk or disks, PCIe slot or slots.

A BMC of some sort would be a very nice addition. The HiFive has a USB-serial port (I think an FTDI chip or similar) which is good, but remote power management is the key to making something that we can put into a datacenter. The ideal chip would probably be the Aspeed AST 2500.

Disclaimer: this would never be a true server as things like Profiles and Platforms are not even frozen & ratified. The true baseline for “server” class hardware is yet to be set.

Now back to reality :slight_smile: Look at the SiFive Unmatched as a good base, with some changes:

  • All heatsinks should be compatible with 1U server chassis.
  • Use a large heatsink on the SoC (see how overengineered the BeagleV’s was).
  • 80mm/120mm/140mm fan mounting holes around the SoC. Basically, if I want to use a standard PC fan (usually 120mm), there should be holes to mount it so it can cool the whole board. You don’t need all 4 holes; long M3 hex standoffs are fine for attaching the fan this way, blowing top-down and cooling the whole board.
  • 1 or 2 headers for PC fans. 3-pin or 4-pin PWM doesn’t matter too much.
  • NVMe is a must. Even if that’s a single-lane PCIe Gen 2 link, it’s way better than a microSD card or eMMC.
  • Must be able to auto-boot on power-on. Currently the Unmatched cannot do that and requires a button press to boot.
  • A WDT is needed to reboot on failures. I prefer ones that can provide longer timeouts (i.e. scaling into minutes instead of just a few seconds), but the most important thing is to have a WDT at all.
  • TRNG. The FU540 and FU740 have no TRNG or crypto engine. We use software solutions to work around the low entropy; otherwise you wouldn’t even be able to SSH to the Unleashed or Unmatched immediately after boot. I have also seen the haveged daemon eating a lot of CPU (e.g. during a large git repo checkout). I would love to see a TRNG (internal to the SoC or external). If it’s an external chip, we might as well add a crypto engine (those come with a TRNG).
  • SPI-NOR flash, 32MiB in size (same as on the Unleashed and Unmatched). Keeping it the same will make life easier, though IIRC 32MiB is not enough for the current TianoCore EDK II. Maybe consider a socketed SPI-NOR flash? Once we kill it, we could replace it or go with a larger capacity.
  • An AST2500/2600 would be nice to see. Both seem to be available as MACHINE options in the OpenBMC repository. The AST2500 seems widely available, but the AST2600 is harder to find.
  • Ability to flash the SPI-NOR flash (new firmware) remotely (e.g. via the BMC), or some sort of “Dual BIOS” mode selectable from the BMC or elsewhere. Worst case, JTAG to flash it. I would prefer to flash it from a running Linux distro (you can do that on the Unmatched), but if there is a bug in OpenSBI / U-Boot SPL / U-Boot proper you cannot boot anymore, so we need a way to recover without manual intervention. The Unleashed and Unmatched allow loading the firmware from a microSD card (SPI-MMC), which can work with SD muxers (but that’s 100+ EUR extra per board).
  • Minimum 16GiB of RAM, but DIMM sockets are better. I would go with 32GiB if possible; again, it doesn’t matter much if DIMMs are supported. LTO is the default now, so the linking stage is expensive in general. There are some packages where I disabled LTO because of low RAM, and some of our QEMU machines run with 32GiB of RAM to help out here.
  • A single NIC, 1Gbps is fine for the builder.
  • Add an RTC. It saves some headaches with clocks skewed to the point where you cannot SSH to the machine.
  • A PCIe x16 slot (electrically it could be only 2 lanes). It’s also possible to go with a shorter connector, but it would need to be open-ended (see the PINE64 designs). If we are talking about a standard form factor (at least mini-ITX), just put in an x16 connector.
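
On the entropy point: you can check what a given board offers from a running system. These are standard Linux interfaces, though whether a hardware RNG shows up at all depends on the SoC (on the FU540/FU740 it won’t):

```shell
#!/bin/sh
# Current entropy estimate. Kernels >= 5.18 pin this at 256 once the
# pool is initialized; on older kernels it drains, which is what stalls
# sshd right after boot on entropy-starved boards.
cat /proc/sys/kernel/random/entropy_avail

# Hardware RNGs the kernel has detected, if any.
cat /sys/class/misc/hw_random/rng_available 2>/dev/null || echo "(no hwrng driver)"
```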

TL;DR: it has to be designed to be fully managed remotely. I do like PiKVM, SD muxers, etc., but I prefer a solution where you don’t need to spend hundreds of EUR to get it working. Fast storage is required (it’s epic; no need to deal with NBD either, so much time saved on DNF).

Ah, BTW, we also need some easy version strap (e.g. via GPIOs) or similar if there are multiple revisions of this board. Also, reprogram the FTDI chip with vendor/product/serial so it would be easy to identify the board on the serial port.
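
This is what a properly programmed FTDI serial buys you on Linux: udev builds stable per-device paths from the USB descriptor strings, so no guessing which ttyUSB is which board:

```shell
#!/bin/sh
# With a unique vendor/product/serial programmed into the adapter, each
# board gets a stable, human-readable symlink here (the exact names
# depend entirely on what the vendor programs in).
ls -l /dev/serial/by-id/ 2>/dev/null || echo "(no USB serial adapters attached)"
```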

Yeah, all those things mentioned already. :slight_smile:

Basically our builders are in a remote datacenter where hands-on access is expensive/difficult/slow, so we want to be able to manage the machines 100% remotely (power, firmware updates, etc.).

This is a very interesting article. Perhaps it’s time to prepare to add the RISC-V CPU to the Koji build system.

Google wants RISC-V to be a “tier-1” Android architecture

Arm has become an unstable, volatile business partner

I posted the link above to a thread on the Fedora devel@ mailing list too. The discussion is happening there.

And the summary still looks to be: we’re still waiting on hardware to exist.

Another important factor in adding the RISC-V CPU to Koji as an official port is RISC-V-based CI, plus SSH-accessible servers that upstream source projects can use for free. That would make our contributions to upstream open source projects much easier.

In the case of Arm, Works on Arm, the program working to grow the Arm ecosystem, has led in this role by partnering with cloud service companies (AWS, Equinix, etc.) and CI services (Travis CI, Cirrus CI, etc.). So I hope the RISC-V organization is also conscious of this role in growing the RISC-V ecosystem.

Agreed. Short answer is that we aren’t ready.

A bit longer answer: 2023 is a year with SBCs that could allow us to expand capacity and improve build times. Most likely we will not have enough builders for everyone; I expect the general pool of users on fedora.riscv.rocks to increase. The market might see some server systems this year, but they might be hard to acquire in 2023 (especially in sufficient quantity). We are in communication with multiple vendors and trying to provide early feedback/guidance where possible.

Standards continue to be developed. The RISC-V Profiles are not yet set in stone; the same goes for the Platforms specification and the OS-A SEE specification.

I also don’t want to get stuck in the “legacy past”. There is a push towards RVA23 (the next major ISA profile, replacing RVA20).

I can see that the RISC-V organization is conscious of and intentional about this role in growing the RISC-V ecosystem now!

  • enough memory - ECC memory. Maybe that is taken for granted, but it should be explicit for a server.

  • remote controls - not really needed for a small office server

  • 1U/2U - For running a server at home or in an office, the rack-mount format is too noisy. You want a tower format with large (and therefore low-RPM / low-noise) fans. (Or even go fanless like Apple, but that is harder to engineer.)