RISC-V Server spec (for Fedora Koji System, as builder)

Hi all,
FYI, StarFive will have a new chip called the JH7110 (4-core) which has PCIe Gen2. They want to make a RISC-V server for Linux distros. Also consider the new chip from the Chinese Academy of Sciences, and other potential server-level RISC-V chips.

They may want a spec for RISC-V, so can we lead them towards a server spec? We could add what we need, like a Fedora Koji builder…

Maybe we can start a document for it. What do you think? Any references?

Many thanks for your help :slight_smile:

Hi,

I believe the requirements are something like this:

  • enough memory per CPU hw thread, 4 GB per hart would be OK for now

  • fast local I/O for storage, a 250 GB NVMe drive would probably be the
    easiest, once PCIe is available from the SoC

  • fast remote I/O, at least 1 Gbit/s network

  • remote controls for power plus remote console, could be solved with
    PiKVM(?)

  • ideally all packed in 1U/2U server chassis
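
A quick way to sanity-check the memory requirement above on an existing Linux box (the 4 GB per hart figure is just the guideline from this post, not a standard):

```shell
#!/bin/sh
# Compare installed RAM against the "4 GB per hardware thread" guideline.
harts=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
echo "harts=${harts} mem_gb=${mem_gb}"
if [ "$mem_gb" -ge $((harts * 4)) ]; then
    echo "OK: at least 4 GB per hart"
else
    echo "LOW: less than 4 GB per hart"
fi
```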

      Dan
    

As a short-cut: “HiFive Unmatched ++”.

ATX / ITX board with standard power header, plenty of RAM, NVMe disk or disks, PCIe slot or slots.

A BMC of some sort would be a very nice addition. The HiFive has a USB-serial port (I think an FTDI chip or similar) which is good, but remote power management is the key to making something that we can put into a datacenter. The ideal chip would probably be the Aspeed AST 2500.

Disclaimer: this would never be a true server as things like Profiles and Platforms are not even frozen & ratified. The true baseline for “server” class hardware is yet to be set.

Now back to reality :slight_smile: Look at the SiFive Unmatched as a good base, with some changes:

  • All heatsinks should be compatible with 1U server chassis.
  • Use a large heatsink on the SoC (see how overengineered the BeagleV’s was).
  • 80mm/120mm/140mm fan mounting holes around the SoC. Basically, if I want to use a standard PC fan (usually 120mm), there should be holes to mount it so it can cool the whole board. You don’t need all 4 holes; long M3 hex standoffs are fine for attaching the fan this way, blowing top-down and cooling the whole board.
  • 1 or 2 headers for PC fans. 3-pin or 4-pin PWM doesn’t matter too much.
  • NVMe is a must. Even if that’s a single-lane PCIe Gen 2 link, it’s way better than a microSD card or eMMC.
  • Must be able to auto-boot on power-on. Currently the Unmatched cannot do that and requires a button press to boot.
  • A WDT is needed to reboot on failures. I prefer ones that can provide longer timeouts (i.e. scaling into minutes instead of just a few seconds), but the most important thing is to have a WDT at all.
  • TRNG. The FU540 and FU740 have no TRNG or crypto engine. We use software solutions to work around the low entropy; otherwise you wouldn’t even be able to SSH to the Unleashed or Unmatched immediately after boot. I have also seen the haveged daemon eating a lot of CPU (e.g. during a large git repo checkout). I would love to see a TRNG (internal to the SoC or external). If it’s an external chip, we might as well add a crypto engine (those come with a TRNG).
  • SPI-NOR flash, 32MiB in size (same as on the Unleashed and Unmatched). Keeping it the same will make life easier, though IIRC 32MiB is not enough for the current TianoCore EDK II. Maybe consider a socketed SPI-NOR flash? Once we kill it, we could replace it or go with a larger capacity.
  • An AST2500/2600 would be nice to see. Both seem to be available as MACHINE options in the OpenBMC repository. The AST2500 seems widely available, but the AST2600 is harder to find.
  • Ability to flash the SPI-NOR flash (new firmware) remotely (e.g. via the BMC), or some sort of “Dual BIOS” mode selectable from the BMC or elsewhere. Worst case, JTAG to flash it. I would prefer to flash it from a running Linux distro (you can do that on the Unmatched), but if there is a bug in OpenSBI / U-Boot SPL / U-Boot proper you cannot boot anymore, so we need a way to recover without manual intervention. The Unleashed and Unmatched allow loading the firmware from a microSD card (SPI-MMC), which can work with SD muxers (but that’s 100+ EUR extra per board).
  • Minimum 16GiB of RAM, but DIMM sockets are better. I would go with 32GiB if possible; again, it doesn’t matter much if DIMMs are supported. LTO is the default now, so the linking stage is expensive in general. There are some packages where I disabled LTO because of low RAM, and some of our QEMU machines run with 32GiB of RAM to help out here.
  • A single NIC, 1Gbps is fine for the builder.
  • Add an RTC. It saves some headaches with clocks skewed to the point where you cannot SSH to the machine.
  • A PCIe x16 slot (electrically it could be only 2 lanes). It’s also possible to go with a shorter connector, but it would need to be open-ended (see the PINE64 designs). If we are talking about a standard form factor (at least mini-ITX), just put in an x16 connector.
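
On the entropy point: you can check what a given board offers from a running system. These are standard Linux interfaces, though whether a hardware RNG shows up at all depends on the SoC (on the FU540/FU740 it won’t):

```shell
#!/bin/sh
# Current entropy estimate. Kernels >= 5.18 pin this at 256 once the
# pool is initialized; on older kernels it drains, which is what stalls
# sshd right after boot on entropy-starved boards.
cat /proc/sys/kernel/random/entropy_avail

# Hardware RNGs the kernel has detected, if any.
cat /sys/class/misc/hw_random/rng_available 2>/dev/null || echo "(no hwrng driver)"
```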

TL;DR: it has to be designed to be fully managed remotely. I do like PiKVM, SD muxers, etc., but I prefer a solution where you don’t need to spend hundreds of EUR to get it working. Fast storage is required (it’s epic; no need to deal with NBD either, so much time saved on DNF).

Ah, BTW, we also need some easy version strap (e.g. via GPIOs) or similar if there are multiple revisions of this board. Also, reprogram the FTDI chip with vendor/product/serial so it would be easy to identify the board on the serial port.
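
This is what a properly programmed FTDI serial buys you on Linux: udev builds stable per-device paths from the USB descriptor strings, so no guessing which ttyUSB is which board:

```shell
#!/bin/sh
# With a unique vendor/product/serial programmed into the adapter, each
# board gets a stable, human-readable symlink here (the exact names
# depend entirely on what the vendor programs in).
ls -l /dev/serial/by-id/ 2>/dev/null || echo "(no USB serial adapters attached)"
```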

Yeah, all those things mentioned already. :slight_smile:

Basically our builders are in a remote datacenter where hands-on access is expensive/difficult/slow, so we want to be able to manage the machines 100% remotely (power, firmware updates, etc.).

This is a very interesting article. Perhaps it’s time to prepare to add the RISC-V CPU to the Koji build system.

Google wants RISC-V to be a “tier-1” Android architecture

Arm has become an unstable, volatile business partner

I posted the link above to a thread on the Fedora devel@ mailing list too. The discussion is happening there.

And the summary still looks to be: we’re still waiting on hardware to exist.

Another important factor in adding the RISC-V CPU to Koji as an official port is RISC-V-based CI, plus SSH-accessible servers that upstream source projects can use for free. That would make our contributions to upstream open source projects much easier.

In the case of Arm, Works on Arm, the program working to grow the Arm ecosystem, has led in this role by partnering with cloud service companies (AWS, Equinix, etc.) and CI services (Travis CI, Cirrus CI, etc.). So I hope the RISC-V organization is also conscious of this role in growing the RISC-V ecosystem.

Agreed. Short answer is that we aren’t ready.

A bit longer answer: 2023 is a year with SBCs that could allow us to expand capacity and improve build times. Most likely we will not have enough builders for everyone; I expect the general pool of users on fedora.riscv.rocks to increase. The market might see some server systems this year, but they might be hard to acquire in 2023 (especially in sufficient quantity). We are in communication with multiple vendors and trying to provide early feedback/guidance where possible.

Standards continue to be developed. The RISC-V Profiles are not yet set in stone; the same goes for the Platforms specification and the OS-A SEE specification.

I also don’t want to get stuck in the “legacy past”. There is a push towards RVA23 (the next major ISA profile, replacing RVA20).

I can see that the RISC-V organization is conscious of and intentional about this role in growing the RISC-V ecosystem now!

  • enough memory - ECC memory. Maybe that is taken for granted, but it should be explicit for a server.

  • remote controls - not really needed for a small office server

  • 1U/2U - For running a server at home or in an office, the rack-mount format is too noisy. You want a tower format with large (and therefore low-RPM / low-noise) fans. (Or even go fanless like Apple, but that is harder to engineer.)