Cross-builds for GNU toolchain, LLVM, Linux Kernel are going to be much faster on a low end (similar price) x86.
But most packages don't have cross-build infrastructure as solid as those projects' (or any at all).
I did some benchmarks on RISC-V Linux kernel builds (commit 7503345ac5f5, defconfig) this week. Native on SBCs, cross-build on a couple of x86, docker (qemu-user) on x86.
- 1m45s i9-13900HX cross-compile
- 4m29s Milk-V Pioneer (YouTube video from 9 months ago ... close enough)
- 22m48s RISC-V docker on 24 core i9-13900HX laptop
- 67m35s VisionFive 2 (4x U74)
- 88m4s Lichee Pi 4A (4x C910)
I need a figure for SpacemiT in BPI-F3 / Jupiter / LPi3A / DC-Roma II / MuseBook. I think it'll be more or less the same as the VF2.
My guess is the Milk-V Megrez (EIC7700X @1.8 GHz) might come in around 25 minutes, and this HiFive Premier (same SoC @1.4 GHz) around 30 minutes.
So the P550 machines will probably be a little slower than qemu on the i9. Not a lot. But even the VF2 and LPi4A are going to be much faster than qemu on that 6 core Zen2 -- I haven't measured it, but I'm guessing around 130m.
So if you already have that high core count x86, maybe you don't need a P550 machine.
On the other hand it's good to verify on real hardware.
On the gripping hand, with a 16 GB Megrez costing $199 and my i9 costing $1600, if you want a build farm with 5 or 10 or 100 machines then the P550 is looking pretty good.
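To put that build-farm argument in numbers, here's a back-of-envelope sketch. The i9 qemu time is the 22m48s measured above; the Megrez time is my 25-minute guess, not a measurement, and both prices are the ones quoted in this thread:

```python
# Back-of-envelope build-farm economics. The i9 qemu time is measured
# above; the Megrez time and both prices are thread estimates.
machines = {
    "i9-13900HX, qemu": (1600, 22 * 60 + 48),  # (price USD, kernel build seconds)
    "Milk-V Megrez":    (199, 25 * 60),        # guessed, not measured
}

def builds_per_hour_per_dollar(price_usd, build_seconds):
    """Kernel builds per hour you get for each dollar of hardware."""
    return (3600 / build_seconds) / price_usd

for name, (price, secs) in machines.items():
    print(f"{name}: {builds_per_hour_per_dollar(price, secs):.5f} builds/hr/$")
```

On these numbers the Megrez comes out roughly 7x better per dollar, which is why a farm of cheap boards can beat one big x86 box.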
VF2 (or Mars) is still looking pretty good for price/performance. The problem there is that being limited to 8 GB RAM isn't good. That's not enough for example to do riscv-gnu-toolchain without swapping. That build is fine on my 16 GB LPi4A, but not my VF2.
16 GB or 32 GB on the P550 boards is much more robust.
Thanks for this. I was looking to upgrade my VF2, but I'm not sure it's worth it at this stage: the VF2 is painfully slow, and this board doesn't even reach 2x its performance.
I get similar results here. The Banana Pi BPI-F3 was a big disappointment. I was expecting some improvement over the VisionFive 2, but no dice. A big Linux build at -j8 on the BPI-F3 takes essentially the same time as a -j4 build on the VF2.
Apparently the small level 2 caches on the X60 are crippling.
I'm surprised how much faster the Jupiter is than the BPI-F3: 28%.
That's a lot for the same SoC.
And, yes, ridiculously small caches on the BPI-F3 at 0.5 MB for each 4 core cluster, vs 2 MB on the VisionFive 2 and 4 MB on the P550.
The Pioneer still wins for cache and I think real-world speed though, with 4 MB L3 cache per 4 core cluster, but also access to the other 60 MB of L3 cache from the other clusters on the (near) single-threaded parts of your builds (autoconf, linking, that last stubborn .cpp, ...)
The test is probably somewhat disk bound, so I/O architecture matters. For example, we just retested the HiFive Premier P550, but using an NVMe drive (in an adapter in the PCIe slot) instead of the SATA SSD, and performance improved markedly for the exact same hardware. (See updated chart)
As long as you've got enough RAM for a file cache for the active program binaries and header files, I've never noticed any significant difference between SD card, eMMC, USB3, or NVMe storage for software building on the SBCs I have. It might be different on a Pioneer :-)
I just checked the Linux kernel tree I was testing with. It's 7.2 GB, but 5.6 GB of that is `.git`, which isn't used by the build. So only 1.6 GB of actual source. And much of that isn't used by any given build. Not least the 150 MB of `arch` that isn't in `arch/riscv` (which is 27 MB). Over 1 GB is in `drivers`.
riscv-gnu-toolchain has 2.1 GB that isn't in `.git`. Binutils is 488 MB, gcc 1096 MB.
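If you want to reproduce those size measurements, GNU `du -sh --exclude=.git` does it, or a few lines of Python. This is a sketch: `tree_size` is my own helper, and it counts file bytes rather than the disk blocks `du` reports, so the totals will differ slightly:

```python
import os

def tree_size(path, exclude=(".git",)):
    """Total file bytes under `path`, skipping excluded directory names."""
    total = 0
    for root, dirs, files in os.walk(path):
        dirs[:] = [d for d in dirs if d not in exclude]  # prune excluded dirs in place
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):  # don't double-count symlink targets
                total += os.path.getsize(fp)
    return total
```

For example, `tree_size("linux") / 2**30` gives the source size in GiB with `.git` excluded.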
This is all small enough that on an 8 GB or 16 GB board there is going to be essentially zero disk traffic. Even if the disk cache doesn't start off hot, reading less than 2 GB of stuff into disk cache over the course of a 1 hour build? It's like 0.5 MB/s, about 1% of what even an SD card will do.
It simply doesn't matter.
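The arithmetic behind that claim is trivial to check. The 50 MB/s SD card figure is an assumption on my part, not a measurement:

```python
# Average read bandwidth needed to pull ~2 GB of source into the page
# cache over a one-hour build, vs an assumed SD card read speed.
source_mb = 2000        # generous upper bound on bytes actually read
build_seconds = 3600
sd_card_mb_s = 50       # assumed sustained SD card read speed

needed_mb_s = source_mb / build_seconds
print(f"needed: {needed_mb_s:.2f} MB/s")
print(f"fraction of SD bandwidth: {100 * needed_mb_s / sd_card_mb_s:.1f}%")
```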
Edit: checking SD card speed on Linux kernel build directory on VisionFive 2 with totally cold disk cache just after a reboot.
Last login: Tue Dec 10 07:39:01 2024 from 192.168.1.85
user@starfive:~$ time tar cf - linux/* | cat >/dev/null
real 2m37.013s
user 0m2.812s
sys 0m27.398s
user@starfive:~$ du -hs linux
7.3G linux
user@starfive:~$ du -hs linux/.git
5.6G linux/.git
user@starfive:~$ time tar cf - linux/* | cat >/dev/null
real 0m7.104s
user 0m1.120s
sys 0m8.939s
Yeah, so 2m37s to cache everything, vs 67m35s for a kernel build. The maximum possible difference between hot and cold disk cache is 3.9% of the build time -- provided only that there is enough RAM that once something has been read it won't be evicted to make room for something else. But in reality it will be much less than that, and possibly unmeasurable. I think most likely what will actually show up is the 30s of CPU time.
I'm having trouble seeing how NVMe vs SATA can make any difference, when SD card is already 25x faster than needed.
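Where those two figures come from, as a quick sketch from the measurements above:

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

cold_cache_read = to_seconds(2, 37)   # tar of the whole tree, cold cache
kernel_build = to_seconds(67, 35)     # native VF2 kernel build

# Worst case: every byte read from disk instead of cache.
worst_case_penalty = cold_cache_read / kernel_build
# How much faster the SD card is than the build actually requires.
headroom = kernel_build / cold_cache_read

print(f"worst-case cold-cache penalty: {100 * worst_case_penalty:.1f}%")
print(f"SD card speed vs what's needed: {headroom:.0f}x")
```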
I'm not familiar with the grub build at all. Is it really big?
The build directory is 790M (vs 16GB of RAM), but nevertheless the choice of underlying storage made a consistent difference in our tests. We ran them 3+ times each so it should be mostly warm cache.
Weird. It really seems like something strange is going on. Assuming you get close to 400 MB/s on the NVMe (which is what people get on the 1 lane M.2 on VF2 etc) then it should be just several seconds to read 790M.
Many or most packages don't have good cross-build infrastructure. That's important when you're a Fedora or Ubuntu building 50k+ random packages, not just working on GCC / LLVM / Linux Kernel all the time.
Doing "native" build in an emulator works for just about everything, but with a 10x - 15x slowdown compared to native or cross-build.
While price/performance of RISC-V is currently significantly worse than x86, it's not 10x worse.
A $2500 Milk-V Pioneer (64 cores, 128 GB RAM) builds a RISC-V Linux kernel five times faster than a $1500 x86 laptop (24 cores) using RISC-V docker/qemu.
A $75 VisionFive 2 or BPI-F3 takes 3 times longer than the x86 with qemu but costs 20 times less.
If you're only building one thing at a time and already have the fast x86 ... then, sure, use that. But if you want a build farm then RISC-V native on either the Pioneer or the VF2 is already much better.
These P550 machines are an incremental improvement again, in price/performance.
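As a sanity check on those ratios, using the timings from the benchmark comment earlier in the thread:

```python
# Timings from earlier in the thread, converted to seconds.
qemu_i9 = 22 * 60 + 48   # RISC-V docker/qemu on the i9-13900HX
pioneer = 4 * 60 + 29    # Milk-V Pioneer, native
vf2 = 67 * 60 + 35       # VisionFive 2, native

print(f"Pioneer vs qemu-on-i9: {qemu_i9 / pioneer:.1f}x faster")
print(f"VF2 vs qemu-on-i9: {vf2 / qemu_i9:.1f}x slower")
print(f"laptop/VF2 price ratio: {1500 // 75}x")
```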
The goal is to actually run RISC-V binaries on RISC-V hardware, to see what works and what doesn't. You wouldn't spot code generation bugs like this one if you merely cross-compile and never run the binaries: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=c65046ff2e...
For quite some time to come, the main user of the Fedora riscv64 port will be the Fedora riscv64 builders. With cross-compilation, we wouldn't even have that limited use of the binaries produced.
There's a lot of cases where you want to build something and run it afterwards, such as tests or intermediate tooling used in later steps in the build.
In any case, I actually want to use RISC-V machines for my development environment.
The experience of the Debian folks working on cross-compiling is that ~50% of packages are cross-compilable, and that was only achieved with a lot of work and a lot of patches merged. Also, it regresses quite a lot.
Prices are $400 (16GB DDR5) and $500 (32GB DDR5), since that's not in the blog post.
But I have some questions:
Why the weird form factor? Mini-DTX is supported by a lot of cases but as a motherboard form factor it's incredibly niche compared to Mini-ITX and especially Micro-ATX.
Unless something is particularly unique about this HiFive revision, you can use PCIe SSDs; it just looks like this one is relatively lane-starved, and you'll need to eat the PCIe slot to do so (it's an x16 slot physically, but only x4 in terms of actual wired lanes). The E-keyed slot is listed as SDIO, but even if it had PCIe wired up it'd be a max of x2 lanes.
Just checked: looks like it does provide a full-size PCIe slot. My guess is that the intended use case is for developers testing compatibility. You could almost certainly plug an NVMe drive through an adapter into that full-size slot (although booting from it would likely require a custom U-Boot build), but if you use up the one slot for storage, you can't plug in any other peripherals to test.
It's an x16 slot, but it only provides 4 lanes of PCIe 3.0.
Appears to be a SoC limitation: it has only a single x4 PCIe 3.0 interface, and doesn't appear to support bifurcation (splitting into two x2 slots, or four x1 slots). You could probably throw a PCIe switch on it, but PCIe switches are expensive.
The SoC wasn't designed for this product; it was designed by a 3rd-party company for computer vision/robotics/manufacturing use cases.
Yeah, the Intel Horse Creek chip with the same SiFive P550 cores was probably much better on I/O -- and likely DDR IP as well, if they used their good stuff -- but for some reason Intel decided not to go to mass production on them, despite showing off a working board at a trade show in IIRC September 2023.
You can connect M.2 NVMe, it just doesn't have a dedicated slot. If you're looking for a cheap device to plug a bunch of different PCIe devices into, a RISC-V development board is probably not your ideal pick; look at a normal computer.
Honestly, this type of board is something you aren't really going to just install in a case and forget about. You are most likely going to have it on a test bench so you can swap out all kinds of different hardware for validation.
Buttons on the bottom for power/reset, 2 different JTAG ports, DIP switches for settings, and remote board management aren't things that are normally found on consumer boards. Mini-DTX probably allows them to have a marginally smaller width compared to Micro-ATX while still allowing space for all of that debug functionality with a 2-slot graphics card installed. eMMC is also kind of important for a SoM.
Mini-DTX is Mini-ITX with space for an extra slot; in practice a lot of cases support that, as a Mini-ITX board plus a dual-slot GPU has the same footprint.
> "As a result of increased production and economies of scale, we’re excited to announce we are able to lower the price to just $399 for the 16GB version and $499 for the 32GB version"
Oops! $200 less ... $199 for the 16 GB. And supposedly faster. Also the Megrez can be powered from a standard 12V barrel connector if you don't want to put it in a case / use an ATX power supply.
SiFive's boards are always more expensive than 3rd party boards using the same CPU cores. They are aimed primarily at SiFive IP customers who are designing their own chips, to develop software on before their chips are ready.
> Is this on par with or faster than comparable Arm, AMD, or Intel processors at the same price level?
rwjm's package build benchmark shows it as 25% faster than a comparable µarch Raspberry Pi 4.
It's been some time since Intel or AMD had similar products: I guess something around Pentium III, Pentium M, or early Core 2.
Prices are a function of production volume (and features / quality, but mostly volume).
The Pi 4 is of course a mass-market product and is cheaper. Arm's own "Juno" A72 dev board is $10,000, which is 20x more than this SiFive board.
It can't run an upstream kernel yet, although it's very likely that we'll get there. SiFive have in the past been very good about getting changes upstream.
Can someone who is more familiar with this SoC confirm for me that this P550 doesn't have the RISC-V "V" vector extensions? I'm seeing a GBC suffix which I guess means bit manipulation, compressed instructions, and whatever the G extension is which I don't fully understand (IMAFD plz explain?)
Note that it's only on one 1.6GHz core, and still pretty anemic otherwise (Pi form factor, it's to be expected). So something "desktop grade" with all the nice extensions and other goodies is probably still a ways off. Maybe next year; we'll have to see -- but lots of useful extensions continue to be ratified today, so it may still be a while before things "cool off."
P550 cores don't have vector extensions. It's actually quite an old design, from 2021. What you'd want is SiFive P670 cores, which are RVA22 compliant with the vector 1.0 spec.
Three years from announcement of a core to SoCs on boards being available is actually on the quick side.
Arm A53 (Pi 3, October 2012 - February 2016), A72 (Pi 4, Feb 2015 - June 2019), and A76 (Pi 5, May 2018 - September 2023, or January 2022 for Radxa Rock 5B) all took longer.
P670 was only announced in November 2022. If a board ships by the end of 2025 it will be doing very well.
Why does it take so long? I totally understand why it would for leading-edge products on new nodes, but for chips on mature processes, what's the bottleneck between design and sale? I know the round-trip time from the foundries is ~3 months. Is the rest validation? It feels like it would be really valuable to reduce the time to ~1 year (even if that came at the cost of power, area, or a moderate amount of performance). Shortening the feedback loop would, I'd think, be super helpful for designers and programmers who want to experiment with new paradigms faster.
Largely because in the Arm and RISC-V worlds company A designs the CPU core, then at some point company B decides to license the core to make an SoC and starts designing it. Then company C manufactures the chip. Then when the chip is available companies D, E, F design boards for the chip to go on.
Traditionally in x86 Intel and AMD do all the first three steps in one company, with the stages overlapped, and feedback.
Also, Intel and AMD (and even more so Apple) don't announce a new chip until it is very close to shipping. They might have been working on it internally for five years before that.
Arm and RISC-V companies have to make public announcements when a core is nearing design completion, to give a chance for companies such as Allwinner and Rockchip and Broadcom and Mediatek and Sophgo and Starfive to take a look at the specs and decide that it might be interesting to build a chip using that core.
> I know the round trip time from the foundries is ~3 months.
That's only if the chip works first time. Many don't and need several re-spins. I believe 3 or 4 is not uncommon. And variable amounts of re-design and re-layout and re-verify time between each of those ~3 months at the foundry.
Given that, it would be a brave company that went straight to mass-production without a round of test chips first, so you've got 2x ~3 months, plus "bring up" time in between, even in the best case.
G includes the MAFD extensions on top of the base integer set (I) for non-embedded applications. That's integer multiplication and division, atomics, and single- and double-precision floating point. It also includes the control/status register and an instruction-fence extension. I think it's there to mean "the base plus the standard bits that people generally want in a processor if they're writing C for it".
It takes time (~2 years) for silicon designers to go from idea to taping out silicon. The P550 cores are advertised as having good area efficiency, so it could be that they dropped the vector extension to optimize area, or that they simply couldn't incorporate it into the design.
I is the base integer instructions
M is integer multiplication
A is atomic
F is single precision float
D is double precision
G is shorthand for all of the above + 2 others that I honestly have no idea what they do
OK, wow, I didn't even clue in that IMAFD was other extensions, and G a "rollup" of them, because the last time I played with RISC-V it was just with cores that did IMF.
> + 2 others that I honestly have no idea what they do
The original RV32I and RV64I required some control registers for high-frequency counters and instruction counters, plus the instructions to access those registers. This proved to be too complex for the simplest implementations, so relatively recently (five years ago?) this functionality was moved into its own extensions.
Including these extensions in G makes the current G have the same functionality as the original G.
> 2 others that I honestly have no idea what they do
The CSR (control and status register) extension and the instruction-fence extension (Zicsr and Zifencei). Instruction fences are most useful when you modify the text section at runtime (e.g. JIT or code hot reload), so that you need to ensure the consistency of code across different harts.
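For anyone decoding these strings, here's a rough sketch of expanding the single-letter extensions in an ISA string, with G treated as IMAFD + Zicsr + Zifencei as described above (`expand` and the `MEANING` table are hypothetical helpers, not a real tool):

```python
# Hypothetical helper: expand single-letter extensions in a RISC-V ISA
# string; G is shorthand for IMAFD + Zicsr + Zifencei.
MEANING = {
    "i": "base integer", "m": "integer multiply/divide", "a": "atomics",
    "f": "single-precision float", "d": "double-precision float",
    "c": "compressed instructions", "b": "bit manipulation", "v": "vector",
    "zicsr": "control/status registers", "zifencei": "instruction fence",
}

def expand(isa):
    """Return the extension list for a string like 'rv64gc'."""
    isa = isa.lower().removeprefix("rv32").removeprefix("rv64")
    exts = []
    for ch in isa:
        exts += ["i", "m", "a", "f", "d", "zicsr", "zifencei"] if ch == "g" else [ch]
    return exts

for ext in expand("rv64gc"):
    print(ext, "-", MEANING.get(ext, "?"))
```

So `rv64gc` comes out as IMAFD + Zicsr + Zifencei plus C.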
You can get vectorized instructions on RISC-V from Microchip in a few months, at a much higher price point, with the forthcoming ~$1500 dev board. It has some nice specs, including 10GigE.
This is similar to the High Performance Space Computer which will be coming out in Rad Hardened & Rad Tolerant versions, I think these devboards will be 40k-60k
That's an insane price for something which will perform similarly to the BPI-F3.
It has double the DLEN, but it also only runs at 1GHz, while the BPI-F3 is available at 1.6GHz and 1.8GHz for way cheaper.
> There is also a lower end 4 core unit too, list price for the devkit is $150, currently shipping.
This is an entirely different processor, the now very old SiFive U54 at 0.6GHz.
1. Ubuntu invested very heavily into making Linux friendly to a whole generation of makers when nobody else was. Ubuntu is most familiar to them. Canonical will benefit from that investment for the foreseeable future.
2. Ubuntu benefits from Debian's debootstrap which makes porting to a new architecture substantially easier.
debootstrap (or mmdebstrap) is just for installing the already existing binary packages. The bootstrap process for bringing up a new port is a lot more work.
The reason why it's Ubuntu is probably that they are a commercial vendor, so you can make contracts with them, while the likes of Debian just work on what they care about when they have time.
Firefox has been able to run on RISC-V for as long as I can remember. I'm pretty sure I remember SiFive doing a demo back in 2018 at FOSDEM which included the browser. However generally GUI environments are still quite slow, so it won't be very usable.
"Now Available" must mean something different to SiFive than it does to me. When I click the links in the press release that purport to let me acquire one, they all say "No Stock Available," which means the opposite of "Now Available" to me.
They weren't loading at all earlier, though, so saying that I can't get one but showing me the price I can't get it at is some kind of improvement, I guess.
SiFive sent out an email on about the 32GB boards:
> The Grinch struck a little early this year. While the 16GB HIFive Premier P550 boards are in stock at Arrow.com and available now we were so excited to tell you the news we pushed the send button too soon. If you tried to order a 32GB board after yesterday's announcement our distributor is not quite ready. Our sincerest apologies to any of you who experienced difficulties and for multiple emails. We expect this problem to be solved in the next few days and we will send an update as soon as the 32GB boards are available. Thank you for your patience.
Note the benchmark is not very rigorous, but it reflects what we want to do with these boards which is to build Fedora packages.