Vcc – The Vulkan Clang Compiler (shady-gang.github.io)
170 points by todsacerdoti on Jan 10, 2024 | 68 comments




What kind of performance is achievable with some of the features that Vcc enables (true function calls, function pointers, goto), and what are some of the limitations?

On a GPU, function calls are much more expensive than on a CPU; it usually seems to be worth inlining as much as possible. Implementing a stack for recursion on GPUs is also likely to have performance implications. The whole point of using a GPU is to get good performance, but I can see the argument for having these features in order to port CPU code to GPU code incrementally.
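
To make that concrete, here is a rough sketch in plain C++ (nothing GPU-specific, purely illustrative) of what lowering recursion to an explicit stack looks like; the fixed-size array is the kind of constraint a GPU target tends to impose:

  // Recursive tree sum rewritten with an explicit, bounded stack.
  struct Node { float value; const Node* left; const Node* right; };

  float sum_tree(const Node* root) {
      const Node* stack[64];          // fixed depth; GPUs dislike unbounded stacks
      int top = 0;
      float total = 0.0f;
      if (root) stack[top++] = root;
      while (top > 0) {
          const Node* n = stack[--top];
          total += n->value;
          if (n->left)  stack[top++] = n->left;
          if (n->right) stack[top++] = n->right;
      }
      return total;
  }

  int main() {
      Node leaf1{1.0f, nullptr, nullptr}, leaf2{2.0f, nullptr, nullptr};
      Node root{3.0f, &leaf1, &leaf2};
      return static_cast<int>(sum_tree(&root)); // 6
  }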

For function pointers, how does that work? Multiple different implementations of the function are needed to support different devices and the host, which limits what a single pointer can do.

EDIT: To answer my second question, neither function nor data pointers are portable between host and device since Vulkan doesn't support unified addressing.


Function pointers across different architectures can be done.

Indexing into a table, wasm style, is the easy option (see the sketch below).

Writing a prologue that is valid in both architectures works too; I quite like that one.

You can declare one architecture to be the more important (e.g. x64) and do a hash-table lookup on the other.

Or the most popular option: crashing when calling a pointer from the other arch.
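
A rough illustration of the table option in plain C++ (purely illustrative, not how any particular compiler does it): the "pointer" that crosses the boundary is just a small integer, and each architecture resolves it against its own copy of the table.

  #include <cstdio>

  using Fn = float (*)(float);

  static float twice(float x)  { return 2.0f * x; }
  static float square(float x) { return x * x; }

  // Each architecture compiles its own copy of this table; only the
  // index travels between them, never a raw address.
  static const Fn k_table[] = { twice, square };

  static float call_indirect(unsigned idx, float x) {
      return k_table[idx](x);
  }

  int main() {
      std::printf("%f\n", call_indirect(1, 3.0f)); // dispatches to square -> 9.0
      return 0;
  }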


Strictly speaking, a stack is only needed for reentrant calls. Function pointers can be implemented via defunctionalization in a whole-program context (i.e. there is a single 'eval' function taking a variant record as an argument, and each case in the variant record marks some function implementation to dispatch to and the corresponding arguments).
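
A minimal sketch of that idea in C++ (the types and names here are made up for illustration): every callable becomes one case of a variant record, and a single eval function dispatches on it.

  #include <variant>

  struct Square { };                 // f(x) = x * x
  struct Scale  { float factor; };   // f(x) = factor * x, environment captured in the record
  using Fn = std::variant<Square, Scale>;

  // The single whole-program dispatcher that replaces indirect calls.
  float eval(const Fn& f, float x) {
      if (std::holds_alternative<Square>(f)) return x * x;
      return std::get<Scale>(f).factor * x;
  }

  int main() {
      Fn fns[] = { Square{}, Scale{2.0f} };
      float acc = 0.0f;
      for (const Fn& f : fns) acc += eval(f, 3.0f); // 9 + 6
      return static_cast<int>(acc);                 // 15
  }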


I have long sought a CUDA or HIP compiler that targets SPIR-V or DXIL, so that we can compile all those neural network kernels for almost all compute devices.

The requirements are: 1. it extends C++17, meaning template metaprogramming works, so that cub or cutlass/cute can be used;

2. AOT

3. and no shitshow!

The only thing that comes close is Circle [1].

- OpenCL is a no-go, as it is purely C. And it is a shitshow, especially on Android devices: the vendor drivers are the main source of pain, and JIT compilation adds the rest.

- Vulkan+GLSL is a no-go. The degree of shittiness is on par with OpenCL, again because of the drivers and the JIT compiler.

- Slang [2] has potential, but its metaprogramming is not as strong as C++'s, so existing libraries cannot be used.

The above conclusions are drawn from my work on the OpenCL EP for onnxruntime [3], and it is purely a nightmare to work with those drivers and JIT compilers. Hopefully Vcc can take compute shaders more seriously.

[1]: https://www.circle-lang.org/

[2]: https://shader-slang.com/

[3]: https://github.com/microsoft/onnxruntime/tree/dev/opencl


What about HLSL, especially since it is a kind of C++ flavour, especially after the HLSL 2021 improvements?

https://devblogs.microsoft.com/directx/opening-hlsl-planning...

https://devblogs.microsoft.com/directx/announcing-hlsl-2021/

At a Vulkanised 2023 discussion round, Khronos admitted that they aren't going to improve GLSL any further and, ironically, rely on Microsoft's HLSL work as the main shading language to go alongside Vulkan.

Maybe something else will be discussed at Vulkanised 2024, but I doubt it.

There was some SYCL work to target Vulkan, but it seems to have been a paper exercise and fizzled out.

https://dl.acm.org/doi/fullHtml/10.1145/3456669.3456683


At the time the EP was developed, the tooling was not as good as it is now. I had imagined a pipeline in which HLSL compiles down to DXIL, goes through SPIRV-Cross, and then targets a wide variety of mobile devices with an OpenCL runtime. But those tools are focused on the graphics side and cannot handle the kernel execution model, not to mention the structured control-flow requirements; it was definitely not going to work. OpenGL does not fit the imagined pipeline either, because IIRC it cannot consume SPIR-V bytecode. Vulkan was so niche that it was discarded very early. The final result with an OpenCL runtime and the CL language worked, but the drivers are a mess [facepalm]


> OpenGL does not work with the imagined pipeline, because IIRC it cannot consume SPIRV bytecode.

What gave you that idea?

  $ eglinfo -a gl -p wayland | grep spirv
  GL_ARB_get_program_binary, GL_ARB_get_texture_sub_image, GL_ARB_gl_spirv, 
  GL_ARB_sparse_texture_clamp, GL_ARB_spirv_extensions, 
  GL_ARB_get_program_binary, GL_ARB_get_texture_sub_image, GL_ARB_gl_spirv, 
  GL_ARB_spirv_extensions, GL_ARB_stencil_texturing, GL_ARB_sync,
There it is: <https://registry.khronos.org/OpenGL/extensions/ARB/ARB_gl_sp...>. Not in OpenGL ES, though.


> At a Vulkanised 2023 discussion round, Khronos admitted that they aren't going to improve GLSL any further and, ironically, rely on Microsoft's HLSL work as the main shading language to go alongside Vulkan.

That sounds intriguing, but I haven't been able to find any references to it (I guess it was discussed in the panel, but the video of it is private). Do you have any references or more information on it?

Is it related to adding HLSL support to Clang?



> Khronos admitted that they [...] ironically rely on Microsoft's HLSL work as the main shading language to go alongside Vulkan.

So Cg ultimately prevailed over GLSL. Can't say that disappoints me.


It's still early days for Vcc; I outline the caveats on the landing page. While I'm confident the control-flow bits and whatnot will work robustly, there's a big open question when it comes to the fate of standard libraries; the likes of libstdc++ were not designed for this use case.

We'll be working hard on it all the way to Vulkanised; if you have some applications you can get up and running by then, feel free to get in touch.

I think the driver ecosystem for Vulkan is rather high-quality, but that's more my (biased!) opinion than something I have hard data on. The Mesa/NIR-based drivers in particular are very nice to work with!


Thoes "existing libraries" does not necessary mean stdc++, but some parallel primitive, and are essential to performance portability. For example, cub for scan and reduction, cutlass for dense linear algebra[1].

> I think the driver ecosystem for Vulkan is rather high-quality

Sorry, I meant OpenGL. At the time of evaluation, the market share of Vulkan on Android devices was too small, so it was ruled out at a very early stage. I'd assume the situation has changed a lot since then.

It is really good to see more projects take a shot at compiling C++ to the GPU natively.

[1] cutlass itself is not portable, but the recently added CuTe is quite portable, as far as I evaluated. It provides a unified abstraction for hierarchical layout decomposition along with copy and GEMM primitives.


Will C++17 parallel algorithms be supported?

https://on-demand.gputechconf.com/supercomputing/2019/pdf/sc...

Edit: Never mind, I think I have misunderstood the purpose of this project. I thought it was a CUDA competitor, but it seems like it is just a shading-language compiler for graphics.


SYCL/DPC++ are the only viable CUDA competitors I would say, assuming that the tooling gets feature parity.


Circle is also well worth checking out.


See https://github.com/google/clspv for an OpenCL implementation on Vulkan Compute. There are plenty of quirks involved because the two standards use different varieties of SPIR-V ("kernels" vs. "shaders") and provide different guarantees (Vulkan Compute doesn't care much about numerical accuracy). The Mesa folks are also looking into this as part of their RustiCL (a modern OpenCL implementation) and Zink (implementing OpenGL and perhaps OpenCL itself on Vulkan) projects.


chipStar (formerly CHIP-SPV) might also be worth checking out: https://github.com/CHIP-SPV/chipStar

It compiles CUDA/HIP C++ to SPIR-V that can run on top of OpenCL or Level Zero. (It does require OpenCL's compute-flavored SPIR-V, instead of the graphics-flavored SPIR-V seen in OpenGL or Vulkan. I also think it requires some OpenCL extensions that are currently exclusive to Intel NEO, but those should, on paper, be coming to Mesa's rusticl implementation too.)


GCC supports nvptx and amdgcn offload targets via OpenMP offloading and OpenACC. I have no idea how well it works.
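
For reference, OpenMP target offloading looks roughly like this (a minimal sketch; it assumes a GCC build configured with nvptx or amdgcn offload support, e.g. -fopenmp -foffload=nvptx-none, and simply runs on the host otherwise):

  #include <cstdio>
  #include <vector>

  int main() {
      std::vector<float> a(1024, 1.0f), b(1024, 2.0f);
      float* pa = a.data();
      float* pb = b.data();
      // Map the buffers to the device, run the loop there, copy results back.
      #pragma omp target teams distribute parallel for map(tofrom: pa[0:1024]) map(to: pb[0:1024])
      for (int i = 0; i < 1024; ++i)
          pa[i] += pb[i];
      std::printf("%f\n", pa[0]); // expect 3.0
      return 0;
  }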


All of these are JIT-compiled by the driver in the end, though?


Not exactly; both CL and GLSL can be AOT-compiled, but then you are limited to some newer runtime version and the market coverage becomes niche. Those vendors are so lazy about updating drivers and fixing compiler bugs...


Having worked for a few of those lazy vendors: the AOT output usually just ends up being bitcode, which is fully JIT-compiled later.


Are vendor-specific compilers generally Clang-based? Emcc (Emscripten), aocc (AMD), and now vcc (Vulkan). I'm not sure about nvcc (NVIDIA) or icc (Intel).


Intel's modern compilers (icx, icpx) are Clang-based. There is an open-source version [1], and the closed-source version is built atop this with extra closed-source special sauce.

AOCC and ROCm are also based on LLVM/clang.

[1] https://github.com/intel/llvm


I forgot to ask my actual question, which was "why are vendor compilers based on Clang rather than GCC?", but I assume it's related to LLVM?


LLVM is a big part of it. GCC is more opaque (and monolithic) in comparison. Having an intermediate language and a modular architecture makes it easier to build specialized compilers.


Another selling point is the Clang CFE, which amortizes the complexity of C++ parsing and semantic analysis, much to the chagrin of the Edison Design Group.


Meanwhile, EDG are the ones coming up with a usable reflection proposal for C++26, and since Apple and Google stepped back, not many of those profiting from Clang are that eager to contribute to the upstream frontend.


I was surprised to see they were even still active, much less leading the way on reflection. Hopefully they will manage to stick around.

Anyway, such a pessimistic view of Clang/LLVM is unwarranted IMO. I have yet to see any metrics that imply their abandonment. Also, considering that Google is likely closer to 300 million LOC than 200 million, they really don't have a choice. Likewise for Apple unless they’ve given up on WebKit and LLVM for Swift.


One metric is the amount of red in cppreference.

Apple and Google are perfectly fine with C++17 for their purposes, which is the version used by LLVM to compile itself.


> One metric is the amount of red in cppreference.

Sometimes one implementation's red box is better than another implementation's green box.

Clang supports the majority of C++20. The biggest blocker on coroutines is a Windows ABI issue, and concepts will be ready after a few DRs are polished off. Basically, other than some minor CTAD and non-type template parameter enhancements, the biggest feature lagging behind is modules. IMO, I'm not ready for modules, my dependencies are not ready for modules, and neither is my IDE nor my build system. So, for my interests, C++20 support in Clang is more than sufficient.

> Apple and Google are perfectly fine with C++17 for their purposes, ...

Well, yes, and they will likely be using C++17 for quite a bit longer. In particular, not only is C++20 a massive update to the language, but the rate of development in the llvm-project has far exceeded the rate at which maintenance capacity can be increased. Not to mention a myriad of other issues, notably the architectural problems in Clang.


LLVM is so much easier to work with.

You can get started on reasonable optimizations and be done with a prototype in hours, compared to days or weeks in gcc.


If I had to guess, permissive licensing, better platform support, and the perception of LLVM being easier to target than GCC.


LLVM is a library for building compilers, GCC is not.

It's pretty much that simple.
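
For example, emitting a trivial function with LLVM's C++ API takes only a few lines (a minimal sketch; build against the LLVM headers and core libraries):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/Verifier.h"
  #include "llvm/Support/raw_ostream.h"

  int main() {
      llvm::LLVMContext ctx;
      llvm::Module mod("demo", ctx);
      llvm::IRBuilder<> b(ctx);

      // Build: define i32 @add(i32, i32) returning the sum of its arguments.
      auto *i32 = b.getInt32Ty();
      auto *fnTy = llvm::FunctionType::get(i32, {i32, i32}, false);
      auto *fn = llvm::Function::Create(fnTy, llvm::Function::ExternalLinkage, "add", &mod);
      b.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", fn));
      b.CreateRet(b.CreateAdd(fn->getArg(0), fn->getArg(1)));

      llvm::verifyModule(mod, &llvm::errs());   // sanity-check the generated IR
      mod.print(llvm::outs(), nullptr);         // dump it as textual LLVM IR
      return 0;
  }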


It is called GNU Compiler Collection for a reason.


Also Microsoft DirectX Shader Compiler[1] for HLSL. In fact, effort is ongoing to upstream HLSL support into Clang.[2][3][4]

To answer your other question—why LLVM instead of GCC:

- First-class support for non-Unix/Linux platforms. LLVM can be built on Windows and Visual Studio without ever needing a single GNU tool besides git[5]. Clang even has a MSVC-compatible interface that allows MSVC developers to switch to Clang without needing to change their command-line invocation[6].

- Written in C++ from the ground up, with a modular, first-class SSA IR-based interface.

- Permissive Apache 2.0 licence. As much as this might exasperate the open-source community, it allows for significantly faster iteration; things tend to be upstreamed when private/corporate developers realise it is hard to maintain separate forks.

All this allows LLVM to have a pretty mature infrastructure; some very large companies have contributed to its development.

[1]: https://github.com/microsoft/DirectXShaderCompiler

[2]: https://clang.llvm.org/docs/HLSL/HLSLDocs.html

[3]: https://github.com/microsoft/DirectXShaderCompiler/wiki/Cont...

[4]: https://discourse.llvm.org/t/rfc-adding-hlsl-and-directx-sup...

[5]: https://llvm.org/docs/GettingStartedVS.html

[6]: https://clang.llvm.org/docs/UsersManual.html#clang-cl


> Written in C++ from the ground up, with a modular, first-class SSA IR-based interface.

LLVM IR is a great accomplishment, but the design of Clang causes tremendous pain. In particular, the lack of any higher-level IR(s) in Clang results in tremendous complexity, poorer diagnostics, and missed performance optimizations. Moreover, while initiatives like ClangIR [1] may exist, the fact is that a transition to MLIR will be long and messy, to say the least. Indeed, such a transition would likely involve the complete duplication of Clang internals while downstream implementations migrate. In comparison, an implementation like GCC is free to define and modify new IRs as needed; e.g. GIMPLE has three major forms, IIRC.

[1]: https://llvm.github.io/clangir//


> without ever needing a single GNU tool besides git[5]

Since when is git GNU?


My bad—I meant GPL.


Seems like a huge pain relative to freestanding C++ to LLVM IR, but more options in this space are welcome.


Well, that is amazing news!!!


This looks interesting; I wonder how this would help Apple M1 Macs' ARM to x86/x64 emulation.


Long overdue. Please give CUDA some competition. It's always been possible to do general purpose compute in Vulkan, but the API was high-friction due to the shader language, and the glue between shader and CPU. It sounds like Vcc is an attempt to address this.


> the API was high-friction due to the shader language, and the glue between shader and CPU

Direct3D 11 compute shaders share these things with Vulkan, yet D3D11 is relatively easy to use. For example, see this library, which implements ML-targeted compute shaders for C# with minimal friction: https://github.com/Const-me/Cgml The backend, implemented in C++, is rather simple; it just binds resources and dispatches these shaders.

I think the main usability issue with Vulkan is API design. Vulkan was only designed with AAA game engines in mind. The developers of these game engines have borderline unlimited budgets, and their requirements are very different from ordinary folks who want to leverage GPU hardware.


On top of that, while Vulkan has existed for only a fraction of OpenGL's lifetime, it isn't shy about outpacing it in extension spaghetti.


> Please give CUDA some competition.

I doubt Vulkan, a low-level graphics API with compute shaders, could ever directly compete with CUDA, a pure-play GPGPU API. Although it isn't hard to imagine situations in which Vulkan compute shaders are sufficient to replace and/or avoid CUDA.


I wonder why people haven't coalesced around HLSL; I vastly prefer defining vertices as structs and having cbuffers over declaring every individual uniform or vertex variable with its specific ___location. It makes much more sense to me to define structs with semantics than to say ___location(X) for each variable or address uniforms by name/string. I often see GLSL-to-HLSL transpilers, but not vice versa, which is weird because HLSL is the more ergonomic language to me.


There's no need for transpilers these days; you can just compile HLSL to SPIR-V bytecode with dxc.

https://github.com/microsoft/DirectXShaderCompiler


I would even advise authoring the shaders directly in SPIR-V, with a basic SSA checker and a text-to-binary translator, and doing so very conservatively to avoid any driver incompatibility (aka nothing fancy).

I wonder if there is an open-source HLSL-to-SPIR-V compiler written in simple, plain C... but I am ready to work a bit more to avoid depending on a complex HLSL compiler.


The quantity and type of SPIR-V issues which DirectXShaderCompiler has make me believe it is not really used in serious settings.

Microsoft also directly states in the README it is a community contribution, which seems an intentional choice of language.

I'd love to be proven wrong, though


Can WebGL run SPIR-V? That was my main issue


I'm pretty sure you have to transpile SPIR-V shaders to GLSL first. WebGPU has its own Rust-inspired language as well.


It's worth noting that SPIR-V isn't really compiled for the target ISA; it's not even guaranteed to be in any particular optimized form. So when the GPU driver loads it, it may or may not decide to spend a lot of time running optimization passes before translating it to the actual machine code the GPU needs.

In contrast, Metal shaders can be pre-compiled to the actual ARM binary code the GPU runs.

And DirectX shader bytecode, DXIL, is (poorly) defined to be a low-level IR that LLVM spits out right before it would be translated to machine code, rather than a high-level IR like SPIR-V. I.e. it is guaranteed to be in an optimized form already, so drivers do not expect to run any optimizations at load time.

SPIR-V seems a bit of a mess here, because you don't really know what the target GPU is going to do. How much time the driver spends optimizing the SPIR-V at load time varies between mobile and non-mobile GPUs, depending basically on what the GPU manufacturer felt behaved best given how people making games distribute optimized or unoptimized SPIR-V.

Valve even maintains a fancy database of (SPIR-V, GPU driver) pairings which map to _actual_ precompiled shaders for all games distributed on their platform, so that they aren't affected by this.

Whew, what a mess shaders are.


Which is really how the experience with portable APIs from Khronos goes; most people who claim they are portable never really did any serious cross-platform, cross-device programming with them.

At the end of the day there are so many "if this, if that, load this extension, load that extension, do this workaround, do that workaround" that it could just be another API for all practical purposes.


> I wonder why people haven't coalesced around HLSL

I mean, people kind of have, because Unity uses it (and Unreal too I think).

> I vastly prefer defining vertices as structs

I agree, and note that WGSL lets you do this.

> It makes much more sense to me to define structs with semantics over saying ___location(X) for each var or addressing uniforms by name/string.

I feel like semantics are a straitjacket, as though the compiler is forcing me to choose from a predefined list of names instead of letting me choose my own. Having to use TEXCOORDn for things that aren't texture coordinates is weird.

> I often see GLSL to HLSL transpilers, but not vice-versa, which is weird because HLSL is the more ergonomic language to me

Microsoft and Khronos support this now [1]. Microsoft's dxc can compile HLSL to SPIR-V, which can then be decompiled to GLSL with SPIRV-Cross.

In general, I prefer GLSL to HLSL because GLSL doesn't use semantics, and because GLSL has looser implicit conversions between scalars and vectors (this continually bites me when writing HLSL). But I highly suspect people just prefer what they started with and perfectly understand people who have the opposite opinion. In any case, WGSL feels like the best of both worlds and a few newer engines like Bevy are adopting it.

[1]: https://www.khronos.org/blog/hlsl-first-class-vulkan-shading...


They have; HLSL has practically won the shading-language wars in the games industry. Even on consoles, the PlayStation and Switch shading languages are heavily inspired by HLSL.

Also, as I noted in another comment, Khronos is basically leaving it to Microsoft's HLSL as the de facto shading language for Vulkan, as they don't have the monetary resources, and no one else is interested, to improve GLSL as a language.

However, given the MSL and HLSL 2021 improvements, alongside SYCL and CUDA, eventually C++ will take that spot, and I doubt this is an area where any of the wannabe C++ replacements can do better.


Note that WGSL is closer to Rust than to C++ (syntactically speaking, anyway), and it's pretty much guaranteed to get traction as it'll be the only way to talk to the GPU using a modern API on the Web.


I doubt it; everyone is generating WGSL from their engine toolchains.

I don't know anyone that is happy with the idea of WGSL, and the amount of work it has generated.

Other than the WebGPU folks, that is.


I'm happy with it. It's a nice improvement over HLSL/GLSL in Bevy.


S̶o̶u̶n̶d̶s̶ ̶c̶o̶o̶l̶,̶ ̶b̶u̶t̶ ̶t̶h̶i̶s̶ ̶r̶e̶q̶u̶i̶r̶e̶s̶ ̶y̶e̶t̶ ̶a̶n̶o̶t̶h̶e̶r̶ ̶l̶a̶n̶g̶u̶a̶g̶e̶ ̶t̶o̶ ̶l̶e̶a̶r̶n̶[0]. As someone who only has limited knowledge in this space, could someone tell me how comparable is the compute functionality of rust-gpu[1], where I can just write rust?

[0] https://github.com/Hugobros3/shady#language-syntax

[1] https://github.com/EmbarkStudios/rust-gpu

Edit: No new language to learn is required; this was about the IR.


I often see folks complain about or express confusion over shading languages, but I think the issue is often incorrectly framed as the syntax of the language, when in reality the confusion is about the underlying GPU concepts.

Like, people (naturally) want to say "I know how to write X, and therefore if X can run on the GPU then I can just write X for the GPU and be productive!", but in reality programming for a CPU and a GPU are just... very, very different.

Vector processing differences.. shader stage concepts.. primitive types like textures/samplers.. bindings.. push constants.. uniforms.. recursion limitations.. dynamic array limitations.. invocation groups.. control flow limitations.. extension hell.. -- these aren't going anywhere

Heck, even if you stick with just modern compute APIs like CUDA, as close as you can get to treating the GPU as a general compute device, the way you write code for a GPU is still going to be very, very different from how most people write code for the CPU.


> Vector processing differences.. shader stage concepts.. primitive types like textures/samplers.. bindings.. push constants.. uniforms.. recursion limitations.. dynamic array limitations.. invocation groups.. control flow limitations.. extension hell.. -- these aren't going anywhere

The point of Vcc/Shady is to address some of these; in particular, recursion is now possible in Vulkan, and control-flow is no longer limited. A lot of those are just historical language limitations and can be eliminated with enough effort.

The elephant in the room is SIMT and the subgroup/workgroup considerations, which don't really require any changes to syntax but do need the programmer to have a good mental model. But I don't think that would be incompatible with raising the bar on expressiveness or host/device code interoperability!


> recursion is now possible in Vulkan, and control-flow is no longer limited.

If you care about mobile devices (even modern Android phones), I suspect this is not true.


"I often see folks complain/express confusion about shading languages.. "

The confusion already starts with the name. Historically, shading languages once had the purpose of calculating shading, but otherwise there are no shades involved; a more correct and helpful name would just be "GPU language". And yes, it is very different from writing something for a CPU.


As others have commented, Shady is an IR that happens to have a Clang front-end bolted on; that combination is what I call Vcc. I really don't expect people to start writing shaders in the IR directly, even though it has a textual form!

It's very comparable to Rust-GPU. I actually know a really nice person who does key work on that (you know who you are!), and they're facing very similar challenges, so we get to exchange ideas regularly. I am confident it's in good hands.


IDK that shady is that far from literally any other C-like shader language. Any skill you have in that area is going to transfer (unless you only know Rust, lol)


No, the entire project is about using existing languages (C/C++) on GPUs; the project you linked is their internal IR (intermediate representation), which is what they compile the C/C++ down to before it is turned into a Vulkan compute shader.


Ahh, thanks for the clarification!





