Classes vs. structs in .NET: how not to teach about performance (sergeyteplyakov.github.io)
144 points by GOPbIHbI4 on Nov 4, 2023 | 90 comments



Urgh. People who don't understand LINQ misusing it the way exposed here is painful to watch.

Sure, it took me a few head-scratches myself to originally understand lazy/unintentional multi-evaluation, but for the original article to still push a PERFORMANCE article through even while the results so clearly say something very unexpected is happening boggles the mind.

Do not single-mindedly bulldoze ahead with a hypothesis, even if you are in a teacher's role. This reminds me of the "nothing to see here" Police Squad scene, with the second example being 100x slower just getting ignored lol


I've seen LINQ lazy evaluation causing problems before, but mostly in the context of unit tests (where some code under test was simply never invoked, despite code coverage statistics looking OK).
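
Roughly the shape of that failure mode (illustrative sketch; ProcessItem is a hypothetical stand-in for the code under test):

  using System.Linq;

  var items = new[] { "a", "b" };

  // Lazy: building the query executes nothing, so ProcessItem has not
  // been called yet.
  var processed = items.Select(x => ProcessItem(x));

  // If the test never enumerates 'processed' (no .ToList(), .Count(),
  // foreach, ...), ProcessItem never runs at all, yet coverage of the
  // surrounding lines can still look OK.

  static string ProcessItem(string x) => x.ToUpperInvariant(); // hypothetical code under test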

It's clear that lambdas can be confusing to both humans and tooling, and fixing the latter seems the most viable. Visual Studio greying out LINQ lambda code that isn't reachable given current invocation patterns would be a nice start, and doesn't seem unfeasible to me given the kind of code analysis already done...


LINQ is so simple! I don’t understand how people can produce OOP monster hierarchies with mutating references all over but LINQ stumps them. You’re right that this is common, however.


LINQ is simple, but it represents a way of thinking about computation that is wildly different from what has historically been available in popular programming languages, and it interacts poorly with how most people are taught to program.

I see the same thing when teaching people to program for Apache Spark (which took inspiration from DryadLINQ). They keep wanting to do things in an imperative way. It typically results in unnecessarily complicated code, and it always interacts terribly with the computational model. But it's also the way they've been conditioned to think over years or decades, and, from what I've seen, is often so deeply baked into their thought process that doing it any other way is, at best, deeply uncomfortable.


AFAIK Rider -usually- does a good job of squiggle-warning anything that will result in multiple enumeration of an unknown IEnumerable, at least.

This is in fact important for two reasons:

1. General inefficiencies as mentioned.

2. There are plenty of `IEnumerable` things out there that -cannot- be repeatedly enumerated. Not the majority, but enough that you can run into them day-to-day.


this whole post is kinda confusing to me. I know barely anything about c# under the hood. the only language I've ever used in performance sensitive use cases is c++.

to me, all of the benchmark code looks like an obvious opportunity for dead store elimination. especially this loop in the second round:

  for (var i = 0; i < classes.Count(); i++)
  {
      var x = classes.ElementAt(i).Name;
  }
it doesn't look like any side effects are possible there and x is never referenced outside of the loop. can anyone ELI5 why the compiler generates any code for that snippet?


It's impossible to guarantee that Count() and ElementAt() have no side effects unless you fully devirtualize everything, which can't be done at compile time since the application's dependencies (SDK/runtime, third-party libraries) could be swapped out before it runs.

So these optimizations would have to occur in the JIT and might come at the cost of worse startup time or memory usage.

FWIW, modern .NET is getting pretty good at devirtualization, but I don't expect it would optimize all of this out.


https://stackoverflow.com/questions/3202464/garbage-collecti...

Extrapolating to the broader question, this SO thread explains it a little bit.

The GC has changed quite a bit since the top answer was written; I'd read up on the changes to the .NET CLR/framework since v5/v6.


interesting thread, but doesn't really answer my question.

as a human, it is obvious from that snippet alone that the variable x is never read. therefore, unless `classes.ElementAt(i).Name` has some side effect, the entire loop could be replaced with a no-op without changing the program semantics.

so my question isn't about GC behavior at all. I expect the entire loop to be optimized away (no code emitted). why doesn't c# make this optimization? is it something subtle about how properties work, or does the compiler not attempt these optimizations in general?


Property accessors (`.Name`) and methods (`ElementAt(i)`) can have side effects. From just eyeballing the code, that together with the GC issue would make the compiler in my brain wary of removing the loop. I don't personally know if it's possible to convince the C# compiler that it's safe.


... I could be wrong, but I -thought- it was possible for the JIT to see there are no side effects and inline a property access (so long as the property is not a virtual call, or can otherwise be devirtualized.)

.ElementAt() OTOH has a chance to throw in most implementations AFAIR so yeah, a bit of a moot point in this case.

Edited to add:

Actually, there -could- still be side effects in the case of .Name on a class, specifically, it's possible that .ElementAt() could return a null. I'm not sure what cases (if any) that the JIT could get around that.

OTOH, in the case of a struct, as long as .ElementAt() -doesn't- throw, .Name will always return regardless of if it is a property or field, and as part of a struct the compiler should do a good job of inlining as long as the access has no side effects (and you don't have too many fields on the struct!)


Just taking a stab at it; this might not be exactly right --

It's kind of down to C# being compiled into bytecode and then JIT compiled at run-time. During the initial compilation phase, the compiler doesn't necessarily have enough information to know whether `ElementAt()` or `Name` has side effects. (I assume here that Name is a property getter and not a field, in keeping with .NET conventions.) And then at run time the JIT compiler isn't as aggressive as an AOT compiler would typically be about optimization, so it may be less likely to do any dead store elimination.


Pretty much correct from a historical sense.

On top of this, recent advancements in .NET have led to Native AOT.

Something to look into.


Nowadays the major performance difference between languages is memory handling and allocation.

Microsoft has been making major moves in how C# gets compiled (AOT/JIT/Native). This is a concurrent effort alongside cross-platform support.

In doing so, they've minimized performance differences with other languages. The only thing they've yet to completely tackle is memory handling, so in reality, while it might not seem like it at first, your question is about that subject. There are also some small tidbits with property accessors that the other comments have noted; to my knowledge these will be optimized away soon.

The GC is responsible for allocation and destruction.


> Nowadays the major performance difference between languages is memory handling and allocation.

And thread management.

> The only thing they've yet to completely tackle is memory handling,

And thread management.

Async/Await did a whole lot to help with concurrency, IValueTaskSource and IThreadPoolWorkItem helped bring the allocation cost for that back down...

But I still don't have a good way to, say, hint to the scheduler that 'these async work loops are important enough that I want them to always run in this dedicated group of threads'.
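
The closest approximation I know of is a custom TaskScheduler over pinned threads; a minimal sketch (class name made up), which still falls well short of a first-class scheduler hint:

  using System.Collections.Concurrent;
  using System.Collections.Generic;
  using System.Threading;
  using System.Threading.Tasks;

  // Runs everything queued to it on a fixed set of dedicated threads.
  // Absent a SynchronizationContext, awaits inside tasks started on this
  // scheduler post their continuations back to these same threads.
  public sealed class DedicatedThreadScheduler : TaskScheduler
  {
      private readonly BlockingCollection<Task> _queue = new();

      public DedicatedThreadScheduler(int threadCount)
      {
          for (var i = 0; i < threadCount; i++)
          {
              var t = new Thread(() =>
              {
                  foreach (var task in _queue.GetConsumingEnumerable())
                      TryExecuteTask(task);
              }) { IsBackground = true };
              t.Start();
          }
      }

      protected override void QueueTask(Task task) => _queue.Add(task);

      // Refuse inlining so work stays on the dedicated threads only.
      protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
          => false;

      protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();
  }

You'd kick off a loop with Task.Factory.StartNew(() => RunLoopAsync(), CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap() - workable, but clunky compared to an actual hint API.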

Also, having a way to get a high-precision Sleep() without hacks that impact the rest of the system would be nice too.


Similar to javac, the C# compiler barely does any optimizations ahead of time. Optimizations are generally done at runtime by the JIT compiler in the CLR virtual machine.


As the author says, the list of names is actually read from the file on every single iteration because `classes` is a LINQ query. So there are all sorts of potential exceptions, etc.
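
In sketch form (file name and types illustrative):

  using System.IO;
  using System.Linq;

  // Deferred query: nothing is read when this line executes.
  var classes = File.ReadLines("names.txt")
                    .Select(x => new PersonClass { Name = x });

  // Count() does a full pass on every loop test, and ElementAt(i) walks
  // from the start each time, so the file is re-opened and re-read over
  // and over: that's the accidental O(N^2), plus all the IO exceptions
  // that can now surface mid-"iteration".
  for (var i = 0; i < classes.Count(); i++)
  {
      var x = classes.ElementAt(i).Name;
  }

  public class PersonClass { public string Name { get; set; } }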


> it doesn't look like any side effects are possible there

Putting side effects in an `ElementAt` implementation would be an extremely bad idea, but C# won't actually stop you.


This is where a well-placed .ToList() is called for, to reify the enumeration and avoid re-evaluating it.

e.g.

  var classes = Names.Select(x => new PersonStruct { Name = x }).ToList();

  for (var i = 0; i < classes.Count; i++)


I also believe that an IDE like Rider will complain that the IEnumerable returned by `.Select` is consumed multiple times.


Yes, and both Rider and VS with Resharper will offer a refactor.


Good point! I remember that being a lint in Visual Studio. A very valid one.

In contrast, in Python, the initial use exhausts a generator; subsequent iterations turn up empty. A gotcha, but also a way to highlight misuse, since it should show up in testing.


It's so appalling that paid content could be that bad. For me, the first benchmark, which does nothing in both cases, is as shocking as the second.

But the thing I wanted to highlight is that the blog author is a real authority on structs vs. classes at a very, very low level. His series on struct performance is a must-read for any .NET dev who cares about performance at a low level. E.g. when to use readonly structs, or when to use a mutable one to avoid excessive copying. That kind of thing. https://devblogs.microsoft.com/premier-developer/author/sete...

He has published two analyzers on NuGet, and both are must-haves. One, ErrorProne.NET.Structs, is focused on struct usage and, for example, highlights cases of defensive copies and can suggest when to make a struct readonly.

https://devblogs.microsoft.com/premier-developer/avoiding-st...

https://www.nuget.org/packages?q=errorprone


I wish paid programming content were on average better. My assumption for paid content (outside of a university course) is that it will be awful. Paid programming content has consistently fallen short of my expectations. Coursera is much better than average here IMO, with courses like Functional Programming in Scala by Odersky and Andrew Ng's AI course. Pluralsight, Udemy, Udacity: all of them I have found to be really lacking.


YouTube and InfoQ are free and have lots of great content.

.NET is probably lacking good content. Yet many things from Java are directly applicable.

E.g. on benchmarking & performance, there are true gems on InfoQ by Gil Tene, Martin Thompson, et al. I would pay for that content after watching it. A problem with paid courses is that payment comes before evaluation. Maybe both sides don't care in cases such as corporate spend on continuing education...


Can we just go ahead and call this straight up fraud? If you advertise a paid course on C# and don't have the first clue about the language, that can't be assumed to be in good faith.


Another course on another platform that overpromises and underdelivers - it is just noise. I did not want to discuss this subject beyond expressing some surprise. Fraud is too loaded a word, but this fits perfectly here: https://en.wikipedia.org/wiki/Hanlon%27s_razor

Benchmarking is hard. If not an art, then at least a distinct skill. It goes beyond just using BenchmarkDotNet; that's a good first step but far from enough. Oftentimes one needs to make sure the compiler does not optimize away some benchmark paths, e.g. by using volatile field accesses, especially if you want to avoid overheads and measure only certain things. But here it was so obvious it could be meme-tagged #YouHadOneJob.
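
For instance, with BenchmarkDotNet the usual guard is to return the computed value so it stays observable (a minimal sketch):

  using System.Linq;
  using BenchmarkDotNet.Attributes;

  public class SumBenchmark
  {
      private readonly int[] _data = Enumerable.Range(0, 1_000).ToArray();

      // Returning the result prevents the JIT from treating the loop as
      // dead code and eliminating the very work being measured.
      [Benchmark]
      public int SumLoop()
      {
          var sum = 0;
          for (var i = 0; i < _data.Length; i++)
              sum += _data[i];
          return sum;
      }
  }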


> Hanlon's razor

Just because you put a name on a plain wrong idea doesn't make it true.

> Benchmarking is hard.

There were no benchmarks since the author didn't even understand how to write simple C#. A subtly flawed benchmark would be a whole different story.


> #YouHadOneJob

Author has made a career out of providing teaching materials, but hasn't worked to solve actual problems (e.g. in industry) for so long that their skills have degraded. One possible explanation.


The n^2 argument the author points at is a red herring. The reason this is an invalid criticism of the benchmark is that both benchmarks use the same query structure, so they're both n^2. The author himself admits later on that the real issue is allocations. However, they posit that allocation of structs is done using a different allocator than classes. I don't know enough about C# to know if this is true, but even so, the advice that "structs are more performant overall" still holds... so this article seems to be mostly clickbait.


The original benchmark author is just really unclear what they're trying to benchmark. "Class vs struct performance" is meaningless, because performance at what? The first benchmark does nothing, as the article points out. The second benchmark tests the performance of large numbers of object creations; but why is ElementAt even being called in a loop there? It's confusing since it's irrelevant to what actually ends up taking time. If you're benchmarking the time it takes to create the struct/class objects, just write a benchmark that does that and don't include random other code!

And "structs are more performant" isn't even a correct conclusion; they're pass-by-value, so you could construct benchmarks where the copy time outweighs heap memory allocation time, eg constructing a large object once and passing it to a function many times.


I don't think you've really refuted the parent comment here. But, what you have done, is written a much better blog post than the original article :-)


I wasn't really trying to refute the parent! I agree that the article author focusing on LINQ making things secretly n^2 is a red herring (though important to know). But the more fundamental problem is the benchmark being very unclear in what it's benchmarking.


If struct copies are your problem you can always pass by ref, so even that isn't an easy claim to make.


The criticism aims at poorly written benchmarking code that fails to evaluate its own claim and uses an anti-pattern. You may want to read the article in full.

Also, structs do not use an allocator; this is a basic concept in many programming languages - they simply represent a structure in memory, which by default is placed on the stack. Think of an integer variable in a local method scope.


Allocation with a GC is typically not any more expensive than being "on the stack", so I don't think something being "on the stack" is a useful distinction.

(And in a language with stackful closures the stack itself is GCed.)


This is incorrect in multiple ways. C# stack is completely native, identical to C++ or Rust.

The following factors contribute to “structs being faster”:

- Heap allocations have to go through allocation calls, which need to find free memory, possibly zero it out, and then return a pointer (reference) to it, both in managed and unmanaged languages, with C# being much faster at small object allocation (a thread-local read, pointer bump, and object header write) while unmanaged wins for large allocations instead (you don't have to go through the LOH and the extra cost associated with it). In comparison, the stack is already zeroed for structs that are written to it, and those are just movs (or ldr/ldp's and str/stp's in the case of arm64), and even then, only when spilled to the stack at all (see below)

- Stack may not be the best way to describe it - think "local exclusively owned memory" which means that compilers, no matter how strict, can reason about the exact lifetimes of local values and the changes that happen to them. This means that all struct values can be promoted to CPU registers and never touch memory unlike with heap allocations, where multiple reads of the same property may require repeated dereferencing to account for the fact that object memory may be globally observable. This in turn applies to optimizations like CSE which can elide multiple identical checks against struct values knowing they won't change between operations.

- In .NET, generic method bodies for class-based generic arguments are shared (the closest example in Rust is Box<dyn Trait>-based dispatch, but with less overhead). However, struct generic arguments force monomorphization of the method body, i.e. emitting a specialized version for the exact generic type, which allows writing code with zero-cost abstractions the same way one would in Rust with generics or in C++ with templates (see the sketch below)
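
A small illustration of that last point (types are made up):

  public interface IAdder { int Add(int a, int b); }

  public struct StructAdder : IAdder { public int Add(int a, int b) => a + b; }
  public class ClassAdder : IAdder { public int Add(int a, int b) => a + b; }

  public static class Summer
  {
      // With T = StructAdder the JIT emits a specialized body where Add
      // is devirtualized and inlined (and nothing is boxed); with
      // T = ClassAdder, all reference-type instantiations share one
      // canonical body and the call stays an interface dispatch.
      public static int Sum<T>(T adder, int[] xs) where T : IAdder
      {
          var acc = 0;
          foreach (var x in xs)
              acc = adder.Add(acc, x);
          return acc;
      }
  }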


> In comparison, stack is already zeroed for structs that are written to it

This is not possible. A stack is a bump pointer allocator and is the same as any other bump pointer allocator. This includes having to decide when/if to zero memory. (The best time is on free because of memory compression, but most implementations don't do this.)

It's certainly not true that the unused part of the stack is always already zeroed; what if you already used it once? (But it is true if you zero on free.)

> - Stack may not be the best way to describe it - think "local exclusively owned memory" which means that compilers, no matter how strict, can reason about the exact lifetimes of local values and the changes that happen to them.

This is escape analysis and applies to anything with a known lifetime.


This is .NET and not the JVM; my previous comment describes how it works today.

Be it C++, Rust or C#, the necessary space on the stack is usually reserved in the function/method prologue when known statically. Additionally, because C# guarantees that all local variables/memory are initialized, the corresponding stack space is pre-zeroed (this is efficient since it is done with the widest applicable writes - scalar, SSE/AVX(2/512)/NEON, etc.; arm64 has DC ZVA, which kicks in above a certain threshold).

Regardless, the cost is not in bumping the offset/ptr or zeroing out the memory, it is in going through the allocator call (even if it's inlined, you're still executing more code) and the book-keeping required for heap allocations in general (both .NET's GC and allocators like Mimalloc do it), and then there is subsequent cost for tracking and collecting objects in the case of GC.

In addition, .NET does not do escape analysis because, again, it is not the JVM - while it may be added in the future, it is (relatively) unprofitable to do today because allocation traffic is far lower, since everything isn't a potentially escaping object, and structs or stack-allocated buffers are often used in performance-sensitive code (or wherever it makes sense in general). The way .NET views objects is similar to the way C++ views heap-allocated data, albeit with less aggressive (and often unsound or UB) assumptions compared to GCC. I cannot stress enough that while the JVM's escape analysis does lead to stack allocation of objects, the reasoning compilers can do about the state of data on the stack is what the JVM gets as a result of doing escape analysis, not vice versa. And other "unmanaged" languages are subject to similar limitations when it comes to stack vs. heap.


IIRC escape analysis in HotSpot won’t actually allocate the whole object on the stack. It’ll explode the object into its subfields and store them in registers, or spill them onto the stack if register pressure is high.


This is a common optimization in any compiler, not just HotSpot. Although it's more important and more extensive in the JVM where almost everything is a heap allocation, yes. (Although the technical term isn't "escape analysis"; that just covers analysis, not resulting optimization.)

I've noticed Microsoft seems to think ordinary compiler optimizations are deep magic they're very proud of not implementing. Do they just not have good enough compiler people?


> I've noticed Microsoft seems to think ordinary compiler optimizations are deep magic they're very proud of not implementing. Do they just not have good enough compiler people?

What in our discussion has prompted you to respond with an ad-hominem attack?


That's not an attack, unless you designed .NET, in which case I'm of course referring to your recruiters and not you.

Anyway, it's the part about how C# doesn't need to implement scalar replacement because it has structs. Do it anyway, it's good!

But I've also noticed (in some .NET developer blog post I couldn't find for you now) them talking about how they couldn't do inlining because it would take too long and be too slow, so they put in some very simple heuristics that did not look like a good trade-off. Inlining of course is often very beneficial and can decrease code size.


Well, this kind of attitude is just sad.

Rather than imagining issues .NET has without verifying them first and then complaining, I'd like to suggest to spot check assumptions with Godbolt[0] which would be a good start (it can't show DynamicPGO, NativeAOT-specific and some other opts but is still fairly illustrative).

A more comprehensive view of produced asm can be acquired with [DisassemblyDiagnoser] attribute when running code with BenchmarkDotNet [1] (in the Java world a similar solution is called JMH).

[0] https://godbolt.org/

[1] https://benchmarkdotnet.org/articles/guides/getting-started....


> Although the technical term isn't "escape analysis"; that just covers analysis, not resulting optimization.

Yeah I meant scalar replacement!


>C# stack is completely native, identical to C++ or Rust.

This is absolutely not true. Where are you getting this from? Pray tell, what do you think this is: https://github.com/dotnet/runtime


The odd thing is this behaviour is documented. Classes are reference types, always heap allocated, and passed by reference. Structs are value types, allocated inline (that is, inline in the containing object/array, or on the stack for local variables), and passed by value.

Allocation time is going to be around the same for both types due to the GC, but there are performance implications depending on what you do later. In particular, you can avoid garbage collection with appropriate use of structs, but pass-by-value copying can offset those improvements.
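
Illustrated (placeholder types):

  // One allocation: 1000 structs stored inline, contiguously.
  var structs = new PersonStruct[1000];

  // 1001 allocations: the array of references plus one heap object per
  // element, each of which the GC must track and later collect.
  var classes = new PersonClass[1000];
  for (var i = 0; i < classes.Length; i++)
      classes[i] = new PersonClass();

  public struct PersonStruct { public string Name; }
  public class PersonClass { public string Name; }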


To me the point seems to be that the benchmark is so misguided, and missed such an obvious error in the usage of LINQ, that the results are not relevant. You should not take perf advice from the authors of that course.


I disagree. Using an n^2 algorithm will exaggerate the difference between the two data types at higher values of n. Using a linear algorithm would give a more consistent perspective on the relative performance of the two data types.


> I dont know enough about C# to know if this is true

Do you know enough about C# to realize that .Select() by itself doesn’t materialize the collections, making the benchmark completely nonsensical?

The query structure is the same because it was a failed attempt to evaluate classes vs. structs, not one query vs. another.


The comment you're replying to is talking about the second benchmark, which does access the enumeration so does materialize the select().


The first giveaway was the use of LINQ in a performance post.


This is a strange comment. LINQ is an extremely common way to write code in C#, and the performance of code that uses it is certainly relevant. Additionally, this is a performance comparison post. If the baseline uses LINQ but compares something else, the other tests should also use LINQ.


>LINQ is an extremely common way to write code in C#

It is extremely uncommon in performance contexts. It is actively discouraged and removed when writing performant C# code.

It is incredibly common in your "run of the mill" enterprise apps, or contexts where performance can slow down a bit for the sake of programmer happiness.


It isn’t uncommon when you know what it is doing. Wholesale removal or discouragement of LINQ is a sign of fake cargo-cult performance “optimization”.

It’s perfectly fine to use if you learn about how it works and how to use it properly.


LINQ is going to add overhead, regardless of "properly" using it or "cargo-culting" things; save the platitudes for the Monday zoom meeting.

LINQ adding overhead is a _technical reality_; it's how it works, and that is fine. It's a fine tool in many different contexts, but when we talk about performant code the context is obviously one in which every cycle matters.

And those of us with enough experience know that LINQ performance and implementation details varies over time in the runtime, and those shifts aren't always positive.

So when writing code where performance is fundamental to the success of the application, avoid LINQ since it WILL add overhead and it will remove implementation control from your team. It is a risk without much benefit when you're in the performance arena. That doesn't mean it's not useful in many other contexts.


There are cases where using the straightforward LINQ code would be a lot faster than a lower level alternative. For example when the code can be vectorized and use AVX instructions, which is implemented for quite a few LINQ methods. A straightforward non-LINQ version of the code would likely be slower as most developers would not or can't write the low-level AVX version.

I'd certainly be careful about LINQ in certain performance-sensitive code, e.g. about creating unnecessary copies of the data and allocating too much. But I would not trust myself without measuring to really know whether it actually makes a difference or if my "optimized" code might be even slower.
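
For example (assuming a recent .NET where Max over an int[] takes the vectorized path):

  using System;
  using System.Linq;

  var data = Enumerable.Range(0, 1_000_000).ToArray();

  // One-liner that hits a SIMD fast path for arrays on modern .NET.
  var viaLinq = data.Max();

  // The "straightforward" hand-written version: scalar, one element per
  // iteration, and typically slower here.
  var viaLoop = int.MinValue;
  foreach (var x in data)
      if (x > viaLoop) viaLoop = x;

  Console.WriteLine($"{viaLinq} {viaLoop}");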


A lot of LINQ was also optimized and improved with the move to .NET Core (now just .NET). It's definitely important to actually profile the code, rather than just assume LINQ is slow/less-performant. Most of the time, unless the developer has added unneeded code (such as calling .AsEnumerable, or most anything that evaluates the entire collection), the difference between LINQ and standard iterator based code will be nearly non-existent, with some of the cases you mentioned where LINQ has been optimized beyond what a developer can do by hand.


> AVX instructions, which is implemented for quite a few LINQ methods

Are you sure? Any examples of such methods? And does AVX actually help?

I don’t think that’s possible, because IMO AVX and other SIMD can only help for dense inputs. The C# type for that is ReadOnlySpan; however, ReadOnlySpan doesn’t implement IEnumerable and is therefore incompatible with LINQ.

There’s even an alternative LINQ to workaround https://github.com/NetFabric/NetFabric.Hyperlinq but that thing is a third-party library most people aren’t using.



Interesting, I'd never heard about that. It was merged to master in February 2022; I wonder which release version includes the changes?

Still, the support seems very limited. They simply probe the argument type for arrays and lists. Any other IEnumerable is going to return false from TryGetSpan, which falls back to the legacy scalar implementation.


Pretty sure that's .NET 7-only stuff. I can vaguely remember a blog post about it: https://devblogs.microsoft.com/dotnet/performance_improvemen... (I think this is the one?)

It's faster on bigger arrays/lists, but on smaller ones it barely makes a difference; even LINQ vs. non-LINQ is basically only noise, as far as I remember.


Algorithms that work on non-contiguous or synthesized data are not subject to vectorization in any language, aside from select cases where LLVM is able to inline and auto-vectorize loops in Rust for certain simple cases of map->collect.

What is your argument then?


My point is that the support is very limited; it only works for a few trivial use cases, like sum/min/max/average over arrays of a few selected types.

I believe it’s technically possible to vectorize more complicated stuff in C#; the runtime library just isn't doing that. For an example, look at how the Eigen C++ library https://eigen.tuxfamily.org/index.php?title=Main_Page does its math. Under the hood, they wrap inputs into classes which supply SIMD registers, then do math on these registers. Eigen does that at compile time with template metaprogramming. A hypothetical C# implementation could do similar things using generics and/or runtime code generation. LINQ from the standard library was never designed for high-performance compute, but I think it might be possible to design a similar API for that.


How is that relevant to the vast majority of the code targeted by LINQ?

The niche scenario you have outlined is partially covered by a recent System.Numerics.Tensors package update (even though I believe it would have been best if there was a community-maintained package with comparable quality for a variety of reasons).

The goal of LINQ itself is to offer optimal codepaths when it can within the constraints of the current design (naturally, you could improve it significantly if not for backwards compatibility with the previous 15 or so years of .NET codebases). The argument that it's not good because it's not the tool to do BLAS is just nonsensical.

There is, however, an IL optimizer that can further vectorize certain common patterns and rewrite LINQ calls into open-coded loops: https://github.com/dubiousconst282/DistIL


> How is that relevant to the vast majority of the code targeted by LINQ?

The people I responded to were discussing applicability of LINQ. I think very fast sum of List<int> collections doesn’t compensate for suboptimal performance of pretty much everything else.

For 80% of problems that “suboptimal” is still fast enough for the job, but for the other 20% it’s important. Using the same C# language it’s often possible to outperform LINQ by a large factor, using loops, SIMD intrinsics, and minimizing GC allocations.

> partially covered by a recent System.Numerics.Tensors package

They don’t generate code in runtime, they treat C# as a slower and safer C. I’m pretty sure the higher-level parts of the runtime allow more advanced stuff, similar to expression templates in Eigen, but better because runtime codegen could account for different ISA extensions, and even different L1/L2 cache sizes.


I see, my bad, should not have responded in the first place.


LINQ is pretty much frowned upon when programming games in C#. Also, when doing a performance comparison, you want to get as close as possible to the actual code without the extra overhead.

I would very much verify anything and not take it at face value when a C# performance post uses LINQ.


LINQ codepaths are only getting faster. A literal army of engineers is focused on this stuff full time.

https://devblogs.microsoft.com/dotnet/performance_improvemen...

> dotnet/runtime#64470 is the result of analyzing various real-world code bases for use of Enumerable.Min and Enumerable.Max, and seeing that it’s very common to use these with arrays, often ones that are quite large. This PR updates the Min<T>(IEnumerable<T>) and Max<T>(IEnumerable<T>) overloads when the input is an int[] or long[] to vectorize the processing, using Vector<T>. The net effect of this is significantly faster execution time for larger arrays, but still improved performance even for short arrays (because the implementation is now able to access the array directly rather than going through the enumerable, leading to less allocation and interface dispatch and more applicable optimizations like inlining).

What are the chances that you'd have patience to write a competitive bug-free SIMD implementation?


Games in C# might be written in Unity, and a lot of those improvements wouldn't apply there. So in that context this might be accurate because it's an entirely different runtime.


The relationship between LINQ and performance is not trivial, it pretty much depends on what you do (more complex LINQ chain -> worse overhead).

It does have base cost (allocating iterator object(s)), but it's less than what you think, I have seen enough game code that does intermediate list allocations when it doesn't need to, which are far costlier than LINQ.

In addition, the benchmarks that do other positive work alongside the benchmarked aspect can sometimes be more illustrative and overall better because it is much more important how a particular approach works together with surrounding code, matching more closely real world scenarios.

And last but not least - in this case using structs yields additional advantage with LINQ since monomorphization of methods where generic arguments are structs has additional codegen quality benefits.


It is certainly possible to write slow code without LINQ; all I'm saying is that I wouldn't blindly trust a blog post that talks about performance and uses LINQ.


The article's contents suggest a deep understanding of the topic.

This type of thinking ("LINQ bad" or "SOLID good") is one reason among many why bad patterns proliferate through projects, e.g. "hey, you should rewrite this code with SOLID principles in mind" (without accounting for the context) or "this code calculates the sum using LINQ, you should rewrite it" (LINQ's Sum implementation uses SIMD and is hard to beat).


The article is fine; I was referring to the original article that this article is referencing.


fwiw, it's possible to use LINQ in games no problem if you provide your own implementation(s) of the core LINQ methods you care about. I'm using LINQ and async/await in my game with no problems (~900fps, one short gen0/gen1 GC every 70 seconds or so) since I did the work to write zero-allocation versions of the basic operators like Select and Where. The design of LINQ is such that the C# compiler will use whatever implementation of the operator(s) is available when it converts your SQL-y queries into actual code.

I suspect the BCL doesn't include zero-allocation query operators because they generalize poorly, but I'm not sure. Zero-allocation query operators end up looking like 'ZeroAllocSelect<TEnumerable, TSource, TResult> Select (this TEnumerable seq, Func<TSource, TResult> func) where TEnumerable : IEnumerable<TSource>' which is obviously not trivial for the JIT to compile (or trivial to write)

The closures it creates for your queries are kind of a pain though. It's possible that will have improved in .NET 8 or .NET 9, because the allocation rules for delegates were recently revised to allow more optimizations, but I don't know if that was fixed.
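
Not my actual operators, but the shape is roughly this (array-only sketch to keep it short; the Func delegate and any captured state can still allocate, which is the closure pain mentioned above):

  using System;

  public readonly struct SelectEnumerable<TSource, TResult>
  {
      private readonly TSource[] _source;
      private readonly Func<TSource, TResult> _selector;

      public SelectEnumerable(TSource[] source, Func<TSource, TResult> selector)
      {
          _source = source;
          _selector = selector;
      }

      // foreach binds to this pattern method, so iterating the query
      // never boxes an IEnumerator: both the query object and its
      // enumerator stay on the stack.
      public Enumerator GetEnumerator() => new Enumerator(_source, _selector);

      public struct Enumerator
      {
          private readonly TSource[] _source;
          private readonly Func<TSource, TResult> _selector;
          private int _index;

          public Enumerator(TSource[] source, Func<TSource, TResult> selector)
          {
              _source = source;
              _selector = selector;
              _index = -1;
          }

          public TResult Current => _selector(_source[_index]);
          public bool MoveNext() => ++_index < _source.Length;
      }
  }

  public static class ZeroAllocExtensions
  {
      // More specific than Enumerable.Select (TSource[] vs.
      // IEnumerable<TSource>), so overload resolution prefers it.
      public static SelectEnumerable<TSource, TResult> Select<TSource, TResult>(
          this TSource[] source, Func<TSource, TResult> selector)
          => new SelectEnumerable<TSource, TResult>(source, selector);
  }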


I actually love the syntax, if only it didn't alloc. Is your implementation open source?


Wouldn't it be _more_ informative to see a "real-world" project, possibly built using LINQ, and the performance comparison done within the context of that project?


If what you want to measure is LINQ performance, sure, but in the context of measuring a language fundamental like class versus struct, it is unnecessary overhead.

The article itself says it:

> However, the main reason why the benchmarks are not correct is because of LINQ and lazy evaluation.


Honestly, I wouldn’t mind knowing how fast LINQ is when using one versus the other.


Does .NET have any tool/profiler that can count the number of bytes copied (not allocations)? For example, if I want to benchmark two different functions and find out which of them copies more (and so does more work).


It's built into Visual Studio's profiler.


I believe the PersonClass should be sealed in a 1-1 comparison.
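
i.e. something like (illustrative):

  // Sealing rules out derived overrides, letting the JIT devirtualize
  // and inline member access, which is a fairer match for a struct's
  // non-virtual dispatch.
  public sealed class PersonClass
  {
      public string Name { get; set; }
  }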


Why so much focus on N^2?


It is quite sad that paid content like Pluralsight's can provide very bad advice on how to measure performance or how to use one or another language feature.

Spoiler: the benchmarking code had quadratic complexity instead of linear. Yes, in the course on performance.


> It seems that the complexity is O(N^2) rather than O(2*N^2) as I expected. This is interesting! Obviously, my understanding of LINQ was incorrect.

Perhaps just a wording problem, but big-O notation doesn't care about constant factors.


What the author means is that they thought that the complexity was O(n^2) for two different reasons, but it turned out that only one of those reasons was valid. But abusing big-O notation is not a good way to express that.


Not only does it not care, it’s literally by definition that those two are identical
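
Spelled out: f(N) = 2*N^2 satisfies 2*N^2 <= c*N^2 with c = 2 for all N >= 1, so by the definition f is in O(N^2); O(2*N^2) and O(N^2) denote exactly the same set of functions.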


In practice, though, the factor of two can be relevant.


The author explicitly stated O(2*N^2) is the same as O(N^2), but maybe that was a later edit?


Yeah, it's edited, but it's still miscommunicating: this section should drop the big-O if it wants to talk about constant factors.





