Found this the other day, and wanted to share. Source is here: https://github.com/scross99/locic. Pleasantly readable and surprisingly compact given that it implements polymorphism, templates, modules, etc, as well as code generation to LLVM.
Yes, this is true; the point was to highlight more generally the complexity of ensuring correctness when moving/copying C++ objects versus doing the same thing in Loci.
Yep, it's not the most attractive parser ever. I chose to use Bison in order to save a lot of time/work and to take advantage of its GLR implementation; unfortunately its interaction with C++ isn't pleasant (due to the use of a union type for the symbol values).
That looks fairly clean for a Yacc parser. I do wish there was a more modern, friendlier parser-generator; yacc C++ tends to leak AST objects on error unless you use a pool allocator.
I like this language because it doesn't try to do anything too fancy. It's "just" a streamlined C++.
I do wonder how they're planning on avoiding overhead with the garbage collector, though. Once you have a single garbage collected object on your heap, it seems like you'd have to start scanning the whole thing to make sure the garbage collected object is still alive.
The overhead of GC is independent of heap size. Take the following variables:
L = total size of live objects in heap
H = maximum size of heap
P = time between collections
R_a = rate of allocation
R_s = rate of heap scanning
It's easy to define P and R_s (assume a mark-sweep collector so we can ignore copy reserve):
P = (H - L) / R_a
R_s = L / P
Substitute and simplify:
R_s = L / ((H - L) / R_a)
R_s = L / (H - L) * R_a
In other words, the absolute size of your heap doesn't matter. If you keep the heap a fixed multiple of the live size, the amount of heap scanning you do will always equal your allocation rate times some constant determined by that ratio. If H = 2L, that constant is 1.
So the way to improve performance is to cut down on that allocation rate. If you use RAII for most objects, and GC only occasionally, your allocation rate, net of non-GC allocations, will be very low, which means your scanning rate will be very low. You'll have to traverse all live objects every GC cycle, but you'll rarely have to do a GC cycle.
Say we use your example of H = 2L. Then we have R_s = L / (2L - L) * R_a, or R_s = R_a. Sure.
Now say we double the heap: H = 4L. Then we have R_s = L / (4L - L) * R_a, or R_s = R_a/3. For a fixed allocation rate, increasing the heap makes the scan rate go down, i.e. we scan less often.
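Plugging in numbers makes the comparison concrete; a quick sketch (the L and R_a values here are arbitrary illustrations):

```python
def scan_rate(live, heap, alloc_rate):
    # R_s = L / P with P = (H - L) / R_a, i.e. R_s = L * R_a / (H - L)
    return live * alloc_rate / (heap - live)

L = 100.0    # live set size, arbitrary units
R_a = 60.0   # allocation rate, arbitrary units per second

print(scan_rate(L, 2 * L, R_a))  # H = 2L -> R_s = R_a = 60.0
print(scan_rate(L, 4 * L, R_a))  # H = 4L -> R_s = R_a / 3 = 20.0
```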
It seems like you're assuming that the live set is a fixed fraction of the heap size, but that doesn't make sense to me. The size of the live set is a function of what the program does, not its heap size.
Are you sure about this? The main overhead of a mark-and-sweep GC isn't the marking phase but the sweeping phase: you need to find and free all dead objects (to mark them as free/add them to the free list), and since you don't know where they are (by definition: being dead, they have no references pointing at them), you have to scan the whole heap (or at least all the pages that objects were allocated or live on since the last GC cycle).
If H is a constant multiple of L, then your scan rate will be a constant multiple of your allocation rate, independent of heap size.
In practice, you won't sweep the whole heap after each GC cycle, but will do it lazily: http://www.hboehm.info/gc/complexity.html. The point in that article about marking time dominating allocation/sweep is even more true today. Allocation and sweep access memory linearly and are easily parallelized. Meanwhile, marking accesses memory randomly, and the maximum concurrency is dependent on the shape of the live object graph.
I don't think that is generally true. Sweeping has much better cache locality than marking, which is following pointers all over who-knows-where in the heap.
Sure, but since any object could have a reference to a GC object, you still have to eat the latency of scanning the entire heap (not just the GC'd objects) every so often. That seems like a pretty big performance cliff.
1. Use the garbage collector.
2. Don't use the garbage collector.
I'd expect most users to choose option (2); however, the idea is to support garbage collection for those users who find it worthwhile.
I say this choice is theoretical because the current implementation doesn't yet have a garbage collector (it's not a high-priority task). Ultimately I want to provide a second implementation of std.memory that supports garbage collection, so that when users build their project they can choose whether they want the garbage-collected implementation.
This looks very promising. As a Python developer who primarily does systems programming, I sometimes have to jump into C for system calls and optimization, or to Java for stronger type safety and real interfaces, neither of which I particularly enjoy. I could definitely see doing new projects in a language like this.
Nim should look very attractive to Python developers. It's a good looking language that has positioned itself in a good market: static typing, looks and feels like scripting (not verbose), and almost systems level (has GC).
I haven't got any experience with Go development, but from those I know who use it, I hear nothing but promising things about Nim. "Modern" is a word that gets used.
Between Rust and Nim, we're starting to see some really clever new languages that fix common pain points. Both of these are so neat it makes me feel as though we're in a programming language boom right now.
While that's not the case for Nim, I tend to think LLVM enabled a lot of this "boom".
It gives you a high-quality set of components that cover a large share of what you need when implementing a compiler, so energy can be spent on the language itself.
A lot of it corresponds 1:1 with Python, IME. The biggest difference I notice is the _ shortcut, which I wish Python would adopt - "_ + 1" is so much shorter than "lambda x: x + 1" and no less clear. (And you can always use the "{x => x + 1}" style if you want.)
Oh, and I guess the constructor syntax, which again I wish Python would adopt. I end up with too many "self.x = x" lines in my Python __init__ methods; pure syntactic ceremony.
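For what it's worth, Python's stdlib dataclasses (available since 3.7) already remove most of that ceremony; a quick illustration with a made-up Point class:

```python
from dataclasses import dataclass

# The repetitive style: each field name appears three times.
class PointManual:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# The @dataclass decorator generates __init__ (plus __repr__ and __eq__).
@dataclass
class Point:
    x: float
    y: float
    z: float

p = Point(1.0, 2.0, 3.0)
print(p.x)  # 1.0
```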
I don't understand why vtables are implemented as hash tables.
The docs [1] describe that objects (of class type) only contain data (i.e. object fields), and no vtable pointers. Only when an object is cast to an interface type is a vtable generated and passed along with the object pointer in a fat pointer.
However, instead of just structuring the vtable as an array of pointers to methods (ordered e.g. by name), Loci instead generates a hashtable with method resolution stubs in case of conflicts. I don't understand why - since the target interface's type (and methods) are known at compile-time, it would be just as easy to fix the ordering of the methods and use the array approach (like in C++), instead of using a hashtable.
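To make the contrast concrete, here is a toy model of the two layouts in Python (purely illustrative; the slot count and conflict handling are made up, not Loci's actual scheme):

```python
def make_array_vtable(methods, interface_order):
    # Array layout: the slot index is fixed by the interface's method
    # order, known at compile time (the C++-style approach).
    return [methods[name] for name in interface_order]

SLOTS = 8  # arbitrary table size

def make_hash_vtable(methods):
    # Hash layout: the slot is chosen by hashing the method name. On a
    # collision, a real implementation emits a resolution stub that
    # disambiguates via a method id passed by the caller; here that is
    # modelled as a linear scan of the colliding candidates.
    table = [[] for _ in range(SLOTS)]
    for name, fn in methods.items():
        table[hash(name) % SLOTS].append((name, fn))

    def call(name, *args):
        for candidate, fn in table[hash(name) % SLOTS]:
            if candidate == name:
                return fn(*args)
        raise AttributeError(name)
    return call

shape_methods = {"area": lambda side: side * side}
array_vt = make_array_vtable(shape_methods, ["area"])
hash_vt = make_hash_vtable(shape_methods)
print(array_vt[0](3))      # 9
print(hash_vt("area", 3))  # 9
```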
"This design also differs from C++ in that vtable pointers are not stored in objects, but are part of an interface type reference. This decision is particularly appropriate to the language, since Loci doesn’t require classes to explicitly implement interfaces"
This is irrelevant. Every time an object is cast to an interface, the compiler needs to know the definition of that interface. Therefore, it's just as easy to generate a hashtable or an array.
Basically I've had a lot of experience working with C++ on various projects, and while I like it very much I've also observed its weaknesses (right now I'm struggling with slow build times). After a while it seemed logical to build a language that would solve a lot of these problems, so that I wouldn't have to face them over and over again on each project.
I think this may fill a void in the "teaching language" space. It doesn't require lots of boilerplate but retains some key teachable moments (type system, pointers, stack vs heap, interfaces, etc.). I think a language with this sort of brevity and breadth of features would do well as a replacement for the Java/Python/C/C++ hodgepodge that currently exists at many universities.
So the kicker: is it ABI-compatible with C++? I've come to like the C compatibility in Julia and Rust (and I'm sure I'll come to like the C compatibility in Loci, too), but the relative lack of C++ interface support (something that I know Julia's been working on, at least based on what I've seen on the mailing lists) has been a bit of a sore point for integrating with C++ projects.
The name mangling scheme is similar to (one of) C++'s, so that could probably be made more compatible.
However the vtable mechanism is completely incompatible and given the reasoning in the docs (to support structural typing), I don't think it's a big stretch to say that it's a fundamental incompatibility.
TBH I'd be quite surprised if many (any?) languages will aim for C++ ABI compatibility that's much beyond simple extensions to the C ABI. Even ignoring the lack of a standard C++ ABI, once you start getting into virtual functions, exception handling and, especially, templates, you end up constraining your language to a point where you're probably better off just using C++.
I may well be wrong on this, LLVM helps a lot with the grunt work of name mangling, exceptions, trampolines etc. so it may be possible to get to a sweet spot that supports a significant number of important libraries without hamstringing the language too much. Not going to hold my breath though...
I can't find any mention of whether they provide a C-compatible ABI. Kinda useless as a systems language if they don't - you can't call it from anything else.
Perhaps I scanned too quickly; I probably missed it, since there is no actual mention of the ABI (or, e.g., which calling conventions it supports on which platforms). The page is confusing - it seems to discuss ABI and syntax and stuff, which isn't exactly related to "compatibility" afaict.
I digress... just the usual problems with wiki-type sites.
It compiles to LLVM and LLVM abstracts the calling convention. So it might be safe to assume it supports all the calling conventions that LLVM supports.
An example would be dynamic dispatch of virtual functions.
In C++ virtual functions are implemented as vtables attached to each object. As such you need to retrieve the object (possibly from the heap), follow a function-pointer indirection, and possibly go through trampolines if you're using multiple inheritance. That sort of thing.
The author has a nice piece in the documentation on how dynamic dispatch is implemented in Loci to support structural typing. He explains the design choice, how it differs from C++ (a vtable at each interface reference call site, rather than in each object), and gives some qualitative estimates of how the performance would differ. It's worth the read.
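Abstractly, the difference between the two layouts can be sketched like this (a Python stand-in for what the compilers emit; all names here are made up):

```python
# C++-style: every polymorphic object carries its own vtable pointer.
class CxxStyleObject:
    def __init__(self, fields, vtable):
        self.fields = fields
        self.vtable = vtable  # one hidden word stored per object

def cxx_virtual_call(obj, slot):
    # dispatch: load the object, load its vtable pointer, index, indirect call
    return obj.vtable[slot](obj)

# Loci-style: objects hold only their fields; casting to an interface
# produces a fat pointer (data pointer + vtable pointer) instead.
def as_interface(obj, vtable):
    return (obj, vtable)

def interface_call(fat_ptr, slot):
    data, vtable = fat_ptr
    return vtable[slot](data)

area_vtable = [lambda data: data["side"] ** 2]  # slot 0: area
square = {"side": 3}
print(interface_call(as_interface(square, area_vtable), 0))  # 9
```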
C doesn't have dynamic dispatch built in to the language. If you want it, you'd have to code it by hand.
Seems it was created by one developer, Stephen Cross (scross.co.uk). I am quite impressed by his accomplishment. The only thing I miss is custom allocators.
He's 2.5 years out of his undergraduate degree. This is incredibly sophisticated code and documentation for such a person. Quite aside from the language, the person himself is worthy of note!
There may be some duality in the use of the word "language". If it is used in the literal sense, I agree. In the other case (i.e., if you were referring to languages and their execution environments), I beg to differ. For the end user of a programming environment, differences in syntax are mostly tangible in the aesthetic sense. Some languages are more readable or expressive, indeed, but, in the end, syntax mostly shapes our perception of how beautifully the code lays out.
Execution environments, on the other hand, are tailored for a class of problems, incorporating useful abstractions. Yes, I know it is not so evident in a world where we have general-purpose programming environments by the dozen; when we talk about ___domain-specific languages, this makes a lot of sense. They can only gain expressiveness if they can represent bigger abstractions with fewer words.
Eh? Languages are tools you use for their ecosystem, and especially for what work you can have done for you and what kinds of correctness the language encourages. Syntax is something you have to put up with as part of constructing a program.
I don't think abstract paradigms really fit into "the way you think about programming". Abstraction is software design; it doesn't belong to algorithms and how you make them work in an environment.
If that's seriously a problem then just don't check it out. I personally find it ridiculous that it's a requirement for someone's identity to be attached to a project in order for someone to even consider seeing what it's about.
What I think is particularly neat is making lvalues and rvalues explicit constructs in the language: http://loci-lang.org/LvaluesAndRvalues.html, as well as being able to redefine the behavior of certain implicit operations relating to copies and moves: http://loci-lang.org/ImplicitOperations.html.