Found this the other day, and wanted to share. Source is here: https://github.com/scross99/locic. Pleasantly readable and surprisingly compact given that it implements polymorphism, templates, modules, etc, as well as code generation to LLVM.
Yes, this is true; the point was to highlight more generally the complexity of ensuring correctness when moving/copying C++ objects versus doing the same thing in Loci.
Yep, it's not the most attractive parser ever. I chose to use Bison in order to save a lot of time/work and to take advantage of its GLR implementation; unfortunately its interaction with C++ isn't pleasant (due to the use of a union type for the symbol values).
That looks fairly clean for a Yacc parser. I do wish there was a more modern, friendlier parser-generator; yacc C++ tends to leak AST objects on error unless you use a pool allocator.
I like this language because it doesn't try to do anything too fancy. It's "just" a streamlined C++.
I do wonder how they're planning on avoiding overhead with the garbage collector, though. Once you have a single garbage collected object on your heap, it seems like you'd have to start scanning the whole thing to make sure the garbage collected object is still alive.
The overhead of GC is independent of heap size. Take the following variables:
L = total size of live objects in heap
H = maximum size of heap
P = time between collections
R_a = rate of allocation
R_s = rate of heap scanning
It's easy to define P and R_s (assume a mark-sweep collector so we can ignore copy reserve):
P = (H - L) / R_a
R_s = L / P
Substitute and simplify:
R_s = L / ((H - L) / R_a)
R_s = L / (H - L) * R_a
In other words, the absolute size of your heap doesn't matter. If you keep the heap a fixed multiple of the live size, the amount of heap scanning you do will always equal your allocation rate times some constant determined by that ratio. If H = 2L, that constant is 1.
So the way to improve performance is to cut down on that allocation rate. If you use RAII for most objects, and GC only occasionally, your allocation rate, net of non-GC allocations, will be very low, which means your scanning rate will be very low. You'll have to traverse all live objects every GC cycle, but you'll rarely have to do a GC cycle.
Say we use your example of H = 2L. Then we have R_s = L / (2L - L) * R_a, or R_s = R_a. Sure.
Now say we double the heap: H = 4L. Then we have R_s = L / (4L - L) * R_a, or R_s = R_a/3. For a fixed allocation rate, increasing the heap makes the scan rate go down, i.e. we scan less often.
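Plugging in numbers makes the comparison concrete; a quick sketch (the L and R_a values here are arbitrary illustrations):

```python
def scan_rate(live, heap, alloc_rate):
    # R_s = L / P with P = (H - L) / R_a, i.e. R_s = L * R_a / (H - L)
    return live * alloc_rate / (heap - live)

L = 100.0    # live set size, arbitrary units
R_a = 60.0   # allocation rate, arbitrary units per second

print(scan_rate(L, 2 * L, R_a))  # H = 2L -> R_s = R_a = 60.0
print(scan_rate(L, 4 * L, R_a))  # H = 4L -> R_s = R_a / 3 = 20.0
```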
It seems like you're assuming that the live set is a fixed fraction of the heap size, but that doesn't make sense to me. The size of the live set is a function of what the program does, not its heap size.
Are you sure about this? The main overhead of a mark-and-sweep GC isn't the marking phase but the sweeping phase: you need to find and free all dead objects (to mark them as free/add them to the free list), and since you don't know where they are (by definition: being dead, they have no references pointing at them), you have to scan the whole heap (or at least all the pages that objects were allocated or live on since the last GC cycle).
If H is a constant multiple of L, then your scan rate will be a constant multiple of your allocation rate, independent of heap size.
In practice, you won't sweep the whole heap after each GC cycle, but will do it lazily: http://www.hboehm.info/gc/complexity.html. The point in that article about marking time dominating allocation/sweep is even more true today. Allocation and sweep access memory linearly and are easily parallelized. Meanwhile, marking accesses memory randomly, and the maximum concurrency is dependent on the shape of the live object graph.
I don't think that is generally true. Sweeping has much better cache locality than marking, which is following pointers all over who-knows-where in the heap.
Sure, but since any object could have a reference to a GC object, you still have to eat the latency of scanning the entire heap (not just the GC'd objects) every so often. That seems like a pretty big performance cliff.
1. Use the garbage collector.
2. Don't use the garbage collector.
I'd expect most users to choose option (2); however, the idea is to support garbage collection for those users who find it worthwhile.
I say this choice is theoretical because the current implementation doesn't yet have a garbage collector (it's not a high-priority task). Ultimately I want to provide a second implementation of std.memory that supports garbage collection, so that when users build their project they can choose whether they want the garbage-collected implementation.
This looks very promising. As a Python developer who primarily does systems programming, I sometimes have to jump into C for system calls and optimization, or to Java for stronger type safety and real interfaces, neither of which I particularly enjoy. I could definitely see doing new projects in a language like this.
Nim should look very attractive to Python developers. It's a good looking language that has positioned itself in a good market: static typing, looks and feels like scripting (not verbose), and almost systems level (has GC).
I haven't got any experience with Go development, but from those I know who use it, I hear nothing but promising things about Nim. "Modern" is a word that gets used.
Between Rust and Nim, we're starting to see some really clever new languages that fix common pain points. Both of these are so neat it makes me feel as though we're in a programming language boom right now.
While that's not the case for Nim, I tend to think LLVM enabled a lot of this "boom".
It gives you a high-quality set of components that cover a large share of what you need when implementing a compiler, so energy can be spent on the language itself.
A lot of it corresponds 1:1 with Python, IME. The biggest difference I notice is the _ shortcut, which I wish Python would adopt - "_ + 1" is so much shorter than "lambda x: x + 1" and no less clear. (And you can always use the "{x => x + 1}" style if you want.)
Oh, and I guess the constructor syntax, which again I wish Python would adopt. I end up with too many "self.x = x" lines in my Python __init__ methods; pure syntactic ceremony.
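For what it's worth, Python's stdlib dataclasses (available since 3.7) already remove most of that ceremony; a quick illustration with a made-up Point class:

```python
from dataclasses import dataclass

# The repetitive style: each field name appears three times.
class PointManual:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# The @dataclass decorator generates __init__ (plus __repr__ and __eq__).
@dataclass
class Point:
    x: float
    y: float
    z: float

p = Point(1.0, 2.0, 3.0)
print(p.x)  # 1.0
```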
I don't understand why vtables are implemented as hash tables.
The docs [1] describe that objects (of class type) only contain data (i.e. object fields), and no vtable pointers. Only when an object is cast to an interface type is a vtable generated and passed along with the object pointer in a fat pointer.
However, instead of just structuring the vtable as an array of pointers to methods (ordered e.g. by name), Loci instead generates a hashtable with method resolution stubs in case of conflicts. I don't understand why - since the target interface's type (and methods) are known at compile-time, it would be just as easy to fix the ordering of the methods and use the array approach (like in C++), instead of using a hashtable.
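To make the contrast concrete, here is a toy model of the two layouts in Python (purely illustrative; the slot count and conflict handling are made up, not Loci's actual scheme):

```python
def make_array_vtable(methods, interface_order):
    # Array layout: the slot index is fixed by the interface's method
    # order, known at compile time (the C++-style approach).
    return [methods[name] for name in interface_order]

SLOTS = 8  # arbitrary table size

def make_hash_vtable(methods):
    # Hash layout: the slot is chosen by hashing the method name. On a
    # collision, a real implementation emits a resolution stub that
    # disambiguates via a method id passed by the caller; here that is
    # modelled as a linear scan of the colliding candidates.
    table = [[] for _ in range(SLOTS)]
    for name, fn in methods.items():
        table[hash(name) % SLOTS].append((name, fn))

    def call(name, *args):
        for candidate, fn in table[hash(name) % SLOTS]:
            if candidate == name:
                return fn(*args)
        raise AttributeError(name)
    return call

shape_methods = {"area": lambda side: side * side}
array_vt = make_array_vtable(shape_methods, ["area"])
hash_vt = make_hash_vtable(shape_methods)
print(array_vt[0](3))      # 9
print(hash_vt("area", 3))  # 9
```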
"This design also differs from C++ in that vtable pointers are not stored in objects, but are part of an interface type reference. This decision is particularly appropriate to the language, since Loci doesn’t require classes to explicitly implement interfaces"
This is irrelevant. Every time an object is cast to an interface, the compiler needs to know the definition of that interface. Therefore, it's just as easy to generate a hashtable or an array.
Basically I've had a lot of experience working with C++ on various projects, and while I like it very much I've also observed its weaknesses (right now I'm struggling with slow build times). After a while it seemed logical to build a language that would solve a lot of these problems, so that I wouldn't have to face them over and over again on each project.
I think this may fill a void in the "teaching language" space. It doesn't require lots of boilerplate but retains some key teachable moments (type system, pointers, stack vs heap, interfaces, etc.). I think a language with this sort of brevity and breadth of features would do well as a replacement for the Java/Python/C/C++ hodgepodge that currently exists at many universities.
So the kicker: is it ABI-compatible with C++? I've come to like the C compatibility in Julia and Rust (and I'm sure I'll come to like the C compatibility in Loci, too), but the relative lack of C++ interface support (something that I know Julia's been working on, at least based on what I've seen on the mailing lists) has been a bit of a sore point for integrating with C++ projects.
The name mangling scheme is similar to (one of) C++'s, so that could probably be made more compatible.
However the vtable mechanism is completely incompatible and given the reasoning in the docs (to support structural typing), I don't think it's a big stretch to say that it's a fundamental incompatibility.
TBH I'd be quite surprised if many (any?) languages will aim for C++ ABI compatibility that's much beyond simple extensions to the C ABI. Even ignoring the lack of a standard C++ ABI, once you start getting into virtual functions, exception handling and, especially, templates, you end up constraining your language to a point where you're probably better off just using C++.
I may well be wrong on this, LLVM helps a lot with the grunt work of name mangling, exceptions, trampolines etc. so it may be possible to get to a sweet spot that supports a significant number of important libraries without hamstringing the language too much. Not going to hold my breath though...
I can't find any mention of whether they provide a C-compatible ABI. Kinda useless as a systems language if they don't - you can't call it from anything else.
Perhaps I scanned too quickly; I probably missed it, since there is no actual mention of the ABI (or, e.g., which calling conventions it supports on which platforms). The page is confusing - it seems to discuss ABI and syntax and stuff, which isn't exactly related to "compatibility" afaict.
I digress... just the usual problems with wiki-type sites.
It compiles to LLVM and LLVM abstracts the calling convention. So it might be safe to assume it supports all the calling conventions that LLVM supports.
An example would be dynamic dispatch of virtual functions.
In C++ virtual functions are implemented as vtables attached to each object. As such you need to retrieve the object (possibly from the heap), follow a function-pointer indirection, and possibly go through trampolines if you're using multiple inheritance. That sort of thing.
The author has a nice piece in the documentation on how dynamic dispatch is implemented in Loci to support structural typing. He explains the design choice, how it differs from C++ (a vtable at each interface reference call site, rather than in each object), and gives some qualitative estimates of how the performance would differ. It's worth the read.
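Abstractly, the difference between the two layouts can be sketched like this (a Python stand-in for what the compilers emit; all names here are made up):

```python
# C++-style: every polymorphic object carries its own vtable pointer.
class CxxStyleObject:
    def __init__(self, fields, vtable):
        self.fields = fields
        self.vtable = vtable  # one hidden word stored per object

def cxx_virtual_call(obj, slot):
    # dispatch: load the object, load its vtable pointer, index, indirect call
    return obj.vtable[slot](obj)

# Loci-style: objects hold only their fields; casting to an interface
# produces a fat pointer (data pointer + vtable pointer) instead.
def as_interface(obj, vtable):
    return (obj, vtable)

def interface_call(fat_ptr, slot):
    data, vtable = fat_ptr
    return vtable[slot](data)

area_vtable = [lambda data: data["side"] ** 2]  # slot 0: area
square = {"side": 3}
print(interface_call(as_interface(square, area_vtable), 0))  # 9
```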
C doesn't have dynamic dispatch built in to the language. If you want it, you'd have to code it by hand.
Seems it was created by one developer, Stephen Cross (scross.co.uk). I am quite impressed by his accomplishment. The only thing I miss is custom allocators.
He's 2.5 years out of his undergraduate degree. This is incredibly sophisticated code and documentation for such a person. Quite aside from the language, the person himself is worthy of note!
There may be some duality in the use of the word "language". If it is used in the literal sense, I agree. In the other case (i.e., if you were referring to languages and their execution environments), I beg to differ. For the end user of a programming environment, differences in syntax are mostly tangible in the aesthetic sense. Some languages are more readable or expressive, indeed, but, in the end, syntax mostly shapes our perception of how beautifully the code lays out.
Execution environments, on the other hand, are tailored for a class of problems, incorporating useful abstractions. Yes, I know it is not so evident in a world where we have general-purpose programming environments by the dozen; when we talk about ___domain-specific languages, this makes a lot of sense. They can only gain expressiveness if they can represent bigger abstractions with fewer words.
Eh? Languages are tools you use for their ecosystem, and especially for what work you can have done for you and what kinds of correctness the language encourages. Syntax is something you have to put up with as part of constructing a program.
I don't think abstract paradigms really fit into "the way you think about programming". Abstraction is software design; it doesn't belong to algorithms and how you make them work in an environment.
If that's seriously a problem then just don't check it out. I personally find it ridiculous that it's a requirement for someone's identity to be attached to a project in order for someone to even consider seeing what it's about.
What I think is particularly neat is making lvalues and rvalues explicit constructs in the language: http://loci-lang.org/LvaluesAndRvalues.html, as well as being able to redefine the behavior of certain implicit operations relating to copies and moves: http://loci-lang.org/ImplicitOperations.html.