Why Is SQLite Coded in C? (2017) (sqlite.org)
372 points by jeffreyrogers on March 14, 2018 | 345 comments



I've commented several times before that I consider "C" and "C with analysis backing" to be in practice two different languages. SQLite is the latter: https://sqlite.org/testing.html

Writing SQLite in "plain C" without all that would, well, simply not work.

I agree that "C with analysis backing" is the best language for SQLite right now. However, it should not be used as an example of "How C is a great language for programming in" unless you are also planning on skillfully using a very significant subset of the tools listed in that document. SQLite doesn't prove C is great; it demonstrates how much additional scaffolding is necessary to wrap around C to get that level of quality, and given the quantity and diversity of tools we are talking about, it is not particularly complimentary to "plain C".


I agree. SQLite gets away with using C because it literally uses military-grade levels of verification. As John Regehr pointed out, SQLite is quite possibly the only piece of software that goes to that level of validation and testing without being required to by law.

It's not just a matter of skill, either. The cost in terms of money and time needed to develop software in that way is completely impractical in almost any commercial scenario. Aside from some very specific situations, it's not an economically viable way to produce software.


It is not economically viable for entertainment or disposable "apps", but it is absolutely required for any serious, mission-critical software. Seriously, the comments here betray how many people are in disposable software careers.


The comments here betray how much of the software economy depends on developer productivity. The fact is that SQLite style verification is not practical for almost any software, very much including "mission-critical" software.


Yes, but look how much of the software economy's infrastructure depends on underfunded products. OpenSSH, GnuPG and OpenSSL are just 3 projects which are installed on pretty much every Linux server on the internet, including the servers of billion-dollar businesses. It got a lot better in recent years, but still: quite a few economically viable software companies just depend on free labor for mission-critical software products which take a lot of resources to become solid.

And while we are at it: http://www.openbsdfoundation.org https://gnupg.org/donate/ https://www.openssl.org/support/donations.html


Can't it be both? We need it right now and we don't need it to work perfectly or last forever.

Really, we're talking about two separate things as if they were one: the software [1] (in this case, SQLite) and the application (whatever tables, queries, etc. you need to solve your problem) are used together. So we build the poorly tested, quick and dirty application on top of the well tested, solid software.

[1] Yes, I realize that "software" is a terribly generic term to use to mean "well designed and tested software" and that "application" is also a terribly generic term to use to mean "hastily designed and untested software". Feel free to mentally substitute your own terms if you have better ones.


I wish that were true. I've worked on products that are very much not "disposable" and I've seen some terrible code and very poor testing infrastructure. It's especially bad in shops that were not primarily writing software and had to integrate software solutions to remain competitive. Execs don't understand it, don't care for it, don't invest enough resources in it. It's often delegated to some 3rd party and integrated like a black box in their solution.


First: Apps can definitely be mission-critical. If you're lost somewhere dangerous, your GPS and mapping app can be as mission-critical as it gets. Your app which makes a phone call (and, yes, I'm calling the be-a-phone functionality of a cell phone an app here) can be mission-critical.

Is that software held to the standards of mission-critical software? Probably not.

Second: There's a lot of space between the software which is recognized as being mission-critical and software which is just BioWare's latest patch target, and that space includes things like web browsers and the web server software which hosts your company's website. Given how essential a web presence is to, you know, being able to make money, web server software is hardly disposable, but it is not held to the rigorous standards of mission-critical software.


I think you're using the words differently than the post you're replying to. As defined in that post:

"apps": entertainment or disposable

"software": serious, mission critical

Aside from the difference in terminology, it sounds like your opinions go in the same general direction.


Regarding the viability, another thing to consider is that C gives you ubiquitous compatibility.


The Opus codec has an impressive amount of testing behind it as well:

https://www.ietf.org/proceedings/82/slides/codec-4.pdf


In Germany there are strict regulations if you're writing secure software, especially under the lens of the BSI and the Bundesdruckerei. Software which deals with eID is verified and certified under strict rules.


> In Germany there are strict regulations if you're writing secure software, especially under the lens of the BSI and the Bundesdruckerei. Software which deals with eID is verified and certified under strict rules.

Just curious, does this extend to dependent libraries as well?


Yup


Most/all of these tests are done by machine. Debugging (in production) (while your service is down) costs much more than writing a bunch of tests. The more states your program can be in, the more rigorous the tests need to be.


SQLite went through thorough testing, not verification.

These are two different things.


> SQLite went through thorough testing, not verification. These are two different things.

I think you're confusing the terms "formal verification" and "verification". Testing is a subset of verification, but not of formal verification.


SQLite is literally used by the military in their Android phones.


It's literally used by every Android phone, military or not. It can be used in iOS apps, too.


> OOM tests are done in a loop. On the first iteration of the loop, the instrumented malloc is rigged to fail on the first allocation. Then some SQLite operation is carried out and checks are done to make sure SQLite handled the OOM error correctly. Then the time-to-failure counter on the instrumented malloc is increased by one and the test is repeated. The loop continues until the entire operation runs to completion without ever encountering a simulated OOM failure. Tests like this are run twice, once with the instrumented malloc set to fail only once, and again with the instrumented malloc set to fail continuously after the first failure.

That's pretty awesome actually.
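For anyone who hasn't seen the pattern, here's a minimal sketch of what such a loop could look like (a toy, not SQLite's actual harness; the allocator and the operation under test are invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Instrumented malloc: fails when the call counter reaches fail_at.
       This is the "fail only once" mode; the "fail continuously" mode
       would use >= instead of ==. */
    static int alloc_calls, fail_at;

    static void *test_malloc(size_t n) {
        if (++alloc_calls == fail_at)
            return NULL;                /* simulated OOM */
        return malloc(n);
    }

    /* Toy operation under test: it must fail cleanly (no leaks, no
       corruption) if any allocation fails. */
    static int operation(void) {
        char *a = test_malloc(16);
        if (!a) return -1;
        char *b = test_malloc(16);
        if (!b) { free(a); return -1; }
        strcpy(a, "hello"); strcpy(b, "world");
        free(a); free(b);
        return 0;
    }

    int main(void) {
        for (fail_at = 1; ; fail_at++) {
            alloc_calls = 0;
            if (operation() == 0 && alloc_calls < fail_at) {
                /* Ran to completion without hitting the injected failure. */
                printf("survived %d OOM injection rounds\n", fail_at - 1);
                break;
            }
            /* A real harness would verify invariants here after each
               simulated failure. */
        }
        return 0;
    }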


I disagree. Another comment here mentions "military grade verification" (?!) is required to write such high quality C applications. No, not at all, unless you count knowing what you're engineering as Military Grade Awareness. You seem to be saying you prefer the code-slinging style of "see if it works, it seems to work... release it". I've been including SQLite in various applications for 10 years, as well as have been coding, primarily in C, for 35 years. If you are not testing your work to a degree of statistical awareness of its behavior, you're an irresponsible cowboy, hardly a true professional, and probably a liability to you and your team.


This is almost a "no true scotsman": in practice, a lot of C projects and programmers (even ones that, by most other metrics, would be regarded as professional programmers) do not have the resources to test as extensively as SQLite.


It's more than that, it's a manipulative comment, designed to put others on the defensive about their own skills and muddle the issue.

Any real-world project will be developed under specific time constraints and have a certain budget. There will be a mix of developers and other roles in the team, with varying skills. There will be politics involved and the project will probably be in use for decades, ensuring that all of the above will significantly change during its lifetime.

Under such conditions, C is an abject failure at supporting robust, quality code, because it neither benefits from a culture of safety (like Ada or even Rust) nor does it have any enforceable safety switches or indeed safety-by-default.

Anything that can be screwed up, will be screwed up over the years by someone, some configuration option, procedure or project management decision. Java is on average significantly safer than C or C++, because no matter what happens, the language won't allow certain errors to happen.

C can be assumed to be a hopeless case, but a purely theoretical way of avoiding these kinds of issues is to have a sufficient supply of bsenftner clones which will supervise a project through its development and operation until it's retired from production (assuming bsenftner is as awesome as they say).


I agree, I'm paid by the hour to deliver a set of features, and the client very, VERY rarely would pay for me to start 'diverging' and writing test cases for something they see as superfluous. A bit of field tests (ie, users as testers) and if it works, it ships.

I don't make the rules. The scary bit is, often, it works just fine; in many cases, it's cheaper to reboot/relaunch for a 1/100000 crashing bug than paying me to find it and fix it!


That's one of the reasons I'm pretty happy to work in a place where my manager will generally agree if I tell them that I'm planning to spend the next 3 months improving the test suite or harness or adding one layer of safety to eliminate a class of bugs or adding a new static analysis.

Of course, this is also a place where we practice static analysis, continuous integration, code reviews and we're now progressively migrating code to Rust.


The evidence is on the side of the parent poster.

Incredibly experienced programmers write C without those tools, and with practically no exceptions memory issues, crashes, and vulnerabilities surface if the code is large enough and the project used enough.

An easy example of this is the Linux kernel. They historically had relatively few verification tools and have one of the largest C codebases out there. There's a new vulnerability related to memory safety every two or three weeks. There's a bug fixed due to C memory management at least daily.

None of those bugs would have happened in Rust. Most of them couldn't have happened in well written C++ (whereas this well-written C is chock full of them).

Note as well that the kernel now gets some of the most exhaustive verification of any project (such as from google's syzkaller, etc), and it still has tons of C memory issues.


"Incredibly experienced programmers" ? The language is simple and well defined to the degree one can map out exactly what happens on paper, if need be. Everyone is so brainwashed to think C is some low level assembly like language that is "hard". No, not at all. It simply provides no training wheels or safety harness - meaning one needs to know what they are doing, not coding in a "type and see what it does" style.


> The language is simple and well defined to the degree one can map out exactly what happens on paper, if need be.

Without a copy of the language standard, I still can't remember all the intricate details about just arithmetic and comparison operators. Take the simple bitshift operators. What happens when the second operand is negative or larger than sizeof(left)*CHAR_BIT? What if it's equal? And the >> and << operators are not even symmetric in their definitions: left-shifting a negative value is undefined but right-shifting a negative value is implementation-defined! And even for simpler operators like + or >, can you immediately tell me all the integer promotion rules and usual arithmetic conversion rules? What happens when you add two integers with different signedness and width?

I can't and I doubt most programmers can. Try implementing a special C interpreter where every undefined and implementation-defined behavior is configurable and you will see how complicated things are.
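To make that concrete, here's a small (hypothetical, compile-it-yourself) program showing the promotions and usual arithmetic conversions at work:

    #include <stdio.h>

    int main(void) {
        /* Usual arithmetic conversions: -1 is converted to unsigned int
           (becoming UINT_MAX), so the comparison goes the "wrong" way. */
        if (-1 < 1U)
            printf("-1 < 1U\n");
        else
            printf("-1 >= 1U, because -1 became UINT_MAX\n"); /* this prints */

        /* Integer promotion: both operands are promoted to (signed) int
           before subtracting, so subtracting unsigned chars can go negative. */
        unsigned char a = 0, b = 1;
        printf("%d\n", a - b); /* prints -1, not 255 */
        return 0;
    }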


> What happens when the second operand is negative or larger than sizeof(left) * CHAR_BIT? What if it's equal? And the >> and << operators are not even symmetric in their definitions: left-shifting a negative value is undefined but right-shifting a negative value is implementation-defined!

As a lover of C, these questions are totally irrelevant to me. What I've taken away is don't do things that do not have intuitively clear semantics. Use bitwise and shift operators only on unsigned numbers. Simple as that. And really I haven't found a need to do anything else. I think many of these complex rules are historical accidents or come from supporting a peculiar architecture or kind of hardware that's no longer relevant. In any case, remembering all these pesky rules is not C. C is simple. Remember how to stay away from the dubious operations and you're fine.


    #include <stdint.h>

    uint32_t foo(uint8_t ubyte) {
        return ubyte << 24; /* ubyte is promoted to (signed) int before the shift */
    }
Look ma, undefined behavior due to signed integer overflow (on 32 bit targets anyways), despite only unsigned types being declared.

And what about the aliasing rules:

    char *start, *end;
    /* This function sets start/end to the start and end of an address
       range. It takes uintptr_t * arguments. */
    GetAddressRange((uintptr_t *)&start, (uintptr_t *)&end);
    *start = 3; /* Undefined behavior: start has now been accessed as both
                   a uintptr_t and a char *. */


The first is nice, but I think you should make sure that ubyte really has that many bits for it to make sense, no? And doesn't your compiler emit a warning if it takes advantage of that undefined behaviour? (e.g. using the upper part of the register where that byte is stored).

The second, why do you cast as uintptr_t? It makes no sense to me at all. Yep, it might be a little bit advanced, but it's well known that you should only have correctly typed pointers, or void pointers, or char pointers, to any object. It's not a totally obscure thing, just as most people understand the need for "restrict". I'd say the code looks bad to me at first sight, and even a beginner is unlikely to write it because it's a complicated construction.

You have your point, but anyway the pointer-type rule makes sense and I don't know a better version of the rule that could prevent many unnecessary reloads.


With the first, John Regehr found many, many examples of it in real world code, and in some cases the bug he filed was closed without fixing it because "no compiler currently will do the wrong thing with it." I'm not aware of any compiler that emits a warning for this with default compilation flags.

The second is based off of code where an operating system call would get the mapped ranges of a shared memory region. For whatever reason, it took a MemoryAddress which was essentially a pre-c99 uintptr_t defined by the system headers (an unsigned value large enough to hold an address). The casts were there because the compiler warned about incompatible pointer types.

I was never able to convince the engineer who wrote the code (~20 years of C experience) that this code was wrong (it worked fine until intermodule inlining was enabled). Instead we just enabled the compiler option to disable optimizations based off of aliasing rules.

You are both greatly overestimating the degree to which programmers understand C, and overestimating the degree to which programmers who do understand C do not make mistakes. I've had lots of people say that the code in the second example "looks bad to me" but the commit still made it through code review and was being run for 5 years before optimization flag changes caused bugs to appear.


> "no compiler currently will do the wrong thing with it."

Well then, that's nice no? I don't like to play language lawyer. If there are machines where it can lead to errors, it might be bad machine design. The compiler and the user should work together to catch the error / specify more precisely what is the intent. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.

I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.

> For whatever reason, it took a MemoryAddress which was essentially a pre-c99 uintptr_t defined by the system headers

That's inconvenient that the API forces such a strange type to the user. But I think the logical usage is to declare "start" and "end" as uintptr_t then. In this case there should be no problem, or do I miss something?

> You are both greatly overestimating the degree to which programmers understand C, and overestimating the degree to which programmers who do understand C do not make mistakes.

All I can say is I've never had this kind of problem really, so far. The kinds of problems I get by using safe toys on the other hand (e.g. array access only possible through proxy containers) are incredibly more painful, because these approaches are extremely bad for modularity and elegance. You have to specify the container, and also its type, or at least implement all kinds of boilerplate interfaces. This is incredibly restricting.

Everybody makes mistakes and while some memory safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are also the minority of the problems I encounter, even when counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught in a high-level language - SQL injection or other auth bypass) than clever low-level ones. And not every type of program is extremely security-sensitive.


>> "no compiler currently will do the wrong thing with it."

> Well then, that's nice no? I don't like to play language lawyer. If there are machines where it can lead to errors, it might be bad machine design. The compiler and the user should work together to catch the error / specify more precisely what is the intent. If we can assume that (ubyte << n) == 0 for any n >= 8, I'm very fine with that, too.

> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.

In this case, it would be legal for a compiler to assume that ubyte is less than 128. It's not unreasonable to assume that at some point in the future a compiler writer will discover a way to get a 0.1% improvement on whatever the current artificial benchmark of the day is and implement an optimization for it. The landscape of C libraries is littered with undefined code that used to work everywhere and now doesn't for just such a reason.

>> For whatever reason, it took a MemoryAddress which was essentially a pre-c99 uintptr_t defined by the system headers

> That's inconvenient that the API forces such a strange type to the user. But I think the logical usage is to declare "start" and "end" as uintptr_t then. In this case there should be no problem, or do I miss something?

Yes, the correct way is to declare them as uintptr_t and then assign them to a pointer of whatever type you need. It's not uncommon for systems software to treat addresses as integers in places, because they actually do represent specific addresses. I'd have to re-read the specification, but I think that having it take a void * instead of a uintptr_t * would have the exact same issue, so I don't think it's the odd API choice that causes this.


I don't think there should be a problem. I think you can cast to void * and back as much as you like, but you aren't supposed to ever access the object as something else than char or its original type.


> I don't think compilers should exploit every undefined behaviour in the C standard. Some of those might be there only to support rare, quirky architectures.

Well, that is one way of looking at it. My personal experience is that if there is an undefined behavior in the C standard, it will be exploited, eventually. I have seen this break millions-of-lines applications during compiler upgrades and/or flag-twiddling. As you can imagine, debugging that problem was a nightmare.

[...]

> Everybody makes mistakes and while some memory safety problems definitely happen regularly even to very experienced coders, and are harder to detect than e.g. an exception, they are also the minority of the problems I encounter, even when counting the occasional buffer overrun or similar. Even in terms of data security, I figure more exploits are high-level (i.e. logic errors not caught in a high-level language - SQL injection or other auth bypass) than clever low-level ones. And not every type of program is extremely security-sensitive.

I believe that you are right in terms of number of exploits, but for the wrong reasons.

I can't find the statistics on this, but off the top of my head, ~20 years ago, 80% of the security advisories were buffer overflows/underflows. At the time, just about everything was written in C/C++.

These days, we see lots of high-level exploits, but I would tend to assume that the reason is that 1/ the web makes it very easy to carry out attacks; 2/ C and C++ have a very small attack perimeter on the web, even if you include Apache/nginx/... as part of the web.

Also, yes, web applications, especially in dynamically typed languages, would also require military-grade testing :)


The first is only undefined on 16 bit targets as ubyte is first promoted to int, and int is always at least 16 bits but on typical general purpose machines 32 bits. And in any event modern C compilers will emit a diagnostic when constant shifts are undefined--just not in this case because it's not actually wrong.

The latter problem[1] is applicable to other languages, including Rust, when they permit such type punning. C supports type punning because occasionally it's useful and necessary. The above invokes UB in Rust, too. Type punning has such a horrible code smell that you don't need unsafe{} as a gateway. Maybe for very inexperienced programmers, but they shouldn't be writing C code in situations where type punning suggests itself any more than they should be writing assembly or unsafe Rust code.

C has many pitfalls, but I don't think I'm being overly pedantic here. There's no question C gives you a significant amount of rope to hang yourself (sometimes affirmatively but usually by turning its head when you steal some), but all languages give you slack to escape the limitations of how they model the environment--the points of contention are how much slack and at what cost. Arguing that requires more precision.

For example, even if we can agree that automatic arithmetic conversions are on balance a bad idea, it's still the fact that there's some benefit to that behavior, such as making your first example well-defined. That's not coincidental.

[1] It's actually only a problem if that routine actually stores the value through a uintptr_t pointer. If it casts the parameter to pointer-to-pointer-to-char before the store it's fine. You can cast a pointer to whatever you want, as many times as you want, and never will the casting, alone, change the semantics of the code. It's only actual load or store expressions that matter, and specifically the immediate pointer type they occur through.

Where you typically find code like this you do the correct thing--the abuse of function parameter types in this particular manner is usually because of const-ness headaches and the lack of function overloading (though now there's _Generic), and in those cases you're already paying attention to type punning pitfalls because you're already at a point where you're trying to skirt around relatively strict, inconvenient typing. If you're not then, again, you're not going to be doing the right thing when writing unsafe Rust code--and there are many more cases where Rust's strict typing is inconvenient, so arguably there's _greater_ risk with a language like Rust when the programmer is lazy or tired and resorts to subverting the type system.

Moreover, this isn't even a pointer aliasing problem. Pointer aliasing problems occur when the _order_ of loads or stores matters, but because of type punning the compiler can neither determine nor assume that access through variables of _different_ type point to the same object and thus it mustn't reorder the sequence. Not only is your example not a case of a single object accessed through multiple types across dependent loads and stores, it's not even a case of accessing through the same type. Unless you use the restrict qualifier, the compiler always assumes that two variables of the same pointer type alias the same object. Whether it's the _wrong_ type is a different, less consequential problem related to the potential for trap representations or alignment violations. But if the hardware doesn't care you'll get expected behavior; it's not the type of pitfall that threatens nasal daemons.


> The first is only undefined on 16 bit targets as ubyte is first promoted to int, and int is always at least 16 bits but on typical general purpose machines 32 bits. And in any event modern C compilers will emit a diagnostic when constant shifts are undefined--just not in this case because it's not actually wrong.

If ubyte is greater than or equal to 128, then converting it to a signed 32-bit integer (as on a 32 bit machine) and then shifting it left by 24 causes signed integer overflow. This is undefined behavior. Therefore the compiler is allowed to make optimizations that assume ubyte is less than 128.


Why is the ubyte promoted to int? I'd always assumed it would stay the same size.


Look up integer promotion rules. Basically almost all arithmetic operators promote their operands to int/unsigned int before doing anything.

And you are the person who replied to my earlier post saying "don't do things that do not have intuitively clear semantics." Do integer promotion rules count as intuitively clear semantics? Most C programmers never learned them and therefore the semantics is confusing. For the few C programmers who learned them, the semantics is clearer when they can remember the rules, and confusing otherwise.

Now I hope I've convinced you that there are a lot of such subtle semantics issues in C, and hardly anything is intuitively clear, unless you've been a language lawyer or programmed in C for long enough.


Point taken. I agree that the weakly typed nature of C (integer promotions / implicit casts) can be problematic. Would be interesting to see if there are programs that would be painful to write with a stricter model.

It seems I've never really done "overshifting", otherwise I would have easily noticed that the shifted value is promoted first. If you don't overshift there's no way to trigger the undefined behaviour, even when you shift multiple times - since by assigning to the smaller type the result gets cut down again, effectively leading to the behaviour I'd assumed. I would hardly call this a gotcha.

Where it might be confusing is in a situation like this:

    char x = 42;
    printf("%d\n", x << 4); /* x is promoted to int, so this prints 672 */
But then again I'm undecided if the behaviour is that bad. It might be even useful.


"type and see what it does" is a strawman. Some people absolutely do program this way, I don't deny that, but there's a huge gulf between that and the levels of verification it takes to ensure that a large program really does behave as intended. Typos, minor oversights, unforeseen consequences of refactoring. . . the number of things you need to guard against is immense, and nobody can be operating at 100% 100% of the time.


And very few people can know 100% of it 100% of the time. We build software on top of abstractions. Abstractions leak, abstractions break, and abstractions isolate you from "knowing what you are doing".

If you build software in C, you will build abstractions in C, and debugging / analysis tools help to understand these abstractions.


> It simply provides no training wheels or safety harness - meaning one needs to know what they are doing

Meaning that if you are human you will make mistakes.

My point is that people who do know what they're doing still make mistakes. Clearly the tool does not match the reality of human cognition.


> Meaning that if you are human you will make mistakes.

You can make mistakes in any language, and they will bite you in the backside either way.

Just because some people prefer to live in padded rooms that doesn't mean they can't or won't get hurt if they decide to pull stunts. Padded rooms just provide them with a false notion of safety that ends up making matters worse.


> You can make mistakes in any language, and they will bite you in the backside either way.

This is a false equivalency. It's akin to saying "you can hit your thumb putting in a nail with any tool, so you might as well use a rock not a hammer since they're both bad".

Yes, you can make mistakes in any language, but that's not a reason to ignore modern tooling and advances and carry on as before.

Other languages allow us to build abstractions and prevent mistakes in ways which are just impossible with C... and without those abstractions, it's just needlessly difficult to keep all the state in your head and write correct code.

An easy example of this is locking and ownership. In C, mutexes are managed manually and what they hold is by convention. It's easy to write races or accidentally deadlock (again as evidenced by such a bug in the kernel existing every few weeks).

In languages with RAII, python's "with", etc, you can't forget to unlock the lock. It's one less thing to think about.

In languages like rust, it's possible to model in the type system that a given resource can only be accessed while a mutex is locked. Again, one less way to make a mistake.

C has no ability to provide or build abstractions that are this robust.

Using C is, more often than not, like using a rock to hammer in a nail.

> Just because some people prefer to live in padded rooms

With C, people are living in a room where the floor is lava and nails coat most surfaces.

I'll take my room that's carpeted, and if I need to pull a stunt and rip up the carpet, sure, I can do that, but at least I can walk normally most of the time.


> It's one less thing to think about.

I've been a professional C++ programmer for almost two decades (and a C++ amateur for almost a decade before that). I've written large amounts of fairly advanced C++ using the latest standards.

My current job however, is all plain C.

One of the interesting (and, for a C++ fan like me, disturbing) things I've found is that the cognitive load is much lower when writing plain well-structured C.

Sure, you need to remember to release that lock and free that memory but this you can do in a very structured way. What you win over C++ is that you don't need to view each line of code with suspicion ("What happens if this line throws an exception?", "I wonder what this line will cost in terms of performance?", "Could this operation cause a cascading destruction of objects I'm going to access further down?").

I love RAII, yet I've debugged enough crashes where "seemingly innocuous operation destructs some local object that has an owning reference to an object that shouldn't be destructed just yet and BAM!", that I'm beginning to doubt its usefulness outside of an exception-generating language (in C++ it's essential for proper exception-handling).


Even from a C amateur point of view I feel the same way in my learning of the language. I can't speak to multithreading or very large applications yet, but the view from the ground looking up is that C is relatively straightforward. The small size of the language is something of a relief.


It's not very surprising for me, because C is a simpler language and it's easy to paint yourself into a corner with C++.

The critical question is: what does the error rate look like for code of similar complexity? It's very possible that C programmers will try to keep things simple, because the language doesn't support them and they have to be extra careful. The flip-side is that they probably can't develop projects as complex as C++ would enable them to.


> Sure, you need to remember to release that lock and free that memory

It's always bothered me how complex of an operation people seem to think this is.

    errno = pthread_mutex_lock(&m);
    if (errno == 0) {
        do_something();
        pthread_mutex_unlock(&m);
    }
Short of a segfault or 'do_something' doing something really, unusually stupid, you can't fail to unlock the mutex.


Yeah, most projects don't consist of a single function locking and unlocking a mutex. :) Try doing the same in five 200 LOC functions with multiple returns when three different threads are sharing the data, for 10 years, with a yearly personnel turnover of 20%.


Been there done that, fought similar problems. Deadlocks abound.

Only difference? We were in Python.


Something really unusually stupid like returning from the function (assuming do_something() is a standin for arbitrary code, not specifically a function call)?

This is not even remotely the worst thing about mutexes though, so it wouldn't be why I would suggest avoiding them.


Honestly, to help avoid mistakes, I would keep it as a function call. Sure, it adds to the stack depth, but it also ensures a separate return doesn't cause the lock to be lost.

There's also the goto pattern, but anytime you separate a lock from an unlock by more than a screen's worth of text, they're forgotten.
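Something like this, say (a sketch with invented names, assuming pthreads):

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int shared_counter;

    /* Every access to shared_counter goes through this one short function,
       so the lock/unlock pair always fits on one screen, no matter how the
       callers are structured or how many returns they have. */
    static int bump_counter(void) {
        pthread_mutex_lock(&m);
        int v = ++shared_counter;
        pthread_mutex_unlock(&m);
        return v;
    }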


And inline functions don't even add stack depth. It's still something that happens manually though. Every item that needs to be done manually and doesn't cause obvious failures when not done adds to development costs.


And this is my unpopular complaint about programming in Rust: there is so much complexity unrelated to the type system that reasoning becomes a pain.


I'm glad you mentioned Python. Python, with its developers who accept raw pickle objects from the wild and are surprised when arbitrary code can be executed. Ruby's (and Python's) YAML libraries which execute arbitrary code. Javascript (and Ruby, and Python) developers pulling in untrusted and/or poorly tested libraries just to save a few lines of code. Rust with its `unsafe` blocks.

Seems like that padded floor has some rusted nails hiding right behind the pretty fabric.

RAII is not something limited to Rust, or C++, or any other language. The abstraction underpinning RAII can be done and has been done in C; you can see it done repeatedly in the code for collectd.

It's up to the developers to make their programs safe and reliable. No language to date will do that for them.


> It's up to the developers to make their programs safe and reliable. No language to date will do that for them.

But languages do make a huge contribution. For example, Rust, Ada and Modula-3 are all much safer by defaults alone compared to C. Most Rust code sits outside unsafe blocks, so the existence of this feature does not prove there is no point to Rust.


> does not prove there is no point to Rust.

I didn't say anything along those lines. I said that it's up to developers to make their programs safe.

Defaults matter, no doubt. But they are not a silver bullet; greater base safety can even cause people to become lax when thinking about safety, resulting in even bigger problems. Why do Python developers accept and parse untrusted pickle objects? Because Python is safe, and they don't have to think about what's going on under the hood.

It's indirectly related to computer programming, but a study was done in Europe which showed that crashes in AWD vehicles were, on average, much more severe than in 2WD vehicles. Why? Because of the added stability of AWD, people drove faster in adverse conditions.


C doesn't have destructors, so how do you release resources acquired with RAII when the acquiring object goes out of scope?


C programmers, in my experience, will use a "goto cleanup" pattern to emulate RAII in this case https://softwareengineering.stackexchange.com/a/154980
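A minimal sketch of the pattern (invented names, not from the linked answer):

    #include <stdio.h>
    #include <stdlib.h>

    int do_work(const char *path) {
        int rc = -1;
        FILE *f = NULL;
        char *buf = NULL;

        f = fopen(path, "r");
        if (!f)
            goto cleanup;
        buf = malloc(4096);
        if (!buf)
            goto cleanup;
        if (fread(buf, 1, 4096, f) == 0)
            goto cleanup;
        rc = 0; /* success */

    cleanup: /* single exit point releases everything acquired so far */
        free(buf);   /* free(NULL) is a no-op */
        if (f)
            fclose(f);
        return rc;
    }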


I'm glad that you are a video game programmer and you don't work on software where correctness matters.


You clearly don't understand the world of video game programmers. I worked on a game (Banjo Kazooie) that ended up being turned into more than 6 million ROM cartridges and distributed worldwide through retail. The cost of producing the cartridges alone was millions of $. With no internet patching available in case of a bug. We took verification VERY seriously.


I'm calling BS here.

Look at the types of tests listed in the executive summary... which of these can be obviated by language features?

I'll try to break it down:

* Out-of-memory tests: no, all languages run out of memory. The explicit nature of memory management in C might help here, since there's no mystery about exactly where the failure might occur.

* I/O error tests: no, IO failures are independent of language.

* Crash and power loss tests: no, this has nothing to do with language.

* Fuzz tests: yes, partially. Different languages can reduce the range of invalid input a program could encounter.

* Boundary value tests: yes, partially.

* Disabled optimization tests: no, the optimizations being referred to here are features of SQLite, which would need to be tested regardless of language.

* Regression tests: no, this is independent of language.

* Malformed database tests: partially, because some languages are better at validating input statically, which covers some of the cases of malformed databases.

* Extensive use of assert() and run-time checks: I say no. Yes, asserts are code you need to write, and a cognitive load. But navigating the built-in language constraints of more type- and value-validating languages is also a cognitive load. I believe asserts are often a lesser load since they can be in the problem space of the software being developed vs. the problem space of the language design. In C the molehill comes to the mountain. OK, Ok, on the other hand, in C no one forces you to write good asserts. But then, in no language does anyone force you to write code that actually does the job.

* Valgrind analysis: yes

* Undefined behavior checks: yes

So, largely no. Other languages solve some problems, yet have their own drawbacks.

You're drawing a sharp distinction between "C" and "C with analysis backing", yet there is no such distinction in the real world. We see in this project what we would see in every successful, long-running project (and do NOT see, 100%, in every project that failed in the long run): controls in place to deal with the potential pitfalls. One advantage of C is that its history can give you some nice guidelines to understand when you have pretty good coverage on those, from a language standpoint.

I mean, I'm not arguing that some modern languages have their advantages. But, they have disadvantages as well. And, their advantages -- restricted as they are to static analysis -- solve little problems, not the big ones.


I never claimed that language features can obviate all the things. I'm just saying that you can't use SQLite as proof that C is A-OK okey dokey it's OK to just keep programming in it forever, because it does not use plain C. If you're not putting that much effort into your C codebase, then you're not getting those results.

Also, while I'm sure you don't intend it this way, simply listing the issues like that is a bit visually deceptive in that our hindbrains tend to assume all those issues are roughly equally weighted, because they're all roughly equally the same size. But the distribution of issues there is very non-linear. C doesn't just have a bit of an issue with undefined behavior; it is nearly (but not quite) uniquely damaged by it.

Hypothetically some type systems could partially cover some of the things you've labelled as "no", but I'll aggressively beat you to the punch that such things are very hypothetical, except I'll be saying it with sadness and through grinding teeth, because I find it frustrating how thoroughly our environments ignore some issues like that. But that's a rant for a century I don't expect to live to see.


> If you're not putting that much effort into your C codebase, then you're not getting those results.

My point is, if you're not putting that much effort into your codebase -- whatever the language -- you're not getting those results. There is no language that makes developing and maintaining SQLite easy or straightforward.


>Hypothetically some type systems could partially cover some of the things you've labelled as "no", but I'll aggressively beat you to the punch that such things are very hypothetical, except I'll be saying it with sadness and through grinding teeth, because I find it frustrating how thoroughly our environments ignore some issues like that. But that's a rant for a century I don't expect to live to see.

I'd agree with you, except that just this morning I was reading a paper about the NARCISSUS framework that can automatically generate encoders and decoders that provably conform to a given specification, and can be extended by a user without rewriting or modifying the framework itself. The authors even patched a networking stack (in MirageOS) with the not-especially-optimized ML extracted from their generated code and showed a performance hit that could be considered acceptable for real-life use cases.

https://www.cs.purdue.edu/homes/bendy/Narcissus/

With advances like this in program synthesis and formal verification, plus real progress in machine learning, it feels like we're living in the start of the age that was _supposed_ to happen during the golden age of AI.

(Excuse the gushing hyperbole. My coffee must have been stronger than usual this morning.)


> I never claimed that language features can obviate all the things.

You did more than that: you suggested that C is only usable with a pile of safety infrastructure, offering SQLite as your proof, and once that claim was debunked you kept insisting on your baseless assertion.

> I'm just saying that you can't use SQLite as proof that C is A-OK okey dokey

There is no need to prove that because it's quite obvious that C is fine. If you feel the need to prove otherwise then you need to put forward your own proof. Either you support your baseless assertions with rational and tangible claims or you're just venting an irrational dislike. Meanwhile the world runs on C and has been running for decades.

> it's OK to just keep programming in it forever, because it does not use plain C. If you're not putting that much effort into your C codebase, then you're not getting those results.

You're somehow turning a blind eye to the fact that that "effort" is patently language-independent and has absolutely nothing to do with C. Either you somehow missed the whole point of the post or for some reason felt the need to keep repeating baseless claims that were already debunked.

I have no idea why you've developed an irrational hatred of C, but it clearly is a personal issue, not a technical one.


> There is no need to prove that because it's quite obvious that C is fine.

No. It is quite obvious to anyone rational and without a vested interest that C is no longer fit for purpose in this modern age.

> I have no idea why you've developed an irrational hatred of C

Let me guess. He's sick of patching security holes, memory leaks and crashes?


Wow.

Any one of the 3 test subsystems they describe is already ahead of well more than 99% of all software projects being developed. How is this funded?


This talk is really interesting too: https://www.youtube.com/watch?v=Jib2AmRb_rk

Big industry projects (like the airline mentioned) probably contribute.



> Consortium members can call any developer at any time, day or night, and expect to get their full and immediate attention

I wonder how this works. For sure even the most dedicated developers will every now and then be on holiday, at a concert, or drunk.


Wow, I never knew SQLite was that well tested!

I was recently looking at bedrockdb, and other DBs built on top of it.

Anyone know what happened to UnQLite? It's by the same author.


My favorite aspect of programming in C is how little it gets in the way of the programmer. Once you become familiar with the language there is not much to look up in the docs because it's such a small language. I might end up writing more verbose code, but it's usually very clear to me. You can still write spaghetti code but that's beside the point.


I mostly agree but I think there is too much weird UB and there are too many corner cases for it to qualify as a small language. I guess C compiled with -fwrapv -fno-strict-overflow -fno-strict-aliasing and a few others might come close to what you're describing.

I think C's apparent simplicity lulls me into complacency at times: I delude myself into thinking that I'm coding in some kind of macro assembler and that I know what the resulting machine code will look like. And then some super weird optimization or UB kicks in and nothing makes sense anymore, because I stopped playing by the rules and I triggered the footgun.

Just look at the number of bug reports on the GCC bugtracker for code that at a glance ought to work and it turns out that it's actually not a bug, the code just triggered a subtle UB and the compiler ran away with it and generated code that ate your cat.


> I mostly agree but I think there is too much weird UB and there are too many corner cases for it to qualify as a small language.

I don't see the point of your claim regarding undefined behaviour. The rules are quite simple: undefined behavior means compiler-specific behavior. Therefore, if you aim for compiler independence then you don't use it. If somehow you decide to target a compiler then you read the compiler's docs. It's that simple.

These UB complaints are even more ridiculous when we realize they complain about the fact that the language is actually defined.


UB is not implementation defined, it's UB. Some compilers have options to defuse certain classes of UB, but then you're effectively coding in a non-compatible dialect of C. Otherwise you can't ever rely on a certain UB behaving one way or another: a simple compiler update, code change or compiler flag modification could break everything. A compiler is under no obligation to define what it does in case of UB and that's the point of it, it leaves some room for aggressive optimization.

There are "implementation defined" details in the C standard but it's a different problem, see for instance: https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html

Anyway that wasn't really my point; the problem is that some of these UBs can arise because of subtle bugs in code that might not look suspicious at a glance. Things like breaking aliasing rules, misusing unions, casting things that aren't compatible etc... Your code triggers UB and you don't know it. Actually you might not notice it until you turn on an optimization flag or you update your compiler and suddenly it doesn't do what you want anymore.

Even something as trivial as computing a pointer that's more than one byte after the end of an object is UB for instance (not dereferencing it, merely computing its address). For that reason `ptr.offset` is unsafe in Rust for instance, even though it doesn't dereference the pointer.
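For instance (a hypothetical fragment; note that neither pointer is ever dereferenced):

    int a[4];
    int *one_past = a + 4; /* fine: one past the end is explicitly allowed */
    int *two_past = a + 5; /* UB: merely computing this pointer is outside
                              the rules, even without a dereference */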


I find it a bit silly that `ptr.offset` is unsafe, but casting an arbitrary integer to a pointer isn't. E.g.:

    fn main() {
        // Look, an invalid pointer, no `unsafe` required.
        let ptr = 1000 as *const u8;
        // boom, segfault.
        println!("{}", unsafe { *ptr });
    }
Using casting one can even implement a "safe" pointer offset function, like so:

    fn main() {
        fn safe_offset<T>(ptr: *const T, offset: isize) -> *const T {
            ((ptr as usize).wrapping_add(offset as usize)) as *const T
        }
        let xs = [0u8, 10];
        let ptr = safe_offset(&xs[0], 1);
        println!("{}", unsafe { *ptr }); // prints '10'
    }
Obviously this "safe_offset" function can easily be used to trigger UB by computing invalid pointers, and not a single line of unsafe code was required (although we do need `unsafe` to dereference the bad pointer and actually trigger segfaults).


I believe this is because offset uses an llvm intrinsic with the extra requirements, as it uses that info for optimization. Your version doesn't.


Interesting. However, isn't casting a random (potentially invalid) integer into a pointer triggering the same potential UB? I ask because for GCC it apparently is:

>When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-imple...

That's pretty explicitly what Thiez was doing in their rust code, although obviously Rust/LLVM might have different semantics here.


The docs list no undefined behaviour for inttoptr, so it's (probably) not problematic at the LLVM level: http://llvm.org/docs/LangRef.html#inttoptr-to-instruction .


The exact rules of unsafe code are still up in the air. It’s not explicitly defined as UB yet, IIRC, and when we set the rules, we have a goal of not invalidating large swaths of code.


> if you aim for compiler independence then you don't use [undefined behavior].

I think the point was that UB occurs in a lot of relatively common cases and programmers don't realize that they're depending on it/experiencing it.

> If somehow you decide to target a compiler then you read the compiler's docs.

Which is why the post you're responding to made the point about undefined behavior. "Read the compiler's docs" is in counterpoint to that post's parent, which praises C for being a small language, and thus one in which reading the language's docs is seldom required.


I'm not sure I get your point. You say that C is a tiny language, hence leading to small docs, but that can be said of every language? Except that they may have larger standard libraries, but nothing stops you from using a subset of them.


Python is a tremendously complicated language, even disregarding the standard library. Understanding all the "everything-is-an-object" magic that goes on behind the scenes is a burden which C does not share.

(Though, C does have its share of mental burdens. Understanding automatic type conversion is quite a beast.)


Though I wouldn't compare python and C, as they obviously don't target the same ___domain, for C, you have a whole world of UB and system interaction that you also have to keep in mind which is, IMHO, worse than the thing going on in Python.


The language spec is 700 pages long, I wouldn't call that tiny: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Sure, C++'s is double that, but it's still long. In comparison, the Java language spec is around 800 pages, Ruby's 330 pages, and C#'s 500 pages.


> The language spec is 700 pages long, I wouldn't call that tiny

To be fair, that now includes a substantial standard library. The language + preprocessor spec ends on page 197 and starts on page 19, for a total of 178 pages.


Language spec length is meaningless unless it's in some sort of quantifiable format.


None of the specifications you cite contains a formal semantics of the language in question; that ought to be the basis on which arguments about simplicity should be founded.

(Why the lack of formal semantics? My guess is that all of these languages, to varying degree, are too complex to easily formalize...)


Means nothing if you aren't talking about what that documentation actually includes. I currently work in a language that has like 10 pages of documentation, much of which is "TODO".


The length of the spec isn't really relevant is it? Or are you implying that C is about as complex as Java?


C is significantly more complex than Java. The memory model of Java is simple (multithreading notwithstanding), because it's memory-safe and garbage collected. Not so for C, which has very subtle rules, which result in undefined behavior if broken.


The rules for java memory management could probably fill 200 pages just by itself and it's not easy to debug issues with leaks, excessive overhead and the several layers and variants of garbage collectors.

C's memory model is trivial by comparison.


The question here isn't whether it's easier to write correct C programs than Java programs, it's whether C is a simple language.

Subtle UB, as much of a headache as it can be to a programmer using the language, in this case leads to simpler implementation of a compiler.


No, it doesn't. Maybe it did in 1978, but modern C compilers have to go to heroic lengths to do things like alias analysis that would be easier if the aliasing rules of C weren't so subtle. In Java the question of "can pointer A alias pointer B" is simple (unless typeof A derives typeof B or vice versa, the answer is no). In C, due to TBAA, it's hideously complex.


If you look at software engineering and static analysis papers for a language like Java, there's often a discussion about using stuff like 2-object, 3-call site context sensitive alias analyses built on more or less fairly standardized Datalog rules. If you look at C or C++, the response is generally "you want a precise alias analysis? fall down in riotous laughter".

The rules for TBAA are complex, and many C/C++ programs violate those rules because it's so hard to actually make sure you're not violating them, and half of the purpose of using C/C++ is actually to be able to do the kind of type-punning that TBAA prohibits.
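The canonical illustration (a textbook strict-aliasing example, not from any particular codebase):

    /* Under strict aliasing, the compiler may assume an int * and a
       float * never point at the same object. */
    int mix(int *i, float *f) {
        *i = 1;
        *f = 0.0f;  /* assumed not to modify *i */
        return *i;  /* so this may be folded to "return 1", which is wrong
                       if the caller passed aliasing pointers, e.g.
                       mix(&x, (float *)&x) */
    }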


I mean writing a compiler that fulfills the specification. Whether a design complicates the implementation of alias analysis doesn't really change whether it is simple.


I think you've pointed out the distinction many miss when talking about simplicity, which results in talking past each other:

simplicity for the user and simplicity for the compiler writer are totally different things. Java is simpler for the user but more complex for the implementer, and C is more complex for the user but simpler for the implementer.


> nothing stops you from using a subset of [the standard libraries].

This only applies if you're writing 100% of the code. Not having coworkers to mess around with every corner-case of the language is a rare privilege.


C++ is a huge language by comparison, and very few people know every corner of it.


Very few people know every corner of the English language, or, say, French. This doesn't prevent us from communicating, though.


In Python, you _need_ to understand the GC and its class system - even if you ignore it, the standard library doesn't. In C++ the situation is similar unless you're actually writing C and compiling it with a C++ compiler. Go is fairly small for a modern language, but it does have a GC and a fancy type system.

OTOH preprocessor directives grow the base C a bit. It's still very tiny, though.


In what way do you need to understand the GC before using it? The only thing I can think of is avoiding circular references.


GC can cause a lot of issues if you care about how exactly long something takes.

Like all abstractions there are corner cases that force you to deal with the underlying complexity.


Go ahead and allocate and deallocate lots of tiny objects in a short period of time. Depending on GC, you'll either run out of memory, or grind down to a halt.


I don't write C but I have to imagine that the language being small doesn't really save you from having to look up stuff in documentation. If the language doesn't have it surely some library will and you'll have to read the docs no? The "problem" has just been outsourced.


"but that can be said of every language? "

What about Haskell?


I'm not sure what you mean. Haskell is a small language.


GHC Core is. Haskell is far from small

https://blog.chewxy.com/2014/09/03/small-languages/


If you think C++ is not overly complicated, just what is a protected abstract virtual base pure virtual private destructor and when was the last time you needed one?


I know the original quote was meant as a complaint about C++, but it’s always seemed to me to be more about how people were teaching C++ in the ‘90s.

What about the zinger, “if you think English is not overly complex, just what is a loud old fast tall red car, and when was the last time you needed one?” No language I’m familiar with limits the number of adjectives you can use for a single noun. That doesn’t mean you generally should string several adjectives together; usually you’re fine just saying “car,” but sometimes you need the ability to be specific. English is a complicated language, but limiting how many adjectives can apply to a particular noun would make it much worse in my opinion.

As for the original question, a pure virtual function is a function that must be implemented in a derived class, while a private function is a function that can’t be accessed outside of the class, not even from derived classes. A destructor is a function used to clean up when an object goes out of scope; I don’t think anything about the destructor is important in this case, it just happens to be a convenient member function to use.

The combination would make it impossible to instantiate objects of the class (because of the pure virtual function) or objects of classes that inherit from it (because the pure virtual function can’t be accessed, let alone implemented, from a derived class). That rules out much of what you’d want to do with a class. You’re left with static data and functions, and nested types. Overall, the best I can tell, you would use that to make the class imitate a namespace. But for the last twenty years, you don’t have to imitate namespaces: C++ has them.

You might still find somebody trying to avoid using actual namespaces because of a vague fear of argument dependent lookup, but to be honest, I’ve never seen argument dependent lookup call the wrong function. Some programmers oppose it in principle, but in practice it doesn’t seem to cause much trouble.


If you think C is not overly complicated, just what is a static const __declspec((dllimport)) __restrict volatile unsigned long long int* and when was the last time you needed one?


Considering that half of that is not standard C code, and the other half is just a standard way of declaring a pointer that doesn't differ much from any other typed language, I don't think you're making a point here.


I would agree. If you take out the __declspec((dllimport)) then it's just a 'static const restrict volatile long long int *', which has lots of qualifiers but isn't really much more than just a pointer to a `long long`. The qualifiers make it a bit harder to reason about, but that's also why people rarely use `restrict` and `volatile` in the first place.

I could be wrong, but I think the above poster hasn't actually written tons of C, or else they would have picked a much more complicated example. If you throw a bunch of arrays, pointers, and function pointers in, it quickly gets out of hand, like this:

    int (* const (*(*foo)(int (*)(int, int)))[5])(void);
That thing defines 'foo', which is a pointer to a function whose first argument is a pointer to a function taking two ints and returning an int, and which returns a pointer to an array of 5 const function pointers that take no arguments and return an int.

Thankfully, you rarely run into something like that in the wild. And if you `typedef` the function pointers (which is pretty common now) it becomes tons easier to read.
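
For instance, the above could be tamed into something like this (typedef names invented):

    typedef int (*binop_fn)(int, int);  /* int (*)(int, int) */
    typedef int (*thunk_fn)(void);      /* int (*)(void)     */

    /* pointer to an array of 5 const function pointers */
    typedef thunk_fn const (*thunk_table)[5];

    thunk_table (*foo)(binop_fn);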


> That thing defines 'foo', which is a pointer to a function whose first argument is a pointer to a function taking two ints and returning an int, and which returns a pointer to an array of 5 const function pointers that take no arguments and return an int.

I find this much easier to understand than something with different keywords and qualifiers. Function pointers follow some simple rules to read, but can you be sure of what `volatile` implies?


> I find this much easier to understand than something with different keywords and qualifiers. Function pointers follow some simple rules to read, but can you be sure of what `volatile` implies?

Hmm, I suppose we just disagree then, which is fine. Things like `restrict` and `volatile` don't really bother me because:

1. The spots where people actually need to use them are very rare (`restrict` does have uses, but almost nobody uses it commonly. `volatile` has pretty much no correct uses outside of accessing registers in low-level code).

2. It's still just a pointer to a long long. I would need to look up what __declspec((dllimport)) does, and I'd be highly suspicious of the use of `restrict` and `volatile` if it was not explained, but I still generally know what it is and how it can be used. Whereas with the example I gave, without staring for a long time or throwing it into a parser I can hardly tell what the type even is, let alone wrap my head around how it will be used.


>`volatile` has pretty much no correct uses outside of accessing registers in low-level code

Sorry, but what are you guys on about? You use "volatile" if you don't want the compiler to produce code that caches a variable. You use it all the time in embedded systems, or when writing multi-threaded code, etc. The usage of "volatile" is very clear and should be well understood. "restrict" isn't used much because, first of all, it was only standardized in C99, and second of all, it's like "inline": it might make a difference, might not, and not a lot of people are in a situation where they have to shave off cycles.


> You use "volatile" if you don't want the compiler to produce code that caches a variable. You use it all the time in embedded systems, or when writing multi-threaded code, etc.

Thanks for providing a very nice exhibit of the confusion around this. Volatile is 99% useless for multithreading code (unless on MSVC which has a peculiar interpretation of the standard). See https://stackoverflow.com/a/4558031/1495627


It absolutely is not useless. You're the one confused. I never claimed it's a synchronization construct, or that it magically makes your code thread-safe.

No amount of fences or mutexes is going to help you if threads are operating on their own version of a variable that got cached in a register, or optimised away. You need to understand volatile to write correct multi-threaded code.

That StackOverflow answer is bogus, along with all the other comments. It's just a rant that, albeit correct, is barely related to the question asked, and it's a clear example why you shouldn't treat StackOverflow as more than an unreliable help forum. The guy asked about whether you should make variables shared inside of a critical section volatile. The answer is "yes, to prevent threads from working on stale data".


> No amount of fences or mutexes is going to help you if threads are operating on their own version of a variable that got cached in a register, or optimised away. You need to understand volatile to write correct multi-threaded code.

Any properly written Mutex or similar implementation is going to act as a full memory barrier and compiler barrier, meaning variables will already not be cached across lock/unlock. And if you're not properly taking your locks before accessing your variables, `volatile` is not going to save you. The fact is, if you're using a lock to protect a variable, marking it `volatile` gains you nothing and just slows your code down.

> "yes, to prevent threads from working on stale data".

`volatile` absolutely does not guarantee a variable doesn't contain 'stale' data. That's the entire reason you need memory barriers in the first place. Even though the compiler will read a `volatile` variable from memory every time, that memory may still have a stale value in the CPU cache, which `volatile` will do nothing to prevent. Only proper use of memory barriers ensures everyone is working on the same thing, which `volatile` does not do.
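
If you want both the "don't cache this" effect and the ordering guarantees, the portable tool (since C11) is <stdatomic.h>, not `volatile`. A rough sketch:

    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;               /* plain data               */
    static atomic_bool ready = false; /* the synchronization flag */

    void produce(void) {
        payload = 42;
        /* release: the write to payload is visible before the flag */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    bool try_consume(int *out) {
        /* acquire: pairs with the release store above */
        if (atomic_load_explicit(&ready, memory_order_acquire)) {
            *out = payload; /* guaranteed to observe 42 */
            return true;
        }
        return false;
    }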


> Even though the compiler will read a `volatile` variable from memory every time, that memory may still have a stale value in the CPU cache, which `volatile` will do nothing to prevent

FWIW, it doesn't even do this, it's just a compiler-level annotation:

  volatile int x = 0;
  int foo() {
    // read
    int ret = x;
    (void)x;

    // write
    x = 0;
    x = 1;

    return ret;
  }
The 'volatile' ensures that that code results in two reads and two writes. Removing it allows the compiler to optimise down to the equivalent of 'int ret = x; x = 1; return ret;', but both with and without use the exact same read/write instructions (i.e. have the same interaction with the cache): mov on x86 and ldr/str on ARM.


Nobody's arguing that volatile has anything to do with CPU cache, or barriers, or locks, or any other nonsense! volatile is there to ensure that the compiler generates code that accesses the variable directly each time, which is necessary to write correct multi-threaded code. If you don't use volatile, and your code still works, then either the compiler's smarter than you, or you got lucky. End of story. The level of misunderstanding you guys are showing on the matter is beyond comical.

volatile's one of the simplest modifiers to understand if you care to actually learn the language. It has a clear and well-defined purpose, and I have no idea how you can spend so much time and energy arguing that that's not the case.


> And if you're not properly taking your locks before accessing your variables, `volatile` is not going to save you.

Yes, it will! Just because you’re accessing shared variables doesn’t mean you need locks. This is what volatile is for!

You’re blindly hoping the compiler will do the right thing for you, anyway. You must have little-to-no experience writing multi-theaded code in production, let alone anything more advanced that doesn’t rely on locks.

Nobody’s arguing that volatile somehow bypasses the cache. When it comes to multithreading, that doesn’t matter though, because the whole process works with the cache, and the CPU keeps core caches synchronized. Memory barriers do nothing here.

Seriously, you’re arguing rubbish.


You do use it on embedded systems, hence my note about using it when accessing things like memory-mapped devices in low-level code (depending on what you're doing). However `volatile` is not at all correct for multi-threaded code and that's a very common misconception. It's really not correct for basically anything except for embedded systems.
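
For what it's worth, the textbook correct use looks something like this, polling a memory-mapped status register (the address and bit are made up):

    #include <stdint.h>

    #define UART_STATUS   (*(volatile uint32_t *)0x40001000u)
    #define UART_TX_READY 0x1u

    void wait_for_tx_ready(void) {
        /* Without volatile, the compiler could hoist the load out of
           the loop and spin forever on a stale value. */
        while ((UART_STATUS & UART_TX_READY) == 0) {
            /* busy-wait */
        }
    }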


Stick with 1989, and you don't have to worry about it.


Well... the declaration you cited resolves down to a pointer to a wide integer. Part of C's simplicity is its small number of keywords, many of which have become way too overloaded as architectures have grown more advanced. The keywords were decided in the 1970s, when address spaces were 16 bits at the most. The ability to create type aliases with `typedef` has added to this apparent complexity, and this has been abused by compiler and OS vendors for decades.

None of it makes the language inherently any more complex, any more than representing machine instructions by long strings of binary makes _them_ any more complex.


> Well... the declaration you cited resolves down to a pointer to a wide integer

With a boatload of associated semantics, quite a bunch being in practice compiler-dependent (looking at you, MSVC volatile).

And a "protected abstract virtual base pure virtual private destructor" does not mean anything -- or at least any code that compiles.


> With a boatload of associated semantics, quite a bunch being in practice compiler-dependent (looking at you, MSVC volatile).

Which is not part of C. Your argument that "C is complicated" doesn't work if you cite an egregious misuse of nonstandard extensions.


Forgive my ignorance here, I'm genuinely intrigued: what's so special about MSVC's volatile as opposed to the regular volatile keyword?


C++ =/= C


> Once you become familiar with the language there is not much to lookup in the docs because it's such a small language.

The same could be said about Lisp. Would you trade your C compiler for Lisp one?


It is not fair to compare Lisp to C. C is much more practical.


C is great and SQLite seems to be doing amazing things leveraging its power. Yet I bloody wish there were alternative (yet 100% compatible on the file format and SQL dialect levels) SQLite implementations native to specific languages/platforms (e.g. .Net). In just so many cases the DB speed is the least priority while simplicity and portability are valued much higher. Every time I build a humble desktop .Net app featuring an SQLite database it turns into a ridiculous "try this, google a lot, try that, repeat, .... oh wow, it works!... oh, it doesn't" nightmare, and even if it actually starts working nobody really knows how to deploy such an app on another computer (especially one running a different kind of OS) correctly.

SQLite is arguably the best of what has happened to the world of desktop/embedded databases ever yet lack of alternative (not necessarily this fast) implementations in high-level languages is its main problem IMHO.


I've used SQLite a lot in recent years and found the issue you describe to be largely one that only affects Windows deployments. On Linux, FreeBSD, Solaris and OS X builds it's all just worked for me. But the same build scripts will fail in a multitude of subtle ways on Windows. IIRC (I've not targeted Windows much recently) I've found the easiest way to get it working was just to have MinGW (or a similar POSIX layer) and build SQLite from within that.


As with most things, your mileage varies a lot based on what you are used to. The previous poster mentioned "cross-platform" and I think that is where the most headaches lie because you might know one permutation, but not all of them.

For instance, Windows binary deployments are quite predictable (and often dead simple; so easy a lot of Windows installers get it wrong and yet still work), but on *nix you may have to fight distro differences in libc or get a dozen different answers on what you should statically link in your compile versus what you "must" dynamically link and a half-dozen different installers/bundlers/distro tools to deal with making sure the shared libraries are both installed and correctly linked to. What "just works" is often relative and the magic is very easily dispelled in that terrible state of "why isn't this working?", especially if it "just works" like magic for other people.

For what it is worth, the Windows UWP platform even bundles a shared SQLite install (as like Android and iOS) and using it is quite easy in a bunch of languages (just not easily portable cross-platform).


> For instance, Windows binary deployments are quite predictable (and often dead simple; so easy a lot of Windows installers get it wrong and yet still work), but on *nix you may have to fight distro differences in libc or get a dozen different answers on what you should statically link in your compile versus what you "must" dynamically link and a half-dozen different installers/bundlers/distro tools to deal with making sure the shared libraries are both installed and correctly linked to. What "just works" is often relative and the magic is very easily dispelled in that terrible state of "why isn't this working?", especially if it "just works" like magic for other people.

That's only if you're working with C / C++ though. Switch to Go, Java or basically nearly any other language, and most of your cross platform headaches go away.

Though for what it's worth, half those problems aren't really problems you'd generally have to get your hands dirty with as your tooling and distro should manage that for you (just so long as you write POSIX code that is). I'll grant you I've not done anything too complex in C++ but what I have done was portable between Linux, Solaris and FreeBSD (the 3 systems I needed those POSIX C++ programs to target).

That was compiling on those respective platforms rather than compiling for those platforms then packaging them up for deployment. I've written a lot of code over the last 30 years and shipped it in a plethora of different ways and creating installers has historically definitely been easier on Windows. No question. Go lang is helping somewhat in that it's now really easy to ship a dependency free binary. But Windows is only one platform and one that creates more problems for cross-platform portability than all of the rest of the big platforms put together.


> For instance, Windows binary deployments are quite predictable (and often dead simple...

It's neither predictable, nor simple, nor easily portable when it's about a mixture of managed (C#, Java etc.) and native (C, C++, asm etc.) code. I'm happy to know the Windows UWP platform finally has it built-in, but I code WinForms and console apps targeting Windows 7 + Mac and Linux with Mono, and using SQLite always means a problem (yet I always want to use SQLite to store anything, except the things that obviously fit other kinds of storage better, because it's great and it's a standard supported everywhere).


I'm not sure if it's related to the .Net framework, but it's definitely NOT related to the Windows platform. I've been using SQLite on Windows (with Delphi) for nearly a decade; while there were issues (all solvable), not one was related to deployment, seriously.


Are you saying that there's no actually working .net sqlite bindings?

Because when using python, featuring an sqlite database is roughly:

    import sqlite3
    with sqlite3.connect(db_file) as cnx:
        cnx.execute("SELECT 1")  # query the db


I think the parent means native implementations so that these bindings are not necessary. The bindings are an interface to a C library that should be shipped with the device. I can see how that can be problematic if you're targeting OS X, Windows, and Unix systems


The OP is saying problems exist before even deployment concerns:

> even if it actually starts working nobody really knows how to deploy such an app on another computer (especially if it runs a different kind of OS) correctly.

emphasis mine.

(and portability could obviously be solved the same way it was in Python: ship sqlite as part of the core system)


> The bindings are an interface to a C library that should be shipped with the device.

There is nothing stopping you from statically linking it with the rest of your code.


How exactly would you link a C library to a .NET application?

Apart from compiling it using something like ilcc, that is.


.NET has always had a foreign function interface (FFI) system called P/Invoke [1] or Platform Invoke. You build a DLL and ask the .NET runtime through metadata to load that DLL for you and call the functions you need.

It's not entirely cross-platform, but it is possible. Example: https://developers.redhat.com/blog/2016/09/14/pinvoke-in-net...

More details in the .NET Standard: https://docs.microsoft.com/en-us/dotnet/standard/native-inte...

[1] One interesting reference site: https://www.pinvoke.net/


That isn’t that simple. https://www.sqlite.org/download.html has separate libraries for 32-bit and 64-bit windows, and you have to write code to figure out what DLL to load (see http://system.data.sqlite.org/index.html/doc/trunk/www/faq.w...)

Nowadays, you mostly can ignore x86, but I guess you still hit similar problems if you want your code to run on Windows for ARM, .NET core on Linux, etc.


I'm aware of P/Invoke. I'm pretty sure that's how the ADO.NET wrapper to SQLite works. GP was referring to statically linking a native binary to a .NET application, and as far as I know, there's no way to do that.


I was curious. One approach is apparently to include it as a binary resource, extract it to temporary space at runtime, then P/Invoke: https://stackoverflow.com/a/768429

That seems a bit much to simulate static linking, but doesn't sound like it would be too noticeable to users.


That's an interesting technique, and would certainly work.

You could probably even make it cross-platform that way.


Not all languages are compiled.


If your language has no possibility of FFI, that's kind of an issue with that language.

Good luck using portable libraries.


Maybe you could help me understand why static linking is a requisite of FFI?


Oh, d--n it, misread. Never mind.


No problem; it happens!


The value of SQLite lies in its implementation, and a native implementation would be immensely costly.


You really don't want to re-implement SQLite for every high-level language!

Also, isn't SQLite just supported out-of-the-box by new .Net versions? E.g. https://docs.microsoft.com/en-us/ef/core/get-started/netcore....


I've worked fairly extensively with SQLite in C and .Net bindings. I prefer C, but you can do nearly everything at basically full speed with the various interfaces. The only thing off the top of my head that didn't work properly recently was saving in-memory DBs to disk using bindings. I had to use C for that.


I think it has been ported to .NET but unsure about maintenance: https://www.infoq.com/news/2009/08/SQLite-Has-Been-Ported-to...


If you're using Java, use H2 (h2database.com)

Edit: full disclosure, I am a maintainer :-)


I don't understand how you get in that situation. Python has shipped SQLite for years and it works the same on every single platform.


I strongly suspect this article was written as a response against C++, Java, or C#. Especially given SQLite being started in 2000, C really was undeniably the best choice, and the existing code in C is a strong argument in favor against rewrites.

That being said, Rust and possibly even Go would be strong contenders to make a new SQLite-like library/program today. At least on the Rust side, the C bindings are excellent too.


Let's not create a false equivalency between Go and C.

Go is a good language, one of its core pieces is to increase memory safety through the use of a garbage collector.

Much of what the SQLite post states may not be safe operations in Go.

Rust would be a safer alternative that meets many of the SQLite requirements.

The thing is, SQLite is bulletproof at this point, so do we need to replace it?


> The thing is, SQLite is bulletproof at this point, so do we need to replace it?

No, probably not, it is fine as it is. Still, even SQLite started out as a "for fun" project. Who knows what might happen in the future to displace it? ;)


SQLite didn't start for fun; it was built for navy destroyers and submarines. (1) That's probably why the parent called it bulletproof.

(1) https://en.wikipedia.org/wiki/SQLite#History


It could also be a reference to the comprehensive SQLite test suite (1) or the fact that it has been in production systems for many years, though its initial purpose does make "bulletproof" quite apt.

(1) https://sqlite.org/testing.html


> Sqllite didn’t start for fun it was built for navy destroyers and submarines.

Those aren't mutually exclusive. I for one find destroyers to be an absolute riot.


I admit that I was merely being metaphorical and referencing the comprehensive test suite, wide distribution and usage.

I wish I had intended that pun embedded in the comment, but I will from now on ;)


When Wikipedia simply says otherwise I wonder where you got the "for fun" reference from. I tend to almost blindly trust comments on HN but this one was quite off.


You baited me into making an account.

See https://changelog.com/podcast/201#transcript-100

>I had a couple months off and I thought, "Hey, I'm just gonna go and cobble together a really quick and simple database engine that just does a few very simple SQL commands: insert, delete, update and select." No joins, wasn't trying to be efficient... All I needed to do was pull stuff off of a disk into memory.

>And I put it out there and... I've been doing open source for years before this, putting things on my website, and people would find my thing -- or well, you know, I'd put things on my website and it'd get like five downloads per year, or something like that. I'd figured this would be just another one of those things, but for whatever reason it really resonated with people.


> The thing is, SQLite is bulletproof at this point, so do we need to replace it?

Nothing written in C is bulletproof.


You're free to write bugs in any language. C just makes them easier to write or more critical.


OpenSSL was thought to be "bulletproof" and the "go to" option. And look where we are now


SQLite has a much more thorough testing process than OpenSSL did. Their acceptance tests have 100% MC/DC coverage [1], for example. While previous things like the IOC tester did find issues with SQLite, SQLite is very aggressive about including these sorts of tools into their ongoing testing process.

[1] MC/DC coverage is a slightly more rigorous form than branch coverage. For a condition like if (a && b), branch coverage only requires that the decision as a whole evaluate both true and false, whereas MC/DC additionally insists on test cases showing that a and b each independently change the outcome.
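
To make that concrete (the test vectors here are my own illustration):

    int f(int a, int b) {
        if (a && b) return 1; /* the decision under test */
        return 0;
    }

    /* Branch coverage: the decision must go both ways, e.g.
         (a=1, b=1) -> taken      (a=0, b=1) -> not taken
       MC/DC: each condition must also be shown to independently
       flip the outcome:
         (a=1, b=1) vs (a=0, b=1) -> a matters
         (a=1, b=1) vs (a=1, b=0) -> b matters                  */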


OpenSSL has always been nowhere near the level of robustness and bulletproofness that SQLite is at. Just look at how it's tested: https://www.sqlite.org/testing.html


It's too bad some parts of the tests are proprietary. I think if someone wanted to rewrite SQLite (keeping compatibility), those tests would be of huge value.


Yes, but that is also a source of funding for the project.


And how many more critical flaws would OpenSSL have if people kept rewriting it in the trending language of the week?

Rewriting just for the sake of rewriting or because you like another language better is an almost certain recipe for disaster.


I have a hard time believing this. It was common knowledge, and lamented before Heartbleed, that OpenSSL was bloated and unwieldy. People were surprised by the severity, not that it happened.


I don't think this comparison is fair. SQLite is widely regarded as one of the best codebases in the world and it has 100% test coverage.


That level of test coverage isn't as meaningful as people think. I'd bet that a significant fraction (> 10%) of those tests are redundant or cost more (in terms of maintenance, for one example) than the value they provide.


I think you should read the full write-up on sqlite testing:

https://www.sqlite.org/testing.html

There's no question that sqlite is one of the most well understood and reliable codebases in the world.


OpenSSL is known to have a poorly documented, spaghetti codebase. The same is not true for SQLite.


Bob Beck during his LibreSSL talk had a good overview of what was thought of OpenSSL https://youtu.be/GnBbhXBDmwU?t=2m11s

"we are all guilty"


I suspect there's a huge code quality difference between OpenSSL and SQLite. Also, keep in mind that SQLite was much easier to fuzz - it's basically one of the libraries that AFL has specific features for (dictionary-based coverage-guided fuzzing).


OpenSSL was never thought to be bulletproof. It is, however, the best of the worst for many scenarios.


But it was bulletproof until proven otherwise.


I suspect Go will never be an ideal choice for creating libraries which are supposed to be usable from any programming language. Even though Go's C-compatible-library story is developing (which is great), it brings runtime machinery that just isn't there with languages like C, C++ (especially with exceptions disabled) and Rust.


I agree.

I'm curious though--is it true that Go brings its runtime? I'm of the impression that the runtime is only compiled in if you actually use it, but if your library is just `func Add(a, b int) int { return a + b }`, would linking against it still bring in the runtime?

And to reiterate, this is only a curiosity. Even if you could take care to avoid importing the runtime, I wouldn't think that it's worth the while.


You would have to start with languages that are demonstrably better than C at being C, for the reasons listed as to why C is still the best language for SQLite.

In the present day context of vulnerabilities it's tempting to blame C. Yet it's still a good choice for many reasons. It's not wise to suggest that everything written in C must be re-written in some other language because of hand-waving reasons like buffer-overflows or off-by-one errors.

Maybe you could prove Rust is a good choice by writing your own SQLite implementation feature for feature? Until then I don't think there's going to be a compelling reason to re-write SQLite because it's written in C.


>It's not wise to suggest that everything written in C must be re-written in some other language because of hand-waving reasons like buffer-overflows or off-by-one errors.

These security vulnerabilities are trivial to fix at the language level but cause extremely negative consequences if exploited successfully. Yet the default stance of C programmers is to not address them at all. A competitor like Rust absolutely becomes necessary because of the complacency of C programmers. They will then strongly criticize the newcomer that is built on strong fundamentals as the "language of the week", chosen by "dumber programmers" trying to ride the latest hype train.


Ignoring your comments about other programmers...

Engineering is trade offs. I think the SQLite programmers are well aware of the risk the use of C brings in the context of security vulnerabilities. Given the statistics collected here: https://www.cvedetails.com/vendor/9237/Sqlite.html it seems that their choice of C was not a security disaster, because C. It actually seems well within a risk tolerance threshold that I don't sleep uneasy at night recommending its use in systems. If their choice of language came with unavoidable vulnerabilities I'd expect those statistics to be much worse.

So what does a Rust implementation of SQLite offer anyone? That within 8 years there may be one less SQL injection attack or perhaps < 4 overflow vulnerabilities? Will it have better interoperability or performance than SQLite does now or will have once this new version is done? Will the market care?

I'm not saying that care shouldn't given when choosing to use C in a greenfield project today. I am saying that for many problem domains the risks associated with C are tolerable given a well trained, disciplined team.

And I doubt there's much of a market for an in-memory database system that is going to be feature compatible with SQLite in a few years, is as performant as SQLite, and can be deployed on as many platforms. But that's just a prediction... maybe I'm wrong.

I've chosen Haskell on a greenfield project recently for many of the safety guarantees it brings... who knows?


You're making some mistakes here.

If the problem is complacent programmers, switching languages is going to have no effect.

You're also making blanket statements about the attitudes of C programmers which have nothing to do with the C language. That probably means your generalizations are overly broad and not true, not even for a significant fraction of C programmers.

I think you need to use some logic to argue your point, not wild claims.


There is no need for arguments here. C leads to vulnerabilities. That's self-evident. See ISO/IEC TR 24772:2013 for more information.


printf(“hello, world!”);

Please, find the vulnerability in that code, or quit making wild claims


Besides the fact that your code doesn't compile, I would argue that you should deliver real-world examples to assess, and then compare them to alternative solutions.

I know, theoretically it's possible to write secure code. Even in C. Experience has shown that 99.999% of people aren't able to do that. And many of the reasons have their origin in the design of this bad language.


> Maybe you could prove Rust is a good choice by writing your own SQLite implementation feature for feature?

A person undertaking that task would already have a head start in that they could likely use the extensive test suite for SQLite ;-)


But... who proposed a re-write?


Exactly. At this point, the strongest reason for SQLite to remain in C is that it is in C.


It's still not there yet, but getting close. And Rust is really the only language that has come close.

In terms of performance, it's really the only language out there that can claim to be as fast as C/C++. It has a minimal runtime, with the option of no runtime. It has pretty great FFI and can produce easily callable libraries for other runtime-heavy languages.

It's not, however, as broadly compatible as C. I don't think that's a problem for most cases, as most programming is now done for (MIPS|ARM|X86|X86_64), and it can handle those well enough. But microcontrollers and OS-less embedded devices still have a ways to go before Rust beats out C.

And stability is just not there. IMO, that's a good thing. Rust is the best thing to happen to systems programming in a really long time, and there are still tons of ideas with amazing potential benefits. The Rust community has been exceptional at guiding this development. I'm sure some day it will level off, but until then, I'm happy with it changing pretty rapidly.


I don't think "stability" and "changing rapidly" are incompatible notions. Stability implies that code written today continues to work years from now on later versions of the compiler and language. Rust has always (well, since 1.0) striven to provide stability in that sense, despite the rapid pace of language improvement.


> most programming is now done for (MIPS|ARM|X86|X86_64)

but not necessarily posix/windows/etc

having C as a least-common-denominator toolchain which almost all platforms will provide is still useful for the embedded world where sqlite has a huge number of applications..


Rust has a complete no_std mode where most of its ergonomics are still usable without any runtime. It also supports pluggable allocators, as well as replacing most of the built-in functionality of other language components (lang items).

Its story around panic and debug handling is also improving, including cutting out formatting code for things like println! and debug! when targeting embedded platforms.


It’s also not as fast as C/C++ in all cases, only in some.

http://benchmarksgame.alioth.debian.org/u64q/rust.html


The biggest differences there are due to lack of SIMD. We expect it to be stable quite soon, and that should close the gap quite a bit.


Do you say that because you have coded those programs using SIMD and rustc nightly, and seen that using SIMD eliminates the performance difference ?


I have not personally done this, but the people who have identified that as the issue, and what we’re stabilizing is the exact same thing as what C and C++ compilers have, so there’s no reason to believe it would be different. Once it’s stable, we’ll all see!


Someone has repeatedly commented on /r/rust/ that differences are due not so much to SIMD but iirc to not triggering the same LLVM loop unrolling.


I think you're confusing loop unrolling and autovectorization. That said, it's kinda moot, once SIMD is stable, we'll find out :)


>> I think you're confusing loop unrolling and autovectorization. <<

Here's exactly what I was told -- "This was achieved by manually unrolling a 10-step loop, which compiler apparently could not optimize."


I would say C will never be as safe as Rust. Just accept it, C is crap. It's just not cool.


You may want to check out Pony :)


In 2000, the only feasible language really was C. C++ suffered from severe portability issues, which have only arguably gone away in the past few years (that said, you still do see systems with pre-libstdc++-4.9, which is still a headache for portability). Managed languages generally suffer embedability problems, and still do to this day.

The conclusion that C remains the best language is a lot harder to support. Certainly, and especially with the level of infrastructure that SQLite developed to harden its implementation, the alternatives are not so much better as to be worth the cost of migration.


> you still do see systems with pre-libstdc++-4.9

Such as macOS ;)


I'd actually say D would be best suited as a replacement because of -betterC mode. You could replace one piece, run all the tests, and then move on to the next piece. Most of the C code could even be reused with minor modifications. In fact, this is why -betterC was introduced. The only reason you wouldn't want to do this is that D is limited to the platforms targeted by LLVM.


> Most of the C code could even be reused with minor modifications

I've done such conversions with C to Rust converter (https://gitlab.com/citrus-rs/citrus), but quickly found out that the style of writing idiomatic in C is part of the problem.

To replace code function by function you're generally forced to keep the same structs and APIs (often even internal ones) for most of the time, and these require you to erase the extra type safety, degrade smart pointers and slices to plain pointers, etc.

If you just do all the same wonky stuff that C does, but only with a slightly different syntax and compiler, you don't gain that much. The value comes from using idioms of a safer language, and that's much more work, and it's especially hard if your hands are tied by the rest of the program being C-like.


I agree. The important thing for a port is to be able to do it one small chunk at a time, so that you know you're not changing behavior. Then after the port is complete, you can start adding all the bells and whistles of the new language. There's no benefit to the port if you don't take the last step.


I'm an unabashed enthusiast for high-level languages that require complicated runtimes and tooling. I thank the heavens sqlite is written in C and if anything is to ever supplant it, I hope that's then also written in C.


What, you don't want to try and figure out how to embed the Erlang VM into your Python project? ;-)

EDIT: Jokes aside, Rust sounds like the most plausible alternative, though it still lags C due to being LLVM-only, whereas C has gcc and... others.


Probably originally written as a response against C++, Java or C#, yes. However, it's also been kept up to date (it mentions Swift), so I think they don't want to rewrite it in Rust or Go either. But I agree that Rust might have been a good fit if SQLite were being created today. There is the point the sibling commenter made about small embedded systems, though. But some people are working on making Rust useful even for microcontrollers, so who knows, maybe a SQLite in Rust could have been highly portable too?


Sqlite is often just dropped into a project as source code. You couldn't do that if it were written in Rust.


You can't do that with C code either unless your project is in C (or C++?).


Sure, but for many embedded systems you have to use C. Plus C interoperates well with every other mainstream language. Every platform supports C, so by writing it in C you can support every platform (with very few caveats).


Go and Rust are too heavy and require too many dependencies for something like SQLite. SQLite runs even on the smallest exotic embedded systems.


You can run Rust on atmegas. It doesn't get much more small, but it can get a lot more exotic. https://github.com/avr-rust/rust


OK, then that leaves us with a performance question, right?


If Rust performance diverges significantly from C, then it's a bug. Bugs do happen! We take bug reports, please file them.


But the point is to have maximum performance. This is the most deployed database in the world; the overall impact is incredible. It is not about being "significantly" slower.


I said "diverges significantly", that comes on both plus and minus. Right now, Rust can be faster sometimes, but is also slower sometimes. It just depends.


No it means there should _not_ be a performance question.

It does however mean that there is an issue with supported platforms. Rust has support for all the major platforms already, but C is probably the most widely supported language in existence at all.


But then you're forced to use the rust compiler. Sqlite can be included in any project that can use a decent C compiler.


Why would you be forced to use the Rust compiler? You can generate C headers for a static Rust image.


It brings the Rust compiler in as a dependency. I'm not saying you have to use Rust for the rest of the project, but it complicates things in a way that having C source code that you drop into your project doesn't.


I think you two are talking at cross-purposes. You are correct that building the project would require Rust. Your parent is correct that, with a pre-built binary, you don't need any Rust-specific stuff installed to make this work.

Building from source is usual, of course, given the lack of a stable ABI.


The smallest known Rust ELF binary is 151 bytes.


Given your user name, I believe you completely, but can you provide a citation for how this was done?


Here's the HN discussion, which also links it https://news.ycombinator.com/item?id=8869167

Of course, it's not usual Rust code, but no binary that small is usual code, even in C.

The point is to demonstrate that you can strip Rust down to as small as you want/need, not to suggest that every single Rust program is ultra-tiny.


I'd say that at that point, it doesn't even matter that you're writing Rust code. Most of the slimming down comes from cutting out cruft that the linker puts in by default.


Well, it matters in that Rust lets you eliminate this in the first place. Not all languages let you do this kind of thing. If you want to write Ruby, you have a VM, even if your program is `a = 5`.


Rust has no runtime... so... not sure what you mean by that.


I've seen a couple of efforts to attempt to port SQLite to Go, usually with the idea of 'mechanically' porting the existing C code to Go rather than starting from scratch. Someone has already ported the SQLite3 shell: https://github.com/cznic/sqlite3shell .. you can tell the code is transpiled, however, as it's nothing like what a Go developer would write, riddled with gotos and artificial variable names :-)

Personally, I think it'd be better to just start afresh with new implementations, not necessarily under the auspices of the SQLite project. Being 'compatible' is reasonable enough and you can just ditch the lesser used or out of date features if you're not claiming to be a 100% clone.


There's a lot of first-party tooling for that, though, since it's how the Go compiler itself has been slowly rewritten into Go.


The shell might be the least necessary to mechanically port, as it could be easily implemented in the target language and probably doesn't change that much.

The core is where the real compatibility issues lie and where it makes the most sense to translate, as well as where the core vulnerabilities that can be remotely exploited lie.


Go is as bad a fit for this kind of problem as any language with a non-deterministic GC would be.

Candidates may be: C, C++, Rust and maybe Swift (if the deterministic ref-counting doesn't get in the way).

Of course you can do it in other langs, but it won't be able to compete with products made in languages that fit the problem better.

Eg. CouchDB vs. MongoDB. The first invented the concept, but the latter used a language that fit the kind of problem better, and was therefore able to create a better product.


I don’t believe that Go is useful for creating a widely-consumable library.


> Rust and possibly even Go would be strong contenders to make a new SQLite-like library/program today.

Go absolutely would not be, one of the purposes of sqlite is to be embeddable in any and all software.


You can create shared libraries in Go, which can be used like any C-implemented shared library. This would make a Go-implemented SQLite equally usable from other programming languages as a C-written one.


> Especially given SQLite being started in 2000, C really was undeniably the best choice, and the existing code in C is a strong argument in favor against rewrites.

That's not much of an argument considering the CLR and the JVM are written in C.


They are both weak contenders, for the reasons laid out in the article. Go and Rust have large runtimes with lots of dependencies, and Go is not as performant as C.

Literally every system has a C compiler, except maybe for a very small number of very old and niche systems. Assuming it could fit in the ROM, sqlite could probably be ported to my z80 calculator, which has an 8-bit processor and 128 kilobytes of RAM - trivially! C runs literally everywhere, which is something no other programming language can lay claim to. For a tool like sqlite there is no other choice, period.


Rust has an equivalent amount of runtime to C.

Our platform support is limited by LLVM though, this is a good reason for sure.


I was under the impression that the support for unrecoverable errors via panic! creates a runtime structure (allowing for the explicit release of resources) which is not typically present in C programs (unless manually included).


Honestly, "runtime" is just a complex word in general, like, most people refer to C as having "no runtime" even though it does.

Regardless, those are called "landing pads", and you can compile with an option to turn them off. Many do.


Fair, I forgot that Rust has good freestanding support. I still don't think it'd be a good candidate for sqlite, though, for one reason in particular: Rust is not competitive with C on portability.


Yes, agreed. It's not a real answer yet, but mrustc can compile Rust to C, so if this becomes a really huge blocker, we could put in some elbow grease and make it a real answer. The question is mostly "is that worth the time or not?"


Which is why Nim would be a good candidate. It compiles to C, so it has the benefits of C's portability, interoperability and speed.


Oh please please no! Nim is almost impossible to debug at a low level and you have no fine-grained control over its performance or behavior. It's also far less well understood than C, which is more boring and the subject of decades of conservative development. I'm also not sure what kind of support nim has for freestanding programs.


Can you elaborate? I've never had problems debugging it, you can use lldb or gdb to do it.

You have a hell of a lot of control over its performance and behaviour. Are there instances where you found that wasn't the case?

As far as the understanding of C goes, I can agree with that. But we are discussing rewriting sqlite in something else after all.


Maybe you can reach out via email, it's in my profile. I'd rather not tear into Nim in this thread, we're getting pretty off-topic.


Done.


Doesn't it come with GC? That would limit embedding.


That's true. You do have the option to disable it though.

Seeing as Go was suggested above (which is also garbage collected), I figured I would mention Nim too.


Interesting. I love programming in Nim but I'm not sure if I would consider developing something like SQLite in Nim. It still has a few rough edges IMHO.


> It still has a few rough edges

Love to hear what those are and what language you would consider developing SQLite in.


I'm nowhere near technically competent to develop something like SQLite in any language. Richard Hipp is a legend unto himself.

But, I tried working in Nim without GC and it turned out to be a bigger hassle than I expected. The GC was supposed to be ref counted and now they're doing memory regions? There is also a -gc:stack option which I'm not sure works the way it does in C++. Then there is the pointer-free paradigm. Don't get me wrong, I'm very optimistic about Nim and it would be great if it really replaced C. But I feel like it's doing too many things at once.

I personally feel like Nim dev team should stop adding new features every release and work on releasing a solid 1.0.


> That being said, Rust and possibly even Go would be strong contenders to make a new SQLite-like library/program today. At least on the Rust side, the C bindings are excellent too.

Doubtful; Rust and Go do not run on nearly as many platforms as C and SQLite do. I get everyone wants to use other languages, but this incessant "rewrite in rust/go/$THING" is getting annoying.

I'm going to try to coin a new online discussion rule of programming language posts:

- At some point someone is going to suggest any problems for one language can be solved by rewriting in another language.

In this case "because c" "therefore rust". It's basically Godwin's law for programming language discussions.


Keyword is "SQLite-like", you are the one bringing "rewrite in rust" up.


Seems like a distinction without a difference to rewrite "sqlite in c" to "not sqlite but sqlite-like in rust or go".


How is it not different? It would be a completely new library. I don't see what the argument against that is.


You're free to do so, as is anyone.

But you'll have to provide a good reason why anyone using sqlite should move to your new unproven shiny.

If anything sqlite is a poster child for c done (and tested/validated) right. You'll have to demonstrate a lot more than you can implement some things better in the new language.


I'm going to stop here since the goalposts keep moving. My point is: no one brought "rewrite in rust" up but yourself.


Fair enough, but I guess I don't see a difference between rewrite sqlite in rust and, make something exactly like sqlite, with a c ffi but in rust.


I interpreted that commentary as referencing a new library similar in architecture, like an implementation of S3 that lives locally in the software and is modeled after SQLite. Not “let’s rewrite SQLite” like everyone else seems to have run with.


It's always tempting to use c++ because of, for example, the availability of libraries such as STL and Boost. However doing so greatly increases the complexity of the code, when considered as a whole, and we all know that complexity spins off bugs even in well-tested libs such as mentioned above.

The great thing about C is that when someone shoots you in the foot you know who it was.


The big problem with writing a piece of software like SQLite in C++ is that there is generally no easy way to expose a C++ interface to other languages. C++ has too much complexity for other languages to be able to easily construct the data structures it would need to make a function call or handle the results.

The normal solution to this is for C++ libraries to expose an interface in plain C, which is much easier for other languages to call. However, this either restricts what you can do in your implementation (because you're stuck with the C "subset" of C++ for anything near the API), or you have to maintain wrapper code mapping C function calls to your actual C++ interface. Neither is great, so plain C ends up making a lot of sense for a library that is expected to be called from different languages.
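
The facade usually ends up looking roughly like this (names invented), with the implementation object hidden behind an opaque struct:

    /* mylib.h -- a plain-C interface over a C++ implementation */
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct mylib_db mylib_db; /* opaque to callers */

    mylib_db *mylib_open(const char *path); /* NULL on error */
    int       mylib_exec(mylib_db *db, const char *sql);
    void      mylib_close(mylib_db *db);

    #ifdef __cplusplus
    }
    #endif

Each C function is then a one-line wrapper that casts the opaque pointer to the real C++ type and forwards the call.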


This. SWIG is a heroic effort at cross-language compatibility with C++, but it still gets wonky because C++ interfaces are complex, especially when dealing with templates. I still find that the best way to do language bindings is to expose a plain C interface that has no name mangling and treats objects as opaque pointers. Then write or generate bindings to call this interface from whatever other language you're using. Finally, wrap that basic functionality by hand.

SQLite is meant to be basic infrastructure that is wrapped by a variety of other languages. It should have a lowest-common-denominator interface, and C is good for that. And while it might be easier to implement it in $FAVORITE_LANGUAGE, SQLite is mature and well-tested. In some sense it's "finished," and throwing it out and replacing it would be a waste of time.


This makes sense if you are writing an absolutely tiny library where the interface (the surface area) is a substantial fraction of the whole. In most projects of any appreciable scale, the user facing functions and types are a tiny, tiny fraction of the code and complexity of the whole. So no, this argument does not even come close to justifying using C instead of C++.

Few people seem to realize that it's common for even the C standard library to be implemented in C++. Having to implement all of the printf variants (there are at least 8, I think) using C macros is horrible. Instead, the actual implementation of printf/fprintf etc. happens in a function template. You then have one-line extern "C" functions implemented by calling this template, which are declared in the header (and defined in the .cpp, along with the template).


Boost is a technical masterpiece and very high quality C++. It's also battle tested and very reliable. I use it for my professional projects but I loathe using it. Not just the complexity, but the error messages are really hard to decipher and most IDEs' autocomplete systems completely choke when using Boost. Even if I want to see how something works 'under the hood' it's almost impossible to figure out.

In the end, I came to the conclusion that it's far more productive (for me at least) just to stick with plain C and use a 'helper' library like the Apache Portable Runtime.


I wrote some code that was handling upwards of 2 billion qps and every few days it would mysteriously blow up. Fortunately my gut instinct, replacing std::map with its Boost equivalent, saved the day. It's not that I don't have faith in using Boost, it's just that I prefer to avoid complication and dependencies when it makes sense.


Yep. I experienced many such situations. Like I said, I still use boost because the benefits outweigh the baggage, but I try to avoid it as much as I can.


btw, 2 billion qpDAY not s :)


> However doing so greatly increases the complexity of the code, when considered as a whole, and we all know that complexity spins off bugs even in well-tested libs such as mentioned above.

do you also take this into account when leveraging $HIGH_LEVEL_LANGUAGE's libraries that are written in C?


I can perfectly agree to much of what they say, but here...

> The C language is old and boring. It is a well-known and well-understood language.

...I think they are very fundamentally mistaken. C is a horribly complicated language. It is one of the least-understood languages out there. Experienced programmers and compiler authors can debate for hours about whether C code of less than 50 lines has defined behavior or not, and still not come to a conclusion. People can write an entire PhD thesis <https://robbertkrebbers.nl/thesis.html> studying the semantics of C, and still leave many open questions (chapter 2 of that thesis does not require any academic background to be understandable, and it comes with tons of links to tickets/questions filed against the C standard). Consistently writing safe C/C++ is near impossible <http://robert.ocallahan.org/2017/07/confession-of-cc-program..., and judging from <https://sqlite.org/testing.html> the SQLite team agrees.

C is old, yes -- and C has boring and well-understood fragments. But full C is very, very poorly understood.


Just to be clear: The SQLite project strives to fix UB whenever any is discovered (which is to say, "rarely"). But SQLite also focuses on testing at the machine-code level, not just at the source code level. The machine-code generated by GCC, CLANG, and MSVC is all tested to 100% branch coverage and beyond. So even if one were to find some new UB in the SQLite source code, all the usual compilers are known to be doing something sane with it, not something goofy or harmful, and so it is not really a problem.


"Why is SQLite Coded in C? Because 'C is best'". And you know what? I can't really say I disagree.


When taking the portability and the 'C is portable assembly' statement into account, I can't disagree either. While there are plenty of hard things and things you can do wrong with C, there is a lot that you can't do in pretty much any other language.


That’s impressive:

> Libraries written in C do not have a huge run-time dependency. In its minimum configuration, SQLite requires only the following routines from the standard C library:

  memcmp()
  memcpy()
  memmove()
  memset()
  strcmp()
  strlen()


I think this statement is slightly misleading, because nobody uses SQLite in its minimum configuration. See the line below:

> In a more complete build, SQLite also uses library routines like malloc() and free() and operating system interfaces for opening, reading, writing, and closing files.


A little bit of an overstatement. It's not uncommon to see minimum configurations in many embedded scenarios.


Woah:

> In its minimum configuration, SQLite requires only the following routines from the standard C library:

  memcmp() memcpy() memmove() memset()
  strcmp() strlen() strncmp()


Surely something from <stdio.h> as well?


Minimal configuration only supports in-memory databases.


Of course. Makes sense.


Not necessarily: with an in-memory database or callback hooks for reading/writing, nothing from <stdio.h> is needed.
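
For anyone curious, the in-memory case is just a magic filename. A minimal sketch:

  /* sketch: an in-memory database never touches the OS file APIs,
     so nothing from <stdio.h> is needed on SQLite's side */
  #include "sqlite3.h"

  int main(void) {
      sqlite3 *db;
      if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;
      sqlite3_exec(db, "CREATE TABLE t(x); INSERT INTO t VALUES(42);",
                   0, 0, 0);
      sqlite3_close(db);
      return 0;
  }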


C is perhaps the most portable language. Most platforms support C.

I once used SQLite in an embedded system project that was based on a VLIW processor (TriMedia). The only compiler available was a C compiler. The integration worked like a charm.


My problem with C isn't even the language; it's the ecosystem around it. Glibc, auto{conf,tools,*}, CMake, etc.

Writing Makefiles by hand works for small projects, but for bigger ones you're forced to choose the lesser of all evils. And if you're on Linux you'll still have to deal with glibc most of the time.

And of course you can only choose between compilers that do too much (as in, not "C compilers" but "compilers that happen to support C as well"), compilers that don't even aim for standards compliance, or proprietary compilers.


Worth noting that SQLite has its own unique ecosystem of tools. Most people who compile it are compiling "the amalgamation", a unique thing where all the SQLite source code is smashed together into one standalone .c file you can just compile and link with, no build tools needed. I've never worked with the actual source code, but the docs talk about a simple Makefile, nothing more complex. https://www.sqlite.org/howtocompile.html
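
In practice that looks something like this (a sketch; the compile line is adapted from that page):

  /* app.c -- drop sqlite3.c and sqlite3.h next to this file and
     build with something like:
         cc app.c sqlite3.c -lpthread -ldl -o app
     no configure, no Makefile */
  #include <stdio.h>
  #include "sqlite3.h"

  int main(void) {
      printf("linked against SQLite %s\n", sqlite3_libversion());
      return 0;
  }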


Well, as with C itself, the ecosystem around it is mostly just misunderstood. Autotools isn't all that hard once you actually know what you are doing. While having a build and configuration system with the learning curve of a hockey stick isn't the nicest thing, spending some time learning how to use it gets you a lot of knowledge and access that is hard to come by otherwise.

I have spent about three weeks going deep in the autotools rabbit hole (mostly converting FOSS projects that used other build systems and then ran into trouble) but coming out the other side I can say it's not as bad as it looks, and compared to other systems it could be much, much worse.

As was posted earlier, calling C a "portable assembly language" seems right, and having a portable system to go with it to build and have common functionality isn't all that strange when you think about it that way.

While other systems might be easier for a lot of cases, not much out there can get you where the classic stack of C + Autotools + (g)make + glibc/musl gets you. As with everything, it's a tradeoff between quality, cost (as in time/knowledge) and options.


> And of course you can only choose between compilers that do too much (as in, not "C compilers" but "compilers that happen to support C as well"), compilers that don't even aim for standards compliance, or proprietary compilers.

I'm not sure I understand; almost all mainstream compilers treat C as a first-class citizen. Even MSVC, notorious for only supporting C89, finally caught up and now supports most of C11.


What I mean is more like they're not "C compilers", but "C/C++/Objective-C/Java/Rust/Perl/Bash/Fortran compilers".


> C/C++/Objective-C/Java/Rust/Perl/Bash/Fortran compilers

huh? What compiler supports C and Perl and Bash? WTF?

Sure, most C compilers are also C++ compilers. I'm not sure why this would be surprising to anyone. I don't even know how you would create a C++ compiler that didn't also compile C.


And what would you gain if the other languages were not supported?


Mostly just get a CMakeLists.txt file and keep tweaking it. Here, have one of mine: https://github.com/camgunz/libd2k/blob/master/CMakeLists.txt


404 on that file and the repo https://github.com/camgunz/libd2k - is it private?


I can't disagree with you about the build-management tools, but isn't Clang a pretty good C compiler that does everything you need and works pretty much everywhere?


Yup, but I'd like to remove the Objective-C{,++} support I'm almost definitely not gonna use in the foreseeable future, for example.

Take it as my obsession with minimalism.


Since Objective-C is so similar to C, I don't see why you would do the work to get a compiler working with C and then not do the comparatively small extra amount of work to get Objective-C working too.


Try meson. http://mesonbuild.com/

It actually seems to be a simple build configuration tool. A ton of projects are using it, including systemd.

http://mesonbuild.com/Users.html


I tried using CMake for one of my course projects recently and it worked like a charm. Of course, I have no idea whether or not CMake can handle larger projects as effectively. Maybe someone with more experience can chime in.


> I have no idea whether or not CMake can handle larger projects as effectively.

well, it's able to handle whole operating systems so... https://github.com/reactos/reactos


Both times I inherited a CMake build system (in two small-ish teams with moderately sized projects) it seemed to suffer pretty hard from a "last mile" problem: 90% of what we wanted out of it was easy, and the last 10% ranged from "easy but quirky" to "how is something this simple this hard?"

I don't have a solid gauge for how much of that was my own inexperience versus the tool itself, although my gut feeling is that it was probably a bit of both.


C tools are no more complex than JavaScript tools, and they don't change that often. However, after a few years in JS land, I feel the rust.


> compilers that do too much

Why is this a problem?


I don't like the many-compilers-in-one approach for the same reason I wouldn't like an `ls` program with bundled support for LDAP, SSH, Git, SVN and CVS, for example.

Sure, one way or another all of those can fall under the category "list directory contents", but I'd rather use separate programs for each of them.

Programs that I will install if and when I need them.


This is kind of cheating, but Lua looks like it would have worked.

I consider that a bit of a cheat, because it replaces the problem of embedding one C program (SQLite) with the problem of embedding a different C program (Lua).

But for those whose objection is that writing the database logic in C is risky because C is too low level, this would put the database logic in a higher level language than C.
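
To illustrate what that trade looks like from the host side, here's a rough sketch against the stock Lua C API:

  #include <lua.h>
  #include <lauxlib.h>
  #include <lualib.h>

  int main(void) {
      lua_State *L = luaL_newstate();  /* you still embed a C program (the VM)... */
      luaL_openlibs(L);
      /* ...but the database logic itself could live up at this level: */
      luaL_dostring(L, "print('query logic in a high-level language')");
      lua_close(L);
      return 0;
  }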


https://sqlite.org/fasterthanfs.html

This page shows that SQLite is 5 times faster than the Win10 filesystem, but the performance gains are much smaller on other systems.

Can anyone comment on this?



I love C for system-level coding. As mentioned in the OP, you don't need to study much about the language implementation, e.g. how GC works or the internal structure of objects. You have control over every aspect of the system, and that's why Linus prefers C over C++.

But I have to admit that the biggest downside of programming in C is that people make their projects complicated. I've seen projects with complicated Makefile build systems and hidden definitions. The worst part is that they leave this hidden stuff UNDOCUMENTED. This is often very frustrating when contributing to open source projects.


You absolutely do have to study a lot about the C language to understand when you slip into undefined behavior. There are even circumstances in which compiler developers don't agree on whether some code has defined behavior or not.


You are correct. But most cases are explicitly mentioned in the compilers' documentation. I would say it's less effort than getting to know another language's internals.


I disagree. It is significantly easier to determine the semantics of arbitrary Java programs than C programs.


Ada wouldn't fit because of compatibility and availability, but it's old, fast, safe and relatively simple, as compared to C with extensive scaffolding.


I don't understand this. Ada is still being developed. Another question: why not use SPARK?


It's much easier to include C source code in an application than Ada.


Yes, but you don't want C if you choose Ada, no?


But SQLite is a dependency for many other applications, many of them on platforms that probably don't have good Ada compilers, so SQLite wouldn't be as useful if it were written in Ada.


There are a lot of comments saying "well, if you use a bunch of tools and write a million tests, C is just fine". But do you _enjoy_ doing that? I know I don't, and anything we can do to achieve whatever we're aiming for with C (speed, resource usage, startup time, ubiquity) while avoiding that is great.


Hello, our friends at MITRE have assigned CVE-2018-8740 to an issue in SQLite3 that was discovered by OSS-Fuzz working on GDAL. http://seclists.org/oss-sec/2018/q1/244


Yes, SQLite is coded in C and that is okay. But there are situations where you wish it were different. For example, when you want to cross-compile a Go project, only to find out that the Go sqlite3 lib just provides bindings to the C library (probably due to the high quality of the C lib), which breaks the Go compiler's otherwise easy cross-compilation :-/

I don't know an easy solution to this, as most C compilers seem to be platform specific. In general, I like C, and SQLite even more. Nevertheless, it is quite unfortunate that you can't make use of modern cross-platform compilers.

Maybe one day we will live in a world where cross-compiling isn't an issue anymore.


> the Go lib sqlite3 just provides bindings for the C library

So compile SQLite for your target platform and use the bindings?


Yeah, sure, but in order to do that you need an extra C compiler for the target platform. With pure Go applications it is as easy as changing a parameter for the standard compiler.
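
Concretely (a rough sketch from memory): for pure Go, `GOOS=linux GOARCH=arm64 go build` is the whole story, while a cgo-based package like go-sqlite3 additionally needs `CGO_ENABLED=1` plus a `CC=` pointing at a C cross-toolchain for the target platform.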


The browser you're using to read this is written in C++, on your C++ or C OS, talking to a web server written in C.


Unless you are using one of the last couple of versions of Firefox, which have entire subsystems written in Rust now, as the easiest counter-example.

Also, the important bits of the HN "web server" are the business logic that makes HN HN, and that is written in a Lisp dialect.

Even if there weren't counter-examples: are the languages that browsers, OSes, and web servers written in somehow more important than any other languages? Are browsers/OSes/web servers the best emblems of reliability engineering to you? There are a lot of interesting assumptions at play here.


> Are browsers/OSes/web servers the best emblems of reliability engineering to you?

Since so many people depend on them, I'd say yes. It's extremely obvious to users when they don't work, so I'd assume there's a lot of work put into making them reliable.


Other than HaikuOS, I can't think of a real-world kernel written in C++. Linux, all the BSDs, Windows NT, and Hurd are all C. I'm not sure, but I think XNU is C as well (the wiki lists it as C/C++).


While the kernel is C, I believe that most of the Windows OS is written in C++.


The new Zircon kernel from Google's Fuchsia is being coded in C++ with a C interface for user programs.

Given its great design and Google's leverage on mobile phones, it will probably see massive deployment in the future.


XNU is basically all C, with a bit of C++ for IOKit if I remember correctly.


> talking to a web server written in C

The HN web server and application are written in a dialect of Lisp, unless something has changed.


I meant the SQLite.org server


Rust panics when out of memory; that would not be acceptable for sqlite3.


The Rust language knows nothing of dynamic allocation. You're speaking of the behavior of the standard library, which is trivial to drop. You're free to have whatever allocator semantics you want, including robustness on OOM.


Steve, you usually give better answers. Invoking the distinction between Rust-as-defined and Rust-as-commonly-used feels like language-lawyery dodging of a real issue.

Suggesting dropping all of std just to customize one feature of the allocator is not a good solution. You should know that harsh OOM handling is a problem for Rust users, and there's work being done to improve it.


Well, suggesting that Rust inherently has this problem is misrepresenting the situation. And while it's true that lots of people use std, many people don't, especially the people who care most about this aspect of Rust.

> You should know harsh OOM handling is a problem for Rust users

A small number of people have this issue. It barely even applies on entire operating systems, for example (with Linux's default overcommit, allocations rarely fail visibly anyway).

That said, custom allocators will be nice.


Fair enough, I guess for a project like sqlite rewriting some custom data types is no big deal.



