The C++ bashing season is back

jrockway · on Oct 18, 2009

> µTorrent is written in C++, and you could never, ever, make it so fast and small using any other language.

Citation needed. I think it could be done easily in Haskell, or Erlang, or Common Lisp, or even Java. (Actually, I bet it would even be fine in Perl/Python/Ruby.) A BitTorrent client's speed comes from an efficient TCP stack and some minimal userspace use of non-blocking IO (or lightweight threads). It's basically glue between epoll (or similar), some disk reads and writes, and a tiny bit of algorithmic code.
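
To make the "glue" point concrete, here is a minimal sketch in C++, assuming a POSIX system: one pass of a single-threaded event loop that uses poll(2) as a portable stand-in for epoll, waits on non-blocking sockets, and echoes back whatever arrives. The helper names set_nonblocking and poll_and_echo are hypothetical, and a real client would parse the peer wire protocol and track per-peer state instead of echoing.

```cpp
#include <cassert>
#include <fcntl.h>
#include <poll.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Put a file descriptor into non-blocking mode so read/write never stall the loop.
static bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// One pass of a single-threaded event loop: wait until any fd is readable,
// read what is available, and echo it back on the same fd.
// Returns the number of bytes echoed in this pass, or -1 if poll fails.
static long poll_and_echo(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    for (int fd : fds) pfds.push_back(pollfd{fd, POLLIN, 0});
    if (poll(pfds.data(), pfds.size(), timeout_ms) < 0) return -1;

    long echoed = 0;
    char buf[4096];
    for (const pollfd& p : pfds) {
        if (!(p.revents & POLLIN)) continue;          // nothing to do for this fd
        ssize_t n = read(p.fd, buf, sizeof buf);      // non-blocking: returns what is there
        if (n > 0) {
            ssize_t w = write(p.fd, buf, static_cast<size_t>(n));
            if (w > 0) echoed += w;
        }
    }
    return echoed;
}
```

For a quick local check, a socketpair(2) can stand in for a peer connection: write to one end, call poll_and_echo on the other, and read the echo back from the first end.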

It is unlikely that C++ is the best choice for this sort of application. It will work and it will be fast, but it will probably be just as fast in safer languages too.

blasdel · on Oct 18, 2009

You're right that it could be easily done in any other language on a modern Unix, but µTorrent is written extremely closely for Win32 -- which is exactly why it's so terrific on that platform. The thing is brimming with features, insanely fast, and the self-contained executable is only 282kb!

My favorite torrent client is rtorrent (written in C), which has a lightweight featureful ncurses interface, and its executable on one of my Linux x86 boxes is 680k + 586k for libtorrent!

He's almost certainly right that you could never, ever, make it so fast and small using any other language ON WINDOWS.

dtf · on Oct 18, 2009

rtorrent is written in C++, as is the underlying libtorrent library. Check if those binaries are stripped though.

agazso · on Oct 18, 2009

> I think it could be done easily in Haskell, or Erlang, or Common Lisp, or even Java.

Real life example needed. I hear a lot about what could be done in those languages, but in the end, the programs that get the work done are written in C or C++.

pavelludiq · on Oct 18, 2009

The first BitTorrent client was written in Python:

http://en.wikipedia.org/wiki/BitTorrent_(software)

agazso · on Oct 18, 2009

Yeah, I am aware of that fact. I also expected someone would mention Azureus (now called Vuze). But if you check out the peers in your torrent client, you will see that the majority of users use µTorrent.

Anyway, it's not just about torrent clients (which are not very complicated pieces of software) but about other kinds of desktop and server applications that are required to have low latency and a small footprint. Rhetorical question: why are all major browsers written in C/C++?

Don't get me wrong, I like dynamic languages; currently I am working in Python (besides C++). But for me, the main advantages of those languages are rapid prototyping/development and lots of built-in libraries, not performance.

jrockway · on Oct 19, 2009

> why all major browsers are written in C/C++?

Because they are all very, very old codebases.

Firefox's UI code is all JavaScript, BTW.

blub · on Oct 18, 2009

I would have been impressed if you had created the software in one of those languages and it was demonstrably as fast and small as the equivalent C or C++ software.

Unfortunately your post looks more like "I can write a stackoverflow.com clone in a weekend".

kragen · on Oct 18, 2009

There are other languages you could make it small in, but those languages probably aren't among them.

Haskell: Hugs 98 is 723K, but it depends on /usr/lib/hugs, which is 7.3M. GHC can almost do it; I compiled a "hello, world" program with GHC and it "only" came out to 366K, which is 84K bigger than μTorrent's reported size. With UPX, however, it packs to 130k. (ocamlopt's tax is in the same range.) Maybe Haskell would be an option? Or does it bloat up when you include the libraries you'd need for this?

Erlang: My Erlang install is 97M. I don't know how much of that is minimally necessary to run things, but /usr/lib/erlang/erts-5.6.3 is 3.6M, nearly 13 times the size of μTorrent. Stand Alone Erlang (http://www.sics.se/~joe/sae.html) is 1.5MB compressed.

Common Lisp: All the Common Lisp implementations are pretty big (SBCL, say, is 53M, of which 25M is the image sbcl.core), and they can't produce a standalone executable without including most of the environment. So you're starting at 25M.

Java: The JRE on my system is 98 megabytes, so a Java application to run on Win32 will be 98 megabytes plus the application code. Even if you compile your app with GCJ, libgcj is 33 megabytes.

Now, you could argue that the size of this stuff isn't relevant because it isn't specific to your hypothetical Common Lisp implementation of BitTorrent, but could also be used by whatever other Common Lisp programs the user has installed. But you would be treating a very-low-probability event as if it were normal.

Languages — or rather, language implementations — that could make a 300k runnable BitTorrent client include not just C and C++, but also Forth, assembly, Pascal, Eiffel, and any number of other fatally flawed niche languages that don't require hundreds of kilobytes of libraries just to initialize a process and exit. There are probably also Scheme implementations that would work.

Just about anything that has a reasonably small bytecode interpreter and lets you package the rest of your program as bytecode would probably win. x86 code is pretty grossly bloated by comparison to a good compact bytecode. (But compressible. See comment above about UPX.)

The only other practical alternative I know of would be Lua, which only costs you about half of your space budget for the interpreter itself. Lua's "bytecode" is actually wordcode, but presumably UPX or something similar would squeeze it down to a reasonable size.

It's my understanding that μTorrent does not need to ship with any shared libraries to run; the <300k of the executable alone is sufficient. You may think this is unimportant, but I think that indifference is a product of unusual wealth: you probably own your own computer to install software on persistently, probably several of them, and you probably have several hundred kilobits per second of bandwidth to download software with, so you probably don't care how long it takes to download something or how much of a 256MB USB pendrive it takes up.

It's certainly true that there are any number of languages you could write a BitTorrent client in that would be just as fast.

jrockway · on Oct 18, 2009

But let's face it -- micro-optimizing the space of the image on disk is a completely useless optimization these days. You are also not including the shared libraries that the binary uses.

When making my argument in the grandparent post, I did not even consider the size aspect to be worth discussing. If this is somehow important to you, then my argument does not apply.

ido · on Oct 18, 2009

>Java: The JRE on my system is 98 megabytes, so a Java application to run on Win32 will be 98 megabytes plus the application code. Even if you compile your app with GCJ, libgcj is 33 megabytes.

> Now, you could argue that the size of this stuff isn't relevant because it isn't specific to your hypothetical Common Lisp implementation of BitTorrent, but could also be used by whatever other Common Lisp programs the user has installed. But you would be treating a very-low-probability event as if it were normal.

I am pretty sure the vast majority of users already have the JRE installed, and if they don't, it's a download of around 15MB.

sb · on Oct 18, 2009

I am usually the last one to even read language flame wars, let alone participate in them, but how exactly are Eiffel, Forth and Pascal fatally flawed?

ilyak · on Oct 18, 2009

Why would you need a tiny BitTorrent image when the first thing it downloads would outweigh the fattest image ten to one?

Come on, BitTorrent is for FLACs and movies.

fogus · on Oct 18, 2009

Thank goodness -- the Java bashing season seemed to go on forever.

junklight · on Oct 19, 2009

While I am not even vaguely a fan of C++ (despite making my living writing it for at least 5 years back in the day), you can have readable multi-programmer code; look at WebKit, for example. It's a complex bit of functionality and there is a lot of it, but I spend quite a bit of time knee-deep in its code (for an internal project) and I find myself quite impressed with the quality.

OK, so one data point doesn't prove anything, but it does show that it is perfectly possible to make good C++ projects.

For me, the thing that drives me up the wall is how much work I have to do in C++ compared to Python to do the same things; I guess some of that is from not having used C++ for a long time. That, and the fact that I keep typing single-quoted strings in C++ and semicolon line endings in Python.

biotech · on Oct 18, 2009

For some people, the good parts of C++ are exceptions and RAII; for others, it's templates and STL containers. Everyone picks their own subset, and no one seems to agree on whose subset is better/safer/more comprehensible.
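
For anyone who hasn't run into RAII: a resource is tied to an object's lifetime, so the destructor releases it even when an exception unwinds the stack, which is why exceptions and RAII tend to be mentioned together. A minimal sketch using the standard std::lock_guard; table_mutex, table_updates and update_table are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex table_mutex;  // hypothetical lock protecting some shared table
int table_updates = 0;   // hypothetical shared state

// std::lock_guard is RAII: its constructor locks the mutex and its destructor
// unlocks it, and the destructor runs whether the scope is left normally or
// by an exception.
void update_table(bool fail) {
    std::lock_guard<std::mutex> lock(table_mutex);
    ++table_updates;
    if (fail) throw std::runtime_error("update failed");
}  // lock released here on both paths
```

The point of the pattern is that there is no "unlock" call to forget on the error path; the cleanup lives in one destructor.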

It would be nice to see some clear, thorough coding standards for C++ that chose a reasonable subset. I haven't found many publicly available apart from Google's Style Guide: http://google-styleguide.googlecode.com/svn/trunk/cppguide.x... . Anyone know of any others?

spamizbad · on Oct 18, 2009

It needs more than just coding standards. Someone needs to give C++ the same treatment Doug Crockford gave Javascript when he wrote Javascript: The Good Parts.

gchpaco · on Oct 18, 2009

One of the things that Yossi Kreinin points out over and over again in his FQA is that C++ is full of features that you cannot ignore. Crockford's Javascript: The Good Parts is made significantly simpler by the fact that Javascript doesn't have the ability to, e.g., silently slice your objects if you don't write copy constructors at every level, or forcibly pollute your type system with const endlessly, or force you to declare virtual destructors at every level. It's a much simpler language, with numerous weird and positively deranged syntactical and library constructs, but you can ignore the syntax pretty easily and ignore the library constructs pretty easily, so you can subset it easily.
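
For readers who haven't hit slicing: when a derived object is copied into a base-class object by value, only the base part survives, and virtual calls on the copy dispatch to the base version. A minimal sketch; Shape, Circle and the two describe_* functions are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;  // virtual, so deleting through a Shape* runs the derived destructor
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    double radius = 1.0;
    std::string name() const override { return "circle"; }
};

// Takes its argument BY VALUE: a Circle passed here is copied into a plain
// Shape, slicing off the Circle part (radius and the overridden name()).
std::string describe_by_value(Shape s) { return s.name(); }

// Takes its argument by reference: no copy, so dynamic dispatch still works.
std::string describe_by_ref(const Shape& s) { return s.name(); }
```

Both calls compile silently with the same argument; only the parameter declaration decides whether the object keeps its dynamic type.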

C++ is much harder to subset, because the various (mis)features lean against each other for support like a tower of cards. It's not stable, but it's even worse if you leave some of the cards out. Even compiling C under a C++ compiler lets the nose of the camel in, as it were, due to const correctness. And (depending on your runtime, but this is required by newer standards) the moment you use new, which you have to use in many ways, you now have a situation where your program can throw an exception.
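
To illustrate the point about new: a plain new-expression reports failure by throwing std::bad_alloc, while the std::nothrow form falls back to the C convention of returning a null pointer. A minimal sketch, assuming a 64-bit system where a request for a quarter of SIZE_MAX bytes cannot be satisfied; kHugeRequest and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Far larger than any 64-bit process can map (an assumption about the target).
const std::size_t kHugeRequest = std::numeric_limits<std::size_t>::max() / 4;

// A plain new-expression signals failure by throwing std::bad_alloc, so code
// that uses it has an exception path whether it wants one or not.
bool plain_new_throws(std::size_t n) {
    try {
        char* volatile p = new char[n];  // volatile keeps the allocation from being optimized away
        delete[] p;
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// The nothrow form returns a null pointer instead, like malloc.
bool nothrow_new_returns_null(std::size_t n) {
    char* volatile p = new (std::nothrow) char[n];
    bool was_null = (p == nullptr);
    delete[] p;  // delete[] on a null pointer is a harmless no-op
    return was_null;
}
```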

btn · on Oct 18, 2009

Kernighan & Ritchie have already written that book.

0wned · on Oct 18, 2009

Only partially. C with vectors, strings, references, maps and algorithms added would be "C++: The Good Parts"

ido · on Oct 18, 2009

For you ;)

jomohke · on Oct 18, 2009

Such a distinction between good and bad parts is much more difficult to define for C++ than Javascript. Javascript hasn't had the problem of programmers using separate subsets of the language. The bad parts of Javascript are design/implementation mistakes. The bad part of C++ is the overabundance of language features/complexity.

However, I think it would be great for someone to design a sensible subset of the language. Provide a lint tool (like Crockford's JSLint) which can enforce the subset. Marketing will be the hard part - a buzzword/name will be extremely useful for branding/word-of-mouth to work. A name will make it sellable, and identifiable, as a language of its own.

dave_au · on Oct 18, 2009

C++ Coding Standards and C++ Common Knowledge are both pretty good in that regard.

Although once you know what's going on under the hood, you should be fairly safe; it just takes a while, and probably a read of Modern C++ Design or something like it to provide the motivation for the effort required :)

cpppppppp · on Oct 18, 2009

Fine by me, pick your latest toy language and demonstrate how you can write a toy webserver in 3 lines or a Fibonacci calculator in 3 characters.

When you need to get some real work done give me a call - I just put my rates up.

jacquesm · on Oct 18, 2009

I've written a non-toy webserver in C++. It's been making me money since '98 or so, and it's still cooking along at http://upload6.ww.com/_/stats.html, serving 25M hits/day; there are 5 servers like that. It's a single-threaded, select-based webserver that pushes images posted by one user to many others.

I could have done that in C, but C++ seemed nice at the time, so I used it. Finally, after debugging obscure runtime issues and memory leaks, it dawned on me that the program was OK; I just wasn't competent enough in C++ to do this quickly and error-free. So, bit by bit, I started to get rid of the OO features, and the program became faster and more reliable. When it was back down to bare-to-the-metal C with C++ comments, it ran faultlessly, and it has been doing so for well over a decade now.

The webcam software was a different story: because it had to run on the Microsoft platform, it had to be in C++. Fortunately, Borland C++Builder was an excellent environment, and I never regretted using it instead of C. The GUI features of C++Builder made life a lot easier.

My takeaway: if you have to use C++, be very careful and use as many quality libraries as you can find, so you don't end up spending a lot of time chasing obscure bugs.

And if it is systems level stuff, stick to C.

uninverted · on Oct 18, 2009

I wonder what a language with a three character Fibonacci sequence would be like. You'd probably need Unicode, if the language wasn't specifically designed for the Fibonacci sequence.

mahmud · on Oct 18, 2009

A "language" that has Fibonacci built in can do it in 2 characters: fx, where f is the name of the Fibonacci function and x is a single-digit number.

A Fibonacci DSL could do it in 1: if all input is expected to be an integral value and the machine only computes Fibonacci values, then typing a single-digit number would have the intended consequence:

   (defun fib (n)
     (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
   (loop
     (print (fib (parse-integer (read-line)))))

blasdel · on Oct 18, 2009

Someone clearly needs to extend HQ9+ with another primitive -- http://www.cliff.biffle.org/esoterica/hq9plus.html

tesseract · on Oct 18, 2009

http://www.dyalog.com/dfnsdws/n_fibonacci.htm

chanux · on Oct 18, 2009

You say, Oh that language sucks. This is my favorite language. MyLang Rules!!!

OK you are a jerk.

You say, well I'm not a heavy user of that language. Actually I don't like it much. But this is my favorite. I like it because of this. Maybe you too like to try this :).

Hmm... You are great.

That's what I think about language bashing :).

jrockway · on Oct 18, 2009

I don't think anyone is "bashing" anything.

C++ has a weird niche. It's as fast as C, but is as high-level as Java... at first. It has exceptions, it has multiple inheritance, and it has "methods" (though C++ doesn't call them this, for some reason). This makes it the perfect language in many people's minds; fast and expressive. Unfortunately, the stuff that makes it fast is on all the time at the cost of making every function a potential foot-gun. (I can convert an integer to a pointer at runtime! I can remove "const", too...)
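
The two casts from the parenthetical look like this in practice. A minimal sketch; bump and pointer_from_integer are hypothetical helpers, and both casts compile without complaint whether or not the result is actually safe to use.

```cpp
#include <cassert>
#include <cstdint>

// const_cast strips const from a reference. Writing through the result is only
// defined if the original object was not itself declared const.
void bump(const int& value) {
    int& writable = const_cast<int&>(value);
    ++writable;
}

// reinterpret_cast turns an arbitrary integer into a pointer with no checking.
// It is only meaningful if the integer came from a real pointer (as in the test).
int* pointer_from_integer(std::uintptr_t bits) {
    return reinterpret_cast<int*>(bits);
}
```

Nothing at compile time or run time distinguishes a valid use of either helper from a disastrous one; that burden sits entirely with the caller.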

When you are writing code that needs to execute quickly, you don't want to do anything that's not related to the task at hand. You want to shovel bits into the CPU as quickly as possible. This means using native, unboxed types, not checking preconditions before every operation ("why would the preconditions ever not be satisfied?"), using trickery like integer-to-pointer casts, etc. C works this way, and lets you write code that runs very quickly. C++ works this way too.

The other half of a C++ app, though, is the complex glue code. Most sections of a "real application" are complicated, but they don't execute often enough for speed to be too critical. That's where you want safety: lots of self-checks, boxed types that don't let you misuse them, etc. There is a lot of complexity to manage, and a team of programmers can't manage it unaided, so you want your language and runtime to help you out.

Unfortunately, C++ doesn't help you out; you are expected to write the non-time-critical sections of your application the same way you write the critical sections. Ignore types, manage your own memory, skip polymorphism for speed, etc. Those are all the default behaviors, and they make it hard to write reliable code. (Is it really worth overwriting random memory just to avoid a single comparison between an array's length and the index you are referencing? In a video decoder, yes. In the drop-down menu where the user picks the kind of TPS report to send, probably not. But guess what the default behavior is.)
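
For what it's worth, std::vector does offer the checked behavior; it just isn't the default spelling. A minimal sketch contrasting the unchecked operator[] with the bounds-checked at(); checked_read is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// v[i] does no bounds check: an out-of-range index is undefined behavior and
// may silently read or overwrite neighboring memory.
// v.at(i) compares i against v.size() first and throws std::out_of_range.
bool checked_read(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;  // out is left untouched on failure
    }
}
```

The comparison costs one branch per access, which is exactly the trade-off described above: worth skipping in a video decoder's inner loop, rarely worth skipping in a drop-down menu handler.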

The failure mode of C++ is also quite severe. In Java, flaky code that doesn't work just returns the wrong answer or dies. In C++, flaky code that doesn't work can corrupt other parts of the program randomly, and it can even allow malicious users to inject arbitrary code. There is no failure mode; the program keeps running incorrectly until something is so utterly wrong that the OS or machine kills the program. ("Segmentation fault (core dumped)".)

So basically, if you're using C++ correctly, for speed-critical sections of your application, you could just do it in C; and if you're using it for applications, you would be better off with anything else; Java, C#, Common Lisp, Haskell... all much better for higher-level code. You probably won't notice the runtime speed difference, but you probably will notice the lack of random crashes because your off-by-one bug overwrote some global state...