Running this without amazon seems to require replacing the hardcoded "{}://{}.s3{}.amazonaws.com" url with the ___location where the s3 compatible service is running on the local network. Now if that was just a default and could be overwritten with an env var as well... :)
And minio [1] seems to be the easiest pseudo-s3 there is ($ ./minio server CacheDir/, done?), or are there better alternatives by now?
I'd take a patch to allow overriding the URL! I actually could use that for testing the S3 storage.
The S3 code went through a few revisions. I was originally using Rusoto (https://github.com/rusoto/rusoto), which is nice but just didn't quite meet my needs, so then I borrowed some code from the crates.io codebase and then rewrote most of it.
You can also run it using a local disk cache, similar to ccache, but it doesn't have any code to limit the size of the cache so it's not very good right now. (It's used in all the tests, though.) Fixing that specific issue is next on my plate.
Rust's format macro only takes string literals as a template, so you can't provide a runtime string template. The solution seems to be adding an external templating lib, which seems like overkill for a single template.
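One lightweight way around the literal-only format! restriction is to keep the format! call for the AWS default and read an override endpoint from the environment, no templating library needed. A sketch under that assumption (the env var name SCCACHE_ENDPOINT and the function are invented here for illustration, not actual sccache options):

```rust
use std::env;

// Fall back to the hardcoded AWS URL pattern unless an env var
// (name invented for this sketch) supplies an override endpoint.
fn s3_base_url(scheme: &str, bucket: &str, region: &str) -> String {
    match env::var("SCCACHE_ENDPOINT") {
        // e.g. SCCACHE_ENDPOINT=http://127.0.0.1:9000 for a local minio
        Ok(endpoint) => format!("{}/{}", endpoint.trim_end_matches('/'), bucket),
        Err(_) => format!("{}://{}.s3{}.amazonaws.com", scheme, bucket, region),
    }
}

fn main() {
    // prints the AWS-style URL when no override is set
    println!("{}", s3_base_url("https", "my-cache", "-us-west-2"));
}
```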
In some sense, yes. In another, it's not some different kind of templating; it's the same stuff, just calculated at runtime instead of compile time. I assume the reluctance to pull in a library for this comes from its weight and from having to learn some other form of templating, but this crate is pretty small and doesn't differ in that way.
Congrats to Mozilla for not only creating a programming language research project that engaged the community, but also growing it into a successful language that's useful and robust for real-world projects.
Mozilla sometimes gets flak for its experiments (or abandoning them), sometimes deservedly, but by doing so many of them and not being afraid to cancel them, they occasionally get big wins like Rust.
I've been using distcc for quicker distributed builds without issue for many years. What extra features does sccache bring to the table other than it being rewritten in Rust?
distcc is great, and as I mentioned in that Reddit comment some of my colleagues are using icecream to great success.
I won't repeat my entire comment from Reddit, but one other notable point is that tools like ccache/distcc don't generally support MSVC, and we build Firefox for Windows with MSVC, so that's pretty important to us. And frankly, our Windows builds are slow enough that we can use all the build time wins we can get.
Color me surprised. As a developer, I've lost count of the number of times "Windows support" or "MSVC" is the special case that requires you to go through contortions. shrug
It's been very frustrating on multiple open source projects I've worked on, people proposing to change entire build systems of projects just so that MSVC can be included as a build target. I hope the new Bash-on-Windows stuff will eventually make supporting MSVC a little simpler from various Unix-oriented build tools.
Supporting Windows is often a pain (although in this case Rust hides a lot of that pain, honestly), but the main driver of the original sccache implementation was not Windows support, but the ability to share the compilation cache across our many ephemeral build machines. My colleague Mike Hommey (who wrote the first implementation) has some nice graphs demonstrating this in his original blog posts on the topic:
https://glandium.org/blog/?p=3079
Yeah, I'm especially bitter because I just spent my entire weekend getting colors and tty detection to work consistently in Windows. It was unbelievably hard. (The central difficulty is that you need to support both native Windows consoles like cmd.exe and cygwin terminals like mintty.)
It's worth it though. There are lots of people on Windows! And thankfully, most of the other pieces (like the build system) in Rust are cross platform by default. :-)
Meanwhile, GCC works fine on Windows. (MinGW builds normal non-Cygwin programs.) There's no need to bow to cl.exe just because you want to build for Windows.
If you're shipping binaries to end users Microsoft's optimizer is pretty fantastic, especially if you build with PGO. Plus MinGW sometimes lags behind with things added in newer Microsoft SDKs, so there can be some friction there. I think things are getting better with clang-cl, so that might be a feasible direction in the near future.
Their reddit comments are not consistent with my experience using distcc. I can attest it is indeed a distributed ccache and works quite well. For a while I was even using distcc with a GCC cross compiler on Windows nodes to speed up my Linux app builds.
If there's a sccache maintainer reading this thread please consider explaining the advantages of using sccache over distcc in the README.
> I can attest it is indeed a distributed ccache and works quite well.
The distcc readme says nothing of the sort? It seems to specify that it shares preprocessed source and compiles .cpp files remotely (the resulting object files are presumably sent back to the requesting machine, which links them together).
It seems to work with ccache, but if you're spinning up fresh instances all the time ccache will need to be synchronized somehow, and it doesn't seem to drive that (whereas sccache seems to do so)
distcc is not a cache. It doesn't keep the output of the compiler around after it builds things. It (only) distributes running the compiler around (and to do that handles some details of preprocessing, in certain modes) and then gets the output back to the requester of the compilation task.
At that point, the requester could cache that output, should they want to.
Just as a neat data point, we hadn't been using sccache for Firefox builds in our new CI infrastructure (Taskcluster) for various reasons. After rolling out the sccache rewrite I fixed that, and build times in that environment dropped by 32-38%:
https://treeherder.mozilla.org/perf.html#/alerts?id=4335
In Rust, it's very common to put tests right next to the code; all functions tagged with #[test] will be run as tests, and examples within the function documentation comments will also be run as tests by default.
For instance, randomly looking at the first file in that repository, cache/cache.rs, you can see at the end a submodule tagged with #[cfg(test)] (meaning it will be compiled only when running the tests), and within it several functions tagged with #[test].
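The convention looks roughly like this (a generic sketch with invented names, not sccache's actual code):

```rust
// Ordinary production code and its unit tests living in the same file.
fn cache_key(compiler: &str, args: &[&str]) -> String {
    format!("{}|{}", compiler, args.join(" "))
}

fn main() {
    println!("{}", cache_key("gcc", &["-O2", "-c"]));
}

#[cfg(test)] // this module is compiled only for `cargo test`
mod tests {
    use super::*; // tests can see private items in the enclosing file

    #[test]
    fn key_includes_all_args() {
        assert_eq!(cache_key("gcc", &["-O2", "-c"]), "gcc|-O2 -c");
    }
}
```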
I can't figure out whether or not I like this convention. I used to work with a guy who did the same thing with C++ and C# tests: right there alongside the code they tested. On the one hand, it is nice to have all of the relevant code in one place for context. On the other hand, it muddied up the actual business logic with tests, so you had to perform some mental work to figure out whether the code you were looking at was for production use or only for testing.
It's worth noting this convention is only for unit tests. Integration tests traditionally have their own test directory and files. Because of this, the file-inlined test code you see is normally relatively simple and shouldn't have a ton of extra helper functions and setup and such. The majority of the code should just be test cases themselves, which are clearly marked with #[test].
Note that Rust lets you put tests in tests/ instead of src/. So typically, you put short unit tests in src/ and longer/integration tests in tests/. Oh, and examples that show up in the documentation are also run alongside tests.
Also, the compiler will warn you (via the dead-code lint) if you have code that's used only by tests but isn't labelled with #[cfg(test)].
Not .. exactly. An integration test usually tests an entire application at once. `tests/` can still test things in the public API (indeed, we do this in Servo a lot) without being an integration test.
The point is that some unit tests cannot be written in tests/, whereas all integration tests can be.
It's so that you can do unit testing without having to do weird things like airing your privates in tests, since private modules/functions/etc. aren't visible to your integration tests (which live in tests/ and basically just use your code as a library).
I started liking this convention when I started considering the tests documentation. In my code, the first unit test for any function is always the simple common case, and having that right next to the function usually makes the code easier to read.
It does have a bunch of tests, although probably not enough. I've been meaning to add code coverage stats to its CI, although I've only used kcov for Rust code coverage so far, and since that only works on Linux it wouldn't reflect reality very well. I've done a lot of work on test automation at Mozilla over the years, so automated testing is pretty important to me. :)
I actually like how that code turned out, I've thought about cleaning it up and publishing it as a standalone crate. Rust doesn't have a real test mocking solution yet, and even if it did this is a special case since the standard library's process execution types don't implement a trait that could be mocked.
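In the absence of a mocking framework, the usual workaround is to define your own small trait over process execution and inject a fake implementation in tests. A minimal sketch of that pattern (all names invented for illustration, not sccache's actual types):

```rust
use std::process::Command;

// A minimal trait abstracting "run a command, tell me if it succeeded".
trait CommandRunner {
    fn run(&self, program: &str, args: &[&str]) -> std::io::Result<bool>;
}

// The real implementation shells out to the OS.
#[allow(dead_code)]
struct RealRunner;
impl CommandRunner for RealRunner {
    fn run(&self, program: &str, args: &[&str]) -> std::io::Result<bool> {
        Ok(Command::new(program).args(args).status()?.success())
    }
}

// A mock that reports a canned result without touching the OS.
struct MockRunner { succeed: bool }
impl CommandRunner for MockRunner {
    fn run(&self, _program: &str, _args: &[&str]) -> std::io::Result<bool> {
        Ok(self.succeed)
    }
}

// Code under test takes the trait, so tests can pass a MockRunner.
fn compile_ok(runner: &dyn CommandRunner) -> bool {
    runner.run("cc", &["-c", "foo.c"]).unwrap_or(false)
}

fn main() {
    assert!(compile_ok(&MockRunner { succeed: true }));
    assert!(!compile_ok(&MockRunner { succeed: false }));
    println!("mock checks passed");
}
```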
As development went on I realized I didn't have automated tests that tested the whole program, especially running against real compilers (which is important given that the tool is a compiler wrapper), so I wrote some "system" tests that run the actual binary with an actual compiler and local disk cache and verify that it works as expected:
https://github.com/mozilla/sccache/blob/master/src/test/syst...
I didn't understand his comment on Rust's match expression.
Other than it returning a value (and insisting on a default clause) it's just like C's switch, right?
It is much more like Haskell's pattern matching. Notably it handles tagged unions (Rust's enum). There are good examples in the [book](https://doc.rust-lang.org/book/match.html).
It does not insist on a default clause; it requires match completeness, which depending on the matched value may require a default clause (e.g. it does for integrals[0], but not for enums, since you can match each case individually).
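To make the completeness point concrete, here's a small sketch (invented types): an enum can be matched without any default arm, while an integer match needs a wildcard because the compiler can't enumerate every value.

```rust
enum Backend { Disk, S3, Redis }

fn name(b: &Backend) -> &'static str {
    // No `_` arm needed: all three variants are covered, and adding a
    // fourth variant later would turn this match into a compile error.
    match b {
        Backend::Disk => "disk",
        Backend::S3 => "s3",
        Backend::Redis => "redis",
    }
}

fn size_class(n: u32) -> &'static str {
    // For integers the compiler can't check every possible value,
    // so a wildcard arm is required to make the match complete.
    match n {
        0 => "small",
        1..=9 => "medium",
        _ => "large",
    }
}

fn main() {
    println!("{} {}", name(&Backend::S3), size_class(5));
}
```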
> it's just like C's switch, right?
Only in its most simplistic form (though even then it never falls through, is type-safe, and requires match completeness). Beyond that, match performs refutable pattern matching on possibly complex values and allows additional per-arm conditionals:
    match foo {
        // complex destructuring and multiple patterns for a case
        Some((42, _)) | Some((55, _)) => { println!("1") }
        // simple destructuring + conditional
        Some(a) if a.0 % 5 == 0 => { println!("2") }
        // matching + wildcard
        Some(..) => { println!("3") }
        // trivial matching
        None => { println!("4") }
    }
or
    match (i % 3, i % 5) {
        (0, 0) => {}
        (0, _) => {}
        (_, 0) => {}
        _ => {}
    }
[0] because the completeness checking doesn't really handle non-enum values currently
All the other answers are correct, in so far as they go, but they kind of fail to explain the bigger picture imo. A match statement like rust has is part of a larger concept where types are an integral part of control flow.
C/C++ switch makes no assertion about what is or isn't valid in its branches. That is, you might be switching on a tag field of a union or something like that, and then in one (or more) of the switch branches you may act on that information. But there is no compiler constraint on the correctness of that decision.
Pattern matching insists that the code inside a particular branch matches the type expectations asserted in its case clause. If you're in the branch for enum-type Blah, you can only act on Blah and not on Blorp. The compiler will force this on you.
To put this in practical terms, one area I have found this incredibly valuable (in Swift, but it applies here too) is in state machines. If you represent your state machine as an enum/union with fields for the information any particular state needs, every iteration through the machine you can be sure you are acting on the correct information. The compiler won't let you do otherwise.
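The same idea in Rust, as a hedged sketch with invented names: each variant carries only the data that is valid in that state, and a match arm can only see that data.

```rust
// Each state carries exactly the data valid in that state.
enum Connection {
    Idle,
    Connecting { attempts: u32 },
    Connected { session_id: u64 },
}

fn step(c: Connection) -> Connection {
    match c {
        // In this arm only `attempts` exists; there is no way to
        // accidentally read a `session_id` that isn't there yet.
        Connection::Idle => Connection::Connecting { attempts: 1 },
        Connection::Connecting { attempts } if attempts < 3 => {
            Connection::Connecting { attempts: attempts + 1 }
        }
        Connection::Connecting { .. } => Connection::Connected { session_id: 42 },
        Connection::Connected { session_id } => Connection::Connected { session_id },
    }
}

fn main() {
    let mut c = Connection::Idle;
    for _ in 0..5 {
        c = step(c);
    }
    if let Connection::Connected { session_id } = c {
        println!("connected: {}", session_id);
    }
}
```

The compiler enforces both sides: every state must be handled, and no arm can touch data belonging to another state.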
2. Checked for exhaustiveness. This is different from "insisting on a default clause", because if you see a match with no default clause, then you know that the compiler is verifying that every possible case is being handled.
3. The cases of the match can be arbitrary patterns, not just integers. This allows you to perform very natural and powerful conditional control flow, especially when using tagged unions.
4. Can return a value.
Finally, remember the context of this post: Python doesn't have a C-style switch statement at all. :P
match also lets you destructure values, which is useful when you're dealing with enums. It also doesn't have C's fall-through behaviour.
Edit:
match doesn't insist on a default clause, it enforces exhaustiveness. Which can be achieved by having a default clause, but quite often you just have an explicit branch for every possible case.
I think other commenters have explained it pretty well, but suffice to say that if you get comfortable writing Rust you grow to really like using match, and most languages don't have an equivalent. :)
I'm not sure what this comment is implying. The author isn't complaining about Python's performance, rather, it's noting that concurrency in Python isn't as painless as it is in Rust (which isn't a controversial statement, Rust is explicitly designed for robust concurrency). It also isn't controversial that Rust code should end up faster than Python, considering that Rust is designed to prioritize runtime performance. This isn't a case of comparing the performance of two dynamic languages (as we so often had with all the "I switched to Ruby" or "I switched to Node" posts in prior years); nobody is going to hold up this blog post as proof that Rust is generally faster than Python, because nobody in the world argues otherwise (and I say this as a prolific Python user, not just as a Rust user).
3) Removing "unnecessary cruft" during the rewrite, measuring the speedup and then gradually adding the "cruft" (features) back one by one as previously unknown boundary conditions are encountered.
Restart the process again after a few years, as is tradition.
This is not what happened in this case (although I have seen this pattern in other situations in the past). The Rust version is a faithful port of the feature set of the Python version. (It had to be, we were using essentially the entire feature set in production.)
This effect is the reason my default is to be skeptical of benchmarks of new software. It's easy to be fast if you're willing to be incorrect or incomplete.
At the same time, it's not like there's no way around that. You could keep the Python app running and add a super lightweight binary to interface with it. Of course, that may not be worth the trouble.
That's what the python implementation did. Except the lightweight binary to interface with it was also in python. But that still had a massive win, not having to load all the modules every time.
I actually rewrote the interfacing binary in rust about two years ago, that would talk to the python server. The python equivalent would start the server if it wasn't already running, but rust, back then, didn't support spawning a subprocess and keeping it alive after the parent process quits, so it was never deployed as a replacement. I filed an issue about that, and iirc, that was fixed, but I never came back to finish and deploy the lightweight rust client. I probably should have.
For the record, I wasn't actually looking for any speedups with this rewrite. Most of it is a straight port of the Python code. The build time speedups were just a nice bonus. (I was measuring just to make sure I didn't regress build times.)
As the author of the original python sccache, I wanted to rewrite it in Rust and expected it would unlock implementing features that would have been hard to implement efficiently in python. I'm glad you beat me to it. Now we can focus on those features :)
It's using one of Phu Ly's WordPress themes from 10 years ago so that may be why. I'd recognize his themes anywhere. Back then 8pt was standard and flexible units were just catching on. I feel old.
1: https://github.com/minio/minio