While reading this my hope was it not being written in JavaScript. I'm not disap...

pantalaimon · on Jan 9, 2019

It's comparably slow though :p

    xxd `which hexyl` > /dev/null  0.12s user 0.09s system 99% cpu 0.219 total
    hexdump `which hexyl` > /dev/null  0.19s user 0.05s system 91% cpu 0.256 total
    hexyl `which hexyl` > /dev/null  1.69s user 0.39s system 95% cpu 2.175 total

ByronBates · on Jan 9, 2019

The same author also wrote hyperfine, a tool to compare performance of various program runs.

    hyperfine './target/release/hexyl ./target/release/hexyl' 'xxd ./target/release/hexyl' 'hexdump ./target/release/hexyl'
    Benchmark #1: ./target/release/hexyl ./target/release/hexyl
      Time (mean ± σ):      1.529 s ±  0.028 s    [User: 1.476 s, System: 0.050 s]
      Range (min … max):    1.491 s …  1.581 s    10 runs

    Benchmark #2: xxd ./target/release/hexyl
      Time (mean ± σ):      70.5 ms ±   0.5 ms    [User: 68.0 ms, System: 1.2 ms]
      Range (min … max):    69.5 ms …  72.3 ms    41 runs

    Benchmark #3: hexdump ./target/release/hexyl
      Time (mean ± σ):     262.4 ms ±   2.8 ms    [User: 260.1 ms, System: 1.5 ms]
      Range (min … max):   259.8 ms … 268.8 ms    11 runs

    Summary
      'xxd ./target/release/hexyl' ran
        3.72 ± 0.05 times faster than 'hexdump ./target/release/hexyl'
       21.70 ± 0.43 times faster than './target/release/hexyl ./target/release/hexyl'

Currently hexyl seems nearly 22x slower than xxd.

sharkdp · on Jan 9, 2019

... and I have already used hyperfine to benchmark hexyl as well :-)

Yes, it's a shame. But I don't think there is too much we can do about it. We have to print much more to the console due to the ANSI escape codes and we also have to do some conditional checks ON EACH BYTE in order to colorize them correctly. Surely there are some ways to speed everything up a little bit, but in the end I don't think its a real issue. Nobody is going to look at 1MB dumps in a console hex viewer (that's 60,000 lines of output!) without restricting it to some region. And if somebody really wants to, he can probably spare 1.5 seconds to wait for the output :-)

userbinator · on Jan 9, 2019

We have to print much more to the console due to the ANSI escape codes and we also have to do some conditional checks ON EACH BYTE in order to colorize them correctly.

A few extra comparisons and output for each byte shouldn't be that much slower; fortunately the function of this program is extremely well-defined, so we can calculate some estimates. Assuming a billion instructions per second, taking ~1.5s to hexdump ~1 million bytes means each byte is consuming ~1500 instructions to process. In reality the time above is probably on a faster CPU, so that number maybe 2-3x more. That is a shockingly high number just to split a byte into two nybbles (expected to be 1-3 instructions), convert the nybbles into ASCII (~3 instructions), and decide on the colour (let's be very generous and say ~100 instructions.)

The fact that the binary itself is >1MB is also rather surprising, especially given that the source (not familiar with Rust, but still understandable) seems quite small and straightforward.

steveklabnik · on Jan 10, 2019

Rust binaries can be large because unlike C, the standard library is statically linked, as well as jemalloc. Jemalloc will no longer be the default as of the next release, so that will shave off ~300k...

lilyball · on Jan 10, 2019

What's replacing Jemalloc?

yoklov · on Jan 10, 2019

The system malloc implementation. Users who want to use jemalloc have to opt in, but doing so is relatively easy (using the jemallocator crate from crates.io).

jabl · on Jan 10, 2019

Why was this done?

Did rust become less dependent on allocator performance, or did system allocators improve enough? IIRC glibc malloc has improved a lot over the last few years, particularly for multithreaded use, but I don't know about windows / macOS.

steveklabnik · on Jan 10, 2019

So, long ago, Rust actually had a large, Erlang-like runtime. So jemalloc was used. Over time, we shed more and more of this runtime, but jemalloc stayed. We didn't have a pluggable allocator story, and so we couldn't really remove it without causing a regression for people who do need jemalloc. Additionally, jemalloc was already removed on some platforms for a long time; Windows has been shipping the system allocator for as long as I can remember.

So, now that we have a stable way to let you use jemalloc, the right default for a systems language is to use the system allocator. If jemalloc makes sense for you, you can still use it, but if not, you save a non-significant amount of binary size, which matters to a lot of people. See the parent I originally replied to for an example of a very common response when looking at Rust binary sizes.

It's really more about letting you choose the tradeoff than it is about specific improvements between the allocators.

sharkdp · on Jan 10, 2019

It seems I was wrong. The new hexyl version is significantly faster (see my other comment)

qiqitori · on Jan 10, 2019

You may be able to speed things up by using a lookup table instead of branching.

(If it's spending a lot of time in Rust's format function you could also use a (or the same) lookup table to convert to hex/dec/oct.)

lilyball · on Jan 10, 2019

The format function is going to end up allocating a string for every single byte. That's a huge overhead.

Edit: Turns out to be about 22% overhead, see https://github.com/sharkdp/hexyl/pull/23. Also it was 2 strings per byte, not 1.

sharkdp · on Jan 10, 2019

Thanks to that PR, hexyl is now slightly faster than hexdump. Both are about a factor of 2-3 slower than xxd:

    Benchmark #1: hexyl $(which hexyl)
      Time (mean ± σ):     169.8 ms ±   8.2 ms    [User: 152.5 ms, System: 17.1 ms]
      Range (min … max):   162.2 ms … 189.1 ms    16 runs
     
    Benchmark #2: hexdump -C $(which hexyl)
      Time (mean ± σ):     188.5 ms ±   4.4 ms    [User: 186.2 ms, System: 2.2 ms]
      Range (min … max):   184.1 ms … 198.2 ms    14 runs
     
    Benchmark #3: xxd $(which hexyl)
      Time (mean ± σ):      72.8 ms ±   2.7 ms    [User: 71.9 ms, System: 1.1 ms]
      Range (min … max):    71.0 ms …  87.8 ms    40 runs

sjmulder · on Jan 11, 2019

I made a little clone for fun and got a bit carried away optimising. Now at about 3x the speed of hexyl 0.3.1:

https://github.com/sjmulder/hxl

Most of the improvement came from not using printf, fputs, and putchar in favour of operating directly on an array for the line that can be fwritten in one call.

sgt · on Jan 9, 2019

It takes a second to compute all that color. Nice tool though - but what I would really like to see is a --color argument added to xxd as well, as I'll probably forget about hexyl down the line and start looking for xxd again (which is conveniently already installed on most *nix systems).

sharkdp · on Jan 10, 2019

Fixed in v0.3.1 (https://github.com/sharkdp/hexyl/releases/tag/v0.3.1) :-)

weavie · on Jan 9, 2019

261 lines of pure Rust by the look of it.

dancek · on Jan 9, 2019

I was hoping that it's written in Rust. I wasn't disappointed either.

udp · on Jan 9, 2019

What difference would that make?

ConcernedCoder · on Jan 9, 2019

Yeah! -- Because prejudice rocks and JavaScript sucks right?

neysofu · on Jan 11, 2019

Every problem requires appropriate tools. Why would you use a slow and resource-intensive language for a thing as simple as hex viewing?