
> Having signed and unsigned variants of every integer type essentially doubles the number of options to choose from. This adds to the mental burden, yet has little payoff because signed types can do almost everything that unsigned ones can.

Unsigned types are quite useful when doing bit twiddling because they don't overflow or have a bit taken up by the sign.
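
For example, something like this (a minimal sketch, the names are made up) stays fully defined because every shift and mask happens on an unsigned type:

    #include <stdint.h>

    // Pack three 8-bit channels into one 32-bit word. With uint32_t the
    // shifts and masks are well defined; with a signed type, left-shifting
    // into the sign bit is undefined and right-shifting a negative value
    // is implementation-defined.
    static uint32_t pack_rgb(uint32_t r, uint32_t g, uint32_t b) {
        return ((r & 0xFFu) << 16) | ((g & 0xFFu) << 8) | (b & 0xFFu);
    }

    static uint32_t red_of(uint32_t rgb) {
        return (rgb >> 16) & 0xFFu;
    }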




> Unsigned types are quite useful when doing bit twiddling because they don't overflow or have a bit taken up by the sign.

That's essentially their only application. The rest are stupid single-bit memory-size optimizations. As Jens Gustedt noted, it's one of the (many) misnomers in the C language. It would be better called "modulo" rather than "unsigned". Other such misnomers that I recall:

    unsigned -> modulo
    char     -> byte
    union    -> overlay
    typedef  -> typealias
    const    -> immutable
    inline   -> negligible
    static   -> intern
    register -> addressless
EDIT: found the reference https://gustedt.wordpress.com/2010/08/18/misnomers-in-c/


> That's essentially their only application.

What about when it doesn't make semantic sense to have negative values? E.g. for counting things, indexing into a vector, the size of things. If negative doesn't make sense, I use unsigned types. It's not about the memory size in that case.


While I also like to use unsigned numbers when that is the correct type of a variable, the C language does not really have support for unsigned integers.

As someone else already said, the so-called "unsigned" integers in C are in fact remainders modulo 2^N, not unsigned integers.

While the sum and the product of 2 unsigned integers are also unsigned integers, the difference of 2 unsigned integers is a signed integer.

The best behavior for a programming language would be to correctly define the type of the difference of 2 unsigned integers (i.e. as signed); the second-best behavior would be to specify that the type of the result is unsigned, but to automatically insert checks for out-of-___domain results, to detect the negative ones.

As C does not implement any of these behaviors, whenever using unsigned integers you must either not use subtraction or always check for negative results, unless it is possible to always guarantee that negative results cannot happen.
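
A two-line illustration of the trap, and of the check you have to write yourself (my sketch, not production code):

    #include <stdio.h>

    int main(void) {
        unsigned int a = 3, b = 5;
        // a - b cannot be negative: it wraps to 4294967294 with 32-bit unsigned int.
        printf("%u\n", a - b);

        // The check the language does not insert for you:
        if (a >= b)
            printf("%u\n", a - b);
        else
            puts("difference would be negative");
        return 0;
    }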

This is a source of frequent errors in C when unsigned integers are used.

The remainders modulo 2^N can be very useful, so an ideal programming language would support signed integers, unsigned integers and modular numbers.


If negative doesn't make sense then you are saving one bit with this method, but introducing a ton of fun footguns involving things like conversions. Further, the compiler cannot assume no overflow and must now do extra work to handle those cases in a conforming fashion, even if your value width doesn't match the CPU width. This can make your code slower!


Also, go to the Compiler Explorer and compare the generated code for C++ "num / 2" when num is an int, and when num is an unsigned int.

While there are a few cases where the compiler tends to do a better job of optimizing signed ints than unsigned ints (generally by exploiting the fact that signed integer overflow is undefined), they are not as fundamental as "num / 2". Being forced to write "num >> 1" all over the place whenever I care about performance is basically a dealbreaker for me in many projects; and I haven't even gotten into the additional safety issues introduced by undefined overflow.
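
For reference, the two functions to paste into the Compiler Explorer (my example; on typical x86-64 output the signed version gets an extra sign fixup because / must round toward zero, while the unsigned one is a single shift):

    // Signed division by 2: a plain arithmetic shift would round toward
    // negative infinity, so the compiler emits a fixup for negative inputs.
    int div2_signed(int num) { return num / 2; }

    // Unsigned division by 2: exactly num >> 1, one shift instruction.
    unsigned div2_unsigned(unsigned num) { return num / 2; }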


Positive values are a particular case of signed values, you can still use signed ints to store positive values. No need to enforce your semantics through type, and especially not when the values of the type are trivially particular cases of the values of another type. For example, when you write a function in C that computes prime factors of an int, do you need a type for prime numbers? No, you just use int. The same thing for positive numbers, and for even numbers, and for odd numbers. You can and should do everything with signed integers, except bitfields, of course.


> No need to enforce your semantics through type

Maybe I'm spoiled by other languages with more powerful type systems, but this is exactly what I want my types to do! Isn't this why we have type traits and concepts and whatnot in C++ now? If not for semantics, why have types at all? The compiler could figure out how many bytes it needs to store my data, after all.

I use types for two things: to map semantics to hardware (if memory or performance optimization is important, which is rare) and to enforce correctness in my code. You're telling me that the latter is not a valid use of types, and I say that's the single biggest reason I use statically typed languages over dynamically typed ones, when I do so.

But even if that's not the case, why would I use a more general type than I need, when I know the constraints of my code? If I know that negative values are not semantically valid, why not use a type that doesn't allow them? What benefit would I get from not doing that? I mean, why do we have different sizes of integers when all the possible ones I could want can be represented in a machine-native size and I can enforce size constraints in software instead? We could also just use doubles for all numbers, like some languages do.


> Maybe I'm spoiled by other languages with more powerful type systems, but this is exactly what I want my types to do! Isn't this why we have type traits and concepts and whatnot in C++ now? If not for semantics, why have types at all? The compiler could figure out how many bytes it needs to store my data, after all.

Yes, but understand that, despite the name, what unsigned models in C / C++ is not "positive numbers" but "modulo 2^N" arithmetic (while signed models the usual arithmetic).

There is no good type that says "always positive" by default in C or C++ - any type which gives you an infinite loop if you do

    for({int,unsigned,whatever} i = 0; i < n - 1; i++) {
       // oops, n was zero, n - 1 is 4.something billion, see you tomorrow
    }
is not a good type.

If you want an "always positive" type, use some safe_int template such as https://github.com/dcleblanc/SafeInt - here, if you do "x - y" and the result of the computation should be negative, you'll get the rightful runtime error that you want, not some arbitrarily high and incorrect number.

The correct uses of unsigned are, for instance, computations of hashes, crypto algorithms, random number generation, etc., as those are in general defined in terms of modular arithmetic.
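
E.g. a 32-bit FNV-1a hash (a standard textbook example, not code from this thread) depends on the multiplication wrapping mod 2^32, which is exactly what unsigned guarantees:

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t fnv1a_32(const unsigned char *data, size_t len) {
        uint32_t h = 2166136261u;      // FNV offset basis
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 16777619u;            // FNV prime; wraps mod 2^32 by design
        }
        return h;
    }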


+1 for this. I was just bitten by this last week, when I switched from using a custom container where size() was an int to a std::vector where size() is size_t.

The code was check-all-pairs, e.g.

  for (int i = 0; i < container.size() - 1; ++i) {
    for (int j = i + 1; j < container.size(); ++j) {
      stuff(container[i], container[j]);
    }
  }
Which worked just fine for int size, but failed spectacularly for size_t size when size==0.

I totally should have caught that one, but I just couldn't see it until someone else pointed it out. And then it was obvious, like many bugs.
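
For what it's worth, the usual fix is to rephrase the condition so the subtraction never happens (a sketch, not necessarily how I actually fixed it):

  // i + 1 < size() cannot underflow, even when the container is empty.
  for (size_t i = 0; i + 1 < container.size(); ++i) {
    for (size_t j = i + 1; j < container.size(); ++j) {
      stuff(container[i], container[j]);
    }
  }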


I recommend using -fsanitize=undefined -fsanitize=integer if you can build with clang - it will print a warning when an unsigned int underflows, which catches a terrifying number of similar bugs the first time it is run (there are a lot of false positives in hash functions etc., though; imho it's still well worth using regularly)


Would you really write a function find_prime_factors() that takes an input of type "integer" and returns an output of type "prime", which you have previously defined? Then if you want to sum or multiply such primes you have to cast them back to integers. Maybe it makes sense for you, but for me this is the textbook example of useless over-engineering.

The same ugliness occurs when using unsigned types to store values that happen to be positive. Well, in that case it is even worse, because it is incomplete and asymmetric. What's so special about the lower bound of the possible set of values? If it's an index to an array of length N, you'll surely want an integer type whose values cannot exceed N. And this is a can of worms that I prefer not to open...


> Would you really write a function find_prime_factors() that takes an input of type "integer" and returns an output of type "prime", which you have previously defined?

If the language allows me to and it's an important semantic part of my program, then yes. The same way I would create types for units that need conversion.

Unless I'm writing low level performance sensitive code, yes, I want to encode as much of my semantics as I can, so that I can catch mistakes and mismatches at compile time, make sure units get properly converted and whatnot.

> What's so special about the lower bound of the possible set of values?

Nothing; I would encode a range if I could. But many things don't have a knowable upper bound yet do have a lower bound at zero: you can't have a negative size (for most definitions of size), a count of things usually can't be negative, and a dynamically sized array can never have an element index less than 0, but you may not know the upper bound.

Also, the language has limitations, so I have to work within them. I don't understand your objection to using what is available to make sure software is correct. Also, remember that many of the security bugs we've seen in recent years came about because C is not great at enforcing constraints. Are you really suggesting not to even try?

> And this is a can of worms that I prefer not to open...

And yet many languages do and even C++20 is introducing ranges which kind of sort of fall into this space.


To me it could totally make sense; it depends on the context. For example, in principle it would make sense, for an RSA implementation, to only allow constructing a PublicKey type from the product of two Prime's, and not of two arbitrary numbers. And the Prime type would only be constructible by procedures that provably (perhaps with high probability) generate an actual prime number. It would be a totally sensible form of defensive programming. You don't want to screw up your key generation algorithm, so it makes sense to have your compiler help you not construct keys from just anything.

For the same reason, say, in an HTTP server I could store a request as a char* or std::string, but I would definitely create a class that ensures, upon construction, that the request is valid and legitimate. Code that processes the request would accept HTTPRequest, but not char*, so that unverified requests cannot even risk crossing the trust boundary.
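
A rough sketch of the Prime/PublicKey idea (the names, the tiny numbers and the primality test are placeholders, nowhere near a real RSA implementation):

    #include <cstdint>
    #include <optional>

    class Prime {
        std::uint64_t value_;
        explicit Prime(std::uint64_t v) : value_(v) {}   // private: cannot be forged
    public:
        // The only way to obtain a Prime is through the check.
        static std::optional<Prime> make(std::uint64_t v) {
            if (!is_prime(v)) return std::nullopt;
            return Prime(v);
        }
        std::uint64_t value() const { return value_; }

        static bool is_prime(std::uint64_t v) {          // trial-division stub;
            if (v < 2) return false;                     // a real implementation
            for (std::uint64_t d = 2; d * d <= v; ++d)   // would use e.g. Miller-Rabin
                if (v % d == 0) return false;
            return true;
        }
    };

    class PublicKey {
        std::uint64_t modulus_;
    public:
        // Constructible only from two Primes, never from arbitrary integers.
        PublicKey(Prime p, Prime q) : modulus_(p.value() * q.value()) {}
        std::uint64_t modulus() const { return modulus_; }
    };
The compiler then rejects PublicKey(17, 21) outright; the only path to a key goes through the validated Prime::make.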


But "unsigned" doesn't actually enforce the semantics you want. Missing an overflow check means your value will never be negative, but it is almost certainly still a bug. And because unsigned overflow is defined, the compiler isn't allowed to prevent you from doing it!

This is just enough type semantics to injure oneself.


So, because it's not perfect, should you throw it all out?


No. Because people tend to make more mistakes if they try to use unsigned values in this manner in C/C++.


I’ve personally never encountered a bug that turned out to be caused by an unsigned value. YMMV, I guess.


I've seen all sorts of bugs caused by surprise conversions, as well as overflows that would be statically detectable but can't become blocking errors because unsigned overflow is well defined.


> Positive values are a particular case of signed values, you can still use signed ints to store positive values.

And yet Java's lack of unsigned integers is considered a major example of its (numerous) design errors.

> No need to enforce your semantics through type, and especially not when the values of the type are trivially particular cases of the values of another type.

Of course not, there's no need for any type at all, you can do everything with just the humble byte.

> The same thing for positive numbers

No?

> You can and should do everything with signed integers

You really should not. If a value should never be negative, then making it so it cannot be negative is strictly better than the alternative. Making invalid values impossible makes software clearer and more reliable.

> except bitfields, of course.

There's no more justification for that than for the other things you object to.


Java's lack of unsigned int is widely (but not universally) seen as a deficiency. This is especially true when Java is compared to C#, a very similar language at its core but which does have uint types. Anyway, I have a separate article arguing why Java should not have uint, and many ideas from there can be adapted to C/C++ too: https://www.nayuki.io/page/unsigned-int-considered-harmful-f...


Well, you and me are different persons and we don't have to agree on everything. In this case, it seems that we don't agree on anything. But it's still OK, if it works for you ;)


Thanks for this, gonna add some #defines to my headers :)


> const -> immutable

const -> read_only_view is better


Which is why sane languages of the time had a bitfield type.



