A step beyond Rust's pattern matching (radiki.dev)
108 points by PaulHoule on March 28, 2024 | 29 comments



In Rust, let-else patterns don't allow this, but match guards (used in `match` expressions and the `matches!` macro) can do something like this.

  while matches!(iterator.next(), Some((a, b)) if a + 1 == b) {
      println!("Hello!");
  }
However, using the bound values in an expression is a little harder.

  while match iterator.next() {
      Some((a, b)) if a + 1 == b => {
          println!("{a}, {b}");
          true
      },
      _ => false
  } {}
Since this is a well-formed transformation, it should theoretically be possible to implement this as a macro without changing the language.
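
A minimal sketch of such a macro (the name `while_let_guard!` is made up, and this only handles the single-guard form):

  macro_rules! while_let_guard {
      ($scrutinee:expr, $pat:pat if $guard:expr => $body:block) => {
          while match $scrutinee {
              $pat if $guard => { $body; true }
              _ => false,
          } {}
      };
  }

  fn main() {
      let mut iterator = [(1, 2), (2, 3), (5, 7)].into_iter();
      // The pattern and body both come from the call site, so the
      // bindings `a` and `b` are visible inside the body:
      while_let_guard!(iterator.next(), Some((a, b)) if a + 1 == b => {
          println!("{a}, {b}");
      });
  }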


The problem with allowing patterns to be expressions that simply evaluate to an object to be matched is that it gets in the way of doing potentially more useful things, like predicates, so it's kind of a waste:

TXR Lisp:

  1> (match (@a @(= (+ a 1))) '(1 3) a)
  ** match: (@a @(= (+ a 1))) failed to match object (1 3)
  1> (match (@a @(= (+ a 1))) '(1 2) a)
  1
An expression is treated as a predicate. The object being matched, like 2, is inserted as a right argument into it: (= (+ a 1) <obj>).

If you try the following, you get a surprising result:

  2> (match (@a @(+ a 1)) '(1 3) a)
  1
The predicate call here is (+ a 1 3), which evaluates to 5; any non-nil result counts as a successful match, so this pattern accepts anything.
It is more verbose, but we can do more. We don't have to test for exact equality; e.g. @(< a) will match something that is greater than the previously matched a.

Capture a b that is greater than previously captured a:

  3> (match (@a @(< a @b)) '(1 3) (list a b))
  (1 3)
Or fail trying:

  4> (match (@a @(< a @b)) '(1 0) (list a b))
  ** match: (@a @(< a @b)) failed to match object (1 0)
We could treat certain functions differently; since + isn't a relational function producing a Boolean result, equality could be inserted implicitly to generate a predicate.

But, rather than that hack, there is a different one: user-defined patterns. We can have @(+ a 1) as a pattern by defining + as a pattern operator:

  1> (defmatch + (. args) ^@(= (+ ,*args)))
  +
Then:

  2> (match (@a @(+ a 1)) '(1 2) a)
  1
  3> (match (@a @(+ a 1)) '(1 3) a)
  ** match: (@a @(+ a 1)) failed to match object (1 3)


So I think what the author is suggesting is what was called “n+k patterns” in Haskell. I don’t recall the specific history, but over time there was a strong consensus that it was a bad idea. This might have been related to how signed integers work semantically, or some other gotcha.


I think it might just have been that the capability makes the pattern matching unbounded in human complexity. The right side's expression is already practically unbounded in complexity; making the left side too flexible as well becomes insanely difficult to reason about. And when the Haskell community is telling you that some feature is too difficult for humans to use sensibly, it's best to listen.

I think simple pattern matching is useful, especially type-based pattern matching, including which element of a sum type is matched. But you want to watch out for how complicated it gets on the matching side. Complex matching code is a very strong form of coupling. It may be a good thing, but as a language designer I think you don't want to overpower it, because of the coupling it introduces as the power gets used.


Why don't languages that need pattern matching use unification? Turing-complete pattern matching, with known fast implementations. Why does everyone need to reinvent the wheel ad hoc, and end up with a square wheel that only rolls on Tuesdays?

https://en.wikipedia.org/wiki/Unification_(computer_science)


Because people have other goals. Doing Turing-complete pattern matching may not be desired, or someone might want something even more powerful (…or just different) than this.


Rust gives you strong guarantees about exhaustive matching. If you had a Turing-complete mechanism, it could not provide such strong guarantees.
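
A minimal illustration of that guarantee (this deliberately fails to compile):

    enum Light { Red, Yellow, Green }

    fn advice(light: Light) -> &'static str {
        match light {
            // error[E0004]: non-exhaustive patterns: `Light::Yellow` not covered
            Light::Red => "stop",
            Light::Green => "go",
        }
    }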


Wouldn't it be highly undesirable to run a Turing complete unification process at the value level at run time?


Are you concerned about cost? There are linear-time and linear-space algorithms for unification, and they are algorithms in the strict sense (i.e., guaranteed to terminate).


Rust is all about zero-cost abstractions. Linear time and space ones do not fit the bill.

Some language willing to trade a bit of speed for expressivity could adopt it. But not Rust.


> Opens up blog post about pattern matching in Rust.

> Not a single "match" statement.

This post could increase its reach by explaining why a let statement is pattern matching, or at least pointing to a reference.

Although I understand people who don't know this already are not the target audience.

Here's the reference for those who need it: https://doc.rust-lang.org/book/ch18-01-all-the-places-for-patterns.html
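
The one-line version: the left-hand side of a `let` is a pattern, not just a variable name.

    struct Point { x: i32, y: i32 }

    fn main() {
        let (a, b) = (1, 2);                       // tuple pattern
        let Point { x, y } = Point { x: 3, y: 4 }; // struct pattern
        println!("{a} {b} {x} {y}");
    }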


Thanks


What I really love about Rust's pattern matching is how well it integrates with the type system.

    match user {
        ImportUser { id, account: AccountData { email: Some(email), .. }, .. } =>
            println!("{id}: {email}"),
        _ => println!("Not an ImportUser or has no mail"),
    }
Of course this specific case could also be written with `if let`, but imagine how long it would have been using traditional conditionals.
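
For comparison, the `if let` version (a sketch with made-up type definitions, modeling only the missing-email case):

    struct AccountData { email: Option<String> }
    struct ImportUser { id: u32, account: AccountData }

    fn main() {
        let user = ImportUser {
            id: 7,
            account: AccountData { email: Some("user@example.com".into()) },
        };
        if let ImportUser { id, account: AccountData { email: Some(email) } } = user {
            println!("{id}: {email}");
        } else {
            println!("Not an ImportUser or has no mail");
        }
    }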


I really enjoy Rust's enums, but I feel their integration could be even better. TypeScript's flow typing or Rust's work on pattern types [0] are a better version of what we have currently.

The main issue is that it's very hard to propagate information found through matching.

[0]: https://github.com/rust-lang/rust/pull/120131


Can't you use @ to bind any part of the match you want to use?

https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html#-bindings


What I want is something similar to the switch from lexical to non-lexical lifetimes, but for enums and pattern matching. `@` is a construct that depends on lexical scoping.

In the following code, it would be nice if the second match could use the early return from the first match to avoid extra checks. It would also be nice if it worked across function boundaries. It would help remove boilerplate conversions or some `unreachable!()` branches.

    enum Auth { Guest, User(u32), Bot(u32) }

    fn check_permissions(auth: &Auth) -> Result<Permission, PermissionError> {
        match auth {
            Auth::Guest => return Err(PermissionError::Unauthorized),
            Auth::User(_) | Auth::Bot(_) => { /* ... */ }
        }
        // ...
        match auth {
            Auth::User(_) => { /* ... */ }
            Auth::Bot(_) => { /* ... */ }
            // At this point the compiler should know that `Auth::Guest` is not
            // possible, but today it still demands an `Auth::Guest => unreachable!()` arm.
        }
        // ...
    }


Often I wish that:

    enum Auth { Guest, User(u32), Bot(u32) }
was just syntactic sugar for

    struct Auth::Guest;
    struct Auth::User(u32);
    struct Auth::Bot(u32);

    type Auth = Auth::Guest | Auth::User | Auth::Bot;
where `|` denotes an anonymous union. Using enums feels very clunky when you start needing to work with subsets of them.


I 100% agree. I tried a few workarounds to get the same result, but it quickly gets out of hand. The first solution I use is to define the individual variants as in your post, and then one enum per union I care about; even with macros it gets very verbose, and it requires explicit conversions (so it's not zero-cost). The second is to define a generic enum where each variant corresponds to a generic parameter, which you can set to an uninhabited type (such as Infallible) when the variant is not possible. Converting is again not zero-cost. You can recover the zero cost manually, but then you need to deal with discriminants yourself; the compiler can't help you there.
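
A sketch of that second workaround (all names made up):

    use std::convert::Infallible;

    // Each variant's payload is a type parameter; plugging in the
    // uninhabited `Infallible` statically rules that variant out.
    enum AuthSubset<G, U, B> {
        Guest(G),
        User(U),
        Bot(B),
    }

    // "User or Bot, never Guest":
    type IdAuth = AuthSubset<Infallible, u32, u32>;

    fn id(auth: &IdAuth) -> u32 {
        match auth {
            AuthSubset::Guest(g) => match *g {}, // uninhabited, statically impossible
            AuthSubset::User(id) | AuthSubset::Bot(id) => *id,
        }
    }

    fn main() {
        println!("{}", id(&AuthSubset::User(42)));
    }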

The pattern type work I mentioned earlier would bring us very close to what you have at the end:

    enum Auth { Guest, User(u32), Bot(u32) }

    type FullAuth = pattern_type!(Auth, Auth::Guest | Auth::User(_) | Auth::Bot(_)); // useless, but works
    type IdAuth = pattern_type!(Auth, Auth::User(_) | Auth::Bot(_));
The full enum still needs to be defined; in particular, it serves as the basis for the in-memory representation. All the subsets could be handled by the compiler, though.


I think you might have meant to match on "Some(email)".


True, thanks for catching that; corrected it.


This looks a lot like logic programming: https://en.wikipedia.org/wiki/Logic_programming


The case at the beginning, where we know our pattern is irrefutable but the compiler doesn't (`let (x, 2) = (1, 1 + 1)`), is something the compiler could easily prove (as long as the irrefutability can be proven locally, as it is here). At first, I struggled to think of why that capability would ever be helpful. In the end, though, I came up with this example, which can (and does) come up:

  fn infallible() -> Result<u8, !> {
      Ok(1)
  }

  fn match_infallible() {
      let Ok(x) = infallible();
  }
The compiler here knows the `infallible` function will always return an `Ok` because of the return type. Here you could just `unwrap` the result, but what if it were inside a tuple, or something similar?
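
A sketch of such a nested case, using the stable `std::convert::Infallible` in place of the unstable `!` (recent compilers let you omit the `Err` case because its payload is uninhabited):

  use std::convert::Infallible;

  fn pair() -> (u8, Result<u8, Infallible>) {
      (0, Ok(1))
  }

  fn use_pair() {
      // A single irrefutable pattern destructures the tuple and the
      // nested Result at once; no unwrap needed for the inner element.
      let (n, Ok(x)) = pair();
      println!("{n} {x}");
  }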


The latter really is type-based static analysis, even though it "looks like" there's a runtime component: the never type plugs one of the only two holes in `Result`, so it statically becomes equivalent to a newtype.

The former can't be restricted to type analysis (without something like dependent types I think?). You're right that we could apply constant folding to check in this case. My understanding is that the Rust design philosophy is to skip things like this because of the potential to produce surprising results upon refactoring.


My point was that both are static, but you’re right that one being done at the type checking stage probably makes a big difference in the feasibility.

> the potential to produce surprising results

That’s actually the big reason for that capability. If you do

    let result: Result<_, !> = …
    let Ok(fine) = result;
If `result` ever changes from infallible to fallible, you get a compile error rather than the runtime error you would get with `unwrap`. In fact, when looking it up, it turns out that this specific case was a big part of the motivation for the `!` type in the first place! I think the above code probably compiles in nightly because of that.


I think this part of the article has a typo (the vec! literal is missing a comma after the (2, 2) tuple, and the printed output is missing (2, 2)).

Say I have a vector of pairs:

    let the_data = vec![
      (1, 2),
      (2, 2)
      (2, 3),
      (5, 7),
      (8, 8)];
I can grab an iterator out of it and do the same pattern matching in a for loop:

    for (a, b) in the_data {
        println!("({}, {})", a, b);
    }
    
    // prints
    // (1, 2)
    // (2, 3)
    // (5, 7)
    // (8, 8)


I'm not a fan of forcing a left-to-right order in the author's proposed syntax here. Sure, the example given by OP is `while let Some((a, a + 1)) = iterator.next()`, but what if you want to write `while let Some((b - 1, b)) = iterator.next()`?

This is a contrived example, because you can swap the expressions to make it work. But what if you want to match a tuple where the first element is the SHA256 hash of the second?
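
You can still express that with a guard today; a sketch assuming the `sha2` crate:

  use sha2::{Digest, Sha256};

  fn main() {
      let data = b"hello".to_vec();
      let pair: ([u8; 32], Vec<u8>) = (Sha256::digest(&data).into(), data);

      // The relationship between the two elements lives in the guard,
      // where arbitrary expressions are allowed:
      match &pair {
          (hash, data) if hash[..] == Sha256::digest(data)[..] => println!("hash matches"),
          _ => println!("hash mismatch"),
      }
  }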


There should be no `b` in the println here; this looks like a typo:

  while let Some((a, a + 1)) = iterator.next() {
      println!("({}, {})", a, b);
  }


There's one more.

  if let (a, 2) = (1, 2) &&
     let (c, 4) = (3, 4) && // should be b, not c
     b == a + 2 {
        println!("Made it, a = {}, b = {}", a, b);
  }


This was a fun read, kudos to the author! Adding one more typo to this thread:

  if let (a, 2) = (1, 2) &&
     let (b, 4) = (3, 4) &&  // <-- This should be an open brace.
      println!("Made it, a = {}, b = {}", a, b);
  }



