Hacker News new | past | comments | ask | show | jobs | submit login
Google researcher, long out of math, cracks devilish problem about sets (quantamagazine.org)
420 points by kentricon on Jan 3, 2023 | hide | past | favorite | 107 comments



Always cool to see unexpected applications of information theory, especially outside of probabilistic contexts.

A related toy example that comes to mind is the puzzle where you're given 12 metal balls and told that one of them is either heavier or lighter than the rest. You're then given a balance/scale and instructed to figure out which ball is different and whether it is heavier or lighter using the balance a maximum of three times to weigh different combinations of balls.

Part of the reason this is solvable is because there are 12x2=24 different possibilities (or log2(24) = 4.58 bits) for the ball ID/weight and 27=3x3x3 possible outcomes (or log2(27) = 4.75 bits) for the weighing process (left heavier, right heavier, or equal in each use). Thus, there's a chance the weighing sequence could convey the ball ID/weight. If there were 14 balls, however, then you would require log(28) = 4.81 bits of information to specify the ball ID/weight, so the puzzle would be unsolvable, since there wouldn't be enough information available in the weighing sequence to specify the ball ID/weight.


I once made a cool solution to this problem using information theory, in particular coding theory.

The usual solution uses the fact there are 3 places to put items: left scale, off scale, and right scale. Then those solutions do if/then cases to juggle items around to find the mis-weighted ball.

My solution numbered the balls in base 3, then did four weighings with no if/thens,. The weightings each gave a base 3 result: left, balance, right. This formed the digits of the mis-weighted ball.

The idea is to make an error correcting code in base 3, and use the "weighings" as the correction matrix to isolate the bad "bit".

This idea generalizes to solve a massive range of find the odd item(s) problems.


Just took a stab at recreating my older solution. It goes something like this

Number the balls 7 8 9 10 11 12 14 15 16 17 18 19

Then do three weighings according to rows like this:

     0  1 -1  0  1 -1  1 -1  0  1 -1  0
     1  1 -1 -1 -1  0  0  1  1  1 -1 -1
    -1 -1  0  0  0  0  0  0  0  0  1  1
Where -1 means index on left side of of balance, 0 means not on balance, 1 means on right side of balance. Note each row has the same number of -1 and 1.

If left is heavy, record 0, else if balanced, record 1, else if right heavy, record 2.

Then the base 3 value gives the coin label above (in absolute value) and is >0 if heavier, else is -1 if lighter.

Also note there is no if/then. You simply follow these three and record the numbers which spit out the answer.

The idea is the matrix represents an error correction code in base 3. You need each column distinct, and coin j being out of whack selects column j from the matrix.


I like how simple the deciding process becomes but I’m afraid there might be a problem. This solution works when the odd ball is always heavier, but it could either be heavier or lighter, and you don’t know which.

For example, say ball 7 is heavier. From top to bottom, the weightings would result in: 120. However, if ball 19 was lighter you would obtain the same weighing sequence 120.

I believe that there is another condition that must be imposed on the matrix - namely, the column must be unique, but also -1*column must not be another column.

I created the following matrix, which can identify the ball and if it is heavier or lighter. However, this does not line up neatly, and would require a table to decode the output of the weighings.

-1 -1 -1 -1 1 1 1 1 0 0 0 0 -1 -1 0 1 0 -1 -1 0 1 1 0 1 0 1 -1 0 -1 1 -1 0 1 0 1 -1

For example, the sequence 001 means the first ball is heavier. The sequence 221 means the first ball is lighter. Etc.


You might be right. I didn't check as carefully as when I did it decades ago. This is all from memory, and the matrix above I sketched out over dinner by hand :)

The idea works. When I solved it before I cider it up to test


I don't understand how SideQuark is distinguishing negative results from positive results. But using the representation he suggested to me sidethread, it isn't possible to have the result be positive whenever the misweighted ball is heavy and negative whenever it's light.


Does this still work if I mislabel the balls to begin with?


Why do the balls have to be numbered starting at 7?


There's a few hundred sets of numbers that work, but none are the sequence 1,2,..,12 or so, due to needing balanced left and right weighings. To see this, write the base 3 values for 1,2,..,12 as columns, and count the number of each digit in each row - you need equal numbers of 0 and 2 per row to make a valid weighing.

Here I picked the above since it is a valid sequence, but fails for the reason elsewhere on this page - you also need to not have negative paired columns. And that too turns out to not be enough, so write a simple program and crunch to check each case that does work, and there's (I think) 304. I should stop poking at this as I have real work I should be doing :)

If you really want to clean it up, you could then map the answer 3-vector of scrambled -1,0,1 values into another decoder matrix to map into values you like. Or use a look up table.


Not OP, but the reason is because that the values that you’re looking for tell you how to do the weighings, where the rows are weighings and the columns are the indices (it’s easier to see if you don’t translate from {0,1,2} to {-1,0,1}). Because of the problem you need the same number of balls on each side of the scale.

By choosing 7-19, omit 13, you finding the center values of the range 0-26. This is important because when you write the values in ternary each column has the same number of 0 and 2 digits.

This would not hold if you simply used the values 0-12, for example. Then the first column has eight 0’s and four 1’s, so it does not correspond to a valid weighting.

It’s important to note that this method doesn’t work for the problem described - it only works if you know that the odd ball out is heavier. You can see my other comment for a description of why, and a working solution.


I'm a different person, but I can provide some commentary about that.

A. There are many ways to identify the misweighted ball in three deterministic weighings. The simple approach is to label the possible states 1 to 24, or alpha to omega, or what have you, and then to assign an outcome code to each state.

SideQuark wants to entwine the outcome codes with the state labels, so that the label for each state is the value represented by its outcome code when that code is interpreted as a trinary number. This sets up feedback between things that would otherwise be independent of each other; it's sort of like writing a quine.

B. Therefore, it's possible that the odd choice to start numbering the balls at 7 (and skip 13) represents a compromise with what outcome codes our system can generate.

(For example, you can identify and diagnose the misweighted ball in these three deterministic weighings: 1,2,3,4 vs 5,6,7,8; 1,5,9,10 vs 2,6,7,11; 2,4,6,9 vs 3,7,10,12. If an outcome of 0 represents the right side of the scale being heavier, and 1 represents the left side being heavier, then the outcome code 121 uniquely identifies the 4 ball as being too heavy. But 121 interpreted as a trinary number is 16, not 4.

In fact, in this weighing scheme, the outcome code "4" (= 011) cannot occur, and 9 and 26 are also impossible.)

C. SideQuark appears to be cheating in that his balls are labeled from 7 to 19 (skipping 13), but his column indices range from 1 to 12. This means he's given up on getting a real correspondence between label and outcome code (the balls have different labels depending on whether you're reading the specification of the test [1-12 in full] or the interpretation of the result [7-19 with a hole]), but it looks like he is still aiming to have a meaningful correspondence between the code that specifies a particular ball is heavy and the code that specifies the same ball is light.

D. We're going to record a trinary number between 000 and 222, a range from 0 to +26. It is not clear to me how we would get a negative number. We could interpret the number we get as a three-trit three's complement value, but in that case we'd really want the balls to be labeled from 1 to 12, so that a single value uniquely identifies and diagnoses a ball - for an example of the problem we're about to run into, the outcome 101 represents the value 10, and should tell us that the 10 ball is too heavy, but if we're using three's complement then 101 is also the value -17, which tells us that the 17 ball is too light. This is a bad outcome when the 10 ball and the 17 ball both exist.


>It is not clear to me how we would get a negative number.

Use base 3 with 'digits' -1,0,1.

It's useful and interesting to note you can uniquely encode and decode any integer in any base with 'shifted' digits. If you allow some digits to be positive and some negative, you get both positive and negative integers encoded for free without needing sign extraction, using standard algorithms. You can also change base per slot and go into mixed-radix tricks for encoding (useful for turning things with varying choices per slot into unique encodings).

Things look like

    Encode(int val, int base, int shift)
        while (val != 0)
            d = ((val%b)+b)%b; // get positive mod 'digit'
            if (d + shift >= b) d = d-b; // shift digits
            val -= d; // kill off low digit - often not done in positive only case
            val /= d;
            output(d)

   Decode (list of digits, int base, int shift)
        val = 0
        reverse digits
        foreach digit d
           val = val * b + d
        return val
with positive shift, this encodes and decodes positive and negative integers in any base correctly, and works like the normal postive only case.

digits -1,0,1 also makes a unique base 3 representation of any integer using the same encoding and decoding algorithms. In fact, you can shift digits in any base like this and still get unique encoding and decoding, with the benefit that they handle positive and negative numbers for free.


I'll look into this more later, but right now it's bothering me that (a) your writeup specifies that the digits of the result code are drawn from 0/1/2 [a minor issue]; and (b) the minimum number representable in three digits of base 3 where the digits are -/0/+ is ---, -9 + -3 + -1 = -13, which will make the results -14 through -19 unreachable. The same problem occurs on the other side, where the maximum result is +++, positive 13.


All right; I've done some more looking.

Shifting the digits is exactly equivalent to biasing the number; in this case, changing the interpretation of the graphical symbols "0", "1" and "2" to be the numeric values -1, 0, and 1 just means we subtract trinary 111 = decimal 13 from our three-digit value. So we could represent the value -4 as "0--" (in "shifted trinary") or we could represent it as "100" (in biased trinary, since 9-13 = -4) and those representations are glyph-to-glyph isomorphic. Since the representable range of three digits of ordinary trinary is 0-26, the representable range using three of these shifted digits is -13 to +13.

Using this -/0/+ representation, it is easy to state a weighing schedule in which the absolute value of the three trinary digits you get as output from the scale represent equals the label (1-12) of the misweighted ball.[1]

But it appears to be impossible for the representation to be positive whenever the ball is heavy and negative whenever the ball is light.

Let's assume that we record a - when the scale tips right and a + when the scale tips left.

The problem is that every time we do a weighing, there will be at least one ball on each side of the scale. (Or else the weighing would generate zero information.) So without loss of generality, there is a ball L on the left of the weighing which gives us our most significant digit, and a different ball R on the right of the same weighing.

Since we record a - when the scale tips right, our most significant digit will be -, and therefore our overall result will be negative, whenever ball R is heavy. This conflicts with your stated goal of having the result be positive whenever any ball is heavy.

This will make it difficult to design an algorithm to diagnose the ball that doesn't involve any branches - we can "branchlessly" (to the extent taking an absolute value doesn't branch) determine which ball is misweighted, but we need a branch (or a lookup table) to determine whether it's light or heavy.

[1] This will do the job:

    6 8 10 11  vs  5 7 9 12
    3 5  7 11  vs  2 4 6 12
    1 2  5 10  vs  4 7 8 11
Record 0 when the scale balances, and otherwise plus or minus 1. The magnitude of the result will always equal the value of the misweighted ball, but the meaning of the sign is different for balls 2, 4, 5, 7, 9, and 12 (which have their first appearance on the right) than it is for balls 1, 3, 6, 8, 10, and 11 (which have their first appearance on the left).


That is a beautiful solution, wow. I'm just floored by you coming up with that, it is so utterly out of the box.


It's not out of the box when I apply error correcting codes to a massive range of problems ;)

If you like cool problems, here is one I have not found a simple solution for, but I once saw it in a place where a simple solution should exist.

You have an odd number >=3 of items with the property that any time you remove one, there is a way to split the remaining items into two equal count piles of equal weight. Show all items have the same weight.

It's easy for 3 items, A,B,C. Remove C, A balances B, then do A,C, now all equal.

It's easy to check by hand for a few small cases.

I proved it using more advanced stuff from lin alg. But I've never found a simple solution yet.

EDIT: corrected with "equal count piles"


I'm not sure I get the problem.

For number 5, I can use items of weights 3 1 1 1 1 and they are not equal. If you remove 3, you get 1+1=1+1 and in case you remove only of 1s, you can split 3 = 1+1+1.


Ah, problem is in memory. Probably also needs two piles of equal number of items :)


And this generalizes - if n = 2k + 1, 2k times x and once (2k - 1)x can be split into twice kx after removing the single (2k - 1)x or twice (2k - 1)x after removing one x. For k = 1, i.e. n = 3, (2k - 1)x = x. So there is an infinite family of counter examples for all odd n > 3.


I remember that problem from my college linear algebra course! I found a nice simple proof for integers (which was very satisfying!) but afaik you do need heavy linear algebra machinery to generalise it to the reals.


> I found a nice simple proof for integers (which was very satisfying!)

But no such proof can exist because the claim is not true. When you posted your comment, Fetiorin had already given a counterexample.


I was pulling the problem from memory. Add the requirement boths sides are same number of items.


Why doesn't the integer proof work on reals?


it relied on modular arithmetic


Thinking on this on my drive, I recalled another puzzle you might like with another error corrected solution.

The puzzle is related to the game 20 questions. If you can ask only 20 yes/no questions, you can differentiate 2^20 items. For example, with 20 yes/no questions, you can determine an integer in 1 to 1,000,000 (technically, up to 2^20).

Now consider you need to figure out what integer between 1 and 1,000,000 I am think of, but I am allowed to lie at most once. Now how many questions will you need to ensure you get the answer?

Doing this by hand and creating more and more convoluted tricks is possible, but in reality this is another error correction problem: The original problem is a binary message of length 20 bits, from which you get the 20 bit message.

With at most one lie, there is a 20 bit message, me (the noisy channel) can add at most one error, so you need the minimal length binary code that takes a message of length 20 and a max error of 1 bit.

To find the number needed, use a coding table like http://www.codetables.de/.

So you need to find how many more bits you need. To fix d bits, you need 2d+1 'distance' in your code to make sure d errors do not move one valid point within Hamming distance d of another answer, so we need d = 3.

So you need a binary code, d = 3, and encoding a message of length k=20. Click on Linear Codes, then the GF2 button, and you see the big table of best codes. Find the column with k = 20 (our message length), scroll down over increasing n (number of questions needed), until the first d=3 entry. There n = 25, so 25 questions will do it. This is the result http://www.codetables.de/BKLC/BKLC.php?q=2&n=25&k=20

This is provably best, otherwise the table would have a range if there were not proof (look around the table, notice some are ranges).

To design the questions, most places with tables have code constructions. So click the entry, and they describe the construction (unfortunately in error correction code terms), but that can be reversed into English and actual questions to get your answer.

This technique would let you answer any problem along the line of "How many questions, each with b options, would it take to determine between N items, if there are at most d lies told?"


Interesting! Intuitively I would have had the answer right but I have no idea where that bit of info popped up from. Possibly something to do with SDLC or some other telecommunications protocol that I worked with or implemented the drivers for.


It is a beautiful solution, but it's not terribly out of the box. It's similar to generating all subsets of a set of N items by iterating through the integers 0..2^N - 1 and checking to see which bits are set.


All subsets requires 2^12 weighings. The beautiful thing about error correcting codes is finding a minimal set of spanning items. And in this case, binary will not suffice, since you can only distinguish 2^3 = 8 items. The third position, one most people don't think if, is the "off balance" set, which gives three choies.

Then 3 weighings, 3 choices per, gives 27 possible outcomes, which is enough to distinguish which of 12 items is off, as well as whether or not it's higher or lower in weight (another factor of 2), for 24 choices. So there is a little extra in the 3=27-24 information theoretical bounds.

Using this idea, you can generate all sorts of hard coin problems, vastly harder than this one :), by mining error correction tables like http://www.codetables.de/ and converting to word problems.


As stated, it's not a solution at all - it uses four out of three possible weighings.


No, it does three. The "four" is a typing mistake. And it's provably optimal from the theory behind construction of minimal error correcting codes.


Ah, it looks like there's a typo in your earlier post. It currently reads: "My solution numbered the balls in base 3, then did four weighings with no if/thens..." (Emphasis mine.)


I always thought of bit boards in the same way, wow moment.


That's clever! If you get an "equal" result from the scale, then you can put those balls on the scale with the unknown ones, so you can maintain symmetry in the weighings.


That's called IIRC an "unconditional" solution (in that later weighings don't depend on earlier weighings). David MacKay poses that a problem in his Information theory book on p. 85, and gives the solution on the next page.

http://www.inference.org.uk/itprnn/book.pdf


Can you explain how you weigh the balls? How many do you weigh at a time and how do you decide which ones to weigh? I'm trying to follow your solution but I'm confused about this one thingn


The traditional solutions do things like: 1) split into 4 piles A,B,C. compare A,B on the balance. 2) If they balance, bad coin in 4 in pile C can easily be found. 3) If they don't balance, you know the 4 in C are good. At this point you have to fiddle by mixing coins among piles cleverly to get more weighings. But you can do it in three total. If I recall, seomthing like one from A, three from C, versus three from B, one from C, or something like that, is the next weighing.

This type of solution requires choices and branching. Mine was nicer I think since it requires no branches, and the weighings themselves spit out the bad coin number in base 3.


There's a slightly simpler solution if you don't care whether the odd ball is lighter or heavier:

It's:

X. AAAA vs BBBB

If X was balanced, XX: C1 vs C2, then XXX: C2 vs C3 (general way to distinguish 2 unknowns: weigh one unknown against one known)

If X was unbalanced, WLOG assume A was heavier. XY: A1 A2 B1 vs A3 A4 B2

Then if XY balanced, use the general method to distinguish B3 or B4 as answer.

If XY unbalanced, WLOG assume A1 or A2 is heavy, or B2 is light, so XYY A1 vs A2 finds the heavy one or implies B2 is light.


What is C1 and C2 and C3?

What is XY? What's WLOG?

Can you dumb this down a bit more please?


There's an episode of Columbo with a similar problem, expressed as: there are a number of bags. Each bag contains many gold coins. The gold coins weigh 1 ounce each. However, one bag contains fake coins, which appear the same as the real coins, but have a slightly different weight - 1.1 ounces per coin. You have a scale, which you can only use once, and you must find the bag with the fake coins.

The solution is: Take one coin from the first bag, two coins from the second bag, three coins from the third bag, etc. Weigh all of the selected coins at the same time to get a total weight. Subtract from the total weight the triangular number of the number of bags[1], in ounces. The remainder, divided by 0.1, will be the number of the bag with the fake coins.

[1] eg, for three bags, 1 + 2 + 3 = 6


Isn't that overcomplicating the solution?

The only requirement is to have a unique number of coins from the fake coin pouch. Since we don't know which one that is, every pouch gets a different number of coins, and not necessarily a sequence starting from 1. Then, you weigh them all assuming they're all real, but wait, they're not! So you take the excess weight, divide it by the excess weight per fake coin, and the bag from which you took out that many coins is the bag with the fake coins.


I'm amazed that there was an episode of a popular TV drama aired that had such a detailed puzzle in it.


Columbo episodes were unusually long. Each was the length of a short movie: ~1 to ~1.5hrs.

One of the Die Hard movies had the "extract 1 gallon from a 3 gallon and a 5 gallon jug" puzzle in it.


The 12-ball problem is a fun one and IMO the key revelation is that you do not need to determine whether the ball is lighter or heavier (which isn’t possible anyway). Just which ball is different than the rest. It’s a great exercise in decision trees as you change your strategy based on total information, not just the most recent step.

It’s also fun to scale the problem from 3 balls up to see how the solution evolves.


>> you're given 12 metal balls and told that one of them is either heavier or lighter than the rest. You're then given a balance/scale and instructed to figure out which ball is different and whether it is heavier or lighter using the balance a maximum of three times

> The 12-ball problem is a fun one and IMO the key revelation is that you do not need to determine whether the ball is lighter or heavier (which isn’t possible anyway).

Huh?


If I'm reading this [0] right, the heavier/lighter factor can be determined within 3 weightings... [0] https://suresolv.com/brain-teaser/find-fake-ball-among-12-id...


Of course it can. There are 24 possible states and you get enough information to uniquely identify one of 27 states. This was already stated in the root comment.

I'm confused how the response to "you must identify whether the odd ball is heavier or lighter" can be "the key to this is realizing you don't need to know whether the odd ball is heavier or lighter, which is good because that's impossible anyway".


Sorry, yes I saw your confusion. If you read back to your Huh comment you might forgive me for reading it as that you were unsure of the truth yourself. The one word 'huh?', spoke to me as "which one of these is true?". Whereas a person who was sure of the true answer, I would expect to say "Hey that's incorrect", if they encountered an obviously mistaken quote.


Scale down first! Interesting patterns start from 0 and 1.


I'm not sure why you say it won't work with 14 balls, this is solvable with up to 27 balls in 3 weighs.

Weigh 1: 9 vs 9 (18 on balance), 9 off

Weigh 2: 3 vs 3 (6 on balance ), 3 off

Weigh 3: 1 vs 1 (2 on balance ), 1 off


You don't know if the unique ball is heavier or lighter, how would this work?


Ah, if you need to know that for certain then you'd be restricted to 18 balls (6,6,6), (2,2,2), (1,1). Otherwise I guess you would "only" know in 26/27 cases if it was heavier or lighter (off the balance in all 3 conditions).

....and now I understand, you would only initially know there is a difference in weight, not which side had the heavier or lighter ball.


No, you can't solve the harder version with 18 balls, as you just proved.

18 balls have 36>27 states.


> “Mike said, ‘Justin, you’re going to get me thinking about this problem again and I don’t want to do that,’” said Gilmer.

This is a very relatable feeling. I often find myself getting trapped into thinking about problems I am unlikely to solve.


Reminds me of the multi-armed bandit problem, where Allied scientists considered it to be so intractable that they proposed dropping it over Germany so their scientists would waste their time on it.


The randomized version of nerd-sniping, nerd-bombing.


I find it's a cruel trick people, usually inadvertently, play on me at work: "when you've got a sec can you look at x?"

There's a portion of my brain that's immediately hijacked and starts thinking about the problem whether I'm actively working on it or not.

Happens to me for work problems outside of work, too. I'll just be chilling in the shower or something and suddenly I'm thinking about a work problem.

Maybe one day I'll retire from programming and do something less theoretical e.g. driving a semi.


Bad idea. Don't drive and derive. You'll be driven to distraction and crash.


The obligatory XKCD https://xkcd.com/356/


Damn you! Now all I can think about is that xkcd problem.


That's taken from the Google Labs Aptitude test, IIRC it's a simplification of https://arxiv.org/abs/cond-mat/9909120 with solution here https://mathworld.wolfram.com/news/2004-10-13/google/


> This is a very relatable feeling. I often find myself getting trapped into thinking about problems I am unlikely to solve.

I have this problem as well. The most disturbing thing is retracing old paths to dead ends without realizing it. Here I mean internal mental models, but also applies to previously known results I just wasn't familiar with, and don't turn up unless specific phrases are known.


Yep, same. About once a year I get sucked for a day or so into the Noita Eye Messages puzzle (https://noita.fandom.com/wiki/Eye_Messages).


Oh lordy, same. I personally can't wait for someone else to solve it.


I have 3, personally. One day.. ONE DAY.. I will make the elusive perfect deep dish crust. One family can only eat so much pizza though, so you just have to let it go. If it's gonna happen then it'll happen.


I find that one of the hardest parts of doing research is knowing when to give up on a problem


Just do what I do, focus on easy problems! Doesn't take that long but you feel great anyway.


Paper link: https://arxiv.org/abs/2211.09055

Abstract:

We show that for any union-closed family F\subseeq 2^[n],F≠{∅}, there exists an i∈[n] which is contained in a 0.01 fraction of the sets in F. This is the first known constant lower bound, and improves upon the Ω(log_2(|F|)^{−1}) bounds of Knill and Wójick. Our result follows from an information theoretic strengthening of the conjecture. Specifically, we show that if A,B are independent samples from a distribution over subsets of [n] such that Pr[i∈A]<0.01 for all i and H(A)>0, then H(A∪B)>H(A).


I’m glade to know that “did a PhD in maths six years ago and work in applied mathematics” is now being long out of math. I see clickbaity titles are still doing fine in 2023.

Impressive result however.


To be fair, results like this are very improbable to come from someone who is not a full-time mathematician. And no that does not mean someone working in machine learning or someone who did their PhD 6 years ago. How many such results do you see from someone who is not a full-time mathematician?

It is possible for someone who is not a full-time mathematician by still working on mathematics in their spare time or in a mathy-field in their day job but very very few real examples of that can be found.

So I don't find the title clickbaity. It is really really hard to come up with results like this without working on mathematics research (not AI research) 10 hours a day!


Totally agree. I did a PhD in Pure Math. 6 years later I could no longer give anything but a superficial explanation of my own thesis. Now ~15 years later my thesis is totally impenetrable to me. Advanced math is a house of cards that quickly falls if you're not actively reinforcing it.


The article also notes that other mathematicians very quickly advanced his work from 10% to 38% and wondered why he hadn't just done it himself, since it was relatively simple. He explained that his skills were rusty after being out of mathematics for six years.


> He also worked with a textbook by his side so he could look up formulas he’d forgotten. “You’d think someone who comes up with a great result shouldn’t have to consult Chapter 2 of Elements of Information Theory, but I did,” Gilmer said.

Always enjoy seeing this sentiment reaffirmed. Funny that Gilmer states it in a somewhat self surprised tone.

Rings of the Einstein quote:

“[I do not] carry such information in my mind since it is readily available in books. He also said, “…The value of a … education is not the learning of many facts but the training of the mind to think.”


"The value of a … education is not the learning of many facts but the training of the mind to think."

While anyone with a (good) STEM education knows this to be obviously true, its frightening how many people go through 16+ years of formal schooling where they mostly do rote memorization and regurgitation.


We can thank the academic system and Goodhart's Law for that.

We measure students with tests, but all we get are people good at taking tests and rote memorization.

Maybe if we had a better way to access actual learning but are teachers even paid enough to truly care?


Tests have their place though. The more rigorous maths exams I took in school were some of the most intellectually challenging things I did in school. If written correctly, they require you to apply the things you've learned in novel ways, or to combine the topics you've studied in new ways. Problem is, IME there's very few professors/teachers qualified and devoted enough to develop those sorts of exams, and as soon as you try to 'scale' it by centralizing the test creation, you run into the same issue of students simply learning for the test.


I think math is somewhat special in that it isn’t about solving specific problems, but about learning to recognize patterns, and that the tests don’t check whether they know the patterns, but whether they can apply those patterns to problems that they need not even have ever seen.

In many other disciplines applying what students learned to things they’ve never seen before isn’t really possible.

For example, you can teach students what factors were needed to start the Industrial Revolution and why, but then, to answer “why didn’t the Industrial Revolution start in the 1750s in South America?” students would have to know about South America in the 1750s. If they don’t, all they can answer is “it must be because not all of the prerequisites foo, bar, baz, … were available, but I wouldn’t know which one(s)”.

Because of that, that question isn’t better than the dull “why did the Industrial Revolution start where and when it started?”, the answer to which can be learned by rote learning.


I would say that studying history is almost defined by rote memorization. Once a pattern emerges, you're not studying pure history but rather historical economics or sociology or something else. Things don't usually fit neatly into pure categories but I can imagine well written tests for pretty much any subject that require extrapolation or speculation of some kind, except "pure history".


You make a good point. I believe that there are ways to get students outside of maths to apply what they've learned to things they haven't seen before, though it won't be as novel or challenging as a maths exam. In history for example, you could have students analyze primary sources they've never encountered, which would require them to apply their knowledge of the context to something novel. Definitely more challenging than in STEM though.


Writing a hard/challenging exam is not difficult. The problem is most bad or even average students will fail. The tricky part is striking a balance such that the good students feel pushed, and the weaker students still have a chance to demonstrate they at least engaged and understand the basics.


This was exactly my experience in college. I repeatedly had professors who apologized for making the test "too hard" -- i.e. all but 0-1 students failed because passing the test required a novel approach that no one (or only 1-2 brilliant students) was able to find during the time allotted.


>Maybe if we had a better way to access actual learning but are teachers even paid enough to truly care?

There's no incentive to pay teachers more; system is working as designed. Those students aren't giving short-term quarterly gains.


Tis a shame the system doesn't account for the massive, long-term returns quality education provides, both to student and society.


> its frightening how many people go through 16+ years of formal schooling where they mostly do rote memorization and regurgitation.

That's how most education is structure, though. Unless your point is simply that, though it reads as if you're blaming the student.


Students do share some of the blame, at least once you get to high school and university level. No one forces (university) students to take easy courses that don't require rigor. As long as people are willing to shell out full-price tuition for a degree that doesn't require you to do more than rote memorization and regurgitation, schools will continue to facilitate it.


I think it’s mainly that STEM majors are smarter, and thinking is hard slash unmotivating.

Of course any non-STEM field has geniuses, but our impressions—like the parent’s—are formed from averages, and STEM majors are smarter on avg.


Chapter 2 of that excellent book by Cover and Thomas is the roundup of all the definitions of entropy, relative entropy, mutual information, etc.

I think it's the first place I ever saw the "Information Diagram" (https://en.wikipedia.org/wiki/Information_diagram) which is a Venn-like diagram that really helps to remember all these relationships.


Without needing to commit many facts to memory, one nevertheless needs to know which facts are known, and how to get to them if need be. Otherwise all the time will be spent reinventing the wheel.


The vast majority of academic "achievement" measures are centered on tests which entirely disallow looking things up.

Especially in poorer public schools, learning is modeled as recitation. It's awfully sad. Most students are lucky if they get one or two teachers that break that mold.


carry such information in my mind since it is readily available in books.

It depends...good luck doing math if you don't have a good grasp of arithmetic . Having to look up everything would be a major chore.


It's just coy self-deprecation. A smart guy who's spent a few years in a corporate environment probably knows when to deploy it.


I do not carry such information in my mind since it is readily available on stack overflow


> Still, for some of the authors of the follow-up papers, getting to 38% was relatively straightforward, and they wondered why Gilmer didn’t just do it himself. The simplest explanation turned out to be the correct one: After more than a half-decade out of math, Gilmer just didn’t know how to do some of the technical analytic work required to pull it off.

> “I was a bit rusty, and to be honest, I was stuck,” Gilmer said. “But I was eager to see where the community would take it.”

> Yet Gilmer thinks the same circumstances that left him out of practice probably made his proof possible in the first place.

This reminds me of the principle of Shoshin: https://en.wikipedia.org/wiki/Shoshin

The idea that the mindset of a beginner can sometimes find insight where the mind of an expert cannot, and that experts should cultivate thinking like a beginner in some instances.


What am I not getting? Isn't family {1}, {2}, {3}, {4}, {1, 2, 3, 4} union-closed? Each element appears in only 2/5 of the sets, so ???

Edit: From the paper, the family F is defined as "F ⊆ (subset or equal) 2^n". I think that means for n elements there are at most 2^n subsets? In other words, no duplicated subsets. Which still doesn't clear up my confusion--5 subsets is < 2^4.

Edit: Duh, thanks for correcting my thinko.


"A collection, or family, of sets is considered “union-closed” if the union of any two sets in the family equals any existing set in the family." Your family isn't union closed because it doesn't include {1, 2}, the union of {1} and {2}.


{1} U {2} = {1,2} and is not in the family.


Very cool to see someone come back out of "industry" and to crack a problem. On a side note: the name looked familiar to me, and I realized I had spent a decent amount of time re-implementing his most cited (ML) paper a few years back.


Set family {{1,2,3,4}, {1}, {2}, {3}, {4}} has all 4 elements appearing in exactly 2 sets out of the 5, that is 40%. What am I missing?


Your example is not union-closed.

"A collection, or family, of sets is considered “union-closed” if the union of any two sets in the family equals any existing set in the family."

EDIT: In your hypothetical collection of sets, the union of "2" and "3" would produce the set "2,3" which is not an enumerated set in your list.


Check out the replies to this comment [1] with the same attempted counterexample. Your set isn't union closed because {1}U{2} = {1, 2} isn't in it (same for {1}U{3} etc.).

[1] https://news.ycombinator.com/item?id=34236889#34240381


The article title has been revised to more accurately reflect the researcher's position (and removes the AI ambiguity): "Google Researcher, Long Out of Math, Cracks Devilish Problem About Sets"

Should the title of this HN entry be updated accordingly?


"long out of math"? he did his Ph.D. in 2015. - surely 7 years is not that long! :)


Ok, done. Thanks!


GPT, how do I prove that...?


Similarly, I assumed from the headline that he had used the AI to help him determine the proof.

As for using AIs as mathematical proof helpers, I threw it the following softball and can't say that mathematicians should be too worried just yet:

>> Please prove that the first decimal digit of pi is 3.

> The value of pi is an irrational number, which means that it has an infinite number of decimal digits that do not repeat in a pattern. It is not possible to prove that any particular decimal digit is the first decimal digit of pi, because the decimal representation of pi is infinite and does not repeat. However, the first few decimal digits of pi are widely accepted to be 3.14, so the first decimal digit of pi is generally assumed to be 3.


The title makes it sound like an AI solved it but he's just a programmer who normally works on AI that happens to have solved it.


> (i.e. clickbait)

It is not clickbait just because you interpreted it wrongly. The AI in the title only aims to demonstrate his state of out-group in regards to math.


"Computer science researcher solves computer science problem" isn't very outgroup.

He's more "in" that problem space than a professional differential topologist.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: