Computational Life: How self-replicating programs emerge from simple interaction

jekude · 2024-06-30T19:25:30 1719775530

Combinatory Chemistry [0] is a great paper that is referenced. Using Combinatory Logic feels a lot cleaner than brainfuck IMO.

[0] https://arxiv.org/abs/2003.07916

29athrowaway · 2024-07-01T05:25:08 1719811508

Brainfuck is a useful language for evolutionary programming because it is Turing complete, it has few instructions, the interpreter is easy to implement, and a mutation operator is also easy to implement.

There are different ways to deal with bounds checking when moving the tape: saturation, wrapping and invalidation. I often chose to invalidate a program that crosses bounds, as it usually generates a cleaner program that is more compatible with other interpreters as well.

In the paper the part that's interesting is that there was no explicit fitness functions. The higher fitness emerges in the form of the replicator. https://www.youtube.com/watch?v=eOHGBuZCswA

JyrkiAlakuijala · 2024-06-28T13:26:35 1719581195

Zilog Z80: The reports of my death are greatly exaggerated. I'm actually creating life nowadays.

muragekibicho · 2024-06-28T13:32:53 1719581573

Lol I saw your tweet

p1esk · 2024-06-30T18:04:56 1719770696

They use Brainfuck language for the experiments. Seriously.

Zondartul · 2024-06-30T19:07:11 1719774431

It's convienient because any random collection letters is a "valid" brainfuck program.

dekhn · 2024-06-30T23:07:26 1719788846

Interesting! That's analogous to some recent work in chemistry. Historically, there was a string representation for molecules called SMILES, which is fairly terse and, when canonicalized, maps from strings to individual molecules (2D topology). However, not all strings are valid SMILES. Recent work with autoencoders to turn SMILES strings into a vector representation via embedding creates models that often generate invalid SMILES strings (the popular paper about this glosses over this fact). For example, if your training set includes both bromine (represented as Br) and chlorine (represented as Cl) and you generate random vectors, they might decode to contain Bl, which is an as-yet unmade element. This is not desirable (although opinions vary).

As a result, the group that published the earlier work developed a new compact representation known as SELFIES (https://github.com/aspuru-guzik-group/selfies) where it's effectively impossible to generate invalid decodings of strings (every SELFIE string decodes to a valid molecule).

I'm not sure what the terminology for these sorts of features of encodings.

blackbear_ · 2024-06-30T23:52:07 1719791527

At the risk of bringing this analogy too far, there was a recent paper [1] arguing that the ability to generate invalid SMILES is beneficial and allows the model to more accurately represent the target distribution compared to SELFIES.

This could be similar to mutations allow one to explore a wider range of options, although sometimes it can go too far and get a non-functional individual.

[1]: https://www.nature.com/articles/s42256-024-00821-x

dekhn · 2024-07-01T00:20:14 1719793214

Yeah that paper does not seem intuitive to me. I'm probably just bitter because I worked so hard to reproduce the original papers results only to realize that it didn't work very well and the author's basically swept that under the rug. I could have saved a month if they had just shared their code and weights. The element example I give is a perfect example of where it seems very unlikely that generating invalid representations would be useful

blackbear_ · 2024-07-01T00:37:04 1719794224

They do share code and data, the links are at the end [1,2]. The results do seem to more or less reproduce, but I'm also not entirely convinced by the explanation they provide. I was thinking that the difference is attributable to the shenanigans that SELFIES do with valences to make sure that all strings are valid (thus entirely derailing the model after a single mistake rather than getting the string thrown out), but I couldn't figure out how to prove it.

[1] https://doi.org/10.5281/zenodo.8321735 [2] https://doi.org/10.5281/zenodo.10680855

dekhn · 2024-07-01T02:18:02 1719800282

No, I mean the older paper with the VAE for SMILES https://arxiv.org/abs/1610.02415 which we ended up implementing here: https://github.com/maxhodak/keras-molecules

SELFIES is just a pip install away; I think the team learned from their previous papers to be more open about the limits, as well as releasing code and weights for reproducibility.

I don't follow this area closely enough to really have any comments about SELFIES other than "well, at least the problems we identified were addressed in later work". Specifically, my goal was to start with two SMILES strings, encode them to vectors, then sample points along the path between the two vectors, and decode them back to valid molecules. Presumably, SELFIES does this far better (see the examples in the repo).

optimalsolver · 2024-06-30T22:42:56 1719787376

StupidStackLanguage also has this pleasing property:

https://esolangs.org/wiki/StupidStackLanguage

metadat · 2024-07-01T07:16:41 1719818201

Wow, some of those programs are actually interesting, I want to try cat (implemented as simply "jf"), and some of the other ones further down the list.

I almost can't believe this is my first time encountering StupidStackLanguage.

29athrowaway · 2024-07-02T01:13:56 1719882836

The high amount of operators makes it harder to mutate the program iteratively into something that does what you want.

stephen_cagle · 2024-07-07T23:45:38 1720395938

Is every program valid? What if the "end loop" occurs before the "begin loop"?

LegionMammal978 · 2024-07-01T04:43:36 1719809016

As the language is usually presented, it has the syntactic requirement that all square brackets are properly paired. Of course, many interpreters (including the original one IIRC) don't fully validate this, but they will have inconsistent behavior on unpaired brackets. Here, the authors had to further specify that jumping from an unpaired bracket causes the program to halt.

29athrowaway · 2024-07-02T01:09:12 1719882552

I think "]" requires to be preceded by "[".

Depending how to handle "<", and ">" you can have out of bounds errors. And "+" and "-" can lead to overflow/underflow. Unless you wrap or saturate values.

Some programs can also lead to infinite loops.

So there are "invalid" programs for these purposes. What I do is to mark the program as invalid when these problems are present. For infinite loop detection, I give the program a limited amount of iterations and if it doesn't finish when the max limit is reached the program is marked as invalid.