Implementing Neural Turing Machines (arxiv.org)
117 points by godelmachine on July 26, 2018 | 15 comments



The author states in his GitHub repository (https://github.com/MarkPKCollier/NeuralTuringMachine) that his work is based on (https://github.com/snowkylin/ntm). If that is the case, then I find it kind of strange that he has relicensed the work from LGPLv3 to MIT and removed any reference to the original author in the LICENSE. LGPLv3 requires any derivative work to be licensed the same and to retain mentions of the authors; moreover, it also requires a clear explanation of the modifications that were undertaken.

Also, compared to the open-source implementation (https://github.com/snowkylin/ntm), it seems like his main novel claim is that he looked at different memory initialisation patterns.

Edit:

compare the original: https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py

to the derivative work: https://github.com/MarkPKCollier/NeuralTuringMachine/blob/ma...

From what I can tell, the main innovation is that the derivative work uses a named tuple instead of a dictionary for state keeping, and there is new memory initialisation code. The original author apparently initialised the memory randomly. I also feel like the paper should cite the implementation they are basing their work on. The paper https://arxiv.org/pdf/1807.08518.pdf merely states on page one that other implementations exist and makes no mention of the fact that their implementation is based on one of those. Combined with the fact that they are asking people in the README to cite their paper, this does not feel like a very good idea.


I am the author of this work. I was not aware of the difference in the licenses and just used the GitHub default. I have updated it accordingly; thanks for pointing this out.

A note on the difference between our work and Snowkylin's: the code implementing the operations of the NTM is very similar, and both are similar to other open-source NTM implementations, as the operations of an NTM are defined by the equations of the original NTM paper and have a clear mapping into code.
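For illustration, the content-based addressing step from the original paper maps into code roughly as follows. This is a minimal sketch in TF 1.x style; the function and variable names (content_addressing, memory, key, beta) are illustrative and not taken from either repository:

    import tensorflow as tf

    def content_addressing(memory, key, beta, eps=1e-8):
        # memory: [batch, N, W] rows of external memory
        # key:    [batch, W]    lookup key emitted by the controller
        # beta:   [batch, 1]    positive key strength
        key = tf.expand_dims(key, axis=1)                       # [batch, 1, W]
        dot = tf.reduce_sum(memory * key, axis=2)               # [batch, N]
        norms = tf.norm(memory, axis=2) * tf.norm(key, axis=2)  # [batch, N]
        similarity = dot / (norms + eps)                        # cosine similarity
        return tf.nn.softmax(beta * similarity)                 # addressing weights over rows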

The primary difference is that our implementation works and is stable: the code changes to achieve this are minimal, but it still required substantial experimentation and work to figure out what was causing slow convergence and the gradients becoming NaN (causing training to fail) in other implementations. Thus our primary contribution is not to put an NTM into code but to get that code to train reliably and quickly.


Just to be clear, I went through the two implementations (https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py) and (https://github.com/MarkPKCollier/NeuralTuringMachine/blob/ma...) line by line. The code is not just similar, but largely identical. You've used the exact same variable names and ordering in the code. Some comments have been removed. The notable differences I can spot are:

- lines 53-56 (in your code), which correspond to lines 41-45 in Snowkylin's code

- lines 70 vs. 69, where you've used a built-in function instead of the explicit compositional form of softplus (see the sketch after this list)

- finally, the major change I can see is the various variable initialisation schemes on lines 147-178 in your code, which are handled by lines 157-185 in Snowkylin's code

and some miscellaneous places where you've used a NamedTuple instead of a dictionary. And yes, the choice of initialisation is a valid improvement. As far as I can tell, Snowkylin's implementation is not broken but just might learn more slowly because of a different state initialisation scheme. It does not suffer from the NaN problem, for example.
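The softplus difference mentioned above amounts to roughly the following (a hedged sketch in TF 1.x style, not copied from either repository):

    import tensorflow as tf

    x = tf.random_normal([8])

    # explicit compositional form: correct, but can overflow for large x
    explicit_softplus = tf.log(1.0 + tf.exp(x))

    # built-in equivalent, numerically safer
    builtin_softplus = tf.nn.softplus(x)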

It should also be communicated very clearly in the paper that the code you are releasing is derived from another open-source implementation, with a precise explanation of the changes you have made. Also, you can't change the LICENSE and copyright notice of open-source code to yourself without permission. I believe that even in its current form your GitHub repo is in breach of the LGPLv3. For more details on how to use the LGPL properly, please refer to https://www.gnu.org/licenses/gpl-howto.en.html. Notice that the original repository actually does not carry out all those steps properly.

Namely, it is missing license and copyright notices in every file, and it also doesn't have an explicit copyright notice anywhere.


Indeed, if these are so similar, your repo really ought to be a fork of the prior repo, and you should certainly acknowledge them more in the paper (and make clearer what your contributions are).


You broke the LGPL license by not stating your changes:

https://github.com/snowkylin/ntm/blob/master/LICENSE

Moreover, in the paper you write five times:

"Our implementation"

You also don't acknowledge that you are piggybacking on snowkylin's code. You didn't magically make it "stable" by fixing NaNs or anything like that.

And you want to be cited as follows:

title={Implementing Neural Turing Machines}, author={Collier, Mark and Beel, Joeran},

That's just very bad.

You should:

1. State your changes (orbifold did it for you below). Acknowledge snowkylin. Link to them.

2. Title it "Improved initialization in NTM" or something like that, not "Implementing NTM".


Important and unsurprising sentence from the paper's abstract: "A number of open source implementations of NTMs exist but are unstable during training and/or fail to replicate the reported performance of NTMs"


Why do you deem it unsurprising? Do you think that open-source neural network implementations are subpar?


Anecdotally: I've tried to replicate some recent AI/ML papers and failed. So have some of my acquaintances.


Some have written about a "reproducibility crisis" in machine learning research:

https://petewarden.com/2018/03/19/the-machine-learning-repro...


I would expect that it was just as unstable for DeepMind, but they can automatically run lots of experiments with different hyperparameters, so that is less of an issue for them.


“how the memory contents of a NTM are initialized may be a defining factor in the success of a NTM implementation”

Does it mean that one can expect better explainability from these models in the future?


The result was that constant/zero initialization was the best, which is the most natural choice anyway (for me at least), and definitely also the simplest option. I'm a bit surprised that they put so much emphasis on it. Also, I'm not really sure what to learn from this.
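Concretely, the two initialisation schemes being compared look roughly like this (a minimal sketch in TF 1.x style; the shapes and names are illustrative, not taken from the paper's code):

    import tensorflow as tf

    batch_size, memory_size, word_size = 32, 128, 20

    # random initial memory, roughly what the earlier open-source implementation does
    random_memory = tf.random_normal([batch_size, memory_size, word_size], stddev=0.5)

    # constant (near-zero) initial memory, the scheme reported as best here;
    # a small positive value keeps cosine-similarity addressing well defined at step zero
    constant_memory = tf.fill([batch_size, memory_size, word_size], 1e-6)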


Does this correspond to an intuition that a tabula rasa may be the best way for a learner to start?


It'd be interesting to see a Neural TM that can increase its Kolmogorov complexity.




