My point was that this is not possible, because the trained layers are intrinsically tightly coupled. You can't combine pre-trained sub-networks in an arbitrary manner without retraining. In the standard practice of reusing pretrained networks, you take a pretrained network or part of it and train some layers around it to match what you need, optionally fine-tuning the pretrained layers as well. If you want to use a different pre-trained embedding model, you retrain the rest of the network.
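A minimal sketch of that reuse pattern, assuming PyTorch, with made-up dimensions and a synthetic batch (the encoder is just a stand-in for a real pretrained embedding model): freeze the pretrained part and train only the new head around it.

    import torch
    import torch.nn as nn

    encoder = nn.Linear(300, 256)      # stand-in for a pretrained embedding model
    head = nn.Linear(256, 2)           # new sentiment head trained around it

    for p in encoder.parameters():     # freeze the pretrained part
        p.requires_grad = False

    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(32, 300)           # synthetic batch of input features
    y = torch.randint(0, 2, (32,))     # synthetic sentiment labels

    loss = loss_fn(head(encoder(x)), y)
    loss.backward()
    optimizer.step()                   # only the head's weights move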

In your example, the sentiment layer will work without retraining or fine-tuning only if it is preceded by the exact same language-embed layer it was trained against. You can't swap in another layer there: even a different layer with the exact same dimensions, the exact same structure, the exact same training algorithm and hyperparameters, and the exact same training data, but a different random seed for initialization, won't be a drop-in replacement. It will generate different language embeddings than the previous one, i.e. the meaning of output neuron #42 being 1.0 will be completely unrelated to what your sentiment layer expects in that position, and your sentiment layer will output total nonsense.

A linear transformation that aligns the two embedding spaces often (but not always!) exists, but you'd have to calculate it explicitly somehow, e.g. by training a transformation layer. In the absence of that, if you want to invoke that particular version of the sentiment layer, you have no choice about the preceding layers: you have to invoke exactly the same versions that were used during training.
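A toy NumPy illustration of that point, using a random orthogonal transform as a stand-in for "same model, different seed": both embedding spaces carry the same information, yet a fixed head that reads one misreads the other until you explicitly fit an alignment, here via least squares on paired examples.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 64

    emb_a = rng.normal(size=(n, d))               # embeddings from "model A"
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random change of basis
    emb_b = emb_a @ q                             # "model B": same info, rotated

    w_head = rng.normal(size=(d,))                # head trained against emb_a

    # Fed emb_b directly, the head's output is essentially uncorrelated
    # with what it produced on emb_a:
    print(np.corrcoef(emb_a @ w_head, emb_b @ w_head)[0, 1])

    # Fitting a linear alignment on paired examples restores compatibility:
    align, *_ = np.linalg.lstsq(emb_b, emb_a, rcond=None)
    print(np.corrcoef(emb_a @ w_head, (emb_b @ align) @ w_head)[0, 1])  # ~1.0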

Solving that dependency problem requires strong API contracts about the structure and meaning of the data passed between the layers. It could be done, but that's not how we commonly do it nowadays, and it would be a much larger task than this project. Alternatively, what could be useful: if you want to pipe tweets to sentiment_model_v123, the system could automatically look up in that model's metadata that the text needs to be transformed by transformation_A followed by fasttext_embeddings_french_v32, since there's no reasonable choice anyway.
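That lookup could be as simple as a table keyed by model version. Everything below (the metadata layout, the registry, the toy callables) is hypothetical, just to make the idea concrete: the model records the exact preprocessing chain it was trained with, and the system resolves it instead of letting the caller pick layers.

    MODEL_METADATA = {
        "sentiment_model_v123": {
            "requires": ["transformation_A", "fasttext_embeddings_french_v32"],
        },
    }

    def build_pipeline(model_name, registry):
        """Resolve the preprocessing chain recorded in the model's metadata."""
        steps = MODEL_METADATA[model_name]["requires"] + [model_name]
        return [registry[name] for name in steps]

    # Toy stand-ins for the real components.
    registry = {
        "transformation_A": lambda text: text.lower(),
        "fasttext_embeddings_french_v32": lambda text: [float(len(text))],
        "sentiment_model_v123": lambda vec: "positive" if vec[0] > 5 else "negative",
    }

    result = "Ceci est un tweet"
    for step in build_pipeline("sentiment_model_v123", registry):
        result = step(result)
    print(result)   # the caller never had to choose the embedding layer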




Yes. I understand how neural networks work. In my example, language-embed and sentiment are provided by layer. This allows layer to provide compatible modules. If two incompatible modules are used together, they might produce junk output. That is true for any combination of command-line utilities: if I cat a .jpg, I'm going to have a hard time using that output with sed.



