There is evidence that language is fairly smooth, though. For example, we can extract directions such as a gender vector from a word embedding space learned by a recurrent neural network. That hints at the possibility that words, sentences and concepts live on a smooth, high-dimensional manifold, which is what makes them learnable for us in the first place (because in that case they can be learned by small local improvements, which seems to be required for biological plausibility). That would also explain why we often have many words for the same or similar meanings and, conversely, why formal grammars have failed at modeling language.
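To make the idea concrete, here is a minimal sketch of what "extracting a gender vector" means in practice. The four-dimensional vectors below are made up purely for illustration; real ones would come from a trained embedding model:

```python
# Toy sketch of the "gender vector" idea. The embeddings are invented
# for illustration; in practice they would come from a trained model.
import numpy as np

emb = {
    "man":   np.array([0.8, 0.1, 0.3, 0.0]),
    "woman": np.array([0.8, 0.1, 0.3, 1.0]),
    "king":  np.array([0.9, 0.9, 0.2, 0.0]),
    "queen": np.array([0.9, 0.9, 0.2, 1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# If the space is smooth, a single direction encodes gender:
gender = emb["woman"] - emb["man"]

# Moving "king" along that direction should land near "queen".
shifted = emb["king"] + gender
best = max(emb, key=lambda w: cosine(emb[w], shifted))
print(best)  # -> "queen" in this toy example
```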
Arguing from the other direction, neural networks have already proven able to handle very sharp features. For example, the value and policy networks in AlphaGo pick up on subtle changes in the game position. In Go, the placement of a single stone can change the position drastically, and this is by no means handled only by the Monte Carlo tree search: without MCTS, AlphaGo still wins ~80% of its games against the best hand-crafted Go program. The value and policy networks have essentially evolved a bit of boolean logic, purely from gradients on the smooth loss surface that results from averaging over a lot of training data.
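The XOR gate is the textbook miniature of this effect: the target function is sharply boolean, yet a tiny network can carve it out of a smooth loss surface by plain gradient descent. A rough numpy sketch (the architecture and all hyperparameters are arbitrary choices, not anything from AlphaGo):

```python
# Sketch: gradient descent on a smooth loss recovering a sharp boolean
# function (XOR). Hyperparameters here are arbitrary illustrative picks.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR truth table

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: smooth everywhere, so gradients exist everywhere.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass for mean squared error.
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * (1 - h**2)
    W2 -= 0.5 * (h.T @ dp); b2 -= 0.5 * dp.sum(0)
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(0)

print(np.round(p.ravel()))  # typically [0. 1. 1. 0.]: a learned boolean gate
```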
I have a pet theory that the discovery of sharp features and boolean programs might rely heavily on noise. If the error surface becomes too discrete, we basically have to fall back to pure random optimization (i.e. trying a random direction and keeping the step if it is better). That lets us skip down the energy surface even in the absence of a gradient. Of course, such noise can also lead to forgetting, but it seems that elsewhere the gradient will be non-zero again, so any mistakes will be corrected by further learning (or the step simply leads to further improvement if it went in the right direction). Our episodic memory surely helps in the absence of gradient information as well. If we encounter a complex, previously unknown Go strategy, for example, it will likely not smoothly improve all of our Go-playing abilities by a small amount. Instead, we store a discrete chain of states and actions as an episodic memory, which allows us to reuse that knowledge simply by recalling it later.
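As a sketch of what I mean, here is pure random hill climbing on a piecewise-constant loss whose gradient is zero almost everywhere; the objective and constants are just illustrative stand-ins:

```python
# Sketch of the "noise" idea: random hill climbing needs no gradient.
# The step-counting objective is a stand-in for a discrete error
# surface; names and constants are illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def discrete_loss(x):
    # Piecewise-constant: the gradient is zero almost everywhere.
    return np.sum(np.floor(np.abs(x)))

x = rng.uniform(-10, 10, size=5)
best = discrete_loss(x)

for _ in range(20000):
    candidate = x + rng.normal(scale=0.5, size=x.shape)  # random direction
    loss = discrete_loss(candidate)
    if loss <= best:          # keep the step only if it is no worse
        x, best = candidate, loss

print(best)  # typically reaches 0 despite the gradient being useless
```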
> I have a pet theory that the discovery of sharp features and boolean programs might rely heavily on noise. If the error surface becomes too discrete, we basically have to fall back to pure random optimization (i.e. trying a random direction and keeping the step if it is better). That lets us skip down the energy surface even in the absence of a gradient.
It's called random optimization or random search, depending on whether you sample the random direction from a normal or a uniform distribution. Monte Carlo typically refers to any algorithm that computes approximate solutions using random numbers (as opposed to Las Vegas algorithms, which use random numbers but always compute the correct solution). So yes, RO, RS and stochastic gradient descent are all Monte Carlo local optimization algorithms.
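In code, the two differ only in the line that draws the direction; everything else is the same accept/reject loop. A rough sketch (`f`, the step size and the sampling ranges are placeholders):

```python
# The only difference between the two, as described above, is how the
# random direction is drawn; f and the constants are placeholders.
import numpy as np

rng = np.random.default_rng(2)

def step(x, f, kind="ro"):
    if kind == "ro":                          # random optimization
        d = rng.normal(size=x.shape)          # Gaussian direction
    else:                                     # random search
        d = rng.uniform(-1, 1, size=x.shape)  # uniform direction
    candidate = x + 0.1 * d
    return candidate if f(candidate) < f(x) else x

f = lambda x: np.sum(x**2)
x = np.ones(3)
for _ in range(1000):
    x = step(x, f, kind="rs")
```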
The very method of using a word embedding space assumes the manifold is smooth, so the fact that vectors extracted by a method that assumes a smooth manifold do in fact lie on a smooth manifold is circular, and not evidence of anything.