Text Understanding from Scratch Using Temporal Convolutional Networks (arxiv.org)
61 points by drewvolpe on Feb 7, 2015 | 9 comments



Astonishing? 3% better than bag of words after n days of training on GPUs? I must have misunderstood, because I am not astonished.


It works at the lower, character level rather than the word level, which is unique. The convolutional net also doesn't need to be told what each word's role is; it learns those features itself.
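
For anyone trying to picture the character-level part, here is a rough numpy sketch of the idea: one-hot "quantization" of characters fed into a single temporal convolution with max-pooling. The alphabet, sequence length, filter width, and filter count below are made-up placeholders rather than the paper's configuration, and the weights are random instead of learned, so treat it as an illustration only.

  import numpy as np

  # Toy alphabet; the paper uses a larger fixed character set.
  alphabet = "abcdefghijklmnopqrstuvwxyz "
  char_to_idx = {c: i for i, c in enumerate(alphabet)}

  def quantize(text, length=64):
      """One-hot encode a string into a (length, |alphabet|) matrix."""
      x = np.zeros((length, len(alphabet)))
      for t, c in enumerate(text.lower()[:length]):
          if c in char_to_idx:
              x[t, char_to_idx[c]] = 1.0
      return x

  def temporal_conv(x, weights, bias):
      """1D convolution over time; weights: (k, |alphabet|, n_filters)."""
      k, _, n_filters = weights.shape
      steps = x.shape[0] - k + 1
      out = np.zeros((steps, n_filters))
      for t in range(steps):
          window = x[t:t + k]                              # (k, |alphabet|)
          out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
      return np.maximum(out, 0.0)                          # ReLU

  def max_pool(x, pool=3):
      """Non-overlapping temporal max-pooling."""
      steps = x.shape[0] // pool
      return x[:steps * pool].reshape(steps, pool, -1).max(axis=1)

  rng = np.random.default_rng(0)
  W = rng.normal(scale=0.1, size=(7, len(alphabet), 16))   # 16 filters of width 7
  b = np.zeros(16)

  features = max_pool(temporal_conv(quantize("the quick brown fox"), W, b))
  print(features.shape)   # pooled feature map; in the real model this is learned end to end

The point is that nothing in this pipeline is told about words or their roles; stacking several such conv/pool layers and a classifier on top is the whole model.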


> It works at the lower, character level rather than the word level, which is unique.

No, it is not unique. We have, among other things, seen character-level language models (Sutskever et al. 2011) [1] and character-level part-of-speech tagging (Santos et al. 2014) [2]. What is unique is the convolutional aspect.

I am still far from convinced. The baselines are really weak sauce. Sure, these are new datasets and they want to use the same baseline for all tasks, but a bag-of-words model is pretty much the weakest baseline there is for Natural Language Processing tasks (a rough sketch of such a baseline is below). Also, using the 5,000 most frequent words will hurt the BoW model for plenty of tasks, since it will cover mostly function words rather than rare nouns due to the Zipfian nature of language. It is pretty much common knowledge that these rare nouns can be far more useful than function words for tasks such as topic classification.

[1]: http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf

[2]: http://jmlr.org/proceedings/papers/v32/santos14.pdf
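
For concreteness, the kind of baseline being criticized is roughly the following scikit-learn sketch. The toy corpus, the lack of tf-idf or n-grams, and the exact preprocessing are assumptions on my part; only the 5,000-word cap mirrors the paper's setup.

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.linear_model import LogisticRegression

  # Toy corpus standing in for a real topic-classification dataset.
  texts = ["the cat sat on the mat", "stocks fell sharply on tuesday",
           "the dog chased the cat", "the fed raised interest rates"]
  labels = [0, 1, 0, 1]

  # Bag of words capped at the N most frequent words (N = 5,000 in the paper);
  # on a real corpus this cap mostly keeps frequent function words.
  vectorizer = CountVectorizer(max_features=5000)
  X = vectorizer.fit_transform(texts)

  clf = LogisticRegression().fit(X, labels)
  print(clf.predict(vectorizer.transform(["the cat and the dog"])))

A stronger baseline would at least use tf-idf weighting, n-grams, and an uncapped (or much larger) vocabulary.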


Not an expert here, but could the fact that character-based techniques work at all indicate that linguistics-inspired ML may be superfluous? The authors argued that biological considerations should point toward phoneme-based training. Since Chinese romanization corresponds tightly to phonemes (there are no irregular pronunciations as in English), the approach worked well with Chinese pinyin even though the native Chinese writing system is totally different, with thousands of characters.

What is interesting to me is that if ConvNets work well for both language and visual processing, that may well be because the human circuitry for processing both is very similar, while formalized grammar sits at a different level (like logic) above speech, as opposed to the linguistic view of a universal grammar undergirding speech.


What surprises me is that a BoW model + logistic regression works just fine on most of the benchmarks (except for Amazon Reviews). Interesting paper anyway. Could it be that a lot of information is lost because the BoW vocabulary is limited to 5,000 words? (A quick way to check is sketched below.)
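
One quick sanity check (a hedged sketch; the corpus and whitespace tokenization are placeholders for whatever the benchmark actually uses) is to measure what fraction of token occurrences and of distinct word types a 5,000-word cap keeps:

  from collections import Counter

  def coverage(tokenized_docs, cap=5000):
      """Fraction of token occurrences and of distinct word types kept
      by a vocabulary limited to the `cap` most frequent words."""
      counts = Counter(tok for doc in tokenized_docs for tok in doc)
      kept = counts.most_common(cap)
      token_coverage = sum(c for _, c in kept) / sum(counts.values())
      type_coverage = min(cap, len(counts)) / len(counts)
      return token_coverage, type_coverage

  # Usage (toy input; a real check would use the benchmark corpus):
  docs = [doc.lower().split() for doc in ["the cat sat", "the dog ran far away"]]
  print(coverage(docs, cap=5))

On real corpora the token coverage is usually very high while the type coverage is low, which is exactly the Zipfian point made above: you keep most occurrences but drop the rare, informative words.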


Fascinating paper, but the benchmarks seem incredibly weak. 5,000 features for a bag-of-words model is nothing; these models normally have tens or hundreds of thousands of features.


True, the comparison models would do a lot better with more features.

This paper looks to show the major winning aspect of using ConvNets: they do not need many hand-engineered features, since the deep net learns its own representations of the training data. It's more about showing that ConvNets work on more than just vision.

But architecting the pooling layers IS adding complexity on top of the simple input feature set. Therefore the comparison should be against state-of-the-art ML only.


Open questions:

* Compare to RNNs with character-level input.

* Compare to dedicated methods of sentiment analysis and topic categorization.


This paper is pretty fascinating, thanks! Having trouble visualizing it, but getting there...





