Ask HN: Best Text Mining Resources
59 points by big_data on June 13, 2010 | 6 comments
I'm planning a deeper dive into text mining this year and am looking for suggestions on the best resources. A friend suggested Text Mining by Weiss et al.: http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-0-387-95433-2

What would you suggest?




Managing Gigabytes (Witten)

Information Retrieval (Manning)

Text Compression (Bell)

Natural Language Processing (Manning)

Natural Language Understanding (Allen)

Speech and Language Processing (Jurafsky)

The Text Mining Handbook (Sanger)

Statistical Machine Translation (Koehn)

Data-Intensive Text Processing with MapReduce (Lin)

Algorithms on Strings (Gusfield)

Jewels of Stringology (Crochemore)

Regular Expressions (Friedl), also: http://swtch.com/~rsc/regexp/regexp1.html and automata theory (Hopcroft)

Practical Text Mining with Perl (Bilisoly)

Natural Language Processing with Python (Bird)

Computational Linguistics (Hausser)

Syntactic Structures (Chomsky)

Also check out these links: http://measuringmeasures.blogspot.com/2010/01/learning-about...

http://measuringmeasures.com/blog/2010/3/12/learning-about-m...

http://www.cs.technion.ac.il/~gabr/resources/resources.html


If you want to learn more about learning over text, I recommend these lectures: http://videolectures.net/mlas06_pittsburgh/

The first two lectures are a great introduction to the topic. The third is related but not essential.

If you want to dive deeper into more advanced material, I recommend looking at conditional random fields, which are more or less the state of the art in this field right now.

Great tutorial: http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf

Wiki entry: http://en.wikipedia.org/wiki/Conditional_random_field



You could also ask in the Machine Learning subreddit: http://reddit.com/r/machinelearning


- Modern Information Retrieval by Baeza-Yates

- Data Mining Book by Jiawei Han et al

- Managing Gigabytes by Witten et al

- Hypertext Mining book by Chakrabarti


Great stuff here, sure to get me going in the right direction! Thank you all!



