Extracting Meaning from Millions of Pages

lazyjeff · on Sept 4, 2011

I work for the professor from the article (but not on TextRunner).

We're working on extracting meaning from reviews as well: http://revminer.com/

At the moment, it only has reviews of Seattle places (restaurants, hotels, etc.), but we're moving it to mobile. It's written using node.js and socket.io; I'd be interested in hearing any feedback.

agotterer · on Sept 5, 2011

Is it also open source?

lazyjeff · on Sept 5, 2011

Not yet, but we may open source the code when we publish the paper.

yannis · on Sept 5, 2011

How does this relate to freebase.org? I see some of the JS making Ajax calls to the Freebase API.

andreasvc · on Sept 5, 2011

I'd say you're extracting information, not meaning.

acak · on Sept 4, 2011

From the article: "For example, to find the names of people who are CEOs within millions of documents, you'd first need to train the software with other examples, such as 'Steve Jobs is CEO of Apple, Sheryl Sandberg is CEO of Facebook.'"

Sheryl Sandberg? Deliberate or honest mistake? :-]

antimora · on Sept 4, 2011

Looks like the directory index was left open. http://textrunner.cs.washington.edu/

timr · on Sept 4, 2011

It's open source. Just download it:

http://reverb.cs.washington.edu/

mark_l_watson · on Sept 5, 2011

Awesome: code released under the GPL, with several data sets. Good to see this project (which has been under development for a long time) releasing technology for other people to use.

abhaga · on Sept 5, 2011

Read the Web at CMU is a similar system: http://rtw.ml.cmu.edu/rtw/

DallaRosa · on Sept 5, 2011

Hasn't this been out for, like, a long time?