SUSE Hack Week: RankWell: Markov Chain Generation of Yelp Restaurant Reviews

Ever left a restaurant wanting to write a review, but decided it wasn't worth the trouble of tapping out all those words on your phone? You just want to give the place your n stars and add a few words of praise or condemnation. If only you could press a button to generate a plausible review. If this project succeeds, you will be able to.

We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content;" doesn't apply to API users; otherwise it wouldn't be much of an API).

We'll investigate Python libraries such as spaCy for natural language processing.

We'll dive into the research on Markov chains for natural language generation.
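As a reference point for that research, here is a minimal first-order (one-word prefix) Markov chain sketch in plain Python. It assumes reviews have already been split into sentences of word tokens; the `<s>`/`</s>` sentence markers and the function names are hypothetical choices for this illustration, not part of any library.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words seen right after it.

    "<s>" and "</s>" are hypothetical sentinels marking sentence start/end.
    Duplicates are kept so frequent transitions are chosen more often.
    """
    chain = defaultdict(list)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for current, nxt in zip(padded, padded[1:]):
            chain[current].append(nxt)
    return chain


def generate_sentence(chain, rng, max_words=30):
    """Random-walk from "<s>" until "</s>" is drawn or max_words is reached."""
    word, output = "<s>", []
    while len(output) < max_words:
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


corpus = [
    ["my", "wife", "had", "the", "chipotle", "pancakes"],
    ["the", "pork", "was", "bone", "tender"],
]
chain = build_chain(corpus)
print(generate_sentence(chain, random.Random(0)))
```

Because successors are stored as a list with duplicates, `rng.choice` picks each next word in proportion to how often it followed the current word in the corpus. With only two sentences, the walk can already splice them at the shared word "the".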

Finally, we'll put this functionality on the web, using Flask and SQLAlchemy in front of a PostgreSQL database. The idea is to generate a sample review and let the user easily tweak it and copy it to the clipboard, not to post reviews to Yelp automatically.


Looking for hackers with the skills:

python nltk webapps flask sqlalchemy

This project is part of:

Hack Week 15

Activity

over 8 years ago: cschum liked this project.

over 8 years ago: ericp added keyword "flask" to this project.

over 8 years ago: ericp added keyword "sqlalchemy" to this project.

over 8 years ago: ericp added keyword "python" to this project.

over 8 years ago: ericp added keyword "nltk" to this project.

over 8 years ago: ericp added keyword "webapps" to this project.

over 8 years ago: ericp started this project.

over 8 years ago: dmacvicar liked this project.

over 8 years ago: ericp originated this project.

Comments

over 8 years ago by ericp

Mixed results on the hack.

On the plus side, I learned how to use Python's nltk library for natural language processing on the reviews, to the point that I could lex each review into individual words. nltk has parsing capabilities as well, but I decided to see how far I could get with just lexing and statistical analysis. The Flask and SQLAlchemy libraries were straightforward Python analogs of Ruby's Sinatra and the ActiveRecord/Sequel ORMs, so nothing new there. The Yelp API was also easy to work with.
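The lexing-plus-statistics approach can be sketched as below. To keep the example free of downloads, it uses a small regular expression as a stand-in for nltk's tokenizers (such as `nltk.word_tokenize`, which needs tokenizer data downloaded first). The pattern and helper names are assumptions for illustration; this ASCII-only pattern will split text somewhat differently from nltk.

```python
import re
from collections import Counter

# Stand-in tokenizer (an assumption, not nltk's): runs of ASCII letters/digits,
# optionally with one apostrophe suffix ("don't"), or a single punctuation mark.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\w\s]")


def tokenize(text):
    """Split review text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def word_frequencies(reviews):
    """Count lowercased word tokens across reviews, skipping punctuation."""
    counts = Counter()
    for review in reviews:
        counts.update(tok.lower() for tok in tokenize(review) if tok[0].isalnum())
    return counts


reviews = ["The tacos were great.", "The salsa was great!"]
print(word_frequencies(reviews).most_common(2))
```

Keeping punctuation as separate tokens in `tokenize` matters for generation later: a sentence-final "." is a natural end-of-sentence signal for a Markov chain, even though the frequency count ignores it.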

The downside was that the Yelp API exposed only the first 10 words of three reviews for each business. If we assume the average business has about 100 reviews of about 200 words each, that was nowhere near the data I needed. However, each review in the resource returned by https://api.yelp.com/v3/businesses/BUSINESSID/reviews also contained a URL, and following that URL gave me the full text of 19 reviews.
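For reference, a request to that reviews endpoint can be sketched with the standard library. It assumes the API key is sent as a Bearer token and that the JSON body is shaped like `{"reviews": [{"url": ..., "text": ...}, ...]}`; the helper names are hypothetical. The sketch stops at collecting the per-review URLs and does not follow them.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.yelp.com/v3/businesses"


def reviews_request(business_id, api_key):
    """Build (but don't send) a GET request for one business's reviews."""
    url = f"{API_ROOT}/{urllib.parse.quote(business_id, safe='')}/reviews"
    # Assumption: the key is passed as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def review_urls(payload):
    """Collect per-review URLs from a decoded response (assumed shape)."""
    return [review["url"] for review in payload.get("reviews", []) if "url" in review]


def fetch_review_urls(business_id, api_key):
    """Send the request and return the review URLs (needs network and a key)."""
    with urllib.request.urlopen(reviews_request(business_id, api_key)) as resp:
        return review_urls(json.load(resp))
```

Separating request construction and response parsing from the network call keeps both parts testable without hitting the API.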

Apparently, following that URL violated Yelp's general terms of service (though not the developer ToS), and I was cut off after pulling down reviews for 350 restaurants. At least I had randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese, and American-style restaurants.
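Randomizing the selection is what keeps a truncated crawl useful: if the crawl order is shuffled up front, whatever prefix gets fetched before a cutoff is a random sample across cuisines. A minimal sketch of that idea follows, with hypothetical category names and business IDs; it is not necessarily what the hack actually did.

```python
import random


def crawl_order(businesses_by_category, seed=None):
    """Flatten (category, business_id) pairs and shuffle them, so any prefix
    of the crawl is a random sample across all categories."""
    rng = random.Random(seed)
    pool = [(category, biz)
            for category, biz_ids in businesses_by_category.items()
            for biz in biz_ids]
    rng.shuffle(pool)
    return pool


# Hypothetical candidate lists, e.g. from earlier search results.
candidates = {
    "mexican": ["taqueria-1", "taqueria-2"],
    "chinese": ["dim-sum-1"],
    "japanese": ["ramen-1", "sushi-1"],
}
for category, biz in crawl_order(candidates, seed=42)[:3]:
    print(category, biz)
```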

The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.
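One way to mix single-word and double-word prefixes is to keep successor tables for both and, at each step, use the two-word prefix with some probability when it has been seen, otherwise backing off to the one-word prefix. The sketch below assumes that scheme and a hypothetical `bigram_weight` parameter; the comment above doesn't say exactly how the mix was done. Higher weights reproduce training sentences more faithfully (with little data, often verbatim); lower weights give more novel but less grammatical output, like "I travel the bone tender."

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary sentinels


def build_mixed_tables(sentences):
    """Record successors for both one-word and two-word prefixes."""
    uni = defaultdict(list)  # (w1,) -> words seen after w1
    bi = defaultdict(list)   # (w1, w2) -> words seen after "w1 w2"
    for tokens in sentences:
        padded = [START, START] + tokens + [END]
        for i in range(2, len(padded)):
            nxt = padded[i]
            uni[(padded[i - 1],)].append(nxt)
            bi[(padded[i - 2], padded[i - 1])].append(nxt)
    return uni, bi


def generate_mixed(uni, bi, rng, bigram_weight=0.5, max_words=30):
    """Walk the chain, preferring the two-word prefix with probability
    bigram_weight (when seen) and otherwise backing off to one word."""
    prev2, prev1 = START, START
    words = []
    while len(words) < max_words:
        key2 = (prev2, prev1)
        if key2 in bi and rng.random() < bigram_weight:
            choices = bi[key2]
        else:
            choices = uni[(prev1,)]
        nxt = rng.choice(choices)
        if nxt == END:
            break
        words.append(nxt)
        prev2, prev1 = prev1, nxt
    return " ".join(words)


corpus = [
    ["my", "wife", "had", "the", "chipotle", "pancakes"],
    ["the", "pork", "was", "bone", "tender"],
]
uni, bi = build_mixed_tables(corpus)
print(generate_mixed(uni, bi, random.Random(7), bigram_weight=0.3))
```

Using `key2 in bi` rather than indexing avoids creating empty entries in the `defaultdict` for two-word prefixes that never occurred in training.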

Similar Projects

This project is one of its kind!
