Ever left a restaurant wanting to write a review, but thinking it wasn't worth the trouble to tap out all those words on your phone -- you just want to give the place your n stars and provide a few words of praise or condemnation? If only you could press a button to generate a plausible review. If this project happens, you will.

We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content;" doesn't apply to API users -- otherwise it wouldn't be much of an API).

We'll investigate libraries like spaCy for doing natural language processing (in Python).

We'll dive into the research on Markov Chains for Natural Language Generation

Finally we'll put this functionality on the web, using Flask & SQLAlchemy in front of a postgres database. The idea is to generate a sample review, and allow the user to easily tweak it and copy it to the clipboard, not to automatically post the reviews to Yelp.

Join this project Leave this project

Looking for hackers with the skills:

python nltk webapps flask sqlalchemy

This project is part of:

Hack Week 15

Activity

almost 9 years ago: cschum liked this project.

almost 9 years ago: ericp added keyword "flask" to this project.

almost 9 years ago: ericp added keyword "sqlalchemy" to this project.

almost 9 years ago: ericp added keyword "python" to this project.

almost 9 years ago: ericp added keyword "nltk" to this project.

almost 9 years ago: ericp added keyword "webapps" to this project.

almost 9 years ago: ericp started this project.

almost 9 years ago: dmacvicar liked this project.

almost 9 years ago: ericp originated this project.

Comments

almost 9 years ago by ericp | Reply

Mixed results on the hack.

On one hand, I learned how to use Python's nltk library to do natural language processing on the reviews, to the point that I could lex each word. nltk has parsing abilities as well, but I decided to see how far I could get with just lexing and statistical analysis. Also, the flask and sqlalchemy libraries were straightforward Python analogs of Ruby's sinatra and ActiveRecord/Sequel ORMs, nothing new here. And the Yelp API was straightforward to work with.

The downside was that the Yelp API only exposed the first 10 words of three reviews for each business. If we assume the average business has 100 reviews of about 200 words each, this wasn't going to give me the data I needed. However, each review in the resource returned byhttps://api.yelp.com/v3/businessesBUSINESSID/reviews also contained a URL, and following that URL gave me the full text of 19 reviews.

And apparently following that URL violated Yelp's general terms of service, but not the developers's ToS, and I was cut off after pulling down reviews for 350 restaurants. At least I randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese and American-style restaurants.

The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.

Similar Projects

python

Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil

Join the Gitter channel! https://gitter.im/uyuni-project/hackweek

Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!

Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.

For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.

No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)

The idea is testing Salt (including bootstrapping with bootstrap script) and Salt-ssh clients

To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):

Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
Package management (install, remove, update...)
Patching
Applying any basic salt state (including a formula)
Salt remote commands
Bonus point: Java part for product identification, and monitoring enablement
Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)

If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)

If you don't have knowledge about some of the steps: ask the team
If you still don't know what to do: switch to another distribution and keep testing.

This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)

In progress/done for Hack Week 25

Guide

We started writin a Guide: Adding a new client GNU Linux distribution to Uyuni at https://github.com/uyuni-project/uyuni/wiki/Guide:-Adding-a-new-client-GNU-Linux-distribution-to-Uyuni, to make things easier for everyone, specially those not too familiar wht Uyuni or not technical.

openSUSE Leap 16.0

The distribution will all love!

https://en.opensuse.org/openSUSE:Roadmap#DRAFTScheduleforLeap16.0

Curent Status We started last year, it's complete now for Hack Week 25! :-D

[W] Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file) NOTE: Done, client tools for SLMicro6 are using as those for SLE16.0/openSUSE Leap 16.0 are not available yet
[W] Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
[W] Package management (install, remove, update...). Works, even reboot requirement detection

Improve chore and screen time doc generator script `wochenplaner` by gniebler

Description

I wrote a little Python script to generate PDF docs, which can be used to track daily chore completion and screen time usage for several people, with one page per person/week.

I named this script wochenplaner and have been using it for a few months now.

It needs some improvements and adjustments in how the screen time should be tracked and how chores are displayed.

Goals

Fix chore field separation lines
Change screen time tracking logic from "global" (week-long) to daily subtraction and weekly addition of remainders (more intuitive than current "weekly time budget method)
Add logic to fill in chore fields/lines, ideally with pictures, falling back to text.

Resources

tbd (Gitlab repo)

Improvements to osc (especially with regards to the Git workflow) by mcepl

Description

There is plenty of hacking on osc, where we could spent some fun time. I would like to see a solution for https://github.com/openSUSE/osc/issues/2006 (which is sufficiently non-serious, that it could be part of HackWeek project).

Bring to Cockpit + System Roles capabilities from YAST by miguelpc

Bring to Cockpit + System Roles features from YAST

Cockpit and System Roles have been added to SLES 16 There are several capabilities in YAST that are not yet present in Cockpit and System Roles We will follow the principle of "automate first, UI later" being System Roles the automation component and Cockpit the UI one.

Goals

The idea is to implement service configuration in System Roles and then add an UI to manage these in Cockpit. For some capabilities it will be required to have an specific Cockpit Module as they will interact with a reasource already configured.

Resources

A plan on capabilities missing and suggested implementation is available here: https://docs.google.com/spreadsheets/d/1ZhX-Ip9MKJNeKSYV3bSZG4Qc5giuY7XSV0U61Ecu9lo/edit

Linux System Roles:

https://linux-system-roles.github.io/
https://build.opensuse.org/package/show/openSUSE:Factory/ansible-linux-system-roles Package on sle16 ansible-linux-system-roles

First meeting Hackweek catchup

Monday, December 1 · 11:00 – 12:00
Time zone: Europe/Madrid
Google Meet link: https://meet.google.com/rrc-kqch-hca

Song Search with CLAP by gcolangiuli

Description

Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface

Goals

Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

Music Tagging;
Free text search;
Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

Result

In this MVP we implemented:

Async Song Analysis with Clap model
Free Text Search of the songs
Similar song search based on vector representation
Containerised version with web interface

We also documented what went well and what can be improved in the use of AI.

You can have a look at the result here:

Future implementation can be related to performance improvement and stability of the analysis.

References

CLAP: The main model being researched;
huggingface: Pre-trained models for CLAP;
Free Music Archive: Creative Commons songs that can be used for testing;

flask

Collection and organisation of information about Bulgarian schools by iivanov

Description

To achieve this it will be necessary:

Collect/download raw data from various government and non-governmental organizations
Clean up raw data and organise it in some kind database.
Create tool to make queries easy.
Or perhaps dump all data into AI and ask questions in natural language.

Goals

By selecting particular school information like this will be provided:

School scores on national exams.
School scores from the external evaluations exams.
School town, municipality and region.
Employment rate in a town or municipality.
Average health of the population in the region.

Resources

Some of these are available only in bulgarian.

https://danybon.com/klasazia
https://nvoresults.com/index.html
https://ri.mon.bg/active-institutions
https://www.nsi.bg/nrnm/ekatte/archive

Results

Information about all Bulgarian schools with their scores during recent years cleaned and organised into SQL tables
Information about all Bulgarian villages, cities, municipalities and districts cleaned and organised into SQL tables
Information about all Bulgarian villages and cities census since beginning of this century cleaned and organised into SQL tables.
Information about all Bulgarian municipalities about religion, ethnicity cleaned and organised into SQL tables.
Data successfully loaded to locally running Ollama with help to Vanna.AI
Seems to be usable.

TODO

Add more statistical information about municipalities and ....

Code and data

Github

Looking for hackers with the skills:

This project is part of:

Activity

Comments

almost 9 years ago by ericp | Reply

Similar Projects

python

Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil

In progress/done for Hack Week 25

Guide

openSUSE Leap 16.0

Improve chore and screen time doc generator script `wochenplaner` by gniebler

Description

Goals

Resources

Improvements to osc (especially with regards to the Git workflow) by mcepl

Description

Bring to Cockpit + System Roles capabilities from YAST by miguelpc

Bring to Cockpit + System Roles features from YAST

Goals

Resources

Song Search with CLAP by gcolangiuli

Description

Goals

Result

References

flask

Collection and organisation of information about Bulgarian schools by iivanov

Description

Goals

Resources

Results

TODO

Code and data