Ever left a restaurant wanting to write a review, but thinking it wasn't worth the trouble to tap out all those words on your phone -- you just want to give the place your n stars and provide a few words of praise or condemnation? If only you could press a button to generate a plausible review. If this project happens, you will.
We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content;" doesn't apply to API users -- otherwise it wouldn't be much of an API).
We'll investigate libraries like spaCy for doing natural language processing (in Python).
We'll dive into the research on Markov chains for natural language generation.
Finally, we'll put this functionality on the web, using Flask & SQLAlchemy in front of a PostgreSQL database. The idea is to generate a sample review and let the user easily tweak it and copy it to the clipboard, not to automatically post reviews to Yelp.
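The Markov chain idea in this plan can be sketched as a word-level generator. Each run of `order` consecutive words (the prefix) maps to the words observed right after it, and generation repeatedly samples a successor. This is a minimal sketch under stated assumptions: reviews arrive as plain strings, a simple regex tokenizer stands in for spaCy or nltk, and `tokenize`, `build_chain`, `generate` and the `<s>`/`</s>` markers are hypothetical names.

```python
import random
import re
from collections import defaultdict

# Regex tokenizer standing in for spaCy/nltk (assumption): runs of letters or
# apostrophes are words, and . ! ? , are separate punctuation tokens.
TOKEN_RE = re.compile(r"[A-Za-z']+|[.!?,]")

START, END = "<s>", "</s>"  # hypothetical padding markers


def tokenize(text):
    return TOKEN_RE.findall(text)


def build_chain(texts, order=2):
    """Map each `order`-word prefix to every word observed right after it."""
    chain = defaultdict(list)
    for text in texts:
        # Pad with START so the first real words also have a full prefix.
        tokens = [START] * order + tokenize(text) + [END]
        for i in range(len(tokens) - order):
            prefix = tuple(tokens[i:i + order])
            chain[prefix].append(tokens[i + order])
    return chain


def generate(chain, order=2, max_words=30, rng=random):
    """Walk the chain from the start prefix until END or max_words."""
    prefix = (START,) * order
    words = []
    for _ in range(max_words):
        successors = chain.get(prefix)
        if not successors:
            break
        # Duplicates in the list make choice() frequency-weighted.
        word = rng.choice(successors)
        if word == END:
            break
        words.append(word)
        prefix = prefix[1:] + (word,)
    return " ".join(words)


if __name__ == "__main__":
    reviews = [
        "The tacos were great and the salsa was fresh.",
        "The salsa was bland but the tacos were huge.",
    ]
    print(generate(build_chain(reviews), rng=random.Random(42)))
```

With only a handful of reviews most prefixes have a single successor, so the output mostly replays training text; more data is what makes the recombinations interesting.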
This project is part of: Hack Week 15
Comments
almost 8 years ago by ericp
Mixed results on the hack.
On one hand, I learned how to use Python's nltk library to do natural language processing on the reviews, to the point that I could lex each word. nltk has parsing abilities as well, but I decided to see how far I could get with just lexing and statistical analysis. Also, the flask and sqlalchemy libraries were straightforward Python analogs of Ruby's sinatra framework and the ActiveRecord/Sequel ORMs, nothing new here. And the Yelp API was straightforward to work with.
The downside was that the Yelp API only exposed the first 10 words of three reviews for each business. If we assume the average business has 100 reviews of about 200 words each, this wasn't going to give me the data I needed. However, each review in the resource returned by https://api.yelp.com/v3/businesses/BUSINESSID/reviews also contained a URL, and following that URL gave me the full text of 19 reviews.
And apparently following that URL violated Yelp's general terms of service, but not the developer ToS, and I was cut off after pulling down reviews for 350 restaurants. At least I randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese and American-style restaurants.
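The reviews endpoint mentioned above can be called with nothing but the standard library. The sketch below only builds the request and parses a response body; it assumes the Yelp Fusion API's bearer-token authentication and a response with a top-level `reviews` list whose entries carry `rating`, `text` and `url` fields. Treat those field names, and the helper names `reviews_request` and `extract_reviews`, as assumptions rather than a verified client.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.yelp.com/v3"


def reviews_request(business_id, api_key):
    """Build (without sending) a GET request for one business's reviews."""
    # Quote the id or alias so unusual characters stay URL-safe.
    path = "/businesses/" + urllib.parse.quote(business_id) + "/reviews"
    return urllib.request.Request(
        API_ROOT + path,
        headers={"Authorization": "Bearer " + api_key},  # API key auth
    )


def extract_reviews(payload):
    """Return (rating, excerpt, url) tuples from a JSON reviews response."""
    data = json.loads(payload)
    return [
        (review.get("rating"), review.get("text", ""), review.get("url"))
        for review in data.get("reviews", [])
    ]
```

Sending it would be `urllib.request.urlopen(reviews_request(business_id, api_key))` with a real key; the `text` field carries only the short excerpt, which is the data limitation described in this comment.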
The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.
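One plausible reading of "a mix of single-word and double-word prefixes" is a simple backoff: prefer the two-word prefix when it has been seen, and occasionally (or when it is unseen) fall back to the one-word prefix, trading coherence for variety on a small corpus. The sketch below assumes already-tokenized reviews; `build_tables`, `generate_mixed` and the `p_bigram` mixing probability are hypothetical, not the generator used in the hack.

```python
import random
from collections import defaultdict


def build_tables(token_lists):
    """Successor lists keyed by one-word and by two-word prefixes."""
    uni, bi = defaultdict(list), defaultdict(list)
    for tokens in token_lists:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            uni[padded[i - 1]].append(padded[i])
            bi[(padded[i - 2], padded[i - 1])].append(padded[i])
    return uni, bi


def generate_mixed(uni, bi, p_bigram=0.7, max_words=30, rng=random):
    """Sample words, mixing two-word and one-word prefixes."""
    prev2, prev1 = "<s>", "<s>"
    words = []
    for _ in range(max_words):
        bi_options = bi.get((prev2, prev1))
        # Use the two-word prefix with probability p_bigram when it is known;
        # otherwise back off to the one-word prefix.
        if bi_options and rng.random() < p_bigram:
            options = bi_options
        else:
            options = uni.get(prev1) or bi_options
        if not options:
            break
        word = rng.choice(options)
        if word == "</s>":
            break
        words.append(word)
        prev2, prev1 = prev1, word
    return words
```

Every emitted word has followed the previous word somewhere in the corpus, so local word pairs stay plausible even when the whole sentence drifts, which matches the "grammatically incorrect, or made no sense" results above.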
Similar Projects
Symbol Relations by hli
Description
There are tools that build function call graphs by parsing source code, for example, cscope.
This project aims to achieve a similar goal by directly parsing the disassembly (i.e. objdump output) of a compiled binary. The assembly code is what the CPU sees, and is therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
A detailed description and demos can be found in the project's README file.
Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.
Tested with python3.6
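As an illustration of the idea (not the symrellib.py implementation), the sketch below builds a caller-to-callee map from `objdump -d` text. It assumes x86-64 GNU objdump output in AT&T syntax, where each function starts with a header such as `0000000000001149 <main>:` and direct calls look like `call   1139 <foo>` (`callq` in older binutils). Indirect calls through registers are ignored, and `parse_calls` and `callers_of` are hypothetical names.

```python
import re
from collections import defaultdict

# Function header line, e.g. "0000000000001149 <main>:"
FUNC_RE = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# Direct call, e.g. "    1158:\te8 dc ff ff ff\tcall   1139 <foo>".
# Older binutils print "callq"; a "+0x.." offset in the target is dropped.
CALL_RE = re.compile(r"\scallq?\s+[0-9a-f]+ <([^>+]+)(?:\+0x[0-9a-f]+)?>")


def parse_calls(objdump_text):
    """Return {function: set of directly called symbols}."""
    callees = defaultdict(set)
    current = None
    for line in objdump_text.splitlines():
        header = FUNC_RE.match(line)
        if header:
            current = header.group(1)
            continue
        call = CALL_RE.search(line)
        if call and current is not None:
            callees[current].add(call.group(1))
    return callees


def callers_of(callees, symbol):
    """Reverse lookup: which functions call `symbol` directly."""
    return sorted(fn for fn, targets in callees.items() if symbol in targets)
```

The input would typically be the captured stdout of `objdump -d` on the binary; keeping the forward map and deriving callers on demand keeps the parser to a single pass.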
Goals
Any comments are welcome.
Resources
https://github.com/lhb-cafe/SymbolRelations
symrellib.py: implements the symbol relation graph and the disassembly parser
symrel_tracer*.py: implements tracing (-t option)
symrel.py: "cli parser"
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a fully automated, one-command, GPU-accelerated Kubernetes setup: we created an OpenTofu-based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and the gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
./deploy.sh and voilà: Kubernetes running PyTAG (k9s, above) with GPU acceleration (nvtop, below)
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement.
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
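For readers unfamiliar with MCTS, here is a generic UCT sketch on a toy game, not one of the games above and not the TAG/PyTAG API. It assumes a one-pile Nim variant (remove 1 to 3 sticks, whoever takes the last stick wins) and shows the four MCTS phases: selection with the UCB1 formula, expansion, a random-playout simulation (the same random action selection used as a baseline above), and backpropagation. The exploration constant `c=1.4` and all names are illustrative.

```python
import math
import random


def legal_moves(sticks):
    """Toy Nim variant (assumption): remove 1-3 sticks, last stick wins."""
    return [m for m in (1, 2, 3) if m <= sticks]


class Node:
    def __init__(self, sticks, parent=None, move=None):
        self.sticks = sticks        # sticks left for the player to move here
        self.parent = parent
        self.move = move            # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(sticks)
        self.visits = 0
        self.wins = 0.0             # wins for the player who made `move`

    def uct_child(self, c=1.4):
        # UCB1: average reward plus an exploration bonus for rarely tried moves.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(sticks, rng):
    """Random playout; True if the player to move at `sticks` wins."""
    player = 0
    while sticks > 0:
        sticks -= rng.choice(legal_moves(sticks))
        if sticks == 0:
            return player == 0
        player ^= 1
    return False  # no sticks left: the previous player already won


def mcts_best_move(sticks, iterations=2000, rng=None):
    rng = rng or random.Random()
    root = Node(sticks)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.sticks - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: the player who moved into `node` wins iff the
        #    player to move there loses the random playout.
        mover_won = not rollout(node.sticks, rng)
        # 4. Backpropagation, flipping perspective at every level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    # Robust child: return the most visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this toy game, leaving the opponent a multiple of 4 sticks is a forced win, so from a pile of 5 the search should settle on taking 1 given enough iterations. Pitting `mcts_best_move` against a pure `rollout`-style random player over many games gives the kind of win-rate comparison described in the analytics bullet.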
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach (two developers and GitHub Copilot) was a standout success, especially in handling slightly repetitive but not-quite-copy-paste tasks. Java tends to be verbose, and we found the approach fit it particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games
ClusterOps - Easily install and manage your personal kubernetes cluster by andreabenini
Description
ClusterOps is a Kubernetes installer and operator designed to streamline the initial configuration and ongoing maintenance of Kubernetes clusters. The focus of this project is primarily on personal or local installations. However, the goal is to expand its use to encompass all installations of Kubernetes for local development purposes.
It simplifies cluster management by automating tasks and providing just one user-friendly YAML-based configuration file, config.yml.
Overview
- Simplified Configuration: Define your desired cluster state in a simple YAML file, and ClusterOps will handle the rest.
- Automated Setup: Automates initial cluster configuration, including network settings, storage provisioning, special requirements (for example GPUs) and essential components installation.
- Ongoing Maintenance: Performs routine maintenance tasks such as upgrades, security updates, and resource monitoring.
- Extensibility: Easily extend functionality with custom plugins and configurations.
- Self-Healing: Detects and recovers from common cluster issues, ensuring stability, idempotence and reliability. The same operation can be performed multiple times without changing the result.
- Discreet: It works only on what it knows. If you are manually configuring parts of your Kubernetes cluster and that configuration does not interfere with ClusterOps, you can happily keep working on those parts and use this tool only for what is needed.
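The idempotence and "discreet" properties above can be illustrated with a small reconcile loop: compare desired and actual state, touch only the components the tool manages, and confirm that a second pass plans nothing. This is a hypothetical Python sketch, not ClusterOps code; it represents state as plain dicts of component name to version, whereas a real operator would read and write through the Kubernetes API.

```python
def plan(desired, actual, managed):
    """List the actions that move `actual` toward `desired`.

    Only names in `managed` are considered, so anything configured by hand
    outside the tool's knowledge is left untouched.
    """
    actions = []
    for name in sorted(managed):
        want, have = desired.get(name), actual.get(name)
        if want == have:
            continue  # already in the desired state: nothing to do
        if want is None:
            actions.append(("remove", name, None))
        elif have is None:
            actions.append(("install", name, want))
        else:
            actions.append(("upgrade", name, want))
    return actions


def apply_actions(actions, actual):
    """Return a new state with the actions applied (stand-in for API calls)."""
    state = dict(actual)
    for verb, name, value in actions:
        if verb == "remove":
            state.pop(name, None)
        else:
            state[name] = value
    return state


if __name__ == "__main__":
    desired = {"dashboard": "v2", "kubevirt": "v1"}
    actual = {"dashboard": "v1", "ingress": "v1", "my-manual-app": "x"}
    managed = {"dashboard", "kubevirt", "ingress"}
    steps = plan(desired, actual, managed)
    print(steps)
    after = apply_actions(steps, actual)
    print(plan(desired, after, managed))  # second pass plans nothing: []
```

Separating planning from applying is what makes repeated runs safe: once the managed components match, `plan` returns an empty list, and `my-manual-app` is never considered at all.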
Features
- Distribution and engine independence. Install your favorite Kubernetes engine with your package manager, execute one script and you'll have a complete working environment at your disposal.
- Basic config approach. One single config.yml file with configuration requirements (add/remove features): human readable, plain and simple. All fancy configs are managed automatically (ingress, balancers, services, proxy, ...).
- Local built-in ContainerHub. The default installation provides a fully configured ContainerHub available locally along with the Kubernetes installation. This configuration allows the user to build, upload and deploy custom container images as if they were provided from external sources. Public Internet sources are still available, but local development can be kept on this localhost server. The built-in ClusterOps operator is fetched from this ContainerHub registry too.
- Kubernetes official dashboard installed as a plugin, others planned too (k9s for example).
- KubeVirt plugin installed and properly configured. Unleash the power of classic virtualization (KVM+QEMU) on top of Kubernetes and manage your entire system from there; libvirtd and the virsh libraries are required.
- One operator to rule them all. The installation script configures your machine automatically during installation and adds one kubernetes operator to manage your local cluster. From there the operator takes care of the cluster on your behalf.
- Clean installation and removal. Just test it; when you are done, use the same program to uninstall everything without leaving configs (or pods) behind.
Planned features (Wishlist / TODOs)
- Containerized Data Importer (CDI). Persistent storage management add-on for Kubernetes to provide a declarative way of building and importing Virtual Machine Disks on PVCs for
Ansible for add-on management by lmanfredi
Description
Machines can contain various combinations of add-ons and are often modified over time.
The list of repos can change, so I would like to create an automation able to reset the status to a given state, based on metadata available for these machines.
Goals
Create an Ansible automation that takes care of add-on (repo list) configuration, using the machine metadata as the reference.
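The core of resetting a machine to a known state is a diff between the repositories its metadata calls for and the ones currently configured. The sketch below assumes a hypothetical catalog that maps add-on names to a repository alias and URL; the resulting changes could then be applied by Ansible tasks using `community.general.zypper_repository` with `state: present` or `state: absent`. All names and URLs here are placeholders.

```python
# Hypothetical catalog: add-on name -> (repository alias, repository URL).
CATALOG = {
    "base": ("repo-oss", "http://download.example.com/oss"),
    "sdk": ("repo-sdk", "http://download.example.com/sdk"),
    "debug": ("repo-debug", "http://download.example.com/debug"),
}


def desired_repos(addons, catalog=CATALOG):
    """Translate the add-ons listed in a machine's metadata into alias -> URL."""
    return {catalog[a][0]: catalog[a][1] for a in addons}


def repo_changes(desired, current):
    """Return (to_add, to_remove, to_update) needed to make current == desired."""
    to_add = {alias: url for alias, url in desired.items() if alias not in current}
    to_remove = sorted(alias for alias in current if alias not in desired)
    # Same alias but a different URL: the repo must be re-pointed.
    to_update = {alias: url for alias, url in desired.items()
                 if alias in current and current[alias] != url}
    return to_add, to_remove, to_update
```

When the machine already matches its metadata all three results are empty, so re-running the automation is a no-op.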
Resources
- Machines
- Repositories
- Developing modules
- Basic VM Guest management
- Module zypper_repository_list
- ansible-collections community.general
Results
Created WIP project Ansible-add-on-openSUSE
Team Hedgehogs' Data Observability Dashboard by gsamardzhiev
Description
This project aims to develop a comprehensive Data Observability Dashboard that provides insights into key aspects of data quality and reliability. The dashboard will track:
Data Freshness: Monitor when data was last updated and flag potential delays.
Data Volume: Track table row counts to detect unexpected surges or drops in data.
Data Distribution: Analyze data for null values, outliers, and anomalies to ensure accuracy.
Data Schema: Track schema changes over time to prevent breaking changes.
The dashboard will also keep a historical record of these metrics, to enable proactive data management and strengthen trust in data across the data function.
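As a rough illustration of the four checks, the Python sketch below computes freshness, a volume z-score, a null rate and a schema diff from values already pulled out of the warehouse. The thresholds (for example a z-score of 3) and function names are assumptions; in the stack listed under Resources these checks would more likely run as SQL against Redshift, scheduled by Airflow, with results written to the metadata tables.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev


def is_stale(last_updated, max_age, now=None):
    """Freshness: True if the table was last updated longer than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_age


def volume_anomaly(history, today, z_threshold=3.0):
    """Volume: flag today's row count if it is an outlier versus past counts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is notable
    return abs(today - mu) / sigma > z_threshold


def null_rate(values):
    """Distribution: share of missing values in a column sample."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def schema_diff(old_columns, new_columns):
    """Schema: columns added or removed between two snapshots."""
    old, new = set(old_columns), set(new_columns)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

Storing each run's outputs with a timestamp is what provides the historical tracking; the z-score is one simple choice for surges and drops, and seasonal tables may need a per-weekday baseline instead.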
Goals
Although the final goal is to create a Power BI dashboard that we can monitor, our goals are to:
1. Create the necessary tables that track the relevant metadata about our current data.
2. Automate the process so it runs on a regular schedule.
Resources
AWS Redshift, AWS Glue, Airflow, Python, SQL
Why Hedgehogs?
Because we like them.
Cobbler Angular Web Interface by SchoolGuy
Project Description
The old Cobbler web interface was built into the server, which led to a huge dependency stack that only a few people needed.
Goal for this Hackweek
The project should aim to finalize the first prototype of the new Angular based web interface.
A secondary goal of this hackweek is to learn a lot of Angular.
Update for Hackweek 24
The GitHub project received some traction since I had some vacation time. As such, my aim is to get a first alpha released to close milestone 0.0.1 (or whatever version I can release with semantic release).
Resources
WebUI for your data by avicenzi
A single place to view every bit of data you have.
Problem
You have too much data and you are a data hoarder.
- Family photos and videos.
- Lots of eBooks, TV shows, movies, and more.
- Boxes full of papers (taxes, invoices, IDs, certificates, exams, and more).
- Bank account statements (multiple currencies, countries, and people).
Maybe you have some data on S3, some on your NAS, and some on your local PC.
- How do you get it all together?
- How do you link a bank transaction to a product invoice?
- How do you tag any object type and create a collection out of it (mixing videos, photos, PDFs, transactions)?
- How do you store all this? A file/folder structure does not work, because everything is linked together.
Project Description
The idea is a place where you can throw all your data: photos, videos, documents, binaries, and more.
Create photo albums and document collections, add tags across multiple file formats, link content, and more.
The UI should be easy to use; where the data lives is not important for now (it could all be on S3 or on a local drive).
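One way to model "everything is linked together" is to store items as nodes that carry tags and links instead of living in a folder tree. The sketch below is a hypothetical in-memory model: `Item`, `Library` and the `location` string (standing in for an S3 key, a NAS path or a local file) are illustrative names, and a real backend would persist them in a database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    id: str
    kind: str                  # "photo", "video", "pdf", "transaction", ...
    location: str              # e.g. an S3 key, a NAS path or a local file
    tags: set = field(default_factory=set)


class Library:
    """Hypothetical in-memory store: items, tags and undirected links."""

    def __init__(self):
        self.items = {}
        self.links = defaultdict(set)

    def add(self, item):
        self.items[item.id] = item

    def tag(self, item_id, *tags):
        self.items[item_id].tags.update(tags)

    def link(self, a, b):
        # Links are symmetric: a transaction points to its invoice and back.
        self.links[a].add(b)
        self.links[b].add(a)

    def collection(self, tag):
        """All items with a tag, whatever their kind."""
        return sorted(i.id for i in self.items.values() if tag in i.tags)

    def linked(self, item_id):
        return sorted(self.links.get(item_id, ()))
```

A collection is then just a tag query that can mix photos, PDFs and bank transactions, and linking a transaction to an invoice does not require moving either file.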
Similar proposals
The closest I found so far is https://perkeep.org/, but this is not what I'm looking for.
Goal for this Hackweek
Create a web UI, in Svelte ideally, perhaps React.
It should be able to show photos and videos at least.
Resources
None so far, this is just an idea.