Ever left a restaurant wanting to write a review, but decided it wasn't worth the trouble to tap out all those words on your phone? You just want to give the place your n stars and a few words of praise or condemnation. If only you could press a button to generate a plausible review. If this project happens, you will.
We'll use the Yelp API to grab as many reviews of certain types of restaurants as the terms of service allow (I assume the ToS clause "Use any robot, spider, site search/retrieval application, or other automated device, process or means to access, retrieve, scrape, or index any portion of the Site or any Site Content" doesn't apply to API users -- otherwise it wouldn't be much of an API).
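For a rough idea, pulling the review excerpts the API exposes could look like the sketch below. This assumes the v3 Fusion API; the key and business ID are placeholders, not real values.

```python
# Hedged sketch: assumes the Yelp Fusion (v3) API; the key and
# business ID below are placeholders, not real values.
import requests

API_KEY = "YOUR_YELP_API_KEY"
BUSINESS_ID = "some-business-id"

resp = requests.get(
    f"https://api.yelp.com/v3/businesses/{BUSINESS_ID}/reviews",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
for review in resp.json().get("reviews", []):
    print(review["text"])  # the API returns only an excerpt of each review
```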
We'll investigate libraries like spaCy for doing natural language processing (in Python).
We'll dive into the research on Markov chains for natural language generation.
Finally, we'll put this functionality on the web, using Flask & SQLAlchemy in front of a Postgres database. The idea is to generate a sample review and let the user easily tweak it and copy it to the clipboard, not to automatically post reviews to Yelp.
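A minimal sketch of the web piece, with a placeholder generator standing in for the real Markov model (all names here are illustrative, not from the project):

```python
# Minimal Flask sketch; generate_review() is a hypothetical stand-in
# for the Markov-chain generator described above.
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """
<form method="get" action="/review">
  <button type="submit">Generate a review</button>
</form>
<textarea rows="8" cols="60">{{ text }}</textarea>
"""

def generate_review():
    # Placeholder: the real version would sample from a Markov model
    # trained on reviews stored in Postgres via SQLAlchemy.
    return "Great tacos, friendly staff, will be back."

@app.route("/")
@app.route("/review")
def review():
    return render_template_string(PAGE, text=generate_review())

if __name__ == "__main__":
    app.run(debug=True)
```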
This project is part of Hack Week 15.
Comments
almost 8 years ago by ericp
Mixed results on the hack.
On one hand, I learned how to use Python's `nltk` library to do natural language processing on the reviews, to the point that I could lex each word. `nltk` has parsing abilities as well, but I decided to see how far I could get with just lexing and statistical analysis. Also, the `flask` and `sqlalchemy` libraries were straightforward Python analogs of Ruby's `sinatra` and `ActiveRecord`/`Sequel` ORMs, nothing new here. And the Yelp API was straightforward to work with.

The downside was that the Yelp API only exposed the first 10 words of three reviews for each business. If we assume the average business has 100 reviews of about 200 words each, this wasn't going to give me the data I needed. However, each review in the resource returned by `https://api.yelp.com/v3/businesses/BUSINESS_ID/reviews` also contained a URL, and following that URL gave me the full text of 19 reviews.

Apparently following that URL violated Yelp's general terms of service, though not the developer ToS, and I was cut off after pulling down reviews for 350 restaurants. At least I had randomized my selection procedure, so I ended up with a smattering of Mexican, Chinese, Japanese, and American-style restaurants.
The best generated sentence might have been one of the first: "I travel the bone tender." I also liked "My wife had the chipotle pancakes." But most of the sentences were grammatically incorrect, or made no sense, or both. I did try tweaking the Markov generator to use a mix of single-word and double-word prefixes, but given the lack of data, I ended the hack and went back to work.
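For illustration, here is a rough reconstruction of that kind of generator, mixing single-word and double-word prefixes; this is a sketch, not the hack's actual code, and the toy corpus is made up:

```python
import random
from collections import defaultdict

def build_chains(text):
    """Map both 1-word and 2-word prefixes to lists of follower words."""
    words = text.split()
    chain1 = defaultdict(list)  # single-word prefixes
    chain2 = defaultdict(list)  # double-word prefixes
    for i in range(len(words) - 2):
        chain1[words[i]].append(words[i + 1])
        chain2[(words[i], words[i + 1])].append(words[i + 2])
    return chain1, chain2

def generate(chain1, chain2, length=20):
    w1, w2 = random.choice(list(chain2))
    out = [w1, w2]
    for _ in range(length - 2):
        # Prefer the double-word prefix; fall back to the single-word one
        followers = chain2.get((out[-2], out[-1])) or chain1.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "I travel the world . The pancakes were tender . My wife had the pancakes ."
c1, c2 = build_chains(corpus)
print(generate(c1, c2))
```

With a real corpus of full reviews, the double-word prefixes keep sentences more grammatical while the single-word fallback keeps generation from dead-ending; with too little data, as the comment notes, the output degrades quickly.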
Similar Projects
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a Fully-Automated, One-Command, GPU-accelerated Kubernetes setup: we created an OpenTofu-based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
Run `./deploy.sh` and voilà: Kubernetes running PyTAG (`k9s`, screenshot above) with GPU acceleration (`nvtop`, screenshot below).
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement (a toy sketch of this kind of comparison follows below).
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (Italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
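The snippet below is a toy illustration of that agent comparison: play many games between an MCTS-style agent and a random agent and tally win rates. The interfaces are hypothetical stand-ins, not the actual TAG/PyTAG API.

```python
import random

# Hypothetical stand-ins for the real TAG/PyTAG interfaces.
class RandomAgent:
    def act(self, state, legal_actions):
        return random.choice(legal_actions)

class MCTSAgent:
    """Placeholder: a real MCTS agent would run simulations per move."""
    def act(self, state, legal_actions):
        return max(legal_actions)  # stub heuristic, for illustration only

def play_game(agents):
    """Toy game: players alternately pick numbers; highest total wins."""
    scores = [0, 0]
    for turn in range(10):
        player = turn % 2
        action = agents[player].act(scores, legal_actions=[1, 2, 3])
        scores[player] += action
    return 0 if scores[0] > scores[1] else 1  # ties go to player 1

def compare(agents, n_games=1000):
    wins = [0, 0]
    for _ in range(n_games):
        wins[play_game(agents)] += 1
    return wins

print(compare([MCTSAgent(), RandomAgent()]))
```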
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games
Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil
Join the Gitter channel! https://gitter.im/uyuni-project/hackweek
Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!
Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK), or have not been tested for a long time. It could be interesting to know how hard it would be to work with them and, if possible, fix whatever is broken.
For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if Salt does not support those packages, it will need changes as well). So if you want a distribution with another package format, make sure you are comfortable handling such changes.
No developer experience? No worries! We have had non-developer contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us prepare the documentation after someone else has the initial code ready, or you could help with testing :-)
The idea is to test Salt and salt-ssh clients, but NOT traditional clients, which are deprecated.
To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):
- Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
- Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding the OS to the bootstrap repository creator)
- Package management (install, remove, update...)
- Patching
- Applying any basic salt state (including a formula)
- Salt remote commands (a minimal sketch of points 5 and 6 follows this list)
- Bonus point: Java part for product identification, and monitoring enablement
- Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
- Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
- Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)
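As referenced in the checklist, here is a minimal sketch of what points 5 and 6 could look like using Salt's Python client on the server; the minion ID and the state name are just examples, not real values:

```python
# Sketch using Salt's Python client API (run as root on the Salt master).
# The minion ID and the state name are examples, not real values.
import salt.client

local = salt.client.LocalClient()

# Point 6: a Salt remote command (run 'uptime' on the test minion)
print(local.cmd("fuss-test-minion", "cmd.run", ["uptime"]))

# Point 5: apply a basic state, e.g. /srv/salt/vim/init.sls containing:
#   vim:
#     pkg.installed
print(local.cmd("fuss-test-minion", "state.apply", ["vim"]))
```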
If something is breaking: we can try to fix it, but the main idea is to research how well supported it is right now. Beyond that, it's up to each project member how much to hack :-)
- If you don't have knowledge about some of the steps: ask the team
- If you still don't know what to do: switch to another distribution and keep testing.
This card is for EVERYONE, not just developers. Seriously! We have had people from other teams helping who were not developers, and they added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)
Pending
FUSS
FUSS is a complete GNU/Linux solution (server, client and desktop/standalone) based on Debian for managing an educational network.
https://fuss.bz.it/
Seems to be a Debian 12 derivative, so adding it could be quite easy.
- [W] Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
- [W] Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator) --> Working for all 3 options (salt minion UI, salt minion bootstrap script and salt-ssh minion from the UI).
- [W] Package management (install, remove, update...) --> Installing a new package works, needs to test the rest.
- [I] Patching (if patch information is available, could require writing some code to parse it, but IIRC we have support for Ubuntu already). No patches detected. Do we support patches for Debian at all?
- [W] Applying any basic salt state (including a formula)
- [W] Salt remote commands
- [ ] Bonus point: Java part for product identification, and monitoring enablement
Team Hedgehogs' Data Observability Dashboard by gsamardzhiev
Description
This project aims to develop a comprehensive Data Observability Dashboard that provides insights into key aspects of data quality and reliability. The dashboard will track:
Data Freshness: Monitor when data was last updated and flag potential delays.
Data Volume: Track table row counts to detect unexpected surges or drops in data.
Data Distribution: Analyze data for null values, outliers, and anomalies to ensure accuracy.
Data Schema: Track schema changes over time to prevent breaking changes.
The dashboard aims to support historical tracking, enabling proactive data management and enhancing data trust across the data function.
Goals
Although the final goal is to create a Power BI dashboard that we can monitor, our goals are to: 1. Create the necessary tables that track the relevant metadata about our current data. 2. Automate the process so it runs in a timely manner (see the sketch below).
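As a sketch of these two goals, a scheduled Airflow DAG could snapshot row counts and freshness into a metadata table. This assumes Airflow 2.x with the Postgres provider pointed at Redshift; the connection ID, schema, and table names are made up:

```python
# Minimal Airflow DAG sketch; the connection ID, schema and table names
# are illustrative, not the team's real ones.
from datetime import datetime
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

SNAPSHOT_SQL = """
INSERT INTO observability.table_snapshots
    (table_name, snapshot_ts, row_count, last_updated_at)
SELECT 'sales.orders', GETDATE(), COUNT(*), MAX(updated_at)
FROM sales.orders;
"""

with DAG(
    dag_id="data_observability_snapshots",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PostgresOperator(
        task_id="snapshot_sales_orders",
        postgres_conn_id="redshift_default",
        sql=SNAPSHOT_SQL,
    )
```

Comparing successive snapshots then gives freshness (gaps in `last_updated_at`) and volume anomalies (unexpected jumps in `row_count`) for the dashboard to flag.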
Resources
AWS Redshift; AWS Glue, Airflow, Python, SQL
Why Hedgehogs?
Because we like them.
Make more sense of openQA test results using AI by livdywan
Description
AI has the potential to help with something many of us spend a lot of time doing: making sense of openQA logs when a job fails.
User Story
Allison Average has a puzzled look on their face while staring at log files that seem to make little sense. Is this a known issue, something completely new or maybe related to infrastructure changes?
Goals
- Leverage a chat interface to help Allison
- Create a model from scratch based on data from openQA
- Proof of concept for automated analysis of openQA test results
Bonus
- Use AI to suggest solutions to merge conflicts
- This would need a merge conflict editor that can suggest solving the conflict
- Use image recognition for needles
Timeline
Day 1
- Conversing with open-webui to teach me how to create a model based on openQA test results
- Asking for example code using TensorFlow in Python
- Discussing log files to explore what to analyze
- Drafting a new project called Testimony (based on Implementing a containerized Python action) - the project name was also suggested by the assistant
Day 2
- Using NotebookLM (Gemini) to produce conversational versions of blog posts
- Researching the possibility of creating a project logo with AI
- Asking open-webui and people with prior experience for advice, and conducting a web search
Highlights
- I briefly compared models to see if they would make me more productive. Between llama, gemma and mistral there was no notable difference in the results for my case.
- Convincing the chat interface to produce code specific to my use case required very explicit instructions.
- Asking for advice on how to use open-webui itself better was frustratingly unfruitful both in trivial and more advanced regards.
- Documentation on the source materials used by LLMs, and tools for checking this, seems virtually non-existent - specifically whether a logo can be generated based on particular licenses
Outcomes
- Chat-interface-supported development provides good starting points, and open-webui, being open source, is more flexible than Gemini, although it currently lacks some fancy features such as grounding and generated podcasts.
- Allison still has to be very experienced with openQA to use a chat interface for test review. Publicly available system prompts would make that easier, though.
Symbol Relations by hli
Description
There are tools to build function call graphs based on parsing source code, for example `cscope`.
This project aims to achieve a similar goal by directly parsing the disassembly (i.e., `objdump` output) of a compiled binary. The assembly code is what the CPU sees, and is therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
Detailed description and Demos can be found in the README file:
Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.
Tested with python3.6
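For a rough idea of the approach (not the project's actual parser), caller-to-callee edges can be extracted from `objdump -d` output with a couple of regular expressions:

```python
# Toy caller -> callee extractor for `objdump -d` output.
# Not the project's actual parser; handles x86 'call' instructions only.
import re
import sys
from collections import defaultdict

FUNC_RE = re.compile(r"^[0-9a-f]+ <(\S+)>:$")          # e.g. 0000000000401000 <main>:
CALL_RE = re.compile(r"\bcall\s+[0-9a-f]+ <([^+>]+)")  # e.g. call 400f00 <foo>

def call_graph(disasm_lines):
    graph = defaultdict(set)
    current = None
    for line in disasm_lines:
        m = FUNC_RE.match(line)
        if m:
            current = m.group(1)  # entering a new function's listing
            continue
        m = CALL_RE.search(line)
        if m and current:
            graph[current].add(m.group(1))
    return graph

if __name__ == "__main__":
    # Usage: objdump -d ./a.out | python3 callgraph.py
    for caller, callees in call_graph(sys.stdin).items():
        print(f"{caller} -> {', '.join(sorted(callees))}")
```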
Goals
Any comments are welcome.
Resources
https://github.com/lhb-cafe/SymbolRelations
symrellib.py: implements the symbol relation graph and the disassembly parser
symrel_tracer*.py: implements tracing (-t option)
symrel.py: "cli parser"
Cobbler Angular Web Interface by SchoolGuy
Project Description
The old Cobbler web interface was built into the server, leading to a huge dependency stack required for only a few people.
Goal for this Hackweek
The project should aim to finalize the first prototype of the new Angular based web interface.
A secondary goal of this hackweek is to learn a lot of Angular.
Update for Hackweek 24
The GH project has received some traction since I had some vacation. As such, my aim is to get a first alpha released to close the 0.0.1 milestone (or whatever version I can release with semantic release).
WebUI for your data by avicenzi
A single place to view every bit of data you have.
Problem
You have too much data and you are a data hoarder.
- Family photos and videos.
- Lots of eBooks, TV shows, movies, and more.
- Boxes full of papers (taxes, invoices, IDs, certificates, exams, and more).
- Bank account statements (multiple currencies, countries, and people).
Maybe you have some data on S3, some on your NAS, and some on your local PC.
- How do you get it all together?
- How do you link a bank transaction to a product invoice?
- How to tag any object type and create a collection out of it (mixing videos, photos, PDFs, transactions)?
- How to store this? A file/folder structure does not work; everything is linked together.
Project Description
The idea is a place where you can throw all your data: photos, videos, documents, binaries, and more.
Create photo albums and document collections, add tags across multiple file formats, link content, and more.
The UI should be easy to use; where the data lives is not important for now (it could be all on S3 or a local drive).
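To make the tagging and linking idea concrete, here is a toy sketch of one possible data model, where every object is an item and a collection is just a tag query across object types; all names are made up:

```python
# Toy data model sketch: any object type, tags, and mixed collections.
from dataclasses import dataclass, field

@dataclass
class Item:
    id: str
    kind: str              # "photo", "video", "pdf", "transaction", ...
    location: str          # s3://..., nas://..., file://...
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)  # ids of related items

class Store:
    def __init__(self):
        self.items = {}

    def add(self, item):
        self.items[item.id] = item

    def link(self, a, b):
        # e.g. link a bank transaction to its product invoice
        self.items[a].links.add(b)
        self.items[b].links.add(a)

    def collection(self, tag):
        # A "collection" is a tag query across all object types
        return [i for i in self.items.values() if tag in i.tags]

store = Store()
store.add(Item("tx1", "transaction", "bank://acct/123", {"camera"}))
store.add(Item("inv1", "pdf", "s3://docs/invoice.pdf", {"camera"}))
store.link("tx1", "inv1")
print([i.id for i in store.collection("camera")])
```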
Similar proposals
The closest I found so far is https://perkeep.org/, but this is not what I'm looking for.
Goal for this Hackweek
Create a web UI, ideally in Svelte, perhaps React.
It should be able to show photos and videos at least.
Resources
None so far, this is just an idea.