Description
To achieve this it will be necessary:
- Collect/download raw data from various government and non-governmental organizations
- Clean up raw data and organise it in some kind database.
- Create tool to make queries easy.
- Or perhaps dump all data into AI and ask questions in natural language.
Goals
By selecting particular school information like this will be provided:
- School scores on national exams.
- School scores from the external evaluations exams.
- School town, municipality and region.
- Employment rate in a town or municipality.
- Average health of the population in the region.
Resources
Some of these are available only in bulgarian.
- https://danybon.com/klasazia
- https://nvoresults.com/index.html
- https://ri.mon.bg/active-institutions
- https://www.nsi.bg/nrnm/ekatte/archive
Results
- Information about all Bulgarian schools with their scores during recent years cleaned and organised into SQL tables
- Information about all Bulgarian villages, cities, municipalities and districts cleaned and organised into SQL tables
- Information about all Bulgarian villages and cities census since beginning of this century cleaned and organised into SQL tables.
- Information about all Bulgarian municipalities about religion, ethnicity cleaned and organised into SQL tables.
- Data successfully loaded to locally running Ollama with help to Vanna.AI
- Seems to be usable.
TODO
- Add more statistical information about municipalities and ....
Code and data
This project is part of:
Hack Week 25
Activity
Comments
-
20 days ago by rtsvetkov | Reply
I'm really excited to see some results... Even raw and preliminary
-
12 days ago by iivanov | Reply
Initial version is ready. Don't expect too much. It is somehow usable. For better results use queries in Bulgarian language ;-) like:
- Колко общини има в България?
- Коя е най-малката от тях през 2005 година?
- Коя учебна институция има най-добър резултат от изпитите по математика през 2024 година?
- В кое населено място се намира?
- Колко е голямо? ...
-
Similar Projects
Enhance git-sha-verify: A tool to checkout validated git hashes by gpathak
Description
git-sha-verify is a simple shell utility to verify and checkout trusted git commits signed using GPG key. This tool helps ensure that only authorized or validated commit hashes are checked out from a git repository, supporting better code integrity and security within the workflow.
Supports:
- Verifying commit authenticity signed using gpg key
- Checking out trusted commits
Ideal for teams and projects where the integrity of git history is crucial.
Goals
A minimal python code of the shell script exists as a pull request.
The goal of this hackweek is to:
- DONE: Add more unit tests
- New and more tests can be added later
- New and more tests can be added later
- Partially DONE: Make the python code modular
- DONE: Add code coverage if possible
Resources
- Link to GitHub Repository: https://github.com/openSUSE/git-sha-verify
Song Search with CLAP by gcolangiuli
Description
Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface
Goals
Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:
- Music Tagging;
- Free text search;
- Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.
The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.
Result
In this MVP we implemented:
- Async Song Analysis with Clap model
- Free Text Search of the songs
- Similar song search based on vector representation
- Containerised version with web interface
We also documented what went well and what can be improved in the use of AI.
You can have a look at the result here:
Future implementation can be related to performance improvement and stability of the analysis.
References
- CLAP: The main model being researched;
- huggingface: Pre-trained models for CLAP;
- Free Music Archive: Creative Commons songs that can be used for testing;
Improvements to osc (especially with regards to the Git workflow) by mcepl
Description
There is plenty of hacking on osc, where we could spent some fun time. I would like to see a solution for https://github.com/openSUSE/osc/issues/2006 (which is sufficiently non-serious, that it could be part of HackWeek project).
Improve chore and screen time doc generator script `wochenplaner` by gniebler
Description
I wrote a little Python script to generate PDF docs, which can be used to track daily chore completion and screen time usage for several people, with one page per person/week.
I named this script wochenplaner and have been using it for a few months now.
It needs some improvements and adjustments in how the screen time should be tracked and how chores are displayed.
Goals
- Fix chore field separation lines
- Change screen time tracking logic from "global" (week-long) to daily subtraction and weekly addition of remainders (more intuitive than current "weekly time budget method)
- Add logic to fill in chore fields/lines, ideally with pictures, falling back to text.
Resources
tbd (Gitlab repo)
Improve/rework household chore tracker `chorazon` by gniebler
Description
I wrote a household chore tracker named chorazon, which is meant to be deployed as a web application in the household's local network.
It features the ability to set up different (so far only weekly) schedules per task and per person, where tasks may span several days.
There are "tokens", which can be collected by users. Tasks can (and usually will) have rewards configured where they yield a certain amount of tokens. The idea is that they can later be redeemed for (surprise) gifts, but this is not implemented yet. (So right now one needs to edit the DB manually to subtract tokens when they're redeemed.)
Days are not rolled over automatically, to allow for task completion control.
We used it in my household for several months, with mixed success. There are many limitations in the system that would warrant a revisit.
It's written using the Pyramid Python framework with URL traversal, ZODB as the data store and Web Components for the frontend.
Goals
- Add admin screens for users, tasks and schedules
- Add models, pages etc. to allow redeeming tokens for gifts/surprises
- …?
Resources
tbd (Gitlab repo)
Work on kqlite (Lightweight remote SQLite with high availability and auto failover). by epenchev
Description
Continue the work on kqlite (Lightweight remote SQLite with high availability and auto failover).
It's a solution for applications that require High Availability but don't need all the features of a complete RDBMS and can fit SQLite in their use case.
Also kqlite can be considered to be used as a lightweight storage backend for K8s (https://docs.k3s.io/datastore) and the Edge, and allowing to have only 2 Nodes for HA.
Goals
Push kqlite to a beta version.
kqlite as library for Go programs.
Resources
https://github.com/kqlite/kqlite
Sim racing track database by avicenzi
Description
Do you wonder which tracks are available in each sim racing game? Wonder no more.
Goals
Create a simple website that includes details about sim racing games.
The website should be static and built with Alpine.JS and TailwindCSS. Data should be consumed from JSON, easily done with Alpine.JS.
The main goal is to gather track information, because tracks vary by game. Older games might have older layouts, and newer games might have up-to-date layouts. Some games include historical layouts, some are laser scanned. Many tracks are available as DLCs.
Initially include official tracks from:
- ACC
- iRacing
- PC2
- LMU
- Raceroom
- Rennsport
These games have a short list of tracks and DLCs.
Resources
The hardest part is collecting information about tracks in each game. Active games usually have information on their website or even on Steam. Older games might be on Fandom or a Wiki. Real track information can be extracted from Wikipedia or the track website.
Casky – Lightweight C Key-Value Engine with Crash Recovery by pperego
Description
Casky is a lightweight, crash-safe key-value store written in C, designed for fast storage and retrieval of data with a minimal footprint. Built using Test-Driven Development (TDD), Casky ensures reliability while keeping the codebase clean and maintainable. It is inspired by Bitcask and aims to provide a simple, embeddable storage engine that can be integrated into microservices, IoT devices, and other C-based applications.
Objectives:
- Implement a minimal key-value store with append-only file storage.
- Support crash-safe persistence and recovery.
- Expose a simple public API: store(key, value), load(key), delete(key).
- Follow TDD methodology for robust and testable code.
- Provide a foundation for future extensions, such as in-memory caching, compaction, and eventual integration with vector-based databases like PixelDB.
Why This Project is Interesting:
Casky combines low-level C programming with modern database concepts, making it an ideal playground to explore storage engines, crash safety, and performance optimization. It’s small enough to complete during Hackweek, yet it provides a solid base for future experiments and more complex projects.
Goals
- Working prototype with append-only storage and memtable.
- TDD test suite covering core functionality and recovery.
- Demonstration of basic operations: insert, load, delete.
- Optional bonus: LRU caching, file compaction, performance benchmarks.
Future Directions:
After Hackweek, Casky can evolve into a backend engine for projects like PixelDB, supporting vector storage and approximate nearest neighbor search, combining low-level performance with cutting-edge AI retrieval applications.
Resources
The Bitcask paper: https://riak.com/assets/bitcask-intro.pdf The Casky repository: https://github.com/thesp0nge/casky
Day 1
[0.10.0] - 2025-12-01
Added
- Core in-memory KeyDir and EntryNode structures
- API functions: caskyopen, caskyclose, caskyput, caskyget, casky_delete
- Hash function: caskydjb2hash_xor
- Error handling via casky_errno
- Unit tests for all APIs using standard asserts
- Test cleanup of temporary files
Changed
- None (first MVP)
Fixed
- None (first MVP)
Day 2
[0.20.0] - 2025-12-02
Port the classic browser game HackTheNet to PHP 8 by dgedon
Description
The classic browser game HackTheNet from 2004 still runs on PHP 4/5 and MySQL 5 and needs a port to PHP 8 and e.g. MariaDB.
Goals
- Port the game to PHP 8 and MariaDB 11
- Create a container where the game server can simply be started/stopped
Resources
- https://github.com/nodeg/hackthenet
Uyuni read-only replica by cbosdonnat
Description
For now, there is no possible HA setup for Uyuni. The idea is to explore setting up a read-only shadow instance of an Uyuni and make it as useful as possible.
Possible things to look at:
- live sync of the database, probably using the WAL. Some of the tables may have to be skipped or some features disabled on the RO instance (taskomatic, PXT sessions…)
- Can we use a load balancer that routes read-only queries to either instance and the other to the RW one? For example, packages or PXE data can be served by both, the API GET requests too. The rest would be RW.
Goals
- Prepare a document explaining how to do it.
- PR with the needed code changes to support it