Description

Casky is a lightweight, crash-safe key-value store written in C, designed for fast storage and retrieval of data with a minimal footprint. Built using Test-Driven Development (TDD), Casky ensures reliability while keeping the codebase clean and maintainable. It is inspired by Bitcask and aims to provide a simple, embeddable storage engine that can be integrated into microservices, IoT devices, and other C-based applications.

Objectives:

  • Implement a minimal key-value store with append-only file storage.
  • Support crash-safe persistence and recovery.
  • Expose a simple public API: store(key, value), load(key), delete(key).
  • Follow TDD methodology for robust and testable code.
  • Provide a foundation for future extensions, such as in-memory caching, compaction, and eventual integration with vector-based databases like PixelDB.

Why This Project is Interesting:

Casky combines low-level C programming with modern database concepts, making it an ideal playground to explore storage engines, crash safety, and performance optimization. It’s small enough to complete during Hack Week, yet it provides a solid base for future experiments and more complex projects.

Goals

  • Working prototype with append-only storage and memtable.
  • TDD test suite covering core functionality and recovery.
  • Demonstration of basic operations: insert, load, delete.
  • Optional bonus: LRU caching, file compaction, performance benchmarks.

Future Directions:

After Hack Week, Casky can evolve into a backend engine for projects like PixelDB, supporting vector storage and approximate nearest-neighbor search and combining low-level performance with modern AI retrieval applications.

Resources

  • The Bitcask paper: https://riak.com/assets/bitcask-intro.pdf
  • The Casky repository: https://github.com/thesp0nge/casky

Looking for hackers with the skills:

database

This project is part of:

Hack Week 25

Activity

  • 10 days ago: wfrisch liked this project.
  • 12 days ago: pperego added keyword "database" to this project.
  • 12 days ago: pperego originated this project.

