djmount is a neat idea - see UPnP/AV resources directly in the file system. Unfortunately the code hasn't been maintained for ~10y, and - at least for me - seems to be plagued by various bugs causing crashes and whatnot. There's currently no official openSUSE package. This project aims to pick up the code, fix bugs, and make the tool actually useful again. The code itself seems to be in quite good shape, so this should be doable.
This project is part of:
Hack Week 15 Hack Week 16
Comments
almost 8 years ago by mwilck
Hackweek #15 results
I grabbed this project, hoping to be able to pull a few bug fixes from github and then start to explore why djmount wouldn't read files from my Raumfeld media server correctly. In order to do this, I needed to enable debug messages in djmount. Unfortunately, as soon as I did this, I started to encounter segmentation faults and ABORTs in djmount after a few minutes of runtime. The error stacks varied, but they were always related to memory management, usually malloc reporting corruption of its internal data structures.
I tested various memory debuggers (Valgrind, ElectricFence, Address Sanitizer), but running djmount under them seemed to prevent the problems from occurring - the program would now run stably for many hours. The only tool that seemed to find problems was glibc's mtrace, but these turned out to be false positives, as mtrace can't deal with multi-threaded programs. Valgrind, I learned on a SUSE mailing list, is not well suited to debugging multithreading-related corruption.
djmount uses the talloc library for its own memory management. At the same time, it uses threading heavily; it employs the MT main loop of FUSE, and uses other threads for both libupnp and internal purposes. The talloc documentation mentions that talloc, by itself, is not MT-safe, and provides examples of how to deal with that. This is a generic problem of the djmount code - nothing in it protects against races in the talloc library. It took me quite a while to realize that - initially I couldn't believe that the author Rémi Turboult, whose code otherwise meets high standards AFAICT, might have overlooked such a basic problem. This is not only a matter of protecting the main program's data structures - talloc's internal data can easily be corrupted if the same context is accessed by several threads in parallel.
Before diving into the djmount code directly, I wrote a test program to find out what kind of problems occur with threaded talloc calls, and how best to avoid them. This test program ended up with >1000 lines of code. Eventually I could demonstrate that improper use of threading could cause the same phenomena that I'd observed with the djmount code, and that this could be avoided with proper locking around talloc calls. In short, what's necessary is to protect all talloc calls on shared contexts (=data structures) using mutexes. Thread-local talloc contexts can be used as usual, if the application writer can prove that their talloc data structures will never be manipulated in another thread.
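The locking rule above can be illustrated with a small sketch. This is not the test program mentioned in the text and it does not use the talloc API: `shared_ctx`, `shared_ctx_add()`, `shared_ctx_free_all()` and `run_add_demo()` are hypothetical stand-ins for a shared context whose child list is touched by several threads. The only point is that every operation on the shared structure takes the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for a shared talloc context: a list of children. */
struct child { struct child *next; };

struct shared_ctx {
    pthread_mutex_t lock;   /* guards head and count */
    struct child *head;
    size_t count;
};

void shared_ctx_init(struct shared_ctx *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->head = NULL;
    c->count = 0;
}

/* Attach one child. The list update happens under the lock.
 * Returns 0 on success, -1 if malloc fails. */
int shared_ctx_add(struct shared_ctx *c)
{
    struct child *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    pthread_mutex_lock(&c->lock);
    n->next = c->head;
    c->head = n;
    c->count++;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Free all children (loosely analogous to freeing a whole context).
 * Returns the number of children freed. */
size_t shared_ctx_free_all(struct shared_ctx *c)
{
    size_t freed = 0;
    pthread_mutex_lock(&c->lock);
    while (c->head != NULL) {
        struct child *n = c->head;
        c->head = n->next;
        free(n);
        freed++;
    }
    c->count = 0;
    pthread_mutex_unlock(&c->lock);
    return freed;
}

struct adder_arg { struct shared_ctx *ctx; int n; };

static void *adder(void *p)
{
    struct adder_arg *a = p;
    for (int i = 0; i < a->n; i++)
        shared_ctx_add(a->ctx);
    return NULL;
}

/* Start up to MAX_THREADS threads that each add per_thread children to one
 * shared context; return the child count seen after all threads finished. */
size_t run_add_demo(int nthreads, int per_thread)
{
    struct shared_ctx c;
    pthread_t tids[MAX_THREADS];
    struct adder_arg arg;
    int started = 0;
    size_t total;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_ctx_init(&c);
    arg.ctx = &c;
    arg.n = per_thread;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, adder, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    total = c.count;
    shared_ctx_free_all(&c);
    pthread_mutex_destroy(&c.lock);
    return total;
}
```

Calling `run_add_demo(4, 10000)` from a `main()` should report one child per successful add. If the lock and unlock calls are removed from `shared_ctx_add()`, the list and counter are updated racily, which is the kind of silent corruption described above.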
With this knowledge, I went back to the djmount source code and tried to separate "global" and "local" contexts. The unpleasant part of this is that it's hard to maintain - everyone working with the code needs to understand this distinction and needs to track cleanly which talloc context is used how, and where, in the code. It's not generally possible to detect wrong usage automatically. The djmount code passes talloc'ed memory around between different code modules, making it pretty hard to assess locality correctly.
Therefore I sat back again and started wondering whether it might be possible to actually make a thread-safe version of the talloc library. This would make it possible to use fine-grained per-context locking rather than a slow and clumsy global lock. I think it is certainly possible to do this, but it's far from easy. Various talloc operations involve 2 or 3 different context objects. Thread-safe operation would require locking all of them, which would pose severe risk of deadlock unless the code was written very carefully. I suppose the current version of talloc is not MT-safe for a reason.
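To make the deadlock risk concrete, here is a minimal sketch of one common way to handle an operation that touches two contexts: always take the two locks in a fixed order (here, by address). This is an illustration under assumptions, not talloc code and not a design for an MT-safe talloc; `counted_ctx`, `lock_pair()`, `move_child()` and `run_move_demo()` are hypothetical names, and a child count stands in for the real parent/child bookkeeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical context with its own lock; "children" stands in for the
 * parent/child bookkeeping a real hierarchical allocator would keep. */
struct counted_ctx {
    pthread_mutex_t lock;
    long children;
};

/* Lock two contexts in a fixed (address) order, so that two threads locking
 * the same pair in opposite argument order cannot deadlock. */
void lock_pair(struct counted_ctx *a, struct counted_ctx *b)
{
    assert(a != NULL && b != NULL);
    if (a == b) {
        pthread_mutex_lock(&a->lock);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        struct counted_ctx *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

void unlock_pair(struct counted_ctx *a, struct counted_ctx *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

/* Move one child from src to dst (loosely analogous to reparenting).
 * Returns 0 on success, -1 if src has no children. */
int move_child(struct counted_ctx *src, struct counted_ctx *dst)
{
    int rc = -1;
    lock_pair(src, dst);
    if (src->children > 0) {
        src->children--;
        dst->children++;
        rc = 0;
    }
    unlock_pair(src, dst);
    return rc;
}

struct mover_arg { struct counted_ctx *src, *dst; int n; };

static void *mover(void *p)
{
    struct mover_arg *m = p;
    for (int i = 0; i < m->n; i++)
        move_child(m->src, m->dst);
    return NULL;
}

/* Two threads move children in opposite directions between two contexts
 * that start with 1000 children each; returns the combined child count. */
long run_move_demo(int iterations)
{
    struct counted_ctx a, b;
    struct mover_arg ab = { &a, &b, iterations };
    struct mover_arg ba = { &b, &a, iterations };
    pthread_t t1, t2;
    int ok1, ok2;

    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);
    a.children = 1000;
    b.children = 1000;
    ok1 = pthread_create(&t1, NULL, mover, &ab) == 0;
    ok2 = pthread_create(&t2, NULL, mover, &ba) == 0;
    if (ok1)
        pthread_join(t1, NULL);
    if (ok2)
        pthread_join(t2, NULL);
    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return a.children + b.children;
}
```

With the fixed ordering, `run_move_demo()` terminates and the combined count stays at its initial value. If each thread instead locked `src` first and `dst` second, the two movers could each hold one lock and wait forever for the other. A real per-context scheme would also have to cope with operations involving three objects and with contexts being freed while another thread waits on their lock, which is where the difficulty mentioned above comes from.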
So, there's now a choice of options, each of which has pros and cons:
- Pursue analysis of the djmount code, adding locking primitives around talloc calls as appropriate. This is doable in limited time, but will likely not produce an optimal solution, and will result in hard-to-maintain code.
- Convert djmount from using talloc to some other memory management code, possibly plain malloc(). This would be possible and not too hard, but memory leaks are likely to result, and eliminating all of them will be costly. Avoiding leaks by recursively freeing memory is one of the key points of talloc, and heavily used in the djmount code.
- Try to create an MT-safe talloc library. This is the cleanest option and promises to provide the best result. It is also by far the most difficult and challenging option, and it's possible that I'll face problems I won't be able to solve.