I maintain the mailprocessing utility, a Turing complete maildir and IMAP capable filter with rules written in Python, both upstream and in openSUSE. It currently suffers from some performance problems, and I'll devote my Hackweek (or part of it) to fixing them. Most notably, the following areas need some attention:
- Ability to deal with large inboxes. This one is already done and I have been using it myself for a while, but I have not released it yet. If you urgently need it, feel free to grab it from the `master` branch. I'll give it another once-over before release, but it should be fine.
- Batching for imapproc header downloads: right now, imapproc issues a separate `FETCH` command for every single email's headers. I'll try and find a way to batch these into fewer `FETCH` commands. That should greatly improve cache building duration for the day one returns from a two week vacation...
- Tuneables for batch size: both the large inbox improvements and the header downloads will split their `FETCH` commands into batches. Depending on the IMAP server and how slow it is, a user may not be able to use the default batch size. For this user, the batch size should be exposed as a tuneable.
- Log rotation: the IMAP or maildir logs can grow to respectable sizes (several hundred megabytes) fairly quickly. It would therefore make sense to at least add some hooks for `logrotate(1)` so these logs can be rotated in a sensible manner.
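To illustrate the batching idea from the list above, here is a minimal sketch rather than the actual imapproc code. It assumes a connection object that behaves like `imaplib.IMAP4` with a folder already selected. The function names and the default batch size of 200 are made up for this example.

```python
def batches(uids, batch_size):
    """Yield successive slices of at most batch_size UIDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(uids), batch_size):
        yield uids[start:start + batch_size]


def fetch_headers_batched(conn, uids, batch_size=200):
    """Issue one UID FETCH per batch rather than one per message.

    conn is assumed to behave like imaplib.IMAP4 (hypothetical usage,
    not taken from imapproc). Returns the raw response items of all
    batches, concatenated in order.
    """
    responses = []
    for batch in batches(list(uids), batch_size):
        # IMAP accepts a comma separated set of UIDs in a single command.
        message_set = ",".join(str(uid) for uid in batch)
        # BODY.PEEK fetches the headers without setting the \Seen flag.
        status, data = conn.uid("FETCH", message_set, "(BODY.PEEK[HEADER])")
        if status != "OK":
            raise RuntimeError("UID FETCH failed for %s: %r" % (message_set, data))
        responses.extend(data)
    return responses
```

Exposing `batch_size` as a parameter is what makes it a tuneable: a user with a slow or picky IMAP server could lower it without touching the rest of the logic.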
By the end of the Hackweek I'll update the mailprocessing package to include the new changes. Also, feel free to post feature requests in the comments and I'll see what I can do...
Comments
-
over 5 years ago by jgrassler
Currently investigating a bug I introduced somewhere along the way:

    setting log level to 4
    54:52.32 imaplib version 2.58
    54:52.32 new IMAP4 connection, tag=b'GAOK'
    Traceback (most recent call last):
      File "/usr/bin/imapproc", line 11, in
        load_entry_point('mailprocessing', 'console_scripts', 'imapproc')()
      File "/crypt/home/johannes/src/mailprocessing/mailprocessing/cmd/imap.py", line 317, in main
        exec(rc, environment)
      File "", line 47, in
      File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 284, in __iter__
        self.clean_exit()
      File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 700, in clean_exit
        self._save_cache(self.header_cache)
      File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 368, in _save_cache
        self.header_cache[folder]['uids'].pop(uid)
    KeyError: '91351'

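The `KeyError` above comes from calling `pop(uid)` for a UID that is no longer in the cache. One common way to tolerate that is `dict.pop` with a default value. The following is only a sketch: it assumes the nested dict layout visible in the traceback, and `drop_uids` is a hypothetical helper, not the fix that went into mailprocessing.

```python
_MISSING = object()  # sentinel so that stored None values are not mistaken for "absent"


def drop_uids(header_cache, folder, stale_uids):
    """Remove stale UIDs from one folder's cache, ignoring ones already gone.

    Assumes header_cache[folder]['uids'] is a dict keyed by UID, as the
    traceback suggests. Returns the UIDs that were not present, so the
    caller can log them instead of crashing.
    """
    uids = header_cache[folder]["uids"]
    missing = []
    for uid in stale_uids:
        # pop() with a default never raises KeyError.
        if uids.pop(uid, _MISSING) is _MISSING:
            missing.append(uid)
    return missing
```

Returning the missing UIDs rather than silently dropping them keeps the underlying inconsistency visible while it is still being tracked down.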
So far it doesn't appear overly harmful: the only thing that happens is non-existent keys not getting deleted from the cache. Still, might not be a great idea to use the `master` branch right now.
-
over 5 years ago by jgrassler
Tuneables for header/flag download batch size exist now (actually that bit was already done yesterday). Then I got a little sidetracked...
As a first step towards log rotation I figured I'd create and write a pid file. I also figured that it might be a good idea to lock it. That took more tinkering than I'd expected, because I wrote a locking function and `fcntl.flock` never worked right in it for some reason. The utility function I'd written for creating and locking the pid file turned out to be the culprit:

```
import fcntl
import os
import sys
import time


def writepidfile(filename):
    lock_acquired = False

    try:
        pidfile = open(filename, 'a')
    except IOError as e:
        print("Couldn't open pid file %s for writing: %s" % (filename, e),
              file=sys.stderr)

    # Retry every 5 seconds until we hold an exclusive lock.
    while not lock_acquired:
        try:
            fcntl.flock(pidfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
            lock_acquired = True
        except OSError as e:
            print("Couldn't acquire lock on pid file %s, sleeping for 5s" % pidfile.name,
                  file=sys.stderr)
            time.sleep(5)

    # Replace any stale contents with our own PID.
    pidfile.seek(0)
    pidfile.truncate()
    print(os.getpid(), file=pidfile)
    pidfile.flush()
    # Note: pidfile is a local variable and is not returned to the caller.
```
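For comparison, a `flock` lock belongs to the open file and is released as soon as that file is closed, so a helper like this only works if someone keeps the file object alive. Below is a minimal sketch of a variant that leaves ownership of the file with the caller. The name `lock_pidfile` and the choice to return `False` instead of sleeping and retrying are assumptions for this example, not the code that ended up in mailprocessing. It relies on `fcntl`, so it is Unix only.

```python
import fcntl
import os


def lock_pidfile(pidfile):
    """Try to lock an already open pid file and record our PID in it.

    Returns True on success and False if another open file holds the lock.
    The caller owns pidfile and must keep it open for as long as the lock
    is needed, since closing the file releases the lock.
    """
    try:
        # Non-blocking exclusive lock; raises OSError if already locked.
        fcntl.flock(pidfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False
    # Replace any stale contents with our own PID.
    pidfile.seek(0)
    pidfile.truncate()
    pidfile.write("%d\n" % os.getpid())
    pidfile.flush()
    return True
```

A caller would open the file once at startup, for example with `open(path, 'a')`, keep the resulting object in a long-lived variable, and decide for itself whether to retry or exit when `lock_pidfile` returns `False`.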
Whenever I launched two imapproc processes, the second would just keep ignoring the first process' lock, which caused me considerable head scratching. When I finally thought to use `strace(1)` I discovered the problem: a rogue `close()` on the pid file's file descriptor that I never put in there. Apparently Python does that to files opened inside a function once the function returns: the file object goes out of scope, gets cleaned up, and its descriptor is closed, which drops the lock along with it. I changed the function to use a file descriptor rather than a file name as its argument and indeed it worked just fine with that change in place.

More sidetracking occurred as a result of testing: I used continuous mode for that (so the `imapproc` processes wouldn't exit by themselves) and that turned out to be pretty broken. Thus I spent most of this morning on general resilience and signal handling fixes for continuous mode. These fixes are already in the `master` branch. The pidfile stuff is still missing (still need to add documentation for that; expect it later today).
-
over 5 years ago by jgrassler
I'm finished with testing and documentation now and have uploaded the 1.2.6 release to PyPI. I'll update the openSUSE package in a bit (I want to test it against my freshly `zypper dup`ed machine first and the upgrade is still running on account of being... sizeable).

I also did a little bit of extra testing: I signed up for an Office 365 test account and ran `imapproc` against `outlook.office365.com`. It created folders and moved emails from `Inbox` to these folders just fine. So as of right now, `imapproc` is known to work (for me at any rate...) with the Dovecot, Groupwise and Office 365 IMAP dialects.