I maintain the mailprocessing utility, a Turing-complete, maildir- and IMAP-capable mail filter with rules written in Python, both upstream and in openSUSE. It currently suffers from some performance problems, and I'll devote my Hackweek (or part of it) to fixing them. Most notably, the following areas need attention:

  • Ability to deal with large inboxes. This one is already done and I have been using it myself for a while, but I have not released it yet. If you urgently need it, feel free to grab it from the master branch. I'll give it another once-over before release, but it should be fine.
  • Batching for imapproc header downloads: right now, imapproc issues a separate FETCH command to download each email's headers. I'll try to find a way to batch these into fewer FETCH commands. That should greatly shorten cache building on the day one returns from a two-week vacation...
  • Tuneables for batch size: both the large inbox improvements and the header downloads split their FETCH commands into batches. Depending on the IMAP server and how slow it is, the default batch size may not work for every user, so the batch size should be exposed as a tuneable.
  • Log rotation: the IMAP or maildir logs can grow to respectable sizes (several hundred megabytes) fairly quickly. It would therefore make sense to at least add some hooks for logrotate(1) so these logs can be rotated in a sensible manner.
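
The batching idea from the list above can be sketched roughly as follows. This is a minimal illustration, not imapproc's actual code: the names `chunked` and `fetch_headers_batched` and the default batch size of 200 are assumptions made for the example, and `conn` is assumed to be an `imaplib.IMAP4` connection with a folder already selected.

```python
import imaplib


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_headers_batched(conn, uids, batch_size=200):
    """Fetch headers for `uids` using one UID FETCH per batch.

    `batch_size` is the tuneable: a slow or strict server may need a
    smaller value than the (hypothetical) default used here.
    """
    responses = []
    for batch in chunked(list(uids), batch_size):
        # A comma-separated UID list is a valid IMAP sequence set.
        message_set = ','.join(str(uid) for uid in batch)
        # BODY.PEEK avoids setting the \Seen flag as a side effect.
        typ, data = conn.uid('FETCH', message_set, '(BODY.PEEK[HEADER])')
        if typ != 'OK':
            raise imaplib.IMAP4.error("FETCH failed for %s" % message_set)
        responses.extend(data)
    return responses
```

With 1000 messages and a batch size of 200, this issues 5 FETCH commands instead of 1000.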

By the end of the Hackweek I'll update the mailprocessing package to include the new changes. Also, feel free to post feature requests in the comments and I'll see what I can do...

This project is part of:

Hack Week 18

Activity

  • over 4 years ago: jgrassler started this project.
  • almost 5 years ago: jgrassler originated this project.

  • Comments

    • jgrassler
      almost 5 years ago by jgrassler

      I've implemented batched header downloads now (feel free to grab it if you'd like to try). It's still a bit flaky though: I'm getting unexpected cache misses. Will take a closer look tomorrow...

    • jgrassler
      almost 5 years ago by jgrassler

      Found and fixed the unexpected cache misses - master branch updated.

    • jgrassler
      almost 5 years ago by jgrassler

      Currently investigating a bug I introduced somewhere along the way:

      setting log level to 4
        54:52.32 imaplib version 2.58
        54:52.32 new IMAP4 connection, tag=b'GAOK'
      Traceback (most recent call last):
        File "/usr/bin/imapproc", line 11, in <module>
          load_entry_point('mailprocessing', 'console_scripts', 'imapproc')()
        File "/crypt/home/johannes/src/mailprocessing/mailprocessing/cmd/imap.py", line 317, in main
          exec(rc, environment)
        File "<string>", line 47, in <module>
        File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 284, in __iter__
          self.clean_exit()
        File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 700, in clean_exit
          self._save_cache(self.header_cache)
        File "/crypt/home/johannes/src/mailprocessing/mailprocessing/processor/imap.py", line 368, in _save_cache
          self.header_cache[folder]['uids'].pop(uid)
      KeyError: '91351'
      

      So far it doesn't appear overly harmful: the only thing that happens is non-existent keys not getting deleted from the cache. Still, might not be a great idea to use the master branch right now.
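
      The traceback above ends in a plain dict.pop(uid), which raises KeyError when the UID is not in the cache. A tolerant variant could look like the sketch below. The function name `drop_uids` is hypothetical, and the cache layout (folder, then 'uids', then UID) is only inferred from the traceback line. Note that this merely avoids the crash; it reports the missing UIDs so the underlying bug can still be noticed.

```python
_MISSING = object()  # sentinel, so that cached values of None still count


def drop_uids(header_cache, folder, uids):
    """Remove `uids` from the folder's cache; return the ones not found.

    Assumed cache layout: header_cache[folder]['uids'][uid] -> headers.
    """
    cached = header_cache[folder]['uids']
    missing = []
    for uid in uids:
        # pop() with a default never raises KeyError.
        if cached.pop(uid, _MISSING) is _MISSING:
            missing.append(uid)
    return missing
```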

    • jgrassler
      almost 5 years ago by jgrassler

      Got it now. I accidentally used one and the same UID for all keys in the cache, leading to only one message per run getting processed, with the next one in line causing a crash.

    • jgrassler
      almost 5 years ago by jgrassler

      Tuneables for header/flag download batch size exist now (actually that bit was already done yesterday). Then I got a little sidetracked...

      As a first step towards log rotation I figured I'd create and write a pid file. I also figured that it might be a good idea to lock it. That took more tinkering than I'd expected, because fcntl.flock never worked right for some reason. The utility function I'd written for creating and locking the pid file turned out to be the culprit:

      def write_pidfile(filename):
          lock_acquired = False
      
          try:
              pidfile = open(filename, 'a')
          except IOError as e:
              print("Couldn't open pid file %s for writing: %s" % (filename, e),
                      file=sys.stderr)
      
          while not lock_acquired:
              try:
                  fcntl.flock(pidfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
                  lock_acquired = True
              except OSError as e:
                  print("Couldn't acquire lock on pid file %s, sleeping for 5s" % pidfile.name, file=sys.stderr)
              time.sleep(5)
      
          pidfile.seek(0)
          pidfile.truncate()
          print(os.getpid(), file=pidfile)
          pidfile.flush()
      
      

      Whenever I launched two imapproc processes, the second would just keep ignoring the first process' lock, which caused me considerable head scratching. When I finally thought to use strace(1), I discovered the problem: a rogue close() on the pid file's file descriptor that I never put in there. Apparently Python closes a file object opened inside a function once the function returns and nothing references the object any more, and closing the descriptor releases the flock() lock along with it. I changed the function to use a file descriptor rather than a file name as its argument, and indeed it worked just fine with that change in place.
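
      One way to avoid the implicit close() is to hand the open file object back to the caller, who then keeps it alive for the lifetime of the process. The sketch below illustrates that idea and is not the fix that went into mailprocessing (which passes the file in instead). The name `acquire_pidfile` and the `retry_delay` and `max_attempts` parameters are assumptions made for the example.

```python
import fcntl
import os
import sys
import time


def acquire_pidfile(filename, retry_delay=5.0, max_attempts=None):
    """Open, lock and fill a pid file, then return the open file object.

    The caller must keep the returned object referenced: closing it, or
    letting it be garbage collected, releases the flock() lock.
    """
    pidfile = open(filename, 'a')
    attempts = 0
    while True:
        try:
            fcntl.flock(pidfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
            break
        except OSError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                pidfile.close()
                raise
            print("Couldn't acquire lock on pid file %s, sleeping for %ss"
                  % (filename, retry_delay), file=sys.stderr)
            time.sleep(retry_delay)  # sleep only after a failed attempt

    # The lock is held now, so it is safe to replace the old contents.
    pidfile.seek(0)
    pidfile.truncate()
    print(os.getpid(), file=pidfile)
    pidfile.flush()
    return pidfile
```

      A caller would store the result in a long-lived variable, for example a module-level `pidfile = acquire_pidfile(path)`, so that the lock persists until the process exits.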

      More sidetracking occurred as a result of testing: I used continuous mode for that (so the imapproc processes wouldn't exit by themselves) and that turned out to be pretty broken. Thus I spent most of this morning on general resilience and signal handling fixes for continuous mode. These fixes are already in the master branch. The pidfile stuff is still missing (still need to add documentation for that; expect it later today).

    • jgrassler
      almost 5 years ago by jgrassler

      Ok, got a SIGHUP handler as well now. I'll still need to stress test it, write documentation and example configs but looks good so far.
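
      For readers unfamiliar with the pattern: logrotate(1) typically renames the log file and then signals the process, which has to reopen its log under the original name. A minimal sketch of such a handler follows. The class `ReopenableLog` and the function `install_sighup_handler` are hypothetical names for this example and do not reflect how imapproc implements it.

```python
import signal


class ReopenableLog:
    """Append-only log file that can be reopened after rotation."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, 'a')

    def write(self, line):
        self._fh.write(line + '\n')
        self._fh.flush()

    def reopen(self, signum=None, frame=None):
        # After logrotate has renamed the file, the old handle still points
        # at the renamed file; reopening creates a fresh file at self.path.
        self._fh.close()
        self._fh = open(self.path, 'a')


def install_sighup_handler(log):
    """Reopen `log` whenever the process receives SIGHUP (Unix only)."""
    signal.signal(signal.SIGHUP, log.reopen)
```

      A more cautious design would only set a flag in the handler and do the actual reopening in the main loop, which avoids closing the file while a write is in progress.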

    • jgrassler
      almost 5 years ago by jgrassler

      I'm finished with testing and documentation now and have uploaded the 1.2.6 release to PyPI. I'll update the openSUSE package in a bit (I want to test it against my freshly zypper dup'ed machine first, and the upgrade is still running on account of being... sizeable).

      I also did a little bit of extra testing: I signed up for an Office 365 test account and ran imapproc against outlook.office365.com. It created folders and moved emails from Inbox to these folders just fine. So as of right now, imapproc is known to work (for me at any rate...) with the Dovecot, GroupWise and Office 365 IMAP dialects.

    • jgrassler
      almost 5 years ago by jgrassler

      Package updated in server:mail, tested and checks out. Here's the request against openSUSE Factory: https://build.opensuse.org/request/show/712270

    • jgrassler
      almost 5 years ago by jgrassler

      I stand corrected, here's the superseding request: https://build.opensuse.org/request/show/712271 (I used the setup.py-created tarball. It differs from the GitHub one the source URL points to - the latter doesn't have build artifacts).

    • jgrassler
      over 4 years ago by jgrassler

      The openSUSE request made it into Factory on Friday, so the changes mentioned above should be available in Tumbleweed now.
