[Haskell-cafe] [ANNOUNCE] biostockholm 0.2
Felipe Almeida Lessa
felipe.lessa at gmail.com
Fri Jan 27 02:42:55 CET 2012
Hello!
I'm pleased to announce the second major release of the biostockholm
library! This library allows you to parse and render files in the
Stockholm 1.0 format, which is used by Pfam, Rfam, Infernal and others
for holding information about families of proteins or non-coding RNAs.
http://hackage.haskell.org/package/biostockholm
Despite the small version bump from 0.1 to 0.2, this is actually
a big rewrite of the library. Now we have:
- A streaming interface similar to what SAX parsers provide. This
allows you to consume Stockholm files using constant memory (80k in a
simple case).
- More test cases. It's able to consume its own pretty printed
version of Rfam through the document interface, and is also capable of
reading the full Rfam stockholm file (which has some huge families)
through the streaming interface.
- QuickCheck properties. Now we have three different QuickCheck
properties covering almost everything. These have helped uncover some
tricky bugs that were never found before. Two of these three
properties still don't pass, but the failing examples that I've
investigated so far look like corner cases to me. Unfortunately,
Stockholm lacks a formal specification, so it's hard to say what the
right behavior is in those cases.
- Conduit interface. Besides a lazy I/O version, now there's a
conduit interface.
- Code much easier to read and reason about.
- Fast enough: the streaming interface achieves 12 MiB/s for parsing,
which is pretty nice considering that there are some known overheads
in its implementation.
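To give an idea of what a property over parsing and rendering can look
like, here is a minimal sketch of a round-trip check. Everything in it
(the Doc type, render, parse, prop_roundTrip) is hypothetical and much
simpler than the real library; it is not the biostockholm API, and I am
only assuming that a round trip is a reasonable shape for such a
property. The predicate is a plain function so that it compiles with
base alone; with QuickCheck you would additionally need an Arbitrary
instance for Doc.

```haskell
-- Hypothetical, much-simplified document: a list of (name, residues) rows.
-- This is NOT the biostockholm data model.
newtype Doc = Doc [(String, String)] deriving (Eq, Show)

-- | Render the rows between the Stockholm header line and the "//" terminator.
render :: Doc -> String
render (Doc rows) = unlines $
  ["# STOCKHOLM 1.0"] ++ [n ++ " " ++ r | (n, r) <- rows] ++ ["//"]

-- | Parse what 'render' produces; Nothing on anything this sketch
-- does not understand.
parse :: String -> Maybe Doc
parse s = case lines s of
  ("# STOCKHOLM 1.0" : rest)
    | (body, ["//"]) <- break (== "//") rest -> Doc <$> mapM row body
  _ -> Nothing
  where
    -- A row must be exactly two whitespace-separated fields.
    row l = case words l of
      [n, r] -> Just (n, r)
      _      -> Nothing

-- | Round-trip predicate: rendering and then parsing gives the document back.
prop_roundTrip :: Doc -> Bool
prop_roundTrip d = parse (render d) == Just d
```

In this toy version the predicate holds for Doc [("seq1", "ACGU")] but
fails for Doc [("seq 1", "ACGU")], because a name containing a space
cannot be told apart from the residues once rendered. Without a formal
specification it is not obvious whether such a name should be rejected
or escaped.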
For the tasks that biostockholm 0.1 already handled, biostockholm 0.2
tends to be slightly slower. However, biostockholm 0.2 is able to
handle some previously impossible cases where a streaming solution is
required.
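To illustrate why an event-based (SAX-like) design helps with huge
files, here is a minimal sketch that only needs base. The Event type
and the classify, events and summarize functions are made up for this
example and cover only a small subset of Stockholm 1.0; they are not
the biostockholm API. The point is only that each line becomes one
event and a strict fold consumes the events one at a time, so the
whole file never has to be held as a document.

```haskell
import Data.List (foldl', isPrefixOf)

-- | SAX-style events for a simplified subset of Stockholm 1.0.
-- Hypothetical type for illustration; NOT the biostockholm API.
data Event
  = EvHeader                 -- "# STOCKHOLM 1.0"
  | EvFileAnn String String  -- "#=GF <feature> <text>"
  | EvSeqData String String  -- "<name> <aligned residues>"
  | EvEnd                    -- "//", end of one family
  | EvOther String           -- anything this sketch does not model
  deriving (Eq, Show)

-- | Classify a single line; no state is carried between lines.
classify :: String -> Event
classify l
  | l == "# STOCKHOLM 1.0" = EvHeader
  | l == "//"              = EvEnd
  | "#=GF " `isPrefixOf` l = case words (drop 5 l) of
      (feat : rest) -> EvFileAnn feat (unwords rest)
      []            -> EvOther l
  | "#" `isPrefixOf` l     = EvOther l
  | otherwise              = case words l of
      [name, residues] -> EvSeqData name residues
      _                -> EvOther l

-- | Lazily turn the input text into a stream of events.
events :: String -> [Event]
events = map classify . filter (not . null) . lines

-- | Running totals kept by the example consumer (strict fields).
data Stats = Stats { families :: !Int, seqLines :: !Int }
  deriving (Eq, Show)

-- | Update the totals for one event.
step :: Stats -> Event -> Stats
step s EvEnd           = s { families = families s + 1 }
step s (EvSeqData _ _) = s { seqLines = seqLines s + 1 }
step s _               = s

-- | Count families and sequence lines with a strict left fold.
summarize :: String -> Stats
summarize = foldl' step (Stats 0 0) . events
```

With lazy input, something like `readFile path >>= print . summarize`
only ever needs the current line and the two counters, regardless of
how large a family is.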
Cheers!
--
Felipe.