[Haskell-beginners] program not running lazily

Thu Jan 21 00:53:02 UTC 2016

Hi,

I'm teaching myself Haskell and attempting to use it for day-to-day tasks
at work. I signed up for this list a few weeks ago, and this is my first
post. Here's my problem:

I'm given a list of 45 Strings, each nine Chars long. The only characters
are 'A', 'C', 'G' and 'T'. (This is for a bioinformatics application.) I'll
refer to any nine-length String composed of these Chars as a "9-mer".

I need to generate a larger list of 9-mers such that no two 9-mers have a
Levenshtein distance of less than 3. (The list I start with satisfies this
requirement.)

I can generate as many 9-mers as I possibly can, but this process is very
slow. It's also not being computed lazily; calling head on the output list
forces the entire list of 9-mers to be computed. *I'd like to understand
why this list isn't being computed lazily, and how I can change my code to
make it so. *My knowledge of monads is pretty poor as well, so a digression
or a series of questions about why the line [9] >>= (`replicateM` "ACGT")
works would also be helpful, as would general tips about writing clean code.

This is an O(n^2) operation, so I'm not expecting it to be slow. However, I
figured that I'd just be able to take the first N elements from the output
list.

Here's what I have:

import Control.Monad

main :: IO ()
main = interact processData

processData :: String -> String
processData = unlines . (`merge` ([9] >>= (`replicateM` "ACGT"))) . lines

-- Merges two lists of things into a single list
merge :: Eq a => [[a]] -> [[a]] -> [[a]]
merge xs [] = xs
merge xs ys = merge ((head ys) `addInto` xs) $ tail ys

-- Adds a thing into a list if it is "different" enough
addInto :: Eq a => [a] -> [[a]] -> [[a]]
y `addInto` ys =
  if ((minimum $ map (dist y) ys) < 3)
    then ys
    else y:ys

-- Calculate the Levenshtein distance
-- Lloyd Allison algorithm. See Haskell wiki
-- code omitted
dist :: Eq a => [a] -> [a] -> Int

My workaround to getting a smaller subset of the whole list is to replace

[9] >>= (`replicateM` "ACGT")

with

take 10000 $ [9] >>= (`replicateM` "ACGT")

Thanks!
Mason Lai
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160120/d3f38aa0/attachment.html>