<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi,</p>
<p>The ListLike instance for ByteString is defined here:
<a class="moz-txt-link-freetext" href="https://hackage.haskell.org/package/ListLike-3.1.7.1/docs/src/Data-ListLike-Instances.html">https://hackage.haskell.org/package/ListLike-3.1.7.1/docs/src/Data-ListLike-Instances.html</a></p>
<p>The implementation of Levenshtein distance in language-spelling
is here:
<a class="moz-txt-link-freetext" href="https://hackage.haskell.org/package/language-spelling-0.3.2/docs/src/Language-Distance.html#levenshtein">https://hackage.haskell.org/package/language-spelling-0.3.2/docs/src/Language-Distance.html#levenshtein</a></p>
<p>It seems to use only O(1) operations on the Bytestring: length
and index (!!).<br>
It uses an unboxed mutable vector in the ST monad.<br>
It is specialized for ByteString.<br>
<br>
So I think it should be quite efficient already. If you want you
can use the same approach to implement the other algorithm, with
Data.Bits for the bit-fiddling:
<a class="moz-txt-link-freetext" href="https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-Bits.html">https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-Bits.html</a></p>
<p>Cheers,<br>
Sylvain</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 27/04/2016 01:17, Mason Lai wrote:<br>
</div>
<blockquote
cite="mid:CAH1iVpf4s23LGyVv-xYkyDJU4abMgos=6yfB=SFxqMW3ko_F6g@mail.gmail.com"
type="cite">
<div dir="ltr">Hi,
<div><br>
</div>
<div>I'm writing a Haskell program which is looking for whether
or not a string (needle) exists within a larger string
(haystack) given a Levenshtein distance. Note that it
shouldn't calculate the Levenshtein distance between the
needle and haystack. For example,</div>
<div><br>
</div>
<div>needle: AAA</div>
<div>haystack: TTTTTAATTTTT</div>
<div><br>
</div>
<div>The needle would be "found" if the Levenshtein distance is
set at 1, because dist("AAT", "AAA") == 1.</div>
<div><br>
</div>
<div>The following almost works:</div>
<div><br>
</div>
<div>distance(needle, haystack) - (len(haystack) - len(needle))</div>
<div><br>
</div>
<div>But it does not handle deletions correctly.</div>
<div><br>
</div>
<div>I've previously written a program which does this
approximate search in Java with the Wu-Manber extension of the
Bitap, or shift-or, algorithm. The same strategy seems
difficult to code up in Haskell because it's very "stateful"
and involves a lot of bit-fiddling. (Maybe it's actually quite
simple, but I'm not sure how I would go about doing this.)</div>
<div><br>
</div>
<div>As a further complication, I'd prefer to keep the data
packed as ByteStrings, as I'll be dealing with around 200 GiBs
of data (split and parallelized over a cluster, but still a
lot). I don't really know how to deal with ByteStrings without
using the functions that come along in the ByteString module
or unpacking the ByteString.</div>
<div><br>
</div>
<div>I've been using the language-spelling package, which has a
module which can calculate the Levenshtein distance between
two ByteStrings. I see that it uses ListLike. I'm not really
sure how it works, but I assume it makes ByteString an
instance of ListLike, and then you have access to all of the
ListLike methods?</div>
<div><br>
</div>
<div>Does anyone have any advice on how I should proceed with
this? I don't mind learning new things, but I don't really
know what the best strategy is or where to look.</div>
<div><br>
</div>
<div>-- Mason</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Beginners mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Beginners@haskell.org">Beginners@haskell.org</a>
<a class="moz-txt-link-freetext" href="http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners">http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners</a>
</pre>
</blockquote>
<br>
</body>
</html>