finding sublist

Garner, Robin Robin.Garner@crsrehab.gov.au
Mon, 6 May 2002 14:15:55 +1000


This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C1F4B4.B78BAF00
Content-Type: text/plain;
	charset="iso-8859-1"

See Dijkstra's 'Discipline of Programming' for an o(M + N) algorithm - naive
approches are o(MN) where M and N are the length of the list and substring
respectively.

Essentially the algorithm calculates a sort of autocorrelation table of the
substring, showing where to resume the search after a failed match.  For
example, if you're matching the substring [1,2,3,4], and fail on the last
comparison, you already know that there's no point in advancing one element
and attempting another match - you can actually start again at the element
that failed.  When matching the string [1,2,1,3], you can resume at the
current element of the main string, and the second element of the search
string.

How you would do this in a functional implementation is another question -
Dijkstra's example is comparing two arrays, and there may be inefficiencies
translating it to a list-based implementation.

Cheers

-----Original Message-----
From: Serge D. Mechveliani [mailto:mechvel@botik.ru]
Sent: Thursday, 2 May 2002 17:37
To: haskell@haskell.org
Subject: finding sublist


Thanks to people who helped me with the task

>> Import two space separated columns of integers from file.
 
Claus Reinke <claus.reinke@talk21.com> recommends to exploit `lines'.

Indeed, it bocomes shorter now:

  main = readFile "data" >>= (putStr . show . twoIntLists)
    where
    twoIntLists str = case  span (not . null) $ dropWhile null $ lines str
                      of
                        (lns, lns') -> (readInts lns, readInts lns')

    readInts = map (\ str -> read str :: Integer) . dropWhile null


Another question:
has the Haskell Standard library a function for such a usable task
as finding of occurence of a segment in a list? Say 

     findSegmentBy (...) [2,2,3] [0,0,2,2,1,2,2,3,1,2,3] --> 

                                 ([0,0,2,2,1], [2,2,3,1,2,3])

I have heard, an efficient algorithm (for large lists) for this 
is not so simple.

-----------------
Serge Mechveliani
mechvel@botik.ru
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

------_=_NextPart_001_01C1F4B4.B78BAF00
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Diso-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2653.12">
<TITLE>RE: finding sublist</TITLE>
</HEAD>
<BODY>

<P><FONT SIZE=3D2>See Dijkstra's 'Discipline of Programming' for an o(M =
+ N) algorithm - naive approches are o(MN) where M and N are the length =
of the list and substring respectively.</FONT></P>

<P><FONT SIZE=3D2>Essentially the algorithm calculates a sort of =
autocorrelation table of the substring, showing where to resume the =
search after a failed match.&nbsp; For example, if you're matching the =
substring [1,2,3,4], and fail on the last comparison, you already know =
that there's no point in advancing one element and attempting another =
match - you can actually start again at the element that failed.&nbsp; =
When matching the string [1,2,1,3], you can resume at the current =
element of the main string, and the second element of the search =
string.</FONT></P>

<P><FONT SIZE=3D2>How you would do this in a functional implementation =
is another question - Dijkstra's example is comparing two arrays, and =
there may be inefficiencies translating it to a list-based =
implementation.</FONT></P>

<P><FONT SIZE=3D2>Cheers</FONT>
</P>

<P><FONT SIZE=3D2>-----Original Message-----</FONT>
<BR><FONT SIZE=3D2>From: Serge D. Mechveliani [<A =
HREF=3D"mailto:mechvel@botik.ru">mailto:mechvel@botik.ru</A>]</FONT>
<BR><FONT SIZE=3D2>Sent: Thursday, 2 May 2002 17:37</FONT>
<BR><FONT SIZE=3D2>To: haskell@haskell.org</FONT>
<BR><FONT SIZE=3D2>Subject: finding sublist</FONT>
</P>
<BR>

<P><FONT SIZE=3D2>Thanks to people who helped me with the task</FONT>
</P>

<P><FONT SIZE=3D2>&gt;&gt; Import two space separated columns of =
integers from file.</FONT>
<BR><FONT SIZE=3D2>&nbsp;</FONT>
<BR><FONT SIZE=3D2>Claus Reinke &lt;claus.reinke@talk21.com&gt; =
recommends to exploit `lines'.</FONT>
</P>

<P><FONT SIZE=3D2>Indeed, it bocomes shorter now:</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp; main =3D readFile &quot;data&quot; &gt;&gt;=3D =
(putStr . show . twoIntLists)</FONT>
<BR><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp; where</FONT>
<BR><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp; twoIntLists str =3D case&nbsp; =
span (not . null) $ dropWhile null $ lines str</FONT>
<BR><FONT =
SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
of</FONT>
<BR><FONT =
SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp; (lns, lns') -&gt; (readInts lns, readInts lns')</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp; readInts =3D map (\ str -&gt; read =
str :: Integer) . dropWhile null</FONT>
</P>
<BR>

<P><FONT SIZE=3D2>Another question:</FONT>
<BR><FONT SIZE=3D2>has the Haskell Standard library a function for such =
a usable task</FONT>
<BR><FONT SIZE=3D2>as finding of occurence of a segment in a list? Say =
</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp; findSegmentBy (...) [2,2,3] =
[0,0,2,2,1,2,2,3,1,2,3] --&gt; </FONT>
</P>

<P><FONT =
SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ([0,0,2,2,1], =
[2,2,3,1,2,3])</FONT>
</P>

<P><FONT SIZE=3D2>I have heard, an efficient algorithm (for large =
lists) for this </FONT>
<BR><FONT SIZE=3D2>is not so simple.</FONT>
</P>

<P><FONT SIZE=3D2>-----------------</FONT>
<BR><FONT SIZE=3D2>Serge Mechveliani</FONT>
<BR><FONT SIZE=3D2>mechvel@botik.ru</FONT>
<BR><FONT =
SIZE=3D2>_______________________________________________</FONT>
<BR><FONT SIZE=3D2>Haskell mailing list</FONT>
<BR><FONT SIZE=3D2>Haskell@haskell.org</FONT>
<BR><FONT SIZE=3D2><A =
HREF=3D"http://www.haskell.org/mailman/listinfo/haskell" =
TARGET=3D"_blank">http://www.haskell.org/mailman/listinfo/haskell</A></F=
ONT>
</P>

</BODY>
</HTML>
------_=_NextPart_001_01C1F4B4.B78BAF00--