[Haskell-beginners] FW: Beginners Digest, Vol 91, Issue 26

pmcilroy at gmail.com pmcilroy at gmail.com
Fri Jan 22 07:34:34 UTC 2016


Don’t worry about the time to append rather than prepend; it is not “exponential”, nor even cubic.
You are building a list of length N inside an O(N^2) algorithm. Appending to that list adds another small O(N^2) step, which is dominated by the main algorithm (also O(N^2)), which both takes longer and evaluates vastly more values (failures as well as successes).
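To make the two list-building strategies concrete, here is a hypothetical sketch (the names `collectAppend` and `collectCons` are made up for illustration, not taken from the original program). The first accumulates with `ys ++ [y]`, which retraverses the accumulated list on every append, so building a result of length n costs O(n^2) in total; the second recurses on the right-hand side of `(:)` as Rein suggests, which is linear and lazy.

```haskell
-- Accumulator with append: output is in discovery order, but each
-- (++ [x]) walks the whole accumulated list, giving O(n^2) total
-- cost for a result of length n. Nothing is produced until the
-- entire input has been consumed.
collectAppend :: (a -> Bool) -> [a] -> [a]
collectAppend p = go []
  where
    go acc []     = acc
    go acc (x:xs)
      | p x       = go (acc ++ [x]) xs
      | otherwise = go acc xs

-- Guarded recursion: recurse on the right of (:). Linear time,
-- output still in discovery order, and lazy, so consumers like
-- `take n` can stop early.
collectCons :: (a -> Bool) -> [a] -> [a]
collectCons _ []     = []
collectCons p (x:xs)
  | p x       = x : collectCons p xs
  | otherwise = collectCons p xs
```

Because `collectCons` yields each element as soon as it is found, `take 3 (collectCons even [1..])` terminates even on an infinite input, whereas the append-based version never would.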
Message: 3
Date: Thu, 21 Jan 2016 19:05:17 +0000
From: Rein Henrichs <rein.henrichs at gmail.com>
To: The Haskell-Beginners Mailing List - Discussion of primarily
	beginner-level topics related to Haskell <beginners at haskell.org>,
	masonmlai at gmail.com
Subject: Re: [Haskell-beginners] program not running lazily
Message-ID:
	<CAJp6G8zemeHpYwZ36ShMJ-m_BishVnUJpoqEmK=DosKHRVaqeg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

s/not/note, sorry

On Thu, Jan 21, 2016 at 10:42 AM Rein Henrichs <rein.henrichs at gmail.com>
wrote:

> But not that doing so will cause the program to have an exponential
> runtime as each new ys must be repeatedly traversed to append a [y].. The
> alternative is to *unfold* your list by recursing on the right hand side of
> the (:) to add new elements.
>
> On Thu, Jan 21, 2016 at 5:43 AM Doug McIlroy <doug at cs.dartmouth.edu>
> wrote:
>
>> Each time you find another good 9-mer, you add it to
>> the head of the list. This means that the ultimate
>> list will be in reverse order of discovery: the first element
>> to be printed is the last one to be found.  To get
>> output in the order it was discovered, build the
>> output by ys++[y] rather than y:ys.
>> _______________________________________________
>> Beginners mailing list
>> Beginners at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
>>
>

------------------------------

Message: 4
Date: Thu, 21 Jan 2016 11:34:35 -0800
From: Mason Lai <masonmlai at gmail.com>
To: Rein Henrichs <rein.henrichs at gmail.com>, doug at cs.dartmouth.edu
Cc: The Haskell-Beginners Mailing List - Discussion of primarily
	beginner-level topics related to Haskell <beginners at haskell.org>
Subject: Re: [Haskell-beginners] program not running lazily
Message-ID:
	<CAH1iVpdf1tNsdHakTsM-1u4=2hu6GHpb47=gj+MZuxybUtb2Mw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I've replaced the "if minimum < 3" line with an "any (< 3)" line. This has
sped up performance enough. (To take the minimum, I have to calculate the
Lev. distance of every 9-mer, but I really only care whether any element has
a Lev. distance less than three, so I can stop at the first one I find.)
The rest of this discussion is purely for fun.
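For illustration only (the function name below is made up, not from Mason's program), the difference is that `any` short-circuits while `minimum` must force the entire list before the comparison can happen:

```haskell
-- `minimum ds < 3` evaluates every distance before comparing (and is
-- an error on an empty list); `any (< 3) ds` returns True at the
-- first distance below 3 and never looks at the rest.
hasNearMatch :: [Int] -> Bool
hasNearMatch = any (< 3)
```

Laziness makes the short-circuit observable: `hasNearMatch (2 : undefined)` evaluates to True, because the tail of the list is never demanded.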

I've swapped "y:ys" for "ys ++ [y]", and while the output order is reversed,
I still don't appear to be able to take the first n elements. I haven't timed
how long the program takes now to see whether it blows up. Rein, I don't quite
understand your answer; I may need you to be more explicit if I can't figure
this out. Part of what confuses me is that the recursion takes place in merge,
while the (:) is in addInto. Your comment has given me an idea, though, so I'm
going to play with this in the afternoon and send an update tonight.

Thanks for looking at this!
-- Mason

On Thu, Jan 21, 2016 at 11:05 AM, Rein Henrichs <rein.henrichs at gmail.com>
wrote:

> s/not/note, sorry
>
> On Thu, Jan 21, 2016 at 10:42 AM Rein Henrichs <rein.henrichs at gmail.com>
> wrote:
>
>> But not that doing so will cause the program to have an exponential
>> runtime as each new ys must be repeatedly traversed to append a [y].. The
>> alternative is to *unfold* your list by recursing on the right hand side of
>> the (:) to add new elements.
>>
>> On Thu, Jan 21, 2016 at 5:43 AM Doug McIlroy <doug at cs.dartmouth.edu>
>> wrote:
>>
>>> Each time you find another good 9-mer, you add it to
>>> the head of the list. This means that the ultimate
>>> list will be in reverse order of discovery: the first element
>>> to be printed is the last one to be found.  To get
>>> output in the order it was discovered, build the
>>> output by ys++[y] rather than y:ys.
>>>
>>

------------------------------

Message: 5
Date: Fri, 22 Jan 2016 17:33:33 +1100
From: Thomas Koster <tkoster at gmail.com>
To: beginners at haskell.org
Subject: [Haskell-beginners] Increasing capabilities dramatically
	increases execution time
Message-ID:
	<CAG1wH7DDYULMComW8QNntU4s6hcrhU53TAFywKfub+QqJi+BRQ at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi friends,

I have encountered a situation in a concurrent program where I see an
unexpected, dramatic increase in execution time when I add additional
capabilities. On a multi-core CPU, "-N1" is fastest by an order of
magnitude and the program increasingly slows for an increasing number
of capabilities (still fewer than the number of real cores, of
course).

My question is, why is this so and what can I do to improve this?

Some details:

I have a shared data structure which I call a "store", which is
basically a strict HashMap String Value. This structure is shared
between Haskell threads using an IORef/MVar pair:

data Store = Store (IORef (HashMap Key Value)) (MVar ())

Focus on the IORef/MVar pair - the HashMap itself is not important. My
intention is that read-only transactions can take the pure value
straight from the IORef without blocking writers or other readers,
whereas mutating transactions (those that will update the IORef when
they complete) are serialized using the MVar. Alternatively, you can
regard the read-only transaction as working with a private snapshot of
the store that is discarded after it completes.

It is my hope that this technique will allow my program to exploit a
multi-core CPU by running several read-only transactions and at most
one mutating transaction in parallel. I am also assuming that this
technique is marginally more efficient than STM for this use case,
especially for write-heavy loads where I am assuming STM would waste
some time on retries (I did not test this).

-- | Execute a read-only transaction that returns a value.
withStore :: Store -> (HashMap Key Value -> a) -> IO a
withStore (Store ioref _) f = do
  store <- readIORef ioref
  return (f store)

-- | Execute a transaction that updates the store and returns a value.
modifyStore :: Store -> (HashMap Key Value -> (HashMap Key Value, a)) -> IO a
modifyStore (Store ioref mvar) f =
  modifyMVar mvar $ \ z -> do
    store <- readIORef ioref
    let (store', x) = f store
    store' `seq` writeIORef ioref store'
    return (z, x)

Stop me right here if this is a silly way to do this.
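For concreteness, here is one way the elided counter transaction might look on top of this design. This is a hypothetical sketch: `incrCounter` and `newStore` are my names, not Thomas's, and it uses Data.Map.Strict as a stand-in for the unordered-containers HashMap so the snippet needs only base and containers.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as M  -- stand-in for HashMap

type Key = String
type Value = Int

-- Same shape as in the message: pure snapshot in the IORef,
-- writers serialized through the MVar.
data Store = Store (IORef (M.Map Key Value)) (MVar ())

newStore :: IO Store
newStore = Store <$> newIORef M.empty <*> newMVar ()

-- Read-only transaction: no locking, reads the current snapshot.
withStore :: Store -> (M.Map Key Value -> a) -> IO a
withStore (Store ioref _) f = f <$> readIORef ioref

-- Mutating transaction: hold the MVar while swapping the snapshot.
modifyStore :: Store -> (M.Map Key Value -> (M.Map Key Value, a)) -> IO a
modifyStore (Store ioref mvar) f =
  modifyMVar mvar $ \z -> do
    store <- readIORef ioref
    let (store', x) = f store
    store' `seq` writeIORef ioref store'
    return (z, x)

-- Hypothetical transaction: bump a named counter, return its new value.
incrCounter :: Store -> Key -> IO Int
incrCounter store k = modifyStore store $ \m ->
  let n = 1 + M.findWithDefault 0 k m
  in (M.insert k n m, n)
```

Note that `store' \`seq\` writeIORef ...`, as in the original, forces the new map only to weak head normal form; whether the values inside are forced depends on the map's own strictness.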

I have a test that forks 4 threads that each increment a counter in
the store 100000 times, with the expectation that the final answer is
400000. I use the "async" package for this. This test is not
necessarily pathological. I expect simple operations like incrementing
counters and concatenating text to be typical transactions.

  threads <- replicateM 4 $ async $
               replicateM_ 100000 $
                 modifyStore store (...increment a counter...)
  forM_ threads wait

In this test, while any thread is busy modifying the store, the other
three are blocked on the empty MVar, irrespective of how many
capabilities I have. Therefore, I expect the execution time with -N4
to be similar to -N1. I expect the only difference to be attributable
to the runtime's scheduling overheads, which I assume are relatively
insignificant. Instead, I see a dramatic increase in execution time
with increasing capabilities (typical measurements below).

By the way, I say I assume scheduling overheads are "relatively
insignificant" compared to incrementing the counter because in my
program, the counter is incremented by interpreting an imperative
EDSL, which I assume is relatively inefficient compared to e.g.
"succ", but perhaps I am mistaken. In a way, my whole question
probably centres around this assumption.

I would be grateful if anyone can illuminate the reason for this
dramatic increase in execution time when I increase the number of
capabilities, and offer any hints on how I can mitigate it.

I am using GHC 7.10.2 and compile with -O -threaded. All library
versions are as at Stackage LTS 3.2.

A typical measurement, with -N1:

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       516 colls,     0 par    0.000s   0.003s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0003s    0.0003s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.136s  (  0.139s elapsed)
  GC      time    0.000s  (  0.003s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    0.136s  (  0.144s elapsed)

With -N2:

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       334 colls,   334 par    0.012s   0.006s     0.0000s    0.0001s
  Gen  1         2 colls,     1 par    0.000s   0.000s     0.0002s    0.0003s

  Parallel GC work balance: 39.33% (serial 0%, perfect 100%)

  TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    2.032s  (  2.456s elapsed)
  GC      time    0.012s  (  0.007s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.044s  (  2.465s elapsed)

With -N4:

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       133 colls,   133 par    0.032s   0.005s     0.0000s    0.0001s
  Gen  1         2 colls,     1 par    0.000s   0.001s     0.0003s    0.0003s

  Parallel GC work balance: 41.33% (serial 0%, perfect 100%)

  TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.003s elapsed)
  MUT     time    3.516s  (  4.431s elapsed)
  GC      time    0.032s  (  0.005s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    3.548s  (  4.439s elapsed)

Thanks,
Thomas Koster


------------------------------

Subject: Digest Footer

_______________________________________________
Beginners mailing list
Beginners at haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners


------------------------------

End of Beginners Digest, Vol 91, Issue 26
*****************************************


