<div dir="ltr">I have never done it, but I think you can make GDB count the times a breakpoint is hit using conditional breakpoints.  Someone else may know of better tools.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jul 25, 2020 at 2:04 AM Compl Yue <<a href="mailto:compl.yue@icloud.com">compl.yue@icloud.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>Shame on me, I have no experience with `perf` either; I'll learn
      these essential tools soon and put them to good use.</p>
    <p>It's great to learn how `orElse` actually works; I was confused
      about why so few retries were captured, and now I know.
      So that little trick should definitely be removed before going
      to production, as it does little useful work at excessive cost. I
      put it there to help me understand the internal workings of stm,
      and now I have even better knowledge ;-)<br>
    </p>
    <p>I think a debugger will trap every single abort; isn't that
      annoying when many aborts occur? If I'd like to count the
      number of aborts, ideally accounted per service endpoint, time
      period, source module etc., are there some tricks for that?</p>
    <p>Thanks with best regards,</p>
    <p>Compl</p>
    <p><br>
    </p>
    <div>On 2020/7/25 上午2:02, Ryan Yates wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">To be clear, I was trying to refer to Linux `perf`
        [^1].  Sampling based profiling can do a good job with
        concurrent and parallel programs where other methods are
        problematic.  For instance,
        <div> changing the size of heap objects can drastically change
          cache performance and completely different behavior can show
          up.</div>
        <div><br>
        </div>
        <div>[^1]: <a href="https://en.wikipedia.org/wiki/Perf_(Linux)" target="_blank">https://en.wikipedia.org/wiki/Perf_(Linux)</a></div>
        <div><br>
        </div>
        <div>The spinning in `readTVar` should always be very short and
          it typically shows up as intensive CPU use, though it may not
          be high energy use with `pause` in the loop on x86 (looks like
          we don't have it [^2], I thought we did, but maybe that was
          only in some of my code... )</div>
        <div><br>
        </div>
        <div>[^2]: <a href="https://github.com/ghc/ghc/blob/master/rts/STM.c#L1275" target="_blank">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1275</a> <br>
        </div>
        <div><br>
        </div>
        <div>All that to say, I doubt that you are spending much time
          spinning (but it would certainly be interesting to know if you
          are!  You would see `perf` attribute a large amount of time to
          `read_current_value`).  The amount of code to execute for
          commit (the time when locks are held) is always much shorter
          than it takes to execute the transaction body.  As you add
          more conflicting threads this gets worse of course as commits
          sequence.</div>
        <div><br>
        </div>
        <div>The code you have will count commits of executions of
          `retry`.  Note that `retry` is a user-level idea; that is, you
          are counting user-level *explicit* retries.  This is different
          from a transaction failing to commit and starting again;
          those restarts are invisible to the user.  Also, using your trace will
          convert `retry` from the efficient wake-on-write
          implementation to an active retry that will always attempt
          again.  We don't have cheap logging of transaction aborts in
          GHC, but I have built such logging in my work.  You can
          observe these aborts with a debugger by looking for execution
          of this line:</div>
        <div><br>
        </div>
        <div><a href="https://github.com/ghc/ghc/blob/master/rts/STM.c#L1123" target="_blank">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1123</a></div>
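As a rough, untested sketch of the debugger approach (assumes an RTS built with debug symbols for `rts/STM.c`; `<pid>` is a placeholder for the server process id), GDB can count hits of that line without stopping each time:

```
# attach to the running process
$ gdb -p <pid>
(gdb) break STM.c:1123
# don't actually stop at the breakpoint, just keep counting hits
(gdb) ignore 1 1000000
(gdb) continue
# ... later, interrupt with Ctrl-C and read the hit count
(gdb) info breakpoints
```

`info breakpoints` reports "breakpoint already hit N times", which gives a crude global abort count, though not broken down per endpoint or module.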
        <div><br>
        </div>
        <div>Ryan </div>
        <div><br>
        </div>
        <div><br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Fri, Jul 24, 2020 at 12:35
          PM Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank">compl.yue@icloud.com</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p>I'm not familiar with profiling GHC yet; I may need more
              time to get proficient with it.</p>
            <p>And a bit more detail on my test workload, for
              diagnosis: the db clients are Python processes from a
              cluster of worker nodes, consulting the db server to
              register paths for data files, under a data dir within
              a shared filesystem, then mmap'ing those data files and filling
              in the actual array data. So the db server doesn't have much
              computation to perform, but puts each data file path into a
              global index, which at the same time validates its uniqueness.
              As there are many client processes trying to insert
              meta data records concurrently, with my naive
              implementation the global index's TVar will almost always
              be held locked by one client after another, from a queue
              that never falls empty.</p>
            <p>So if `readTVar` spins while waiting, I'd expect the
              threads to actually show high CPU utilization, because
              at any instant of time, all threads except the committing
              one would be doing exactly that.<br>
            </p>
            <p>And I have something in my code to track STM retries, like
              this:</p>
            <p>```</p>
            <pre style="color:rgb(212,212,212);background-color:rgb(30,30,30)">
    -- blocking wait not expected, track stm retries explicitly
    trackSTM :: Int -&gt; IO (Either () a)
    trackSTM !rtc = do
      when -- todo increase the threshold of reporting?
           (rtc &gt; 0) $ do
        -- trace out the retries so the end users can be aware of them
        tid &lt;- myThreadId
        trace
            (  "🔙\n"
            &lt;&gt; show callCtx
            &lt;&gt; "🌀 "
            &lt;&gt; show tid
            &lt;&gt; " stm retry #"
            &lt;&gt; show rtc
            )
          $ return ()
      atomically ((Just &lt;$&gt; stmJob) `orElse` return Nothing) &gt;&gt;= \case
        Nothing -&gt; -- stm failed, do a tracked retry
          trackSTM (rtc + 1)
        Just ... -&gt; ...
</pre>
            <p>```</p>
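(A minimal, self-contained illustration of what this `` `orElse` return Nothing `` wrapper does, with hypothetical names: a transaction that would otherwise block in `retry` instead commits immediately with `Nothing`.)

```haskell
import Control.Concurrent.STM

-- poll a transaction once: Just on success, Nothing where a plain
-- `atomically` would block waiting for a wakeup
pollSTM :: STM a -> IO (Maybe a)
pollSTM stm = atomically ((Just <$> stm) `orElse` return Nothing)

main :: IO ()
main = do
  flag <- newTVarIO False
  -- retries (via `check`, which is `retry` when False) while the flag is unset
  let waitFlag = readTVar flag >>= check >> readTVar flag
  r1 <- pollSTM waitFlag              -- Nothing: no blocking wait happens
  atomically (writeTVar flag True)
  r2 <- pollSTM waitFlag              -- Just True: the transaction now succeeds
  print (r1, r2)
```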
            <p>No such trace msg fires during my test, neither in
              single-thread runs, nor in runs under pressure. I'm sure this
              tracing mechanism works, as I can see such traces fire in
              other cases: e.g. posting a TMVar to a TQueue for some other
              thread to fill, then reading the result out; if these 2
              ops are composed into a single tx, then of course it's an
              infinite retry loop, and a sequence of such msgs gets
              logged with an ever-increasing rtc #.<br>
            </p>
            <p>So I believe no retry has ever been triggered.</p>
            <p>What can be going on there?</p>
            <p><br>
            </p>
            <div>On 2020/7/24 下午11:46, Ryan Yates wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr">
                  <div>> Then to explain the low CPU utilization
                    (~10%), am I right to understand it as that upon
                    reading a TVar locked by another committing tx, a
                    lightweight thread will put itself into `waiting
                    STM` and descheduled state, so the CPUs can only
                    stay idle as not so many threads are willing to
                    proceed?<br>
                  </div>
                  <div><br>
                  </div>
                  <div>Since the commit happens in finite steps, the
                    expectation is that the lock will be released very
                    soon.  Given this when the body of a transaction
                    executes `readTVar` it spins (active CPU!) until the
                    `TVar` is observed unlocked.  If a lock is observed
                    while committing, it immediately starts the
                    transaction again from the beginning.  To get the
                    behavior of suspending a transaction you have to
                    successfully commit a transaction that executed
                    `retry`.  Then the transaction is put on the wakeup
                    lists of its read set and subsequent commits will
                    wake it up if its write set overlaps.</div>
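A minimal sketch of that suspend-and-wake behavior (a hypothetical example, not code from this thread): the reader commits a transaction that executed `retry`, parks on the `TVar`'s wakeup list, and is woken by a later commit that writes that `TVar`:

```haskell
import Control.Concurrent
import Control.Concurrent.STM

main :: IO ()
main = do
  box  <- newTVarIO (Nothing :: Maybe Int)
  done <- newEmptyMVar
  -- reader: `retry` while the box is empty; after committing the
  -- `retry` it is parked on the box's wakeup list, not spinning
  _ <- forkIO $ do
    v <- atomically (readTVar box >>= maybe retry return)
    putMVar done v
  threadDelay 100000                    -- give the reader time to park
  atomically (writeTVar box (Just 42))  -- this commit wakes the reader
  takeMVar done >>= print
```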
                  <div><br>
                  </div>
                  <div>I don't think any of these things would explain
                    low CPU utilization.  You could try running with
                    `perf` and see if lots of time is spent in some
                    recognizable part of the RTS.</div>
                  <div><br>
                  </div>
                  <div>Ryan</div>
                </div>
                <div><br>
                </div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Fri, Jul 24, 2020
                    at 11:22 AM Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank">compl.yue@icloud.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p>Thanks very much for the insightful information
                        Ryan! I'm glad my suspect was wrong about the
                        Haskell scheduler:<br>
                      </p>
                      <p>> The Haskell capability that is
                        committing a transaction will not yield to
                        another Haskell thread while it is doing the
                        commit.  The OS thread may be preempted, but
                        once commit starts the haskell scheduler is not
                        invoked until after locks are released.</p>
                      <div>So best effort has already been made in GHC,
                        and I just need to cooperate better with its
                        design. Then, to explain the low CPU utilization
                        (~10%), am I right to understand it as: upon
                        reading a TVar locked by another committing tx,
                        a lightweight thread will put itself into a
                        `waiting STM`, descheduled state, so the CPUs
                        can only stay idle as not many threads are
                        willing to proceed?<br>
                      </div>
                      <div><br>
                      </div>
                      <div>Anyway, I see light in better data
                        structures to improve my situation; let me try
                        them and report back. Actually I later changed
                        `TVar (HashMap k v)` to `TVar (HashMap k
                        Int)`, with the `Int` being an array index into
                        `TVar (Vector (TVar (Maybe v)))`, in pursuit of
                        insertion-order-preserving semantics for dict
                        entries (like those in Python 3.7+); so it's
                        very hopeful to incorporate stm-containers' Map
                        or ttrie to approach freedom from contention.<br>
                      </div>
                      <p>Thanks with regards,</p>
                      <p>Compl</p>
                      <p><br>
                      </p>
                      <div>On 2020/7/24 下午10:03, Ryan Yates wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>Hi Compl,</div>
                          <div><br>
                          </div>
                          <div>Having a pool of transaction-processing
                            threads can be helpful in a certain way.  If
                            the body of the transaction takes more time
                            to execute than the Haskell thread is
                            allowed and it yields, the suspended thread
                            won't get in the way of other threads, but
                            when it is rescheduled it will have a low
                            probability of success.  Even worse, it will
                            probably not discover that it is doomed to
                            failure until commit time.  If transactions
                            are more likely to reach commit without
                            yielding, they are more likely to succeed.
                            If the transactions are not conflicting, it
                            doesn't make much difference other than
                            cache churn.</div>
                          <div><br>
                          </div>
                          <div>The Haskell capability that is
                            committing a transaction will not yield to
                            another Haskell thread while it is doing the
                            commit.  The OS thread may be preempted, but
                            once commit starts the haskell scheduler is
                            not invoked until after locks are released.</div>
                          <div><br>
                          </div>
                          <div>To get good performance from STM you must
                            pay attention to what TVars are involved in
                            a commit.  All STM systems are working under
                            the assumption of low contention, so you
                            want to minimize "false" conflicts
                            (conflicts that are not essential to the
                            computation).    Something like `TVar
                            (HashMap k v)` will work pretty well for a
                            low thread count, but every transaction that
                            writes to that structure will be in conflict
                            with every other transaction that accesses
                            it.  Pushing the `TVar` into the nodes of
                            the structure reduces the possibilities for
                            conflict, while increasing the amount of
                            bookkeeping STM has to do.  I would like to
                            reduce the cost of that bookkeeping using
                            better structures, but we need to do so
                            without harming performance in the low TVar
                            count case.  Right now it is optimized for
                            good cache performance with a handful of
                            TVars.</div>
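The contrast might be sketched like this (hypothetical code, simplified to a fixed key set; a growable fine-grained map is what stm-containers provides): with the coarse layout every writer conflicts on the one `TVar`, while pushing `TVar`s into the values lets writers to different keys commit without conflicting:

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

-- coarse: one TVar guards the whole map; any insert conflicts with
-- every other transaction touching the structure
type Coarse k v = TVar (Map.Map k v)

insertCoarse :: Ord k => Coarse k v -> k -> v -> STM ()
insertCoarse t k v = modifyTVar' t (Map.insert k v)

-- finer: the map of TVars is built once; updates to different keys
-- touch disjoint TVars, so they don't conflict (at the cost of more
-- TVars for STM to track)
type Fine k v = Map.Map k (TVar v)

updateFine :: Ord k => Fine k v -> k -> v -> STM ()
updateFine m k v = case Map.lookup k m of
  Just t  -> writeTVar t v
  Nothing -> return ()  -- fixed key set in this sketch

main :: IO ()
main = do
  fine <- traverse newTVarIO (Map.fromList [(1 :: Int, 0 :: Int), (2, 0)])
  atomically (updateFine fine 1 10)  -- these two commits touch
  atomically (updateFine fine 2 20)  -- disjoint TVars
  vals <- traverse readTVarIO fine
  print (Map.toList vals)
```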
                          <div><br>
                          </div>
                          <div>There is another way to play with
                            performance by moving work into and out of
                            the transaction body.  A transaction body
                            that executes quickly will reach commit
                            faster.  But it may be delaying work that
                            moves into another transaction.  Forcing
                            values at the right time can make a big
                            difference.</div>
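For instance (a hypothetical sketch), forcing an expensive value with `evaluate` before entering `atomically` keeps the work out of the transaction body, so a restarted commit doesn't redo it:

```haskell
import Control.Concurrent.STM
import Control.Exception (evaluate)

expensive :: Int -> Int
expensive n = sum [1 .. n]   -- stand-in for real work

main :: IO ()
main = do
  t <- newTVarIO 0
  -- force the value *outside* the transaction; if the transaction has
  -- to restart due to a conflict, the computation is not repeated, and
  -- the transaction body stays short
  v <- evaluate (expensive 1000000)
  atomically (writeTVar t v)
  readTVarIO t >>= print
```

The trade-off runs the other way too: if the result is only sometimes needed, forcing it eagerly moves wasted work outside the transaction.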
                          <div><br>
                          </div>
                          <div>Ryan</div>
                          <div dir="ltr"><br>
                          </div>
                          <div dir="ltr">On Fri, Jul 24, 2020 at 2:14 AM
                            Compl Yue via Haskell-Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank">haskell-cafe@haskell.org</a>>
                            wrote:<br>
                          </div>
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                              <div>
                                <p>Thanks Chris, I confess I haven't paid
                                  enough attention to STM-specialized
                                  container libraries so far. I skimmed
                                  through the descriptions of
                                  stm-containers and ttrie, and feel
                                  they would definitely improve my
                                  code's performance, provided I limit the
                                  server's parallelism to hardware
                                  capabilities. That may be because I'm
                                  still prototyping the api and
                                  infrastructure for correctness, so
                                  even `TVar (HashMap k v)` performs
                                  okay for me at the moment, though only at
                                  low contention (surely there're plenty
                                  of CPU cycles to be optimized out in
                                  next steps). I model my data after a
                                  graph model, so most data, even most
                                  indices, are localized to nodes and
                                  edges; those can be manipulated
                                  without conflict, which is why I assumed
                                  I had a low-contention use case from
                                  the very beginning, until I found
                                  there are still (though minor) needs
                                  for global indices to guarantee global
                                  uniqueness. I feel hopeful about
                                  stm-containers/ttrie for implementing a
                                  more scalable global index data
                                  structure; thanks for the hint.</p>
                                <p>So an evident solution comes to my
                                  mind now: run the server with a
                                  pool of tx-processing threads,
                                  matching the number of CPU cores, with
                                  client RPC requests then queued to be
                                  executed on some thread from the pool.
                                  But I'm really fond of the
                                  M:N scheduler mechanism, which solves
                                  massive/dynamic concurrency so
                                  elegantly. I had some good results with
                                  Go in this regard, and I see GHC on par
                                  in doing this; I don't want to give up
                                  this enjoyable machinery.</p>
                                <p>But looking at the stm implementation
                                  in GHC, it seems written TVars are
                                  exclusively locked during commit of a
                                  tx. I suspect this is the culprit when
                                  there are large M lightweight threads
                                  scheduled upon a small N of hardware
                                  capabilities: when a
                                  lightweight thread yields control
                                  during an stm transaction commit, the
                                  TVars it locked will stay so until
                                  it's scheduled again (and again) till
                                  it can finish the commit. This way,
                                  descheduled threads could hold live
                                  threads back from progressing. I haven't
                                  gone into more detail there, but
                                  wonder if there can be some
                                  improvement for the GHC RTS to keep an stm
                                  committing thread from being descheduled,
                                  though seemingly that may impose more
                                  starvation potential; or could stm be
                                  improved to have its TVar locks
                                  preemptable when the owner trec/thread
                                  is in a descheduled state? Neither
                                  should be easy, but I'd really love
                                  massive lightweight threads doing STM
                                  practically well.<br>
                                </p>
                                <p>Best regards,</p>
                                <p>Compl</p>
                                <p><br>
                                </p>
                                <div>On 2020/7/24 上午12:57, Christopher
                                  Allen wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">It seems like you know
                                    how to run practical tests for
                                    tuning thread count and contention
                                    for throughput. Part of the reason
                                    you haven't gotten a super clear
                                    answer is "it depends." You give up
                                    fairness when you use STM instead of
                                    MVars or equivalent structures. That
                                    means a long running transaction
                                    might get stampeded by many small
                                    ones invalidating it over and over.
                                    The long-running transaction might
                                    never clear if the small
                                    transactions keep moving the cheese.
                                    I mention this because transaction
                                    runtime and size and count all
                                    affect throughput and latency. What
                                    might be ideal for one pattern of
                                    work might not be ideal for another.
                                    Optimizing for overall throughput
                                    might make the contention and
                                    fairness problems worse too. I've
                                    done practical tests to optimize
                                    this in the past, both for STM in
                                    Haskell and for RDBMS workloads.
                                    <div><br>
                                    </div>
                                    <div>The next step is sometimes
                                      figuring out whether you really
                                      need a data structure within a
                                      single STM container or if perhaps
                                      you can break up your STM
                                      container boundaries into zones or
                                      regions that roughly map onto
                                      update boundaries. That should
                                      make the transactions churn less.
                                      On the outside chance you do need
                                      to touch more than one container
                                      in a transaction, well, they
                                      compose.
                                      <div><br>
                                      </div>
                                      <div>e.g. <a href="https://hackage.haskell.org/package/stm-containers" target="_blank">https://hackage.haskell.org/package/stm-containers</a></div>
                                      <div><a href="https://hackage.haskell.org/package/ttrie" target="_blank">https://hackage.haskell.org/package/ttrie</a><br>
                                      </div>
                                      <div><br>
                                      </div>
                                      <div>It also sounds like
                                        your question bumps into
                                        Amdahl's Law a bit.</div>
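As a rough illustration (editorial aside, hypothetical numbers): Amdahl's law caps speedup by the serial fraction, so e.g. a 10% serial portion limits 10 cores to roughly 5.3x, which is in the ballpark of the ~5x on 10 capabilities reported up-thread:

```haskell
-- Amdahl's law: ideal speedup with parallel fraction p on n workers
amdahl :: Double -> Double -> Double
amdahl p n = 1 / ((1 - p) + p / n)

main :: IO ()
main = print (amdahl 0.9 10)
-- roughly 5.26; as n grows the speedup is bounded by 1/(1-p) = 10
```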
                                      <div><br>
                                      </div>
                                      <div>All else fails, stop using
                                        STM and find something more
                                        tuned to your problem space.</div>
                                      <div><br>
                                      </div>
                                      <div>Hope this helps,</div>
                                      <div>Chris Allen</div>
                                      <div><br>
                                      </div>
                                    </div>
                                  </div>
                                  <br>
                                  <div class="gmail_quote">
                                    <div dir="ltr" class="gmail_attr">On
                                      Thu, Jul 23, 2020 at 9:53 AM
                                      YueCompl via Haskell-Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank">haskell-cafe@haskell.org</a>>
                                      wrote:<br>
                                    </div>
                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                      <div>Hello Cafe,
                                        <div><br>
                                        </div>
                                        <div>I'm working on an in-memory
                                          database. In Client/Server
                                          mode I just let each connected
                                          client submit remote procedure
                                          calls, each running in its
                                          dedicated lightweight thread,
                                          modifying TVars in RAM per its
                                          business needs. Then, in case many
                                          clients connect concurrently
                                          and try to insert new data,
                                          if they trigger a global
                                          index (some TVar) update, the
                                          throughput drops
                                          drastically. I reduced the
                                          shared state to a simple int
                                          counter in a TVar and got the same
                                          symptom, while the parallelism
                                          feels okay when there's no hot
                                          TVar conflicting, or M is not
                                          much greater than N.</div>
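That shared-counter reduction can be reproduced in a few lines (a hypothetical microbenchmark, not the actual server): M threads all bumping one `TVar`, so every commit conflicts on the same hot spot:

```haskell
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  dones   <- mapM (const newEmptyMVar) [1 .. 20 :: Int]
  -- 20 lightweight threads, each incrementing the single hot TVar;
  -- run with e.g. +RTS -N10 for the M=20 over N=10 shape described above
  forM_ dones $ \d -> forkIO $ do
    replicateM_ 1000 (atomically (modifyTVar' counter (+ 1)))
    putMVar d ()
  mapM_ takeMVar dones
  -- correct result, but every commit conflicted with every other
  readTVarIO counter >>= print
```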
                                        <div><br>
                                        </div>
                                        <div>As an empirical test
                                          workload, I have a `+RTS -N10`
                                          server process; it handles 10
                                          concurrent clients okay, with
                                          ~5x single-thread
                                          throughput; but in handling 20
                                          concurrent clients, each of
                                          the 10 CPUs can only be driven
                                          to ~10% utilization, and the
                                          throughput seems even worse
                                          than single-threaded. More
                                          clients can even drive it into
                                          thrashing without making much
                                          progress.</div>
                                        <div><br>
                                        </div>
                                        <div>After reading [1] I can
                                          understand that pure STM
                                          doesn't scale well, and I see
                                          it suggests [7] as attractive,
                                          planned future work in
                                          that direction.</div>
                                        <div><br>
                                        </div>
                                        <div>But I can't find any
                                          library or framework
                                          addressing the large-M-over-small-N
                                          scenario: [1] experimented
                                          with a fixed N of parallelism,
                                          and [7] is rather theoretical
                                          for my empirical needs.</div>
                                        <div><br>
                                        </div>
                                        <div>Can you direct me to an
                                          available library implementing
                                          the methodology proposed in
                                          [7], or other ways of tackling
                                          this problem?</div>
                                        <div><br>
                                        </div>
                                        <div>I think the most difficult
                                          part is that a transaction
                                          should commit with global
                                          indices (possibly with unique
                                          constraints) updated
                                          atomically, and roll back on
                                          any constraint violation, i.e.
                                          transactions have to cover
                                          global state like indices.
                                          The other problems seem more
                                          tractable than this.</div>
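In plain STM that requirement can at least be expressed directly (a hedged sketch; `Record`, the index shape, and the uniqueness rule here are invented for illustration, not from any existing library). Because the constraint check and the index write happen inside one `atomically`, a `throwSTM` aborts the whole transaction and every TVar write in it is discarded:

```haskell
import Control.Concurrent.STM
import Control.Exception (Exception, try)
import qualified Data.Map.Strict as Map

-- Hypothetical record type with a unique key
data Record = Record { recKey :: String, recVal :: Int } deriving Show

newtype UniqueViolation = UniqueViolation String deriving Show
instance Exception UniqueViolation

-- The global index: one shared TVar covering all records
type Index = TVar (Map.Map String Record)

-- Insert a record and update the global index atomically.
-- throwSTM aborts the transaction, rolling back all of its writes.
insertRecord :: Index -> Record -> STM ()
insertRecord idx r = do
  m <- readTVar idx
  case Map.lookup (recKey r) m of
    Just _  -> throwSTM (UniqueViolation (recKey r))
    Nothing -> writeTVar idx (Map.insert (recKey r) r m)

main :: IO ()
main = do
  idx <- newTVarIO Map.empty
  atomically $ insertRecord idx (Record "a" 1)
  -- a second insert with the same key violates the constraint
  res <- try (atomically $ insertRecord idx (Record "a" 2))
  case res of
    Left (UniqueViolation k) -> putStrLn ("duplicate key: " ++ k)
    Right ()                 -> putStrLn "inserted"
  m <- readTVarIO idx
  print (Map.size m)  -- the failed transaction left the index untouched
```

This gives the atomic-commit-or-rollback semantics, but of course every transaction still reads and writes the one index TVar, which is exactly the hot-spot problem above; [7]'s boosting is about escaping that.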
                                        <div><br>
                                        </div>
                                        <div>Specifically, [7] states:</div>
                                        <div><br>
                                        </div>
                                        <div>> It must be emphasized
                                          that all of the mechanisms we
                                          deploy originate, in one form
                                          or another, in the database
                                          literature from the 70s and
                                          80s. Our contribution is to
                                          adapt these techniques to
                                          software transactional memory,
                                          providing more effective
                                          solutions to important STM
                                          problems than prior proposals.</div>
                                        <div><br>
                                        </div>
                                        <div>I wonder whether any
                                          STM-based library has
                                          simplified those techniques so
                                          they can be composed right
                                          away? I don't really want to
                                          implement those mechanisms
                                          myself, rebuilding many wheels
                                          from scratch.</div>
                                        <div><br>
                                        </div>
                                        <div>Best regards,</div>
                                        <div>Compl</div>
                                        <div><br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <div>[1] Comparing the
                                          performance of concurrent
                                          linked-list implementations in
                                          Haskell </div>
                                        <div><a href="https://simonmar.github.io/bib/papers/concurrent-data.pdf" target="_blank">https://simonmar.github.io/bib/papers/concurrent-data.pdf</a></div>
                                        <div><br>
                                        </div>
                                        <div>[7] M. Herlihy and E.
                                          Koskinen. Transactional
                                          boosting: a methodology for
                                          highly-concurrent
                                          transactional objects. In
                                          Proc. of PPoPP ’08, pages
                                          207–216. ACM Press, 2008.</div>
                                        <div><a href="https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf" target="_blank">https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf</a></div>
                                        <div><br>
                                        </div>
                                      </div>
_______________________________________________<br>
                                      Haskell-Cafe mailing list<br>
                                      To (un)subscribe, modify options
                                      or view archives go to:<br>
                                      <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br>
                                      Only members subscribed via the
                                      mailman list are allowed to post.</blockquote>
                                  </div>
                                  <br clear="all">
                                  <div><br>
                                  </div>
                                  -- <br>
                                  <div dir="ltr">
                                    <div dir="ltr">
                                      <div>
                                        <div dir="ltr">
                                          <div dir="ltr">Chris Allen<br>
                                            <div><span style="font-size:12.8px">Currently
                                                working on </span><a href="http://haskellbook.com" target="_blank">http://haskellbook.com</a></div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
</blockquote>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>