<div dir="ltr">With Michael Snoyman's help, I rewrote the Conduit version of my
 application (without using stm-conduit). This was a large improvement: 
my first Conduit version read the entire file list into memory instead of 
streaming it, and I hadn't realized this. 

<p>I also increased the nursery size (the GHC runtime's allocation area for new objects).</p>

<p>My revised function ended up looking like this:</p>

<pre class="gmail-lang-hs gmail-prettyprint gmail-prettyprinted"><code><span class="gmail-kwd">module</span><span class="gmail-pln"> Search </span><span class="gmail-kwd">where</span><span class="gmail-pln">

</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Conduit               </span><span class="gmail-pun">((.|))</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln"> qualified Conduit               as C
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Control.Monad
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Control.Monad.IO.Class   </span><span class="gmail-pun">(</span><span class="gmail-pln">MonadIO</span><span class="gmail-pun">,</span><span class="gmail-pln"> liftIO</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Control.Monad.Trans.Resource </span><span class="gmail-pun">(</span><span class="gmail-pln">MonadResource</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln"> qualified Data.ByteString       as B
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Data.List             </span><span class="gmail-pun">(</span><span class="gmail-pln">isPrefixOf</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Data.Maybe            </span><span class="gmail-pun">(</span><span class="gmail-pln">fromJust</span><span class="gmail-pun">,</span><span class="gmail-pln"> isJust</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           System.Path.NameManip </span><span class="gmail-pun">(</span><span class="gmail-pln">guess_dotdot</span><span class="gmail-pun">,</span><span class="gmail-pln"> absolute_path</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           System.FilePath       </span><span class="gmail-pun">(</span><span class="gmail-pln">addTrailingPathSeparator</span><span class="gmail-pun">,</span><span class="gmail-pln"> normalise</span><span class="gmail-pun">)</span><span class="gmail-pln">
</span><span class="gmail-kwd">import</span><span class="gmail-pln">           System.Directory      </span><span class="gmail-pun">(</span><span class="gmail-pln">getHomeDirectory</span><span class="gmail-pun">)</span><span class="gmail-pln">

</span><span class="gmail-kwd">import</span><span class="gmail-pln">           Filters


sourceFilesFilter </span><span class="gmail-pun">::</span><span class="gmail-pln"> </span><span class="gmail-pun">(</span><span class="gmail-pln">MonadResource m</span><span class="gmail-pun">,</span><span class="gmail-pln"> MonadIO m</span><span class="gmail-pun">)</span><span class="gmail-pln"> </span><span class="gmail-pun">=></span><span class="gmail-pln"> ProjectFilter </span><span class="gmail-pun">-></span><span class="gmail-pln"> FilePath </span><span class="gmail-pun">-></span><span class="gmail-pln"> C.ConduitM </span><span class="gmail-pun">()</span><span class="gmail-pln"> String m </span><span class="gmail-pun">()</span><span class="gmail-pln">
sourceFilesFilter projFilter dirname' </span><span class="gmail-pun">=</span><span class="gmail-pln">
    C.sourceDirectoryDeep False dirname'
    </span><span class="gmail-pun">.|</span><span class="gmail-pln"> parseProject projFilter

parseProject </span><span class="gmail-pun">::</span><span class="gmail-pln"> </span><span class="gmail-pun">(</span><span class="gmail-pln">MonadResource m</span><span class="gmail-pun">,</span><span class="gmail-pln"> MonadIO m</span><span class="gmail-pun">)</span><span class="gmail-pln"> </span><span class="gmail-pun">=></span><span class="gmail-pln"> ProjectFilter </span><span class="gmail-pun">-></span><span class="gmail-pln"> C.ConduitM FilePath String m </span><span class="gmail-pun">()</span><span class="gmail-pln">
parseProject </span><span class="gmail-pun">(</span><span class="gmail-pln">ProjectFilter filterFunc</span><span class="gmail-pun">)</span><span class="gmail-pln"> </span><span class="gmail-pun">=</span><span class="gmail-pln"> </span><span class="gmail-kwd">do</span><span class="gmail-pln">
  C.awaitForever go
  </span><span class="gmail-kwd">where</span><span class="gmail-pln">
    go path' </span><span class="gmail-pun">=</span><span class="gmail-pln"> </span><span class="gmail-kwd">do</span><span class="gmail-pln">
      bytes </span><span class="gmail-pun"><-</span><span class="gmail-pln"> liftIO </span><span class="gmail-pun">$</span><span class="gmail-pln"> B.readFile path'
      </span><span class="gmail-kwd">let</span><span class="gmail-pln"> isProj </span><span class="gmail-pun">=</span><span class="gmail-pln"> validProject bytes
      when </span><span class="gmail-pun">(</span><span class="gmail-pln">isJust isProj</span><span class="gmail-pun">)</span><span class="gmail-pln"> </span><span class="gmail-pun">$</span><span class="gmail-pln"> </span><span class="gmail-kwd">do</span><span class="gmail-pln">
        </span><span class="gmail-kwd">let</span><span class="gmail-pln"> proj' </span><span class="gmail-pun">=</span><span class="gmail-pln"> fromJust isProj
        when </span><span class="gmail-pun">(</span><span class="gmail-pln">filterFunc proj'</span><span class="gmail-pun">)</span><span class="gmail-pln"> </span><span class="gmail-pun">$</span><span class="gmail-pln"> C.yield path'</span></code></pre>
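<p>One possible way to spread the read/parse/filter work across cores, sketched under stated assumptions and not taken from the program above. <code>parseProject</code> handles one file at a time, so this sketch groups incoming paths into batches (one path per capability) and checks each batch with <code>mapConcurrently</code> from the <code>async</code> package. <code>Project</code>, <code>ProjectFilter</code> and <code>validProject</code> are hypothetical stand-ins for the <code>Filters</code> module, and <code>batchesOf</code>, <code>checkFile</code> and <code>parseProjectConcurrently</code> are illustrative names. It only runs in parallel when built with <code>-threaded</code> and run with <code>+RTS -N</code>.</p>

<pre class="gmail-lang-hs"><code>-- Sketch: check files concurrently in batches.
-- Requires the conduit and async packages; build with -threaded, run with +RTS -N.
module Main (main) where

import           Conduit                  ((.|))
import qualified Conduit                  as C
import           Control.Concurrent       (getNumCapabilities)
import           Control.Concurrent.Async (mapConcurrently)
import           Control.Exception        (evaluate)
import           Control.Monad.IO.Class   (MonadIO, liftIO)
import qualified Data.ByteString          as B
import           Data.Maybe               (catMaybes)

-- Hypothetical stand-ins for the Filters module (not the real definitions).
newtype Project = Project B.ByteString
newtype ProjectFilter = ProjectFilter (Project -> Bool)

-- Hypothetical: the real validProject decodes JSON; this only rejects empty files.
validProject :: B.ByteString -> Maybe Project
validProject bytes
  | B.null bytes = Nothing
  | otherwise    = Just (Project bytes)

-- Group the incoming stream into lists of at most n elements.
batchesOf :: Monad m => Int -> C.ConduitM a [a] m ()
batchesOf n = loop
  where
    loop = do
      batch <- C.takeC n .| C.sinkList
      if null batch
        then pure ()
        else C.yield batch >> loop

-- Read, decode and filter one file. evaluate forces the Bool inside the
-- worker thread, so the parsing happens there rather than in the consumer.
checkFile :: ProjectFilter -> FilePath -> IO (Maybe FilePath)
checkFile (ProjectFilter filterFunc) path = do
  bytes <- B.readFile path
  keep  <- evaluate (maybe False filterFunc (validProject bytes))
  pure (if keep then Just path else Nothing)

-- Check each batch of paths concurrently and stream out the matching ones.
parseProjectConcurrently
  :: MonadIO m => Int -> ProjectFilter -> C.ConduitM FilePath FilePath m ()
parseProjectConcurrently batchSize projFilter =
  batchesOf batchSize .| C.awaitForever (\batch -> do
    results <- liftIO (mapConcurrently (checkFile projFilter) batch)
    C.yieldMany (catMaybes results))

main :: IO ()
main = do
  cores <- getNumCapabilities
  C.runConduitRes $
    C.sourceDirectoryDeep False "."
      .| parseProjectConcurrently cores (ProjectFilter (const True))
      .| C.mapM_C (liftIO . putStrLn)</code></pre>

<p>Each batch waits for its slowest file before the next batch starts. Larger batches keep cores busier, but they also hold more file contents in memory at once: with files of 5 to 10 MB, a batch of 64 could hold up to about 640 MB.</p>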

<p>My <code>main</code> just runs the conduit and prints the paths of the files that pass the filter:</p>

<pre class="gmail-lang-hs gmail-prettyprint gmail-prettyprinted"><code><span class="gmail-pln">mainStreamingConduit </span><span class="gmail-pun">::</span><span class="gmail-pln"> IO </span><span class="gmail-pun">()</span><span class="gmail-pln">
mainStreamingConduit </span><span class="gmail-pun">=</span><span class="gmail-pln"> </span><span class="gmail-kwd">do</span><span class="gmail-pln">
  options </span><span class="gmail-pun"><-</span><span class="gmail-pln"> getRecord </span><span class="gmail-str">"Search JSON Files"</span><span class="gmail-pln">
  </span><span class="gmail-kwd">let</span><span class="gmail-pln"> filterFunc </span><span class="gmail-pun">=</span><span class="gmail-pln"> makeProjectFilter options
  searchDir </span><span class="gmail-pun"><-</span><span class="gmail-pln"> absolutize </span><span class="gmail-pun">(</span><span class="gmail-pln">searchPath options</span><span class="gmail-pun">)</span><span class="gmail-pln">
  itExists </span><span class="gmail-pun"><-</span><span class="gmail-pln"> doesDirectoryExist searchDir
  </span><span class="gmail-kwd">case</span><span class="gmail-pln"> itExists </span><span class="gmail-kwd">of</span><span class="gmail-pln">
    False </span><span class="gmail-pun">-></span><span class="gmail-pln"> putStrLn </span><span class="gmail-str">"Search Directory does not exist"</span><span class="gmail-pln"> </span><span class="gmail-pun">>></span><span class="gmail-pln"> exitWith </span><span class="gmail-pun">(</span><span class="gmail-pln">ExitFailure </span><span class="gmail-lit">1</span><span class="gmail-pun">)</span><span class="gmail-pln">
    True </span><span class="gmail-pun">-></span><span class="gmail-pln"> C.runConduitRes </span><span class="gmail-pun">$</span><span class="gmail-pln"> sourceFilesFilter filterFunc searchDir </span><span class="gmail-pun">.|</span><span class="gmail-pln"> C.mapM_ </span><span class="gmail-pun">(</span><span class="gmail-pln">liftIO </span><span class="gmail-pun">.</span><span class="gmail-pln"> putStrLn</span><span class="gmail-pun">)</span></code></pre>

<p>I run it like this (typically without <code>-s</code>, which prints the runtime statistics):</p>

<pre class="gmail-lang-hs gmail-prettyprint gmail-prettyprinted"><code><span class="gmail-pln">stack exec search</span><span class="gmail-pun">-</span><span class="gmail-pln">json </span><span class="gmail-com">-- --searchPath $FILES --name NAME +RTS -s -A32m -n4m</span></code></pre>

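<p>Without the larger nursery, productivity is around 30%. With the flags above (<code>-A32m</code> sets a 32 MB allocation area per capability and <code>-n4m</code> divides it into 4 MB chunks), the statistics look like this:</p>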

<pre class="gmail-lang-hs gmail-prettyprint gmail-prettyprinted"><code><span class="gmail-pln">  </span><span class="gmail-lit">72</span><span class="gmail-pun">,</span><span class="gmail-lit">308</span><span class="gmail-pun">,</span><span class="gmail-lit">248</span><span class="gmail-pun">,</span><span class="gmail-lit">744</span><span class="gmail-pln"> bytes allocated </span><span class="gmail-kwd">in</span><span class="gmail-pln"> the heap
     </span><span class="gmail-lit">733</span><span class="gmail-pun">,</span><span class="gmail-lit">911</span><span class="gmail-pun">,</span><span class="gmail-lit">752</span><span class="gmail-pln"> bytes copied during GC
       </span><span class="gmail-lit">7</span><span class="gmail-pun">,</span><span class="gmail-lit">410</span><span class="gmail-pun">,</span><span class="gmail-lit">520</span><span class="gmail-pln"> bytes maximum residency </span><span class="gmail-pun">(</span><span class="gmail-lit">8</span><span class="gmail-pln"> sample</span><span class="gmail-pun">(</span><span class="gmail-pln">s</span><span class="gmail-pun">))</span><span class="gmail-pln">
         </span><span class="gmail-lit">863</span><span class="gmail-pun">,</span><span class="gmail-lit">480</span><span class="gmail-pln"> bytes maximum slop
             </span><span class="gmail-lit">187</span><span class="gmail-pln"> MB total memory </span><span class="gmail-kwd">in</span><span class="gmail-pln"> use </span><span class="gmail-pun">(</span><span class="gmail-lit">27</span><span class="gmail-pln"> MB lost due to fragmentation</span><span class="gmail-pun">)</span><span class="gmail-pln">

                                     Tot time </span><span class="gmail-pun">(</span><span class="gmail-pln">elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">  Avg pause  Max pause
  Gen  </span><span class="gmail-lit">0</span><span class="gmail-pln">       </span><span class="gmail-lit">580</span><span class="gmail-pln"> colls</span><span class="gmail-pun">,</span><span class="gmail-pln">   </span><span class="gmail-lit">580</span><span class="gmail-pln"> par    </span><span class="gmail-lit">2.731</span><span class="gmail-pln">s   </span><span class="gmail-lit">0.772</span><span class="gmail-pln">s     </span><span class="gmail-lit">0.0013</span><span class="gmail-pln">s    </span><span class="gmail-lit">0.0105</span><span class="gmail-pln">s
  Gen  </span><span class="gmail-lit">1</span><span class="gmail-pln">         </span><span class="gmail-lit">8</span><span class="gmail-pln"> colls</span><span class="gmail-pun">,</span><span class="gmail-pln">     </span><span class="gmail-lit">7</span><span class="gmail-pln"> par    </span><span class="gmail-lit">0.163</span><span class="gmail-pln">s   </span><span class="gmail-lit">0.044</span><span class="gmail-pln">s     </span><span class="gmail-lit">0.0055</span><span class="gmail-pln">s    </span><span class="gmail-lit">0.0109</span><span class="gmail-pln">s

  Parallel GC work balance</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">35.12</span><span class="gmail-pun">%</span><span class="gmail-pln"> </span><span class="gmail-pun">(</span><span class="gmail-pln">serial </span><span class="gmail-lit">0</span><span class="gmail-pun">%,</span><span class="gmail-pln"> perfect </span><span class="gmail-lit">100</span><span class="gmail-pun">%)</span><span class="gmail-pln">

  TASKS</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">10</span><span class="gmail-pln"> </span><span class="gmail-pun">(</span><span class="gmail-lit">1</span><span class="gmail-pln"> bound</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">9</span><span class="gmail-pln"> peak workers </span><span class="gmail-pun">(</span><span class="gmail-lit">9</span><span class="gmail-pln"> total</span><span class="gmail-pun">),</span><span class="gmail-pln"> using </span><span class="gmail-pun">-</span><span class="gmail-pln">N4</span><span class="gmail-pun">)</span><span class="gmail-pln">

  SPARKS</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln"> </span><span class="gmail-pun">(</span><span class="gmail-lit">0</span><span class="gmail-pln"> converted</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln"> overflowed</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln"> dud</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln"> GC'd</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln"> fizzled</span><span class="gmail-pun">)</span><span class="gmail-pln">

  INIT    time    </span><span class="gmail-lit">0.001</span><span class="gmail-pln">s  </span><span class="gmail-pun">(</span><span class="gmail-pln">  </span><span class="gmail-lit">0.006</span><span class="gmail-pln">s elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">
  MUT     time   </span><span class="gmail-lit">26.155</span><span class="gmail-pln">s  </span><span class="gmail-pun">(</span><span class="gmail-pln"> </span><span class="gmail-lit">31.602</span><span class="gmail-pln">s elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">
  GC      time    </span><span class="gmail-lit">2.894</span><span class="gmail-pln">s  </span><span class="gmail-pun">(</span><span class="gmail-pln">  </span><span class="gmail-lit">0.816</span><span class="gmail-pln">s elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">
  EXIT    time   </span><span class="gmail-pun">-</span><span class="gmail-lit">0.003</span><span class="gmail-pln">s  </span><span class="gmail-pun">(</span><span class="gmail-pln">  </span><span class="gmail-lit">0.008</span><span class="gmail-pln">s elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">
  Total   time   </span><span class="gmail-lit">29.048</span><span class="gmail-pln">s  </span><span class="gmail-pun">(</span><span class="gmail-pln"> </span><span class="gmail-lit">32.432</span><span class="gmail-pln">s elapsed</span><span class="gmail-pun">)</span><span class="gmail-pln">

  Alloc rate    </span><span class="gmail-lit">2</span><span class="gmail-pun">,</span><span class="gmail-lit">764</span><span class="gmail-pun">,</span><span class="gmail-lit">643</span><span class="gmail-pun">,</span><span class="gmail-lit">665</span><span class="gmail-pln"> bytes per MUT second

  Productivity  </span><span class="gmail-lit">90.0</span><span class="gmail-pun">%</span><span class="gmail-pln"> </span><span class="gmail-kwd">of</span><span class="gmail-pln"> total user</span><span class="gmail-pun">,</span><span class="gmail-pln"> </span><span class="gmail-lit">97.5</span><span class="gmail-pun">%</span><span class="gmail-pln"> </span><span class="gmail-kwd">of</span><span class="gmail-pln"> total elapsed

gc_alloc_block_sync</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">3494</span><span class="gmail-pln">
whitehole_spin</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">0</span><span class="gmail-pln">
gen</span><span class="gmail-pun">[</span><span class="gmail-lit">0</span><span class="gmail-pun">].</span><span class="gmail-pln">sync</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">15527</span><span class="gmail-pln">
gen</span><span class="gmail-pun">[</span><span class="gmail-lit">1</span><span class="gmail-pun">].</span><span class="gmail-pln">sync</span><span class="gmail-pun">:</span><span class="gmail-pln"> </span><span class="gmail-lit">177</span></code></pre>
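<p>As a check on the figures above, the user-time Productivity is MUT time divided by Total time: 26.155 s / 29.048 s ≈ 90.0%. The same counters can be read from inside a program. This sketch assumes GHC 8.2 or later (where <code>GHC.Stats</code> exports <code>getRTSStats</code>) and a run with <code>+RTS -T</code>. The workload is a placeholder, and because the numbers are sampled mid-run they will not exactly match the report the RTS prints at exit.</p>

<pre class="gmail-lang-hs"><code>-- Sketch: report GC counters and the mutator's share of CPU time.
-- Assumes GHC 8.2+ and running the program with +RTS -T.
module Main (main) where

import GHC.Stats   (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import Text.Printf (printf)

-- Percentage of CPU time spent outside the garbage collector.
cpuProductivity :: RTSStats -> Double
cpuProductivity s
  | cpu_ns s == 0 = 0
  | otherwise     = 100 * fromIntegral (mutator_cpu_ns s) / fromIntegral (cpu_ns s)

main :: IO ()
main = do
  -- Placeholder, allocation-heavy work standing in for the real search.
  print (length (show (product [1 .. 10000 :: Integer])))
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are off; rerun with +RTS -T"
    else do
      s <- getRTSStats
      printf "%d collections (%d major), %d bytes allocated\n"
        (gcs s) (major_gcs s) (allocated_bytes s)
      printf "Mutator share of CPU time: %.1f%%\n" (cpuProductivity s)</code></pre>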

<p>I'd still like to figure out how to parallelize the <code>filterProj . parseJson . readFile</code> part, but for now I'm satisfied with what I have. <br></p><p>(I also traced the crashes I was seeing to a different process launched from the same terminal window.)<br></p></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jan 21, 2018 at 10:12 PM, Michael Snoyman <span dir="ltr"><<a href="mailto:michael@snoyman.com" target="_blank">michael@snoyman.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I just wanted to comment on the conduit aspect of this in particular. Looking at your first version:</div><span class=""><div><br><div><pre class="m_6090739341647934176gmail-m_746947386685124369gmail-lang-hs m_6090739341647934176gmail-m_746947386685124369gmail-prettyprint m_6090739341647934176gmail-m_746947386685124369gmail-prettyprinted"><code><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">conduitFilesFilter </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">::</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> ProjectFilter </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">-></span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> Path Abs Dir </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">-></span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> IO </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">Path Abs File</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">]</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">
conduitFilesFilter projFilter dirname' </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">=</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-kwd">do</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">
  </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">(_,</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> allFiles</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun"><-</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> listDirRecur dirname'
  C.runConduit </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">$</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">
    C.yieldMany allFiles
    </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">.|</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> C.filterMC </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">filterMatchingFile projFilter</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln">
    </span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pun">.|</span><span class="m_6090739341647934176gmail-m_746947386685124369gmail-pln"> C.sinkList</span></code></pre></div><span style="font-family:monospace,monospace"></span></div><div><br></div></span><div>This isn't taking full advantage of conduit: you're reading in a list of the files in the file system, instead of streaming those values. And the output is a list of `String`, instead of streaming out those `String`s. More idiomatic would look something like:</div><div><br></div><div>sourceFilesFilter projFilter dirname' =</div><div>  sourceDirectoryDeep False dirname' .| filterMC (filterMatchingFile projFilter)</div><div><br></div><div>And then, wherever you're consuming the output, to do so in a streaming fashion, e.g.:</div><div><br></div><div>runConduitRes $ sourceFilesFilter projFilter dirname' .| mapM_C print</div><div><br></div><div>This should help with the increasing memory usage, though it will do nothing about the runtime overhead of parsing the JSON itself.<br></div><div><span style="font-family:monospace,monospace"></span></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Mon, Jan 22, 2018 at 1:38 AM, erik <span dir="ltr"><<a href="mailto:eraker@gmail.com" target="_blank">eraker@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr"><div>Hello Haskell Cafe,</div><div><br></div><div>I have written a small, pretty simple program but I am finding it hard to reason about its behavior (and also about the best way to do what I want), so I would like to ask you all for some suggestions.</div><div><br></div><div>For reference, here's a <a href="https://stackoverflow.com/questions/48330690/haskell-conduit-aeson-parsing-large-jsons-and-filter-matching-key-values/48348153#48348153" target="_blank">Stack Overflow 
question</a> where I described what's going on, but I'll also describe it below.</div><div><br></div><div>My program does the following:<br></div><div><ol><li>Recursively list a directory,</li><li>Parse the JSON files from the directory list into identifiable objects/records,<br></li><li>Look for matching key-value pairs, and</li><li>Return the filenames where matches were found.</li></ol><div>A few details for more context:</div><ul><li>I have to filter between 500,000 and 1 million files (typically narrowing them down to between 1,000 and 40,000 that represent a particular project). I usually just need the filenames.<br></li><li>Each file is quite large, some of them 5 MB or 10 MB, and it's not uncommon for them to have deeply nested keys (40,000 keys or so).</li></ul><div>My first version of this program was simple, synchronous, and as straightforward as I could come up with. However, its memory usage increased monotonically. When profiling, I found that most of the time was spent parsing JSON into Objects before my code could turn the objects into records (also, as you might imagine, tons of time in garbage collection).<br></div><div><br></div><div>For my second version, I switched to conduit, which seemed to solve the increasing memory issue. 
My core function now looked like this:</div><div><pre class="m_6090739341647934176m_746947386685124369gmail-lang-hs m_6090739341647934176m_746947386685124369gmail-prettyprint m_6090739341647934176m_746947386685124369gmail-prettyprinted"><code><span class="m_6090739341647934176m_746947386685124369gmail-pln">conduitFilesFilter </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">::</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> ProjectFilter </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-></span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> Path Abs Dir </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-></span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> IO </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">Path Abs File</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">]</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
conduitFilesFilter projFilter dirname' </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">=</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">do</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(_,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> allFiles</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun"><-</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> listDirRecur dirname'
  C.runConduit </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">$</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
    C.yieldMany allFiles
    </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">.|</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> C.filterMC </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">filterMatchingFile projFilter</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
    </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">.|</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> C.sinkList</span></code></pre></div><div><br></div><div>This was still slow and certainly still synchronous. What I really wanted was to run that "filterMatchingFile..." part in parallel across a number of CPUs. As an aside, my filtering function looks like this:</div><div><br></div><div><span style="font-family:monospace,monospace">filterMatchingFile :: ProjectFilter -> Path Abs File -> IO Bool<br>filterMatchingFile (ProjectFilter filterFunc) fpath = do<br>  let fp = toFilePath fpath<br>  bs <- B.readFile fp<br>  case validImplProject bs of  -- this is pretty much just `decodeStrict`<br>    Nothing -> pure False<br>    (Just proj') -> pure $ filterFunc proj'</span><br></div><div><br></div><div>Here are the stats from running this:</div><div><br></div><div><pre class="m_6090739341647934176m_746947386685124369gmail-lang-hs m_6090739341647934176m_746947386685124369gmail-prettyprint m_6090739341647934176m_746947386685124369gmail-prettyprinted"><code><span class="m_6090739341647934176m_746947386685124369gmail-lit">115</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">961</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">554</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">600</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes allocated </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">in</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> the heap
  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">35</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">870</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">639</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">768</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes copied during GC
      </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">56</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">467</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">720</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes maximum residency </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">681</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> sample</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">))</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
       </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">283</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">008</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes maximum slop
             </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">145</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> MB total memory </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">in</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> use </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> MB lost due to fragmentation</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

                                     Tot time </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  Avg pause  Max pause
  Gen  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">     </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">108716</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> colls</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">108716</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> par   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">76.915</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">20.571</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s     </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0002</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0266</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s
  Gen  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">       </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">681</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> colls</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">680</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> par    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.530</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.147</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s     </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0002</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0009</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s

  Parallel GC work balance</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">14.99</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">serial </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> perfect </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">100</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  TASKS</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">10</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bound</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">9</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> peak workers </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">9</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">),</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> using </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">N4</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  SPARKS</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> converted</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> overflowed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> dud</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> GC'd</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> fizzled</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  INIT    time    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.001</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.007</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  MUT     time   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">34.813</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">42.938</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  GC      time   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">77.445</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">20.718</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  EXIT    time    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.000</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.010</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  Total   time  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">112.260</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">63.672</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  Alloc rate    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">3</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">330</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">960</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">996</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes per MUT second

  Productivity  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">31.0</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">of</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total user</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">67.5</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">of</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total elapsed

gc_alloc_block_sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">188614</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
whitehole_spin</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
gen</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">].</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">33</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
gen</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">].</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">811204</span></code></pre></div><div><br></div><div>I considered writing a plainer (non-Conduit) parallel version, but I was worried about the memory issue. I also tried to write a Conduit-plus-channels version, but I couldn't get it to work. <br></div><div><br></div><div>Finally, I wrote a version using stm-conduit, which I thought might be a bit more efficient. It seems to be slightly better, but it's not really the kind of parallelization I was imagining:</div><div><br></div><div><span style="font-family:monospace,monospace">conduitAsyncFilterFiles :: ProjectFilter -> Path Abs Dir -> IO [String]<br>conduitAsyncFilterFiles projFilter dirname' = do<br>  (_, allFiles) <- listDirRecur dirname'<br>  buffer 10<br>    -- producer: read and parse each file<br>    (C.yieldMany allFiles<br>    .| (C.mapMC (readFileWithPath . toFilePath)))<br>    -- consumer: keep only the results accepted by the project filter<br>    (C.mapC (filterProjForFilename projFilter)<br>         .| C.filterC isJust<br>         .| C.mapC fromJust<br>         .| C.sinkList)</span><br></div><div><br></div><div>The first conduit passed to <code>buffer</code> does something like <span style="font-family:monospace,monospace">parseStrict . B.readFile</span>.</div><div><br></div><div>This still wasn't too great, but after reading about handling garbage collection in smarter ways, I found that I could run my application like this:</div><div><pre class="m_6090739341647934176m_746947386685124369gmail-lang-hs m_6090739341647934176m_746947386685124369gmail-prettyprint m_6090739341647934176m_746947386685124369gmail-prettyprinted"><code><span class="m_6090739341647934176m_746947386685124369gmail-pln">stack exec search</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">json </span><span class="m_6090739341647934176m_746947386685124369gmail-com">-- --searchPath $FILES --name hello +RTS -s -A32m -n4m</span></code></pre></div><div>With these flags, the reported "productivity" shot up (84.2% of total user time in the run below, versus 31.0% in the run above), presumably because garbage collection now runs far less often (211 generation-0 collections instead of 108,716). My program also got much faster, at about 13 s elapsed versus about 64 s:</div><div><br></div><div><pre class="m_6090739341647934176m_746947386685124369gmail-lang-hs m_6090739341647934176m_746947386685124369gmail-prettyprint m_6090739341647934176m_746947386685124369gmail-prettyprinted"><code><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">36</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">379</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">265</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">096</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes allocated </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">in</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> the heap
   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">238</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">438</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">160</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes copied during GC
      </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">22</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">996</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">264</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes maximum residency </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">85</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> sample</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">))</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
       </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">3</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">834</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">152</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes maximum slop
             </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">207</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> MB total memory </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">in</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> use </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">14</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> MB lost due to fragmentation</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

                                     Tot time </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  Avg pause  Max pause
  Gen  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">       </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">211</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> colls</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">211</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> par    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1.433</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.393</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s     </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0019</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0077</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s
  Gen  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">        </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">85</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> colls</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">84</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> par    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.927</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.256</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s     </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0030</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.0067</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s

  Parallel GC work balance</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">67.93</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">serial </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> perfect </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">100</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  TASKS</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">10</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bound</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">9</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> peak workers </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">9</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">),</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> using </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">N4</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  SPARKS</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> converted</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> overflowed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> dud</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> GC'd</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> fizzled</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  INIT    time    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.001</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.004</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  MUT     time   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">12.636</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">12.697</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  GC      time    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">2.359</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.650</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  EXIT    time   </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">-</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.015</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0.003</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
  Total   time   </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">14.982</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s  </span><span class="m_6090739341647934176m_746947386685124369gmail-pun">(</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">13.354</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">s elapsed</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">)</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">

  Alloc rate    </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">2</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">878</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">972</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">840</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> bytes per MUT second

  Productivity  </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">84.2</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">of</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total user</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">,</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">95.1</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">%</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-kwd">of</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> total elapsed

gc_alloc_block_sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">9612</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
whitehole_spin</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
gen</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">0</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">].</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">2044</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">
gen</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">[</span><span class="m_6090739341647934176m_746947386685124369gmail-lit">1</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">].</span><span class="m_6090739341647934176m_746947386685124369gmail-pln">sync</span><span class="m_6090739341647934176m_746947386685124369gmail-pun">:</span><span class="m_6090739341647934176m_746947386685124369gmail-pln"> </span><span class="m_6090739341647934176m_746947386685124369gmail-lit">47704</span></code></pre></div><div><br></div><div>Thanks for reading thus far. I now have three questions.</div><div><br></div>1. I understand that my program necessarily creates tons of garbage because it parses and then throws away 5 MB of JSON 500,000 times. However, I don't really understand why <code><span class="m_6090739341647934176m_746947386685124369gmail-com">+RTS -A32m -n4m</span></code> helps, and I'm always reluctant to sprinkle in magic I don't fully understand. Can anyone help me understand what these flags do?<br><div><br></div><div>2. It seems that these RTS options are really something I should be using, but I can't figure out how to add them to my package.yaml alongside the other options, so that I don't have to pass them every time I run the program. Based on the GHC 8.2 documentation, I thought it needed to look like this, but it never works; it usually complains that -A32m and -n4m are unrecognized flags:<br></div><div><br></div><div><span style="font-family:monospace,monospace">ghc-options:<br>    - -threaded<br>    - -rtsopts<br>    - "-with-rtsopts=-N4 -A32m -n4m"<br></span></div><div><br></div><div>3. Finally, my most important question: when I run this program on OS X, it runs successfully to completion. 
However, <i>a few minutes after the program terminates</i>, my terminal becomes unresponsive. I use Emacs as my editor, usually launched from a terminal window, and it becomes unresponsive too. None of the other programs I write do this, and it happens <i>every time</i> I run this particular application, so I know that this application is to blame. <br></div><div><br></div><div>The crazy thing is that force-quitting the terminal or logging out doesn't help: I actually have to restart my computer before I can use the terminal application again. Other details that may help: <br></div><ul><li>This hang happens after my program's process has already exited. </li><li>Watching it in htop, it never comes close to running out of memory: its memory usage stays roughly constant.</li></ul><div>I can't really deploy an application with this potential for crashing, but I don't know how to debug this issue. My complete stab-in-the-dark guess is that the heap allocations are somehow not reclaimed even after the process has terminated. Can anyone offer suggestions on things to look for, or ways to debug and/or fix this issue? </div><div><br></div></div><div>Beyond those questions, if anyone has suggestions on better ways to structure my application or parallelize the slow parts, I'll happily take those.</div><div><br></div><div>Thanks again for reading. I appreciate any suggestions you may have.</div><div><br></div><div>Best,<span class="m_6090739341647934176HOEnZb"><font color="#888888"><br></font></span></div><span class="m_6090739341647934176HOEnZb"><font color="#888888"><div><br>-- <br><div class="m_6090739341647934176m_746947386685124369gmail_signature"><div dir="ltr">Erik Aker</div></div>
</div></font></span></div>
<br></div></div>______________________________<wbr>_________________<br>
Haskell-Cafe mailing list<br>
To (un)subscribe, modify options or view archives go to:<br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bi<wbr>n/mailman/listinfo/haskell-caf<wbr>e</a><br>
Only members subscribed via the mailman list are allowed to post.<br></blockquote></div><br></div>
</blockquote></div>
</div>