[Haskell-cafe] How to safely and fast write repa array to a file?

Compl Yue compl.yue at icloud.com
Sat Apr 4 14:08:28 UTC 2020


I have a feeling that, given the result data is dense and already in RAM,
your approach should already be the safest and most efficient one.
Benchmarks may favor slight variants for differently sized arrays, but
the speed differences should be negligible.
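For comparison, here is a sketch of one such slight variant. Instead of building a ByteString through `Data.ByteString.Internal`, it passes the raw buffer to `System.IO.hPutBuf` inside `withForeignPtr`, which keeps the buffer alive for the whole write. It is not benchmarked. `writePPM` and the stand-in buffer builder `solidImage` are hypothetical names, and the pixels are assumed to be packed 3-byte RGB, as in the quoted code.

```haskell
import Control.Monad (forM_)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (pokeByteOff)
import System.IO (IOMode (WriteMode), hPutBuf, hPutStr, withBinaryFile)

-- | Write a binary PPM (P6): an ASCII header followed by w*h*3 raw RGB bytes
-- read straight from a ForeignPtr (e.g. the one behind a repa F array).
-- withForeignPtr keeps the buffer alive while hPutBuf copies it out.
writePPM :: FilePath -> Int -> Int -> ForeignPtr Word8 -> IO ()
writePPM path w h fptr =
  withBinaryFile path WriteMode $ \hdl -> do
    -- A binary-mode Handle writes each (ASCII) Char of the header as one byte.
    hPutStr hdl ("P6\n" ++ show w ++ " " ++ show h ++ "\n255\n")
    withForeignPtr fptr $ \p -> hPutBuf hdl p (w * h * 3)

-- | Hypothetical stand-in for a rendered image: a w*h buffer filled with a
-- single RGB colour. In the real program the ForeignPtr would come from repa.
solidImage :: Int -> Int -> (Word8, Word8, Word8) -> IO (ForeignPtr Word8)
solidImage w h (r, g, b) = do
  fptr <- mallocForeignPtrBytes (w * h * 3)
  withForeignPtr fptr $ \p ->
    forM_ [0 .. w * h - 1] $ \i -> do
      pokeByteOff p (3 * i) r
      pokeByteOff p (3 * i + 1) g
      pokeByteOff p (3 * i + 2) b
  pure fptr
```

With repa, the buffer would be `castForeignPtr (RF.toForeignPtr img)`, passed as `writePPM path w h buf`.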

But from an overall architecture perspective, I suggest it can be even more
efficient to mmap the data file into a ForeignPtr and use that, through
virtual memory, to back the array that receives the computation result.
After the computation, call `msync` to guarantee the data is written to
non-volatile storage. This puts no burden on the GC in the first place
and, of course, requires no further memory pinning etc. at all. It simply
uses the OS's virtual memory system (and the modern file systems that are
tightly coupled with it) for its designed purpose.
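Here is a minimal sketch of that idea. It makes these assumptions:

* 64-bit Linux. The `PROT_*`, `MAP_SHARED` and `MS_SYNC` values are hard-coded from Linux's `<sys/mman.h>`.
* The `unix` package, for `handleToFd` and `closeFd`.
* Hand-written FFI bindings to libc's `mmap`, `msync` and `munmap` rather than any Hackage mmap package.

`withMappedFile` and `fillRamp` are hypothetical names. `fillRamp` only stands in for the real computation that writes into the mapped memory.

```haskell
import Control.Exception (bracket, finally)
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIf, throwErrnoIfMinus1_)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr, castPtr, nullPtr, plusPtr)
import Foreign.Storable (pokeByteOff)
import System.IO (IOMode (ReadWriteMode), hSetFileSize, openBinaryFile)
import System.Posix.IO (closeFd, handleToFd)
import System.Posix.Types (COff (..), Fd (..))

-- Linux values from <sys/mman.h> (assumption: other platforms may differ).
protRead, protWrite, mapShared, msSync :: CInt
protRead = 1
protWrite = 2
mapShared = 1
msSync = 4

foreign import ccall unsafe "sys/mman.h mmap"
  c_mmap :: Ptr () -> CSize -> CInt -> CInt -> CInt -> COff -> IO (Ptr ())
foreign import ccall unsafe "sys/mman.h msync"
  c_msync :: Ptr () -> CSize -> CInt -> IO CInt
foreign import ccall unsafe "sys/mman.h munmap"
  c_munmap :: Ptr () -> CSize -> IO CInt

-- | mmap() returns MAP_FAILED, i.e. (void *) -1, on error.
mapFailed :: Ptr ()
mapFailed = nullPtr `plusPtr` (-1)

-- | Map the first len bytes (len > 0) of path read-write and shared,
-- creating or resizing the file first, and run act on the mapping.
-- If act succeeds, the pages are msync'ed (MS_SYNC) before unmapping.
withMappedFile :: FilePath -> Int -> (Ptr Word8 -> IO a) -> IO a
withMappedFile path len act = bracket acquire release use
  where
    size = fromIntegral len :: CSize
    acquire = do
      h <- openBinaryFile path ReadWriteMode
      hSetFileSize h (fromIntegral len)  -- the file must cover the mapping
      fd@(Fd cfd) <- handleToFd h         -- closes the Handle, keeps the fd
      throwErrnoIf (== mapFailed) "mmap"
        (c_mmap nullPtr size (protRead + protWrite) mapShared cfd 0)
        `finally` closeFd fd              -- the mapping outlives the fd
    release p = throwErrnoIfMinus1_ "munmap" (c_munmap p size)
    use p = do
      r <- act (castPtr p)
      throwErrnoIfMinus1_ "msync" (c_msync p size msSync)
      pure r

-- | Hypothetical stand-in for the real computation: writes the bytes
-- 0, 1, 2, ... directly into the mapped memory.
fillRamp :: Int -> Ptr Word8 -> IO ()
fillRamp n p = mapM_ (\i -> pokeByteOff p i (fromIntegral i :: Word8)) [0 .. n - 1]
```

The mapping is only valid inside the callback. To let repa write into it, one could wrap the pointer with `newForeignPtr_` and fill it with `computeIntoP` from `Data.Array.Repa.Repr.ForeignPtr`. Use no finalizer, since `withMappedFile` already unmaps the memory, and check the repa docs for the exact signature.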

I'm doing a PoC of an array database myself, and I'm currently using a
routine from the vector-mmap package to finish it. But the mmap package
it depends on lacks `msync`, and a test case suggests a resource leak
with GHC 8.6.5, so a stable solution is yet to be worked out.

Btw, when you have more than 10k such array files to mmap, you'll hit
another limit: Linux's per-process open-file limit (nofile). I previously
implemented a FUSE filesystem, written in Go, that provides virtual large
data files as views over many small files on the remote storage server,
and I am porting it to GHC in the near future.
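To see how close a process is to that limit, you can read the soft `RLIMIT_NOFILE` value (what `ulimit -n` prints) with `getResourceLimit` from the `unix` package's `System.Posix.Resource`. Here is a small sketch; `openFileSoftLimit` is a hypothetical helper name.

```haskell
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (..)
  , getResourceLimit
  )

-- | Soft per-process limit on open file descriptors (as `ulimit -n`
-- reports it); Nothing if the limit is infinite or unknown.
openFileSoftLimit :: IO (Maybe Integer)
openFileSoftLimit = do
  lims <- getResourceLimit ResourceOpenFiles
  pure $ case softLimit lims of
    ResourceLimit n -> Just n
    _ -> Nothing
```

POSIX keeps a mapping valid after its file descriptor is closed. So the open-file limit mainly bites when descriptors stay open for the lifetime of each mapping. Linux separately caps the number of mappings per process via `vm.max_map_count`.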


On 2020/4/3 3:25 AM, cyberfined via Haskell-Cafe wrote:
> Hello, all. I decided to write a parallel ray tracer in Haskell with
> repa. Right now, to save a repa array to a file, I use a dirty trick:
> I cast the repa array's pointer to a ByteString with fromForeignPtr and
> then write it to the file with hPut. It looks something like this:
>
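> import Control.Exception (bracket)
> import Foreign.ForeignPtr (castForeignPtr)
> import System.IO (IOMode (WriteMode), hClose, openFile)
>
> import Data.Array.Repa (Array, DIM2, Z (..), (:.) (..))
> import qualified Data.Array.Repa as R
> import Data.Array.Repa.Repr.ForeignPtr (F)
> import qualified Data.Array.Repa.Repr.ForeignPtr as RF
>
> import qualified Data.ByteString as B
> import qualified Data.ByteString.Char8 as BC
> import qualified Data.ByteString.Internal as BI
>
> -- Pixel: assumed to be a Storable RGB type of exactly 3 bytes, defined elsewhere.
> type Image = Array F DIM2 Pixel
>
> writeImage :: FilePath -> Image -> IO ()
> writeImage path img = bracket (openFile path WriteMode) hClose $
>   \hdl -> B.hPut hdl header >> B.hPut hdl body
>   where Z :. h :. w = R.extent img
>         header = BC.pack $ "P6\n" ++ show w ++ ' ':show h ++ "\n255\n"
>         -- reinterpret the pixel buffer as w*h*3 raw bytes (no copy)
>         body = BI.fromForeignPtr (castForeignPtr $ RF.toForeignPtr img) 0 (w*h*3)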
>
> My question is: how can I write a repa array to a file safely and fast?