<div dir="ltr"><div>Michael,</div><div><br></div><div>Thanks to your insight I took a look at Data.Vector.Unboxed.</div><div><br></div><div>I was very happy, but there is a caveat. Looking at the documentation:</div><div><br></div><div><span style="color:rgb(0,0,0);font-family:sans-serif;font-size:13px;line-height:18.2px">"In particular, unboxed vectors of pairs are represented as pairs of unboxed vectors". </span><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><a href="https://tldrify.com/cfj">https://tldrify.com/cfj</a></span></font><br></div><div><span style="color:rgb(0,0,0);font-family:sans-serif;font-size:13px;line-height:18.2px"><br></span></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">It seems that derivingUnbox and all the unboxed vector code do as much as is possible with what GHC offers to make things unboxed, which means reducing everything to primitive types.</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">So the only way to pack a user-defined data type is to create one Vector of an unboxed primitive type for each field. If I have a data type with five fields, the result is five Vectors.</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">This is nice in some cases, but most of the time it is not. It does not solve the CPU cache problem, since the fields of a single value are at least length (Vector) elements apart. I mean that if the vector is moderately long and the first field is in the cache, the second or third may not be. Usually the fields of a value are handled together. 
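<br><br>A minimal sketch of the layout described above, assuming the vector package is installed. The names points, ids and weights are made up for illustration and are not taken from any library:<br>
<pre>
import qualified Data.Vector.Unboxed as U

-- Hypothetical sample data: a vector of (id, weight) pairs.
points :: U.Vector (Int, Double)
points = U.fromList [(1, 1.5), (2, 2.5), (3, 3.5)]

-- U.unzip is O(1): it hands back the two vectors that already
-- back 'points', one per field, without copying.
ids :: U.Vector Int
weights :: U.Vector Double
(ids, weights) = U.unzip points

main :: IO ()
main = do
  print ids
  print weights
  -- Using both fields of each element walks two separate arrays.
  print (U.sum (U.zipWith (*) (U.map fromIntegral ids) weights))
</pre>
If I read the documentation correctly, a loop that needs both fields of element i has to read from two separate arrays, which is exactly the cache concern.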
</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">This solution will not consistently match C/C++ speed for most single-threaded programs. Moreover, there is an additional cost for the necessary packing and unpacking.</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">However, this is the best that can be done with what GHC offers now.</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">The solution would be to allow unboxed user-defined types:</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">data MyUnboxedData = MyUnboxedData# Int# Float# ....</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">where MyUnboxedData# is an unboxed constructor.</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">Then there are some questions:</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">1) Am I wrong? </span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">2) Is it feasible? </span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">3) Is it worth the pain? 
(My answer: yes. </span></font><span style="color:rgb(0,0,0);font-family:sans-serif;line-height:18.2px">The current situation is not satisfactory; this is very important for Haskell's success and should be addressed if there is no insurmountable barrier.)</span></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">4) Are there alternatives before making a formal proposal?</span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px"><br></span></font></div><div><font color="#000000" face="sans-serif"><span style="line-height:18.2px">Thanks</span></font></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-11-13 12:42 GMT+01:00 Alberto G. Corona <span dir="ltr">&lt;<a href="mailto:agocorona@gmail.com" target="_blank">agocorona@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">That is more practical. <div><br></div><div>It is a pity that the Unbox class does not interact with the UNPACK pragma. Or so it seems. <br><div><br></div><div>The problem that has been going round my head since being exposed to the Google philosophy of "count the bytes" is the trade-off when choosing containers: either boxed, pure, linked and multithreaded, or unboxed, mutable and single-threaded. </div><div><br></div><div>Haskell's philosophy chooses the first, while almost all mainstream languages choose the second. That is another problem for the adoption of Haskell. Google people say: we don't need multithreaded programs, because we run many single-threaded programs on one machine and so use all the cores anyway. </div><div>Only when there is a single application to run is the extra effort of multithreading justified. And this happens rarely in the real world, although in scientific, engineering and financial computing it is usual. 
It also happens in distributed settings, but in that case performance per thread and per core is also critical, so Haskell is ruled out.</div><div><br></div><div>But it doesn't have to be that way, or at least that is what I think. DiffUArrays are internally mutable but have a pure interface. They use a kind of versioning. In single-threaded environments they theoretically perform at mutable speeds. The versioning of DiffArray is a blend of packed and linked structure that can mix the two worlds.</div><div><br></div><div>If unboxing is extended to any kind of user-defined data, the versioning idea can be used to build containers that perform at C speeds when single-threaded but preserve purity when multithreaded. So it is possible to have the cake and eat it too.</div></div><div><br></div><div>It is even possible to encode balanced binary trees in a compact DiffArray, so very fast maps could be used in single-threaded applications that are also pure and work multithreaded.</div><div><br></div><div>The goal is to remove the objections to Haskell coming from that side of the computer industry by making such containers available without forcing the user to know lots of things about the internals of Haskell and GHC.</div><div><br></div><div>I do not know if there is anything going on in this direction. Maybe I'm being simplistic and there is something that I am missing. </div><div><br></div><div>From experience I know that what sells a language is not the real performance numbers, but the approaches that the language takes and how much they promise for the future: </div><div><br></div><div>For example, I could develop a kind of container following this idea that performs badly both single-threaded and multithreaded. For sure the early version would be so. 
But if people understand that the design has the potential to be optimized in the future, they will buy the idea and will happily accept Haskell for semi-performance-critical apps, because they will have arguments against objections of this kind.</div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">2015-11-12 23:56 GMT+01:00 Michael Snoyman <span dir="ltr">&lt;<a href="mailto:michael@snoyman.com" target="_blank">michael@snoyman.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">How about vector-th-unbox[1]?<div><br></div><div>[1] <a href="https://www.stackage.org/package/vector-th-unbox" target="_blank">https://www.stackage.org/package/vector-th-unbox</a></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 12, 2015 at 2:53 PM, Alberto G. Corona <span dir="ltr">&lt;<a href="mailto:agocorona@gmail.com" target="_blank">agocorona@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">There are no examples. It is hard to guess the functionality and the maturity of the approach.</div><div class="gmail_extra"><div><div><br><div class="gmail_quote">2015-11-12 18:56 GMT+01:00 David Kraeutmann <span dir="ltr">&lt;<a href="mailto:kane@kane.cx" target="_blank">kane@kane.cx</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This might be of interest to you:<br>
<a href="https://hackage.haskell.org/package/structs" rel="noreferrer" target="_blank">https://hackage.haskell.org/package/structs</a><br>
<span><br>
<br>
On 11/12/2015 6:49 PM, Alberto G. Corona wrote:<br>
> Looking at this:<br>
><br>
> <a href="https://downloads.haskell.org/~ghc/6.12.3/docs/html/users_guide/primitives.html" rel="noreferrer" target="_blank">https://downloads.haskell.org/~ghc/6.12.3/docs/html/users_guide/primitives.html</a><br>
><br>
> It seems that it is impossible to manage data in Haskell within a core<br>
> without L1 cache misses, except with unboxed arrays of primitive types,<br>
> since it is impossible to have unboxed arrays of user-defined types.<br>
><br>
> Am I right?<br>
><br>
> This is definitely very bad for tasks that are inherently single-threaded<br>
> and in general for the image of Haskell as a practical language.<br>
><br>
> I have more to say about that, but I would like to know first if I'm right<br>
> and second if there is some work going on to permit user-defined unboxed<br>
> datatypes, or if there is some low-level trick for getting them using foreign<br>
> calls and unsafeCoerce in some way.<br>
><br>
> I know that the language ATS has unboxing a la carte....<br>
><br>
><br>
><br>
</span>> _______________________________________________<br>
> Haskell-Cafe mailing list<br>
> <a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br>
> <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br>
><br>
<br>
<br>
</blockquote></div><br><br clear="all"><div><br></div></div></div><span><font color="#888888">-- <br><div>Alberto.</div>
</font></span></div>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div>Alberto.</div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Alberto.</div>
</div>