[Haskell-cafe] serialized data structures (Was: Generalized, named, and exportable default declarations)

YueCompl compl.yue at icloud.com
Wed Apr 14 07:41:04 UTC 2021


As for text handling:

[There Ain’t No Such Thing As Plain Text.](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses)

[What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](https://kunststube.net/encoding)

These are just some random hits from a Google search. Thanks to globalization, variable byte width and even variable byte order are the norm today, and I see that the text package hasn't caught up yet:

http://hackage.haskell.org/package/text

> Currently the text library uses UTF-16 as its internal representation which is neither a fixed-width nor always the most dense representation for Unicode text. We're currently investigating the feasibility of changing Text's internal representation to UTF-8 and if you need such a Text type right now you might be interested in using the spin-off packages text-utf8 and text-short.
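The quoted point, that UTF-16 is neither fixed-width nor always the densest encoding, can be checked with a few lines of plain Haskell. This is a minimal sketch using only `Data.Char.ord` from base. `utf8Width` and `utf16Width` are hypothetical helper names for illustration, not part of the text package.

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 needs for one code point (1 to 4).
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Number of bytes UTF-16 needs for one code point: one 16-bit unit inside
-- the Basic Multilingual Plane, a surrogate pair (two units) above it.
utf16Width :: Char -> Int
utf16Width c
  | ord c < 0x10000 = 2
  | otherwise       = 4

main :: IO ()
main =
  -- 'a', U+00E9, U+4E2D and U+1F600, written as escapes to keep the source ASCII.
  mapM_ (\c -> print (ord c, utf8Width c, utf16Width c)) "a\xE9\x4E2D\x1F600"
```

This prints (97,1,2), (233,2,2), (20013,3,2) and (128512,4,4): UTF-8 is denser for ASCII-heavy text, UTF-16 is denser for most CJK text, and both are variable-width.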

I do think Haskell can excel at managing compact annotative data structures around raw UTF-8 bytes as the main payload, and at fostering sophisticated manipulation APIs that go beyond counting and slicing under the outdated fixed-width-character assumption.
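As a rough illustration of an annotative structure around raw UTF-8 bytes, here is a minimal sketch, assuming the bytestring and array packages that ship with GHC. `Utf8Indexed`, `mkIndexed`, `codePointCount` and `sliceCodePoints` are hypothetical names: the payload stays as UTF-8 bytes, and a side table of code-point start offsets turns slicing by code point into an index lookup plus an O(1) `BS.drop`/`BS.take`, instead of a linear scan.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.Bits ((.&.))
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Hypothetical annotated string: the raw UTF-8 bytes are the payload, and an
-- unboxed side table records the byte offset where each code point starts.
data Utf8Indexed = Utf8Indexed
  { payload :: !BS.ByteString
  , starts  :: !(UArray Int Int)
  }

-- Every byte except a continuation byte (bit pattern 10xxxxxx) starts a code point.
isLeadByte :: Word8 -> Bool
isLeadByte b = b .&. 0xC0 /= 0x80

-- Build the index in one pass; assumes the input is already valid UTF-8.
mkIndexed :: BS.ByteString -> Utf8Indexed
mkIndexed bs = Utf8Indexed bs (listArray (0, length offs - 1) offs)
  where
    offs = [i | (i, b) <- zip [0 ..] (BS.unpack bs), isLeadByte b]

codePointCount :: Utf8Indexed -> Int
codePointCount s = snd (bounds (starts s)) + 1

-- Byte offset of the k-th code point, clamped to the payload boundaries.
offsetOf :: Utf8Indexed -> Int -> Int
offsetOf s k
  | k <= 0                = 0
  | k >= codePointCount s = BS.length (payload s)
  | otherwise             = starts s ! k

-- Slice by code point without decoding: the result is still raw UTF-8 bytes.
sliceCodePoints :: Int -> Int -> Utf8Indexed -> BS.ByteString
sliceCodePoints from len s = BS.take (end - begin) (BS.drop begin (payload s))
  where
    begin = offsetOf s from
    end   = offsetOf s (from + len)

main :: IO ()
main = do
  -- "héllo" encoded as UTF-8: the é (U+00E9) takes two bytes, 0xC3 0xA9.
  let s = mkIndexed (BS.pack [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F])
  print (BS.length (payload s), codePointCount s)
  print (BS.unpack (sliceCodePoints 1 2 s))
```

Here the six bytes hold five code points, and slicing two code points from position 1 yields the bytes of "él" ([195,169,108]). A full offset per code point is not compact; a real design would likely sample offsets every so many code points and scan the short remainder.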

> On 2021-04-14, at 06:45, Carter Schonwald <carter.schonwald at gmail.com> wrote:
> 
> Indeed.  
> 
> I think there are a few viable directions folks are exploring on the string front. 
> 
> As for rules-based optimization, I think there’s room for more robust systems. For example, could any of the ideas in the egg ("e-graphs good") paper from POPL 2021, or the associated egg library, be adapted to allow more robust fusion optimization in GHC or a similar language? I suspect yes, but it would take serious work on the cost model and on how unfolding is done (we shouldn’t need to inline to allow fusion that results in choosing to inline!)
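For readers less familiar with the rules-based machinery under discussion, here is a minimal sketch of a map/map fusion rule written with GHC's existing `RULES` pragma, on a hypothetical locally defined `myMap`. This is the current rewrite mechanism, not the e-graph approach suggested above. Whether the rule fires depends on optimization being enabled (e.g. `-O`) and on the left-hand side still being visible when the simplifier looks, which is the kind of fragility being discussed.

```haskell
module Main (main) where

-- Hypothetical list map, defined locally so the rule below is self-contained.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
-- Keep myMap from being inlined, so its calls stay visible to the rule.
{-# NOINLINE myMap #-}

-- Map/map fusion: two traversals become one. GHC only applies rules when
-- optimising, and only where the left-hand side matches syntactically.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (sum (myMap (+ 1) (myMap (* 2) [1 .. 10 :: Int])))
```

The program prints 120 whether or not the rule fires; only the number of traversals changes. By contrast, equality saturation with an e-graph would keep both `myMap f (myMap g xs)` and `myMap (f . g) xs` as equivalent forms and let a cost model choose between them afterwards.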
> 
> On Tue, Apr 13, 2021 at 4:30 PM YueCompl via Haskell-Cafe <haskell-cafe at haskell.org> wrote:
> I suggest it won't need to be as efficient as Text; being reasonably efficient will suffice. 
> 
> From today's standpoint, C++'s mantra of “you don’t pay for what you don’t use” puts too much emphasis on the machine. As the price of machines (hardware purchase, energy consumed while running, time to result) falls and the price of humans (programmer / analyst / management mental overhead, time to production deployment, bug tracking & resolution, maintenance & service) rises, more and more organizations will be willing to pay somewhat more for machines to save on human costs.
> 
> As I see it, GHC/Haskell's unique trade-off with respect to optimization may become the new sweet spot in the coming years.
> 
> > On 2021-04-13, at 22:58, Mario <blamario at rogers.com> wrote:
> > 
> > On 2021-04-13 5:20 a.m., Henning Thielemann wrote:
> >> 
> >> We have seen a lot of effort toward better integrating Text into Haskell programming. The only purpose of doing so is to replace String with something more space- and time-efficient. What would happen if we invested as much time into making String as efficient as Text? At ICFP 2019 I attended a talk about Gibbon:
> >> 
> >>    https://github.com/iu-parfunc/gibbon
> >> 
> >> The idea of the project is to serialize (Haskell's) tree data structures in memory as much as possible. Wouldn't this enable us to use String instead of Text, again, maybe even lists instead of Vectors? No more Text integration efforts, no more external library with GHC-specific manual optimizations. Unfortunately, the project is still in an early stage. So far, it only supports strict data structures.
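To make the idea concrete, here is a hand-written sketch in plain Haskell of the kind of representation Gibbon targets: a tree packed into one flat, preorder buffer and consumed by a traversal that reads it front to back without chasing pointers. It is not Gibbon's actual layout or API; `Tree`, the tag scheme, `encode`, `pack` and `sumPacked` are hypothetical, and as I understand it Gibbon's aim is to compile ordinary functions over algebraic data types into code of this shape rather than have it written by hand.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- An ordinary pointer-based tree.
data Tree = Leaf Int | Node Tree Tree

-- Preorder encoding into a flat buffer: tag 0 is a Leaf followed by its
-- value; tag 1 is a Node followed by its left and then its right subtree.
encode :: Tree -> [Int]
encode t = go t []
  where
    go (Leaf n)   acc = 0 : n : acc
    go (Node l r) acc = 1 : go l (go r acc)

pack :: Tree -> UArray Int Int
pack t = listArray (0, length xs - 1) xs
  where
    xs = encode t

-- Sum the leaves by walking the buffer directly, with no pointers to chase.
-- Returns the sum and the index just past the subtree starting at i.
sumPacked :: UArray Int Int -> Int -> (Int, Int)
sumPacked buf i = case buf ! i of
  0 -> (buf ! (i + 1), i + 2)
  _ -> let (sl, j) = sumPacked buf (i + 1)
           (sr, k) = sumPacked buf j
       in (sl + sr, k)

main :: IO ()
main = do
  let buf = pack (Node (Leaf 1) (Node (Leaf 2) (Leaf 3)))
  print (fst (sumPacked buf 0))
```

The three-leaf tree becomes the eight-element buffer [1,0,1,1,0,2,0,3], and the traversal prints 6.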
> > 
> > 
> > I don't want to be unfair to the project without investigating it more closely, but my feeling is that it goes against the spirit of the times. There's been some disillusionment with shortcut fusion, rule-based rewriting, and similar advanced techniques. It was probably inevitable that, as GHC slowly shifts from research and teaching to industrial use, the community would grow jaded with amazing but fluky research results and put more value on boring predictability instead. Unless Gibbon can make String perform *consistently* as efficiently as Text, I don't see the project gaining adoption.
> > 
> > 
> > _______________________________________________
> > Haskell-Cafe mailing list
> > To (un)subscribe, modify options or view archives go to:
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> > Only members subscribed via the mailman list are allowed to post.
> 


