[Haskell-cafe] stream interface vs string interface: references
damodar kulkarni
kdamodar2000 at gmail.com
Tue Sep 3 09:30:08 CEST 2013
Thank you very much for the detailed explanation. It surely was an
enlightenment for me. Especially the comments
Java makes "obtain print version as a string" the basic form and "append
> print version to output stream" a derived form.
>
Smalltalk makes "append print version to output stream"
> the basic form and "obtain print version as a string" a derived form.
I wonder, this seems a good example of how such a seemingly 'small' design
decision can mean so much in terms of performance difference.
Surely, in Java I will have to take a detour to get around this handicap by
implementing something 'shows' on my own or may be use an equivalent
function from some library BUT in any case I will have to be aware of this.
Please give me a more general principle that can be learnt from this design
lesson, which you might think of. It doesn't quite seem to be the 'kiss'
principle i.e. "keep it simple stupid".
What is really behind the Smalltalk's decision?
Thanks and regards,
-Damodar Kulkarni
On Tue, Sep 3, 2013 at 12:06 PM, Richard A. O'Keefe <ok at cs.otago.ac.nz>wrote:
>
> On 3/09/2013, at 5:17 PM, damodar kulkarni wrote:
> > I didn't want to clutter that thread so I am asking a question here.
> > Where do I find foundational and/or other good references on the topic
> of "stream interface vs string interface to convert objects to text"? I
> tried google but failed.
>
> It has to do with the cost of string concatenation.
>
> Let's take a simple thing.
> A Set of Strings.
>
> Smalltalk has
>
> String
> printOn: aStream
> aStream nextPut: $'.
> self do: [:each |
> each = $' ifTrue: [aStream nextPut: $'].
> aStream nextPut: each].
> aStream nextPut: $'.
>
> (Smalltalk uses '' for strings with quote doubling for embedded
> quotes and has no character escapes. s nextPut: c writes character
> c to stream s. do: [:...] is a loop.)
>
> Set
> printOn: aStream
> |started|
> started := false.
> aStream nextPutAll: 'a Set ('.
> self do: [:each |
> started ifTrue: [aStream space] ifFalse: [started := true].
> each printOn: aStream].
> aStream nextPut: $'.
>
> Object
> printString
> |stream|
> stream := WriteStream on: (String new: 40).
> self printOn: stream.
> ^stream contents
>
> (A WriteStream is an internal stream. It starts with the
> array-like object you give it and grows it, typically by
> doubling, as needed. `contents' returns the part actually
> written to.)
>
> Let's actually do something subtly different.
> Each Set will contain the printString of a number
> and also another set, so
> a Set('3' a Set('2' a Set('1' a Set())))
>
> s := Set new.
> 1 to: 1000*1000 do: [:i |
> s := Set with: s with: i printString].
> s printOn: Transcript.
>
> The set having been created, how much allocation is done
> by the output phase? *NONE*. And the time is *LINEAR*
> in the size of the output.
>
> To summarise: Smalltalk makes "append print version to output stream"
> the basic form and "obtain print version as a string" a derived form.
> The result is that printing (acyclic) objects of any size takes time
> linear in the size of the output.
>
> Now consider the Java version.
> Java makes "obtain print version as a string" the basic form and
> "append print version to output stream" a derived form.
>
> There's a nasty little wrinkle which is that "foo".toString() is
> "foo" instead of the expected "\"foo\"" in Java. So the output
> will be [3, [2, [1, []]] or something like that.
>
> String {
> String toString() {
> return this;
> }
> }
>
> HashSet {
> String toString() {
> StringBuilder b = new StringBuilder();
> boolean started = false;
> b.append("[");
> for (Object o: this) {
> if (started) b.append(", "); else started = true;
> b.append(o.toString());
> }
> b.append("]");
> return b.toString();
> }
> }
>
> This takes *QUADRATIC* time, but it's annoyingly hard to demonstrate
> because it keeps crashing with a stack overflow for even quite small n.
> Thankfully, the -Xss option comes to the rescue.
>
> n time (seconds)
> 100 0.16
> 200 0.18
> 500 0.22
> 1000 0.30
> 2000 0.62
> 5000 2.58
> 10000 12.08
> 20000 51.54
>
> Not only does it take an obscene amount of time to print a large
> object, it turns over a totally unwarranted amount of space.
>
> Now you might object that sets like this are not realistic.
> If you are trying to write large circuits or abstract syntax
> trees or the like to a file, or using this *style* even if
> not this specific *interface* to write XML documents, the
> example errs by being unrealistically *easy* for Java.
> It's easy to understand what's going wrong.
>
> Consider [2, [1, []]]
> When visiting {}, we create "[]". So far so good.
> When visiting {1, {}}, we *copy* the "[]" into "[1, []]".
> When visiting {2, {1, {}}}, we *copy* the "[1, []]" into
> "[2, [1, []]]".
> And so it goes. This does an awful lot of copying and
> *none* of it is needed given a sane interface.
>
> What is the take of Haskell on this?
> There is a *reason* 'shows' exists.
>
> newtype Date = Date (Int,Int,Int)
> instance Show Date
> where showsPrec _ (Date (y,m,d)) rest =
> "Date (" ++ shows y ("," ++ shows m ("," ++ shows d (")" ++
> rest)))
>
> The only uses of ++ here are where the left operand is a short literal.
> Using this approach, Haskell programs can produce strings in linear time
> and linear space.
>
> For lazy I/O, using shows in Haskell is a good analogue of using
> #printOn: in Smalltalk. The basic form is "include this as PART of
> a stream", with "convert this to a whole string" as a derived form.
>
> What the equivalent of this would be for Iteratees I don't yet
> understand.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130903/0879f93d/attachment.htm>
More information about the Haskell-Cafe
mailing list