The size of things

Simon Marlow simonmar@microsoft.com
Thu, 14 Feb 2002 15:13:01 -0000


> I'm building some stuff here that is basically a glorified and
> specialized version of quicksort.  It is very quick, and works like a
> charm for my purposes, except that it consumes way too much space,
> about 100 bytes times the size of the input list.  Which means that,
> for my data sets, the program exceeds available RAM and uses swap,
> which in turn kills performance.
>=20
> Is there any guideline how much space the various constructs use?
> E.g. how much for cons cells (or whatever that make up lists), how
> much for a data type with only nullary data constructors -- should I
> use Word8s instead? -- how much for tuples, and so on.  I realize I
> can (and I do somewhat) use profiling to determine heap use, but it'd
> be nice to be able to do back-of-envelope calculations and have some
> idea of what's the right approach.=20

Ok, roughly speaking, in GHC every object requires one word plus one =
word for each field (a word is 4 bytes on a 32-bit machine).  In other =
words, a cons cell will require 3 words.

An Int requires two words.  A Char usually requires no words (chars up =
to 256 are cached in the RTS).  Nullary constructors require no dynamic =
storage.  The smallest dynamic object is 2 words: ie. a Word8 requires 2 =
words, even though the useful data it contains is only a single byte.

Of course, all of this just applies to fully-evaluated structures - any =
suspensions (thunks) will screw up the calculations, and certainly the =
amount of temporary allocation required to *generate* a particular =
structure is very rarely predictable.

You can save storage by using strict constructor fields and using the =
-funbox-strict-fields flag.  eg.

	data FloatList =3D FloatNil | FloatCons !Float FloatList

requires 3 words for each float in the list, because its actual =
representation is
=09
	data FloatList =3D FloatNil | FloatCons Float# FloatList

(Float# is an unboxed float).  We heartily recommend strict constructor =
fields: apart from saving space and time, they help avoid space leaks.  =
Using -funbox-strict-fields is a good way to get the benefits of unboxed =
types without digging around in GHC's low-level primitive types & =
operations.

Does that help?

Cheers,
	Simon