-O and -prof

Simon Marlow simonmar@microsoft.com
Fri, 16 May 2003 13:04:18 +0100


> John Meacham <john@repetae.net> writes:
>
> > I always use them in combination, simply because optimization can
> > drastically change the memory usage/profile. Profiling the
> > unoptimized version seems rather moot.
>
> Exactly.  If this is indeed the intention, I guess the stack overflow
> is a bug?

Yes, strictly speaking.  It may be a spurious case of the code generated
with -O -prof being slightly different from that generated by -O alone -
this happens because having cost centres floating around in the
intermediate code disables some transformations that would normally be
applicable.
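
To make "cost centres floating around in the intermediate code" concrete, here is a minimal sketch of where manual cost centres sit in source. The function mean and the cost-centre names are made up for illustration, and which transformations get blocked around such annotations is not something this example demonstrates; it only shows the annotations that are present when the module is compiled with -O -prof.

```haskell
import Data.List (foldl')

-- Hypothetical example function: each SCC pragma names a cost centre
-- wrapped around the parenthesised expression that follows it.
mean :: [Double] -> Double
mean xs = total / count
  where
    total = {-# SCC "mean_total" #-} (foldl' (+) 0 xs)
    count = {-# SCC "mean_count" #-} (fromIntegral (length xs))

main :: IO ()
main = print (mean [1, 2, 3, 4])
```

Without -prof the pragmas are ignored, so the same file can be built both ways to compare behaviour.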

Nevertheless, if you can provide a test case we'll look into it.

> > BTW, does -O2 still not do much more than -O? It seems to reduce the
> > memory footprint of some of my apps pretty noticeably.
>
> Not sure.  The docs seem to indicate the speed improvement is
> negligible.

It's normally negligible.  -O2 turns on one more optimisation pass,
which *might* have a significant impact on your program if it hits the
inner loop, but in most cases just costs you extra compilation time for
not much benefit.

> I thought I saw some hints in the GHC docs, including using
> -fvia-C, but I couldn't find them, and I'm not sure if they would
> still be current.
>
> Memory footprint is a problem; I wonder if GHC makes any
> effort to pack strict data types?  I.e.
>
>   data D1 = A | B
>   data D2 = A2 | B2 | C2
>
>   data D3 = D !D1 !D2  -- could fit inside e.g. a Word8?

We don't do any useful optimisation of the representation of D3 here, but
we could (it's been on my ToDo list since I implemented
-funbox-strict-fields some time ago).  In a similar vein, the strictness
analyser doesn't take advantage of strict enumerated types - it could
map them to Int#, for example.

Semitagging (an optimisation in the works along with optimistic
evaluation) will provide some of the benefit that a more efficient
representation would yield here.

> Is there an elegant way to achieve this manually (if I know I'll need
> large arrays of D3s, for instance -- can I map them to arrays of Word8
> or a similar type?)

If you map D1 and D2 to Int by hand, then use

  data D3 = D !Int !Int

you'll get the speed benefit, but not all of the space saving (each field
will still take up a 32-bit word).
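
A rough sketch of the hand mapping, plus one possible way to handle the large-array case from the question. The helper names (mkD3, d3Fields, encode, decode, packAll) and the byte encoding are assumptions made for illustration, not anything GHC provides; the array type comes from Data.Array.Unboxed, which stores each Word8 element as a single byte.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word8)

data D1 = A | B deriving (Eq, Show, Enum)
data D2 = A2 | B2 | C2 deriving (Eq, Show, Enum)

-- Hand-mapped representation: both fields are strict Ints.
data D3 = D !Int !Int deriving (Eq, Show)

-- Hypothetical helpers converting between the enumerations and Ints.
mkD3 :: D1 -> D2 -> D3
mkD3 x y = D (fromEnum x) (fromEnum y)

d3Fields :: D3 -> (D1, D2)
d3Fields (D x y) = (toEnum x, toEnum y)

-- Assumed encoding: 2 * 3 = 6 combinations, numbered 0..5, fit in a byte.
encode :: D3 -> Word8
encode (D x y) = fromIntegral (x * 3 + y)

-- Only meaningful for bytes produced by encode (values 0..5).
decode :: Word8 -> D3
decode w = D (n `div` 3) (n `mod` 3)
  where
    n = fromIntegral w

-- Store a list of D3 values as an unboxed array of bytes.
packAll :: [D3] -> UArray Int Word8
packAll ds = listArray (0, length ds - 1) (map encode ds)

main :: IO ()
main = do
  let arr = packAll [mkD3 A A2, mkD3 B C2, mkD3 A B2]
  print (d3Fields (decode (arr ! 1)))  -- prints (B,C2)
```

The trade-off is that every element access pays for a decode, so this only makes sense when the array is large enough for the space to matter.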

Cheers,
	Simon