memory usage of data types in the time package

Duncan Coutts duncan.coutts at googlemail.com
Fri May 21 09:54:20 EDT 2010


Hi Ashley (and other people interested in the time lib),

I wonder if we could look more closely at the in-memory representations
of the data types in the time library to see if we could make them a bit
smaller.

There are various sorts of programs that deal with a large quantity of
data and often that includes some date/time data. It's a great shame
that programs dealing with lots of time data might have to avoid using
the time package and resort to things like

    newtype Seconds = Seconds Int32

simply because they take less space and can be unpacked into other data
structures.
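For illustration, here is a minimal sketch of that workaround. The record `LogEntry` and its field names are hypothetical. Because `Seconds` is a newtype over a fixed-size `Int32`, a strict field of that type can carry an `{-# UNPACK #-}` pragma, so the value is stored inside the enclosing constructor rather than behind a pointer.

```haskell
module Main (main) where

import Data.Int (Int32)

-- The ad hoc workaround: seconds since some epoch, held in a fixed-size
-- 32-bit integer.
newtype Seconds = Seconds Int32
  deriving (Eq, Ord, Show)

-- Hypothetical record containing such a timestamp. Since Seconds is a
-- newtype over Int32, the strict field can be unpacked: the 32-bit value
-- lives inside the LogEntry constructor instead of in a separate heap
-- object reached through a pointer.
data LogEntry = LogEntry
  { entryCode :: {-# UNPACK #-} !Int
  , entryTime :: {-# UNPACK #-} !Seconds
  } deriving (Eq, Show)

main :: IO ()
main = print (LogEntry 200 (Seconds 1274450060))  -- arbitrary example values
```

Nothing comparable is possible with a field of an `Integer`-backed type, which is the gap this message is about.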

Looking at the representations in the time package, I think there are a
number of places where the size could be reduced without affecting the
behaviour or significantly affecting the range of values that can be
represented.

For example,

data TimeOfDay = TimeOfDay {
	todHour    :: Int,
	todMin     :: Int,
	todSec     :: Pico
}

This uses 10 words, 40 bytes (or 80 bytes on 64bit): a 4-word
constructor, two boxed Ints of 2 words each and a 2-word Integer. It
could be brought down to just 20 bytes (or 32 bytes on 64bit).

We could use strict fields here for the Ints and build the package with
-funbox-strict-fields. This would save 4 words (the two boxed Ints). The
most expensive field here is Pico, which uses an Integer underneath and
thus cannot be unboxed. We could save a further two words on 64bit (one
on 32bit) if Pico were based on Int64 instead of Integer. In principle,
the hour and minute fields could be Int16, and perhaps in future GHC
might be able to pack them into a single word inside the TimeOfDay
record.
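A rough sketch of what such a record could look like, assuming a hypothetical `CompactTimeOfDay` type (not the time package's API). The seconds are stored as an `Int64` count of picoseconds in place of an `Integer`-backed `Pico`. The explicit `UNPACK` pragmas have the same per-field effect that `-funbox-strict-fields` has on every strict field. The smart constructor `mkTimeOfDay` is also hypothetical; it checks the bounds the time package documents for `TimeOfDay`.

```haskell
module Main (main) where

import Data.Int (Int64)

-- Picoseconds per second, the resolution of Pico.
picosPerSecond :: Int64
picosPerSecond = 1000000000000

-- Hypothetical compact TimeOfDay. Every field is strict and unpacked, and
-- the seconds are an Int64 count of picoseconds rather than an
-- Integer-backed Pico. On 64-bit GHC a value is one header word plus three
-- payload words, with no pointers to other heap objects.
data CompactTimeOfDay = CompactTimeOfDay
  { ctodHour  :: {-# UNPACK #-} !Int
  , ctodMin   :: {-# UNPACK #-} !Int
  , ctodPicos :: {-# UNPACK #-} !Int64  -- seconds within the minute, in picoseconds
  } deriving (Eq, Ord, Show)

-- Smart constructor using the documented TimeOfDay bounds: hour 0..23,
-- minute 0..59, and 0 <= seconds < 61 (allowing for a leap second).
mkTimeOfDay :: Int -> Int -> Int64 -> Maybe CompactTimeOfDay
mkTimeOfDay h m ps
  | h < 0 || h > 23 = Nothing
  | m < 0 || m > 59 = Nothing
  | ps < 0 || ps >= 61 * picosPerSecond = Nothing
  | otherwise = Just (CompactTimeOfDay h m ps)

main :: IO ()
main = do
  print (mkTimeOfDay 9 54 (20 * picosPerSecond))
  print (mkTimeOfDay 24 0 0)
```

The derived `Ord` compares hour, then minute, then picoseconds, which matches chronological order within a day.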

If we find we cannot change Pico (since it's in the base package), then
we could change the time package to use an equivalent fixed-point type
of its own.
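One possible shape for such a type is sketched below, assuming a hypothetical `Pico64` newtype that counts picoseconds in an `Int64`. It has the same resolution as `Data.Fixed.Pico`, so it converts exactly into `Pico` through base's `MkFixed` constructor. The trade-off is range. `maxBound :: Int64` picoseconds is about 9.2 * 10^6 seconds, roughly 106 days. That is ample for a time of day but much narrower than an `Integer`-backed `Pico`.

```haskell
module Main (main) where

import Data.Fixed (Fixed (MkFixed), Pico)
import Data.Int (Int64)

-- Hypothetical fixed-point type with Pico's resolution (10^-12 seconds),
-- backed by an Int64 count of picoseconds instead of an Integer.
newtype Pico64 = Pico64 Int64
  deriving (Eq, Ord, Show)

-- Exact conversion into base's Pico, whose MkFixed constructor also holds
-- a count of picoseconds.
toPico :: Pico64 -> Pico
toPico (Pico64 n) = MkFixed (toInteger n)

-- Conversion back, returning Nothing when the value does not fit in Int64.
fromPico :: Pico -> Maybe Pico64
fromPico (MkFixed n)
  | n < toInteger (minBound :: Int64) = Nothing
  | n > toInteger (maxBound :: Int64) = Nothing
  | otherwise = Just (Pico64 (fromInteger n))

main :: IO ()
main = do
  print (toPico (Pico64 1500000000000))  -- 1.5 seconds
  print (fromPico 86400.5)
  print (fromPico 1e7)                   -- 10^7 seconds: out of range
```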


newtype Day = ModifiedJulianDay {toModifiedJulianDay :: Integer}

This could use Int64 without significantly affecting the range. It would
still let us represent dates trillions of years into the future or past.
(Using Int32 would allow dates a mere few million years in the future).
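The range figures are easy to check. The small program below divides the largest value of each integer type by the mean Gregorian year of 365.2425 days. The helper name `yearsRepresentable` is hypothetical. The results come out at roughly 5.9 million years for Int32 and roughly 2.5 * 10^16 years for Int64, in each direction from the Modified Julian Day epoch.

```haskell
module Main (main) where

import Data.Int (Int32, Int64)

-- Approximate number of years spanned on each side of the Modified Julian
-- Day epoch by a day count whose largest value is maxDays, using the mean
-- Gregorian year of 365.2425 days.
yearsRepresentable :: Integer -> Double
yearsRepresentable maxDays = fromInteger maxDays / 365.2425

main :: IO ()
main = do
  putStrLn ("Int32: " ++ show (yearsRepresentable (toInteger (maxBound :: Int32))))
  putStrLn ("Int64: " ++ show (yearsRepresentable (toInteger (maxBound :: Int64))))
```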

Again, the point of using Int64 is to save an indirection and to allow
the type to be unboxed into other constructors such as:

data UTCTime = UTCTime {
	utctDay :: Day,
	utctDayTime :: DiffTime
}

This takes 7 words (a 3-word constructor plus two 2-word Integers), 28
or 56 bytes. It could be reduced to 20 or 24 bytes.

This one is especially useful to make smaller since it is used as a
timestamp in many applications, so there tend to be a lot of them.
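A sketch of the reduced form follows. It assumes hypothetical fixed-size replacements `Day64` and `DiffTime64` (picoseconds since midnight). These are not time package types. Both are newtypes over `Int64`, so they can be unpacked into the timestamp record. The helper `diffPicos` is also hypothetical. It is included to show that ordinary arithmetic stays simple, and it treats every day as exactly 86400 seconds.

```haskell
module Main (main) where

import Data.Int (Int64)

-- Hypothetical fixed-size replacement for Day (a Modified Julian Day number).
newtype Day64 = ModifiedJulianDay64 Int64
  deriving (Eq, Ord, Show)

-- Hypothetical replacement for DiffTime: time since midnight in
-- picoseconds. Even a day with a leap second needs at most 86401 * 10^12.
newtype DiffTime64 = DiffTime64 Int64
  deriving (Eq, Ord, Show)

-- Both newtype fields are strict and unpacked, so on 64-bit GHC a value is
-- three words (header, day, picoseconds) and contains no pointers. The
-- derived Ord compares the day first, then the time of day.
data CompactUTCTime = CompactUTCTime
  { cutcDay     :: {-# UNPACK #-} !Day64
  , cutcDayTime :: {-# UNPACK #-} !DiffTime64
  } deriving (Eq, Ord, Show)

picosPerDay :: Integer
picosPerDay = 86400 * 1000000000000

-- Difference a - b in picoseconds, treating every day as exactly 86400
-- seconds (leap seconds ignored). Computed in Integer to avoid overflow.
diffPicos :: CompactUTCTime -> CompactUTCTime -> Integer
diffPicos (CompactUTCTime (ModifiedJulianDay64 d1) (DiffTime64 t1))
          (CompactUTCTime (ModifiedJulianDay64 d2) (DiffTime64 t2)) =
  (toInteger d1 - toInteger d2) * picosPerDay + (toInteger t1 - toInteger t2)

main :: IO ()
main = do
  let earlier = CompactUTCTime (ModifiedJulianDay64 55337) (DiffTime64 (50060 * 1000000000000))
      later   = CompactUTCTime (ModifiedJulianDay64 55338) (DiffTime64 0)
  print (earlier < later)
  print (diffPicos later earlier)
```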

newtype DiffTime = MkDiffTime Pico

Again, Pico based on Int64 rather than Integer would save space, save an
indirection and allow further unboxing.

In general, do any of the date/time record types need to have lazy
fields? My guess is that they do not.

Duncan
