[Haskell-cafe] Decorating a list of strings

Fri Nov 3 04:47:03 EST 2006

Steve Schafer wrote:
> I have a list of text strings:
> 
>  ["Alice", "Bob", "Cindy", "Bob", "Bob", "Dave", "Cindy"]
> 
> As you can see, some of the strings occur only once; others appear two
> or more times.
> 
> I would like to end up with a new list, according to the following
> rules:
> 
> 1) If a string occurs only once in the original list, then that string
> appears unchanged in the new list.
> 
> 2) If a string occurs more than once in the original list, then each
> occurrences of that string is "decorated" with a numerical suffix
> indicating its relative position among its peers.
> 
> In the case of the above list, the result of applying these rules would
> be:
> 
>  ["Alice", "Bob:1", "Cindy:1", "Bob:2", "Bob:3", "Dave", "Cindy:2"]
> 
> So, how do we do this? The straightforward approach is to traverse the
> list twice, once to build a table of repeat counts for the different
> strings, and a second time to generate the new list, based on the old
> list plus the table of repeat counts. Of course, it's possible to
> streamline this into a single traversal by building up a list of thunks
> at the same time as the table of repeat counts is generated, and then to
> resolve those thunks by applying them in one fell swoop to the table.
> But while this does _conceptually_ streamline the problem, it doesn't
> offer any practical improvement; it takes at least as much processing
> effort to resolve the thunks as it would to make a second pass through
> the list.

The processing effort is a matter of optimization.  It is entirely possible that
for small lists that fit in the CPU cache that a double traversal is very cheap,
where for a large lists that spills into swap space on disk that a second
traversal is very expensive.  So the obvious thing to do is just write the
easiest to maintain version and if the performance benchmarks are bad then try
the other method.

> Can we do better?

no.