Naive question on lists of duplicates

Dylan Thurston
Sat, 7 Jun 2003 03:06:29 +0200


On Thu, Jun 05, 2003 at 08:09:02AM -0500, Stecher, Jack wrote:
> I have an exceedingly simple problem to address, and am wondering if
> there are relatively straightforward ways to improve the efficiency
> of my solution.

Was there actually a problem with the efficiency of your first code?

> The task is simply to look at a lengthy list of stock keeping units
> (SKUs -- what retailers call individual items), stores, dates that a
> promotion started, dates the promotion ended, and something like
> sales amount; we want to pull out the records where promotions
> overlap.  I will have dates in yyyymmdd format, so there's probably
> no harm in treating them as Ints.

(Unless this is really a one-shot deal, I suspect using Ints for dates
is a bad decision...)
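
To illustrate the concern, here is a minimal sketch, assuming the
Data.Time.Calendar module from the time package is available. The
helper names parseYmd and promoDays are hypothetical, not part of the
original code. Subtracting raw yyyymmdd Ints gives nonsense across a
month boundary, and an Int cannot reject an impossible date:

```haskell
import Data.Time.Calendar (Day, fromGregorianValid, diffDays)

-- Hypothetical helper: decode a yyyymmdd Int into a calendar Day,
-- returning Nothing for values that are not real dates.
parseYmd :: Int -> Maybe Day
parseYmd n = fromGregorianValid (fromIntegral y) m d
  where
    (y, md) = n `divMod` 10000
    (m, d)  = md `divMod` 100

-- Hypothetical helper: inclusive length of a promotion in days,
-- defined only when both endpoints are valid dates.
promoDays :: Int -> Int -> Maybe Integer
promoDays s e = do
  ds <- parseYmd s
  de <- parseYmd e
  pure (diffDays de ds + 1)
```

With these definitions, promoDays 20030528 20030603 is Just 7, whereas
the plain Int difference 20030603 - 20030528 is 75, and
parseYmd 20030631 is Nothing.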

> My suggestion went something like this (I'm not at my desk so I
> don't have exactly what I typed):

I have a different algorithm, which should be nearly optimal, but I
find it harder to describe than to show the code (which is untested):

> import Data.List (sortBy, insertBy)
>
> data PromotionRec = PR { sku :: String, store :: String
>                        , startDate :: Int, endDate :: Int
>                        , amount :: Float }
>
> compareStart, compareEnd :: PromotionRec -> PromotionRec -> Ordering
> compareStart x y = compare (startDate x) (startDate y)
> compareEnd x y = compare (endDate x) (endDate y)

> -- Groups of overlapping promotions; singleton groups are dropped.
> overlap :: [PromotionRec] -> [[PromotionRec]]
> overlap l = filter (\g -> length g > 1)
>                (overlap' [] (sortBy compareStart l))
>
> -- First argument: promotions still active, sorted by ending date.
> overlap' :: [PromotionRec] -> [PromotionRec] -> [[PromotionRec]]
> overlap' _ [] = []
> overlap' active (x:xs) =
>   let active' = dropWhile (\y -> endDate y < startDate x) active
>   in (x:active') : overlap' (insertBy compareEnd x active') xs

The key is that, by keeping the list of currently active promotions
sorted by ending date, we only need to discard an initial portion of
the list at each step.
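
A small self-contained sketch of that sweep may make the behaviour
easier to see. It assumes promotions reduced to bare (start, end)
pairs; the names Interval, sweep, overlaps and sample are made up for
illustration and the data is invented:

```haskell
import Data.List (sortBy, insertBy)
import Data.Ord (comparing)

-- Hypothetical simplified record: (start date, end date) only.
type Interval = (Int, Int)

-- One sweep step: expire everything that ended before the new start,
-- report the new interval together with what is still active, then
-- re-insert it so the active list stays sorted by end date.
sweep :: [Interval] -> [Interval] -> [[Interval]]
sweep _ [] = []
sweep active (x : xs) =
  let active' = dropWhile (\y -> snd y < fst x) active
  in (x : active') : sweep (insertBy (comparing snd) x active') xs

-- Sort by start date, sweep, and keep only groups with an overlap.
overlaps :: [Interval] -> [[Interval]]
overlaps = filter ((> 1) . length) . sweep [] . sortBy (comparing fst)

-- Invented sample data: three promotions in January, one in February.
sample :: [Interval]
sample =
  [ (20030101, 20030110)
  , (20030105, 20030120)
  , (20030112, 20030115)
  , (20030201, 20030210)
  ]
```

Here overlaps sample yields two groups: the second promotion with the
first, and the third with the second. The first promotion has already
been dropped from the active list by the time the third one starts,
and the February promotion overlaps nothing.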

You could get a moderately more efficient implementation by keeping
the active list as a heap rather than a list.
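
As a rough sketch of what that might look like, the following uses a
small hand-written pairing heap over bare (start, end) pairs. The heap
implementation and all names here are assumptions for illustration,
not code from the original post, and it is as untested in practice as
the list version. Insertion no longer walks the active list, though
listing the active promotions for each group is still linear:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Minimal pairing heap (illustrative, not a library type).
data Heap a = Empty | Node a [Heap a]

merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x hs1) h2@(Node y hs2)
  | x <= y    = Node x (h2 : hs1)
  | otherwise = Node y (h1 : hs2)

mergePairs :: Ord a => [Heap a] -> Heap a
mergePairs []             = Empty
mergePairs [h]            = h
mergePairs (h1 : h2 : hs) = merge (merge h1 h2) (mergePairs hs)

heapInsert :: Ord a => a -> Heap a -> Heap a
heapInsert x = merge (Node x [])

heapToList :: Heap a -> [a]
heapToList Empty       = []
heapToList (Node x hs) = x : concatMap heapToList hs

type Interval = (Int, Int)  -- (start date, end date)

-- The heap stores (end, start) so the earliest end date is on top.
dropExpired :: Int -> Heap (Int, Int) -> Heap (Int, Int)
dropExpired s (Node (e, _) hs) | e < s = dropExpired s (mergePairs hs)
dropExpired _ h = h

overlapsHeap :: [Interval] -> [[Interval]]
overlapsHeap = filter ((> 1) . length) . go Empty . sortBy (comparing fst)
  where
    go _ [] = []
    go active (iv@(s, e) : rest) =
      let active' = dropExpired s active
          grp     = iv : [ (s', e') | (e', s') <- heapToList active' ]
      in grp : go (heapInsert (e, s) active') rest
```

The order of the already-active promotions inside each group follows
the heap's internal layout, so it should not be relied upon.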

