Naive question on lists of duplicates
Dylan Thurston
dpt@math.harvard.edu
Sat, 7 Jun 2003 03:06:29 +0200
--d6Gm4EdcadzBjdND
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Thu, Jun 05, 2003 at 08:09:02AM -0500, Stecher, Jack wrote:
> I have an exceedingly simple problem to address, and am wondering if
> there are relatively straightforward ways to improve the efficiency
> of my solution.
Was there actually a problem with the efficiency of your first code?
> The task is simply to look at a lengthy list of stock keeping units
> (SKUs -- what retailers call individual items), stores, dates that a
> promotion started, dates the promotion ended, and something like
> sales amount; we want to pull out the records where promotions
> overlap. I will have dates in yyyymmdd format, so there's probably
> no harm in treating them as Ints.
(Unless this is really a one-shot deal, I suspect using Ints for dates
is a bad decision...)
> My suggestion went something like this (I'm not at my desk so I
> don't have exactly what I typed):
I have a different algorithm, which should be nearly optimal, but I
find it harder to describe than to show the code (which is untested):
> import List(sortBy, insertBy)
>
> data PromotionRec =3D PR {sku :: String, store :: String, startDate :: I=
nt, endDate :: Int, amount::Float}
>
> compareStart, compareEnd :: PromotionRec -> PromotionRec -> Ordering
> compareStart x y =3D compare (startDate x) (startDate y)
> compareEnd x y =3D compare (endDate x) (endDate y)
> overlap :: [PromoRec] -> [[PromoRec]]
> overlap l =3D filter (lambda l. length l > 1)=20
> (overlap' [] (sortBy compareStart l))
>
> overlap' _ [] =3D []
> overlap' active (x:xs) =3D
> let {active' =3D dropWhile (lambda y. endDate y < startDate x) active} =
in
> (x:active') : overlap' (insertBy compareEnd x active') xs
The key is that, by keeping a list of the currently active promotions
in order sorted by the ending date, we only need to discared an
initial portion of the list.
You could get a moderately more efficient implementation by keeping
the active list as a heap rather than a list.
Peace,
Dylan
--d6Gm4EdcadzBjdND
Content-Type: application/pgp-signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
iD8DBQE+4TqVVeybfhaa3tcRAvYYAJ9JwrS6WKViKTB1lOxArpojN3JbvACggTvN
aMltiRdkXtFNrSTI8BBpPXc=
=mlll
-----END PGP SIGNATURE-----
--d6Gm4EdcadzBjdND--