[Haskell-cafe] looking for a good algorithm

Casey Hawthorne caseyh at istar.ca
Wed Nov 11 18:32:35 EST 2009


So, as I understand it, you have a very large sparse table, thousands
of rows and hundreds of columns, where each column has type String,
Int, or Double and each cell holds a value of that type or nothing.

Then you want to shuffle the rows to maximize the number of columns
whose first 100 rows have at least one number (Int or Double). You
are given a list of preferred column names, since there is no
guarantee that every number column will have at least one number in
its first 100 rows after shuffling.
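
To pin down what is being maximized, here is a minimal sketch of the
objective as I read it. The Cell type, the dense list-of-rows Table,
and the name coveredColumns are all my own assumptions for
illustration, and the sketch ignores the preferred-column list.

```haskell
import Data.List (transpose)

-- Hypothetical cell type: a cell is either empty or holds one value.
data Cell = Empty | S String | I Int | D Double
  deriving (Eq, Show)

-- A table as a list of rows; every row is assumed to have the same width.
type Table = [[Cell]]

-- True for Int and Double cells only.
isNumber :: Cell -> Bool
isNumber (I _) = True
isNumber (D _) = True
isNumber _     = False

-- Count the columns that have at least one number in their first n rows.
coveredColumns :: Int -> Table -> Int
coveredColumns n rows =
  length (filter (any isNumber) (transpose (take n rows)))
```

With n = 100 this is the quantity a good row ordering should make as
large as possible.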


I'm wondering about hashing on the rows and hashing on the columns.
The column hash would hold, for each column, the number of Ints or
Doubles (the Strings aren't needed) in that column and the rows they
are in.

The row hash would hold, for each row, the number of Ints and Doubles
in that row and which columns they are in.
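
A rough sketch of the two indexes, using Data.Map from the containers
package in place of a hash table. The sparse representation (a list
of non-empty cells keyed by row and column number) and the names
rowIndex and colIndex are assumptions of mine. The count is not
stored separately; it is just the length of each list.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical cell type; empty cells are simply absent from the table.
data Cell = S String | I Int | D Double
  deriving (Eq, Show)

type Row = Int
type Col = Int

-- Sparse table: only the non-empty cells are listed.
type Sparse = [((Row, Col), Cell)]

-- True for Int and Double cells only.
isNumber :: Cell -> Bool
isNumber (I _) = True
isNumber (D _) = True
isNumber _     = False

-- For each row, the columns in which that row holds a number.
rowIndex :: Sparse -> M.Map Row [Col]
rowIndex cells =
  M.fromListWith (++) [ (r, [c]) | ((r, c), v) <- cells, isNumber v ]

-- For each column, the rows in which that column holds a number.
colIndex :: Sparse -> M.Map Col [Row]
colIndex cells =
  M.fromListWith (++) [ (c, [r]) | ((r, c), v) <- cells, isNumber v ]
```

Rows or columns with no numbers at all do not appear in the maps.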

Then scan the row hash and sort the rows into descending order of
that count, by tagging the rows rather than actually moving them.
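
One way to read "tagging" is to sort only the row numbers and leave
the row data alone. A small sketch, assuming the row hash is a
Data.Map from row number to the columns holding numbers (rowOrder is
a name I made up):

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))
import qualified Data.Map.Strict as M

type Row = Int
type Col = Int

-- Row numbers ordered by how many numbers each row holds, most first.
-- Only the row numbers are permuted; the row data is never moved.
-- sortBy is stable, so rows with equal counts stay in ascending order.
rowOrder :: M.Map Row [Col] -> [Row]
rowOrder idx =
  map fst (sortBy (comparing (Down . length . snd)) (M.toList idx))
```

For example, rows 0, 1 and 2 holding 1, 3 and 2 numbers respectively
come out as [1,2,0].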

Then I think you're ready for simulated annealing.
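
For what it's worth, here is a toy sketch of what the annealing step
might look like, not a tuned implementation. Everything in it is an
assumption on my part: the move (swap one row inside the window with
one outside), the geometric cooling factor 0.95, the starting
temperature 1.0, and the little linear congruential generator, which
is there only so the sketch needs nothing beyond containers. It
recomputes the score from scratch each step, which is far too slow
for a real table.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Row = Int
type Col = Int

-- Number of distinct columns with a number among the first n rows.
score :: Int -> M.Map Row [Col] -> [Row] -> Int
score n idx order =
  S.size (S.fromList (concat [ M.findWithDefault [] r idx | r <- take n order ]))

-- Small linear congruential generator (illustrative, not high quality).
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Swap the elements at positions i and j of a list.
swapAt :: Int -> Int -> [a] -> [a]
swapAt i j xs = [ pick k x | (k, x) <- zip [0 ..] xs ]
  where
    pick k x
      | k == i    = xs !! j
      | k == j    = xs !! i
      | otherwise = x

-- Run a fixed number of annealing steps and return the best order seen.
-- Arguments: window size, row index, step count, seed, starting order.
anneal :: Int -> M.Map Row [Col] -> Int -> Int -> [Row] -> [Row]
anneal n idx steps seed0 start = go steps seed0 1.0 start start
  where
    len = length start

    go :: Int -> Int -> Double -> [Row] -> [Row] -> [Row]
    go 0 _ _ _ best = best
    go k s t cur best
      | n <= 0 || len <= n = best   -- nothing to swap across the window edge
      | otherwise =
          let s1 = nextSeed s
              s2 = nextSeed s1
              s3 = nextSeed s2
              -- use the higher bits, which are less regular in an LCG
              i = (s1 `div` 65536) `mod` n                -- inside the window
              j = n + (s2 `div` 65536) `mod` (len - n)    -- outside the window
              cand = swapAt i j cur
              delta = fromIntegral (score n idx cand - score n idx cur) :: Double
              u = fromIntegral s3 / 2147483648 :: Double
              -- always accept improvements; accept worse moves with
              -- probability exp (delta / t), which shrinks as t cools
              accept = delta >= 0 || u < exp (delta / t)
              cur' = if accept then cand else cur
              best' = if score n idx cur' > score n idx best then cur' else best
          in go (k - 1) s3 (t * 0.95) cur' best'
```

Starting from the sorted row order rather than a random one should
give the search a head start, since rows with many numbers are
already at the top.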


--
Regards,
Casey

