[Haskell-cafe] Code Review: Sudoku solver

Mon Apr 10 19:26:06 EDT 2006

Moin Claus,

Am Montag, 10. April 2006 10:11 schrieben Sie:
> >> Mine's even faster now (at least on my computer, would you care to test
> >> it on your's? If you don't want to get EnumSet, just change DiffArray to
> >> Array, worked wonders for me), I'll dig into yours tomorrow to see what
> >> I can get out of it to improve my algorithm.
> >
> > Unforunately, no new inference rules :-(
>
> indeed, the crucial difference between our versions was that I'd avoided
> group inference; I thought it would be expensive to implement and, based
> on my own limited experience, rarely used - boy, was I wrong on both
> accounts: it is easily implemented, with a little care for the inner loop,
> not too inefficient, either, and the examples I mentioned (on which my
> solver needed guesswork) simply fall apart once grouping is added (it was
> quite impressive to take the output of my old propagation code and apply
> grouping by hand!).

Just to be sure, with group inference added, your solver must guess for 
exactly the same puzzles as mine?
>
> now my solver takes slightly fewer guesses than your's, though your's
> is still significantly faster (same factor of about 3 between them, both
> having improved by a factor of 2 or so..).
>
> > Two things I don't like about your code:
> > 1. no type declarations
>
> that's what type inference is for!-) I tend to add them only as debugging
> assertions, when language limitations require them, or when I'm fairly
> certain that the code is stable (ie., dead;-). it is useful to think about
> types, but engraving them into the code is overrated (any good ide should

My IDE is kate/kwrite + ghci + hugs, if you know a better one (which doesn't 
involve loading down a 99MB thing like eclipse), I'd be interested to try it 
out.

> be able to show them as if they'd been written in anyway..; lacking such
> ide,
>
> :browse Main will do the trick, though ghci's current formatting is ugly!).
> :

Well, I didn't :b Main and add the signatures before I printed it out, and 
since I read it in bed, I had to figure them out myself.
And as somebody else somewhere stated, a type signature is about the most 
useful piece of documentation.
But I concede that there are also good reasons for leaving them out 
(especially in the early development stage).

> > 2. too much name shadowing, that makes following the code difficult
>
> depends on which parts you mean - I don't like overusing primed or

guesses, candidates and the where clause of findNarrowPos

> numbered variables, and using the same variable name through a
> sequence of steps is usually the last step before factoring that variable
> out into some monad anyway (I did mention that the code was still
> ad-hoc, not cleaned up).

Yes, and if you hadn't, I would have been annoyed by it (after all, I'm the 
only one who may write unreadable code :-)

>
> > apart from that: clever
>
> as in: you'd prefer it if I'd express myself more clearly?-) agreed.

That, too - don't we all want everybody else to express themselves 
immaculately lucid? 
But mostly: clever, as in: clever

> I've been working on cleaning up the code (also trying to remove
> stupidities as they become exposed in the process..).
>

I'd like to see it, when you consider it fit for posting.

> >> lists and adhoc code tend to be performance killers, I doubled the speed
> >> of mine by de-adhoccing the code (and that although I introduced the
> >> speed-killer DiffArray)
>
> there's nothing wrong with lists, just as arrays are not synonymous with
> good performance - it all depends on the usage patterns. there are several

Of course, and I love lists! I meant, lists are slow in situations like these, 
where you repeatedly access elements at various positions, as in 
findSingletonPos and findNarrowPos, where you have to traverse the whole list 
for each row/column/block. Such selections are much faster done over an array 
(I believe, but imagining a 1024 by 1024 sudoku, I'm rather confident)

> interesting examples in this problem:
>
> 1 when I first tried EnumSet for the ranges, there was no substantial
>     performance difference to a list-based representation.
>
>     in theory, the change sounds like a clear win (constant vs linear time
>     membership test), but (a) the ranges are small (they start at size 9,
>     and we do our best to shrink them rapidly, so even if they were at
>     average size 6 or 7, the test would take an average of 3 or 4 steps);
>     (b) membership is not the only operation - for the inner loops in my
>     code, singleton testing and extraction of the single element from a
>     singleton were equally important, and here the relation was reversed:
>     lists give us those operations in constant time while they were linear
>     in the old EnumSet.

Yes, and for those reasons I didn't expect much difference from changing to 
EnumSet, but in the group inference, I form a lot of unions (and before I 
cleaned up my code, also many differences) which I expected to be 
significantly faster with EnumSet, so I gave it a try and it was faster even 
before Bulat's optimization. After that, using EnumSet is about twice as fast 
as using lists.

>
>     Bulat's tabling suggestion (which I interpreted rather more extremely
>     by tabling the actual calls to size and range2list:-) solves this issue

Err, I'm not sure what you mean. Did you also build a table
ranges :: Array Word [Int]
ranges = listArray (l,u) $ map toList' [l .. u] ? 
interesting idea, I should try that.

I did, lost me 10 seconds, so that's not the way.

>     beautifully, making EnumSet starting to look competitive in spite of
>     the particular usage pattern in this problem, and when the grouping
>     inference asked for more calls to size, as well as intersection and
>     union, the choice flipped completely, now in favour of EnumSet
>     (has agreement been reached about an updated version of EnumSet?).
>
>     [if I might make a suggestion for EnumSet: it is not just that the
>      implementation of the standard API can be more efficient for these
>      small sets, but there are additional operations that ought to be
>      included (unless the EnumSet.Set constructor gets exported),
>      such as lifting the Ix instance from Word to Set a]
>
> 2 DiffArrays: they are neither a speed killer nor a universal cure.

I was talking about my programme where it was a speed killer.
I'm well aware of the fact that they may be _much_ faster than ordinary 
arrays, and I hoped that they were here, but they weren't.
Today, I wrote a version using IOArray, hoping to get incredibly fast in-place 
updating, and explicitly making copies via freeze and thaw when guessing is 
required, but no luck (maybe I just don't know how to do it properly), spent 
85% of the time in GC, 18.3% with -H64M -A32M, MUT time is about 15% more 
than plain Array and EnumSet.
If I had even the vaguest idea how I could provide an 
instance MArray IOUArray Entry IO
(or such with Entry replaced by (Int, Set Int) or even 
(# Int, Word #)), I would give that a try. 
I don't understand it, I would have thought that when I call newArray, an 
array of appropriate size is created somewhere on the heap (or wherever) and 
stays there until no longer referenced and writeArray would just change the 
contents of the relevant memory-cell and not touch the rest, so that should 
be fast and allocate less than plain Array, but it seems that isn't so :-(

Now I've changed writeArray to unsafeWrite (analogously for read), that 
brought MUT time to less than the plain Array version, but still spends a lot 
of time gc'ing (16.4% with -H64M -A64M, 10.2% with -H64M -A64M, that brought 
total time to less than plain Array, but had a maximum residency of 
74,252,696 bytes, too much for my liking, plain Array has maximum residency 
of several K)
>
>     they were intended to enable functional arrays with inplace
>     update for sequential access, but without the complex analyses
>     needed to ensure such sequential access, and with the reverse
>     diff to old versions preserving the persistence of copies.
>
>     using them, we'll get inplace update plus a long trail recording
>     the diffs needed to reconstruct the old versions, and if we use
>     them single-threadedly, that trail will just be ignored and the
>     garbage collector will neglect to copy it at some point. but if
>     we combine them with backtracking, those trails are likely to
>     kill much of the performance gains, and it is not so much the
>     width of branching as the depth of each branch: we iterate on
>     that array to narrow ranges and to update positions, and each
>     tiny update extends that trail of reverse diffs (they are not even
>     collapsed, so they can be longer than the array range) - so
>     when we finally abandon a branch and need to return to any
>     of the older versions of that array, we are going through that
>     trail, linearly, to reconstruct that old version from the new one
>     (doing an identity update before entering a branch should give
>     us two independent copies, though we'd need to make that
>     update not on the current version of the array..).

I tried to achieve that, but failed.
>
> 3 Arrays: again, they are no universal performance cure, and
>     their advantages depend on the usage patterns.
>
>     tabling calls to size is an ideal example in favour: a single array
>     construction vs frequent reads. in contrast, a sequence of
>     frequent small updates, as in constraint propagation, seems to
>     be rather disadvantageous: every time we narrow a range or
>     fix a position, the whole array gets copied! on the other hand,
>     those arrays are small, and the cost of copying may be lost in
>     other costs. but if we have to go through many elements,
>     the cost can be reduced by doing updates in bulk: a _list_ of
>     updates, with only one new copy being created, incrementally.
>
>     there is, however, one part of the problem where I'd expect
>     arrays to be advantageous, and that is in generating those lists
>     of updates: here we only inspect the sudoku matrix, reading
>     its elements according to various more or less complex views.
>     this part should be efficiently handled over an array, and as
>     your code shows, doing so may even result in fairly readable
>     code (although the clarity of my own code has improved as
>     well since the time when it used to be very concerned with
>     constructing those several views in rather complex ways;-).
>
> > I believe if you change the representation of puzzles from [(pos,range)]
> > to an Array, you'll get a significant speedup
>
> since my solver's logic now seems to be on par, I might try that
> for the generation of constraints, but I still doubt it is ideal for the
> actual updating of the matrix - perhaps bulk updates might be
> sufficient to remedy that.
>
> but the important lesson is that one can leave non-algorithmic
> performance concerns to the very end (where they need to be
> guided by profiling, and limited to the actual "inner loops") - I
> got my solver fairly fast without ever worrying about specialised
> representations, or strictness, etc.; focussing on algorithmic
> aspects was sufficient and necessary, however:

agree
>
> the naive generate-and-test is impractical, and while simple
> constraint propagation will lead to drastic speedups, those are
> not sufficient (they still left me in the 10000puzzles/hour stage),
> so more careful propagation is needed.
>
> my first narrowing propagation sped up the difficult puzzles, but
> slowed down the simple ones (and there are a lot of those in this
> set); I had already started on complex heuristics to apply that
> propagation selectively before I recalled my own good principles
> and looked at the algorithm again, as that slowdown should not
> have happened in the first place.
>
> and in fact, it turned out to be related to one of those optimizations
> we are all too likely to throw in automatically, without much thought:
> to avoid stumbling over the fixed positions as singletons ranges
> again and again, I had disabled that propagation, and had instead
> left it to each of the other propagation steps to note the appearance
> of new singletons, while ignoring the old ones.
>
> so, when my new narrowing step narrowed the range on some
> position to a singleton, without notifying anyone, that singleton
> got duly ignored by everyone else, assuming that it had already
> been taken care of. result: longer searches, in spite of more
> information!-( rectifying that, and simplifying some structures,
> got the time for the whole set down to the 17min I reported.
>
> on the other hand, one cannot always ignore performance completely,
> relying only on the algorithm: but usage patterns and profiling of actual
> code are the key here. when I finally added my group inference, it did
> slice down the number of guesses for the previously difficult puzzles,
> but the actual runtime for the whole set of puzzles increased by a few
> minutes! profiling showed some rather extravagantly complex code
> ("clever" in the worst sense!-) in the innermost loop, cured by
> simplifying (and clarifying) the code in question. the next step was
> moving to EnumSet, followed by the size/foldBits issue, cured by
> Bulat's tabling suggestion. now we're down to 8min20s or so, still
> with lists for backtracking and for the main matrix. (compared to
> 2min40s or so for your's)
>
> > Rot! Typo in _my_ previous message, 5319 is correct.
>
> I changed the output format to a line per puzzle (solution/number
> and list of guesses - I still like that metric as each guess means that
> we've run out of ideas, which suggests that we want to get the number
> of guesses down to zero), using the same format for both solvers to

and if we get down to zero, all's fine, but I think we shouldn't ignore the 
false guesses.

> allow for easy grep/diff.
>
> >> > puzzles involving guesses: 8165
> >> > largest number of guesses:
> >> >     10 (#9175), 10 (#17200), 10 (#29823), 10 (#30811)
>
> down to 5319 and 0/0/3/0 now.
>
> > We use different guessing strategies (plus, I also have group-inference).
>
> yep, that convinced me that group inference is a must (now added).
>
> as for guessing strategies, you're sorting the non-fixed positions by size
> of range before extracting candidates for guesses, whereas I simply take
> them in the order they appear in the puzzle (which may have been
> somewhat hard to see thanks to another one of those overcomplicated
> pieces of code that has now disappeared..). I had thought about that, but

I have also tried guessing on the first blank, increased time about 12%, so 
until I find a better criterion, I'll stay with fewest possibilities.
The (simple) idea behind that choice is, if I have only two possibilities, I 
have a 50% chance of being right on the first attempt - that reasoning of 
course was based on the 'find one solution and stop' target, for determining 
all solutions, i.e. following all branches, I'm not so sure I could justify 
it. 
However, I've also tried the opposite (just to compare), guess at a position 
with maximum number of choices: MUCH slower.

> couldn't make up my mind about preferring any order, and while the
> numbers of guesses for our two solvers vary wildly between puzzles,
> the total of guesses shows no advantage of sorting (the total looks
> slightly smaller for my solver, it has no two-digit guesses, and fewer
> puzzles with more than 6 guesses, so if anything, the evidence is against
> sorting, but I'd rather focus on reducing the guesses than try to attach
> any significance to these variations..).

Oh, absolutely, avoiding guesses is the best.
>
> if I eliminate the sorting from your solver, both solvers deliver the same
> results, so that's nice!-) but it also means that we have room for further
> improvement.
>
> one reason why I got interested in this problem in the first place was
> that I had been reading about constraint propagation, and wanted to
> get some hands-on experience. in particular, there is a technique
> called "generalized propagation":
>
>     Generalised Constraint Propagation Over the CLP Scheme,
>     by Thierry Le Provost and Mark Wallace. Journal of Logic
>     Programming 16, July 1993. Also [ECRC-92-1]
>     http://eclipse.crosscoreop.com:8080/eclipse/reports/ecrc_genprop.ps.gz
>
> if I may oversimplify things a bit:
>
> 1 split inferences into branching (search) and non-branching (propagation)
> 2 distribute common information from non-branching inferences
>     out of branches
> 3 to enhance applicability of step 2, add approximation of information
>
> applied to our sudoku problem, we have deterministic propagation
> and branching search, and what we do if we run out of propagation
> opportunities is to select a non-fixed position, then branch on each
> of the numbers in its range. adding generalised propagation (as I
> understand it) amounts to a form of lookahead: we take each of
> those branches, but look only at the deterministic information we
> can extract in each (ignoring further search). then we check whether
> the branches have any of that information in common, and apply any
> such common information to our matrix. rinse and repeat. an obvious
> way to refine this process by approximation is to build the union of
> the ranges on each position for the matrices resulting from all the
> branches.
>
> using this idea with a single level of branch lookahead, and selecting
> a position with the widest range for inspection, reduces the number
> of puzzles requiring guesses to 373, with a maximum depth of 3
> guesses.

But this method, if I'm not grossly mistaken, does involve guessing - refined, 
educated guessing, but guessing still. On the other hand, one might correctly 
state that even the wildest guess and check is logic (proof by contradiction; 
if I put values v_1 to v_n at positions p_1 to p_n respectively and then 
value v_(n+1) at position p_(n+1), I finally can't satisfy the constraints, 
hence, given v_1 ... v_n at p_1 ... p_n, at p_(n+1), v_(n+1) cannot be ...)

>
> it doesn't reduce runtime much, yet (7min50s), but I haven't
> even started profiling, either. I guess I have to look into
> representations now, if I ever want my solver to catch up with
> your's in terms of runtime;-)
>
> cheers,
> claus

Cheers,
Daniel

-- 

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
	-- Blair P. Houghton