lines/unlines and "inverse"

D. Tweed tweed@compsci.bristol.ac.uk
Sun, 21 Jul 2002 13:35:42 +0100 (BST)


On 21 Jul 2002, Lars Henrik Mathiesen wrote:

> > From: Ian Lynagh <igloo@earth.li>
> > Date: Sat, 20 Jul 2002 15:03:22 +0100
> > 
> > [The Revised Haskell 98 report] says
> > 
> > -- lines breaks a string up into a list of strings at newline
> > -- characters. The resulting strings do not contain newlines.
> > -- Similary, words breaks a string up into a list of words, which
> > -- were delimited by white space.  unlines and unwords are the
> > -- inverse operations. unlines joins lines with terminating
> > -- newlines, and unwords joins words with separating spaces.
> > 
> > I think the use of "inverse" is potentially confusing given,
> > well, they aren't inverses (or even left or right inverses).
> > 
> > Ian, who thinks (unlines . lines == id) would have been useful. Oh well.
> 
> Well, you do have
> 
>       lines . unlines = id
>       unlines . lines . unlines == unlines
>       words . unwords . words = words
> 
> (unwords . words . unwords) cannot be simplified to unwords, though;
> the results will differ on input that contains 'words' with leading,
> trailing, or multiple consecutive spaces.
>
> However, if you observe a few reasonable constraints on the input to
> any of the functions, you can get it back by feeding the output to the
> 'inverse' function:
> 
>       words: input must have no leading or trailing blanks
>       lines: input must end in a newline
>       unwords: input list elements must be non-empty and must not
> 	       contain blanks
>       unlines: input list elements must not contain newlines

To put what Lars is saying in a more `puffed up' way: there's a notion of a
`canonical form' for a string made up of words and lines, and you do
have

   unlines.lines == toCanonicalRep
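
As a rough sketch of what that means in practice (the names below are
made up for illustration, nothing here is in the Prelude apart from
lines, unlines, words and unwords): the canonical form for lines is "empty,
or ends in a newline", and the round trip is idempotent and is the
identity on canonical input.

```haskell
-- Hypothetical name borrowed from the equation above; not a library function.
toCanonicalRep :: String -> String
toCanonicalRep = unlines . lines

-- Assumed characterisation of canonical input for lines:
-- the string is empty or ends in a newline.
isCanonical :: String -> Bool
isCanonical s = null s || last s == '\n'

-- The analogous round trip for words collapses runs of white space
-- and drops leading and trailing white space.
canonicalWords :: String -> String
canonicalWords = unwords . words

-- A few hand-picked inputs, not an exhaustive test.
samples :: [String]
samples = ["", "a", "a\n", "a\nb", "a\n\nb\n", "\n"]

-- Canonicalising a second time changes nothing further.
idempotentOnSamples :: Bool
idempotentOnSamples =
  all (\s -> toCanonicalRep (toCanonicalRep s) == toCanonicalRep s) samples

-- On input that is already canonical the round trip is the identity.
identityOnCanonical :: Bool
identityOnCanonical =
  all (\s -> toCanonicalRep s == s) (filter isCanonical samples)
```

Loading this into an interpreter, both Boolean checks should come out
True, while toCanonicalRep "a\nb" gains a trailing newline, which is
exactly the information the round trip cannot give back.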

It's no different from representing polynomials as sums of powers, using
lists of (coefficient,power) pairs ordered by descending power, so that,
e.g., 2 x^2 + 1 could be represented by either [(2,2),(1,0)] or
[(2,2),(0,1),(1,0)], since zero coefficients don't affect things. Given a
function `factorise' which converts to a product-of-factors representation
and a function `expand' which expands a product of factors, these are
inverses in the sense that

        f.expand.factorise == f

__providing f does not give different values depending upon coefficients
which are zero__, i.e., f is completely defined by its action on the
canonical forms of the polynomials. There'd be no `cognitive dissonance'
in calling them inverses in this case because zero coefficients don't have
any interesting effects on polynomials (e.g., changing the length of the
coefficient list isn't interesting; if they somehow added new zeroes they
would become interesting). But arguably for strings precisely what the
whitespace originally was __is__ significant, and we don't naturally have a
mental model with the canonical form given by Lars, so replacing
`inverse' with the vaguer `converse' might be good.
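
To make the polynomial analogy a bit more concrete, here is a minimal
sketch. I haven't written `factorise' or `expand'; instead a hypothetical
`canonical' function stands in for the round trip expand.factorise, on
the assumption that the round trip simply yields the form with no zero
coefficients. All the names are invented for illustration.

```haskell
-- (coefficient, power) pairs, assumed ordered by descending power.
type Poly = [(Integer, Integer)]

-- Stand-in for expand . factorise: drop terms whose coefficient is zero.
canonical :: Poly -> Poly
canonical = filter ((/= 0) . fst)

-- An f that is determined by the canonical form: evaluation at a point.
evalAt :: Integer -> Poly -> Integer
evalAt x p = sum [c * x ^ n | (c, n) <- p]

-- An f that is NOT determined by the canonical form: it sees zero terms.
termCount :: Poly -> Int
termCount = length

-- The two representations of 2 x^2 + 1 from the text.
p1, p2 :: Poly
p1 = [(2, 2), (1, 0)]
p2 = [(2, 2), (0, 1), (1, 0)]
```

Here canonical p2 == p1, and evalAt gives the same answer on p1 and p2
for any point, so f.canonical == f holds for f = evalAt x. For
f = termCount it fails, since termCount p2 is 3 but
termCount (canonical p2) is 2; that is the situation strings are in if
you care what the original whitespace was.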

___cheers,_dave_________________________________________________________
www.cs.bris.ac.uk/~tweed/  |  `It's no good going home to practise
email:tweed@cs.bris.ac.uk  |   a Special Outdoor Song which Has To Be
work tel:(0117) 954-5250   |   Sung In The Snow' -- Winnie the Pooh