adding to GHC/win32 Handle operations support of Unicode filenamesand files larger than 4 GB

Wed Nov 23 20:17:24 EST 2005

Hello Simon,

Wednesday, November 23, 2005, 2:22:02 PM, you wrote:

SM> This sounds like a good idea to me.

SM> As far as possible, we should keep the platform-dependence restricted to
SM> the implementation of one module (System.Posix.Internals will do, even
SM> though this isn't really POSIX any more).  So System.Posix.Internals
SM> exports the CFilePath/CFileOffset types, and the foreign functions that
SM> operate on them.

SM> Alternatively (and perhaps this is better), we could hide the difference
SM> even further, and provide functions like

SM>   rmDir :: FilePath -> IO CInt

SM> in System.Posix.Internals.  Similarly for functions that operate on
SM> COff, they would take/return Integer (eg. we already have
SM> System.Posix.fdFileSize).

well... but not well :)  let's consider function c_open for more
informative example. between functions c_open and openFile there is
several levels of "translation": 

1) convert C types to Haskell types
2) check for errno and raise exception on error
3) convert interfaces (translate IOMode to CMode in this example)
4) convert file descriptors to Handles

you suggestion is to build middle-level library whose functions lie
between step 1 and 2 in this scheme:

c_open :: CFilePath -> CInt -> CMode -> IO CInt
1) convert C types to Haskell types
open :: String -> Int -> CMode -> IO Int
2) check for errno
3) convert interfaces
4) convert file descriptors to Handles

This have one obvious benefit - these functions will look very like to
its C counterparts. but on the other side, resulting functions will
not belong to C, nor to Haskell world - they will use Haskell types
but C-specific error signalling

moreover, adding such middle-level functions will not help making
implementation simpler - all differences between platforms are already
covered by definitions of CFilePath/CFileOffset/withCFilePath/peekCFilePath

but i propose to make these middle-level functions after stage 2 or
even 3 in this scheme - so that they will be fully in Haskell world,
only work with file descriptors instead of Handles. for example:

lseek :: Integral int => FD -> SeekMode -> int -> IO ()
lseek h direction offset = do
  let   whence :: CInt
        whence = case mode of
                   AbsoluteSeek -> sEEK_SET
                   RelativeSeek -> sEEK_CUR
                   SeekFromEnd  -> sEEK_END
  throwErrnoIfMinus1Retry_ "lseek"
    $ c_lseek (fromIntegral h) (fromIntegral offset) direction

profits:

1) current GHC.Handle code is monolithic, it performs all these 4
steps of translation in one function. this change will simplify this
module and concenrate it on solving only one, most complex, task -
implementing operations on Handles via operations on FDs

2) part of code in GHC.Handle, what is not really GHC-specific, will
be moved to standard hierarchical libraries, where it will become
ready to use by other Haskell implementations

3) alternative Handle implementations can use these middle-level
functions and not reinvent the wheel. just for example - in
http://haskell.org/~simonmar/new-io.tar.gz openFile code is mostly
copied from existing GHC.Handle

4) we will get full-fledged FD library on GHC, Hugs and NHC for free

5) if this FD library will have Handle-like interface, it can be
used as "poor men's" drop-in replacement of Handle library in
situations where we don't need its buffering and other advanced
features

so, as first step i propose to move middle-level code from GHC.Handle
to Posix.Internals, join FD type definitions, replace CString with
CFilePath where appropriate, and so on. and only after this - make
changes specific for windows. i can do it all. what you will say?

>> That's all! of course, this will broke compatibility with current
>> programs which directly uses these c_* functions (c_open, c_lseek,
>> c_stat and 
>> so on). this may be issue for some libs. are someone really use these
>> functions??? of course, we can go in another, fully
>> backward-compatible way, by adding some "f_*" functions and changing
>> high-level modules to work with these functions

if my changes will be committed only to GHC 6.6 (HEAD) branch, the
problem that types of c_* functions is changed will not be a big
problem - you anyway change some interfaces between major releases.
but now i'm realized that Posix.Internals is part of libraries common
for several Haskell compilers. can such changes break their working?

moreover, i plan to move "throwErrnoIfMinus1RetryOnBlock" to
Foreign.C.Error, and sEEK_CUR/sEEK_SET/sEEK_END - to Posix.Internals.
can it be done?

SM> As regards whether to use feature tests or just #ifdef mingw32_HOST_OS,
SM> in general feature tests are the right thing, but sometimes it doesn't
SM> buy you very much when there is (and always will be) only one platform
SM> that has some particular quirk.  Writing a bunch of autoconf code that
SM> would, if we're lucky, handle properly the case when some future version
SM> of Windows removes the quirk, is not a good use of developer time.
SM> Furthermore, Windows hardly ever changes APIs, they just add new ones.
SM> So I don't see occasional use of #ifdef mingw32_HOST_OS as a big deal.
SM> It's more important to organise the codebase and make sure all the
SM> #ifdefs are behind suitable abstractions.

so i will write the following:

-- Support for Unicode filenames and files>4GB
#ifdef mingw32_HOST_OS

in ALL the places where this feature test must take place. it will
document the code and give ability to easily find/edit all these places
if this will be needed sometime in the future

can i also ask several questions about "new i/o" library? as i see,
this library solves 3 problems:

1) having several streams in 1 file. why it is better than using
just hDuplicate?

2) using different Char encodings on the streams. i think that it can
be better done by renaming current hGetChar/hPutChar to
hGetByte/hPutByte and adding different encodings just as different
"hGetByte->hGetChar" strategies. in this way memory buffers will always
hold untranslated chars and Handle structure will contain the
following fields:

data Handle__ = Handle__ { ...
  haPutChar :: (Word8 -> IO ()) -> Char -> IO (),
  haGetChar :: (IO Word8) -> IO Char ... }

these fields will be modified by hSetEncoding operation

3) using Handles to access memory/sockets/pipes and so on. can this be
solved in the same way as previous problem - by defining class Stream:

class Stream where
  sPutBuf, sGetBuf, sSeek, ....

and incorporating in Handle instance of these class instead of haFD:

data Handle__ = Handle__ {
      haStream        :: forall s . Stream s => s

?

-- 
Best regards,
 Bulat                            mailto:bulatz at HotPOP.com