Adding support for Unicode filenames and files larger than 4 GB to GHC/win32 Handle operations

Bulat Ziganshin bulatz at HotPOP.com
Mon Nov 21 07:01:06 EST 2005


Hello glasgow-haskell-users,

Simon, what do you think of the following plan?

GHC/win32 currently supports neither files with Unicode filenames nor
tell/seek to positions beyond 4 GB. This is because the Unix-compatible
functions open/fstat/tell/... provided by Mingw32 work only with
"char[]" for filenames and with off_t (which is 32 bits) for file
sizes/positions.
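To make the offset limit concrete: when a 64-bit file position is forced through a 32-bit offset type, only its low 32 bits survive. The C sketch below uses two hypothetical helpers, `truncate_offset32` and `fits_signed32`. The second one assumes that off_t is a signed 32-bit type (as with Mingw32's `long`-based off_t), in which case positions of 2 GB and above are already out of range.

```c
#include <stdint.h>

/* Hypothetical helper: what remains of a 64-bit file position after it
   is stored in an unsigned 32-bit type. Only the low 32 bits survive;
   the conversion to uint32_t is well defined (reduction modulo 2^32). */
static uint32_t truncate_offset32(int64_t pos)
{
    return (uint32_t)pos;
}

/* Hypothetical helper: can this position be represented by a signed
   32-bit offset type (assumed here to be what off_t is on Mingw32)? */
static int fits_signed32(int64_t pos)
{
    return pos >= INT32_MIN && pos <= INT32_MAX;
}
```

For example, `truncate_offset32((int64_t)5 << 30)` gives 1073741824 (1 GB) rather than 5368709120 (5 GB), and `fits_signed32((int64_t)3 << 30)` is 0.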

Half a year ago I discussed with Simon Marlow how support for Unicode
filenames and large files could be added to GHC. I have now implemented
my own library for such files, and I have an idea of how this can be
incorporated into GHC with minimal effort:

GHC currently uses the CString type to represent C-land filenames and
the COff type to represent C-land file sizes/positions. We need to
systematically change these usages to CFilePath and CFileOffset,
respectively, defined as follows:

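#ifdef mingw32_HOST_OS
-- LPCTSTR, withTString, peekTString: System.Win32.Types; Int64: Data.Int
type CFilePath   = LPCTSTR   -- wide-character (wchar_t) filename
type CFileOffset = Int64     -- 64-bit file size/position
withCFilePath = withTString
peekCFilePath = peekTString
#else
-- CString, withCString, peekCString: Foreign.C.String; COff: System.Posix.Types
type CFilePath   = CString
type CFileOffset = COff
withCFilePath = withCString
peekCFilePath = peekCString
#endif

withCFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
peekCFilePath :: CFilePath -> IO FilePath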

We also need to replace uses of withCString/peekCString on filenames
with withCFilePath/peekCFilePath (this touches the modules
System.Posix.Internals, System.Directory and GHC.Handle).
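Why a separate filename type is needed: a "char[]" filename passed to the Mingw32 (MSVCRT) functions is interpreted in the current ANSI code page, so characters outside that code page cannot be expressed at all, whereas the wide-character `_w*` functions accept UTF-16. Assuming withCFilePath marshals to UTF-16 on Windows, the per-character conversion it has to perform looks like the following C sketch; `encode_utf16` is a hypothetical helper, not part of GHC or the Win32 package.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode an array of Unicode code points as UTF-16,
   the encoding taken by Windows wide-character calls such as _wrmdir.
   Writes at most cap units to out (no terminator is written) and returns
   the number of 16-bit units needed, or SIZE_MAX if a code point is not
   a valid Unicode scalar value. */
static size_t encode_utf16(const uint32_t *cps, size_t n,
                           uint16_t *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return SIZE_MAX;                  /* invalid code point */
        if (c < 0x10000) {                    /* BMP: a single unit */
            if (len < cap) out[len] = (uint16_t)c;
            len += 1;
        } else {                              /* beyond BMP: surrogate pair */
            c -= 0x10000;
            if (len < cap)     out[len]     = (uint16_t)(0xD800 + (c >> 10));
            if (len + 1 < cap) out[len + 1] = (uint16_t)(0xDC00 + (c & 0x3FF));
            len += 2;
        }
    }
    return len;
}
```

For instance, the two-character name {U+0444, U+1F600} encodes to the three units 0x0444, 0xD83D, 0xDE00, because U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair.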

The last change needed is to conditionally define all the "c_*"
functions in System.Posix.Internals whose types mention filenames or
offsets:

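#ifdef mingw32_HOST_OS
-- _wrmdir takes a wide-character (wchar_t) path
foreign import ccall unsafe "HsBase.h _wrmdir"
   c_rmdir :: CFilePath -> IO CInt
-- .... other c_* functions mentioning filenames or offsets
#else
foreign import ccall unsafe "HsBase.h rmdir"
   c_rmdir :: CFilePath -> IO CInt
-- .... other c_* functions mentioning filenames or offsets
#endif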
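The per-platform choice could alternatively be made once on the C side, so that Haskell needs only one foreign import per function. This is a swapped-in design, not part of the plan above, and every name in it (`hs_rmdir`, `hs_lseek`, `hs_path_char`, `hs_file_offset`) is hypothetical; it assumes the Microsoft C runtime's `_wrmdir` and `_lseeki64` on Windows and the POSIX `rmdir` and `lseek` elsewhere.

```c
#ifdef _WIN32
#include <direct.h>   /* _wrmdir */
#include <io.h>       /* _lseeki64 */
#include <wchar.h>
typedef wchar_t hs_path_char;     /* UTF-16 filenames */
typedef __int64 hs_file_offset;   /* 64-bit positions */
#else
#include <sys/types.h>
#include <unistd.h>   /* rmdir, lseek */
typedef char  hs_path_char;
/* on 32-bit glibc systems, compile with -D_FILE_OFFSET_BITS=64
   to get a 64-bit off_t */
typedef off_t hs_file_offset;
#endif

/* Hypothetical wrapper: remove a directory using the wide-character call
   on Windows and the POSIX call elsewhere. Returns 0 on success and -1 on
   error, with errno set by the underlying call. */
static int hs_rmdir(const hs_path_char *path)
{
#ifdef _WIN32
    return _wrmdir(path);
#else
    return rmdir(path);
#endif
}

/* Hypothetical wrapper: seek with a 64-bit offset on Windows and with
   off_t elsewhere. Returns the new position, or -1 on error. */
static hs_file_offset hs_lseek(int fd, hs_file_offset off, int whence)
{
#ifdef _WIN32
    return _lseeki64(fd, off, whence);
#else
    return lseek(fd, off, whence);
#endif
}
```

The trade-off is that the conditional code moves from System.Posix.Internals into a C header, while the Haskell foreign imports stay identical on every platform.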

(Note that the actual C function used is _wrmdir on Windows and rmdir
on Unix.) Of course, all such helper functions defined in HsBase.h also
need to be defined conditionally, for example:

#ifdef mingw32_HOST_OS
INLINE time_t __hscore_st_mtime ( struct _stati64* st ) { return st->st_mtime; }
#else
INLINE time_t __hscore_st_mtime ( struct stat* st ) { return st->st_mtime; }
#endif
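The HsBase.h helpers that read stat fields can follow the same pattern. As a hedged sketch with hypothetical names (`hs_stat_t`, `hs_stat`, `hs_st_size`), the header below selects `struct _stati64` and the wide-character `_wstati64` on Windows and `struct stat` with `stat` elsewhere, and widens the file size to `int64_t` so that the Haskell side can use one 64-bit type on every platform.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <wchar.h>
typedef struct _stati64 hs_stat_t;   /* st_size is a 64-bit __int64 */
typedef wchar_t hs_path_char;
#else
typedef struct stat hs_stat_t;       /* st_size is off_t */
typedef char hs_path_char;
#endif

/* Hypothetical wrapper: stat a file by name. Returns 0 on success and
   -1 on error. */
static int hs_stat(const hs_path_char *path, hs_stat_t *st)
{
#ifdef _WIN32
    return _wstati64(path, st);
#else
    return stat(path, st);
#endif
}

/* Hypothetical accessor in the style of __hscore_st_mtime: widen the
   file size to int64_t so callers never see a truncated value. */
static int64_t hs_st_size(const hs_stat_t *st)
{
    return (int64_t)st->st_size;
}
```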

That's all! Of course, this will break compatibility with existing
programs that use these c_* functions directly (c_open, c_lseek, c_stat
and so on), which may be an issue for some libraries. Does anyone really
use these functions? Alternatively, we can take a fully
backward-compatible route: add a set of new "f_*" functions and change
the high-level modules to use them instead.


-- 
Best regards,
 Bulat                          mailto:bulatz at HotPOP.com





More information about the Glasgow-haskell-users mailing list