[Haskell-cafe] Best bit LIST data structure
Ryan Newton
rrnewton at gmail.com
Sun Oct 9 15:18:47 CEST 2011
Yep, it is simple. But I prefer to only use well-tested data structure
libraries where I can! Here's an example simple implementation (partial --
missing some common functions):
module Data.BitList
( BitList
, cons, head, tail, empty
, pack, unpack, length, drop
)
where
import Data.Int
import Data.Bits
import Prelude as P hiding (head,tail,drop,length)
import qualified Data.List as L
import Test.HUnit
data BitList = One {-# UNPACK #-} !Int {-# UNPACK #-} !Int64
| More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList
instance Show BitList where
show bl = "BitList " ++ show (map (\b -> case b of True -> '1'; False ->
'0') (unpack bl))
-- show bl = "pack " ++ show (unpack bl)
empty :: BitList
empty = One 0 0
cons :: Bool -> BitList -> BitList
cons True x@(One 64 _ ) = More 1 1 x
cons False x@(One 64 _ ) = More 1 0 x
cons True x@(More 64 bv _) = More 1 1 x
cons False x@(More 64 bv _) = More 1 0 x
cons True (One i bv) = One (i+1) (bv `setBit` i)
cons False (One i bv) = One (i+1) (bv )
cons True (More i bv r) = More (i+1) (bv `setBit` i) r
cons False (More i bv r) = More (i+1) (bv ) r
-- TODO: May consider (More 0 _ _) representation to reduce extra
-- allocation when size of the BitList is fluctuating back and forth.
head :: BitList -> Bool
head (One 0 _ ) = error "tried to take head of an empty BitList"
head (More 0 _ r) = error "BitList: data structure invariant failure!"
head (One i bv ) = bv `testBit` (i-1)
head (More i bv r) = bv `testBit` (i-1)
tail :: BitList -> BitList
tail (One 0 _ ) = error "tried to take the tail of an empty BitList"
tail (One i bv ) = One (i-1) bv
tail (More 1 bv r) = r
tail (More i bv r) = More (i-1) bv r
pack :: [Bool] -> BitList
pack [] = One 0 0
pack (h:t) = cons h (pack t)
unpack :: BitList -> [Bool]
unpack (One 0 _) = []
unpack (One i bv) = (bv `testBit` (i-1)) : unpack (One (i-1) bv)
unpack (More 0 _ r) = unpack r
unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r)
drop :: Int -> BitList -> BitList
drop 0 bl = bl
drop n bl | n >= 64 = case bl of
One _ _ -> error "drop: not enough elements in BitList"
More i _ r -> drop (n-i) r
drop n bl = case bl of
One i bv -> One (i-n) bv
More i bv r -> More (i-n) bv r
length :: BitList -> Int
length (One i _) = i
length (More i _ r) = i + length r
-- TODO: index, take, etc
-- TODO: functor instance, etc.
--------------------------------------------------------------------------------
-- Testing:
t1 = pack (L.concat$ L.replicate 10 [True,False,True])
t2 = L.length $ unpack $ pack $ replicate 1000 True
t3 = pack $ replicate 1000 True
t4 = drop 500 t3
p3 = L.and (unpack t3)
p4 = L.and (unpack t4)
t5 = iterate tail t4 !! 250
t5a = length t5
t5b = L.length (unpack t5)
tests :: Test
tests =
TestList
[
show t1 ~=? "BitList \"101101101101101101101101101101\""
, t2 ~=? 1000
, t5a ~=? 250
, t5b ~=? 250
, p3 ~=? True
, p4 ~=? True
]
-- TODO: QuickCheck
On Sun, Oct 9, 2011 at 7:50 AM, Roman Beslik <beroal at ukr.net> wrote:
> I am not aware of such a library, but IMHO this code will be very simple.
> > data Bits b => BitList b = BitList Int {- number of used bits in the next
> component -} b [b]
> Write an isomorphism between @BitList b@ and @ListStep (BitList b)@
> where
> > data ListStep e rc = Nil | Cons e rc
>
>
> On 07.10.11 17:52, Ryan Newton wrote:
>
> Hi Cafe,
>
> We are lucky to have a plethora of data structures out there. But it does
> make choosing one off hackage difficult at times. In this case I'm *not*
> looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the
> choice there), but an efficient representation for a list of bits
> (cons,head,tail).
>
> Let's say that you want to represent tree indices as you walk down a binary
> tree. [Bool] is a simple choice, you only add to the front of the list (0/1
> = Left/Right), sharing the tails. But [Bool] is quite space inefficient.
>
> Something like [Int] would allow packing the bits more efficiently. A Lazy
> ByteString could amortize the space overhead even more... but in both cases
> there's a tiny bit of work to do in wrapping those structures for per-bit
> access. That's probably the right thing but I wanted to check to see if
> there's something else recommended, perhaps more off-the-shelf.
>
> What about just using the Data.Bits instance of Integer? Well, presently,
> the setBit instance for very large integers creates a whole new integer,
> shifts, and xors:
>
> http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
> (I don't know if it's possible to do better. From quick googling GMP seems
> to use an array of "limbs" rather than a chunked list, so maybe there's no
> way to treat large Integers as a list and update only the front...)
>
> Advice appreciated!
>
> Thanks,
> -Ryan
>
>
> _______________________________________________
> Haskell-Cafe mailing listHaskell-Cafe at haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20111009/af48814e/attachment.htm>
More information about the Haskell-Cafe
mailing list