[Haskell-cafe] How to deal with last item with concatMapAccumC in Conduit.
Michael Snoyman
michael at snoyman.com
Fri Jul 21 07:17:02 UTC 2017
I'll preface by saying this probably indicates that the API for
concatMapAccumC should be slightly different than it is currently.
The problem is that there is no way to convert the final accumulator value
into output, and therefore when the input stream ends, that accumulator is
simply dropped. One solution (pretty hacky) is to wrap all of the lines in
a `Just` and then send in a final `Nothing` value to indicate that the
stream has ended. This would look like:
https://gist.github.com/snoyberg/6537120fca2e9b8944e41fe60d285793
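For readers who cannot open the gist, here is a minimal sketch of the `Just`/`Nothing` idea. It is not the gist's code: `withFlush` and `flush` are hypothetical names, the step function is the `test'` from the original question, and conduit 1.3 or later (for `.|` and `runConduitPure`) is assumed.

```haskell
import Conduit

-- Step function from the original question: emit the running sum
-- once it exceeds 5, otherwise keep accumulating.
test' :: Int -> Int -> (Int, [Int])
test' a s
  | a + s > 5 = (0, [a + s])
  | otherwise = (a + s, [])

-- Hypothetical: what to emit for the accumulator left over at end of input.
flush :: Int -> [Int]
flush s = [s | s /= 0]

-- Hypothetical wrapper: `Just x` runs the normal step, `Nothing` flushes.
withFlush :: (a -> s -> (s, [b])) -> (s -> [b]) -> Maybe a -> s -> (s, [b])
withFlush step _      (Just x) s = step x s
withFlush _    flushS Nothing  s = (s, flushS s)

result :: [Int]
result = runConduitPure $
  -- Wrap every input in Just, then send one trailing Nothing as the sentinel.
  ((yieldMany [1, 2, 3, 4, 5, 6, 3] .| mapC Just) >> yield Nothing)
    .| concatMapAccumC (withFlush test' flush) 0
    .| sinkList

main :: IO ()
main = print result
```

The sentinel has to be sent exactly once, after all real input, which is the part that makes this approach feel hacky.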
Another option is to simply use the conduit primitives (await and yield)
directly:
https://gist.github.com/snoyberg/bd58030db9b9c90f9e1fcf8b31ea10e9
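As a rough sketch of what the primitive-based version could look like (again not the gist's code): `concatMapAccumFinalC` and `flush` are hypothetical names, and conduit 1.3 or later is assumed for `ConduitT`; on the older 1.2 series the type would be written `Conduit a m b`.

```haskell
import Conduit

-- Hypothetical helper: behaves like concatMapAccumC, but takes an extra
-- function that turns the final accumulator into output at end of input.
concatMapAccumFinalC
  :: Monad m
  => (a -> s -> (s, [b]))  -- step, same argument order as concatMapAccumC
  -> (s -> [b])            -- output for the leftover accumulator
  -> s                     -- initial accumulator
  -> ConduitT a b m ()
concatMapAccumFinalC step flushS = loop
  where
    loop s = do
      mx <- await
      case mx of
        Nothing -> mapM_ yield (flushS s)  -- upstream is done: flush
        Just x  -> do
          let (s', bs) = step x s
          mapM_ yield bs
          loop s'

-- Step function from the original question.
test' :: Int -> Int -> (Int, [Int])
test' a s
  | a + s > 5 = (0, [a + s])
  | otherwise = (a + s, [])

-- Hypothetical: emit the leftover sum unless it is zero.
flush :: Int -> [Int]
flush s = [s | s /= 0]

result :: [Int]
result = runConduitPure $
  yieldMany [1, 2, 3, 4, 5, 6, 3]
    .| concatMapAccumFinalC test' flush 0
    .| sinkList

main :: IO ()
main = print result
```

Here the end of the stream is detected by `await` returning `Nothing`, so no sentinel value has to be threaded through the input.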
I'd lean towards the latter.
On Thu, Jul 20, 2017 at 9:34 AM, jun zhang <zhangjun.julian at gmail.com>
wrote:
> Dear all
>
> The runnable example code is below.
>
> ===================================================================
> import Conduit
> import Text.Regex (matchRegex,mkRegex,Regex)
>
> loghead = mkRegex "^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} )"
> -- "2015-01-25 00:04:18,840"
>
> logMerge::Regex->String->String->(String,[String])
> logMerge logregex str accum =
>   case matchRegex logregex str of
>     Just _ -> (str,[(accum++"\n")])
>     Nothing -> case null accum of
>       True -> (str,[])
>       False -> (accum ++ "<br>" ++ str,[])
>
> runMerge::String->String->IO ()
> runMerge infile outfile =
>   runResourceT $ sourceFile infile $= linesUnboundedC $=
>     concatMapAccumC (logMerge loghead ) "" $$ sinkFile outfile
>
> ================================================================
>
> the example input file is
> ---------
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | | errorCode:
> toString() = null
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 |
> codsexception.getErrorCode(): toString()
> {
> errorCode = "UNEXPECTED_PROBLEM"
> severity = ""
> }
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | |
> 2015-01-25 00:03:45,331 | DEBUG | WebContainer : 20 | |
> ---------
>
> the expected output is
> ---------
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | | errorCode:
> toString() = null
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 |
> codsexception.getErrorCode(): toString() <br>{<br> errorCode
> =<br>"UNEXPECTED_PROBLEM"<br> severity = ""<br>}
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | |
> 2015-01-25 00:03:45,331 | DEBUG | WebContainer : 20 | |
> ---------
>
> The actual output is below; it is missing the last line of the log.
> ---------
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | | errorCode:
> toString() = null
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 |
> codsexception.getErrorCode(): toString() <br>{<br> errorCode
> =<br>"UNEXPECTED_PROBLEM"<br> severity = ""<br>}
> 2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20 | |
> ---------
>
> Thanks
>
>
>
> On Jul 19, 2017, at 7:50 PM, Michael Snoyman <michael at snoyman.com> wrote:
>
> I'm afraid I don't follow what is meant by the stream here. Could you
> provide a complete, runnable example and indicate what the expected and
> actual output are?
>
> On Mon, Jul 17, 2017 at 5:48 AM, jun zhang <zhangjun.julian at gmail.com>
> wrote:
>
>> Dear cafe,
>>
>> I use Conduit to parse a huge file, and I need to merge lines based on a
>> condition.
>>
>> I found that concatMapAccumC can do that, and I wrote a demo as below (with
>> conduit-combinators-1.0.6, lts-6.18).
>> The problem is that if the last item doesn't make the condition true, the
>> data stays in the accumulator and never reaches the stream.
>>
>> Can anyone give me some advice?
>>
>> Thanks
>>
>>
>> ----------------------------
>> import Conduit
>>
>> test'::Int->Int->(Int,[Int])
>> test' a s = case (a+s) > 5 of
>>   True -> (0,[a+s])
>>   False -> (a+s,[])
>>
>> testlog::IO [Int]
>> testlog = runConduit $ (yieldMany [1,2,3,4,5,6,3]) $= (concatMapAccumC
>>   test' 0 ) $$ sinkList