[Haskell-cafe] Question about TagSoup

Neil Mitchell ndmitchell at gmail.com
Mon Dec 6 19:57:38 CET 2010


Hi David,

I see no reason not to use TagSoup for this, assuming it does what you
want. It wasn't really designed for either modification or
round-tripping, so be careful that things like entities don't become
corrupted. Also note that this won't replace all the contents of the
Content tag, only a text node directly following the opening tag, so
if someone writes <Content><i>Text</i></Content> you won't hit it. But
if it works, I'd stick with it - it's lightweight and easy to get to
grips with.
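
A minimal sketch of one way to cover the nested case, assuming the goal
is to replace everything between <Content> and its closing tag with a
single text node, and assuming Content elements are never nested inside
each other. The name replaceContent is hypothetical; isTagCloseName and
the Tag constructors come from Text.HTML.TagSoup.

```haskell
import Text.HTML.TagSoup

-- Replace all children of every <Content> element with one text node.
-- Assumption: <Content> elements are not nested inside each other.
replaceContent :: String -> [Tag String] -> [Tag String]
replaceContent new = go
  where
    go [] = []
    go (open@(TagOpen "Content" _) : rest) =
      case break (isTagCloseName "Content") rest of
        -- Found the closing tag: drop the old children, keep the close tag.
        (_old, close : after) -> open : TagText new : close : go after
        -- No closing tag (malformed input): leave the rest untouched.
        (_, [])               -> open : go rest
    go (t : ts) = t : go ts
```

Unlike the original pattern, this matches <Content> with any attribute
list; match on TagOpen "Content" [] instead if only attribute-free tags
should be rewritten.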

Also, your use of recursion seems perfectly reasonable. I often find
the easiest way to encode some kind of multiple-element search (e.g.
for the <Content> tag and its following text) is with direct
recursion - although I'm certain some kind of fold would also work.
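
For comparison, a sketch of what a fold-based version might look like,
assuming the same behaviour as the direct recursion (replace a text
node that immediately follows an attribute-free <Content> tag). The
name modifFold is hypothetical.

```haskell
import Text.HTML.TagSoup

-- Same rewrite as the direct recursion, expressed with foldr.
-- The accumulator is the already-processed tail of the list, so the
-- tag following x is simply the head of the accumulator.
modifFold :: String -> [Tag String] -> [Tag String]
modifFold new = foldr step []
  where
    step x@(TagOpen "Content" []) (TagText _ : rest) = x : TagText new : rest
    step x acc = x : acc
```

The right fold works here because the lookahead is only one element;
whether it reads more clearly than the explicit recursion is a matter
of taste.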

Thanks, Neil

On Fri, Dec 3, 2010 at 1:45 PM, Alex Rozenshteyn <rpglover64 at gmail.com> wrote:
> I really wouldn't use TagSoup for this. Haskell has libraries specifically
> for XML processing which might be better suited to your needs.
>
> On Fri, Dec 3, 2010 at 5:59 AM, David Virebayre <dav.vire+haskell at gmail.com>
> wrote:
>>
>> Hello café,
>>
>> I have seen tutorials about extracting information from a tag soup, but I
>> have a different use case:
>> I want to read an XML file, find a tag, change its content, and write the
>> XML file back.
>>
>> This is an example of the files
>>
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <idPkg:Story
>> xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging"
>> DOMVersion="7.0">
>>        <Story Self="ub9fad" AppliedTOCStyle="n" TrackChanges="false"
>> StoryTitle="$ID/" AppliedNamedGrid="n">
>>                <StoryPreference OpticalMarginAlignment="false"
>> OpticalMarginSize="12" FrameType="TextFrameType"
>> StoryOrientation="Horizontal" StoryDirection="LeftToRightDirection"/>
>>                <InCopyExportOption IncludeGraphicProxies="true"
>> IncludeAllResources="false"/>
>>                <ParagraphStyleRange
>> AppliedParagraphStyle="ParagraphStyle/prix">
>>                        <CharacterStyleRange
>> AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
>>                                <Content>zzznba5</Content>
>>                        </CharacterStyleRange>
>>                </ParagraphStyleRange>
>>        </Story>
>> </idPkg:Story>
>>
>> Assuming I want to change the content of the "Content" tag, this is what I
>> came up with (simplified), using direct recursion. Is there a better way?
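>>
>> import Text.HTML.TagSoup
>>
>> -- Read the story file, rewrite the Content text, and write the result out.
>> ts = do
>>  soup <- parseTags `fmap` readFile "idml/h00/Stories/Story_ub9fad.xml"
>>  writeFile "test" $ renderTagsOptions renderOptions{optMinimize = const True}
>>                   $ modif soup
>>
>> -- Replace the text node that directly follows an attribute-free <Content> tag.
>> modif [] = []
>> modif (x@(TagOpen "Content" []):TagText _m : xs) = x : TagText "modified" : modif xs
>> modif (x:xs) = x : modif xs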
>>
>> David.
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
>
>
> --
>           Alex R
>


