[Haskell-beginners] Almost there. Bizzaro problem with libcurl forces kludge that I would like to be rid of.

Michael Litchard michael at schmong.org
Mon Jan 31 23:42:30 CET 2011


So, I've been writing a http client designed to simulate a session with one
of my company's products.
The crucial thing we need from this program is a cookie jar. When I began I
was doing exploratory code to figure things out about http. I managed to
make a cookie jar with my toy code, along with the intermediate steps needed
to generate the cookies that represent a session.
Then it was time to integrate my experimental toy code into something closer
to what production code should look like.
This is where bizarro behavior starts to happen. All the intermediate steps
appear to be working, as later steps are dependent on earlier steps
succeeding. Yet in the end, no cookie jar.

We did an strace on the binary and it appears that there seems to be a
problem with the reading/writing of  the file to be the cookie jar, and we
don't know why. The kludge solution is to write a wrapper that calls the
program in such a way that I can capture standard error and parse out the
cookies. I would hate for things to remain this way. It disturbs me.


So, my toy code can create a cookie jar. But what will one day be production
code does everything correctly except produce the cookie jar. Included is
the entire program, standard error, and strace output. Also, my toy program
which succeeds in producing a cookie jar. If anyone wants to take a stab at
this that would be peachy. I'm looking for avenues to explore and good
questions to ask, as I am out of both.

P.S. Thanks to everyone who got me this far. If this is what we end up
having to use, it does get the job done. I'm just trying to get rid of my
kludge work-around.


> {-# LANGUAGE FlexibleContexts #-}
> module Main where

------------------------------------------------------------------------------
-- Imports
------------------------------------------------------------------------------

> import Control.Monad
> import Control.Monad.Error
> import Network.Curl
> import System.Environment (getArgs)
> import HtmlParsing
> import SessionCreator

> main :: IO ()
> main = do
>   curl <- initCurl
>   user:pass:_ <- getArgs
>   resp <- generateResourceHtml curl user pass
>
>   case resp of
>     Left err -> print err
>     Right body -> obtainCookies curl body


> {-# LANGUAGE FlexibleContexts #-}
> module SessionCreator
>     (
>       initCurl,
>       curlResp,
>       urlBase,
>       urlInitial,
>       urlLogin,
>       urlLogOut,
>       urlFlash1,
>       urlFlash2,
>       urlLaunch,
>       urlLaunchTest,
>       urlQuickCreate,
>       urlGetResource,
>       urlShowWebForwards,
>       respBody,
> --      login
>       resourceOpts,
>       loginOpts
>     ) where

------------------------------------------------------------------------------
-- Imports
------------------------------------------------------------------------------

--import qualified Data.ByteString as B

> import Network.Curl
> import Control.Monad
> import Control.Monad.Error

> initCurl :: IO Curl
> initCurl = do
>   curl <- initialize
>   setopts curl curlOpts
>   return curl

> curlOpts :: [CurlOption]
> curlOpts =
>   [ CurlCookieSession True
>   , CurlCookieJar  "cookies.txt"
>   , CurlCookieFile "cookies2.txt"
>   , CurlFollowLocation True
>   , CurlUserAgent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
>   , CurlVerbose True
>   , CurlHeader True
>   , CurlAutoReferer True
>   , CurlFailOnError True
>   ]


> --- loginOpts looks like this for now
> loginOpts :: String -> String -> [CurlOption]
> loginOpts user pass =
>   [ CurlFailOnError True
>   , CurlPost True
>   , CurlNoBody False
>   , CurlPostFields
>     [ "_charset_=UTF-8"
>     , "javaScript=true"
>     , "username=" ++ user
>     , "password=" ++ pass
>     ]
>   ]

> resourceOpts :: [CurlOption]
> resourceOpts =
>   [ CurlHttpHeaders
>     [ "Accept: text/javascript, text/html, application/xml, text/xml, */*"
>     , "Accept-Language: en-us,en;q=0.5"
>     , "Accept-Charset:  ISO-8859-1,utf-8;q=0.7,*;q=0.7"
>     , "Keep-Alive:      115"
>     , "Connection:      keep-alive"
>     , "X-Requested-With:        XMLHttpRequest"
>     , "X-Prototype-Version:     1.6.0.3"
>     ]
>     , CurlEncoding "gzip,deflate"
>   ]


> curlResp :: (Error e, MonadError e m, MonadIO m) =>
>       Curl -> URLString -> [CurlOption] -> m String --CurlResponse
> curlResp curl url opts = do
>   resp <- liftIO $ (do_curl_ curl url opts :: IO CurlResponse)
>   let code   = respCurlCode resp
>       status = respStatus resp
>   if code /= CurlOK || status /= 200
>      then throwError $ strMsg $ "Error: " ++ show code ++ " -- " ++ show
status
>      else return $ respBody resp


------------------------------------------------------------------------------
   URLs
------------------------------------------------------------------------------

> (//) :: String -> String -> String
> a // b = a ++ "/" ++ b

> urlBase :: URLString
> urlBase = "https://172.16.1.18"

> urlInitial = urlBase // "showLogon.do"
> urlLogin   = urlBase // "default/showLogon.do"
> urlFlash1  = urlBase // "showFlashCheck.do"
> urlFlash2  = urlBase // "logon.do?flashVersion=10.1.102"
> urlShowWebForwards = urlBase // "showWebForwards.do"
> urlGetResource  = urlBase //
"resourceList.do?form=webForwardsForm&readOnly=false&policyLaunching=true&resourcePrefix=webForwards&path=%2FshowWebForwards.do&messageResourcesKey=webForwards&actionPath=%2FresourceList.do"

> urlCreateWebForward = urlBase //
"quickCreateWebForward.do?actionTarget=create&actionPath=%2FquickCreateWebForward.do%3FactionTarget%3Dcreate&quickCreate=true&evalJS=true&evalScripts=true"


> urlLogOut = urlBase // "logoff.do"


> urlQuickCreate = urlBase //
"quickCreateWebForward.do?actionTarget=create&actionPath=%2FquickCreateWebForward.do%3FactionTarget%3Dcreate&quickCreate=true&evalJS=true&evalScripts=true"




> module HtmlParsing
>     (
>       obtainCookies
>      ,generateResourceHtml
>     ) where

> import Text.HTML.TagSoup
> import Data.List.Split
> import Data.List
> import Data.Maybe
> import Numeric
> import Network.Curl
> import Control.Monad
> import Control.Monad.Error
> import SessionCreator


> processHTML :: String -> [[String]]
> processHTML htmlFILE =
>            let parsedHTML        = parseTags htmlFILE
>                allTagOpens       = sections (~== TagOpen "a"
[("href","")]) parsedHTML
>                taggedTEXT        = head $ map (filter isTagOpen)
allTagOpens
>                allHREFS          = map (fromAttrib "href") taggedTEXT
>                allPotentials     = map (dropWhile (/= '?')) allHREFS
>                removedNulls      = filter (not . null) allPotentials
>                removedQs         = map (drop 1) removedNulls
>            in map (splitOn "&") removedQs

> createIDList :: [[String]] -> [String]
> createIDList strungPairs =
>              let nvList = map (map (splitOn "=")) strungPairs
>                  nvTuple = map (map (list2Tuple)) nvList
>                  netofResources =  map (lookup "resourceId") nvTuple
>                  removedNothings = catMaybes netofResources
>                  sortedIds = (map head . group . sort) $ map (\x -> read x
:: Int) removedNothings
>              in  map ("ResourceId" %%) $ map show sortedIds

> list2Tuple :: [a] -> (a,a)
> list2Tuple [x,y] = (x,y)


> (%%) :: String -> String -> String
> a %% b = a ++ "=" ++ b



> generateResourceHtml :: Curl -> String -> String -> IO (Either String
String)
> generateResourceHtml curl user pass = do
>   let makeIDPage = do
>       curlResp curl urlInitial method_GET
>       curlResp curl urlLogin $ loginOpts user pass
>       curlResp curl urlFlash1 resourceOpts
>       curlResp curl urlFlash2 resourceOpts
>       curlResp curl urlShowWebForwards resourceOpts
>       curlResp curl urlQuickCreate resourceOpts
>       curlResp curl urlGetResource resourceOpts
>   runErrorT makeIDPage


> obtainCookies :: Curl -> String -> IO ()
> obtainCookies curl responseBody = do
>               mapM_ (flip (curlResp curl) resourceOpts) $
>                     screenScraping responseBody

> screenScraping :: String -> [String]
> screenScraping responseBody =
>                let collectedStrings = processHTML responseBody
>                    collectedIDLists = createIDList collectedStrings
>                    in constructedResourceURIs urlBase collectedIDLists


> constructedResourceURIs :: String -> [String] -> [String]
> constructedResourceURIs url resourceIDs =
>                         let frontURI = url ++ "/launchWebForward.do?"
>                             midURI = map (frontURI ++) resourceIDs
>                         in map (++
"&policy=0&returnTo=%2FshowWebForwards.do") midURI


Here is the toy (produces a cookiejar) program.
Attached are all the output files

{-# LANGUAGE FlexibleContexts #-}
module Main where

------------------------------------------------------------------------------
-- Imports
------------------------------------------------------------------------------

import Control.Monad
import Control.Monad.Error
import Network.Curl
import System.Environment (getArgs)

------------------------------------------------------------------------------
-- Curl
------------------------------------------------------------------------------

initCurl :: IO Curl
initCurl = do
  curl <- initialize
  setopts curl curlOpts
  return curl

curlOpts :: [CurlOption]
curlOpts =
  [ CurlCookieSession True
  , CurlCookieJar  "cookies.txt"
  , CurlCookieFile "cookies.txt"
  , CurlFollowLocation True
  , CurlUserAgent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
  , CurlVerbose True
  ]

loginOpts :: String -> String -> [CurlOption]
loginOpts user pass =
  [ CurlFailOnError True
  , CurlPost True
  , CurlNoBody False
  , CurlPostFields
    [ "_charset_=UTF-8"
    , "javaScript=true"
    , "username=" ++ user
    , "password=" ++ pass
    ]
  ]

curlResp
  :: (MonadError String m, MonadIO m)
  => Curl -> URLString -> [CurlOption] -> m String --CurlResponse
curlResp curl url opts = do
  resp <- liftIO $ (do_curl_ curl url opts :: IO CurlResponse)
  let code   = respCurlCode resp
      status = respStatus resp
  if code /= CurlOK || status /= 200
     then throwError $ "Error: " ++ show code ++ " -- " ++ show status
     else return $ respBody resp

------------------------------------------------------------------------------
-- URLs
------------------------------------------------------------------------------

(//) :: String -> String -> String
a // b = a ++ "/" ++ b

urlBase :: URLString
urlBase = "https://172.16.1.18"

urlInitial = urlBase // "showLogon.do"
urlLogin   = urlBase // "default/showLogon.do"
urlFlash1  = urlBase // "showFlashCheck.do"
urlFlash2  = urlBase // "logon.do?flashVersion=9.0.100"
urlLaunch  = urlBase //
"launchWebForward.do?resourceId=4&policy=0&returnTo=%2FshowWebForwards.do"
urlLaunchTest = urlBase //
"launchWebForward.do?resourceId=5&policy=0&returnTo=%2FshowWebForwards.do"

curlResp
  :: (MonadError String m, MonadIO m)
  => Curl -> URLString -> [CurlOption] -> m String --CurlResponse
curlResp curl url opts = do
  resp <- liftIO $ (do_curl_ curl url opts :: IO CurlResponse)
  let code   = respCurlCode resp
      status = respStatus resp
  if code /= CurlOK || status /= 200
     then throwError $ "Error: " ++ show code ++ " -- " ++ show status
     else return $ respBody resp

------------------------------------------------------------------------------
-- URLs
------------------------------------------------------------------------------

(//) :: String -> String -> String
a // b = a ++ "/" ++ b

urlBase :: URLString
urlBase = "https://172.16.1.18"

urlInitial = urlBase // "showLogon.do"
urlLogin   = urlBase // "default/showLogon.do"
urlFlash1  = urlBase // "showFlashCheck.do"
urlFlash2  = urlBase // "logon.do?flashVersion=9.0.100"
urlLaunch  = urlBase //
"launchWebForward.do?resourceId=4&policy=0&returnTo=%2FshowWebForwards.do"
urlLaunchTest = urlBase //
"launchWebForward.do?resourceId=5&policy=0&returnTo=%2FshowWebForwards.do"

------------------------------------------------------------------------------
-- IO
------------------------------------------------------------------------------

launch :: String -> String -> IO (Either String String)
launch user pass = do
  -- Initialize Curl
  curl <- initCurl

  -- Sequence of steps
  let steps = do
      curlResp curl urlInitial method_GET
      curlResp curl urlLogin $ loginOpts user pass
      curlResp curl urlFlash1 method_GET
      curlResp curl urlFlash2 method_GET
      curlResp curl urlLaunch method_GET
      -- curlResp curl urlLaunchTest method_GET
      -- You'd use this to catch the exception right away
      -- `catchError` \e-> ...
  runErrorT steps

main :: IO ()
main = do
  -- username and password
  user:pass:_ <- getArgs

  -- Launch webpage
  resp <- launch user pass

  -- Response comes as Either String String
  -- You have to handle each case
  case resp of
    Left  err  -> print err
    Right body -> putStrLn body



So to sum up again, I'm trying to figure out why the toy program generates a
cookie jar and the other program does everything it should *except* produce
a cookie jar. Thanks to anyone who wants to slog through this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110131/5d5cd349/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: production_headers
Type: application/octet-stream
Size: 57684 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110131/5d5cd349/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: production_log
Type: application/octet-stream
Size: 1565292 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110131/5d5cd349/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: toy_headers
Type: application/octet-stream
Size: 24335 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110131/5d5cd349/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: toy_strace
Type: application/octet-stream
Size: 713169 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110131/5d5cd349/attachment-0007.obj>


More information about the Beginners mailing list