[Haskell-cafe] Shouldn't this be lazy???
Olivier Boudry
olivier.boudry at gmail.com
Fri Jun 22 10:58:22 EDT 2007
Hi all,
I'm playing with the TagSoup library, trying to extract links to the
original-size pictures from my Flickr Sets page. The program first loads the
Sets page, opens the link to each set, gets the links to the pictures and
then searches each picture page for the link to the original size (see the
steps in the main function).
It does the job, but for testing I just wanted to take 10 links, to reduce
the time the program runs, hoping that Haskell's laziness would magically
fetch only the minimum amount of data required to get the first 10 links out
of this set of pages.
I did this by replacing:
(putStrLn . unlines . concat) origLinks
with
(putStrLn . unlines . take 10 . concat) origLinks
in the main function.
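
With pure lists that kind of change really does cut the work down to what the
first 10 elements need, which is the behaviour I was hoping for, e.g.:

take 10 (concat (map (\n -> [n * 10 .. n * 10 + 4]) [1 ..]))
-- == [10,11,12,13,14,20,21,22,23,24]; the rest of the infinite list is never built
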
With the second version of that line I indeed get only 10 links, but the
runtime is exactly the same for both versions of main. As I'm a newbie
Haskell programmer, I'm certainly missing something.
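
My guess (and I may be completely off here) is that mapM performs every
download before take 10 ever gets to look at the list, since mapM runs all of
its IO actions when it is executed. A tiny self-contained test that shows the
same effect, with putStrLn standing in for the page downloads:

test :: IO ()
test = do
    xss <- mapM fetch [1 .. 5 :: Int]
    print (take 2 (concat xss))   -- only two of the numbers are used ...
  where
    -- ... yet "fetching page n" is printed for all five pages
    fetch n = do
        putStrLn ("fetching page " ++ show n)
        return [n]
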
By the way, I know Flickr has an API I could use, but the point here was to
play with TagSoup.
Thanks for any advice.
Olivier.
Here's the code:
module Main where

import Data.Html.TagSoup
import Data.Html.Download (openURL)  -- openURL lives in TagSoup's download module here; adjust to your version
import Control.Monad (liftM)
import Data.List (isPrefixOf, groupBy)
import Data.Maybe (mapMaybe)
import System (getArgs)
import System.Time
import IO (hPutStrLn, stderr)

base :: String
base = "http://www.flickr.com"

setsUrl :: String -> String
setsUrl name = "/photos/" ++ name ++ "/sets/"

main :: IO ()
main = do
    args   <- getArgs
    tStart <- getClockTime
    -- 1. collect the links to each set from the user's sets page
    setLinks  <- getLinksByAttr ("class", "Seta") (base ++ setsUrl (head args))
    -- 2. collect the links to the individual pictures from every set page
    picLinks  <- mapM (getLinksByAttr ("class", "image_link")) setLinks
    -- 3. on each picture's photo_zoom page, grab the link that follows the
    --    download icon, i.e. the link to the original size picture
    origLinks <- mapM (getLinksAfterImgByAttr ("src",
                   "http://l.yimg.com/www.flickr.com/images/icon_download.gif")) $
                 (mapMaybe linkToOrigSize . concat) picLinks
    (putStrLn . unlines . concat) origLinks
    tEnd <- getClockTime
    hPutStrLn stderr (timeDiffToString $ diffClockTimes tEnd tStart)

-- | extract all links from "a" tags having the given attribute
getLinksByAttr :: (String, String) -> String -> IO [String]
getLinksByAttr attr url = do
    sects <- getSectionsByTypeAndAttr "a" attr url
    return $ hrefs sects

-- | get "a" tags following a "img" having a specific attribute
getLinksAfterImgByAttr :: (String, String) -> String -> IO [String]
getLinksAfterImgByAttr attr url = do
sects <- getSectionsByTypeAndAttr "img" attr url
return $ hrefs $ map (dropWhile (not . isTagOpen) . drop 1) sects
-- | create sections from tag type and attribute
getSectionsByTypeAndAttr :: String -> (String, String) -> String -> IO [[Tag]]
getSectionsByTypeAndAttr tagType attr url = do
    tags <- liftM parseTags (openURL url)
    return $ filterByTypeAndAttr tagType attr tags
  where
    filterByTypeAndAttr :: String -> (String, String) -> [Tag] -> [[Tag]]
    filterByTypeAndAttr t a = sections (~== TagOpen t [a])

-- | extract href values from sections of "a" tags
hrefs :: [[Tag]] -> [String]
hrefs = map (addBase . fromAttrib "href" . head)
  where
    addBase :: String -> String
    addBase s
        | "http://" `isPrefixOf` s = s
        | otherwise                = base ++ s

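-- e.g. (set id below is made up):
--   addBase "/photos/someuser/sets/72157600000000000/"
--     == "http://www.flickr.com/photos/someuser/sets/72157600000000000/"
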
-- | transform a link to a picture into a link to the original size picture
linkToOrigSize :: String -> Maybe String
linkToOrigSize link =
    if parts !! 3 == "photos"
        then Just (newUrl parts)
        else Nothing
  where
    -- rough split of the url on '/', dropping the leading character of every piece
    parts    = map tail $ groupBy (const (/= '/')) link
    newUrl p = "http://www.flickr.com/photo_zoom.gne?id=" ++ p !! 5 ++
               "&size=o&context=" ++ p !! 7