- Managed to remove the complexities of the
ReaderTthingy with the app and introduced a much simpler type:
type App a = Config -> ExceptT AppError IO a
- So now, any function that needs the config can simply be a function that takes a
Configand returns anExceptT AppError IO a. - I also updated the feed-item extracting part of the code so that it can, in one pass, work for both Youtube feeds and other RSS XML 2.0 spec feeds — the difference in YT feeds is that they use
<link href="..." />instead of<link>{actual_url}</link>, and they use<published>and<updated>instead of<pubDate>.. basically different specs. The way I handled this was to use the wonderful<|>(alternative) operator. My use-case has been very simple so far thankfully — I just try to extract/parse (usingtagsoup) and then return aMaybe String. Combined with the<|>operator, I get what I want - if the item has<pubDate>, that's extracted and returned. But if it has<updated>instead, that will be extracted and sent. If both exist, the first will be returned.
extractData :: [Tag String] -> FeedItem
extractData tags =
let title = getInnerText $ takeBetween "<title>" "</title>" tags
linkFromYtFeed = extractLinkHref tags -- youtube specific
link = case getInnerText $ takeBetween "<link>" "</link>" tags of
"" -> Nothing
x -> Just x
pubDate = case getInnerText $ takeBetween "<pubDate>" "</pubDate>" tags of
"" -> Nothing
x -> Just x
updatedDate = case getInnerText $ takeBetween "<updated>" "</updated>" tags of
"" -> Nothing
x -> Just x
updated = pubDate <|> updatedDate
in FeedItem {title = title, link = link <|> linkFromYtFeed, updated = fromMaybe "" updated}
- The success in
extractDatadid not last by the time it came to working with functions that had to interface with the DB. Had a lot of trouble writing that one functionprocessFeed :: URL -> App ()which takes a URL (String), and an app config (Config) and then actually fetches the contents, then writes to the database. This was because I had a tough time unwrapping/wrapping toIO ()without losing the errors.. but then, I realized I had not really thought about error/crash strategy: like, when I extract feed items and attempt to write each link to the database, should I crash at the first error or should I just log it to the console and carry on with the next item? - I decided that it was best to log and continue to the next. Then came the question of which function should do the logging? For now, the
insertFeedItem(which takes a feed item and writes to the DB) is the one that logs... but that came about after so much wrangling with these functions. At the end of the day, I went with really simplifiedinsertFeedItemwhose return type went fromExceptT ... ()toIO ()... which I am not really happy about because a thrown error is not caught. - Eventually,
insertFeedItemgot wrapped insideprocessFeed(which processes a single feed URL), which then got wrapped insideprocessFeedswhich processes a list of URLs, simply mapping over theprocessFeedfunction. - Overall — the good thing is that I managed to get a working function that took a URL and fed the feed items into the table. But the code is already somewhat confusing and there were times where things compiled and worked but I just couldn't get a complete understanding of what was going on in the wrapping/unwrapping — a feeling I commonly encounter when dealing with these functional programming languages.
- I spent time on finalizing the table schemas. The
linkis the primary key so duplicate inserts are prevented at the DB level besides being prevented at the code level. - I am also briefly mulling if I could put all the code in a single file. I have modules around functions (like DB-related, RSS parsing related etc) but they are already coupled by sharing of types and whatnot. So I am thinking maybe for now I will put them all in a single Main.hs file and then extract out modules based on what natural grouping requirements emerge in that file.