Sunday, January 15, 2012

LYAH Chapter 7 - Modules


Modules
Loading modules
  • module = collection of related functions, types and typeclasses.
  • program = collection of modules where the main module loads up the other modules and then uses the functions defined in them to do something.
  • import <module name> must be done before defining any functions
  • all the functions become available in the global namespace
  • Data.List.nub takes a list and weeds out duplicate elements
  • Can put the functions of modules into the global namespace when using GHCI. 
    • ghci> :m + Data.List Data.Map Data.Set  
  • If you've loaded a script that already imports a module, you don't need to use :m + to get access to it.
  • Selectively import: import Data.List (nub, sort)  
  • Import all functions except a few select ones: import Data.List hiding (nub)  
  • Qualified imports: 
    • import qualified Data.Map  -->  Data.Map.filter
    • import qualified Data.Map as M  -->  M.filter.
  • Use this handy reference to see which modules are in the standard library.
  • Use Hoogle to search for functions or to find out where they're located.

Data.List
  • The Prelude module exports some functions (like map and filter) from Data.List for convenience. 
  • You don't have to import Data.List via a qualified import because it doesn't clash Prelude.
  • intersperse takes an element and a list and then puts that element in between each pair of elements in the list.
  • intercalate takes a list of lists and a list. It then inserts that list in between all those lists and then flattens the result.
  • transpose transposes a list of lists. If you look at a list of lists as a 2D matrix, the columns become the rows and vice versa.
  • foldl' and foldl1' are stricter versions of their respective lazy incarnations. When using lazy folds on really big lists, you might often get a stack overflow error. The culprit for that is that due to the lazy nature of the folds, the accumulator value isn't actually updated as the folding happens. What actually happens is that the accumulator kind of makes a promise that it will compute its value when asked to actually produce the result (also called a thunk). That happens for every intermediate accumulator and all those thunks overflow your stack. The strict folds aren't lazy buggers and actually compute the intermediate values as they go along instead of filling up your stack with thunks. So if you ever get stack overflow errors when doing lazy folds, try switching to their strict versions.
  • concat flattens a list of lists into just a list of elements.
  • concatMap is the same as first mapping a function to a list and then concatenating the list with concat.
  • and takes a list of boolean values and returns True only if all the values in the list are True.
  • or is like and, only it returns True if any of the boolean values in a list is True.
  • any and all take a predicate and then check if any or all the elements in a list satisfy the predicate, respectively. Usually we use these two functions instead of mapping over a list and then doing and or or.
  • iterate takes a function and a starting value. It applies the function to the starting value, then it applies that function to the result, then it applies the function to that result again, etc. It returns all the results in the form of an infinite list.
  • splitAt takes a number and a list. It then splits the list at that many elements, returning the resulting two lists in a tuple.
  • takeWhile takes elements from a list while the predicate holds and then when an element is encountered that doesn't satisfy the predicate, it's cut off.
  • dropWhile is similar, only it drops all the elements while the predicate is true. Once predicate equates to False, it returns the rest of the list.
  • span is kind of like takeWhile, only it returns a pair of lists. The first list contains everything the resulting list from takeWhile would contain if it were called with the same predicate and the same list. The second list contains the part of the list that would have been dropped.
  • break breaks the list when the predicate is first true (whereas span spans the list while the predicate is true). Doing break p is the equivalent of doing span (not . p).  When using break, the second list in the result will start with the first element that satisfies the predicate.
  • sort simply sorts a list. The type of the elements in the list has to be part of the Ord typeclass, because if the elements of a list can't be put in some kind of order, then the list can't be sorted.
  • group takes a list and groups adjacent elements into sublists if they are equal.
  • inits and tails are like init and tail, only they recursively apply that to a list until there's nothing left.
  • isInfixOf searches for a sublist within a list and returns True if the sublist we're looking for is somewhere inside the target list.
  • isPrefixOf and isSuffixOf search for a sublist at the beginning and at the end of a list, respectively.
  • elem and notElem check if an element is or isn't inside a list.
  • partition takes a list and a predicate and returns a pair of lists. The first list in the result contains all the elements that satisfy the predicate, the second contains all the ones that don't.
    • While span and break are done once they encounter the first element that doesn't and does satisfy the predicate, partition goes through the whole list and splits it up according to the predicate.
  • find takes a list and a predicate and returns the first element that satisfies the predicate. But it returns that element wrapped in a Maybe value. 
  • elemIndex is kind of like elem, only it doesn't return a boolean value. It maybe returns the index of the element we're looking for. If that element isn't in our list, it returns a Nothing.
  • elemIndices is like elemIndex, only it returns a list of indices, in case the element we're looking for crops up in our list several times. Failure can be represented as the empty list, which is very much synonymous to Nothing.
  • findIndex is like find, but it maybe returns the index of the first element that satisfies the predicate. 
  • indIndices returns the indices of all elements that satisfy the predicate in the form of a list.
  • zip3zip4, etc. and zipWith3zipWith4, etc. These variants go up to 7.  We already covered zip and zipWith. We noted that they zip together two lists, either in a tuple or with a binary function (meaning such a function that takes two parameters). But what if we want to zip together three lists? Or zip three lists with a function that takes three parameters? 
  • lines is a useful function when dealing with files or input from somewhere. It takes a string and returns every line of that string in a separate list.
  • unlines is the inverse function of lines. It takes a list of strings and joins them together using a '\n'.
  • words and unwords are for splitting a line of text into words or joining a list of words into a text. Very useful.
  • nub takes a list and weeds out the duplicate elements, returning a list whose every element is a unique snowflake! The function does have a kind of strange name. It turns out that "nub" means a small lump or essential part of something. In my opinion, they should use real words for function names instead of old-people words.
  • delete takes an element and a list and deletes the first occurence of that element in the list.
  • \\ is the list difference function. It acts like a set difference, basically. For every element in the right-hand list, it removes a matching element in the left one.  Doing [1..10] \\ [2,5,9] is like doing delete 2 . delete 5 . delete 9 $ [1..10].
  • union also acts like a function on sets. It returns the union of two lists. It pretty much goes over every element in the second list and appends it to the first one if it isn't already in yet. Watch out though, duplicates are removed from the second list!
  • intersect works like set intersection. It returns only the elements that are found in both lists.
  • insert takes an element and a list of elements that can be sorted and inserts it into the last position where it's still less than or equal to the next element. In other words, insert will start at the beginning of the list and then keep going until it finds an element that's equal to or greater than the element that we're inserting and it will insert it just before the element. 
  • What lengthtakedropsplitAt!! and replicate have in common is that they take an Int as one of their parameters (or return anInt), even though they could be more generic and usable if they just took any type that's part of the Integral or Num typeclasses (depending on the functions). They do that for historical reasons. However, fixing that would probably break a lot of existing code. That's why Data.List has their more generic equivalents, named genericLengthgenericTakegenericDropgenericSplitAt,genericIndex and genericReplicate. For instance, length has a type signature of length :: [a] -> Int. If we try to get the average of a list of numbers by doing let xs = [1..6] in sum xs / length xs, we get a type error, because you can't use / with anIntgenericLength, on the other hand, has a type signature of genericLength :: (Num a) => [b] -> a. Because a Num can act like a floating point number, getting the average by doing let xs = [1..6] in sum xs / genericLength xs works out just fine.
  • The nubdeleteunionintersect and group functions all have their more general counterparts called nubBydeleteBy,unionByintersectBy and groupBy take an equality function and then compare them by using that equality function. group is the same as groupBy (==).
  • on is an even clearer way to write equality functions for the By functions. Data.Function.on is defined like this:
    • on :: (b -> b -> c) -> (a -> b) -> a -> a -> c  
    • f `on` g = \x y -> f (g x) (g y)  
    • So doing (==) `on` (> 0) returns an equality function that looks like \x y -> (x > 0) == (y > 0)
    • on is used a lot with the By functions because with it, we can do:
      • ghci> groupBy ((==) `on` (> 0)) values  -->  [[-4.3,-2.4,-1.2],[0.4,2.3,5.9,10.5,29.1,5.3],[-2.4,-14.5],[2.9,2.3]]  
    • "Group this by equality on whether the elements are greater than zero."
  • sortByinsertBymaximumBy and minimumBy take a function that determine if one element is greater, smaller or equal to the other. The type signature of sortBy is sortBy :: (a -> a -> Ordering) -> [a] -> [a]. If you remember from before, the Ordering type can have a value of LTEQ or GTsort is the equivalent of sortBy compare, because compare just takes two elements whose type is in the Ord typeclass and returns their ordering relationship.
  • compare `on` length ... is the equivalent of \x y -> length x `compare` length y
  • When you're dealing with By functions that take an equality function, you usually do (==) `on` something and when you're dealing with By functions that take an ordering function, you usually do compare `on` something.
Data.Char

  • Helpful when filtering and mapping over strings because they're just lists of characters.
  • Data.Char exports a bunch of predicates over characters. That is, functions that take a character and tell us whether some assumption about it is true or false. Here's what they are:
  • isControl checks whether a character is a control character.
  • isSpace checks whether a character is a white-space characters. That includes spaces, tab characters, newlines, etc.
  • isLower checks whether a character is lower-cased.
  • isUpper checks whether a character is upper-cased.
  • isAlpha checks whether a character is a letter.
  • isAlphaNum checks whether a character is a letter or a number.
  • isPrint checks whether a character is printable. Control characters, for instance, are not printable.
  • isDigit checks whether a character is a digit.
  • isOctDigit checks whether a character is an octal digit.
  • isHexDigit checks whether a character is a hex digit.
  • isLetter checks whether a character is a letter.
  • isMark checks for Unicode mark characters. Those are characters that combine with preceding letters to form latters with accents. Use this if you are French.
  • isNumber checks whether a character is numeric.
  • isPunctuation checks whether a character is punctuation.
  • isSymbol checks whether a character is a fancy mathematical or currency symbol.
  • isSeparator checks for Unicode spaces and separators.
  • isAscii checks whether a character falls into the first 128 characters of the Unicode character set.
  • isLatin1 checks whether a character falls into the first 256 characters of Unicode.
  • isAsciiUpper checks whether a character is ASCII and upper-case.
  • isAsciiLower checks whether a character is ASCII and lower-case.
  • All these predicates have a type signature of Char -> Bool. Most of the time you'll use this to filter out strings or something like that. For instance, let's say we're making a program that takes a username and the username can only be comprised of alphanumeric characters. We can use the Data.List function all in combination with the Data.Char predicates to determine if the username is alright.
  •  In case you don't remember, all takes a predicate and a list and returns True only if that predicate holds for every element in the list.We can also use isSpace to simulate the Data.List function words.
    • ghci> filter (not . any isSpace) . groupBy ((==) `on` isSpace) $ "hey guys its me"  -->  ["hey","guys","its","me"]  
  • The GeneralCategory type presents us with a few possible categories that a character can fall into. The main function for getting the general category of a character is generalCategory. It has a type of generalCategory :: Char -> GeneralCategory. There are about 31 categories so we won't list them all here, but let's play around with the function.
    • ghci> generalCategory ' '  --> Space  
    • ghci> generalCategory 'A'  -->  UppercaseLetter  
    • ghci> generalCategory 'a'   -->  LowercaseLetter  
    • ghci> generalCategory '.'   -->  OtherPunctuation  
    • ghci> generalCategory '9'   -->  DecimalNumber  
  • Since the GeneralCategory type is part of the Eq typeclass, we can also test for stuff like generalCategory c == Space.
  • toUpper converts a character to upper-case. Spaces, numbers, and the like remain unchanged.
  • toLower converts a character to lower-case.
  • toTitle converts a character to title-case. For most characters, title-case is the same as upper-case.
  • digitToInt converts a character to an Int. To succeed, the character must be in the ranges '0'..'9''a'..'f' or 'A'..'F'.
    • ghci> map digitToInt "34538"   -->  [3,4,5,3,8]  
    • ghci> map digitToInt "FF85AB"   -->  [15,15,8,5,10,11]  
  • intToDigit is the inverse function of digitToInt. It takes an Int in the range of 0..15 and converts it to a lower-case character.
  • The ord and chr functions convert characters to their corresponding numbers and vice versa
    • The difference between the ord values of two characters is equal to how far apart they are in the Unicode table.
Data.Map

  • Association lists (also called dictionaries) are lists that are used to store key-value pairs where ordering doesn't matter. For instance, we might use an association list to store phone numbers, where phone numbers would be the values and people's names would be the keys. We don't care in which order they're stored, we just want to get the right phone number for the right person.
  • The most obvious way to represent association lists in Haskell would be by having a list of pairs. The first component in the pair would be the key, the second component the value. Here's an example of an association list with phone numbers:
  1. phoneBook =   
  2.     [("betty","555-2938")  
  3.     ,("bonnie","452-2928")  
  4.     ,("patsy","493-2928")  
  5.     ,("lucille","205-2928")  
  6.     ,("wendy","939-8282")  
  7.     ,("penny","853-2492")  
  8.     ]  
Despite this seemingly odd indentation, this is just a list of pairs of strings. The most common task when dealing with association lists is looking up some value by key. Let's make a function that looks up some value given a key.
  1. findKey :: (Eq k) => k -> [(k,v)] -> v  
  2. findKey key xs = snd . head . filter (\(k,v) -> key == k) $ xs  
Pretty simple. The function that takes a key and a list, filters the list so that only matching keys remain, gets the first key-value that matches and returns the value. But what happens if the key we're looking for isn't in the association list? Hmm. Here, if a key isn't in the association list, we'll end up trying to get the head of an empty list, which throws a runtime error. However, we should avoid making our programs so easy to crash, so let's use the Maybe data type. If we don't find the key, we'll return a Nothing. If we find it, we'll return Just something, where something is the value corresponding to that key.
  1. findKey :: (Eq k) => k -> [(k,v)] -> Maybe v  
  2. findKey key [] = Nothing  
  3. findKey key ((k,v):xs) = if key == k  
  4.                             then Just v  
  5.                             else findKey key xs  
Look at the type declaration. It takes a key that can be equated, an association list and then it maybe produces a value. Sounds about right.
This is a textbook recursive function that operates on a list. Edge case, splitting a list into a head and a tail, recursive calls, they're all there. This is the classic fold pattern, so let's see how this would be implemented as a fold.
  1. findKey :: (Eq k) => k -> [(k,v)] -> Maybe v  
  2. findKey key = foldr (\(k,v) acc -> if key == k then Just v else acc) Nothing  
Note: It's usually better to use folds for this standard list recursion pattern instead of explicitly writing the recursion because they're easier to read and identify. Everyone knows it's a fold when they see the foldr call, but it takes some more thinking to read explicit recursion.
  1. ghci> findKey "penny" phoneBook  
  2. Just "853-2492"  
  3. ghci> findKey "betty" phoneBook  
  4. Just "555-2938"  
  5. ghci> findKey "wilma" phoneBook  
  6. Nothing  
Works like a charm! If we have the girl's phone number, we Just get the number, otherwise we getNothing.
We just implemented the lookup function from Data.List. If we want to find the corresponding value to a key, we have to traverse all the elements of the list until we find it. The Data.Map module offers association lists that are much faster (because they're internally implemented with trees) and also it provides a lot of utility functions. From now on, we'll say we're working with maps instead of association lists.
Because Data.Map exports functions that clash with the Prelude and Data.List ones, we'll do a qualified import.
  1. import qualified Data.Map as Map  
Put this import statement into a script and then load the script via GHCI.
Let's go ahead and see what Data.Map has in store for us! Here's the basic rundown of its functions.
The fromList function takes an association list (in the form of a list) and returns a map with the same associations.
  1. ghci> Map.fromList [("betty","555-2938"),("bonnie","452-2928"),("lucille","205-2928")]  
  2. fromList [("betty","555-2938"),("bonnie","452-2928"),("lucille","205-2928")]  
  3. ghci> Map.fromList [(1,2),(3,4),(3,2),(5,5)]  
  4. fromList [(1,2),(3,2),(5,5)]  
If there are duplicate keys in the original association list, the duplicates are just discarded. This is the type signature of fromList
  1. Map.fromList :: (Ord k) => [(k, v)] -> Map.Map k v  
It says that it takes a list of pairs of type k and v and returns a map that maps from keys of type k to type v. Notice that when we were doing association lists with normal lists, the keys only had to be equatable (their type belonging to the Eq typeclass) but now they have to be orderable. That's an essential constraint in the Data.Map module. It needs the keys to be orderable so it can arrange them in a tree.
You should always use Data.Map for key-value associations unless you have keys that aren't part of the Ord typeclass.
empty represents an empty map. It takes no arguments, it just returns an empty map.
  1. ghci> Map.empty  
  2. fromList []  
insert takes a key, a value and a map and returns a new map that's just like the old one, only with the key and value inserted.
  1. ghci> Map.empty  
  2. fromList []  
  3. ghci> Map.insert 3 100 Map.empty  
  4. fromList [(3,100)]  
  5. ghci> Map.insert 5 600 (Map.insert 4 200 ( Map.insert 3 100  Map.empty))  
  6. fromList [(3,100),(4,200),(5,600)]  
  7. ghci> Map.insert 5 600 . Map.insert 4 200 . Map.insert 3 100 $ Map.empty  
  8. fromList [(3,100),(4,200),(5,600)]  
We can implement our own fromList by using the empty map, insert and a fold. Watch:
  1. fromList' :: (Ord k) => [(k,v)] -> Map.Map k v  
  2. fromList' = foldr (\(k,v) acc -> Map.insert k v acc) Map.empty  
It's a pretty straightforward fold. We start of with an empty map and we fold it up from the right, inserting the key value pairs into the accumulator as we go along.
null checks if a map is empty.
  1. ghci> Map.null Map.empty  
  2. True  
  3. ghci> Map.null $ Map.fromList [(2,3),(5,5)]  
  4. False  
size reports the size of a map.
  1. ghci> Map.size Map.empty  
  2. 0  
  3. ghci> Map.size $ Map.fromList [(2,4),(3,3),(4,2),(5,4),(6,4)]  
  4. 5  
singleton takes a key and a value and creates a map that has exactly one mapping.
  1. ghci> Map.singleton 3 9  
  2. fromList [(3,9)]  
  3. ghci> Map.insert 5 9 $ Map.singleton 3 9  
  4. fromList [(3,9),(5,9)]  
lookup works like the Data.List lookup, only it operates on maps. It returns Just something if it finds something for the key andNothing if it doesn't.
member is a predicate takes a key and a map and reports whether the key is in the map or not.
  1. ghci> Map.member 3 $ Map.fromList [(3,6),(4,3),(6,9)]  
  2. True  
  3. ghci> Map.member 3 $ Map.fromList [(2,5),(4,5)]  
  4. False  
map and filter work much like their list equivalents.
  1. ghci> Map.map (*100) $ Map.fromList [(1,1),(2,4),(3,9)]  
  2. fromList [(1,100),(2,400),(3,900)]  
  3. ghci> Map.filter isUpper $ Map.fromList [(1,'a'),(2,'A'),(3,'b'),(4,'B')]  
  4. fromList [(2,'A'),(4,'B')]  
toList is the inverse of fromList.
  1. ghci> Map.toList . Map.insert 9 2 $ Map.singleton 4 3  
  2. [(4,3),(9,2)]  
keys and elems return lists of keys and values respectively. keys is the equivalent of map fst . Map.toList and elems is the equivalent of map snd . Map.toList.
fromListWith is a cool little function. It acts like fromList, only it doesn't discard duplicate keys but it uses a function supplied to it to decide what to do with them. Let's say that a girl can have several numbers and we have an association list set up like this.
  1. phoneBook =   
  2.     [("betty","555-2938")  
  3.     ,("betty","342-2492")  
  4.     ,("bonnie","452-2928")  
  5.     ,("patsy","493-2928")  
  6.     ,("patsy","943-2929")  
  7.     ,("patsy","827-9162")  
  8.     ,("lucille","205-2928")  
  9.     ,("wendy","939-8282")  
  10.     ,("penny","853-2492")  
  11.     ,("penny","555-2111")  
  12.     ]  
Now if we just use fromList to put that into a map, we'll lose a few numbers! So here's what we'll do:
  1. phoneBookToMap :: (Ord k) => [(k, String)] -> Map.Map k String  
  2. phoneBookToMap xs = Map.fromListWith (\number1 number2 -> number1 ++ ", " ++ number2) xs  
  1. ghci> Map.lookup "patsy" $ phoneBookToMap phoneBook  
  2. "827-9162, 943-2929, 493-2928"  
  3. ghci> Map.lookup "wendy" $ phoneBookToMap phoneBook  
  4. "939-8282"  
  5. ghci> Map.lookup "betty" $ phoneBookToMap phoneBook  
  6. "342-2492, 555-2938"  
If a duplicate key is found, the function we pass is used to combine the values of those keys into some other value. We could also first make all the values in the association list singleton lists and then we can use ++ to combine the numbers.
  1. phoneBookToMap :: (Ord k) => [(k, a)] -> Map.Map k [a]  
  2. phoneBookToMap xs = Map.fromListWith (++) $ map (\(k,v) -> (k,[v])) xs  
  1. ghci> Map.lookup "patsy" $ phoneBookToMap phoneBook  
  2. ["827-9162","943-2929","493-2928"]  
Pretty neat! Another use case is if we're making a map from an association list of numbers and when a duplicate key is found, we want the biggest value for the key to be kept.
  1. ghci> Map.fromListWith max [(2,3),(2,5),(2,100),(3,29),(3,22),(3,11),(4,22),(4,15)]  
  2. fromList [(2,100),(3,29),(4,22)]  
Or we could choose to add together values on the same keys.
  1. ghci> Map.fromListWith (+) [(2,3),(2,5),(2,100),(3,29),(3,22),(3,11),(4,22),(4,15)]  
  2. fromList [(2,108),(3,62),(4,37)]  
insertWith is to insert what fromListWith is to fromList. It inserts a key-value pair into a map, but if that map already contains the key, it uses the function passed to it to determine what to do.
  1. ghci> Map.insertWith (+) 3 100 $ Map.fromList [(3,4),(5,103),(6,339)]  
  2. fromList [(3,104),(5,103),(6,339)]  
These were just a few functions from Data.Map. You can see a complete list of functions in the documentation.
Data.Set
Description: legosets
The Data.Set module offers us, well, sets. Like sets from mathematics. Sets are kind of like a cross between lists and maps. All the elements in a set are unique. And because they're internally implemented with trees (much like maps in Data.Map), they're ordered. Checking for membership, inserting, deleting, etc. is much faster than doing the same thing with lists. The most common operation when dealing with sets are inserting into a set, checking for membership and converting a set to a list.
Because the names in Data.Set clash with a lot of Prelude and Data.List names, we do a qualified import.
Put this import statement in a script:
  1. import qualified Data.Set as Set  
And then load the script via GHCI.
Let's say we have two pieces of text. We want to find out which characters were used in both of them.
  1. text1 = "I just had an anime dream. Anime... Reality... Are they so different?"  
  2. text2 = "The old man left his garbage can out and now his trash is all over my lawn!"  
The fromList function works much like you would expect. It takes a list and converts it into a set.
  1. ghci> let set1 = Set.fromList text1  
  2. ghci> let set2 = Set.fromList text2  
  3. ghci> set1  
  4. fromList " .?AIRadefhijlmnorstuy"  
  5. ghci> set2  
  6. fromList " !Tabcdefghilmnorstuvwy"  
As you can see, the items are ordered and each element is unique. Now let's use the intersection function to see which elements they both share.
  1. ghci> Set.intersection set1 set2  
  2. fromList " adefhilmnorstuy"  
We can use the difference function to see which letters are in the first set but aren't in the second one and vice versa.
  1. ghci> Set.difference set1 set2  
  2. fromList ".?AIRj"  
  3. ghci> Set.difference set2 set1  
  4. fromList "!Tbcgvw"  
Or we can see all the unique letters used in both sentences by using union.
  1. ghci> Set.union set1 set2  
  2. fromList " !.?AIRTabcdefghijlmnorstuvwy"  
The nullsizememberemptysingletoninsert and delete functions all work like you'd expect them to.
  1. ghci> Set.null Set.empty  
  2. True  
  3. ghci> Set.null $ Set.fromList [3,4,5,5,4,3]  
  4. False  
  5. ghci> Set.size $ Set.fromList [3,4,5,3,4,5]  
  6. 3  
  7. ghci> Set.singleton 9  
  8. fromList [9]  
  9. ghci> Set.insert 4 $ Set.fromList [9,3,8,1]  
  10. fromList [1,3,4,8,9]  
  11. ghci> Set.insert 8 $ Set.fromList [5..10]  
  12. fromList [5,6,7,8,9,10]  
  13. ghci> Set.delete 4 $ Set.fromList [3,4,5,4,3,4,5]  
  14. fromList [3,5]  
We can also check for subsets or proper subset. Set A is a subset of set B if B contains all the elements that A does. Set A is a proper subset of set B if B contains all the elements that A does but has more elements.
  1. ghci> Set.fromList [2,3,4] `Set.isSubsetOf` Set.fromList [1,2,3,4,5]  
  2. True  
  3. ghci> Set.fromList [1,2,3,4,5] `Set.isSubsetOf` Set.fromList [1,2,3,4,5]  
  4. True  
  5. ghci> Set.fromList [1,2,3,4,5] `Set.isProperSubsetOf` Set.fromList [1,2,3,4,5]  
  6. False  
  7. ghci> Set.fromList [2,3,4,8] `Set.isSubsetOf` Set.fromList [1,2,3,4,5]  
  8. False  
We can also map over sets and filter them.
  1. ghci> Set.filter odd $ Set.fromList [3,4,5,6,7,2,3,4]  
  2. fromList [3,5,7]  
  3. ghci> Set.map (+1) $ Set.fromList [3,4,5,6,7,2,3,4]  
  4. fromList [3,4,5,6,7,8]  
Sets are often used to weed a list of duplicates from a list by first making it into a set with fromList and then converting it back to a list withtoList. The Data.List function nub already does that, but weeding out duplicates for large lists is much faster if you cram them into a set and then convert them back to a list than using nub. But using nub only requires the type of the list's elements to be part of the Eqtypeclass, whereas if you want to cram elements into a set, the type of the list has to be in Ord.
  1. ghci> let setNub xs = Set.toList $ Set.fromList xs  
  2. ghci> setNub "HEY WHATS CRACKALACKIN"  
  3. " ACEHIKLNRSTWY"  
  4. ghci> nub "HEY WHATS CRACKALACKIN"  
  5. "HEY WATSCRKLIN"  
setNub is generally faster than nub on big lists but as you can see, nub preserves the ordering of the list's elements, while setNub does not.
Making our own modules
Description: making modules
We've looked at some cool modules so far, but how do we make our own module? Almost every programming language enables you to split your code up into several files and Haskell is no different. When making programs, it's good practice to take functions and types that work towards a similar purpose and put them in a module. That way, you can easily reuse those functions in other programs by just importing your module.
Let's see how we can make our own modules by making a little module that provides some functions for calculating the volume and area of a few geometrical objects. We'll start by creating a file called Geometry.hs.
We say that a module exports functions. What that means is that when I import a module, I can use the functions that it exports. It can define functions that its functions call internally, but we can only see and use the ones that it exports.
At the beginning of a module, we specify the module name. If we have a file called Geometry.hs, then we should name our moduleGeometry. Then, we specify the functions that it exports and after that, we can start writing the functions. So we'll start with this.
  1. module Geometry  
  2. ( sphereVolume  
  3. , sphereArea  
  4. , cubeVolume  
  5. , cubeArea  
  6. , cuboidArea  
  7. , cuboidVolume  
  8. where  
As you can see, we'll be doing areas and volumes for spheres, cubes and cuboids. Let's go ahead and define our functions then:
  1. module Geometry  
  2. ( sphereVolume  
  3. , sphereArea  
  4. , cubeVolume  
  5. , cubeArea  
  6. , cuboidArea  
  7. , cuboidVolume  
  8. where  
  9.   
  10. sphereVolume :: Float -> Float  
  11. sphereVolume radius = (4.0 / 3.0) * pi * (radius ^ 3)  
  12.   
  13. sphereArea :: Float -> Float  
  14. sphereArea radius = 4 * pi * (radius ^ 2)  
  15.   
  16. cubeVolume :: Float -> Float  
  17. cubeVolume side = cuboidVolume side side side  
  18.   
  19. cubeArea :: Float -> Float  
  20. cubeArea side = cuboidArea side side side  
  21.   
  22. cuboidVolume :: Float -> Float -> Float -> Float  
  23. cuboidVolume a b c = rectangleArea a b * c  
  24.   
  25. cuboidArea :: Float -> Float -> Float -> Float  
  26. cuboidArea a b c = rectangleArea a b * 2 + rectangleArea a c * 2 + rectangleArea c b * 2  
  27.   
  28. rectangleArea :: Float -> Float -> Float  
  29. rectangleArea a b = a * b  
Pretty standard geometry right here. There are a few things to take note of though. Because a cube is only a special case of a cuboid, we defined its area and volume by treating it as a cuboid whose sides are all of the same length. We also defined a helper function calledrectangleArea, which calculates a rectangle's area based on the lenghts of its sides. It's rather trivial because it's just multiplication. Notice that we used it in our functions in the module (namely cuboidArea and cuboidVolume) but we didn't export it! Because we want our module to just present functions for dealing with three dimensional objects, we used rectangleArea but we didn't export it.
When making a module, we usually export only those functions that act as a sort of interface to our module so that the implementation is hidden. If someone is using our Geometry module, they don't have to concern themselves with functions that we don't export. We can decide to change those functions completely or delete them in a newer version (we could delete rectangleArea and just use * instead) and no one will mind because we weren't exporting them in the first place.
To use our module, we just do:
  1. import Geometry  
Geometry.hs has to be in the same folder that the program that's importing it is in, though.
Modules can also be given a hierarchical structures. Each module can have a number of sub-modules and they can have sub-modules of their own. Let's section these functions off so that Geometry is a module that has three sub-modules, one for each type of object.
First, we'll make a folder called Geometry. Mind the capital G. In it, we'll place three files: Sphere.hsCuboid.hs, and Cube.hs. Here's what the files will contain:
Sphere.hs
  1. module Geometry.Sphere  
  2. ( volume  
  3. , area  
  4. where  
  5.   
  6. volume :: Float -> Float  
  7. volume radius = (4.0 / 3.0) * pi * (radius ^ 3)  
  8.   
  9. area :: Float -> Float  
  10. area radius = 4 * pi * (radius ^ 2)  
Cuboid.hs
  1. module Geometry.Cuboid  
  2. ( volume  
  3. , area  
  4. where  
  5.   
  6. volume :: Float -> Float -> Float -> Float  
  7. volume a b c = rectangleArea a b * c  
  8.   
  9. area :: Float -> Float -> Float -> Float  
  10. area a b c = rectangleArea a b * 2 + rectangleArea a c * 2 + rectangleArea c b * 2  
  11.   
  12. rectangleArea :: Float -> Float -> Float  
  13. rectangleArea a b = a * b  
Cube.hs
  1. module Geometry.Cube  
  2. ( volume  
  3. , area  
  4. where  
  5.   
  6. import qualified Geometry.Cuboid as Cuboid  
  7.   
  8. volume :: Float -> Float  
  9. volume side = Cuboid.volume side side side  
  10.   
  11. area :: Float -> Float  
  12. area side = Cuboid.area side side side  
Alright! So first is Geometry.Sphere. Notice how we placed it in a folder called Geometry and then defined the module name asGeometry.Sphere. We did the same for the cuboid. Also notice how in all three sub-modules, we defined functions with the same names. We can do this because they're separate modules. We want to use functions from Geometry.Cuboid in Geometry.Cube but we can't just straight up do import Geometry.Cuboid because it exports functions with the same names as Geometry.Cube. That's why we do a qualified import and all is well.
So now if we're in a file that's on the same level as the Geometry folder, we can do, say:
  1. import Geometry.Sphere  
And then we can call area and volume and they'll give us the area and volume for a sphere. And if we want to juggle two or more of these modules, we have to do qualified imports because they export functions with the same names. So we just do something like:
  1. import qualified Geometry.Sphere as Sphere  
  2. import qualified Geometry.Cuboid as Cuboid  
  3. import qualified Geometry.Cube as Cube  
And then we can call Sphere.areaSphere.volumeCuboid.area, etc. and each will calculate the area or volume for their corresponding object.
The next time you find yourself writing a file that's really big and has a lot of functions, try to see which functions serve some common purpose and then see if you can put them in their own module. You'll be able to just import your module the next time you're writing a program that requires some of the same functionality.

No comments:

Post a Comment