Files and streams
getChar is an I/O
action that reads a single character from the terminal.
getLine is an I/O
action that reads a line from the terminal. These two are pretty
straightforward and most programming languages have some functions or
statements that are parallel to them. But now, let's meet
getContents. getContents is an I/O
action that reads everything from the standard input until it encounters an
end-of-file character. Its type is getContents :: IO String. What's cool
about getContents is that it does lazy I/O.
These two programs do essentially the same thing: they echo their input in upper case.
- import Control.Monad
- import Data.Char
- main = forever $ do
-     putStr "Give me some input: "
-     l <- getLine
-     putStrLn $ map toUpper l
Same as:
- import Data.Char
- main = do
-     contents <- getContents
-     putStr (map toUpper contents)
- $ cat haiku.txt | ./capslocker
- I'M A LIL' TEAPOT
- WHAT'S WITH THAT AIRPLANE FOOD, HUH?
- IT'S SO SMALL, TASTELESS
A program that takes some input and
prints out only those lines that are shorter than 10 characters.
- main = do
-     contents <- getContents
-     putStr (shortLinesOnly contents)
- shortLinesOnly :: String -> String
- shortLinesOnly input =
-     let allLines = lines input
-         shortLines = filter (\line -> length line < 10) allLines
-         result = unlines shortLines
-     in  result
interact takes a
function of type String -> String as a parameter and returns an I/O action that
will take some input, run that function on it and then print out the function's
result.
- main = interact $ unlines . filter ((<10) . length) . lines
- respondPalindromes = unlines . map (\xs -> if isPalindrome xs then "palindrome" else "not a palindrome") . lines
-     where isPalindrome xs = xs == reverse xs
- main = interact respondPalindromes
Reading and Writing Files
- import System.IO
- main = do
-     handle <- openFile "girlfriend.txt" ReadMode
-     contents <- hGetContents handle
-     putStr contents
-     hClose handle
openFile :: FilePath ->
IOMode -> IO Handle takes a file path and an IOMode and returns an I/O action
that will open a file and have the file's associated handle encapsulated as its result.
- data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
hGetContents takes a Handle and returns an IO String — an I/O action
that holds as its result the contents of the file.
hClose takes a handle and
returns an I/O action that closes the file.
bracket :: IO a -> (a
-> IO b) -> (a -> IO c) -> IO c
(from Control.Exception) takes three parameters. Its first parameter
is an I/O action that acquires a resource, such as a file handle. Its second
parameter is a function that releases that resource. This function gets called
even if an exception has been raised. The third parameter is a function that
also takes that resource and does something with it.
- withFile name mode f = bracket (openFile name mode)
-     (\handle -> hClose handle)
-     (\handle -> f handle)
bracketOnError :: IO a -> (a
-> IO b) -> (a -> IO c) -> IO c
(from Control.Exception) performs the
cleanup only if an exception has been raised.
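As a rough sketch of bracket in use (the file name bracket-demo.txt and the helper countLines are made up for illustration), the following counts the lines of a file and closes the handle whether or not reading succeeds. Because hGetContents is lazy, the count is forced before bracket gets a chance to close the handle:

```haskell
import Control.Exception (bracket)
import System.IO

-- Count the lines of a file. The handle is closed by bracket's release
-- function whether or not the body throws.
countLines :: FilePath -> IO Int
countLines path =
    bracket (openFile path ReadMode) hClose $ \handle -> do
        contents <- hGetContents handle
        -- hGetContents is lazy, so force the count while the handle is open.
        let n = length (lines contents)
        n `seq` return n

main :: IO ()
main = do
    writeFile "bracket-demo.txt" "one\ntwo\nthree\n"
    n <- countLines "bracket-demo.txt"
    print n  -- prints 3
```

Without the seq, the body could return an unevaluated count and the file would already be closed by the time it was needed.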
withFile :: FilePath ->
IOMode -> (Handle -> IO a) -> IO a takes a path to a file, an IOMode and then it
takes a function that takes a handle and returns some I/O action. What it
returns is an I/O action that will open that file, do something we want with
the file and then close it. The result encapsulated in the final I/O action
that's returned is the same as the result of the I/O action that the function
we give it returns.
- import System.IO
- main = do
-     withFile "girlfriend.txt" ReadMode (\handle -> do
-         contents <- hGetContents handle
-         putStr contents)
(\handle -> ... ) is the function
that takes a handle and returns an I/O action and it's usually done like this,
with a lambda.
hGetLine, hPutStr, hPutStrLn, hGetChar, etc. work just like
their counterparts without the h, only they take a handle as a
parameter and operate on that specific file instead of operating on standard
input or standard output.
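A small sketch of the handle-based functions, assuming a scratch file called handles-demo.txt (a made-up name) in the current directory. It writes two lines with hPutStrLn and reads the first one back with hGetLine:

```haskell
import System.IO

main :: IO ()
main = do
    -- Write two lines through a handle opened for writing.
    withFile "handles-demo.txt" WriteMode $ \h -> do
        hPutStrLn h "first line"
        hPutStrLn h "second line"
    -- Read only the first line back through a handle opened for reading.
    firstLine <- withFile "handles-demo.txt" ReadMode hGetLine
    putStrLn firstLine  -- prints "first line"
```

Since hGetLine needs nothing but the handle, it can be passed to withFile directly instead of being wrapped in a lambda.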
Reading Files as Strings
readFile :: FilePath ->
IO String takes a path to a file and returns an I/O action that will read that
file (lazily, of course) and bind its contents to something as a string. It's
usually more handy than doing openFile and binding it
to a handle and then doing hGetContents.
- import System.IO
- main = do
-     contents <- readFile "girlfriend.txt"
-     putStr contents
writeFile :: FilePath ->
String -> IO () takes a path to a file and a string to write to that file and returns
an I/O action that will do the writing. If the file already exists, it is
truncated to zero length before being written to.
- import System.IO
- import Data.Char
- main = do
-     contents <- readFile "girlfriend.txt"
-     writeFile "girlfriendcaps.txt" (map toUpper contents)
appendFile has a type
signature that's just like writeFile, only appendFile doesn't
truncate the file to zero length if it already exists but it appends stuff to
it.
ToDo App - Append
- import System.IO
- main = do
-     todoItem <- getLine
-     appendFile "todo.txt" (todoItem ++ "\n")
- import System.IO
- main = do
-     withFile "something.txt" ReadMode (\handle -> do
-         contents <- hGetContents handle
-         putStr contents)
For text files, the default buffering
is usually line-buffering: the smallest part of the file to be read at once is
one line.
For binary files, the default buffering
is usually block-buffering. That means that it will read the file chunk by
chunk. The chunk size is some size that your operating system thinks is cool.
hSetBuffering controls how
exactly buffering is done. It takes a handle and a BufferMode and returns an I/O
action that sets the buffering.
BufferMode is a simple
enumeration data type and the possible values it can hold are: NoBuffering, LineBuffering or BlockBuffering
(Maybe Int). The Maybe Int is for how big the chunk should be, in bytes. If it's Nothing, then the operating
system determines the chunk size. NoBuffering means that it
will be read one character at a time. NoBuffering usually sucks
as a buffering mode because it has to access the disk so much.
- import System.IO
- main = do
-     withFile "something.txt" ReadMode (\handle -> do
-         hSetBuffering handle $ BlockBuffering (Just 2048)
-         contents <- hGetContents handle
-         putStr contents)
hFlush takes a handle and
returns an I/O action that will flush the buffer of the file associated with
the handle.
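A common place where hFlush matters is an interactive prompt. When standard output is buffered, text written with putStr may sit in the buffer until a newline is written, so the prompt can show up only after the program has started waiting for input. A minimal sketch (the helper name prompt is made up here):

```haskell
import System.IO

-- Print a prompt without a trailing newline and flush it so that it is
-- visible before the program blocks on getLine.
prompt :: String -> IO String
prompt msg = do
    putStr msg
    hFlush stdout
    getLine

main :: IO ()
main = do
    name <- prompt "What's your name? "
    putStrLn ("Hello, " ++ name ++ "!")
```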
ToDo App - Removing
- import System.IO
- import System.Directory
- import Data.List
- main = do
-     handle <- openFile "todo.txt" ReadMode
-     (tempName, tempHandle) <- openTempFile "." "temp"
-     contents <- hGetContents handle
-     let todoTasks = lines contents
-         numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
-     putStrLn "These are your TO-DO items:"
-     putStr $ unlines numberedTasks
-     putStrLn "Which one do you want to delete?"
-     numberString <- getLine
-     let number = read numberString
-         newTodoItems = delete (todoTasks !! number) todoTasks
-     hPutStr tempHandle $ unlines newTodoItems
-     hClose handle
-     hClose tempHandle
-     removeFile "todo.txt"
-     renameFile tempName "todo.txt"
openTempFile (from System.IO) takes a path to a
temporary directory and a template name for a file and opens a temporary file.
It returns an I/O action whose result is a pair of the temporary file's name and its handle.
We could have also done mapM putStrLn
numberedTasks
We ask the user which one they want to
delete and wait for them to enter a number.
removeFile (System.Directory) takes a path to a
file (not handle) and deletes it.
renameFile (System.Directory) takes a path to a
file (not handle) and renames it.
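One weakness of the program above is that if something goes wrong after the temporary file has been opened (for example, the number is out of range), the temporary file is left behind. A possible variation, sketched here with a made-up helper removeItem and a made-up file todo-demo.txt, uses bracketOnError from earlier so that the temporary file is deleted only when an exception occurs:

```haskell
import Control.Exception (bracketOnError)
import Data.List (delete)
import System.Directory (removeFile, renameFile)
import System.IO (hClose, hPutStr, openTempFile)

-- Remove the item with the given index from a to-do file.
removeItem :: FilePath -> Int -> IO ()
removeItem fileName number = do
    contents <- readFile fileName
    let todoTasks    = lines contents
        newTodoItems = unlines (delete (todoTasks !! number) todoTasks)
    bracketOnError (openTempFile "." "temp")
        -- Cleanup, run only if the action below throws.
        (\(tempName, tempHandle) -> do
            hClose tempHandle
            removeFile tempName)
        -- Normal path: write the new list, then swap the files.
        (\(tempName, tempHandle) -> do
            hPutStr tempHandle newTodoItems
            hClose tempHandle
            removeFile fileName
            renameFile tempName fileName)

main :: IO ()
main = do
    writeFile "todo-demo.txt" "Iron the dishes\nDust the dog\nTake salad out of the oven\n"
    removeItem "todo-demo.txt" 1
    readFile "todo-demo.txt" >>= putStr
```

Because newTodoItems is evaluated lazily, an out-of-range index only fails once hPutStr forces it, which is inside the bracketOnError body, so the cleanup function still runs.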
getArgs :: IO [String] (from System.Environment) is an I/O action
that will get the arguments that the program was run with and have as its
contained result a list with the arguments.
getProgName :: IO String (also from System.Environment) is an I/O action that
contains the program name.
- import System.Environment
- import Data.List
- main = do
-     args <- getArgs
-     progName <- getProgName
-     putStrLn "The arguments are:"
-     mapM_ putStrLn args
-     putStrLn "The program name is:"
-     putStrLn progName
Full ToDo App
The dispatch association list maps command
line arguments to functions of type [String] -> IO () that take the
argument list as a parameter and return an I/O action that does the viewing,
adding, deleting, etc.
- import System.Environment
- import System.Directory
- import System.IO
- import Data.List
- dispatch :: [(String, [String] -> IO ())]
- dispatch = [ ("add", add)
-            , ("view", view)
-            , ("remove", remove)
-            ]
- main = do
-     (command:args) <- getArgs
-     let (Just action) = lookup command dispatch
-     action args
- add :: [String] -> IO ()
- add [fileName, todoItem] = appendFile fileName (todoItem ++ "\n")
- view :: [String] -> IO ()
- view [fileName] = do
-     contents <- readFile fileName
-     let todoTasks = lines contents
-         numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
-     putStr $ unlines numberedTasks
- remove :: [String] -> IO ()
- remove [fileName, numberString] = do
-     handle <- openFile fileName ReadMode
-     (tempName, tempHandle) <- openTempFile "." "temp"
-     contents <- hGetContents handle
-     let number = read numberString
-         todoTasks = lines contents
-         newTodoItems = delete (todoTasks !! number) todoTasks
-     hPutStr tempHandle $ unlines newTodoItems
-     hClose handle
-     hClose tempHandle
-     removeFile fileName
-     renameFile tempName fileName
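The main above pattern matches on (command:args) and on Just action, so running the program with no arguments or with an unknown command ends in a pattern match failure. One way to fail more gracefully, sketched with a cut-down dispatch list whose single echo command is invented for this example, is to handle both cases explicitly:

```haskell
import System.Environment (getArgs)

-- Hypothetical, cut-down dispatch list with a single command.
dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("echo", mapM_ putStrLn) ]

-- Look the command up and print a message instead of crashing when the
-- argument list is empty or the command is unknown.
run :: [String] -> IO ()
run [] = putStrLn "usage: <command> [args]"
run (command:args) =
    case lookup command dispatch of
        Just action -> action args
        Nothing     -> putStrLn ("unknown command: " ++ command)

main :: IO ()
main = getArgs >>= run
```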
random :: (RandomGen g,
Random a) => g -> (a, g) (from System.Random)
The RandomGen typeclass is
for types that can act as sources of randomness.
The Random typeclass is
for things that can take on random values.
random takes a random generator (that's
our source of randomness) and returns a random value and a new random
generator.
StdGen is a type that is an
instance of the RandomGen typeclass.
We can either make a StdGen manually or we can
tell the system to give us one based on a multitude of sort of random stuff.
mkStdGen :: Int -> StdGen creates a random
generator. It takes an integer and based on that, gives us a (hardly) random
generator.
- ghci> random (mkStdGen 100) :: (Int, StdGen)
- (-1352021624,651872571 1655838864)
- ghci> random (mkStdGen 949488) :: (Float, StdGen)
- (0.8938442,1597344447 1655838864)
- ghci> random (mkStdGen 949488) :: (Bool, StdGen)
- (False,1485632275 40692)
- ghci> random (mkStdGen 949488) :: (Integer, StdGen)
- (1691547873,1597344447 1655838864)
randoms takes a
generator and returns an infinite sequence of values based on that generator.
- ghci> take 5 $ randoms (mkStdGen 11) :: [Int]
- [-1807975507,545074951,-1015194702,-1622477312,-502893664]
We could make a function that generates
a finite stream of numbers and a new generator like this:
- finiteRandoms :: (RandomGen g, Random a, Num n, Eq n) => n -> g -> ([a], g)
- finiteRandoms 0 gen = ([], gen)
- finiteRandoms n gen =
-     let (value, newGen) = random gen
-         (restOfList, finalGen) = finiteRandoms (n-1) newGen
-     in  (value:restOfList, finalGen)
randomR :: (RandomGen g,
Random a) => (a, a) -> g -> (a, g) takes as its first parameter a pair of
values that set the lower and upper bounds and the final value produced will be
within those bounds.
- ghci> randomR (1,6) (mkStdGen 359353)
- (6,1494289578 40692)
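Since both random and randomR hand back a new generator, getting several different values means passing that generator along to the next call. A small sketch, assuming the random package is installed (rollTwice is a made-up name):

```haskell
import System.Random

-- Roll a six-sided die twice, threading the generator through both rolls.
rollTwice :: StdGen -> ((Int, Int), StdGen)
rollTwice gen =
    let (firstRoll, gen')   = randomR (1, 6) gen
        (secondRoll, gen'') = randomR (1, 6) gen'
    in  ((firstRoll, secondRoll), gen'')

main :: IO ()
main = print (fst (rollTwice (mkStdGen 359353)))
```

Calling randomR twice with the same gen would produce the same roll twice, which is why the second call uses gen'. The exact numbers depend on the version of the random package, so no output is shown.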
randomRs produces a stream of
random values within our defined ranges.
- ghci> take 10 $ randomRs ('a','z') (mkStdGen 3) :: [Char]
- "ndkxbvmomg"
I/O Random
getStdGen is an I/O
action, which has a type of IO StdGen. When your program
starts, it asks the system for a good random number generator and stores that
in a so called global generator. getStdGen fetches you
that global random generator when you bind it to something.
- import System.Random
- main = do
-     gen <- getStdGen
-     putStr $ take 20 (randomRs ('a','z') gen)
Just performing getStdGen twice will ask
the system for the same global generator twice.
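This is easy to check with a short sketch (assuming the random package; sameGlobalGen is a made-up helper): two generators fetched with getStdGen produce identical streams.

```haskell
import System.Random

-- Fetch the global generator twice and compare what each copy produces.
sameGlobalGen :: IO Bool
sameGlobalGen = do
    gen1 <- getStdGen
    gen2 <- getStdGen
    let a = take 5 (randomRs ('a', 'z') gen1)
        b = take 5 (randomRs ('a', 'z') gen2)
    return (a == b)

main :: IO ()
main = sameGlobalGen >>= print  -- prints True
```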
newStdGen splits our current
random generator into two generators. It updates the global random generator
with one of them and encapsulates the other as its result.
- import System.Random
- main = do
-     gen <- getStdGen
-     putStrLn $ take 20 (randomRs ('a','z') gen)
-     gen' <- newStdGen
-     putStr $ take 20 (randomRs ('a','z') gen')
reads returns an empty
list when it fails to read a string, so use it if you don't want your program to
crash on erroneous input. When it succeeds, it returns a singleton list with a tuple that has our
desired value as one component and a string with what it didn't consume as the
other.
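A minimal sketch of wrapping reads so that bad input becomes Nothing instead of a crash (readNumber is a made-up name; it also rejects input with leftover characters):

```haskell
-- Parse an Int, accepting only input that reads consumes completely.
readNumber :: String -> Maybe Int
readNumber s =
    case reads s of
        [(n, "")] -> Just n
        _         -> Nothing

main :: IO ()
main = do
    print (readNumber "42")     -- Just 42
    print (readNumber "haha")   -- Nothing
    print (readNumber "42abc")  -- Nothing
```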
Processing files as strings tends to be
slow. That overhead doesn't bother us so much most of the time, but it turns
out to be a liability when reading big files and manipulating them.
Bytestrings are sort of like lists,
only each element is one byte (or 8 bits) in size. The way they handle laziness
is also different.
Strict bytestrings reside in Data.ByteString and they do
away with the laziness completely. They represent a series of bytes in an array, and there
are no thunks (the technical term for promise) involved.
Lazy bytestrings reside in Data.ByteString.Lazy - they're lazy, but
not quite as lazy as lists - they are stored in chunks, each chunk has a size
of 64K. Data.ByteString.Lazy has a lot of functions that have the same names as
the ones from Data.List, only the type signatures have ByteString instead
of [a] and Word8 instead of a in them.
- import qualified Data.ByteString.Lazy as B
- import qualified Data.ByteString as S
pack :: [Word8] ->
ByteString takes a list, which is lazy, and makes it less lazy, so that it's lazy
only at 64K intervals.
Word8 is like Int but has a much
smaller range, namely 0-255. It represents an 8-bit number. It's in the Num typeclass… e.g. 5 can take the
type of Word8.
- ghci> B.pack [99,97,110]
- Chunk "can" Empty
- ghci> B.pack [98..120]
- Chunk "bcdefghijklmnopqrstuvwx" Empty
If you try to use a big number,
like 336 as a Word8, it will just wrap around to 80.
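The wrap-around is plain arithmetic modulo 256. A quick sketch with a made-up helper toByte:

```haskell
import Data.Word (Word8)

-- fromIntegral reduces the value modulo 256 when converting to Word8.
toByte :: Int -> Word8
toByte = fromIntegral

main :: IO ()
main = do
    print (toByte 336)           -- 80, because 336 - 256 = 80
    print (maxBound :: Word8)    -- 255
    print ((255 :: Word8) + 1)   -- 0, arithmetic wraps around too
```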
Empty is like
the [] for lists.
unpack is the inverse
function of pack. It takes a bytestring and turns it into a list of bytes.
fromChunks takes a list of
strict bytestrings and converts it to a lazy bytestring.
toChunks takes a lazy
bytestring and converts it to a list of strict ones.
- ghci> B.fromChunks [S.pack [40,41,42], S.pack [43,44,45], S.pack [46,47,48]]
- Chunk "()*" (Chunk "+,-" (Chunk "./0" Empty))
This is good if you have a lot of small
strict bytestrings and you want to process them efficiently without joining
them into one big strict bytestring in memory first.
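A short sketch of going back and forth between the two representations (roundTrips is a made-up helper). For non-empty strict pieces, toChunks gives back the same pieces that fromChunks was given:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B

-- Build a lazy bytestring from strict chunks and take it apart again.
roundTrips :: [S.ByteString] -> Bool
roundTrips pieces = B.toChunks (B.fromChunks pieces) == pieces

main :: IO ()
main = do
    let pieces = [S.pack [40,41,42], S.pack [43,44,45]]
    print (B.unpack (B.fromChunks pieces))  -- [40,41,42,43,44,45]
    print (roundTrips pieces)               -- True
```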
cons is the
bytestring version of :. It takes a byte and a bytestring and puts the byte at the beginning.
It's lazy though, so it will make a new chunk even if the first chunk in the
bytestring isn't full.
cons' is the strict
version of cons which is better to use if you're going to be inserting a lot of bytes
at the beginning of a bytestring.
- ghci> B.cons 85 $ B.pack [80,81,82,84]
- Chunk "U" (Chunk "PQRT" Empty)
- ghci> B.cons' 85 $ B.pack [80,81,82,84]
- Chunk "UPQRT" Empty
- ghci> foldr B.cons B.empty [50..60]
- Chunk "2" (Chunk "3" (Chunk "4" (Chunk "5" (Chunk "6" (Chunk "7" (Chunk "8" (Chunk "9" (Chunk ":" (Chunk ";" (Chunk "<" Empty))))))))))
- ghci> foldr B.cons' B.empty [50..60]
- Chunk "23456789:;<" Empty
The bytestring modules have a load of
functions that are analogous to those in Data.List and System.IO (only Strings are replaced
with ByteStrings).
If you're using strict bytestrings and
you attempt to read a file, it will read it into memory at once! With lazy
bytestrings, it will read it into neat chunks.
Let's make a simple program that takes
two filenames as command-line arguments and copies the first file into the
second file. Note that System.Directory already has a function called copyFile, but we're going to
implement our own file copying function and program anyway.
- import System.Environment
- import qualified Data.ByteString.Lazy as B
- main = do
-     (fileName1:fileName2:_) <- getArgs
-     copyFile fileName1 fileName2
- copyFile :: FilePath -> FilePath -> IO ()
- copyFile source dest = do
-     contents <- B.readFile source
-     B.writeFile dest contents
We make our own function that takes
two FilePaths (remember, FilePath is just a synonym for String) and returns an I/O
action that will copy one file into another using bytestring. In the main function, we
just get the arguments and call our function with them to get the I/O action,
which is then performed.
- $ runhaskell bytestringcopy.hs something.txt ../../something.txt
Notice that a program that doesn't use
bytestrings could look just like this, the only difference is that we
used B.readFile and B.writeFile instead of readFile and writeFile. Many times, you can
convert a program that uses normal strings to a program that uses bytestrings
by just doing the necessary imports and then putting the qualified module names
in front of some functions. Sometimes, you have to convert functions that you
wrote to work on strings so that they work on bytestrings, but that's not hard.
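As a sketch of such a conversion, here is the short-lines filter from earlier rewritten for lazy bytestrings. It assumes the Data.ByteString.Lazy.Char8 module from the bytestring package, which treats each byte as a character and so only suits ASCII-like text:

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

-- Same shape as the String version: split into lines, keep the short
-- ones, join them back together.
shortLinesOnly :: BC.ByteString -> BC.ByteString
shortLinesOnly = BC.unlines . filter ((< 10) . BC.length) . BC.lines

main :: IO ()
main = BC.interact shortLinesOnly
```

The only changes from the String version are the qualified names and the type signature.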
Whenever you need better performance in
a program that reads a lot of data into strings, give bytestrings a try,
chances are you'll get some good performance boosts with very little effort on
your part. I usually write programs by using normal strings and then convert
them to use bytestrings if the performance is not satisfactory.
Exceptions make more sense in I/O contexts
because the outside world is so unreliable.
Pure code can throw exceptions too, but they
can only be caught in the I/O part of our code (when we're inside a do block
that goes into main). That's because you don't know when (or if) anything will be evaluated
in pure code, because it is lazy and doesn't have a well-defined order of
execution, whereas I/O code does.
Earlier, we talked about how we should
spend as little time as possible in the I/O part of our program.
The logic of our program should reside
mostly within our pure functions, because their results are dependent only on
the parameters that the functions are called with.
When dealing with pure functions, you
only have to think about what a function returns, because it can't do anything
else.
This makes your life easier.
Even though doing some logic in I/O is
necessary (like opening files and the like), it should preferably be kept to a
minimum.
Pure functions are lazy by default,
which means that we don't know when they will be evaluated and that it really
shouldn't matter.
However, once pure functions start
throwing exceptions, it matters when they are evaluated.
That's why we can only catch exceptions
thrown from pure functions in the I/O part of our code.
And that's bad, because we want to keep
the I/O part as small as possible. However, if we don't catch them in the I/O
part of our code, our program crashes. The solution?
Don't mix exceptions and pure code.
Take advantage of Haskell's powerful type system and use types like Either and Maybe to represent
results that may have failed.
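For instance, a division that can fail might report the failure in its type rather than by throwing. A minimal sketch (safeDiv is a made-up name):

```haskell
-- Division that reports failure as a Left value instead of an exception.
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

main :: IO ()
main = do
    print (safeDiv 10 2)  -- Right 5
    print (safeDiv 1 0)   -- Left "division by zero"
```

The caller has to pattern match on the result, so the failure case can't be forgotten and nothing needs to be caught in I/O.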
I/O exceptions are exceptions that are
caused when something goes wrong while we are communicating with the outside
world in an I/O action that's part of main.
- ...
- contents <- readFile fileName
- ...
- $ runhaskell linecount.hs i_dont_exist.txt
- linecount.hs: i_dont_exist.txt: openFile: does not exist (No such file or directory)
Our program crashes.
What if we wanted to print out a nicer
message if the file doesn't exist?
doesFileExist :: FilePath ->
IO Bool (from System.Directory) checks if a file exists.
- import System.Environment
- import System.IO
- import System.Directory
- main = do (fileName:_) <- getArgs
-           fileExists <- doesFileExist fileName
-           if fileExists
-               then do contents <- readFile fileName
-                       putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
-               else do putStrLn "The file doesn't exist!"
Another solution here would be to use
exceptions. It's perfectly acceptable to use them in this context. A file not
existing is an exception that arises from I/O, so catching it in I/O is fine
and dandy.
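catch :: IO a ->
(IOError -> IO a) -> IO a (from System.IO.Error) takes two
parameters: the first one is an I/O action and the second one is the so-called
handler. If the first I/O action passed to catch throws an I/O
exception, that exception gets passed to the handler, which then decides what
to do. (In current versions of base, System.IO.Error exports this function
under the name catchIOError; the catch in Control.Exception is a more general version.)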
IOError is a value that
signifies that an I/O exception occurred that also carries information
regarding the type of the exception that was thrown.
We can't inspect values of the type IOError by pattern
matching against them - how this type is implemented depends on the implementation
of the language itself.
We can use a bunch of useful predicates
to find out stuff about values of type IOError as we'll learn
in a second.
- import System.Environment
- import System.IO
- import System.IO.Error
- main = toTry `catch` handler
- toTry :: IO ()
- toTry = do (fileName:_) <- getArgs
-            contents <- readFile fileName
-            putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
- handler :: IOError -> IO ()
- handler e = putStrLn "Whoops, had some trouble!"
Just catching all types of exceptions
in one handler is bad practice in Haskell just like it is in most other
languages.
Modify our program to catch only the
exceptions caused by a file not existing.
- import System.Environment
- import System.IO
- import System.IO.Error
- main = toTry `catch` handler
- toTry :: IO ()
- toTry = do (fileName:_) <- getArgs
-            contents <- readFile fileName
-            putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
- handler :: IOError -> IO ()
- handler e
-     | isDoesNotExistError e = putStrLn "The file doesn't exist!"
-     | otherwise = ioError e
Everything stays the same except the
handler, which we modified to only catch a certain group of I/O exceptions.
Here we used two new functions from System.IO.Error:
isDoesNotExistError :: IOError ->
Bool is a predicate over IOErrors.
ioError :: IOError
-> IO a takes an IOError and produces an I/O action that will throw it. The I/O action has
a type of IO a, because it never actually yields a result, so it can act as IO anything.
If the exception thrown in the toTry I/O action isn't caused by a file not
existing, otherwise = ioError e will re-throw it.
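With a recent GHC the same idea can be written using catchIOError, which System.IO.Error exports with the same type as the catch described here. The following is a sketch rather than the original program: countLines and the usage message are additions, and the empty argument list is handled explicitly.

```haskell
import System.Environment (getArgs)
import System.IO.Error (catchIOError, isDoesNotExistError)

-- Count the lines of a file; any IOError from readFile propagates.
countLines :: FilePath -> IO Int
countLines path = do
    contents <- readFile path
    return $! length (lines contents)

-- Handle only "file does not exist" and re-throw everything else.
handler :: IOError -> IO ()
handler e
    | isDoesNotExistError e = putStrLn "The file doesn't exist!"
    | otherwise             = ioError e

main :: IO ()
main = do
    args <- getArgs
    case args of
        (fileName:_) -> (countLines fileName >>= report) `catchIOError` handler
        []           -> putStrLn "usage: linecount <file>"
  where
    report n = putStrLn ("The file has " ++ show n ++ " lines!")
```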
More predicates:
· isAlreadyExistsError
· isDoesNotExistError
· isAlreadyInUseError
· isFullError
· isEOFError
· isIllegalOperation
· isPermissionError
· isUserError
userError is used for making
exceptions from our code and equipping them with a string, e.g. ioError $ userError
"remote computer unplugged!". It's preferred, though, that you use types
like Either and Maybe to express possible failure instead of throwing exceptions
yourself with userError.
So you could have a handler that looks
something like this:
- handler :: IOError -> IO ()
- handler e
-     | isDoesNotExistError e = putStrLn "The file doesn't exist!"
-     | isFullError e = freeSomeSpace
-     | isIllegalOperation e = notifyCops
-     | otherwise = ioError e
Where notifyCops and freeSomeSpace are some I/O
actions that you define. Be sure to re-throw exceptions if they don't match any
of your criteria, otherwise you're causing your program to fail silently in
some cases where it shouldn't.
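To see userError and the re-throwing advice together, here is a small sketch. It assumes catchIOError and ioeGetErrorString from System.IO.Error in a recent base; the handler deals with user errors only and re-throws anything else:

```haskell
import System.IO.Error (catchIOError, ioeGetErrorString, isUserError)

-- Handle exceptions made with userError; re-throw all other I/O errors.
handler :: IOError -> IO ()
handler e
    | isUserError e = putStrLn ("caught: " ++ ioeGetErrorString e)
    | otherwise     = ioError e

main :: IO ()
main = ioError (userError "remote computer unplugged!") `catchIOError` handler
-- prints "caught: remote computer unplugged!"
```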
System.IO.Error also exports
functions that let us ask our exceptions for some attributes, like
the handle of the file that caused the error or its filename. These
start with ioe and you can see a full list of them in the documentation. Say we
want to print the filename that caused our error. We can't print the fileName that we got
from getArgs, because only
the IOError is passed to the handler and the handler doesn't know about
anything else. A function depends only on the parameters it was called with.
That's why we can use the ioeGetFileName function, which has a type of ioeGetFileName ::
IOError -> Maybe FilePath. It takes an IOError as a parameter
and maybe returns a FilePath (which is just a type synonym for String, remember).
Basically, it extracts the file path
from the IOError, if it can. Let's modify our program to print out the file path that's
responsible for the exception occurring.
- import System.Environment
- import System.IO
- import System.IO.Error
- main = toTry `catch` handler
- toTry :: IO ()
- toTry = do (fileName:_) <- getArgs
-            contents <- readFile fileName
-            putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
- handler :: IOError -> IO ()
- handler e
-     | isDoesNotExistError e =
-         case ioeGetFileName e of
-             Just path -> putStrLn $ "Whoops! File does not exist at: " ++ path
-             Nothing   -> putStrLn "Whoops! File does not exist at unknown location!"
-     | otherwise = ioError e
In the guard where isDoesNotExistError is True, we used a case expression
to call ioeGetFileName with e and then pattern match against the Maybe value that it
returned. Case expressions are commonly used when you
want to pattern match against something without bringing in a new function.
You don't have to use one handler
to catch exceptions in your whole I/O part. You can just cover certain parts
of your I/O code with catch or you can cover several of them with catch and use
different handlers for them, like so:
- main = do toTry `catch` handler1
-           thenTryThis `catch` handler2
-           launchRockets
Haskell offers much better ways to
indicate errors in pure code than reverting to I/O to catch them.
Even when glueing together I/O actions
that might fail, I prefer to have their type be something like IO (Either a b), meaning that
they're normal I/O actions but the result that they yield when performed is of
type Either a b, meaning it's either Left a or Right b.
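One way to get an I/O action of that shape is tryIOError from System.IO.Error, which turns a thrown IOError into a Left value. A minimal sketch (safeReadFile is a made-up name; because readFile is lazy, this only captures errors raised while opening the file, not ones that occur later while the contents are being consumed):

```haskell
import System.IO.Error (tryIOError)

-- Read a file, returning the IOError as a value instead of throwing it.
safeReadFile :: FilePath -> IO (Either IOError String)
safeReadFile path = tryIOError (readFile path)

main :: IO ()
main = do
    result <- safeReadFile "i_dont_exist.txt"
    case result of
        Left e         -> putStrLn ("Could not read the file: " ++ show e)
        Right contents -> putStr contents
```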
Questions
“That's why in this case it actually reads a line, prints it to the
output, reads the next line, prints it, etc”
Why does
this process the whole file and not a line at a time?
- main1 = do
-     handle <- openFile "abc.txt" ReadMode
-     hSetBuffering handle LineBuffering
-     contents <- hGetContents handle
-     putStr $ reverse contents
-     hClose handle
cons' is the
strict version of cons but is in the Lazy module: Data.ByteString.Lazy
Explain: Pure code can throw
exceptions too they can only be caught in the I/O part of our code (when we're
inside a do block that goes into main). That's because you
don't know when (or if) anything will be evaluated in pure code, because it is
lazy and doesn't have a well-defined order of execution, whereas I/O code does.
“we can't pattern match against values of type IO something”?