Haskell notes
A place for all the things I think are cool in Haskell.
Effects:
Libraries:
- Polysemy: The new hotness in free monads, aiming to bring the performance of mtl with the ease of use of free monad libraries like freer-simple. Sadly the promise of fantastic performance won’t be truly fulfilled until GHC 8.10, but I’m looking forward to that day. This video is a great introduction to the library and its techniques
Web
Libraries
- Yam: Looks like quite a nice way to deal with some of the more mundane aspects of writing a web service using Servant and Servant-server. It takes care of dealing with configuration and logging, and probably a few others things.
Data types
Ideas
Higher Kinded Data (HDK)
Data types which take a type parameter of kind * -> *
which wraps all fields. For me, the canonical introduction to this idea is Benjamin Pizza Hodgson’s post on the subject. One thing that particularly interests me is that it allows you to get frames like structures somewhat for free.
-- Turn a vector of Foo's into a Foo containing vectors of each field.
-- Efficiency++!
structOfArrays :: Vector (FooF Identity) -> FooF Vector
Libraries
- extensible: Higher kinded data with lots of other nice stuff.
Higgledy: Similar to
extensible
but you don’t have to write HKD youself:Superrecord: Efficient annonymous records. Mostly mentioned because I had a PR against the library which made some of the benchmarks 4x faster and I like to boast.
Other Cool libraries
Discrimination: Linear time grouping and sorting. Linear time sorting. Linear time sorting. Using the observation that algebraic data types can be treated as the radix for a radix sort. Can be used for grouping data by key (
groupWith :: Grouping b => (a -> b) -> [a] -> [[a]]
), which will produce the lists productively (they will lazilly produce values as they are encountered in the input list, so this enables parallel processing).It took me a loing time to unpack what the definition of
Group
actually means:newtype Group a = Group { getGroup :: forall m b. PrimMonad m -- IO or ST => (b -> m (b -> m ())) -- Run once for each `a`, used to set -- up whatever state is needed to -- process new `b`'s, where you get to -- decide what `b` is. It's important -- to make sure that the `b` you're -- passed is also stored in whatever -- state you need - I forgot that in my -- initial concurrent processing codce -- below. -> m (a -> b -> m ()) -- Returns an action which can be -- applied to an a and a b which will -- select the appropriate action which -- was returned from the previous -- argument. } deriving Typeable
in the implementation of
group
,b
here is[a]
1.1It’s actually using some crazy Ed magic using the promises package to make the productive lazy values. Here be dragons with confusing types.
Some code I wrote for using this to push data to concurrent consumers, which are spawned for each new group:
-- | Given an action which can process b;s from a stream, and a list of a's and b's, fork one processor per 'a'. All processors are sent 'Nothing' to signal there is not further data to process. distribute :: forall a b. Grouping a => (Chan (Maybe b) -> IO ()) -> [(a,b)] -> IO () distribute fork xs = do -- Create somewhere to store all the chans and async to gether -- resuls once all elements in the list are processed. chans <- newMutVar ([]::[(Async (), Chan (Maybe b))]) k <- getGroup (grouping @a) $ \b -> do -- Function run once for each new 'a' -- get the current list of Chans to add the new one to. allChans <- readMutVar chans -- Create a new chan for the 'b''s tagged with this new 'a' chan <- newChan -- Fork off the processor with this chan asyn <- async (fork chan) -- Make sure we send it the first 'b' we encountered. writeChan chan (Just b) -- Add the new chan and async handle to the list. writeMutVar chans $ (asyn,chan) : allChans -- Return a function which will add new b's for this 'a' to the -- correct chan. This is the `(b -> m ()) in the `Group` definition. return $ \b' -> writeChan chan (Just b') -- k is the (a -> b -> m ()) function in the 'Group' definition. -- Now the input data is actually processed. mapM_ (uncurry k) xs allChans <- readMutVar chans -- Tell all processors we're done. asyncs <- mapM (\(async,chan) -> writeChan chan Nothing $> async) allChans -- Wait for them all to terminate. mapM_ wait asyncs
An example of using this to discriminate between 11 groups - I’ve added
*Main MVar> v <- newMVar () -- ghetto locking on stdout to avoid interleaving output *Main MVar> let prnt xs chn = do readChan chn >>= \case Nothing -> withMVar v $ \() -> print (reverse xs); Just x -> prnt (x:xs) chn *Main MVar> distribute (prnt []) $ map (\n -> let nMod11 = n `mod` 11 in (nMod11,(nMod11, n))) [1..100::Int] [(4,4),(4,15),(4,26),(4,37),(4,48),(4,59),(4,70),(4,81),(4,92)] [(5,5),(5,16),(5,27),(5,38),(5,49),(5,60),(5,71),(5,82),(5,93)] [(6,6),(6,17),(6,28),(6,39),(6,50),(6,61),(6,72),(6,83),(6,94)] [(7,7),(7,18),(7,29),(7,40),(7,51),(7,62),(7,73),(7,84),(7,95)] [(8,8),(8,19),(8,30),(8,41),(8,52),(8,63),(8,74),(8,85),(8,96)] [(9,9),(9,20),(9,31),(9,42),(9,53),(9,64),(9,75),(9,86),(9,97)] [(10,10),(10,21),(10,32),(10,43),(10,54),(10,65),(10,76),(10,87),(10,98)] [(0,11),(0,22),(0,33),(0,44),(0,55),(0,66),(0,77),(0,88),(0,99)] [(1,1),(1,12),(1,23),(1,34),(1,45),(1,56),(1,67),(1,78),(1,89),(1,100)] [(2,2),(2,13),(2,24),(2,35),(2,46),(2,57),(2,68),(2,79),(2,90)] [(3,3),(3,14),(3,25),(3,36),(3,47),(3,58),(3,69),(3,80),(3,91)]
We can discriminate on constructors too!
Main MVar> distribute (prnt []) $ map (\n ->let key = if even n then Left (n`mod`4) else Right (n`mod`7) in (key,(key, n))) [1..100::Int] [(Right 1,1),(Right 1,15),(Right 1,29),(Right 1,43),(Right 1,57),(Right 1,71),(Right 1,85),(Right 1,99)] [(Left 0,4),(Left 0,8),(Left 0,12),(Left 0,16),(Left 0,20),(Left 0,24),(Left 0,28),(Left 0,32),(Left 0,36),(Left 0,40),(Left 0,44),(Left 0,48),(Left 0,52),(Left 0,56),(Left 0,60),(Left 0,64),(Left 0,68),(Left 0,72),(Left 0,76),(Left 0,80),(Left 0,84),(Left 0,88),(Left 0,92),(Left 0,96),(Left 0,100)] [(Right 3,3),(Right 3,17),(Right 3,31),(Right 3,45),(Right 3,59),(Right 3,73),(Right 3,87)] [(Left 2,2),(Left 2,6),(Left 2,10),(Left 2,14),(Left 2,18),(Left 2,22),(Left 2,26),(Left 2,30),(Left 2,34),(Left 2,38),(Left 2,42),(Left 2,46),(Left 2,50),(Left 2,54),(Left 2,58),(Left 2,62),(Left 2,66),(Left 2,70),(Left 2,74),(Left 2,78),(Left 2,82),(Left 2,86),(Left 2,90),(Left 2,94),(Left 2,98)] [(Right 5,5),(Right 5,19),(Right 5,33),(Right 5,47),(Right 5,61),(Right 5,75),(Right 5,89)] [(Right 0,7),(Right 0,21),(Right 0,35),(Right 0,49),(Right 0,63),(Right 0,77),(Right 0,91)] [(Right 2,9),(Right 2,23),(Right 2,37),(Right 2,51),(Right 2,65),(Right 2,79),(Right 2,93)] [(Right 4,11),(Right 4,25),(Right 4,39),(Right 4,53),(Right 4,67),(Right 4,81),(Right 4,95)] [(Right 6,13),(Right 6,27),(Right 6,41),(Right 6,55),(Right 6,69),(Right 6,83),(Right 6,97)]
I’ve had some discussions with Ed about how to best implement
Grouping Text
, so far the best I’ve come up with is a comb ination of thehashing :: Hashable a => Group a
and using theGrouping a => Grouping [a]
instance :-- Some alternatives grouping :: Group Text -- Not good is there ar elong common prefixes grouping = hashing <> contraMap T.unpack grouping -- Might help with the above, but is weird grouping = contraMap (T.takeEnd 64{-arbitrary-}) hashing <> contraMap T.unpack grouping -- fall back to [Char]
Ed informs me there’s a proper way to do this
-- #haskell - 2019-06-07 18:35 Axman6: I also thought of doing something like contraMap (T.take 64) hashing <> contramap T.unpack grouping as a middle ground 18:35 edwardk: because give two novels you'd expect only log novel count characters worth of distinct common prefix 18:36 edwardk: that starts to sound like a sensible next step 18:36 Axman6: though that still degrades in the long common prefix case 18:36 edwardk: this would get a lot better in general if we did the burst-sort improvements on discrimination 18:37 Axman6: contraMap (T.takeEnd 64) hashing <> contramap T.unpack grouping -- >_> 18:37 Axman6: burst-sort? 18:38 edwardk: burst-sort is generally the fastest stringsort. and its built on the same sort of underpinnings as discrimination 18:38 edwardk: discrimination needs two things to really be fast 18:38 edwardk: and i just haven't done them 18:38 edwardk: 1.) i need a linear-probing hash table with simple tabulation hashing. the ioref holding a hashmap isn't cutting it 18:39 edwardk: and hashable is a terrible hash code 18:39 edwardk: simple tabulation hashing can be implemented as say a linked list of small hash tables for how to hash each byte in sequence that you xor the results of 18:39 Axman6: ah I'd also thought of using a trie to do this, glad I was on the right track 18:39 edwardk: and just build that hash table chain as a lazy list that gets threaded through the 'serializer' 18:40 edwardk: the second thing is this burst-trie stuff where we try to recover temporal coherence for cache access 18:40 edwardk: if on the other hand what you did was batch up in buckets until you got ~8192 things to distribute then you'd have coherence again for the distribution step 18:41 edwardk: this is what (the first version of) burst sort did 18:41 edwardk: so you'd now have two passes 18:41 edwardk: pass one would be dumping things into these buckets, which fill like water balloons, as you drip contents into them, bursting if they get too full, and building a larger and larger prefix of the trie 18:42 edwardk: then we take the trie that has leaf level buckets and walk that trie in order 18:42 edwardk: and for each bucket we can use a more specialized sort, e.g. multikey quicksort or something can be used locally 18:43 edwardk: this is the "right" version of discrimination 18:43 Axman6: I look forward to seeing it =) 18:43 edwardk: not anywhere near the top of my todo pile 18:43 edwardk: if you want it, build it ;) 18:43 edwardk: i'll happily help give pointers 18:44 Axman6: yeah I figured. I don't have anough need for it at the moment, but would be interested in building it if I could 18:44 edwardk: simple tabulation hashing + linear probing was proven basically optimal for this by mihai patrascu 18:45 Axman6: I feel like I've nerdsniped myself now 18:48 edwardk: =)
Versioning
https://github.com/lortabac/versioning