fredag 11 september 2015

GPU programming in Haskell using GPipe - Part 2

< Previous episode: Hello triangle

Welcome back! In last part you got your first triangle through GPipe. This time we are going to examine Buffers and PrimitiveArrays in more detail.


In the "Hello world" example we made last time, we used a Buffer to store the triangle's positions and colors. From this buffer, we created a PrimitiveArray that we fed to the shader. Later on, we will see that Buffers can be used for other things as well.

A Buffer in GPipe is an array of data that is stored on the GPU. It is mutable just like IOArray or STArray and just like those it also lives in a monad, in this case the ContextT monad. Let's look at the function that creates buffers first:

newBuffer :: (MonadIO m, BufferFormat b) => Int -> ContextT ctx os m (Buffer os b)

A buffer has type Buffer os b, where os is the same as the ContexT's os. As you might remember from last time this os type parameter is used to keep context bound objects from leaving the monad, and Buffer is such an object.

newBuffer just takes one argument: the number of elements in the buffer to create. A buffer has mutable elements, but the number of elements is immutable. The type of the buffer's elements is denoted by b, and as you can see this b is constrained by the type class BufferFormat b. Before I show you that type class, let's look at the function you will use to fill your buffer with data from the CPU side:

writeBuffer :: MonadIO m => Buffer os b -> BufferStartPos -> [HostFormat b] -> ContextT ctx os m ()

This function takes a buffer to write to and a zero indexed position in the buffer to start at, nothing strange with that, but then it takes a list of HostFormat b... What's up with that? A buffer's contents doesn't have the same representation on the host as in the buffer, which lives on the GPU. (From now on I am going to use the term "host" when I mean the normal CPU-living Haskell world, as opposed to the GPU world.) HostFormat b is an associated type in the type class BufferFormat b. Let's take a look at that class:

class BufferFormat f where
  type HostFormat f
  toBuffer :: ToBuffer (HostFormat f) f

The sole purpose of this class is to provide a representation for the buffer elements' type on the host, as well as a conversion from the host to the buffer representation. Here are some examples of instances of this class, and their host representations:

f  HostFormat f
B FloatFloat
B Int32Int32
B Word32Word32
B2 FloatV2 Float
B2 Int32V2 Int32
B2 Word32V2 Word32
B2 Int16V2 Int16
B2 Word16V2 Word16
(a, b)(HostFormat a, HostFormat b)
V2 aV2 (HostFormat a)
There are many more, including B3, B4 and larger tuples. See the full list on hackage.

A Float on the host will become B Float in a Buffer. B a is an opaque type that you can't inspect the value of or do any calculations with, e.g. there is no Num instance for B Float. Buffers doesn't expose a way to apply functions on their elements anyway (e.g. Buffer has no Functor instance), but we will soon create VertexArrays from our Buffers and then this will become a subject.

GPipe also defines the B2 a, B3 a and B4 a types. For a selected set of as, B2 a is the buffer representation of V2 a on the host. V2 a is also an instance of BufferFormat with V2 (HostFormat a) as host representation, which means that both V2 (B Float) and B2 Float has the same host representation: V2 Float. Both these buffer formats have the same size and even internal layout, but the B2 Float version can be used more efficient as we will see later. For that reason, always try to use B-types over V-types in buffers when possible. Then why is there a BufferFormat instance for V2 a at all? The main use case is matrices, e.g. V4 (V4 Float) on the host can be stored in a buffer as V4 (B4 Float).

Another interesting thing you may have noticed from studying the BufferFormat instance list is that there are B2 Int16 and B2 Word16 instances but no B Int16 or B Word16 instances. This is because vertex attributes has to be 4 byte aligned on some hardware, and GPipe enforces this through it's types. Int16 and Word16 are both 2 bytes, so you need to have a vector of at least two of them. There are actually B3 Int16 and B3 Word16 instances, but these will pad their data with 2 byte extra. The motivation for all of this is that you could always go for a B Int32 instead of a B Int16 if it existed, they would work with the same shaders and would just be the same size anyway if we had padded the latter. A B3 Int32 on the other hand take 12 bytes while a padded B3 Int16 only takes 8, so there is a distinct use case for that one. A B4 Int16 also takes 8 bytes, but that wouldn't work with the same shaders as will become evident in the next part of this tutorial.

Now let's look at the toBuffer member of the BufferFormat type class. It has the type ToBuffer (HostFormat f) f. ToBuffer is something that is called an arrow in Haskell. It is like a function (in this case HostFormat f -> f), but more general. Let's look at the BufferFormat (a, b) instance as an example:

{-# LANGUAGE Arrows #-}

instance (BufferFormat a, BufferFormat b) => BufferFormat (a, b) where
  type HostFormat (a,b) = (HostFormat a, HostFormat b)
  toBuffer = proc ~(a, b) -> do
                a' <- toBuffer -< a
                b' <- toBuffer -< b
                returnA -< (a', b')

Arrow notation almost looks like a lambda (using the special keyword proc) returning a monadic action. But this is not a monad. The main difference from a monad is that you cannot select action based on the arrow return values. This is why arrow actions have an arrow tail (-<); anything between the <- and -< of an arrow may not reference anything outside them (a, b, a' or b' in this case). This enforces that every invocation of toBuffer must go through the same series of arrow actions, independent on the values of the actual input data. Another additional requirement that GPipe has is that it needs to be able to produce values lazily, thus the tilde (~) in the proc pattern. The only ToBuffer arrow actions GPipe defines for you that you can use inside your own implementation of toBuffer are the other instance's toBuffer methods. You are going to see this pattern where an arrow is used to define a conversion between two domains appear in more places of GPipe as we continue through the tutorial.

Vertex arrays

Ok, you are now experts on buffers! Let's put them to some use:

newVertexArray :: Buffer os a -> Render os (VertexArray t a)

You run this function in a Render monad to create a VertexArray t a. A vertex array is like view of a buffer, and newVertexArray doesn't copy any data. Since we operate inside the Render monad (that is run by the render function, which doesn't allow return values) and Buffers can only be modified outside this monad (in the ContextT monad), conceptually you may think of a VertexArray as a copy of the Buffer. But it's really not. But treat it like one.

VertexArray t a is an array of vertices where each vertex is an element of type a, that is the same type as the elements of the Buffer you created it from. Don't worry about the type parameter t for now, I'll get to that in a bit. The VertexArray has as many vertices as there are elements in the originating Buffer, but in contrast to Buffers you may trim a VertexArray using the functions dropVertices or takeVertices. These works exactly like drop or take works on normal lists:

takeVertices :: Int -> VertexArray t a -> VertexArray t a Source
dropVertices :: Int -> VertexArray () a -> VertexArray t a Source

VertexArrays also has a Functor instance, which allows you to fmap over its vertices. This is when the opacity of the B-types I talked about earlier comes into play! Now that you get to do stuff with your B-values, you will notice that the options are rather limited. You will merely pick elements from structures like tuples and/or build new such structures with the values you have. There are however a couple of functions that operate on the B-values that you can use here:

toB22 :: forall a. (Storable a, BufferFormat (B2 a)) => B4 a -> (B2 a, B2 a)
toB3 :: forall a. (Storable a, BufferFormat (B3 a)) => B4 a -> B3 a
toB21 :: forall a. (Storable a, BufferFormat (B a)) => B3 a -> (B2 a, B a)
toB12 :: forall a. (Storable a, BufferFormat (B a)) => B3 a -> (B a, B2 a)
toB11 :: forall a. (Storable a, BufferFormat (B a)) => B2 a -> (B a, B a)

These may split B-vectors into smaller parts. Notice that there are no functions that can combine them again though.

You may also zip two VertexArrays together with the zipVertices function, which works exactly like zipWith on normal lists; you provide a function to combine the elements of the two argument VertexArrays and the resulting VertexArray will be the length of the shorter of the two input arrays:

zipVertices :: (a -> b -> c) -> VertexArray t a -> VertexArray t' b -> VertexArray (Combine t t') c

(Again, don't mind the strange first type parameter in the returned VertexArray, I'll explain that later.)

Zipping vertex arrays is what corresponds to using non-interleaved arrays in OpenGl, while a vertex array from a single buffer of a compound element type (such as a tuple of two B-values) corresponds to interleaved arrays. This is just the functional and type safe way to do it!

Primitive arrays

Now that you have trimmed, zipped and mapped your vertex array into perfection, it's time to create a primitive array. The simplest way to create one is with this function:

toPrimitiveArray :: PrimitiveTopology p -> VertexArray () a -> PrimitiveArray p a

You always need a primitive topology, besides your array of vertices, to create a PrimitiveArray. The primitive topology denotes how the vertices should be connected to form primitives, and is one of these data constructors:
data PrimitiveTopology p where
  TriangleList :: PrimitiveTopology Triangles  
  TriangleStrip :: PrimitiveTopology Triangles  
  TriangleFan :: PrimitiveTopology Triangles  
  LineList :: PrimitiveTopology Lines  
  LineStrip :: PrimitiveTopology Lines  
  LineLoop :: PrimitiveTopology Lines  
  PointList :: PrimitiveTopology Points

In most cases you will work with triangles. Lets look at how the three triangle topologies look like:

(a) - TriangleStrip, (b) - TriangleFan, (c) - TriangleList
(Image courtesy of OpenGl specification by Khronos)

In a TriangleStrip, every vertex forms a triangle with the previous two vertices, alternating the winding of the vertices for every other triangle. That means that the first triangle is formed by vertices 1-2-3 in that order, the next by 2-4-3, then 3-4-5, 4-6-5, and so on. For TriangleFan, every triangle is formed by the first vertex in the array together with every two consecutive vertices, in that order. For TriangleList, every three vertices simply forms a triangle; there is no sharing of vertices between triangles.

The vertices always comes in counter clock wise order for a triangle that is front facing (which means that all triangles but the right-most in the image above are back facing, just as an example on how intuitive the OpenGl specification can be). The facing of a triangle will matter later when we rasterize it, when you may choose to only rasterize front facing or back facing triangles.

Primitive arrays may not be trimmed like vertex arrays, but it does have a Functor instance so you may fmap over it just like with vertex arrays. It also has a Monoid instance, which allow you to concatenate two PrimitiveArrays together into one using mappend. This makes it possible to create a PrimitiveArray consisting of several disjoint triangle strips, but more efficient ways of achieving that are presented in the next two sections.

Index arrays

It is common that a vertex is used by not only two consecutive triangles in a strip, but also by triangles in another strip. It would be quite wasteful to duplicate all the shared vertices data for each strip, and for this reason you can use an index array instead:

toPrimitiveArrayIndexed :: PrimitiveTopology p -> IndexArray -> VertexArray () a -> PrimitiveArray p a

Instead of forming primitives from taking consecutive vertices in a VertexArray, this function will take indices from an IndexArray and use those to pick vertices from the VertexArray. Multiple elements in the IndexArray may refer to the same vertex. The primitive topology works the same for this function, but is applied to the IndexArray instead. For example if TriangleStrip is used the first triangle is formed by the vertices referred by the first three indices, the next one is formed by the second, fourth and third index, and so on.

You create an IndexArray with

newIndexArray :: forall os b a. (BufferFormat b, Integral a, IndexFormat b ~ a) => Buffer os b -> Maybe a -> Render os IndexArray

Almost like creating a VertexArray, but the type of the elements of the Buffer you create it from is also constrained by this type family:

type family IndexFormat a where
  IndexFormat (B Word32) = Word32  
  IndexFormat (BPacked Word16) = Word16  
  IndexFormat (BPacked Word8) = Word8 

This means that indices are either Word32, Word16 or Word8. Remember that I previously told you that all buffer element types needed to be 4-byte aligned? Index arrays actually require all elements to be tightly packed, but still supports indices of type Word16 and Word8. This means that buffers of those two element types cannot be used as vertex arrays. That's why the buffer representation of Word16 and Word8 are BPacked Word16 and BPacked Word8. They work exactly like their B-counter parts, with the exception that there are no VertexInput instances for any BPacked a. VertexInput is the type class that is used when creating primitive streams from primitive arrays, which we will do in the next part of this tutorial. As you might have guessed by now, the type family IndexFormat a evaluates to the same types as the associated type HostFormat a.

Besides a buffer of indices, newIndexArray also takes a Maybe a as an argument. This is denoting an optional primitive restart index, i.e. a special index value that if encountered in the index array while assembling primitives signals that the current topology should end and the next index to be the beginning of a new topology. This makes it possible to have multiple triangle strips in a single IndexArray by just separating them with a special index, which is more efficient than concatenating multiple PrimitiveStreams using their Monoid instance.

Index arrays may be trimmed just like vertex arrays, but with the functions takeIndices and dropIndices instead. It does not have a Functor instance (that wouldn't make sense) or a Monoid instance.

Instanced primitive arrays

The last thing I'll show you in this episode is instanced primitive arrays. Imagine that you want to create a triangle mesh of a temple, where you have many identical pillars placed in a row. Instead of duplicating the triangles for each pillar, or making a single pillar primitive array that you concatenate with itself multiple times, you can create an instanced primitive array. The function looks like this:

toPrimitiveArrayInstanced :: PrimitiveTopology p -> (a -> b -> c) -> VertexArray () a -> VertexArray t b -> PrimitiveArray p c

It resembles the zipVertices function in that it takes two VertexArrays and a binary function to combine the vertices of these two arrays, but toPrimitiveArrayInstanced doesn't zip the two arrays together. Instead, it will create one primitive array of the first vertex array for each vertex in the second vertex array, and concatenate the results. In our example with temple pillars, the first array then contains the strip for a single pillar, while the second array contains a position for each pillar to instantiate, resulting in a primitive array where each vertex contains it's individual position within the pillar, as well as the instances position within the temple. You would then need a shader that combined those two positions together into the final position. This is the most efficient way to render multiple instances of the same object.

If you want to use instanced primitive arrays and indexed primitive arrays at the same time, there is a function for that too:

toPrimitiveArrayIndexedInstanced :: PrimitiveTopology p -> IndexArray -> (a -> b -> c) -> VertexArray () a -> VertexArray t b -> PrimitiveArray p c

To make instancing even more powerful, you may replicate the vertices in one array a fixed number of times each and then zip it with another array and use the result as instances in toPrimitiveArrayInstanced. E.g. you could have a vertex array with three different colors and replicate each color 5 times and then zip it with an array of 15 positions and use this as instances to our temple to get 15 pillars colored in three different shades for variation. The function you use to do this is

replicateEach :: Int -> VertexArray t a -> VertexArray Instances a

This will replicate each vertex in the argument array as many times as dictated by the first argument. Notice the Instances type in the first type parameter of the resulting VertexArray. Maybe you have noticed that this parameter has previously been () or just t. If this parameter to VertexArray is Instances then the VertexArray can only be used for instances, i.e. as last argument in a call to toPrimitiveArrayInstanced or toPrimitiveArrayIndexedInstanced. If you go back and look at the types of functions taking or returning VertexArrays above, you will see that
  • replicateEach returns a VertexArray that must be used as instances.
  • dropVertices may not be used on any VertexArray that must be used as instances.
  • zipVertices returns a VertexArray that must be used as instances if either of the input arrays must be used as instances.
I was a bit unfair just now, because that last bullet was not something you could see from looking at the function type alone, you needed this definition as well:

type family Combine t t' where
  Combine () Instances = Instances  
  Combine Instances () = Instances  
  Combine Instances Instances = Instances  
  Combine () () = ()

Once you have your PrimitiveArray, the type information whether instancing, indexing or both were used is gone. This means that you may mappend an instanced primitive array with a non-instanced, and that the shader you send a primitive array to doesn't care if it was instanced or indexed.

No code examples this time, so I'll leave as an exercise to apply what you learned this time on the example from previous part. Next time we will finally enter the Shader!

söndag 6 september 2015

GPU programming in Haskell using GPipe - Part 1

Welcome to the first part of a tutorial series for GPU programming in Haskell! We will be using GPipe 2.2.1, that I was earlier announced on this blog. GPipe 2 is a functional API based on OpenGl 3.3, but this tutorial will not require previous knowledge of OpenGl, so if you know Haskell (which is a prerequisite) and ever wanted to learn graphics programming now is the time!

Edit 2017-05-20: This tutorial has been updated for GPipe 2.2.

Hello triangle!

Lets start with a small "Hello world!" program:

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}   
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import Control.Monad (unless)

main :: IO ()
main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Hello world!")

    vertexBuffer :: Buffer os (B4 Float, B3 Float) <- newBuffer 3
    writeBuffer vertexBuffer 0 [ (V4 (-1) 1 0 1, V3 1 0 0)
                               , (V4 0 (-1) 0 1, V3 0 1 0)
                               , (V4 1 1 0 1,  V3 0 0 1)

    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream
      drawWindowColor (const (win, ContextColorOption NoBlending (V3 True True True))) fragmentStream

    loop vertexBuffer shader win

loop vertexBuffer shader win = do
  render $ do
    clearWindowColor win (V3 0 0 0)
    vertexArray <- newVertexArray vertexBuffer
    let primitiveArray = toPrimitiveArray TriangleList vertexArray
    shader primitiveArray
  swapWindowBuffers win

  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    loop vertexBuffer shader win   

As you see from the import list, an additional packages is required: GPipe-GLFW (version 1.4.1 or later). This package provides functionality to create windows into which GPipe draws, as well as functions to get key and mouse input. This kind of functionality used to be built into GPipe 1, but since many wanted to be free to choose what window manager to use this was factored out to its own package. At the time of writing this, only bindings for GLFW exists, but more will surely come.

When you import Graphics.GPipe you also get the entire linear and Boolean packages, since these are used quite heavily in GPipe applications.

We are now ready to compile (use -threaded as parameter to GHC as GPipe-GLFW requires it) and run our program which will show us this colorful triangle in the window's lower left corner:

The context

The first thing we do in the main function is to execute runContextT. A context owns all the GPU memory objects, much like a process does for the CPU. runContextT creates a new context for us. It takes two arguments: a ContextHandlerParameters value and a monad action.

The ContextHandlerParameters parameter is provided by the window manager. To use the GPipe-GLFW package we imported earlier, we pass in GLFW.defaultHandleConfig as this first argument.

The second parameter to runContextT is the monad action in which our entire GPipe program happens. This monad action has the type ContextT ctx os m a. This is a monad transformer, i.e. a monad that is inheriting the capabilities of another monad of type m. For ContextT, m is the type of the monad in which we run runContextT. In this and most other cases, that is simply the IO monad. Inside a monad transformer, you may use the function lift to execute an action in the inherited monad.

Some type trickery is made to ensure that the variables that GPipe actions return within your context is not returned from it. This is the same trickery as the ST monad uses to ensure STRefs aren't returned and used in another runST invocation. The trick is that the runContextT call uses something called a rank 2 type:
runContextT :: (MonadIO m, MonadAsyncException m, ContextHandler ctx) 
            => ContextHandlerParameters ctx 
            -> (forall os. ContextT ctx os m a)
            -> m a

Notice that there is a forall qualifier for os local to the ContextT monad action argument. This will make any object referencing that os type bound to this monad action.

The ctx parameter in the ContextT type is the type of the window manager. When we use GLFW.defaultHandleConfigctx will be Graphics.GPipe.Context.GLFW.Handle. This type constraints some actions to only be usable in ContextT monads for a certain window handler, as we will see an example of below.


The first thing we do in our ContextT monad is to create a window using newWindow. This function takes two parameters, a format and a window manager specific parameter. The format is describing what kind of images we will be drawing into the window, e.g. how many color channels it will have and how many bits per color. It also describes whether we have a depth buffer or a stencil buffer attached to the window. I will discuss what these are in a later part of this tutorial when I cover drawing. Right now we settle for a format with an RGB color with 8 bits per each of the three channels, and no depth or stencil buffers. The value that describes this format for us is WindowFormatColor RGB8.

You may create any number of windows you like, all that will be able to render different views of the same data. Windows may be explicitly closed using deleteWindow, but will otherwise be automatically closed when the runContextT call is finished.

You don't even need to create any window at all, for example if you want to use the GPU to generate images to save to disk rather than to show on screen.

windowShouldClose is an action defined by GPipe-GLFW, and only works in contexts where the ctx parameter is Graphics.GPipe.Context.GLFW.Handle. This action will return whether the user has requested the window to close, e.g. by clicking the 'X' in the top corner. Also, note that this action actually returns a Maybe Bool, which will be Nothing if the window was already closed.

Rendering - This is what it's all about

Now that we have our context, let's do some rendering. Any rendering you do in GPipe will follow this sequence of operations:

In short, every GPipe rendering will from a buffer of data create an array of vertices that are assembled into an array of primitives. There are three kinds of primitives: points, lines and triangles, but we will almost exclusively work with triangles. The array of primitives is then turned into a stream of primitives inside a shader, enabling us to do transformations of those vertices. The primitives are then rasterized, i.e. chopped up into pixel sized fragments, forming a fragment stream. This stream of fragments is then drawn in a window, or into an off-screen image.

In the ContextT monad, we create a Buffer of data that is stored on the GPU. In our "Hello world" example above, our buffer is called vertexBuffer and has 3 elements, each of which is a tuple (B4 Float, B3 Float). B4 and B3 are the "buffer representations" of V4 and V3, the vector types from the linear package. I will go into more detail what these "buffer representations" are in the next part of this tutorial, but for now you may think of B4 as just another name for V4 when we use it in a Buffer. Directly after creating the buffer, we write three values into it from an ordinary list.

With a function called render we run another monad, conveniently called... Render. In this monad we use our Buffer to create a VertexArray with the newVertexArray function. Coming from our vertexBuffer, vertexArray will have 3 vertices, each with of which has a tuple (B4 Float, B3 Float). Now you may wonder what the difference of a VertexArray and a Buffer is. A fair question indeed, but I'm afraid I'll have to wait until the next part of this turtorial to answer it. Sorry.

Now that we have a VertexArray, we use this to create a PrimitiveArray of triangles using the function toPrimitiveArray. TriangleList that we give as argument to this function indicates that we want to form triangles from each three consecutive vertices in vertexArray. Since there are only three vertices, primitiveArray will only contain a single triangle.

Looking at the graph above we should then turn this PrimitiveArray into a PrimitiveStream (yet another name for the same thing?), but in the code we just see shader primitiveArray?

Shader - A primer

The gray box in the graph above is called a Shader. It is, I guess unsurprisingly by now, also a monad! The difference from both ContextT and Render monad is that we can't run it directly, it has to be compiled first. This compilation is different from the compilation you do when you run ghc, cabal, stack or whatever shortcut you have in emacs. This compilation happens during runtime of the program, and is using a compiler provided by your graphics driver. This compilation may actually take seconds, so it is definitely not something you want to do every frame in for example a game written with GPipe.

A Shader monad is compiled with the function compileShader, that you run in your ContextT monad. compileShader will return a function that you later can run in a Render monad. In our example above, we compile the shader into a function we call just shader. And this shader is what we see being executed as last action in the Render monad, passing in primitiveArray as an argument.

Let's take a look at the actual Shader in our example now. The first action we run is toPrimitiveStream. This will load a PrimitiveArray into something called a PrimitiveStream. The PrimitiveArray to load is selected with the function passed as argument to toPrimitiveStream, in this case id. A Shader monad is almost like a Reader monad, it closes over an environment. But unlike Reader, there is no ask action where you can retrieve this environment. Instead, many other actions, like toPrimitiveStream, will take a function that extracts values from this environment. The environment value is not defined until the shader is run, i.e. not even when the shader is compiled. Remember that we passed primitiveArray as argument to our compiled shader function? That is the environment we use in our program. Since the function passed to toPrimitiveStream wants to extract a PrimitiveArray from the environment, and our environment is a PrimitiveArray, we just use id.

A PrimitiveStream is also a sequence of primitives, but it lives inside the shader and as such we may map functions on it that will run on the GPU. PrimitiveStream implements Functor, and fmap f primitiveStream will return a new primitive stream that is the result of applying the function f to each vertex of each primitive in primitiveStream. Mapping functions on streams with fmap in shaders is many times faster than doing the same kind of operation on an ordinary lists since we are using the GPU instead of the CPU. In our "Hello world" example, we are actually not doing anything with the primitives in our primitiveStream before we feed them to the rasterize function. But before we move on to that, let me just mention what the inferred type of primitiveStream is:

primitiveStream :: PrimitiveStream Triangles (V4 VFloat, V3 VFloat)

As you can see, the B4 and B3 types we had in our buffer (and in our vertex array and primitive array) got turned into V4 and V3 again, but the Floats inside them apparently got turned into VFloats! VFloat is really a type synonym for S V Float which is a Float lifted to a vertex stream on the GPU, i.e. it is not an ordinary Float that you can use with any Float function anymore; you can only do things with it that the GPU supports. I will discuss this type in more detail when we dissect shaders in an upcoming part of this tutorial.


Even though we never map any functions on our primitiveStream to run on the GPU, nor on the fragmentStream we are about to create, there is still one operation we will always do in a shader that leverages the massive parallelism of GPUs: rasterization.

Rasterization is the process of mapping a primitive such as a triangle to a grid and generate pixel sized fragments. The vertices of the input primitives are used in two ways: first they must all provide a position of the vertex so the rasterizer knows how many fragments to generate, and secondly they provide values that will be linearly interpolated between the primitive's all vertices to create unique values for each generated fragment.

The first argument to rasterize is a function extracting three parameters from the shader environment: which side of each primitive to rasterize, the view port's position and size, and the fragment's depth range. In our example, we know all parameters up front and don't need to get them from the shader environment, that is why we use the const function. The parameters we provide tells rasterize that it should rasterize both sides of each triangle, that the view port has lower left corner at (0,0) and has a width and height of 500 pixels, and that the depth range is [0,1]. More on that in a bit.

The vertices' positions are 3D coordinates in a canonical view space. During rasterization, these will be transformed into the view port in pixel screen space, where the position (-1,-1,z) in canonical view space will be mapped to the view port's lower left corner (in our case (0,0)) , and (1,1,z) will be mapped to the upper right corner (in our case (500,500)). To be more precise, the fragment in the lower left corner in our case will actually have pixel coordinate (0.5, 0.5), and the uppermost, rightmost fragment we generate will have coordinate (499.5, 499.5).

Every fragment also has a depth value in the range [0,1]. At rasterization, we specify with the DepthRange parameter how to map the canonical z coordinates to this range. A z coordinate with value -1 will be mapped to the first parameter of DepthRange, and a z coordinate with value 1 will be mapped to the second parameter of DepthRange. In our example, we map z coordinates in the canonical view space range [-1,1] to the depth range [0,1]. The convention used by Linear.Projection and most other OpenGl math libraries is that a z coordinate of 1 in canonical view space is considered to be furthest away and -1 to be closest, but you are actually free to use any convention you like. Any fragment with a value outside the depth range [0,1] will be discarded, so any part of primitives that intersects the box [(-1,-1,-1), (1,1,1)] in the canonical view space will become fragments in the view port. This box is commonly referred to as the canonical view volume.

The position of a vertex in canonical view space is actually provided as a V4 VFloat, known as a homogeneous 3D coordinate, where V4 x y z w has the 3D position (x/w, y/w, z/w). All three vertices of the triangle in our example all use 1 for the w component, so in this simple case they are just normal 3D coordinates. When using perspective projection (where objects appear smaller the further away they are, which is standard in most 3D applications) the w component will not be 1. The reason the rasterizer wants w to be passed in explicitly instead of having us divide the other components by it ourselves (by mapping such a function over the primitive stream), is that this w component is also used when interpolating all other values of the vertex. I'll demonstrate how this perspective correct interpolation works in a later part when we cover textures and samplers.

Now that we have calculated which fragments to generate from each primitive, and what screen positions and depth values these will have, we can interpolate the vertices' other values. The rasterize function's second argument is a primitive stream of type

FragmentIput a => PrimitiveStream p (V4 VFloat, a)

And returns a fragment stream of type

FragmentIput a => FragmentStream (FragmentFormat a)

That is, each vertex has a homogenous position as we've just discussed, but also some other value of type a that will be turned into a value of type FragmentFormat a in each fragment. These values are produced by linearly interpolating the vertices values over the entire primitive for each fragment. In our example, a is V3 VFloat representing the color of each vertex. FragmentFormat a is an associated type in the FragmentInput class, and FragmentFormat (V3 VFloat) evaluates to V3 FFloat. FFloat is just like VFloat a lifted version of Float, but this time in a fragment stream. We distinguish lifted values in vertex streams from lifted values in fragment streams, since GPUs doesn't support the exact same set of operations on them.

Drawing and swapping

The last thing we do in our shader now that we have our fragmentStream is to draw its fragments into our window. drawWindowColor takes the fragmentStream as argument but also (just like most other actions in the Shader monad) a function that extracts parameters from the shader environment. In this case the parameter extracted is a value of type (Window os c ds, ContextColorOption c), i.e. the window to draw to and a specification how the fragments should be combined with the previous values in the window. The values we provide in our example (again using const since it is not dependent on the shader environment) is the window we created earlier and a specification that each fragment should completely overwrite the previous value in the window. I will devote an entire part of this tutorial to drawing, so these parameters will be explained in detail later.

Since our window was created with format RGB8, the fragment stream needs to contain color values of type V3 FFloat. Conveniently enough, that is exactly the type our fragmentStream has as a result of rasterization. In most GPipe programs though, you will fmap functions on your fragment stream to transform the values interpolated from rasterization into the color format that is required by the window.

Drawing is the only action in the shader that has a side effect: in this case the back buffer of the window is altered. A window actually has (at least) two buffers, one we call the front buffer that is currently shown on screen and one that we call the back buffer that shaders are drawing to. When the shader primitiveArray action in the Render monad action returns, the back buffer will have been updated. To present this newly rendered image on screen, we need to call swapWindowBuffers inside our ContextT monad. This will tell the graphics hardware to swap places of the front and back buffer. This will not perform any copy of memory, but merely swap some pointer values, so it is quite effective. swapWindowBuffers may however block for a while if you try to present images faster than the screen can update, but this usually is a good thing because you would otherwise just waste GPU and CPU cycles producing more images than would have been presented.

There is one line in our examples Render action that I shamelessly skipped over before: clearWindowColor win (V3 0 0 0). This action happens before we run the shader, and it is used to set each pixel in the previous contents of the window's back buffer to a constant value, in this case V3 0 0 0, aka black. After a swap, the contents of the back buffer is undefined, so it is always a good idea to start each frame after last swapWindowBuffers by clearing. Clearing and running shaders are the two actions in the Render monad that has side effects.

tisdag 1 september 2015

GPipe is dead, long live GPipe!

I am proud to announce the release of the next major version of GPipe on Hackage!

In this post I'll explain the whats and whys and do some demonstration of the new API.

What is GPipe?

If you never heard about GPipe before, please read on. If you however already been using GPipe for the last six years and just been waiting for the next version to come out, you might wanna skip ahead to the next section.

GPipe is OpenGl made functional and type safe for Haskell. And then I mean really type safe, as in "if it compiles, it runs"! There are already a couple of bindings for OpenGl in Haskell but when you use either of those you are still exposed to the inherently imperative state machine that is OpenGl, and there are a myriad of invariants you need to keep track of that otherwise will cause hard to debug runtime crashes. This is especially true with modern OpenGl (versions >= 3). GPipe is based on the OpenGl 3.3 core profile.

Some examples of things that are awkward or hard to do right in plain OpenGl but that GPipe makes safe and easy:
  • Setting up complete textures (e.g. define all mip map levels with correct size)
  • Setting up texture samplers for use in shaders
  • Setting up complete FBOs and attach all outputs correctly from fragment shaders
  • Setting up VBOs and match the types and indices in GLSL code
  • Aligning uniform buffers to match GLSL code
  • Handling buffers is about as safe as dealing with void* in C++, especially interleaved VBOs
  • Selecting the correct format, internal format and type for transferring pixels to your texture
Since I want shaders to be type safe with respect to the rest of the pipeline (e.g. use same format of vertex attributes as provided in the vertex array) I decided to provide a DSL for the shaders as well. There is no need to learn GLSL at all to use GPipe! (You might wanna learn linear though...)

Where can I get it?

It's right here!

Why a new major version?

If you have tried GPipe 1, chances are that you ran into some memory leaks. And that is the major flaw with the GPipe 1 design: you had little control over space and time. "Space" as in vertex buffers and textures where converted from Haskell lists and cached automatically. To animate a texture for instance, you had to create a new one each frame! And "time" as in shaders were generated and compiled at times that weren't really obvious for the user. In fact, shader source code was generated each frame and the shader cache used that source to pick a compiled shader, or compile a new one. While this cache used a trie to be as efficient as possible, it just didn't scale well.

GPipe 1 also modeled the graphical pipeline as a pure function, something that only really pays out if you are rendering to an image to store offline. If you want to present the rendered result, you are still bound to the IO monad. This perceived purity required a lot of unsafePerformIO under the hood, and in order to prevent different GL frames to interfere there was a lot of explicit forcing of evaluations that bloated the code.

So when I set off to rewrite GPipe I decided on a couple of design principles:
  • It should be safe
  • It should be fast
  • It should be familiar to anyone who knows modern OpenGl
In order to achieve that, I had to kill some of my darlings. For example, the almost magical toGPU function that turned a normal Haskell value into a GL uniform value to be used in shaders had to go. It was the main cause for the shader to be regenerated every frame. Still, I think the end result is just as modular and easier to reason about.

Whats new in GPipe 2?

  • Based on OpenGl 3.3 core profile
  • Mutable textures
  • Mutable type safe buffers
  • Interleaved or non-interleaved vertex arrays
  • Instanced rendering
  • Control structures (such as loops) in shaders
  • Render to texture (FBOs) with support for multiple render targets
  • Integral texture formats (even for FBOs)
  • Much wider texture and sampler support (e.g. shadow samplers)
  • Explicit control of shader compilation
  • Explicit control of context creation, with support for shared contexts
  • No built in GLUT window manager. This functionality is instead provided by external packages.
  • Vec library is replaced by linear
  • No unsafePerformIO!
GPipe 2 is not backward compatible with GPipe 1. That also means that auxiliary packages like GPipe-Collada and GPipe-TextureLoad doesn't work with this new version yet.

As for now, there exist one window manager package for GPipe 2 created by plredmond: GPipe-GLFW. Anyone who dares can create their own window manager based on GLUT or whatever if they wanted.

Enough talk! Show me some code and pretty pictures!


Edit 2015-09-02: Added missing language pragmas and fixed indentation that got skewed when entering world of HTML.

{-# LANGUAGE ScopedTypeVariables, PackageImports, FlexibleContexts, TypeFamilies #-}  
module Main where  
import Control.Applicative  
import Control.Monad  
import "transformers" Control.Monad.IO.Class  
import Graphics.GPipe  
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW  
import qualified "JuicyPixels" Codec.Picture as Juicy  
import qualified "JuicyPixels" Codec.Picture.Types as Juicy  
import "linear" Linear  
main =   
  runContextT GLFW.newContext (ContextFormatColorDepth SRGB8 Depth16) $ do  
    -- Create vertex data buffers  
    positions :: Buffer os (B2 Float) <- newBuffer 4  
    normals   :: Buffer os (B3 Float) <- newBuffer 6  
    tangents  :: Buffer os (B3 Float) <- newBuffer 6  
    writeBuffer positions 0 [V2 1 1, V2 1 (-1), V2 (-1) 1, V2 (-1) (-1)]        
    writeBuffer normals 0 [V3 1 0 0, V3 (-1) 0 0, V3 0 1 0, V3 0 (-1) 0, V3 0 0 1, V3 0 0 (-1)]  
    writeBuffer tangents 0 [V3 0 1 0, V3 0 (-1) 0, V3 0 0 1, V3 0 0 (-1), V3 (-1) 0 0, V3 1 0 0]  
    -- Make a Render action that returns a PrimitiveArray for the cube   
    let makePrimitives = do   
          pArr <- newVertexArray positions  
          nArr <- newVertexArray normals  
          tArr <- newVertexArray tangents  
          let sideInstances = zipVertices (,) nArr tArr          
          return $ toPrimitiveArrayInstanced TriangleStrip (,) pArr sideInstances   
    -- Load image into texture  
    Right (Juicy.ImageYCbCr8 image) <- liftIO $ Juicy.readImage "image.jpg"  
    let size = V2 (Juicy.imageWidth image) (Juicy.imageHeight (image))  
    tex <- newTexture2D SRGB8 size maxBound -- JPG converts to SRGB  
    writeTexture2D tex 0 0 size $ Juicy.pixelFold getJuicyPixel [] image   
    generateTexture2DMipmap tex  
    -- Create a buffer for the uniform values        
    uniform :: Buffer os (Uniform (V4 (B4 Float), V3 (B3 Float))) <- newBuffer 1  
    -- Create the shader  
    shader <- compileShader $ do  
      sides <- fmap makeSide <$> toPrimitiveStream primitives  
      (modelViewProj, normMat) <- getUniform (const (uniform, 0))  
      let filterMode = SamplerFilter Linear Linear Linear (Just 4)  
          edgeMode = (pure ClampToEdge, undefined)  
          projectedSides = proj modelViewProj normMat <$> sides            
      samp <- newSampler2D (const (tex, filterMode, edgeMode))  
      fragNormalsUV <- rasterize rasterOptions projectedSides          
      let litFrags = light samp <$> fragNormalsUV  
          litFragsWithDepth = withRasterizedInfo   
              (\a x -> (a, getZ $ rasterizedFragCoord x)) litFrags  
          colorOption = ContextColorOption NoBlending (pure True)  
          depthOption = DepthOption Less True                            
      drawContextColorDepth (const (colorOption, depthOption)) litFragsWithDepth  
    -- Run the loop  
    loop shader makePrimitives uniform 0  
loop shader makePrimitives uniform angle = do  
  -- Write this frames uniform value   
  size@(V2 w h) <- getContextBuffersSize  
  let modelRot = fromQuaternion (axisAngle (V3 1 0.5 0.3) angle)  
      modelMat = mkTransformationMat modelRot (pure 0)  
      projMat = perspective (pi/3) (fromIntegral w / fromIntegral h) 1 100   
      viewMat = mkTransformationMat identity (- V3 0 0 5)  
      viewProjMat = projMat !*! viewMat !*! modelMat  
      normMat = modelRot  
  writeBuffer uniform 0 [(viewProjMat, normMat)]  
  -- Render the frame and present the results  
  render $ do  
    clearContextColor 0 -- Black  
    clearContextDepth 1 -- Far plane  
    prims <- makePrimitives  
    shader $ ShaderEnvironment prims (FrontAndBack, ViewPort 0 size, DepthRange 0 1)  
  closeRequested <- GLFW.windowShouldClose  
  unless closeRequested $  
    loop shader makePrimitives uniform ((angle + 0.005) `mod''` (2*pi))  
getJuicyPixel xs _x _y pix =  
  let Juicy.PixelRGB8 r g b = Juicy.convertPixel pix in V3 r g b : xs   
getZ (V4 _ _ z _) = z -- Some day I'll learn to use lenses instead...  
data ShaderEnvironment = ShaderEnvironment   
  { primitives :: PrimitiveArray Triangles (B2 Float, (B3 Float, B3 Float))  
  , rasterOptions :: (Side, ViewPort, DepthRange)  
-- Project the sides coordinates using the instance's normal and tangent  
makeSide (p@(V2 x y), (normal, tangent)) =   
  (V3 x y 1 *! V3 tangent bitangent normal, normal, uv)    
  where bitangent = cross normal tangent  
        uv = (p + 1) / 2  
-- Project the cube's positions and normals with ModelViewProjection matrix  
proj modelViewProj normMat (V3 px py pz, normal, uv) =   
  (modelViewProj !* V4 px py pz 1, (fmap Flat $ normMat !* normal, uv))   
-- Set color from sampler and apply directional light  
light samp (normal, uv) =   
  sample2D samp SampleAuto Nothing Nothing uv * pure (normal `dot` V3 0 0 1)     

(It is actually spinning)

You need an image called "image.jpg" in the same folder as your compiled program.

Besides linear and transformers, this code uses GPipe-GLFW for window management, and JuicyPixels for loading images.

I wont go into the details of the code today, but will instead cover all of it and more in a tutorial that I will publish as a series of blog posts here on this site. I trust you are adventurous enough to make some sense of the code on your own in the meantime, maybe with a little help from the haddocks. I expect you to find bugs, and when you do please report them to

What's next?

In order to reach feature completeness I had to leave a couple of OpenGl 3.3 features out for GPipe 2.0, but now starts the time of bringing them in! I'd like to have the communities feedback first on what features you need. Some suggestions on features currently not in GPipe 2:
  • Geometry shaders
  • Transform feedback
  • Multisample framebuffers and textures
  • Compressed textures
  • Half and double formats
  • Shader array types
  • Buffer textures
  • Occlusion queries
  • Packed host and buffer formats
  • OpenGl 4.5 features...
Let me know what you want, at, or by email to tobias_bexelius snabelA, where "snabelA" is Swedish for "at".