Sunday, 6 September 2015

GPU programming in Haskell using GPipe - Part 1

Welcome to the first part of a tutorial series on GPU programming in Haskell! We will be using GPipe 2.2.1, which I announced earlier on this blog. GPipe 2 is a functional API based on OpenGl 3.3, but this tutorial will not require previous knowledge of OpenGl, so if you know Haskell (which is a prerequisite) and have ever wanted to learn graphics programming, now is the time!

Edit 2017-05-20: This tutorial has been updated for GPipe 2.2.

Hello triangle!

Let's start with a small "Hello world!" program:

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}   
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import Control.Monad (unless)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Hello world!")

    vertexBuffer :: Buffer os (B4 Float, B3 Float) <- newBuffer 3
    writeBuffer vertexBuffer 0 [ (V4 (-1) 1 0 1, V3 1 0 0)
                               , (V4 0 (-1) 0 1, V3 0 1 0)
                               , (V4 1 1 0 1,  V3 0 0 1)
                               ]

    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream
      drawWindowColor (const (win, ContextColorOption NoBlending (V3 True True True))) fragmentStream

    loop vertexBuffer shader win

loop vertexBuffer shader win = do
  render $ do
    clearWindowColor win (V3 0 0 0)
    vertexArray <- newVertexArray vertexBuffer
    let primitiveArray = toPrimitiveArray TriangleList vertexArray
    shader primitiveArray
  swapWindowBuffers win

  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    loop vertexBuffer shader win   

As you see from the import list, an additional package is required: GPipe-GLFW (version 1.4.1 or later). This package provides functionality to create windows into which GPipe draws, as well as functions to get key and mouse input. This kind of functionality used to be built into GPipe 1, but since many wanted to be free to choose which window manager to use, it was factored out into its own package. At the time of writing, only bindings for GLFW exist, but more will surely come.

When you import Graphics.GPipe you also get the entire linear and Boolean packages, since these are used quite heavily in GPipe applications.

We are now ready to compile (pass -threaded to GHC, as GPipe-GLFW requires it) and run our program, which will show us a colorful triangle in the window's lower left corner.

The context

The first thing we do in the main function is to execute runContextT. A context owns all the GPU memory objects, much like a process does for the CPU. runContextT creates a new context for us. It takes two arguments: a ContextHandlerParameters value and a monad action.

The ContextHandlerParameters parameter is provided by the window manager. To use the GPipe-GLFW package we imported earlier, we pass in GLFW.defaultHandleConfig as this first argument.

The second parameter to runContextT is the monad action in which our entire GPipe program happens. This monad action has the type ContextT ctx os m a. This is a monad transformer, i.e. a monad that is inheriting the capabilities of another monad of type m. For ContextT, m is the type of the monad in which we run runContextT. In this and most other cases, that is simply the IO monad. Inside a monad transformer, you may use the function lift to execute an action in the inherited monad.

Some type trickery is used to ensure that the values that GPipe actions return within your context are not returned from it. This is the same trickery the ST monad uses to ensure STRefs aren't returned and used in another runST invocation. The trick is that the runContextT call uses something called a rank 2 type:
runContextT :: (MonadIO m, MonadAsyncException m, ContextHandler ctx) 
            => ContextHandlerParameters ctx 
            -> (forall os. ContextT ctx os m a)
            -> m a

Notice that there is a forall quantifier for os local to the ContextT monad action argument. This ties any object whose type references os to this monad action.
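The same scoping trick can be seen in isolation with the ST monad from base. The sketch below is not GPipe code; sumST is a hypothetical helper that only illustrates how the locally quantified type variable keeps a mutable reference from escaping runST, in the same way os keeps GPipe objects from escaping runContextT.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Sums a list using a mutable reference that lives only inside runST.
-- runST :: (forall s. ST s a) -> a  has the same shape as runContextT.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref

-- The following does NOT type check, because the type of the STRef
-- mentions the locally quantified s and would escape its runST call:
--
-- leaked = runST (newSTRef (0 :: Int))

main :: IO ()
main = print (sumST [1, 2, 3])  -- prints 6
```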

The ctx parameter in the ContextT type is the type of the window manager. When we use GLFW.defaultHandleConfig, ctx will be Graphics.GPipe.Context.GLFW.Handle. This type constrains some actions to only be usable in ContextT monads for a certain window handler, as we will see an example of below.


The first thing we do in our ContextT monad is to create a window using newWindow. This function takes two parameters, a format and a window manager specific parameter. The format describes what kind of images we will be drawing into the window, e.g. how many color channels it will have and how many bits per color. It also describes whether we have a depth buffer or a stencil buffer attached to the window. I will discuss what these are in a later part of this tutorial when I cover drawing. Right now we settle for a format with an RGB color with 8 bits for each of the three channels, and no depth or stencil buffers. The value that describes this format for us is WindowFormatColor RGB8.

You may create any number of windows you like, all of which will be able to render different views of the same data. Windows may be explicitly closed using deleteWindow, but will otherwise be automatically closed when the runContextT call is finished.

You don't even need to create any window at all, for example if you want to use the GPU to generate images to save to disk rather than to show on screen.

windowShouldClose is an action defined by GPipe-GLFW, and only works in contexts where the ctx parameter is Graphics.GPipe.Context.GLFW.Handle. This action will return whether the user has requested the window to close, e.g. by clicking the 'X' in the top corner. Also, note that this action actually returns a Maybe Bool, which will be Nothing if the window was already closed.
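Because the result is a Maybe Bool, a render loop has three cases to consider. The example program keeps looping on anything other than Just True. A possible alternative, sketched here with a hypothetical helper keepRunning that is not part of GPipe-GLFW, also stops when the window is already gone:

```haskell
-- Hypothetical helper, not part of GPipe-GLFW: decide whether the render
-- loop should continue, given the result of GLFW.windowShouldClose.
keepRunning :: Maybe Bool -> Bool
keepRunning (Just False) = True   -- window is open, no close requested
keepRunning (Just True)  = False  -- the user asked to close the window
keepRunning Nothing      = False  -- the window was already closed

main :: IO ()
main = mapM_ (print . keepRunning) [Just False, Just True, Nothing]
```

In the loop, this would mean recursing only while keepRunning closeRequested holds. Whether Nothing should end the loop is a design choice that depends on how many windows the program manages.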

Rendering - This is what it's all about

Now that we have our context, let's do some rendering. Any rendering you do in GPipe will follow this sequence of operations:

In short, every GPipe rendering will from a buffer of data create an array of vertices that are assembled into an array of primitives. There are three kinds of primitives: points, lines and triangles, but we will almost exclusively work with triangles. The array of primitives is then turned into a stream of primitives inside a shader, enabling us to do transformations of those vertices. The primitives are then rasterized, i.e. chopped up into pixel sized fragments, forming a fragment stream. This stream of fragments is then drawn in a window, or into an off-screen image.

In the ContextT monad, we create a Buffer of data that is stored on the GPU. In our "Hello world" example above, our buffer is called vertexBuffer and has 3 elements, each of which is a tuple (B4 Float, B3 Float). B4 and B3 are the "buffer representations" of V4 and V3, the vector types from the linear package. I will go into more detail what these "buffer representations" are in the next part of this tutorial, but for now you may think of B4 as just another name for V4 when we use it in a Buffer. Directly after creating the buffer, we write three values into it from an ordinary list.

With a function called render we run another monad, conveniently called... Render. In this monad we use our Buffer to create a VertexArray with the newVertexArray function. Coming from our vertexBuffer, vertexArray will have 3 vertices, each of which is a tuple (B4 Float, B3 Float). Now you may wonder what the difference between a VertexArray and a Buffer is. A fair question indeed, but I'm afraid I'll have to wait until the next part of this tutorial to answer it. Sorry.

Now that we have a VertexArray, we use it to create a PrimitiveArray of triangles using the function toPrimitiveArray. The TriangleList value that we give as argument to this function indicates that we want to form a triangle from every three consecutive vertices in vertexArray. Since there are only three vertices, primitiveArray will only contain a single triangle.
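The grouping that TriangleList describes can be modelled on ordinary lists. The function below is a hypothetical illustration rather than anything GPipe exposes, and it assumes that leftover vertices which do not fill a whole triangle are simply dropped.

```haskell
-- A list model of what TriangleList means: every three consecutive
-- vertices form one triangle. Leftover vertices that do not fill a
-- whole triangle are dropped in this model.
triangleList :: [v] -> [(v, v, v)]
triangleList (a : b : c : rest) = (a, b, c) : triangleList rest
triangleList _                  = []

main :: IO ()
main = do
  print (length (triangleList "abc"))            -- 3 vertices give 1 triangle
  print (length (triangleList [1 :: Int .. 7]))  -- 7 vertices give 2 triangles
```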

Looking at the graph above we should then turn this PrimitiveArray into a PrimitiveStream (yet another name for the same thing?), but in the code we just see shader primitiveArray?

Shader - A primer

The gray box in the graph above is called a Shader. It is, I guess unsurprisingly by now, also a monad! Unlike the ContextT and Render monads, we can't run it directly; it has to be compiled first. This compilation is different from the compilation you do when you run ghc, cabal, stack or whatever shortcut you have in emacs. It happens at runtime, using a compiler provided by your graphics driver. This compilation may actually take seconds, so it is definitely not something you want to do every frame in, for example, a game written with GPipe.

A Shader monad is compiled with the function compileShader, that you run in your ContextT monad. compileShader will return a function that you later can run in a Render monad. In our example above, we compile the shader into a function we call just shader. And this shader is what we see being executed as last action in the Render monad, passing in primitiveArray as an argument.

Let's take a look at the actual Shader in our example now. The first action we run is toPrimitiveStream. This will load a PrimitiveArray into something called a PrimitiveStream. The PrimitiveArray to load is selected with the function passed as argument to toPrimitiveStream, in this case id. A Shader monad is almost like a Reader monad, it closes over an environment. But unlike Reader, there is no ask action where you can retrieve this environment. Instead, many other actions, like toPrimitiveStream, will take a function that extracts values from this environment. The environment value is not defined until the shader is run, i.e. not even when the shader is compiled. Remember that we passed primitiveArray as argument to our compiled shader function? That is the environment we use in our program. Since the function passed to toPrimitiveStream wants to extract a PrimitiveArray from the environment, and our environment is a PrimitiveArray, we just use id.
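To make the "Reader without ask" idea concrete, here is a toy model written with base only. ToyShader, select and runToy are made-up names for this sketch and say nothing about how GPipe implements Shader; the point is only that actions receive a selector function, and that the selector is id when the environment already is the value we want.

```haskell
-- Toy stand-in for the Shader monad (NOT GPipe's implementation): a
-- computation that depends on an environment supplied only when it is run.
newtype ToyShader env a = ToyShader (env -> a)

instance Functor (ToyShader env) where
  fmap f (ToyShader g) = ToyShader (f . g)

instance Applicative (ToyShader env) where
  pure x = ToyShader (const x)
  ToyShader f <*> ToyShader g = ToyShader (\env -> f env (g env))

instance Monad (ToyShader env) where
  ToyShader g >>= k = ToyShader (\env -> let ToyShader h = k (g env) in h env)

-- There is no ask; actions take a selector function instead, in the
-- same spirit as the argument to toPrimitiveStream.
select :: (env -> a) -> ToyShader env a
select = ToyShader

-- Supplying the environment corresponds to calling the compiled shader.
runToy :: ToyShader env a -> env -> a
runToy (ToyShader g) = g

-- A richer environment: a record from which each action picks its part.
data Env = Env { vertices :: [Int], scale :: Int }

scaled :: ToyShader Env [Int]
scaled = do
  vs <- select vertices
  k  <- select scale
  pure (map (* k) vs)

-- When the environment is exactly the value we need, the selector is id.
passThrough :: ToyShader [Int] [Int]
passThrough = select id

main :: IO ()
main = do
  print (runToy scaled (Env [1, 2, 3] 10))  -- prints [10,20,30]
  print (runToy passThrough [1, 2, 3])      -- prints [1,2,3]
```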

A PrimitiveStream is also a sequence of primitives, but it lives inside the shader and as such we may map functions on it that will run on the GPU. PrimitiveStream implements Functor, and fmap f primitiveStream will return a new primitive stream that is the result of applying the function f to each vertex of each primitive in primitiveStream. Mapping functions on streams with fmap in shaders is many times faster than doing the same kind of operation on ordinary lists, since we are using the GPU instead of the CPU. In our "Hello world" example, we are actually not doing anything with the primitives in our primitiveStream before we feed them to the rasterize function. But before we move on to that, let me just mention what the inferred type of primitiveStream is:

primitiveStream :: PrimitiveStream Triangles (V4 VFloat, V3 VFloat)

As you can see, the B4 and B3 types we had in our buffer (and in our vertex array and primitive array) got turned into V4 and V3 again, but the Floats inside them apparently got turned into VFloats! VFloat is really a type synonym for S V Float which is a Float lifted to a vertex stream on the GPU, i.e. it is not an ordinary Float that you can use with any Float function anymore; you can only do things with it that the GPU supports. I will discuss this type in more detail when we dissect shaders in an upcoming part of this tutorial.
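One way to picture a lifted value is as a description of a computation rather than a number. The toy type below is an assumption made purely for illustration and is not how GPipe defines S V Float: arithmetic on it builds shader-like source text instead of computing a result, which hints at why ordinary Float functions cannot be applied to such a value.

```haskell
-- Toy model of a lifted float (NOT GPipe's S V Float): a value is the
-- source text of an expression to be evaluated later, e.g. on a GPU.
newtype Lifted = Lifted { source :: String }

-- A named input, standing in for a value read from a vertex.
input :: String -> Lifted
input = Lifted

-- Combines two expressions with an infix operator.
binary :: String -> Lifted -> Lifted -> Lifted
binary op (Lifted a) (Lifted b) =
  Lifted ("(" ++ a ++ " " ++ op ++ " " ++ b ++ ")")

instance Num Lifted where
  (+) = binary "+"
  (-) = binary "-"
  (*) = binary "*"
  negate (Lifted a) = Lifted ("(-" ++ a ++ ")")
  abs (Lifted a)    = Lifted ("abs(" ++ a ++ ")")
  signum (Lifted a) = Lifted ("sign(" ++ a ++ ")")
  fromInteger n     = Lifted (show (fromInteger n :: Float))

main :: IO ()
main = putStrLn (source (input "x" * 2 + 1))  -- prints ((x * 2.0) + 1.0)
```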


Even though we never map any functions on our primitiveStream to run on the GPU, nor on the fragmentStream we are about to create, there is still one operation we will always do in a shader that leverages the massive parallelism of GPUs: rasterization.

Rasterization is the process of mapping a primitive such as a triangle to a grid and generating pixel sized fragments. The vertices of the input primitives are used in two ways: first, each must provide a position so the rasterizer knows which fragments to generate, and second, they provide values that will be linearly interpolated between all of the primitive's vertices to create unique values for each generated fragment.

The first argument to rasterize is a function extracting three parameters from the shader environment: which side of each primitive to rasterize, the view port's position and size, and the fragment's depth range. In our example, we know all parameters up front and don't need to get them from the shader environment, which is why we use the const function. The parameters we provide tell rasterize that it should rasterize both sides of each triangle, that the view port has its lower left corner at (0,0) and a width and height of 500 pixels, and that the depth range is [0,1]. More on that in a bit.

The vertices' positions are 3D coordinates in a canonical view space. During rasterization, these will be transformed into the view port in pixel screen space, where the position (-1,-1,z) in canonical view space will be mapped to the view port's lower left corner (in our case (0,0)), and (1,1,z) will be mapped to the upper right corner (in our case (500,500)). To be more precise, the fragment in the lower left corner in our case will actually have pixel coordinate (0.5, 0.5), and the uppermost, rightmost fragment we generate will have coordinate (499.5, 499.5).
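The mapping of x and y can be written out as plain arithmetic. This is a sketch on ordinary Doubles with a hypothetical function name, toViewPort, meant to show the linear mapping described above rather than what the GPU does internally.

```haskell
-- View port transform for x and y, modelled on plain Doubles.
-- The first argument is the view port's lower left corner and the second
-- its width and height, both in pixels; the third is a position in
-- canonical view space.
toViewPort :: (Double, Double) -> (Double, Double) -> (Double, Double) -> (Double, Double)
toViewPort (ox, oy) (w, h) (x, y) =
  (ox + (x + 1) / 2 * w, oy + (y + 1) / 2 * h)

main :: IO ()
main = mapM_ (print . toViewPort (0, 0) (500, 500))
  [(-1, 1), (0, -1), (1, 1)]  -- the three vertices of the example triangle
```

For the example triangle this places the corners at (0,500), (250,0) and (500,500) in pixel coordinates.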

Every fragment also has a depth value in the range [0,1]. At rasterization, we specify with the DepthRange parameter how to map the canonical z coordinates to this range. A z coordinate with value -1 will be mapped to the first parameter of DepthRange, and a z coordinate with value 1 will be mapped to the second parameter of DepthRange. In our example, we map z coordinates in the canonical view space range [-1,1] to the depth range [0,1]. The convention used by Linear.Projection and most other OpenGl math libraries is that a z coordinate of 1 in canonical view space is considered to be furthest away and -1 to be closest, but you are actually free to use any convention you like. Any fragment with a value outside the depth range [0,1] will be discarded, so any part of primitives that intersects the box [(-1,-1,-1), (1,1,1)] in the canonical view space will become fragments in the view port. This box is commonly referred to as the canonical view volume.
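The depth mapping is the same kind of linear interpolation. The sketch below uses a hypothetical function, toDepth, on plain Doubles; its first two arguments play the role of the two parameters of DepthRange. The reversed range in the second line is only there to show that the direction of the mapping is a matter of convention.

```haskell
-- Maps a canonical z coordinate in [-1, 1] to the depth range given by
-- the two arguments of DepthRange: z = -1 maps to the first argument
-- and z = 1 to the second.
toDepth :: Double -> Double -> Double -> Double
toDepth first second z = first + (z + 1) / 2 * (second - first)

main :: IO ()
main = do
  print (map (toDepth 0 1) [-1, 0, 1])  -- prints [0.0,0.5,1.0]
  print (map (toDepth 1 0) [-1, 0, 1])  -- prints [1.0,0.5,0.0]
```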

The position of a vertex in canonical view space is actually provided as a V4 VFloat, known as a homogeneous 3D coordinate, where V4 x y z w has the 3D position (x/w, y/w, z/w). All three vertices of the triangle in our example use 1 for the w component, so in this simple case they are just normal 3D coordinates. When using perspective projection (where objects appear smaller the further away they are, which is standard in most 3D applications) the w component will not be 1. The reason the rasterizer wants w to be passed in explicitly, instead of having us divide the other components by it ourselves (by mapping such a function over the primitive stream), is that this w component is also used when interpolating all other values of the vertex. I'll demonstrate how this perspective correct interpolation works in a later part when we cover textures and samplers.
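Written out on ordinary tuples, the division looks like this. The function name fromHomogeneous is made up for this sketch; in a real program the rasterizer performs the division for us.

```haskell
-- Perspective division: the 3D position described by a homogeneous
-- coordinate (x, y, z, w) is (x/w, y/w, z/w).
fromHomogeneous :: (Double, Double, Double, Double) -> (Double, Double, Double)
fromHomogeneous (x, y, z, w) = (x / w, y / w, z / w)

main :: IO ()
main = do
  print (fromHomogeneous (-1, 1, 0, 1))  -- w = 1: prints (-1.0,1.0,0.0)
  print (fromHomogeneous (2, 4, 6, 2))   -- prints (1.0,2.0,3.0)
```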

Now that we have calculated which fragments to generate from each primitive, and what screen positions and depth values these will have, we can interpolate the vertices' other values. The rasterize function's second argument is a primitive stream of type

FragmentInput a => PrimitiveStream p (V4 VFloat, a)

and it returns a fragment stream of type

FragmentInput a => FragmentStream (FragmentFormat a)

That is, each vertex has a homogeneous position as we've just discussed, but also some other value of type a that will be turned into a value of type FragmentFormat a in each fragment. These values are produced by linearly interpolating the vertices' values over the entire primitive for each fragment. In our example, a is V3 VFloat, representing the color of each vertex. FragmentFormat a is an associated type in the FragmentInput class, and FragmentFormat (V3 VFloat) evaluates to V3 FFloat. FFloat is, just like VFloat, a lifted version of Float, but this time in a fragment stream. We distinguish lifted values in vertex streams from lifted values in fragment streams, since GPUs don't support the exact same set of operations on them.
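As a rough picture of what the interpolation computes for the colors in the example, here is a sketch on plain Doubles. The names and the use of barycentric weights are assumptions made for illustration; the weights express how close a fragment is to each vertex, and since every vertex in the example has w = 1, no perspective correction enters into it.

```haskell
type Color = (Double, Double, Double)

-- Linear interpolation of a per-vertex value over a triangle. The weights
-- (barycentric coordinates) say how close the fragment is to each vertex
-- and are expected to sum to 1.
interpolate :: (Double, Double, Double) -> Color -> Color -> Color -> Color
interpolate (wa, wb, wc) (r1, g1, b1) (r2, g2, b2) (r3, g3, b3) =
  ( wa * r1 + wb * r2 + wc * r3
  , wa * g1 + wb * g2 + wc * g3
  , wa * b1 + wb * b2 + wc * b3 )

-- The three vertex colors used in the example program.
red, green, blue :: Color
red   = (1, 0, 0)
green = (0, 1, 0)
blue  = (0, 0, 1)

main :: IO ()
main = do
  print (interpolate (1, 0, 0) red green blue)          -- at the first vertex: (1.0,0.0,0.0)
  print (interpolate (0.5, 0.5, 0) red green blue)      -- halfway along an edge: (0.5,0.5,0.0)
  print (interpolate (0.25, 0.25, 0.5) red green blue)  -- inside the triangle: (0.25,0.25,0.5)
```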

Drawing and swapping

The last thing we do in our shader now that we have our fragmentStream is to draw its fragments into our window. drawWindowColor takes the fragmentStream as argument but also (just like most other actions in the Shader monad) a function that extracts parameters from the shader environment. In this case the parameter extracted is a value of type (Window os c ds, ContextColorOption c), i.e. the window to draw to and a specification of how the fragments should be combined with the previous values in the window. The values we provide in our example (again using const since they do not depend on the shader environment) are the window we created earlier and a specification that each fragment should completely overwrite the previous value in the window. I will devote an entire part of this tutorial to drawing, so these parameters will be explained in detail later.

Since our window was created with format RGB8, the fragment stream needs to contain color values of type V3 FFloat. Conveniently enough, that is exactly the type our fragmentStream has as a result of rasterization. In most GPipe programs though, you will fmap functions on your fragment stream to transform the values interpolated from rasterization into the color format that is required by the window.

Drawing is the only action in the shader that has a side effect: in this case the back buffer of the window is altered. A window actually has (at least) two buffers, one we call the front buffer that is currently shown on screen and one we call the back buffer that shaders are drawing to. When the shader primitiveArray action in the Render monad action returns, the back buffer will have been updated. To present this newly rendered image on screen, we need to call swapWindowBuffers inside our ContextT monad. This will tell the graphics hardware to swap the front and back buffers. This will not copy any memory, but merely swap some pointer values, so it is quite efficient. swapWindowBuffers may however block for a while if you try to present images faster than the screen can update, but this is usually a good thing because you would otherwise just waste GPU and CPU cycles producing more images than could be presented.

There is one line in our example's Render action that I shamelessly skipped over before: clearWindowColor win (V3 0 0 0). This action happens before we run the shader, and it is used to set each pixel of the window's back buffer to a constant value, in this case V3 0 0 0, aka black. After a swap, the contents of the back buffer are undefined, so it is always a good idea to start each frame after the last swapWindowBuffers by clearing. Clearing and running shaders are the two actions in the Render monad that have side effects.

This concludes the first part of this tutorial. Next time I will write about all the gory details of Buffers and PrimitiveArrays!

2 comments:

  1. Is the use of the "Package Imports" extension necessary? It seems like only one package provides the `Graphics.GPipe.Context.GLFW` module

    1. Not really "necessary", but for a tutorial I like how it makes explicit what comes from other packages, especially as the module Graphics.GPipe.Context does come from GPipe.