torsdag 1 oktober 2015

GPU programming in Haskell using GPipe - Part 3

< Previous episode: Buffers and arrays

Welcome to the Shader!

Finally! This is where the real fun happens! In previous episode we learned many different ways to create a PrimitiveArray. Remember from the first episode that this PrimitiveArray could be turned into a PrimitiveStream inside the Shader monad? Well, that's what we will do next.

Primitive streams

A quick recap from episode 1: This is how you create a PrimitiveStream: 
toPrimitiveStream :: forall os f s a p. VertexInput a => (s -> PrimitiveArray p a) -> Shader os s (PrimitiveStream p (VertexFormat a))

This is a Shader action that uses a function to get a PrimitiveArray from the shader environment, and then returns a PrimitiveStream. There are two things to remember about Shader actions:
  1. Whatever is returned from a Shader action can never leave the Shader monad. The only thing you can do with a Shader monad is to give it to compileShader, and that one is expecting a Shader os s (), i.e. it doesn't accept any return values.
  2. Whenever you see a Shader action wanting a function to retrieve some from the shader environment (in the toPrimitiveStream case, x is a primitive array), then x is something that may vary with different invocations of the shader without requiring another call to compileShader.
In the transition from array to stream, our vertices will turn from type a to type VertexFormat a. This is very similar to how buffers worked: there is a VertexInput type class that defines what may be turned into vertices in primitive streams:

class BufferFormat a => VertexInput a where
  type VertexFormat a
  toVertex :: ToVertex a (VertexFormat a)

toVertex is an arrow action, just as toBuffer was. If you want to create your own instance of VertexInput you define your toVertex member in terms of the toVertex members of the GPipe provided instances. Let's look at some examples of these instances:
a  VertexFormat f
B FloatS V Float
B Int32S V Int
B Word32S V Word
Normalized (B Int32)S V Float
Normalized (B Word32)S V Float
B2 FloatV2 (S V Float)
B2 Int32V2 (S V Int)
B2 Word32V2 (S V Word)
B2 Int16V2 (S V Int)
B2 Word16V2 (S V Word)
Normalized (B2 Int32)V2 (S V Float)
Normalized (B2 Word32)V2 (S V Float)
Normalized (B2 Int16)V2 (S V Float)
Normalized (B2 Word16)V2 (S V Float)
(a, b)(VertexFormat a, VertexFormat b)
V2 aV2 (VertexFormat a)
There are many more, including B3, B4 and larger tuples. See the full list on hackage.

Almost looks like a straight copy from the table in the previos episode, doesn't it? One important difference between the BufferFormat class and the VertexInput class is that the former was defined on the resulting type in the conversion and had an associated type for the originating type (HostFormat), while VertexInput is instead defined on the originating type and uses an associated type for the resulting type (VertexFormat). So in the table above, a B Float will turn into a S V Float and not the other way around.

S x a is what is called a lifted type, and represents an a in an x "setting" (to avoid using the word "context" that already has a distinct meaning in GPipe). The setting in this case is V, which you might have guessed stands for "Vertex". There is one other setting in GPipe and that is F that stands for Fragment.

You use an S x a almost just as you would use a normal a: For example S V Float has Num, Floating and Fractional instances so you can do things like add, divide or take sin of them.

Ints and Words looses their specific size when entering the world of shaders, so e.g. both Int16 and Int32 becomes just Int in an S-type.

There is a Normalized newtype wrapper that you can put an integral B-value in (when fmaping on the PrimitiveArray) to make it become a lifted floating value. The GPU will do this by mapping a signed (i.e. IntXX) integral's range [minBound, maxBound] to the floating point range [-1.0, 1.0] on the shader side, and an unsigned's (i.e. WordXX) range to [0.0, 1.0].

Buffer vector types (such as B2) will turn back into their normal V-vector form (e.g. V2), but with the elements converted to S-values. Normal V-vectors may also be used as input and will become the same V-vector of S-values on the shader side. The difference is that converting a V2 Int32 for instance would take two shader attribute slots while converting a B2 Int32 would only take one. Each graphics hardware has a limit on how many attribute slots can be used at once, and GPipe will detect if this limit is reached and throw an exception from the compileShader call in that case. To help staying under that limit, prefer using B-vectors instead of V-vectors in buffers and primitive arrays when possible. Actually, the topic of error handling in GPipe is so important that it should have its section of its own:

Error handling in Gpipe

There are three levels of errors you may encounter when working with GPipe:
  1. Type check errors
    This is the whole idea of GPipe: GHC will tell you when you try to do something that is statically known to not work (something that usually would be a runtime error when working with plain OpenGl). Most invariants in GPipe are checked this way.
  2. Runtime GPipeExceptions
    Whenever a hardware dependent limit is reached, like number of attributes used by a PrimitiveArray, a GPipeException is thrown. You may always catch one of those to maybe show a user friendly error message and/or fallback to a simpler solution.
  3. Runtime error calls
    For operations that involves indexing, e.g. writeBuffer, an error may be thrown if you use an index that is out of bounds (just as the !! operator on a list would). Since these kinds of errors are not hardware dependent and would always be thrown, they are considered to be programmer errors and as such are not guaranteed to be catchable in a safe way (e.g. the interior OpenGl state might be garbled after this).

Enough of that, back to the shader

So now we have a PrimitiveStream with vertices built up from S-values. A PrimitiveStream is a Functor and a Monoid, just as PrimitiveArray was, so you can fmap functions on it's vertices and mappend several streams into one. But since your vertices are made of S-values and not B-values, you can actually do computations on them that will be run as shader code on the GPU! Let's try it out! In the "Hello world" program you did in part 1 of this tutorial, add these two lines to your shader:
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream  
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2  

Also replace the last argument to rasterize from primitiveStream to primitiveStream3. Rebuild and you should see this picture:

It looks almost like the original one, but on top of the old triangle you see a new one, translated exactly 250 pixels down and left, and with one tenth of the brightness of the original triangle. Why 250 pixels? Remember from part 1 that the canonical view space is defined as [(-1,-1,-1), (1,1,1)], i.e. a cube with size 2, and since we used a view port of size 500 a move of 1 unit will translate to 250 pixels.

Why was the new triangle drawn on top of the old? Switch place with the operands to mappend and you'll see: the order of primitives in a PrimitiveStream (and also the fragments in a FragmentStream) matters. Since we didn't used any depth or stencil tests when drawing (I'll explain what that is later as well) the new triangle became on top because it was drawn later.

If you knew OpenGl from before, you might be interested in how the GLSL shader that drew these triangles look like. Actually, there wasn't one shader but two! This is where GPipe starts showing it's modularity and composability compared to plain GLSL code. Let's take it even further and add two more lines to the shader:
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)   
                                (V4 (sin a) (cos a) 0 0)  
                                (V4 0    0    1 0)  
                                (V4 0    0    0 1)  
      let primitiveStream4 = fmap (first (rotationMatrix (-0.2) !*)) primitiveStream3   

And don't forget to change the rasterize argument to primitiveStream4 now. When running, both triangles should have been rotated 0.2 radians clock wise around origo:

(You see the old bright triangle getting clipped on top and right because we use a view port that is smaller than the actual window.)

As you can see, in GPipe there is nothing stopping you from fmaping functions on streams more than once. How many GLSL shaders have GPipe created behind the curtains in this case? Still just two! If you are interested in how the shaders GPipe generates looks like and use Windows I can recommend the tool RenderDoc (a linux UI is also planned). With the latest version you can inspect any OpenGl 3.2+ application, and watch all API calls, all data and all GLSL shader source code.

A note on performance: Combining PrimitiveArrays with mappend has usually better performance than combining two PrimitiveStreams, so unless you need to fmap different functions on them do the mappend before they turn into streams with toPrimitiveStream.

Linear algebra in GPipe

If you've never done any kind of 3D programming before that last rotation code might be all new to you. In that case I'm afraid you'll have to pick up some basic linear algebra before venturing on, since that is essential for any graphics programming. E.g. you need linear algebra in your PrimitiveStream to transform your vertices from object space and world space to the canonical viewspace that the rasterizer expects. Linear algebra and 3D math is however out of scope for this tutorial. This seems to become a pretty good online book on the subject (at least I have enjoyed Akenine-Möller's other books before), but at the time of writing this it is not complete yet. If any reader has suggestions on good 3D math tutorials, then please share those in the comments field below!

Linear algebra in GPipe is preferably done with the linear package, since support for it's vector types are built in to GPipe. If you import the Graphics.GPipe module, you will even get all of linear imported for free!

When doing 3D math, there are a number of things that can be made in two different ways
  • Coordinate systems can be right handed or left handed
  • Transform matrices can be made to be multiplied with vectors from left or from right
  • Matrices can be defined in row major or column major order
The linear package is mostly mimicking what was the convention in early versions of Opengl: it uses a right handed system before perspective or orthographic projection (you look down negative z) but left handed after (far plane gets z=+1 and near plane z=-1), and you multiply any matrices returned from linear's functions such as mkTransformation from the right. Matrices are however unlike OpenGl defined in row major order in the linear package (which will require an extra transpose when building base matrices from left, up and forward vectors). 

Just remember that these assertions are a property of the functions of the linear package, and you are free to ditch those in favor of whatever linear algebra functions you like (the underlying OpenGl has no preferences really).


In the example above, we rotated the two triangles clock wise 0.2 radians. What if we want to increase the angle every frame? We can't make the angle parametric since the shader is precompiled, and there's no way to get an arbitrary value from the shader environment (there is no ask method, remember). Instead, you use uniforms!

A uniform in GPipe is a shader value that you get from a single buffer element. Uniforms in GPipe are built upon something that is called Uniform Buffer Objects (UBOs) in OpenGl, but don't worry if you don't know what that is. You get a uniform from a buffer with this shader action:
getUniform :: forall os s b x. UniformInput b => (s -> (Buffer os (Uniform b), Int)) -> Shader os s (UniformFormat b x)

Uh oh, another type class with another associated type? Yep, I told you that pattern was going to occur several times in GPipe. I think you know how that works by now so I'll just say it works almost exactly like VertexInput, except there are no instances for smaller Ints and Words than 32 bits. Check out the haddock for it here.

getUniform is one of those Shader functions that take a shader environment function as argument. In this case, that function is used to retrieve a tuple of a Buffer and an index into that buffer. The element type of the Buffer has to be Uniform b (where b is an instance of UniformInput). The reason for this is that uniforms have very restrictive requirements for alignment, and by creating a buffer with the element type wrapped in the newtype wrapper Uniform, you instruct GPipe to use that alignment. A buffer of this kind can still be used to create VertexArrays if you really wanted to (you can just remove the Uniform wrapper with a fmap on the VertexArray) but keep in mind that a buffer created with Uniform will have a quite large element alignment (usually 128 bytes, it's hardware dependent), and hence large padding for each element.

The uniform value you get with getUniform is just like the vertices in your PrimitiveStream built up from S-values, but it isn't bound to a stream of its own and can be used when fmaping with any other stream instead. Let's try it out:

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies, FlexibleContexts #-}

module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import Control.Monad (unless)
import Data.Monoid
import Control.Arrow (first)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Uniforms")
    vertexBuffer :: Buffer os (B4 Float, B3 Float) <- newBuffer 3
    writeBuffer vertexBuffer 0 [ (V4 (-1) 1 0 1, V3 1 0 0)
                               , (V4 0 (-1) 0 1, V3 0 1 0)
                               , (V4 1 1 0 1, V3 0 0 1)

    uniformBuffer :: Buffer os (Uniform (B Float)) <- newBuffer 1

    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)
                                (V4 (sin a) (cos a) 0 0)
                                (V4 0    0    1 0)
                                (V4 0    0    0 1)
      uniform <- getUniform (const (uniformBuffer,0))
      let primitiveStream4 = fmap (first (rotationMatrix uniform !*)) primitiveStream3
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream4
      drawWindowColor (const (win, ContextColorOption NoBlending (V3 True True True))) fragmentStream

    loop vertexBuffer shader win uniformBuffer 0

loop vertexBuffer shader win uniformBuffer angle = do
  writeBuffer uniformBuffer 0 [angle]
  render $ do
    clearWindowColor win (V3 0 0 0)
    vertexArray <- newVertexArray vertexBuffer
    let primitiveArray = toPrimitiveArray TriangleList vertexArray
    shader primitiveArray
  swapWindowBuffers win

  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    loop vertexBuffer shader win uniformBuffer ((angle+0.1) `mod''` (2*pi))

(The additions for uniforms are highlighted)

First we create an additional buffer with just one element that we name uniformBuffer. We then get the value from this buffer as a uniform in the shader (using const since we always want to use this buffer and index 0). Last, we send uniformBuffer to the loop function along with an angle value so that we can write the angle value to that buffer each iteration. When we render our shader with shader primitiveArray, it will use the current value of the buffer that we just wrote. We then increment the angle value for the next recursion of the loop, with a constantly rotating image as result.

Working with lifted S-values

To avoid numerical instability after several lapses I used mod'' on the incremented angle to make it wrap back to 0 each lap. This function is a method of the Real' typeclass, defined in the Graphics.GPipe.Expr module. This type class, as well as the type classes Convert, Integral' and FloatingOrd, are provided since the prelude version is too tied to non-lifted real world types (e.g. the prelude Integral type class has a toInteger method that we couldn't have defined for S-values). These GPipe-specific type classes also have instances for normal types such as Float and Double, so that's why we could use mod'' in our example above.

Boolean operations in the Prelude are particularly non-supporting towards lifted types. For example ==, < and && all return non-lifted Bools. To alleviate this, GPipe uses the Boolean package which provides type classes that are parametric on the type of boolean, so that we can use S V Bool in conditional functions. All lifted versions of operators have a * postfix and all lifted version of functions have a B postfix. Boolean also provides an ifB function that works the same way as the normal if statement but on lifted booleans (just without the then and else syntax). Here's an example on a function that uses conditionals on S-values:
f :: S V Float -> S V Float -> S V Float
f x y = ifB (x <* y &&* x /=* 3) (x * 2) (minB y 3)

Besides ifB provided by Boolean, GPipe also defines three other ways of doing conditional branching:
ifThen :: forall a x. ShaderType a x => S x Bool -> (a -> a) -> a -> a
ifThenElse :: forall a b x. (ShaderType a x, ShaderType b x) => S x Bool -> (a -> b) -> (a -> b) -> a -> b
ifThenElse' :: forall a x. ShaderType a x => S x Bool -> a -> a -> a

They don't seem to bring much more than just another constraint than ifB, and they are indeed almost functionally equivalent, except in rare cases when using implicit derivatives in a FragmentStream; I'll cover that in next part of the tutorial. Another difference is that these functions in many cases generate more efficient shader code for the GPU than ifB. Rule of thumb is to use the most specialized if-function whenever possible, and basically only use ifB if you want a generic function that works on normal non-lifted values as well.

The ShaderType constraint is also used by this function:
while :: forall a x. ShaderType a x => (a -> S x Bool) -> (a -> a) -> a -> a

You can't use a regular recursive function to express loops with S-values, since you cannot select a base case to stop the recursion based on a lifted S x Bool value. Instead, you can use this while function, which takes a conditional function, a transformation function and an initial value. It then does the transformation repeatedly as long as the conditional function returns true. This loop is made on the GPU when running the shader, so if it never returns true your computer might hang (unless you have GPU hang detection like Windows which in most cases just kills your application). Here's an example of a function that multiplies a float value by 0.9 x number of times:
f :: S V Int -> S V Float -> S V Float
f x y = snd $ while ((<* x) . fst) (\(i,n) -> (i+1, n*0.9)) (0, y)

You may create your own instances of the ShaderType type class if you have a type which structure can be mapped to any of the existing instances'. Here is an example of such an instance:
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, UndecidableInstances #-}
data MyType x = MyType (S x Float) (S x Int)

instance ShaderType (MyType x) x where
  type ShaderBaseType (MyType x) = ShaderBaseType (S x Float, S x Int)
  toBase x ~(MyType f i) = toBase x (f, i)
  fromBase x b = let (f,i) = fromBase x b in (MyType f i)

One thing to remember is to make toBase completely lazy, using tilde (~) in pattern matches and all. And yeah, UndecidableInstances is currently needed for own instances of ShaderType, sorry about that!

Combining Shader monads

Before we end this part of the tutorial, I just want to show you some cool stuff about the Shader monad.

First of all, in order to be able to reuse Shader actions in many different settings we need a way to change the shader environment. That is done with this combinator:
mapShader :: (s -> s') -> Shader os s' a -> Shader os s a

With mapShader you provide a function that extracts a sub environment from another environment, and a Shader monad that operates in that sub environment that then will be turned into a Shader that operates in the other environment.

The Shader monad is not only a monad, it is also a MonadPlus (and an Alternative which is basically the same thing). This enables us to create alternative branches with mplus (or <|>), and to discriminate among the branches with the guard function from Control.Monad. The left most branch without a guard False action will be the one that is run. (A Shader where all branches has guard False actions will throw an error.) This is all kind of cool, but what is even cooler is this:
guard' :: (s -> Bool) -> Shader os s ()

It looks almost exactly like guard, except that instead of a Bool it takes a function that retrieves a Bool from the shader environment. This enables us to select among branches at shader run time instead of shader compile time! You could for instance have a bool in the shader environment to denote whether we want to render shadows or not, and be able to toggle that in runtime.

Derived from mapShader and guard', GPipe defines these two useful shader combinators:
maybeShader :: (s -> Maybe s') -> Shader os s' () -> Shader os s ()
chooseShader :: (s -> Either s' s'') -> Shader os s' a -> Shader os s'' a -> Shader os s a

The first one, maybeShader, works almost like mapShader, only that the function provided only maybe returns a sub environment. If it doesnt (i.e. it returns Nothing) maybeShader does nothing. Since it might not run, no return value besides () can be expected.

chooseShader will run one of two provided Shader actions, each in a (possibly different) sub environment. Since exactly one of the actions always will be run, this may return a value as well.

I think that's all for this time! Stay tuned for next part where we will look at textures and samplers!

5 kommentarer:

  1. You use 'uniform' in example where you rotate triangles for the first time. It should be replaced with '0.2' since 'uniform' wasn't defined yet (in following section).

  2. Svar
    1. You are absolutely right! Have changed it now. Thanks for spotting it!

  3. If the code does not compile for you (e.g. vanilla GHC 7.10.3), consider adding 'FlexibleContexts' extension in the LANGUAGE pragma.

    Full LANGUAGE pragma I used to compile the code:

    {-# LANGUAGE ScopedTypeVariables, PackageImports, FlexibleContexts, TypeFamilies #-}

  4. You mention that you have to take the modulus of the angle before feeding it to sin/cos to avoid imprecision. To my knowledge most machines these days have a repeating texture on the GPU for sin/cos, is this taken advantage of when sin/cos are called in a GPipe program? If so then I assume the use of mod'' is just a safeguard in case the user isn't on one of these said machines.

    What's the case for functions implemented in hardware in general?