Sunday, November 1, 2015

GPU programming in Haskell using GPipe - Part 5


< Previous episode: Textures and Samplers

We are almost at the end of this tutorial series, but before we depart I'll have to show you the ins and outs of drawing. Drawing is the process of having each fragment in a FragmentStream affect a pixel in an image, and the ultimate goal of any graphics program. We start with drawing to the window, and then we will look at drawing to off-screen images.

Window formats

Most GPipe programs have a shader that ends with drawing to the window. When we created our window back in part 1, we gave WindowFormatColor RGB8 as parameter to the newWindow call. This parameter described the format of the window, and is defined by this type:
data WindowFormat c ds where
  WindowFormatColor :: ContextColorFormat c => Format c -> WindowFormat c ()
  WindowFormatColorDepth :: ContextColorFormat c => Format c -> Format Depth -> WindowFormat c Depth
  WindowFormatColorStencil :: ContextColorFormat c => Format c -> Format Stencil -> WindowFormat c Stencil
  WindowFormatColorDepthStencilSeparate :: ContextColorFormat c => Format c -> Format Depth -> Format Stencil -> WindowFormat c DepthStencil
  WindowFormatColorDepthStencilCombined :: ContextColorFormat c => Format c -> Format DepthStencil -> WindowFormat c DepthStencil
  WindowFormatDepth :: Format Depth -> WindowFormat () Depth
  WindowFormatStencil :: Format Stencil -> WindowFormat () Stencil
  WindowFormatDepthStencilSeparate :: Format Depth -> Format Stencil -> WindowFormat () DepthStencil
  WindowFormatDepthStencilCombined :: Format DepthStencil -> WindowFormat () DepthStencil

The data stored for a window is called a frame buffer, which may have any combination of three different image buffers: a color buffer, a depth buffer and a stencil buffer. The WindowFormat type is parameterized on what type of color and what combination of depth and stencil buffers it contains. The types of these parameters come from the constructors' Format arguments (the same kind of Format we previously used for defining textures). () is used to denote the lack of a buffer. In all examples so far, we have only used a window format with an RGB8 color buffer and no depth or stencil buffers, hence a window format of type WindowFormat RGBFloat (). A window's frame buffer only ever uses float colors, never integral colors.
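If you for example want to use the depth test later in this part, you would instead create the window with a format that also contains a depth buffer. Here is a minimal sketch (to be placed inside runContextT just like in part 1, with a window title I just made up):

win <- newWindow (WindowFormatColorDepth RGB8 Depth16) (GLFW.defaultWindowConfig "Window with depth")

The resulting window has type Window os RGBFloat Depth, which is what the c and ds type parameters of the drawing actions below will match against.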

Window drawing actions

Drawing is done as a Shader action, in fact the only one with a side effect! There are different drawing actions to choose from depending on which of the color, depth or stencil buffers you want to use (given that they exist):
drawWindowColor :: forall os s c ds. ContextColorFormat c => (s -> (Window os c ds, ContextColorOption c)) -> FragmentStream (FragColor c) -> Shader os s ()
drawWindowDepth :: forall os s c ds. DepthRenderable ds => (s -> (Window os c ds, DepthOption)) -> FragmentStream FragDepth -> Shader os s ()
drawWindowColorDepth :: forall os s c ds. (ContextColorFormat c, DepthRenderable ds) => (s -> (Window os c ds, ContextColorOption c, DepthOption)) -> FragmentStream (FragColor c, FragDepth) -> Shader os s ()
drawWindowStencil :: forall os s c ds. StencilRenderable ds => (s -> (Window os c ds, StencilOptions)) -> FragmentStream () -> Shader os s ()
drawWindowColorStencil :: forall os s c ds. (ContextColorFormat c, StencilRenderable ds) => (s -> (Window os c ds, ContextColorOption c, StencilOptions)) -> FragmentStream (FragColor c) -> Shader os s ()
drawWindowDepthStencil :: forall os s c ds. (DepthRenderable ds, StencilRenderable ds) => (s -> (Window os c ds, DepthStencilOption)) -> FragmentStream FragDepth -> Shader os s ()
drawWindowColorDepthStencil :: forall os s c ds. (ContextColorFormat c, DepthRenderable ds, StencilRenderable ds) => (s -> (Window os c ds, ContextColorOption c, DepthStencilOption)) -> FragmentStream (FragColor c, FragDepth) -> Shader os s ()

If you for example have a window format of type WindowFormat RGBFloat Depth, you may use all of these except the ones that have Stencil in their name.

Remember from part 1 that the window's frame buffer is double buffered and these drawing actions will draw to the hidden back buffer. To make the drawn image visible on screen, you need to make a call to swapWindowBuffers after you have done all your drawing.

In all previous examples we have been using clearWindowColor for clearing the window. There are clearWindowDepth, clearWindowStencil and clearWindowDepthStencil functions for clearing a window's depth or stencil buffers as well. What value to clear a buffer with depends on the use case, but usually you use 0 (i.e. black) for colors and 1 for depths.
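With a window of type Window os RGBFloat Depth, the start of a frame would typically look something like this sketch (a Render action, run with render just like in the earlier parts):

render $ do
  clearWindowColor win 0 -- clear the color buffer to black
  clearWindowDepth win 1 -- clear the depth buffer to the farthest value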

Common to all drawing actions is that they take two arguments: a function to retrieve the window and drawing options from the shader environment, and a FragmentStream to draw. The options retrieved from the shader environment differ depending on which of the color, depth or stencil buffers are used. The type of the fragments in the FragmentStream also depends on whether color or depth is used. Let's look at how each of color, depth and stencil works separately.

Drawing colors

All of the drawing actions listed above that have Color in their name will retrieve at least a ContextColorOption c from the shader environment, and require a FragmentStream with fragments containing a FragColor c; some of them additionally require options or fragment values for depth or stencil tests. The ContextColorOption c data type looks like this:
data ContextColorOption f = ContextColorOption Blending (ColorMask f)
type ColorMask f = Color f Bool
data Blending
  = NoBlending
  | BlendRgbAlpha (BlendEquation, BlendEquation) (BlendingFactors, BlendingFactors) ConstantColor
  | LogicOp LogicOp

It consists of a Blending and a ColorMask. The latter is a Color of Bool components, e.g. a V3 Bool for an RGB color buffer. You can use the color mask to suppress drawing of specific color channels by setting the corresponding components to False.

The Blending setting determines how each fragment gets combined with the image's previous pixel value to make a new pixel value. In most cases NoBlending is used, which makes each fragment simply overwrite the pixel's previous value (for the color channels where the ColorMask is True, that is). If BlendRgbAlpha is used, then the new pixel value will become the result of a pair of BlendEquations, where the first determines how the RGB components are blended and the second how the A component (if it exists) is blended. A BlendEquation may be one of five symbolic values that each represent a different function, so it's not quite as flexible as you might have hoped. The fragment and the previous pixel values will also each be multiplied with a BlendingFactor (an enum of 15 predefined symbolic values) before the BlendEquations are applied.

If this blending is used:
BlendRgbAlpha (eqRGB, eqA) (BlendingFactors srcFactRGB destFactRGB, BlendingFactors srcFactA destFactA) color
then the RGB and Alpha of the new pixel will be calculated like this:
(pseudo code)
newPixelRGB = eqRGB (srcFactRGB color * fragmentRGB) (destFactRGB color * previousPixelRGB)
newPixelA = eqA (srcFactA color * fragmentA) (destFactA color * previousPixelA)

The ConstantColor argument is only used by some values of BlendingFactors, namely ConstantColor, OneMinusConstantColor, ConstantAlpha and OneMinusConstantAlpha. If none of these are used by your Blending value, you may pass undefined for the ConstantColor. The ConstantColor is always a V4 Float, no matter what the actual window color format is, which means that not all of its components may end up being used. The reason for this seeming deficiency in the design will become apparent in a bit, when we learn that we can draw to multiple color images at once (of possibly different formats) but can only use a single Blending for them all.
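As an example, here is a sketch of classic alpha blending for a window with an RGBA color buffer (i.e. c ~ RGBAFloat), assuming the FuncAdd equation and the SrcAlpha/OneMinusSrcAlpha factors from GPipe's BlendEquation and BlendingFactor enums. None of the factors use the constant color, so undefined is passed for it:

alphaBlending :: ContextColorOption RGBAFloat
alphaBlending =
  ContextColorOption
    (BlendRgbAlpha (FuncAdd, FuncAdd)
                   (BlendingFactors SrcAlpha OneMinusSrcAlpha, BlendingFactors One Zero)
                   undefined)
    (pure True) -- write to all four channels

Plugging this into the pseudo code above gives newPixelRGB = fragmentA * fragmentRGB + (1 - fragmentA) * previousPixelRGB, i.e. ordinary "over" compositing.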

The last variant of Blending is LogicOp. This is yet another enum of predefined symbolic functions, and it is used for any colors that have an internal integral representation. That means for example RGB8 (which has an internal unsigned byte representation per component even though you use S F Floats to write to it) or RGBUI (which has the same internal representation as RGB8), but not for example RGBA32F (which uses floating point values internally). For colors with internal floating point representations, using LogicOp is equivalent to NoBlending.

Depth test

The drawing actions that have Depth in their name can be used with any window that has a depth buffer. These drawing actions will all perform an additional depth test and write values to the window's depth buffer. A depth buffer consists of a fixed point value in the range [0, 1] for each pixel (with a precision of 16, 24 or 32 bits).

The Depth drawing actions want a DepthOption or a DepthStencilOption from the shader environment, and they all need a FragmentStream with fragments that contain a FragDepth value (which is just a synonym for S F Float).

The DepthOption is defined like this:
data DepthOption = DepthOption DepthFunction DepthMask
type DepthMask = Bool

The DepthFunction is, like the BlendEquation, an enum with predefined symbolic functions. For each fragment, this function will compare the fragment's FragDepth value to the depth buffer's previous value. If the function returns true, the depth value will be updated with the fragment's value (unless the DepthMask is False) and the color (if available) will be written according to the ContextColorOption. If the DepthMask is False, the test is still always performed; it's just the updating of the depth buffer value upon passed tests that is skipped.
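For example, the setting used in the demo at the end of this part is simply:

DepthOption Less True -- keep fragments closer than what's already drawn, and update the depth buffer with them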

The FragDepth value may be calculated in a normal fmap on the FragmentStream, but most commonly the rasterized depth is what you want. To get this and other rasterized values, you can use the function withRasterizedInfo:
withRasterizedInfo :: (a -> RasterizedInfo -> b) -> FragmentStream a -> FragmentStream b
data RasterizedInfo = RasterizedInfo {
  rasterizedFragCoord :: V4 FFloat,
  rasterizedFrontFacing :: FBool,
  rasterizedPointCoord :: V2 FFloat
}

This works just like fmap on a FragmentStream, but augments the function with a RasterizedInfo record that, among other things, contains the fragment's position in window space: rasterizedFragCoord. The z component of this member is the rasterized depth, and this is what you usually pass on as FragDepth to the drawing actions.
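As a small sketch, this is how you typically attach the rasterized depth to a stream of fragment colors before handing it to one of the drawing actions that want a FragDepth (the same pattern is used in the demo below):

let fragmentStreamWithDepth =
      withRasterizedInfo (\color info -> (color, rasterizedFragCoord info ^. _z)) fragmentStream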

The common convention for depth tests is to use Less as DepthFunction, and let the depth value increase with distance from viewer. This is what I used in the spinning box example from the announcement of GPipe 2.

When the drawing action uses a DepthStencilOption instead, the DepthOption is provided as an argument to its constructor.

Stencil test

A window may also contain a stencil buffer, which contains an integral value of 1, 4, 8 or 16 bits. If a stencil buffer is available, then any of the drawing actions with Stencil in their names may be used. These will all perform a stencil test before any depth test or blending. The stencil drawing actions may also update the stencil buffer's values differently depending on whether the stencil test passes or fails, or even when the stencil test passes but the depth test fails. This is unlike the depth test, which may only update the depth buffer's values where the test passes.

A stencil drawing action will not require any special values from the fragments to perform its test. It does require a StencilOptions value from the shader environment though, which is defined like this:
type StencilOptions = FrontBack StencilOption
data FrontBack a = FrontBack { front :: a, back :: a }
data StencilOption = StencilOption {
  stencilTest         :: ComparisonFunction,
  stencilReference    :: Int,
  opWhenStencilFail   :: StencilOp,
  opWhenStencilPass   :: StencilOp,
  stencilReadBitMask  :: Word,
  stencilWriteBitMask :: Word
  }

Different stencil tests are specified for front facing and back facing primitives (in the case of triangles, that is; lines and points are always front facing). For each of the two tests, a ComparisonFunction is used just like for depth tests (DepthFunction is just a type synonym for ComparisonFunction). As previously mentioned, fragments don't carry individual values to use for this comparison like they do for the depth test; instead a stencilReference is provided. This value will first be clamped to the same range as the stencil buffer's values. Then it will be masked with stencilReadBitMask, as will the stencil buffer's value, and then the stencilTest will be performed on these masked values.

If the test fails, the stencil value will be updated by the symbolic function specified by opWhenStencilFail. If it passes and no depth tests are to be made, the color (if available) will be blended and the stencil value will be updated with opWhenStencilPass instead. When updating the stencil value, the stencilWriteBitMask will be used to determine what bits will be updated.
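As a sketch, here is a StencilOptions value that only lets fragments through where the stencil buffer holds the value 1, without ever modifying the stencil buffer (assuming the Equal comparison and the OpKeep stencil operation from GPipe's enums):

drawOnlyWhereMarked :: StencilOptions
drawOnlyWhereMarked = FrontBack opt opt -- same test for front facing and back facing primitives
  where
    opt = StencilOption
      { stencilTest         = Equal
      , stencilReference    = 1
      , opWhenStencilFail   = OpKeep
      , opWhenStencilPass   = OpKeep
      , stencilReadBitMask  = maxBound
      , stencilWriteBitMask = maxBound
      }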

If the drawing action is doing both stencil and depth tests, the option value retrieved from the shader environment will be:
data DepthStencilOption = DepthStencilOption {
  dsStencilOptions              :: StencilOptions,
  dsDepthOption                 :: DepthOption,
  opWhenStencilPassButDepthFail :: FrontBack StencilOp
  }

The dsStencilOptions and dsDepthOption work as before, with the addition that the depth test is only performed if the stencil test passes. If the depth test fails, then opWhenStencilPassButDepthFail will be applied to the stencil buffer's value instead of opWhenStencilPass.
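As a sketch of how the pieces fit together, here is a DepthStencilOption that depth tests as usual but counts, per pixel, how many fragments passed the stencil test yet were hidden by the depth buffer (a building block of shadow volume style techniques; I'm again assuming the Always, OpKeep and OpIncr names from GPipe's enums):

countHiddenFragments :: DepthStencilOption
countHiddenFragments = DepthStencilOption
  { dsStencilOptions              = FrontBack opt opt
  , dsDepthOption                 = DepthOption Less False -- depth test, but don't write the depth buffer
  , opWhenStencilPassButDepthFail = FrontBack OpIncr OpIncr -- count fragments hidden by the depth buffer
  }
  where opt = StencilOption Always 0 OpKeep OpKeep maxBound maxBound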

Custom filtering of fragments

Both the stencil test and the depth test may discard fragments so that their colors aren't drawn. There is yet another way of discarding fragments that doesn't involve additional buffers:
filterFragments :: (a -> FBool) -> FragmentStream a -> FragmentStream a

This function works just like filter does on normal lists: only fragments for which the predicate evaluates to true will be included in the resulting FragmentStream. If you already know OpenGL, I can tell you that this corresponds to using discard in a fragment shader.
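For example, assuming a made up fragment stream where every fragment carries a pair of an RGB color and an alpha value, an alpha test could look like this sketch:

let opaqueFragments = fmap fst $ filterFragments (\(_, alpha) -> alpha >* 0.5) rgbaFragmentStream

The >* operator is the lifted comparison returning an FBool, the same one used in the demo below.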

Drawing to texture images

There are many rendering techniques that require you to do more than one pass through the pipeline, which in turn requires a way to render fragments to an off-screen image instead of the window. In GPipe you will use a portion of a texture as an off-screen image. An image is a single 2D array of pixels, while textures come in many different dimensionalities (all of which can have multiple LOD levels), so every texture contains multiple images (each image in a Texture1D or Texture1DArray has height 1). You get one of the images from a texture using any of the getTextureXXXImage functions. Just as with all the other texturing functions, there is one version for each of the six texture types. This is what the 2D variant looks like:
getTexture2DImage :: Texture2D os f -> Level -> Render os (Image f)

This will give you an Image f of a format f that is a reference to a specific level of the given texture. Any drawing to this image will affect the original texture. Since this could make it hard to reason about the state of a texture when also sampling from it, you may not use getTexture2DImage and newSampler2D on the same texture within the same Render monad (and analogously for the other texture types). Trying to do so will generate a run time error (I really wish Haskell had the region inference capabilities of Rust so this could have been checked at compile time). This means that you need to split a two-pass rendering into two separate render calls, as the demo at the end of this part does.

Once we have one or more images, drawing to them isn't radically different from drawing to the window. You will use a different set of functions:
draw :: forall a os s. (s -> Blending) -> FragmentStream a -> (a -> DrawColors os s ()) -> Shader os s ()
drawDepth :: forall a os s d. DepthRenderable d => (s -> (Blending, Image (Format d), DepthOption)) -> FragmentStream (a, FragDepth) -> (a -> DrawColors os s ()) -> Shader os s ()
drawStencil :: forall a os s st. StencilRenderable st => (s -> (Blending, Image (Format st), StencilOptions)) -> FragmentStream a -> (a -> DrawColors os s ()) -> Shader os s ()
drawDepthStencil :: forall a os s d st. (DepthRenderable d, StencilRenderable st) => (s -> (Blending, Image (Format d), Image (Format st), DepthStencilOption)) -> FragmentStream (a, FragDepth) -> (a -> DrawColors os s ()) -> Shader os s ()

As you can see, in addition to the DepthOption you provide an Image (Format d) to the versions using depth tests, and in addition to the StencilOptions an Image (Format st) to the stencil versions. All versions take a Blending parameter from the shader environment, but apparently no image for the colors? No, instead these drawing actions expect a fragment stream of some a (plus a FragDepth for those doing depth tests) and also want a function a -> DrawColors os s ().

DrawColors os s a is a monad in which you draw colors. When drawing to off-screen images, you may actually draw each fragment stream to multiple color images (but still only use a single image for the depth test and a single image for the stencil test). If the depth and stencil tests of the drawing action succeed, then the a -> DrawColors os s () function will be run for each fragment's a. To draw a color inside the DrawColors monad, you use
drawColor :: forall c s os. ColorRenderable c => (s -> (Image (Format c), ColorMask c, UseBlending)) -> FragColor c -> DrawColors os s ()

Besides the FragColor c, which you get from the a that was passed to the a -> DrawColors os s () function, you also need to provide a function that retrieves the following from the shader environment: an image to draw the colors to, a ColorMask c and a Bool telling whether blending should be used or not (UseBlending is simply a synonym for Bool). The Blending itself was specified for the entire drawing action above, meaning that if you want to draw multiple colors, you can choose which of them will use blending, but those that do will all use the same kind of blending. For color images where UseBlending is False, it is as if NoBlending was used (i.e. the image's previous pixel value will be overwritten with the fragment's value).
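As a sketch, this is how drawing each fragment into two color images at once could look, with blending turned off for both (the pair in the fragment stream and the environment accessors diffuseImage and normalImage are made up for this example):

draw (const NoBlending) fragmentStream $ \(diffuse, normal) -> do
  drawColor (\s -> (diffuseImage s, pure True, False)) diffuse
  drawColor (\s -> (normalImage s, pure True, False)) normal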

The number of colors you may draw to in the same drawing action is hardware dependent, and a GPipeException will be thrown from the compileShader call of any shader that exceeds this limit.

When using multiple images of different sizes in a single draw call (including the depth or stencil images) only fragments that lie inside all of the images will be drawn. The rest will simply be discarded and those parts of the larger images remain untouched.

To clear an entire image, you use the render actions clearImageColor, clearImageDepth, clearImageStencil and clearImageDepthStencil. They work exactly like their ...Window... counterparts, only that you need to tell which image to clear for each of them.

Demo time!

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import "lens" Control.Lens
import Control.Monad (unless)
import Data.Word (Word32)
import Control.Applicative (pure)
import Data.Monoid (mappend)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Checkers")
    vertexBuffer :: Buffer os (B2 Float) <- newBuffer 4
    writeBuffer vertexBuffer 0 [V2 0 0, V2 1 0, V2 0 1, V2 1 1]
    tex <- newTexture2D R8 (V2 8 8) 1
    let whiteBlack = cycle [minBound,maxBound] :: [Word32]
        blackWhite = tail whiteBlack
    writeTexture2D tex 0 0 (V2 8 8) (cycle (take 8 whiteBlack ++ take 8 blackWhite))

    colorTex <- newTexture2D RG8 (V2 256 256) 1
    depthTex <- newTexture2D Depth16 (V2 256 256) 1

    shader1 <- compileShader $ do
      texMappedFragmentStream <- getProjectedFragments 256 (V3 0.5 (-0.8) (-0.8)) (V3 0.5 0.5 0) (V3 0 1 0)  textureMappedPrimitives
      solidFragmentStream <- getProjectedFragments 256 (V3 (-0.6) (-0.6) 0.8) (V3 0.25 0.25 0) (V3 0 1 0) solidPrimitives
      let filter = SamplerFilter Nearest Nearest Nearest Nothing
          edge = (pure ClampToEdge, 0)
      samp <- newSampler2D (const (tex, filter, edge))
      let sampleTexture = sample2D samp SampleAuto Nothing Nothing
          texMappedFragmentStream2 = filterFragments ((>* 0.5) . sampleTexture) texMappedFragmentStream
          texMappedFragmentStream3 = fmap (const (V2 1 0)) texMappedFragmentStream2
          solidFragmentStream2 = fmap (const (V2 0 1)) solidFragmentStream
          fragmentStream = solidFragmentStream2 `mappend` texMappedFragmentStream3
          fragmentStream2 = withRasterizedInfo (\a r -> (a, rasterizedFragCoord r ^. _z)) fragmentStream
      drawDepth (\s -> (NoBlending, depthImage s, DepthOption Less True)) fragmentStream2 $ \ a -> do
        drawColor (\ s -> (colorImage s, pure True, False)) a

    shader2 <- compileShader $ do
      fragmentStream <- getProjectedFragments 800 (V3 1 2 2) (V3 0.5 0.5 0) (V3 0 1 0) id

      let filter = SamplerFilter Linear Linear Nearest Nothing
          edge = (pure ClampToEdge, 0)
      samp <- newSampler2D (const (colorTex, filter, edge))
      let sampleTexture = sample2D samp SampleAuto Nothing Nothing
          fragmentStream2 = fmap ((\(V2 r g) -> V3 r 0 g) . sampleTexture) fragmentStream
      drawWindowColor (const (win, ContextColorOption NoBlending (pure True))) fragmentStream2

    renderLoop win [
      do
        vertexArray <- newVertexArray vertexBuffer
        let singleTriangle = takeVertices 3 vertexArray
        cImage <- getTexture2DImage colorTex 0
        dImage <- getTexture2DImage depthTex 0
        clearImageColor cImage 0
        clearImageDepth dImage 1
        shader1 $ ShaderEnvironment
            (toPrimitiveArray TriangleStrip vertexArray)
            (toPrimitiveArray TriangleList singleTriangle)
            cImage
            dImage
      ,
      do
        clearWindowColor win 0.5
        vertexArray <- newVertexArray vertexBuffer
        shader2 (toPrimitiveArray TriangleStrip vertexArray)
      ]

getProjectedFragments size eye center up sf = do
  primitiveStream <- toPrimitiveStream sf
  let primitiveStream2 = fmap (\pos2d -> (make3d eye center up pos2d, pos2d)) primitiveStream
  rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 size size), DepthRange 0 1)) primitiveStream2

make3d eye center up (V2 x y) = projMat !*! viewMat !* V4 x y 0 1
  where
    viewMat = lookAt' eye center up
    projMat = perspective (pi/3) 1 1 100

renderLoop win renderings = do
  mapM_ render renderings
  swapWindowBuffers win
  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    renderLoop win renderings

-- Copy of lookAt from linear with normalize replaced with signorm
lookAt' eye center up =
  V4 (V4 (xa^._x)  (xa^._y)  (xa^._z)  xd)
     (V4 (ya^._x)  (ya^._y)  (ya^._z)  yd)
     (V4 (-za^._x) (-za^._y) (-za^._z) zd)
     (V4 0         0         0          1)
  where za = signorm $ center - eye
        xa = signorm $ cross za up
        ya = cross xa za
        xd = -dot xa eye
        yd = -dot ya eye
        zd = dot za eye

data ShaderEnvironment = ShaderEnvironment
  {
    textureMappedPrimitives, solidPrimitives  :: PrimitiveArray Triangles (B2 Float),
    colorImage :: Image (Format RGFloat),
    depthImage :: Image (Format Depth)
  }

In this example, we draw a checker textured quad and a solid triangle onto an off-screen image (with two color channels). The quad will have its fragments discarded where the texture has values lower than 0.5. The triangle and the quad intersect, so we use depth testing on them with the help of an additional depth image. The off-screen color image is then, in another render pass, mapped onto a quad that is rendered to the screen, producing this final result:


We could also have saved the off-screen image to disk by using readTexture2D, and not drawn anything to the screen at all, but I leave that as an exercise for the reader.




We have now covered the entire graphical pipeline in GPipe, from context creation and vertex buffers, all the way through the Shader via PrimitiveStreams and FragmentStreams and have finally come out on the other end!

If you haven't already, now would be a good time to study the haddocks. Play around with all of GPipe's functions and combinators. Remember: if it compiles it will most likely run. If it doesn't run, it should at least give you a nice error message. Good luck!