OpenGL : Drawing a lot of Geometry Efficiently (part 2) – Instanced Rendering

Instanced Rendering

There will probably be times when you want to draw the same object many times. Imagine a fleet of starships, or a field of grass. There could be thousands of copies of what are essentially identical sets of geometry, modified only slightly from instance to instance. A simple application might just loop over all of the individual blades of grass in a field and render them separately, calling glDrawArrays once for each blade and perhaps updating a set of shader uniforms on each iteration. Supposing each blade of grass were made up of a strip of four triangles, the code might look something like Listing 2.

Listing 2. Drawing the Same Geometry Many Times
glBindVertexArray(grass_vao);
for (int n = 0; n < number_of_blades_of_grass; n++) {
    SetupGrassBladeParameters();
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 6);
}

 

How many blades of grass are there in a field? What is the value of number_of_blades_of_grass? It could be thousands, maybe millions. Each blade of grass is likely to take up a very small area on the screen, and the number of vertices representing the blade is also very small. Your graphics card doesn’t really have a lot of work to do to render a single blade of grass, and the system is likely to spend most of its time sending commands to OpenGL rather than actually drawing anything.

OpenGL addresses this through instanced rendering, which lets you draw many copies of the same geometry with a single function call. This functionality is accessed through instanced rendering functions, such as

void glDrawArraysInstanced(GLenum mode, GLint first, GLsizei count, GLsizei primcount);	  

and

void glDrawElementsInstanced(GLenum mode, GLsizei count, GLenum type, const void * indices, GLsizei primcount);		  

 

These two functions behave much like glDrawArrays and glDrawElements, except that they tell OpenGL to render primcount copies of the geometry. The first parameters of each (mode, first, and count for glDrawArraysInstanced, and mode, count, type, and indices for glDrawElementsInstanced) take the same meaning as in the regular, noninstanced versions of the functions. When you call one of these functions, OpenGL makes any preparations it needs to draw your geometry (such as copying vertex data to the graphics card’s memory) only once and then renders the same vertices many times.

If all that these functions did were send many copies of the same vertices to OpenGL as if glDrawArrays or glDrawElements had been called in a tight loop, they wouldn’t be very useful. One of the things that makes instanced rendering usable and very powerful is a special, built-in variable in GLSL named gl_InstanceID. The gl_InstanceID variable appears in GLSL as if it were an integer uniform. When the first copy of the vertices is sent to OpenGL, gl_InstanceID will be zero. It will then be incremented once for each copy of the geometry and will eventually reach primcount - 1. Because gl_InstanceID is an integer, there is a practical upper limit of a couple of billion instances that you can render in one call to glDrawArraysInstanced or glDrawElementsInstanced, but that should be enough for the vast majority of applications. If you need to render more than two billion copies of your geometry, your application will probably run very slowly anyway, and you won’t see a significant performance penalty for breaking your rendering into blocks of, say, one billion instances.

The glDrawArraysInstanced function essentially operates as if the code in Listing 3 were executed.

Listing 3. Pseudo-code Illustrating the Behavior of glDrawArraysInstanced
// Loop over all of the instances (i.e. primcount)
for (int n = 0; n < primcount; n++) {
    // Set the gl_InstanceID uniform – here gl_InstanceID is a C variable
    // holding the location of the 'virtual' gl_InstanceID uniform.
    glUniform1i(gl_InstanceID, n);
    // Now, when we call glDrawArrays, the gl_InstanceID variable in the
    // shader will contain the index of the instance that's being rendered.
    glDrawArrays(mode, first, count);
}

 

Likewise, the glDrawElementsInstanced function operates similarly to the code in Listing 4.

Listing 4. Pseudo-code Illustrating the Behavior of glDrawElementsInstanced
for (int n = 0; n < primcount; n++) {
    // Set the value of gl_InstanceID
    glUniform1i(gl_InstanceID, n);
    // Make a normal call to glDrawElements
    glDrawElements(mode, count, type, indices);
}

 

Of course, gl_InstanceID is not a real uniform, and you can’t get a location for it by calling glGetUniformLocation. The value of gl_InstanceID is managed by OpenGL and is very likely generated in hardware, meaning that it’s essentially free to use in terms of performance. The power of instanced rendering comes from imaginative use of this variable, along with instanced arrays, which are explained in a moment.

The value of gl_InstanceID can be used directly as a parameter to a shader function or to index into data such as textures or uniform arrays. To return to our example of the field of grass, let’s figure out what we’re going to do with gl_InstanceID to make our field not just be thousands of identical blades of grass growing out of a single point. Each of our grass blades is made out of a little triangle strip with four triangles in it, a total of just six vertices. It could be tricky to get them to all look different. However, with some shader magic, we can make each blade of grass look sufficiently different so as to produce an interesting output. We won’t go over the shader code here, but we walk through a few ideas of how you can use gl_InstanceID to add variation to your scenes.

First, we need each blade of grass to have a different position; otherwise, they’ll all be drawn on top of each other. Let’s arrange the blades of grass more or less evenly. If the number of blades of grass we’re going to render is a power of two, we can use half the bits of gl_InstanceID to represent the x coordinate of a blade and the other half to represent the z coordinate (our ground lies in the x-z plane, with y being altitude). For this example, we render 2^20, or a little over a million blades of grass (actually 1,048,576 blades, but who’s counting?). By using the ten least significant bits (bits 9 through 0) as the x coordinate and the ten most significant bits (19 through 10) as the z coordinate, we have a uniform grid of grass blades. Let’s take a look at Figure 2 to see what we have so far.

 

Figure 2. First attempt at an instanced field of grass.
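
The bit-splitting described above is easy to try out on the CPU. The following is only a minimal C sketch of that decoding, not the shader used to produce the figure; the names GridCell and grid_cell_from_instance are made up for illustration, and the 10/10 bit split assumes the 2^20 blades used in this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: integer grid coordinates of one blade. */
typedef struct {
    uint32_t x;
    uint32_t z;
} GridCell;

/* Split a 20-bit instance index into two 10-bit grid coordinates.
   Bits 9..0 become x and bits 19..10 become z, as described in the text. */
GridCell grid_cell_from_instance(uint32_t instance_id) {
    assert(instance_id < (1u << 20)); /* this sketch assumes 2^20 blades */
    GridCell cell;
    cell.x = instance_id & 0x3FFu;         /* ten least significant bits */
    cell.z = (instance_id >> 10) & 0x3FFu; /* next ten bits */
    return cell;
}
```

In a vertex shader the same masks and shifts would be applied to gl_InstanceID itself, so no per-blade position data needs to be uploaded.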

 

Our uniform grid of grass probably looks a little plain, as if a particularly attentive groundskeeper hand-planted each blade. What we really need to do is displace each blade of grass by some random amount within its grid square. That’ll make the field look a little less uniform. A simple way of generating random numbers is to multiply a seed value by a large number, take a subset of the bits of the resulting product, and use it as the input to a function. We’re not aiming for a perfect distribution here, so this simple generator should do. Usually, with this type of algorithm, you’d reuse the seed value as input to the next iteration of the random number generator. In this case, though, we can just use gl_InstanceID directly, as we’re really generating the next few numbers after gl_InstanceID in a pseudo-random sequence. By iterating over our pseudo-random function only a couple of times, we can get a reasonably random distribution. Because we need to displace in both x and z, we generate two successive random numbers from gl_InstanceID and use them to displace the blade of grass within the plane. Look at Figure 3 to see what we get now.

 

Figure 3. Slightly perturbed blades of grass.
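
One possible shape for such a generator is sketched below in C. The text does not give the actual constants or bit selections, so the multiplier 15485863, the shift amounts, the iteration counts, and the names scramble, to_unit_float, and blade_position are all assumptions chosen for illustration. Unsigned arithmetic is used so that overflow wraps around predictably.

```c
#include <assert.h>
#include <stdint.h>

/* Mix the bits of a seed, then multiply by a large constant.
   The shifts and the multiplier are illustrative assumptions. */
uint32_t scramble(uint32_t seed, int iterations) {
    assert(iterations >= 0);
    uint32_t value = seed;
    for (int i = 0; i < iterations; i++) {
        value = ((value >> 7) ^ (value << 9)) * 15485863u;
    }
    return value;
}

/* Keep the top 24 bits and map them to a float in [0, 1). */
float to_unit_float(uint32_t bits) {
    return (float)(bits >> 8) / 16777216.0f; /* divide by 2^24 */
}

typedef struct {
    float x;
    float z;
} BladePosition;

/* Grid position from the instance index, plus two successive
   pseudo-random offsets that nudge the blade inside its grid square. */
BladePosition blade_position(uint32_t instance_id) {
    uint32_t r1 = scramble(instance_id, 3); /* first random number */
    uint32_t r2 = scramble(r1, 2);          /* next one in the sequence */
    BladePosition p;
    p.x = (float)(instance_id & 0x3FFu) + to_unit_float(r1);
    p.z = (float)((instance_id >> 10) & 0x3FFu) + to_unit_float(r2);
    return p;
}
```

Note that with this particular mixing step a seed of zero stays zero, so blade 0 is not displaced at all. That is harmless for a field of a million blades, but a different generator may be preferable if it matters.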

 

At this point, our field of grass is distributed evenly with random perturbations in position for each blade of grass. All the grass blades look the same, though. (Actually, we used the same random number generator to assign a slightly different color to each blade of grass just so that they’d show up in the figures.) We can apply some variation over the field to make each blade look slightly different. This is something that we’d probably want to have control over, so we use a texture to hold information about blades of grass.

You have an x and a z coordinate for each blade of grass that was calculated by generating a grid coordinate directly from gl_InstanceID and then generating a random number and displacing the blade within the x-z plane. That coordinate pair can be used as a coordinate to look up a texel within a 2D texture, and you can put whatever you want in it. Let’s control the length of the grass using the texture. We can put a length parameter in the texture (let’s use the red channel) and multiply the y coordinate of each vertex of the grass geometry by that to make longer or shorter grass. A value of zero in the texture would produce very short (or nonexistent) grass, and a value of one would produce grass of some maximum length. Now you can design a texture where each texel represents the length of the grass in a region of your field. Why not draw a few crop circles? The texture can be sampled with GL_LINEAR sampling, and you can even use mipmapping.
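
To make the lookup concrete, here is a small CPU-side C sketch of what sampling the length parameter and applying it might look like. It is not OpenGL code: LengthMap is a hypothetical stand-in for the red channel of the parameter texture, sample_length only loosely mirrors GL_LINEAR filtering with clamp-to-edge behavior, and the normalized coordinates (u, v) are assumed to be the blade’s x and z divided by the size of the field.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for the red channel of the parameter texture. */
typedef struct {
    int width;
    int height;
    const float *texels; /* row-major, each value in [0, 1] */
} LengthMap;

/* Clamp a continuous texel coordinate to the valid range (clamp-to-edge). */
static float clamp_coord(float c, int size) {
    float limit = (float)(size - 1);
    if (c < 0.0f) return 0.0f;
    if (c > limit) return limit;
    return c;
}

/* Bilinear lookup at normalized coordinates (u, v) in [0, 1]. */
float sample_length(const LengthMap *map, float u, float v) {
    assert(map != 0 && map->width > 0 && map->height > 0);
    /* Texel centers sit at half-integer positions, hence the -0.5. */
    float fx = clamp_coord(u * (float)map->width - 0.5f, map->width);
    float fz = clamp_coord(v * (float)map->height - 0.5f, map->height);
    int x0 = (int)fx, z0 = (int)fz;
    int x1 = (x0 + 1 < map->width) ? x0 + 1 : x0;
    int z1 = (z0 + 1 < map->height) ? z0 + 1 : z0;
    float tx = fx - (float)x0, tz = fz - (float)z0;
    const float *t = map->texels;
    float near_row = t[z0 * map->width + x0] * (1.0f - tx)
                   + t[z0 * map->width + x1] * tx;
    float far_row  = t[z1 * map->width + x0] * (1.0f - tx)
                   + t[z1 * map->width + x1] * tx;
    return near_row * (1.0f - tz) + far_row * tz;
}

/* Scale a blade vertex's height by the sampled length parameter. */
float scaled_height(float vertex_y, float length) {
    return vertex_y * length;
}
```

Because the lookup is filtered, a low-resolution map still gives smooth transitions between short and long grass.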

Now, the grass is evenly distributed over the field, and you have control of the length of the grass in different areas. However, the grass blades are still just scaled copies of each other. Perhaps we can introduce some more variation. Next, we rotate each blade of grass around its axis according to another parameter from the texture. We use the green channel of the texture to store the angle through which the grass blade should be rotated around the y-axis, with zero representing no rotation and one representing a full 360 degrees. We’ve still only done one texture fetch in our vertex shader, and still the only input to the shader is gl_InstanceID. Things are starting to come together. Take a look at Figure 4.

 

Figure 4. Control over the length and orientation of our grass.

 

Our field is still looking a little bland. The grass just sticks straight up and doesn’t move. Real grass sways in the wind and gets flattened when things roll over it. We need the grass to bend, and we’d like to have control over that. Why not use another channel from the parameter texture (the blue channel) to control a bend factor? We can use that as another angle and rotate the grass around the x-axis before we apply the rotation in the green channel. This allows us to make the grass bend over based on the parameter in the texture. Use zero to represent no bending (the grass stands straight up) and one to represent fully flattened grass. Normally, the grass will sway gently, and so the parameter will have a low value. When the grass gets flattened, the value can be much higher.

Finally, we can control the color of the grass. It seems logical to just store the color of the grass in a large texture. This might be a good idea if you want to draw a sports field with lines, markings, or advertising on it, for example, but it’s fairly wasteful if the grass is all varying shades of green. Instead, let’s make a palette for our grass in a 1D texture and use the final channel within our parameter texture (the alpha channel) to store the index into that palette. The palette can start with an anemic-looking dead-grass yellow at one end and a lush, deep green at the other end. Now we read the alpha channel from the parameter texture along with all the other parameters and use it to index into the 1D texture, which is a dependent texture fetch. Our final field is shown in Figure 5.

 

Figure 5. The final field of grass.
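
The palette idea can be illustrated with a few lines of C. This is a simplified sketch rather than real texture sampling: Color and palette_lookup are hypothetical names, the palette is a plain array instead of a 1D texture, and the nearest entry is chosen where a real texture fetch could also filter between neighboring entries.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float r, g, b;
} Color;

/* Pick the palette entry nearest to a normalized index in [0, 1].
   alpha plays the role of the alpha channel of the parameter texture. */
Color palette_lookup(const Color *palette, size_t count, float alpha) {
    assert(palette != NULL && count > 0);
    if (alpha < 0.0f) alpha = 0.0f; /* clamp out-of-range indices */
    if (alpha > 1.0f) alpha = 1.0f;
    size_t index = (size_t)(alpha * (float)(count - 1) + 0.5f);
    return palette[index];
}
```

With a three-entry palette running from yellow to deep green, an alpha of 0 would select the dead-grass end, 1 the lush end, and 0.5 the entry in between.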

 

Now, our final field has a million blades of grass, evenly distributed, with application control over length, “flatness,” direction of bend or sway, and color. Remember, the only input to the shader that differentiates one blade of grass from another is gl_InstanceID, the total amount of geometry sent to OpenGL is six vertices, and the total amount of code required to draw all the grass in the field is a single call to glDrawArraysInstanced.

The parameter texture can be read using linear texturing to provide smooth transitions between regions of grass and can be a fairly low resolution. If you want to make your grass wave in the wind or get trampled as hordes of armies march across it, you can animate the texture by updating it every frame or two and uploading a new version of it before you render the grass. Also, because gl_InstanceID is used to generate random numbers, adding an offset to it before passing it to the random number generator allows a different but predetermined chunk of “random” grass to be generated with the same shader.
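
If the field is drawn in several chunks, each chunk needs its own offset and instance count. The C sketch below only plans those chunks and makes no OpenGL calls; InstanceBatch and plan_batches are hypothetical names, and passing the offset to the shader (for example through a uniform that is added to gl_InstanceID) is an assumption about how an application might wire it up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t first_instance; /* offset to add to gl_InstanceID in the shader */
    uint32_t instance_count; /* value to pass as primcount for this call */
} InstanceBatch;

/* Split `total` instances into batches of at most `max_per_call` each.
   Writes up to `capacity` batches into `out` and returns how many were written. */
size_t plan_batches(uint64_t total, uint32_t max_per_call,
                    InstanceBatch *out, size_t capacity) {
    assert(max_per_call > 0);
    assert(out != NULL || capacity == 0);
    size_t written = 0;
    uint64_t first = 0;
    while (first < total && written < capacity) {
        uint64_t remaining = total - first;
        uint32_t count = (remaining < max_per_call) ? (uint32_t)remaining
                                                    : max_per_call;
        out[written].first_instance = first;
        out[written].instance_count = count;
        written++;
        first += count;
    }
    return written;
}
```

An application would then loop over the planned batches, set the offset for each one, and issue one glDrawArraysInstanced call per batch with instance_count as primcount.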
