Introduction to Computer Graphics Notes
I want to learn about 3D graphics. I plan on displaying some 3D graphics on this site and maybe creating things with 3D graphics in the future, so I want to learn more about them.
References
- Introduction to Computer Graphics; Version 1.4, August 2024; David J. Eck, Hobart and William Smith Colleges
Preface
- All of the programs in this book can be done in JavaScript, but some knowledge of C is also important for certain sections.
- Most sample programs and all demos use HTML canvas graphics, WebGPU, or WebGL.
- OpenGL 1.1 implements fundamental graphics concepts - vertices, normal vectors, coordinate transformations, lighting, and material - in a way that is transparent and fairly easy to use. Newer graphics APIs are more flexible and more powerful, but they have a much steeper learning curve.
- OpenGL is still fully supported, but it is being largely superseded by newer graphics APIs such as Direct3D, Metal, and Vulkan.
Introduction
The term computer graphics refers to anything involved in the creation or manipulation of images on a computer, including animated images. [...] there is a core of fundamental ideas that are part of the foundation of most applications of computer graphics. This book attempts to cover those foundational ideas.
Painting and Drawing
The main focus of this book is three-dimensional (3D) graphics, where most of the work goes into producing a 3D model of a scene. The end result of a computer graphics project is a 2D image. The direct production and manipulation of 2D images is an important topic in its own right.
- An image that is presented on the computer screen is made up of pixels. The screen consists of a rectangular grid of pixels, arranged in rows and columns. Each pixel can only show one color at a time. Most screens use 24-bit color, where a color can be specified by three 8-bit numbers giving the red, green, and blue components of the color. Any color on a screen is made up of some combination of these colors. Other color systems include grayscale and indexed color.
- The color values for all the pixels on the screen are stored in a large block of memory known as a frame buffer. Changing the image on the screen requires changing the color values that are stored in the frame buffer. Basically, when the pixel values in the frame buffer change, the screen changes. A computer screen used in this way is the basic model of raster graphics.
- Specifying individual pixel colors is not always the best way to create an image. Another way is to specify the basic geometric objects that it contains, such as lines, circles, triangles, and rectangles. This is the idea that defines vector graphics: represent an image as a list of the geometric shapes that it contains. Shapes can have attributes, such as the thickness of a line or the fill color of a rectangle. A vector graphics display stores a display list of lines that should appear on the screen. The graphics display goes over the display list over and over, continually redrawing all the lines on the list. Changing the image involves changing the display list.
For an image that can be specified as a reasonably small number of geometric shapes, the amount of information needed to represent the image is much smaller using a vector representation than using a raster representation.
The divide between raster and vector graphics persists in several areas of computer graphics, for example painting programs and drawing programs. In a painting program, the image is a grid of pixels and the user creates an image by assigning colors to pixels. The point of a painting program is to color the individual pixels, and it is only the pixel colors that are saved. In a drawing program, the user creates an image by adding geometric shapes and the image is represented as a list of those shapes.
- In the world of free software, Gimp is a good alternative to Photoshop, while Inkscape is a reasonably capable free drawing program.
The divide between vector and raster also appears in graphics file formats. If the original image is to be recovered from the bits stored in the file, the representation must follow some exact, known specification. Such a specification is called a graphics file format.
The amount of data necessary to represent a raster image can be quite large. However, the data usually contains a lot of redundancy, and the data can be compressed to reduce its size. GIF and PNG use lossless data compression, which means that the original image can be recovered perfectly from the compressed data. JPEG uses a lossy data compression algorithm, which means that the image that is recovered from a JPEG file is not exactly the same as the original image; some information has been lost (though the loss is usually not very noticeable).
SVG is fundamentally a vector-graphics format. It is actually an XML-based language for describing two-dimensional vector graphics images. There is no loss in quality when the size of a vector image is changed.
A digital image, no matter what its format, is specified using a coordinate system. A coordinate system sets up a correspondence between numbers and geometric points.
Elements of 3D Graphics
Most common approaches to 3D graphics have more in common with vector graphics than raster graphics. That is, the content of an image is specified as a list of geometric objects, which is referred to as geometric modeling.
- The starting point is to construct an artificial 3D world as a collection of simple geometric shapes, arranged in 3D space. These shapes can have attributes that determine their appearance. Often, the range of basic shapes is very limited, perhaps including only points, line segments, and triangles; a more complex shape such as a polygon or sphere can be built or approximated as a collection of the more basic shapes.
- To make a 2-D image of the scene, the scene is projected from the three dimensions down to two dimensions. Projection is the equivalent of taking a photo of the scene.
- The smallest building blocks that we have to work with in a 3D scene are called geometric primitives.
- Hierarchical modeling - designing reusable components out of simple geometric primitives.
- A geometric transform is used to adjust the size, orientation, and position of a geometric object. The three most basic kinds of geometric transform are:
- scaling - A scaling transform is used to set the size of an object, that is, to make it bigger or smaller by a specified factor.
- rotation - A rotation transform is used to set an object's orientation, by rotating it by some angle about some specific axis.
- translation - A translation transform is used to set the position of an object, by displacing it by a given amount from its original position.
- The appearance of 3D objects is set by assigning attributes to the geometric objects. In 3D graphics, instead of color, we usually talk about material. The term material refers to the properties that determine the intrinsic visual appearance of a surface. Material properties can include basic color along with shininess, roughness, and transparency.
- One of the most useful kinds of material property is texture. In the most general terms, a texture is a way of varying material properties from point to point on a surface. A texture can vary the color along a surface, or vary other properties such as transparency or bumpiness.
- The actual appearance of an object also depends on the environment in which it is viewed. In 3D graphics, you have to add simulated lighting to a scene. There can be several sources of light in a scene, each having its own color, intensity, and direction or position.
- In general, the ultimate goal of 3D graphics is to produce 2D images of the 3D world. The transformation from 3D to 2D involves viewing and projection. You need to specify the position of the viewer and in which direction they are looking. This can be thought of as placing a virtual camera into the scene.
- The final step in 3D graphics is to assign colors to individual pixels in the 2D image. This process is called rasterization, and the whole process of producing an image is referred to as rendering the scene.
- In many cases, the ultimate goal is not to create a single image, but to create an animation, consisting of a sequence of images that show the world at different times. There are small changes from one image in the sequence to the next. Almost any aspect of a scene can change during an animation, including the coordinates of primitives, transformations, material properties, and the view. One of the most important techniques to help with the computation of the changes is the physics engine, which computes the motion and interaction of objects based on the laws of physics.
Hardware and Software
This book uses OpenGL as the primary basis for 3D graphics programming.
OpenGL is supported by the graphics hardware in most modern computing devices, including desktop computers, laptops, and many mobile devices. In the form of WebGL, it is used for most 3D graphics on the Web.
The improvement in graphics is mainly due to the GPU, or Graphics Processing Unit. A GPU includes processors for doing graphics computations; in fact, it can include a large number of such processors that work in parallel to greatly speed up graphical operations. It also includes its own dedicated memory for storing things like images and lists of coordinates. GPU processors have very fast access to data stored in GPU memory.
To perform a graphical operation, the CPU simply sends commands, along with the necessary data, to the GPU, which is responsible for actually carrying out those commands. The CPU offloads most of the graphical work to the GPU, which is optimized to carry out that work very quickly. The set of commands that the GPU understands makes up the API of the GPU. OpenGL is an example of a graphics API, and most GPUs support OpenGL in the sense that they can understand OpenGL commands.
Other Graphics APIs:
- Vulkan
- Metal
- Direct3D
OpenGL was designed as a client/server system. The server, which is responsible for controlling the computer's display and performing graphics computations, carries out commands issued by the client. Typically, the server is a GPU, including its graphics processors and memory. The server executes OpenGL commands. The client is the CPU on the same computer, along with the application program that it is running. It is also possible to run OpenGL over a network.
One of the driving factors in the evolution of OpenGL has been the desire to limit the amount of communication that is needed between the CPU and the GPU.
OpenGL draws primitives such as triangles. Specifying a primitive means specifying coordinates and attributes for each of its vertices. A Vertex Buffer Object (VBO) is a block of memory in the GPU that can store the coordinates or attribute values for a set of vertices. This makes it possible to reuse the data without having to retransmit it from the CPU to the GPU.
Texture objects make it possible to store several images on the GPU for use as textures. With OpenGL 2.0, it became possible to write programs to be executed as part of the graphical computation in the GPU. These programs are run on the GPU at GPU speed. The programs are called shaders (the term doesn't really describe what most of them actually do). The first shaders to be introduced were vertex shaders and fragment shaders. A vertex shader is a program that can take over the job of doing per-vertex computations. A fragment shader can take over the job of performing per-pixel computations. OpenGL shaders are written in GLSL (OpenGL Shading Language).
The computations that are done on different vertices are pretty much independent, and can potentially be done in parallel.
Two-Dimensional Graphics
Things are simpler and a lot easier to visualize in 2D than in 3D, but most of the ideas that are covered in this chapter will also be very relevant to 3D.
Pixels, Coordinates, and Colors
To create a 2-D image, each point in the image is assigned a color. A point in 2D can be identified by a pair of numerical coordinates. Coordinate systems associate numbers to points, and color models associate numbers to colors. A digital image is made up of rows and columns of pixels. A pixel in such an image can be specified by saying which column and row contains it.
The pixel that is identified by a pair of coordinates depends on the choice of coordinate system. Row and column numbers identify a pixel, not a point. A pixel contains many points; mathematically, it contains an infinite number of points. The goal of computer graphics is not really to color pixels - it is to create and manipulate images.
In some ideal sense, an image should be defined by specifying a color for each point, not just for each pixel. Pixels are an approximation.
Aliasing refers to visual artifacts, particularly jagged or stair-stepped edges, that occur when rendering images or scenes with limited resolution or insufficient sampling. Antialiasing is a term for techniques that are designed to mitigate the effects of aliasing. The idea is that when a pixel is only partially covered by a shape, the color of the pixel should be a mixture of the color of the shape and the color of the background. In practice, an approximate method is used to calculate the color.
In general, we think of a pixel's coordinates as referring to the top-left corner of the pixel. The resolution of a display device can be measured in terms of the number of pixels per inch on the display, a quantity referred to as PPI or sometimes DPI. In many graphics systems, pixel doesn't really refer to the size of a physical pixel. Instead, it is just another unit of measure, which is set by the system to be something appropriate. For vector graphics, pixels only become an issue during rasterization, the step in which a vector image is converted into pixels for display.
The aspect ratio of a rectangle is the ratio of its width to its height. A coordinate system also has an aspect ratio.
In the RGB color model, colors are specified by specifying an intensity value for red, green, and blue. The red, green, and blue values for a color are called the color components of that color in the RGB color model. The eye contains three kinds of color sensors. The color sensors are called cone cells. Cone cells do not respond exclusively to red, green, and blue light. Each kind of cone cell responds, to a varying degree, to wavelengths of light in a wide range. A given mix of wavelengths will stimulate each type of cell to a certain degree, and the intensity of stimulation determines the color that we see. Three basic colors can produce a reasonably large fraction of the set of perceivable colors, but there are colors that you can see in the world that you will never see on your computer screen. The range of colors that can be produced by a device such as a computer screen is called the color gamut of that device.
The most common color model for computer graphics is RGB. RGB colors are most often represented using 8 bits per color component, a total of 24 bits to represent a color. The color models HSV and HSL describe the same colors as RGB, but attempt to do it in a more intuitive way. H stands for hue, a basic spectral color. As H increases, the color changes from red to yellow to green to cyan to blue to magenta and then back to red. The value of H is often taken to range from 0 to 360. The S stands for saturation and is taken to range from 0 to 1. A saturation of 0 gives a shade of gray. A saturation of 1 gives a pure color. V stands for value and L stands for lightness. They determine how bright or dark the color is. The main difference between the HSV and HSL models is where pure spectral colors occur.
A fourth component is often added to color models. The component is called alpha, and color models that use it are referred to by names such as RGBA and HSLA. Alpha is used to represent transparency. Transparency determines what happens when you draw with one color (the foreground color) on top of another color (the background color). In alpha blending, the new color of a pixel is a mixture of the foreground color and the background color, with alpha giving the fraction that comes from the foreground.
An RGBA color model with 8 bits per component uses a total of 32 bits to represent a color.
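As a concrete sketch of the blending rule (not from the book), assuming color components and alpha are given as numbers between 0.0 and 1.0:
// A sketch of simple alpha blending: the new pixel color is a weighted
// average of the foreground and background colors, weighted by alpha.
function blend(fg, bg) {
    return {
        r: fg.a * fg.r + (1 - fg.a) * bg.r,
        g: fg.a * fg.g + (1 - fg.a) * bg.g,
        b: fg.a * fg.b + (1 - fg.a) * bg.b
    };
}
// Example: 50% transparent red drawn over opaque blue gives a purple.
console.log( blend({ r: 1, g: 0, b: 0, a: 0.5 }, { r: 0, g: 0, b: 1 }) );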
Shapes
In a graphics API, there will be certain basic shapes that can be drawn with one command. An algorithm for drawing a line has to decide exactly which pixels to color. One of the first computer graphics algorithms, Bresenham's algorithm for line drawing, implements a very efficient procedure for doing so.
There are two ways to make a shape visible in a drawing. You can stroke it. Or, if it is a closed shape such as a rectangle or an oval, you can fill it. Stroking a line is like dragging a pen along the line. Stroking a rectangle or oval is like dragging a pen along its boundary. It is possible to stroke and fill the same shape.
How to draw self-intersecting shapes: The winding number of a shape about a point is how many times the shape winds around the point in the positive direction. There can be different fill rules based on winding number.
Shapes can be filled with colors, patterns and gradients. A pattern is an image, usually a small image. When used to fill a shape, a pattern can be repeated horizontally and vertically to cover the entire shape. A gradient is similar in that it is a way for color to vary from point to point, but instead of taking the colors from an image, they are computed. There are a lot of variations to the basic idea, but there is always a line segment along which the color varies. The color is specified at the endpoints of the line segments, and possibly at additional points, and between these points the color is interpolated.
It is impossible for a graphics API to include every possible shape as a basic shape, but there is usually some way to create more complex shapes. A polygon is a closed shape consisting of a sequence of line segments. Each line segment is joined to the next at its endpoint, and the last line segment connects back to the first. In a regular polygon, all the sides are the same length and all the angles between the sides are equal. A convex polygon has the property that whenever two points are inside or on the polygon, then the entire line segment between these points is also inside or on the polygon. Simple polygons have no self-intersections.
Bezier Curves
- Used to create very general curved shapes
- Defined by parametric polynomial equations
- Two kinds: cubic Bezier curves and quadratic Bezier curves
- Control Points
Transforms
In a typical application, we have a rectangle made of pixels, with its natural pixel coordinates, where an image will be displayed. This rectangle will be called the viewport. We also have a set of geometric objects that are defined in a possibly different coordinate system, generally one that uses real-number coordinates rather than integers. These objects make up the scene or world that we want to view, and the coordinates that we use to define the scene are called world coordinates. For 2D graphics, the world lies in a plane. It is not possible to show a picture of the entire infinite plane. We need to pick some rectangular area in the plane to display in the image. This rectangular area is called the window. A coordinate transform is used to map the window to the viewport.
The coordinate transformation is a function that takes world coordinates in some window and maps them to pixel coordinates in the viewport. The coordinates that we use to define an object are called object coordinates for the object. When we want to place the object into a scene, we need to transform the object coordinates that we used to define the object into the world coordinate system that we are using for the scene. The transformation that we need is called a modeling transformation. For example, an object defined in its own object coordinate system can be mapped by three different modeling transformations into three different positions in the world coordinate system.
The choice of a view window tells which part of the scene is shown in the image. Moving, resizing, or even rotating the window will give a different view of the scene. Keep in mind that either the objects or the window can be moved or resized. When we modify the view window, we change the coordinate system that is applied to the viewport. But in fact, this is the same as leaving that coordinate system in place and moving the objects in the scene instead. There is no essential distinction between transforming the window and transforming the object.
The transformations used in 2D graphics can be written in the form:
x1 = a*x + b*y + e
y1 = c*x + d*y + f
where (x, y) represents the coordinates of some point before the transformation is applied, and (x1, y1) are the transformed coordinates. The transformation is defined by the six constants a, b, c, d, e, and f, and it can be written as a function T, where T(x, y) = (a*x + b*y + e, c*x + d*y + f).
A transformation of this form is called an affine transform. An affine transform has the property that, when it is applied to two parallel lines, the transformed lines will also be parallel. Also, if you follow one affine transform by another affine transform, the result is again an affine transform.
- A translation transform simply moves every point by a certain amount horizontally and a certain amount vertically.
- In the formulas above, e is the number of units by which the point is moved horizontally and f is the amount by which it is moved vertically (x1 = x + e, y1 = y + f).
- A rotation transform rotates each point about the origin. Every point is rotated through the same angle, called the angle of rotation. A rotation with a positive angle rotates objects in the direction from the positive x-axis towards the positive y-axis.
- A scaling transform can be used to make objects bigger or smaller. Mathematically, a scaling transform simply multiplies each x-coordinate by a given amount and each y-coordinate by a given amount.
- The common case where the horizontal and vertical scaling factors are the same is called uniform scaling.
- Transformations Demo
- We will look at one more type of basic transform, a shearing transform. A shear will tilt objects. A horizontal shear will tilt things toward the left (for negative shear) or right (for positive shear).
- Horizontal Shear
- Vertical Shear
- The window-to-viewport transformation maps the rectangular view window in the xy-plane that contains the scene to the rectangular grid of pixels where the image will be displayed.
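A minimal sketch of this mapping, assuming a view window given by left, right, bottom, and top limits in world coordinates and a viewport that is width by height pixels with (0,0) at its top-left corner (the function and parameter names are my own, not from the book):
// Map world coordinates (x, y) in the view window to pixel coordinates
// in the viewport. The y-coordinate is flipped because pixel y-values
// increase downward while world y-values increase upward.
function windowToViewport(x, y, left, right, bottom, top, width, height) {
    const px = width * (x - left) / (right - left);
    const py = height * (top - y) / (top - bottom);
    return { px: px, py: py };
}
// Example: the center of the window maps to the center of the viewport.
console.log( windowToViewport(0, 0, -2, 2, -2, 2, 400, 400) );  // { px: 200, py: 200 }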
The transforms that are used in computer graphics can be represented as matrices, and the points on which they operate are represented as vectors. Linear algebra is fundamental to computer graphics. Matrix and vector math is built into GPUs. If A is an N-by-N matrix, and v is a vector of length N, then v can be multiplied by A to give another vector w = Av. The function that takes v to Av is a transformation; it transforms any given vector of size N into another vector of size N. A transformation of this form is called a linear transformation. Note that a linear transformation always maps the zero vector to the zero vector. Rotation and scaling are linear transformations; translation is not a linear transformation. But if we turn the coordinates (x, y) into a triple of numbers (x, y, 1), called homogeneous coordinates, then we can represent rotation, scaling, and translation - and indeed any affine transformation - on 2D space as multiplication by a 3-by-3 matrix.
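To make the matrix idea concrete, here is a small sketch that stores the affine transform defined by the six constants a, b, c, d, e, f from above as a 3-by-3 matrix and applies it to a point written in homogeneous coordinates (x, y, 1):
// The affine transform x1 = a*x + b*y + e, y1 = c*x + d*y + f as a 3-by-3 matrix.
function affineMatrix(a, b, c, d, e, f) {
    return [ [a, b, e],
             [c, d, f],
             [0, 0, 1] ];
}
// Apply the matrix to the homogeneous point (x, y, 1).
function applyTransform(m, x, y) {
    return {
        x: m[0][0] * x + m[0][1] * y + m[0][2],
        y: m[1][0] * x + m[1][1] * y + m[1][2]
    };
}
// Example: a translation by (3, 5) moves the point (1, 1) to (4, 6).
console.log( applyTransform(affineMatrix(1, 0, 0, 1, 3, 5), 1, 1) );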
Hierarchical Modeling
Hierarchical modeling is when a complex object is made up of simpler objects, which in turn can be made up of even simpler objects, and so on until it bottoms out with simple geometric primitives that can be drawn directly. The transforms applied to an object are usually applied in the order scale, then rotate, then translate, because scaling and rotation leave the reference point fixed. An identity transform is a transform that doesn't modify the coordinates to which it is applied.
We would like to be able to draw the smaller component objects in their own natural coordinate systems, just as we do for the main object:
- We draw each small component, such as the bloom, in its own coordinate system, and use a modeling transformation to move the sub-object into position within the main object. We are composing the complex object in its own natural coordinate system as if it were a complete scene.
- We can apply another modeling transformation to the complex object as whole, to move it into the actual scene; the sub-objects of the complex object are carried along with it. The overall transformation that applies to a sub-object consists of a modeling transformation that places the sub-object into the complex object, followed by the transformation that places the complex object into the scene.
- We can build objects that are made up of smaller objects which in turn are made up of even smaller objects, to any level. We could draw the bloom's petals in their own coordinate systems, then apply modeling transformations to place the petals into the natural coordinate system for the bloom. There will be another transformation that moves the bloom into position on the stem, and so forth. This is hierarchical modeling.
Building a complex scene out of objects is similar to building up a complex program out of subroutines.
Animation is just another aspect of modeling. A computer animation consists of a sequence of frames. Each frame is a separate image, with small changes from one frame to the next. From our point of view, each frame is a separate scene and has to be drawn separately. The same object can appear in many frames. To animate the object, we can simply apply a different modeling transformation to the object in many frames.
Logically, the components of a complex scene form a structure. In this structure, each object is associated with the sub-objects that it contains. If the scene is hierarchical, then the structure is hierarchical. This structure is known as a scene graph. A scene graph is a tree-like structure, with the root representing the entire scene, the children of the root representing the top-level objects in the scene, and so on. Each parent-to-child link in the scene graph can be associated with a modeling transformation that places the sub-object into its parent object. Although the scene graph always exists conceptually, in some applications it exists only implicitly.
The computer only keeps track of a current transformation that represents all the transforms that are applied to an object. When an object is drawn by a subroutine, the program saves a copy of the current transformation before calling the subroutine. Inside the subroutine, the object is drawn in its own coordinate system, possibly calling other subroutines to draw sub-objects with their own modeling transformations. After the subroutine returns, the saved transformation is restored.
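A small sketch of this save/draw/restore pattern, using the HTML canvas 2D API that is introduced in the next section (the cart and wheel drawing routines are just illustrative examples, not from the book):
// Draw a wheel in its own coordinate system: a circle of radius 1 centered at the origin.
function drawWheel(graphics) {
    graphics.beginPath();
    graphics.arc(0, 0, 1, 0, 2 * Math.PI);
    graphics.fill();
}

// Draw a cart made of a body and two wheels. Each wheel is drawn in its own
// coordinate system; a modeling transform places it within the cart.
function drawCart(graphics) {
    graphics.save();                    // save the current transformation
    graphics.translate(-1.5, 1);        // place the first wheel within the cart
    graphics.scale(0.8, 0.8);
    drawWheel(graphics);
    graphics.restore();                 // restore the cart's transformation

    graphics.save();
    graphics.translate(1.5, 1);         // place the second wheel
    graphics.scale(0.8, 0.8);
    drawWheel(graphics);
    graphics.restore();

    graphics.fillRect(-2.5, -1, 5, 2);  // the body, in the cart's own coordinates
}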
HTML Canvas Graphics
A canvas is an HTML element. It appears on the page as a blank rectangular area which can be used as a drawing surface. To draw on a canvas, you need a graphics context. A graphics context is an object that contains functions for drawing shapes. It also contains variables that record the current graphics state, including things like the current drawing color, transform, and font. Typically, you will store the graphics context in a global variable and use the same graphics context throughout your program. When a graphics context is global, changes made to the state in one function call will carry over to subsequent function calls, unless you do something to limit their effect.
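A minimal sketch of the setup, assuming the page contains a canvas element with id "theCanvas":
const canvas = document.getElementById("theCanvas");
const graphics = canvas.getContext("2d");    // the graphics context for this canvas

graphics.fillStyle = "red";                  // part of the graphics state
graphics.fillRect(10, 10, 100, 60);          // fill a rectangle using the current fill style
graphics.strokeStyle = "blue";
graphics.strokeRect(10, 10, 100, 60);        // stroke the outline of the same rectangle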
- Graphics Starter
- Graphics Plus Starter
- Shows how to add functions to a graphics context for drawing lines, ovals, and other shapes that are not built into the API
- Animation Starter
- Events Starter
- Pixel Manipulation Example
SVG is a scene description language rather than a programming language. Where a programming language creates a scene by generating its contents procedurally, a scene description language specifies the scene declaratively by listing its content.
OpenGL 1.1: Geometry
Shapes and Colors in OpenGL 1.1
In the default coordinate system for OpenGL, the image shows a region of 3D space in which x, y, and z range from minus one to one. To show a different region, you have to apply a transformation. OpenGL can be implemented in many different programming languages, but the API specification more or less assumes that the language is C.
OpenGL can draw only a few basic shapes, including points, lines, and triangles. There is no built-in support for curves or curved surfaces; they must be approximated by simpler shapes. A primitive in OpenGL is defined by its vertices. A vertex is simply a point in 3D, given by its x, y, and z coordinates. The entire family of vertex functions is often referred to as glVertex*, with the * standing in for a suffix that gives the parameter specification (for example, f for float). The state of OpenGL includes all the settings that affect rendering.
Seven primitives still exist in modern OpenGL: GL_POINTS, GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLES, GL_TRIANGLE_STRIP, and GL_TRIANGLE_FAN.
Three have been removed: GL_QUADS, GL_QUAD_STRIP, and GL_POLYGON.
Approximating a circle by drawing a polygon with a large number of sides:
glBegin( GL_LINE_LOOP );
for (i = 0; i < 64; i++) {
angle = 6.2832 * i / 64; // 6.2832 represents 2*PI
x = 0.5 * cos(angle);
y = 0.5 * sin(angle);
glVertex2f( x, y );
}
glEnd();
OpenGL has a large collection of functions that can be used to specify colors for the geometry that we draw. The functions have names of the form glColor*, where * stands for a suffix that gives the number and types of the parameters.
The color buffer is the drawing area. OpenGL uses several buffers in addition to the color buffer.
An obvious point about viewing in 3D is that one object can be behind another object. When this happens, the back object is hidden from the viewer by the front object. When we create an image of a 3D world, we have to make sure that objects that are supposed to be hidden behind other objects are in fact not visible in the image. This is the hidden surface problem. The solution might seem simple enough: Just draw the objects in order from back to front. If one object is behind another, the back object will be covered up later when the front object is drawn. This is called the painter's algorithm.
OpenGL uses a technique called the depth test. The depth test solves the hidden surface problem no matter what order the objects are drawn in, so you can draw them in any order you want. The term depth here has to do with the distance from the viewer to the object. Objects at greater depth are farther from the viewer. An object with smaller depth will hide an object with greater depth. To implement the depth test algorithm, OpenGL stores a depth value for each pixel in the image. The extra memory that is used to store these depth values makes up the depth buffer. During the drawing process, the depth buffer is used to keep track of the depth of what is currently visible at each pixel. When a new object is drawn at a pixel, the information in the depth buffer can be used to decide whether the new object is in front of or behind the object that is currently visible there. If the new object is in front, then the color of the pixel is changed to show the new object, and the depth buffer is also updated. If the new object is behind the current object, then the data for the new object is discarded and the color and depth buffers are left unchanged.
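A conceptual sketch of the test that is performed for each pixel covered by a newly drawn object (plain JavaScript, not actual OpenGL code; the buffers are assumed to be simple arrays with one entry per pixel):
// Decide whether a newly drawn object becomes visible at a given pixel.
function depthTest(pixel, newColor, newDepth, colorBuffer, depthBuffer) {
    if (newDepth < depthBuffer[pixel]) {   // the new object is closer to the viewer
        colorBuffer[pixel] = newColor;     // show the new object at this pixel
        depthBuffer[pixel] = newDepth;     // and record its depth
    }
    // otherwise the new object is hidden at this pixel, and its data is discarded
}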
3D Coordinates and Transforms
In three dimensions, you need three numbers to specify a point.
OpenGL programmers usually think in terms of a coordinate system in which the x- and y-axes lie in the plane of the screen, and the z-axis is perpendicular to the screen with the positive direction pointing out of the screen. The coordinate system on the world - the coordinate system in which the scene is assembled - is referred to as world coordinates. Objects are not usually specified directly in world coordinates. Instead, objects are specified in their own coordinate system, known as object coordinates, and then modeling transforms are applied to place the objects into the world, or into more complex objects.
The basic 3D transforms are extensions of the 2D versions: rotation (glRotate*), scaling (glScale*), and translation (glTranslate*). In 3D, rotation is about a line, called the axis of rotation. The most common choices for the axis of rotation are the coordinate axes: the x-axis, y-axis, or z-axis.
Projection and Viewing
The coordinates that you actually use for drawing an object are called object coordinates. The object coordinate system is chosen to be convenient for the object being drawn. A modeling transformation can then be applied to set the size, orientation, and position of the object in the overall scene. The modeling transformation is the first that is applied to the vertices of an object. In the real world, what you see depends on where you are standing and the direction in which you are looking. That is, you can't make a picture of the scene until you know the position of the viewer, the direction in which the viewer is looking, and how the viewer's head is tilted.
For the purposes of OpenGL, we imagine that the viewer is attached to their own individual coordinate system, which is known as eye coordinates. In this coordinate system, the viewer is at the origin looking in the direction of the negative z-axis, the positive direction of the y-axis is pointing straight up, and the x-axis is pointing to the right.
Eye coordinates are (almost) the coordinates that you actually want to use for drawing on the screen. The transform from world coordinates to eye coordinates is called the viewing transformation. The modelview transformation is a combination of the modeling and viewing transforms. The viewer can't see the entire 3D world, only the part that fits into the viewport, which is the rectangular region of the screen or other display device where the image will be drawn. We say that the scene is clipped by the edges of the viewport. The volume of space that is actually rendered into the image is called the view volume. Things inside the view volume are rendered into the image; things outside it are clipped and cannot be seen.
OpenGL applies a coordinate transform that maps the view volume onto a cube. The cube is centered at the origin and extends from -1 to 1 in the x-direction, in the y-direction, and in the z-direction. The coordinate system on this cube is referred to as clip coordinates. The transformation from eye coordinates to clip coordinates is called the projection transformation. When things are actually drawn, there are device coordinates, the 2D coordinate system in which the actual drawing takes place on a physical device such as the computer screen. The viewport transformation takes x and y from the clip coordinates and scales them to fit the viewport.
A primitive goes through the following transformations:
- The points that define the primitive are specified in object coordinates, using methods such as glVertex3f
- The points are first subjected to the modelview transformation, which is a combination of the modeling transform that places the primitive into the world and the viewing transform that maps the primitive into eye coordinates
- The projection transformation is then applied to map the view volume that is visible to the viewer onto the clip coordinate cube. If the transformed primitive lies outside that cube, it will not be part of the image, and the processing stops. If part of the primitive lies inside and part outside, the part that lies outside is clipped away and discarded, and only the part that remains is processed further.
- Finally, the viewport transform is applied to produce the device coordinates that will actually be used to draw the primitive on the display device.
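A conceptual sketch of this sequence for a single vertex, in plain JavaScript rather than actual OpenGL code (the matrix helper and the 4-by-4 matrices are assumptions for illustration, not part of the OpenGL API):
// Apply a 4-by-4 transformation matrix to a vertex given as [x, y, z, w].
function transform(matrix, v) {
    const result = [0, 0, 0, 0];
    for (let row = 0; row < 4; row++) {
        for (let col = 0; col < 4; col++) {
            result[row] += matrix[row][col] * v[col];
        }
    }
    return result;
}

// Follow one vertex from object coordinates to device coordinates.
function processVertex(objectCoords, modelview, projection, viewportWidth, viewportHeight) {
    const eyeCoords  = transform(modelview, objectCoords);   // modelview transformation
    const clipCoords = transform(projection, eyeCoords);     // projection transformation
    // (clipping against the cube from -1 to 1 happens at this stage)
    const x = clipCoords[0] / clipCoords[3];   // perspective division; for an orthographic
    const y = clipCoords[1] / clipCoords[3];   // projection the w component is simply 1
    return {                                                  // viewport transformation
        x: viewportWidth * (x + 1) / 2,
        y: viewportHeight * (1 - y) / 2        // device y runs downward
    };
}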
The simplest of the transforms is the viewport transform. It transforms x and y clip coordinates to the coordinates that are used on the display device. The projection is represented in OpenGL as a matrix. OpenGL keeps track of the projection matrix separately from the matrix that represents the modelview transformation.
Remember that a 3D image can show only part of the infinite 3D world. The view volume is the part of the world that is visible in the image. The view volume is determined by a combination of the viewing transformation and the projection transformation. The viewing transform determines where the viewer is located and what direction the viewer is facing, but it doesn't say how much of the world the viewer can see. The projection transform does that: It specifies the shape and extent of the region that is in view. Think of the viewer as a camera, with a big invisible box attached to the front of the camera that encloses the part of the world that the camera has in view. The inside of the box is the view volume. As the camera moves around the world, the view volume changes. The shape and size of the box correspond to the projection transformation. The position and orientation of the camera correspond to the viewing transform.
The OpenGL projection transformation transforms eye coordinates to clip coordinates, mapping the view volume onto the 2-by-2-by-2 clipping cube that contains everything that will be visible in the image. To specify a projection just means specifying the size and shape of the view volume relative to the viewer.
There are in general two types of projection:
- Perspective projection
- Perspective projection shows you what you would see if the OpenGL display rectangle on your computer screen were a window into an actual 3D world. It shows a view that you could get by taking a picture of a 3D world with an ordinary camera.
Since the OpenGL depth buffer can only store a finite range of depth values, it can't represent the entire range of depth values for the infinite pyramid that is theoretically in view. Only objects in a certain range of distances from the viewer can be part of the image. That range of distances is specified by two values, near and far. For a perspective transformation, both of these must be positive, far must be greater than near, and anything closer to the viewer than near or farther away than far is discarded and does not appear in the rendered image. The volume of space that is represented in the image is thus a truncated pyramid; this truncated pyramid is the view volume for a perspective projection.
- Orthographic Projection
In orthographic projection, the 3D world is projected onto a 2D image by discarding the z-coordinate of the eye-coordinate system. This type of projection is unrealistic in that it is not what a viewer would see.
Modeling and viewing might seem like very different things, but OpenGL combines them into a single transformation. This is because there is no way to distinguish them in principle; the difference is purely conceptual. That is, a given transformation can be considered to be either a modeling transformation or a viewing transformation, depending on how you think about it. One significant difference, conceptually, is that the same viewing transformation usually applies to every object in the 3D scene, while each object can have its own modeling transformation.
Although modeling and view transformations are the same in principle, they remain different conceptually and they are typically applied at different points in the code. In general, when drawing a scene, you will do the following:
- Load the identity matrix, for a well-defined starting point
- Apply the viewing transformation
- Draw the objects in the scene; each with its own modeling transformation
Projection and viewing are often discussed using the analogy of a camera. Setting up the viewing transformation is like positioning and pointing the camera.
Polygonal Meshes and glDrawArrays
OpenGL can only directly render points, lines, and polygons. A polyhedron, the 3D analog of a polygon, can be represented exactly, since a polyhedron has faces that are polygons. Curved surfaces can only be approximated. A polyhedron can be represented, or a curved surface can be approximated, as a polygonal mesh, that is, a set of polygons that are connected along their edges. If the polygons are small, the approximation can look like a curved surface. The polygons in a polygonal mesh are also referred to as faces, and one of the primary means of representing a polygonal mesh is an indexed face set, or IFS. The data for an IFS includes a list of all vertices that appear in the mesh, giving the coordinates of each vertex. A vertex can then be identified by an integer that specifies its index, or position, in the list.
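For example, a sketch of the IFS data for a square pyramid, written as plain JavaScript arrays (the exact layout here is illustrative):
// The vertex list gives the coordinates of each vertex; a vertex is then
// referred to by its index in this list.
const vertexList = [
    [ 1, 0,  1 ],   // vertex 0: corners of the square base, in the y = 0 plane
    [ 1, 0, -1 ],   // vertex 1
    [-1, 0, -1 ],   // vertex 2
    [-1, 0,  1 ],   // vertex 3
    [ 0, 1,  0 ]    // vertex 4: the apex of the pyramid
];

// The face list gives each face of the mesh as a list of vertex indices.
const faceList = [
    [ 0, 1, 2, 3 ],   // the square base
    [ 0, 1, 4 ],      // the four triangular sides
    [ 1, 2, 4 ],
    [ 2, 3, 4 ],
    [ 3, 0, 4 ]
];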
It should be possible to store information on the GPU for drawing objects so that it can be reused without retransmitting it. There are two techniques for doing this: display lists and vertex buffer objects (VBOs). Display lists are useful when the same sequence of OpenGL commands will be used several times. A display list can be stored on the GPU, so its contents only have to be transmitted once. Once a list has been created, it can be called. The key point is that calling a list only requires one OpenGL command. Although the same list of commands still has to be executed, only one command has to be transmitted from the CPU to the graphics card, and then the full power of hardware acceleration can be used to execute the commands at the highest possible speed.
Vertex buffer objects take a different approach to reusing information. They only store data, not commands. A VBO is similar to an array. In fact, it is essentially an array that can be stored on the GPU for efficiency of reuse. There are OpenGL commands to create and delete VBOs and to transfer data from an array on the CPU side into a VBO on the GPU.
OpenGL 1.1: Light and Material
One of the goals of computer graphics is physical realism, that is, making images that look like they could be photographs of reality. One important aspect of physical realism is lighting: the play of light and shadow, the way that light reflects from different materials, the way it can bend or be diffracted as it passes through translucent objects. Another goal of computer graphics is speed. OpenGL was designed for real-time graphics where the time that is available for rendering an image is a fraction of a second. Material properties determine how the objects interact with light.
Introduction to Lighting
The properties of a surface that determine how it interacts with light are referred to as the material of the surface. A surface can have several different material properties.
When light strikes a surface, some of it will be reflected. Exactly how it reflects depends in a complicated way on the nature of the surface, that is, on the material properties of the surface. In OpenGL 1.1, the complexity of lighting is approximated by two general types of reflection: specular reflection and diffuse reflection.
In perfect specular (mirror-like) reflection, an incoming ray of light is reflected from the surface intact. The reflected ray makes the same angle with the surface as the incoming ray. Even if the entire surface is illuminated by the light source, the viewer will only see the reflection of the light source at those points on the surface where the geometry is right. Such reflections are referred to as specular highlights. In practice, we think of a ray of light being reflected not as a single perfect ray, but as a cone of light, which can be more or less narrow.
- Specular reflections from a very shiny surface produces very narrow cones of reflected light; Specular highlights on such a material are small and sharp.
- A duller surface will produce wider cones of reflected light and bigger, fuzzier specular highlights.
- In OpenGL, the material property that determines the size and sharpness of specular highlights is called shininess.
In pure diffuse reflection, an incoming ray of light is scattered in all directions equally. A viewer would see reflected light from all points on the surface.
When light strikes a surface, some of the light can be absorbed, some can be reflected diffusely, and some can be reflected specularly. The amount of reflection can be different for different wavelengths. The degree to which a material reflects light of various wavelengths is what constitutes the color of a material. A material can have two different colors:
- a diffuse color that tells how the material reflects light diffusely
- This is the basic color of the object
- a specular color that tells how it reflects light specularly
- This determines the color of specular highlights
The diffuse and specular colors can be the same; for example, this is often true for metallic surfaces. Or they can be different; for example, a plastic surface will often have white specular highlights no matter what the diffuse color.
OpenGL goes further; there are two more colors associated with a material:
- the third color is the ambient color of the material, which tells how the surface reflects ambient light. Ambient light refers to a general level of illumination that does not come directly from a light source. It consists of light that has been reflected and re-reflected so many times that it is no longer coming from any particular direction. Ambient light is why shadows are not absolutely black. Ambient light is only a crude approximation for the reality of multiply reflected light.
- The ambient color of a material determines how it will reflect various wavelengths of ambient light.
- Ambient color is generally set to be the same as diffuse color.
- The fourth color associated with a material is an emission color, which is not really a color in the sense of the other three. It has nothing to do with how the surface reflects light. The emission color is color that does not come from any external source, and therefore seems to be emitted by the material itself. It means that the object can be seen even if there is no light. The emission color is usually black.
Leaving aside ambient light, the light in an environment comes from a light source such as a lamp or sun. A lamp and the sun are examples of two essentially different kinds of light source: a point light and a directional light:
- A point light source is located at a point in 3D space, and it emits light in all directions from that point.
- For directional light, all the light comes from the same direction, so that the rays of light are parallel.
A light can have color. In OpenGL, each light source has three colors: an ambient color, a diffuse color, and a specular color. Just as the color of a material is more properly referred to as reflectivity, color of light is more properly referred to as intensity or energy. Color refers to how the light's energy is distributed among different wavelengths. The ambient intensity of a light in OpenGL is added to the general level of ambient light.
The visual effect of light shining on a surface depends to a great extent on the angle at which the light strikes the surface. The angle is essential to specular reflection and also affects diffuse reflection. To calculate this angle, OpenGL needs to know the direction in which the surface is facing. That direction is specified by a vector that is perpendicular to the surface. A non-zero vector that is perpendicular to a surface at a given point is called a normal vector to the surface. When used in lighting calculations, a normal vector must have a length equal to 1 (called the unit normal). In OpenGL, normal vectors are assigned only to the vertices of primitives. The normal vectors at the vertices of a primitive are used to do lighting calculations for the entire primitive.
The two ways of assigning normal vectors are called flat shading and smooth shading. Flat shading makes a surface look like it is made of flat sides or facets. Smooth shading makes it look more like a smooth surface.
The goal of lighting calculations is to produce a color for a point on a surface. In OpenGL 1.1, lighting calculations are actually done only at the vertices of a primitive. After the color of each vertex has been computed, colors for interior points of the primitive are obtained by interpolating the vertex colors. The alpha component of the vertex color is easy: it's simply the alpha component of the diffuse material color at that vertex. The calculation of the red, green, and blue components is fairly complex and rather mathematical:
- Assume that the ambient, diffuse, specular, and emission colors of the material have RGB components (mar, mag, mab), (mdr, mdg, mdb), (msr, msg, msb), and (mer, meg, meb), respectively.
- Suppose that the global ambient intensity, which represents the ambient light that is not associated with any light source in the environment, is (gar, gag, gab).
- There can be several point and directional light sources, which we refer to as light number 0, light number 1, light number 2, and so on...
- With this setup, the red component of the vertex color will be:
red = mer + gar*mar + I0r + I1r + I2r + ...
where I0r is the red component of the contribution that comes from light number 0, I1r from light number 1, and so on. A similar equation holds for the green and blue components of the color. This equation says that the emission color, mer, is simply added to any other contributions to the color. It also says that the contribution of the material ambient color is obtained by multiplying the global ambient intensity, gar, by the material ambient color, mar. This is the mathematical way of saying that the material ambient color is the fraction of the ambient light that is reflected by the surface. The terms I0r, I1r, I2r, and so on represent contributions from the various light sources. For these we have to look at the geometry as well as the colors:
Suppose that, at the vertex, N is the unit normal vector, L is a unit vector pointing toward the light source, R is the direction in which a ray from the light would be reflected by the surface, and V is a unit vector pointing toward the viewer. The angle between N and L is then the same as the angle between N and R. Suppose the light has ambient, diffuse, and specular color components (lar, lag, lab), (ldr, ldg, ldb), and (lsr, lsg, lsb). Also, let mh be the value of the shininess property of the material. Then, assuming that the light is enabled, the contribution of this light source to the red component of the vertex color is:
lar*mar + f*( ldr*mdr*(L·N) + lsr*msr*max(0, V·R)^mh )
The first term, lar*mar, accounts for the contribution of the ambient light from this light source to the color of the surface. This term is added to the color whether or not the surface is facing the light. The value of f is 0 if the surface is facing away from the light and is 1 if the surface faces the light; it accounts for the fact that the light only illuminates one side of the surface. To test whether f is 0 or 1, we can check whether L·N is less than 0. This dot product is the cosine of the angle between L and N; it is less than 0 when the angle is greater than 90 degrees, which would mean the normal vector is on the opposite side of the surface from the light.
The diffuse component of the color, before adjustment by f, is given by ldr*mdr*(L·N). This represents the diffuse intensity of the light times the diffuse reflectivity of the material, multiplied by the cosine of the angle between L and N. The angle is involved because for a larger angle, the same amount of energy from the light is spread out over a greater area.
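Putting the pieces together, here is a sketch of the per-light calculation for the red component in plain JavaScript (the property names are illustrative; N, L, V, and R are assumed to be unit vectors given as arrays of three numbers):
function dot(a, b) {                       // dot product of two 3D vectors
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// The contribution of one enabled light source to the red component of a vertex color.
function lightContributionRed(light, material, N, L, V, R) {
    let red = light.ambientR * material.ambientR;   // ambient term, added in all cases
    if (dot(L, N) >= 0) {                           // f = 1: the surface faces the light
        red += light.diffuseR * material.diffuseR * dot(L, N);
        red += light.specularR * material.specularR
                   * Math.pow(Math.max(0, dot(V, R)), material.shininess);
    }
    return red;                                     // when f = 0, only the ambient term remains
}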
Light and Material in OpenGL 1.1
Material properties are vertex attributes in the same way that color is a vertex attribute. That is, the OpenGL state includes a current value for each of the material properties.
Image Textures
Uniformly colored 3D objects look nice enough, but they are a little bland. Their uniform colors don't have the visual appeal of say, a brick wall or a plaid couch. Three-dimensional objects can be made to look more interesting and more realistic by adding a texture to their surfaces. A texture is some sort of variation from pixel to pixel within a single primitive. An image texture can be applied to a surface to make the color of the surface vary from point to point, something like painting a copy of the image onto the surface.
Textures might be the most complicated part of OpenGL, and they are the part that has survived, and become more complicated, in the most modern versions, since they are so vital for the efficient creation of realistic images. An image that is used as a texture should have a width and height that are powers of two (power-of-two textures). A texture image comes with its own 2D coordinate system. Traditionally, s is used for the horizontal coordinate and t is used for the vertical coordinate. To draw a textured primitive, we need a pair of numbers (s, t) for each vertex. These are the texture coordinates for that vertex. They tell which point in the image is mapped to the vertex.
When a texture is applied to a surface, the pixels in the texture do not usually match up one-to-one with pixels on the surface, and in general, the texture must be stretched or shrunk as it is being mapped onto the surface. When multiple pixels in the texture are mapped to the same pixel on the surface and then combined in some way, the process of combining the pixels is called a minification filter because the texture is being shrunk. When one pixel from the texture covers more than one pixel on the surface, the texture has to be magnified, and we need a magnification filter. The pixels in a texture are referred to as texels, short for texture pixel or texture element.
A mipmap for a texture is a scaled-down version of that texture. A complete set of mipmaps consists of the full-sized texture, a half-sized version in which each dimension is divided by two, a quarter-sized version, and so on until the final mipmap consists of one pixel.
The total memory required for a full set of mipmaps is only one-third more than the memory used for the original texture (each level has one-quarter the pixels of the previous one, and 1/4 + 1/16 + 1/64 + ... = 1/3). Mipmaps are used only for minification filtering. They are essentially a way of pre-computing the bulk of the averaging that is required when shrinking a texture to fit a surface. OpenGL can use 1D, 2D, and 3D textures.
Since texture coordinates are no different from vertex coordinates, they can be transformed in exactly the same way. OpenGL maintains a texture transformation as part of its state, along with the modelview and projection transformations.
Lights, Camera, Action
A scene in computer graphics can be a complex collection of objects, each with its own attributes. Rendering a scene means traversing the scene graph, rendering each object in the graph as it is encountered. There is a choice in the shape of the graph.
In general, a scene graph can be a:
- Directed Acyclic Graph (DAG)
- A tree-like structure except that a node can have several parents in the graph. This has the advantage that a single node in the graph can represent several objects in the scene, since in a DAG, a node can be encountered several times as the graph is traversed.
- Representing several objects with one scene graph node can lead to a lack of flexibility, since those objects will all have the same value for any property encoded in the node.
- Tree
- In a tree, each node has a unique parent, and the node will be encountered only once as the tree is traversed.
We might want to implement a viewer that can be moved around the world like other objects. Sometimes, such a viewer is thought of as a moving camera. The camera is used to take pictures of the scene. We want to be able to apply transformations to the camera just as we apply transformations to other objects. The position and orientation of the camera determine what should be visible when the scene is rendered. And the size of the camera, which can be affected by a scaling transformation, determines how large a field of view it has. A camera really represents the viewing transformation that we want to use.
It can also be useful to think of lights as objects, even as part of a complex object.
Three.js: A 3D Scene Graph API
WebGL is a low level language - even more so than OpenGL 1.1, since a WebGL program has to handle a lot of the low-level implementation details that were handled internally in the original version of OpenGL. Before looking at WebGL, the book will look at a higher-level API for 3D web graphics that is built on top of WebGL: three.js.
Three.js Basics
Three.js is an object-oriented JavaScript library for 3D graphics. It is an open-source project originally created by Ricardo Cabello, who goes by the handle mr.doob. It seems to be the most popular open-source JavaScript library for 3D web applications.
In JavaScript, a module is a script that is isolated from other scripts, except that a module can export identifiers that it defines. Identifiers that are exported by one script can then be imported by another script. A module only has access to an identifier from another module if the identifier is explicitly exported by one module and imported by the other.
Three.js works with the HTML <canvas> element, the same element that we used for 2D graphics. Three.js is an object-oriented scene graph API. The basic procedure is to build a scene graph out of three.js objects, and then to render an image of the scene it represents. Animation can be implemented by modifying properties of the scene graph between frames. Three.js is made up of a large number of classes. Three of the most basic are THREE.Scene, THREE.Camera, and THREE.WebGLRenderer; of the available renderer classes, THREE.WebGLRenderer is by far the most commonly used. A three.js program will need at least one object of each type.
- A Scene object is a holder for all the objects that make up the 3D world, including lights, graphical objects, and possibly cameras. It acts as a root node for the scene graph.
- A Camera is a special kind of object that represents a viewpoint from which an image of a 3D world can be made. It represents a combination of a viewing transformation and a projection.
- A WebGLRenderer is an object that can create an image from a scene graph.
There are orthographic projection and perspective projection cameras available in three.js. A camera, like other objects, can be added to a scene, but it does not have to be part of the scene graph to be used. You might add it to the scene graph if you want it to be a parent or a child of another object in the graph. In any case, you will generally want to apply a modeling transformation to the camera to set its position and orientation in 3D space. A renderer is an instance of the class WebGLRenderer. The main thing you want to do with a renderer is render an image; for that, you also need a scene and a camera. A minimal setup is sketched below.
A three.js scene graph is made up of objects of type THREE.Object3D. Cameras, lights, and visible objects are all represented by subclasses of Object3D. Any Object3D contains a list of child objects, which are also of type Object3D. The child lists define the structure of the scene graph.
A three.js scene graph must be a tree. Every node in the graph has a unique parent node except for the root node, which has no parent. An Object3D has a property that points to the parent of the object in the scene graph. This should never be set manually; it is set when you add a child to the object. To make it easy to duplicate parts of the structure of a scene graph, Object3D defines a clone() method. This method copies the node, including the recursive copying of the children of that node.
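A small sketch of building part of a scene graph with add() and clone(); the shapes and values are made up for illustration, and scene is assumed to come from the earlier setup sketch:

```javascript
// A Group is an Object3D used purely as a parent for other nodes.
const car = new THREE.Group();

const wheel = new THREE.Mesh(
  new THREE.CylinderGeometry(0.5, 0.5, 0.2, 32),
  new THREE.MeshLambertMaterial({ color: 0x333333 })
);
car.add(wheel);               // add() sets wheel.parent to car automatically

const wheel2 = wheel.clone(); // clone() copies the node (and, recursively, its children)
wheel2.position.x = 2;        // give the copy its own position
car.add(wheel2);

scene.add(car);               // the whole group becomes a child of the scene's root node
```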
An Object3D has an associated transformation, given by its scale, rotation, and position properties. The object is first scaled, then rotated, then translated according to the values of these properties. The scale and position properties are of type THREE.Vector3. A Vector3 represents a vector or point in three dimensions. A Vector3 object can be constructed from three numbers that give the coordinates of the vector, and its properties can be set individually or all at once. The object obj.rotation has properties obj.rotation.x, obj.rotation.y, and obj.rotation.z that represent rotations about the coordinate axes. The angles are measured in radians. The object is rotated first about the x-axis, then the y-axis, and then the z-axis. The value of obj.rotation is not a vector; it belongs to the type THREE.Euler, and the angles of rotation are called Euler angles.
A visible object in three.js is made up of either points, lines, or triangles. An individual object corresponds to an OpenGL primitive. There are five classes to represent these possibilities: THREE.Points for points, THREE.Mesh for triangles, and THREE.Line, THREE.LineSegments, and THREE.LineLoop for lines.
A visible object is made up of some geometry plus a material that determines the appearance of that geometry. In three.js, the geometry and material of a visible object are themselves represented by the JavaScript classes THREE.BufferGeometry and THREE.Material.
- An object of type THREE.BufferGeometry can store vertex coordinates and their attributes. They are stored in JavaScript typed arrays.
In three.js, to make some geometry into a visible object, we also need an appropriate material. The colors for points are stored in the geometry, not in the material.
Colors in three.js can be represented by values of type THREE.Color. The class THREE.Color represents an RGB color. A Color object has properties r, g, and b giving the red, green, and blue color components as floating-point numbers in the range from 0.0 to 1.0.
three.js comes with classes to represent common mesh geometries, such as a sphere, a cylinder, and a torus. There are also geometry classes representing the regular polyhedra: THREE.TetrahedronGeometry, THREE.OctahedronGeometry, THREE.DodecahedronGeometry, and THREE.IcosahedronGeometry. (For a cube, use BoxGeometry.)
To create a mesh object, you need a material as well as a geometry. There are several kinds of material suitable for mesh objects, including THREE.MeshBasicMaterial, THREE.MeshLambertMaterial, THREE.MeshPhongMaterial, THREE.MeshStandardMaterial, and THREE.MeshPhysicalMaterial.
- A MeshBasicMaterial represents a color that is not affected by lighting; it looks the same whether or not there are lights in the scene, and it is not shaded, giving it a flat rather than 3D appearance.
- MeshLambertMaterial and MeshPhongMaterial represent materials that need to be lit to be seen. They implement models of lighting known as Lambert shading and Phong shading. The main difference is that MeshPhongMaterial has a specular color but MeshLambertMaterial does not.
Material Properties (a short sketch using a few of them follows this list):
- vertexColors: a boolean property that can be set to true to use vertex colors from the geometry. The default is false.
- wireframe: a boolean value that indicates whether the mesh should be drawn as a wireframe model, showing only the outlines of its faces. The default is false. A true value works best with MeshBasicMaterial.
- wireframeLinewidth: the width of the lines used to draw the wireframe, in pixels.
- visible: a boolean value that controls whether the object on which it is used is rendered or not.
- side: has value THREE.FrontSide, THREE.BackSide, or THREE.DoubleSide, with the default being THREE.FrontSide. This determines whether faces of the mesh are drawn or not, depending on which side of the face is visible. With the default value, THREE.FrontSide, a face is drawn only if it is being viewed from the front. THREE.DoubleSide will draw it whether it is viewed from the front or from the back, and THREE.BackSide will draw it only if it is viewed from the back.
  - For closed objects, such as a cube or a complete sphere, the default value makes sense.
  - For a plane, an open tube, or a partial sphere, the value should be set to THREE.DoubleSide.
- flatShading: a boolean value with the default being false. It does not work with MeshBasicMaterial. For an object that is supposed to look faceted, with flat sides, it is important to set this property to true. That would be the case, for example, for a cube or for a cylinder with a small number of sides.
- Mesh Materials Examples
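A short sketch creating a mesh with a Phong material and a few of the properties above; the specific colors and parameters are arbitrary:

```javascript
const geometry = new THREE.SphereGeometry(1, 32, 16);   // radius, width segments, height segments

const material = new THREE.MeshPhongMaterial({
  color: 0x3366cc,            // basic surface color
  specular: 0x222222,         // specular color (the property MeshLambertMaterial lacks)
  shininess: 30,
  side: THREE.DoubleSide,     // draw faces seen from either side
  flatShading: false          // smooth shading; set to true for a faceted look
});

const sphere = new THREE.Mesh(geometry, material);
scene.add(sphere);
```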
Compared to geometries and materials, lights are easy. Three.js has several classes to represent lights, all of which are subclasses of THREE.Object3D. A light object can be added to a scene and will then illuminate objects in the scene; a short sketch of adding lights follows the list below.
- The class THREE.DirectionalLight represents light that shines in parallel rays from a given direction, like light from the sun. The position property of a directional light gives the direction from which the light shines; the light shines from the given position toward the origin.
- The class THREE.PointLight represents a light that shines in all directions from a point. The location of the point is given by the light's position property.
- A third type of light is THREE.AmbientLight. This class exists to add ambient light to a scene.
- The fourth type of light, THREE.SpotLight, is something new. It represents a spotlight, which is similar to a point light except that, instead of shining in all directions, a spotlight produces a cone of light. The vertex of the cone is located at the position of the light. By default, the axis of the cone points from that location toward the origin.
- Modeling Example
- Modeling Starter
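A sketch of adding the common light types to a scene; colors and intensities are arbitrary:

```javascript
// Directional light: shines from its position toward the origin, like the sun.
const sun = new THREE.DirectionalLight(0xffffff, 1);    // color, intensity
sun.position.set(5, 10, 5);
scene.add(sun);

// Ambient light: a little illumination for every surface, regardless of direction.
scene.add(new THREE.AmbientLight(0x404040));

// Point light: shines in all directions from a point.
const lamp = new THREE.PointLight(0xffeecc, 1);
lamp.position.set(0, 3, 0);
scene.add(lamp);
```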
Building Objects
In three.js, a visible object is constructed from a geometry and a material.
A mesh in three.js is what was called a polygonal mesh earlier in the notes. In a three.js mesh, all of the polygons must be triangles. A basic polygonal mesh representation does not use face indices; instead, it specifies each triangle by listing the coordinates of its vertices. This requires nine numbers, three per vertex, for the three vertices of each triangle.
A three.js mesh object requires a geometry and a material. The geometry is an object of type THREE.BufferGeometry, which has a position attribute that holds the coordinates of the vertices used in the mesh. The attribute uses a typed array that holds the coordinates of the vertices of the triangles that make up the mesh. When this geometry is used with a Lambert or Phong material, normal vectors are required; otherwise the material will appear black. You can call the geometry's computeVertexNormals() method to create normal vectors for the faces. It is possible to use different materials on different faces of a mesh with vertex groups.
In addition to letting you build indexed face sets, three.js has support for working with curves and surfaces that are defined mathematically.
A texture can be used to add visual interest and detail to an object. In three.js, an image texture is represented by an object of type THREE.Texture. The image for a three.js texture is generally loaded from a web address. A Texture has a number of properties that can be set, including properties for the minification and magnification filters and a property to control the generation of mipmaps, which is done automatically by default. The properties that you are most likely to want to change are the wrap mode for texture coordinates outside the range 0 to 1 and the texture transformation.
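For example, the wrap mode and texture transformation might be set like this (the image file name is hypothetical):

```javascript
const texture = new THREE.TextureLoader().load("textures/brick.png");  // hypothetical image URL

texture.wrapS = THREE.RepeatWrapping;   // wrap mode for texture coordinates outside 0..1
texture.wrapT = THREE.RepeatWrapping;
texture.repeat.set(4, 2);               // texture transformation: repeat 4 times horizontally, 2 vertically
texture.offset.set(0.5, 0);             // and translate the texture

const material = new THREE.MeshLambertMaterial({ map: texture });
```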
Although it is possible to create mesh objects by listing their vertices and faces, it would be difficult to do it by hand for all but very simple objects. It is much easier to design an object using an interactive modeling program such as Blender. Modeling programs like Blender can export objects using many different file formats, and three.js has utility functions for loading models from files in a variety of formats. The preferred format for model files is glTF. A glTF model can be stored in a text file with extension .gltf or in a binary file with extension .glb. Binary files are smaller but not human-readable. The data returned by the GLTFLoader contains a three.js Scene; any object defined by the file will be part of the scene graph for that scene. The object comes complete with both geometry and material.
Other Features
THREE.InstancedMesh makes it possible to quickly render several objects, possibly a large number of objects, that use the same geometry but differ in the transformations that are applied to them and, possibly, in their material color. Each copy of the object is called an instance.
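A sketch of an InstancedMesh with per-instance transformations; the count and random positions are arbitrary:

```javascript
const count = 100;
const boxes = new THREE.InstancedMesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshLambertMaterial({ color: 0x8888ff }),
  count
);

const dummy = new THREE.Object3D();   // helper object used only to build matrices
for (let i = 0; i < count; i++) {
  dummy.position.set(Math.random() * 20 - 10, Math.random() * 20 - 10, Math.random() * 20 - 10);
  dummy.updateMatrix();
  boxes.setMatrixAt(i, dummy.matrix); // per-instance transformation
}

scene.add(boxes);
```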
Most real programs require user input. In three.js, you can use the mouse to rotate the scene with the class TrackballControls or the class OrbitControls.
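For example, OrbitControls can be attached to the camera and the renderer's canvas; the import path is the usual add-on location but may differ by setup:

```javascript
import { OrbitControls } from "three/addons/controls/OrbitControls.js"; // path may differ by setup

const controls = new OrbitControls(camera, renderer.domElement);

function animate() {
  requestAnimationFrame(animate);
  controls.update();              // needed if damping or auto-rotate is enabled; harmless otherwise
  renderer.render(scene, camera);
}
animate();
```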
THREE.Raycaster allows you to detect which objects in the scene were clicked.
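A common pattern, sketched here, is to convert a mouse click into normalized device coordinates and ask the raycaster which objects lie under it (this assumes the canvas fills the browser window):

```javascript
const raycaster = new THREE.Raycaster();
const pointer = new THREE.Vector2();

renderer.domElement.addEventListener("pointerdown", (event) => {
  // Convert pixel coordinates to normalized device coordinates in the range -1..1.
  pointer.x = (event.clientX / window.innerWidth) * 2 - 1;
  pointer.y = -(event.clientY / window.innerHeight) * 2 + 1;

  raycaster.setFromCamera(pointer, camera);
  const hits = raycaster.intersectObjects(scene.children, true); // true = search recursively

  if (hits.length > 0) {
    console.log("Clicked object:", hits[0].object);  // hits are sorted nearest first
  }
});
```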
Shadows can add a nice touch of realism to a scene, but OpenGL and WebGL cannot generate shadows automatically. One method for computing shadows, shadow mapping, is implemented in three.js, but it is not trivial to use.
The basic idea of shadow mapping is fairly straightforward: To tell what parts of a scene are in shadow, you have to look at the scene from the point of view of the light source. Things that are visible from the point of view of the light are illuminated by that light. Things that are not visible from the light are in shadow.
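In three.js, shadow mapping has to be enabled on the renderer, on the light, and on the objects that cast and receive shadows. A minimal sketch, with arbitrary shapes and positions:

```javascript
renderer.shadowMap.enabled = true;          // turn on shadow mapping in the renderer

const light = new THREE.DirectionalLight(0xffffff, 1);
light.position.set(5, 10, 5);
light.castShadow = true;                    // this light renders the scene from its own viewpoint

const box = new THREE.Mesh(new THREE.BoxGeometry(1, 1, 1), new THREE.MeshPhongMaterial());
box.position.y = 1;
box.castShadow = true;                      // the box casts a shadow...

const ground = new THREE.Mesh(new THREE.PlaneGeometry(20, 20), new THREE.MeshPhongMaterial());
ground.rotation.x = -Math.PI / 2;           // lie flat
ground.receiveShadow = true;                // ...and the ground shows it

scene.add(light, box, ground);
```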
It would be nice to put our scenes in an environment such as the interior of a building, a nature scene, or a public square. It is not practical to model the whole environment in geometry, but a reasonably good effect can be achieved using textures. The technique used in three.js is called a skybox: a large cube, effectively infinitely large, where a different texture is applied to each face of the cube. The textures are images of some environment.
For a viewer inside the cube, the six texture images on the cube fit together to provide a complete view of the environment in every direction. The six texture images together make up what is called a cubemap texture. The images must match up along the edges of the cube to form a seamless view of the environment.
A reflective surface shouldn't just reflect light - it should reflect its environment. Three.js can use environmental mapping to simulate reflection. Environmental mapping uses a cube map texture.
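A sketch of a skybox plus environment mapping with a cubemap texture; the six image file names are hypothetical and are given in the order +x, -x, +y, -y, +z, -z:

```javascript
// Load six face images as one cubemap texture (hypothetical file names).
const cubeMap = new THREE.CubeTextureLoader().load([
  "px.jpg", "nx.jpg", "py.jpg", "ny.jpg", "pz.jpg", "nz.jpg"
]);

scene.background = cubeMap;     // used as a skybox: the background in every direction

// Environment mapping: a "mirrored" sphere that reflects the cubemap.
const mirrorBall = new THREE.Mesh(
  new THREE.SphereGeometry(1, 32, 16),
  new THREE.MeshBasicMaterial({ envMap: cubeMap })
);
scene.add(mirrorBall);
```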
Introduction to WebGL
WebGL is the version of OpenGL for the Web. Three.js uses WebGL for 3D graphics. It is more difficult to use WebGL directly, but doing so gives you full control over the graphics hardware. Learning it will be a good introduction to modern graphics programming. There are two sides to any WebGL program. Part of the program is written in JavaScript, the programming language for the web. The second part is written in GLSL, a language for writing shader
programs that run on the GPU.
The Programmable Pipeline
OpenGL 1.1 uses a fixed-function pipeline for graphics processing. Data is provided by a program and passes through a series of processing stages that ultimately produce the pixel colors seen in the final image. The program can enable and disable some of the steps in the process, such as the depth test and lighting calculations. But there is no way for it to change what happens at each stage. The functionality is fixed.
OpenGL 2.0 introduced a programmable pipeline. It became possible for the programmer to replace certain stages in the pipeline with their own programs. This gives the programmer complete control over what happens at each stage. WebGL uses a programmable pipeline, and it is mandatory. The programs that are written as part of the pipeline are called shaders. For WebGL, you need to write a vertex shader, which is called once for each vertex in a primitive, and a fragment shader, which is called once for each pixel in the primitive. Shaders are written in some version of the GLSL programming language. GLSL is based on the C programming language.
The WebGL graphics pipeline renders an image. The data that defines the image comes from JavaScript. As it passes through the pipeline, it is processed by the current vertex shader and fragment shader as well as by the fixed-function stages of the pipeline. The basic operation in WebGL is to draw a geometric primitive. WebGL uses just 7 of the OpenGL primitives. When WebGL is used to draw a primitive, there are two general categories of data that can be provided for the primitive. The two kinds are referred to as attribute variables and uniform variables. A primitive is defined by its type and list of vertices.
- One attribute that should always be specified is the coordinates of the vertex. The vertex coordinates must be an attribute since each vertex in a primitive has its own set of coordinates.
- Another possible attribute is color. You can specify a different color for each vertex.
WebGL does not come with any predefined attributes, not even one for vertex coordinates. The meaning of attributes is determined by what is done with them.
When drawing a primitive, the JavaScript program specifies values for any attributes and uniforms in the shader program. For each attribute, it will specify an array of values, one for each vertex. For each uniform, it will specify a single value. It must send these values to the GPU before drawing the primitive. The primitive can then be drawn by calling a single JavaScript function. At that point, the GPU takes over, and executes the shader programs. When drawing the primitive, the GPU calls the vertex shader once for each vertex. The attribute values for the vertex that is to be processed are passed as input into the vertex shader. Values of uniform variables are also passed to the vertex shader. The way this works is that both attributes and uniforms are represented as global variables in the vertex shader program. Before calling the shader for a given vertex, the GPU sets the values of those variables appropriately for that specific vertex.
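A sketch of the JavaScript side, assuming gl is a WebGL rendering context and prog is an already compiled and linked shader program; the attribute and uniform names (a_coords, u_color) are made up for this example:

```javascript
// Attribute values: one (x,y) coordinate pair per vertex of a triangle.
const coords = new Float32Array([ -0.5, -0.5,   0.5, -0.5,   0.0, 0.5 ]);

const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, coords, gl.STATIC_DRAW);      // send the attribute array to the GPU

const coordsLoc = gl.getAttribLocation(prog, "a_coords");
gl.vertexAttribPointer(coordsLoc, 2, gl.FLOAT, false, 0, 0); // 2 floats per vertex from the buffer
gl.enableVertexAttribArray(coordsLoc);

// Uniform value: a single color for the whole primitive.
const colorLoc = gl.getUniformLocation(prog, "u_color");
gl.uniform4f(colorLoc, 1.0, 0.0, 0.0, 1.0);

gl.drawArrays(gl.TRIANGLES, 0, 3);   // at this point the GPU runs the vertex and fragment shaders
```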
As one of its outputs, the vertex shader must specify the coordinates of the vertex in the clip coordinate system. It does that by assigning a value to a special variable named gl_Position. The position is often computed by applying a transformation to the attribute that represents the coordinates in the object coordinate system. After the positions of the vertices have been computed, a fixed-function stage in the pipeline clips away the parts of the primitive whose coordinates are outside the range of valid clip coordinates. The primitive is then rasterized: it is determined which pixels lie inside the primitive. The GPU then calls the fragment shader once for each pixel that lies in the primitive. Pixel coordinates are computed by interpolating the values of gl_Position that were specified by the vertex shader.
Other quantities besides the coordinates can work in much the same way. A varying variable is declared both in the vertex shader and the fragment shader. The vertex shader is responsible for assigning a value to the varying variable. Each vertex of a primitive can assign a different value to the variable. The interpolator takes all the values produced by executing the vertex shader for each vertex in the primitive, and it interpolates those values to produce a value for each pixel.
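A hypothetical shader pair, written as JavaScript strings in GLSL ES 1.00, showing an attribute, a uniform, a varying variable, gl_Position, and gl_FragColor (the variable names are invented for this sketch):

```javascript
const vertexShaderSource = `
    attribute vec2 a_coords;     // per-vertex attribute: object coordinates
    attribute vec3 a_color;      // per-vertex attribute: a color for each vertex
    uniform mat4 u_transform;    // uniform: same value for every vertex
    varying vec3 v_color;        // output, interpolated and passed to the fragment shader
    void main() {
        v_color = a_color;
        gl_Position = u_transform * vec4(a_coords, 0.0, 1.0);  // clip coordinates
    }`;

const fragmentShaderSource = `
    precision mediump float;
    varying vec3 v_color;        // the interpolated value for this pixel
    void main() {
        gl_FragColor = vec4(v_color, 1.0);
    }`;
```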
The JavaScript side of the program sends values for attributes and uniform variables to the GPU and then issues a command to draw a primitive. The GPU executes the vertex shader once for each vertex. The vertex shader can use the values of attributes and uniforms, and it assigns values to gl_Position and to any varying variables that exist in the shader. After clipping, rasterization, and interpolation, the GPU executes the fragment shader once for each pixel in the primitive. The fragment shader can use the values of varying variables, uniform variables, and gl_FragCoord, and it computes a value for gl_FragColor.
3D Graphics with WebGL
A procedural texture is defined by a function whose value is computed rather than looked up. That is, the texture coordinates are used as input to a code segment whose output is the corresponding color value for the texture. A texture can refer to variation in any property. One example is bump mapping, where the property that is modified by the texture is the normal vector to the surface.
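As a small illustration, here is a fragment shader (again as a JavaScript string) that computes a checkerboard procedural texture from the texture coordinates; the varying name v_texCoords is assumed to be supplied by the vertex shader:

```javascript
const checkerboardFragmentShader = `
    precision mediump float;
    varying vec2 v_texCoords;                     // texture coordinates from the vertex shader
    void main() {
        float col = floor(v_texCoords.x * 8.0);   // 8 squares in each direction
        float row = floor(v_texCoords.y * 8.0);
        if (mod(col + row, 2.0) < 1.0)
            gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);   // white square
        else
            gl_FragColor = vec4(0.2, 0.2, 0.2, 1.0);   // dark square
    }`;
```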
Natural-looking textures often have some element of randomness. We can't use actual randomness, since the texture would look different every time it is drawn. Many natural-looking procedural textures are instead based on a type of pseudo-randomness called Perlin noise.
In WebGL, a framebuffer is a data structure that organizes the memory resources that are needed to render an image. A depth mask is a boolean value that controls whether values are written to the depth buffer during rendering.
Beyond Basic 3D Graphics
- Ray Tracing: To find out what you see when you look in a given direction, consider a ray of light that arrives at your location from that direction, and follow that light ray backwards to see where it came from. Or, as it is usually phrased, cast a ray from your location in a given direction and see what it hits. The operation of determining what is hit by a ray is called ray casting.
- Path Tracing: Like ray tracing, path tracing computes colors for points in an image by tracing the paths of light rays backwards from the viewer through points on the image and into the scene. In path tracing, the idea is to account for all possible paths that the light could have followed.
Introduction to WebGPU
- WebGPU is a new API for computer graphics on the web. It is a very low-level API.