In rasterization, the near plane is a weird beast. The reason for its existence is far from intuitive - after all, ray tracing does fine without it. Why does rasterization specifically have to deal with this, and with the associated trade-off between depth precision and clipping artifacts for geometry near the camera?
One reason that comes to mind is that rasterization typically uses the projected depth value Z/W, because it's convenient (the difference between perspective and orthographic projection boils down to a few matrix values) and efficient (Z/W is linear in screen space, making hardware implementation trivial). This value isn't evenly distributed along the view vector however, and quickly goes towards minus infinity as we get closer to the camera. Historically, the near plane made sense as an arbitrary cut-off point for fixed-point depth values.
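To make the non-uniform distribution concrete, here is a tiny sketch (plain C++; the D3D-style mapping z_ndc = A + B/z_view is standard, but the near/far values are just assumptions for illustration) that evaluates Z/W at a few view-space depths:

```cpp
// Sketch: how Z/W behaves along the view vector for a perspective projection.
// Assumes a D3D-style depth mapping z_ndc = A + B / z_view, with z_view the
// positive distance in front of the camera; near/far chosen arbitrarily.
#include <cstdio>

int main() {
    const float near_z = 0.1f, far_z = 1000.0f;
    const float A = far_z / (far_z - near_z);
    const float B = -far_z * near_z / (far_z - near_z);

    const float samples[] = { 1000.0f, 10.0f, 1.0f, 0.1f, 0.01f, 0.001f };
    for (float z : samples)
        std::printf("z_view = %8.3f  ->  Z/W = %f\n", z, A + B / z);
    // Most of the [near, far] range maps to values close to 1, the near plane
    // maps to 0, and Z/W races towards minus infinity as z_view approaches 0.
}
```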
Nowadays, we could easily work around this by storing Z/W as floating point values in the depth buffer, or by switching to linear depth Z altogether. Modern GPUs typically use 32bit depth buffers anyway - using D24S8 as a depth stencil format often means that the hardware has a 32bit buffer for depth (out of which 8 bits are unused padding) and a separate 8bit stencil buffer, potentially even in a specialized on-chip memory region. Using floating point depth doesn't incur any extra cost, so why is the near plane still a thing?
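For reference, a stored Z/W value can always be mapped back to view-space depth (though any precision already lost in storage stays lost); a one-line sketch under the same assumed convention:

```cpp
// Sketch: recover linear view-space depth from a stored Z/W value, assuming
// the hypothetical D3D-style mapping z_ndc = A + B / z_view from above.
float linearize_depth(float z_over_w, float A, float B) {
    return B / (z_over_w - A);  // invert the hyperbolic mapping
}
```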
Consider the basic steps of rasterizing a 3D triangle, sketched in code below:
- Project 3D vertices to 2D pixel positions
- Determine pixels covered by 2D triangle
- Interpolate vertex attributes with perspective correction for each pixel
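Here is what those three steps look like as a toy software rasterizer - a minimal sketch, not how hardware actually does it; vertices are assumed to arrive in clip space with a single scalar attribute, and the color/depth buffers are assumed to be width*height arrays with depth initialized to 1.0:

```cpp
// Toy rasterizer following the three steps above. Conventions (viewport
// transform, depth range, fill rule) are simplified assumptions.
#include <algorithm>
#include <cmath>
#include <vector>

struct Vertex { float x, y, z, w, attr; };  // clip-space position + one attribute

// Edge function: twice the signed area of triangle (a, b, c) in pixel space.
static float edge(float ax, float ay, float bx, float by, float cx, float cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

void rasterize(const Vertex (&v)[3], int width, int height,
               std::vector<float>& color, std::vector<float>& depth) {
    // Step 1: project to 2D pixel positions (perspective divide + viewport).
    float px[3], py[3], pz[3], invW[3];
    for (int i = 0; i < 3; ++i) {
        invW[i] = 1.0f / v[i].w;               // breaks down for w <= 0!
        px[i] = (v[i].x * invW[i] * 0.5f + 0.5f) * width;
        py[i] = (v[i].y * invW[i] * 0.5f + 0.5f) * height;
        pz[i] = v[i].z * invW[i];              // Z/W, linear in screen space
    }
    float area = edge(px[0], py[0], px[1], py[1], px[2], py[2]);
    if (area == 0.0f) return;                  // degenerate triangle

    // Step 2: test every pixel in the bounding box against the three edges.
    int x0 = std::max(0, (int)std::floor(std::min({px[0], px[1], px[2]})));
    int x1 = std::min(width - 1, (int)std::ceil(std::max({px[0], px[1], px[2]})));
    int y0 = std::max(0, (int)std::floor(std::min({py[0], py[1], py[2]})));
    int y1 = std::min(height - 1, (int)std::ceil(std::max({py[0], py[1], py[2]})));

    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            float sx = x + 0.5f, sy = y + 0.5f;
            float w0 = edge(px[1], py[1], px[2], py[2], sx, sy) / area;
            float w1 = edge(px[2], py[2], px[0], py[0], sx, sy) / area;
            float w2 = edge(px[0], py[0], px[1], py[1], sx, sy) / area;
            if (w0 < 0 || w1 < 0 || w2 < 0) continue;   // outside the triangle

            // Step 3: perspective-correct interpolation - interpolate attr/W
            // and 1/W linearly in screen space, then divide.
            float oneOverW = w0 * invW[0] + w1 * invW[1] + w2 * invW[2];
            float attr = (w0 * v[0].attr * invW[0] + w1 * v[1].attr * invW[1] +
                          w2 * v[2].attr * invW[2]) / oneOverW;
            float z = w0 * pz[0] + w1 * pz[1] + w2 * pz[2];

            int idx = y * width + x;
            if (z < depth[idx]) { depth[idx] = z; color[idx] = attr; }
        }
}
```

The 1/W in step 1 is where the trouble starts: a vertex behind the camera has w <= 0, and nothing in this code is prepared for that.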
Except that this recipe is not quite correct. When projecting a 3D triangle onto a 2D screen, the result is not necessarily a triangle. In particular, when part of a triangle is in front of the camera plane and another part is behind it, the resulting 2D shape has three edges, two of which extend to infinity rather than meeting in a common point. However, because it is much easier, both on an intuitive and on a technical level, to work under the assumption that 3D triangles map to 2D triangles, the near plane was introduced to clip away those parts of the geometry that don't fulfill this assumption.
A better mental model is perhaps that a half-space in 3D is projected to a half-space in 2D. A triangle is simply the intersection of the 3 half-spaces defined by its edges, and the projection of a triangle is the intersection of the projections of those half-spaces. As it happens, these projected half-spaces do not necessarily enclose a finite area.
Until ray tracing finally takes over real-time rendering, which is perpetually 10 years away at any given point in time, there's a technique called clipless rasterization that does not require clipping geometry against the near plane. It works by projecting 3D edges directly to 2D half-space equations, against which points in the image plane can be tested. NVIDIA GPUs have implemented this technique for a number of years, while AMD still uses traditional clipping rasterization.
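The core of it fits in a few lines: the projected edge through two vertices can be computed directly as a cross product of their clip-space (x, y, w) coordinates, yielding a half-space equation that stays valid even when vertices lie behind the camera. A minimal sketch in the spirit of Olano and Greer's 2D homogeneous rasterization (function names are mine, and sign/orientation handling is simplified):

```cpp
// Sketch of clipless edge setup using 2D homogeneous coordinates.
// No perspective divide and no near clipping anywhere; w <= 0 is not special.
#include <array>
#include <initializer_list>

struct Clip { float x, y, w; };  // clip-space vertex, z omitted for brevity

// Coefficients (a, b, c) of the projected edge through p and q: a point
// (px, py) in NDC space lies inside the half-space when a*px + b*py + c >= 0.
static std::array<float, 3> edge_equation(const Clip& p, const Clip& q) {
    return { p.y * q.w - q.y * p.w,   // cross product of (x, y, w) vectors
             q.x * p.w - p.x * q.w,
             p.x * q.y - q.x * p.y };
}

// Coverage test against all three edges, assuming a consistent orientation
// convention; a real implementation derives the sign from the clip-space
// determinant and applies a proper fill rule for points exactly on an edge.
static bool covered(const Clip (&v)[3], float px, float py) {
    for (const auto& e : { edge_equation(v[0], v[1]),
                           edge_equation(v[1], v[2]),
                           edge_equation(v[2], v[0]) }) {
        if (e[0] * px + e[1] * py + e[2] < 0.0f) return false;
    }
    return true;
}
```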
While a clipless rasterizer is fairly straightforward to implement once all the math is worked out, it is difficult to control its precision. With clipping, the computed 2D vertex positions are typically rounded to some fixed-point format, after which all operations are carried out exactly. Any deviation between the rasterized area and the "ground truth" is directly determined by the chosen fixed-point precision. Without clipping, we only have edge equations, which are very sensitive to numerical errors. Consider two edges meeting at an acute angle: if either edge is offset by a small amount, their intersection point can move very far. Another issue is that two triangles sharing an edge might generate numerically different but geometrically identical edge equations, yet the rasterizer must make identical coverage decisions along that edge for both, so that no pixels are missed or shaded twice. One solution is to use floating point formats with enough mantissa bits to ensure that all edge computations are lossless. This is costly in hardware, but on the other hand it removes the need for complex clipping circuitry in the front-end.
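For contrast, the snap-then-compute-exactly strategy of a conventional clipping rasterizer is easy to sketch: once positions are rounded to a fixed subpixel grid, the edge functions fit into integer arithmetic and are evaluated without further rounding (the 8 subpixel bits below are an assumed, typical value):

```cpp
// Sketch: fixed-point snapping followed by exact integer edge evaluation.
#include <cmath>
#include <cstdint>

constexpr int SUBPIXEL_BITS = 8;  // assumed subpixel precision

// Round a projected pixel coordinate to the subpixel grid.
static std::int64_t snap(float v) {
    return (std::int64_t)std::lround(v * (1 << SUBPIXEL_BITS));
}

// Exact in 64-bit integers for any realistic screen size: two triangles
// sharing an edge see bit-identical edge values, so a simple tie-breaking
// fill rule yields watertight coverage with no gaps or double-shaded pixels.
static std::int64_t edge_exact(std::int64_t ax, std::int64_t ay,
                               std::int64_t bx, std::int64_t by,
                               std::int64_t cx, std::int64_t cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```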
However, all current graphics APIs are specified in terms of traditional rasterization, including the requirement of near-clipping. Because of this, NVIDIA emulates the effect of clipping against the near plane by discarding pixels below some depth threshold, meaning all of the user-facing benefits of clipless rasterization are lost.
Until those specifications are fixed, I guess we'll have to live with the near plane a little bit longer.