Unity's ShadowSamplingTent source code

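The most basic form of PCF works as follows: project the shading point into light space, take a neighborhood of samples around it (for example the 9 texels of a 3*3 window), and compare the depth of each sample with that of the shading point, recording 1 if the point is visible and 0 if it is not. Averaging all the comparison results gives the visibility of the shading point.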

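Problems with basic PCF:

  • Too many samples are needed: an N*N PCF samples the shadow map \(N^2\) times. The rest of this post covers the optimization found in the Unity source, where an N*N PCF needs only \(\left\lceil \frac{N}{2} \right\rceil \times \left\lceil \frac{N}{2} \right\rceil\) fetches.
  • Basic PCF simply averages the visibility of all samples, so every sample in the window contributes equally, which is not very reasonable. Per-sample contributions naturally suggest convolution, since a convolution filter can be read as a set of contribution weights. Below we use a tent filter.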
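
For reference, the basic PCF described above can be sketched in a few lines. This is a minimal Python illustration rather than Unity code: the helper name `naive_pcf`, the shadow map stored as a list of rows, the border clamping, and the "lit when receiver depth <= stored depth" convention are all assumptions made for the example.

```python
def naive_pcf(shadow_map, x, y, receiver_depth, n=3):
    """Average n*n binary depth comparisons around texel (x, y), n odd.

    Hypothetical helper: shadow_map is a list of rows of stored depths,
    and coordinates outside the map are clamped to the border.
    """
    height, width = len(shadow_map), len(shadow_map[0])
    half = n // 2
    visible = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # Count 1 when the shading point is not behind the stored occluder.
            if receiver_depth <= shadow_map[sy][sx]:
                visible += 1
    # Every sample contributes equally: n*n shadow map reads, uniform weights.
    return visible / (n * n)
```

Both problems from the list are visible here: the double loop reads the shadow map n*n times, and the final division gives every sample the same weight.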

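Unity uses isosceles right triangles of different sizes as the tent filter (presumably because areas under this shape are easy to compute). A tent 3, 5, or 7 texels wide is laid over a sampling kernel of 4, 6, or 8 texels to determine how much each texel contributes to the shadow, and PCF is then performed with \(\left(\frac{n}{2}\right)^2\) fetches for a kernel n texels wide.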

GetTriangleTexelArea

SampleShadow_GetTriangleTexelArea
// Assuming a isoceles right angled triangle of height "triangleHeight" (as drawn below).
// This function return the area of the triangle above the first texel.
//
// |\      <-- 45 degree slop isosceles right angled triangle
// | \
// ----    <-- length of this side is "triangleHeight"
// _ _ _ _ <-- texels
real SampleShadow_GetTriangleTexelArea(real triangleHeight)
{
    return triangleHeight - 0.5;
}

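This function computes the area of the part of the isosceles right triangle that lies over the first texel.

With triangleHeight = h (and h at least 1), that part is a trapezoid of width 1 whose parallel sides have lengths h and h-1, so: \(S = (h + (h-1)) \times 1 \times \frac{1}{2} = h - 0.5\).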


GetTexelAreas_Tent

SampleShadow_GetTexelAreas_Tent_3x3
// Assuming a isoceles triangle of 1.5 texels height and 3 texels wide lying on 4 texels.
// This function return the area of the triangle above each of those texels.
//    |    <-- offset from -0.5 to 0.5, 0 meaning triangle is exactly in the center
//   / \   <-- 45 degree slop isosceles triangle (ie tent projected in 2D)
//  /   \
// _ _ _ _ <-- texels
// X Y Z W <-- result indices (in computedArea.xyzw and computedAreaUncut.xyzw)
void SampleShadow_GetTexelAreas_Tent_3x3(real offset, out real4 computedArea, out real4 computedAreaUncut)
{
    // Compute the exterior areas
    real offset01SquaredHalved = (offset + 0.5) * (offset + 0.5) * 0.5;
    computedAreaUncut.x = computedArea.x = offset01SquaredHalved - offset;
    computedAreaUncut.w = computedArea.w = offset01SquaredHalved;

    // Compute the middle areas
    // For Y : We find the area in Y of as if the left section of the isoceles triangle would
    // intersect the axis between Y and Z (ie where offset = 0).
    computedAreaUncut.y = SampleShadow_GetTriangleTexelArea(1.5 - offset);
    // This area is superior to the one we are looking for if (offset < 0) thus we need to
    // subtract the area of the triangle defined by (0,1.5-offset), (0,1.5+offset), (-offset,1.5).
    real clampedOffsetLeft = min(offset,0); // the small triangle is subtracted only when the offset is negative
    real areaOfSmallLeftTriangle = clampedOffsetLeft * clampedOffsetLeft;
    computedArea.y = computedAreaUncut.y - areaOfSmallLeftTriangle;

    // We do the same for the Z but with the right part of the isoceles triangle
    computedAreaUncut.z = SampleShadow_GetTriangleTexelArea(1.5 + offset);
    real clampedOffsetRight = max(offset,0);
    real areaOfSmallRightTriangle = clampedOffsetRight * clampedOffsetRight;
    computedArea.z = computedAreaUncut.z - areaOfSmallRightTriangle;
}

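A 2D tent filter is the product of a horizontal 1D tent filter and a vertical one (see the post on Convolution). So although the function name says 3*3 tent filter (a 3*3 filter covers between 3*3 and 4*4 texels), what it actually computes is the corresponding 1D tent filter.

What the code above does is compute the area of the tent filter over each of the four texels x, y, z, w.
Why use the area of the tent filter over a texel as that texel's contribution?
Using the value of the tent filter at the texel center as the contribution would amount to reconstructing a continuous signal from the discrete texel impulses with the tent filter itself. That signal is piecewise linear (it looks like a sawtooth) and only \(C^0\) continuous, so it is not smooth and can look abrupt in places. Using the area instead amounts to first reconstructing with a box filter, which gives a staircase-shaped continuous signal, and then smoothing that with the tent filter, which gives a smoother signal and a better result.
Setting the signal-processing view aside, computing the area takes into account the contribution of every small segment inside a texel rather than treating the texel as a single block, which is clearly more reasonable.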

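Area in w
With no offset, the base of the triangle extends 0.5 into w. Let offset = d; after the shift the length inside w becomes 0.5+d. The part of the tent filter inside w is always an isosceles right triangle, so the area in w is \(S_w = (d + 0.5) \times (d + 0.5) \times 0.5\).
Area in x
The lengths of the base inside x and inside w always add up to 1, so if the length in w is 0.5+d, the length in x is 0.5-d, and therefore:
\(S_x = (0.5-d) \times (0.5-d) \times 0.5\). Expanding \(S_x\) and \(S_w\) gives the relation \(S_x = S_w - d\).
Area in y

The green-outlined region in the figure is the area in y under different offsets; here h = 1.5 is the height of the tent. When the offset is non-negative (right side of the figure), the area is simply the area over the first texel of an isosceles right triangle of height h-d (a call to SampleShadow_GetTriangleTexelArea). When the offset is negative (left side of the figure), the apex of the tent lies inside y, and the area is that same quantity minus the area of the red isosceles right triangle, which is \(d^2\).

Area in z
The reasoning is the same as for y, mirrored: the triangle height is h+d and the small triangle is subtracted only when the offset is positive.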
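
The four closed-form areas can be checked against a brute-force integral of the tent. The following is a hedged Python transcription of the cut areas from SampleShadow_GetTexelAreas_Tent_3x3, not the HLSL itself; the function names and the placement of the texels on [-2,-1], [-1,0], [0,1], [1,2] with the tent \(1.5 - |x - d|\) centered at the offset are assumptions made for this sketch.

```python
def tent_areas_3x3(offset):
    """Closed-form areas over texels x, y, z, w (cut areas only)."""
    w = (offset + 0.5) ** 2 * 0.5
    x = w - offset
    # triangleHeight - 0.5, minus the small triangle when the apex is inside the texel.
    y = (1.5 - offset) - 0.5 - min(offset, 0.0) ** 2
    z = (1.5 + offset) - 0.5 - max(offset, 0.0) ** 2
    return [x, y, z, w]


def tent_areas_numeric(offset, steps=4000):
    """Midpoint-rule integral of max(1.5 - |x - offset|, 0) over each texel."""
    dx = 1.0 / steps
    areas = []
    for left in (-2.0, -1.0, 0.0, 1.0):
        total = 0.0
        for i in range(steps):
            xm = left + (i + 0.5) * dx
            total += max(1.5 - abs(xm - offset), 0.0)
        areas.append(total * dx)
    return areas
```

Comparing `tent_areas_3x3(d)` with `tent_areas_numeric(d)` for a few offsets in [-0.5, 0.5] is a quick way to convince yourself that the min/max clamps pick the right case for y and z.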

SampleShadow_GetTexelWeights_Tent_3x3
// Assuming a isoceles triangle of 1.5 texels height and 3 texels wide lying on 4 texels.
// This function return the weight of each texels area relative to the full triangle area.
void SampleShadow_GetTexelWeights_Tent_3x3(real offset, out real4 computedWeight)
{
    real4 dummy;
    SampleShadow_GetTexelAreas_Tent_3x3(offset, computedWeight, dummy);
    computedWeight *= 0.44444;//0.44 == 1/(the triangle area)
}

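Finally, the areas are divided by the area of the whole tent filter (\(\frac{1}{2} \times 3 \times 1.5 = 2.25\), hence the factor 0.44444) so that the filter integrates to 1; otherwise energy would not be conserved between input and output.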
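
As a sanity check on the normalization, the four areas can be summed by hand. Writing \(S_y\) and \(S_z\) for the cut areas in y and z (names introduced here for convenience), exactly one of them has the small triangle \(d^2\) subtracted unless \(d = 0\), so for any offset d in [-0.5, 0.5]:
$$
S_x + S_w = \frac{(0.5-d)^2}{2} + \frac{(0.5+d)^2}{2} = 0.25 + d^2
$$
$$
S_y + S_z = (1 - d) + (1 + d) - d^2 = 2 - d^2
$$
$$
S_x + S_y + S_z + S_w = 2.25
$$
The total does not depend on d and equals the area of the whole triangle, so multiplying by \(1/2.25 \approx 0.44444\) makes the four weights sum to 1 (up to the rounding of the constant).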

5*5 && 7*7
Only the 5*5 case is shown below:

SampleShadow_GetTexelWeights_Tent_5x5
// Assuming a isoceles triangle of 2.5 texel height and 5 texels wide lying on 6 texels.
// This function return the weight of each texels area relative to the full triangle area.
//  /       \
// _ _ _ _ _ _ <-- texels
// 0 1 2 3 4 5 <-- computed area indices (in texelsWeights[])
void SampleShadow_GetTexelWeights_Tent_5x5(real offset, out real3 texelsWeightsA, out real3 texelsWeightsB)
{
    // See _UnityInternalGetAreaPerTexel_3TexelTriangleFilter for details.
    real4 computedArea_From3texelTriangle;
    real4 computedAreaUncut_From3texelTriangle;
    SampleShadow_GetTexelAreas_Tent_3x3(offset, computedArea_From3texelTriangle, computedAreaUncut_From3texelTriangle);

    // Triangle slope is 45 degree thus we can almost reuse the result of the 3 texel wide computation.
    // the 5 texel wide triangle can be seen as the 3 texel wide one but shifted up by one unit/texel.
    // 0.16 is 1/(the triangle area)
    texelsWeightsA.x = 0.16 * (computedArea_From3texelTriangle.x);
    texelsWeightsA.y = 0.16 * (computedAreaUncut_From3texelTriangle.y);
    texelsWeightsA.z = 0.16 * (computedArea_From3texelTriangle.y + 1);
    texelsWeightsB.x = 0.16 * (computedArea_From3texelTriangle.z + 1);
    texelsWeightsB.y = 0.16 * (computedAreaUncut_From3texelTriangle.z);
    texelsWeightsB.z = 0.16 * (computedArea_From3texelTriangle.w);
}

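A 5*5 tent filter covers between 5*5 and 6*6 texels. As in the 3*3 case, the goal is to compute the area of the tent filter over the six texels Ax, Ay, Az, Bx, By, Bz.
Thanks to the convenient properties of the isosceles right triangle, we can reuse the results of SampleShadow_GetTexelAreas_Tent_3x3 directly and then add the remaining area to obtain the result of SampleShadow_GetTexelWeights_Tent_5x5.

As the figure shows, the red, blue, and green small triangles are all isosceles right triangles identical to the 3*3 tent filter. Take Az as an example: its area equals the area of the blue triangle's y region (using the 3*3 region names) plus a square of area 1 underneath it. The areas over Ax, Ay, Az, Bx, By, Bz all follow in the same way.

The 7*7 case is similar: compute SampleShadow_GetTexelAreas_Tent_3x3 first, then derive the 7*7 result from the 3*3 one.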
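
The reuse of the 3*3 areas can be verified numerically. Below is a hedged Python transcription of SampleShadow_GetTexelWeights_Tent_5x5 next to a brute-force integral; the function names and the assumed texel layout ([-3,-2] through [2,3], tent \(2.5 - |x - d|\)) are illustrative choices, not part of the Unity source.

```python
def tent_areas_3x3(offset):
    """3-texel tent areas, returned as (cut, uncut) tuples in x, y, z, w order."""
    w = (offset + 0.5) ** 2 * 0.5
    x = w - offset
    y_uncut = (1.5 - offset) - 0.5
    z_uncut = (1.5 + offset) - 0.5
    y = y_uncut - min(offset, 0.0) ** 2
    z = z_uncut - max(offset, 0.0) ** 2
    return (x, y, z, w), (x, y_uncut, z_uncut, w)


def tent_weights_5x5(offset):
    """Six weights Ax, Ay, Az, Bx, By, Bz; 0.16 == 1 / 6.25 (the triangle area)."""
    cut, uncut = tent_areas_3x3(offset)
    return [0.16 * cut[0], 0.16 * uncut[1], 0.16 * (cut[1] + 1.0),
            0.16 * (cut[2] + 1.0), 0.16 * uncut[2], 0.16 * cut[3]]


def tent_weights_5x5_numeric(offset, steps=4000):
    """Midpoint-rule integral of the normalized 5-texel tent over six texels."""
    dx = 1.0 / steps
    weights = []
    for left in range(-3, 3):
        total = 0.0
        for i in range(steps):
            xm = left + (i + 0.5) * dx
            total += max(2.5 - abs(xm - offset), 0.0)
        weights.append(total * dx / 6.25)
    return weights
```

Note how the two outer-middle texels (Ay, By) use the uncut areas: the apex of the 5-texel tent never reaches them, so nothing has to be subtracted there.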


ComputeSamples_Tent

SampleShadow_ComputeSamples_Tent_3x3
// 3x3 Tent filter (45 degree sloped triangles in U and V)
void SampleShadow_ComputeSamples_Tent_3x3(real4 shadowMapTexture_TexelSize, real2 coord, out real fetchesWeights[4], out real2 fetchesUV[4])
{
    // tent base is 3x3 base thus covering from 9 to 12 texels, thus we need 4 bilinear PCF fetches
    real2 tentCenterInTexelSpace = coord.xy * shadowMapTexture_TexelSize.zw;
    real2 centerOfFetchesInTexelSpace = floor(tentCenterInTexelSpace + 0.5);
    real2 offsetFromTentCenterToCenterOfFetches = tentCenterInTexelSpace - centerOfFetchesInTexelSpace;

    // find the weight of each texel based
    real4 texelsWeightsU, texelsWeightsV;
    SampleShadow_GetTexelWeights_Tent_3x3(offsetFromTentCenterToCenterOfFetches.x, texelsWeightsU);
    SampleShadow_GetTexelWeights_Tent_3x3(offsetFromTentCenterToCenterOfFetches.y, texelsWeightsV);

    // each fetch will cover a group of 2x2 texels, the weight of each group is the sum of the weights of the texels
    real2 fetchesWeightsU = texelsWeightsU.xz + texelsWeightsU.yw;
    real2 fetchesWeightsV = texelsWeightsV.xz + texelsWeightsV.yw;

    // move the PCF bilinear fetches to respect texels weights
    real2 fetchesOffsetsU = texelsWeightsU.yw / fetchesWeightsU.xy + real2(-1.5,0.5);
    real2 fetchesOffsetsV = texelsWeightsV.yw / fetchesWeightsV.xy + real2(-1.5,0.5);
    fetchesOffsetsU *= shadowMapTexture_TexelSize.xx;
    fetchesOffsetsV *= shadowMapTexture_TexelSize.yy;

    real2 bilinearFetchOrigin = centerOfFetchesInTexelSpace * shadowMapTexture_TexelSize.xy;
    fetchesUV[0] = bilinearFetchOrigin + real2(fetchesOffsetsU.x, fetchesOffsetsV.x);
    fetchesUV[1] = bilinearFetchOrigin + real2(fetchesOffsetsU.y, fetchesOffsetsV.x);
    fetchesUV[2] = bilinearFetchOrigin + real2(fetchesOffsetsU.x, fetchesOffsetsV.y);
    fetchesUV[3] = bilinearFetchOrigin + real2(fetchesOffsetsU.y, fetchesOffsetsV.y);

    fetchesWeights[0] = fetchesWeightsU.x * fetchesWeightsV.x;
    fetchesWeights[1] = fetchesWeightsU.y * fetchesWeightsV.x;
    fetchesWeights[2] = fetchesWeightsU.x * fetchesWeightsV.y;
    fetchesWeights[3] = fetchesWeightsU.y * fetchesWeightsV.y;
}

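Functions such as SampleShadow_GetTexelWeights_Tent_3x3 compute the contribution of each texel covered by the 1D tent filter. The shadow map is two-dimensional, so to get each texel's contribution to the sample point we first compute the horizontal and vertical 1D tent weights from the sample point's 2D offset and then multiply the two. Because the hardware can compare and bilinearly filter a 2*2 block of texels in a single fetch, the texels are then partitioned into 2*2 groups so that each group costs only one fetch (call this operation \(PCF_{2\times 2}\)), and each group's contribution and uv coordinate are derived from the per-texel contributions computed above.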

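(For brevity, the contributions of regions x, y, z, w are also written x, y, z, w below.)
Computing the group contributions
Again we work in 1D first and then multiply the horizontal and vertical results to get the 2D result.
Taking 3*3 as an example, a group's contribution is the sum of the contributions of its texels, so adding them in pairs is enough: the four texels x, y, z, w split into the two groups (x+y) and (z+w).
Thus the two horizontal group contributions are (U.x+U.y) and (U.z+U.w), and the two vertical ones are (V.x+V.y) and (V.z+V.w). Multiplying a U component by a V component gives the contribution of the corresponding 2*2 group.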

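Computing the group sample coordinates
Take the group x, y as an example. Its contribution is (x+y). Let a be the sample value at x, b the sample value at y, c the result of linearly interpolating between the two (the sample result for the group), and t the interpolation position. This is the 1D case, so, referring to the figure above, put coordinate 0 at the vertical line in the middle of x, y, z, w. The center of texel y is then at -0.5 and the center of x at -1.5, which gives:
$$
x \cdot a + y \cdot b = (x + y) \cdot c
$$
$$
c = \left(-\frac{1}{2}-t\right) \cdot a + \left(t+\frac{3}{2}\right) \cdot b
$$
Matching the coefficients of b gives \(t+\frac{3}{2}=\frac{y}{x+y}\), that is \(t=-\frac{3}{2}+\frac{y}{x+y}\). The (z+w) group works the same way and gives \(t=\frac{1}{2}+\frac{w}{z+w}\); 5*5 and 7*7 are handled identically.
Computing the interpolation position t for each horizontal and each vertical group in this way and pairing them up gives the sample coordinates for \(PCF_{2\times 2}\).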
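
The 1D derivation can be checked with a tiny numeric experiment: one linearly interpolated read per group, scaled by the group weight, should reproduce the weighted sum over both texels in the group. This is a Python sketch with hypothetical helpers (`lerp_sample`, `two_fetches`); it assumes the values being interpolated are the per-texel comparison results, and texel centers at -1.5, -0.5, 0.5, 1.5 as in the text.

```python
import math


def lerp_sample(values, position):
    """Linearly interpolate four texel values centered at -1.5, -0.5, 0.5, 1.5."""
    index = position + 1.5                      # fractional index into values
    i0 = min(int(math.floor(index)), len(values) - 2)
    frac = index - i0
    return values[i0] * (1.0 - frac) + values[i0 + 1] * frac


def two_fetches(weights):
    """Group weights and fetch positions for texel weights (x, y, z, w)."""
    x, y, z, w = weights
    group_weights = (x + y, z + w)
    positions = (-1.5 + y / (x + y), 0.5 + w / (z + w))
    return group_weights, positions


def filtered(values, weights):
    """Two interpolated reads instead of four weighted reads."""
    group_weights, positions = two_fetches(weights)
    return sum(g * lerp_sample(values, p)
               for g, p in zip(group_weights, positions))
```

For example, with weights (0.1, 0.4, 0.35, 0.15) and comparison results [1, 0, 1, 1], both the direct weighted sum and `filtered` give 0.6.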


Using the functions above

Defining DIRECTIONAL_FILTER_SETUP
#define DIRECTIONAL_FILTER_SETUP SampleShadow_ComputeSamples_Tent_3x3
PCF
float weights[DIRECTIONAL_FILTER_SAMPLES];
float2 positions[DIRECTIONAL_FILTER_SAMPLES];
float4 size = _ShadowAtlasSize.yyxx; // XY: texel size, ZW: total texture size
DIRECTIONAL_FILTER_SETUP(size, positionSTS.xy, weights, positions); // weights and positions are output parameters
float shadow = 0;
// Accumulate the shadow samples using the output of DIRECTIONAL_FILTER_SETUP
for (int i = 0; i < DIRECTIONAL_FILTER_SAMPLES; i++) {
    shadow += weights[i] * SampleDirectionalShadowAtlas(
        float3(positions[i].xy, positionSTS.z)
    );
}
return shadow;

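First, functions such as SampleShadow_ComputeSamples_Tent_3x3 compute the contribution and position of each group (the position is where \(PCF_{2\times 2}\) is sampled); the weighted sum of the group samples is then the final result.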
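
To see the whole pipeline at once, the sketch below emulates it on the CPU and checks that 4 bilinear comparison fetches give the same shadow value as the 16 individually weighted taps they replace. It is a Python approximation under stated assumptions, not Unity code: the shadow map is a square list of rows, reads outside it clamp to the border, a texel counts as lit when `depth <= stored`, fetch positions are kept in texel units instead of UV, and all function names are made up for the example.

```python
import math


def tent_weights_3x3(offset):
    """1D texel weights (x, y, z, w), after SampleShadow_GetTexelWeights_Tent_3x3."""
    w = (offset + 0.5) ** 2 * 0.5
    x = w - offset
    y = (1.0 - offset) - min(offset, 0.0) ** 2
    z = (1.0 + offset) - max(offset, 0.0) ** 2
    return [0.44444 * area for area in (x, y, z, w)]


def compute_samples_tent_3x3(size, coord):
    """After SampleShadow_ComputeSamples_Tent_3x3; positions stay in texel units."""
    center = [math.floor(c * size + 0.5) for c in coord]
    texel_weights, groups = [], []
    for axis in range(2):
        x, y, z, w = tent_weights_3x3(coord[axis] * size - center[axis])
        texel_weights.append((x, y, z, w))
        groups.append([(x + y, center[axis] - 1.5 + y / (x + y)),
                       (z + w, center[axis] + 0.5 + w / (z + w))])
    fetches = [(wu * wv, pu, pv) for wv, pv in groups[1] for wu, pu in groups[0]]
    return center, texel_weights, fetches


def lit(shadow_map, i, j, depth):
    """Binary depth comparison against texel (i, j), clamped to the border."""
    size = len(shadow_map)
    i = min(max(i, 0), size - 1)
    j = min(max(j, 0), size - 1)
    return 1.0 if depth <= shadow_map[j][i] else 0.0


def bilinear_compare(shadow_map, pu, pv, depth):
    """Emulate one comparison fetch: compare 2*2 texels, then blend the results."""
    i0, j0 = math.floor(pu - 0.5), math.floor(pv - 0.5)
    tu, tv = pu - 0.5 - i0, pv - 0.5 - j0
    row0 = lit(shadow_map, i0, j0, depth) * (1 - tu) + lit(shadow_map, i0 + 1, j0, depth) * tu
    row1 = lit(shadow_map, i0, j0 + 1, depth) * (1 - tu) + lit(shadow_map, i0 + 1, j0 + 1, depth) * tu
    return row0 * (1 - tv) + row1 * tv


def shadow_4_fetches(shadow_map, coord, depth):
    """Weighted sum of the 4 group fetches, like the loop in the snippet above."""
    _, _, fetches = compute_samples_tent_3x3(len(shadow_map), coord)
    return sum(wt * bilinear_compare(shadow_map, pu, pv, depth) for wt, pu, pv in fetches)


def shadow_16_taps(shadow_map, coord, depth):
    """Reference: weight each of the 4*4 covered texels individually."""
    center, (wu, wv), _ = compute_samples_tent_3x3(len(shadow_map), coord)
    return sum(wu[a] * wv[b] * lit(shadow_map, center[0] - 2 + a, center[1] - 2 + b, depth)
               for b in range(4) for a in range(4))
```

The two functions agree up to floating-point rounding for any shadow map, which is the point of moving each fetch position according to the texel weights.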
