SIGGRAPH 2010: Deferred Lighting / Deferred Shading
There was a presentation on Deferred Shading at the Beyond Programmable Shading day. I believe the presenter wanted to compare Deferred Lighting with Deferred Shading. I couldn’t attend the Beyond Programmable Shading day this year, so I have only looked at the slides. The interesting part was that Deferred Shading was implemented with the Compute Shader, and performance figures were given for ATI’s 5870 and an NVIDIA GTX 480 for up to 1000 lights.
You can find the talk here:
http://bps10.idav.ucdavis.edu/
Having helped to ship games with Deferred Shading and later with Deferred Lighting, I have a good rough idea of how those two compare with each other.
The presentation shows that the highest-end graphics cards seem to max out at 1000 lights in the given scenario with the help of compute shader support. Most of my tests four years ago were done on an XBOX 360 or PS3 and later on lower-end graphics cards. From what I remember, in an artificial scenario 256 to 512 lights were possible on an XBOX 360 / PS3 in a setting similar to the one described in the talk.
On low-end PC graphics cards like the 9600M GT in my MacBook Pro, we can run 8000 small point lights while a whole game level and colliding particles are rendered at more than 40 fps.
Comparing Deferred Shading with Deferred Lighting, I believe Deferred Lighting should be faster in all scenarios because you fetch fewer render targets per light and you do not resolve the full lighting equation for each light.
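To make the "fewer render targets" argument a bit more concrete, here is a minimal back-of-the-envelope sketch. The G-buffer layouts, the 32-bit texel size and the 32×32 pixel light footprint are all assumptions chosen for illustration; they are not the layouts from the talk or from our demo.

```python
# Hypothetical G-buffer layouts; these are assumptions for illustration,
# not the layouts used in the talk or in the demo.
BYTES_PER_TEXEL = 4  # assume 32-bit render targets

# Render targets the light pass reads for every pixel a light covers.
DEFERRED_SHADING_READS = ["depth", "normal", "albedo", "specular"]
DEFERRED_LIGHTING_READS = ["depth", "normal"]


def light_pass_bytes(targets, lit_pixels, bytes_per_texel=BYTES_PER_TEXEL):
    """Bytes fetched by the light pass summed over all light-covered pixels."""
    return len(targets) * bytes_per_texel * lit_pixels


def main():
    # Assume 1000 small lights, each covering a 32x32 pixel area on screen.
    lit_pixels = 1000 * 32 * 32
    shading = light_pass_bytes(DEFERRED_SHADING_READS, lit_pixels)
    lighting = light_pass_bytes(DEFERRED_LIGHTING_READS, lit_pixels)
    print(f"deferred shading : {shading / 2**20:.1f} MiB fetched per frame")
    print(f"deferred lighting: {lighting / 2**20:.1f} MiB fetched per frame")


if __name__ == "__main__":
    main()
```

This only counts the light pass and ignores texture caches. Deferred Lighting also pays for a second geometry pass that applies the materials, which the sketch does not model, so treat the numbers as a rough intuition rather than a measurement.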
Because the presenter used high-end NVIDIA and ATI cards, I thought it would be cool to use an integrated INTEL GPU to show off Deferred Lighting so that everyone could enjoy it. The drivers for those GPUs are really good now and our system only requires DX10, so we don’t use the compute shader. So I thought I would give it a try on my two-and-a-half-year-old Lenovo X301 (http://www.notebookcheck.net/Lenovo-ThinkPad-X301.16099.0.html) with an Intel Graphics Media Accelerator (GMA) 4500MHD. This chipset is obviously not INTEL’s latest, but to my surprise it ran our demo quite well. Wikipedia says this GPU has a theoretical memory bandwidth of 12.8 GB/s. The GPUs used in the SIGGRAPH presentation have a theoretical memory bandwidth of 153.6 GB/s (ATI RADEON 5870) and 177.4 GB/s (GeForce GTX 480) if they are standard GPUs; some vendors sell those GPUs with higher memory clock rates.
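To put those bandwidth figures side by side, a small sketch that divides the quoted numbers by the GMA 4500MHD baseline. Only the three GB/s values come from the text; the dictionary and function names are just illustrative.

```python
# Theoretical memory bandwidth in GB/s, as quoted in the text.
BANDWIDTH_GBPS = {
    "Intel GMA 4500MHD": 12.8,
    "ATI Radeon 5870": 153.6,
    "GeForce GTX 480": 177.4,
}


def bandwidth_ratio(gpu, baseline="Intel GMA 4500MHD"):
    """How many times more theoretical bandwidth `gpu` has than `baseline`."""
    return BANDWIDTH_GBPS[gpu] / BANDWIDTH_GBPS[baseline]


for name in ("ATI Radeon 5870", "GeForce GTX 480"):
    print(f"{name}: {bandwidth_ratio(name):.1f}x the GMA 4500MHD")
```

So the cards in the presentation have roughly 12 to 14 times the theoretical memory bandwidth of the integrated chipset, before even looking at shader throughput.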
I chose to visualize 1000 small point lights without specular, and the resolution of the notebook is set to 1024×768, so much smaller than what was used at SIGGRAPH. The particles to which the point lights are attached also collide with the environment and bounce off. Nevertheless, it was running at roughly 11 to 22 fps during my tests.
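Frame rates are easier to reason about as frame times, so here is a tiny conversion for the measured range. The helper name is made up for this example.

```python
def frame_time_ms(fps):
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


for fps in (11, 22):
    print(f"{fps} fps = {frame_time_ms(fps):.1f} ms per frame")
```

That is roughly 45 to 91 ms per frame, and it covers everything in the frame (level geometry, particle collision and the lights), not just the lighting.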
Here are two screenshots and a shot of the laptop:
I think it would be cool to see more stuff running on INTEL integrated chipsets. After all, it is fun to get things going on low-end GPUs
… raise your hand if you’re with me













Resolution is 1280×720 and the GPU still runs with 158 frames per second. The whole level has about 16k lights.
The pixel, represented by a square, has two triangles (blue and yellow) crossing some of its sample points. The black dot represents the pixel sample location (pixel center); this is where the pixel shader is executed. The cross symbols correspond to the locations of the multisamples where the depth tests are performed. Samples passing the depth test receive the output of the pixel shader. Those samples are replicated by the MSAA back-end into a multisampled render target that represents each pixel with, in this case, four samples. That means that for an intended resolution of 1280×720 the render target holds as many samples as a 2560×1440 target, with each pixel represented by four samples, but the pixel shader only writes 1280×720 times (assuming there is no overdraw) while the MSAA back-end replicates four samples for each pixel into the multisampled render target.
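The arithmetic behind that comparison can be sketched as follows. The function name and the returned dictionary keys are made up for this example, and the invocation count keeps the same "no overdraw" assumption as the text.

```python
def msaa_costs(width, height, samples):
    """Pixel shader invocations vs. stored samples for an MSAA render target.

    Assumes no overdraw, i.e. the pixel shader runs once per pixel.
    """
    pixels = width * height
    return {
        "pixel_shader_invocations": pixels,
        "stored_samples": pixels * samples,
    }


costs = msaa_costs(1280, 720, 4)
print(costs["pixel_shader_invocations"])  # one invocation per pixel
print(costs["stored_samples"])            # same sample count as 2560 * 1440
```

Storage grows with the sample count while shading work stays at one invocation per pixel, which is where 4x MSAA gets its advantage over rendering at 2560×1440 directly.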

















