http://gametomorrow.com/blog/index.php/2007/09/05/cell-vs-g80/
Well, IBM is showing off their iRT again XD
Couldn't they bring out some other application for once... XD
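This post takes the figures from Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, the paper in which Professor Philipp Slusallek of Saarland University ran ray tracing on the G80, and compares them with the Cell running iRT. Interestingly enough, the author of iRT happens to be one of Philipp Slusallek's students.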
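This is a straight replay of what happened with RacingPHT's own Ray-Tracing demo on G80, except that back then the numbers were estimates, while this time they come from an actual run.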

In the chart, from left to right:
2.6 GHz AMD Opteron – Saarland Ray-tracer
Nvidia GeForce 8800 GTX – Saarland Ray-tracer
Sony PlayStation 3 (partial 3.2 GHz Cell processor running Linux) – IBM iRT
3.2 GHz Cell Processor – IBM iRT
IBM QS20 Blade (Two 3.2 GHz Cell Processors) – IBM iRT
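The paper itself points out that, because of the register footprint, the G80 does not run very efficiently, and that this will have to wait for improvements in CUDA itself.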
Although we believe a performance of over 16M rays/s to be already quite impressive for the CONFERENCE scene, we expected much higher performance: The G80 with its 128 scalar arithmetic units running at 1.3GHz should deliver over 160GFlops, meaning that tracing one ray costs about 10,000 cycles. We suspect the main bottleneck to be the large number of registers in the compiled code, which limits the occupancy of the GPU to less than 33%. Unfortunately, although the program requires much less registers, the CUDA compiler is not yet mature enough and cannot aid in reducing their count. An option would be to rewrite the whole CUDA code in PTX intermediate assembly.
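The "about 10,000 cycles per ray" figure in the quote is just the G80's total ALU throughput divided by the ray rate. A quick back-of-envelope check, using only the numbers from the quote (128 scalar ALUs, 1.3 GHz, 16M rays/s taken as exactly 16M) and assuming, as the quote appears to, one operation per ALU per clock:

```python
# Back-of-envelope check of the figures quoted from the paper.
# Assumption: one operation per ALU per clock (no MAD double-counting).
ALUS = 128            # scalar arithmetic units on the G80 (from the quote)
CLOCK_HZ = 1.3e9      # shader clock as quoted ("1.3GHz")
RAYS_PER_SEC = 16e6   # "over 16M rays/s", treated here as exactly 16M


def alu_cycles_per_ray(alus: int, clock_hz: float, rays_per_sec: float) -> float:
    """Total ALU cycles available per second divided by rays traced per second."""
    return alus * clock_hz / rays_per_sec


peak_ops = ALUS * CLOCK_HZ
budget = alu_cycles_per_ray(ALUS, CLOCK_HZ, RAYS_PER_SEC)
print(f"peak scalar ops/s: {peak_ops / 1e9:.1f} G")    # 166.4 G, i.e. "over 160GFlops"
print(f"ALU cycles per ray: {budget:,.0f}")             # 10,400, i.e. "about 10,000"
```

Since 16M rays/s is a lower bound ("over 16M"), the real per-ray budget is slightly under 10,400 cycles, which matches the paper's round number.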
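Still, even allowing for the G80 running at only about 33% efficiency, it is a long way behind the 6-SPE Cell, never mind a full 8-SPE Cell or the QS20 blade with its two Cells. That is probably down to each SPE having a relatively large register file (128 registers, each 128 bits wide).... And then the IBM folks went and took a shot at Larrabee. XD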
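To see why register count caps the G80 at "less than 33%" occupancy, here is a simplified occupancy model. It assumes the CUDA compute capability 1.0 limits of 8192 32-bit registers and at most 768 resident threads (24 warps) per multiprocessor, and it deliberately ignores block size, register allocation granularity and shared memory, so real occupancy can be lower than what it reports. The register counts in the loop are illustrative only; the paper's actual count is not given in the quote.

```python
# Simplified occupancy model for one G80 multiprocessor (compute capability 1.0).
# Assumptions: 8192 registers and 768 resident threads (24 warps) per multiprocessor;
# block size, allocation granularity and shared-memory limits are ignored.
REGS_PER_SM = 8192
MAX_THREADS_PER_SM = 768
WARP_SIZE = 32
MAX_WARPS = MAX_THREADS_PER_SM // WARP_SIZE  # 24


def occupancy(regs_per_thread: int) -> float:
    """Fraction of the maximum resident warps that fit, given registers per thread."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    threads_by_regs = REGS_PER_SM // regs_per_thread   # threads the register file can hold
    warps = min(threads_by_regs // WARP_SIZE, MAX_WARPS)  # only whole warps count
    return warps / MAX_WARPS


# Illustrative register counts (hypothetical, not taken from the paper).
for regs in (10, 16, 21, 32, 40, 64):
    print(f"{regs:3d} regs/thread -> {occupancy(regs):.0%} occupancy")
```

Under this model, 32 registers per thread already lands at exactly one third, so "less than 33%" means the compiled kernel needed more than 32 registers per thread. An SPE, with its 128 registers per core and no sharing between threads, does not pay this kind of penalty.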
----
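Honestly, let's see some other applications. Hopefully TGS07 (Tokyo Game Show 2007) has more to show....
Rumors include HOME for Japan, firmware 2.0, a new online video service, a new pricing strategy and bundles, and even FF7....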
