http://vr-zone.com/articles/-rumour-geforce-gtx-400-series-details-performance-pricing-etc.-/8487.html
[Rumour] Geforce GTX 400 Series details (Performance, pricing, etc.)
—-
http://developer.download.nvidia.com/compute/cuda/3_0/docs/NVIDIA_FermiCompatibilityGuide.pdf
Fermi Compatibility Guide
http://developer.download.nvidia.com/compute/cuda/3_0/docs/NVIDIA_FermiTuningGuide.pdf
Fermi Tuning Guide
http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide_3.0.pdf
CUDA Programming Guide for CUDA Toolkit 3.0
http://developer.download.nvidia.com/compute/cuda/docs/CUDA_Developer_Guide_for_Optimus_Platforms.pdf
CUDA Developer Guide for Optimus Platforms
On devices of compute capability 1.x, some kernels can achieve a speedup when using (cached) texture fetches rather than regular global memory loads (e.g., when the regular loads do not coalesce well).
Unless texture fetches provide other benefits such as address calculations or texture filtering (Section 5.3.2.5), this optimization can be counter-productive on devices of compute capability 2.0, however, since global memory loads are cached in L1 and the L1 cache has higher bandwidth than the texture cache.
The shared memory hardware is improved on devices of compute capability 2.0 to support multiple broadcast words and to generate fewer bank conflicts for accesses of 8-bits, 16-bits, 64-bits, or 128-bits per thread (Section G.4.3).
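To make the bank-conflict point concrete, here is a rough host-side model in plain C++ (not device code and not anything from NVIDIA's docs). It assumes that consecutive 32-bit words map to consecutive banks, and that threads reading the same word are served by a single broadcast, which is how I read the "multiple broadcast words" behavior above. The function names `conflictDegree` and `stridedAccess` are made up for illustration, and the 32-bank / 32-thread figures for compute capability 2.0 come from my reading of Section G.4.3, not from the excerpt itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Simplified model (an assumption, not a hardware spec): returns the number of
// serialized passes needed for one group of threads served together, given the
// 32-bit word index each thread reads. Threads that read the same word share a
// broadcast; only distinct words that land in the same bank conflict.
int conflictDegree(const std::vector<int>& wordIndex, int banks) {
    assert(banks > 0);
    std::map<int, std::set<int>> distinctWordsPerBank;
    for (int w : wordIndex) {
        assert(w >= 0);
        distinctWordsPerBank[w % banks].insert(w);
    }
    std::size_t worst = 0;
    for (const auto& entry : distinctWordsPerBank) {
        worst = std::max(worst, entry.second.size());
    }
    return static_cast<int>(worst);
}

// Word indices for `threads` threads that each read shared[tid * stride].
std::vector<int> stridedAccess(int threads, int stride) {
    assert(threads >= 0);
    std::vector<int> idx(static_cast<std::size_t>(threads));
    for (int tid = 0; tid < threads; ++tid) {
        idx[static_cast<std::size_t>(tid)] = tid * stride;
    }
    return idx;
}
```

Under this model, `conflictDegree(stridedAccess(32, 1), 32)` gives 1 (no conflict), a stride of 2 gives 2, a stride of 32 gives 32, and a stride of 0 (every thread reading the same word) gives 1 because of the broadcast. It only covers 32-bit accesses; the 8/16/64/128-bit cases mentioned above need extra rules that this sketch does not attempt.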
——-
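For now, the first GF100 products will probably perform pretty dismally, but dismal or not, the products that follow will still be built on this design...
The question now: back then, NV30 collapsed across the mid-range and low end, and things only recovered with NV4x. How will Fermi's mid-range and low-end parts fare?
So it makes me want to think about how many disadvantages Fermi has at the architecture level.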
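Back then, NV30 got most of its performance from FX12 and had very few FP32 units, so it was at a big performance disadvantage against R300, which traded away precision by using FP24. It needed vendors to actively optimize for FX12/FP16...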
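Fermi's architecture, on the other hand, keeps the G80-era design philosophy on shader count; it is basically accumulation and reinforcement. In a sense it is, on paper, still a 128sp G92 or a 256sp GT200b, but there is quite a lot of fine-grained tune-up, and ROP performance has improved as well.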
ATI, by contrast, is still just piling up SIMD units... XD
Well, I hear GF104 (256sp) still has a larger die than the 65nm G92, though. That would make it almost as big as RV870... _A_)a