http://blog.livedoor.jp/htmk73/archives/621167.html
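President Iwata makes angry comments about you guys and the media's reporting during the earnings briefing Q&A
He's totally just putting on an act, lol
Which reminds me of this: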
http://vipper2ch.blog94.fc2.com/blog-entry-457.html
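Nintendo apparently brings in about 1 billion yen in sales per employee
Their cash reserves must be at a downright scary level, lol
(though Nintendo does spend very aggressively on TV commercials)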
——-
http://forums.nvidia.com/index.php?showtopic=84440
NVIDIA CUDA FAQ version 2.1
CUDA FAQ #34:
Is it possible to run multiple CUDA applications and graphics applications at the same time?
CUDA is a client of the GPU in the same way as the OpenGL and Direct3D drivers are – it shares the GPU via time slicing. It is possible to run multiple graphics and CUDA applications at the same time, although currently CUDA only switches at the boundaries between kernel executions.
The cost of context switching between CUDA and graphics APIs is roughly the same as switching graphics contexts. This isn’t something you’d want to do more than a few times each frame, but is certainly fast enough to make it practical for use in real time graphics applications like games.
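Huh? So CUDA applications, just like graphics, have to rely on context switching to share the GPU... Does that mean G80 through GT200 can't run vertex shaders (VS) and pixel shaders (PS) at the same time? That doesn't seem right... Or put another way: with both sides doing time sharing, G80 through GT200 behave a bit like a single-threaded CPU, while ATI has had multi-thread capability ever since Xenos... Heh, given that R600 with its VLIW design still often hits a 60% worst case, I honestly can't tell which side is more efficient, lol
Still, consider the claim that Fermi is a 10x improvement over G80 through GT200: if the new figure is 20 to 25 microseconds (µs), then even ten times that is only around 300 µs, so it really doesn't seem like a big difference. The real point may be that each of the 16 SMs can run a different kernel. Although the competition has been doing that for a long time, lol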
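A rough way to put those switch times in context is to compare them with the length of a frame. The sketch below is only back-of-the-envelope arithmetic: the 60 fps frame rate and the four switches per frame are my own assumptions (the FAQ only says "a few times each frame"), and the function name is made up for illustration.

```python
def switch_overhead_fraction(switch_us, switches_per_frame, fps):
    """Fraction of one frame spent on context switches."""
    frame_us = 1_000_000 / fps          # length of one frame in microseconds
    return switch_us * switches_per_frame / frame_us

FPS = 60                  # assumed frame rate
SWITCHES_PER_FRAME = 4    # assumed reading of "a few times each frame"
fermi_us = 25             # upper end of the 20 to 25 us figure in the note above
older_us = fermi_us * 10  # the claimed 10x gap, applied in reverse

for name, cost in (("Fermi", fermi_us), ("G80-GT200", older_us)):
    frac = switch_overhead_fraction(cost, SWITCHES_PER_FRAME, FPS)
    print(f"{name}: {cost} us per switch, {frac:.1%} of a 60 fps frame")
```

Under these assumptions the switches cost well under 1% of a frame on Fermi and about 6% on the older parts, so how much the 10x matters depends mostly on how often an application actually switches.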
http://www.anandtech.com/video/showdoc.aspx?i=3334&p=6
Derek’s Conjecture Regarding SP Pipelining and TMT
In G80 and GT200, because of the fact that context is stored per warp, even though the SPs are working on an instruction for a different thread in every pipeline stage, they are not working on a different context at every pipeline stage. Each SP processes four threads in a row from the same warp and thus from the same context. Because it is incredibly likely at 1.5GHz that the SPs have more than 4 pipeline stages, we will still see more than one context switch within the pipeline itself, but it still isn’t down to a different context for every stage.
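The "four threads in a row" figure follows from a 32-thread warp being issued across the 8 SPs of one SM. Here is a minimal sketch of that division; the specific lane-to-SP mapping (lane i goes to SP i % 8) is an assumption for illustration only, not something the article states.

```python
WARP_SIZE = 32    # threads per warp on G80/GT200
SPS_PER_SM = 8    # scalar processors (SPs) per SM on G80/GT200

def issue_schedule(warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Return {sp_index: [thread lanes]} for one warp instruction.

    Assumed mapping: lane i is handled by SP (i % sps), so each SP
    receives warp_size // sps lanes back to back from the same warp.
    """
    schedule = {sp: [] for sp in range(sps)}
    for lane in range(warp_size):
        schedule[lane % sps].append(lane)
    return schedule

schedule = issue_schedule()
for sp, lanes in schedule.items():
    print(f"SP{sp}: lanes {lanes}")
```

Each SP ends up with 32 / 8 = 4 lanes from the same warp, which is why the context only changes every fourth pipeline slot rather than on every stage.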
http://zergone.blogspot.com/2009/10/fermi-technology-unveiled.html
Fermi technology unveiled
http://techreport.com/articles.x/17670/2
Better scheduling, faster switching
“Fermi avoids this inefficiency by executing up to 16 different kernels concurrently, including multiple kernels on the same SM. The limitation here is that the different kernels must come from the same CUDA context-so the GPU could process, say, multiple PhysX solvers at once, if needed, but it could not intermix PhysX with OpenCL.”
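So it looks like the main limitation is that several different programs can't use the GPU at the same time... Then again, having both graphics and CUDA inside a single application presumably never ran into this problem anyway; Fermi should just be somewhat more efficient at it.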
To tackle that latter sort of problem, Fermi has much faster context switching, as well. Nvidia claims context switching is ten times the speed it was on GT200, as low as 10 to 20 microseconds. Among other things, intermingling GPU computing with graphics ought to be much faster as a result. (Incidentally, AMD tells us its Cypress chip can also run multiple kernels concurrently on its different SIMDs. In fact, different kernels can be interleaved on one SIMD.)
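The same-context restriction quoted above can be pictured with a toy model. This is not how the hardware scheduler works; it only encodes the two rules from the quote (at most 16 kernels at once, all from one CUDA context) and assumes kernels are independent and taken in launch order. The function and the context names are hypothetical.

```python
from itertools import groupby

MAX_CONCURRENT = 16   # concurrent kernel limit quoted for Fermi

def batch_kernels(queue, limit=MAX_CONCURRENT):
    """Split an ordered launch queue into batches that could overlap.

    queue: list of (context_id, kernel_name) tuples in launch order.
    Each batch holds kernels from a single context, at most `limit` of them.
    Moving to the next batch stands in for a context switch or a wait.
    """
    batches = []
    # groupby only merges consecutive items, so launch order is preserved.
    for ctx, group in groupby(queue, key=lambda item: item[0]):
        names = [name for _, name in group]
        for start in range(0, len(names), limit):
            batches.append((ctx, names[start:start + limit]))
    return batches

# Hypothetical workload: three PhysX solvers followed by one OpenCL kernel.
queue = [("physx", f"solver{i}") for i in range(3)] + [("opencl", "blur")]
for ctx, names in batch_kernels(queue):
    print(ctx, names)
```

In this model the three PhysX solvers land in one batch and the OpenCL kernel in another, matching the quote's point that PhysX work can overlap with itself but not with OpenCL.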
—–
http://www.intrinsity.com/index.php/articles/64-hot-rodding
Hot-Rodding the Cortex-A8
Because Fast14 logic gates are 25% to 50% faster than static logic gates, the processor can do more work per clock cycle without altering the basic design of the instruction pipelines and functional blocks. Fast14 is particularly efficient for muxes and other elements with wide structures. Intrinsity also uses optimized static logic, custom circuits, and standard cells. (See MPR 8/13/01-02, “Intrinsity’s Dynamic Designs.”) Figure 1 shows Intrinsity’s design flow.