{"id":2594,"date":"2008-06-23T03:06:15","date_gmt":"2008-06-23T03:06:15","guid":{"rendered":"http:\/\/192.168.1.4\/wordpress4\/?p=2594"},"modified":"2008-06-23T03:06:15","modified_gmt":"2008-06-23T03:06:15","slug":"g80-rv670-register-file","status":"publish","type":"post","link":"https:\/\/aiplus.idv.tw\/wp\/2008\/06\/23\/g80-rv670-register-file\/","title":{"rendered":"G80\/RV670 register file: the complete figures, and some hidden architectural properties"},"content":{"rendered":"<p><a href=\"http:\/\/www.cs.ucf.edu\/~zhou\/dlp.pdf\">http:\/\/www.cs.ucf.edu\/~zhou\/dlp.pdf<\/a><br \/>\n<br \/>\nExperiencing Various Massively Parallel Architectures and Programming Models for Data-Intensive Applications<\/p>\n<blockquote>\n<p>Thread Hierarchy<br \/>\n<br \/>\nG80: 16 Stream multi-processors (SM), 8 streaming processors (SP) per SM, 4 SMs share 1 Texture subsystem<br \/>\n<br \/>\nRV670: 4 clusters, 16 x 5 cores per cluster, each cluster time-multiplex 1 Texture subsystem<\/p>\n<p>Register File (32-bit registers)<br \/>\n<br \/>\nG80: 512 kB = 32kB per SM * 16 SM; 8K registers per SM; 1K register per SP<br \/>\n<br \/>\nRV670: 1MB = 256kB per cluster * 4 cluster; 64K registers per cluster; 1K register per core<\/p>\n<\/blockquote>\n<p>This is excellent XD<\/p>\n<p>So the R600 design is really about having a larger register file than G80, without G80's shared memory.<br \/>\n<br \/>\nTo make GPGPU code run fast on R600\/RV670, you have to exploit the large register file with techniques such as loop unrolling;<br \/>\n<br \/>\nthe workloads it suits are very similar to pixel shaders, because as soon as threads need to communicate with each other, efficiency drops badly: R600 has no on-chip shared memory, only a FIFO for reading and writing on-board memory.<\/p>\n<p>In addition, G80\/G92 has 24 warps (32 threads per warp), while R600\/RV670 has 192 &#8220;wavefronts&#8221; per array (64 threads per wavefront); the term roughly corresponds to G80's warp.<br \/>\n<br \/>\nInterestingly, G80 has 24&#215;8 = 192 warps in total and R600\/RV670 also has 192 wavefronts in total; the only difference is 32 threads per warp on one side versus 64 threads per wavefront on the other, and the two register files also differ in size by exactly a factor of two. Puzzle solved XD<\/p>\n<p>&#8212;&#8211;<br \/>\n<br \/>\nThe paper also confirms that the cores in each R600 cluster (16&#215;5 = 80 cores) time-share a single Texture Array (4D).<br \/>\n<br \/>\nSo RV670 keeps 4 arrays paired with 4 tex-arrays (4x4D), while RV770 goes up to 10 arrays, with the tex-arrays likewise growing to 10 sets of 4D.<\/p>\n<p>From G80 to GT200, the register file grew from 8K registers per SM to 16K, i.e. 64KB per SM * 30 SMs = 1920KB (about 1.9MB).<br \/>\n<br \/>\nEach R600 array (cluster) has its own 256KB register file, so 4 arrays add up to 1MB and 10 arrays bring the total to 2.5MB, which is considerably more than GT200 provides&#8230;.<br 
\/>\nOf course, although RV770 appears to add &#8220;only 20%&#8221; more transistors, that still amounts to 288M extra (666M-&gt;954M), so it is quite reasonable after all. XD<\/p>\n<p>In other words, if the R600 architecture keeps adding SPs in the future, for example the rumored &#8220;2000 SPs&#8221; (25 arrays), then after a move to 45nm such a chip would probably be about the same size as today's RV770, or slightly larger.<br \/>\n<br \/>\nAnd this is arguably where the value of the R600 architecture lies.<\/p>\n<p>&#8212;-<br \/>\n<br \/>\nBefore G94 launched, comparing only RV670 with G92 made the R600 architecture look inefficient, with low compute density, while G92 looked expensive; once G94 came out, it became clear that the G8x architecture could be made somewhat smaller than R6x0&#8230;.<br \/>\n<br \/>\nAfter RV770 launched, R6x0's compute density rose to a level that can compete with G92. In other words, the two sides' cores (TPC vs. ALU array) are nearly comparable unit for unit.<\/p>\n<p>The question now is whether RV770, built the current way, can yield a product that competes with GT200:<br \/>\n<br \/>\nGT200 shows that NVIDIA considers 8 TPCs with a 256-bit bus the right balance, so scaling up gives 16 TPCs with a 512-bit bus (the 16 TPCs consolidated into 10 TPCs x 1.5 to keep the crossbar smaller)<br 
\/>\nSo if RV770's successor reaches the rumored 2000 SPs, probably by going from 10 arrays to 25, the TMU count would rise to 100 along with it; would it still be 16 ROPs + 256-bit (GDDR5) at that point?<\/p>\n<p>Otherwise, NVIDIA could likewise pair more TPCs with a 256-bit bus and quickly shrink the seemingly enormous GT200&#8230;.<br \/>\n<br \/>\nRV770 shows that TMUs and ROPs also take up a sizable share of an R6x0 chip, which is why an increase of only 20% was enough to raise total compute throughput without cutting the register file.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/www.cs.ucf.edu\/~zhou\/dlp.pdf Experiencing Variou &hellip; <a href=\"https:\/\/aiplus.idv.tw\/wp\/2008\/06\/23\/g80-rv670-register-file\/\" class=\"more-link\">Read more <span class=\"screen-reader-text\">G80\/RV670 register file: the complete figures, and some hidden architectural properties<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2594","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts\/2594","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/comments?post=2594"}],"version-history":[{"count":0,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts\/2594\/revisions"}],"wp:attachment":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/media?parent=2594"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/categories?post=2594"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/tags?post=2594"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}