{"id":729,"date":"2005-10-14T02:19:02","date_gmt":"2005-10-14T02:19:02","guid":{"rendered":"http:\/\/192.168.1.4\/wordpress4\/?p=729"},"modified":"2005-10-14T02:19:02","modified_gmt":"2005-10-14T02:19:02","slug":"x1000_series_cce","status":"publish","type":"post","link":"https:\/\/aiplus.idv.tw\/wp\/2005\/10\/14\/x1000_series_cce\/","title":{"rendered":"X1000 series Taiwan launch event"},"content":{"rendered":"<p>This afternoon at 2:00 was the ATI X1000 series launch event in Taiwan.<br \/>According to ikari, who attended, David Wang (Director of Engineering) presented in person&#8230;. (Wow, the head of ArtX! XD)<\/p>\n<p>But it seems not many people showed up; looking around, there were only ten or so&#8230;.<br \/>And not much could be learned by asking questions. XD<\/p>\n<p>In short, ATI&#8217;s stance on GPGPU is likewise to release an API; everything else (such as H.264 in AVIVO) is still undecided and under observation.<\/p>\n<p>So it was actually pretty uneventful?&#8230;.. _A_<\/p>\n<p>Source:<br \/><a href=\"http:\/\/bbs.gzeasy.com\/index.php?showtopic=461982&amp;st=22\">http:\/\/bbs.gzeasy.com\/index.php?showtopic=461982&amp;st=22<\/a><\/p>\n<p>Some remarks by Mike Houston on the R520:<br \/>mhouston (Stanford University)<br \/>Posted: Wed Oct 05, 2005 5:01 pm. Post subject: A little R520 info<\/p>\n<p>Now that things are public, I can talk about a few things: <\/p>\n<p>The board is 32-bit. 
Precision on ops is generally slightly better than Nvidia&#8217;s, but not in all cases (from the GPUBench precision test). Like Nvidia, ATI cuts corners when it comes to denorms. <\/p>\n<p>Readback rates are still a problem under GL (450 MB\/s), but not under DX on nForce4 or ATI chipsets (900+ MB\/s). For some reason, there are performance problems on Intel chipsets. They are still below where I&#8217;d like to see them, but at least closer to Nvidia&#8217;s performance. <\/p>\n<p>The board has really good latency hiding, much like the R4XX series. Your performance is generally max(ALU, tex, branch), where tex is the total fetch latency: 4 cycles for a 128-bit fetch that hits the cache and 8 cycles for a 128-bit streaming fetch. See the ClawHMMer paper for more analysis of latency hiding. <\/p>\n<p>The board supports generalized scatter, but it&#8217;s not currently exposed (there is no clean way to do this in DX, so it might be GL-only (&lt;- I&#8217;m working on this)&#8230;) <\/p>\n<p>The board has 1.5 ALUs. The half ALU can do ADD\/SUB but not MUL\/MAD\/etc. This gives the X1800XT a peak of ~120 GFLOPS. The raw MAD rate is 83 GFLOPS, which is lower than Nvidia&#8217;s. <\/p>\n<p>For the X1800XT, cache bandwidth is 42 GB\/s and streaming bandwidth is 21 GB\/s. <\/p>\n<p>Branch granularity is ~16 fragments. Branch performance, at least from really basic tests, seems very good. <\/p>\n<p>ATI claims to be more committed to supporting academic research and GPGPU in general. They say they will open up a lot more information about their architecture and provide lower-level interfaces to their hardware. Only time will tell how this will play out. <\/p>\n<p>Let me know if you have other questions, and I&#8217;ll try to answer them as soon and as well as I can. At the moment I only have an X1800XL here, so I&#8217;ll try to put up some GPUBench results for the board on the GPUBench site later today. 
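<\/p>\n<p>A minimal back-of-the-envelope sketch of the cost model described above, in Python. It assumes, as the post states, that latency hiding brings per-fragment time close to max(ALU, tex, branch), with 4 cycles per 128-bit cache-hit fetch and 8 per streaming fetch (the same 2:1 ratio as the quoted 42 GB\/s vs. 21 GB\/s). The helper names are hypothetical. The branch model groups ~16 consecutive fragments and charges both sides when they disagree. The 1.5x peak factor assumes the half ALU issues one ADD next to each two-flop MAD. This is an illustration only, not ATI&#8217;s performance model.<\/p>\n<pre><code>
```python
# Rough R520 fragment cost model (illustrative sketch, not ATI's model).
# Assumption from the post: with good latency hiding, per-fragment time is
# roughly max(ALU, tex, branch), so the slowest unit dominates.

CACHE_HIT_CYCLES = 4      # one 128-bit fetch that hits the texture cache
STREAMING_CYCLES = 8      # one 128-bit fetch streamed from memory
BRANCH_GRANULARITY = 16   # ~16 fragments share one branch decision
MAD_GFLOPS = 83           # quoted raw MAD rate for the X1800XT


def tex_cycles(cache_hits, streaming_fetches):
    # Total fetch latency: sum of the per-fetch costs quoted in the post.
    return cache_hits * CACHE_HIT_CYCLES + streaming_fetches * STREAMING_CYCLES


def fragment_cycles(alu_cycles, cache_hits, streaming_fetches, branch_cycles=0):
    # Latency hiding overlaps the units, so the cost is the max, not the sum.
    return max(alu_cycles, tex_cycles(cache_hits, streaming_fetches), branch_cycles)


def branch_cost(takes_branch, then_cycles, else_cycles):
    # Simplification: fragments are grouped 16 at a time in list order.
    # A group that agrees runs one side; a divergent group runs both.
    # Returns the total cycles summed over all groups.
    total = 0
    for start in range(0, len(takes_branch), BRANCH_GRANULARITY):
        group = takes_branch[start:start + BRANCH_GRANULARITY]
        if all(group):
            total += then_cycles
        elif not any(group):
            total += else_cycles
        else:
            total += then_cycles + else_cycles
    return total


def peak_gflops(mad_gflops=MAD_GFLOPS):
    # Assumption: the half ALU issues one ADD (1 flop) alongside each
    # MAD (2 flops), i.e. 3 flops instead of 2, a factor of 1.5.
    return mad_gflops * 3 / 2


# 20 ALU cycles plus 4 streaming fetches (32 cycles): fetch-bound.
print(fragment_cycles(20, cache_hits=0, streaming_fetches=4))  # 32
# The same 4 fetches as cache hits (16 cycles): now ALU-bound.
print(fragment_cycles(20, cache_hits=4, streaming_fetches=0))  # 20
# Coherent branches: one group takes the branch, one does not.
print(branch_cost([True] * 16 + [False] * 16, 10, 30))  # 40
# Incoherent branches: every group diverges and pays for both sides.
print(branch_cost([True, False] * 16, 10, 30))  # 80
print(peak_gflops())  # 124.5, in line with the quoted ~120 GFLOPS peak
```
<\/code><\/pre>\n<p>In this model, swapping streaming fetches for cache hits halves the tex term. A fetch-bound kernel can therefore become ALU-bound without any change to its arithmetic. 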
<\/p>\n<p>-Mike<\/p>\n<p>Last edited by mhouston on Thu Oct 06, 2005 12:49 am; edited 1 time in total <\/p>\n<p>[quote]The board supports generalized scatter, yet it&#8217;s not currently exposed (no way to do this cleanly in DX, so it might be GL only (&lt;- I&#8217;m working on this)&#8230;) <br \/>[\/quote]<\/p>\n<p>Posted: Thu Oct 06, 2005 1:17 am<\/p>\n<p>You can have an arbitrary number of outputs from a shader, well, up to the instruction limit, so 512. <\/p>\n<p>You can basically do a[i] = x. The writes are uncached, so there will be a performance penalty (think in the thousands of cycles), but if you do lots of ops, some of the latency can be hidden, at least in theory. You are responsible for making sure fragments don&#8217;t clobber each other. Also, you cannot read and write to the same buffer, i.e. no read-modify-write. I haven&#8217;t tested it yet, since it&#8217;s not currently exposed in any available driver, but the memory controller and memory system were designed to handle this. <\/p>\n<p>Posted: Thu Oct 06, 2005 1:29 am<\/p>\n<p>Yes. But you can also output more than 16 floating-point values (4 float4s). Both are useful. We&#8217;ve been asking the graphics card companies about this one for a while, as it solves some issues with variable output from kernels as well as stream filtering. It&#8217;s going to be interesting to see if it&#8217;s cheaper than the known methods, like Daniel Horn&#8217;s chapter in GPU Gems 2. 
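<\/p>\n<p>A minimal sketch of how scatter could serve stream filtering, emulated on the CPU in Python. It is a generic scan-then-scatter compaction. It is not an ATI, DX, or GL API, and it is not the technique from Horn&#8217;s chapter, which targets hardware without scatter. The function names are hypothetical. An exclusive prefix sum over the keep flags gives each surviving element a unique output slot, so fragments never clobber each other. Results go to a separate output buffer, since the same buffer cannot be both read and written.<\/p>\n<pre><code>
```python
from itertools import accumulate


def exclusive_scan(flags):
    # Slot of each element = number of kept elements before it.
    return list(accumulate(flags, initial=0))[:-1]


def scatter_compact(values, keep):
    # Hypothetical CPU emulation of a scatter-based stream filter kernel.
    flags = [1 if k else 0 for k in keep]
    slots = exclusive_scan(flags)
    # Separate output buffer: no read-modify-write on the input stream.
    out = [None] * sum(flags)
    for value, flag, slot in zip(values, flags, slots):
        if flag:
            # Each kept fragment performs a[i] = x with a unique i,
            # so no two fragments write the same location.
            out[slot] = value
    return out


values = [4, 7, 10, 3, 9]
print(scatter_compact(values, [v % 2 == 1 for v in values]))  # [7, 3, 9]
```
<\/code><\/pre>\n<p>On a GPU, each loop iteration would be one fragment running in parallel, and the prefix sum would typically need its own multi-pass kernel. Such a version would also have to absorb the cost of the uncached writes described above. 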
<\/p>\n<p>[quote]I just checked the GPUBench page and compared the X1800 to the X800XT PCIe. It seems to me that the computational power remained nearly the same (instruction issue, scalar vs. vector instruction issue, basic throughput, FP bandwidth). <\/p>\n<p>The most significant differences seem to be the new branching and 32-bit support. <\/p>\n<p>So is it faster than the former ATI cards, e.g. the X850? Or as fast as those cards, but now with 32-bit support?[\/quote]<\/p>\n<p>Posted: Thu Oct 06, 2005 2:07 pm<\/p>\n<p>The X1800XL has roughly the same clock rates as the X8XX boards, 500 MHz core\/500 MHz memory. The branching, 32-bit, scatter, no dependent texture limit, no dynamic instruction limit, and fully associative cache are the biggest new things. The R520 is ~20% faster than the R4XX clock for clock and has a MUCH better memory subsystem, so it handles random reads better. Basically, all our apps got a little faster on the XL, ~10-15%. <\/p>\n<p>The X1800XT is clocked at 625 MHz core\/750 MHz memory, so it is substantially faster. We&#8217;ve seen compute-bound applications gain ~30% and memory-bound applications gain 50-100%, depending on the memory access patterns. 
The latter is from the new cache design (many fewer misses) and the memory subsystem handling incoherent reads much better.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This afternoon at 2:00 was the ATI X1000 series launch event in Taiwan. According to ikari, who attended, David Wang (D &hellip; <a href=\"https:\/\/aiplus.idv.tw\/wp\/2005\/10\/14\/x1000_series_cce\/\" class=\"more-link\">Read more <span class=\"screen-reader-text\">X1000 series Taiwan launch event<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-729","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts\/729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/comments?post=729"}],"version-history":[{"count":0,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/posts\/729\/revisions"}],"wp:attachment":[{"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/media?parent=729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/categories?post=729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aiplus.idv.tw\/wp\/wp-json\/wp\/v2\/tags?post=729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}