<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>ByteLabs: TLP - Thread Level Parallelism: UltraSPARC-T1</title>
    <link>http://blog.solaris.bytelabs.org/articles/2005/12/08/tlp-thread-level-parallelism-ultrasparc-t1</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>additions to a vast pool of entropy by Igor and Ines</description>
    <item>
      <title>TLP - Thread Level Parallelism: UltraSPARC-T1</title>
      <description>&lt;p style="float:left"&gt;&lt;img src="http://www.bytelabs.org/images/ust1.png" alt="" /&gt;&lt;/p&gt;


	&lt;p&gt;In my last post I wrote a little about Thread Level Parallelism. Since &lt;a href="http://www.sun.com/processors/UltraSPARC-T1/index.xml"&gt;&lt;span class="caps"&gt;SUN&lt;/span&gt;&lt;/a&gt; has officially released its first processor based on &lt;a href="http://www.bytelabs.org/hennessy/niagra_micro.pdf"&gt;&lt;span class="caps"&gt;TLP&lt;/span&gt;&lt;/a&gt;, I thought this was a good opportunity to write a little something about this nice processor.&lt;/p&gt;


	&lt;p&gt;Instead of deepening the pipeline by splitting its stages into smaller units in order to raise the processor clock rate, and then failing to keep that pipeline full (as Intel did with the P4), &lt;span class="caps"&gt;SUN&lt;/span&gt; did something different. They concentrated on throughput and moved away from the classical &lt;span class="caps"&gt;ILP&lt;/span&gt; (instruction-level parallelism) paradigm.&lt;/p&gt;


	&lt;p&gt;Their new processor consists of up to 8 cores, each processing 4 threads, so 32 threads can be processed &amp;#8220;simultaneously&amp;#8221;. Each core can process one instruction every cycle. There is an L1 cache per core and a shared L2 cache. To provide the high memory bandwidth necessary to &amp;#8220;feed&amp;#8221; 32 threads, a crossbar interconnect routes memory references to a banked on-chip level-2 cache that all threads share. All this means more throughput, less energy consumption and more efficient use of resources.&lt;/p&gt;


	&lt;p&gt;Are you still reading this post and still interested? Then you should &lt;a href="http://www.bytelabs.org/hennessy/niagra_micro.pdf"&gt;check out this description of the Niagara architecture&lt;/a&gt;.&lt;/p&gt;</description>
      <pubDate>Thu, 08 Dec 2005 23:02:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:1b1e61c3b31d54c22ed558b032b56d32</guid>
      <author>igor</author>
      <link>http://blog.solaris.bytelabs.org/articles/2005/12/08/tlp-thread-level-parallelism-ultrasparc-t1</link>
      <category>University of Edinburgh</category>
      <category>Hacking and Computers</category>
      <trackback:ping>http://blog.solaris.bytelabs.org/articles/trackback/26</trackback:ping>
    </item>
  </channel>
</rss>
