# **On the Physicl Layout of PRDT-Based NoCs**

Guoqiang Yang<sup>1</sup>, Mei Yang<sup>2</sup>, Yulu Yang<sup>1</sup>, Yingtao Jiang<sup>2</sup>

<sup>1</sup>Department of Computer Science, Nankai University, Tianjin, 300071, China

<sup>2</sup>Department of Electrical and Computer Engineering, Las Vergas, NV, 89154, USA

Emails: <sup>1</sup>nkyangguoqiang@gmail.com, yangyl@nankai.edu.cn, <sup>2</sup>{meiyang, yingtao}@egr.unlv.edu

#### Abstract

In this paper, we present PRDT(2, 1), a new interconnection network topology for Network-on-chip (NoC) design. PRDT(2,1) features a recursive structure, and has small diameter and average distance. We then focus our study on physical layout issues pertaining to PRDT(2, 1). Specifically, we show that the minimum number of metal layers required for the placement and routing in a PRDT (2, 1)-based NoC is 2. We further demonstrate that the routing channel widths can be dramatically reduced when more layers are available for layout purposes. This study confirms that PRDT(2, 1) is a practical and promising topology for on-chip interconnection networks.

#### 1. Introduction

The rapid development of VLSI technologies, as predicted in [4], makes it possible to integrate a large number of processing units on a single chip. One of the major challenges in designing such highly integrated System-on-Chips (SoCs) will be to find an effective way to integrate pre-designed Intellectual Property (IP) cores for power and performance concerns. As a new SoC paradigm [1], NoC was proposed to overcome the limitations of conventional bus-based SoC architectures. With a communication-centric style, in an NoC system, processing units are interconnected by a regular interconnection network.

The on-chip essence makes energy consumption an important design constraint for NoC systems. The interconnection network consumes a significant amount of power in an NoC system (e.g., the interconnection network of a RAW system can consume as much as 36% of the total power [3]). Up to date, 2-D mesh [3] remains the most popular on-chip interconnection network structure, thanks to its simple structure. However, mesh does not scale well as its diameter is proportional to the number of processing units on each row/column. The scalability performance of other proposed structures is either insufficient or unexplored.

In [10], the Rotational Diagonal Torus (RDT)-based onchip interconnection network is proposed. The RDT structure has the following distinct features: 1) high scalability with its recursive structure, 2) low power consumption with a small diameter and average distance, 3) architectural reconfigurability with its embedded mesh/torus topology, and 4) fault-tolerance capability with a constant node degree and robust routing schemes. These features make RDT suitable for interconnecting processing nodes in NoCs. It is shown that RDT(2, 2, 1)/ $\alpha$  is a feasible topology for NoCs, taking into account technological constraints in modern VLSI technologies [10].

In this paper, we will introduce a special type of RDT, PRDT(2,1), which has a simpler structure than RDT(2,2,1)/ $\alpha$ . We show that PRDT(2, 1) is suitable to build on-chip interconnection network for NoCs with several to hundreds of processing units (interchangeably with nodes). We then address the layout issues of PRDT(2, 1)-based NoCs.

Thompson's model [7] is mostly used for VLSI layout which is to embed the layout graph composed of processing units and links into two-dimensional scheme and layout the links between processing units. In this model, the layout area is divided into square "tiles" of unit area and placed in a grid fashion. A tile can hold a node, a wire or a wire crossing. In [3], an improved model and collinear layout are proposed. In [9], multilayer grid model is proposed, which consists of one active layer for nodes and L layers of wires.

Based on the Thompson's model, we define an improved layout model. The goal of the physical layout of PRDT(2, 1) on the defined model is to minimize the layout area of the links given the number of metal layers available while jump crossing is avoided and crosstalk constraints are satisfied. We define the unit width of a routing channel to measure the area used in laying out the links. We present a physical layout which first groups the links into different groups and then lays out the groups in sequence. We show that the routing channel width of the obtained layout is only 2 with 8 metal layers, whereas the channel width increases with the decrease of the number of layers available. The minimum number of layers needed to lay out PRDT(2, 1) is only 2.

The rest of the paper is organized as follows. Section 2 introduces the PRDT(2, 1) structure and the layout model for NoC. Section 3 presents the physical layout of PRDT(2, 1) and provides analytical results of the layout. Section 4 discusses the node model used in the layout. Section 5 concludes the paper.

# 2. The PRDT(2, 1) Structure and Layout model

#### 2.1 The PRDT(2, 1) Structure

The RDT structure is constructed by recursively overlaying 2-D diagonal meshes (tori) [8]. The base torus is a two-dimensional square array of nodes, each of which is numbered with a two-dimensional number  $(i, j), 0 \le i \le N$ -1,  $0 \le j \le N-1$ , where N = nk and both n and k are natural numbers. The torus network is formed with four links between node(x, y) and its neighboring four nodes:  $(mod(x\pm 1, N), y)$  and  $(x, mod(y\pm 1, N))$ . The base torus is also called rank-0 torus. On top of rank-0 torus, a new torus-like network (rank-ltorus) is formed by adding four links between node (x, y) and nodes  $(x \pm n, y \pm n)$ . The direction of the new toruslike network is at an angle of 45 degrees to the original torus. On rank-1 torus, another torus-like network (rank-2 torus) can be formed by adding four links in the same manner. In a more general sense, a *rank-(r+1) torus* can be formed upon *rank-r torus*.

RDT(n, R, m) is a class of RDT in which each node has links to form base torus (rank-0) and m upper tori (the maximum rank is R) with the cardinal number n. A *perfect* RDT (PRDT(n, R)) is a network in which every node has links to form all possible upper rank tori (i.e. RDT(n, R, R)).



Figure 1(a) Structure of 8x8 PRDT(2, 1).



Figure 1(b) Structure of 4x4 PRDT(2, 1).

In this paper, we focus our study on PRDT(2, 1). In general, each node has a constant degree of 8 except for PRDT(2, 1) with 4x4 nodes. Fig. 1(a) shows the structure of 8x8 PRDT(2, 1). Fig. 1(b) shows the structure of 4x4 PRDT(2,1), where each node has 5 links, one on the rank-1 torus and the other four on the rank-0 torus.

With only rank-0 and rank-1 links, the wiring cost of PRDT(2, 1) can be dramatically reduced. Consequently, saving on energy consumption can be achieved. Hence, PRDT(2, 1) is a very promising solution for the interconnection network of NoCs at the presence of tens or even hundreds of nodes.

#### 2.2 Layout Model and Its Features

In this paper, we intend to study how a PRDT(2, 1)based on-chip network can be laid out in a chip. For an NoC system, the functional complexity of processing units determines that they may take more layout area. Here, based on Thompson's model, we define an improved multilayer two-dimensional grid model. The following definitions are used in our model.

**Definition 1 (Link)**. A *link* is the wire connecting two nodes.

**Definition 2 (Routing Channel).** The *routing channel* refers to the blank area between two adjacent nodes (see Fig. 2).

**Definition 3 (Unit Width of Routing Channel).** The unit width of routing channel is defined as the product of the standard separation space between two links and the number of links (i.e., wire width) between two nodes. The routing channel width is measured in unit width.

**Definition 3 (Jump Crossing).** If two links intersect at a point which is not any end of each link, the two links are *jump crossing*.

**Definition 4 (Shortest Path Layout).** For a link between nodes  $(x_1,y_1)$  and  $(x_2,y_2)$  (see Fig. 1(b) for the coordinates), the *shortest path layout* of a link is as the layout with the link length equal to the Manhattan Distance given by  $(|x_1-x_2|+|y_1-y_2|)^*\Psi$ , where  $\Psi$  is a constant used to adjust the length.

In this model, the layout area of each metal layer is firstly divided into multiple tiles, and the number of these tiles is equal to the number of nodes in an NoC. We also assume that the inner routing of all the nodes is completed and the area taken is limited to their own region in all layers. Links can only be laid in the routing channel and can't cross the area that was held by nodes. The routing channels for links are reserved between these tiles. Each routing channel is further divided into slots with a unit width of routing channel, where links with unit width can be laid. Fig. 2 shows the layout area of  $8 \times 8$  base torus.

Area and the total length of wires are the two important goals for a layout scheme. In our model, we assume that all the nodes have the same functionality and hey need the same area to lay out. Without of loss of generality, we assume the layout area of a node is in square shape. An important property of the layout model is as follows.

| $\square$ | $\square$ | $\square$ | $\square$ |           | $\square$ |           |
|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|           | $\square$ | $\square$ | $\square$ | $\square$ |           | $\square$ |
| $\square$ |           | $\square$ |           | $\square$ | $\square$ |           |
| $\square$ | $\square$ |           | $\square$ |           | $\square$ |           |
|           | $\square$ | $\square$ |           | $\square$ |           | $\square$ |
| $\square$ |           | $\square$ | $\square$ |           | $\square$ |           |
|           | $\square$ |           | $\square$ | $\square$ |           |           |
|           |           | $\square$ |           | $\square$ | $\square$ | $\square$ |

Slot with Unit Width

Figure 2 Layout area of 8×8 base torus.

**Property 1:** The total length of all the shortest path layouts of all the links is a constant for a given network topology with a fixed number of nodes.

For link between nodes  $(x_i, y_i)$  and  $(x_i, y_i)$ , its physical length is  $(|x_i-x_j|+|y_i-y_j|)^* \Psi$  under its shortest path layout. So the total length of the shortest path layouts of all links is  $\sum_{\forall i,j} (|\mathbf{x}_i - \mathbf{x}_j| + |\mathbf{y}_i - \mathbf{y}_j|) * \boldsymbol{\psi} .$  The set of links is

determined by the topology, thus the total length of wires is a constant.

## 3. Physical Layout and Analysis

## **3.1 Phyiscal Layout**

Area is an important constraint in NoC system. Since the chip area is fixed, decreasing the area used for links can leave more area for laying out processing units. With the development of VLSI technology and the coming of deep submicron age, the space between two wires is decreasing and the clock frequency is constantly increasing, which makes crosstalk a serious problem of future VLSI design [4]. As such, the goal of the physical layout is to minimize the layout area of links given the number of metal layers available such that no jump crossing exists in each layer and crosstalk constraints can be satisfied.

The following assumptions are used in our study. To avoid crosstalk between adjacent layers, we assume that the links in the same position of adjacent layers must be orthogonal to each other. The diagonal links are laid out in two segments of X and Y dimension links. To achieve scalability, we also assume that if the X dimension segments of some rank-1 links are laid in the same layer, then their Y dimension segments must also be laid in the same layer. And the X and Y dimension segments of a rank-1 link should be routed in the neighboring layers to reduce the electronic and performance influence.

We present the physical layout of PRDT(2, 1), where the links are grouped to form multiple groups and the links in each group are laid out in sequence..

Definition 5 (Outgoing Link and Starting Node). Among the four rank-1 links from a given node (x, y), the two links connecting to nodes  $(x_1, y_1)$  and  $(x_2, y_1)$  where  $y_1 > y_2$ , are called the two outgoing links and the node is called the starting node of the two outgoing links.

Definition 6 (Incoming Link and Ending Node). Among the four rank-1 links from a given node (x, y), the two links connecting to nodes  $(x_1, y_2)$  and  $(x_2, y_2)$  where  $y > y_2$ , are called the two incoming links and this node is called the ending node of the two incoming links.

Definition 7(Inner Links and Boundary Links). For a link between nodes  $(x_1, y_1)$  and  $(x_2, y_2)$ , it is called an *inner links* if  $|x_1-x_2|+|y_1-y_2|=4$ , or boundary links if  $|x_1-x_2|+|y_1-y_2|=4$  $y_2 > 4.$ 

As shown in Fig. 1(a), link ((2,2),(4,4)) is an outgoing link of node (2,2) and an incoming link of node (4,4). Node (2,2) and node (4,4) are called the starting node and the ending node of the link, respectively. Link ((0,0),(2,2)) is an inner link and link ((0,0),(6,2)) is a border link.

In the PRDT(2,1) structure, rank-1 links are more complex than rank-0 links and it is more difficult o avoid jump crossing between rank-1 links. Hence, we consider the layout of rank-1 links first.



According to the definition of PRDT(2, 1), there are four rank-1 links connecting node (x,y) and other four nodes denoted as  $((x \pm 2) \mod N, (y \pm 2) \mod N)$ . Obviously, the parity of the starting node and the ending node of a rank-1 link must be the same. Thus we group the rank-1 links into four groups based on their respective parity of the starting/ending node's X and Y coordinates as follows.

Group 1: links connecting nodes  $(x_1, y_1)$  and  $(x_2, y_2)$ , where  $x_1(x_2)$  and  $y_1(y_2)$  are both even numbers,

Group 2: links connecting nodes  $(x_1, y_1)$  and  $(x_2, y_2)$ , where  $x_1(x_2)$  is odd and  $y_1(y_2)$  is even.

Group 3: links connecting nodes  $(x_1, y_1)$  and  $(x_2, y_2)$ , where  $x_1(x_2)$  is even and  $y_1(y_2)$  is odd.

Group 4: links connecting nodes  $(x_1, y_1)$  and  $(x_2, y_2)$ , where  $x_1(x_2)$  and  $y_1(y_2)$  are both odd numbers.

The group result of  $8 \times 8$  is shown in Fig. 3.

Due to the complete symmetry of PRDT(2, 1), the relative position of the rank-1 links of in each group is the same. Hence, we can lay out the links in one group following the same layout of another group. Next we will show the layout of the links in Group 1 as an example.

Each rank-1 link can be considered as the outgoing link of its starting node. The layout of links in Group 1 follows the increasing order of the *X* coordinates then the *Y* coordinates of the starting nodes. For the two links with the same starting node (x, y), the one connecting the node  $(x_1, y_1)$  with  $x_1 > x$  is laid first. For example, links ((4,4),(2,6)) and ((4,4),(6,6)) are of the same starting node, but ((4,4),(6,6)) is laid first. If one of the links is a border link, the link with shorter wire length is laid first. For example, links ((0,0),(2,2)) and ((0,0),(6,2)) have the same starting node, but ((0,0),(2,2)) is laid first.

For a specific rank-1 link, it is firstly laid on the bottom boundary of its starting node. Then its X dimension segment is laid in the horizontal routing channel. After crossing a via between adjacent layers, its Y dimension segment is arranged in the vertical routing channel of the neighboring layer. As such, the rank-1 links on any layer do not have jump crossing. Also the links that are on the same dimension but don't overlap to each other are laid in the same slot of unit width. For example, the X dimension segments of links ((0,0),(2,2)), ((2,0),(4,2)) and ((4,0),(6,2)) are in the same horizontal slot of unit width (see Fig. 3).

For rank-0 links, the horizontal and vertical links are laid out evenly in different layers. Hence, no jump crossing exists between rank-0 links. Fig. 3 shows part of rank-0 links laid out with rank-1 links in Group 1.

Given 8 metal layers, each group of rank-1 links can be laid out in different layers. If the number of layers is 4, two groups of rank-1 links can be laid out together on one layer. The least number of layers needed is 2, since X dimension and Y dimension segments of a rank-1 link need to be laid in different layers.



For  $4 \times 4$  PRDT(2, 1), the four rank-1 links from a given node are the same, and the node degree is 5. As such, the physical layout of 4x4 PRDT(2,1) is much simpler. Fig. 4 shows the complete layout of 4x4 PRDT(2, 1). All links realize the shortest path layout.

# 3.2 Analysis

From Fig. 3, one can see that all inner links are of the shortest path layout while the boundary links are not. The length of boundary links is at most 2 unit longer than the shortest path layout. We can also find that the width of the horizontal (vertical) routing channels is uniform. All vertical channels are used and their channel width is 2 (in unit width) excluding the rank-0 wrap-around link. One half of the horizontal channels are used and the average horizontal channels is 4, which yields the average horizontal channel width of 2.

It can be derived from the layout that the number of layers needed is in 2's power. Assuming the number of layers is  $2^k$ , the average horizontal channel width  $W_h$  and the average vertical width  $W_v$  are determined as  $W_h = W_v = 2^{(4-k)}$ .

Next we use the layout of  $8 \times 8$  PRDT(2, 1) in 8 layers to explain that the channel width obtained is minimum. We first show that the average horizontal channel width of 2 is the minimum. As shown in Fig. 3, the two neighboring inner links (i.e., the links with their starting nodes closest to each other) which are to be laid in the same horizontal channel cannot overlap each other. For example, links ((0,0),(2,2)) and ((2,0),(0,2)), and the sum of their channel width is no less than 2. To avoid the overlap of the two boundary links (e.g. links ((0,0),(6,2)) and ((6,0),(0,2))) and the overlap of these two links with other links in the channel, at least 2 additional unit width is needed to lay out these two links. Thus the minimum horizontal channel width is 4. But one half of the horizontal channels are not used, so the average channel width is 2. Similarly, we can show that the vertical channel width of 2 is the minimum.

Following the same way, we can derive that the physical layouts with the number of layers less than 8 also yield the minimum channel width. Tab. 1 summarizes the average routing channel width (in # of unit width) in both horizontal and vertical directions with different number of layers for different sizes of PRDT(2, 1).

Table 1 Average routing channel width (in unit width) vs. number of layers for different sizes of PRDT(2, 1).

| # of<br>Layers | Network<br>Size | # of unit width<br>(Horizontal) | # of unit width<br>(Vertical) |
|----------------|-----------------|---------------------------------|-------------------------------|
| 2              | 4 x 4           | 2                               | 2                             |
|                | ≥ 8 x 8         | 8                               | 8                             |
| 4              | 4 x 4           | 1                               | 1                             |
|                | $\geq 8 \ge 8$  | 4                               | 4                             |
| 8              | 4 x 4           | 1/2                             | 1/2                           |
|                | $\geq 8 \ge 8$  | 2                               | 2                             |

# 4. Node model

In PRDT(2, 1), the logical connections between nodes are symmetrical, but the physical connections between nodes may not the same due to the fixed node position. In this section, we attempt to provide one possible node model for implementing the proposed layout scheme.



Figure 5 Node Model.

Since the node degree of PRDT(2, 1) is 8, we reserve 8 mapping point at the border of the four sides of the node, as shown in Fig. 5a). The mapping points with same number are physically connected to each other, and are logically the same point. In this way, the problem of non-uniform distributed node degree can be resolved.

| Table 2 Mapping p | point num | bering rules. |
|-------------------|-----------|---------------|
|-------------------|-----------|---------------|

| Links                                     | Mapping |  |
|-------------------------------------------|---------|--|
|                                           | point   |  |
| ((x, y), (x, (y-1)  mod  N))              | 1       |  |
| ((x, y), (x, (y+1)  mod  N))              | 1'      |  |
| ((x, y), ((x-1)  mod  N, y))              | 2       |  |
| ((x, y), ((x+1)  mod  N, y))              | 2'      |  |
| $((x, y), ((x-2) \mod N, (y-2) \mod N))$  | 3       |  |
| ((x, y), (((x+2)  mod  N, (y+2)  mod  N)) | 3'      |  |
| ((x, y), ((x+2)  mod  N, (y-2)  mod  N))  | 4'      |  |
| $((x, y), ((x-2) \mod N, (y+2) \mod N))$  | 4       |  |

However, in practice, the node border may not be wide enough to reserve 8 mapping points. Hence, we improve the model by combining some of the mapping points. We first assign each link a number, as shown in Tab. 2. Then we only need keep the mapping points that used by the link connections in the layout. The final uniform node model is shown in Fig 6. b). Note that there is one mapping point not used at the right side.

## 5. Conclusion

The topology of the on-chip interconnection network plays an important role in energy consumption and performance of an NoC system. Due to its small diameter and average distance as well as constant node degree, PRDT(2,1) is shown to be a promising candidate for onchip interconnection network. In this paper, we presented a physical layout and showed that any size PRDT(2, 1) can be laid out in 2 metal layers. The channel width can be dramatically reduced with more number of layers. We showed that the channel width obtained by the physical layout is the minimum. This study further confirms that PRDT(2, 1) is a practical topology as far as current VLSI technology is concerned.

#### References

- L. Benini and G.D. Micheli, "Networks on chips: a new SoC paradigm," *IEEE Computer*, vol. 35, no. 1, pp. 70–80, Jan. 2002.
- [2] W.J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," *Proc. DAC*, June 2002.
- [3] A. Fernández and K. Efe, "Efficient VLSI layouts for homogeneous product networks," *IEEE Trans. Computer*, vol. 46, no. 10, pp. 1070-1082, Oct. 1997.
- [4] ITRS Roadmap, http://public.itrs.net/Files/2005ITRS/Home2005.htm, 2005.
- [5] N. Kavaldjiev and G. M. Smit, "An energy-efficient network-onchip for a heterogeneous tiled reconfigurable systems-on-chip," *Proc. Euromicro Symp. Digital System Design (DSD)*, 2004, pp. 492-498.
- [6] J. S. Kim, M. B. Taylor, J. Miller, et al., "Energy characterization of a tiled architecture processor with on-chip networks," Proc. Int'l Symp. Low Power Electronics and Design, pp. 424-427, 2003.
- [7] Thompson, "A complexity theory for VLSI," PhD dissertation, Dept. of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 1980.
- [8] Y. Yang, A. Funahashi, A. Jouraku, H. Nishi, H. Amano, and T. Sueyoshi, "Recursive diagonal torus: an interconnection network for massively parallel computers," *IEEE Trans. Parallel and Distributed Systems*, vol. 12, no. 7, pp. 701-715, Jul. 2001.
- [9] C.-H. Yeh, B. Parhami, E.A. Varvarigos, and H. Lee, "VLSI layout and packaging of butterfly networks," *Proc. ACM Symp. Parallel Algorithms and Architectures*, 2000, pp. 196-205.
- [10] Y. Yu, M. Yang, Y. Yang, and Y. Jiang, "A RDT-based interconnection network for scalable NoC designs", Proc. ITCC, 2005, pp. 723-728.