AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS

C. Uttraphan\(^1\), N. Shaikh-Husin\(^2\)
\(^1\)Embedded Computing System (EmbCoS) Research Focus Group, Department of Computer Engineering, Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia
\(^2\)Universiti Teknologi Malaysia, Johor Bahru, 81310, Malaysia
E-Mail: chessda@uthm.edu.my, nasirsh@fke.utm.my

ABSTRACT
In today’s VLSI design, one of the most critical performance metrics is interconnect delay. As design dimensions shrink, interconnect delay becomes the dominant factor in overall signal delay. Buffer insertion has proven to be an effective technique for minimizing interconnect delay. Conventional buffer insertion algorithms insert buffers on fixed routing paths. However, modern designs contain macro blocks that prohibit buffer insertion within their area, and many conventional buffer insertion algorithms do not consider these obstacles. This paper presents an algorithm for simultaneous routing and buffer insertion that uses a look-ahead optimization technique. Simulation results show that the proposed algorithm can reduce the delay at the source by up to 47% compared to conventional algorithms. Although research has shown that simultaneous routing and buffer insertion is NP-complete, the look-ahead technique reduces the runtime of the algorithm significantly.

Keywords: Buffer insertion, VLSI routing, VLSI design automation, dynamic programming.

INTRODUCTION
Interconnect is a wiring system that propagates signals to the various functional blocks in VLSI circuits. When VLSI technology is scaled down, gate delay and interconnect delay change in opposite directions. Smaller devices lead to lower gate switching delay. In contrast, thinner wires increase wire resistance and signal propagation delay. As a result, interconnect delay has become the dominant factor in VLSI circuit performance (ITRS 2013; Alpert et al. 2009). Among the available techniques, buffer insertion has proven to be one of the best for reducing the interconnect delay of a long wire. The main challenge in interconnect buffer insertion is to determine the optimal number of buffers and their placement in the given interconnect tree. The most influential and systematic technique was proposed by van Ginneken (van Ginneken 1990). Given the possible buffer locations, this algorithm finds the optimum buffering solution for a fixed signal routing tree, maximizing the timing slack at the source according to the Elmore delay model (Elmore 1948).

Recently, many techniques to speed up the van Ginneken algorithm and its extensions have been proposed (Shi and Li 2003; Shi and Li 2005; Li and Shi 2006b; Li and Shi 2006a; Li et al. 2012). However, the van Ginneken algorithm and its extensions can only operate on a fixed routing tree. They give an optimal solution when the best routing tree is given, but produce a poor solution when a poor routing tree is provided, especially when there are obstacles in the design. In today’s VLSI design, some regions may be occupied by predesigned libraries such as IP blocks and memory arrays. Some of these regions allow neither buffers nor wires to pass through, and some allow wires to pass through but prohibit buffer insertion. Therefore, buffer insertion has to be performed with consideration of these buffer and wire obstacles (Alpert et al. 2009; Khalil-Hani and Shaikh-Husin 2009). The best way to handle the obstacles is to perform routing and buffer insertion simultaneously using a grid graph technique. However, research has shown that simultaneous routing and buffer insertion is NP-complete (Hu et al. 2009). The techniques known today either use dynamic programming to compute an optimal solution in worst-case exponential time or rely on an efficient heuristic without a performance guarantee.

A dynamic programming algorithm such as RMP (Recursive Merging and Pruning) can find an optimal buffering solution for multi-terminal nets (Cong and Yuan 2000), but it is not efficient when the number of sinks and the number of possible buffer locations are large, because the search space is very large. Indeed, Hu et al. show that the search in RMP is NP-complete. They also proposed a heuristic algorithm to solve the multi-terminal net buffer insertion problem by constructing a performance-driven Steiner tree, where an alternative Steiner node is created if the original Steiner node is inside an obstacle area (Hu et al. 2003). The algorithm is called RIATA, for Repeater Insertion with Adaptive Tree Adjustment. RIATA is very fast because it operates on a fixed tree. However, the quality of the solution may not be good enough if many paths of the adjusted tree still overlap with buffer obstacles.

Instead of fully constructing the routing path simultaneously with buffer insertion as in the RMP algorithm, a simultaneous approach on the adjusted tree is proposed. The algorithm is called HRTB-LA, for Hybrid Routing Tree and Buffer insertion with Look-Ahead. HRTB-LA generally produces better results than techniques that perform buffer insertion on a fixed routing path, such as the van Ginneken algorithm (and its extensions) and RIATA. The runtime of HRTB-LA is improved by adopting the look-ahead technique proposed by Khalil-Hani and Shaikh-Husin (2009) to solve the simultaneous routing and buffer insertion problem for single-sink nets.

This paper is organized as follows: section 2 gives problem formulation, section 3 provides the background of the study, section 4 describes the proposed algorithm, section 5 presents the experimental results and section 6 summarizes the conclusion.

PROBLEM FORMULATION

The simultaneous routing and buffer insertion problem in VLSI layout design is essentially a buffered routing path search problem. In this work, it is formulated as a shortest-path problem in a weighted graph specified as follows. Given are a routing grid graph \( G = (V, E) \) corresponding to the VLSI layout, where \( V \) is the set of internal vertices and \( E \) is the set of internal edges, a source vertex \( s_0 \in V \), \( n \) sink vertices \( s_1, s_2, \ldots, s_n \in V \), \( n-1 \) Steiner vertices \( m_1, m_2, \ldots, m_{n-1} \in V \), a buffer library \( B \) and a wire parameter \( W \). The goal is to find a routing path simultaneously with buffer insertion such that the delay at the source is minimized. A vertex \( v_i \in V \) may belong to the set of buffer obstacle vertices, denoted \( V_{OB} \), or the set of wire obstacle vertices, denoted \( V_{OW} \). The buffer library \( B \) contains different types of buffer. For each edge \( e = u \rightarrow v \), the signal travels from \( u \) to \( v \), where \( u \) is the upstream vertex, \( v \) is the downstream vertex and \( u, v \notin V_{OW} \). A uniform grid graph illustrating some of the parameters for the problem formulation is shown in Figure 1.

![Figure 1: A uniform grid graph \( G = (V, E) \)](image)

BACKGROUND

In the simultaneous routing and buffer insertion algorithm, the VLSI layout is represented by a uniform 2D grid graph as shown in Figure 1. Each wire segment (each edge of the graph \( e \in E \)) is modelled as a \( \pi \)-model RC circuit as shown in Figure 2a, while the buffer model is shown in Figure 2b. The labels \( c_w \) and \( r_w \) are the capacitance and resistance per wire segment respectively, while \( r_b, c_b \) and \( d_b \) are the output resistance, input capacitance and intrinsic delay of the buffer respectively.

![Figure 2: (a) Wire segment model (b) Buffer model](image)

The goal of the algorithm is to determine the best location of buffers on a given interconnect (at the vertex between each segment) in order to optimize the Elmore delay. The delay is calculated for each segment starting from a sink vertex toward the source (this is called upstream computation). The computation is characterized by two parameters, which are downstream capacitance and downstream delay. Each capacitance-delay \((c, t)\) pair is called a candidate solution. This candidate solution is expanded toward the source by the following operations (these operations are also known as path expansions):

1. Wire expansion: Expand the candidate solution from vertex \( v \) to \( u \) by inserting a wire segment between \( v \) and \( u \) as shown in Figure 3. If \((c, t)\) is the candidate solution at vertex \( v \), then the new candidate solution at vertex \( u \) is \((c', t')\) pair given by

\[
c' = c + c_w \quad \text{and} \quad t' = t + r_w \left( \frac{c_w}{2} + c \right).
\]

(1)

![Figure 3: Wire expansion from vertex \( v \) to vertex \( u \) for upstream path expansion](image)

2. Wire expansion terminated by buffer: Expand the candidate solution from vertex \( v \) to \( u \) by inserting a buffer at vertex \( v \) and a wire segment between \( v \) and \( u \), as shown in Figure 4. The buffer drives the downstream capacitance \( c \), so the new wire segment sees only the buffer input capacitance \( c_b \). If \((c, t)\) is the candidate solution at vertex \( v \), then the new candidate solution \((c', t')\) at vertex \( u \) is given by

\[
c' = c_w + c_b \quad \text{and} \quad t' = t + d_b + r_b c + r_w \left( \frac{c_w}{2} + c_b \right).
\]

(2)

![Figure 4: Wire expansion from vertex \( v \) to vertex \( u \) and buffer insertion at \( v \)](image)

3. Branch merging: If the solution reaches a Steiner vertex, the candidate solution from the left branch \((c, t)_{left}\) is merged with the candidate solution from the right branch \((c, t)_{right}\). The merged solution \((c', t')\) is given by

\[
c' = c_{right} + c_{left} \quad \text{and} \quad t' = \max(t_{right}, t_{left}).
\]

(3)

4. When the candidate solution reaches the source vertex, the delay at the source is computed by taking the source resistance \( R_s \) into account:

\[ t_{source} = t + cR_s. \]

(4)
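The four path-expansion operations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, candidates are plain `(c, t)` tuples in pF and ps, and the parameter values are those of the worked example given later in the paper. In the buffered expansion, the buffer output drives the old downstream load while the new wire segment only sees the buffer input capacitance.

```python
# Illustrative parameters (values from the paper's worked example).
R_W, C_W = 37.5, 0.1026             # wire resistance (ohm) and capacitance (pF) per segment
R_B, C_B, D_B = 104.2, 0.022, 20.0  # buffer output resistance, input capacitance, intrinsic delay (ps)
R_S = 104.2                         # source output resistance (ohm)


def wire_expand(cand):
    """Add one wire segment upstream of a candidate (c, t)."""
    c, t = cand
    return c + C_W, t + R_W * (C_W / 2 + c)


def buffered_wire_expand(cand):
    """Insert a buffer at the current vertex, then add one wire segment."""
    c, t = cand
    # The buffer drives the old load c; the new segment is loaded by c_b only.
    return C_W + C_B, t + D_B + R_B * c + R_W * (C_W / 2 + C_B)


def merge(left, right):
    """Join the candidates of two branches at a Steiner vertex."""
    return left[0] + right[0], max(left[1], right[1])


def source_delay(cand):
    """Delay seen at the source, including the source resistance R_s."""
    c, t = cand
    return t + R_S * c


sink = (0.022, 0.0)                 # load capacitance C_L, zero delay at the sink
c1, t1 = wire_expand(sink)
print(round(c1, 4), round(t1, 2))   # 0.1246 2.75
```

Since ohm times pF gives ps, all delays are in picoseconds. The printed pair is the result of one plain wire expansion starting from the sink load.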

**PROPOSED ALGORITHM**

**Design Descriptions of the Proposed Algorithm**

The HRTB-LA algorithm comprises five main stages as shown in Figure 5. The first stage is the graph construction phase where the 2D grid graph is constructed to represent the VLSI layout.

![Diagram](image)

**Figure 5:** Main stages in HRTB-LA

The tree modification is performed in stage two. The tree adjustment in HRTB-LA is adopted from (Hu et al. 2003), where the initial tree is adjusted according to the obstacles before the path expansions are performed. According to (Hu et al. 2003), the difficulty of the buffer obstacle problem arises when a Steiner vertex lies in an obstacle region, which eliminates opportunities for buffer insertion at the vertex. The key idea of tree adjustment is to consider an alternative Steiner vertex outside of the obstacle without changing the original topology.

The graph pruning in stage three is used to reduce the search space of the algorithm. The idea is to remove the redundant vertices from the graph before the search for path expansion is performed. Stage 4 is the look-ahead weight vector calculation, and stage 5 is the path expansion stage. The maze search starts from each sink towards the Steiner vertex where the branch merging operations are performed to create a new solution set. These solutions will be propagated toward the source and the best solution is selected as a final solution. As they are the most critical parts of the proposed algorithm, stages four and five of the algorithm are explained in more detail in the following subsections.

**Look-ahead scheme**

The look-ahead concept is a mechanism to reduce the search space of possible paths. The idea was first introduced in the field of artificial intelligence (Lin 1965; Newell and Ernst 1965): the set of possible paths is limited by using information about the remaining sub-paths toward the destination. The look-ahead concept was then adopted in QoS routing (Mieghem and Kuipers 2004), where it was proposed to further limit the set of possible sub-paths when solving the MCP (multi-constraint paths) problem. In the VLSI routing and buffer insertion problem, it was utilized by Khalil-Hani and Shaikh-Husin (2009), but only for two-terminal nets. In this work, we extend the idea to multi-terminal net optimization. The concept of look-ahead is to maintain the lowest weight component \( w_i \), \( 1 \leq i \leq m \), from the source vertex to the destination vertex. This information provides each vertex \( u \) with an attainable lower bound of \( w_i(P_u \rightarrow v_{dest}) \), where \( v_{dest} \) is the destination vertex. We denote by \( LA(u) \) the lower bound weight vector for vertex \( u \), known as the look-ahead weight vector.

In HRTB-LA, the look-ahead weight vectors are used to guide the path expansion from one node to another, i.e. from a sink node to a Steiner node and so on. These weights are combined with the weights from normal path expansion to form a so-called predicted end-to-end delay. The look-ahead weight vectors are the resistance-delay \( (r, t) \) pairs from a node (the start node) to the next downstream node (the end node). In other words, the look-ahead weight vectors are the candidate solutions for the downstream path expansions. Hence, the look-ahead weight vectors are computed as follows:

1. **Look-ahead wire expansion:** Expand the candidate solution from vertex \( u \) to \( v \) by inserting a wire segment between \( u \) and \( v \) as shown in Figure 6. If \( (r, t) \) is the candidate solution at vertex \( u \), then the new candidate solution \( (r', t') \) at vertex \( v \) is given by

   \[ r' = r_w + r \quad \text{and} \quad t' = \left( r + \frac{r_w}{2} \right) c_w + t. \]

   (5)

![Diagram](image)

**Figure 6:** Wire expansion from vertex \( u \) to vertex \( v \) for downstream path expansion

2. **Look-ahead wire expansion terminated by buffer:** Expand the candidate solution from vertex \( u \) to \( v \) by inserting a wire segment between \( u \) and \( v \) and insert the buffer at vertex \( v \) as shown in Figure 7.

![Diagram](image)

**Figure 7:** Wire expansion from vertex \( u \) to vertex \( v \) and buffer insertion at \( v \)

If \( (r, t) \) is the candidate solution at vertex \( u \), then the new candidate solution \( (r', t') \) at vertex \( v \) is given by

\[ r' = r_b \quad \text{and} \quad t' = r \left( c_w + c_b \right) + r_w \left( \frac{c_w}{2} + c_b \right) + d_b + t. \]

(6)
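The two look-ahead expansions mirror the upstream operations, but propagate an upstream resistance instead of a downstream capacitance. A minimal Python sketch, with hypothetical function names and the parameter values of the example that follows, is given below. In the buffered case the accumulated resistance \( r \) charges both the new wire capacitance and the buffer input capacitance, after which the buffer output resistance becomes the new driving resistance.

```python
# Illustrative parameters (values from the paper's worked example).
R_W, C_W = 37.5, 0.1026             # ohm and pF per wire segment
R_B, C_B, D_B = 104.2, 0.022, 20.0  # buffer output resistance, input capacitance, intrinsic delay (ps)


def la_wire_expand(cand):
    """Extend a look-ahead pair (r, t) downstream by one wire segment."""
    r, t = cand
    return r + R_W, t + (r + R_W / 2) * C_W


def la_buffered_wire_expand(cand):
    """Extend (r, t) by one wire segment that ends in a buffer."""
    r, t = cand
    return R_B, t + r * (C_W + C_B) + R_W * (C_W / 2 + C_B) + D_B


start = (R_B, 0.0)                  # assumption: the start node is driven by a buffer
r1, t1 = la_wire_expand(start)
print(round(r1, 1), round(t1, 2))   # 141.7 12.61
```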

To understand the concept of look-ahead, we now explain the look-ahead scheme using the following example. Figure 8 shows an interconnect tree with two sinks.

![Figure 8: A sample tree](image)

The corresponding 2D grid graph for the area between sink1 and the Steiner node is shown in Figure 9. In Figure 9, the Steiner node and sink1 are located at vertices 39 and 14 respectively. Vertices 12, 13, 26 and 27 are wire obstacle vertices \( V_{OW} \) while vertices 40 and 41 are buffer obstacle vertices \( V_{OB} \). The computations for this illustration use the following parameters: load capacitance \( C_L = 0.022\ \text{pF} \), wire resistance \( r_w = 37.5\ \Omega/\text{segment} \), wire capacitance \( c_w = 0.1026\ \text{pF/segment} \), buffer input capacitance \( c_b = 0.022\ \text{pF} \), buffer output resistance \( r_b = 104.2\ \Omega \), buffer intrinsic delay \( d_b = 20\ \text{ps} \) and source output resistance \( R_s = 104.2\ \Omega \).

![Figure 9: A 2D grid graph representing a tree in Figure 8 between Steiner node and sink1](image)

First, HRTB-LA transforms the 2D grid graph into a 1D graph. The number of vertices in the 1D graph is based on the shortest topological distance between the Steiner vertex (start node) and the sink1 vertex (end node), ignoring buffer obstacles \( V_{OB} \). For example, the shortest topological distance between the Steiner vertex and sink1 in Figure 9 is five; therefore, the 1D graph used to calculate the look-ahead weight vectors has six vertices as shown in Figure 10, where the topological distance between vertex 1 and vertex 6 is five. The look-ahead weight vectors are then calculated for each vertex in the 1D graph. Recall that the look-ahead weight vectors are the downstream candidate solutions; hence, they are computed using Eq. (5) and (6).

![Figure 10: 1D grid graph](image)
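The 2D-to-1D transformation amounts to a breadth-first search from the start node that respects wire obstacles and ignores buffer obstacles. The sketch below illustrates this for the example of Figure 9. The grid layout is an assumption: a width of 14 columns with 1-based, row-major vertex numbering is inferred from the vertex numbers quoted in the text, and the height of 4 rows is hypothetical.

```python
from collections import deque

COLS, ROWS = 14, 4                 # assumed layout, inferred from the vertex numbers
WIRE_OBSTACLES = {12, 13, 26, 27}  # buffer obstacles (40, 41) are deliberately ignored
STEINER, SINK1 = 39, 14


def neighbors(v):
    """4-connected neighbours of a 1-based, row-major vertex number."""
    row, col = divmod(v - 1, COLS)
    if row > 0:
        yield v - COLS
    if row < ROWS - 1:
        yield v + COLS
    if col > 0:
        yield v - 1
    if col < COLS - 1:
        yield v + 1


def topological_distance(start):
    """Breadth-first hop count from start, skipping wire obstacles only."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in neighbors(v):
            if u not in dist and u not in WIRE_OBSTACLES:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


dist = topological_distance(STEINER)
print(dist[SINK1])                 # 5, so the 1D graph has 6 vertices
```

A 2D vertex at distance `d` from the start node then maps to vertex `d + 1` of the 1D graph.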

The look-ahead weight vectors for the graph in Figure 9 are shown in Figure 11. In the 1D graph, vertex 6 corresponds to the sink1 vertex in the original 2D graph. Vertex 5 in the 1D graph corresponds to all the vertices in the original 2D graph that are four grids away from the Steiner node (vertices 28 and 56), while vertex 3 in the 1D graph corresponds to the vertices two grids away from the Steiner node (vertices 41 and 54), and so on. A vertex whose distance exceeds the topological start-to-end node distance does not have a look-ahead vector. A special value, \( WeightMax \), is assigned as the look-ahead weight for these vertices. \( WeightMax \) is the minimum delay at the end node taking into account the load capacitance \( C_L \) and is given by

\[ WeightMax = \min_{(r,t)} \left( rC_L + t \right), \quad (r,t) \text{ ranging over the weights at the end node}. \]

(7)
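To make the calculation concrete, the following sketch computes look-ahead weight vectors along a 1D chain and evaluates \( WeightMax \) at the end node. It is an illustration under stated assumptions rather than the authors' code: the initial pair at the start node is taken to be \( (r_b, 0) \), i.e. the start node is assumed to be driven by a buffer, a buffer is allowed at every 1D vertex, and pairs that are no better in both resistance and delay are pruned. All function names are hypothetical.

```python
R_W, C_W = 37.5, 0.1026             # ohm and pF per wire segment
R_B, C_B, D_B = 104.2, 0.022, 20.0  # buffer output resistance, input capacitance, intrinsic delay (ps)
C_L = 0.022                         # load capacitance at the end node (pF)


def expand(cand):
    """Both downstream expansions of one (r, t) pair across one segment."""
    r, t = cand
    wire = (r + R_W, t + (r + R_W / 2) * C_W)
    buffered = (R_B, t + r * (C_W + C_B) + R_W * (C_W / 2 + C_B) + D_B)
    return [wire, buffered]


def prune(cands):
    """Keep only pairs that no other pair matches or beats in both r and t."""
    kept = []
    for r, t in sorted(set(cands)):          # ascending r, then ascending t
        if not kept or t < kept[-1][1]:
            kept.append((r, t))
    return kept


def look_ahead_vectors(n_vertices, start=(R_B, 0.0)):
    """Look-ahead weight vectors for 1D vertices 1..n (list index 0..n-1)."""
    la = [[start]]
    for _ in range(n_vertices - 1):
        la.append(prune([c for cand in la[-1] for c in expand(cand)]))
    return la


def weight_max(end_vectors):
    """Minimum delay at the end node, including the load capacitance C_L."""
    return min(r * C_L + t for r, t in end_vectors)


la = look_ahead_vectors(6)          # six vertices, as in the Figure 10 example
print(len(la))                      # 6
print(round(weight_max(la[-1]), 2))
```

Because every unbuffered path is among the explored options, the resulting \( WeightMax \) can never exceed the delay of the plain five-segment wire.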

The look-ahead weights will be combined with the weights from normal path expansion to form a so-called predicted end-to-end delay. The expansion is now guided by using the predicted end-to-end delay instead of the normal path expansion delay. This will reduce the number of candidates significantly because the candidate that has a predicted end-to-end delay greater than a known end-to-end delay will not be expanded. For a vertex \( v \), the predicted end-to-end delay is given by

\[ EndToEndDelay = t_v + t_{LA} + r_{LA}c_v \]  

(8)

where \( t_v \) and \( c_v \) are the accumulated delay and capacitance to vertex \( v \) from sink node respectively while \( t_{LA} \) and \( r_{LA} \) are the look-ahead delay and resistance for vertex \( v \) (i.e. the accumulated delay and resistance from Steiner node to node \( v \)) respectively.
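Eq. (8) and the resulting pruning test can be written down directly. In the sketch below the function names are hypothetical, and taking the minimum over several look-ahead pairs stored at a vertex is an assumption about how multiple pairs would be combined. The numbers in the usage example are illustrative: the candidate is the result of one wire expansion from the sink, and the look-ahead pair corresponds to four unbuffered segments starting from a resistance of 104.2 Ω.

```python
def predicted_end_to_end(t_v, c_v, la_vectors):
    """Smallest t_v + t_LA + r_LA * c_v over the look-ahead pairs (r_LA, t_LA)."""
    return min(t_v + t_la + r_la * c_v for r_la, t_la in la_vectors)


def should_expand(t_v, c_v, la_vectors, best_known):
    """Expand a candidate only if its prediction does not exceed the known delay."""
    return predicted_end_to_end(t_v, c_v, la_vectors) <= best_known


# Candidate (c, t) = (0.1246 pF, 2.74875 ps) with one illustrative look-ahead pair.
delay = predicted_end_to_end(2.74875, 0.1246, [(254.2, 73.54368)])
print(round(delay, 2))              # 107.97
```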

**Path expansion**

Path expansion is the process of constructing the path from sink nodes toward the source node. In HRTB-LA, path expansion is implemented using priority queue. The pseudo-code for the path expansion in HRTB-LA is shown in Figure 12.

In Figure 9, the path expansion begins from sink1 where the first \((c, t)\) pair is \((0.022, 0)\). At the beginning node, the delay is used as the key in the priority queue (line 1), hence the initial key value in the priority queue is 0. The first EXTRACT_MIN (extract the minimum key value from the queue) will extract the candidate solution from sink node for the next path expansion as there is only one key value in the queue (lines 3 – 4). The algorithm will check if the extracted candidate is the candidate from the start node (in this case the Steiner node). The extracted candidate is not from the start node, therefore, lines 6 – 7 are skipped.

The path expansion is performed in lines 9 – 16. For each allowable edge, wire expansion is performed in lines 11 – 12 where the new \((c', t') = (0.12, 2.75)\) is computed using Eq. (1). This candidate is now inserted into the solution list and the delay component of the candidate is added into the queue by invoking the function InsertCandidate. The function InsertCandidate is shown in Figure 13.

Figure 11: Association of look-ahead weight vectors to input grid graph

Figure 12: Pseudo-code for the path expansion in HRTB-LA

In HRTB-LA, the candidate solution \((c_1, t_1)\) is said to be dominated by \((c_2, t_2)\) if \(c_1 > c_2\) and \(t_1 > t_2\). The predicted end-to-end delay is computed in lines 9 – 11 and it is pushed into the queue in line 12.
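The dominance rule stated above can be sketched as a small pruning helper. This covers only the candidate-list part of InsertCandidate; the queue handling and the end-to-end delay prediction are omitted, and the function names and list representation are hypothetical.

```python
def is_dominated(a, b):
    """True if candidate a = (c1, t1) is dominated by b = (c2, t2)."""
    return a[0] > b[0] and a[1] > b[1]


def insert_candidate(candidates, new):
    """Return the candidate list after offering `new` to it.

    `new` is rejected if an existing candidate dominates it; otherwise it is
    added and every existing candidate that it dominates is dropped.
    """
    if any(is_dominated(new, old) for old in candidates):
        return candidates
    return [old for old in candidates if not is_dominated(old, new)] + [new]


cands = insert_candidate([(0.10, 3.0)], (0.20, 5.0))   # rejected: larger c and larger t
print(cands)                                           # [(0.1, 3.0)]
```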

So far, the queue contains only the key associated with the candidate solution for vertex 28. The next EXTRACT_MIN will extract this candidate for the next path expansion. The expansion is from vertex 28 to vertex 42 only, because vertex 27 is a wire obstacle. There are two types of expansion, which are wire expansion and wire expansion terminated by a buffer at vertex 28. The path expansion is repeated until the first solution reaches the Steiner node (vertex 39).

Figure 13: Pseudo-code for the insert candidate in HRTB-LA

In order to make look-ahead possible for a multi-terminal problem, a buffer must be inserted at the Steiner node such that end-to-end delay can be computed. By doing this, the quality of the solution may not be as good as the solution from the algorithm with normal path expansion (no look-ahead). However, from experimental results, the solution quality degradation is very small.

The predicted end-to-end delay that reaches the Steiner node (or source node) is recorded as a known minimum end-to-end delay. For the other path expansions, if their predicted end-to-end delay is greater than this actual known minimum end-to-end delay, then the dominated candidate will be removed. In this way, the number of candidates at the vertices can be substantially reduced, thus speeding up the routing path construction.
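The interplay of the priority queue, dominance pruning and the known-minimum cut-off can be illustrated for a two-terminal net. The sketch below is a simplification and not the authors' pseudo-code: it uses Python's binary-heap `heapq` instead of a Fibonacci heap, a single buffer type, a 0-based row-major grid, and the accumulated delay itself as a weak lower bound in place of look-ahead weights. It also uses a non-strict dominance test so that the search terminates on graphs with cycles. All names are hypothetical.

```python
import heapq

R_W, C_W = 37.5, 0.1026             # ohm and pF per wire segment
R_B, C_B, D_B = 104.2, 0.022, 20.0  # buffer output resistance, input capacitance, intrinsic delay (ps)
R_S, C_L = 104.2, 0.022             # source resistance (ohm), sink load (pF)


def neighbors(v, rows, cols):
    """4-connected neighbours on a 0-based, row-major grid."""
    row, col = divmod(v, cols)
    if row > 0:
        yield v - cols
    if row < rows - 1:
        yield v + cols
    if col > 0:
        yield v - 1
    if col < cols - 1:
        yield v + 1


def expand_path(rows, cols, sink, target, wire_obs=(), buf_obs=()):
    """Minimum delay at `target` for a two-terminal net, or None if unreachable."""
    cands = {sink: [(C_L, 0.0)]}    # accepted (c, t) pairs per vertex
    heap = [(0.0, C_L, sink)]       # keyed by accumulated delay t
    best = None
    while heap:
        t, c, v = heapq.heappop(heap)
        if best is not None and t >= best:
            break                   # no remaining candidate can beat the known delay
        if v == target:
            total = t + R_S * c
            best = total if best is None else min(best, total)
            continue
        options = [(c + C_W, t + R_W * (C_W / 2 + c))]      # plain wire
        if v not in buf_obs:                                  # buffer at v, then wire
            options.append((C_W + C_B, t + D_B + R_B * c + R_W * (C_W / 2 + C_B)))
        for u in neighbors(v, rows, cols):
            if u in wire_obs:
                continue
            kept = cands.setdefault(u, [])
            for nc, nt in options:
                if any(oc <= nc and ot <= nt for oc, ot in kept):
                    continue        # an accepted candidate is at least as good
                kept.append((nc, nt))
                heapq.heappush(heap, (nt, nc, u))
    return best


print(round(expand_path(1, 2, 0, 1), 2))   # 15.73: one wire segment, no buffer
```

Replacing the weak bound `t` with the predicted end-to-end delay of Eq. (8) would tighten the cut-off, which is the role the look-ahead weights play in HRTB-LA.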

**Time complexity of HRTB-LA**

The proposed algorithm uses the Fibonacci heap data structure (Cormen et al. 2009) to implement the priority queue required for its operations. The advantage of the Fibonacci heap over other heaps, such as the binary heap and the binomial heap, is that it has much faster INSERT (used to add a new key into the queue) and DECREASE KEY (used to remove a redundant key from the queue) operations. These two functions are implicitly called in the function InsertCandidate of HRTB-LA. In HRTB-LA, the most time-consuming part is the path expansion process. In the function Path Expansion, the number of EXTRACT_MIN operations in the priority queue is upper bounded by the total number of vertices $|V|$. Since a Fibonacci heap is used to implement the priority queue, the amortized time for the EXTRACT_MIN operations is $O(|B||V|^2 \log |V|)$, because the number of candidate solutions at each vertex is at most $|B||V|$ (Zhou et al. 2000). In a Fibonacci heap, each INSERT and DECREASE KEY operation takes $O(1)$. Hence, a wire expansion (lines 10 – 12 in Path Expansion) takes $O(|B||V|)$ time because the pruning and the end-to-end delay prediction operations are linear. Note that the edge connection operation is bounded by $O(|E|)$, where $|E|$ is the total number of edges. Therefore the total computation time for wire expansions is $O(|B||V||E|)$. Meanwhile, the second expansion (wire expansion terminated by buffer, in lines 13 – 16) takes $O(|B|^2|V||E|)$. Therefore the total running time of the HRTB-LA algorithm is $O(M(|B||V|^2 \log |V| + |B||V||E| + |B|^2|V||E|)) \approx O(M(|B||V|^2 \log |V| + |B||V||E|))$, where $M$ is the total number of sinks and Steiner nodes. In practice, $|E|$ and $|V|$ are small due to the look-ahead scheme and graph pruning. This is supported by the experimental results presented in the following section.

RESULTS AND DISCUSSION

The proposed algorithm is implemented in C and run on a 2.4 GHz Intel Core i5 PC with 4 GB RAM. Two sets of experiments were performed. The first experiment was performed to show that the solution quality of HRTB-LA (which applies simultaneous routing and buffer insertion on the adjusted tree) is better than that of algorithms which insert buffers on a fixed routing tree. The second experiment was performed to demonstrate the effectiveness of the look-ahead scheme over the algorithm that performs normal path expansion (no look-ahead). We refer to the algorithm with normal path expansion as HRTB (Uttraphan and Shaikh-Husin 2013), for Hybrid Routing Tree and Buffer insertion.

Experiment 1

In this test, the solutions from HRTB-LA were compared with the solutions from the FBI (fast buffer insertion) algorithm (Li et al. 2012) and RIATA (Hu et al. 2003). The code for the FBI algorithm is available for download at http://dropzone.tamu.edu/~zhuoli/GSRC/fast_buffer_insertion.html. However, the code for RIATA is not available for download. Therefore, we coded our own version of RIATA based on the descriptions in (Alpert et al. 2009) and (Hu et al. 2003). HRTB-LA, FBI and RIATA were tested on 21 different nets and graphs. The number of sink nodes ranges from 3 to 9. Graph sizes range from $30 \times 30$ to $80 \times 50$, and the wire and buffer obstacles were randomly generated. The test results are tabulated in Table 1. The table is organized as follows: columns 1 to 4 are the net name, the graph size, the number of sink nodes in the net and the size of the obstacle areas (wire/buffer) relative to the size of the graph respectively. The fifth column shows the minimum delay after the net is optimized by FBI when there is no obstacle on the graph. This column indicates the absolute minimum delay for the given net. These values are used as a reference in this test. Columns 6 to 8 are the delay at the source obtained from the FBI, RIATA and HRTB-LA algorithms respectively. Columns 9 and 10 are the delay improvement of HRTB-LA (in percent) over FBI and RIATA respectively. As an example, for net 3S1, the graph is a $30 \times 30$ grid (total vertices = 900). There are three sink nodes in this net and the obstacle areas cover 26.9% of the graph. When the obstacles are ignored (buffers can be inserted anywhere on the net), the delay measured at the source is 1192.89 ps. When the obstacles are taken into account, the FBI algorithm returns a delay at the source of 2268.21 ps, RIATA returns 1201.08 ps and HRTB-LA returns 1195.52 ps. This means that the delay improvements of HRTB-LA over FBI and RIATA are 47.29% and 0.46% respectively.
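The improvement percentages quoted for net 3S1 are consistent with a simple relative reduction in delay. The formula below is an assumption (the text does not state it explicitly), but it reproduces both quoted figures; the function name is hypothetical.

```python
def improvement(baseline_ps, new_ps):
    """Relative delay reduction of new_ps over baseline_ps, in percent."""
    return 100.0 * (baseline_ps - new_ps) / baseline_ps


print(round(improvement(2268.21, 1195.52), 2))   # 47.29 (HRTB-LA over FBI, net 3S1)
print(round(improvement(1201.08, 1195.52), 2))   # 0.46  (HRTB-LA over RIATA, net 3S1)
```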

Clearly, in all test cases, the solutions of HRTB-LA are better than the solutions from FBI, with the highest delay improvement of 47.29% recorded for net 3S1. In the comparison with RIATA, HRTB-LA improves the delay for most of the nets, with the highest delay improvement of 24.83% recorded for net 7S2. However, the delays obtained from HRTB-LA are slightly larger than the delays obtained from RIATA for nets 3S2 and 4S1. This is because the obstacle areas on these nets are relatively small, so the routing paths of HRTB-LA and RIATA are the same. Recall that HRTB-LA must insert a buffer at each Steiner node, which causes it to return a slightly higher delay. However, the solution degradations are relatively small: 0.22% and 1.1% for nets 3S2 and 4S1 respectively. In fact, if the obstacle areas are large (more than 18%), the solutions from HRTB-LA are better than the solutions from RIATA.

Experiment 2

In the second test, the solution quality, runtime and the number of candidate solutions produced by FBI, RIATA, HRTB and HRTB-LA algorithms are compared. The algorithms are performed on a randomly generated net with 25 sinks. The size of the grid is 100 x 100 which is equivalent to 20 mm x 20 mm layout size. The wire and buffer obstacles are 20% and 10% of the graph respectively.

The test results are summarized in Table 2 and, for better comparison, plots of the results are also provided. The delay at the source, the runtime and the number of candidate solutions were recorded for all algorithms. There are six test cases: in the first case there is only one buffer type in the library, in the second case there are two buffer types, and so on. As an example, with one buffer type, the FBI algorithm returns a delay of 18854 ps with a runtime of 0.37 s. The number of candidate solutions is not given because the FBI code does not provide this information. The delay at the source obtained from RIATA is 18485 ps and the runtime is 1.56 s. The delays at the source obtained from HRTB and HRTB-LA are 18073 ps and 18120 ps respectively, while the recorded runtimes are 5.61 s and 5.38 s respectively. When there are two or more types of buffer in the library (cases 2 – 6), the solution quality of all algorithms improves.

Table 1: Delay at source comparison between FBI, RIATA and HRTB-LA

<table>
<thead>
<tr><th>Net name</th><th>Size</th><th>#Sink</th><th>Obstacles</th><th>Min delay (ps)</th><th>FBI delay (ps)</th><th>RIATA delay (ps)</th><th>HRTB-LA delay (ps)</th><th>Improvement over FBI (%)</th><th>Improvement over RIATA (%)</th></tr>
</thead>
<tbody>
<tr><td>3S1</td><td>30 × 30</td><td>3</td><td>26.87%</td><td>1192.89</td><td>2268.21</td><td>1201.08</td><td>1195.52</td><td>47.29</td><td>0.46</td></tr>
<tr><td>3S2</td><td>30 × 30</td><td>3</td><td>15.87%</td><td>1201.54</td><td>1621.95</td><td>1219.80</td><td></td><td></td><td></td></tr>
<tr><td>3S3</td><td>30 × 30</td><td>3</td><td>31.15%</td><td>1199.79</td><td>1610.08</td><td>1405.91</td><td></td><td></td><td></td></tr>
<tr><td>4S1</td><td>30 × 30</td><td>4</td><td>11.59%</td><td>1281.65</td><td>1902.90</td><td>1323.15</td><td></td><td></td><td></td></tr>
<tr><td>4S2</td><td>30 × 30</td><td>4</td><td>23.33%</td><td>1281.65</td><td>2119.90</td><td>1356.02</td><td></td><td></td><td></td></tr>
<tr><td>4S3</td><td>30 × 30</td><td>4</td><td>23.08%</td><td>1281.65</td><td>2350.99</td><td>1538.62</td><td></td><td></td><td></td></tr>
<tr><td>5S1</td><td>50 × 30</td><td>5</td><td>19.44%</td><td>1581.66</td><td>1737.70</td><td>1735.04</td><td></td><td></td><td></td></tr>
<tr><td>5S2</td><td>50 × 30</td><td>5</td><td>21.25%</td><td>1432.88</td><td>1897.00</td><td>1580.96</td><td></td><td></td><td></td></tr>
<tr><td>5S3</td><td>50 × 30</td><td>5</td><td>30.85%</td><td>1656.23</td><td>2394.51</td><td>1904.55</td><td></td><td></td><td></td></tr>
<tr><td>6S1</td><td>50 × 50</td><td>6</td><td>18.02%</td><td>1887.86</td><td>3389.40</td><td>3034.05</td><td></td><td></td><td></td></tr>
<tr><td>6S2</td><td>50 × 50</td><td>6</td><td>29.73%</td><td>2330.67</td><td>3234.63</td><td>2696.07</td><td></td><td></td><td></td></tr>
<tr><td>6S3</td><td>50 × 50</td><td>6</td><td>35.63%</td><td>2330.67</td><td>3296.19</td><td>2748.18</td><td></td><td></td><td></td></tr>
<tr><td>7S1</td><td>80 × 30</td><td>7</td><td>35.96%</td><td>2436.38</td><td>3569.73</td><td>3569.73</td><td></td><td></td><td></td></tr>
<tr><td>7S2</td><td>80 × 30</td><td>7</td><td>38.46%</td><td>2326.40</td><td>3896.01</td><td>3605.50</td><td></td><td></td><td></td></tr>
<tr><td>7S3</td><td>80 × 30</td><td>7</td><td>31.20%</td><td>2250.95</td><td>3661.29</td><td>3186.19</td><td></td><td></td><td></td></tr>
<tr><td>8S1</td><td>80 × 30</td><td>8</td><td>24.84%</td><td>2786.24</td><td>3621.55</td><td>3263.18</td><td></td><td></td><td></td></tr>
<tr><td>8S2</td><td>80 × 30</td><td>8</td><td>26.45%</td><td>2786.24</td><td>4058.96</td><td>3730.60</td><td></td><td></td><td></td></tr>
<tr><td>8S3</td><td>80 × 50</td><td>8</td><td>33.94%</td><td>2824.65</td><td>4067.09</td><td>3849.26</td><td></td><td></td><td></td></tr>
<tr><td>9S1</td><td>80 × 50</td><td>9</td><td>34.04%</td><td>2831.93</td><td>3749.61</td><td>3238.64</td><td></td><td></td><td></td></tr>
<tr><td>9S2</td><td>80 × 50</td><td>9</td><td>34.04%</td><td>2831.93</td><td>3749.61</td><td>3238.64</td><td></td><td></td><td></td></tr>
<tr><td>9S3</td><td>80 × 50</td><td>9</td><td>35.52%</td><td>2827.99</td><td>4106.27</td><td>3500.76</td><td></td><td></td><td></td></tr>
</tbody>
</table>

Table 2: Solutions quality, runtime and number of candidate solutions for Experiment 2

<table>
<thead>
<tr>
<th rowspan="2"># Buffer types</th>
<th colspan="3">FBI</th>
<th colspan="3">RIATA</th>
<th colspan="3">HRTB</th>
<th colspan="3">HRTB-LA</th>
</tr>
<tr>
<th>Delay (ps)</th>
<th>Runtime (s)</th>
<th># Candidate</th>
<th>Delay (ps)</th>
<th>Runtime (s)</th>
<th># Candidate</th>
<th>Delay (ps)</th>
<th>Runtime (s)</th>
<th># Candidate</th>
<th>Delay (ps)</th>
<th>Runtime (s)</th>
<th># Candidate</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>18854</td>
<td>0.37</td>
<td>-</td>
<td>18485</td>
<td>1.56</td>
<td>6480</td>
<td>18073</td>
<td>5.61</td>
<td>17225</td>
<td>18120</td>
<td>5.38</td>
<td>1985</td>
</tr>
<tr>
<td>2</td>
<td>18774</td>
<td>0.39</td>
<td>-</td>
<td>18381</td>
<td>1.71</td>
<td>11318</td>
<td>17955</td>
<td>6.29</td>
<td>22316</td>
<td>17983</td>
<td>5.32</td>
<td>3985</td>
</tr>
<tr>
<td>3</td>
<td>18724</td>
<td>0.43</td>
<td>-</td>
<td>18307</td>
<td>2.06</td>
<td>17471</td>
<td>17904</td>
<td>9.16</td>
<td>34651</td>
<td>17924</td>
<td>5.39</td>
<td>7259</td>
</tr>
<tr>
<td>4</td>
<td>18713</td>
<td>0.53</td>
<td>-</td>
<td>18289</td>
<td>2.8</td>
<td>32525</td>
<td>17891</td>
<td>14.6</td>
<td>46811</td>
<td>17912</td>
<td>5.52</td>
<td>9869</td>
</tr>
<tr>
<td>5</td>
<td>18707</td>
<td>0.72</td>
<td>-</td>
<td>18277</td>
<td>4.09</td>
<td>29426</td>
<td>17880</td>
<td>25.1</td>
<td>58636</td>
<td>17909</td>
<td>5.71</td>
<td>12589</td>
</tr>
<tr>
<td>6</td>
<td>18707</td>
<td>0.92</td>
<td>-</td>
<td>18277</td>
<td>5.68</td>
<td>35260</td>
<td>17880</td>
<td>40.37</td>
<td>70240</td>
<td>17909</td>
<td>6.07</td>
<td>14984</td>
</tr>
</tbody>
</table>

CONCLUSION

In this paper, a hybrid of the simultaneous and post-routing approaches for multi-terminal nets is described. Starting from a given routing tree, the proposed algorithm, HRTB-LA, adjusts the tree if a Steiner node lies in a buffer obstacle. Rerouting is then performed simultaneously with buffer insertion. The results show that the proposed algorithm can produce a better solution than the other algorithms. To reduce the runtime of the algorithm, the look-ahead scheme, which was proven successful in optimizing two-terminal nets, is extended to multi-terminal nets. The runtime benefits because the number of candidate solutions from HRTB-LA is lower than the number of candidate solutions from RIATA. For HRTB-LA, the depth-first search in the look-ahead scheme allows the algorithm to find the destination as quickly as possible and eliminates unnecessary path expansions later on.

ACKNOWLEDGEMENT

This work is supported by the Higher Education Department, Ministry of Education Malaysia, and Universiti Tun Hussein Onn Malaysia (UTHM). The authors would like to thank Zhuo Li et al. for providing the FBI code.

REFERENCES


ITRS (2013). The international technology roadmap for semiconductors: Interconnect


