# Enhancing Model Order Reduction for Nonlinear Analog Circuit Simulation

Henda Aridhi, Member, IEEE, Mohamed H. Zaki, Member, IEEE, and Sofiène Tahar, Senior Member, IEEE

Abstract-Traditionally, model order reduction methods have been used to reduce the computational complexity of mathematical models of dynamic systems, while preserving their functional characteristics. This technique can also be used to fasten analog circuit simulations without sacrificing their highly nonlinear behavior. In this paper, we present an iterative approach for reducing the computational complexity of nonlinear analog circuits using piecewise linear approximations, k-means clustering, and Krylov space projection techniques. We model primary circuit inputs, design initial conditions, and circuit parameters as fuzzy variables with different distributions in qualitative simulations. We then iteratively fine-tune the reduced models until a model is achieved that meets a predefined performance and accuracy conformance criteria. We demonstrate the effectiveness of our method using several key nonlinear circuits: 1) a transmission line; 2) a ring oscillator; 3) a voltage controlled oscillator; 4) a phase-locked loop; and 5) an analog comparator circuit. Our experiments show that the reduced model simulations are fast and accurate compared with the existing techniques.

*Index Terms*—Analog circuits, clustering, Krylov space, model order reduction (MOR), qualitative simulation (QS).

## I. INTRODUCTION

**C** OMPUTER simulation is an essential step in the design and verification of analog integrated circuits (ICs). Accurate mathematical models of analog circuits tend to be large and thus their computer simulations are computationally expensive in terms of both memory and CPU resources. A large number of computer simulations are generally required to verify their various functional and performance properties. Accurate and less complex mathematical models of analog circuits can help fasten their verification and optimization processes and in meeting the ever pressing time-to-market constraints.

Model order reduction (MOR) is a promising technique that reduces the size and complexity of large scale mathematical models while preserving their main characteristics [1]. This technique has been successfully put in practice for the case of linear analog ICs using Krylov space projections and singular value decomposition (SVD) [2]. However, the problem of elaborating a method for the reduction of nonlinear analog

Digital Object Identifier 10.1109/TVLSI.2015.2421450

circuit models is still an active research area and only a few methods have proved their effectiveness for applications, such as nonlinear transmission lines, amplifiers, and oscillators [3]–[6]. These methods rely on transforming the nonlinear model into a set of local linear or polynomial models and applying projection to obtain reduced local models, which can be evaluated on the fly to approximate the original full order model. The idea is very attractive, but is based on various heuristics. It is, therefore, not always guaranteed to provide accurate reduced models for analog circuits given that their behavior can be highly nonlinear. Large analog circuits are often synthesized in a hierarchical way, resulting in a circuit structure that consists of an interconnection of several instances of linear and nonlinear subcircuits with possibly different parameters. This fact can be used to help reduce the overall computational cost by independently customizing the MOR parameter for each subcircuit as described in [7] for linear analog circuits.

Qualitative simulation (QS) is a method that can be used to characterize the behavior of dynamic systems. It utilizes multivariate optimization techniques in which dynamic systems parameters and initial conditions are considered fuzzy variables with associated possibility distributions [8] instead of concrete values. When applied to analog circuits, it provides the set of trajectories that reach the state variables bounds, given all possible variations in their models. Therefore, it offers a better coverage of their reachable state space and characterization of their nonlinear behavior compared with sampling-based methods such as the Monte Carlo simulation method [9].

In this paper, we propose a new method to MOR of nonlinear analog circuits. We model analog circuits using fuzzy dynamical models and use QS to characterize and determine important trajectories and envelopes of their state variables. Then, we employ the k-means clustering algorithm to subdivide the circuit state space into discrete regions containing its main responses. In each region, a linearized model is reduced via Krylov space projections [10] and is used to approximate the full order nonlinear circuit model. The number of clusters required by our MOR method is determined through an optimization problem constrained by a minimum behavioral error between the original model and its piecewise linearized model. We also extend the state space by adding the input as a decision variable for the evaluation of the reduced models. Moreover, we establish a set of conformance checking criteria and refine the reduced model to guarantee simulation acceleration and accuracy. We illustrate

1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received May 5, 2014; revised December 22, 2014; accepted March 18, 2015. Date of publication May 13, 2015; date of current version February 23, 2016.

The authors are with the Department of Electrical and Computer Engineering, Concordia University, Montréal, QC H4B 1R6, Canada (e-mail: h\_aridh@encs.concordia.ca; mzaki@encs.concordia.ca; tahar@ece. concordia.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

our proposed methodology on different analog circuits: 1) a transmission line; 2) a ring oscillator; 3) a voltage controlled oscillator (VCO); 4) a phase-locked loop (PLL); and 5) an analog comparator.

The rest of this paper is organized as follows. First, we give an overview of existing MOR techniques developed for large analog ICs in Section II. Then, in Section III, we briefly explain MOR through projection and QS. After that, Section IV details the different steps of our proposed MOR method. In Section V, we display and discuss our experimental results. Finally, the conclusion is drawn in Section VI.

## II. RELATED WORK

The state-of-the-art MOR methods for VLSI circuits are based on the Krylov [10] space projections and the SVD method [1]. The Krylov space [10] based methods perform an implicit transfer function moment matching via the Arnoldi or Lanczos algorithms [1]. The SVD and the Krylov [10] space projections have been used successfully for reducing the size of linear analog circuit models, such as *RC*, *RL*, and *RLC* networks, and so on, as detailed in [2].

A framework for hierarchical MOR (HMOR), which relies on partitioning the circuits into blocks, was introduced in [12]. The target application was the reduction of linear interconnect models for signal integrity analysis. In [13], HMOR was used to enhance the performance analysis of large Resistor, Inductor, Capacitor, with Coupling Coefficient (RLCK) power delivery systems. In addition, Honkala *et al.* [7] introduced an HMOR method to reduce large linear blocks of ICs while preserving their passivity. Unfortunately, the idea of HMOR has not yet been explored for nonlinear circuits.

In spite of the progress of MOR methods and their successful application to various linear circuit models, their extension to the case of nonlinear circuits was not straightforward and is still facing many challenges. In addition to the major differences between linear and nonlinear circuits, their complexity and sensitivity to parameter variability led to difficulties in applying the SVD and Krylov space projections. Existing MOR techniques for nonlinear circuits are based on various heuristics and have many limitations including the dependence on the inputs, the initial conditions, and the parameters variability of the nonlinear circuit model under consideration.

The proper orthogonal decomposition (POD) [14] is a straightforward application of the SVD method to the case of a nonlinear dynamical system. It uses singular vectors of the system response for a fixed input to project the system dynamics into a smaller state space retaining only its main singular values. Hinze *et al.* [6] used the POD to derive surrogate models for semiconductors devices that are modeled by drift-diffusion equations. They also outlined the drawback of this method that is the dependence on the original model inputs and parameters.

Phillips [15] presented a simple method for automatically extracting macromodels of weakly nonlinear circuits with time-varying operating points based on Volterra series and variational analysis theory [16]. Feng [17] outlined that the efficiency of the method in [15] is limited because of the exponential increase of the size of the Volterra series descriptions. They proposed to enhance them using a two-sided projection method. In [18], nonlinear systems are approximated with quadratic Taylor approximations and reduced via Krylov space projections. This method is accurate only if the nonlinear model is similar to a quadratic approximation. These limitations have been addressed in [19] by rewriting strongly nonlinear models in quadratic-linear form without any approximation. However, this method increases the size of the initial model by introducing new variables and equations and scales poorly with the number of nonlinear terms.

The trajectory piecewise linear (TPWL) [3] MOR method utilizes an aggregation of local linear approximations around expansion points that are selected from the trained original model trajectories. This method was used for the case of weakly nonlinear systems, such as nonlinear transmission lines, amplifier chains, and micromachined devices. The accuracy of the TPWL method heavily depends on the extracted expansion points and the training inputs. An improvement for the linear models aggregation in the TPWL MOR method, that uses state velocities in weights computation, is introduced in [20]. An enhancement of the TPWL MOR method, which consists of an adaptive sampling of the linearization points across the model trajectory based on the error between the nonlinear model and its linearized form, is proposed in [21]. Aridhi et al. [5] used a k-means clustering algorithm to optimize the set of linearization points and used simple weights to improve speedup of the TPWL method.

The main steps of the TPWL MOR method have been followed with a replacement of the local linear models with local piecewise polynomial (PWP) models [4] and Tchebychev interpolating polynomial models [22], respectively. These approaches improved the accuracy of the local reduced models but increased their on-the-fly evaluation time. Gu and Roychowdhury [23] presented a nonlinear MOR method that constructs parameterized manifolds capturing dc and ac responses for nonlinear systems using symbolic transformations and Krylov projections.

De Jonghe and Gielen [24] presented a methodology to approximate nonlinear analog circuit models with a compact set of analytical behavioral models. These models are obtained by extracting nodal matrices from SPICE transient simulations and using recursive vector fitting regression algorithm. This method enhances the MOR method automation but does not overcome the input dependence problem. A similar idea, which consist of using curve fitting to automatically generate analog circuit models from their SPICE transient simulation traces and reducing them using Krylov space projections, is proposed in [25]. Bond et al. [26] formulated the identification of stable compact models for radio frequency systems as a semidefinite optimization problem. In [27], a method for high-order Volterra transfer function reduction via Krylov projection is proposed. The novelty of the method consists of using association of multivariate (Laplace) variables in high-order multiple-input multiple-output transfer functions to generate the standard single-s transfer functions.

While the above cited MOR methods present a variety of techniques to reduce different types of circuit models, there are still many limitations that need to be addressed. For instance, the trajectory-based methods can be improved and enhanced in a number of ways. For example: 1) using a method that computes all possible trajectories of the model under the effect of variations in the inputs, parameters, and initial conditions; 2) using a better procedure for the expansion points selection; 3) studying the weighting functions during the aggregation of local approximation models; and 4) using the hierarchical approach [7].

## III. PRELIMINARIES

### A. Model Order Reduction

The majority of analog ICs can be described by the large differential model  $in^1$ 

$$\dot{x} = f(x, t, u, p)$$
  

$$y = g(x)$$
(1)

where  $f : \mathbb{R}^n \to \mathbb{R}^n$  is a vector valued function, x and  $\dot{x} \in \mathbb{R}^n$  are the state vector and its time derivative, t is the time,  $u \in \mathbb{R}^m$  is a vector of inputs,  $p \in \mathbb{R}^{n_p}$  is a set of parameters,  $y \in \mathbb{R}^{n_y}$  is a vector of outputs, and  $g : \mathbb{R}^n \to \mathbb{R}^{n_y}$  is an output generation function.

MOR consists of transforming, algorithmically, the large mathematical model Model(n) of (1) to a smaller model Model(q), as given in

$$\dot{z} = \hat{f}(z, t, u, \hat{p})$$
  
$$\dot{y} = \hat{g}(z)$$
(2)

where  $\hat{f} : R^q \to R^q$  is a vector valued function, z and  $\dot{z} \in R^q$ are the state reduced vector and its time derivative,  $\hat{p} \in R^{n_{\hat{p}}}$  is a set of parameters,  $\hat{y} \in R^{n_y}$  is an approximation of the output vector y, and  $\hat{g} : R^q \to R^{n_y}$  is an output generation function.

The reduced model has a smaller size  $q \ll n$ , has less parameters  $n_{\hat{p}} \leq n_p$ , is less computationally expensive than Model(*n*), and its output accurately reproduces the behavior of Model(*n*). For the case of analog IC models, the creation of a robust method which outputs, in a finite number of step, an accurate and efficient reduced model is very challenging. It should be customized and elaborated based on the common characteristics of analog IC models which are their infinite state space, their sensitivity to parameters and environment conditions, their nonlinearity, and the type of analysis (dc steady state, ac steady state, transient simulations, and so on). In this paper, we focus on reducing the number of equations required to describe the model while preserving their parameters and input dependence.

## B. Projection-Based MOR

The MOR of the mathematical model in (1) via projection consists of finding an  $n \times q$  unitary projection matrix  $(V \cdot V^t = I_n)$  and using the projection  $\hat{x} = V \cdot z$  as an approximate of the original state vector x, where z is the reduced state vector of variables. There are different algorithms, which compute such reduction matrices, e.g., the Arnoldi or Lanczos algorithm [1], the SVD [2], and the POD [6]. For linear dynamical models, only one projection matrix is needed to perform the reduction of the full order model. However, reducing a nonlinear model is extremely challenging. It requires as many reduction matrices as the number of piecewise linear models used to approximate its nonlinear behavior. This is the idea behind the TPWL [3] method and its derivatives [4], [5], as well as the method we propose in this paper.

#### C. Qualitative Simulation

The QS is an extension of traditional numerical-logical integration methods for dynamical system performance analysis [8]. It uses fuzzy differential equations (FDEs), which are differential equations where the deterministic quantities, such as parameters, coefficients, and/or initial conditions, are considered as fuzzy numbers, as given in

$$\dot{x} = f(x, t, u, p)$$
  
 $x(0) = \mu_{x(0)}, \quad u = \mu_{u}, \quad p = \mu_{p}$  (3)

where  $\mu_{x(0)}$ ,  $\mu_u$ , and  $\mu_p$  refer to the membership functions used for the possibility distributions of the fuzzy numbers x(0), u, and p, respectively. The membership functions can have uniform, Gaussian, trapezoidal, triangular, or bell function forms. They are transformed into sets of intervals by means of  $\alpha$  – cut levels during the QS. Therefore, the uniform membership function is represented with a single  $\alpha$  – cut level, the Gaussian membership function allows the evaluation of the model for different  $\alpha$  – cut levels and consequently requires larger QS runtime than the uniform membership function [8].

To illustrate the use of QS, we explain its application for the case of a dynamical system subject to uncertain initial state values. The QS finds an envelope of all system trajectories originating in time from the fuzzy number  $\mu_{x(0)}$ . It formulates this problem as a multiobjective optimization problem, as given in

min 
$$F(y) = \int_0^{t^*} f(x, t, u, p) dt$$
,  $y = x(0)$   
s.t.  $y = \mu_{x(0)}, x \in [a, b]^n$  (4)

where the objective function is the solution x = F(y) of the dynamical system in (3) and the constraints are the initial fuzzy condition y described with a membership function  $\mu_{x(0)}$ and the solution x being within the set  $[a, b]^n$ . In this problem, the optimal solution  $y^*$  is not important as  $x_l(t^*) = F(y^*)$ , which provides the lowest possible state value. The highest possible state value  $x_h(t^*)$  is obtained by maximizing the same problem in (4). The complete envelope of the dynamical system transient behavior, i.e., the time evolution of the initial fuzzy state  $\mu_{x(0)}$ , is obtained by computing all the trajectories which lead to  $x_l(t^*)$  and  $x_h(t^*)$  for  $t^* = 0, \ldots, t_f$ . The evaluation of the effect of process variation on the model trajectories is formulated similar to (4). However, for a better convergence, the gradient of the objective function x = F(p) with respect to the parameter p  $(\partial x/\partial p)$ , which can be numerically approximated using the relation  $((\partial \dot{x}/\partial p) = (\partial \dot{x}/\partial x)(\partial x/\partial p))$ , has

<sup>&</sup>lt;sup>1</sup>All terms defined, in this section, have the same meaning in the rest of this paper, unless stated differently.



Fig. 1. Initial conditions membership function examples. (a) Uniform. (b) Gaussian.



Fig. 2. QS examples. (a) Transmission line output. (b) Differential ring oscillator output.

to be provided to the optimization engine [8]. In this paper, we only used QS for input and initial conditions.

Fig. 1(a) and (b) provides an example of fuzzy initial output conditions modeled with uniform and Gaussian membership functions for a transmission line and a differential ring oscillator, respectively. The QS of their output voltage is shown in Fig. 2(a) and (b), respectively. The obtained transient envelopes contain a complete set of their model trajectories and provide an insight on how sensitive the model is to the considered fuzzy quantities. For example, Fig. 2(a) shows that the transmission line behavior is equally affected while in Fig. 2(b) the differential ring oscillator is more sensitive to its initial output state during the startup time.

## IV. PROPOSED METHODOLOGY

Fig. 3 presents the proposed method for reducing analog circuits mathematical models. It has five main components, which are executed in an iterative way until the reduced model satisfies a defined set of accuracy and speedup constraints. First, in the model extraction and the preanalysis steps the circuit differential model is extracted and simulated in order to define a good guess of the MOR method parameters [5]. In this step, the model is linearized at different linearization points, which have been previously classified into regions via QS and clustering of the extended state space, in the previous preanalysis step. Then, the linearized models are reduced in each region using Krylov space projections and the final reduced model is obtained by dynamically evaluating a weighted sum of three local models. After that, the reduced



Fig. 3. MOR method overview.

model is input to a conformance criteria step to check that it is much faster than the original circuit model while it mimics its behavior. Based on the result of this step, either the reduced model is accepted or it is refined iteratively during the model refinement step until all the speedup and accuracy requirements are met.

## A. Model Extraction and Preanalysis

The model extraction step, which parses a circuit SPICE netlist and applies modified nodal analysis (MNA) formulation [28], leads to a parametric differential model (Model(n)), as given in (1). It consists of n symbolic nonlinear differential equations relating all the state variables (the circuit voltages and currents) and including a set of parameters p which represent device values and geometry. The main objective of the preanalysis step is the evaluation of the full order Model(n) and the prediction of a good initial guess of the MOR method parameters for a good reduced model accuracy/speedup tradeoff. In fact, Model(n) is transformed to a set of FDEs, as given in (3), and is input to QS. The input and the initial conditions are considered as fuzzy numbers with uniform or Gaussian membership distributions, as detailed in Section III-C. The QS of the obtained FDEs leads to a set of state trajectories and their envelopes. The state trajectories are used to select linearization points and the envelopes provide a means to measure how sensitive the model is to each of the specified fuzzy quantities. If the state envelopes are tight, the model is not too sensitive to the introduced fuzziness. Otherwise, if the state envelopes are wide, the model is very sensitive to the provided fuzzy quantity and the reduced model has to be built carefully in order to preserve the same behavior variations. As detailed in [5], the algorithm used for the selection of the k linearization points is based on k-means clustering. The linearization of (1) is given in

$$\dot{x} = A_{x_c} \cdot (x - x_c) + A_{u_c} \cdot (u - u_c) + A_{p_0} \cdot (p - p_0) + f(x_c, t, u_c, p_0) y = g(x)$$
(5)

where  $A_{x_c} = (df/dx)_{|x=x_c}$ ,  $A_{u_c} = (df/du)_{|u=u_c}$ , and  $A_{p_0} = (df/dp)_{|p=p_0}$ .



Fig. 4. State-space regions.

One of the main shortcomings of previous MOR methods is the automatic selection of the linearization points set. In the proposed method, the extended state space which also includes the input, is considered and the linearization points selection is performed in two steps: 1) a subdivision of the circuit state space into R regions which are overlapping at their interface, where the behavior of the circuit is coarsely the same (extra-wide, wide, tight QS envelope regions, and so on) and 2) the clustering via k-means of the content of each of these regions. The number of the linearization points, which ensure a target model accuracy in each of these regions, has to be minimal. The reason behind this is related to the speedup of the reduced model that is higher when using a smaller number of points. Fig. 4 shows this idea for a model having two state variables and requiring R = 3 behavior regions. The region  $R_1$  contains many points, and  $R_2$  contains a few points of  $R_1$  and shares a few points with  $R_3$ . Using this kind of subdivision of regions limits the total number of linearization points required to evaluate the reduced model during simulation, which improves speedup and accuracy. For example, some points of the region  $R_1$  are required at the beginning of a simulation during a transient behavior and are never reused for the rest of the simulation time, while the remaining regions  $R_2$  and  $R_3$  are alternatively used for a steady-state behavior.

The objective of clustering is to gather similar states of the circuit in the same cluster and use their centroid (geometrical mean) as a linearization point. A local linear model around that point is going to be used to generate a local reduced model. However, using the centroid, which is not necessarily a real numerical solution of the circuit model, as a linearization point might introduce errors. Therefore, after clustering the content of each of the main R behavior regions using the k-means clustering method, the clusters centroid are replaced with their nearest points from the simulation traces. The number of clusters k is not the same for all the R regions and is determined using the optimization procedure in

$$\min k$$
  
s.t.  $||y - y_L|| \le \varepsilon$  (6)

where the number of clusters k needs to guarantee a minimal error between the solution y of the original Model(n), given in (1) and the solution  $y_L$  of the piecewise linearized model given in (5).



Fig. 5. Effect of the number of clusters on the linearized model accuracy.



Fig. 6. Model reduction method.

Fig. 5 shows the error of a test circuit (PLL) output for a varying number of clusters, using one behavior region after the lock-up time. If we use less than six clusters the output error is greater than a predefined error threshold  $\varepsilon = 0.05$ . This error is going to increase after reducing the model. Then, we initially use a  $k \ge 6$  as a first guess to generate a reduced model and we may tune it during the model refinement step, based on the conformance criteria checking status.

## B. Model Reduction

Fig. 6 details the MOR method which consists of the following three main steps.

Step 1 (Local Linear Models Generation): The matrices and the vectors in (5) are computed for  $k_i$  clusters in each of the i = 1, ..., R regions.

Step 2 (Reduction Using Krylov Space Projection): A Krylov type projection basis  $V_i$  is determined for each of the i = 1, ..., R regions using the Arnoldi process [1]. It is a unified basis for the reduction via projection of all the local models within a region  $V_i = \text{SVD}(\bigcup_{j=1}^{k_i} V_j)$ , where SVD is the SVD operator [1]. The local reduced models Model<sub>i</sub>(q) are given in (7), where z is the reduced state variable, i is the region index, and j is the linearization point index. The terms related to the parameters are included only when the original model is sensitive to them. Otherwise, their presence in the reduced model will increase its evaluation time without necessarily improving its accuracy

$$\dot{z} = \hat{A}_{z_j} \cdot (z - z_j) + \hat{A}_{u_j} \cdot (u - u_j) + \hat{A}_{p_0} \cdot (p - p_0) + \hat{f}(x_j, t, u_j, p_0)$$
(7)

where  $\hat{A}_{z_j} = V_i^t \cdot A_{x_j} \cdot V_i$ ,  $\hat{A}_{u_j} = V_i^t \cdot A_{u_j}$ ,  $\hat{A}_{p_0} = V_i^t \cdot A_{p_0}$ , and  $\hat{f}(x_i, t, u_j, p_0) = V_i^t \cdot f(x_i, t, u_j, p_0)$ .

The matrices and the vectors in (7) are stored in a lookup table and are used to compute the solution of the reduced differential model on the fly.

Step 3 (Generation of a Sequence of Weighted Models): Three reduced models  $Model_i(q, p)$  are weighted to form the reduced model Model(q). The weights are intended to smooth transitions between state-space regions and allow contributions of different local models. However, the procedure for selecting the closest linearization points to the current state z and the weights computation should be simple, otherwise, the simulation time will increase extensively (more multiplication and summing operations) without any gain in terms of accuracy. We find the set of only three closest linearization points  $k_s$  such that  $[k_s] = 3$  and it verifies the condition in (8), where  $k_i$  represents the current region linearization point indexes

$$\forall s \in k_s \quad \forall j \in \{k_i - k_s\} \quad \left\| \begin{matrix} z - z_s \\ u - u_s \end{matrix} \right\| \le \left\| \begin{matrix} z - z_j \\ u - u_j \end{matrix} \right\|. \tag{8}$$

Next, we make sure that the selected closest points set  $k_s$  to the current state have been generated for similar input conditions by involving the input u as a constraint in (8). In addition, the fact that the linearization points are organized into sets, which correspond to different behavioral regions instead of a single region with many points, makes the search for the set of points  $k_s$  faster.

The final reduced model Model(q) is given in

$$\dot{z} = \sum_{s \in k_s} w_s \cdot \hat{A}_{z_s} \cdot (z - z_s) + \hat{A}_{u_s} \cdot (u - u_s) + \hat{A}_{p_0} \cdot (p - p_0) + \hat{f}(x_s, t, u_s, p_0) \hat{y} = g(V_i \cdot z)$$
(9)

where  $\hat{y}$  is the output of the reduced model that approximates the output y of the full order model,  $w_s = (||z - z_s||^{-1}/(\sum_{s \in k_s} ||z - z_s||)^{-1}), s \in k_s$  are the current state weights and the rest of the terms are the same as defined in (7).

The main limitations of the above MOR method are the possible large number of state-space regions and linearization points, when the circuit behavior is strongly nonlinear and the signal variability is high. In addition, this method blindly reduces a flat circuit without optimizing the reduction effort when repeated circuit structures are present. These limitations can be avoided by subdividing a flat circuit into a set of subcircuits. Therefore, this simple MOR method can be applied in an iterative way to reduce each subcircuit in an order that depends on their computational cost, complexity, and size, until a reduced model compliant with the conformance criteria is obtained, as described in the sequel.



Fig. 7. HMOR scheme.

## C. Hierarchical MOR

The idea of HMOR of linear independent subcircuits has been proposed in [7]. We extend it here for the case of nonlinear circuit models based on the simple MOR method shown in Fig. 3. The advantage of a hierarchical reduction is to reduce the set of linearization points for large models and reduce the MOR computational effort when repeated circuit structures are present in the design. For example, if a circuit has N = 4 transmission lines and each of them needs a set of k = 7 linearization points. The total number of linearization points required for the case of the simple MOR method is  $k^N$  (7<sup>4</sup> = 2401) while this number is  $N \times k$  (7 × 4 = 28) for the case of a HMOR method. Using a large number of linearization points slows down the reduced model considerably, since during its evaluation the solution is always based on the closest points, which are searched in the set of linearization points. The number of required transient regions R is also reduced in the same way.

Fig. 7 provides the five main steps to perform an HMOR of large circuits having repeated subcircuits by iteratively employing the simple MOR method, as follows.

Step 1 [Subdivision of Model(n) Into N Models]: If the circuit netlist is not defined as independent subcircuits, then the differential Model(n) is subdivided into a set of N models (10) while performing the MNA formulation

$$\dot{x}_{Mi} = f_i(x_{Mi}, t, u, p) \tag{10}$$

where  $x_{Mi}$  is a subset of the state variables  $x_1, \ldots, x_n$ .

The dependence matrix, whose rows refer to the Model(*n*) equation indexes and columns refer to the state variable indexes, is generated as shown in Fig. 8. Each element  $d_{ij}$  of this matrix is set to the number i = 1, ..., N of the subcircuit based on the dependence of the equation *i* on each of the state variables  $x_i, j = 1, ..., n$ .

Step 2 (Evaluation of Reduction Options): The obtained N models are analyzed and the following quantities are computed.

|      | $x_1$          | $x_2$ | $x_3$ | $x_4$ |   |          |          |   | • • • | $x_{n-1}$ | $x_n$ |
|------|----------------|-------|-------|-------|---|----------|----------|---|-------|-----------|-------|
| eq1: | $1^{1}$        | 1     | 1     | 0     | 1 | 0        | 0        | 0 |       | 0         | 0     |
| eq2: | 1              | 1     | 1     | 1     | 0 | 0        | 0        | 0 |       | 0         | 0     |
| eq3: | 0              | 1     | 1     | 0     | 1 | 0        | 0        | 0 |       | 0         | 0     |
| eq4: | 1              | 1     | 0     | 1     | 1 | 0        | 0        | 0 |       | 0         | 0     |
|      | 1              | 1     | 1     | 0     | 1 | 0        | 0        | 0 |       | 0         | 0     |
|      | 0              | 0     | 2     | 0     | 0 | <b>2</b> | <b>2</b> | 2 |       | 0         | 0     |
|      | 0              | 0     | 0     | 0     | 0 | 0        | <b>2</b> | 0 |       | 0         | 0     |
|      | 0              | 0     | 0     | 0     | 0 | <b>2</b> | 0        | 2 |       | 0         | 0     |
| :    | :              |       |       | :     |   |          | :        |   | :     |           | :     |
| •    | 1 :            | _     | _     | ÷     | - | -        | ·        | - | •     |           | :     |
|      | 0              | 0     | 0     | 0     | 0 | 0        | 0        | 0 |       | N         | 0     |
| eqn: | $\backslash N$ | 0     | 0     | 0     | 0 | 0        | 0        | 0 |       | 0         | $_N/$ |

Fig. 8. Dependence matrix.

1) The complexity *C* quantified as the total number of nonlinear terms (nonlinear device) and the total number of linear terms (linear devices) divided by two

$$C = \operatorname{nbr}(\operatorname{nonlinear\_terms}) + \frac{\operatorname{nbr}(\operatorname{linear\_terms})}{2}.$$
 (11)

2) The percentage  $p_i$  of the total simulation time  $T_{\text{tot}}$  of all the *N* models

$$p_i = \frac{T_i}{T_{\text{tot}}} \times 100 \tag{12}$$

where  $T_i$  is the simulation time for the subcircuit model *i* in (10).

Based on this analysis, the subcircuit model i which has the highest complexity and the largest percentage  $p_i$  of the total simulation time is reduced using the simple MOR method.

Step 3 (Reduction of the Model i): The simple MOR method, described in Fig. 3, is applied to the subcircuit model *i* leading to a speedup  $S_i$  that is computed as the simulation time of the original model *i* over its reduced model simulation time.

Step 4 (Reconstruction of the Circuit Model): This is done using the reduced subcircuit model *i* which leads to a partially reduced model. The expected speedup  $S_f$  of the reconstructed model is given in (13) where  $S_i$  is the speedup of the reduced subcircuit model *i*. It gives an idea of when the MOR in a hierarchical way can lead to a good overall speedup. In fact, if a part is consuming only  $p_i = 30\%$  of the total simulation time, the speedup limit we can reach by reducing this model is  $S_{\text{lim}} \approx 1.4$ . However, if a part is worth  $p_i = 90\%$  of the total simulation time, the speedup limit that can be reached by reducing this model is  $S_{\text{lim}} \approx 10$ . This emphasizes that the HMOR method is more effective for large circuits with many repeated similar entities

$$S_f = \frac{1}{1 + \frac{p_i}{100} \left(\frac{1}{S_i} - 1\right)}.$$
 (13)

Step 5 (Reduction Effect): The reconstructed model speedup and accuracy are checked by simulation. Based on the result of this step, either the reconstructed model is accepted or additional reduction effort is performed on a new candidate from the remaining subcircuit models.

#### D. Checking Conformance Criteria

The objective of this step is to check that the reduced model fulfills some conformance criteria (accuracy and speedup) and can effectively be used instead of the original model for system level simulations. Because of its piecewise nature, the reduced models cannot be as accurate as the original nonlinear model for all inputs and initial condition ranges. However, the proposed method strengthens the accuracy of the generated models by construction and through the use of QS that defines the performance bounds of the model. For example, if the original model has a strong nonlinear behavior related to the input variation, the reduced model is compared with the original model for complete ranges of inputs. The system in (14) below provides a few examples of conformance criteria used to verify that the reduced model is behaviorally equivalent to the original model, where the error tolerances  $\varepsilon_1, \varepsilon_2, \varepsilon_3$ , and  $\varepsilon_4$ depend on the level of accuracy of the application that will make use of the reduced model. In practice, relative error values are used to measure the accuracy of two curves and are considered probably acceptable and meaningful in the range 3%-5%, good if <2%, and excellent if <1% [29]

$$\frac{\|\hat{x} - x\|_{2}}{\|x\|_{2}} \leq \varepsilon_{1}$$

$$\frac{\|\hat{y} - y\|_{2}}{\|y\|_{2}} \leq \varepsilon_{2}$$

$$\|\operatorname{Freq}(y) - \operatorname{Freq}(\hat{y})\| \leq \varepsilon_{3}$$

$$\|\operatorname{dc}(y) - \operatorname{dc}(\hat{y})\| \leq \varepsilon_{4}$$

$$S(\operatorname{Model}(q)) = \frac{T_{\operatorname{sim}}(\operatorname{Model}(n))}{T_{\operatorname{sim}}(\operatorname{Model}(q))} \geq S_{\min}.$$
(14)

In order to check that the input-output relationship of the original model is preserved, the relative error threshold of the output is required to be ( $\varepsilon_2 \leq 2\%$ ). The relative error threshold of the state variables, which are expected to have some deviations from the original behavior without affecting the output, is required to be ( $\varepsilon_1 \leq 3\%-5\%$ ) or better. The frequency of the original model and the frequency of reduced model outputs have to be the same ( $\varepsilon_3 \leq 2\%$ ). The dc characteristics of the original and reduced models are required to be accurate  $(\varepsilon_4 \leq 2\%)$  in addition to preserving the same properties (stability, hysteresis, gain, noise margin, and so on) [30]. The reduced model has to achieve a minimum speedup compared with the original model. In addition, the conformance criteria are evaluated for a range of inputs and initial conditions. If the conformance criteria are not satisfied, the MOR parameters are further refined. During this step, the conformance criteria which are not met are investigated and the MOR parameters are iteratively adjusted until the requirements are satisfied, which leads to an acceptance of the reduced model Model(q).

## E. Reduced Model Refinement

The reduced model refinement step consists of tuning the parameters of the MOR method, which affects the accuracy and speedup of the reduced model, based on the conformance criteria checking result. The simulation traces of the reduced model are analyzed and the regions, where the conformance criteria are not met, are determined and used to update the

Algorithm 1 Reduced Model Refinement Example

1: Input:  $S_{min}$ , S(Model(q)),  $\varepsilon_1$ ,  $\varepsilon_2$ ,  $K = \{k_1, k_2, ..., k_R\}$ ,  $k_{max}, R_{max}, V, x, z, y, \hat{y}, q, q_{min}$ 2: **Output:**  $K_{up} = \{k_1, k_2, ..., k_{R_{up}}\}, R_{up}, V_{up}, q_{up}$ 3: if Error(x) or Error(y) then 4:  $i \leftarrow \operatorname{Find}(K, x, \hat{x}, y, \hat{y}, \varepsilon_1, \varepsilon_2)$ 5: if  $k_i \leq k_{max}$  then  $k_i \leftarrow \text{Increase}(k_i)$ 6: 7: else if  $R < R_{max}$  then  $R_{up} \leftarrow \text{Increase}(R, k_i)$ 8: 9: else Return("Accuracy cannot be improved!") 10: 11: end if 12:  $K_{up} \leftarrow \text{Verify}(\text{Equation (6)})$ 13: else if  $S(Model(q)) \leq S_{min}$  then if  $\frac{S(Model(q))}{S} > 0.75$  and  $R < R_{max}$  then 14:  $\frac{S_{min}}{R_{up}} \leftarrow \text{Increase}(R, K)$ 15: 16:  $K_{up} \leftarrow \text{Verify}(\text{Equation (6)})$ 17: else if  $q > q_{min}$  then  $q_{up} \leftarrow \text{Decrease}(q)$ 18: 19: else 20: Return("Speedup cannot be improved!") 21: end if 22: end if 23: for  $i = 1 ... R_{up}$  do  $V_i \leftarrow SVD(\cup_{j=1}^{k_i} V_j)$ 24: 25: end for 26:  $V_{up} \leftarrow \{V_1, V_2, \dots, V_{R_{up}}\}$ 

parameters, which can lead to a better speedup and accuracy tradeoff.

Algorithm 1 is a simplified form of the reduced model refinement process for the simple MOR method. It requires as inputs, in line 1, the conformance checking criteria, the simulation traces of Model(n) and Model(q) and the current MOR parameters and their maximal and minimal values. It outputs, in line 2, the number of regions R, the number of clusters  $k_i$  in each region, the unified projection basis for each region, and the reduction order q which is also the size of the reduced model. The accuracy criteria of the reduced model is addressed before the speedup criteria in a recursive way. In lines 3-9, the number of linearization points  $k_i$  is increased in order to fix the behavior of the reduced model in the region  $R_i$  (determined in line 4), which does not meet the accuracy conformance criteria. If the number of the linearization points  $k_i$  reach the limit  $k_{max}$ , the set of regions is increased by splitting the region  $R_i$  into two new regions. After updating the number of clusters or the number of regions, the piecewise linearized model accuracy is verified in line 12 using (6). In lines 13-16, the number of regions R is increased first if it is less than the limit  $R_{\text{max}}$  and the speedup value is within the 75%-100% of the target speedup  $S_{\min}$  and (6) is verified. Otherwise, the size of the reduced model q is decreased to improve its speedup S(Model(q)) in lines 17 and 18. In all refinement cases, the unified reduction basis  $V_i$  in all regions are updated in lines 23-25, since the piecewise linear approximations are changed according to the refinement process. The reduced model refinement process is performed independently for each of the failing regions. Because of that, it is usually performed



Fig. 9. Transmission line circuit with nonlinear diodes.

at the cost of several simulations of the reduced model for the failing regions until the required accuracy constraint is met. In lines 10 and 20, the algorithm outputs that no further refinements can be done to check the current conformance criteria. In this case, the HMOR might become a good alternative to the simple MOR method.

#### V. APPLICATIONS

In this section, the simple MOR method is applied to a nonlinear transmission line, a ring oscillator, a VCO, a PLL, and an analog comparator. In addition, we perform a PLL model reduction using the HMOR presented in Section IV-C. All simulations were performed in the MATLAB environment [11], on a *Windows* 7 operating system with an Intel core i7 CPU, 2.8 GHz with 24 GB of RAM. In the following applications, we refer to the size of the original model and the reduced model as Model(n) and Model(q), respectively, where n is the size of the original model, q is the size of the reduced model, and n > q.

#### A. Transmission Line

Fig. 9 shows the transmission line model that is a chain of connected resistor, capacitor, and nonlinear diode cells. The input current source is i(t) and all capacitors and resistors values are set to 1 *F* and 1  $\Omega$ , respectively. The behavior of the diodes is nonlinear and is given by  $I_d(v) = \exp(40v) + v - 1$ .

The full order model of the transmission line is given in (15), where  $x_1, \ldots, x_n$  are the circuit node voltages

$$\dot{x_1} = -I_d(x_1) - I_d(x_1 - x_2) + b \ i(t)$$
  
$$\dot{x_i} = I_d(x_{i-1} - x_i) - I_d(x_i - x_{i+1})$$
  
$$\dot{x_n} = I_d(x_{n-1} - x_n)$$
  
$$y = x_1.$$
 (15)

In this application, we use three main transient regions and k = 20 linearization points. We also consider three cases of inputs and problem sizes as follows.

Case I: 
$$i(t) = H(t-3)$$
,  $n = 1500$ ,  $q = 30$ .  
Case II:  $i(t) = \exp(-t)$ ,  $n = 1500$ ,  $q = 30$ .  
Case III:  $i(t) = \sin(2\pi t/10)$ ,  $n = 100$ ,  $q = 10$ .

Table I compares the simulation times for the transmission line of the TPWL [3] and PWP [4] methods as well as the one proposed in this paper. Although the shown results

 TABLE I

 Simulation Times and Speedup for the Nonlinear Transmission

 Line Circuit Using k = 20 Linearization Points

| Case | TPWL [3]                             | PWP [4]                          | Our Method                         |
|------|--------------------------------------|----------------------------------|------------------------------------|
| Ι    | $\frac{9573.30}{80.80} \simeq 118$   | $\frac{873.38}{17.16} \simeq 51$ | $\frac{810.39}{0.64} \simeq 1248$  |
| П    | $\frac{11713.10}{110.90} \simeq 105$ | $\frac{986.32}{43.53} \simeq 23$ | $\frac{1061.32}{0.82} \simeq 1284$ |
| Ш    | $\frac{25.40}{2.70} \simeq 9$        | $\frac{4.6}{2.4} \simeq 2$       | $\frac{1.84}{0.31} \simeq 6$       |



Fig. 10. Transmission line transient behavior.

TABLE IIAccuracy for the Transmission Line Circuit Using k = 20Linearization Points

| Case | TPWL [3]         | PWP [4]           | Our Method       |
|------|------------------|-------------------|------------------|
| Ι    | $0.3 \ 10^{-2}$  | $11.69 \ 10^{-2}$ | $0.12 \ 10^{-2}$ |
| Π    | $0.48 \ 10^{-2}$ | $4.27 \ 10^{-2}$  | $0.18 \ 10^{-2}$ |
| III  | $0.56 \ 10^{-2}$ | $0.56 \ 10^{-2}$  | $0.37 \ 10^{-2}$ |

were conducted using different processors, the speedup criteria can measure the improvement of the proposed MOR method. In fact, the generated reduced models are more than 10 times faster than the TPWL and PWP methods and 1000 times faster than the full nonlinear model for the first two large problem sizes.

Fig. 10 shows that the transient behavior of the full order model and the reduced order model problems of Table I is the same, for the above three simulation cases.

Table II shows that the proposed method reduced models mimic the behavior of their original models for the three considered problem cases. The accuracy criteria are satisfied for the different experiments and the relative errors of the state variables and the output are always  $<10^{-2}$ , that is the maximum acceptable error during the conformance criteria checking step. Unexpectedly, the PWP method, which employs local second-order Taylor polynomial approximations, is less accurate than the TPWL and our method, which employs only local linear approximations, for the first two problem cases. This can be explained by the fact that the aggregation of multiple projection basis for the first-order terms and second-order terms in the PWP method can sometimes lead to large errors. An additional drawback of the PWP is a large



Fig. 11. Ring oscillator circuit.

memory requirement to store the second-order matrices. For example, the size of the projection matrices of the second-order terms for the cases I and II is  $size(V \otimes V) = (1500 \times 30)^2$  which requires a minimum memory of  $(1500 \times 30)^2 \times 8$  bytes = 16.2 GB assuming a double precision number of 8 bytes. Therefore, this method is impractical and very computationally expensive for large models.

## B. Ring Oscillator

Fig. 11 represents a ring oscillator composed of a large odd number *n* of inverters (cascaded nMOS and pMOS transistors) connected in a circular chain. The node voltages  $x_i$  of each of the *n* inverters oscillates between the ground gnd = 0 V and the power  $V_{dd} = 1.8$  V.

The circuit model is given in (16), where  $x_i$ , i = 1, ..., n, are the node voltages, C = 0.164 fF and the functions  $I_n$  and  $I_p$  model the nonlinear current generated by the nMOS and pMOS transistors, respectively, based on their gate, drain, and source voltages. The initial conditions x(0) are represented by the fuzzy number  $\mu_{x(0)}$ 

$$\dot{x_1} = -\frac{1}{C} (I_n(x_n, x_1, \text{gnd}) + I_p(x_n, x_1, v_{dd}))$$
  

$$\dot{x_i} = -\frac{1}{C} (I_n(x_{i-1}, x_i, \text{gnd}) + I_p(x_{i-1}, x_i, v_{dd}))$$
  

$$y = x_n$$
  

$$x(0) = \mu_{x(0)}.$$
(16)

When using one transient region for the ring oscillator model, it is hard to reproduce the oscillation behavior of the full order Model(131) with the required speedup and accuracy constraints,  $(S_{\min} = 100)$  and  $((\|\hat{x} - x\|_2/\|x\|_2) \le 10^{-2}$ and  $(\|\hat{y} - y\|_2/\|y\|_2) \le 10^{-2})$ , respectively. This could be explained by the highly nonlinear initial startup transient region that needs to be accurately approximated by the reduced model. This highlights two key points of our method, namely, the need for a further subdivision of the transient behavior into smaller regions, and the QS that provides better coverage of the initial region.

Our method used ten transient regions to provide a reduced model that accurately mimics the full order model behavior as shown in Fig. 12, where the state vector of the original full order model x is represented by the solid line and the backward projection of the reduced order model  $\hat{x} = Vz$  is represented by the dotted line.

Table III presents the refinement of the MOR for the ring oscillator model. The speedup and accuracy requirements were satisfied after refinement of the number of clusters in each



Fig. 12. Ring oscillator internal state transient responses.

TABLE III REFINEMENT OF Model (51) FOR THE RING OSCILLATOR MODEL

| Number of<br>Clusters k | Speedup $S_{Ring}$               | $\frac{\ \hat{x}\!-\!x\ _2}{\ x\ _2}$ | $\frac{\ \hat{y}\!-\!y\ _2}{\ y\ _2}$ | Status   |
|-------------------------|----------------------------------|---------------------------------------|---------------------------------------|----------|
| 10                      | $\frac{142.33}{0.68} \simeq 207$ | $0.99 \ 10^{-2}$                      | $2.33 \ 10^{-2}$                      | rejected |
| 14                      | $\frac{142.33}{0.71} \simeq 199$ | $0.55 \ 10^{-2}$                      | $1.30 \ 10^{-2}$                      | rejected |
| 18                      | $\frac{142.33}{0.96} \simeq 147$ | $0.43 \ 10^{-2}$                      | $1.76 \ 10^{-2}$                      | rejected |
| 19                      | $\frac{142.33}{1.02} \simeq 139$ | $0.40 \ 10^{-2}$                      | $0.63 \ 10^{-2}$                      | accepted |



Fig. 13. VCO circuit.



Fig. 14. QS of the VCO output.

region with  $S_{\text{Ring}} = 139$  and  $(\|\hat{x} - x\|_2 / \|x\|_2) = 0.4 \ 10^{-2}$ and  $(\|\hat{y} - y\|_2 / \|y\|_2) = 0.63 \ 10^{-2}$ .

# C. Voltage Controlled Oscillator

In this section, we apply the MOR method to the current-starved VCO detailed in [31]. Fig. 13 shows the schematic of the VCO that consists of two main components: 1) an inverter chain composed of the pMOS and nMOS transistors at the center and 2) a current mirror structure (upper pMOS and lower nMOS transistors) that limits the current mirrored in each of the inverters.

The QS of the VCO output voltage using a fuzzy input  $\mu V_{invco}$  shows that the output voltage  $V_{out}$  is very sensitive to



Fig. 15. Comparison of the VCO output frequency.



Fig. 16. Accuracy of the reduced VCO model.

TABLE IV Comparison of the Effect of QS, Clustering, Regions Subdivision, and Weighting Local Models

| Without                                       | QS    | Clustering | Regions | Weights |
|-----------------------------------------------|-------|------------|---------|---------|
| $\frac{\ \hat{x}-x\ _2}{\ x\ _2} \ [10^{-2}]$ | 42.1  | 12.10      | 3.37    | 2.38    |
| $\frac{\ \hat{y}-y\ _2}{\ y\ _2} \ [10^{-2}]$ | 19.08 | 9.74       | 1.89    | 1.71    |
| $S_{VCO}$                                     | 24    | 5          | 11      | 27      |

the input voltage  $V_{invco}$ , as displayed in Fig. 14. In fact, the input voltage  $V_{invco}$  variations result in an oscillation frequency variation which explains why the output voltage bounds are so wide and do not reflect an oscillatory behavior. Using the full order VCO simulation, we determined 50 clusters within four main regions to build the reduced model. The obtained reduced VCO model has 35 state variables and is 23 times faster than the original VCO model [ $S_{VCO} = (8708.73 \text{ s}/378.61 \text{ s}) = 23$ ].

Fig. 15 compares the frequencies of the VCO full order *Model*(48) and the reduced *Model*(35) output signals, obtained using fast Fourier transform [11]. It shows that both models are oscillating at the exact frequencies for the specified input voltage range.

Fig. 16 (a) and (b) provide the relative errors between the state variables and the output of the original VCO *Model*(48) and the backward projection of the reduced VCO *Model*(35), respectively. It proves that the accuracy conformance criteria are verified by having  $(||\hat{x} - x||_2/||x||_2) \le 0.02$  and  $(||\hat{y} - y||_2/||y||_2) \le 0.02$ .

Table IV compares the effect of QS, clustering, region subdivisions, and the use of the weighted closest local models.



Fig. 17. PLL block description.

In each column, one component of the presented method is omitted and the obtained reduced VCO model is evaluated. When the QS is not used and is replaced with training the original VCO model in column 2, a slight perturbation of the input leads to large relative error values. This is due to the fact that important trajectories, which correspond to the input variation, were not considered and because of that the linearized model was not accurate. Using a linearization points selection procedure based on traversing the training trajectories and comparing the linear approximation with the original VCO trajectory instead of clustering in column 3 leads to an inaccurate approximation and a smaller speedup value. This is due to the larger number of linearization points and a resulting lossy projection matrix. The effect of omitting the region subdivision also leads to higher error and smaller speedup values. This is expected since a single projection matrix is used instead of four different projection matrices for each of the separate four behavioral regions. The last column 5 shows that the obtained reduced VCO model is fast but is not as accurate as the refined VCO model where all these techniques are employed as shown in Fig. 16.

## D. Phase-Locked Loop

We consider a CMOS-180-nm implementation of a PLL, as described in [31]. This circuit is frequently used in the front-end of modern ICs. It is a nonlinear frequency-control system that generates a clock signal that locks after a delay time (the locking time) given an input data signal. The PLL is said to be locked when the input and the feedback clock frequencies match. However, the PLL might not lock for various reasons, such as an input frequency out of the specified range, jitter, noise, reset situations, or when any of the PLL subcircuits is not behaving properly [32].

Fig. 17 shows a simplified block diagram of the major subcircuits of the PLL of this application. This PLL is modeled at the transistor level with a set of 62 differential equations and has the following main blocks.

- 1) A phase frequency detector (PFD), that detects the difference in phase and frequency between the input clock u and feedback clock  $y_d$ , and asserts an up or down control signal based on whether the feedback signal  $y_d$  frequency is lagging or leading the input u frequency.
- 2) A charge pump (CP) and a low-pass filter (LPF), that receives the up and down signals from the PFD and drives a current to the LPF if the up signal is high and draws a current from the LPF if the down signal is high.
- 3) A VCO, which is biased with the control voltage  $V_{invco}$  generated by the LPF subcircuit. It oscillates at a higher frequency if the  $V_{invco}$  signal increases and oscillates



Fig. 18. PLL lock-up comparison.



Fig. 19. PLL internal voltages comparison.

at a lower frequency, otherwise. Consequently, the oscillation frequency of the VCO affects directly the phase and frequency of the feedback clock signal *y*. It stabilizes at a fixed frequency when the output of the LPF settles to a dc voltage.

- 4) An inverter, which is a buffer inverting the output of the VCO.
- 5) A feedback divide-by-two subcircuit that increases the frequency of the VCO output signal generating a signal  $y_d$  that is input to the PFD.

The full order PLL Model(62) was reduced to Model(7) using 21 regions and 10 clusters in each of these regions. The first 18 regions were required to approximate the startup behavior and after the locking time, only three regions are required to reproduce the behavior of the original PLL.

Fig. 18 shows the verification of the locking property for two PLLs: 1) the original PLL (at the bottom) and 2) the reduced PLL (at the top). The VCO input signal reaches a stable value which makes both PLLs lock almost at the same time  $T_{\text{lock}} = 0.3 \ \mu \text{s}.$ 

Fig. 19 shows three internal voltages ( $x_{46}$ ,  $x_{47}$ , and  $x_{48}$  in the VCO circuit) of the full order PLL Model(62) on the left side and the reduced PLL Model(7) on the right side. The slight deviation of the signals is expected because the MOR process is lossy. However, this deviation does not affect the VCO output that still oscillates at the expected frequency.

Table V compares the PLL internal voltages with their respective approximations generated with the PLL reduced

| TABLE V                                                 |
|---------------------------------------------------------|
| REDUCED PLL PERFORMANCE FOR DIFFERENT INPUT FREQUENCIES |

| Frequency[Ghz] | $\frac{\ \hat{x}\!-\!x\ _2}{\ x\ _2}$ | $\frac{\ \hat{y} - y\ _2}{\ y\ _2}$ | $\begin{array}{c} \text{Speedup} \\ S_{PLL} \end{array}$ |
|----------------|---------------------------------------|-------------------------------------|----------------------------------------------------------|
| 1              | $1.87 \ 10^{-2}$                      | $0.38 \ 10^{-2}$                    | 41                                                       |
| 0.8            | $0.56 \ 10^{-2}$                      | $0.46 \ 10^{-2}$                    | 43                                                       |
| 0.6            | $1.03 \ 10^{-2}$                      | $0.23 \ 10^{-2}$                    | 43                                                       |

TABLE VI PLL Subcircuits Analysis Report

| Subcircuits      | Percentage of Simulation Time $p_i$ | No of<br>Equations | Complexity<br>C |
|------------------|-------------------------------------|--------------------|-----------------|
| PFD              | 1.92                                | 30                 | 60              |
| CP and LPF       | 0.71                                | 9                  | 14.5            |
| VCO and Inverter | 96.85                               | 768                | 1027.5          |
| Divide-by-Two    | 0.65                                | 6                  | 9               |

TABLE VII Comparison of PLL Simulation Results

| PLL<br>Reduction | Size<br>q | $\frac{\ \hat{x} - x\ _2}{\ x\ _2}$ | $\frac{\ \hat{y}\!-\!y\ _2}{\ y\ _2}$ | Speedup $S_{PLL}$ |
|------------------|-----------|-------------------------------------|---------------------------------------|-------------------|
| HMOR             | 60 + 45   | $0.35 \ 10^{-2}$                    | $4.49 \ 10^{-3}$                      | 31.36             |
| MOR              | 105       | $1.03 \ 10^{-2}$                    | $1.25 \ 10^{-2}$                      | 27.10             |
| HMOR             | 30 + 45   | $0.81 \ 10^{-2}$                    | $2.49 \ 10^{-3}$                      | 31.56             |
| MOR              | 75        | $3.53 \ 10^{-2}$                    | $2.89 \ 10^{-2}$                      | 30.93             |

model for different input clock frequencies. Basically, it shows that the reduction is quite accurate. In all cases, the reduced PLL of size Model(7) is >40 times faster than the full order model Model(62).

#### E. PLL Reduction Using Hierarchical MOR

Following the procedure described in Section IV-C, we now perform a subdivision of a PLL model, having 813 state variables, into four subcircuits and analyzed their performance, as shown in Table VI. The VCO (255 stages) and the inverter, which are modeled with 768 nonlinear equations, are the most computationally expensive parts that use 96.85% of the total PLL simulation time. These statistics make the VCO and the inverter subcircuit the ultimate candidate for the HMOR reduction type of the PLL. Based on the formula in (13), we can target at least a PLL simulation speedup of  $S_{PLL} = (1/1 + (96.85/100)((1/1000) - 1)) \approx 30$ , if the VCO and the inverter reduced subcircuit has a speedup  $S_{VCO+Inverter} \ge 1000$ .

Table VII compares the accuracy and speedup results when reducing the PLL as a flat circuit using the simple MOR method with reducing the VCO and the inverter subcircuits using the HMOR method. The original PLL Model(813) is reduced to Model(105) and Model(75) using the same number of linearization points and the same regions subdivision, in the first two and the last two rows, respectively. The reduced PLL models obtained via the HMOR method are more accurate and slightly faster than the ones via the simple MOR method. The better accuracy results when using the HMOR method are expected because the unreduced PLL subcircuits (PFD, CP, LPF, and divide-by-two) are



Fig. 20. Analog comparator circuit.

modeled with their original nonlinear equations while only the VCO and the inverter, which have the same uniform structure, are reduced. The speedup of the reduced subcircuit is  $S_{\text{VCO+Inverter}} \ge 1300$  and the obtained PLL speedup is as expected in both HMOR cases. The slightly smaller speedup for the simple MOR method is explained by the fact that replacing a small number of nonlinear equations for the subcircuits (PFD, CP, LPF, and divide-by-two) with a piecewise linear representation is not expected to lead to higher speedup values. In fact, the speedup is considerable only when the reduction ratio is important because the matrices involved in a reduced model are dense and the number of multiplications and summations becomes comparable with the nonlinear model case for small reduction ratios.

## F. Analog Comparator

ι

We consider a CMOS-180-nm analog comparator circuit, as described in [31]. It is a decision-making circuit composed of three main stages, as shown in Fig. 20; a pre-amplification circuit (a differential amplifier with active loads), a decision circuit (a positive feedback), and a post-amplification circuit (a self-biasing differential amplifier used as a buffer). If the positive input voltage  $v_p$  is greater than the negative input voltage  $v_m$ , the comparator output is set to the maximum voltage  $V_{out} = 1.8$  V. Otherwise, the comparator output is set to the minimum voltage  $V_{out} = 0$  V. In practice, the propagation delay, the sensitivity, and the noise rejection of the comparator are of a great concern.

Using a SPICE netlist of the comparator, we elaborated a differential model Model(16) having 16 state variables using a set of nominal parameters  $p_0$  representing the width and length of each of the 19 comparator transistors. Model(16) has been reduced using ten behavioral regions to a weighted piecewise linear model Model(5) having only five reduced state variables, as given in

$$\dot{z} = \hat{A}_{z_j}(z - z_j) + \hat{A}_{v_{pj}}(v_p - v_{pj}) + \hat{A}_{v_{mj}}(v_m - v_{mj}) + \hat{f}(x_j, t, v_{pj}, v_{mj}, p_0) \hat{x} = V_i^t z pout = \hat{x}(16).$$
(17)



Fig. 21. Comparator output for sine input  $v_m$ .



Fig. 22. Comparator output for DC sweep input.



Fig. 23. Comparator output for pulse input.

Fig. 21 compares the behavior of the comparator-reduced model Model(5) to the full order model, while the input is such that the input  $v_p$  is held to 1 V and the input  $v_m = 0.8 - 0.7 \cos(2\pi \times 10^8)$ . The output voltage  $V_{\text{out}}$  is accurately set to its maximum value when  $v_p \ge v_m$ .

Fig. 22 shows the case when the input  $v_m$  is held to 0.9 V and the input  $v_p$  is swept up and down between 0 and 1.8 V. This experiment shows that the reduced model has the same dc characteristic, the same offset voltage, and the same hysteresis behavior that prevents its output instability.

Fig. 23 shows the case when the input  $v_m$  is held to 1.2 V and the input  $v_p$  is a pulse of a 10-ns duration and reaching a

TABLE VIII Performance of the Comparator Model(5) for Different Inputs

| Input Case | $\frac{\ \hat{x}\!-\!x\ _2}{\ x\ _2}$ | $\tfrac{\ \hat{y} - y\ _2}{\ y\ _2}$ | Speedup $S_{comp}$          |
|------------|---------------------------------------|--------------------------------------|-----------------------------|
| Figure 21  | 0.044                                 | 0.107                                | $\frac{132.44}{12.06} = 11$ |
| Figure 22  | 0.045                                 | 0.063                                | $\frac{194.44}{21.57} = 9$  |
| Figure 23  | 0.014                                 | 0.107                                | $\frac{171.34}{13.46} = 13$ |

maximum voltage of 1.25 V. This experiment proves that the comparator reduced model has the same sensitivity of 50 mV.

Finally, Table VIII provides accuracy and speedup results for the reduced comparator Model(5) for different inputs. The simulation results of rows 1, 2, and 3 are shown, respectively, in Figs. 21–23.

### VI. CONCLUSION

This paper presented a methodology for MOR of nonlinear analog circuits. Different techniques, such as QS, k-means clustering, linearization, and Krylov projections were used to build reduced models, which are more robust to small signal and parameters variations. The selection of an initial guess of the number of linearization points was addressed as an optimization problem minimizing the behavior error between the full order model and its full order linearized form. The use of the input as a decision variable in an extended state-space enhanced the method and improved its accuracy. An HMOR method that subdivides a circuit model into different blocks was also added as an option to meet specific accuracy and reduction requirements and to optimize the reduction effort. The experimental results on several analog circuits show that the presented method is accurate and effective. The comparison of our results with two closely related methods [3], [4] shows better speedup and accuracy results for the transmission line model. However, our experiments with these methods for the remaining applications show that they fail to preserve their behaviors (relative errors >10%). In future work, the reduced model refinement process can be extended to generate the specifications and limitations of the reduced models automatically. In addition, a statistical or a learning-based method, which selects a reduced model from a set different reduced models given a target performance requirements, can be implemented. Furthermore, the implementation of a technique that handles the interface between adjacent behavioral regions can enhance the accuracy of the reduced models. Finally, the developed framework can be used to enhance statistical simulation and circuit synthesis.

#### REFERENCES

- A. C. Antoulas, Approximation of Large-Scale Dynamical Systems (Advances in Design and Control). Philadelphia, PA, USA: SIAM, 2005.
- [2] S. Tab and L. He, Advanced Model Order Reduction Techniques in VLSI Design. New York, NY, USA: Cambridge Univ. Press, 2007.
- [3] M. Rewienski and J. White, "A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 22, no. 2, pp. 155–170, Nov. 2006.

- [4] N. Dong and J. Roychowdhury, "General-purpose nonlinear model-order reduction using piecewise-polynomial representations," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 27, no. 2, pp. 249–264, Feb. 2008.
- [5] H. Aridhi, M. H. Zaki, and S. Tahar, "Towards improving simulation of analog circuits using model order reduction," in *Proc. ACM/IEEE Design, Autom., Test Eur. Conf. Exhibit.*, Mar. 2012, pp. 1337–1342.
- [6] M. Hinze, M. Kunkel, and M. Vierling, "POD model order reduction of drift-diffusion equations in electrical networks," in *Model Reduction for Circuit Simulation* (Lecture Notes in Electrical Engineering), vol. 74. Amsterdam, The Netherlands: Springer-Verlag, 2011, pp. 177–192.
- [7] M. Honkala, P. Miettinen, J. Roos, and C. Neff, "Hierarchical modelorder reduction flow," in *Scientific Computing in Electrical Engineering* (Mathematics in Industry). Berlin, Germany: Springer-Verlag, 2010, pp. 539–546.
- [8] G. Bontempi, "Simulating continuous dynamical systems under conditions of uncertainty: The probability and the possibility approaches," in *Fuzzy Partial Differential Equations and Relational Equations: Reservoir Characterization and Modeling* (Studies in Fuzziness and Soft Computing), vol. 142. Berlin, Germany: Springer-Verlag, 2004, pp. 130–152.
- [9] C. Jacoboni and P. Lugli, The Monte Carlo Method for Semiconductor Device Simulation. New York, NY, USA: Springer-Verlag, 1989.
- [10] R. W. Freund, "Krylov-subspace methods for reduced-order modeling in circuit simulation," *Comput. Appl. Math.*, vol. 123, nos. 1–2, pp. 395–421, 2000.
- [11] (Mar. 2015). MATLAB. [Online]. Available: http://www.mathworks.com/
- [12] Y. M. Lee and C. C.-P. Chen, "Hierarchical model order reduction for signal-integrity interconnect synthesis," in *Proc. 11th ACM Great Lakes Symp. VLSI*, 2001, pp. 109–114.
- [13] Y. Cao, Y.-M. Lee, T.-H. Chen, and C. C.-P. Chen, "HiPRIME: Hierarchical and passivity reserved interconnect macromodeling engine for RLKC power delivery," in *Proc. 39th ACM/IEEE Design Autom. Conf.*, Jun. 2002, pp. 379–384.
- [14] L. Sirovich, "Turbulence and the dynamics of coherent structures. I—Coherent structures. II—Symmetries and transformations. III—Dynamics and scaling," *Quart. Appl. Math.*, vol. 45, no. 3, pp. 561–571, 1987.
- [15] J. R. Phillips, "Projection frameworks for model reduction of weakly nonlinear systems," in *Proc. ACM/IEEE Design Autom. Conf.*, Jun. 2000, pp. 184–189.
- [16] W. J. Rugh, Nonlinear System Theory: The Volterra/Wiener Approach. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 1981.
- [17] L. Feng, "Review of model order reduction methods for numerical simulation of nonlinear circuits," *Appl. Math. Comput.*, vol. 167, no. 1, pp. 576–591, 2005.
- [18] Y. Chen, "Model order reduction for nonlinear systems," M.S. thesis, Dept. Math., Massachusetts Inst. Technol., Cambridge, MA, USA, 1999.
- [19] C. Gu, "QLMOR: A projection-based nonlinear model order reduction approach using quadratic-linear representation of nonlinear systems," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 30, no. 9, pp. 1307–1320, Sep. 2011.
- [20] S. A. Nahvi, M. Nabi, and S. Janardhanan, "Nonlinearity-aware submodel combination in trajectory based methods for nonlinear MOR," *Math. Comput. Simul.*, vol. 94, pp. 127–144, Aug. 2013.
- [21] S. A. Nahvi, M. Nabi, and S. Janardhanan, "Adaptive sampling of nonlinear system trajectory for model order reduction," in *Proc. Int. Modeling, Identificat., Control Conf.*, 2012, pp. 1249–1255.
- [22] M. U. Farooq, L. Xia, F. A. B. Hussin, and A. S. Malik, "Automated model generation of analog circuits through modified trajectory piecewise linear approach with Chebyshev Newton interpolating polynomials," in *Proc. 4th Int. Conf. Intell. Syst. Modeling Simulation*, 2013, pp. 605–609.
- [23] C. Gu and J. Roychowdhury, "Model reduction via projection onto nonlinear manifolds, with applications to analog circuits and biochemical systems," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design*, Nov. 2008, pp. 85–92.
- [24] D. De Jonghe and G. Gielen, "Characterization of analog circuits using transfer function trajectories," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 8, pp. 1796–1804, Aug. 2012.
- [25] P. Winkler, H. Aridhi, M. H. Zaki, and S. Tahar, "Generation of reduced analog circuit models using transient simulation traces," in *Proc. 24th* ACM Great Lakes Symp. VLSI, 2014, pp. 305–310.

- [26] B. N. Bond *et al.*, "Compact modeling of nonlinear analog circuits using system identification via semidefinite programming and incremental stability certification," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 29, no. 8, pp. 1149–1162, Aug. 2010.
- [27] Y. Zhang, H. Liu, Q. Wang, N. Fong, and N. Wong, "Fast nonlinear model order reduction via associated transforms of high-order Volterra transfer functions," in *Proc. 49th ACM/EDAC/IEEE Design Autom. Conf.*, Jun. 2012, pp. 289–294.
- [28] O. Lahiouel, H. Aridhi, M. H. Zaki, and S. Tahar. (Oct. 2011). "A tool for modeling and analysis of electronic circuits and systems," Dept. Elect. Comput. Eng., Concordia Univ., Montreal, QC, Canada, Tech. Rep. [Online]. Available: http://hvg.ece.concordia.ca/Publications/TECH\_REP/TMAAC\_TR12/ TMAAC\_TR12.pdf
- [29] P. R. Bevington and D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences. New York, NY, USA: McGraw-Hill, 1969.
- [30] O. Lahiouel, H. Aridhi, M. H. Zaki, and S. Tahar, "Enabling the DC solutions characterization using a fuzzy approach," in *Proc. IEEE Int. Conf. New Circuits Syst.*, Jun. 2014, pp. 161–164.
- [31] R. J. Baker, CMOS: Circuit Design, Layout, and Simulation. New York, NY, USA: Wiley, 2010.
- [32] I. Seghaier, H. Aridhi, M. H. Zaki, and S. Tahar, "A qualitative simulation approach for verifying PLL locking property," in *Proc. 24th* ACM Great Lakes Symp. VLSI, 2014, pp. 317–322.



Henda Aridhi (M'15) received the Engineering Diploma and master's degrees in electrical engineering from the Tunisia Polytechnic School, La Marsa, Tunisia, in 2005 and 2008, respectively, and the Ph.D. degree in electrical engineering from Concordia University, Montréal, QC, Canada, in 2015. She continued her graduate studies in the field of analog and mixed signal systems under the supervision of Prof. S. Tahar with the Hardware Verification Group, Concordia University. She was with CISCO Systems, Inc., San Jose, CA,

USA, from 2005 to 2008, as a SW/HW Verification Engineer. Her current research interests include design and verification of digital and analog systems, model order reduction for nonlinear systems, and formal hardware verification.



**Mohamed H. Zaki** (M'05) received the bachelor's degree in electrical engineering from Ain Shams University, Cairo, Egypt, in 2000, and the master's and Ph.D. degrees from Concordia University, Montréal, QC, Canada, in 2002 and 2008, respectively.

He continued his graduate studies with the Hardware Verification Group, Concordia University, under the supervision of Prof. S. Tahar. He then conducted a post-doctoral training with the Department of Computer Science, University of

British Columbia, Vancouver, BC, Canada. He has been a Consultant at the start-up company Innovia Technologies, Cambridge, U.K., since 2010. He is currently a Visiting Research Associate of AMS Design Verification with Concordia University.



**Sofiène Tahar** (M'96–SM'07) received the Diploma degree in computer engineering from the University of Darmstadt, Darmstadt, Germany, in 1990, and the Ph.D. (Hons.) degree in computer science from the University of Karlsruhe, Karlsruhe, Germany, in 1994.

He is currently a Professor and Research Chair in formal verification of Systems-on-Chip with the Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, Canada. He has made contributions and

published papers in the areas of formal hardware verification, system-on-chip verification, AMS circuit verification, and probabilistic, statistical, and reliability analysis of systems. He is also the Founder and Director of the Hardware Verification Group with Concordia University.

Dr. Tahar is a Senior Member of the Association for Computing Machinery and a Professional Engineer in the Province of Quebec. He was named University Research Fellow upon receiving Concordia University's Senior Research Award in 2007.