Welcome back - STA concept 3
Standard cell Library
This chapter describes the timing information present in the standard cell library. A library cell could be a NOR gate, an IP block, or any other cell. In addition to timing information, the library cell description contains several attributes, such as cell area and functionality, which are unrelated to timing but are relevant during the RTL synthesis process. In this chapter, we focus only on the attributes relevant to the timing and power calculations.
The initial sections in this chapter describe the linear and the non-linear timing models, followed by advanced timing models for nanometer technologies.
1. Pin Capacitance:
Every input and output of a cell can specify capacitance at the pin. In most cases, the capacitance is specified only for the cell inputs and not for the outputs, that is, the output pin capacitance in most cell libraries is 0.
2. Timing Models:
The cell timing models are intended to provide accurate timing for various instances of the cell in the design environment. The timing models are normally obtained from detailed circuit simulations of the cell to model the actual scenario of the cell operation. The timing models are specified for each timing arc of the cell. Taking an inverter cell as an example, the delay for the timing arc through the cell is dependent on two factors: i. the output load, that is, the capacitance load at the output pin of the inverter, and ii. the transition time of the signal at the input. The delay values have a direct correlation with the load capacitance: the larger the load capacitance, the larger the delay. In most cases, the delay increases with increasing input transition time. There are a few scenarios where the input threshold (used for measuring delay) is significantly different from the internal switching point of the cell. In such cases, the delay through the cell may show non-monotonic behavior with respect to the input transition time: a larger input transition time may produce a smaller delay, especially if the output is lightly loaded.
The slew at the output of a cell depends mainly upon the output capacitance: the output transition time increases with output load. Thus, a large slew at the input (a large transition time) can improve at the output, depending upon the cell type and its output load. The figure shows cases where the transition time at the output of a cell can improve or deteriorate depending on the load at the output of the cell.
3.1 Linear Timing Model:
A simple timing model is a linear delay model, where the delay and the output transition time of the cell are represented as linear functions of two parameters: the input transition time and the output load capacitance. The general form of the linear model for the delay, D, through the cell is:
D = D0 + D1 * S + D2 * C
where D0, D1, D2 are constants, S is the input transition time, and C is the output load capacitance. The linear delay models are not accurate over the full range of input transition time and output capacitance for submicron technologies, and thus most cell libraries presently use more complex models such as the non-linear delay model.
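As a minimal sketch of the linear model above, the delay can be evaluated directly from the three constants. The coefficient values below are hypothetical and are not taken from any real library; they are only there to show the arithmetic.

```python
def linear_delay(d0, d1, d2, input_slew, load_cap):
    """Evaluate the linear delay model D = D0 + D1 * S + D2 * C."""
    return d0 + d1 * input_slew + d2 * load_cap

# Hypothetical coefficients: D0 in ns, D1 in ns per ns of input slew,
# D2 in ns per pF of output load.
D0, D1, D2 = 0.02, 0.15, 0.9

# Input transition time of 0.3 ns and output load of 0.16 pF.
delay = linear_delay(D0, D1, D2, input_slew=0.3, load_cap=0.16)
print(f"delay = {delay:.4f} ns")
```

Because the model is a plane in (S, C), it cannot follow the curvature that real cells show at the extremes of slew and load, which is the limitation the text describes.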
3.2 Non Linear delay models:
Most of the cell libraries include table models to specify the delays and timing checks for various timing arcs of the cell. Some newer timing libraries for nanometer technologies also provide current source based advanced timing models (such as CCS, ECSM, etc.) which are described later in this chapter. The table models are referred to as NLDM (Non-Linear Delay Model) and are used for delay, output slew, or other timing checks. The table models capture the delay through the cell for various combinations of input transition time at the cell input pin and total output capacitance at the cell output.
An NLDM model for delay is presented in a two-dimensional form, with the two independent variables being the input transition time and the output load capacitance, and the entries in the table denoting the delay.
Based upon the delay tables, an input fall transition time of 0.3ns and an output load of 0.16pF will correspond to a rise delay of the inverter of 0.1018ns. Since a falling transition at the input results in the inverter output rising, the table lookup for the rise delay involves a falling transition at the inverter input. This form of representing delays in a table as a function of two variables, transition time and capacitance, is called the non-linear delay model, since non-linear variations of delay with input transition time and load capacitance are expressed in such tables.
The NLDM models are used not only for the delay but also for the transition time at the output of a cell, which is characterized by the input transition time and the output load. Thus, there are separate two-dimensional tables for computing the output rise and fall transition times of a cell.
As illustrated above, an inverter cell with an NLDM model has the following tables:
• Rise delay
• Fall delay
• Rise transition
• Fall transition
In cases where the lookup does not correspond to any of the entries available in the table, two-dimensional interpolation is utilized to provide the resulting timing value. The two nearest table indices in each dimension are chosen for the table interpolation. Note that the equations above are valid for interpolation as well as extrapolation.
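The table lookup with two-dimensional interpolation can be sketched as follows. This is a standard bilinear interpolation and not necessarily the exact formulation any particular tool uses. The 3x3 table values are illustrative assumptions, chosen so that an input transition of 0.3ns and a load of 0.16pF give the 0.1018ns rise delay quoted in the text. The names `bracket` and `nldm_lookup` are made up for this example.

```python
from bisect import bisect_right

# Illustrative rise-delay table (assumed values, in ns).
SLEW_INDEX = [0.1, 0.3, 0.7]      # input transition time, ns (rows)
CAP_INDEX = [0.16, 0.35, 1.43]    # output load capacitance, pF (columns)
RISE_DELAY = [
    [0.0513, 0.1537, 0.5280],
    [0.1018, 0.2327, 0.6476],
    [0.1334, 0.2973, 0.7252],
]

def bracket(index, value):
    """Return positions (lo, hi) of the two nearest table indices.

    Values outside the table clamp to the first or last pair, so the
    same formula then extrapolates instead of interpolating.
    """
    hi = bisect_right(index, value)
    hi = min(max(hi, 1), len(index) - 1)
    return hi - 1, hi

def nldm_lookup(slew, cap, table=RISE_DELAY):
    """Bilinear interpolation (or extrapolation) in a 2-D NLDM table."""
    i0, i1 = bracket(SLEW_INDEX, slew)
    j0, j1 = bracket(CAP_INDEX, cap)
    # Fractional position inside the cell; outside [0, 1] means extrapolation.
    u = (slew - SLEW_INDEX[i0]) / (SLEW_INDEX[i1] - SLEW_INDEX[i0])
    v = (cap - CAP_INDEX[j0]) / (CAP_INDEX[j1] - CAP_INDEX[j0])
    return ((1 - u) * (1 - v) * table[i0][j0]
            + (1 - u) * v * table[i0][j1]
            + u * (1 - v) * table[i1][j0]
            + u * v * table[i1][j1])

print(nldm_lookup(0.3, 0.16))   # exact table entry
print(nldm_lookup(0.2, 0.25))   # interpolated between four entries
```

The same function would be called once per table (rise delay, fall delay, rise transition, fall transition) with the matching table passed in.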
3.3 Threshold specifications and slew derating
Slew (transition time) = how long it takes a signal to switch from low to high (or high to low), measured between two threshold crossings. In newer libraries: slew = time between crossing 30% and crossing 70% of the supply.
Older technologies: the signal transition was slow and mostly LINEAR across the full swing. The linear region was wide → measuring from 10% to 90% was fine.
Newer technologies: the signal transition is FASTER with curved edges. The linear region is now only between 30% and 70% → more accurate to measure there.
The problem: this creates a compatibility issue.
Old libraries measured: 10% → 90% (wide window)
New libraries measure: 30% → 70% (narrow window)
Same physical transition:
10/90 measurement = 1.0ns (let's say)
30/70 measurement = 0.5ns (roughly half the time)
If you mix old and new libraries in the same design:
→ The same transition is described by two different numbers
→ Constraints written for 10/90 don't match 30/70 values
→ Timing analysis results become inconsistent
The fix: slew derate factor
Why 30/70? It is the more linear part of the waveform, so it is more accurate at fine nodes.
Why derate 0.5? It is the ratio of the two windows, (70 - 30) / (90 - 10) = 0.5, and it relates the 30/70 numbers to their 10/90 equivalents.
Why 10/90 equivalent? Backward compatibility with older tools and constraint methodology.
Physical reality: the 30/70 window ≈ half of the 10/90 window.
Derate fix: the measured 30/70 value is multiplied by 2 (divided by 0.5) to get the 10/90-equivalent value stored in the library; multiplying the library value by the 0.5 derate gives back the 30/70 value.
Net result: accurate characterization + compatibility with legacy flows.
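The derate arithmetic can be sketched in a few lines. This assumes the transition is extrapolated as a straight ramp between the two windows, which is how the factor of 0.5 arises. The function name `slew_derate` and the 0.5ns sample value are illustrative only.

```python
def slew_derate(lower_pct, upper_pct, ref_lower_pct=10.0, ref_upper_pct=90.0):
    """Ratio of the measurement window to the reference (10/90) window."""
    return (upper_pct - lower_pct) / (ref_upper_pct - ref_lower_pct)

derate = slew_derate(30.0, 70.0)          # (70 - 30) / (90 - 10)

measured_30_70 = 0.5                      # ns, measured during characterization
library_value = measured_30_70 / derate   # 10/90-equivalent value kept in the library
recovered_30_70 = library_value * derate  # what a tool gets after applying the derate

print(derate, library_value, recovered_30_70)
```

With thresholds of 30% and 70%, the derate comes out as 0.5, so the 0.5ns measurement is stored as 1.0ns and recovered as 0.5ns when the derate is applied.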
3.4 Timing Models for Sequential Cells:
3.4.1 Synchronous Checks: Setup and Hold
Negative Values in Setup and Hold Checks
Notice that some of the hold values in the example above are negative. This is acceptable and normally happens when the path from the pin of the flip-flop to the internal latch point for the data is longer than the corresponding path for the clock. Thus, a negative hold check implies that the data pin of the flip-flop can change ahead of the clock pin and still meet the hold time check. The setup values of a flip-flop can also be negative. This means that at the pins of the flip-flop, the data can change after the clock pin and still meet the setup time check. Can both setup and hold be negative? No; for the setup and hold checks to be consistent, the sum of the setup and hold values should be positive. Thus, if the setup (or hold) check contains negative values, the corresponding hold (or setup) should be sufficiently positive so that the setup plus hold value is a positive quantity.
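The consistency rule above (setup plus hold must be positive) is easy to check mechanically. The sketch below uses hypothetical setup and hold pairs in ns; the function name `setup_hold_consistent` is made up for this example.

```python
def setup_hold_consistent(setup_ns, hold_ns):
    """True when the setup-plus-hold window has a positive width."""
    return setup_ns + hold_ns > 0

# Hypothetical (setup, hold) pairs in ns.
pairs = [
    (0.25, -0.10),   # negative hold, sufficiently positive setup
    (-0.05, 0.20),   # negative setup, sufficiently positive hold
    (-0.10, -0.05),  # both negative: inconsistent
]

results = [setup_hold_consistent(s, h) for s, h in pairs]
for (s, h), ok in zip(pairs, results):
    print(f"setup={s:+.2f} hold={h:+.2f} consistent={ok}")
```

In a real library the setup and hold values are table entries, so the same check would be applied entry by entry across matching tables.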
For flip-flops, it is helpful to have a negative hold time on scan data input pins. This gives flexibility in terms of clock skew and can eliminate the need for almost all buffer insertion for fixing hold violations in scan mode (scan mode is the one in which flip-flops are tied serially forming a scan chain- output of flip-flop is typically connected to the scan data input pin of the next flip-flop in series; these connections are for testability).
3.4.2 Asynchronous Checks:
Recovery and Removal Checks
Asynchronous pins such as asynchronous clear or asynchronous set override any synchronous behavior of the cell. When an asynchronous pin is active, the output is governed by the asynchronous pin and not by the clock latching in the data inputs. However, when the asynchronous pin becomes inactive, the active edge of the clock starts latching in the data input. The asynchronous recovery and removal constraint checks verify that the asynchronous pin has returned unambiguously to an inactive state at the next active clock edge. The recovery time is the minimum time that an asynchronous input must be stable after being de-asserted before the next active clock edge. Similarly, the removal time is the minimum time after an active clock edge that the asynchronous pin must remain active before it can be de-asserted.
Pulse Width Checks
In addition to the synchronous and asynchronous timing checks, there is a check which ensures that the pulse width at an input pin of a cell meets the minimum requirement. For example, if the width of the pulse at the clock pin is smaller than the specified minimum, the clock may not latch the data properly. The pulse width checks can also be specified for relevant synchronous and asynchronous pins. The minimum pulse width checks can be specified for a high pulse and also for a low pulse.
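A minimal sketch of how these checks translate into slack numbers, assuming all event times are already known. The function names and the sample times (in ns) are hypothetical; a real tool derives the required values from library tables.

```python
def async_deassert_slacks(deassert_t, prev_clk_edge, next_clk_edge,
                          recovery, removal):
    """Return (recovery_slack, removal_slack); negative means a violation.

    Recovery: de-assertion must come at least `recovery` before the next
    active clock edge. Removal: de-assertion must come at least `removal`
    after the previous active clock edge.
    """
    recovery_slack = (next_clk_edge - deassert_t) - recovery
    removal_slack = (deassert_t - prev_clk_edge) - removal
    return recovery_slack, removal_slack

def pulse_width_ok(pulse_width, min_pulse_width):
    """True when a high (or low) pulse meets the minimum width."""
    return pulse_width >= min_pulse_width

# Reset released 0.1 ns before the clock edge at 10 ns, with a 0.2 ns
# recovery requirement: the recovery check fails, removal passes.
rec, rem = async_deassert_slacks(deassert_t=9.9, prev_clk_edge=0.0,
                                 next_clk_edge=10.0, recovery=0.2, removal=0.1)
print(f"recovery slack = {rec:.2f} ns, removal slack = {rem:.2f} ns")
print(pulse_width_ok(pulse_width=0.35, min_pulse_width=0.4))
```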
3.4.3 Propagation Delay
The propagation delay of a sequential cell is from the active edge of the clock to a rising or falling edge on the output. Here is an example of a propagation delay arc for a negative edge-triggered flip-flop, from clock pin CKN to output Q. This is a non-unate timing arc as the active edge of the clock can cause either a rising or a falling edge on the output Q. Here is the delay table:
3.4.4 State-Dependent Models
In many combinational blocks, the timing arcs between inputs and outputs depend on the state of other pins in the block. These timing arcs between input and output pins can be positive unate, negative unate, or both positive and negative unate. An example is the XOR or XNOR cell, where the timing to the output can be positive unate or negative unate. In such cases, the timing behavior can be different depending upon the state of the other inputs of the block. In general, multiple timing models are described, one for each relevant state of the pins. Such models are referred to as state-dependent models.
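To make the XOR case concrete, here is a small sketch of a state-dependent model for the A to Z arc of a two-input XOR, keyed by the state of the other input B. The delay values are invented for illustration, and the dictionary layout is an assumption of this example rather than a library format.

```python
# Hypothetical state-dependent arcs for A -> Z of a 2-input XOR (delays in ns).
XOR_A_TO_Z = {
    0: {"sense": "positive_unate", "rise": 0.12, "fall": 0.10},  # when B = 0
    1: {"sense": "negative_unate", "rise": 0.14, "fall": 0.11},  # when B = 1
}

def output_edge(input_edge, b_state):
    """Return the edge seen on Z ('rise' or 'fall') for an edge on A."""
    sense = XOR_A_TO_Z[b_state]["sense"]
    if sense == "positive_unate":
        return input_edge
    return "fall" if input_edge == "rise" else "rise"

def arc_delay(input_edge, b_state):
    """Pick the delay of the arc selected by the state of side input B."""
    return XOR_A_TO_Z[b_state][output_edge(input_edge, b_state)]

for b in (0, 1):
    print(f"B={b}: A rise -> Z {output_edge('rise', b)}, "
          f"delay {arc_delay('rise', b)} ns")
```

With B at 0 the XOR behaves like a buffer from A to Z, and with B at 1 it behaves like an inverter, which is why one pin pair needs two models.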
3.5 Interface Timing Model for a BlackBox
In summary, a black box model can have the following timing arcs:
i. Input to output timing arcs for combinational logic paths.
ii. Setup and hold timing arcs from the synchronous inputs to the related clock pins.
iii. Recovery and removal timing arcs for the asynchronous inputs to the related clock pins.
iv. Output propagation delay from clock pins to the output pins.
The interface timing model as described above is not intended to capture the internal timing of the black box, but only the timing of its interfaces.
3.6 Advanced Timing Models
The timing models, such as NLDM, represent the delay through the timing arcs based upon output load capacitance and input transition time. In reality, the load seen by the cell output is comprised of capacitance as well as interconnect resistance. The interconnect resistance becomes an issue since the NLDM approach assumes that the output loading is purely capacitive. Even with non-zero interconnect resistance, these NLDM models have been utilized when the effect of interconnect resistance is small. In the presence of resistive interconnect, the delay calculation methodologies retrofit the NLDM models by obtaining an equivalent effective capacitance at the output of the cell. The "effective" capacitance methodology used within delay calculation tools obtains an equivalent capacitance that has the same delay at the output of the cell as the cell with RC interconnect. As the feature size shrinks, the effect of interconnect resistance can result in large inaccuracy as the waveforms become highly non-linear. Various modeling approaches provide additional accuracy for the cell output drivers. Broadly, these approaches obtain higher accuracy by modeling the output stage of the driver by an equivalent current source. Examples of these approaches are CCS (Composite Current Source) and ECSM (Effective Current Source Model). For example, the CCS timing models provide the additional accuracy for modeling cell output drivers by using a time-varying and voltage-dependent current source. The timing information is provided by specifying detailed models for the receiver pin capacitance and output charging currents under different scenarios. The details of the CCS model are described next.
3.6.1 Receiver Pin Capacitance
The receiver pin capacitance corresponds to the input pin capacitance specified for the NLDM models. Unlike the pin capacitance for the NLDM models, the CCS models allow separate specification of receiver capacitance in different portions of the transitioning waveform. Due to interconnect RC and the equivalent input non-linear capacitance due to the Miller effect from the input devices within the cell, the receiver capacitance value varies at different points on the transitioning waveform. This capacitance is thus modeled differently in the initial (or leading) portion of the waveform versus the trailing portion of the waveform.
The receiver pin capacitance can be specified at the pin level (like in NLDM models) where all timing arcs through that pin use that capacitance value. Alternately, the receiver capacitance can be specified at the timing arc level, in which case different capacitance models can be specified for different timing arcs. These two methods of specifying the receiver pin capacitance are described next.
3.7 Power Dissipation Modelling
The cell library contains information related to power dissipation in the cells. This includes active power as well as standby or leakage power. As the names imply, the active power is related to the activity in the design whereas the standby power is the power dissipated in the standby mode, which is mainly due to leakage.
3.7.1 Active Power
The active power is related to the activity at the input and output pins of the cell. The active power in the cell is due to charging of the output load as well as internal switching. These two are normally referred to as output switching power and internal switching power respectively. The output switching power is independent of the cell type and depends only upon the output capacitive load, the frequency of switching and the power supply of the cell: P = C * V^2 * f, where C is the load capacitance, V is the supply voltage and f is the switching frequency. The internal switching power depends upon the type of the cell and this value is thus included in the cell library. The specification of the internal switching power in the library is described next.
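As a quick worked example, the output switching power is commonly written as P = C * V^2 * f, where f counts full charge and discharge cycles of the load per second. The numbers below are hypothetical and only show the unit handling.

```python
def output_switching_power(load_cap_f, vdd_v, freq_hz):
    """Output switching power in watts: P = C * V^2 * f.

    Assumes freq_hz counts full charge/discharge cycles of the load.
    If a toggle rate (transitions per second) is used instead, the
    result should be halved.
    """
    return load_cap_f * vdd_v ** 2 * freq_hz

# Hypothetical operating point: 10 fF load, 1.0 V supply, 100 MHz switching.
p_watts = output_switching_power(load_cap_f=10e-15, vdd_v=1.0, freq_hz=100e6)
print(f"{p_watts * 1e6:.3f} uW")
```

Note the quadratic dependence on supply voltage: halving the supply cuts this component to a quarter, while the load and frequency terms scale linearly.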
The internal switching power is referred to as internal power in the cell library. This is the power consumption within the cell when there is activity at the input or the output of the cell. For a combinational cell, an input pin transition can cause the output to switch and this results in internal switching power. Switching power can be dissipated even when the outputs or the internal state do not have a transition. A common example is the clock that toggles at the clock pin of a flip-flop. The flip-flop dissipates power with each clock toggle, typically due to switching of an inverter inside the flip-flop cell. The power due to the clock pin toggle is dissipated even if the flip-flop output does not switch. Thus, for sequential cells, the input pin power refers to the power dissipation internal to the cell, that is, when the outputs do not transition.
3.7.2 Leakage Power
Most standard cells are designed such that the power is dissipated only when the output or state changes. Any power dissipated when the cell is powered but there is no activity is due to non-zero leakage current. The leakage can be due to subthreshold current for MOS devices or due to tunneling current through the gate oxide. In the earlier generations of CMOS process technologies, the leakage power has been negligible and has not been a major consideration during the design process. However, as the technology shrinks, the leakage power is becoming significant and is no longer negligible in comparison to active power.
As described above, the leakage power contribution is from two phenomena: subthreshold current in the MOS device and gate oxide tunneling. By using high Vt cells, one can reduce the subthreshold current; however, there is a trade-off due to the reduced speed of the high Vt cells. The high Vt cells have smaller leakage but are slower in speed. Similarly, the low Vt cells have larger leakage but allow greater speed. The contribution due to gate oxide tunneling does not change significantly by switching to high (or low) Vt cells. Thus, a possible way to control the leakage power is to utilize high Vt cells. Similar to the selection between high Vt and standard Vt cells, the strength of cells used in the design is a trade-off between leakage and speed. The higher strength cells have higher leakage power but provide higher speed.
The subthreshold MOS leakage has a strong non-linear dependence with respect to temperature. In most process technologies, the subthreshold leakage can grow by 10x to 20x as the device junction temperature is increased from 25°C to 125°C. The contribution due to gate oxide tunneling is relatively invariant with respect to temperature or the Vt of the devices. The gate oxide tunneling, which was negligible at process technologies of 100nm and above, has become a significant contributor to leakage at lower temperatures for 65nm or finer technologies. For example, gate oxide tunneling leakage may equal the subthreshold leakage at room temperature for 65nm or finer process technologies. At high temperatures, the subthreshold leakage continues to be the dominant contributor to leakage power.
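The interplay of the two leakage components over temperature can be sketched with a toy model. Two things here are assumptions of this example and not statements from a library or process: the subthreshold component is taken to grow exponentially between 25°C and 125°C (the text only gives the 10x to 20x end points), and the gate tunneling component is held constant. Current values are arbitrary units.

```python
def subthreshold_leakage(i_sub_25c, temp_c, growth_25_to_125=10.0):
    """Assumed exponential scaling: grows by `growth_25_to_125` over 100 degrees."""
    return i_sub_25c * growth_25_to_125 ** ((temp_c - 25.0) / 100.0)

def gate_leakage_fraction(i_sub_25c, i_gate, temp_c, growth_25_to_125=10.0):
    """Share of total leakage due to gate oxide tunneling (taken as constant)."""
    i_sub = subthreshold_leakage(i_sub_25c, temp_c, growth_25_to_125)
    return i_gate / (i_sub + i_gate)

# Hypothetical 65nm-like case: both components equal at room temperature.
I_SUB_25C = 1.0
I_GATE = 1.0

for temp in (25.0, 75.0, 125.0):
    frac = gate_leakage_fraction(I_SUB_25C, I_GATE, temp)
    print(f"{temp:5.1f} C: gate tunneling share = {frac:.2f}")
```

Under these assumptions the gate tunneling share falls from one half at 25°C to about one eleventh at 125°C, which matches the qualitative statement that subthreshold leakage dominates at high temperature.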
