Verilog basics
1) Verilog code to swap the contents of two registers, with and without a temporary register. With a temporary register:
Electrical Engineer from Somewhere
In this article we will discuss how 6T SRAM works.
RISC: a sequence of simple operations. CISC: a small number of complex instructions for the same task, with instructions of variable length that take a variable number of cycles to complete.
The steps to find the minterm solution using a K-map are as follows:
Floorplanning a chip or block is an important task of PD in which location, size and shape of soft modules, and placement of hard macros are decided. Floorplanning sometimes can also include I/O pad, pin placement, bump assignment, bus planning, power planning and more.
A register file (RF) is a large-signal array, while an SRAM is a small-signal array. SRAM uses differential bit-line signalling and is hence generally faster than register files.
Assume a net N1 has a rising transition at its output, and there is an aggressor net with a coupling capacitance (Cc) between the two nets.
Glitch noise analysis validates the integrity of a steady signal in the presence of various noise sources and evaluates the impact on downstream circuits.
In some cases the tool reports a negative net/cell delay. The delay of a net/cell is the time it takes for the output signal to reach its threshold point after the input signal reaches its threshold point.
LBIST: Logic built-in self-test (LBIST) is inserted into a design to generate patterns for self-testing.
In this article we shall discuss the different constraints we apply to a block.
In this article we will look at how the tool calculates stage counts and distances in AOCV for launch and capture timing paths, and how it obtains the corresponding AOCV derate values.
Scan chains are the elements in scan-based designs that are used to shift test data in and out. A scan chain is formed by a number of flops connected back to back, with the output of one flop connected to the input of the next. The input of the first flop is connected to an input pin of the chip (called scan-in), from where scan data is fed. The output of the last flop is connected to an output pin of the chip (called scan-out), which is used to take the shifted data out.
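As a rough illustration of the shifting described above, here is a minimal Python sketch that models a scan chain as a list of flop values. The function name `shift_scan_chain` and the four-flop chain are assumptions made for the example only; real scan insertion is done by DFT tools on the netlist.

```python
def shift_scan_chain(state, scan_in_bits):
    """Shift bits through a scan chain modeled as a list of flop values.

    state[0] is the flop fed by scan-in; state[-1] drives scan-out.
    Returns (new_state, scan_out_bits).
    """
    state = list(state)
    scan_out_bits = []
    for bit in scan_in_bits:
        scan_out_bits.append(state[-1])   # value seen at scan-out before this shift
        state = [bit] + state[:-1]        # every flop captures its predecessor
    return state, scan_out_bits


# Load a pattern with four shift cycles, then unload it with four more.
loaded, _ = shift_scan_chain([0, 0, 0, 0], [1, 0, 1, 1])
_, unloaded = shift_scan_chain(loaded, [0, 0, 0, 0])
print(loaded)    # [1, 1, 0, 1]
print(unloaded)  # [1, 0, 1, 1]
```

The first bit shifted in ends up in the last flop, so the pattern comes back out at scan-out in the same order in which it was fed at scan-in.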
In this article, let us discuss various scenarios where we might see miscorrelation between the results of different tools.
By default, the tool does not model sequential constant registers.
For full flat chip timing analysis, we need to read in the gate-level netlist along with SPEF/SDF, timing libraries and constraints. With this approach, designers must wait until all blocks are complete before performing full-chip timing.
A clock gating check is a constraint, either applied or inferred automatically by the tool, that ensures the clock propagates through the gate without any glitch.
In hierarchical design flows, chip-level timing constraints must be mapped correctly to corresponding block-level constraints.
STA can be run for many different scenarios. The three main variables that determine a scenario are:
The cell timing models are intended to provide accurate timing for various instances of the cell in the design environment. The timing models are normally obtained from detailed circuit simulations of the cell to model the actual scenario of the cell operation. The timing models are specified for each timing arc of the cell.
Net delay is the difference between the time a signal is first applied to the net and the time it reaches the other devices connected to that net.
RC variation is also considered as a set of corners for setup and hold checks. It arises from the fabrication process, for example when the width of a metal layer varies from the intended value.
We could also get latency requirements from top level.
First we will go over power distribution network issues.
We need to understand whether the clock domains are related or independent of each other. It depends on whether there are any data paths that start from one clock domain and end in the other clock domain. If there are no such paths, you can safely conclude that the two clock domains are independent of each other. This means that there is no timing path that starts from one clock domain and ends in the other clock domain.
Via pillar is a new technology that aims to reduce via resistance and increase electromigration robustness for enhanced performance.
For flops, data arriving later than the capture clock edge causes a setup violation, whereas a latch remains transparent for the entire duration of the active clock phase, relaxing the arrive-before-edge criterion.
This is a follow-up to the previously written article on I/O timing miscorrelation.
A source-synchronous interface outputs the clock in addition to the data constrained with it. This clock port is used to sample data at the receiver that connects to the interface. There are different categories of source-synchronous interfaces.
A virtual clock is a clock that exists but is not associated with any pin or port of the design. It is used as a reference in timing analysis to specify the input and output delays relative to a clock.
clock_expected: This warning indicates a missing clock-type signal (often termed a clock phase) on a clock pin. If there is any active (i.e., non-disabled) sequential arc from or to the clock pin and the pin does not receive a clock signal, this warning is flagged.
We will discuss various ways to fix timing in synthesis.
The clock input pins of macros (.lib model) typically need to be balanced earlier than other sinks because of internal clock paths inside the macro. How can the clock latency of a hard IP be specified for CTS?
Hold timing analysis can be performed early in the flow to identify hard macros that have large hold time requirements. Identifying these situations early allows planning for enough space to insert the buffers and delay cells required to fix them.
A metal-only ECO is carried out by changing only the metal interconnects in the design. Metal-only ECOs are very common in today’s semiconductor industry because they avoid a complete silicon re-spin. Sometimes a design needs a minor change, whether because of a bug or a customer request. A metal-only ECO allows the design to be re-fabricated for only a few layers. This is very cost-effective: a complete silicon re-spin may require around 100 layer masks to be manufactured, whereas a metal-only ECO reuses the older masks for most layers. Only the masks for the changed layers, usually 2 to 4, need to be manufactured again.
In this article we will discuss in detail the effects on cell delay calculation and interconnect delay calculation.
Performance/Power/Area are the key metrics to validate the functionality of any design on a given technology node. In this article we will go over various place-and-route techniques to achieve these key metrics.
Wire delay comes from two sources. One is the intrinsic, speed-of-light delay. The second is the lossy nature of on-chip wires; because the resistance of such wires is very high, the wires form RC circuits. Speed of light delay is proportional to the length of the wire, while RC delay increases with the square of the wire length. Thus, for long wires, RC delay dominates. For shorter wires, speed of light delay still dominates. However, such delay over a short wire is still relatively small compared to gate delay.
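The linear versus quadratic scaling can be checked numerically. The sketch below is only an illustration: the per-metre resistance, capacitance and propagation speed are assumed round numbers, not data from any real process, and the RC term uses the Elmore estimate 0.5·r·c·L² for a distributed wire.

```python
import math

# Illustrative, assumed values (not taken from any real process):
R_PER_M = 1.0e5      # wire resistance per metre (ohm/m)
C_PER_M = 2.0e-10    # wire capacitance per metre (F/m)
V_SIGNAL = 1.5e8     # signal propagation speed in the wire (m/s)


def flight_delay(length_m):
    """Speed-of-light (time-of-flight) delay: proportional to length."""
    return length_m / V_SIGNAL


def rc_delay(length_m):
    """Elmore estimate for a distributed RC wire: 0.5 * r * c * L^2."""
    return 0.5 * R_PER_M * C_PER_M * length_m ** 2


def crossover_length():
    """Length at which the two delay terms are equal."""
    return 2.0 / (R_PER_M * C_PER_M * V_SIGNAL)


for length in (1e-4, 1e-3, 1e-2):
    print(f"{length:.0e} m: flight={flight_delay(length):.3e} s, "
          f"rc={rc_delay(length):.3e} s")
print(f"crossover at about {crossover_length():.2e} m")
```

Doubling the length doubles the time-of-flight term but quadruples the RC term, so beyond the crossover length the RC delay dominates.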
As the technology nodes are shrinking consistently, the probability of the occurrence of faults is also increasing which makes DFT an indispensable function for modern sub-micron SoCs.
Recovery and removal checks are timing checks for asynchronous signals, similar to setup and hold checks. Enabling them allows recovery and removal timing checks to be performed during timing analysis.
First, let us answer the question: what are data-to-data timing checks, and why do we need them?
In this article we shall discuss various optimization steps involved in synthesis.
MIM (Metal-Insulator-Metal) and MOM (Metal-Oxide-Metal) capacitors are both metal-to-metal capacitors.
If two clock domains are asynchronous and you have applied set_false_path between these two clocks, no timing checks can be performed. Also, if you have defined a clock group with asynchronous clocks using the set_clock_groups command with the -asynchronous option, by default the tool cannot perform a timing check. But if you use the -allow_paths option with the set_clock_groups command, timing check can be performed.
Problem Statement: One path has 100% gate delay, second path has 50% gate delay and 50% interconnect delay, third path has 100% interconnect delay. These 3 different paths converged at a voltage and temperature. What would happen to each path if voltage and temperature changed? Which would be fast and which would be slow?
By definition, clock jitter is the deviation of a clock edge from its ideal position in time. Simply speaking, it is the inability of a clock source to produce a clock with clean edges. As the clock edge can arrive within a range, the difference between two successive clock edges will determine the instantaneous period for that cycle. So, clock jitter is of importance while talking about timing analysis. There are many causes of jitter including PLL loop noise, power supply ripples, thermal noise, crosstalk between signals etc. Let us elaborate the concept of clock jitter with the help of an example:
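One possible way to make this concrete is the small Python sketch below. The numbers (a 10000 ps ideal period and per-edge deviations of up to 100 ps) are invented for illustration, and the helper names are hypothetical.

```python
def instantaneous_periods(ideal_period_ps, edge_deviations_ps):
    """Periods between successive edges, given each edge's deviation from ideal."""
    edges = [i * ideal_period_ps + dev
             for i, dev in enumerate(edge_deviations_ps)]
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


def period_bounds(ideal_period_ps, peak_deviation_ps):
    """Shortest and longest period if every edge may move by +/- the peak deviation."""
    return (ideal_period_ps - 2 * peak_deviation_ps,
            ideal_period_ps + 2 * peak_deviation_ps)


# Four edges of a nominal 10000 ps clock; each edge is early or late by a little.
print(instantaneous_periods(10000, [0, 100, -100, 50]))  # [10100, 9800, 10150]
print(period_bounds(10000, 100))                         # (9800, 10200)
```

The shortest period occurs when one edge arrives late and the next arrives early, which is why the shortest instantaneous period, rather than the nominal one, is what matters for setup analysis.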
Any discussion of clock domain crossing (CDC) should start with a basic understanding of metastability and synchronization. In layman’s terms, metastability refers to an unstable intermediate state, where the slightest disturbance will cause a resolution to a stable state. When applied to flip-flops in digital circuits, it means a state where the flip-flop’s output may not have settled to the final expected value.
There can be multitude of reasons for congestion, with some reasons having a direct and others having an indirect impact. Let’s examine some.
1) def
Here we discuss the various types of design rule check (DRC) violations, their causes, and how to fix them at lower technology nodes, for both block-level and full-chip implementation, while meeting the design rules of the latest technology standards.
Input files for LVS in ICV tool are listed below:
Latch
One of the primary challenges is variation in manufacturing parameters, namely random and systematic variations. To model these parameter variations, a few engineers came up with the on-chip variation (OCV) model. The concept of OCV was first introduced in technology nodes above 90nm. The fundamental idea behind OCV is to apply global derates on the whole design irrespective of the type of cells, its individual variation or its slew-load conditions. But this simple concept became ineffective in lower technology nodes. Unfortunately, global derates make the design too optimistic for shorter paths and too pessimistic for longer paths. Subsequently, expected results are not accurate and reliable enough, which affects the performance of the chip.
In all, there are two phenomena that govern conductivity in any device:
If the number of routing tracks available in one particular area is less than the required number of routing tracks then it is called congestion.
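The definition above amounts to a simple comparison per routing region. The following Python sketch assumes a hypothetical dictionary of region names mapped to (required, available) track counts; the names and numbers are made up and do not come from any tool report.

```python
def routing_overflow(required_tracks, available_tracks):
    """Tracks demanded beyond what is available (0 if the region is not congested)."""
    return max(0, required_tracks - available_tracks)


def congested_regions(regions):
    """Map each congested region name to its overflow.

    regions: dict of name -> (required_tracks, available_tracks)
    """
    return {name: routing_overflow(required, available)
            for name, (required, available) in regions.items()
            if required > available}


example = {
    "gcell_0_0": (12, 10),   # needs 2 more tracks than exist: congested
    "gcell_0_1": (8, 10),    # spare tracks: not congested
    "gcell_1_0": (10, 10),   # exactly full: not congested by this definition
}
print(congested_regions(example))  # {'gcell_0_0': 2}
```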
Wire-to-wire spacing (min spacing), min width of wires, via-to-via spacing, notch avoidance.
Multiple patterning lithography (MPL) techniques have been used to extend 193nm lithography to the 22nm/14nm nodes, and possibly further, owing to the delay of extreme ultraviolet lithography and electron beam lithography (EBL). Generally speaking, MPL consists of double patterning lithography (DPL) and triple patterning lithography (TPL).
The accumulation of charge on isolated nodes of an integrated circuit during processing is known as the antenna effect, also called plasma-induced damage. The accumulated charge discharges through the thin gate oxide of the transistor, which can damage the transistor and degrade its performance.
A Multisource Clock Tree System (MCTS) represents a novel clock distribution technology that fills the gap between a conventional clock tree and a clock mesh. Clock mesh delivers the best possible clock frequency, skew and OCV results, whereas a conventional clock tree delivers the lowest power consumption and the easiest flow.
There are four key differences between conventional CTS, multisource CTS, and clock mesh: shared path, mesh fabric, design complexity, and timing analysis. Each subsequent section discusses each of the three clock distribution methods with respect to these key differences.
First let us understand the difference between OCV and PVT.
The Synopsys Design Constraints (SDC) format is used to specify the design intent, including timing, power and area constraints for a design. This format is used by different EDA tools to synthesize and analyse a design. SDC is based on the tool command language (Tcl).
There are 5 stages that happen in place_opt. 1) initial placement: Perform wire length driven coarse placement.
LEC consists of three steps: Setup, Map and Compare.
Why do we need a lock-up latch, and how do we benefit from it? A lock-up latch is nothing more than a transparent latch used intelligently in places where clock skew is very large and meeting hold timing is a challenge because of a large uncommon clock path. That is why lock-up latches are used to connect two flops in a scan chain that have excessive clock skew or uncommon clock paths, since the probability of hold failure is high in such cases.
In this article I will write about CTS goals and how to implement them.
The clock opt engine starts with an initialization step, which checks for prerequisites and placement overlaps before building the clock tree network. The quality of the clock tree is highly dependent on the quality of the input given to the tool. For example, a very tight transition may lead to heavy buffering, whereas a relaxed transition may lead to substantial violations at signoff. Hence, it is important to understand the flow inputs that are passed to the tool.
The feature of balancing more than one group of clock pins is similar to the capability of local skew or useful skew in clock tree synthesis. For example, you can group clock sinks that are not related to timing critical paths in a skew group and relax the target skew goals of the skew group. You can also give separate goals of target early delay to different skew groups and achieve the similar effect of useful skew to improve the slack of critical timing paths.
General guidelines to improve CTS quality are:
A buffer is nothing but two inverters connected back to back. Does it make any difference whether CTS is done using buffers or inverters? What are the pros and cons, and what factors would a backend design engineer consider while building a clock tree? These questions will be answered here.
Assume we have all the input data ready.
Blockages are specific locations where the placement of cells is prevented or blocked. They act as guidelines for placing standard cells in the design. Blockages are of the following types:
Usually, with regular CTS, the clock tree is balanced primarily for a slow corner. When timed in a fast corner, the delays of different cell sizes or cell types may scale differently from one another and from the RC delay of the connecting wires, leading to skew. This can make setup and/or hold timing closure harder.
In this article let us discuss in detail the different CTS strategies, along with their advantages and disadvantages.
In this article we shall discuss the different optimization techniques that can be applied in synthesis for timing correction.
Benefits of Clock Gating:
PD compilers do not balance ICG cells because they are not synchronization points on the clock tree. ICG cells are intermediate points on the clock tree, so clock arrival times are not balanced at them. Instead, CTS balances the flip-flops located downstream.
DC low power flow: The size_only attribute is set on all inserted isolation cells, level shifters, and enable registers to prevent them from being optimized away. This ensures that the isolation and level-shifting functions between power domains are maintained throughout the flow.
As power becomes increasingly significant in advanced technologies, more power reduction design methods are being explored. There are several RTL and gate-level design strategies for reducing power. Clock gating is one of them, and the most popular. Another is dynamic voltage and frequency scaling, but it is less popular because it is difficult to design and implement.
Power switching is a power-saving technique in which portions of the chip are shut down completely during periods of inactivity.
In the UPF language, a power domain is a group of elements in the design that share a common set of power supply needs.
Power consumption has become a very important factor in recent process nodes. In this article we will explore/discuss the increasing challenges of power consumption and various design strategies to reduce power consumption.
Let us first compare the power of two designs. For an 8Kx32 SRAM, there will be one clock read and one clock write. But if we divide it into two, we need more clocks, and we might see more power dissipated in the second design than in the first.
Unconstrained paths are paths that have no timing constraints (e.g. set_input_delay, create_clock) applied to them.
If no multicycle path is defined, the default setup check happens one clock cycle after the launch edge, and the hold check happens at the same clock edge as the launch. This looks something like the following:
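As a minimal sketch of the default rule above, the Python function below returns the setup and hold check edges for a path launched at time 0 between two flops on the same clock. The function name `check_edges` is hypothetical. The handling of non-default multipliers is not from the text: it follows the usual SDC convention that a setup multiplier N moves the setup check to edge N and the hold check to edge N-1, and a hold multiplier M moves the hold check back by M edges.

```python
def check_edges(period, setup_multiplier=1, hold_multiplier=0):
    """Return (setup_check_time, hold_check_time) for a launch edge at time 0.

    Assumes launch and capture flops share one clock of the given period.
    Defaults (1, 0) correspond to having no multicycle path defined.
    """
    setup_edge = setup_multiplier * period
    hold_edge = (setup_multiplier - 1 - hold_multiplier) * period
    return setup_edge, hold_edge


print(check_edges(10))        # (10, 0)  default: setup one cycle later, hold at launch edge
print(check_edges(10, 2))     # (20, 10) setup multicycle of 2, hold follows to edge 1
print(check_edges(10, 2, 1))  # (20, 0)  hold multiplier 1 pulls hold back to launch edge
```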
Setup time is defined as the minimum amount of time before the clock’s active edge by which the data must be stable for it to be latched correctly. Any violation in this required time causes incorrect data to be captured and is known as a setup violation.
The disadvantages of flat runs are longer run times, higher memory requirements, and the limitations of EDA tools in handling designs above a certain gate count. The design under discussion had two blocks, so effectively we had two blocks and a chip_top to pay attention to. Figure 1 is the floorplan in such a case. As you can see, there are two huge partitions in the design. It can also be seen that there is a huge channel between the blocks. Also, there are certain design requirements, for example that signals from either partition should not cross over the other partition.