Verilog basics

1) Verilog code to swap the contents of two registers, with and without a temporary register. With a temporary register:

SRAM operation

In this article we will discuss how a 6T SRAM cell works.

5 stage pipeline

RISC: a task is performed as a sequence of simple instructions. CISC: the same task is performed with fewer, more complex instructions, which have variable length and take a variable number of cycles to complete.

k-map

The following steps find the minimal sum-of-products solution from the minterms using a K-map:

Floorplan basics

Floorplanning a chip or block is an important physical design (PD) task in which the location, size and shape of soft modules, and the placement of hard macros, are decided. Floorplanning can also include I/O pad and pin placement, bump assignment, bus planning, power planning and more.

RF vs SRAM

An RF is a large-signal array, whereas an SRAM is a small-signal array. SRAM uses differential bit-line signalling and is hence generally faster than a register file.

Crosstalk Delay Analysis

Assume a net N1 (the victim) has a rising transition at its output, and an aggressor net is coupled to it through a coupling capacitance (Cc).

Glitch noise analysis and crosstalk analysis

Glitch noise analysis validates the integrity of a steady signal in the presence of various noise sources and checks its impact on the downstream circuit.

Negative net delay cases

In some cases the tool reports a negative net or cell delay. The delay of a net or cell is the time it takes for the output signal to reach its threshold point after the input signal reaches its threshold point.
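A minimal sketch of how the measured delay can go negative, assuming ideal linear ramps and a 50% threshold; `crossing_time` and `measured_delay` are hypothetical helpers, not a tool's delay calculator. When a very slow input drives a lightly loaded cell, the output can cross its threshold before the input crosses its own.

```python
def crossing_time(start_ps: float, transition_ps: float, threshold: float = 0.5) -> float:
    """Time at which an ideal linear 0-to-100% ramp crosses `threshold` (fraction of VDD)."""
    return start_ps + threshold * transition_ps


def measured_delay(in_start: float, in_tran: float,
                   out_start: float, out_tran: float,
                   threshold: float = 0.5) -> float:
    """Delay = output threshold crossing minus input threshold crossing."""
    return (crossing_time(out_start, out_tran, threshold)
            - crossing_time(in_start, in_tran, threshold))


# Hypothetical numbers: a slow 200 ps input ramp into a lightly loaded cell.
# The output starts switching at 40 ps with a fast 20 ps transition, so it
# crosses 50% at 50 ps, before the input crosses 50% at 100 ps.
print(measured_delay(in_start=0.0, in_tran=200.0, out_start=40.0, out_tran=20.0))  # -50.0
```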

Jtag, LBIST, MBIST, compression

LBIST: LBIST (logic built-in self-test) logic is inserted into a design to generate patterns for self-testing.

Constraints

In this article we shall discuss the different constraints we specify for a block.

AOCV

In this article we will see how the tool calculates stage counts and distances for launch and capture timing paths under AOCV, and obtains the corresponding AOCV derate values.

Scan Chain

Scan chains are the elements in scan-based designs that are used to shift test data in and out. A scan chain is formed by a number of flops connected back to back, with the output of one flop connected to the scan input of the next. The input of the first flop is connected to an input pin of the chip (called scan-in), from which scan data is fed. The output of the last flop is connected to an output pin of the chip (called scan-out), which is used to shift the data out.
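The shift operation described above can be modelled behaviourally as a shift register. This is a sketch with a hypothetical `ScanChain` class; it ignores scan enable, capture mode and real flop timing.

```python
from typing import List


class ScanChain:
    """Behavioural model of a scan chain in shift mode (flops connected back to back)."""

    def __init__(self, length: int) -> None:
        self.flops: List[int] = [0] * length  # flops[0] is the flop nearest scan-in

    def shift(self, scan_in: int) -> int:
        """One shift clock: every flop takes its predecessor's value; returns scan-out."""
        scan_out = self.flops[-1]                   # last flop drives scan-out
        self.flops = [scan_in] + self.flops[:-1]    # first flop takes the scan-in bit
        return scan_out

    def load(self, pattern: List[int]) -> List[int]:
        """Shift a whole pattern in, returning the bits that came out of scan-out."""
        return [self.shift(bit) for bit in pattern]


chain = ScanChain(4)
first_out = chain.load([1, 0, 1, 1])   # reset contents come out while the pattern goes in
loaded = list(chain.flops)             # first bit shifted in now sits in the last flop
second_out = chain.load([0, 0, 0, 0])  # shifting again unloads the pattern in order
print(first_out, loaded, second_out)   # [0, 0, 0, 0] [1, 1, 0, 1] [1, 0, 1, 1]
```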

Miscorrelation between synthesis, PnR and STA

In this article, let us discuss various scenarios where we might see miscorrelation between the results of different tools.

Various issues in LEC

How to model sequential constant registers in LEC?

By default, the tool does not model sequential constant registers.

Timing Models

For full flat chip timing analysis, we need to read in the gate-level netlist along with SPEF/SDF, timing libraries and constraints. With this approach, designers must wait until all blocks are complete before performing full-chip timing.

Physical Synthesis

Setup/hold on Flop2flop, flop2latch and latch2flop

Positive edge triggered FF to positive level sensitive latch

Clock gating checks

Clock gating check is a constraint, either applied by the user or inferred automatically by the tool, that ensures the clock propagates through the gate without any glitch.

Timing Budgeting

In hierarchical design flows, chip-level timing constraints must be mapped correctly to corresponding block-level constraints.

Signoff Methodology

STA can be run for many different scenarios. The three main variables that determine a scenario are:

Parasitic corners (RC interconnect corners and operating conditions used for parasitic extraction).
Operating mode.
PVT corner.
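Because a scenario is one combination of the three variables listed above, the number of scenarios is their product. A minimal sketch with hypothetical corner and mode names:

```python
from itertools import product

# Hypothetical corner and mode names; a real project takes these from its signoff plan.
rc_corners = ["cworst", "cbest", "rcworst", "rcbest", "typical"]
modes = ["func", "scan_shift", "scan_capture"]
pvt_corners = ["ss_0p72v_m40c", "ss_0p72v_125c", "ff_0p88v_m40c", "ff_0p88v_125c"]

# One scenario per (mode, PVT corner, RC corner) combination.
scenarios = [f"{mode}_{pvt}_{rc}" for mode, pvt, rc in product(modes, pvt_corners, rc_corners)]
print(len(scenarios))  # 60
print(scenarios[0])    # func_ss_0p72v_m40c_cworst
```

In practice, teams often prune this list to the dominant scenarios for each check rather than running every combination.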

NLDM vs CCS

The cell timing models are intended to provide accurate timing for various instances of the cell in the design environment. The timing models are normally obtained from detailed circuit simulations of the cell to model the actual scenario of the cell operation. The timing models are specified for each timing arc of the cell.

Interconnect delay

Net delay is the difference between the time a signal is first applied to the net and the time it reaches the other devices connected to that net.

RC Corners

RC variation is also modelled as corners for setup and hold checks. RC variation arises from the fabrication process; for example, the width of a metal layer can vary from the desired value.

What decides target latency and skew numbers?

Latency requirements can also come from the top level.

EM and IR

First we will go over power distribution network issues.

When to specify a false path between different clock domains?

We need to understand whether the clock domains are related or independent of each other. This depends on whether any data paths start in one clock domain and end in the other. If there are no such paths, you can safely conclude that the two clock domains are independent of each other.
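The "no crossing paths" test above amounts to a set lookup over timing paths. The path tuples and the helpers `crossing_clock_pairs` and `clocks_independent` are hypothetical; in a real flow the paths would come from the STA tool's reports.

```python
from typing import Iterable, Set, Tuple

# Hypothetical path record: (startpoint, launch_clock, endpoint, capture_clock)
Path = Tuple[str, str, str, str]


def crossing_clock_pairs(paths: Iterable[Path]) -> Set[Tuple[str, str]]:
    """Return (launch, capture) clock pairs of paths that cross between domains."""
    return {(launch, capture) for _, launch, _, capture in paths if launch != capture}


def clocks_independent(clk_a: str, clk_b: str, paths: Iterable[Path]) -> bool:
    """True if no path starts in one of the two domains and ends in the other."""
    pairs = crossing_clock_pairs(paths)
    return (clk_a, clk_b) not in pairs and (clk_b, clk_a) not in pairs


paths = [
    ("u_a/Q", "CLKA", "u_b/D", "CLKA"),
    ("u_c/Q", "CLKA", "u_d/D", "CLKB"),
    ("u_e/Q", "CLKC", "u_f/D", "CLKC"),
]
print(clocks_independent("CLKA", "CLKB", paths))  # False
print(clocks_independent("CLKA", "CLKC", paths))  # True
```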

Via Pillar

Via pillar is a new technology that aims to reduce via resistance and increase electromigration robustness for enhanced performance.

Time borrowing for latch based designs

For flops, data arriving later than the capture clock edge (minus the setup time) causes a setup violation. A latch, however, remains transparent for the entire active clock level (phase), which relaxes the arrive-before-edge criterion.
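A simplified sketch of time borrowing at a positive level-sensitive latch, assuming integer picosecond times, a single opening and closing edge, and no clock latency; `latch_time_borrow` and `LatchCheck` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LatchCheck:
    borrowed_ps: int  # time borrowed from the next stage
    slack_ps: int     # margin to (closing edge - setup); negative means a violation


def latch_time_borrow(arrival_ps: int, open_ps: int, close_ps: int, setup_ps: int) -> LatchCheck:
    """Simplified time-borrowing check at a positive level-sensitive latch."""
    latest_ok = close_ps - setup_ps  # data must settle before the latch closes
    # Data arriving before the latch opens borrows nothing; data arriving while the
    # latch is transparent borrows (arrival - open), capped at the closing limit.
    borrowed = max(0, min(arrival_ps, latest_ok) - open_ps)
    return LatchCheck(borrowed_ps=borrowed, slack_ps=latest_ok - arrival_ps)


# Hypothetical 10 ns clock: latch transparent from 0 to 5000 ps, 100 ps setup.
print(latch_time_borrow(2000, 0, 5000, 100))  # LatchCheck(borrowed_ps=2000, slack_ps=2900)
```

With the same 2000 ps arrival, a flop capturing at the 0 ps edge would fail setup; the latch instead passes the data through and borrows 2000 ps from the next stage.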

How to increase clock latency source window and ensure block level timing closure in top level context?

This is a follow-up article based on the previously written IO timing miscorrelation article.

Constraint management for source synchronous designs

A source-synchronous interface outputs a clock in addition to the data that is timed relative to it. The receiver that connects to the interface uses this clock to sample the data. There are different categories of source-synchronous interfaces.

What is virtual clock and when to use it?

A virtual clock is a clock that exists but is not associated with any pin or port of the design. It is used as a reference in timing analysis to specify the input and output delays relative to a clock.

Understanding check_timing

clock_expected: This warning indicates a missing clock signal (often termed a clock phase) on a clock pin. If there is any active (i.e., non-disabled) sequential arc from or to the clock pin and the clock pin does not receive a clock signal, this warning is flagged.

How to fix timing in synthesis

We will discuss various ways to fix timing in synthesis.

How to specify clock latency for clock pins of macro models

Problem statement

The clock input pins of macros (.lib model) typically need to be balanced earlier than other sinks because of internal clock paths inside the macro. How can the clock latency of a hard IP be specified for CTS?

Recommendations for fixing hold violations during PnR

Pre-CTS stage

Hold timing analysis can be performed early in the flow to identify hard macros that have large hold time requirements. Identifying these situations early allows planning for enough space to insert the buffers and delay cells required to fix them.

Metal ECO flow

A metal-only ECO is carried out by changing only the metal interconnects in the design. Metal-only ECOs are very common in today’s semiconductor industry because they avoid a complete silicon re-spin. Sometimes the design needs a minor change, for example to fix a bug or to meet a customer demand. A metal-only ECO allows the design to be re-fabricated by changing only a few layers. It is very cost-effective: a complete silicon re-spin may require around 100 masks to be manufactured, whereas a metal-only ECO reuses the older masks for most of the layers. Only the layers with changes need to be manufactured again, which is usually 2 to 4 layers in a metal-only ECO.

What affects the cell delay and interconnect delay

In this article we will discuss in detail the factors that affect cell delay calculation and interconnect delay calculation.

PPA tips and tricks

Performance, power and area (PPA) are the key metrics for evaluating any design on a given technology node. In this article we will go over various place-and-route techniques to achieve these metrics.

Interconnect RC

Wire delay comes from two sources. One is the intrinsic, speed-of-light delay. The second is the lossy nature of on-chip wires; because the resistance of such wires is very high, the wires form RC circuits. Speed of light delay is proportional to the length of the wire, while RC delay increases with the square of the wire length. Thus, for long wires, RC delay dominates. For shorter wires, speed of light delay still dominates. However, such delay over a short wire is still relatively small compared to gate delay.
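To make the linear versus quadratic scaling concrete, the sketch below computes the Elmore delay of a uniform wire split into equal RC sections, alongside a time-of-flight delay. The values r = 1 ohm/um, c = 0.2 fF/um and a relative permittivity of 3.9 are illustrative assumptions, not data for any particular process.

```python
import math

SPEED_OF_LIGHT_UM_PER_PS = 299.792458  # c expressed in um/ps


def elmore_delay_ps(length_um: float, r_ohm_per_um: float, c_ff_per_um: float,
                    segments: int = 100) -> float:
    """Elmore delay at the far end of a wire modelled as equal RC L-sections."""
    seg_r = r_ohm_per_um * length_um / segments
    seg_c = c_ff_per_um * length_um / segments
    delay_fs = 0.0
    for i in range(1, segments + 1):
        # Capacitor i is charged through the i segment resistors between it and the driver.
        delay_fs += (i * seg_r) * seg_c  # ohm * fF = fs
    return delay_fs / 1000.0


def time_of_flight_ps(length_um: float, eps_r: float = 3.9) -> float:
    """Speed-of-light delay through a dielectric with relative permittivity eps_r."""
    return length_um * math.sqrt(eps_r) / SPEED_OF_LIGHT_UM_PER_PS


for length in (10, 100, 1000, 5000):
    rc = elmore_delay_ps(length, r_ohm_per_um=1.0, c_ff_per_um=0.2)
    tof = time_of_flight_ps(length)
    print(f"{length:5d} um: RC {rc:9.2f} ps, time of flight {tof:6.2f} ps")
```

Doubling the length doubles both the segment resistance and capacitance, so the RC term grows four times while the time of flight only doubles.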

DFT Modes

As technology nodes keep shrinking, the probability of faults occurring also increases, which makes DFT indispensable for modern sub-micron SoCs.

What are asynchronous checks and why we need them

These are timing checks for asynchronous signals, similar to setup and hold checks. They enable recovery and removal timing checks to be performed during timing analysis.

What are data to data timing checks?

Firstly, let us answer the question what are data to data timing checks and why we need them.

Optimizations done at synthesis

In this article we shall discuss various optimization steps involved in synthesis.

MIMCAP and MOMCAP

MIM (Metal-Insulator-Metal) and MOM (Metal-Oxide-Metal) capacitors are both metal-to-metal capacitors.

How to perform timing check between asynchronous clock domains

If two clock domains are asynchronous and you have applied set_false_path between these two clocks, no timing checks can be performed. Also, if you have defined a clock group with asynchronous clocks using the set_clock_groups command with the -asynchronous option, by default the tool cannot perform a timing check. But if you use the -allow_paths option with the set_clock_groups command, timing check can be performed.

Gate vs interconnect delay

Problem Statement: One path has 100% gate delay, a second path has 50% gate delay and 50% interconnect delay, and a third path has 100% interconnect delay. All three paths are timing-converged at a given voltage and temperature. What would happen to each path if the voltage and temperature changed? Which would become faster and which slower?

Clock Jitter

By definition, clock jitter is the deviation of a clock edge from its ideal position in time. Simply speaking, it is the inability of a clock source to produce a clock with clean edges. As the clock edge can arrive within a range, the difference between two successive clock edges will determine the instantaneous period for that cycle. So, clock jitter is of importance while talking about timing analysis. There are many causes of jitter including PLL loop noise, power supply ripples, thermal noise, crosstalk between signals etc. Let us elaborate the concept of clock jitter with the help of an example:
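Given measured edge timestamps, the instantaneous periods and two common jitter metrics (period jitter and cycle-to-cycle jitter) follow directly. The edge times below are hypothetical, and `jitter_stats` is an illustrative helper.

```python
from typing import List, Tuple


def jitter_stats(edges_ps: List[int], ideal_period_ps: int) -> Tuple[List[int], int, int]:
    """Return instantaneous periods, worst period jitter and worst cycle-to-cycle jitter."""
    periods = [b - a for a, b in zip(edges_ps, edges_ps[1:])]           # successive edge gaps
    period_jitter = max(abs(p - ideal_period_ps) for p in periods)      # deviation from ideal
    c2c_jitter = max(abs(b - a) for a, b in zip(periods, periods[1:]))  # change between cycles
    return periods, period_jitter, c2c_jitter


# Hypothetical rising-edge timestamps of a nominal 1 GHz (1000 ps) clock.
periods, pj, c2c = jitter_stats([0, 1010, 1995, 3000, 4005], 1000)
print(periods, pj, c2c)  # [1010, 985, 1005, 1005] 15 25
```

The shortest instantaneous period (985 ps here) is the one a setup check effectively has to survive.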

Clock Domain Crossing

What is Metastability?

Any discussion of clock domain crossing (CDC) should start with a basic understanding of metastability and synchronization. In layman’s terms, metastability refers to an unstable intermediate state, where the slightest disturbance will cause a resolution to a stable state. When applied to flip-flops in digital circuits, it means a state where the flip-flop’s output may not have settled to the final expected value.
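A widely used first-order estimate of synchronizer reliability is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time allowed for resolution, tau is the flop's resolution time constant and T_w its metastability window. The sketch below uses hypothetical parameter values to show why a second synchronizer flop helps so much.

```python
import math


def synchronizer_mtbf_s(t_resolve_s: float, tau_s: float, t_window_s: float,
                        f_clk_hz: float, f_data_hz: float) -> float:
    """First-order MTBF estimate: exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


# Hypothetical parameters: 1 GHz receiving clock, data toggling at 100 MHz,
# tau = 20 ps, metastability window T_w = 20 ps.
params = dict(tau_s=20e-12, t_window_s=20e-12, f_clk_hz=1e9, f_data_hz=100e6)
one_flop = synchronizer_mtbf_s(100e-12, **params)  # only ~100 ps left for resolution
two_flop = synchronizer_mtbf_s(900e-12, **params)  # roughly one extra clock cycle
print(f"1 flop: {one_flop:.3g} s, 2 flops: {two_flop:.3g} s")
```

Because resolution time sits in the exponent, the extra cycle given by the second flop raises the MTBF by many orders of magnitude.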

Interface timing miscorrelation block vs top level

IO Timing Miscorrelation between Block Level and Top-Level

Combating Congestion

Reasons for congestion

There can be a multitude of reasons for congestion, some with a direct impact and others with an indirect one. Let’s examine some.

Multi-bit flops and MIMCAP

Benefits of Multibit flops:

Area reduction because of shared transistors and transistor level optimized layout: Area of Multibit cell is less than two single bit cells because of transistor level optimization of cell layout, which includes shared logic, power-supply and substrate-well.
Total length of clock tree is reduced: This results in reduction of clock-tree buffers and clock-tree power. Clock-tree buffer level reduction improves overall balanced design skew.
Power reduction.
Better skew.

RV inputs

1) DEF

Latency vs skew

To achieve better (lower) latency, since high latency means more power dissipation:

A good placement of flip-flops reduces latency, i.e., if the sink pins of a clock are not placed far apart, latency is reduced.
Constrain the clock net routing to the upper metal layers, which are less resistive, using the TopPreferredLayer and BottomPreferredLayer constructs.
Using a mesh, it is possible to achieve lower insertion delays.
An H-tree reduces insertion delay: the combination of larger drivers and low-RC routing layers reduces the non-common-path clock insertion delay, potentially increasing performance.

Design Rule Checks (DRC)

Here we discuss the various types of design rule check (DRC) violations, their causes, and how to fix them at lower technology nodes, in block-level as well as full-chip implementation, while meeting the design rules of the latest technology standards.

Layout vs Schematic Debug (LVS)

Input files for LVS in ICV tool are listed below:

Latch vs Flip Flop

Latch

Latch is faster (no need to wait for a clock edge) but less predictable (more prone to race conditions).
Latch uses less area (because it has fewer gates).
Latch is fast (a longer combinational path can be compensated by shorter paths in the subsequent logic stages). That is why, for high performance, circuit designers are turning to latch-based design.
For ASICs with large skew, latches have substantial benefits for reducing the clock period.

OCV, AOCV, POCV Part 2

One of the primary challenges is variation in manufacturing parameters, namely random and systematic variations. To model these parameter variations, engineers introduced the on-chip variation (OCV) model. The concept of OCV was first introduced in technology nodes above 90nm. The fundamental idea behind OCV is to apply global derates to the whole design irrespective of the type of cell, its individual variation or its slew-load conditions. But this simple concept became ineffective at lower technology nodes: global derates make the analysis too optimistic for shorter paths and too pessimistic for longer paths. Consequently, the results are not accurate and reliable enough, which affects the performance of the chip.
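A minimal sketch of a flat (global) OCV derate applied to a setup check, assuming a late derate on the launch clock and data path, an early derate on the capture clock path, no common path pessimism removal, and hypothetical derate values of 1.05 and 0.95. The same factors are applied regardless of path depth, which is the limitation described above.

```python
def setup_slack_flat_ocv(period: float, launch_clk: float, data: float,
                         capture_clk: float, setup: float,
                         late: float = 1.05, early: float = 0.95) -> float:
    """Setup slack with one global late/early derate pair (no CRPR)."""
    arrival = (launch_clk + data) * late              # launch clock + data path, derated late
    required = period + capture_clk * early - setup   # capture clock path, derated early
    return required - arrival


# Hypothetical path, all values in ps.
print(setup_slack_flat_ocv(1000, 300, 500, 300, 50, late=1.0, early=1.0))  # without derates
print(setup_slack_flat_ocv(1000, 300, 500, 300, 50))                       # with flat derates
```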

Temperature Inversion

In all, two phenomena govern the conductivity of any device:

Congestion

Congestion occurs when the number of routing tracks available in a particular area is less than the number of routing tracks required.
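The definition above is usually tracked per global routing cell (gcell) as overflow. A minimal sketch with hypothetical demand and supply grids for one routing layer and direction; `gcell_overflow` is an illustrative helper.

```python
from typing import List

Grid = List[List[int]]


def gcell_overflow(demand: Grid, supply: Grid) -> Grid:
    """Overflow per gcell: tracks needed beyond the tracks available (never negative)."""
    return [[max(0, d - s) for d, s in zip(d_row, s_row)]
            for d_row, s_row in zip(demand, supply)]


# Hypothetical 2x3 gcell region with 10 available tracks per gcell.
demand = [[8, 12, 10],
          [11, 9, 14]]
supply = [[10, 10, 10],
          [10, 10, 10]]
overflow = gcell_overflow(demand, supply)
print(overflow)                 # [[0, 2, 0], [1, 0, 4]]
print(sum(map(sum, overflow)))  # 7
```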

Physical DRC

Wire-to-wire spacing (minimum spacing)
Minimum wire width
Via-to-via spacing
Notch avoidance

DPT and color conflict

Multiple patterning lithography (MPL) techniques have been used to extend 193nm lithography to the 22nm/14nm nodes, and possibly further, due to delays in extreme ultraviolet (EUV) lithography and electron beam lithography (EBL). Generally speaking, MPL consists of double patterning lithography (DPL) and triple patterning lithography (TPL).
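For double patterning, polygons closer than the same-mask spacing form a conflict graph, and a legal two-mask assignment exists only if that graph is bipartite, i.e. has no odd cycle. A minimal BFS two-coloring sketch; the polygon indices and conflict pairs are hypothetical inputs, and `assign_two_masks` is an illustrative helper rather than a decomposition tool.

```python
from collections import deque
from typing import List, Optional, Tuple


def assign_two_masks(n: int, conflicts: List[Tuple[int, int]]) -> Optional[List[int]]:
    """Return a mask (0/1) per polygon, or None if an odd cycle causes a color conflict."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)
    mask = [-1] * n
    for start in range(n):
        if mask[start] != -1:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] == -1:
                    mask[v] = 1 - mask[u]   # neighbours must go on the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None             # same mask on both ends: odd cycle
    return mask


print(assign_two_masks(3, [(0, 1), (1, 2)]))          # [0, 1, 0]
print(assign_two_masks(3, [(0, 1), (1, 2), (2, 0)]))  # None
```

A triangle of mutually conflicting polygons cannot be split onto two masks, which is one reason triple patterning is used where such geometries cannot be avoided.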

Antenna Effects

The accumulation of charge on isolated nodes of an integrated circuit during its processing is known as the antenna effect. This effect is also known as plasma-induced damage. When the accumulated charge discharges through the thin gate oxide of a transistor, it can damage the transistor and degrade its performance.

Deep dive into multisource cts

A Multisource Clock Tree System (MCTS) is a novel clock distribution technology that fills the gap between a conventional clock tree and a clock mesh. A clock mesh delivers the best possible clock frequency, skew and OCV results, whereas a conventional clock tree delivers the lowest power consumption and the easiest flow.

Deep dive into LVT, SVT, HVT cells

Difference between CTS, MultiSource CTS and Mesh

There are four key differences between conventional CTS, multisource CTS, and clock mesh: shared path, mesh fabric, design complexity, and timing analysis. Each subsequent section discusses each of the three clock distribution methods with respect to these key differences.

Introduction to OCV, AOCV, POCV

First let us understand the difference between OCV and PVT.

Introduction to STA Part 1

What is STA?

Introduction to SDC!

The Synopsys Design Constraints (SDC) format is used to specify the design intent, including timing, power and area constraints for a design. This format is used by different EDA tools to synthesize and analyse a design. SDC is based on the tool command language (Tcl).

Placement and timing

There are five stages in place_opt. 1) Initial placement: performs wire-length-driven coarse placement.

Logical Equivalence Check

LEC consists of three steps: Setup, Map and Compare.

Lockup Latch

Why do we need a lockup latch, and how do we benefit from it? A lockup latch is simply a transparent latch used in places where clock skew is very large and meeting hold timing is a challenge due to a large uncommon clock path. That is why lockup latches are used to connect two flops in a scan chain that have excessive clock skew or uncommon clock paths, since the probability of hold failure is high in such cases.

More details on CTS

In this article I will write about CTS goals and how to implement them.

Understanding various steps in cts

The clock_opt engine starts with an initialization step, which checks prerequisites and placement overlaps before building the clock tree network. The quality of the clock tree is highly dependent on the quality of the inputs given to the tool. For example, a very tight transition target may lead to excessive buffering, whereas a relaxed transition target leads to substantial violations at signoff. Hence, it is important to understand the flow inputs that are passed to the tool.

Skew groups

The feature of balancing more than one group of clock pins is similar to the capability of local skew or useful skew in clock tree synthesis. For example, you can group clock sinks that are not related to timing critical paths in a skew group and relax the target skew goals of the skew group. You can also give separate goals of target early delay to different skew groups and achieve the similar effect of useful skew to improve the slack of critical timing paths.

CTS quality enhancement and debugging

General guidelines to improve the cts quality are:

Use buffers and inverters with a minimal difference between their rise and fall delays.
Use default values for CTS constraints.

Inverter vs Buffer based clock tree

A buffer is nothing but two inverters connected back to back. Does it make any difference if the CTS is done using buffers or inverters ? What are the pros and cons and what factors would backend design engineer consider while building clock tree? These questions will be answered here.

PD Implementation Techniques

Assuming we have all the input data ready.

Halo vs blockage

Blockages are specific locations where the placement of cells is prevented or blocked. They act as guidelines for placing standard cells in the design. Blockages are of the following types:

Introduction to Mesh, H-Tree and Multi-Tap Clock

Usually, with regular CTS, the clock tree is balanced primarily for a slow corner. When timed in a fast corner, the delays of different cell sizes or cell types may scale differently from one another and from the RC delay of the connecting wires, leading to skew. This can make setup and/or hold timing closure harder.

CTS Strategies Part 1

In this article let us discuss in detail the different CTS strategies and their advantages and disadvantages.

Optimizations in Physical synthesis

In this article we shall discuss the different optimization techniques that can be used in synthesis for timing correction.

ICG Methodology for power and timing QoR

Benefits of Clock Gating:

Part2 on ICG cells

PD compilers do not balance ICG cells because they are not synchronization points on the clock tree. ICG cells are intermediate points on the clock tree, so the clock arrival times at them cannot be balanced. CTS balances the flip-flops located downstream.

Handling Power in EDA Flow

DC low-power flow: The size_only attribute is set on all inserted isolation cells, level shifters and enable registers to prevent them from being optimized away. This ensures that the isolation and level-shifting functions between power domains are maintained throughout the flow.

Power Reduction Methods - Part1

As power becomes increasingly significant in advanced technologies, more power reduction design methods are being explored. There are several RTL and gate-level design strategies for reducing power. Clock gating is one of them and the most popular. Another is dynamic voltage and frequency scaling (DVFS), but it is less popular because it is difficult to design and implement.
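Clock gating saves power because dynamic switching power scales with activity, P = alpha * C * V^2 * f. A rough sketch, assuming a hypothetical 10 pF of gated clock load at 0.8 V and 1 GHz, with the clock enabled 30% of the time and the ICG cell's own overhead ignored:

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * V^2 * f (alpha = 1 for a free-running clock)."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


CLOCK_LOAD_F = 10e-12  # hypothetical clock-tree plus clock-pin capacitance behind the ICG
VDD_V = 0.8
FREQ_HZ = 1e9

ungated = dynamic_power_w(1.0, CLOCK_LOAD_F, VDD_V, FREQ_HZ)
gated = dynamic_power_w(0.3, CLOCK_LOAD_F, VDD_V, FREQ_HZ)  # clock enabled 30% of cycles
print(f"ungated {ungated * 1e3:.2f} mW, gated {gated * 1e3:.2f} mW")
```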

Power Reduction Methods - Part2

Power switching is a power-saving technique in which portions of the chip are shut down completely during periods of inactivity.

Power Intent Concepts

In the UPF language, a power domain is a group of elements in the design that share a common set of power supply needs.

Low Power Design Strategies

Power consumption has become a very important factor in recent process nodes. In this article we will discuss the increasing challenges of power consumption and various design strategies to reduce it.

Multipoint CTS vs Singlepoint CTS

SRAMs vs RFs

PPA comparison of 8kx32 vs two 4kx32 SRAMs

Let us first compare the power of the two designs. For the 8kx32 SRAM, there will be one clock for a read and one for a write. If we divide it into two 4kx32 SRAMs, we need more clocks, and we might see more power dissipated in the second design than in the first.

Unconstrained Paths

Unconstrained paths are paths that have no timing constraints applied to them, e.g., no create_clock or set_input_delay.

MultiCycle Paths!

If no multi-cycle is defined, then default setup check happens after one clock cycle and hold check happens at same clock edge as launch. This looks something like below:
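The default and multicycle check edges can be sketched for a path launched and captured by the same clock, following the usual SDC convention: a setup multiplier N moves the setup capture edge to N periods after launch, the hold check then defaults to one period before that edge, and a hold multiplier M moves the hold edge M periods earlier. `check_edges` is a hypothetical helper.

```python
def check_edges(period: int, setup_mult: int = 1, hold_mult: int = 0) -> tuple:
    """Setup and hold capture-edge times relative to a launch edge at time 0.

    Same-clock path with SDC-style multicycle semantics:
    setup edge = setup_mult periods, hold edge = (setup_mult - 1 - hold_mult) periods.
    """
    setup_edge = setup_mult * period
    hold_edge = (setup_mult - 1 - hold_mult) * period
    return setup_edge, hold_edge


print(check_edges(10))        # (10, 0)  default: setup one cycle later, hold at launch edge
print(check_edges(10, 3))     # (30, 20) setup multicycle of 3 only: hold moves to 20
print(check_edges(10, 3, 2))  # (30, 0)  adding a hold multicycle of 2 restores hold at 0
```

The middle case is why a setup multicycle is usually paired with a hold multicycle of N-1: otherwise the hold check becomes much harder to meet.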

Placement of Clock Gating Cells

Should the clock gating cells be placed closer to clock source or closer to sink flops ?

Different Setup and Hold fix methods!

Ways to fix Setup and hold timing violation

Setup time is defined as the minimum amount of time before the clock’s active edge by which the data must be stable for it to be latched correctly. Any violation in this required time causes incorrect data to be captured and is known as a setup violation.
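Setup and hold slack can be written out as simple arithmetic, which also shows what each fix method changes. A minimal sketch with hypothetical picosecond values, ignoring OCV derates and CRPR; `setup_slack` and `hold_slack` are illustrative helpers.

```python
def setup_slack(period: int, launch_latency: int, clk_to_q: int, comb_delay: int,
                capture_latency: int, t_setup: int, uncertainty: int = 0) -> int:
    """Required time at the next capture edge minus data arrival time."""
    arrival = launch_latency + clk_to_q + comb_delay
    required = period + capture_latency - t_setup - uncertainty
    return required - arrival


def hold_slack(launch_latency: int, clk_to_q: int, comb_delay: int,
               capture_latency: int, t_hold: int, uncertainty: int = 0) -> int:
    """Data arrival time minus the hold requirement at the same capture edge."""
    arrival = launch_latency + clk_to_q + comb_delay
    required = capture_latency + t_hold + uncertainty
    return arrival - required


print(setup_slack(1000, 200, 80, 650, 220, 40, 30))  # 220
print(hold_slack(200, 80, 20, 300, 30, 20))          # -50
```

A negative hold slack like this is typically fixed by adding delay on the data path, whereas setup is fixed by reducing data path delay (upsizing, restructuring logic) or, carefully, by adding capture clock latency.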

Flat vs Hier Designs!

Introduction

The disadvantages of flat runs are longer run times, higher memory requirements, and the inability of EDA tools to handle designs beyond a certain gate count. The design under discussion had two blocks, so effectively we had two blocks and a chip_top to pay attention to. Figure 1 is the floorplan in such a case. As you can see, there are two huge partitions in the design, with a huge channel between them. Also, there are certain design requirements, such as signals from either partition must not cross over the other partition.