More details on CTS

In this article I will write about cts goals and how to implement.

Eligibility to start CTS:

  • Placement should be completed with acceptable congestion, timing qor and drc’s.
  • high fanout nets should be buffered already like reset, scan enable etc.
  • check_design -checks pre_clock_tree_stage.
  • Check_clock_trees: checks for clock definitions, propagations, ref cell list, cap and tran constraints, dont touch nets etc.

Goal for the cts:

  • build the clock tree structure.
  • route the clock nets.
  • optimize data path logic for setup and hold timing and drc’s.

We have 2 flows to do that:

  • classic cts: where cts is built first using skew as the goal, followed by data path optimization.
  • CCD: where cts and data path optimizations are done concurrently. Here final better setup/hold timing qor after cts is the final goal. (This is usually used for critical designs to close timing).

Why we need to focus on optimizing clock skew?

Clock skew is a phenomenon in synchronous circuits in which the clock signal (sent from the clock circuit or source or clock definition point) arrives at different components at different times due to:

  • wire-interconnect length.
  • temperature variations.
  • capacitive coupling.
  • material imperfections.

These factors become more critical for high frequency.

So, we need to ensure that the clock skew is minimal so that we can operate at high frequencies.

Comparing normal cts with ccd:

normal cts:

  • clock tree is built while ignoring the data paths.
  • Goal is to minimize skew.
  • Data path is optimized post-cts. Clock tree reamins untouched.

Step 1: Build GR clk trees, and performs inter-clock balancing, for minimum skew.

Step 2: Routes the clock nets, updates I/O latencies.

Step 3: performs the data path optimization to meet timing and DRC’s.

CCD:

  • clock tree is built with full knowledge of data path timing.
  • Goal is to meet setup/hold timing.
  • Data path is optimized along with incremental clock tree modifications by introducing useful skew.

Step 1: Builds timing driven GR clk trees, and performs inter clock balancing to improve setup/hold and DRC’s.

Step 2: Routes the clk nets, updates I/O latencies.

Step 3: optimize data path logic and simultaneously opt clock logic to further improve setup/hold/drcs.

For realistic timing analysis, the effect of CRP should be removed. The pessimism is removed in setup by adding the CRP in the capture path and by subtracting for hold.

How changes in common clock path between two registers affect setup/hold and how changes in uncommon clock path affect setup/hold timing?

When any pair of launching and capturing flop have some portion of clock path as common, the difference between the max and min delay of that common clock segment is referred to as common path pessimism.

Setup check would be most critical when clock reaches the launching flop late and capturing flop early; and the data path takes more delay.

Hold check would be most critical when clock reaches the launching flop early, and capturing flop late and data path takes less delay.

figure1

So, while doing setup analysis, the clock tree buffers in the launch path would be derated by +5%, and in capture path by -5% and data path with +5%.

For hold analysis, the launch would be derated by -5%, capture with +5% and data with -5%.

figure2

If we have common clock path, let us see how the evaluation changes. Ideally we would like to take +5% while considering launch and -5% while considering capture. but, how can the same buffer or set of buffers be derated differently for launch and capture? It does not make sense to consider different delays for same buffers.

For setup, we take CRPR, but for hold since it is same clock edge check we don’t consider CRPR.

Discuss this with a friend if Possible

Local skew optimization:

  • performs timing aware clustering.
  • optimizes local skew directly for setup and hold timing critical pairs.
  • can relax skew target where possible to save block buffer area.

Local skew is the difference in the arrival of clock signal at the clock pin of related flops. Related flops meaning flops which have valid timing paths. Global skew is the difference in the arrival of clock signal at the clock pin of non related flops. This also defined as the difference between shortest clock path delay and longest clock path delay reaching two sequential elements.

Positive Skew:

If capture clock comes late than launch clock then it is called +ve skew. Clock and data both travel in same direction. +ve skew can lead to hold violation. +ve skew improves setup time.

Negative Skew:

If capture clock comes early than launch clock it is called –ve skew. Clock and data travel in opposite direction. When data and clock are routed in opposite then it is negative skew. -ve skew can lead to setup violation. -ve skew improves hold time.

With relaxed skew target we will save area on the clock trees but you may need to add extra hold buffers on the data path. Without this feature, you will increase clock area and save some data path hold buffers. This is a trade-off.

By default, CCD skews all registers for timing, to prevent latency adjustment on boundary registers we need to set optimize boundary timing false.

In early stage of a project, I/O timing constraints are often not well defined for all blocks, so conservative constraints may be used. Since CCD is focused on creating “useful skew” to meet timing, this can cause exceptionally large skews on I/O registers. To prevent this, CCD can be disabled for I/O or boundary registers. CTS will perform normal cts on these boundary registers so their latency will be close to other avg latencies of all registers.

Written on September 19, 2020