Part2 on ICG cells

PD compliers does not balance ICG cells because they are not synchronization points on the clock tree. ICG cells are intermediate points on the clock tree, so the clock arrival times cannot be balanced. CTS balances the flip-flops located downstream.

The clock pin arrival time on an ICG cell is primarily driven by the fanout of that ICG cell. By reducing the fanout, the ICG cell is located further downstream in the clock tree because less buffering is needed. With a large fanout after ICG cell, a subtree must be build at its output. This subtree increases latency after the ICG cell. As a result, the ICG cell is now located upstream in the clock tree.

Handling the EN logic of Critical ICGs

After CTS, many timing paths that end at clock gate EN pins appear. Why didn’t these paths get fixed in placement, and how can i handle them?

Pre-CTS, the register pins and clock pins of ICG see a clock latency of 0ns which models the same arrival time for both.

Post-CTS, the ICG cells are now halfway through the tree and see a latency of 800ps. However, all registers see a 1.5ns arrival time for their clock pins since they are at the leaf level of the tree.

Any paths from register to clock gate now see the difference in clock arrival times, and the pre-cts slack is degraded by 700ps.

Since the clock gate is supposed to be at the intermediate point to allow the shut-down portions of the clock tree, it is not correct to assume the clock pins of clock gates should be balanced with the registers.

These paths can be addressed in the following ways:

First, examine how far down these ICGs lie in the clock tree for POST-CTS. Whether they are near the root of the clock tree or the clock pins of the flops can influence how you handle them.

If the clock gates are roughly halfway down the clock tree, you might get the benefit by splitting (replicating) the clock gate. Splitting the clock gate creates parallel copies of the original driver, resulting more clock gate drivers with fewer loads per driver. If the splitting is done for pre-cts, then we effectively push the clock gate further down the clock tree, increasing power but improving enable timing.
If the clock gates are either at or near the bottom of the tree, splitting clock gates will unlikely offer any improvement. In this case, you should add a pre-cts clock latency value to the ICG clock pin so that you model the pre-cts latency correctly. Using above example, you would apply a -700ps latency to the clock pin of ICG during place_opt but before cts. Applying latency allows you to correctly model the slack before the actual clock gate clock arrival time is known.
If the ICG is a single “top level clock gate” , which is fed by a relatively small cone of logic, you might apply a float pin constraint to the flip-flops feeding the END signal logic to get their clocks earlier (useful skew). Sometimes this technique is the best solution for top-level clock gates because it does not impact power; splitting top-level clock gates can have a very large power impact.

Written on September 6, 2020