It’s a long way to Monte Carlo, it’s a long way to go…

How far is it to Monte Carlo from here?

Lately, there have been some questions on Monte Carlo analyses and how many runs we actually have to run. I thought that warranted some discussion.
(Thanks to the Siliconeer for some references and numbers).

The problem

The problem is this: you have an electronic circuit and you want to know if it works. You simulate first, you tape out, and you get the circuit back from the fab. Then you start to measure, and it might not work as you expected: either a complete failure due to some swapped wires or what have you, or softer errors like mismatch creating skews, gain errors, and so on. That’s an old story which we all recognize.

Typically, you would verify as much as possible before submitting your expensive design. You run all the corners, all the use cases, all the temperatures, all the statistical variations, etc.

But wait… there is the keyword: all. How large is the number “all”?

That is a rhetorical question, and we all (pun intended) kind of know that we cannot run all test cases before taping out. It is simply not possible from a timing point of view. We would miss the famous market window, or Ph.D. window, or what-have-you window.

Unfortunately, we will see that there is no direct answer, unless we make quite a few assumptions or have a priori knowledge about e.g. standard deviations. Otherwise, it has to be an iterative process.

Statistical analysis

What you do instead is run a statistical analysis on a set of simulation results and deduce the performance from that. For example, you run, say, 100 transient simulations of your analog-to-digital converter (ADC) and find an effective number of bits (ENOB) of, say, 6.7. How reliable is this number? Is it 6.7-ish? Or between 6.5 and 6.9, or? In each run a random error, or deviation, or mismatch, or variation, is added that changes the outcome slightly. The results are then compiled in a histogram or so.

[Figure: histogram of the simulated parameter values from the Monte Carlo runs]

That is where the confidence interval comes into play. We have to express the result in terms of how likely it is that the true value lies between this and that. We should say:

It is 99% likely that the true ENOB is between 6.5 and 6.9.

Which means that there is still a small probability that the ENOB is actually outside that range. How do these ranges and values depend on the number of samples in the Monte Carlo analysis?

Crunch some numbers

In some sense, the answer is “the square root of n”. The more samples you take, the more accurate the prediction becomes, but the accuracy only improves with the square root of n. If you need to go from 1-% accuracy to 0.1-% accuracy, you have to increase the number of samples by a factor of 100 (approximately).
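To make the square-root scaling concrete, here is a small MATLAB toy experiment. All the numbers are made up (no circuit data): it simply repeats the whole “Monte Carlo analysis” many times for different n and checks how much the estimated mean wobbles from experiment to experiment.

% Toy illustration: the spread of the estimated mean shrinks like 1/sqrt(n).
rng(1);                          % reproducible random numbers
M_true = 6.7;  S_true = 0.1;     % made-up "true" mean and spread
nRuns  = [30 100 300 1000 3000];
spread = zeros(size(nRuns));
for k = 1:length(nRuns)
    n = nRuns(k);
    % repeat the analysis 2000 times and look at the wobble of the mean m
    m = mean(M_true + S_true*randn(n, 2000), 1);
    spread(k) = std(m);
end
loglog(nRuns, spread, 'o-', nRuns, S_true./sqrt(nRuns), '--');
legend('measured spread of m', 'S/sqrt(n)'); xlabel('n'); ylabel('std of m');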

Let’s see if we can make this statement just a bit more useful as we are more interested in what to write in the Cadence GXL form…

Let us refer to x as the parameter we are looking for in our simulations. We will use m and s for the measured mean and standard deviation, and M and S for the true mean and true standard deviation, respectively. The m and s we calculate with standard methods: m is the sum of all samples divided by the number of samples, and s is the square root of the average of the squared deviations from m:

m = \frac{1}{n} \sum_{i=1}^{n} x_i
s = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (x_i - m)^2 }
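For reference, a minimal MATLAB version of these two estimates; the x vector below is just placeholder data standing in for the simulated parameter values.

% Placeholder data, one value per Monte Carlo run.
x = 6.7 + 0.1*randn(100, 1);
n = length(x);
m = sum(x)/n;                    % same as mean(x)
s = sqrt(sum((x - m).^2)/n);     % note the 1/n convention; std(x, 1) gives the same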

Now, the following holds if we can assume a Gaussian distribution of our x samples (not really the case for low sample counts, and it also depends on the parameter we measure – it might not be Gaussian as such. For example, the ENOB can never be larger than the nominal number of bits, which implies a skew in the distribution).

-s \cdot k_c + m < M < m + s \cdot k_c

where k_c is a confidence parameter that depends on the number of samples and the level of confidence. For example, if we want to state 99-% confidence, the k_c value will be larger than for, e.g., 95%: k_{99} > k_{95}. Also, with more Monte Carlo runs, i.e., higher n, it is natural to expect k_c to decrease.
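As a sketch of how k_c behaves, the snippet below uses the large-n Gaussian approximation; the exact convention is an assumption on my part, and for small n the Student-t quantile (tinv, Statistics Toolbox) is the safer choice.

c  = 0.99;                      % confidence level
z  = sqrt(2)*erfinv(c);         % two-sided Gaussian quantile, ~2.58 for 99 %
n  = 10:10:1000;
kc = z ./ sqrt(n);              % M then lies within m +/- kc*s (approximately)
% Small-n alternative (Statistics Toolbox): kc = tinv(1 - (1-c)/2, n-1) ./ sqrt(n);
semilogx(n, kc); xlabel('number of samples n'); ylabel('k_c');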

Similarly, we might also have to estimate the true spread, i.e., standard deviation. The following is true:

k_1 \cdot s < S < s \cdot k_2

The true standard deviation has to be bounded by two parameters (also dependent on the confidence and the number of samples) times the measured standard deviation, due to the nature of the variance estimate (a sum of squared Gaussians follows a chi-square distribution). These k_j values are also functions of the number of samples: the more samples we take, the more we narrow the interval down. For example, for n = 300 samples, these values are k_1 = 0.926 and k_2 = 1.087.
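These bounds follow from the chi-square confidence interval for the variance. A minimal MATLAB sketch, assuming s is essentially the sample standard deviation (chi2inv requires the Statistics Toolbox):

c     = 0.95;                    % confidence level used for this check
n     = 300;
alpha = 1 - c;
k1 = sqrt((n-1) / chi2inv(1 - alpha/2, n-1));   % lower bound factor: k1*s < S
k2 = sqrt((n-1) / chi2inv(alpha/2,     n-1));   % upper bound factor: S < k2*s
% For n = 300 this lands at roughly 0.93 and 1.09, i.e., in the neighbourhood
% of the values quoted above.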

Get to the point

So, what does this tell us? It says that, after millions and millions (in the limit, an infinite number) of devices have been fabricated, we will eventually see a Gaussian distribution:

N( M, S)

where we, in advance, do not know more about M and S than what is stated above, and the degree of uncertainty is given by the number of samples we run and the confidence level we choose. This means that the true distribution could actually be the following (!) in the worst case (within our level of confidence):

N( -k_c \cdot s + m, k_2 \cdot s)

This is shown graphically below: the real Gaussian might thus move a bit away from the measured mean as well as become wider (and it could of course be shifted to the other side instead).

[Figure: measured Gaussian vs. the worst-case shifted and widened Gaussian]

For the sake of clarity, we can plot these coefficients as functions of the number of samples. The picture below shows the curves for an increasing number of samples. To get the scales in place, the k_1 and k_2 coefficients were centered around 1. The curves are for 99-% confidence.

[Figure: confidence coefficients as functions of the number of samples, 99-% confidence]

Yield

And that is not the whole story either. We quite often talk about yield in our circuits: we want to guarantee that a certain percentage of our circuits are within specification, for example p = 99%. Depending on that measure of quality, the Gaussian blobs have a significant impact too. See the picture below, and assume that the minimum and maximum parameter limits should be estimated within both the confidence and the yield requirements. We have to consider the worst-case Gaussians in both directions, i.e., the Gaussians with the widest standard deviation and with the minimum and maximum mean values within the degree of confidence. The pink areas are samples that will be discarded as they do not meet the specification. The limits can be calculated with the inverse error function (erfinv or so). For example, a yield of roughly 95% would put x_min and x_max 2 sigma away from the mean value, a yield of 68% would put them 1 sigma away, and a yield of 99% requires roughly 2.6 sigma.

[Figure: worst-case Gaussians and the x_min / x_max specification limits, discarded samples in pink]
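As a quick numeric check of those sigma distances, assuming a symmetric, two-sided specification on a Gaussian:

p      = [0.68 0.95 0.99];        % target yields
kYield = sqrt(2)*erfinv(p)        % -> roughly 1.0, 2.0 and 2.6 sigma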

How do we work out some useful numbers from this? We have three knobs in our analysis/design method: the number of samples (n), the yield (p), and the confidence. Confidence and yield probably have to be set by your project manager or product manager; there is not much you can do about them as a designer. However, you should be able to motivate the number of samples before you press the big, red tape-out button, given the minimum and maximum specifications on the parameter, x_min and x_max.

Let us normalize the distributions, which gives us the following requirement, here written towards x_max; the one towards x_min is symmetric (and we avoid any discussion about ending up in the wrong region, etc., and assume an equal number of failed chips on each side). For simplicity, assume that the specification is symmetric around a point x_0 at “distances” of \delta x, e.g., x_{max} = x_0 + \delta x. After dividing both sides by x_0 (assuming it is not zero), we get something like this:

1 + \frac{\delta x }{ x_0} -\frac{m}{ x_0 }  > ( \text{erfinv}(  \frac{ 1 + p }{ 2} ) \cdot k_2 + k_c )  \cdot \frac{ s}  { x_0 }

Let us now also assume that the specification extreme points are 10% away. We get something like

\frac{ 1.1 -  {m}/{ x_0 }}{s / x_0 } > \Phi (n,c,p)

or

{m}  + \Phi (n,c,p) \cdot {s   } < 1.1 x_0 = x_{max}

if we assume symmetry. If we run the simulations and plot \Phi (n,c,p) as a function of the number of samples, we get the following graph for 99% confidence and 99% yield. It converges to (approximately) 2, due to the yield requirement. For lower numbers of samples, we need a stricter scaling.
[Figure: \Phi(n,c,p) as a function of the number of samples, 99-% confidence and 99-% yield]
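To make the check concrete, here is a minimal MATLAB sketch of the inequality above. The function name enoughSamples is mine, and the expressions for k_c and k_2 follow the (assumed) conventions sketched earlier, so treat the numbers as indicative rather than exact.

% Save as enoughSamples.m. Returns true if the measured mean m and standard
% deviation s, from n runs, meet the spec limit xmax with yield p at
% confidence c, per the inequality above.
function ok = enoughSamples(m, s, n, xmax, p, c)
    kc  = sqrt(2)*erfinv(c) / sqrt(n);             % uncertainty of the mean
    k2  = sqrt((n-1) / chi2inv((1 - c)/2, n-1));   % worst-case widening of s
    Phi = erfinv((1 + p)/2) * k2 + kc;             % the Phi(n,c,p) above
    ok  = (m + Phi*s) < xmax;
end

With p = c = 0.99, Phi indeed settles just below 2 for large n, in line with the graph above.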

“Concluding”

Given the formulas above, there is no direct answer to how many samples you can/should run. It depends on the measured mean value, the standard deviation, your specification, and the yield requirements. The graph above shows, for 99-% yield, 99-% confidence, and 10-% symmetric specification points, the required coefficient with which the measured standard deviation has to be scaled in order to evaluate whether enough samples have been taken or not.

Way forward

A more accurate approach would, however, be to simply run a hypothesis test, see how likely it is that x is between x_min and x_max, and see how that varies with the number of samples, n. I might return to that one day when I’ve refreshed my statistics knowledge. We want to be able to state, with c-% confidence, that the following is true:

P(x_{min} < x < x_{max}) > p

where p is the yield. What would be the required number of samples to do that prediction?

Further on, given the results above, it would make sense to have an adaptive Monte Carlo analysis: run a few samples first, check the results, and then continue to run until the above requirement is met.
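A minimal sketch of such an adaptive loop could look like the following, where runOneSample() is a hypothetical stand-in for launching one simulation and extracting the parameter of interest, xmax is a hypothetical spec limit, and enoughSamples() is the check sketched earlier.

xmax   = 7.0;            % hypothetical upper specification limit
nBatch = 50;             % runs per batch
nMax   = 2000;           % total simulation budget
x = [];
while length(x) < nMax
    xNew = arrayfun(@(k) runOneSample(), (1:nBatch)');   % run one more batch
    x    = [x; xNew];
    m = mean(x);  s = std(x, 1);  n = length(x);
    if enoughSamples(m, s, n, xmax, 0.99, 0.99)
        break;           % spec met within confidence: stop early
    end
end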

And of course we have to assume that our models are adding the variations in a sufficiently random and correct way. But that’s another story…

Early in the morning …

Over the last few weeks, quite a few questions have popped up with respect to the Early voltage (or Early effect) for sub-micron CMOS transistors. Why, one could wonder… The Early voltage is perhaps more widely used for BJTs; in the CMOS case we more often use the terminology output conductance or channel-length modulation in some form. Therefore, I will use the term Early voltage, VE, a bit sloppily throughout the text.
So basically, this post is just an orientation around a small test bench, with some comments. The ambition in the end is a bit larger and requires some more work; I will return to that in a future post.

Where do we start?

Consider the expression for the drain current in the saturation region: 

I = \alpha \cdot \left( V_{GS} - V_T \right) ^2 \cdot \left( 1 + \lambda V_{DS} \right)  

where the channel-length modulation effect is indicated by lambda times the drain-source voltage. Lambda is then, roughly, the inverse Early voltage. Now, the equation above is still an approximation and not really valid for small geometries. Several of the parameters need modification for it to become more accurate.

However, we do not really need to be too accurate in hand calculations. If we want a very accurate calculation, we have to spend way too much time and it often does not really help us. Instead we rely on trends (increasing this yields that, decreasing that yields this, etc.). To obtain accurate results we use the simulator.

Assume now we connect our transistor as in the testbench below. The ambition is to measure the current through the drain of the transistor. 

[Figure: transistor test bench]

If we sweep the drain-source voltage and plot the current, we would get a behavior like the one below

[Figure: drain current vs. drain-source voltage, ideal (blue) and realistic (red) curves]

In the picture, we show both the (arguably) “wanted” blue curve, where the current is independent of the drain-source voltage in the saturation region (to the right of the dashed line), and the red curve, which indicates a more realistic case where the current in the linear region is not quite linear, and where the current in the saturation region depends on the drain-source voltage. The latter is undesired, since it gives a transistor with a non-zero output conductance: the current depends on the voltage across the output terminals, and its functionality as a buffer is therefore poor. We can also interpret this in terms of the intrinsic gain.
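For reference, the two curves can be reproduced from the first-order expression with a few lines of MATLAB. All the numbers below are made up, and the (1 + lambda*V_DS) factor is applied in both regions purely to keep the curve continuous, as a hand-calculation convenience.

alpha = 0.5e-3;  VT = 0.4;  VGS = 0.8;  Vov = VGS - VT;    % made-up values
VDS   = 0:0.005:1.2;
figure; hold on;
for lambda = [0 0.3]                      % 0: ideal "wanted" curve, 0.3: realistic slope
    Ilin = alpha*(2*Vov*VDS - VDS.^2) .* (1 + lambda*VDS);  % triode region
    Isat = alpha*Vov^2                .* (1 + lambda*VDS);  % saturation region
    ID   = Ilin.*(VDS < Vov) + Isat.*(VDS >= Vov);
    plot(VDS, ID);
end
hold off; xlabel('V_{DS} [V]'); ylabel('I_D [A]');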

A comment here is also that velocity saturation affects the result. We will, however, not talk about that for the time being, and we will avoid other third-order effects in our argument as well. We want to start from the simple first-order expression and use that, but we will make some modifications to the channel-length modulation, i.e., the inverse Early voltage.

Let’s look at the picture again, but now with a line extrapolated from the current to the point where it crosses the x axis.

[Figure: drain current with the extrapolated line crossing the x axis]

That crossing point gives the Early voltage. From the picture it is quite obvious that the extrapolation is sensitive: there can be large variations depending on how we fit that line. For the ideal, wanted, blue case, we see that the Early voltage is infinitely large. For small geometries, the crossing point approaches the y axis rapidly, and for, say, 45-nm processes the Early voltage is down at a few volts. Setting the current expression to zero gives the crossing point:

I = 0 \rightarrow \alpha \cdot \left( V_{GS} - V_T \right) ^2 \cdot \left( 1 + \lambda V_{DS} \right) = 0 \rightarrow \lambda V_{DS} = -1 \rightarrow V_{DS} = -1/\lambda = -V_E

Simulating

Further on, “within” the same process, the drain current varies a lot with transistor width and length, and with drain and gate voltages. (Some of these variations are of course natural per se, but I refer to the template function above.) With shorter and shorter lengths, the template changes. These short-channel effects typically make the channel-length modulation depend inversely on the channel length, so it cannot be considered a constant:

\lambda \sim  1 / L

In the picture below, we have plotted the currents when varying the length from 100 to 1000 nm and the width from 0.5 to 10 um. The gate voltage is swept from 0 to 1 V.

[Figure: simulated drain currents for swept lengths, widths, and gate voltages]

To clarify the graphs a bit, we can normalize the current with respect to the width/length ratio. This gives us perhaps a slightly more understandable graph. We see more distinct groups in the family of curves, and we can see that for some of the curves the slopes are quite different. Compare the center curves with the top ones, for example.

[Figure: drain currents normalized by the W/L ratio]

Where to go from here?

Now, say that we accept that the formula above is just a bit wrong, within the pretty large tolerances that hand calculations give anyway. Can we make it just slightly more useful/reliable?

First, we can differentiate the current with respect to the output voltage (using the equation above), i.e., form the first-order linearized model. From that, we can find lambda and hence also the Early voltage. We get something like

\lambda = \frac{\frac{dI_D}{dV_{DS}} / I_D }{1 - \frac{dI_D}{dV_{DS}} \cdot V_{DS} / I_D }  \rightarrow V_E = \frac{1}{\lambda} = \frac{1 - \frac{dI_D}{dV_{DS}} \cdot V_{DS} / I_D }{\frac{dI_D}{dV_{DS}} / I_D }
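If the simulated I_D-versus-V_DS data is exported to a text file, the expression can be evaluated numerically in MATLAB along these lines; the file name and column layout are assumptions on my part.

data = load('id_vs_vds.txt');           % hypothetical export: col 1 = V_DS, col 2 = I_D
VDS  = data(:,1);  ID = data(:,2);
g    = gradient(ID, VDS);               % numerical dI_D/dV_DS
lam  = (g./ID) ./ (1 - (g./ID).*VDS);   % channel-length modulation per the formula
VE   = 1./lam;                          % Early voltage
% (the triode/subthreshold regions give ill-conditioned values, as noted below)
plot(VDS, VE); xlabel('V_{DS} [V]'); ylabel('V_E [V]');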

From the simulated data we have at hand, we can manipulate the results into a whole lot of family curves describing lambda, in various shapes. By inspecting the curves, we also quite clearly see that the Early voltage increases with longer channels. In fact, for this particular 65-nm process, it varies between 0.2 (!) and 15 V. (From the graph, we also see some “ill-conditioned” cases, i.e., subthreshold operation, as well as cases where the maximum output conductance has not yet been reached. The relationship between sizes and voltages is also a bit ill-conditioned.)
[Figure: family of Early-voltage curves (Virtuoso Visualization & Analysis XL screenshot)]
Compare this to the gain expression of the transistor (assuming small-signal parameters):

A_0 = \frac{g_m}{g_{ds}} = \frac{ 2 I_D / \left( V_{GS}-V_T \right) }{ \lambda I_D } = \frac{2 V_E}{V_{GS}-V_T  } = \frac{2 V_E } { V_{eff}}

where V_{eff} is the effective gate voltage. From this formula, one would think that you should be able to crank out a gain of some 2 times 15 over 0.2 V = 150 times, to use the rule of thumb. This is not entirely true, since the effective gate voltage is quite high for the upper graphs… At least we now have ballpark numbers from this simple analysis. If we want to study the formula above in a bit more detail, we end up in the gm-over-ID design methodology. That’s probably a topic for another post too.

Going MATLAB on the curves

Now, the idea is to do some curve fitting… Already, using a Taylor expansion, we can for example see that there is a square component hiding in there. What if we take the whole shebang into MATLAB, play around a bit with different models, and fit them to the curves using least squares?

Use your favourite printing command:

ocnPrint(?numberNotation "scientific" ?output "lambda.txt" ?from 0 ?to 1.2 ?step 0.001 1 / (deriv(i("/Mtrans/d" ?result "dc")) / i("/Mtrans/d" ?result "dc") / (1 - ((deriv(i("/Mtrans/d" ?result "dc")) / i("/Mtrans/d" ?result "dc")) * v("/vDrain" ?result "dc")))))

and massage it a bit in emacs (remove all the unnecessary characters, etc., and make it MATLAB-friendly) and then in libreoffice calc. The result is a nice table with 336 by 1201 values (!).
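On the MATLAB side, something along these lines is what I have in mind; the file name, the row/column layout, and the choice of fitting range are assumptions, to be adapted to the actual export.

VE  = dlmread('lambda_clean.txt');          % hypothetical: 336 rows x 1201 sweep points
VDS = linspace(0, 1.2, size(VE, 2));        % V_DS from 0 to 1.2 V, step 0.001
plot(VDS, VE);                              % the whole family of curves
% first model attempt: least-squares polynomial fit of one curve,
% skipping the ill-conditioned low-V_DS part
c = polyfit(VDS(200:end), VE(1, 200:end), 2);
hold on; plot(VDS, polyval(c, VDS), 'k--'); hold off;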

Let us go MATLAB on the curves and return with results in another post …

[Figure: the curves imported into MATLAB]

Silly script: Create OCEAN desVars

As we are now ramping up some of the projects for the TSTE16 course at our university, I wanted to propose a small snippet to get your design variables from MATLAB into Cadence (or, more exactly, into the OCEAN script used to run Spectre). The idea is to have MATLAB as the master in these projects. This also implies that you want to set any design variables from MATLAB rather than hacking multiple files and risking losing track of what you’ve been doing.

There are many ways to do this:
(1) One option is to use the Simulink interface provided by MATLAB and Cadence, but that is a bit over the top in this case.
(2) Another option is to create all the required netlists in the Cadence ADE environment. This actually generates multiple files in your simulation directory. The “full” file used by spectre is normally called “input.scs”. Further on, if you press the play button (Cadence 6) and study the log file, you can see in the text which spectre command is run. (This command is also stored in the runSimulation “script” that Cadence keeps in that directory.) The input.scs file is just a concatenation of several files, such as netlistHeader, netlist, and netlistFooter, but also some dot-files, like .modelFiles and .designVariables. The latter is a text version of your variables that you can alter if you like. Then concatenate the files into input.scs accordingly, and run. This requires some more hands-on work and offers fewer post-processing options.

(3) Or assume we have already created the Cadence test bench and generated an OCEAN script with a bunch of desVar definitions in it. We can replace those rows with a load command, e.g.

(loadi "tste16ProjDesVars.il")

which you have stored somewhere accessible.
Then make sure you can generate that file from MATLAB, like for example:


desVar(1).Name = 'vccr12';
desVar(1).Value = 1.2;

desVar(2).Name = 'vLo';
desVar(2).Value = 0.1;

desVar(3).Name = 'vHi';
desVar(3).Value = 1.1;

desVar(4).Name = 'compGain';
desVar(4).Value = 1000;

fClk = 1e6;
desVar(5).Name = 'fClk';
desVar(5).Value = fClk;

sigLength = 8;
desVar(6).Name = 'sigLength';
desVar(6).Value = sigLength;

desVar(7).Name = 'tStop';
desVar(7).Value = sigLength/fClk;

desVar(8).Name = 'daisyDemoTopReadFile';
desVar(8).Value = '/home/jacobw/TSTE16/testIn.txt';

desVar(9).Name = 'daisyDemoTopWriteFile';
desVar(9).Value = '/home/jacobw/TSTE16/testOut.txt';

%% This part would then be in a separate function.
% Note: the file name must match the one used in the loadi command above.
fid = fopen('tste16ProjDesVars.il','w');
fprintf(fid, ';; ==================================== \n');
fprintf(fid, ';; Design variables generated by MATLAB \n');
fprintf(fid, ';; ==================================== \n');
for M = 1:length(desVar)
    if ischar(desVar(M).Value)
        % String values (e.g. file names) need escaped quotes in OCEAN.
        fprintf(fid, '(desVar "%s" "\\"%s\\"")\n', desVar(M).Name, ...
            desVar(M).Value);
    else
        % Numeric values are printed as plain numbers.
        fprintf(fid, '(desVar "%s" %f)\n', desVar(M).Name, ...
            desVar(M).Value);
    end
end
fprintf(fid, ';; ==================================== \n');
fclose(fid);    % fclose, not close, for file handles

Here I have used structs to store the intermediate design variables. You can also store strings in the design variables to pass file names or similar on to Verilog-A blocks in Cadence.
This gives you a file like


;; ====================================
;; Design variables generated by MATLAB
;; ====================================
(desVar "vccr12" 1.200000)
(desVar "vLo" 0.100000)
(desVar "vHi" 1.100000)
(desVar "compGain" 1000.000000)
(desVar "fClk" 1000000.000000)
(desVar "sigLength" 8.000000)
(desVar "tStop" 0.000008)
(desVar "daisyDemoTopReadFile" "\"/home/jacobw/TSTE16/testIn.txt\"")
(desVar "daisyDemoTopWriteFile" "\"/home/jacobw/TSTE16/testOut.txt\"")
;; ====================================

Happy hacking!