October 11, 2005
Basic Functions: FORECAST, TREND, GROWTH
In this tutorial, we will look at three related functions that can be used to predict future unknown values - based on existing data.The FORECAST function takes a set of X-Y data pairs and returns a single Y value - given it's corresponding X-value. The assumption is that the data is linear in behaviour.
In the example below, we have sales data for the first eight months of 2004. We want to predict (forecast) the sales figures for July 2005.
The first argument is the X value for which we need a forecast. The second and third arguments are the known Y values and known X values respectively.

The TREND function issimilar to FORECAST. A linear relationship is assumed between the X and Y data sets.
Howver, TREND is an array function and returns an array of unknown Y values.
The frst two arguments to TREND are the know X and Y data sets. The third argument is the array of X values for which we wish to predict the corresponding Y data.

As you can see from the chart below, OOo Calc ignore anomalous data as best it can in predicting future values for Y.

Like TREND, the GROWTH function returns an array of predicted values. The difference being that exponential growth is assumed. Otherwise the format is the same.

Again, by charting the known and predicted data, you can decide if OOo Calc has made a reasonable prediction based on the data provided to it.

Posted by Dave at 10:37 PM | Comments (0) | TrackBack
October 03, 2005
Lognormal Distribution
The lognormal distribution is defined with reference to the normal distribution. A random variable is lognormally distributed if the logarithm of the random variable is normally distributed.The lognormal distribution is commonly used for general reliability analysis, cycles-to-failure in fatigue, material strengths and loading variables in probabilistic design. Another advantage of the lognormal distribution is that it is positive-definite, so it is often useful for representing quantities that cannot have negative values. Lognormal distributions have proven useful as distributions for rainfall amounts, for the size distributions of aerosol particles or droplets, and for many other cases.
The following formula can be used to generate lognormal random data
=LOGINV(RAND(),meanlog,sdlog)
This is illustrated in the example sheet below...

The formula in C6 is =LOGINV(RAND();$C$2;$C$3). Drag and fill for the rest of the table.
Just to convince ourselves that we have lognormally distributed data, we take each value in the lognormally distributed data set and calculate the natural log - below.

Here, the formula in cell H6 is =LN(C6) Drag and fill as before.
The frequency distribution table & corresponding chart for our lognormally distributed data.

For the corresponding normally distributed data, we see anaverage of 1.972 and standard deviation of 1.05 - which is expected - given our initial parameters for the lognormally distributed data.

Posted by Dave at 11:25 PM | Comments (0) | TrackBack
September 27, 2005
Normal Distribution IV: Skewness
Plotting the frequency distribution of our random data set is quite straightforward and allows us to visually evaluate how symmetric about the mean our data is.We will also look at a function that measures how symmetric a data set is.
We have already met the FREQUENCY function in a previous tip
In the example below, we set up our intervals in B22:B30 and the FREQUENCY array function is added to C22:C30
The final step is to create a chart from the newly created frequency table.

A function that measures how symmetrically distributed a set of data is is SKEW. For perfectly symmetric data, the SKEW function returns 0.
In the plot below, the data is tailing off to the right - giving a positive skew. This is confirmed by the value of the SKEW function.

In the next tip, we will show how to generate a skewed set of random data - known as the Lognormal distribution
Posted by Dave at 08:55 PM | Comments (0) | TrackBack
September 26, 2005
Normal Distribution III
In previous tips, we showed how to generate random input data that was distributed according to our requirements. In particular, we looked at the Normal Distribution here and hereThere is an even easier way to generate our normally distributed data - and that is to use the NORMINV function. NORMINV returns the inverse of the normal cumulative distribution. Given a cumulative probability, a mean, and a standard deviation, NORMINV returns the value that cuts off the cumulative probability. The arguments to the function are :
NORMINV(number, mean, stdev) where
- number represents the probability value used to determine the inverse normal distribution.
- mean represents the mean value in the normal distribution.
- stdev represents the standard deviation of the normal distribution.
In the example below, we set the mean to 0.0 and the stdev to 1.0. As you can see, the generated data set comes pretty close to our requirements.

Next, we will plot a frequency graph of our data.
Posted by Dave at 08:13 PM | Comments (0) | TrackBack
April 25, 2005
Ranking and sorting data I
Computing the rank order of entries in an array of data is needed in many applications.There are a number of builtin OOo functions that deal with the ranking and ordering of data. We will look at four - RANK, PERCENTRANK, PERCENTILE, and LARGE
In the example below, we see two variants of RANK wherein the entries can be ordered in descening or ascending order. The first two arguments of RANK are the value to be ranked and the array that contains the data. By default, with these two arguments, the entries will be ranked in descending order. The optional third argument - when set to 1 - ranks the entires in ascending order.
The PERCENTRANK function assigns a percentage value to a given value based on a 100% range between the minumum and maximum values of the specified range. If you don't format these results as a percentage value, the output from PERCENTRANK will lie betwen 0 and 1.
The PERCENTILE function accepts a percentage value as input and returns the percentile of data values in an array. A percentile returns the scale value for the array which goes from the smallest (%age=0) to the largest value (%age=1) of the array. The 50% percentile of an array is the same as the MEAN.
Posted by Dave at 11:18 PM | Comments (0)
April 18, 2005
Regression Analysis III : LINEST
The third solution to our linear curve fitting problem is to use the LINEST function.LINEST is an array function - returning an array of parameters extracted from the provided X-Y data.
In the example below, we opt for a minimized set of return parameters and we get the slope and y-intercept values as shown. C29:C31 are the original formulae we saw before. C35:D35 is the array output from LINEST. Remember to use Ctrl-Shift-Enter when applying this formula.
If array functions make you break out in sweat, you can embed LINEST inside an INDEX to extract whatever return parameter you want. This is also include in the example in C37:C38. Note that INDEX is not an array function - but it likes to work with array functions.
Posted by Dave at 11:07 AM | Comments (0)
April 13, 2005
Regression Analysis II : Basic functions, charting
In a continuation from the previous tip, we look at some basic built-in functions for determining trendline coefficients of a X-Y plot.For a series of X-Y values that we suspect have a linear relationship, we can determine the slope and y-intercept values of the linear approximation using the builtin functions SLOPE() and INTERCEPT as showb below. Compare these values to those we obtained when performig our regression analysis without using the builtin functions.
Now, plotting X-Y data is straightforward. Adding a trendline takes a little more work. We start off with the basic X-Y plot.
First, select the chart for editing by right-clicking anywhere inside the chart boundary. Now, before we add the trendline, we need to select the data series for editing. Move the cursor over one of the data points - you will see popup info about the nearest data point.
Left-click and the data series is now selected (below)
Now right-click and select Object Properties (below)
The Data Series dialog opens up. Select the Statistics tab (below). Select the Linear Regression curve and click OK
The modified X-Y plot with the newly added trend line.
Posted by Dave at 02:58 AM | Comments (6)
April 12, 2005
Regression analysis I : Basic linear formulas
As requested by Joerg in Germany, I am going to cover the topic of regression analysis over the next few days - including formulas, built-in functions and charting with trendlines.Consider a set of data point pairs - which suggests a possible linear relationship between the two variables.
The equations below are used to calculate the slope m and y-intercept point b for a given set of data, as well as the correlation coefficient r.
Given the formula above, it is a straightforward process to extract the linear coefficients for the given set of data points.
Next, we will look at buil-in OOo Calc functions and the plotting of the trendline for this linear regression example.
Posted by Dave at 10:15 PM | Comments (1)
January 22, 2005
Random numbers: Normal Distribution II
We present a simple macro that returns a random number with normal distribution.randomize
Dim PI As Double
PI = 3.1412
RandNormal = sqr(-2 * log(1-rnd())) * cos(rnd() * 2 * PI)
End Function
To verify the correctness of our macro, we invoke this function many times and calculate the average and standard deviation of the resulting random data. For normally distribution, we are loking for an average close to 0.0 and a standard deviation close to 1.0

Posted by Dave at 05:59 AM | Comments (0)
September 22, 2004
Random numbers: Normal Distribution I
The RAND function returns a random number with even distribution between 0 and 1. Here are examples of how to use this function...- For a random number of even distribution between 0 and 100 =RAND()*100
- For a random number of even distribution between -10 and 10 =(RAND()*20)-10
Posted by Dave at 05:11 AM | Comments (0)
September 03, 2004
Generating the nth roots of a number
The builtin OOo Calc function SQRT returns the square root of it's numerical argument. There are no
builtin functions for cube roots or higher - but you can use the exponentiation operator to derive any
order of root - as is illustrated below.
As you may notice, the parentheses are required in the example - because of the higher order of
precedence of the "^" operator over "/". If you are not sure of the precedence rules - it is better
to add the parentheses anyway - as it makes for more legible equations.

Posted by Dave at 06:20 AM | Comments (1)