Archive for the ‘Math & Statistics’ Category

Basic Functions: FORECAST, TREND, GROWTH

Tuesday, October 11th, 2005

In this tutorial, we will look at three related functions that can be used to predict future unknown values - based on existing data.

The FORECAST function takes a set of X-Y data pairs and returns a single Y value - given it’s corresponding X-value. The assumption is that the data is linear in behaviour.

In the example below, we have sales data for the first eight months of 2004. We want to predict (forecast) the sales figures for July 2005.

The first argument is the X value for which we need a forecast. The second and third arguments are the known Y values and known X values respectively.

trend_1.jpg

The TREND function issimilar to FORECAST. A linear relationship is assumed between the X and Y data sets.

Howver, TREND is an array function and returns an array of unknown Y values.

The frst two arguments to TREND are the know X and Y data sets. The third argument is the array of X values for which we wish to predict the corresponding Y data.

trend_2.jpg

As you can see from the chart below, OOo Calc ignore anomalous data as best it can in predicting future values for Y.

trend_3.jpg

Like TREND, the GROWTH function returns an array of predicted values. The difference being that exponential growth is assumed. Otherwise the format is the same.

trend_4.jpg

Again, by charting the known and predicted data, you can decide if OOo Calc has made a reasonable prediction based on the data provided to it.

trend_5.jpg

Lognormal Distribution

Monday, October 3rd, 2005

The lognormal distribution is defined with reference to the normal distribution. A random variable is lognormally distributed if the logarithm of the random variable is normally distributed.

The lognormal distribution is commonly used for general reliability analysis, cycles-to-failure in fatigue, material strengths and loading variables in probabilistic design. Another advantage of the lognormal distribution is that it is positive-definite, so it is often useful for representing quantities that cannot have negative values. Lognormal distributions have proven useful as distributions for rainfall amounts, for the size distributions of aerosol particles or droplets, and for many other cases.

The following formula can be used to generate lognormal random data

=LOGINV(RAND(),meanlog,sdlog)

This is illustrated in the example sheet below…

lognormal_1.jpg

The formula in C6 is =LOGINV(RAND();$C$2;$C$3). Drag and fill for the rest of the table.

Just to convince ourselves that we have lognormally distributed data, we take each value in the lognormally distributed data set and calculate the natural log - below.

lognormal_2.jpg

Here, the formula in cell H6 is =LN(C6) Drag and fill as before.

The frequency distribution table & corresponding chart for our lognormally distributed data.

lognormal_3.jpg

For the corresponding normally distributed data, we see anaverage of 1.972 and standard deviation of 1.05 - which is expected - given our initial parameters for the lognormally distributed data.

lognormal_4.jpg

Normal Distribution IV: Skewness

Tuesday, September 27th, 2005

Plotting the frequency distribution of our random data set is quite straightforward and allows us to visually evaluate how symmetric about the mean our data is.

We will also look at a function that measures how symmetric a data set is.

We have already met the FREQUENCY function in a previous
tip

In the example below, we set up our intervals in B22:B30

and the FREQUENCY array function is added to C22:C30

The final step is to create a chart from the newly created frequency table.

<rand_2.jpg

A function that measures how symmetrically distributed a set of data is is SKEW. For perfectly symmetric data, the SKEW function returns 0.

In the plot below, the data is tailing off to the right - giving a positive skew. This is confirmed by the value of the SKEW function.

skew_2.jpg

In the next tip, we will show how to generate a skewed set of random data - known as the Lognormal distribution

Normal Distribution III

Monday, September 26th, 2005

In previous tips, we showed how to generate random input data that was distributed according to our requirements. In particular, we looked at the Normal Distribution here and here

There is an even easier way to generate our normally distributed data - and that is to use the NORMINV function. NORMINV returns the inverse of the normal cumulative distribution. Given a cumulative probability, a mean, and a standard deviation, NORMINV returns the value that cuts off the cumulative probability.
The arguments to the function are :

NORMINV(number, mean, stdev)
where

  • number represents the probability value used to determine the inverse normal distribution.
  • mean represents the mean value in the normal distribution.
  • stdev represents the standard deviation of the normal distribution.

By using the RAND function with multiple calls to NORMINV, we can obtain a nrmally distributed input data set.

In the example below, we set the mean to 0.0 and the stdev to 1.0. As you can see, the generated data set comes pretty close to our requirements.

rand_1.jpg

Next, we will plot a frequency graph of our data.

Regression Analysis III : LINEST

Monday, April 18th, 2005

The third solution to our linear curve fitting problem is to use the LINEST function.

LINEST is an array function - returning an array of parameters extracted from the provided X-Y data.

In the example below, we opt for a minimized set of return parameters
and we get the slope and y-intercept values as shown. C29:C31 are the original formulae we saw before. C35:D35 is the array output from LINEST. Remember to use Ctrl-Shift-Enter when applying this formula.

If array functions make you break out in sweat, you can embed LINEST inside an INDEX to extract whatever return parameter you want. This is also include in the example in C37:C38. Note that INDEX is not an array function - but it likes to work with array functions.

linreg3_1.jpg

Regression Analysis II : Basic functions, charting

Wednesday, April 13th, 2005

n a continuation from the previous tip, we look at some basic built-in functions for determining trendline coefficients of a X-Y plot.

For a series of X-Y values that we suspect have a linear relationship, we can determine the slope and y-intercept values of the linear approximation using the builtin functions SLOPE() and INTERCEPT as showb below. Compare these values to those we obtained when performig our regression analysis without using the builtin functions.

linreg2_1.jpg

Now, plotting X-Y data is straightforward. Adding a trendline takes a little more work. We start off with the basic X-Y plot.

linreg2_2.jpg

First, select the chart for editing by right-clicking anywhere inside the chart boundary. Now, before we add the trendline, we need to select the data series for editing. Move the cursor over one of the data points - you will see popup info about the nearest data point.

linreg2_3.jpg

Left-click and the data series is now selected (below)

linreg2_4.jpg

Now right-click and select Object Properties (below)

linreg2_5.jpg

The Data Series dialog opens up. Select the Statistics tab (below). Select the Linear Regression curve and click OK

linreg2_6.jpg

The modified X-Y plot with the newly added trend line.

linreg2_7.jpg

Regression analysis I : Basic linear formulas

Tuesday, April 12th, 2005

As requested by Joerg in Germany, I am going to cover the topic of regression analysis over the next few days - including formulas, built-in functions and charting with trendlines.

Consider a set of data point pairs - which suggests a possible linear relationship between the two variables.

The equations below are used to calculate the slope m and y-intercept point b for a given set of data, as well as the correlation coefficient r.

linreg1.jpg

Given the formula above, it is a straightforward process to extract the linear coefficients for the given set of data points.

linreg2.jpg

Next, we will look at buil-in OOo Calc functions and the plotting of the trendline for this linear regression example.

Random numbers: Normal Distribution II

Saturday, January 22nd, 2005

We present a simple macro that returns a random number with normal distribution.
(more…)

Random numbers: Normal Distribution I

Wednesday, September 22nd, 2004

The RAND function returns a random number with even distribution between 0 and 1. Here are examples of how to use this function…

(more…)

Generating the nth roots of a number

Friday, September 3rd, 2004

The builtin OOo Calc function SQRT returns the square root of it’s numerical argument. There are no
builtin functions for cube roots or higher - but you can use the exponentiation operator to derive any
order of root - as is illustrated below.

As you may notice, the parentheses are required in the example - because of the higher order of
precedence of the “^” operator over “/”. If you are not sure of the precedence rules - it is better
to add the parentheses anyway - as it makes for more legible equations.

roots.jpg