As requested by Joerg in Germany, I am going to cover the topic of regression analysis over the next few days – including formulas, built-in functions and charting with trendlines.

Consider a set of paired data points that suggests a possible linear relationship between the two variables.

The equations below are used to calculate the slope m and y-intercept point b for a given set of data, as well as the correlation coefficient r.

Given the formulas above, it is straightforward to extract the linear coefficients for the given set of data points.
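For readers who want to check the numbers outside the spreadsheet, here is a minimal Python sketch of the standard least-squares formulas for m, b and r. I am assuming these are the same formulas shown in the post; the function name `linear_regression` and the sample data are made up for illustration.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope m, intercept b, correlation r) for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Standard least-squares slope and intercept.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    # Pearson correlation coefficient.
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return m, b, r

# Hypothetical sample data, not the values from the spreadsheet in the post.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b, r = linear_regression(xs, ys)
print(f"m = {m:.3f}, b = {b:.3f}, r = {r:.4f}")
```

A value of r close to 1 or -1 suggests the straight line is a good description of the data.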

Next, we will look at the built-in OOo Calc functions and the plotting of the trendline for this linear regression example.

The MROUND function rounds a value to the nearest multiple that you specify. It is particularly useful in financial and commerce applications where rounding to the nearest nickel, dime or quarter is required. This function is only available if the Analysis AddIn is installed.

To round the value in A1 to the nearest nickel, we use =MROUND(A1; 0.05).

Unlike the equivalent Excel function, MROUND in Calc allows negative numbers.

MROUND() can even be used to round a time to the nearest quarter hour. An example of this is shown below.
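To make the idea concrete, here is a rough Python sketch of "round to the nearest multiple". It is not Calc's implementation: the helper names `mround` and `round_to_quarter_hour` are my own, and I use `Decimal` with half-up rounding as an assumption about how ties should be handled.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

def mround(value, multiple):
    """Round value to the nearest multiple (ties round away from zero)."""
    v = Decimal(str(value))
    m = Decimal(str(multiple))
    if m == 0:
        return Decimal("0")
    # Count how many whole multiples fit, rounding half up, then scale back.
    steps = (v / m).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * m

def round_to_quarter_hour(t):
    """Round a datetime to the nearest 15 minutes using mround."""
    minutes = t.hour * 60 + t.minute + t.second / 60
    rounded = int(mround(minutes, 15))
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(minutes=rounded)

print(mround(1.23, 0.05))                                   # nearest nickel
print(round_to_quarter_hour(datetime(2005, 1, 1, 9, 8)))    # nearest quarter hour
```

In a spreadsheet, times are stored as fractions of a day, so a quarter hour would presumably be expressed as a multiple of 1/96 rather than 15 minutes.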

It has been decided to merge the Mungits and Wingdings product lines. We will use Paste Special to merge the sales totals in the spreadsheet below.

Step 1: Select C6:F6 and Edit – Copy

Step 2: Select C5:F5 and Edit – Paste Special

The Paste Special dialog opens. Here we select the Add operation, which adds the pasted data to the existing contents of the selected cells.

Having merged the data, we can safely delete the Wingdings row in the table.
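In code terms, the steps above amount to an element-by-element addition of one row onto another. The short Python sketch below shows the idea; the function name `paste_special_add` and the sales figures are hypothetical, and I am assuming row 5 holds Mungits and row 6 holds Wingdings.

```python
def paste_special_add(copied, target):
    """Add each copied value to the matching target cell, like Paste Special with Add."""
    if len(copied) != len(target):
        raise ValueError("copied and target ranges must be the same size")
    return [t + c for c, t in zip(copied, target)]

# Hypothetical quarterly totals; the figures from the post are not reproduced here.
mungits = [120, 135, 150, 160]    # target row (C5:F5)
wingdings = [30, 25, 40, 35]      # copied row (C6:F6)

merged = paste_special_add(wingdings, mungits)
print(merged)
```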

This is a topic that you will best learn by trying out stuff for yourself – but here I summarize the basics…

There are a number of built-in functions that make use of regular expressions. Here, we use COUNTIF to illustrate some of the possibilities.

Regular expressions are a system for matching patterns in strings. They provide a very powerful set of tools for finding particular words or combinations of characters.

Based on my investigation, the following is a summary of the allowable constructs in OOo Calc.

Most characters match themselves. There are exceptions (see below). COUNTIF(A1:A100; “foo”) counts the cells in the specified range that match the string “foo”.

“.” matches any single character. COUNTIF(B2:B18; “..”) counts all cells with exactly two characters.

“*” is a special character that matches zero or more occurrences of the previous expression. COUNTIF(B2:B18; “.*e”) counts all cells that end in “e”.

“+” is a special character that matches one or more occurrences of the previous expression. COUNTIF(B2:B18; “.+m.+”) includes “Tmmy” and “name”, but not “my”.

A string of characters enclosed in square brackets ([]) matches any one character in that string. COUNTIF(B2:B18; “[efg].*”) counts all cells starting with e, f, or g.
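If you want to experiment with these patterns outside Calc, the Python sketch below imitates COUNTIF with a regular expression. It assumes, as the examples above suggest, that the pattern must match the whole cell, so it uses `re.fullmatch`. The function name `countif_regex` and the cell contents are invented, and Python's matching is case-sensitive, which may differ from your Calc settings.

```python
import re

def countif_regex(cells, pattern):
    """Count cells whose entire content matches the regular expression."""
    compiled = re.compile(pattern)
    return sum(1 for cell in cells if compiled.fullmatch(cell))

# Hypothetical cell contents standing in for a range such as B2:B18.
cells = ["foo", "name", "my", "Tmmy", "edge", "fee", "go", "me"]

print(countif_regex(cells, "foo"))      # literal match
print(countif_regex(cells, ".."))       # exactly two characters
print(countif_regex(cells, ".*e"))      # ends in "e"
print(countif_regex(cells, ".+m.+"))    # an "m" with something on both sides
print(countif_regex(cells, "[efg].*"))  # starts with e, f or g
```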

At first glance, the TEXT function may not sound very useful. Here are a few simple applications.

In the top example, without the TEXT function, the total is not formatted correctly (as a currency). However, the TEXT function allows us to apply the correct format to the number component of the string.

In the second example, we see how the TEXT function allows us to reformat existing data – such as dates.
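The same two ideas, formatting a number inside a string and reformatting a date, look like this in a short Python sketch. The function names, the "Total:" label and the chosen formats are my own assumptions, since the original screenshots are not reproduced here.

```python
from datetime import date

def total_label(amount):
    """Build a label with the number formatted as currency (two decimals, thousands separator)."""
    return "Total: $" + format(amount, ",.2f")

def reformat_date(d):
    """Render a date as day/month/year with leading zeros."""
    return d.strftime("%d/%m/%Y")

print(total_label(1234.5))
print(reformat_date(date(2005, 3, 9)))
```

Without the explicit format, concatenating the raw number would give something like "Total: $1234.5", which is the problem the TEXT function solves in the spreadsheet.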

Version 2.0 of OOo Calc will feature significant improvements to the DataPilot functionality. For this reason, we will revisit this powerful feature over the next few days.

First, the basics…

Consider the table of data below. There are numerous ways we may want to analyse this data. By using the DataPilot functionality in OOo Calc, we can perform the different analyses quite easily – with a bit of practice.

We first select the table we wish to analyze, then choose Data-DataPilot-Start, as illustrated below.

We are first presented with the popup dialog below. We have already selected the data we are interested in, so we just select OK.

The next dialog box allows us to choose the layout of the DataPilot table. In our example, we are interested in knowing how each broker performed in each exchange.

Normally, at least one of the columns in our data source is numerical data that will be processed against the variables we are interested in.

Next, we decide how the data is processed against the input variables. By default, the data is SUMmed, but that may not always be what we want. This dialog is activated by double-clicking on the data field button.

Finally, we get the output DataPilot table (or Pivot Table in Excel speak). In this example, OOo Calc places it just below the original data.
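Conceptually, the DataPilot groups the rows by one field, spreads them across columns by another, and sums a numeric field in each cell. The Python sketch below shows that idea only; it is not how Calc implements it, and the function name `pivot_sum`, the field names and the trade data are all hypothetical.

```python
from collections import defaultdict

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for easy printing.
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical broker/exchange data, not the table from the post.
trades = [
    {"broker": "Adams", "exchange": "NYSE", "value": 100.0},
    {"broker": "Adams", "exchange": "LSE", "value": 50.0},
    {"broker": "Baker", "exchange": "NYSE", "value": 70.0},
    {"broker": "Adams", "exchange": "NYSE", "value": 25.0},
]

table = pivot_sum(trades, "broker", "exchange", "value")
for broker, by_exchange in table.items():
    print(broker, by_exchange)
```

Swapping the sum for a count or an average corresponds to changing the function in the data field dialog.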

In this tutorial, we will look at three related functions that can be used to predict future unknown values – based on existing data.

The FORECAST function takes a set of X-Y data pairs and returns a single Y value for a given X value. The assumption is that the data is linear in behaviour.

In the example below, we have sales data for the first eight months of 2004. We want to predict (forecast) the sales figures for July 2005.

The first argument is the X value for which we need a forecast. The second and third arguments are the known Y values and known X values respectively.
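As a rough illustration of what a linear forecast does, here is a Python sketch that fits a least-squares line to the known data and evaluates it at the requested X value. I am assuming FORECAST uses an ordinary least-squares fit; the function name `forecast` simply mirrors the argument order described above, and the data is made up.

```python
def forecast(x, known_ys, known_xs):
    """Predict y at x from a least-squares line through the known points."""
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x

# Hypothetical monthly sales: month number as X, sales as Y.
months = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [100, 110, 125, 130, 145, 150, 165, 170]
print(round(forecast(12, sales, months), 1))
```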

The TREND function is similar to FORECAST. A linear relationship is assumed between the X and Y data sets.

However, TREND is an array function and returns an array of predicted Y values.

The first two arguments to TREND are the known Y and known X data sets, in that order. The third argument is the array of X values for which we wish to predict the corresponding Y data.

As you can see from the chart below, OOo Calc ignores anomalous data as best it can when predicting future values for Y.

Like TREND, the GROWTH function returns an array of predicted values. The difference is that exponential growth is assumed; otherwise the format is the same.

Again, by charting the known and predicted data, you can decide if OOo Calc has made a reasonable prediction based on the data provided to it.
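To see how the two array functions relate, here is a Python sketch that implements a linear trend and an exponential growth prediction on top of the same line fit. It assumes the usual approach of fitting the exponential curve by running a linear regression on the logarithm of the Y values, which may differ in detail from what Calc does; the helper names and the data are illustrative only.

```python
from math import exp, log

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trend(known_ys, known_xs, new_xs):
    """Predict y for each new x assuming a linear relationship."""
    slope, intercept = fit_line(known_xs, known_ys)
    return [intercept + slope * x for x in new_xs]

def growth(known_ys, known_xs, new_xs):
    """Predict y for each new x assuming exponential growth (all y must be > 0)."""
    slope, intercept = fit_line(known_xs, [log(y) for y in known_ys])
    return [exp(intercept + slope * x) for x in new_xs]

# Hypothetical data: the same X values with a linear and a doubling series.
xs = [0, 1, 2, 3]
print(trend([1, 3, 5, 7], xs, [4, 5]))
print(growth([1, 2, 4, 8], xs, [4, 5]))
```

Charting both sets of predictions against the known data is a quick way to judge which assumption suits your figures.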

I have plans to implement a Keno game in the OOo Basic language, but first a simple Lotto number generator. If you go into the source, you can configure it to your own requirements. Generating a random number between 1 and 49 is really simple. Making sure that number has not already been picked slightly complicates matters.
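The logic is easy to sketch in Python, although the original is written in OOo Basic and is not reproduced here. The function name `lotto_numbers` and the default of six numbers per draw are my assumptions; only the 1 to 49 range comes from the post.

```python
import random

def lotto_numbers(count=6, highest=49, rng=random):
    """Pick `count` distinct random numbers between 1 and `highest`, sorted."""
    if count > highest:
        raise ValueError("cannot pick more distinct numbers than are available")
    picked = []
    while len(picked) < count:
        n = rng.randint(1, highest)
        # Skip any number that has already been drawn.
        if n not in picked:
            picked.append(n)
    return sorted(picked)

print(lotto_numbers())
```

In Python you could also get the same result in one call with `random.sample(range(1, 50), 6)`; the loop above is written out to show the duplicate check explicitly.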

The COUNTBLANK function counts the number of blank or empty cells in a specified range.

It is used as shown below in the example. Cells C3, B5 and D5 are empty, so the COUNTBLANK function in D7 returns 3.

You may have a situation where the array is quite large and it is important that each cell has a value. To make it easier to trap blank cells, you can nest COUNTBLANK inside an IF function…
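The Python sketch below mirrors that combination: count the blanks, then branch on the result. The function names, the messages and the grid contents are hypothetical; I have laid the grid out as B3:D5 with C3, B5 and D5 empty, which is only my guess at the range used in the example.

```python
def countblank(rows):
    """Count cells that are empty (None or an empty string)."""
    return sum(1 for row in rows for cell in row if cell is None or cell == "")

def check_complete(rows):
    """Return 'OK' when no cell is blank, otherwise a short warning."""
    blanks = countblank(rows)
    return "OK" if blanks == 0 else f"{blanks} blank cell(s)"

# Assumed layout of B3:D5, with C3, B5 and D5 left empty.
grid = [
    [10, None, 30],    # row 3
    [40, 50, 60],      # row 4
    [None, 80, None],  # row 5
]

print(countblank(grid))
print(check_complete(grid))
```

Note that a cell containing 0 is not counted as blank here.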

Conditional formatting would also help you spot blank cells.

Conditional Formatting: Comparing two Lists

We will use conditional formatting to identify unique entries between two lists.

In the sheet below, we have two lists of US States. We would like to identify the list entries that are unique to each list. This technique can also be applied to two versions of the same list to identify additions/removals.

Step 1: Select B2:B19 and open the Conditional Formatting dialog – below. We have already defined a formatting style that has a darker background.

In the Conditional Formatting dialog, select the “Formula is” option from the drop-down list on the left and enter the formula as displayed below. Choose the predefined style. Exit the dialog.

We apply similar conditional formatting to the D2:D19 array. I will leave that as an exercise for the reader.

Below is the result after the conditional formatting is applied. Those cells that are shaded dark do not have a matching entry in the other list.
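The comparison itself boils down to asking, for each entry, whether it appears anywhere in the other list. Here is a small Python sketch of that test; the function name `unique_entries` and the shortened lists of states are made up, and the formula used in the dialog is not reproduced here.

```python
def unique_entries(list_a, list_b):
    """Return the entries found only in list_a and only in list_b, in original order."""
    set_a, set_b = set(list_a), set(list_b)
    only_a = [item for item in list_a if item not in set_b]
    only_b = [item for item in list_b if item not in set_a]
    return only_a, only_b

# Hypothetical, shortened lists standing in for B2:B19 and D2:D19.
list_b_column = ["Ohio", "Texas", "Utah", "Iowa"]
list_d_column = ["Texas", "Iowa", "Maine"]

only_first, only_second = unique_entries(list_b_column, list_d_column)
print(only_first)
print(only_second)
```

The entries returned are the ones that would be shaded dark in each column.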
