Data Validation: Lists

October 16th, 2005

For each cell, you can define in advance what type of contents is valid for that cell. This allows you to guide users through data entry in OpenOffice.org Calc by restricting cells to receive specific values and ranges.

In this tutorial, we will see how to restrict the value of a cell or range of cells to a specified list.

In the example below, we would like to restrict the values of two fields - State and Sex.

valid_1.jpg

First select the column with the State values. Invoke the Validity dialog with Data - Validity as shown below.

valid_2.jpg

There are three tabs in the Validity dialog. First, we define the criteria to be applied to the selected cells.

By default, all values are allowed. We will define a list which will be displayed when the user selects the cell for input.

valid_3.jpg

Because the list of valid states will be displayed from which the user can choose, we can dispense with an input help message. The Error Alert tab allows us to define an error message/action to be generated if erroneous value is inputted.

valid_4.jpg

We click OK to exit the Validity dialog. We are now ready to see how effective this feature is in validating the data being input in the table.

As you can see, selecting a cell in the state column results now in a drop-down list from which the user selects the desired value.

valid_5.jpg

We can also define other lists where desired/required - for example the Sex column - as shown below.

valid_6.jpg

Basic Functions: FORECAST, TREND, GROWTH

October 11th, 2005

In this tutorial, we will look at three related functions that can be used to predict future unknown values - based on existing data.

The FORECAST function takes a set of X-Y data pairs and returns a single Y value - given it’s corresponding X-value. The assumption is that the data is linear in behaviour.

In the example below, we have sales data for the first eight months of 2004. We want to predict (forecast) the sales figures for July 2005.

The first argument is the X value for which we need a forecast. The second and third arguments are the known Y values and known X values respectively.

trend_1.jpg

The TREND function issimilar to FORECAST. A linear relationship is assumed between the X and Y data sets.

Howver, TREND is an array function and returns an array of unknown Y values.

The frst two arguments to TREND are the know X and Y data sets. The third argument is the array of X values for which we wish to predict the corresponding Y data.

trend_2.jpg

As you can see from the chart below, OOo Calc ignore anomalous data as best it can in predicting future values for Y.

trend_3.jpg

Like TREND, the GROWTH function returns an array of predicted values. The difference being that exponential growth is assumed. Otherwise the format is the same.

trend_4.jpg

Again, by charting the known and predicted data, you can decide if OOo Calc has made a reasonable prediction based on the data provided to it.

trend_5.jpg

Charting: Creating a Candlestick chart

October 10th, 2005

I was intrigued to see that OOo Calc supported the creation of candestick charts.

What follows is not so much a tutorial on how to effectively make use of the feature in OOo Calc - but rather a summary of my experiences in trying to generate some meaningful charts from historical stock data available on the Web.

You are probably familiar with candlestick charts. One of the attractions is that in a single graphic, four useful stock metrics are presented, opening price, closing price, intraday hgh, and intraday low.

With no information or documentation to work with, I figured that in order to create a candlestick chart, I would need these four aforementioned metrics for my selected stock over a timeperiod that was of interest to me - say the last two months. This information is readily available online at sites like Yahoo - see below.

Step 1: From the Yahoo page below, I selected all columns of the historical prices table - except adjusted closing price - for Freescale (FSL) - about two months worth of data. Even though Volume is not required for the candlestick chart, I grabbed it anyways as it is a key metric in technical analysis of stocks.

candle_1.jpg

Step 2: I pasted the table into Ooo Calc as shown below. I use OOo 2.0 on Win XP and this direct copy/paste manouever from browser to spreadsheet went without a hitch. On other systems, your mileage may vary. Ultimately, we would want some mechanism whereby the stocks would be updated automatically at the close of trading each day. This would certainly be possible with some macro programming.

Step 3: I noticed that the table was arranged with the most recent data at the top. This would map into a candlestick moving from left to right - which is incorrect. I selected the table and then Data - Sort to bring up the Sort dialog. I then chose to sort by Column B - Ascending. The imported and flipped data is shown below.

candle_2.jpg

Step 4: Select all columns except the volume and open the chart wizard with Insert - Chart

First page: Both first row and first column selected as labels…

candle_3.jpg

Second page: We select the Stock chart type…

candle_4.jpg

Third page: The variant of the stock chart type is Stock Chart 2 - otherwise known as the candlestick chart. I also selected Y-axis grid lines for better legibility. A shortcoming of OOo Calc is the lack of control one has with the X-axis. More about that later.

candle_5.jpg

Fourth page: I added a basic title anddisabled the legend and axis titles for the moment.

candle_6.jpg

My first pass. What needs immediate attention is the X-axis - which is useless in it’s present state. Let us fix that…

candle_6a.jpg

The one way to ‘fix’ the x-axis is to create an alternate column in our table that will be converted more legibly in the final chart. With a bit of experimentation, I came up with the arrangement below. When regenerating the chart, ignore the column on the far left and use the manually created column to it’s right.

candle_6b.jpg

The result of my first attempt at a candlestick chart. This leaves a lot to be desired - but it is certainly promising. Certainly, we would like to automate this whole process by using macros. An automated daily update of the table and chart would also be very useful. Stay tuned!

candle_7.jpg

Paste Special III: Merging rows

October 9th, 2005

It has been decided to merge the Mungits and Wingdings product lines. We will use Paste Special to merge the sales totals in the spreadsheet below.

pspecial2_1.jpg

Step 1: Select C6:F6 and Edit - Copy

Step 2: Select C5:F5 and Edit - Paste Special

pspecial2_2.jpg

The Paste Special dialog opens. In this situation, we apply the ADD operator to the paste operation which has the effect of adding the pasted data to the existing contents of the selected cells.

pspecial2_3.jpg

Having merged the data, we can safely delete the Wingdings row in the table.

pspecial2_4.jpg

Random sort

October 6th, 2005

Unlike Excel, OOo Calc does not have random sort feature. It is straightforward to work around this.

Consider the table below - we wish to randomly sort the rows.

randsort_1.jpg

Step 1: Add a random number for each row in a seperate column as shown below.

randsort_2.jpg

Step 2: Select all columns and invoke the sort dialog as shown below…

randsort_3.jpg

We will be sorting our original two columns based on the column of random numbers…

randsort_4.jpg

The final result. You can delete the column of random numbers - they have served their purpose.

randsort_5.jpg

Charting: Creating a Gantt chart

October 5th, 2005

If you are familiar with project management and the various tools that are used to schedule deadlines and resources, you may be familiar with Gantt charts.

In this particular tutorial, we look at using stacked bar charts

First, we need to tabulate our project data. The numbers in columns D & E are total days completed & estimated days remaining in each task respectively.

gantt_1.jpg

Selecting the above table, we invoke the chart wizard with Insert - Chart.

Both the first row and column are selected as labels..

gantt_2.jpg

Select the Bar Chart type - below…

gantt_3.jpg

Of the different bar chart types, we choose the stacked variant - below…

gantt_4.jpg

Here, we add a meaningful title..

gantt_5.jpg

The ‘finished’ chart. We have some postprocessing further to perform before it looks like a Gantt chart. Select it for editing…

gantt_6.jpg

The blue bar above represents the start date of each task. We will make it disappear. Fist double click on any blue bar to invoke the Data Series editor below. Set the area fill to None and remove the border.

gantt_7.jpg

This is what the chart looks like with the blue bars removed …

gantt_8.jpg

Now select the Y axis for editing as shown below…

gantt_9.jpg

We now want to adjust the Scale settings for the Y-axis based on the timelines defined in the initial table.

The major interval (61.0) is approximately 2 months. The minor interval (1.0) is one day.

gantt_10.jpg

On the Label tab, we rotate the text to almost 270 degrees as shown below…

gantt_11.jpg

The finished Gantt chart…

gantt_12.jpg

Lognormal Distribution

October 3rd, 2005

The lognormal distribution is defined with reference to the normal distribution. A random variable is lognormally distributed if the logarithm of the random variable is normally distributed.

The lognormal distribution is commonly used for general reliability analysis, cycles-to-failure in fatigue, material strengths and loading variables in probabilistic design. Another advantage of the lognormal distribution is that it is positive-definite, so it is often useful for representing quantities that cannot have negative values. Lognormal distributions have proven useful as distributions for rainfall amounts, for the size distributions of aerosol particles or droplets, and for many other cases.

The following formula can be used to generate lognormal random data

=LOGINV(RAND(),meanlog,sdlog)

This is illustrated in the example sheet below…

lognormal_1.jpg

The formula in C6 is =LOGINV(RAND();$C$2;$C$3). Drag and fill for the rest of the table.

Just to convince ourselves that we have lognormally distributed data, we take each value in the lognormally distributed data set and calculate the natural log - below.

lognormal_2.jpg

Here, the formula in cell H6 is =LN(C6) Drag and fill as before.

The frequency distribution table & corresponding chart for our lognormally distributed data.

lognormal_3.jpg

For the corresponding normally distributed data, we see anaverage of 1.972 and standard deviation of 1.05 - which is expected - given our initial parameters for the lognormally distributed data.

lognormal_4.jpg

Normal Distribution IV: Skewness

September 27th, 2005

Plotting the frequency distribution of our random data set is quite straightforward and allows us to visually evaluate how symmetric about the mean our data is.

We will also look at a function that measures how symmetric a data set is.

We have already met the FREQUENCY function in a previous
tip

In the example below, we set up our intervals in B22:B30

and the FREQUENCY array function is added to C22:C30

The final step is to create a chart from the newly created frequency table.

<rand_2.jpg

A function that measures how symmetrically distributed a set of data is is SKEW. For perfectly symmetric data, the SKEW function returns 0.

In the plot below, the data is tailing off to the right - giving a positive skew. This is confirmed by the value of the SKEW function.

skew_2.jpg

In the next tip, we will show how to generate a skewed set of random data - known as the Lognormal distribution

Normal Distribution III

September 26th, 2005

In previous tips, we showed how to generate random input data that was distributed according to our requirements. In particular, we looked at the Normal Distribution here and here

There is an even easier way to generate our normally distributed data - and that is to use the NORMINV function. NORMINV returns the inverse of the normal cumulative distribution. Given a cumulative probability, a mean, and a standard deviation, NORMINV returns the value that cuts off the cumulative probability.
The arguments to the function are :

NORMINV(number, mean, stdev)
where

  • number represents the probability value used to determine the inverse normal distribution.
  • mean represents the mean value in the normal distribution.
  • stdev represents the standard deviation of the normal distribution.

By using the RAND function with multiple calls to NORMINV, we can obtain a nrmally distributed input data set.

In the example below, we set the mean to 0.0 and the stdev to 1.0. As you can see, the generated data set comes pretty close to our requirements.

rand_1.jpg

Next, we will plot a frequency graph of our data.

Basic Functions : FIXED

September 25th, 2005

In OpenOffice Calc, the FIXED function returns a text representation of a number rounded to a specified number of decimal places.

Read the rest of this entry »