Data Analysis

Data analysis means investigating and evaluating data, applying statistical techniques to it, and using the results to identify trends or relationships that can complement a research project or benefit an organisation. The findings can significantly improve decision making.
If a field survey is carried out, a certain amount of data will be collected. This raw data needs to be analysed so that it can be presented in a meaningful way within the body of the report.
Many statistical techniques can be used to analyse data. A good starting point, before exploring more advanced techniques, is to look at three ways of describing an average: the mean, the median and the mode.
Mean: This is the traditional way of thinking of an average: the sum of the values divided by the number of values. Normally it will be the 'sample mean', because a survey will not measure every individual within a population. For example, if a wood consists of 1000 trees, perhaps only 50 might be surveyed. The analysis is then based on that sample of 50 trees, giving a sample mean, and inferences about the whole wood are drawn from it.
Worked Example:
The heights of 10 shrubs are measured in metres as follows: 2, 1, 1, 2, 3, 4, 2, 4, 3, 2
The mean height is (sum of values)/(number of values) = 24/10 = 2.4 m
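The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the guide, and the data is the list of ten shrub heights from the worked example.

```python
from statistics import mean

# Shrub heights in metres, copied from the worked example.
heights_m = [2, 1, 1, 2, 3, 4, 2, 4, 3, 2]

total = sum(heights_m)        # sum of all the values
count = len(heights_m)        # number of values in the sample
sample_mean = total / count   # the sample mean

print(f"Sum = {total}, count = {count}, mean = {sample_mean} m")

# The standard library gives the same figure without the manual steps.
library_mean = mean(heights_m)
```

Doing the sum and the count by hand first makes it easy to see where the figure of 24/10 comes from before relying on a library function.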
Median:
If the data include rogue values (outliers), that is, values that are very high or very low compared with the rest, the 'mean' will probably produce a value that is unrepresentative of the sample.
Worked Example:
Grass height is being measured. If the survey design does not specify the type of grass (turfgrasses, agricultural grasses, ornamental grasses, or 'wild' bamboo, which can grow to 30 m (100 feet)), then readings might be taken from nine turfgrasses and one very tall bamboo, and the 'mean' average would be meaningless.
In this case it would be better to use the 'median' average, which would be representative of the intended sample targets.
The median is the middle value when the recorded data are arranged in order of size. If there is an even number of values, the median is halfway between the two middle values.
Samples might have been taken as follows (height in cm): 1, 2, 2, 4, 4, 6, 6, 8, 10, 1000 (this last figure being the 10 metre bamboo).
The middle values are 4 and 6; the median is therefore these two figures added together and divided by two = (4+6)/2 = 5 cm. This is much more representative than the 'mean', which the single 10 m bamboo reading pulls up to 1043/10 = 104.3 cm.
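The effect of a rogue value on the two averages can be seen side by side in a short Python sketch. It assumes the 10 metre bamboo is entered as 1000 cm so that all readings share one unit; the variable names are illustrative only.

```python
from statistics import mean, median

# Grass heights in cm; the final reading is the 10 m bamboo (1000 cm).
heights_cm = [1, 2, 2, 4, 4, 6, 6, 8, 10, 1000]

# The median needs the data in order of size; statistics.median sorts
# internally, but sorting here makes the two middle values visible.
ordered = sorted(heights_cm)
middle_pair = ordered[len(ordered) // 2 - 1 : len(ordered) // 2 + 1]

sample_median = median(heights_cm)  # halfway between the middle pair
sample_mean = mean(heights_cm)      # dragged upwards by the bamboo

print(f"Middle values: {middle_pair}")
print(f"Median: {sample_median} cm")
print(f"Mean:   {sample_mean} cm")
```

Replacing the bamboo reading with any other very large number leaves the median unchanged, while the mean moves with it.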
Mode:
This average describes the value that occurs most frequently. There can be more than one such value.
Using the data given in the 'mean' worked example above:
Value : Frequency (i.e. how many times that value occurs)
1 : 2
2 : 4
3 : 2
4 : 2
The mode is therefore 2, because 2 occurs the most.
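The frequency table above can be built automatically. The following is a minimal sketch using Python's collections.Counter on the shrub heights from the 'mean' worked example; the variable names are illustrative.

```python
from collections import Counter

# Shrub heights in metres from the 'mean' worked example.
heights_m = [2, 1, 1, 2, 3, 4, 2, 4, 3, 2]

# Count how many times each value occurs.
frequency = Counter(heights_m)

print("Value : Frequency")
for value in sorted(frequency):
    print(f"{value} : {frequency[value]}")

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]
print(f"Mode = {mode_value} (occurs {mode_count} times)")
```

Note that most_common(1) reports only one value even when several tie for the highest frequency, so it suits data with a single mode.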
If we use some other data in which two values share the highest frequency, for example:
Value : Frequency (i.e. how many times that value occurs)
2 : 7
4 : 3
8 : 7
10 : 4
then this would be termed bi-modal (bi = two), with the values 2 and 8 as the modes.
Where more than two values share the highest frequency, the data is termed multi-modal.
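Data with more than one mode can be handled with statistics.multimode (available from Python 3.8), which returns every value that shares the highest frequency. The sketch below rebuilds a list of readings from the second frequency table; the expansion step and the variable names are assumptions made for illustration.

```python
from statistics import multimode

# Value : Frequency pairs from the bi-modal example.
frequency_table = {2: 7, 4: 3, 8: 7, 10: 4}

# Expand the table back into individual readings, e.g. seven 2s.
readings = [
    value
    for value, count in frequency_table.items()
    for _ in range(count)
]

# multimode returns a list of all values tied for the highest count.
modes = multimode(readings)

if len(modes) == 1:
    description = "single mode"
elif len(modes) == 2:
    description = "bi-modal"
else:
    description = "multi-modal"

print(f"Modes: {modes} ({description})")
```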
Statistical tests to determine probability, correlation or causality may also be undertaken on the data; however, these are not covered in this guide.