Today we are taking a look at Frequency and Two-way tables. Frequency tables are created from raw data that have been categorised, tallied and then totalled. We can use frequency tables to calculate relative frequencies, which are useful for describing proportions. If the sample is large enough, relative frequencies can be interpreted as estimates of probabilities. Two-way tables provide information about the frequencies of two variables, and the key to solving problems of this type is to pay attention to the row and column totals. This will enable you to complete a two-way table accurately and then use the information to calculate probabilities based on the data within the table.
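As a rough illustration of both ideas, here is a minimal Python sketch. All of the numbers (the fruit survey and the travel-to-school table) are made up for the example and are not taken from any exam question.

```python
from fractions import Fraction

# Hypothetical tallied data: favourite fruit of 40 students (made-up numbers).
frequencies = {"apple": 12, "banana": 18, "orange": 10}
total = sum(frequencies.values())

# Relative frequency = frequency / total frequency.
relative = {item: Fraction(count, total) for item, count in frequencies.items()}

# Hypothetical two-way table (year group against how students travel),
# with only one inner cell known. The totals let us fill in the rest.
year7_total, walk_total, grand_total = 50, 45, 100
year7_walk = 30
year7_bus = year7_total - year7_walk       # row total minus known cell
year8_walk = walk_total - year7_walk       # column total minus known cell
year8_total = grand_total - year7_total
year8_bus = year8_total - year8_walk

# Probability that a randomly chosen student is in Year 8 and takes the bus.
p_year8_bus = Fraction(year8_bus, grand_total)

print(relative)
print(p_year8_bus)
```

The relative frequencies should always add up to 1, which is a quick way to check the table.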
Types of data and Sampling are the focus in today's countdown. You need to be able to recall the types of data and to categorise data when it is given to you. Note that there are several categories of data and your information may fit into two or more of these categories. The idea behind sampling is to select a group of people that will accurately represent the whole population when statistical analysis is carried out. There are two types of sampling to be aware of: random (each member is given a number and numbers are then chosen at random) and stratified (a proportionate number is selected from each group). When dealing with data collection, it is always important to think about Bias and whether the data will be collected in a 'fair' manner.
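The two sampling methods can be sketched in a few lines of Python. The year-group sizes and the sample size of 50 are assumptions made up for this illustration.

```python
import random

# Hypothetical population: number of students in each year group.
population = {"Year 9": 200, "Year 10": 180, "Year 11": 120}
sample_size = 50
total = sum(population.values())  # 500 students altogether

# Stratified sample: each group is represented in proportion to its size.
stratified_counts = {
    group: round(size * sample_size / total)
    for group, size in population.items()
}

# Random sample: number the Year 9 members 1 to 200, then pick numbers at random.
rng = random.Random(1)  # seeded only so the example is repeatable
year9_members = range(1, population["Year 9"] + 1)
year9_sample = rng.sample(year9_members, stratified_counts["Year 9"])

print(stratified_counts)
```

With less convenient numbers the rounded group sizes may not add up exactly to the sample size, so it is worth checking the total.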
In today's blog, we extend to diagrams used in probability: Venn diagrams and Probability trees. With Venn diagrams, we organise data into Sets, which are then drawn as overlapping circles. Only elements that have both properties are placed in the overlap. If asked to fill in a Venn diagram, try to start with the overlaps where possible. It is important to understand the key terms surrounding Venn diagrams, such as Union, Intersection and Complement.
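Python's built-in sets behave like the regions of a Venn diagram, so the key terms can be tried out directly. The universal set and the two properties below are made up for the example.

```python
# Hypothetical universal set: the whole numbers from 1 to 10.
universal = set(range(1, 11))
evens = {n for n in universal if n % 2 == 0}            # set A
multiples_of_3 = {n for n in universal if n % 3 == 0}   # set B

intersection = evens & multiples_of_3      # in A and B (the overlap)
union = evens | multiples_of_3             # in A or B (or both)
complement_of_evens = universal - evens    # not in A
outside_both = universal - union           # outside both circles

# Counts for the four regions of the diagram, starting with the overlap.
regions = {
    "overlap": len(intersection),
    "A only": len(evens - multiples_of_3),
    "B only": len(multiples_of_3 - evens),
    "outside": len(outside_both),
}
print(regions)
```

The four region counts should add up to the size of the universal set.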
With probability trees, you will be looking at two or more events happening. Remember that the probabilities on each set of branches must add up to one. When the events are combined, we multiply along the branches (do not simplify at this point) to calculate the probability that these events will all happen. When answering a question, you may need to find all the combinations that satisfy the question and then add those probabilities together. Be aware of repeated events where something is NOT replaced. This means that the denominator will be reduced in the second event, and the numerator may change too.
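Here is a minimal sketch of a two-stage tree without replacement, using Python's `Fraction` type. The bag of 3 red and 5 blue counters is an assumption made up for the example. Note that `Fraction` simplifies automatically, whereas on paper it is easier to leave the fractions unsimplified until the end.

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 5 blue counters, two taken WITHOUT replacement.
red, blue = 3, 5
total = red + blue

# First pick: the two branches must add up to 1.
p_red_first = Fraction(red, total)
p_blue_first = Fraction(blue, total)

# Second pick: one counter has gone, so the denominator drops to total - 1.
p_red_then_red = p_red_first * Fraction(red - 1, total - 1)
p_red_then_blue = p_red_first * Fraction(blue, total - 1)
p_blue_then_red = p_blue_first * Fraction(red, total - 1)
p_blue_then_blue = p_blue_first * Fraction(blue - 1, total - 1)

# "Same colour" is satisfied by two routes, so add those probabilities.
p_same_colour = p_red_then_red + p_blue_then_blue

print(p_same_colour)
```

The four end-of-branch probabilities should add up to 1, which is a useful check on the whole tree.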
Today we shift our focus to the Statistics section and the topic of Probability. Here we take an introductory look at probability and at using a sample space diagram to identify the number of outcomes. You should know that the probabilities of all the possible outcomes of an event add up to 1. Take care when calculating probabilities, as your answer can be given as a fraction (simplify it), a decimal or a percentage, so you may need to convert between them. We cover the 'or' rule for mutually exclusive events and the 'and' rule for independent events.
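A sample space diagram for two dice can be mimicked in Python by listing every outcome. This is only a sketch: the choice of two fair six-sided dice and the helper name `probability` are assumptions for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair six-sided dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Fraction of outcomes in the sample space that satisfy `event`."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_total_7 = probability(lambda outcome: sum(outcome) == 7)
p_total_11 = probability(lambda outcome: sum(outcome) == 11)

# 'Or' rule (mutually exclusive events): P(7 or 11) = P(7) + P(11).
p_7_or_11 = p_total_7 + p_total_11

# 'And' rule (independent events): P(6 then 6) = P(6) x P(6).
p_double_six = Fraction(1, 6) * Fraction(1, 6)

# Converting a fraction to a decimal and a percentage.
as_decimal = float(p_total_7)
as_percentage = as_decimal * 100

print(p_total_7, p_7_or_11, p_double_six)
```

Counting the double six directly in the sample space gives the same answer as the 'and' rule, which is a good way to convince yourself that the rule works.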
The focus for today is Scatter graphs. A scatter graph is used to show whether two sets of data are related (correlated). There are three types of correlation to watch out for: Positive, Negative and No correlation. Sometimes you will be asked to plot points on a graph. Be sure to plot these points like coordinates and to do it as accurately as possible.
When asked to extract and estimate information from the scatter graph, you will normally be marked on constructing a line of best fit, so don't forget to do this.
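In an exam the line of best fit is drawn by eye with a ruler. The sketch below swaps in the least-squares method instead, simply because a computer needs a rule to follow, so treat it as an illustration rather than the method you would use on paper. The revision data is made up.

```python
# Hypothetical data: hours of revision (x) and test score out of 10 (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares gradient and intercept of the line of best fit.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
gradient = s_xy / s_xx
intercept = mean_y - gradient * mean_x

def estimate(x):
    """Read a value off the line of best fit."""
    return gradient * x + intercept

# A positive gradient corresponds to positive correlation.
print(gradient, intercept, estimate(3.5))
```

The line passes through the point (mean of x, mean of y), which is also a sensible point to aim for when drawing the line by eye.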
On day 23, our focus is on Histograms, a special kind of 'Bar' chart for grouped data in which the bars often have different widths. Beware the trap: the y-axis is NOT frequency and should be labelled frequency density. The formula for calculating frequency density (frequency divided by class width) is covered in this snapshot, and you may need to add extra columns to the table in order to calculate it.
By rearranging the formula, you can calculate the frequency of each bar by multiplying frequency density by the bar width. This will help to fill in an incomplete data table.
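The extra columns and the rearranged formula can be sketched as follows. The class boundaries and frequencies are made up for the example.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 8), (10, 15, 12), (15, 20, 10), (20, 40, 6)]

# Extra columns: class width and frequency density = frequency / class width.
table = []
for lower, upper, frequency in classes:
    width = upper - lower
    density = frequency / width
    table.append((lower, upper, width, frequency, density))

# Rearranged formula: frequency = frequency density x class width (the bar area).
recovered = [density * width for _, _, width, _, density in table]

for row in table:
    print(row)
```

Notice that the widest class (20 to 40) has the shortest bar even though it does not have the smallest width, which is why the y-axis cannot simply be frequency.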
For day 22 we extend beyond plotting cumulative frequency graphs and look at Quartiles and box plots. Once the graph has been created, we examine it in more detail by looking at the quartiles, which are located at 25% (lower quartile), 50% (the Median) and 75% (upper quartile) of the total frequency. An important measure is the Interquartile Range (IQR), the upper quartile minus the lower quartile, which describes the spread of the middle 50% of the data.
A box plot can be drawn from a cumulative frequency diagram and is made up of five key elements: the highest and lowest values, the upper and lower quartiles (75% and 25%) and the median. Box plots become useful when we want to compare two or more sets of data.
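The five values needed for a box plot can also be found from a list of raw data. The sketch below uses Python's `statistics.quantiles`, whose default 'exclusive' method places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4. Other conventions exist and can give slightly different answers, and the data set is made up.

```python
import statistics

# Hypothetical data set of 11 values, already in order.
data = [3, 5, 7, 8, 9, 11, 13, 14, 16, 18, 21]

# With n = 11 the quartiles are the 3rd, 6th and 9th values.
lower_quartile, median, upper_quartile = statistics.quantiles(data, n=4)

# Interquartile range: the spread of the middle 50% of the data.
iqr = upper_quartile - lower_quartile

# The five key elements of a box plot.
five_number_summary = (min(data), lower_quartile, median, upper_quartile, max(data))

print(five_number_summary, iqr)
```

To compare two data sets, compute the summary for each and compare the medians (typical value) and the IQRs (consistency).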
On day 21 we move on and have a look at Cumulative frequency tables and graphs. This is essentially a running total where we add up the frequencies as we go along. It is important to check to see if your final value of the cumulative frequency matches the total frequency (normally given in the question).
When constructing a cumulative frequency graph, it is important to plot each point at the upper boundary of its group and to join the points with a smooth curve ('S' shaped). The graph is used to estimate how many values lie above or below a given value.
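The running total and the points to plot can be sketched like this. The grouped data is made up for the example.

```python
from itertools import accumulate

# Hypothetical grouped data: (upper class boundary, frequency).
groups = [(10, 4), (20, 9), (30, 15), (40, 8), (50, 4)]

upper_bounds = [upper for upper, _ in groups]
frequencies = [frequency for _, frequency in groups]

# Cumulative frequency is a running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Each point is plotted at (upper boundary of the group, cumulative frequency).
points = list(zip(upper_bounds, cumulative))

# Check: the final cumulative frequency must match the total frequency.
total_frequency = sum(frequencies)

# Number of values above 30 = total minus the cumulative frequency at 30.
values_above_30 = total_frequency - cumulative[upper_bounds.index(30)]

print(points, values_above_30)
```

For values that fall between class boundaries you would read the estimate off the curve rather than the table.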
The focus for today's snapshot is Representing data. Once data has been collected, diagrams are used to represent it so that the key points are easier to pick out without having to look at all the numbers. The most common diagrams are Pie charts, Bar charts, Line graphs and Pictograms. It is important that you follow the rules for constructing each type of diagram.
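As one example of a construction rule, each sector of a pie chart takes the same share of 360 degrees as its category takes of the total frequency. The survey numbers in this sketch are made up.

```python
# Hypothetical survey: favourite sport of 36 students.
frequencies = {"football": 18, "tennis": 12, "swimming": 6}
total = sum(frequencies.values())

# Sector angle = (frequency / total frequency) x 360 degrees.
angles = {sport: count * 360 / total for sport, count in frequencies.items()}

print(angles)
```

The angles should add up to 360 degrees, which is worth checking before you start drawing.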
On day 18, we focus on calculating averages from a simple frequency table. There will be times when you will need to create the fx column so that the overall total can be worked out. Remember that the formula for the median, (n + 1) ÷ 2, tells you the position of the median in the ordered data, not its value.
It is important to check that your answers are reasonable and fit within the data set.
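Here is a minimal sketch of the frequency table method, including the fx column and a reasonableness check. The table (number of siblings) and the helper name `value_at` are made up for the example.

```python
import math

# Hypothetical frequency table: number of siblings (x) and frequency (f).
values = [0, 1, 2, 3, 4]
frequencies = [3, 7, 10, 6, 4]

n = sum(frequencies)                                # total frequency
fx = [x * f for x, f in zip(values, frequencies)]   # the fx column
mean = sum(fx) / n                                  # total of fx / total of f

# (n + 1) / 2 gives the POSITION of the median, not its value.
median_position = (n + 1) / 2

def value_at(position):
    """Find the data value at a given position using a running total."""
    running_total = 0
    for x, f in zip(values, frequencies):
        running_total += f
        if position <= running_total:
            return x

# For a position such as 15.5, average the 15th and 16th values.
median = (value_at(math.floor(median_position)) + value_at(math.ceil(median_position))) / 2

# The mode is the value with the highest frequency.
mode = values[frequencies.index(max(frequencies))]

# Reasonableness check: every average must lie within the data set.
reasonable = all(min(values) <= average <= max(values) for average in (mean, median, mode))

print(mean, median, mode, reasonable)
```

A mean of 12 siblings from this table, for instance, would immediately signal a slip such as dividing by the number of rows instead of the total frequency.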
On day 17, we shift our focus to the Statistics topic, looking at Mean, Median, Mode and Range. We take a look at how to calculate the three averages and when it is appropriate to use each one.
Mean - When you want to include all the data. Median - When you want to reduce the effect of extreme values (outliers) in the data set. Mode - When you want the most common value; it is the only average that works for categorical (qualitative) data as well as quantitative data. Range - Not an average but a measure of spread.
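The effect of an outlier on each measure can be seen with Python's `statistics` module. The data set below is made up, with 40 included as a deliberate outlier.

```python
import statistics

# Hypothetical data set with one extreme value (40).
data = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(data)         # uses every value, so the outlier pulls it up
median = statistics.median(data)     # middle value, barely affected by the outlier
mode = statistics.mode(data)         # most common value
data_range = max(data) - min(data)   # a measure of spread, not an average

# The mode also works for categorical data, where mean and median make no sense.
favourite_colour = statistics.mode(["red", "blue", "red", "green"])

print(mean, median, mode, data_range, favourite_colour)
```

Here the mean is larger than six of the seven values, so the median gives a better picture of a typical value.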