Patterns and Trends
Data vs Evidence
- The terms data and evidence are often used interchangeably, but in scientific inquiries the terms refer to two different concepts.
- Data is essentially pure information.
- The results of a scientific inquiry (e.g. a table or spreadsheet) with no interpretation are an example of data.
- There is no context or information attached.
 
- Evidence is data with context.
- While data can exist independently, it becomes evidence once conclusions and analyses are drawn from it.
- The data becomes evidence for a statement.
- Data is only evidence when there is an opinion, viewpoint, or argument that it reinforces or refutes.
 
- Data has no meaning alone; it must be in the form of evidence to be of any use.
Statistics in Scientific Research
Mean, Median, and Standard Deviation
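- Mean refers to the average of a dataset
- Mean is calculated using $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of values
- Median refers to the middle value of the dataset, and is the term at position $\frac{n+1}{2}$ once the values are sorted
- Median is commonly represented by $\tilde{x}$
- Standard deviation is the amount of variation in a dataset
- Standard Deviation can be calculated using $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ for a sample (divide by $n$ instead of $n - 1$ for a whole population)
- Standard Deviation is represented by $s$ for a sample, or $\sigma$ for a population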
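As a quick check on the definitions above, here is a minimal sketch using Python's built-in `statistics` module. The dataset and the `summarise` helper are made up for illustration. It reports both the sample and the population standard deviation, since which one applies depends on whether your data is a sample or the whole population.

```python
import statistics


def summarise(values):
    """Return the mean, median, and both standard deviations of a dataset."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # stdev() divides by n - 1 (sample); pstdev() divides by n (population)
        "sample_sd": statistics.stdev(values),
        "population_sd": statistics.pstdev(values),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
summary = summarise(data)
print(summary)
```

With an even number of values, `statistics.median` averages the two middle terms, which matches the "position $(n+1)/2$" rule landing between two values.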
Statistics Tests
F-Test
When is it used?
- When you have 2 numerical datasets, and want to compare their variances (how much they deviate from their respective means)
What does it tell us?
- The further the result of the F-test is from 1, the stronger the evidence for unequal population variances.
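- Therefore, an F-statistic far from 1 suggests the two datasets do not share the same variance. On its own, it is a statement about variances, not a measure of correlation between the two variables.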
How is an F-statistic calculated?
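- Define the null hypothesis, $H_0: \sigma_1^2 = \sigma_2^2$ (the two population variances are equal), and the alternate hypothesis, $H_1: \sigma_1^2 \neq \sigma_2^2$
- Don’t actually write that if you’re asked. Instead, write “variable 1 is dependent on variable 2.” for the null, and “variable 1 is not dependent on variable 2.” for the alternate.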
 
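- Calculate the statistic using $F = \frac{s_1^2}{s_2^2}$, where $s_1^2$ and $s_2^2$ are the sample variances of variable 1 and variable 2 (by convention, the larger variance goes on top so that $F \geq 1$)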
How can a conclusion be drawn from the F-test?
- Use an F-statistic table to determine the critical F-value of the dataset:
- The significance/alpha level will be written above the table, usually in the form of $\alpha = 0.05$
- The numerator’s degrees of freedom (number of values of variable 1, minus 1) is along the top
- The denominator’s degrees of freedom (number of values of variable 2, minus 1) is along the left side
- You’ll be given the table for any in-class test. There’s also one here.
- If the calculated F-statistic is lower than the critical F-value, fail to reject the null hypothesis (often loosely called “accepting” it). If the calculated F-statistic is greater than the critical value, reject the null hypothesis.
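The F-test steps can be sketched in Python as below. This is an illustration rather than a definitive implementation: the two datasets are made up, the `f_statistic` helper is hypothetical, and the sketch assumes the usual convention of putting the larger sample variance on top. The critical value 6.39 is the standard table entry for $\alpha = 0.05$ with 4 and 4 degrees of freedom.

```python
import statistics


def f_statistic(sample_1, sample_2):
    """Ratio of sample variances, with the larger variance on top so F >= 1.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_1 = statistics.variance(sample_1)  # sample variance (divides by n - 1)
    var_2 = statistics.variance(sample_2)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1


group_a = [10, 12, 14, 16, 18]  # made-up example values
group_b = [13, 14, 14, 14, 15]  # made-up example values
F_CRITICAL = 6.39  # F-table value for alpha = 0.05, df = (4, 4)

f_value, df_num, df_den = f_statistic(group_a, group_b)
reject_null = f_value > F_CRITICAL
print(f_value, (df_num, df_den), reject_null)
```

Here the variances are 10 and 0.5, so the F-statistic is 20, well above the critical value, and the null hypothesis of equal variances is rejected.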
T-Test
When is it used?
- To compare the means of 2 normally distributed variables with unknown variances
- Can be used alongside the F-Test
What does it tell us?
- T-test determines whether there is a significant difference between the means of 2 groups of data.
- Results from a T-test can be used as evidence that the grouping variable has an effect on the measured variable (i.e. that the 2 variables are related).
How is the T-Statistic calculated?
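- Identify the mean of each dataset ($\bar{x}_1$ and $\bar{x}_2$), along with each standard deviation ($s_1$, $s_2$) and sample size ($n_1$, $n_2$)
- Establish a null hypothesis stating that mean 1 is equal to mean 2
- This will usually be phrased as $H_0: \mu_1 = \mu_2$
- Following this with an alternate hypothesis is also a good idea.
- Use the formula for T-test with 2 variables (shown here in the form that does not assume equal variances):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
If you only have 1 variable, use the 1-var t-test:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\mu_0$ is the value the mean is being compared against.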
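The two t-statistics can be sketched in Python as follows. The two-variable version here is the unequal-variance (Welch) form, $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$, which is an assumption on my part; your class may use the pooled-variance form instead. The function names and datasets are made up for illustration.

```python
import math
import statistics


def two_sample_t(sample_1, sample_2):
    """Welch's t: difference of the means over its estimated standard error."""
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / len(sample_1) + var_2 / len(sample_2))
    return (mean_1 - mean_2) / standard_error


def one_sample_t(sample, hypothesised_mean):
    """1-var t: (sample mean - hypothesised mean) / (s / sqrt(n))."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - hypothesised_mean) / standard_error


group_a = [10, 12, 14, 16, 18]  # made-up example values
group_b = [8, 9, 10, 11, 12]    # made-up example values

t_two = two_sample_t(group_a, group_b)
t_one = one_sample_t(group_a, 12)
print(round(t_two, 3), round(t_one, 3))
```

Swapping the order of the two samples only flips the sign of the two-variable statistic, which is why 2-tailed tests compare its absolute value against the critical value.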
How is the T-statistic interpreted?
- Usually, a p-value is used to interpret the t-statistic. However, critical T-values can also be found using yet another table.
- This time, a few extra steps are needed:
- Determine your significance/alpha level (assume $\alpha = 0.05$)
- Determine if your test is 1-tailed or 2-tailed:
- Rephrase the question as an equation (for example, from “more than 25% of packets are too heavy” to “Too heavy > 25%”)
- If the equation has “greater than” or “less than”, you need a 1-tailed t-test
- If the equation has “equals”, you need a 2-tailed t-test
- Now we move to the Empirical rule: because your data is normally distributed, you need to determine how many standard deviations from the mean your data can fall.
- For example, an alpha value of 0.05 (or 5%) means data needs to fall between $\mu - 2\sigma$ and $\mu + 2\sigma$ (roughly 95% of a normal distribution)
- If your test is 2-tailed, halve your alpha level (because you’re only looking at 1 side of a symmetrical distribution)
- Use a t-score table to determine the critical t-value.
How can conclusions be drawn?
- If the calculated t-value is greater than the critical value (for a 2-tailed test, compare its absolute value), reject the null hypothesis. Otherwise, fail to reject the null hypothesis (often loosely called “accepting” it).
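The decision step can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes you are reading critical values from a one-tailed t-table, which is why alpha is halved for a 2-tailed test. The critical values 2.306 and 1.860 are the standard table entries for 8 degrees of freedom at 0.025 and 0.05 respectively.

```python
def table_alpha(alpha, tails):
    """Alpha to look up in a one-tailed t-table: halved for a 2-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return alpha / tails


def reject_null(t_value, critical_value, tails):
    """Compare a calculated t-value against a critical value from a table."""
    if tails == 2:
        # Either direction counts, so compare the magnitude
        return abs(t_value) > critical_value
    # Assumes an upper-tailed ("greater than") 1-tailed test
    return t_value > critical_value


lookup = table_alpha(0.05, tails=2)              # column to read in the table
two_tailed = reject_null(-2.53, 2.306, tails=2)  # critical t for df = 8, 0.025
one_tailed = reject_null(1.50, 1.860, tails=1)   # critical t for df = 8, 0.05
print(lookup, two_tailed, one_tailed)
```

For a “less than” 1-tailed test, you would flip the comparison (or negate the t-value) before calling `reject_null`.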
I still don’t get it :/
Crash Course Statistics has a good video on T-tests that explains it far better than I have here.
Chi-Squared Test 
When is it used?
- To determine whether a categorical variable fits an expected distribution.
- Can only be used for discrete variables, such as frequency.
What does it tell us?
- Chi-Squared determines whether the difference between the observed and theoretical distributions is significant enough to be meaningful.
- Variation between observed and expected might be chance/randomness, but a Chi-Squared test will determine the likelihood of this.
How is Chi-Squared calculated?
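$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where $O$ is the observed frequency and $E$ is the expected frequency for each category.
How can $\chi^2$ be interpreted?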
- Identify degrees of freedom: (number of rows minus 1) times (number of columns minus 1). For a single row of categories compared against an expected distribution, use the number of categories minus 1.
- Identify alpha/significance level (usually $\alpha = 0.05$)
- Use a chi-squared table to determine the critical value.
- If the calculated chi-squared value is greater than the critical value, reject the null hypothesis. Otherwise, fail to reject it.
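A minimal sketch of the calculation in Python, using the standard statistic $\chi^2 = \sum (O - E)^2 / E$. The counts are made up, and the `chi_squared` helper is hypothetical. Because this example has a single row of 4 categories, it uses 3 degrees of freedom (categories minus 1), for which the standard table value at $\alpha = 0.05$ is 7.815.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must be the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 30, 10]  # made-up counts for 4 categories
expected = [20, 20, 20, 20]  # an even split of the same 80 observations
CHI_CRITICAL = 7.815         # chi-squared table value for alpha = 0.05, df = 3

chi_value = chi_squared(observed, expected)
reject_null = chi_value > CHI_CRITICAL
print(chi_value, reject_null)
```

The statistic here works out to about 10.4, which is above the critical value, so the observed counts differ significantly from the expected even split.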
Crash course?
Analysis of Variance (ANoVA)
When is it used?
- Analysis of Variance is used to analyse variance (🙄)
- It compares the amount of variance within each group to the variance between groups.
What does it tell us?
- If variance within each group is high, but between groups is low, the differences between groups are likely caused by chance or an external influence rather than by the grouping itself.
- If variance within groups is low but between groups is high, it’s likely that the property being measured is dependent on the group the sample was taken from.
How is ANoVA calculated?
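- Calculate the mean of each group ($\bar{x}_j$) and the overall mean of all values ($\bar{x}$)
- Calculate the SSR (sum of squares regression) using the formula $SSR = \sum_j n_j (\bar{x}_j - \bar{x})^2$, where $n_j$ is the number of values in group $j$
- Calculate the SSE (sum of squares error) using the formula $SSE = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2$
- Calculate the SST (sum of squares total) as SSR+SSE
- Calculate the F-statistic as $F = \frac{SSR / (k - 1)}{SSE / (N - k)}$, where $k$ is the number of groups and $N$ is the total number of values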
How is ANoVA interpreted?
- Conveniently, we can use the same method as we did for the F-test (even the same distribution tables)
- The significance/alpha level will be written above the table, usually in the form of $\alpha = 0.05$
- The numerator’s degrees of freedom (number of groups, minus 1) is along the top
- The denominator’s degrees of freedom (number of values across all groups, minus the number of groups) is along the left side
- You’ll be given the table for any in-class test. There’s also one here.
- If the calculated ANoVA F-statistic is lower than the critical value, fail to reject the null hypothesis (often loosely called “accepting” it). If it is greater than the critical value, reject the null hypothesis.
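The whole ANoVA calculation can be sketched in Python as below. This assumes a one-way ANoVA with the usual definitions: SSR is the between-group sum of squares, SSE is the within-group sum of squares, and $F = \frac{SSR/(k-1)}{SSE/(N-k)}$ for $k$ groups and $N$ total values. The `one_way_anova` helper and the data are made up for illustration. The critical value 5.14 is the standard F-table entry for $\alpha = 0.05$ with 2 and 6 degrees of freedom.

```python
import statistics


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F for a one-way ANoVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: each group mean against the overall mean
    ssr = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: each value against its own group mean
    sse = sum((value - statistics.mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ssr / df_between) / (sse / df_within)
    return {"SSR": ssr, "SSE": sse, "SST": ssr + sse,
            "df": (df_between, df_within), "F": f_value}


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up example values
F_CRITICAL = 5.14  # F-table value for alpha = 0.05, df = (2, 6)

result = one_way_anova(groups)
reject_null = result["F"] > F_CRITICAL
print(result, reject_null)
```

In this example the group means (2, 3, and 6) are far apart compared with the spread inside each group, so F comes out at about 13 and the null hypothesis is rejected.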
Crash Course?