Dumb Blonde Data's Incomplete Guide to Stats
Statistics Are All About Relationships
So the dictionary 📚 definition of statistics is:
“Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Statistics studies methodologies to gather, review, analyze and draw conclusions from data”
And one formula for regression:
Umm… this is where I normally stop reading, but we are going to attempt to discuss stats in a more fun, less formal way. 🤩
I will admit I’ve always had an interesting relationship with statistics. Overall, I LOVED it and got good grades, but I feel like I only half listened and just fit the puzzle pieces together in my head until it all made sense.
So WHY even try to understand all these concepts? 🤷‍♀️
Well, having a good foundation in statistical concepts increases your chances of being a good data analyst. Half the battle is looking at the data you have available, then deciding what data cleaning and analysis are necessary to solve a problem or understand the subject matter.
Alrighty, let’s dig in.
My definition of statistics is:
"A collection of quantitative methods that help you understand data points and their relationship to other numbers."
We’ll use this imaginary data as an example to explain the following concepts:
Let’s say you could access your own history with your (not so favorite) dating app.
You’re able to export a data set with each potential match’s age, neighborhood, income, gender, hair color, dog lover status, and days on the app.
You’ll also get a binary variable for whether you swiped right (1) or swiped left (0) on the match.
Within stats, there are two different categories… Descriptive vs Inferential Statistics.
Some descriptive statistics you could pull from this data set would be average age, range of time on app, max income, any outliers, sum of matches, frequencies of each gender, etc. 📊
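Here's a tiny sketch of what pulling a few of those numbers could look like in Python. The rows are totally made up for illustration (no real dating app export here), and the column order is just my assumption.

```python
from collections import Counter
from statistics import mean

# Made-up rows for illustration only:
# (age, days_on_app, income, gender, swiped_right)
matches = [
    (27, 30, 52000, "M", 1),
    (31, 120, 68000, "M", 0),
    (29, 14, 75000, "F", 1),
    (35, 200, 91000, "M", 0),
    (58, 7, 120000, "M", 1),
]

ages = [row[0] for row in matches]
days = [row[1] for row in matches]
incomes = [row[2] for row in matches]

average_age = mean(ages)                        # average age
days_range = max(days) - min(days)              # range of time on app
max_income = max(incomes)                       # max income
right_swipes = sum(row[4] for row in matches)   # adding up the 0/1 column
gender_counts = Counter(row[3] for row in matches)  # frequency of each gender

print(average_age, days_range, max_income, right_swipes, gender_counts)
```

Nothing here predicts anything. It only summarizes the data you already have, which is the whole point of descriptive statistics.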
Inferential statistics go a step further and use the data to draw conclusions, like which characteristics made you more likely to swipe right. For instance, we could run a regression analysis on the matches' characteristics with your swipe as the outcome. The results show which characteristics have a stronger statistical pull toward you swiping right, and whether that pull is strong enough to be more than chance. 🔎
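To make the regression idea a little less abstract, here's a minimal sketch with just one characteristic (dog lover or not). Since the outcome is a 0/1 swipe, I'm assuming a logistic regression, fit here with plain gradient descent so it runs without any extra libraries. The rows, the learning rate, and the number of passes are all made up for illustration, and a real analysis would use a proper stats package that also reports significance.

```python
import math

# Made-up rows for illustration only: (dog_lover, swiped_right)
rows = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

weight, intercept = 0.0, 0.0
learning_rate = 0.5  # hypothetical setting, picked for this toy data

for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in rows:
        # Difference between predicted swipe probability and actual swipe
        error = sigmoid(intercept + weight * x) - y
        grad_w += error * x
        grad_b += error
    # Step against the average gradient of the log loss
    weight -= learning_rate * grad_w / len(rows)
    intercept -= learning_rate * grad_b / len(rows)

p_dog_lover = sigmoid(intercept + weight)   # predicted P(swipe right | dog lover)
p_not_dog_lover = sigmoid(intercept)        # predicted P(swipe right | not a dog lover)
print(round(weight, 2), round(p_dog_lover, 2), round(p_not_dog_lover, 2))
```

A positive `weight` means being a dog lover pulls toward a right swipe in this toy data. With more characteristics you'd get one weight per characteristic, and comparing them is how you'd see which ones pull hardest.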
Other analyses that are important to know and understand are outlier detection, Pearson correlation, regression, and z-tests.
We’ll cover outliers today and then the rest next time!
We need to identify the outliers in our dataset and ghost them. Otherwise they are going to skew our results! That sweet older guy you dated who was great but then got a new job and moved away maybe isn't the BEST predictor of future long term love. 👻
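Luckily there’s a general rule to help us find them. If a data point sits more than 1.5 times the interquartile range (IQR) above the third quartile (Q3) or below the first quartile (Q1), then buh, bye.
To find those guys, take all those ages, work out Q1 and Q3, and get IQR = Q3 - Q1. The upper outlier cutoff is Q3 + (1.5*IQR), and the lower one is Q1 - (1.5*IQR).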
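Here's a quick sketch of that rule in Python, using made-up ages. One assumption to flag: there are several ways to calculate quartiles, and I'm using the "inclusive" method from Python's `statistics` module. Other methods or tools can give slightly different cutoffs.

```python
from statistics import quantiles

# Made-up ages for illustration only
ages = [24, 26, 27, 28, 29, 30, 31, 33, 58]

# quantiles() with n=4 returns the three cut points Q1, Q2 (median), Q3
q1, median, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

upper_fence = q3 + 1.5 * iqr   # anything above this is an outlier
lower_fence = q1 - 1.5 * iqr   # anything below this is an outlier

outliers = [age for age in ages if age > upper_fence or age < lower_fence]
kept = [age for age in ages if lower_fence <= age <= upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

With these ages, Q1 is 27 and Q3 is 31, so the IQR is 4 and the cutoffs land at 21 and 37. The 58-year-old is the only one who gets ghosted.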
Box and Whiskers Plot
You made it all the way to the end!! Good job. Now don't forget to use all the buzzwords at your next meeting: stats, IQR, quantitative, standard deviation, ghosting, DTR... oh wait, save those last ones for happy hour.
'Til Next Time 💖