This post will cover how to measure the relationship between two numeric variables with the corrr package. We will look at how to assess a variable’s distribution using skewness, kurtosis, and normality. Then we’ll examine the relationship between two variables by looking at the covariance and the correlation coefficient.

The packages we’ll be using in this tutorial are the following:

library(egg) # for ggarrange
library(inspectdf) # check entire data frame for variable types, etc.
library(tidyverse) # all tidyverse packages
library(tidymodels) # meta package for modeling
library(psych) # for skewness and kurtosis
library(socviz) # for %nin%

Set a graph theme

Graph themes give us a little customization for the graphs we’ll be producing. If you want to learn more about ggplot2, check out our tutorial here.

ggplot2::theme_set(theme_ipsum_tw(base_family = "Titillium Web",

We covered how to access data using the tuber package in a previous tutorial. For this how-to, we’ll be using two YouTube playlists. If you’d like to see the script for how we downloaded and imported these data, it’s in a Gist here.

# fs::dir_tree("data", regex = "DailyShow")
DailyShowYouTube %>% dplyr::glimpse(78)
#> Observations: 251
#> $ title "The Daily Show - Admiral General Aladeen", "The Dail…
#> $ id "yEPSJF7BYOo", "AHO1a1kvZGo", "lPgZfhnCAdI", "9pOiOhx…

The DailyShowYouTube data frame contains 9 variables. Some of these are metadata for the videos in the playlist (id, url, and published_at); others contain viewership information for each video (dislike_count, comment_count, view_count, and like_count). For more information on these variables, check out the YouTube API documentation.

In this section, we’re going to use visualizations to help us understand how much two numeric variables are related, or how much they are correlated. We’ll start by visualizing variables by themselves, then move into bivariate (two-variable) graphs.

The skimr and inspectdf packages allow us to take a quick look at an entire data frame or sets of variables. Below is a skimr::skim() of the DailyShowYouTube data frame.

The amount a variable varies represents the amount of uncertainty we have in a particular phenomenon or measurement. We use numbers like variance, standard deviation, and interquartile range to represent the ‘spread’ or dispersion of values for a particular variable. In contrast to the ‘spread’, a variable’s ‘middle’ is represented using numbers like the mean, median, and mode. These measure the central tendency. We’ll explore these topics further below in visualizations.

We want to use specific terms to describe what a variable’s distribution looks like because this gives us some precision in describing what we’re seeing.

The skimr::skim() output above shows us four numeric variables (dislike_count, comment_count, view_count, and like_count), and the hist column tells us these variables are skewed. Skewness describes a distribution that is not symmetrical. It’s hard to see in the hist column above because it’s small, so we’ll use inspectdf::inspect_num() to look at the skewness and kurtosis of these variables.

# define labs first so we are thinking about what to expect on the graph!
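To make the ideas of ‘middle’, ‘spread’, skewness, and kurtosis from this post a bit more concrete, here is a minimal sketch in base R. It is an illustration under assumptions, not the analysis from this tutorial: the view_count vector is simulated (the real DailyShowYouTube data are not reproduced here), and skewness() and excess_kurtosis() are hypothetical helper functions written for this example using the simple moment-based formulas.

```r
# Hypothetical stand-in for a count variable such as view_count:
# 251 simulated, right-skewed values (NOT the real playlist data).
set.seed(123)
view_count <- round(rlnorm(251, meanlog = 10, sdlog = 1))

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# Zero for perfectly symmetric data, positive for a long right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based excess kurtosis: fourth central moment divided by the
# squared second central moment, minus 3 (the value for a normal curve).
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Collect the 'middle', 'spread', and shape measures in one place.
summary_stats <- c(
  mean     = mean(view_count),
  median   = median(view_count),
  sd       = sd(view_count),
  iqr      = IQR(view_count),
  skewness = skewness(view_count),
  kurtosis = excess_kurtosis(view_count)
)

print(round(summary_stats, 2))
```

For a right-skewed variable like this one, the mean sits above the median and the skewness is positive. The psych package loaded earlier provides skew() and kurtosi() for the same purpose; depending on their type argument they may use a slightly different estimator than the hand-written helpers here, so small differences in the reported values are to be expected.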