univariate_continuous.Rmd
Type library(statdata)
and then, datasets in the statdata
package are avaiable.
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
#> ✓ tibble 3.1.4 ✓ dplyr 1.0.7
#> ✓ tidyr 1.1.3 ✓ stringr 1.4.0
#> ✓ readr 2.0.1 ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(statdata)
kicks_num
dataset contains number of success in the traditional Korean game, Jegichagi
.
data("kicks_num")
kicks_num <- kicks_num %>%
set_names('count')
kicks_num
#> # A tibble: 30 × 1
#> count
#> <dbl>
#> 1 32
#> 2 46
#> 3 54
#> 4 27
#> 5 16
#> 6 52
#> 7 18
#> 8 45
#> 9 47
#> 10 36
#> # … with 20 more rows
skimr
package contains skim()
function, which is an improved version of summary
function. This one line command spits out all the descriptive statstics needed to understand the continuous variable.
skimr::skim(kicks_num)
Name | kicks_num |
Number of rows | 30 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
count | 0 | 1 | 33.9 | 14.44 | 12 | 25 | 30 | 46.75 | 59 | ▇▇▃▆▅ |
We can visualize the univariate continouse variable with histogram or stem-and-leaf plot. There are various ways to visualize histogram, but the simplest way is to use hist()
.
hist(kicks_num$count)
The stem-and-leaf plot is also possible.
stem(kicks_num$count)
#>
#> The decimal point is 1 digit(s) to the right of the |
#>
#> 1 | 24
#> 1 | 67889
#> 2 |
#> 2 | 55567778
#> 3 | 23
#> 3 | 66
#> 4 | 4
#> 4 | 56778
#> 5 | 24
#> 5 | 599