Do numerical summaries by groups — do

Do numerical summaries by groups with a formula interface. Missing values are removed by default (`na.rm = TRUE`).

do_summary(
  y,
  data = NULL,
  stat = c("n", "missing", "mean", "trimmed", "sd", "variance", "min", "Q1", "median",
    "Q3", "max", "mad", "IQR", "range", "cv", "se", "skewness", "kurtosis"),
  trim = 0.1,
  type = 3,
  na.rm = TRUE
)

# S3 method for num_summaries
print(x, ..., digits = NA, format = "f", digits_sk = 2)

Arguments

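y	(formula) Names of the variables to summarize. In a two-sided formula, the variables to summarize go on the left-hand side and the grouping variables on the right-hand side (e.g. `VitC ~ Cult + Date`); a one-sided formula (e.g. `~VitC`) summarizes a variable without grouping. See Examples.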
data	data set
stat	(character) Descriptive statistics to compute. Currently supported statistics: `"n"` - number of non-missing observations, `"missing"` - number of missing observations, `"mean"` - arithmetic mean, `"trimmed"` - trimmed mean, `"sd"` - standard deviation, `"variance"` - variance, `"min"` - minimum value, `"Q1"` - 1st quartile, `"median"` - median, `"Q3"` - 3rd quartile, `"max"` - maximum value, `"mad"` - median absolute deviation from the median (see `mad()` for details), `"IQR"` - interquartile range, `"range"` - range, `"cv"` - coefficient of variation, `"se"` - standard error of the mean, `"skewness"` - skewness, `"kurtosis"` - excess kurtosis.
trim	The fraction (0 to 0.5) of observations to be trimmed from each end of the sorted variable before the mean is computed. Values of `trim` outside that range are taken as the nearest endpoint.
type	(integer: 1, 2, 3) The type of skewness and kurtosis estimate. See `psych::describe()` and `psych::mardia()` for details.
na.rm	(logical) Flag to remove missing values. Default is `TRUE`.
x	object to print
...	further arguments to methods.
digits	Number of digits for descriptive statistics.
format	(character) `"f"`, `"g"`, `"e"`, `"fg"`. Either one value or a vector of values for each column. Each value will be passed to `fun` separately. `"f"` gives numbers in the usual `xxx.xxx` format; `"e"` and `"E"` give `n.ddde+nn` or `n.dddE+nn` (scientific format); `"g"` and `"G"` put the number into scientific format only if it saves space to do so. `"fg"` uses fixed format as `"f"`, but `digits` as the minimum number of significant digits. This can lead to quite long result strings.
digits_sk	Number of digits for skewness and kurtosis.
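
The format codes listed for `format` read the same as those of base R's `formatC()`. As a rough illustration only, and assuming the print method formats each column in a comparable way (this page does not confirm that), the codes behave as follows on a made-up number:

```r
# Hypothetical value, used only to illustrate the format codes.
x <- 1234.5678

fixed      <- formatC(x, format = "f",  digits = 2)  # fixed notation, 2 decimals
scientific <- formatC(x, format = "e",  digits = 2)  # scientific notation
general    <- formatC(x, format = "g",  digits = 3)  # scientific only if shorter
fixed_sig  <- formatC(x, format = "fg", digits = 3)  # fixed, digits = significant digits

print(c(f = fixed, e = scientific, g = general, fg = fixed_sig))
```

Note that `digits` means decimal places for `"f"` and `"e"`, but significant digits for `"g"` and `"fg"`.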

Value

Data frame with summary statistics.

Examples

library(biostat)
data(cabbages, package = "MASS")

do_summary(~VitC, data = cabbages) %>%
  print(digits = 2)
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas: 
#> 
#>   # Simple named list: 
#>   list(mean = mean, median = median)
#> 
#>   # Auto named with `tibble::lst()`: 
#>   tibble::lst(mean, median)
#> 
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>   .summary_of  n missing  mean trimmed    sd variance   min    Q1 median    Q3
#> 1        VitC 60       0 57.95   57.67 10.12   102.39 41.00 50.75  56.00 66.25
#>     max   mad   IQR range   cv   se skewness kurtosis
#> 1 84.00 10.38 15.50 43.00 0.17 1.31     0.32    -0.68
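
Several of the values printed above can be cross-checked with base R. This is a sketch under assumptions: the definitions used here for `cv` (standard deviation divided by the mean) and `se` (standard deviation divided by the square root of `n`) are the conventional ones and agree with the printed output, but they are not taken from the package source.

```r
data(cabbages, package = "MASS")
x <- cabbages$VitC

n        <- sum(!is.na(x))                      # non-missing observations
n_miss   <- sum(is.na(x))                       # missing observations
avg      <- mean(x, na.rm = TRUE)               # arithmetic mean
avg_trim <- mean(x, trim = 0.1, na.rm = TRUE)   # 10% trimmed mean
s        <- sd(x, na.rm = TRUE)                 # standard deviation

# Assumed definitions (conventional, not confirmed from the package source).
cv <- s / avg        # coefficient of variation
se <- s / sqrt(n)    # standard error of the mean

mad_x <- mad(x, na.rm = TRUE)                   # scaled median absolute deviation
iqr_x <- IQR(x, na.rm = TRUE)                   # interquartile range
rng   <- diff(range(x, na.rm = TRUE))           # max minus min

print(round(c(n = n, missing = n_miss, mean = avg, trimmed = avg_trim,
              sd = s, cv = cv, se = se, mad = mad_x, IQR = iqr_x,
              range = rng), 2))
```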

do_summary(VitC ~ Cult, data = cabbages) %>%
  print(digits = 2)
#>   .summary_of Cult  n missing  mean trimmed   sd variance   min    Q1 median
#> 1        VitC  c39 30       0 51.50   50.92 7.12    50.74 41.00 46.00  51.00
#> 2        VitC  c52 30       0 64.40   64.25 8.46    71.49 47.00 58.00  64.50
#>      Q3   max  mad   IQR range   cv   se skewness kurtosis
#> 1 54.75 68.00 5.93  8.75 27.00 0.14 1.30     0.58    -0.26
#> 2 70.75 84.00 9.64 12.75 37.00 0.13 1.54     0.12    -0.62

do_summary(VitC ~ Cult + Date, data = cabbages, stat = "mean") %>%
  print(digits = 2)
#>   .summary_of Cult Date  mean
#> 1        VitC  c39  d16 50.30
#> 2        VitC  c39  d20 49.40
#> 3        VitC  c39  d21 54.80
#> 4        VitC  c52  d16 62.50
#> 5        VitC  c52  d20 58.90
#> 6        VitC  c52  d21 71.80
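
For comparison, the same grouped means can be obtained with base R's `stats::aggregate()`, which also takes a formula with the summarized variable on the left and the grouping variables on the right. This is only a cross-check in base R, not a description of how `do_summary()` works internally.

```r
data(cabbages, package = "MASS")

# Mean of VitC for every combination of Cult and Date.
agg <- aggregate(VitC ~ Cult + Date, data = cabbages, FUN = mean)

# Order rows by Cult, then Date, to match the layout of the table above.
agg <- agg[order(agg$Cult, agg$Date), ]
print(agg)
```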

do_summary(HeadWt + VitC ~ Cult + Date,
  data = cabbages,
  stat = c("n", "mean", "sd")
) %>%
  print(digits = 1)
#>    .summary_of Cult Date  n mean  sd
#> 1       HeadWt  c39  d16 10  3.2 1.0
#> 2       HeadWt  c39  d20 10  2.8 0.3
#> 3       HeadWt  c39  d21 10  2.7 1.0
#> 4       HeadWt  c52  d16 10  2.3 0.4
#> 5       HeadWt  c52  d20 10  3.1 0.8
#> 6       HeadWt  c52  d21 10  1.5 0.2
#> 7         VitC  c39  d16 10 50.3 4.3
#> 8         VitC  c39  d20 10 49.4 8.3
#> 9         VitC  c39  d21 10 54.8 7.6
#> 10        VitC  c52  d16 10 62.5 5.8
#> 11        VitC  c52  d20 10 58.9 7.7
#> 12        VitC  c52  d21 10 71.8 6.2


# TODO:
# 1. First argument should be a data frame
#