Compute numerical summaries by groups using a formula interface. Missing values are removed by default (see na.rm).

do_summary(
  y,
  data = NULL,
  stat = c("n", "missing", "mean", "trimmed", "sd", "variance", "min", "Q1", "median",
    "Q3", "max", "mad", "IQR", "range", "cv", "se", "skewness", "kurtosis"),
  trim = 0.1,
  type = 3,
  na.rm = TRUE
)

# S3 method for num_summaries
print(x, ..., digits = NA, format = "f", digits_sk = 2)

Arguments

y

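(formula) Variables to summarize go on the left-hand side and grouping variables on the right-hand side, e.g. VitC ~ Cult. A one-sided formula such as ~VitC summarizes a variable without grouping. See Examples.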
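
To make the roles of the two sides of the formula concrete, the base R sketch below pulls the variable names out of formulas like those used in the Examples section. It only illustrates how such formulas can be read; how do_summary() parses them internally is not shown here and is an assumption.

```r
# Two variables to summarize, grouped by two factors (names taken from the examples)
f2 <- HeadWt + VitC ~ Cult + Date
all.vars(f2[[2]])   # left-hand side: variables to summarize
all.vars(f2[[3]])   # right-hand side: grouping variables

# One-sided formula: a single variable, no grouping
f1 <- ~ VitC
length(f1) == 2     # a one-sided formula has two elements: `~` and its right-hand side
all.vars(f1)
```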

data

data set (data frame) with the variables used in the formula.

stat

(character) Descriptive statistics to compute. Currently supported statistics:

  • "n" - number of non-missing observations,

  • "missing" - number of missing observations,

  • "mean" - arithmetic mean,

  • "sd" - standard deviation,

  • "variance" - variance,

  • "trimmed" - trimmed mean,

  • "min" - minimum value,

  • "Q1" - first quartile,

  • "median" - median,

  • "Q3" - third quartile,

  • "max" - maximum value,

  • "mad" - median absolute deviation from the median (see mad() for details),

  • "IQR" - interquartile range,

  • "range" - range,

  • "cv" - coefficient of variation,

  • "se" - standard error of mean,

  • "skewness" - skewness,

  • "kurtosis" - excess kurtosis.
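
As a rough guide to what some of the less common statistics mean, the base R sketch below computes a few of them by hand. The vector x is made up. The formulas for "cv" (sd / mean), "se" (sd / sqrt(n)) and "range" (max - min) are assumptions that agree with the example output on this page; they are not copied from the package internals, and the package may compute "mad" differently from base R's mad().

```r
# Hypothetical data with one missing value
x <- c(4.1, 5.3, NA, 6.8, 5.9, 7.2)

x_ok <- x[!is.na(x)]                 # the values kept when missing values are removed

n       <- length(x_ok)              # "n": number of non-missing observations
missing <- sum(is.na(x))             # "missing": number of missing observations
cv      <- sd(x_ok) / mean(x_ok)     # "cv": assumed to be sd / mean
se      <- sd(x_ok) / sqrt(n)        # "se": assumed to be sd / sqrt(n)
rng     <- diff(range(x_ok))         # "range": assumed to be max - min
iqr     <- IQR(x_ok)                 # "IQR": Q3 - Q1
mad_val <- mad(x_ok)                 # base R mad(), scaled by 1.4826 by default

round(c(n = n, missing = missing, cv = cv, se = se,
        range = rng, IQR = iqr, mad = mad_val), 2)
```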

trim

The fraction (0 to 0.5) of observations to be trimmed from each end of the sorted variable before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
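
The effect of trim can be seen with base R's mean(), whose trim argument is described in the same terms. The sketch below uses made-up data and assumes that the "trimmed" statistic behaves like mean(x, trim = trim).

```r
# Hypothetical data: ten values, two of them extreme
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 100)

mean(x)               # plain mean, pulled up by the extreme values
mean(x, trim = 0.1)   # drops one value (10% of 10) from each end before averaging
mean(x, trim = 0.7)   # values above 0.5 are treated as 0.5, which gives the median
median(x)
```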

type

(integer: 1, 2, 3) The type of skewness and kurtosis estimate. See psych::describe() and psych::mardia() for details.
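
For orientation, the three skewness estimates usually associated with these type values can be written in a few lines of base R. This is a sketch with made-up data that follows the common g1 / G1 / b1 definitions; treating them as exactly what do_summary() computes is an assumption, so check the psych documentation for the authoritative formulas.

```r
# Hypothetical right-skewed sample
x <- c(2, 3, 3, 4, 4, 4, 5, 9, 15)
n <- length(x)
d <- x - mean(x)

# type 1: moment estimate g1 = m3 / m2^(3/2)
skew_1 <- sqrt(n) * sum(d^3) / sum(d^2)^(3/2)

# type 2: g1 with a small-sample correction factor
skew_2 <- skew_1 * sqrt(n * (n - 1)) / (n - 2)

# type 3: b1 = m3 / s^3, where s is sd() with the n - 1 denominator
skew_3 <- sum(d^3) / (n * sd(x)^3)

round(c(type1 = skew_1, type2 = skew_2, type3 = skew_3), 3)
```

The three estimates differ only by factors that depend on n, so they converge as the sample grows.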

na.rm

(logical) Flag to remove missing values. Default is TRUE.

x

object to print

...

further arguments to methods.

digits

Number of digits for descriptive statistics.

format

(character) "f", "g", "e", "fg". Either one value or a vector of values for each column. Each value will be passed to fun separately.
"f" gives numbers in the usual xxx.xxx format;
"e" and "E" give n.ddde+nn or n.dddE+nn (scientific format);
"g" and "G" put the number into scientific format only if it saves space to do so;
"fg" uses fixed format as "f", but treats digits as the minimum number of significant digits. This can lead to quite long result strings.

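The format codes above match those of base R's formatC(). The sketch below shows how each code renders one made-up number; that print() forwards format and digits to formatC() in this way is an assumption.

```r
x <- 1234.56789   # hypothetical value to format

formatC(x, digits = 2, format = "f")    # fixed notation, 2 digits after the decimal point
formatC(x, digits = 2, format = "e")    # scientific notation, 2 digits after the decimal point
formatC(x, digits = 2, format = "g")    # 2 significant digits, scientific if that is shorter
formatC(x, digits = 6, format = "fg")   # fixed notation, digits counts significant digits
```
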
digits_sk

Number of digits for skewness and kurtosis.

Value

Data frame with summary statistics.

Examples

library(biostat)
data(cabbages, package = "MASS")

do_summary(~VitC, data = cabbages) %>%
  print(digits = 2)
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas:
#>
#>   # Simple named list:
#>   list(mean = mean, median = median)
#>
#>   # Auto named with `tibble::lst()`:
#>   tibble::lst(mean, median)
#>
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>   .summary_of  n missing  mean trimmed    sd variance   min    Q1 median    Q3
#> 1        VitC 60       0 57.95   57.67 10.12   102.39 41.00 50.75  56.00 66.25
#>     max   mad   IQR range   cv   se skewness kurtosis
#> 1 84.00 10.38 15.50 43.00 0.17 1.31     0.32    -0.68

do_summary(VitC ~ Cult, data = cabbages) %>%
  print(digits = 2)
#>   .summary_of Cult  n missing  mean trimmed   sd variance   min    Q1 median
#> 1        VitC  c39 30       0 51.50   50.92 7.12    50.74 41.00 46.00  51.00
#> 2        VitC  c52 30       0 64.40   64.25 8.46    71.49 47.00 58.00  64.50
#>      Q3   max  mad   IQR range   cv   se skewness kurtosis
#> 1 54.75 68.00 5.93  8.75 27.00 0.14 1.30     0.58    -0.26
#> 2 70.75 84.00 9.64 12.75 37.00 0.13 1.54     0.12    -0.62

do_summary(VitC ~ Cult + Date, data = cabbages, stat = "mean") %>%
  print(digits = 2)
#>   .summary_of Cult Date  mean
#> 1        VitC  c39  d16 50.30
#> 2        VitC  c39  d20 49.40
#> 3        VitC  c39  d21 54.80
#> 4        VitC  c52  d16 62.50
#> 5        VitC  c52  d20 58.90
#> 6        VitC  c52  d21 71.80

do_summary(HeadWt + VitC ~ Cult + Date, data = cabbages, stat = c("n", "mean", "sd")) %>%
  print(digits = 1)
#>    .summary_of Cult Date  n mean  sd
#> 1       HeadWt  c39  d16 10  3.2 1.0
#> 2       HeadWt  c39  d20 10  2.8 0.3
#> 3       HeadWt  c39  d21 10  2.7 1.0
#> 4       HeadWt  c52  d16 10  2.3 0.4
#> 5       HeadWt  c52  d20 10  3.1 0.8
#> 6       HeadWt  c52  d21 10  1.5 0.2
#> 7         VitC  c39  d16 10 50.3 4.3
#> 8         VitC  c39  d20 10 49.4 8.3
#> 9         VitC  c39  d21 10 54.8 7.6
#> 10        VitC  c52  d16 10 62.5 5.8
#> 11        VitC  c52  d20 10 58.9 7.7
#> 12        VitC  c52  d21 10 71.8 6.2

# TODO:
# 1. First argument should be a data frame