Programming/R: Difference between revisions

Revision as of 03:18, 31 May 2020

R is a language used for statistics.


Comments

# This is an inline comment.


Variables

Variables in R are loosely typed. That means that the type (logical, numeric, character, etc.) is not declared explicitly; it is determined by the value assigned.

ToDo: Link to variable typing.

Variable Definition

a_bool <- TRUE
b_bool <- FALSE
my_var_1 <- "This is "
my_var_2 <- "a string."

Variable Usage

# We can print our variables by retyping the variable name with no further syntax.
a_bool
b_bool
my_var_1
my_var_2

Variable Types

Variable types in R are called the following:

  • Booleans are called logicals.
  • Text values are called characters.
  • Numbers are called numerics.

If you are ever unsure, you can check the type of a variable with class(). For example:

# This will print out the typing for "my_variable".
class(my_variable)
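
As a quick illustration, the sketch below calls class() on two of the variables defined earlier on this page. The variable my_number is hypothetical and was added here only to show the numeric case.

```r
a_bool <- TRUE
my_var_1 <- "This is "
my_number <- 42  # Hypothetical variable, added for illustration.

class(a_bool)     # "logical"
class(my_var_1)   # "character"
class(my_number)  # "numeric"
```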


Factors

Due to the statistical nature of R, we can also create "categorical variables", called factors. A factor is a variable that can only take a limited, fixed set of possible values. These possible values are the categories, which R calls levels.

Declaring Factors

# First, create a vector of your desired categories (see below section for more on vectors).
gender_vector <- c("Male", "Female", "Other")
 
# Use the "factor" function to create a categorical variable of this vector.
gender_factor <- factor(gender_vector)
 
# Note that it's possible to create a factor out of a vector with duplicate values.
# For example, a factor built from the following vector has the same three levels as above,
# even though it holds five values.
gender_vector <- c("Male", "Female", "Male", "Female", "Other")
gender_factor <- factor(gender_vector)
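
To see the difference between the values stored in a factor and its levels, a minimal sketch using the base R functions levels(), nlevels(), and length() could look like this:

```r
gender_vector <- c("Male", "Female", "Male", "Female", "Other")
gender_factor <- factor(gender_vector)

# The distinct categories, sorted alphabetically by default.
levels(gender_factor)   # "Female" "Male" "Other"

# Number of levels versus number of stored values.
nlevels(gender_factor)  # 3
length(gender_factor)   # 5
```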

It's also possible to rename the levels of a factor after the fact.

# Given an existing factor of ("S", "M", "L"),
# we want to change this to ("Small", "Medium", "Large").
size_factor <- factor(c("S", "M", "L"))
 
# Printing shows the values in their original order, but the levels are listed alphabetically: L M S.
size_factor
 
# Now we rename the levels, supplying the new names in that same (alphabetical) level order.
levels(size_factor) <- c("Large", "Medium", "Small")
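
The sketch below checks the effect of the renaming. It inspects the levels before the change and converts the factor back to plain text afterwards with as.character(), which is used here only for inspection.

```r
size_factor <- factor(c("S", "M", "L"))

# Levels are stored alphabetically, regardless of the order of the values.
levels(size_factor)        # "L" "M" "S"

# Rename the levels in that same order: L -> Large, M -> Medium, S -> Small.
levels(size_factor) <- c("Large", "Medium", "Small")

# The values keep their original positions, now under the new names.
as.character(size_factor)  # "Small" "Medium" "Large"
```

If the new names were supplied in the order ("Small", "Medium", "Large") instead, "L" would be renamed to "Small", which is why the alphabetical order matters here.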

Types of Factors

  • Nominal Category - A category in which there is no associated ordering. Ex: Gender.
  • Ordinal Category - A category which has an implied ordering. Ex: Sizes of fast food drinks.

By default, all factors in R are nominal. But they can be "upgraded" to ordinal with the following:

size_vector <- c("Small", "Medium", "Large")
size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))

Note that order does not matter in the original vector. But it does matter in the levels argument supplied to the factor function.
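
One practical consequence of an ordinal factor is that its values can be compared. The following is a minimal sketch, assuming a deliberately shuffled input vector to show that only the levels argument determines the ordering:

```r
# The values are given in arbitrary order on purpose.
size_vector <- c("Medium", "Large", "Small")
size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))

# Confirm the factor is ordinal.
is.ordered(size_factor)           # TRUE

# Compare the first value (Medium) with the second (Large).
size_factor[1] < size_factor[2]   # TRUE

# The largest value according to the level ordering.
as.character(max(size_factor))    # "Large"
```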

Summary

You can get a quick and easy summary of a factor with the summary function. For a factor, it reports how many values fall into each level.

size_factor <- factor(c("S", "M", "L"))
summary(size_factor)
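
With one value per level, every count is 1. A sketch with repeated values (made up for illustration) shows the counts more clearly:

```r
# Hypothetical data with repeated sizes.
size_factor <- factor(c("S", "M", "L", "M", "S", "M"))

# summary() on a factor returns a named vector of counts per level.
counts <- summary(size_factor)
counts       # L: 1, M: 3, S: 2

# Individual counts can be looked up by level name.
counts["M"]  # 3
```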


Basic Data Structures

Vectors

In R, "Vectors" are what most other languages call "Arrays".

Arrays (vectors) in R are similar to Arrays (lists) in Python. That is, the size and semantics of the array are taken care of for you, and all you need to worry about are the values you place into it.

For the rest of this section, R arrays will be referred to by the proper name, aka Vectors.

Declaring Vectors

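# Each vector in R holds values of a single type.
character_vector <- c("This", "is", "a", "character", "vector")
numeric_vector <- c(1, 2, 15, 6)
logical_vector <- c(TRUE, FALSE, FALSE)
# Mixing types is allowed in c(), but R converts every value to one common type.
# Here, all three values become characters.
mixed_vector <- c(TRUE, 1, "test")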
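
It is worth checking what actually ends up inside mixed_vector. The sketch below uses class() to show that R converts mixed inputs to a single common type rather than storing each value with its own type:

```r
mixed_vector <- c(TRUE, 1, "test")

# Every element has been converted to a character string.
class(mixed_vector)  # "character"
mixed_vector         # "TRUE" "1" "test"

# Without any text involved, logicals are converted to numbers instead.
class(c(TRUE, 1))    # "numeric"
```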

Accessing Vector Values

Unlike most other programming languages, indexes in R start at 1.

my_vector <- c(5, 7, 2)
 
# Print first index.
my_vector[1]
 
# Print last index.
my_vector[3]

Furthermore, unlike most languages, you can select multiple values at once by passing a vector of indexes inside the brackets:

my_vector <- c(5, 7, 2)
 
# Print out first and last index at the same time.
my_vector[c(1, 3)]
 
# Alternatively, to select a range of values, we can use this syntax.
# In this case, we print out the first two indexes.
my_vector[1:2]


Manipulating Vectors

Arithmetic operators on Vectors behave much like operations on real mathematical vectors. That is, they are applied "element-wise" to the vector.

For example, adding two vectors will add the corresponding index values.

my_vector <- c(1, 2, 3)
ones_vector <- c(1, 1, 1)
 
# This should create a new vector of (2, 3, 4).
new_vector <- my_vector + ones_vector
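
The same element-wise behavior applies to the other arithmetic operators. A short sketch, where other_vector is a made-up name for illustration:

```r
my_vector <- c(1, 2, 3)
other_vector <- c(10, 20, 30)  # Hypothetical second vector.

# Multiply corresponding elements.
my_vector * other_vector   # 10 40 90

# Subtract corresponding elements.
other_vector - my_vector   # 9 18 27

# A single number is applied to every element.
my_vector * 2              # 2 4 6
```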

Alternatively, if you want to combine all values in a single vector, use the sum() function.

my_vector <- c(1, 2, 3)
 
# This will output "6".
sum(my_vector)

Or if we want to combine multiple vectors into one, we can use this:

vector_1 <- c(1, 2, 3)
vector_2 <- c(4, 5, 6)
 
# Combine these into a vector of (1, 2, 3, 4, 5, 6).
my_vector <- c(vector_1, vector_2)

To get the average of the values in a vector, we can use the mean() function.

my_vector <- c(1, 2, 3)
 
# This will output "2".
mean(my_vector)

We can also apply a comparison to every value within a Vector. The result is a logical vector with one TRUE or FALSE per value.

my_vector <- c(1, 2, 3)
 
# Check which values are greater than 1.
my_vector > 1

We can take comparisons a bit further, to only print the values that met our criteria.

my_vector <- c(9, 10, 7, 11, 5, 12)
 
# Save which values are greater than 10.
selection_vector <- my_vector > 10
 
# Print out only values greater than 10. This should print out (11, 12).
my_vector[selection_vector]
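
The intermediate selection vector is optional. As a sketch, the comparison can be written directly inside the brackets, and the base R functions sum() and which() can report how many values matched and at which positions:

```r
my_vector <- c(9, 10, 7, 11, 5, 12)

# Select values greater than 10 in a single step.
my_vector[my_vector > 10]  # 11 12

# Count how many values matched (TRUE counts as 1).
sum(my_vector > 10)        # 2

# Get the positions of the matching values.
which(my_vector > 10)      # 4 6
```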

Dictionaries

R has no built-in type named "dictionary", but a named vector behaves much like one. Basically, first create your desired vector (to hold the "values"), then use the names function on it to declare keys.

Here's an example for a hypothetical business trying to track count of items sold.

product_sold <- c(50, 56, 102)
names(product_sold) <- c("Ice Cream", "Burgers", "Pizza")

Alternatively, you can create two vectors and combine them.

product_sold <- c(50, 56, 102)
product_names <- c("Ice Cream", "Burgers", "Pizza")
names(product_sold) <- product_names

The two above code snippets should be equivalent.

Once we have names (aka keys) associated with our values, we can use those to get specific indexes.

# Print out the count of pizza sold.
product_sold["Pizza"]
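
Names work in the same places as numeric indexes. The sketch below looks up several keys at once and updates one value by key; the new count of 60 is made up for illustration.

```r
product_sold <- c(50, 56, 102)
names(product_sold) <- c("Ice Cream", "Burgers", "Pizza")

# Look up several keys at once by passing a vector of names.
product_sold[c("Ice Cream", "Pizza")]  # 50 102

# Update a value by key (60 is a hypothetical new count).
product_sold["Burgers"] <- 60

# List all keys.
names(product_sold)  # "Ice Cream" "Burgers" "Pizza"
```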

Matrices

Matrices are effectively "2-D Vectors" in R.

Declaring Matrices

Matrices are declared with the matrix function, which arranges a set of values into rows and columns. The format is:

matrix(<values>, byrow = <bool>, nrow = <row_count>)

For example, to declare a 3x3 matrix with values 1 through 9, we can use:

matrix(1:9, byrow = TRUE, nrow = 3)
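
The byrow argument controls whether the values fill the matrix row by row or column by column. A minimal sketch comparing the two, where by_row and by_col are names chosen for illustration:

```r
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)

# With byrow = TRUE, the first row holds the first three values.
by_row[1, ]  # 1 2 3

# With byrow = FALSE, values run down the columns instead.
by_col[1, ]  # 1 4 7
```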

Similarly to Vectors (see the Dictionaries section), we can associate names with the rows and columns of a matrix. The syntax is as follows.

# Declare matrix row names.
rownames(my_matrix) <- row_names_vector
 
# Declare matrix column names.
colnames(my_matrix) <- col_names_vector
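
For a complete, runnable version of the above, here is a sketch with a small 2x3 matrix. The row and column names are placeholders chosen for illustration.

```r
my_matrix <- matrix(1:6, byrow = TRUE, nrow = 2)

# Placeholder names, one per row and one per column.
rownames(my_matrix) <- c("row_a", "row_b")
colnames(my_matrix) <- c("x", "y", "z")

# Names can then be used in place of numeric indexes.
my_matrix["row_b", "z"]  # 6
```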

Accessing Matrix Values

Accessing matrix values is very similar to accessing vector values, except that you need to specify both row and column. Omitting the row or column index selects every row or column, respectively.

# For example, this will get the second row and third column of a matrix.
my_matrix[2, 3]
 
# Get all values in the first two rows and first three columns of a matrix.
my_matrix[1:2, 1:3]
 
# Get all values in the first row.
my_matrix[1, ]
 
# Get all values in the first column.
my_matrix[, 1]
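
To make the results concrete, the sketch below runs the same lookups against the 3x3 matrix of values 1 through 9 from the declaration example:

```r
my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

# Second row, third column.
my_matrix[2, 3]           # 6

# All values in the first row.
my_matrix[1, ]            # 1 2 3

# All values in the first column.
my_matrix[, 1]            # 1 4 7

# Selecting a block of rows and columns returns a smaller matrix.
dim(my_matrix[1:2, 1:3])  # 2 3
```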

Manipulating Matrices

We can combine matrices with the rbind or cbind function.

Note: This can also combine vectors into a matrix, if desired.

# This will stack two matrices vertically (row-wise). Both need the same number of columns.
rbind(matrix_1, matrix_2)
 
# This will place two matrices side by side (column-wise). Both need the same number of rows.
cbind(matrix_1, matrix_2)
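
A small sketch with two 2x2 matrices shows how the two functions differ in the shape of the result. The names stacked and side_by_side are chosen for illustration.

```r
matrix_1 <- matrix(1:4, byrow = TRUE, nrow = 2)
matrix_2 <- matrix(5:8, byrow = TRUE, nrow = 2)

# rbind appends the rows of matrix_2 below matrix_1.
stacked <- rbind(matrix_1, matrix_2)
dim(stacked)         # 4 2

# cbind appends the columns of matrix_2 to the right of matrix_1.
side_by_side <- cbind(matrix_1, matrix_2)
dim(side_by_side)    # 2 4

# The first row of the side-by-side result spans both inputs.
side_by_side[1, ]    # 1 2 5 6
```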

Similarly to Vectors, mathematical operations applied to Matrices will apply element-wise. For example:

# This will add two to every value in the matrix.
my_matrix + 2

We can get sums of our matrix with the rowSums or colSums function:

# Get sum of rows.
rowSums(my_matrix)
 
# Get sum of columns.
colSums(my_matrix)
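
As a worked example, here are both functions applied to the 3x3 matrix of values 1 through 9 used earlier:

```r
my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

# One total per row: 1+2+3, 4+5+6, 7+8+9.
rowSums(my_matrix)  # 6 15 24

# One total per column: 1+4+7, 2+5+8, 3+6+9.
colSums(my_matrix)  # 12 15 18
```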