Programming/R: Difference between revisions

From Dev Wiki

Revision as of 22:35, 31 May 2020

R is a language used for statistics.


Comments

# This is an inline comment.


Variables

Variables are loosely typed in R. That means that the type (bool, int, string, etc.) is implicitly determined by the value provided.

ToDo: Link to variable typing.

Variable Definition

# Null.
type_1 <- NULL
type_2 <- NA
 
# Booleans.
a_bool <- TRUE
b_bool <- FALSE
 
# Numbers (stored as "numeric" by default; append L, as in 5L, for a true integer).
val_1 <- 5
val_2 <- -6
 
# Strings.
my_var_1 <- "This is "
my_var_2 <- "a string."

Variable Usage

# We can print our variables by retyping the variable name with no further syntax.
a_bool
b_bool
my_var_1
my_var_2
 
# Alternatively, we can use the "print" function.
print(a_bool)
print(b_bool)
print(my_var_1)
print(my_var_2)

Variable Types

Variable types in R are called the following:

  • Booleans are called logicals.
  • Text values are called characters.
  • Numbers are called numerics.

If you are ever unsure, you can check the type of a variable with class(). For example:

# This will print out the typing for "my_variable".
class(my_variable)
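
As a quick illustration, class() can be called on a value of each type. This is a minimal sketch; the variable names below are made up for the example.

```r
# Hypothetical example values, chosen only for illustration.
a_flag <- TRUE
a_number <- 5
a_whole_number <- 5L
a_text <- "hello"

class(a_flag)          # "logical"
class(a_number)        # "numeric"
class(a_whole_number)  # "integer"
class(a_text)          # "character"
```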


If Statements

Basic If

if (x == y) {
    # Logic if true.
}

Full If

if (x == y) {
    # Logic for "if" true.
} else if (x && (y || z)) {
    # Logic for "else if" true.
} else {
    # Logic for false.
}
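
A concrete version of the template above might look like the following. The values of x and y and the result variable are assumptions made only for this sketch.

```r
# Hypothetical values for illustration.
x <- 3
y <- 7

if (x == y) {
    result <- "equal"
} else if (x < y) {
    result <- "smaller"
} else {
    result <- "larger"
}

# Prints "smaller", since 3 is less than 7.
print(result)
```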


Loops

While Loop

while (x == y) {
    # Logic to run on each iteration of the loop.
}

For Loop

For loops are very clean in R. Similarly to Python, they iterate directly over the items of the object we loop through.

# This prints out all items in the provided vector.
for (item in my_vector) {
    print(item)
}

However, sometimes it's still useful to loop using index values.

# Same as above, but loops via index values.
# seq_along() is used because 1:length() counts backwards when the vector is empty.
for (index in seq_along(my_vector)) {
    print(my_vector[index])
}
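
One caveat with index-based loops is the empty vector. The sketch below compares the 1:length() pattern with seq_along(), which is generally the safer way to generate index values. The variable name is illustrative.

```r
# An empty numeric vector.
empty_vector <- numeric(0)

# 1:length() counts down from 1 to 0, so a loop over it would run twice.
1:length(empty_vector)   # 1 0

# seq_along() returns an empty sequence, so a loop over it would not run at all.
seq_along(empty_vector)  # integer(0)
```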

Next

We can immediately jump to the next iteration of a loop with the next keyword.

for (item in my_vector) {
    # If item is "5", then skip to next item.
    if (item == 5) {
        next
    }
}

Break

We can break out of a loop early with the break keyword.

# The initial statement indicates this loop should run forever.
while (TRUE) {
    # But this break statement will terminate the loop early.
    break
}

Basic Data Structures

Vectors

In R, "Vectors" are what most other languages call "Arrays".

Arrays (vectors) in R are similar to Arrays (lists) in Python. That is, the size and semantics of the array are taken care of for you, and all you need to worry about are the values you place into it.

For the rest of this section, R arrays will be referred to by the proper name, aka Vectors.

Declaring Vectors

# Each vector in R holds a single value type.
# If types are mixed (as in the last example), R converts all values to one common type.
character_vector <- c("This", "is", "a", "character", "vector")
numeric_vector <- c(1, 2, 15, 6)
logical_vector <- c(TRUE, FALSE, FALSE)
mixed_vector <- c(TRUE, 1, "test")
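
Checking with class() shows what happens when different types are combined inside c(): R coerces every value to a single common type. A minimal sketch (the second variable name is made up for the example):

```r
# Combining a logical, a number, and a string gives a character vector.
mixed_vector <- c(TRUE, 1, "test")
class(mixed_vector)  # "character"
mixed_vector         # "TRUE" "1" "test"

# Combining only a logical and a number gives a numeric vector.
logical_and_number <- c(TRUE, 1)
class(logical_and_number)  # "numeric"
logical_and_number         # 1 1
```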

Accessing Vector Values

Unlike most other programming languages, indexes in R start at 1.

my_vector <- c(5, 7, 2)
 
# Print first index.
my_vector[1]
 
# Print last index.
my_vector[3]

Furthermore, unlike most languages, you can select multiple values at once by using a nested Vector syntax:

my_vector <- c(5, 7, 2)
 
# Print out first and last index at the same time.
my_vector[c(1, 3)]
 
# Alternatively, to select a range of values, we can use this syntax.
# In this case, we print out the first two indexes.
my_vector[1:2]

Manipulating Vectors

Mathematical operations on Vectors behave much like operations on real mathematical vectors. That is, they are applied "element-wise" to the vector.

For example, adding two vectors will add the corresponding index values.

my_vector <- c(1, 2, 3)
ones_vector <- c(1, 1, 1)
 
# This should create a new vector of (2, 3, 4).
new_vector <- my_vector + ones_vector

Alternatively, if you want to combine all values in a single vector, use the sum() function.

my_vector <- c(1, 2, 3)
 
# This will output "6".
sum(my_vector)

Or if we want to combine multiple vectors into one, we can use this:

vector_1 <- c(1, 2, 3)
vector_2 <- c(4, 5, 6)
 
# Combine these into a vector of (1, 2, 3, 4, 5, 6).
my_vector <- c(vector_1, vector_2)

We can also test a comparison against every value within a Vector. The result is a logical vector with one TRUE or FALSE per value.

my_vector <- c(1, 2, 3)
 
# Check which values are greater than 1.
my_vector > 1

We can take comparison testing a bit further, to only print the values that met our criteria.

my_vector <- c(9, 10, 7, 11, 5, 12)
 
# Save which values are greater than 10.
selection_vector <- my_vector > 10
 
# Print out only values greater than 10. This should print out (11, 12).
my_vector[selection_vector]
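
The two steps can also be written as a single expression. As a rough sketch, the example below additionally uses which(), a base R function, to get the positions of the matching values rather than the values themselves.

```r
my_vector <- c(9, 10, 7, 11, 5, 12)

# Select the values greater than 10 in one step.
my_vector[my_vector > 10]  # 11 12

# Get the indexes of the values greater than 10.
which(my_vector > 10)      # 4 6
```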

Dictionaries

Base R has no dedicated dictionary type, but a named vector can fill that role. Basically, first create your desired vector (to hold the "values"), then use the names function on it to declare keys.

Here's an example for a hypothetical business trying to track count of items sold.

product_sold <- c(50, 56, 102)
names(product_sold) <- c("Ice Cream", "Burgers", "Pizza")

Alternatively, you can create two vectors and combine them.

product_sold <- c(50, 56, 102)
product_names <- c("Ice Cream", "Burgers", "Pizza")
names(product_sold) <- product_names

The two above code snippets should be equivalent.

Once we have names (aka keys) associated with our values, we can use those to get specific indexes.

# Print out the count of pizza sold.
product_sold["Pizza"]
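
A few more operations are handy once a vector has names. The sketch below declares the names inline inside c(), which should be equivalent to the two-step approach; the "Tacos" entry is a made-up addition for illustration.

```r
# Declare values and names in a single call.
product_sold <- c("Ice Cream" = 50, "Burgers" = 56, "Pizza" = 102)

# List all keys.
names(product_sold)

# Double brackets return the bare value, without its name attached.
product_sold[["Pizza"]]  # 102

# Assigning to a name that does not exist yet appends a new entry.
product_sold["Tacos"] <- 12
length(product_sold)     # 4
```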


Matrices

Matrices are effectively "2-D Vectors" in R.

Declaring Matrices

Matrices are essentially declared via a special function that returns the formatting we want. The format is:

matrix(<values>, byrow = <bool>, nrow = <row_count>)

For example, to declare a 3x3 matrix with values 1 through 9, we can use:

matrix(1:9, byrow = TRUE, nrow = 3)

Similarly to Vectors (see Dictionaries), we can associate names with our rows and columns. The syntax is as follows.

# Declare matrix row names.
rownames(my_matrix) <- row_names_vector
 
# Declare matrix column names.
colnames(my_matrix) <- col_names_vector
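
Putting declaration and naming together, a small worked sketch might look like this. The matrix contents and the row and column names are assumptions chosen for the example.

```r
# A 2x3 matrix filled row by row: (1, 2, 3) then (4, 5, 6).
my_matrix <- matrix(1:6, byrow = TRUE, nrow = 2)

rownames(my_matrix) <- c("row_a", "row_b")
colnames(my_matrix) <- c("x", "y", "z")

# Names can then be used in place of numeric indexes.
my_matrix["row_b", "z"]  # 6
```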

Accessing Matrix Values

Accessing matrix values is very similar to accessing vector values, except that you need to specify both row and column. Omitting a row/col value will assume you want all row/col indexes.

# For example, this will get the value at the second row and third column of a matrix.
my_matrix[2, 3]
 
# Get all values in the first two rows and first three columns of a matrix.
my_matrix[1:2, 1:3]
 
# Get all values in the first row.
my_matrix[1, ]
 
# Get all values in the first column.
my_matrix[, 1]

Manipulating Matrices

We can combine matrices with the rbind or cbind function.

Note: This can also combine vectors into a matrix, if desired.

# This will stack two matrices vertically, placing the rows of matrix_2 below matrix_1.
# Both matrices must have the same number of columns.
rbind(matrix_1, matrix_2)
 
# This will join two matrices side by side, placing the columns of matrix_2 after matrix_1.
# Both matrices must have the same number of rows.
cbind(matrix_1, matrix_2)
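
To see the difference between the two functions, it can help to check the dimensions of the result with dim(). A minimal sketch, assuming two small 2x2 matrices:

```r
matrix_1 <- matrix(1:4, nrow = 2)
matrix_2 <- matrix(5:8, nrow = 2)

# rbind adds rows: the result has 4 rows and 2 columns.
stacked <- rbind(matrix_1, matrix_2)
dim(stacked)       # 4 2

# cbind adds columns: the result has 2 rows and 4 columns.
side_by_side <- cbind(matrix_1, matrix_2)
dim(side_by_side)  # 2 4
```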

Similarly to vectors, mathematical operations applied to Matrices will apply element-wise. For example:

# This will add two to every value in the matrix.
my_matrix + 2

We can get sums of our matrix with the rowSums or colSums functions:

# Get sum of rows.
rowSums(my_matrix)
 
# Get sum of columns.
colSums(my_matrix)
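
For instance, using the 3x3 matrix of values 1 through 9 declared earlier in this section:

```r
# Rows are (1, 2, 3), (4, 5, 6), and (7, 8, 9).
my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

rowSums(my_matrix)  # 6 15 24
colSums(my_matrix)  # 12 15 18
```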


Factors

Due to the statistical nature of R, we can also create "categorical variables", called factors. These are basically variables in which there is a limited, set number of possible values, aka categories.

Declaring Factors

# First, create a vector of your desired categories.
gender_vector <- c("Male", "Female", "Other")
 
# Use the "factor" function to create a categorical variable of this vector.
gender_factor <- factor(gender_vector)
 
# Note that it's possible to create a factor out of a vector with duplicate values.
# For example, a factor made from the following vector has the same three categories as above.
gender_vector <- c("Male", "Female", "Male", "Female", "Other")

It's also possible to change the values of factors after the fact.

# Given an existing factor of ("S", "M", "L"),
# we want to change this to ("Small", "Medium", "Large").
size_factor <- factor(c("S", "M", "L"))
 
# We can print this out to see that the categories (levels) are listed in alphabetical order: L, M, S.
size_factor
 
# Now we change the values. The new names must be given in that same order.
levels(size_factor) <- c("Large", "Medium", "Small")

Types of Factors

  • Nominal Category - A category in which there is no associated ordering. Ex: Gender.
  • Ordinal Category - A category which has an implied ordering. Ex: Sizes of fast food drinks.

By default, all factors in R are nominal. But they can be "upgraded" to ordinal with the following:

size_vector <- c("Small", "Medium", "Large")
size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))

Note that order does not matter in the original vector. But it does matter in the levels argument supplied to the factor function.
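
Once a factor is ordinal, its values can be compared with the usual comparison operators. A minimal sketch, with the original vector deliberately given out of order to show that only the levels argument determines the ordering:

```r
# The vector order is arbitrary; the levels argument defines Small < Medium < Large.
size_vector <- c("Medium", "Small", "Large")
size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))

# Compare the first value (Medium) with the second (Small).
size_factor[1] > size_factor[2]  # TRUE

# Get the largest category present.
max(size_factor)
```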


Data Frames

Data Frames are a lot like matrices, except that they are specialized to handle data sets with various different data types within.

Declaring Data Frames

# We can use data.frame() along with desired vectors.
# Note that we can pass in as many or few vectors as we want.
# Each passed vector becomes a column for the data frame,
# so all passed vectors should be of the same size.
data.frame(my_vector_1, my_vector_2, my_vector_3)
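
As a concrete sketch, the vectors below (hypothetical employee data, invented for this example) have different types but the same length, so each one becomes a column:

```r
# Three vectors of equal length, each with a different type.
employee_name <- c("Bob", "Jim", "Sarah")
employee_age <- c(34, 45, 29)
is_manager <- c(FALSE, TRUE, FALSE)

# Column names default to the names of the variables passed in.
my_df <- data.frame(employee_name, employee_age, is_manager)

# Display the structure of the resulting data frame.
str(my_df)
```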

Accessing Data Frame Values

Data Frame values are accessed the same way as Matrix values. See Accessing Matrix Values for more info.
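
A short sketch of what this looks like in practice. In addition to the matrix-style syntax, data frames also allow selecting a column by name with the $ operator. The data here is made up for the example, and stringsAsFactors = FALSE is passed explicitly so that text columns stay as characters regardless of R version.

```r
my_df <- data.frame(
    name = c("Bob", "Jim"),
    rating = c(1, 2),
    stringsAsFactors = FALSE
)

# Value at the second row, first column.
my_df[2, 1]         # "Jim"

# All values in the "rating" column, matrix-style.
my_df[, "rating"]   # 1 2

# The same column, selected with the $ operator.
my_df$rating        # 1 2
```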


Lists

Lists are essentially "collections of multiple objects".

For example, maybe a dataset has multiple sets, and it makes sense to represent each set as a vector. Then to represent the full dataset, you might put all these vectors in a list.

Declaring Lists

To declare a list, use the list function and include all the desired datasets within.

# For example, to create a list of vectors, like our example scenario above.
my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

Similarly to Vectors, we can name our list items if desired.

names(my_list) <- c("First Set", "Second Set", "Third Set")

Manipulating List Values

Accessing values in a list is similar to accessing values in a vector.

# Create our list.
my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
 
# Get the first item in the list. Single brackets return it wrapped in a new, one-item list.
my_list[1]
 
# Double brackets return the item itself (here, the vector 1, 2, 3) rather than a list.
my_list[[1]]
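
The difference between the two bracket styles is easiest to see with class(). A minimal sketch:

```r
my_list <- list(c(1, 2, 3), c(4, 5, 6))

# Single brackets give back a list that contains the first item.
class(my_list[1])    # "list"

# Double brackets give back the first item itself, a numeric vector.
class(my_list[[1]])  # "numeric"

# Functions that expect a vector therefore need the double-bracket form.
sum(my_list[[1]])    # 6
```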

We can add new list values with assignment. Updating uses the same syntax.

# Add a new, 4th vector to our list.
# Double brackets are needed so that the whole vector is stored as a single item.
my_list[[4]] <- c(10, 11, 12)

We can remove list values with assignment to NULL.

# Remove the 4th vector of our list.
my_list[4] <- NULL
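
Adding and removing can be verified with length(). The sketch below uses double brackets for the addition so that the whole vector is stored as one list item.

```r
my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Add a 4th item.
my_list[[4]] <- c(10, 11, 12)
length(my_list)  # 4

# Remove it again.
my_list[[4]] <- NULL
length(my_list)  # 3
```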


Functions

Declaring Custom Functions

The general syntax to declare a function is:

###
 # Description of function.
 ##
<function_name> <- function(<args>, <kwargs>) {
    # Function logic here.
     
    # Return a value at end of function.
    return(<return_value>)
}

For example:

###
 # This function does stuff.
 ##
my_function <- function(my_arg, my_kwarg="Default Value") {
    # Function logic here. As a placeholder, join both arguments into one string.
    my_value <- paste(my_arg, my_kwarg)
     
    return(my_value)
}
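
Calling a function works as in most languages. The following is a small self-contained sketch; the greet function and its arguments are hypothetical names invented for this example.

```r
###
 # Builds a greeting for the given name.
 ##
greet <- function(name, greeting = "Hello") {
    # paste() joins its arguments with a space in between.
    message_text <- paste(greeting, name)

    return(message_text)
}

# Use the default value for the second argument.
greet("Bob")                        # "Hello Bob"

# Override the default by naming the argument.
greet("Bob", greeting = "Goodbye")  # "Goodbye Bob"
```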

Common Functions

Below are some common, prebuilt functions that apply to most, if not all of the data structures described in Basic Data Structures.

Print

Displays the provided value to the console.

# For example, we can print a vector.
my_vector <- c(1, 2, 3, 4, 5)
 
# This will display all values of the vector.
print(my_vector)

Mean

Returns the mean of the passed data structure.

# For example, we can get the mean of a vector.
my_vector <- c(1, 2, 3, 4, 5)
 
# This will return "3".
mean(my_vector)

Summary

Returns the minimum, maximum, mean, median, and quartiles of the passed data structure.

Generally can be very useful for getting a quick overview of data properties.

# For example, we can get the summary of a vector.
my_vector <- c(1, 2, 3, 4, 5)
 
# This will return appropriate summary values.
summary(my_vector)

Head

Returns the first values of the passed data structure (6 by default).

# For example, we can get the head of a vector.
my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
 
# This will return "1, 2, 3, 4, 5, 5".
head(my_vector)

Tail

Returns the last values of the passed data structure (6 by default).

# For example, we can get the tail of a vector.
my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
 
# This will return "5, 5, 4, 3, 2, 1".
tail(my_vector)

Str

Returns a quick overview of the passed data structure. Includes things like number of values, type of variables, first few observations, and more.

# For example, we can get the str of a vector.
my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
 
# This will return the general structure of the vector.
str(my_vector)

Subset

Returns a subset of data from the passed data structure.

This is particularly useful for data frames.

# For example, we can create a data frame of employee ratings.
names <- c("Bob", "Jim", "Johnny", "Sarah", "Sally")
rating <- c(1, 2, 3, 4, 5)
my_df <- data.frame(names, rating)
 
# Then we can get a subset of only employees with ratings greater than 3.
subset(my_df, subset = rating > 3)

Order

Returns the indexes that would arrange the passed data structure in ascending (smallest to largest) order.

# For example, we can use this vector.
my_vector <- c(5, 4, 3, 2, 1)
 
# This will return indexes in reverse order.
# Aka "5, 4, 3, 2, 1", to denote value ordering.
order(my_vector)
 
# If we want, we can then pass this back into our vector to display values in order.
vector_order <- order(my_vector)
my_vector[vector_order]
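
To sort in the other direction, order() accepts a decreasing argument. The sketch below also shows sort(), a base R function that returns the sorted values directly when the indexes themselves are not needed.

```r
my_vector <- c(3, 1, 2)

# Ascending order via indexes.
my_vector[order(my_vector)]                     # 1 2 3

# Descending order via indexes.
my_vector[order(my_vector, decreasing = TRUE)]  # 3 2 1

# Sorted values directly.
sort(my_vector)                                 # 1 2 3
```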