Programming/R: Difference between revisions
Brodriguez (talk | contribs) (Add loops section) |
Brodriguez (talk | contribs) m (Brodriguez moved page Programming/R (Language) to Programming/R) |
Latest revision as of 17:28, 25 October 2020
R is a language used for statistics.
Comments
# This is an inline comment.
Variables
Variables in R are loosely typed. That means the type (bool, int, string, etc.) is declared implicitly by the value provided.
Variable Definition
 # Null and missing values.
 type_1 <- NULL
 type_2 <- NA

 # Booleans.
 a_bool <- TRUE
 b_bool <- FALSE

 # Numbers (stored as numeric; use an L suffix such as 5L for a true integer).
 val_1 <- 5
 val_2 <- -6

 # Strings.
 my_var_1 <- "This is "
 my_var_2 <- "a string."
Variable Usage
 # We can print our variables by retyping the variable name with no further syntax.
 a_bool
 b_bool
 my_var_1
 my_var_2

 # Alternatively, we can use the "print" function.
 print(a_bool)
 print(b_bool)
 print(my_var_1)
 print(my_var_2)
Variable Types
Variable types in R are called the following:
- Booleans are called logicals.
- Text is called characters.
- Numbers are called numerics.
If ever unsure, you can check the type of a variable with class(). For example:
 # This will print out the type of "my_variable".
 class(my_variable)
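As a quick illustration of the type names listed above, here is a small sketch that calls class() on a few values. The variable names are made up for this example and are not used elsewhere on this page.

```r
# Hypothetical example values; the names are illustrative only.
flag <- TRUE
label <- "hello"
count <- 5
whole <- 5L   # The L suffix asks R for an integer instead of a numeric.

class(flag)   # "logical"
class(label)  # "character"
class(count)  # "numeric"
class(whole)  # "integer"
```

Note that a plain number such as 5 is a numeric. R only stores an integer when the L suffix is used.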
If Statements
Basic If
 if (x == y) {
   # Logic if true.
 }
Full If
 if (x == y) {
   # Logic for "if" true.
 } else if (x && (y || z)) {
   # Logic for "else if" true.
   # Note: && and || compare single values, which is what "if" expects.
 } else {
   # Logic for false.
 }
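To make the template above concrete, here is a minimal sketch with real values. The variable names (temperature, advice) and the thresholds are assumptions chosen purely for illustration.

```r
# Hypothetical input value.
temperature <- 15

if (temperature > 25) {
  advice <- "hot"
} else if (temperature > 10 && temperature <= 25) {
  advice <- "mild"
} else {
  advice <- "cold"
}

# This will print "mild".
print(advice)
```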
Loops
While Loop
 while (x == y) {
   # Logic to run on each pass of the loop.
 }
For Loop
For loops are extremely clean in R. They handle similarly to Python's in that they directly grab the items from the iterable object we loop through.
 # This prints out all items in the provided vector.
 for (item in my_vector) {
   print(item)
 }
However, sometimes it's still useful to loop using index values.
 # Same as above, but loops via index values.
 # seq_along() is safer than 1:length(), which misbehaves on empty vectors.
 for (index in seq_along(my_vector)) {
   print(my_vector[index])
 }
Next
We can immediately jump to the next iteration of a loop with the next keyword.
 for (item in my_vector) {
   # If item is "5", then skip to next item.
   if (item == 5) {
     next
   }
 }
Break
We can break out of a loop early with the break keyword.
 # The initial statement indicates this loop should run forever.
 while (TRUE) {
   # But this break statement will terminate the loop early.
   break
 }
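To see next and break working together, here is a small sketch that collects the values a loop actually processes. The vector contents and the name seen are assumptions for this example only.

```r
my_vector <- c(3, 5, 8, 13, 21)
seen <- c()

for (item in my_vector) {
  # Skip the value 5 entirely.
  if (item == 5) {
    next
  }
  # Stop the loop as soon as a value above 10 shows up.
  if (item > 10) {
    break
  }
  seen <- c(seen, item)
}

# This will print "3 8".
print(seen)
```

The value 5 is skipped by next, and the loop ends at 13 because of break, so 21 is never reached.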
Basic Data Structures
Vectors
In R, "Vectors" are what most other languages call "Arrays".
Arrays (vectors) in R are similar to Arrays (lists) in Python. That is, the size and semantics of the array are taken care of for you, and all you need to worry about are the values you place into it.
For the rest of this section, R arrays will be referred to by the proper name, aka Vectors.
Declaring Vectors
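 # A vector in R holds values of a single type.
 character_vector <- c("This", "is", "a", "character", "vector")
 numeric_vector <- c(1, 2, 15, 6)
 logical_vector <- c(TRUE, FALSE, FALSE)

 # If types are mixed, R converts every value to one common type.
 # Here, all three values become characters.
 mixed_vector <- c(TRUE, 1, "test")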
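A quick way to check how c() treats values of different types is to inspect the result with class(). The sketch below reuses mixed_vector from the example above. The second vector name is made up for illustration.

```r
mixed_vector <- c(TRUE, 1, "test")

# Every value was converted to a character.
class(mixed_vector)   # "character"
mixed_vector          # "TRUE" "1" "test"

# Without a string present, logicals are converted to numbers instead.
logical_and_number <- c(TRUE, 1)
class(logical_and_number)   # "numeric"
```

If values of genuinely different types need to be stored together, a list is usually the better fit.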
Accessing Vector Values
Unlike most other programming languages, indexes in R start at 1.
 my_vector <- c(5, 7, 2)

 # Print first index.
 my_vector[1]

 # Print last index.
 my_vector[3]
Furthermore, unlike most languages, you can select multiple values at once by using a nested Vector syntax:
 my_vector <- c(5, 7, 2)

 # Print out first and last index at the same time.
 my_vector[c(1, 3)]

 # Alternatively, to select a range of values, we can use this syntax.
 # In this case, we print out the first two indexes.
 my_vector[1:2]
Manipulating Vectors
Mathematical operations on Vectors behave much like operations on real mathematical vectors. That is, they are applied "element-wise" to the vector.
For example, adding two vectors will add the corresponding index values.
 my_vector <- c(1, 2, 3)
 ones_vector <- c(1, 1, 1)

 # This should create a new vector of (2, 3, 4).
 new_vector <- my_vector + ones_vector
Alternatively, if you want to combine all values in a single vector, use the sum() function.
 my_vector <- c(1, 2, 3)

 # This will output "6".
 sum(my_vector)
Or if we want to combine multiple vectors into one, we can use this:
 vector_1 <- c(1, 2, 3)
 vector_2 <- c(4, 5, 6)

 # Combine these into a vector of (1, 2, 3, 4, 5, 6).
 my_vector <- c(vector_1, vector_2)
We can also run a comparison against every value within a Vector. The result is a logical vector with one TRUE or FALSE per value.
 my_vector <- c(1, 2, 3)

 # Check which values are greater than 1.
 my_vector > 1
We can take comparisons a bit further, to only print the values that met our criteria.
 my_vector <- c(9, 10, 7, 11, 5, 12)

 # Save which values are greater than 10.
 selection_vector <- my_vector > 10

 # Print out only values greater than 10. This should print out (11, 12).
 my_vector[selection_vector]
Dictionaries
Dictionaries in R are commonly represented as named vectors. Basically, first create your desired vector (to hold the "values"), then use the names function on it to declare keys.
Here's an example for a hypothetical business trying to track count of items sold.
 product_sold <- c(50, 56, 102)
 names(product_sold) <- c("Ice Cream", "Burgers", "Pizza")
Alternatively, you can create two vectors and combine them.
 product_sold <- c(50, 56, 102)
 product_names <- c("Ice Cream", "Burgers", "Pizza")
 names(product_sold) <- product_names
The two above code snippets should be equivalent.
Once we have names (aka keys) associated with our values, we can use those to get specific indexes.
 # Print out the count of pizza sold.
 product_sold["Pizza"]
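Here is a slightly longer sketch of using a named vector like a dictionary: reading a value, adding a new key, and checking whether a key exists. The "Fries" entry is a made-up addition for illustration.

```r
product_sold <- c(50, 56, 102)
names(product_sold) <- c("Ice Cream", "Burgers", "Pizza")

# Double brackets return just the value, without the name attached.
pizza_count <- product_sold[["Pizza"]]   # 102

# Assigning to a name that does not exist yet adds a new entry.
product_sold["Fries"] <- 20

# Check whether a key exists.
has_pizza <- "Pizza" %in% names(product_sold)   # TRUE
has_salad <- "Salad" %in% names(product_sold)   # FALSE
```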
Matrices
Matrices are effectively "2-D Vectors" in R.
Declaring Matrices
Matrices are declared via the matrix function, which arranges the given values into rows and columns. The format is:
 matrix(<values>, byrow = <bool>, nrow = <row_count>)
For example, to declare a 3x3 matrix with values 1 through 9, we can use:
matrix(1:9, byrow = TRUE, nrow = 3)
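To see what the byrow argument changes, the sketch below builds the same values both ways and compares the first row of each. The names by_row and by_col are assumptions for this example.

```r
# Fill the matrix one row at a time.
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)

# Fill the matrix one column at a time.
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 4 7
```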
Similarly to Vectors (see Dictionaries), we can associate strings with our values. The syntax is as follows.
 # Declare matrix row names.
 rownames(my_matrix) <- row_names_vector

 # Declare matrix column names.
 colnames(my_matrix) <- col_names_vector
Accessing Matrix Values
Accessing matrix values is very similar to accessing vector values, except that you need to specify both row and column. Omitting a row/col value will assume you want all row/col indexes.
 # For example, this will get the second row and third column of a matrix.
 my_matrix[2, 3]

 # Get all values in the first two rows and first three columns of a matrix.
 my_matrix[1:2, 1:3]

 # Get all values in the first row.
 my_matrix[1, ]

 # Get all values in the first column.
 my_matrix[, 1]
Manipulating Matrices
We can combine matrices with the rbind or cbind function.
 # This will stack two matrices on top of each other (adding rows).
 # Both matrices need the same number of columns.
 rbind(matrix_1, matrix_2)

 # This will place two matrices side by side (adding columns).
 # Both matrices need the same number of rows.
 cbind(matrix_1, matrix_2)
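A small sketch showing the resulting shapes may help here. The two 2x2 matrices below are made-up example data.

```r
matrix_1 <- matrix(1:4, nrow = 2)
matrix_2 <- matrix(5:8, nrow = 2)

# rbind adds the rows of matrix_2 underneath matrix_1.
stacked <- rbind(matrix_1, matrix_2)

# cbind adds the columns of matrix_2 to the right of matrix_1.
side_by_side <- cbind(matrix_1, matrix_2)

dim(stacked)        # 4 2
dim(side_by_side)   # 2 4
```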
Similarly to vectors, mathematical operations applied to Matrices will apply element-wise. For example:
 # This will add two to every value in the matrix.
 my_matrix + 2
We can get sums of our matrix with the rowSums or colSums function:
 # Get sum of rows.
 rowSums(my_matrix)

 # Get sum of columns.
 colSums(my_matrix)
Factors
Due to the statistical nature of R, we can also create "categorical variables", called factors. These are basically variables in which there is a limited, set number of possible values, aka categories.
Declaring Factors
 # First, create a vector of your desired categories.
 gender_vector <- c("Male", "Female", "Other")

 # Use the "factor" function to create a categorical variable of this vector.
 gender_factor <- factor(gender_vector)

 # Note that it's possible to create a factor out of a vector with duplicate values.
 # For example, the following vector will create a factor with the same categories as above.
 gender_vector <- c("Male", "Female", "Male", "Female", "Other")
It's also possible to change the values of factors after the fact.
 # Given an existing factor of ("S", "M", "L"),
 # we want to change this to ("Small", "Medium", "Large").
 size_factor <- factor(c("S", "M", "L"))

 # We can print this out to see that values are printed in alphabetical order.
 size_factor

 # Now we change the values, but declare them alphabetically.
 levels(size_factor) <- c("Large", "Medium", "Small")
Types of Factors
- Nominal Category - A category in which there is no associated ordering. Ex: Gender.
- Ordinal Category - A category which has an implied ordering. Ex: Sizes of fast food drinks.
By default, all factors in R are nominal. But they can be "upgraded" to ordinal with the following:
 size_vector <- c("Small", "Medium", "Large")
 size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))
Note that order does not matter in the original vector. But it does matter in the levels argument supplied to the factor function.
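One practical benefit of an ordinal factor is that its values can be compared. The following is a minimal sketch. The vector is deliberately declared out of order to show that only the levels argument decides the ranking.

```r
# The original vector is not in size order.
size_vector <- c("Medium", "Small", "Large")
size_factor <- factor(size_vector, ordered = TRUE, levels = c("Small", "Medium", "Large"))

# "Medium" ranks above "Small", so this is TRUE.
medium_is_bigger <- size_factor[1] > size_factor[2]

# The largest category present in the factor.
largest <- as.character(max(size_factor))   # "Large"
```

Running the same comparison on a nominal (unordered) factor does not give a meaningful result, so this only applies once the factor is ordered.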
Data Frames
Data Frames are a lot like matrices, except that they are specialized to handle data sets with various different data types within.
Declaring Data Frames
 # We can use data.frame() along with desired vectors.
 # Note that we can pass in as many or few vectors as we want.
 # Each passed vector becomes a column for the data frame,
 # so all passed vectors should be of the same size.
 data.frame(my_vector_1, my_vector_2, my_vector_3)
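For a concrete version of the above, here is a sketch that builds a data frame from three vectors of different types. The employee data is invented for illustration.

```r
# Three vectors of the same length, each with a different type.
employee <- c("Bob", "Jim", "Sarah")
rating <- c(1, 4, 5)
is_manager <- c(FALSE, FALSE, TRUE)

my_df <- data.frame(employee, rating, is_manager)

# Three rows (one per employee) and three columns (one per vector).
dim(my_df)     # 3 3

# Column names are taken from the vector names.
names(my_df)   # "employee" "rating" "is_manager"
```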
Accessing Data Frame Values
Data Frame values are accessed the same way as Matrix values. See Accessing Matrix Values for more info.
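As a short sketch of that matrix-style access on a data frame, using invented employee data. The $ shortcut shown at the end is an extra that data frames offer on top of the matrix syntax.

```r
my_df <- data.frame(
  employee = c("Bob", "Jim", "Sarah"),
  rating = c(1, 4, 5)
)

# Value in the second row, second column.
jim_rating <- my_df[2, 2]            # 4

# All values in the first row (returned as a one-row data frame).
first_row <- my_df[1, ]

# All values in the "rating" column, selected by name.
all_ratings <- my_df[, "rating"]     # 1 4 5

# Data frames also allow selecting a column with $.
same_ratings <- my_df$rating
```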
Lists
Lists are essentially "collections of multiple objects".
For example, maybe a dataset has multiple sets, and it makes sense to represent each set as a vector. Then to represent the full dataset, you might put all these vectors in a list.
Declaring Lists
To declare a list, use the list function and include all the desired datasets within.
 # For example, to create a list of vectors, like our example scenario above.
 my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
Similarly to Vectors, we can name our list items if desired.
names(my_list) <- c("First Set", "Second Set", "Third Set")
Manipulating List Values
Accessing values in a list is similar to accessing values in vectors.
 # Create our list.
 my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

 # Get first item in list. Single brackets return a smaller list that contains the item.
 my_list[1]

 # Double brackets return the item itself (here, the vector 1, 2, 3).
 my_list[[1]]
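The difference between the two bracket styles is easiest to see by checking the type of what comes back. A minimal sketch:

```r
my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Single brackets: still a list, just with one item in it.
class(my_list[1])     # "list"

# Double brackets: the stored vector itself.
class(my_list[[1]])   # "numeric"

# So functions that expect a vector need the double bracket form.
first_total <- sum(my_list[[1]])   # 6
```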
We can add new list values with assignment. Updating uses the same syntax.
 # Add a new, 4th vector to our list.
 # Double brackets store the whole vector as one list item.
 my_list[[4]] <- c(10, 11, 12)
We can remove list values with assignment to NULL.
 # Remove the 4th vector of our list.
 my_list[4] <- NULL
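Putting adding and removing together, the sketch below tracks the list length at each step. The names length_after_add and length_after_remove are made up for this example.

```r
my_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Add a 4th item. Double brackets store the whole vector as a single item.
my_list[[4]] <- c(10, 11, 12)
length_after_add <- length(my_list)      # 4

# Remove the 4th item again.
my_list[4] <- NULL
length_after_remove <- length(my_list)   # 3
```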
Functions
Declaring Custom Functions
The general syntax to declare a function is:
 ###
 # Description of function.
 ##
 <function_name> <- function(<args>, <kwargs>) {
   # Function logic here.

   # Return a value at end of function.
   return(<return_value>)
 }
For example:
 ###
 # This function does stuff.
 ##
 my_function <- function(my_arg, my_kwarg = "Default Value") {
   # Function logic here. As a placeholder, join both arguments into one string.
   my_value <- paste(my_arg, my_kwarg)

   return(my_value)
 }
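Here is a small, complete sketch of declaring and calling a function with a default argument. The function name greet and its arguments are invented for illustration.

```r
###
# Builds a greeting for the given name.
##
greet <- function(name, greeting = "Hello") {
  message_text <- paste0(greeting, ", ", name, "!")

  return(message_text)
}

# Uses the default value for "greeting".
default_result <- greet("Ada")                   # "Hello, Ada!"

# Overrides the default by naming the argument.
custom_result <- greet("Ada", greeting = "Hi")   # "Hi, Ada!"
```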
Common Functions
Below are some common, prebuilt functions that apply to most, if not all of the data structures described in Basic Data Structures.
Print
Displays the provided value to the console.
 my_vector <- c(1, 2, 3, 4, 5)

 # This will display "1 2 3 4 5" in the console.
 print(my_vector)
Mean
Returns the mean of the passed data structure.
 # For example, we can get the mean of a vector.
 my_vector <- c(1, 2, 3, 4, 5)

 # This will return "3".
 mean(my_vector)
Summary
Returns the minimum, maximum, mean, median, and quartiles of the passed data structure.
Generally can be very useful for getting a quick overview of data properties.
 # For example, we can get the summary of a vector.
 my_vector <- c(1, 2, 3, 4, 5)

 # This will return appropriate summary values.
 summary(my_vector)
Head
Returns the first 6 values of the passed data structure by default. Pass the n argument to change the count.
 # For example, we can get the head of a vector.
 my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)

 # This will return "1, 2, 3, 4, 5, 5".
 head(my_vector)
Tail
Returns the last 6 values of the passed data structure by default. Pass the n argument to change the count.
 # For example, we can get the tail of a vector.
 my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)

 # This will return "5, 5, 4, 3, 2, 1".
 tail(my_vector)
Str
Returns a quick overview of the passed data structure. Includes things like number of values, type of variables, first few observations, and more.
 # For example, we can get the str of a vector.
 my_vector <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)

 # This will return the general structure of the vector.
 str(my_vector)
Subset
Returns a subset of data from the passed data structure.
This is particularly useful for data frames.
 # For example, we can create a data frame of employee ratings.
 names <- c("Bob", "Jim", "Johnny", "Sarah", "Sally")
 rating <- c(1, 2, 3, 4, 5)
 my_df <- data.frame(names, rating)

 # Then we can get a subset of only employees with ratings greater than 3.
 subset(my_df, subset = rating > 3)
Order
Returns the indexes that would put the passed data structure in ascending (smallest to largest) order.
 # For example, we can use this vector.
 my_vector <- c(5, 4, 3, 2, 1)

 # This will return indexes in reverse order.
 # Aka "5, 4, 3, 2, 1", to denote value ordering.
 order(my_vector)

 # If we want, we can then pass this back into our vector to display values in order.
 vector_order <- order(my_vector)
 my_vector[vector_order]
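Because the example above uses a vector that is already in exact reverse order, the indexes and the values happen to look identical. The sketch below uses a different, made-up vector so the two are easier to tell apart, and also shows the decreasing argument of order.

```r
my_vector <- c(30, 10, 20)

# The smallest value (10) sits at index 2, then 20 at index 3, then 30 at index 1.
ascending_indexes <- order(my_vector)                 # 2 3 1

# Use the indexes to display the values from smallest to largest.
ascending_values <- my_vector[ascending_indexes]      # 10 20 30

# Passing decreasing = TRUE reverses the direction.
descending_values <- my_vector[order(my_vector, decreasing = TRUE)]   # 30 20 10
```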