8 Functions
R is a functional programming language, meaning that nearly everything you do is built on functions. However, moving beyond using pre-built functions to writing your own is when your capabilities really start to take off and your code becomes markedly more efficient to write.
Functions allow you to reduce code duplication by automating a generalized task so that it can be applied repeatedly. Whenever you catch yourself repeating a call or copying and pasting code, there is a good chance that you should write a function to eliminate the redundancy.
Unfortunately, due to their abstractness, grasping the idea of writing functions (let alone writing them well) can take some time. In this chapter I will provide the basic knowledge of how functions operate in R, along with sample code, to get you started on the right path.
8.1 Learning objectives
By the end of this lesson you will be able to:
Identify when you should re-write code into a function.
Understand the general components of functions.
Write functions to automate a generalized task.
8.2 When to write functions
NOTE:
This section is taken directly from R for Data Science by Garrett Grolemund and Hadley Wickham and is a great example of how to first start thinking about functions.
You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?
# just creating a dummy data frame
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
d = rnorm(10))
# normalising the variables, i.e. rescaling each column to 0-1
# using the min-max method.
df$a <- (df$a - min(df$a, na.rm = TRUE))/(max(df$a, na.rm = TRUE) -
min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE))/(max(df$a, na.rm = TRUE) -
min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE))/(max(df$c, na.rm = TRUE) -
min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE))/(max(df$d, na.rm = TRUE) -
min(df$d, na.rm = TRUE))
Did you spot the mistake? I made an error when copying-and-pasting the code for `df$b`: I forgot to change an `a` to a `b`. Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.
To write a function you need to first analyze the code. How many inputs does it have? This code only has one input: `df$a` (i.e. the variable). Also note that there is some duplication in this code: we compute the `min` of the variable twice, so it makes sense to compute it once and reuse the result.
Below is the function:
# Rescale a numeric vector to the 0-1 range using the min-max method
rescale_var <- function(var) {
  minimum <- min(var, na.rm = TRUE)
  maximum <- max(var, na.rm = TRUE)
  var_new <- (var - minimum) / (maximum - minimum)
  return(var_new)
}
Using the function / function call:
df$a <- rescale_var(df$a)
df$b <- rescale_var(df$b)
df$c <- rescale_var(df$c)
df$d <- rescale_var(df$d)
What if there are many variables? You would not want to write the function call once per column; instead, you can use a loop to ease your work, as shown below:
for (name in colnames(df)) {
  # use [[ ]] here: df$name would look for a column literally called "name"
  df[[name]] <- rescale_var(df[[name]])
}
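As an alternative to the explicit loop, the same rescaling can be sketched with `lapply()`, which applies a function to every column of a data frame. This is a minimal sketch that assumes every column is numeric; the fixed seed and the three-column data frame are added here only to make the example reproducible.

```r
# Rescale a numeric vector to the 0-1 range (same logic as rescale_var above)
rescale_var <- function(var) {
  minimum <- min(var, na.rm = TRUE)
  maximum <- max(var, na.rm = TRUE)
  (var - minimum) / (maximum - minimum)
}

set.seed(1)  # assumption: fixed seed so the dummy data is reproducible
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, rescale_var)

# Each column should now have minimum 0 and maximum 1
sapply(df, range)
```

Whether a loop or `lapply()` reads better is largely a matter of taste; both avoid repeating the call once per column.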
Automating repetitive lines of code like this reduces errors. It is now time to understand how functions work in R and how to write your own.
8.3 Function components
All R functions have three parts:
the `body()`, the code inside the function.
the `formals()`, the list of arguments which controls how you can call the function.
the `environment()`, the “map” of the location of the function’s variables.
When you print a function in R, it shows you these three important components. If the environment isn’t displayed, it means that the function was created in the global environment.
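The three components can be inspected directly. The sketch below uses a small hypothetical function (`add_tax` and its 16% default rate are illustrative, not part of the chapter) and calls the three accessor functions on it.

```r
# A small example function (the name and default rate are illustrative)
add_tax <- function(price, rate = 0.16) {
  price * (1 + rate)
}

body(add_tax)         # the code inside the function
formals(add_tax)      # the argument list, including the default for rate
environment(add_tax)  # the environment in which the function was created
```

When this is run at the top level of an R session, `environment(add_tax)` reports the global environment.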
8.4 Built In Functions
Built-in functions are functions that are pre-defined in R. They make programming in R easier: R provides a rich set of pre-defined functions that make computation more efficient and reduce programming time.
In R, the built-in functions can be further categorized as follows:
Math functions
Character functions
Statistical probability functions
Other statistical functions
8.4.1 Math functions
Built-in function | Description |
---|---|
abs(x) | Returns the absolute value of x. |
sqrt(x) | Returns the square root of x. |
ceiling(x) | Returns the smallest integer that is greater than or equal to x. |
floor(x) | Returns the largest integer that is less than or equal to x. |
trunc(x) | Returns x truncated toward zero (the fractional part is dropped). |
round(x, digits=n) | Returns x rounded to n decimal places. |
cos(x), sin(x), tan(x) | Return the cosine, sine, and tangent of x (x in radians). |
log(x) | Returns the natural logarithm of x. |
log10(x) | Returns the common (base-10) logarithm of x. |
exp(x) | Returns e raised to the power x. |
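The rounding functions are easy to confuse, especially for negative numbers. The short sketch below, using an arbitrary example vector, shows how they differ, together with a few of the other math functions from the table.

```r
x <- c(-2.7, -0.5, 1.2, 3.5)

ceiling(x)   # rounds up, toward positive infinity
floor(x)     # rounds down, toward negative infinity
trunc(x)     # drops the fractional part (rounds toward zero)
round(3.14159, digits = 2)  # 3.14

abs(-4)      # 4
sqrt(16)     # 4
log(exp(1))  # 1, because log() is the natural logarithm
log10(1000)  # 3
```

Note that `floor()` and `trunc()` agree for positive numbers but differ for negative ones.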
8.4.2 Character functions
Built-in function | Description |
---|---|
tolower(x) | Converts the string to lower case. |
toupper(x) | Converts the string to upper case. |
strsplit(x, split) | Splits the elements of character vector x at each occurrence of split. |
paste(..., sep="") | Concatenates strings, using the sep string to separate them. |
sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE) | Finds the first occurrence of pattern in x and replaces it with replacement. If fixed=FALSE, pattern is a regular expression; if fixed=TRUE, pattern is a literal text string. |
grep(pattern, x, ignore.case=FALSE, fixed=FALSE) | Searches for pattern in x and returns the indices of the matching elements. |
substr(x, start=n1, stop=n2) | Extracts substrings from a character vector. |
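As a quick illustration of the character functions in the table, here is a minimal sketch on an arbitrary example string. The expected results in the comments follow from the documented behaviour of each function.

```r
s <- "Functions in R"

toupper(s)                          # "FUNCTIONS IN R"
tolower(s)                          # "functions in r"
substr(s, start = 1, stop = 9)      # "Functions"
strsplit(s, split = " ")            # a list holding c("Functions", "in", "R")
paste("R", "functions", sep = "-")  # "R-functions"
sub("n", "N", s)                    # "FuNctions in R": only the first match is replaced
grep("in", c("min", "max", "print"))  # 1 3: indices of the matching elements
```

`strsplit()` returns a list because it works on whole character vectors, producing one list element per input string.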
8.4.3 Statistical functions
Built-in function | Description |
---|---|
mean(x, trim=0, na.rm=FALSE) | Calculates the arithmetic mean of x. |
sd(x) | Returns the standard deviation of x. |
median(x) | Returns the median of x. |
range(x) | Returns the minimum and maximum of x. |
sum(x) | Returns the sum of x. |
diff(x, lag=1) | Returns lagged differences, with lag indicating which lag to use. |
min(x) | Returns the minimum value of x. |
max(x) | Returns the maximum value of x. |
scale(x, center=TRUE, scale=TRUE) | Centers and/or standardizes the columns of a matrix. |
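The sketch below applies several of these statistical functions to an arbitrary example vector that contains a missing value, since the handling of `NA` is a common source of surprises.

```r
x <- c(2, 4, 4, 5, 7, 9, NA)

mean(x)                  # NA, because missing values propagate by default
mean(x, na.rm = TRUE)    # ignores the NA
median(x, na.rm = TRUE)  # 4.5
sd(x, na.rm = TRUE)
range(x, na.rm = TRUE)   # returns c(minimum, maximum), not a single number
sum(x, na.rm = TRUE)     # 31
diff(c(1, 4, 9, 16))     # 3 5 7: differences between consecutive elements

# scale() centers each column to mean 0 and scales it to standard deviation 1
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
scale(m)
```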
8.5 User Defined Functions
These are functions specific to what a user wants and once created they can be used like the built-in functions.
User defined Functions Components
The different parts of a function are −
Function Name − This is the actual name of the function. It is stored in R environment as an object with this name.
Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.
Function Body − The function body contains a collection of statements that defines what the function does.
Return Value − The return value of a function is the last expression evaluated in the function body, or the value passed to an explicit return() call.
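The parts listed above can be pointed out in a small example. Both functions below are hypothetical and exist only to label the components; `power` also shows an argument with a default value, and `say_hello` shows a function with no arguments.

```r
# Function name: power
# Arguments: x (required) and exponent (optional, default value 2)
power <- function(x, exponent = 2) {
  # Function body: the statements that do the work
  result <- x^exponent
  # Return value: the last evaluated expression
  result
}

power(3)                # uses the default exponent: 9
power(3, exponent = 3)  # overrides the default: 27

# A function with no arguments is also valid
say_hello <- function() {
  "Hello from R"
}
say_hello()
```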
8.6 Sample of user Defined Functions
8.6.1 Function 1:
# Function Definition
check <- function(x) {
  if (x %% 3 == 0) {
    result <- "the number is divisible by three"
  } else {
    result <- "the number is not divisible by three"
  }
  result
}
# Function Call
check(23)
8.6.2 Function 2:
# Function to calculate the average of a vector of numbers
calculate_average <- function(numbers) {
  average <- sum(numbers) / length(numbers)
  return(average)
}
# Example usage/function call
vector1 <- c(3, 5, 7, 9, 11)
average1 <- calculate_average(vector1)
print(average1)
vector2 <- c(2.5, 3.7, 6.9, 8.2, 9.6)
average2 <- calculate_average(vector2)
print(average2)
8.7 Function Call
Every operation in R is a function call
“To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call.”
— John Chambers
Once you have created a function, how do you call it? Where can you call the function?
You can call a function from anywhere in the environment in which the function is declared. We will read more about global and local environments and scopes in a later section.
To call a function, we simply have to use the function’s name and provide appropriate arguments. For example:
function_name(arguments)
When calling a function you can specify arguments by position, by complete name, or by partial name. Arguments are matched first by exact name (perfect matching), then by prefix matching, and finally by position.
Generally, you only want to use positional matching for the first one or two arguments; they will be the most commonly used, and most readers will know what they are.
Avoid using positional matching for less commonly used arguments, and only use readable abbreviations with partial matching. Named arguments should always come after unnamed arguments. If a function uses `...` (discussed in more detail below), you can only specify arguments listed after `...` with their full name.
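The matching rules can be tried out on a small function. The two functions below (`describe` and `join_all`) are hypothetical and written only for this illustration. The first shows matching by position, exact name, and partial name; the second shows that an argument listed after `...` must be given by its full name.

```r
# Hypothetical function: all named arguments come before ...
describe <- function(value, digits = 2, prefix = "Result:", ...) {
  paste(prefix, round(value, digits))
}

describe(3.14159)                    # by position: "Result: 3.14"
describe(3.14159, digits = 3)        # by exact name: "Result: 3.142"
describe(3.14159, dig = 3)           # by partial name: works, but harder to read
describe(prefix = "Pi is", 3.14159)  # names are matched first, then position

# Hypothetical function: collapse_with is listed after ...
join_all <- function(..., collapse_with = "-") {
  paste(c(...), collapse = collapse_with)
}

join_all("a", "b", collapse_with = "+")  # full name is matched: "a+b"
join_all("a", "b", collapse = "+")       # partial name is not matched: "a-b-+"
```

In the last call, `collapse = "+"` is absorbed by `...` as just another value to join, which is why the default separator is still used.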
These are good calls:
mean(1:10)
mean(1:10, trim = 0.05)
This is probably overkill:
mean(x = 1:10)
And these are just confusing: