Ν. Σ.

On R

This article provides some examples of using R both in Statistics and Mathematics.

We assume that you have already installed R in your computer. See: www.r-project.org.

There are plenty of introductory and tutorial material at that web site. Let us start with a simple example.

1. Plotting curves

When you start R, the command prompt > appears. You just type the commands for the R language.

Plot y = x^2, for values of -10.0 <= x <= 10.0, with increments of 1.0. First, we have to generate values of x before we can compute the corresponding values of y.The R command that does this is:

x = seq(-10,10,1)

The first number is the starting number for x, the second is the end number, the third number is the value by which you increment values of x.

To verify the x values, just type x at the command prompt.

You will see a list of values for x that have been generated listed on your screen. Now you are ready to compute values of y.

Type,

y = x^2 at the command prompt. ^ denotes exponentiation or raising to some power.

Now, type y at the command prompt. You will see the list of y values for the corresponding x values. The plot command will display the y = x^2 graph on your computer display:

plot(x,y, " l ")

"l" is not the number one but the lower case L that tells R to connect the points as a continuous line.

Here is the result:

2. Reading-in Data

You can read-in the data-- if the amount of data is not large-- into R, thus:

x = c(1,20, 21, 34, 56,86, 3, 4)

Type x at the command prompt. You will see, > x [1] 1 20 21 34 56 86 3 4 > mean(x) # Gives you the mean of x. [1] 28.125 > sd(x) # Gives you the Std. Dev. of x [1] 29.82538 Similarly, for Variance, you type, var(x)2.1 Some built-in Functions in R:

The number e:

exp(1) # 2.718151 .............,

The number pi:

pi # 3.14159...............

Log of x to the base e:

log(x) # Note it is not ln.

3. How to generate Random Normal Deviates using the OpenOffice Spreadsheet.

There are two functions that do this: NORMINV & NORMSINV.

NORMINV generates a "Normal" value for the Inverse Normal Distribution for a specified mean and standard deviation, like this:

NORMINV(number; mean; standard Deviation)

The "number" should always in the open interval (0,1).

Example: Type,

=NORMINV(0.8;50;10)

The selected cell will display a value such as 58.4162.Whereas, NORMSINV produces values for the Standard Normal Distribution (mean = 0 and Std. dev. = 1).

NORMSINV is specified like this: =NORMSINV(0.8)

Note: 0.8 is chosen as an example. To produce a series of normally distributed random numbers, you select a cell, and type this equation in the equation slot of the spreadsheet thus:

=norminv(rand(); 50; 10)

Where rand() is the Uniform Random Number generator function. (i.e. 0< x <1)

This will produce one random number-- from the Normal Distribution-- in the selected cell. You can copy and paste this cell in as many cells as you want-- both horizontally and vertically.

An Example:

An array of 20 by 20 random normally distributed numbers to two decimals, with mean =50, and Std. Dev. = 10.

48.38 29.33 70.29 32.90 21.18 62.56 50.42 53.09 47.50 56.90 67.20 58.95 41.48 40.81 58.03 49.79 47.26 48.94 48.43 54.67

47.48 40.30 51.62 44.01 38.87 54.15 43.46 40.17 48.01 38.79 64.49 60.66 39.66 63.11 62.84 39.74 58.44 47.91 57.52 41.66

40.54 56.16 28.70 46.66 63.42 58.62 60.84 44.82 43.67 44.71 70.91 54.23 49.11 51.15 65.44 52.46 41.83 41.41 57.07 53.27

45.50 54.86 49.62 49.42 52.32 47.09 53.30 47.27 58.51 47.55 52.39 69.56 38.94 52.82 45.99 32.99 47.89 41.75 46.06 54.67

50.08 45.55 45.98 68.49 61.47 44.25 51.64 36.03 49.44 45.47 55.52 58.57 27.29 41.38 44.20 52.62 52.02 63.42 76.06 47.01

45.14 52.27 46.35 48.69 41.60 55.28 49.81 52.98 63.18 59.90 44.67 47.71 40.35 53.61 46.87 32.66 64.29 65.75 38.38 47.50

43.79 59.63 41.78 44.14 30.66 49.66 61.86 52.70 47.38 61.63 72.93 55.54 49.18 46.05 39.97 54.34 33.34 53.85 44.32 66.86

49.72 52.07 46.51 54.02 42.37 56.46 55.17 39.48 54.77 59.16 51.20 66.52 53.92 56.84 42.60 54.65 42.92 37.84 44.33 53.48

73.09 44.01 46.05 48.92 53.08 50.10 38.48 54.37 39.81 47.71 53.13 53.67 73.89 72.13 44.73 42.09 56.05 77.12 46.44 47.74

59.03 63.31 46.45 49.23 54.17 52.31 39.55 62.88 55.33 48.32 50.94 55.03 55.47 61.94 39.61 45.57 47.15 44.00 78.21 51.05

54.78 53.07 41.04 54.55 52.72 49.60 62.25 45.73 49.52 43.27 56.60 44.69 40.01 37.56 57.29 59.18 55.19 63.72 55.88 47.50

46.10 43.54 36.82 34.12 39.33 43.04 47.08 50.77 50.44 47.01 35.44 41.63 22.65 43.50 61.58 52.80 56.49 57.27 65.96 42.37

28.75 54.96 50.51 40.68 58.30 44.39 76.84 49.87 41.66 55.77 62.59 51.27 70.01 73.14 52.71 38.26 42.71 73.97 53.88 56.76

46.80 55.94 66.61 46.88 70.84 59.47 72.45 55.93 52.39 64.94 66.09 52.83 53.12 49.16 57.96 47.96 56.87 57.84 63.67 67.09

50.04 58.78 50.17 49.47 58.41 37.92 52.51 31.57 37.48 43.26 57.83 49.49 69.38 56.27 60.67 66.97 51.58 60.14 54.63 39.95

57.62 53.18 57.34 47.46 36.67 51.40 59.11 60.17 45.85 56.22 58.47 59.98 51.08 45.00 45.31 45.95 48.03 63.97 46.82 50.69

40.41 39.99 25.46 39.31 62.43 60.99 37.05 48.89 55.50 57.59 52.88 49.65 47.49 47.00 62.03 49.80 65.36 55.37 45.81 44.08

48.42 39.16 38.14 70.62 48.68 48.43 45.41 61.25 46.32 55.20 47.39 50.63 60.67 47.58 54.33 56.75 43.82 57.24 41.80 69.99

51.04 59.13 49.07 65.74 41.63 46.11 48.35 39.16 33.83 57.45 47.88 49.76 63.48 50.85 49.27 46.38 69.02 58.20 42.56 45.54

49.81 53.20 60.48 46.13 31.16 50.51 37.14 44.70 44.34 44.91 44.03 59.43 38.48 56.39 57.37 45.49 36.41 41.73 49.13 39.20

Note: We have restricted the numbers to two decimal places so that the array can fit within the web page.

For computational purposes, you may want more than two decimal places. Here we have generated a 20 x 20 array of normally distributed random numbers with a mean = 50, and the standard deviation =10.0.

We have purposely called this an array and not a matrix.

We have to use the R function "as.matrix" to convert the above array into a matrix-- if you want a matrix for further calculation.

Before we can use R for computations, the data has to be brought into R.

One method of data entry is via a spreadsheet such as Open Office Calc-- a free software.

After entering the data, the spreadsheet has to be saved as a .csv file. (csv stands for "comma separated variable").

Note: # is a comment character. Anything after # is ignored by R.

x = read.table("twenty_by_twenty_array.csv",header=F, sep=",", nrows=20) #reads the data from the file.

x # Lists the data array x on the terminal.

attach(x) # you have to bring the data into your program.

library(MASS) # One of the libraries in R that does the computations.

library(matrixcalc) # Another library in R that does matrix computations (does the inverse).

y= x[1:20,1:20] # Creates an array y to convert x into a matrix.

y # Lists the values of the array y on the terminal.

z = as.matrix(y) # Converts the array y into a matrix and assigns it to z.

z # Lists the values of z on the terminal. (We have not shown the listing here to save space)

zz =ginv(z) # calculates the generalized matrix inverse using MASS.

zz # We have not shown the listing here to save space.

z %*% zz # Multiplies the matrix z with its inverse zz calculated above. Result should be a diag. matrix [I]

p= inverse(z) # Again, compute the matrix invese of z to demonstrate the inverse function in matrixcalc lib.

p

q = z%*%p # Matrix multiplication : %*%

q

col.means= colMeans(z) # Compute the mean of each column in the matrix z.

col.means # List them.

row.means= rowMeans(z) # Compute the mean of each row

row.means

total.mean= mean(z) # Compute the mean of the entire array.

total.mean # Output the value.

> col.means

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

48.0275 49.7970 50.7705 53.5630 49.6185 46.0005 50.3575 52.1470 50.4975 49.1415

V11 V12 V13 V14 V15 V16 V17 V18 V19 V20

50.2575 51.5660 51.9245 49.0730 51.0820 48.9275 49.3760 52.3490 52.3380 48.0205

> row.means= rowMeans(z)

> row.means

>1 2 3 4 5 6 7 8 9 10

>49.6000 51.3220 50.4340 51.3155 50.4410 51.2335 52.3095 48.8760 53.2670 50.2080 11 12 13 14 15 16 17 18 19 20 51.3700 50.1935 47.5550 47.5600 50.0575 52.1805 48.2550 49.0375 46.7090 52.9100

>>> total.mean= mean(z)

>>> total.mean

>>[1] 50.24173

>4. Linear Regression

>># A Program to do Linear Regression

>># Program reads the data from a file named Reg_01 located in the same directory.

>>Data for the regression program below

>>#

>reg.data = read.table("Regr_01", header=T)

>>attach(reg.data)

>names(reg.data) # Tells you the names of the variables.

>>[1] "x" "y"

>x #Lists the values of x that have been read in.

>>[1] 26 27 33 29 29 34 30 40 22

>>y #Lists the values of y that have been read in.

>>[1] 290 305 325 327 356 411 488 554 246

>>pdf(file = "Regr_01.pdf") # Output a pdf file

>>postscript(file ="Regr_01.ps") # Output a postscript file.

>>plot(x,y, pch=16) # Scatter plot of the data on the terminal

>>lm(y~x) # lm stands for Linear Models.

>>Coefficients:

>(Intercept) x # The Result: What You will see

>> -109.92 15.89 # The eqn. of the regr. line is y =-109.92 +15.89x

>>summary(lm(y~x))

>>#

>>Residuals:

>>Min 1Q Median 3Q Max # The Result

>>-89.569 -19.463 -13.315 6.259 121.111

>>#

>Coefficients:

>> Estimate Std. Error t value Pr(>|t|)

>>(Intercept) -109.917 123.312 -0.891 0.40233

>>x 15.894 4.057 3.918 0.00576 **

>>Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

>>Residual standard error: 59.62 on 7 degrees of freedom

>>Multiple R-squared: 0.6868, Adjusted R-squared: 0.6421

>>F-statistic: 15.35 on 1 and 7 DF, p-value: 0.005765

>>#

>confint(lm(y~x)) # Computes the Confidence Intervals

>> 2.5 % 97.5 %

>>(Intercept) -401.504038 181.67070

>>x 6.300998 25.48604

>>p = seq(10,40,3)

>>p

>[1] 10 13 16 19 22 25 28 31 34 37 40

>>q =-109.92 +15.89*p

>>q

>>[1] 48.98 96.65 144.32 191.99 239.66 287.33 335.00 382.67 430.34 478.01

>>[11] 525.68

>>lines(p,q,"l")

>>dev.off() # Close the files

>>dev.off()

>>Plot of the Regression Line Along With the data

>5.0 Plotting a Parabola--Another Example

# Plotting a parabola y^2 = 4x

# The following command generates a series of x values, spaced 0.01 apart

x = seq(0,5, by =0.01)

# The following commands sets the stage for the y axis for the plot

y.upper = 2*sqrt(x)

y.lower = -2*sqrt(x)

y.max = max(y.upper)

y.min = min(y.lower)

##

#The following command merely displays the box in which the graph is to be plotted

plot ( c(-2,5), c(y.min, y.max), type ="n", xlab = "x", ylab ="y")

#The following command plots the upper portion (i.e. the part above the x axis)

lines(x, y.upper)

#The following command plots the lower half of y^2 = 4x

lines(x, y.lower)

#

# This command plots lines as specified, v for vertical, h for horizontal

abline (v = -1)

abline(h=0)

# The following plots a point as specified

points(1,0)

#This puts the text at the specified place, the text is specified in quotes

text(1,0, "focus (1,0)", pos = 4)

# Put a title on the graph thus:

title("The Plot of the Parabola y^2 = 4x")

# Execute this file (script) using source("plot_parabola.r", echo =T)

#You must be in the same directory as this file

Here is the result: