Data Steps / Reading from File

We seldom create the data ourselves; most of the time we read it from a file. There are many ways to read data.

Reading from Raw data

> abc<-read.table("control_vs_inocul.txt")
> abc[1:10,]
    V1       V2   V3   V4      V5
1  Rep Genotype  top root Control
2    1     R1R4 0.65 0.48       0
3    1     R1R7 0.77 0.32       0
4    1     R1M1 0.77 0.25       0
5    1     R1M3 0.73 0.31       0
6    1     R1S1 0.69 0.33       0
7    1     R1S3 0.72 0.34       0
8    1     R1S7 0.56 0.28       0
9    1     R4R7 0.69 0.37       0
10   1     R4M1 0.65 0.40       0

Notice V1, V2, ...: these are the default variable names assigned by R, and the real names have been read as the first row of data. To take the variable names from the first line of the file, use header=TRUE:

> abc<-read.table("control_vs_inocul.txt", header=TRUE)
> abc[1:10,]
   Rep Genotype  top root Control
1    1     R1R4 0.65 0.48       0
2    1     R1R7 0.77 0.32       0
3    1     R1M1 0.77 0.25       0
4    1     R1M3 0.73 0.31       0
5    1     R1S1 0.69 0.33       0
6    1     R1S3 0.72 0.34       0
7    1     R1S7 0.56 0.28       0
8    1     R4R7 0.69 0.37       0
9    1     R4M1 0.65 0.40       0
10   1     R4M3 0.86 0.30       0

Or read the data first and assign the variable names later. Here skip=1 skips the header line of the file:

> abc<-read.table("control_vs_inocul.txt", header=FALSE, skip=1)
> abc[1:10,]
   V1   V2   V3   V4 V5
1   1 R1R4 0.65 0.48  0
2   1 R1R7 0.77 0.32  0
3   1 R1M1 0.77 0.25  0
4   1 R1M3 0.73 0.31  0
5   1 R1S1 0.69 0.33  0
6   1 R1S3 0.72 0.34  0
7   1 R1S7 0.56 0.28  0
8   1 R4R7 0.69 0.37  0
9   1 R4M1 0.65 0.40  0
10  1 R4M3 0.86 0.30  0

> names(abc)<-c("A", "B", "C", "D", "E")
> abc[1:10,]
   A    B    C    D E
1  1 R1R4 0.65 0.48 0
2  1 R1R7 0.77 0.32 0
3  1 R1M1 0.77 0.25 0
4  1 R1M3 0.73 0.31 0
5  1 R1S1 0.69 0.33 0
6  1 R1S3 0.72 0.34 0
7  1 R1S7 0.56 0.28 0
8  1 R4R7 0.69 0.37 0
9  1 R4M1 0.65 0.40 0
10 1 R4M3 0.86 0.30 0
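Reading and naming can also be done in one call, because read.table accepts a col.names argument. A minimal sketch, assuming a small inline sample (passed through the text argument) in place of control_vs_inocul.txt; the object name abc2 is only for illustration:

```r
# Inline sample standing in for the first lines of control_vs_inocul.txt
# (an assumption for illustration; the real file is not reproduced here).
sample_text <- "Rep Genotype top root Control
1 R1R4 0.65 0.48 0
1 R1R7 0.77 0.32 0
1 R1M1 0.77 0.25 0"

# skip = 1 drops the header line; col.names supplies our own names.
abc2 <- read.table(text = sample_text, header = FALSE, skip = 1,
                   col.names = c("A", "B", "C", "D", "E"))
names(abc2)
abc2
```

This avoids a separate names(...) <- step, at the cost of having to know the number of columns in advance.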




Some rows have missing data:

> abc[210:215,]
    A    B    C    D E
210 3 R7M1 0.60 0.26 1
211 3 R7M3 0.36 0.31 1
212 3 R7S1 0.68 0.23 1
213 3 R7S3    .    . 1
214 3 R7S7 0.52 0.42 1
215 3 M1M3 0.58 0.13 1

Here "." is not treated as a missing value; it is read literally from the file. To have it recognized as a missing value, use na.strings:

> abc<-read.table("control_vs_inocul.txt", header=TRUE, na.strings=".")
> abc[210:215,]
    Rep Genotype  top root Control
210   3     R7M1 0.60 0.26       1
211   3     R7M3 0.36 0.31       1
212   3     R7S1 0.68 0.23       1
213   3     R7S3   NA   NA       1
214   3     R7S7 0.52 0.42       1
215   3     M1M3 0.58 0.13       1
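Why this matters: a single "." in a column is enough to stop R from reading that column as numbers. The sketch below illustrates this on a three-row inline sample (an assumption standing in for rows 212 to 214 of the file); na.strings is the full name of the argument.

```r
# Inline sample with one row of missing values marked by "."
sample_text <- "Rep Genotype top root Control
3 R7S1 0.68 0.23 1
3 R7S3 . . 1
3 R7S7 0.52 0.42 1"

# Without na.strings the "." forces the whole column to be read as text.
raw <- read.table(text = sample_text, header = TRUE)
is.numeric(raw$top)             # FALSE

# With na.strings = "." the column is numeric and "." becomes NA.
clean <- read.table(text = sample_text, header = TRUE, na.strings = ".")
is.numeric(clean$top)           # TRUE
is.na(clean$top)                # FALSE TRUE FALSE
mean(clean$top, na.rm = TRUE)   # mean of the two non-missing values
```

Once the values are real NAs, functions such as mean() can skip them with na.rm = TRUE.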




By default read.table splits fields on white space (sep="", meaning one or more spaces or tabs). You can change this with the sep parameter, for example sep="," for comma-separated files.

For a complete description, type ?read.table at the R prompt.

There are variants of read.table with different defaults, such as read.csv and read.delim.
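To see how sep and the variants relate, here is a minimal sketch that reads the same comma-separated text twice. The inline sample is an assumption; with a real file you would pass the file name instead of text.

```r
# Hypothetical comma-separated version of the first two data rows
csv_text <- "Rep,Genotype,top,root,Control
1,R1R4,0.65,0.48,0
1,R1R7,0.77,0.32,0"

# read.table with an explicit header and separator ...
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# ... and read.csv, which has header = TRUE and sep = "," as defaults.
via_csv <- read.csv(text = csv_text)

all.equal(via_table, via_csv)
via_csv
```

read.delim works the same way, with a tab as its default separator.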



Another common raw data format is fixed-width format. To read it, use read.fwf.

For example, if fwf1.dat contains the following data

1S1.52.33
2S2.56.33
3R1.23

then

> def <- read.fwf("fwf1.dat", widths=c(1,1,3,4), col.names=c("ID", "type", "top", "root"))
> def
  ID type top root
1  1    S 1.5 2.33
2  2    S 2.5 6.33
3  3    R 1.2 3.00
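To try this without creating fwf1.dat by hand, the same three lines can be written to a temporary file first. This is only a sketch; the temporary path is an assumption, and everything else follows the example above.

```r
# Write the three example lines to a temporary file so the sketch is
# self-contained (the text above uses a file called fwf1.dat instead).
fwf_file <- tempfile(fileext = ".dat")
writeLines(c("1S1.52.33", "2S2.56.33", "3R1.23"), fwf_file)

# widths gives the number of characters in each field, left to right.
def <- read.fwf(fwf_file, widths = c(1, 1, 3, 4),
                col.names = c("ID", "type", "top", "root"))
def

# The third line is shorter than the declared widths, so its last field
# contains only "3", which is why root shows 3.00 for that row.
def$root

unlink(fwf_file)   # clean up the temporary file
```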




A more primitive function for reading data is scan(). The what argument describes the type of each field (0 for numbers, "" for text):

> scanned<-scan("control_vs_inocul.txt", skip=1, what=list(0,"",0,0,0), nlines=7)
Read 7 records
> scanned
[[1]]
[1] 1 1 1 1 1 1 1

[[2]]
[1] "R1R4" "R1R7" "R1M1" "R1M3" "R1S1" "R1S3" "R1S7"

[[3]]
[1] 0.65 0.77 0.77 0.73 0.69 0.72 0.56

[[4]]
[1] 0.48 0.32 0.25 0.31 0.33 0.34 0.28

[[5]]
[1] 0 0 0 0 0 0 0
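The result of scan() here is a plain list, one element per field. If the elements of what are named, the list comes back named and can be turned into a data frame. A minimal sketch, assuming an inline sample in place of the file; the names scanned2 and scanned_df are only for illustration:

```r
# Inline sample standing in for the first lines of control_vs_inocul.txt
sample_text <- "Rep Genotype top root Control
1 R1R4 0.65 0.48 0
1 R1R7 0.77 0.32 0
1 R1M1 0.77 0.25 0"

# Naming the elements of `what` names the elements of the result.
scanned2 <- scan(text = sample_text, skip = 1, quiet = TRUE,
                 what = list(Rep = 0, Genotype = "", top = 0,
                             root = 0, Control = 0))
names(scanned2)

# A list of equal-length vectors converts directly to a data frame.
scanned_df <- as.data.frame(scanned2)
scanned_df
```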


You can also use scan() to enter data directly from the keyboard. Pressing Enter on an empty line ends the input:

> scan()
1: 1 2 3 4 5 6 7 8
9: 
Read 8 items
[1] 1 2 3 4 5 6 7 8

Most of the time we enter our raw data in a spreadsheet application such as MS Excel. What I do is save the data in csv or tab-delimited txt format and read it with the read.table function.
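A tab-delimited export often contains empty cells. The sketch below, which assumes a small made-up export rather than a real spreadsheet, shows that an empty cell in a numeric column is read as NA:

```r
# Hypothetical tab-delimited export; the root value in the second data row
# was an empty spreadsheet cell.
tab_text <- "Rep\tGenotype\ttop\troot
1\tR1R4\t0.65\t0.48
1\tR1R7\t0.77\t
1\tR1M1\t0.77\t0.25"

# read.delim is read.table with header = TRUE and sep = "\t" as defaults.
from_excel <- read.delim(text = tab_text)
from_excel
is.na(from_excel$root)   # FALSE TRUE FALSE
```

With a real export you would pass the file name as the first argument instead of text.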



Another nice package for reading data directly from Excel is xlsReadWrite. To import from an Excel file directly:

> library(xlsReadWrite)
> data1<-read.xls("original.xls")
> data1[1:7,]
   F M rep. CROSS Egg Gall  top root. count
1  1 2    1    12   0    1 0.65  0.48  1371
2  1 2    1    12   0    1 0.65  0.48  1371
3  1 2    1    12   1    1 0.65  0.48  1371
4  1 2    1    12   0    1 0.65  0.48  1371
5  1 2    1    12   1    3 0.65  0.48  1371
6  1 2    1    12   0    1 0.65  0.48  1371
7  1 2    1    12   0    1 0.65  0.48  1371

For more information about the parameters and default values, see

http://cran.r-project.org/web/packages/xlsReadWrite/xlsReadWrite.pdf

More next time.

Data Steps / Creating Data

Creating data

Using c(...):

> y=c(1,2,3,4,5)
> y
[1] 1 2 3 4 5

Using : for from:to

> x<-1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10

Using seq():

> a<-seq(1,10, 0.5); b<-seq(1,10, by =0.5)
> a
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0
> b
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0

> c<-seq(1,10, length=10); d=seq(1,10, length=15) # note that "<-" and "=" both assign values
> c
 [1]  1  2  3  4  5  6  7  8  9 10
> d
 [1]  1.000000  1.642857  2.285714  2.928571  3.571429  4.214286  4.857143
 [8]  5.500000  6.142857  6.785714  7.428571  8.071429  8.714286  9.357143
[15] 10.000000

> f=seq(1,10, by=2); g=seq(1,10, by=3)
> f
[1] 1 3 5 7 9
> g
[1]  1  4  7 10
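Two small points about seq() that the examples above touch on, shown as a sketch (the object names d2 and n are only for illustration). First, length= works because R matches it to the full argument name length.out. Second, seq_len() is a safer way than 1:n to count up to a length that might be zero.

```r
# length.out is the full argument name; length= is matched to it.
d2 <- seq(1, 10, length.out = 4)
d2                      # 1 4 7 10

# Counting up to a length that may be zero:
n <- 0
1:n                     # 1 0  (counts down, probably not what was meant)
seq_len(n)              # integer(0), an empty sequence
```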




Using rep():

> h=rep(1:5,2); i=rep(1:5, times=2)
> h
 [1] 1 2 3 4 5 1 2 3 4 5
> i
 [1] 1 2 3 4 5 1 2 3 4 5

> j=rep(1:5, each=2); k=rep(1:5, c(2,3,1,2,2))
> j
 [1] 1 1 2 2 3 3 4 4 5 5
> k
 [1] 1 1 2 2 2 3 4 4 5 5

> l=rep(1:5, each=3 , times=2)
> l
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

> m=rep(1:5, each=2, len=8);n=rep(1:5, each=2, len=12)
> m
[1] 1 1 2 2 3 3 4 4
> n
 [1] 1 1 2 2 3 3 4 4 5 5 1 1

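Using gl() to make factors. The first argument is the number of levels, the second the number of times each level is repeated, and the optional third the total length:

> gl(4,5)
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
Levels: 1 2 3 4
> gl(4,5, labels=c("A","B","C","D"))
 [1] A A A A A B B B B B C C C C C D D D D D
Levels: A B C D

> gl(4,1,20)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Levels: 1 2 3 4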
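A typical use of gl() is to build the factor columns of a balanced experimental layout. The sketch below assumes a made-up layout of 2 replicates with 3 treatments each; the names design, Rep and Treatment are only for illustration.

```r
# Hypothetical layout: 2 replicates, each containing treatments A, B and C.
design <- data.frame(
  Rep       = gl(2, 3),                                 # 1 1 1 2 2 2
  Treatment = gl(3, 1, 6, labels = c("A", "B", "C"))    # A B C A B C
)
design

# Every Rep x Treatment combination should appear exactly once.
table(design$Rep, design$Treatment)
```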




Using sequence():

> sequence(c(8:1))
 [1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 1 2 3 4 5 6 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
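What sequence() does is easier to see on a shorter input: for each number n it produces 1, 2, ..., n and joins the pieces together. A small sketch (the names lens, s1 and s2 are only for illustration):

```r
lens <- c(3, 2, 4)

s1 <- sequence(lens)
s2 <- unlist(lapply(lens, seq_len))   # the same thing spelled out by hand

s1                                    # 1 2 3 1 2 1 2 3 4
all(s1 == s2)                         # TRUE
```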


Some Basics Again

When you start an R session, it starts in a working directory. You can change it from the File menu, or with the code

> setwd("D:/Rsession")

Then you can get the path with the command

> getwd()
[1] "D:/Rsession"
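The working directory matters because a file name without a path, such as "control_vs_inocul.txt", is looked up there. A minimal sketch that changes the directory and then restores it; tempdir() stands in for "D:/Rsession", which may not exist on your machine:

```r
old_dir <- getwd()            # remember where we started
setwd(tempdir())              # tempdir() is a stand-in for "D:/Rsession"
getwd()

# Relative file names are now resolved inside the new working directory.
file.exists("control_vs_inocul.txt")

setwd(old_dir)                # go back to where we started
```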




You can assign values to an object with the <- or -> sign. The arrow always points at the name:

> x<-rnorm(50)
> rnorm(100)->y

If you try the opposite, with the arrow pointing at the expression, you get an error message:

> y->rnorm(50)
Error in rnorm(50) <- y : 
  target of assignment expands to non-language object
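The "=" sign also assigns at the prompt, but inside a function call it only names an argument. A short sketch of the difference (the names values, m1 and m2 are only for illustration):

```r
# "<-" inside a call assigns first, then passes the value on.
m1 <- mean(values <- c(2, 4, 6))
values                  # 2 4 6

# "=" inside a call only names the argument; no object called x is created.
m2 <- median(x = c(2, 4, 6))

c(m1, m2)               # 4 4
```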




The objects in your current workspace (not the working directory) can be listed with objects() or its shorter equivalent ls():

> objects() # or
> ls()
[1] "x" "y"

To remove an object:

> rm(x)
> ls()
[1] "y"

To remove all the objects:

> rm(list=ls())
> ls()
character(0)
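Since rm() takes a list of names, it can also remove just a subset chosen by pattern. The sketch below works in a separate environment so that it does not touch your own workspace; the object names are made up.

```r
# Work in a separate environment so the sketch leaves your workspace alone.
e <- new.env()
assign("tmp_a", 1, envir = e)
assign("tmp_b", 2, envir = e)
assign("keep_me", 3, envir = e)

# Remove only the objects whose names start with "tmp_".
rm(list = ls(e, pattern = "^tmp_"), envir = e)
ls(e)                   # "keep_me"
```

In your own workspace the same idea is rm(list = ls(pattern = "^tmp_")).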




To get the column names of a table (xy here is a data frame with the same columns as the data read earlier):

> names(xy)
[1] "Rep"      "Genotype" "top"      "root"     "Control"

To get more information about the variables:

> ls.str()
xy : 'data.frame':      224 obs. of  5 variables:
 $ Rep     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Genotype: Factor w/ 28 levels "M1M3","M1S1",..: 10 11 8 9 12 13 14 17 15 16 ...
 $ top     : num  0.65 0.77 0.77 0.73 0.69 0.72 0.56 0.69 0.65 0.86 ...
 $ root    : num  0.48 0.32 0.25 0.31 0.33 0.34 0.28 0.37 0.4 0.3 ...
 $ Control : int  0 0 0 0 0 0 0 0 0 0 ...
y :  num [1:50] -1.824965 -0.400061 -1.241349  2.017474 -0.000123 ...
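The listing above shows Genotype as a Factor, which was the default of read.table before R 4.0.0. From R 4.0.0 on, text columns are kept as character unless you ask for factors. A minimal sketch, assuming an inline sample in place of the file; the names xy_chr and xy_fac are only for illustration:

```r
# Inline sample standing in for the first lines of control_vs_inocul.txt
sample_text <- "Rep Genotype top root Control
1 R1R4 0.65 0.48 0
1 R1R7 0.77 0.32 0
1 R1M1 0.77 0.25 0"

# Default behaviour depends on the R version (character in R >= 4.0.0).
xy_chr <- read.table(text = sample_text, header = TRUE)
class(xy_chr$Genotype)

# Asking for factors explicitly gives the same result in any version.
xy_fac <- read.table(text = sample_text, header = TRUE,
                     stringsAsFactors = TRUE)
is.factor(xy_fac$Genotype)   # TRUE
str(xy_fac)
```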




To access a variable of a data frame:

> xy$Rep
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 

Notice that typing Rep on its own does not work here, because Rep exists only as a column of xy:

> Rep
Error: object "Rep" not found

To make it accessible:

> attach(xy)
> Rep
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 

To make it inaccessible again:

> detach()
> Rep
Error: object "Rep" not found
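If you only need the columns for one expression, with() is an alternative that avoids attaching and detaching. A minimal sketch on a small made-up data frame (xy_small and its values are assumptions, not the real data):

```r
# Small made-up data frame with two of the columns used above
xy_small <- data.frame(Rep = c(1, 1, 2, 2),
                       top = c(0.65, 0.77, 0.60, 0.36))

# with() evaluates the expression using the columns of the data frame,
# without attaching anything to the search path.
top_mean   <- with(xy_small, mean(top))
top_by_rep <- with(xy_small, tapply(top, Rep, mean))

top_mean
top_by_rep
```

Outside the with() call, top and Rep are still not visible on their own, so there is nothing to detach afterwards.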




To quit R:

> q() # or
> quit()



 


all for today