Data Steps / Reading from a File

We seldom create the data ourselves; most of the time we read it from a file. There are many ways to read data into R.

Reading from Raw data

> abc<-read.table("control_vs_inocul.txt")
> abc[1:10,]
    V1       V2   V3   V4      V5
1  Rep Genotype  top root Control
2    1     R1R4 0.65 0.48       0
3    1     R1R7 0.77 0.32       0
4    1     R1M1 0.77 0.25       0
5    1     R1M3 0.73 0.31       0
6    1     R1S1 0.69 0.33       0
7    1     R1S3 0.72 0.34       0
8    1     R1S7 0.56 0.28       0
9    1     R4R7 0.69 0.37       0
10   1     R4M1 0.65 0.40       0




Notice V1, V2, ...: these are the default variable names assigned by R, and the header line of the file has been read as the first row of data. To take the variable names from the file instead:





> abc<-read.table("control_vs_inocul.txt", header=TRUE)
> abc[1:10,]
   Rep Genotype  top root Control
1    1     R1R4 0.65 0.48       0
2    1     R1R7 0.77 0.32       0
3    1     R1M1 0.77 0.25       0
4    1     R1M3 0.73 0.31       0
5    1     R1S1 0.69 0.33       0
6    1     R1S3 0.72 0.34       0
7    1     R1S7 0.56 0.28       0
8    1     R4R7 0.69 0.37       0
9    1     R4M1 0.65 0.40       0
10   1     R4M3 0.86 0.30       0
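To try this without the original file, here is a minimal sketch that writes a few of the rows shown above to a temporary file and reads them back. The temporary file and the object names tmp and demo are only for illustration and are not part of the original data set.

```r
# Write a few rows (copied from the listing above) to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rep Genotype top root Control",
             "1 R1R4 0.65 0.48 0",
             "1 R1R7 0.77 0.32 0",
             "1 R1M1 0.77 0.25 0"), tmp)

# header = TRUE takes the column names from the first line of the file
demo <- read.table(tmp, header = TRUE)
names(demo)
str(demo)   # shows the type R chose for each column
```

Checking the result with str() is a quick way to confirm that the numeric columns really were read as numbers.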




Alternatively, read the data first and assign the variable names later:





> abc<-read.table("control_vs_inocul.txt", header=FALSE, skip=1)
> abc[1:10,]
   V1   V2   V3   V4 V5
1   1 R1R4 0.65 0.48  0
2   1 R1R7 0.77 0.32  0
3   1 R1M1 0.77 0.25  0
4   1 R1M3 0.73 0.31  0
5   1 R1S1 0.69 0.33  0
6   1 R1S3 0.72 0.34  0
7   1 R1S7 0.56 0.28  0
8   1 R4R7 0.69 0.37  0
9   1 R4M1 0.65 0.40  0
10  1 R4M3 0.86 0.30  0






> names(abc)<-c("A", "B", "C", "D", "E")
> abc[1:10,]
   A    B    C    D E
1  1 R1R4 0.65 0.48 0
2  1 R1R7 0.77 0.32 0
3  1 R1M1 0.77 0.25 0
4  1 R1M3 0.73 0.31 0
5  1 R1S1 0.69 0.33 0
6  1 R1S3 0.72 0.34 0
7  1 R1S7 0.56 0.28 0
8  1 R4R7 0.69 0.37 0
9  1 R4M1 0.65 0.40 0
10 1 R4M3 0.86 0.30 0
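The two steps can also be combined, since read.table accepts a col.names argument. A minimal sketch, using a small temporary file (hypothetical, built from the rows above) in place of control_vs_inocul.txt:

```r
# Small stand-in file with a header line and two data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rep Genotype top root Control",
             "1 R1R4 0.65 0.48 0",
             "1 R1R7 0.77 0.32 0"), tmp)

# skip = 1 drops the header line; col.names supplies our own names
demo <- read.table(tmp, header = FALSE, skip = 1,
                   col.names = c("A", "B", "C", "D", "E"))
names(demo)
```

This avoids a separate names() assignment when the names are known before the file is read.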




Some rows contain missing data:





> abc[210:215,]
    A    B    C    D E
210 3 R7M1 0.60 0.26 1
211 3 R7M3 0.36 0.31 1
212 3 R7S1 0.68 0.23 1
213 3 R7S3    .    . 1
214 3 R7S7 0.52 0.42 1
215 3 M1M3 0.58 0.13 1




Here R has not treated "." as a missing value; it is read literally, exactly as it appears in the file. To have it recognized as a missing value:





> abc<-read.table("control_vs_inocul.txt", header=TRUE, na.strings=".")
> abc[210:215,]
    Rep Genotype  top root Control
210   3     R7M1 0.60 0.26       1
211   3     R7M3 0.36 0.31       1
212   3     R7S1 0.68 0.23       1
213   3     R7S3   NA   NA       1
214   3     R7S7 0.52 0.42       1
215   3     M1M3 0.58 0.13       1
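The difference matters for more than display: a literal "." in a column stops R from reading that column as numeric. The sketch below shows this on a three-row temporary file (hypothetical, built from the rows above); the object names raw and fixed are just for illustration.

```r
# Stand-in file where one row uses "." for the missing measurements
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rep Genotype top root Control",
             "3 R7S1 0.68 0.23 1",
             "3 R7S3 . . 1",
             "3 R7S7 0.52 0.42 1"), tmp)

raw   <- read.table(tmp, header = TRUE)                    # "." kept as text
fixed <- read.table(tmp, header = TRUE, na.strings = ".")  # "." becomes NA

is.numeric(raw$top)            # the "." makes this column non-numeric
is.numeric(fixed$top)          # numeric once "." is declared missing
sum(is.na(fixed$top))          # number of missing values in top
mean(fixed$top, na.rm = TRUE)  # summaries need na.rm = TRUE to skip the NA
```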




By default read.table splits fields on white space (sep="", meaning one or more spaces or tabs). You can change this with the sep argument, for example sep="," for comma-separated files.



For a complete description, type ?read.table at the R prompt.



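There are variants of read.table, such as read.csv and read.delim, which differ mainly in their default settings (for example, a comma or a tab as the separator, and header=TRUE).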
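As a rough illustration of how these relate, the sketch below reads the same comma-separated temporary file with read.table and with read.csv, and a tab-separated one with read.delim. The files and object names are hypothetical; the rows are borrowed from the data above.

```r
# A comma-separated stand-in file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Rep,Genotype,top,root,Control",
             "1,R1R4,0.65,0.48,0",
             "1,R1R7,0.77,0.32,0"), csv_file)

a <- read.table(csv_file, header = TRUE, sep = ",")  # separator given explicitly
b <- read.csv(csv_file)                              # comma and header are defaults
identical(names(a), names(b))

# A tab-separated stand-in file; read.delim expects tabs and a header line
tab_file <- tempfile(fileext = ".txt")
writeLines(c("Rep\tGenotype\ttop",
             "1\tR1R4\t0.65"), tab_file)
d <- read.delim(tab_file)
names(d)
```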



Another common format for raw data is fixed-width format. To read it, you can use read.fwf.



For example, if fwf1.dat contains the following data:





1S1.52.33
2S2.56.33
3R1.23


then









> def <- read.fwf("fwf1.dat", widths=c(1,1,3,4), col.names=c("ID", "type", "top", "root"))
> def
  ID type top root
1  1    S 1.5 2.33
2  2    S 2.5 6.33
3  3    R 1.2 3.00
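The same call can be tried on a temporary copy of the three lines above. The second call in this sketch also uses a negative width, which tells read.fwf to skip that many characters; here it drops the one-character type field. The names tmp, def and def2 are only for illustration.

```r
# Recreate the three fixed-width lines in a temporary file
tmp <- tempfile(fileext = ".dat")
writeLines(c("1S1.52.33", "2S2.56.33", "3R1.23"), tmp)

# Fields: 1 char ID, 1 char type, 3 chars top, 4 chars root
def <- read.fwf(tmp, widths = c(1, 1, 3, 4),
                col.names = c("ID", "type", "top", "root"))
def

# A negative width skips characters: -1 leaves out the type field
def2 <- read.fwf(tmp, widths = c(1, -1, 3, 4),
                 col.names = c("ID", "top", "root"))
def2
```

Note that the third line is shorter than the others, so its last field contains only "3".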




Another, more primitive, function for reading data is scan():





> scanned<-scan("control_vs_inocul.txt", skip=1, what=list(0,"",0,0,0), nlines=7)
Read 7 records
> scanned
[[1]]
[1] 1 1 1 1 1 1 1

[[2]]
[1] "R1R4" "R1R7" "R1M1" "R1M3" "R1S1" "R1S3" "R1S7"

[[3]]
[1] 0.65 0.77 0.77 0.73 0.69 0.72 0.56

[[4]]
[1] 0.48 0.32 0.25 0.31 0.33 0.34 0.28

[[5]]
[1] 0 0 0 0 0 0 0
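The list returned by scan() has no names here. If the elements of what are named, the result carries those names and can be turned into a data frame. A minimal sketch on a temporary stand-in file (hypothetical, built from the rows above):

```r
# Stand-in file with a header line and three data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rep Genotype top root Control",
             "1 R1R4 0.65 0.48 0",
             "1 R1R7 0.77 0.32 0",
             "1 R1M1 0.77 0.25 0"), tmp)

# Each element of `what` gives the type of one field: 0 = numeric, "" = character.
# Naming the elements names the components of the result.
scanned <- scan(tmp, skip = 1,
                what = list(Rep = 0, Genotype = "", top = 0, root = 0, Control = 0))

# The components have equal length, so the list converts to a data frame
df <- as.data.frame(scanned)
df
```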


You can also use scan() to enter data directly from the keyboard:









> scan()
1: 1 2 3 4 5 6 7 8
9: 
Read 8 items
[1] 1 2 3 4 5 6 7 8
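Keyboard input only works in an interactive session. For use inside a script, scan() also accepts the values as a character string through its text argument; a small sketch, with x and y as illustrative names:

```r
# Same eight numbers, supplied as a string instead of typed at the prompt
x <- scan(text = "1 2 3 4 5 6 7 8")
x

# what = "" reads character values; sep sets the field separator
y <- scan(text = "a,b,c", what = "", sep = ",")
y
```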




Most of the time we enter our raw data in a spreadsheet application such as MS Excel. What I do is save the data in CSV or tab-delimited text format and read it with the read.table function.



Another nice package, xlsReadWrite, reads data directly from Excel. To import from an Excel file directly:





> library(xlsReadWrite)
> data1<-read.xls("original.xls")
> data1[1:7,]
   F M rep. CROSS Egg Gall  top root. count
1  1 2    1    12   0    1 0.65  0.48  1371
2  1 2    1    12   0    1 0.65  0.48  1371
3  1 2    1    12   1    1 0.65  0.48  1371
4  1 2    1    12   0    1 0.65  0.48  1371
5  1 2    1    12   1    3 0.65  0.48  1371
6  1 2    1    12   0    1 0.65  0.48  1371
7  1 2    1    12   0    1 0.65  0.48  1371




For more information about the parameters and default values, see



http://cran.r-project.org/web/packages/xlsReadWrite/xlsReadWrite.pdf



More next time.
