We seldom create data ourselves; most of the time we read it from a file. There are many ways to read data into R.
Reading from Raw data
> abc<-read.table("control_vs_inocul.txt")
> abc[1:10,]
V1 V2 V3 V4 V5
1 Rep Genotype top root Control
2 1 R1R4 0.65 0.48 0
3 1 R1R7 0.77 0.32 0
4 1 R1M1 0.77 0.25 0
5 1 R1M3 0.73 0.31 0
6 1 R1S1 0.69 0.33 0
7 1 R1S3 0.72 0.34 0
8 1 R1S7 0.56 0.28 0
9 1 R4R7 0.69 0.37 0
10 1 R4M1 0.65 0.40 0
Notice V1, V2, ...: these are the default variable names assigned by R, and the header line of the file has been read as the first row of data. To take the variable names from the file instead, use header=T:
> abc<-read.table("control_vs_inocul.txt", header=T)
> abc[1:10,]
Rep Genotype top root Control
1 1 R1R4 0.65 0.48 0
2 1 R1R7 0.77 0.32 0
3 1 R1M1 0.77 0.25 0
4 1 R1M3 0.73 0.31 0
5 1 R1S1 0.69 0.33 0
6 1 R1S3 0.72 0.34 0
7 1 R1S7 0.56 0.28 0
8 1 R4R7 0.69 0.37 0
9 1 R4M1 0.65 0.40 0
10 1 R4M3 0.86 0.30 0
Or read the data first, skipping the header line, and assign the variable names later:
> abc<-read.table("control_vs_inocul.txt", header=F, skip=1)
> abc[1:10,]
V1 V2 V3 V4 V5
1 1 R1R4 0.65 0.48 0
2 1 R1R7 0.77 0.32 0
3 1 R1M1 0.77 0.25 0
4 1 R1M3 0.73 0.31 0
5 1 R1S1 0.69 0.33 0
6 1 R1S3 0.72 0.34 0
7 1 R1S7 0.56 0.28 0
8 1 R4R7 0.69 0.37 0
9 1 R4M1 0.65 0.40 0
10 1 R4M3 0.86 0.30 0
> names(abc)<-c("A", "B", "C", "D", "E")
> abc[1:10,]
A B C D E
1 1 R1R4 0.65 0.48 0
2 1 R1R7 0.77 0.32 0
3 1 R1M1 0.77 0.25 0
4 1 R1M3 0.73 0.31 0
5 1 R1S1 0.69 0.33 0
6 1 R1S3 0.72 0.34 0
7 1 R1S7 0.56 0.28 0
8 1 R4R7 0.69 0.37 0
9 1 R4M1 0.65 0.40 0
10 1 R4M3 0.86 0.30 0
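The two steps above (read, then rename) can also be done in a single call with the col.names argument of read.table. This is a minimal sketch: the inline sample rows and the text= argument stand in for control_vs_inocul.txt, and the name abc2 is only for illustration.

```r
# Inline sample standing in for the first rows of control_vs_inocul.txt
sample_txt <- "Rep Genotype top root Control
1 R1R4 0.65 0.48 0
1 R1R7 0.77 0.32 0
1 R1M1 0.77 0.25 0"

# Skip the header line and supply our own column names in one step
abc2 <- read.table(text = sample_txt, skip = 1,
                   col.names = c("A", "B", "C", "D", "E"))
names(abc2)
```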
Some rows have missing data:
> abc[210:215,]
A B C D E
210 3 R7M1 0.60 0.26 1
211 3 R7M3 0.36 0.31 1
212 3 R7S1 0.68 0.23 1
213 3 R7S3 . . 1
214 3 R7S7 0.52 0.42 1
215 3 M1M3 0.58 0.13 1
Here R does not treat "." as a missing value; it is just the text that was in the file. To have it recognized as a missing value, use the na.strings argument:
> abc<-read.table("control_vs_inocul.txt", header=T, na.strings=".")
> abc[210:215,]
Rep Genotype top root Control
210 3 R7M1 0.60 0.26 1
211 3 R7M3 0.36 0.31 1
212 3 R7S1 0.68 0.23 1
213 3 R7S3 NA NA 1
214 3 R7S7 0.52 0.42 1
215 3 M1M3 0.58 0.13 1
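Why this matters: a column containing a stray "." cannot be read as numeric, so R keeps the whole column as text (a factor in older versions of R, character from R 4.0 on). The sketch below uses a few made-up inline rows rather than the real file to show the difference; the names raw and clean are only for illustration.

```r
# Made-up rows in the same layout as control_vs_inocul.txt
sample_txt <- "Rep Genotype top root Control
3 R7S1 0.68 0.23 1
3 R7S3 . . 1
3 R7S7 0.52 0.42 1"

# Without na.strings, the "." forces the column to be read as text
raw <- read.table(text = sample_txt, header = TRUE)
is.numeric(raw$top)

# With na.strings = ".", the column is numeric and "." becomes NA
clean <- read.table(text = sample_txt, header = TRUE, na.strings = ".")
is.numeric(clean$top)
mean(clean$top, na.rm = TRUE)   # mean of the two non-missing values
```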
By default read.table splits fields on any whitespace (sep=""), that is, one or more spaces or tabs. You can change this with the sep parameter, for example sep="," for comma-separated data.
For a complete description, type ?read.table at the R prompt.
There are variants of read.table with different defaults, such as read.csv and read.delim.
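As a rough illustration of how these functions relate: read.csv behaves like read.table with header=TRUE and sep="," preset, and read.delim does the same for tab-separated data. The inline strings below are invented samples, not the real file.

```r
# The same small table, once comma-separated and once tab-separated
csv_txt <- "Rep,Genotype,top\n1,R1R4,0.65\n1,R1R7,0.77"
tab_txt <- "Rep\tGenotype\ttop\n1\tR1R4\t0.65\n1\tR1R7\t0.77"

a <- read.table(text = csv_txt, header = TRUE, sep = ",")
b <- read.csv(text = csv_txt)      # header and sep = "," are the defaults
d <- read.delim(text = tab_txt)    # header and sep = "\t" are the defaults

# Compare the three results
all.equal(a, b)
all.equal(a, d)
```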
Another common raw data format is fixed-width format; to read it you can use read.fwf.
For example, if fwf1.dat contains the following data:
1S1.52.33
2S2.56.33
3R1.23
then
> def <- read.fwf("fwf1.dat", widths=c(1,1,3,4), col.names=c("ID", "type", "top", "root"))
> def
ID type top root
1 1 S 1.5 2.33
2 2 S 2.5 6.33
3 3 R 1.2 3.00
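To try this without creating the file by hand, the sketch below writes the same three lines to a temporary file (the tempfile() path stands in for fwf1.dat) and reads them back. Note that the short third line still parses: the last field simply gets whatever characters remain.

```r
# Write the sample lines to a temporary file standing in for fwf1.dat
fwf_path <- tempfile(fileext = ".dat")
writeLines(c("1S1.52.33", "2S2.56.33", "3R1.23"), fwf_path)

# Field widths: 1 character ID, 1 character type, 3 for top, 4 for root
def2 <- read.fwf(fwf_path, widths = c(1, 1, 3, 4),
                 col.names = c("ID", "type", "top", "root"))
def2$root

# Clean up the temporary file
unlink(fwf_path)
```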
Another, more primitive, function for reading data is scan():
> scanned<-scan("control_vs_inocul.txt", skip=1, what=list(0,"",0,0,0), nlines=7)
Read 7 records
> scanned
[[1]]
[1] 1 1 1 1 1 1 1
[[2]]
[1] "R1R4" "R1R7" "R1M1" "R1M3" "R1S1" "R1S3" "R1S7"
[[3]]
[1] 0.65 0.77 0.77 0.73 0.69 0.72 0.56
[[4]]
[1] 0.48 0.32 0.25 0.31 0.33 0.34 0.28
[[5]]
[1] 0 0 0 0 0 0 0
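scan() returns a plain list rather than a data frame. If the components of what are given names, the result can be converted to a data frame directly. A sketch, using invented inline text through the text= argument of scan instead of the real file; the component names and the names scanned2 and scanned_df are arbitrary.

```r
# Inline sample standing in for control_vs_inocul.txt
sample_txt <- "Rep Genotype top root Control
1 R1R4 0.65 0.48 0
1 R1R7 0.77 0.32 0
1 R1M1 0.77 0.25 0"

# Name the components of 'what' so the resulting list has names
scanned2 <- scan(text = sample_txt, skip = 1, quiet = TRUE,
                 what = list(Rep = 0, Genotype = "", top = 0,
                             root = 0, Control = 0))

# A named list of equal-length vectors converts to a data frame
scanned_df <- as.data.frame(scanned2)
str(scanned_df)
```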
You can also use scan to enter data directly from the keyboard:
> scan()
1: 1 2 3 4 5 6 7 8
9:
Read 8 items
[1] 1 2 3 4 5 6 7 8
Most of the time we enter our raw data in a spreadsheet application such as MS Excel. What I do is save the data in CSV or tab-delimited text format and read it with read.table.
Another nice package, xlsReadWrite, reads data directly from Excel. To import from an Excel file directly:
> library(xlsReadWrite)
> data1<-read.xls("original.xls")
> data1[1:7,]
F M rep. CROSS Egg Gall top root. count
1 1 2 1 12 0 1 0.65 0.48 1371
2 1 2 1 12 0 1 0.65 0.48 1371
3 1 2 1 12 1 1 0.65 0.48 1371
4 1 2 1 12 0 1 0.65 0.48 1371
5 1 2 1 12 1 3 0.65 0.48 1371
6 1 2 1 12 0 1 0.65 0.48 1371
7 1 2 1 12 0 1 0.65 0.48 1371
For more information about the parameters and default values, see
http://cran.r-project.org/web/packages/xlsReadWrite/xlsReadWrite.pdf
More next time.