To be honest, I’m totally ripping this from somewhere else. I think I first saw it as a trick in Data Manipulation with R. Even if the idea was my own, the code was from somewhere else. I just happened to think “I could basically do the SAS datalines with this code.” If you are familiar with SAS, you have probably come across cards or datalines in examples out there on the net. For instance, Academic Technology Services at UCLA demonstrates reading data this way with this example:
DATA survey;
  INPUT id sex $ age inc r1 r2 r3 ;
  DATALINES;
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5
;
RUN;
It is pretty clear what it does. You define a data set called “survey” with the fields named in the input statement. The datalines statement is then followed by lines of data whose contents match that input statement. It is a quick and easy way to read in a small data set, because the data import lives in the code itself. That also makes it well suited to examples.
R has no direct equivalent, although you can conveniently get data into the R environment by copying a body of data like the above to the clipboard and using a read.table("clipboard", …) statement. Still, to recreate the SAS datalines or cards example, we can use a simple wrapper along with scan as follows.
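A side note first, for anyone who would rather skip the clipboard: read.table also accepts a text argument, so the data can sit in a string inside the script. This is only a rough sketch. The object name survey_txt and the three-row subset are my own choices for illustration, and I set colClasses explicitly because a bare F can otherwise be read as a logical FALSE.

```r
# Illustration only: the first three survey rows held in a string.
survey_txt <- "
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
"

# read.table() reads from the string via its text argument.
# colClasses pins the column types so that sex stays character.
survey <- read.table(
  text = survey_txt,
  col.names = c("id", "sex", "age", "inc", "r1", "r2", "r3"),
  colClasses = c("numeric", "character", rep("numeric", 5))
)

str(survey)
```

That keeps the data with the code, but wrapped in quotes. The scan version, which puts the data lines directly after the call the way SAS does, is this: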
survey <- data.frame(
  scan(what =
    list(id = 0, sex = "", age = 0,   # 0 = numeric, "" = character
         inc = 0, r1 = 0, r2 = 0, r3 = 0)
  ) # end scan
) # end data.frame
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5
There should be an extra blank line below the last data line. The reason is that scan, when reading from the console, stops at the first blank line. For the same reason, this trick works when the code and data are pasted into or sent to the console. The scan command has a what parameter that is similar to the input statement in SAS. The parameter takes a list whose elements identify the data type of each field in the data set. For ease, you can use a sample value to represent the type, such as 0 for numeric or "" for character. We wrap the scan statement in a call to data.frame so the assigned object has the desired structure. The extra call around scan, however, can make the code daunting as the number of parentheses rises. Thus, we can make a wrapper for this function.
datalines <- function(...) data.frame(scan(...))
datalines(what = list(x = 0, y = 0))
20 114
196 921
115 560
50 245
122 575
100 475
33 138
154 727
80 375
147 670
182 828
160 762
This datalines function is effectively nothing more than a call to scan that returns a data.frame object. It makes the code cleaner and brings it closer to what we see with the SAS datalines statement. While this is not earth-shattering news in the R community, it can serve the same purpose that datalines does in SAS. It can also help those in transition between SAS and R to think about “how can I get that SAS behavior in R?”
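One catch with the console trick is that a file run through source() is parsed as R code, so bare data lines in it are a syntax error. As a possible workaround, here is a minimal sketch of a variant that takes the data as a string and passes it to scan's text argument. The name datalines_text is hypothetical, and the data are just the first three pairs from the example above.

```r
# Hypothetical variant of the wrapper: the data arrive as a string
# instead of being typed after the call.
datalines_text <- function(text, what) {
  # scan() reads from the string; quiet = TRUE suppresses the
  # "Read n records" message.
  data.frame(scan(text = text, what = what, quiet = TRUE))
}

xy <- datalines_text("20 114
196 921
115 560", what = list(x = 0, y = 0))

str(xy)
```

The what list plays the same role as before, so the call still reads a lot like the SAS input statement. The price is the pair of quotes around the data.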