The “dataframe” is one of the most essential data structures used in R. It is conceptually equivalent to a database “relation” and to the typical rectangular dataset with variables as columns and cases as rows. For this activity, you will gain some skill with manipulating a dataframe.

Task 1

R offers several built-in dataframes. For this activity we will use the “mtcars” dataset, which contains 11 variables and 32 cases representing different models of cars.

The goal is to create a new variable for this dataframe that represents the engine displacement per cylinder in cubic inches for each vehicle. You may not know what displacement is (or maybe even cylinders), but it will suffice to know that values in the column named “disp” divided by values in the column named “cyl” will yield the appropriate quantity.
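
Arithmetic on dataframe columns in R is element-wise, which is why dividing one column by another produces one quotient per car. The following is a minimal sketch of that idea; the name per_cyl is a hypothetical scratch vector used only for illustration, and the original mtcars dataframe is read but not modified.

```r
# Element-wise division: each disp value is divided by the cyl value in the same row.
# per_cyl is a hypothetical scratch vector, not part of the assignment.
per_cyl <- mtcars$disp / mtcars$cyl

# One quotient per case, so the result has as many elements as mtcars has rows.
length(per_cyl)

# The first car (Mazda RX4) has disp = 160 and cyl = 6.
per_cyl[1]
```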

One fundamental principle of working with data is that you should never overwrite or change your original raw data. Therefore, your very first line of code should be:

  # Copy original dataframe into a new one
my_mtcars <- mtcars
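
To see why the copy protects the raw data, the sketch below adds a throwaway column to a copy and checks that the original is unaffected. The names scratch_copy and scratch are hypothetical and chosen so that they do not collide with my_mtcars.

```r
# Assignment creates an independent copy: changes to the copy do not reach mtcars.
# scratch_copy and scratch are hypothetical names used only for this check.
scratch_copy <- mtcars
scratch_copy$scratch <- 0

"scratch" %in% names(scratch_copy)  # TRUE: the copy has the new column
"scratch" %in% names(mtcars)        # FALSE: the original is unchanged
```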

From that point forward you can work on my_mtcars without altering the original data. To establish that you have completed the assignment correctly, your last command should summarize the dataframe, including your new variable, using the summary() function. The output of that final command should look exactly like the second summary below; the first summarizes the original mtcars for comparison:

  # Summarize the original dataframe for comparison
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
# Create a new variable that is the quotient of disp/cyl,
# reading both columns from the working copy rather than the original
my_mtcars$disper <- my_mtcars$disp / my_mtcars$cyl
summary(my_mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb           disper     
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000   Min.   :17.77  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:26.92  
##  Median :0.0000   Median :4.000   Median :2.000   Median :34.48  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812   Mean   :35.03  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:43.19  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000   Max.   :59.00
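
If only the new variable is of interest, summary() can also be applied to that single column. This is a small sketch, assuming the column is named disper as in the code above; it rebuilds the copy so that it runs on its own.

```r
# Rebuild the working copy and the derived column so this sketch is self-contained.
my_mtcars <- mtcars
my_mtcars$disper <- my_mtcars$disp / my_mtcars$cyl

# Summarize just the new column instead of the whole dataframe.
summary(my_mtcars$disper)

# The copy has gained one column; the original still has 11.
ncol(my_mtcars)
ncol(mtcars)
```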

Task 2

Gather some basic “demographic” information from about five friends or family members, and then enter those data into a data frame using the appropriate R commands. Finally, summarize the contents of the data frame, again using the appropriate R commands. Keep the demographics “light” to avoid getting too personal: For each person report 1) the number of pets that they have (dogs, cats, etc.); 2) their birth order in their family (i.e., 1 for first born, etc.); and 3) the number of siblings they have.

Collect the necessary data from your friends and family members, then write, test, and submit the R code needed to accomplish the following:

  1. Create three vectors of integers as described above, using the c() (concatenate) command to store the data reported by group members, with these variable names: pets, birthorder, and siblings.
pets <- c(1,3,5,6,0)
birthorder <- c(1,5,3,3,5)
siblings <- c(0,0,1,2,3)
  2. Also create a vector of user IDs for the friends and family members.
  # User IDs, one per participant
person <- c(441,442,443,444,445)
  3. Bind those four vectors together into a data frame called myFriends.
  # Combine the four vectors into one data frame
myFriends <- data.frame(person,siblings,pets,birthorder)
  4. Use the appropriate R command to report the structure of your data frame as well as a summary of the data (with minimums, means, maximums, etc., as shown on page 32). The result should show “X obs. of 4 variables,” where X is the number of friends and family members who reported their data.
  # Report the structure, then the summary statistics
str(myFriends)
## 'data.frame':    5 obs. of  4 variables:
##  $ person    : num  441 442 443 444 445
##  $ siblings  : num  0 0 1 2 3
##  $ pets      : num  1 3 5 6 0
##  $ birthorder: num  1 5 3 3 5
summary(myFriends)
##      person       siblings        pets     birthorder 
##  Min.   :441   Min.   :0.0   Min.   :0   Min.   :1.0  
##  1st Qu.:442   1st Qu.:0.0   1st Qu.:1   1st Qu.:3.0  
##  Median :443   Median :1.0   Median :3   Median :3.0  
##  Mean   :443   Mean   :1.2   Mean   :3   Mean   :3.4  
##  3rd Qu.:444   3rd Qu.:2.0   3rd Qu.:5   3rd Qu.:5.0  
##  Max.   :445   Max.   :3.0   Max.   :6   Max.   :5.0
  5. Use the $ notation explained on page 33 to list all of the values for each of the variables in the myFriends data frame (for example, myFriends$pets).
  # List the values of each variable in turn
myFriends$person
## [1] 441 442 443 444 445
myFriends$siblings
## [1] 0 0 1 2 3
myFriends$pets
## [1] 1 3 5 6 0
myFriends$birthorder
## [1] 1 5 3 3 5
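
Note that str() reports the columns above as num rather than int, because numbers typed without a suffix are stored as doubles in R. If true integer vectors are wanted, one option is the L suffix. The sketch below reuses the same values under hypothetical names (pets_int, myFriendsInt, and so on) so that it does not overwrite the objects created above.

```r
# The L suffix makes R store each value as an integer rather than a double.
# All names ending in _int, and myFriendsInt, are hypothetical.
pets_int       <- c(1L, 3L, 5L, 6L, 0L)
birthorder_int <- c(1L, 5L, 3L, 3L, 5L)
siblings_int   <- c(0L, 0L, 1L, 2L, 3L)
person_int     <- c(441L, 442L, 443L, 444L, 445L)

myFriendsInt <- data.frame(person = person_int,
                           siblings = siblings_int,
                           pets = pets_int,
                           birthorder = birthorder_int)

# str() should now report each column as int instead of num.
str(myFriendsInt)

# Compare the storage type with and without the suffix.
is.integer(c(1, 3, 5))     # FALSE
is.integer(c(1L, 3L, 5L))  # TRUE
```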

Hints: All of the examples that you need in order to write the necessary R commands are in Chapter 5. The hardest part of this exercise will probably be getting the data from your friends and family members. Don’t wait too long! It’s okay if not everyone you ask participates. Use the user IDs of the friends and family members from item #2 above to keep track of who participated.