Step 1: Load the data

We will use the airquality data set, which ships with R in the built-in datasets package, so no download is needed.

 # Assign the built-in "airquality" dataset to a new variable called "air"
 air <- data.frame(airquality)
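
Before cleaning, it can help to confirm what was loaded. The following is a minimal sketch using only base R functions; the variable name "air_dims" is introduced here for illustration and is not part of the original steps.

```r
# Reload the data so this sketch runs on its own
air <- data.frame(airquality)

# Number of rows and columns in the dataframe
air_dims <- dim(air)
print(air_dims)

# Column names, types, and the first few values of each column
str(air)

# First six rows, which already show some NAs in Ozone and Solar.R
head(air)
```

The str() output shows every column's type, which is useful later when choosing a bin width for each histogram.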

Step 2: Clean the data

The loaded data contains some NAs (missing values), so you need to decide how to handle them. Here, each NA is replaced with the mean of its column.

 # find which columns in the dataframe contain NAs.
 colnames(air)[colSums(is.na(air)) > 0]
## [1] "Ozone"   "Solar.R"
 # Find the NAs in column "Ozone" and replace them with the mean of the non-missing values in that column
 air$Ozone[is.na(air$Ozone)] <- mean(air$Ozone, na.rm=TRUE)
 # Find the NAs in column "Solar.R" and replace them with the mean of the non-missing values in that column
 air$Solar.R[is.na(air$Solar.R)] <- mean(air$Solar.R, na.rm=TRUE)
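
The two replacements above follow the same pattern, so they could also be written as a loop over whichever columns contain NAs. This is only a sketch of that idea, not a different cleaning method; the names "air_sketch", "na_cols", and "remaining_na" are made up for this example, and a separate copy is used so the "air" dataframe from the steps above is left alone.

```r
# Work on a separate copy so the original "air" is not overwritten
air_sketch <- data.frame(airquality)

# Names of the columns that contain at least one NA
na_cols <- colnames(air_sketch)[colSums(is.na(air_sketch)) > 0]

# For each such column, replace its NAs with the mean of its non-missing values
for (col in na_cols) {
  col_mean <- mean(air_sketch[[col]], na.rm = TRUE)
  air_sketch[[col]][is.na(air_sketch[[col]])] <- col_mean
}

# Count the NAs left in the whole dataframe; this should be zero
remaining_na <- sum(is.na(air_sketch))
print(remaining_na)
```

Mean imputation keeps every row but pulls the filled-in values toward the center of the distribution, so the histogram for a heavily imputed column will show a tall bar near its mean.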

Step 3: Understand the data distribution

Create the following visualizations using ggplot2:

 # Load ggplot2 (install it first with install.packages("ggplot2") if needed)
 library(ggplot2)
 # Histograms for each of the variables
 # Use Ozone from the "air" dataframe as the x variable to create a histogram
 # Set the bin width to 5, the bar border color to white, and the bar fill color to black
 ggplot(air, aes(x=Ozone)) + geom_histogram(binwidth=5, color="white", fill="black")

 # Use Solar.R from the "air" dataframe as the x variable to create a histogram
 ggplot(air, aes(x=Solar.R)) + geom_histogram(binwidth=5, color="white", fill="black")

 # Use Wind from the "air" dataframe as the x variable to create a histogram
 ggplot(air, aes(x=Wind)) + geom_histogram(binwidth=0.5, color="white", fill="black")

 # Use Temp from the "air" dataframe as the x variable to create a histogram
 ggplot(air, aes(x=Temp)) + geom_histogram(binwidth=2, color="white", fill="black")

 # Use Month from the "air" dataframe as the x variable to create a histogram
 ggplot(air, aes(x=Month)) + geom_histogram(binwidth=1, color="white", fill="black")

 # Use Day from the "air" dataframe as the x variable to create a histogram
 ggplot(air, aes(x=Day)) + geom_histogram(binwidth=1, color="white", fill="black")
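
Each histogram above uses a different bin width because the variables have different ranges. As a rough check on those choices, the range of a variable divided by its bin width gives the approximate number of bars. The sketch below uses base R only; "binwidths" and "approx_bins" are names invented for this example, and the count ggplot2 actually draws may differ slightly depending on how it aligns the bin edges.

```r
# Reload the data so this sketch runs on its own
air <- data.frame(airquality)

# Bin widths used in the histograms above, keyed by column name
binwidths <- c(Ozone = 5, Solar.R = 5, Wind = 0.5, Temp = 2, Month = 1, Day = 1)

# Approximate number of bins: (max - min) / binwidth, rounded up
approx_bins <- sapply(names(binwidths), function(v) {
  value_range <- diff(range(air[[v]], na.rm = TRUE))
  ceiling(value_range / binwidths[[v]])
})

print(approx_bins)
```

If a variable ends up with only a handful of bins, or with far more bins than distinct values, that is a hint to try a different binwidth for that column.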