Chapter 11, “Storage Wars,” describes a variety of ways in which R can connect to data sources. Given that SQL – structured query language – is one of the most fundamental and widely used tools for manipulating data, understand how to use SQL in the context of R is very important. One of the basic data building blocks in R is the data frame and this object bears a very strong resemblance to the concept of a table in SQL. In fact, there is a package in R called “sqldf” that allows for the manipulation of a data frame as an SQL table. This feat is accomplished thanks to SQLite ( http://www.sqlite.org ), a fantastic, lightweight, open source implementation of SQL. Working with sqldf and SQLite is so convenient that under normal circumstances you can do your work completely within R, with no software installations needed on your computer (other than running the install.packages() command in R).

As we are about halfway through the course, this activity description does not provide the same level of code prompts as previous labs – it is assumed that you remember or can lookup the necessary code. The overall goal of this activity is to use SQL to produce a subset of the built-in “airquality” R dataset that contains only those records where the concentration of ozone is higher than the mean level of ozone. These are the conceptual steps you will need to follow:

• Review online documentation for sqldf so that you are familiar with the basic concepts and usage of the package and its commands.

• Install and activate (“library()”) the sqldf package in R-Studio. With any new package it is possible to run into installation issues depending on your platform and the versions of software you are running, so monitor your diagnostic messages carefully.

install.packages("sqldf")  #Comment out this line once you've installed the package
library(sqldf)
• Make sure the built-in “airquality” dataset is available for use in subsequent commands. It would be wise to reveal the first few records of airquality with head() to make sure that airquality is available. This will also show you the names of the columns of the airquality dataframe which you will need to use in SQL commands.

• Using sqldf(), run an SQL select command that calculates the average level of ozone across all records. Assign the resulting value into a variable and print it out in the console.

• Again using sqldf(), run another SQL command that selects all of the records from airquality where the value of Ozone is higher than the average. Note that it is possible to combine steps 4 and 5 into a single SQL command – those who are familiar with SQL syntax and usage should attempt to do so.

• Refine Step 5 to write the result table into a new R data object called “newAQ”. Then run a command to reveal what type of object newAQ is, another command to show what its dimensions are (i.e., how many rows and columns), and a head() command to show the first few rows.

• Repeat steps 4,5 and 6 using more “R” like way to do the analysis i.e., which()

LS0tCnRpdGxlOiAiTGFiIDU6IFN0b3JhZ2UgV2FycyIKYXV0aG9yOiAKLSBbWU9VUiBOQU1FXQotIFtZT1VSIFBBUlRORVJTIE5BTUVdCmRhdGU6ICJgciBTeXMudGltZSgpYCIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQpDaGFwdGVyIDExLCDigJxTdG9yYWdlIFdhcnMs4oCdIGRlc2NyaWJlcyBhIHZhcmlldHkgb2Ygd2F5cyBpbiB3aGljaCBSIGNhbiBjb25uZWN0IHRvIGRhdGEgc291cmNlcy4gR2l2ZW4gdGhhdCBTUUwg4oCTIHN0cnVjdHVyZWQgcXVlcnkgbGFuZ3VhZ2Ug4oCTIGlzIG9uZSBvZiB0aGUgbW9zdCBmdW5kYW1lbnRhbCBhbmQgd2lkZWx5IHVzZWQgdG9vbHMgZm9yIG1hbmlwdWxhdGluZyBkYXRhLCB1bmRlcnN0YW5kIGhvdyB0byB1c2UgU1FMIGluIHRoZSBjb250ZXh0IG9mIFIgaXMgdmVyeSBpbXBvcnRhbnQuIE9uZSBvZiB0aGUgYmFzaWMgZGF0YSBidWlsZGluZyBibG9ja3MgaW4gUiBpcyB0aGUgZGF0YSBmcmFtZSBhbmQgdGhpcyBvYmplY3QgYmVhcnMgYSB2ZXJ5IHN0cm9uZyByZXNlbWJsYW5jZSB0byB0aGUgY29uY2VwdCBvZiBhIHRhYmxlIGluIFNRTC4gSW4gZmFjdCwgdGhlcmUgaXMgYSBwYWNrYWdlIGluIFIgY2FsbGVkIOKAnHNxbGRm4oCdIHRoYXQgYWxsb3dzIGZvciB0aGUgbWFuaXB1bGF0aW9uIG9mIGEgZGF0YSBmcmFtZSBhcyBhbiBTUUwgdGFibGUuIFRoaXMgZmVhdCBpcyBhY2NvbXBsaXNoZWQgdGhhbmtzIHRvIFNRTGl0ZSAoIGh0dHA6Ly93d3cuc3FsaXRlLm9yZyApLCBhIGZhbnRhc3RpYywgbGlnaHR3ZWlnaHQsIG9wZW4gc291cmNlIGltcGxlbWVudGF0aW9uIG9mIFNRTC4gV29ya2luZyB3aXRoIHNxbGRmIGFuZCBTUUxpdGUgaXMgc28gY29udmVuaWVudCB0aGF0IHVuZGVyIG5vcm1hbCBjaXJjdW1zdGFuY2VzIHlvdSBjYW4gZG8geW91ciB3b3JrIGNvbXBsZXRlbHkgd2l0aGluIFIsIHdpdGggbm8gc29mdHdhcmUgaW5zdGFsbGF0aW9ucyBuZWVkZWQgb24geW91ciBjb21wdXRlciAob3RoZXIgdGhhbiBydW5uaW5nIHRoZSBpbnN0YWxsLnBhY2thZ2VzKCkgY29tbWFuZCBpbiBSKS4KCkFzIHdlIGFyZSBhYm91dCBoYWxmd2F5IHRocm91Z2ggdGhlIGNvdXJzZSwgdGhpcyBhY3Rpdml0eSBkZXNjcmlwdGlvbiBkb2VzIG5vdCBwcm92aWRlIHRoZSBzYW1lIGxldmVsIG9mIGNvZGUgcHJvbXB0cyBhcyBwcmV2aW91cyBsYWJzIOKAkyBpdCBpcyBhc3N1bWVkIHRoYXQgeW91IHJlbWVtYmVyIG9yIGNhbiBsb29rdXAgdGhlIG5lY2Vzc2FyeSBjb2RlLiBUaGUgb3ZlcmFsbCBnb2FsIG9mIHRoaXMgYWN0aXZpdHkgaXMgdG8gdXNlIFNRTCB0byBwcm9kdWNlIGEgc3Vic2V0IG9mIHRoZSBidWlsdC1pbiDigJxhaXJxdWFsaXR54oCdIFIgZGF0YXNldCB0aGF0IGNvbnRhaW5zIG9ubHkgdGhvc2UgcmVjb3JkcyB3aGVyZSB0aGUgY29uY2VudHJhdGlvbiBvZiBvem9uZSBpcyBoaWdoZXIgdGhhbiB0aGUgbWVhbiBsZXZlbCBvZiBvem9uZS4gVGhlc2UgYXJlIHRoZSBjb25jZXB0dWFsIHN0ZXBzIHlvdSB3aWxsIG5lZWQgdG8gZm9sbG93OgoKKiBSZXZpZXcgb25saW5lIGRvY3VtZW50YXRpb24gZm9yIHNxbGRmIHNvIHRoYXQgeW91IGFyZSBmYW1pbGlhciB3aXRoIHRoZSBiYXNpYyBjb25jZXB0cyBhbmQgdXNhZ2Ugb2YgdGhlIHBhY2thZ2UgYW5kIGl0cyBjb21tYW5kcy4gCgoqIEluc3RhbGwgYW5kIGFjdGl2YXRlICjigJxsaWJyYXJ5KCnigJ0pIHRoZSBzcWxkZiBwYWNrYWdlIGluIFItU3R1ZGlvLiBXaXRoIGFueSBuZXcgcGFja2FnZSBpdCBpcyBwb3NzaWJsZSB0byBydW4gaW50byBpbnN0YWxsYXRpb24gaXNzdWVzIGRlcGVuZGluZyBvbiB5b3VyIHBsYXRmb3JtIGFuZCB0aGUgdmVyc2lvbnMgb2Ygc29mdHdhcmUgeW91IGFyZSBydW5uaW5nLCBzbyBtb25pdG9yIHlvdXIgZGlhZ25vc3RpYyBtZXNzYWdlcyBjYXJlZnVsbHkuCgpgYGAKaW5zdGFsbC5wYWNrYWdlcygic3FsZGYiKSAgI0NvbW1lbnQgb3V0IHRoaXMgbGluZSBvbmNlIHlvdSd2ZSBpbnN0YWxsZWQgdGhlIHBhY2thZ2UKbGlicmFyeShzcWxkZikKYGBgCgoqIE1ha2Ugc3VyZSB0aGUgYnVpbHQtaW4g4oCcYWlycXVhbGl0eeKAnSBkYXRhc2V0IGlzIGF2YWlsYWJsZSBmb3IgdXNlIGluIHN1YnNlcXVlbnQgY29tbWFuZHMuIEl0IHdvdWxkIGJlIHdpc2UgdG8gcmV2ZWFsIHRoZSBmaXJzdCBmZXcgcmVjb3JkcyBvZiBhaXJxdWFsaXR5IHdpdGggaGVhZCgpIHRvIG1ha2Ugc3VyZSB0aGF0IGFpcnF1YWxpdHkgaXMgYXZhaWxhYmxlLiBUaGlzIHdpbGwgYWxzbyBzaG93IHlvdSB0aGUgbmFtZXMgb2YgdGhlIGNvbHVtbnMgb2YgdGhlIGFpcnF1YWxpdHkgZGF0YWZyYW1lIHdoaWNoIHlvdSB3aWxsIG5lZWQgdG8gdXNlIGluIFNRTCBjb21tYW5kcy4gCgoqIFVzaW5nIHNxbGRmKCksIHJ1biBhbiBTUUwgc2VsZWN0IGNvbW1hbmQgdGhhdCBjYWxjdWxhdGVzIHRoZSBhdmVyYWdlIGxldmVsIG9mIG96b25lIGFjcm9zcyBhbGwgcmVjb3Jkcy4gQXNzaWduIHRoZSByZXN1bHRpbmcgdmFsdWUgaW50byBhIHZhcmlhYmxlIGFuZCBwcmludCBpdCBvdXQgaW4gdGhlIGNvbnNvbGUuIAoKKiBBZ2FpbiB1c2luZyBzcWxkZigpLCBydW4gYW5vdGhlciBTUUwgY29tbWFuZCB0aGF0IHNlbGVjdHMgYWxsIG9mIHRoZSByZWNvcmRzIGZyb20gYWlycXVhbGl0eSB3aGVyZSB0aGUgdmFsdWUgb2YgT3pvbmUgaXMgaGlnaGVyIHRoYW4gdGhlIGF2ZXJhZ2UuIE5vdGUgdGhhdCBpdCBpcyBwb3NzaWJsZSB0byBjb21iaW5lIHN0ZXBzIDQgYW5kIDUgaW50byBhIHNpbmdsZSBTUUwgY29tbWFuZCDigJMgdGhvc2Ugd2hvIGFyZSBmYW1pbGlhciB3aXRoIFNRTCBzeW50YXggYW5kIHVzYWdlIHNob3VsZCBhdHRlbXB0IHRvIGRvIHNvLiAKCiogUmVmaW5lIFN0ZXAgNSB0byB3cml0ZSB0aGUgcmVzdWx0IHRhYmxlIGludG8gYSBuZXcgUiBkYXRhIG9iamVjdCBjYWxsZWQg4oCcbmV3QVHigJ0uIFRoZW4gcnVuIGEgY29tbWFuZCB0byByZXZlYWwgd2hhdCB0eXBlIG9mIG9iamVjdCBuZXdBUSBpcywgYW5vdGhlciBjb21tYW5kIHRvIHNob3cgd2hhdCBpdHMgZGltZW5zaW9ucyBhcmUgKGkuZS4sIGhvdyBtYW55IHJvd3MgYW5kIGNvbHVtbnMpLCBhbmQgYSBoZWFkKCkgY29tbWFuZCB0byBzaG93IHRoZSBmaXJzdCBmZXcgcm93cy4gCgoqIFJlcGVhdCBzdGVwcyA0LDUgYW5kIDYgdXNpbmcgbW9yZSDigJxS4oCdIGxpa2Ugd2F5IHRvIGRvIHRoZSBhbmFseXNpcyBpLmUuLCB3aGljaCgpCgoKCgoK