Week 5: Connecting with external data sources

IST 687

Corey Jackson

2020-02-05 18:15:44

Today’s Agenda

Announcements

Week 4: Inferential statistics (Review)

Week 4: Sampling and Inference (Review)

Week 4: Important terminology/code (Review)

Law of large numbers: As the size of a sample drawn from a random variable increases, the sample mean gets closer and closer to the true population mean.

Central limit theorem: Given a population with an unknown distribution (it could be uniform, binomial, or something else entirely) and finite variance, the distribution of sample means approaches a normal distribution as the sample size grows.

*Possible test questions
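Both ideas can be checked with a short simulation. The sketch below is illustrative only: the uniform population, the seed, and the sample sizes are assumptions chosen for the example, not values from the course material.

```r
set.seed(687)  # arbitrary seed so the run is reproducible

# A deliberately non-normal "population": uniform on [0, 1], true mean 0.5
population <- runif(100000)

# Law of large numbers: the sample mean settles near the population mean
# as the sample size grows
sample_sizes <- c(10, 100, 1000, 10000)
sample_means <- sapply(sample_sizes, function(n) mean(sample(population, n)))
lln_errors <- abs(sample_means - mean(population))
print(data.frame(n = sample_sizes, sample_mean = sample_means, error = lln_errors))

# Central limit theorem: means of many samples of size 30 pile up in a
# bell shape around the population mean, even though the population is flat
means_of_30 <- replicate(2000, mean(sample(population, 30)))
hist(means_of_30, main = "Means of 2,000 samples (n = 30)", xlab = "Sample mean")
```

The error column should generally shrink as n grows, and the histogram should look roughly bell-shaped and centered near 0.5.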

Week 4: Replication (Review)

rep(c("Corey","Home"),3)

Note: replicate(3,c("Corey","Home")) is not equivalent. It returns a 2 x 3 matrix with one copy of the vector in each column.

## [1] "Corey" "Home"  "Corey" "Home"  "Corey" "Home"
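A small sketch of how rep() and replicate() differ in what they return. The variable names are made up for the example.

```r
pair <- c("Corey", "Home")

flat <- rep(pair, 3)        # character vector of length 6
grid <- replicate(3, pair)  # 2 x 3 character matrix, one copy per column

print(flat)
print(dim(grid))

# replicate() re-evaluates its expression each time, which is why it suits
# repeated random sampling; rep() would only copy a single draw
draws <- replicate(3, mean(rnorm(10)))
print(length(draws))
```

In practice rep() is the tool for repeating fixed values, and replicate() is the one to reach for when repeating a sampling step.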

Lab 5: Storage Wars

Lab Goals:

Groups for Pair Programming

Lab 5: Working with SQL

SELECT column1, column2, …
FROM table-name

Lab 5: Working with SQL

##   mpg  hp
## 1  21 110
## 2  21 110
##   mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1  21   6  160 110  3.9 2.620 16.46  0  1    4    4
## 2  21   6  160 110  3.9 2.875 17.02  0  1    4    4
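The queries that produced the two results above did not survive in these notes. The sketch below shows queries that would give output of the same shape, assuming the sqldf package is installed and that LIMIT 2 was used to keep two rows; both are assumptions.

```r
library(sqldf)  # assumed installed: install.packages("sqldf")

# sqldf() treats data frames in the R session as tables, so the built-in
# mtcars data frame can be queried by name
two_cols <- sqldf("SELECT mpg, hp FROM mtcars LIMIT 2")  # two named columns
all_cols <- sqldf("SELECT * FROM mtcars LIMIT 2")        # * selects every column

print(two_cols)
print(all_cols)
```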

Lab 5: Working with SQL

"SELECT mpg,disp,cyl FROM mtcars WHERE cyl = 6"

##    mpg  disp cyl
## 1 21.0 160.0   6
## 2 21.0 160.0   6
## 3 21.4 258.0   6
## 4 18.1 225.0   6
## 5 19.2 167.6   6
## 6 17.8 167.6   6
## 7 19.7 145.0   6

Lab 5: Working with SQL

##   min(mpg)
## 1     10.4

Note: You’ll need to look up the appropriate SQL functions for today’s lab. Check w3schools.com.
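As a starting point, here is a sketch of a few standard SQL aggregate functions run through sqldf. The package is assumed to be installed, and the column aliases (min_mpg and so on) are names chosen for this example.

```r
library(sqldf)  # assumed installed

# An aggregate function collapses a column to a single value
lowest <- sqldf("SELECT min(mpg) FROM mtcars")

# Several aggregates can be combined in one query; AS renames the result columns
summary_row <- sqldf("SELECT min(mpg) AS min_mpg,
                             max(mpg) AS max_mpg,
                             avg(mpg) AS avg_mpg,
                             count(*) AS n
                      FROM mtcars")

print(lowest)
print(summary_row)
```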

Lab 5: Working with SQL (subqueries)

SELECT min(cyl) FROM mtcars

##   min(cyl)
## 1        4

SELECT mpg,disp,cyl FROM mtcars WHERE cyl = (select min(cyl) from mtcars)

##    mpg  disp cyl
## 1 22.8 108.0   4
## 2 24.4 146.7   4
## 3 22.8 140.8   4
## 4 32.4  78.7   4
## 5 30.4  75.7   4
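The subquery can be run from R and compared with a base R filter. This is a sketch assuming sqldf is installed; head() is used on the assumption that the listing above shows only the first rows of the result.

```r
library(sqldf)  # assumed installed

# The inner query runs first and returns the smallest cylinder count;
# the outer query then keeps only the rows that match it
via_sql <- sqldf("SELECT mpg, disp, cyl FROM mtcars
                  WHERE cyl = (SELECT min(cyl) FROM mtcars)")

# Base R equivalent, for comparison
via_r <- mtcars[mtcars$cyl == min(mtcars$cyl), c("mpg", "disp", "cyl")]

print(head(via_sql, 5))
print(nrow(via_sql) == nrow(via_r))
```

Writing the condition as a subquery avoids hard-coding the value 4, so the query keeps working if the data changes.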

Lab 5: sqldf in R

Homework 5

Homework 5 Tips

Homework 5 Tips: About JSON

Homework 5 Tips: Working with JSON (Step 1)

##      Length Class  Mode
## meta     1  -none- list
## data 18638  -none- list
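The summary above is what a parsed JSON object with two top-level elements looks like in R. The sketch below reproduces that shape on a tiny inline example. The jsonlite package, the JSON text, and the field positions are all assumptions for illustration; the homework file and the package it uses may differ.

```r
library(jsonlite)  # assumed installed; the homework may use a different JSON package

# Hypothetical stand-in for the downloaded file: a top-level object with
# a "meta" element and a "data" element holding one list per record
json_text <- '{
  "meta": {"view": {"name": "example"}},
  "data": [[1, "a", 10.5], [2, "b", null]]
}'

# simplifyVector = FALSE keeps everything as nested lists
parsed <- fromJSON(json_text, simplifyVector = FALSE)

print(summary(parsed))      # one row per top-level element
print(length(parsed$data))  # number of records

# Pull the third field out of every record; a JSON null becomes NA
third <- sapply(parsed$data, function(rec) if (is.null(rec[[3]])) NA else rec[[3]])
print(third)
```

Checking summary() and length() first is a cheap way to confirm where the records live before trying to build a data frame from them.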

Homework 5 Tips: Aggregating using tapply()

##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

Homework 5 Tips: Aggregating using tapply()

tapply(mtcars$mpg,mtcars$cyl,mean)

##        4        6        8 
## 26.66364 19.74286 15.10000

Note: na.rm = TRUE can also be passed through to mean() so that NA values are dropped before each group is averaged

e.g., tapply(mtcars$mpg,mtcars$cyl,mean, na.rm=TRUE)
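mtcars has no missing values, so the effect of na.rm is easier to see on a toy data frame. The data below is made up for the example.

```r
# Hypothetical toy data: one mpg value is missing in the 6-cylinder group
toy <- data.frame(
  cyl = c(4, 4, 6, 6, 8, 8),
  mpg = c(30, 26, 21, NA, 15, 17)
)

# Without na.rm, any group containing an NA gets NA as its mean
with_na <- tapply(toy$mpg, toy$cyl, mean)

# Extra arguments after the function are passed on to it, so mean()
# receives na.rm = TRUE and drops the NA before averaging
without_na <- tapply(toy$mpg, toy$cyl, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```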

Homework 5 Tips: User errors (Step 3 and 4)

"Corey"
nchar("Corey")

## [1] 5

"Corey "
nchar("Corey ")

## [1] 6
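A trailing space makes two entries that look alike compare as different strings. One way to handle this is base R's trimws(), sketched here on made-up values; whether the homework expects this exact approach is an assumption.

```r
# Hypothetical user-entered values with stray spaces
entries <- c("Corey", "Corey ", " Corey")

print(nchar(entries))  # lengths differ because spaces count as characters

# "Corey" and "Corey " are different strings until the padding is removed
print(entries[1] == entries[2])

cleaned <- trimws(entries)  # strips leading and trailing whitespace
print(nchar(cleaned))
print(length(unique(cleaned)))
```

Cleaning the strings before grouping or counting keeps entries such as "Corey" and "Corey " from being treated as separate categories.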

Next Week

Group Project Meetings