An archive of Mark's Fall 2017 Intro Stat course.

Examining a random proportion (for 8:00 AM section)

mark

(5 pts)

Using your Discourse login name, load some data into a data frame like so:

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=mark")
head(df)

#     first_name last_name age gender height weight income smoke100 exerany handedness
# 1      Donna     Dinan  35 female  65.37 164.26   1947        N       Y          R
# 2      Ramon     Davis  20   male  66.59 139.53  22747        Y       Y          R
# 3       Mark      Buss  23   male  74.58 124.21  15489        N       Y          R
# 4      Lidia    Elmore  52 female  63.87 153.64   8369        N       Y          R

Use this data to find a 95% confidence interval for the proportion of people who are left handed.
Supposedly, about 12% of the population is left-handed. Use this data to perform a hypothesis test of that statement.

I guess your confidence interval should look like this: [0.0268, 0.1332].
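
For reference, here is a minimal sketch of how an interval like that can be computed. It assumes a sample of 100 with 8 left-handed people, a count chosen only because it reproduces the interval quoted above; the names n, phat, z, and ci are just illustrative.

```r
# Assumed counts: 8 left-handed out of 100 (an assumption, picked
# because it reproduces the interval quoted above).
n = 100
phat = 8/n

# Standard error of the sample proportion
se = sqrt(phat*(1-phat)/n)

# Normal multiplier for 95% confidence (about 1.96)
z = qnorm(0.975)

ci = c(phat - z*se, phat + z*se)
round(ci, 4)
# [1] 0.0268 0.1332
```

Using 2 in place of qnorm(0.975) gives a slightly wider interval, which is why answers computed with 2*se differ a little from those computed with 1.96*se.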


asiarenee5

df=read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=asiarenee5")
  head(df)

   first_name last_name age gender height weight income smoke100 exerany handedness
1   Nannette    Mounts  31 female  61.31 187.23   1210        Y       Y          L
2      Keith   Micheau  28   male  70.63 150.21  27734        Y       Y          R
3   Chantell  Alvarado  25 female  66.22 145.85   6424        N       N          L
4     Arthur     Allen  44   male  67.50 209.12   3734        N       N          R
5     Sheryl   Bednorz  32 female  61.29 138.08  24255        N       N          R
6      Debra Rodriguez  34 female  64.65 108.44   9326        N       Y          L
> table(df$handedness)

 L  R 
12 88 
> p=0.12
> se=sqrt(p*(1-p)/100)
> c(p-2*se,p+2*se)
[1] 0.05500769 0.18499231

     phat=0.12
     p0= 0.14
     se0=sqrt(p0*(1-p0)/100)
     pnorm(phat,p0,se0)
     [1] 0.05610010

The 95% confidence interval for the proportion of people who are left handed is [0.05500769, 0.18499231]. Performing a hypothesis test with a mean of 0.14 and phat of 0.12, I cannot reject my hypothesis that 14% of the population is left handed. My p-value of 0.05610010 is not less than 0.05.

jkelso

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=jkelso")

table(df$handedness)

L R
19 81

p=.19
se=sqrt(p*(1-p)/100)
c(p-1.96*se, p+1.96*se)
[1] 0.113109 0.266891

The confidence interval is (.1131,.2669)

phat=.19
p0=.12
se0=sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.9843839

The P-value is not less than .05, so we cannot reject that 12% of the population is left handed
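
One thing worth checking in a test like this is which tail pnorm reports. The sketch below is not part of the original post; it reuses the values above (phat = .19, p0 = .12, n = 100), and the names lower, upper, and two_sided are made up for illustration. pnorm(phat,p0,se0) gives the area to the left of phat, so when phat is above p0 the area at least as extreme lies to the right, and a two-sided p-value doubles the smaller tail.

```r
phat = .19
p0 = .12
n = 100
se0 = sqrt(p0*(1-p0)/n)

lower = pnorm(phat, p0, se0)      # area to the left of phat
upper = 1 - lower                 # area to the right of phat
two_sided = 2*min(lower, upper)   # doubles the smaller tail
c(lower, upper, two_sided)
```

Under these assumptions the two-sided value comes out below .05, so the conclusion may depend on which tail is used.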

GetSwifty

that was no good

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=GetSwifty")

head(df)

  first_name  last_name age gender height weight income smoke100
1    Phyllis      Smith  39 female  61.52 183.00   2507        N
2   Jennifer     Willis  36 female  63.27 175.96 152928        Y
3    Kenneth     Staton  45   male  72.41 158.97    201        Y
4      Brian      Liles  38   male  69.35 170.46  12655        N
5       Alma    Reynoso  33 female  62.22 158.13   1957        N
6        Yee Schoonover  33 female  63.10 190.13 420836        N
  exerany handedness
1       Y          R
2       Y          R
3       Y          L
4       Y          R
5       Y          R
6       Y          R

table(df$handedness)

L R
19 81

p = 0.19

se = sqrt(p*(1-p)/100)

c(p-2*se, p+2*se)

[1] 0.1115398 0.2684602

95% confidence interval = [0.1115398, 0.2684602]

phat = 0.19

p0 = 0.12

se0 = sqrt(p0*(1-p0)/100)

pnorm(phat,p0,se0)

[1] 0.9843839

The p-value is not less than 0.05, so we cannot reject the hypothesis that 12% of the population is left handed.

nsugar

 first_name last_name age gender height weight income
1      Bryan    Barden  41   male  67.17 182.78  16229
2     Joanne    Miller  23 female  63.08 151.54  16281
3     Joshua     Tapia  25   male  71.46 162.37  48168
4     Gerald  Marshall  49   male  71.88 175.65  30826
5       Emma   Orourke  41 female  62.44 163.40  13781
6   Mercedes  Martinez  28 female  63.33 222.74    530
    smoke100 exerany handedness
1        Y       N          R
2        N       N          R
3        N       Y          R
4        N       Y          R
5        N       Y          R
6        Y       Y          L
 table(df$handedness)

 L  R 
18 82 
p=.18
 se=sqrt(p*(1-p)/100)
c(p-2*se,p+2*se)
[1] 0.1031625 0.2568375
phat=.18
p0=.12
se0=sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.9675809

The 95% confidence interval for my data is (.1032, .2568).
The p-value is .9675809, which is not less than .05; therefore, we cannot reject the hypothesis that 12% of the population is left handed.

blaiser1

In my data, 21 people were left handed and 79 were right handed.
p=.21
se=.0407 (sqrt((.21*(.79))/100))

so my confidence interval is

(.1285, .291461)

df = read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=blaiser1")
head(df)
  first_name last_name age gender height weight income smoke100 exerany handedness
1      Marie   Ferrara  52 female  66.22 163.60   8202        Y       N          R
2      Helen  Madrigal  21 female  59.85 215.28  11506        N       Y          L
3       Anna  Prichard  23 female  56.30 141.59    714        Y       N          R
4      Tyler     Boyce  51   male  65.49 150.10   1775        Y       Y          R
5      Agnes    Pruitt  52 female  66.14 178.02   4290        Y       Y          L
6       Mary  Williams  56 female  60.76 176.40   7196        Y       N          R
pnorm(.21,.12,.03249)
1-pnorm(.21,.12,.03249)

So, when we take pnorm(.21,.12,.03249) we get 0.9971979, or we could skip that and do 1-pnorm(.21,.12,.03249), which comes out to 0.0028021. You’ll notice that it’s smaller than .05, so we REJECT the null. According to my data I would propose a hypothesis that states that 21% of the population is left handed.

amandanail

> df=read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=amandanail")
> head(df)
  first_name last_name age gender height weight income smoke100 exerany handedness
1      Bryon    Conway  25   male  66.33 188.01   2725        Y       Y          R
2      Steve    Garber  21   male  76.20 165.14 126743        N       N          R
3      David    Jedele  23   male  64.77 138.71  33589        N       Y          R
4      Rubye    Tuttle  21 female  62.70 134.78  10656        N       Y          L
5     Frieda     Bixby  33 female  64.52 178.19   7388        Y       N          R
6     Bonnie    Turner  44 female  63.92 186.30  56072        Y       Y          R
> table(df$handedness)
L  R 
12 88 
1. So, we see there are 12 left handed people and 88 right handed people.
2. The confidence interval is [0.05500769, 0.18499231]

shiller

df=read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=shiller")
head(df)

#   first_name   last_name age gender height weight income smoke100 exerany handedness
# 1   Cornelia       Heath  59 female  66.51 171.07   1317        N       Y          R
# 2     Donald        Reid  20   male  64.74 146.36  22910        Y       N          R
# 3  Katherine    Driskill  38 female  66.20 184.15   9002        Y       N          R
# 4      Brian     Archila  39   male  70.88 105.76   4637        Y       Y          R
# 5      Debra Schexnayder  21 female  62.36 140.74   5829        Y       Y          R
# 6      Larry        Dunn  35   male  67.63 127.93   5493        Y       Y          L

table(df$handedness)
 L  R 
13 87 
p=0.13
se=sqrt(p*(1-p)/100)
c(p-2*se,p+2*se)
[1] 0.06273931 0.19726069
phat=0.13
p0=0.12
se0=sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.6208556

My 95% confidence interval for people who are left handed is (0.06274, 0.19726). Performing a hypothesis test with a mean of .12 and phat of .13, I cannot reject the hypothesis that 12% of the population is left-handed. My p-value of 0.6208556 is not less than 0.05.

DariousAquarious

table(df$handedness)

 L  R 
 9 91

In my sample 9 out of 100 are left handed.

sqrt((.09*.91)/100)
[1] 0.02861818
.09-0.057236
[1] 0.032764
.09+0.057236
[1] 0.147236

To a 95% confidence interval the true proportion of left handed people is

.09\pm.057236

[.032764,.147236]

phat=.09
p0=.12
se0=sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.1779551

I cannot reject my null hypothesis, since the final p-value of .1780 is more than the .05 significance level.

alogan3

df = read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=alogan3")
head(df)

first_name last_name age gender height weight   income smoke100 exerany handedness
1  Ernest Culbertson 37   male   67.36 154.32    25954        N       N     R
2  Leonard  Baird    31   male   68.74 169.88    21447        Y       N     R
3    Tina   Sweet    20  female  67.77 182.46     1582        N       Y     R
4 Kristine Seltzer   21  female  65.94 168.33 29010583        N       N     R
5    Judy   George   27  female  57.33 128.77    12985        N       Y     R
6  Jeffrey   Brown   29   male   63.71 152.05     8113        Y       Y     R
table(df$handedness)
L    R 
18  82 
p = 0.18
se = sqrt(p*(1-p)/100)
c(p-2*se, p+2*se)
[1] 0.1031625 0.2568375
phat = 0.18
p0 = 0.12
se0 = sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
 [1] 0.9675809

(0.1032, 0.2568) is my 95% confidence interval for people who are left-handed. Calculating the hypothesis test with a phat of .18 and a mean of 0.12, I cannot reject the null hypothesis that 12% of people are left-handed because my p-value, 0.9675809, is not less than 0.05.

acuozzi3

My 95% confidence interval for people who are left handed is (.07858572, .22141428). Performing a hypothesis test with a mean of .12 and a phat of .15, I cannot reject the hypothesis that 12% of the population is left-handed. My p-value is .8220449.

   L                         R

.07858572 .22141428

2 .8220449

mrothenb

se = sqrt(p*(1-p)/100)
c(p-2*se, p+2*se)
[1] 0.02574136 0.13425864

phat = .08
p0 = .12
se0 = sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.1091773

Since the p-value is not less than .05, I cannot reject the null hypothesis.

nmitchel

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=nmitchel")
head(df)
first_name last_name age gender height weight income
1 Tasha Shelby 43 female 63.18 178.65 43854
2 Andrea Walden 51 female 60.98 170.78 7289
3 Cheryl Unger 46 female 59.90 179.02 4364
4 Melissa Hart 30 female 63.34 176.42 3398
5 Ethel Barker 32 female 64.61 145.12 88212
6 Laura Escareno 43 female 67.14 174.04 12467
smoke100 exerany handedness
1 N Y R
2 Y N R
3 Y Y R
4 Y Y R
5 N Y R
6 Y N R
table(df$handedness)

L R
15 85

p=.15
se=sqrt(p*(1-p)/100)
c(p-2*se, p+2*se)
[1] 0.07858572 0.22141428
phat=.15
p0=.12
se0=sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.8220449
data: 15 out of 100, null probability 0.15
X-squared = 0, df = 1, p-value = 0.5
alternative hypothesis: true p is less than 0.15
95 percent confidence interval:
0.000000 0.217903
sample estimates:
p
0.15

scrouse

 df = read.csv("https://marksmath.org/cgi-bin/random_data.csv?username=scrouse")
head(df)

 first_name   last_name age gender height weight income
1     Robert       Klima  20   male  64.06 194.86  46075
2      Keith       Duong  42   male  71.38 165.88   4266
3    Richard   Rodriquez  34   male  63.30 167.25   2267
4  Katharina      Ulcena  45 female  61.26 192.28  63607
5    Leonard Vandeventer  25   male  69.18 170.20  46165
6      Maria     Spencer  22 female  63.30 204.15  25589

smoke100 exerany handedness
1        Y       Y          R
2        Y       Y          R
3        N       Y          R
4        Y       Y          R
5        Y       Y          R
6        Y       N          R

 L  R 
13 87 

p= 0.13
se=sqrt(p*(1-p)/100)
se=0.0336
p-2*se=0.0627
p+2*se=0.1973
[0.0627, 0.1973]

BryanDadson3

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=BryanDadson3")
head(df)
first_name  last_name age gender height weight income smoke100 exerany handedness
1       Eric   Mcdonald  20   male  68.33 184.21  14563        Y       Y          R
2     Nicole   Mccreedy  36 female  62.78 191.15   8513        Y       N          R
3     Alicia  Hunsicker  28 female  64.10 119.92 168591        N       N          R
4      Janis       Hark  40 female  62.61 104.19   1384        Y       Y          R
5      James   Hamilton  33   male  69.29 204.69  21821        N       Y          R
6      Glenn Washington  28   male  70.42 157.09   1183        N       N          R

table(df$handedness)

L  R 
12 88 

p = 0.12
se = sqrt(p*(1-p)/100)
c(p-2*se, p+2*se)
[1] 0.05500769 0.18499231

phat = 0.12
p0 = 0.12
se0 = sqrt(p0*(1-p0)/100)
pnorm(phat,p0,se0)
[1] 0.5

The 95% confidence interval for the proportion of people who are left handed is: [0.055, 0.185]
The p-value is not less than .05, so we cannot reject that 12% of the population is left handed.


audrey

Here’s my approach to the hypothesis test portion of the question. First, I’ll read in the data and format the handedness portion of that data as a table.

df = read.csv("https://www.marksmath.org/cgi-bin/random_data.csv?username=audrey")
table(df$handedness)

# Out:
#  L  R
# 10 90

From here, I can see that my sample proportion is \hat{p} = 10/100 = 0.1, which is certainly not exactly the same as the published proportion of 0.12. I guess our hypotheses should be:

H_0: p = 0.12,
H_A: p \neq 0.12.

To compute the p-value, the question is: what is the probability of generating data at least as far away from 0.12 as 0.1 under the assumption that the true proportion is 0.12? We’ll use a normal distribution to approximate this. The mean and standard deviation of the normal are dictated by the assumed proportion and the sample size. Thus:

\mu = 0.12 \text{ and } \sigma = \sqrt{0.12\times0.88/100} \approx 0.0325.

The probability that we are interested in is represented by the following area:

[Figure: shaded area under the normal curve]

We can compute this in R as follows:

2*pnorm(0.1,0.12, sqrt(0.12*0.88/100))
# Out: 0.5382527

Since this is a lot bigger than 0.05, we cannot reject the null.
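
As a cross-check, the same two-sided p-value can be obtained from R's built-in prop.test with the continuity correction turned off. This is only a sketch and not part of the original post; the names by_hand and built_in are made up for the comparison.

```r
x = 10      # left-handed count from the table above
n = 100     # sample size
p0 = 0.12   # proportion under the null hypothesis

# Two-sided p-value from the normal approximation, as computed above
by_hand = 2*pnorm(x/n, p0, sqrt(p0*(1-p0)/n))

# Same test via prop.test; correct = FALSE turns off the continuity
# correction so that the result matches the normal approximation
built_in = prop.test(x, n, p = p0, correct = FALSE)$p.value

c(by_hand, built_in)
```

With the default correct = TRUE, prop.test applies a continuity correction and reports a somewhat larger p-value, so the two numbers would no longer agree exactly.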