An archive of Mark's Fall 2017 Intro Stat course.

Random CDC-like data

mark

(5 pts)

I’ve got a random data generator on my webserver. You can download data directly into R and view the first couple of rows like so:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mark')
head(mydf,2)

#    first_name last_name age gender height weight income smoke100 exerany
#  1      Donna     Dinan  35 female  65.37 164.26   1947        0       1
#  2      Ramon     Davis  26   male  71.81 193.70  39311        1       1

Note that the username field must match your Discourse username and everyone gets different data.

The problem: Using your data, generate a contingency table relating gender and exercise. Does there appear to be a relationship? If so, what is that relationship?

Be sure to include the code you typed to get your answer. Code blocks are created by indenting your input four spaces.

TineriTalentati
cdc=read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=TineriTalentati')
> head(mydf,2)
Error in head(mydf, 2) : object 'mydf' not found
> head(mydf,2)
Error in head(mydf, 2) : object 'mydf' not found
> mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=TineriTalentati')
> head(mydf,2)
  first_name last_name age gender height weight income smoke100 exerany
1       Dana Pritchett  33 female  65.29 143.83 116672        1       0
2   Michelle       Rea  26 female  60.95 143.72  12435        0       1
 > table(mydf$gender,mydf$exerany)
    
          0  1
   female 11 34
   male   10 45

| Gender | Exerany (1) | No Exerany (0) | Total |
| --- | --- | --- | --- |
| Male | 45 | 10 | 55 |
| Female | 34 | 11 | 45 |
| Total | 79 | 21 | 100 |

My data suggest that, overall, men were somewhat more likely to exercise in some way (45/55, about 82%) than women (34/45, about 76%). Because these are percentages within each gender, they already take the slightly larger number of men into account.

As you can see, I also kept getting errors because I was typing in the wrong thing. This is how we learn!
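The hand-built totals table in this post can also be produced by R. The following is a minimal sketch, assuming the four counts from the `table()` output above are typed in by hand (the variable names `tab` and `with_totals` are just illustrative); `addmargins` then appends the row and column sums.

```r
# Counts copied from the table() output in this post:
# rows are gender, columns are exerany (0 = no, 1 = yes).
tab <- as.table(matrix(
  c(11, 34,
    10, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"),
                  exerany = c("0", "1"))
))

# addmargins() adds a "Sum" row and a "Sum" column.
with_totals <- addmargins(tab)
print(with_totals)
```

With the real data frame, the same thing would be `addmargins(table(mydf$gender, mydf$exerany))`.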

mark

@TineriTalentati Excellent enthusiasm!! We’ll work on this in class tomorrow together, though, and I’ll show you how to include code and other goodies in your post.

Dancerlikens

0 1
female 10 34
male 7 49

swtaylor

| Gender | Exercise (Yes) | Exercise (No) |
| --- | --- | --- |
| Female | 45 | 9 |
| Male | 36 | 10 |
| Total | 81 | 19 |

In my test group there were 2 females and 4 males. Out of the 4 males, 3 of them exercised, and 1 did not. Out of the 2 females, 1 exercised and 1 did not.
head(mydf)
| First Name | Last Name | Age | Gender | Height | Weight | Income | Smoke100 | Exerany |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Frank | Burton | 51 | male | 70.22 | 156.50 | 24739 | 0 | 1 |
| Adam | Cortez | 33 | male | 72.23 | 157.17 | 1310 | 0 | 1 |
| Janie | Duncan | 39 | female | 61.71 | 170.02 | 6480 | 0 | 0 |
| Margie | Baklund | 35 | female | 62.85 | 133.39 | 75304 | 0 | 1 |
| George | Crider | 30 | male | 67.13 | 200.15 | 3358 | 0 | 0 |
| Matthew | Shields | 46 | male | 75.52 | 163.44 | 3494 | 1 | 1 |









Edit: I tried to format the data above, but so far I haven't been able to get it to work. I'll keep working on it until it looks nicer.

In the full contingency table, 78% of men (36/46) and 83% of women (45/54) say that they exercise in some way.
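The counts for the six-row test group can be checked with the same `table` command used for the full data. This is a small sketch, assuming the `gender` and `exerany` values are copied by hand from the six rows shown by `head(mydf)` in this post; `six_rows` and `small_tab` are made-up names.

```r
# gender and exerany for the six rows shown by head(mydf) in this post.
six_rows <- data.frame(
  gender  = c("male", "male", "female", "female", "male", "male"),
  exerany = c(1, 1, 0, 1, 0, 1)
)

# Cross-tabulate gender (rows) against exerany (columns).
small_tab <- table(six_rows$gender, six_rows$exerany)
print(small_tab)
```

Six rows are far too few to say anything about a relationship; the full 100-row table is the one to interpret.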

miles

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=miles')

head(mydf,2)
first_name last_name age gender height weight income smoke100 exerany
1 Kathleen Miller 36 female 65.03 185.22 21171 0 1
2 Christina Murphy 42 female 58.77 158.55 16639 0 1
table(mydf$gender, mydf$exerany)

          0  1
   female 14 31
   male   10 45

| Gender | Exerany | No Exerany | Total |
| --- | --- | --- | --- |
| Male | 45 | 10 | 55 |
| Female | 31 | 14 | 45 |
| Total | 76 | 24 | 100 |

From my data, men are more likely to exercise (45/55, about 82%) than women (31/45, about 69%).

audrey

Alright - here’s how I imported my data and the first two lines of the data frame:

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=audrey')
head(mydf,2)

#   first_name last_name age gender height weight income smoke100 exerany
# 1     George  Howerton  51   male  70.98 222.96 216670        1       1
# 2        Rae    Cherry  42 female  61.12 169.26    709        0       1

As we learned in our class problem sheet, the contingency table is

table(mydf$gender, mydf$exerany)
    
#         0  1
# female 13 33
# male   11 43

From here, it’s pretty easy to see that the percentage of men who exercise is $43/54 \approx 80\%$, while the percentage of women who exercise is only $33/46 \approx 72\%$. Looks “significant” and we’ll learn of ways to test that later.
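The two percentages above are row-wise proportions of the `exerany = 1` column, and R can compute them directly with `prop.table`. A minimal sketch, assuming the counts are typed in from the table in this post instead of downloaded (`tab` and `row_props` are illustrative names):

```r
# Counts copied from the contingency table in this post.
tab <- as.table(matrix(
  c(13, 33,
    11, 43),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"),
                  exerany = c("0", "1"))
))

# margin = 1 divides each row by its own row total,
# so every row of the result sums to 1.
row_props <- prop.table(tab, margin = 1)
print(round(100 * row_props, 1))
```

With the downloaded data, the equivalent call would be `prop.table(table(mydf$gender, mydf$exerany), margin = 1)`.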

avavball13

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=avavball13')
head(mydf,2)

first_name last_name age gender height weight income smoke100 exerany
1 Evelyn Cano 28 female 64.93 113.82 173849 0 1
2 Mario Pettis 35 male 71.73 198.94 10393 1 1

Yes, there is a relationship, Evelyn and Mario both have 1 on exercise.

ceciliastack21

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=ceciliastack21')

head(mydf,2)
first_name last_name age gender height weight income smoke100 exerany
1 Tammara Croom 40 female 65.56 160.47 11448 1 1
2 Charles Davis 20 male 61.65 173.19 25513 1 1


I don’t think there is a relationship because they both exercise and one is male and one is female.

everyrose

#This line accessed and stored the data set from the website.

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=everyrose') 

#This line provided the following table:

head(mydf,2)       

first_name last_name age gender height weight income smoke100 exerany
1      Clark Valentine  24   male  66.20 167.46   5752        1       1
2     Johnie    Walker  43   male  72.21 175.21   8334        0       1

#This line provided the table below:

table(mydf$gender, mydf$exerany)

| Gender | Exercise | No Exercise | Total |
| --- | --- | --- | --- |
| Male | 45 | 10 | 55 |
| Female | 31 | 14 | 45 |
| Total | 76 | 24 | 100 |

In my test group the majority of participants (76%) exercised. 81.8% of male participants exercised while 68.9% of female participants exercised. While a larger percentage of male participants exercised than their female counterparts, this was a rather small sample (only 100 people), which largely affected the results. I would say there is not a strong relationship between male and female exercising and feel the data would tell a stronger story if it was about what type of exercise (cardio, strength, etc…) or even the frequency with which participants exercised.
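One way to put a number on how strong the relationship looks is a chi-squared test of independence. The course had not covered testing at this point in the thread, so treat the following only as a sketch of what such a check might look like, assuming the counts from the table in this post are entered by hand (`tab` and `result` are illustrative names).

```r
# Counts copied from the table in this post
# (rows: gender, columns: exerany with 0 = no, 1 = yes).
tab <- as.table(matrix(
  c(14, 31,
    10, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"),
                  exerany = c("0", "1"))
))

# Chi-squared test of independence between gender and exerany.
# For a 2x2 table, chisq.test() applies Yates' continuity correction by default.
result <- chisq.test(tab)
print(result)
```

A p-value above the usual 0.05 cutoff would be in line with the reading that this sample shows no strong relationship.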

everyrose

These are only the first two data points, given by the command “head(mydf, 2).” There should be 100 entries, and you can find the number of participants that exercised and didn’t by creating a table that displays gender and exerany.

BeauNichols

R wouldn’t let me change the username so I had to stick with Mark.
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mark')

head(mydf,2)
first_name last_name age gender height weight income smoke100 exerany
1 Donna Dinan 35 female 65.37 164.26 1947 0 1
2 Ramon Davis 26 male 71.81 193.70 39311 1 1


mark

@BeauNichols

read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=BeauNichols')

Seems to work fine for me!
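Changing the username only means changing the text after `username=` in the URL. The sketch below builds the URL with `paste0` so that the username sits in one place; `username`, `base_url`, and `data_url` are illustrative names, and the download line is commented out because it needs network access. Note that the quotes must be plain straight quotes, since R rejects the curly quotes that some editors insert.

```r
# Replace with your own Discourse username.
username <- "BeauNichols"

# Base address of the random data generator, as given in the problem statement.
base_url <- "https://marksmath.org/cgi-bin/random_data.csv"

# Glue the pieces together into the full URL.
data_url <- paste0(base_url, "?username=", username)
print(data_url)

# Uncomment to download the data (requires an internet connection):
# mydf <- read.csv(data_url)
```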

TaylorHinson

first_name last_name age gender height weight income smoke100 exerany
1 Karen Onks 38 female 60.96 109.50 19356 0 0
2 Gregory Gurganus 43 male 68.88 168.48 4091 0 1

        0  1 total
female 13 35    48
male   19 33    52
total  32 68   100

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=TaylorHinson')

In my test group, it seems that women exercise more than men do. Out of the female group, 72.9% exercise (35/48), and out of the male group only 63.5% exercise (33/52). There are slightly more men than women in this group, but the percentages are computed within each gender, so they already take that into account.

LunaLovegood

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=LunaLovegood')
head(mydf)
first_name last_name age gender height weight income smoke100 exerany
1 Ashley Hudson 55 female 63.54 125.44 1965 1 1
2 Tammy Ayala 41 female 58.91 163.22 4611 0 1
3 Edith Kelly 21 female 62.20 170.03 69589 0 1
4 Christopher Long 29 male 65.16 152.04 26987 1 1
5 Howard Hang 44 male 70.13 182.23 13343 0 1
6 Jenifer Sublett 46 female 57.86 144.04 40640 1 0







table(mydf$gender, mydf$exerany)

          0  1
   female 17 44
   male    4 35


In my data set, 72% of the women polled exercised, compared to almost 90% of the men. This suggests that in this sample, men cared more overall about their physical health. What we don’t know is how much of this difference results from ages, cultural norms, or other factors.

CubsW98

mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=CubsW98')

head(mydf,2)
first_name last_name age gender height weight income smoke100 exerany
1 Susan Morris 43 female 68.68 178.09 14732 0 1
2 Jeffrey Touhey 43 male 65.19 186.43 18370 0 1
table(mydf$gender, mydf$exerany)

          0  1
   female 17 31
   male   10 42

| Gender | Exercise (Yes) | Exercise (No) | Total |
| --- | --- | --- | --- |
| Female | 31 | 17 | 48 |
| Male | 42 | 10 | 52 |
| Total | 73 | 27 | 100 |

In my set of information, males do seem to lean more to exercising than the females do. 65% of all female participants say they exercise while 81% of the male participants say they exercise. Also, 42% of the total participants are exercising males, while only 31% of the total participants are exercising females.

mfinley
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=mfinley')
 head(mydf,2)
first_name  last_name age gender height weight income smoke100 exerany
1       Tina        Fox  38 female  62.19 172.74  18226        0       0
2      Staci Mclaughlin  25 female  61.29 156.82   3640        1       1
 table(mydf$gender, mydf$exerany)
    
      0  1
female 14 35
male    9 42

The percentage of females who exercise is 71% while the percentage of males who exercise is 82%.

blaiser1
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=blaiser12')
head(mydf,2)

  first_name last_name age gender height weight income smoke100 exerany
1     Samuel      Berg  33   male  71.71 167.15  54069        1       1
2     Sophia  Ferguson  40 female  61.22 169.63   8908        1       1



table(mydf$gender, mydf$exerany)

#         0  1
# female 11 39
# male    9 41

Here you can see that 78% (39/50) of women exercise, while 82% (41/50) of men exercise.

Morgan
mydf = read.csv('https://marksmath.org/cgi-bin/random_data.csv?username=morgan')
head(mydf,2)

#    first_name last_name age gender height weight income smoke100 exerany
#  1      Donna     Dinan  35 female  65.37 164.26   1947        0       1
#  2      Ramon     Davis  26   male  71.81 193.70  39311        1       1

BryanDadson3
first_name last_name age gender height weight income smoke100 exerany
1       Eric  Mcdonald  20   male  68.33 184.21  14563        1       1
2    Matthew  Mccreedy  31   male  66.54 154.90    204        1       0

| Gender | Exerany (1) | Exerany (0) | Total |
| --- | --- | --- | --- |
| Male | 17 | 28 | 45 |
| Female | 12 | 43 | 55 |
| Total | 29 | 71 | 100 |

From the data we can see that, out of a sample of 100 people, men had a tendency to exercise more (37.8%) than women (21.8%). However, this may not be a fair comparison, because the sample does not contain equal numbers of males (45) and females (55). To make the comparison fairer, it would be better to take a Simple Random Sample (SRS) of 50 males and 50 females, and then ask each person whether they exercise or not.
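On the question of unequal group sizes: the percentages quoted above are each computed within one gender, so each is divided by that group's own total (45 or 55). The sketch below, which assumes the counts from the table in this post are typed in by hand (`tab` and `row_props` are illustrative names), shows that calculation with `prop.table`.

```r
# Counts copied from the table in this post
# (rows: gender, columns: exerany with 0 = no, 1 = yes).
tab <- as.table(matrix(
  c(43, 12,
    28, 17),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"),
                  exerany = c("0", "1"))
))

# Row proportions: each count is divided by its own gender's total,
# so groups of different sizes are put on the same 0-to-1 scale.
row_props <- prop.table(tab, margin = 1)
print(round(100 * row_props, 1))
```

Because each row is scaled by its own total, the 45 versus 55 split does not by itself tilt the comparison; what it affects is how precisely each percentage is estimated.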