Date

Team members responsible for this notebook:

Yijia Mao: make ggplots for age groups and race groups, write explanations, and compare California employment rates with average employment rates.

Minghong Zheng: make employment rate difference maps, write explanations.

Yuhan Wang: make the general plots, run t-tests, run regressions and make predictions, write explanations.

Yiwen Wei: proofread code.

In [1]:
%load_ext rmagic
In [2]:
from IPython.display import Image
In [3]:
%%bash
cd ../data1/cleaned
ls
age2013.csv
agecal.csv
gen2013.csv
gencal.csv
race2013.csv
racecal.csv

In [4]:
%%R
print(getwd())
setwd('../visualizations')
[1] "/home/oski/project/Team_Four.0/notebooks"

Part A

We are first working on the 2013 data by state.

Read the CSV file into R.

In [5]:
%%R
gen2013=read.csv('../data1/cleaned/gen2013.csv',header=T)
print(head(gen2013))
   X grp_code   state group emp_rate unemp_rate
1  2        2 Alabama   Men     59.2        7.0
2  3        3 Alabama Women     49.3        6.8
3 34        2  Alaska   Men     66.7        7.7
4 35        3  Alaska Women     59.6        5.3
5 60        2 Arizona   Men     60.4        8.2
6 61        3 Arizona Women     49.0        7.8

We need to compare the general employment rate between men and women. So we first generated 2 subsets with group name "Men" and "Women" only. Then we made a scatter plot to show the employment rates in different states.

In [6]:
%%R  -w  800
gen2013m=subset(gen2013, group=="Men")
gen2013w=subset(gen2013, group=="Women")

jpeg("Employment Rate of Men and Women by States in 2013.jpeg", width=800)
print(head(gen2013m$state))
plot(gen2013m$state, gen2013m$emp_rate, col='blue', las=2, ylim=c(min(gen2013m$emp_rate,gen2013w$emp_rate), max(gen2013m$emp_rate,gen2013w$emp_rate)),
     xlab='State Name', ylab='Employment Rate (percentage)', col.lab='blue')
points(gen2013w$state, gen2013w$emp_rate, col='red')
points(gen2013m$state, gen2013m$emp_rate, col='blue')
legend('topleft', c('Employment Rate-Men','Employment Rate-Women'), col=c('blue','red'),pch=1)
title('Employment Rate of Men and Women by States in 2013')
dev.off()
[1] Alabama    Alaska     Arizona    Arkansas   California Colorado  
51 Levels: Alabama Alaska Arizona Arkansas California Colorado ... Wyoming

In [7]:
Image("Employment Rate of Men and Women by States in 2013.jpeg")
Out[7]:

We also would like to know which states have the highest employment rates, and which states have the lowest employment rates.

In [8]:
%%R
print(list(gen2013m[which.max(gen2013m$emp_rate),],
gen2013m[which.min(gen2013m$emp_rate),],
gen2013w[which.max(gen2013w$emp_rate),],
gen2013w[which.min(gen2013w$emp_rate),]))
[[1]]
     X grp_code    state group emp_rate unemp_rate
55 866        2 Nebraska   Men     75.5          4

[[2]]
      X grp_code         state group emp_rate unemp_rate
97 1532        2 West Virginia   Men       55        7.8

[[3]]
      X grp_code        state group emp_rate unemp_rate
70 1094        3 North Dakota Women     66.2        2.3

[[4]]
      X grp_code         state group emp_rate unemp_rate
98 1533        3 West Virginia Women     45.9        5.1


Next, we look at the employment rate difference between men and women.

We first made a new dataframe, gen2013mw, which puts the men's and women's rates for each state in the same row. Two new columns were generated: emp_diff and unemp_diff.

In [9]:
%%R
names(gen2013m)[5:6]=paste(names(gen2013m)[5:6],'_m',sep='')
names(gen2013w)[5:6]=paste(names(gen2013w)[5:6],'_w',sep='')
print(head(gen2013m))
     X grp_code      state group emp_rate_m unemp_rate_m
1    2        2    Alabama   Men       59.2          7.0
3   34        2     Alaska   Men       66.7          7.7
5   60        2    Arizona   Men       60.4          8.2
7   93        2   Arkansas   Men       58.7          7.9
9  125        2 California   Men       64.0          9.1
11 158        2   Colorado   Men       69.3          6.9

In [10]:
%%R
gen2013mw=merge(gen2013m,gen2013w, by='state')
gen2013mw['emp_diff']=gen2013mw['emp_rate_m']-gen2013mw['emp_rate_w']
gen2013mw['unemp_diff']=gen2013mw['unemp_rate_m']-gen2013mw['unemp_rate_w']
gen2013mw=subset(gen2013mw, select=c(state,emp_rate_m,unemp_rate_m,emp_rate_w,unemp_rate_w,emp_diff, unemp_diff))
print(head(gen2013mw))
       state emp_rate_m unemp_rate_m emp_rate_w unemp_rate_w emp_diff
1    Alabama       59.2          7.0       49.3          6.8      9.9
2     Alaska       66.7          7.7       59.6          5.3      7.1
3    Arizona       60.4          8.2       49.0          7.8     11.4
4   Arkansas       58.7          7.9       47.6          7.8     11.1
5 California       64.0          9.1       50.8          8.6     13.2
6   Colorado       69.3          6.9       58.0          6.2     11.3
  unemp_diff
1        0.2
2        2.4
3        0.4
4        0.1
5        0.5
6        0.7
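
The renaming and merging steps can also be folded into a single call. The sketch below is a hedged alternative, not the notebook's own code: it relies on the `suffixes` argument of base R's `merge` and runs on a toy data frame (`gen`, a name introduced here for illustration) built from the first three states printed above.

```r
# Toy copy of the first rows of gen2013 (values taken from the printed output).
gen <- data.frame(
  state      = rep(c("Alabama", "Alaska", "Arizona"), each = 2),
  group      = rep(c("Men", "Women"), times = 3),
  emp_rate   = c(59.2, 49.3, 66.7, 59.6, 60.4, 49.0),
  unemp_rate = c(7.0, 6.8, 7.7, 5.3, 8.2, 7.8),
  stringsAsFactors = FALSE
)

men   <- subset(gen, group == "Men",   select = -group)
women <- subset(gen, group == "Women", select = -group)

# merge() appends the suffixes to the non-key columns that both inputs share.
wide <- merge(men, women, by = "state", suffixes = c("_m", "_w"))
wide$emp_diff   <- wide$emp_rate_m   - wide$emp_rate_w
wide$unemp_diff <- wide$unemp_rate_m - wide$unemp_rate_w
print(wide)
```

Because the join is keyed on `state`, each difference is computed between the two rows of the same state regardless of how the input rows are ordered.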

We made a scatter plot of the employment rate difference. For every state the difference is positive, which suggests that men are employed at a higher rate than women. To confirm this, we will run t-tests to check whether the difference is statistically significant.

Similarly, we made a plot to show the unemployment rate difference. In most states, the unemployment rate of men is higher than that of women, by up to a few percentage points.

In [53]:
%%R
jpeg("Employment Rate Difference in 2013.jpeg", width=800)
plot(gen2013mw$state, gen2013mw$emp_diff, col='blue', las=2, ylim=c(min(gen2013mw$emp_diff), max(gen2013mw$emp_diff)), xlab='State Name', ylab='Employment Rate Difference (Percentage)', col.lab='blue')
title('Employment Rate Difference in 2013')

dev.off()
In [54]:
Image("Employment Rate Difference in 2013.jpeg")
Out[54]:
In [55]:
%%R
jpeg("Unemployment Rate Difference in 2013.jpeg", width=800)
plot(gen2013mw$state, gen2013mw$unemp_diff, col='blue', las=2, ylim=c(min(gen2013mw$unemp_diff), max(gen2013mw$unemp_diff)), xlab='State Name', ylab='Unemployment Rate Difference (Percentage)', col.lab='blue')
title('Unemployment Rate Difference in 2013')
dev.off()
In [56]:
Image("Unemployment Rate Difference in 2013.jpeg")
Out[56]:

See which states have the highest and lowest employment rate difference. Do the same for the unemployment rate difference.

In [13]:
%%R
print(list(gen2013mw[which.max(gen2013mw$emp_diff),],
gen2013mw[which.min(gen2013mw$emp_diff),],
          gen2013mw[which.max(gen2013mw$unemp_diff),],
gen2013mw[which.min(gen2013mw$unemp_diff),]))
[[1]]
   state emp_rate_m unemp_rate_m emp_rate_w unemp_rate_w emp_diff unemp_diff
45  Utah       75.1          4.4       56.9          4.4     18.2          0

[[2]]
   state emp_rate_m unemp_rate_m emp_rate_w unemp_rate_w emp_diff unemp_diff
20 Maine       64.2          7.2         58          6.2      6.2          1

[[3]]
           state emp_rate_m unemp_rate_m emp_rate_w unemp_rate_w emp_diff
49 West Virginia         55          7.8       45.9          5.1      9.1
   unemp_diff
49        2.7

[[4]]
     state emp_rate_m unemp_rate_m emp_rate_w unemp_rate_w emp_diff unemp_diff
11 Georgia       65.3          7.3       50.7          9.3     14.6         -2


Run a one-sample t-test that compares the employment rate difference with 0, to see whether we can reject the null hypothesis that men and women have the same employment rate.

Do the same for the unemployment rate difference.

In [14]:
%%R
print(t.test(gen2013mw$emp_diff, mu=0))
print(t.test(gen2013mw$unemp_diff, mu=0))

	One Sample t-test

data:  gen2013mw$emp_diff
t = 28.4945, df = 50, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  9.65233 11.11630
sample estimates:
mean of x 
 10.38431 


	One Sample t-test

data:  gen2013mw$unemp_diff
t = 5.1914, df = 50, p-value = 3.826e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.3738681 0.8457398
sample estimates:
mean of x 
0.6098039 


From the t-tests, we can reject both null hypotheses: on average across states, men have a much higher employment rate than women (about 10.4 percentage points) and also a slightly higher unemployment rate (about 0.6 percentage points).
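
A one-sample t-test on per-state differences is the same test as a paired t-test on the two columns. The sketch below illustrates that equivalence on the six states shown in the earlier output. The vectors `men` and `women` are illustrative names, and with only six states the numbers will not match the full 51-row result above.

```r
# Employment rates for Alabama, Alaska, Arizona, Arkansas, California, Colorado
# (taken from the printed gen2013mw rows).
men   <- c(59.2, 66.7, 60.4, 58.7, 64.0, 69.3)
women <- c(49.3, 59.6, 49.0, 47.6, 50.8, 58.0)

one_sample <- t.test(men - women, mu = 0)        # test the differences against 0
paired     <- t.test(men, women, paired = TRUE)  # paired test on the raw columns

# Both calls report the same t statistic and p-value.
print(c(one_sample = unname(one_sample$statistic),
        paired     = unname(paired$statistic)))
print(c(one_sample = one_sample$p.value, paired = paired$p.value))
```

Pairing by state matters here because employment rates vary a lot between states. An unpaired two-sample test would treat that between-state variation as noise.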

Make maps to see employment rate difference between men and women in states in 2013

  • Use packages "maps" and "ggplot2" to make maps
In [15]:
%%R
install.packages("maps")
install.packages("ggplot2")
Installing package into ‘/home/oski/R/i686-pc-linux-gnu-library/3.0’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
Installing package into ‘/home/oski/R/i686-pc-linux-gnu-library/3.0’
(as ‘lib’ is unspecified)
trying URL 'http://cran.cnr.Berkeley.edu/src/contrib/ggplot2_0.9.3.1.tar.gz'
Content type 'application/x-gzip' length 2330942 bytes (2.2 Mb)
opened URL
==================================================
downloaded 2.2 Mb


The downloaded source packages are in
	‘/tmp/RtmpDlgpC3/downloaded_packages’

In [16]:
%%R
require(ggplot2)
require(maps)
Loading required package: ggplot2
Use suppressPackageStartupMessages to eliminate package startup messages.
Loading required package: maps

  • Make a dataframe of the ordered longitude and latitude points that outline each US state
In [17]:
%%R
us_state_map=map_data('state')
print(head(us_state_map))
       long      lat group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

  • Build diff_rate, a dataframe with each state and the difference between the employment rates of men and women in 2013
In [18]:
%%R
diff_rate=gen2013mw[c("emp_diff")]
B=tolower(unname(unlist(gen2013mw["state"])))#lowercase states
diff_rate["region"]=B#use name "region" instead of "state"
print(head(diff_rate))
  emp_diff     region
1      9.9    alabama
2      7.1     alaska
3     11.4    arizona
4     11.1   arkansas
5     13.2 california
6     11.3   colorado

  • Merge our diff_rate dataframe into the map data, and sort it again
In [19]:
%%R
map_data1=merge(us_state_map,diff_rate,by="region",all=T) 
map_data1=map_data1[order(map_data1$order), ]
print(head(map_data1))
   region      long      lat group order subregion emp_diff
1 alabama -87.46201 30.38968     1     1      <NA>      9.9
2 alabama -87.48493 30.37249     1     2      <NA>      9.9
3 alabama -87.52503 30.37249     1     3      <NA>      9.9
4 alabama -87.53076 30.33239     1     4      <NA>      9.9
5 alabama -87.57087 30.32665     1     5      <NA>      9.9
6 alabama -87.58806 30.32665     1     6      <NA>      9.9
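
The re-sorting step matters because `merge` sorts its result by the key column by default, which can scramble the drawing order of the outline points. The toy sketch below uses made-up regions `a`, `b`, `c` (assumptions for illustration, not map data) and shows two things: the `order` column restores the original sequence, and `all=TRUE` keeps regions that have no rate, filling them with `NA`.

```r
# Toy outline: five boundary points in drawing order, for three made-up regions.
outline <- data.frame(
  region = c("b", "b", "a", "a", "c"),
  order  = 1:5,
  stringsAsFactors = FALSE
)
# Toy rates: region "c" has no value, like a state missing from the data.
rates <- data.frame(region = c("a", "b"), emp_diff = c(1.5, 2.5),
                    stringsAsFactors = FALSE)

merged <- merge(outline, rates, by = "region", all = TRUE)
print(merged$order)  # rows are now grouped by region, not in drawing order

restored <- merged[order(merged$order), ]
print(restored)      # drawing order is back; region "c" carries NA for emp_diff
```

Regions left with `NA` are typically drawn in a neutral color by ggplot2 rather than dropped, so states with no matching rate remain visible on the map.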

  • Generate the map of the employment rate difference between men and women in 2013
In [20]:
%%R
setwd('../visualizations')
emp_difference=unname(unlist(map_data1["emp_diff"]))# convert the dataframe column into a numeric vector
print(qplot(long,lat , data=map_data1, geom="polygon", group=group, fill=emp_difference)
      + labs(x="", y="")+theme_bw()+ggtitle("Employment Rate Difference between Men and Women in 2013")
      +theme(legend.position="bottom", legend.direction="horizontal")+scale_fill_gradient2("fill"))
  • Save it as the image "emp_diff_wm.jpeg" in the visualizations directory
In [21]:
%%R
ggsave("emp_diff_wm.jpeg")
Saving 6.67 x 6.67 in image

  • Put it all together in a function that generates this kind of map
In [22]:
%%R
mapusa=function(dataset_m,dataset_w,title_for_empdiff){
    
K=dataset_m["emp_rate"]-dataset_w["emp_rate"]
names(K)="emp_diff"   
B=tolower(unname(unlist(dataset_m["state"])))#lowercase states
K["region"]=B
require(ggplot2)
require(maps)
us_state_map=map_data('state')
map_data=merge(us_state_map,K,by="region",all=T) 
map_data=map_data[order(map_data$order), ]   
emp_difference=unname(unlist(map_data["emp_diff"]))
setwd('../visualizations')
print(qplot(long,lat , data=map_data, geom="polygon", group=group, fill=emp_difference)
      + labs(x="", y="")+theme_bw()+ggtitle(title_for_empdiff)
      +theme(legend.position="bottom", legend.direction="horizontal")+scale_fill_gradient2("fill"))    
ggsave(paste(title_for_empdiff,".jpeg",sep=""))
}

Now we analyze the gender gap in employment across all the states for various age groups.

In [23]:
%%R
age2013=read.csv('../data1/cleaned/age2013.csv',header=T)
i=sapply(age2013, is.factor)
print(i)
age2013[i]=lapply(age2013[i],as.character)
age2013=age2013[complete.cases(age2013),]

A1=subset(age2013,grp_code==26|grp_code==33)
A2=subset(age2013,grp_code==27|grp_code==34)
A3=subset(age2013,grp_code==28|grp_code==35)
A4=subset(age2013,grp_code==29|grp_code==36)
A5=subset(age2013,grp_code==30|grp_code==37)
A6=subset(age2013,grp_code==31|grp_code==38)
A7=subset(age2013,grp_code==32|grp_code==39)
A=list(A1,A2,A3,A4,A5,A6,A7)
print(A1[1,"grp_code"])
print(head(A1))

clean=function(x,diff){       #make a function called"clean" to clean non-matching data
AK=NULL
a=1
for (i in (1:((nrow(x)-1))))
{if (abs(x[i+1,"grp_code"]-x[i,"grp_code"])!=diff)
   {AK[a]=i
     a=a+1}}
if (is.null(AK)) {return(x=x)}
else{return(x=x[-AK,])}}

for( i in (1: length(A)))
    {A[[i]]=clean(A[[i]],7)}  # clean non- matching data
         X   grp_code      state      group   emp_rate unemp_rate 
     FALSE      FALSE       TRUE       TRUE      FALSE      FALSE 
[1] 26
     X grp_code      state                 group emp_rate unemp_rate
1   19       26    Alabama   Men, 16 to 19 years     18.1       32.3
8   26       33    Alabama Women, 16 to 19 years     25.3       15.6
25  78       26    Arizona   Men, 16 to 19 years     24.3       29.8
32  85       33    Arizona Women, 16 to 19 years     20.4       28.9
39 111       26   Arkansas   Men, 16 to 19 years     21.3       33.3
52 143       26 California   Men, 16 to 19 years     17.6       34.0
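
The `clean` function finds unmatched rows by comparing the group codes of adjacent rows, so it depends on the rows being ordered state by state. The sketch below is a hedged, order-independent alternative rather than the notebook's method: it keeps only states that report both group codes. The helper `keep_paired` and the toy data frame are hypothetical names introduced here. The toy rows mirror the printed head above, where Arkansas has a row for men only.

```r
# Keep only the states that report both group codes (for example 26 and 33).
keep_paired <- function(x) {
  # Count the distinct group codes reported for each state.
  n_codes <- tapply(x$grp_code, x$state, function(g) length(unique(g)))
  # Keep rows only for states that report exactly two codes.
  x[x$state %in% names(n_codes)[n_codes == 2], ]
}

toy <- data.frame(
  grp_code = c(26, 33, 26, 33, 26),
  state    = c("Alabama", "Alabama", "Arizona", "Arizona", "Arkansas"),
  emp_rate = c(18.1, 25.3, 24.3, 20.4, 21.3),
  stringsAsFactors = FALSE
)

paired_rows <- keep_paired(toy)
print(paired_rows)  # the Arkansas row is dropped because it has no women's row
```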

In [24]:
%%R -h 1000  -w 1000

bigA=do.call(rbind, A)
bigA$Age=gsub(".*, ", "", bigA$group)
bigA$Gender=gsub(", .*", "", bigA$group)

p <- ggplot(bigA, aes(x = factor(Age), y = emp_rate, color = Gender)) + geom_boxplot() + geom_jitter() + ggtitle("Employment Rate by Age Groups") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Age Groups") + ylab("Employment Rate")
print(p)
ggsave("Employment Rate by Age Groups.jpeg")
Saving 13.9 x 13.9 in image
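
The two `gsub` calls split each group label at its comma. A minimal sketch of what the patterns do, using two labels that appear in this notebook (the variable names `after_comma` and `before_comma` are illustrative):

```r
labels <- c("Men, 16 to 19 years", "Black or African American, women")

# ".*, " matches everything up to and including the last ", ",
# so removing it keeps the part after the comma.
after_comma  <- gsub(".*, ", "", labels)

# ", .*" matches from the first ", " to the end of the string,
# so removing it keeps the part before the comma.
before_comma <- gsub(", .*", "", labels)

print(data.frame(labels, before_comma, after_comma))
```

For the age labels the gender comes before the comma, while for the race labels it comes after, which is why the age and race cells assign the two patterns to different columns.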

In [25]:
%%R -h 1000  -w 1000


p2 <- ggplot(bigA, aes(x = factor(Age), y = unemp_rate, color = Gender)) + geom_boxplot() + geom_jitter() + ggtitle("Unemployment Rate by Age Groups") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Age Groups") + ylab("Unemployment Rate")
print(p2)
ggsave("Unemployment Rate by Age Groups.jpeg")
Saving 13.9 x 13.9 in image

Use the function "mapusa" to map the employment rate difference between men and women for each age group.

Here is an example that maps the employment rate difference between men and women aged 16 to 19 years.

In [26]:
%%R
Age2013m=subset(A[[1]],grp_code==A[[1]][1,"grp_code"])# the dataset of employment and unemployment rate among men from 16 to 19 years
Age2013w=subset(A[[1]],grp_code==A[[1]][2,"grp_code"])# the dataset of employment and unemployment rate among women from 16 to 19 years
mapusa(Age2013m,Age2013w,"Employment Rate Difference from 16 to 19 years")
Saving 6.67 x 6.67 in image

  • Put the code above together in a function called "make_age_group" that makes maps for the different age groups
In [27]:
%%R
make_age_group=function(i){
Agegroup=c("16 to 19 years","20 to 24 years","25 to 34 years","35 to 44 years","45 to 54 years","55 to 64 years","65 years and over")
Age2013m=subset(A[[i]],grp_code==A[[i]][1,"grp_code"])
Age2013w=subset(A[[i]],grp_code==A[[i]][2,"grp_code"])
mapusa(Age2013m,Age2013w,paste("Employment Rate Difference from ",Agegroup[i],sep=""))
 }
       
  • Make a map for employment rate difference from 20 to 24 years according to gender
In [28]:
%%R
make_age_group(2)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference from 25 to 34 years according to gender
In [29]:
%%R
make_age_group(3)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference from 35 to 44 years according to gender
In [30]:
%%R
make_age_group(4)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference from 45 to 54 years according to gender
In [31]:
%%R
make_age_group(5)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference from 55 to 64 years according to gender
In [32]:
%%R
make_age_group(6)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference from 65 years and over according to gender
In [33]:
%%R
make_age_group(7)
Saving 6.67 x 6.67 in image

In [34]:
%%R
race2013=read.csv('../data1/cleaned/race2013.csv',header=T)
i=sapply(race2013, is.factor)
race2013[i]=lapply(race2013[i],as.character)
race2013=race2013[complete.cases(race2013),]


R1=subset(race2013,grp_code==5|grp_code==6)
R2=subset(race2013,grp_code==8|grp_code==9)
R3=subset(race2013,grp_code==14|grp_code==15)


R=list(R1,R2,R3)

for( i in (1: length(R)))
    {R[[i]]=clean(R[[i]],1)}  # use the function"clean" to clean non- matching data
print(R[[2]])
       X grp_code                state                            group
3      8        8              Alabama   Black or African American, men
4      9        9              Alabama Black or African American, women
12    66        8              Arizona   Black or African American, men
13    67        9              Arizona Black or African American, women
18    99        8             Arkansas   Black or African American, men
19   100        9             Arkansas Black or African American, women
24   131        8           California   Black or African American, men
25   132        9           California Black or African American, women
30   164        8             Colorado   Black or African American, men
31   165        9             Colorado Black or African American, women
36   197        8          Connecticut   Black or African American, men
37   198        9          Connecticut Black or African American, women
42   230        8             Delaware   Black or African American, men
43   231        9             Delaware Black or African American, women
48   263        8 District of Columbia   Black or African American, men
49   264        9 District of Columbia Black or African American, women
54   296        8              Florida   Black or African American, men
55   297        9              Florida Black or African American, women
60   329        8              Georgia   Black or African American, men
61   330        9              Georgia Black or African American, women
74   423        8             Illinois   Black or African American, men
75   424        9             Illinois Black or African American, women
80   456        8              Indiana   Black or African American, men
81   457        9              Indiana Black or African American, women
86   489        8                 Iowa   Black or African American, men
87   490        9                 Iowa Black or African American, women
92   522        8               Kansas   Black or African American, men
93   523        9               Kansas Black or African American, women
98   555        8             Kentucky   Black or African American, men
99   556        9             Kentucky Black or African American, women
104  588        8            Louisiana   Black or African American, men
105  589        9            Louisiana Black or African American, women
111  649        8             Maryland   Black or African American, men
112  650        9             Maryland Black or African American, women
117  682        8        Massachusetts   Black or African American, men
118  683        9        Massachusetts Black or African American, women
123  715        8             Michigan   Black or African American, men
124  716        9             Michigan Black or African American, women
129  748        8            Minnesota   Black or African American, men
130  749        9            Minnesota Black or African American, women
135  781        8          Mississippi   Black or African American, men
136  782        9          Mississippi Black or African American, women
140  812        8             Missouri   Black or African American, men
141  813        9             Missouri Black or African American, women
148  872        8             Nebraska   Black or African American, men
149  873        9             Nebraska Black or African American, women
154  905        8               Nevada   Black or African American, men
155  906        9               Nevada Black or African American, women
164  969        8           New Jersey   Black or African American, men
165  970        9           New Jersey Black or African American, women
174 1033        8             New York   Black or African American, men
175 1034        9             New York Black or African American, women
180 1066        8       North Carolina   Black or African American, men
181 1067        9       North Carolina Black or African American, women
188 1125        8                 Ohio   Black or African American, men
189 1126        9                 Ohio Black or African American, women
194 1158        8             Oklahoma   Black or African American, men
195 1159        9             Oklahoma Black or African American, women
204 1221        8         Pennsylvania   Black or African American, men
205 1222        9         Pennsylvania Black or African American, women
210 1254        8         Rhode Island   Black or African American, men
211 1255        9         Rhode Island Black or African American, women
216 1287        8       South Carolina   Black or African American, men
217 1288        9       South Carolina Black or African American, women
225 1349        8            Tennessee   Black or African American, men
226 1350        9            Tennessee Black or African American, women
231 1382        8                Texas   Black or African American, men
232 1383        9                Texas Black or African American, women
243 1472        8             Virginia   Black or African American, men
244 1473        9             Virginia Black or African American, women
249 1505        8           Washington   Black or African American, men
250 1506        9           Washington Black or African American, women
255 1538        8        West Virginia   Black or African American, men
256 1539        9        West Virginia Black or African American, women
259 1568        8            Wisconsin   Black or African American, men
260 1569        9            Wisconsin Black or African American, women
    emp_rate unemp_rate
3       43.5       14.0
4       47.4       10.7
12      54.9       15.6
13      43.8       15.1
18      47.7       17.4
19      45.8       15.7
24      48.8       16.7
25      45.4       14.7
30      62.7        9.5
31      58.0       13.1
36      54.5       15.6
37      57.6       11.4
42      56.9       11.8
43      53.7       10.1
48      51.0       15.4
49      46.7       15.0
54      57.4       14.1
55      53.5       10.7
60      60.8       12.3
61      53.2       14.5
74      47.9       19.6
75      49.9       14.9
80      50.4       17.0
81      51.5       17.5
86      55.9       14.7
87      58.2        8.4
92      60.6       11.7
93      55.7       11.8
98      56.8       16.9
99      55.7        9.3
104     48.7       14.1
105     47.8       10.6
111     61.9       10.1
112     60.6        9.4
117     58.6       12.9
118     50.2        8.0
123     49.7       15.6
124     45.5       17.2
129     63.5       14.7
130     56.5       15.3
135     45.5       16.3
136     43.6       11.7
140     54.2       11.3
141     52.7       10.9
148     72.2        8.6
149     62.7       11.8
154     54.1       15.4
155     50.0       14.8
164     56.3       15.1
165     55.7       11.2
174     52.8       15.1
175     51.1       11.4
180     55.3       14.8
181     55.0       10.8
188     48.5       15.2
189     48.9       14.9
194     63.8        7.6
195     47.1        9.9
204     50.6       17.8
205     52.8       11.5
210     62.7       13.5
211     49.0       18.6
216     51.8       12.1
217     51.1       11.5
225     59.6       16.6
226     52.1       13.5
231     61.8       11.0
232     56.6       10.4
243     57.8       11.0
244     59.5        8.4
249     61.9       15.0
250     46.7       12.9
255     53.9       11.2
256     54.2        1.4
259     53.5       14.5
260     53.0       15.6

In [35]:
%%R -h 1000  -w 1000

bigR=do.call(rbind, R)
bigR$Gender=gsub(".*, ", "", bigR$group)
bigR$Race=gsub(", .*", "", bigR$group)
print(head(bigR))

q <- ggplot(bigR, aes(x = factor(Race), y = emp_rate, color = Gender)) + geom_boxplot() + geom_jitter() + ggtitle("Employment Rate by Race Groups") + xlab("Race Groups") + ylab("Employment Rate") 
print(q)
ggsave("Employment Rate by Race Groups.jpeg")
    X grp_code   state        group emp_rate unemp_rate Gender  Race
1   5        5 Alabama   White, men     63.9        5.4    men White
2   6        6 Alabama White, women     50.1        5.1  women White
6  37        5  Alaska   White, men     67.9        6.5    men White
7  38        6  Alaska White, women     61.1        4.3  women White
10 63        5 Arizona   White, men     61.7        7.2    men White
11 64        6 Arizona White, women     49.5        7.2  women White
Saving 13.9 x 13.9 in image

In [36]:
%%R -h 1000  -w 1000



q2 <- ggplot(bigR, aes(x = factor(Race), y = unemp_rate, color = Gender)) + geom_boxplot() + geom_jitter() + ggtitle("Unemployment Rate by Race Groups") + xlab("Race Groups") + ylab("Unemployment Rate") 
print(q2)
ggsave("Unemployment Rate by Race Groups.jpeg")
Saving 13.9 x 13.9 in image

We want to compare California's employment rate with the rates in other states and predict its trend from the data of the past 10 years.

We take the average of all males and females in all states and compare them with the average of males and females in California:

In [37]:
%%R   -h 1000  -w  1000

print(head(gen2013))
gen2013m=subset(gen2013, group=="Men")
gen2013w=subset(gen2013, group=="Women")
MenAvg=mean(gen2013m[ ,"emp_rate"], na.rm = TRUE)
WomenAvg=mean(gen2013w[ ,"emp_rate"], na.rm=TRUE)
print(MenAvg)

print(gen2013$emp_rate)
gen2013m$difference=gen2013m$emp_rate-MenAvg
gen2013w$difference=gen2013w$emp_rate-WomenAvg
print(head(gen2013m))
jpeg("Male: Difference between California Employment Rate and Average Employment Rate.jpeg",width=800)
plot(gen2013m$state,gen2013m$difference, las=2, xlab="state", main="Male: Difference between California Employment Rate and Average Employment Rate")

dev.off()
   X grp_code   state group emp_rate unemp_rate
1  2        2 Alabama   Men     59.2        7.0
2  3        3 Alabama Women     49.3        6.8
3 34        2  Alaska   Men     66.7        7.7
4 35        3  Alaska Women     59.6        5.3
5 60        2 Arizona   Men     60.4        8.2
6 61        3 Arizona Women     49.0        7.8
[1] 65.03725
  [1] 59.2 49.3 66.7 59.6 60.4 49.0 58.7 47.6 64.0 50.8 69.3 58.0 64.4 55.7 61.3
 [16] 51.5 67.4 60.3 61.3 51.0 65.3 50.7 62.5 52.5 67.0 52.9 64.4 54.9 64.2 52.5
 [31] 70.6 62.7 68.8 59.9 60.7 51.4 61.8 49.4 64.2 58.0 67.3 58.9 64.1 56.5 60.7
 [46] 50.2 70.9 62.6 56.2 46.0 65.9 54.6 63.9 57.0 75.5 63.5 63.5 52.0 70.1 61.4
 [61] 64.9 54.1 59.4 47.8 61.9 52.5 62.7 51.4 75.3 66.2 62.3 54.3 67.1 50.8 60.4
 [76] 51.5 64.0 53.7 64.0 55.0 60.0 49.6 71.3 62.9 62.7 49.1 69.7 53.2 75.1 56.9
 [91] 68.5 62.3 68.3 57.2 64.5 54.1 55.0 45.9 67.7 60.3 71.8 58.1
     X grp_code      state group emp_rate unemp_rate difference
1    2        2    Alabama   Men     59.2        7.0  -5.837255
3   34        2     Alaska   Men     66.7        7.7   1.662745
5   60        2    Arizona   Men     60.4        8.2  -4.637255
7   93        2   Arkansas   Men     58.7        7.9  -6.337255
9  125        2 California   Men     64.0        9.1  -1.037255
11 158        2   Colorado   Men     69.3        6.9   4.262745

In [38]:
Image("Male: Difference between California Employment Rate and Average Employment Rate.jpeg")
Out[38]:
In [39]:
%%R
jpeg("Female: Difference between California Employment Rate and Average Employment Rate.jpeg",width=800)

plot(gen2013w$state,gen2013w$difference, las=2, xlab="state", main="Female: Difference between California Employment Rate and Average Employment Rate")
dev.off()
In [40]:
Image("Female: Difference between California Employment Rate and Average Employment Rate.jpeg")
Out[40]:
In [41]:
%%R
gencal=read.csv('../data1/cleaned/gencal.csv',header=T)
print(head(gencal))
     X year grp_code      state group emp_rate unemp_rate
1  189 2004        2 California   Men     69.2        6.3
2  190 2004        3 California Women     54.1        6.0
3 2608 2005        2 California   Men     70.4        5.2
4 2609 2005        3 California Women     54.0        5.4
5 5016 2006        2 California   Men     70.4        4.7
6 5017 2006        3 California Women     53.9        5.0

Use the function "mapusa" to make maps of the employment rate difference between men and women in different race groups

  • Create a similar function called "make_race_group" as the function "make_age_group" to generate maps for different race groups
In [42]:
%%R
make_race_group=function(i){
Racegroup=c("White","Black or African American","Hispanic or Latino ethnicity") # same order as R1, R2, R3 (grp_code 5|6, 8|9, 14|15)
Age2013m=subset(R[[i]],grp_code==R[[i]][1,"grp_code"])
Age2013w=subset(R[[i]],grp_code==R[[i]][2,"grp_code"])
mapusa(Age2013m,Age2013w,paste("Employment Rate Difference for ",Racegroup[i],sep=""))
 }
In [43]:
%%R
A=list()
  • Make a map for employment rate difference for White according to gender
In [44]:
%%R
make_race_group(1)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference for Black or African American according to gender
In [45]:
%%R
make_race_group(2)
Saving 6.67 x 6.67 in image

  • Make a map for employment rate difference for Hispanic or Latino ethnicity according to gender
In [46]:
%%R
make_race_group(3)
Saving 6.67 x 6.67 in image

See the trend over time, and make a prediction

Using the California data from 2004 to 2013, we run a linear regression with emp_diff (the employment rate difference between men and women) as the dependent variable and year as the independent variable. We do the same for unemp_diff.

Using the regression results, we can predict the employment rate difference and the unemployment rate difference in 2014.

First, calculate the employment rate difference, and clean the dataframe.

In [47]:
%%R
gencalm=subset(gencal, group=='Men')
gencalw=subset(gencal, group=="Women")

gencalmw=merge(gencalm,gencalw, by='year')
gencalmw['emp_diff']=gencalmw['emp_rate.x']-gencalmw['emp_rate.y']
gencalmw['unemp_diff']=gencalmw['unemp_rate.x']-gencalmw['unemp_rate.y']
gencalmw=subset(gencalmw, select=c('year','emp_diff', 'unemp_diff'))

print(gencalmw)
   year emp_diff unemp_diff
1  2004     15.1        0.3
2  2005     16.4       -0.2
3  2006     16.5       -0.3
4  2007     16.0        0.3
5  2008     15.0        0.5
6  2009     11.8        2.3
7  2010     12.0        1.6
8  2011     12.9        0.5
9  2012     13.4        0.0
10 2013     13.2        0.5

Run a linear regression to see the intercept and the coefficient.

In [48]:
%%R
year=gencalmw$year
diff=gencalmw$emp_diff
undiff=gencalmw$unemp_diff
model=lm(diff~year) #model of employment
modelun=lm(undiff~year) #model of unemployment
print(model)
print(modelun)

Call:
lm(formula = diff ~ year)

Coefficients:
(Intercept)         year  
   882.1455      -0.4321  


Call:
lm(formula = undiff ~ year)

Coefficients:
(Intercept)         year  
 -156.47818      0.07818  


Predict the employment rate difference in 2014.

Combine the data together with data in the past years, and make a plot with regression line.

Do the same for unemployment rate.

In [49]:
%%R
print(getwd())
setwd('../visualizations')
[1] "/home/oski/project/Team_Four.0/visualizations"

In [50]:
%%R
emp2014=data.frame(year=2014)
pred=predict(model, newdata=emp2014)
predun=predict(modelun, newdata=emp2014)
print(c(pred,predun))
data2014=data.frame(2014, pred, predun)
names(data2014)[1:3]=c('year','emp_diff', 'unemp_diff')
gencal14=rbind(gencalmw,data2014)

jpeg("prediction emp.jpeg")
plot(gencal14$year,gencal14$emp_diff, xlab="year", ylab="Employment Difference")
abline(model)
title(main="Employment Rate Linear Regression in California")
dev.off()

jpeg("prediction unemp.jpeg")
plot(gencal14$year,gencal14$unemp_diff, xlab="year", 
     ylab="Unemployment Difference")
abline(modelun)
title(main="Unemployment Rate Linear Regression in California")
dev.off()
       1        1 
11.85333  0.98000 

The regression results show that, if our linear model is accurate, the employment rate difference in 2014 is likely to be about 11.85 percentage points, and the unemployment rate difference about 0.98 percentage points.
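
A point prediction from ten yearly observations carries a fair amount of uncertainty. The sketch below is a hedged addition rather than part of the original analysis: it refits the employment model on the emp_diff values printed earlier and asks `predict` for a 95% prediction interval. The names `cal` and `fit2014` are introduced here for illustration.

```r
# California emp_diff by year, copied from the gencalmw output above.
cal <- data.frame(
  year     = 2004:2013,
  emp_diff = c(15.1, 16.4, 16.5, 16.0, 15.0, 11.8, 12.0, 12.9, 13.4, 13.2)
)

model <- lm(emp_diff ~ year, data = cal)

# interval = "prediction" returns the fitted value with lower and upper bounds.
fit2014 <- predict(model, newdata = data.frame(year = 2014),
                   interval = "prediction", level = 0.95)
print(fit2014)

# Standard error and p-value of the slope, to judge how firm the trend is.
print(summary(model)$coefficients)
```

The `fit` column reproduces the point prediction above, while `lwr` and `upr` show the range that a new 2014 observation could plausibly fall in if the linear trend holds.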

In [51]:
Image('prediction emp.jpeg')
Out[51]:
In [52]:
Image('prediction unemp.jpeg')
Out[52]: