BUSINESS DATA MINING

Problem 1. Explain what each of the following R expressions does. You can run them in R and check the
results.
(a) c(1, 17, -6, 3)
(b) seq(1, 5, by=0.5)
(c) seq(0, 10, length=5)
(d) rep(0, 5)
(e) rep(1:3, 4)
(f) rep(4:6, 1:3)
(g) sample(1:3)
(h) sample(1:5, size=3, replace=FALSE)
(i) sample(c(2,5,3), size=4, replace=TRUE)
(j) sample(1:2, size=10, prob=c(1,3), replace=TRUE)
(k) c(1, 2, 3) + c(4, 5, 6)
(l) max(1:10)
(m) min(1:10)
(n) range(1:10)
(o) matrix(1:12, nr=3, nc=4)
(q) Let a <- c(1, 2, 3), b <- c(10, 20, 30), c <- c(100, 200, 300), d <- c(1000, 2000, 3000). What does
the function rbind(a, b, c, d) do? What does cbind(a, b, c, d) do?
HOMEWORK 2 DUE DATE: FRIDAY, SEPTEMBER 25 AT 11:59 PM
(r) Let C be the following matrix
    a    b     c      d
    1   10   100   1000
    2   20   200   2000
    3   30   300   3000
What is sum(C)? What is apply(C, 1, sum)? What is apply(C, 2, sum)?
(s) Let movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN"). What does
lapply(movies, tolower) do? Note that tolower converts the characters of a string to
lower case.
(t) Let x <- factor(c("alpha", "beta", "gamma", "alpha", "beta")). What does the function levels(x) return?
(u) c <- 35:50
(v) c(1, 2, 3) + c(4, 5, 6)
(w) c(1, 2, 3, 4) + c(10, 20)
(x) sqrt(c(100, 225, 400))
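A few of the behaviors in Problem 1 are easy to misread, so the following is a minimal sketch (not an answer key) of repetition counts, vector recycling, and row versus column binding. The names rep_counts, recycled, m_rows, m_cols, row_sums, and col_sums are introduced here only for illustration.

```r
# rep() with a vector of counts repeats each element that many times
rep_counts <- rep(4:6, 1:3)            # 4 5 5 6 6 6

# A shorter vector is recycled to match the length of the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)  # 11 22 13 24

# rbind() stacks vectors as rows; cbind() places them side by side as columns
a <- c(1, 2, 3)
b <- c(10, 20, 30)
m_rows <- rbind(a, b)   # 2 x 3 matrix
m_cols <- cbind(a, b)   # 3 x 2 matrix

# apply() with margin 1 works over rows, with margin 2 over columns
row_sums <- apply(m_cols, 1, sum)      # 11 22 33
col_sums <- apply(m_cols, 2, sum)      # a = 6, b = 60

print(rep_counts)
print(recycled)
print(dim(m_rows))
print(dim(m_cols))
print(row_sums)
print(col_sums)
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is worth watching for in parts (v) and (w).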
Problem 2. Create the following vectors in R.
a = (5, 10, 15, 20, …, 160)
b = (87, 86, 85, …, 56)
Use vector arithmetic to multiply these vectors and call the result d. Select subsets of d to identify the
following.
(a) What are the 19th, 20th, and 21st elements of d?
(b) What are all of the elements of d which are less than 2000?
(c) How many elements of d are greater than 6000?
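One possible way to set up Problem 2 is sketched below. It assumes the ellipses in the definitions of a and b denote constant steps of 5 and -1, which is an interpretation of the problem statement rather than something it states outright.

```r
# Build the two sequences described in the problem
a <- seq(5, 160, by = 5)   # 5, 10, ..., 160
b <- 87:56                 # 87, 86, ..., 56 (descending integer sequence)

# Both vectors have the same length, so * multiplies element by element
stopifnot(length(a) == length(b))
d <- a * b

# Positional indexing, logical indexing, and counting via a logical sum
print(d[19:21])        # elements 19 through 21
print(d[d < 2000])     # every element below 2000
print(sum(d > 6000))   # TRUE counts as 1, so the sum is a count
```

The last line works because logical values are coerced to 0 and 1 inside sum(); length(d[d > 6000]) would give the same count.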
Problem 3. This exercise relates to the College data set, which can be found in the file College.csv. It
contains a number of variables for 777 different universities and colleges in the US. The variables are
• Private : Public/private indicator
• Apps : Number of applications received
• Accept : Number of applicants accepted
• Enroll : Number of new students enrolled
• Top10perc : New students from top 10% of high school class
• Top25perc : New students from top 25% of high school class
• F.Undergrad : Number of full-time undergraduates
BUSINESS DATA MINING (IDS 472)
• P.Undergrad : Number of part-time undergraduates
• Outstate : Out-of-state tuition
• Room.Board : Room and board costs
• Books : Estimated book costs
• Personal : Estimated personal spending
• PhD : Percent of faculty with Ph.D.’s
• Terminal : Percent of faculty with terminal degree
• S.F.Ratio : Student/faculty ratio
• perc.alumni : Percent of alumni who donate
• Expend : Instructional expenditure per student
• Grad.Rate : Graduation rate
(a) Read the data into R. Call the loaded data “college”. Explain how you do this.
(b) How many variables are in this data set? What are their measurement types? How do you obtain this
information?
(c) Use the function colnames() to change the "Top10perc" and "Top25perc" variable names to
"Top10" and "Top25".
(d) Look at the data. You should notice that the first column is just the name of each university.
We don’t really want R to treat this as data. However, it may be handy to have these names
for later. Try the following commands:

rownames(college) <- college[, 1]
You should see that there is now a row.names column with the name of each university recorded.
This means that R has given each row a name corresponding to the appropriate university. R
will not try to perform calculations on the row names. However, we still need to eliminate the
first column in the data where the names are stored. Write code to eliminate the first column.
(e) Add a column to indicate the acceptance rate for each university (acceptance rate = number of
accepted applications / number of applications received).
(f) Provide summary statistics for the numerical variables in the data set.
(g) Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of
the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10]. Can
you observe any useful information in the plots?
(h) Use the boxplot() function to produce side-by-side boxplots of Outstate versus Private. Do you
observe any useful information in this plot?
(i) Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going
to divide universities into two groups based on whether or not the proportion of students coming
from the top 10% of their high school classes exceeds 50%. Follow the code below.
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
i. Explain each line of the above code.
ii. Use the summary() function to see how many elite universities there are. Now use the
plot() function to produce side-by-side boxplots of Outstate versus Elite.
(j) Use the hist() function to produce some histograms with differing numbers of bins for a few of
the quantitative variables. You may find the command par(mfrow=c(2,2)) useful: it will divide
the print window into four regions so that four plots can be made simultaneously. Modifying
the arguments to this function will divide the screen in other ways.
(k) What is the average room and board cost of private schools?
(l) Create a new binary variable that is 1 if the student/faculty ratio is greater than 0.5 and 0
otherwise.
(m) Compare the distribution of out of state tuition for private and public colleges.
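The data-frame steps in Problem 3 can be tried on a small example before touching the real file. The sketch below uses an invented three-row data frame, college_toy, whose values are made up purely for illustration; with the real data one would presumably start from something like read.csv("College.csv"), assuming the file sits in the working directory.

```r
# Toy stand-in for College.csv; every value below is invented for illustration
college_toy <- data.frame(
  Name      = c("School A", "School B", "School C"),
  Private   = c("Yes", "No", "Yes"),
  Apps      = c(1000, 4000, 2500),
  Accept    = c(800, 1000, 2000),
  Top10perc = c(60, 20, 55),
  Outstate  = c(20000, 9000, 18000)
)

# Use the first column as row names, then drop it from the data
rownames(college_toy) <- college_toy[, 1]
college_toy <- college_toy[, -1]

# Rename a column by matching on its current name
colnames(college_toy)[colnames(college_toy) == "Top10perc"] <- "Top10"

# Derived column: acceptance rate = accepted / applications received
college_toy$AcceptRate <- college_toy$Accept / college_toy$Apps

# Group-wise mean: average out-of-state tuition for each level of Private
tuition_by_type <- tapply(college_toy$Outstate, college_toy$Private, mean)

print(college_toy)
print(tuition_by_type)
```

Matching on the column name rather than its position keeps the rename correct even if columns are added or removed earlier in the script. Note that once a column has been renamed, later code has to refer to it by the new name.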
Problem 4. This exercise involves the “Auto” data set.
(a) Remove the missing values from this data set.
(b) What is the range of each quantitative predictor? You can answer this using the range() function.
(c) What are the mean and standard deviation of each quantitative predictor?
(d) Remove the 10th through 85th observations. What is the range, mean, and standard deviation
of each predictor in the subset of the data that remains?
(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of
your choice. Create some plots highlighting the relationships among the predictors. Comment
on your findings.
(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your
plots suggest that any of the other variables might be useful in predicting mpg? Justify your
answer.
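The row-removal and per-column summaries in Problem 4 follow a common pattern, sketched here on the built-in mtcars data set as a stand-in for Auto. The injected missing value and the choice of rows 10 through 20 are assumptions made only for this demonstration. Every column of mtcars is numeric; with Auto, any qualitative columns would need to be excluded before applying range(), mean(), or sd().

```r
# mtcars ships with base R; it stands in for the Auto data here
cars <- mtcars
cars[3, "hp"] <- NA              # inject one missing value for the demonstration

# na.omit() drops every row that contains at least one NA
cars_clean <- na.omit(cars)

# sapply() applies a function to each column; range() returns min and max
ranges <- sapply(cars_clean, range)
means  <- sapply(cars_clean, mean)
sds    <- sapply(cars_clean, sd)

# Negative indices remove rows: here rows 10 through 20 of the cleaned data
cars_subset <- cars_clean[-(10:20), ]

print(ranges)
print(round(rbind(mean = means, sd = sds), 2))
print(nrow(cars_subset))
```

The same three sapply() calls can be repeated on cars_subset to compare the summaries before and after the rows are removed.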
Problem 5. FiveThirtyEight, a data journalism site devoted to politics, sports, science, economics,
and culture, recently published a series of articles on gun deaths in America. Gun violence in the
United States is a significant political issue, and while reducing gun deaths is a noble goal, we must first
understand the causes and patterns in gun violence in order to craft appropriate policies. As part of the
project, FiveThirtyEight collected data from the Centers for Disease Control and Prevention, as well as
other governmental agencies and non-profits, on all gun deaths in the United States from 2012 to 2014. You
can find this dataset, called "gun deaths.csv", on Blackboard.
(a) Generate a data frame that summarizes the number of gun deaths per month.
(b) Generate a bar chart with labels on the x-axis. That is, each month should be labeled "Jan",
"Feb", "Mar", and so on.
(c) Generate a bar chart that shows the number of gun deaths associated with each type of intent
(cause of death). The bars should be sorted from highest to lowest values.
(d) Generate a boxplot visualizing the age of gun death victims, by sex. Print the average age of
female gun death victims.
Answer the following questions. Generate appropriate figures/tables to support your conclusions.
(e) How many white males with at least a high school education were killed by guns in 2012?
(f) Which season of the year has the most gun deaths? Assume that
– Winter = January – March
– Spring = April – June
– Summer = July – September
– Fall = October – December
– Hint: You need to convert a continuous variable into a categorical variable.
(g) Are whites who are killed by guns more likely to die because of suicide or homicide? How does
this compare to blacks and Hispanics?
(h) Are police-involved gun deaths significantly different from other gun deaths? Assess the relationship between police involvement and other variables.
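The month labels in part (b) and the hint in part (f) both come down to turning a numeric month into an ordered categorical variable. The sketch below shows one way to do that on invented records; the data frame deaths and its single column month are hypothetical and do not reflect the real file's contents or column names.

```r
# Invented records: one row per death, with a numeric month (1-12)
deaths <- data.frame(month = c(1, 1, 2, 4, 7, 7, 7, 10, 12))

# month.abb is a built-in vector "Jan", "Feb", ...; fixing the factor levels
# keeps the months in calendar order rather than alphabetical order
deaths$month_label <- factor(month.abb[deaths$month], levels = month.abb)
per_month <- as.data.frame(table(month = deaths$month_label))

# cut() bins a numeric variable; intervals are (0,3], (3,6], (6,9], (9,12]
deaths$season <- cut(deaths$month,
                     breaks = c(0, 3, 6, 9, 12),
                     labels = c("Winter", "Spring", "Summer", "Fall"))
per_season <- table(deaths$season)

# Sort counts from highest to lowest, as a sorted bar chart would need
per_season_sorted <- sort(per_season, decreasing = TRUE)

print(per_month)
print(per_season_sorted)
```

Passing per_season_sorted to barplot() would draw the bars in descending order, and the same sort-then-plot approach applies to the intent counts in part (c).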


Exponential smoothing

What are the advantages of exponential smoothing over the moving average and the weighted moving average? (Marks 2)
Explain the aggregate planning strategy.


Cause/Effect Essay

Choose one of the following topics and write on its causes OR its effects (these are two different things). Some of the topics specify only the effects or only the causes. This essay will also give you a chance to incorporate research. As you will notice, these essays are set up as arguments, since that makes it easier to take a side. Many of you will take ENG 112 after this class, and this essay will help you be better prepared for research and argument.
When you have decided on your topic, you must research your topic and have at least TWO sources.

  1. Argue that either a college education should or should not be a job requirement by focusing on the effects of getting a college education.
  2. Argue that students are too dependent on technology by focusing on the causes OR the effects of using technology.
  3. Argue that students need to have access to more technology in education by focusing on the causes OR the effects of technology in education. (MY CHOICE)
  4. Argue that social media should be limited for teenagers by focusing on its effects on body image.
  5. Looking at the causes of negative body image, make an argument for one thing that could change to help lead to positive body image.


Application of psychology studies

Provide a reflection of at least 500 words (or 2 pages double spaced) on how the knowledge, skills, or theories of this course have been applied, or could be applied, in a practical manner to your current work environment. If you are not currently working, share times when you have observed, or could observe, these theories and knowledge being applied to an employment opportunity in your field of study.

Requirements:

Provide a 500 word (or 2 pages double spaced) minimum reflection.
Use proper APA formatting and citations. If supporting evidence from outside resources is used, it must be properly cited.
Share a personal connection that identifies specific knowledge and theories from this course.
Demonstrate a connection to your current work environment. If you are not employed, demonstrate a connection to your desired work environment.
You should NOT provide an overview of the assignments in the course. The assignment asks you to reflect on how the knowledge and skills obtained through meeting course objectives were applied, or could be applied, in the workplace.
