In this project you will choose a data set that interests you and investigate a possible association between two variables within that data set. This project will give you an opportunity to use StatCrunch to apply the skills and techniques you have learned in this class and to produce a professional report.
To produce a successful project, you must:
- Read and follow the instructions carefully.
- Give yourself sufficient time to work on the project.
- Write clearly, using appropriate statistical terminology and correct mathematical notation. College-level writing is expected, as is the use of proper grammar.
- Use StatCrunch to complete all calculations and graphs.
- Create original work. The following link describes plagiarism, fair use, and HCC’s academic policy in detail: Fair Use and Academic Honesty. Creating original work also means that students who are repeating the course are expected to create an entirely new project using two new variables of interest.
- Submit a professional report that is typed, well formatted, and well organized.
SUBMISSION PROTOCOL
- Submit your project via Canvas as a PDF or Word file.
- 10% will be deducted for each calendar day the project is submitted after the due date. A project is considered “submitted” when it is available for the professor to grade in Canvas. No credit is given after a submission is 5 days late.
- This assignment utilizes Unicheck, a tool that checks for plagiarism. Unicheck is integrated into the Canvas submission process. All submissions will be compared against a database and receive an originality rating.
PROJECT INSTRUCTIONS
For this project, you will choose one data set from the list below that you find interesting and investigate an association between two variables within that data set. You will then examine the data and write a two-page report. In your report you should:
- Introduce the data set and explain why you chose it.
- Describe the variables you chose and thoroughly explain what you are investigating. Be sure to define which variable is the explanatory variable and which is the response variable.
- Using StatCrunch, create an appropriate graph for the association you are investigating and calculate the correlation coefficient and the linear model.
- Be sure your graph is appropriately labeled and includes a title, then copy it into your paper.
- Report the correlation coefficient.
- Describe the association you are investigating using correct statistical terminology. Reference your graph and the correlation coefficient, and be sure to note any possible outliers.
- Report the linear model using correct notation.
- Interpret the slope and vertical intercept of your model, and discuss the appropriateness of your model.
- Summarize your findings and draw a conclusion.
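The project itself requires StatCrunch for all calculations and graphs, so the sketch below is only a way to see what the correlation coefficient and the linear model measure. It is a minimal Python illustration under stated assumptions: the function name `correlation_and_line` and the data values are invented for this example and do not come from any of the listed data sets.

```python
from math import sqrt

def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("each variable must take at least two different values")
    r = sxy / sqrt(sxx * syy)            # correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares vertical intercept
    return r, slope, intercept

# Hypothetical values, invented for illustration only.
hours = [1, 2, 3, 4, 5]        # explanatory variable
score = [52, 55, 61, 64, 68]   # response variable
r, slope, intercept = correlation_and_line(hours, score)
print(f"r = {r:.3f}")
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
```

For these invented values the slope is 4.1 and the intercept is 47.7. In words: each additional hour is associated with a predicted increase of about 4.1 points, and 47.7 is the predicted score at 0 hours, which is a meaningful value only if 0 hours is within or near the range of the data.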
DATA SETS
Below is a list of data sets – choose one for your project. The links will not work directly, so please follow these directions:
- In Canvas, click on MyLab and Mastering –> Open My Lab and Mastering –> StatCrunch (on the left) –> Visit the StatCrunch Website –> Open StatCrunch (yellow button at the top of the page; it will log you in automatically)
- Then copy/paste a URL below into either the current tab or a new tab
- To save your data and work, go to the Data tab and choose Save
- To reopen your saved file, click on MyLab and Mastering –> Open My Lab and Mastering –> StatCrunch (on the left) –> Visit the StatCrunch Website –> MyStatCrunch (on the right side) –> My Data
U.S. CBP Drug Seizure Statistics: https://www.statcrunch.com/app/index.php?dataid=28…
This data set summarizes the pounds of drugs seized at ports of entry and between ports of entry by U.S. Customs and Border Protection (CBP). https://www.cbp.gov/newsroom/stats/cbp-enforcement…
U.S. Presidential Data: https://www.statcrunch.com/app/index.php?dataid=31…
This data set contains information on the U.S. Presidents from 1789-2019.
Fatal Encounters Updated September 2018: https://www.statcrunch.com/app/index.php?dataid=30…
This data set contains information on fatal encounters. Fatal Encounters is a non-profit organization that collects data on police-involved deaths. Note: This is a volunteer organization whose data come from people scouring news articles for evidence of these fatal encounters. Thus, this is not the complete population of fatal encounters, only a large sample. https://fatalencounters.org/
College Basketball Arenas: https://www.statcrunch.com/app/index.php?dataid=29…
This data set contains information on college basketball arenas throughout the country.
Marriage vs. the Economy: https://www.statcrunch.com/app/index.php?dataid=28…
This data set compares the number of marriages in the last 30 years to several factors of the economy.
Medical Costs: https://www.statcrunch.com/app/index.php?dataid=26…
This data set contains a variety of personal data related to medical costs.
MLB August 2019 Batting: https://www.statcrunch.com/app/index.php?dataid=31…
This data set contains MLB batter statistics, year-to-date as of August 18, 2019.
Sample College Data: https://www.statcrunch.com/app/index.php?dataid=31…
This data set contains a variety of data for colleges and universities in Delaware, DC, Maryland, Pennsylvania, Virginia, and West Virginia. Data is for the year 2011.
Fast Food Nutritional Data: https://www.statcrunch.com/app/index.php?dataid=25…
This data set contains nutritional information on a variety of fast food items. Data was collected in January 2017 from online sources for each restaurant.
Marvel vs. DC at the Box Office: https://www.statcrunch.com/app/index.php?dataid=31…
This data set contains information on how the two comic book companies have fared at the box office. Note: The Adjusted column modifies the total Worldwide gross for inflation.
NFL Player Data 2016: https://www.statcrunch.com/app/index.php?dataid=27…
This data set lists the 2,764 NFL players on all team rosters as of July 22, 2016.
Car Details 2019 Models: https://www.statcrunch.com/app/index.php?dataid=32…
This data set contains information on the 2019 models of widely known cars. MSRP stands for Manufacturer Suggested Retail Price and MPG stands for Miles Per Gallon.
Super Heroes: https://www.statcrunch.com/app/index.php?dataid=26…
This data set contains various physical characteristics for over 700 fictional comic book superheroes. https://www.kaggle.com/claudiodavi/superhero-set
Movie Budgets & Box Office Earnings (Updated Spring 2018): https://www.statcrunch.com/app/index.php?dataid=21…
This data comes from the following website, which tracks the financial performance of movies: https://www.the-numbers.com/movie/budgets/all. The columns are each in millions of dollars.
COMPLETION CHECKLIST
Use the following checklist when proofreading your project.
For each aspect below, an excellent report will:
Introduction
- Give the name of the data set chosen as well as some details describing the data set. (This may require following links or referencing the text.)
- Include a clear justification for why the data set was chosen.
- Involve two quantitative variables.
- Clearly and thoroughly describe the variables chosen. (This may require following links or referencing the text.)
- Clearly and correctly assign explanatory and response variables.
- Thoroughly explain the association being investigated and give a logical justification for why the author believes the association exists.
Graph
- Include an appropriate scatterplot generated using StatCrunch.
- Accurately assign and clearly label the axes for the scatterplot.
- Include an appropriate title for the scatterplot.
- Include the appropriate correlation coefficient for the association generated using StatCrunch.
- Include an accurate and thorough description of the association with reference to the graph, correlation coefficient and any outliers.
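One way to back up a visual judgment about possible outliers is to look at residuals, the vertical distances between each point and the fitted line. The sketch below is a hedged Python illustration only, since the project's calculations must be done in StatCrunch. The function name `flag_possible_outliers`, the 2-standard-deviation cutoff, and all data values are assumptions made for this example; the slope and intercept stand in for values you would read from your own regression output.

```python
from math import sqrt

def flag_possible_outliers(x, y, slope, intercept, cutoff=2.0):
    """Return indices of points whose residual is unusually large.

    A point is flagged when the size of its residual exceeds `cutoff`
    times the standard deviation of the residuals. The cutoff of 2.0 is
    an assumed rule of thumb, not a course requirement.
    """
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three points")
    # Standard deviation of the residuals, using n - 2 degrees of freedom.
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if s == 0:
        return []  # every point lies exactly on the line
    return [i for i, e in enumerate(residuals) if abs(e) > cutoff * s]

# Hypothetical data, invented for illustration: y is 2x except at x = 5.
xs = list(range(1, 11))
ys = [2 * v for v in xs]
ys[4] = 20
# Slope and intercept rounded from a least-squares fit to these invented values.
print(flag_possible_outliers(xs, ys, slope=1.94, intercept=1.33))
```

A flagged point is only a candidate. Whether it is truly an outlier, and what it means for the association, still has to be argued in the report with reference to the scatterplot.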
Linear Association Model
- Include an appropriate linear regression model generated using StatCrunch and written using correct notation and typesetting.
- Give an accurate and detailed interpretation of the slope of the linear regression model.
- Give an accurate and detailed interpretation of the y-intercept of the linear regression model.
- Thoroughly discuss the appropriateness of the linear regression model with reference to other aspects of the report.
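As a hedged illustration of what "correct notation and typesetting" can look like for the items above, the LaTeX below writes a linear regression model in one common textbook form. The symbols b_0, b_1, s_x, and s_y are assumptions for this sketch; if the course text uses different letters, follow the course text.

```latex
% Least-squares linear model: the hat marks a predicted value of the response
\hat{y} = b_0 + b_1 x

% Slope and vertical intercept in terms of summary statistics
b_1 = r \, \frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}
```

In a report, x and y would normally be replaced by the names of the explanatory and response variables, and b_0 and b_1 by the numerical values from the StatCrunch output.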
Conclusion
- Include a summary of the findings and a clear conclusion regarding the association.
- Appear highly professional, be easy to read and comprehend, and use correct statistical vocabulary.
ASSISTANCE
For this project, you may consult any resource for general help and advice provided that your computations, explanations, and embedded diagrams are your own work.
Rubric
Linear Regression Rubric
Introduction: Data Set (2.0 pts)
An excellent report will give the name of the data set chosen as well as some details describing the data set.
- Full Credit (2.0 pts): Name of the data set chosen as well as some details describing the data set.
- Half Credit (1.0 pts): Name of data set is given but no further description or details.
- No Credit (0.0 pts): Data Set chosen is not identified.
Introduction: Justification for Data Set (2.0 pts)
An excellent report will include a clear justification for why the author chose the data set.
- Full Credit (2.0 pts): Clear and logical justification given.
- Half Credit (1.0 pts): Justification is overly vague or not logical.
- No Credit (0.0 pts): No justification given.
Introduction: Two Quantitative Variables (2.0 pts)
An excellent report will involve two quantitative variables.
- Full Credit (2.0 pts): Both variables chosen are quantitative.
- Half Credit (1.0 pts): Only one of the chosen variables is quantitative.
- No Credit (0.0 pts): Neither variable chosen is quantitative.
Introduction: Description of Variables (4.0 pts)
An excellent report will include a clear and thorough description of the chosen variables.
- Full Credit (4.0 pts): Clear and thorough description of variables given.
- Half Credit (2.0 pts): Description of variables is unclear or nondescript.
- No Credit (0.0 pts): No description of variables given.
Introduction: Explanation of Investigation (6.0 pts)
An excellent report will include a clear and thorough explanation and justification of the association being investigated.
- Full Credit (6.0 pts): Clear and thorough explanation of what is being investigated including logical justification.
- Half Credit (3.0 pts): Description of what is being investigated is vague or illogical.
- No Credit (0.0 pts): Description of what is being investigated is missing.
Introduction: Explanatory and Response Variables (2.0 pts)
An excellent report will clearly and correctly assign the explanatory and response variables.
- Full Credit (2.0 pts): Explanatory and Response variables are clearly and appropriately assigned.
- Half Credit (1.0 pts): Explanatory and response variables are assigned incorrectly.
- No Credit (0.0 pts): It is not clear which is the explanatory variable and which is the response variable.
Graph: Scatterplot Generated using StatCrunch (6.0 pts)
An excellent report will include an appropriate scatterplot generated using StatCrunch.
- Full Credit (6.0 pts): Report includes an appropriate scatterplot generated using StatCrunch.
- Half Credit (3.0 pts): A scatterplot is present but it was not generated using StatCrunch.
- No Credit (0.0 pts): No scatterplot given or scatterplot was created by hand.
Graph: Appropriately Assigned Axes (2.0 pts)
An excellent report will include appropriately assigned and clearly labeled axes for the scatterplot.
- Full Credit (2.0 pts): Axes are appropriately assigned and labeled.
- Half Credit (1.0 pts): Axes are assigned incorrectly.
- No Credit (0.0 pts): Axes are not labeled.
Graph: Title (2.0 pts)
An excellent report will include an appropriate title for the scatterplot.
- Full Credit (2.0 pts): Scatterplot includes an appropriate title.
- Half Credit (1.0 pts): A title is present but it does not accurately describe the graph.
- No Credit (0.0 pts): Scatterplot does not have a title.
Correlation Coefficient (4.0 pts)
An excellent report will include the appropriate correlation coefficient generated using StatCrunch.
- Full Credit (4.0 pts): Appropriate correlation coefficient is generated using StatCrunch and it is clear that the student knows which value outputted by StatCrunch is the correlation coefficient.
- Half Credit (2.0 pts): Student includes what was generated by StatCrunch, but it is unclear that they know which value is the correlation coefficient.
- No Credit (0.0 pts): No correlation coefficient is given.
Description of Association (8.0 pts)
An excellent report will include an accurate and thorough description of the association with reference to the graph, the correlation coefficient and any outliers.
- Full Credit (8.0 pts): An accurate and thorough description of the association is given with references to the scatterplot, the correlation coefficient and any outliers.
- Partial Credit (6.0 pts): Description of association is correct but there is no reference to either the graph, the correlation coefficient or any outliers.
- Half Credit (4.0 pts): Description of association is given but it is partially incomplete or incorrect.
- Partial Credit (2.0 pts): The description of the association has many errors and does not reference the scatterplot, the correlation coefficient or any outliers.
- No Credit (0.0 pts): No description of the association given.
Linear Regression Model (4.0 pts)
An excellent report will include an appropriate linear regression model generated using StatCrunch.
- Full Credit (4.0 pts): Appropriate model is generated using StatCrunch and it is clear that the student knows the equation for the model outputted by StatCrunch.
- Half Credit (2.0 pts): Student includes what was generated by StatCrunch, but it is unclear that they know what the linear regression model is.
- No Credit (0.0 pts): No linear regression model is given.
Linear Regression Model: Correctly Typed (3.0 pts)
An excellent report includes the linear regression model written using correct notation and typesetting.
- Full Credit (3.0 pts): Linear regression model is correct and written using proper notation.
- Partial Credit (2.0 pts): Linear regression model is correct but not written using proper notation.
- Partial Credit (1.0 pts): Linear regression model is not correct.
- No Credit (0.0 pts): No linear regression model is given.
Linear Regression Model: Interpretation of Slope (2.0 pts)
An excellent report will include an accurate and detailed interpretation of the slope.
- Full Credit (2.0 pts): Interpretation of the slope of the model is correct and precise.
- Half Credit (1.0 pts): Interpretation of the slope is vague or has minor errors.
- No Credit (0.0 pts): Interpretation of slope is missing or completely incorrect.
Linear Regression Model: Interpretation of Intercept (2.0 pts)
An excellent report will include an accurate and detailed interpretation of the intercept.
- Full Credit (2.0 pts): Interpretation of the intercept of the model is correct and precise.
- Half Credit (1.0 pts): Interpretation of the intercept is vague or has minor errors.
- No Credit (0.0 pts): Interpretation of the intercept is missing or completely incorrect.
Linear Regression Model: Appropriateness of Model (4.0 pts)
An excellent report will include a detailed and thorough discussion of the linear regression model with reference to other aspects of the report.
- Full Credit (4.0 pts): Discussion of appropriateness of linear regression model is detailed and thorough and references other aspects of the report.
- Half Credit (2.0 pts): Discussion is vague and/or doesn’t reference other aspects of report.
- No Credit (0.0 pts): No discussion of the appropriateness of the model is given.
Conclusion: Summary of Findings (2.0 pts)
An excellent report includes an accurate summary of the findings.
- Full Credit (2.0 pts): A sufficient summary of the findings is given.
- Half Credit (1.0 pts): The summary of findings is incorrect or incomplete.
- No Credit (0.0 pts): No summary is given.
Conclusion: Final Conclusion (4.0 pts)
An excellent report includes a clear and accurate final conclusion on the association being investigated.
- Full Credit (4.0 pts): Final conclusion on the association being investigated is clear and accurate.
- Half Credit (2.0 pts): Final conclusion is incomplete or partially incorrect.
- No Credit (0.0 pts): No final conclusion is given.
Professionalism (4.0 pts)
An excellent report is highly professional in appearance, easy to read and comprehend, and uses correct statistical vocabulary.
- Full Credit (4.0 pts): Report is highly professional in appearance, easy to read and comprehend, and uses correct statistical vocabulary.
- Half Credit (2.0 pts): Report demonstrates some professionalism but contains distracting errors or problems with formatting, organization, vocabulary or grammar.
- No Credit (0.0 pts): Report is severely lacking in professionalism and/or errors or problems with formatting, organization, vocabulary or grammar make report far too difficult to read.
Total Points: 65.0