The purpose of the project is for you to gain experience in applying the methods taught in the class to a real data set of interest to you.

Objectives

At the end of this project, you will be able to:

1. Define a research question and appropriate variables to measure on a topic that you care about

2. Decide on an experimental design

3. Gather and analyze original data (rather than data prepared for you) on an issue of interest to you

4. Prepare an appropriate report

5. Provide constructive, thoughtful feedback to peers on their projects

Project Suggestions

Conducting an empirical analysis of economic data can be rewarding and informative. The first step in conducting an empirical analysis is choosing the topic you want to study and, within that topic, the specific question or questions you will investigate. Although there is not a single best way to choose a topic, the following suggestions might be useful.

Pick a topic that you find personally interesting, ideally one about which you already have some knowledge. The topic might be related to your career interests, summer work you did, employment experience of a family member, or something of intellectual interest to you. Often, a specific policy problem, a personal decision, or a business issue raises questions that can be addressed by an empirical study.

Make the question that will be the main focus of your study as specific as possible. The more narrowly the question relates to a measurable causal effect, the easier it will be to answer.

Check the related literature. You might find published studies on topics closely related to yours. Use previous work to give you ideas about data sources and about what questions have not yet been answered.

Choose a question that can be answered using the available data. Although the question you originally pose might not be answerable using available data, the data might support the analysis of a related and equally interesting question.

Share your topic on the discussion board. If you find your topic interesting, then the odds are that others will too, and an instructor or classmate might suggest an angle that you have not thought of.

As shown in the table below, the course project consists of 6 checkpoints that lead up to, and include, the final report.

Project Format

This project relies on multiple regression analysis to analyze a data set that is of interest to you. The final report for the project should be a 5-10 page single-spaced paper that describes the questions of interest, how you used your data set to analyze these questions with details on the steps you used in your analysis, your findings about your question of interest and the limitations of your study. Specifically, your report should contain the following:

1. Introduction.

The introduction succinctly states the problem you are interested in, briefly describes your data and the method of analysis, and summarizes your main conclusions. It should summarize the entire report: what you set out to learn and what you ended up finding.

2. Data Description.

This section provides the details of the data sources, any transformations you have done to the data (for example, changing the units of some variables), gives a table of summary statistics (means and standard deviations) of the variables, and provides scatter-plots and/or other relevant plots of the data. If there are outliers other than those arising from corrected typographical or computer errors, this is the place to point them out.

3. Regression Analysis

Describe how you used multiple regression to analyze the data set. Specifically, discuss how you carried out the steps of analysis covered in class: exploring the data to find a reasonable initial model, checking the model, and revising the model based on those checks.

4. Empirical Results

This section provides the main empirical results in the paper. Conventionally, regression results are presented in tabular form, with footnotes clearly explaining the entries. The initial table of results should present the main results; sensitivity analysis using alternative specifications can be presented in additional columns in that table or in subsequent tables. For organizational purposes and clarity, you may choose to have some tables at the end of the paper, with appropriate references in the body of the paper as needed. The text should provide a careful discussion of the results, including assessments both of statistical significance and of economic significance, that is, the magnitude of the estimated relations in a real-world sense.

5. Summary and Discussion.

This section summarizes your main empirical findings and discusses their implications for the original question of interest. Describe any limitations of your study and how they might be overcome in future research and provide brief conclusions about the results of your study.
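The table of summary statistics requested in the Data Description section could be produced along the following lines. This is only a minimal sketch using the Python standard library; the variable names and values are hypothetical placeholders, not real data, and any statistical package you already use would serve equally well.

```python
from statistics import mean, stdev

# Hypothetical data set: each key is a variable, each list holds its observations.
data = {
    "tuition_usd": [12000.0, 45000.0, 30000.0, 9000.0, 52000.0],
    "ranking": [120.0, 15.0, 60.0, 150.0, 5.0],
}

def summary_table(columns):
    """Return one row per variable: (name, n, mean, sample standard deviation)."""
    rows = []
    for name, values in columns.items():
        rows.append((name, len(values), mean(values), stdev(values)))
    return rows

def format_table(rows):
    """Render the rows as a plain-text table suitable for pasting into a report."""
    lines = [f"{'Variable':<12} {'N':>3} {'Mean':>12} {'Std. Dev.':>12}"]
    for name, n, avg, sd in rows:
        lines.append(f"{name:<12} {n:>3} {avg:>12.2f} {sd:>12.2f}")
    return "\n".join(lines)

rows = summary_table(data)
print(format_table(rows))
```

Note that `stdev` computes the sample standard deviation (dividing by n - 1), which is the usual choice when the data are a sample rather than the whole population.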

The project consists of the following checkpoints:

Project Checkpoint 1: Topic Selection

Project Checkpoint 2: Hypothesis & Research Question

Project Checkpoint 3: Identify Variables for Study

Project Checkpoint 4: Data Sets

Project Checkpoint 5: Regression Analysis

Project Checkpoint 6: Final Report-Sunday, July 28

Project Checkpoint 1

Topic Selection

The model and the data are the starting points of an econometric project. The first step in formulating a model is to select a topic of interest and to consider the model’s scope and purpose. In particular, thought should be given to the objectives of the study, what boundaries to place on the topic, what hypotheses might be tested, what variables might be predicted, and what policies might be evaluated. Close attention must be paid, however, to the availability of adequate data. In particular, the model must involve causal relations among measurable variables.

The topic selected can be economic or non-economic. It could be a particular market (the market for college graduates, the market for economists, the market for cellular phones), a process (economic development, inflation, unemployment), demographic phenomena (birth rates, death rates), environmental phenomena (water quality, air quality), political phenomena (elections, voting behavior of legislatures), some combination of these, or some other topic.

You are free to choose your own topic, but it will require approval from your instructor. Some example paper titles are presented below:

Air Pollution and Population

Differential Growth in U.S. Cities

Birth Rates, Death Rates, and Economic Growth in Developing Economies

Economic and Social Determinants of Infant Mortality in the United States

The Relationship between Exports and Growth in Less Developed Countries

Remember that these ideas above are merely examples of reasonable topics. You should be original and follow your own interests. Perhaps the best choice of a topic is one in which you have prior experience or knowledge.

Keep in mind that this project is studying the impact of some independent variable X on a dependent variable Y. But since there are many variables X that have influence on the variable Y, it is important to include all those variables on the right hand side of the equation.

To ensure that the model is both interesting and manageable, it should contain at least three to four independent variables on the right hand side. The model should be formulated as an algebraic, linear, stochastic equation along with a corresponding verbal statement of the meaning of the equation. The expected signs of all the coefficients should be considered. All relevant multipliers, short-run and long-run, should be identified and considered.

An example would be: Being a college student, I know that college can be very expensive. However, depending on where you go to school, this price can vary, sometimes by more than $30,000. For my project I would like to study this variation in tuition price and how it is affected by factors such as school ranking, location, private/public standing, etc.
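As a rough illustration, the tuition example could be written as a linear stochastic equation like the one sketched here. The variable names are hypothetical stand-ins for the factors mentioned in the example, and the specification is only one of many reasonable choices.

```latex
\[
\text{Tuition}_i = \beta_0 + \beta_1\,\text{Ranking}_i + \beta_2\,\text{Private}_i + \beta_3\,\text{Urban}_i + u_i
\]
```

Here $i$ indexes schools, $\text{Private}_i$ is assumed to equal 1 for private schools and 0 for public ones, $\text{Urban}_i$ is an assumed indicator standing in for location, and $u_i$ is the error term that makes the equation stochastic. The verbal statement would then read: tuition depends linearly on ranking, private/public standing, and location, plus unobserved factors. One might, for instance, expect $\beta_2 > 0$ if private schools tend to charge more, but the expected signs are for you to argue.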

Project Checkpoint 2

Hypothesis and Research Question

Particular thought should be given to the objectives of the study, what boundaries to place on the topic, what hypotheses might be tested, what variables might be predicted, and what policies might be evaluated.

Once you have a general understanding of your topic, narrow it down into a manageable research question or hypothesis. This will help you define the parameters of your research, as well as your argument. A research hypothesis is a statement of expectation or prediction that will be tested by research.

Hypotheses look very much like “mini-arguments”; the objective of the research paper is to present evidence that will prove those hypotheses.

Before formulating your research hypothesis, read about the topic of interest to you. From your reading, which may include articles, books and/or cases, you should gain sufficient information about your topic that will enable you to narrow or limit it and express it as a research question. The research question flows from the topic that you are considering. The research question, when stated as one sentence, is your Research Hypothesis.

In your hypothesis, you are predicting the relationship between variables. Through the disciplinary insights gained in the research process throughout the year, you “prove” your hypothesis. This is a process of discovery to create greater understandings or conclusions.

An example of this checkpoint would be:

The objective of my study is to determine whether China's enormous population is the main factor behind its low per capita GDP and whether other factors account for this phenomenon. The boundary is that I will focus only on the relationship between GDP and population in China, not in other countries. My hypothesis is that China's enormous population is not the strongest factor behind its low per capita GDP.

Project Checkpoint 3

Identify Variables for the Study

Keep in mind that this project is studying the impact of some independent variable X on a dependent variable Y. But since there are many variables X that have influence on the variable Y, it is important to include all those variables on the right hand side of the equation.

To ensure that the model is both interesting and manageable, it should contain at least three to four independent variables on the right hand side. The model should be formulated as an algebraic, linear, stochastic equation along with a corresponding verbal statement of the meaning of the equation. The expected signs of all the coefficients should be considered. All relevant multipliers, short-run and long-run, should be identified and considered.

An example of this section would be:

The soccer players’ salaries and their performances/statistics.

The dependent variable will be salary, and my independent variables will be games played, goals, assists, and the number of yellow or red cards during the 2015-16 season. The most important variables would be the number of goals and assists a player contributed.

Salary (y) = b0 + b1x1 + b2x2 + b3x3 + b4x4

From this regression equation, I expect salary to be positively correlated with the number of goals, games played, and assists, and negatively correlated with the number of cards in the season.
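To make the salary equation above concrete, the following sketch estimates the coefficients b0 through b4 by ordinary least squares. Everything in it is an assumption for illustration: the player data are synthetic, the salaries are generated from made-up "true" coefficients so that the estimates can be checked, and the small solver is written out by hand only to keep the example self-contained. In an actual project you would more likely rely on the regression routine of your statistical software.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def ols(regressors, y):
    """Least-squares coefficients [b0, b1, ..., bk] from (X'X) b = X'y."""
    X = [[1.0] + list(row) for row in regressors]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic players: (games played, goals, assists, cards). Not real data.
players = [
    (30, 10, 5, 2), (25, 3, 8, 5), (38, 20, 7, 1), (12, 1, 0, 3),
    (34, 7, 12, 6), (20, 5, 2, 0), (28, 15, 3, 4), (36, 2, 9, 8),
]
# Made-up coefficients used to generate noise-free salaries (in millions).
true_b = [1.0, 0.02, 0.10, 0.05, -0.03]
salary = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], p)) for p in players]

b = ols(players, salary)
for name, est in zip(["intercept", "games", "goals", "assists", "cards"], b):
    print(f"{name:<10} {est:+.4f}")
```

Because the synthetic salaries contain no error term, the estimates reproduce the made-up coefficients up to rounding, with positive signs on games, goals, and assists and a negative sign on cards. With real data the fit will not be exact, and the signs are something to test rather than assume.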

Project Checkpoint 4

Data Sets

Before finding a data set, you must be aware of what data will help you to answer the question you are investigating. It helps to understand how you intend to perform your analysis. What unit of observation would be most useful (local governmental data, international data, etc.)?

In order for you to choose the right data set, you must be clear about what variables you are using before you search for your data set. You should already know what you are using for your dependent variable and what variables will help you answer the research question most effectively.

Library sources are a good place to start your search for data. In addition to the material resources available there, you can also seek assistance from the data librarian, who will point you in the right direction.

Here are some ideas for data sources that are available for public use:

Statistical Abstract of the US

Statistical Handbooks

Statistical Yearbooks

Federal Reserve Economic Data

International Economic Conditions

National Statistical Abstract

Center for Research in Securities Prices

Project Checkpoint 5

Regression Analysis

The first step in your empirical analysis is getting familiar with the data. Plot the data, using histograms and/or scatterplots. Are there big outliers, and if so are those observations accurately recorded or are they typographical or data manipulation errors? Be very careful when you input your data, as any errors may completely throw off your analysis. Once you feel that the data are error-free, you can start looking at specific relationships. Are the units of the data the ones you expected, and are they the ones you want to use? Do the relations you see in the scatterplots make sense? Do relationships look linear, or do they look nonlinear?
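Alongside the plots, a quick numerical check can help with the questions above. The sketch below flags candidate outliers with the common 1.5 x IQR rule and computes a sample correlation. Both the rule and the example values are assumptions chosen for illustration, not a method prescribed by this project. A flagged observation is only a prompt to go back and check the source, not a reason to delete it.

```python
from statistics import mean, quantiles

def flag_outliers(values, k=1.5):
    """Return indices of observations outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical ages; 240 looks like a typing error for 24.
ages = [21, 23, 22, 25, 24, 26, 23, 240, 22, 25]
suspects = flag_outliers(ages)
print("Check these rows against the source:", suspects)

# Hypothetical pair of variables for a first look at a linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"correlation = {pearson_r(x, y):.3f}")
```

A correlation near zero does not rule out a relationship. It may simply be nonlinear, which is why the scatterplots remain the primary tool at this stage.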

Once you are acquainted with your data set, you can begin your regression analysis. This is the point at which all the previous work you have done preparing your study begins to pay off.

Project Checkpoint 6

Final Report

Project Format

This project relies on multiple regression analysis to analyze a data set that is of interest to you. The final report for the project should be a 5-10 page paper that describes the questions of interest, how you used your data set to analyze these questions with details on the steps you used in your analysis, your findings about your question of interest and the limitations of your study. Specifically, your report should contain the following:

Introduction. The introduction succinctly states the problem you are interested in, briefly describes your data and the method of analysis, and summarizes your main conclusions. It should summarize the entire report: what you set out to learn and what you ended up finding.

Data Description. This section provides the details of the data sources, any transformations you have done to the data (for example, changing the units of some variables), gives a table of summary statistics (means and standard deviations) of the variables, and provides scatterplots and/or other relevant plots of the data. If there are outliers other than those arising from corrected typographical or computer errors, this is the place to point them out.

Regression Analysis. Describe how you used multiple regression to analyze the data set. Specifically, discuss how you carried out the steps of analysis covered in class: exploring the data to find a reasonable initial model, checking the model, and revising the model based on those checks.

Empirical Results. This section provides the main empirical results in the paper. Conventionally, regression results are presented in tabular form, with footnotes clearly explaining the entries. The initial table of results should present the main results; sensitivity analysis using alternative specifications can be presented in additional columns in that table or in subsequent tables. For organizational purposes and clarity, you may choose to have some tables at the end of the paper, with appropriate references in the body of the paper as needed. The text should provide a careful discussion of the results, including assessments both of statistical significance and of economic significance, that is, the magnitude of the estimated relations in a real-world sense.

Summary and Discussion. This section summarizes your main empirical findings and discusses their implications for the original question of interest. Describe any limitations of your study and how they might be overcome in future research and provide brief conclusions about the results of your study.
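The distinction drawn in the Empirical Results section between statistical and economic significance can be illustrated with a small calculation. The sketch below is only an assumption-laden example: the coefficient and standard error are invented, and the 1.96 critical value is the large-sample normal approximation for a two-sided test at the 5% level, which may not match the exact test used in class.

```python
def t_statistic(coefficient, standard_error):
    """t-statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error

def is_significant(coefficient, standard_error, critical_value=1.96):
    """Two-sided test using a large-sample normal critical value (about 5%)."""
    return abs(t_statistic(coefficient, standard_error)) > critical_value

def implied_change(coefficient, change_in_x):
    """Predicted change in Y for a given change in X, other regressors fixed."""
    return coefficient * change_in_x

# Invented estimate: a one-unit increase in X is associated with 0.05 more Y.
coef, se = 0.05, 0.02
t = t_statistic(coef, se)
significant = is_significant(coef, se)
effect_of_ten_units = implied_change(coef, 10)

print(f"t = {t:.2f}, statistically significant at about 5%: {significant}")
print(f"A 10-unit increase in X implies a change in Y of {effect_of_ten_units:.2f}")
```

Statistical significance answers whether the estimate can be distinguished from zero. Economic significance asks whether the implied change, for a realistic change in X and in the units of Y, is large enough to matter for the original question. A careful discussion addresses both.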

This is the rubric for this project:

Final Project Rubric

INTRODUCTION

Poor (60%): The introduction does not state the problem of interest or describe the data and method of analysis. It does not summarize main conclusions, and does not provide a summary of what we set out to learn. There is essentially no introduction.

Fair (70%): There is a short introduction that may or may not state the problem of interest. It mentions the data, but does not describe it. No major conclusions are drawn.

Good (80%): The introduction is a bit lengthy or short but does state the problem of interest and describes the data and methods of analysis, although not clearly. Main conclusions are presented but not summarized.

Excellent (90%): The introduction succinctly states the problem of interest, briefly describes the data and method of analysis, and summarizes the main conclusions: what you set out to learn and what you ended up finding.

Perfect (100%)

DATA DESCRIPTION

Poor (60%): This section does not detail data sources or transformations, nor does it give any summary statistics. It does not provide any relevant graphical information.

Fair (70%): This section provides very little information regarding data sources, transformations, and summary statistics. There are few or no scatter plots or other relevant graphical representations.

Good (80%): This section covers data sources, transformations, summary statistics, scatter plots, and outliers if there are any. However, the order, structure, and presentation of the data do not flow well.

Excellent (90%): Provides details of data sources and transformations of the data, gives a table of summary statistics, and provides scatter plots and other relevant plots of the data. If there are outliers, they are made obvious.

Perfect (100%)

REGRESSION ANALYSIS

Absent (0%): The criterion is completely absent.

Poor (60%): The use of multiple regression is not clear nor is the regression model explained. There are no references made to class material.

Fair (70%): A multiple regression model is evident but is not clearly explained. Few, if any, references to the course material are made.

Good (80%): The use of multiple regression is made clear, but there is not strong evidence supporting the use of a particular model. There are references made to supporting course material to justify decisions.

Excellent (90%): How multiple regression was used is described. The steps of analysis that were covered in readings/lectures are discussed. The process for deciding on a particular regression model is well discussed and justified.

Perfect (100%)

EMPIRICAL RESULTS

Absent (0%): The criterion is completely absent.

Poor (60%): This section has no appropriate empirical results presented. There may be regression results that do not refer to the regression model that was outlined in the regression analysis section. There is no discussion surrounding the results.

Fair (70%): Multiple regression results are presented. However, there is little to no discussion of the results, and if subsequent tables are needed, they are not presented.

Good (80%): All results of analysis are presented. However, the discussion of the results is unclear or inaccurate.

Excellent (90%): Regression results are presented in tabular form with footnotes explaining entries. The table presents main results while sensitivity analysis or alternative specifications are outlined in subsequent tables. The text provides a careful discussion of the results, including assessments of both statistical and economic significance.

Perfect (100%)

SUMMARY AND DISCUSSION

Absent (0%): The criterion is completely absent.

Poor (60%): There is no summary or discussion of the empirical results. No limitations for the study are presented, and there is no conclusion.

Fair (70%): There is little understanding or description of the empirical results and their implications for the original question of interest. Few or no limitations are presented, and the conclusion is unclear.

Good (80%): Empirical findings are summarized, but the discussion of their implications for the original question is limited. Limitations of the study are very briefly addressed, and the conclusion of the study could be clearer.

Excellent (90%): This section summarizes the main empirical findings and discusses implications. Limitations of the study are addressed so that they may be overcome in the future. A very brief conclusion of the results is presented.

Perfect (100%)