The intention of this project is to produce an interesting, informative and compelling piece of data analysis answering questions that you define yourself.
For this you will be writing a Jupyter Notebook from scratch, loading, cleaning and analyzing the data yourself and thinking about how best to present the information in order to convince your audience of your findings.
Your job will be to propose a project brief that defines questions that require the analysis of data at either the state or international level. Once your proposal is approved, you will conduct analysis on your chosen datasets to answer your questions.
This proposal will comprise two elements:
1. A list of 3 to 6 questions that you would like to answer with the data, and justification as to why these are interesting questions.
2. The 2+ datasets that you plan to use in order to answer your questions.
When putting together your proposal and selecting your datasets, please take into account how the final project will be graded. In particular, the following five areas are the main considerations you should be making at this stage:
Sourcing: you will receive extra credit points if you source and use a dataset beyond the list of datasets provided below.
Joining: your project should involve the use of at least two datasets and require at least one join of data in order to perform your analysis. This means you will need to find datasets that share a common column that you can join around (and is one of the main reasons we are limiting the scope of these projects to country or state-level data – so that you can join datasets by state or by country).
Cleaning: the data you source will not be perfectly fit for purpose. You may need to handle missing data, or manipulate the data in order to answer the questions you stated in your proposal. You will be graded on your cleaning and manipulation of the data, so if you select datasets and questions where no data processing is needed, then you will be limiting your grade.
Exploration: once you have your data joined and cleaned, you will be graded on how thoroughly you explore your data and how you document that exploration process in your notebook. For example, you may want to explore how
Communication: notebook. How well presented is your notebook and how readable is your code?
Communication: presentation. How clearly and convincingly do you present your findings in the final presentation?
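To make the Joining and Cleaning areas above more concrete, here is a minimal sketch of what that work might look like in a notebook. It assumes pandas is available. The column names (`country`, `happiness_score`, `gdp_per_capita`) and every value in it are invented for illustration and do not come from any of the provided datasets.

```python
import pandas as pd

# Made-up illustrative values; these are not real statistics.
happiness = pd.DataFrame({
    "country": ["Finland", "Chile", "Japan", "Kenya"],
    "happiness_score": [7.8, 6.2, 5.9, 4.5],
})
economy = pd.DataFrame({
    "country": ["finland ", "Chile", "Japan", "Peru"],
    "gdp_per_capita": [48000.0, None, 40000.0, 6500.0],
})

# Cleaning: normalise the join key so "finland " matches "Finland".
for frame in (happiness, economy):
    frame["country"] = frame["country"].str.strip().str.title()

# Joining: an outer join with indicator=True records which side each row came from,
# which makes rows that failed to match easy to spot.
merged = happiness.merge(economy, on="country", how="outer", indicator=True)
unmatched = merged.loc[merged["_merge"] != "both", "country"].tolist()

# Cleaning: keep only matched rows, then drop rows with a missing GDP value.
clean = (
    merged[merged["_merge"] == "both"]
    .drop(columns="_merge")
    .dropna(subset=["gdp_per_capita"])
    .reset_index(drop=True)
)
```

Dropping rows is only one way to handle missing data. Whichever approach you take, documenting why you chose it is part of what the Cleaning and Exploration areas reward.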
Here are two examples of good proposals:
Proposal 1: What makes countries happy?
1. What factors contribute the most to the happiness of countries?
2. What different groups of countries exist in terms of their happiness ratings?
3. How well does happiness correlate with economic success?
1. World happiness report data
2. Country demographic data
3. Country economic data
These questions are interesting because happiness is such a complex measurement and is driven by a large number of factors. In fact, the drivers of happiness likely vary across different countries, so this is not a simple question. As countries increasingly begin to focus on the happiness of their populations as an indicator of success, research is required to understand what drives happiness in order to inform government policy.
In order to answer these questions, I will join the happiness index dataset with the demographic dataset, to understand how demographic variables like population density affect happiness, and with the economic indicators dataset, to see how economic factors such as GDP and inequality drive happiness rates across the globe.
Proposal 2: How varied are diets across the US?
1. How does the obesity rate vary across states with diet?
2. How does physical activity correlate with diet?
1. CDC Obesity dataset
2. CDC Physical Activity dataset
3. CDC Fruits and Vegetables dataset
These questions qualify as interesting questions as they are open questions that are changing with time and have many driving factors. Additionally, it is unlikely that there is a simple, clear relationship between exercise, obesity and diet. It is logical that there is some relationship, but it is unlikely that that relationship is consistent everywhere. I expect to see some range of variance across the US states.
In order to answer these questions I will first manipulate the data to put it in an easy-to-use format, as the format available from the CDC website is a tabular format that will make joining or deeper data analysis challenging. I will then join all three datasets on the state column, perform my exploration, and hope to present the results using the plotly state choropleth plot, for which the documentation can be found here: https://plotly.com/python/choropleth-maps/.
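The reshaping step described in this proposal could look something like the sketch below. It assumes the downloaded data arrives in a "long" layout with one row per state and measure. That layout is an assumption for illustration, not a description of the actual CDC files, and the column names and values are invented.

```python
import pandas as pd

# Hypothetical long-format extract: one row per state and measure.
# Values are invented for illustration only.
long_df = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Utah", "Utah"],
    "measure": ["obesity_pct", "inactive_pct", "obesity_pct", "inactive_pct"],
    "value": [35.0, 27.0, 29.0, 18.0],
})

# Reshape to one row per state and one column per measure,
# so the result can be joined to other state-level tables on "state".
wide_df = (
    long_df.pivot_table(index="state", columns="measure", values="value", aggfunc="mean")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "measure" axis label
```

A table with one row per state is straightforward to join on the state column, and it is also the shape a state-level map generally expects.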
Hopefully these are useful illustrations of what an ideal project proposal should look like. The two main criteria you need to hit in order to have a successful project proposal are:
• Defining interesting questions
• Making sure you can join your datasets on a shared column and that those datasets will help you answer your questions
What makes an interesting question?
Interesting questions should be:
Good question: “What are the most important factors associated with education levels across US states?”
• This question is open-ended: it asks which other variables are associated with education levels. It is also interesting because the answer will help other people understand how they can build policies around your analysis.
Bad question: “What is the average education level of US states?”
• This question is just a simple statistical question that a single summary figure can answer.
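The difference between the two questions shows up clearly in code. The sketch below uses an invented table with hypothetical column names (`education_index`, `median_income`, `unemployment`); none of the values are real. The bad question reduces to a single call, while the good question needs a comparison across variables. Here that comparison is a simple correlation ranking, which is just one possible starting point and measures association only, not causation.

```python
import pandas as pd

# Invented values for illustration only; column names are hypothetical.
states = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "education_index": [1.0, 2.0, 3.0, 4.0, 5.0],
    "median_income": [2.0, 4.0, 6.0, 8.0, 10.0],
    "unemployment": [9.0, 7.0, 8.0, 4.0, 3.0],
})

# The "bad" question is answered by a single summary statistic.
average_education = states["education_index"].mean()

# The "good" question asks how other variables relate to the target.
# Pearson correlation of each numeric column with education_index:
correlations = (
    states.drop(columns="state")
    .corr()["education_index"]
    .drop("education_index")
)

# Rank factors by strength of association, ignoring direction.
ranked = correlations.abs().sort_values(ascending=False)
```

A ranking like this would normally lead to further questions, such as whether a relationship holds across regions or over time. That follow-on work is what makes the open-ended question worth a project.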
Provided datasets (remember, you don’t have to use any of these datasets, but they should point you in a good direction). If you would like to source your own datasets, we suggest searching for “[search term] dataset” using Google and trying to find CSV files that have numerical data:
OPTION 1: Global demographics and indicators
Brief: imagine that you will ultimately be presenting to a team of economists working for the United Nations. Their job is to create policy recommendations to countries to help them improve the welfare of their citizens. Ask questions that you think are going to help them understand how key indicators vary across different countries.
World happiness reports:
Countries of the world demographic data:
UN census data:
OPTION 2: State data
Brief: imagine that you are presenting to a team of policy makers for the US government. Their job is to create and propose policies that are put before the government in order to improve the lives of US citizens. Ask questions that you think are going to help them understand how key indicators vary across different states, so that they can create more effective policies.
US government open datasets:
US state COVID data:
US state demographic data from the census:
When submitting your proposal, please ensure you provide links to the datasets you will be using.