Data analytics
cmranchitrakar
Question
The purpose of the project is for you to apply what you learnt from at least 4 modules to your dataset and make some inferences or estimations. Here I am asking you to do only 4 tests or analyses. The key is that you bring the data and come up with the questions, and each question or set of analyses represents something you learnt from the Modules (1-5). There should be four different ones.
There are 3 options; you can choose any one of them (there are no restrictions on that):
- Bring your own data from work (you can remove any private or confidential information; for example, if you are bringing sales or cost data for an item, product, or service, the name can be masked)
- Use data from your previous work or a company you have access to (again, you can remove any private or confidential information)
- Use data from the public domain. In today’s world, there is no dearth of structured data. Here are some places where you can get data from:
Note: Remember that not all data are suitable for the project. You need a minimum number of data points (see below in requirements), and the data set cannot be random numbers. Proper citation of the data source is required.
- Any data source you have access to like the Hawkes Learning Resources
- Datasets (1) from Hawkes
- Datasets (2) from Hawkes - Look at the additional datasets, not the chapter datasets
- U.S. Bureau of Labor Statistics
- U.S. Government’s open data
- Centers for Medicare & Medicaid Services
- Kaggle datasets
- WHO Data repository
- World Bank Data
- Karami research lab's short link to public databases with data
- Google Public Data Explorer
- Amazing visualizations or graphics
But remember, we need the data to do the analysis. If you look at the bottom of any figure, Google provides the source name, and you can retrieve the data from there.
- Any sports data (from the appropriate website; getting data in a structured format for several years might be a challenge, but with a few minutes or an hour of effort you can do it)
- For example, cricket data could be obtained from espncricinfo.
- Module 1 : Normal Distribution (Percentile, distribution of means, and chance of occurrence if we assume normal distribution)
- Module 2 : Confidence Interval Estimation (Including Sample Size determination)
- Module 3 : Inferences from data (Hypothesis testing, i.e., confirming or checking if a claim made about
- Module 4 : More Inferences from data (Multiple samples)
- Module 5 : Regression analysis (Both simple and multiple, apart from basic ANOVA)
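As a rough illustration of what one analysis per module might look like, here is a minimal Python sketch using only the standard library. The function names and the `hours`/`scores` values are hypothetical and made up for this example; they are not part of the assignment. The interval and tests use a normal (z) approximation, which is an assumption that suits larger samples. For small samples a t-based method is the more usual choice.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical, made-up values; replace them with your own cited dataset.
hours = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0, 6.0, 2.5, 4.5, 5.0]
scores = [58.0, 66.0, 52.0, 70.0, 81.0, 63.0, 85.0, 60.0, 74.0, 78.0]


def normal_percentile(value, data):
    """Module 1: percentile of `value` under a normal model fitted to `data`."""
    return NormalDist(mean(data), stdev(data)).cdf(value) * 100


def mean_confidence_interval(data, confidence=0.95):
    """Module 2: z-based confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(data) / math.sqrt(len(data))
    return mean(data) - margin, mean(data) + margin


def required_sample_size(sigma, margin, confidence=0.95):
    """Module 2: smallest n so the z-based margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)


def one_sample_z_test(data, claimed_mean):
    """Module 3: two-sided z test of a claim about the population mean."""
    z = (mean(data) - claimed_mean) / (stdev(data) / math.sqrt(len(data)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def two_sample_z_test(a, b):
    """Module 4: two-sided z test for a difference between two sample means."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def simple_regression(x, y):
    """Module 5: least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    print("Percentile of a score of 75:", normal_percentile(75.0, scores))
    print("95% CI for mean score:", mean_confidence_interval(scores))
    print("z test against a claimed mean of 65:", one_sample_z_test(scores, 65.0))
    print("Slope and intercept:", simple_regression(hours, scores))
```

Each function corresponds to one module, so any four of them, applied to a question you pose about your own data, would line up with the "four different ones" requirement.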
No Answers Yet