STATISTICSS AND ANALYTICS (EXPERIMENTS 1-7)

EXPERIMENT -1

AIM: To prepare a close/open ended hand written questionnaire containing 25 questions.

PURPOSE: A questionnaire can be helpful for collecting data and analyse it.

STRUCTURE: 


 








EXPERIMENT -2

AIM: To transform a questionnaire into a Google form for a well-defined problem statement.

PURPOSE: A questionnaire in the form of Google forms helps to gather data by online survey.

STEPS INVOVLED:

 The reference questionnaire in experiment 1 is taken to be transformed into Google form. The following are the steps involved in creating the Google form.




• Click on Google apps icon




• Click on My Drive




• Click on New



Click on Google forms for a blank form




• Enter the title and form description

• Start entering the closed ended questions






EXPERIMENT -3

AIM: Send out a survey for a well-defined problem statement and collect the dataset in the spreadsheet (.csv file)

PURPOSE: Data in spreadsheet (.csv file) is very useful to analyze the dataset and interpretation.


STEPS INVOVLED

The reference questionnaire in experiment 1 is taken to be transformed into Google form and sent for online survey. The following are the steps involved in sending Google forms for online survey and download the responses in spreadsheet (.csv file).

• Gather the email ID’s of the respondents

• Click on the send, enter the recipient’s email ID, click on dialogue box and send.



Click on responses and download the .csv file






VIEW OF .CSV FILE












EXPERIMENT - 4

AIM: Remove unwanted observations from the dataset (spreadsheet) provided, including duplicate observations or irrelevant observations.

PURPOSE: Data cleaning is very important process in maintaining the quality of the dataset and conduct an efficient hypothesis testing.

STEPS INVOVLED:

The following are the steps involved in arranging the data in similar pattern.

➢ Using the TRIM function in Excel the data in a column can be arranged according to one’s priority. Below is an example where the TRIM function is used to fix the format for all the required cells in the spread sheet.

The data set shown below consists of the list of Hollywood movies in which the data is arranged erratically. The mistakes can be fixed using the TRIM function.



In the first movie name the text is arrange for proper text as shown above

Mistakes are fixed in the first movie and the auto format is used to format the rest




Now the data set is arranged according to the priority



This is the best practice to make the data look identical.

➢ In the example below the dataset consists of 50 passengers and their ages who travelled in the Titanic ship. Some cells in age column are empty. The logical test (IF) is used to fill the missing values in those cells.



A new age column is created and if logic is built.

If the value of the cell exists then the value is retained and if it is empty then a particular value of decision, say 28, is filled.

The code below executes this task.

IF(logical_test, [value_if_true], [value_if_false])




Auto format fill is used to fill this to all the cells in that column of new age. It is seen that the missing data is filled.





➢ DEDUPLICATION OF DATA
Duplicate values happen when the same value or set of values appear in the data.
Removal of such data is called Deduplication of data.
The following are the steps involved in removing duplicates in the data set.
In the data set given below contains the malnutrition in kids in different states of India, based on the three reasons is given. The data set contains duplicates which are highlighted. Now Duplicates are removed as shown below




• Select the entire data vertically and horizontally as well
• Click on DATA tab
• Then click on remove duplicates button




• On the window which appears on the screen, Click OK




• A window which appears shows the number of duplicates removed and the number of unique values remaining in the dataset. Click on OK button


• The dataset is now cleaned for duplicate data.







 FOR MORE CLARIFICATION :https://youtu.be/fJh1VRKFZA4










EXPERIMENT -5 

 In MS Excel Sread sheet draw the frequancy distribution table for the randomly generated data (Data set should contain minimum 50 data. 

Aim: 
Generate a random data of 50 students, who scored marks in any one subject (Subject-1) and find the frequency of the data generated. 

 • Steps involved: •

 1) Generate random data of 50 students, using RANDBETWEEN formula by taking the limits 1 to 100 [=RANDBETWEEN(1,100)] 

 • 2) copy the randomly generated values to subject-1 and in paste options choose values, as given below. (Otherwise, values change every time when you press enter



3) Now write class interval and upper limit as given below 






4) Select all the five cells below frequency heading, use frequency formula [=FREQUENCY] and select the 50 data generated and upper limit values, close the brackets and the press control shift enter to get Frequency.




OUTPUT






EXPERIMENT -6 
 In Microsoft Excel spread sheet draw the Relative frequency distribution table for the randomly generated data (data set should contain minimum 50 data). 

 Aim: 

Generate a random data of 50 students, who scored marks in any one subject (Subject-1) and find the relative frequency of the data generated. 
 
• Steps involved: •

 1) Generate random data of 50 students, using RANDBETWEEN formula by taking the limits 1 to 100 [=RANDBETWEEN(1,100)] 

 • 2) copy the randomly generated values to subject-1 and in paste options choose values, as given below. (Otherwise, values change every time when you press enter)  



3) Now write class interval and upper limit as given below 



4) Select all the five cells below frequency heading, use frequency formula [=FREQUENCY] and select the 50 data generated and upper limit values, close the brackets and the press control shift enter to get Frequency.



 

5) Find total frequency and insert another column for relative frequency and use formula frequency/Total frequency [=frequency value/50] and press enter. Now go to the corner of the cell. When black plus appear, click and drag it to the end and the total of relative frequency values should be “1”




  
OUTPUT












Experiment-07:

 To conduct survey on favourite fruit of 50 persons using excel spread sheet and to plot bar graph for the collected data and explain the graph in 30 words

 Aim: 

 Enter a random data of 50 persons of their favourite fruits (05 Fruits to be chosen in random) and plot the bar graph for the data generated,

• Choose 05 fruits such as 

 • Apple,

 • Mango,

 • Banana, 

• Papaya and

 • Guava

 and enter the fruits randomly to 50 persons as given below 




 Now write Fruits names and Frequency (Count of fruits) in a table as given below





Now use COUNTIF formula to find the count of each fruits by using [=COUNTIF(range, criteria)] and select the data range and enter the fruit name in double quotes and press enter to get the count of that fruit and repeat the same steps to find the count of other fruits as given and we get the result as below. 






Now select the fruits and Count of fruits, go to insert and select column chart and press enter to get the bar graph and it can be modified using chart elements.






Comments