PRACTICAL FILE
Submitted for partial fulfillment for the award of
the Degree of
BACHELOR OF COMMERCE
Submitted by
NAME : SOUMYA VERMA
ENROLLMENT NO. : 09317788817
Other tools
Transpose table
Text to Column
Conditional Formatting – Highlight Cell rules (greater than, less than,
between, equal to, text that contains, a date occurring, duplicate
values)
Conditional Formatting – Top/ Bottom rules
Conditional Formatting – Data Bars
Conditional Formatting – Color Scales
Format as Tables
Format Cells – Number, Alignment, Font, Border, Fill
Cell Styles
Data validation – settings (any value, number, custom)
Data validation – input message
Data validation – error alert
Customization – ribbon
Customization – quick access toolbar
Backstage view
Save as Adobe PDF
Data Visualization and Analysis
Frequency
Relative frequency
Percentage frequency
Bar Graph
Histogram – Pareto (sorted diagram)
Histogram – Chart output
Histogram – Cumulative percentage
Pivot Table and its tools
Histogram frequency distribution
Pivot Chart and its tools
Descriptive statistics
Descriptive statistics for various scales
Correlation
Hypothesis Testing
One sample t test using dummy (one-tailed)
One sample t test using dummy (two-tailed)
One sample t test using test average (one-tailed)
One sample t test using test average (two-tailed)
t test using function (all combinations)
Two sample - Independent sample t test
Two sample - Paired Sample t test
One sample z test
Two sample z test
ANOVA – Single Factor
ANOVA – Two Factor without replication
ANOVA – Two Factor with replication
F test
Chi square test
Introduction to R
Four Panes in R
Import of Data Sheet in Excel
Descriptive statistics
Correlation
Hypothesis Testing
One sample t test
Two sample - Independent sample t test
Two sample - Paired Sample t test
One way ANOVA
F test
Chi square test
RESEARCH METHODOLOGY
Meaning of research:
Objectives Of Research:
MICROSOFT EXCEL
Microsoft Excel is a spreadsheet developed
by Microsoft for Windows, macOS, Android and iOS. It features calculation,
graphing tools, pivot tables, and a macro programming language called Visual
Basic for Applications. It has been a very widely applied spreadsheet for these
platforms, especially since version 5 in 1993, and it has replaced Lotus 1-2-3 as
the industry standard for spreadsheets. Excel forms part of the Microsoft
Office suite of software.
Microsoft Excel plays a major role in preparing research reports. Advanced
features such as mathematical, financial and statistical formulas, pivot tables
and charts, data validation, what-if analysis and histograms help present the
data collected for a research study clearly and in less time. Moreover, Excel
also supports hypothesis testing through the Data Analysis ToolPak.
BASIC EXCEL FUNCTIONS
1. COUNT FUNCTION
This will count all cells in your selected range that contain numbers.
2. COUNTA FUNCTION
This will count all cells that are NOT blank in your selected range. That means
it includes error values like #VALUE!, numbers, and cells that contain only
spaces. A cell holding a space is not a blank cell: if you enter a space in a
cell, COUNTA counts that cell.
3. COUNTBLANK FUNCTION
You’ll notice that the syntax is ‘range’ and there’s only one of them.
This is because unlike COUNT and COUNTA, the COUNTBLANK
function cannot handle non-contiguous ranges.
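The difference between the three counting functions can be sketched in Python. This is only an illustration, not how Excel works internally: the list `cells` is made-up data, and `None` is assumed to stand for a truly blank cell.

```python
# Hypothetical cell values; None stands for a truly blank cell.
cells = [10, 2.5, "text", " ", "#VALUE!", None, None, 7]

def count(values):
    """Like COUNT: only cells that hold numbers."""
    return sum(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)

def counta(values):
    """Like COUNTA: every non-blank cell (numbers, text, spaces, error values)."""
    return sum(v is not None for v in values)

def countblank(values):
    """Like COUNTBLANK: blank cells only."""
    return sum(v is None for v in values)

print(count(cells), counta(cells), countblank(cells))  # 3 6 2
```

Note that the cell containing a single space is counted by `counta` but not by `countblank`, which matches the COUNTA behaviour described above.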
4. IF FUNCTION
Here, the criterion was to pay only those workers who took fewer than 4
leaves; the formula returns ‘yes’ if the worker is to be paid and ‘no’ if
not.
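The same yes/no rule can be sketched in Python. The worker names and leave counts below are hypothetical, and `pay_decision` is an illustrative helper that mirrors a formula of the form =IF(leaves<4,"yes","no").

```python
# Hypothetical leave records; names and numbers are illustrative only.
leaves_taken = {"Worker A": 2, "Worker B": 4, "Worker C": 5, "Worker D": 0}

def pay_decision(leaves, limit=4):
    """Return 'yes' when fewer than `limit` leaves were taken, else 'no'."""
    return "yes" if leaves < limit else "no"

decisions = {name: pay_decision(n) for name, n in leaves_taken.items()}
print(decisions)
```

A worker with exactly 4 leaves gets ‘no’, because the condition is strictly "less than 4".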
5. CONCATENATE
This function combines the text entered in 2 or more cells into the active
cell using the following syntax:
=CONCATENATE(text1,text2,"abc")
6. SUM
The SUM function is the first must-know formula in Excel. It usually
aggregates values from a selection of columns or rows from your selected
range.
=SUM(number1, [number2], …)
7. AVERAGE
The AVERAGE function should remind you of simple averages of data such as
the average number of shareholders in a given shareholding pool.
=AVERAGE(number1, [number2], …)
8. MIN
The MIN function returns the smallest number among the given values.
=MIN(number1, [number2], …)
9. SUMIF
The SUMIF function adds all numbers in a range of cells that meet one criterion.
10. COUNTIF
COUNTIF counts the cells that meet a single criterion. To count cells based on
multiple criteria, use the COUNTIFS function.
Here, the criterion used is to count the number of students who scored more than
90 marks in each of the subjects.
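A rough Python equivalent of SUMIF and COUNTIFS is sketched below. The marks table is invented for illustration (the original worksheet is not reproduced here), and the subject names are assumptions.

```python
# Hypothetical marks; the subject names and scores are illustrative only.
marks = [
    {"name": "S1", "maths": 95, "science": 92, "english": 91},
    {"name": "S2", "maths": 88, "science": 97, "english": 93},
    {"name": "S3", "maths": 91, "science": 94, "english": 99},
    {"name": "S4", "maths": 90, "science": 95, "english": 96},
]
subjects = ["maths", "science", "english"]

# SUMIF-style: add only the maths marks that are above 90 (one criterion).
maths_total_above_90 = sum(row["maths"] for row in marks if row["maths"] > 90)

# COUNTIFS-style: count students who scored above 90 in every subject.
students_above_90_in_all = sum(all(row[s] > 90 for s in subjects) for row in marks)

print(maths_total_above_90, students_above_90_in_all)  # 186 2
```

Student S4 is not counted because a score of exactly 90 does not satisfy ">90".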
HLOOK-UP AND VLOOK-UP FUNCTION
When the VLOOKUP function is called, Excel searches for a lookup value in
the leftmost column of a section of your spreadsheet called the table array. The
function returns another value in the same row, defined by the column index
number.
HLOOKUP is similar to VLOOKUP, but it searches a row instead of a column,
and the result is offset by a row index number. The V in VLOOKUP stands
for vertical search (in a single column), while the H in HLOOKUP stands
for horizontal search (within a single row).
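The two lookups can be sketched in Python as follows. This is a simplified exact-match version only; the item table is hypothetical, and `vlookup` and `hlookup` are illustrative helpers, not Excel's own implementation.

```python
# Hypothetical table array: first row is the header, first column is the key.
table = [
    ["Item", "Cost", "Colour"],
    ["Pen", 10, "Blue"],
    ["Bag", 450, "Black"],
    ["Cap", 120, "Red"],
]

def vlookup(value, table_array, col_index):
    """Search the leftmost column; return the cell in the same row (1-based column)."""
    for row in table_array:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"

def hlookup(value, table_array, row_index):
    """Search the top row; return the cell in the same column (1-based row)."""
    for col, header in enumerate(table_array[0]):
        if header == value:
            return table_array[row_index - 1][col]
    return "#N/A"

print(vlookup("Bag", table, 2))     # 450
print(hlookup("Colour", table, 4))  # Red
```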
VLOOKUP FUNCTION
E.g. calculate the charge for each state on the basis of miles:
DROPDOWN FUNCTION
The drop-down list is created with the Data Validation feature, and provides the
user with the list of choices based on the item table. Once the user selects an
item from the drop-down, Excel formulas populate the price and description
columns with the VLOOKUP function.
E.g. create a drop-down list and then look up the cost and colour:
SOLUTION:
HLOOK UP FUNCTION
STEPS
Select the range of data you want to rearrange, including any row or column
labels, and press Ctrl+C.
Choose a new location in the workbook where you want to paste the transposed
table, then right-click and select Transpose under Paste Special.
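What transposing does to a table can be sketched in Python: rows become columns and columns become rows. The small table below is made-up data.

```python
# Hypothetical table with a header row; values are illustrative only.
table = [
    ["Name", "Maths", "Science"],
    ["A", 80, 75],
    ["B", 65, 90],
]

# zip(*table) groups the first cells of every row, then the second cells, and so on.
transposed = [list(column) for column in zip(*table)]

for row in transposed:
    print(row)
```

The original 3 x 3 table stays unchanged; `transposed` holds the rearranged copy, just as Paste Special leaves the source range in place.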
CONDITIONAL FORMATTING FUNCTION
(A) MEANING:
(B) STEPS:
(C) EXAMPLE:
Create a table, then convert it back into a Range. On the worksheet, select a range
of cells that you want to format by applying a predefined table style.
STEPS
Select the cells you want to format as a table. From the Home Tab, click the
Format as Table command.
Select a Table style from the drop-down menu. A dialog box will appear,
confirming the selected cell range for the table.
Click OK.
FORMAT cells in Excel change the appearance of a number without changing the
number itself.
STEPS
Select the cells you want to format. Right-click the selection and choose Format
Cells (or press Ctrl+1). In the Format Cells dialog box, make the required
customisations.
RESULT.
CELL STYLES
Excel has CELL styles which make it more efficient to style your Excel worksheet.
STEPS
Select the cells which you want to style. On the Home tab, click on Cell Styles.
DATA VALIDATION is a feature in Excel used to control what a user can enter.
STEPS
Select the cells you want to create the rule for. Select Data Validation under the
Data tab. Select the List option under Allow.
DATA VALIDATION (INPUT MESSAGE)
DATA VALIDATION is a feature in Excel used to control what a user can enter.
STEPS
Select the cells you want to create the rule for. Select Data Validation under the
Data tab.
Enter the Input Message that tells the user what data to enter.
DATA VALIDATION (ERROR ALERT)
DATA VALIDATION is a feature in Excel used to control what a user can enter.
STEPS
Select the cells you want to create the rule for. Select Data Validation under the
Data tab.
Enter the Error Alert message that will be shown when invalid data is entered.
RESULT.
CUSTOMIZATION (RIBBON)
STEPS
Right-Click the Ribbon and select Customize the Ribbon from the drop-down
menu.
The Excel Options dialog box will appear. Locate and select New Tab. Make sure
the New Group is selected, select a command, then click Add. You can also drag
commands directly into a group.
When you are done adding commands, click OK. The commands will be added to
the Ribbon.
CUSTOMIZATION (QUICK ACCESS TOOLBAR)
STEPS
Right-Click the Ribbon and select Customize Quick Access Toolbar from the
drop-down menu.
In the Choose commands from list, click Commands Not in the Ribbon.
Find the command in the list, and then click Add.
BACKSTAGE VIEW
Backstage view is an option that allows you to manipulate aspects of a file. The
backstage view gives access to saving, opening, info about the open file, creating a
new file, printing, and recently opened files.
First column of the backstage view will have the following options −
1 Save
2 Save As
A dialog box will be displayed asking for the file name and file
type. By default, it will save in Excel 2010 format with the extension
.xlsx.
3 Open
4 Close
5 Info
7 New
8 Print
9 Save & Send
This option saves an opened sheet and displays options to send the
sheet by email etc.
10 Help
You can use this option to get the required help about Excel 2010.
11 Options
12 Exit
STEPS
Open your Excel workbook and select ranges or tables you want to convert to a
PDF file.
In Excel click File>Save As. In the Save As dialog window, select PDF from the
“Save As type” drop-down list.
1.FREQUENCY
The FREQUENCY function in Excel calculates how often values occur within the
ranges you specify in a bin table.
STEPS
STEPS
3. PERCENTAGE FREQUENCY
The Percentage Frequency is found by multiplying each relative frequency value
by 100.
STEPS
RESULT.
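Frequency, relative frequency and percentage frequency can be sketched together in Python. The scores and bin limits below are hypothetical; `frequency` is an illustrative helper that imitates how the FREQUENCY function counts values up to each bin's upper limit, with one extra count for values above the last bin (bins are assumed to be sorted in ascending order).

```python
# Hypothetical data and bin upper limits; values are illustrative only.
scores = [45, 54, 65, 66, 76, 78, 87, 88, 98, 76]
bins = [50, 70, 90]

def frequency(data, bin_limits):
    """Count values per bin: x <= limit goes into the first bin it fits."""
    counts = [0] * (len(bin_limits) + 1)
    for x in data:
        for i, upper in enumerate(bin_limits):
            if x <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every bin limit
    return counts

counts = frequency(scores, bins)
n = len(scores)
relative = [c / n for c in counts]          # relative frequency
percentage = [100 * c / n for c in counts]  # percentage frequency

print(counts)      # [1, 3, 5, 1]
print(percentage)  # [10.0, 30.0, 50.0, 10.0]
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick check on the worksheet.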
4. BAR GRAPH:
STEPS
Select the range of cells which are to be represented under bar graph.
Go to Insert>Bar Chart.
5. HISTOGRAM USING GRAPH TAB
STEPS
Select the range of cells which are to be represented in the histogram.
Go to Data > Data Analysis and choose Histogram.
A Pivot Table allows you to extract the significance from a large, detailed data set.
STEPS
STEPS
Click any cell inside the Pivot Table. On the Insert tab, in the Charts group, click
on Pivot Chart.
Select the desired format of Pivot Chart and click OK.
8. DESCRIPTIVE STATISTICS
Descriptive statistics are one of the fundamental “must knows” with any set of
data.
STEPS
Group A   Group B
76        95
87        97
98        87
45        89
66        87
78        45
76        76
88        56
78        76
87        87
54        45
65        76
76        45
89        88
65        76
78        66
54        78
87        56
45        77
Descriptive statistics are one of the fundamental “must knows” with any set of
data.
STEPS
Enter the data for which descriptive statistics data is to be obtained.
Group A   Group B
76        95
87        97
98        87
45        89
66        87
78        45
76        76
88        56
78        76
87        87
54        45
65        76
76        45
89        88
65        76
78        66
54        78
87        56
45        77
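A few of the measures reported by the Descriptive Statistics tool can be cross-checked in Python with the standard `statistics` module. The sketch below uses the Group A and Group B values listed above; `describe` is an illustrative helper and covers only some of the measures, not the full Excel output.

```python
import statistics

group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87, 54, 65, 76, 89, 65, 78, 54, 87, 45]
group_b = [95, 97, 87, 89, 87, 45, 76, 56, 76, 87, 45, 76, 45, 88, 76, 66, 78, 56, 77]

def describe(values):
    """Return a subset of the usual descriptive measures for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_sd": statistics.stdev(values),  # n - 1 denominator (sample SD)
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }

for name, values in (("Group A", group_a), ("Group B", group_b)):
    print(name, describe(values))
```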
9. CORRELATION
A Correlation coefficient (a value between -1 and +1) tells how strongly two
variables are related to each other.
STEPS
RESULT.
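The correlation coefficient can also be computed by hand from its definition. The Python sketch below assumes the Group A and Group B values from the descriptive statistics example are the two variables; `pearson_r` is an illustrative helper, not the Excel function itself.

```python
import math

group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87, 54, 65, 76, 89, 65, 78, 54, 87, 45]
group_b = [95, 97, 87, 89, 87, 45, 76, 56, 76, 87, 45, 76, 45, 88, 76, 66, 78, 56, 77]

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

r = pearson_r(group_a, group_b)
print(round(r, 4))
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.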
Hypothesis Testing
ONE SAMPLE T-TEST USING DUMMY (One-tailed)
Problem Statement:
Hypothesis:
H0: µ<=40
H1: µ>40
Age   Dummy
18    0
24    0
56
78
67
24
65
89
76
23
45
65
78
55
32
33
44
STEPS
Go to Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances
t-Test: Two-Sample Assuming Equal Variances

                               Age        Dummy
Mean                           51.29412   0
Variance                       519.7206   0
Observations                   17         2
Pooled Variance                489.1488
Hypothesized Mean Difference   40
df                             17
t Stat                         0.683116
P(T<=t) one-tail               0.251869
t Critical one-tail            1.739607
P(T<=t) two-tail               0.503738
t Critical two-tail            2.109816
Decision Rule:
Here,
t Stat value = 0.683, which is less than the one-tail t critical value = 1.740; therefore the null
hypothesis is accepted.
Here the one-tail p value is 0.25, which is greater than 0.05, so the null hypothesis is accepted.
Inference:
The Null Hypothesis is accepted. Therefore, there is not enough evidence that the population mean
age is greater than 40.
TWO SAMPLE – PAIRED MEANS (One-tailed)
Problem Statement:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater
after Chocolate Milk than after the Carbo Replacement drink? Use a significance
level of α = 0.05.
Hypothesis:
H0: µ1 - µ2 <= 0
H1: µ1 - µ2 > 0
Cyclist   Chocolate Milk   Carbo Replacement
1         50.46            42.9
2         47.08            50.1
3         57.51            41.67
4         46.6             32.69
5         29.1             46.33
6         57.5             31.63
7         23.87            20.61
8         28.65            14.99
9         35.37            20.11
Steps:
Go to Data > Data Analysis > t-Test: Paired Two Sample for Means
t-Test: Paired Two Sample for Means

                               Chocolate Milk   Carbo Replacement
Mean                           41.79333333      33.44777778
Variance                       164.53125        160.9338194
Observations                   9                9
Pearson Correlation            0.508406248
Hypothesized Mean Difference   0
df                             8
t Stat                         1.979280834
P(T<=t) one-tail               0.0415706
t Critical one-tail            1.859548038
P(T<=t) two-tail               0.083141199
t Critical two-tail            2.306004135
Decision Rule:
Here,
t Stat value = 1.98, which is greater than the one-tail t critical value = 1.86; therefore the null
hypothesis is rejected.
Here the one-tail p value is 0.04, which is less than 0.05, so the null hypothesis is rejected.
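The paired t statistic reported by Excel can be reproduced by hand: take the difference for each cyclist, then divide the mean difference by its standard error. The Python sketch below uses the cyclist data from this example; the critical value is copied from the Excel output rather than computed, since the standard library has no t distribution.

```python
import math
import statistics

chocolate_milk = [50.46, 47.08, 57.51, 46.6, 29.1, 57.5, 23.87, 28.65, 35.37]
carbo_replacement = [42.9, 50.1, 41.67, 32.69, 46.33, 31.63, 20.61, 14.99, 20.11]

# Paired test: work with the per-cyclist differences.
differences = [a - b for a, b in zip(chocolate_milk, carbo_replacement)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)       # sample SD of the differences
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

t_critical_one_tail = 1.859548038  # copied from the Excel output, not computed here
reject_h0 = t_stat > t_critical_one_tail

print(round(t_stat, 3), df, reject_h0)  # 1.979 8 True
```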
Problem Statement:
The following are the weights of 8 patients, recorded to test the effectiveness of a
diet. The weights are given before and after the diet. You are required to
determine whether the diet was effective or not.
Hypothesis:
H0: Loss = 0 (the average weight loss is 0)
Before After
162 168
170 136
184 147
164 159
172 143
176 161
159 143
170 145
Steps:
Go to Data > Data Analysis > t-Test: Paired Two Sample for Means
t-Test: Paired Two Sample for Means

                               Before         After
Mean                           169.625        150.25
Variance                       65.125         121.9285714
Observations                   8              8
Pearson Correlation            -0.176747772
Hypothesized Mean Difference   0
df                             7
t Stat                         3.706873373
P(T<=t) one-tail               0.003792994
t Critical one-tail            1.894578605
P(T<=t) two-tail               0.007585988
t Critical two-tail            2.364624252
Decision Rule:
Here,
t Stat value = 3.707, which is greater than the two-tail t critical value = 2.364; therefore the
alternate hypothesis [H1] is accepted.
Here the two-tail p value is 0.007, which is less than 0.05, so the alternate hypothesis [H1]
is accepted.
Inference:
There is enough evidence that, on average, the diet was effective.
Two Factor Anova: Without replication
Problem Statement:
To test whether or not the marks of students differ with respect to both student and subject.
Hypothesis:
H0: (Column Wise): there is no significant difference for the 3 subjects, i.e., Economics, History and
Science.
H1: (Column Wise): there is a significant difference for the 3 subjects, i.e., Economics, History and
Science.
Steps:
Row Wise:
Here p value is 0.86 which is greater than 0.05, so null hypothesis is accepted
Column Wise:
Here p value is 0.01 which is less than 0.05, so null hypothesis is rejected
Inference:
Row: There is enough evidence that marks of students do not differ significantly
Problem Statement:
Given are the ages of 35 employees. You are required to determine whether or
not the population mean age differs significantly from 23 years. Assume a population
standard deviation of “5” and alpha of “10%”.
Hypothesis:
H0: µ=23
H1: µ≠23
α=10%
AGE DUMMY
25 0
21 0
21 0
20 0
30
22
20
20
23
18
21
23
21
20
21
22
24
24
19
23
22
24
21
19
24
22
19
22
25
Steps:
AGE DUMMY
Mean 21.94285714 0
Known Variance 25 0.0001
Observations 35 4
Hypothesized Mean Difference 23
z -1.250806408
P(Z<=z) one-tail 0.105502558
z Critical one-tail 2.326347874
P(Z<=z) two-tail 0.211005116
z Critical two-tail 2.575829304
Decision Rule:
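If the absolute value of z stat is > z critical, reject the null hypothesis
If the p value is ≤ α, reject the null hypothesis
Here, the absolute value of z stat is 1.25 and the two-tail z critical in the output is 2.57 (the value
for a 0.01 level), which is > 1.25; therefore accept the null hypothesis
Here, the two-tail p value is 0.21, which is greater than α = 0.10 (and also greater than 0.01);
therefore accept the null hypothesis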
Inference:
There is not sufficient evidence that the population mean age differs from 23 years
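The z statistic can be checked directly from the summary figures, without the dummy column. The Python sketch below is a plain one-sample z test under the stated assumptions (sample mean 21.94285714, n = 35, population standard deviation 5, α = 10%); because it omits the dummy column's tiny variance, its result may differ from the Excel output in the later decimal places. `normal_cdf` is an illustrative helper built on `math.erf`.

```python
import math

sample_mean = 21.94285714  # from the Excel output
sigma = 5.0                # assumed population standard deviation
n = 35
mu0 = 23.0                 # hypothesised population mean
alpha = 0.10

z = (sample_mean - mu0) / (sigma / math.sqrt(n))

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_two_tail = 2.0 * (1.0 - normal_cdf(abs(z)))
reject_h0 = p_two_tail <= alpha

print(round(z, 2), round(p_two_tail, 2), reject_h0)  # -1.25 0.21 False
```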
F-TEST
Problem Statement:
Determine whether or not there is a significant difference between the variances of the 2 data sets.
Group 1   Group 2
150       125
175       165
160       130
130       155
160       170
145       150
Steps:
i. Since the variance of Group 1 is less than the variance of Group 2, we will swap the ranges so that
the larger variance is in the numerator.
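The variance comparison behind this step can be sketched in Python using the two groups from this example. The sketch only computes the sample variances and the F ratio (larger variance on top); the p-value and critical value still come from the Excel output, since the standard library has no F distribution.

```python
import statistics

group_1 = [150, 175, 160, 130, 160, 145]
group_2 = [125, 165, 130, 155, 170, 150]

var_1 = statistics.variance(group_1)  # sample variance, n - 1 denominator
var_2 = statistics.variance(group_2)

# Put the larger variance in the numerator so that F >= 1.
if var_1 >= var_2:
    f_stat, df_num, df_den = var_1 / var_2, len(group_1) - 1, len(group_2) - 1
else:
    f_stat, df_num, df_den = var_2 / var_1, len(group_2) - 1, len(group_1) - 1

print(round(var_1, 2), round(var_2, 2), round(f_stat, 3), df_num, df_den)
# 236.67 334.17 1.412 5 5
```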
INFERENCE:
CONCLUSION:
CHI TEST
STEPS:
1. Calculate the row totals of rows 2, 3, 4 and 5. The row totals are 213, 164, 147 and 180.
2. Calculate the column totals of columns B, C and D. The column totals are 225, 233 and 246.
3. Calculate the table total, which equals both the sum of the row totals and the sum of the
column totals: 704.
4. Calculate the expected value for each observed value, where the expected value is
equal to (row total × column total) / table total.
Compare the chi-square statistic with the tabulated value of chi-square at (r-1)(c-1) degrees of
freedom; reject the null hypothesis if the statistic is greater than the tabulated value.
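The expected values in step 4 follow from the row and column totals alone, so they can be sketched in Python from the totals quoted in the steps. The observed cell values are not reproduced in this section, so `chi_square_statistic` is only defined as an illustrative helper and is not applied to any data here.

```python
row_totals = [213, 164, 147, 180]
col_totals = [225, 233, 246]

grand_total = sum(row_totals)
assert grand_total == sum(col_totals)  # both must equal the table total

# Expected value for each cell = row total * column total / table total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

def chi_square_statistic(observed, expected_values):
    """Sum of (O - E)^2 / E over all cells; `observed` must match the table shape."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected_values)
        for o, e in zip(obs_row, exp_row)
    )

print(grand_total, df)            # 704 6
print(round(expected[0][0], 2))   # 68.08
```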
FOUR PANES IN R
1. TOP RIGHT:
Environment and history window. The environment window contains objects (data,
values, functions) R has currently stored in its memory. The history window shows
all commands that were executed in the Console.
2. TOP LEFT:
Text editor or script window. This is where you can save and edit collections of
commands.
3. BOTTOM RIGHT:
Files, plots, packages, help, and viewer pane. Here you can open files, view plots,
install and load packages, read man pages, and view markdown and other
documents in the viewer tab.
4. BOTTOM LEFT:
Console or command window. Here you can type any valid R command after the
prompt followed by Enter and R will execute that command.
IMPORT OF DATA SHEET IN EXCEL:
Importing data into R is a necessary step that, at times, can become time intensive.
To ease this task, RStudio includes new features to import data from: csv, xls, xlsx,
sav, dta, por, SAS, SPSS and Stata files.
Steps:
Descriptive Statistics are used to describe the basic features of the data in a study.
Question:
Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150
STEPS:
>summary(all_tests_data_only$'Group1')
CORRELATION
Correlation is a statistical technique that can show whether and how strongly
pairs of variables are related.
Question:
Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77
STEPS
>cor.test(all_tests_data_only$'GroupA',all_tests_data_only$'GroupB')
F-Test is used to assess whether the variances of two populations (A and B) are
equal.
Question:
Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150
Let the variance of Group 1 be σ1² and the variance of Group 2 be σ2².
Therefore,
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
INTERPRETATION:
Here, p value i.e., 0.7142 > alpha i.e., 0.05, thus accept null hypothesis.
INFERENCE:
Question:
Step 1:After importing the Excel Sheet apply the below mentioned function:
>chisq.test(table(chi_square$month,chi_square$observed))