The correlation is a statistical measure, which indicates the extent to which two or more variables are fluctuating together. If one variable goes up, does the other variable also goes up or down with that?

Let’s start with a simple case where we have one dependent and one independent variable, and that is represented by Y = f (X).

- Y is dependent on X and
- X is the independent variable

The independent variable is also known as input or controllable variable. This is something which you can control, and you want to see the effect of that variable on the dependent variable.

To understand this let’s take a simple example of an exam. To score good marks in an exam you to spend more time. We want to test whether that is the reality or not by collecting some data. For that, you talk to a number of students and ask, how many of hours they studied for an exam and what is the percentage they achieved in that specific exam. With this data, you try to find out whether there is any relationship between these two variables are not.

Here in this example marks obtained is Y (dependent variable), as marks depend on hours of study. Hours of study is X (independent variable). You want to see the effect of numbers of hours studied on the marks obtained.

Here let’s realize that that marks obtained are not just dependent on the number of hours of study. Marks will depend on many other factors as well e.g. books this student has used, type of teacher this person had etc. But to keep this thing simple, we have taken one defendant (marks) and one independent (hours studied) variable here.

When you want to see the effect of one variable on another variable, you would need to take these three steps.

- The first thing you might want to do is plot a scatter diagram. Once you draw a scatter diagram that can give a good visual impression of how these two variables are connected.
- Once you see that there is some relationship between these two variables, then you can calculate the correlation coefficient.
- Then as a next step, you might want to formulate a regression equation, which will help us in finding the dependent variable by inputting the independent variable value.