Understanding standard deviation: Exploring the formula and its applications in data analysis

Written May 9, 2025, by Jeremy Moser

Haley, an HR professional, is tasked with analyzing employee satisfaction survey results from hundreds of staff members. The data is overwhelming, and she needs to present insights that are clear, actionable, and easy to digest. 

But how can she make sense of all the numbers without getting lost in the noise? This is where the concept of standard deviation becomes invaluable. Data analysts like Haley face the challenge of distilling vast volumes of data into meaningful insights. They need concise, easy-to-compare measures that effectively represent the core characteristics of large datasets.

Standard deviation is one of the fundamental tools in a data analyst’s arsenal. In this article, we’ll explore how understanding and applying the standard deviation formula can help you unlock deeper insights and make more informed decisions.

What is the standard deviation?

Standard deviation (SD) is defined as the square root of a dataset’s variance. It’s an essential measure in descriptive statistics that shows how spread out the individual data points are from that dataset’s mean (average) value. 

In simpler terms, it quantifies how far a typical data point lies from the center of the distribution.

Let’s return to Haley, this time as an HR consultant who has just collected responses from a large training needs assessment. She has a lot of data, but she needs a way to figure out how “spread out” the responses are. That’s where standard deviation comes in.

In simple terms, it tells her how much each individual response (like a rating on training topics) differs from the average or “center” of the data. If the standard deviation is small, it means most responses are pretty similar and close to the average. But if it’s large, the responses vary a lot, and there’s more diversity in opinions or needs.

For example, if most employees think the same training programs are necessary, the standard deviation will be small. If some employees think certain skills are essential while others feel they need entirely different training, the standard deviation will be larger.

So, as an HR consultant, understanding standard deviation helps Haley see how consistent or varied her employees’ training needs are, making it easier to design a program tailored to the group.

What does standard deviation measure?

Standard deviation is a measure of variability or dispersion within a quantitative dataset.

Sample and population variances also measure this spread, but they are expressed in squared units. The standard deviation is expressed in the same units as the original data, making it far easier to interpret.

  • A high SD indicates data points are widely scattered around the mean, suggesting greater diversity or volatility in the dataset.

    Let’s stick with Haley, the HR consultant, and her training needs assessment. Imagine she collects survey data on the employees’ preferences for specific types of training programs, such as leadership, communication, and technical skills.

    After calculating the average rating for each training topic, Haley finds that the standard deviation for leadership training is high. This means that while some employees rated leadership training as crucial, others felt it wasn’t necessary at all. There’s a significant spread in how people view the importance of leadership training, indicating greater diversity or volatility in the responses.

  • A low SD indicates data points are clustered closely around the mean, suggesting a more uniform and predictable dataset.

    Returning to Haley, the HR consultant, let’s say she gathered responses from employees on their training needs. After analyzing the data, she finds that the standard deviation for technical skills training is low. This indicates that most employees rated technical training programs similarly, with responses clustering around the average score.

    For Haley, this low standard deviation means she can confidently focus on offering a standard technical training program, knowing that most employees will benefit from the same content. There’s less need for customization, as the data shows a shared understanding and agreement about what training is most valuable.

By understanding standard deviation, we gain valuable insights into the consistency, predictability, and risk associated with the data we’re analyzing.
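As a rough illustration of the high-versus-low contrast above, the sketch below compares two sets of ratings that share the same mean but differ in spread. The ratings are hypothetical values invented for this example (they are not Haley’s survey data), and Python’s standard `statistics` module does the arithmetic.

```python
from statistics import mean, stdev

# Hypothetical 1-5 importance ratings, invented for illustration only.
leadership_ratings = [1, 5, 1, 5, 2, 5, 5, 4]  # opinions are split
technical_ratings = [3, 4, 3, 4, 4, 3, 4, 3]   # opinions mostly agree

leadership_sd = stdev(leadership_ratings)  # sample standard deviation
technical_sd = stdev(technical_ratings)

print(f"Leadership: mean={mean(leadership_ratings):.2f}, SD={leadership_sd:.2f}")
print(f"Technical:  mean={mean(technical_ratings):.2f}, SD={technical_sd:.2f}")
```

Both lists average 3.5, so the mean alone cannot tell them apart. Only the standard deviation shows that the leadership ratings are far more divided.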

Sample vs. population standard deviation in data analysis (S vs. σ)

  • When you have complete information about every individual in a group or dataset, you can calculate the population standard deviation, denoted by σ (the Greek letter sigma).

  • When you’re working with a subset of the population and want to estimate the standard deviation of the entire population, you use the sample standard deviation, denoted by S.

People often use S and σ interchangeably, but they’re not the same. When someone doesn’t specify which SD they mean, they usually mean S, even if they write the symbol σ.
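To make the difference concrete, here is a minimal sketch using Python’s standard `statistics` module, where `pstdev` computes the population SD and `stdev` computes the sample SD. The dataset is hypothetical and chosen only so the numbers come out cleanly.

```python
from statistics import pstdev, stdev

# Hypothetical dataset, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # population SD: divides by N
s = stdev(values)       # sample SD: divides by n - 1

print(f"population SD (sigma) = {sigma:.3f}")
print(f"sample SD (S)         = {s:.3f}")
```

On the same data, S is always at least as large as σ because it divides by a smaller number. The gap shrinks as the dataset grows.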

Standard deviation equations

There are two standard deviation equations, one for populations and one for samples. Each can be written in two ways, depending on whether you expand and simplify the formula.

Population standard deviation formula (σ)

The most straightforward way to calculate standard deviation is with its mathematical definition as the square root of the population variance:

σ = √[ Σ(xi – μ)² / N ]

Equation 1: Population standard deviation formula

Where:

  • σ is the population standard deviation
  • Σ denotes the sum of…
  • xi is each data point
  • μ is the population mean
  • N is the total number of data points

Expanded population SD formula

We can expand and simplify the above formula to get a second, more computationally efficient way to calculate σ:

σ = √{ [Σxi² – (Σxi)² / N] / N }

Equation 2: Population standard deviation expanded formula

The variables are the same as above. The most notable difference is that we don’t need the population’s mean value (μ).
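The two population formulas can be sketched in Python as follows. The function names and the dataset are assumptions made for this illustration. The first function follows the definition (square root of the mean squared deviation from μ), and the second uses only the sum of the values and the sum of their squares.

```python
import math

def population_sd_definition(values):
    """Equation 1: square root of the mean squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def population_sd_expanded(values):
    """Equation 2: needs only the sum of values and the sum of squared values."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data for illustration
print(population_sd_definition(values))
print(population_sd_expanded(values))
```

Both calls return the same value for this dataset. The expanded version never computes μ explicitly, which is why it can be evaluated in a single pass over the data.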

Sample standard deviation (S)

For the sample, the SD formula is very similar, with the difference that we subtract 1 from the denominator inside the square root:

S = √[ Σ(xi – x̄)² / (n – 1) ]

Equation 3: Sample standard deviation formula

Where:

  • S is the sample standard deviation.
  • x̄ is the sample mean.
  • n is the sample size or number of data points
  • The other symbols are the same as above.

Expanded sample SD formula

As before, there’s an expanded version of the sample SD formula:

S = √{ [Σxi² – (Σxi)² / n] / (n – 1) }

Equation 4: Sample standard deviation expanded formula

The variables are the same as before.
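A comparable sketch for the sample formulas is shown below. Again, the function names and data are hypothetical. The only change from the population case is the (n – 1) denominator, and the result is compared against `statistics.stdev` from Python’s standard library as a sanity check.

```python
import math
from statistics import stdev

def sample_sd_definition(values):
    """Equation 3: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sample_sd_expanded(values):
    """Equation 4: same result from the sum of values and sum of squares."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data for illustration
print(sample_sd_definition(values))
print(sample_sd_expanded(values))
print(stdev(values))  # library result for comparison
```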

Coefficient of variation (CV)

Sometimes, expressing standard deviation as a percentage of the mean is helpful. Doing so gives us the relative standard deviation, also known as the coefficient of variation (CV):

CV = (σ / μ) * 100%        or        CV = (S / x̄) * 100%
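As a small sketch of why the CV can be useful, the example below computes the sample CV for the same hypothetical weights expressed in two different units. The helper name `coefficient_of_variation` and the data are assumptions made for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV as a percentage: (S / x-bar) * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical weights, expressed first in kilograms and then in grams.
weights_kg = [60, 72, 65, 80, 70]
weights_g = [w * 1000 for w in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print(f"SD in kg: {stdev(weights_kg):.2f}, CV: {cv_kg:.2f}%")
print(f"SD in g:  {stdev(weights_g):.2f}, CV: {cv_g:.2f}%")
```

The standard deviation changes by a factor of 1,000 when the unit changes, while the CV stays the same. That makes the CV a convenient way to compare spread across datasets measured on different scales.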

Interpreting SD results

The standard deviation isn’t just a number; it’s a powerful tool for drawing meaningful conclusions from your data. Here’s how you can interpret standard deviation in different contexts:

  • Comparing datasets: If two datasets have the same mean but different standard deviations, the one with the higher standard deviation has more variability.
  • Identifying outliers: Data points that fall more than two or three standard deviations from the mean are often considered outliers and may warrant further investigation.

    Let’s say Haley, the HR consultant, analyzes the results of a training needs survey. After reviewing the data, she notices the average rating for communication skills training is around 4 out of 5, with a low standard deviation indicating most employees are in agreement about its importance.

    However, when she looks more closely, she spots a couple of responses far outside the general trend. One employee rated communication training as a 1 (very low importance), while another gave it a perfect 5 (extremely important), even though the majority of responses are clustered around the 4.

    This is where standard deviation helps. These unusually low and high ratings are considered outliers, as they fall far away from the average score. With the standard deviation, Haley can easily identify these extreme data points that don’t align with the overall pattern.

    By flagging these outliers, Haley can dive deeper into understanding why these individuals have differing views. It could highlight specific needs or concerns that may not be immediately obvious to the rest of the group. Maybe the employee who rated communication training as a 1 has a different job function where communication isn’t as critical, or the one who gave it a 5 might be in a customer-facing role where communication skills are vital.

    Using standard deviation to identify these outliers allows Haley to address individual concerns or explore specific areas where training programs might need further refinement.

  • Process control: In manufacturing or quality control, standard deviation helps track process variability and identify deviations from desired standards.
  • Financial analysis: Standard deviation measures the volatility and deviation of stock returns, bonds, and other financial instruments.
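The outlier rule of thumb mentioned in the list above can be sketched as a simple filter. The function name, the default threshold of two standard deviations, and the ratings are all assumptions made for this illustration, not a prescribed method.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample SDs from the mean."""
    x_bar = mean(values)
    s = stdev(values)
    return [(i, x) for i, x in enumerate(values) if abs(x - x_bar) > threshold * s]

# Hypothetical 1-5 ratings that mostly cluster around 4.
ratings = [4, 4, 4, 5, 4, 4, 3, 4, 4, 4, 4, 1, 4, 4, 5, 4, 4, 4, 4, 4]

outliers = flag_outliers(ratings)
print(outliers)
```

With this particular data, only the rating of 1 lies more than two standard deviations from the mean. Whether a given response is flagged depends on the threshold you choose and on how tightly the other responses cluster, so treat flagged points as prompts for a closer look rather than as errors.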

Strengths and weaknesses of standard deviation as a measure of dispersion

Like any statistical tool, standard deviation has its strengths and weaknesses.

Strengths of SD for data analysis

  • Widely used and understood: Standard deviation is a well-established measure used across numerous fields, making it easy to communicate your findings.
  • It helps identify outliers: It reflects the impact of extreme values, providing a more complete picture of the data’s variability.

Weaknesses of standard deviation for data analysis

  • Assumption of normality: It’s most effective when the data follows a normal distribution. Other measures may be more appropriate for skewed or non-normal probability distributions.
  • Sensitivity to outliers: While this is a strength, it can also be a weakness if extreme values are due to errors that skew the data.

Figure: Normal distribution (Image source: https://integratedmlai.com/normal-distribution-an-introductory-guide-to-pdf-and-cdf/)

Calculating the standard deviation – An example

To illustrate the practical application of statistical analysis using standard deviation, let’s do an example calculation of SD in data center management.

Standard deviation is a powerful tool that can be applied to data center management and optimization. Software like Nlyte can use this measure to:

  • Identify servers consuming excessive or insufficient power
  • Analyze power usage and consumption across servers
  • Monitor for temperature fluctuations
  • Understand workload distribution

Let’s consider a scenario where we’re monitoring the real-time power consumption (in watts) of 20 servers in a data center:

Server | Power consumption (W) | Server | Power consumption (W)
1 | 350 | 11 | 350
2 | 365 | 12 | 385
3 | 340 | 13 | 340
4 | 380 | 14 | 395
5 | 355 | 15 | 365
6 | 370 | 16 | 370
7 | 345 | 17 | 345
8 | 390 | 18 | 390
9 | 360 | 19 | 355
10 | 375 | 20 | 380

Now, let’s calculate the SD using both versions of the equations outlined above. As is normally the case, we’ll use equations 3 and 4 for the sample standard deviation.

How to calculate standard deviation the usual way

If you want to practice using the SD equation for a dataset like the one shown above, here’s what you must do:

Step #1: Calculate the mean (x̄)

Add all power consumption values and divide by the total number of servers (20). Since this is a sample, x̄ will denote this mean:

x̄ = (350 + 365 + … + 380) / 20 = 7,305 / 20 = 365.25 ≈ 365.3

Step #2: Calculate the deviations from the mean (xi-x̄)

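Subtract the mean from each power consumption value. The results below use the unrounded mean (365.25) and are rounded to one decimal place:

x1 – x̄ = 350 – 365.25 = -15.25 ≈ -15.3

x2 – x̄ = 365 – 365.25 = -0.25 ≈ -0.3

…

x20 – x̄ = 380 – 365.25 = 14.75 ≈ 14.8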

Step #3: Calculate the squared deviations

Square each of the differences you just calculated to get the squared deviations. Here’s what we have so far:

Server (i) | xi | xi – x̄ | (xi – x̄)²
1 | 350 | -15.3 | 232.6
2 | 365 | -0.3 | 0.1
3 | 340 | -25.3 | 637.6
4 | 380 | 14.8 | 217.6
5 | 355 | -10.3 | 105.1
6 | 370 | 4.8 | 22.6
7 | 345 | -20.3 | 410.1
8 | 390 | 24.8 | 612.6
9 | 360 | -5.3 | 27.6
10 | 375 | 9.8 | 95.1
11 | 350 | -15.3 | 232.6
12 | 385 | 19.8 | 390.1
13 | 340 | -25.3 | 637.6
14 | 395 | 29.8 | 885.1
15 | 365 | -0.3 | 0.1
16 | 370 | 4.8 | 22.6
17 | 345 | -20.3 | 410.1
18 | 390 | 24.8 | 612.6
19 | 355 | -10.3 | 105.1
20 | 380 | 14.8 | 217.6

Step #4: Find the average of the squared differences

Add up all the squared differences (in the last column), and divide by the total number of servers minus 1 (if you’re calculating σ, you don’t need to subtract 1). The result is the sample variance, S²:

(-15.3)² + (-0.3)² + … + (14.8)² = 5873.8

S² = 5873.8 / (20 – 1) = 309.1 W²

Step #5: Take the square root of the variance

S = √(S²) = √309.1 = 17.6 W
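The five steps above can be reproduced in a few lines of Python. This is a minimal sketch using the 20 power readings from the table. The variable names are chosen for this example only.

```python
import math

# Power consumption (W) of the 20 servers in the example dataset.
power_w = [350, 365, 340, 380, 355, 370, 345, 390, 360, 375,
           350, 385, 340, 395, 365, 370, 345, 390, 355, 380]

n = len(power_w)
x_bar = sum(power_w) / n                        # Step 1: mean
deviations = [x - x_bar for x in power_w]       # Step 2: deviations from the mean
squared = [d ** 2 for d in deviations]          # Step 3: squared deviations
sample_variance = sum(squared) / (n - 1)        # Step 4: divide by n - 1
sample_sd = math.sqrt(sample_variance)          # Step 5: square root

print(f"mean = {x_bar}")
print(f"sum of squared deviations = {sum(squared)}")
print(f"S^2 = {sample_variance:.1f} W^2")
print(f"S = {sample_sd:.1f} W")
```

Working from the unrounded mean avoids the small rounding differences that appear when intermediate values are rounded to one decimal place by hand.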

How to calculate sample standard deviation the easy way: a step-by-step guide

Now, let’s do the same calculation but use the simplified expanded formula to see how it makes calculation easier.

Step #1: Square each value

x1²  = 350² = 122,500

x2²  = 365² = 133,225

x20²  = 380² = 144,400

Step #2: Find the sum of the squares

Add up all the squared values:

Σxi² = 122,500 + 133,225 + … + 144,400 = 2,674,025 W²

Step #3: Find the sum of the original values

Σxi = 350 + 365 + … + 380 = 7,305 W

This is what we would have so far:

Server (i) | xi | xi²
1 | 350 | 122,500
2 | 365 | 133,225
3 | 340 | 115,600
4 | 380 | 144,400
5 | 355 | 126,025
6 | 370 | 136,900
7 | 345 | 119,025
8 | 390 | 152,100
9 | 360 | 129,600
10 | 375 | 140,625
11 | 350 | 122,500
12 | 385 | 148,225
13 | 340 | 115,600
14 | 395 | 156,025
15 | 365 | 133,225
16 | 370 | 136,900
17 | 345 | 119,025
18 | 390 | 152,100
19 | 355 | 126,025
20 | 380 | 144,400
Sum | 7,305 | 2,674,025

Step #4: Apply the expanded formula

Here again, we’ll use (n – 1) as the denominator inside the square root since we’re using a sample:

S = √{[2,674,025 – (7,305)² / 20] / (20 – 1)} = √(5,873.75 / 19) = 17.6 W

Both methods yield the same result, but the latter skips the per-value deviation step, so there is less arithmetic to do by hand.
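For comparison, here is a minimal Python sketch of the expanded-formula calculation on the same 20 readings. It accumulates only two running totals, Σxi and Σxi², and the variable names are chosen for this example.

```python
import math

# Power consumption (W) of the 20 servers in the example dataset.
power_w = [350, 365, 340, 380, 355, 370, 345, 390, 360, 375,
           350, 385, 340, 395, 365, 370, 345, 390, 355, 380]

n = len(power_w)
sum_x = sum(power_w)                   # sum of the values
sum_x2 = sum(x * x for x in power_w)   # sum of the squared values

# Expanded sample formula with (n - 1) in the denominator.
sample_sd = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(f"sum of values = {sum_x}")
print(f"sum of squares = {sum_x2}")
print(f"S = {sample_sd:.1f} W")
```

One caveat worth keeping in mind: with floating-point data whose mean is large relative to its spread, subtracting two nearly equal totals can lose precision, so the definition-based method (or a library routine) is usually the safer choice in software. For small integer datasets like this one, both approaches agree.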

Case study: HR drives store performance

A large, failing restaurant chain asked a team of consultants to help determine why performance was down and how it could improve. Data collection was nonexistent, so the consultants created a survey that focused on three key outcomes:

  • Customer satisfaction
  • Employee retention
  • Customer count

The business distributed an engagement survey that:

  • Linked employee outcomes to their real business outcomes
  • Prioritized the factors that had the largest impact on business outcomes
  • Showed the business impact of improvements in these factors
  • Focused front-line managers on the factors that showed the largest impact

They found six factors contributed the most to business improvement and success:

  • Ethics
  • Teamwork
  • Job fit
  • Senior leaders
  • Communication
  • Management

If the restaurant owners focused on promoting employees who scored a four or higher in these six characteristics, they could expect the following improvements in the three key business outcomes:

  • 16% increase in customer satisfaction
  • 18,000 more customers per year
  • 10% less staff turnover

Standard deviation is one of many tools in your data analysis belt

Because of its significance for the normal distribution function that models many real-world datasets, standard deviation is the most widely used measure of dispersion in statistical analysis. However, it’s important to remember that it’s just one tool among many.

Other measures of dispersion include:

  • Variance (the average of the squared deviations from the mean)
  • Range (the difference between the greatest and lowest values; a wider range means more dispersion)
  • Interquartile range (often abbreviated as IQR, which is the spread of the middle 50% of the data)
  • The mean absolute deviation (MAD), and others.

The most effective data analysts have a comprehensive understanding of various statistical measures and know when to apply each one to gain the most profound insights.
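As a brief sketch of how these alternatives compare in practice, the snippet below computes each of them for the server power readings used earlier. It relies on Python’s standard `statistics` module. Note that `quantiles` uses one of several common quartile conventions, so other tools may report a slightly different IQR.

```python
from statistics import mean, quantiles, variance

# Power consumption (W) of the 20 servers in the example dataset.
power_w = [350, 365, 340, 380, 355, 370, 345, 390, 360, 375,
           350, 385, 340, 395, 365, 370, 345, 390, 355, 380]

sample_variance = variance(power_w)          # sample variance (n - 1 denominator)
value_range = max(power_w) - min(power_w)    # greatest minus lowest value

q1, _median, q3 = quantiles(power_w, n=4)    # quartile cut points
iqr = q3 - q1                                # spread of the middle 50%

x_bar = mean(power_w)
mad = sum(abs(x - x_bar) for x in power_w) / len(power_w)  # mean absolute deviation

print(f"variance = {sample_variance:.1f} W^2")
print(f"range    = {value_range} W")
print(f"IQR      = {iqr} W")
print(f"MAD      = {mad:.3f} W")
```

Range, IQR, and MAD are all reported in watts, like the standard deviation, while the variance is in squared watts.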

About the author:

Jeremy Moser

Jeremy is co-founder & CEO at uSERP, a digital PR and SEO agency working with brands like Monday, ActiveCampaign, Hotjar, and more. He also buys and builds SaaS companies like Wordable.io and writes for publications like Entrepreneur and Search Engine Journal.