Precision, Accuracy, and Resolution
Dave Tutelman -- December 23, 2007
Too many people use the terms "precision" and "accuracy" interchangeably. They shouldn't. Precision and accuracy are completely different concepts. Let's explore what they really mean, and how to tell the difference. While we're at it, we will also throw in "resolution", which is also too often confused with precision.
This article first distinguishes between resolution and precision, then between precision and accuracy. In each case, we will start with an example chosen to make the point clear, then take one or two examples from clubmaking measurement to show why it's important to clubmakers.
Precision vs Resolution
First the definitions:
- Resolution is the fineness to which an instrument can be read.
- Precision is the fineness to which an instrument can be read repeatably and reliably.
There is a difference. Let's see it with an example. Here are two stopwatches. One is analog and the other is digital. Both are manually actuated; this is an important point in the distinction.
First, let's look at the resolution of the two stopwatches:
- The analog stopwatch has to be viewed on its dial. If you look closely, you can relate the big hand to the smallest tick-mark on the big dial. That tick-mark is a tenth of a second. The best a good eye can do is resolve a reading to 1/10 second, which is therefore the resolution of the stopwatch.
- The digital stopwatch has two digits beyond the seconds, so it subdivides time into hundredths of a second. Since it is easy to read to 1/100 of a second, that is its resolution.
So there is a substantial difference between the watches in resolution -- a power of ten, from 1/10 to 1/100 second.
What about the precision?
Precision is reliable, repeatable measurement. The total measurement system includes the human that activates the watch in either case. And experiments have shown that a human takes about 1/10 of a second to react to a stimulus and turn it into a button press. So...
- The analog stopwatch has a precision of about 1/10 second. Both the resolution and the stimulus-response time of the human are 1/10 second.
- The digital stopwatch also has a precision of 1/10 of a second. This is a surprise! After all, the watch has a resolution of 1/100 second. But, because of the human reaction time, the hundredths digit is not reliable. If you measured precisely-known elapsed times with this arrangement, you would find the last digit's value to be almost random. There is a spread of about 1/10 of a second in the measured times due to the human factor. So it is repeatable to only 1/10 second.
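A quick simulation shows why the extra digit doesn't help. This is a minimal Python sketch; the Gaussian reaction-time jitter (standard deviation 0.03 s, giving roughly the 0.1-second spread described above) is my assumption, not a measured value.

    import random

    def stopwatch_reading(true_time, resolution, jitter_sd=0.03):
        """Simulate one manually-timed measurement: add human
        reaction-time jitter, then round to the watch's resolution."""
        raw = true_time + random.gauss(0, jitter_sd)
        return round(raw / resolution) * resolution

    random.seed(1)
    true_time = 57.00  # a precisely-known elapsed time, in seconds
    print([stopwatch_reading(true_time, 0.10) for _ in range(5)])  # analog
    print([stopwatch_reading(true_time, 0.01) for _ in range(5)])  # digital

The analog readings repeat; the digital readings have a hundredths digit that wanders almost randomly. Resolution 1/100 second, precision about 1/10 second.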
This raises an important point. The advent of digital instrumentation gave rise to a mindset that equates resolution with precision. Digital readouts make it very easy to see what the resolution of an instrument is. Most people simply assume, "Hey, the guys who designed this made it to read to five digits, so it must be good to five digits!" Whatever "good" means. Resolution? Yes. Precision? Well, maybe.
How about a few real-life examples from clubmakers' instruments?
Digital Scales
In my article about testing digital scales, I warn about measuring the same weight twice in a row. That is because some manufacturers of digital scales realized that their precision was not up to their resolution. For instance, the resolution might be 1 gram, but they were not able, without significant added expense, to get the precision below 3 grams. Instead of cutting the resolution to 3 grams -- which would be honest, but also potentially confusing -- they just left the resolution at one gram.
But they realized that customers might be annoyed by discovering this unfortunate fact of life. So they came up with a "cheater circuit" that recognizes when the load being weighed is within 3 grams (or maybe 5 grams, just to be safe) of the last thing weighed. If so, the scale just displays the previous answer, instead of what it actually measured this time. So, if you are trying to determine the repeatability of your scale, be sure to "cleanse the palate" with a completely different weight between weighings.
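Here is a minimal sketch of that cheater-circuit logic. The 3-gram threshold and the function name are my illustration, not any manufacturer's actual firmware.

    def cheater_display(raw_grams, last_display, threshold=3):
        """If the new raw measurement is within the (hidden) precision
        threshold of the last displayed value, repeat the old display
        so the scale looks perfectly repeatable."""
        if last_display is not None and abs(raw_grams - last_display) <= threshold:
            return last_display
        return round(raw_grams)

    d1 = cheater_display(203.8, None)  # first weighing -> 204
    d2 = cheater_display(202.1, d1)    # same head again -> still shows 204
    d3 = cheater_display(45.0, d2)     # "cleanse the palate" -> 45
    d4 = cheater_display(202.6, d3)    # now it must re-measure -> 203
    print(d1, d2, d3, d4)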
Frequency Meter
Typically, a clubmaker's frequency meter has a resolution of 1cpm. But John Kaufman has made a version of his very successful Club Scout frequency meter with a resolution of 1/10 of a cpm (0.1cpm). It is reasonable to ask: what is its precision?
John has assured us that he does indeed get repeatable readings to 0.1cpm, and I believe him. It isn't hard to build electronics to do this. But I also believe him when he says that technique and setup must be watched when you're trying to attain this precision. Think about things like:
- The stability of the clamp, and the bench to which it is mounted.
- The repeatability of the clamp to the same pressure each time a shaft is clamped.
- The repeatability of your technique for pulling and releasing the shaft.
- The security with which the tip weight (or clubhead) is attached to the shaft.
Any of these could be perfectly good for a precision of 1cpm, but might introduce fluctuating readings at a resolution of 0.1cpm. In order to achieve the resolution that John has built into the electronics, your clamp, bench, and technique must be good for 0.1cpm repeatability.
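One way to check whether your own setup is up to 0.1cpm work is to clamp a single shaft, measure it repeatedly without disturbing anything, and look at the spread. A small sketch, with hypothetical readings:

    import statistics

    def precision_estimate(readings):
        """Estimate precision from repeated measurements of the same,
        undisturbed setup: report the range and standard deviation."""
        return max(readings) - min(readings), statistics.stdev(readings)

    # Hypothetical repeated readings of one clamped shaft, in cpm:
    readings = [248.3, 248.5, 248.2, 248.6, 248.4]
    spread, sigma = precision_estimate(readings)
    print(f"range = {spread:.1f} cpm, stdev = {sigma:.2f} cpm")

A range of 0.4cpm would mean the clamp, bench, or technique is not yet delivering the 0.1cpm precision the electronics can resolve.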
Accuracy vs Precision
It's not hard to wrap your mind around repeatability, which is the difference between resolution and precision. The difference between precision and accuracy is correctness -- and that is sometimes a little harder to cope with. To make it easier, we'll use a very graphic example [1].
Imagine you have a rifle with a telescopic sight. When you shoot with it, you get a pattern like the one at the left. Not very good.
So you decide there is something wrong with your telescopic sight. You get better optics -- sharper, and greater magnification. Does that solve the problem?
No it does not! You now have a much tighter distribution. But, on average, you're just as far from the bull's eye. The real problem was not that the scope did not show the target well enough; the scope was aligned wrong. One way of expressing it is, "You have greatly improved the precision, but the accuracy did not get better." That is:
- The repeatability from shot to shot (precision) is much better, but...
- The "correctness" (accuracy) of the shots -- their distance from the bull's eye -- did not improve at all.
OK, so we can improve precision without improving accuracy. Does it work the other way, too? Can we improve accuracy without improving precision? We can, as this picture shows. If, instead of working on the optics of the sighting scope, we had just aligned it properly, here's the pattern we would have gotten. No improvement in precision, but plenty of improved accuracy.
Finally, just to complete the picture, here's high accuracy with high precision. This would result from working on both the alignment and the optics.
On to the promised examples from clubmaking instrumentation.
Digital Scales, but not just digital scales
Let's look at the sort of errors that affect the accuracy of instruments, as opposed to precision or resolution. In this section, our model instrument will be a digital scale, but we can apply the information to any instrument, even analog instruments. Let's start by assuming that the resolution and precision of the instrument are easily good enough for the job you have. Your concern now is that these precise results reflect reality: that is, they are accurate.
Example: You have a digital scale that reads to 1 gram -- both resolution and precision. You can weigh a 100-gram standard weight and it reads the same value every time, to within a gram. But that's just precision. If that consistent, precise value you read is 105 grams, then the accuracy error is 5 grams -- five times coarser than the precision.
So what would accuracy errors look like?
The first and perhaps most important error in any instrument is scaling error. In the graph, a perfect instrument would be the heavy black line, a straight line at 45º. That is, the measured value y -- the reading -- would be the same as the actual value x. For instance, if the actual clubhead being weighed is 198 grams, then the reading of the scale is also 198 grams. The blue line shows what happens to the measurement if the digital scale has a scaling error. The reading differs from the actual value by an amount proportional to the actual value. There are a few other ways to say this with precision:
- The reading of the scale is related to reality by the equation y = Kx. For perfect accuracy, K = 1.000. For any other value of K there is a scaling error.
- The reading of the scale is off by a constant percentage. Not a constant amount -- we'll get to that later -- but a constant percentage or proportion. For instance, if K = 1.02 instead of 1.000, then the reading is always 2% high. This gives a 2-gram error when weighing 100 grams, a 4-gram error when weighing 200 grams, etc.
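In code, the scaling-error model is just the y = Kx arithmetic from the list above; this sketch restates it.

    def scale_reading(actual_grams, K=1.02):
        """Scaling-error model: reading = K * actual.
        K = 1.000 would be a perfectly calibrated scale."""
        return K * actual_grams

    for actual in (100, 200, 300):
        reading = scale_reading(actual)
        print(f"{actual} g reads {reading:.0f} g (error {reading - actual:+.0f} g)")

The error is a constant 2%, so it grows with the load: 2 grams, then 4, then 6.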
How does this sort of error occur? It is usually a calibration error. This can affect either analog or digital scales. For instance:
- Analog: Analog scales generally depend on a spring for measurement. The spring stretches or compresses in response to a force (a weight); the change in the length of the spring is measured and read as a force from markings on the scale. In order to do the conversion correctly, the scale manufacturer has to know the "spring constant" of the spring -- the ratio of force to elongation. It is well known that standard springs have a 10% tolerance on their spring constant. 10% is too big an error for any kind of scale, so the scale manufacturer will pay extra for a lower-tolerance spring. How much more money, for how much lower tolerance? That is what will determine the scaling error for the analog scale.
- Digital: Many digital scales today have a "calibration" feature that works like this: You put a known accurate weight (specified and often supplied by the manufacturer) on the scale and hit a "calibration" button. The scale then determines its "K" and adjusts the readout so it behaves as if K = 1.000. A calibrated scale is only as accurate as the weight you use to calibrate it.
Horrible example: An acquaintance of mine saw that his scale required (according to the manufacturer's instructions) a 10-kilogram weight for calibration. He didn't have an exact 10-kilogram weight in the shop. But he knew that 10 kilograms is about 20 pounds, so he borrowed a [very accurate, he was assured] 20-pound weight from the fitness store next door, and used that to calibrate the scale. Well, 10 kilograms is actually 22.05 pounds. Yes, that's "about 20 pounds" -- with a 10% difference. So his "calibrated" (actually miscalibrated) scale now had a 10% error. This high-resolution, very precise digital scale had an accuracy problem of 10%. He weighed a 198-gram clubhead and got a reading of 218 grams -- and he wondered why.
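The arithmetic of that horrible example is worth a few lines. Telling the scale that a 20-pound weight is 10 kilograms builds the pounds-versus-kilograms discrepancy right into K:

    # The scale was told "this is 10 kg" while a 20.00 lb weight sat on it.
    ten_kg_in_lb = 22.05               # what the calibration weight should have been
    used_weight_lb = 20.00             # what was actually on the scale
    K = ten_kg_in_lb / used_weight_lb  # ~1.1025: every reading is ~10% high

    print(round(198 * K))              # a 198-gram clubhead reads about 218 grams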
The next kind of error we will look at is offset. This occurs when every reading is high (or low) by the same constant amount -- a constant offset.
Offset errors are very easy to eliminate in digital instruments (or electronic instruments in general). Most electronic instruments have some sort of "zero adjustment"; you provide the instrument with a zero input (e.g., no weight on the scale) and tell it "this is zero". Examples:
- Digital scales usually have a "Tare/Zero" button for exactly this purpose.
- Many analog meters have a "zero adjust" knob or screw. With a zero input to the instrument, turn the knob so that the output is zero. Now the instrument has no offset error.
Zero adjustments assure that there is no output with zero input; that is, the instrument is perfectly accurate at zero. When this is adjusted properly, then any accuracy problems are something other than offset.
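A sketch of what the Tare/Zero button is doing. The raw zero-load value here is hypothetical:

    def tared_reading(raw, zero_raw):
        """Offset correction: subtract the raw reading taken at zero
        input. This removes offset error -- but only offset error."""
        return raw - zero_raw

    zero_raw = 4.0  # hypothetical raw reading with an empty platform
    print(tared_reading(4.0, zero_raw))    # 0.0 -- perfectly accurate at zero
    print(tared_reading(204.0, zero_raw))  # 200.0 -- any error left over is
                                           # scaling or nonlinearity, not offset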
But offset errors can still creep in if we are not careful. In particular, it is sometimes hard to identify a zero (or any other arbitrary) "standard" to use as a zero adjust or tare. Case in point: A clubmaker added a Wixey to his loft/lie machine. (A Wixey is a digital angle gauge that can measure its own orientation to 0.1º.) He concluded that he could now measure lie angle to 0.1º. The problem was that, without the Wixey, there was no way to tell lie angle to better than a half degree with his L/L machine. So there was no way to position the Wixey on the L/L machine oriented within 0.1º.
For instance, suppose we have a standard club that we know is 60º, and use that to set the Wixey so it reads 60.0º. That solves the problem, right? Well, maybe. Consider... how do we know that the standard club is 60º? Because we measured it in another machine. OK then, how accurate was this other machine? Was it a full 0.1º-accuracy machine? If not, our 60º standard club might actually be 60.3º. If we use it to orient the Wixey on our machine, we have an instrument with an offset error of 0.3º. It measures lie differences to 0.1º accuracy, but it will measure absolute lie with the same 0.3º error every time.
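The Wixey story in miniature: a constant offset cancels out of differences but not out of absolute readings. The 0.3º offset is the one from the example; the function is my illustration.

    WIXEY_OFFSET = 0.3  # degrees, built in by orienting against a bad standard

    def wixey_lie(true_lie):
        """Every reading carries the same constant offset."""
        return true_lie + WIXEY_OFFSET

    a, b = wixey_lie(60.0), wixey_lie(61.5)
    print(f"absolute: {a:.1f} deg (true 60.0)")  # off by 0.3 -- offset error
    print(f"difference: {b - a:.1f} deg")        # 1.5 -- the offset cancels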
The final common accuracy error is linearity error. It is often the hardest error to keep out of real-world instruments.
In the graph, the red curve matches the black line for zero input, so this instrument has no offset error. It also matches the black line near the top right of the graph. So this instrument does not have any percentage error at that point; we can't accuse it of scaling error.
If the actual response is perfectly accurate for at least two [widely-separated] input values, but inaccurate for other values, then the instrument's response curve cannot be a straight line. That's just geometry. The perfect response curve y = x is a straight line. Two points determine a straight line. So, if the actual response curve matched at two points and was a straight line, it would be the same straight line as the perfect response curve. Q.E.D.
If the response curve is not a straight line, then it is nonlinear, as mathematicians and engineers would say. Inaccuracies of this type are referred to as nonlinearities.
We have already pointed out that analog scales frequently have scaling errors because of tolerances on the spring constant. They also often have nonlinear errors. To prove to yourself where these errors might come from, take a fairly flexible coil spring and start stretching it. For a while the length increases in proportion to the force you apply. You can see and feel that. But, at some point, the coils are less flat and more angled. The spring is straightening out. Now it takes a greater increase of force to get the same increase of length. Eventually the spring is mostly straight, and you can apply a lot more force with almost no additional increase in length.
This results in a nonlinear response curve like the one in the graph. As the spring stretches, its rate of length increase becomes less, and the curve gets "flatter" as shown.
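A toy model of that straightening spring: extension proportional to force at first, then saturating. The tanh form is just my illustration of the shape, not spring physics.

    import math

    def spring_extension(force, k=1.0, max_ext=10.0):
        """Hypothetical saturating spring: extension is about force/k
        for small forces, flattening toward max_ext as the coils
        straighten out."""
        return max_ext * math.tanh(force / (k * max_ext))

    for f in (1, 5, 10, 20, 40):
        print(f"{f:2d} units of force -> {spring_extension(f):5.2f} units of length")

Each doubling of force buys less and less extra length -- the "flattening" response curve described above.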
A digital scale has a different kind of linearity problem. The problem stems from the necessity to convert a weight (an analog quantity) into a digital number. This conversion process, called "quantization", is a necessary function of most digital instruments. Example: a 500-gram scale with a 0.1-gram resolution must quantize its input to one of 5000 values -- 0.0g through 499.9g.
The circuitry to do the quantization must be manufactured very accurately. This graph reveals a quantization circuit (D/A converter) with an inaccurate electronic component converting one of the bits. (It happens to be the second-most-significant bit, for you binary number fans.) The error shown in the graph is unusually large, but smaller errors of this kind are not uncommon. That is why my article on digital scale testing stresses looking for nonlinearity errors.
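To make the faulty-bit idea concrete, here is a sketch of an ideal quantizer plus one mis-weighted converter bit. The bit width and the size of the error are made up for illustration.

    def quantize(grams, resolution=0.1):
        """Ideal quantization: snap the analog weight to the nearest
        step. A 500 g / 0.1 g scale yields codes 0..4999."""
        return round(grams / resolution)

    def faulty_readout(grams, resolution=0.1, bits=13, error_steps=8):
        """One converter bit (the second-most-significant here) carries
        slightly too much weight, so every code with that bit set reads
        high -- a nonlinearity, not a scaling or offset error."""
        code = quantize(grams, resolution)
        if code & (1 << (bits - 2)):   # is the faulty bit set in this code?
            code += error_steps        # hypothetical 0.8-gram error
        return code * resolution

    print(f"{faulty_readout(150.0):.1f}")  # 150.0 -- faulty bit not set
    print(f"{faulty_readout(250.0):.1f}")  # 250.8 -- faulty bit set, reads high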
Let me repeat that all these errors can apply to perfectly precise instruments. They are accuracy problems, not precision problems.
Spine Finding
I have long objected to bearing-based spine finders because, unlike FLO-based systems, they find not the direction of the spine but an unpredictable mix of spine and residual bend. (These are often referred to as "feel finders".) More and more people are coming around to my point of view. But...
I recently corresponded with a clubmaker who said he agrees that FLO is the right way to find spine. Then he wrote,
"I have built a spine finder like JB's spine tool. But I have added an extra feature to it. The 3rd bearing, at the tip, is attached to a small scale (like a small fishing scale) which in turn is attached to a plate that is glued to my workbench. That way I can chart and mark every spine I find down to the gram. The tool works fine as a feelfinder but I believe the scale adds a little bit more measurability and science to it."
"Added measurability and science" implies an increase in accuracy. But is it? The accuracy problem with the feel-finder is the direction that it finds, which is only occasionally the true spine. His instrument will probably find the same direction every time with the same shaft -- making it precise. But, if that precise direction is not that of the spine or NBP, then the instrument has an accuracy problem.
Adding a scale does not improve the direction one iota. The instrument still has an error. It's the same error as before. But now we know some data about this wrong result to within a gram. Is anyone prepared to argue that this is actually helping?
I might add that there are quite a few instruments around that advertise this same [erroneous] spine-finding feature. This includes the Auditor, the FlexMaster, and the NeuFinder [2]. In each case, they glorify a feel-finder with a meter that gives the impression of greater accuracy. In fact, it is attributing useless precision to a highly inaccurate measurement.
Real-world Example
Before closing, let me cite an example from the real world -- my bathroom -- that illustrates perfectly the difference between resolution, precision, and accuracy. I have a digital bathroom scale on which I weigh myself every morning. It reads to a half pound. Recently I saw the identical scale at a yard sale. The price was low enough that I bought it to compare the two scales, if for no other reason. There were identical calibration certificates on both machines; here is what they looked like:
*** CALIBRATION CERTIFICATE ***
Linearity: PASSED
Hysteresis: PASSED
Resolution: 0.5 lbs.
Quality Test: PASSED
Model: MS-7

Does this mean that the scale is accurate to 0.5 pounds? Let's look at this for the three measurement qualities:
- Resolution: This says that the resolution is 0.5 pounds. My experience confirms this. Not only is the display capable of it, but I have seen readings only 0.5 pounds apart.
- Precision: In my experience, the readings are repeatable to the 0.5-pound resolution. That either means the precision is 0.5 pounds, or there is a "cheater circuit" that makes my test look good. I tested for a cheater circuit, and found that there probably is one. (Note: a cheater circuit works by imposing memory -- "hysteresis" -- on the readings. The fact that the scale passed the hysteresis test might suggest that they test to see if the cheater circuit is working.)
- Accuracy: The certificate says nothing about accuracy. (Well, it does include linearity, which by now we know is a component of accuracy.) But I put the scales side by side and weighed the same standard load -- me -- on both. The two scales differed by 2.5 pounds.
So:
- The resolution is undoubtedly 0.5 pounds, as advertised and tested.
- The precision is hard to determine in the presence of the cheater circuit, but seems to be no worse than 1.0 pounds.
- The accuracy is 2.5 pounds at best. (Another scale of the same model may be off by even more.)

Conclusion
As an instrumentation engineer going back many years, I am very aware of the distinction among resolution, precision, and accuracy. I react -- perhaps over-react [3] -- to statements that reflect disregard for this distinction. I have written this article so I have something to refer people to when I see such statements.
Notes:
- I Googled to see if my idea of accuracy vs precision is the general consensus. Not only was there general agreement on the concept, but 3 of the first 5 sites I checked used the same example I do to demonstrate it -- the scatter of shots at a target. At first I was taken aback; I had thought, and even hoped, that my analogy was original. But then, in the spirit of "great minds think alike", I realized I must have it right if everybody uses the same example.
- This applies to the NeuFinder 2, and even the NeuFinder 4 if you use it naively. The NeuFinder 4 also supports a differential deflection mode, which gives accurate spine finding at the expense of considerable extra work.
- OK, let me trot out an anecdote from my past -- mostly to demonstrate how long this issue has concerned me. When I took my PhD qualifying exam in 1969, the format was a written exam on a Monday, and later in the week an oral grilling from five professors -- who had seen what I wrote on Monday. Obviously, the first thing they were going to quiz me about was the areas where my written answers were shaky.
One of the examining professors (I knew who my inquisitors would be well in advance) had a thing about the difference between precision and accuracy. I didn't quite agree with him on the distinction. I had sufficient confidence that I knew the difference -- I had been designing instruments for a decade at that point -- that I took a risk. When I saw that question on the written exam, I deliberately wrote an answer that challenged his point of view. I had thereby set the agenda for the half hour that he had to question me in the orals. He did indeed rise to the bait. We argued the point for fifteen minutes, half his allotted time, after which he agreed that I was probably right. He strongly recommended passing me on the test.
Last modified 10/3/2009