Abstract

Conventional medicine treats laboratory tests as "within normal limits" or "abnormal," with no shades of grey in between. This paper explores the utility of using Z-scores to refine the diagnostic criteria for understanding the relationship between TSH, T4, T3, fT4, fT3, and rT3 in the practice of functional and naturopathic medicine.

Conventional Medical View of Reference Ranges

The conventional medical view taught and practiced by allopathic MDs is based on the premise that any lab test result "within the reference range" supports the view that a patient is "normal" and disease-free and, therefore, not a candidate for treatment. On the other hand, any lab test result "outside the reference range" supports the patient being diagnosed with a disease that can be assigned an ICD-10 diagnosis code, and therefore, the patient becomes a candidate for treatment, and the treatment may be subject to third-party payer reimbursement.

This paper will unpack the statistical basis for this view in the following paragraphs. To do this, we must discuss certain statistical concepts, including the null hypothesis, statistical significance, confidence intervals, and Boolean logic.

Null hypothesis

The null hypothesis is the starting point of statistical analysis. It states:

"There is no effect - there is no problem - prove me wrong."

In order to make a diagnosis that a patient has a disease, the physician must show that the null hypothesis has a statistically significant probability of being wrong. Note the emphasis on probability rather than proof. Statistics cannot provide proof - only varying degrees of certitude.

Rejecting the null hypothesis

In order to diagnose the presence of disease in a patient, statistical evidence must be presented that "it is unlikely" that a healthy patient will present with the particular lab test. In this case, we reject the null hypothesis and accept the alternate hypothesis that there is sufficient statistical certainty that there is an effect, problem, or disease.

Statistical significance

When the statistical certitude that the null hypothesis can be rejected reaches a certain threshold, then it is said that there is statistically significant evidence that the alternative hypothesis (that the patient is diseased) should be accepted. Typically, in medicine, there must be a statistical certitude of at least 95% that the null hypothesis can be rejected for the alternate hypothesis that the patient has an effect, problem, or disease to be accepted. Put another way, a result this extreme would be expected in fewer than 5% of "normal" patients. This certitude is often expressed as "having statistical significance at p = 0.05."

Reference range

Various statistical techniques can estimate the reference range (the interval expected to contain the central 95% of results from healthy individuals, often loosely called the 95% confidence interval). The most straightforward approach is to recruit 1,000 apparently healthy individuals and run the lab test on them. Results are expected to vary according to random chance. If the 1,000 lab values are sorted from lowest to highest, and the 25 lowest and 25 highest values are discarded, then the range of the remaining values will represent the "middle" 95% of the sample. The lowest and highest remaining values represent an estimate of the reference range.

This procedure only gives a statistical estimate based on a sample of 1,000 individuals. If the process were repeated with a new group of 1,000 apparently healthy individuals, then a similar but not identical reference range is expected. A smaller sample (e.g., 200 healthy individuals, trimming off the 5 highest and the 5 lowest values) would give a less repeatable reference range. A larger sample size would give more repeatable results.

The method described here is non-parametric, which means that it does not depend on any assumption of special properties of the data, such as "normal distribution" (see below). In practice, labs may base their reference ranges on statistical techniques that assume the data follows a normal distribution; these reference ranges are only reliable if the statistical assumptions made are valid.

Regardless of how the reference range is obtained, it is treated as a uniform (rectangular) distribution in which any lab test value within the reference range equally satisfies the null hypothesis, and any value outside the reference range equally rejects the null hypothesis, which leads to all-or-nothing decision making, as described below.
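The trimming procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name nonparametric_reference_range is hypothetical, and the sample is simulated with an invented mean and spread rather than taken from real patients.

```python
import random


def nonparametric_reference_range(values, coverage=0.95):
    """Discard equal numbers of values from each tail; return (low, high)."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of values to discard from each end, e.g. 25 of 1,000 for 95%.
    trim = int(round(n * (1.0 - coverage) / 2.0))
    kept = ordered[trim:n - trim]
    return kept[0], kept[-1]


# Simulated "apparently healthy" results (assumption: invented mean and spread).
rng = random.Random(42)
sample = [rng.gauss(2.5, 1.0) for _ in range(1000)]
low, high = nonparametric_reference_range(sample)
print(f"Estimated 95% reference range: {low:.2f} to {high:.2f}")
```

Re-running the sketch with a different seed, or with only 200 simulated individuals, gives a feel for how much the estimated bounds move from sample to sample.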

Boolean logic

Boolean logic is a system of decision-making that is based on "true" and "false" with no intermediate "shades of gray." The conventional medical view taught and practiced by allopathic MDs uses Boolean logic to reason with test results. A given test result is classified as either "in the reference range," which means that it is FALSE that the patient has a problem, or "outside the reference range," which means that it is TRUE that the patient has a diagnosable problem. If the reference range is 0.50 to 4.50, for example, a patient with a test result of 0.51 is treated the same as a patient with a test result of 4.49 - i.e. is "within normal limits" and therefore has no diagnosis of disease. The patient is put into a "box" 4 units wide. Just a smidge to the left or right would cause the patient to fall out of the box and be classified as "abnormally low" or "abnormally high." This view may cause patients near the edges of the box who would benefit from treatment to be told that they are "normal" and to be denied treatment.

Functional Medical View of Reference Ranges

One approach to improving on the conventional medical view as taught and practiced by allopathic MDs is to acknowledge that there are shades of gray between "true" and "false," so that patients in the above example with lab values of 0.51, 2.5, and 4.4 would be classified as "low-normal," "mid-normal," and "high-normal," respectively. Clinical decision-making can, therefore, be more precise, but the rules for clinical decision-making become more complicated than in the conventional approach.

One approach to this problem is introducing the idea of "fuzzy logic." While "fuzzy" may sound like a disdainful term, it represents an acknowledgment that in the real world, decisions must be made with incomplete information of variable reliability. To implement fuzzy-logic decision-making, we can transform the patient's lab value into a Z-score in the same manner as is commonly done for DEXA scan reports of bone density. This Z-score can represent all shades of meaning from "low" (outside the reference range on the left) through "high" (outside the reference range on the right). To understand Z-scores, we must discuss certain statistical concepts, including the central limit theorem, population mean, standard deviation, and Z-transformation.

Central limit theorem

The distribution of many measured values subject to random independent variations tends to follow a "Gaussian," "normal," or "bell-shaped" distribution, which is also the limiting shape of a binomial distribution with many trials. In particular, if the lab test results of many people are affected by random individual variation, then a graph of the values will be approximately normally distributed, and standard statistical techniques are applicable. In this view, the null hypothesis is most likely to be satisfied at the center of the "hump" of the bell curve and is progressively less likely to be satisfied as the patient's lab value moves left or right toward the "tails" of the bell curve.
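A small simulation can illustrate the tendency described above. It is a sketch under invented assumptions: each simulated "lab result" is the sum of 12 independent uniform random nudges, which is not a model of any real assay. The fractions of results within 1 and 2 standard deviations of the mean should come out close to the 68% and 95% expected for a bell curve.

```python
import random
import statistics

rng = random.Random(0)


def simulated_result(n_factors=12):
    # Each hypothetical factor nudges the result by a uniform random amount.
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_factors))


results = [simulated_result() for _ in range(20000)]
mean = statistics.fmean(results)
sd = statistics.pstdev(results)

# Fraction of simulated results within 1 and 2 standard deviations of the mean.
within_1 = sum(abs(x - mean) <= sd for x in results) / len(results)
within_2 = sum(abs(x - mean) <= 2 * sd for x in results) / len(results)
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```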

This bell-shaped distribution assumption is not a perfect reflection of reality. However, it is a better approximation of reality than the uniform rectangular distribution implied by the conventional approach used by allopathic MDs as described above. As will be discussed below, fuzzy medical decision-making based on the assumption of a normal distribution of test results (e.g., statements like "the patient's lab value has a Z-score of -1, which means we are about 68% certain that the patient's lab value is abnormal") is expected to be more precise than Boolean medical decision-making using the "in the box/out of the box" approach of assuming a uniform rectangular distribution (which for that same patient would say "the patient is normal").

Population mean and standard deviation

Any Gaussian distribution curve can be characterized by the population mean (μ) and the standard deviation (σ). The population mean is the value that represents the top of the hump of the bell curve; the standard deviation is a measure of "how wide" the bell curve is. Without delving into all the mathematics, it can be shown that there is a simple approximate relationship between the reference range described in the conventional medical view and the population mean and standard deviation:

Let L and U represent the lower and upper bounds of the 95% reference range obtained as above (when applied to data that follows an approximately normal distribution).

Then the population mean = μ = (L+U)/2, and the standard deviation = σ = (U-L)/4, because the 95% reference range spans approximately 2 standard deviations on either side of the mean (more precisely, 1.96).

Z-scores

It is convenient to convert (transform) a patient's lab values into Z-scores.

The following formula converts the measured lab value (denoted V) to its corresponding Z-score (denoted Z):

Z = (V - μ) / σ = 4 * (V - μ) / (U - L)

These transformed values have the following convenient properties according to the 68-95 rule:

  • Z = 0 means that the patient's lab value is in the middle of the reference range and is most likely normal (the patient is normal from an allopathic perspective);
  • Z < -2 means that the patient's lab value is lower than the 95% reference range - reject the null hypothesis at a level of p=0.05 (we are more than 95% certain the patient has a diagnosable disorder from an allopathic perspective);
  • Any Z value between -2 and +2 lies within the 95% reference range (within 2 standard deviations of the middle) - accept the null hypothesis at a level of p=0.05 (we are less than 95% certain the patient has a diagnosable disorder, so the patient is considered normal from an allopathic perspective);
  • Z > +2 means that the patient's lab value is greater than the 95% reference range - reject the null hypothesis at a level of p=0.05 (we are more than 95% certain the patient has a diagnosable disorder from an allopathic perspective);
  • Z-scores are real numbers with a continuum of values representing shades of gray (naturopathic and functional medical perspective) - not just true and false (allopathic medical perspective).

Extending the power of Z-Scores

An advantage of Z-scores is that the reference range is always from -2 to +2, so it is easy to tell where a lab value lies relative to the reference range (low, normal, high). Even more powerful, since Z-scores are continuous, degrees of belief in the null hypothesis can be expressed by intermediate values. If we assume that the test data is approximately normally distributed (an assumption motivated by the Central Limit Theorem of statistics), then given the 95% lower and upper bounds of the test data (L and U) and the patient's test value (V), we can calculate a Z value. For example, consider the case of TSH (reference range = 0.5 to 4.5) and a measured lab value = 3.5:


Example:

L = 0.5, U = 4.5, and V = 3.5; then

μ = (L + U) / 2 = 2.5, so Z = 4 * (V - μ) / (U - L) = 4 * (3.5 - 2.5) / (4.5 - 0.5) = +1.0

I.e., the patient's lab value is 1 standard deviation higher than the mid-range.
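The same calculation can be written as a small Python function. This is a minimal sketch, not a validated clinical tool: the name z_score is hypothetical, and it assumes the reference range is a 95% range for approximately normally distributed data, so that L and U map onto Z = -2 and Z = +2.

```python
def z_score(value, lower, upper):
    """Approximate Z-score of a lab value from its 95% reference range."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    midpoint = (lower + upper) / 2.0  # estimate of the population mean
    # Scale so the reference range (lower..upper) maps onto -2..+2.
    return 4.0 * (value - midpoint) / (upper - lower)


print(z_score(3.5, 0.5, 4.5))  # TSH example from the text: 1.0
print(z_score(0.5, 0.5, 4.5))  # lower bound of the range: -2.0
print(z_score(4.5, 0.5, 4.5))  # upper bound of the range: 2.0
```

Writing the formula directly in terms of L and U avoids having to look up the mean and standard deviation separately for each lab test.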

Based on the 68-95 rule, about 68% of healthy individuals would have a value closer to the mid-range than this one. This paper reads that figure as 68% certainty that the null hypothesis can be rejected, which means we are 68% certain the patient has a problem that deserves intervention. Do we wait until we are more than 95% certain the patient has a problem, or do we begin mild interventions sooner rather than wait for the patient to cross the line into 95% certainty of abnormality? Where do we draw the line between intervention and watchful waiting?

Assumptions

The following assumptions are more or less accurate - they are not perfect. However, their usefulness is highlighted by a short story:


Two hikers in the woods encountered a bear, which began to chase them. As they ran, the first hiker gasped, "It is no use - we cannot outrun the bear!" To which the second hiker grunted, "I do not have to outrun the bear - I only have to outrun you!"

The story's moral is that the analysis presented here better approximates reality than conventional allopathic medicine, even if imperfect. Therefore, we expect a better, if not perfect, patient response to treatment.

Optimality assumption

In the absence of any specific information to the contrary, we assume that the optimal value for a lab value is the center of the reference range, where Z = 0. In other words, the center of "normal" = "optimal." For example, in the thyroid system, a patient is in optimum balance when Z(TSH) = Z(T4) = Z(fT4) = Z(T3) = Z(fT3) = Z(rT3) = 0.

Conversion proportionality assumption

In the absence of any specific information to the contrary, we assume that for a process in which one precursor is converted through one or more steps to a product, if the conversion pathway proceeds at a normal rate, then the Z value of the precursor should be proportional to the Z value of the product. Note that if Z(precursor) > Z(product), it may imply either an abnormally active subsequent step is siphoning off the product or the conversion is impaired. For example, in the thyroid pathway, Z(T4) should equal Z(T3) and also equal Z(rT3) in the case of "normal rate of conversion."

Production proportionality assumption

In the absence of any specific information to the contrary, we assume that if a process in which a control substance causes the production of a product proceeds at a normal rate, the Z value of the control substance should be equal to the Z value of the product. For example, in the thyroid pathway, Z(TSH) should equal Z(T4).

Applications


Consider the thyroid metabolic pathway below, with Z-values for lab tests as shown:


Z(TSH) = +3
Z(T4) = +1
Z(T3) = -3
Z(rT3) = 0

This patient is hypothyroid, but why?

  • Z(TSH) > Z(T4), so we suspect the thyroid gland is underperforming.
  • Z(T4) > Z(T3), so we suspect that conversion from T4 to T3 is underperforming.
  • Z(T3) < Z(rT3), so we suspect the rT3 pathway dominates rather than the T3 pathway.
All three of these issues need to be addressed in the treatment plan.

Now consider:


Z(TSH) = -1.9
Z(T4) = -1.5
Z(T3) = -1.5
Z(rT3) = +1.9

By conventional standards, this patient is euthyroid but has clinical symptoms. Why?

  • Z(TSH) < Z(T4), so there is no evidence that the thyroid gland is underperforming - it is not being stimulated.
  • Z(T4) = Z(T3), so there is no evidence of a problem converting T4 to T3.
  • Z(T3) << Z(rT3), so we suspect that the rT3 pathway is dominating rather than the T3 pathway.
  • Z(TSH) << Z(rT3), so we suspect that rT3 is suppressing TSH via negative feedback.
In this case, we need to reduce rT3, either by supplementing with exogenous T3 or nutritional support for the endogenous conversion of T4 to T3 (e.g., selenium and other micronutrients).
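The comparisons used in the two examples above can be sketched as a simple rule check. This is an illustration of the reasoning only, not a diagnostic tool: the function name thyroid_pattern_flags and the large_gap threshold used to stand in for "much less than" (<<) are assumptions that do not come from the text.

```python
def thyroid_pattern_flags(z, large_gap=2.0):
    """Return the suspicions suggested by comparing thyroid Z-scores.

    z is a dict with keys "TSH", "T4", "T3", and "rT3".
    large_gap is a hypothetical threshold standing in for "<<".
    """
    flags = []
    if z["TSH"] > z["T4"]:
        flags.append("thyroid gland may be underperforming")
    if z["T4"] > z["T3"]:
        flags.append("T4-to-T3 conversion may be underperforming")
    if z["T3"] < z["rT3"]:
        flags.append("rT3 pathway may be dominating")
    if z["rT3"] - z["TSH"] >= large_gap:
        flags.append("rT3 may be suppressing TSH")
    return flags


first_patient = {"TSH": 3.0, "T4": 1.0, "T3": -3.0, "rT3": 0.0}
second_patient = {"TSH": -1.9, "T4": -1.5, "T3": -1.5, "rT3": 1.9}
print(thyroid_pattern_flags(first_patient))
print(thyroid_pattern_flags(second_patient))
```

For the first patient the sketch reports the gland, conversion, and rT3-dominance suspicions; for the second it reports rT3 dominance and possible TSH suppression, matching the bullet points above.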

Extension to Fuzzy Logic and Probabilistic Reasoning

Since a Z score of ±1 corresponds to a 68% probability that there is an effect (the null hypothesis fails), and a Z score of ±2 corresponds to a 95% probability that there is an effect (the null hypothesis fails), we can extend our reasoning to propose that if the difference between two Z scores (e.g. Z(TSH) and Z(T4)) equals 1, then there is a 68% probability (P) that the difference is statistically significant. Similarly, if the difference equals 2, there is a 95% probability (P) that the difference is statistically significant.

Boolean logic is used in the conventional medical view of reference ranges to reduce clinical decision-making to the question of whether it is true or false that each lab value is within the reference range. The functional medical view of reference ranges allows the comparison of non-binary values of different lab parameters using Z-scores. While this is an improvement, it has a limitation: Z-scores Z(a) and Z(b) can be compared to establish that Z(a) is less than, equal to, or greater than Z(b), but the significance of the comparison is not defined.

The next step in the development of the theory of thyroid statistics is to develop statistical functions that convert comparisons of Z values into probability functions. For example, if Z(a) = -1 and Z(b) = +0.5, what is the probability that Z(a) < Z(b)? What is the probability that Z(a) = Z(b)? What is the probability that Z(a) > Z(b)? What is the probability that Z(a) is less than the reference range? What is the probability that Z(a) exceeds the reference range? These probabilities are non-zero, but some are much smaller than others.

Based on the assumption that our reference range has a normal distribution, standard statistical calculations should allow all of these functions to be developed using the Excel function NORMSDIST, which calculates the Standard Normal Cumulative Distribution Function for a supplied value.
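For readers working outside Excel, the same cumulative distribution function is available in the Python standard library as statistics.NormalDist. The sketch below is a starting point only: the helper names normsdist and fraction_within are hypothetical, and it does not attempt the comparison probabilities discussed above, which would need additional modeling assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def normsdist(z):
    """Standard normal cumulative distribution, like Excel's NORMSDIST."""
    return STANDARD_NORMAL.cdf(z)


def fraction_within(z):
    """Fraction of a standard normal population lying within +/- |z|."""
    return 2.0 * normsdist(abs(z)) - 1.0


for z in (1.0, 2.0):
    print(f"Z = {z}: cdf = {normsdist(z):.4f}, within = {fraction_within(z):.4f}")
```

For Z = 1 and Z = 2 the "within" fractions are approximately 0.6827 and 0.9545, which is the 68-95 rule used throughout this paper.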

What if the Probability Density Curve is not Normal?

Standard statistical methods depend on the assumption that the probability density function (PDF) is a normal bell curve. What if the PDF is distorted by skewness, kurtosis, or other problems? Numerical mathematics can transform these PDFs into normal forms. However, the biggest problem is that raw data is generally unavailable to determine the shape of the distorted PDFs. Presumably, the labs providing the tests have the required data, which they used to establish their reference ranges. However, can we find labs that are willing to share this information?

This is a Draft for Public Comment

Please send comments and constructive feedback to orville2@DrWeyrich.com

Help wanted! (added 01/12/2025)

I need access to anonymized thyroid lab data for TSH, T4, T3, free T4, free T3, and reverse T3 (or organic acid test data) suitable for determining reference ranges, mean, and standard deviation, and for evaluating possible deviations from normality. Is anyone with access to such data open to collaborating with me?

Going a step further, it would be even better to get access to paired data records containing multiple parameters for each (anonymous) individual, possibly including limited demographic data such as sex, age, nutritional status, and possibly week of pregnancy, in order to evaluate correlations.

Literature Review

Additional reviews will be added as they become available.

References