**Statistical Glossary**

A

**Absolute value**

Represents a value by its magnitude (size) alone, without considering its sign (negative or positive).

**Accelerated life test**

Models product performance (usually failure times) at elevated stress levels so that you can extrapolate the results back to normal conditions. The goal of an accelerated life test is to speed up the failure process to obtain timely information on products with a long life.

**Acceptable quality level (AQL)**

The poorest level of quality from a supplier's process that would be considered acceptable as a process average.

**Acceptance region plot**

In variables acceptance sampling, when both specifications are given and the standard deviation is unknown, this plot allows you to see the region of sample means and sample standard deviations for which you will accept a lot.

**Acceptance sampling**

Inspection plan that enables you to accept or reject a particular lot of incoming material based on the data from a representative sample.

**Accuracy and precision**

Two categories of measurement error. Accuracy refers to how close measurements are to the "true" value, while precision refers to how close measurements are to each other. Accuracy and precision are usually evaluated through various measurement system analysis tools, such as Gage R&R Studies.

**Accuracy Measures**

Use these statistics to compare the fits of different forecasting and smoothing methods (time series analysis).

**Arccosine**

Determines the angle corresponding to a given cosine. Arccosine is defined in radians from 0 to π.

**Anderson-Darling statistic**

Measures how well the data follow a particular distribution. The better the distribution fits the data, the smaller this statistic will be.

**Additive model**

A data model in which the effects of individual factors are differentiated and added together to model the data.

**Adjusted p-value**

Used for multiple comparisons in General Linear Model ANOVA, the adjusted p-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different. If the adjusted p-value is less than alpha, then you reject the null hypothesis. The adjustment limits the family error rate to the alpha level you choose. If you use a regular p-value for multiple comparisons, then the family error rate grows with each additional comparison. The adjusted p-value also represents the smallest family error rate at which a particular null hypothesis will be rejected.

**Adjusted residuals for counts**

Adjusted residuals are the raw residuals divided by the estimated standard deviation of the observed count.

**Adjusted R²**

Percentage of response variable variation that is explained by its relationship with one or more predictor variables, adjusted for the number of predictors in the model. This adjustment is important because the R² for any model never decreases when a new term is added, so a model with more terms may appear to have a better fit simply because it has more terms, even when some of the increase in R² is due to chance alone.
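The adjustment can be sketched with the widely used textbook formula, 1 − (1 − R²)(n − 1)/(n − p − 1). This is an illustration rather than a statement of how any particular package computes it; the function name and the convention that `n_predictors` excludes the intercept are assumptions made for this example.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Return adjusted R-squared as a proportion (not a percentage).

    r2 is the ordinary R-squared as a proportion between 0 and 1,
    n_obs is the number of observations, and n_predictors counts the
    model terms excluding the intercept (an assumption of this sketch).
    """
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than model terms")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / residual_df


# Same R-squared, more predictors: the adjusted value drops.
few_terms = adjusted_r2(0.80, n_obs=21, n_predictors=4)
many_terms = adjusted_r2(0.80, n_obs=21, n_predictors=10)
print(few_terms, many_terms)
```

Holding R² fixed while adding predictors lowers the adjusted value, which is the penalty the definition describes.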

**Alias structure and confounding**

Effects that are aliased, or confounded, cannot be estimated separately from one another. Confounding occurs when you use a fractional factorial design, because you do not run all factor level combinations.

**Alpha (α)**

Used in hypothesis testing, alpha (α) is the maximum acceptable level of risk for rejecting a true null hypothesis (type I error) and is expressed as a probability ranging between 0 and 1. Alpha is frequently referred to as the level of significance.

**Alpha for axial points**

The distance of each axial point (also called star point) from the center in a central composite design. This value (α), along with the number of center points, determines whether a design can be orthogonally blocked and is rotatable.

**Alternative Hypothesis: H1 or HA**

States that the population parameter is different from the value of the population parameter in the null hypothesis.

**Analysis of covariance (ANCOVA)**

An extension of analysis of variance (ANOVA) that allows you to model and adjust for input variables that were measured but not randomized or controlled in the experiment. ANCOVA tests whether factors have an effect after removing the variance due to covariates.

**Analysis of means (ANOM)**

A graphical analog to ANOVA that tests the equality of population means. The graph displays each factor level mean, the overall mean, and the decision limits. If a point falls outside the decision limits, then evidence exists that the factor level mean represented by that point is significantly different from the overall mean.

**Analysis of variance (ANOVA)**

Tests the hypothesis that the means of two or more populations are equal. ANOVAs evaluate the importance of one or more factors by comparing the response variable means at the different factor levels.

**Analysis of variance (ANOVA) table**

The main output from an analysis of variance study arranged in a table. Lists the sources of variation, their degrees of freedom, the total sum of squares, and the mean squares. The analysis of variance table also includes the F-statistics and p-values.

**Analyze variability**

Identifies factor settings that produce less variable results in 2-level factorial designs with repeat or replicate measurements.

**Average outgoing quality (AOQ) curve**

Approximates the relationship between the quality of the incoming material and the quality of the outgoing material, assuming that rejected lots will be 100% inspected and defective items will be reworked and inspected again (rectifying inspection).

**Area graph**

Use to evaluate contributions to a total over time. Area graphs display multiple time series stacked on the y-axis against equally spaced time intervals on the x-axis. Each line on the graph is the cumulative sum so one can see each series' contribution to the sum and how the composition of the sum changes over time.

**Arithmetic Mean**

Describes an entire set of observations with a single value representing the center of the data. The mean (arithmetic average) is the sum of all the observations divided by the number of observations.

**ARIMA**

ARIMA (autoregressive integrated moving average) is a time series model used to describe time series behavior and to make forecasts.

**Average total inspection (ATI) curve**

Approximates the relationship between the quality of the incoming material and the number of items that need to be inspected, assuming that rejected lots will be 100% inspected and defective items will be reworked and inspected again (rectifying inspection).

**Attribute**

Refers to a quality characteristic that meets or does not meet product specification. These characteristics can be categorized and counted.

**Attribute agreement analysis**

Use to evaluate the agreement of subjective nominal or ordinal ratings by multiple appraisers and to determine how likely your measurement system is to misclassify a part. Also commonly known as an Attribute Gage R&R Study.

**Attribute gage study**

Use to examine the bias and repeatability of an attribute measurement system.

**Attributes charts**

Control charts that plot nonconformities (defects) or nonconforming units (defectives). A nonconformity refers to a quality characteristic and nonconforming refers to the overall product.

**Autocorrelation**

Autocorrelation is the correlation between observations of a time series separated by k time units. The plot of autocorrelations is called the autocorrelation function or ACF.
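As a rough illustration, the lag-k sample autocorrelation can be computed with the common estimator that divides the lagged cross-products by the total sum of squared deviations. The function name is hypothetical, and other estimators (for example with different divisors) exist.

```python
def autocorrelation(series, k):
    """Sample autocorrelation of `series` at lag k (common ACF estimator)."""
    n = len(series)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < len(series)")
    mean = sum(series) / n
    total_ss = sum((x - mean) ** 2 for x in series)
    if total_ss == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Cross-products of observations that are k time units apart.
    lagged = sum((series[t] - mean) * (series[t + k] - mean)
                 for t in range(n - k))
    return lagged / total_ss


data = [1, 2, 3, 4, 5]
acf = [autocorrelation(data, k) for k in range(3)]  # lags 0, 1, 2
print(acf)
```

Plotting these values against the lag gives the autocorrelation function (ACF) mentioned above.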

**Average p**

The proportion of items in your process that are defective, assuming you have collected enough samples to have a stable estimate. The center line of a P chart is drawn at the Average P.

B

**Balanced design**

A balanced design is one with equal numbers of observations at each combination of your treatment levels.

**Bar chart**

Use to visually compare bar heights of category measures. Bar charts can be made of category tallies, of different statistics by categories, or of summary values. The height of the bars signifies the magnitude of the values.

**Bartlett's test**

A common test of equal variances. Use Bartlett's test when the data come from normal distributions; it is not robust to departures from normality.

**Base design**

In factorial or response surface experiments, the initial design, or starting point, from which you can build your final design.

**Box-Behnken design**

A type of response surface design that does not contain an embedded factorial or fractional factorial design. Box-Behnken designs have treatment combinations that are at the midpoints of the edges of the experimental space and require at least three factors.

**Bell-shaped curve**

The shape of the normal distribution: a curve that is symmetric about its mean. The mean (μ) and the standard deviation (σ) are the two parameters that define the normal distribution. The mean is the peak or center of the bell-shaped curve. The standard deviation determines the spread in the data.

**Bernoulli distribution**

Observed when a random process has exactly two outcomes, event or nonevent. For example, in the quality field, a product can be classified as good or bad.

A random variable X follows a Bernoulli distribution if

P(X = 1) = p and P(X = 0) = 1 – p

where p is the probability that the outcome is a success (the event).

The Bernoulli distribution is the building block for many other distributions, such as the binomial, geometric, and negative binomial distributions.

**Beta distribution**

Use for random variables between 0 and 1. The beta distribution is often used to model the distribution of order statistics and to model events which are defined by minimum and maximum values. It is also used in Bayesian statistics.

**Bias and linearity**

Bias examines the difference between the observed average measurement and a reference or master value. It answers the question: "How accurate is my gage when compared to a reference value?" Linearity examines how accurate your measurements are through the expected range of the measurements. It answers the question: "Does my gage have the same accuracy across all reference values?"

**Binomial Distribution**

Binomial distributions are associated with data that can have one of two values (pass/fail, go/no-go). The binomial distribution characterizes defectives data, which are nonconformities in products or services that render the product or service unusable.

**Block**

A group of experimental runs conducted under relatively homogeneous conditions. Although every measurement should be taken under consistent experimental conditions (other than those that are being varied as part of the experiment), this is not always possible. Use blocks in experimental design and analysis to minimize bias and error variance due to nuisance factors.

**Bonferroni confidence intervals**

Method for controlling the simultaneous confidence level for an entire set of confidence intervals. It is important to consider the simultaneous confidence level when examining multiple confidence intervals because the chance that at least one of the confidence intervals does not contain the population parameter is greater for a set of intervals than for any single interval. To counter this higher error rate, Bonferroni's method adjusts the confidence level for each individual interval so that the resulting simultaneous confidence level is at least the value you specify.
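The adjustment itself is simple arithmetic: the family error rate is split evenly across the intervals. A minimal sketch, with a hypothetical function name:

```python
def bonferroni_individual_level(simultaneous_level: float,
                                n_intervals: int) -> float:
    """Confidence level to use for each interval under Bonferroni's method."""
    if not 0.0 < simultaneous_level < 1.0:
        raise ValueError("simultaneous_level must be between 0 and 1")
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    family_alpha = 1.0 - simultaneous_level
    # Each interval gets an equal share of the family error rate.
    return 1.0 - family_alpha / n_intervals


# Five intervals with a 95% simultaneous level: each is built at about 99%.
print(bonferroni_individual_level(0.95, 5))
```

Because the method is conservative, the simultaneous confidence level actually achieved is at least the specified value.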

**Boxplot**

A graphical summary of the distribution of a sample that shows its shape, central tendency, and variability.

**Box-Cox transformation**

Many analyses require an assumption of normality. When data are not normal, a Box-Cox transformation can sometimes be applied to make the data approximately normal so that you can complete your analysis.

**Burt table**

A symmetric matrix used to help visualize and analyze relationships between categorical variables. The table serves as the foundation for multiple correspondence analysis and is often used in marketing analysis to develop and interpret customer profiles.

C

**C chart**

Plots the number of nonconformities or defects when subgroup sizes are equal. It is possible for a unit to have one or more nonconformities but still be acceptable in function and performance.

**Calculated line**

A line that marks a target or theoretical relationship between variables on the x and y-axes. You can then compare your actual graphed values against this line as you would any other reference line. The calculated line is plotted along these coordinates. These coordinates may be derived from a function or some other formula.

**Capability**

The ability to produce products or provide services that meet specifications defined by the customer's needs.

**CCpk**

An index of potential process capability, calculated with data from subgroups in your study. CCpk compares the process spread to the distance between a specification limit and one of the following values:

- The process target, if one is specified, or
- The center of the specification spread, if you provide both the upper and lower specification limits, or
- The process average, otherwise

Compare your CCpk value to a benchmark to determine whether to improve your process; many industries use a benchmark value of 1.33.

**Cp and Pp**

Capability indices that measure whether a process is capable of meeting specifications by calculating a ratio between the specification spread and the process spread. In general, the higher your Cp and Pp values, the more capable your process. To calculate Cp and Pp, you must know both the upper and lower specification limits.
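The ratio can be sketched as the specification spread divided by a six-sigma process spread. The function name is hypothetical. By common convention, Cp uses a within-subgroup estimate of the standard deviation and Pp uses the overall standard deviation; how those estimates are obtained is left to the caller here.

```python
def spread_ratio(lsl: float, usl: float, sigma: float) -> float:
    """Specification spread divided by a 6-sigma process spread.

    Pass a within-subgroup sigma for a Cp-style index or an overall
    sigma for a Pp-style index (an assumption of this sketch).
    """
    if usl <= lsl:
        raise ValueError("usl must be greater than lsl")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (usl - lsl) / (6.0 * sigma)


# Specification limits 9.0 to 11.0 with a standard deviation of 0.25.
print(spread_ratio(lsl=9.0, usl=11.0, sigma=0.25))
```

The index ignores where the process is centered, which is why both specification limits are required and why it is usually read alongside Cpk or Ppk.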

**Cpk, CPU, and CPL**

Measures of potential process capability, calculated with data from the subgroups in your study. They measure the distance between the process average and the specification limits, compared to the process spread:

- CPL measures how close the process mean is running to the lower specification limit
- CPU measures how close the process mean is running to the upper specification limit
- Cpk equals the lesser of CPU and CPL.
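The three indices in the list above can be sketched with the usual three-sigma form. The function name is hypothetical, and the mean and within-subgroup standard deviation are assumed to be estimated elsewhere.

```python
def cpu_cpl_cpk(mean: float, sigma: float, lsl: float, usl: float):
    """Return (CPU, CPL, Cpk) from a process mean and standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    cpu = (usl - mean) / (3.0 * sigma)  # distance to the upper limit
    cpl = (mean - lsl) / (3.0 * sigma)  # distance to the lower limit
    return cpu, cpl, min(cpu, cpl)      # Cpk is the lesser of the two


# A process running closer to the upper limit than to the lower limit.
cpu, cpl, cpk = cpu_cpl_cpk(mean=10.25, sigma=0.25, lsl=9.0, usl=11.0)
print(cpu, cpl, cpk)
```

In this example the mean sits nearer the upper specification limit, so CPU is the smaller index and determines Cpk.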

**Cpm**

An overall capability index that measures whether the process meets specification and is on target. Cpm compares the specification spread to the spread of your data, taking into account the data's deviation from the target value instead of its deviation from the process mean. Large distances between the target and your observations result in a small Cpm value. As your process improves and approaches the target, the value of the Cpm index increases.
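One common textbook form of Cpm divides the specification spread by six times the root mean squared deviation from the target. The sketch below uses that form with a divisor of n. Software packages may use a different divisor or tolerance multiplier, so treat the function as an illustration under those assumptions.

```python
import math


def cpm(data, lsl: float, usl: float, target: float) -> float:
    """Cpm using the root mean squared deviation from the target."""
    if len(data) == 0:
        raise ValueError("data must not be empty")
    mean_sq_dev = sum((x - target) ** 2 for x in data) / len(data)
    if mean_sq_dev == 0:
        raise ValueError("all observations equal the target")
    return (usl - lsl) / (6.0 * math.sqrt(mean_sq_dev))


on_target = cpm([9.0, 11.0], lsl=7.0, usl=13.0, target=10.0)
off_target = cpm([10.0, 12.0], lsl=7.0, usl=13.0, target=10.0)
print(on_target, off_target)
```

The second data set has the same spread but is shifted away from the target, so its Cpm is smaller.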

**Group**

A collection of subjects or units defined by a common property.

**Cauchy distribution**

A bell-shaped curve, similar to a normal distribution. However, tails approach zero less quickly than do those of the normal distribution and the mean and standard deviation of the Cauchy distribution are undefined. The Cauchy distribution is also known as the Lorentz or Breit-Wigner distribution.

**Cause-and-effect diagram (C&E diagram)**

Brainstorming tool that allows you to investigate various causes that influence a specific effect. Developing a cause-and-effect diagram with your team can help you identify areas where there may be problems, and compare the relative importance of different causes.

**Central composite design**

The most commonly used response surface experimental design. Central composite designs consist of a factorial or fractional factorial design with center points, augmented with a group of axial (or star) points that allow estimation of curvature.

**Cumulative distribution function**

Gives the cumulative probability associated with a distribution. Specifically, the cumulative distribution function (CDF) gives the area under the probability density function, up to the value you specify.
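For a concrete case, Python's standard library exposes the CDF of the normal distribution through `statistics.NormalDist`. The sketch below assumes a standard normal distribution purely for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the density curve up to 1.96.
p_below = standard_normal.cdf(1.96)

# Probability of falling between two values is a difference of CDFs.
p_between = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(round(p_below, 4), round(p_between, 4))
```

The first value is close to 0.975 and the second is close to 0.68, the familiar share of a normal population within one standard deviation of the mean.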

**Empirical CDF plot**

Use to evaluate the fit of a distribution to your data, estimate percentiles, and compare different sample distributions.

**Censoring**

Occurs when the exact failure times of some test units are unknown.

**Center line**

A horizontal reference line on a control chart that is the average value of the charted quality characteristic. If a process is in control, the points will randomly vary around the center line.

**Cluster centroid**

Middle of a cluster. A centroid is a vector containing one number for each variable, where each number is the mean of a variable for the observations in that cluster.

**Center Points**

Experimental runs with all factor levels set halfway between the low and high settings.

**Corner points**

Experimental runs with all factors set at their highest or lowest level.

**Central limit theorem**

A fundamental theorem of probability and statistics, it states that the distribution of X-bar, the mean of a random sample from a population with finite variance, is approximately normally distributed when the sample size is large, regardless of the shape of the population's distribution.

**Central tendency**

The center or area where most values in a data set cluster. Central tendency can be described by a number of different statistics, like the mean, trimmed mean, median, or mode.

**Chi-square statistic**

A measure of divergence between your data's distribution and an expected or hypothesized distribution of your choice.
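A frequently used version is Pearson's chi-square statistic, which sums the squared differences between observed and expected counts, each scaled by the expected count. The sketch assumes that form; the function name is hypothetical.

```python
def pearson_chi_square(observed, expected) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts in three categories versus a uniform expectation.
statistic = pearson_chi_square([10, 20, 30], [20, 20, 20])
print(statistic)
```

A statistic of zero means the observed counts match the expected counts exactly; larger values indicate greater divergence.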

**Chi-square (χ²) distribution**

A common distribution used in tests of statistical significance to:

- Test how well a sample fits a theoretical distribution. For example, you can use a goodness-of-fit test to determine whether your sample data fit a Poisson distribution.
- Test the independence between categorical variables. For example, a manufacturer wants to know if the occurrence of four types of defects (missing pin, broken clamp, loose fastener, and leaky seal) is related to shift (day, evening, overnight).

**Chi-square test**

A family of hypothesis tests that compare the observed distribution of your data to their expected distribution under the null hypothesis.

**Confidence interval**

A range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter.

**Coefficients and standardized coefficients (PLS)**

Estimates of the population regression parameters in the PLS model. The coefficients are used with the predictors to calculate the fitted value of the response variables. In PLS, standardized coefficients indicate the importance of each predictor in the model and correspond to the standardized x- and y-variables. The coefficient matrix (dimension p x r, where p = number of predictors and r = number of responses) is calculated from the x-weights and x-loadings.

**Cronbach's alpha**

As a measure of internal consistency, Cronbach's alpha assesses how reliably survey or test items that are designed to measure the same construct actually do so. Higher values of Cronbach's alpha suggest higher internal consistency.

**Coefficient of Determination**

Percentage of response variable variation that is explained by its relationship with one or more predictor variables, usually written R². In general, the higher the R², the better the model fits your data. R² is always between 0 and 100%. It is also known as the coefficient of multiple determination in multiple regression.

**Coefficient of variation**

A measure of relative variability, equal to the ratio of the standard deviation to the mean. It is useful for comparing the dispersion of populations with significantly different means.
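A minimal sketch of the calculation follows. It assumes the sample standard deviation (divisor n − 1), which the definition does not specify, and uses hypothetical data.

```python
from statistics import mean, stdev


def coefficient_of_variation(data) -> float:
    """Sample standard deviation divided by the mean."""
    center = mean(data)
    if center == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(data) / center


# Two hypothetical samples with the same spread but different means.
small_scale = coefficient_of_variation([8.0, 10.0, 12.0])
large_scale = coefficient_of_variation([98.0, 100.0, 102.0])
print(small_scale, large_scale)
```

Both samples have the same standard deviation, but relative to their means the first is ten times as variable.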

**Coefficients**

The numbers by which the variables in an equation are multiplied.

**Combination**

A selection of objects from a group, when the order of the selection does not matter.
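The number of combinations of k objects chosen from n is n! / (k!(n − k)!). Python's standard library provides this count directly and can also enumerate the selections; the part labels below are hypothetical.

```python
import math
from itertools import combinations

parts = ["A", "B", "C", "D"]

# Number of ways to choose 2 of the 4 parts when order does not matter.
n_pairs = math.comb(len(parts), 2)

# The selections themselves; ("A", "B") appears once, ("B", "A") does not.
pairs = list(combinations(parts, 2))

print(n_pairs, pairs)
```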

**Common causes and special Causes**

Common causes refer to occurrences that contribute to the natural variation in any process. Special causes are unusual occurrences that are not normally (or intentionally) part of the process that unsettle its stability. While some degree of common cause variation will naturally occur in any process, it's important to identify and attempt to eliminate special causes of variation.

**Comparison values for median polish**

An exploratory data analysis (EDA) tool that helps you choose an appropriate transformation for data that is not well-described by the median polish additive model. The comparison value for an observation in row i and column j is:

(row effect i) * (column effect j) / (common effect)
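The formula above can be applied to every cell at once. The sketch assumes the row effects, column effects, and common effect have already been obtained from a median polish fit; the function name and example effects are hypothetical.

```python
def comparison_values(row_effects, column_effects, common_effect: float):
    """Table of (row effect i) * (column effect j) / (common effect)."""
    if common_effect == 0:
        raise ValueError("common effect must be nonzero")
    return [[r * c / common_effect for c in column_effects]
            for r in row_effects]


table = comparison_values(row_effects=[2.0, -2.0],
                          column_effects=[3.0, 1.0],
                          common_effect=4.0)
print(table)
```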

**Components (DOE)**

The ingredients that make up a mixture. By performing a design of experiment (DOE), you can determine the relative proportion of each component that will optimize the mixture (the response). Mixture experiments commonly occur in food-processing, refining, or the manufacturing of chemicals.

**Concordant pairs and discordant pairs**

Used to describe the relationship between sets of paired observations. A pair is concordant if the pair of observations are in the same direction. A pair is discordant if the pair of observations are in opposite directions.
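Counting the pairs can be sketched by comparing every two observations: the pair is concordant when x and y move in the same direction and discordant when they move in opposite directions. Pairs tied on either variable are counted separately in this sketch, which is one of several possible conventions.

```python
from itertools import combinations


def count_pairs(x, y):
    """Return (concordant, discordant, tied) counts for paired data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables change in the same direction
        elif direction < 0:
            discordant += 1   # the variables change in opposite directions
        else:
            tied += 1         # tie on x, on y, or on both
    return concordant, discordant, tied


print(count_pairs([1, 2, 3], [1, 3, 2]))
```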

**Confidence bands**

Lines on a probability plot or fitted line plot that depict the upper and lower confidence bounds for all points on a fitted line within the range of data.

**Confidence level**

The percentage of confidence intervals, constructed from repeated samples, that would contain the unknown population parameter. A confidence interval is a range of values, derived from sample statistics, that is likely to contain the value of that parameter. Because of their random nature, it is unlikely that two samples from a given population will yield identical confidence intervals. But if you repeated your sample many times, a certain percentage of the resulting confidence intervals would contain the unknown population parameter. That percentage is the confidence level of the interval.

**Confounding**

Effects that cannot be estimated separately from one another are said to be confounded. Confounding occurs when you use a fractional factorial design, because you do not run all factor level combinations.

**Connect line**

Joins data points, mean values, or median values on a graph so you can:

- View trends over time
- Make comparisons

**Contingency table**

A table that tallies observations according to multiple categorical variables. The tables' rows and columns correspond to these categorical variables.

**Continuous distribution**

Describes the probabilities of the possible values of a continuous random variable. A continuous random variable is a random variable with a set of possible values (known as the range or support) that is infinite and uncountable.

**Contour plot**

Use to explore the potential relationship between three variables. Contour plots display the three-dimensional relationship in two dimensions, with x- and y-factors (predictors) plotted on the x- and y-scales and response values represented by contours. A contour plot is like a topographical map in which x-, y-, and z-values are plotted instead of longitude, latitude, and elevation.

**Control chart**

Plots your process data in time-ordered sequence to help identify common cause and special cause variation.

**CUSUM chart**

A type of time-weighted control chart that displays the cumulative sums (CUSUMs) of the deviations of each sample value from the target value.
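The plotted quantity can be sketched as a running sum of deviations from the target. This shows only the plain cumulative sum; tabular CUSUM schemes that add a slack value and decision limits are not covered, and the sample values are hypothetical.

```python
from itertools import accumulate


def cusum(values, target: float):
    """Cumulative sums of the deviations of each value from the target."""
    return list(accumulate(v - target for v in values))


samples = [10, 12, 9, 11]
print(cusum(samples, target=10))
```

A sustained drift away from the target shows up as a steadily rising or falling sequence, which is what makes the chart sensitive to small shifts.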

**Control factor**

Process or design parameters that you can control. In Taguchi designed experiments, the objective is to identify control factor settings that minimize the variability produced by uncontrollable factors, called noise factors.

**Control limits**

Horizontal lines located above and below the center line that are used to judge whether a process is out of control. The upper and lower control limits are based on the expected random variation in the process.

**Cook's distance (D)**

Measures the influence of an observation on the set of regression coefficients in a regression or ANOVA model. Influential observations have a disproportionate impact on the model and can produce misleading results.

**Correlation**

A measure of linear association between two variables.

**Cosine**

The cosine of an acute angle of a right triangle is the ratio of the adjacent leg to the hypotenuse (the longest side, opposite the right angle).

**Cost optimization**

Simultaneously optimizes cost and one or more responses to find the factor settings that are both cost-effective and produce acceptable values for the responses. Often the factor settings that produce the best results are the most expensive to run. Cost optimization finds a compromise between minimizing cost and optimizing the responses.

**Counts data, Frequency data**

Synonymous terms that refer to a type of summary data that is organized with one or more columns of unique values or labels listed only once and a corresponding frequency column indicating the number of occurrences of each item.

**Cumulative Count (Cumulative N)**

A running total of the number of observations in successive categories.

**Covariance**

Measure of the linear relationship between two variables. Covariance is not standardized, unlike the correlation coefficient. Therefore, covariance values can range from negative infinity to positive infinity. Positive covariance values indicate that above average values of one variable are associated with above average values of the other variable and below average values are similarly associated. Negative covariance values indicate that above average values of one variable are associated with below average values of the other variable.
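A minimal sketch of the sample covariance follows, assuming the usual n − 1 divisor; the function name and data are hypothetical.

```python
def sample_covariance(x, y) -> float:
    """Sample covariance of paired observations (divisor n - 1)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n - 1)


rising = sample_covariance([1, 2, 3], [2, 4, 6])   # values move together
falling = sample_covariance([1, 2, 3], [6, 4, 2])  # values move oppositely
print(rising, falling)
```

Rescaling either variable rescales the covariance, which is why it is harder to interpret than the standardized correlation coefficient.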

**Covariate**

A continuous predictor variable. In a DOE, a covariate is typically used to account for the effect of a variable that is observable, but difficult to control. A covariate is entered into the model to reduce the error variance.

**Cox-Snell residuals**

A type of standardized residuals used in reliability analysis. A residual is the difference between an observed data point and a predicted or fitted value. A Cox-Snell residual takes into account the distribution and estimated parameters from the lifetime regression model.

**Cramer's V²**

Cramer's V² measures association between two variables. A value of zero indicates that there is no association. A value of one indicates that there is a perfect association.

**Critical value**

In hypothesis testing, a critical value is a point on the test distribution that is compared to the observed value of the test statistic to determine whether to reject the null hypothesis. If your sample produces a test statistic that exceeds a critical value in magnitude, you can declare statistical significance and reject the null hypothesis. Critical values correspond to α-levels, so their values become fixed when you choose the test's α-level.

**Crossed factors**

Two factors are crossed when each level of one factor occurs in combination with each level of the other factor.

**Cross-validated residuals**

Measure the model's predictive ability. Cross-validated residuals are used to calculate the PRESS statistic. In PLS, the cross-validated residuals are the differences between the actual responses and the cross-validated fitted values. The cross-validated residual value varies based on how many observations are omitted each time the model is recalculated during cross-validation.

**Cube plot**

Use a cube plot to see the factors and combination of settings used in your design.

**Cumulative percent**

The sum of all the percentage values up to that category, as opposed to the individual percentages of each category.
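As a short illustration, cumulative percents can be built from category counts with a running total. The function name and counts are hypothetical.

```python
from itertools import accumulate


def cumulative_percent(counts):
    """Running percentage of the total through each successive category."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [100.0 * running / total for running in accumulate(counts)]


# Counts for three ordered categories.
print(cumulative_percent([2, 3, 5]))
```

The last value is always 100, because the final category completes the total.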

**Cumulative %Defective plot**

This graph plots the mean %Defective against the ordered samples to show how the estimate changes as you collect more samples.

**Cumulative DPU plot**

In capability analysis for Poisson data, this chart helps verify that you have collected data from enough samples to calculate a stable estimate of the mean number of defects per unit of observation (DPU). This graph plots the mean DPU against the ordered samples to show how the estimate changes as you collect more samples.

**Cumulative failure plot**

Displays the cumulative failure percents or probabilities versus time. The cumulative failure plot describes product reliability in terms of when the product fails.

**Cumulative failure rate**

The cumulative number of failures at a particular time divided by the time.

D

**Data means and fitted means**

Used in the main effects, interaction, and cube plots (Factorial Plots) for factorial designs. Data means are the raw response variable means for each factor level combination whereas fitted means use least squares to predict the mean response values of a balanced design. Therefore, the two types of means are identical for balanced designs but can be different for unbalanced designs. Fitted means are useful for observing response differences due to changes in factor levels rather than differences due to the disproportionate influence of unbalanced experimental conditions.

**Decomposed T squared value**

On a T squared chart, individual points are composite values that represent multiple variables. It is unclear, however, how much any one variable contributes to the combined point value, and this complicates interpretation of out of control points: perhaps all of the variables contributed to put the point out of control, or maybe it was only one extreme variable that skewed the composite value. To address this, you can choose to store "decomposed" T squared values. Minitab will then print the individual contribution of each variable for all out of control points.

**Defects per unit (DPU)**

DPU is the number of defects in a sample divided by the number of units sampled.

**Defects per opportunity (DPO)**

Defects per opportunity (DPO) is the number of defects in a sample divided by the total number of defect opportunities.

**Defects per million opportunities (DPMO)**

Defects per million opportunities (DPMO) is the number of defects in a sample divided by the total number of defect opportunities multiplied by 1 million. DPMO relates the customer's pain at the opportunity level and is useful because you can compare processes with different complexities.
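
The three defect metrics above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts are hypothetical, and the sketch assumes every unit has the same number of defect opportunities.

```python
def dpu(defects: int, units: int) -> float:
    """Defects per unit: defects divided by the number of units sampled."""
    return defects / units


def dpo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per opportunity: defects divided by total opportunities."""
    return defects / (units * opportunities_per_unit)


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities: DPO scaled by 1 million."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


# Hypothetical sample: 15 defects in 500 units, 4 opportunities per unit.
print(dpu(15, 500))      # 0.03
print(dpo(15, 500, 4))   # 0.0075
print(dpmo(15, 500, 4))  # 7500.0
```

Because DPMO is normalized by the number of opportunities, a simple product and a complex product can be compared on the same scale.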

**Defects and defectives**

Customers expect products and services to meet their specifications. When they don't, a defect or defective is present.

- A defect is any departure from these specifications. A defect does not necessarily mean that the product or service cannot be used; just that it wasn't as intended. For example, if a candle wick is not long enough, the candle cannot be used; but if a waiter greets his table after 5 minutes, the customer can still order and enjoy a meal even though the greeting did not meet expectations.
- A defective is when the whole product or service is considered unacceptable. Each product unit or service experience is either considered defective or not; there are only two choices.

**Degree of lattice**

Determines where (vertices, edges, or faces) points are placed in a mixture design. In a simplex lattice design, points are arranged in a uniform manner (lattice).

**Degrees**

Units of measure used to indicate the size of an angle.

**Degrees of freedom**

The amount of information your data provide that you can "spend" to estimate the values of unknown population parameters, and calculate the variability of these estimates. Degrees of freedom are affected by the sample size and the number of parameters in your model. Increasing your sample size provides more information about the population, and consequently increases the degrees of freedom present in your data. Adding parameters to your model (by increasing the number of terms in a regression equation, for example) "spends" information from your data, and lowers the degrees of freedom available to estimate the variability of the parameter estimates.

**Delta**

The overall change in a value.

**Delta chi-square**

The change in Pearson chi-square due to deleting all the observations with the jth factor/covariate pattern. Observations that are poorly fit by the model have high delta chi-square values. Minitab calculates a delta chi-square value for each distinct factor/covariate pattern.

**Delta deviance**

The change in the deviance statistic due to deleting all the observations with the jth factor/covariate pattern. Minitab calculates a delta deviance value for each distinct factor/covariate pattern.

**Dendrogram**

Illustrates the information in the amalgamation table in the form of a tree diagram.

**Designed experiment**

A series of runs, or tests, in which you purposefully make changes to input variables simultaneously and observe the responses.

**Probability density function (PDF)**

Describes the likelihood of each specific value that a variable can take on.

**Design generators**

Determine how the fraction (or subset of runs) is selected from the full set of runs in a fractional factorial design.

**Design resolution**

Describes the extent to which effects in a fractional factorial design are aliased with other effects. When you run a fractional factorial design, one or more of the effects are confounded, meaning they cannot be estimated separately from one another. In general, you want to use a fractional factorial design with the highest possible resolution for the amount of fractionation required.

**Destructive testing**

A type of non-replicable testing that renders the test part or sample useless.

**Deviance residual**

A measure of how well the observation is predicted by the model. Observations that are poorly fit by the model have high deviance residuals.

**Difference (power and sample size)**

The difference between an actual population parameter and the hypothesized value. Difference is also known as population effect, or simply, effect. Usually, the true population parameter is not known; therefore, samples are taken and a statistical test, such as a t-test or a one-way ANOVA, is used to evaluate whether a difference exists.

**Discrete distribution**

Describes the probability of occurrence for discrete events.

**Discrete Variable**

A variable that can take only a finite number of values between any two values.

**Distances from x-model (PLS)**

Measure how well observations are fitted in the x-space. Distances from the x-model indicate how well observations are described by the x-scores. An observation with a large distance value may also be a leverage point.

**Distances from y-model (PLS)**

Measure how well observations are fitted in the y-space. Distances from the y-model indicate how well observations are described by the y-scores. An observation with a large distance value may also be an outlier.

**Distribution**

The shape, spread, and location of a data set.

**Dotplot**

Use to assess the distribution of continuous data. A dotplot plots each observation as a dot along a number line (x-axis). When values are close or the same, the dots are stacked.

**Double root residual**

The suspended rootogram plots the double root residuals (DRRes), which indicate how closely the data follow the comparison (normal) distribution. DRRes are calculated from the fitted count and the number of observations in the corresponding bin. A flag on the rootogram indicates that the double root residual is greater than 3 or less than -3.

**Drift**

The change in bias over time. Measurement stability represents the total variation in measurements obtained on the same part measured over time, also known as drift.

**Duane plot**

A scatterplot (on the natural log scale) of the cumulative number of failures at a particular time divided by the time (cumulative failure rate) versus time.
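
As a rough sketch of how the plotted points could be computed, the following Python function takes a list of failure times for one system, forms the cumulative failure rate at each failure, and returns both coordinates on the natural log scale. The function name and the failure times are hypothetical, and the sketch assumes all times are greater than zero.

```python
import math


def duane_points(failure_times):
    """Return (ln time, ln cumulative failure rate) pairs for a Duane plot."""
    points = []
    # After sorting, the k-th failure time has k cumulative failures.
    for count, time in enumerate(sorted(failure_times), start=1):
        cumulative_failure_rate = count / time
        points.append((math.log(time), math.log(cumulative_failure_rate)))
    return points


# Hypothetical failure times (in hours) for a single repairable system.
for log_time, log_rate in duane_points([10.0, 25.0, 60.0]):
    print(round(log_time, 3), round(log_rate, 3))
```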

**Dunnett's method for multiple comparisons**

Used in ANOVA to create confidence intervals for differences between the mean of each factor level and the mean of a control group. If an interval contains zero, then there is no significant difference between the two means under comparison. You specify a family error rate for all comparisons, and Dunnett's method determines the confidence levels for each individual comparison accordingly.

**Durbin-Watson statistic**

Tests for the presence of autocorrelation in residuals. Autocorrelation means that adjacent observations are correlated. If they are correlated, then least squares regression underestimates the standard error of the coefficients.
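
A minimal sketch of the statistic itself, assuming the usual definition (the sum of squared differences between adjacent residuals divided by the sum of squared residuals). The function name and the residual values are hypothetical. The statistic ranges from 0 to 4; values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between adjacent residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals whose neighbors tend to share a sign.
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0
# Hypothetical residuals whose neighbors alternate in sign.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```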

E

**Error**

Error is referenced in many areas and generally refers to the extent to which functions, formulas, and statistics fail to fully explain or model a true or theoretical value. In other words, it is the difference between an actual and predicted value. While some degree of error or uncertainty may exist in statistical analyses, identifying and quantifying it can at least help us account for its presence.

**Event probability**

The chance that a particular outcome or condition (called an event or success) will occur. Also called predicted probability.

**Event plot**

A plot of events (failures and retirements) for all systems.

**EWMA chart**

A type of time-weighted control chart that plots the exponentially weighted moving averages. Each EWMA point incorporates information from all the previous subgroups or observations based on a user-defined weighting factor. An advantage of EWMA charts is that they are not greatly influenced when a small or large value enters the calculation. By changing the weight used and the number of σ's for the control limits, you can construct a chart that can detect almost any size shift in the process. Because of this, they are often used to monitor in-control processes for detecting small shifts away from the target.
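
The plotted points can be sketched as follows, assuming the common recursion in which each EWMA equals weight × current observation + (1 − weight) × previous EWMA. The function name, the starting value, and the data are hypothetical, and control limits are not computed here.

```python
def ewma(observations, weight, start):
    """Return the exponentially weighted moving average after each observation.

    weight: value between 0 and 1; larger values emphasize recent data.
    start:  value used as the "previous EWMA" for the first observation,
            for example the process target (an assumption of this sketch).
    """
    points = []
    previous = start
    for x in observations:
        previous = weight * x + (1 - weight) * previous
        points.append(previous)
    return points


# Hypothetical individual observations with a target of 10.
print(ewma([10, 12, 11], weight=0.5, start=10))  # [10.0, 11.0, 11.0]
```

Because every point carries a decaying share of all earlier observations, a single unusual value moves the plotted point less than it would on a chart of raw observations.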

**Expected performance**

The number of units in your process output that you expect to violate the specification limits. This expectation is based on the cumulative distribution function (CDF) of the distribution you choose to model your data. The expected performance can be expressed in terms of nonconforming parts per million (PPM) or per hundred (percent).

**Exponential**

For any number x, the value e^x, where e is the base of the natural log, equal to approximately 2.71828.

**Exponential distribution**

Most often used to model the behavior of units that have a constant failure rate. The exponential distribution has a wide range of applications in analyzing the reliability and availability of electronic systems, queuing theory, and Markov chains.

**Extreme vertices designs**

Mixture designs that cover only a subportion or smaller space within the simplex. These designs must be used when your chosen design space is not an L-simplex. The presence of both lower and upper bound constraints on the components often creates this condition.

F

**F-distribution**

Used in hypothesis testing to determine whether two population variances are equal. The F-distribution is the sampling distribution of the ratio of two independent random variables with chi-square distributions, each divided by its degrees of freedom. The F-distribution is also known as Snedecor's F distribution and the Fisher-Snedecor distribution. The F-distribution is right skewed and described by its numerator (ν1) and denominator (ν2) degrees of freedom.

**F Test**

Use instead of Bartlett's test when you compare just two variances.

**Factor analysis**

Primarily used to examine the structure of data by explaining the correlations among variables. Factor analysis summarizes data into a few dimensions by condensing a large number of variables into a smaller set of latent variables or factors.

**Factor coefficients**

Indicate the relative weight of each variable in the component in a factor analysis. The bigger the absolute value of the coefficient, the more important the corresponding variable is in constructing the component. The factor coefficients are used to calculate the factor scores.

**Factor and factor level**

Used extensively in ANOVA and design of experiments, investigators select factors to systematically vary during an experiment in order to determine their effect on the response variable. Factors can only assume a limited number of possible values, known as factor levels. Factors can be a categorical variable or based on a continuous variable but only use a few controlled values in the experiment.

**Factor loadings**

Represent how much a factor explains a variable in factor analysis. Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the factor strongly influences the variable. Loadings close to zero indicate that the factor has a weak influence on the variable.

**Factor scores**

Estimated values of the factors in factor analysis.

**Factorial**

The product of all the consecutive integers from 1 to n, where n is a non-negative integer. For example, the factorial of 5 equals 1 * 2 * 3 * 4 * 5 = 120.

**Factorial design**

A type of designed experiment that allows for the simultaneous study of the effects that several factors may have on a response. When performing an experiment, varying the levels of all factors simultaneously rather than one at a time allows for the study of interactions between the factors.

**Fractional factorial design**

A design in which experimenters perform only a selected subset or "fraction" of the runs in the full factorial design. Fractional factorial designs are a good choice when resources are limited or the number of factors in the design is large because they use fewer runs than the full factorial designs.

**Failure mode**

A cause of failure or one possible way a system can fail. When a system has many potential ways of failing, it has multiple failure modes or competing risks. The more complex a system is, the more failure modes there are.

**Failure Modes and Effects Analysis (FMEA)**

Failure Modes and Effects Analysis (FMEA) is a methodology for analyzing causes of failures and understanding their frequency and impact. By identifying potential failure modes and their impact, the appropriate corrective actions and plans can be implemented.

**Fiducial confidence interval**

A confidence interval based on fiducial statistical theory, which considers unknown population parameters to be random variables. For a 100(x)% fiducial confidence interval, the probability that the population parameter falls within the interval is (x).

**Fisher's exact test**

Tests whether two binary variables are independent.

**Fisher's least significant difference (LSD) method**

Used in ANOVA to create confidence intervals for all pairwise differences between factor level means while controlling the individual error rate to a level you specify.

**Fitted values**

Point estimates of the mean response for the given values of the predictors, factor levels, or components. Fitted values are also called fits or predicted values.

**Fitted distribution line**

Use to determine how well sample data follows a specific distribution.

**Fitted regression line**

Use to illustrate the relationship between a predictor and response variable and to see if your model fits the data. This line is a graphical representation of the mathematical regression equation. It is plotted using the least squares method which minimizes the sum of the squared distances between the points and the fitted line.
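
For a single predictor, the least squares line has a closed-form solution. The sketch below computes the intercept and slope that minimize the sum of squared vertical distances; the function name and the data are hypothetical, and the sketch assumes the x-values are not all identical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data that lie exactly on the line y = 1 + 2x.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```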

**Fixed and random factors**

In ANOVA, factors are either fixed or random. In general, if the investigator controls the levels of a factor, then the factor is fixed. On the other hand, if the investigator randomly sampled the levels of a factor from a population, then the factor is random.

**Forecasting**

Used extensively in time series analysis to predict a response variable, such as monthly profits, stock performance, or unemployment figures, over a specified period of time. Forecasts are based on patterns in existing data.

**Frequency data, counts data**

Synonymous terms that refer to a type of summary data that is organized with one or more columns of unique values or labels listed only once and a corresponding frequency column indicating the number of occurrences of each item.

**Friedman's test**

Essentially a nonparametric version of two-way analysis of variance.

**F-test**

A hypothesis test that examines the ratio of two variances to determine their equality. Typically one-tailed, F-tests refer to the F-distribution. An F-test evaluates whether the observed statistic exceeds a critical value from the distribution. If the observed F-statistic exceeds the critical value, reject the null hypothesis.

**Full rank**

Linear models are full rank when there are a sufficient number of observations per factor level combination to be able to estimate all terms included in the model. When there are insufficient observations, the model is not full rank and GLM cannot estimate all of the terms.

G

**Gage**

Any instrument or device used for obtaining measurements.

**Gage linearity and bias study**

A Measurement System Analysis (MSA) method which assesses the bias and linearity in your measurement system. A Gage Linearity and Bias Study helps you answer whether your measurement system has bias when compared to a standard and whether it has the same bias across the range of your measurements.

**Gage repeatability and reproducibility study (Gage R&R Study)**

A Measurement System Analysis (MSA) method which evaluates your measurement system precision and estimates the combined measurement system repeatability and reproducibility. A Gage R&R Study helps you answer whether your measurement system variability is small compared with the process variability, how much variability in the measurement system is caused by differences between operators, and whether your measurement system is capable of discriminating between different parts.

**Gage run chart**

A Measurement System Analysis (MSA) method which plots all of your observations by operator and part number to help you quickly assess differences in measurements between different operators and parts. This is one of the most helpful charts to display variability in measurements.

**Gage tolerance**

Represents the discrimination, or increments of measure, that a particular gage has when measuring parts. A practical guideline, known as the Rule of Tens, states that instrument discrimination should divide the process tolerance into ten parts or more.

**Gamma distribution**

Often used to model positively skewed data when random variables are greater than 0.

**Gamma function (complete and incomplete)**

Extends the factorial function (1 * 2 * 3...* n) so that factorials can be calculated for fractions in addition to positive integers.
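
A short sketch of the relationship between the complete gamma function and the factorial, using Python's standard `math.gamma`. The helper name `factorial_via_gamma` is hypothetical, and the incomplete gamma function is not covered here. The key fact is that gamma(n + 1) equals n! for non-negative integers n, while gamma also accepts fractional arguments.

```python
import math


def factorial_via_gamma(x):
    """Generalized factorial: gamma(x + 1), defined for fractions too."""
    return math.gamma(x + 1)


# For non-negative integers the two functions agree.
for n in range(6):
    print(n, math.factorial(n), factorial_via_gamma(n))

# A fractional argument: the "factorial" of 0.5 is gamma(1.5).
print(factorial_via_gamma(0.5))
```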

**General full factorial design**

An experimental design in which the factors can have any number of levels.

**General linear model (GLM)**

ANOVA procedure in which the calculations are performed using a least squares regression approach to describe the statistical relationship between one or more predictors and a continuous response variable. Predictors can be factors and covariates. GLM codes factor levels as indicator variables using a 1, 0, -1 coding scheme. Factors may be crossed or nested, fixed or random. Covariates may be crossed with each other or with factors, or nested within factors. The design may be balanced or unbalanced. GLM can perform multiple comparisons between factor level means to find significant differences.

**Geometric distribution**

When performing an experiment with only two outcomes, this discrete distribution can model the number of consecutive trials necessary to observe the outcome of interest for the first time. It can also model the number of nonevents that occur before you observe the first outcome.
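
The two versions of the distribution described above can be sketched as follows. The function names are hypothetical, and the sketch assumes independent trials with a constant event probability `p`.

```python
def geometric_pmf_trials(trials, p):
    """Probability that the first event occurs on trial number `trials`."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    # trials - 1 nonevents followed by one event.
    return (1 - p) ** (trials - 1) * p


def geometric_pmf_nonevents(nonevents, p):
    """Probability of exactly `nonevents` nonevents before the first event."""
    if nonevents < 0:
        raise ValueError("nonevents must be at least 0")
    return (1 - p) ** nonevents * p


# Hypothetical fair coin: first head on the third toss.
print(geometric_pmf_trials(3, 0.5))     # 0.125
# The same outcome counted as two tails before the first head.
print(geometric_pmf_nonevents(2, 0.5))  # 0.125
```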

**Geometric mean**

A measure of central tendency that calculates an average of the data using multiplication rather than addition. The geometric mean determines what single value all the data values would need to be to achieve the same product.
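
A minimal sketch of the calculation, assuming all data values are positive: multiply the n values together and take the nth root. The function name and the data are hypothetical.

```python
import math


def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical data.
data = [2.0, 8.0]
gm = geometric_mean(data)

# Replacing every value with gm gives approximately the same product.
print(gm, gm ** len(data), math.prod(data))
```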

**Goodness-of-fit test**

Determines whether a statistical model fits your data by analyzing the difference between your observed values and their expected values in the model. For continuous data, you can assess goodness-of-fit visually with a probability plot, or quantitatively with a hypothesis test such as the Anderson-Darling test. For categorical data, you can use a chi-square test.

**Grand mean**

The mean of all observations, as opposed to the mean of individual groups.

H

**Hypothesis test**

A procedure that evaluates two mutually exclusive statements about a population. A hypothesis test uses sample data to determine which statement is best supported by the data. These two statements are called the null hypothesis and the alternative hypothesis. They are always statements about population attributes, such as the value of a parameter, the difference between corresponding parameters of multiple populations, or the type of distribution that best describes the population.

**Half normal plot**

Use to compare the magnitude and statistical significance of main and interaction effects from a 2-level factorial DOE.

**Hard-to-change factor**

A factor that is difficult to randomize completely due to time or cost constraints.

**Hazard plot**

The hazard plot displays the instantaneous failure rate for each time t.

**Heywood case**

Occurs in factor analysis when the iterative maximum likelihood estimation method converges to unique (specific) variance values that are less than a prefixed lower bound value.

**Hinges**

Like quartiles, hinges measure the middle half of the data, and both are methods for establishing the endpoints of the box in a boxplot. The hinge and quartile are essentially the same concept, though they are calculated differently. In practice, the hinges may be a bit closer to the median than the quartiles are, but in most cases this difference is not noticeable unless the data set is very small.

**Histogram**

A graph used to assess the shape and spread of continuous sample data.

**Hotelling's t-squared test**

Compares two groups in a special case of MANOVA, using one factor that has two levels.

**Hyperbolic trigonometry functions**

Trigonometric functions based on the hyperbola with the equation x^2 - y^2 = 1. These functions differ from those in standard (circular) trigonometry, whose functions are based on the unit circle with the equation x^2 + y^2 = 1. However, they share many similar identities, such as cosh^2 x - sinh^2 x = 1 (compare the circular identity sin^2 x + cos^2 x = 1), where h represents hyperbolic.

**Hypergeometric distribution**

A discrete distribution used for samples drawn from relatively small populations, without replacement.

I

**I-MR chart**

Plots individual observations (I chart) and moving ranges (MR chart) over time for variables data.

**Independent sample**

A sample that is selected randomly so that its observed values do not depend on the observed values of another sample. Many statistical analyses are based on the assumption that samples are independent.

**Independent trial**

A trial in an experiment is independent if the likelihood of each possible outcome does not change from trial to trial.

**Indicator variables**

Use to include categorical information in regression models.

**Individual and family error rate**

The type I error rates associated with the multiple comparisons often used to identify significant differences between specific factor levels in an ANOVA.
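
To illustrate why the two rates differ, the sketch below computes the family error rate for a given individual error rate under the simplifying assumption that the comparisons are independent. Real multiple comparisons usually share data, so this is an approximation; the function name is hypothetical.

```python
def family_error_rate(individual_alpha, comparisons):
    """Chance of at least one type I error across independent comparisons."""
    # Probability that every comparison avoids a type I error, subtracted from 1.
    return 1 - (1 - individual_alpha) ** comparisons


# Family error rate when each comparison uses an individual rate of 0.05.
for count in (1, 3, 10):
    print(count, round(family_error_rate(0.05, count), 4))
```

With an individual rate of 0.05, ten independent comparisons push the family error rate to about 0.40, which is why multiple comparison methods adjust the individual rate.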

**Individual and simultaneous confidence level**

The confidence levels associated with the confidence intervals often used in multiple comparisons to identify significant differences between specific factor levels in an ANOVA. These confidence levels are analogous to the individual and family error rates but applied to confidence intervals.

**Individual and composite desirability**

Assess how well a combination of input variables satisfies the goals you have defined for the responses. Individual desirability (d) evaluates how the settings optimize a single response; composite desirability (D) evaluates how the settings optimize a set of responses overall. Desirability has a range of zero to one. One represents the ideal case; zero indicates that one or more responses are outside their acceptable limits.

**Individual value plot**

Use to assess and compare sample distributions through individual data values, with optional grouping by categorical variables. An individual value plot can also help you to check for obvious mistakes in coding values.

**Individuals chart**

Plots individual observations over time for variables data. Use this chart to monitor the process center when it is difficult or impossible to group measurements into subgroups. This occurs when measurements are expensive, production volume is low, or products have a long cycle time. The Individuals chart is also known as the I chart.

**Inferential statistics**

Uses random sample data taken from a population to describe and draw conclusions about the population.

**Influential observation**

Observations that have a disproportionate impact on a regression or ANOVA model. Influential observations, also known as unusual observations, are important to identify because they can produce misleading results.

**Integer distribution**

A discrete uniform distribution that ranges from the minimum to the maximum integer value specified. Each integer in the range has equal probability.

**Interaction**

When the effect of one factor depends on the level of another factor.

**Internal consistency**

An assessment of how reliably survey or test items that are designed to measure the same construct actually do so. A construct is an underlying theme, characteristic, or skill such as reading comprehension or customer satisfaction. A high degree of internal consistency indicates that items meant to evaluate the same construct yield similar scores. There are a variety of internal consistency measures. In general, they involve determining how highly these items are correlated and how well they predict each other. Cronbach's alpha is one commonly used measure. To use internal consistency measures, items generally should be in a single measurement instrument and administered to a group of people on one occasion in order to avoid confounding variables.

**Interval plot**

A graphical summary of the distribution of a sample that shows its central tendency and variability. The default interval plot display consists of a mean symbol with a 95% confidence interval bar.

**Inverse cumulative distribution function**

Gives the value associated with a specific cumulative probability. Use the inverse CDF to determine the value of the response associated with a specific probability.
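
As a small illustration for the normal distribution, Python's standard `statistics.NormalDist` provides both `cdf` and `inv_cdf`. The fill-weight distribution below (mean 500, standard deviation 5) is a hypothetical example.

```python
from statistics import NormalDist

# Standard normal: the value below which 97.5% of the distribution falls.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
value = standard_normal.inv_cdf(0.975)
print(round(value, 2))  # 1.96

# The CDF maps the value back to the original cumulative probability.
print(round(standard_normal.cdf(value), 3))  # 0.975

# Hypothetical fill weights: the weight that 5% of packages fall below.
fill_weights = NormalDist(mu=500.0, sigma=5.0)
fifth_percentile = fill_weights.inv_cdf(0.05)
print(round(fifth_percentile, 1))
```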

J

**Johnson transformation**

A set of standard normal percentiles used to select and estimate the Johnson transformation. The optimum transformation is that which produces normalized data with the greatest p-value in an Anderson-Darling normality test.

K

**Kappa**

Indicates the degree of agreement of the nominal or ordinal assessments made by multiple appraisers when evaluating the same samples. Kappa statistics are commonly used in cross tabulation (table) applications and in attribute agreement analysis.

**Kendall's coefficient of concordance (KCC)**

Indicates the degree of association of ordinal assessments made by multiple appraisers when evaluating the same samples. Kendall's coefficient is commonly used in attribute agreement analysis (attribute gage R&R).

**Kendall's tau-b**

A nonparametric measure of association for ordinal data. Ordinal data are categorical variables that have three or more levels with a natural ordering, such as strongly disagree, disagree, neutral, agree, and strongly agree.

**Kolmogorov-Smirnov normality test**

This test compares the empirical cumulative distribution function of your sample data with the distribution expected if the data were normal. If this observed difference is sufficiently large, the test will reject the null hypothesis of population normality.

If the p-value of this test is less than your chosen α-level, you can reject your null hypothesis and conclude that the population is nonnormal.

**Kruskal-Wallis test**

Tests whether two or more independent samples come from identical populations.

**Kurtosis**

The degree to which a data set is peaked. Like many other basic statistics, kurtosis can help you establish an initial understanding of your data. You can evaluate kurtosis visually via a graph (like a histogram) or mathematically through the kurtosis value statistic.

L

**Lack-of-fit tests**

Used in regression and DOE, lack-of-fit tests assess the fit of your model. If the p-value is less than your selected α-level, evidence exists that your model does not accurately fit the data. You may need to add terms or transform your data to more accurately model the data.

**Laplace distribution**

Used when the distribution is more peaked than a normal distribution. It is also known as the double exponential distribution, because it is often considered as two exponential distributions, one positive and one negative. The Laplace distribution is used for modeling in biology, finance, and economics.

**Ljung-Box Q statistic**

Use to test whether a series of observations over time are random and independent. If observations are not independent, one observation may be correlated with another observation k time units later, a relationship called autocorrelation. Autocorrelation can impair the accuracy of a time-based predictive model, such as a time series plot, and lead to misinterpretation of the data.

**Least squares versus maximum likelihood estimation methods**

Two different approaches for estimating population parameters from a random sample.

**Least squares method**

Least squares estimates are calculated by fitting a regression line to the points from a data set that has the minimal sum of the deviations squared (least square error). In reliability analysis, this is plotted on a probability plot which can make interpretation easier.

**Legend**

Identifies the visual elements used to distinguish different groups of data on the graph. The legend helps you evaluate the effects of grouping.

**Length of observation**

Poisson processes count occurrences of a certain event or property over a given observation range, which can represent time, area, volume, number of items, etc. The length of observation represents the magnitude, duration, or size of each observation period, or the number of items over which you count events.

**Tests of equal variances**

Use to test the equality of variance between populations or factor levels. Many statistical procedures, such as analysis of variance (ANOVA) and regression, assume that although different samples may come from populations with different means, they have the same variance.

**Leverage**

In regression and ANOVA models, measures the distance from an observation's x-value to the average of the x-values for all observations in a data set. Observations with large leverage values may exert disproportionate influence on a model and produce misleading results.
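
For the special case of simple linear regression with an intercept, leverage has a closed form: 1/n plus the squared distance of the x-value from the mean, divided by the total sum of squares of the x-values. The sketch below assumes that case only; the function name and the data are hypothetical.

```python
def leverages(x):
    """Return the leverage of each observation in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    # Total squared distance of the x-values from their mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the others.
for xi, h in zip([1, 2, 3, 10], leverages([1, 2, 3, 10])):
    print(xi, round(h, 2))
```

In this example the observation at x = 10 has a leverage close to 1, so it would dominate the fitted line.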

**Likelihood ratio test**

A hypothesis test that compares the goodness-of-fit of two models – an unconstrained model with all parameters free, and its corresponding model constrained by the null hypothesis to fewer parameters – to determine which offers a better fit for your sample data.

**Line plot**

Use to compare response patterns for two or more groups.

**Linear constraints**

The upper and lower bounds on a function of components in a mixture design. Setting these limits helps to define your design space and focuses your experiment to make the best use of testing resources.

**Linear relationship**

A trend in the data that can be modeled by a straight line, which shows a steady rate of increase or decrease.

**Link function**

A link function maps the interval (0,1) onto the entire real number line in logistic regression and probit analysis. A link function transforms the probabilities of the levels of a categorical response variable (which exist between zero and one) to an unbounded continuous scale that can be modeled with a linear regression equation.
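
One common choice in logistic regression is the logit link. The sketch below shows the logit and its inverse; the function names are hypothetical, and other links (such as the probit) follow the same pattern of mapping probabilities to an unbounded scale and back.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the real line: the log of the odds."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Map a value on the linear-predictor scale back to a probability."""
    return 1 / (1 + math.exp(-eta))


print(logit(0.5))  # 0.0
# Probabilities near 0 or 1 map to large negative or positive values.
print(logit(0.01), logit(0.99))
# The inverse recovers the probability (up to floating-point rounding).
print(inverse_logit(logit(0.8)))
```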

**Linkage methods**

Used with Cluster Observations and Cluster Variables, they determine how the distance between two clusters is defined. Choosing one over another may not make an appreciable difference with your data. However, because the goal of cluster amalgamation is somewhat subjective, you may find different methods are more or less appropriate with your particular situation and data.

**Log residuals**

Difference between the natural log of the observed response standard deviation and the natural log of the fitted standard deviation. The log residual represents the part of the observed response that is not explained by the model. Of the types of residuals Minitab calculates in Analyze Variability, the log residuals most closely resemble regular residuals. You may want to use the log residuals for residual plots to examine your model.

**Log base 10**

The exponent to which 10 must be raised to equal a given number.

**Logarithm**

The power to which a number (called a "base") must be raised to achieve a given number.

**Logistic distribution**

Used as a growth curve and to model binary responses. Used in the fields of biostatistics and economics.

**Logistic regression**

Models a relationship between predictor variables and a categorical response variable.

**Log-likelihood**

The expression that is maximized to find optimal values of the estimated coefficients (β). Log-likelihood values cannot be used alone as an index of fit because they are a function of sample size, but they can be used to compare two models.

**Loglogistic distribution**

A random variable is loglogistically distributed if the logarithm of the variable is logistically distributed. This distribution is commonly used as a growth curve and to model binary responses and is often used in the biostatistics and economics fields. The loglogistic distribution is also known as the Fisk distribution.

**Lognormal distribution**

A random variable follows this distribution if the logarithm of the random variable is normally distributed. Use when random variables are greater than 0. Used for reliability analysis and in financial applications, such as modeling stock behavior. The lognormal distribution is also known as the Cobb-Douglas distribution.

**Lurking variable**

A variable that is not included as an explanatory or response variable in the analysis but may affect the interpretation of relationships among variables. A lurking variable can falsely suggest a strong relationship among variables or it can hide the true relationship.

M

**Matrices**

Rectangular blocks of numbers upon which mathematical operations can be performed. Matrices are often described by their dimensions.

**Mean**

Describes an entire set of observations with a single value representing the center of the data. The mean (arithmetic average) is the sum of all the observations divided by the number of observations.

**Moving average**

Averages calculated from artificial subgroups of consecutive observations. In control charting, you can create a moving average chart for time-weighted data.
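
A minimal sketch of an unweighted moving average, assuming a fixed window length and that averages are reported only once a full window is available. The name `moving_average` and the handling of the first few points are assumptions of this example; charting software may treat the initial points differently.

```python
def moving_average(values, window):
    """Return the mean of each run of `window` consecutive observations."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` averages the artificial subgroups (1, 2, 3), (2, 3, 4), and (3, 4, 5).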

**Moving average chart**

A type of time-weighted control chart that plots the unweighted moving average over time for individual observations. This chart uses control limits (UCL and LCL) to determine when an out-of-control situation has occurred. Moving average (MA) charts are more effective than Xbar charts in detecting small process shifts, and are particularly useful when there is only 1 observation per subgroup. However, EWMA charts are generally preferred over MA charts because they weight the observations.

**Measures of accuracy (time series analysis)**

Use these statistics to compare the fits of different forecasting and smoothing methods. Minitab computes three measures of accuracy of the fitted model: MAPE, MAD, and MSD. The three measures are not very informative by themselves, but you can use them to compare the fits obtained by using different methods. For all three measures, smaller values generally indicate a better fitting model.

**Mean absolute percentage error (MAPE)**

Expresses accuracy as a percentage of the error. Because this number is a percentage, it may be easier to understand than the other statistics.

**Mean absolute deviation (MAD)**

Expresses accuracy in the same units as the data, which helps conceptualize the amount of error. Outliers have less of an effect on MAD than on MSD.

**Mean squared deviation (MSD)**

A commonly used measure of accuracy of fitted time series values. Outliers have more influence on MSD than on MAD.
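
The three measures can be sketched as follows, assuming the usual definitions: each is an average over all n fitted points of the absolute percentage error, the absolute error, or the squared error. The function names are illustrative, and the exact divisor used by a given package should be checked against its documentation.

```python
def mape(actual, fitted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)


def mad(actual, fitted):
    """Mean absolute deviation, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


def msd(actual, fitted):
    """Mean squared deviation; squaring makes large errors count more."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)
```

Because MSD squares each error, a single large miss raises MSD much more than it raises MAD, which is why outliers influence MSD more strongly.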

**Mahalanobis distance**

Distance between a data point and a multivariate space's centroid (overall mean). Use the Mahalanobis distance in Principal Components Analysis to identify outliers. It is a more powerful multivariate method for detecting outliers than examining one variable at a time because it takes into account the different scales between variables and the correlations among them.
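
To make the idea concrete, here is a sketch restricted to two variables so that the covariance matrix can be inverted by hand. The restriction to two dimensions, the use of the sample covariance (divisor n - 1), and the name `mahalanobis_2d` are assumptions of this example.

```python
import math


def mahalanobis_2d(point, data):
    """Mahalanobis distance from `point` to the centroid of 2-D `data`."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Quadratic form d' S^-1 d, written out for the 2x2 inverse of S.
    d_squared = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det
    return math.sqrt(d_squared)
```

Dividing by the variances handles the different scales, and the cross term involving `sxy` is what accounts for correlation between the two variables.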

**Main effects and main effects plot**

Use in conjunction with an analysis of variance and design of experiments to examine differences among level means for one or more factors. A main effect is present when different levels of a factor affect the response differently. A main effects plot graphs the response mean for each factor level connected by a line.

**Mallows' Cp**

A statistic used as an aid in choosing between competing multiple regression models. Mallows' Cp compares the precision and bias of the full model to models with the best subsets of predictors.

**Mann-Whitney test**

A nonparametric hypothesis test to determine whether two populations have the same population median.

**Multivariate analysis of variance (MANOVA)**

A test that simultaneously analyzes the relationship between several response variables and a common set of predictors. Like ANOVA, MANOVA requires continuous response variables and categorical predictors.

**Mantel-Haenszel-Cochran test**

Conditionally tests the associations of two binary variables in the presence of a third categorical variable.

**Margin of error**

Expresses the amount of random sampling error in the estimation of a parameter, such as the mean or proportion.

**Marginal plot**

Use to assess the relationship between two variables and examine their distributions. A marginal plot is a scatterplot with histograms, boxplots, or dotplots of the x- and y-variables in the margins. This two-in-one graph allows you to compare individual variables and their distributions at the same time.

**Maximum and minimum**

Maximum refers to the highest value; minimum refers to the lowest.

**Maximum likelihood method**

The likelihood function indicates how likely the observed sample is as a function of possible parameter values. Therefore, maximizing the likelihood function determines the parameters that are most likely to produce the observed data.
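
As a small illustration, the sketch below estimates a binomial success probability by evaluating the log-likelihood on a grid and keeping the value that maximizes it. The grid search, the function names, and the choice of a binomial model are assumptions made for the example; real software typically uses numerical optimization rather than a grid.

```python
import math


def binomial_log_likelihood(p, successes, trials):
    """Log-likelihood of p, omitting the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def mle_by_grid(successes, trials, steps=999):
    """Return the grid value of p in (0, 1) with the largest log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))
```

With 7 successes in 10 trials, the grid maximum lands at 0.7, which matches the closed-form estimate successes / trials.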

**Mean cumulative function and Nelson-Aalen plot**

The mean cumulative function is the average cumulative number of failures or cost over all systems in the time interval (0, t). This function is overlaid on the Nelson-Aalen plot to help you determine how the number of failures or your repair costs are changing over time. In other words, it describes your system as improving, deteriorating, or staying constant.

**MSSD (Mean of the squared successive differences)**

Used as an estimate of variance. It is calculated by squaring the differences between consecutive observations, taking the mean of those squared differences, and dividing by two.
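
A minimal sketch of this calculation, assuming the mean is taken over the n - 1 successive differences; the function name `mssd` is illustrative.

```python
def mssd(values):
    """Mean of the squared successive differences, divided by two."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    squared_diffs = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return sum(squared_diffs) / (2 * len(squared_diffs))
```

For the values 1, 3, 2, 5 the successive differences are 2, -1, and 3, so the estimate is (4 + 1 + 9) / (2 × 3).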

**Mean squares**

Represents an estimate of population variance. It is calculated by dividing the corresponding sum of squares by the degrees of freedom.

**MTBF (mean time between failures)**

The average time between failures for a repairable system with a constant failure rate. The higher the MTBF, the more reliable the product.

**Measurement system analysis (MSA)**

Methods to evaluate your measurement systems and determine whether you can trust your data. MSA helps determine how much of the overall process variation is due to measurement system variation. Measurement systems can include your data collection procedures, gages, and other test equipment. Evaluation of your measurement system should be done prior to control charting, capability analysis, or any other analysis to prove that your measurement system is accurate and precise, and your data are trustworthy.

**Measurement system variation**

All variation associated with a measurement process. Potential sources of variation include gages, standards, procedures, software, environmental components, as well as others.

**Median**

The middle of the range of data: half the observations are less than or equal to it and half the observations are greater than or equal to it.

**Median polish**

An exploratory data analysis (EDA) procedure that creates an additive model to describe data in a two-way table.

**Multivariate EWMA chart**

The multivariate form of the EWMA control chart. Use a MEWMA chart to simultaneously monitor two or more related process characteristics in an exponentially weighted control chart.

**m-failure test**

A reliability test where up to "m" failures are allowed in a successful reliability demonstration test. The most common m-failure tests are the 0-failure test (m=0) or the 1-failure test (m=1).

**Mixture design**

A class of response surface experiments that investigate products containing several components. Use a mixture design to study product characteristics associated with changes in the proportions of the components, process conditions, or the amount of mixture.

**Mixture-amount experiment**

An experiment where the response is assumed to depend on both the proportions of the components and the amount of the mixture.

**Mixture process variable design**

A mixture design that includes process variables: factors in an experiment that are not part of the mixture but may affect the response.

**Mod**

The whole number remainder of a number when it is divided by a given divisor (modulus).

**Mode**

The value that occurs most frequently in a set of observations. Mode may be used with mean and median to give an overall characterization of your data distribution. While the mean and median require a calculation, the mode is found simply by counting the number of times each value occurs in a data set.

**Multicollinearity**

In regression, multicollinearity refers to predictors that are correlated with other predictors. Moderate multicollinearity may not be problematic. However, severe multicollinearity is problematic because it can increase the variance of the regression coefficients, making them unstable and difficult to interpret.

**Multivariate normal distribution**

An extension of the univariate normal distribution for applications with a group of variables that may be correlated. Some multivariate analyses, such as factor analysis and MANOVA, assume that the data follow a multivariate normal distribution.

**Multiplicative model for decomposition**

Use when the size of the seasonal pattern depends on the level of the data. This model assumes that as the data increase, so does the seasonal pattern. Most time series plots exhibit such a pattern. In this model, the trend and seasonal components are multiplied and then added to the error component.

**Multi-vari chart**

A graphical representation of the relationships between factors and a response. Multi-vari charts are a way of presenting analysis of variance data in a graphical form. Use in the preliminary stages of data analysis to look at data, possible relationships, and root causes for variation. Multi-vari charts are especially useful in understanding interactions.

**Multivariate control charts**

A type of variables control chart that shows how correlated, or dependent, variables jointly influence a process or outcome.

N

**Normal scores**

The expected values of ordered data from a normal distribution.

**Number of distinct categories**

Used in Gage R&R studies to indicate a measurement system's ability to detect a difference in the measured characteristic (resolution).

**Negative binomial distribution**

When performing an experiment with only two outcomes, this discrete distribution can model the number of trials necessary to produce a specified number of a certain outcome. It can also model the number of nonevents that occur before you observe the specified number of outcomes.
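
The "number of trials" form can be sketched with the standard probability mass function: the last trial must be an event, and the remaining r - 1 events can fall anywhere among the earlier trials. The function and parameter names are assumptions of this example.

```python
import math


def neg_binomial_trials_pmf(n_trials, r_events, p):
    """P(the r-th event occurs exactly on trial n), with event probability p."""
    if r_events < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need r_events >= 1 and 0 < p <= 1")
    if n_trials < r_events:
        return 0.0  # cannot see r events in fewer than r trials
    ways = math.comb(n_trials - 1, r_events - 1)
    return ways * p ** r_events * (1.0 - p) ** (n_trials - r_events)
```

The "number of nonevents" form is the same probability expressed in terms of k = n_trials - r_events.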

**Net workdays**

Returns the number of workdays (M-F) between two dates, inclusive.
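
A sketch of the same count in Python, assuming Monday through Friday are workdays and ignoring holidays; the function name `net_workdays` is chosen for the example.

```python
from datetime import date, timedelta


def net_workdays(start: date, end: date) -> int:
    """Count Monday-Friday dates from start to end, both inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        # weekday() is 0 for Monday through 4 for Friday.
        if (start + timedelta(days=offset)).weekday() < 5
    )
```

For instance, Monday 1 January 2024 through Sunday 7 January 2024 contains five workdays.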

**Noise factor**

In Taguchi designs, factors that cause variability in the performance of a system or product, but cannot be controlled during production or product use.

**Non-centrality parameter**

When your sampling distribution is derived from a normal distribution with a non-zero mean, you have a non-central distribution. The non-centrality parameter helps define your non-central distribution and represents the degree to which the mean of the sampling distribution of the test statistic departs from its mean when the null is true.

**Nonlinear regression**

Generates an equation to describe the nonlinear relationship between a continuous response variable and one or more predictor variables, and predicts new observations.

**Nonparametric test**

A hypothesis test that does not require the population's distribution to be characterized by certain parameters.

**Normal distribution**

A bell-shaped curve that is symmetric about its mean. The normal distribution is the most common statistical distribution because approximate normality arises naturally in many physical, biological, and social measurement situations. Many statistical analyses require that the data come from normally distributed populations. The normal distribution is also known as the Gaussian distribution.

**Normality test**

A one-sample hypothesis test to determine whether the population from which you draw your sample is nonnormal. Many statistical procedures rely on population normality, and using a normality test to determine whether to reject this assumption can be an important step in your analysis.

**NP chart**

Plots the number of nonconforming units.

O

**Observation**

Counts or measures of a single item. With raw data, each row of a worksheet typically represents one observation.

**Observed performance**

The number of units in your process output that violate the specification limits. This statistic is based on combined data from all subgroups, so it represents what your customer experiences. The observed performance can be expressed in terms of nonconforming parts per million (PPM) or per hundred (percent).

**Operating characteristic (OC) curve**

Describes the discriminatory power of an acceptance sampling plan. The OC curve plots the probabilities of accepting a lot versus the fraction defective.

**Odds ratio**

An odds ratio compares the odds of two events, where the odds of an event equals the probability the event occurs divided by the probability that it does not occur.
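
The definition translates directly into code. This is a minimal sketch with illustrative function names, taking the two event probabilities as inputs.

```python
def odds(p: float) -> float:
    """Odds of an event: P(event) / P(not event)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p1: float, p2: float) -> float:
    """Ratio of the odds of event 1 to the odds of event 2."""
    return odds(p1) / odds(p2)
```

An event with probability 0.8 has odds of about 4 to 1; compared with an event of probability 0.5 (odds of 1), the odds ratio is about 4.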

**Opportunities per unit**

The number of chances for a defect to occur in a given product or service. You define the number of opportunities by studying your process to determine the outputs or features that must be correct to satisfy the customer.

**Optimal design**

A group of the "best" design points selected when reducing or augmenting the number of experimental runs in the original design. Two possible criteria for obtaining an optimal design are D-optimality and Distance-based optimality.

**Ordinary least squares (OLS) regression**

In OLS regression, the estimated equation is calculated by determining the equation that minimizes the sum of the squared distances between the sample's data points and the values predicted by the equation.
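
For a single predictor, the minimizing equation has a well-known closed form. The sketch below assumes a straight-line model y = intercept + slope × x; the function name is illustrative.

```python
def ols_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

For points that lie exactly on y = 1 + 2x, the fitted line recovers an intercept of 1 and a slope of 2, and every squared distance is zero.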

**Orthogonal design**

The orthogonal design contains each of the level combinations (factor pairs) an equal number of times.

**Orthogonal regression**

Orthogonal regression examines the linear relationship between two continuous variables: one response (Y) and one predictor (X). Unlike simple linear regression, both the response and predictor in orthogonal regression contain measurement error. In simple regression, only the response variable contains measurement error.

**Outlier**

An unusually large or small observation. Outliers can have a disproportionate influence on statistical results, such as the mean, which can result in misleading interpretations.

**Overlaid contour plot**

Use to visually identify the feasible inputs for multiple responses for a factorial, response surface, or mixture design experiment. Feasible input variable settings for one response may be far from feasible for another response. You can use overlaid contour plots to consider the responses simultaneously.

P

**P chart**

Plots the fraction, percent, or proportion of nonconforming units. The P chart is the most widely used attributes control chart.

**P-value**

Determines the appropriateness of rejecting the null hypothesis in a hypothesis test. P-values range from 0 to 1. The p-value is the probability of obtaining a test statistic that is at least as extreme as the calculated value if the null hypothesis is true. Before conducting any analyses, determine your alpha (α) level. A commonly used value is 0.05. If the p-value of a test statistic is less than your alpha, you reject the null hypothesis.
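
As one concrete case, the sketch below computes a two-sided p-value for a test statistic that follows a standard normal distribution under the null hypothesis (a z-test). The choice of a z-test and the function names are assumptions of this example; other tests use other reference distributions.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Apply the decision rule: reject when the p-value is less than alpha."""
    return two_sided_p_value(z) < alpha
```

A statistic of 1.96 gives a p-value of roughly 0.05, which is why 1.96 is the familiar two-sided cutoff at that alpha level.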

**Paired t-test**

A hypothesis test for the mean difference between paired observations that are related or dependent. The paired t-test is useful for analyzing differences between twins, differences in before-and-after measurements on the same subject, and differences between two treatments given to the same subject.

**Pairwise statistics**

Statistics calculated between pairs of observations in your data set.

**Parameters**

Descriptive measures of an entire population used as the inputs for a probability distribution function (PDF) to generate distribution curves.

**Pareto chart**

A special type of bar chart where the plotted values are arranged from largest to smallest. A Pareto chart is one of the basic quality control tools used to highlight the most frequently occurring defects, the most common causes of defects, or the most frequent causes of customer complaints.

**Partial least squares regression**

A technique that reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components, instead of on the original data. Partial least squares (PLS) is particularly useful when your predictors are highly collinear, or when you have more predictors than observations and ordinary least squares regression either produces coefficients with high standard errors or fails completely.

**Partial products**

Multiplies the first i rows of a column and stores the product in the ith row of the storage column.

**Partial sums**

Adds the first i rows of a column and stores the sum in the ith row of the storage column.
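
Both running calculations can be sketched with `itertools.accumulate` from the Python standard library, treating a column as a list; the function names are illustrative.

```python
import operator
from itertools import accumulate


def partial_sums(column):
    """Row i holds the sum of rows 1 through i."""
    return list(accumulate(column))


def partial_products(column):
    """Row i holds the product of rows 1 through i."""
    return list(accumulate(column, operator.mul))
```

For the column 1, 2, 3, 4 the partial sums are 1, 3, 6, 10 and the partial products are 1, 2, 6, 24.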

**Parts per million (PPM)**

The number of nonconforming parts out of a million parts.

**Principal components analysis**

Use to form a smaller number of uncorrelated variables from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the fewest number of principal components. Principal components analysis is commonly used in the social sciences, market research, and other industries that use large data sets.

**Probability density function (PDF)**

Describes the likelihood of each specific value that a variable can take on.

**Pearson residuals**

A measure of how well the observation is predicted by the model. Observations that are poorly fit by the model have high Pearson residuals.

**Pearson's correlation coefficient (r)**

Assesses whether two continuous variables are linearly related.
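
The coefficient can be computed from deviations about the two means. This is a minimal sketch of the standard formula; the function name `pearson_r` is chosen for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation scaled by both spreads, in [-1, 1]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

Values near 1 or -1 indicate a strong increasing or decreasing linear relationship, and values near 0 indicate little linear relationship.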

**Percent (%)**

An individual value that represents the contribution of a category to the whole. Percent is calculated by dividing the frequency of that category by the total frequency and then multiplying by 100.

**Proportion**

A relative portion of a whole, as opposed to a count or frequency.

**Percentile line**

A specialized reference line available only with probability plots and empirical CDF graphs. A percentile line consists of two segments intersecting at the fitted distribution line.

**Percentiles**

Divide the data set into parts. In general, the nth percentile has n% of the observations below it, and (100-n)% of observations above it.

**Percents for percentiles**

The percent of items expected to fail by a particular time.

**Permutation**

An ordered arrangement of objects from a group without repetitions.

**Pi (3.141...)**

The ratio of a circle's circumference to its diameter, often denoted π. Pi is an irrational constant approximately equal to 3.14159.

**Pie chart**

Use to visually represent the proportion for each category in the data. The pie is divided into slices, with each slice representing a category of data. By comparing and contrasting the size of the slices, you can evaluate the relative magnitude of each category.

**Plackett-Burman designs**

A class of resolution III, two-level factorial experiment designs that allow you to investigate many factors inexpensively. Use Plackett-Burman designs to identify the most important factors early in the experimentation phase. They are generally used with eight or more (up to 47) factors.

**Poisson distribution**

Describes the number of times an event occurs in a finite observation space.

**Poisson capability plots**

These plots appear in the output of Poisson capability analysis to help verify whether your data follow a Poisson distribution.

**Pooled standard deviation**

Method for estimating a single standard deviation to represent all independent samples or groups in your study when they are assumed to have a common standard deviation. The pooled standard deviation is the average spread of all data points about their group mean (not the overall mean). It is a weighted average of each group's standard deviation. The weighting gives larger groups a proportionally greater influence on the overall estimate. Pooled standard deviations are used in t-tests, ANOVAs, control charts, and capability analysis.
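
The weighting described above can be sketched as follows, assuming each group's variance is weighted by its degrees of freedom (group size minus one). The function name is illustrative, and each group needs at least two observations.

```python
import math
import statistics


def pooled_standard_deviation(groups):
    """Pool the group variances, weighting each by its degrees of freedom."""
    weighted_variance_sum = sum(
        (len(g) - 1) * statistics.variance(g) for g in groups
    )
    total_degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted_variance_sum / total_degrees_of_freedom)
```

Note that the deviations inside `statistics.variance` are taken about each group's own mean, not the overall mean, so differences between group means do not inflate the estimate.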

**Probability of passing (POP) graph**

Use to ensure that a demonstration test has a reasonable chance of passing when a part has truly improved. The probability of passing graph plots the probability of passing a demonstration test against the amount of improvement.

**Population and samples**

A population is a collection of people, items, or events about which you want to draw conclusions. It is not always convenient or possible to examine every member of an entire population. A subset of the population is called a sample.

**Potential (within) and overall capability**

Most capability assessments can be grouped into one of two categories: Potential (within) and Overall capability. Each represents a unique measure of process capability. Potential capability is often called the "entitlement" of your process: it ignores differences between subgroups and represents how the process could perform if the shift and drift between subgroups were eliminated. Overall capability, on the other hand, is what the customer experiences; it accounts for the differences between subgroups. Capability indices that assess potential capability include Cp, CPU, CPL, and Cpk. Capability indices that assess overall capability include Pp, PPU, PPL, Ppk, and Cpm.

**Power**

In a hypothesis test, the likelihood that you will find a significant effect or difference when one truly exists. Power is the probability that you will correctly reject the null hypothesis when it is false.

**Predicted R2**

Used in regression analysis to indicate how well the model predicts responses for new observations, whereas R2 indicates how well the model fits your data. Predicted R2 can prevent overfitting the model and can be more useful than adjusted R2 for comparing models because it is calculated using observations not included in model estimation.

**Prediction Interval**

Represents a range within which a single new observation is likely to fall, given specified settings of the predictors.

**Prediction sum of squares (PRESS)**

Assesses your model's predictive ability. In general, the smaller the PRESS value, the better the model's predictive ability.

**Process capability**

The ability to produce products or provide services that meet specifications defined by the customer's needs.

**Process stability**

A process is stable if it does not contain any special cause variation; only common cause variation is present.

**Process tolerance**

Every process has variation; the process tolerance represents the allowable deviation from a target value that will still meet specifications. The process tolerance can usually be found by subtracting the lower specification from the upper specification.

**Process z**

Describes the capability of a binary process – a process in which products are judged to be either defective or not defective; some practitioners call Process Z the sigma capability of a process. Process Z is the point on a standard normal distribution such that the area to the right of that point is equal to the Average P (the proportion of defective units in your process). The higher the process Z, the better the process performance; ideally, you want a process Z of 2 or more.
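
The definition amounts to an inverse normal calculation, sketched here with `statistics.NormalDist` from the Python standard library. The function name `process_z` is chosen for the example.

```python
from statistics import NormalDist


def process_z(proportion_defective: float) -> float:
    """Point on the standard normal curve with this area to its right."""
    if not 0.0 < proportion_defective < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    # Area to the right is p, so the area to the left is 1 - p.
    return NormalDist().inv_cdf(1.0 - proportion_defective)
```

A process with about 2.3% defective units has a process Z of about 2, and a lower proportion defective gives a higher Z.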

**Pseudocomponent**

Coded variables used to simplify design construction and model fitting, and reduce the correlation among component bounds in constrained designs.

Q

**Quartiles**

Quartiles are values that divide a sample of data into four equal parts. With them you can quickly evaluate a data set's spread and central tendency, which are important first steps in understanding your data.

**First quartile (Q1)** 25% of the data are less than or equal to this value.

**Second quartile (Q2)** The median. 50% of the data are less than or equal to this value.

**Third quartile (Q3)** 75% of the data are less than or equal to this value.

**Interquartile range** The distance between the first and third quartiles (Q3-Q1); thus, it spans the middle 50% of the data.
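
A sketch of the quartile and interquartile range calculation using `statistics.quantiles` from the Python standard library. Software packages interpolate quartiles in slightly different ways; the `"exclusive"` method used here is an assumption of this example and may not match every package.

```python
import statistics


def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) for a sample of at least two values."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q1, q2, q3, q3 - q1
```

For the values 1 through 7, this returns quartiles of 2, 4, and 6, so the interquartile range is 4.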

R

**R chart**

Plots the process range over time for variables data in subgroups. This control chart is widely used to examine the stability of processes in many industries.

**R2 (R-squared)**

Percentage of response variable variation that is explained by its relationship with one or more predictor variables. In general, the higher the R2, the better the model fits your data. R2 is always between 0 and 100%. It is also known as the coefficient of determination or multiple determination (in multiple regression).
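
In terms of sums of squares, R2 is one minus the ratio of unexplained variation to total variation. The sketch below assumes the fitted values come from a least squares model that includes an intercept, which is the setting in which the result stays between 0 and 100%; the function name is illustrative.

```python
def r_squared_percent(observed, fitted):
    """100 * (1 - SS_error / SS_total), as a percentage."""
    mean_y = sum(observed) / len(observed)
    ss_error = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("R-squared is undefined when the response is constant")
    return 100.0 * (1.0 - ss_error / ss_total)
```

A model whose fitted values equal the observations gives 100%, and a model that only predicts the response mean gives 0%.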

**Radians and degrees**

Units of measure used to indicate the size of an angle.

**Random sample**

A subset of a population selected by a process that makes all samples of a given size equally likely to occur. In statistics, you use a random sample to make generalizations, or inferences, about a population. Therefore, it is important that the sample is free from selection bias. A biased sample is one that is not representative of the population.

**Random variable**

A characteristic of an experiment or sample unit. In particular, it is a characteristic that is not directly controlled by the researcher. It can be numeric or it can be qualitative. For example, if you are sampling high school students, the sample unit would be a student. Possible random variables would include height, weight, gender, hair color, etc. As you randomly select students, each of these characteristics will vary.

**Randomization**

A technique used to balance the effect of extraneous or uncontrollable conditions that can impact the results of an experiment.

**Range**

The difference between the largest and smallest data values.

**Rank**

Assigns scores to values in a column: 1 to the smallest value in the column, 2 to the next smallest, and so on. Identical values (ties) are assigned the average rank for that value. Ranked scores are stored in a column.
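
The tie-handling rule can be sketched as follows, treating a column as a list. The function name `average_ranks` is chosen for the example.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); ties share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` across the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks
```

For the column 10, 20, 20, 30, the two tied values would occupy ranks 2 and 3, so each receives 2.5.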

**Ratio effects**

Provides a measure of the practical significance of a factor's effect. The ratio effect indicates the proportional increase in the standard deviation of the response when the factor is changed from the low to high level. The closer the ratio effect is to one, the smaller the effect of the factor.

**Ratio residuals**

The ratio of the observed response standard deviation to the fitted standard deviation. Indicates the portion of the response not explained by the model. You can obtain the ratio residuals by exponentiating the log residuals. Residual plots using ratio residuals may be difficult to interpret.
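
The relationship between the two residual types follows from the rules of logarithms: the difference of two natural logs is the log of the ratio. A minimal sketch, with illustrative function names:

```python
import math


def log_residual(observed_sd: float, fitted_sd: float) -> float:
    """ln(observed standard deviation) - ln(fitted standard deviation)."""
    return math.log(observed_sd) - math.log(fitted_sd)


def ratio_residual(observed_sd: float, fitted_sd: float) -> float:
    """Observed standard deviation divided by fitted standard deviation."""
    return observed_sd / fitted_sd
```

Exponentiating `log_residual(s, f)` gives `ratio_residual(s, f)`, so a log residual of 0 corresponds to a ratio residual of 1.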

**Reference line**

A horizontal or vertical line that spans the data region of a graph to designate goals or demarcations.

**Reference value**

The known and correct measurement associated with each part. It serves as a master value for comparison during measurement system analysis.

**Regression analysis**

Generates an equation to describe the statistical relationship between one or more predictors and the response variable and to predict new observations. Regression generally uses the ordinary least squares method which derives the equation by minimizing the sum of the squared residuals.

**Rejectable quality level (RQL)**

The poorest level of quality that the consumer is willing to tolerate in an independent lot. You want to design a sampling plan that rejects a particular lot of product at the RQL most of the time.

**Reliability test plan**

Specifies the sample size and test time duration of a reliability test. Minitab provides three types of test plans: demonstration test plans, estimation test plans, and accelerated life test plans.

**Repairable system**

A system where component parts are repaired or replaced. Growth curves can be used to chart the performance of repairable systems to help you establish.

**Repeatability and reproducibility**

Two components of measurement precision.

- Repeatability represents the variation that occurs when the same appraiser measures the same part with the same device.
- Reproducibility represents the variation that occurs when different appraisers measure the same part with the same device.

**Replicates**

Multiple experimental runs with the same factor settings (levels). Replicates are subject to the same sources of variability, independently of one another. You can replicate combinations of factor levels, groups of factor level combinations, or entire designs.

**Residual**

The difference between an observed value (y) and its corresponding fitted value (ŷ).

**Residual plots**

Use to examine the goodness of model fit in regression and ANOVA.

**Response optimization**

Helps you identify the combination of input variable settings that jointly optimize a single response or a set of responses. This is useful when you need to evaluate the impact of multiple inputs on a response.

**Response surface design**

A set of advanced design of experiments (DOE) techniques that help you better understand and optimize your response. Response surface design methodology is often used to refine models after important factors have been determined using factorial designs, especially if you suspect curvature in the response surface.

**Response and predictor variables**

Variables of interest in an experiment (those that are measured or observed) are called response or dependent variables. Other variables in the experiment that affect the response and can be set or measured by the experimenter are called predictor, explanatory, or independent variables.

**Restricted and unrestricted models**

Two types of mixed models in ANOVA. A mixed model is one with both fixed and random factors. A restricted model requires the crossed, mixed terms to sum to zero over subscripts corresponding to fixed effects. An unrestricted model does not require this. Many textbooks use the restricted model while most statistics programs use the unrestricted model. Minitab fits the unrestricted model by default, but you can choose to fit the restricted form. The reasons to choose one form over the other have not been clearly defined in the statistical literature.

**Round**

Reduces the number of digits displayed based on the number of decimal places you specify.

**Run (DOE)**

Each experimental condition or factor level combination at which responses are measured.

**Run chart**

A simple graphic representation of process data over time.

S

**S chart**

Plots the process standard deviation over time for variables data in subgroups. This control chart is widely used to examine the stability of processes in many industries.

**Standard deviation**

The most common measure of dispersion, or how spread out the data are from the mean. While the range estimates the spread of the data by subtracting the minimum value from the maximum value, the standard deviation roughly estimates the "average" distance of the individual observations from the mean. The greater the standard deviation, the greater the spread in the data.
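
The contrast between the range and the standard deviation can be sketched with Python's standard `statistics` module. The data values are made up for illustration, and the sample (n - 1) form of the standard deviation is assumed here.

```python
import statistics

# Illustrative data only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum, so it uses only the two extreme values.
data_range = max(data) - min(data)

# Sample standard deviation: uses every observation's distance from the mean.
sample_sd = statistics.stdev(data)

print(data_range, round(sample_sd, 3))
```

Because the range depends on just two observations, a single extreme value changes it sharply, whereas the standard deviation reflects the spread of all the data.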

**Standard error of the regression (S)**

Used as a measure of model fit in regression and ANOVA. S is measured in the units of the response variable and represents the standard distance data values fall from the regression line, or the standard deviation of the residuals. For a given study, the better the equation predicts the response, the lower the value of S.
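
The following sketch illustrates S for a simple linear regression with one predictor, assuming the usual form S = sqrt(SSE / (n - p)), where p is the number of estimated coefficients (2 here: slope and intercept). The helper names `fit_line` and `standard_error_of_regression` and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error_of_regression(x, y):
    """Square root of the residual sum of squares over n - 2."""
    slope, intercept = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in resid)
    return math.sqrt(sse / (len(x) - 2))  # 2 estimated coefficients


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(fit_line(x, y), standard_error_of_regression(x, y))
```

Because S is in the units of the response, you can read it directly as a typical distance between the observed values and the fitted line.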

**Signal-to-noise ratio**

In Taguchi designs, a measure of robustness used to identify control factors that reduce variability in a product or process by minimizing the effects of uncontrollable factors (noise factors). Control factors are those design and process parameters that can be controlled. Noise factors cannot be controlled during production or product use, but can be controlled during experimentation. In a Taguchi designed experiment, you manipulate noise factors to force variability to occur and from the results, identify optimal control factor settings that make the process or product robust, or resistant to variation from the noise factors. Higher values of the signal-to-noise ratio (S/N) indicate control factor settings that minimize the effects of the noise factors.
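
This glossary does not give the S/N formulas, so the sketch below assumes two commonly cited Taguchi forms: "larger is better", -10·log10(mean(1/y²)), and "smaller is better", -10·log10(mean(y²)). The function names and response values are hypothetical.

```python
import math


def sn_larger_is_better(responses):
    """Assumed form: -10 * log10(mean(1 / y^2)). Responses must be nonzero."""
    mean_inv_sq = sum(1 / y ** 2 for y in responses) / len(responses)
    return -10 * math.log10(mean_inv_sq)


def sn_smaller_is_better(responses):
    """Assumed form: -10 * log10(mean(y^2))."""
    mean_sq = sum(y ** 2 for y in responses) / len(responses)
    return -10 * math.log10(mean_sq)


# Illustrative responses measured across noise-factor settings for two
# candidate control-factor settings, where a larger response is better.
setting_a = [10, 10, 10]
setting_b = [20, 20, 20]
print(sn_larger_is_better(setting_a), sn_larger_is_better(setting_b))
```

In either form, the control factor setting with the higher S/N value is the preferred one.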

**Sample N**

The number of observations in your data. In general, the larger the sample, the more precise you can be in your statistical estimations.

**Sample size**

The number of observations in a subset of a population. Sample size influences the power of a test. As sample size increases, power increases; however, it is not always feasible or cost-effective to test a large sample.

**Sampling distribution**

Describes the likelihood of obtaining each possible value of a statistic from a random sample of a population; in other words, what proportion of all random samples of that size will give that value.

**Sampling risk**

The probability of incorrectly rejecting or accepting a particular lot based on a sample. In acceptance sampling you make a decision to accept or reject an entire lot based on the results from inspecting a sample from that lot.

**Scatterplot**

Use to explore the potential relationship between a pair of continuous variables. When you create a Scatterplot, you usually display the response variable on the y-axis and the predictor variable on the x-axis for each observation.

**Smoother line**

A line that is fitted to the data to explore the potential relationships between two variables, without fitting a specific model, such as a regression line or a theoretical distribution.

**Scree plot**

Displays the eigenvalues associated with a component or factor in descending order versus the number of the component or factor. Used in principal components analysis and factor analysis to visually assess which components or factors account for most of the variability in the data.

**Screening design**

A design (like fractional factorial DOE or Plackett-Burman) used to identify significant factors. Especially when an experiment has the potential to be very large, it's rarely time or cost effective to analyze all variables at all potential level combinations. Instead, it's better to use an iterative process that allows successive experiments to build from more general experiments aimed at finding vital factors and/or optimal setting ranges, to more specific experiments that more fully model the response surface and pinpoint the optimal factors. Screening designs are used at this earlier phase to "screen out" unimportant factors and/or settings.

**Standard error of coefficient**

The standard deviation of the estimate of a regression coefficient. It measures how precisely your data can estimate the coefficient's unknown value. Its value is always positive, and smaller values indicate a more precise estimate.

**Standard error of fits (SE Fit)**

Estimates the variation in the estimated mean response for a given set of predictor values, factor levels, or components and is used to generate the confidence interval for the prediction. The smaller the standard error, the more precise the estimated mean response.

**Standard error of the mean (SE Mean)**

Measures how precisely the sample mean estimates the population mean and is used to create confidence intervals for the population mean. Lower SE Mean values indicate more precise estimates of the population mean.
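
A minimal sketch, assuming the usual formula SE Mean = s / sqrt(n), where s is the sample standard deviation and n is the sample size. The function name and data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Illustrative data only.
sample = [10, 12, 14, 16, 18]
print(standard_error_of_mean(sample))
```

Because n is in the denominator, collecting more observations with the same spread lowers SE Mean and tightens the confidence interval for the population mean.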

**Sum of squares**

Represents a measure of variation or deviation from the mean. It is calculated as a summation of the squares of the differences from the mean. The calculation of the total sum of squares considers both the sum of squares from the factors and from random chance or error.

**Short-term z bench and long-term z bench**

Z bench values are used to describe the capability of your process, also known as the sigma capability of your process.

**Short-term Z bench (Z benchST)**

The number of short-term standard deviations that fit between the process center and the red line. This line represents the point on the normal curve that would include all the defects if you could put them on one side (that is, the defects outside both the upper and lower specifications are combined on the right side of the curve).

**Long-term Z bench (Z benchLT)**

The number of long-term standard deviations that fit between the process center and the red line. This line represents the point on the normal curve that would include all the defects if you could put them on one side (that is, the defects outside both the upper and lower specifications are combined on the right side of the curve).

**Signal factor**

A factor, with a range of settings, that is controlled by the user of the product to make use of its intended function. Signal factors are used in dynamic experiments, in which the response is measured at each level of the signal. The objective of the experiment is to improve the relationship between the signal factor and the response.

**Signs**

Converts negative numbers, zero, and positive numbers to -1, 0, and +1, respectively.
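
The mapping can be sketched in a few lines. The function name `sign` is hypothetical.

```python
def sign(value):
    """Return -1 for negative numbers, 0 for zero, and +1 for positive numbers."""
    if value < 0:
        return -1
    if value > 0:
        return 1
    return 0


print([sign(v) for v in [-3.5, 0, 2]])
```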

**Simplex centroid and simplex lattice designs**

Mixture designs in which the design points are arranged in a uniform manner (or lattice) over an L-simplex.

**Skewness**

The degree to which a data set is not symmetrical. Like many other basic statistics, skewness can help you establish an initial understanding of your data. You can evaluate skewness via a graph (like a histogram) or through the skewness statistic.

**Slope and intercept**

Indicates the steepness of a line (slope) and the location where it intersects an axis (intercept). The slope and the intercept define the linear relationship between two variables, and can be used to estimate an average rate of change. The greater the magnitude of the slope, the steeper the line and the greater the rate of change.

**Somers' D**

Somers' D measures the strength and direction of the relationship between pairs of variables. Values range from -1 (all pairs disagree) to 1 (all pairs agree).

**Statistical process control (SPC)**

The use of statistical techniques to analyze a process in order to monitor, control, and improve it. The objective is to have a stable, consistent process that produces the fewest defects possible.

**Spearman's rho**

A measure of association between two ordinal variables, based on the ranks of the data values.
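
One common way to compute this statistic, sketched here as an assumption rather than as any particular program's method, is to rank each variable (tied values share the average of their ranks) and then take the Pearson correlation of the ranks. All function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / math.sqrt(ss_a * ss_b)


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data only; y contains one tie.
print(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

Because only the ranks enter the calculation, any strictly increasing relationship gives a value of 1, even if it is far from linear.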

**Specification limits**

Values between which products or services should operate. Specification limits are usually set by customer requirements.

**Split-plot design**

An experimental design that includes at least one hard-to-change factor that is difficult to completely randomize due to time or cost constraints.

**Square root**

For any nonnegative number $a$, the square root is the nonnegative number $n$ such that $n^2 = a$. The square root can be denoted by $\sqrt{a}$ or $a^{0.5}$.

**Sum of squares (uncorrected)**

Squares each value in the column, and computes the sum of those squared values. That is, if the column contains $x_1, x_2, \dots, x_n$, then sum of squares calculates $x_1^2 + x_2^2 + \dots + x_n^2$. Unlike the corrected sum of squares, the uncorrected sum of squares includes error. The data values are squared without first subtracting the mean.
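
The difference between the two quantities can be sketched as follows. The function names and data are hypothetical, and the corrected form is assumed to be the sum of squared deviations from the mean.

```python
def uncorrected_sum_of_squares(values):
    """Square each value as-is, then add the squares."""
    return sum(v ** 2 for v in values)


def corrected_sum_of_squares(values):
    """Subtract the mean from each value before squaring and adding."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Illustrative data only.
column = [1, 2, 3, 4]
print(uncorrected_sum_of_squares(column))  # 1 + 4 + 9 + 16
print(corrected_sum_of_squares(column))
```

The two are linked by the identity corrected = uncorrected - n × mean², which the example data satisfy: 30 - 4 × 2.5² = 5.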

**Standardized residuals**

Helpful in detecting outliers. The standardized residual equals the value of a residual, $e_i$, divided by an estimate of its standard deviation.

**Stability**

The change in bias over time. Measurement stability represents the total variation in measurements obtained on the same part measured over time, also known as drift. It is important to assess stability on an ongoing basis. While calibrations and gage studies provide some information about changes in the measurement system, neither provides information on what is happening to the measurement process over time.

**Studentized deleted residuals**

Helpful in detecting outliers. The Studentized deleted residual of an observation is calculated by dividing an observation's deleted residual by an estimate of its standard deviation. A deleted residual $d_i$ is the difference between $y_i$ and its fitted value in a model that omits the $i$th observation from its calculations. The observation is omitted to see how the model behaves without this potential outlier. If an observation has a large Studentized deleted residual (if its absolute value is greater than 2), it may be an outlier in your data.

**Standardized log residual**

The log residual divided by its (asymptotic) standard error. Minitab calculates the standardized log residuals using the method you select to analyze the variability of your model. If you use the least squares estimation method, the standardized log residuals are the standardized residuals obtained from performing weighted least squares regression on the log of the standard deviation of your response. Use standardized log residual in residual plots to examine your model.

**Static and dynamic Taguchi designs**

Two types of Taguchi designs that allow you to choose a product or process that performs more consistently in the operating environment. Both designs attempt to identify control factors that minimize the effect of the noise factors on the product or service.

**Statistically significant**

The difference between a sample statistic and a hypothesized parameter value is statistically significant if a hypothesis test suggests it is too unlikely to have occurred by chance. You can assess statistical significance by looking at a test's p-value, which is the probability of obtaining a test statistic at least as extreme as the one you actually calculated from your sample, if the null hypothesis is true. If the p-value is below a specified significance level, or alpha (α) level (typically 0.10, 0.05, or 0.01), you can declare the difference to be statistically significant and reject the test's null hypothesis.

**Stem-and-leaf plot**

Displays data to show its shape and distribution. It is similar to a histogram; however, a stem-and-leaf plot shows the exact data points, which makes calculating the mean, median, and mode much easier.

**Stress value, stress level**

The amount of stress that you impose on a unit during a pass/fail test (probit analysis) or an accelerated life test.

**Study variation**

In gage R&R, the amount of variation caused by the measurement system and by the differences between parts.

**Subgroup**

A group of units produced under the same set of conditions.

**Sum**

The result of adding two or more numbers.

**Survival plot**

Displays a plot of the survival probabilities versus time. Each plot point represents the proportion of units surviving at time t. The survival function is a non-increasing function of time.

**Survival probability**

The proportion of units that survive beyond a given time. These estimates of survival probabilities are often referred to as reliability estimates. Use these values to determine whether your product meets reliability requirements or to compare the reliability of two or more designs of a product.
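
As a simple illustration, the sketch below estimates a survival probability as the fraction of units whose failure time exceeds a given time. It assumes every unit's failure time is known (no censored observations); the function name and failure times are hypothetical.

```python
def survival_probability(failure_times, t):
    """Fraction of units that are still surviving beyond time t."""
    survivors = sum(1 for failure_time in failure_times if failure_time > t)
    return survivors / len(failure_times)


# Illustrative failure times (for example, in hours) for five units.
failure_times = [5, 8, 12, 20, 30]
print(survival_probability(failure_times, 10))
```

Evaluating the function at increasing times never produces a larger value, which matches the non-increasing shape of a survival plot.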

T

**t-test**

A hypothesis test of the mean of one or two normally distributed populations.

**Tangent**

In trigonometry, the tangent of an angle is the ratio of its sine to its cosine.

**Target value**

The ideal value of a product parameter as determined by a customer, engineer, or manager.

**t-distribution**

A symmetric, bell-shaped distribution that is similar to the normal distribution, but with thicker tails. As the foundation of the t-test, this distribution is useful for analyzing the mean of an approximately normal population when the population standard deviation is unknown. Tests of significance for regression coefficients also use the t-distribution.

**Test statistic**

A standardized value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. When hypothesis tests compare your observed sample data with what is expected under the null hypothesis, the comparison is based on the test statistic. Values of a test statistic correspond to p-values for the hypothesis test. Therefore, when the data present strong evidence against the assumptions in the null hypothesis, the magnitude of the test statistic becomes large and the test's p-value may become small enough to reject the null hypothesis.

**Test R²**

Indicates how well the PLS model predicts your test data. Analysis using PLS is often done in two steps.

**t-value**

The test statistic for the t-test family. It measures the difference between an observed statistic and its hypothesized population parameter in units of standard error. A t-test compares this observed t-value to a critical value on the t-distribution with (n-1) degrees of freedom to determine whether the difference between the estimated and hypothesized values of the population parameter is statistically significant.
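
For the one-sample case, the calculation can be sketched as t = (sample mean - hypothesized mean) / (s / sqrt(n)) with n - 1 degrees of freedom. The function name, the data, and the hypothesized mean of 5 are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return the t-value and its degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_value = (statistics.mean(sample) - hypothesized_mean) / standard_error
    return t_value, n - 1


# Illustrative data only.
t_value, degrees_of_freedom = one_sample_t([4, 6, 8, 10], 5)
print(round(t_value, 3), degrees_of_freedom)
```

The sketch stops at the test statistic; deciding significance still requires comparing it to a critical value (or computing a p-value) from the t-distribution with the returned degrees of freedom.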

**Tests of equal variances**

Use to test the equality of variance between populations or factor levels. Many statistical procedures, such as analysis of variance (ANOVA) and regression, assume that although different samples may come from populations with different means, they have the same variance.

**Time series**

A sequence of observations over regularly spaced intervals of time.

**Time series plot**

Use to evaluate patterns and behavior in data over time. Time series plots display observations on the y-axis against equally spaced time intervals on the x-axis.

**Time-weighted charts**

Control charts that take into account historical data, allowing detection of relatively small shifts from the target value.

**Tolerance interval**

Frequently used to detect excessive variation by comparing client requirements to tolerance limits that cover a specified proportion of the population. If the tolerance interval is wider than the client's requirements, there may be too much product variation. Derived from sample statistics, tolerance intervals are a range of values for a specific quality characteristic that likely covers a specified population proportion. Alternatively, a lower or upper limit may be constructed such that the specified proportion will be greater than or less than the limit.

**Tolerance (capability analysis)**

A value that sets the standard by which the capability of your process is determined. It is defined as a multiple of a process standard deviation (sigma). Typically, 6*sigma is used as a tolerance.

**Total-time-on-test plot**

In repairable systems analysis, use the total-time-on-test (TTT) plot as a graphical goodness-of-fit test to determine whether the power-law process can model your system. The power-law process is a nonhomogeneous Poisson process with an intensity function that represents the rate of failures or repairs.

**Transform count**

Performs the Freeman-Tukey transformation to stabilize variance for Poisson data.

**Treatments**

Experimental settings applied in an organized manner across test subjects.

**Triangular distribution**

Used primarily to describe a population for which limited sample data are available.

**Trimmed mean (TRMEAN)**

The mean of the data after a certain percentage of the most extreme (high and low) values are excluded. The trimmed mean reduces the impact of very large or small values on the mean, and thus may provide a more useful measure of central tendency for data with outliers.
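
A minimal sketch of the idea follows. The `proportion` parameter (the fraction removed from each end) and the choice to truncate the trim count to a whole number are assumptions made for illustration; statistical packages differ in the percentage they trim and in how they round the count.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    The number of values dropped per end is truncated to a whole number.
    """
    ordered = sorted(values)
    drop = int(len(ordered) * proportion)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


# Illustrative data with one very large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))    # ordinary mean, pulled upward by 100
print(trimmed_mean(data, 0.1))  # drops 1 and 100 before averaging
```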

**T² generalized variance chart**

Plots the process location (T² chart) and the process variability (generalized variance chart) of two or more related process characteristics over time. It is the multivariate counterpart to the Xbar-R, Xbar-S, and I-MR charts. You can use this chart to simultaneously assess whether the process mean and variation are in control.

**Tukey's method**

Used in ANOVA to create confidence intervals for all pairwise differences between factor level means while controlling the family error rate to a level you specify.

**Two-level factorial design**

An experimental design where each factor has only two levels, not counting center points.

**Type I and type II error**

Errors you may make in your decision to reject or fail to reject the null hypothesis ($H_0$). If you reject the null hypothesis when it is true, you make a Type I error. If you fail to reject the null hypothesis when it is false, you make a Type II error.

U

**U chart**

Plots the number of nonconformities or defects per unit. It is possible for a unit to have one or more nonconformities but still be acceptable in function and performance.

**Uniform distribution**

A continuous distribution that describes variables that have a constant probability.

V

**Variance**

A measure of dispersion, which is the extent to which a data set or distribution is scattered around its mean.

**Variance components**

Use to evaluate sources of variation.

**Variable**

An observable characteristic that can vary for different units or subjects.

**Variables charts**

Control charts that plot continuous measurement process data, such as length or pressure, in a time-ordered sequence. In contrast, attributes control charts plot count data, such as the number of defects or defective units. Variables charts, like all control charts, help you identify causes of variation to investigate, so that you can take action on your process without over-controlling it.

**Variance inflation factor (VIF)**

Indicates the extent to which multicollinearity (correlation among predictors) is present in a regression analysis. Multicollinearity is problematic because it can increase the variance of the regression coefficients, making them unstable and difficult to interpret.

**Variance-covariance matrix**

A square matrix that contains the variances and covariances associated with several variables. The diagonal elements of the matrix contain the variances of the variables and the off-diagonal elements contain the covariances between all possible pairs of variables.
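
The layout of the matrix can be sketched as below, assuming sample (n - 1) variances and covariances. The function names and the two data columns are hypothetical.

```python
def covariance(a, b):
    """Sample covariance of two equal-length columns (divides by n - 1)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return cross / (len(a) - 1)


def covariance_matrix(columns):
    """Entry [i][j] is the covariance of column i with column j.

    Diagonal entries are variances, because the covariance of a column
    with itself is its variance.
    """
    return [[covariance(ci, cj) for cj in columns] for ci in columns]


# Illustrative data only: two variables measured on four units.
matrix = covariance_matrix([[1, 2, 3, 4], [2, 4, 6, 8]])
for row in matrix:
    print(row)
```

The result is symmetric, since the covariance of variable i with variable j equals the covariance of j with i.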

W

**Weibull distribution**

Distribution commonly used to model time-to-failure data.

**Whole plot and subplot error**

The two types of random experimental error present in a split-plot design.

Whole plot error is the variability that occurs between the whole plots in the experiment. Subplot error is the variation from run to run within a whole plot that is not explained by the factors. For example, a large baked goods company uses a split-plot experiment to design a new brownie recipe.

**Wilcoxon**

A nonparametric hypothesis test for the median of a single population.

X

**Xbar chart**

Plots the process mean over time for variables data in subgroups. This control chart is widely used to examine the stability of processes in many industries. For example, you can use Xbar charts to examine the process mean for subgroups of part lengths, call times, or hospital patients' blood pressure over time.

**Xbar-R chart**

Plots the process mean (Xbar chart) and process range (R chart) over time for variables data in subgroups. This combination control chart is widely used to examine the stability of processes in many industries. For example, you can use Xbar-R charts to examine the process mean and variation for subgroups of part lengths, call times, or hospital patients' blood pressure over time.

**Xbar-S chart**

Plots the process mean (Xbar chart) and process standard deviation (S chart) over time for variables data in subgroups. This combination control chart is widely used to examine the stability of processes in many industries. For example, you can use Xbar-S charts to examine the process mean and variation for subgroups of part lengths, call times, or hospital patients' blood pressure over time.

**X-calculated values**

Linear combinations of the x-scores. The x-calculated values contain the variance in the predictors explained by the PLS model. Observations with relatively small x-calculated values are outliers in the x-space and are not well explained by the model.

**X-loadings**

The linear coefficients that link the predictors to the x-scores. The x-loadings indicate the importance of the corresponding predictor to the mth component. X-loadings, which are similar to eigenvectors in principal components analysis, form a p x m matrix, where p = number of predictors and m = number of components.

**X-residuals**

Contain the variance in the predictors not explained by the PLS model. Observations with relatively large x-residuals are outliers in the x-space, indicating that they are not well explained by the model.

**X-scores**

Linear combinations of the predictor variables. The x-scores, which are similar to principal component scores, form an n x m matrix of uncorrelated columns, where n = number of observations and m = number of components. Providing a window to the x-space, the x-scores are projections of the observations on the PLS components. PLS fits the x-scores, which replace the original predictors, using least squares estimation.

**X-variance**

Indicates the amount of variance in the predictors that is explained by the model. The x-variance value is between 0 and 1. The closer the x-variance value is to 1, the better the components represent the original set of predictors. If you have more than 1 response, the x-variance value is the same for all responses.

**X-weights**

Describe the covariance between the predictors and responses. In the algorithm, the x-weights are used to ensure the x-scores are orthogonal, or unrelated to one another. The x-weights, which are used to calculate the x-scores, form a p x m matrix, where p = number of predictors and m = number of components.

Y

**Y-calculated values**

Linear combinations of the x-scores. The y-calculated values contain the variance in the responses explained by the PLS model. Observations with relatively small y-calculated values are outliers in the y-space and are not well explained.

**Yield metrics**

There are several different yield metrics that you can use to describe the performance of your process, in terms of the percentage of good units coming out of a process.

**Y-loadings**

The linear coefficients that link the responses to the y-scores. The y-loading values indicate the importance of the corresponding response to the mth component. Y-loadings form an r x m matrix, where r = number of responses and m = number of components.

**Y-residuals**

Contain the remaining variance in the responses not explained by the PLS model. Observations with relatively large y-residuals are outliers in the y-space, indicating that they are not well explained by the model.

The y-residuals are the differences between the actual response values and y-calculated values, and are on the same scale as the original responses. The y-residual matrix, similar to the original y-matrix, is an n x r matrix, where n = number of observations and r = number of responses.

**Y-scores**

Linear combinations of the response variables. The y-scores form an n x m matrix, where n = number of observations and m = number of components. Providing a window to the y-space, the y-scores are projections of the observations on the PLS components.

Z

**Z bench**

Z bench values are used to describe the capability of your process, also known as the sigma capability of your process.

**Z-MR chart**

Plots the standardized process mean (Z chart) and variation (MR chart) for short run processes when little data are available. Short run processes often do not have enough data in each run to produce good estimates of the process parameters. For example, you may produce only 20 units of a part, then reset the machine to produce a different part in the next run.

**Z (Johnson transformation)**

A set of standard normal percentiles used to select and estimate the Johnson transformation. The optimum transformation is that which produces normalized data with the greatest p-value in an Anderson-Darling normality test.

**Zone chart**

Plots a cumulative score, based on "zones" at 1, 2, and 3 standard deviations from the center line. The zone chart is a hybrid between an Xbar (or individuals) control chart and a CUSUM control chart. By default, a point is out of control if its score is greater than or equal to 8 (the zone 4 score). Use Zone charts when you have variables data and want to create an easily interpretable control chart. Zone charts are also known as J charts.

**Z-test**

A hypothesis test based on the Z-statistic, which follows the standard normal distribution under the null hypothesis.

**Z-value (Z-score)**

Measures how far an observation lies from its mean, in units of standard deviation.
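
A minimal sketch, assuming the usual formula z = (observation - mean) / standard deviation. The function name and the numbers are hypothetical.

```python
def z_score(observation, mean, standard_deviation):
    """Distance of an observation from the mean, in standard deviations."""
    return (observation - mean) / standard_deviation


# Illustrative values: an observation of 130 from a population with
# mean 100 and standard deviation 15 lies 2 standard deviations above the mean.
print(z_score(130, 100, 15))
```

A negative result indicates an observation below the mean.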