Glossary

A Primer on Partial Least Squares
Structural Equation Modeling (PLS-SEM)

10 times rule: one way to determine the minimum sample size specific to the PLS path model that one needs for model estimation (i.e., 10 times the number of independent variables of the most complex ordinary least squares regression in the structural model or any formative measurement model). The 10 times rule is not a reliable indication of sample size requirements in PLS-SEM and should at best be seen as a rough estimate. While statistical power analyses provide more reliable minimum sample size estimates, researchers should primarily draw on the inverse square root method, which stands out in terms of precision and ease of use.

Absolute contribution: the information an indicator variable provides about the formatively measured construct, ignoring all other indicators. The absolute contribution is provided by the loading of the indicator (i.e., its bivariate correlation with the formatively measured construct).

Absolute importance: see Absolute contribution.

Akaike weights: the weight of evidence in favor of a certain model being the best model for the situation at hand given a set of alternative models.
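
As a hedged illustration, Akaike weights are commonly derived from the difference between each model's information criterion value and the smallest value in the set. The sketch below assumes that standard formula; the function name and the criterion values are hypothetical.

```python
import math

def akaike_weights(criterion_values):
    """Convert information criterion values into weights of evidence.

    Assumes the common formula w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i is the difference to the smallest criterion value in the set.
    """
    best = min(criterion_values)
    relative = [math.exp(-(value - best) / 2.0) for value in criterion_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical criterion values for three competing models
weights = akaike_weights([100.0, 102.0, 110.0])
```

The weights sum to one, so each can be read as the relative support for one model within the compared set.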

Algorithmic options: offer different ways to run the PLS-SEM algorithm by, for example, selecting between alternative starting values, stop values, weighting schemes, and maximum number of iterations.

Alternating extreme pole responses: a suspicious survey response pattern where a respondent uses only the extreme poles of the scale (e.g., a 7-point scale) in an alternating order to answer the questions.

Artifacts: human-made concepts that are typically measured with formative indicators.

AVE: see Average variance extracted.

Average variance extracted (AVE): a measure of convergent validity. It is the degree to which a latent construct explains the variance of its indicators; see Communality (construct).
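
A minimal sketch, assuming standardized outer loadings: the AVE is typically computed as the mean of the squared loadings of a construct's indicators. The function name and the loadings are hypothetical.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized outer loadings of one construct."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return sum(loading ** 2 for loading in loadings) / len(loadings)

# Hypothetical standardized loadings of a three-indicator construct
ave = average_variance_extracted([0.8, 0.7, 0.9])
```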

Bandwidth-fidelity dilemma: a practical dilemma resulting from the trade-off between using measures that will cover the majority of variation in a trait or measures that will assess a few specific traits more precisely.

Bayesian information criterion (BIC): a criterion for model selection among an alternative set of models. The model with the lowest BIC is preferred.

Bias-corrected and accelerated (BCa) bootstrap confidence intervals: a method for constructing confidence intervals that adjusts for biases and skewness in the bootstrap distribution. The method yields very low type I errors but is limited in terms of statistical power.

BIC: see Bayesian information criterion.

Blindfolding: a sample reuse technique that omits singular elements of the data matrix and uses the model estimates to predict the omitted part. It is used to compute the Q² statistic.

Bootstrap cases: these make up the number of observations drawn in every bootstrap run. The number is set equal to the number of valid observations in the original data set.

Bootstrap confidence interval: provides an estimated range of values that is likely to include an unknown population parameter. It is determined by its lower and upper bounds, which depend on a predefined probability of error and the standard error of the estimation for a given set of sample data. When zero does not fall into the confidence interval, an estimated parameter can be assumed to be significantly different from zero for the prespecified probability of error (e.g., 5%).

Bootstrap samples: the number of samples drawn in the bootstrapping procedure. Generally, 10,000 or more samples are recommended.

Bootstrapping: a resampling technique that draws a large number of subsamples from the original data (with replacement) and estimates models for each subsample. It is used to determine standard errors of coefficients to assess their statistical significance without relying on distributional assumptions.
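
The following sketch illustrates the resampling logic on a deliberately simple statistic (a bivariate correlation) rather than on a full PLS path model, which is an assumption made for brevity. The data, function names, and the reduced number of 2,000 samples are hypothetical; in applications, 10,000 or more samples are recommended.

```python
import random
import statistics

def pearson(xs, ys):
    """Bivariate correlation of two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_standard_error(xs, ys, n_samples=10_000, seed=42):
    """Standard deviation of the estimates across bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)  # bootstrap cases = number of valid observations
    estimates = []
    while len(estimates) < n_samples:
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate subsamples without variance
        estimates.append(pearson(bx, by))
    return statistics.stdev(estimates), estimates

# Hypothetical data for two variables
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
y = [2.1, 1.9, 3.5, 3.0, 5.2, 4.8, 6.1, 7.4, 6.9, 8.8, 9.1, 10.5]

original = pearson(x, y)
se, estimates = bootstrap_standard_error(x, y, n_samples=2000)
empirical_t = original / se                 # empirical t value
significant_5pct = abs(empirical_t) > 1.96  # two-tailed critical t value at 5%
```

Dividing the original estimate by the bootstrap standard error gives the empirical t value, which is then compared with the critical t value.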

Cascaded moderator analysis: a type of moderator analysis in which the strength of a moderating effect is influenced by another variable (i.e., the moderating effect is again moderated).

Casewise deletion: an entire observation (i.e., a case or respondent) is removed from the data set because of missing data.

Categorical moderator variable: see Multigroup analysis.

Categorical scale: see Nominal scale.

Causal indicators: a type of indicator used in formative measurement models. Causal indicators do not fully form the latent variable but “cause” it. Therefore, causal indicators must correspond to a theoretical definition of the concept under investigation.

Causal links: are directed relationships between constructs, which can be interpreted as causal if supported by strong theory.

CB-SEM: see Covariance-based structural equation modeling.

CCA: see Confirmatory composite analysis.

Centroid weighting scheme: uses in the first stage of the PLS-SEM algorithm a value of +1 or −1 for relationships between the constructs in the structural model depending on the sign of their correlations; see Weighting scheme.

Cluster analysis: see Clustering.

Clustering: a class of methods that partition a set of objects with the goal of obtaining high similarity within the formed groups and high dissimilarity across groups.

Coding: the assignment of numbers to scales in a manner that facilitates measurement.

Coefficient of determination (R²): a measure of the proportion of an endogenous construct’s variance that is explained by its predictor constructs. It indicates a model’s explanatory power with regard to a specific endogenous construct.

Collinearity: arises when two variables are highly correlated.

Common factor-based SEM: a type of SEM method, which considers the constructs as common factors that explain the covariation between their associated indicators.

Common factor model: assumes that only the variance shared by the indicators used to measure a construct (i.e., the common variance) should be used to estimate the construct and its relationship with other constructs in a model. Exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and CB-SEM (also referred to as common factor-based SEM) are the three main types of analyses based on common factor models.

Communality (construct): see Average variance extracted (AVE).

Communality (item): see Indicator reliability.

Competitive mediation: a situation in mediation analysis that occurs when the indirect effect and the direct effect are both significant and point in opposite directions.

Complementary mediation: a situation in mediation analysis that occurs when the indirect effect and the direct effect are both significant and point in the same direction.

Composite indicators: a type of indicator used in formative measurement models. Composite indicators form the construct (or composite) fully by means of linear combinations.

Composite reliability (ρA): a measure of internal consistency reliability, which is considered a sound trade-off between the conservative Cronbach’s alpha and the liberal composite reliability (ρC).

Composite reliability (ρC): a measure of internal consistency reliability, which, unlike Cronbach’s alpha, does not assume equal indicator loadings. It should be above 0.70 (in exploratory research, 0.60 to 0.70 is considered acceptable).
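
A minimal sketch of how ρC is commonly computed from standardized outer loadings, assuming that each indicator's error variance equals one minus its squared loading. The function name and the loadings are hypothetical.

```python
def composite_reliability_rho_c(loadings):
    """rho_C = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)

# Hypothetical standardized loadings of a three-indicator construct
rho_c = composite_reliability_rho_c([0.8, 0.7, 0.9])
meets_threshold = rho_c > 0.70
```

Because each loading enters individually, the measure does not require equal indicator loadings.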

Composite scores: see Construct scores.

Composite variable: a linear combination of several variables.

Composite-based SEM: a type of SEM method that represents the constructs as composites, formed by linear combinations of sets of indicator variables.

Compositional invariance: exists when the composite scores across the groups are perfectly correlated.

Conditional indirect effect: see Moderated mediation.

Conditional process models: combine mediation and moderation analysis. See Mediated moderation and Moderated mediation.

Confidence interval: see Bootstrap confidence interval.

Configural invariance: exists when constructs are equally parameterized and estimated across groups.

Confirmatory: describes applications that aim at empirically testing theoretically developed models.

Confirmatory composite analysis (CCA): a set of analyses used to verify the quality of a composite measurement of a theoretical concept of interest.

Confirmatory tetrad analysis in PLS-SEM (CTA-PLS): a statistical procedure that allows for empirically testing the measurement model setup (i.e., whether the measures should be specified reflectively or formatively).

Consistency at large: describes an improvement of precision of PLS-SEM results when both the number of indicators per measurement model and the number of observations increase, assuming that the data stem from a common factor model.

Consistent PLS-SEM (PLSc-SEM): a variant of the standard PLS-SEM algorithm that provides consistent model estimates in common factor models by disattenuating the correlations between pairs of latent variables, thereby mimicking CB-SEM results.

Construct scores: columns of data (vectors) for each latent variable that represent a key result of the PLS-SEM algorithm. The length of every vector equals the number of observations in the data set used.

Constructs: measure theoretical concepts that are abstract, complex, and cannot be directly observed by means of (multiple) items. Constructs are represented in path models as circles or ovals and are also referred to as latent variables.

Content specification: the specification of the scope of the latent variable; that is, the domain of content the indicators are intended to capture.

Content validity: a subjective but systematic evaluation of how well the domain content of a construct is captured by its indicators.

Continuous moderator variable: a variable that affects the direction and/or strength of the relation between an exogenous latent variable and an endogenous latent variable. Continuous moderator variables can also be used to generate categories, which serve as basis for a subsequent multigroup analysis.

Control variables: the variables that researchers seek to keep constant when conducting research.

Convergence: this is reached when the results of the PLS-SEM algorithm no longer change substantially from one iteration to the next. In that case, the PLS-SEM algorithm stops because the changes in its computations have fallen below a prespecified stop criterion (i.e., a small number such as 0.00001). Thus, convergence has been accomplished when the PLS-SEM algorithm stops because the prespecified stop criterion has been reached and not because the maximum number of iterations has been reached.
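
The stopping logic can be sketched as a generic loop. This is not the PLS-SEM algorithm itself: the update function is a hypothetical stand-in, and measuring the change as the sum of absolute weight differences between two iterations is an assumption.

```python
def iterate_until_convergence(update, weights, stop_criterion=1e-5, max_iterations=300):
    """Repeat `update` until the weights barely change or the iteration cap is hit.

    Returns the final weights, the number of iterations used, and whether
    the stop criterion (rather than the iteration cap) ended the loop.
    """
    for iteration in range(1, max_iterations + 1):
        new_weights = update(weights)
        change = sum(abs(new - old) for new, old in zip(new_weights, weights))
        weights = new_weights
        if change < stop_criterion:
            return weights, iteration, True
    return weights, max_iterations, False

# Hypothetical update step that moves each weight halfway toward a fixed target
target = [0.6, 0.4]
def toy_update(current):
    return [w + 0.5 * (t - w) for w, t in zip(current, target)]

# All weights start at +1, mirroring an equal-weight initialization
final_weights, iterations, converged = iterate_until_convergence(toy_update, [1.0, 1.0])
```

If the loop ends with the third return value set to False, the maximum number of iterations was reached without convergence.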

Convergent validity: the degree to which a reflectively specified construct explains the variance of its indicators (see Average variance extracted). In formative measurement model evaluation, convergent validity refers to the degree to which the formatively measured construct correlates positively with an alternative (reflective or single-item) measure of the same concept (see Redundancy analysis).

Correlation weights: see Mode A.

Covariance-based structural equation modeling (CB-SEM): an approach for estimating structural equation models that assumes that the concepts of interest can be represented by common factors. It can be used for theory testing but has clear limitations in terms of testing a model’s predictive power.

Coverage error: occurs when the bootstrapping confidence interval of a parameter does not correspond to its empirical confidence interval.

Critical t value: the cutoff or criterion on which the significance of a coefficient is determined. If the empirical t value is larger than the critical t value, the null hypothesis of no effect is rejected. Typical critical t values are 2.57, 1.96, and 1.65 for significance levels of 1%, 5%, and 10%, respectively (two-tailed tests).

Critical value: see Significance testing.

Cronbach’s alpha: a measure of internal consistency reliability that assumes equal indicator loadings. Cronbach’s alpha represents a conservative measure of internal consistency reliability.
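
A minimal sketch, assuming the standard formula based on item variances and the variance of the summed scores. The function name and the response data are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per indicator, all of equal length.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two indicators are required")
    item_variances = [statistics.variance(scores) for scores in items]
    total_scores = [sum(row) for row in zip(*items)]  # sum score per respondent
    total_variance = statistics.variance(total_scores)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of five respondents to three indicators
responses = [
    [4, 5, 3, 6, 7],
    [5, 5, 4, 6, 6],
    [3, 6, 3, 7, 7],
]
alpha = cronbach_alpha(responses)
```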

Cross-loadings: an indicator’s correlation with other constructs in the model.

CTA-PLS: see Confirmatory tetrad analysis in PLS-SEM (CTA-PLS).

Data matrix: includes the empirical data that are needed to estimate the PLS path model. The data matrix must have one column for every indicator in the PLS path model. The rows represent the observations with their responses to every indicator in the PLS path model.

Degrees of freedom (df): the number of values in the final calculation of the test statistic that are free to vary.

Diagonal lining: a suspicious survey response pattern in which a respondent uses the available points on a scale (e.g., a 7-point scale) to place the answers to the different questions on a diagonal line.

Direct effect: a relationship linking two constructs with a single arrow between the two.

Direct-only nonmediation: a situation in mediation analysis that occurs when the direct effect is significant but not the indirect effect.

Disattenuated correlation: the correlation between two constructs, if they were perfectly measured (i.e., if they were perfectly reliable).
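
As a sketch, assuming the classical correction for attenuation, the observed correlation is divided by the square root of the product of the two constructs' reliabilities. The function name and the input values are hypothetical.

```python
import math

def disattenuated_correlation(observed_correlation, reliability_x, reliability_y):
    """Correlation two constructs would show if both were perfectly reliable."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return observed_correlation / math.sqrt(reliability_x * reliability_y)

# Hypothetical observed correlation and reliabilities
corrected = disattenuated_correlation(0.5, 0.8, 0.8)
```

With perfect reliabilities of 1, the corrected value equals the observed correlation.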

Discriminant validity: the extent to which a construct is empirically distinct from other constructs in the model.

Disjoint two-stage approach: uses only the lower-order components of a higher-order construct in its first stage to compute the construct scores, which serve as indicators of the higher-order component in the second stage.

Effect indicators: see Reflective measurement.

Embedded two-stage approach: uses the entire higher-order construct in its first stage to compute the construct scores, which serve as indicators of the higher-order component in the second stage.

Empirical t value: the test statistic value obtained from the data set at hand (here: the bootstrapping results). See Significance testing.

Endogeneity: occurs when a predictor construct is correlated with the error term of the dependent construct to which it is related.

Endogenous constructs: see Endogenous latent variables.

Endogenous latent variables: serve only as dependent variables or as both independent and dependent variables in a structural model.

Equality of composite mean values and variances: the final requirement to establish full measurement invariance.

Equidistance: when the distance between data points of a scale is identical.

Error terms: capture the unexplained variance in constructs and indicators when path models are estimated.

Evaluation criteria: used to evaluate the quality of the measurement models and the structural model results in PLS-SEM based on a set of nonparametric evaluation criteria and procedures such as bootstrapping.

Exact fit test: a model fit test that applies bootstrapping to derive p values of the Euclidean distance (dL) or geodesic distance (dG) between the observed correlations and the model-implied correlations. Research has shown that these measures are largely unsuitable for detecting model misspecification in situations commonly encountered in applied research.

Exogenous constructs: see Exogenous latent variables.

Exogenous latent variables: latent variables that serve only as independent variables in a structural model.

Explained variance: see Coefficient of determination (R²).

Explaining and predicting (EP) theories: a theory type that implies both understanding of underlying causes and prediction, as well as description of theoretical constructs and the relationships among them.

Explanatory power: provides information about the strength of the assumed causal relationships in a PLS path model. The primary measure for assessing a PLS path model’s explanatory power is the coefficient of determination (R²).

Exploratory: describes applications that focus on exploring data patterns and identifying relationships.

Extended repeated indicators approach: a method for estimating a formatively specified higher-order construct whose higher-order component serves as an endogenous construct in the PLS path model. Also see Repeated indicators approach.

f² effect size: a measure used to assess the relative impact of a predictor construct on an endogenous construct in terms of its explanatory power.
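
A minimal sketch, assuming the usual computation from two R² values of the endogenous construct: one from the model that includes the predictor construct and one from the model that omits it. The function name and the R² values are hypothetical.

```python
def f_squared(r2_included, r2_excluded):
    """f2 = (R2 with predictor - R2 without predictor) / (1 - R2 with predictor)."""
    if r2_included >= 1:
        raise ValueError("R2 of the full model must be below 1")
    return (r2_included - r2_excluded) / (1 - r2_included)

# Hypothetical R2 values with and without a specific predictor construct
effect_size = f_squared(r2_included=0.60, r2_excluded=0.50)
```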

Factor (score) indeterminacy: means that one can compute an infinite number of sets of factor scores matching the specific requirements of a certain common factor model. In contrast to their explicit estimation in PLS-SEM, the scores of common factors as assumed in CB-SEM are indeterminate.

Factor weighting scheme: uses the correlations between constructs in the structural model to determine their relationships in the first stage of the PLS-SEM algorithm; see Weighting scheme.

FIMIX-PLS: see Finite mixture partial least squares.

Finite mixture partial least squares (FIMIX-PLS): a latent class approach that allows for identifying and treating unobserved heterogeneity in PLS path models. The approach applies mixture regressions to simultaneously estimate group-specific parameters and observations’ probabilities of segment membership.

First-generation techniques: statistical methods traditionally used by researchers, such as regression and analysis of variance.

Formative measurement model: a type of measurement model setup in which the indicators form the construct, and arrows point from the indicators to the construct. The outer weights estimation of formative measurement models usually uses Mode B in PLS-SEM.

Formative measurement: see Formative measurement model.

Formative measures: see Formative measurement model.

Formative–formative higher-order construct: has formatively measured lower-order components and relationships from the lower-order components to the higher-order component.

Formative–reflective higher-order construct: has formatively measured lower-order components and relationships from the higher-order component to the lower-order components.

Fornell-Larcker criterion: a measure of discriminant validity that compares the square root of each construct’s average variance extracted with its correlations with all other constructs in the model. The Fornell-Larcker criterion is largely unsuitable for detecting discriminant validity problems.

Full measurement invariance: this is confirmed when (1) configural invariance, (2) compositional invariance, and (3) equality of composite mean values and variances are demonstrated.

Full mediation: a situation in mediation analysis that occurs when the mediated effect is significant but not the direct effect. Hence, the mediator variable fully explains the relationship between an exogenous and an endogenous latent variable. Full mediation is also referred to as indirect-only mediation.

Gaussian copula approach: a method for diagnosing and treating endogeneity, which directly models the correlation of an antecedent construct with its endogenous construct’s error term.

Genetic algorithm segmentation in PLS-SEM (PLS-GAS): a distance-based segmentation method in PLS-SEM that builds on genetic algorithms, a search heuristic, which aims to find a good (not necessarily the best) solution for the classification problem.

Geweke and Meese criterion (GM): a criterion for model selection among a set of alternative models. The model with the lowest GM is preferred.

GM: see Geweke and Meese criterion.

GoF: see Goodness-of-fit index.

Goodness-of-fit index (GoF): has been developed as an overall measure of model fit for PLS-SEM. However, as the GoF cannot reliably distinguish valid from invalid models and since its applicability is limited to certain model setups, researchers should avoid its use.

Heterogeneity: occurs when the data comprise groups characterized by significant differences in terms of model parameters. Heterogeneity can be either observed or unobserved, depending on whether its source can be traced back to observable characteristics (e.g., demographic variables) or whether the sources of heterogeneity are not fully known.

Heterotrait-heteromethod correlations: the correlations of the indicators across constructs measuring different constructs.

Heterotrait-monotrait ratio (HTMT): a measure of discriminant validity. The HTMT is the mean of all correlations of indicators across constructs measuring different constructs (i.e., the heterotrait-heteromethod correlations) relative to the (geometric) mean of the average correlations of indicators measuring the same construct (i.e., the monotrait-heteromethod correlations).
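
The ratio described above can be sketched for a pair of constructs as follows, assuming an indicator correlation matrix is available and each construct has at least two indicators. The function name, the index lists, and the correlations are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

def htmt(corr, indicators_a, indicators_b):
    """HTMT of two constructs, given the positions of their indicators in `corr`."""
    def mean(values):
        return sum(values) / len(values)

    # Correlations of indicators across the two constructs
    hetero = [corr[i][j] for i, j in product(indicators_a, indicators_b)]
    # Correlations of indicators within the same construct
    mono_a = [corr[i][j] for i, j in combinations(indicators_a, 2)]
    mono_b = [corr[i][j] for i, j in combinations(indicators_b, 2)]
    # Geometric mean of the two average within-construct correlations
    return mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))

# Hypothetical correlation matrix: indicators 0-1 measure A, indicators 2-3 measure B
correlations = [
    [1.0, 0.8, 0.4, 0.5],
    [0.8, 1.0, 0.3, 0.4],
    [0.4, 0.3, 1.0, 0.6],
    [0.5, 0.4, 0.6, 1.0],
]
htmt_ab = htmt(correlations, [0, 1], [2, 3])
```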

Hierarchical component models: see Higher-order constructs.

Higher-order component: represents a more abstract dimension of a concept in a higher-order construct.

Higher-order constructs: represent a higher-order structure (usually second-order) that contains several layers of constructs and involves a higher level of abstraction. Higher-order constructs involve a more abstract higher-order component related to two or more lower-order components in a reflective or formative way.

Higher-order models: see Higher-order constructs.

Holdout sample: a subset of a larger data set or a separate data set not used in model estimation.

HTMT: see Heterotrait-monotrait ratio.

Hypothesized relationships: proposed explanations for constructs that define the path relationships in the structural model. The PLS-SEM results enable researchers to statistically test these hypotheses and thereby empirically substantiate the existence of the proposed path relationships.

Importance: a term used in the context of IPMA. It is equivalent to the unstandardized total effect of some latent variable on the target variable.

Importance–performance map analysis (IPMA): extends the standard PLS-SEM results reporting of path coefficient estimates by adding a dimension to the analysis that considers the average values of the latent variable scores. More precisely, the IPMA contrasts structural model total effects on a specific target construct with the average latent variable scores of this construct’s predecessors.

Importance–performance map: a graphical representation of the importance–performance map analysis.

Inconsistent mediation: see Competitive mediation.

Index: a set of formative indicators used to measure a construct.

Index of moderated mediation: quantifies the effect of a moderator on the indirect effect of an exogenous construct on an endogenous construct through a mediator.

Indicator reliability: the square of a standardized indicator’s outer loading. It represents how much of the variation in an item is explained by the construct and is referred to as the variance extracted from the item.

Indicators: these are directly measured observations (raw data), also referred to as either items or manifest variables, which are represented in path models as rectangles. They are also available data (e.g., responses to survey questions or collected from company databases) used in measurement models to measure the latent variables.

Indirect effect: represents a relationship between two latent variables via a third (e.g., mediator) construct in the PLS path model. If p1 is the relationship between the exogenous latent variable and the mediator variable, and p2 is the relationship between the mediator variable and the endogenous latent variable, the indirect effect is the product of path p1 and path p2.

Indirect-only mediation: a situation in mediation analysis that occurs when the indirect effect is significant but not the direct effect. Hence, the mediator variable fully explains the relationship between an exogenous and an endogenous latent variable. Indirect-only mediation is also referred to as full mediation.
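
The product rule for the indirect effect and the mediation types defined in this glossary can be sketched as a small decision rule. The function names and inputs are hypothetical, and the label returned when neither effect is significant is an assumption that follows the common typology rather than an entry shown here.

```python
def indirect_effect(p1, p2):
    """Product of the path into the mediator (p1) and the path out of it (p2)."""
    return p1 * p2

def classify_mediation(direct, indirect, direct_significant, indirect_significant):
    """Map effect signs and significance results to a mediation type."""
    if indirect_significant and direct_significant:
        same_direction = direct * indirect > 0
        return "complementary mediation" if same_direction else "competitive mediation"
    if indirect_significant:
        return "indirect-only mediation"
    if direct_significant:
        return "direct-only nonmediation"
    return "no-effect nonmediation"  # assumed label, see lead-in

# Hypothetical path coefficients and significance results
mediated = indirect_effect(0.5, 0.4)
mediation_type = classify_mediation(0.3, mediated, True, True)
```

The significance flags would come from a separate test, for example from bootstrapping.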

Individual mediating effect: a type of mediating effect in a multiple mediation model which only considers one mediator.

Initial values: the values for the relationships between the latent variables and the indicators in the first iteration of the PLS-SEM algorithm. Since the user typically has no information which indicators are more important and which indicators are less important per measurement model, an equal weight for every indicator in the PLS path model serves well for the initialization of the PLS-SEM algorithm. In accordance, all relationships in the measurement models have an initial value of +1.

Inner model: see Structural model.

In-sample predictive power: see Coefficient of determination.

Interaction effect: see Moderating effect.

Interaction term: an auxiliary variable entered into the path model to account for the interaction of the moderator variable and the exogenous construct.

Internal consistency reliability: a form of reliability used to judge the consistency of results across items on the same test. It determines whether the items measuring a construct are similar in their scores (i.e., if the correlations between the items are strong).

Interpretational confounding: a situation in which the empirical meaning of a construct departs from the theoretically implied meaning.

Interval scale: can be used to provide a rating of objects and has a constant unit of measurement so the distance between the scale points is equal.

Inverse square root method: a method for determining the minimum sample size requirement, which uses the value of the path coefficient with the minimum magnitude in the PLS path model as input.
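
A hedged sketch of the calculation, assuming the commonly cited constants for a statistical power of 80% (3.168, 2.486, and 2.123 for significance levels of 1%, 5%, and 10%). The function name and the example coefficient are hypothetical.

```python
import math

# Assumed constants for 80% power, keyed by significance level
CONSTANTS = {0.01: 3.168, 0.05: 2.486, 0.10: 2.123}

def inverse_square_root_min_n(p_min, significance_level=0.05):
    """Smallest integer n satisfying n > (constant / |p_min|)^2."""
    if p_min == 0:
        raise ValueError("the minimum path coefficient must be nonzero")
    constant = CONSTANTS[significance_level]
    return math.ceil((constant / abs(p_min)) ** 2)

# Hypothetical smallest expected path coefficient of 0.2 at the 5% level
min_n = inverse_square_root_min_n(0.2)
```

Smaller expected path coefficients raise the required sample size quickly, because the coefficient enters the formula as a squared denominator.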

IPMA: see Importance–performance map analysis.

Items: see Indicators.

Iterative reweighted regressions segmentation (PLS-IRRS): a particularly fast and high-performing distance-based segmentation method for PLS-SEM.

Joint mediating effect: a type of mediating effect in a multiple mediation model which considers the total indirect effect of an exogenous construct on an endogenous construct via all mediators.

k-fold cross-validation: a model validation technique for assessing how the results of a PLS-SEM analysis will generalize to an independent data set. The technique combines k-1 subsets into a single training sample that is used to predict the remaining subset.
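
The partitioning step can be sketched as follows. The sketch only produces training and holdout indices; model estimation and prediction are omitted, the function name is hypothetical, and observations are assigned to folds in their given order, whereas in practice they would usually be shuffled first.

```python
def k_fold_splits(n_observations, k):
    """Yield (training_indices, holdout_indices) for each of the k folds."""
    indices = list(range(n_observations))
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsets
    for held_out in range(k):
        # Combine the other k-1 subsets into a single training sample
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield sorted(training), folds[held_out]

# Hypothetical data set with 10 observations and k = 5
splits = list(k_fold_splits(10, 5))
```

Each observation appears in exactly one holdout subset, so every case is predicted once by a model that was estimated without it.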

Kurtosis: is a measure of whether the distribution is too peaked (a very narrow distribution with most of the responses in the center).

Latent class techniques: statistical methods that facilitate uncovering and treating unobserved heterogeneity. Various approaches have been proposed, which generalize, for example, finite mixture, genetic algorithm, or hill-climbing approaches to PLS-SEM.

Latent variables: elements of a structural model that are used to represent theoretical concepts in statistical models. A latent variable that only explains other latent variables (only outgoing relationships in the structural model) is called exogenous, while latent variables with at least one incoming relationship in the structural model are called endogenous. Also see Constructs.

Latent variable scores: see Construct scores.

Linear regression model (LM) benchmark: a benchmark used in PLSpredict, derived from regressing an endogenous construct’s indicators on the indicators of all exogenous constructs. The LM benchmark thereby neglects the measurement model and structural configurations. PLS-SEM results are assumed to outperform the LM benchmark.

Listwise deletion: see Casewise deletion.

Lower-order components: represent more concrete subdimensions of a concept in a higher-order construct.

MAE: see Mean absolute error (MAE).

Main effect: refers to the direct effect between an exogenous and an endogenous construct in the path model without the presence of a moderating effect. After inclusion of the moderator variable, the main effect typically changes in magnitude. Therefore, it is commonly referred to as simple effect in the context of a moderator model.

Manifest variables: see Indicators.

Maximum number of iterations: is needed to ensure that the PLS-SEM algorithm stops. The goal is to reach convergence. But if convergence cannot be reached, the algorithm should stop after a certain number of iterations. This maximum number of iterations (e.g., 300) should be sufficiently high to allow the PLS-SEM algorithm to converge based on the stop criterion. Also see Convergence.

Mean absolute error (MAE): a metric used in PLSpredict, defined as the average of the absolute differences between the predictions and the actual observations, with all the individual differences having equal weight.
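
A minimal sketch of the metric; the function name and the values are hypothetical.

```python
def mean_absolute_error(predictions, observations):
    """Average absolute difference, with every case weighted equally."""
    if len(predictions) != len(observations) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    differences = [abs(p - o) for p, o in zip(predictions, observations)]
    return sum(differences) / len(differences)

# Hypothetical predicted and observed indicator values
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```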

Mean value replacement: inserts the sample mean for the missing data. Should only be used when indicators have less than 5% missing values.
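
As a sketch, assuming missing values are coded as None, the replacement and the 5% guideline can be combined in one function. The function name and the data are hypothetical.

```python
def mean_value_replacement(values, max_missing_share=0.05):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    missing_share = (len(values) - len(observed)) / len(values)
    if missing_share >= max_missing_share:
        raise ValueError("too many missing values for mean value replacement")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical indicator with 25 responses, one of them missing (4%)
indicator = [float(i % 7 + 1) for i in range(25)]
indicator[3] = None
completed = mean_value_replacement(indicator)
```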

Measurement: the process of assigning numbers to a variable based on a set of rules.

Measurement equivalence: see Measurement invariance.

Measurement error: the difference between the true value of a variable and the value obtained by a measurement.

Measurement invariance of composite models (MICOM) procedure: a series of tests to assess invariance of measures (constructs) across multiple groups of data. The procedure comprises three steps that test different aspects of measurement invariance: (1) configural invariance (i.e., equal parameterization and way of estimation), (2) compositional invariance (i.e., similar composite scores), and (3) equality of composite mean values and variances.

Measurement invariance: refers to whether or not, under different conditions of observing and studying phenomena (e.g., across different groups of respondents), measurement operations yield measures of the same attribute.

Measurement model misspecification: describes the use of a reflective measurement model when it is formative or the use of a formative measurement model when it is reflective. Measurement model misspecification usually yields invalid results and misleading conclusions.

Measurement model: an element of a path model that contains the indicators and their relationships with the constructs and is also called the outer model in PLS-SEM.

Measurement scale: a tool with a predetermined number of closed-ended responses that can be used to obtain an answer to a question.

Measurement theory: specifies how constructs should be measured with (a set of) indicators. It determines which indicators to use for construct measurement and the directional relationship between construct and indicators.

Mediated moderation: combines a moderator model with a mediation model in that the continuous moderating effect is mediated.

Mediating effect: occurs when a third construct intervenes between two other related constructs.

Mediation: represents a situation in which one or more mediator construct(s) explain the processes through which an exogenous construct influences an endogenous construct.

Mediation model: see Mediation.

Mediator construct: a construct that intervenes between two other directly related constructs.

Metric scale: represents data on a ratio scale and interval scale; see Ratio scale, Interval scale.

Metrological uncertainty: the dispersion of the measurement values that can be attributed to the object or concept being measured.

MICOM: see Measurement invariance of composite models (MICOM) procedure.

Minimum sample size requirements: the number of observations needed to represent the underlying population and to meet the technical requirements of the multivariate analysis method used. See Inverse square root method.

Missing value treatment: can employ different methods such as mean replacement, EM (expectation-maximization algorithm), and nearest neighbor to obtain values for missing data points in the set of data used for the analysis. As an alternative, researchers may consider deleting cases with missing values (i.e., casewise deletion).

Mode A: uses correlation weights to compute composite scores from sets of indicators. More specifically, the outer weights are the correlation (or single regression) between the construct and each of its indicators. See Reflective measurement.

Mode B: uses regression weights to compute composite scores from sets of indicators. To obtain the weights, the construct is regressed on its indicators. Hence, the outer weights in Mode B are the coefficients of a multiple regression model. See Formative measurement.

Model comparisons: involve establishing and empirically comparing a set of theoretically justified competing models that represent alternative explanations of the phenomenon under research.

Model complexity: indicates how many latent variables, structural model relationships, and indicators exist in a PLS path model.

Model overfit: occurs when the model estimates fit the data set used for model estimation but do not generalize well to other data sets.

Model parsimony: see Parsimonious models.

Model-implied nonredundant vanishing tetrads: tetrads considered for significance testing in CTA-PLS.

Moderated mediation: combines a mediation model with a moderator model in that the mediator relationship is moderated by a moderator construct.

Moderating effect: see Moderation.

Moderation: occurs when the effect of an exogenous latent variable on an endogenous latent variable depends on the values of a third variable, referred to as a moderator variable, which impacts the relationship.

Moderator effect: see Moderation.

Moderator variable: see Moderation.

Monotrait-heteromethod correlations: the correlations of indicators measuring the same construct.

Multicollinearity: see Collinearity.

Multigroup analysis: a type of moderator analysis where the moderator variable is categorical (usually with two categories) and is assumed to potentially affect all relationships in the structural model; it tests whether parameters (mostly path coefficients) differ significantly between two or more groups. Research has proposed a range of approaches to multigroup analysis, which rely on the bootstrapping and permutation procedures.

Multiple mediation analysis: describes a mediation analysis in which multiple mediator variables are being included in the model.

Multiple moderator model: describes a moderation analysis in which multiple moderators are being included in the model.

Multivariate analysis: a statistical method that simultaneously analyzes multiple variables.

NCA: see Necessary condition analysis.

Necessary condition analysis: a statistical method that facilitates analyzing whether an outcome or certain level of an outcome can only be achieved if the necessary cause is in place or is at a certain level.

No-effect nonmediation: a situation in mediation analysis that occurs when neither the direct nor the indirect effect is significant.

Nominal scale: a measurement scale in which numbers are assigned that can be used to identify and classify objects (e.g., people, companies, products).

Nomological validity: the degree to which a construct behaves as it should in a system of related constructs.

Observed heterogeneity: occurs when the sources of heterogeneity are known and can be traced back to observable characteristics such as demographics (e.g., gender, age, income).

Omission distance D: determines which data points are deleted when applying the blindfolding (see Blindfolding) procedure.

One-tailed test: see Significance testing.

Ordinal scale: a measurement scale in which numbers are assigned that indicate relative positions of objects in an ordered series.

Orthogonalizing approach: an approach to model the interaction term when including a moderator variable in the model. It creates an interaction term with orthogonal indicators, which are uncorrelated with the indicators of the independent variable and the moderator variable in the moderator model.

Outer loadings: the bivariate correlations between a construct and the indicators. They determine an item’s absolute contribution to its assigned construct. Loadings are of primary interest in the evaluation of reflective measurement models but are also interpreted when formative measures are involved.

Outer model: see Measurement model.

Outer weights: these are the results of a multiple regression of a construct on its set of indicators. Weights are the primary criterion to assess each indicator’s relative importance in formative measurement models.

Outlier: an extreme response to a particular question or extreme responses to all questions.

Out-of-sample predictive power: see Predictive power.

Pairwise deletion: uses all observations with complete responses in the calculation of the model parameters. As a result, different calculations in the analysis may be based on different sample sizes, which can bias the results. The use of pairwise deletion should generally be avoided.
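
The following sketch shows why pairwise deletion leads to different sample sizes for different calculations. The helper name and the data are hypothetical, and missing values are assumed to be coded as None.

```python
def pairwise_complete(a, b):
    """Keep only the cases in which both variables are observed (None = missing)."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

# Three hypothetical indicators with missing values in different cases.
x1 = [4, None, 6, 2, 5]
x2 = [3, 5, None, 1, 4]
x3 = [2, 4, 6, None, None]

# Each pair of variables is analyzed on its own set of complete cases.
n_12 = len(pairwise_complete(x1, x2))
n_13 = len(pairwise_complete(x1, x3))
n_23 = len(pairwise_complete(x2, x3))
```

Any statistic computed per pair (e.g., a correlation) would therefore rest on a different subsample, which is the source of the potential bias.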

Parameter settings: see Algorithmic options.

Parametric approach: a type of multigroup analysis representing a modified version of a two independent samples t test.

Parsimonious models: models with as few parameters as possible for a given quality of model estimation results.

Partial least squares k-means (PLS-SEM-KM): a clustering method that maximizes group-specific latent variable score differences, while at the same time accounting for heterogeneity in the structural and measurement model relations.

Partial least squares structural equation modeling (PLS-SEM): a composite-based method to estimate structural equation models. The goal is to maximize the explained variance of the endogenous latent variables.

Partial measurement invariance: this is confirmed when only (1) configural invariance and (2) compositional invariance are demonstrated.

Partial mediation: occurs when a mediator variable partially explains the relationship between an exogenous and an endogenous construct. Partial mediation can come in the form of complementary and competitive mediation, depending on the relationship between the direct and indirect effects.

Path coefficients: estimated path relationships in the structural model (i.e., between the constructs in the model). They correspond to standardized betas in a regression analysis.

Path model: a diagram that visually displays the hypotheses and variable relationships that are examined when structural equation modeling is applied.

Path weighting scheme: uses the results of partial regression models to determine the relationships between the constructs in the structural model in the first stage of the PLS-SEM algorithm; see Weighting scheme.

Percentile method: an approach for constructing bootstrap confidence intervals. Using the ordered set of parameter estimates obtained from bootstrapping, the lower and upper bounds are directly computed by excluding a certain percentage of lowest and highest values (e.g., as determined by the 2.5% and 97.5% bounds in the case of the 95% bootstrap confidence interval). The percentile method should be preferred when constructing confidence intervals.
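
A minimal sketch of the percentile method is given below. For illustration, it bootstraps a sample mean rather than a PLS path model parameter. The function names, the index rounding rule, and the data are assumptions.

```python
import random
from statistics import mean

def percentile_ci(estimates, level=0.95):
    """Percentile interval: drop the lowest and highest (1 - level) / 2 share of estimates."""
    ordered = sorted(estimates)
    tail = (1 - level) / 2
    lower_idx = round(tail * len(ordered))
    upper_idx = round((1 - tail) * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]

def bootstrap_means(sample, n_boot=1000, seed=7):
    """Re-estimate the mean on n_boot samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical observations standing in for a parameter of interest.
sample = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31, 0.25, 0.38]
lower, upper = percentile_ci(bootstrap_means(sample))
```

With 1,000 bootstrap estimates and a 95% level, the 25 lowest and the 25 highest estimates are excluded, and the remaining extremes form the interval bounds.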

Performance: a term used in the context of IPMA. It is the mean value of the unstandardized (and rescaled) scores of a latent variable or an indicator.

Permutation test: a type of multigroup analysis. The test randomly permutes observations between the groups and re-estimates the model to derive a test statistic for the group differences.
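
The logic of the permutation test can be sketched as follows. As a simplifying assumption, the group-specific statistic is a plain mean instead of a re-estimated path coefficient. The function name, the number of permutations, and the data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, statistic=mean, n_perm=2000, seed=1):
    """Two-sided p value for the group difference in `statistic`.

    Observations are randomly reassigned to the groups; the share of
    permutations with a difference at least as large as the observed one
    is returned.
    """
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_perm

# Two hypothetical groups with clearly different levels.
group_1 = [1, 2, 3, 4, 5, 6]
group_2 = [101, 102, 103, 104, 105, 106]
p_value = permutation_p_value(group_1, group_2)
```

In an actual multigroup analysis, the statistic would be obtained by re-estimating the PLS path model in both permuted groups.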

PLS path modeling: see Partial least squares structural equation modeling.

PLS regression: an analysis technique that explores the linear relationships between multiple independent variables and a single or multiple dependent variable(s). In developing the regression model, it constructs composites from both the multiple independent variables and the dependent variable(s) by means of principal component analysis.

PLS typological path modeling (PLS-TPM): a distance-based segmentation method developed for PLS-SEM.

PLSe2: a variant of the original PLS-SEM algorithm. Similar to PLSc, it makes the model estimates consistent in a common factor model sense.

PLS-GAS: see Genetic algorithm segmentation in PLS-SEM.

PLS-IRRS: see Iterative reweighted regressions segmentation method.

PLS-MGA: a bootstrap-based multigroup analysis technique.

PLS-POS: see Prediction-oriented segmentation in PLS-SEM.

PLSpredict procedure: a holdout-sample-based procedure that generates case-level predictions on an item or a construct level to facilitate the assessment of a PLS path model’s predictive power. The PLSpredict procedure relies on the concept of k-fold cross-validation.
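
The k-fold cross-validation idea on which PLSpredict relies can be sketched as a partition of the case indices. This is only the data-splitting step under assumed names and defaults. Model estimation on each training sample and prediction of each holdout sample are not shown.

```python
import random

def k_fold_indices(n_cases, k=10, seed=42):
    """Split case indices into k folds; each fold serves once as the holdout sample."""
    rng = random.Random(seed)
    order = list(range(n_cases))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    splits = []
    for i, holdout in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, holdout))
    return splits

# Hypothetical data set with 20 cases and k = 10 folds.
splits = k_fold_indices(n_cases=20, k=10)
```

Every case ends up in exactly one holdout sample, so a case-level prediction is available for the full data set after all k rounds.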

PLS-SEM: see Partial least squares structural equation modeling.

PLS-SEM algorithm: the heart of the method. Based on the PLS path model and the indicator data available, the algorithm estimates the scores of all latent variables in the model, which in turn serve for estimating all path model relationships.

PLS-SEM bias: refers to PLS-SEM’s property that structural model relationships are slightly underestimated and relationships in the measurement models are slightly overestimated compared to CB-SEM when using the method on common factor model data. This difference can be attributed to the methods’ different handling of the latent variables in the model estimation but is negligible in most settings typically encountered in empirical research.

PLSc-SEM: see Consistent PLS-SEM.

PLS-SEM-KM: see Partial least squares k-means.

PLS-TPM: see PLS typological path modeling.

Prediction: see Predictive power.

Prediction error: the difference between a variable’s predicted and original value.

Prediction statistics: quantify the degree of prediction error.

Prediction-oriented segmentation in PLS-SEM (PLS-POS): a distance-based segmentation method for PLS-SEM.

Predictive power: indicates a model’s ability to predict new observations.

Product indicator approach: an approach to model the interaction term when including a moderator variable in the model. It involves multiplying the indicators of the moderator with the indicators of the exogenous latent variable to establish a measurement model of the interaction term. The approach is only applicable when both moderator and exogenous latent variables are measured reflectively.

Product indicators: indicators of an interaction term, generated by multiplication of each indicator of the exogenous construct with each indicator of the moderator variable. See Product indicator approach.
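
A short sketch of how product indicators are generated is shown below. The indicator names and values are hypothetical. Any standardization or centering of the indicators before multiplication is left out.

```python
def product_indicators(exo_indicators, mod_indicators):
    """Multiply each exogenous indicator with each moderator indicator, case by case.

    Both arguments map an indicator name to its list of case values.
    """
    products = {}
    for x_name, x_vals in exo_indicators.items():
        for m_name, m_vals in mod_indicators.items():
            products[f"{x_name}*{m_name}"] = [x * m for x, m in zip(x_vals, m_vals)]
    return products

# Hypothetical constructs: two exogenous and three moderator indicators.
exo = {"x1": [1, 2, 3], "x2": [2, 2, 1]}
mod = {"m1": [1, 0, 2], "m2": [3, 1, 1], "m3": [0, 1, 2]}
interaction = product_indicators(exo, mod)
```

With two indicators of the exogenous construct and three of the moderator, the interaction term receives 2 × 3 = 6 product indicators.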

p value: in the context of structural model assessment, it is the probability of error for assuming that a path coefficient is significantly different from zero. In applications, researchers compare the p value of a coefficient with a significance level selected prior to the analysis to decide whether the path coefficient is statistically significant.

Q²predict: a metric used in PLSpredict to assess the model’s predictive power. The metric compares the PLS-SEM predictions against a naïve benchmark. Values greater than zero indicate that the PLS-SEM estimation beats the naïve benchmark in terms of prediction.
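
The comparison against the naïve benchmark can be sketched as follows. The sketch assumes the common formulation in which the benchmark predicts every holdout case with the indicator mean of the training sample. Function and variable names are hypothetical.

```python
def q2_predict(actual, predicted, training_mean):
    """1 minus the ratio of the model's squared prediction errors to those of the naive mean."""
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_naive = sum((a - training_mean) ** 2 for a in actual)
    return 1 - sse_model / sse_naive

# Hypothetical holdout values, model predictions, and training-sample mean.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
q2 = q2_predict(actual, predicted, training_mean=2.0)
```

A value above zero means that the model's prediction errors are smaller than those of the naïve mean prediction. A value of zero means that the model does no better than the benchmark.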

Q² statistic: a measure for evaluating structural models. The computation of Q² draws on the blindfolding technique, which uses a subset of the available data to estimate model parameters and then predicts the omitted data. Q² examines whether a model accurately predicts data points not used in the estimation of model parameters. As the measure blends in-sample and out-of-sample predictive power assessment, we advise against its use.

R² value: see Coefficient of determination (R²).

Ratio scale: a measurement scale that has a constant unit of measurement and an absolute zero point; a ratio can be calculated using the scale points.

Raw data: the unstandardized observations in the data matrix that is used for the PLS path model estimation.

REBUS-PLS: see Response-based procedure for detecting unit segments in PLS path modeling.

Redundancy analysis: a method used to assess a formative construct’s convergent validity. It tests whether a formatively measured construct is highly correlated with a reflective or single-item measure of the same construct.

Reflective measure: see Reflective measurement.

Reflective measurement: a type of measurement model setup in which measures represent the effects (or manifestations) of an underlying construct. Causality is from the construct to its measures (indicators). The outer loadings estimation of reflective measurement models usually uses Mode A in PLS-SEM.

Reflective–formative higher-order construct: has reflectively measured lower-order components and relationships from the lower-order components to the higher-order component.

Reflective measurement model: see Reflective measurement.

Reflective–reflective higher-order construct: has reflectively measured lower-order components and relationships from the higher-order component to the lower-order component.

Regression weights: see Mode B.

Relative contribution: the unique importance of each indicator by partializing the variance of the formatively measured construct that is predicted by the other indicators. An item’s relative contribution is provided by its weight.

Relevance of significant relationships: compares the relative importance of predictor constructs to explain endogenous latent constructs in the structural model. Significance is a prerequisite for the relevance, but not all constructs and their significant path coefficients are highly relevant to explain a selected target construct.

Reliability coefficient rA: a measure of internal consistency reliability.

Reliability: the consistency of a measure. A measure is reliable (in the sense of test-retest reliability) when it produces consistent outcomes under consistent conditions. The most commonly used measure of reliability is the internal consistency reliability.

Repeated indicators approach: a type of measurement model setup in higher-order constructs that reuses the indicators of the lower-order components as indicators of the higher-order component to identify the higher-order construct.

Rescaling: the act of changing the values of a variable’s scale to fit a predefined range (e.g., 0 to 100).
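
As a small worked example, rescaling a value to the range from 0 to 100 can be written as a linear transformation based on the scale's minimum and maximum. The function name and the 7-point scale are illustrative assumptions.

```python
def rescale(value, scale_min, scale_max, new_min=0.0, new_max=100.0):
    """Linearly map a value from [scale_min, scale_max] to [new_min, new_max]."""
    share = (value - scale_min) / (scale_max - scale_min)
    return new_min + share * (new_max - new_min)

# A response of 4 on a hypothetical 7-point scale (1 to 7).
midpoint = rescale(4, 1, 7)
```

The scale minimum maps to 0, the scale maximum to 100, and the midpoint of a 7-point scale to 50.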

Response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS): a distance-based segmentation method for PLS-SEM that builds on the PLS-TPM method.

Response-based segmentation techniques: see Latent class techniques.

RMSE: see Root mean square error (RMSE).

RMStheta: see Root mean square residual covariance.

Root mean square error (RMSE): a metric used in PLSpredict, defined as the square root of the average of the squared differences between the predictions and the actual observations.
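
The definition translates directly into a few lines of code. The function name and the values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Square root of the average squared difference between predictions and observations."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical observed values and predictions for one indicator.
actual = [3.0, 5.0, 4.0, 6.0]
predicted = [2.5, 5.5, 4.0, 7.0]
error = rmse(actual, predicted)
```

Because the errors are squared before averaging, large prediction errors weigh more heavily than small ones.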

Root mean square residual covariance (RMStheta): a model fit measure that is based on the (root mean square) discrepancy between the observed covariance and the model-implied correlations. In CB-SEM, an SRMR value indicates good fit, but no threshold value has been introduced in a PLS-SEM context yet. Initial simulation results suggest a (conservative) threshold value for the root mean square residual covariance (RMStheta) of 0.12. That is, RMStheta values below 0.12 indicate a well-fitting model, whereas higher values indicate a lack of fit. However, model fit measures should generally be treated with extreme caution in PLS-SEM.

Scale: a set of reflective indicators used to measure a construct.

Secondary data: data that have already been gathered, often for a different research purpose and some time ago.

Second-generation techniques: overcome the limitations of first-generation techniques, for example, in terms of accounting for measurement error. SEM is the most prominent second-generation data analysis technique.

Second-order constructs: are a type of higher-order construct with two levels of abstraction.

SEM: see Structural equation modeling.

Serial mediating effect: a type of mediating effect in a multiple mediation model which considers a sequence of effects via two or more mediators simultaneously.

Significance testing: the process of testing whether a certain result likely has occurred by chance (i.e., whether an effect can be assumed to truly exist in the population).

Simple effect: a cause–effect relationship in a moderator model. The parameter estimate represents the size of the relationship between the exogenous and endogenous latent variable when the moderator variable is included in the model. For this reason, the main effect and the simple effect usually have different sizes.

Single mediation analysis: describes a mediation analysis in which only one mediator variable is being included in the model.

Single-item constructs: constructs that have only a single item measuring them. Since a single-item construct is equal to its measure, the indicator loading is 1.00, making conventional reliability and convergent validity assessments inappropriate.

Singular data matrix: occurs when a variable in a measurement model is a linear combination of another variable in the same measurement model or when a variable has identical values for all observations. In this case, the variable has no variance and the PLS-SEM approach cannot estimate the PLS path model.

Skewness: the extent to which a variable’s distribution is symmetrical around its mean value.

Slope plot: a type of line chart used to detect changes in linear slopes between groups.

Sobel test: a test that has been proposed to assess the significance of the indirect effect in a mediation model. However, research has dismissed the Sobel test for evaluating mediation analysis results of regression models and PLS-SEM.

Specific indirect effect: describes an indirect effect via one single mediator in a multiple mediation model.

SRMR: see Standardized root mean square residual.

Standard error: the standard deviation of the sampling distribution of a given statistic. Standard errors are important to show how much sampling fluctuation a statistic has.

Standardized data: have a mean value of 0 and a standard deviation of 1 (z-standardization). The PLS-SEM method usually uses standardized raw data. Most software tools automatically standardize the raw data when running the PLS-SEM algorithm.
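
A minimal sketch of z-standardization follows. It assumes the sample standard deviation (n − 1 in the denominator). Software tools may use a different convention. The names and data are hypothetical.

```python
from statistics import mean, stdev

def z_standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw responses on a 7-point scale.
raw = [2, 4, 4, 5, 7]
z = z_standardize(raw)
```

Each resulting value indicates how many standard deviations the observation lies above or below the mean.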

Standardized root mean square residual (SRMR): a model fit measure, which is defined as the root mean square discrepancy between the observed correlations and the model-implied correlations. Research has shown that the SRMR is largely unsuitable for detecting model misspecification in situations commonly encountered in applied research.

Standardized values: indicate how many standard deviations an observation is above or below the mean.

Statistical power: the probability of detecting a significant relationship when the relationship in fact exists in the population.

Stop criterion: see Convergence.

Straight lining: describes a situation in which a respondent marks the same response for a high proportion of the questions.
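
One simple way to screen for this pattern is to compute the share of a respondent's answers that fall on the single most frequent response. The function name and the data are hypothetical, and what counts as a "high proportion" remains a judgment call.

```python
from collections import Counter

def straight_lining_share(responses):
    """Share of answers that equal the respondent's most frequent response."""
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses)

# Hypothetical respondent who marks 4 on four of five questions.
share = straight_lining_share([4, 4, 4, 4, 2])
```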

Structural equation modeling (SEM): a set of statistical methods used to estimate relationships between constructs and indicators, while accounting for measurement error.

Structural model: includes the constructs and their relationships as derived from theory and logic.

Structural theory: specifies how the latent variables are related to each other. That is, it shows the constructs and the paths between them.

Studentized bootstrap method: computes confidence intervals similarly to a confidence interval based on the t distribution, except that the standard error is derived from the bootstrapping results.

Sum scores: represent a naive way to determine the latent variable scores. Instead of estimating the relationships in the measurement models, sum scores use the same weight for each indicator per measurement model (equal weights) to determine the latent variable scores. As such, the sum scores approach does not account for measurement error.
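
The equal-weights idea can be illustrated in a few lines. The function name and the data are hypothetical.

```python
def sum_scores(indicator_columns):
    """Equal-weight composite: add up the indicator values of each case."""
    return [sum(case) for case in zip(*indicator_columns)]

# Three hypothetical indicators of one construct, observed for three cases.
x1 = [5, 3, 4]
x2 = [4, 2, 5]
x3 = [5, 1, 4]
scores = sum_scores([x1, x2, x3])
```

Every indicator enters the score with a weight of 1, regardless of how strongly it relates to the construct.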

Suppressor variable: describes the mediator variable in competitive mediation, which absorbs a significant share of or the entire direct effect, thereby substantially decreasing the magnitude of the total effect.

Tetrad: the difference of the product of a pair of covariances and the product of another pair of covariances. In reflective measurement models, this difference is assumed to be zero or at least close to zero (i.e., the tetrads are expected to vanish). Nonvanishing tetrads in a latent variable’s measurement model cast doubt on its reflective specification, suggesting a formative specification.
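
The following sketch computes one tetrad for four indicators. The pairing of covariances used here (1–2 with 3–4 versus 1–3 with 2–4) is one of several possible tetrads, and the noise-free one-factor data are a hypothetical illustration.

```python
from statistics import mean

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def tetrad(x1, x2, x3, x4):
    """tau_1234 = cov(x1, x2) * cov(x3, x4) - cov(x1, x3) * cov(x2, x4)."""
    return cov(x1, x2) * cov(x3, x4) - cov(x1, x3) * cov(x2, x4)

# Hypothetical indicators that are exact multiples of a single common factor.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
loadings = [1.0, 2.0, 3.0, 0.5]
x1, x2, x3, x4 = ([l * f for f in factor] for l in loadings)
tau = tetrad(x1, x2, x3, x4)
```

Because all four indicators reflect the same factor, both covariance products are equal and the tetrad vanishes. With empirical data, sampling error makes a statistical test necessary.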

Theoretical t value: see Critical t value.

Theory: a set of systematically related hypotheses developed following the scientific method that can be used to explain and predict outcomes and can be tested empirically.

Three-way interaction: an extension of two-way interaction where the moderator effect is again moderated by another moderator variable.

TOL: see Variance inflation factor.

Tolerance (TOL): see Variance inflation factor.

Total effect: the sum of the direct effect and the indirect effect between an exogenous and an endogenous latent variable in the path model.

Total indirect effect: the sum of all specific indirect effects in a multiple mediation model.
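
A small worked example shows how specific indirect effects, the total indirect effect, and the total effect relate to each other. The sketch assumes two parallel mediators (no serial paths) and hypothetical path coefficients.

```python
def specific_indirect_effect(path_to_mediator, path_from_mediator):
    """Product of the two path coefficients that run via one mediator."""
    return path_to_mediator * path_from_mediator

def total_indirect_effect(mediator_paths):
    """Sum of all specific indirect effects."""
    return sum(specific_indirect_effect(a, b) for a, b in mediator_paths)

def total_effect(direct_effect, mediator_paths):
    """Direct effect plus total indirect effect."""
    return direct_effect + total_indirect_effect(mediator_paths)

# Hypothetical path coefficients: (exogenous -> mediator, mediator -> endogenous).
mediator_paths = [(0.5, 0.4), (0.3, 0.5)]
direct = 0.2

indirect = total_indirect_effect(mediator_paths)   # 0.5 * 0.4 + 0.3 * 0.5
total = total_effect(direct, mediator_paths)       # direct + indirect
```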

Training sample: a subset of a larger data set used for model estimation.

Two-stage approach (higher-order constructs): an approach to modeling and estimating higher-order constructs in PLS-SEM, which is particularly useful when a reflective-formative or formative-formative higher-order construct serves as an endogenous construct in the PLS path model.

Two-stage approach (moderation analysis): an approach to model the interaction term when including a moderator variable in the model. The approach can also be used when the exogenous construct and/or the moderator variable are measured formatively.

Two-tailed test: see Significance testing.

Two-way interaction: the standard approach to moderator analysis where the moderator variable interacts with one other exogenous latent variable.

Unobserved heterogeneity: occurs when the sources of heterogeneous data structures are not (fully) known.

Validity: the extent to which a construct’s indicators jointly measure what they are supposed to measure.

Vanishing tetrads: see Tetrad.

Variance inflation factor (VIF): quantifies the severity of collinearity among the indicators in a formative measurement model. The VIF of an indicator i of a certain measurement model is directly related to the tolerance value (VIF_i = 1/tolerance_i).
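
The relationship between the VIF and the tolerance can be sketched numerically. The sketch assumes the usual definition of the tolerance as 1 minus the R² obtained when regressing indicator i on all other indicators of the same measurement model. The function names and the R² values are hypothetical.

```python
def tolerance(r_squared):
    """Share of an indicator's variance not explained by the other indicators."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1 / tolerance(r_squared)

# Hypothetical R² values from regressing one indicator on the others.
vif_uncorrelated = vif(0.0)   # no shared variance
vif_collinear = vif(0.8)      # 80% of the variance explained by other indicators
```

An indicator that is unrelated to the other indicators has a VIF of 1, and the VIF grows as the tolerance approaches zero.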

Variance-based SEM: see Partial least squares structural equation modeling.

Variate: see Composite variable.

VIF: see Variance inflation factor.

Weighted PLS-SEM (WPLS): a modified version of the original PLS-SEM algorithm that allows the researcher to incorporate sampling weights.

Weighting scheme: describes a particular method to determine the relationships in the structural model when running the PLS-SEM algorithm. Standard options are the centroid, factor, and path weighting schemes. The final results do not differ much, and one should use the path weighting scheme as a default option since it maximizes the R² values of the PLS path model estimation.

WPLS: see Weighted PLS-SEM.
