A Latent Class Model for Multivariate Binary Data Subject to Missingness

When researchers are interested in measuring social phenomena that cannot be measured using a single variable, an appropriate statistical tool is a latent variable model, in which a number of manifest variables are used to define the latent phenomenon. The manifest variables may be incomplete due to different forms of non-response that may or may not be random. In such cases, especially when the missingness is nonignorable, it is necessary to include a missingness mechanism in the model to obtain valid parameter estimates. In social surveys, categorical items can be considered the most common type of variable. We thus propose a latent class model in which two categorical latent variables are defined: one represents the latent phenomenon of interest, and the other represents a respondent's propensity to respond to survey items. All manifest items are considered to be categorical. The proposed model incorporates a missingness mechanism that accounts for forms of missingness that may not be random by allowing the latent response propensity class to depend on the latent phenomenon under consideration, given a set of covariates. The Expectation-Maximization (EM) algorithm is used to estimate the proposed model. The proposed model is used to analyze data from the 2014 Egyptian Demographic and Health Survey (EDHS14). Missingness is artificially created in order to study the results under the three types of missingness: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).


I. INTRODUCTION
In many social science applications, the main interest is in measuring constructs or concepts, such as behavior or abilities, that cannot be measured using a single observable variable. These are known as latent (unobserved) variables and can be measured through a set of manifest (observed) variables using what are known in statistics as latent variable models. Observed variables can be of any type, and latent variables can be assumed to be either categorical or continuous, depending on the nature of the problem. This results in different classifications of latent variable models. Verbeke and Molenberghs [1] have given a general overview of latent variable models and their inference. Network models provide an alternative to latent variable models [2].
Categorical latent variables are usually assumed when there is a reason to believe that a particular phenomenon is inherently categorical, justified by prior evidence or theory which leads to latent categories, or when it would be practically useful to have such categories; for example, to organize respondents into a number of relevant subgroups [3].
When the observed items are categorical, latent class analysis has been adopted by many authors to achieve this objective. Previous studies in behavioral and psychometric fields have also employed different versions of this type of model [4]-[10]. Covariates and direct effects within latent class models are discussed in Janssen et al. [11] and Bakk and Kuha [12]. Bakk and Kuha [13] have fitted a latent class model with structural regression models for the relationships between the latent classes and observed manifest variables and covariates.
Item non-response is a common type of missing data, especially in surveys, in which a respondent may answer some of the questions but not others, leading to different patterns of incomplete data. Despite efforts to reduce non-response, such as probing "Don't Know" answers [14], almost all surveys still suffer from missing data due to non-response. Little and Rubin [15] have classified missing data into three types: MCAR, where missingness depends neither on the observed data nor on the missing data; MAR, where missingness is independent of the missing data conditional on the observed data; and MNAR, where missingness may depend on the missing values, and possibly on the observed data too. When data is MCAR, the observed data may be considered a random subset of the full data. If data is MAR, it can still be considered a random subset within groups defined by specific values of the observed data. In such cases, the missingness is labeled "ignorable". When data is MNAR, the missingness is related to the unobserved value itself, reflecting a systematic difference between respondents and nonrespondents. Hence, this type of missingness is said to be "nonignorable" or "informative". Incorporating a missingness mechanism is crucial in this case to avoid bias in the estimation of model parameters.
There are various approaches to incorporating a nonignorable missingness mechanism, mostly developed for longitudinal data. These include the selection approach and the pattern-mixture approach. Selection models factorize the joint distribution of the responses and the missingness indicators into the marginal distribution of the full data multiplied by the conditional distribution of the missingness indicators given the full data. The pattern-mixture approach, on the other hand, specifies the marginal distribution of the missingness patterns and the distribution of the full data conditional on the missingness pattern. Du et al. [16] have proposed imputing MNAR data using a latent variable approach fitted within a Bayesian framework.
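Writing $\mathbf{y}$ for the full data and $\mathbf{r}$ for the vector of missingness indicators (notation chosen here for illustration), the two standard factorizations are:

$$f(\mathbf{y}, \mathbf{r}) = f(\mathbf{y})\, f(\mathbf{r} \mid \mathbf{y}) \quad \text{(selection model)}, \qquad f(\mathbf{y}, \mathbf{r}) = f(\mathbf{r})\, f(\mathbf{y} \mid \mathbf{r}) \quad \text{(pattern-mixture model)}.$$

Under MAR, $f(\mathbf{r} \mid \mathbf{y})$ depends on $\mathbf{y}$ only through its observed part, which is why, when the parameters of the two factors are distinct, the missingness mechanism can be ignored in likelihood-based inference.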
For multiple observed variables, latent response propensities have been employed by many authors to account for missingness in data; Rose et al. [17] and Cursio et al. [18] are among the most recent publications that use this concept. The main idea is to create, for each manifest item measuring the latent variable of interest, a binary indicator that shows whether the item is observed or missing for each subject. The number of these binary indicators is thus the same as the number of manifest items. A latent variable for the phenomenon of interest is measured by a number of binary manifest variables that may include some missing values. Another latent variable, labeled response propensity, is measured by the set of binary indicators showing whether a value is observed or missing for each of the manifest variables. In these works, both latent variables are assumed to be continuous. Models where the response propensity is a categorical latent variable have received less attention. Jung [19] has used a categorical response propensity variable together with the joint distribution of the observed items. Harel and Schafer [20] have proposed latent class models that deal with partially ignorable missingness by including binary missingness indicators as additional items. A possible criticism of this specification is that the latent class variable summarizes both the main observed items and the response propensity, possibly changing the meaning of the latent variable itself. Kuha et al. [21] have proposed models for survey data with non-response, assuming the main latent variable to be continuous and the response propensity latent variable to be categorical. Bacci and Bartolucci [22] have defined similar models assuming both latent variables to be categorical; their model also assumes that the two latent variables are independent conditional on a set of covariates.
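As a minimal sketch of this construction, assuming (hypothetically) that a missing item value is coded as `None`, the response indicators can be derived as follows; the function name `response_indicators` is illustrative only.

```python
from typing import List, Optional


def response_indicators(items: List[List[Optional[int]]]) -> List[List[int]]:
    """Return r_ij = 1 if item j is observed for respondent i and 0 if it is missing."""
    return [[0 if value is None else 1 for value in row] for row in items]


# Hypothetical data: three respondents, four binary items, None marks a missing answer.
data = [
    [1, 0, None, 1],
    [None, None, 0, 0],
    [1, 1, 1, 1],
]
r = response_indicators(data)
print(r)  # [[1, 1, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1]]
```

Each row of `r` has one entry per manifest item, so the indicators form a second set of binary items of the same size as the first.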
The non-response model may depend on one or both latent variables. Sterba [23] has presented a shared-parameter latent transition analysis, assuming categorical latent variables, for longitudinal data. This article considers models where the latent variable of interest may affect the probability of non-response through another latent variable that summarizes response propensity. The response propensity, which determines the non-response probabilities, is affected by the main latent variable in the structural part of the model. The non-response is nonignorable if the response propensity is associated with the main latent variable and ignorable otherwise. We propose a latent class model for binary manifest variables subject to non-response, assuming categorical latent variables both for the main latent variable and for the one measuring response propensity. Unlike Bacci and Bartolucci [22], the two latent variables are related, allowing membership of the main latent class to affect the probability of response. This means that the missingness mechanism may be nonignorable. In addition, the model allows both latent variables to be affected by covariates. To illustrate the proposed model, data from the EDHS14 are analyzed [24]. Missingness is artificially created under three scenarios: MCAR, MAR, and MNAR. The aim is to study how the results of the model change according to the type of missingness.
Section (II) of this article presents the specification of the proposed latent class model (LCM), the estimation process for the LCM parameters, and the methods from the literature for model selection and fit evaluation. Discussion and results of the proposed model using a real data set appear in Section (III). Concluding remarks appear at the end of the article in Section (IV).

A. Research Methodology
The research proceeds through the following stages. In the initial stage, the research problem is formulated. This involves an attempt to measure an unobserved phenomenon of interest using a number of observed binary variables (items). In many cases, the observed items will have some missing values. The first step is to select the most appropriate items for measuring the latent variable, using the selection criteria given in Subsection D below. The implementation stage then starts by creating a missingness pseudo-item corresponding to each of the selected items to indicate whether a value is missing or not. The proposed latent class model, outlined in the next subsection, is then ready to be implemented. It assumes two categorical latent variables, one summarizing the main phenomenon of interest and the other summarizing response propensity. One of the main contributions of this model is that it allows the two latent variables to be dependent, thus allowing missingness to be not at random. The model is then estimated before moving to the final stages of the research, where the model is evaluated and models with different numbers of classes are compared. The model with the best combination of fit and interpretability is then selected, and its results are interpreted. In our study, we create artificial missingness to evaluate the model's performance under different types of missingness. Fig. 1 is a flowchart representation of the outlined research methodology.

B. Model Specification
Finite mixture models are models in which categorical latent variables consist of classes, with class membership inferred from the data. A special case is latent class analysis, where the latent classes explain relationships among the manifest items. The model proposed here considers the case where all manifest variables are binary. The latent variable used to summarize the manifest variables is also assumed to be categorical. A missingness mechanism to account for item non-response is incorporated. This involves another categorical latent variable that measures a respondent's propensity to respond, based on binary indicators of whether the respondent has answered each observed item. Allowing the latent variable of interest to affect response propensity makes the missingness possibly nonrandom. It is thus assumed that an individual's probability of responding to items depends on their response propensity, observed covariates, and possibly their class membership of the main latent variable. Fig. 2 is a path diagram representation of the proposed model. The main latent variable is denoted by $z_1$, while that representing response propensity is denoted by $z_2$. There are $J$ binary manifest variables, each denoted by $y_j$ ($j = 1, \dots, J$). The vector $\mathbf{x}$ represents a number of observed covariates that may influence the main latent variable, while another vector of covariates $\mathbf{w}$, which may be the same as or different from $\mathbf{x}$, influences the response propensity latent variable. An indicator variable $r_j$ takes the value 1 if the manifest variable $y_j$ is observed and 0 if it is missing.
An LCM consists of two main parts, known as the measurement and structural parts. In our proposed model, a third part is added to incorporate the missingness mechanism.

1) Measurement model: The measurement part of an LCM is a multivariate regression model where the dependent variables are the manifest variables and the categorical latent variables are the independent variables. When manifest variables are binary, logistic regression equations are used to model the relationships between the observed and latent variables.
Let $\mathbf{y} = (y_1, \dots, y_J)'$ denote a vector of $J$ binary manifest variables and $\pi_j(z_1)$ be the probability of a positive response on variable $y_j$ for an individual in category $z_1 = 1, 2, \dots, K$ of the latent variable. The latent classes for the latent variable of interest are mutually exclusive and exhaustive.
Each respondent belongs to one and only one latent class. Each observed binary variable follows a Bernoulli distribution. The distribution of a response to variable $y_j$ can thus be written as

$$P(y_j \mid z_1) = \pi_j(z_1)^{y_j}\,\bigl[1 - \pi_j(z_1)\bigr]^{1 - y_j}, \qquad (1)$$

where $\pi_j(z_1) = P(y_j = 1 \mid z_1)$.
2) Missingness mechanism: To include the missingness mechanism within the proposed model, a random indicator variable for the missingness is defined for each observed item.
Let $\mathbf{r} = (r_1, \dots, r_J)'$ denote a vector of indicator variables and $\pi^{(r)}_j(z_2)$ be the probability that variable $y_j$ is observed ($r_j = 1$) for a respondent, given their membership of the latent class categories $z_2 = 1, 2, \dots, M$. The latent classes for the latent variable of response propensity are also mutually exclusive and exhaustive. Each of the response/non-response indicator variables follows a Bernoulli distribution. The distribution of $r_j$, which equals 1 when $y_j$ is not missing, can thus be modeled as

$$P(r_j \mid z_2) = \pi^{(r)}_j(z_2)^{r_j}\,\bigl[1 - \pi^{(r)}_j(z_2)\bigr]^{1 - r_j}, \qquad (3)$$

where $\pi^{(r)}_j(z_2) = P(r_j = 1 \mid z_2)$.

3) Structural model: Relationships among latent variables, and with covariates if they exist, are outlined within the structural part of the latent variable model. Both latent variables $z_1$ and $z_2$ are assumed to be binary, each having a Bernoulli distribution. Logistic regression equations are used to model these relationships. According to the model specification in Fig. 2, the structural model is given by

$$\operatorname{logit} \pi_{z_1}(\mathbf{x}) = \alpha_1 + \sum_{p=1}^{P} \beta_p x_p, \qquad (4)$$

where $\pi_{z_1}(\mathbf{x}) = P(z_1 = 1 \mid \mathbf{x})$ is the probability of being a member of the first class of the latent variable $z_1$ given a set of $P$ observed covariates affecting $z_1$, and

$$\operatorname{logit} \pi_{z_2}(z_1, \mathbf{w}) = \alpha_2 + \phi z_1 + \sum_{q=1}^{Q} \gamma_q w_q, \qquad (5)$$

where $\pi_{z_2}(z_1, \mathbf{w}) = P(z_2 = 1 \mid z_1, \mathbf{w})$ is the probability that an individual is a member of the first class of the response propensity latent variable $z_2$, given their class membership of the main latent variable $z_1$ and a set of $Q$ observed covariates affecting $z_2$. If the regression coefficient $\phi$ is significant, this can reflect that the non-response is nonrandom, since the probability of non-response will be associated with certain levels of the main latent variable, and hence including a missingness mechanism is crucial.
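To make the structural equations (4) and (5) concrete, the following Python sketch computes the joint class probabilities $h(z_1, z_2 \mid \mathbf{x}, \mathbf{w}) = h(z_2 \mid z_1, \mathbf{w})\, h(z_1 \mid \mathbf{x})$ for binary latent variables. The dummy coding of $z_1$ inside (5) (1 for the first class, 0 for the second), the function names, and all parameter values are assumptions made for illustration. Comparing $\phi = 1.2$ with $\phi = 0$ shows how a nonzero $\phi$ makes response propensity depend on the main latent class.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def joint_class_probs(x, w, alpha1, beta, alpha2, phi, gamma):
    """Return {(z1, z2): h(z1, z2 | x, w)} for binary z1 and z2.

    Eq. (4): logit P(z1 = 1 | x) = alpha1 + sum_p beta_p x_p
    Eq. (5): logit P(z2 = 1 | z1, w) = alpha2 + phi * d(z1) + sum_q gamma_q w_q,
    where d(z1) is an assumed dummy coding: 1 for class 1 and 0 for class 2.
    """
    p1 = inv_logit(alpha1 + sum(b * xp for b, xp in zip(beta, x)))
    h = {}
    for z1, p_z1 in ((1, p1), (2, 1.0 - p1)):
        d = 1.0 if z1 == 1 else 0.0
        p_z2 = inv_logit(alpha2 + phi * d + sum(g * wq for g, wq in zip(gamma, w)))
        h[(z1, 1)] = p_z1 * p_z2          # h(z2 = 1 | z1, w) h(z1 | x)
        h[(z1, 2)] = p_z1 * (1.0 - p_z2)  # h(z2 = 2 | z1, w) h(z1 | x)
    return h


def p_z2_given_z1(h, z1):
    """P(z2 = 1 | z1, w) recovered from the joint probabilities."""
    return h[(z1, 1)] / (h[(z1, 1)] + h[(z1, 2)])


# Hypothetical covariate and parameter values, for illustration only.
args = dict(x=[1.0, 0.0], w=[0.5], alpha1=0.2, beta=[0.8, -0.3], alpha2=-0.5, gamma=[0.4])
h_dep = joint_class_probs(phi=1.2, **args)  # z2 depends on z1
h_ind = joint_class_probs(phi=0.0, **args)  # phi = 0: z2 unrelated to z1 given w
print(round(p_z2_given_z1(h_dep, 1), 3), round(p_z2_given_z1(h_dep, 2), 3))
print(round(p_z2_given_z1(h_ind, 1), 3), round(p_z2_given_z1(h_ind, 2), 3))
```

With $\phi = 0$ the two conditional probabilities coincide, which is the situation in which, within this model, the response propensity carries no information about the main latent class.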

C. Model Estimation
Estimating the LCM involves estimating the parameters that determine the probability of latent class membership, represented in equations (4) and (5), in addition to the parameters that determine item-response probabilities conditional on latent class membership, which appear in equations (1) and (3).
The loglikelihood for a random sample of size $n$ is given by

$$\ell = \sum_{i=1}^{n} \log f(\mathbf{y}_i, \mathbf{r}_i \mid \mathbf{x}_i, \mathbf{w}_i).$$

According to the model specified by equations (1), (3), (4) and (5), the joint distribution of all observed items is given by

$$f(\mathbf{y}_i, \mathbf{r}_i \mid \mathbf{x}_i, \mathbf{w}_i) = \sum_{z_1} \sum_{z_2} h(z_1, z_2 \mid \mathbf{x}_i, \mathbf{w}_i)\, f(\mathbf{y}_{R,i} \mid z_1) \prod_{j=1}^{J} P(r_{ij} \mid z_2),$$

where $\mathbf{y}_i$ and $\mathbf{r}_i$ represent the $2J$ manifest variables for the $i$th respondent, and $\mathbf{y}_{R,i}$ denotes the items actually observed for that respondent. The conditional distribution of $\mathbf{y}_R \mid z_1$ is

$$f(\mathbf{y}_R \mid z_1) = \prod_{j:\, r_j = 1} \pi_j(z_1)^{y_j}\,\bigl[1 - \pi_j(z_1)\bigr]^{1 - y_j}.$$

The joint distribution of $z_1$ and $z_2$ can be factorized as

$$h(z_1, z_2 \mid \mathbf{x}, \mathbf{w}) = h(z_2 \mid z_1, \mathbf{w})\, h(z_1 \mid \mathbf{x}),$$

where a Bernoulli distribution is assumed for the main latent variable conditional on covariates, $h(z_1 \mid \mathbf{x})$, and for the response latent variable conditional on the main latent variable and covariates, $h(z_2 \mid z_1, \mathbf{w})$. In estimating the outlined model, a given response to an observed item is weighted by the probability of responding to this item. This probability is a direct function of response propensity and is indirectly affected by class membership of the main latent variable through the response propensity latent variable. Collins and Lanza [25] have shown that model parameters cannot be estimated in closed form in this case. These parameters are therefore estimated from the data (for a given number of classes) using the EM algorithm, combined with another iterative algorithm such as Newton-Raphson. These algorithms attempt to maximize the likelihood function, thus obtaining maximum likelihood (ML) parameter estimates. In our application, estimation is carried out in Mplus [26].
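The likelihood contribution of a single respondent can be sketched in Python as below, assuming the joint class probabilities have already been computed from the structural model and that a missing item is coded as `None`. A missing item enters only through $r_j = 0$, while an observed item contributes both its Bernoulli term and $r_j = 1$. The posterior class probabilities returned here are the weights an E-step would use; the M-step (weighted updates of the measurement and structural parameters) is not shown. Function names and parameter values are hypothetical, and this is not the Mplus implementation.

```python
import math


def respondent_likelihood(y, h, pi, pi_r):
    """Likelihood contribution and posterior class probabilities for one respondent.

    y    : list of item values (1/0), with None marking a missing item (r_j = 0)
    h    : dict {(z1, z2): h(z1, z2 | x, w)} from the structural model
    pi   : dict {z1: [pi_j(z1), ...]}      item-response probabilities, Eq. (1)
    pi_r : dict {z2: [pi_j^(r)(z2), ...]}  probabilities that each item is observed, Eq. (3)
    """
    joint = {}
    for (z1, z2), weight in h.items():
        f_y, f_r = 1.0, 1.0
        for j, value in enumerate(y):
            if value is None:
                # Missing item: only the indicator r_j = 0 contributes.
                f_r *= 1.0 - pi_r[z2][j]
            else:
                # Observed item: r_j = 1 and the Bernoulli term for y_j both contribute.
                f_r *= pi_r[z2][j]
                f_y *= pi[z1][j] if value == 1 else 1.0 - pi[z1][j]
        joint[(z1, z2)] = weight * f_y * f_r
    lik = sum(joint.values())
    posterior = {cls: p / lik for cls, p in joint.items()}
    return lik, posterior


def log_likelihood(data, h_list, pi, pi_r):
    """Sum of log contributions over respondents, each with its own h."""
    return sum(math.log(respondent_likelihood(y, h, pi, pi_r)[0])
               for y, h in zip(data, h_list))


# Hypothetical parameter values for K = M = 2 classes and J = 4 items.
pi = {1: [0.9, 0.8, 0.7, 0.6], 2: [0.3, 0.2, 0.1, 0.1]}
pi_r = {1: [0.95, 0.95, 0.9, 0.9], 2: [0.7, 0.7, 0.6, 0.6]}
h = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.2, (2, 2): 0.3}

data = [[1, 1, None, 0], [0, None, None, None], [1, 1, 1, 1]]
ll = log_likelihood(data, [h] * len(data), pi, pi_r)
lik, post = respondent_likelihood(data[0], h, pi, pi_r)
print(round(ll, 4), {cls: round(p, 3) for cls, p in post.items()})
```

A respondent with every item missing contributes only through the indicators, so the item-response probabilities do not affect that contribution.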

D. Model Selection and Fit
This section addresses two practical issues in fitting an LCM: selecting the relevant items to measure a latent class variable and determining the appropriate number of latent classes. Selecting items for latent class analysis is crucial for the interpretability of the latent variable and the model. In general, classification performance and the precision of parameter estimates are better for more parsimonious models.
Collins and Lanza [25] have shown that a strong relation between a manifest item and a latent variable is characterized by two aspects. The first is how much the item-response probabilities for the manifest item vary across the latent classes. The second is whether the item-response probabilities for the observed variable are close to 1 or 0. With real data, one checks whether the item-response probabilities are close to 0 or 1, as it is uncommon to find item-response probabilities that are exactly 0 or 1.
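The two aspects can be turned into a simple screening rule for a two-class model, sketched below. The thresholds `min_gap` and `near`, the item names, and the probabilities are hypothetical; in practice the decision also rests on substantive interpretation.

```python
def screen_items(probs, min_gap=0.3, near=0.2):
    """Keep items whose item-response probabilities separate two latent classes.

    probs   : dict {item: (pi_j(class 1), pi_j(class 2))}
    min_gap : hypothetical minimum difference between the two classes
    near    : hypothetical distance from 0 or 1 counted as "close to 0 or 1"
    """
    kept = []
    for item, (p1, p2) in probs.items():
        varies = abs(p1 - p2) >= min_gap                          # aspect 1: varies across classes
        extreme = min(min(p, 1.0 - p) for p in (p1, p2)) <= near  # aspect 2: close to 0 or 1
        if varies and extreme:
            kept.append(item)
    return kept


# Hypothetical item-response probabilities (not estimates from this study).
probs = {
    "item_a": (0.90, 0.10),
    "item_b": (0.55, 0.45),
    "item_c": (0.95, 0.60),
    "item_d": (0.60, 0.40),
}
print(screen_items(probs))  # ['item_a', 'item_c']
```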
Goodman [27] has discussed the identifiability of a latent class model with a specific number of classes for a given number of variables. For $K$ classes and $J$ observed binary variables, the following condition needs to be satisfied for the model to be identified:

$$2^J > (J + 1) \times K. \qquad (11)$$

This ensures that there is enough information for estimating the model parameters. However, in practice, the available data may still not be sufficient to estimate them. Xu [28] has resolved identifiability issues for a restricted family of latent class models with binary manifest items. Deciding on the appropriate number of classes usually involves comparing models with different numbers of classes (e.g., 2, 3, and 4 latent classes) and selecting the model that gives the best fit and the most interpretable results. There is no common standard for the best fit criterion, and researchers often use several fit criteria in selecting the appropriate number of latent classes. Koo and Kim [29] fit a latent class model for longitudinal data, allowing the number of latent classes to be determined from the data using Bayesian estimation. Tein et al. [30] classify methods for determining the number of classes into three categories: likelihood ratio tests, information-theoretic criteria, and entropy-based criteria.
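Condition (11) is a parameter-counting check: $(J + 1) \times K$ equals the number of free parameters of a basic LCM, $(K - 1) + KJ$, plus one, while $2^J$ is the number of possible response patterns. A minimal Python check (function names are illustrative) is:

```python
def is_identifiable(n_items, n_classes):
    """Counting condition (11): 2**J > (J + 1) * K for J binary items and K classes."""
    return 2 ** n_items > (n_items + 1) * n_classes


def max_identifiable_classes(n_items):
    """Largest K that satisfies the counting condition (0 if none does)."""
    k = 0
    while is_identifiable(n_items, k + 1):
        k += 1
    return k


print(max_identifiable_classes(4))  # 3
print(max_identifiable_classes(8))  # 28
```

With four items the condition allows at most three classes, since $5 \times 3 = 15 < 16$ but $5 \times 4 = 20 > 16$. Passing the check is necessary but does not guarantee that the data support stable estimates.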
In the data analysis, we use these methods for selecting models with the best fit.
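The information-theoretic and entropy-based criteria can be computed as sketched below, using the usual definitions $\text{AIC} = -2\ell + 2p$ and $\text{BIC} = -2\ell + p \log n$, and a normalized entropy $E_K = 1 - \sum_i \sum_k (-\hat{p}_{ik} \log \hat{p}_{ik}) / (n \log K)$ computed from posterior class probabilities $\hat{p}_{ik}$. The parameter count shown is for a basic LCM without covariates, the likelihood ratio tests are not shown, and the function names and inputs are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


def normalized_entropy(posteriors):
    """Entropy criterion in [0, 1] from posterior class probabilities (K >= 2).

    Values near 1 indicate a clear assignment of respondents to classes.
    """
    n, k = len(posteriors), len(posteriors[0])
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


def n_free_params(n_items, n_classes):
    """(K - 1) class proportions plus K * J item-response probabilities."""
    return (n_classes - 1) + n_classes * n_items


# Hypothetical inputs, for illustration only.
print(information_criteria(loglik=-100.0, n_params=n_free_params(4, 2), n_obs=100))
print(normalized_entropy([[0.99, 0.01], [0.05, 0.95], [0.5, 0.5]]))
```

Lower AIC and BIC values favor a model, while entropy summarizes classification quality rather than fit, which is one reason several criteria are usually read together.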

III. RESULTS AND DISCUSSION
This paper applies the proposed LCM incorporating a missingness mechanism to analyze data from the 2014 Egyptian Demographic and Health Survey (EDHS14) on people's access to knowledge sources. The EDHS14 consisted of two questionnaires, one for households and another for individuals. The household questionnaire included social and economic questions. Out of 29,471 households sampled for the EDHS14, 28,630 were found, and the response rate among these was 98.4 percent.

A. Access to Knowledge Sources: Measurement Model
The Bristol definition of information (knowledge sources) deprivation, based on the "deprivation approach" to poverty [31], was originally developed for children aged 2 to 18 years. It defines children without access to any of the following media at home as "information deprived": radio, television, telephone (landline or mobile phone), computer, or newspapers. We propose a definition based on a latent variable approach that, unlike the original definition, provides a more flexible concept of deprivation. The latent class model does not simply consider an individual "not deprived" if they have any one of those devices; instead, it classifies subjects according to a measurement model that gives different weights to different items in measuring the latent variable.
Eight items, measured at the household level, indicate whether certain sources of knowledge are available to all household members: radio, television, telephone (landline), mobile phone, smartphone, computer, video, and satellite dish. We use these items to measure a latent variable interpreted as "Access to Knowledge Sources". All these items are binary, taking the value "1" for a household that has this source of knowledge and "0" for a household that does not.
The analysis aims to determine the contribution of each of the eight items to measuring people's access to knowledge sources. First, we investigate which of the eight items contribute most to measuring the latent variable. Then, we choose the items that best represent the latent variable according to the two criteria outlined in Section II-D. Next, the missingness mechanism is incorporated within the model framework. Data analysis is implemented in Mplus [26].
The results from the model in equation (1) are presented in Table 1, assuming a two-class latent variable. Of the 28,175 households successfully interviewed, 27,850 provided complete answers, and the analysis is based on these households. By applying the item selection criteria, four items are selected as measures of the latent variable that we label "Access to Knowledge Sources", namely access to radio, telephone (landline), computer, and smartphone. As shown in Table 1, the conditional probabilities of the other four items (television, mobile phone, video, and satellite dish) do not differ much between the first and second classes of the latent variable. This indicates that they contribute little to measuring the latent variable, and they are therefore excluded.
According to the identifiability condition in equation (11), the model for the latent variable of interest "Access to Knowledge Sources", measured by four items, is identifiable with either two or three classes, since $2^4 = 16 > 5 \times 3 = 15$ but $16 < 5 \times 4 = 20$. Comparing the results of the measurement model with two versus three classes (see Table 2), the p-values of both the Lo-Mendell-Rubin (LMR) likelihood ratio test and the bootstrap likelihood ratio test (BLRT) indicate a good fit in both cases. The AIC, BIC, and entropy-based criteria differ little between the two models, although they slightly favor the three-class model. We therefore rely on ease of interpretation and labeling of the latent classes to determine the number of classes. Table 3 presents the estimated conditional probabilities for the two-class and three-class measurement models, respectively. On examining the estimated probabilities, we decide that the suitable number of classes for the latent variable "Access to Knowledge Sources" is two, as it reflects a clear pattern of probabilities that are higher in the first class than in the second, while no specific pattern can be inferred from the three-class model.

B. Response Propensity: Missingness mechanism
The EDHS14 data has a negligible percentage of missingness. Therefore, to compare the proposed model under the three types of missingness (MCAR, MAR and MNAR), missingness is artificially created within the selected items under three scenarios. An indicator variable $r_{ij}$ is created to indicate whether manifest item $y_{ij}$ is observed. It takes the value "1" if item $j$ is observed for individual $i$, and "0" if it is artificially missing. For MCAR, missingness is created in a completely random manner: the probability of an individual having a missing value on one of the manifest variables depends neither on observed nor on unobserved data. This is achieved by randomly deleting 10% of the values of each item, so that 34.5% of individuals have at least one of the four items missing (close to the $1 - 0.9^4 \approx 34.4\%$ expected when the items are deleted independently). In the case of data MAR, the probability of a missing value on one of the manifest variables is generated as a function of covariates. In this application, the wealth index ($x_1$) and the educational level of the household head ($x_2$) are used as covariates for this purpose. The probability of a missing response is thus given by

$$\operatorname{logit} P(\text{missing}) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2. \qquad (12)$$
For each manifest item, a vector of uniform random numbers $u_i \sim U[0, 1]$ of the same length is generated, and an observation is deleted and treated as missing if the corresponding $P_i(\text{missing}) > u_i$. The percentage of missingness in each item was approximately 10%, resulting in 27.7% missingness in the overall data. This was achieved with $\alpha_0 = 0$, $\alpha_1 = -1$, and $\alpha_2 = 0.1$. Both the choice of covariates and the values of the parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$ in equation (12) are arbitrary. To obtain data that is MNAR, the missingness has to be generated such that the probability of a missing observation depends on the unobserved value itself. This leads to a systematic difference between those who respond and those who do not. We therefore randomly deleted 10% of each item from those who do not have the corresponding device (those whose response to the item is 0), resulting in 34% missingness in the overall data. The four missingness indicators are used as measures of another latent variable that we label "Response Propensity". The steps used to construct "Access to Knowledge Sources" and to select its number of classes are repeated for "Response Propensity". Again, "Response Propensity" is identifiable with either two or three classes. By comparing the two-class and three-class models for the different types of missingness in Table 4, we choose a two-class latent variable, which also provides the best model fit.
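The three deletion schemes can be sketched in Python as follows. The data are simulated, the coding of the covariates (wealth index 1 to 5, education 0 to 5) is an assumption, and only the values $\alpha_0 = 0$, $\alpha_1 = -1$, $\alpha_2 = 0.1$ come from the text. For MNAR, the sketch reads "10% of each item from those who do not have that device" as deleting a number of values equal to 10% of the sample, drawn only among households whose item value is 0; this is one interpretation, and all function names are hypothetical.

```python
import math
import random


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def make_mcar(y, rate, rng):
    """MCAR: delete a random `rate` share of each item's values."""
    n = len(y)
    out = [row[:] for row in y]
    for j in range(len(y[0])):
        for i in rng.sample(range(n), round(rate * n)):
            out[i][j] = None
    return out


def make_mar(y, x1, x2, alpha, rng):
    """MAR: delete y_ij when P_i(missing) > u_ij, with logit P_i(missing) as in Eq. (12)."""
    a0, a1, a2 = alpha
    out = [row[:] for row in y]
    for i, row in enumerate(out):
        p_missing = inv_logit(a0 + a1 * x1[i] + a2 * x2[i])
        for j in range(len(row)):
            if p_missing > rng.random():  # u_ij ~ U[0, 1]
                row[j] = None
    return out


def make_mnar(y, rate, rng):
    """MNAR (one reading of the text): delete `rate * n` values per item, drawn only among zeros."""
    n = len(y)
    out = [row[:] for row in y]
    for j in range(len(y[0])):
        zeros = [i for i in range(n) if y[i][j] == 0]
        for i in rng.sample(zeros, min(len(zeros), round(rate * n))):
            out[i][j] = None
    return out


def share_any_missing(data):
    """Share of respondents with at least one missing item."""
    return sum(any(v is None for v in row) for row in data) / len(data)


def item_missing_rate(data, j):
    """Share of respondents missing item j."""
    return sum(row[j] is None for row in data) / len(data)


# Hypothetical data: 20,000 households, 4 binary items, wealth 1-5, education 0-5.
rng = random.Random(2014)
n = 20_000
y = [[int(rng.random() < 0.5) for _ in range(4)] for _ in range(n)]
x1 = [rng.randint(1, 5) for _ in range(n)]
x2 = [rng.randint(0, 5) for _ in range(n)]

mcar = make_mcar(y, 0.10, rng)
mar = make_mar(y, x1, x2, alpha=(0.0, -1.0, 0.1), rng=rng)
mnar = make_mnar(y, 0.10, rng)
for name, d in (("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)):
    print(name, round(item_missing_rate(d, 0), 3), round(share_any_missing(d), 3))
```

Under MAR, all four items of a household share the same deletion probability, so missing values cluster within households; this tends to lower the share of households with at least one missing item relative to MCAR at a similar per-item rate.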

C. Overall model
Having fitted each latent class variable, "Access to Knowledge Sources" and "Response Propensity", separately, we now present the overall proposed model illustrated in Fig. 2. The covariates affecting the latent variable "Access to Knowledge Sources" are the sex (male/female), age in years, place of residence (urban/rural), educational level, and wealth index of the household head. The wealth index has five categories: poorest, poorer, middle, richer, and richest. The highest level of education has six categories: no education, incomplete primary, complete primary, incomplete secondary, complete secondary, and higher. The covariates affecting the missingness latent variable "Response Propensity" are the sex (male/female), age, and place of residence (urban/rural) of the household head. We do not study the effect of the wealth index and educational level of the household head on "Response Propensity", as these covariates are used in creating the MAR dataset. Table 5 gives parameter estimates for the overall proposed model, defined by equations (1), (3), (4) and (5) and illustrated in Fig. 2, for the three types of missingness, assuming a two-class model for each of the latent variables "Access to Knowledge Sources" and "Response Propensity".
The measurement model is quite robust under the different missingness scenarios, indicating that the contribution of the manifest variables to measuring the latent variable of interest is not affected by the type of missing data. In contrast, the missingness model differs across the types of missingness. The four missingness indicators are not significant in defining the latent variable "Response Propensity" when data is MCAR. This is a logical result, as the missingness was created to follow a completely random pattern, so the observed data form a random subset of the full data; the indicators do not measure "Response Propensity" in this case. However, they are significant in measuring the latent variable when missing data is MAR or MNAR, as in both cases those who have missing values differ from those who respond. While their contribution to measuring "Response Propensity" is almost equal for the four indicators under MAR, it is higher for computers and smartphones than for radio and telephone under MNAR.
The structural parameter $\phi$, reflecting the relationship between "Response Propensity" and "Access to Knowledge Sources", is given at the bottom of Table 5. As one would expect, this relationship is not significant for data MCAR. In contrast, there is a significant relationship for data MAR and MNAR. The significant positive effect indicates that higher response levels are more likely to be found with higher levels of access to knowledge sources, even after controlling for covariates. The magnitude of the structural parameter is much larger for data MNAR. A possible explanation for the significant effect, which points to nonignorable missingness, in the MAR data is that the covariates used to create the missingness (the wealth index and educational level of the household head) are themselves associated with certain levels of "Access to Knowledge Sources". Since these covariates are not included in the "Response Propensity" part of the model, a dependence of "Response Propensity" on "Access to Knowledge Sources" remains. Table 6 shows the estimated conditional probabilities for the manifest items given class membership of the main latent variable "Access to Knowledge Sources", and those of the missingness indicators given class membership of the latent variable "Response Propensity", assuming that both latent variables are binary. The conditional probabilities are reported under the different types of missingness. We may consider the first latent class of "Access to Knowledge Sources" to indicate "High access to knowledge sources" and the second to indicate "Low access to knowledge sources", as the conditional probabilities of having any of the devices are consistently higher for the first class than for the second. The conditional probabilities resulting from the "Response Propensity" latent variable are not reliable for data MCAR, since the indicators are not significant in defining the "Response Propensity" latent variable (see Table 5).
However, for data MAR or MNAR, there is a clear pattern of higher estimated probabilities for the first class than for the second, although the differences are sometimes small. The first latent class may thus be labeled "High response propensity" and the second "Low response propensity". The complement of the item probabilities gives the probability of responding "No" to the corresponding item, and the complement of the response indicator probabilities gives the probability of a "Missing" response. From Table 7, and considering how the latent variable is defined, it can be concluded that the probability of having high access to knowledge sources is generally higher for more privileged people. That is, it is higher for males than for females, and for people with a higher wealth index, of older age, and with higher levels of education. An unexpected result is that the probability of having high access to knowledge sources is higher for those living in rural areas than for those in urban areas. However, it is not known whether the available media are used to access knowledge or mainly for entertainment and communication.
In our application, covariate effects on the missingness part of the model appear to be significant mainly for data MAR. The sex covariate has a negative effect on "Response Propensity," which means that females are more likely to respond. Place of residence and age positively affect "Response Propensity," which means that younger people and people in urban areas are more likely to respond. For data MCAR, the "Response Propensity" latent variable is not well defined because the missingness was created at random, so its relationships with "Access to Knowledge Sources" and the covariates turned out to be insignificant. For data MNAR, only place of residence significantly affects "Response Propensity".
A possible explanation is that the effect of the other two covariates is already indirectly carried within the latent variable of interest "Access to Knowledge Sources", since their effect on "Access to Knowledge Sources" is already highly significant.

IV. CONCLUSION
When multiple manifest variables are used as measures of a latent variable, it is quite common to have some missing values in the data due to item non-response. In this paper, we proposed summarizing item non-response by another latent variable that can be labeled "Response Propensity". The missingness can thus be allowed to be non-random by letting the "Response Propensity" latent variable depend on the main latent variable of interest.
A model specification incorporating a missingness mechanism within a latent class model framework has been proposed to model multivariate binary data used as measures of a categorical latent variable. This specification allows for nonignorable item non-response by letting the response propensity latent variable, which summarizes the response indicators, depend on the latent variable of interest and on covariates. Logistic regression equations are used to model relationships within the latent class model, given the categorical nature of all manifest and latent variables. Parameter estimation and goodness-of-fit assessment use the conventional methods usually applied to latent variable models for multivariate data.
The proposed model has been applied to data from Egypt's 2014 Demographic and Health Survey. Missingness has been artificially created to generate three different types of missing data, MCAR, MAR and MNAR, to study the results of the model in each case. An important result is that the measurement part defining the latent variable of interest, "Access to Knowledge Sources", is quite robust regardless of how missingness was created. For data MAR and MNAR, the relationship between the "Response Propensity" and "Access to Knowledge Sources" latent variables remains significant even after controlling for covariates.
Unlike other models in the literature, such as those of Bacci and Bartolucci [22] and Beesley et al. [32], the proposed model accounts for missingness and allows it to be non-random by depending on the levels of the latent class of interest. The estimated probabilities of class membership of the "Response Propensity" latent variable are affected by class membership of the "Access to Knowledge Sources" latent variable, making the missingness nonignorable. Lower levels of response were associated with lower levels of "Access to Knowledge Sources". This result confirms the importance of accommodating the missingness mechanism when modeling the data, given the systematic difference between respondents and nonrespondents. Covariate effects are also found to be robust in the measurement model; however, they are quite sensitive to the type of missingness in the missingness part of the model. In Zakaria et al. [33], we have used Bayesian estimation to fit the same model specification proposed in this article and to study the sensitivity of the results to different levels of missingness.