Sunday, November 21, 2010

Sociological Methodology 40

Methods for Life-Course Data Analysis

Multichannel Sequence Analysis Applied to Social Science Data
Jacques-Antoine Gauthier, Eric D. Widmer, Philipp Bucher and Cédric Notredame
Applications of optimal matching analysis in the social sciences are typically based on sequences of specific social statuses that model the residential, family, or occupational trajectories of individuals. Despite the broadly recognized interdependence of these statuses, few attempts have been made to systematize the ways in which optimal matching analysis should be applied multidimensionally, that is, in an approach that takes multiple trajectories into account simultaneously. Based on methods pioneered in the field of bioinformatics, this paper proposes a method of multichannel sequence analysis (MCSA) that extends the usual optimal matching analysis (OMA) to multiple life spheres simultaneously. Using data from the Swiss Household Panel (SHP), we examine the types of trajectories obtained using MCSA. We also consider a random data set and find that, by locally aligning distinct life trajectories simultaneously, MCSA offers an alternative to the sole use of ex post summation of distance matrices. Moreover, MCSA reduces the complexity of the typologies it produces without making them less informative. It is more robust to noise in the data, and it provides more reliable alignments than two independent OMAs.
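At the core of OMA is an edit distance between status sequences: the minimal total cost of the insertions, deletions, and substitutions needed to transform one sequence into the other. The sketch below is a generic single-channel version; the costs, state labels, and trajectories are illustrative, not those used in the paper. The multichannel extension essentially combines the substitution costs of the parallel channels when aligning.

```python
# Classical optimal matching: minimal total cost of insertions, deletions
# ("indels"), and substitutions turning one status sequence into another,
# computed by dynamic programming. Costs here are illustrative.

def om_distance(seq_a, seq_b, sub_cost=2.0, indel_cost=1.0):
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = minimal cost of aligning seq_a[:i] with seq_b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost)
            dp[i][j] = min(sub, dp[i - 1][j] + indel_cost, dp[i][j - 1] + indel_cost)
    return dp[n][m]

# Two hypothetical yearly occupational trajectories
a = ["edu", "edu", "work", "work", "work"]
b = ["edu", "work", "work", "home", "work"]
print(om_distance(a, b))  # → 2.0
```

With these costs, deleting the second education year and inserting a home spell (total cost 2) is cheaper than the two substitutions (cost 4) a naive position-by-position comparison would require.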

Memory Bias in Retrospectively Collected Employment Careers: A Model-Based Approach to Correct for Measurement Error
Anna Manzoni, Jeroen K. Vermunt, Ruud Luijkx and Ruud Muffels
Event history data constitute a valuable source for analyzing life courses, although the reliance of such data on autobiographical memory raises many concerns over their reliability. In this paper, we use Swedish survey data to investigate bias in retrospective reports of employment biographies, applying a novel model-based latent Markov method. A descriptive comparison of the biographies as reconstructed by the same respondents at two interviews carried out about 10 years apart reveals that careers appear simpler and less heterogeneous and have fewer elements and episodes when reported long after their occurrence, with particularly heavy underreporting of unemployment. Using matching techniques, we find that the dissimilarity between the two reconstructions is unaffected by respondents' sociodemographic characteristics but strongly affected by the occurrence of unemployment spells and by career complexity. Using latent Markov models, we assume correlated errors across occasions to determine the measurement error and to obtain a more reliable estimate of the (true) latent state occupied at a particular time point. The results confirm that (correlated) measurement errors lead to simplification and conventionalism. Career complexity makes recall particularly problematic at longer recall distances, whereas unemployment underreporting also occurs very close to the interview. However, only a small fraction of respondents make consistent errors over time, while the great majority makes no errors at all.
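The core idea of the latent Markov correction can be illustrated with a toy two-state model: the true employment state evolves as a Markov chain, the retrospective report is a noisy measurement of it, and the forward algorithm yields the probability of the true state given the reports. All probabilities below are invented for illustration; they are not estimates from the paper.

```python
# Toy latent Markov model with measurement error: two latent states
# (employed / unemployed), a transition matrix, and a misclassification
# matrix linking the true state to the retrospective report.
# All probabilities are invented for illustration.

init = {"emp": 0.9, "unemp": 0.1}                      # P(initial true state)
trans = {"emp": {"emp": 0.95, "unemp": 0.05},          # P(next true | current true)
         "unemp": {"emp": 0.40, "unemp": 0.60}}
emit = {"emp": {"emp": 0.98, "unemp": 0.02},           # P(reported | true):
        "unemp": {"emp": 0.30, "unemp": 0.70}}         # unemployment underreported

def forward(reports):
    """Forward algorithm: filtered P(true state | reports so far)."""
    alpha = {s: init[s] * emit[s][reports[0]] for s in init}
    for obs in reports[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[r] * trans[r][s] for r in alpha)
                 for s in alpha}
    total = sum(alpha.values())
    return {s: a / total for s, a in alpha.items()}

post = forward(["emp", "emp", "unemp"])
print(post)
```

Because unemployment is assumed to be heavily underreported here, a single "unemp" report after two "emp" reports still shifts the filtered probability decisively toward true unemployment.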


Causal Inference and Multivariate Data Analysis

The Foundations of Causal Inference
Judea Pearl
This paper reviews recent advances in the foundations of causal inference and introduces a systematic methodology for defining, estimating, and testing causal claims in experimental and observational studies. It is based on nonparametric structural equation models (SEM)—a natural generalization of those used by econometricians and social scientists in the 1950s and 1960s, which provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring the effects of potential interventions (also called “causal effects” or “policy evaluation”), as well as direct and indirect effects (also known as “mediation”), in both linear and nonlinear systems. Finally, the paper clarifies the role of propensity score matching in causal analysis, defines the relationships between the structural and potential-outcome frameworks, and develops symbiotic tools that use the strong features of both.
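One of the central tools the paper surveys is the adjustment (backdoor) formula for the effect of an intervention, P(y | do(x)) = Σ_z P(y | x, z) P(z), valid when Z satisfies the backdoor criterion. A minimal numerical sketch with one binary confounder follows; all probabilities are invented.

```python
# Backdoor adjustment on a toy binary model where Z confounds X -> Y:
# P(y | do(x)) = sum_z P(y | x, z) * P(z),
# which in general differs from the observational conditional P(y | x).
# All numbers are illustrative.

p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x_given_z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.3,   # P(Y = 1 | X = x, Z = z)
                (1, 0): 0.4, (1, 1): 0.6}

def p_y_do_x(x):
    """Interventional P(Y = 1 | do(X = x)) via backdoor adjustment on Z."""
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z)

def p_x(x, z):
    return p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]

def p_y_given_x(x):
    """Observational P(Y = 1 | X = x), confounded by Z."""
    num = sum(p_y_given_xz[(x, z)] * p_x(x, z) * p_z[z] for z in p_z)
    den = sum(p_x(x, z) * p_z[z] for z in p_z)
    return num / den

print(p_y_do_x(1), p_y_given_x(1))  # → 0.5 0.56
```

The gap between 0.5 and 0.56 is exactly the confounding bias that conditioning on Z removes.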

Bayesian Propensity Score Estimators: Incorporating Uncertainties in Propensity Scores into Causal Inference
Weihua An
Despite their popularity, conventional propensity score estimators (PSEs) do not take into account uncertainties in propensity scores. This paper develops Bayesian propensity score estimators (BPSEs) that model the joint likelihood of both propensity score and outcome in one step, which naturally incorporates such uncertainties into causal inference. Simulations show that PSEs using estimated propensity scores tend to overestimate variation in the estimates of treatment effects; that is, they often provide larger than necessary standard errors and lead to overly conservative inference, whereas BPSEs provide correct standard errors for the estimates of treatment effects and valid inference. Compared with other variance adjustment methods, BPSEs are guaranteed to provide positive standard errors, are more reliable in small samples, and can be readily employed to draw inference on individual treatment effects. To illustrate the proposed methods, BPSEs are applied to the evaluation of a job training program. Accompanying software is available on the author's website.
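For contrast with the one-step Bayesian approach, a conventional two-step PSE can be sketched as follows: first estimate propensity scores, then plug them into an inverse-probability-weighted (IPW) estimate of the average treatment effect. The data and the stratum-based propensity model below are invented; a BPSE would instead sample the propensity and outcome parameters jointly from their posterior, thereby propagating propensity-score uncertainty into the treatment-effect estimate.

```python
# Conventional two-step propensity score estimator on toy data:
# step 1 estimates e(z) = P(T = 1 | Z = z), here simply by the
# within-stratum treated fraction since Z is binary; step 2 plugs
# the estimates into an IPW estimate of the average treatment effect.
# Data are invented for illustration.

data = [  # (covariate z, treated t, outcome y)
    (0, 0, 1.0), (0, 0, 1.2), (0, 1, 2.1), (0, 0, 0.9),
    (1, 1, 3.0), (1, 1, 2.8), (1, 0, 1.9), (1, 1, 3.2),
]

def propensity(z):
    """Step 1: estimated P(T = 1 | Z = z) as the stratum treated fraction."""
    treated_flags = [t for (zz, t, _) in data if zz == z]
    return sum(treated_flags) / len(treated_flags)

def ipw_ate():
    """Step 2: inverse-probability-weighted average treatment effect."""
    n = len(data)
    treated = sum(y * t / propensity(z) for (z, t, y) in data)
    control = sum(y * (1 - t) / (1 - propensity(z)) for (z, t, y) in data)
    return treated / n - control / n

print(ipw_ate())
```

A variance estimate that treats these propensity scores as known quantities is exactly what the paper argues can be miscalibrated.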

Finite Normal Mixture SEM Analysis by Fitting Multiple Conventional SEM Models
Ke-Hai Yuan and Peter M. Bentler
This paper proposes a two-stage maximum likelihood (ML) approach to normal mixture structural equation modeling (SEM) and develops a statistical inference that allows distributional misspecification. Saturated means and covariances are estimated at stage 1 together with a sandwich-type covariance matrix. These are used to evaluate structural models at stage 2. Techniques accumulated in the conventional SEM literature for model diagnosis and evaluation can be used to study the model structure for each component. Examples show that the two-stage ML approach leads to correct or nearly correct models even when the normal mixture assumptions are violated and initial models are misspecified. Compared to single-stage ML, two-stage ML avoids the confounding effect of model specification and the number of components, and it is computationally more efficient. Monte Carlo results indicate that two-stage ML loses only minimal efficiency under the condition where single-stage ML performs best. Monte Carlo results also indicate that the commonly used model selection criterion BIC is more robust to distribution violations for the saturated model than for a structural model at moderate sample sizes. The proposed two-stage ML approach is also extremely flexible in modeling different components with different models. Potential new developments in the mixture modeling literature can be easily adapted to study issues with normal mixture SEM.
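The stage-1 estimation problem is ML estimation of a finite normal mixture, typically carried out with the EM algorithm. The sketch below is a minimal univariate analogue (two normal components, simulated data); the paper's stage 1 estimates saturated mean vectors and covariance matrices per component in the multivariate case.

```python
import math
import random

# Minimal EM algorithm for a univariate two-component normal mixture,
# a one-dimensional stand-in for stage-1 estimation of per-component
# means and (co)variances. Initialization and data are illustrative.

def em_mixture(x, iters=200):
    mu = [min(x), max(x)]          # crude initialization from the data range
    var = [1.0, 1.0]
    w = [0.5, 0.5]                 # mixing proportions
    for _ in range(iters):
        # E-step: posterior responsibilities of each component for each point
        resp = []
        for xi in x:
            dens = [w[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: responsibility-weighted proportions, means, and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            var[k] = max(var[k], 1e-6)   # guard against degenerate components
    return mu, var, w

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]
mu, var, w = em_mixture(x)
print(mu, w)
```

With well-separated components, the recovered means land near 0 and 5 and the mixing proportions near 0.5 each; the two-stage approach then fits a structural model to each component's estimated moments.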

The Simultaneous Decision(s) about the Number of Lower- and Higher-Level Classes in Multilevel Latent Class Analysis
Olga Lukočienė, Roberta Varriale and Jeroen K. Vermunt
Recently, several types of extensions of the latent class (LC) model have been developed for the analysis of data sets having a multilevel structure. The most popular variant is the multilevel LC model with finite mixture distributions at multiple levels of a hierarchical structure; that is, with LCs for both lower-level units (e.g. individuals, citizens, or patients) and higher-level units (e.g. groups, regions, or hospitals). A problem in the application of this model is that determining the number of LCs is much more complicated than in standard (single-level) LC analysis because it involves multiple, nonindependent decisions. We propose a three-step model-fitting procedure for deciding about the number of higher- and lower-level classes. We also investigate the performance of information criteria (BIC, AIC, CAIC, and AIC3) in the context of multilevel LC analysis, with different types of response variables. A specific difficulty associated with using BIC and CAIC in any type of multilevel analysis is that these measures contain the sample size in their formulae, and we investigate whether this should be the number of groups, the number of individuals, or whether the choice should depend on whether one is deciding about model features at the higher or the lower level. The three main conclusions of our simulation studies are that (1) the proposed three-step model-fitting strategy works rather well, (2) the number of higher-level units (K) is the preferred sample size for BIC and CAIC, both for decisions about higher- and lower-level classes, and (3) with categorical indicators, AIC3 and BIC based on the higher-level sample size are the preferred measures for deciding about the number of LCs at both the higher and lower levels. With continuous indicators, BIC(K) performs better than AIC3. AIC performs best in very specific situations, namely with poorly separated classes and categorical indicators.
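The sample-size ambiguity in BIC is easy to make concrete: BIC = -2 log L + p log N, so plugging in the number of groups versus the number of individuals changes the penalty and can reverse a model comparison. The log-likelihoods and parameter counts below are invented solely to show such a reversal.

```python
import math

# BIC = -2 * logL + p * log(N). In multilevel latent class analysis it is
# unclear whether N should be the number of higher-level units (groups, K)
# or lower-level units (individuals); the paper's simulations favor K.
# All numbers below are hypothetical.

def bic(loglik, n_params, n):
    return -2 * loglik + n_params * math.log(n)

K, n_individuals = 100, 2000          # e.g. 100 groups of 20 individuals

model_a = {"loglik": -5040.0, "p": 20}   # simpler model, worse fit
model_b = {"loglik": -5000.0, "p": 35}   # richer model, better fit

for n in (K, n_individuals):
    ba = bic(model_a["loglik"], model_a["p"], n)
    bb = bic(model_b["loglik"], model_b["p"], n)
    winner = "A" if ba < bb else "B"
    print(f"N={n}: BIC(A)={ba:.1f}, BIC(B)={bb:.1f}, prefer {winner}")
```

With N = K = 100 the richer model B wins, while with N = 2000 the heavier penalty flips the decision to the simpler model A, which is exactly why the choice of N matters.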


Methods for the Analysis of Social Network Data

Respondent-Driven Sampling: An Assessment of Current Methodology
Krista J. Gile and Mark S. Handcock
Respondent-driven sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample. The current estimators of population averages make strong assumptions in order to treat the data as a probability sample. We evaluate three critical sensitivities of the estimators: (1) to bias induced by the initial sample, (2) to uncontrollable features of respondent behavior, and (3) to the without-replacement structure of sampling. Our analysis indicates: (1) that the convenience sample of seeds can induce bias, and the number of sample waves typically used in RDS is likely insufficient for the type of nodal mixing required to obtain the reputed asymptotic unbiasedness; (2) that preferential referral behavior by respondents leads to bias; (3) that when a substantial fraction of the target population is sampled the current estimators can have substantial bias. This paper sounds a cautionary note for the users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to be heavily dependent on often unrealistic assumptions. We recommend ways to improve the methodology.
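A widely used RDS estimator of a population proportion, often called RDS-II (Volz-Heckathorn), weights each respondent by the inverse of their reported network degree, on the assumption that inclusion probability is proportional to degree. The toy data below are invented:

```python
# RDS-II (Volz-Heckathorn) estimator: a ratio estimator weighting each
# respondent by 1/degree, under the assumption that the probability of
# being recruited is proportional to network degree.
# Toy sample: (reported degree, indicator of the trait of interest).

sample = [(2, 1), (5, 0), (10, 0), (4, 1), (20, 0), (8, 1)]

def rds_ii(sample):
    num = sum(y / d for d, y in sample)
    den = sum(1 / d for d, y in sample)
    return num / den

weighted = rds_ii(sample)                          # ≈ 0.714
naive = sum(y for _, y in sample) / len(sample)    # unweighted: 0.5
print(weighted, naive)
```

Here the trait is concentrated among low-degree respondents, so the degree-weighted estimate exceeds the unweighted sample proportion; the paper's caution is that the assumptions licensing these weights (and the resulting claims of asymptotic unbiasedness) are often unrealistic in practice.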

Dynamic Networks and Behavior: Separating Selection from Influence
Christian Steglich, Tom A. B. Snijders and Michael Pearson
A recurrent problem in the analysis of behavioral dynamics, given a simultaneously evolving social network, is the difficulty of separating the effects of partner selection from the effects of social influence. Because misattribution of selection effects to social influence, or vice versa, leads to wrong conclusions about the social mechanisms underlying the observed dynamics, special diligence in data analysis is advisable. While a dependable and valid method would benefit several research areas, to the best of our knowledge such a method has been lacking in the extant literature. In this paper, we present a recently developed family of statistical models that enables researchers to separate the two effects in a statistically adequate manner. To illustrate our method, we investigate the roles of homophilous selection and peer influence mechanisms in the joint dynamics of friendship formation and substance use among adolescents. Making use of a three-wave panel measured in the years 1995–1997 at a school in Scotland, we assess the strength of selection and influence mechanisms and quantify the relative contributions of homophilous selection, assimilation to peers, and control mechanisms to the observed similarity of substance use among friends.

