Causal and Interdependency Discovery from Data

The research advances theory and design principles on several fronts, with broader interest for many applications: (1) The work develops a framework to learn the structure of cyclic causal graphs from data. (2) It studies recovering causal graphs from missing/incomplete data. (3) The research also explores how to incorporate additional sources of information, such as prior knowledge and inductive rules, into causal discovery. (4) The research, for the first time, investigates and develops the theory of learning the structure of probabilistic graphical models, in both parametric and non-parametric settings, from indirect low-dimensional observations. (5) It also introduces a novel paradigm based on density evolution on graphs (originally developed for the optimal design of low-density parity-check codes for reliable communication) that leverages the prior (dependency) structure of a high-dimensional signal to design and optimize a compressive measurement system for more accurate graph-structure recovery.

We also studied learning the graphical structure of Markov Random Fields (MRFs). Existing work on structure learning for MRFs assumes that the variables (associated with the nodes of the MRF) are directly observed in the training samples. However, in many real-world applications this may be infeasible. One of the main challenges in structure estimation (learning the graphical model), especially for high-dimensional data, is that the number of training/observation samples is usually smaller than the dimension of the covariates (i.e., the number of nodes in the MRF). This is generally tackled via M-estimators, where a certain low-dimensional underlying structure is assumed and promoted through regularization. Our research primarily focused on developing algorithms for learning the undirected graphical structure (i.e., Markov Random Fields) of high-dimensional random variables. To that end, the feasibility of recovery was studied for two scenarios for the distribution of the random vector signal, namely when the distribution was assumed to be (i) parametric (Gaussian) and (ii) non-parametric (nonparanormal).
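In the Gaussian case this problem has a simple concrete form: the edges of the MRF are exactly the nonzero off-diagonal entries of the precision (inverse covariance) matrix. The numpy sketch below is a simplified illustration of this correspondence, not the project's algorithm; plain inversion with thresholding stands in for a regularized M-estimator such as graphical lasso, and it assumes far more samples than nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 20000

# Chain-graph precision matrix: node j interacts only with j-1 and j+1.
Theta = np.eye(p)
for j in range(p - 1):
    Theta[j, j + 1] = Theta[j + 1, j] = 0.4

# Sample x ~ N(0, Theta^{-1}) and form the sample covariance.
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n

# Naive precision estimate: invert and threshold small entries to zero.
# (A graphical-lasso M-estimator replaces this step when samples are scarce.)
Theta_hat = np.linalg.inv(S)
support_hat = np.abs(Theta_hat) > 0.2

print(support_hat.astype(int))  # recovered edge pattern of the chain
```

With ample samples the thresholded support matches the chain graph; the M-estimator viewpoint matters precisely when n is smaller than p and this naive inverse does not exist.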

Learning Causal Graphs via Normalizing Flows (NODAGS-Flow)

In this work, we use observational and interventional data to learn the causal graphical model that gives rise to the data. In particular, we look at the recovery of cyclic graphs with nonlinear parent-child (causal) relationships. An additive noise model is used to represent the structural equations defining the parent-child relations between the nodes, where we assume that the intrinsic noise variable comes from a known family of distributions. Utilizing both observational and interventional data samples, the causal structure is learnt from data using a differentiable learning framework. To that end, we propose NODAGS-Flow, a causal discovery framework in which we employ contractive neural networks along with an input mask to model the functional relationships between the nodes in the graph. The parameters of the model are learnt via maximum likelihood estimation using both observational and interventional data samples. This relies on being able to compute the data likelihood, which in general can be expensive. Fortunately, NODAGS-Flow can be treated as a residual normalizing flow, and hence tools from normalizing-flow-based methods can be used to compute the data likelihood efficiently.
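To make the likelihood computation concrete, consider the linear special case of the additive-noise model, x = f(x) + e with f(x) = Wx, where W is contractive with a zero diagonal (so cycles are allowed). The change of variables gives log p(x) = log p_e((I - W)x) + log|det(I - W)|, which must match the density obtained by solving the cyclic system directly. The sketch below verifies this identity with numpy; it is a hedged toy illustration, not the NODAGS-Flow implementation, which uses contractive neural networks and stochastic log-determinant estimators.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma = 3, 1.0

# Contractive linear "structural" map with zero diagonal (cyclic graph).
W = np.array([[0.0, 0.3, 0.0],
              [0.2, 0.0, 0.3],
              [0.0, 0.2, 0.0]])
I = np.eye(d)

# Sample from the model: x = Wx + e  =>  x = (I - W)^{-1} e.
e = rng.normal(scale=sigma, size=d)
x = np.linalg.solve(I - W, e)

def gauss_logpdf(z, cov):
    """Log density of N(0, cov) at z."""
    k = len(z)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet
                   + z @ np.linalg.solve(cov, z))

# Flow-style likelihood: base density of the residual + log|det(I - W)|.
resid = (I - W) @ x
logp_flow = gauss_logpdf(resid, sigma**2 * I) + np.linalg.slogdet(I - W)[1]

# Direct likelihood: x ~ N(0, (I - W)^{-1} sigma^2 I (I - W)^{-T}).
Sigma_x = np.linalg.inv(I - W) @ (sigma**2 * I) @ np.linalg.inv(I - W).T
logp_direct = gauss_logpdf(x, Sigma_x)

print(logp_flow, logp_direct)  # the two log densities agree
```

In the nonlinear case f is a masked contractive network and log|det(I - J_f(x))| is no longer available in closed form, which is where the residual-flow machinery comes in.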

Learning Graphical Models from Low Dimensional Observations

The research focuses on recovering the graphical-model structure of a high-dimensional signal from its low-dimensional observations. We considered only undirected graphs (i.e., Markov Random Fields). This is particularly useful in biological applications, where direct measurement of all variables is very costly and time consuming. Specifically, we consider the scenario where the measurements are taken through a linear measurement system A; that is, the observations are y = Ax + e, where x denotes the random variables corresponding to the nodes and e corresponds to background disturbance (noise). For the parametric case with a Gaussian assumption, we wish to recover the structure from compressed measurements obtained by projecting the high-dimensional vector x to a lower dimension. For the non-parametric case, the graph structure is recovered from noisy linear measurements whose dimension is at least that of x. In both the parametric and non-parametric cases, recovering the covariance matrix is the key to recovering the graph structure.
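A concrete instance of the measurement model can be simulated in a few lines (a hedged sketch with arbitrary small dimensions, not the project's experimental setup): a Gaussian MRF signal x is seen only through m < p random linear projections corrupted by noise.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, n = 6, 3, 4  # signal dim, measurement dim, number of samples

# Gaussian MRF on a chain: the precision matrix Theta encodes the graph.
Theta = np.eye(p)
for j in range(p - 1):
    Theta[j, j + 1] = Theta[j + 1, j] = 0.4
Sigma = np.linalg.inv(Theta)

# Each sample of x is observed only through y = A x + e, with m < p.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
A = rng.normal(size=(m, p))
E = 0.1 * rng.normal(size=(n, m))
Y = X @ A.T + E  # rows are the low-dimensional observations

print(Y.shape)  # (n, m): only m-dimensional views of the p-dim signal
```

The structure-learning task is then to recover the support of Theta given only Y and A, never X itself.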

(a) Parametric Setting

We considered the setting where the underlying distribution of the signal is Gaussian and the measurements are taken via an under-determined system. We develop an unbiased estimator for the covariance matrix from under-determined measurements y = Ax. This provides a stable estimate even when the number of samples of x is only comparable to its dimension. We develop a two-stage algorithm to recover the structure of the graph: we first recover the covariance matrix using the proposed covariance estimator; using the estimated covariance, we then apply graphical lasso to estimate the inverse covariance matrix, whose support gives the underlying structure of the graph. We also develop theory characterizing the behavior of the proposed algorithm and analyze the sample requirements and the amount of compression that can be achieved. We show via theoretical analysis that the proposed estimator recovers the actual covariance of x with high probability.
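The exact form of the proposed estimator is not reproduced here; the sketch below shows one standard way such an unbiased estimator can be constructed when each sample x_i is observed through its own iid Gaussian sketching matrix A_i (an assumption made purely for this illustration). For A with iid N(0,1) entries and y = Ax, one has E[A^T y y^T A] = m(m+1) Σ + m tr(Σ) I, and E[||y||^2] = m tr(Σ), so a plug-in trace correction yields an unbiased estimate of Σ; graphical lasso would then be run on this estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n = 4, 3, 50000  # under-determined: m < p

# Ground-truth covariance of the signal x (tridiagonal, positive definite).
Sigma = np.array([[1.0, 0.4, 0.0, 0.0],
                  [0.4, 1.0, 0.4, 0.0],
                  [0.0, 0.4, 1.0, 0.4],
                  [0.0, 0.0, 0.4, 1.0]])
L = np.linalg.cholesky(Sigma)

S = np.zeros((p, p))  # accumulates A^T y y^T A
trace_acc = 0.0       # accumulates ||y||^2, which estimates m * tr(Sigma)
for _ in range(n):
    x = L @ rng.normal(size=p)
    A = rng.normal(size=(m, p))  # fresh iid N(0,1) sketch per sample
    y = A @ x
    back = A.T @ y
    S += np.outer(back, back)
    trace_acc += y @ y
S /= n
trace_hat = trace_acc / (n * m)  # estimate of tr(Sigma)

# Invert E[A^T y y^T A] = m(m+1) Sigma + m tr(Sigma) I for Sigma.
Sigma_hat = (S - m * trace_hat * np.eye(p)) / (m * (m + 1))

print(np.round(Sigma_hat, 2))  # close to Sigma despite m < p
```

The key point the sketch demonstrates is that covariance information survives compression in expectation, so the first stage of the two-stage algorithm is well-posed even though each individual x_i cannot be recovered.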

(b) Non-parametric Setting

In this case, we study recovery of the structure of graphs whose underlying distribution is nonparanormal. Here, the measurements y are taken to have a higher dimension than that of x. Upon estimating x from y, we provide an estimator for the Cumulative Distribution Function (CDF) of x, which is then used to generate an estimate of the covariance matrix. Graphical lasso is then used to estimate the inverse covariance matrix, whose support gives the graph structure. Additionally, the behavior of the proposed algorithm is studied via theoretical analysis, and we obtain a non-asymptotic uniform bound on the estimation error of the CDF of x under inexact knowledge of the noise distribution.
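The nonparanormal step can be illustrated in isolation (a simplified sketch assuming x itself is available; the project additionally handles estimating x from y and mismatch in the noise distribution): estimate each marginal CDF empirically, push the data through the Gaussian quantile function, and use the sample correlation of the transformed data as the covariance estimate fed to graphical lasso.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
p, n = 3, 5000

# Latent Gaussian copula correlation (this is what we want to recover).
R = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(p), R, size=n)

# Nonparanormal data: unknown monotone marginal transforms of Z.
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, np.tanh(Z[:, 2])])

# Empirical CDF per coordinate via ranks, scaled into (0, 1) so the
# Gaussian quantile function stays finite.
ranks = X.argsort(axis=0).argsort(axis=0) + 1.0
U = ranks / (n + 1.0)

# Map back to (approximately) Gaussian margins and correlate.
phi_inv = np.vectorize(NormalDist().inv_cdf)
Z_hat = phi_inv(U)
R_hat = np.corrcoef(Z_hat, rowvar=False)

print(np.round(R_hat, 2))  # close to the latent correlation R
```

Because the marginal transforms are monotone, the rank-based CDF estimates undo them up to sampling error, so R_hat converges to the latent correlation regardless of the (unknown) marginals.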