Using CausalMGM is easy. Just follow the steps below.
CausalMGM is a data analysis tool to explore large, complex datasets. The method learns a graphical model of the data where the nodes are variables and edges display the dependencies among variables. The graphical model allows users to query their data to find the direct influences of a target variable of interest, or to find novel associations between pairs of variables.
CausalMGM.org is freely accessible to all, for both commercial and non-commercial use.
Interface & Cloud Construction:
Xiaoyu Ge, Daniel Petrov
Supervised by:
Panos K. Chrysanthis
Panayiotis V. Benos
The CausalMGM method expects a text file in tab-separated format with variables in the columns and samples in the rows.
The first row of the file should have the variable names, and each row following should have numerical or categorical data.
Numeric columns should contain only digits and at most one decimal point.
Categorical columns may have at most 5 unique categories, and each category may be encoded with any combination of numbers and characters.
The current implementation of MGM does not support data with missing values, so users should ideally handle missing values themselves before submitting.
If missing values remain in the submitted file, complete case analysis or median imputation will be performed automatically.
To use these automated methods, encode each missing entry with an asterisk (*).
Please download the sample data to see an example of a properly formatted dataset.
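The formatting rules above can be checked locally before uploading. The following is a minimal sketch in Python using only the standard library. It is not part of CausalMGM. The function names are hypothetical, the numeric pattern is one literal reading of the "digits and a single decimal point" rule, and a column is treated as numeric whenever all of its non-missing values match that pattern (so a categorical column coded as 0/1/2 would be labeled numeric here). The two missing-data helpers illustrate what complete case analysis and median imputation generally mean. How the server treats missing categorical values is not stated above, so the sketch leaves them unchanged.

```python
import csv
import io
import re
import statistics

MISSING = "*"        # missing-value marker described above
MAX_CATEGORIES = 5   # limit on unique categories per categorical column
# Assumed reading of the numeric rule: digits with at most one decimal point.
NUMERIC = re.compile(r"^\d+(\.\d+)?$")


def read_table(text):
    """Parse tab-separated text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def column_kinds(header, rows):
    """Label each column 'numeric' or 'categorical'.

    Raises ValueError if a categorical column has too many categories.
    """
    kinds = {}
    for j, name in enumerate(header):
        values = [r[j] for r in rows if r[j] != MISSING]
        if values and all(NUMERIC.match(v) for v in values):
            kinds[name] = "numeric"
        else:
            if len(set(values)) > MAX_CATEGORIES:
                raise ValueError(f"column {name!r} has more than "
                                 f"{MAX_CATEGORIES} categories")
            kinds[name] = "categorical"
    return kinds


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing entry."""
    return [r for r in rows if MISSING not in r]


def median_impute(header, rows, kinds):
    """Replace '*' in numeric columns with that column's median.

    Missing categorical entries are left as-is (an assumption).
    """
    out = [list(r) for r in rows]
    for j, name in enumerate(header):
        if kinds[name] != "numeric":
            continue
        observed = [float(r[j]) for r in rows if r[j] != MISSING]
        med = statistics.median(observed)
        for r in out:
            if r[j] == MISSING:
                r[j] = str(med)
    return out
```

A typical use would be to call `read_table` on the file contents, pass the result to `column_kinds` to catch formatting problems, and then apply either `complete_cases` or `median_impute` to see how many samples or values would be affected.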
Explanation of Feature Selection:
CausalMGM's feature selection is based on the PrefDiv algorithm, which identifies the features most associated with a target variable that are not strongly associated with one another.
PrefDiv requires the following input:
The number of features to be selected.
Features that should be kept no matter their relevance.
The target variable.
PrefDiv only operates on continuous features, so all categorical features will automatically be included.
The target variable may be continuous or categorical.
The method is computationally expensive on large datasets, so please use it with caution.
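To give a feel for the idea of choosing features that are relevant to the target but not redundant with each other, here is a simplified greedy sketch in Python. It is not the PrefDiv algorithm and not the code CausalMGM runs. The function names, the use of Pearson correlation as the association measure, the `max_redundancy` threshold, and the choice to count kept features toward the requested number are all assumptions made for illustration. It handles continuous features and a continuous target only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no association
    return sxy / math.sqrt(sxx * syy)


def select_features(data, target, k, keep=(), max_redundancy=0.9):
    """Greedy relevance/diversity selection (illustrative only).

    data: dict mapping feature name -> list of floats
    target: name of the target variable in data
    k: number of features to select (kept features count toward k)
    keep: features included regardless of relevance
    max_redundancy: a candidate is skipped if its absolute correlation
        with any already selected feature exceeds this value
    """
    selected = [name for name in keep if name != target]
    candidates = [n for n in data if n != target and n not in selected]
    # Rank candidates by strength of association with the target.
    candidates.sort(key=lambda n: abs(pearson(data[n], data[target])),
                    reverse=True)
    for name in candidates:
        if len(selected) >= k:
            break
        # Accept only if not too similar to anything chosen so far.
        if all(abs(pearson(data[name], data[s])) <= max_redundancy
               for s in selected):
            selected.append(name)
    return selected
```

Every candidate is compared against the target and against each selected feature, so the number of correlation computations grows quickly with the number of features, which is consistent with the caution about large datasets above.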