Step 1. Upload the dataset
Some requirements for the dataset are:
- Must be in tabular format with variable names in the first row.
- Must have no missing data.
- Must contain only continuous and categorical variables (no censoring).
- Ordinal variables will be treated as continuous if they have more than 5 categories.
- To treat an ordinal variable as categorical, combine categories until there are 5 or fewer.
- To treat an ordinal variable as continuous, use real-valued numbers (e.g. 2.0, 3.0, etc.).
- You can use the "Check my Data" button to confirm that the dataset is formatted properly.
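The requirements above can also be checked locally before uploading. The following is a minimal sketch, not the logic behind the "Check my Data" button. It assumes a comma-separated file, a hypothetical set of strings that count as missing, and a guessed typing rule (a column is continuous if it contains decimal values or has more than 5 distinct values). The function names `check_dataset` and `guess_types` are illustrative only.

```python
import csv
import io

# Assumption: these strings count as missing data; the tool's own list may differ.
MISSING = {"", "NA", "NaN", "nan", "null"}


def check_dataset(text):
    """Return a list of problems found in comma-separated text (empty list = none found)."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["need a header row plus at least one data row"]
    header, data = rows[0], rows[1:]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("empty variable name in the first row")
    # Data rows are numbered from 2 because row 1 holds the variable names.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} values, found {len(row)}"
            )
        elif any(cell.strip() in MISSING for cell in row):
            problems.append(f"row {line_no}: missing value")
    return problems


def guess_types(text, max_levels=5):
    """Guess how each column would be treated (assumed rule, see lead-in).

    Call this only on text that already passed check_dataset.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = {}
    for j, name in enumerate(header):
        values = {row[j].strip() for row in data}
        has_decimal = any("." in value for value in values)
        if has_decimal or len(values) > max_levels:
            types[name] = "continuous"
        else:
            types[name] = "categorical"
    return types
```

For example, calling both functions on the text `"age,stage\n41.5,1\n38.0,2\n"` reports no problems and guesses `age` as continuous and `stage` as categorical. Rewriting the `stage` values as `1.0` and `2.0` would flip that guess to continuous under the assumed rule.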
Step 2. Configure the experiment
The meaning of each parameter is as follows:
Pref-Div parameters
Number of variables to be selected - This is the number of continuous features to be selected.
These selected features, the target variable, and all of the categorical variables will be used for graphical modeling. A higher number enables a more accurate graphical model, but it will take longer to run and the result will be more difficult to visualize and interpret.
Name of target variable - Which variable should be treated as the target variable?
Variables to Keep - Comma-separated list of continuous variables that should be automatically included in the final graphical model. These do not count towards the "number of variables to be selected".
Automatic clustering - Each cluster of redundant features will be represented by a cluster variable computed with Principal Component Analysis, instead of by a single representative feature from the cluster.
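To make the idea of a PCA-based cluster variable concrete, here is a minimal sketch that summarizes a group of correlated features by their first principal component. It is an illustration only and is not taken from the tool: the function name `first_principal_component` is hypothetical, power iteration stands in for whatever PCA routine the tool uses, and no scaling beyond mean-centering is applied.

```python
import math


def first_principal_component(columns, iterations=200):
    """Return (loadings, scores) of the first principal component.

    columns: list of equal-length lists of numbers, one list per feature.
    """
    n = len(columns[0])
    p = len(columns)
    # Center each feature at zero.
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    # Sample covariance matrix (p x p).
    cov = [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector collapsed; features may be constant")
        w = [x / norm for x in w]
    # One score per sample: the projection onto the leading direction.
    scores = [sum(w[i] * centered[i][k] for i in range(p)) for k in range(n)]
    return w, scores
```

The returned `scores` list has one value per sample and plays the role of the single cluster variable. A single-representative approach would instead keep one of the original columns and discard the rest.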
Graphical Modeling Parameters
Lambda parameters determine the sparsity level of the mixed graphical model. A higher lambda value results in fewer (but higher confidence) edges in the output graph. There are separate lambda parameters for each edge type.
Lambda 1 - Controls sparsity for edges between two continuous variables.
Lambda 2 - Controls sparsity for edges between continuous and categorical variables.
Lambda 3 - Controls sparsity for edges between two categorical variables.
Lambda values can be chosen automatically, based on stability, using the "Find Lambdas" button. Note that this operation can be time-consuming for large datasets.
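The effect of lambda on sparsity can be illustrated with a toy example. The sketch below applies soft-thresholding, the standard shrinkage step for an L1-style penalty, to a handful of made-up edge weights, using one lambda per edge type. It is not the tool's optimizer. The edge-type labels `"cc"`, `"cd"`, and `"dd"` (standing in for Lambda 1, 2, and 3) and the function names are assumptions made for illustration.

```python
def soft_threshold(weight, lam):
    """Shrink a weight toward zero by lam; weights within [-lam, lam] become 0."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


def surviving_edges(edges, lambdas):
    """Return the (node_a, node_b) pairs whose weight is non-zero after shrinkage.

    edges: list of (node_a, node_b, edge_type, weight) tuples.
    lambdas: dict mapping edge_type to its lambda value.
    """
    kept = []
    for node_a, node_b, edge_type, weight in edges:
        if soft_threshold(weight, lambdas[edge_type]) != 0.0:
            kept.append((node_a, node_b))
    return kept


# Made-up weights: "cc" = continuous-continuous, "cd" = continuous-categorical,
# "dd" = categorical-categorical (hypothetical labels).
toy_edges = [
    ("X1", "X2", "cc", 0.80),
    ("X1", "X3", "cc", 0.15),
    ("X1", "D1", "cd", -0.30),
    ("D1", "D2", "dd", 0.25),
]

low = surviving_edges(toy_edges, {"cc": 0.1, "cd": 0.1, "dd": 0.1})
high = surviving_edges(toy_edges, {"cc": 0.5, "cd": 0.5, "dd": 0.5})
```

With every lambda at 0.1 all four toy edges survive, while with every lambda at 0.5 only the strongest edge (X1, X2) remains. Raising a single lambda, for example only `"dd"`, prunes edges of that type and leaves the others untouched.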
Alpha - PC-Stable requires an alpha value. This is the p-value threshold for the conditional independence tests. A lower alpha value means a sparser graph because fewer tests will meet the threshold for significance.
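The relationship between alpha and sparsity can be sketched with a single kind of test. The example below uses the Fisher z-test for a correlation, a common independence test for continuous data, and keeps an edge only when the p-value falls below alpha. This is an assumption-laden illustration: the tests the tool actually runs on mixed data may differ, and the correlations, sample size, and function names are made up.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n, cond_set_size=0):
    """Two-sided p-value for the null hypothesis of zero (partial) correlation.

    r: sample (partial) correlation, strictly between -1 and 1.
    n: number of samples; cond_set_size: number of conditioning variables.
    """
    z = math.atanh(r) * math.sqrt(n - cond_set_size - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def edges_kept(correlations, n, alpha):
    """Keep an edge only when independence is rejected at level alpha."""
    return [
        edge for edge, r in correlations.items()
        if fisher_z_p_value(r, n) < alpha
    ]


# Made-up pairwise correlations from a hypothetical sample of 53 rows.
toy_correlations = {("A", "B"): 0.5, ("A", "C"): 0.3, ("B", "C"): 0.1}

at_05 = edges_kept(toy_correlations, n=53, alpha=0.05)
at_01 = edges_kept(toy_correlations, n=53, alpha=0.01)
```

In this toy data, alpha = 0.05 keeps the edges (A, B) and (A, C), while alpha = 0.01 keeps only (A, B), which matches the description above: a lower alpha yields a sparser graph.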