1 Introduction
The use of novel machine learning tools for artificial intelligence (AI) applied to clinical psychiatry data sets has consistently increased in recent years
[Beam and Kohane2018], mostly due to the prevalence of algorithms that can ingest heterogeneous data sets while producing highly predictive models. Yet, while high predictability is indeed a desirable result, the healthcare community also requires that the abstractions generated by machine learning models be interpretable, so that experts can incorporate new machine learning insights into current clinical tools or, even better, improve the performance of the abstraction by tuning the data-driven models. In this paper we take a practical approach towards solving this problem, both developing a new algorithm capable of mining association rules from wide categorical datasets, and applying our mining method to build predictive and interpretable transdiagnostic screening tools for psychiatric disorders.

The artificial intelligence research community has been urged to develop interpretable machine learning methods, which can provide accessible and explicit explanations. The AI Now Institute at New York University makes it the top recommendation of its 2017 report that core government agencies, including those responsible for healthcare, “should no longer use black box AI and algorithmic systems” [Campolo et al.2017]. The Explainable Artificial Intelligence (XAI) program at DARPA has as one of its goals to “enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners” [Gunning2017]
. But popular machine learning methods such as artificial neural networks
[Hopfield1988, LeCun, Bengio, and Hinton2015] and ensemble models [Dietterich2000] are known for their elusive readout. For example, while artificial neural network applications exist for tumor detection in CT scans [Anthimopoulos et al.2016], it is virtually impossible for a person to understand the rationale behind such a mathematical abstraction.

Interpretability is often loosely defined as understanding not only what a model emitted, but also why it did so [Gilpin et al.2018]
. In this context, straightforward linguistic explanations are frequently considered the most interpretable, compared to examining coefficients of a linear model or evaluating the importance of perceptrons in artificial neural networks
[Morcos et al.2018]. Recent efforts towards interpretable machine learning models in healthcare can be found in the literature, such as the development of a boosting method that creates decision trees as combinations of single decision nodes
[Valdes et al.2016]. Bayesian Rule Lists (BRL) [Rudin, Letham, and Madigan2013, Letham et al.2013, Letham et al.2015] mixes the interpretability of sequenced logical rules for categorical datasets with the inference power of Bayesian statistics. Compared to decision trees, BRL rule lists take the form of a hierarchical series of if-then-else statements, where model emissions correspond to the successful association of a sample to a given rule. BRL results in models that are inspired by, and therefore similar to, standard human-built decision-making algorithms.

While BRL is by itself an interesting model to try on clinical psychiatry datasets, it relies on the existence of a set of rules from which the actual rule lists are built, similar to the approach taken by other associative classification methods [Liu, Hsu, and Ma1998, Yin and Han2003, Li, Han, and Pei2001]. Frequent pattern mining has been the standard tool to build such an initial set of rules, with methods like Apriori [Agrawal and Srikant1994] and FP-Growth [Han, Pei, and Yin2000] commonly used to extract rules from categorical datasets. However, frequent pattern mining methods do not scale well for wide datasets, i.e., datasets where the total number of categorical features is much larger than the number of samples, commonly denoted as p ≫ n. Most clinical healthcare datasets are wide, and thus require new mining methods to enable the use of BRL in this research area.
In this paper we propose a new rule mining technique that is not based on the frequency with which certain categories simultaneously appear. Instead, we use Multiple Correspondence Analysis (MCA) [Greenacre1984, Greenacre and Blasius2006], a particular application of correspondence analysis to categorical datasets, to establish a similarity score between different associative rules. We show that our new MCA-miner method is significantly faster than commonly used frequent pattern mining methods, and that it scales well to wide datasets. Moreover, we show that MCA-miner performs as well as other miners when used together with BRL. Finally, we use MCA-miner and BRL to analyze a transdiagnostic dataset for psychiatric disorders, building both interpretable and accurate predictors to support clinician screening tasks.
The remainder of this paper is organized as follows. In Sec. 2 we lay out the problem of constructing a rule-based classifier and establish the notation we use throughout the paper. We illustrate the algorithmic structure of MCA-miner in Sec. 3, and discuss the mathematical property that allows pruning rules based on coordinate scores. We compare the performance of our new method against standard benchmark datasets in Sec. 4, and we study new data-driven screening models for psychiatric disorders in Sec. 5.

2 Problem Description and Definitions
We begin by introducing relevant notation and definitions used throughout this paper. An attribute, denoted a, is a categorical property of each data sample, which can take a discrete and finite number of values, denoted V(a). A literal is a Boolean statement checking whether an attribute takes a given value, e.g., given an attribute a with categorical values {v₁, v₂} we can define the following literals: "a is v₁" and "a is v₂". Given a collection of attributes {a₁, …, a_m}, a data sample is a list of m categorical values, one per attribute. A rule, denoted r, is a collection of literals, with length |r|, which is used to produce Boolean evaluations of data samples as follows: a rule evaluates to True whenever all of its literals are also True, and evaluates to False otherwise.
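The literal and rule semantics above translate directly into code. The following minimal sketch (attribute names and values are hypothetical, for illustration only) shows a rule as a conjunction of literals over a dict-valued sample:

```python
# A literal checks whether one attribute takes one specific value; a rule is a
# collection of literals that is True only when all of them are True.

def make_literal(attribute, value):
    """Return a Boolean predicate checking `sample[attribute] == value`."""
    return lambda sample: sample[attribute] == value

def rule_evaluates(rule, sample):
    """A rule (collection of literals) is True iff all its literals are True."""
    return all(literal(sample) for literal in rule)

# A hypothetical rule of length 2 over two hypothetical attributes.
rule = [make_literal("smoker", "yes"), make_literal("age_band", "40-60")]
# rule_evaluates(rule, {"smoker": "yes", "age_band": "40-60"}) -> True
# rule_evaluates(rule, {"smoker": "no", "age_band": "40-60"})  -> False
```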
In this paper we consider the problem of efficiently building rule lists for data sets with a large total number of categories among all attributes (i.e., p = Σⱼ |V(aⱼ)| ≫ n), a common situation among data sets related to health care or pharmacology. Given n data samples, we represent a data set as a matrix D with dimensions n × m, where Dᵢⱼ is the category assigned to the i-th sample for the j-th attribute. We also consider a categorical label for each data sample, collectively represented as a vector y with length n. We denote the number of label categories by L, where L ≥ 2. If L = 2 then we are in the presence of a standard binary classification problem. If, instead, L > 2, then we solve a multiclass classification problem, as shown in Sec. 5.

2.1 Bayesian Rule Lists
Bayesian Rule Lists (BRL) is a framework proposed by Rudin et al. [Rudin, Letham, and Madigan2013, Letham et al.2013, Letham et al.2015] to build lists of rules for data sample classification. An example of a BRL output trained on the commonly used Titanic survival data set [Hendricks2015], as shown in [Letham et al.2015], is included in Fig. 1.
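To make the rule-list mechanics concrete, the sketch below shows how a trained list like the one in Fig. 1 is consulted at prediction time: the first rule whose condition matches a sample emits that rule's label distribution. The conditions and probabilities here are hypothetical stand-ins, not the actual Fig. 1 output:

```python
# A rule list is an ordered sequence of (condition, label distribution) pairs
# plus a default distribution used when no rule matches.
rule_list = [
    (lambda s: s["sex"] == "male" and s["age"] == "adult",
     {"survived": 0.2, "died": 0.8}),
    (lambda s: s["class"] == "3rd",
     {"survived": 0.4, "died": 0.6}),
]
default = {"survived": 0.5, "died": 0.5}

def predict(sample):
    """Return the label distribution of the first matching rule."""
    for condition, distribution in rule_list:
        if condition(sample):
            return distribution
    return default
```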
Although BRL is a significant step forward in the development of explainable AI methods, searching over the configuration space of all possible rules containing all possible combinations of literals obtained from a given data set is simply infeasible. Letham et al. [Letham et al.2015] offer a good compromise solution to this problem, where first a set of rules is mined from a data set, and then BRL searches over the configuration space of combinations of the prescribed set of rules using a custom-built MCMC algorithm.
While efficient rule mining methods are available in the literature, such as Apriori and FP-Growth, as explained in Sec. 1, we have found that such methods fail to execute on data sets with a large total number of categories, due to either unacceptably long computation time or prohibitively high memory usage. We show an example of this situation in Sec. 5.
In this paper we build upon the method in [Letham et al.2015], developing two improvements. First, we propose a novel rule mining algorithm based on Multiple Correspondence Analysis that is both computationally and memory efficient, enabling us to apply BRL on datasets with a large total number of categories. Our MCA-based rule mining algorithm is explained in detail in Sec. 3.

Second, we parallelized the MCMC search method in BRL by executing individual Markov chains in separate CPU cores of a computer. Moreover, we periodically check the convergence of the multiple chains using the generalized Gelman & Rubin convergence criterion
[Brooks and Gelman1998, Gelman and Rubin1992], stopping the execution once the criterion is met. As shown in Fig. 4, our implementation is significantly faster than the original single-core version, enabling the study of more data sets with longer rules or a large number of features.

3 MCA-based Rule Mining
Multiple Correspondence Analysis (MCA) [Greenacre1984, Greenacre and Blasius2006]
is a method that applies the power of Correspondence Analysis (CA) to categorical data sets. For the purpose of this paper it is important to note that MCA is the application of CA to the indicator matrix of all categories in the set of attributes, thus generating principal vectors projecting each of those categories into a Euclidean space. We use these principal vectors to build a heuristic merit function over the set of all rules available given the categories in a data set. Moreover, the structure of our merit function enables us to efficiently mine the best rules, as detailed below.
3.1 Rule Score Calculation
First, we define the extended data matrix D̄ concatenating D and y, with dimensions n × (m + 1). We then compute the MCA principal vectors for each category present in D̄. Let us call the MCA principal vectors associated with each categorical value the categorical vectors, denoted v_x for each literal x, and the MCA principal vectors associated with the label categories the label vectors, denoted w_l for each label category l.
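The category principal coordinates can be obtained by applying plain CA to the one-hot indicator matrix, as the section describes. The following is a minimal numpy sketch of that computation under our own naming; the paper's exact MCA implementation may differ (e.g., in eigenvalue corrections or number of retained components):

```python
import numpy as np

def mca_category_vectors(indicator, n_components=2):
    """Correspondence analysis of a one-hot indicator matrix Z
    (samples x total categories). Returns one principal-coordinate vector
    per category (one row per column of Z), projecting each category into
    a Euclidean space."""
    P = indicator / indicator.sum()
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals with respect to the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    # Column (category) principal coordinates: D_c^{-1/2} V diag(sigma).
    return (Vt.T[:, :n_components] * sigma[:n_components]) / np.sqrt(c)[:, None]

# Hypothetical wide-format categorical data: 2 attributes with 2 and 3 levels.
A = np.array([0, 1, 0, 1, 0, 1])
B = np.array([0, 1, 2, 0, 1, 2])
Z = np.hstack([np.eye(2)[A], np.eye(3)[B]])   # indicator matrix, 6 x 5
vectors = mca_category_vectors(Z)             # one 2-D vector per category
```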
Since each category can be mapped to a literal statement, as explained in Sec. 2, these principal vectors serve as a heuristic to evaluate the quality of a given literal to predict a label, as suggested in [Zhu et al.2010]. Therefore, we compute a score between each categorical vector v_x and each label vector w_l as the cosine of their angle:
score(x, l) = ⟨v_x, w_l⟩ / (‖v_x‖ ‖w_l‖).    (1)
Note that in the context of random variables, score(x, l) is equivalent to the correlation between the two principal vectors [Loève1977].

We compute the score between a rule r and a label category l, denoted score(r, l), as the average of the scores between the literals in r and the same label category, i.e.:
score(r, l) = (1 / |r|) Σ_{x ∈ r} score(x, l).    (2)
Finally, we search the configuration space of rules built using the combinations of all available literals in a data set such that |r| ≤ R, where R is the maximum rule length, and identify those with the highest scores for each label category. These top rules are the output of our miner, and are passed to the BRL method as the set of rules from which rule lists are built.
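Under the definitions above, scoring a candidate rule reduces to a few lines. A minimal numpy sketch (the principal vectors here are hypothetical stand-ins for actual MCA output, and the function names are ours):

```python
import numpy as np

def cosine(u, v):
    """Eq. (1): cosine of the angle between a categorical vector and a label vector."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rule_score(literal_vectors, label_vector):
    """Eq. (2): average cosine score over the literals in a rule."""
    return sum(cosine(v, label_vector) for v in literal_vectors) / len(literal_vectors)

# Hypothetical 2-D principal vectors for a rule with two literals.
literal_vectors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
label_vector = np.array([1.0, 0.0])
# The first literal is perfectly aligned with the label (score 1), the second
# is orthogonal (score 0), so the rule score is their average, 0.5.
```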
3.2 Rule Pruning
Since the number of rules generated by all combinations of all available literals up to length R is excessively large even for modest values of R, our miner includes two conditions under which we efficiently eliminate rules from consideration.
First, similar to the approach in FP-Growth [Han, Pei, and Yin2000] and other popular miners, we eliminate rules whose support over each label category is smaller than a user-defined threshold t_min. Recall that the support of a rule r for label category l, denoted supp(r, l), is the fraction of data samples associated to label l on which the rule evaluates to True. Given a rule r, note that the support of every other rule r′ containing the collection of literals in r satisfies supp(r′, l) ≤ supp(r, l). Hence, once a rule fails our minimum support test, we stop considering all rules longer than r that also contain all the literals in r.
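The anti-monotonicity of support that justifies this pruning is easy to see in code. A minimal sketch (literals modeled as predicates over dict-valued samples; names are ours):

```python
def support(rule, samples):
    """Fraction of the given (single-label) samples on which rule r is True."""
    return sum(all(lit(s) for lit in rule) for s in samples) / len(samples)

def worth_extending(rule, samples, t_min):
    """Since supp(r') <= supp(r) for any extension r' of r, a rule below the
    support threshold t_min also rules out all of its extensions."""
    return support(rule, samples) >= t_min

# Hypothetical samples for one label category, a rule, and one extension of it.
samples = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}]
r = [lambda s: s["a"] == 1]
r_ext = r + [lambda s: s["b"] == 1]
# Adding a literal can only shrink support: supp(r_ext) <= supp(r).
```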
Second, we eliminate rules whose score is smaller than a user-defined threshold s_min. Now, suppose that we want to build a new rule by taking a rule r and adding a literal x. In that case, given a label category l, the score of this extended rule must satisfy:
score(r ∪ {x}, l) = (|r| · score(r, l) + score(x, l)) / (|r| + 1).    (3)
Let s_max,l be the largest score for label category l among all available literals; then we can predict that an extension of r can have a score greater than s_min only if:
score(r, l) ≥ ((|r| + 1) · s_min − s_max,l) / |r|.    (4)
Finally, given the maximum number K of rules to be mined per label, we recompute s_min as we iterate combining literals to build new rules. Indeed, we periodically sort the scores of our temporary list of candidate rules and set s_min equal to the score of the K-th rule in the sorted list. As s_min increases due to better candidate rules becoming available, the condition in eq. (4) becomes more restrictive, resulting in fewer rules being considered and therefore in a faster overall mining.
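The incremental structure of eqs. (3) and (4) is what makes this check cheap: the extended score is a running average, and the bound only needs the current rule's score, its length, and the best literal score. A minimal sketch (function names are ours):

```python
def extended_score(score_r, rule_len, score_x):
    """Eq. (3): score of a rule r with |r| = rule_len literals after
    adding one literal x with score score_x."""
    return (rule_len * score_r + score_x) / (rule_len + 1)

def extension_can_reach(score_r, rule_len, s_max, s_min):
    """Eq. (4): no extension of r can score >= s_min unless
    score(r, l) >= ((|r| + 1) * s_min - s_max) / |r|, because no available
    literal scores above s_max for this label."""
    return score_r >= ((rule_len + 1) * s_min - s_max) / rule_len

# With score(r, l) = 0.5, |r| = 2, and best literal score 0.9, the best
# possible extension scores (2 * 0.5 + 0.9) / 3, i.e. about 0.633.
```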
4 Benchmark Experiments
Dataset  | n      | m  | p   | FP-Growth + BRL        | MCA-miner + BRL
         |        |    |     | Acc.  | AUC  | Time    | Acc.  | AUC  | Time
Adult    | 45,222 | 14 | 111 | 0.81  | 0.85 | 512     | 0.81  | 0.85 | 115
ASD      | 248    | 21 | 89  | 0.87  | 0.90 | 198     | 0.87  | 0.90 | 16
Cancer   | 569    | 32 | 150 | 0.92  | 0.97 | 168     | 0.92  | 0.94 | 22
Heart    | 303    | 13 | 49  | 0.82  | 0.86 | 117     | 0.82  | 0.86 | 15
HIV      | 5,840  | 8  | 160 | 0.87  | 0.88 | 449     | 0.87  | 0.88 | 36
Titanic  | 2,201  | 3  | 8   | 0.79  | 0.76 | 118     | 0.79  | 0.75 | 10
Our MCA-miner method in Fig. 2, when used together with BRL, offers the power of rule list interpretability while maintaining the predictive capabilities of already established machine learning methods.
We benchmark the performance and computational efficiency of our MCA-miner against the “Titanic” dataset [Hendricks2015], as well as the following 5 datasets available in the UCI Machine Learning Repository [Dheeru and Karra Taniskidou2017]: “Adult,” “Autism Screening Adult,” “Breast Cancer Wisconsin (Diagnostic),” “Heart Disease,” and “HIV-1 protease cleavage,” which due to space constraints we designate as Adult, ASD, Cancer, Heart, and HIV, respectively. These datasets represent a wide variety of real-world experiments and observations, enabling us to fairly compare our improvements against the original BRL implementation using the FP-Growth miner.

All 6 benchmark datasets correspond to binary classification tasks. We conduct the experiments using the same setup in each of the benchmarks. First, we transform the dataset into a format compatible with our BRL implementation. Second, we quantize all continuous attributes into either 2 or 3 categories, while keeping the original categories of all other variables. It is worth noting that, depending on the dataset and how its data was originally collected, we prioritize the existing taxonomy and expert domain knowledge to generate the continuous-variable quantization; we simply generate a balanced quantization when no other information is available. Third, we train and test a model using 5-fold cross-validation, reporting the average accuracy and Area Under the ROC Curve (AUC) as model performance measurements.
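The balanced quantization used when no domain taxonomy is available amounts to quantile-based binning. A minimal numpy sketch of our own (this is an illustration of the idea, not the paper's code):

```python
import numpy as np

def balanced_quantize(values, n_bins):
    """Quantize a continuous attribute into n_bins categories of roughly
    equal size using empirical quantiles as the bin edges."""
    # Interior quantile edges, e.g. the median for n_bins = 2.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)   # category index in 0 .. n_bins - 1

values = np.arange(100.0)               # hypothetical continuous attribute
labels = balanced_quantize(values, 2)   # two categories of 50 samples each
```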
Table 1 presents the empirical result of comparing both implementations.
The notation in the table follows the definitions in Sec. 2.
To guarantee a fair comparison between both implementations, we fixed the same maximum rule length and minimum support for both methods, and for MCA-miner we also fixed its score threshold and maximum number of mined rules per label.
Our multi-core implementations of both MCA-miner and BRL were executed on 6 parallel processes, and stopped only once the Gelman & Rubin parameter [Brooks and Gelman1998] satisfied its convergence threshold.
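For reference, the Gelman & Rubin check for a scalar quantity traced by several chains can be sketched as below. This is the textbook potential-scale-reduction computation, not the paper's exact implementation (which uses the generalized criterion of Brooks & Gelman):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for a scalar quantity traced
    by several Markov chains (array of shape [n_chains, n_samples]).
    Values near 1 indicate the chains have mixed."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                       # 4 well-mixed chains
stuck = mixed + np.array([[0.0], [5.0], [10.0], [15.0]]) # 4 separated chains
```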
We ran all the experiments using a single AWS EC2 c5.18xlarge
instance with 72 cores.
It is clear from our experiments in Table 1 that our MCA-miner matches the performance of FP-Growth in each case, while significantly reducing the computation time required to mine rules and train a BRL model.
5 Transdiagnostic Screen for Mental Health
The Consortium for Neuropsychiatric Phenomics (CNP) [Poldrack et al.2016] is a research project aimed at understanding shared and distinct neurobiological characteristics among multiple diagnostically distinct patient populations. Four groups of subjects are included in the study: healthy controls (HC), Schizophrenia patients (SCHZ), Bipolar Disorder patients (BD), and Attention Deficit and Hyperactivity Disorder patients (ADHD). Our goal in analyzing the CNP dataset is to develop interpretable and effective screening tools to identify the diagnosis of these three psychiatric disorders in patients.
5.1 CNP SelfReported Instruments Dataset
Among other data modalities, the CNP study includes each subject's responses to 578 individual questions, belonging to 13 self-report clinical questionnaires [Poldrack et al.2016]. The 13 questionnaires are the following (in alphabetical order):

Adult ADHD SelfReport Screener (ASRS),

Barratt Impulsiveness Scale (Barratt),

Chapman Perceptual Aberration Scale (ChapPer),

Chapman Physical Anhedonia Scale (ChapPhy),

Chapman Social Anhedonia Scale (ChapSoc),

Dickman Functional and Dysfunctional Impulsivity Inventory (Dickman),

Eysenck’s Impulsivity Inventory (Eysenck),

Golden & Meehl’s 7 MMPI Items Selected by Taxonomic Method (Golden),

Hopkins Symptom Check List (Hopkins),

Hypomanic Personality Scale (Hypomanic),

Multidimensional Personality Questionnaire – Control Subscale (MPQ),

Temperament and Character Inventory (TCI), and

Scale for Traits that Increase Risk for Bipolar II Disorder (BipolarII).
The details of these questionnaires are beyond the scope of this paper, and due to space constraints we abbreviate the individual questions using the name in parentheses in the list above together with the question number. For example, Hopkins#57 denotes the 57th question in the “Hopkins Symptom Check List” questionnaire.
Depending on the particular clinical questionnaire, each question results in either a binary answer (i.e., True or False) or an integer rating (e.g., from 1 to 5). We used each question as a literal attribute, resulting in a range from 2 to 5 categories per attribute.
5.2 Performance Benchmark
Rather than pruning the number of attributes a priori to reduce the search space for both the rule miner and BRL, we applied our novel MCA-miner to identify the best rules over the complete search space of literal combinations. Note that this results in a challenging problem for most machine learning algorithms, since this is a wide dataset with more features than samples, i.e., p ≫ n. Indeed, just generating all rules with 3 literals from this dataset results in approximately 23 million rules. Fig. 3 compares the wall execution time of our MCA-miner against three popular associative mining methods: FP-Growth, Apriori, and Carpenter, all using the implementations in the PyFIM package [Borgelt2012]. As shown in Fig. 3, while the associative mining methods are reasonably efficient on datasets with few features, they are incapable of handling more than roughly 70 features from the CNP dataset, resulting in out-of-memory errors or impractically long executions even on large-scale compute-optimized AWS EC2 instances. In comparison, MCA-miner empirically exhibits a growth rate compatible with datasets much larger than CNP, as it runs many orders of magnitude faster than the associative mining methods. It is worth noting that while FP-Growth is shown as the fastest associative mining method in [Borgelt2012], its scaling behavior vs. the number of attributes is practically the same as Apriori's in our experiments.
In addition to the increased performance due to MCA-miner, we also improved the implementation of the BRL training MCMC algorithm by running parallel Markov chains simultaneously in different CPU cores, as explained in Sec. 2.1. Fig. 4 shows the BRL training time comparison, given the same rule set and both using 6 chains, between our multi-core implementation and the original single-core implementation reported in [Letham et al.2015]. Also, Fig. 5 shows that the multi-core implementation's convergence wall time scales linearly with the number of Markov chains. While both implementations display a similar growth rate as the rule set size increases, our multi-core implementation is roughly 3 times faster in this experiment.
5.3 Interpretable Transdiagnostic Classifiers
In the interest of building the best possible transdiagnostic screening tool for the three types of psychiatric patients present in the CNP dataset, we build three different classifiers. First, we build a binary classifier to separate HC from the set of Patients, defined as the union of SCHZ, BD, and ADHD subjects. Second, we build a multiclass classifier to directly separate all four original categorical labels available in the dataset. Finally, we evaluate the performance of the multiclass classifier by repeating the binary classification task and comparing the results. In addition to using Accuracy and AUC as performance metrics as in Sec. 4, we also report Cohen's κ coefficient [Cohen1960] as another indication of the effect size of our classifiers. Cohen's κ is compatible with both binary and multiclass classifiers. It ranges from −1 (complete misclassification) to 1 (perfect classification), with 0 corresponding to a chance classifier. To avoid a biased precision calculation, we subsample the dataset to balance out each label, resulting in the same number of subjects for each of the four classes. Finally, we use 5-fold cross-validation to ensure the robustness of our training and testing methodology.
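Cohen's κ is straightforward to compute from predicted and true labels: it is the observed agreement corrected for the agreement expected by chance. A minimal numpy sketch (scikit-learn's `cohen_kappa_score` computes the same quantity):

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_obs - p_chance) / (1 - p_chance), where p_obs is the
    observed agreement and p_chance the agreement expected from the marginal
    label frequencies. Works for binary and multiclass labels alike."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    p_obs = np.mean(y_true == y_pred)
    p_chance = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)
    return float((p_obs - p_chance) / (1 - p_chance))
```

Perfect agreement yields κ = 1, while a constant (chance-level) predictor on balanced labels yields κ = 0, which is why κ is a useful effect-size complement to raw accuracy.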
Binary classifier
Classifier       | Accuracy | AUC  | Cohen's κ
MCA-miner + BRL  | 0.79     | 0.82 | 0.58
Random Forest    | 0.75     | 0.85 | 0.51
Boosted Trees    | 0.79     | 0.87 | 0.59
Decision Tree    | 0.71     | 0.71 | 0.43
Besides using our proposed MCA-miner together with BRL to build an interpretable rule list, we also benchmark its performance against other commonly used machine learning algorithms compatible with categorical data, applied using the Scikit-learn [Pedregosa et al.2011] implementations with default parameters. As shown in Table 2, our method is statistically as good as, if not better than, the other methods we compared against.
The rule list generated using MCA-miner and BRL is shown in Fig. 7, as is a breakdown of the number of subjects classified by each rule in the list. The detailed description of the questions in Fig. 7 is given in Table 3. Note that most subjects are classified with high probability by the top two rules, which is a very useful feature in situations where fast clinical screening is required.
Multiclass classifier
Fig. 9 shows the output rule list after training a BRL model using all 4 labels in the CNP dataset, as explained above. Note that the rules in Fig. 9 emit the maximum likelihood estimate corresponding to the multinomial distribution generated by the same rule in the BRL model, since this is the most useful output for practical clinical use. We report the accuracy and Cohen's κ of our MCA-miner with BRL classifier after 5-fold cross-validation. Fig. 10 shows the average confusion matrix for the multiclass classifier over all 5 cross-validation testing cohorts. The actual questions referenced in the rule list in Fig. 9 are detailed in Table 3.

Label  Question  Answer type

Barratt#12  I am a careful thinker  1 (rarely) to 4 (almost always) 
BipolarII#1  My mood often changes, from happiness to sadness, without my knowing why  Boolean 
BipolarII#2  I have frequent ups and downs in mood, with and without apparent cause  Boolean 
ChapSoc#9  I sometimes become deeply attached to people I spend a lot of time with  Boolean 
ChapSoc#13  My emotional responses seem very different from those of other people  Boolean 
Dickman#22  I don’t like to do things quickly, even when I am doing something that is not very difficult  Boolean 
Dickman#28  I often get into trouble because I don’t think before I act  Boolean 
Dickman#29  I have more curiosity than most people  Boolean 
Eysenck#1  Weakness in parts of your body  Boolean 
Golden#1  I have not lived the right kind of life  Boolean 
Hopkins#39  Heart pounding or racing  0 (not at all) to 3 (extremely) 
Hopkins#56  Weakness in parts of your body  0 (not at all) to 3 (extremely) 
Hypomanic#1  I consider myself to be an average kind of person  Boolean 
Hypomanic#8  There are often times when I am so restless that it is impossible for me to sit still  Boolean 
TCI#231  I usually stay away from social situations where I would have to meet strangers, even if I am assured that they will be friendly  Boolean 
The interpretability and transparency of the rule list in Fig. 9 enables us to obtain further insights regarding the population in the CNP dataset. Indeed, similar to the binary classifier, Fig. 9 shows the mapping of all CNP subjects using the 4-class rule list. While the accuracy of the rule list as a multiclass classifier is not perfect, it is worth noting that just 7 questions out of a total of 578 are enough to produce a relatively balanced output among the rules, while significantly separating the label categories.
Also note that even though each of the 13 questionnaires in the dataset has been thoroughly tested in the literature as a clinical instrument to detect and evaluate different traits and behaviors, the 7 questions picked by our rule list do not favor any questionnaire in particular. This is an indication that transdiagnostic classifiers are best obtained from different sources of data, and will likely improve their performance as other modalities, such as mobile digital inputs, are included in the dataset.
Binary classification using multiclass rule list
We further evaluate the performance of the multiclass classifier in Fig. 9 by using it as a binary classifier, i.e., we replace the ADHD, BD, and SCHZ labels with Patients. Using the same 5-fold cross-validated models obtained in the multiclass section above, we compute their performance as binary classifiers, obtaining accuracy, AUC, and Cohen's κ values on par with those in Table 2, showing that our method does not decrease its performance when more categorical labels are added.
6 Discussion
In this paper we propose a novel methodology to analyze categorical datasets with a large number of attributes, a property that is prevalent in clinical psychiatry. Our contributions consist of a novel MCA-based rule mining method with excellent scaling properties with respect to the number of categorical attributes, and a new implementation of the BRL algorithm using multi-core parallel execution. We then study the CNP dataset for psychiatric disorders using our new methodology, resulting in rule-based interpretable classifiers capable of screening patients from self-reported questionnaire data. Our results not only show the viability of building interpretable models for state-of-the-art clinical psychiatry datasets, but also that these models can be scaled to larger datasets to understand the interactions and differences between these disorders.
References
 [Agrawal and Srikant1994] Agrawal, R., and Srikant, R. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, 487–499.

 [Anthimopoulos et al.2016] Anthimopoulos, M.; Christodoulidis, S.; Ebner, L.; Christe, A.; and Mougiakakou, S. 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Transactions on Medical Imaging 35(5):1207–1216.
 [Beam and Kohane2018] Beam, A. L., and Kohane, I. S. 2018. Big data and machine learning in health care. JAMA 319(13):1317–1318.
 [Borgelt2012] Borgelt, C. 2012. Frequent item set mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(6):437–456.
 [Brooks and Gelman1998] Brooks, S. P., and Gelman, A. 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4):434–455.
 [Campolo et al.2017] Campolo, A.; Sanfilippo, M.; Whittaker, M.; and Crawford, K. 2017. AI Now 2017 report. AI Now Institute at New York University.
 [Cohen1960] Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37–46.
 [Dheeru and Karra Taniskidou2017] Dheeru, D., and Karra Taniskidou, E. 2017. UCI machine learning repository.
 [Dietterich2000] Dietterich, T. G. 2000. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40(2):139–157.
 [Gelman and Rubin1992] Gelman, A., and Rubin, D. B. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7(4):457–472.
 [Gilpin et al.2018] Gilpin, L. H.; Bau, D.; Yuan, B. Z.; Bajwa, A.; Specter, M.; and Kagal, L. 2018. Explaining explanations: An approach to evaluating interpretability of machine learning. ArXiv Preprints.
 [Greenacre and Blasius2006] Greenacre, M. J., and Blasius, J. 2006. Multiple correspondence analysis and related methods. Chapman & Hall/CRC.
 [Greenacre1984] Greenacre, M. J. 1984. Theory and Applications of Correspondence Analysis. Academic Press.
 [Gunning2017] Gunning, D. 2017. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA).
 [Han, Pei, and Yin2000] Han, J.; Pei, J.; and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 1–12.
 [Hendricks2015] Hendricks, P. 2015. titanic: Titanic Passenger Survival Data Set. R package version 0.1.0.
 [Hopfield1988] Hopfield, J. J. 1988. Artificial neural networks. IEEE Circuits and Devices Magazine 4(5):3–10.
 [LeCun, Bengio, and Hinton2015] LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature 521(7553):436.
 [Letham et al.2013] Letham, B.; Rudin, C.; McCormick, T. H.; and Madigan, D. 2013. An interpretable stroke prediction model using rules and bayesian analysis. In Proceedings of the 17th AAAI Conference on LateBreaking Developments in the Field of Artificial Intelligence, 65–67.
 [Letham et al.2015] Letham, B.; Rudin, C.; McCormick, T. H.; and Madigan, D. 2015. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics 9(3):1350–1371.
 [Li, Han, and Pei2001] Li, W.; Han, J.; and Pei, J. 2001. CMAR: Accurate and efficient classification based on multiple classassociation rules. In Proceedings of the 2001 IEEE International Conference on Data Mining, 369–376.
 [Liu, Hsu, and Ma1998] Liu, B.; Hsu, W.; and Ma, Y. 1998. Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 80–86.
 [Loève1977] Loève, M. 1977. Probability Theory I. Number 45 in Graduate Texts in Mathematics. Springer.
 [Morcos et al.2018] Morcos, A. S.; Barrett, D. G.; Rabinowitz, N. C.; and Botvinick, M. 2018. On the importance of single directions for generalization. ArXiv Preprints.
 [Pedregosa et al.2011] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikitlearn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830.
 [Poldrack et al.2016] Poldrack, R. A.; Congdon, E.; Triplett, W.; Gorgolewski, K. J.; Karlsgodt, K. H.; Mumford, J. A.; Sabb, F. W.; Freimer, N. B.; London, E. D.; Cannon, T. D.; and Bilder, R. M. 2016. A phenomewide examination of neural and cognitive function. Scientific Data 3:160110.
 [Rudin, Letham, and Madigan2013] Rudin, C.; Letham, B.; and Madigan, D. 2013. Learning theory analysis for association rules and sequential event prediction. Journal of Machine Learning Research 14:3441–3492.
 [Valdes et al.2016] Valdes, G.; Luna, J. M.; Eaton, E.; Simone, C. B., II; Ungar, L. H.; and Solberg, T. D. 2016. MediBoost: a patient stratification tool for interpretable decision making in the era of precision medicine. Scientific Reports 6:37854.
 [Yin and Han2003] Yin, X., and Han, J. 2003. CPAR: Classification based on predictive association rules. In Proceedings of the 2003 SIAM International Conference on Data Mining, 331–335.
 [Zhu et al.2010] Zhu, Q.; Lin, L.; Shyu, M.L.; and Chen, S.C. 2010. Feature selection using correlation and reliability based scoring metric for video semantic detection. In Proceedings of the IEEE Fourth International Conference on Semantic Computing, 462–469.