Stanford researchers have developed a patented algorithm for general supervised learning.

About

Summary

Stanford researchers have developed a patented algorithm for general supervised learning. Training requires a learning sample in which both features and outcomes are given; missing values are allowed in the predictors but not in the outcomes. Once a decision tree has been built, it can be applied to cases for which only the features are known. The algorithm is particularly suited to complex data sets in which multiple factors, especially SNPs (single nucleotide polymorphisms) in a genetic setting, combine to determine the outcome. Such factors may contribute little individually yet have complicated and influential interactions.

Such cases are common in the real world. One good example is the association between genetic and environmental risk factors on the one hand and complex disease on the other. Traditional approaches that focus on individual effects have proven difficult to apply here. The FlexTree approach, by contrast, treats all risk factors together, considering suitably chosen interactions and main effects simultaneously. The technique stems from well-known binary classification tree methods: it uses the tree as its framework and applies penalized linear regression to suitably transformed features to define a partitioning rule. This allows complicated, optimally chosen interactions to be considered while keeping the model simple and easy to interpret. The variable selection procedure embedded in the algorithm improves its predictive power and robustness. In some applications, FlexTree demonstrates substantial improvement in performance over several other cutting-edge technologies.
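To make the idea of a regression-defined partitioning rule concrete, here is a minimal sketch in plain Python. It is not the patented FlexTree implementation: it assumes ridge regression as the penalty, a hypothetical -1/+1 coding of two factors, a product term as the "suitable transformation", and a split at score zero. All function names (solve, ridge_fit, with_interaction, split_rule) are illustrative only. The toy data are chosen so that neither factor predicts the outcome on its own, while their interaction does.

```python
# Illustrative sketch only; names, coding, and penalty are assumptions.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept.

    Returns (weights, intercept) solving (X'X + lam*I) w = X'y.
    """
    rows = [list(x) + [1.0] for x in X]      # append intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(p - 1):                   # penalize slopes, not intercept
        A[i][i] += lam
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = solve(A, b)
    return w[:-1], w[-1]


def with_interaction(x):
    """Assumed transformation: keep both main effects, add their product."""
    return [x[0], x[1], x[0] * x[1]]


def split_rule(X, labels, lam=1.0):
    """Fit a penalized linear score to +/-1 outcomes; split at score 0."""
    y = [1.0 if label else -1.0 for label in labels]
    w, b = ridge_fit(X, y, lam)

    def rule(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return "right" if score >= 0 else "left"

    return rule


# Toy learning sample: outcome is 1 only when the two factors agree.
cases = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [1, 0, 0, 1]

features = [with_interaction(c) for c in cases]
rule = split_rule(features, labels, lam=1.0)
print([rule(f) for f in features])  # ['right', 'left', 'left', 'right']
```

In this toy sample the fitted weights on the two main effects are zero and all of the signal sits on the product term, so a single split separates the outcomes. A full tree would apply such a rule recursively to each resulting subset; that recursion, and any variable selection step, are omitted here.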

 

Stage of Research


FlexTree has been successfully applied to finding genetic and environmental interactions that predispose Chinese women to hypertension. The technology is also being used to find genotypes and interactions that are predictive of a subset of cardiovascular diseases.
 

Applications


Data mining
Life sciences research - data analysis for:
  • SNPs
  • bioinformatics
  • mass spectroscopy
  • defining risk groups
  • statistics

Advantages


Powerful:
  • considers both interactions and combined effects simultaneously
  • provides a simple, easy-to-interpret model

Robust
