Learning and Testing Structured Probability Distributions (LTSPD)
Start date: 1 March 2014, End date: 28 February 2018, PROJECT COMPLETED

"The research topic of the current proposal lies within the area of Algorithms and Complexity.The goal of this proposal is to advance a research program of developingcomputationally efficient algorithms for learning and testinga wide range of natural and important classes of probability distributions.We live in an era of “big data,” where the amount of data that can be brought to bearon questions of biology, climate, economics, etc, is vast and expanding rapidly.Much of this raw data frequently consists of example points without corresponding labels.The challenge of how to make sense of this unlabeled data has immediate relevanceand has rapidly become a bottleneck in scientific understanding across many disciplines.An important class of big data is most naturally modeled as samplesfrom a probability distribution over a very large domain.This prompts the basic question:Given samples from some unknown distribution, what can we infer?While this question has been studied for several decadesby various different communities of researchers,both the number of samples and running time required for such estimation tasksare not yet well understood, even for some surprisingly simple types of discrete distributions.In this project we will develop computationally efficient algorithmsfor learning and testing various classes of discrete distributions over very large domains.Specific problems we will address include:(1) Developing efficient algorithms to learn and test probability distributions that satisfy variousnatural types of ""shape restrictions"" on the underlying probability density function.(2) Developing efficient algorithms for learning and testing complex distributions that resultfrom the aggregation of many independent simple sources of randomness.We believe that highly efficient algorithms for these estimation tasksmay play an important role for the next generation of large-scale machine learning applications."