There is a soaring demand for statistical analysis tools that are robust against contamination while preserving individual data owners' privacy. Although both topics host a rich body of literature, to the best of our knowledge, we are the first to systematically study the connections between optimality under Huber's contamination model and local differential privacy (LDP) constraints. We start with a general minimax lower bound result, which disentangles the costs of being robust against Huber's contamination and of preserving LDP. We further study four concrete examples: a two-point testing problem, a potentially diverging mean estimation problem, a nonparametric density estimation problem and a univariate median estimation problem. For each problem, we demonstrate procedures that are optimal in the presence of both contamination and LDP constraints, comment on the connections with state-of-the-art methods that have been studied under only contamination or only privacy constraints, and unveil the connections between robustness and LDP by partially answering whether LDP procedures are robust and whether robust procedures can be efficiently privatised. Overall, our work showcases a promising prospect for the joint study of robustness and local differential privacy.
This is joint work with Mengchu Li and Yi Yu.
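As a concrete reference point for the privacy constraint, here is a minimal sketch of the standard non-interactive clipped-Laplace LDP mechanism for mean estimation under Huber contamination. This is not the optimal procedure from the talk; the clipping bound B, the privacy level epsilon and the contamination setup are illustrative assumptions.

```python
import numpy as np

def ldp_mean(x, epsilon, B=1.0, rng=None):
    """Non-interactive epsilon-LDP mean estimate via the Laplace mechanism.

    Each data owner clips their value to [-B, B] (sensitivity 2B) and
    releases it with additive Laplace noise of scale 2B/epsilon; the
    analyst simply averages the privatised reports.
    """
    rng = np.random.default_rng(rng)
    clipped = np.clip(np.asarray(x, dtype=float), -B, B)
    reports = clipped + rng.laplace(scale=2 * B / epsilon, size=len(clipped))
    return reports.mean()

# Huber contamination: a 5% fraction of the sample is replaced by outliers.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(0.3, 1.0, size=n)
x[: n // 20] = 50.0                     # gross outliers
print(ldp_mean(x, epsilon=1.0))         # clipping already tempers the outliers
```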
62C20 ; 62G35 ; 62G10
In this talk we consider high-dimensional classification. We first discuss high-dimensional binary classification by sparse logistic regression, propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size, and derive non-asymptotic bounds for the resulting misclassification excess risk. Implementing any complexity-penalty-based criterion, however, requires a combinatorial search over all possible models. To obtain a model selection procedure that is computationally feasible for high-dimensional data, we consider logistic Lasso and Slope classifiers and show that they also achieve the optimal rate. We further extend the proposed approach to multiclass classification by sparse multinomial logistic regression.
This is joint work with Vadim Grinshtein and Tomer Levy.
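To make the logistic Lasso classifier concrete, here is a minimal sketch using scikit-learn's l1-penalised logistic regression on simulated sparse data. The sample size, sparsity level and penalty strength C are illustrative assumptions, and the Slope penalty is omitted as it is not available in scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p, s = 200, 1000, 5            # high-dimensional regime: p >> n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 2.0                    # only the first s features are active
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# Logistic Lasso: l1-penalised maximum likelihood, a convex surrogate for
# the combinatorial complexity-penalised model selection criterion.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("selected features:", np.flatnonzero(clf.coef_))
```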
62H30 ; 62C20
We consider the problem of estimating the mean vector of the multivariate complex normal distribution with unknown covariance matrix under an invariant loss function when the sample size is smaller than the dimension of the mean vector. Following the approach of Chételat and Wells (2012, Ann. Statist., pp. 3137–3160), we show that a modification of Baranchik-type estimators beats the MLE if it satisfies certain conditions. Based on this result, we propose James-Stein-like shrinkage and positive-part estimators.
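The talk's setting is the complex normal distribution with unknown covariance and sample size below the dimension; as a simpler reference point, the sketch below implements the classical real-valued positive-part James-Stein estimator with known variance, the textbook estimator that the proposed procedures generalise.

```python
import numpy as np

def positive_part_james_stein(x, sigma2=1.0):
    """Classical positive-part James-Stein estimate of a normal mean.

    Shrinks a single observation x ~ N(theta, sigma2 * I_p) towards zero;
    taking the positive part prevents the shrinkage factor from turning
    negative and uniformly improves on plain James-Stein.
    """
    p = x.size
    factor = 1.0 - (p - 2) * sigma2 / np.sum(x ** 2)
    return max(factor, 0.0) * x

rng = np.random.default_rng(2)
theta = np.zeros(50)
x = theta + rng.normal(size=50)
est = positive_part_james_stein(x)
print(np.sum((x - theta) ** 2), np.sum((est - theta) ** 2))  # MLE vs shrinkage loss
```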
62F10 ; 62C20 ; 62H12
When performing multiple testing, adjusting the distribution of the null hypotheses is ubiquitous in applications. However, the effect of such an operation remains largely unknown, especially in terms of the false discovery proportion (FDP) and the true discovery proportion (TDP). In this talk, we explore this issue in the most classical case, where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) and where the Benjamini-Hochberg (BH) procedure is applied after a data-rescaling step.
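A minimal sketch of the pipeline under study: standardise the observations with data-driven estimates of the null mean and variance, then apply the Benjamini-Hochberg step-up procedure to the resulting p-values. The plain mean/standard-deviation rescaling and the simulated alternative are illustrative assumptions, not necessarily the estimators analysed in the talk.

```python
import numpy as np
from scipy.stats import norm

def bh_after_rescaling(x, alpha=0.1):
    """Benjamini-Hochberg applied to empirically standardised statistics.

    The null mean and variance are unknown and estimated from the data,
    which is exactly the kind of rescaling step whose effect on the FDP
    and TDP is analysed in the talk.
    """
    z = (x - x.mean()) / x.std(ddof=1)        # data-driven rescaling
    pvals = 2 * norm.sf(np.abs(z))            # two-sided Gaussian p-values
    m = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(np.sort(pvals) <= thresh)[0]
    k = passed.max() + 1 if passed.size else 0
    return order[:k]                          # indices of rejected hypotheses

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 900), rng.normal(4, 1, 100)])
print("rejections:", len(bh_after_rescaling(x)))
```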
62G10 ; 62C20 ; 62G30
We discuss classification problems in high dimensions. We study classification problems using three classical notions: the complexity of the decision boundary, noise, and margin. We demonstrate that under suitable conditions on the decision boundary, classification problems can be approximated very efficiently, even in high dimensions. If a margin condition is assumed, then arbitrarily fast approximation rates can be achieved, despite the problem being high-dimensional and discontinuous. We extend the approximation results to learning results and show close-to-optimal learning rates for empirical risk minimization in high-dimensional classification.
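As a toy illustration of the three notions at play, the sketch below trains a small network on labels determined by a smooth function of two coordinates embedded in a 20-dimensional space, with points near the boundary removed to enforce a margin. The architecture, dimensions and margin width are illustrative assumptions, not the constructions from the talk.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
n, d = 2000, 20
X = rng.uniform(-1, 1, size=(n, d))
f = X[:, 0] ** 2 + np.sin(np.pi * X[:, 1]) - 0.3   # smooth decision boundary
keep = np.abs(f) > 0.1                              # margin: no mass near boundary
X, y = X[keep], (f[keep] > 0).astype(int)           # noiseless labels

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                    random_state=0).fit(X[:1000], y[:1000])
print("test error:", 1 - clf.score(X[1000:], y[1000:]))
```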
68T05 ; 62C20 ; 41A25 ; 41A46
In this short course, we will discuss the problem of ranking with partially observed pairwise comparisons in the setting of the Bradley-Terry-Luce (BTL) model. There are two fundamental problems: 1) top-K ranking, which is to select the set of K players with top performance; 2) total ranking, which is to rank the entire set of players. Both ranking problems find important applications in web search, recommender systems and sports competitions.
In the first presentation, we will consider the top-K ranking problem. The statistical properties of two popular algorithms, the MLE and rank centrality (spectral ranking), will be precisely characterized. In terms of both partial and exact recovery, the MLE achieves optimality with matching lower bounds. The spectral method is shown to be generally sub-optimal, though it has the same order of sample complexity as the MLE. Our theory also reveals essentially the only situation in which the spectral method is optimal: this turns out to be the most favorable choice of skill parameters given the separation of the two groups.
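A minimal sketch of the rank centrality algorithm of Negahban, Oh and Shah: build a Markov chain whose transition from player i to player j is proportional to the empirical fraction of games j won against i, and rank players by its stationary distribution. The conventions below (win-count matrix, lazy self-loops, simulated BTL data) are illustrative assumptions.

```python
import numpy as np

def rank_centrality(wins):
    """Spectral ranking from a pairwise win-count matrix.

    wins[i, j] = number of games in which j beat i.  The stationary
    distribution of the induced Markov chain estimates the BTL skill
    parameters up to normalisation.
    """
    games = wins + wins.T
    with np.errstate(divide="ignore", invalid="ignore"):
        phat = np.where(games > 0, wins / games, 0.0)   # j-beats-i fractions
    d_max = max(int((games > 0).sum(axis=1).max()), 1)
    P = phat / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))            # lazy self-loops
    vals, vecs = np.linalg.eig(P.T)                     # stationary distribution
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return np.argsort(-pi)                              # ranking, best first

# Simulate a small BTL instance: j beats i with probability w_j/(w_i + w_j).
rng = np.random.default_rng(5)
k = 8
w = np.exp(rng.normal(size=k))                          # skill parameters
wins = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        j_wins = rng.binomial(20, w[j] / (w[i] + w[j])) # 20 games per pair
        wins[i, j], wins[j, i] = j_wins, 20 - j_wins
print(rank_centrality(wins))
print(np.argsort(-w))                                   # true order, for comparison
```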
The second presentation will focus on total ranking. The problem is to find a permutation vector that ranks the entire set of players. We will show that the minimax rate of the problem with respect to the Kendall's tau loss exhibits a transition between an exponential rate and a polynomial rate, depending on the signal-to-noise ratio of the problem. The optimal algorithm consists of two stages. In the first stage, games with very high or low scores are used to partition the entire set of players into different leagues. In the second stage, games that are very close are used to rank the players within each league. We will give intuition and some analysis to show why the algorithm works optimally.
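For concreteness, the Kendall's tau loss for total ranking is the fraction of discordant pairs between the estimated and the true permutations. A small sketch using scipy, where the conversion (1 - tau)/2 is valid in the absence of ties:

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_loss(rank_a, rank_b):
    """Normalised Kendall's tau distance: the fraction of discordant pairs."""
    tau, _ = kendalltau(rank_a, rank_b)   # tau = 1 - 2 * (discordant fraction)
    return (1.0 - tau) / 2.0

print(kendall_loss([0, 1, 2, 3], [0, 1, 2, 3]))  # 0.0: identical rankings
print(kendall_loss([0, 1, 2, 3], [3, 2, 1, 0]))  # 1.0: fully reversed
```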
62C20 ; 62F07 ; 62J12