Nexus of Information and Computation Theories, Inference Problems Theme, March 7–18, 2016
Titles, Abstracts, and Videos
Alex Andoni (Columbia) (Video: Part 1, Part 2)
Sketching and Embeddings Abstract: Sketching for distance estimation is the problem where we need to design a possibly randomized function \(f\) from a metric space to short strings, such that from \(f(x)\) and \(f(y)\) we can estimate the distance between \(x\) and \(y\). This problem is a core problem in both the streaming and nearest neighbor search areas. We will discuss this problem and its connections to the theory of metric embeddings. In particular, we will discuss when and why sketching is equivalent to embedding into normed space such as \(\ell_1\).
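To make the setup concrete, here is a minimal illustrative sketch (not code from the talk; function names and parameters are ours) of the classical 1-stable (Cauchy) projection approach for \(\ell_1\): by 1-stability, each coordinate of \(f(x)-f(y)\) is a Cauchy variable scaled by \(\|x-y\|_1\), so the median of the absolute coordinate differences estimates the distance.

```python
import math
import random
import statistics

def cauchy_sketch(vec, k=400, seed=0):
    """f(vec): k random 1-stable (Cauchy) projections of vec.

    A shared seed gives every point the same projection directions,
    which is what makes two sketches comparable.
    """
    rng = random.Random(seed)
    sketch = []
    for _ in range(k):
        # Standard Cauchy variates via inverse CDF: tan(pi * (U - 1/2)).
        row = [math.tan(math.pi * (rng.random() - 0.5)) for _ in vec]
        sketch.append(sum(r * v for r, v in zip(row, vec)))
    return sketch

def estimate_l1(sketch_x, sketch_y):
    """By 1-stability, each |f(x)_i - f(y)_i| is |Cauchy| * ||x-y||_1,
    and median(|Cauchy|) = 1, so the sample median estimates ||x-y||_1."""
    return statistics.median(abs(a - b) for a, b in zip(sketch_x, sketch_y))
```

Note that this yields an estimate of the distance rather than a short string with a norm structure, which is precisely the gap between sketching and embedding that the talk examines.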
Iryna Andriyanova (ETIS Lab, ENSEA/University of Cergy-Pontoise/CNRS) (Video)
Iteratively-Decoded Erasure-Correcting Coding for Distributed Storage Abstract: This talk will present some constructions of iteratively-decoded sparse-graph codes over various erasure channel models arising in distributed storage systems. Although the state of the art of coding for distributed storage is built on short algebraic block codes, there have been several attempts to use sparse-graph codes, with the aim of improving the decoding complexity and the scalability of the storage system as a whole.
This talk will introduce the existing code constructions and will discuss the use of graph-based codes in the framework of distributed storage.
Alexandre d'Aspremont (École Normale Supérieure) (Video)
An Optimal Affine Invariant Smooth Minimization Algorithm Abstract: We formulate an affine invariant implementation of the algorithm in (Nesterov, 1983). We show that the complexity bound is then proportional to an affine invariant regularity constant defined with respect to the Minkowski gauge of the feasible set.
Joint work with Cristóbal Guzmán and Martin Jaggi.
Guy Bresler (Massachusetts Institute of Technology)
Learning a Tree-Structured Ising Model in Order to Make Predictions Abstract: We study the problem of learning a tree graphical model from samples such that low-order marginals are accurate. We define a distance (“small set TV” or ssTV) between distributions \(P\) and \(Q\) by taking the maximum, over all subsets \(S\) of a given size, of the total variation between the marginals of \(P\) and \(Q\) on \(S\). Approximating a distribution to within small ssTV allows making predictions based on partial observations. Focusing on pairwise marginals and tree-structured Ising models on \(p\) nodes, we give an algorithm that produces a distribution (from the same class) with ssTV at most \(\eta\) using significantly fewer samples than are necessary for learning the tree structure itself. Thus, even when there are far too few samples to recover the correct tree, it is possible to learn an incorrect tree that is useful.
Joint work with Mina Karzand.
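For contrast with the prediction-centric goal above, the classical structure learner the abstract alludes to is the Chow-Liu algorithm: estimate pairwise mutual informations from the samples and return a maximum-weight spanning tree. A minimal sketch (our own illustration, not the algorithm of the talk):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between coordinates i, j."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples):
    """Maximum-weight spanning tree over pairwise mutual informations
    (Kruskal's algorithm with a union-find structure)."""
    p = len(samples[0])
    edges = sorted(((mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(p), 2)), reverse=True)
    parent = list(range(p))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The point of the talk is that accurate pairwise marginals (small ssTV) can be achieved with far fewer samples than this structure-recovery route requires.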
Amit Chakrabarti (Dartmouth College) (Video: Part 1, Part 2)
Overview of Communication Complexity Abstract: This will be a tutorial-style (long) talk, giving an overview of the important basic results in communication complexity, with emphasis on results that can be seen (sometimes creatively) as designing efficient protocols for certain tasks. As we shall see, a number of lower bounds in communication complexity are also ultimately based on designing protocols.
Stephen Chestnut (ETH Zurich) (Video)
Streaming sums and symmetric norms
Joint work with Jaroslaw Blasiok, Vladimir Braverman, Robert Krauthgamer, David P. Woodruff, and Lin F. Yang.
Graham Cormode (University of Warwick) (Video: Part 1, Part 2)
Compact summaries over large datasets Abstract: A fundamental challenge in processing the massive quantities of information generated by modern applications is in extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is in the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the count distinct algorithms, which compactly approximate item set cardinalities, and sketches which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. This talk introduces the concepts and examples of compact summaries as well as some recent developments.
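As one concrete count-distinct summary (an illustrative sketch, not code from the talk; the class and parameter names are ours), the k-minimum-values (KMV) idea keeps the \(k\) smallest hash values seen: if the \(k\)-th smallest value is \(v\), the cardinality is estimated as \((k-1)/v\), and two summaries combine by uniting their value sets, which is the mergeability property the abstract emphasizes.

```python
import bisect
import hashlib

class KMV:
    """k-minimum-values sketch: hash items to (0, 1] and keep the k
    smallest distinct hash values ever seen."""

    def __init__(self, k=256):
        self.k = k
        self.mins = []  # sorted, at most k smallest hash values

    @staticmethod
    def _hash(item):
        digest = hashlib.sha256(str(item).encode()).digest()
        return (int.from_bytes(digest[:8], "big") + 1) / 2.0**64

    def add(self, item):
        v = self._hash(item)
        i = bisect.bisect_left(self.mins, v)
        if i < len(self.mins) and self.mins[i] == v:
            return              # duplicate item: already accounted for
        self.mins.insert(i, v)
        del self.mins[self.k:]  # keep only the k smallest

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # exact for small streams
        return (self.k - 1) / self.mins[self.k - 1]

    def merge(self, other):
        """Summary for the union of the two underlying streams."""
        out = KMV(self.k)
        out.mins = sorted(set(self.mins) | set(other.mins))[: self.k]
        return out
```

The relative error of the estimate is roughly \(1/\sqrt{k}\), so a few hundred stored values suffice for a few percent accuracy regardless of the stream length.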
Arnak Dalalyan (ENSAE / CREST, GENES) (Video)
Theoretical guarantees for approximate sampling from a smooth and log-concave density Abstract: Sampling from various kinds of distributions is an issue of paramount importance in statistics, since it is often the key ingredient for constructing estimators, test procedures or confidence intervals. In many situations, exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful non-asymptotic guarantees for approximate sampling procedures, especially in high-dimensional problems. In this talk, we present some recent advances in this direction by focusing on the problem of sampling from a multivariate distribution having a smooth and log-concave density. We establish non-asymptotic bounds for the error of approximating the true distribution by the one obtained by the Langevin Monte Carlo method and its variants. The computational complexity of the resulting sampling method will be discussed along with the main steps of the proof of the central result.
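The basic Langevin Monte Carlo iteration analyzed in this line of work is simple to state: discretize the Langevin diffusion with step \(h\), i.e., \(x_{t+1} = x_t + h \nabla \log p(x_t) + \sqrt{2h}\,\xi_t\) with standard Gaussian noise \(\xi_t\). A toy sketch (ours; the talk's contribution is the non-asymptotic error analysis of this scheme, not this code):

```python
import math
import random

def ula_chain(grad_log_p, dim, n_iter=20000, step=0.05, burn=1000, seed=0):
    """Unadjusted Langevin algorithm (ULA):

        x_{t+1} = x_t + step * grad log p(x_t) + sqrt(2 * step) * N(0, I)

    For small step sizes, the iterates after burn-in are approximate
    samples from p; the theory quantifies exactly how approximate."""
    rng = random.Random(seed)
    x = [0.0] * dim
    out = []
    for t in range(n_iter):
        g = grad_log_p(x)
        x = [xi + step * gi + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        if t >= burn:
            out.append(list(x))
    return out
```

For a log-concave target such as \(N(2,1)\), where \(\nabla \log p(x) = -(x-2)\), the empirical mean and variance of the iterates approach 2 and 1 up to a discretization bias of order \(h\).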
Ilias Diakonikolas (University of Southern California) (Video)
Density estimation via piecewise polynomial approximation in sample near-linear time Abstract: In this talk, I will focus on the problem of density estimation, i.e., how to estimate (learn) a probability distribution based on random samples. I will describe a sample-optimal and computationally efficient algorithm to learn univariate distributions that are well-approximated by piecewise polynomial density functions. As a consequence of this algorithm, we obtain the first (nearly) sample-optimal and near-linear time density estimators for a wide range of well-studied structured distribution families.
If time permits, I will mention applications of the underlying algorithmic ideas to other inference tasks (e.g., regression).
(Joint work with J. Acharya, J. Li, and L. Schmidt.)
David Gamarnik (MIT) (Video)
(Arguably) Hard on Average Optimization Problems and the Overlap Gap Property Abstract: Many problems in the area of random combinatorial structures and high-dimensional statistics exhibit an apparent computational hardness, even though formal results establishing such hardness are lacking. Examples include the problems of finding an independent set of a random graph, finding a proper coloring of a random graph, and finding the largest submatrix of a random matrix.
Inspired by insights from statistical physics, we propose a general conjecture regarding the onset of hardness for this type of problem, formulated in terms of a certain overlap gap property. We conjecture that such problems become hard on average when the space of overlaps becomes disconnected in an appropriate sense (the model exhibits an overlap gap), and we prove the existence of such gaps for several of these problems, including the problem of finding the largest submatrix of a random Gaussian matrix. Furthermore, we establish that the overlap gap property is a provable obstacle for a certain class of local algorithms for the problems of finding a largest independent set of a sparse graph and finding a satisfying assignment of a random NAE-K-SAT problem.
Joint work with Madhu Sudan and Quan Li.
Sudipto Guha (University of Pennsylvania) (Video)
Convex Programming in Small Space Abstract: I plan to talk about solving convex programs in small space, focusing on applications in streaming algorithms and distributed computing, in problems such as maximum matching and correlation clustering.
Sidharth Jaggi (The Chinese University of Hong Kong) (Video)
Group testing: Together we are one Abstract: Group testing is perhaps the “simplest” class of non-linear inference problems. Broadly speaking, group-testing measurements exhibit a “threshold” behaviour, with positive test outcomes if the number of items in a test is above the threshold, and negative test outcomes otherwise. In this talk we'll survey bounds, algorithms and applications for a variety of flavours of group testing (adaptive/non-adaptive group testing, zero-error/epsilon-error group testing, noiseless/noisy measurements, universal group testing, group testing with inhibitors, constrained group testing). The talk is intended as a survey of classical and recent work, and will also present some open questions.
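To fix ideas, the simplest non-adaptive decoder under the threshold model above (with threshold one) is COMP: any item appearing in a negative pool is certainly non-defective, and everything never cleared is declared defective. This is a textbook sketch of our own, not an algorithm from the talk; all names and parameters are illustrative.

```python
import random

def run_tests(pools, defectives):
    """Noiseless group-testing outcomes: a pool tests positive iff it
    contains at least one defective item."""
    d = set(defectives)
    return [bool(d & set(pool)) for pool in pools]

def comp_decode(n_items, pools, outcomes):
    """COMP decoder: clear every item that appears in a negative pool;
    declare the rest defective. It never misses a true defective but
    may report a few false positives."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return sorted(set(range(n_items)) - cleared)

def random_pools(n_items, n_tests, p, seed=0):
    """Bernoulli(p) test design: each item joins each pool independently."""
    rng = random.Random(seed)
    return [[i for i in range(n_items) if rng.random() < p]
            for _ in range(n_tests)]
```

With \(d\) defectives among \(n\) items, choosing the inclusion probability \(p \approx 1/d\) and \(O(d \log n)\) tests makes the false-positive count small with high probability.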
Michael Kapralov (EPFL) (Video)
Approximating matchings in sublinear space Abstract: Finding maximum matchings in graphs is one of the most well-studied questions in combinatorial optimization. This problem is known to be solvable in polynomial time if the edge set of the graph can be loaded into memory. However, the sizes of data sets common in modern large data analysis often make this assumption unrealistic, calling for algorithms whose space requirements are sublinear in the size of the input that they operate on. An ideal algorithm would scan the list of edges of the input graph presented to it exactly once, maintain a small-space representation of the stream seen so far at each point in time, and output a good approximation to the maximum matching at the end; such algorithms are referred to as single-pass streaming algorithms.
In this talk we will discuss streaming algorithms for approximating maximum matchings, and the limitations that space constraints imply. On the lower bounds side, we will show that if the order in which the edges are presented to the algorithm is adversarial and the algorithm must output the edges of a nearly optimal matching at the end of the stream, a substantial amount of space (much larger than the number of bits needed to describe a solution) is required to obtain a good constant factor approximation in a single pass. On the algorithms side, we will show that if only an approximation to the size of the maximum matching (as opposed to finding the actual edges) is needed and the stream is presented in a random order, a surprisingly efficient algorithm exists.
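The canonical single-pass baseline, for reference (a textbook sketch, not one of the algorithms of the talk): greedily keep any edge whose two endpoints are both still unmatched. The resulting maximal matching is a 1/2-approximation to the maximum matching and stores only \(O(n)\) edges rather than the whole stream.

```python
def greedy_streaming_matching(edge_stream):
    """One pass over the edges: keep an edge iff both endpoints are
    still unmatched. Any maximal matching built this way has at least
    half the size of a maximum matching."""
    matched = set()   # vertices already covered by the matching
    matching = []     # edges we committed to
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

Beating the factor 1/2 with a single adversarial-order pass is exactly where the lower bounds discussed in the talk bite.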
Christian Konrad (Reykjavik University) (Video)
Challenges in Streaming XML Abstract: Today, XML (eXtensible Markup Language) is ubiquitous. For example, it is the standard file format for data exchange on the Internet, and (often massive) XML databases are widely employed. Streaming XML, i.e., the processing of XML streams, has gained popularity in recent years. In many applications, streaming processing is simply the only option (e.g., when monitoring data in sensor networks), but also in applications where more involved approaches are possible, streaming algorithms often outperform the usual non-streaming approaches.
In this presentation, we discuss some of the challenges that arise when processing streaming XML. We discuss streaming algorithms for fundamental XML-related problems such as well-formedness and validity of XML documents. Presented techniques include hashing, randomization and communication complexity.
Ioannis Kontoyiannis (Athens U of Econ & Business) (Video)
Testing temporal causality and estimating directed information Abstract: The problem of estimating the directed information rate between two Markov chains of arbitrary (but finite) order is considered. Specifically, for the so-called “plug-in” (or maximum-likelihood) estimator, under natural conditions we show that it is consistent with probability one and that it is asymptotically Gaussian. From this it is shown that its convergence rate is \(O(1/\sqrt{n})\), which is the best possible. A connection is established between this estimation problem and that of performing a hypothesis test for the presence of causal influence between the two processes. Under the null hypothesis, which corresponds to the absence of (temporal) causality, we show that the plug-in estimator has an asymptotic \(\chi^2\) distribution, and that this estimator can be expressed precisely in terms of the classical likelihood ratio statistic. Combining these two results facilitates the design of a Neyman-Pearson likelihood ratio test for the presence of causal influence.
This is joint work with Maria Skoularidou.
Harry Lang (Johns Hopkins University) (Video)
Sliding Windows and Clustering Problems Abstract: We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic-space \(O(1)\)-approximation to the metric \(k\)-median and metric \(k\)-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan, which has remained unanswered for over a decade. Our algorithm uses \(O(k^3 \log^6 n)\) space and \(\mathrm{poly}(k, \log n)\) update time. This is an exponential improvement on the space required by the technique due to Babcock et al. We introduce a data structure that extends the smooth histograms introduced by Braverman and Ostrovsky to operate on a broader class of functions. In particular, we show that using only polylogarithmic space we can maintain a summary of the current window from which we can construct an \(O(1)\)-approximate clustering solution.
Merge-and-reduce is a generic method in computational geometry for adapting offline algorithms to the insertion-only streaming model. Several well-known coreset constructions are maintainable in the insertion-only streaming model using this method, including well-known coreset techniques for the \(k\)-median and \(k\)-means problems in both low- and high-dimensional Euclidean space. Previous work has adapted coreset techniques to the insertion-deletion model, but translating them to the sliding window model has remained a challenge. We give the first algorithm that, given an insertion-only streaming coreset of space \(s\) (maintained using the merge-and-reduce method), maintains this coreset in the sliding window model using \(O(s^2 \epsilon^{-2} \log n)\) space.
For clustering problems, our results constitute the first significant step towards resolving problem number 20 from the List of Open Problems in Sublinear Algorithms.
Yue Lu (Harvard University) (Video)
Dynamics of Randomized Iterative Methods for Large-Scale Inference Problems Abstract: In this talk, I will present an exact analysis of the dynamics of randomized iterative methods for solving inference problems. I will show that, in the large systems limit, the dynamics of these algorithms converges to trajectories governed by a set of deterministic and coupled ODEs or PDEs. Analyzing these deterministic ODEs and PDEs allows one to establish performance guarantees of the associated randomized iterative algorithms.
Gábor Lugosi (Pompeu Fabra University) (Video: Part 1, Part 2)
How to estimate the mean of a random variable? Abstract: Given n independent, identically distributed copies of a random variable, one is interested in estimating the expected value. Perhaps surprisingly, there are still open questions concerning this very basic problem in statistics. In this
talk we are primarily interested in non-asymptotic sub-Gaussian estimates
for potentially heavy-tailed random variables. We discuss various estimates
and extensions to high dimensions, empirical risk minimization, and multivariate problems. This talk is based on joint work with Emilien Joly, Luc Devroye, Matthieu Lerasle, and Roberto Imbuzeiro Oliveira.
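One of the basic sub-Gaussian constructions in this line of work is the median-of-means estimator: split the sample into \(k\) blocks, average each block, and return the median of the block means. A minimal sketch (our illustration, not code from the talk):

```python
import statistics

def median_of_means(xs, k=10):
    """Median-of-means estimator of the mean.

    Split xs into k blocks of equal size, average each block, and take
    the median of the block means. Unlike the empirical mean, this is
    robust to heavy tails: a few extreme values can corrupt at most a
    few blocks, and the median ignores them.
    """
    m = len(xs) // k  # block size (any leftover samples are dropped)
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(block_means)
```

With \(k \approx \log(1/\delta)\) blocks, the deviation of this estimator around the true mean is sub-Gaussian with confidence \(1-\delta\), assuming only a finite variance.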
Nicolas Macris (EPFL) (Video)
Spatial coupling as a proof technique Abstract: This talk will outline a recent set of ideas on using spatially coupled ensembles to deduce properties of the underlying uncoupled ensemble. An application is a proof of the replica-symmetric formula for the conditional entropy of low-density parity-check codes on arbitrary binary-input memoryless channels, as well as a proof of the Maxwell area construction for such systems. Applications to lossy source coding and satisfiability will be discussed, time permitting.
Laurent Massoulie (Inria Saclay-Île-de-France / Microsoft Research-INRIA Joint Centre)
TBA
Andrew McGregor (University of Massachusetts) (Video: Part 1, Part 2)
Graph Sketching and Streaming Tutorial Abstract: We'll present a 3 hour tutorial covering recent algorithmic results on processing massive graphs via random linear projections, aka sketches, and data streams.
Mehdi Molkaraie (UPF) (Video)
Efficient Monte Carlo Methods for the Potts Model at Low Temperature Abstract: We consider the problem of estimating the partition function of the ferromagnetic \(q\)-state Potts model. We propose an importance sampling algorithm in the dual of the normal factor graph representing the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters of the model are strong (corresponding to models at low temperature) or when the model contains a mixture of strong and weak couplings. We show that, in this setting, the proposed algorithm significantly outperforms the state-of-the-art methods.
Eric Moulines (Télécom Paristech) (Video)
Sampling from log-concave non-smooth densities, or when Moreau meets Langevin
Jelani Nelson (Harvard)
Optimal approximate matrix product in terms of stable rank Abstract: We give two different proofs that use the subspace embedding guarantee in a black box way to show that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map that has \(O(r / \epsilon^2)\) rows, where \(r\) is the maximum stable rank of the two matrices being multiplied. This resolves the main open questions of (Magen, Zouzias SODA'11) and (Kyrillidis, Vlachos, Zouzias ISIT'14).
Our work has already been applied by (Cohen et al, STOC'15) to obtain new results on dimensionality reduction for k-means clustering, and can also be applied to arguments of (Yang, Pilanci, Wainwright'15) to yield new dimensionality reduction results for nonparametric regression. We also show some new implications for least squares regression and low-rank approximation.
This talk is based on joint work with Michael B. Cohen (MIT) and David P. Woodruff (IBM Almaden).
Sewoong Oh (UIUC) (Video)
Near-optimal message-passing algorithms for crowdsourcing Abstract: Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms where large-scale projects are broken into small tasks that are electronically distributed to numerous on-demand contributors. Because these low-paid workers can be unreliable, we need to devise schemes to increase confidence in our answers, typically by assigning each task multiple times and combining the answers in some way. I will present a rigorous treatment of this problem, and provide both an optimal task assignment scheme (using a random graph) and an optimal inference algorithm (based on low-rank matrix approximation and belief propagation) for that task assignment.
We represent crowdsourcing systems using graphical models and address the problem of inference in this graphical model. Standard techniques like belief propagation are difficult to implement in practice because they require knowledge of the a priori distribution of the problem parameters. Instead, we propose a message-passing algorithm that does not require any knowledge of the a priori distributions. We show that this algorithm achieves performance close to a minimax lower bound. To analyze the performance of this message-passing algorithm, we borrow techniques from statistical physics and coding theory, such as phase transitions, correlation decay, and density evolution. Precisely, we show that above a phase transition, the graphical model exhibits the correlation decay property. Then, an analysis technique known as density evolution gives a precise description of the density (or distribution) of the messages. Time permitting, I will discuss an interesting connection between this message-passing algorithm and the singular vectors of sparse random matrices.
Krzysztof Onak (IBM T. J. Watson) (Video)
Communication Complexity of Learning Discrete Distributions
Abstract: The bounds on the sample complexity of most fundamental learning and testing problems for discrete distributions are well understood. We consider the scenario in which samples are collected by multiple players who have to communicate in order to solve the learning or testing problem. We ask how much communication this kind of task requires.
In the talk, I will focus on the problem of learning the distribution and show that the players essentially have to transmit all their samples, provided each of them holds only a limited number of samples.
Joint work with Ilias Diakonikolas, Elena Grigorescu, and Abhiram Natarajan.
Henry Pfister (Duke University) (Video: Part 1, Part 2)
Factor Graphs, Belief Propagation, and Density Evolution Abstract: The goal of this mini-course is to introduce students to marginal inference techniques for large systems of random variables defined by sparse random factor graphs. Over the past 20 years, these techniques have revolutionized error-correcting codes, compressed sensing, and random satisfiability. In particular, we consider approximate marginal inference based on the low-complexity iterative algorithm called belief propagation (BP). In general, this algorithm is quite effective when the neighborhoods of most variable nodes do not contain short cycles. Density evolution is a technique that, in some cases, allows one to rigorously analyze the asymptotic performance of BP as the size of the sparse random graph increases. Each technique will be illustrated via worked examples and descriptions of how it is used in practice.
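The simplest worked instance of density evolution is the binary erasure channel with a regular \((d_v, d_c)\) LDPC ensemble, where the erasure probability of an edge message obeys the one-dimensional recursion \(x_{l+1} = \epsilon\,(1 - (1-x_l)^{d_c-1})^{d_v-1}\). A small sketch of this standard material (function and parameter names are ours) that also locates the BP threshold by bisection:

```python
def density_evolution_bec(eps, dv=3, dc=6, n_iter=2000):
    """Density evolution for a regular (dv, dc) LDPC ensemble on
    BEC(eps): iterate the erasure-probability recursion

        x_{l+1} = eps * (1 - (1 - x_l)**(dc - 1))**(dv - 1)

    starting from x_0 = eps, and return the value reached."""
    x = eps
    for _ in range(n_iter):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
    return x

def bp_threshold(dv=3, dc=6, tol=1e-6):
    """Bisection for the largest eps whose fixed point is (numerically)
    zero: below this BP threshold, decoding succeeds asymptotically."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if density_evolution_bec(mid, dv, dc) < 1e-9:
            lo = mid
        else:
            hi = mid
    return lo
```

For the \((3,6)\) ensemble this recursion reproduces the well-known BP threshold of about \(0.4294\): below it, the recursion collapses to zero; above it, a nonzero fixed point (stuck decoder) appears.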
Galen Reeves (Duke University) (Video)
Understanding Phase Transitions in Compressed Sensing Abstract: Large compressed sensing problems can exhibit phase transitions in which a small change in the number of measurements leads to a large change in the mean-squared error. Over the past decade, these phase transitions have been studied using an amazingly diverse set of ideas from information theory, statistical physics, high-dimensional geometry, and statistical decision theory. The goal of this talk is to use an information theoretic framework to explain the connections between three very different methods of analysis. The first uses the heuristic replica method from statistical physics to characterize the fundamental limits. The second uses the analysis of approximate loopy belief propagation to characterize the asymptotic performance of practical algorithms, and the third uses Gaussian process theory and concentration of measure to provide sharp non-asymptotic bounds for optimization-based algorithms.
Ronitt Rubinfeld (MIT and Tel Aviv University) (Video: Part 1, Part 2)
Testing properties of distributions over big domains Abstract: We survey several works regarding the complexity of testing
global properties of discrete distributions, when given access to only
a few samples from the distribution. Such properties might include
testing if two distributions have small statistical distance,
testing various independence properties, testing whether
a distribution has a specific shape (such as monotone decreasing,
k-modal, k-histogram, monotone hazard rate, …), and approximating the
entropy. We describe bounds for such testing problems whose
sample complexities are sublinear in the size of the support.
Christian Sohler (TU Dortmund) (Video)
Testing Cluster Structure of Graphs Abstract: We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded-degree model. Given a parameter \(\epsilon\), a \(d\)-bounded-degree graph is defined to be \((k,F)\)-clusterable if it can be partitioned into no more than \(k\) parts, such that the (inner) conductance of the induced subgraph on each part is at least \(F\) and the (outer) conductance of each part is at most \(c\,\epsilon^4 F^2\), where \(c\) depends only on \(d,k\). Our main result is a sublinear algorithm with running time \(O(\sqrt{n}\,\mathrm{poly}(F,k,1/\epsilon))\) that takes as input a graph with maximum degree bounded by \(d\) and parameters \(k\), \(F\), \(\epsilon\), and with probability at least \(2/3\) accepts the graph if it is \((k,F)\)-clusterable and rejects the graph if it is \(\epsilon\)-far from \((k,F^*)\)-clusterable for \(F^* = c_0 F^2 \epsilon^4 / \log n\), where \(c_0\) depends only on \(d,k\).
Alexandre Tsybakov (CREST-ENSAE)
Sharp minimax and adaptive variable selection Abstract: We derive non-asymptotic bounds for the minimax risk of variable selection under the expected Hamming loss in the problem of recovery of \(s\)-sparse vectors in \(\mathbb{R}^d\) whose nonzero components are greater than \(a > 0\). We obtain exact expressions for the non-asymptotic minimax risk as a function of \((d, s, a)\) and find explicitly the minimax selectors. Analogous results are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we establish necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptive to the parameters \((s,a)\) of the classes.
This is joint work with C. Butucea and N. Stepanova.
Ruediger Urbanke (EPFL) (Video: Part 1, Part 2)
The Area Theorem and Capacity-Achieving Codes Abstract: The area theorem can be thought of as a conservation law for error-correcting codes. It is a mathematical formulation of the fact that there are no “good” or “bad” codes, only codes with different characteristics. I will start from scratch and first show the very simple derivation of the area theorem and then discuss its consequences, in particular what it tells us about capacity-achieving codes. I will also review a recent result by Kudekar, Kumar, Mondelli, Pfister, Sasoglu and Urbanke showing that Reed-Muller codes achieve capacity on the binary erasure channel, where the area theorem plays a crucial role.
Gregory Valiant (Stanford) (Video: Part 1, Part 2)
When your big data seems too small: accurate inferences beyond the empirical distribution Abstract: We discuss three problems related to the general challenge of making
accurate inferences about a complex distribution, in the regime in
which the amount of data (i.e., the sample size) is too small for the
empirical distribution of the samples to be an accurate representation
of the underlying distribution. The first problem we consider is the
following basic learning task: given independent draws from an unknown
distribution over a discrete support, output an approximation of the
distribution that is as accurate as possible in \(L_1\) distance (i.e., total
variation distance). Perhaps surprisingly, it is
often possible to “denoise” the empirical distribution of the samples
to return an approximation of the true distribution that is
significantly more accurate than the empirical distribution, without
relying on any prior assumptions on the distribution. We present an
instance-optimal learning algorithm which optimally performs this
denoising for every distribution for which such a denoising is
possible. One curious implication of our techniques is an algorithm
for accurately estimating the number of new domain elements that would
be seen given a new larger sample, of size up to \(n \log n\).
(Extrapolation beyond this sample size is provably
information-theoretically impossible, without additional assumptions on the
distribution.) While these results are applicable generally, we
highlight an adaptation of this general approach to some problems in
genomics (e.g. quantifying the number of unobserved protein coding
variants).
The second problem we consider is the task of accurately estimating
the eigenvalues of the covariance matrix of a (high dimensional
real-valued) distribution – the “population spectrum”. (These
eigenvalues contain basic information about the distribution,
including the presence or lack of lowdimensional structure in the
distribution and the applicability of many higherlevel machine
learning and multivariate statistical tools.) As we show, even in the
regime where the sample size is linear or sublinear in the
dimensionality of the distribution, and hence the eigenvalues and
eigenvectors of the empirical covariance matrix are misleading,
accurate approximations to the true population spectrum are possible.
The final problem we discuss is the problem of recovering a low-rank
approximation to a matrix of probabilities P, given access to an
observed matrix of “counts” obtained via independent samples from the
distribution defined by P. This problem can be viewed as a
generalization of “community detection”, and is relevant to several
recent machine learning efforts, including the work on constructing
“word embeddings”.
This talk is based on four papers, which are joint works with Paul
Valiant, with Paul Valiant and James Zou, with Weihao Kong, and with
Qingqing Huang, Sham Kakade, and Weihao Kong.
Sergei Vassilvitskii (Google)
Playing games with a bandit
Abstract: As the number of advertising exchanges has grown, sellers have turned to low-regret learning mechanisms to decide which exchange has the best price for their inventory. This in turn opens a question for the exchanges: how to set reserve prices to attract as many sellers as possible while maximizing revenue. This is a learning question in and of itself. In this work we formulate this problem precisely, and present algorithms showing that simply knowing that the counterparty is using a low-regret learning algorithm is enough for the exchange to have a low-regret algorithm for the optimal price.
Pascal Vontobel (Chinese University of Hong Kong) (Video)
Analysis of double covers of factor graphs
Abstract: Graph covers have been shown to be a very useful tool for analyzing and understanding message-passing iterative algorithms on factor graphs. In this talk, we introduce a novel technique for investigating the relationship between a base factor graph and its double covers. Potentially, this technique is also useful for analyzing graph covers of higher degree.
David Woodruff (IBM Almaden) (Video: Part 1, Part 2)
New Algorithms for Heavy Hitters in Data Streams Abstract: An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-\(k\), most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the \(\ell_1\)-heavy hitters and \(\ell_2\)-heavy hitters. There are a number of algorithmic solutions for these problems, starting with the work of Misra and Gries, as well as the CountMin and CountSketch data structures, among others.
In this talk we cover several recent results developed in this area, which improve upon the classical solutions to these problems. In particular, we develop new algorithms for finding \(\ell_1\)-heavy hitters and \(\ell_2\)-heavy hitters, with significantly less memory required than what was known, and which are optimal in a number of parameter regimes.
Based on recent works with Arnab Bhattacharyya and Palash Dey, and with Vladimir Braverman, Stephen Chestnut, and Nikita Ivkin.
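The oldest of the classical solutions mentioned above, the Misra-Gries summary, fits in a few lines and already gives the \(\ell_1\)-heavy-hitter guarantee: with \(k\) counters, every item of frequency above \(n/(k+1)\) survives, and reported counts undercount true frequencies by at most \(n/(k+1)\). A minimal sketch of our own for illustration:

```python
def misra_gries(stream, k):
    """Misra-Gries heavy-hitter summary with k counters.

    Guarantee: after processing n items, every item with true frequency
    greater than n/(k+1) keeps a counter, and each stored count is at
    most n/(k+1) below the true frequency.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All counters full: decrement everything, dropping zeros.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters
```

A second pass over the stream (when available) turns the surviving candidates into exact heavy hitters; the newer algorithms covered in the talk reduce the memory further in the \(\ell_2\) setting.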
