# Nexus of Information and Computation Theories: Inference Problems Theme, March 7–18, 2016

### Titles, Abstracts, and Videos

Alex Andoni (Columbia)   (Video: Part 1, Part 2)
Sketching and Embeddings

Abstract: Sketching for distance estimation is the problem where we need to design a possibly randomized function $$f$$ from a metric space to short strings, such that from $$f(x)$$ and $$f(y)$$ we can estimate the distance between $$x$$ and $$y$$. This problem is a core problem in both the streaming and nearest neighbor search areas. We will discuss this problem and its connections to the theory of metric embeddings. In particular, we will discuss when and why sketching is equivalent to embedding into a normed space such as $$\ell_1$$.
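
As a toy illustration of the sketching setup (my own, not from the talk): for $$\ell_2$$, a shared random linear map $$f$$ compresses vectors so that distances can be estimated from the short sketches alone, in the spirit of Johnson–Lindenstrauss. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 1000, 400                           # original dimension, sketch length
S = rng.normal(size=(m, d)) / np.sqrt(m)   # shared random sketching matrix

def f(v):
    """Linear sketch: a short summary from which distances are recoverable."""
    return S @ v

x, y = rng.normal(size=d), rng.normal(size=d)
est = np.linalg.norm(f(x) - f(y))          # estimate using only f(x) and f(y)
true = np.linalg.norm(x - y)
print(est / true)                          # close to 1 with high probability
```

For $$\ell_1$$, linear sketches also exist (e.g. with heavy-tailed rather than Gaussian entries), which is one face of the sketching-versus-embedding connection the talk explores.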

Iryna Andriyanova (ETIS Lab, ENSEA/University of Cergy-Pontoise/CNRS)   (Video)
Iteratively-Decoded Erasure-Correcting Coding for Distributed Storage

Abstract: This talk will present some constructions of iteratively-decoded sparse-graph codes over various erasure channel models arising from distributed storage systems. Although the state of the art in coding for distributed storage is built on short algebraic block codes, there have been several attempts to use sparse-graph codes, with the aim of improving the decoding complexity and the scalability of the storage system as a whole. This talk will introduce the existing code constructions and will discuss the use of graph-based codes in the framework of distributed storage.

Alexandre d'Aspremont (École Normale Supérieure)   (Video)
An Optimal Affine Invariant Smooth Minimization Algorithm

Abstract: We formulate an affine invariant implementation of the algorithm in (Nesterov, 1983). We show that the complexity bound is then proportional to an affine invariant regularity constant defined with respect to the Minkowski gauge of the feasible set.

Joint work with Cristóbal Guzmán and Martin Jaggi.

Guy Bresler (Massachusetts Institute of Technology)
Learning a Tree-Structured Ising Model in Order to Make Predictions

Abstract: We study the problem of learning a tree graphical model from samples such that low-order marginals are accurate. We define a distance (“small set TV” or ssTV) between distributions P and Q by taking the maximum, over all subsets S of a given size, of the total variation between the marginals of P and Q on S. Approximating a distribution to within small ssTV allows making predictions based on partial observations. Focusing on pairwise marginals and tree-structured Ising models on $$p$$ nodes, we give an algorithm that produces a distribution (from the same class) with ssTV at most $$\eta$$ using significantly fewer samples than is necessary for learning the tree structure itself. Thus, even when there are far too few samples to recover the correct tree, it is possible to learn an incorrect tree that is useful.
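
To make the ssTV definition concrete, here is a brute-force computation for tiny distributions on $$\{0,1\}^p$$ (illustrative code of mine, not the paper's algorithm):

```python
import itertools
import numpy as np

def marginal(P, S, p):
    """Marginal of a joint pmf P (array indexed by {0,1}^p) on coordinate set S."""
    other = tuple(i for i in range(p) if i not in S)
    return P.sum(axis=other)

def sstv(P, Q, k, p):
    """Small-set TV: max over all |S| = k of the TV between marginals on S."""
    return max(
        0.5 * np.abs(marginal(P, S, p) - marginal(Q, S, p)).sum()
        for S in itertools.combinations(range(p), k)
    )

p = 3
P = np.full((2,) * p, 1 / 2**p)            # uniform on {0,1}^3
Q = P.copy()
Q[0, 0, 0] += 0.05                         # perturb two atoms, keeping a pmf
Q[1, 1, 1] -= 0.05
print(sstv(P, Q, 2, p))                    # 0.05
```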

Joint work with Mina Karzand.

Amit Chakrabarti (Dartmouth College)   (Video: Part 1, Part 2)
Overview of Communication Complexity

Abstract: This will be a tutorial-style (long) talk, giving an overview of the important basic results in communication complexity, with emphasis on results that can be seen (sometimes creatively) as designing efficient protocols for certain tasks. As we shall see, a number of lower bounds in communication complexity are also ultimately based on designing protocols.

Stephen Chestnut (ETH Zurich)   (Video)
Streaming sums and symmetric norms

Joint work with Jaroslaw Blasiok, Vladimir Braverman, Robert Krauthgamer, David P. Woodruff, and Lin F. Yang.

Graham Cormode (University of Warwick)   (Video: Part 1, Part 2)
Compact summaries over large datasets

Abstract: A fundamental challenge in processing the massive quantities of information generated by modern applications is in extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is in the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the count distinct algorithms, which compactly approximate item set cardinalities, and sketches which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. This talk introduces the concepts and examples of compact summaries as well as some recent developments.
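
As one concrete example of a mergeable summary (a sketch of the classical k-minimum-values idea, not code from the talk), a count-distinct structure whose parts, built on different machines, combine into a single estimate:

```python
import hashlib

def h(x):
    """Deterministic hash of an item to a float in (0, 1]."""
    v = int.from_bytes(hashlib.sha256(repr(x).encode()).digest()[:8], "big")
    return (v + 1) / 2**64

class KMV:
    """k-minimum-values sketch: keep the k smallest hash values seen so far."""
    def __init__(self, k=256):
        self.k, self.mins = k, []
    def add(self, x):
        hv = h(x)
        if hv not in self.mins:
            self.mins = sorted(self.mins + [hv])[: self.k]
    def merge(self, other):
        out = KMV(self.k)                  # combine summaries from two machines
        out.mins = sorted(set(self.mins) | set(other.mins))[: self.k]
        return out
    def estimate(self):
        if len(self.mins) < self.k:        # saw fewer than k distinct items
            return len(self.mins)
        return (self.k - 1) / self.mins[-1]
```

Because the sketch depends only on the set of hash values seen, merging sketches of two data sets yields exactly the sketch of their union, which is the distributed-friendliness the abstract highlights.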

Arnak Dalalyan (ENSAE / CREST, GENES)   (Video)
Theoretical guarantees for approximate sampling from a smooth and log-concave density

Abstract: Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence intervals. In many situations, the exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful nonasymptotic guarantees for the approximate sampling procedures, especially in high-dimensional problems. In this talk, we present some recent advances in this direction by focusing on the problem of sampling from a multivariate distribution having a smooth and log-concave density. We establish nonasymptotic bounds for the error of approximating the true distribution by the one obtained by the Langevin Monte Carlo method and its variants. The computational complexity of the resulting sampling method will be discussed along with the main steps of the proof of the central result.
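
The basic Langevin Monte Carlo iterate is a gradient step plus Gaussian noise; a minimal sketch on a toy standard-Gaussian target (the target, step size, and iteration counts are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_f(x):
    # potential f(x) = ||x||^2 / 2, i.e. the target is a standard Gaussian
    return x

d, h, n_iter = 5, 0.05, 20000
x = np.zeros(d)
samples = []
for _ in range(n_iter):
    # unadjusted Langevin step: x <- x - h * grad f(x) + sqrt(2h) * noise
    x = x - h * grad_f(x) + np.sqrt(2 * h) * rng.normal(size=d)
    samples.append(x.copy())
samples = np.array(samples[5000:])         # discard burn-in
print(samples.mean(), samples.var())       # near 0 and 1, up to O(h) bias
```

The nonasymptotic theory mentioned in the abstract quantifies exactly how the discretization step $$h$$ and the number of iterations trade off against the distance to the target distribution.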

Ilias Diakonikolas (University of Southern California)   (Video)
Density estimation via piecewise polynomial approximation in sample near-linear time

Abstract: In this talk, I will focus on the problem of density estimation, i.e., how to estimate (learn) a probability distribution based on random samples. I will describe a sample-optimal and computationally efficient algorithm to learn univariate distributions that are well-approximated by piecewise polynomial density functions. As a consequence of this algorithm, we obtain the first (near-)sample-optimal and near-linear-time density estimators for a wide range of well-studied structured distribution families.

If time permits, I will mention applications of the underlying algorithmic ideas to other inference tasks (e.g., regression).

(Joint work with J. Acharya, J. Li, and L. Schmidt.)

David Gamarnik (MIT)   (Video)
(Arguably) Hard on Average Optimization Problems and the Overlap Gap Property

Abstract: Many problems in the area of random combinatorial structures and high-dimensional statistics exhibit an apparent computational hardness, even though formal results establishing such hardness are lacking. Examples include the problem of finding a large independent set of a random graph, finding a proper coloring of a random graph, and the problem of finding the largest submatrix of a random matrix.

Inspired by insights from statistical physics, we propose a general conjecture regarding the onset of hardness for these types of problems, formulated in terms of a certain overlap gap property. We conjecture that such problems become hard on average when the space of overlaps becomes disconnected in an appropriate sense (the model exhibits an overlap gap), and we prove the existence of such gaps for several of these problems, including the problem of finding the largest submatrix of a random Gaussian matrix. Furthermore, we establish that the overlap gap property is a provable obstacle for a certain class of local algorithms for the problem of finding a largest independent set of a sparse graph and finding a satisfying assignment of a random NAE-K-SAT problem.

Joint work with Madhu Sudan and Quan Li.

Sudipto Guha (University of Pennsylvania)   (Video)
Convex Programming in Small Space

Abstract: I plan to talk about solving convex programs in small space - focusing on applications in streaming algorithms and distributed computing, in problems such as maximum matching and correlation clustering.

Sidharth Jaggi (The Chinese University of Hong Kong)   (Video)
Group-testing: Together we are one

Abstract: Group testing is perhaps the “simplest” class of non-linear inference problems. Broadly speaking, group-testing measurements exhibit a “threshold” behaviour, with positive test outcomes if the number of defective items in a test is above the threshold, and negative test outcomes otherwise. In this talk we'll survey bounds, algorithms and applications for a variety of flavours of group-testing (adaptive/non-adaptive group-testing, zero-error/epsilon-error group-testing, noiseless/noisy measurements, universal group-testing, group-testing with inhibitors, constrained group-testing). The talk is intended as a survey of classical and recent work, and will also present some open questions.
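
A minimal sketch (my illustration) of noiseless group testing with the classic threshold of one defective per test, decoded by the simple COMP rule: any item appearing in a negative test cannot be defective.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, T = 200, 5, 60                       # items, defectives, tests (illustrative)
defective = set(rng.choice(n, size=k, replace=False).tolist())

# random pooling design: each item joins each test with probability ~ 1/k
A = rng.random((T, n)) < 1.0 / k
# a test is positive iff it contains at least one defective item
outcomes = [bool(defective & set(np.flatnonzero(row).tolist())) for row in A]

# COMP decoding: eliminate every item that appears in a negative test
candidates = set(range(n))
for t in range(T):
    if not outcomes[t]:
        candidates -= set(np.flatnonzero(A[t]).tolist())
print(len(candidates))                     # a small superset of the defectives
```

COMP never discards a true defective, so its output is a superset of the defective set; with enough tests the superset is small, which is the flavour of guarantee the surveyed bounds make precise.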

Michael Kapralov (EPFL)   (Video)
Approximating matchings in sublinear space

Abstract: Finding maximum matchings in graphs is one of the most well-studied questions in combinatorial optimization. This problem is known to be solvable in polynomial time if the edge set of the graph can be loaded into memory. However, the sizes of data sets common in modern large data analysis often make this assumption unrealistic, calling for algorithms whose space requirements are sublinear in the size of the input that they operate on. An ideal algorithm would scan the list of edges of the input graph presented to it exactly once, maintain a small space representation of the stream seen so far at each point in time, and output a good approximation to the maximum matching at the end – such algorithms are referred to as single pass streaming algorithms.

In this talk we will discuss streaming algorithms for approximating maximum matchings, and limitations that space constraints imply. On the lower bounds side, we will show that if the order in which the edges are presented to the algorithm is adversarial and the algorithm must output the edges of a nearly optimal matching at the end of the stream, a substantial amount of space (much larger than the number of bits needed to describe a solution) is required to obtain a good constant factor approximation in a single pass. On the algorithms side, we will show that if only an approximation to the size of the maximum matching (as opposed to finding the actual edges) is needed and the stream is presented in a random order, a surprisingly efficient algorithm exists.

Christian Konrad (Reykjavik University)   (Video)
Challenges in Streaming XML

Abstract: Today, XML (eXtensible Markup Language) is ubiquitous. For example, it is the standard file format for data exchange on the Internet, and (often massive) XML databases are widely employed. Streaming XML, i.e., the processing of XML streams, gained popularity in recent years. In many applications, streaming processing is simply the only option (e.g. when monitoring data in sensor networks), but even in applications where more involved approaches are possible, streaming algorithms often outperform usual non-streaming approaches.

In this presentation, we discuss some of the challenges that arise when processing streaming XML. We discuss streaming algorithms for fundamental XML-related problems such as well-formedness and validity of XML documents. Presented techniques include hashing, randomization and communication complexity.

Ioannis Kontoyiannis (Athens U of Econ & Business)   (Video)
Testing temporal causality and estimating directed information

Abstract: The problem of estimating the directed information rate between two Markov chains of arbitrary (but finite) order is considered. Specifically for the so-called “plug-in” (or maximum-likelihood) estimator, under natural conditions we show that it is consistent with probability one, and that it is asymptotically Gaussian. From this it is shown that its convergence rate is $$O(1/\sqrt{n})$$, which is the best possible. A connection is established between this estimation problem and that of performing a hypothesis test for the presence of causal influence between the two processes. Under the null hypothesis, which corresponds to the absence of (temporal) causality, we show that the plug-in estimator has an asymptotic $$\chi^2$$ distribution, and that this estimator can be expressed precisely in terms of the classical likelihood ratio statistic. Combining these two results facilitates the design of a Neyman-Pearson likelihood ratio test for the presence of causal influence.

This is joint work with Maria Skoularidou.

Harry Lang (Johns Hopkins University)   (Video)
Sliding Windows and Clustering Problems

Abstract: We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space $$O(1)$$-approximation to the metric $$k$$-median and metric $$k$$-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan, which has remained unanswered for over a decade. Our algorithm uses $$O(k^3 \log^6 n)$$ space and $$\mathrm{poly}(k, \log n)$$ update time. This is an exponential improvement on the space required by the technique due to Babcock, et al. We introduce a data structure that extends smooth histograms as introduced by Braverman and Ostrovsky to operate on a broader class of functions. In particular, we show that using only polylogarithmic space we can maintain a summary of the current window from which we can construct an $$O(1)$$-approximate clustering solution.

Merge-and-reduce is a generic method in computational geometry for adapting offline algorithms to the insertion-only streaming model. Several well-known coreset constructions are maintainable in the insertion-only streaming model using this method, including well-known coreset techniques for the $$k$$-median and $$k$$-means problems in both low-and high-dimensional Euclidean space. Previous work has adapted coreset techniques to the insertion-deletion model, but translating them to the sliding window model has remained a challenge. We give the first algorithm that, given an insertion-only streaming coreset of space $$s$$ (maintained using merge-and-reduce method), maintains this coreset in the sliding window model using $$O(s^2\epsilon^{-2}\log n)$$ space.

For clustering problems, our results constitute the first significant step towards resolving problem number 20 from the List of Open Problems in Sublinear Algorithms.

Yue Lu (Harvard University)   (Video)
Dynamics of Randomized Iterative Methods for Large-Scale Inference Problems

Abstract: In this talk, I will present an exact analysis of the dynamics of randomized iterative methods for solving inference problems. I will show that, in the large systems limit, the dynamics of these algorithms converges to trajectories governed by a set of deterministic and coupled ODEs or PDEs. Analyzing these deterministic ODEs and PDEs allows one to establish performance guarantees of the associated randomized iterative algorithms.

Gábor Lugosi (Pompeu Fabra University)   (Video: Part 1, Part 2)
How to estimate the mean of a random variable?

Abstract: Given n independent, identically distributed copies of a random variable, one is interested in estimating the expected value. Perhaps surprisingly, there are still open questions concerning this very basic problem in statistics. In this talk we are primarily interested in non-asymptotic sub-Gaussian estimates for potentially heavy-tailed random variables. We discuss various estimates and extensions to high dimensions, empirical risk minimization, and multivariate problems. This talk is based on joint work with Emilien Joly, Luc Devroye, Matthieu Lerasle, and Roberto Imbuzeiro Oliveira.
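
One classical route to non-asymptotic sub-Gaussian guarantees under heavy tails is the median-of-means estimator; a minimal sketch (my own toy example, with an arbitrary heavy-tailed distribution):

```python
import numpy as np

def median_of_means(x, num_blocks):
    """Split the sample into blocks, average each block, return the median."""
    blocks = np.array_split(np.asarray(x), num_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(3)
# heavy-tailed data: shifted Pareto with tail index 2.1 (finite variance)
x = rng.pareto(2.1, size=10000)
print(median_of_means(x, 15), x.mean())    # both near the true mean 1/1.1
```

Taking the median of block means tames the influence of the few enormous observations that can throw off the empirical mean, which is the kind of estimator whose optimality and multivariate extensions the talk discusses.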

Nicolas Macris (EPFL)   (Video)
Spatial coupling as a proof technique

Abstract: This talk will outline a recent set of ideas on using spatially coupled ensembles to deduce properties of the underlying non-coupled ensemble. An application is a proof of the replica symmetric formula for the conditional entropy of Low-Density-Parity-Check codes on arbitrary binary input memoryless channels, as well as a proof of the Maxwell area construction for such systems. Applications to lossy source coding and satisfiability will be discussed time permitting.

Laurent Massoulie (Inria Saclay-Ile de France / Microsoft Research-INRIA Joint Centre)
TBA

Andrew McGregor (University of Massachusetts)   (Video: Part 1, Part 2)
Graph Sketching and Streaming Tutorial

Abstract: We'll present a 3 hour tutorial covering recent algorithmic results on processing massive graphs via random linear projections, aka sketches, and data streams.

Mehdi Molkaraie (UPF)   (Video)
Efficient Monte Carlo Methods for the Potts Model at Low Temperature

Abstract: We consider the problem of estimating the partition function of the ferromagnetic q-state Potts model. We propose an importance sampling algorithm in the dual of the normal factor graph representing the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters of the model are strong (corresponding to models at low temperature) or when the model contains a mixture of strong and weak couplings. We show that, in this setting, the proposed algorithm significantly outperforms the state-of-the-art methods.

Eric Moulines (Télécom Paristech)    (Video)
Sampling from log-concave non-smooth densities or when Moreau meets Langevin

Jelani Nelson (Harvard)
Optimal approximate matrix product in terms of stable rank

Abstract: We give two different proofs that use the subspace embedding guarantee in a black box way to show that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map that has $$O(r / \epsilon^2)$$ rows, where $$r$$ is the maximum stable rank of the two matrices being multiplied. This resolves the main open questions of (Magen, Zouzias SODA’11) and (Kyrillidis, Vlachos, Zouzias ISIT’14).
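
A small numerical illustration of the guarantee's shape (my own; parameters arbitrary, with a plain Gaussian map standing in for the subspace embedding): the spectral-norm error of the sketched product stays a small fraction of $$\|A\|\,\|B\|$$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d1, d2, m = 2000, 10, 10, 600
A = rng.normal(size=(n, d1))
B = rng.normal(size=(n, d2))

S = rng.normal(size=(m, n)) / np.sqrt(m)   # dimensionality-reducing sketch
approx = (S @ A).T @ (S @ B)               # computed from the sketches alone
exact = A.T @ B

err = np.linalg.norm(approx - exact, 2)            # spectral-norm error
scale = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)  # ||A|| * ||B||
print(err / scale)                         # small relative error
```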

Our work has already been applied by (Cohen et al, STOC’15) to obtain the new results on dimensionality reduction for k-means clustering, and can also be applied to arguments of (Yang, Pilanci, Wainwright’15) to yield new dimensionality reduction results for non-parametric regression. We also show some new implications for least squares regression and low-rank approximation.

This talk is based on joint work with Michael B. Cohen (MIT) and David P. Woodruff (IBM Almaden).

Sewoong Oh (UIUC)   (Video)
Near-optimal message-passing algorithms for crowdsourcing

Abstract: Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms where large-scale projects are broken into small tasks that are electronically distributed to numerous on-demand contributors. Because these low-paid workers can be unreliable, we need to devise schemes to increase confidence in our answers, typically by assigning each task multiple times and combining the answers in some way. I will present a rigorous treatment of this problem, and provide both an optimal task assignment scheme (using a random graph) and an optimal inference algorithm (based on low-rank matrix approximation and belief propagation) for that task assignment.

We represent crowdsourcing systems using graphical models and address the problem of inference in this graphical model. Standard techniques like belief propagation are difficult to implement in practice because they require knowledge of the a priori distribution of the problem parameters. Instead, we propose a message-passing algorithm that does not require any knowledge of the a priori distributions. We show that this algorithm achieves performance close to a minimax lower bound. To analyze the performance of this message-passing algorithm, we borrow techniques from statistical physics and coding theory such as phase transition, correlation decay, and density evolution. Precisely, we show that above a phase transition, the graphical model exhibits the correlation decay property. Then, an analysis technique known as density evolution gives a precise description of the density (or distribution) of the messages. Time permitting, I will discuss an interesting connection between this message-passing algorithm and the singular vectors of sparse random matrices.

Krzysztof Onak (IBM T. J. Watson)   (Video)
Communication Complexity of Learning Discrete Distributions

Abstract: The bounds on the sample complexity of most fundamental learning and testing problems for discrete distributions are well understood. We consider the scenario in which samples are collected by multiple players who have to communicate in order to solve the learning or testing problem. We ask how much communication this kind of task requires.

In the talk, I will focus on the problem of learning the distribution and show that the players essentially have to transmit all their samples, provided each player has only a limited number of samples.

Joint work with Ilias Diakonikolas, Elena Grigorescu, and Abhiram Natarajan.

Henry Pfister (Duke University)   (Video: Part 1, Part 2)
Factor Graphs, Belief Propagation, and Density Evolution

Abstract: The goal of this mini-course is to introduce students to marginal inference techniques for large systems of random variables defined by sparse random factor graphs. Over the past 20 years, these techniques have revolutionized error-correcting codes, compressed sensing, and random satisfiability. In particular, we consider approximate marginal inference based on the low-complexity iterative algorithm called belief propagation (BP). In general, this algorithm is quite effective when the neighborhoods of most variable nodes do not contain short cycles. Density evolution is a technique that, in some cases, allows one to rigorously analyze the asymptotic performance of BP as the size of the sparse random graph increases. Each technique will be illustrated via worked examples and descriptions of how they are used in practice.
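
On the binary erasure channel, density evolution reduces to a one-line scalar recursion: for a regular $$(d_v, d_c)$$ LDPC ensemble on BEC($$\epsilon$$), the erasure probability of a variable-to-check message evolves as $$x_{t+1} = \epsilon\,\bigl(1-(1-x_t)^{d_c-1}\bigr)^{d_v-1}$$. A quick sketch (mine) showing the BP threshold of the (3,6) ensemble near $$\epsilon \approx 0.4294$$:

```python
def density_evolution(eps, dv=3, dc=6, iters=200):
    """BEC density evolution for a regular (dv, dc) LDPC ensemble."""
    x = eps
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
    return x

print(density_evolution(0.40))   # below threshold: erasures driven to 0
print(density_evolution(0.45))   # above threshold: decoding stalls
```

Running the recursion at channel parameters just below and just above the threshold shows the sharp transition between successful decoding and a stuck fixed point.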

Galen Reeves (Duke University)   (Video)
Understanding Phase Transitions in Compressed Sensing

Abstract: Large compressed sensing problems can exhibit phase transitions in which a small change in the number of measurements leads to a large change in the mean-squared error. Over the past decade, these phase transitions have been studied using an amazingly diverse set of ideas from information theory, statistical physics, high-dimensional geometry, and statistical decision theory. The goal of this talk is to use an information theoretic framework to explain the connections between three very different methods of analysis. The first uses the heuristic replica method from statistical physics to characterize the fundamental limits. The second uses the analysis of approximate loopy belief propagation to characterize the asymptotic performance of practical algorithms, and the third uses Gaussian process theory and concentration of measure to provide sharp non-asymptotic bounds for optimization-based algorithms.

Ronitt Rubinfeld (MIT and Tel Aviv University)   (Video: Part 1, Part 2)
Testing properties of distributions over big domains

Abstract: We survey several works regarding the complexity of testing global properties of discrete distributions, when given access to only a few samples from the distribution. Such properties might include testing if two distributions have small statistical distance, testing various independence properties, testing whether a distribution has a specific shape (such as monotone decreasing, k-modal, k-histogram, monotone hazard rate,…), and approximating the entropy. We describe bounds for such testing problems whose sample complexities are sublinear in the size of the support.

Christian Sohler (TU Dortmund)   (Video)
Testing Cluster Structure of Graphs

Abstract: We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded degree model. Given a parameter eps, a $$d$$-bounded degree graph is defined to be $$(k,F)$$-clusterable, if it can be partitioned into no more than $$k$$ parts, such that the (inner) conductance of the induced subgraph on each part is at least $$F$$ and the (outer) conductance of each part is at most $$c \epsilon^4 F^2$$, where $$c$$ depends only on $$d,k$$. Our main result is a sublinear algorithm with the running time $$O (\sqrt(n) \mathrm{poly}(F,k,1/\epsilon))$$ that takes asinput a graph with maximum degree bounded by $$d$$, parameters $$k$$, $$F$$, $$\epsilon$$ , and with probability at least $$2/3$$, accepts the graph if it is $$(k,F)$$-clusterable and rejects the graph if it is $$\epsilon$$-far from $$(k,F^*)$$-clusterable for $$F^* = c O F^2 \epsilon^4 \log n$$ , where $$cO$$ depends only on $$d,k$$.

Alexandre Tsybakov (CREST-ENSAE)
Sharp minimax and adaptive variable selection

Abstract: We derive non-asymptotic bounds for the minimax risk of variable selection under the expected Hamming loss in the problem of recovery of $$s$$-sparse vectors in $$\mathbb{R}^d$$ whose non-zero components are greater than $$a > 0$$. We obtain exact expressions for the non-asymptotic minimax risk as a function of $$(d, s, a)$$ and find explicitly the minimax selectors. Analogous results are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we establish necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptive to the parameters $$(s,a)$$ of the classes.

This is a joint work with C.Butucea and N.Stepanova.

Ruediger Urbanke (EPFL)   (Video: Part 1, Part 2)
The Area Theorem and Capacity-Achieving Codes

Abstract: The area theorem can be thought of as a conservation law for error correcting codes. It is a mathematical formulation of the fact that there are no “good” or “bad” codes, only codes of different characteristics. I will start from scratch and first show the very simple derivation of the area theorem and then discuss its consequences. In particular, I will discuss what it tells us about capacity-achieving codes, and I will review a recent result by Kudekar, Kumar, Mondelli, Pfister, Sasoglu and Urbanke that shows that Reed-Muller codes achieve capacity on the binary erasure channel, where the area theorem plays a crucial role.

Gregory Valiant (Stanford)   (Video: Part 1, Part 2)
When your big data seems too small: accurate inferences beyond the empirical distribution

Abstract: We discuss three problems related to the general challenge of making accurate inferences about a complex distribution, in the regime in which the amount of data (i.e., the sample size) is too small for the empirical distribution of the samples to be an accurate representation of the underlying distribution. The first problem we consider is the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in L1 distance (i.e., total variation distance). Perhaps surprisingly, it is often possible to “de-noise” the empirical distribution of the samples to return an approximation of the true distribution that is significantly more accurate than the empirical distribution, without relying on any prior assumptions on the distribution. We present an instance optimal learning algorithm which optimally performs this de-noising for every distribution for which such a de-noising is possible. One curious implication of our techniques is an algorithm for accurately estimating the number of new domain elements that would be seen given a new larger sample, of size up to $$n \log n$$. (Extrapolation beyond this sample size is provably information-theoretically impossible, without additional assumptions on the distribution.) While these results are applicable generally, we highlight an adaptation of this general approach to some problems in genomics (e.g. quantifying the number of unobserved protein coding variants).

The second problem we consider is the task of accurately estimating the eigenvalues of the covariance matrix of a (high dimensional real-valued) distribution, the “population spectrum”. (These eigenvalues contain basic information about the distribution, including the presence or lack of low-dimensional structure in the distribution and the applicability of many higher-level machine learning and multivariate statistical tools.) As we show, even in the regime where the sample size is linear or sublinear in the dimensionality of the distribution, and hence the eigenvalues and eigenvectors of the empirical covariance matrix are misleading, accurate approximations to the true population spectrum are possible.

The final problem we discuss is the problem of recovering a low-rank approximation to a matrix of probabilities P, given access to an observed matrix of “counts” obtained via independent samples from the distribution defined by P. This problem can be viewed as a generalization of “community detection”, and is relevant to several recent machine learning efforts, including the work on constructing “word embeddings”.

This talk is based on four papers, which are joint works with Paul Valiant, with Paul Valiant and James Zou, with Weihao Kong, and with Qingqing Huang, Sham Kakade, and Weihao Kong.

Playing games with a bandit

Abstract: As the number of advertising exchanges has grown, sellers have turned to low regret learning mechanisms to decide which exchange has the best price for their inventory. This in turn opens a question for the exchanges: how to set reserve prices to attract as many sellers as possible while maximizing revenue. This is a learning question in and of itself. In this work we formulate this problem precisely, and give algorithms showing that simply knowing that the counterparty is using a low regret learning algorithm is enough for the exchange to have a low regret algorithm for the optimal price.

Pascal Vontobel (Chinese University of Hong Kong)   (Video)
Analysis of double covers of factor graphs

Abstract: Graph covers have been shown to be a very useful tool for analyzing and understanding message-passing iterative algorithms on factor graphs. In this talk, we introduce a novel technique for investigating the relationship between a base factor graph and its double covers. Potentially, this technique is also useful for analyzing graph covers of higher degree.

David Woodruff (IBM Almaden)   (Video: Part 1, Part 2)
New Algorithms for Heavy Hitters in Data Streams

Abstract: An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-k, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the $$\ell_1$$-heavy hitters and $$\ell_2$$-heavy hitters. There are a number of algorithmic solutions for these problems, starting with the work of Misra and Gries, as well as the CountMin and CountSketch data structures, among others.

In this talk we cover several recent results developed in this area, which improve upon the classical solutions to these problems. In particular, we develop new algorithms for finding $$\ell_1$$-heavy hitters and $$\ell_2$$-heavy hitters, with significantly less memory required than what was known, and which are optimal in a number of parameter regimes.
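
For context, the classical CountMin baseline mentioned above can be sketched in a few lines (a minimal illustration of mine, not the new algorithms from the talk):

```python
import random

class CountMin:
    """CountMin sketch: small-space frequency estimates that never undercount."""
    def __init__(self, width=512, depth=4, seed=0):
        salt_rng = random.Random(seed)
        self.w, self.d = width, depth
        self.salts = [salt_rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
    def _cells(self, x):
        return [(r, hash((self.salts[r], x)) % self.w) for r in range(self.d)]
    def add(self, x, count=1):
        for r, j in self._cells(x):
            self.table[r][j] += count
    def query(self, x):
        # never underestimates; overestimates by ~ 2N/width with good probability
        return min(self.table[r][j] for r, j in self._cells(x))
```

Items whose estimated count exceeds a threshold fraction of the stream length are reported as $$\ell_1$$-heavy hitters; the new algorithms in the talk shave the memory needed for such guarantees further.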

Based on recent works with Arnab Bhattacharyya and Palash Dey, and with Vladimir Braverman, Stephen Chestnut, and Nikita Ivkin.