The minimum path cover problem asks us to find a minimum-cardinality set of paths that cover all the nodes of a directed acyclic graph (DAG). We study the case when the size k of a minimum path cover is small, that is, the DAG has a small width. This case is motivated by applications in pan-genomics, where genomic variation of a population is expressed as a DAG. We observe that classical alignment algorithms exploiting sparse dynamic programming can be extended to the sequence-against-DAG case by mimicking the algorithm for sequences on each path of a minimum path cover and handling an evaluation order anomaly with reachability queries. Namely, we propose an algorithm for finding a minimum path cover of a DAG (V,E) in O(k|E|\log|V|) time, improving all known time bounds when k is small and the DAG is not too dense. An immediate consequence is an improved space/time tradeoff for reachability queries in arbitrary directed graphs. Then we introduce the general framework for sparse dynamic programming extensions. This framework generally produces algorithms that are slower than their counterparts on sequences only by a factor k. We illustrate this on two classical problems extended to DAGs: longest increasing subsequence and longest common subsequence. For the former we obtain an algorithm with running time O(k|E|\log |V|). This matches the optimal solution to the classical problem variant when, e.g., the input sequence is modeled as a path. Finally, we apply this technique to the more involved co-linear chaining problem.
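
For comparison, the classical sequence (path) case of longest increasing subsequence, which the DAG extension generalizes, is solved in O(n log n) time by the standard patience-sorting/binary-search method. The sketch below is the textbook algorithm only, not the minimum-path-cover machinery of the abstract:

```python
import bisect

def lis_length(seq):
    """Length of a longest strictly increasing subsequence in O(n log n).

    tails[i] holds the smallest possible tail value of an increasing
    subsequence of length i + 1 found so far; each element either
    extends the longest subsequence or tightens an existing tail.
    """
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)
```

On a DAG of width k, the framework in the abstract essentially runs such a structure along each of the k paths of a minimum path cover, paying the extra factor k.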

We study clustering problems with outliers. More specifically, we look at uncapacitated facility location (UFL), $k$-median, and $k$-means. In these problems, we are given a set $\mathcal{X}$ of data points in a metric space $\delta(\cdot,\cdot)$, a set $\mathcal{F}$ of possible centres (each possibly with an opening cost), possibly an integer parameter $k$, plus an additional parameter $z$, the number of outliers. In UFL with outliers, we have to open some centres, discard up to $z$ points of $\mathcal{X}$, and assign every other point to the nearest open centre, minimizing the total assignment cost plus the centre opening costs. In $k$-median and $k$-means, we have to open up to $k$ centres, but there are no opening costs. In $k$-means, the cost of assigning point $j$ to centre $i$ is $\delta^2(j,i)$. We present several results, focusing on cases where $\delta$ is a doubling metric or the shortest-path metric of a graph from a minor-closed family. For uniform UFL with outliers on such metrics we show that a multiswap simple local search heuristic yields a PTAS. With a bit more work, we extend this to bicriteria approximations for the $k$-median and $k$-means problems in the same metrics: for any constant $\epsilon > 0$, we can find a solution using $(1+\epsilon)k$ centres whose cost is at most a $(1+\epsilon)$-factor of the optimum and that discards at most $z$ outliers.
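
A single-swap variant of the local search heuristic (without outliers) is easy to sketch. This toy version only illustrates the swap-based paradigm that the abstract's multiswap analysis builds on; it is not the analyzed algorithm:

```python
def kmedian_cost(points, centres, dist):
    """Total assignment cost: each point pays distance to nearest centre."""
    return sum(min(dist(p, c) for c in centres) for p in points)

def local_search_kmedian(points, candidates, k, dist):
    """Single-swap local search for k-median (toy illustration).

    Repeatedly swap one open centre for one closed candidate while the
    total assignment cost strictly improves; stop at a local optimum.
    """
    centres = list(candidates[:k])
    best = kmedian_cost(points, centres, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in candidates:
                if cand in centres:
                    continue
                trial = centres[:i] + [cand] + centres[i + 1:]
                cost = kmedian_cost(points, trial, dist)
                if cost < best:
                    centres, best, improved = trial, cost, True
    return centres
```

Multiswap local search generalizes the inner step to exchanging up to p centres at once, which is what enables the PTAS guarantees in doubling and minor-closed metrics.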

We consider two related scheduling problems: single-resource constrained scheduling on identical parallel machines, and a generalization with resource-dependent processing times. In both problems, jobs require a certain amount of an additional resource and have to be scheduled on machines so as to minimize the makespan, while at every point in time a given resource capacity is not exceeded. In the first variant the processing times and resource amounts are fixed, while in the second the former depend on the latter. Both problems contain bin packing with cardinality constraint as a special case, and therefore they are strongly NP-complete even for a constant number of machines larger than three, which can be proven by a reduction from 3-Partition. Furthermore, if the number of machines is part of the input, we cannot hope for an approximation algorithm with absolute approximation ratio smaller than 3/2. We present asymptotic fully polynomial time approximation schemes (AFPTAS) for both problems: for any ε > 0, a schedule of length at most (1+ε) times the optimum plus an additive term of O(log(1/ε)/ε) is provided, and the running time is polynomially bounded in 1/ε and the input length. Up to now only approximation algorithms with absolute approximation ratios were known. Furthermore, the AFPTAS for resource constrained scheduling on identical parallel machines directly improves the additive term of the best previously known AFPTAS for bin packing with cardinality constraint.
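
The constraints of the first variant can be made concrete with a small feasibility checker (an illustrative sketch with our own representation, not from the paper): a schedule is valid iff no machine runs two jobs at once and the running jobs never exceed the resource capacity.

```python
def is_feasible(jobs, starts, machines, m, capacity):
    """Check a schedule for single-resource constrained scheduling.

    jobs[j] = (processing_time, resource_demand); starts[j] is the start
    time and machines[j] the machine (0..m-1) of job j. Since resource
    usage and machine occupancy only increase at job start times, it
    suffices to check the constraints at those events.
    """
    n = len(jobs)
    for t in sorted({starts[j] for j in range(n)}):
        running = [j for j in range(n)
                   if starts[j] <= t < starts[j] + jobs[j][0]]
        if sum(jobs[j][1] for j in running) > capacity:
            return False            # resource capacity exceeded
        used = [machines[j] for j in running]
        if len(used) != len(set(used)) or any(x >= m for x in used):
            return False            # machine conflict or invalid machine
    return True
```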

The distinct elements problem is one of the fundamental problems in streaming algorithms --- given a stream of integers in the range $\{1,\ldots,n\}$, we wish to provide a $(1+\varepsilon)$-approximation to the number of distinct elements in the input. After a long line of research, an optimal solution for this problem with constant probability of success, using $\mathcal{O}(\frac{1}{\varepsilon^2}+\log n)$ bits of space, was given by Kane, Nelson and Woodruff in 2010. The standard approach to achieving low failure probability $\delta$ is to run $\Theta(\log \delta^{-1})$ parallel repetitions of the original algorithm and report the median of the computed answers. We show that such a multiplicative space blow-up is unnecessary: we provide an optimal algorithm using $\mathcal{O}(\frac{\log \delta^{-1}}{\varepsilon^2} + \log n)$ bits of space --- matching known lower bounds for this problem. That is, the $\log\delta^{-1}$ factor does not multiply the $\log n$ term. This completely settles the space complexity of the distinct elements problem with respect to all standard parameters. We also consider the \emph{strong tracking} (or \emph{continuous monitoring}) variant of the distinct elements problem, where we want an algorithm that provides an approximation of the number of distinct elements seen so far, at all times of the stream. We show that this variant can be solved using $\mathcal{O}(\frac{\log \log n + \log \delta^{-1}}{\varepsilon^2} + \log n)$ bits of space, which we show to be optimal.
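
For intuition about small-space distinct counting, a classical baseline estimator, the k-minimum-values (KMV) sketch, is easy to state. This is an illustrative textbook sketch, not the optimal algorithm of the abstract:

```python
import bisect
import hashlib

def _hash01(x):
    """Map x deterministically to a pseudo-random value in [0, 1)."""
    h = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def kmv_estimate(stream, k=256):
    """k-minimum-values (KMV) estimate of the distinct-element count.

    Keep the k smallest distinct hash values seen; if the k-th smallest
    is v, the distinct count is estimated as (k - 1) / v, with relative
    error roughly 1/sqrt(k).
    """
    mins = []  # sorted list of at most k smallest distinct hash values
    for x in stream:
        v = _hash01(x)
        if v in mins:               # duplicates hash identically
            continue
        if len(mins) < k or v < mins[-1]:
            bisect.insort(mins, v)
            if len(mins) > k:
                mins.pop()
    if len(mins) < k:
        return len(mins)            # fewer than k distinct seen: exact
    return (k - 1) / mins[-1]
```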

We study the k-server problem in the resource augmentation setting, i.e., when the performance of the online algorithm with k servers is compared to the offline optimal solution with h ≤ k servers. The problem is very poorly understood beyond uniform metrics. For this special case, the classic k-server algorithms are roughly (1+1/ε)-competitive when k=(1+ε)h, for any ε>0. Surprisingly, however, no o(h)-competitive algorithm is known even for HSTs of depth 2, even when k/h is arbitrarily large. We obtain several new results for the problem. First we show that the known k-server algorithms fail even on very simple metrics. In particular, the Double Coverage algorithm has competitive ratio Ω(h) irrespective of the value of k, even for depth-2 HSTs. Similarly, the Work Function Algorithm, which is believed to be optimal for all metric spaces when k=h, has competitive ratio Ω(h) on depth-3 HSTs even when k=2h. Our main result is a new algorithm that is O(1)-competitive for constant-depth trees whenever k=(1+ε)h for any ε > 0. Finally, we give a general lower bound: any deterministic online algorithm has competitive ratio at least 2.4 even for depth-2 HSTs and when k/h is arbitrarily large. This gives a surprising qualitative separation between uniform metrics and depth-2 HSTs for the (h,k)-server problem, and gives the strongest known lower bound for the problem on general metrics.
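
The Double Coverage algorithm mentioned above is simple to state on the real line (a sketch of the classic rule, not of the abstract's new algorithm): on a request between two servers, the two adjacent servers move toward it at equal speed until one arrives; on a request outside the servers' hull, only the nearest server moves.

```python
def double_coverage_line(servers, requests):
    """Double Coverage for the k-server problem on the real line.

    Returns (final server positions, total movement cost).
    """
    pos = sorted(float(s) for s in servers)
    cost = 0.0
    for r in requests:
        left = [p for p in pos if p <= r]
        right = [p for p in pos if p >= r]
        if left and right:
            l, rr = max(left), min(right)
            if l == rr:                  # a server already sits on r
                continue
            d = min(r - l, rr - r)       # both move distance d
            pos.remove(l)
            pos.remove(rr)
            pos += [l + d, rr - d]       # the closer server lands on r
            cost += 2 * d
        elif right:                      # r is left of all servers
            rr = min(right)
            pos.remove(rr)
            pos.append(r)
            cost += rr - r
        else:                            # r is right of all servers
            l = max(left)
            pos.remove(l)
            pos.append(r)
            cost += r - l
        pos.sort()
    return pos, cost
```

On trees, the analogous rule moves all servers adjacent to the request; the abstract shows this rule cannot exploit extra servers even on depth-2 HSTs.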

Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and/or correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider the data structure known as a "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set S of elements approximately, using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any x in S it should always answer 'Yes', and for any x not in S it should answer 'Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size n and error ε that is secure against t queries and uses only O(n log(1/ε) + t) bits of memory. In comparison, n log(1/ε) is the best possible under a non-adaptive adversary.
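
For reference, a textbook Bloom filter (the non-adversarial baseline that the work above strengthens) can be sketched as follows; the SHA-256-based hashing here is an illustrative choice, not the paper's construction:

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: m bits, k hash functions.

    Always answers 'Yes' for inserted elements; answers 'Yes' for a
    non-member only with small (false-positive) probability, which is
    the source of the adaptivity issues studied in the paper.
    """
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # derive k bit positions by salting the hash with the index i
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1
                   for p in self._positions(item))
```

An adaptive adversary can query the filter, observe false positives, and use them to find further errors; this is exactly the behavior the adversarial model captures.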

We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1+\rho_u+o(1)} + O(dn)$ and query time $n^{\rho_q+o(1)} + dn^{o(1)}$ for every $\rho_u, \rho_q \ge 0$ with:
$$c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}.$$
For example, for the approximation $c = 2$ we achieve:
- Space $n^{1.77...}$ and query time $n^{o(1)}$, significantly improving upon known data structures that support very fast queries;
- Space $n^{1.14...}$ and query time $n^{0.14...}$, matching the optimal data-dependent Locality-Sensitive Hashing (LSH);
- Space $n^{1+o(1)}$ and query time $n^{0.43...}$, making significant progress in the regime of near-linear space, which is arguably of the most interest for practice.

This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor, improving upon [Kap15]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on spherical Locality-Sensitive Filtering [BDGL16] and data-dependent hashing [AR15]. Our matching lower bounds are of two types: conditional and unconditional. We prove tightness of the whole trade-off in a restricted model of computation. We show unconditional cell-probe lower bounds for one and two probes that match the trade-off for $\rho_q = 0$. This is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the corresponding one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
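
The trade-off curve $c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}$ can be checked numerically by solving for $\rho_u$ given $\rho_q$; for $c = 2$ this recovers the three quoted regimes:

```python
from math import sqrt, isclose

def rho_u(c, rho_q):
    """Given c and rho_q, solve the trade-off
    c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2*c^2 - 1)
    for the space exponent rho_u."""
    s = (sqrt(2 * c * c - 1) - c * c * sqrt(rho_q)) / (c * c - 1)
    return s * s

# the three regimes for c = 2:
assert isclose(1 + rho_u(2, 0), 1.7777, abs_tol=1e-3)   # fastest queries
assert isclose(rho_u(2, 1 / 7), 1 / 7)                  # balanced (LSH) point
assert rho_u(2, 7 / 16) < 1e-12                         # near-linear space
```

Here $\rho_q = 0$ gives $\rho_u = 7/9 = 0.77\ldots$, the balanced point is $\rho_q = \rho_u = 1/7 = 0.14\ldots$, and $\rho_u = 0$ gives $\rho_q = 7/16 = 0.43\ldots$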

We give a new arithmetic algorithm to compute the generalized Discrete Fourier Transform (DFT) over finite groups $G$. The new algorithm uses $O(|G|^{\omega/2 + o(1)})$ operations to compute the generalized DFT over finite groups of Lie type, including the linear, orthogonal, and symplectic families and their variants, as well as all finite simple groups of Lie type. Here $\omega$ is the exponent of matrix multiplication, so the exponent $\omega/2$ is optimal if $\omega = 2$. Previously, ``exponent one'' algorithms were known for supersolvable groups and the symmetric and alternating groups. No exponent one algorithms were known, even under the assumption $\omega = 2$, for families of linear groups of fixed dimension, and indeed the previous best-known algorithm for $\SL_2(\F_q)$ had exponent $4/3$ despite being the focus of significant effort. We unconditionally achieve exponent at most $1.19$ for this group, and exponent one if $\omega = 2$. Our algorithm also yields an improved exponent for computing the generalized DFT over general finite groups $G$, beating the longstanding previous best upper bound for any $\omega$. In particular, assuming $\omega = 2$, we achieve exponent $\sqrt{2}$, while the previous best was $3/2$.
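
As a baseline for intuition, the abelian special case $G = \mathbb{Z}_n$ reduces to the classical DFT; a naive $O(n^2)$ sketch follows (plain Python with complex arithmetic; the algorithms above for nonabelian $G$ are far more involved):

```python
import cmath

def dft_cyclic(a):
    """Naive O(n^2) DFT over the cyclic group Z_n:
    hat_a[k] = sum_j a[j] * exp(-2*pi*i*j*k / n).

    For nonabelian G, the generalized DFT instead decomposes a function
    on G into one matrix block per irreducible representation of G.
    """
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]
```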

For a set $P$ of $n$ points in the unit ball $\mathbf{b} \subseteq \mathbb{R}^d$, consider the problem of finding a small subset $T \subseteq P$ such that its convex hull $\epsilon$-approximates the convex hull of the original set. We present an efficient algorithm to compute such an $\epsilon'$-approximation of size $k_{alg}$, where $\epsilon'$ is a function of $\epsilon$, and $k_{alg}$ is a function of the minimum size $k_{opt}$ of such an $\epsilon$-approximation. Surprisingly, there is no dependency on the dimension $d$ in either bound. Furthermore, every point of $P$ can be $\epsilon$-approximated by a convex combination of points of $T$ that is $O(1/\epsilon^2)$-sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset $T$ of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.
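
The dimension-free $O(1/\epsilon^2)$-sparsity phenomenon can be illustrated with a classical Frank-Wolfe style greedy (a hedged sketch, not the paper's algorithm): to approximate a point $q$ in the convex hull of $P$, repeatedly step toward the point of $P$ most aligned with the residual direction; each step adds at most one new point to the convex combination.

```python
def sparse_convex_approx(P, q, eps, max_iter=20000):
    """Frank-Wolfe style greedy: approximate q (assumed to lie in the
    convex hull of P) by a convex combination built one point at a time.

    With the standard 2/(t+2) step size the error decays like
    O(1/sqrt(t)), so roughly O(1/eps^2) points suffice, independent of
    the dimension d.
    """
    d = len(q)
    x = list(P[0])
    used = {0}
    for t in range(1, max_iter):
        g = [q[i] - x[i] for i in range(d)]      # residual direction
        err = sum(v * v for v in g) ** 0.5
        if err <= eps:
            break
        # the point of P with largest inner product with the residual
        j = max(range(len(P)),
                key=lambda j: sum(P[j][i] * g[i] for i in range(d)))
        used.add(j)
        gamma = 2.0 / (t + 2)                    # standard FW step size
        x = [(1 - gamma) * x[i] + gamma * P[j][i] for i in range(d)]
    return x, used
```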

We algorithmize the structural characterization of claw-free graphs by Chudnovsky and Seymour. Building on this result, we show that Dominating Set on claw-free graphs is (i) fixed-parameter tractable and (ii) even possesses a polynomial kernel. To complement these results, we establish that Dominating Set is unlikely to be fixed-parameter tractable on the slightly larger class of graphs that exclude K_1,4 as an induced subgraph (K_1,4-free graphs). Our algorithmization can also be used to show that the related Connected Dominating Set problem is fixed-parameter tractable on claw-free graphs. To complement that result, we show that Connected Dominating Set is unlikely to have a polynomial kernel on claw-free graphs and is unlikely to be fixed-parameter tractable on K_1,4-free graphs. Combined, our results provide a dichotomy for Dominating Set and Connected Dominating Set on K_1,L-free graphs: the problems are fixed-parameter tractable if and only if L <= 3.
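
For concreteness, the standard (non-parameterized) greedy approximation for Dominating Set is sketched below; it is only an illustration of the problem, whereas the fixed-parameter algorithms above exploit claw-free structure:

```python
def greedy_dominating_set(adj):
    """Greedy (ln n + 1)-approximation for Dominating Set.

    adj maps each vertex to its set of neighbours. Repeatedly add the
    vertex whose closed neighbourhood covers the most still-undominated
    vertices, until every vertex is dominated.
    """
    undominated = set(adj)
    dom = set()
    while undominated:
        v = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        dom.add(v)
        undominated -= {v} | adj[v]
    return dom
```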

Convergence rate and stability of a solution concept are classically measured in terms of ``eventually'' and ``forever'', respectively. In the wake of recent computational criticisms of this approach, we study whether these time frames can be updated so that states are computed ``quickly'' and remain stable for ``long enough''. Logit dynamics allows irrationality in players' behavior, and may take time exponential in the number of players $n$ to converge to a stable state (i.e., a certain distribution over pure strategy profiles). We prove that every potential game for which the behavior of the logit dynamics is not chaotic as $n$ increases admits distributions that are stable for a super-polynomial number of steps in $n$, regardless of the players' irrationality and of the starting profile of the dynamics. The convergence rate to these \emph{metastable distributions} is polynomial in $n$ when the players are not too rational. Our proofs build upon a number of involved technical contributions.
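
Logit dynamics itself is easy to simulate; below is a toy sketch on a two-player coordination game (a potential game) in which the revising player picks strategy $s$ with probability proportional to $e^{\beta \cdot \mathrm{payoff}(s)}$, where $\beta$ is the rationality parameter. The game and names are our own illustration, not from the paper:

```python
import math
import random

def logit_step(profile, beta, payoff, strategies, rng):
    """One step of logit dynamics: a uniformly random player revises its
    strategy, choosing s with probability proportional to
    exp(beta * payoff(player, s, current profile))."""
    i = rng.randrange(len(profile))
    weights = [math.exp(beta * payoff(i, s, profile)) for s in strategies]
    r = rng.random() * sum(weights)
    for s, w in zip(strategies, weights):
        r -= w
        if r <= 0:
            profile = list(profile)
            profile[i] = s
            break
    return profile

def payoff(i, s, profile):
    """Two-player coordination game: utility 1 when strategies match."""
    return 1.0 if s == profile[1 - i] else 0.0
```

At high $\beta$ (rational players) the dynamics spends almost all its time in the coordinated profiles; at $\beta = 0$ it is a uniform random walk over profiles.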