Agnostic Learning in Permutation-Invariant Domains

We generalize algorithms from computational learning theory that are successful under the uniform distribution on the Boolean hypercube {0,...

Maximizing k-Submodular Functions and Beyond

We consider the maximization problem in the value oracle model of functions defined on k-tuples of sets that are submodular in every orthant and...

Improved Approximation Algorithms for Matroid and Knapsack Median Problems and Applications

We consider the matroid median problem [Krishnaswamy et al. 2011], wherein we are given a set of facilities with opening costs and a matroid on the...

A Linear-Size Logarithmic Stretch Path-Reporting Distance Oracle for General Graphs

Thorup and Zwick [2001a] proposed a landmark distance oracle with the following properties. Given an n-vertex undirected graph G = (V, E) and a...

Compressed Cache-Oblivious String B-Tree

In this article, we study three variants of the well-known prefix-search problem for strings, and we design solutions for the cache-oblivious model which improve the best known results. Among these contributions, we close (asymptotically) the classic problem, which asks for the detection of the set of strings that share the longest common prefix...

Data Structures for Path Queries

Consider a tree T on n nodes, each having a weight drawn from [1‥σ]. In this article, we study the problem of supporting various path queries over the tree T. The path counting query asks for the number of nodes on a query path whose weights are in a query range, while the path reporting query asks to report these nodes. The path...

Approximation Algorithms for Movement Repairmen

In the Movement Repairmen (MR) problem, we are given a metric space (V, d) along with a set R of k repairmen r1, r2, …, rk with their start depots s1, s2, …, sk ∈ V and speeds v1, v2, …, vk ⩾ 0, respectively, and a set C of m clients c1, c2, …, cm having start locations s′1, s′2, …, s′m...

On Hierarchical Routing in Doubling Metrics

We study the problem of routing in doubling metrics and show how to perform hierarchical routing in such metrics with small stretch and compact routing tables (i.e., with a small amount of routing information stored at each vertex). We say that a metric (X, d) has doubling dimension dim(X) at most α if every ball can be...


About TALG

The ACM Transactions on Algorithms (TALG) publishes original research of the highest quality dealing with algorithms that are inherently discrete and finite, and having mathematical content in a natural way, either in the objective or in the analysis.

Nearest-Neighbor Searching Under Uncertainty II

Nearest-neighbor (NN) search, which returns the nearest neighbor of a query point in a set of points, is an important and widely studied problem in many fields, and it has a wide range of applications. In many of them, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point is specified as a probability density function. We present efficient algorithms for (i) computing all points that are nearest neighbors of a query point with nonzero probability; (ii) estimating, within a specified additive error, the probability of a point being the nearest neighbor of a query point; (iii) using it to return the point that maximizes the probability of being the nearest neighbor, or all the points whose probability of being the NN exceeds some threshold.

Adaptive and Approximate Orthogonal Range Counting

We present three new results on one of the most basic problems in geometric data structures, 2-D orthogonal range counting. All the results are in the $w$-bit word RAM model.
- It is well known that there are linear-space data structures for 2-D orthogonal range counting with worst-case optimal query time O(log n / loglog n). We give an O(n loglog n)-space adaptive data structure that improves the query time to O(loglog n + log k / loglog n), where k is the output count. When k=O(1), our bounds match the state of the art for the 2-D orthogonal range emptiness problem [Chan, Larsen, and Patrascu, SoCG 2011].
- We give an O(n loglog n)-space data structure for approximate 2-D orthogonal range counting that can compute a (1+delta)-factor approximation to the count in O(loglog n) time for any fixed constant delta>0. Again, our bounds match the state of the art for the 2-D orthogonal range emptiness problem.
- Lastly, we consider the 1-D range selection problem, where a query in an array involves finding the kth least element in a given subarray. This problem is closely related to 2-D 3-sided orthogonal range counting. Recently, Jorgensen and Larsen [SODA 2011] presented a linear-space adaptive data structure with query time O(loglog n + log k / loglog n). We give a new linear-space structure that improves the query time to O(1 + log k / loglog n), exactly matching the lower bound proved by Jorgensen and Larsen.
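
To make the two query types in this abstract concrete, here is a minimal brute-force sketch in Python. It only fixes the query semantics; it is not the data structure from the paper and has none of its time or space bounds. The function names `range_count` and `range_select`, the inclusive bounds, and the 1-based `k` are assumptions made for illustration.

```python
import heapq


def range_count(points, x1, x2, y1, y2):
    """Count points (x, y) with x1 <= x <= x2 and y1 <= y <= y2 (linear scan)."""
    return sum(1 for x, y in points if x1 <= x <= x2 and y1 <= y <= y2)


def range_select(values, lo, hi, k):
    """Return the k-th smallest element (k is 1-based) of values[lo..hi], inclusive."""
    window = values[lo:hi + 1]
    if not 1 <= k <= len(window):
        raise ValueError("k must be between 1 and the subarray length")
    # nsmallest returns the k smallest elements in ascending order.
    return heapq.nsmallest(k, window)[-1]
```

A structure with the bounds quoted above would answer the same queries without scanning the input on each query.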

An Improved Approximation for k-median, and Positive Correlation in Budgeted Optimization

Dependent rounding is a useful technique for optimization problems with hard budget constraints. This framework naturally leads to \emph{negative correlation} properties. However, what if an application naturally calls for dependent rounding on the one hand, and desires \emph{positive} correlation on the other? More generally, we develop algorithms that guarantee the known properties of dependent rounding, but also have nearly best-possible behavior -- near-independence, which generalizes positive correlation -- on ``small'' subsets of the variables. The recent breakthrough of Li \& Svensson for the classical $k$-median problem has to handle positive correlation in certain dependent-rounding settings, and does so implicitly. We improve upon Li-Svensson's approximation ratio for $k$-median from $2.732 + \epsilon$ to $2.611 + \epsilon$ by developing an algorithm that improves upon various aspects of their work. Our dependent-rounding approach helps us improve the dependence of the runtime on the parameter $\epsilon$ from Li-Svensson's $N^{O(1/\epsilon^2)}$ to $N^{O((1/\epsilon) \log(1/\epsilon))}$.

Inapproximability of the Multi-level Uncapacitated Facility Location Problem

Tight lower bound for the channel assignment problem

We study the complexity of the Channel Assignment problem. An open problem asks whether Channel Assignment admits an $O(c^n)$-time algorithm, for a constant $c$ independent of the weights on the edges. We answer this question in the negative, i.e., we show that there is no $2^{o(n\log n)}$-time algorithm solving Channel Assignment unless the Exponential Time Hypothesis fails. Note that the best known algorithm runs in time $O^*(n!) = 2^{O(n\log n)}$, so our lower bound is tight.

On Uniform Capacitated k-Median Beyond the Natural LP Relaxation

In this paper, we study the uniform capacitated k-median problem. In the problem, we are given a set F of potential facility locations, a set C of clients, a metric d over F \cup C, an upper bound k on the number of facilities we can open and an upper bound u on the number of clients each facility can serve. We need to open a subset S \subseteq F of k facilities and connect clients in C to facilities in S so that each facility is connected to at most u clients. The goal is to minimize the total connection cost over all clients. Obtaining a constant approximation algorithm for this problem is a notorious open problem; most previous works gave constant approximations by either violating the capacity constraints or the cardinality constraint. Notably, all these algorithms are based on the natural LP-relaxation for the problem. The LP-relaxation has unbounded integrality gap, even when we are allowed to violate the capacity constraints or the cardinality constraint by a factor of 2-\eps. Our result is an \exp(O(1/\eps^2))-approximation algorithm for the problem that violates the cardinality constraint by a factor of 1+\eps. This is already beyond the capability of the natural LP relaxation, as it has unbounded integrality gap even if we are allowed to open (2-\eps)k facilities. Indeed, our result is based on a novel LP for this problem. We hope that this LP is the first step towards a constant approximation for capacitated k-median.

Tabulating Pseudoprimes and Tabulating Liars

This paper presents new algorithms for two problems related to the Miller-Rabin-Selfridge primality test. The first problem is to tabulate strong pseudoprimes to a fixed base $a$. Tabulating up to $x$ requires $O(x)$ multiplications, where previous methods required $O(x \log{x})$ multiplications. The second problem is to find all strong liars and witnesses, given a fixed odd composite $n$. This appears to be unstudied, and an algorithm is presented that requires $O(n (\log\log{n})^2)$ multiplications. Although interesting in their own right, a notable application is the search for sets of composites with no reliable witness.
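
For readers unfamiliar with the terminology, the following Python sketch shows the standard strong probable-prime test that defines strong pseudoprimes and strong liars, together with a brute-force enumeration of the liars of a given odd composite. The enumeration simply tries every base, so it is not the algorithm announced in the abstract and does not achieve its multiplication count. The function names are illustrative assumptions.

```python
def is_strong_probable_prime(n, a):
    """Return True if odd n > 2 passes the strong (Miller-Rabin) test to base a."""
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    # Square up to s - 1 times, looking for -1 mod n.
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False


def strong_liars(n):
    """Brute force: all bases 1 <= a < n for which odd n passes the strong test."""
    return [a for a in range(1, n) if is_strong_probable_prime(n, a)]
```

For example, 2047 = 23 * 89 passes the test to base 2, which makes it a strong pseudoprime to that base, and the only strong liars of 9 are 1 and 8.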

Deletion Without Rebalancing in Binary Search Trees

We address the vexing issue of deletions in balanced trees. Rebalancing after a deletion is generally more complicated than rebalancing after an insertion. Textbooks neglect deletion rebalancing, and many B-tree-based database systems do not do it. We describe a relaxation of AVL trees in which rebalancing is done after insertions but not after deletions, yet access time remains logarithmic in the number of insertions. For many applications of balanced trees, our structure offers performance competitive with that of classical balanced trees. With the addition of periodic rebuilding, the performance of our structure is theoretically superior to that of many if not all classic balanced tree structures. Our structure needs lglg(m) + 1 bits of balance information per node, where m is the number of insertions and lg is the base-two logarithm, or lglg(n) + O(1) with periodic rebuilding, where n is the number of nodes. An insertion takes up to two rotations and O(1) amortized time, the same as in standard AVL trees. Using an analysis that relies on an exponential potential function, we show that rebalancing steps occur with a frequency that is exponentially small in the height of the affected node. Our techniques apply to other types of balanced trees, notably B-trees, as we show in a companion paper, and in particular red-black trees, which can be viewed as a special case of B-trees.

Waste Makes Haste: Bounded Time algorithms for Envy-Free Cake Cutting with Free Disposal

We consider the classic problem of envy-free division of a heterogeneous good ("cake") among several agents. It is known that, when the allotted pieces must be connected, the problem cannot be solved by a finite algorithm for 3 or more agents. Even when the pieces may be disconnected, no bounded-time algorithm is known for 5 or more agents. The impossibility result, however, assumes that the entire cake must be allocated. In this paper we replace the entire-allocation requirement with a weaker partial-proportionality requirement: the piece given to each agent must be worth for it at least a certain positive fraction of the entire cake value. We prove that this version of the problem is solvable in bounded time even when the pieces must be connected. We present bounded-time envy-free cake-cutting algorithms for: (1) giving each of $n$ agents a connected piece with a positive value; (2) giving each of 3 agents a connected piece worth at least 1/3; (3) giving each of 4 agents a connected piece worth at least 1/7; (4) giving each of 4 agents a disconnected piece worth at least 1/4; (5) giving each of $n$ agents a disconnected piece worth at least $(1-\epsilon)/n$ for any positive $\epsilon$.

Smoothed Analysis of the 2-Opt Algorithm for the General TSP

2-Opt is a simple local search heuristic for the traveling salesperson problem, which performs very well in experiments both with respect to running time and solution quality. In contrast to this, there are instances on which 2-Opt may need an exponential number of steps to reach a local optimum. To understand why 2-Opt usually finds local optima quickly in experiments, we study its expected running time in the model of smoothed analysis, which can be considered as a less pessimistic variant of worst-case analysis in which the adversarial input is subject to a small amount of random noise. In our probabilistic input model an adversary chooses an arbitrary graph $G$ and additionally a probability density function for each edge according to which its length is chosen. We prove that in this model the expected number of local improvements is $O(mn\phi(\log m)^3\cdot 4^{3\sqrt{\ln{m}}})=m^{1+o(1)}n\phi$, where $n$ and $m$ denote the number of vertices and edges of $G$, respectively, and $\phi$ denotes an upper bound on the density functions.
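
As a reminder of what a single 2-Opt improvement step does, here is a small Python sketch of the heuristic on a complete graph given by a symmetric distance matrix. It is a plain textbook version written for illustration, not the setting analyzed in the paper (which allows general graphs and random edge lengths); the function names and the tolerance `1e-12` are assumptions.

```python
def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix dist."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def two_opt(tour, dist):
    """Apply improving 2-Opt moves until the tour is locally optimal."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two tour edges share a vertex
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a, b) and (c, d) by (a, c) and (b, d).
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour
```

Each accepted move strictly shortens the tour, so the loop terminates; the quantity bounded in the abstract is the number of such accepted moves.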

Space-Constrained Interval Selection

We study streaming algorithms for the interval selection problem: finding a maximum cardinality subset of disjoint intervals on the line. A deterministic $2$-approximation streaming algorithm for this problem is developed, together with an algorithm for the special case of proper intervals, achieving an improved approximation ratio of $3/2$. We complement these upper bounds by proving that they are essentially best possible in the streaming setting: it is shown that an approximation ratio of $2 - \epsilon$ (or $3 / 2 - \epsilon$ for proper intervals) cannot be achieved unless the space is linear in the input size. In passing, we also answer an open question of Adler and Azar (J.\ Scheduling 2003) regarding the space complexity of constant-competitive randomized preemptive online algorithms for the same problem.
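
The optimum that the streaming algorithms are compared against can be computed offline by the classical earliest-end-first greedy rule. The Python sketch below shows that offline baseline only; it stores and sorts the whole input, so it is not a streaming algorithm and is not the method of the paper. Treating intervals as half-open pairs `(start, end)`, so that touching intervals count as disjoint, is an assumption made here.

```python
def max_disjoint_intervals(intervals):
    """Offline greedy: a maximum-cardinality set of pairwise disjoint intervals.

    Intervals are (start, end) pairs treated as half-open [start, end).
    """
    chosen = []
    last_end = float("-inf")
    # Scan by increasing right endpoint and keep every interval that fits.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen
```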

Semi-Streaming Set Cover

This paper studies the set cover problem under the semi-streaming model. The underlying set system is formalized in terms of a hypergraph $G = (V, E)$ whose edges arrive one-by-one and the goal is to construct an edge cover $F \subseteq E$ with the objective of minimizing the cardinality (cost in the weighted case) of $F$. We consider a parameterized relaxation of this problem, where given some $0 \leq \epsilon < 1$, the goal is to construct an edge $(1 - \epsilon)$-cover, namely, a subset of edges incident to all but an $\epsilon$-fraction of the vertices (or their benefit in the weighted case). The key limitation imposed on the algorithm is that its space is limited to (poly)logarithmically many bits per vertex. Our main result is an asymptotically tight trade-off between $\epsilon$ and the approximation ratio: We design a semi-streaming algorithm that on input hypergraph $G$, constructs a succinct data structure $\mathcal{D}$ such that for every $0 \leq \epsilon < 1$, an edge $(1 - \epsilon)$-cover that approximates the optimal edge \mbox{($1$-)cover} within a factor of $f(\epsilon, n)$ can be extracted from $\mathcal{D}$ (efficiently and with no additional space requirements), where \[ f(\epsilon, n) = \left\{ \begin{array}{ll} O (1 / \epsilon), & \text{if } \epsilon > 1 / \sqrt{n} \\ O (\sqrt{n}), & \text{otherwise} \end{array} \right. \, . \] In particular for the traditional set cover problem we obtain an $O(\sqrt{n})$-approximation. This algorithm is proved to be best possible by establishing a family (parameterized by $\epsilon$) of matching lower bounds.

Sparse Fault-Tolerant BFS Structures

A {\em fault-tolerant} structure for a network is required to continue functioning following the failure of some of the network's edges or vertices. This paper considers {\em breadth-first search (BFS)} spanning trees, and addresses the problem of designing a sparse {\em fault-tolerant} BFS structure, or {\em FT-BFS structure} for short, namely, a sparse subgraph $T$ of the given network $G$ such that subsequent to the failure of a single edge or vertex, the surviving part $T'$ of $T$ still contains a BFS spanning tree for (the surviving part of) $G$. Our main results are as follows. We present an algorithm that for every $n$-vertex graph $G$ and source node $s$ constructs a (single edge failure) FT-BFS structure rooted at $s$ with $O(n \cdot \min\{Depth(s), \sqrt{n}\})$ edges, where $Depth(s)$ is the depth of the BFS tree rooted at $s$. This result is complemented by a matching lower bound. We then consider {\em fault-tolerant multi-source BFS structures}, or {\em FT-MBFS structure} for short, aiming to provide (following a failure) a BFS tree rooted at each source $s\in S$ for some subset of sources $S\subseteq V$. Finally, we propose an $O(\log n)$ approximation algorithm for constructing FT-BFS and FT-MBFS structures.

An Improved Approximation Algorithm for the Edge-Disjoint Paths Problem with Congestion Two

Minimum Latency Submodular Cover

We study the Minimum Latency Submodular Cover problem (MLSC), which consists of a metric $(V,d)$ with source $r\in V$ and $m$ monotone submodular functions $f_1, f_2, \ldots, f_m: 2^V \rightarrow [0,1]$. The goal is to find a path originating at $r$ that minimizes the total cover time of all functions. This generalizes well-studied problems, such as Submodular Ranking [AzarG11] and Group Steiner Tree [GargKR00]. We give a polynomial time $O(\log \frac{1}{\epsilon} \cdot \log^{2+\delta} |V|)$-approximation algorithm for MLSC, where $\epsilon>0$ is the smallest non-zero marginal increase of any $\{f_i\}_{i=1}^m$ and $\delta>0$ is any constant. We also consider the Latency Covering Steiner Tree problem (LCST), which is the special case of MLSC where the $f_i$s are multi-coverage functions. This is a common generalization of the Latency Group Steiner Tree [GuptaNR10, ChakrabartyS11] and Generalized Min-sum Set Cover [AzarGY09, BansalGK10] problems. We obtain an $O(\log^2|V|)$-approximation algorithm for LCST. Finally we study a natural stochastic extension of the Submodular Ranking problem, and obtain an adaptive algorithm with an $O(\log 1/\epsilon)$ approximation ratio, which is best possible. This result also generalizes some previously studied stochastic optimization problems, such as Stochastic Set Cover [GoemansV06] and Shared Filter Evaluation [MunagalaSW07,LiuPRY08].

2-Edge Connectivity in Directed Graphs

Edge and vertex connectivity are fundamental concepts in graph theory. While they have been thoroughly studied in the case of undirected graphs, surprisingly not much has been investigated for directed graphs. In this paper we study 2-edge connectivity problems in directed graphs and, in particular, we consider the computation of the following natural relation: We say that two vertices v and w are 2-edge-connected if there are two edge-disjoint paths from v to w and two edge-disjoint paths from w to v. This relation partitions the vertices into blocks such that all vertices in the same block are 2-edge-connected. Differently from the undirected case, those blocks do not correspond to the 2-edge-connected components of the graph. The main result of this paper is an algorithm for computing the 2-edge-connected blocks of a directed graph in linear time. Besides being asymptotically optimal, our algorithm improves significantly over previous bounds. Once the 2-edge-connected blocks are available, we can test in constant time if two vertices are 2-edge-connected. Additionally, when two query vertices v and w are not 2-edge-connected, we can produce in constant time a witness of this property. We are also able to compute in linear time a sparse certificate for this relation, i.e., a subgraph of the input graph that has O(n) edges and maintains the same 2-edge-connected blocks as the input graph, where n is the number of vertices.
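
The relation defined in this abstract can be checked directly for a single pair of vertices by looking for two edge-disjoint paths in each direction. The Python sketch below does this with unit-capacity augmenting paths found by breadth-first search. It is a per-pair brute-force check written to illustrate the definition, not the linear-time block computation described in the paper; the function names and the edge-list input format are assumptions.

```python
from collections import defaultdict, deque


def edge_disjoint_paths(edges, s, t, limit=2):
    """Number of edge-disjoint directed s-t paths, capped at limit."""
    if s == t:
        return limit
    cap = defaultdict(int)   # residual capacity of each ordered pair (u, v)
    adj = defaultdict(set)   # neighbours in the residual graph
    for u, v in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while flow < limit:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
    return flow


def two_edge_connected(edges, v, w):
    """True if there are two edge-disjoint paths from v to w and from w to v."""
    return (edge_disjoint_paths(edges, v, w) >= 2
            and edge_disjoint_paths(edges, w, v) >= 2)
```

On a single directed cycle no two distinct vertices are related, while adding the reversed cycle makes every pair 2-edge-connected.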

How good is multi-pivot quicksort?

Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step k pivots are used to split the input into k + 1 segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This paper studies what possible advantages multi-pivot quicksort might offer in general. The contributions are as follows: Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal and these strategies are inferior to simpler strategies such as the well known median-of-k approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorithm for rearranging elements in the partitioning step is carried out, observing mainly how often array cells are accessed during partitioning. The algorithm behaves best if 3 or 5 pivots are used. Experiments show that this translates into good cache behavior and is closest to predicting observed running times of multi-pivot quicksort algorithms. Finally, it is studied how choosing pivots from a sample affects sorting cost.
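
To illustrate the case k = 2 described above, here is a short Python sketch of a dual-pivot quicksort that splits the input into three segments around two pivots. It is a simple textbook-style variant that takes the first and last elements as pivots; it is not the Java 7 implementation, nor any of the comparison-optimal strategies analyzed in the paper, and the function name is an assumption.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list a in place using two pivots per partitioning step."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]          # pivots with p <= q
    lt, gt, i = lo + 1, hi - 1, lo + 1
    # Invariant: a[lo+1..lt-1] < p, a[lt..i-1] in [p, q], a[gt+1..hi-1] > q.
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # Move the two pivots to their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```

The element swaps inside the loop are the rearrangement cost that the abstract singles out as a substantial part of partitioning.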

On the Tradeoff between Stability and Fit

In computing, as in many aspects of life, changes incur cost. Many optimization problems are formulated as a one-time instance starting from scratch. However, a common case that arises is when we already have a set of prior assignments, and must decide how to respond to a new set of constraints, given that each change from the current assignment comes at a price. That is, we would like to maximize the fitness or efficiency of our system, but we need to balance it with the changeout cost from the previous state. We provide a precise formulation for this tradeoff and analyze the resulting {\em stable extensions} of some fundamental problems in measurement and analytics. Our main technical contribution is a stable extension of PPS (probability proportional to size) weighted random sampling, with applications to monitoring and anomaly detection problems. We also provide a general framework that applies to top-k, minimum spanning tree, and assignment. In both cases, we are able to provide exact solutions, and discuss efficient incremental algorithms that can find new solutions as the input changes.

Better Balance by Being Biased: A 0.8776-Approximation for Max Bisection

Dynamic Facility Location via Exponential Clocks

The dynamic facility location problem is a generalization of the classic facility location problem proposed by Eisenstat, Mathieu, and Schabanel to model the dynamics of evolving social/infrastructure networks. The generalization lies in that the distance metric between clients and facilities changes over time. This leads to a trade-off between optimizing the classic objective function and the "stability" of the solution: there is a switching cost charged every time a client changes the facility to which it is connected. While the standard linear program (LP) relaxation for the classic problem naturally extends to this problem, traditional LP-rounding techniques do not, as they are often sensitive to small changes in the metric, resulting in frequent switches. We present a new LP-rounding algorithm for facility location problems, which yields the first constant approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on the clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use no clustering and we allow clients to connect through paths of arbitrary lengths. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.


