We consider the problem of constructing fast and small binary adder circuits. The Kogge-Stone adder is often considered the fastest, because it computes the carry bits for two n-bit numbers with a depth of 2 log n logic gates, size 4n log n, and all fan-outs bounded by two. Fan-outs of more than two are avoided, because they lead to the insertion of repeaters for repowering the signal and to additional depth in the physical implementation. However, the depth bound of the Kogge-Stone adder is off by a factor of two from the lower bound of log n. This bound is achieved asymptotically in two separate constructions, by Brent and by Krapchenko. Brent's construction gives neither a bound on the fan-out nor on the size, while Krapchenko's adder has linear size but can have up to linear fan-out. With a fan-out bound of two, neither construction achieves a depth of less than 2 log n. In a further approach, Brent and Kung proposed an adder with linear size and fan-out two, but twice the depth of the Kogge-Stone adder. These results are 33-43 years old and no substantial theoretical improvement has been made since then. In this paper we integrate the individual advantages of previous adder circuits into a new family of full adders, the first to improve on the depth bound of 2 log n while maintaining a fan-out bound of two. Our adders achieve an asymptotically optimum logic gate depth of log n + o(log n) and linear size.
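The parallel-prefix principle behind the Kogge-Stone adder described above can be illustrated with a minimal Python sketch. This is a software simulation, not a circuit description; the function name and the bit-list representation are ours. It computes all carries with the standard (generate, propagate) prefix combine operator in ceil(log2 n) levels:

```python
def kogge_stone_add(a, b, n):
    """Add two n-bit non-negative integers by computing all carries
    with a Kogge-Stone parallel-prefix scan over (generate, propagate)
    pairs; the scan uses ceil(log2 n) combine levels."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate bits
    p0 = p[:]                          # keep raw propagate bits for the sum
    d = 1
    while d < n:                       # prefix combine at distance d
        # (G, P) of position i absorbs the group ending at i - d:
        g = [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)]
        p = [p[i] & p[i - d] if i >= d else p[i] for i in range(n)]
        d *= 2
    # after the scan, g[i] is the carry out of bit position i
    carry_in = [0] + g[:n - 1]
    result = sum((p0[i] ^ carry_in[i]) << i for i in range(n))
    return result | (g[n - 1] << n)    # carry out of the top bit
```

The combine operator (G, P) ∘ (G', P') = (G or (P and G'), P and P') is associative, which is what allows the logarithmic-depth scan; the fan-out bound of two is a property of how the hardware network is wired, and is not visible in this sequential simulation.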

We present a new scaling algorithm for maximum (or minimum) weight perfect matching on general, edge-weighted graphs. Our algorithm runs in $O(m\sqrt{n}\log(nN))$ time, $O(m\sqrt{n})$ per scale, which matches the running time of the best cardinality matching algorithms on sparse graphs. Here $m,n,$ and $N$ bound the number of edges, vertices, and magnitude of any integer edge weight. Our result improves on a 25-year-old algorithm of Gabow and Tarjan, which runs in $O(m\sqrt{n\log n\alpha(m,n)} \log(nN))$ time.

Chazelle and Matou\v sek [\emph{J. Algorithms}, 1996] presented a derandomization of Clarkson's sampling-based algorithm [\emph{J. ACM}, 1995] for solving linear programs with $n$ constraints and $d$ variables in $d^{(7+o(1))d}n$ deterministic time. The time bound can be improved to $d^{(5+o(1))d}n$ with subsequent work by Br\"onnimann, Chazelle, and Matou\v sek [\emph{SIAM J. Comput.}, 1999]. We first point out a much simpler derandomization of Clarkson's algorithm that avoids $\epsilon$-approximations and runs in $d^{(3+o(1))d}n$ time. We then describe a few additional ideas that eventually improve the deterministic time bound to $d^{(1/2+o(1))d}n$.

We present the buffer heap, a cache-oblivious priority queue that supports Delete-Min, Delete and a hybrid Insert/Decrease-Key operation in $O(\frac{1}{B}\log_{2}\frac{N}{M})$ amortized cache-misses, where $M$ and $B$ are the (unknown) cache size and block size, respectively, and $N$ is the number of elements in the queue. We introduce the notion of a slim data structure, which captures the situation when only a limited portion of the cache, which we call a slim cache, is available to the data structure to retain data between operations. We show that a buffer heap automatically adapts to such an environment and supports all operations in $O(\frac{1}{\lambda}+\frac{1}{B}\log_{2}\frac{N}{\lambda})$ amortized cache-misses each when the slim cache has size $\lambda$. Our results provide substantial improvements over the known trivial cache-performance bounds for cache-oblivious priority queues with Decrease-Keys. Using the buffer heap we present cache-oblivious implementations of Dijkstra's SSSP algorithm for undirected and directed graphs with non-negative real edge weights. On a graph with $n$ vertices and $m$ edges, our algorithm for the undirected case incurs $O(n+\frac{m}{B}\log_{2}\frac{n}{M})$ cache-misses and for the directed case $O((n+\frac{m}{B})\cdot\log_{2}\frac{n}{B})$ cache-misses. These results give the first non-trivial cache-oblivious bounds for the SSSP problem on general graphs. For the APSP problem on weighted undirected graphs, we incorporate slim buffer heaps into multi-buffer-heaps and use these to improve the cache-aware cache complexity. We also present a simple cache-oblivious APSP algorithm for unweighted undirected graphs that incurs $O(\frac{mn}{B}\log_{M/B}\frac{n}{B})$ cache-misses. This matches the cache-aware bound and is a substantial improvement over the previous cache-oblivious bound for the problem.
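For context, the SSSP algorithms above are cache-oblivious variants of Dijkstra's algorithm. A textbook comparison-model sketch with a binary heap, using lazy deletion in place of a true Decrease-Key and reflecting none of the buffer-heap machinery, looks like this (names ours):

```python
import heapq

def dijkstra(adj, s):
    """Textbook Dijkstra with a binary heap. Instead of Decrease-Key,
    we push a fresh entry and skip stale ones on pop (lazy deletion).
    adj[u] is a list of (neighbor, non-negative weight) pairs."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                       # stale heap entry, skip
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

The buffer heap replaces the binary heap in such a loop; the abstract's point is that its amortized cache-miss bounds survive even when only a slim cache is available between operations.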

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c>1, comes close to that of the best possible data-independent LSH without the covering guarantee, namely, the seminal LSH construction of Indyk and Motwani (STOC '98). The efficiency of the new construction essentially matches their bound when the search radius is not too large --- e.g., when cr = o(log(n)/log log n), where n is the number of points in the data set, and when cr = log(n)/k where k is an integer constant. In general, it differs by at most a factor ln(4) in the exponent of the time bounds. As a consequence, LSH-based similarity search in Hamming space can avoid the problem of false negatives at little or no cost in efficiency.
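As a point of reference for the construction above, the data-independent bit-sampling LSH of Indyk and Motwani, against which the covering construction is compared, can be sketched in a few lines (a toy Python illustration; function names are ours):

```python
import random

def bit_sampling_lsh(dim, k, seed=None):
    """Classic Indyk-Motwani LSH for Hamming space: a random hash
    function that concatenates k uniformly sampled coordinates of a
    0/1 vector. Two vectors at Hamming distance d collide under a
    random such h with probability (1 - d/dim)^k -- a guarantee that
    holds only in expectation, with no covering property."""
    rng = random.Random(seed)
    coords = [rng.randrange(dim) for _ in range(k)]
    def h(v):
        return tuple(v[i] for i in coords)
    return h
```

Because a pair at distance at most r may still be separated by an unlucky draw of `coords`, bit sampling admits false negatives; the covering construction of the abstract removes exactly this failure mode while paying at most a ln(4) factor in the exponent.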

We define the parametric closure problem, in which the input is a partially ordered set whose elements have linearly varying weights and the goal is to compute the sequence of minimum-weight lower sets of the partial order as the weights vary. We give polynomial time solutions to many important special cases of this problem including semiorders, reachability orders of bounded-treewidth graphs, partial orders of bounded width, and series-parallel partial orders. Our result for series-parallel orders provides a significant generalization of a previous result of Carlson and Eppstein on bicriterion subtree problems.
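The problem statement above can be made concrete with a brute-force sketch (exponential time, suitable only for tiny posets; all names are illustrative). For element weights varying linearly in a parameter t, it computes the minimum-weight lower set at each sampled value of t:

```python
from itertools import combinations

def lower_sets(elements, preds):
    """Enumerate all lower sets (down-sets) of a small poset.
    preds[x] is the set of elements strictly below x; a lower set
    must contain every predecessor of each of its members."""
    for r in range(len(elements) + 1):
        for sub in combinations(elements, r):
            s = set(sub)
            if all(preds[x] <= s for x in s):
                yield s

def parametric_min_lower_set(elements, preds, a, b, ts):
    """For linearly varying weights w_x(t) = a[x] + b[x]*t, return the
    minimum-weight lower set at each parameter value t in ts."""
    return [min(lower_sets(elements, preds),
                key=lambda s: sum(a[x] + b[x] * t for x in s))
            for t in ts]
```

The parametric closure problem asks for the whole sequence of optimal lower sets as t sweeps the real line; the abstract's contribution is doing this in polynomial time for structured partial orders, whereas this sketch enumerates all down-sets.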

This paper addresses the problem of designing a {\em fault-tolerant} $(\alpha, \beta)$ approximate BFS structure (or {\em FT-ABFS structure} for short), namely, a subgraph $H$ of the network $G$ such that, subsequent to the failure of some subset $F$ of edges or vertices, the surviving part $H'$ of $H$ still contains an \emph{approximate} BFS spanning tree for (the surviving part of) $G$, satisfying $\mathrm{dist}(s,v,H\setminus F) \leq \alpha \cdot \mathrm{dist}(s,v,G\setminus F)+\beta$ for every $v \in V$. In SODA'14, we considered {\em multiplicative} $(\alpha,0)$ FT-ABFS structures resilient to the failure of a \emph{single} edge and presented an algorithm that, given an $n$-vertex unweighted undirected graph $G$ and a source $s$, constructs a $(3,0)$ FT-ABFS structure rooted at $s$ with at most $4n$ edges (improving by an $O(\log n)$ factor on the near-tight result of \cite{BS10} for the special case of edge failures). Assuming at most $f$ edge failures, for constant integer $f>1$, we prove that there exists a (poly-time constructible) $(3(f+1), (f+1) \log n)$ FT-ABFS structure with $O(f n)$ edges. We then consider {\em additive} $(1,\beta)$ FT-ABFS structures and demonstrate an interesting dichotomy between multiplicative and additive spanners. In contrast to the linear size of $(\alpha,0)$ FT-ABFS structures, we show that for every $\beta \in [1, O(\log n)]$ there exists an $n$-vertex graph $G$ with a source $s$ for which any $(1,\beta)$ FT-ABFS structure rooted at $s$ has $\Omega(n^{1+\epsilon(\beta)})$ edges, for some function $\epsilon(\beta) \in (0,1)$.

We study the problem of maintaining a breadth-first spanning tree (BFS tree) in partially dynamic distributed networks modeling a sequence of either failures or additions of communication links (but not both). We show deterministic (1+ε)-approximation algorithms whose amortized time (over some number of link changes) is sublinear in D, the maximum diameter of the network. Our technique also leads to a deterministic (1+ε)-approximate incremental algorithm for single-source shortest paths (SSSP) in the sequential (usual RAM) model. Prior to our work, the state of the art was the classic exact algorithm of Even and Shiloach [1981] that is optimal under some assumptions [Roditty and Zwick 2011; Henzinger et al. 2015]. Our result is the first to show that, in the incremental setting, this bound can be beaten in certain cases if a small approximation is allowed.

We study the problem of computing maximin share allocations, a recently introduced fairness notion. Given a set of *n* agents and a set of goods, the maximin share of an agent is the best value she can guarantee for herself if she is allowed to partition the goods into *n* bundles in any way she prefers, and then receives her least desirable bundle.
The objective then is to find a partition, where each agent is guaranteed her maximin share.
Such allocations do not always exist, hence we resort to approximation algorithms.
Our main result is a 2/3-approximation that runs in polynomial time for any number of agents and goods. This improves upon the algorithm of (Procaccia and Wang, 2014), which is also a 2/3-approximation but runs in polynomial time only for a constant number of agents.
To achieve this we redesign certain parts of their algorithm, exploiting the construction of carefully selected matchings in a bipartite graph representation of the problem.
Furthermore, motivated by the apparent difficulty in establishing lower bounds, we undertake a probabilistic analysis. We prove that
in randomly generated instances, maximin share allocations exist with high probability, which can be seen as a justification of previously reported experimental evidence.
Finally, we provide further positive results for two special cases arising from previous works. The first is the intriguing case of 3 agents, where we provide an improved 7/8-approximation. The second case is when all item values belong to {0, 1, 2}, where we obtain an exact algorithm.
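The maximin-share definition above is easy to state as code. The following brute-force Python sketch (exponential in the number of goods, illustration only; additive valuations assumed, names ours) computes one agent's maximin share exactly:

```python
from itertools import product

def maximin_share(values, n):
    """Brute-force the maximin share of one agent: over all ways to
    assign the goods to n bundles, maximize the value of the worst
    bundle. values[g] is the agent's additive value for good g."""
    best = 0
    for assign in product(range(n), repeat=len(values)):
        bundles = [0] * n
        for g, bundle in enumerate(assign):
            bundles[bundle] += values[g]
        best = max(best, min(bundles))
    return best
```

For instance, with goods valued [3, 1, 1, 1] and n = 2, the partition {3} versus {1, 1, 1} makes both bundles worth 3, so the maximin share is 3; a maximin share allocation must then guarantee each agent a bundle she values at least that much.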

The Lovász Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This algorithm can be parallelized to give an algorithm that uses polynomially many processors and runs in $O(\log^3 n)$ time, stemming from $O(\log n)$ adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time $O(\log^2 n)$, but these algorithms work under significantly more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in $O(\log^2 n)$ time. This conceptually new algorithm also gives a clean combinatorial description of a satisfying assignment, which might be of independent interest. Our techniques extend to the deterministic LLL algorithm given by Chandrasekaran et al. (2013), leading to an NC algorithm running in time $O(\log^2 n)$ as well. We also provide improved bounds on the run-times of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. Our bounds extend to any problem instance in which the tighter Shearer LLL criterion is satisfied. We also improve on the analysis of Kolipaka & Szegedy (2011) to give tighter concentration results.
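The sequential Moser & Tardos resampling algorithm referenced above is remarkably short. Here is a toy Python sketch for k-SAT (names ours; the tiny test instance below is far too dense to satisfy the LLL condition, but the resampling loop still terminates on it):

```python
import random

def moser_tardos_ksat(clauses, n_vars, seed=0):
    """Moser-Tardos resampling for k-SAT: start from a uniform random
    assignment and, while some clause is violated, resample all
    variables of one violated clause uniformly at random. Under the
    LLL condition (each clause shares variables with few others) the
    expected number of resamplings is linear.
    A clause is a list of literals: +i means variable i, -i its negation."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # 1-indexed

    def violated(c):
        # a clause is violated iff every literal in it is false
        return all(x[abs(l)] != (l > 0) for l in c)

    while True:
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return x[1:]
        for l in bad[0]:                 # resample one violated clause
            x[abs(l)] = rng.random() < 0.5
```

The parallel version discussed in the abstract resamples a maximal independent set of violated clauses per round instead of one clause at a time; the abstract's contribution is getting away with a single MIS computation overall.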

We show a method resulting in the improvement of several polynomial-space, exponential-time algorithms. The method capitalizes on the existence of small balanced separators for sparse graphs, which can be exploited for branching to disconnect an instance into independent components. For this algorithm design paradigm, the challenge to date has been to obtain improvements in worst-case analyses, compared with algorithms that are analyzed with advanced methods, notably Measure and Conquer. Our contribution is a general method to integrate the advantage of separator-based branching into Measure and Conquer, for a more precise and improved running-time analysis. We illustrate the method with improved algorithms for Max (r,2)-CSP and #Dominating Set. The previous best algorithms for these problems all used local transformations and were analyzed by the Measure and Conquer method. Our new algorithms capitalize on the existence of small balanced separators for cubic graphs (a non-local property) and on the ability to tailor the local algorithms always to pivot on a vertex in the separator. The new algorithms perform much as the old ones until the separator is empty, at which point they gain because the remaining vertices are split into two independent problem instances that can be solved recursively. It is likely that such algorithms can be effective for other problems too, and we present their design and analysis in a general framework.

We consider the problem of online scheduling of jobs on unrelated machines with dynamic speed scaling to minimize the sum of energy and weighted flow time. We give an algorithm with an almost optimal competitive ratio for arbitrary power functions. (No earlier results handled arbitrary power functions for unrelated machines.) For power functions of the form f(s) = s^α for some constant α > 1, we get a competitive ratio of O(α/log α), improving upon a previous competitive ratio of O(α^2) by [Anand et al. 2012], along with a matching lower bound of Ω(α/log α). Further, in the resource augmentation model, with a 1+ε speed-up, we give a 2(1/ε+1)-competitive algorithm, with essentially the same techniques, improving the bound of 1 + O(1/ε^2) by [Gupta et al. 2010] and matching the bound of [Anand et al. 2012] for the special case of fixed-speed unrelated machines. Unlike the previous results, most of which used an amortized local competitiveness argument or dual-fitting methods, we use a primal-dual method, which is useful not only to analyze the algorithms but also to design the algorithm itself.

We consider a distributed private data analysis setting, where multiple parties each hold some sensitive data; and they wish to run a protocol to learn some aggregate statistics over the distributed dataset, while protecting each user's privacy. As an initial effort, we consider a distributed summation problem. We first show a lower bound, i.e., under information-theoretic differential privacy, any multi-party protocol with a small number of messages must have large additive error. We then show that by adopting a computational differential privacy notion, one can circumvent this lower bound, and design practical protocols for the periodic distributed summation problem. Our construction has several desirable features. First, it works in the client-server model, and requires no peer-to-peer communication among the clients. Second, our protocol is fault tolerant, and can output meaningful statistics even when a subset of the participants fail to respond. Our constructions guarantee the privacy of honest parties even when a fraction of the participants may be compromised and colluding. In addition, we propose a new distributed noise addition mechanism that guarantees small total error.

We consider the Traveling Salesman Problem with Neighborhoods (TSPN) in doubling metrics. The goal is to find a shortest tour that visits each of a given collection of subsets (regions or neighborhoods) in the underlying metric space. We give a randomized polynomial time approximation scheme (PTAS) when the regions are fat weakly disjoint. This notion of regions was first defined when a QPTAS was given for the problem in [SODA 2010: Chan and Elbassioni]. We combine the techniques in the previous work, together with the recent PTAS for TSP [STOC 2012: Bartal, Gottlieb and Krauthgamer], to achieve a PTAS for TSPN. However, several non-trivial technical hurdles need to be overcome to apply the PTAS framework to TSPN. (1) Heuristic to detect sparse instances. In the STOC 2012 paper, a minimum spanning tree heuristic is used to estimate the portion of an optimal tour within some ball. However, for TSPN, it is not known whether an optimal tour would use points inside the ball to visit regions that intersect the ball. (2) Partially cut regions in the recursion. After a sparse ball is identified by the heuristic, the PTAS framework for TSP uses a dynamic program to solve the instance restricted to the sparse ball, and recursively solves the remaining instance. However, for TSPN, it is an important issue to decide whether each region partially intersecting the sparse ball should be handled in the sparse instance or in the remaining instance.

We give the first linear kernels for the Dominating Set and Connected Dominating Set problems on graphs excluding a fixed graph $H$ as a topological minor. Our results extend the known classes of graphs on which the Dominating Set and Connected Dominating Set problems admit linear kernels. Prior to our work, it was known that these problems admit linear kernels on graphs excluding a fixed apex graph $H$ as a minor. Moreover, for Dominating Set, a kernel of size $k^{c(H)}$, where $c(H)$ is a constant depending on the size of $H$, follows from a more general result on the kernelization of Dominating Set on graphs of bounded degeneracy. Alon and Gutner asked explicitly whether one can obtain a linear kernel for Dominating Set on $H$-minor-free graphs. We answer this question in the affirmative and in fact prove a more general result. Our kernelization algorithm is based on a non-trivial combination of the following ingredients: (a) the structural theorem of Grohe and Marx [STOC 2012] for graphs excluding a fixed graph $H$ as a topological subgraph; (b) a novel notion of protrusions, different from the one defined in [FOCS 2009]; and (c) a generic reduction rule producing an equivalent instance of the problem with treewidth $O(\sqrt{k})$. The application of this rule in a divide-and-conquer fashion, together with the new notion of protrusions, brings us to linear kernels. We believe that the new notion of protrusion will be useful in many other algorithmic settings.

In the Subset Feedback Vertex Set (Subset FVS) problem, the input is a graph G on n vertices and m edges, a subset of vertices T, referred to as terminals, and an integer k. The objective is to determine whether there exists a set of at most k vertices intersecting every cycle that contains a terminal. The study of parameterized algorithms for this generalization of the Feedback Vertex Set problem has received significant attention over the last few years. In fact the parameterized complexity of this problem was open until 2011, when two groups independently showed that the problem is fixed parameter tractable (FPT). However, the existence of an FPT algorithm with a linear dependence on the input size has remained open. In this paper we design the first linear-time parameterized algorithms for Subset FVS. More precisely, we obtain the following new algorithms for Subset FVS: (i) a randomized algorithm running in time O(25.6^k · (n + m)); (ii) a deterministic algorithm running in time 2^{O(k log k)} · (n + m). In particular, the first algorithm obtains the best possible dependence on both the parameter and the input size, up to the constant in the exponent. Both of our algorithms are based on cut centrality, in the sense that solution vertices are likely to show up in minimum-size cuts between vertices sampled from carefully chosen distributions.

In the Independent Set problem, the input is a graph G, every vertex has a non-negative integer weight, and the task is to find a set S of pairwise non-adjacent vertices, maximizing the total weight of the vertices in S. We give an n^{O(log^2 n)} time algorithm for this problem on graphs excluding the path P6 on 6 vertices as an induced subgraph. Currently, there is no constant k known for which Independent Set on Pk-free graphs becomes NP-complete, and our result implies that if such a k exists, then k > 6 unless all problems in NP can be decided in (quasi)polynomial time. Using the combinatorial tools that we develop for the above algorithm, we also give a polynomial-time algorithm for Efficient Dominating Set on P6-free graphs. In this problem, the input is a graph G, every vertex has an integer weight, and the objective is to find a set S of maximum weight such that every vertex in G has exactly one vertex in S in its closed neighborhood, or to determine that no such set exists. Prior to our work, the class of P6-free graphs was the only class of graphs defined by a single forbidden induced subgraph on which the computational complexity of Efficient Dominating Set was unknown.
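The Efficient Dominating Set condition described above (every vertex has exactly one solution vertex in its closed neighborhood) can be checked by brute force on small graphs. A minimal unweighted Python sketch (exponential enumeration, illustration only; names ours):

```python
from itertools import combinations

def efficient_dominating_sets(adj):
    """Enumerate all efficient dominating sets of a small graph:
    sets S such that every vertex has exactly one member of S in
    its closed neighborhood. adj[v] is the set of neighbors of v."""
    vs = list(adj)
    found = []
    for r in range(len(vs) + 1):
        for sub in combinations(vs, r):
            s = set(sub)
            if all(len((adj[v] | {v}) & s) == 1 for v in vs):
                found.append(s)
    return found
```

On the path P6 with vertices 0-1-2-3-4-5, the unique efficient dominating set is {1, 4}: each endpoint is dominated by its neighbor, and the two middle vertices 2 and 3 each see exactly one of 1 and 4. The abstract's polynomial-time algorithm handles the weighted variant on all P6-free graphs.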