The problem of finding large cliques in random graphs and its ``planted'' variant, where one wants to recover a clique of size $\omega \gg \log{(n)}$ added to an \Erdos-\Renyi graph $G \sim G(n,\frac{1}{2})$, have been intensely studied. Nevertheless, existing polynomial time algorithms can only recover planted cliques of size $\omega = \Omega(\sqrt{n})$. By contrast, information theoretically, one can recover planted cliques so long as $\omega \gg \log{(n)}$. In this work, we continue the investigation of algorithms from the sum of squares hierarchy for solving the planted clique problem begun by Meka, Potechin, and Wigderson \cite{MPW15} and Deshpande and Montanari \cite{DM15}. Our main result is that degree four SoS does not recover the planted clique unless $\omega \gg \sqrt n / \polylog n$, improving upon the bound $\omega \gg n^{1/3}$ due to \cite{DM15}. An argument of Kelner shows that this result cannot be proved using the same certificate as prior works. Rather, our proof involves constructing and analyzing a new certificate that yields the nearly tight lower bound by ``correcting'' the certificate of \cite{MPW15,DM15,FeigeK03}.

The Firefighter problem and a variant of it, known as Resource Minimization for Fire Containment (RMFC), are natural models for optimal inhibition of harmful spreading processes. Despite considerable progress on several fronts, the approximability of these problems is still poorly understood. This is the case even when the underlying graph is a tree, which is one of the most-studied graph structures in this context and the focus of this paper. In their simplest version, a fire spreads from one fixed vertex step by step from burning to adjacent non-burning vertices, and at each time step B many non-burning vertices can be protected from catching fire. The Firefighter problem asks, for a given B, to maximize the number of vertices that will not catch fire, whereas RMFC (on a tree) asks to find the smallest B that allows for saving all leaves of the tree. Prior to this work, the best known approximation ratios were an O(1)-approximation for the Firefighter problem and an O(log^* n)-approximation for RMFC, both being LP-based and essentially matching the integrality gaps of two natural LP relaxations. We improve on both approximations by presenting a PTAS for the Firefighter problem and an O(1)-approximation for RMFC, both qualitatively matching the known hardness results. Our results are obtained through a combination of the known LPs with several new techniques, which allow for efficiently enumerating over super-constant size sets of constraints to strengthen the natural LPs.
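
To make the spreading process concrete, the following is a minimal simulation sketch, not the PTAS or the LP-based algorithms of the paper. It assumes that in each time step the protections are placed first and the fire spreads afterwards; the function name `simulate_firefighter` and the `schedule` format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def simulate_firefighter(edges, root, schedule, budget):
    """Return the set of vertices that never catch fire.

    edges:    list of (u, v) edges of the tree
    root:     vertex where the fire starts
    schedule: schedule[t] lists the vertices protected in time step t
              (assumed format; steps beyond the list protect nothing)
    budget:   the number B of vertices that may be protected per step
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    vertices = set(adj) | {root}

    burning = {root}
    protected = set()
    frontier = {root}  # vertices that caught fire in the previous step
    step = 0
    while frontier:
        picks = schedule[step] if step < len(schedule) else []
        if len(picks) > budget:
            raise ValueError("more than B vertices protected in one step")
        for v in picks:
            if v in burning:
                raise ValueError(f"vertex {v} is already burning")
            protected.add(v)
        # Assumption: protection happens before the fire spreads.
        frontier = {w for u in frontier for w in adj[u]
                    if w not in burning and w not in protected}
        burning |= frontier
        step += 1
    return vertices - burning
```

On a binary tree of depth two with B = 1, protecting one child of the root and then one grandchild saves four vertices; the Firefighter objective is the size of the returned set.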

We consider the classical selection and sorting problems in a model where the initial permutation of the input has to be restored after completing the computation. While the requirement of the restoration is stringent compared to the classical versions of the problems, this model is more relaxed than a read-only memory (ROM) where the input elements are not allowed to be moved within the input array. We first show that for a sequence of n integers, selection (finding the median or, more generally, the k-th smallest element for a given k) can be done in O(n) time using O(lg n) words. In contrast, no linear-time selection algorithm is known which uses polylogarithmic space in ROM. For sorting n integers in this model, we first present an O(n lg n)-time algorithm using O(lg n) words. When the universe size U is polynomial in n, we give a faster O(n)-time algorithm (analogous to radix sort) which uses O(n^eps) words of extra space for an arbitrarily small constant eps > 0. More generally, we show how to match the time bound of any word-RAM integer-sorting algorithm using O(n^eps) words of extra space. In sharp contrast, there is an Ω(n^2/S)-time lower bound for integer sorting using O(S) space in ROM. For indivisible input elements, we prove the same lower bound for sorting in our model. En route, we develop linear-time in-place algorithms to extract leading bits of the input and to compress and decompress strings with low entropy.

Chazelle and Matou\v sek [\emph{J. Algorithms}, 1996] presented a derandomization of Clarkson's sampling-based algorithm [\emph{J. ACM}, 1995] for solving linear programs with $n$ constraints and $d$ variables in $d^{(7+o(1))d}n$ deterministic time. The time bound can be improved to $d^{(5+o(1))d}n$ with subsequent work by Br\"onnimann, Chazelle, and Matou\v sek [\emph{SIAM J. Comput.}, 1999]. We first point out a much simpler derandomization of Clarkson's algorithm that avoids $\eps$-approximations and runs in $d^{(3+o(1))d}n$ time. We then describe a few additional ideas that eventually improve the deterministic time bound to $d^{(1/2+o(1))d}n$.

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c>1, comes close to that of the best possible data independent LSH without the covering guarantee, namely, the seminal LSH construction of Indyk and Motwani (STOC '98). The efficiency of the new construction essentially matches their bound when the search radius is not too large, e.g., when cr = o(log(n)/log log n), where n is the number of points in the data set, and when cr = log(n)/k where k is an integer constant. In general, it differs by at most a factor ln(4) in the exponent of the time bounds. As a consequence, LSH-based similarity search in Hamming space can avoid the problem of false negatives at little or no cost in efficiency.
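
For orientation, the classical baseline referred to above (bit sampling, the Indyk-Motwani construction for Hamming space) can be sketched as follows. This is not the covering construction of this work: bit sampling may miss a near pair, which is exactly the false-negative problem the abstract addresses. All function names and parameters (`k` sampled coordinates, `num_tables` tables, `seed`) are illustrative assumptions.

```python
import random


def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))


def collision_probability(dim, k, distance):
    """Probability that two vectors at the given Hamming distance agree
    on k coordinates sampled uniformly with replacement."""
    return (1 - distance / dim) ** k


def make_bit_sampler(dim, k, rng):
    """Hash function that projects a vector onto k random coordinates."""
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


def build_tables(points, dim, k, num_tables, seed=0):
    """Build num_tables independent bit-sampling hash tables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        h = make_bit_sampler(dim, k, rng)
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(h(p), []).append(idx)
        tables.append((h, buckets))
    return tables


def query(tables, points, q, r):
    """Return the index of some colliding point within distance r of q,
    or None. A None answer does not prove that no such point exists."""
    for h, buckets in tables:
        for idx in buckets.get(h(q), []):
            if hamming(points[idx], q) <= r:
                return idx
    return None
```

A point that is itself stored in the tables always collides with its own query; any guarantee for pairs at distance between 1 and r is only probabilistic in this baseline.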

We consider the minimum-load $k$-facility location (MLKFL) problem: given a set $F$ of facilities, a set $C$ of clients, an integer $k\geq 0$, and a distance function $d(f,j)$, the goal is to open a set $F'\subseteq F$ of $k$ facilities, and assign each client $j$ to a facility $f(j)\in F'$ so as to minimize $\max_{f\in F'}\sum_{j\in C:f(j)=f}d(f,j)$. This problem was studied under the name of min-max star cover in \cite{EGK+03,AHL06}, who gave bicriteria approximation algorithms for the case $F=C$. MLKFL is rather poorly understood, and only an $O(k)$-approximation is currently known even for line metrics. Our main result is a PTAS for MLKFL on line metrics. Complementing this, we prove that MLKFL is strongly NP-hard on line metrics. We also devise a QPTAS for it on tree metrics. MLKFL turns out to be surprisingly challenging even on line metrics; we show that: (a) even a configuration-style LP has a bad integrality gap; and (b) a multi-swap local-search heuristic has a bad locality gap. Our PTAS for line metrics consists of two main ingredients. First, we prove existence of a near-optimal solution possessing some nice properties. A novel aspect of this proof is that we first move to a mixed-integer LP (MILP), and argue that a MILP-solution minimizing a certain potential function possesses the desired structure, and then use a rounding algorithm for the generalized-assignment problem to ``transfer'' this structure to the rounded integer solution. We then show how to find a solution having these structural properties using DP.
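
The objective can be illustrated with a small sketch that evaluates the maximum load of an assignment and finds the optimum of a tiny instance by exhaustive search. This is only a reference implementation of the problem definition, exponential in the number of clients, and unrelated to the PTAS; the names `max_load` and `brute_force_mlkfl` and the dictionary-based distance table are assumptions made for illustration.

```python
from itertools import combinations, product


def max_load(dist, assignment):
    """Maximum, over facilities, of the total distance of assigned clients.

    dist:       dict mapping (facility, client) to the distance d(f, j)
    assignment: dict mapping each client j to its facility f(j)
    """
    loads = {}
    for client, facility in assignment.items():
        loads[facility] = loads.get(facility, 0) + dist[facility, client]
    return max(loads.values(), default=0)


def brute_force_mlkfl(facilities, clients, dist, k):
    """Try every set of k open facilities and every assignment to them.

    Returns (best value, best assignment), or (None, None) if no feasible
    solution exists. Only suitable for very small instances.
    """
    best_value, best_assignment = None, None
    for opened in combinations(facilities, k):
        for choice in product(opened, repeat=len(clients)):
            assignment = dict(zip(clients, choice))
            value = max_load(dist, assignment)
            if best_value is None or value < best_value:
                best_value, best_assignment = value, assignment
    return best_value, best_assignment
```

For a line metric, `dist` can be filled with absolute differences of positions, e.g. facilities at 0 and 10 and clients at 1, 2, and 9.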

For any integer $n\geq 1$ a \emph{middle levels Gray code} is a cyclic listing of all bitstrings of length $2n+1$ that have either $n$ or $n+1$ entries equal to 1 such that any two consecutive bitstrings in the list differ in exactly one bit. The question whether such a Gray code exists for every $n\geq 1$ has been the subject of intensive research during the last 30 years, and has been answered affirmatively only recently [T.~Mütze. Proof of the middle levels conjecture. \textit{arXiv:1404.4442}, 2014]. In this work we provide the first efficient algorithm to compute a middle levels Gray code. For a given bitstring, our algorithm computes the next $\ell$ bitstrings in the Gray code in time $\cO(n\ell(1+\frac{n}{\ell}))$, which is $\cO(n)$ on average per bitstring provided that $\ell=\Omega(n)$.
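
The definition can be checked mechanically. The sketch below verifies whether a given cyclic listing is a middle levels Gray code for a given $n$; it does not implement the efficient generation algorithm of this work, and the function name `is_middle_levels_gray_code` is a hypothetical choice.

```python
def is_middle_levels_gray_code(listing, n):
    """Check that `listing` (a list of '0'/'1' strings) is a cyclic listing
    of all bitstrings of length 2n+1 with n or n+1 ones, in which
    consecutive strings (including last to first) differ in exactly one bit."""
    length = 2 * n + 1
    expected = {
        s
        for s in (format(i, f"0{length}b") for i in range(2 ** length))
        if s.count("1") in (n, n + 1)
    }
    # Every middle-levels bitstring must appear exactly once.
    if len(listing) != len(expected) or set(listing) != expected:
        return False
    # Cyclic adjacency: pair each string with its successor.
    for a, b in zip(listing, listing[1:] + listing[:1]):
        if sum(x != y for x, y in zip(a, b)) != 1:
            return False
    return True
```

For $n=1$ the six bitstrings of length 3 with one or two 1s admit the cyclic listing 001, 011, 010, 110, 100, 101, which the checker accepts.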

Let $M=(E, \mathcal{I})$ be a matroid of rank $n$. A {\em $k$-truncation} of $M$ is a matroid {$M'=(E,{\mathcal I}')$} such that for any $A\subseteq E$, $A\in {\mathcal I}'$ if and only if $|A|\leq k$ and $A\in \mathcal{I}$. Given a linear representation, $A$, of $M$ we consider the problem of finding a linear representation, $A_k$, of the $k$-truncation of $M$. A common way to compute $A_k$ is to multiply the matrix $A$ with a random $k\times n$ matrix, yielding a simple randomized algorithm. So a natural question is whether we can compute $A_k$ {\em deterministically}. In this paper we settle this question for matrices over any field in which the field operations can be done efficiently. This includes any finite field and the field of rational numbers ($\mathbb Q$). Our algorithms are based on the properties of the classical Wronskian determinant, and the folded Wronskian determinant, which was recently introduced by Guruswami and Kopparty~[\,{\em FOCS, 2013; COMBINATORICA 2016}\,], and Forbes and Shpilka~[\,{\em STOC, 2012}\,]. Our main conceptual contribution in this paper is to show that the Wronskian determinant can also be used to obtain a representation of the truncation of a linear matroid in deterministic polynomial time. Finally, we use our results to derandomize several parameterized algorithms, including an algorithm for computing {\sc $\ell$-Matroid Parity}, to which several problems, such as {\sc $\ell$-Matroid Intersection}, can be reduced.

We study the problem of detecting {\em outlier pairs} of strongly correlated variables among a collection of $n$ variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of $n$ vectors with unit Euclidean norm and dimension $d$, and we are asked to find all the outlier pairs of vectors whose inner product is at least $\rho$ in absolute value, subject to the promise that all but at most $q$ pairs of vectors have inner product at most $\tau$ in absolute value for some constants $0<\tau<\rho<1$. Improving on an algorithm of G.~Valiant [FOCS~2012; J.\,ACM~2015], we present a randomized algorithm that for Boolean inputs ($\{-1,1\}$-valued data normalized to unit Euclidean length) runs in time \[ \tilde O\bigl(n^{\max\,\{1-\gamma+M(\Delta\gamma,\gamma),\,M(1-\gamma,2\Delta\gamma)\}}+qdn^{2\gamma}\bigr)\,, \] where $0<\gamma<1$ is a constant tradeoff parameter and $M(\mu,\nu)$ is the exponent to multiply an $\lfloor n^\mu\rfloor\times\lfloor n^\nu\rfloor$ matrix with an $\lfloor n^\nu\rfloor\times \lfloor n^\mu\rfloor$ matrix and $\Delta=1/(1-\log_\tau\rho)$. As corollaries we obtain randomized algorithms that run in time \[ \tilde O\bigl(n^{\frac{2\omega}{3-\log_\tau\rho}}+qdn^{\frac{2(1-\log_\tau\rho)}{3-\log_\tau\rho}}\bigr) \] and in time \[ \tilde O\bigl(n^{\frac{4}{2+\alpha(1-\log_\tau\rho)}}+qdn^{\frac{2\alpha(1-\log_\tau\rho)}{2+\alpha(1-\log_\tau\rho)}}\bigr)\,, \] where $2\leq\omega<2.38$ is the exponent for square matrix multiplication and $0.3<\alpha\leq 1$ is the exponent for\, rectangular matrix multiplication. We present further corollaries for the light bulb problem and for learning sparse Boolean functions. (The notation {$\tilde O(\cdot)$} hides polylogarithmic factors in $n$ and $d$ whose degree may depend on $\rho$ and $\tau$.)
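
As a point of reference, the geometric task can be solved by a quadratic-time scan over all pairs. The sketch below is this naive baseline, not the subquadratic algorithm of the paper; it also evaluates the quantity $\Delta=1/(1-\log_\tau\rho)$ from the running-time bound. The function names are hypothetical.

```python
import math
from itertools import combinations


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def outlier_pairs(vectors, rho):
    """Return all index pairs (i, j), i < j, whose normalized inner product
    is at least rho in absolute value. Takes O(n^2 d) time."""
    normalized = [unit(v) for v in vectors]
    pairs = []
    for i, j in combinations(range(len(normalized)), 2):
        inner = sum(a * b for a, b in zip(normalized[i], normalized[j]))
        if abs(inner) >= rho:
            pairs.append((i, j))
    return pairs


def delta(rho, tau):
    """Delta = 1 / (1 - log_tau(rho)) for constants 0 < tau < rho < 1."""
    return 1.0 / (1.0 - math.log(rho, tau))
```

For example, with $\tau=0.25$ and $\rho=0.5$ one has $\log_\tau\rho=1/2$ and hence $\Delta=2$.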

The approximate nearest neighbor problem ($\epsilon$-ANN) in high dimensional Euclidean space has been mainly addressed by Locality Sensitive Hashing (LSH), which has polynomial dependence on the dimension, sublinear query time, but subquadratic space requirement. In this paper, we introduce a new definition of ``low-quality'' embeddings for metric spaces. It requires that, for some query point $q$, there exists an approximate nearest neighbor among the pre-images of the $k>1$ approximate nearest neighbors in the target space. Focusing on Euclidean spaces, we employ random projections in order to reduce the original problem to one in a space of dimension inversely proportional to $k$. The $k$ approximate nearest neighbors can be efficiently retrieved by a data structure such as BBD-trees. The same approach is applied to the problem of computing an approximate near neighbor, where we obtain a data structure requiring linear space, and query time in $O(d n^{\rho})$, for $\rho\approx 1-\epsilon^2/\log(1/\epsilon)$. This directly implies a solution for $\epsilon$-ANN, while achieving a better exponent in the query time than the method based on BBD-trees. Better bounds are obtained in the case of doubling subsets of $\ell_2$, by combining our method with $r$-nets. We implement our method in C++, and present experimental results in dimension up to $500$ and $10^6$ points, which show that performance is better than predicted by the analysis. In addition, we compare our ANN approach to E2LSH, which implements LSH, and we show that the theoretical advantages of each method are reflected in their actual performance.

We present a deterministic incremental algorithm for \textit{exactly} maintaining the size of a minimum cut with $\widetilde{O}(1)$ amortized time per edge insertion and $O(1)$ query time. This result partially answers an open question posed by Thorup [Combinatorica 2007]. It also stands in sharp contrast to a polynomial conditional lower bound for the fully-dynamic weighted minimum cut problem. Our algorithm is obtained by combining a recent sparsification technique of Kawarabayashi and Thorup [STOC 2015] and an exact incremental algorithm of Henzinger [J. of Algorithms 1997]. We also study space-efficient incremental algorithms for the minimum cut problem. Concretely, we show that there exists an ${O}(n\log n/\varepsilon^2)$ space Monte-Carlo algorithm that can process a stream of edge insertions starting from an empty graph, and with high probability, the algorithm maintains a $(1+\varepsilon)$-approximation to the minimum cut. The algorithm has $\widetilde{O}(1)$ amortized update-time and constant query-time.

We consider the problems of online and stochastic packet queuing in a distributed system of n nodes with queues, where the communication between the nodes is done via a multiple access channel. In the online setting, in each round, an arbitrary number of packets can be injected into the system, each to an arbitrary node's queue. Two measures of performance are considered: the total number of packets in the system, called the total load, and the maximum queue size, called the maximum load. We develop a deterministic distributed algorithm that is asymptotically optimal with respect to both complexity measures, in a competitive way. More precisely, the total load of our algorithm is bigger than the total load of any other algorithm, including centralized online solutions, by only an additive term of O(n^2), while the maximum queue size of our algorithm is at most n times bigger than the maximum queue size of any other algorithm, with an extra additive O(n). The optimality for both measures is justified by proving the corresponding lower bounds, which also separate nearly exponentially distributed solutions from the centralized ones. Next, we show that our algorithm is also stochastically stable for any expected injection rate smaller than or equal to 1. This is the first solution to the stochastic queuing problem on a multiple access channel that achieves such stability for the (highest possible) rate equal to 1.

Motivated by applications in cancer genomics and following the work of Hajirasouliha and Raphael (WABI 2014), Hujdurović et al. (IEEE TCBB, to appear) introduced the minimum conflict-free row split (MCRS) problem: split each row of a given binary matrix into a bitwise OR of a set of rows so that the resulting matrix corresponds to a perfect phylogeny and has the minimum possible number of rows among all matrices with this property. Hajirasouliha and Raphael also proposed the study of a similar problem, in which the task is to minimize the number of distinct rows of the resulting matrix. Hujdurović et al. proved that both problems are NP-hard, gave a related characterization of transitively orientable graphs, and proposed a polynomial-time heuristic algorithm for the MCRS problem based on coloring cocomparability graphs. We give new, more transparent formulations of the two problems, showing that the problems are equivalent to two optimization problems on branchings in a derived directed acyclic graph. Building on these formulations, we obtain new results on the two problems, including: (i) a strengthening of the heuristic by Hujdurović et al. via a new min-max result in digraphs generalizing Dilworth's theorem, which may be of independent interest, (ii) APX-hardness results for both problems, (iii) approximation algorithms, and (iv) exponential-time algorithms solving the two problems to optimality faster than the naïve brute-force approach. Our work relates to several well studied notions in combinatorial optimization: chain partitions in partially ordered sets, laminar hypergraphs, and (classical and weighted) colorings of graphs.

We obtain a number of lower bounds on the running time of algorithms solving problems on graphs of bounded treewidth. We prove the results under the Strong Exponential Time Hypothesis of Impagliazzo and Paturi. In particular, assuming that SAT cannot be solved in $(2-\epsilon)^{n}m^{O(1)}$ time, we show that for any $\epsilon > 0$: {\sc Independent Set} cannot be solved in $(2-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time; {\sc Dominating Set} cannot be solved in $(3-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time; {\sc Max Cut} cannot be solved in $(2-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time; {\sc Odd Cycle Transversal} cannot be solved in $(3-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time; for any $q \geq 3$, $q$-{\sc Coloring} cannot be solved in $(q-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time; and {\sc Partition Into Triangles} cannot be solved in $(2-\epsilon)^{tw(G)}|V(G)|^{O(1)}$ time. Our lower bounds match the running times of the best known algorithms for the problems, up to the $\epsilon$ in the base.

The Weighted Tree Augmentation Problem (WTAP) is a fundamental well-studied problem in the field of network design. Given an undirected tree $G=(V,E)$, an additional set of edges $L \subseteq V\times V$ disjoint from $E$ called \textit{links} and a cost vector $c\in \mathbb{R}_{\geq 0}^L$, WTAP asks to find a minimum-cost set $F\subseteq L$ with the property that $(V,E\cup F)$ is $2$-edge connected. The special case where $c_\ell = 1$ for all $\ell\in L$ is called the Tree Augmentation Problem (TAP). For the class of bounded cost vectors, we present the first improved approximation algorithm for WTAP in more than three decades. Concretely, for any $M\in \mathbb{R}_{\geq 1}$ and $\epsilon > 0$, we present an LP-based $(\delta+\epsilon)$-approximation for WTAP restricted to cost vectors $c$ in $[1,M]^L$ for $\delta \approx 1.96417$. More generally, our result is a $(\delta+\epsilon)$-approximation algorithm with running time $n^{r^{O(1)}}$, where $r = c_{\max}/c_{\min}$ is the ratio between the largest and the smallest cost of any link. For the special case of TAP we improve this factor to $\frac{5}{3}+\epsilon$. Our results rely on several new ideas, including a new LP relaxation of WTAP and a two-phase rounding algorithm.

We prove the first non-trivial performance ratio strictly above 0.5 for the weighted Ranking algorithm on the oblivious matching problem where nodes in a general graph can have arbitrary weights. We have discovered a new structural property of the ranking algorithm: if a node has two unmatched neighbors, then it will still be matched even when its rank is demoted to the bottom. This property allows us to form LP constraints for both the weighted and the unweighted versions of the problem. Using a new class of continuous LP, we prove that the ratio for the weighted case is at least 0.501512, and improve the ratio for the unweighted case to 0.526823 (from the previous best 0.523166 in SODA 2014). Unlike previous continuous LP in which the primal solution must be continuous everywhere, our new continuous LP framework allows the monotone component of the primal function to have jump discontinuities, and the other primal components to take non-conventional forms such as the Dirac delta function.

Given a set S of integers whose sum is zero, consider the problem of finding a permutation of these integers such that: (i) all prefix sums of the ordering are non-negative, and (ii) the maximum value of a prefix sum is minimized. Kellerer et al. referred to this problem as the stock size problem and showed that it can be approximated to within 3/2. They also showed that an approximation ratio of 2 can be achieved via several simple algorithms. We consider a related problem, which we call the alternating stock size problem, where the numbers of positive and negative integers in the input set S are equal. The problem is the same as above, but we are additionally required to alternate the positive and negative numbers in the output ordering. This problem also has several simple 2-approximations. We show that it can be approximated to within 1.79. Then we show that this problem is closely related to an optimization version of the gasoline puzzle due to Lovász, in which we want to minimize the size of the gas tank necessary to go around the track. We present a 2-approximation for this problem, using a natural linear programming relaxation whose feasible solutions are doubly stochastic matrices. Our novel rounding algorithm is based on a transformation that yields another doubly stochastic matrix with special properties, from which we can extract a suitable permutation.
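
The two stock size objectives can be made concrete with a small sketch that evaluates an ordering and finds the optimum of a tiny instance by trying all permutations. It is a brute-force reference for the problem definition only, not one of the approximation algorithms mentioned above; the function names are hypothetical, and the alternating variant assumes that all inputs are nonzero.

```python
from itertools import permutations


def stock_size(order):
    """Return the maximum prefix sum of `order`, or None if some prefix
    sum is negative (the ordering is then infeasible)."""
    total = peak = 0
    for x in order:
        total += x
        if total < 0:
            return None
        peak = max(peak, total)
    return peak


def brute_force_stock_size(numbers, alternating=False):
    """Exhaustively search all orderings; with alternating=True only
    orderings that alternate positive and negative numbers are allowed.

    Returns (best stock size, one optimal ordering), or None if no
    feasible ordering exists. Exponential time: tiny inputs only.
    """
    best = None
    for order in permutations(numbers):
        if alternating and any((a > 0) == (b > 0)
                               for a, b in zip(order, order[1:])):
            continue
        size = stock_size(order)
        if size is not None and (best is None or size < best[0]):
            best = (size, order)
    return best
```

For the input 3, 2, -1, -4, the ordering 3, -1, 2, -4 has prefix sums 3, 2, 4, 0 and attains the optimal stock size 4, whereas 3, 2, -1, -4 needs stock size 5.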