About me

Machine Learning + Optimization

I am a Ph.D. student at the University of Southern California, advised by Prof. Bistra Dilkina, and working in the Center for AI in Society (CAIS). I work at the intersection of machine learning and combinatorial optimization and am passionate about applying my work to social good settings.

My main research direction is tightly integrating combinatorial optimization into deep learning pipelines so that predictive models can be trained end-to-end with decision-making modules. Additionally, I have worked on machine learning methods for speeding up combinatorial solvers. More broadly, I am interested in machine learning, combinatorial optimization, and developing new methods for tightly integrating the two. I am currently looking for industry positions that can leverage my skills in machine learning, data science, and optimization.

News

  • [July 23-July 29, 2023] ICML 2023 will feature SurCo and our work on using constrastive learning for local search in MILP (paper)!
  • [May 29-June 1, 2023] CPAIOR 2023 will feature our work on predicting wildlife trafficking paths with differentiable path solvers and local branching heuristics for solving MILP.
  • [Feb 27-Mar 3, 2023] Our work will be presented at IPAM Artificial Intelligence and Discrete Optimization: channel, Yuandong’s video, Bistra’s video.
  • [Feb 14, 2023] Our work “Predicting Wildlife Trafficking Routes with Differentiable Shortest Paths” will be presented at the AAAI AI4SG workshop, and was accepted to CPAIOR 2023.

Publications and preprints

SurCo
Aaron Ferber, Taoan Huang, Daochen Zha, Martin Schubert, Benoit Steiner, Bistra Dilkina, Yuandong Tian
SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems
ICML 2023 [paper]

TLDR: Solves nonlinear combinatorial problems using a learned linear surrogate by training the linear surrgate via gradient updates, with applications in large-scale (facebook-sized) recommendation systems and inverse photonics for metaverse device creation. Optimization problems with expensive nonlinear cost functions and combinatorial constraints appear in many real-world applications, but remain challenging to solve efficiently. Existing combinatorial solvers like Mixed Integer Linear Programming can be fast in practice but cannot readily optimize nonlinear cost functions, while general nonlinear optimizers like gradient descent often do not handle complex combinatorial structures, may require many queries of the cost function, and are prone to local optima. To bridge this gap, we propose SurCo that learns linear Surrogate costs which can be used by existing Combinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We learn these linear surrogates end-to-end with the nonlinear loss by differentiating through the linear surrogate solver. Three variants of SurCo are proposed: SurCo-zero operates on individual nonlinear problems, SurCo-prior trains a linear surrogate predictor on distributions of problems, and SurCo-hybrid uses a model trained offline to warm start online solving for SurCo-zero. We analyze our method theoretically and empirically, showing smooth convergence and improved performance. Experiments show that compared to state-of-the-art approaches and expert-designed heuristics, SurCo obtains lower cost solutions with comparable or faster solve time for two realworld industry-level applications: embedding table sharding and inverse photonic design.

contrastive lns
Taoan Huang, Aaron Ferber, Yuandong Tian, Bistra Dilkina, Benoit Steiner
Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning
ICML 2023 [paper]

TLDR: Leveraging contrastive learning to identify promising neighborhoods to locally improve combinatorial solutions. Integer Linear Programs (ILPs) are powerful tools for modeling and solving a large number of combinatorial optimization problems. Recently, it has been shown that Large Neighborhood Search (LNS), as a heuristic algorithm, can find high quality solutions to ILPs faster than Branch and Bound. However, how to find the right heuristics to maximize the performance of LNS remains an open problem. In this paper, we propose a novel approach, CL-LNS, that delivers state-of-the-art anytime performance on several ILP benchmarks measured by metrics including the primal gap, the primal integral, survival rates and the best performing rate. Specifically, CL-LNS collects positive and negative solution samples from an expert heuristic that is slow to compute and learns a new one with a contrastive loss. We use graph attention networks and a richer set of features to further improve its performance.

Path Prediction
Aaron Ferber, Emily Griffin, Bistra Dilkina, Burcu Keskin, Meredith Gore
Predicting Wildlife Trafficking Routes with Differentiable Shortest Paths
CPAIOR 2023

TLDR: Predicting wildlife trafficking routes in the airline network using a path prediction module which relies on a differentiable shortest path solver, applying our predictions on real-world data. Wildlife trafficking (WT), the illegal trade of wild fauna, flora, and their parts, directly threatens biodiversity and conservation of trafficked species, while also negatively impacting human health, national security, and economic development. Wildlife traffickers obfuscate their activities in plain sight, leveraging legal, large, and globally linked transportation networks. To complicate matters, defensive interdiction resources are limited, datasets are fragmented and rarely interoperable, and interventions like setting checkpoints place a burden on legal transportation. As a result, interpretable predictions of which routes wildlife traffickers are likely to take can help target defensive efforts and understand what wildlife traffickers may be considering when selecting routes. We propose a data-driven model for predicting trafficking routes on the global commercial flight network, a transportation network for which we have some historical seizure data and a specification of the possible routes that traffickers may take. While seizure data has limitations such as data bias and dependence on the deployed defensive resources, this is a first step towards predicting wildlife trafficking routes on real-world data. Our seizure data documents the planned commercial flight itinerary of trafficked and successfully interdicted wildlife. We aim to provide predictions of highly-trafficked flight paths for known origin-destination pairs with plausible explanations that illuminate how traffickers make decisions based on the presence of criminal actors, markets, and resilience systems. We propose a model that first predicts likelihoods of which commercial flights will be taken out of a given airport given input features, and then subsequently finds the highest-likelihood flight path from origin to destination using a differentiable shortest path solver, allowing us to automatically align our model's loss with the overall goal of correctly predicting the full flight itinerary from a given source to a destination. We evaluate the proposed model's predictions and interpretations both quantitatively and qualitatively, showing that the predicted paths are aligned with observed held-out seizures, and can be interpreted by policy-makers.

LB Relax
Taoan Huang, Aaron Ferber, Yuandong Tian, Bistra Dilkina, Benoit Steiner
Local Branching Relaxation Heuristics for Integer Linear Programs.
CPAIOR 2023 [arXiv]

TLDR: Several heuristic approaches for local search that we found which give improved performance in several problem classes. Large Neighborhood Search (LNS) is a popular heuristic algorithm for solving combinatorial optimization problems (COP). It starts with an initial solution to the problem and iteratively improves it by searching a large neighborhood around the current best solution. LNS relies on heuristics to select neighborhoods to search in. In this paper, we focus on designing effective and efficient heuristics in LNS for integer linear programs (ILP) since a wide range of COPs can be represented as ILPs. Local Branching (LB) is a heuristic that selects the neighborhood that leads to the largest improvement over the current solution in each iteration of LNS. LB is often slow since it needs to solve an ILP of the same size as input. Our proposed heuristics, LB-RELAX and its variants, use the linear programming relaxation of LB to select neighborhoods. Empirically, LB-RELAX and its variants compute as effective neighborhoods as LB but run faster. They achieve state-of-the-art anytime performance on several ILP benchmarks.

IWT
Meredith Gore, Emily Griffin, Bistra Dilkina, Aaron Ferber, Stanley Griffis, Burcu Keskin, John MacDonald
Advancing Interdisciplinary Science for Disrupting Wildlife Trafficking Networks
PNAS 2023 [Paper]

TLDR: All hands on deck! Combating wildlife trafficking means working with interdisciplinary teams, datasets, and stakeholders. Wildlife trafficking, whether local in scope or as a transnational organized crime, undermines sustainable development efforts, erodes the local and global economy, and facilitates the spread of zoonotic diseases. Wildlife trafficking networks (WTNs) can occupy a unique grey space in supply chains--straddling licit and illicit networks, supporting legitimate and criminal workforces, and often demonstrating high resilience in their sourcing flexibility, adaptability, and market position. Authorities desire, but currently lack, knowledge about how to allocate resources to disrupt illicit supply networks and defend against their collateral impacts. Without a deeper scientific understanding about the structure of WTNs, the response of WTNs to policy, law enforcement, market-based, or other interventions is unknown. This paper advances a novel conceptualization of WTNs by integrating knowledge of licit, for-profit supply chain structures and operations to identify points of resilience. Identifying WTN resilience simultaneously exposes new points of fragility that can potentially be targeted for intervention. The case of Madagascar’s ploughshare tortoise trafficking is presented to illustrate the potential of key advancements of interdisciplinary thinking. Insights herein suggest wide berth to generate new science-based recommendations for WTN-related data collection and analysis for supply chain visibility, shifts in illicit supply chain dominance, or limits of the supplier base. Until then, it will be difficult to effectively identify WTN threats, vulnerabilities, and risks.

iwt survey
Burcu Keskin, Emily Griffin, Jonathan Prell, Bistra Dilkina, Aaron Ferber, John MacDonald, Rowan Hilend, Stanley Griffis, Meredith Gore
Quantitative Investigation of Wildlife Trafficking Supply Chains: A Review
Omega 2023 [paper]

TLDR: A review of computational approaches that interact with the wildlife trafficking supply chain: e.g. analyzing social media, identifying the key monitoring points to stop traffickers using optimization, and more! The illicit wildlife trade is a pervasive and global problem that has far-reaching impacts on both society and the environment. Aside from threatening numerous species around the world and acting as a potential disease transmission vector for several zoonotic diseases, including the COVID-19 pandemic, this complex system is often linked with other illicit networks such as drugs, weapons, and human trafficking. The annual monetary value of wildlife trafficking is estimated to be over twenty billion USD, and, unfortunately, wildlife trafficking has several unique characteristics that make it difficult to disrupt in an effective and efficient manner. There has been much research and media awareness around wildlife conservation and moral issues surrounding the illicit wildlife trade, but little is known about the supply chain structures and operations of these illicit networks, especially from a quantitative, analytical perspective. This research reviews wildlife trafficking through an operations and supply chain lens. By understanding the unique challenges faced in impeding wildlife trafficking, we present opportunities to resolve them using analytical techniques. We provide the groundwork for future developments in detection, interdiction, reduction, and possibly, elimination of illicit wildlife trade.

learning backdoors
Aaron Ferber, Jialin Song, Bistra Dilkina, Yisong Yue
Learning Pseudo-Backdoors for Mixed Integer Programs
CPAIOR 2022 [paper]

TLDR: Use machine learning to identify the core hardness component of hard combinatorial optimization problems so that those can be resolved first before solving the rest of the problem. We propose a machine learning approach for quickly solving Mixed Integer Programs (MIPs) by learning to prioritize sets of branching variables at the root node which result in faster solution times, which we call pseudo-backdoors. Learning-based approaches have seen success in combinatorial optimization by flexibly leveraging common structures in a given distribution of problems. Our approach takes inspiration from the concept of strong backdoors, which are small sets of variables such that only branching on these variables yields an optimal integral solution and a proof of optimality. Our notion of pseudo-backdoors corresponds to a small set of variables such that prioritizing branching on them when possible leads to faster solve time. A key advantage of pseudo-backdoors over strong backdoors is that they retain the solver’s optimality guarantees and are amenable to data-driven identification. Our proposed method learns to estimate the relative solver speed of a candidate pseudo-backdoor and determine whether or not to use it. This pipeline can be used to identify high-quality pseudo-backdoors on unseen MIP instances for a given MIP distribution. We evaluate our method on five problem distributions and find that our approach can efficiently identify high-quality pseudo-backdoors. In addition, we compare our learned approach against Gurobi, a state-of-the-art MIP solver, demonstrating that our method can be used to improve solver performance.

contrastive fairness
Umang Gupta, Aaron Ferber, Bistra Dilkina, Greg Ver Steeg
Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation
AAAI 2021 [paper, code, video]

TLDR: Training fairer machine learning models by trading off predictive performance with information content about protected classes. Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the ability of a particular adversary who seeks to maximize parity. Unfortunately, representations produced by adversarial approaches may still retain biases as their efficacy is tied to the complexity of the adversary used during training. To this end, we theoretically establish that by limiting the mutual information between representations and protected attributes, we can assuredly control the parity of any downstream classifier. We demonstrate an effective method for controlling parity through mutual information based on contrastive information estimators and show that they outperform approaches that rely on variational bounds based on complex generative models. We test our approach on UCI Adult and Heritage Health datasets and demonstrate that our approach provides more informative representations across a range of desired parity thresholds while providing strong theoretical guarantees on the parity of any downstream algorithm.

MIPaaL
Aaron Ferber, Bryan Wilder, Bistra Dilkina, Milind Tambe
MIPaal: Mixed Integer Program as a Layer
AAAI 2020 [paper]

TLDR: Backpropagate through general-purpose NP-Hard problems to train ML + Opt pipelines end-to-end, with applications in finance, smart grid, and fair bipartite matching. Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures average accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a mixed integer linear program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, an algorithm that iteratively tightens the continuous relaxation by adding constraints removing fractional solutions. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and optimization separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP. Lastly, we demonstrate generalization performance in several transfer learning tasks.

Budgeted Prize-Collecting Traveling Salesman and Minimum Spanning Tree Problems
Alice Paul, Daniel Freund, Aaron Ferber, David B. Shmoys, David P. Williamson
Budgeted Prize-Collecting Traveling Salesman and Minimum Spanning Tree Problems
Mathematics of Operations Research [paper]

TLDR: Approximation algorithms for budgeted prize-collecting TSP and MST with empirical evaluation on real-world bikeshare networks. We consider constrained versions of the prize-collecting traveling salesman and the prize-collecting minimum spanning tree problems. The goal is to maximize the number of vertices in the returned tour/tree subject to a bound on the tour/tree cost. Rooted variants of the problems have the additional constraint that a given vertex, the root, must be contained in the tour/tree. We present a 2-approximation algorithm for the rooted and unrooted versions of both the tree and tour variants. The algorithm is based on a parameterized primal–dual approach. It relies on first finding a threshold value for the dual variable corresponding to the budget constraint in the primal and then carefully constructing a tour/tree that is, in a precise sense, just within budget. We improve upon the best-known guarantee of 2 + ε for the rooted and unrooted tour versions and 3 + ε for the rooted and unrooted tree versions. Our analysis extends to the setting with weighted vertices, in which we want to maximize the total weight of vertices in the tour/tree. Interestingly enough, the algorithm and analysis for the rooted case and the unrooted case are almost identical.

Projects

Teaching Testimonials

  • CSCI 499 - AI For Social Good [review]