Build history
270 submissions recorded
anonymous
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
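A minimal sketch of the encoder-decoder idea with the paper's source-reversal trick, in PyTorch; layer sizes and names here are illustrative assumptions, not the paper's 4-layer, 1000-unit configuration:

```python
# Minimal sketch of the encoder-decoder LSTM idea. Sizes are
# illustrative; the paper used deeper, wider LSTMs.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.proj = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source, as the paper found this introduces
        # short-term dependencies that ease optimization.
        src = torch.flip(src, dims=[1])
        _, state = self.encoder(self.src_emb(src))   # fixed-size summary
        out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.proj(out)                        # next-token logits
```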
$ python main.py
anonymous
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
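The core primitive this abstract refers to is scaled dot-product attention, softmax(QKᵀ/√d_k)V; a minimal NumPy sketch with illustrative shapes:

```python
# Scaled dot-product attention, the building block of the Transformer.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 64)) for _ in range(3))
print(attention(Q, K, V).shape)  # (5, 64)
```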
$ python main.py
anonymous
ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras
$ python main.py
anonymous
Autonomous Demand-Side Management Based on Game-Theoretic Energy Consumption Scheduling for the Future Smart Grid
$ python main.py
anonymous
ORB-SLAM: A Versatile and Accurate Monocular SLAM System
$ python main.py
anonymous
Suppressing errors is the central challenge for useful quantum computing, requiring quantum error correction for large-scale processing. However, the overhead in the realization of error-corrected ``logical'' qubits, where information is encoded across many physical qubits for redundancy, poses significant challenges to large-scale logical quantum computing. Here we report the realization of a programmable quantum processor based on encoded logical qubits operating with up to 280 physical qubits. Utilizing logical-level control and a zoned architecture in reconfigurable neutral atom arrays, our system combines high two-qubit gate fidelities, arbitrary connectivity, as well as fully programmable single-qubit rotations and mid-circuit readout. Operating this logical processor with various types of encodings, we demonstrate improvement of a two-qubit logic gate by scaling surface code distance from d=3 to d=7, preparation of color code qubits with break-even fidelities, fault-tolerant creation of logical GHZ states and feedforward entanglement teleportation, as well as operation of 40 color code qubits. Finally, using three-dimensional [[8,3,2]] code blocks, we realize computationally complex sampling circuits with up to 48 logical qubits entangled with hypercube connectivity with 228 logical two-qubit gates and 48 logical CCZ gates. We find that this logical encoding substantially improves algorithmic performance with error detection, outperforming physical qubit fidelities at both cross-entropy benchmarking and quantum simulations of fast scrambling. These results herald the advent of early error-corrected quantum computation and chart a path toward large-scale logical processors.
$ python main.py
anonymous
Quantum computing promises to offer substantial speed-ups over its classical counterpart for certain problems. However, the greatest impediment to realizing its full potential is noise that is inherent to these systems. The widely accepted solution to this challenge is the implementation of fault-tolerant quantum circuits, which is out of reach for current processors. Here we report experiments on a noisy 127-qubit processor and demonstrate the measurement of accurate expectation values for circuit volumes at a scale beyond brute-force classical computation. We argue that this represents evidence for the utility of quantum computing in a pre-fault-tolerant era. These experimental results are enabled by advances in the coherence and calibration of a superconducting processor at this scale and the ability to characterize and controllably manipulate noise across such a large device. We establish the accuracy of the measured expectation values by comparing them with the output of exactly verifiable circuits. In the regime of strong entanglement, the quantum computer provides correct results for which leading classical approximations such as pure-state-based 1D (matrix product states, MPS) and 2D (isometric tensor network states, isoTNS) tensor network methods break down. These experiments demonstrate a foundational tool for the realization of near-term quantum applications.
$ python main.py
anonymous
Quantum Machine Learning in Feature Hilbert Spaces
$ python main.py
anonymous
Good quantum error-correcting codes exist
$ python main.py
anonymous
Efficient sampling of equilibrium states: Molecular dynamics or Monte Carlo methods can be used to sample equilibrium states, but these methods become computationally expensive for complex systems, where the transition from one equilibrium state to another may only occur through rare events. Noé et al. used neural networks and deep learning to generate distributions of independent soft condensed-matter samples at equilibrium (see the Perspective by Tuckerman). Supervised training is used to construct invertible transformations between the coordinates of the complex system of interest and simple Gaussian coordinates of the same dimensionality. Thus, configurations can be sampled in this simpler coordinate system and then transformed back into the complex one using the correct statistical weighting. Science, this issue p. eaaw1147; see also p. 982
$ python main.py
anonymous
Density matrix formulation for quantum renormalization groups
$ python main.py
anonymous
Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC
$ python main.py
anonymous
Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC
$ python main.py
anonymous
Ballistic Focusing of Polyenergetic Protons Driven by Petawatt Laser Pulses
$ python main.py
anonymous
Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules
$ python main.py
anonymous
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
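A toy sketch of the per-token top-2 routing described above, for a single token in NumPy; the router and expert weights are random stand-ins, not Mixtral's:

```python
# Top-2 expert routing: the router picks two experts per token and
# combines their outputs with renormalized softmax weights.
import numpy as np

def moe_layer(x, router_W, experts):
    logits = router_W @ x                      # one logit per expert
    top2 = np.argsort(logits)[-2:]             # indices of the two best
    w = np.exp(logits[top2] - logits[top2].max())
    w /= w.sum()                               # softmax over the top 2 only
    return sum(wi * experts[i](x) for wi, i in zip(w, top2))

rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(16, 16)): W @ x for _ in range(8)]
x = rng.normal(size=16)
print(moe_layer(x, rng.normal(size=(8, 16)), experts).shape)  # (16,)
```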
$ python main.py
anonymous
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy; that is, it aims to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
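A minimal sketch of the entropy-regularized actor update, assuming a Gaussian policy head and an external `q_net` (both hypothetical stand-ins); real SAC also applies tanh squashing with a log-density correction, omitted here:

```python
# Maximum-entropy actor loss: maximize E[Q(s,a) - alpha * log pi(a|s)]
# via the reparameterization trick; negated because optimizers minimize.
import torch

def actor_loss(policy, q_net, states, alpha=0.2):
    mean, log_std = policy(states)                 # Gaussian policy head
    dist = torch.distributions.Normal(mean, log_std.exp())
    actions = dist.rsample()                       # reparameterized sample
    log_prob = dist.log_prob(actions).sum(-1)
    return (alpha * log_prob - q_net(states, actions)).mean()
```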
$ python main.py
anonymous
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
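The clipped surrogate objective is compact enough to state directly; a PyTorch sketch, with advantage estimation assumed done elsewhere:

```python
# PPO clipped surrogate: ratio is pi_new(a|s) / pi_old(a|s), adv is an
# advantage estimate; the min makes the objective a pessimistic bound.
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, adv, eps=0.2):
    ratio = (log_prob_new - log_prob_old).exp()
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()  # negate to minimize
```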
$ python main.py
anonymous
This work presents Neural Equivariant Interatomic Potentials (NequIP), an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. While most contemporary symmetry-aware models use invariant convolutions and only act on scalars, NequIP employs E(3)-equivariant convolutions for interactions of geometric tensors, resulting in a more information-rich and faithful representation of atomic environments. The method achieves state-of-the-art accuracy on a challenging and diverse set of molecules and materials while exhibiting remarkable data efficiency. NequIP outperforms existing models with up to three orders of magnitude fewer training data, challenging the widely held belief that deep neural networks require massive training sets. The high data efficiency of the method allows for the construction of accurate potentials using high-order quantum chemical level of theory as reference and enables high-fidelity molecular dynamics simulations over long time scales.
$ python main.py
anonymous
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base.
$ python main.py
anonymous
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
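A NumPy sketch of the patch-embedding step the abstract describes; the 16-pixel patch size and the projection matrix are illustrative:

```python
# Split an image into non-overlapping patches, flatten each one, and
# project it linearly to get a sequence of tokens for the Transformer.
import numpy as np

def patchify(img, patch=16):
    H, W, C = img.shape
    img = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = img.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return patches                       # (num_patches, patch*patch*C)

rng = np.random.default_rng(0)
proj = rng.normal(size=(768, 16 * 16 * 3))       # learned in practice
tokens = patchify(rng.normal(size=(224, 224, 3))) @ proj.T
print(tokens.shape)                              # (196, 768)
```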
$ python main.py
anonymous
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
$ python main.py
anonymous
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
$ python main.py
anonymous
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
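A worked check of the design rationale: stacking small 3x3 filters matches the receptive field of one larger filter with fewer weights (channel count illustrative):

```python
# Parameter counts for equal receptive fields, C channels in and out.
C = 256
stack_3x3 = 2 * (3 * 3 * C * C)      # two 3x3 layers, receptive field 5x5
single_5x5 = 5 * 5 * C * C           # one 5x5 layer, same receptive field
print(stack_3x3, single_5x5)         # 1179648 < 1638400
three_3x3 = 3 * (3 * 3 * C * C)      # three 3x3 layers, receptive field 7x7
single_7x7 = 7 * 7 * C * C
print(three_3x3, single_7x7)         # 1769472 < 3211264
```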
$ python main.py
anonymous
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
$ python main.py
anonymous
Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.
$ python main.py
anonymous
Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled training data, generated safely out of harm's way. One of the key challenges of synthetic data, to date, has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore the reality gap in the context of 6-DoF pose estimation of known objects from a single RGB image. We show that for this problem the reality gap can be successfully spanned by a simple combination of domain randomized and photorealistic data. Using synthetic data generated in this manner, we introduce a one-shot deep neural network that is able to perform competitively against a state-of-the-art network trained on a combination of real and synthetic data. To our knowledge, this is the first deep network trained only on synthetic data that is able to achieve state-of-the-art performance on 6-DoF object pose estimation. Our network also generalizes better to novel environments including extreme lighting conditions, for which we show qualitative results. Using this network we demonstrate a real-time system estimating object poses with sufficient accuracy for real-world semantic grasping of known household objects in clutter by a real robot.
$ python main.py
anonymous
Minimum snap trajectory generation and control for quadrotors
$ python main.py
anonymous
To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table. We use the resulting dataset, Dex-Net 2.0, to train a Grasp Quality Convolutional Neural Network (GQ-CNN) model that rapidly predicts the probability of success of grasps from depth images, where grasps are specified as the planar position, angle, and depth of a gripper relative to an RGB-D sensor. Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8sec with a success rate of 93% on eight known objects with adversarial geometry and is 3x faster than registering point clouds to a precomputed dataset of objects and indexing grasps. The Dex-Net 2.0 grasp planner also has the highest success rate on a dataset of 10 novel rigid objects and achieves 99% precision (one false positive out of 69 grasps classified as robust) on a dataset of 40 novel household objects, some of which are articulated or deformable. Code, datasets, videos, and supplementary material are available at http://berkeleyautomation.github.io/dex-net .
$ python main.py
anonymous
Climate change impacts on wind energy: A review
$ python main.py
anonymous
PerCom 2008 advertisement
$ python main.py
anonymous
In this paper, we propose a locomotion training framework where a control policy and a state estimator are trained concurrently. The framework consists of a policy network which outputs the desired joint positions and a state estimation network which outputs estimates of the robot's states such as the base linear velocity, foot height, and contact probability. We exploit a fast simulation environment to train the networks and the trained networks are transferred to the real robot. The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road. We also demonstrate that the learned policy can run at up to 3.75 m/s on normal flat ground and 3.54 m/s on a slippery plate with a coefficient of friction of 0.22.
$ python main.py
anonymous
ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras
$ python main.py
anonymous
Autonomous Demand-Side Management Based on Game-Theoretic Energy Consumption Scheduling for the Future Smart Grid
$ python main.py
anonymous
Intelligent control algorithms for robotic-assisted beating heart surgery
$ python main.py
anonymous
ORB-SLAM: A Versatile and Accurate Monocular SLAM System
$ python main.py
anonymous
Quantum computing promises to offer substantial speed-ups over its classical counterpart for certain problems. However, the greatest impediment to realizing its full potential is noise that is inherent to these systems. The widely accepted solution to this challenge is the implementation of fault-tolerant quantum circuits, which is out of reach for current processors. Here we report experiments on a noisy 127-qubit processor and demonstrate the measurement of accurate expectation values for circuit volumes at a scale beyond brute-force classical computation. We argue that this represents evidence for the utility of quantum computing in a pre-fault-tolerant era. These experimental results are enabled by advances in the coherence and calibration of a superconducting processor at this scale and the ability to characterize and controllably manipulate noise across such a large device. We establish the accuracy of the measured expectation values by comparing them with the output of exactly verifiable circuits. In the regime of strong entanglement, the quantum computer provides correct results for which leading classical approximations such as pure-state-based 1D (matrix product states, MPS) and 2D (isometric tensor network states, isoTNS) tensor network methods break down. These experiments demonstrate a foundational tool for the realization of near-term quantum applications.
$ python main.py
anonymous
Suppressing errors is the central challenge for useful quantum computing, requiring quantum error correction for large-scale processing. However, the overhead in the realization of error-corrected ``logical'' qubits, where information is encoded across many physical qubits for redundancy, poses significant challenges to large-scale logical quantum computing. Here we report the realization of a programmable quantum processor based on encoded logical qubits operating with up to 280 physical qubits. Utilizing logical-level control and a zoned architecture in reconfigurable neutral atom arrays, our system combines high two-qubit gate fidelities, arbitrary connectivity, as well as fully programmable single-qubit rotations and mid-circuit readout. Operating this logical processor with various types of encodings, we demonstrate improvement of a two-qubit logic gate by scaling surface code distance from d=3 to d=7, preparation of color code qubits with break-even fidelities, fault-tolerant creation of logical GHZ states and feedforward entanglement teleportation, as well as operation of 40 color code qubits. Finally, using three-dimensional [[8,3,2]] code blocks, we realize computationally complex sampling circuits with up to 48 logical qubits entangled with hypercube connectivity with 228 logical two-qubit gates and 48 logical CCZ gates. We find that this logical encoding substantially improves algorithmic performance with error detection, outperforming physical qubit fidelities at both cross-entropy benchmarking and quantum simulations of fast scrambling. These results herald the advent of early error-corrected quantum computation and chart a path toward large-scale logical processors.
$ python main.py
anonymous
We propose a classical-quantum hybrid algorithm for machine learning on near-term quantum processors, which we call quantum circuit learning. A quantum circuit driven by our framework learns a given task by tuning parameters implemented on it. The iterative optimization of the parameters allows us to circumvent the high-depth circuit. Theoretical investigation shows that a quantum circuit can approximate nonlinear functions, which is further confirmed by numerical simulations. Hybridizing a low-depth quantum circuit and a classical computer for machine learning, the proposed framework paves the way toward applications of near-term quantum devices for quantum machine learning.
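A toy version of the hybrid loop, assuming a one-parameter RY circuit whose Z expectation is cos(theta); the parameter-shift rule used for the gradient is exact for such gates:

```python
# Classical optimizer tuning a quantum-circuit parameter to hit a
# target expectation value: the essence of the hybrid framework.
import numpy as np

def expectation_z(theta):
    # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>  ->  <Z> = cos(theta)
    return np.cos(theta)

def parameter_shift_grad(theta, s=np.pi / 2):
    return (expectation_z(theta + s) - expectation_z(theta - s)) / 2

theta, target, lr = 0.1, -0.5, 0.5
for _ in range(100):
    err = expectation_z(theta) - target
    theta -= lr * 2 * err * parameter_shift_grad(theta)  # chain rule
print(theta, expectation_z(theta))  # <Z> converges to the target -0.5
```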
$ python main.py
anonymous
Quantum Machine Learning in Feature Hilbert Spaces
$ python main.py
anonymous
Supervised learning with quantum-enhanced feature spaces
$ python main.py
anonymous
Probing many-body dynamics on a 51-atom quantum simulator
$ python main.py
anonymous
Feynman's 1982 conjecture, that quantum computers can be programmed to simulate any local quantum system, is shown to be correct.
$ python main.py
anonymous
Long-distance quantum communication with atomic ensembles and linear optics
$ python main.py
anonymous
Quantum cryptography based on Bell’s theorem
$ python main.py
anonymous
Quantum cryptography: Public key distribution and coin tossing
$ python main.py
anonymous
Quantum dynamics of single trapped ions
$ python main.py
anonymous
Circuit quantum electrodynamics
$ python main.py
anonymous
Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation
$ python main.py
anonymous
Practical quantum computing will require error rates well below those achievable with physical qubits. Quantum error correction offers a path to algorithmically relevant error rates by encoding logical qubits within many physical qubits, for which increasing the number of physical qubits enhances protection against physical errors. However, introducing more qubits also increases the number of error sources, so the density of errors must be sufficiently low for logical performance to improve with increasing code size. Here we report the measurement of logical qubit performance scaling across several code sizes, and demonstrate that our system of superconducting qubits has sufficient performance to overcome the additional errors from increasing qubit number. We find that our distance-5 surface code logical qubit modestly outperforms an ensemble of distance-3 logical qubits on average, in terms of both logical error probability over 25 cycles and logical error per cycle ((2.914 ± 0.016)% compared to (3.028 ± 0.023)%). To investigate damaging, low-probability error sources, we run a distance-25 repetition code and observe a 1.7 × 10⁻⁶ logical error per cycle floor set by a single high-energy event (1.6 × 10⁻⁷ excluding this event). We accurately model our experiment, extracting error budgets that highlight the biggest challenges for future systems. These results mark an experimental demonstration in which quantum error correction begins to improve performance with increasing qubit number, illuminating the path to reaching the logical error rates required for computation.
$ python main.py
anonymous
Practical quantum computing will require error rates that are well below what is achievable with physical qubits. Quantum error correction offers a path to algorithmically-relevant error rates by encoding logical qubits within many physical qubits, where increasing the number of physical qubits enhances protection against physical errors. However, introducing more qubits also increases the number of error sources, so the density of errors must be sufficiently low in order for logical performance to improve with increasing code size. Here, we report the measurement of logical qubit performance scaling across multiple code sizes, and demonstrate that our system of superconducting qubits has sufficient performance to overcome the additional errors from increasing qubit number. We find our distance-5 surface code logical qubit modestly outperforms an ensemble of distance-3 logical qubits on average, both in terms of logical error probability over 25 cycles and logical error per cycle ($2.914\%\pm 0.016\%$ compared to $3.028\%\pm 0.023\%$). To investigate damaging, low-probability error sources, we run a distance-25 repetition code and observe a $1.7\times10^{-6}$ logical error per round floor set by a single high-energy event ($1.6\times10^{-7}$ when excluding this event). We are able to accurately model our experiment, and from this model we can extract error budgets that highlight the biggest challenges for future systems. These results mark the first experimental demonstration where quantum error correction begins to improve performance with increasing qubit number, illuminating the path to reaching the logical error rates required for computation.
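A back-of-envelope illustration of the scaling argument, using a classical repetition code with independent bit-flip probability p per qubit; the surface-code experiment is far richer (stabilizer cycles, decoding), but the sub-threshold suppression with distance is the same phenomenon:

```python
# For a distance-d repetition code, a logical error needs more than
# d//2 flips; below threshold, growing d suppresses it exponentially.
from math import comb

def logical_error(p, d):
    return sum(comb(d, k) * p**k * (1 - p)**(d - k)
               for k in range(d // 2 + 1, d + 1))

for d in (3, 5, 7, 25):
    print(d, logical_error(0.01, d))  # drops rapidly with distance
```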
$ python main.py
anonymous
We demonstrate a decoherence-free quantum memory of one qubit. By encoding the qubit into the decoherence-free subspace (DFS) of a pair of trapped ⁹Be⁺ ions, we protect the qubit from environment-induced dephasing that limits the storage time of a qubit composed of a single ion. We measured the storage time under ambient conditions and under interaction with an engineered noisy environment and observed that encoding into the DFS increases the storage time by up to an order of magnitude. The encoding reversibly transfers an arbitrary qubit stored in a single ion to the DFS of two ions.
$ python main.py
anonymous
Magic-state distillation with low overhead
$ python main.py
anonymous
Good quantum error-correcting codes exist
$ python main.py
anonymous
Scheme for reducing decoherence in quantum computer memory
$ python main.py
anonymous
A light approach to quantum advantage: Quantum computational advantage or supremacy is a long-anticipated milestone toward practical quantum computers. Recent work claimed to have reached this point, but subsequent work managed to speed up the classical simulation and pointed toward a sample size–dependent loophole. Quantum computational advantage, rather than being a one-shot experimental proof, will be the result of a long-term competition between quantum devices and classical simulation. Zhong et al. sent 50 indistinguishable single-mode squeezed states into a 100-mode ultralow-loss interferometer and sampled the output using 100 high-efficiency single-photon detectors. By obtaining up to 76-photon coincidence, yielding a state-space dimension of about 10³⁰, they measured a sampling rate that is about 10¹⁴-fold faster than using state-of-the-art classical simulation strategies and supercomputers. Science, this issue p. 1460
$ python main.py
anonymous
Quantum supremacy using a programmable superconducting processor
$ python main.py
anonymous
Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets
$ python main.py
anonymous
We introduce a quantum algorithm that produces approximate solutions for combinatorial optimization problems. The algorithm depends on a positive integer p and the quality of the approximation improves as p is increased. The quantum circuit that implements the algorithm consists of unitary gates whose locality is at most the locality of the objective function whose optimum is sought. The depth of the circuit grows linearly with p times (at worst) the number of constraints. If p is fixed, that is, independent of the input size, the algorithm makes use of efficient classical preprocessing. If p grows with the input size a different strategy is proposed. We study the algorithm as applied to MaxCut on regular graphs and analyze its performance on 2-regular and 3-regular graphs for fixed p. For p = 1, on 3-regular graphs the quantum algorithm always finds a cut that is at least 0.6924 times the size of the optimal cut.
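A brute-force statevector sketch of the p = 1 algorithm on a toy MaxCut instance; the graph, qubit count, and grid search over the two angles are illustrative assumptions:

```python
# QAOA at p = 1: prepare |+...+>, apply the diagonal cost phase
# e^{-i gamma C}, then the transverse-field mixer RX(2 beta) per qubit.
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]       # hypothetical toy graph
n = 4
z = np.arange(2**n)
bits = (z[:, None] >> np.arange(n)) & 1        # bit j of each basis state
cut = sum(bits[:, u] ^ bits[:, v] for u, v in edges)  # MaxCut cost C(z)

def qaoa_p1(gamma, beta):
    psi = np.ones(2**n, dtype=complex) / 2**(n / 2)   # |+...+>
    psi = psi * np.exp(-1j * gamma * cut)             # cost layer
    for q in range(n):                                # mixer layer
        psi = psi.reshape(2**(n - q - 1), 2, 2**q)
        a, b = psi[:, 0, :].copy(), psi[:, 1, :].copy()
        psi[:, 0, :] = np.cos(beta) * a - 1j * np.sin(beta) * b
        psi[:, 1, :] = np.cos(beta) * b - 1j * np.sin(beta) * a
        psi = psi.reshape(-1)
    return np.abs(psi)**2 @ cut                       # expected cut size

grid = np.linspace(0, np.pi, 40)                      # crude angle search
best = max((qaoa_p1(g, b), g, b) for g in grid for b in grid)
print(best[0], "of optimal", cut.max())
```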
$ python main.py
anonymous
A variational eigenvalue solver on a photonic quantum processor
$ python main.py
anonymous
Surface codes: Towards practical large-scale quantum computation
$ python main.py
anonymous
Quantum Algorithm for Linear Systems of Equations
$ python main.py
anonymous
A class of problems is described which can be solved more efficiently by quantum computation than by any classical or stochastic method. The quantum computation solves the problem with certainty in exponentially less time than any classical deterministic computation.
$ python main.py
anonymous
Quantum Mechanics Helps in Searching for a Needle in a Haystack
$ python main.py
anonymous
Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer
$ python main.py
anonymous
This work presents Neural Equivariant Interatomic Potentials (NequIP), an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. While most contemporary symmetry-aware models use invariant convolutions and only act on scalars, NequIP employs E(3)-equivariant convolutions for interactions of geometric tensors, resulting in a more information-rich and faithful representation of atomic environments. The method achieves state-of-the-art accuracy on a challenging and diverse set of molecules and materials while exhibiting remarkable data efficiency. NequIP outperforms existing models with up to three orders of magnitude fewer training data, challenging the widely held belief that deep neural networks require massive training sets. The high data efficiency of the method allows for the construction of accurate potentials using high-order quantum chemical level of theory as reference and enables high-fidelity molecular dynamics simulations over long time scales.
$ python main.py
anonymous
Efficient sampling of equilibrium states: Molecular dynamics or Monte Carlo methods can be used to sample equilibrium states, but these methods become computationally expensive for complex systems, where the transition from one equilibrium state to another may only occur through rare events. Noé et al. used neural networks and deep learning to generate distributions of independent soft condensed-matter samples at equilibrium (see the Perspective by Tuckerman). Supervised training is used to construct invertible transformations between the coordinates of the complex system of interest and simple Gaussian coordinates of the same dimensionality. Thus, configurations can be sampled in this simpler coordinate system and then transformed back into the complex one using the correct statistical weighting. Science, this issue p. eaaw1147; see also p. 982
$ python main.py
anonymous
Hydrodynamics of soft active matter
$ python main.py
anonymous
Novel Type of Phase Transition in a System of Self-Driven Particles
$ python main.py
anonymous
Density matrix formulation for quantum renormalization groups
$ python main.py
anonymous
Noisy intermediate-scale quantum algorithms
$ python main.py
anonymous
On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory simultaneously observed a transient gravitational-wave signal. The signal sweeps upwards in frequency from 35 to 250 Hz with a peak gravitational-wave strain of $1.0\times10^{-21}$. It matches the waveform predicted by general relativity for the inspiral and merger of a pair of black holes and the ringdown of the resulting single black hole. The signal was observed with a matched-filter signal-to-noise ratio of 24 and a false alarm rate estimated to be less than 1 event per 203 000 years, equivalent to a significance greater than 5.1σ. The source lies at a luminosity distance of $410^{+160}_{-180}$ Mpc corresponding to a redshift $z = 0.09^{+0.03}_{-0.04}$. In the source frame, the initial black hole masses are $36^{+5}_{-4}\,M_\odot$ and $29^{+4}_{-4}\,M_\odot$, and the final black hole mass is $62^{+4}_{-4}\,M_\odot$, with $3.0^{+0.5}_{-0.5}\,M_\odot c^2$ radiated in gravitational waves. All uncertainties define 90% credible intervals. These observations demonstrate the existence of binary stellar-mass black hole systems. This is the first direct detection of gravitational waves and the first observation of a binary black hole merger.
$ python main.py
anonymous
We present cosmological parameter results from the final full-mission Planck measurements of the cosmic microwave background (CMB) anisotropies, combining information from the temperature and polarization maps and the lensing reconstruction. Compared to the 2015 results, improved measurements of large-scale polarization allow the reionization optical depth to be measured with higher precision, leading to significant gains in the precision of other correlated parameters. Improved modelling of the small-scale polarization leads to more robust constraints on many parameters, with residual modelling uncertainties estimated to affect them only at the 0.5σ level. We find good consistency with the standard spatially-flat 6-parameter ΛCDM cosmology having a power-law spectrum of adiabatic scalar perturbations (denoted “base ΛCDM” in this paper), from polarization, temperature, and lensing, separately and in combination. A combined analysis gives dark matter density $\Omega_c h^2 = 0.120 \pm 0.001$, baryon density $\Omega_b h^2 = 0.0224 \pm 0.0001$, scalar spectral index $n_s = 0.965 \pm 0.004$, and optical depth $\tau = 0.054 \pm 0.007$ (in this abstract we quote 68% confidence regions on measured parameters and 95% on upper limits). The angular acoustic scale is measured to 0.03% precision, with $100\theta_* = 1.0411 \pm 0.0003$. These results are only weakly dependent on the cosmological model and remain stable, with somewhat increased errors, in many commonly considered extensions. Assuming the base-ΛCDM cosmology, the inferred (model-dependent) late-Universe parameters are: Hubble constant $H_0 = (67.4 \pm 0.5)$ km s$^{-1}$ Mpc$^{-1}$.
$ python main.py
anonymous
Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC
$ python main.py
anonymous
Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC
$ python main.py
anonymous
The predictions of gyrokinetic and gyrofluid simulations of ion-temperature-gradient (ITG) instability and turbulence in tokamak plasmas as well as some tokamak plasma thermal transport models, which have been widely used for predicting the performance of the proposed International Thermonuclear Experimental Reactor (ITER) tokamak [Plasma Physics and Controlled Nuclear Fusion Research, 1996 (International Atomic Energy Agency, Vienna, 1997), Vol. 1, p. 3], are compared. These comparisons provide information on effects of differences in the physics content of the various models and on the fusion-relevant figures of merit of plasma performance predicted by the models. Many of the comparisons are undertaken for a simplified plasma model and geometry which is an idealization of the plasma conditions and geometry in a Doublet III-D [Plasma Physics and Controlled Nuclear Fusion Research, 1986 (International Atomic Energy Agency, Vienna, 1987), Vol. 1, p. 159] high confinement (H-mode) experiment. Most of the models show good agreements in their predictions and assumptions for the linear growth rates and frequencies. There are some differences associated with different equilibria. However, there are significant differences in the transport levels between the models. The causes of some of the differences are examined in some detail, with particular attention to numerical convergence in the turbulence simulations (with respect to simulation mesh size, system size and, for particle-based simulations, the particle number). The implications for predictions of fusion plasma performance are also discussed.
$ python main.py
anonymous
Ballistic Focusing of Polyenergetic Protons Driven by Petawatt Laser Pulses
$ python main.py
anonymous
$Z_2$ Topological Order and the Quantum Spin Hall Effect
$ python main.py
anonymous
New Method for High-Accuracy Determination of the Fine-Structure Constant Based on Quantized Hall Resistance
$ python main.py
anonymous
We present an overview of the lattice Boltzmann method (LBM), a parallel and efficient algorithm for simulating single-phase and multiphase fluid flows and for incorporating additional physical complexities. The LBM is especially useful for modeling complicated boundary conditions and multiphase interfaces. Recent extensions of this method are described, including simulations of fluid turbulence, suspension flows, and reaction diffusion systems.
$ python main.py
anonymous
Recovery of the Navier-Stokes equations using a lattice-gas Boltzmann method
$ python main.py
anonymous
New Monte Carlo technique for studying phase transitions
$ python main.py
anonymous
Nonuniversal critical dynamics in Monte Carlo simulations
$ python main.py
anonymous
Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules
$ python main.py
anonymous
Fast Parallel Algorithms for Short-Range Molecular Dynamics
$ python main.py
anonymous
Ab initio effective core potentials (ECP’s) have been generated to replace the Coulomb, exchange, and core-orthogonality effects of the chemically inert core electrons in the transition metal atoms Sc to Hg. For the second and third transition series, relativistic ECP’s have been generated which also incorporate the mass–velocity and Darwin relativistic effects into the potential. The ab initio ECP’s should facilitate valence electron calculations on molecules containing transition-metal atoms with accuracies approaching all-electron calculations at a fraction of the computational cost. Analytic fits to the potentials are presented for use in multicenter integral evaluation. Gaussian orbital valence basis sets are developed for the (3d,4s,4p), (4d,5s,5p), and (5d,6s,6p) orbitals of the first, second, and third transition series atoms, respectively. All-electron and valence-electron atomic excitation energies are also compared for the low-lying states of Sc–Hg, and the valence-electron calculations are found to reproduce the all-electron excitation energies (typically within a few tenths of an eV).
$ python main.py
anonymous
In molecular dynamics (MD) simulations the need often arises to maintain such parameters as temperature or pressure rather than energy and volume, or to impose gradients for studying transport properties in nonequilibrium MD. A method is described to realize coupling to an external bath with constant temperature or pressure with adjustable time constants for the coupling. The method is easily extendable to other variables and to gradients, and can be applied also to polyatomic molecules involving internal constraints. The influence of coupling time constants on dynamical variables is evaluated. A leap-frog algorithm is presented for the general case involving constraints with coupling to both a constant temperature and a constant pressure bath.
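A sketch of the weak-coupling velocity-rescaling step, using the scaling factor λ = [1 + (Δt/τ)(T₀/T − 1)]^½ from the paper; the kinetic-temperature estimate is simplified (kB = 1, no constraint or center-of-mass corrections):

```python
# Each step scales velocities toward the bath temperature T0 with
# time constant tau, relaxing T geometrically by (1 - dt/tau) per step.
import numpy as np

def berendsen_scale(vel, masses, T0, dt, tau, kB=1.0):
    ndof = vel.size                               # simplified DOF count
    T = (masses[:, None] * vel**2).sum() / (ndof * kB)
    lam = np.sqrt(1.0 + (dt / tau) * (T0 / T - 1.0))
    return lam * vel

rng = np.random.default_rng(0)
v, m = rng.normal(size=(100, 3)), np.ones(100)
for _ in range(50):
    v = berendsen_scale(v, m, T0=0.5, dt=0.01, tau=0.1)
# the kinetic temperature has relaxed toward T0 = 0.5
```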
$ python main.py
anonymous
In the molecular dynamics simulation method for fluids, the equations of motion for a collection of particles in a fixed volume are solved numerically. The energy, volume, and number of particles are constant for a particular simulation, and it is assumed that time averages of properties of the simulated fluid are equal to microcanonical ensemble averages of the same properties. In some situations, it is desirable to perform simulations of a fluid for particular values of temperature and/or pressure or under conditions in which the energy and volume of the fluid can fluctuate. This paper proposes and discusses three methods for performing molecular dynamics simulations under conditions of constant temperature and/or pressure, rather than constant energy and volume. For these three methods, it is shown that time averages of properties of the simulated fluid are equal to averages over the isoenthalpic–isobaric, canonical, and isothermal–isobaric ensembles. Each method is a way of describing the dynamics of a certain number of particles in a volume element of a fluid while taking into account the influence of surrounding particles in changing the energy and/or density of the simulated volume element. The influence of the surroundings is taken into account without introducing unwanted surface effects. Examples of situations where these methods may be useful are discussed.
$ python main.py
anonymous
Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules
$ python main.py
anonymous
Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set
$ python main.py
anonymous
Despite the remarkable thermochemical accuracy of Kohn–Sham density-functional theories with gradient corrections for exchange-correlation [see, for example, A. D. Becke, J. Chem. Phys. 96, 2155 (1992)], we believe that further improvements are unlikely unless exact-exchange information is considered. Arguments to support this view are presented, and a semiempirical exchange-correlation functional containing local-spin-density, gradient, and exact-exchange terms is tested on 56 atomization energies, 42 ionization potentials, 8 proton affinities, and 10 total atomic energies of first- and second-row systems. This functional performs significantly better than previous functionals with gradient corrections only, and fits experimental atomization energies with an impressively small average absolute deviation of 2.4 kcal/mol.
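The semiempirical three-parameter hybrid form the abstract describes, with the fitted coefficients as we read them from the paper:

```latex
E_{xc} = E_{xc}^{\mathrm{LSDA}}
       + a_0\,\bigl(E_x^{\mathrm{exact}} - E_x^{\mathrm{LSDA}}\bigr)
       + a_x\,\Delta E_x^{\mathrm{B88}}
       + a_c\,\Delta E_c^{\mathrm{PW91}},
\qquad a_0 = 0.20,\; a_x = 0.72,\; a_c = 0.81
```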
$ python main.py
anonymous
Generalized Gradient Approximation Made Simple
$ python main.py
anonymous
Inhomogeneous Electron Gas
$ python main.py
anonymous
Self-Consistent Equations Including Exchange and Correlation Effects
$ python main.py
anonymous
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.
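A single-example NumPy sketch of noisy top-k gating: trainable noise is added to the gate logits, all but the top k are masked to negative infinity, and a softmax yields sparse mixture weights; all weight matrices here are random stand-ins:

```python
# Noisy top-k gating for a sparsely-gated MoE layer (single example).
import numpy as np

def noisy_topk_gate(x, Wg, Wnoise, k=2, rng=np.random.default_rng(0)):
    clean = Wg @ x
    noise = rng.normal(size=clean.shape) * np.log1p(np.exp(Wnoise @ x))
    logits = clean + noise                      # softplus noise scale
    mask = np.full_like(logits, -np.inf)
    top = np.argsort(logits)[-k:]
    mask[top] = logits[top]                     # keep only top-k logits
    w = np.exp(mask - mask[top].max())
    return w / w.sum()                          # sparse mixture weights

rng = np.random.default_rng(1)
g = noisy_topk_gate(rng.normal(size=32), rng.normal(size=(8, 32)),
                    rng.normal(size=(8, 32)))
print(np.count_nonzero(g))  # 2 of 8 experts active
```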
$ python main.py
anonymous
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
$ python main.py
anonymous
Position encoding has recently proven effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in the self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping the linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. Furthermore, we provide a theoretical analysis to explain some experimental results. RoFormer is already integrated into Huggingface: \url{https://huggingface.co/docs/transformers/model_doc/roformer}.
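A NumPy sketch of the rotation, in the common "rotate-half" layout (the paper pairs adjacent dimensions, which is equivalent up to a permutation); base 10000 follows the paper:

```python
# Each pair of feature dimensions is rotated by an angle proportional
# to position, so q.k dot products depend only on relative position.
import numpy as np

def rope(x, base=10000.0):
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # per-pair frequencies
    angles = np.arange(seq)[:, None] * freqs[None, :]  # position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = rope(np.random.default_rng(0).normal(size=(128, 64)))
```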
$ python main.py
anonymous
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3$\times$ speedup on GPT-2 (seq. length 1K), and 2.4$\times$ speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
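A NumPy sketch of the tiling idea with the online-softmax rescaling, ignoring the GPU memory hierarchy that motivates the real algorithm; the block size is illustrative:

```python
# Attention one key/value block at a time, with a running max and
# normalizer, so the full n-by-n score matrix is never materialized.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)              # running row max
    l = np.zeros(n)                      # running softmax denominator
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        scores = Q @ Kb.T / np.sqrt(d)
        m_new = np.maximum(m, scores.max(axis=1))
        scale = np.exp(m - m_new)                  # rescale old statistics
        p = np.exp(scores - m_new[:, None])
        out = out * scale[:, None] + p @ Vb
        l = l * scale + p.sum(axis=1)
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 64)) for _ in range(3))
assert np.allclose(tiled_attention(Q, K, V),
                   tiled_attention(Q, K, V, block=256))
```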
$ python main.py
anonymous
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5$\times$ higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
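A one-channel toy of the selective recurrence: the discretization step, input gate, and readout all depend on the current input, which is the "selection" mechanism; real Mamba uses per-channel state vectors, ZOH discretization, and a hardware-aware parallel scan:

```python
# Selective SSM toy: h_t = Abar_t * h_{t-1} + Bbar_t * x_t, where the
# discretized parameters are functions of the input (per-token gating).
import numpy as np

def selective_scan(x, W_dt, W_B, W_C, A=-1.0):
    h, ys = 0.0, []
    for xt in x:
        dt = np.log1p(np.exp(W_dt * xt))   # input-dependent step (softplus)
        Abar = np.exp(dt * A)              # discretized decay in (0, 1]
        Bbar = dt * (W_B * xt)             # input-dependent input gate
        h = Abar * h + Bbar * xt           # linear recurrence in h
        ys.append((W_C * xt) * h)          # input-dependent readout
    return np.array(ys)

y = selective_scan(np.sin(np.linspace(0, 6, 50)), 1.0, 0.5, 2.0)
```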
$ python main.py
anonymous
Mastering the game of Go without human knowledge
$ python main.py
anonymous
While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
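The heart of the method is a single logistic loss on the difference of policy/reference log-ratios for the chosen and rejected responses. A minimal NumPy sketch over per-example sequence log-probabilities (beta = 0.1 is an illustrative value, not a recommendation from this abstract):

import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Implicit reward of a response = beta * (log pi(y|x) - log pi_ref(y|x));
    # the loss pushes the chosen (w) reward above the rejected (l) one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return np.mean(np.logaddexp(0.0, -margin))   # -log sigmoid(margin), stable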
$ python main.py
anonymous
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
$ python main.py
anonymous
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy; that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
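The entropy term appears directly in the Bellman backup. A minimal NumPy sketch of the critic's target value, assuming the clipped double-Q trick used in common SAC implementations (two Q estimates, take the min), which this abstract does not spell out:

import numpy as np

def sac_target(r, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2, done=0.0):
    # Soft value of the next state: Q minus alpha * log pi, so the policy
    # is rewarded for keeping its entropy high while solving the task.
    v_next = np.minimum(q1_next, q2_next) - alpha * logp_next
    return r + gamma * (1.0 - done) * v_next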
$ python main.py
anonymous
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
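The clipped surrogate is short enough to state directly; a minimal NumPy sketch (the objective is maximized, so a training loop would ascend it or minimize its negative):

import numpy as np

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    # Probability ratio between current and data-collecting policy; clipping
    # removes the incentive to move the ratio outside [1 - eps, 1 + eps].
    ratio = np.exp(logp_new - logp_old)
    return np.mean(np.minimum(ratio * adv,
                              np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))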
$ python main.py
anonymous
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512. We release our code at https://github.com/openai/guided-diffusion
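The guidance step itself is one line: shift the reverse-diffusion mean by the scaled classifier gradient. A toy NumPy sketch with an analytic logistic "classifier" standing in for a real network (the weight vector w is a placeholder, and sigma2 is the reverse-step variance):

import numpy as np

def guided_mean(mu, sigma2, x_t, w, scale=1.0):
    # mu' = mu + scale * Sigma * grad_x log p(y=1 | x_t), here with
    # p(y=1 | x) = sigmoid(w @ x), whose log-gradient is (1 - p) * w.
    p = 1.0 / (1.0 + np.exp(-(w @ x_t)))
    return mu + scale * sigma2 * (1.0 - p) * w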
$ python main.py
anonymous
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
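For the straight-line (OT displacement) conditional path, the regression target is strikingly simple. A minimal NumPy sketch with the sigma_min -> 0 simplification, where v_model is a hypothetical callable for the learned vector field:

import numpy as np

def cfm_loss(v_model, x0, x1, rng):
    # x0: (B, d) noise samples; x1: (B, d) data samples.
    # Path x_t = (1 - t) x0 + t x1 has constant target field u_t = x1 - x0.
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    return np.mean((v_model(xt, t) - (x1 - x0)) ** 2)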
$ python main.py
anonymous
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs. Code is available at https://github.com/CompVis/latent-diffusion .
$ python main.py
anonymous
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion
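The simplified training objective reduces to noise regression. A minimal NumPy sketch, where eps_model is a hypothetical callable for the denoising network and alphas_bar is the cumulative product of the noise schedule:

import numpy as np

def ddpm_loss(eps_model, x0, alphas_bar, rng):
    # Sample a timestep and Gaussian noise, form the noised x_t in closed
    # form, and regress the network on the noise it must remove.
    t = rng.integers(0, len(alphas_bar), size=x0.shape[0])
    eps = rng.standard_normal(x0.shape)
    ab = alphas_bar[t][:, None]                     # x0: (B, d)
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return np.mean((eps_model(xt, t) - eps) ** 2)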
$ python main.py
anonymous
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021), on most of the benchmarks at image and pixel levels.
$ python main.py
anonymous
In this paper, we question if self-supervised learning provides new properties to Vision Transformers (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of the momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base.
$ python main.py
anonymous
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
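The pre-training task amounts to a symmetric cross-entropy over a batch of (image, text) pairs. A minimal NumPy sketch over already-computed embeddings (the temperature 0.07 is an illustrative value; in the paper it is learned):

import numpy as np

def clip_loss(img, txt, temp=0.07):
    # img, txt: (B, d) embeddings of matched pairs; after normalization the
    # matched similarities sit on the diagonal of the logit matrix.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temp

    def xent(l):                         # cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        i = np.arange(len(l))
        return np.mean(np.log(np.exp(l).sum(axis=1)) - l[i, i])

    return 0.5 * (xent(logits) + xent(logits.T))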
$ python main.py
anonymous
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
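The "sequence of image patches" is just a reshape. A minimal NumPy sketch of the patch extraction that precedes the linear projection (patch size 16 follows the ViT-Base/16 convention):

import numpy as np

def patchify(img, patch=16):
    # (H, W, C) image -> (N, patch*patch*C) flattened patches, row-major,
    # with N = (H / patch) * (W / patch); each row is then linearly projected.
    H, W, C = img.shape
    g = img.reshape(H // patch, patch, W // patch, patch, C)
    return g.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)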
$ python main.py
anonymous
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
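The headline rule is easy to turn into arithmetic. A rough sketch, assuming the common reading of the paper's results as roughly 20 training tokens per parameter and the standard C ≈ 6ND FLOP estimate:

def chinchilla_optimal(C, tokens_per_param=20.0):
    # C ≈ 6 * N * D with D/N held near ~20 gives N ≈ sqrt(C / 120).
    N = (C / (6.0 * tokens_per_param)) ** 0.5
    return N, tokens_per_param * N        # (parameters, training tokens)

# chinchilla_optimal(5.8e23) -> roughly 7e10 params and 1.4e12 tokens,
# i.e. the 70B-parameter, ~1.4T-token operating point described above.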
$ python main.py
anonymous
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
$ python main.py
anonymous
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
$ python main.py
anonymous
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
$ python main.py
anonymous
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
$ python main.py
anonymous
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
$ python main.py
anonymous
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
$ python main.py
anonymous
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
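The modern "inverted" formulation of the idea fits in a few lines; note the paper instead halves the weights at test time, which is equivalent in expectation. A minimal NumPy sketch:

import numpy as np

def dropout(x, rng, p=0.5, train=True):
    # Randomly zero each unit with probability p during training, rescaling
    # the survivors by 1/(1 - p) so the expected activation is unchanged.
    if not train:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)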
$ python main.py
anonymous
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
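A minimal NumPy sketch of one Adam update with the bias-corrected moment estimates (the default hyper-parameters follow the values commonly associated with the method):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # First and second moment estimates, exponentially decayed.
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    m_hat = m / (1.0 - b1 ** t)          # bias correction; t starts at 1
    v_hat = v / (1.0 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v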
$ python main.py
anonymous
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
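A minimal NumPy sketch of the training-mode transform for a fully connected layer (the running statistics used at inference, and their updates, are omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the mini-batch,
    # then restore representational power with learned scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta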
$ python main.py
anonymous
Learning representations by back-propagating errors
$ python main.py
anonymous
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
$ python main.py
anonymous
Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io.
$ python main.py
anonymous
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/
$ python main.py
anonymous
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to this category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
$ python main.py
anonymous
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io
$ python main.py
anonymous
We present a new optimization-based approach for robotic motion planning among obstacles. Like CHOMP (Covariant Hamiltonian Optimization for Motion Planning), our algorithm can be used to find collision-free trajectories from naïve, straight-line initializations that might be in collision. At the core of our approach are (a) a sequential convex optimization procedure, which penalizes collisions with a hinge loss and increases the penalty coefficients in an outer loop as necessary, and (b) an efficient formulation of the no-collisions constraint that directly considers continuous-time safety. Our algorithm is implemented in a software package called TrajOpt. We report results from a series of experiments comparing TrajOpt with CHOMP and randomized planners from OMPL, with regard to planning time and path quality. We consider motion planning for 7 DOF robot arms, 18 DOF full-body robots, statically stable walking motion for the 34 DOF Atlas humanoid robot, and physical experiments with the 18 DOF PR2. We also apply TrajOpt to plan curvature-constrained steerable needle trajectories in the SE(3) configuration space and multiple non-intersecting curved channels within 3D-printed implants for intracavitary brachytherapy. Details, videos, and source code are freely available at: http://rll.berkeley.edu/trajopt/ijrr .
$ python main.py
anonymous
Minimum snap trajectory generation and control for quadrotors
$ python main.py
anonymous
In this paper, we propose a locomotion training framework where a control policy and a state estimator are trained concurrently. The framework consists of a policy network which outputs the desired joint positions and a state estimation network which outputs estimates of the robot's states, such as the base linear velocity, foot height, and contact probability. We exploit a fast simulation environment to train the networks and the trained networks are transferred to the real robot. The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road. We also demonstrate that the learned policy can run at up to 3.75 m/s on normal flat ground and 3.54 m/s on a slippery plate with a coefficient of friction of 0.22.
$ python main.py
anonymous
Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.
$ python main.py
anonymous
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
$ python main.py
anonymous
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order-of-magnitude speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
$ python main.py
anonymous
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
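The algorithm being described is DAgger, and its loop structure is compact. A schematic Python sketch under hypothetical interfaces: rollout(policy) returns the states that policy visits, expert(s) returns the expert's action, and fit(dataset) performs supervised learning:

def dagger(rollout, expert, fit, n_iters=10):
    # Key idea: the expert labels states induced by the *learner's* own
    # policy, so training and test state distributions match over time.
    data = [(s, expert(s)) for s in rollout(expert)]   # initial expert data
    policy = fit(data)
    for _ in range(n_iters):
        data += [(s, expert(s)) for s in rollout(policy)]
        policy = fit(data)                              # train on aggregate
    return policy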
$ python main.py
anonymous
Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. They are often considered different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting by up to $2.9\times$ task success rate given the same training data. With $10\times$ less training data, VIMA still performs $2.7\times$ better than the best competing variant. Code and video demos are available at https://vimalabs.github.io/
$ python main.py
anonymous
Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled training data, generated safely out of harm's way. One of the key challenges of synthetic data, to date, has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore the reality gap in the context of 6-DoF pose estimation of known objects from a single RGB image. We show that for this problem the reality gap can be successfully spanned by a simple combination of domain randomized and photorealistic data. Using synthetic data generated in this manner, we introduce a one-shot deep neural network that is able to perform competitively against a state-of-the-art network trained on a combination of real and synthetic data. To our knowledge, this is the first deep network trained only on synthetic data that is able to achieve state-of-the-art performance on 6-DoF object pose estimation. Our network also generalizes better to novel environments including extreme lighting conditions, for which we show qualitative results. Using this network we demonstrate a real-time system estimating object poses with sufficient accuracy for real-world semantic grasping of known household objects in clutter by a real robot.
$ python main.py
anonymous
To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table. We use the resulting dataset, Dex-Net 2.0, to train a Grasp Quality Convolutional Neural Network (GQ-CNN) model that rapidly predicts the probability of success of grasps from depth images, where grasps are specified as the planar position, angle, and depth of a gripper relative to an RGB-D sensor. Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8sec with a success rate of 93% on eight known objects with adversarial geometry and is 3x faster than registering point clouds to a precomputed dataset of objects and indexing grasps. The Dex-Net 2.0 grasp planner also has the highest success rate on a dataset of 10 novel rigid objects and achieves 99% precision (one false positive out of 69 grasps classified as robust) on a dataset of 40 novel household objects, some of which are articulated or deformable. Code, datasets, videos, and supplementary material are available at http://berkeleyautomation.github.io/dex-net .
$ python main.py
anonymous
Incorporating in-situ force sensing capabilities in a magnetic microrobot
$ python main.py
anonymous
A monocular visual-inertial system (VINS), consisting of a camera and a low-cost inertial measurement unit (IMU), forms the minimum sensor suite for metric six degrees-of-freedom (DOF) state estimation. However, the lack of direct distance measurement poses significant challenges in terms of IMU processing, estimator initialization, extrinsic calibration, and nonlinear optimization. In this work, we present VINS-Mono: a robust and versatile monocular visual-inertial state estimator. Our approach starts with a robust procedure for estimator initialization and failure recovery. A tightly-coupled, nonlinear optimization-based method is used to obtain high accuracy visual-inertial odometry by fusing pre-integrated IMU measurements and feature observations. A loop detection module, in combination with our tightly-coupled formulation, enables relocalization with minimum computation overhead. We additionally perform four degrees-of-freedom pose graph optimization to enforce global consistency. We validate the performance of our system on public datasets and real-world experiments and compare against other state-of-the-art algorithms. We also perform onboard closed-loop autonomous flight on the MAV platform and port the algorithm to an iOS-based demonstration. We highlight that the proposed work is a reliable, complete, and versatile system that is applicable for different applications that require high accuracy localization. We open source our implementations for both PCs and iOS mobile devices.
$ python main.py
anonymous
Reduced-order models and controllers for continuous-time stochastic systems: an information theory approach
$ python main.py
anonymous
Associating Uncertainty With Three-Dimensional Poses for Use in Estimation Problems
$ python main.py
anonymous
We present an information theoretic approach to stochastic optimal control problems that can be used to derive general sampling based optimization schemes. This new mathematical method is used to develop a sampling based model predictive control algorithm. We apply this information theoretic model predictive control (IT-MPC) scheme to the task of aggressive autonomous driving around a dirt test track, and compare its performance to a model predictive control version of the cross-entropy method.
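A minimal NumPy sketch of the sampling-based update at the core of such a controller: perturb the nominal control sequence, weight each rollout by its exponentiated cost, and average. dynamics and cost are hypothetical callables, and the control-cost coupling term of the full derivation is omitted:

import numpy as np

def mppi_update(dynamics, cost, x0, U, n_samples=256, sigma=0.5, lam=1.0,
                rng=None):
    # U: (H, m) nominal controls over horizon H with m-dimensional inputs.
    rng = rng or np.random.default_rng()
    H, m = U.shape
    noise = rng.standard_normal((n_samples, H, m)) * sigma
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0
        for t in range(H):
            x = dynamics(x, U[t] + noise[k, t])
            costs[k] += cost(x)
    w = np.exp(-(costs - costs.min()) / lam)   # information-theoretic weights
    w /= w.sum()
    return U + np.einsum("k,khm->hm", w, noise)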
$ python main.py
anonymous
Probabilistic roadmaps for path planning in high-dimensional configuration spaces
$ python main.py
anonymous
During the last decade, sampling-based path planning algorithms, such as probabilistic roadmaps (PRM) and rapidly exploring random trees (RRT), have been shown to work well in practice and possess theoretical guarantees such as probabilistic completeness. However, little effort has been devoted to the formal analysis of the quality of the solution returned by such algorithms, e.g. as a function of the number of samples. The purpose of this paper is to fill this gap, by rigorously analyzing the asymptotic behavior of the cost of the solution returned by stochastic sampling-based algorithms as the number of samples increases. A number of negative results are provided, characterizing existing algorithms, e.g. showing that, under mild technical conditions, the cost of the solution returned by broadly used sampling-based algorithms converges almost surely to a non-optimal value. The main contribution of the paper is the introduction of new algorithms, namely PRM* and RRT*, which are provably asymptotically optimal, i.e. such that the cost of the returned solution converges almost surely to the optimum. Moreover, it is shown that the computational complexity of the new algorithms is within a constant factor of that of their probabilistically complete (but not asymptotically optimal) counterparts. The analysis in this paper hinges on novel connections between stochastic sampling-based path planning algorithms and the theory of random geometric graphs.
$ python main.py
anonymous
We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry -- represented as inverse depth in a reference frame -- and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.
$ python main.py
anonymous
Simultaneous localization and mapping: part I
$ python main.py
anonymous
Intelligent control algorithms for robotic-assisted beating heart surgery
$ python main.py
anonymous
ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras
$ python main.py
anonymous
ORB-SLAM: A Versatile and Accurate Monocular SLAM System
$ python main.py
anonymous
Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim.
$ python main.py
anonymous
Suppose that $M$ is a Kähler manifold with a pole such that its holomorphic sectional curvature is bounded from below by a constant and its radial sectional curvature is also bounded from below. Suppose that $N$ is a strongly pseudoconvex complex Finsler manifold such that its holomorphic sectional curvature is bounded from above by a negative constant. In this paper, we establish a Schwarz lemma for holomorphic mappings $f$ from $M$ into $N$. As applications, we obtain a Liouville type rigidity result for holomorphic mappings $f$ from $M$ into $N$, as well as a rigidity theorem for bimeromorphic mappings from a compact complex manifold into a compact complex Finsler manifold.
$ python main.py
anonymous
Wide-Area Monitoring, Protection, and Control of Future Electric Power Networks
$ python main.py
anonymous
This paper investigates the design and performance of delayed bit-interleaved coded modulation (DBICM) with low-density parity-check (LDPC) codes. For Gray labeled square $M$-ary quadrature amplitude modulation (QAM) constellations, we investigate the optimal delay scheme with the largest spectrum efficiency of DBICM for a fixed maximum number of delayed time slots and a given signal-to-noise ratio. When analyzing the capacity of DBICM, we find two important properties: the capacity improvements due to delayed coded bits being mapped to the real and imaginary parts of the transmitted symbols are independent of each other; a pair of delay schemes with delayed coded bits having identical bit-channel capacity lead to equivalent DBICM capacity. Using these two properties, we efficiently optimize the delay scheme for any uniform Gray-QAM systems. Furthermore, these two properties enable efficient LDPC code designs regarding unequal error protection via bit-channel type classifications. Moreover, we use protograph-based extrinsic information transfer charts to jointly optimize degree distributions and channel assignments of LDPC codes and propose a constrained progressive-edge-growth-like algorithm to jointly construct LDPC codes and bit-interleavers for DBICM, taking the distinct bit-channel capacities into account. Simulation results demonstrate that the designed LDPC coded DBICM systems significantly outperform LDPC coded BICM systems.
$ python main.py
anonymous
Numerical solution of initial boundary value problems involving maxwell's equations in isotropic media
$ python main.py
anonymous
Autonomous Demand-Side Management Based on Game-Theoretic Energy Consumption Scheduling for the Future Smart Grid
$ python main.py
anonymous
Hierarchical Control of Droop-Controlled AC and DC Microgrids—A General Approach Toward Standardization
$ python main.py
anonymous
MicroGrids
$ python main.py
anonymous
Extended Kalman filtering for battery management systems of LiPB-based HEV battery packs
$ python main.py
anonymous
Modeling of Galvanostatic Charge and Discharge of the Lithium/Polymer/Insertion Cell
$ python main.py
sayonsom
Resiliency-Driven Proactive Distribution System Reconfiguration With Synchrophasor Data
$ python main.py
anonymous
Climate change impacts on wind energy: A review
$ python main.py
anonymous
Comparison of Photovoltaic Array Maximum Power Point Tracking Techniques
$ python main.py
anonymous
PerCom 2008 advertisement
$ python main.py
sayonsom
Defining and Enabling Resiliency of Electric Distribution Systems With Multiple Microgrids
$ python main.py
anonymous
False data injection attacks against state estimation in electric power grids
$ python main.py
anonymous
VSC-Based HVDC Power Transmission Systems: An Overview
$ python main.py
sayonsom
A Novel Metric to Quantify and Enable Resilient Distribution System Using Graph Theory and Choquet Integral
$ python main.py
anonymous
We construct orthonormal bases of compactly supported wavelets, with arbitrarily high regularity. The order of regularity increases linearly with the support width. We start by reviewing the concept of multiresolution analysis as well as several algorithms in vision decomposition and reconstruction. The construction then follows from a synthesis of these different approaches.
$ python main.py
anonymous
An algorithm for the machine calculation of complex Fourier series
$ python main.py
anonymous
Smoothing and Differentiation of Data by Simplified Least Squares Procedures.
$ python main.py
anonymous
The classical filtering and prediction problem is re-examined using the Bode-Shannon representation of random processes and the “state-transition” method of analysis of dynamic systems. New results are: (1) The formulation and methods of solution of the problem apply without modification to stationary and nonstationary statistics and to growing-memory and infinite-memory filters. (2) A nonlinear difference (or differential) equation is derived for the covariance matrix of the optimal estimation error. From the solution of this equation the coefficients of the difference (or differential) equation of the optimal linear filter are obtained without further calculations. (3) The filtering problem is shown to be the dual of the noise-free regulator problem. The new method developed here is applied to two well-known problems, confirming and extending earlier results. The discussion is largely self-contained and proceeds from first principles; basic concepts of the theory of random processes are reviewed in the Appendix.
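In modern notation the recursion is the Kalman filter's predict/update pair; a minimal NumPy sketch for a linear-Gaussian model x' = Fx + w, z = Hx + v (the covariance recursion below is the nonlinear difference equation mentioned in the abstract):

import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    # Predict through the state-transition model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measurement z.
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # optimal gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P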
$ python main.py
anonymous
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
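A minimal NumPy sketch of one layer with the mean aggregator, in a common simplified variant (separate self and neighbor projections rather than the paper's concatenation); because the layer is a function of features rather than a per-node embedding table, it applies unchanged to nodes unseen during training:

import numpy as np

def sage_mean_layer(H, neighbors, W_self, W_neigh):
    # H: (n, d) node features; neighbors[v]: list of sampled neighbor ids.
    agg = np.stack([H[nb].mean(axis=0) if len(nb) else np.zeros(H.shape[1])
                    for nb in neighbors])
    out = np.maximum(H @ W_self + agg @ W_neigh, 0.0)   # ReLU
    norm = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norm, 1e-12)                # l2-normalize rows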
$ python main.py
anonymous
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
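The propagation rule with the renormalization trick is a one-liner per layer. A minimal dense NumPy sketch (real implementations use sparse matrices to obtain the linear scaling in edges):

import numpy as np

def gcn_layer(A, H, W):
    # H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W )
    A_hat = A + np.eye(len(A))               # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)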
$ python main.py
anonymous
Many networks of interest in the sciences, including social networks, computer networks, and metabolic and regulatory networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure is one of the outstanding issues in the study of networked systems. One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network. Here I show that the modularity can be expressed in terms of the eigenvectors of a characteristic matrix for the network, which I call the modularity matrix, and that this expression leads to a spectral algorithm for community detection that returns results of demonstrably higher quality than competing methods in shorter running times. I illustrate the method with applications to several published network data sets.
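A minimal NumPy sketch of the resulting spectral bisection: form the modularity matrix, take its leading eigenvector, and split vertices by sign (dense linear algebra here, whereas the paper exploits the structure of B for speed):

import numpy as np

def spectral_bisection(A):
    # A: symmetric adjacency matrix. B = A - k k^T / 2m is the modularity
    # matrix; the signs of its leading eigenvector define the two communities.
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)           # ascending eigenvalues
    return vecs[:, -1] >= 0                  # boolean community labels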
$ python main.py
anonymous
Normalized cuts and image segmentation
$ python main.py
anonymous
The problem discussed in this paper was formulated by T. Harris as follows: “Consider a rail network connecting two cities by way of a number of intermediate cities, where each link of the network has a number assigned to it representing its capacity. Assuming a steady state condition, find a maximal flow from one given city to the other.”
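One standard way to compute such a maximal flow is to augment along shortest paths in the residual network (the Edmonds-Karp refinement of the Ford-Fulkerson idea). The sketch below assumes a small capacity matrix standing in for the rail network; the example numbers are mine.

from collections import deque

def max_flow(cap, s, t):
    """Augmenting-path max flow with BFS on a dense capacity matrix."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total                       # no augmenting path: flow is maximal
        # Bottleneck residual capacity along the path, then augment.
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Two cities (0 = source, 4 = sink) linked through intermediate nodes.
cap = [[0, 3, 2, 0, 0], [0, 0, 1, 3, 0], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 0]]
print(max_flow(cap, 0, 4))  # 4, matching the capacity of the cut into the sink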
$ python main.py
anonymous
A note on two problems in connexion with graphs
$ python main.py
anonymous
Causal inference in statistics: An overview
$ python main.py
anonymous
Estimating causal effects of treatments in randomized and nonrandomized studies.
$ python main.py
anonymous
The analysis of censored failure times is considered. It is assumed that on each individual are available values of one or more explanatory variables. The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.
$ python main.py
anonymous
Bootstrap Methods: Another Look at the Jackknife
$ python main.py
anonymous
We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
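The quantities are indeed trivial to compute from MCMC output. A minimal sketch, assuming posterior draws and a user-supplied deviance function for a toy normal-mean model; the "posterior" here is an analytic stand-in for a real MCMC run, and all names are illustrative.

import numpy as np

def dic(theta_samples, deviance):
    """DIC from posterior draws. deviance(theta) = -2 * log-likelihood."""
    D = np.array([deviance(t) for t in theta_samples])
    D_bar = D.mean()                               # posterior mean deviance (fit)
    D_hat = deviance(theta_samples.mean(axis=0))   # deviance at the posterior mean
    p_D = D_bar - D_hat                            # effective number of parameters
    return D_bar + p_D, p_D

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, 50)                       # data, known unit variance
post = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=(2000, 1))
dev = lambda mu: float(np.sum((y - mu) ** 2))      # -2 log L up to a constant
print(dic(post, dev))                              # p_D should be close to 1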
$ python main.py
anonymous
Sampling-Based Approaches to Calculating Marginal Densities
$ python main.py
anonymous
We study convex optimization problems for which the data is not specified exactly and it is only known to belong to a given uncertainty set U, yet the constraints must hold for all possible values of the data from U. The ensuing optimization problem is called robust optimization. In this paper we lay the foundation of robust convex optimization. In the main part of the paper we show that if U is an ellipsoidal uncertainty set, then for some of the most important generic convex optimization problems (linear programming, quadratically constrained programming, semidefinite programming and others) the corresponding robust convex program is either exactly, or approximately, a tractable problem which lends itself to efficient algorithms such as polynomial time interior point methods.
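To make the ellipsoidal case concrete: for a single linear constraint a^T x <= b with a ranging over the ellipsoid {a_bar + P u : ||u||_2 <= 1}, the robust counterpart is the second-order cone constraint a_bar^T x + ||P^T x||_2 <= b. A small sketch using cvxpy; the problem data below are made up for illustration.

import cvxpy as cp
import numpy as np

n = 4
c = -np.ones(n)                        # maximize sum(x) via minimizing -sum(x)
a_bar = np.array([1.0, 2.0, 1.0, 0.5]) # nominal constraint coefficients
P = 0.2 * np.eye(n)                    # shape of the ellipsoidal uncertainty
b = 10.0

x = cp.Variable(n, nonneg=True)
# Robust counterpart of a^T x <= b for all a in the ellipsoid: an SOC constraint.
constraints = [a_bar @ x + cp.norm(P.T @ x, 2) <= b]
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(x.value.round(3), round(prob.value, 3))

The reformulated problem is a second-order cone program, which is exactly the kind of tractable robust counterpart (solvable by interior point methods) the abstract refers to.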
$ python main.py
anonymous
State-space solutions to standard H2 and H∞ control problems
$ python main.py
anonymous
Identification and control of dynamical systems using neural networks
$ python main.py
anonymous
An adaptive controller which provides Lyapunov stability
$ python main.py
anonymous
Coordination of groups of mobile autonomous agents using nearest neighbor rules
$ python main.py
anonymous
Consensus Problems in Networks of Agents With Switching Topology and Time-Delays
$ python main.py
anonymous
Automatic tuning of simple regulators with specifications on phase and amplitude margins
$ python main.py
anonymous
In this paper, the three principal control effects found in present controllers are examined and practical names and units of measurement are proposed for each effect. Corresponding units are proposed for a classification of industrial processes in terms of the two principal characteristics affecting their controllability. Formulas are given which enable the controller settings to be determined from the experimental or calculated values of the lag and unit reaction rate of the process to be controlled. These units form the basis of a quick method for adjusting a controller on the job. The effect of varying each controller setting is shown in a series of chart records. It is believed that the conceptions of control presented in this paper will be of assistance in the adjustment of existing controller applications and in the design of new installations.
$ python main.py
anonymous
Model predictive control: Theory and practice—A survey
$ python main.py
anonymous
Constrained model predictive control: Stability and optimality
$ python main.py
anonymous
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
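The reparameterization at the heart of the estimator is a one-liner: draw eps ~ N(0, I) and set z = mu + sigma * eps, so the randomness no longer depends on the variational parameters and gradients can flow through mu and log_var. A numpy sketch with the analytic KL term of the lower bound; the values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I): the stochasticity is moved
    into eps, making the sample differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Analytic KL(q(z|x) || N(0, I)) term of the variational lower bound."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu, log_var = np.array([0.5, -1.0]), np.array([0.0, -2.0])
print(reparameterize(mu, log_var), kl_to_standard_normal(mu, log_var))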
$ python main.py
anonymous
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
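A minimal sketch of the two-player training loop in PyTorch, fitting a one-dimensional Gaussian; the architectures, learning rates, target distribution, and the non-saturating generator loss are standard illustrative choices, not the paper's exact experimental setup.

import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 2.0 + 0.5 * torch.randn(64, 1)          # data distribution N(2, 0.5^2)
    fake = G(torch.randn(64, 1))                    # generator samples from noise
    # D ascends its objective: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # G is trained to make D call its samples real (non-saturating loss).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

with torch.no_grad():
    samples = G(torch.randn(5000, 1))
print(float(samples.mean()), float(samples.std()))  # should drift toward 2.0 and 0.5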
$ python main.py
anonymous
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this monograph discusses recent advances and presents an extensive collection of open problems and challenges.
$ python main.py
anonymous
Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
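The server-side averaging step is a weighted sum of client parameters. A sketch assuming each client ships a list of numpy arrays together with its local dataset size; the shapes and client counts are hypothetical.

import numpy as np

def fed_avg(client_weights, client_sizes):
    """Server step of Federated Averaging: weight each client's parameters
    by its share of the total number of training examples."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

rng = np.random.default_rng(0)
# Three hypothetical clients, each holding a two-layer linear model.
clients = [[rng.normal(size=(3, 2)), rng.normal(size=2)] for _ in range(3)]
sizes = [100, 300, 600]                            # unbalanced local datasets
global_model = fed_avg(clients, sizes)
print([p.shape for p in global_model])

In the full algorithm this average is taken over locally updated models each communication round, which is where the reported 10-100x reduction in rounds relative to synchronized SGD comes from.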
$ python main.py
anonymous
Human-level control through deep reinforcement learning
$ python main.py
anonymous
Q-learning
$ python main.py
anonymous
Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
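The eigenvalue formulation in a few lines of numpy: center the data, eigendecompose the sample covariance, and project onto the top eigenvectors. The synthetic two-factor dataset is only for illustration.

import numpy as np

def pca(X, n_components=2):
    """PCA via the eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                     # center each variable
    C = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    components = vecs[:, order]                 # directions of maximal variance
    explained = vals[order] / vals.sum()
    return Xc @ components, explained

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))              # two underlying factors
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
scores, explained = pca(X, 2)
print(scores.shape, explained.round(3))         # two components carry most variance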
$ python main.py
anonymous
Least squares quantization in PCM
$ python main.py
anonymous
Support-vector networks
$ python main.py
anonymous
Random Forests
$ python main.py
anonymous
SUNDIALS is a suite of advanced computational codes for solving large-scale problems that can be modeled as a system of nonlinear algebraic equations, or as initial-value problems in ordinary differential or differential-algebraic equations. The basic versions of these codes are called KINSOL, CVODE, and IDA, respectively. The codes are written in ANSI standard C and are suitable for either serial or parallel machine environments. Common and notable features of these codes include inexact Newton-Krylov methods for solving large-scale nonlinear systems; linear multistep methods for time-dependent problems; a highly modular structure to allow incorporation of different preconditioning and/or linear solver methods; and clear interfaces allowing for users to provide their own data structures underneath the solvers. We describe the current capabilities of the codes, along with some of the algorithms and heuristics used to achieve efficiency and robustness. We also describe how the codes stem from previous and widely used Fortran 77 solvers, and how the codes have been augmented with forward and adjoint methods for carrying out first-order sensitivity analysis with respect to model parameters or initial conditions.
$ python main.py
anonymous
A family of embedded Runge-Kutta formulae
$ python main.py
anonymous
An analysis of the academic literature on simulation and modelling in health care
$ python main.py
anonymous
Fifty years ago, the author published a paper in Operations Research with the title, “A proof for the queuing formula: L = λW” [Little, J. D. C. 1961. A proof for the queuing formula: L = λW. Oper. Res. 9(3) 383–387]. Over the years, L = λW has become widely known as “Little's Law.” Basically, it is a theorem in queuing theory. It has become well known because of its theoretical and practical importance. We report key developments in both areas with the emphasis on practice. In the latter, we collect new material and search for insights on the use of Little's Law within the fields of operations management and computer architecture.
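The law itself is a one-line, distribution-free identity, L = λW. A hypothetical worked example (the numbers are mine):

# Little's Law: L = lambda * W.
arrival_rate = 30.0            # lambda: customers arriving per hour
time_in_system = 10.0 / 60.0   # W: average 10 minutes in the system, in hours
L = arrival_rate * time_in_system
print(L)                       # 5 customers in the system on average

# Because no distributional assumptions are needed, any one quantity
# follows from the other two:
W = L / arrival_rate
print(W * 60.0)                # back out the 10-minute average time in system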
$ python main.py
anonymous
Tutorial on agent-based modelling and simulation
$ python main.py
anonymous
Agent-based modeling is a powerful simulation modeling technique that has seen a number of applications in the last few years, including applications to real-world business problems. After the basic principles of agent-based simulation are briefly introduced, its four areas of application are discussed by using real-world applications: flow simulation, organizational simulation, market simulation, and diffusion simulation. For each category, one or several business applications are described and analyzed.
$ python main.py
anonymous
An Introduction to MCMC for Machine Learning
$ python main.py
anonymous
Monte Carlo sampling methods using Markov chains and their applications
$ python main.py
anonymous
Particle swarm optimization
$ python main.py
anonymous
There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
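A minimal Metropolis-style annealing sketch: propose a random move, always accept improvements, accept uphill moves with probability exp(-ΔE/T), and lower T geometrically. The multimodal test function, gains, and cooling schedule are arbitrary illustrative choices.

import numpy as np

def simulated_annealing(energy, x0, step=0.5, t0=1.0, cooling=0.999, iters=20000):
    """Annealing with a Metropolis acceptance rule and geometric cooling."""
    rng = np.random.default_rng(0)
    x, e, t = x0, energy(x0), t0
    best_x, best_e = x, e
    for _ in range(iters):
        cand = x + rng.normal(0.0, step, size=np.shape(x))
        de = energy(cand) - e
        # Accept downhill moves always; uphill moves with prob exp(-dE/T).
        if de < 0 or rng.random() < np.exp(-de / t):
            x, e = cand, e + de
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                           # the annealing schedule
    return best_x, best_e

# Rastrigin-like function: many local minima, global minimum at the origin.
f = lambda x: float(np.sum(x**2 + 10.0 * (1.0 - np.cos(2.0 * np.pi * x))))
print(simulated_annealing(f, np.array([4.0, -3.0])))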
$ python main.py
anonymous
On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming
$ python main.py
anonymous
A Limited Memory Algorithm for Bound Constrained Optimization
$ python main.py
anonymous
Semidefinite Programming
$ python main.py
anonymous
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
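For the lasso, the three ADMM updates are a ridge-type solve, a soft-threshold, and a dual update. A sketch on synthetic sparse-regression data; the penalty rho and iteration count are not specified in the abstract and are chosen arbitrarily here.

import numpy as np

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    """ADMM for minimize 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached factor for the x-update
    Atb = A.T @ b
    soft = lambda v, k: np.sign(v) * np.maximum(np.abs(v) - k, 0.0)
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))        # x-update: quadratic minimization
        z = soft(x + u, lam / rho)           # z-update: soft-thresholding
        u = u + x - z                        # dual ascent on the consensus constraint
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20); x_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=50)
# Should recover the planted support [2, 7, 11].
print(np.nonzero(np.abs(lasso_admm(A, b, lam=1.0)) > 1e-3)[0])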
$ python main.py
anonymous
A Computationally Efficient Mixed-Integer Linear Formulation for the Thermal Unit Commitment Problem
$ python main.py
anonymous
Generalized Benders decomposition
$ python main.py
anonymous
MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education
$ python main.py
anonymous
A new polynomial-time algorithm for linear programming
$ python main.py
sayonsom
The distinctive feature of the adaptive immune system is its ability to generate immunological memory that can provide defense against subsequent infections. In the case of antibody-mediated immune responses, this memory comes in two cellular forms: plasma cells (PCs) and memory B cells (MBCs). PCs protect against reinfection by constitutively producing antibodies. The presence of a diverse pool of MBCs, which can expand and differentiate into PCs in secondary immune responses, is thought to be particularly important for defense against new pathogen variants. Recent studies have shown that the MBC compartment is far more heterogeneous than previously anticipated. This heterogeneity, among other factors, is shaped by their developmental pathway (germinal center (GC) vs non-GC-derived MBCs), the duration and strength of antigenic stimulation, anatomical and microanatomical localization, and the timing of generation in ontogeny. Combinations of these “layers” of MBC identities can define MBCs’ properties and their fate in recall responses. Here, we review the mechanisms underlying MBC differentiation, maintenance, and reactivation and explore how the layered identity of MBCs contributes to the functions of these cells.
$ python main.py
cross-domain-seed
Statistical downscaling of GCM outputs to regional (10 km) daily precipitation and temperature grids using a transformer-based bias-correction scheme trained on CMIP6 ensembles. Produces uncertainty-quantified local climate projections.
$ python main.py
cross-domain-seed
Computable general equilibrium (CGE) model of labor markets with heterogeneous skill types, frictional search, and exogenous productivity shocks. Solves for wages, unemployment rates, and cross-sector flows under counterfactual policy shocks.
$ python main.py
cross-domain-seed
Stochastic SIR epidemic simulator on a heterogeneous contact network, supporting targeted intervention policies (lockdown, vaccination, contact tracing) with variable compliance rates.
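A sketch of a discrete-time version under simple assumptions: each infected node transmits along each S-I edge with probability beta per step and recovers with probability gamma per step. The ring-plus-shortcuts contact network and all parameter values are placeholders; interventions such as vaccination could be modeled by moving nodes to R before seeding, and lockdown by deleting edges.

import numpy as np

def stochastic_sir(adj, beta=0.1, gamma=0.05, seeds=(0,), steps=200, rng=None):
    """Discrete-time stochastic SIR on a contact network (adjacency dict)."""
    rng = rng or np.random.default_rng(0)
    n = len(adj)
    state = np.zeros(n, dtype=int)             # 0 = S, 1 = I, 2 = R
    state[list(seeds)] = 1
    history = []
    for _ in range(steps):
        infected = np.where(state == 1)[0]
        if len(infected) == 0:
            break                              # epidemic has died out
        for i in infected:
            for j in adj[i]:                   # transmission along S-I edges
                if state[j] == 0 and rng.random() < beta:
                    state[j] = 1
            if rng.random() < gamma:           # recovery
                state[i] = 2
        history.append(int((state == 1).sum()))
    return history, int((state == 2).sum())

# Heterogeneous contacts: a ring lattice plus random shortcut edges.
rng = np.random.default_rng(1)
n = 300
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
for _ in range(100):
    a, b = rng.integers(0, n, size=2)
    if a != b:
        adj[int(a)].add(int(b)); adj[int(b)].add(int(a))
print(stochastic_sir(adj)[1], "nodes eventually recovered")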
$ python main.py
anonymous
Review of Small-Signal Converter-Driven Stability Issues in Power Systems
$ python main.py
anonymous
Coordinated Planning of Electric Vehicle Charging Infrastructure and Renewables in Power Grids
$ python main.py
anonymous
Reconfigurable Real-Time Power Grid Emulator for Systems With High Penetration of Renewables
$ python main.py
anonymous
On Modeling Depths of Power Electronic Circuits for Real-Time Simulation – A Comparative Analysis for Power Systems
$ python main.py
anonymous
Optimal Energy Dispatch of Distributed PVs for the Next Generation of Distribution Management Systems
$ python main.py
anonymous
100% Sustainable Electricity in the Faroe Islands: Expansion Planning Through Economic Optimization
$ python main.py
anonymous
Proposes an adaptive droop control method to suppress circulating currents among parallel DC-DC converters in DC microgrids. The method dynamically adjusts droop resistance based on measured circulating currents to achieve proportional load sharing and reduce bus voltage deviation.
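To make the idea concrete, here is an illustrative adaptive loop (not the paper's exact control law) for two parallel converters feeding a common bus through unequal cable resistances r: each unit follows V_out = V0 - Rd*I, and Rd is nudged against the unit's deviation from the mean current until the current mismatch (the source of circulating current) vanishes. All parameter values are invented.

import numpy as np

V0, I_load = 48.0, 20.0                        # nominal bus voltage, load current
r = np.array([0.10, 0.30])                     # unequal cable resistances (ohm)
Rd = np.array([0.50, 0.50])                    # initial, equal droop resistances

def bus_solution(Rd):
    """Steady-state bus voltage and branch currents for V_out = V0 - Rd*I
    with cable resistance r between each converter and the common bus."""
    g = 1.0 / (r + Rd)                         # effective branch conductances
    Vb = V0 - I_load / g.sum()
    return Vb, (V0 - Vb) * g

Vb, I = bus_solution(Rd)
print("fixed droop:   I =", I.round(2), " Vb =", round(Vb, 2))

# Adaptive loop: raise Rd on the over-loaded unit, lower it on the
# under-loaded one, so the currents equalize for identical ratings.
for _ in range(200):
    Vb, I = bus_solution(Rd)
    Rd = np.clip(Rd + 0.05 * (I - I.mean()), 0.05, 2.0)
print("adaptive droop: I =", I.round(2), " Vb =", round(Vb, 2))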
$ python main.py
anonymous
This paper presents a systematic review of the evolution of research in microgrids control, covering publications from 2000 to 2019. The review analyzes trends in publication counts, research themes, control methodologies, and collaboration patterns to map the landscape of microgrid control research and identify emerging directions.
$ python main.py
anonymous
This paper presents a paradigm shift in power systems planning and operation by integrating wildfire risk into operational decision-making. A composite wildfire risk index is defined for transmission lines based on vegetation, weather, and topographic factors. A risk-constrained optimal power flow formulation is proposed that allows selective de-energization of high-risk lines to reduce wildfire ignition probability while minimizing load curtailment and operational cost.
$ python main.py
anonymous
This paper proposes a fast frequency support (FFS) control strategy for wind turbine systems to arrest the frequency nadir close to the settling frequency following a generation-load imbalance. The strategy coordinates derivative and proportional power injection from wind turbines to minimize the frequency deviation between nadir and settling point, thereby improving frequency stability.
$ python main.py
anonymous
This paper investigates federated learning for short-term residential load forecasting, addressing privacy concerns in smart grid data. Local LSTM models are trained on individual household data and aggregated using FedAvg, demonstrating competitive accuracy compared to centralised training while preserving data privacy.
$ python main.py
anonymous
This paper presents a methodology for building highly detailed synthetic electric grid datasets that combine transmission and distribution systems. The approach starts with a synthetic transmission system and attaches synthetic distribution feeders at each load bus, scaling loads to match transmission-level demand. The resulting datasets preserve statistical properties of real grids while containing no confidential information, enabling open sharing for research and education.
$ python main.py
anonymous
This paper presents a comprehensive catalogue of test distribution systems including network parameters and diagrams. The paper provides standardized data for radial and meshed distribution test feeders commonly used in power systems research, enabling reproducible benchmarking of distribution system analysis and optimization algorithms.
$ python main.py
anonymous
This paper proposes a unified definition of energy quality that encompasses voltage quality, frequency quality, and waveform quality into a composite Energy Quality Index (EQI). The index quantifies the degree to which delivered electrical energy meets ideal standards, enabling comparison across systems and time periods.
$ python main.py
anonymous
This paper presents model-based and data-driven HVAC control strategies for residential demand response. A first-order RC thermal model captures building thermal dynamics. Model predictive control (MPC) optimizes HVAC scheduling to minimize energy cost while satisfying thermal comfort constraints. A data-driven rule-based strategy provides a computationally lighter alternative. Both strategies are evaluated under time-of-use pricing and direct load control scenarios.
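A sketch of the model-based piece under stated assumptions: a first-order RC discretization T[k+1] = T[k] + dt/C * ((T_out - T[k])/R + Q), and a brute-force MPC that enumerates on/off plans over a short horizon. The building parameters and time-of-use prices are invented, and exhaustive enumeration stands in for a proper solver.

import itertools

R, C, dt = 2.0, 5.0, 1.0          # K/kW, kWh/K, hours (hypothetical building)
Q_cool = -5.0                     # kW of cooling when the HVAC runs

def step(T, T_out, on):
    """One step of the first-order RC thermal model."""
    return T + dt / C * ((T_out - T) / R + (Q_cool if on else 0.0))

def mpc(T, T_out, price, horizon=6, comfort=(21.0, 25.0)):
    """Brute-force MPC: enumerate on/off plans over the horizon and keep the
    cheapest one whose temperature trajectory stays in the comfort band."""
    best, best_cost = None, float("inf")
    for plan in itertools.product((0, 1), repeat=horizon):
        t, cost, ok = T, 0.0, True
        for k, on in enumerate(plan):
            t = step(t, T_out[k], on)
            cost += price[k] * abs(Q_cool) * dt * on
            if not (comfort[0] <= t <= comfort[1]):
                ok = False
                break
        if ok and cost < best_cost:
            best, best_cost = plan, cost
    return best, best_cost

T_out = [32.0] * 6                               # hot afternoon
price = [0.10, 0.10, 0.40, 0.40, 0.10, 0.10]     # time-of-use tariff, $/kWh
print(mpc(24.0, T_out, price))                   # cheapest comfort-feasible plan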
$ python main.py
anonymous
This paper proposes a distributed secondary control scheme for islanded microgrids using a generalized PI finite-time controller. The controller guarantees finite-time convergence for both frequency and voltage restoration, as well as proportional active power sharing among distributed generators. The communication topology is modeled as a directed graph and the consensus protocol is designed based on Lyapunov-based finite-time stability theory.
$ python main.py
anonymous
This paper presents a comprehensive review of space microgrids for future manned lunar bases. It covers power generation technologies including solar photovoltaic arrays, nuclear power (RTG and fission), and fuel cells. Energy storage options including batteries and regenerative fuel cells are discussed. The paper reviews microgrid architectures (DC, AC, and hybrid), power management and control strategies, and reliability considerations specific to the lunar environment. Key challenges such as the 14-day lunar night, radiation, thermal extremes, and dust contamination are addressed.
$ python main.py
anonymous
This paper argues that the natural language of electricity markets is complementarity, not optimization. While market clearing is often formulated as a social welfare maximization problem, the underlying equilibrium conditions are complementarity conditions derived from KKT optimality. The paper presents both energy-only and network-constrained market models, derives their KKT/complementarity conditions, and solves them as linear complementarity problems (LCPs) to find market equilibria including locational marginal prices.
$ python main.py
anonymous
This paper presents a comprehensive statistical analysis of faults occurring on 220-kV and above transmission lines in a southern coastal provincial power grid of China over the period 2009 to 2018. The study analyzes fault frequency rates, fault types, fault causes, reclosing success rates, seasonal distributions, and voltage-level characteristics to identify key risk factors and guide grid maintenance strategies.
$ python main.py
anonymous
This paper presents a system-level design framework for reliability and maintenance scheduling in modern power electronic-based power systems. Mission-profile-based stress analysis is combined with component lifetime models to obtain failure rates. Markov chain models compute system reliability indices. Maintenance scheduling optimization minimizes lifecycle cost subject to availability constraints.
$ python main.py
anonymous
This paper presents the results of a day-ahead electricity demand forecasting competition that was motivated by the unprecedented changes in electricity consumption patterns caused by the COVID-19 pandemic. Participants were challenged to forecast 24-hour-ahead electricity demand for a large North American utility using data spanning the COVID-19 period. The paper describes the competition design, the evaluation framework, and the methods employed by top-performing teams, highlighting that ensemble and machine-learning approaches outperformed classical statistical baselines.
$ python main.py
anonymous
Optimal dispatch of a battery energy storage system participating in energy and frequency regulation markets while peak shaving an EV fast charging station load using mixed-integer linear programming.
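As a simplified sketch of the optimization, here is an LP for the energy-arbitrage piece only (regulation revenues, peak shaving, and integer variables omitted), using scipy's linprog with made-up prices and battery parameters.

import numpy as np
from scipy.optimize import linprog

T = 6
price = np.array([0.05, 0.04, 0.10, 0.30, 0.35, 0.08])  # $/kWh, hypothetical
p_max, e_max, e0, eta = 2.0, 4.0, 2.0, 0.95             # kW, kWh, initial kWh, eff.

# Decision variables x = [charge(0..T-1), discharge(0..T-1), soc(0..T-1)].
n = 3 * T
cost = np.concatenate([price, -price, np.zeros(T)])     # minimize = -revenue

# SoC recursion: soc[t] - soc[t-1] - eta*charge[t] + discharge[t]/eta = 0.
A_eq, b_eq = np.zeros((T, n)), np.zeros(T)
for t in range(T):
    A_eq[t, t] = -eta                 # charge adds energy (with losses)
    A_eq[t, T + t] = 1.0 / eta        # discharge drains more than it delivers
    A_eq[t, 2 * T + t] = 1.0
    if t == 0:
        b_eq[t] = e0                  # soc[-1] is the initial energy
    else:
        A_eq[t, 2 * T + t - 1] = -1.0

bounds = [(0, p_max)] * T + [(0, p_max)] * T + [(0, e_max)] * T
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
charge, discharge = res.x[:T], res.x[T:2 * T]
print("profit: $", round(-res.fun, 3))
print("charge:   ", charge.round(2))              # buys in the cheap early hours
print("discharge:", discharge.round(2))           # sells into the price peak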
$ python main.py
anonymous
This paper presents an optimal scheduling framework for merchant-owned energy storage systems (ESS) participating in multiple ancillary service markets alongside energy arbitrage. The proposed mixed-integer linear program maximises expected daily profit by co-optimising bids for energy, spinning reserve, regulation up, and regulation down over a 24-hour horizon while respecting physical ESS constraints including state-of-charge limits, power limits, round-trip efficiency, and battery degradation costs.
$ python main.py
anonymous
This paper presents a software-defined microgrid (SDM) control framework that decouples the cyber and physical layers of microgrids. By abstracting physical resources into software-defined virtual resources, the SDM enables flexible, programmable, and resilient microgrid control. The proposed architecture supports decoupled cyber-physical operation and demonstrates improved power sharing, frequency restoration, and voltage regulation compared to conventional droop control.
$ python main.py
sayonsom
Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a prompt.
$ python main.py
sayonsom
We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality.
$ python setup.py