Artificial Intelligence

Performance research papers

Featured research

A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Large language models (LLMs) have improved through larger models and data, but gains are slowing. Recent methods scale computation at inference using reward models, treating it as a search problem prone to reward hacking. This paper reframes it as probabilistic inference, using sampling to explore state distributions in a state-space model with approximate likelihood. It introduces a new approach adapting particle-based Monte Carlo methods, achieving 4-16x better scaling than deterministic search on math reasoning tasks. Qwen2.5-Math-1.5B-Instruct beats GPT-4o in 4 rollouts, and Qwen2.5-Math-7B-Instruct reaches o1 accuracy in 32 rollouts. This links probabilistic inference to LLM scaling.
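
To make the particle-based idea concrete, here is a minimal, hypothetical sketch of a particle filter over partial generations. The callables `extend` (samples one more reasoning step from the LLM) and `score_step` (an approximate likelihood from a process reward model) are placeholders, not the paper's actual API.

```python
import random

def particle_filter_decode(prompt, extend, score_step, n_particles=8, n_steps=6):
    """Particle-filtering sketch for inference-time scaling.

    `extend(state)` samples one more reasoning step from the LLM and
    `score_step(state)` returns an approximate likelihood from a reward
    model; both are hypothetical placeholders, not the paper's API.
    """
    particles = [prompt] * n_particles
    for _ in range(n_steps):
        # Propagate: sample the next reasoning step for every particle.
        particles = [extend(p) for p in particles]
        # Weight: score each partial solution with the reward model.
        weights = [score_step(p) for p in particles]
        total = sum(weights) or 1.0
        # Resample: keep promising particles, drop weak ones.
        particles = random.choices(particles,
                                   weights=[w / total for w in weights],
                                   k=n_particles)
    return max(particles, key=score_step)
```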


Github 


All research papers


"Give Me BF16 or Give Me Death?" Accuracy-Performance Trade-Offs in LLM Quantization

Despite the popularity of large language model (LLM) quantization for inference acceleration, significant uncertainty remains regarding the accuracy-performance trade-offs associated with various quantization formats. We present a comprehensive empirical study of quantized accuracy, evaluating popular quantization formats (FP8, INT8, INT4) across academic benchmarks and real-world tasks, on the entire Llama-3.1 model family. Additionally, our study examines the difference in text generated by quantized models versus their uncompressed counterparts. Beyond benchmarks, we also present a couple of quantization improvements which allowed us to obtain state-of-the-art accuracy recovery results.
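
As a rough illustration of the kind of weight format under study, the sketch below applies symmetric absmax INT8 quantization to a weight tensor. Real FP8/INT8/INT4 deployments use per-channel or per-group scales, calibration data, and activation/KV-cache quantization, none of which are shown here.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization with absmax scaling.

    Simplified illustration only; production formats use finer-grained
    scales and calibration.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```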

On the Complexity of Neural Computation in Superposition

This paper explores the theoretical foundations of computing in superposition within neural networks, focusing on explicit, provably correct algorithms and their efficiency. Our results demonstrate that for a broad class of problems, including permutations and pairwise logical operations, a neural network computing in superposition requires a significant number of parameters and neurons. We establish that any sparse sub-network must have a considerable number of parameters, irrespective of the original dense network size. We present an upper bound showing that pairwise logical operations, such as AND, can be computed with a relatively small number of neurons and parameters.

Sparse Finetuning for Inference Acceleration of Large Language Models

We consider the problem of accurate sparse finetuning of large language models (LLMs), that is, finetuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based finetuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determining an L2-based distillation approach we term SquareHead which enables accurate recovery even at higher sparsities, across all model types.
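
A minimal PyTorch sketch of an L2-based intermediate-representation distillation loss in the spirit of SquareHead is shown below; the per-layer normalization and the choice of layers are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

def squarehead_style_loss(student_feats, teacher_feats, eps=1e-6):
    """Per-layer MSE between student and teacher hidden states,
    normalized by the teacher's magnitude (a sketch, not the exact loss)."""
    loss = 0.0
    for hs, ht in zip(student_feats, teacher_feats):
        loss = loss + torch.mean((hs - ht) ** 2) / (torch.mean(ht ** 2) + eps)
    return loss / len(student_feats)

# Example with dummy hidden states from a 4-layer model.
teacher = [torch.randn(2, 16, 64) for _ in range(4)]
student = [t + 0.1 * torch.randn_like(t) for t in teacher]
print(squarehead_style_loss(student, teacher))
```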

Github

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. When executing SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, we can reach 60% sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches.
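
For intuition, the sketch below performs one-shot unstructured and 2:4 semi-structured pruning with a simple magnitude criterion; SparseGPT itself selects and updates weights using approximate second-order information, which is not reproduced here.

```python
import numpy as np

def prune_unstructured(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """One-shot unstructured pruning: zero out the smallest-magnitude weights."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Semi-structured 2:4 pruning: keep the 2 largest of every 4 weights
    (assumes the weight matrix size is a multiple of 4)."""
    groups = w.copy().reshape(-1, 4)
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]  # two smallest per group
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.random.randn(64, 64)
print((prune_unstructured(w) == 0).mean(), (prune_2_4(w) == 0).mean())  # ~0.5, 0.5
```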

Github

Sparse Expansion and Neuronal Disentanglement

We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach Sparse Expansion. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively disentangling the input-output relationship of every individual neuron across clusters of inputs.

Github

Sparse*BERT: Sparse Models are Robust

This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks. Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10% of the parameters.

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

We show how you can compound multiple sparsification techniques to compress transformer-based NLP models for better inference performance. Results: 10x model-size compression with < 1% relative accuracy drop versus dense BERT-base, 10x end-to-end CPU-inference speedup with < 2% relative accuracy drop, and 29x inference speedup with < 7.5% relative accuracy drop.

 

How Well Do Sparse Imagenet Models Transfer?

In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups.

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

We propose two new algorithms as part of a framework called M-FAC. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods.

Github

Asynchronous Decentralized SGD with Quantized and Local Updates

We show that a variant of SGD called SwarmSGD still converges in this setting, even if non-blocking communication, quantization, and local steps are all applied in conjunction, and even if the node data distributions and underlying graph topology are both heterogeneous. We implement this algorithm and deploy it in a super-computing environment, showing that it can outperform previous decentralized methods in terms of end-to-end training time, and that it can even rival carefully-tuned large-batch SGD for certain tasks.

 

AC/DC: Alternating Compressed / DeCompressed Training of Deep Neural Networks

Existing sparse training methods are mainly empirical and often have lower accuracy relative to the dense baseline. We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models.
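
A heavily simplified sketch of the alternating idea follows: training switches between compressed phases, where a top-magnitude mask is enforced, and decompressed phases, where all weights train densely. The phase lengths, warm-up, and final compressed phase of the actual AC/DC schedule are omitted; `data_iter` is assumed to be an iterator over training batches.

```python
import itertools
import torch

def topk_mask(w, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction."""
    k = max(1, int(w.numel() * (1 - sparsity)))
    thresh = torch.topk(w.abs().flatten(), k).values.min()
    return w.abs() >= thresh

def acdc_style_training(model, optimizer, loss_fn, data_iter,
                        sparsity=0.9, steps_per_phase=100, num_phases=4):
    """Alternate dense (decompressed) and masked (compressed) phases."""
    for phase in range(num_phases):
        compressed = phase % 2 == 1                  # start dense, then alternate
        masks = ({n: topk_mask(p, sparsity) for n, p in model.named_parameters()}
                 if compressed else None)
        for x, y in itertools.islice(data_iter, steps_per_phase):
            if compressed:                           # keep pruned weights at zero
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        p.masked_fill_(~masks[n], 0.0)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
```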

Github

Towards Tight Communication Lower Bounds for Distributed Optimization

We consider a standard distributed optimisation setting where N machines, each holding a d-dimensional function f_i, aim to jointly minimise the sum of the functions, ∑_{i=1}^{N} f_i(x). This problem arises naturally in large-scale distributed optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. Our main result provides the first fully unconditional bounds on the total number of bits which need to be sent and received by the N machines to solve this problem under point-to-point communication, within a given error tolerance. Our results bring over tools from communication complexity to distributed optimisation, which has potential for further applications.

Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks

The future of deep learning is sparse! See our overview of the field and upcoming opportunities for how to gain 10-100x performance to fuel the next AI revolution. HPC techniques will be key, as large-scale training is fundamentally a supercomputing problem.

On the Predictability of Pruning Across Scales

We show that the error of magnitude-pruned networks follows a scaling law, and that this law is of a fundamentally different nature than that of unpruned networks.

 

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Learn about WoodFisher, an efficient second-order approximation method for neural network compression.

Github

Relaxed Scheduling for Scalable Belief Propagation

Learn about efficient parallel algorithms for the key machine learning task of inference on graphical models, in particular on the fundamental belief propagation algorithm.

 

Adaptive Gradient Quantization for Data-Parallel SGD

In this paper, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.

Github

Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference

Learn how to gain significant performance by inducing and exploiting activation sparsity for fast neural network inference.

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

Weight quantization in Large Language Models (LLMs) reduces model size and speeds up single-user inference on GPUs with minimal accuracy loss. However, its effectiveness in batched settings with multiple clients was uncertain. This paper introduces MARLIN, Mixed-precision Auto-Regressive LINear kernels, which achieve near-maximum quantization speedups (up to 4×) for batch sizes of 16-32, and significant speedups for batch sizes up to 64-128, using techniques like asynchronous memory access and complex scheduling. Integrated with vLLM, MARLIN offers up to 2.8× end-to-end LLM inference speedups and supports further compression like NVIDIA 2:4 sparsity.

Github

 

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Text-to-image diffusion models, vital for generating high-quality images from text, have grown to billions of parameters, making them less accessible in resource-limited settings. Post-training quantization (PTQ) compresses pretrained weights to lower bits. While uniform scalar quantization achieves decent results at 4 bits, this study explores vector quantization (VQ) for greater compression. Tailoring VQ-based PTQ to billion-scale models like SDXL and SDXL-Turbo, we compress 2B+ parameter models to ~3 bits, maintaining image quality and textual alignment comparable to 4-bit methods.

Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

Mathador-LM is a new benchmark assessing mathematical reasoning in large language models (LLMs) via ruleset interpretation, planning, and problem-solving, based on the Mathador game. Players use arithmetic to hit a target number from base numbers under simple rules. Dynamically generated instances ensure stable difficulty and prevent test-set leakage into training data, a common benchmark flaw. Evaluating top open and closed-source LLMs, we find they underperform compared to 3rd graders, despite excelling on other math benchmarks.
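
The game underlying the benchmark is easy to illustrate: combine a handful of base numbers with +, -, *, / to hit a target. The brute-force helper below (an illustration, not the benchmark's generator or scoring code) finds one such combination.

```python
from itertools import permutations

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a // b if b != 0 and a % b == 0 else None}

def solve(numbers, target, exprs=None):
    """Combine base numbers with +, -, *, / (integers only, no negatives)
    until the target is reached; returns one valid expression or None."""
    if exprs is None:
        exprs = [str(n) for n in numbers]
    for (i, a), (j, b) in permutations(list(enumerate(numbers)), 2):
        for sym, op in OPS.items():
            val = op(a, b)
            if val is None or val < 0:
                continue
            combined = f"({exprs[i]} {sym} {exprs[j]})"
            if val == target:
                return combined
            rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
            rest_exprs = [e for k, e in enumerate(exprs) if k not in (i, j)]
            found = solve(rest + [val], target, rest_exprs + [combined])
            if found:
                return found
    return None

print(solve([2, 3, 5, 7, 11], 37))   # prints one valid expression, e.g. (2 + (5 * 7))
```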

Github

Wasserstein Distances, Neuronal Entanglement, and Sparsity

This study explores disentangling polysemantic neurons in large language models (LLMs) to interpret performance under weight sparsity, a key optimization method. We introduce a new metric based on the Wasserstein distance to gauge neuronal entanglement, comparing a neuron’s output distribution to a Gaussian. We identify "Wasserstein Neurons" with non-Gaussian outputs that strongly affect accuracy and map similar inputs to dissimilar outputs. Our novel framework splits layer inputs, creating a mixture of experts with less entangled neurons that maintain accuracy when sparsified, effectively disentangling complex input-output relationships.
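
A small sketch of the metric's spirit: standardize a neuron's observed outputs and measure their Wasserstein-1 distance to samples from a standard Gaussian. The standardization and the Monte Carlo Gaussian reference are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def entanglement_score(neuron_outputs: np.ndarray, n_ref: int = 100_000,
                       seed: int = 0) -> float:
    """Wasserstein-1 distance between a neuron's standardized output
    distribution and a standard Gaussian (larger = less Gaussian)."""
    z = (neuron_outputs - neuron_outputs.mean()) / (neuron_outputs.std() + 1e-8)
    reference = np.random.default_rng(seed).standard_normal(n_ref)
    return wasserstein_distance(z, reference)

rng = np.random.default_rng(1)
print(entanglement_score(rng.standard_normal(10_000)))        # close to 0
print(entanglement_score(rng.standard_t(df=2, size=10_000)))  # typically much larger
```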

Github

 

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

This paper critiques "extreme" LLM compression (1-2 bits/parameter) using straight-through estimators (STE) in post-training quantization, noting diminishing accuracy returns. Existing methods like QuIP# and AQLM use limited fine-tuning with STE, but its efficacy is unclear. We introduce PV-Tuning, a versatile framework enhancing quantization-aware fine-tuning with convergence guarantees. Outperforming prior techniques, PV-Tuning achieves Pareto-optimal 2-bit quantization for Llama 2 and boosts accuracy in 1-2 bit vector quantization for models like Llama and Mistral.
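
For context, the snippet below shows the standard straight-through-estimator (STE) trick that the paper argues has limits at extreme bit-widths: the forward pass uses quantized weights while gradients bypass the rounding. The uniform grid and clamping range are illustrative choices.

```python
import torch

def ste_quantize(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Uniform quantization with a straight-through estimator: the forward
    pass uses the quantized weights, the backward pass treats rounding as
    the identity so gradients reach the latent full-precision weights."""
    n_levels = 2 ** n_bits
    scale = w.abs().max() / (n_levels / 2) + 1e-12
    q = torch.round(w / scale).clamp(-n_levels // 2, n_levels // 2 - 1) * scale
    return w + (q - w).detach()        # value of q, gradient of w

w = torch.randn(8, requires_grad=True)
ste_quantize(w).pow(2).sum().backward()
print(w.grad)   # gradients flow despite the non-differentiable rounding
```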

Github

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

We present a method to create sparse, accurate versions of large language models (LLMs) like LLaMA-2 7B, achieving full accuracy recovery for fine-tuning at 70% sparsity. Using SparseGPT pruning and sparse pretraining on SlimPajama and The Stack datasets, we accelerate training on Cerebras CS-3 chips and inference up to 3x on CPUs and 1.7x on GPUs (vLLM). Sparsity alone drives these gains, with quantization boosting CPU speedups to 8.6x. Tested across tasks like chat, coding, and reasoning, this approach enables faster, smaller LLMs without accuracy loss.

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

QuaRot, a novel quantization method based on rotations, quantizes large language models (LLMs) end-to-end—including weights, activations, and KV cache—to 4 bits. By rotating the LLM to eliminate outliers in the hidden state (residual), feed-forward activations, attention mechanisms, and KV cache, QuaRot simplifies quantization without altering outputs. Applied to LLaMa2-70B, it achieves a 4-bit model with minimal perplexity loss (0.47 on WikiText-2) and 99% zero-shot performance retention. QuaRot also enables lossless 6- and 8-bit models without calibration. Code is available online.
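
The computational-invariance idea behind rotation-based schemes can be demonstrated in a few lines of NumPy: rotating activations by an orthogonal Q and weights by Q^T leaves the output unchanged while spreading outlier channels out. A random orthogonal matrix stands in here for QuaRot's structured Hadamard rotations, which are fused into the model weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
x[:, 7] *= 50.0                                     # inject an outlier channel
W = rng.standard_normal((64, 64))
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # random orthogonal matrix

# (x @ Q) @ (Q.T @ W) == x @ W, but the rotated activations are easier to quantize.
print(np.allclose(x @ W, (x @ Q) @ (Q.T @ W)))      # True: output unchanged
print(np.abs(x).max(), np.abs(x @ Q).max())         # outlier magnitude shrinks
```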

Github

 

Extreme Compression of Large Language Models via Additive Quantization

This paper revisits "extreme" compression of large language models (LLMs) to 2-3 bits per parameter using Multi-Codebook Quantization (MCQ). Our algorithm, AQLM, enhances Additive Quantization (AQ) with two innovations: learned, input-adaptive quantization of weight matrices and joint codebook optimization across transformer blocks. AQLM achieves Pareto optimality in accuracy vs. size below 3 bits, excelling in 2-bit compression. It also offers practical GPU/CPU implementations for fast token generation, matching or beating FP16 speeds with a smaller memory footprint.

Github

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark

This paper reexamines pruning BERT-family large language models (LLMs) during fine-tuning, addressing challenges in the "Sparsity May Cry" (SMC) benchmark where existing methods struggle. We propose guidelines for effective pruning: analyzing costs vs. benefits of pruning components like embeddings and classification heads, scaling training and sparsity schedules based on target sparsity, and optimizing Knowledge Distillation parameterization. Our approach, using classic gradual magnitude pruning (GMP), achieves state-of-the-art results on both traditional BERT-pruning and SMC benchmarks.
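
Gradual magnitude pruning is typically driven by a sparsity schedule; the helper below implements the commonly used cubic ramp (a generic sketch, not necessarily the exact schedule tuned in the paper).

```python
def gmp_sparsity(step, start_step, end_step, final_sparsity, initial_sparsity=0.0):
    """Cubic sparsity ramp commonly used with gradual magnitude pruning:
    sparsity rises quickly at first, then flattens toward the target."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

for step in range(0, 10_001, 2_500):   # ramp toward 90% sparsity over 10k steps
    print(step, round(gmp_sparsity(step, 0, 10_000, 0.9), 3))
```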

Scaling Laws for Sparsely-Connected Foundation Models

This study investigates how parameter sparsity affects Transformer scaling in vision and language foundation models trained on massive datasets (e.g., ViT/JFT-4B, T5/C4). We establish the first scaling law linking weight sparsity, non-zero parameters, and training data volume, validated across scales. We define "optimal sparsity"—the level yielding peak performance for a given size and budget—finding it rises with more training data. We also explore sparsity structures (e.g., n:m patterns) and strategies (e.g., starting dense), revealing sparsity’s potential and limits for efficiency.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

This paper introduces Sparse-Quantized Representation (SpQR), a novel compression format and quantization technique for large language models (LLMs), enabling near-lossless quantization to 3-4 bits per parameter across scales. By isolating outlier weights for higher precision and compressing others, SpQR achieves <1% perplexity loss for LLaMA and Falcon LLMs. It allows a 33B parameter LLM to run on a 24 GB GPU with 15% speedup and no performance drop. SpQR includes efficient encoding/decoding algorithms, offering >4x memory compression and faster GPU inference than 16-bit baselines.

Github

Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures

This study examines how pruning—zeroing out many neural network parameters—affects bias in Convolutional Neural Networks (CNNs) for vision tasks. While pruning can compress models effectively, it may worsen output bias. We demonstrate that CNNs pruned to <10% weights can maintain accuracy and limit bias increase compared to dense models. However, at higher sparsity, pruned models show greater output uncertainty and correlations, linked to increased bias. We offer simple criteria using only the original model to predict bias shifts and identify samples prone to bias after pruning.

SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.

ZipLM: Inference-Aware Structured Pruning of Language Models

ZipLM, a new structured compression method for large language models (LLMs), balances accuracy and speedup for specified runtime targets in any inference environment. Unlike prior methods limited to post-training, gradual compression, or specific models (e.g., BERT, GPT), ZipLM iteratively prunes components with the worst loss-runtime trade-offs, excelling across settings. It outperforms distillation/pruning techniques like CoFi and DistilGPT2, matching MobileBERT’s performance by pruning BERT-large, and offers a 60% smaller, 30% faster GPT2 alternative.

Github

Quantized Distributed Training of Large Models with Convergence Guarantees

QSDP enhances fully-sharded data parallel (FSDP) training for large language models like GPT by introducing gradient and weight quantization, addressing scalability bottlenecks. Unlike direct compression in FSDP, which affects convergence, QSDP modifies SGD to maintain accuracy with quantized weights in a non-convex domain, supported by theoretical guarantees. Simple to implement with minimal overhead, QSDP was validated on GPT models up to 1.3B parameters, achieving up to 2.2x end-to-end speedups while preserving accuracy on multi-node clusters.

Github

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

GPTQ, a new one-shot weight quantization method using approximate second-order information, compresses large GPT models (e.g., 175B parameters) to 3-4 bits per weight in ~4 GPU hours with minimal accuracy loss. Outperforming prior methods, it doubles compression gains, enabling a 175B model to run on a single GPU. GPTQ also supports extreme 2-bit or ternary quantization with reasonable accuracy, yielding inference speedups of 3.25x on NVIDIA A100 and 4.5x on A6000 over FP16 baselines.

Github

Activation-Informed Merging of Large Language Models

Model merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhance model performance across various tasks while maintaining computational efficiency. This paper introduces Activation-Informed Merging (AIM), a technique that integrates the information from the activation space of LLMs into the merging process to improve performance and robustness. AIM is designed as a flexible, complementary solution that is applicable to any existing merging method. It aims to preserve critical weights from the base model, drawing on principles from continual learning (CL) and model compression.

Github

A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Large language models (LLMs) have improved through larger models and data, but gains are slowing. Recent methods scale computation at inference using reward models, treating it as a search problem prone to reward hacking. This paper reframes it as probabilistic inference, using sampling to explore state distributions in a state-space model with approximate likelihood. It introduces a new approach adapting particle-based Monte Carlo methods, achieving 4-16x better scaling than deterministic search on math reasoning tasks. Qwen2.5-Math-1.5B-Instruct beats GPT-4o in 4 rollouts, and Qwen2.5-Math-7B-Instruct reaches o1 accuracy in 32 rollouts. This links probabilistic inference to LLM scaling.

Github

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

LLM development favors well-resourced labs, leaving small developers at a disadvantage. This study bridges the gap by exploring supervised fine-tuning of small LLMs (3B-7B parameters) using diverse instruction-tuning datasets. Testing four open-source models, it challenges common practices like TULU’s hyperparameters and Orca’s phased training. Key findings: (i) larger batch sizes with lower learning rates boost performance on MMLU, MTBench, and Open LLM Leaderboard; (ii) early training signals (low gradient norms, high loss) predict success, saving compute; (iii) simplified hyperparameter tweaks maintain performance; (iv) stacked training matches phased training but is simpler and more efficient. This guide empowers small-scale LLM fine-tuning.

Dr. SoW: Density Ratio of Strong-over-weak LLMs for Reducing the Cost of Human Annotation in Preference Tuning

Preference tuning typically needs costly human data. This paper introduces Dr. SoW (Density Ratio of Strong over Weak), a cost-effective method using off-the-shelf LLMs for annotation, avoiding human input. Dr. SoW uses the log-density ratio between better- and less-aligned LLMs as a reward signal. Testing 221 LLM pairs, it shows a strong link between model performance gaps and reward quality. An end-to-end pipeline tailors rewards to user domains, boosting accuracy without fine-tuning. With Mistral-7B, Dr. SoW scores 82.6 on RewardBench, beating top trained rewards, and excels in Safety (91.0) and Reasoning (88.0). Tuning Llama-3-8B with Dr. SoW-annotated data lifts win rates to 37.4% (+15.1%) on ArenaHard and 40.7% (+17.8%) on AlpacaEval 2.0.
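
The reward itself is simple to express: the log-probability a stronger (better-aligned) model assigns to a response minus the log-probability a weaker model assigns to it. The sketch below uses generic Hugging Face-style causal LMs as placeholders and assumes both models share a tokenizer that adds no special tokens mid-sequence.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Sum of token log-probabilities the model assigns to `response`
    given `prompt` (standard causal-LM scoring)."""
    full = tokenizer(prompt + response, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**full).logits                      # (1, T, vocab)
    logprobs = F.log_softmax(logits[:, :-1], dim=-1)       # predicts tokens 1..T-1
    targets = full["input_ids"][:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()       # response tokens only

def density_ratio_reward(strong_model, weak_model, tokenizer, prompt, response):
    """Dr. SoW-style reward sketch: how much more likely the stronger,
    better-aligned model finds the response than the weaker one.
    `strong_model` and `weak_model` are placeholder causal LMs."""
    return (sequence_logprob(strong_model, tokenizer, prompt, response)
            - sequence_logprob(weak_model, tokenizer, prompt, response))
```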

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Discrete diffusion models excel in image generation and masked language tasks but struggle with controlled editing. DICE (Discrete Inversion for Controllable Editing) is the first method to enable precise inversion for these models, like multinomial diffusion and masked generative models. By tracking noise sequences and masking patterns in reverse diffusion, DICE reconstructs and edits discrete data accurately without predefined masks or attention tweaks. Tested on VQ-Diffusion, Paella, and RoBERTa, DICE maintains high fidelity while boosting editing flexibility in image and text domains. It opens new possibilities for fine-grained content control in discrete spaces. 

Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

Fine-tuning large pre-trained generative models efficiently is increasingly popular. Traditional low-rank adaptation limits parameters but may lack capacity for complex tasks. This paper presents a spectrum-aware adaptation framework, adjusting singular values and basis vectors of pre-trained weights. Using the Kronecker product and Stiefel optimizers, it adapts orthogonal matrices efficiently. The proposed Spectral Orthogonal Decomposition Adaptation (SODA) balances efficiency and capacity. Tested on text-to-image diffusion models, SODA proves effective, providing a spectrum-aware alternative to existing fine-tuning approaches.
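
A heavily simplified sketch of the spectrum-aware idea: factor a pre-trained weight with an SVD, freeze the singular bases, and fine-tune only the singular values. SODA itself also adapts the orthogonal bases via Kronecker-structured updates and Stiefel optimizers, which are not shown here.

```python
import torch
import torch.nn as nn

class SpectralAdapter(nn.Module):
    """Wrap a frozen linear weight and fine-tune only its singular values."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)                   # frozen singular bases
        self.register_buffer("Vh", Vh)
        self.log_s = nn.Parameter(S.clamp_min(1e-8).log())  # trainable spectrum

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = self.U @ torch.diag(self.log_s.exp()) @ self.Vh
        return x @ W.T

layer = SpectralAdapter(torch.randn(32, 64))
print(layer(torch.randn(4, 64)).shape)   # torch.Size([4, 32])
```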

Github

Differentially Private Synthetic Data Generation for Relational Databases

Current differentially private (DP) synthetic data methods focus on single tables, but real data often spans multiple related tables. This paper presents a pioneering algorithm that enhances existing DP methods to create synthetic relational databases. It refines inter-table relationships iteratively, minimizing errors in low-order marginals while ensuring referential integrity. This approach avoids flattening tables (saving space), runs efficiently (saving time), and scales to high dimensions. It offers DP and utility guarantees, with experiments on real datasets showing strong fidelity to the original data.

Github

Value Augmented Sampling for Language Model Alignment and Personalization

Aligning Large Language Models (LLMs) to human preferences, new skills, and safer behavior is key. Search-based methods like Best-of-N excel but are costly, while Reinforcement Learning (RL) is efficient but less effective due to optimization issues. This paper introduces Value Augmented Sampling (VAS), a reward optimization framework using only initial, frozen LLM data. VAS avoids co-training policy and value functions, outperforming PPO and DPO on benchmarks and matching Best-of-128 with lower cost. It adapts LLMs like ChatGPT without weight access and enables composing multiple rewards for personalized alignment.

Github

LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis

This paper introduces LInK, a framework blending contrastive learning and optimization to tackle complex inverse problems in engineering design, focusing on path synthesis for planar linkage mechanisms. Using multimodal, transformation-invariant contrastive learning, LInK learns joint representations of physics and design from over 10 million mechanisms, enabling fast retrieval. Paired with a hierarchical nonlinear optimization algorithm, it cuts error by 28x and time by 20x compared to state-of-the-art methods on existing benchmarks. LInK also tackles a tougher new benchmark, LINK ABC, tracing English alphabet trajectories. It advances mechanism design and extends contrastive learning to engineering.

Github

LAB: Large-Scale Alignment for ChatBots

This paper presents LAB (Large-scale Alignment for chatBots), a new method to improve scalability in instruction-tuning large language models (LLMs). Using taxonomy-guided synthetic data and a multi-phase tuning approach, LAB cuts reliance on costly human annotations and models like GPT-4. It achieves competitive benchmark results against traditionally trained models, offering a cost-effective, scalable way to boost LLM performance and instruction-following without catastrophic forgetting. This advances efficient LLM training for diverse applications.

Curiosity-driven Red-teaming for Large Language Models

Large language models (LLMs) can produce unwanted content, prompting red team human testers to craft prompts that reveal flaws—an expensive, slow process. Recent automation uses reinforcement learning (RL) to train a red team LLM, but it generates few effective test cases, limiting coverage. This paper links broader test case coverage to curiosity-driven exploration, introducing Curiosity-driven Red Teaming (CRT). CRT boosts coverage and effectiveness over existing methods, successfully eliciting toxic responses from the heavily fine-tuned LLaMA2.

Github

Constraining Generative Models for Engineering Design with Negative Data

Generative models excel but often fail to produce realistic outputs, especially in engineering where strict standards apply. This paper introduces a training method using "negative data"—examples to avoid—to guide models toward constraint-satisfying outputs. The Negative-Data Generative Model (NDGM) outperforms classics, cutting constraint violations to 1/6 with 1/8 the data in some cases. It beats baselines in 12 of 14 tests, balancing constraint adherence and distributional accuracy. Tested on synthetic and real engineering tasks like ship hull and vehicle design, NDGM shines. 

Github

Analyzing Generalization of Neural Networks through Loss Path Kernels

Deep neural networks are vital in real-world use, requiring adaptation to new data. This paper explores their generalization under (stochastic) gradient flow, linking loss dynamics to kernel machines via a new "loss path kernel." This kernel assesses data similarity using loss gradient agreement along gradient flow paths. It yields a tight generalization bound for various network architectures, closely tied to true error. Applied to neural architecture search (NAS), it outperforms top NAS algorithms in experiments, enhancing design guidance.

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Offline policy learning uses existing trajectory datasets to train decision-making policies without new data collection. Unlike supervised learning, reinforcement learning (RL) aims to exceed the dataset’s average return. However, when datasets are mostly suboptimal, top offline RL methods fail to improve much, constrained by an assumption to stick close to dataset trajectories. This paper proposes a sampling strategy focusing only on "good data," avoiding uniform mimicry of suboptimal actions. It offers a plug-and-play algorithm enhancing standard offline RL, showing big gains in 72 imbalanced datasets, D4RL, and three RL methods.

Github

Compositional Foundation Models for Hierarchical Planning

Effective decision-making in new environments with long-term goals requires hierarchical reasoning across space and time—planning subgoals, visualizing plans, and executing actions via visual-motor control. This paper introduces Compositional Foundation Models for Hierarchical Planning (HiP), integrating expert models trained on language, vision, and action data. A large language model creates symbolic plans, grounded by a video diffusion model, then linked to actions via an inverse dynamics model. Consistency is ensured through iterative refinement. HiP’s effectiveness is shown in three long-horizon table-top manipulation tasks.

Github

Identifiability Guarantees for Causal Disentanglement from Soft Interventions

Causal disentanglement seeks a data representation with latent variables linked by a causal model, identifiable if the model is unique. This paper addresses cases with unpaired observational and interventional data, where interventions alter latent variable mechanisms. While fully observed causal variables allow identification under faithfulness, this work proves identifiability with unobserved variables using a broader faithfulness concept. It ensures recovery of the latent causal model up to an equivalence class and prediction of unseen intervention effects with infinite data. An autoencoding variational Bayes algorithm is developed and applied to predict combinatorial genomic perturbation effects.

Github

Aligning Optimization Trajectories with Diffusion Models for Constrained Design Generation

Generative models excel in vision and language, inspiring their use in science and engineering to speed up design and cut iterative optimization. Physics-based methods outshine them in constrained, data-scarce settings needing precision. We introduce Diffusion Optimization Models (DOM) and Trajectory Alignment (TA), aligning diffusion model sampling with physics-based optimization trajectories to ensure physical grounding. Requiring no costly preprocessing or extra data, it generates high-performance designs in two steps. Applied to structural topology optimization, TA beats top generative models in-distribution, halves inference costs, and boosts manufacturability out-of-distribution with minimal optimization. 

Github

Post-processing Private Synthetic Data for Improving Utility on Selected Measures

Current private synthetic data generation ignores downstream tasks, risking low utility if user needs aren’t met. This paper presents a post-processing method to boost synthetic data utility for user-specified measures while keeping privacy and quality intact. It resamples data to exclude low-utility samples, using an efficient stochastic first-order algorithm for optimal weights. Tests across benchmark datasets and top generation algorithms show consistent utility gains.

Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design

Deep generative models (VAEs, GANs, Diffusion Models, Transformers) excel in applications like image synthesis and drug discovery, but evaluating them for engineering design is tricky. Traditional likelihood-based metrics often miss design-specific needs. This paper reviews classic metrics, explains their limitations in design via case studies, and curates design-focused metrics—constraint satisfaction, performance, novelty, conditioning—for better evaluation. Using 2D examples and real cases (bicycle frame, topology generation), it applies these metrics to assess four models, highlighting target achievement and geometric constraints.

Github

Private Synthetic Data Meets Ensemble Learning

Machine learning models trained on synthetic data often falter on real data due to distribution shifts. This paper proposes an ensemble strategy to boost downstream model performance on real data. Multiple synthetic datasets are created using differential privacy (DP) in parallel, then used to train and ensemble downstream models. Though each dataset may stray further from real data, their diversity strengthens robustness. Tests show no gain with marginal- or workload-based DP, but GAN-based DP improves accuracy and calibration in ensembled models.

Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

Deep ensembles (DE) boost performance by leveraging random initialization’s stochasticity for diversity. Recent efforts enhance this via hyperparameters or loss regularization, yet remain stochastic. This paper introduces Multi-Symmetry Ensembles (MSE), a framework capturing diverse hypotheses along symmetry axes, expanding beyond weight and hyperparameter tweaks. Using contrastive learning, MSE creates models for invariant and equivariant hypotheses, efficiently ensembling them for tasks. On ImageNet, MSE’s inherent diversity enhances classification, uncertainty quantification, and generalization in transfer tasks.

Github

A Probabilistic Framework for Modular Continual Learning

Modular continual learning (CL) uses unique module compositions per problem, but searching vast composition spaces is tough due to training costs. This paper introduces PICLE, a framework using a probabilistic model to efficiently evaluate compositions, enabling perceptual, few-shot, and latent transfer. Combining prior knowledge with dataset specifics, PICLE excels in two CL benchmark suites. It outperforms prior modular CL methods, scaling well to large search spaces and long problem sequences.

Github

Improving Tuning-Free Real Image Editing with Proximal Guidance

DDIM inversion excels in real image editing via diffusion methods, but struggles with larger classifier-free guidance (CFG) scales. Null-text inversion (NTI) adjusts null embeddings for better alignment at high CFG, enabling cross-attention control. Negative-prompt inversion (NPI) offers a training-free NTI solution but can introduce artifacts and relies on DDIM quality. This paper enhances NPI with proximal guidance, adding regularization and reconstruction guidance to cut artifacts while keeping it training-free. It also extends to mutual self-attention control for geometry/layout edits, offering an efficient, low-overhead editing approach.

Github

Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Density ratio functions (p/q) are key in machine learning to measure distribution differences, with binary classification estimators excelling in high dimensions. However, they falter when densities are well-separated due to distribution shifts between training and evaluation. This paper reveals poor performance in such cases and proposes a multi-class classification method using auxiliary densities m_1, ..., m_K. By training a multinomial logistic regression to separate samples from p, q, and the auxiliary densities into K+2 classes, it ensures overlap and eliminates shift issues. Tests on synthetic and real data show it outperforms state-of-the-art in density ratio estimation, mutual information, and representation learning.
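
The mechanics can be sketched with scikit-learn: train one multinomial classifier over samples from p, q, and auxiliary "bridge" densities, then read the ratio p(x)/q(x) off the ratio of the two corresponding class probabilities. The Gaussian densities below are illustrative stand-ins, not the paper's construction of the auxiliary densities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p = rng.normal(-4.0, 1.0, size=(2000, 1))          # class 0: samples from p
q = rng.normal(+4.0, 1.0, size=(2000, 1))          # class 1: samples from q
bridges = [rng.normal(mu, 1.0, size=(2000, 1)) for mu in (-1.5, 1.5)]  # classes 2, 3

X = np.vstack([p, q] + bridges)
y = np.concatenate([np.full(2000, k) for k in range(2 + len(bridges))])

clf = LogisticRegression(max_iter=2000).fit(X, y)

x_test = np.array([[-2.0], [0.0], [2.0]])
proba = clf.predict_proba(x_test)
log_ratio = np.log(proba[:, 0]) - np.log(proba[:, 1])   # estimate of log p(x)/q(x)
print(log_ratio)   # decreasing: p dominates on the left, q on the right
```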

Mitigating Confirmation Bias in Semi-supervised Learning via Efficient Bayesian Model Averaging

State-of-the-art semi-supervised learning (SSL) excels with labeled and unlabeled data via self-training or pseudo-labeling, but risks confirmation bias as models reinforce errors. This paper shows SOTA SSL suffers from this due to poorly calibrated classifiers in pseudo-labeling. It introduces BaM-SSL, a Bayesian Model averaging method enhancing uncertainty quantification with low overhead. BaM-SSL reduces bias, boosting test accuracy by up to 16% on CIFAR-100 (400 labels) across vision benchmarks like CIFAR-10 and CIFAR-100. It also shines in class-imbalanced datasets and photonics science challenges.

Github

Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

Expert transformations like random-resized-crop and color-jitter are key to contrastive learning success (e.g., SimCLR). Efforts to replace these with learned view-generation have underperformed for imagery. This paper asks if generated views can enhance, rather than replace, expert transformations. It proposes a view generation and assimilation method, boosting state-of-the-art performance by up to 3.6% across three datasets. A thorough study analyzes view generation and assimilation, offering insights into learned views’ role in contrastive learning.

On the Importance of Calibration in Semi-supervised Learning

State-of-the-art semi-supervised learning (SSL) excels with labeled and unlabeled data using consistency regularization and pseudo-labeling. Pseudo-labeling relies on model predictions, making calibration key to avoid confirmation bias. Yet, SOTA methods prioritize performance over calibration. This paper shows calibration strongly ties to performance and proposes enhancing it with Bayesian techniques. A new SSL model family optimizing calibration boosts test accuracy by up to 15.9% on CIFAR-10, CIFAR-100, and ImageNet, and proves effective in class-imbalanced and photonics science challenges.

A Bayesian-Symbolic Approach to Reasoning and Learning in Intuitive Physics

Humans excel at intuitive physical reasoning with minimal data, a key aspect of common sense. This paper suggests humans learn approximate physics laws swiftly. It introduces a Bayesian-symbolic framework (BSP) for sample-efficient, human-like physical reasoning. BSP uses a generative model of interacting entities with unknown force laws, treating entities as random variables for Bayesian inference of properties. It employs symbolic regression with Newtonian grammar in a bilevel optimization to learn forces, iterating via expectation-maximization. BSP outperforms neural methods on synthetic datasets, handles real-world scenes, and excels in human physical reasoning tasks.

Equivariant Contrastive Learning

State-of-the-art self-supervised learning (SSL) pre-training creates semantically rich representations by enforcing invariance to human-defined transformations. This paper argues that equivariance—a broader concept where representations transform with inputs—can enhance this. It introduces Equivariant Self-Supervised Learning (E-SSL), extending SSL by adding a pre-training goal to predict input transformations, balancing equivariance and invariance. E-SSL boosts SimCLR to 72.5% ImageNet linear probe accuracy and proves effective in vision benchmarks and photonics regression tasks.

Github

Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

Current autoencoder-based disentangled representation learning sacrifices reconstruction quality for independence by penalizing the posterior, limiting capacity for correlated latent variables key to image details. This paper proposes a multi-stage approach: first, a penalty-based method learns disentangled factors; then, a deep generative model adds correlated variables for detail, conditioned on the disentangled factors. Unified by D-separation, this model spans VAEs, GANs, and normalizing flows. It beats state-of-the-art in reconstruction quality across benchmarks while matching disentanglement, excels in synthetic tabular data generation, and reveals interpretable features.

not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution

High-resolution image generation models like BigGAN and VQVAE-2 demand vast compute resources (512 TPU-v3 cores), limiting access. Conversely, GAN-based super-resolution models like ESRGAN upscale efficiently. This paper introduces not-so-big-GAN (nsb-GAN), a cost-effective two-step framework for deep generative models. It first generates low-frequency images in the wavelet domain, then super-resolves them to pixel-space with a novel wavelet decoder. Wavelet down-sampling retains more structure, enhancing quality at lower resolutions (e.g., 64x64). With parallel training and reduced dimensions, nsb-GAN cuts costs, achieving an FID of 10.59 on ImageNet 512x512—outperforming BigGAN with half the compute (256 TPU-v3 cores).

Generative Ratio Matching Networks

Deep generative models excel at creating realistic images, often via adversarial methods requiring tricky saddlepoint optimization to balance generator and critic networks. Maximum mean discrepancy networks (MMD-nets) sidestep this using a fixed kernel adversary but lag in quality. This paper advances this idea with Generative Ratio Matching (GRAM), a new method avoiding saddlepoint issues. In GRAM, generator and critic networks compete against a fixed kernel, not each other, ensuring stability like MMD-nets while rivaling or surpassing adversarial models in generative quality.

Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability

Transfer machine learning aids neural net reuse across independent entities with similar tasks using distributed data, preserving privacy. As datasets in business networks increase and transfers vary in success, assessing transferability is key. This study uses sales data from six restaurants to train and transfer neural nets, measuring transferability. It tests indicators—data divergences, projections, and a new neural net similarity metric—finding strong negative correlations with transferability. These insights guide transfer path selection, boosting performance with fewer transfers.

SimVAE: Simulator-Assisted Training for Interpretable Generative Models

This paper introduces SimVAE, a simulator-assisted training method for variational autoencoders (VAEs) that yields a disentangled, interpretable latent space. SimVAE trains in two steps: first, a deep generator (decoder) approximates a simulator, using it as a data source or teacher; then, an inference network (encoder) inverts the decoder, effectively approximating an inverted simulator. By separating encoder and decoder training, SimVAE avoids challenges common in VAEs and GANs. Its applications span circuit design, graphics de-rendering, and natural science problems requiring simulation-based inference.

BreGMN: Scaled-Bregman Generative Modeling Networks

F-divergences, widely used in generative modeling, require full overlap between data and model distributions, failing when supports mismatch during gradient-based training. Recent solutions shift to integral probability measures (IPMs) or variational lower bounds. This paper argues against changing the objective entirely, proposing instead to augment the base measure of f-divergences. It introduces Scaled Bregman Divergences, merging f-divergences and Bregman divergences, which, with a suitable base measure, address support mismatch and add geometric insights. Tests on MNIST, CelebA, and CIFAR-10 show strong results.

Variational Russian Roulette for Deep Bayesian Nonparametrics

Bayesian nonparametric models adjust complexity to data size but are computationally tough. Amortized variational methods are efficient but use fixed truncations, causing issues like over-pruning. This paper proposes a new variational approach using Russian roulette sampling from statistical physics. It adapts complexity during inference without fixed truncation, maintaining unbiased gradient estimates. Applied to infinite variational auto-encoders with a Beta-Bernoulli (Indian buffet process) prior, it offers a flexible, effective solution.

Github

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

Graph representation learning gains traction, adapting deep learning from Euclidean to non-Euclidean graph data via graph neural networks (GNNs). While effective in static settings, real-world graphs evolve dynamically. Traditional methods use node embeddings and recurrent neural networks (RNNs) to track temporal changes, but struggle with varying node sets across time. This paper introduces EvolveGCN, which evolves graph convolutional network (GCN) parameters temporally using an RNN, bypassing node embeddings. Two evolution architectures are explored. Tests on link prediction, edge, and node classification show EvolveGCN outperforms related methods.
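
Below is a toy sketch of the core mechanism, in which a GRU cell evolves a GCN layer's weight matrix from one graph snapshot to the next; the actual EvolveGCN-H and EvolveGCN-O architectures differ in how the recurrent update is driven and parameterized.

```python
import torch
import torch.nn as nn

class EvolvingGCNLayer(nn.Module):
    """Toy sketch: a GRU evolves the GCN weight matrix over time instead of
    learning per-node embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRUCell(dim, dim)      # updates one weight column per "batch" row
        self.w0 = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)

    def forward(self, snapshots):
        """`snapshots` is a list of (A_hat, X) pairs, one per time step, with
        A_hat the normalized adjacency and X the node features."""
        W, outputs = self.w0, []
        for a_hat, x in snapshots:
            # Treat each column of W as a GRU hidden state and evolve it.
            W = self.gru(W.t(), W.t()).t()
            outputs.append(torch.relu(a_hat @ x @ W))
        return outputs

dim = 16
layer = EvolvingGCNLayer(dim)
snaps = [(torch.eye(10), torch.randn(10, dim)) for _ in range(3)]
print([h.shape for h in layer(snaps)])   # three (10, 16) snapshots
```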

Github

Scalable Graph Learning for Anti-Money Laundering: A First Look

Organized crime and human trafficking thrive on complex money laundering. Despite heavy anti-money laundering (AML) efforts, little illicit activity is stopped. This paper outlines the technical challenges and reviews AML methods, introducing scalable graph convolutional neural networks for analyzing vast, dynamic financial data. Using AMLSim, a simulator generating a synthetic graph (1M nodes, 9M edges), initial results show promise. It explores computational efficiency and graph compression, suggesting deep learning could significantly aid AML efforts.

Logical Rule Induction and Theory Learning Using Neural Theorem Proving

Human cognition excels at forming predictive theories from observations. This paper introduces a neuro-symbolic mechanism for logical theory acquisition, learning rules and core facts from observed data. Rules use vector-represented predicates, applied via soft unification to infer facts from core facts. After k inference steps, results are compared to observations, refining rules and facts to match. Built on a novel differentiable rule induction network, it features interpretable, compositional rules. Tests on ILP and domain theory datasets show its effectiveness.

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Distribution matching methods like GDC and DPG for language model alignment lag behind contrastive RLHF methods (e.g., SLiC, DPO) due to high gradient variance. This paper proposes a self-normalized baseline to cut variance and generalizes target distributions in DPG, GDC, and DPO using Bayes’ rule for a reward-conditioned posterior. The new approach, BRAIn (Bayesian Reward-conditioned Amortized Inference), links distribution matching and DPO, outperforming prior methods in summarization and Anthropic HH tasks.

Grafting Vision Transformers

Vision Transformers (ViTs) outshine CNNs in vision tasks by enabling global information sharing in shallow layers, a trait diluted in efficient pyramid designs like Swin Transformer. This paper introduces GrafT, a simple add-on enhancing any network by maintaining global dependencies and multi-scale info across all feature levels. Flexible in depth and sharing backbone resources, GrafT boosts performance across diverse Transformer models. It notably uplifts mobile-size models, adding +3.9%, +1.4%, and +1.9% top-1 accuracy to DeiT-T, Swin-T, and MobileViT-XXS on ImageNet-1k. 

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

This paper presents the ThreeDWorld Transport Challenge, a benchmark for visually-guided, physics-driven task-and-motion planning. An agent with two 9-DOF arms navigates a simulated home to locate, pick up, and transport objects to a target spot, using containers as tools. Built on the ThreeDWorld platform with physics-responsive objects and a fully physics-driven API, the task demands planning amid real constraints. Tests show pure RL struggles, while hierarchical planning agents manage partial success but fall short of mastery. This benchmark aims to advance physics-driven robotic intelligence.

Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children's Fairy Tales

Social biases in stories, like those in children’s tales, are well-documented in humanities research but often studied manually on a small scale. This paper enhances such efforts with a natural language processing pipeline that extracts temporal verb-based event chains and character attributes (e.g., gender) from narratives. It introduces a verb-event annotation scheme targeting bias-related categories, like stereotypes. A case study on fairy tales shows the framework uncovers gender bias in both individual events and their narrative sequence for female and male characters.

AGENT: A Benchmark for Core Psychological Reasoning

Machine agents need intuitive psychology—reasoning about hidden mental states driving actions—to interact with humans effectively. Humans grasp this early, distinguishing agents from objects. This paper introduces AGENT, a benchmark with 3D animations testing core psychology principles (goals, efficiency, constraints, trade-offs) via four scenarios. Validated with human ratings, AGENT emphasizes generalization in evaluation. Compared against Bayesian inverse planning and a Theory of Mind network, results show that human-level performance requires models to represent agent planning, integrating utility, object knowledge, and physics.

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

Conversational agents, like LLMs, are integral to personal life, yet users often overlook privacy risks when sharing information. This paper introduces "contextual privacy," aiming to limit disclosures to only relevant, necessary info for user goals, reducing risks with untrusted LLMs. A user study reveals even privacy-aware individuals leak sensitive data indirectly. The authors propose a local framework that sits between users and LLMs, detecting and reframing out-of-context info in prompts. Evaluated with ShareGPT data, lightweight models boost contextual privacy while maintaining user intent across classification methods.

Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training

Training generative models with differential privacy (DP) often involves noisy gradients or altered discriminator training, hindering tuning and convergence. This paper uses the slicing privacy mechanism, adding noise to low-dimensional data projections with strong privacy guarantees, for training. It introduces smoothed-sliced f-divergence, proven statistically consistent, and a kernel-based estimator avoiding adversarial training. Experiments show superior synthetic data quality over baselines. By avoiding noisy gradients, it allows flexible generator adjustments, unlimited epochs, and restarts without extra privacy costs.
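
A rough sketch of the slicing idea: project the private data onto random low-dimensional directions and perturb only those projections with Gaussian noise, which the generator is then trained to match. The noise scale below is a placeholder; the actual mechanism calibrates it to a formal (epsilon, delta) differential-privacy guarantee and uses the smoothed-sliced f-divergence for training.

```python
import numpy as np

def noisy_random_projections(data: np.ndarray, n_slices: int = 64,
                             noise_scale: float = 1.0, seed: int = 0):
    """Project data onto random unit directions and add Gaussian noise.

    `noise_scale` is a placeholder, not a DP-calibrated value.
    """
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    directions = rng.standard_normal((d, n_slices))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projections = data @ directions                     # (n_samples, n_slices)
    noisy = projections + noise_scale * rng.standard_normal(projections.shape)
    return directions, noisy

X = np.random.default_rng(1).standard_normal((500, 10))
dirs, noisy_proj = noisy_random_projections(X)
print(noisy_proj.shape)   # (500, 64)
```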

Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

Machine learning models may discriminate due to data biases or development choices. This paper splits discrimination into aleatoric (data-inherent) and epistemic (model-induced) types. Aleatoric discrimination is quantified by assessing model performance limits under fairness constraints with perfect data knowledge, using Blackwell’s statistical experiment comparison. Epistemic discrimination is the gap between this limit and actual model accuracy under fairness rules. Applied to fairness interventions and missing-value data, results show current methods eliminate epistemic bias on standard datasets but struggle with aleatoric bias in incomplete data.

Adapting Fairness Interventions to Missing Values

Missing values in data challenge algorithmic fairness, disproportionately impacting demographic groups. The common "impute-then-classify" approach—imputing missing data then classifying—can worsen discrimination. This paper shows that classifiers trained on imputed data lose missing pattern info, degrading group fairness and accuracy. It introduces scalable, adaptive algorithms for fair classification that preserve missing pattern information and work with existing fairness methods. Tests with top fairness interventions across datasets show these algorithms outperform impute-then-classify in fairness and accuracy.

Quantifying Representation Reliability in Self-Supervised Learning Models

Self-supervised learning creates versatile data representations, but their reliability for downstream tasks is critical. This paper defines representation reliability as the ability of downstream models to consistently predict accurately using a test point’s representation. Since downstream data is often unavailable due to privacy, the authors propose an ensemble method to estimate reliability without prior task knowledge. It leverages neighborhood consistency across pre-trained representation spaces, aligning them with shared neighbor anchors. Extensive tests show this method strongly correlates with reliability, outperforming baselines.


Github

Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels

Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. We analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and information theory. Our generalization bounds shed light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD). We show that they can help understand recent empirical observations of the generalization phenomena of neural networks.

Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection

This paper tackles fair multi-class classification by "projecting" a pre-trained, potentially unfair classifier onto a fair model set meeting group-fairness goals. The fair model adjusts the pre-trained outputs with a multiplicative factor. It offers a parallelizable, iterative algorithm with sample complexity and convergence guarantees. Tests against top benchmarks show it balances accuracy and fairness well, with fast runtime on big datasets. It scales effectively, proven on a 1M+ sample dataset with multiple classes and intersectional groups.

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

This paper examines fairness issues in machine learning when training on datasets with missing values, where missing patterns may tie to group attributes (e.g., gender, race). Most fairness methods assume complete data, but imputing missing values can skew fairness. The authors analyze discrimination risks theoretically and propose a decision tree-based method, Missing Incorporated as Attribute (MIA), that skips separate imputation. It optimizes a fairness-regularized function directly. Tests on real datasets show it beats fairness interventions on imputed data.

NetGSR: Towards Efficient and Reliable Network Monitoring with Generative Super Resolution

Network monitoring systems aggregate data from network elements to a central collector for visibility, balancing efficiency (low overhead) and high-fidelity (accurate status). Dynamic networks challenge this balance, with prior methods sacrificing one for the other. This paper introduces NetGSR, a deep learning solution using a tailored conditional generative model (DistilGAN) and a feedback mechanism (Xaminer) to reconstruct fine-grained network status from low-resolution data. Xaminer adjusts sampling rates based on uncertainty and denoising. Tested on real-world datasets, NetGSR achieves 25x better efficiency and fast inference, maintaining fidelity.

PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis

Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) excel in view synthesis, but lack efficient epistemic Uncertainty Quantification (UQ). Current NeRF UQ methods add heavy computational costs (e.g., 10x training time), while GS has no systematic UQ approach. This gap hinders robustness and scalability. This paper reexamines NeRF and GS as function approximation, revealing key 3D representation insights. It proposes PH-Dropout, the first real-time, accurate UQ method for pre-trained NeRF and GS models. Extensive tests confirm its theoretical basis and effectiveness.

Github

Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility

Generative models with Differential Privacy (DP) create synthetic tabular data while lowering privacy risks, but their privacy-utility tradeoffs complicate model selection. This paper analyzes how DP models allocate privacy budgets across rows and columns, a key utility factor. It compares graphical and deep models, examining modeling techniques, DP mechanisms, and data dimensionality. Findings show graphical models spread budgets horizontally, struggling with wide datasets, while deep models spend per iteration, adapting better to varying dimensions. Low privacy (ϵ≥100) can enhance generalization. This guides DP model choice for datasets, privacy needs, and tasks.

Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory

Hamiltonian Monte Carlo (HMC) samples unnormalized densities via Hamiltonian dynamics. Girolami & Calderhead (2011) extended HMC to Riemannian manifolds, but instability persists. Past efforts improved stability with robust metric tensors. This paper enhances stability by designing dynamics, building on Lu et al. (2017)’s momentum distribution to cap particle speed. Generalized to Riemannian manifolds, it introduces position-dependent velocity bounds, curbing step sizes in high-curvature areas to cut numerical errors. It also offers a practical algorithm for sampling relativistic momentum without mean-field reliance.

Shrinking VOD Traffic via Rényi-Entropic Optimal Transport

As Internet Video on Demand (VOD) traffic surges, this paper shifts focus from infrastructure optimization to shaping user demand for cache efficiency. It proposes a mechanism to adjust request distributions to be cache-friendly while staying close to user preferences cost-wise. Using Rényi entropy as a novel proxy for cache footprint—measuring video richness and access evenness—it formulates an optimal transport problem to reduce this metric. A key theorem links entropy minimization to maximizing soft cache hit ratio (SCHR), allowing video substitutions. Tests on a city-scale dataset cut cache size by 83% and boost SCHR near 100%.
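
The Rényi-entropy proxy itself is straightforward to compute from a request distribution, as the sketch below shows: skewing demand toward fewer videos lowers the entropy, which the paper ties to a smaller cache footprint and a higher soft cache hit ratio. The example distributions are illustrative only.

```python
import numpy as np

def renyi_entropy(p: np.ndarray, alpha: float) -> float:
    """Rényi entropy H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    if np.isclose(alpha, 1.0):                      # Shannon limit
        return float(-(p * np.log(p)).sum())
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

uniform = np.full(100, 1 / 100)                     # cache-unfriendly demand
skewed = np.array([0.5] + [0.5 / 99] * 99)          # cache-friendly demand
print(renyi_entropy(uniform, 2.0), renyi_entropy(skewed, 2.0))  # skewed is lower
```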