Abstract
This dissertation presents algorithmic contributions aimed at advancing scalable and efficient model training in machine learning. As models grow in size and data distributions evolve, reducing computational and memory demands without compromising performance has become a critical challenge. This work addresses the problem from multiple complementary angles by designing training methods that reduce redundancy, exploit structure, and incorporate curvature information in a lightweight manner.

We first propose an adaptive training strategy for sequential data, in which change-point detection identifies distributional shifts and triggers model updates only when necessary. This selective retraining approach reduces unnecessary computation while maintaining or improving predictive accuracy, making it particularly effective in online learning scenarios.

Building on the principle of resource-aware computation, we then explore curvature-aware optimization methods that aim to harness the benefits of second-order information without the prohibitive cost typically associated with such methods. By analyzing the structure of curvature matrices such as the Fisher information, we reveal low-rank and Kronecker-factored forms that enable the construction of efficient preconditioners. These approximations lead to substantial gains in convergence speed, training time, and memory usage across a variety of model architectures.

Finally, we extend this line of work to derivative-free optimization by developing a zeroth-order method for fine-tuning large language models under black-box constraints. By combining a variance-reduction strategy with a low-rank curvature approximation, the proposed approach achieves efficient adaptation with reduced memory overhead compared to state-of-the-art zeroth-order methods. Together, these contributions represent diverse yet coherent efforts toward scalable model training.
By leveraging statistical adaptivity and structural efficiency in both data and optimization, this dissertation offers practical solutions to the growing demands of modern machine learning. It also demonstrates how principled approximations and theoretical insights can enable learning systems that are efficient, robust, and well-suited to real-world challenges.