Time series forecasting predicts future outcomes from past observations. It is especially valuable for infectious disease surveillance, where accurate and timely forecasts can guide public health decisions. However, forecasting indicators such as COVID-19 or influenza-like illness (ILI) case counts remains challenging because infectious disease data are limited, non-stationary, and skewed. Deep learning models, particularly Transformers, typically require large datasets to generalize well. This makes them less reliable in the early stages of an outbreak, when only small datasets are available. Moreover, the global preprocessing methods commonly used with Transformers, such as z-normalization, are sensitive to extreme values and work best when the data are roughly normally distributed. That assumption does not hold for infectious disease data, which are characterized by distributional shifts, extreme values, and skewness.

This dissertation addresses these challenges as follows. First, it evaluates sixteen forecasting models, ranging from statistical and recurrent models to Transformers, under both small- and large-data conditions. For limited datasets, a retraining strategy is integrated into the deep learning models so that they relearn from newly available data and remain adaptive and accurate when forecasting future points. Second, it applies statistical transformations (Logarithmic, Square Root, Yeo–Johnson, Box–Cox, and Differencing) before z-normalization to better handle the characteristics of infectious disease time series. The univariate Box–Cox transformation is also extended to a multivariate form that jointly estimates its parameters using the feature covariances. Previous studies of this extension have remained largely theoretical or have been implemented only in R, whereas this work provides the first practical Python implementation.
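The idea of applying a Box–Cox transformation before z-normalization can be illustrated with a short sketch. This is a minimal NumPy example and not the dissertation's implementation. It makes several assumptions: the inputs are strictly positive (zero case counts would need a shift, here `+ 1.0`), and λ is found by a coarse coordinate-wise grid search. It maximizes the standard Box–Cox profile log-likelihood, whose multivariate form uses the log-determinant of the covariance of the transformed features, so the λ values are estimated jointly. With one feature, it reduces to the usual univariate likelihood. All function names are hypothetical.

```python
import numpy as np


def boxcox(x, lam):
    """Box-Cox transform of a strictly positive array for a given lambda."""
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse Box-Cox transform (valid where lam * y + 1 > 0)."""
    return np.exp(y) if abs(lam) < 1e-8 else (lam * y + 1.0) ** (1.0 / lam)


def boxcox_loglik(X, lams):
    """Profile log-likelihood of the multivariate Box-Cox model.

    X has shape (n, d) and must be strictly positive; lams has length d.
    For d == 1 this reduces to the usual univariate Box-Cox likelihood.
    """
    n, d = X.shape
    Y = np.column_stack([boxcox(X[:, j], lams[j]) for j in range(d)])
    cov = np.atleast_2d(np.cov(Y, rowvar=False, bias=True))  # MLE covariance
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        return -np.inf  # degenerate covariance: reject this lambda
    # Log-Jacobian of the transform: sum_j (lam_j - 1) * sum_i log x_ij
    jacobian = np.sum((np.asarray(lams) - 1.0) * np.log(X).sum(axis=0))
    return jacobian - 0.5 * n * logdet


def fit_lambdas(X, grid=np.linspace(-2.0, 2.0, 81), sweeps=3):
    """Coordinate-wise grid search for lambdas that jointly maximize the likelihood."""
    lams = np.ones(X.shape[1])  # start from lambda = 1 (no transform)
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            scores = []
            for g in grid:
                trial = lams.copy()
                trial[j] = g
                scores.append(boxcox_loglik(X, trial))
            lams[j] = grid[int(np.argmax(scores))]
    return lams


def fit_transform(X):
    """Box-Cox each feature, then z-normalize; fit on the training window only."""
    lams = fit_lambdas(X)
    Y = np.column_stack([boxcox(X[:, j], lams[j]) for j in range(X.shape[1])])
    mu, sd = Y.mean(axis=0), Y.std(axis=0)
    return (Y - mu) / sd, (lams, mu, sd)


def inverse_transform(Z, params):
    """Undo z-normalization, then the Box-Cox transform (e.g. on model forecasts)."""
    lams, mu, sd = params
    Y = Z * sd + mu
    return np.column_stack([inv_boxcox(Y[:, j], lams[j]) for j in range(Y.shape[1])])


# Usage on toy, right-skewed positive data with two correlated features.
rng = np.random.default_rng(0)
base = rng.lognormal(mean=3.0, sigma=1.0, size=200)
X = np.column_stack([base, base * rng.lognormal(0.0, 0.3, size=200)]) + 1.0
Z, params = fit_transform(X)
X_back = inverse_transform(Z, params)
print("lambdas:", params[0], "round trip ok:", np.allclose(X, X_back))
```

The transformation parameters and the z-normalization statistics are fitted once and stored. Forecasts produced in the normalized space can therefore be mapped back to the original scale. In practice, these parameters would be fitted on the training window only, to avoid leaking information from the forecast period.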
Finally, it proposes a new instance normalization technique, Context-aware Instance Normalization (CoIN), to overcome a limitation of Reversible Instance Normalization (RevIN): RevIN does not account for differences in statistical characteristics between the input window and the forecast horizon. Extensive experiments on viral disease datasets demonstrate the effectiveness of the proposed methods. Overall, this work contributes toward improving the generalization and adaptability of Transformer-based forecasting models on real-world data.
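For reference, the RevIN baseline that CoIN is designed to improve can be sketched as follows. This is a minimal NumPy illustration under several assumptions: it uses a (batch, lookback, channels) layout and omits RevIN's learnable affine parameters. It does not implement CoIN, whose details are not described here. `InstanceNorm` and `naive_forecaster` are hypothetical names. The sketch shows the limitation named above. The mean and standard deviation used to map forecasts back to the original scale are computed from the input window alone, and on a growing outbreak curve they can differ markedly from the statistics of the forecast horizon.

```python
import numpy as np


class InstanceNorm:
    """Reversible instance normalization in the spirit of RevIN (learnable affine omitted)."""

    def __init__(self, eps=1e-5):
        self.eps = eps

    def normalize(self, x):
        # x: (batch, lookback, channels). Statistics come from the lookback window only.
        self.mean = x.mean(axis=1, keepdims=True)
        self.std = np.sqrt(x.var(axis=1, keepdims=True) + self.eps)
        return (x - self.mean) / self.std

    def denormalize(self, y):
        # y: (batch, horizon, channels). The input-window statistics are reused,
        # so any change in level or scale over the horizon is left to the model.
        return y * self.std + self.mean


def naive_forecaster(x_norm, horizon):
    """Hypothetical stand-in for a Transformer: repeats the last normalized value."""
    return np.repeat(x_norm[:, -1:, :], horizon, axis=1)


# Toy exponentially growing curve, loosely mimicking an outbreak wave.
rng = np.random.default_rng(1)
t = np.arange(48, dtype=float)
series = np.exp(0.08 * t) + rng.normal(0.0, 0.5, size=t.size)
x = series[:36].reshape(1, 36, 1)  # lookback window
future = series[36:]               # forecast horizon (unseen by the model)

norm = InstanceNorm()
y_hat = norm.denormalize(naive_forecaster(norm.normalize(x), horizon=12))
print("input mean/std:", norm.mean.item(), norm.std.item())
print("horizon mean/std:", future.mean(), future.std())
```

On this toy curve, the horizon mean lies far above the input-window mean used for denormalization. This mismatch between input and horizon statistics is the kind that RevIN does not model.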