Classical ML & Time Series Algorithms

Each method comes with its idea, pros, cons, and typical use cases.

14 most common algorithms, covering supervised, unsupervised, ensemble, and deep learning.

1. Linear Regression
Supervised · Regression

Idea: Find the best-fit line by minimizing the sum of squared errors.

✓ Pros

  • Simple and intuitive
  • Very fast to compute
  • Interpretable coefficients

✗ Cons

  • Assumes a linear relationship
  • Sensitive to outliers
  • Cannot capture non-linear patterns

★ Typical use cases

  • Continuous predictions such as price, sales, or temperature
  • Business baseline models, simple trend analysis
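A minimal scikit-learn sketch (an illustrative addition, not part of the original cards): fit a line to noisy y = 3x + 2 data and recover the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

model = LinearRegression().fit(X, y)
slope, intercept = model.coef_[0], model.intercept_  # close to 3 and 2
```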
2. Logistic Regression
Supervised · Classification

Idea: Squash a linear score through a sigmoid into a 0–1 probability.

✓ Pros

  • Probabilistic, interpretable output
  • Fast training, small model
  • Resistant to overfitting

✗ Cons

  • Linear decision boundary only
  • Struggles with class imbalance
  • Requires feature engineering

★ Typical use cases

  • Credit default, spam, disease diagnosis
  • Ad click-through-rate (CTR) prediction
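A hedged sketch of the sigmoid-to-probability idea on 1-D toy data (illustrative, not from the original):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: points above x = 5 are class 1
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = (X[:, 0] > 5).astype(int)

clf = LogisticRegression().fit(X, y)
p_high = clf.predict_proba([[9.0]])[0, 1]  # P(class 1) far above the boundary
p_low = clf.predict_proba([[1.0]])[0, 1]   # P(class 1) far below the boundary
```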
3. Decision Tree
Supervised · Tree

Idea: Recursively split data by feature thresholds; leaves give the predictions.

✓ Pros

  • Visual, transparent rules
  • Handles numeric & categorical features
  • No feature scaling needed

✗ Cons

  • Prone to overfitting
  • Unstable (small data changes can change the tree a lot)
  • Biased toward high-cardinality features

★ Typical use cases

  • Credit risk rules, medical decision flows
  • Scenarios that require transparent, explainable rules
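The "transparent rules" point can be made concrete with scikit-learn's `export_text`, which prints the learned if/else thresholds (an illustrative sketch on the iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree is a set of human-readable threshold rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
acc = tree.score(iris.data, iris.target)
```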
4. Random Forest
Ensemble · Bagging

Idea: Many independent decision trees each predict; the majority vote wins.

✓ Pros

  • Resists overfitting, stable performance
  • Feature importance built in
  • Few hyperparameters to tune

✗ Cons

  • Slower to train & predict
  • Large model size
  • Less interpretable than a single tree

★ Typical use cases

  • A go-to baseline for tabular data
  • Credit scoring, medical risk stratification
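A short sketch of the built-in feature importances (illustrative synthetic data, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic tabular data: only 3 of 8 features carry signal
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = rf.feature_importances_  # one score per feature, sums to 1
```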
5. XGBoost / Gradient Boosting
Ensemble · Boosting

Idea: Trees are trained sequentially; each tree fits the residuals of the previous one.

✓ Pros

  • State of the art on tabular data
  • Built-in missing-value handling
  • Highly tunable, frequent competition winner

✗ Cons

  • Many hyperparameters to tune carefully
  • Longer training time
  • Can overfit

★ Typical use cases

  • Kaggle competitions, CTR prediction, risk modeling
  • Recommendation ranking
6. K-Nearest Neighbors (KNN)
Supervised · Instance-based

Idea: Find the k closest training points; let them vote on the label.

✓ Pros

  • No training phase
  • Simple and intuitive
  • Learns arbitrary decision boundaries

✗ Cons

  • Slow prediction (computes distance to every training point)
  • Poor in high dimensions (curse of dimensionality)
  • Sensitive to feature scaling

★ Typical use cases

  • Recommenders, image similarity search
  • Small-dataset classification
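Because KNN is distance-based and scale-sensitive, it is usually paired with a scaler; a minimal pipeline sketch (illustrative, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Scale first so no single feature dominates the distance metric
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X, y)
acc = knn.score(X, y)
```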
7. K-Means Clustering
Unsupervised · Clustering

Idea: Iterate: assign each point to its nearest centroid → recompute centroids → repeat until stable.

✓ Pros

  • Fast and simple
  • Scales to large data
  • Interpretable results

✗ Cons

  • Must specify k upfront
  • Finds spherical clusters only
  • Sensitive to initialization & outliers

★ Typical use cases

  • Customer segmentation, image color quantization
  • Document clustering, data exploration
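A minimal sketch on synthetic blobs (illustrative; note that k = 3 must be chosen upfront, as the cons list says):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = km.labels_            # cluster id for every point
centers = km.cluster_centers_  # the 3 learned centroids in 2-D
```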
8. Support Vector Machine (SVM)
Supervised · Margin

Idea: Find the hyperplane that maximizes the margin between two classes.

✓ Pros

  • Strong in high dimensions
  • Handles non-linearity via kernels
  • Theoretically grounded

✗ Cons

  • Slow on large datasets
  • Sensitive to kernel & parameter choices
  • Less interpretable

★ Typical use cases

  • Text classification, gene expression
  • Small-to-mid-size, high-dimensional problems
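The kernel trick is easiest to see on data no line can separate; a sketch with an RBF kernel on the two-moons toy set (illustrative, not from the original):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
acc = svm.score(X, y)  # the RBF kernel bends the boundary around the moons
```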
9. Principal Component Analysis (PCA)
Unsupervised · Dim Reduction

Idea: Find the orthogonal axes of greatest variance and project the data onto fewer dimensions.

✓ Pros

  • Reduces compute cost
  • Decorrelates features
  • Visualizes high-dimensional data

✗ Cons

  • Linear transform only
  • New features lose their original meaning
  • Sensitive to feature scaling

★ Typical use cases

  • Data visualization, feature compression
  • Preprocessing before modeling
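A minimal sketch projecting 4-D iris data to 2-D, scaling first because PCA is scale-sensitive (illustrative addition):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)                 # 150 samples, now 2-D
explained = pca.explained_variance_ratio_.sum()  # variance kept by 2 components
```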
10. Naive Bayes
Supervised · Probabilistic

Idea: Apply Bayes' rule under the (naive) assumption that features are independent given the class.

✓ Pros

  • Very fast to train and predict
  • Works with small datasets
  • Strong on text

✗ Cons

  • The independence assumption rarely holds
  • Inaccurate probability estimates
  • Lower performance ceiling

★ Typical use cases

  • Spam filtering, sentiment analysis
  • Fast baseline for text classification
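The classic spam-filter pairing is word counts plus multinomial Naive Bayes; a tiny illustrative sketch (the messages and labels are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win cash prize now", "free money click here",
         "meeting at noon tomorrow", "project review next week"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Bag-of-words counts → per-class word likelihoods → Bayes' rule
nb = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
pred = nb.predict(["free cash now"])[0]  # words seen only in spam messages
```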
11. Neural Network (MLP)
Supervised · Deep Learning

Idea: A stack of neuron layers; weights are learned via back-propagation.

✓ Pros

  • Can fit arbitrarily complex functions
  • General purpose, transferable
  • High performance ceiling

✗ Cons

  • Needs lots of data
  • Slow to train, black box
  • Hyperparameter-sensitive, overfit-prone

★ Typical use cases

  • Image / speech / text, complex non-linear problems
  • Foundation of all deep-learning architectures
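A small MLP learning a non-linear boundary that a linear model cannot, sketched with scikit-learn (the layer sizes are arbitrary choices for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.15, random_state=0)

# Two hidden layers of 16 neurons, trained by back-propagation
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
acc = mlp.score(X, y)
```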
12. DBSCAN
Unsupervised · Density

Idea: Points densely packed within an ε radius form a cluster; sparse points are treated as noise.

✓ Pros

  • No need to specify the number of clusters
  • Finds arbitrarily shaped clusters
  • Automatically detects noise/outliers

✗ Cons

  • Struggles with varying density
  • Needs ε & minPts tuning
  • Poor in high dimensions

★ Typical use cases

  • Anomaly detection, geo-location clustering
  • Image segment detection
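A sketch showing both selling points at once: arbitrary cluster shapes (two moons) and automatic outlier detection (an injected far-away point gets the noise label −1). The ε and min_samples values are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X = np.vstack([X, [[3.0, 3.0]]])  # add one far-away outlier

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
# label -1 marks noise; remaining labels are the discovered clusters
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
outlier_label = db.labels_[-1]
```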
13. Hierarchical Clustering
Unsupervised · Hierarchy

Idea: Bottom-up, merge the nearest clusters into a dendrogram; cut the tree to choose k.

✓ Pros

  • No upfront k required
  • Produces a dendrogram structure
  • Intuitive and visual

✗ Cons

  • O(n²) or worse computation
  • Sensitive to outliers
  • Merges are irreversible

★ Typical use cases

  • Taxonomy, gene-expression clustering
  • Organizational and market-segment analysis
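The merge-then-cut workflow can be sketched with SciPy: `linkage` builds the dendrogram data bottom-up, and `fcluster` "cuts" it into a chosen number of clusters afterward (illustrative synthetic data):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

Z = linkage(X, method="ward")  # (n-1) merges: the dendrogram's merge history
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
```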
14. Ridge & Lasso Regression
Supervised · Regularization

Idea: Add an L2 (Ridge) or L1 (Lasso) penalty to the linear-regression loss to shrink the coefficients.

✓ Pros

  • Prevents overfitting
  • Lasso performs automatic feature selection
  • Better generalization

✗ Cons

  • The regularization strength λ must be tuned
  • Lasso is unstable on correlated features
  • Ridge cannot zero out coefficients

★ Typical use cases

  • High-dimensional data, regularizing linear models
  • Gene / text / image feature selection
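The L1-vs-L2 contrast is easy to demonstrate: on data where only 2 of 10 features matter, Lasso zeros out the useless coefficients while Ridge only shrinks them (illustrative sketch; the α values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, size=100)  # 8 features are noise

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # Ridge shrinks but never hits zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # Lasso discards the noise features
```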

For time series with strong cyclical patterns (daily / weekly / yearly)

⚡ How to choose?

  • Single seasonality, need statistical rigor → SARIMA / Holt-Winters
  • Multiple seasonalities together (daily + weekly + yearly) → Prophet, or Fourier features + XGBoost
  • Want to understand the data's structure first → STL decomposition, to separate trend from seasonality
  • Lots of data, complex patterns → LSTM (deep learning)
  • Inject cycles into an existing model → Fourier features
STL Decomposition
Decomposition · Exploratory

Idea: Split a series into Trend + Seasonal + Residual components.

✓ Pros

  • Intuitive separation of trend and seasonality
  • Robust to outliers
  • Supports any period length

✗ Cons

  • Single seasonality only
  • Does not forecast directly
  • Requires regular time intervals

★ Typical use cases

  • Exploratory analysis, understanding data structure
  • Preprocessing for anomaly detection
SARIMA (Seasonal ARIMA)
Statistical · Forecast

Idea: ARIMA (AR + differencing + MA) plus seasonal counterparts of each term; handles both trend and seasonality.

✓ Pros

  • Statistically rigorous
  • Provides confidence intervals
  • Handles trend & one seasonal cycle

✗ Cons

  • Hard to handle multiple seasonalities
  • Complex parameter selection
  • Requires a stationary (differenced) series

★ Typical use cases

  • Monthly sales, quarterly GDP, temperature
  • When rigorous prediction intervals matter
Prophet (Meta)
Multi-seasonality · Practical

Idea: y(t) = trend + yearly seasonality + weekly seasonality + holiday effects (additive decomposition).

✓ Pros

  • Handles daily/weekly/yearly cycles together
  • Supports holidays & special events
  • Tolerates missing data, easy to use

✗ Cons

  • Not always best at short horizons
  • Less statistically rigorous
  • Slow to react to abrupt changes

★ Typical use cases

  • Web traffic, retail sales, energy demand
  • Business series with strong cycles
Holt-Winters (Triple Exponential Smoothing)
Smoothing · Lightweight

Idea: Three exponential smoothers track Level (α), Trend (β), and Seasonality (γ).

✓ Pros

  • Simple and fast
  • Few parameters
  • Very low compute cost

✗ Cons

  • Single seasonality only
  • No exogenous variables
  • Slow to react to shocks

★ Typical use cases

  • Inventory forecasting, monthly sales
  • Fast baseline for live monitoring
Fourier Features
Feature Engineering · Multi-cycle

Idea: Encode cycles as sin/cos pairs at multiple frequencies; feed them into any regressor as extra features.

✓ Pros

  • Encodes any period length
  • Combines multiple cycles
  • Works with any regressor

✗ Cons

  • The period must be known a priori
  • Adds feature dimensions
  • Does not handle non-cyclic structure

★ Typical use cases

  • Injecting daily/weekly/yearly cycles into XGBoost / LightGBM
  • Adding cyclical features to linear models
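A from-scratch sketch (illustrative; `fourier_features` is a helper written here, not a library function): build sin/cos columns for a yearly and a weekly cycle, then let a plain linear model learn their amplitudes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily observations
y = (10 + 5 * np.sin(2 * np.pi * t / 365) + 2 * np.sin(2 * np.pi * t / 7)
     + rng.normal(0, 0.5, len(t)))

def fourier_features(t, period, n_harmonics):
    """sin/cos pairs at the period's first n harmonics."""
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Combine yearly and weekly cycles into one design matrix
X = np.hstack([fourier_features(t, 365.25, 3), fourier_features(t, 7, 3)])
model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # the two cycles explain most of the variance
```

The same `X` could be fed to XGBoost or LightGBM instead of `LinearRegression`; that is the "works with any regressor" point.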
LSTM (Long Short-Term Memory)
Deep Learning · Sequence

Idea: A recurrent network with a gated cell state that can learn long-range dependencies.

✓ Pros

  • Learns arbitrary non-linear sequence patterns
  • Discovers cycles automatically
  • Accepts exogenous variables, multivariate input

✗ Cons

  • Needs lots of data
  • Slow to train, black box
  • Hyperparameter-sensitive, overfit-prone

★ Typical use cases

  • High-frequency trading, sensor / IoT sequences
  • Complex multivariate series, language modeling
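To make the cell-state idea concrete, here is a from-scratch numpy sketch of one LSTM time step with randomly initialized (untrained) weights; in practice you would use a framework layer such as `torch.nn.LSTM` or `keras.layers.LSTM`, which implement the same gating equations plus training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the four gates: input, forget, candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # (4H,) pre-activations for all gates
    i = sigmoid(z[0:H])                 # input gate: how much new content to write
    f = sigmoid(z[H:2*H])               # forget gate: how much old memory to keep
    g = np.tanh(z[2*H:3*H])             # candidate cell content
    o = sigmoid(z[3*H:4*H])             # output gate: how much memory to expose
    c = f * c_prev + i * g              # cell state: the long-range memory lane
    h = o * np.tanh(c)                  # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                             # input dim, hidden dim (arbitrary)
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):      # run a 10-step multivariate sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

Because `h = o * tanh(c)` with `o` in (0, 1), the hidden state stays bounded in (−1, 1) no matter how long the sequence runs; the unbounded memory lives in `c`.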