Classical ML & Time Series Algorithms

Each method comes with its idea, pros, cons, and typical use cases.

14 most common algorithms, covering supervised, unsupervised, ensemble, and deep learning.

1. Linear Regression
Supervised · Regression

Idea: Find the best-fit line by minimizing the sum of squared errors.

✓ Pros

  • Simple and intuitive
  • Very fast to compute
  • Interpretable coefficients

✗ Cons

  • Assumes a linear relationship
  • Sensitive to outliers
  • Cannot capture non-linear patterns

★ Typical use cases

  • Continuous predictions such as price, sales, or temperature
  • Business baseline models, simple trend analysis
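A minimal scikit-learn sketch (an illustrative addition, not part of the original cards): fit a line to noisy y = 3x + 2 data and recover the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

model = LinearRegression().fit(X, y)
slope, intercept = model.coef_[0], model.intercept_  # close to 3 and 2
```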
2. Logistic Regression
Supervised · Classification

Idea: Squash a linear score through a sigmoid into a 0–1 probability.

✓ Pros

  • Probabilistic, interpretable output
  • Fast training, small model
  • Resistant to overfitting

✗ Cons

  • Linear decision boundary only
  • Struggles with class imbalance
  • Requires feature engineering

★ Typical use cases

  • Credit default, spam, disease diagnosis
  • Ad click-through-rate (CTR) prediction
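A hedged sketch of the sigmoid-to-probability idea on 1-D toy data (illustrative, not from the original):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: points above x = 5 are class 1
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = (X[:, 0] > 5).astype(int)

clf = LogisticRegression().fit(X, y)
p_high = clf.predict_proba([[9.0]])[0, 1]  # P(class 1) far above the boundary
p_low = clf.predict_proba([[1.0]])[0, 1]   # P(class 1) far below the boundary
```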
3. Decision Tree
Supervised · Tree

Idea: Recursively split data by feature thresholds; leaves give the predictions.

✓ Pros

  • Visual, transparent rules
  • Handles numeric & categorical features
  • No feature scaling needed

✗ Cons

  • Prone to overfitting
  • Unstable (small data changes can change the tree a lot)
  • Biased toward high-cardinality features

★ Typical use cases

  • Credit risk rules, medical decision flows
  • Scenarios that require transparent, explainable rules
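The "transparent rules" point can be made concrete with scikit-learn's `export_text`, which prints the learned if/else thresholds (an illustrative sketch on the iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree is a set of human-readable threshold rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
acc = tree.score(iris.data, iris.target)
```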
4. Random Forest
Ensemble · Bagging

Idea: Many independent decision trees each predict; the majority vote wins.

✓ Pros

  • Resists overfitting, stable performance
  • Feature importance built in
  • Few hyperparameters to tune

✗ Cons

  • Slower to train & predict
  • Large model size
  • Less interpretable than a single tree

★ Typical use cases

  • A go-to baseline for tabular data
  • Credit scoring, medical risk stratification
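A short sketch of the built-in feature importances (illustrative synthetic data, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic tabular data: only 3 of 8 features carry signal
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = rf.feature_importances_  # one score per feature, sums to 1
```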
5. XGBoost / Gradient Boosting
Ensemble · Boosting

Idea: Trees are trained sequentially; each tree fits the residuals of the previous one.

✓ Pros

  • State of the art on tabular data
  • Built-in missing-value handling
  • Highly tunable, frequent competition winner

✗ Cons

  • Many hyperparameters to tune carefully
  • Longer training time
  • Can overfit

★ Typical use cases

  • Kaggle competitions, CTR prediction, risk modeling
  • Recommendation ranking
6. K-Nearest Neighbors (KNN)
Supervised · Instance-based

Idea: Find the k closest training points; let them vote on the label.

✓ Pros

  • No training phase
  • Simple and intuitive
  • Learns arbitrary decision boundaries

✗ Cons

  • Slow prediction (computes distance to every training point)
  • Poor in high dimensions (curse of dimensionality)
  • Sensitive to feature scaling

★ Typical use cases

  • Recommenders, image similarity search
  • Small-dataset classification
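Because KNN is distance-based and scale-sensitive, it is usually paired with a scaler; a minimal pipeline sketch (illustrative, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Scale first so no single feature dominates the distance metric
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X, y)
acc = knn.score(X, y)
```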
7. K-Means Clustering
Unsupervised · Clustering

Idea: Iterate: assign each point to its nearest centroid → recompute centroids → repeat until stable.

✓ Pros

  • Fast and simple
  • Scales to large data
  • Interpretable results

✗ Cons

  • Must specify k upfront
  • Finds spherical clusters only
  • Sensitive to initialization & outliers

★ Typical use cases

  • Customer segmentation, image color quantization
  • Document clustering, data exploration
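A minimal sketch on synthetic blobs (illustrative; note that k = 3 must be chosen upfront, as the cons list says):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = km.labels_            # cluster id for every point
centers = km.cluster_centers_  # the 3 learned centroids in 2-D
```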
8. Support Vector Machine (SVM)
Supervised · Margin

Idea: Find the hyperplane that maximizes the margin between two classes.

✓ Pros

  • Strong in high dimensions
  • Handles non-linearity via kernels
  • Theoretically grounded

✗ Cons

  • Slow on large datasets
  • Sensitive to kernel & parameter choices
  • Less interpretable

★ Typical use cases

  • Text classification, gene expression
  • Small-to-mid-size, high-dimensional problems
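The kernel trick is easiest to see on data no line can separate; a sketch with an RBF kernel on the two-moons toy set (illustrative, not from the original):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
acc = svm.score(X, y)  # the RBF kernel bends the boundary around the moons
```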
9. Principal Component Analysis (PCA)
Unsupervised · Dim Reduction

Idea: Find the orthogonal axes of greatest variance and project the data onto fewer dimensions.

✓ Pros

  • Reduces compute cost
  • Decorrelates features
  • Visualizes high-dimensional data

✗ Cons

  • Linear transform only
  • New features lose their original meaning
  • Sensitive to feature scaling

★ Typical use cases

  • Data visualization, feature compression
  • Preprocessing before modeling
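A minimal sketch projecting 4-D iris data to 2-D, scaling first because PCA is scale-sensitive (illustrative addition):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)                 # 150 samples, now 2-D
explained = pca.explained_variance_ratio_.sum()  # variance kept by 2 components
```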
10. Naive Bayes
Supervised · Probabilistic

Idea: Apply Bayes' rule under the (naive) assumption that features are independent given the class.

✓ Pros

  • Very fast to train and predict
  • Works with small datasets
  • Strong on text

✗ Cons

  • The independence assumption rarely holds
  • Inaccurate probability estimates
  • Lower performance ceiling

★ Typical use cases

  • Spam filtering, sentiment analysis
  • Fast baseline for text classification
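The classic spam-filter pairing is word counts plus multinomial Naive Bayes; a tiny illustrative sketch (the messages and labels are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win cash prize now", "free money click here",
         "meeting at noon tomorrow", "project review next week"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Bag-of-words counts → per-class word likelihoods → Bayes' rule
nb = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
pred = nb.predict(["free cash now"])[0]  # words seen only in spam messages
```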
11. Neural Network (MLP)
Supervised · Deep Learning

Idea: A stack of neuron layers; weights are learned via back-propagation.

✓ Pros

  • Can fit arbitrarily complex functions
  • General purpose, transferable
  • High performance ceiling

✗ Cons

  • Needs lots of data
  • Slow to train, black box
  • Hyperparameter-sensitive, overfit-prone

★ Typical use cases

  • Image / speech / text, complex non-linear problems
  • Foundation of all deep-learning architectures
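A small MLP learning a non-linear boundary that a linear model cannot, sketched with scikit-learn (the layer sizes are arbitrary choices for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.15, random_state=0)

# Two hidden layers of 16 neurons, trained by back-propagation
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
acc = mlp.score(X, y)
```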
12. DBSCAN
Unsupervised · Density

Idea: Points densely packed within an ε radius form a cluster; sparse points are treated as noise.

✓ Pros

  • No need to specify the number of clusters
  • Finds arbitrarily shaped clusters
  • Automatically detects noise/outliers

✗ Cons

  • Struggles with varying density
  • Needs ε & minPts tuning
  • Poor in high dimensions

★ Typical use cases

  • Anomaly detection, geo-location clustering
  • Image segment detection
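A sketch showing both selling points at once: arbitrary cluster shapes (two moons) and automatic outlier detection (an injected far-away point gets the noise label −1). The ε and min_samples values are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X = np.vstack([X, [[3.0, 3.0]]])  # add one far-away outlier

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
# label -1 marks noise; remaining labels are the discovered clusters
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
outlier_label = db.labels_[-1]
```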
13. Hierarchical Clustering
Unsupervised · Hierarchy

Idea: Bottom-up, merge the nearest clusters into a dendrogram; cut the tree to choose k.

✓ Pros

  • No upfront k required
  • Produces a dendrogram structure
  • Intuitive and visual

✗ Cons

  • O(n²) or worse computation
  • Sensitive to outliers
  • Merges are irreversible

★ Typical use cases

  • Taxonomy, gene-expression clustering
  • Organizational and market-segment analysis
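The merge-then-cut workflow can be sketched with SciPy: `linkage` builds the dendrogram data bottom-up, and `fcluster` "cuts" it into a chosen number of clusters afterward (illustrative synthetic data):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

Z = linkage(X, method="ward")  # (n-1) merges: the dendrogram's merge history
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
```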
14. Ridge & Lasso Regression
Supervised · Regularization

Idea: Add an L2 (Ridge) or L1 (Lasso) penalty to the linear-regression loss to shrink the coefficients.

✓ Pros

  • Prevents overfitting
  • Lasso performs automatic feature selection
  • Better generalization

✗ Cons

  • The regularization strength λ must be tuned
  • Lasso is unstable on correlated features
  • Ridge cannot zero out coefficients

★ Typical use cases

  • High-dimensional data, regularizing linear models
  • Gene / text / image feature selection
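The L1-vs-L2 contrast is easy to demonstrate: on data where only 2 of 10 features matter, Lasso zeros out the useless coefficients while Ridge only shrinks them (illustrative sketch; the α values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, size=100)  # 8 features are noise

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # Ridge shrinks but never hits zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # Lasso discards the noise features
```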

For time series with strong cyclical patterns (daily / weekly / yearly)

⚡ How to choose?

  • Single seasonality, need statistical rigor → SARIMA / Holt-Winters
  • Multiple seasonalities together (daily + weekly + yearly) → Prophet, or Fourier features + XGBoost
  • Want to understand the data's structure first → STL decomposition, to separate trend from seasonality
  • Lots of data, complex patterns → LSTM (deep learning)
  • Inject cycles into an existing model → Fourier features
STL Decomposition
Decomposition · Exploratory

Idea: Split a series into Trend + Seasonal + Residual components.

✓ Pros

  • Intuitive separation of trend and seasonality
  • Robust to outliers
  • Supports any period length

✗ Cons

  • Single seasonality only
  • Does not forecast directly
  • Requires regular time intervals

★ Typical use cases

  • Exploratory analysis, understanding data structure
  • Preprocessing for anomaly detection
SARIMA (Seasonal ARIMA)
Statistical · Forecast

Idea: ARIMA (AR + differencing + MA) plus seasonal counterparts of each term; handles both trend and seasonality.

✓ Pros

  • Statistically rigorous
  • Provides confidence intervals
  • Handles trend & one seasonal cycle

✗ Cons

  • Hard to handle multiple seasonalities
  • Complex parameter selection
  • Requires a stationary (differenced) series

★ Typical use cases

  • Monthly sales, quarterly GDP, temperature
  • When rigorous prediction intervals matter
Prophet (Meta)
Multi-seasonality · Practical

Idea: y(t) = trend + yearly seasonality + weekly seasonality + holiday effects (additive decomposition).

✓ Pros

  • Handles daily/weekly/yearly cycles together
  • Supports holidays & special events
  • Tolerates missing data, easy to use

✗ Cons

  • Not always best at short horizons
  • Less statistically rigorous
  • Slow to react to abrupt changes

★ Typical use cases

  • Web traffic, retail sales, energy demand
  • Business series with strong cycles
Holt-Winters (Triple Exponential Smoothing)
Smoothing · Lightweight

Idea: Three exponential smoothers track Level (α), Trend (β), and Seasonality (γ).

✓ Pros

  • Simple and fast
  • Few parameters
  • Very low compute cost

✗ Cons

  • Single seasonality only
  • No exogenous variables
  • Slow to react to shocks

★ Typical use cases

  • Inventory forecasting, monthly sales
  • Fast baseline for live monitoring
Fourier Features
Feature Engineering · Multi-cycle

Idea: Encode cycles as sin/cos pairs at multiple frequencies; feed them into any regressor as extra features.

✓ Pros

  • Encodes any period length
  • Combines multiple cycles
  • Works with any regressor

✗ Cons

  • The period must be known a priori
  • Adds feature dimensions
  • Does not handle non-cyclic structure

★ Typical use cases

  • Injecting daily/weekly/yearly cycles into XGBoost / LightGBM
  • Adding cyclical features to linear models
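A from-scratch sketch (illustrative; `fourier_features` is a helper written here, not a library function): build sin/cos columns for a yearly and a weekly cycle, then let a plain linear model learn their amplitudes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily observations
y = (10 + 5 * np.sin(2 * np.pi * t / 365) + 2 * np.sin(2 * np.pi * t / 7)
     + rng.normal(0, 0.5, len(t)))

def fourier_features(t, period, n_harmonics):
    """sin/cos pairs at the period's first n harmonics."""
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Combine yearly and weekly cycles into one design matrix
X = np.hstack([fourier_features(t, 365.25, 3), fourier_features(t, 7, 3)])
model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # the two cycles explain most of the variance
```

The same `X` could be fed to XGBoost or LightGBM instead of `LinearRegression`; that is the "works with any regressor" point.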
LSTM (Long Short-Term Memory)
Deep Learning · Sequence

Idea: A recurrent network with a gated cell state that can learn long-range dependencies.

✓ Pros

  • Learns arbitrary non-linear sequence patterns
  • Discovers cycles automatically
  • Accepts exogenous variables, multivariate input

✗ Cons

  • Needs lots of data
  • Slow to train, black box
  • Hyperparameter-sensitive, overfit-prone

★ Typical use cases

  • High-frequency trading, sensor / IoT sequences
  • Complex multivariate series, language modeling
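To make the cell-state idea concrete, here is a from-scratch numpy sketch of one LSTM time step with randomly initialized (untrained) weights; in practice you would use a framework layer such as `torch.nn.LSTM` or `keras.layers.LSTM`, which implement the same gating equations plus training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the four gates: input, forget, candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # (4H,) pre-activations for all gates
    i = sigmoid(z[0:H])                 # input gate: how much new content to write
    f = sigmoid(z[H:2*H])               # forget gate: how much old memory to keep
    g = np.tanh(z[2*H:3*H])             # candidate cell content
    o = sigmoid(z[3*H:4*H])             # output gate: how much memory to expose
    c = f * c_prev + i * g              # cell state: the long-range memory lane
    h = o * np.tanh(c)                  # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                             # input dim, hidden dim (arbitrary)
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):      # run a 10-step multivariate sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

Because `h = o * tanh(c)` with `o` in (0, 1), the hidden state stays bounded in (−1, 1) no matter how long the sequence runs; the unbounded memory lives in `c`.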