
Machine Learning Theory

Machine Learning Theory is the formal, mathematical study of how machines learn from data. It focuses on answering fundamental questions such as:

1. What can be learned? (Learnability)

  • Determines which tasks or functions can be learned from data.
  • Includes the concept of PAC learning (Probably Approximately Correct), which studies how many samples are needed to learn with high accuracy.
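As a concrete illustration, the classic PAC result for a finite hypothesis class H in the realizable case says that m ≥ (1/ε)(ln|H| + ln(1/δ)) samples suffice to learn with error at most ε with probability at least 1 − δ. A minimal sketch (the numbers in the example are purely illustrative):

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient to PAC-learn a finite hypothesis class
    (realizable case): m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 1000 hypotheses, 5% error tolerance, 95% confidence
m = pac_sample_bound(1000, epsilon=0.05, delta=0.05)
print(m)  # 199
```

Note how the bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.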

2. How well can we learn? (Generalization)

  • Ensures that the model performs well not only on training data but also on unseen data.

  • Concepts include:

    • Bias–variance trade-off
    • Overfitting and underfitting
    • Capacity and complexity of models

3. How many data samples are required? (Sample Complexity)

  • Measures how much data is needed to learn a good model.

  • Related to:

    • VC dimension
    • Rademacher complexity

4. How much computation is required? (Computational Complexity)

  • Studies whether the learning process is computationally feasible.
  • Focuses on the difficulty of optimization problems in machine learning.

5. What algorithms guarantee learning?

  • Provides theoretical guarantees about convergence and performance of:

    • Gradient descent
    • Stochastic gradient descent (SGD)
    • Regularization methods
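For a convex objective, gradient descent provably converges to the minimum under a suitable learning rate. A minimal sketch on the quadratic f(w) = (w − 3)², where the learning rate and target are illustrative choices:

```python
def grad_descent(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_star = grad_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # converges to 3.0
```

Here each step shrinks the error by a constant factor (1 − 2·lr), which is exactly the kind of convergence rate that optimization theory quantifies.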

In short, Machine Learning Theory provides the mathematical foundations that explain why learning algorithms work and under what conditions they succeed.


Reinforcement Learning (RL)

Reinforcement Learning is a branch of machine learning where an agent learns to make decisions by interacting with an environment. Instead of learning from labeled data, the agent learns from rewards and penalties.

Key Concepts

1. Agent and Environment

  • Agent: the learner or decision-maker.
  • Environment: everything the agent interacts with.

2. States, Actions, and Rewards

  • State (S): A snapshot of the environment at a given time.
  • Action (A): A choice the agent makes.
  • Reward (R): A numerical signal given after taking an action (positive or negative).

3. Policy (π)

A policy is the agent’s strategy:

  • Maps states → actions.
  • Can be deterministic or probabilistic.

4. Value Functions

Measure how good it is to be in a certain state or to take a certain action:

  • V(s): the expected return when starting from state s and following the policy.
  • Q(s, a): the expected return when taking action a in state s and following the policy thereafter.
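The optimal value function can be computed by value iteration, which repeatedly applies the Bellman backup V(s) ← max_a [r + γ·V(s′)]. A minimal sketch on a hypothetical two-state MDP with deterministic transitions (states, actions, and rewards are made up for illustration):

```python
# Hypothetical deterministic MDP: transitions[s][a] = (next_state, reward)
transitions = {
    "A": {"stay": ("A", 0.0), "go": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "go": ("A", 0.0)},
}
gamma = 0.9

V = {s: 0.0 for s in transitions}
for _ in range(200):  # Bellman backup: V(s) = max_a [r + gamma * V(s')]
    V = {s: max(r + gamma * V[s2] for (s2, r) in acts.values())
         for s, acts in transitions.items()}

print({s: round(v, 2) for s, v in V.items()})  # {'A': 19.0, 'B': 20.0}
```

Staying in B forever is worth 2/(1 − 0.9) = 20, and the best move from A is to go to B, worth 1 + 0.9·20 = 19, which is exactly what the iteration converges to.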

5. Return and Discount Factor

  • Return (G): the cumulative sum of future rewards, usually discounted.
  • Discount Factor (γ, with 0 ≤ γ ≤ 1): determines how much future rewards matter relative to immediate ones.
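The discounted return is simply G = r₀ + γ·r₁ + γ²·r₂ + …, which a one-line function makes concrete:

```python
def discounted_return(rewards, gamma):
    """G = r0 + gamma*r1 + gamma^2*r2 + ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(round(discounted_return([1, 1, 1], gamma=0.9), 2))  # 1 + 0.9 + 0.81 = 2.71
```

With γ close to 0 the agent is myopic; with γ close to 1 it weighs distant rewards almost as heavily as immediate ones.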

6. Exploration vs. Exploitation

  • Exploration: trying new actions to gain more knowledge.
  • Exploitation: choosing the best-known action.
  • An RL agent must balance these two to maximize long-term reward.
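A common way to strike this balance is an ε-greedy rule: explore with a small probability ε, otherwise exploit the action with the highest estimated value. A minimal sketch (the Q-value estimates and ε are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action; otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

rng = random.Random(0)
q = [0.1, 0.5, 0.2]  # illustrative action-value estimates
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count(1) / 1000)  # mostly action 1, the current best
```

Roughly 90% of picks exploit the best-known action, while the remaining 10% keep sampling the alternatives in case their estimates are wrong.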

Types of Reinforcement Learning Algorithms

1. Value-Based Methods

  • Learn value functions.

  • Examples:

    • Q-Learning
    • Deep Q-Network (DQN)
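Q-Learning updates a table of action values toward the target r + γ·max_a′ Q(s′, a′). A minimal tabular sketch on a hypothetical 4-state chain environment (the environment, learning rate, and episode count are all illustrative assumptions):

```python
import random

# Hypothetical 1-D chain: states 0..3, action 0 = left, 1 = right;
# reaching state 3 yields reward 1 and ends the episode.
def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

rng = random.Random(42)
Q = [[0.0, 0.0] for _ in range(4)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                       # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap off the greedy next-state value
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])  # approaches [0.81, 0.9, 1.0, 0.0]
```

The learned state values decay geometrically with distance from the goal (1, 0.9, 0.81), exactly as the discount factor predicts.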

2. Policy-Based Methods

  • Learn a direct mapping from states to actions.

  • Examples:

    • REINFORCE
    • PPO (Proximal Policy Optimization)
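The core idea behind REINFORCE is to nudge policy parameters in the direction r·∇log π(a), so actions that earned reward become more probable. A minimal sketch on a hypothetical two-armed Bernoulli bandit with a softmax policy (arm probabilities, learning rate, and step count are illustrative):

```python
import math
import random

rng = random.Random(0)
arm_reward = [0.2, 0.8]        # hypothetical success probabilities of each arm
theta = [0.0, 0.0]             # policy parameters: one logit per action
alpha = 0.1

def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if rng.random() < probs[0] else 1           # sample from the policy
    r = 1.0 if rng.random() < arm_reward[a] else 0.0  # Bernoulli reward
    # REINFORCE update: theta += alpha * r * grad log pi(a)
    for i in range(2):
        grad_log = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * r * grad_log

print(softmax(theta))  # probability mass shifts toward the better arm
```

Unlike value-based methods, no value table is learned here; the policy itself is the object being optimized, which is what lets policy-gradient methods handle continuous action spaces.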

3. Actor–Critic Methods

  • Combine value-based and policy-based approaches.

  • Examples:

    • A2C / A3C
    • SAC (Soft Actor-Critic)

4. Model-Based Methods

  • Learn a model of the environment and use it to plan actions.

  • Examples:

    • Dyna-Q
    • MuZero

In Summary

Machine Learning Theory

  • Focuses on the mathematical foundations of learning.
  • Studies sample complexity, generalization, optimization, and learnability.

Reinforcement Learning

  • Learning by interacting with an environment using rewards.
  • Agent learns through trial and error.
  • Key challenges: exploration, exploitation, and maximizing long-term reward.
