This page lists topics and resources for building a solid undergraduate-level foundation in Machine Learning,
Linear Algebra, Artificial Intelligence, Information Retrieval, Social Networks, Deep Learning, and Statistics.
Foundation/Basics
Deep Learning - NPTEL IIT M
MIT 6.S191: Introduction to Deep Learning
Machine Learning: Tom Mitchell
Various Optimization Algorithms For Training Neural Network | Towards Data Science
Activation Functions in Neural Networks | by SAGAR SHARMA | Towards Data Science
Common Loss functions in machine learning | by Ravindra Parmar | Towards Data Science
Artificial Intelligence - All in One - YouTube
DeepLearningAI - YouTube
Conceptual Introduction
- Perceptron - What is a perceptron, how does it learn linear functions, What type of logical gates can it learn? What about XOR?
- Linear functions vs. non-linear functions
- How to deal with non-linearity? Activation functions - Sigmoid, ReLU, tanh (where to use each: binary classification, multi-class classification with softmax, regression)
- Regression vs. classification
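The perceptron questions above (which logic gates it can learn, and why XOR is a problem) can be explored with a minimal sketch. It assumes the classic mistake-driven update rule with a step activation; the function names and the `AND`/`XOR` sample sets are illustrative, not taken from any of the listed resources.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train one perceptron on 2-input samples with the classic update rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred          # 0 when correct, +1 or -1 on a mistake
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def predict(w, b, x):
    """Step activation on the linear score."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(samples, epochs=20):
    """Train on the samples, then report training accuracy."""
    w, b = train_perceptron(samples, epochs=epochs)
    return sum(predict(w, b, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(AND))
print("XOR accuracy:", accuracy(XOR))
```

AND is linearly separable, so the perceptron fits it exactly. XOR is not, so no choice of weights classifies all four points correctly, which is the motivation for hidden layers and non-linear activations.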
Math behind “Learning”
- Minimization of loss
- Dynamic Learning Rate
- Backpropagation math
- Difference between overfitting and underfitting - What happens when you can fully learn the training data but cannot fit the test data
- Momentum - how to gradually converge to the target function that you are trying to learn
- Exponential smoothing - how is momentum a smoothing function?
- Nesterov’s accelerated gradient - Look ahead momentum
- AdaGrad, RMSProp, Adam Optimizers
- Different Types of Errors
- Mean Squared Error
- L1 Loss, L2 Loss
- Regularization - What is it? Lasso and Ridge
- Dropout - why is it necessary
- Batch Normalization vs. Layer Normalization
- Binary Cross Entropy
- What is Entropy?
- KL Divergence - How are two distributions similar, different? What is the distance between two distributions
- Precision, Accuracy, Recall, F1 Score, ROC Curve
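The entropy, cross-entropy, and KL divergence items above are tied together by the identity H(p, q) = H(p) + KL(p || q). The sketch below checks that numerically for discrete distributions. It assumes natural logarithms and that `q` is non-zero wherever `p` is; the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q): expected code length when q is used to model p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q): extra cost of modelling p with q. Not symmetric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two KL values differ, which is why KL divergence is usually described as a divergence rather than a true distance.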
Handling Images, Graphics (Computer Vision)
A Gentle Introduction to Generative Adversarial Networks (GANs) - MachineLearningMastery.com
Unsupervised Feature Learning and Deep Learning Tutorial (stanford.edu)
Convolutional Neural Networks, Explained | by Mayank Mishra | Towards Data Science
- Autoencoders - Reducing dimensions; Variational Autoencoders - minimizing reconstruction loss + KL Divergence
- Denoising Autoencoders
- Convolutional Neural Networks - 2D Filters, 3D Filters
- Activation Map, Max Pooling, Average Pooling
- Vanishing Gradient
- Generative Adversarial Networks
- Fine tuning and Transfer Learning
- YOLO models
- DALL-E
- When do GANs achieve convergence - Nash Equilibrium. Why is it adversarial training?
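The 2D filter, activation map, and max pooling items above can be made concrete with a small pure-Python sketch. It assumes a "valid" sliding window with stride 1 for the filter (cross-correlation, which is what deep learning libraries typically call convolution) and non-overlapping windows for pooling. The toy `image` and `kernel` are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide a 2D filter over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# 2x2 filter that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1], [-1, 1]]

activation_map = conv2d(image, kernel)
pooled = max_pool(activation_map)
print(activation_map)
print(pooled)
```

The activation map is non-zero only where the filter straddles the edge, and pooling keeps that response while halving each spatial dimension.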
Artificial Intelligence
MIT 6.034 Artificial Intelligence, Fall 2010
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach" (berkeley.edu)
Artificial Intelligence: A Modern Approach Russell and Norvig 3rd Edition
- Solving problems by searching for the solution in a space of solutions
- Intelligent agents - Rationality
- Uninformed search - DFS, BFS, Depth limited search, iterative deepening search
- Informed search strategies - Greedy heuristic search, A* search
- Local Search - Hill Climbing search, Stochastic Hill Climbing, Simulated annealing, beam search, gradient descent, Genetic Algorithm (Important)
- Adversarial search, games, Alpha Beta Pruning, Minimax games - Tic Tac Toe
- Constraint Satisfaction problems - Map Coloring with constraints
- Propositional Logic vs. First Order Logic
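For the adversarial search item above, here is a minimal sketch of minimax with alpha-beta pruning. It assumes a game tree given as nested lists whose leaves are numeric payoffs for the maximizing player; the `visited` list is only there to show which leaves were actually evaluated.

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf"),
              visited=None):
    """Return the minimax value of a nested-list game tree, pruning when possible."""
    if not isinstance(node, list):       # leaf: payoff for the maximizing player
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:            # the minimizer will never allow this branch
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:                # the maximizer already has a better option
            break
    return best


tree = [[3, 5], [2, 9]]                  # max node over two min nodes
seen = []
value = alphabeta(tree, True, float("-inf"), float("inf"), seen)
print(value, seen)
```

Once the second min node sees the leaf 2, it cannot beat the 3 already guaranteed by the first branch, so the leaf 9 is never evaluated.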
Python Tools for ML
Blog (machinelearningmastery.com)
Towards Data Science
- PyTorch - Different types of Neural Networks
- TensorFlow/Keras
- scikit-learn - other ML Models
- Hugging Face - pretrained language models
- spaCy, NLTK - NLP tools
Linear Algebra and Stats
Introduction to Linear Algebra, 4th Edition (aiu.edu)
A First Course in Probability: Sheldon Ross
Probability, Random Variables, and Random Signal Principles: Peyton Z. Peebles
- Bayes' theorem and Bayesian inference
- Joint Probability
- Correlation and Covariance
- Conditional Probability and Independence
- Multivariate Random Variable
- Expectation, Binomial RV, Gaussian RV, Poisson RV
- Identity, Inverse, Diagonal, Orthogonal, Orthonormal matrices
- Eigenvalues and Eigenvectors
- Eigendecomposition
- Singular Value Decomposition
- Sampling data
- Maximum Likelihood Estimates
- Maximum A-Posteriori Estimate
- Bias-Variance Tradeoffs
- Hypothesis Testing and Statistical Significance
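The difference between the Maximum Likelihood and Maximum A-Posteriori estimates listed above shows up clearly in a coin-flip example. The sketch assumes a Bernoulli likelihood and a Beta(a, b) prior with a, b >= 1, so the MAP estimate is the mode of the Beta posterior. The function names and default prior are illustrative.

```python
def bernoulli_mle(heads, flips):
    """MLE of the heads probability: the observed frequency."""
    return heads / flips


def bernoulli_map(heads, flips, a=2.0, b=2.0):
    """MAP estimate under a Beta(a, b) prior: mode of Beta(heads + a, tails + b)."""
    return (heads + a - 1) / (flips + a + b - 2)


print(bernoulli_mle(7, 10))
print(bernoulli_map(7, 10))               # pulled toward 0.5 by the prior
print(bernoulli_map(7, 10, a=1.0, b=1.0))  # uniform prior: same as the MLE
```

With a uniform prior the two estimates coincide. An informative prior acts like extra pseudo-observations, and its influence shrinks as the number of flips grows.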
Handling Text
Introduction to Information Retrieval (stanford.edu)
PageRank algorithm, fully explained | by Amrani Amine | Towards Data Science
CS 230 - Recurrent Neural Networks Cheatsheet (stanford.edu)
- Representing text - Term Frequency, Inverse Document Frequency
- Bigrams, Tri-grams
- Vector Space Model
- Inverted Index - What is it, how is it useful for querying large text corpora
- Minimum edit distance - how to deal with spelling mistakes while querying text (Not ML)
- PageRank algorithm for ranking web pages (Not ML, tangentially related to text ML)
- Dimensionality reduction - Principal Component Analysis (PCA)
- Latent Semantic Indexing
- Contextualized word representation (Meaning changes with context - bear:animal, bear:tolerate) - BERT, GPT-2, GPT-3, ChatGPT, Bard, Bing AI
- Word2Vec, GloVe embeddings - static embeddings for words
- Non-Negative Matrix Factorization - optimization problem with Frobenius norm
- Recurrent Neural Networks - Image Captioning, Language translation
- LSTM, GRU
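For the minimum edit distance item above, a minimal sketch of the standard dynamic-programming (Levenshtein) recurrence follows, with unit cost for insertion, deletion, and substitution. The `suggest` helper and the tiny vocabulary are hypothetical, added only to show how the distance could drive spelling correction at query time.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))       # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, vocabulary):
    """Return the vocabulary entry closest to the (possibly misspelled) word."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))


print(edit_distance("kitten", "sitting"))
print(suggest("retreival", ["retrieval", "revival", "retail"]))
```

A real search system would typically combine this with an index over the vocabulary rather than scanning every term.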
Handling Graphs and Networks
- Walks, Paths, Euler circuit, Hamiltonian Circuit, Degrees
- Connected graph, Directed Acyclic Graph, Bipartite Graph, Tree
- Reciprocity, transitivity
- Clustering Coefficient
- Graph Neural Networks
- Random Networks - Erdős-Rényi
- Six Degrees of Separation
- Power Law degree distribution
- Homophily
- Watts-Strogatz Network
- Barabási-Albert Network
- Scale Free Networks
- Zipf's Law
- Communities in graphs, cliques, clustering - hierarchical, Girvan-Newman, Louvain, Clique Percolation
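The clustering coefficient item above can be illustrated on a toy undirected graph. The sketch assumes the usual local definition: the fraction of a node's neighbor pairs that are themselves connected, with nodes of degree below 2 scored as 0. The edge list and helper names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are connected to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in list(adj)) / len(adj)


# A triangle A-B-C with a pendant node D attached to C.
graph = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
for name in "ABCD":
    print(name, local_clustering(graph, name))
print("average:", average_clustering(graph))
```

Node C has three neighbors but only one connected pair among them, so its coefficient is 1/3, while A and B sit entirely inside the triangle and score 1.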
Game Theory
Game Theory textbook (ucdavis.edu)
Game Theory in Artificial Intelligence | by Pier Paolo Ippolito | Towards Data Science
- Utility function, payoff function, Reinforcement Learning, Q-Learning
- Strict Dominance, weak dominance
- Prisoner’s Dilemma, Pareto Superior, Pareto Optimal
- Strictly dominated strategy vs. Weakly dominated Strategy
- Nash Equilibrium, Pareto Optimal, Social Optimal
- Hawk-Dove Games
- Mixed Strategies
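To connect the Prisoner's Dilemma, Hawk-Dove, and Nash Equilibrium items above, the sketch below enumerates pure-strategy Nash equilibria of a two-player game by checking that neither player gains from a unilateral deviation. The payoff numbers are conventional illustrative values, not taken from the listed textbook.

```python
def pure_nash_equilibria(payoffs):
    """payoffs[(row_move, col_move)] = (row_payoff, col_payoff)."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r in rows:
        for c in cols:
            # Row player cannot do better by switching rows against column c.
            row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in rows)
            # Column player cannot do better by switching columns against row r.
            col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in cols)
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


prisoners_dilemma = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
hawk_dove = {
    ("hawk", "hawk"): (-1, -1),
    ("hawk", "dove"): (2, 0),
    ("dove", "hawk"): (0, 2),
    ("dove", "dove"): (1, 1),
}

print(pure_nash_equilibria(prisoners_dilemma))
print(pure_nash_equilibria(hawk_dove))
```

In the Prisoner's Dilemma the only equilibrium is mutual defection, even though mutual cooperation is Pareto superior. Hawk-Dove has two asymmetric pure equilibria, and it also has a mixed-strategy equilibrium that this enumeration does not find.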
Machine Learning beyond Neural Networks (Scikit-Learn)
Support Vector Machine — Introduction to Machine Learning Algorithms | by Rohith Gandhi | Towards Data Science
Naive Bayes Classifier. What is a classifier? | by Rohith Gandhi | Towards Data Science
Hidden Markov Model. Elaborated with examples | Towards Data Science
Support Vector Machines
- What is a margin in classification
- What are support vectors?
- Primal Function, max margin
- Min-max of Lagrangian
- Adding a slack variable to allow non-separable datasets
- Projecting to higher dimensions
- Kernel Function
- Multi-class SVM - one-vs-one, one-vs-rest
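The "projecting to higher dimensions" and "kernel function" items above are linked by the kernel trick: a kernel computes an inner product in a higher-dimensional feature space without building that space explicitly. The sketch checks this for the degree-2 polynomial kernel on 2D inputs. The explicit `feature_map` is the standard one for this kernel; the names are illustrative.

```python
import math


def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel k(x, z) = (x . z)^2 for 2D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit 3D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))
print(dot(feature_map(x), feature_map(z)))
```

Both lines print the same value up to floating-point rounding, but the kernel never forms the 3D vectors. For Gaussian kernels the implicit feature space is infinite-dimensional, so the kernel form is the only practical option.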
Naive Bayes Classifier
- Probabilistic classification
- Likelihood
- MLE
- Laplace Smoothing and Laplace Correction
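For the likelihood and Laplace smoothing items above, here is a minimal sketch of how add-alpha smoothing keeps unseen words from receiving zero probability in a Naive Bayes class model. It assumes every token belongs to the given vocabulary; the toy tokens and the function name are illustrative.

```python
from collections import Counter


def smoothed_likelihoods(tokens, vocabulary, alpha=1.0):
    """P(word | class) with add-alpha smoothing over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


spam_tokens = ["free", "win", "free"]
vocabulary = ["free", "win", "meeting"]

probs = smoothed_likelihoods(spam_tokens, vocabulary)
print(probs)
```

Without smoothing, "meeting" would have likelihood 0 for this class, and a single occurrence would zero out the whole product of likelihoods. With alpha = 1 it gets a small non-zero share, and the probabilities still sum to 1.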
Decision Trees and Random Forests
- Entropy and Information Gain
- Gini Gain
- Pruning to prevent Overfitting, Post Pruning
- Reduced Error Pruning
- K-Fold Cross Validation
- ID3 Algorithm
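The entropy, information gain, and Gini items above are the split criteria that ID3-style tree builders evaluate at each node. The sketch below computes them for a candidate split, assuming entropy in bits and the Gini impurity form 1 - sum(p_i^2). The function names and toy labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a label list."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
print(gini(parent))
```

A split that separates the classes completely recovers the full parent entropy as gain, while a split that leaves each child as mixed as the parent gains nothing.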
K-Nearest Neighbors
- What is a nearest neighbor
- Euclidean Distance, Manhattan Distance, Mahalanobis Distance
- 1NN vs K-NN
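The distance measures and the 1NN vs K-NN contrast above can be seen in a few lines. The sketch assumes majority voting among the k nearest training points; the toy data set deliberately contains one mislabeled-looking point to show how 1NN is more sensitive to noise than 3NN.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"), ((7.0, 6.0), "b"),
    ((2.5, 2.5), "b"),   # a lone "b" point sitting next to the "a" cluster
]
query = (2.4, 2.4)

print(knn_predict(train, query, k=1))
print(knn_predict(train, query, k=3))
print(knn_predict(train, query, k=3, distance=manhattan))
```

With k = 1 the single nearby outlier decides the label, while with k = 3 it is outvoted by the surrounding cluster.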
Logistic Regression
Ensemble Learning
- Ensemble Learning
- Boosting and Bagging
- AdaBoost Algorithm
- Random Forests
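For the AdaBoost item above, the core of each boosting round is the re-weighting of training examples. The sketch below shows one round under the standard binary AdaBoost formulas, assuming the weak learner's weighted error is strictly between 0 and 1. The function name and the `correct` flag list are illustrative.

```python
import math


def adaboost_round(weights, correct):
    """One AdaBoost re-weighting step.

    weights: current example weights (summing to 1)
    correct: booleans, True where the weak learner classified the example correctly
    Returns the learner's vote weight alpha and the normalized new weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
print(alpha, new_weights)
```

After the update the misclassified examples carry half of the total weight, so the next weak learner is pushed to focus on them.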
Clustering
- K-Means Clustering
- Unsupervised learning
- How to cluster based on centroids
- Hierarchical Clustering
- Probabilistic Clustering
- Gaussian Mixture Models
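The "how to cluster based on centroids" item above is the K-Means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch assumes fixed initial centroids so the run is deterministic (real implementations usually randomize them) and a fixed iteration count instead of a convergence test.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

K-Means makes hard assignments. Gaussian Mixture Models replace them with soft, probabilistic memberships fitted by Expectation Maximization.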
Hidden Markov Models
- Hidden Markov Models
- Markov Chains
- Discrete Markov processes
- Viterbi Algorithm
- Baum Welch Algorithm
- Markov Chain Monte Carlo
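For the Viterbi Algorithm item above, the sketch below decodes the most likely hidden state sequence of a small HMM. The model (two health states, three observable symptoms) is a commonly used textbook-style toy example; all probabilities and names are illustrative assumptions.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, probability of that path)."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prev, prob = max(((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                             key=lambda pair: pair[1])
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

Viterbi answers the decoding question when the model parameters are known. Baum-Welch addresses the complementary problem of estimating those parameters from observations alone.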
Market Basket Analysis
- Expectation Maximization
- Estimating when data is hidden
- Purity
- Normalized Mutual Information Gain
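The purity measure listed above scores a clustering against known class labels: each cluster is credited with its most common true label, and purity is the fraction of points covered that way. A minimal sketch, with illustrative names and toy data:

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(true_labels)


cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
print(purity(cluster_ids, true_labels))
```

Purity rises trivially as the number of clusters grows (one point per cluster gives purity 1), which is one reason a normalized measure such as mutual information is usually reported alongside it.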
Genetic Algorithms
Particle Swarm Optimization
Advanced Topics
Advanced Math/algorithms for “Learning”
Randomized Algorithms: Motwani and Raghavan (wordpress.com)
Convex Optimization: Boyd Vandenberghe
- What is a convex function
- Jensen's Inequality
- Convex Optimization
- Smooth Optimization
- Min-Cut Algorithm, Las Vegas and Monte Carlo Algorithms
- Markov and Chebyshev Inequalities
- Concentration Bounds - Chernoff, Hoeffding
- MAX-SAT Problems, Expander Graphs
- Lovász Local Lemma
- Fingerprinting and Freivalds' Algorithm
- Submodularity
- Monotone Functions
- Hessians, Cauchy-Schwarz Inequality
- Constrained Optimization vs. Unconstrained Optimization
- Pseudo Inverse of a matrix
- Explainability - LIME and SHAP Functions
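As an example of the Monte Carlo style of randomized algorithm listed above, Freivalds' algorithm checks whether A x B = C without computing the product. It multiplies both sides by a random 0/1 vector, so each round costs O(n^2) instead of O(n^3). The sketch assumes square integer matrices as nested lists and a seeded generator for reproducibility; the names are illustrative.

```python
import random


def mat_vec(matrix, vector):
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def freivalds(a, b, c, rounds=20, rng=None):
    """Return False if A x B != C is detected, True if every round passes.

    A correct product always passes. A wrong product passes a single round
    with probability at most 1/2, so `rounds` rounds give error <= 2**-rounds.
    """
    if rng is None:
        rng = random.Random(0)            # fixed seed: an assumption for repeatability
    n = len(c[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        if mat_vec(a, mat_vec(b, r)) != mat_vec(c, r):
            return False                  # definite witness that A x B != C
    return True


a = [[2, 3], [3, 4]]
b = [[1, 0], [1, 2]]
correct = [[5, 6], [7, 8]]
wrong = [[5, 6], [7, 9]]

print(freivalds(a, b, correct))
print(freivalds(a, b, wrong))
```

The error is one-sided: a "False" answer is always right, while a "True" answer is only right with high probability. That one-sided guarantee is what distinguishes this Monte Carlo check from a Las Vegas algorithm, which is always correct but has random running time.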
More ML Topics
Pattern Recognition and Machine Learning (microsoft.com)
- Hypothesis Space, Instance Space
- Version Space
- Inductive Bias, Expressivity, Decision Boundaries
- Mistake Bound Learning
- Polynomial Kernels, Gaussian Kernels
- Empirical Risk Minimization
- Subgradient Descent
- PAC Learning - Probably Approximately Correct
- k-CNF
- Agnostic Learning
- VC Dimensions - Shattering
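The shattering idea behind VC dimension above can be checked by brute force for a simple hypothesis class. The sketch uses closed intervals on the real line (a point is labeled 1 if it lies inside the interval) and tests whether every labeling of a point set is achievable. The choice of hypothesis class and the function names are illustrative assumptions.

```python
from itertools import product


def interval_can_label(points, labels):
    """True if some closed interval contains exactly the points labeled 1."""
    positives = [p for p, y in zip(points, labels) if y == 1]
    if not positives:
        return True                       # an empty interval labels everything 0
    lo, hi = min(positives), max(positives)
    # The tightest interval around the positives must exclude every negative.
    return all((lo <= p <= hi) == (y == 1) for p, y in zip(points, labels))


def shatters(points):
    """True if intervals realize all 2^n labelings of the points."""
    return all(interval_can_label(points, labels)
               for labels in product([0, 1], repeat=len(points)))


print(shatters([1.0, 2.0]))
print(shatters([1.0, 2.0, 3.0]))
```

Two points can be shattered, but for three points the labeling (1, 0, 1) is impossible for any single interval. Since that holds for any three distinct points, the VC dimension of intervals is 2.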
Current Research Topics
Attention and Transformer Models. “Attention Is All You Need” was a… | by Helene Kortschak | Towards Data Science
What are Stable Diffusion Models and Why are they a Step Forward for Image Generation? | by J. Rafid Siddiqui, PhD | Towards Data Science
Understanding Vector Quantized Variational Autoencoders (VQ-VAE) | by Shashank Yadav | Medium
Neural Operator (zongyi-li.github.io)
NeRF: Neural Radiance Fields (matthewtancik.com)
Michael Black | Perceiving Systems - Max Planck Institute for Intelligent Systems (mpg.de)
Distill - Machine Learning Journal 2016-2021
- Transformers - Attention based models, Cross attention
- Sparse Transformers - Learning vs. Memorization
- Stable Diffusion
- Neural Operators
- VQ-VAE
- Cross Modal Learning
- Neural Radiance Fields (NeRF)
- Mel-Spectrogram Synthesis through Diffusion
- Learning Embeddings
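The attention mechanism at the heart of the Transformer items above reduces to a few lines: score each query against every key, normalize the scores with softmax, and take the weighted average of the values. The sketch assumes single-head scaled dot-product attention on plain Python lists, with no masking or learned projections; the toy inputs are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


queries = [[1.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]           # identical keys: equal attention weights
values = [[0.0, 2.0], [4.0, 6.0]]

outputs, weights = attention(queries, keys, values)
print(weights)
print(outputs)
```

In self-attention the queries, keys, and values all come from the same sequence. In cross attention the queries come from one sequence and the keys and values from another.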