This page provides a list of topics and resources to build a good undergraduate-level foundation in Machine Learning, Linear Algebra, Artificial Intelligence, Information Retrieval, Social Networks, Deep Learning, and Statistics.


Foundation/Basics

Deep Learning - NPTEL IIT M
MIT 6.S191: Introduction to Deep Learning
Machine Learning: Tom Mitchell
Various Optimization Algorithms For Training Neural Network | Towards Data Science
Activation Functions in Neural Networks | by SAGAR SHARMA | Towards Data Science
Common Loss functions in machine learning | by Ravindra Parmar | Towards Data Science
Artificial Intelligence - All in One - YouTube
DeepLearningAI - YouTube


Conceptual Introduction
  • Perceptron - what is a perceptron, how does it learn linear functions, and which logic gates can it learn? What about XOR? (See the perceptron sketch after this list.)
  • Linear functions vs. non-linear functions
  • How to deal with non-linearity? Activation functions - sigmoid, ReLU, tanh (where to use each activation - binary classification, multi-class classification, softmax-style outputs, regression)
  • Regression vs. classification
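
A minimal sketch (plain NumPy, with a made-up learning rate and epoch count) of the classic perceptron update rule learning the AND gate; the same loop never converges on XOR, which is what motivates hidden layers and non-linear activations.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic perceptron rule: w += lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable: learnable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable: a single perceptron fails

w, b = train_perceptron(X, y_and)
print("AND predictions:", [int(xi @ w + b > 0) for xi in X])
```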
Math behind “Learning”
  • Minimization of loss
  • Dynamic Learning Rate
  • Backpropagation math
  • Difference between overfitting and underfitting - what happens when you can fully learn the training data but cannot generalize to the test data
  • Momentum - how to converge more smoothly toward the function you are trying to learn (see the gradient-descent sketch after this list)
  • Exponential smoothing - how is momentum a smoothing function?
  • Nesterov’s accelerated gradient - Look ahead momentum
  • AdaGrad, RMSProp, Adam Optimizers
  • Different Types of Errors
  • Mean Squared Error
  • L1 Loss, L2 Loss
  • Regularization - What is it? Lasso and Ridge
  • Dropout - why is it necessary
  • Batch Normalization vs. Layer Normalization
  • Binary Cross Entropy
  • What is Entropy?
  • KL Divergence - how similar or different are two distributions? In what sense does it measure a "distance" between distributions?
  • Precision, Accuracy, Recall, F1 Score, ROC Curve
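
A toy sketch, using nothing beyond plain Python, of how momentum is an exponentially smoothed average of past gradients; the quadratic loss and hyperparameters are illustrative only.

```python
# Minimize f(w) = (w - 3)^2 with vanilla gradient descent vs. gradient descent
# with momentum. The velocity v is an exponentially smoothed average of past
# gradients, which damps oscillations and keeps progress along a consistent trend.
def grad(w):
    return 2.0 * (w - 3.0)

lr, beta = 0.1, 0.9
w_gd, w_mom, v = 0.0, 0.0, 0.0
for _ in range(50):
    w_gd -= lr * grad(w_gd)                     # plain gradient descent
    v = beta * v + (1 - beta) * grad(w_mom)     # exponential smoothing of gradients
    w_mom -= lr * v                             # momentum update

print(f"plain GD: {w_gd:.4f}, momentum: {w_mom:.4f} (minimum at 3.0)")
```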
Handling Images, Graphics (Computer Vision)

A Gentle Introduction to Generative Adversarial Networks (GANs) - MachineLearningMastery.com
Unsupervised Feature Learning and Deep Learning Tutorial (stanford.edu)
Convolutional Neural Networks, Explained | by Mayank Mishra | Towards Data Science

  • Autoencoders - reducing dimensions; Variational Autoencoders - reconstruction loss plus a KL-divergence term on the latent distribution
  • Denoising Autoencoders
  • Convolutional Neural Networks - 2D filters, 3D filters (see the PyTorch sketch after this list)
  • Activation Map, Max Pooling, Average Pooling
  • Vanishing Gradient
  • Generative Adversarial Networks
  • Fine tuning and Transfer Learning
  • YOLO models
  • DALL-E
  • When do GANs converge - Nash equilibrium. Why is the training called adversarial?
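
A minimal PyTorch sketch of the convolution / activation-map / max-pooling pipeline listed above; the layer widths, kernel sizes, and 28x28 input are arbitrary choices for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn

# Minimal CNN: 2D convolution filters, ReLU activation maps, max pooling,
# then a linear classifier on the flattened feature map.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 learned 2D filters
            nn.ReLU(),                                    # activation map
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
dummy = torch.randn(4, 1, 28, 28)   # batch of 4 grayscale 28x28 images
print(model(dummy).shape)           # torch.Size([4, 10])
```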
Artificial Intelligence

MIT 6.034 Artificial Intelligence, Fall 2010
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence: A Modern Approach" (berkeley.edu)
Artificial Intelligence: A Modern Approach, Russell and Norvig, 3rd Edition

  • Solving problems by searching for the solution in a space of solutions
  • Intelligent agents - Rationality
  • Uninformed search - DFS, BFS, Depth limited search, iterative deepening search
  • Search strategies - greedy heuristic search, A* search (see the A* sketch after this list)
  • Local Search - Hill Climbing search, Stochastic Hill Climbing, Simulated annealing, beam search, gradient descent, Genetic Algorithm (Important)
  • Adversarial search, games, Alpha Beta Pruning, Minimax games - Tic Tac Toe
  • Constraint Satisfaction problems - Map Coloring with constraints
  • Propositional Logic vs. First Order Logic
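
A compact sketch of A* search on a toy grid, using Manhattan distance as the admissible heuristic; the grid size and wall positions are made up for illustration.

```python
import heapq

# A* on a small grid: expand nodes by f(n) = g(n) + h(n), where g is the path
# cost so far and h is the Manhattan-distance heuristic to the goal.
def a_star(start, goal, walls, size=5):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists

print(a_star((0, 0), (4, 4), walls={(1, 1), (2, 2), (3, 3)}))
```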
Python Tools for ML

Blog (machinelearningmastery.com)
Towards Data Science

  • PyTorch - different types of neural networks
  • TensorFlow/Keras
  • scikit-learn - other ML models
  • Hugging Face - pretrained language models
  • spaCy, NLTK - NLP tools
Linear Algebra and Stats

Introduction to Linear Algebra, 4th Edition (aiu.edu)
A First Course in Probability: Sheldon Ross
Probability, Random Variables, Random Signal Principles: Peyton Z Peebles

  • Bayes' theorem and Bayesian inference
  • Joint Probability
  • Correlation and Covariance
  • Conditional Probability and Independence
  • Multivariate Random Variable
  • Expectation, Binomial RV, Gaussian RV, Poisson RV
  • Identity, Inverse, Diagonal, Orthogonal, Orthonormal matrices
  • Eigenvalues and Eigenvectors
  • Eigendecomposition
  • Singular Value Decomposition (see the NumPy sketch after this list)
  • Sampling data
  • Maximum Likelihood Estimates
  • Maximum A-Posteriori Estimate
  • Bias-Variance Tradeoffs
  • Hypothesis Testing and Statistical Significance
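
A short NumPy sketch verifying the eigendecomposition of a symmetric matrix and the SVD of a rectangular one; the matrices are arbitrary examples.

```python
import numpy as np

# Eigendecomposition of a symmetric matrix: A = V diag(w) V^T.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # symmetric -> real eigenvalues, orthonormal eigenvectors
print("eigenvalues:", eigvals)
print("A reconstructed:", np.allclose(A, eigvecs @ np.diag(eigvals) @ eigvecs.T))

# SVD of a rectangular matrix: B = U diag(s) V^T.
B = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(B, full_matrices=False)
print("singular values:", s)
print("B reconstructed:", np.allclose(B, U @ np.diag(s) @ Vt))
```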
Handling Text

Introduction to Information Retrieval (stanford.edu)
PageRank algorithm, fully explained | by Amrani Amine | Towards Data Science
CS 230 - Recurrent Neural Networks Cheatsheet (stanford.edu)

  • Representing text - term frequency, inverse document frequency (TF-IDF; see the sketch after this list)
  • Bigrams, Tri-grams
  • Vector Space Model
  • Inverted Index - What is it, how is it useful for querying large text corpora
  • Minimum edit distance - how to deal with spelling mistakes while querying text (Not ML)
  • PageRank algorithm for Google page ranking (Not ML, tangentially related to text ML)
  • Dimensionality reduction - Principal Component Analysis (PCA)
  • Latent Semantic Indexing
  • Contextualized word representations (meaning changes with context - bear: animal vs. bear: tolerate) - Word2Vec, BERT, GPT-2, GPT-3, ChatGPT, Bard, Bing AI
  • GloVe embeddings - static embeddings for words
  • Non-Negative Matrix Factorization - optimization problem with Frobenius norm
  • Recurrent Neural Networks - Image Captioning, Language translation
  • LSTM, GRU
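
A minimal sketch of the vector space model, assuming scikit-learn is installed: documents become TF-IDF vectors and a query is matched by cosine similarity. The toy corpus and query are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vector space model: documents as TF-IDF vectors, queries ranked by cosine similarity.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)       # one TF-IDF vector per document

query_vector = vectorizer.transform(["cat on a mat"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(sorted(zip(scores, docs), reverse=True)[0])  # best-matching document
```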
Handling Graphs and Networks
  • Walks, Paths, Euler circuit, Hamiltonian Circuit, Degrees
  • Connected graph, Directed Acyclic Graph, Bipartite Graph, Tree
  • Reciprocity, transitivity
  • Clustering Coefficient
  • Graph Neural Networks
  • Random Networks - Erdős-Rényi model (see the networkx sketch after this list)
  • Six Degrees of Separation
  • Power Law degree distribution
  • Homophily
  • Watts-Strogatz network
  • Barabási-Albert network
  • Scale Free Networks
  • Zipf's Law
  • Communities in graphs, cliques, clustering - hierarchical, Girvan-Newman, Louvain, Clique Percolation
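
A quick sketch, assuming the networkx package, comparing average clustering coefficient and maximum degree across the Erdős-Rényi, Watts-Strogatz, and Barabási-Albert models; the parameters (n, p, k, m) are arbitrary.

```python
import networkx as nx

# Three classic random-graph models with roughly comparable average degree.
n = 1000
er = nx.erdos_renyi_graph(n, p=0.01, seed=0)           # Erdős-Rényi
ws = nx.watts_strogatz_graph(n, k=10, p=0.1, seed=0)   # Watts-Strogatz small world
ba = nx.barabasi_albert_graph(n, m=5, seed=0)          # Barabási-Albert scale free

for name, g in [("ER", er), ("WS", ws), ("BA", ba)]:
    print(name,
          "avg clustering:", round(nx.average_clustering(g), 3),
          "max degree:", max(d for _, d in g.degree()))
```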
Game Theory

Game Theory textbook (ucdavis.edu)
Game Theory in Artificial Intelligence | by Pier Paolo Ippolito | Towards Data Science

  • Utility function, payoff function, Reinforcement Learning, Q-Learning
  • Strict Dominance, weak dominance
  • Prisoner’s Dilemma, Pareto superior, Pareto optimal (see the payoff-matrix sketch after this list)
  • Strictly dominated strategy vs. Weakly dominated Strategy
  • Nash Equilibrium, Pareto Optimal, Social Optimal
  • Hawk-Dove Games
  • Mixed Strategies
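
A small sketch that enumerates the pure-strategy Nash equilibria of the Prisoner’s Dilemma by brute force; the payoff numbers are one conventional choice. Note that mutual defection is the only equilibrium even though mutual cooperation is Pareto superior.

```python
import itertools
import numpy as np

# Prisoner's Dilemma payoffs (row player, column player); 0 = cooperate, 1 = defect.
# A profile is a Nash equilibrium if neither player can gain by deviating unilaterally.
row = np.array([[-1, -10],
                [ 0,  -5]])
col = np.array([[-1,   0],
                [-10, -5]])

nash = []
for r, c in itertools.product(range(2), range(2)):
    best_row = row[r, c] >= row[:, c].max()   # row player cannot improve
    best_col = col[r, c] >= col[r, :].max()   # column player cannot improve
    if best_row and best_col:
        nash.append((r, c))

print("pure-strategy Nash equilibria:", nash)   # [(1, 1)] -> both defect
```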
Machine Learning beyond Neural Networks (Scikit-Learn)

Support Vector Machine — Introduction to Machine Learning Algorithms | by Rohith Gandhi | Towards Data Science
Naive Bayes Classifier. What is a classifier? | by Rohith Gandhi | Towards Data Science
Hidden Markov Model. Elaborated with examples | Towards Data Science

Support Vector Machines
  • What is a margin in classification
  • What are support vectors?
  • Primal Function, max margin
  • Min-max of Lagrangian
  • Adding a slack variable to allow non-separable datasets
  • Projecting to higher dimensions
  • Kernel Function
  • Multi-class SVM - one-vs-one, one-vs-rest (see the scikit-learn sketch after this list)
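
A scikit-learn sketch (assuming the library and its bundled Iris dataset) of a soft-margin SVM with an RBF kernel; C and gamma are left at common defaults rather than tuned.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# RBF kernel projects implicitly to a higher-dimensional space; C controls the
# slack (soft margin). SVC handles the multi-class case by one-vs-one voting.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)
```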
Naive Bayes Classifier
  • Probabilistic classification
  • Likelihood
  • MLE
  • Laplace Smoothing (also called Laplace Correction; see the sketch after this list)
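
A toy Naive Bayes sketch with scikit-learn: word-count features and alpha=1.0 for Laplace (add-one) smoothing, so unseen words do not zero out a class likelihood. The spam/ham documents and labels are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money now", "meeting scheduled for monday",
        "win money instantly", "project status meeting notes"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = not spam (toy data)

vec = CountVectorizer().fit(docs)          # bag-of-words counts as features
clf = MultinomialNB(alpha=1.0)             # alpha=1.0 -> Laplace smoothing
clf.fit(vec.transform(docs), labels)
print(clf.predict(vec.transform(["free money meeting"])))
```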
Decision Trees and Random Forests
  • Entropy and Information Gain
  • Gini Gain
  • Pruning to prevent Overfitting, Post Pruning
  • Reduced Error Pruning
  • K-Fold Cross Validation (see the scikit-learn sketch after this list)
  • ID3 Algorithm
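
A scikit-learn sketch: a decision tree grown with the entropy (information-gain) criterion and scored by 5-fold cross-validation. Note that scikit-learn implements CART rather than ID3, and max_depth=3 here is an arbitrary pre-pruning choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Splits chosen to maximize information gain; limiting depth is a simple
# pre-pruning control against overfitting.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)   # 5-fold cross-validation
print("5-fold CV accuracy:", round(scores.mean(), 3))
```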
K-Nearest Neighbors
  • What is a nearest neighbor
  • Euclidean Distance, Manhattan Distance, Mahalanobis Distance
  • 1NN vs K-NN
Logistic Regression
Ensemble Learning
  • Ensemble Learning
  • Boosting and Bagging
  • AdaBoost Algorithm
  • Random Forests
Clustering
  • K-Means Clustering (see the scikit-learn sketch after this list)
  • Unsupervised learning
  • How to cluster based on centroids
  • Hierarchical Clustering
  • Probabilistic Clustering
  • Gaussian Mixture Models
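
A sketch contrasting hard centroid-based assignments (k-means) with soft probabilistic ones (a Gaussian mixture) in scikit-learn; the two well-separated Gaussian blobs are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two synthetic 2D clusters centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("k-means centroids:\n", kmeans.cluster_centers_.round(2))     # hard assignments

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("GMM soft assignment of first point:", gmm.predict_proba(X[:1]).round(3))
```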
Hidden Markov Models
  • Hidden Markov Models
  • Markov Chains
  • Discrete Markov processes
  • Viterbi Algorithm (see the sketch after this list)
  • Baum Welch Algorithm
  • Markov Chain Monte Carlo
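
A NumPy sketch of the Viterbi algorithm on the classic two-state (Rainy/Sunny) HMM with walk/shop/clean observations; the probability tables are illustrative.

```python
import numpy as np

# Viterbi: most likely hidden state sequence given the observations.
states = ["Rainy", "Sunny"]
start = np.array([0.6, 0.4])                          # initial state probabilities
trans = np.array([[0.7, 0.3], [0.4, 0.6]])            # trans[i, j] = P(state j | state i)
emit = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])   # emit[i, o]  = P(obs o | state i)
obs = [0, 1, 2]                                       # observed: walk, shop, clean

v = start * emit[:, obs[0]]                           # probability of best path ending in each state
back = []
for o in obs[1:]:
    scores = v[:, None] * trans * emit[:, o]          # scores[i, j]: best path ending i -> j
    back.append(scores.argmax(axis=0))                # remember the best predecessor of j
    v = scores.max(axis=0)

path = [int(v.argmax())]                              # trace back from the final best state
for ptr in reversed(back):
    path.insert(0, int(ptr[path[0]]))
print([states[s] for s in path])                      # e.g. ['Sunny', 'Rainy', 'Rainy']
```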
Market Basket Analysis
  • Expectation Maximization
  • Estimating when data is hidden
  • Purity
  • Normalized Mutual Information (NMI)
Genetic Algorithms
Particle Swarm Optimization
Advanced Topics
Advanced Math/algorithms for “Learning”

Randomized Algorithms, Motwani and Raghavan (wordpress.com)
Convex Optimization: Boyd and Vandenberghe

  • What is a convex function
  • Jensen's Inequality
  • Convex Optimization
  • Smooth Optimization
  • Min-Cut Algorithm, Las Vegas and Monte Carlo Algorithms
  • Markov and Chebyshev Inequalities
  • Concentration Bounds - Chernoff and Hoeffding bounds
  • MAX-SAT problems, expanding graphs
  • Lovász Local Lemma
  • Fingerprinting and Freivalds' algorithm
  • Submodularity
  • Monotone Functions
  • Hessians, Cauchy-Schwarz Inequality
  • Constrained Optimization vs. Unconstrained Optimization
  • Pseudo-inverse of a matrix (see the NumPy sketch after this list)
  • Explainability - LIME and SHAP
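
A NumPy sketch of the Moore-Penrose pseudo-inverse giving the least-squares solution of an overdetermined system; the random system and noise level are arbitrary.

```python
import numpy as np

# For a tall matrix A, pinv(A) @ b is the least-squares solution of A x ≈ b.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                       # 20 equations, 3 unknowns
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=20)        # noisy right-hand side

x_pinv = np.linalg.pinv(A) @ b                     # pseudo-inverse solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]     # same result via least squares
print(np.allclose(x_pinv, x_lstsq), x_pinv.round(3))
```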
More ML Topics

Pattern Recognition and Machine Learning (microsoft.com)

  • Hypothesis Space, Instance Space
  • Version Space
  • Inductive Bias, Expressivity, Decision Boundaries
  • Mistake Bound Learning
  • Polynomial Kernels, Gaussian Kernels
  • Empirical Risk Minimization
  • Sub Gradient Descent
  • PAC Learning - Probably Approximately Correct
  • K-CNF
  • Agnostic Learning
  • VC Dimensions - Shattering
Current Research Topics

Attention and Transformer Models | by Helene Kortschak | Towards Data Science
What are Stable Diffusion Models and Why are they a Step Forward for Image Generation? | by J. Rafid Siddiqui, PhD | Towards Data Science
Understanding Vector Quantized Variational Autoencoders (VQ-VAE) | by Shashank Yadav | Medium
Neural Operator (zongyi-li.github.io)
NeRF: Neural Radiance Fields (matthewtancik.com)
Michael Black | Perceiving Systems - Max Planck Institute for Intelligent Systems (mpg.de)
Distill - Machine Learning Journal 2016-2021

  • Transformers - attention-based models, cross-attention (see the attention sketch after this list)
  • Sparse Transformers - Learning vs. Memorization
  • Stable Diffusion
  • Neural Operators
  • VQ-VAE
  • Cross Modal Learning
  • Neural Radiance Fields (NeRF)
  • Mel-Spectrogram Synthesis through Diffusion
  • Learning Embeddings
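
A minimal NumPy sketch of scaled dot-product attention, the core operation behind Transformers; the shapes, and the separate key/value sequence standing in for cross-attention, are illustrative.

```python
import numpy as np

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions (from another sequence: cross-attention)
V = rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)   # (4, 8)
```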
