cs179/GPU Machine Learning and cuDNN I

The use of Graphics Processing Units for rendering is well known, but their power for general parallel computation has only recently been explored. Parallel algorithms running on GPUs can often achieve up to 100x speedup over similar CPU algorithms, with many existing applications for physics simulations, signal processing, financial modeling, neural networks, and countless other fields.
This course will cover programming techniques for the GPU. The course will introduce NVIDIA’s parallel computing language, CUDA. Beyond covering the CUDA programming model and syntax, the course will also discuss GPU architecture, high performance computing on GPUs, parallel algorithms, CUDA libraries, and applications of GPU computing.

Problem sets will cover performance optimization and specific GPU applications such as numerical mathematics, medical imaging, finance, and other fields.

Course Link

whoat is ML GOOD for?

  • Handwriting recognition
  • Spam detection
  • Teaching a rebot how to do a backflip
  • Predicting the performance of a stock portfolio

What is ML

What do these problems have in common?

  • Some pattern we want to learn
  • No good closed-form model for it
  • LOTS of data

What can we do?
Use data to learn a statistical model for the pattern we are interested in.

One data point is a vector 𝑥 in ℝ^𝑑

  • A 30×30 pixel image is a 900-dimensional vector (one component per pixel intensity)
  • If we are classifying an email as spam or not spam, set 𝑑= number of words in dictionary
    Count the number of times 𝑛_𝑖 that a word 𝑖 appears in an email and set 𝑥_𝑖=𝑛_𝑖

what are we trying to do

Given an input 𝑥∈ℝ^𝑑, produce an output 𝑦. What is 𝑦?

  • Could be a real number, e.g. predicted return of a given stock portfolio
  • Could be 0 or 1, e.g. spam or not spam
  • Could be a vector in ℝ^𝑚, e.g. telling a robot how to move each of its 𝑚 joints

example of (x, y) pairs:
"pairs"

notation
"notation"

statistical models

  • Given (𝐗, 𝐘) (𝑁 pairs of (𝑥^((𝑖)), 𝑦^((𝑖) ) ) data), how do we accurately predict an output 𝑦 given an input 𝑥?
  • One solution: a model 𝑓(𝑥) parametrized by a vector (or matrix) 𝑤, denoted as 𝑓(𝑥;𝑤).The task is finding a set of optimal parameters 𝑤