# Heterogeneous Computing for Signal and Data Processing

**Parallel computing with GPUs and other devices**

**Course number: EECS E4750**

**(Original name: Signal Processing and Communications on Mobile Multicore Processors)**

**Prof. Zoran Kostic, Electrical Engineering Department, Data Sciences Institute, Columbia University in the City of New York**

**Target Audience:**

**Students interested in acquiring software and systems design skills in parallel computing for graphics processing units (GPUs) and heterogeneous computing infrastructure, relevant to applications in data processing, deep learning, signal and communications industries.**

**Bulletin Description:**

Methods for deploying signal and data processing algorithms on contemporary general purpose graphics processing units (GPGPUs) and heterogeneous computing infrastructures. Using programming languages such as OpenCL and CUDA for computational speedup in audio, image and video processing and computational data analysis. Significant design project.

**Dates:**

Fall 2022: EECS E4750

Fall 2021, 2020, 2019, 2018, 2017, 2016: EECS E4750

Fall 2015, 2014: ELEN E4750 : SP & COMM ON MOBILE MULTI PROC

**Content**

Applications of Parallel Computing

Graphics Processing Unit (**GPU**) architecture and programming.

Heterogeneous Parallel Computing (HPC)

Parallel SW development in OpenCL and CUDA, Apple Metal, Vulkan, other standards.

Motivating examples from imaging, audio, multimedia, deep learning

Cross section of mobile processor architectures: Nvidia, AMD, Intel

General Purpose Processors, Graphics Processing Units (GPUs), DSPs

ARM architecture

Parallel programming concepts for mobile platforms

CUDA and OpenCL languages

Tools: development environments, code development, profiling

Standards: Khronos OpenGL, WebGL, HSA

Parallel programming examples

Signal processing

Image and video processing

Neural networks and deep learning

Communications processing, protocols

Data Analysis

Power Considerations

**Syllabus Details**

**Theory, CUDA, OpenCL:**

Portability and Scalability in HPC

Data Parallelism and Threads

Memory Hierarchy

Memory Allocation and Data Movement

Kernel-Based Parallel Programming

Memory Bandwidth and Coalescing

Matrix-Matrix Multiplications

Threads, Warps and Wavefronts

Thread Scheduling

Tiled Processing for 1D, 2D

Control Divergence

Convolution and Tiled Convolution

Reduction Kernels

Atomic Operations

Histogram Kernel

Applications: Deep Learning, Imaging, Video, ...

Profiling and Debugging

**Project Suggestions for implementation in CUDA or OpenCL**

Image processing

Audio processing

Machine learning

Deep Learning algorithm parallelization

Optimization of communication networks

Optimization of energy networks

Medical applications

Graphics

Video processing

Visualization

Financial applications

**Books, Tools and Resources**

BOOKS:

David Kirk and Wen-mei Hwu, "Programming Massively Parallel Processors - A Hands-on Approach," 3rd Edition, publisher: Elsevier, eBook ISBN: 9780128119877, Paperback ISBN: 9780128119860 (https://www.elsevier.com/books/programming-massively-parallel-processors/kirk/978-0-12-811986-0)

Old book: D. Kirk and W. Hwu, “Programming Massively Parallel Processors – A Hands-on Approach,” 2nd Edition, Morgan Kaufmann Publishers (Elsevier), ISBN-13: 978-0124159921, ISBN-10: 0124159923 (http://www.elsevier.com/books/programming-massively-parallel-processors/kirk/978-0-12-415992-1)

OpenCL Programming by Example, Ravishekhar Banger, Koushik Bhattacharyya, Packt Publishing (December 23, 2013), ISBN: 1849692343, ISBN-13: 9781849692342

Parallel machines:

Google Cloud GPUs

Server with NVIDIA Tesla K40 + Nvidia Quadro K5000s + Mobile: Jetson TK1 + Intel Xeon E5-1620v

SDKs (software development kits) by NVIDIA, Intel...

**2014 Fall Projects**

Low Rank Matrix Recovery Using Principal Component Analysis

Acceleration of Genetic Algorithms and Image Pattern Recognition of fMRI Fingerprint

Parallel Implementations of Detection Algorithms for MIMO Systems on the GPU

Harnessing GPU for solving Options Pricing problems in Financial Engineering

Topics Extraction with GPU Acceleration (machine learning)

Parallel Decoding of Space Time Codes on GPU

Image processing using parallel computing and PyOpenCL (Night vision)

**2015 Fall Projects**

3D Image Reconstruction (Stereo Vision Based Depth Perception & 3D Spatial Reconstruction)

Accelerate the Analysis of EEG Signal Based on Nonlinear Feature Extraction and Classification by Parallel Algorithm

Fast VOIP MOS (Mean Opinion Score) calculation

GPU Acceleration for Neural Network based Handwritten Digits Recognition

Image Matching Accelerator based on SIFT

Image blending

Image Stitching

Local Linear Embedding using OpenCL

Performance of Linear Equalization in Narrowband Channels

Parallel Computing on SAR Image Processing

Canny Edge and Boundary Detection using OpenCL

Parallel HEVC Video Compression Using OpenCL

Speaker Recognition

**2016 Fall Projects**

3D Voxel De-blurring

Laplacian Approximation on GPU

Basic object recognition

Camera Localization

Dark channel haze removal

Disparity Map Calculation by GPU

FPGA Implementation of PyOpenCL/OpenCL

First Principles MPI Simulator

Fingerprints recognition for security

GPU acceleration for SPH (Smooth Particle Hydrodynamics)

K-means Clustering Acceleration on GPU

Kinect Color and Depth Image Alignment

GPU-based Monte Carlo simulation of light transport for optical fiber probe geometries

Object Tracking Based on Video Analysis

Parallel Computing in Traffic Sign Detection

Real-time medical image processing empowered by parallel computing

Parallel simulated annealing

Recommendation Algorithms using deep learning

Real-time image de-hazing

Smart Cluster Construction

**2017 Fall Projects**

Parallel Methods for Image Deblurring

Parallel Locality-Sensitive Hashing In Movement and Gesture Recognition

3D Wifi Mesh Generation

Sudoku Solver with Parallel Backtracking

Parallel Processing for Analysis Purpose

Implement a game playing system

3D Image Reconstruction (Stereo Vision Based Depth Perception & 3D Spatial Reconstruction)

Vector representation for words

Iris detection in biometrics

Parallel Computation in Image Registration

3D Human Action Recognition

Fractal Image Compression in Parallel Computing platform

Parallelized Object Tracking with KCF

Parallelized Stock Market Prediction

Parallelized Monte Carlo Methods in Reinforcement Learning

Sparse Representation Based Image Super Resolution

Neural Network Based Artistic Style Transfer Algorithm

Face recognition with CNN

Object Detection using Cascade Classifier

**2018 Fall Projects**

3D rendering

Parallelization for KL Divergence Non-Negative Matrix Factorization

Deconvolution using Richardson–Lucy approach

3D City Adventure Game

Acceleration of fiber orientation analysis and visualization for optical coherence tomography imaging

GPU accelerating 3D Reconstruction from sketches via Multi-view convolutional networks

Using Parallel Computing to Accelerate Artificial Neural Network

Mutual Information Based Semi Global Stereo Matching

Parallel SGD

SLAM solver

Accelerating forward propagation of ZF-Net

CUDA and OpenCL accelerating for typical deep learning algorithms

Rapid optical coherence tomography image acquisition

Parallel Optimization for Deep Learning

Finite Difference Time Domain Simulation

**2019 Fall Projects**

GPU Acceleration of Canny Edge Detection for Images

Accelerating Discrete Wavelet Transforms on Parallel Architectures

Acceleration of K-means Clustering using PyCUDA

Acceleration of Point Correspondence Model Construction in Cone-Beam CT Scans

CUDA implementation of Recommender Systems

Data Augmentation Techniques for Deep Learning: Comparing CUDA, OpenCL, and Serial Implementations

Hardware Acceleration of Secure Hashing

Image Preprocessing for Convolutional Neural Networks using Parallel Computing

Accelerated Singular Value Decomposition for Principal Component Analysis

Parallel Particle Swarm Optimization

Acceleration of Spectral Domain Phase Microscopy for Vibrometry in Spectral Domain Optical Coherence Tomography

Spring mass system simulation

Efficient Primitives for CP and Tucker Tensor Decompositions on GPUs

Acceleration of Deep Learning Algorithm U-net to make predictions of drivers' workload

Particle filter For Mobile Robot localization

**2020 Fall Projects**

Acoustic Feature Extraction Acceleration

Acceleration of Multiple Signal Classification (MUSIC) Algorithm for Frequency Estimation

Parallel Acceleration for Cross-Lingual Relation Extraction

Acceleration of Demosaicing Bayer Color Filter Arrays (CFA)

Efficient CNN for Video Understanding

Acceleration of Long Short-Term Memory (LSTM) on GPU

Parallel linear programming solver

Acceleration of a Spiking Neural Network

Acceleration of Support Vector Machine (SVM) algorithm

**2021 Fall Projects**

3D Virtual Acoustics Using CUDA

Acceleration of Image Haze Removal Using Dark Channel Prior

Acceleration of LeNet on GPU with PyCUDA

Acceleration of k-means algorithm

Acceleration of GloVe representation

Convert Data Reduction into Actual Computational Efficiency: Customized Parallelization of Convolution with Sparse Mask

Parallelizing class imbalance problem using SMOTE

Deep reinforcement learning with GPU acceleration

Heterogeneous Stock Inference via Scheduling

Parallelize RANSAC

Parallel Random Forests

Parallelizing Nonlocal Means (NLM) denoising algorithm for 3D images

Speedup genetic algorithm using parallel computation

Non-negative Matrix Factorization

Fast and High Precision Multi-resolution Engine using Parallel Processors