Matt Willard


Grokking Algorithms

Introduction to Algorithms

Remember:

  1. Linear search is one after the other through a list and is very slow
  2. Binary search cuts each operation in half, splitting the sorted list every time: way faster in terms of Big O

Big O notiation

-number of steps it takes to finish an operation
-certain types of algos and sorts can modify the number of steps and that’s great for speed and performance
-you can use logarithms to talk about Big O
see this is important because some algorithms grow in speed and complexity at different rates
-> a logarithm is a REVERSE exponent: when you say O(log n), it’s like saying n to the log power
COMMON RUN TIMES
O(log n), also known as log time. Example: Binary search.
O(n), also known as linear time. Example: Simple search.
O(n * log n). Example: A fast sorting algorithm, like quicksort
O(n2). Example: A slow sorting algorithm, like selection sort
O(n!). Example: A really slow algorithm, like the traveling salesperson

Selection Sort

Arrays

Remember that arrays are sets of information in a row of memory blocks
PRO: it’s easier to jump to different spots in the array to find things you want right away, such as arr(i) = fast reads
CON: they always usually have to be one after the other, otherwise the whole array needs to get removed and readded into new memory with the extra stuff you need
-this is why when declaring an array you often need to declare the size at the get-go

Linked Lists

PRO: Unlike arrays, each item in a linked list has the value and the link to the next value’s memory address, so they can use memory more efficiently = fast inserts/deletes
-This also makes it easy to insert into the middle of the list by creating new links.
CON: Unlike arrays, you must iterate through a linked list to find something, you can’t just easily jump directly to it because you don’t know which address it’s at

BIG O NOTATION
Read array: O(1) | read list: O(n)
Insert array: O(n) | insert list: O(1)

Selection Sorting

Not that fast!

  1. Look through list and find highest
  2. Pop this from old list and add to new list
  3. Repeat with old list until done BIG O: O(n2) = O(n steps, * n times)

Recursion

In recursion, a function calls itself and can be a way to repeat things in a loop, sometimes providing new information for the next loop

Base and recursive case

base case: the case when the function doesn’t call itself
recursive case: the other times where it repeats

function countdown(i)  
	print i  
	if (i = 1) {  
		return  
	} else {  
		countdown(i-1)  
	}  

The stack

Remember that a stack is a data structure, like an array or a list operates under FIFO: first in, first out

Call Stack: determines the order in which functions are called based on the order they’re put on the stack
-as a result, stuff that’s partially called is paused until the top item completes
-remember! ASYNC programming works around this by having multiple operation threads go at once
When considering recursion: each recursive call thus goes on the call stack and works through in order: so if you need to, build a stack visual to work through to understand how a function is being recursive
THE TRADEOFF: this can eat a lot of memory, so there are alternatives like looping

Quicksort

Divide and Conquer

It’s about splitting the problem into a smaller sub-problem or series of sub-problems to solve, which is easier
For a D&C algo:
1. Find the base case which is the simplest possible case 2. divide the problem until it becomes your base case IE, can you solve a WAY SIMPLER version of the problem first and scale?

Let’s add 2,4,6 with a recursive function.
BASE CASE: What’s the sum of an array of just 0 items? {0} = 0.
Then:
SUM([2,4,6]) => 2 +SUM([4,6]) => you call this recursively

def sum(list): //Python
	if list == []: //empty list
		return 0
	return list[0] + sum(list[1:]) //use a sliced list to run sum again with a smaller array

Remember that the stack of functions then resolves in order

Quicksort

With quicksort:

  1. pick a pivot in the array to sort by
  2. group into sub-arrays less than the pivot and greater than the pivot
  3. call quicksort on the sub-arrays to do it again until all sub-arrays are sorted
  4. combine them all together

The base case for quicksort is an array with 1 or 0 elements: then you can just return that as is. This is a lot quicker than select sort!

7 5 13 4 9 //13 is pivot
7 5 4 9 | [13] // 13 is returned; call quicksort on the first array (let’s say we do five now)
4 | [5] | 7 9
=> and you repeat until done!

Merge sort VS. quick sort

Merge soft is O(n log n), faster than quicksort’s O(n^2) worst case time.
=> HOWEVER, depending on where the pivot is chosen, this speed can vary.
-Quicksort gets faster, for example, if you choose the middle item of a sorted array to shorten the call stack
-In practice, since quicksort’s constant average time is hit way more often, it can be faster than merge sort
–BASICALLY ON AVERAGE, if you pick a random element to pivot on, the average sort time of quicksort is O(n log n)

Hash Tables

Remember that a hash table is an array of linked lists. (A common one: an array of alphabet letters and each letter has a list of words. Google uses this for search text completion.) Often they’re full of key:value pairs.
SORTING one is super slow since you have to iterate through each part of each linked list. However, SEARCHING one, with a few caveats, is very fast. The average speed is O(1)! Just one constant step all the time!
Many languages have hash tables. Python’s dictionary is a hash table.

Hash functions

A hash function will take any string and spit out the same number for it every time. Hash functions also never assign two words to the same number to avoid double assignment.
These are used to populate hash tables - simply hash, and boom, an index where you can store the word. Need to find a value for a word? Pop in the word in your hash function, get the number, and boom, you’re done.

Use cases

=> They’re great for lookup: search terms, websites, phonebooks, etc.
=> You can use them to check and prevent duplicate entries via the hash function.
=> For regularly accessed websites, they can be cached in a hash so they don’t need to have their data loaded every time.

Collisions

So remember the linked lists? A hash function will inevitably try to assign two or more things to the same index in an array. At that point it can either
A. shift it over to the next empty slot, which ultimately will become
B. start a linked list at that index
If the hash function is crummy, it will make a long linked list at only a few slots. Get a good hash function that doesn’t do this.

Performance

In the average case, hash tables are O(1). In the worst, they’re O(n). They become worse case if there’s too many collisions, or if the load factor is too high.

To get a low load factor: have more spots in your array than things you need to add.
Divide # of items by total number of slots. If it’s more than .07, add more slots.
Avoid collisions: use a hash function that spreads the items out evenly.

Breadth-first search is a way to navigate a data structure called a graph (not the X/Y kind). It’s often used to find the shortest path to another point on the graph.

Graphs

A graph is a set of nodes connected to each other by edges. Edges can flow one way or back to another. A classic example is seeing what paths to take from one city to another in the shortest way possible. In fact, the problem is called ashortest-path problem.
-Directly connected nodes are called neighbors.
-A graph with only one-way edges is a directed graph; otherwise, it’s undirected.

Breadth-first search

It’s great for finding the shortest path to something, and can be used for things like spell checkers, finding a location closest to you, and so forth. It can see if there is a shortest path and if a path exists at all.

Finding the shortest path
Consider connections in terms of degrees radiating out from the starting point: first, second, etc. Breadth-first wants to check all the first degrees first before moving to each succeeding degree. The tradeoff: you need to do this searching in the order they’re added.
Queues
Hey, remember how the MTG stack is Last In, First Out? A queue is First In, First Out. All you can do with a queue is enqueue (push) or dequeue (pop) that first item added. Breadth-first search usues queues to ensure the resulting path is shortest.

Implementing the graph

Hash tables can be used to set up a graph. (And remember, they’re arrays of linked lists!)
-You can have your key (a node) be set to multiple values (its neighbors).

Implementing the algorithm

Here’s an example.

  1. Start with a queue. Do this to get the shortest path.
  2. Pop the first person off the queue. If they’ve been already searched, don’t do your check.
  3. Otherwise, do your check. Mark that person as searched in another list. This way you don’t keep checking in an infinite loop.

Other notes

Trees: a special type of graph where no edges ever point back. Topological sort: a method to make an ordered list out of a graph based on node dependencies.

Djikstra’s Algorithm

So you whip out Djikstra’s algorithm when you’re working with a weighted graph. A weighted graph is when the edges are, well, weighted. Think like miles between cities. You can then use this algorithm to find the shortest path to X in regards to the weight - in this case, the fastest path, which might take more edges to reach than the shortest.

THE STEPS:

  1. Find the “cheapest” node. This is the node you can get to in the least amount of time. (The lowest weight.)
  2. Check whether there’s a cheaper path to the neighbors of this node. If so, update their costs. (IE, if you go to B, you record the cost, then you check the neighboring node edges from B and you update costs accordingly.)
  3. Repeat until you’ve done this for every node in the graph.
  4. Calculate the final path.

DA only really works with directed graphs - if the nodes loop back on each other, you can’t really get the shortest path out of that.

You start off with a chart of nodes you know about and the nodes you don’t know about:

Node Cost
a 2
b 6
Finish Inf.

Then you update it with costs as you check out the other nodes.

Node Cost
a 5
b 2
Finish 7

Negative-weight edges

Djikstra’s algo doesn’t work with negative-weight edges - as you go through it, it’ll look for the shortest immediate costs even if negative edges exist. Use the Bellman-Ford algorithm for negative-weight edge graphs.

Greedy Algorithms

Greedy algorithms are designed to take the most locally optimal move at any point. They aren’t ideal for everything but they can work.

It’s possible to make optimal moves at each juncture and not get the best possible result. But sometimes, you need an algorithm that’s just close enough, especially where O(n) speed is concerned. This is an approximation algorithm - how fast are they and how close are they to the optimal solution?

NP-complete problems

These are problems that are very hard, if not impossible, to solve optimally. In this case, you ought to use an approximation algorithm to solve it. If your problem:

  • can’t be broken down into smaller sub-problems
  • you have to calculate every possible version of X
  • more items dramatically slows down the algorithm
  • resembles a set-covering problem or traveling salesperson problem then it’s probably NP-complete.

Set-covering problem: when you need to calculate every possible subset when they overlap at set intersections Traveling salesperson problem: when the number of routes you need to calculate for an optimal travel solution increases majorly the more cities you put in

Dynamic Programming

Dynamic programming is an algorithm that breaks the original problem down into subproblems, calculates, then brings them together to solve the bigger problem. Whereas a greedy algorithm finds the approximate best solution, this method can be used to calculate optimal solutions.

(It’s not a great name. The more accurate term is ‘memoization’. But it was named to bullshit a secretary of defense at the time who hated research and math.)

You can compare this to a recursive function, but with recursion, it gets exponentially more time complex as you go. Basically: base case = smallest subproblem recursive case = how exactly the problem is being broken down into smaller subproblems

K-nearest Neighbors

This algo is useful for classifying items. Imagine a graph of items based on classification - a new item gets examined by its three closest neighbors, and based on the number of neighbors, the item gets classified accordingly.
Examples include: recommendation systems based on preferences

You compare items and their neighbors by their features. For example, comparing two users for movie recommendations, you’d probably use features based on their data - most watched genres, movies with most frequent actors, length, etc.
You could use a graph of coordinates, and even despite the number of coordinates, you can use a distance formula for all these points to see what compares between two users.

Regression You can take the average of a rating or feature among users - that’s called regression. By finding neighbors then calculating these averages based on the neighbors chosen, you can get different results, detect trends, etc. (You can also use formulas like cosine similarity to pick neighbors for this.

It’s best to pick features that correlate with what you’re measuring against and don’t have a bias. (taste in various movies, ratings among these types, etc.)

Uses in machine learning

Machine learning uses KNN as part of its AI training. For example, it’s trained by going through images and recognizing common features, then it can use that to recognize these character patterns. Spam filters are another example of this. Of course, the more variables that are involved (Stock market prediction) the harder this is.

Where to Go Next

Trees

cascading series of nodes, left branches are smaller, right are higher
-start at the root, work down, and by doing a comparison at each branch, you can find what you’re searching for.
RUNTIME: O(log n) best, O(n) at worst
-B-trees are trees used to store data in databases, and it’s important to use trees that are balanced on both sides for performance.

Inverted indexes

A hash table that basically maps words to where they appear on pages or content. Search engines are built off of this.

Fourier transform

Great guide to this concept: https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
Think of it as a concept that breaks down a pattern into its individual parts for analysis and comparison. The simple version:
What do I have? a smoothie
How was it made? a recipe
what goes in the recipe? parts called ingredients

You can reverse engineer things by breaking them down into their “ingredients”.
Earthquakes: vibrations, speeds, amplitudes
Music: sound frequencies

Parallel algorithms

These are algorithms written to work in parallel across multiple processing cores. For example, there’s a parallel version of quicksort that goes faster than O(n log n). Things to remember:
-how do you divide the tasks among the cores?
-how do you balance the load between simple and complex tasks between the cores?

MapReduce

This is a distributed algorithm where you run an algo on multiple machines at once, which can hel;p with massive jobs like working with huge amounts of data.

The map function applies the same function to every item in an array.
The reduce function transforms an array into a single item by applying and combining the same function to each item in an array.

Together, MapReduce can run queries on massive data sets by working over multiple machines for quicker results.

Bloom filters and HyperLogLog

Bloom filters are probabilistic data structures. Whereas a hash table will give you an accurate lookup, with huge data sets, it can take massive space. A bloom filter is more of an educated guess so it doesn’t have to store exact data. Compare guessing if a URL of a website is positively crawled or not, or if a URL is malicious.

HyperLogLog gives an approximate answer of the number of unique elements in a set, such as unique searches.

Secure hash algorithms

Hash function that returns a string based on an existing string for a hash table. You can use this for easy checking to make sure files are the same and to protect the original string. This is because a SHA is a one-way hash, you can’t reverse it to get the original string.

Locality-sensitive hashing

When you want a hash to compare for similiar items, you use Simhash, a locality-sensitive hash. Changing the string slightly only changes the hash slightly - usually, SHAs will give you a completely different hash to make comparisons much harder.

Diffie-Hellman key exchange

This is a cipher system that’s very hard to break and both parties don’t need to know the cipher ahead of time.
It uses two keys, a public key and a private key. Messages encrypted by the public key can only be decrypted by the private key.
The successor is RSA.

Linear programming

It’s a way to maximize something (votes, profit) given constraints. This is used on graph problems using the Simplex algorithm.