Data Structures & Algorithms
Data Structure
Array & Linked Lists
Linked lists are similar to arrays; both store ordered data items. The difference is how they are stored in memory.
HashMaps do NOT store ordered items.
Array elements must be stored next to each other, forming a "contiguous block" in memory. Linked list nodes have pointers, so the elements can be stored in any location.
Array's advantage: fast at reading elements.
Disadvantages: slow at inserting & deleting elements, because of how arrays are stored in memory.
Linked lists are the opposite of arrays: slower at reading but fast at inserting/deleting.
A linked list stores nodes. Each node contains two fields: (1) the data & (2) the address of the next node in memory.
- Deletion of a node (given a pointer to the node) requires traversing the list to find its predecessor, making it an O(n) operation.
By default, a linked list is a singly linked list.
In a doubly linked list, each node has two links: one to the next node & one to the previous node. It still has a field to store the data.
- Pros: supports reverse look-up, and deletion of a node (given a pointer to the node) can be done in O(1) time because the previous node can be accessed directly.
- Cons: extra memory for the pointer to the previous node.
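A minimal sketch in Python (the class and function names are my own, not from the notes): a singly linked node stores data plus a reference to the next node, while a doubly linked node adds a reference to the previous one, which is what makes O(1) deletion possible.

```python
class ListNode:
    """Node of a singly linked list: data + pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None            # address/reference of the next node


class DoublyListNode:
    """Node of a doubly linked list: adds a pointer to the previous node."""
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None            # extra memory: one more pointer per node


def delete_node(node):
    """O(1) deletion in a doubly linked list, given a pointer to the node."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
```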
- ArrayList: Internally uses a normal fixed-size array to store its elements. This means elements are stored in contiguous memory locations. But when the array reaches its capacity, a new, larger array is created, and the elements are copied over.
- LinkedList: Internally uses a sequence of "nodes." Each node contains the data element and a reference (or pointer) to the next node in the sequence. In the case of a doubly linked list, it also has a reference to the previous node. Elements are not necessarily stored in contiguous memory locations
For example, when an ArrayList's internal array runs out of capacity, it automatically creates a new, larger array (Java's ArrayList grows to roughly 1.5x the old capacity; many other dynamic arrays double) and copies the elements over to the new array.
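A conceptual sketch of this growth behavior (not Java's actual implementation; the growth factor of 2 here is an assumption for illustration):

```python
class DynamicArray:
    """Toy dynamic array: contiguous storage that is reallocated when full."""
    def __init__(self):
        self.capacity = 4
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:      # internal array is full
            self._grow()
        self.data[self.size] = value
        self.size += 1

    def _grow(self):
        self.capacity *= 2                  # assumed growth factor: 2x
        new_data = [None] * self.capacity
        for i in range(self.size):          # copy elements over (O(n))
            new_data[i] = self.data[i]
        self.data = new_data
```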
Stack and Queue
Stack is FILO or LIFO (Last In, First Out)
Applications of Stacks: Function calls, Recursion
Queue is FIFO
Applications of Queues: Task Scheduling
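A small sketch of both structures using Python's built-ins (a list as a stack, collections.deque as a queue):

```python
from collections import deque

# Stack: LIFO - the last element pushed is the first one popped
stack = []
stack.append("first")
stack.append("second")
print(stack.pop())      # "second" (last in, first out)

# Queue: FIFO - the first element enqueued is the first one dequeued
queue = deque()
queue.append("task 1")
queue.append("task 2")
print(queue.popleft())  # "task 1" (first in, first out)
```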
Set
A set is like an array, but with no duplicate items. Order does not matter; we only care about whether or not an item is present in the set.
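For example, in Python:

```python
items = {3, 1, 2}
items.add(2)            # duplicate: the set is unchanged
print(2 in items)       # True - membership is what we care about
print(items)            # iteration order carries no meaning
```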
Tree
A binary tree is a non-linear and hierarchical data structure.
A tree is like a linked list, but in a linked list, nodes only link to one next node. In a tree, a node can link to multiple next nodes.
When a node can only link to at most two child nodes, it is called a binary tree.
A node is only allowed to have one parent node.
Terminology:
- An edge is a line that connects two nodes, specifically, a parent node to its child node.
- Leaf Node: A node that does not have any children or both children are null.
- Internal Node: A node that has at least one child. This includes all nodes except the leaf nodes.
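A minimal binary tree node sketch (Python, names are my own): each node links to at most two children, and a node whose children are both None is a leaf.

```python
class TreeNode:
    """Binary tree node: data plus at most two child links."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


# Build a tiny tree:      1      <- root (internal node)
#                        / \
#                       2   3    <- leaf nodes (no children)
root = TreeNode(1)
root.left = TreeNode(2)     # edge from parent 1 to child 2
root.right = TreeNode(3)    # edge from parent 1 to child 3
```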
HashMaps, Dictionary
A HashMap (aka dictionary) is like an array, but instead of indexes it uses keys. An index can only be an integer, while a key can be a string.
You can find an element in a hash map by its string key, which cannot be done with an array.
- HashMaps are un-ordered
- Search in O(1) on average
- A HashMap can store duplicate values, but it cannot store duplicate keys.
The reason Hash Tables are sometimes preferred instead of arrays or linked lists is because searching for, adding, and deleting data can be done really quickly, even for large amounts of data.
- In a Linked List, finding a person "Bob" takes time because we would have to go from one node to the next, checking each node, until the node with "Bob" is found.
- And finding "Bob" in an Array could be fast if we knew the index, but when we only know the name "Bob", we need to compare each element (like with Linked Lists), and that takes time.
- With a Hash Table however, finding "Bob" is done really fast because there is a way to go directly to where "Bob" is stored, using something called a hash function.
A hash function takes in a string value (the key) and returns a hash code. It converts the string into a number (like an array index).
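A toy sketch of the idea (not a real production hash function): turn the characters of the key into a number, then map that number into the range of valid array indexes.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Convert a string key into an array index (hash code % table size)."""
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)   # 31 is a common multiplier choice
    return code % table_size

print(toy_hash("Bob", 10))   # always the same index for the same key
```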
- A HashSet is an implementation of a set - a collection of unique keys.
- HashMap and HashTable are both implementations of a Map - a collection of key value pairs where the keys are unique (kind of like a lookup table). HashTable does NOT allow a null key, but HashMap does.
Algorithm Terminology
Time Complexity and Space Complexity
Time Complexity: The time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input. Note that the time to run is a function of the length of the input and NOT the actual execution time of the machine on which the algorithm is running on.
During analysis, mostly the worst-case scenario is considered. We ignore the lower-order terms.
Ignore the constants. A constant is any step that does not scale with the input to the function.
Big-O is a way to express the upper bound of an algorithm’s time or space complexity. Below are the major types of complexities (time and space) in ascending order of growth (excellent to horrible):
- Constant time: O(1); the time does NOT grow when the input increases.
- Logarithmic: O(log n)
- Linear time scaling: O(n)
- Linearithmic: O(n log n)
- Quadratic: O(n^2)
- Cubic: O(n^3)
- Exponential: O(2^n)
- Factorial: O(n!)
where n is the number of input elements (x-axis) and the output of the function (y-axis) is the number of operations needed to complete the algorithm.
The base of the logarithm is usually taken as 2 when finding the time complexity of algorithms, but it doesn't really matter what the base is, since switching bases only changes the result by a constant factor. We take 2 because (1) it's greater than 1 and (2) it's a typical base.
Another reason to use base 2 is that, as in merge sort, each divide step splits the original array in half into N/2.
Space complexity
For an algorithm to take constant extra space, the extra variables used to solve it should not change with the input size. For example, in bubble sort the extra variable we use is the temp variable for swapping; it doesn't change with the size of the input array, so we always use the same number of extra variables for any input size.
Sorting Algorithms
When learning, always sort in ascending order (the biggest number stays on the right, at the end of the array).
When solving an algorithm problem, focus on only one of these two things at a time:
- What the algorithm is doing
- How it is doing it
Bubble, Selection, Insertion Sort
These three algorithms all have a time complexity of O(n^2).
Bubble Sort is the simplest sorting algorithm. On each iteration, the biggest number bubbles to the end of the array.
- Not suitable for large data sets as its average and worst-case time complexity are quite high.
- The counter i starts from 0 and runs n-1 iterations (when only one item remains, we stop); the inner counter j is in range(n-i-1).
- Best case O(N) (array is already sorted): make n-1 comparisons, and if no swap happened, end the program.
- Worst case O(N^2) (array is sorted in the opposite direction).
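A sketch of bubble sort with the early-exit optimization described above (stop when a full pass makes no swap):

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):                    # at most n-1 iterations
        swapped = False
        for j in range(n - i - 1):            # the tail is already sorted
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]   # swap neighbors
                swapped = True
        if not swapped:                       # best case: already sorted, O(N)
            break
    return arr

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]
```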
The Selection Sort algorithm finds (selects) the lowest value in an array and moves it to the front of the array.
- The 1st step makes n-1 comparisons; the 2nd step makes n-2; the 3rd makes n-3 comparisons... until the last step makes 1 comparison.
- Total comparisons (over all iterations): 0 + 1 + 2 + ... + (N-1) = \frac{N(N-1)}{2}, which simplifies to O(N^2) (best & worst cases).
- This is the formula for the sum of an arithmetic series.
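A sketch of selection sort following the description above:

```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):     # n-1-i comparisons on this pass
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]   # move lowest to the front
    return arr

print(selection_sort([64, 25, 12, 22, 11]))   # [11, 12, 22, 25, 64]
```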
Insertion sort works by continually inserting each element from an unsorted subarray (from the right-hand side) into a sorted sub-array (hence the name “insertion” sort). We partially sort the array from left to right.
- The counter i runs over [0, N-2]; the inner counter j satisfies 0 < j <= i+1. N is the length of the array.
- Worst case O(N^2) (sorted in descending order, i.e., the opposite direction).
- Best case O(N) (already sorted).
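A sketch of insertion sort; here i follows the more common convention of pointing directly at the element being inserted:

```python
def insertion_sort(arr):
    for i in range(1, len(arr)):          # arr[0..i-1] is the sorted sub-array
        key = arr[i]                      # element to insert
        j = i - 1
        while j >= 0 and arr[j] > key:    # shift larger elements to the right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                  # insert at its correct position
    return arr

print(insertion_sort([12, 11, 13, 5, 6]))   # [5, 6, 11, 12, 13]
```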
Merge Sort
This is a divide-and-conquer algorithm.
- Divide the original array into 2 sub-arrays
- Sort each sub-array via recursion
- Merge them together
The helper merge() function only merges two sorted arrays.
Merging two sorted arrays is easy: walk from left to right, comparing pairs of values from the two arrays.
Apply the same idea to merge the base-case arrays (which contain only one element).
- Time complexity
  - Picture the divide step like binary cell division (mitosis) in biology: each level splits every array in half.
  - Use algebra to compute the number of divide levels (iterations): 1 = \frac{N}{2^k} => k = \log_2 N
  - At each level k, there are always N elements being merged, i.e., N comparisons are needed to merge those arrays together => the total complexity of merge sort is O(N log N)
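A sketch of merge sort following the divide-and-conquer steps above:

```python
def merge(left, right):
    """Merge two already-sorted arrays by walking left -> right, comparing pairs."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])      # whatever remains is already sorted
    result.extend(right[j:])
    return result

def merge_sort(arr):
    if len(arr) <= 1:            # base condition: one element is already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])     # divide into 2 sub-arrays and recurse
    right = merge_sort(arr[mid:])
    return merge(left, right)

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]
```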
Quick Sort
We must choose a pivot. Whatever pivot you take, after every iteration:
- All elements less than the pivot go to the left side
- All elements greater than the pivot go to the right side
- After every iteration, the pivot is put at its correct (final) position
- You then use recursion to continue on the two sides
- The while loop that puts the pivot at its correct position is O(n)
- Recurrence: T(N) = T(k) + T(N-k-1) + O(N), where k is the number of elements smaller than the pivot
- Worst case O(N^2): k = 0, which happens when we take the smallest (or largest) value as the pivot, e.g., the first or last element of an already-sorted array. The recursive partition is always unbalanced, recursing on (n-1) elements instead of n/2; many partitions of size (n-1) add up to O(N^2)
- Average & best case O(n log n): occurs when the pivot selection consistently divides the array into roughly equal halves
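A sketch of quick sort that takes the last element as the pivot (Lomuto partition scheme):

```python
def partition(arr, low, high):
    """Put the pivot (last element) at its correct position; O(n)."""
    pivot = arr[high]
    i = low - 1                          # boundary of the "less than pivot" side
    for j in range(low, high):
        if arr[j] < pivot:               # smaller elements go to the left side
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1                         # final index of the pivot

def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)      # recurse on the left partition
        quick_sort(arr, p + 1, high)     # recurse on the right partition
    return arr

print(quick_sort([10, 7, 8, 9, 1, 5]))   # [1, 5, 7, 8, 9, 10]
```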
Searching Algorithms
Find duplicate value/number
There is only one repeated number in the array.
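The notes don't spell out an approach here; one common O(n) sketch (assuming extra space is allowed) is to remember the values already seen in a set:

```python
def find_duplicate(nums):
    """Return the single repeated number using a set of seen values (O(n) time, O(n) space)."""
    seen = set()
    for x in nums:
        if x in seen:
            return x
        seen.add(x)
    return None   # no duplicate found

print(find_duplicate([1, 3, 4, 2, 2]))   # 2
```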
Binary Search
The idea of binary search is to use the information that the array is already sorted and reduce the time complexity to O(log n)
Each comparison halves the remaining search space, so after k steps only one element is left: 1 = \frac{N}{2^k} => k = \log_2 N
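A sketch of binary search on a sorted array:

```python
def binary_search(arr, target):
    """Return the index of target in a sorted array, or -1 if not found."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1       # discard the left half
        else:
            high = mid - 1      # discard the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
```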
Find third highest value
Single Traversal Approach (Efficient)
- Initialize Variables: Create three variables, first, second, and third, to store the highest, second highest, and third highest numbers found so far. Initialize them to a very small number (e.g., negative infinity).
- Iterate and Update: Traverse the array once. For each element x:
  - If x is greater than first, update third = second, second = first, and first = x.
  - Else if x is greater than second (and not equal to first), update third = second and second = x.
  - Else if x is greater than third (and not equal to first or second), update third = x.
- Return: After iterating through the entire array, third will hold the third highest number.
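A sketch of this single-traversal approach:

```python
def third_highest(nums):
    first = second = third = float("-inf")
    for x in nums:
        if x in (first, second, third):   # skip values already tracked
            continue
        if x > first:
            third, second, first = second, first, x
        elif x > second:
            third, second = second, x
        elif x > third:
            third = x
    return third if third != float("-inf") else None

print(third_highest([4, 9, 7, 1, 9, 3]))   # 4
```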
Other algorithms
Recursion
Base case = stopping condition
Pros:
- Can express complex algorithms in an elegant way
- Works well with recursive structures like trees, graphs, JSON objects.
Cons:
- Slower due to function-call overhead; deep recursion can lead to memory errors / stack overflow exceptions
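A small sketch: factorial computed recursively, where the base case is the stopping condition.

```python
def factorial(n):
    if n <= 1:                        # base case = stopping condition
        return 1
    return n * factorial(n - 1)       # recursive case: each call adds a stack frame

print(factorial(5))   # 120
```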