programming_notes

CS notes

COMPUTER SCIENCE NOTES

Maintained by Sadman Kabir Soumik

Linear vs Non-linear data structures

Key | Linear data structures | Non-linear data structures
Data element arrangement | Elements are connected sequentially, and every element can be visited in a single pass. | Elements are connected hierarchically or as a network and can sit at different levels.
Levels | All elements are on a single level. | Elements are spread over multiple levels.
Implementation complexity | Easier to implement. | Harder to understand and implement than linear data structures.
Traversal | Can be traversed completely in a single pass. | Needs a traversal strategy (for example DFS or BFS), because an element can lead to several others.
Memory utilization | Memory may be used less efficiently, for example when an array reserves more space than it needs. | Memory can be used more efficiently, because nodes are allocated as needed.
Time complexity | The cost of operations such as search often grows linearly with size. | The cost often grows more slowly with size, for example O(log n) search in a balanced tree.
Examples | Array, List, Queue, Stack. | Graph, Map, Tree.

Difference between tree and graph

A tree has no cycles. A graph may contain cycles.

DFS, BFS

DFS | BFS
Uses a stack. | Uses a queue.
LIFO (last in, first out). | FIFO (first in, first out).
Like stacking plates. | Like a queue in front of an elevator.
More suitable when solutions are far away from the source. | More suitable for searching vertices that are close to the given source.
Used when we want to explore all possible results. | Used when we want the shortest path in an unweighted graph; BFS guarantees the fewest edges.
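
The shortest-path property of BFS can be illustrated with a minimal sketch. The graph, the node names and the function name `bfs_shortest_path` are made up for this example; the graph is assumed to be unweighted and stored as an adjacency dictionary.

```python
from collections import deque

def bfs_shortest_path(graph, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    parents = {source: None}   # also serves as the visited set
    queue = deque([source])    # FIFO queue drives the breadth-first order
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # follow parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

# hypothetical graph used only for illustration
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(bfs_shortest_path(graph, "A", "E"))
```

Replacing the queue with a stack (pop from the same end you push to) turns the same loop into a DFS, which still finds a path but no longer guarantees the shortest one.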

Array vs Linked List

Operation | Array | Linked list
Size | Data is stored in one contiguous block of memory, so a plain array cannot grow in place at runtime without risking overwriting other data. | Each node points to the next one, so data can live at scattered (non-contiguous) addresses; this allows a dynamic size that can change at runtime.
Memory allocation | For a static array, happens at compile time. | Happens at run time.
Execution time | Any element can be accessed directly by its index. Better cache locality (due to contiguous memory) can also significantly improve performance, so operations such as modifying a given element are faster. | All previous elements must be traversed to reach an element. Inserting or deleting an element is faster once the position is known.
List iteration | O(n) time | O(n) time
Cost of accessing an element | O(1) | O(n)
Application | Better for index-based access and searching. | Better for insertions and deletions.

Dynamic programming vs Recursion

Dynamic programming is used where we have problems, which can be divided into similar sub-problems, so that their results can be re-used.

During recursion, the same sub-problems may be solved multiple times. Consider the example of calculating the nth Fibonacci number.

fibo(n) = fibo(n-1) + fibo(n-2)
fibo(n-1) = fibo(n-2) + fibo(n-3)
fibo(n-2) = fibo(n-3) + fibo(n-4)
.................................
.................................
.................................
fibo(2) = fibo(1) + fibo(0)

In the first three steps, it can be clearly seen that fibo(n-3) is calculated twice. If one goes deeper into the recursion, the same sub-problems are repeated again and again.

Benefit of DP over recursion: DP stores the results of sub-problems in a table, so that if the same sub-problem is encountered again, the result can be returned directly instead of being re-calculated.
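
A small sketch of the difference, assuming fibo(0) = 0 and fibo(1) = 1. The names `fibo_naive`, `fibo_dp` and the `calls` counter are introduced only for this illustration.

```python
calls = {"naive": 0}   # counts how often the plain recursion is entered

def fibo_naive(n):
    calls["naive"] += 1
    if n < 2:
        return n
    return fibo_naive(n - 1) + fibo_naive(n - 2)

def fibo_dp(n):
    table = [0, 1]                    # table of already solved sub-problems
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])   # reuse stored results
    return table[n]

print(fibo_naive(20), "computed with", calls["naive"], "recursive calls")
print(fibo_dp(20), "computed with one pass over the table")
```

The recursive version revisits the same sub-problems many times, while the table version computes each value once.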

Ref: Quora

Polymorphism vs Overriding vs Overloading

Polymorphism means "more than one form": the same call can perform different operations according to the requirement.

Polymorphism can be achieved in two ways:

  1. Method overriding
  2. Method overloading

Method overloading means writing two or more methods in the same class with the same method name but different parameters.

Method overriding is the ability of an object-oriented programming language to let a subclass (child class) provide a specific implementation of a method that is already provided by one of its superclasses (parent classes). When a method in a subclass has the same name, the same parameters (signature) and the same return type (or a sub-type) as a method in its superclass, the method in the subclass is said to override the method in the superclass.
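
A minimal sketch of overriding in Python. The classes `Animal`, `Dog` and `Cat` are hypothetical. Only overriding is shown, because Python does not select a method by its parameter list: a second `def` with the same name in a class simply replaces the first.

```python
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):          # overrides Animal.speak
        return "Woof"

class Cat(Animal):
    def speak(self):          # overrides Animal.speak
        return "Meow"

def describe(animal):
    # the same call takes a different form depending on the object's class
    return animal.speak()

print([describe(a) for a in (Animal(), Dog(), Cat())])
```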

OOP Concepts

Encapsulation

Declare all variables in the class as private, and write public methods in the class to set and get their values. Access to the data is then controlled by the getter and setter methods.
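
A minimal sketch of this idea in Python, where "private" is a naming convention (a leading underscore) rather than something the language enforces. The `Account` class and its validation rule are assumptions made for the example.

```python
class Account:
    def __init__(self, balance=0):
        self._balance = balance        # leading underscore: private by convention

    def get_balance(self):
        return self._balance

    def set_balance(self, value):
        # the setter is the single place where the rule is enforced
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

account = Account()
account.set_balance(50)
print(account.get_balance())
```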

Abstraction

Handle complexity by hiding the unnecessary details from the user. For example, a coffee machine: you press a button without needing to know how the coffee is brewed.

The base class does not contain the implementation details; it only declares an abstract method. Other classes inherit from this class, override the abstract method and provide the detailed implementation.

Inheritance

A child class extends a parent class. The child class inherits all public and protected methods of the parent class and can provide its own implementations.

Polymorphism

See above.

Benefits of OOP

  1. Modularity for easier troubleshooting. When working with OOP, we know where to look when something goes wrong.
  2. Reuse of code through inheritance.
  3. OOP systems can be easily upgraded from small scale to large scale systems.

Things to consider when designing REST API (Best Practices)

  1. Use JSON as the format for sending and receiving data. With XML, for example, decoding and encoding data is often more of a hassle, so many frameworks no longer support it as well as JSON.

  2. When you’re designing a REST API, you should not use verbs in the endpoint paths. The endpoints should use nouns, signifying what each of them does. This is because HTTP methods such as GET, POST, PUT, PATCH, and DELETE are already in verb form for performing basic CRUD (Create, Read, Update, Delete) operations.

    So, for example, an endpoint should not look like this:

    https://mysite.com/getPosts or https://mysite.com/createPost
    

    Instead, it should be something like this: https://mysite.com/posts

  3. You should always use regular HTTP status codes in responses to requests made to your API. This will help your users to know what is going on – whether the request is successful, or if it fails, or something else. For example, 404 - Not Found. See the most popular codes.

  4. Oftentimes, different endpoints can be interlinked, so you should nest them so it’s easier to understand them.

    For example, in the case of a multi-user blogging platform, different posts could be written by different authors, so an endpoint such as https://mysite.com/posts/author would make a valid nesting in this case.

  5. Use SSL for Security.

  6. REST APIs should have different versions, so you don’t force clients (users) to migrate to new versions. This might even break the application if you’re not careful. One of the commonest versioning systems in web development is semantic versioning.

    Many RESTful APIs from tech giants and individuals use paths like this: https://mysite.com/v1/ for version 1 and https://mysite.com/v2 for version 2.

  7. When you make a REST API, you need to help clients (consumers) learn and figure out how to use it correctly. The best way to do this is by providing good documentation for the API.

    The documentation should contain:

    • relevant endpoints of the API
    • example requests of the endpoints
    • implementation in several programming languages
    • messages listed for different errors with their status codes.

    Ref: freecodecamp
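
Several of the practices above (noun paths, HTTP verbs for the action, regular status codes, a version prefix) can be sketched without any web framework. Everything here is hypothetical: the in-memory `posts` data, the `routes` table and the `handle` function only stand in for what a real framework would provide.

```python
from http import HTTPStatus

posts = {1: {"id": 1, "title": "Hello"}}   # hypothetical in-memory data

def list_posts():
    return HTTPStatus.OK, list(posts.values())

def create_post(body):
    new_id = max(posts, default=0) + 1
    posts[new_id] = {"id": new_id, **body}
    return HTTPStatus.CREATED, posts[new_id]

# The path is a noun under a version prefix; the HTTP method is the verb.
routes = {
    ("GET", "/v1/posts"): lambda body: list_posts(),
    ("POST", "/v1/posts"): create_post,
}

def handle(method, path, body=None):
    handler = routes.get((method, path))
    if handler is None:
        # unknown endpoints get a regular status code, not a custom one
        return HTTPStatus.NOT_FOUND, {"error": "not found"}
    return handler(body)

status, payload = handle("POST", "/v1/posts", {"title": "Second"})
print(int(status), payload)
print(int(handle("GET", "/v1/getPosts")[0]))
```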

Why stateless architecture is better

A stateless server doesn't store any session data. A stateful architecture, by contrast, remembers client data (state) from one request to the next, which makes adding or removing servers (horizontal scaling) difficult.

That's why a stateless architecture is better for horizontal scaling. In a stateless (REST) architecture, we move the state data from the web servers to persistent storage (SQL/NoSQL DB). This makes it possible to auto-scale the web tier just by adding or removing servers based on the traffic load.

Instead of keeping state on the web servers, a stateless architecture pulls state from shared storage (DB).

Message Queue vs Pub/Sub

Message queues

Message queues consist of a publishing service and multiple consumer services that communicate via a queue. This communication is typically one way where the publisher will issue commands to the consumers. The publishing service will typically put a message on a queue or exchange and a single consumer service will consume this message and perform an action based on this.

Consider the following exchange:

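[Figure: a publisher putting message 'm n+1' onto a queue that already holds messages, with consumers 'A' and 'B' listening]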

From this, we can see a Publisher service that is putting a message 'm n+1' onto the queue. In addition, we can see multiple messages already on the queue waiting to be consumed. On the right-hand side, we have two consuming services, 'A' and 'B', that are listening to the queue for messages.

Let’s now consider the same exchange after some time:

[Figure: the same queue after some time; 'm n+1' is at the tail and consumer 'A' has read 'm 1']

First, we can see that the Publisher’s message has been pushed to the tail of the queue. Next, the important part to consider is the right-hand side of the image. We can see that consumer ‘A’ has read the message ‘m 1’ and, as such, it is no longer available in the queue for the other service ‘B’ to consume.

Example: Amazon SQS, RabbitMQ

Pub/Sub

In contrast to message queues, in a pub/sub architecture we want every consuming (subscribing) application to get at least one copy of each message that our publisher posts to an exchange.

Consider the following exchange:

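[Figure: a publisher sending message "m n+1" to a Topic whose subscriptions are bound to subscriber queues]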

On the left we have a publisher sending a message “m n+1” to a Topic. This Topic will broadcast this message to its subscriptions. These subscriptions are bound to queues. Each queue has a listening subscriber service awaiting messages.

Let’s now consider the same exchange after some time has passed:

[Figure: the same exchange after some time; both subscribers are consuming "m 1"]

Both the subscribing services are consuming “m 1” as both received a copy of this message. In addition, the Topic is distributing the new message “m n+1” to all of its subscribers.

Pub sub should be used where we need a guarantee that each subscriber gets a copy of the message.

Example: Apache Kafka
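
The difference between the two models can be sketched in a few lines of Python. This is an in-memory illustration only, not how SQS, RabbitMQ or Kafka are implemented; the `Topic` class and the message names are assumptions made for the example.

```python
from collections import deque

# Message queue: each message is consumed by exactly one consumer.
queue = deque(["m1", "m2"])
consumer_a = queue.popleft()   # 'A' takes m1, so m1 is gone from the queue
consumer_b = queue.popleft()   # 'B' can only get the next message, m2

# Pub/sub: the topic copies each message into every subscriber's own queue.
class Topic:
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        for inbox in self.subscriptions.values():
            inbox.append(message)

topic = Topic()
topic.subscribe("A")
topic.subscribe("B")
topic.publish("m1")

print(consumer_a, consumer_b)
print({name: list(inbox) for name, inbox in topic.subscriptions.items()})
```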

Ref: https://www.baeldung.com/pub-sub-vs-message-queues

Synchronous vs asynchronous

Synchronous: Sequential execution. With synchronous communication the caller sends a message and waits for the receiver to respond. This is appropriate for actions such as login and purchase, in which the caller must have a reply.

With asynchronous communication the caller skips the wait and continues executing whatever code is necessary.
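
A rough sketch of the two styles using Python's asyncio. The `fetch` coroutine and its delays are placeholders for a slow call such as a network request; they are not part of any real API.

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)     # stands in for a slow network call
    return name

async def sequential():
    # wait for each reply before starting the next call
    return [await fetch("login", 0.1), await fetch("purchase", 0.1)]

async def overlapping():
    # start both calls, keep going, and collect the replies when they arrive
    return await asyncio.gather(fetch("login", 0.1), fetch("purchase", 0.1))

for job in (sequential, overlapping):
    start = time.perf_counter()
    result = asyncio.run(job())
    print(job.__name__, result, f"{time.perf_counter() - start:.2f}s")
```

Both versions return the same replies; the overlapping one should finish sooner because the two waits happen at the same time.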

Git Rebase

With a regular rebase, you can update your current branch with another branch.

git fetch origin main
# checkout to target branch
git checkout <my-feature-branch>
# rebase it against the main branch
git rebase origin/main

Git Pull vs Fetch

fetch | pull
Downloads new commits from the remote without merging them into the current branch. To integrate those changes, we have to run git merge ourselves. | Fetches commits and then tries to merge them automatically, so all pulled commits are merged into the currently active branch without the user reviewing them first.

Git Bisect

This command uses a binary search algorithm to find which commit in the project's history introduced a bug.

git bisect start
# mark the current (broken) commit as bad
git bisect bad
# mark the last commit known to work as good
git bisect good <good commit>

It tracks down the commits where the code works and where it doesn’t.
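
The binary search behind this can be sketched in Python. This is an illustration of the idea, not Git's actual implementation; `first_bad_commit`, the commit names and the `is_bad` test are all hypothetical, and the history is assumed to be linear with the first commit good and the last commit bad.

```python
def first_bad_commit(commits, is_bad):
    """commits is ordered oldest to newest; commits[0] is good, commits[-1] is bad."""
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # test the commit halfway between good and bad
        steps += 1
        if is_bad(commits[mid]):
            bad = mid                # the bug was introduced at mid or earlier
        else:
            good = mid               # the bug was introduced after mid
    return commits[bad], steps

commits = [f"c{i}" for i in range(16)]   # hypothetical history c0..c15
culprit, steps = first_bad_commit(commits, lambda c: int(c[1:]) >= 11)
print(culprit, "found after", steps, "tests")
```

Each test halves the remaining range, so the number of commits to check grows with the logarithm of the history length.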

Kubernetes main components

Docker vs Kubernetes

In a nutshell, Docker is a suite of software development tools for creating, sharing and running individual containers; Kubernetes is a system for operating containerized applications at scale.

Think of containers as standardized packaging for microservices with all the needed application code and dependencies inside. Creating these containers is the domain of Docker. A container can run anywhere, on a laptop, in the cloud, on local servers, and even on edge devices.

A modern application consists of many containers. Operating them in production is the job of Kubernetes. Since containers are easy to replicate, applications can auto-scale: expand or contract processing capacities to match user demands.

Docker and Kubernetes are mostly complementary technologies and are commonly used together.

Ref: Peter

Enterprise software vs Consumer software

Enterprise software is just another term for business software. This is software that is sold to (or targeted at) companies, not to individuals. So, all the software which you use on a general basis like Windows or Google or Quora is consumer software.

Enterprise software is sold to companies to solve their problems. This can cover a wide range of applications, from software for managing employees, such as payroll, attendance and promotions (HRM), to software for interacting with customers, such as marketing and sales tools.

Download speed vs Upload Speed

Download speed refers to how many megabits of data per second you can receive from a server on your device, in the form of images, videos, text, files and audio. Activities such as listening to music on Spotify, downloading large files or streaming videos on Netflix all require you to download data.

Upload speed refers to how fast you can send information from your computer to another device or server on the internet. While downloading information is more common, some online activities need data to travel in the opposite direction. Sending emails, playing live tournament-style video games and video calling on Zoom require fast upload speeds for you to send data to someone else’s server.

Ref

Permutations vs Anagrams vs Palindromes

Check Permutation: Given two strings, write a method to decide if one is a permutation of the other.

I’m working through algorithm exercises with a group of people, and there was a lot of confusion about what permutation means, and how it differs from anagrams and palindromes.

So, to clarify:

A permutation is one of several possible variations, in which a set of things (like numbers, characters or items in an array) can be ordered or arranged. A permutation of characters does not have to have meaning.

Example: Given the string abcd, the permutations are abcd, abdc, acbd, acdb, adbc, adcb, bacd, badc, bcad, bcda, bdac, bdca, cabd, cadb, cbad, cbda, cdab, cdba, dabc, dacb, dbac, dbca, dcab and dcba

An anagram is a word, phrase, or name formed by rearranging the characters of a string. An anagram must have meaning, it can’t just be gibberish.

Example: These words are anagrams of carets: caters, caster, crates, reacts, recast, traces

A palindrome is a word, phrase, or sequence that reads the same backward as forward. A palindrome must have meaning, it can’t just be gibberish.

Example: Civic, level, madam, mom and noon are all palindromes.

All palindromes and anagrams are permutations, but not all permutations are either anagrams or palindromes.
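
The "Check Permutation" exercise and a palindrome check can be sketched as follows. The function names are made up for this example, and the comparison is assumed to be case-sensitive for permutations and case-insensitive for palindromes. Whether a string "has meaning" (the anagram requirement) cannot be decided without a dictionary, so it is not checked here.

```python
from collections import Counter

def is_permutation(a, b):
    # two strings are permutations of each other if they use
    # exactly the same characters the same number of times
    return Counter(a) == Counter(b)

def is_palindrome(text):
    cleaned = text.lower()
    return cleaned == cleaned[::-1]   # same forwards and backwards

print(is_permutation("carets", "traces"))
print(is_permutation("abcd", "abce"))
print(is_palindrome("Civic"))
```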

Concurrency and parallelism

Concurrency and parallelism both relate to "different things happening more or less at the same time".

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn’t necessarily mean they’ll ever both be running at the same instant. For example, multitasking on a single-core machine.

Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.

https://fastapi.tiangolo.com/async/#in-a-hurry

Multi-threading vs multi-processing

When looking for the difference between python multiprocessing and multithreading, one might have the impression that they work pretty much the same. That could not be more wrong. The key differences are:

Ref: medium

Daemon in Linux

A daemon (pronounced DEE-muhn) is a program that runs continuously and exists for the purpose of handling periodic service requests that a computer system expects to receive. The daemon program forwards the requests to other programs (or processes) as appropriate. For example, the Cron daemon is a built-in Linux utility that runs processes on your system at a scheduled time. We can configure a cron job to schedule scripts or other commands to run automatically.

Keyboard shortcuts

vscode

Linux

Linux Terminal

mv filename.txt newfilename.txt
mkdir ./zipfiles
find . -name "*.zip" -exec mv "{}" ./zipfiles \;
ls | wc -l
# Convert and keep original files:
mogrify -format jpg *.png

# Convert and remove original files:
mogrify -format jpg *.png && rm *.png

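The following command deletes 1000 jpg files from the current directory. The files are the first 1000 that find returns, in no particular order, so the selection is arbitrary rather than truly random.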

find . -maxdepth 1 -type f -name "*.jpg" -print0 | head -z -n 1000 | xargs -0 rm

Anaconda commands

GCP commands

Git

git init
git add . && git commit -m "initial commit"
git push -f origin master
ssh-keygen -t ed25519 -C "your_email@example.com"
eval `ssh-agent -s`

git checkout -b <branch_name>

Elasticsearch local commands

# Reload daemon
sudo systemctl daemon-reload

# Restart the elasticsearch service
sudo systemctl restart elasticsearch

Run the command below to start the elasticsearch service and verify the service is running.

sudo systemctl start elasticsearch
systemctl status elasticsearch

Check the status of the ES:

sudo systemctl status elasticsearch

Stop the ES service:

sudo systemctl stop elasticsearch

Install plugins

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install <plugin name>

Docker

sudo service docker start
sudo service docker stop
sudo service docker status

sudo systemctl stop docker.socket

Pickle vs JSON for serialization

https://docs.python.org/3/library/pickle.html#comparison-with-json

CRUD

In computer programming, create, read, update, and delete (CRUD) are the four basic functions of persistent storage.

SQL vs NoSQL

SQL:

Relational database management systems: MySQL, PostgreSQL, MariaDB, Oracle etc.

NoSQL

NoSQL management systems: MongoDB, Firebase, Apache Cassandra etc.

A foreign key is a column that refers to the primary key of another table, so that we can create a relationship between the two tables.

LSM-tree: read here

SQL Basics

Basic statements

Read more details from here

Query statement (retrieve data): SELECT

DDL (data definition language): Create, Drop, Alter, Truncate

DML (Data manipulation language) statement: INSERT, UPDATE, DELETE

Select Rule | Query data from a table
SELECT column_name
FROM Table_name
WHERE Conditions;

Insert Rule | Insert new data in a table
INSERT INTO table_name (column1, column2, ... )
VALUES (value1, value2, ... );

Example, insert one row

INSERT INTO Instructor(ins_id, lastname, firstname, city, country)
VALUES(4, 'Saha', 'Sandip', 'Edmonton', 'CA');

Example, insert multiple row

INSERT INTO Instructor(ins_id, lastname, firstname, city, country)
VALUES(5, 'Doe', 'John', 'Sydney', 'AU'), (6, 'Doe', 'Jane', 'Dhaka', 'BD');

Update Rule | Alter information in a table
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example, update data

UPDATE Instructor
SET city='Toronto'
WHERE firstname='Sandip';

Example, update multiple columns

UPDATE Instructor
SET city='Dubai', country='AE'
WHERE ins_id=5;

Delete | Remove one or more rows from a table
DELETE FROM table_name
WHERE condition;

Example, delete one row from the table

DELETE FROM Instructor
WHERE ins_id = 6;

Delete vs Drop vs Truncate

DROP command is used to remove schema, table, domain or Constraints from the database.

Truncate command is used to delete the data inside a table, but not the table itself.

DELETE command is used to remove some or all the tuples from the table.

Rollback: One can make use of this command if they wish to undo any changes or alterations since the execution of the last COMMIT.

JOIN statement

A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. Different types of Joins are:

Aggregate Functions

An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*), aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.

Various aggregate functions are:

1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()

A few clauses that are used with the SELECT statement

COUNT: retrieves the number of rows

DISTINCT is used to remove duplicate values from a result set.

LIMIT: restricting the number of rows retrieved from the database.

Union vs Join:

Join: It combines data into new columns.

Union: It combines data into new rows.

SELECT Name
FROM Boys
WHERE Rollno < 16

UNION

SELECT Name
FROM Girls
WHERE Rollno > 9;


Result:
------------------
Name
-------------------
Soumik
Sadman
Kabir
Khan
Zara
Kona
Rini
Mini
....

SELECT Boys.Name, Boys.Age, Girls.Address
FROM Boys
INNER JOIN Girls
ON Boys.Rollno = Girls.Rollno;

Result:
-------------------------
Name     Age      Address
-------------------------
Soumik   27       Dhaka
Kabir    31       New York
....     ....     .....
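
To try the two queries, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table contents are hypothetical sample rows invented for the example, so the results differ from the listings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Boys  (Rollno INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE Girls (Rollno INTEGER, Name TEXT, Address TEXT);
    INSERT INTO Boys  VALUES (1, 'Soumik', 27), (2, 'Kabir', 31);
    INSERT INTO Girls VALUES (1, 'Zara', 'Dhaka'), (3, 'Kona', 'New York');
""")

# UNION stacks the results: more rows, same single column.
union_rows = conn.execute(
    "SELECT Name FROM Boys UNION SELECT Name FROM Girls"
).fetchall()

# INNER JOIN widens the result: columns from both tables, matching rows only.
join_rows = conn.execute(
    "SELECT Boys.Name, Boys.Age, Girls.Address "
    "FROM Boys INNER JOIN Girls ON Boys.Rollno = Girls.Rollno"
).fetchall()

print(union_rows)
print(join_rows)
```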

Entity-relationship diagrams

Entity: Table

Attribute: Columns

Data type, CHAR vs VARCHAR

A CHAR field is a fixed length, and VARCHAR is a variable length field.

This means that the storage requirements are different - a CHAR always takes the same amount of space regardless of what you store, whereas the storage requirements for a VARCHAR vary depending on the specific string stored.

Because the size of a CHAR field is known in advance, rows can have a fixed layout, which can make searching and indexing faster in some database engines.

Indexing in DB

Consider a book of 1000 pages, divided into 10 chapters of 100 pages each.

Now, imagine you want to find the chapter that contains the word "Alchemist". Without an index page, you have to scan through the entire book, i.e. all 1000 pages.

This is known as a "Full Table Scan" in the database world.

[Figure: book index]

But with an index page, you know where to go! To look up any particular chapter, you only need to consult the index page each time. After finding the matching entry, you can jump straight to that chapter and skip the rest.

But then, in addition to actual 1000 pages, you will need another ~10 pages to show the indices, so totally 1010 pages.

Thus, the index is a separate section that stores values of indexed column + pointer to the indexed row in a sorted order for efficient look-ups.
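
The same idea as a minimal Python sketch: a sorted list of (value, pointer) pairs searched with binary search, compared with a scan over every row. Real databases typically use B-tree or similar structures; the `rows` data and function names here are assumptions made for illustration.

```python
import bisect

# hypothetical table: (row_id, title) rows stored in insertion order
rows = [(1, "Siddhartha"), (2, "Alchemist"), (3, "Dune"), (4, "Emma")]

def full_table_scan(title):
    for position, row in enumerate(rows):   # may touch every row
        if row[1] == title:
            return position
    return None

# the index: sorted (indexed value, pointer to row) pairs, stored separately
index = sorted((row[1], position) for position, row in enumerate(rows))
keys = [key for key, _ in index]

def index_lookup(title):
    i = bisect.bisect_left(keys, title)     # binary search over the sorted keys
    if i < len(keys) and keys[i] == title:
        return index[i][1]                  # follow the pointer to the row
    return None

print(rows[index_lookup("Alchemist")])
```

The index costs extra storage and must be kept in sync with the table, which is the "extra 10 pages" from the book analogy.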

Foreign key

Foreign Key references the primary key of another Table! It helps connect your Tables.

Normalization

Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like insertion, update and deletion anomalies. Normalization rules divide larger tables into smaller tables and link them using relationships.

If a table is not properly normalized and has data redundancy, it will not only take up extra storage space but will also make the database difficult to handle and update without data loss. Insertion, update and deletion anomalies are very frequent if the database is not normalized.

1NF
  1. It should only have single valued attributes/columns.
  2. All the columns in a table should have unique names.
  3. Values stored in a column should be of the same domain.
2NF
  1. Should be in 1NF.
  2. Every non-key column should depend on the whole primary key, not on just a part of it (no partial dependency).
3NF
  1. satisfy 2NF.
  2. Has no transitive functional dependencies.

(A transitive functional dependency is when changing a non-key column might cause another non-key column to change.)

SQLite Database Creation: Flask

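# create the database file and list its tables
$ sqlite3 database.db
sqlite> .tables
sqlite> .exit
# after defining the database path in the app code, create the tables
$ python
>>> from app import db
>>> db.create_all()
>>> exit()
# open the database again and inspect it
$ sqlite3 database.db
sqlite> .tables
sqlite> select * from table_name;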

CMOS

Stands for "Complementary Metal Oxide Semiconductor." It is a technology used to produce integrated circuits. CMOS circuits are found in several types of electronic components, including microprocessors, static RAM, and digital camera image sensors.

Profiling

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

Babel

Babel is a transpiler that converts modern JavaScript syntax (ES2015 and later) into backwards-compatible JavaScript that current and older browsers can run.

HTML class vs ID

The difference between an ID and a class is that an ID is only used to identify one single element in our HTML. … However, a class can be used to identify more than one HTML element.

Vue.js commands

# check vue version
$ vue --version
# create the app from the current directory
$ vue create <app-name>
# run the app to browser
$ npm run serve

Abstract class

An abstract class is a class that is declared abstract; it usually contains one or more abstract methods. An abstract method is a method that is declared but contains no implementation. Abstract classes cannot be instantiated, but they can be subclassed, and subclasses are required to provide implementations for the abstract methods.

Python on its own doesn’t provide abstract classes. Yet, Python comes with a module which provides the infrastructure for defining Abstract Base Classes (ABCs). This module is called - for obvious reasons - abc.

The following Python code uses the abc module and defines an abstract base class:

from abc import ABC, abstractmethod

class AbstractClassExample(ABC):

    def __init__(self, value):
        self.value = value
        super().__init__()

    @abstractmethod
    def do_something(self):
        pass

We will now define a subclass of the previously defined abstract class. Notice that we have not implemented the do_something method, even though we are required to, because it is decorated as an abstract method with the "abstractmethod" decorator. As a result, we get an exception saying that Add42 can't be instantiated.

class Add42(AbstractClassExample):
    pass

x = Add42(4)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-2bcc42ab0b46> in <module>
      2     pass
      3
----> 4 x = Add42(4)

TypeError: Can't instantiate abstract class Add42 with abstract methods do_something

We will do it the correct way in the following example, in which we define two classes inheriting from our abstract class:

class Add42(AbstractClassExample):

    def do_something(self):
        return self.value + 42

class Mul42(AbstractClassExample):

    def do_something(self):
        return self.value * 42

x = Add42(10)
y = Mul42(10)

print(x.do_something())
print(y.do_something())

Output:

52
420

Read more…

Python collections module

Read here

Static method vs Instance method

Static Method

Static methods are methods that can be called without creating an object of the class. They can be just called by referring the class name. For example: if we have a class:

class ClassName {
  static getPosts() {}
}

Then I don't need to create an object with something like obj = new ClassName(). I can call ClassName.getPosts() directly.

Pythonic way to create a static method: Most common form is to put a @staticmethod decorator on top of the function.

class MyClass:
    @staticmethod
    def hello():
        print('static method called')

# call the static method on the class itself, without creating an object
MyClass.hello()

Output:

static method called

Instance Method

Pythonic way to create an instance method: Instance method must contain a self parameter.

class MyClass:
    def instance_method(self):
        return 'instance method called', self

When do we want to use a static method?

  1. When we don't want subclasses to change or override a specific implementation of a method (this holds in languages such as Java; in Python a subclass can still redefine a static method).
  2. When a particular piece of code is to be shared by all the instance methods.
  3. When you are writing utility classes that are not supposed to be changed.

What is the purpose of self keyword in Python?

self represents the instance of the class. Using self, we can access the attributes and methods of that instance in Python.

Let’s say you have a class ClassA which contains a method methodA defined as:

def methodA(self, arg1, arg2):
    # do something
    pass

and ObjectA is an instance of this class.

Now when ObjectA.methodA(arg1, arg2) is called, python internally converts it for you as:

ClassA.methodA(ObjectA, arg1, arg2)

The self variable refers to the object itself. The self parameter is a reference to the current instance of the class and is used to access variables that belong to that instance. It does not have to be named self; you can call it whatever you like, but it has to be the first parameter of every instance method in the class.
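
A short sketch of both points. The method bodies and `methodB` are invented for the example: the two call forms give the same result, and a first parameter with another name (`this` here) works the same way, although self is the convention.

```python
class ClassA:
    def __init__(self, name):
        self.name = name

    def methodA(self, arg1, arg2):
        return f"{self.name}: {arg1 + arg2}"

    def methodB(this, times):      # works, but "self" is the usual name
        return this.name * times

ObjectA = ClassA("obj")
print(ObjectA.methodA(1, 2))            # the instance is passed implicitly
print(ClassA.methodA(ObjectA, 1, 2))    # the instance is passed explicitly
print(ObjectA.methodB(2))
```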

None keyword in python

The None keyword is used to define a null value, or no value at all. None is not the same as 0, False, or an empty string. None is a data type of its own (NoneType) and only None can be None.

list pop()

Code:

queue = [1, 2, 3]
queue.pop()  # delete the last item
print(queue)

Output:

[1, 2]

Code:

queue = [1, 2, 3]
queue.pop(0)  # delete the first item
print(queue)

Output:

[2, 3]

List comprehension in Python

code:

words = ['data','science','machine','learning']

# for loop
a = []
for word in words:
    a.append(len(word))

# list comprehension
b = [len(word) for word in words]

print(f"a is {a}")
print(f"b is {b}")

output:

a is [4, 7, 7, 8]
b is [4, 7, 7, 8]

code:

# for loop
a = []
for word in words:
    if len(word) > 5:
        a.append(word)

#list comprehension
b = [word for word in words if len(word) > 5]

print(f"a is {a}")
print(f"b is {b}")

output:

a is ['science', 'machine', 'learning']
b is ['science', 'machine', 'learning']

code:

# for loop
a = []
for word in words:
    for letter in word:
        if letter in ["a", "e", "i"]:
            a.append(letter)


# list comprehension
b = [letter for word in words for letter in word if letter in ["a","e","i"]]

Read more..

List and Tuple difference

List | Tuple
Lists are mutable, which means they can be edited. | Tuples are immutable, which means they can't be edited.
Lists are slower than tuples. | Tuples are faster than lists.

What is the difference between Python Arrays and lists?

Arrays (from the array module) and lists in Python are both ordered sequences. But an array can hold elements of only a single data type, whereas a list can hold elements of any data type.
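
A quick sketch of the difference, assuming "array" means the standard library's array module (not NumPy). The variable names are made up for the example.

```python
from array import array

numbers = array("i", [1, 2, 3])    # type code "i": signed integers only
mixed = [1, "two", 3.0]            # a list accepts any mix of types

try:
    numbers.append("four")         # wrong type for this array
except TypeError as error:
    print("array rejected the value:", error)

print(numbers.tolist(), mixed)
```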

Difference between HashTable and HashMap

HashMap | Hashtable
Not synchronized, so it's not thread-safe. | Synchronized, which means it's thread-safe.
Allows one null key and multiple null values. | Doesn't allow any null key or value.

Python dictionaries are based on a well-tested and finely tuned hash table implementation that provides the performance characteristics you’d expect: O(1) time complexity for lookup, insert, update, and delete operations in the average case.

Python’s set() also uses hashtable as its underlying data structure.

Synchronized

Synchronized means that, in a multi-threaded environment, an object with synchronized methods or blocks does not let two threads access those methods or blocks at the same time. This means that one thread can't read while another thread updates it.

How does the Google “Did you mean?” algorithm work?

According to Douglas Merrill, former CIO of Google, it works roughly like this:

  1. You write a ( misspelled ) word in Google.

  2. You don’t find what you wanted ( don’t click on any results ).

  3. You realize you misspelled the word so you rewrite the word in the search box.

  4. You find what you want ( you click in the first links ).

This pattern, multiplied millions of times, shows which misspellings are most common and which corrections are most common for each of them.

This way, Google can offer spelling corrections almost instantaneously, in every language.

It also means that if everyone started spelling “night” as “nigth” overnight, Google would suggest that spelling instead. Douglas describes it as “statistical machine learning”.
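
The idea can be sketched as simple counting. This is only a toy illustration of the pattern described above, not Google's actual implementation; the log format and the function names are assumptions made for the example:

```python
from collections import Counter, defaultdict


def build_corrections(query_log):
    """Count how often one query is retyped as another.

    Each log entry is a hypothetical tuple:
    (first_query, clicked_after_first, second_query, clicked_after_second)
    """
    counts = defaultdict(Counter)
    for first, first_clicked, second, second_clicked in query_log:
        # Steps 1-4: no click on the first query, then a rewritten
        # query that did lead to a click.
        if not first_clicked and second_clicked and first != second:
            counts[first][second] += 1
    return counts


def suggest(counts, query):
    """Return the most frequent correction for a query, or None."""
    if query not in counts:
        return None
    return counts[query].most_common(1)[0][0]


log = [
    ("nigth", False, "night", True),
    ("nigth", False, "night", True),
    ("nigth", False, "nigh", True),
    ("night", True, "night sky", True),
]
corrections = build_corrections(log)
print(suggest(corrections, "nigth"))
```

With enough log entries, the most frequent rewrite of a query becomes its suggested correction.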

Proxy Server

A proxy server is an intermediate server between a client and the Internet. Proxy servers offer the following basic functionalities:

Purpose of Proxy Servers:


SQL vs NoSQL: what’s the best option for you?

1. Data structure

The first and primary factor in making the SQL vs. NoSQL decision is what your data looks like. If your data is primarily structured, a SQL database is likely the right choice. A SQL database is a great fit for transaction-oriented systems such as customer relationship management tools, accounting software, and e-commerce platforms. Each row in a SQL database is a distinct entity (e.g. a customer), and each column is an attribute that describes that entity (e.g. address, job title, item purchased).

NoSQL (MongoDB, Redis, Cassandra) examples:

  1. Big data applications.
  2. Rapidly growing applications that need scalability.
  3. Social media (e.g. Facebook).

SQL examples:

  1. Transaction systems.
  2. Banking systems.
  3. Customer relationship systems.
  4. E-commerce.

Difference between JPG and PNG

PNG stands for Portable Network Graphics and uses lossless compression.

JPEG (or JPG) stands for Joint Photographic Experts Group and uses lossy compression.

Because JPEG compression is lossy, an image may lose some of its data when it is saved, whereas PNG compression is lossless, so no image data is lost.
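
The difference between lossless and lossy can be shown with a toy sketch. PNG compresses with DEFLATE, which zlib also implements, so the lossless half is close in spirit. The lossy half is only a stand-in: real JPEG uses a transform plus quantization, and the simple rounding below is an assumption made to keep the example short.

```python
import zlib

# A few made-up grayscale pixel values (0-255).
pixels = bytes([12, 13, 200, 201, 90, 91, 255, 0] * 4)

# Lossless: compress and decompress, and every byte comes back.
restored = zlib.decompress(zlib.compress(pixels))

# Lossy (toy): round each value down to a multiple of 16.
# The fine detail is thrown away and cannot be recovered.
quantized = bytes((value // 16) * 16 for value in pixels)

print(restored == pixels)
print(quantized == pixels)
```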

Web Server

  1. Apache HTTP Server
  2. Gunicorn
  3. Nginx (pronounced “engine x”), configured through its own directive-based configuration files rather than JSON
  4. Unicorn

ASGI (Asynchronous Server Gateway Interface) server implementation

  1. Uvicorn

Metadata

Metadata is “data that provides information about other data”. In other words, it is “data about data”.

Docker Basics

Containerization

In the software development process, code developed on one machine often does not work on another machine because of differing dependencies. Containerization addresses this problem: the application being developed and deployed is bundled together with all its configuration files and dependencies. This bundle is called a container. When you run the application on another system, you deploy the container, which provides a consistent environment because all the dependencies and libraries are packaged with it. Docker is the best-known containerization platform, and Kubernetes is widely used to orchestrate containers.

What is Docker Compose? What can it be used for?

Docker Compose is a tool that lets you define multiple containers and their configurations via a YAML or JSON file. The most common use for Docker Compose is when your application has one or more dependencies, e.g., MySQL or Redis. Normally, during development, these dependencies are installed locally—a step that then needs re-doing when moving to a production setup. You can avoid these installation and configuration parts by using Docker Compose.

Once set up, you can bring all of these containers/dependencies up and running with a single docker-compose up command.

If you wish to use a base image and make modifications or personalize it, how do you do that?

You pull the image from Docker Hub onto your local system with one simple command:

$ docker pull <image_name>

How do you create a docker container from an image?

Pull an image from docker repository with the above command and run it to create a container. Use the following command:

$ docker run -it -d <image_name>

-d starts the container in detached (background) mode; -i keeps STDIN open and -t allocates a pseudo-terminal.

How do you list all the running containers?

The following command lists down all the running containers:

$ docker ps

Suppose you have 3 containers running and out of these, you wish to access one of them. How do you access a running container?

The following command lets us access a running container:

$ docker exec -it <container id> bash

Can I use JSON instead of YAML for my compose file in Docker?

You can use JSON instead of YAML for your Compose file. To use a JSON file with Compose, specify the filename with the -f option, for example:

$ docker-compose -f docker-compose.json up

Class Method vs Static Method in Python

A staticmethod is a method that knows nothing about the class or instance it was called on. It just gets the arguments that were passed, no implicit first argument. We can use static method to create utility functions. It’s a way of putting a function into a class (because it logically belongs there), while indicating that it does not require access to the class.

With classmethods, the class of the object instance is implicitly passed as the first argument instead of self.

To decide whether to use @staticmethod or @classmethod, look inside your method. If it accesses other variables or methods of the class, use @classmethod. If it does not touch any other part of the class, use @staticmethod.

class Apple:

    _counter = 0

    @staticmethod
    def about_apple():
        print('Apple is good for you.')

        # Note: a static method can still access other members of the class,
        # but only through the explicit class name, which means repeating
        # the class name inside the method.
        #
        # For example:
        # in a @staticmethod:
        #    print('Number of apples juiced: %s' % Apple._counter)
        #
        # in a @classmethod:
        #    print('Number of apples juiced: %s' % cls._counter)
        #
        # @classmethod is especially useful when you move the function to another class,
        # because you don't have to rename the class reference.

    @classmethod
    def make_apple_juice(cls, number_of_apples):
        print('Make juice:')
        for i in range(number_of_apples):
            cls._juice_this(i)

    @classmethod
    def _juice_this(cls, apple):
        print('Juicing %d...' % apple)
        cls._counter += 1
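
A second, smaller sketch shows another common use of each decorator. The Temperature class and its subclass are hypothetical examples, not part of the notes above: the static method is a plain utility function, and the class method acts as an alternative constructor that also works for subclasses.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def fahrenheit_to_celsius(fahrenheit):
        # Utility function: needs neither the class nor an instance.
        return (fahrenheit - 32) * 5 / 9

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is whichever class the method was called on,
        # so subclasses get instances of their own type.
        return cls(cls.fahrenheit_to_celsius(fahrenheit))


class LabTemperature(Temperature):
    pass


boiling = Temperature.from_fahrenheit(212)
lab_reading = LabTemperature.from_fahrenheit(32)

print(boiling.celsius)
print(type(lab_reading).__name__, lab_reading.celsius)
```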

Database basics

Volatile vs Non-volatile

Volatile: Power-off -> Data lost

Non-Volatile: Power-off -> Still data remains.

Different ERD schemas:

  1. Star schema
  2. Constellation schema
  3. Snowflake schema

Database vs Data warehouse

A database is a collection of related data that represents some elements of the real world, whereas a data warehouse is an information system that stores historical and cumulative data from single or multiple sources. A database is designed to record data, whereas a data warehouse is designed to analyze data.

DB: designed to record/store data.

DW: designed to analyze data.

Joins in SQL

The OUTER keyword is optional, and so is INNER.

The following list shows equivalent join syntaxes with and without the optional keyword:

LEFT OUTER JOIN => LEFT JOIN
RIGHT OUTER JOIN => RIGHT JOIN
FULL OUTER JOIN => FULL JOIN
INNER JOIN => JOIN
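
The equivalence can be checked with a small sketch using Python's built-in sqlite3 module. The table names and rows are made up for the example, and only LEFT and INNER joins are shown because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'book');
""")

# The same query, with only the join keyword swapped in.
query = """
    SELECT c.name, o.item
    FROM customers AS c
    {join} orders AS o ON o.customer_id = c.id
    ORDER BY c.id
"""

left = conn.execute(query.format(join="LEFT JOIN")).fetchall()
left_outer = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()
inner = conn.execute(query.format(join="JOIN")).fetchall()
inner_explicit = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(left == left_outer, left)
print(inner == inner_explicit, inner)
```

The left joins keep the customer without an order (with a NULL item), while the inner joins drop that row.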

How Django works

  1. The entry points to a Django application are URLs. A URL can be as simple as www.example.com or more complex, like www.example.com/whatever/you/want/. When a user accesses a URL, Django passes the request to a view for processing.
  2. Requests are processed by views. Django views are custom Python code that gets executed when a certain URL is accessed. Views can be as simple as returning a string of text to the user. They can also be complex: querying databases, processing forms, processing credit cards, etc. Once a view is done processing, a web response is sent back to the user.
  3. Most often, these web responses are HTML web pages showing a combination of text and images. These pages are created using Django’s templating system.

RSS feeds

RSS is another format, like HTML pages. RSS feeds are written in XML.
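
Because a feed is plain XML, it can be read with a standard XML parser. A minimal sketch, using a made-up feed and Python's built-in xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed, kept short for illustration.
rss_text = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://www.example.com/</link>
    <description>A made-up feed.</description>
    <item>
      <title>First post</title>
      <link>https://www.example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://www.example.com/second</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(rss_text)
feed_title = root.findtext("channel/title")
item_titles = [item.findtext("title") for item in root.iter("item")]

print(feed_title)
print(item_titles)
```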

What is a back-end?

The back-end is all of the technology required to process the incoming request and generate and send the response to the client. This typically includes three major parts:

Server

A server is simply a computer that listens for incoming requests. The server runs an app that contains logic about how to respond to various requests based on the HTTP verb and the Uniform Resource Identifier (URI). The server should not send more than one response per request.

Routing

The pair of an HTTP verb and a URI is called a route and matching them based on a request is called routing.
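
A framework-free sketch of this idea: a route table keyed by (verb, URI) pairs. The handler names and paths are hypothetical, and real frameworks add features such as path parameters on top of this.

```python
def list_users():
    return 200, "all users"


def create_user():
    return 201, "user created"


# Each route is a pair of an HTTP verb and a URI.
ROUTES = {
    ("GET", "/users"): list_users,
    ("POST", "/users"): create_user,
}


def route(verb, uri):
    """Find the handler registered for this verb and URI, then call it."""
    handler = ROUTES.get((verb.upper(), uri))
    if handler is None:
        return 404, "not found"
    return handler()


print(route("GET", "/users"))
print(route("POST", "/users"))
print(route("GET", "/missing"))
```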

Middleware

Middleware is any code that executes between the server receiving a request and sending a response. These middleware functions might modify the request object, query the database, or otherwise process the incoming request. Middleware functions typically end by passing control to the next middleware function, rather than by sending a response.

Eventually, a middleware function will be called that ends the request-response cycle by sending an HTTP response back to the client.
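
The chain can be sketched without any framework. In this hypothetical example, each middleware receives the request plus the next function in the chain; it either passes control along or ends the cycle by returning a response itself. Requests and responses are plain dictionaries purely for illustration.

```python
def logging_middleware(request, next_handler):
    # Modifies the request object, then passes control on.
    request.setdefault("log", []).append(
        f"{request['method']} {request['path']}"
    )
    return next_handler(request)


def auth_middleware(request, next_handler):
    if request.get("user") is None:
        # Ends the request-response cycle early.
        return {"status": 401, "body": "unauthorized"}
    return next_handler(request)


def final_handler(request):
    return {"status": 200, "body": f"hello {request['user']}"}


def build_pipeline(middlewares, handler):
    """Wrap the handler so the middlewares run in the given order."""
    def wrap(middleware, next_handler):
        return lambda request: middleware(request, next_handler)

    for middleware in reversed(middlewares):
        handler = wrap(middleware, handler)
    return handler


app = build_pipeline([logging_middleware, auth_middleware], final_handler)

ok_request = {"method": "GET", "path": "/", "user": "ada"}
print(app(ok_request))
print(app({"method": "GET", "path": "/"}))
```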

Transfer data between client and server

HTTP, FTP, and SCP are common protocols for transferring data between a client and a server.

The basic distinction between HTTP and FTP is that HTTP delivers a web page from a web server to a web browser on request, whereas FTP is used to upload or download files between a client and a server.

The difference between SOAP and REST

Web services are categorised into two types: SOAP and REST. Typically, SOAP and REST are the methods used to call web services. There are several differences between them. Firstly, SOAP relies on XML, while REST can support various formats such as HTML, XML, and JSON. Another significant difference is that SOAP is a protocol, whereas REST is an architectural style.

SOAP -> XML

REST -> JSON, HTML, XML

REST API using Flask

When to create an API

In general, consider an API if:

  1. Your data set is large, making download via FTP unwieldy or resource-intensive.
  2. Your users will need to access your data in real time, such as for display on another website or as part of an application.
  3. Your data changes or is updated frequently.
  4. Your users only need access to a part of the data at any one time.
  5. Your users will need to perform actions other than retrieve data, such as contributing, updating, or deleting data.

If you have data you wish to share with the world, an API is one way you can get it into the hands of others. However, APIs are not always the best way of sharing data with users. If the size of the data you are providing is relatively small, you can instead provide a “data dump” in the form of a downloadable JSON, XML, CSV, or SQLite file. Depending on your resources, this approach can be viable up to a download size of a few gigabytes.

REST (REpresentational State Transfer)

REST is a philosophy that describes some best practices for implementing APIs.

REST means that when a client machine requests information about a resource from a server, the server transfers the current state of that resource back to the client.

There are a few methods in this which are as follows.
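
As an illustration, the HTTP methods most often used with REST are GET, POST, PUT, and DELETE. The sketch below shows how they might map onto a resource, under loud assumptions: the books store, the handle_request function, and the status codes chosen are all hypothetical, and no real web framework is involved.

```python
import json

# Hypothetical in-memory store of "book" resources, keyed by id.
books = {1: {"title": "Dune"}}


def handle_request(method, book_id=None, payload=None):
    """Return (status_code, body) for a request on a book resource."""
    not_found = (404, json.dumps({"error": "not found"}))

    if method == "GET":      # read the current state of a resource
        if book_id not in books:
            return not_found
        return 200, json.dumps(books[book_id])
    if method == "POST":     # create a new resource
        new_id = max(books, default=0) + 1
        books[new_id] = payload
        return 201, json.dumps({"id": new_id})
    if method == "PUT":      # replace an existing resource
        if book_id not in books:
            return not_found
        books[book_id] = payload
        return 200, json.dumps(payload)
    if method == "DELETE":   # remove a resource
        if books.pop(book_id, None) is None:
            return not_found
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


print(handle_request("GET", 1))
```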

Create REST using Flask

Flask-RESTful can be used to build REST APIs.

Why do we need to define a constructor

A constructor is generally used to set initial values for any of the fields (aka variables). It may also be used to “set up” anything you need for a class when you instantiate it.

A constructor is a method that is only called at the time of instantiation. You cannot ever explicitly call it, therefore, if you ever want to change the value of a field, you have to create methods other than the constructor to do so.

You don’t have to create a constructor at all. A default constructor will run anyway and set all fields to zero, null, etc., so if that’s all you plan to do, don’t bother.

Additionally, you can create overloaded constructors for different situations. You can have constructors that set values for any combination of variables and you can specify whether those values come from the program instantiating the class.
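
The description above reads like Java. In Python, the closest equivalent is the __init__ method, and since Python has no constructor overloading, default arguments usually cover the same need. A minimal sketch with a hypothetical Account class:

```python
class Account:
    def __init__(self, owner, balance=0):
        # Set the initial values of the fields.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # Fields are changed later through ordinary methods,
        # not by calling the constructor again.
        self.balance += amount


default_account = Account("Ada")               # balance falls back to 0
funded_account = Account("Bob", balance=100)   # balance supplied by caller
funded_account.deposit(50)

print(default_account.balance)
print(funded_account.balance)
```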

Compile time vs Run time

Compile-time: the period during which you, the developer, are compiling your code.

Run-time: the period during which a user is running your software.
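
The two phases produce different kinds of errors. The sketch below uses Python's built-in compile() and exec() as a stand-in for a separate compile step (Python compiles source to bytecode before running it); the classify_error helper is a hypothetical name for this example.

```python
def classify_error(source):
    """Report whether a snippet fails while compiling or while running."""
    try:
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "compile-time error"   # never got as far as running

    try:
        exec(code, {})
    except Exception:
        return "run-time error"       # compiled fine, failed while running
    return "ok"


print(classify_error("x = (1 +"))     # malformed source
print(classify_error("x = 1 / 0"))    # valid source, fails when executed
print(classify_error("x = 1 + 1"))
```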

Interpreted language vs compiled language

We need to convert our source code (high-level language) into binary machine code (low-level), so that our computer can understand it. There are mainly two ways to do these translations.

  1. Compiling the source code.
  2. Interpreting the source code.

Luckily as a programmer, we don’t need to worry about these things, because the languages themselves take care of these things, unless we are designing a programming language by ourselves.

Now let’s think of a scenario where I am the programmer and you are a consumer. Now I want to send my coded application to you.

One way to do this is to compile my source code on my computer using a compiler, which takes my human-readable source code and translates it into binary machine code. At this point, I have two files: the original source code and the machine-executable binary. Now I can send the executable binary file to the consumers so that they can run my application; I don’t need to send them the source code. Compiled languages mainly work this way. Examples of compiled languages: C, C++, Rust, Go.

The second way to distribute my program is to send the actual source code to the consumer instead of an executable binary file. The consumer then downloads an interpreter that can execute my source code on the fly. In this case, the interpreter goes through the source code one line at a time, converts each line to the equivalent binary code, and runs it immediately before moving on to the next line. Examples of interpreted languages: Python, JavaScript, Ruby, PHP.

Benefits of compiled languages:

  1. It’s always ready to run. Once it is compiled and I have the executable binary file, I can send that file to millions of consumers immediately.
  2. It can be optimized for CPU usage. So, it is often faster.
  3. The source code is private.

Disadvantages/ downsides of compiled languages:

  1. If I compile it on a PC, the executable file will not work on a Mac. It often needs to be compiled separately even for different types of CPU on the same operating system.

Benefits of interpreted languages:

  1. We don’t need to care about what kind of machine we are working on. Because we don’t distribute the executable file, we only send the source code. So, it is more portable and flexible across different platforms.
  2. It is also easier to test and debug because you only need to write your source code and test it.

Disadvantages/ downsides of interpreted languages:

  1. Slower compared to compiled languages.
  2. An interpreter is required.
  3. Source code is public.

Nowadays, however, many interpreted language implementations use JIT (just-in-time) compilation, which makes them faster.

Loose Coupling vs Tight Coupling

Loose coupling implies that services are independent so that changes in one service will not affect any other. The more dependencies you have between services, the more likely it is that changes will have wider, unpredictable consequences.

In a tightly coupled system, your performance is largely dictated by your slowest component. For example, microservice architectures with services that collaborate via HTTP-based APIs can be vulnerable to cascading performance problems where one component slows down. If your services are decoupled, you will have more freedom to optimise them individually for specific workloads.
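
At the code level, loose coupling is often achieved by passing a dependency in rather than hard-coding it. A minimal sketch with hypothetical class names: OrderService only relies on its notifier having a send method, so either notifier can be swapped in without changing OrderService.

```python
class EmailNotifier:
    def send(self, message):
        return f"email: {message}"


class SmsNotifier:
    def send(self, message):
        return f"sms: {message}"


class OrderService:
    def __init__(self, notifier):
        # The dependency is injected, not created here, so this class
        # is not tied to one specific notifier implementation.
        self.notifier = notifier

    def place_order(self, item):
        return self.notifier.send(f"order placed for {item}")


email_result = OrderService(EmailNotifier()).place_order("book")
sms_result = OrderService(SmsNotifier()).place_order("book")

print(email_result)
print(sms_result)
```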

Get vs Post request

GET is used for viewing something, without changing it, while POST is used for changing something. For example, a search page should use GET to get data while a form that changes your password should use POST. Essentially GET is used to retrieve remote data, and POST is used to insert/update remote data.
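
A small sketch of the two request types using Python's built-in urllib. The URLs and form fields are made up, and the requests are only constructed here, never sent. With urllib, a request without a body defaults to GET and one with a body defaults to POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: the parameters travel in the URL's query string.
search = Request(
    "https://www.example.com/search?" + urlencode({"q": "python"})
)

# POST: the data travels in the request body.
change_password = Request(
    "https://www.example.com/password",
    data=urlencode({"new_password": "s3cret"}).encode("utf-8"),
)

print(search.get_method(), search.full_url)
print(change_password.get_method(), change_password.data)
```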

Get

Post

Distributed computing vs Parallel computing

Parallel Computing

In parallel computing, multiple processors perform multiple tasks assigned to them simultaneously. Memory in parallel systems can be either shared or distributed. Parallel computing provides concurrency and saves time and money.

Distributed Computing

In distributed computing, multiple autonomous computers appear to the user as a single system. In distributed systems, there is no shared memory, and computers communicate with each other through message passing. A single task is divided among different computers.

| Parallel | Distributed |
| --- | --- |
| Many operations are performed simultaneously. | System components are located at different locations. |
| A single computer is required. | Multiple computers are used. |
| Multiple processors perform multiple operations. | Multiple computers perform multiple operations. |
| Processors communicate with each other through a bus. | Computers communicate with each other through message passing. |

Why do we use a CSRF token?

A CSRF token is a secure random token (e.g., a synchronizer token or challenge token) that is used to prevent CSRF attacks. The token needs to be unique per user session and should be a large random value so that it is difficult to guess. A CSRF-secure application assigns a unique CSRF token to every user session.
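
A minimal sketch of issuing and checking such a token with Python's standard library. The session is modelled as a plain dictionary and the function names are hypothetical; web frameworks such as Django ship their own CSRF protection, which should be preferred in practice.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a large random token and remember it in the session."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session, submitted_token):
    """Check the token sent with a form against the session's token."""
    expected = session.get("csrf_token")
    if expected is None or submitted_token is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), submitted_token.encode("utf-8")
    )


session = {}
token = issue_csrf_token(session)
print(is_valid_csrf_token(session, token))
print(is_valid_csrf_token(session, "forged-token"))
```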

GCP and AWS Equivalent Service names

| GCP | AWS |
| --- | --- |
| Cloud Storage | S3 (Simple Storage Service) |
| Compute Engine | EC2 (Elastic Compute Cloud) |
| BigQuery | Redshift |
| Cloud Functions | Lambda |
| App Engine | Elastic Beanstalk |
| Kubernetes Engine | EKS (Elastic Kubernetes Service) |
| Cloud Firestore | DynamoDB |
| Dataflow | Amazon Kinesis |
| Dataproc | EMR (Elastic MapReduce) |

Ref: https://cloudhawk.io/blog/aws/hybrid/cloud/2019/05/02/aws-gcp-service-equivalence.html

Docker vs Kubernetes vs Docker Swarm

https://youtu.be/9_s3h_GVzZc

Caching

Caching is also useful when retrieving data from a server. Instead of requesting the server every time we need data, we can store (cache) the data locally. Though, we may need a caching strategy if we have limited cache space or if the cached data can change over time.

The caching can also be implemented on the server itself. Instead of querying a database every time a user loads a page, we can cache the content and serve it to users from the cache. Then, update our cache every once in a while.

There are different caching strategies, such as FIFO, LIFO, and LRU.

Least Recently Used (LRU)

This is probably the most famous strategy. The name says it all. It evicts the least recently used value. But what does that mean?

When you call the cached function, the results are added to the cache (that is how caching works). But when you call the cached function with a value that is already cached, it returns the cached value and puts it at the top of the cache. When the cache is full the bottom-most value is removed.

In this strategy, recently used entries are most likely to be reused.
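
The behaviour described above can be sketched with an OrderedDict, where the end of the dictionary plays the role of the "top" of the cache. The LRUCache class is a hypothetical, minimal illustration; for caching function results, Python's standard library already provides functools.lru_cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A hit moves the entry to the most recently used position.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (the oldest end).
            self._items.popitem(last=False)

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used entry
cache.put("c", 3)    # cache is full, so "b" is evicted

print(cache.keys())
```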
