Thursday, 21 December 2023

1. What is Pandas (Python pandas)?

Pandas is an open-source library for high-performance data manipulation in Python.

2. What are the different types of Data Structures in Pandas?

Pandas has two main data structures:

1) Series

2) DataFrame

3. Explain Series and DataFrame in Pandas.

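A Series is a one-dimensional labelled array that can hold data of any type (integers, floats, strings, Python objects). It has a single axis, the index, so it holds only one column of values.

A DataFrame is a two-dimensional labelled data structure with rows and columns. Each column is a Series, and different columns can hold different data types, which makes it the standard way of storing tabular data indexed by row and column labels.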
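A minimal sketch of the two structures, assuming pandas is installed; the column names and values are made up for illustration:

```python
import pandas as pd

# A Series: one-dimensional, one column of values with an index
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")
print(s.ndim)              # 1

# A DataFrame: two-dimensional, rows x columns; each column is a Series
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "score": [10, 20, 30]})
print(df.ndim)             # 2
print(type(df["score"]))   # a single column is returned as a Series
print(df.dtypes)           # columns can hold different data types
```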

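4. How do you check whether a DataFrame is empty?

The empty attribute returns True when the DataFrame has no elements (one of its axes has length 0):

import pandas as pd
df = pd.DataFrame()
print(df)
print(df.empty)   # True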

5. Create a DataFrame using a list.

import pandas as pd
list1 = [1, 2, 3, 4, 5]
df1 = pd.DataFrame(list1)   # a single column, labelled 0
df1

6. What are the most important features of the Pandas library?

1) Quick and efficient data manipulation and analysis.

2) Loading data from various file formats (CSV, Excel, JSON, SQL and others).

3) Simple handling of missing data.

4) Time-series functionality (date ranges, resampling, frequency conversion).

5) Merging and combining data sets.
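A short sketch of two of these features, missing-data handling and merging, assuming pandas and NumPy; the tables are hypothetical:

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({"id": [1, 2, 3], "amount": [100.0, np.nan, 250.0]})
regions = pd.DataFrame({"id": [1, 2, 3], "region": ["North", "South", "East"]})

print(sales["amount"].isna().sum())          # number of missing values: 1
sales["amount"] = sales["amount"].fillna(0)  # replace missing values with 0
merged = pd.merge(sales, regions, on="id")   # combine the two data sets on the id column
print(merged)
```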

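7. How to convert a NumPy array to a DataFrame of a given shape?

import numpy as np
import pandas as pd
s1 = pd.Series(np.random.randint(1, 5, 12))   # 12 random integers from 1 to 4
print(s1)
df1 = pd.DataFrame(s1.values.reshape(3, 4))   # reshape the underlying array to 3 rows x 4 columns
df1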

8. Create a DataFrame using a dictionary of lists and arrays.

import numpy as np
import pandas as pd
dict1 = {"A": [1, 2, 3, 4, 5],
         "B": np.array([10, 20, 30, 40, 50])}   # keys become column names
df = pd.DataFrame(dict1)
df

9. Write a Pandas program to add, subtract, multiply and divide two Pandas Series.

import pandas as pd
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([9, 8, 3, 5, 6])
print(s1)
print(s2)
print(s1 + s2)   # operations are aligned on the index, element by element
print(s1 - s2)
print(s1 * s2)
print(s1 / s2)

10. Explain the use of the info, describe, head and tail functions.

info

info() prints a concise summary of a DataFrame: the index range, the number of columns, column names, the count of non-null values in each column, column data types and memory usage.

describe

describe() returns descriptive statistics for a DataFrame (by default the count, mean, standard deviation, minimum, quartiles and maximum of the numeric columns).

head

head(n) returns the first n rows (5 by default).

tail

tail(n) returns the last n rows (5 by default).
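The four methods on a small, made-up DataFrame (a sketch assuming pandas):

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 11), "y": [v * 2.5 for v in range(1, 11)]})

df.info()             # index range, columns, non-null counts, dtypes, memory usage
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
print(df.head(3))     # first 3 rows
print(df.tail(3))     # last 3 rows
```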

1. Create a null vector of size 10.

import numpy as np
array1 = np.zeros(10, dtype=int)
array1

2. Create a null vector of size 10 in which the fifth value is 1.

import numpy as np
array1 = np.zeros(10, dtype=int)
array1[4] = 1   # index 4 is the fifth element
array1

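3. Write a NumPy program to add, subtract, multiply and divide arguments element-wise.

import numpy as np
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(arr1)
add = arr1 + 10          # the scalar is broadcast to every element (same as np.add(arr1, 10))
print(add)
subtract = arr1 - 10     # np.subtract
print(subtract)
multiply = arr1 * 10     # np.multiply
print(multiply)
divide = arr1 / 10       # np.divide, returns floats
print(divide)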

4. Create a vector with values ranging from 15 to 45.

import numpy as np
array1 = np.arange(15, 46)   # the stop value is excluded, so 46 is needed to include 45
array1

5. Write a NumPy program to round elements of the array to the nearest integer.

import numpy as np
arr1 = np.round([1.2, 1.22, 1.43, 2.56])   # array([1., 1., 1., 3.])
arr1

6. Write a NumPy program to convert angles from degrees to radians for all elements in a given array.

import numpy as np
rad_values = np.deg2rad([0, 30, 45, 60, 90])
rad_values

7. What is the use of the all and any functions in NumPy?

np.all() returns True if all elements of the array evaluate to True (are non-zero). For an empty array it returns True.

np.any() returns True if at least one element evaluates to True. For an empty array it returns False.
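A quick check of both functions, including the empty-array cases (a sketch assuming NumPy):

```python
import numpy as np

arr = np.array([1, 0, 3])
print(np.all(arr))            # False: the 0 evaluates to False
print(np.any(arr))            # True: at least one element is non-zero
print(np.all(arr >= 0))       # True: every element satisfies the condition
print(np.all(np.array([])))   # True for an empty array
print(np.any(np.array([])))   # False for an empty array
```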

8. Create a 3x3 matrix with values ranging from 0 to 8, i.e. [[0 1 2] [3 4 5] [6 7 8]].

import numpy as np
matrix1 = np.arange(0, 9).reshape(3, 3)
matrix1

9. Find the indices of non-zero elements in [1,2,0,0,4,0].

import numpy as np
arr1 = np.array([1, 2, 0, 0, 4, 0])
np.where(arr1 != 0)   # (array([0, 1, 4]),); np.nonzero(arr1) gives the same result

10. Create a random vector of size 30 and find the mean value.

import numpy as np
arr1 = np.random.randint(1, 30, size=30)   # 30 random integers from 1 to 29
print(arr1)
mean = np.mean(arr1)
mean

11. How to convert a 1D array to a 3D array?

import numpy as np
array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ndmin=3)   # shape (1, 1, 10)
array1

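12. How to convert a 4D array to a 2D array?

A 4D array can be converted to 2D with reshape(), as long as the total number of elements stays the same. Passing -1 for one dimension lets NumPy infer its size, e.g. arr.reshape(arr.shape[0], -1) keeps the first axis and flattens the other three.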
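A minimal sketch with NumPy; the 2x3x4x5 shape is arbitrary:

```python
import numpy as np

arr4d = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)   # 120 elements
arr2d = arr4d.reshape(arr4d.shape[0], -1)              # keep the first axis, flatten the rest
print(arr2d.shape)                                     # (2, 60)
arr2d_b = arr4d.reshape(-1, arr4d.shape[-1])           # flatten all axes except the last
print(arr2d_b.shape)                                   # (24, 5)
```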

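13. How to print only 3 decimal places in a Python NumPy array?

import numpy as np
array1 = np.around([1.2345, 2.4586, 3.59763], 3)   # rounds the values themselves
array1
np.set_printoptions(precision=3)                   # alternatively, only change how arrays are printed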

14. How to compute the median of a NumPy array?

import numpy as np
arr1 = np.arange(1, 30)   # integers 1 to 29
print(arr1)
median = np.median(arr1)
median

15. How to compute the standard deviation of a NumPy array?

import numpy as np
arr1 = np.arange(1, 30)
print(arr1)
std = np.std(arr1)   # population standard deviation; pass ddof=1 for the sample standard deviation
std

16. How to compute the mean of a NumPy array?

import numpy as np
arr1 = np.arange(1, 30)
print(arr1)
mean = np.mean(arr1)
mean

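17. How to sort an array by the nth column?

Use argsort() on that column to get the row order, then index the rows with it:

import numpy as np
arr1 = np.random.randint(1, 10, size=(4, 3))   # 4 rows x 3 columns
print(arr1)
n = 1                                          # sort by the second column
sort_arr1 = arr1[arr1[:, n].argsort()]
print(sort_arr1)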

18. How to find common values between two arrays?

import numpy as np
X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([6, 4, 5, 8, 7, 9])
XY = np.intersect1d(X, Y)   # sorted unique common values: [4 5 6]
print("Common_values", XY)

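19. How to round a float array away from zero?

Take the ceiling of each absolute value, then restore the original sign:

import numpy as np
arr1 = np.array([-2.5, -1.2, 0.3, 1.7, 3.145256])
rounded = np.copysign(np.ceil(np.abs(arr1)), arr1)   # -3, -2, 1, 2, 4
rounded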

20. Create a 3x3 identity matrix.

import numpy as np
array1 = np.identity(3, dtype=int)
array1

1. What are the key features of Python?

1. Open source and free: Python is freely available from the official website, python.org.

2. Interpreted language: Python code is executed by the interpreter line by line, without a separate compilation step.

3. High-level programming language: when we write Python programs, we don't need to understand the system architecture or manage memory manually.

4. Easy to learn: Python is fairly simple to learn compared with languages such as C, C#, JavaScript and Java.

5. Extensible: parts of a program can be written in C or C++ and compiled into extension modules that Python code can import.

6. Large standard library and ecosystem: Python ships with an extensive standard library (for example os, re and json), and third-party libraries such as NumPy, pandas and matplotlib are widely available.

7. Dynamically typed language: we don't need to declare a variable's type; it is determined at runtime.

2. How do you write comments in Python, and why are comments important?

Comments in Python:

Comments are lines in the code that are ignored by the interpreter during the execution of the program.

Single-line comments:

A comment begins with a hash character (#) and continues to the end of the line.

Multiline comments:

1) We can start each line with a hash character (#). Each line is then treated as a single-line comment.

2) We can use triple-quoted strings (""" """) as multiline comments. Strictly speaking these are string literals rather than comments, but they have no effect when they are not assigned to anything (a triple-quoted string at the start of a function, class or module becomes its docstring).

Importance:

1) Comments make our code more understandable.

2) They make the program more readable and help us remember why certain blocks of code were written.

3) Comments can also be used to temporarily disable some code while testing other blocks of code.

3. What do you mean by Python literals?

Literals are raw constant values written directly in Python source code, such as 42, 3.14 or "hello".

1) String literals: String literals are surrounded by single ('') or double ("") quotes, or by triple quotes. They can be single-line or multi-line strings.

2) Numeric literals: Numeric literals in Python can be of three numeric types: int, float and complex.

3) Boolean literals: Boolean literals can be True or False.

4) Special literal: None is a special literal in Python used to represent the absence of a value (similar to NULL in other languages).

5) Collection literals: There are four types of collection literals: list, tuple, dict and set literals.
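One example of each kind of literal (the variable names are arbitrary):

```python
s1 = 'single quotes'
s2 = """a multi-line
string literal"""
i, f, c = 42, 3.14, 2 + 3j        # int, float and complex values
flag = True                       # Boolean literal
nothing = None                    # special literal
lst, tpl = [1, 2], (1, 2)         # list and tuple literals
dct, st = {"a": 1}, {1, 2}        # dict and set literals
print(type(c).__name__, type(st).__name__)   # complex set
```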

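4. What are the escape characters in Python?

An escape character is a backslash (\) followed by a character; together they give that character a special meaning inside a string. Escape sequences insert characters that are hard to type directly, such as \n (newline), \t (tab), \\ (backslash), and \' or \" (a quote inside a string delimited by the same quote).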
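A few common escape sequences in action:

```python
print("Line one\nLine two")   # \n starts a new line
print("Name:\tPython")        # \t inserts a tab
print('It\'s escaped')        # \' puts a single quote inside single quotes
path = "C:\\Users\\data"      # \\ produces a single backslash
print(path)                   # C:\Users\data
print(len("\n"))              # 1: an escape sequence is one character
```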

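5. Write a Python program to reverse words in a string.

string = "slicing is the first and easiest way to reverse a string in python"
reversed_words = " ".join(string.split()[::-1])   # reverse the order of the words
print(reversed_words)
reversed_chars = string[::-1]                     # slicing with step -1 reverses the characters instead
print(reversed_chars)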

6. Write a Python program to swap cases of a given string.

string = "Write a Python program to swap cases of a given string"
string = string.swapcase()   # upper case becomes lower case and vice versa
print(string)

7. Write a program to find the length of the string "machine learning" with and without using the len function.

string = "machine learning"
print(len(string))   # with len(): 16
count = 0
for ch in string:    # without len(): count the characters one by one
    count = count + 1
print(count)         # 16

8. Write a Python program to count the occurrences of each word in a given sentence.

from collections import Counter
sentence = "Write a Python program to count the occurrences of each word in a given sentence."
c = Counter(sentence.rstrip(".").split())   # drop the final full stop, then split on whitespace
print(c)   # 'a' is counted twice, every other word once

9. Python program to count even and odd digits in a string

string = input("Enter a string: ")
even = 0
odd = 0
for ch in string:
    if ch.isdigit():            # only digits are counted
        if int(ch) % 2 == 0:
            even += 1
        else:
            odd += 1
print("Even digit count is:", even)
print("Odd digit count is:", odd)

10. How do you check if a string contains only digits?

string = "python and machine learning 12345"
print(string.isdigit())    # False: the string also contains letters and spaces
print("12345".isdigit())   # True

1. Why is the Random Forest algorithm popular?

Random Forest is one of the most popular and widely used machine learning techniques for classification problems. It works well for classification, but it can also be applied to regression problems.

For modern data scientists, it has become a powerful tool for improving predictive models. A major advantage of the method is that it relies on very few assumptions, which makes data preparation easier and saves time.

2. Can the Random Forest algorithm be used for both continuous and categorical target variables?

Yes. Random Forest can be used with both continuous and categorical target (dependent) variables. With a categorical target, the forest (an ensemble of decision trees) is used as a classification model; with a numerical or continuous target, it is used as a regression model.
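A minimal sketch with scikit-learn (an assumption: the text does not name a library), using synthetic data from make_classification and make_regression:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Categorical target: the forest is a classifier and predicts class labels
Xc, yc = make_classification(n_samples=200, n_features=6, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:5]))

# Continuous target: the forest is a regressor and averages the trees' numeric predictions
Xr, yr = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:5]))
```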

3. Explain the working of the Random Forest algorithm.

The random forest algorithm works in the following steps:

Step 1: Randomly select K records (with replacement, i.e. a bootstrap sample) from the N records in the dataset.

Step 2: Build and train a decision tree on these K records.

Step 3: Choose the number of trees you want in the forest, and repeat steps 1 and 2 for each tree.

Step 4: For a regression problem, every tree in the forest predicts an output value for a new data point, and the final prediction is the mean (average) of the values predicted by all the trees.

For a classification problem, each tree predicts the class to which the new data point belongs, and the data point is assigned the class that receives the most votes (the majority vote).
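The four steps can be sketched from scratch, assuming NumPy and scikit-learn's DecisionTreeClassifier for the individual trees. fit_forest and predict_forest are hypothetical helper names, K is taken equal to N, and the random feature subset at each split (max_features="sqrt") is the usual Random Forest choice rather than something stated above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                  # Step 1: bootstrap sample (K = N, with replacement)
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])                          # Step 2: train a tree on the sample
        trees.append(tree)                                # Step 3: repeat for every tree
    return trees

def predict_forest(trees, X):
    votes = np.stack([t.predict(X) for t in trees])      # shape (n_trees, n_samples)
    # Step 4 (classification): majority vote per data point; labels must be non-negative ints
    return np.array([np.bincount(col).argmax() for col in votes.T])

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
trees = fit_forest(X, y)
pred = predict_forest(trees, X)
print("training accuracy:", (pred == y).mean())
```

For regression, the vote in step 4 would be replaced by the mean of the trees' predictions.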

4. Why do we prefer a forest (collection of trees) rather than a single tree?

Overfitting is an issue that arises when a model is too flexible. A flexible model has high variance, because small changes in the training data change the learned parameters, such as the structure of the decision tree. Conversely, a rigid model has high bias, because it makes strong assumptions about the training data; it may not fit even the training data well (underfitting). A high-variance model, on the other hand, fits the training data closely but fails to generalize to new, unseen data points.

Therefore, we must carefully consider the bias-variance tradeoff when building a model. Rather than restricting the tree's depth, which reduces variance at the cost of higher bias, we can combine many deep decision trees into a forest: averaging their predictions reduces variance while keeping bias low.
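A rough comparison of one deep tree against a forest, assuming scikit-learn and synthetic data. The exact scores depend on the data, but the forest usually scores higher because averaging reduces variance:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.05, random_state=0)
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5)
print("single tree:", tree_scores.mean().round(3))
print("forest:     ", forest_scores.mean().round(3))
```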

5. What does "random" refer to in "Random Forest"?

In Random Forest, "random" refers mainly to two processes:

1) Random observations are used to grow each tree.

2) A random subset of variables is considered for splitting at each node.

Random record selection: Each tree is trained on a bootstrap sample, drawn by selecting data points at random with replacement from the original training dataset. Such a sample typically contains about 63.2% of the distinct training records (some records appear more than once), and it becomes the training set for growing that tree.

Random variable selection: At each node, a set of m independent variables (predictors) is chosen at random from all the predictor variables, and the node is split using the best split among those m variables.
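Where the 63.2% comes from: the probability that a given record is never picked in N draws with replacement is (1 - 1/N)^N, which approaches 1/e ≈ 0.368, so about 1 - 0.368 = 0.632 of the records appear in each sample. A quick simulation, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
sample = rng.integers(0, n, size=n)            # n draws with replacement
unique_fraction = len(np.unique(sample)) / n   # share of distinct records in the sample
expected = 1 - (1 - 1 / n) ** n                # close to 1 - 1/e
print(round(unique_fraction, 3), round(expected, 3))
```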

6. Does Random Forest need pruning? Why or why not?

There is no pruning in a random forest: every tree is grown to its full depth. In a single decision tree, pruning is used to prevent overfitting; it is the process of choosing the subtree that results in the lowest test error. A random forest does not need it, because averaging many deep, decorrelated trees already controls overfitting.

The term pruning is sometimes also used for ensembles, meaning the removal of some of the trees from a trained random forest. This is done to reduce the size and complexity of the model and to speed up prediction, ideally without losing accuracy.

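7. What is the importance of the max_features hyperparameter?

At each split, a random forest considers only a random subset of the features when searching for the best split; max_features sets the maximum number of features in that subset. Smaller values make the trees less correlated with each other, which reduces the variance of the forest but can increase the bias of each tree; larger values make the trees more similar. When max_features includes all the features, the forest reduces to bagged trees.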
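A sketch of the effect of max_features, assuming scikit-learn's RandomForestClassifier, where the parameter accepts "sqrt", "log2", a fraction, an integer or None (all features). The printed scores depend on the synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=16, n_informative=6, random_state=0)
results = {}
for mf in ["sqrt", "log2", 0.5, None]:   # None: every split considers all 16 features
    rf = RandomForestClassifier(n_estimators=100, max_features=mf, random_state=0)
    results[str(mf)] = cross_val_score(rf, X, y, cv=5).mean()
    print(mf, round(results[str(mf)], 3))
```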

8. What are the advantages and disadvantages of the Random Forest Algorithm?

Advantages of Random Forest

1. Random Forest can perform both Classification and Regression tasks.

2. It is capable of handling large datasets with high dimensionality.

3. It improves the accuracy of the model and reduces overfitting.

4. It reduces overfitting by averaging or combining the results of many different decision trees.

5. Random Forests generalize well to a wider range of data than a single decision tree does.

6. Random Forest has less variance than a single decision tree.

7. Random forests are very flexible and usually achieve high accuracy.

8. Data scaling is not required in a random forest algorithm; it maintains good accuracy on unscaled data.

9. Random Forest algorithms maintain good accuracy even when a large proportion of the data is missing.

Disadvantages of Random Forest

1. Although random forest can be used for both classification and regression tasks, it is less suitable for regression: its predictions are averages of training target values, so it cannot extrapolate beyond the range seen during training.

2. Complexity is the main disadvantage of Random Forest algorithms.

3. Building a random forest is harder and more time-consuming than building a single decision tree.

4. More computational resources are required to implement the Random Forest algorithm.

5. It is less intuitive and harder to interpret when it contains a large collection of decision trees.

6. Prediction can be slow compared with other algorithms, because every tree in the forest must be evaluated.

9. List the features of bagged trees.

1. They lower variance by averaging the predictions of the ensemble.

2. When choosing node splits, each tree considers the whole feature space (unlike Random Forest, which considers a random subset of features at each split).

3. The trees are grown without pruning, so they are deep, with high variance but low bias individually; averaging then reduces the variance, which can increase predictive power.
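A minimal bagged-trees sketch, assuming scikit-learn's BaggingClassifier, whose default base model is an unpruned decision tree trained on a bootstrap sample with all features available:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
bagged = BaggingClassifier(n_estimators=50, random_state=0)  # 50 unpruned trees
scores = cross_val_score(bagged, X, y, cv=5)                 # accuracy of the averaged ensemble
print(round(scores.mean(), 3))
bagged.fit(X, y)
print(len(bagged.estimators_))                               # 50
```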

10. What are the applications of random forests?

Random Forest is most commonly used in four sectors:

1. Banking: the banking industry uses it mostly to identify loan risk.

2. Medicine: it can be used to identify disease patterns and the associated risks.

3. Marketing: it can be used to analyse marketing trends.

4. E-commerce: it can power recommendation engines, which are a useful tool for cross-selling.

The Top 10 Decision Tree Algorithm Question Answer

1. Explain the Decision Tree algorithm in detail.

The Decision Tree algorithm is a supervised machine learning technique that can be used for both regression and classification, although it is most commonly used for classification problems.

A decision tree is a tree-shaped diagram used to determine a course of action. The dataset is divided into smaller and smaller subsets, which are represented by the nodes of the tree. The tree structure has three types of nodes: the root node, internal (decision) nodes and leaf nodes.

The top node is called the root node. It represents the best attribute selected for classification. Internal (decision) nodes test an attribute of the dataset, while the terminal (leaf) nodes provide the decision or classification label.

The Classification and Regression Tree (CART) algorithm is a common algorithm for building trees; ID3 and C4.5 are other well-known ones.

Decisions, or tests, are made on the features of the given dataset.

At each node, a decision tree asks a question and, based on the answer (Yes/No), splits the tree further into subtrees.
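A small example, assuming scikit-learn (whose DecisionTreeClassifier uses an optimized version of CART) and its bundled Iris dataset; export_text prints the root, decision and leaf nodes as indented text:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)
print(export_text(clf, feature_names=iris.feature_names))  # each line is a test or a leaf
print("training accuracy:", clf.score(iris.data, iris.target))
```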

2. Explain Decision tree Key terms: Root Node, Decision Node/Branch Node, Leaf or Terminal Node.

Root Node

In a decision tree, the root node is always at the top. It represents the entire population or data sample, which can be further divided into two or more sets.

Decision Node

Decision nodes have at least two branches and are sub-nodes that can divide into other sub-nodes.

Leaf Node

In a decision tree, the leaf nodes carry the outcomes. These nodes, also referred to as terminal nodes, cannot be split any further.

3. What are the steps for making a decision tree?

1. Choose the initial dataset, with the feature and target attributes defined.

2. Calculate the entropy and information gain for each attribute.

3. Pick the attribute with the highest information gain and make it the root decision node.

4. Calculate the information gain for the remaining attributes.

5. Create child nodes recursively by splitting at the decision node (i.e. create a separate child node for each value of the decision node's attribute).

6. Repeat this process until all the attributes have been used or every subset is pure.

7. Prune the tree to prevent overfitting.
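Steps 2 and 3 can be sketched in plain Python with NumPy. The entropy and information_gain helpers below are illustrative, and the 14-row outlook/play data is a made-up toy example (9 "yes", 5 "no"):

```python
from collections import Counter

import numpy as np

def entropy(labels):
    """Entropy H = -sum(p * log2(p)) over the class proportions."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the subsets for each feature value."""
    n = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [lab for lab, f in zip(labels, feature_values) if f == value]
        weighted += len(subset) / n * entropy(subset)
    return entropy(labels) - weighted

outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rain"] * 5
play = ["no", "no", "no", "yes", "yes"] + ["yes"] * 4 + ["yes", "yes", "yes", "no", "no"]
print(round(entropy(play), 3))                     # 0.94
print(round(information_gain(play, outlook), 3))   # 0.247
```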

4. What is a pure subset?

A pure subset is a subset that contains only samples of one class.

5. What are Techniques to avoid Overfitting in Decision Tree?

Pruning

Hyperparameter Tuning

Random Forest algorithm

Ensembling techniques: a) Bagging, b) Boosting

6. How do you tune the hyperparameters of a decision tree classifier?

GridSearchCV

GridSearchCV performs an exhaustive search over every combination of the hyperparameter values in a given grid, scoring each combination with cross-validation, so that the best-performing model can be chosen.

RandomizedSearchCV

RandomizedSearchCV evaluates a fixed number of hyperparameter combinations (n_iter) sampled at random from the hyperparameter space. As in grid search, the space can be defined with lists, but it can also be defined with statistical distributions.
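A minimal sketch of both search methods, assuming scikit-learn, SciPy (for the randint distribution) and the Iris dataset; the grid values are arbitrary examples:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grid search: every combination (4 x 3 = 12) is scored with 5-fold cross-validation
param_grid = {"max_depth": [2, 3, 4, None], "min_samples_leaf": [1, 5, 10]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5).fit(X, y)
print("grid search:  ", grid.best_params_, round(grid.best_score_, 3))

# Randomized search: 10 combinations sampled from integer distributions
param_dist = {"max_depth": randint(2, 8), "min_samples_leaf": randint(1, 20)}
rand = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), param_dist,
                          n_iter=10, cv=5, random_state=0).fit(X, y)
print("random search:", rand.best_params_, round(rand.best_score_, 3))
```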

7. What is pruning in a decision tree?

Pruning is the process of removing unnecessary branches from a decision tree. Some branches may reflect noisy or anomalous data rather than real patterns.

Removing such branches makes the tree simpler and better suited to accurate predictive analysis. Because it cuts away branches that fit noise, pruning reduces overfitting.
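One concrete post-pruning technique is minimal cost-complexity pruning. A sketch assuming scikit-learn, whose trees expose it through the ccp_alpha parameter and the cost_complexity_pruning_path method; the mid-range alpha is picked only for illustration (in practice it would be chosen by cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
full = DecisionTreeClassifier(random_state=0).fit(X, y)       # unpruned tree
path = full.cost_complexity_pruning_path(X, y)                # candidate alphas, smallest first
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]            # mid-range value, for illustration
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
print("leaves before pruning:", full.get_n_leaves())
print("leaves after pruning: ", pruned.get_n_leaves())
```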

8. What are the advantages and disadvantages of the Decision Tree?

Advantages

1. The decision tree model can be used for both classification and regression problems, and it is easy to interpret, understand, and visualize.

2. The output of a decision tree can also be easily understood.

3. Compared with other algorithms, data preparation during pre-processing in a decision tree requires less effort and does not require normalization of data.

4. The implementation can also be done without scaling the data.

5. A decision tree is one of the quickest ways to identify relationships between variables and the most significant variable.

6. New features can also be created for better target variable prediction.

7. Decision trees are not strongly influenced by outliers or missing values, and they can handle both numerical and categorical variables.

8. Since it is a non-parametric method, it makes no assumptions about the distribution of the data or the structure of the classifier.

Disadvantages

1. Overfitting is one of the practical difficulties for decision tree models. It happens when the learning algorithm continues developing hypotheses that reduce the training set error but at the cost of increasing test set error. But this issue can be resolved by pruning and setting constraints on the model parameters.

2. Decision trees are less well suited to continuous numerical variables: they split them at thresholds, and a regression tree's predictions are piecewise constant.

3. A small change in the data tends to cause a big difference in the tree structure, which causes instability.

4. Calculations involved can also become complex compared to other algorithms, and it takes a longer time to train the model.

5. It is also relatively expensive as the amount of time taken and the complexity levels are greater.

9. What is a Decision Tree Regressor?

Decision tree regression learns from the features of objects and builds a model in the form of a tree that predicts meaningful continuous output for new data. Continuous output means that the result is not discrete, i.e. it is not limited to a known, discrete set of numbers or values.

Discrete output example: a weather prediction model that predicts whether or not it will rain on a particular day. Continuous output example: a profit prediction model that estimates the probable profit from the sale of a product. In such cases, continuous values are predicted with a decision tree regression model.

Mean squared error (MSE) is usually used to decide whether to split a node into two or more sub-nodes in decision tree regression. For a binary split, the algorithm tries candidate split values; for each one it divides the data into two subsets and computes the weighted MSE of the two subsets, and it chooses the split with the smallest MSE.
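The MSE split rule for a single numeric feature can be sketched as below, assuming NumPy. best_split is a hypothetical helper; each side of a split predicts the mean of its targets, so its MSE equals its variance:

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, weighted_mse) of the best binary split on one feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # no threshold fits between equal values
        left, right = y[:i], y[i:]
        # each side predicts its mean, so its MSE is its variance; weight by subset size
        mse = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if mse < best[1]:
            best = ((x[i - 1] + x[i]) / 2, mse)
    return best

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])
threshold, mse = best_split(x, y)
print(threshold, round(mse, 4))           # 6.5 separates the two groups
```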

10. What are the applications of decision trees?

1. A decision tree can be applied to determine whether a loan applicant is likely to default.

2. It can be applied to determine an individual's likelihood of contracting a certain illness.

3. It can help online retailers forecast the likelihood that a customer will buy a specific product.

4. Customer churn rates can also be determined using decision trees.

The Top 10 k-Nearest Neighbours Algorithm Question Answer

1. What is the “K” in the KNN algorithm?

K is the number of nearest neighbours, taken from the training data, that are used to predict the class (or value) of a new, unseen data point.

2. What is the difference between KNN and K-means?

KNN

1) It is a supervised learning method.

2) It is mainly used for classification, and occasionally for regression.

3) In KNN, "K" is the number of nearest neighbours used to classify a test sample or, for a continuous target (regression), to predict its value.

4) It is used for classification and regression of labelled data, where the target variable is known in advance for the training data.

5) There is no real training phase in KNN. Instead, a test observation is predicted from its K nearest neighbours (usually by Euclidean distance), using a majority vote or a (weighted) average.

K-means

1) It is an unsupervised learning method.

2) It is used for clustering.

3) In K-means, "K" is the total number of clusters the algorithm tries to identify in the input data. Since it is used for unsupervised learning, the true groups are usually unknown.
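The difference shows up directly in code. A sketch assuming scikit-learn and the Iris dataset: KNN needs the labels y to fit, while K-means only sees X and returns cluster ids:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)          # supervised: needs the labels y
print(knn.predict(X[:3]))                                     # predicted class labels

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # unsupervised: uses X only
print(km.labels_[:3])                                         # cluster ids, not class labels
```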

3. Explain the K Nearest Neighbor Classification in detail

For a classification problem, the data is split into training and test samples. For each test point, KNN measures the distance (for example the Euclidean distance) to every training point, selects the K training points with the smallest distances (the nearest neighbours), and predicts the class held by the majority of those neighbours.
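A from-scratch sketch of this procedure, assuming NumPy; knn_predict and the six training points are illustrative only:

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))     # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                        # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote among them

X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2])))   # 0
print(knn_predict(X_train, y_train, np.array([7, 8])))   # 1
```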

4. Explain the K Nearest Neighbor Regression in detail

By averaging the observations within the same neighborhood, KNN regression is a non-parametric technique that intuitively approximates the relationship between independent variables and the continuous result. The analyst must determine the size of the neighborhood, or it can be selected by employing cross-validation to determine the size that minimizes the mean-squared error.
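A sketch of choosing the neighbourhood size by cross-validated MSE, assuming scikit-learn's KNeighborsRegressor (which averages the targets of the K neighbours) and synthetic data; the candidate values of K are arbitrary:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=3, noise=15.0, random_state=0)
mse_by_k = {}
for k in [1, 3, 5, 7, 9, 15]:
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y, cv=5,
                             scoring="neg_mean_squared_error")
    mse_by_k[k] = -scores.mean()          # scikit-learn returns negated MSE, so flip the sign
best_k = min(mse_by_k, key=mse_by_k.get)
print("best K:", best_k)
```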

5. Why is an odd value of “K” preferable in the KNN algorithm?

In binary classification, an odd K avoids ties in the majority vote, so odd values are preferred over even ones. A common rule of thumb sets K to about the square root of the number of training data points; if that number is even, add or subtract 1 to make it odd.

6. How do we decide the value of "K" in the KNN algorithm?

There is no fixed formula for the ideal value of K; we must experiment with several values (for example using cross-validation) and choose the one that performs best.

K = 5 is a common starting point (it is the default n_neighbors in scikit-learn).

Very low values of K, such as K=1 or K=2, make the model sensitive to noise and outliers. Larger values smooth the predictions, but if K is too large the model can underfit and becomes more expensive to compute.

7. What is the difference between Euclidean distance and Manhattan distance? What are their formulas?

Euclidean distance

Euclidean distance is the straight-line distance between two data points in a plane. It is the Minkowski distance with p = 2.

$$ED = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

Manhattan distance

Manhattan distance is the distance between two data points measured along a grid-like path (the sum of the absolute differences of their coordinates). It is the Minkowski distance with p = 1.

$$MD = |x_1 - x_2| + |y_1 - y_2|$$
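Both distances are special cases of the Minkowski distance, (Σ|a_i - b_i|^p)^(1/p). A small sketch assuming NumPy; minkowski is an illustrative helper:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance between points a and b; p=2 is Euclidean, p=1 is Manhattan."""
    diff = np.abs(np.asarray(a) - np.asarray(b))
    return float(np.sum(diff ** p) ** (1 / p))

a, b = (1, 2), (4, 6)
print(minkowski(a, b, 2))   # Euclidean: sqrt(3^2 + 4^2) = 5.0
print(minkowski(a, b, 1))   # Manhattan: 3 + 4 = 7.0
```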

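8. Why do you need to scale your data for the k-NN algorithm?

Because KNN is based on distances, features with larger numeric ranges dominate the distance calculation, and extreme values in those features distort it. Scaling (for example standardization or min-max normalization) puts all features on a comparable scale so that each contributes fairly.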
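A sketch of the effect of scaling, assuming scikit-learn and its Wine dataset, whose features are on very different scales (proline values run into the hundreds, while hue is around 1); the pipeline standardizes the features before KNN:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5).mean()
print("unscaled:", round(raw, 3), "scaled:", round(scaled, 3))
```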

9. Why is the KNN algorithm called a lazy learner?

KNN does not learn a model from the training set in advance. It simply stores the training dataset and does all the work when a new point has to be classified. Because it delays learning until prediction time instead of learning right away, it is called a lazy learner.

10. What are the advantages and disadvantages of the KNN algorithm?

Advantages of KNN Algorithm

1) It is easy to put into practice.

2) It is capable of handling noisy training data.

3) If there is a lot of training data, it might work better.

Disadvantages of KNN Algorithm

1) A value of K must always be chosen, which can sometimes be difficult.

2) The computation cost is high, because the distance from each new data point to every training sample must be calculated.

3) Alert to Conditions

The Top 10 Logistic Regression Algorithm Question Answer

1. What is logistic regression?

Logistic regression is a supervised machine learning technique for solving classification problems. It is a predictive analysis method based on the concept of probability: it predicts the probability of a categorical dependent variable. In binary logistic regression, the dependent variable is binary, coded as either 1 or 0.

A logistic regression model is similar to a linear regression model, but instead of outputting the linear function directly, it passes it through the sigmoid function (also called the logistic function), which maps it to a probability between 0 and 1.

2. How will you deal with the multiclass classification problem using logistic regression?

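We use multiclass classification when there are more than two classes. In the one-vs-rest (OvR) method, we train one binary logistic regression model per class: in each model, that class is labelled 1 and all the remaining classes are labelled 0. A new data point is assigned to the class whose model gives the highest probability.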
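One-vs-rest can be sketched by hand, assuming scikit-learn's LogisticRegression as the binary model and the three-class Iris dataset (scikit-learn also provides OneVsRestClassifier for this):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
models = []
for cls in classes:
    y_binary = (y == cls).astype(int)          # this class -> 1, all other classes -> 0
    models.append(LogisticRegression(max_iter=1000).fit(X, y_binary))

# column k holds P(class k) according to the k-th binary model
probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
pred = classes[probs.argmax(axis=1)]           # pick the most confident model
print("training accuracy:", (pred == y).mean())
```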

3. Why is logistic regression very popular/widely used?

Logistic regression is widely used because it transforms the logits (log-odds), which can range from −∞ to +∞, into probabilities between 0 and 1. Since it gives the probability that an event occurs, it can be applied to a wide range of real-world situations, and it handles categorical target variables naturally. This explains why the logistic regression model is so widely used.

4. Why can’t linear regression be used instead of logistic regression for classification?

Distribution of error terms: Linear regression assumes that the error terms are normally distributed. This assumption does not hold for binary classification, where the target can only be 0 or 1.

Model output: The output of a linear regression is continuous and can fall outside the range 0 to 1, which makes no sense for binary classification. If we want outputs that are probabilities for two distinct classes, they must be restricted to the range 0 to 1. Logistic regression is preferred because its logistic/sigmoid function produces such probabilities.

Variance of residual errors: Linear regression assumes that the variance of the random errors is constant. This assumption also does not hold for logistic regression, where the variance of a binary outcome depends on its probability.

5. What are the assumptions of logistic regression?

1. It is predicated on the assumption that the independent variables have minimal to no multicollinearity, or that the predictors are uncorrelated.

2. Each predictor variable and the outcome logit should have a linear relationship. The formula for the logit function is logit(p) = log(p/(1-p)), where p is the expected outcome's probability.

3. A large sample size is typically necessary in order to make accurate predictions.

4. Ordinal logistic regression requires the target variable to be ordered, whereas binary logistic regression assumes that the target variable is binary, i.e. divided into two groups.

6. Why is logistic regression called regression and not classification?

The basic approach of logistic regression is the same as that of linear regression, but it regresses the probability of a categorical outcome.

Like linear regression, logistic regression uses a linear equation containing all of the independent variables and computes a coefficient for each of them. Because it is used to solve classification problems, it then passes this linear equation through a sigmoid function to obtain probabilities, and classifies each record using a suitable threshold (such as 0.5 or the mean of the probabilities).

The linear regression formula is:

$$Y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$

In logistic regression:

$$Y = \sigma(b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Logistic regression is a generalized linear model, and it uses the same basic formula as linear regression.

So basically, logistic regression is a sigmoid applied to a linear regression.

7. Explain the significance of the sigmoid function.

The sigmoid function is a mathematical function used to convert predicted values into probabilities. It maps any real number to a value between 0 and 1:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Because the output of logistic regression must lie between 0 and 1, the curve takes an "S" shape, called the sigmoid or logistic curve. A threshold value is then used to turn the probability into a class of 0 or 1: values above the threshold are classified as 1, and values below it as 0.
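The sigmoid and a 0.5 threshold, sketched with NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(z)
print(p.round(3))                 # every value lies strictly between 0 and 1
labels = (p >= 0.5).astype(int)   # apply a 0.5 threshold
print(labels)                     # [0 0 1 1 1]
```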

8. Explain the general intuition behind logistic regression.

Logistic regression fits the data with the logistic (sigmoid) function to predict the probability of an event occurring.

The sigmoid is an S-shaped curve that maps any real number to a value between 0 and 1, but never exactly 0 or 1. The goal of logistic regression is to find the line (or plane) that best separates the two classes. Training the model means finding the slope m and intercept c of the line (y = mx + c, or its multi-feature equivalent) that best separates the two classes' points, so that a new, unseen data point can quickly be assigned to a class.
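A from-scratch sketch of training, assuming NumPy, batch gradient descent on the log loss, and two made-up Gaussian clusters; w plays the role of m and b the role of c, and the learning rate and number of epochs are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted probabilities
        grad_w = X.T @ (p - y) / n      # gradient of the mean log loss w.r.t. w
        grad_b = (p - y).mean()         # gradient w.r.t. the intercept
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_logistic(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
print("weights:", w, "intercept:", b)
print("training accuracy:", (pred == y).mean())
```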

9. What are outliers and how can the sigmoid function mitigate the problem of outliers in logistic regression?

Outliers are observations that lie far from the rest of the data. Logistic regression is sensitive to irregular data, including outliers, high-leverage observations and influential observations. The sigmoid function mitigates this problem because it bounds the output between 0 and 1, so an extreme input value cannot produce an extreme prediction.

10. Why can’t we use Mean Square Error (MSE) as a cost function for logistic regression?

Logistic regression uses the sigmoid function to apply a non-linear transformation and obtain a probability. When the error of this non-linear output is squared, the cost function becomes non-convex and can have local minima, so gradient descent may not reach the global minimum. MSE is therefore not appropriate for logistic regression. Log loss (binary cross-entropy), which is convex for this model, is used instead.
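The non-convexity can be checked numerically on the simplest possible case. The sketch uses one training example (x = 1, label y = 1) and treats each loss as a function of a single weight w. A convex function has non-negative second differences on an evenly spaced grid. The squared error of the sigmoid output breaks that rule, and log loss (binary cross-entropy, the usual alternative) does not.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One example with x = 1 and y = 1; both losses are viewed as functions of the weight w.
w = np.linspace(-6, 6, 121)
p = sigmoid(w * 1.0)

mse = (1.0 - p) ** 2          # squared error of the sigmoid output
log_loss = -np.log(p)         # binary cross-entropy for y = 1

mse_convex = bool(np.all(np.diff(mse, 2) >= 0))
log_loss_convex = bool(np.all(np.diff(log_loss, 2) >= 0))
print("MSE convex:", mse_convex)             # False
print("Log loss convex:", log_loss_convex)   # True
```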

Wednesday, 20 December 2023

The Top 20 Linear Regression Algorithm Questions and Answers

1. What is Linear regression?

Linear regression is a supervised machine learning method used to determine the relationship between one or more independent (x) variables and a dependent (y) variable.

2. How do you represent a simple linear regression?

Simple linear regression is a type of linear regression in which the value of a numerical dependent variable is predicted using only one independent variable.

Line equation: y = mx + c

where,

y is the dependent variable.

x is the independent variable.

m is the slope, also called the coefficient of linear regression.

c is the intercept of the line.
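Ordinary least squares is one standard way to estimate m and c, and it has closed-form formulas. Below is a minimal NumPy sketch that applies them to made-up data. The function name `fit_simple_linear_regression` is hypothetical.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least-squares slope m and intercept c for y = m*x + c."""
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    return m, c

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])    # made-up data, roughly y = 2x + 1
m, c = fit_simple_linear_regression(x, y)
print(m, c)              # about 1.97 and 1.09
print(m * 6.0 + c)       # prediction for a new x
```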

3. What is multiple linear regression?

Multiple linear regression is the term for a type of linear regression in which the value of a numerical dependent variable is predicted using more than one independent variable.

Linear equation: y = m1x1 + m2x2 + ... + mNxN + c, where x1 to xN are the independent variables and m1 to mN are their coefficients.
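Here is a minimal sketch with two independent variables. It adds a column of ones so that the intercept c is estimated together with m1 and m2, and it uses NumPy's least-squares solver. The data are synthetic and noise-free, so the known coefficients are recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))           # two independent variables x1, x2
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 4.0        # made-up relationship, no noise

A = np.column_stack([X, np.ones(len(X))])      # columns: x1, x2, constant for c
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, c = coefficients
print(m1, m2, c)     # close to 3.0, -1.5, 4.0
```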

4. What are the assumptions of the Linear regression model?

1. Linearity: There must be a linear relationship between the independent and dependent variables.

2. Independence: Each observation is independent of the other observations.

3. Normality of residuals: The residuals (the differences between the actual and predicted values of y) are normally distributed.

4. No multicollinearity: The independent variables are not highly correlated with one another.

5. Homoscedasticity: The variance of the residuals is constant at every level of x.

5. What is the assumption of homoscedasticity? How to check Linearity? How to prevent heteroscedasticity?

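Assumption of homoscedasticity: the residuals have constant variance at every level of the independent variables. When this variance changes with x, the data are heteroscedastic.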

How to check Linearity:

1. Coefficient of correlation

2. Scatter Plot

3. Correlation matrix

How to prevent heteroscedasticity?

Heteroscedasticity may be caused by outliers or by omitted variable bias, so check for both. Applying a log transformation to the dependent variable can also reduce it.
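The sketch below illustrates the log transformation on synthetic data with multiplicative noise, where the spread of y grows with x. It is a rough informal check: it compares the residual spread in the top third of x with the spread in the bottom third, and a ratio near 1 suggests roughly constant variance. A formal test such as Breusch-Pagan would be the usual choice in practice. The helper name `spread_ratio` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(1, 10, 200)
# Made-up data: multiplicative noise, so the spread of y grows with x.
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.3, size=x.size))

def spread_ratio(x, target):
    """Fit a line, then compare residual spread in the top vs bottom third of x (x sorted)."""
    m, c = np.polyfit(x, target, 1)
    residuals = target - (m * x + c)
    third = len(x) // 3
    return residuals[-third:].std() / residuals[:third].std()

raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio(x, np.log(y))
print("raw y:", raw_ratio)       # well above 1: residual spread grows with x
print("log(y):", log_ratio)      # close to 1: roughly constant spread
```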

6. What does multicollinearity mean?

Multicollinearity occurs when two or more independent variables (predictors) are highly correlated with one another. In other words, one predictor can be predicted linearly from the others. It describes how the independent variables are correlated and associated with one another. The related term collinearity refers to a linear relationship between two predictors, and multicollinearity extends the idea to several predictors.
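A correlation matrix is one quick way to spot pairwise multicollinearity. In this sketch the data are synthetic: x2 is built almost as a linear function of x1, and x3 is unrelated to both.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=100)   # almost a linear function of x1
x3 = rng.normal(size=100)                          # unrelated predictor

corr = np.corrcoef(np.vstack([x1, x2, x3]))       # each row is one variable
print(np.round(corr, 2))   # x1-x2 entry is close to 1, x3 entries are small
```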

7. What is VIF? What is the best value of VIF?

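An independent variable's VIF (variance inflation factor) shows how well it can be explained by the other independent variables. It is computed as VIF = 1 / (1 - R^2), where R^2 comes from regressing that variable on all the other independent variables. The ideal value is 1, which is also the minimum possible value and means the variable is uncorrelated with the others. Values above 5 or 10 are commonly treated as a sign of problematic multicollinearity.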
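The formula VIF = 1 / (1 - R^2) can be computed directly with NumPy least squares. This is a minimal sketch on synthetic data, and the helper name `vif` is hypothetical. Libraries such as statsmodels also provide a VIF function.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    n, k = X.shape
    scores = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2 = 1 - residuals @ residuals / np.sum((target - target.mean()) ** 2)
        scores.append(1 / (1 - r2))
    return np.array(scores)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                          # independent of the others
scores = vif(np.column_stack([x1, x2, x3]))
print(np.round(scores, 1))   # large for x1 and x2, close to 1 for x3
```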

8. What are the feature selection methods in Linear Regression?

Feature selection is the process of identifying and selecting a subset of input variables that are most relevant to the target variable.

1. Stepwise Regression

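In the stepwise regression technique, we start by fitting the model with each individual predictor and add the one with the lowest p-value. At each subsequent step another predictor is added, and predictors that are no longer significant can be removed.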

2. Forward Selection

Forward selection is similar to stepwise regression. The difference is that forward selection only adds features and never removes them.

3. Backward Elimination

In backward elimination, the first step includes all predictors. Each subsequent step removes the predictor with the highest p-value, as long as that p-value is above the threshold (commonly 0.05).
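Here is a minimal sketch of backward elimination using only NumPy and the standard library. To avoid a SciPy dependency, the p-values use a normal approximation to the t distribution. That approximation is reasonable for large samples but is not exact, and statsmodels' OLS p-values would be the usual tool. The helper names are hypothetical, and the data are synthetic: only x1 and x2 actually affect y.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Approximate two-sided p-values for each column of X in an OLS fit with an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    sigma2 = residuals @ residuals / (n - A.shape[1])      # residual variance
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t_stats = beta / std_err
    # Normal approximation: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return p[1:]                                            # skip the intercept

def backward_elimination(X, y, names, threshold=0.05):
    """Drop the predictor with the highest p-value until every p-value is <= threshold."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)   # only x1 and x2 matter
selected = backward_elimination(X, y, ["x1", "x2", "x3", "x4"])
print(selected)
```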

9. What is feature scaling? Is it required in Linear Regression?

The technique of normalizing a dataset's feature range is known as feature scaling.

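Features in real-world datasets frequently vary in magnitude, range and unit of measurement, so feature scaling puts them on the same scale. In linear regression, scaling does not change the predictions of the closed-form least-squares solution. It is recommended when the model is trained with gradient descent (where it speeds up convergence) or uses regularization, and it also makes the coefficients easier to compare.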
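The sketch below uses standardization (z-score scaling), which is one common form of feature scaling. It also checks that closed-form least-squares predictions are the same with and without scaling. The data and helper names are made up for illustration.

```python
import numpy as np

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1 (z-score)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ols_predict(X, y):
    """Fitted values of a least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 50),          # feature on a 0-1 scale
                     rng.uniform(1000, 5000, 50)])   # feature on a much larger scale
y = 4.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 50)

X_scaled = standardize(X)
same = np.allclose(ols_predict(X, y), ols_predict(X_scaled, y))
print(same)   # True: scaling does not change closed-form least-squares predictions
```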

10. How to find the best fit line in a linear regression model?

A regression line is referred to as the best fit line (BFL) if it yields the lowest error.

Gradient descent can be used in a linear regression model to find the most appropriate line, which is the line with the lowest sum of squared errors. The same line can also be computed directly with the ordinary least squares formula.
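Here is a minimal sketch of batch gradient descent that finds m and c by minimizing the mean squared error. The learning rate and epoch count are arbitrary choices for this small made-up dataset. NumPy's closed-form `polyfit` is included for comparison.

```python
import numpy as np

def fit_line_gradient_descent(x, y, lr=0.01, epochs=10_000):
    """Find m and c minimizing the mean squared error via batch gradient descent."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (m * x + c) - y
        m -= lr * (2 / n) * np.dot(error, x)    # d(MSE)/dm
        c -= lr * (2 / n) * error.sum()         # d(MSE)/dc
    return m, c

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])
m, c = fit_line_gradient_descent(x, y)
print(m, c)
print(np.polyfit(x, y, 1))    # closed-form least squares: same line
```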

11. What is the cost Function in Linear Regression?

The cost (or loss) function measures the error between the actual and predicted values of y over the data points. In linear regression, the most common cost function is the mean squared error (MSE).
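As a minimal sketch, the MSE cost of a candidate line y_hat = m*x + c can be computed as follows. The data are made up so that the line y = 2x + 1 fits them exactly.

```python
import numpy as np

def mse_cost(m, c, x, y):
    """Mean squared error of the line y_hat = m*x + c over all data points."""
    predictions = m * x + c
    return np.mean((y - predictions) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])        # exactly y = 2x + 1
print(mse_cost(2.0, 1.0, x, y))     # 0.0: a perfect fit
print(mse_cost(2.0, 0.0, x, y))     # 1.0: every prediction is off by 1
```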

12. Briefly explain the gradient descent algorithm

Gradient descent is an optimization algorithm used to train machine learning models. It iteratively adjusts the model's parameters in the direction opposite to the gradient to minimize a given function, until it reaches a local minimum (where the slope is 0). For a convex function, such as the MSE cost of linear regression, this local minimum is also the global minimum.

We start by choosing random values for the bias and weights. We then repeatedly update them using the slope until it approaches 0. A parameter called the learning rate controls the size of each update, and it must be chosen carefully:

A small learning rate can make the model take a long time to learn. A large learning rate can make the updates overshoot the minimum, so the model fails to converge.
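The effect of the learning rate can be seen on the simple convex function f(w) = (w - 3)**2, whose minimum is at w = 3 and whose slope is 2*(w - 3). This toy function is chosen only for illustration. The three learning rates are arbitrary examples of "too small", "reasonable" and "too large".

```python
def gradient_descent(lr, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)**2 by repeatedly stepping against its slope."""
    w = start
    for _ in range(steps):
        slope = 2 * (w - 3)
        w -= lr * slope
    return w

print(gradient_descent(lr=0.01))   # too small: still far from 3 after 50 steps
print(gradient_descent(lr=0.1))    # reasonable: very close to 3
print(gradient_descent(lr=1.1))    # too large: overshoots more each step and diverges
```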

13. How to evaluate regression models?

MAE, or Mean Absolute Error

This is the most straightforward metric. It is the average of the absolute differences between the actual values and the predictions.

RMSE, or Root Mean Square Error

The Root Mean Square Error is the square root of the average of the squared differences between the predicted and actual values. It can be interpreted roughly as the standard deviation of the residuals (the differences between observed and predicted values).

MSE, or Mean Squared Error

MSE is the mean of the squared errors (residuals) over all data points. Put simply, it is the average of the squared differences between the predicted and actual values.

R^2, or the Coefficient of Determination

It measures how well the regression line reproduces the actual values, that is, what proportion of the variation in the dependent variable is explained by the independent variables in the model. It indicates how well the model fits the dataset.

Adjusted R-squared

R^2 has a drawback: it never decreases when a new variable is added to the model. A new variable may or may not improve the model, but R^2 stays the same or increases either way, even when the variable has no significant impact on the prediction. Adjusted R-squared corrects for this by penalizing the number of independent variables: Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of independent variables. It increases only when a new variable improves the model more than would be expected by chance.
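All five metrics can be computed together in NumPy, as in the minimal sketch below. The arrays are small made-up values so the results can be checked by hand. The helper name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Return MAE, MSE, RMSE, R^2 and adjusted R^2."""
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    sse = np.sum(errors ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - sse / sst
    n = len(y_true)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return mae, mse, rmse, r2, adj_r2

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])
metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
# expected: MAE 0.5, MSE 0.5, RMSE about 0.707, R^2 0.9, adjusted R^2 0.85
```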

14. Which evaluation technique should you prefer for data with many outliers?

When there are many outliers in the dataset, Mean Absolute Error (MAE) is recommended because it is more robust to outliers. MSE and RMSE are very sensitive to outliers because squaring the error terms (residuals) penalizes large errors heavily.
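To see the difference, the sketch below compares MAE and RMSE on made-up data, first without and then with a single outlier in the actual values.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_pred = np.array([1.5, 2.5, 2.5, 3.5, 5.5])
clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # every error is 0.5
with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 25.0])   # last actual value is an outlier

print(mae(clean, y_pred), rmse(clean, y_pred))                # 0.5 0.5
print(mae(with_outlier, y_pred), rmse(with_outlier, y_pred))  # 4.3 and about 8.73
```

With the outlier, MAE grows by a factor of about 8.6. RMSE grows by a factor of about 17.5, because the single large error is squared.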

15. What is a residual? How is it computed?

A residual, also called an error, is the difference between the actual value (ya) and the predicted value (yp).

residual = ya-yp

ya = actual value

yp = predicted value

16. What are SSE, SSR, and SST, and what is the relationship between them?

SSE

SSE is the sum of squared errors: the sum of the squared differences between the actual values and the predicted values.

SSE = sum((ya - yp)**2)

SSR

SSR is the sum of squares due to regression: the sum of the squared differences between the predicted values and the mean of the dependent variable.

SSR = sum((yp - y_mean)**2)

SST

SST is the total sum of squares: the sum of the squared differences between the actual values and the mean of the dependent variable.

SST = sum((ya - y_mean)**2)

For a least-squares regression with an intercept, the relationship between them is SST = SSR + SSE.
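The relationship SST = SSR + SSE can be checked numerically for a least-squares line fitted with an intercept. The data in this sketch are made up.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ya = np.array([3.1, 4.9, 7.2, 8.8, 11.0])   # actual values (made up)

m, c = np.polyfit(x, ya, 1)                 # least-squares line with an intercept
yp = m * x + c                              # predicted values
y_mean = ya.mean()

residuals = ya - yp
sse = np.sum(residuals ** 2)
ssr = np.sum((yp - y_mean) ** 2)
sst = np.sum((ya - y_mean) ** 2)
print(sse, ssr, sst, np.isclose(sst, ssr + sse))   # last value: True
```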

17. What does the coefficient of determination explain?

In a linear regression model, R-squared (R^2), also called the coefficient of determination, indicates the proportion of the variation in the dependent variable (Y) that is explained by the independent variables (X). The main issue with R-squared is that when we add more independent variables, it either stays the same or increases.
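The sketch below demonstrates this drawback on synthetic data. It fits ordinary least squares with and without an extra feature that is pure noise, and the R^2 of the larger model is never lower. The helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X (plus an intercept)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(size=30)
noise_feature = rng.normal(size=30)            # unrelated to y

r2_one = r_squared(x[:, None], y)
r2_two = r_squared(np.column_stack([x, noise_feature]), y)
print(r2_one, r2_two)    # r2_two is never smaller than r2_one
```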

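18. What’s the intuition behind R-Squared?

R-squared, or the coefficient of determination, is a statistical measure for a regression model. It indicates how much of the variance in the dependent variable is explained by the independent variables. In other words, it is a goodness-of-fit measure that shows how well the data fit the regression model. Intuitively, it compares the model's squared error (SSE) with that of a baseline that always predicts the mean (SST): R^2 = 1 - SSE/SST. A value of 1 means a perfect fit, and 0 means the model does no better than predicting the mean.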

19. What is the Coefficient of Correlation: Definition, Formula

The (Pearson) correlation coefficient measures the strength and direction of the linear relationship between two variables. Its value ranges from -1 to 1.

Correlation coefficient (r) = covariance(x, y) / (std(x) * std(y))
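The formula translates directly into NumPy, as shown below. The covariance and the standard deviations must use the same normalization (both divide by n here), otherwise the ratio is off. NumPy's built-in `np.corrcoef` is printed alongside for comparison. The data are made up.

```python
import numpy as np

def correlation(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return covariance / (x.std() * y.std())    # np.std also divides by n by default

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
r = correlation(x, y)
print(r)
print(np.corrcoef(x, y)[0, 1])                 # NumPy's built-in, for comparison
```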

20. What is the difference between overfitting and underfitting?

Overfitting happens when the model performs well on the training set but poorly on the test data.

Underfitting happens when the model performs poorly on both the training set and the test set.
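Both cases can be illustrated by fitting polynomials of increasing degree to noisy samples of a sine curve and comparing training and test error. The data, the noise level and the degrees are arbitrary choices for this sketch. A straight line tends to underfit, with high error on both sets. A very flexible degree-9 polynomial drives training error down, and its test error is typically worse than that of a moderate degree.

```python
import numpy as np

def true_curve(x):
    return np.sin(2 * np.pi * x)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
x_test = np.linspace(0.04, 0.96, 12)
y_train = true_curve(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_curve(x_test) + rng.normal(0, 0.2, x_test.size)

results = {}
for degree in (1, 3, 9):    # too simple, reasonable, very flexible
    coef = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(y_train, np.polyval(coef, x_train)),
                       mse(y_test, np.polyval(coef, x_test)))
    print(degree, "train MSE:", results[degree][0], "test MSE:", results[degree][1])
```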