Thursday, 21 December 2023

Top 10 Pandas Questions and Answers

 

1. What is pandas (Python pandas)?


Pandas is an open-source library for high-performance data manipulation in Python.

2. What are the different types of Data Structures in Pandas?


Pandas has two main data structures:
1) Series
2) DataFrame

3. Explain Series and DataFrame in Pandas.


A Series is a one-dimensional labelled array that can hold data of any type (integers, floats, strings, Python objects). It has exactly one column of values, plus an index that labels each value.

A DataFrame is a two-dimensional labelled data structure with rows and columns, much like a spreadsheet or SQL table. Each column can have a different data type, and every value is addressed by a row label and a column label.
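
The difference between the two structures is easiest to see side by side. The following is a minimal sketch; the index labels, column names, and values are made up for illustration.

```python
import pandas as pd

# A Series: one column of values plus an index (labels are hypothetical)
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

# A DataFrame: several columns sharing one row index
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [10, 20, 30]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series
col = df["score"]
print(type(col).__name__)  # Series
```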

4. How to check an empty DataFrame?


import pandas as pd

df = pd.DataFrame()
print(df)
print(df.empty)  # True when the DataFrame has no rows or no columns

5. Create a DataFrame using List.


import pandas as pd

list1 = [1, 2, 3, 4, 5]
df1 = pd.DataFrame(list1)  # one column with default integer labels
df1

6. What Are the Most Important Features Of the Pandas Library?


1) Quick and efficient data manipulation and analysis.
2) Loading data from many file formats, such as CSV, Excel, and JSON.
3) Simple handling of missing data.
4) Built-in time series functionality.
5) Merging and combining data sets.

7. How to convert a NumPy array to a DataFrame of a given shape?


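import numpy as np
import pandas as pd

arr = np.random.randint(1, 5, 12)  # 12 random integers from 1 to 4
print(arr)

# reshape needs rows * columns to equal the number of elements (3 * 4 = 12)
df1 = pd.DataFrame(arr.reshape(3, 4))
df1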

8. Create a DataFrame from a dictionary containing a list and an array.


import numpy as np
import pandas as pd

dict1 = {"A": [1, 2, 3, 4, 5],
         "B": np.array([10, 20, 30, 40, 50])}

df = pd.DataFrame(dict1)
df

9. Write a Pandas program to add, subtract, multiply, and divide two Pandas Series.


import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([9, 8, 3, 5, 6])

print(s1)
print(s2)

print(s1+s2)
print(s1-s2)
print(s1*s2)
print(s1/s2)

10. Explain the use of the info, describe, head and tail functions.


info
info() prints a concise summary of a DataFrame: the index range, the number of columns, each column's name, data type and non-null count, and the memory usage.

describe
describe() returns descriptive statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns of a DataFrame.

head
head(n) returns the first n rows (5 by default).

tail
tail(n) returns the last n rows (5 by default).
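
The four functions can be tried together on a small frame. This is a minimal sketch; the column names `x` and `y` and their values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 11), "y": [v * 0.5 for v in range(1, 11)]})

df.info()              # prints index range, column dtypes, non-null counts, memory usage
stats = df.describe()  # count, mean, std, min, quartiles, max per numeric column
first3 = df.head(3)    # first three rows
last3 = df.tail(3)     # last three rows

print(stats.loc["mean", "x"])                     # 5.5
print(first3["x"].tolist(), last3["x"].tolist())  # [1, 2, 3] [8, 9, 10]
```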

Top 20 NumPy Questions and Answers

 

1. Create a null vector of size 10.


import numpy as np

array1 = np.zeros(10, dtype=int)
array1

2. Create a null vector of size 10 in which the fifth value is 1.


array1 = np.zeros(10,dtype = int)
array1[4] = 1
array1

3. Write a NumPy program to add, subtract, multiply, divide arguments element-wise.


import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(arr1)

add = arr1 + 10
print(add)

subtract = arr1 - 10
print(subtract)

multiply = arr1 * 10
print(multiply)

divide = arr1 / 10
print(divide)

4. Create a vector with values ranging from 15 to 45.


array1 = np.arange(15, 46)  # the stop value is exclusive, so 46 is needed to include 45
array1

5. Write a NumPy program to round elements of the array to the nearest integer.


arr1 = np.round([1.2,1.22,1.43,2.56])
arr1

6. Write a NumPy program to convert angles from degrees to radians for all elements in a given array.


rad_values = np.deg2rad([0,30,45,60,90])
rad_values

7. What is the use of all and any function in NumPy?


all() - np.all() returns True if all elements of the array (or all elements along the given axis) evaluate to True. It also returns True for an empty array.

any() - np.any() returns True if at least one element of the array (or along the given axis) evaluates to True. It returns False for an empty array.
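
A short sketch of both functions on a small 2D array, including the `axis` argument and the empty-array cases; the array values are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 0, 3],
                [4, 5, 6]])

print(np.all(arr))          # False: one element is 0
print(np.any(arr))          # True: at least one element is non-zero
print(np.all(arr, axis=1))  # [False  True]: checked row by row

empty = np.array([])
print(np.all(empty), np.any(empty))  # True False
```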

8. Create a 3x3 matrix with values ranging from 0 to 8: [[0 1 2] [3 4 5] [6 7 8]]


matrix1 = np.arange(0, 9).reshape(3, 3)
matrix1

9. Find indices of non-zero elements from [1,2,0,0,4,0]


arr1 = np.array([1, 2, 0, 0, 4, 0])
np.where(arr1 != 0)  # tuple of index arrays; equivalent to np.nonzero(arr1)

10. Create a random vector of size 30 and find the mean value.


arr1 = np.random.randint(1,30,size = 30)
print(arr1)
mean = np.mean(arr1)
mean

11. How to convert 1D array to 3D array?


# ndmin=3 prepends axes until the array has 3 dimensions, giving shape (1, 1, 10)
array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ndmin=3)
array1

12. How to convert 4D array to 2D array?


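A 4D array can be converted to a 2D array with reshape(), as long as the new shape holds the same total number of elements. For example, an array of shape (2, 3, 2, 2) has 24 elements and can be reshaped to (6, 4).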
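
NumPy's reshape works between any two shapes that hold the same number of elements, which includes going from 4D to 2D. A minimal sketch, with shapes chosen only for illustration:

```python
import numpy as np

arr4d = np.arange(24).reshape(2, 3, 2, 2)  # 2 * 3 * 2 * 2 = 24 elements

arr2d = arr4d.reshape(6, 4)        # explicit target shape
arr2d_auto = arr4d.reshape(6, -1)  # -1 lets NumPy infer the second dimension

print(arr4d.ndim, arr2d.shape, arr2d_auto.shape)  # 4 (6, 4) (6, 4)
```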

13. How to print only 3 decimal places in a python NumPy array?


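# np.around rounds the stored values to 3 decimals;
# to change only how values are displayed, use np.set_printoptions(precision=3)
array1 = np.around([1.2345, 2.4586, 3.59763], 3)
array1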

14. How to compute the median of a NumPy array?


arr1 = np.arange(1,30)
print(arr1)
median = np.median(arr1)
median

15. How to compute the standard deviation of a NumPy array?


arr1 = np.arange(1,30)
print(arr1)
std = np.std(arr1)
std

16. How to compute the mean of a NumPy array?

arr1 = np.arange(1,30)
print(arr1)
mean = np.mean(arr1)
mean

17. How to sort an array by the nth column?


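import numpy as np

arr1 = np.random.randint(1, 10, size=(4, 3))  # 2D array: 4 rows, 3 columns
print(arr1)
n = 1  # index of the column to sort by (0-based)
sort_arr1 = arr1[arr1[:, n].argsort()]  # reorder whole rows by the values in column n
print(sort_arr1)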

18. How to find common values between the two arrays?


X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([6, 4, 5, 8, 7, 9])

XY = np.intersect1d(X, Y)  # sorted unique values present in both arrays
print("Common values:", XY)

19. How to round away from zero a float array?


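arr1 = np.array([-3.2, -1.5, 0.4, 2.7])
# round away from zero: take the ceiling of the magnitude, then restore the sign
arr1 = np.copysign(np.ceil(np.abs(arr1)), arr1)
arr1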

20. Create a 3x3 identity matrix


array1 = np.identity(3,dtype = int)
array1

Top 10 Python Programming Questions and Answers

 

1. What are the key features of Python?


1. Open source and free: The Python programming language is freely available on the official website.

2. Interpreted language: Python is an interpreted language; its code is executed by an interpreter at run time rather than compiled to machine code ahead of time.

3. High-level programming language: When we write Python programs, we don't need to understand the system architecture or manage memory.

4. Easy to learn: Python is a fairly simple language to learn when compared to other languages such as C, C#, JavaScript, Java, and so on.

5. Extensible: Performance-critical parts of a program can be written in C or C++ and called from Python as extension modules.

6. Comprehensive standard library: Python ships with a large standard library (for example os, json, re, datetime, and collections), and a large ecosystem of third-party libraries such as NumPy, pandas, and matplotlib can be installed on top of it.

7. Dynamically typed language: We don't need to declare the variable type.

2. How do you write comments in Python, and why are comments important?


Comments in Python:
Comments in Python are the lines in the code that are ignored by the interpreter during the execution of the program.

Single-line comments:
Comments in Python begin with a hash character (#) and continue to the end of the line.

Multiline comments:
1) We can start each line with a hash character (#). Each line is then treated as a separate single-line comment.

2) We can use a string in triple quotes (""" """) as a multiline comment. Strictly speaking, this is a string literal that is evaluated and discarded rather than a true comment, but it has the same practical effect.

Importance:
1) Using comments in programs makes our code more understandable.
2) It makes the program more readable, which helps us remember why certain blocks of code were written.
3) Comments can also be used to disable some code while testing other blocks of code.

3. What do you mean by Python literals?


Literals are fixed values written directly in Python source code, such as the constant values assigned to variables.

1) String literals: String literals are enclosed in single quotes ('') or double quotes (""). They can be single-line or, with triple quotes, multi-line strings.

2) Numeric literals: Numeric literals in Python can be of three numeric types: int, float, and complex.

3) Boolean literals: Boolean literals can be True or False.

4) Special literals: None is a special literal defined in Python to represent a null value.

5) Collection literals: There are four different types of collection literals: list literals, tuple literals, dict literals, and set literals.
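
Each kind of literal listed above can be written out in a few lines. This is a small illustrative sketch; the variable names and values are arbitrary.

```python
text = 'single' "double"   # adjacent string literals are concatenated
multi = """line one
line two"""                # triple quotes allow a multi-line string
whole, real, cplx = 42, 3.14, 2 + 3j
flag = True
nothing = None
items, pair, mapping, unique = [1, 2], (1, 2), {"a": 1}, {1, 2}

# Print the type that Python assigns to each literal
for value in (text, multi, whole, real, cplx, flag, nothing, items, pair, mapping, unique):
    print(type(value).__name__, repr(value))
```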

4. What are the Escape Characters in python?


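Escape characters let us put characters into a string that would otherwise be hard or impossible to type directly. An escape sequence is a backslash (\) followed by the character to insert; for example, \n is a newline, \t is a tab, \\ is a literal backslash, and \' and \" are quotes inside a quoted string.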
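
A few of the most common escape sequences in action, shown next to a raw string for contrast. The strings themselves are arbitrary examples.

```python
print("Line one\nLine two")   # \n starts a new line
print("Name:\tAda")           # \t inserts a tab
print("She said \"hi\"")      # \" puts a double quote inside a double-quoted string
print("C:\\temp")             # \\ is a single literal backslash
print(r"C:\temp\new")         # raw string: backslashes are kept as written

# An escape sequence counts as one character; in a raw string it is two
print(len("a\nb"), len(r"a\nb"))  # 3 4
```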

5. Write a Python program to reverse words in a string


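string = "slicing is the first and easiest way to reverse a string in python"

# Reverse the order of the words (string[::-1] would reverse the characters instead)
string = " ".join(string.split()[::-1])

string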

6. Write a Python program to swap cases of a given string


string = "Write a Python program to swap cases of a given string"

string = string.swapcase()

string

7. Write a program to find the length of the string "machine learning" with and without using len function.



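string = "machine learning"

# With the len function
print(len(string))

# Without the len function: count the characters one by one
count = 0
for i in string:
    count = count + 1
print(count)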


8. Write a Python program to count the occurrences of each word in a given sentence.


from collections import Counter

sentence = "Write a Python program to count the occurrences of each word in a given sentence"
c = Counter(sentence.lower().split())  # lower-case so "Write" and "write" count together

print(c)

9. Write a Python program to count the even and odd digits in a string.


string = input("Enter a string: ")

even = 0
odd = 0

for ch in string:
    if ch.isdecimal():  # skip anything that is not a digit
        if int(ch) % 2 == 0:
            even += 1
        else:
            odd += 1

print("Even no. count is:", even)
print("Odd no. count is:", odd)

10. How do you check if a string contains only digits?


string = "python and machine learning 12345"

print(string.isdigit())  # False: the string also contains letters and spaces


Top 10 Random Forest Algorithm Questions and Answers

 

1. Why is Random Forest Algorithm popular?


Random Forest is one of the most popular and widely used machine learning techniques for classification problems. It works well for classification, but it can also be applied to regression problems.

For modern data scientists, it has become a powerful tool for improving predictive models. The best thing about the method is that it relies on very few assumptions, which makes data preparation easier and saves time.

2. Can Random Forest Algorithm be used both for Continuous and Categorical Target Variables?


Yes, Random Forest can be used with both continuous and categorical target (dependent) variables. A random forest is an ensemble of decision trees: when the dependent variable is categorical it acts as a classification model, and when the dependent variable is numerical or continuous it acts as a regression model.

3. Explain the working of the Random Forest Algorithm.


The following are the steps involved in executing the random forest algorithm:
Step 1: Choose K records at random from the N total records in the dataset.

Step 2: Using these K records, create and train a decision tree model.

Step 3: Choose the number of trees you want in your forest and repeat steps 1 and 2 for each tree.

Step 4: In a regression problem, every tree in the forest predicts an output value for the new data point. The final value is the mean of the values predicted by all the trees in the forest.
In a classification problem, every tree in the forest predicts the class to which the new data point belongs. The new data point is then assigned to the class that receives the most votes, that is, the majority vote.
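
The sampling in Step 1 and the aggregation in Step 4 can be sketched with the standard library alone. This is only an illustration under stated assumptions: the records are placeholder integers, the sample is drawn with replacement (the usual bootstrap choice), and the per-tree predictions are hypothetical values rather than the output of trained trees.

```python
import random
from collections import Counter
from statistics import mean

random.seed(0)

records = list(range(100))  # stand-in for N = 100 training records

# Step 1: draw K records at random, with replacement (a bootstrap sample)
K = 100
sample = random.choices(records, k=K)

# Step 4, regression: average the values predicted by the individual trees
tree_values = [3.1, 2.9, 3.4, 3.0]  # hypothetical per-tree outputs
forest_value = mean(tree_values)

# Step 4, classification: the class with the most votes wins
tree_votes = ["spam", "ham", "spam", "spam", "ham"]  # hypothetical per-tree outputs
forest_class = Counter(tree_votes).most_common(1)[0][0]

print(len(sample), round(forest_value, 2), forest_class)  # 100 3.1 spam
```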

4. Why do we prefer a Forest (collection of Trees) rather than a single Tree?


Overfitting is an issue that arises when our model is too flexible. A flexible model has high variance because the learned parameters, such as the decision tree's structure, change considerably with the training data. Conversely, a rigid model has high bias because it makes assumptions about the training data, and it may not be able to fit even the training data. In both cases, high variance and high bias, the model is unable to generalize well to new and unseen data points.

Therefore, we must carefully consider the bias-variance tradeoff when building a model. Rather than restricting the tree's depth, which raises bias and decreases variance, we can combine many decision trees into a forest, which reduces variance while largely keeping the low bias of fully grown trees.

5. What does random refer to in ‘Random Forest’?


In Random Forest, "random" primarily refers to two processes:
1) Random observations are used to grow each tree.
2) At each node, random variables are chosen for splitting.

Random Record Selection: Each tree in the forest is trained on a bootstrap sample, that is, data points drawn at random with replacement from the original training dataset. Such a sample typically contains about 63.2% of the distinct records in the training data, and it becomes the training set for growing that tree.

Random Variable Selection: A set of independent variables (predictors), say m, is chosen at random from all of the predictor variables, and the node is split using the best split among those m variables.
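
The 63.2% figure comes from sampling with replacement: when n records are drawn n times, the expected share of distinct records is 1 - (1 - 1/n)^n, which approaches 1 - 1/e. A small simulation, with n chosen arbitrarily, shows the same number:

```python
import random

random.seed(42)

n = 10_000
# Draw n row indices with replacement, as one tree's bootstrap sample would
sample = [random.randrange(n) for _ in range(n)]

unique_fraction = len(set(sample)) / n
expected = 1 - (1 - 1 / n) ** n  # approaches 1 - 1/e, about 0.632

print(round(unique_fraction, 3), round(expected, 3))
```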

6. Does Random Forest need Pruning? Why or why not?


Random forest generally does not need pruning: every tree is grown to its full depth. Pruning is a technique used with single decision trees to prevent overfitting; it selects the subtree that gives the lowest test error.

In a random forest, overfitting is controlled differently. Each tree is trained on a random sample of the records and considers a random subset of the features at each split, so the trees are only weakly correlated, and averaging their predictions cancels out much of the variance of the individual fully grown trees.

If a forest is too complex or too slow, its complexity is usually reduced by limiting tree size through hyperparameters (for example, maximum depth or minimum samples per leaf) or by using fewer trees, rather than by pruning branches after training.

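7. What is the importance of the max_features hyperparameter?


max_features sets the maximum number of features that random forest considers when looking for the best split at a node. At each split, the algorithm draws a random subset of features of this size and picks the best split among them. Smaller values make the trees less correlated with each other, while larger values let each individual tree fit the data more closely.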

8. What are the advantages and disadvantages of the Random Forest Algorithm?


Advantages of Random Forest

1. Random Forest can perform both classification and regression tasks.
2. It is capable of handling large datasets with high dimensionality.
3. It improves the accuracy of the model and reduces the risk of overfitting.
4. It reduces overfitting by averaging or combining the results of different decision trees.
5. Random forests work well for a larger range of data items than a single decision tree does.
6. Random Forest has less variance than a single decision tree.
7. Random forests are very flexible and often achieve high accuracy.
8. Scaling of data is not required in a random forest algorithm. It maintains good accuracy even when the data is provided without scaling.
9. Random Forest can maintain good accuracy even when a sizeable proportion of the data is missing, provided the implementation handles or imputes the missing values.

Disadvantages of Random Forest
1. Although random forest can be used for both classification and regression tasks, it is less well suited to regression tasks.
2. Complexity is the main disadvantage of Random Forest algorithms.
3. Constructing a random forest is much harder and more time-consuming than constructing a single decision tree.
4. More computational resources are required to implement the Random Forest algorithm.
5. It is less intuitive when we have a large collection of decision trees.
6. Prediction with random forests can be time-consuming in comparison with other algorithms.

9. List down the features of Bagged Trees


1. Lowers variance by averaging the predictions of the group of trees.

2. When considering node splits, each tree makes use of the whole feature space.

3. The trees are allowed to grow without being pruned, which produces deeper trees with higher variance but lower bias. Averaging then reduces the variance, which can help increase predictive power.

10. What are the applications of random forests?


Random Forest is most commonly used in four significant sectors:
1. Banking: This algorithm is mostly used by the banking industry to identify loan risk.
2. Medicine: This technique can be used to identify patterns of sickness and associated risks.
3. Marketing: This algorithm can be used to analyse marketing trends.
4. E-commerce: This algorithm can be used in recommendation engines for cross-selling.

Top 10 Decision Tree Algorithm Questions and Answers

 

1. Explain the Decision Tree algorithm in detail.


The Decision Tree algorithm is a supervised machine learning technique that can be used for both regression and classification, although it is most commonly used to solve classification problems.

A decision tree is a tree-shaped diagram that is used to determine a course of action. The dataset is divided into smaller and smaller subsets, which are represented by the nodes of the tree. There are three types of nodes in the tree structure: the root node, internal (decision) nodes, and leaf nodes.

The top node is called the root node. It represents the best attribute chosen for the first split. Internal (decision) nodes test an attribute of the dataset, while terminal (leaf) nodes provide the decision or classification label.

The Classification and Regression Tree (CART) algorithm is commonly used for building the tree.

The decisions, or tests, are made on the features of the provided dataset.

A decision tree simply asks a question and, according to the answer (Yes/No), further splits the tree into subtrees.

2. Explain Decision tree Key terms: Root Node, Decision Node/Branch Node, Leaf or Terminal Node.


Root Node 
In a decision tree, the root node is always at the top. It represents the full population or data sample, and it can be further divided into two or more sets.

Decision Node
Decision nodes are sub-nodes that split into further sub-nodes; each has at least two branches.

Leaf Node
In a decision tree, the outcomes are carried by the leaf nodes. These nodes, also referred to as terminal nodes, cannot be split any further.

3. What are the Steps for Making a decision tree?


1. Choose the initial dataset with the feature and target attributes defined.

2. Calculate the entropy and information gain for each attribute.

3. Pick the attribute with the highest information gain and make it the root decision node.

4. Calculate the information gain for the remaining attributes.

5. Create child nodes recursively by splitting at the decision node (i.e. create a separate child node for each value of the decision node's attribute).

6. Repeat this process until all the attributes are covered.

7. Prune the tree to prevent overfitting.
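
Steps 2 and 3 can be sketched in plain Python. The helper names `entropy` and `information_gain` and the toy dataset below are hypothetical and only meant to show the calculation, not a full tree-building implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy dataset: "outlook" separates the classes perfectly, "windy" does not
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, root)
```

Here splitting on "outlook" produces two pure subsets, so it has the highest gain and would become the root node.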

4. What is a pure subset?


A pure subset is a subset that contains only samples of one class.

5. What are Techniques to avoid Overfitting in Decision Tree?


Pruning
Hyperparameter Tuning
Random Forest algorithm
Ensembling techniques: a. Bagging, b. Boosting

6. How to tune hyperparameters in a decision tree classifier?


GridSearchCV
This performs an exhaustive search over a list of every candidate hyperparameter value. Every possible combination is tested, and the top-performing model is chosen.

RandomizedSearchCV
This method draws random samples from the hyperparameter space. The hyperparameter space can be defined using statistical distributions or lists, similar to what is done in grid search.
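
In scikit-learn these two strategies are provided by GridSearchCV and RandomizedSearchCV. The sketch below does not call scikit-learn; it only illustrates the difference in how candidates are chosen, using the standard library, a hypothetical search space, and a made-up `score` function standing in for a cross-validated score.

```python
import itertools
import random

# Hypothetical search space for a decision tree
param_grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]}

def score(params):
    """Made-up stand-in for a cross-validated score; peaks at depth 4, leaf size 5."""
    return -abs(params["max_depth"] - 4) - abs(params["min_samples_leaf"] - 5) / 5

# Grid search: evaluate every combination (4 * 3 = 12 candidates)
names = list(param_grid)
grid = [dict(zip(names, combo)) for combo in itertools.product(*param_grid.values())]
best_grid = max(grid, key=score)

# Random search: evaluate only a fixed number of randomly drawn combinations
random.seed(0)
draws = [{k: random.choice(v) for k, v in param_grid.items()} for _ in range(5)]
best_random = max(draws, key=score)

print(len(grid), best_grid)
print(len(draws), best_random)
```

Grid search is guaranteed to find the best candidate in the grid but its cost grows with every added value; random search caps the cost at the number of draws and may miss the best combination.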

7. What is pruning in the Decision Tree?


Pruning is the process of eliminating a decision tree's unnecessary branches. A decision tree may have some branches that reflect noisy or anomalous data.

Removing these undesired branches makes the tree less complex and more conducive to accurate predictive analysis, and it reduces overfitting.

8. What are the advantages and disadvantages of the Decision Tree?


Advantages
1. The decision tree model can be used for both classification and regression problems, and it is easy to interpret, understand, and visualize.

2. The output of a decision tree can also be easily understood.

3. Compared with other algorithms, data preparation during pre-processing in a decision tree requires less effort and does not require normalization of data.

4. The implementation can also be done without scaling the data.

5. A decision tree is one of the quickest ways to identify relationships between variables and the most significant variable.

6. New features can also be created for better target variable prediction.

7. Decision trees are not strongly influenced by outliers or missing values, and they can handle both numerical and categorical variables.

8. Since it is a non-parametric method, it has no assumptions about space distributions and classifier structure.

Disadvantages
1. Overfitting is one of the practical difficulties for decision tree models. It happens when the learning algorithm continues developing hypotheses that reduce the training set error but at the cost of increasing test set error. But this issue can be resolved by pruning and setting constraints on the model parameters.

2. Decision trees do not work as well with continuous numerical variables, because splitting them into threshold-based ranges loses information.

3. A small change in the data tends to cause a big difference in the tree structure, which causes instability.

4. Calculations involved can also become complex compared to other algorithms, and it takes a longer time to train the model.

5. It is also relatively expensive as the amount of time taken and the complexity levels are greater.

9. What is Decision Tree Regressor?


Decision tree regression observes the features of an object and trains a tree-structured model to predict future data as meaningful continuous output. Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values.

Discrete output example: A weather prediction model that predicts whether or not there’ll be rain on a particular day. Continuous output example: A profit prediction model that states the probable profit that can be generated from the sale of a product. Here, continuous values are predicted with the help of a decision tree regression model.

Mean squared error (MSE) is usually used to decide how to split a node into two or more sub-nodes in a decision tree regression. In the case of a binary tree, the algorithm picks a candidate value, splits the data into two subsets, and calculates the MSE of each subset; it repeats this for the other candidate values and chooses the split with the smallest MSE.
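
The split search described above can be sketched for a single numeric feature. The function names `mse` and `best_split`, the use of midpoints as candidate thresholds, the size-weighted combination of the two subset errors, and the data are all assumptions made for this illustration.

```python
def mse(values):
    """Mean squared error of values around their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(xs, ys):
    """Try a threshold between each pair of sorted x values; keep the lowest weighted MSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical data: the target jumps between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5]

threshold, cost = best_split(xs, ys)
print(threshold, round(cost, 3))
```

The chosen threshold falls where the target jumps, because that split leaves each subset with values close to its own mean.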

10. What are the applications of decision trees?


1. A decision tree can be applied to determine whether a loan applicant is likely to default.

2. It can be applied to determine an individual's likelihood of contracting a certain illness.

3. It can help online retailers forecast the likelihood that a customer will buy a specific product.

4. Customer churn rates can also be determined using decision trees.
