WEDNESDAY – NOVEMBER 8, 2023.

  1. Import the necessary libraries:
  • Import the “pandas” library and assign it the alias ‘pd’ for working with data.
  • Import the “Counter” class from the “collections” module, which is used to count the frequency of words.
  1. Define the column names you want to analyze:
  • Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies.
  • In this code, the specified columns are ‘threat_type’, ‘flee_status’, ‘armed_with’, and ‘body_camera’.
  1. Specify the file path to your Excel document:
  • Set the “directory_path” variable to the full path of the Excel file you want to analyze. Despite its name, the variable holds a file path, not a directory.
  1. Load your data into a data frame:
  • Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a Pandas DataFrame named ‘df’.
  1. Initialize a dictionary to store word counts for each column:
  • Create an empty dictionary named “word_counts” to store the word counts for each specified column.
  1. Iterate through the specified columns:
  • Use a for loop to iterate through each column name specified in the “columns_to_analyze” list.
  1. Retrieve and preprocess the data from the column:
  • Within the loop, retrieve the data from the current column using df[column_name]. Convert the data to strings using .astype(str) so that every value has the same type and can be joined as text, and store the result in the “column_data” variable.
  1. Tokenize the text and count the frequency of each word:

Tokenize the text within each column using the following steps:

  • Join all the text in the column into a single string using ' '.join(column_data).
  • Split the string into individual words using .split() and store the result in a list named “words”. Called with no arguments, .split() breaks the string on any run of whitespace.
  • Use the “Counter” class to count the frequency of each word in the “words” list, and store the result in the “word_counts” dictionary with the column name as the key.
  1. Print the words and their frequencies for each column:
  • After processing all specified columns, iterate through the “word_counts” dictionary.
  • For each column, print the column name, followed by each word and its count, so that the word frequencies of every specified column are displayed.
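
The steps above can be put together roughly as follows. This is a minimal sketch, not the original script: the helper name count_words and the small in-memory DataFrame are assumptions made so the example runs without the Excel file, and the sample values are made up for illustration. With the real data, the DataFrame would instead come from pd.read_excel(directory_path).

```python
from collections import Counter

import pandas as pd

# Columns named in the post.
columns_to_analyze = ["threat_type", "flee_status", "armed_with", "body_camera"]


def count_words(df, columns):
    """Return a dict mapping each column name to a Counter of its words.

    Hypothetical helper: it wraps the loop described in the steps above.
    """
    word_counts = {}
    for column_name in columns:
        # Cast every cell to str so that non-text values can be joined.
        column_data = df[column_name].astype(str)
        # One long string, then split on whitespace into individual words.
        words = " ".join(column_data).split()
        word_counts[column_name] = Counter(words)
    return word_counts


# Made-up sample rows standing in for the Excel file (an assumption).
# With the real file: df = pd.read_excel(directory_path)
df = pd.DataFrame(
    {
        "threat_type": ["shoot", "threat", "shoot"],
        "flee_status": ["not", "car", "foot"],
        "armed_with": ["gun", "knife", "replica gun"],
        "body_camera": [False, True, False],
    }
)

word_counts = count_words(df, columns_to_analyze)

# Print each column name followed by its words and their counts.
for column_name, counts in word_counts.items():
    print(f"Word frequencies for '{column_name}':")
    for word, count in counts.items():
        print(f"  {word}: {count}")
```

Because the text is split on whitespace, a multi-word cell such as "replica gun" is counted as two separate words. To list the words from most to least frequent, iterate over counts.most_common() instead of counts.items(). Reading .xlsx files with pd.read_excel usually also requires an Excel engine such as openpyxl to be installed.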
