- Import the necessary libraries:
- Import the “pandas” library and assign it the alias ‘pd’ for working with data.
- Import the “Counter” class from the “collections” module, which is used to count the frequency of words.
- Define the column names you want to analyze:
- Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies.
- In this code, the specified columns are ‘threat_type’, ‘flee_status’, ‘armed_with’, and ‘body_camera’.
- Specify the file path to your Excel document:
- Set the “directory_path” variable to the path of the Excel file you want to analyze. Despite its name, this variable holds the path to the file itself, not to a folder.
- Load your data into a data frame:
- Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a pandas DataFrame named ‘df’.
- Initialize a dictionary to store word counts for each column:
- Create an empty dictionary named “word_counts” to store the word counts for each specified column.
- Iterate through the specified columns:
- Use a for loop to iterate through each column name specified in the “columns_to_analyze” list.
- Retrieve and preprocess the data from the column:
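- Within the loop, retrieve the data from the current column using “df[column_name]”. Convert the data to strings using “.astype(str)” to ensure a consistent data type, and store it in the “column_data” variable. Note that missing values become the string “nan” after this conversion, so “nan” will be counted as a word.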
- Tokenize the text and count the frequency of each word:
- Still within the loop, process the text of the current column in the following steps:
- Join all the text in the column into a single string using “' '.join(column_data)”.
- Split the string into individual words using “.split()” and store the result in a list named “words”. This step prepares the data for word frequency counting.
- Use the “Counter” class to count the frequency of each word in the “words” list and store the result in the “word_counts” dictionary, with the column name as the key.
- Print the words and their frequencies for each column:
- After processing all specified columns, iterate through the “word_counts” dictionary.
- For each column, print the column name, followed by each word and its count, so that the word frequencies for every specified column are displayed.
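
The steps above can be sketched end to end as follows. This is a minimal sketch rather than the original script: the helper names “count_words”, “count_words_by_column”, and “main” are hypothetical, the print format is an assumption, and the Excel path is left for you to supply. Reading .xlsx files with pd.read_excel typically also requires the openpyxl package to be installed.

```python
from collections import Counter


def count_words(values):
    """Join the values into one string, split on whitespace, and count the words."""
    # str(v) mirrors .astype(str): a missing cell becomes the literal word "nan".
    text = " ".join(str(v) for v in values)
    words = text.split()
    return Counter(words)


def count_words_by_column(df, columns_to_analyze):
    """Return a dict mapping each requested column name to its word Counter."""
    word_counts = {}
    for column_name in columns_to_analyze:
        column_data = df[column_name].astype(str)
        word_counts[column_name] = count_words(column_data)
    return word_counts


def main(directory_path):
    # Imported here so the counting helpers above stay usable without pandas.
    import pandas as pd

    columns_to_analyze = ["threat_type", "flee_status", "armed_with", "body_camera"]
    df = pd.read_excel(directory_path)
    word_counts = count_words_by_column(df, columns_to_analyze)

    # Print the words and their frequencies for each column.
    for column_name, counts in word_counts.items():
        print(f"Word frequencies for {column_name}:")
        for word, count in counts.items():
            print(f"{word}: {count}")
        print()
```

To run it, call “main” with the path to your own Excel file. Because splitting is done on whitespace, a multi-word cell value contributes one count per word rather than one count for the whole phrase.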