WEDNESDAY – NOVEMBER 8, 2023

  1. Import the necessary libraries:
  • Import the “pandas” library and assign it the alias ‘pd’ for working with data.
  • Import the “Counter” class from the “collections” module, which is used to count the frequency of words.
  2. Define the column names you want to analyze:
  • Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies.
  • In this code, the specified columns are ‘threat_type’, ‘flee_status’, ‘armed_with’, and ‘body_camera’.
  3. Specify the file path to your Excel document:
  • Set the “directory_path” variable to the file path of the Excel file you want to analyze.
  4. Load your data into a DataFrame:
  • Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a Pandas DataFrame named ‘df’.
  5. Initialize a dictionary to store word counts for each column:
  • Create an empty dictionary named “word_counts” to store the word counts for each specified column.
  6. Iterate through the specified columns:
  • Use a for loop to iterate through each column name in the “columns_to_analyze” list.
  7. Retrieve and preprocess the data from the column:
  • Within the loop, retrieve the data from the current column using “df[column_name]”. Convert the data to strings using “.astype(str)” to ensure a consistent data type, and store it in the “column_data” variable.
  8. Tokenize the text and count the frequency of each word:
  • Join all the text in the column into a single string using ' '.join(column_data).
  • Split the string on whitespace into individual words using .split(), and store the result in the “words” list. This step prepares the data for word frequency counting.
  • Use the “Counter” class to count the frequency of each word in the “words” list and store the results in the “word_counts” dictionary under the column name as the key.
  9. Print the words and their frequencies for each column:
  • After processing all specified columns, iterate through the “word_counts” dictionary.
  • For each column, print the column name, followed by the individual words and their counts.
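
The steps above can be sketched as follows. This is a minimal sketch rather than the original script: the sample rows and their values are made up for illustration and stand in for the data that pd.read_excel(directory_path) would load, and the helper name count_words is hypothetical.

```python
from collections import Counter

import pandas as pd

# Columns to analyze (names taken from the steps above).
columns_to_analyze = ["threat_type", "flee_status", "armed_with", "body_camera"]


def count_words(df, columns):
    """Return {column name: Counter of whitespace-separated words}."""
    word_counts = {}
    for column_name in columns:
        # Convert every value to a string so join() always receives text.
        column_data = df[column_name].astype(str)
        # Join the column into one string, then split it on whitespace.
        words = " ".join(column_data).split()
        word_counts[column_name] = Counter(words)
    return word_counts


# Made-up sample rows (assumption); with a real file this would be
# df = pd.read_excel(directory_path)
df = pd.DataFrame(
    {
        "threat_type": ["shoot", "threat", "shoot"],
        "flee_status": ["not", "car", "not"],
        "armed_with": ["gun", "knife", "gun"],
        "body_camera": [False, True, False],
    }
)

word_counts = count_words(df, columns_to_analyze)

# Print each column name followed by its words and their counts.
for column_name, counts in word_counts.items():
    print(f"Word frequencies for {column_name}:")
    for word, count in counts.items():
        print(f"  {word}: {count}")
```

Because every value is converted to a string first, non-text columns such as ‘body_camera’ are counted as the words “True” and “False”.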

MONDAY – NOVEMBER 6, 2023

1. Import the necessary libraries: Import the “pandas” library and assign it the alias ‘pd’ for data manipulation. Import the “matplotlib.pyplot” library and assign it the alias ‘plt’ for data visualization.

2. Load the Excel file into a DataFrame: Specify the file path to the Excel file that you want to load (update this path to your Excel file’s location). Specify the name of the sheet within the Excel file from which data should be read. Use the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df’.

3. Drop rows with missing ‘race’, ‘age’, or ‘gender’ values: Remove rows from the DataFrame where any of these three columns (race, age, gender) has a missing value.

4. Create age groups: Define the boundaries for age groups using the ‘age_bins’ variable. Provide labels for each age group, corresponding to ‘age_bins’, using the ‘age_labels’ variable.

5. Cut the age data into age groups for each race category: Create a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels’.

6. Count the number of individuals in each age group by race and gender: Group the data by race, gender, and age group. Count the number of individuals in each combination. Use the unstack() function to reshape the data so that each age group becomes a column, which is more suitable for visualization. Fill missing values with 0 using fillna(0).

7. Calculate the median age for each race and gender combination: Group the data by race and gender. Calculate the median age for each combination.

8. Print the median age for each race and gender combination: Print a header indicating “Median Age by Race and Gender.” Print the calculated median age for each race and gender combination.

9. Create grouped bar charts for different genders: The code iterates over the unique gender values in the DataFrame.

10. For each gender: Subset the DataFrame to include only data for that gender. Create a grouped bar chart that displays the number of individuals in different age groups for each race-gender combination. Set various plot properties such as the title, labels, legend, and rotation of x-axis labels. Display the plot using plt.show().
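
The grouping and reshaping in steps 3 to 8 can be sketched as below. This is an illustration under stated assumptions, not the original script: the rows are made up, and the values of ‘age_bins’ and ‘age_labels’ are assumed because the notes do not list them.

```python
import pandas as pd

# Made-up rows standing in for the Excel sheet (assumption).
df = pd.DataFrame(
    {
        "race": ["W", "B", "W", "B", "W", None],
        "gender": ["male", "male", "female", "male", "male", "male"],
        "age": [25, 40, 33, float("nan"), 29, 50],
    }
)

# Step 3: drop rows where race, age, or gender is missing.
df = df.dropna(subset=["race", "age", "gender"]).copy()

# Steps 4-5: assumed bin edges and labels; each bin includes its right edge.
age_bins = [0, 18, 30, 45, 60, 100]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Step 6: count rows per (race, gender, age group). unstack() moves the
# age groups into columns; fillna(0) fills combinations that have no rows.
counts = (
    df.groupby(["race", "gender", "Age Group"], observed=True)
    .size()
    .unstack()
    .fillna(0)
)

# Steps 7-8: median age per (race, gender) combination.
median_age = df.groupby(["race", "gender"])["age"].median()

print("Median Age by Race and Gender")
print(median_age)
print(counts)
```

With matplotlib installed, a chart for one gender could then be drawn from a slice of this table, for example counts.xs("male", level="gender").plot(kind="bar").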

WEDNESDAY – NOVEMBER 1, 2023

Import the necessary libraries:

The code imports the “pandas” library for data analysis and the “Counter” class from the “collections” module for counting elements in a list.
Specify the columns to be analyzed:

The code specifies the names of the columns you want to analyze from an Excel file. These columns contain data such as “threat_type,” “flee_status,” “armed_with,” and others.
Set the file path to the Excel document:

The code sets the file path to the location of your Excel file. You should replace this path with the actual path to your Excel file.
Load the data from the Excel file into a DataFrame:

The code uses the “pd.read_excel” function to load the data from the Excel file into a Pandas DataFrame, which is a table-like structure for data.
Initialize a dictionary for word counts:

The code initializes a dictionary called “word_counts” to store word frequencies for each of the specified columns. Each column has its own word frequency counts.
Process each specified column:

For each column specified for analysis, the code performs the following steps:
It retrieves the data from that column and converts it to strings to ensure a uniform data type. This is important for text processing.
It tokenizes the text in the column by breaking it into individual words. Tokenization is the process of splitting text into smaller units, such as words or phrases.
It counts how many times each word appears in that column using the “Counter” class, and these word counts are stored in the “word_counts” dictionary under the column’s name.
Print the words and their frequencies:

Finally, the code goes through the “word_counts” dictionary for each specified column and displays the words and how many times they appear in that column. This gives insight into the most common words or phrases in each column.
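
One consequence of tokenizing on whitespace is that a cell containing several words is counted word by word, not as one value. The short sketch below contrasts the two ways of counting; the cell values are made up for illustration.

```python
from collections import Counter

# Made-up cell values from a single column (assumption).
values = ["gun", "toy weapon", "gun", "blunt object", "gun"]

# Word-level counts, as in the process described above:
# join everything into one string, then split on whitespace.
word_counts = Counter(" ".join(values).split())

# Value-level counts: each cell is counted as a whole.
value_counts = Counter(values)

print(word_counts.most_common(1))
print(value_counts["toy weapon"], word_counts["toy"], word_counts["weapon"])
```

Which of the two is more useful depends on whether the column holds free text or short category labels.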

FRIDAY – NOVEMBER 3, 2023

Import the necessary libraries:

import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd’.
import matplotlib.pyplot as plt: Imports the Matplotlib library, specifically the ‘pyplot’ module, and assigns it the alias ‘plt’. Matplotlib is used for creating plots and visualizations.

Load the Excel file into a DataFrame:

directory_path: Specifies the file path to the Excel file you want to load. Make sure to update this path to the location of your Excel file.
sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
df = pd.read_excel(directory_path, sheet_name=sheet_name): Uses the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df’.

Calculate the median age of all individuals:

median_age = df['age'].median(): Calculates the median age of all individuals in the ‘age’ column of the DataFrame and stores it in the ‘median_age’ variable.
print("Median Age of All Individuals:", median_age): Prints the calculated median age to the console.
Create age groups:

age_bins: Defines the boundaries for age groups. In this case, individuals will be grouped into the specified age ranges.
age_labels: Provides labels for each age group, corresponding to the ‘age_bins’.
Cut the age data into age groups:

df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels’.
Count the number of individuals in each age group:

age_group_counts = df['Age Group'].value_counts().sort_index(): Counts the number of individuals in each age group and sorts them by the age group labels. The result is stored in the ‘age_group_counts’ variable.
Create a bar graph to analyze age groups:

plt.figure(figsize=(10, 6)): Sets the size of the figure for the upcoming plot.
age_group_counts.plot(kind='bar', color='skyblue'): Plots a bar graph using the ‘age_group_counts’ data, where each bar represents an age group. ‘skyblue’ is the color of the bars.
plt.title('Age Group Analysis'): Sets the title of the plot.
plt.xlabel('Age Group'): Sets the label for the x-axis.
plt.ylabel('Number of Individuals'): Sets the label for the y-axis.
plt.xticks(rotation=45): Rotates the x-axis labels by 45 degrees for better readability.
plt.show(): Displays the bar graph on the screen.
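
Put together, the calls described above could look like the sketch below. The ages are made up and the values of ‘age_bins’ and ‘age_labels’ are assumptions, since the notes do not list them; the function names count_age_groups and plot_age_groups are hypothetical helpers added for illustration.

```python
import pandas as pd


def count_age_groups(df, age_bins, age_labels):
    """Bin the 'age' column and count rows per age group, in bin order."""
    age_group = pd.cut(df["age"], bins=age_bins, labels=age_labels)
    return age_group.value_counts().sort_index()


def plot_age_groups(age_group_counts):
    """Draw the bar chart with the plot settings described above."""
    # Imported here so the counting step also runs without matplotlib.
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 6))
    age_group_counts.plot(kind="bar", color="skyblue")
    plt.title("Age Group Analysis")
    plt.xlabel("Age Group")
    plt.ylabel("Number of Individuals")
    plt.xticks(rotation=45)
    plt.show()


# Made-up ages (assumption); with a real file this would be
# df = pd.read_excel(directory_path, sheet_name=sheet_name)
df = pd.DataFrame({"age": [17, 22, 25, 31, 44, 52, 67]})

median_age = df["age"].median()
print("Median Age of All Individuals:", median_age)

# Assumed bin edges and labels; each bin includes its right edge.
age_bins = [0, 18, 30, 45, 60, 100]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]

age_group_counts = count_age_groups(df, age_bins, age_labels)
print(age_group_counts)

# plot_age_groups(age_group_counts)  # uncomment to display the chart
```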

MONDAY – OCTOBER 30, 2023

Data Collection:

Coordinate with Gary to obtain location data from police stations.
Gather geographical information on police stations, including their latitude and longitude coordinates.
Accurate location data is crucial for subsequent analysis.
Distance Calculation:

Calculate the distances between police stations using the coordinates obtained.
Understanding law enforcement’s spatial distribution and coverage in the area is the goal of this step.
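
One common way to turn latitude and longitude pairs into distances is the haversine (great-circle) formula. The notes do not say which method will be used, so the sketch below is only one option; the station names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical station names and coordinates, for illustration only.
stations = {
    "Station A": (42.3601, -71.0589),
    "Station B": (42.3736, -71.1097),
    "Station C": (42.2529, -71.0023),
}

# Distance between every pair of stations.
for (name1, (lat1, lon1)), (name2, (lat2, lon2)) in combinations(stations.items(), 2):
    distance = haversine_km(lat1, lon1, lat2, lon2)
    print(f"{name1} to {name2}: {distance:.2f} km")
```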
Demographic Analysis:

Analyze data related to race, age, and shooting incidents.
Identify areas with the highest frequency of shootings.
This analysis helps identify potential hotspots.
Proximity Analysis:

Investigate how far shooting incidents occur from the police stations.
This analysis provides insights into response times and areas where increased law enforcement presence may be needed.
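
A possible way to measure this is to compute, for every incident, the distance to its nearest station. The sketch below uses the haversine great-circle formula in vectorized form with NumPy; the function name nearest_station and all coordinates are hypothetical, and straight-line distance is only a rough proxy for response time.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def nearest_station(incidents, stations):
    """For each incident, return (index of nearest station, distance in km).

    incidents: sequence of (lat, lon) pairs in degrees, shape (n, 2).
    stations:  sequence of (lat, lon) pairs in degrees, shape (m, 2).
    """
    inc = np.radians(np.asarray(incidents, dtype=float))[:, None, :]  # (n, 1, 2)
    sta = np.radians(np.asarray(stations, dtype=float))[None, :, :]  # (1, m, 2)
    dlat = sta[..., 0] - inc[..., 0]
    dlon = sta[..., 1] - inc[..., 1]
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(inc[..., 0]) * np.cos(sta[..., 0]) * np.sin(dlon / 2) ** 2
    )
    # Distance matrix of shape (n, m); clip guards against rounding error.
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]


# Hypothetical coordinates, for illustration only.
stations = [(40.0, -75.0), (41.0, -75.0)]
incidents = [(40.0, -75.0), (40.9, -75.0), (40.4, -75.0)]

idx, dist = nearest_station(incidents, stations)
for i, (station_index, km) in enumerate(zip(idx, dist)):
    print(f"Incident {i}: nearest station {station_index}, {km:.2f} km away")
```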
Data Segmentation:

Segment the data into training and testing datasets.
Consider the population distribution to ensure that the models are representative and capable of making accurate predictions or classifications.
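
One way to keep both parts representative is a stratified split, where each group contributes the same fraction of its rows to the test set. The notes do not specify a method, so this is a sketch of one option using pandas only; the helper name stratified_split, the 80/20 ratio, and the sample rows are assumptions.

```python
import pandas as pd


def stratified_split(df, stratify_col, test_fraction=0.2, seed=0):
    """Split df so every group in stratify_col gives the same share to the test set."""
    test = df.groupby(stratify_col, group_keys=False).sample(
        frac=test_fraction, random_state=seed
    )
    # Everything not sampled into the test set becomes training data.
    train = df.drop(test.index)
    return train, test


# Made-up rows (assumption): 10 records in group "A" and 5 in group "B".
df = pd.DataFrame({"race": ["A"] * 10 + ["B"] * 5, "age": range(20, 35)})

train, test = stratified_split(df, "race")
print(len(train), len(test))
print(test["race"].value_counts())
```

Here both groups send 20% of their rows to the test set, so the 2:1 ratio between the groups is preserved in both parts.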

FRIDAY – OCTOBER 27, 2023

This Python script is a reusable tool for text analysis within specific columns of an Excel dataset. It follows a structured process: importing the required libraries, specifying the columns and file path, loading the data into a Pandas DataFrame, initializing an empty dictionary for word counts, iterating through the specified columns, tokenizing the text into words, counting word frequencies, and finally printing the results. The script is easy to adapt, and its word counts can serve as a starting point for tasks such as text mining, sentiment analysis, or content categorization.