Various data analysis methods can be used to analyze the Hyde Park data over time. Begin by examining population trends over time to identify significant changes in each demographic category. Bar charts can be useful for depicting changes in age distribution and demonstrating how the population structure has evolved. Pie charts and bar graphs can effectively represent educational attainment levels, providing clear visuals of the community’s educational progress. Analyzing nativity and race/ethnicity data entails observing changes in the demographic makeup using percentage distribution methods. Gender-based labor force participation can be represented graphically to reveal changing workforce trends. For housing tenure, pie charts or bar graphs can show the shifting balance of owner-occupied and renter-occupied housing.
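As a minimal sketch of the percentage-distribution method mentioned above, the snippet below converts raw housing-tenure counts into percentage shares per census year with pandas. The counts and years are made up for illustration, not Hyde Park’s actual figures.

```python
import pandas as pd

# Hypothetical housing-tenure counts for two census years
# (illustrative values only, not the real Hyde Park data).
tenure = pd.DataFrame(
    {"owner_occupied": [4000, 5200], "renter_occupied": [6000, 4800]},
    index=[1990, 2010],
)

# Convert raw counts to percentage shares within each year.
shares = tenure.div(tenure.sum(axis=1), axis=0) * 100
print(shares)
```

The resulting `shares` frame feeds a chart directly, e.g. `shares.plot(kind="bar")`.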
WEDNESDAY – NOVEMBER 15, 2023
Today I examined the second sheet, “Back Bay,” in the Excel file from https://data.boston.gov/dataset/neighborhood-demographics.
The Back Bay dataset provides insights into the neighborhood’s evolution over time, allowing for a comprehensive analysis of various demographic aspects. Population fluctuations are notable, with a decline until 1990 followed by relative stability. Age distribution highlights shifts in the percentage of residents across different age groups, with the 20-34 age bracket increasing from 32% in 1950 to 54% in 1980. Educational attainment shows shifting proportions of people with varying levels of education, most notably a significant increase in those with a Bachelor’s Degree or higher from 20% in 1950 to 81% in 2010.
MONDAY – NOVEMBER 13, 2023
I’m currently looking through the dataset on Analyze Boston, specifically the “Allston” sheet within the “neighborhoodsummaryclean_1950-2010” Excel file, which can be found at https://data.boston.gov/dataset/neighborhood-demographics. The dataset provides an in-depth look at demographic and socioeconomic trends in Allston over several decades. Notably, there is clear population growth between 1950 and 2010. The data on age distribution reveals interesting patterns, such as shifts in the percentage of residents across various age groups over time. Data on educational attainment reflect changes in the population’s education levels, most notably a significant increase in the percentage of people with a Bachelor’s degree or higher. The nativity data reveals the percentage of foreign-born residents, indicating changes in immigration patterns.
SUNDAY – NOVEMBER 12, 2023
This is Project 2 for MTH 522 at the University of Massachusetts Dartmouth.
Project Title:
Spatial and Demographic Analysis of Police Shootings Related to Police Station Locations in the United States
FRIDAY – NOVEMBER 10, 2023
In today’s analysis, I completed the following steps:
Data Loading: Imported police shooting data from an Excel spreadsheet into a Pandas DataFrame.
Justification Criteria: Defined a function to decide whether the use of force was justified, taking into account the threat type and weapon involved.
Data Transformation: Applied the justification function to the dataset, creating a new column that records each incident’s justification status.
Data Filtering: Filtered the data to include only incidents involving people in the Black, White, Hispanic, and Asian race groupings.
Gender Separation: Split the filtered data by gender to analyze incidents affecting males and females separately.
Calculation of Occurrences and Percentages: For each race group, calculated the occurrences and percentages of ‘False’ (unjustified) force instances.
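A minimal sketch of this pipeline, assuming hypothetical rows and an illustrative justification rule — the actual criteria used in the analysis may differ, and the Excel load is replaced by an inline DataFrame so the example runs on its own:

```python
import pandas as pd

# Stand-in for the Excel data; column names follow the journal entry,
# but the rows are invented for illustration.
df = pd.DataFrame({
    "threat_type": ["attack", "flee", "attack", "undetermined"],
    "armed_with":  ["gun", "unarmed", "knife", "unarmed"],
    "race":        ["Black", "White", "Hispanic", "Asian"],
    "gender":      ["male", "female", "male", "male"],
})

def is_justified(row):
    # Assumed rule (not the author's exact criteria): force counts as
    # justified only when the person posed an active threat while armed.
    return row["threat_type"] == "attack" and row["armed_with"] != "unarmed"

# New column recording each incident's justification status.
df["justified"] = df.apply(is_justified, axis=1)

# Keep only the four race groupings analyzed in the entry.
subset = df[df["race"].isin(["Black", "White", "Hispanic", "Asian"])]

# Percentage of 'False' (unjustified) instances per race group.
unjustified_pct = subset.groupby("race")["justified"].agg(
    lambda s: 100 * (~s).mean()
)
print(unjustified_pct)
```

Splitting `subset` by gender before the groupby (e.g. `subset[subset["gender"] == "male"]`) reproduces the gender-separation step.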
WEDNESDAY – NOVEMBER 8, 2023
- Import the necessary libraries:
- Import the “pandas” library and assign it the alias ‘pd’ for working with data.
- Import the “Counter” class from the “collections” module, which is used to count the frequency of words.
- Define the column names you want to analyze:
- Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies.
- In this code, the specified columns are ‘threat_type,’ ‘flee_status,’ ‘armed_with,’ and ‘body_camera.’
- Specify the file path to your Excel document:
- Set the “directory_path” variable to specify the file path to the Excel file we want to analyze.
- Load your data into a data frame:
- Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a Pandas DataFrame named ‘df.’
- Initialize a dictionary to store word counts for each column:
- Create an empty dictionary named “word_counts” to store the word counts for each specified column.
- Iterate through the specified columns:
- Use a for loop to iterate through each column name specified in the “columns_to_analyze” list.
- Retrieve and preprocess the data from the column:
- Within the loop, retrieve the data from the current column using “df[column_name].” Convert the data to strings using “.astype(str)” to ensure a consistent data type, and store it in the “column_data” variable.
- Tokenize the text and count the frequency of each word:
- Tokenize the text within each column using the following steps:
- Join all the text in the column into a single string using ' '.join(column_data).
- Split the string into individual words using .split(). This step prepares the data for word frequency counting.
- Use the “Counter” class to count the frequency of each word in the “words” list and store the results in the “word_counts” dictionary under the column name as the key.
- Print the words and their frequencies for each column:
- After processing all specified columns, iterate through the “word_counts” dictionary.
- For each column, print the column name, followed by the individual words and their counts. This information is used to display the word frequencies for each specified column.
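The steps above assemble into a runnable sketch. The Excel path is a placeholder, so a small inline DataFrame with invented values stands in for the file to keep the example self-contained:

```python
import pandas as pd
from collections import Counter

# Columns to analyze for word frequencies.
columns_to_analyze = ["threat_type", "flee_status", "armed_with", "body_camera"]

# In the real script the data comes from Excel:
# directory_path = "path/to/your/file.xlsx"   # placeholder path
# df = pd.read_excel(directory_path)
# Tiny stand-in data so this sketch runs on its own:
df = pd.DataFrame({
    "threat_type": ["attack", "flee", "attack"],
    "flee_status": ["foot", "car", "not"],
    "armed_with":  ["gun", "gun", "knife"],
    "body_camera": ["True", "False", "True"],
})

word_counts = {}
for column_name in columns_to_analyze:
    column_data = df[column_name].astype(str)      # ensure uniform string type
    words = " ".join(column_data).split()          # tokenize into words
    word_counts[column_name] = Counter(words)      # frequency of each word

# Print the words and their frequencies for each column.
for column_name, counts in word_counts.items():
    print(column_name)
    for word, count in counts.items():
        print(f"  {word}: {count}")
```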
MONDAY – NOVEMBER 6, 2023
1. Import the necessary libraries: Import the “pandas” library and assign it the alias ‘pd’ for data manipulation. Import the “matplotlib.pyplot” library and assign it the alias ‘plt’ for data visualization.
2. Load the Excel file into a DataFrame: Specify the file path to the Excel file that you want to load (update this path to your Excel file’s location).
Specify the name of the sheet within the Excel file from which data should be read. Use the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df.’
3. Drop rows with missing ‘race,’ ‘age,’ or ‘gender’ values: Remove rows from the DataFrame where any of these three columns (race, age, gender) have missing values.
4. Create age groups: Define the boundaries for age groups using the ‘age_bins’ variable. Provide labels for each age group, corresponding to ‘age_bins,’ using the ‘age_labels’ variable.
5. Cut the age data into age groups for each race category: Create a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’
6. Count the number of individuals in each age group by race and gender: Group the data by race, gender, and age group. Count the number of individuals in each combination. Use the unstack() function to reshape the data, making it more suitable for visualization. Fill missing values with 0 using fillna(0).
7. Calculate the median age for each race and gender combination: Group the data by race and gender. Calculate the median age for each combination.
8. Print the median age for each race and gender combination: Print a header indicating “Median Age by Race and Gender.” Print the calculated median age for each race and gender combination.
9. Create grouped bar charts for different genders: The code iterates over unique gender values in the DataFrame.
10. For each gender: Subset the DataFrame to include only data for that gender. Create a grouped bar chart that displays the number of individuals in different age groups for each race-gender combination.
Set various plot properties such as the title, labels, legend, and rotation of x-axis labels. Display the plot using plt.show().
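A hedged sketch of steps 1–10, with an invented miniature DataFrame in place of the Excel file; the bin edges and labels are assumptions, not the author’s exact choices, and the Agg backend keeps the plotting headless:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend; drop when running interactively
import matplotlib.pyplot as plt

# Stand-in for the cleaned shootings data; column names match the journal,
# the rows are made up.
df = pd.DataFrame({
    "race":   ["Black", "White", "Black", "Hispanic", "White", "Asian"],
    "age":    [23, 41, 35, 19, 67, 30],
    "gender": ["male", "male", "female", "male", "female", "male"],
}).dropna(subset=["race", "age", "gender"])

# Assumed age-group boundaries and labels.
age_bins = [0, 17, 29, 44, 59, 120]
age_labels = ["0-17", "18-29", "30-44", "45-59", "60+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Counts per race/gender/age-group, reshaped for plotting.
counts = (df.groupby(["race", "gender", "Age Group"], observed=False)
            .size().unstack().fillna(0))

# Median age for each race and gender combination.
median_age = df.groupby(["race", "gender"])["age"].median()
print("Median Age by Race and Gender")
print(median_age)

# One grouped bar chart per gender.
for gender in df["gender"].unique():
    subset = counts.xs(gender, level="gender")
    subset.plot(kind="bar", figsize=(10, 6),
                title=f"Age Groups by Race ({gender})")
    plt.xlabel("Race")
    plt.ylabel("Number of Individuals")
    plt.xticks(rotation=45)
    plt.tight_layout()
```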
WEDNESDAY – NOVEMBER 1, 2023
Import the fundamental libraries:
The code imports the “pandas” library for data analysis and the “Counter” class from the “collections” module for counting elements in a list.
Specify the columns to be analyzed:
The code lists the names of the columns to analyze from an Excel file. These columns contain data such as “threat_type,” “flee_status,” “armed_with,” and others.
Set the file path to the Excel document:
The code sets the file path to the location of the Excel file. You should replace this path with the real path to your Excel file.
Load the data from the Excel file into a DataFrame:
The code uses the “pd.read_excel” function to load the data from the Excel file into a Pandas DataFrame, a table-like structure for data.
Initialize a dictionary for word counts:
The code initializes a dictionary called “word_counts” to store word frequencies for each of the desired columns. Each column will have its own word-frequency counts.
Process each specified column:
For each column specified for analysis, the code performs the following steps:
It retrieves the data from that column and converts it to strings to guarantee a uniform data type. This is important for text processing.
It tokenizes the text within the column by breaking it into individual words. Tokenization is the process of splitting text into smaller units, such as words or phrases.
It counts how many times each word appears in that column using the “Counter” class, and these word counts are stored in the “word_counts” dictionary under the column’s name.
Print the words and their frequencies:
Finally, the code goes through the “word_counts” dictionary for each specified column and displays the words and how many times they appear in that column. This gives insight into the most common words or phrases in each column.
FRIDAY – NOVEMBER 3, 2023
Import the necessary libraries:
import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd’.
import matplotlib.pyplot as plt: Imports the Matplotlib library, specifically the ‘pyplot’ module, and assigns it the alias ‘plt’. Matplotlib is used for creating plots and visualizations.
Load the Excel file into a Data Frame:
directory_path: Specifies the file path to the Excel file you want to load. Make sure to update this path to the location of your Excel file.
sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
df = pd.read_excel(directory_path, sheet_name=sheet_name): Uses the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df’.
Calculate the median age of all individuals:
median_age = df['age'].median(): Calculates the median age of all individuals in the ‘age’ column of the DataFrame and stores it in the ‘median_age’ variable.
print("Median Age of All Individuals:", median_age): Prints the calculated median age to the console.
Create age groups:
age_bins: Defines the boundaries for age groups. In this case, individuals will be grouped into the specified age ranges.
age_labels: Provides labels for each age group, corresponding to the ‘age_bins’.
Cut the age data into age groups:
df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’
Count the number of individuals in each age group:
age_group_counts = df['Age Group'].value_counts().sort_index(): Counts the number of individuals in each age group and sorts them by the age group labels. The result is stored in the ‘age_group_counts’ variable.
Create a bar graph to analyze age groups:
plt.figure(figsize=(10, 6)): Sets the size of the figure for the upcoming plot.
age_group_counts.plot(kind='bar', color='skyblue'): Plots a bar graph using the ‘age_group_counts’ data, where each bar represents an age group. ‘skyblue’ is the color of the bars.
plt.title('Age Group Analysis'): Sets the title of the plot.
plt.xlabel('Age Group'): Sets the label for the x-axis.
plt.ylabel('Number of Individuals'): Sets the label for the y-axis.
plt.xticks(rotation=45): Rotates the x-axis labels by 45 degrees for better readability.
plt.show(): Displays the bar graph on the screen.
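The snippets above assemble into a single runnable sketch. A tiny inline DataFrame of made-up ages stands in for the Excel file (whose path and sheet name are placeholders), and the Agg backend keeps the plotting headless:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend; drop when running interactively
import matplotlib.pyplot as plt

# In the real script the data comes from Excel:
# df = pd.read_excel(directory_path, sheet_name=sheet_name)
# Invented ages so this sketch runs on its own:
df = pd.DataFrame({"age": [12, 25, 33, 47, 52, 61, 70, 28]})

# Median age of all individuals.
median_age = df["age"].median()
print("Median Age of All Individuals:", median_age)

# Assumed age-group boundaries and labels.
age_bins = [0, 17, 29, 44, 59, 120]
age_labels = ["0-17", "18-29", "30-44", "45-59", "60+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Count individuals per age group, sorted by group label.
age_group_counts = df["Age Group"].value_counts().sort_index()

plt.figure(figsize=(10, 6))
age_group_counts.plot(kind="bar", color="skyblue")
plt.title("Age Group Analysis")
plt.xlabel("Age Group")
plt.ylabel("Number of Individuals")
plt.xticks(rotation=45)
plt.tight_layout()
```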
MONDAY – OCTOBER 30, 2023
Data Collection:
Coordinate with Gary to obtain location data from police stations.
Gather geographical information on police stations, including their latitude and longitude coordinates.
Accurate location data is crucial for subsequent analysis.
Distance Calculation:
Calculate the distances between police stations using the coordinates obtained.
The goal of this step is to understand law enforcement’s spatial distribution and coverage in the area.
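A standard way to carry out this step is the haversine great-circle distance between two latitude/longitude points; the coordinates below are hypothetical station locations, not the actual data from Gary:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))   # mean Earth radius ~ 6371 km

# Hypothetical station coordinates, roughly one degree of latitude apart
# (about 111 km on the ground).
d = haversine_km(42.0, -71.0, 43.0, -71.0)
print(round(d, 1), "km")
```

Applying this function pairwise over the station list yields the distance matrix the step describes.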
Demographic Analysis:
Analyze data related to race, age, and shooting incidents.
Identify areas with the highest frequency of shootings.
This analysis helps identify potential hotspots.
Proximity Analysis:
Investigate how far shooting incidents occur from the police stations.
This analysis provides insights into response times and areas where increased law enforcement presence may be needed.
Data Segmentation:
Segment the data into training and testing datasets.
Consider the population distribution to ensure that the models are representative and capable of making accurate predictions or classifications.
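One way to sketch the segmentation step is a stratified 80/20 split with pandas, sampling within each area so both sets keep the population distribution; the ‘neighborhood’ column and its values below are hypothetical:

```python
import pandas as pd

# Invented incident records tagged by a hypothetical neighborhood column.
df = pd.DataFrame({
    "neighborhood": ["Allston"] * 10 + ["Back Bay"] * 10,
    "incident_id": range(20),
})

# 80% of each neighborhood goes to training, the rest to testing, so the
# split mirrors the population distribution across areas.
train = df.groupby("neighborhood", group_keys=False).sample(frac=0.8, random_state=42)
test = df.drop(train.index)

print(len(train), "training rows,", len(test), "testing rows")
```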