Identifying the use of Police Force in the United States

Santiago Berniz
6 min read · Nov 17, 2020

I recently had the opportunity to work as a Data Scientist on a team of Data Scientists and Web Developers for the Human Rights First organization. Working with this organization presented me with both technical and non-technical challenges.

As the About section of its website states, Human Rights First challenges the United States to live up to its ideals and presses governments and companies to respect human rights and the rule of law. What is most interesting about this organization is that when those efforts fail, it demands reform, accountability, and justice. In other words, the organization works to make sure human rights are respected and pushes for reforms when necessary.

The product we worked on identifies reports of police use of force from online sources, checks their credibility, classifies them into different categories of use of force, and publishes them on a map for people to see. To keep the classification as objective as possible, we followed the National Institute of Justice's use-of-force continuum.
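To give a sense of what continuum-based classification could look like, here is a minimal keyword-matching sketch. The category names mirror the database columns used later in the project, but the keywords and the matching logic are purely illustrative assumptions, not the project's actual classifier.

```python
# Hypothetical keyword lists per use-of-force continuum category.
FORCE_CATEGORIES = {
    "verbalization": ["verbal warning", "ordered", "commanded"],
    "empty_hand_soft": ["grabbed", "held", "restrained"],
    "empty_hand_hard": ["punched", "kicked", "tackled"],
    "less_lethal_methods": ["tear gas", "taser", "baton", "rubber bullet"],
    "lethal_force": ["shot", "firearm", "fatal"],
}

def classify_use_of_force(description: str) -> str:
    """Return the most severe matching category, or 'uncategorized'."""
    text = description.lower()
    # Check from most to least severe so the strongest match wins.
    for category in ["lethal_force", "less_lethal_methods",
                     "empty_hand_hard", "empty_hand_soft", "verbalization"]:
        if any(keyword in text for keyword in FORCE_CATEGORIES[category]):
            return category
    return "uncategorized"
```

A real classifier would need far richer signals than keywords, but checking categories from most to least severe keeps the output a single, unambiguous label even when an incident description matches several categories.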

The Data Science team on this project was in charge of gathering, cleaning, and classifying the data. By gathering the data and providing it to the public, we help people become aware of the current state of police use of force. I am a firm believer that information helps raise consciousness about an issue; by providing this information through the project, more people will be able to see the problem and support reforms, furthering the organization's mission of protecting human rights.

Working on this project was a great experience, but I still had fears and concerns going in. Most were technical concerns that I might not be able to deliver the features I wanted to implement, but fortunately we were able to. Also, since this is a controversial topic, I worried about how other people would react to me working on it. Luckily, everyone supported me.

Breaking the roadmap into individual tasks was not easy. For the Data Science team, the roadmap was to extract, transform, and load the data. Since a previous team had worked on the project, we already had existing data in a CSV file. For better API performance, we decided to move to a database. We broke the roadmap down by first curating the data we already had, then adding new data on top of it, and finally serving the combined data through the API for the web team to use.

To do this, we divided the work into the following tasks: create a database, clean the existing CSV data, import the current data into the database, and finally write a code snippet that automatically adds new data to the database as it becomes available. The Trello card below displays an example of some of the tasks I worked on.
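The "clean existing CSV data" step can be sketched roughly as follows. The sample columns and cleaning rules here are assumptions for illustration (the real data had many more columns and messier issues); the sketch only shows the general shape of stripping whitespace and dropping exact duplicate records before import.

```python
import csv
import io

# Tiny stand-in for the project's CSV; column names are hypothetical.
RAW_CSV = """case_id,city,state,description
mn-001,Minneapolis,MN,Officers fired tear gas
mn-001,Minneapolis,MN,Officers fired tear gas
ny-002,New York,NY,  Protester pushed to the ground  
"""

def clean_rows(raw_text):
    """Strip stray whitespace from every field and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(record.values())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned
```

Deduplicating before the database import keeps the insert step simple: each cleaned record maps directly to one row.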

The Trello card above is significant because it covered the most important task: transferring the CSV file into a database. Since I was the strongest of my teammates with database queries, we unanimously decided I would take on creating the database, adding records, and deploying it, so I was in charge of most of that Trello card, with help from teammates on classifying and cleaning the data.

The Technical Challenges

Working on a project that previous teams have built presents more technical challenges than starting from scratch. When starting from scratch, there are the usual challenges of code not working the way you expect, and when your approach is new, it is hard to implement precisely because it has never been done before. But since every programmer has their own style of turning a solution into code, a project inherited from other teams is harder to interpret: almost the entire codebase needs to be read to understand what the application is doing. Luckily, although this part was time-consuming, the code was easy to understand and we were able to make the necessary changes.

One challenging feature was providing latitude/longitude coordinates on the API. The data was originally a single column called geolocation, and we needed to split its values into lat/long. To make things harder, the values in that column alternated between plain string format and JSON format, so that was a challenging feature to implement.
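Splitting a mixed-format column like that might look roughly like the sketch below. The exact string and JSON shapes are assumptions for illustration (the real column's formats may have differed); the idea is to try the JSON parse first and fall back to string splitting.

```python
import json

def split_geolocation(value):
    """Return (lat, long) floats from either a JSON object like
    '{"lat": "44.97", "long": "-93.26"}' or a plain string like
    '(44.97, -93.26)'. Both formats are assumed for illustration."""
    try:
        parsed = json.loads(value)
        if isinstance(parsed, dict):
            return float(parsed["lat"]), float(parsed["long"])
    except (ValueError, KeyError):
        pass  # Not valid JSON (or missing keys); treat it as a plain string.
    lat, long_ = value.strip("() ").split(",")
    return float(lat), float(long_)
```

With a helper like this, the two new lat/long columns can be filled in one pass over the old geolocation column regardless of which format each row uses.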

The biggest technical challenge was incorporating the SQL query into Python. The main reason is that Python does not automatically escape the commas and quotes inside the text fields, so I had to pass the entire query, with the values as parameters, to psycopg2's execute command and let the driver handle the quoting. That made the code in this section a bit longer than I expected, but it was the only way that worked for us. As you can see below, this section does not comply with PEP 8, but I decided to leave it as is since it works.

pg_curs.execute(
    """INSERT INTO table(dates, added_on, links, case_id, city, state,
                         lat, long, title, description, tags, verbalization,
                         empty_hand_soft, empty_hand_hard,
                         less_lethal_methods, lethal_force, uncategorized)
       VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);""",
    (row[0], current_dt, str(row[1]), str(row[2]), str(row[3]), str(row[4]),
     row[5], row[6], str(row[7]), str(row[8]), row[9], row[10], row[11],
     row[12], row[13], row[14], row[15]),
)
print(counter)
counter += 1

Most of the challenges my team faced were technical, although there weren't many, since we had a good team in my opinion. When a teammate was stuck, I contributed by helping them understand the problem and working through it with them. When we were both stuck, I researched the issue and implemented the findings after understanding the concept.

An Ongoing project with great potential

The project is currently in great condition thanks to the combined work of the previous team and the current team. The currently shipped features are as follows.

DS Team:

  • Gathers data from the API
  • Cleans and classifies the data
  • Inserts the data into the database
  • Serves the data to the web team in the form of a backend DS API

Web Team:

  • Gets data from the DS API
  • Processes the data
  • Displays the data on a map, showing the location of each incident as well as how many incidents occurred in a specific location or area
  • Shows different charts with information about all the incidents
  • Shows incident details in a dashboard
  • Saves incidents to the accounts of users who choose to save them

Below is a video with a small snippet of the user experience, showing our teamwork coming together.

As seen in the video, the project has good data. However, I believe more features can be added. For example, on the Data Science side, I would like to add data not just for the U.S. but worldwide, along with the ability to search incidents by city, state, ZIP code, or date.

Some technical challenges I foresee are cleaning and filtering the text so it is well suited for database fields, validating that the data comes from a credible source, and automating data verification.

To conclude, this project was a great experience. I learned new technical skills as well as other skills. Working in a team improved both my communication skills and my technical skills: my communication improved because teammates gave me feedback and told me I should communicate more, and my technical skills improved as a result of helping my teammates and finding answers to my own questions. This project furthers my career goals because it helped me realize which direction to take; I would like to use what I've learned at a non-profit, or at a company that uses data science to help people in need find the best ways to improve their lives.
