Using Natural Language Processing to Find Instances of Police Brutality In The United States

The Journey

Human Rights First is an independent organization pushing for total human rights and equality within the United States through political brainstorming, creative policy development, various campaigns, data gathering and research, and mass education. These human rights issues are especially important in today’s world, where inequality and injustice continue to run rampant. You can find more information and ways to help the Human Rights First organization on their website, Human Rights First.

In a team consisting of one team project lead, five web developers, and three data scientists, we worked on a highly specific subdomain of the Human…


Introduction

Welcome back, reader, to the next installment of my Python to C++ series! Last week I gave a brief introduction to the series and went over some of the basic similarities and differences between the Python and C++ languages. I ended things by creating a Hello World program in each language and explaining the differences in the code line by line.

In this week’s installment I will be going over the differences and similarities between the two languages in general output, data types, variables, and collecting input from the user. Let our journey continue!

Printing and Output

Printing output to the user is one…


An Introduction

As an aspiring Data Scientist, I got my start in the great world of Computer Science by learning the programming language Python. This is a highly user-friendly language with a number of great libraries for program creation, data gathering, data analysis, etc. From there I chose to expand my knowledge by delving into the world of C programming, specifically C++. After having done some research on programming languages I decided upon C++ because of its widespread usability within the data science community and beyond. …


Hello, brave readers, and welcome to a new series of mine called “Tackling Kaggle Tasks”. In this series I will be exploring the vast ocean of data that Kaggle has to offer and completing the various tasks that each dataset owner puts forth for their submitted dataset. For this edition we will be taking on a dataset called “Solar Power Generation Data” submitted by one Ani Kannal. The dataset includes data records from two solar panel plant sites in India over a 34-day period. There are a total of 4 data files, 2 files for each plant site. One…


Picture courtesy of Kaggle.com

Hello and welcome back! This will be my final installment using Kaggle’s ever-popular Titanic dataset. Let’s hope we can make the best of it. For this installment I decided to use the features and preprocessing I used in the first Titanic dataset installment. For a refresher, my dataset looks like so:


We are back at it again, this time seeing if we can get a closer look at some of the data and possibly engineer some features that will increase our accuracy score. For this installment I began by dropping the same 4 columns as before, “PassengerId”, “Name”, “Ticket”, and “Embarked”, as I still found little value in them. For a refresher, my dataframe looks like so:

A look at the first five rows of my dataframe
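For readers following along, the column drop described above might be sketched as follows. This is only an illustration and not necessarily the exact code used in the series: it assumes pandas, and the tiny `df` built here is a made-up stand-in for the real Titanic training data, with placeholder values.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training data.
# The column names match the dataset; the values are placeholders.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["T1", "T2", "T3"],
    "Fare": [7.25, 71.28, 7.92],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", "S"],
})

# The four columns the post describes dropping.
columns_to_drop = ["PassengerId", "Name", "Ticket", "Embarked"]

# drop() returns a new dataframe; the original df is left untouched.
trimmed = df.drop(columns=columns_to_drop)

# Show the first five rows (here, all three placeholder rows).
print(trimmed.head())
```

Because `drop` returns a copy rather than modifying `df` in place, the original columns remain available if any of them turn out to be useful later.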

From the last exploration we determined that 75% of the values in the “Cabin” column contained NaN (empty) values. I wanted to get a closer look at why this might be, and if it perhaps…


I began my journey where many others began theirs: testing out the limits of Kaggle notebooks using the ever-popular Titanic dataset. This dataset includes 11 base attributes, and we have to decide how useful each one is in predicting a passenger’s survival on the infamous Titanic voyage.

The attributes provided to us include PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked. This list and a quick glance at the first five rows of data can be seen in the picture below.

Attribute/column names and their first five rows


The coveted grandmaster status

In this series I will document my journey, from start to grand finale, to becoming a grandmaster on Kaggle.com. This will be no easy feat. Indeed, I anticipate both sweat and tears, but this arduous accolade will be well worth it in the end. Wish me the best of luck, though I assure you no one is wishing louder than myself.

Daniel Benson

I am a Data Scientist and writer prone to excitement and passion. I look forward to a future in which I am able to focus those characteristics into work I love.
