Helping Families with Data Science
The use of technology has skyrocketed in recent years, especially in Artificial Intelligence and Data Science. Many people are cautious about this new technology, and some are even afraid of it, because it can be used for malevolent purposes. However, Data Science, like any other discipline, is just a tool. It can be used for evil or for good. Plenty has already been written about the possible uses of Artificial Intelligence and Data Science, so here I will focus on the good side of Data Science, specifically on a project that my team and I worked on over the past month.
We recently worked on a project for a non-profit organization called Family Promise of Spokane, which helps families that are experiencing homelessness or are at risk of it. To accomplish this, they run several programs: rental assistance, emergency shelter for the whole family, the Interfaith Hospitality Network rotational shelter, and aftercare services for graduated families. Their website at Family Promise of Spokane has all the needed information on their programs and how to make use of their services.
Although they have a great website with great information, as shown in the screenshot above, they were using a paper intake package which, according to our stakeholder, took around two hours per family to complete. Our team set out to solve that problem by building a web application that digitizes the process. Having a digital profile for each guest also lets us make use of historical data (a CSV dataset) on previous guests and their exit destinations. The exit destinations are Permanent Exit (found permanent housing), Temporary Exit (temporary housing), Emergency Shelter (hotel or RHY-funded host home shelter), Transitional Housing (detox centers, Safe Haven, etc.), and Unknown (for guests whose status is unknown). A predictive model trained on this historical data predicts each current guest's most likely exit destination from the information in their profile. The advantage of this prediction is that case managers can adjust their strategy early when the predicted outcome is not a good one, so that guests find permanent housing as often as possible.
My main concern going into this project was the technical challenge of creating a predictive model. We were starting the project from scratch, which made the challenge even greater because all the planning had to be done with extra attention to detail. The concern grew after I saw the dataset we had to work with. It had many columns, and it had not been cleaned for predictive modeling. The Data Science team therefore had to do a lot of exploratory data analysis and data cleaning just to produce a base model that we could improve later. Given the size of the dataset, I was also worried that exploring and cleaning it would take so long that we would not have even a base model ready on time.
CONCERNS THAT BECAME REALITY (THE TECHNICAL CHALLENGES)
As mentioned above, one of my main concerns was that we would not be able to deploy a working predictive model. As the subtitle suggests, this became a reality, but only partly. We were able to release a working predictive model with good accuracy for the Web team to use. We also built a second model with better accuracy, at around 70%. However, some of the features that second model uses from our dataset either differ from, or do not exist in, the database the Web team created. Without those features the model cannot make a prediction, so the more accurate model could not be deployed to the front end. I discovered the issue when, as the designated Data Engineer, I tried to implement the new model and compared the dataset columns it uses against the columns of the database that the Web team uses to register guests' profiles. Although we were unable to resolve this because of time constraints, I wrote clear documentation that should make the implementation simple for the next team. Because of the length of the code, I will not include it here. Since this is an open-source project, anyone interested can find the documentation in the README.md file on the project's Data Science GitHub page, along with the list of contributors and the general project documentation.
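The column comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code: the function name `find_missing_features` and every column name below are hypothetical, and matching names case-insensitively is an assumption.

```python
def find_missing_features(model_features, database_columns):
    """Return the model features that have no matching database column.

    Names are compared case-insensitively after trimming whitespace
    (an assumption made for this sketch).
    """
    available = {name.strip().lower() for name in database_columns}
    return [
        feature
        for feature in model_features
        if feature.strip().lower() not in available
    ]


# Hypothetical column names, for illustration only.
model_features = ["age", "household_size", "income_at_entry", "length_of_stay"]
database_columns = ["Age", "household_size", "first_name", "last_name"]

missing = find_missing_features(model_features, database_columns)
print(missing)  # features the model needs but the database cannot supply
```

Running a check like this before training makes the gap visible early: any name in the returned list is a feature that either has to be added to the database or dropped from the model.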
THE BEGINNING OF THE END
This project presents a great opportunity for Family Promise of Spokane to better serve homeless families and families in need. The latest techniques in Data Science and statistics, combined with competent case managers, could be the beginning of a reduction in family homelessness, because programs can be adjusted according to both their results and the model's prediction for a family or individual. When an exit destination is predicted ahead of time, changes can be made to help ensure a positive outcome. The project is planned for four teams across four months. As the first team, we delivered a digital intake package, a CRM for case managers and supervisors to manage families and members, and a way for guests to check in and update their profiles. On the Data Science side, we delivered two working predictive models. As a Data Engineer, I connected the model to a Data Science API that the Web team can call to show the predicted exit destination to case managers or supervisors. I would have liked to show a video demo, but although the Data Science endpoint is connected to the Web team's application, the prediction is not yet displayed on the guest profile. Instead, below is an example of what the Web team will receive from the Data Science API. (To protect guests' privacy, the following values are just examples.)
{
  "identification": guest_identification,
  "strategy": "Permanent Exit",
  "features": {
    "highest feature": 0.33056536070162607,
    "second highest feature": 0.22338306612570102,
    "third highest feature": 0.1369112645381957
  }
}
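A response of this shape could be assembled roughly as follows. This is a minimal sketch, not the project's actual endpoint: it assumes the numbers under "features" are per-feature importance scores of which the three largest are returned, and the function name `build_prediction_response`, the guest ID, and the feature names are all hypothetical.

```python
import json


def build_prediction_response(guest_id, predicted_exit, feature_importances, top_n=3):
    """Build the response body: guest ID, predicted exit, and top features.

    `feature_importances` maps feature name to a score; the `top_n`
    highest-scoring features are kept, ordered from highest to lowest.
    """
    ranked = sorted(
        feature_importances.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return {
        "identification": guest_id,
        "strategy": predicted_exit,
        "features": dict(ranked[:top_n]),
    }


# Made-up values, for illustration only.
response = build_prediction_response(
    guest_id=1,
    predicted_exit="Permanent Exit",
    feature_importances={
        "feature_a": 0.05,
        "feature_b": 0.33,
        "feature_c": 0.22,
        "feature_d": 0.14,
    },
)
print(json.dumps(response, indent=2))
```

Returning only the top few features keeps the payload small while still telling a case manager which parts of a guest's profile weighed most in the prediction.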
Although I believe this project is a great step toward helping at least some of Spokane's homeless families, more can be done by both the Data Science and Web teams. As a Data Scientist, I think the main future features should be to improve the model's accuracy as far as possible beyond the current 70%, to include the probability of the predicted exit destination, and, if possible, to suggest to case managers what changes would lead to a different prediction when the predicted outcome does not help the family. Because of the dimensionality of the dataset, reaching high accuracy is a technical challenge that future teams will probably face, but I believe it can be overcome. This is the second non-profit organization I have worked with and the third time I have worked as a Data Engineer, so the project gives me more experience in that role and furthers the career goals I set from the beginning. Working with a non-profit organization also helps show my work ethic. I want to use Data Science to do good, especially now that there are so many concerns about Artificial Intelligence and Data Science.