Assignment 6

In this assignment, you'll analyze a collection of data sets from the San Francisco Open Data Portal and Zillow. The data sets have been stored in the SQLite database sf_data.sqlite, which you can download here. The database contains the following tables:

Table Description
crime Crime reports dating back to 2010.
mobile_food_locations List of all locations where mobile food vendors sell.
mobile_food_permits List of all mobile food vendor permits. More details here.
mobile_food_schedule Schedules for mobile food vendors.
noise Noise complaints dating back to August 2015.
parking List of all parking lots.
parks List of all parks.
schools List of all schools.
zillow Zillow rent and housing statistics dating back to 1996. More details here.

The mobile_food_ tables are explicitly connected through the locationid and permit columns. The other tables are not connected, but you may be able to connect them using dates, latitude/longitude, or postal codes.

Shapefiles for US postal codes are available here. These may be useful for converting latitude/longitude to postal codes.

Shapefiles for San Francisco Neighborhoods are available here.

Exercise 1.1. Which mobile food vendor(s) sells at the most locations?

In [1]:
import sqlite3 as sql
In [2]:
db = sql.connect("sf_data.sqlite")
In [3]:
cursor = db.execute("SELECT * FROM sqlite_master")
In [4]:
cursor.fetchall()
Out[4]:
[(u'table',
  u'crime',
  u'crime',
  2,
  u'CREATE TABLE "crime" (\n"IncidntNum" INTEGER,\n  "Category" TEXT,\n  "Descript" TEXT,\n  "DayOfWeek" TEXT,\n  "Datetime" TIMESTAMP,\n  "PdDistrict" TEXT,\n  "Resolution" TEXT,\n  "Address" TEXT,\n  "Lon" REAL,\n  "Lat" REAL\n)'),
 (u'table',
  u'noise',
  u'noise',
  35775,
  u'CREATE TABLE "noise" (\n"CaseID" INTEGER,\n  "Type" TEXT,\n  "Address" TEXT,\n  "Neighborhood" TEXT,\n  "Datetime" TIMESTAMP,\n  "Lat" REAL,\n  "Lon" REAL\n)'),
 (u'table',
  u'parking',
  u'parking',
  35921,
  u'CREATE TABLE "parking" (\n"Owner" TEXT,\n  "Address" TEXT,\n  "PrimeType" TEXT,\n  "SecondType" TEXT,\n  "GarOrLot" TEXT,\n  "RegCap" INTEGER,\n  "ValetCap" INTEGER,\n  "MCCap" INTEGER,\n  "LandType" TEXT,\n  "Lat" REAL,\n  "Lon" REAL\n)'),
 (u'table',
  u'schools',
  u'schools',
  35944,
  u'CREATE TABLE "schools" (\n"Name" TEXT,\n  "Entity" TEXT,\n  "LowerGrade" INTEGER,\n  "UpperGrade" INTEGER,\n  "GradeRange" TEXT,\n  "Category" TEXT,\n  "LowerAge" INTEGER,\n  "UpperAge" INTEGER,\n  "GeneralType" TEXT,\n  "Address" TEXT,\n  "Lat" REAL,\n  "Lon" REAL\n)'),
 (u'table',
  u'parks',
  u'parks',
  35961,
  u'CREATE TABLE "parks" (\n"Name" TEXT,\n  "Type" TEXT,\n  "Acreage" REAL,\n  "ParkID" INTEGER,\n  "Lat" REAL,\n  "Lon" REAL\n)'),
 (u'table',
  u'zillow',
  u'zillow',
  35967,
  u'CREATE TABLE "zillow" (\n"RegionName" INTEGER,\n  "Date" TIMESTAMP,\n  "ZriPerSqft_AllHomes" REAL,\n  "MedianSoldPricePerSqft_AllHomes" REAL,\n  "PriceToRentRatio_AllHomes" REAL,\n  "Turnover_AllHomes" REAL\n)'),
 (u'table',
  u'mobile_food_permits',
  u'mobile_food_permits',
  36050,
  u'CREATE TABLE "mobile_food_permits" (\n"permit" TEXT,\n  "Status" TEXT,\n  "Applicant" TEXT,\n  "FacilityType" TEXT,\n  "FoodItems" TEXT,\n  "PriorPermit" INTEGER,\n  "Approved" TIMESTAMP,\n  "Expiration" TIMESTAMP\n)'),
 (u'table',
  u'mobile_food_locations',
  u'mobile_food_locations',
  36060,
  u'CREATE TABLE "mobile_food_locations" (\n"locationid" INTEGER,\n  "LocationDescription" TEXT,\n  "Address" TEXT,\n  "Latitude" REAL,\n  "Longitude" REAL\n)'),
 (u'table',
  u'mobile_food_schedule',
  u'mobile_food_schedule',
  36079,
  u'CREATE TABLE "mobile_food_schedule" (\n"locationid" INTEGER,\n  "permit" TEXT,\n  "DayOfWeek" TEXT,\n  "EndHour" INTEGER,\n  "StartHour" INTEGER\n)')]
In [5]:
import pandas as pd
In [6]:
pd.read_sql("SELECT * FROM sqlite_master", db)
Out[6]:
type name tbl_name rootpage sql
0 table crime crime 2 CREATE TABLE "crime" (\n"IncidntNum" INTEGER,\...
1 table noise noise 35775 CREATE TABLE "noise" (\n"CaseID" INTEGER,\n "...
2 table parking parking 35921 CREATE TABLE "parking" (\n"Owner" TEXT,\n "Ad...
3 table schools schools 35944 CREATE TABLE "schools" (\n"Name" TEXT,\n "Ent...
4 table parks parks 35961 CREATE TABLE "parks" (\n"Name" TEXT,\n "Type"...
5 table zillow zillow 35967 CREATE TABLE "zillow" (\n"RegionName" INTEGER,...
6 table mobile_food_permits mobile_food_permits 36050 CREATE TABLE "mobile_food_permits" (\n"permit"...
7 table mobile_food_locations mobile_food_locations 36060 CREATE TABLE "mobile_food_locations" (\n"locat...
8 table mobile_food_schedule mobile_food_schedule 36079 CREATE TABLE "mobile_food_schedule" (\n"locati...
In [7]:
mob_loc = pd.read_sql("SELECT * FROM mobile_food_locations", db)
mob_per = pd.read_sql("SELECT * FROM mobile_food_permits", db)
mob_sch = pd.read_sql("SELECT * FROM mobile_food_schedule", db)
In [8]:
mob_loc.head()
Out[8]:
locationid LocationDescription Address Latitude Longitude
0 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532
1 437211 MISSOURI ST: 20TH ST to SIERRA ST (500 - 630) 555 MISSOURI ST 37.759304 -122.395902
2 765880 17TH ST: SAN BRUNO AVE to UTAH ST (2200 - 2299) 2222 17TH ST 37.765216 -122.406007
3 765690 CESAR CHAVEZ ST: EVANS AVE to KANSAS ST (2100 ... 2323 CESAR CHAVEZ ST 37.748801 -122.400552
4 751253 FOLSOM ST: 14TH ST to 15TH ST (1800 - 1899) 1800 FOLSOM ST 37.767852 -122.416105
In [9]:
mob_per.head()
Out[9]:
permit Status Applicant FacilityType FoodItems PriorPermit Approved Expiration
0 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
1 13MFF-0102 EXPIRED Natan's Catering Truck Burgers: melts: hot dogs: burritos:sandwiches:... 1 2013-04-12 12:00:00 2014-03-15 12:00:00
2 16MFF-0069 APPROVED Munch A Bunch Truck Cold Truck: packaged sandwiches: pitas: breakf... 1 2016-03-15 12:00:00 2017-03-15 12:00:00
3 16MFF-0051 APPROVED Park's Catering Truck Cold Truck: Hamburger: cheeseburgers: hot dogs... 1 2016-03-15 12:00:00 2017-03-15 12:00:00
4 16MFF-0010 REQUESTED Pipo's Grill Truck Tacos: Burritos: Hot Dogs: and Hamburgers 0 None None
In [10]:
mob_sch.head()
Out[10]:
locationid permit DayOfWeek EndHour StartHour
0 305727 11MFF-0040 Mo 15 10
1 305727 11MFF-0040 Tu 15 10
2 305727 11MFF-0040 We 15 10
3 305727 11MFF-0040 Th 15 10
4 305727 11MFF-0040 Fr 15 10
In [11]:
print len(mob_loc.index), len(mob_sch.index), len(mob_per.index)
754 3592 195
In [12]:
newdf = pd.read_sql("SELECT * FROM mobile_food_locations INNER JOIN mobile_food_schedule ON mobile_food_locations.locationID = mobile_food_schedule.locationID INNER JOIN mobile_food_permits ON mobile_food_permits.permit = mobile_food_schedule.permit", db)
In [13]:
newdf.head()
Out[13]:
locationid LocationDescription Address Latitude Longitude locationid permit DayOfWeek EndHour StartHour permit Status Applicant FacilityType FoodItems PriorPermit Approved Expiration
0 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532 762182 16MFF-0027 Fr 11 10 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
1 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532 762182 16MFF-0027 Mo 11 10 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
2 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532 762182 16MFF-0027 Th 11 10 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
3 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532 762182 16MFF-0027 Tu 11 10 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
4 762182 TOWNSEND ST: 05TH ST to 06TH ST (400 - 499) 444 TOWNSEND ST 37.774871 -122.398532 762182 16MFF-0027 We 11 10 16MFF-0027 APPROVED F & C Catering Truck Cold Truck: Hot/Cold Sandwiches: Water: Soda: ... 1 2016-03-09 12:00:00 2017-03-15 12:00:00
In [14]:
# Gets most frequent vendor
newdf['Applicant'].value_counts().idxmax
Out[14]:
<bound method Series.idxmax of May Catering                                                               340
Natan's Catering                                                           221
Anas Goodies Catering                                                      212
Liang Bai Ping                                                             182
Mang Hang Catering                                                         165
Park's Catering                                                            134
Steve's Mobile Deli                                                        125
John's Catering #5                                                         120
Singh Brothers Ice Cream                                                   105
Sun Rise Catering                                                          100
Two G's Catering                                                            98
M M Catering                                                                96
Quan Catering                                                               95
D & T Catering                                                              85
Mini Mobile Food Catering                                                   85
Mike's Catering                                                             83
F & C Catering                                                              79
Bach Catering                                                               60
Munch A Bunch                                                               52
San Francisco Carts & Concessions, Inc. DBA Stanley's Steamers Hot Dogs     42
Kara's Cupcakes                                                             41
BH & MT LLC                                                                 40
Halal Cart, LLC                                                             37
Linda's Catering                                                            35
The New York Frankfurter Co. of CA, Inc. DBA: Annie's Hot Dogs              35
San Francisco Street Foods, Inc.                                            35
Tacos Rodriguez                                                             34
Golden Catering                                                             31
Bombay Blvd.                                                                28
Tacos El Primo                                                              23
                                                                          ... 
Slider Shack, LLC.                                                           5
Smokin Warehouse Barbecue                                                    5
Tacos El Flaco                                                               5
Eli's Hot Dogs                                                               5
Mr. Nice, LLC                                                                5
Bonito Poke                                                                  4
Halal Cart of San Francisco                                                  4
Hiyaaa                                                                       4
Peruchi Food Truck,LLC                                                       3
Loma Linda                                                                   3
A Taste of Chicago                                                           3
Let's Be Frank                                                               3
El Calamar Perubian Food Truck                                               3
Alfaro's Truck                                                               3
Ruru Soul                                                                    3
Breaking Bread Inc.                                                          3
The Huge Hotdog Concession                                                   3
Rita's Catering                                                              3
Golden Gate Halal Food                                                       3
Missing Link SF                                                              3
Kiss Point                                                                   3
Viet Duong Co., Ltd.                                                         3
Geary Cocina                                                                 3
Raspados El Yocateco                                                         3
The Brooklyn Pig Inc.                                                        3
Mob Dog                                                                      3
Mario's Colombian and Mexican Food                                           3
Fruteria Serrano                                                             3
Buenafe                                                                      3
Munch India                                                                  2
Name: Applicant, dtype: int64>

May Catering sells at the most locations.

Exercise 1.2. Ask and use the database to analyze 5 questions about San Francisco. For each question, write at least 150 words and support your answer with plots. Make a map for at least 2 of the 5 questions.

You should try to come up with some questions on your own, but these are examples of reasonable questions:

  • Which parts of the city are the most and least expensive?
  • Which parts of the city are the most dangerous (and at what times)?
  • Are noise complaints and mobile food vendors related?
  • What are the best times and places to find food trucks?
  • Is there a relationship between housing prices and any of the other tables?

Please make sure to clearly state each of your questions in your submission.

In [15]:
from mpl_toolkits.basemap import Basemap
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

Which types of schools does San Francisco have most of and least of?

In [16]:
schools = pd.read_sql("SELECT * FROM schools", db)
In [17]:
schools.head()
Out[17]:
Name Entity LowerGrade UpperGrade GradeRange Category LowerAge UpperAge GeneralType Address Lat Lon
0 Alamo Elementary School SFUSD 0 5 K-5 USD Grades K-5 5 10 PS 250 23RD AVE, San Francisco, CA 94121 37.783005 -122.482300
1 Alvarado Elementary School SFUSD 0 5 K-5 USD Grades K-5 5 10 PS 625 DOUGLASS ST, San Francisco, CA 94114 37.753681 -122.438194
2 Aptos Middle School SFUSD 6 8 6-8 USD Grades 6-8 11 13 PS 105 APTOS AVE, San Francisco, CA 94127 37.729672 -122.465782
3 Argonne Early Education School SFUSD -2 0 PK-TK USD PreK/TK 3 5 PS 750 16TH AVE, San Francisco, CA 94118 37.773968 -122.474060
4 Argonne Elementary School SFUSD 0 5 K-5 USD Grades K-5 5 10 PS 680 18TH AVE, San Francisco, CA 94121 37.775307 -122.476311
In [18]:
schools['GradeRange'].value_counts().idxmax
Out[18]:
<bound method Series.idxmax of PK       165
K-8       65
K-5       51
9-12      36
PK-5      26
6-8       16
PK-K      13
K-12      11
13-14     10
PK-8       8
6-12       6
PK-4       6
8-12       4
TK-8       3
INF        3
3-8        2
TK-2       2
K-6        2
7-12       2
PK-TK      2
5-12       1
4-8        1
1-6        1
K-1        1
K-4        1
K-2        1
PK-12      1
1-4        1
PK-6       1
5-8        1
2-12       1
1-8        1
Name: GradeRange, dtype: int64>

A majority of schools in San Francisco cover the K-8 to K-5 range so most of them are elementary schools. The schools with a unique grade range (not the usual elementary, middle, and high school ranges) are in the minority and not common.

Which areas in San Francisco are the best areas to find parking?

In [19]:
parking = pd.read_sql("SELECT * FROM parking", db)
In [20]:
parking.head()
Out[20]:
Owner Address PrimeType SecondType GarOrLot RegCap ValetCap MCCap LandType Lat Lon
0 Private 2110 Market St PPA L 13 0 0 restaurant 37.767378 -122.429344
1 SFMTA 993 Potrero PPA L 34 0 0 37.757272 -122.406320
2 Port of SF 601 Terry A Francois Blvd PPA L 72 0 0 37.770135 -122.385801
3 Private 11 SOUTH VAN NESS PHO CPO G 130 0 0 37.774150 -122.418402
4 Private 101 CALIFORNIA ST PPA G 250 0 0 37.793243 -122.397495
In [21]:
lotspace = parking.loc[parking['RegCap'] > 500]
In [22]:
plt.figure(figsize=(15,10))
displaymap = Basemap(llcrnrlon=-122.5,llcrnrlat=37.7,urcrnrlon=-122.4,urcrnrlat=37.8, resolution = "h", projection ="merc")
<matplotlib.figure.Figure at 0xa72ab00>
In [23]:
# Convert longitudes and latitudes to list of floats
longitude = lotspace[['Lon']].values.tolist()
for i in range(0, len(longitude)):
    longitude[i] = float(longitude[i][0])
latitude = lotspace[['Lat']].values.tolist()
for i in range(0, len(latitude)):
    latitude[i] = float(latitude[i][0])
displaymap.drawmapboundary()
displaymap.drawcountries()
displaymap.drawcoastlines()
lons,lats = displaymap(longitude, latitude)
displaymap.plot(lons, lats, 'bo', color = "blue")
C:\Users\Apus\Anaconda2\lib\site-packages\mpl_toolkits\basemap\__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
C:\Users\Apus\Anaconda2\lib\site-packages\mpl_toolkits\basemap\__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
C:\Users\Apus\Anaconda2\lib\site-packages\mpl_toolkits\basemap\__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)
Out[23]:
[<matplotlib.lines.Line2D at 0x8edcf98>]

I defined the best parking areas to be spaces where the RegCap which is the maximum spaces allowed for parking to be greater than 500. From the map of San Francisco, it looks like most of those parking spaces are in the northeastern area of San Francisco.

Which areas in San Francisco have the most noise? Which type of noise causes the most noise and which areas in San Francisco have the most type of that noise?

In [24]:
noise = pd.read_sql("SELECT * FROM noise", db)
In [25]:
noise.head()
Out[25]:
CaseID Type Address Neighborhood Datetime Lat Lon
0 5130305 other_excessive_noise Not associated with a specific address None 2015-09-25 00:29:33 NaN NaN
1 6446138 amplified_sound_electronics Intersection of CLINTON PARK and GUERRERO ST Mission Dolores 2016-10-21 13:20:18 37.769148 -122.424475
2 5929789 other_excessive_noise 1301 48TH AVE, SAN FRANCISCO, CA, 94122 Outer Sunset 2016-06-01 11:38:59 37.762047 -122.508365
3 5659943 other_excessive_noise 736 LEAVENWORTH ST, SAN FRANCISCO, CA, 94109 Lower Nob Hill 2016-03-13 01:18:27 37.787983 -122.414943
4 5930704 other_excessive_noise 1360 43RD AVE, SAN FRANCISCO, CA, 94122 Outer Sunset 2016-06-01 13:58:43 37.761385 -122.502232
In [26]:
# deletes all N/A locations
noiseknown = noise.loc[np.isnan(noise['Lat']) == False]
In [32]:
plt.figure(figsize=(15,10))
displaymap = Basemap(llcrnrlon=-122.5,llcrnrlat=37.7,urcrnrlon=-122.4,urcrnrlat=37.8, resolution = "h", projection ="merc")
<matplotlib.figure.Figure at 0xd18f550>
In [33]:
# Convert longitudes and latitudes to list of floats
longitude = noiseknown[['Lon']].values.tolist()
for i in range(0, len(longitude)):
    longitude[i] = float(longitude[i][0])
latitude = noiseknown[['Lat']].values.tolist()
for i in range(0, len(latitude)):
    latitude[i] = float(latitude[i][0])
displaymap.drawmapboundary()
displaymap.drawcountries()
displaymap.drawcoastlines()
lons,lats = displaymap(longitude, latitude)
displaymap.plot(lons, lats, 'bo', color = "blue")
Out[33]:
[<matplotlib.lines.Line2D at 0xb15fcf8>]

From the map, it looks like most parts of San Francisco have noise but the eastern part seems to have more noise than the western part of San Francisco.

In [29]:
noise['Type'].value_counts().idxmax
Out[29]:
<bound method Series.idxmax of other_excessive_noise            1222
construction_private_property     795
amplified_sound_electronics       499
construction_public               386
mechanical_equipment              367
entertainment                     331
garbage_recycling_collection      261
traffic                           206
other_reallocate                  200
major_event_venue                 143
delivery_service_business          92
vehicle_car_alarm                  72
sirens                             41
tour_bus                           31
home_improvement_maintenance       30
delivery_service_vehicle           28
construction_emergency             27
vehicle_repair                     14
protest_speech_bullhorns           12
delivery_service_construction       9
emergency_equipment                 9
public_speech                       9
mobile_food_facility                7
hospital                            4
Name: Type, dtype: int64>
In [30]:
noisecon = noiseknown[noiseknown['Type'] == 'construction_private_property']
In [31]:
plt.figure(figsize=(15,10))
displaymap = Basemap(llcrnrlon=-122.5,llcrnrlat=37.7,urcrnrlon=-122.4,urcrnrlat=37.8, resolution = "h", projection ="merc")
# Convert longitudes and latitudes to list of floats
longitude = noisecon[['Lon']].values.tolist()
for i in range(0, len(longitude)):
    longitude[i] = float(longitude[i][0])
latitude = noisecon[['Lat']].values.tolist()
for i in range(0, len(latitude)):
    latitude[i] = float(latitude[i][0])
displaymap.drawmapboundary()
displaymap.drawcountries()
displaymap.drawcoastlines()
lons,lats = displaymap(longitude, latitude)
displaymap.plot(lons, lats, 'bo', color = "blue")
Out[31]:
[<matplotlib.lines.Line2D at 0xb15f2e8>]

Aside from other types of excessive noise which isn't in a specific category, the factor that causes the most noise is construction on private property. The points for construction noise is spread throughout the map of San Francisco but most of it is gathered around in the north eastern area of San Francisco.

Which parts of the city are the least expensive and which are the most expensive?

In [34]:
zillow = pd.read_sql("SELECT * FROM zillow", db)
In [37]:
zillow.head()
Out[37]:
RegionName Date ZriPerSqft_AllHomes MedianSoldPricePerSqft_AllHomes PriceToRentRatio_AllHomes Turnover_AllHomes
0 94109 2010-11-01 00:00:00 3.156 675.1913 19.14 6.0771
1 94110 2010-11-01 00:00:00 2.566 599.6785 18.10 5.4490
2 94122 2010-11-01 00:00:00 2.168 495.4432 17.99 2.4198
3 94080 2010-11-01 00:00:00 1.666 369.5538 18.31 3.8757
4 94112 2010-11-01 00:00:00 2.322 422.4538 14.44 3.1288
In [42]:
priceknown = zillow.loc[np.isnan(zillow['MedianSoldPricePerSqft_AllHomes']) == False]
In [50]:
maxprice = priceknown['MedianSoldPricePerSqft_AllHomes'].max()
expensive = priceknown.loc[priceknown['MedianSoldPricePerSqft_AllHomes'] == maxprice]
minprice = priceknown['MedianSoldPricePerSqft_AllHomes'].min()
cheap = priceknown.loc[priceknown['MedianSoldPricePerSqft_AllHomes'] == minprice]
print expensive.iloc[0]['RegionName'], cheap.iloc[0]['RegionName']
94104 94124

I based my answer on median sold price per square feet for all homes in one region. The most expensive houses are in the 94104 region and the cheapest houses are in the 94124 region.

Which types of violent crimes occur the most in San Francisco and which parts do they occur in the most?

In [52]:
crime = pd.read_sql("SELECT * FROM crime", db)
In [53]:
crime.head()
Out[53]:
IncidntNum Category Descript DayOfWeek Datetime PdDistrict Resolution Address Lon Lat
0 150060275 NON-CRIMINAL LOST PROPERTY Monday 2015-01-19 14:00:00 MISSION NONE 18TH ST / VALENCIA ST -122.421582 37.761701
1 150098210 ROBBERY ROBBERY, BODILY FORCE Sunday 2015-02-01 15:45:00 TENDERLOIN NONE 300 Block of LEAVENWORTH ST -122.414406 37.784191
2 150098210 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 2015-02-01 15:45:00 TENDERLOIN NONE 300 Block of LEAVENWORTH ST -122.414406 37.784191
3 150098210 SECONDARY CODES DOMESTIC VIOLENCE Sunday 2015-02-01 15:45:00 TENDERLOIN NONE 300 Block of LEAVENWORTH ST -122.414406 37.784191
4 150098226 VANDALISM MALICIOUS MISCHIEF, VANDALISM OF VEHICLES Tuesday 2015-01-27 19:00:00 NORTHERN NONE LOMBARD ST / LAGUNA ST -122.431119 37.800469
In [58]:
crime['Category'].value_counts().idxmax
Out[58]:
<bound method Series.idxmax of LARCENY/THEFT                  243144
OTHER OFFENSES                 141431
NON-CRIMINAL                   125702
ASSAULT                         90029
VANDALISM                       54467
WARRANTS                        46061
VEHICLE THEFT                   43685
DRUG/NARCOTIC                   43674
SUSPICIOUS OCC                  40894
BURGLARY                        40851
MISSING PERSON                  32261
ROBBERY                         25706
FRAUD                           19848
SECONDARY CODES                 13499
WEAPON LAWS                     10767
TRESPASS                         9188
STOLEN PROPERTY                  6658
FORGERY/COUNTERFEITING           5971
SEX OFFENSES, FORCIBLE           5803
PROSTITUTION                     5281
DRUNKENNESS                      4658
RECOVERED VEHICLE                4510
DISORDERLY CONDUCT               4358
DRIVING UNDER THE INFLUENCE      3017
KIDNAPPING                       2726
ARSON                            1771
RUNAWAY                          1677
LIQUOR LAWS                      1547
EMBEZZLEMENT                     1166
LOITERING                         606
SUICIDE                           544
FAMILY OFFENSES                   517
BRIBERY                           445
EXTORTION                         303
BAD CHECKS                        281
SEX OFFENSES, NON FORCIBLE        191
GAMBLING                          126
PORNOGRAPHY/OBSCENE MAT            25
TREA                               13
Name: Category, dtype: int64>
In [60]:
assault = crime[crime['Category'] == 'ASSAULT']
In [61]:
assault.head()
Out[61]:
IncidntNum Category Descript DayOfWeek Datetime PdDistrict Resolution Address Lon Lat
2 150098210 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 2015-02-01 15:45:00 TENDERLOIN NONE 300 Block of LEAVENWORTH ST -122.414406 37.784191
22 150098414 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 2015-02-01 17:05:00 CENTRAL ARREST, BOOKED PACIFIC AV / GRANT AV -122.406832 37.796903
23 150098414 ASSAULT BATTERY WITH SERIOUS INJURIES Sunday 2015-02-01 17:05:00 CENTRAL ARREST, BOOKED PACIFIC AV / GRANT AV -122.406832 37.796903
25 150098420 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 2015-02-01 17:10:00 TENDERLOIN ARREST, BOOKED 400 Block of ELLIS ST -122.413609 37.784697
29 150098458 ASSAULT BATTERY OF A POLICE OFFICER Sunday 2015-02-01 16:56:00 MISSION ARREST, BOOKED 2000 Block of MISSION ST -122.419520 37.764229
In [63]:
assault['PdDistrict'].value_counts().idxmax
Out[63]:
<bound method Series.idxmax of SOUTHERN      15114
MISSION       13580
BAYVIEW       11631
INGLESIDE      9844
NORTHERN       9479
TENDERLOIN     8764
CENTRAL        8257
TARAVAL        6180
PARK           3745
RICHMOND       3435
Name: PdDistrict, dtype: int64>

I defined violent crimes as those that cause people bodily harm and found that assault is the most common of those crimes. The Southern, Mission and Bayview district have the most number of these types of crimes.

In [64]:
db.close()