Understanding college debt

I want to see what types of universities have students in debt. We will be using the college scorecard API from data.gov. I am interested in a number of variables, these are admissions rate, tuition, median household income, student size, etc.

It seems that the UC office of president claims that the UC schools are a good deal based on data from the college scorecard.

You should go begin by getting an API key for data.gov. There are many data.gov APIs that you can use this key for and they have fairly relaxed limits on the number of requests.

The documentation for the API can be found here...
https://github.com/RTICWDT/open-data-maker/blob/master/API.md
https://collegescorecard.ed.gov/data/

Let's import some packages.

In [1]:
from urllib2 import Request, urlopen
import pandas as pd
import json
from matplotlib import pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

Here are some variable names that we will be interested in.

In [2]:
varnames = """id
school.name
school.degrees_awarded.predominant
2014.admissions.admission_rate.overall
2014.cost.tuition.in_state
2014.cost.tuition.out_of_state
2014.aid.median_debt.completers.overall
2014.aid.median_debt.noncompleters
2005.student.demographics.median_hh_income
2005.student.demographics.poverty_rate
2014.aid.federal_loan_rate
2014.aid.pell_grant_rate
2014.student.size"""
fieldnames = varnames.replace("\n", ",")

Exercise (1) I would like to make a complicated query from the API. An example query from the API is

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=school.name,
2014.aid.pell_grant_rate&_sort=2014.student.size:desc&_page=1
&_per_page=100&api_key=YOURKEY

Which can be parsed as

  1. the base url: https://api.data.gov/ed/collegescorecard/v1/schools.json,
  2. a filter of school.degrees_awarded.predominant=3,
  3. return the fields school.name,2014.aid.pell_grant_rate,
  4. sort by 2014.student.size in descending order,
  5. return 1 page of 100 entries,
  6. and use your api key

You can find a more complete description: https://github.com/RTICWDT/open-data-maker/blob/master/API.md

Fill in the following def that fits the following docstring, and gives the shown output.

def query_url(key,fieldnames,filterstring,sortvar,numpages):
    """
    This takes all of the necessary parameters to form a query 
    to college scorecard
    Input: key (data.gov API key, string), fieldnames (list of variable 
      names), filterstring (string for filtering), sortvar (variable to 
      sort with by descending), numpages (number of pages to return)
    Output: query url
    """


query_url(key,fieldnames,'school.degrees_awarded.predominant=3',
  '2014.student.size',2)

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=id,school.name,school.degrees_awarded.predominant,school.degree_urbanization,2014.admissions.admission_rate.overall,2014.cost.tuition.in_state,2014.cost.tuition.out_of_state,2014.aid.median_debt.completers.overall,2014.aid.median_debt.noncompleters,2005.student.demographics.median_hh_income,2005.student.demographics.poverty_rate,2014.aid.federal_loan_rate,2014.aid.pell_grant_rate,2014.student.size&_page=2&_per_page=100&_sort=2014.student.size:desc&api_key=YOURKEYHERE

Try to reproduce the above behavior.

In [41]:
def query_url(key,fieldnames,filterstring,sortvar,numpages):
    base_url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json?'
    numpages = str(numpages)
    url = base_url + filterstring + '&_fields=' + fieldnames + '&_page=' + numpages + '&_per_page=100' + '&_sort=' + sortvar + ':desc&api_key=' + key
    return url
searchurl = query_url('A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix', fieldnames, 'school.degrees_awarded.predominant=3', '2014.student.size', 1)
searchurl
Out[41]:
'https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=id,school.name,school.degrees_awarded.predominant,2014.admissions.admission_rate.overall,2014.cost.tuition.in_state,2014.cost.tuition.out_of_state,2014.aid.median_debt.completers.overall,2014.aid.median_debt.noncompleters,2005.student.demographics.median_hh_income,2005.student.demographics.poverty_rate,2014.aid.federal_loan_rate,2014.aid.pell_grant_rate,2014.student.size&_page=1&_per_page=100&_sort=2014.student.size:desc&api_key=A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix'

Exercise (2) Make the query using Request and load the results into a DataFrame that makes sense.

In [42]:
request = Request(searchurl)
response = urlopen(request)
rr = response.read()
In [43]:
print rr[0:400]
{"metadata":{"total":2102,"page":1,"per_page":100},"results":[{"2014.cost.tuition.out_of_state":17459,"2014.aid.median_debt.noncompleters":10809.0,"2014.cost.tuition.in_state":6299,"2014.admissions.admission_rate.overall":0.594,"id":110556,"2005.student.demographics.poverty_rate":16.0699996948242,"2014.aid.pell_grant_rate":0.5699,"2005.student.demographics.median_hh_income":53295,"2014.student.siz
In [44]:
js = json.loads(rr)
type(js)
js.keys()
Out[44]:
[u'results', u'metadata']
In [45]:
observations = js[u'results']
In [46]:
college = pd.DataFrame(data = observations)
college
Out[46]:
2005.student.demographics.median_hh_income 2005.student.demographics.poverty_rate 2014.admissions.admission_rate.overall 2014.aid.federal_loan_rate 2014.aid.median_debt.completers.overall 2014.aid.median_debt.noncompleters 2014.aid.pell_grant_rate 2014.cost.tuition.in_state 2014.cost.tuition.out_of_state 2014.student.size id school.degrees_awarded.predominant school.name
0 53295.0 16.070000 0.5940 0.3316 15993.5 10809.0 0.5699 6299 17459 20128 110556 3 California State University-Fresno
1 60976.0 5.940000 0.6989 0.3856 21500.0 7500.0 0.1989 8170 22090 19979 181464 3 University of Nebraska-Lincoln
2 60340.0 7.850000 0.7206 0.5668 21743.0 9250.0 0.4473 7876 24124 19928 220978 3 Middle Tennessee State University
3 63797.0 7.600000 0.8189 0.5588 23178.5 12500.0 0.3500 8970 25140 19842 232982 3 Old Dominion University
4 59909.0 6.270000 0.9493 0.4922 22236.0 9500.0 0.2447 9034 22624 19727 155399 3 Kansas State University
5 62889.0 7.110000 0.6933 0.5142 23368.5 12500.0 0.4065 7794 23319 19662 209807 3 Portland State University
6 NaN NaN NaN 0.5076 33000.0 7881.5 0.5145 10541 10541 19657 484631 3 University of Phoenix-California
7 46777.0 19.010000 0.9989 0.4686 17529.0 7500.0 0.5760 7259 18389 19614 228796 3 The University of Texas at El Paso
8 63949.0 8.250000 0.5781 0.4693 19500.0 8750.0 0.2965 8870 22290 19488 196088 3 University at Buffalo
9 74683.0 7.160000 0.3089 0.3391 19166.0 11000.0 0.2012 8918 20078 19177 110422 3 California Polytechnic State University-San Lu...
10 70891.0 5.240000 0.9142 0.4456 19793.0 8250.0 0.2216 10448 25731 18802 155317 3 University of Kansas
11 68292.0 11.290000 0.5829 0.5538 19500.0 9637.5 0.5752 13407 36285 18784 110671 3 University of California-Riverside
12 81580.0 4.660000 0.6573 0.3824 20500.0 8750.0 0.1394 9662 24522 18697 232423 3 James Madison University
13 58865.0 9.000000 NaN 0.8829 34135.5 6544.0 0.6901 11283 11283 18566 444158 3 Colorado Technical University-Online
14 80470.0 4.980000 0.5921 0.4708 18500.0 8250.0 0.2552 8590 20268 18546 164076 3 Towson University
15 58452.0 6.890000 0.5312 0.5447 26813.5 9500.0 0.1658 17772 28168 18474 215293 3 University of Pittsburgh-Pittsburgh Campus
16 69141.0 5.600000 0.8406 0.6679 26587.0 11063.0 0.3861 10685 24917 18450 172699 3 Western Michigan University
17 76787.0 8.750000 0.1802 0.3729 21500.0 15000.0 0.2337 48280 48280 18392 123961 3 University of Southern California
18 84189.0 4.380000 0.6783 0.5156 24195.0 8000.0 0.1359 12342 30692 18222 130943 3 University of Delaware
19 58778.0 8.120000 0.9571 0.5964 26000.0 9500.0 0.3928 10260 18792 18148 200800 3 University of Akron Main Campus
20 74746.0 5.020000 0.7388 0.6017 22500.0 8250.0 0.2665 13296 20886 18056 145813 3 Illinois State University
21 NaN NaN 0.7225 0.7979 40150.0 12955.0 0.6235 16888 16888 18055 482477 3 DeVry University-Illinois
22 80999.0 4.960000 0.4996 0.4978 21500.0 9945.5 0.2066 12700 32880 18016 129020 3 University of Connecticut
23 66893.0 7.010000 0.2850 0.3198 16149.0 11000.0 0.2128 8336 33418 17908 199120 3 University of North Carolina at Chapel Hill
24 35906.0 30.620001 0.6280 0.3816 12036.5 5500.0 0.6297 5173 14023 17811 227368 3 The University of Texas Rio Grande Valley
25 53374.0 12.170000 0.8122 0.4545 20500.0 10500.0 0.2838 7096 19144 17809 176017 3 University of Mississippi
26 69989.0 6.900000 0.6887 0.7320 26000.0 12500.0 0.4391 9663 25706 17723 169798 3 Eastern Michigan University
27 66371.0 6.180000 NaN 0.2561 14926.0 6334.0 0.2873 5184 13838 17413 230782 3 Weber State University
28 61912.0 11.530000 0.7733 0.6010 21393.0 12000.0 0.4790 11449 24472 17242 172644 3 Wayne State University
29 62772.0 9.620000 0.6310 0.6236 22500.0 10000.0 0.3849 6170 16487 17232 139931 3 Georgia Southern University
... ... ... ... ... ... ... ... ... ... ... ... ... ...
70 80346.0 6.490000 0.1419 0.3104 12034.5 8289.0 0.1630 47286 47286 14195 190415 3 Cornell University
71 66707.0 6.550000 0.8457 0.4670 19500.0 8866.0 0.2556 8965 20407 14054 237011 3 Western Washington University
72 77933.0 6.270000 0.3338 0.3372 24250.0 13313.5 0.1823 11394 30698 13996 139755 3 Georgia Institute of Technology-Main Campus
73 62628.0 7.120000 0.6057 0.4000 17402.0 7816.0 0.3212 6385 20789 13884 136172 3 University of North Florida
74 73627.0 7.460000 0.5536 0.4520 25131.0 8750.0 0.2135 38120 38120 13801 223232 3 Baylor University
75 63749.0 5.980000 0.5337 0.6624 26000.0 9500.0 0.3312 10726 18034 13697 201441 3 Bowling Green State University-Main Campus
76 71646.0 7.200000 0.7882 0.3746 19000.0 8000.0 0.3015 10620 29412 13667 141574 3 University of Hawaii at Manoa
77 59551.0 13.430000 0.8417 0.4155 19003.0 9837.0 0.4655 5516 14204 13662 225432 3 University of Houston-Downtown
78 74655.0 5.170000 0.5348 0.6285 23875.0 12500.0 0.2434 9144 19594 13610 216764 3 West Chester University of Pennsylvania
79 80190.0 5.690000 0.3222 0.4410 28501.0 19500.0 0.1321 43440 43440 13492 167358 3 Northeastern University
80 71392.0 8.780000 0.4406 0.4653 20000.0 10500.0 0.2861 8620 20260 13372 196079 3 SUNY at Binghamton
81 47617.0 13.740000 0.7387 0.5829 23500.0 9500.0 0.4242 7920 17448 13323 156620 3 Eastern Kentucky University
82 NaN NaN 0.6000 0.6711 24581.0 14769.0 0.2620 17280 17280 13271 454227 3 Chamberlain College of Nursing-Illinois
83 62394.0 6.730000 0.5888 0.4578 18750.0 7503.0 0.3438 6118 25162 13253 433660 3 Florida Gulf Coast University
84 72503.0 6.910000 0.7681 0.6714 23250.0 9250.0 0.2728 12506 28072 13205 217484 3 University of Rhode Island
85 59313.0 8.950000 0.8153 0.6092 22250.0 9500.0 0.4330 12248 24871 13171 149222 3 Southern Illinois University-Carbondale
86 53207.0 8.580000 0.8385 0.4699 24854.0 10500.0 0.2938 6800 21391 13117 180461 3 Montana State University
87 65469.0 12.270000 0.9427 0.1921 14666.0 5636.0 0.4530 6458 13268 12944 190558 3 College of Staten Island CUNY
88 49392.0 18.700001 0.3533 0.2179 10456.0 6500.0 0.4879 6536 13346 12925 190549 3 CUNY Brooklyn College
89 72116.0 6.700000 NaN 0.2647 19449.5 9250.0 0.2385 5494 16466 12865 102553 3 University of Alaska Anchorage
90 51361.0 19.879999 0.3731 0.2067 10092.0 6132.0 0.5530 6359 13169 12758 190600 3 CUNY John Jay College of Criminal Justice
91 62176.0 7.730000 0.5855 0.5178 20584.0 8741.0 0.2899 6392 20420 12686 199218 3 University of North Carolina Wilmington
92 70013.0 8.100000 0.5584 0.5944 21489.5 9249.0 0.3904 8302 19942 12659 196060 3 SUNY at Albany
93 69175.0 6.540000 0.5747 0.6058 27000.0 14000.0 0.2940 35768 35768 12635 195003 3 Rochester Institute of Technology
94 61040.0 6.820000 0.7842 0.6638 24425.5 10500.0 0.4124 10677 16467 12632 169910 3 Ferris State University
95 77484.0 4.130000 0.8002 0.6752 27000.0 9500.0 0.2250 16552 29532 12591 183044 3 University of New Hampshire-Main Campus
96 56952.0 8.900000 0.7439 0.5201 20800.0 10500.0 0.3829 7972 21066 12402 235097 3 Eastern Washington University
97 66314.0 4.750000 0.6658 0.5862 22997.0 8767.0 0.2869 7574 15053 12332 173920 3 Minnesota State University-Mankato
98 78750.0 8.390000 0.7031 0.3865 15880.0 11000.0 0.4758 6564 17724 12275 110574 3 California State University-East Bay
99 54533.0 17.820000 0.4234 0.3827 15594.0 11000.0 0.6047 6139 17299 12199 110547 3 California State University-Dominguez Hills

100 rows × 13 columns

In [47]:
searchurl = query_url('A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix', fieldnames, 'school.degrees_awarded.predominant=3', '2014.student.size', 2)
request = Request(searchurl)
response = urlopen(request)
rr = response.read()
js = json.loads(rr)
observations = js[u'results']
collegedfs = pd.DataFrame(data = observations)
college = college.append(collegedfs)
In [53]:
college
college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-2cb8d49ae7e8> in <module>()
      1 college
----> 2 college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\frame.pyc in rename(self, index, columns, **kwargs)
   2832     def rename(self, index=None, columns=None, **kwargs):
   2833         return super(DataFrame, self).rename(index=index, columns=columns,
-> 2834                                              **kwargs)
   2835 
   2836     @Appender(_shared_docs['fillna'] % _shared_doc_kwargs)

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\generic.pyc in rename(self, *args, **kwargs)
    678 
    679             baxis = self._get_block_manager_axis(axis)
--> 680             result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
    681             result._clear_item_cache()
    682 

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\internals.pyc in rename_axis(self, mapper, axis, copy)
   2816         """
   2817         obj = self.copy(deep=copy)
-> 2818         obj.set_axis(axis, _transform_index(self.axes[axis], mapper))
   2819         return obj
   2820 

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\internals.pyc in _transform_index(index, func)
   4746         return MultiIndex.from_tuples(items, names=index.names)
   4747     else:
-> 4748         items = [func(x) for x in index]
   4749         return Index(items, name=index.name)
   4750 

<ipython-input-53-2cb8d49ae7e8> in <lambda>(i)
      1 college
----> 2 college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])

TypeError: list indices must be integers, not unicode
In [31]:
print len(college.columns), len(college.index)
13 200

Exercise (3) Produce scatterplots to demonstrate the associations between student debt (for 'completers') and median household income, pell grant rates, student poverty rate, tuition, and federal loan rate. Do you have an explanation for the associations that you are seeing? Are there some surprising associations.

In [23]:
from pandas.tools.plotting import scatter_matrix
In [37]:
debt = college[['2014.aid.median_debt.completers.overall']]
income = college[['2005.student.demographics.median_hh_income']]
plt.scatter(income, debt)
plt.show
Out[37]:
<function matplotlib.pyplot.show>
In [40]:
scatter_matrix(college[['2014.aid.median_debt.completers.overall', '2005.student.demographics.median_hh_income', '2014.aid.pell_grant_rate', '2005.student.demographics.poverty_rate', '2014.cost.tuition.in_state', '2014.aid.federal_loan_rate']], figsize = (16,16))
Out[40]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000002FCB3128>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003074D4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030808DA0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000308F57B8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030ABF2E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030BA89B0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000030C72630>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030D19F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030E26CC0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030D7CB38>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003102B4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031052978>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000003119D278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003129DD68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031393BE0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000314A0710>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000315A3F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000316B1BE0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000031724550>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000318731D0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031777C18>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031A689B0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031A9DA58>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031C2EF60>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000031CF8908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031DEF358>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031EF0D68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031FE7780>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000320C4358>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003216CC88>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000032279860>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003232D278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003247E080>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000323AE278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000326607B8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000326ED160>]], dtype=object)
In [ ]: