Understanding college debt¶

I want to see what types of universities have students in debt. We will be using the college scorecard API from data.gov. I am interested in a number of variables, these are admissions rate, tuition, median household income, student size, etc.

It seems that the UC office of president claims that the UC schools are a good deal based on data from the college scorecard.

You should go begin by getting an API key for data.gov. There are many data.gov APIs that you can use this key for and they have fairly relaxed limits on the number of requests.

The documentation for the API can be found here...
https://github.com/RTICWDT/open-data-maker/blob/master/API.md
https://collegescorecard.ed.gov/data/

Let's import some packages.

from urllib2 import Request, urlopen
import pandas as pd
import json
from matplotlib import pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

Here are some variable names that we will be interested in.

varnames = """id
school.name
school.degrees_awarded.predominant
2014.admissions.admission_rate.overall
2014.cost.tuition.in_state
2014.cost.tuition.out_of_state
2014.aid.median_debt.completers.overall
2014.aid.median_debt.noncompleters
2005.student.demographics.median_hh_income
2005.student.demographics.poverty_rate
2014.aid.federal_loan_rate
2014.aid.pell_grant_rate
2014.student.size"""
fieldnames = varnames.replace("\n", ",")

Exercise (1) I would like to make a complicated query from the API. An example query from the API is

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=school.name,
2014.aid.pell_grant_rate&_sort=2014.student.size:desc&_page=1
&_per_page=100&api_key=YOURKEY

Which can be parsed as

the base url: https://api.data.gov/ed/collegescorecard/v1/schools.json,
a filter of school.degrees_awarded.predominant=3,
return the fields school.name,2014.aid.pell_grant_rate,
sort by 2014.student.size in descending order,
return 1 page of 100 entries,
and use your api key

You can find a more complete description: https://github.com/RTICWDT/open-data-maker/blob/master/API.md

Fill in the following def that fits the following docstring, and gives the shown output.

def query_url(key,fieldnames,filterstring,sortvar,numpages):
    """
    This takes all of the necessary parameters to form a query 
    to college scorecard
    Input: key (data.gov API key, string), fieldnames (list of variable 
      names), filterstring (string for filtering), sortvar (variable to 
      sort with by descending), numpages (number of pages to return)
    Output: query url
    """


query_url(key,fieldnames,'school.degrees_awarded.predominant=3',
  '2014.student.size',2)

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=id,school.name,school.degrees_awarded.predominant,school.degree_urbanization,2014.admissions.admission_rate.overall,2014.cost.tuition.in_state,2014.cost.tuition.out_of_state,2014.aid.median_debt.completers.overall,2014.aid.median_debt.noncompleters,2005.student.demographics.median_hh_income,2005.student.demographics.poverty_rate,2014.aid.federal_loan_rate,2014.aid.pell_grant_rate,2014.student.size&_page=2&_per_page=100&_sort=2014.student.size:desc&api_key=YOURKEYHERE

Try to reproduce the above behavior.

def query_url(key,fieldnames,filterstring,sortvar,numpages):
    base_url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json?'
    numpages = str(numpages)
    url = base_url + filterstring + '&_fields=' + fieldnames + '&_page=' + numpages + '&_per_page=100' + '&_sort=' + sortvar + ':desc&api_key=' + key
    return url
searchurl = query_url('A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix', fieldnames, 'school.degrees_awarded.predominant=3', '2014.student.size', 1)
searchurl

'https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=3&_fields=id,school.name,school.degrees_awarded.predominant,2014.admissions.admission_rate.overall,2014.cost.tuition.in_state,2014.cost.tuition.out_of_state,2014.aid.median_debt.completers.overall,2014.aid.median_debt.noncompleters,2005.student.demographics.median_hh_income,2005.student.demographics.poverty_rate,2014.aid.federal_loan_rate,2014.aid.pell_grant_rate,2014.student.size&_page=1&_per_page=100&_sort=2014.student.size:desc&api_key=A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix'

Exercise (2) Make the query using Request and load the results into a DataFrame that makes sense.

request = Request(searchurl)
response = urlopen(request)
rr = response.read()

print rr[0:400]

{"metadata":{"total":2102,"page":1,"per_page":100},"results":[{"2014.cost.tuition.out_of_state":17459,"2014.aid.median_debt.noncompleters":10809.0,"2014.cost.tuition.in_state":6299,"2014.admissions.admission_rate.overall":0.594,"id":110556,"2005.student.demographics.poverty_rate":16.0699996948242,"2014.aid.pell_grant_rate":0.5699,"2005.student.demographics.median_hh_income":53295,"2014.student.siz

js = json.loads(rr)
type(js)
js.keys()

[u'results', u'metadata']

observations = js[u'results']

college = pd.DataFrame(data = observations)
college

searchurl = query_url('A8tjHzrA9pJQTudS94B6XSGMB3BzXeSSwrgUGlix', fieldnames, 'school.degrees_awarded.predominant=3', '2014.student.size', 2)
request = Request(searchurl)
response = urlopen(request)
rr = response.read()
js = json.loads(rr)
observations = js[u'results']
collegedfs = pd.DataFrame(data = observations)
college = college.append(collegedfs)

college
college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-2cb8d49ae7e8> in <module>()
      1 college
----> 2 college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\frame.pyc in rename(self, index, columns, **kwargs)
   2832     def rename(self, index=None, columns=None, **kwargs):
   2833         return super(DataFrame, self).rename(index=index, columns=columns,
-> 2834                                              **kwargs)
   2835 
   2836     @Appender(_shared_docs['fillna'] % _shared_doc_kwargs)

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\generic.pyc in rename(self, *args, **kwargs)
    678 
    679             baxis = self._get_block_manager_axis(axis)
--> 680             result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
    681             result._clear_item_cache()
    682 

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\internals.pyc in rename_axis(self, mapper, axis, copy)
   2816         """
   2817         obj = self.copy(deep=copy)
-> 2818         obj.set_axis(axis, _transform_index(self.axes[axis], mapper))
   2819         return obj
   2820 

C:\Users\Apus\Anaconda2\lib\site-packages\pandas\core\internals.pyc in _transform_index(index, func)
   4746         return MultiIndex.from_tuples(items, names=index.names)
   4747     else:
-> 4748         items = [func(x) for x in index]
   4749         return Index(items, name=index.name)
   4750 

<ipython-input-53-2cb8d49ae7e8> in <lambda>(i)
      1 college
----> 2 college = college.rename(columns=lambda i: ['Median Income', 'Poverty Rate', 'Admission Rate', 'Federal Loan Rate', 'Mean Debt Completers', 'Mean Debt Noncompleters', 'Pell Grant Rate', 'In State Tuition', 'Out State Tuition', 'Size', 'ID', 'School Degrees', 'School Name'][i])

TypeError: list indices must be integers, not unicode

print len(college.columns), len(college.index)

13 200

Exercise (3) Produce scatterplots to demonstrate the associations between student debt (for 'completers') and median household income, pell grant rates, student poverty rate, tuition, and federal loan rate. Do you have an explanation for the associations that you are seeing? Are there some surprising associations.

from pandas.tools.plotting import scatter_matrix

debt = college[['2014.aid.median_debt.completers.overall']]
income = college[['2005.student.demographics.median_hh_income']]
plt.scatter(income, debt)
plt.show

<function matplotlib.pyplot.show>

scatter_matrix(college[['2014.aid.median_debt.completers.overall', '2005.student.demographics.median_hh_income', '2014.aid.pell_grant_rate', '2005.student.demographics.poverty_rate', '2014.cost.tuition.in_state', '2014.aid.federal_loan_rate']], figsize = (16,16))

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000002FCB3128>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003074D4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030808DA0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000308F57B8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030ABF2E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030BA89B0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000030C72630>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030D19F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030E26CC0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000030D7CB38>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003102B4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031052978>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000003119D278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003129DD68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031393BE0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000314A0710>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000315A3F60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000316B1BE0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000031724550>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000318731D0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031777C18>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031A689B0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031A9DA58>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031C2EF60>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000031CF8908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031DEF358>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031EF0D68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000031FE7780>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000320C4358>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003216CC88>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000032279860>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003232D278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000003247E080>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000323AE278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000326607B8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000326ED160>]], dtype=object)

	2005.student.demographics.median_hh_income	2005.student.demographics.poverty_rate	2014.admissions.admission_rate.overall	2014.aid.federal_loan_rate	2014.aid.median_debt.completers.overall	2014.aid.median_debt.noncompleters	2014.aid.pell_grant_rate	2014.cost.tuition.in_state	2014.cost.tuition.out_of_state	2014.student.size	id	school.degrees_awarded.predominant	school.name
0	53295.0	16.070000	0.5940	0.3316	15993.5	10809.0	0.5699	6299	17459	20128	110556	3	California State University-Fresno
1	60976.0	5.940000	0.6989	0.3856	21500.0	7500.0	0.1989	8170	22090	19979	181464	3	University of Nebraska-Lincoln
2	60340.0	7.850000	0.7206	0.5668	21743.0	9250.0	0.4473	7876	24124	19928	220978	3	Middle Tennessee State University
3	63797.0	7.600000	0.8189	0.5588	23178.5	12500.0	0.3500	8970	25140	19842	232982	3	Old Dominion University
4	59909.0	6.270000	0.9493	0.4922	22236.0	9500.0	0.2447	9034	22624	19727	155399	3	Kansas State University
5	62889.0	7.110000	0.6933	0.5142	23368.5	12500.0	0.4065	7794	23319	19662	209807	3	Portland State University
6	NaN	NaN	NaN	0.5076	33000.0	7881.5	0.5145	10541	10541	19657	484631	3	University of Phoenix-California
7	46777.0	19.010000	0.9989	0.4686	17529.0	7500.0	0.5760	7259	18389	19614	228796	3	The University of Texas at El Paso
8	63949.0	8.250000	0.5781	0.4693	19500.0	8750.0	0.2965	8870	22290	19488	196088	3	University at Buffalo
9	74683.0	7.160000	0.3089	0.3391	19166.0	11000.0	0.2012	8918	20078	19177	110422	3	California Polytechnic State University-San Lu...
10	70891.0	5.240000	0.9142	0.4456	19793.0	8250.0	0.2216	10448	25731	18802	155317	3	University of Kansas
11	68292.0	11.290000	0.5829	0.5538	19500.0	9637.5	0.5752	13407	36285	18784	110671	3	University of California-Riverside
12	81580.0	4.660000	0.6573	0.3824	20500.0	8750.0	0.1394	9662	24522	18697	232423	3	James Madison University
13	58865.0	9.000000	NaN	0.8829	34135.5	6544.0	0.6901	11283	11283	18566	444158	3	Colorado Technical University-Online
14	80470.0	4.980000	0.5921	0.4708	18500.0	8250.0	0.2552	8590	20268	18546	164076	3	Towson University
15	58452.0	6.890000	0.5312	0.5447	26813.5	9500.0	0.1658	17772	28168	18474	215293	3	University of Pittsburgh-Pittsburgh Campus
16	69141.0	5.600000	0.8406	0.6679	26587.0	11063.0	0.3861	10685	24917	18450	172699	3	Western Michigan University
17	76787.0	8.750000	0.1802	0.3729	21500.0	15000.0	0.2337	48280	48280	18392	123961	3	University of Southern California
18	84189.0	4.380000	0.6783	0.5156	24195.0	8000.0	0.1359	12342	30692	18222	130943	3	University of Delaware
19	58778.0	8.120000	0.9571	0.5964	26000.0	9500.0	0.3928	10260	18792	18148	200800	3	University of Akron Main Campus
20	74746.0	5.020000	0.7388	0.6017	22500.0	8250.0	0.2665	13296	20886	18056	145813	3	Illinois State University
21	NaN	NaN	0.7225	0.7979	40150.0	12955.0	0.6235	16888	16888	18055	482477	3	DeVry University-Illinois
22	80999.0	4.960000	0.4996	0.4978	21500.0	9945.5	0.2066	12700	32880	18016	129020	3	University of Connecticut
23	66893.0	7.010000	0.2850	0.3198	16149.0	11000.0	0.2128	8336	33418	17908	199120	3	University of North Carolina at Chapel Hill
24	35906.0	30.620001	0.6280	0.3816	12036.5	5500.0	0.6297	5173	14023	17811	227368	3	The University of Texas Rio Grande Valley
25	53374.0	12.170000	0.8122	0.4545	20500.0	10500.0	0.2838	7096	19144	17809	176017	3	University of Mississippi
26	69989.0	6.900000	0.6887	0.7320	26000.0	12500.0	0.4391	9663	25706	17723	169798	3	Eastern Michigan University
27	66371.0	6.180000	NaN	0.2561	14926.0	6334.0	0.2873	5184	13838	17413	230782	3	Weber State University
28	61912.0	11.530000	0.7733	0.6010	21393.0	12000.0	0.4790	11449	24472	17242	172644	3	Wayne State University
29	62772.0	9.620000	0.6310	0.6236	22500.0	10000.0	0.3849	6170	16487	17232	139931	3	Georgia Southern University
...	...	...	...	...	...	...	...	...	...	...	...	...	...
70	80346.0	6.490000	0.1419	0.3104	12034.5	8289.0	0.1630	47286	47286	14195	190415	3	Cornell University
71	66707.0	6.550000	0.8457	0.4670	19500.0	8866.0	0.2556	8965	20407	14054	237011	3	Western Washington University
72	77933.0	6.270000	0.3338	0.3372	24250.0	13313.5	0.1823	11394	30698	13996	139755	3	Georgia Institute of Technology-Main Campus
73	62628.0	7.120000	0.6057	0.4000	17402.0	7816.0	0.3212	6385	20789	13884	136172	3	University of North Florida
74	73627.0	7.460000	0.5536	0.4520	25131.0	8750.0	0.2135	38120	38120	13801	223232	3	Baylor University
75	63749.0	5.980000	0.5337	0.6624	26000.0	9500.0	0.3312	10726	18034	13697	201441	3	Bowling Green State University-Main Campus
76	71646.0	7.200000	0.7882	0.3746	19000.0	8000.0	0.3015	10620	29412	13667	141574	3	University of Hawaii at Manoa
77	59551.0	13.430000	0.8417	0.4155	19003.0	9837.0	0.4655	5516	14204	13662	225432	3	University of Houston-Downtown
78	74655.0	5.170000	0.5348	0.6285	23875.0	12500.0	0.2434	9144	19594	13610	216764	3	West Chester University of Pennsylvania
79	80190.0	5.690000	0.3222	0.4410	28501.0	19500.0	0.1321	43440	43440	13492	167358	3	Northeastern University
80	71392.0	8.780000	0.4406	0.4653	20000.0	10500.0	0.2861	8620	20260	13372	196079	3	SUNY at Binghamton
81	47617.0	13.740000	0.7387	0.5829	23500.0	9500.0	0.4242	7920	17448	13323	156620	3	Eastern Kentucky University
82	NaN	NaN	0.6000	0.6711	24581.0	14769.0	0.2620	17280	17280	13271	454227	3	Chamberlain College of Nursing-Illinois
83	62394.0	6.730000	0.5888	0.4578	18750.0	7503.0	0.3438	6118	25162	13253	433660	3	Florida Gulf Coast University
84	72503.0	6.910000	0.7681	0.6714	23250.0	9250.0	0.2728	12506	28072	13205	217484	3	University of Rhode Island
85	59313.0	8.950000	0.8153	0.6092	22250.0	9500.0	0.4330	12248	24871	13171	149222	3	Southern Illinois University-Carbondale
86	53207.0	8.580000	0.8385	0.4699	24854.0	10500.0	0.2938	6800	21391	13117	180461	3	Montana State University
87	65469.0	12.270000	0.9427	0.1921	14666.0	5636.0	0.4530	6458	13268	12944	190558	3	College of Staten Island CUNY
88	49392.0	18.700001	0.3533	0.2179	10456.0	6500.0	0.4879	6536	13346	12925	190549	3	CUNY Brooklyn College
89	72116.0	6.700000	NaN	0.2647	19449.5	9250.0	0.2385	5494	16466	12865	102553	3	University of Alaska Anchorage
90	51361.0	19.879999	0.3731	0.2067	10092.0	6132.0	0.5530	6359	13169	12758	190600	3	CUNY John Jay College of Criminal Justice
91	62176.0	7.730000	0.5855	0.5178	20584.0	8741.0	0.2899	6392	20420	12686	199218	3	University of North Carolina Wilmington
92	70013.0	8.100000	0.5584	0.5944	21489.5	9249.0	0.3904	8302	19942	12659	196060	3	SUNY at Albany
93	69175.0	6.540000	0.5747	0.6058	27000.0	14000.0	0.2940	35768	35768	12635	195003	3	Rochester Institute of Technology
94	61040.0	6.820000	0.7842	0.6638	24425.5	10500.0	0.4124	10677	16467	12632	169910	3	Ferris State University
95	77484.0	4.130000	0.8002	0.6752	27000.0	9500.0	0.2250	16552	29532	12591	183044	3	University of New Hampshire-Main Campus
96	56952.0	8.900000	0.7439	0.5201	20800.0	10500.0	0.3829	7972	21066	12402	235097	3	Eastern Washington University
97	66314.0	4.750000	0.6658	0.5862	22997.0	8767.0	0.2869	7574	15053	12332	173920	3	Minnesota State University-Mankato
98	78750.0	8.390000	0.7031	0.3865	15880.0	11000.0	0.4758	6564	17724	12275	110574	3	California State University-East Bay
99	54533.0	17.820000	0.4234	0.3827	15594.0	11000.0	0.6047	6139	17299	12199	110547	3	California State University-Dominguez Hills