Dharani Suresh

API Project - Get started with open reproducible science! 🔬🚀


  1. What is Open Reproducible Science? - In my understanding, open reproducible science refers to the practice of conducting scientific research in a way that others can easily access, replicate, and validate the findings. This approach promotes transparency and accountability in research by making methodologies, data, and results freely available.

  2. Example: Git and GitHub support open reproducible science by providing tools for version control and collaboration, allowing researchers to share their code, track changes, and manage contributions transparently. This infrastructure enables the scientific community to review, replicate, and build upon work, thus fostering a collaborative and open scientific environment.

What is a Machine-Readable Name? - The Jupyter Notebook file “Get Started with Open Reproducible Science!.ipynb” has a name that is machine-readable only in a limited sense. The “.ipynb” extension identifies it as a Jupyter Notebook, and software can still treat the rest of the name as a string for file operations and display. However, the spaces and punctuation are human-friendly rather than machine-friendly, because they have to be quoted or escaped in shells and scripts. I renamed the file to madison_timeseries so that it is easier to understand and to work with.
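As a rough illustration of what makes a name machine-readable, the sketch below turns a human-friendly filename into one that uses only lowercase letters, digits, and underscores. The helper name `to_machine_readable` is hypothetical and is not part of the original notebook; it shows one possible convention, not the only one.

```python
import re
from pathlib import Path


def to_machine_readable(filename):
    """Return a lowercase, underscore-separated version of a filename."""
    path = Path(filename)
    # Replace every run of characters that is not a letter or digit with "_"
    stem = re.sub(r"[^a-z0-9]+", "_", path.stem.lower()).strip("_")
    return stem + path.suffix.lower()


clean_name = to_machine_readable(
    "Get Started with Open Reproducible Science!.ipynb")
print(clean_name)
```

A name in this style can be passed to shell commands and scripts without quoting.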

Here are some suggestions for creating readable, well-documented scientific workflows that are easier to reproduce.

Write clean code by:

  1. Using meaningful and descriptive names for variables, functions, and classes to make the code easier to understand and maintain.
  2. Keeping functions short and focused on a single task, which enhances readability and simplifies debugging.
  3. Consistently using comments and documentation to explain the purpose and functionality of the code, aiding future maintainers and contributors.
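The three suggestions above can be sketched in a few lines. This is only an illustrative example; the function name `celsius_to_fahrenheit` and the variable `freezing_point_f` are chosen here and do not come from the project code.

```python
def celsius_to_fahrenheit(temp_celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    # One short function, one task: apply the standard conversion formula
    return temp_celsius * 9 / 5 + 32


# A descriptive name at the call site documents what the value means
freezing_point_f = celsius_to_fahrenheit(0.0)
print(freezing_point_f)
```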

Advantages of clean code include:

  1. Easier maintenance and debugging as cleaner, simpler code reduces the complexity that can obscure bugs.
  2. Enhanced collaboration since clean code is easier for other developers to read, understand, and contribute to.
  3. Increased efficiency in development cycles, with developers spending less time deciphering the existing codebase, allowing for quicker enhancements and fixes.

Getting Started with the Project

This project focuses on utilizing the National Centers for Environmental Information (NCEI) Access Data Service, which offers a RESTful application programming interface (API). This API allows users to access and subset data by applying a specific set of parameters to the version 1 (v1) URL. More information can be found here: https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation

The Global Historical Climatology Network - Daily (GHCNd) is maintained by NOAA's National Centers for Environmental Information (NCEI), which was formerly the National Climatic Data Center (NCDC). The dataset includes daily climate records from thousands of land-based stations worldwide, spanning from the mid-1700s to the present. This project requests temperatures in degrees Celsius and precipitation in millimeters by setting units=metric in the API call. Data are collected at meteorological stations around the world and pass through quality checks before release, which supports consistency across measurements.

Citation: Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ.


Programming Part Begins

import pandas as pd

University of Wisconsin Outline Map


Loading the URL

This step downloads daily summaries for the University of Wisconsin-Madison station (UW Arboretum, station ID USC00470273) from October 1971 to April 2024, in order to analyze precipitation and temperature trends. The data come from the API of the National Centers for Environmental Information (NCEI), which is part of the National Oceanic and Atmospheric Administration (NOAA).

uwm_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1?'
    'dataset=daily-summaries&dataTypes=TOBS,PRCP&'
    'stations=USC00470273&startDate=1971-10-01&endDate=2024-04-05&'
    'includeStationName=true&includeStationLocation=1&'
    'units=metric')
uwm_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00470273&startDate=1971-10-01&endDate=2024-04-05&includeStationName=true&includeStationLocation=1&units=metric'
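As an alternative sketch, the same URL can be assembled from a dictionary of query parameters, which makes each setting easier to read and change. The names `BASE_URL`, `params`, and `built_url` are introduced here for illustration; the parameter values are the ones used in this project.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.ncei.noaa.gov/access/services/data/v1'

# One entry per query parameter in the project URL
params = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TOBS,PRCP',
    'stations': 'USC00470273',
    'startDate': '1971-10-01',
    'endDate': '2024-04-05',
    'includeStationName': 'true',
    'includeStationLocation': '1',
    'units': 'metric',
}

# safe=',' keeps the comma in 'TOBS,PRCP' unescaped, matching the URL above
built_url = f"{BASE_URL}?{urlencode(params, safe=',')}"
print(built_url)
```

Changing the station or the date range then only requires editing one dictionary entry.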
madison_df = pd.read_csv(
    uwm_url, index_col='DATE', parse_dates=True, na_values='NaN')
madison_df
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TOBS
DATE
1971-10-01 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.0 24.4
1971-10-02 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.0 19.4
1971-10-03 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 3.8 19.4
1971-10-04 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.5 11.1
1971-10-05 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.3 15.0
... ... ... ... ... ... ... ...
2024-04-01 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.0 3.3
2024-04-02 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 12.4 3.3
2024-04-03 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 56.6 1.1
2024-04-04 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 1.0 1.1
2024-04-05 USC00470273 UW ARBORETUM MADISON, WI US 43.04118 -89.42872 265.2 0.0 2.8

18998 rows × 7 columns

# Data was imported into a pandas DataFrame
type(madison_df)
pandas.core.frame.DataFrame
madison_df = madison_df[['PRCP', 'TOBS']]
madison_df
PRCP TOBS
DATE
1971-10-01 0.0 24.4
1971-10-02 0.0 19.4
1971-10-03 3.8 19.4
1971-10-04 0.5 11.1
1971-10-05 0.3 15.0
... ... ...
2024-04-01 0.0 3.3
2024-04-02 12.4 3.3
2024-04-03 56.6 1.1
2024-04-04 1.0 1.1
2024-04-05 0.0 2.8

18998 rows × 2 columns


Plotting the Precipitation (PRCP) and Observed Temperature (TOBS) columns vs Time to explore the data

madison_df.plot()

[Figure: precipitation (PRCP) and observed temperature (TOBS) plotted against date]

Plotting only the Precipitation column (PRCP) vs Time to explore the data

# Plot the data using .plot
madison_df.plot(
    y='PRCP',
    title='UWM-Precipitation',
    xlabel='Date',
    ylabel='Precipitation (mm)')

[Figure: UWM-Precipitation, precipitation (mm) plotted against date]

Converting to Fahrenheit - a Vectorized Operation

# TOBS is in degrees Celsius (units=metric); .assign returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by assigning into a slice
madison_df = madison_df.assign(TFah=madison_df['TOBS'] * 9 / 5 + 32)
madison_df
PRCP TOBS TFah
DATE
1971-10-01 0.0 24.4 75.92
1971-10-02 0.0 19.4 66.92
1971-10-03 3.8 19.4 66.92
1971-10-04 0.5 11.1 51.98
1971-10-05 0.3 15.0 59.00
... ... ... ...
2024-04-01 0.0 3.3 37.94
2024-04-02 12.4 3.3 37.94
2024-04-03 56.6 1.1 33.98
2024-04-04 1.0 1.1 33.98
2024-04-05 0.0 2.8 37.04

18998 rows × 3 columns
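To see why this counts as a vectorized operation, here is a minimal sketch on a small pandas Series. The sample values are the first Celsius readings shown in the table above; the variable names are chosen here for illustration.

```python
import pandas as pd

# Sample Celsius values taken from the first rows of the table above
tobs_celsius = pd.Series([24.4, 19.4, 11.1, 15.0])

# One expression converts every element; no explicit Python loop is needed
tobs_fahrenheit = tobs_celsius * 9 / 5 + 32
print(tobs_fahrenheit)
```

The arithmetic is applied to the whole column at once, which is usually faster and shorter than looping over rows.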

Another Method: Creating a “Function” to Convert Temperatures From Fahrenheit to Celsius


def convert_to_celsius(fahrenheit):
    """Convert temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9

# Caution: TOBS was requested with units=metric and is already in Celsius,
# so celsius_column only demonstrates .apply() and is not a real temperature
madison_df = madison_df.assign(
    celsius_column=madison_df['TOBS'].apply(convert_to_celsius))
madison_df
PRCP TOBS TFah celsius_column
DATE
1971-10-01 0.0 24.4 75.92 -4.222222
1971-10-02 0.0 19.4 66.92 -7.000000
1971-10-03 3.8 19.4 66.92 -7.000000
1971-10-04 0.5 11.1 51.98 -11.611111
1971-10-05 0.3 15.0 59.00 -9.444444
... ... ... ... ...
2024-04-01 0.0 3.3 37.94 -15.944444
2024-04-02 12.4 3.3 37.94 -15.944444
2024-04-03 56.6 1.1 33.98 -17.166667
2024-04-04 1.0 1.1 33.98 -17.166667
2024-04-05 0.0 2.8 37.04 -16.222222

18998 rows × 4 columns
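Because the data were requested with units=metric, a Fahrenheit-to-Celsius function gives meaningful results only when it receives Fahrenheit input, such as the TFah column. The sketch below is a quick round-trip check on a few sample values from the table; the helper `convert_to_fahrenheit` is introduced here for illustration and is not part of the original notebook.

```python
def convert_to_fahrenheit(celsius):
    """Convert temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def convert_to_celsius(fahrenheit):
    """Convert temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Celsius readings taken from the first rows of the TOBS column
sample_celsius = [24.4, 19.4, 11.1, 15.0]

# Converting to Fahrenheit and back should recover the original readings
round_trip = [convert_to_celsius(convert_to_fahrenheit(t))
              for t in sample_celsius]

for original, recovered in zip(sample_celsius, round_trip):
    print(original, round(recovered, 6))
```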

Next, Subsetting and Resampling

# Subsetting the data to complete October-to-September years:
# 1 October 1972 through 30 September 2023
madison_1971_2024 = madison_df.loc['1972-10-01':'2023-09']
madison_1971_2024
PRCP TOBS TFah celsius_column
DATE
1972-10-01 0.0 11.1 51.98 -11.611111
1972-10-02 0.0 12.8 55.04 -10.666667
1972-10-03 0.0 15.0 59.00 -9.444444
1972-10-04 0.0 13.9 57.02 -10.055556
1972-10-05 0.3 15.0 59.00 -9.444444
... ... ... ... ...
2023-09-26 41.4 16.7 62.06 -8.500000
2023-09-27 16.3 17.2 62.96 -8.222222
2023-09-28 5.8 17.2 62.96 -8.222222
2023-09-29 0.0 14.4 57.92 -9.777778
2023-09-30 0.0 18.9 66.02 -7.277778

18457 rows × 4 columns
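The slice above relies on pandas partial-string indexing, where a label such as '2023-09' stands for the whole month. The following sketch checks that behavior on a synthetic daily index; the DataFrame `demo_df` and its `value` column are made up for the example.

```python
import pandas as pd

# Synthetic daily index that extends a few days beyond both ends of the subset
dates = pd.date_range('1972-09-28', '2023-10-03', freq='D')
demo_df = pd.DataFrame({'value': range(len(dates))}, index=dates)

# A partial date string such as '2023-09' covers the entire month,
# so the slice runs through 30 September 2023 inclusive
subset = demo_df.loc['1972-10-01':'2023-09']
first_day, last_day = subset.index[0], subset.index[-1]
print(first_day.date(), last_day.date())
```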

Getting into Action: Calculating Annual Statistics

# Resampling to yearly mean values, with each year starting on 1 October
madison_yearly_mean = madison_1971_2024.resample('YS-OCT').mean()
madison_yearly_mean
PRCP TOBS TFah celsius_column
DATE
1972-10-01 2.920548 7.287535 45.117562 -13.729147
1973-10-01 2.566027 7.002755 44.604959 -13.887358
1974-10-01 2.968767 6.746006 44.142810 -14.029997
1975-10-01 1.844262 7.629363 45.732853 -13.539243
1976-10-01 2.204658 4.433058 39.979504 -15.314968
1977-10-01 3.201918 5.011570 41.020826 -14.993572
1978-10-01 2.180274 3.774451 38.794011 -15.680861
1979-10-01 2.485886 4.058683 39.305629 -15.522954
1980-10-01 2.470685 5.063611 41.114500 -14.964660
1981-10-01 2.398305 4.194521 39.550137 -15.447489
1982-10-01 2.479452 6.361370 43.450466 -14.243683
1983-10-01 2.602192 4.321038 39.777869 -15.377201
1984-10-01 2.780822 5.363736 41.654725 -14.797924
1985-10-01 3.172603 5.015068 41.027123 -14.991629
1986-10-01 2.191185 6.076584 42.937851 -14.401898
1987-10-01 2.171513 6.437982 43.588368 -14.201121
1988-10-01 2.013736 4.522253 40.140055 -15.265415
1989-10-01 2.181267 5.444413 41.799944 -14.753104
1990-10-01 2.516022 5.816898 42.470416 -14.546168
1991-10-01 2.531285 4.675900 40.416620 -15.180055
1992-10-01 3.600852 4.688068 40.438523 -15.173295
1993-10-01 2.376945 3.664286 38.595714 -15.742063
1994-10-01 2.036034 5.166289 41.299320 -14.907617
1995-10-01 2.562319 2.915864 37.248555 -16.157853
1996-10-01 2.469405 3.791136 38.824044 -15.671591
1997-10-01 2.793220 6.330086 43.394155 -14.261063
1998-10-01 2.805460 7.214245 44.985641 -13.769864
1999-10-01 3.209167 7.580737 45.645326 -13.566257
2000-10-01 2.826761 6.382051 43.487692 -14.232194
2001-10-01 2.386197 7.484000 45.471200 -13.620000
2002-10-01 2.015000 5.741399 42.334519 -14.588111
2003-10-01 2.911175 5.942604 42.696686 -14.476331
2004-10-01 1.824910 8.289510 46.921119 -13.172494
2005-10-01 2.721751 7.068091 44.722564 -13.851060
2006-10-01 2.964124 7.038028 44.668451 -13.867762
2007-10-01 3.238997 6.161972 43.091549 -14.354460
2008-10-01 2.735028 5.191061 41.343911 -14.893855
2009-10-01 3.396953 7.060388 44.708698 -13.855340
2010-10-01 2.376667 5.762259 42.372066 -14.576523
2011-10-01 2.072253 8.368579 47.063443 -13.128567
2012-10-01 3.528219 5.908564 42.635414 -14.495242
2013-10-01 2.637363 4.241944 39.635500 -15.421142
2014-10-01 2.476880 5.777135 42.398843 -14.568258
2015-10-01 3.344413 8.461878 47.231381 -13.076734
2016-10-01 3.355989 7.616809 45.710256 -13.546217
2017-10-01 4.053444 7.053315 44.695967 -13.859269
2018-10-01 3.384807 7.122715 44.820886 -13.820714
2019-10-01 2.779396 9.043213 48.277784 -12.753770
2020-10-01 2.087945 7.714641 45.886354 -13.491866
2021-10-01 2.500275 7.631476 45.736657 -13.538069
2022-10-01 2.490685 8.897534 48.015562 -12.834703
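To make the October-anchored yearly bins easier to follow, here is a small sketch on synthetic data. It passes the offset object `pd.offsets.YearBegin(month=10)` to `resample`, which is intended to be equivalent to the 'YS-OCT' alias used above; the series `demo` and its values are invented for the example.

```python
import pandas as pd

# Two full October-to-September years of synthetic daily data
dates = pd.date_range('2000-10-01', '2002-09-30', freq='D')

# 1.0 throughout the first year, 3.0 throughout the second
values = [1.0 if d < pd.Timestamp('2001-10-01') else 3.0 for d in dates]
demo = pd.Series(values, index=dates)

# YearBegin(month=10) anchors each yearly bin on 1 October
yearly_mean = demo.resample(pd.offsets.YearBegin(month=10)).mean()
print(yearly_mean)
```

Each row of the result is labeled with the first day of its bin, which is why the table above lists dates such as 1972-10-01.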

Plotting the Resampled Data

# Plotting mean annual temperature values
# Plotting the data using .plot

import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))  # Create a figure with two subplots

# Plot the first dataframe (daily observed temperature)
madison_df.plot(
    y='TFah',
    title='UWM-Daily Observed Temperature',
    xlabel='Date',
    ylabel='Temp in (F)',
    ax=axes[0]  # This tells the plot to use the first subplot
)

# Plot the second dataframe
madison_yearly_mean.plot(
    y='TFah',
    title='UWM-Mean Annual Temperature',
    xlabel='Date',
    ylabel='Temp in (F)',
    ax=axes[1]  # This tells the plot to use the second subplot
)

fig.tight_layout()  # Adjusts plot to ensure everything fits without overlap
plt.show()  # Display the plots

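[Figure: two panels of temperature (F) plotted against date, daily values on the left and yearly means on the right]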

About University of Wisconsin-Madison Station Temperature Plot 📰 🗞️ 📻

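The University of Wisconsin-Madison station (UW Arboretum) shows a warming trend since about 1980, with noticeable year-to-year fluctuations. The yearly means of observed temperature were mostly between about 3 and 6.5 °C from the late 1970s through the mid-1990s, and between about 7 and 9 °C from 2015 onward. This rise in mean annual temperature signals a possible long-term climate shift.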
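One simple way to put a number on the trend is a straight-line fit through the yearly means. The sketch below copies the yearly mean TOBS values from the resampled table and computes an ordinary least-squares slope in plain Python. The helper `least_squares_slope` is written here for illustration, and a linear fit is only a rough summary: it ignores year-to-year correlation and any changes in observation practice, so it is not a formal trend test.

```python
# Yearly mean TOBS (degrees Celsius) copied from the resampled table above,
# one value per year starting 1 October, from 1972 through 2022
yearly_mean_tobs = [
    7.287535, 7.002755, 6.746006, 7.629363, 4.433058, 5.011570, 3.774451,
    4.058683, 5.063611, 4.194521, 6.361370, 4.321038, 5.363736, 5.015068,
    6.076584, 6.437982, 4.522253, 5.444413, 5.816898, 4.675900, 4.688068,
    3.664286, 5.166289, 2.915864, 3.791136, 6.330086, 7.214245, 7.580737,
    6.382051, 7.484000, 5.741399, 5.942604, 8.289510, 7.068091, 7.038028,
    6.161972, 5.191061, 7.060388, 5.762259, 8.368579, 5.908564, 4.241944,
    5.777135, 8.461878, 7.616809, 7.053315, 7.122715, 9.043213, 7.714641,
    7.631476, 8.897534,
]
start_years = list(range(1972, 1972 + len(yearly_mean_tobs)))


def least_squares_slope(x, y):
    """Return the ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    return covariance / variance


slope_c_per_year = least_squares_slope(start_years, yearly_mean_tobs)
print(f"Fitted slope: {slope_c_per_year:.3f} degrees C per year")
```

A positive slope is consistent with the warming described above, while the size of the slope depends on which years are included in the fit.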

%%capture
%%bash
jupyter nbconvert "madison_timeseries.ipynb" --to markdown