Given an unlabeled, high-dimensional time-series dataset requiring deep domain expertise to understand (which we did not have), derive some business value. That was the challenge laid out by Solar Turbines, a Caterpillar-owned company that builds industrial gas turbines and offers an equipment health monitoring platform to its customers.
Using millions of turbine sensor readings and a combination of PCA, creative time partitioning, and clustering, our group generated machine load profiles and machine similarity measures, classified several types of performance outliers (including a valuable subset known as transient states), and built a user interface that lets domain experts use these results to efficiently create a labeled dataset for future predictive modeling.
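A minimal sketch of the dimensionality-reduction and clustering step, assuming the sensor readings have already been partitioned into fixed-length windows (one row per window). The function name, column layout, and parameters are illustrative, not the actual pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def load_profiles(windows: np.ndarray, n_components: int = 5, n_clusters: int = 8):
    """Reduce windowed sensor readings and group them into load-profile clusters."""
    scaled = StandardScaler().fit_transform(windows)          # put sensors on a common scale
    reduced = PCA(n_components=n_components).fit_transform(scaled)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(reduced)
    return km.labels_, km.cluster_centers_

# Windows that sit far from every cluster center can then be surfaced as
# candidate outliers (e.g. transient states) for domain experts to review.
```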
Dora, the data explorer, is a Python API over three different data sources (Postgres, Solr, AsterixDB) intended for EDA in a mock product recommendation pipeline. Storage details are hidden from API consumers, and recommendation endpoints allow for a feedback loop with a machine learning model.
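A rough sketch of that facade idea: notebooks and scripts talk to Dora, never to a specific store. The class and method names here are illustrative, not Dora's actual API.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """One adapter per data store (Postgres, Solr, AsterixDB)."""
    @abstractmethod
    def query(self, q: str) -> list[dict]:
        ...

class SolrBackend(Backend):
    def query(self, q: str) -> list[dict]:
        # full-text search against Solr; details omitted in this sketch
        return []

class PostgresBackend(Backend):
    def query(self, q: str) -> list[dict]:
        # SQL against Postgres; details omitted in this sketch
        return []

class Dora:
    """Single entry point that routes each call to whichever store holds the data."""
    def __init__(self, backends: dict[str, Backend]):
        self._backends = backends

    def search_products(self, text: str) -> list[dict]:
        return self._backends["solr"].query(text)

    def recommendations_for(self, user_id: int) -> list[dict]:
        # results shown to the user can be logged back as feedback for the model
        return self._backends["postgres"].query(
            f"SELECT product_id, score FROM recs WHERE user_id = {int(user_id)}"
        )
```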
Final project for Amit Chourasia's Data Visualization class. With 1/3 of our team stuck in India on visa technicalities, we went into the project hoping to find some secret pattern in a dataset of H1B jobs to make things easier for anyone navigating the work visa maze. We came out with a functional and visually effective view of available positions by job type, employer and location. No secret patterns found...
Filter the H1B job market by state, county, company, and job type. Uses slope charts (two-axis parallel coordinates) to compare the number of available jobs with the average salary across companies and job types. Compared to previous projects for this class, this viz has an improved observer pattern for keeping the different components in sync.
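The gist of that observer pattern, sketched in Python for brevity: the filter state lives in one place, and each chart subscribes to changes. Names are illustrative, not the actual component names.

```python
from typing import Callable

class FilterState:
    """Single source of truth for the active filters; charts subscribe to changes."""
    def __init__(self):
        self._filters: dict[str, str] = {}
        self._observers: list[Callable[[dict], None]] = []

    def subscribe(self, on_change: Callable[[dict], None]) -> None:
        self._observers.append(on_change)

    def set_filter(self, key: str, value: str) -> None:
        self._filters[key] = value
        for notify in self._observers:   # every chart redraws from the same state
            notify(dict(self._filters))

state = FilterState()
state.subscribe(lambda f: print("slope chart redraw with", f))
state.subscribe(lambda f: print("map redraw with", f))
state.set_filter("state", "CA")
```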
This was a simple project meant to explore time-series and geographic data visualization, and I really liked our solution. The seasonal markings show that peak West Nile season is arriving later each year, and the interactivity between timeline selection and geographic shading highlights how cases of the disease spread outward from counties with a high concentration of lakes, rivers, or irrigated land.
I have been hooked on oceanographic data since I started surfing in 1999. As an undergrad, I took Physical Oceanography, and my senior thesis in the Computer Science department was "Predicting Significant Ocean Wave Heights Using Genetic Algorithms", a small survey of early neural network approaches to ocean state modeling with an attempt at applying genetic algorithms to the same problem. I've since left the forecasting to those with knowledge of fluid dynamics, but I've continued to write code around the abundance of data provided by NOAA. Current work includes serverless wrappers around the different sources of wind, wave, tide, and bathymetry data for a more consistent interface, as well as some visualizations built on top of those endpoints. I'm interested in generating labeled datasets for hyper-local surf condition predictions.
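A rough sketch of what one of those serverless wrappers does: pull raw readings from an upstream NOAA source and reshape them into a consistent JSON format. The URL, query parameters, and field names below are placeholders, not the real NOAA endpoints.

```python
import json
import requests

UPSTREAM_URL = "https://example.noaa.gov/api/observations"  # placeholder upstream source

def handler(event, context):
    """AWS Lambda-style handler: fetch raw readings and normalize their shape."""
    params = event.get("queryStringParameters") or {}
    station = params.get("station", "46225")

    raw = requests.get(UPSTREAM_URL, params={"station": station}, timeout=10).json()

    # Every wrapper returns the same shape: timestamp, measurement, units.
    readings = [
        {"time": r["t"], "wave_height_m": float(r["wvht"]), "units": "m"}
        for r in raw.get("data", [])
    ]
    return {
        "statusCode": 200,
        "body": json.dumps({"station": station, "readings": readings}),
    }
```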
As a side project over the summer of 2017, I started exploring genetic data available through the NCBI. With a few pointers from friends at Salk and Scripps, I was on my way to some underinformed science. I didn't get very far before classes started back up, but it's a project and domain I intend to continue with.