Pandas DataFrames serve us well in many applications, but for data with multiple coordinates or many attributes, xarray may be a better fit. In this week’s video we explore how to convert a DataFrame to an xarray Dataset, using upper air balloon data as an example.
Since its release in 2008, pandas has been the standard tool for working with tabular data. We’ve made it work for earth science datasets either by juggling sets of DataFrames or by building complex multi-index DataFrames. While these workarounds got the job done, a true multi-dimensional data structure was long missing from the Python ecosystem.
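To make the multi-index workaround concrete, here is a small sketch with made-up temperatures at two hypothetical stations and three pressure levels. The "long" MultiIndex DataFrame is how this is often stored in pandas; calling to_xarray turns each index level into its own dimension, so the single column becomes a 2-D variable you can select by label along either axis.

```python
import pandas as pd
import xarray as xr  # to_xarray requires xarray to be installed

# Hypothetical data: temperature (degC) at two stations and three pressure levels,
# stored "long" with a two-level MultiIndex, as is common in pandas.
index = pd.MultiIndex.from_product(
    [["STN_A", "STN_B"], [1000.0, 850.0, 500.0]],
    names=["station", "pressure"],
)
df = pd.DataFrame(
    {"temperature": [25.0, 15.0, -10.0, 22.0, 12.0, -12.0]},
    index=index,
)

# Each index level becomes a dimension, so the 1-D column becomes a
# 2-D (station x pressure) variable in the resulting Dataset.
ds = df.to_xarray()
print(ds["temperature"].dims)  # ('station', 'pressure')

# Label-based selection along both dimensions, no .xs() or slicing tricks needed.
print(ds["temperature"].sel(station="STN_B", pressure=850.0).item())  # 12.0
```

Note that xarray builds the coordinates from the MultiIndex levels, which pandas stores sorted, so selecting by label is safer than assuming the original row order.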
Xarray has proven itself a great tool for high-dimensional data, a problem that planetary and atmospheric scientists have battled for many years. Xarray is built around the ideas of dimensions and coordinates; we have another video comparing the two if you need a refresher.
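As a quick illustration of dimensions and coordinates, the sketch below builds a small xarray DataArray on a hypothetical latitude/longitude grid (the values are made up). Dimensions name the axes; coordinates attach labels to positions along them, which lets you select and reduce by name and value rather than by integer index.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D field: temperature on a tiny latitude/longitude grid.
temps = np.array([[10.0, 11.0, 12.0],
                  [13.0, 14.0, 15.0]])

da = xr.DataArray(
    temps,
    dims=("lat", "lon"),                      # a name for each axis
    coords={"lat": [40.0, 41.0],              # labels along the lat dimension
            "lon": [-105.0, -104.0, -103.0]}, # labels along the lon dimension
    name="temperature",
    attrs={"units": "degC"},                  # free-form metadata
)

# Select by coordinate value instead of integer position.
print(da.sel(lat=41.0, lon=-104.0).item())  # 14.0

# Reduce over a named dimension instead of an axis number.
print(da.mean(dim="lon").values)  # [11. 14.]
```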
Converting DataFrames to Datasets
While the conversion from tabular to N-dimensional structures can’t be fully automated, the to_xarray method on DataFrames is a good start. In this video we use the Siphon library to retrieve an upper air weather balloon sounding, convert it to an xarray Dataset, assign attributes, and clean up the data, ending up with a really useful structure.
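The workflow above can be sketched roughly as follows. This is not the code from the video: to keep it self-contained and runnable offline, it builds a small stand-in DataFrame with made-up sounding values instead of requesting data through Siphon, and the units dictionary and the "source" attribute are assumptions for illustration. The pandas and xarray calls themselves (set_index, to_xarray, dropna) are standard.

```python
import numpy as np
import pandas as pd
import xarray as xr  # to_xarray requires xarray to be installed

# Stand-in for a retrieved sounding; all values are made up for illustration.
df = pd.DataFrame({
    "pressure": [1000.0, 925.0, 850.0, 700.0, 500.0],
    "height": [110.0, 800.0, 1500.0, 3100.0, 5800.0],
    "temperature": [24.0, 19.5, 15.0, np.nan, -8.0],
    "dewpoint": [20.0, 16.0, 11.0, 2.0, -20.0],
})
# Hypothetical units mapping for each column.
units = {"pressure": "hPa", "height": "meter",
         "temperature": "degC", "dewpoint": "degC"}

# 1. Make pressure the index so it becomes the dimension coordinate.
ds = df.set_index("pressure").to_xarray()

# 2. Attach units to every variable, including the pressure coordinate.
for name in ds.variables:
    ds[name].attrs["units"] = units[name]

# 3. Dataset-level metadata (hypothetical description).
ds.attrs["source"] = "hypothetical example sounding"

# 4. Clean up: drop any pressure level where at least one variable is missing.
ds = ds.dropna(dim="pressure", how="any")

print(ds["pressure"].values)                        # [1000.  925.  850.  500.]
print(ds["temperature"].sel(pressure=850.0).item())  # 15.0
```

Using how="any" drops a whole level if any single variable is missing there; how="all" would keep levels that still have some valid data, which may be preferable for sparse soundings.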