Pangeo

Pangeo is a community built around big-data geoscience. It supports many aspects of working with big data in the geosciences, including (but not limited to):

  • the use and development of Python tools such as Jupyter, xarray and Dask.

  • a software environment that includes core libraries used by the Pangeo community (aka the “Pangeo Stack”). A current list of packages in the Pangeo environment can be found here under “Pangeo-notebook>conda list”.

  • educational resources for learning about the software and infrastructure Pangeo uses, including a gallery of example use cases and a Pangeo-specific Jupyter Binder in which users can spin up their own notebooks to interact with data in the cloud.

  • public hosting of selected dataset collections in the commercial cloud (Pangeo Data Catalog).

  • cloud computing services (Pangeo Cloud).

For more information, see the Pangeo website. To get involved in the community, you can post on the Pangeo Discourse Forum or attend Pangeo community meetings. The Pangeo Oceania group meets monthly at Australian-friendly times: on the third Friday of each month at 1 pm Australian Eastern Time. All are welcome; meeting agendas and connection details are posted here.

When would I use Pangeo?

If you use tools such as xarray or Dask, you are already “using” Pangeo. Pangeo Cloud can be used for any Earth-related research. It is especially useful if you want to analyze across many datasets that are already stored in Pangeo Cloud. As a cloud computing service, it also lets users scale up computations easily and on demand, which makes it well suited to analyzing very large datasets. Pangeo Cloud is currently optimized for data analysis rather than for running models.
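
As a rough illustration of the xarray and Dask workflow described above, the sketch below builds a small synthetic dataset, splits it into Dask-backed chunks, and computes a time mean lazily. The variable name, grid sizes, and chunk size are all assumptions chosen for illustration, and it assumes numpy, pandas, xarray, and dask are installed. A real analysis on Pangeo Cloud would open a cloud-hosted dataset instead of generating random numbers.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a cloud-hosted dataset (all names and values are hypothetical).
time = pd.date_range("2000-01-01", periods=365, freq="D")
lat = np.linspace(-90, 90, 19)
lon = np.linspace(0, 350, 36)
rng = np.random.default_rng(0)

sst = xr.DataArray(
    rng.normal(15.0, 2.0, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Split the array into Dask-backed chunks along time (requires dask).
sst_chunked = sst.chunk({"time": 73})

# This only builds a task graph; no values are computed yet.
time_mean = sst_chunked.mean("time")

# Trigger the computation. On a cluster, the chunks can be processed in parallel.
result = time_mean.compute()
print(result.dims, result.shape)
```

The same code runs unchanged on a laptop or on a larger Dask cluster; only the size of the data and the number of workers differ.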

How do I get access?

Anyone can request access to Pangeo Cloud on the Pangeo Cloud documentation site, under “Sign Up”. This takes you to a Google form where you can describe the research you would like to use Pangeo Cloud for. After submitting it, you should hear back within a few days about whether your request has been approved. At this time, most projects related to climate and geoscience are approved. The link above also contains other information relevant to using Pangeo Cloud. Note that Pangeo Cloud is currently run with limited funds, which could run out at any time, so it should not be relied on as a computing platform for long-term projects. However, if you have funding and are able to pay for cloud computing resources, you can request your own Pangeo-like cloud environment from 2i2c, the same organization that runs and maintains Pangeo Cloud.