Published on the 17/12/2024 | Written by Heather Wright
New supercomputer + data platform + AI…
Kiwi weather forecasting and environmental science research is about to get bolstered with New Zealand’s National Institute of Water and Atmospheric Research updating and future-proofing its data infrastructure – and gearing up for an AI boost.
Jeff Zais, National Institute of Water and Atmospheric Research (Niwa) HPC senior science advisor and platform architect, told iStart an upgrade of its aging on-premises SAN storage will enable the crown research institute to provide more accurate weather forecasts and risk calculations such as coastal flooding, as well as enabling other types of research.
The move to the Vast Data Platform comes hard on the heels of the August purchase of a new $20 million supercomputer – its fourth since 1999 – with funding for additional AI capability also approved.
Niwa’s forecasts are provided to a range of organisations, including Fire and Emergency New Zealand, which runs urban and rural fire services across New Zealand and which uses on-call comprehensive forecasting to provide highly detailed weather information to help in emergency responses, including fires and hazardous spills.
The Department of Conservation is also a customer using Niwa forecasts across New Zealand’s regional parks to ensure those heading outdoors have the timely, accurate weather information needed to keep themselves safe.
Farmers also use the Niwa 35 app, which provides ‘best guess’ rain forecasts for 35 days using machine learning and AI.
The Vast Data Platform will be installed across two Auckland data centres, tightly synchronised so that Niwa can fully utilise resources at the secondary site, run more climate models simultaneously and orchestrate more use cases with fewer resources.
Warrick Johnston, Niwa general manager for tech and innovation, says the platform will provide the backbone for Niwa’s advanced technology and intelligent data capture.
Niwa’s research in atmospheric science, oceanography and climate modelling relies on more than 20 petabytes of historical weather data.
Vast will also help optimise storage efficiency and prepare Niwa for next-generation GPU-based scientific workloads, Zais says.
The new platform will enable Niwa to move from ‘deterministic’ forecasts – a single best guess at what the weather will be over the next six hours – to ensemble forecasting which provides a set of forecasts presenting a range of future weather possibilities.
“We have started this already with very simplified versions, but there is just not enough compute capacity with the current system to run all these ensembles at full resolution,” Zais says.
Niwa will go to full-scale ensemble versions, running 18 slightly different forecasts every six hours, providing a clearer picture of upcoming weather patterns.
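To illustrate the difference between the two approaches, here is a minimal sketch in Python. It is not Niwa's forecasting code: the `toy_model` function, the temperature values and the way each ensemble member is perturbed are all hypothetical stand-ins, and only the figures of 18 members and a six-hour window come from the description above.

```python
import random
import statistics


def toy_model(initial_temp_c: float, hours: int) -> float:
    """Hypothetical stand-in for a weather model.

    Each hour the temperature relaxes slightly toward 15 C.
    A real model solves atmospheric physics on a 3D grid.
    """
    temp = initial_temp_c
    for _ in range(hours):
        temp += 0.05 * (15.0 - temp)
    return temp


def deterministic_forecast(initial_temp_c: float, hours: int = 6) -> float:
    """A single 'best guess' run from one set of starting conditions."""
    return toy_model(initial_temp_c, hours)


def ensemble_forecast(
    initial_temp_c: float,
    members: int = 18,
    hours: int = 6,
    spread_c: float = 0.5,
    seed: int = 0,
) -> list[float]:
    """Run the model several times from slightly different starting conditions.

    Perturbing the initial temperature with Gaussian noise is an assumption
    made for illustration; the source does not say how members differ.
    """
    rng = random.Random(seed)
    return [
        toy_model(initial_temp_c + rng.gauss(0.0, spread_c), hours)
        for _ in range(members)
    ]


if __name__ == "__main__":
    single = deterministic_forecast(12.0)
    runs = ensemble_forecast(12.0)
    print(f"deterministic: {single:.1f} C")
    print(
        f"ensemble of {len(runs)}: "
        f"min {min(runs):.1f}, mean {statistics.mean(runs):.1f}, "
        f"max {max(runs):.1f} C"
    )
```

The deterministic run returns one number, while the ensemble returns a range, which is what lets forecasters talk about how likely different outcomes are. It also shows why compute capacity matters: every additional member is another full model run.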
Zais says Niwa also hopes to expand the ‘resolution’. Currently it only has a detailed model over New Zealand’s land territory, but wants to expand that to include areas further south, west and north – where New Zealand’s weather comes from.
Scaling out the modelling will also provide more data – enabling more research to be done on that data over time, Jose Higino, Niwa platforms manager, says.
Zais says the current forecast system upgrade is based on CPUs, not GPUs or machine learning, though approval has been granted to buy additional AI capability, something the organisation will be looking at next year.
“The thing about the Vast storage is it is very well suited to working with GPUs both in machine learning and artificial intelligence. That was another reason we were very comfortable moving to that all-flash Vast storage – it helped us move towards the future, giving us the flexibility and confidence to integrate GPU capabilities as we transition to new weather and climate modelling codes.”
Higino says there are multiple pain points Niwa aims to address with the new platform, including having an easy-to-manage platform that enables the organisation to run more models and drive more research more efficiently than the current system.
But improving how the two data centre sites work together was also important. With the existing system, switching operations between the primary and disaster recovery sites can be problematic.
“With this new one we are trying to work with both sites in a more operationally active scenario and taking advantage of that in a more competitive way.”
Expansion will also be smoother under the new system. Niwa has reserved space in the rack and can simply slide in additional capacity and have near immediate access.
“It will be far less disruptive to scale up – the current one requires us to have downtime, while with the new platform we can add resources while being in operation,” Higino says.
Zais notes that while Niwa can buy a boat which will be good for 30 years or more, the same isn’t true of technology.
“Unfortunately, vendors can only keep these systems running for six to eight years and then they really start falling apart,” he says, adding that Niwa’s system was ‘somewhat’ starting to fall apart and was becoming more and more difficult to keep running.
“So it is time for this next generation system to be installed.”
Higino says new technology also brings with it advances that provide greater performance from less energy.
“We want to always take advantage of that as well,” he says.
Most of the equipment for the new setup has been delivered and Niwa is in the middle of installing it and powering it up. Enabling and acceptance testing will be carried out through Q1 of next year and into Q2.
“All of this will be in production probably in the April and May time frame,” Zais says.
The new system also brings with it lower capital and operational costs for the greater capacity.
While Niwa considered moving to cloud, Zais notes the hefty bills many organisations have found themselves with after moving to cloud – and a move to repatriate workloads into data centres.
“The cloud can do a lot of wonderful things and it is quite amazing what is available, but it is somewhat expensive.
“Conventional wisdom going into our procurement process was if you had a predictable workload it would be better to have your own equipment. And sure enough, with a predictable four-times-a-day workload, we know exactly what we need to buy and so that is where it turned out to be a significant cost advantage.”
“When we thought about this platform we didn’t rule out cloud,” Higino adds. “In fact, we just prepared the infrastructure to work together with cloud in the future if needed so we can make use of the techniques of expanding and reducing capacity as required.”