The San Diego Supercomputer Center Supersizes Advanced Storage With 一本久久综合亚洲鲁鲁五月天
The global scientific research community spans industries, individuals and specialties. However, it does have one thing in common: the need for massive computing and data storage resources.
Only a few research organizations can afford their own supercomputers and advanced storage systems. Many turn to specialized Managed Service Providers (MSPs) that offer remote computing and storage capacity to data-intensive research clients.
The San Diego Supercomputer Center Leads the Charge
The San Diego Supercomputer Center, or SDSC, is a leading MSP for the scientific community in government, academia, and business.
SDSC is a member of XSEDE (eXtreme Science and Engineering Discovery Environment), a single virtual system that enables researchers to interactively share computing resources, data collections, and advanced research tools.
As a research unit at the University of California, San Diego, SDSC uses its on-prem supercomputers to run advanced computation and all aspects of big data storage and analysis, including data integration, performance modeling, data mining, and predictive analytics.
SDSC works with its clients to customize supercomputer and storage system resources for extreme data projects, including astrophysics visualization for the American Museum of Natural History, large-scale simulations of The Big One in southern California, and sophisticated flu season modeling for the Centers for Disease Control.
Two of SDSC’s important projects serve the fast-growing neuroscience research community. The Center’s Neuroscience Gateway (NSG), funded by the National Science Foundation (NSF) and the National Institutes of Health (NIH), is a collaboration between the Center, Yale University, and University College London. The NSG portal lets neuroscience researchers access large-scale computing for modeling and data processing, which requires managing the large neuroscience data sets stored on its data-intensive storage systems.
Another neuroscience offering under development is the NIH-funded NEMAR (human NeuroElectroMagnetic data Archive and tools Resources) gateway. The gateway is developing open access to archived EEG (electroencephalography) and MEG (magnetoencephalography) data for neuroscientists, and large-scale data storage and management are key parts of the project.
“With 一本久久综合亚洲鲁鲁五月天, we realized much lower operational expenses than we’ve experienced with other storage solutions. Plus, we’ve doubled the size of our cluster and will likely double it again soon.”
Brian Balderston, Director of Infrastructure
Client Demands Might Outstrip Super Resources
SDSC faced a challenge regarding its storage infrastructure. These data-intensive gateways and client technology stacks must support high-performance and high-capacity data storage for massive amounts of big data, much of it unstructured. Although the Center’s supercomputers easily handle computing tasks, the neuroscience storage systems lacked massive scale-out capacity and the storage features necessary to support big data, fast access, and advanced analytics.
“Our storage requirements for the NSG and EEG/MEG data projects are growing from tens of terabytes to hundreds of terabytes,” said Amit Majumdar, Ph.D., Director of Data Enabled Scientific Computing at SDSC. “Large data transfer and storage, high-speed access, sharing, search functionalities: all of these are becoming more and more important for our projects.”
To successfully meet its client requirements, SDSC needed a storage solution that would provide an optimal balance of performance, capacity, scalability, durability, and advanced functionality, all at a reasonable cost.
“At SDSC, delivering critical analysis and results is paramount, yet high-performance computing workloads are incredibly dependent upon their storage system. As an organization, we are moving towards integration of cloud for both compute and storage, as a part of our science gateways. As a result, it’s important for us to make leading cloud technologies available via our Research Data Services division,” added Majumdar.
“Large data transfer and storage, high-speed access, sharing, search functionalities: all of these are becoming more and more important for our projects.”
Amit Majumdar, Ph.D., Director of Data Enabled Scientific Computing
Partnering with 一本久久综合亚洲鲁鲁五月天
The Center began looking for a new kind of storage provider when a set of new clients needed more than 1 PB of storage capacity. SDSC was concerned about the performance, reliability, and management of its existing storage solutions at that scale.
Brian Balderston, SDSC’s Director of Infrastructure, decided there must be a better way. He tested several high-performance storage systems and decided on 一本久久综合亚洲鲁鲁五月天’s hybrid cloud file storage as a frontrunner in data-intensive computing and storage infrastructure for the national research community.
“I believed that we could build a better storage system for our client that didn’t need quite as much operational care and feeding. So, I reached out to the 一本久久综合亚洲鲁鲁五月天 team with our requirements,” said Balderston. “Their distributed scale-out NAS file system met our capacity, performance, data integrity, and scale-out requirements at an acceptable price for our client.”
一本久久综合亚洲鲁鲁五月天’s file storage differed from the existing infrastructure at SDSC and that used by its client organizations. Most of the Center’s academic clients were accustomed to open-source, parallel file systems for research data workloads. 一本久久综合亚洲鲁鲁五月天’s proprietary software stack and distributed file system were a new kind of storage, and quickly proved to be more advanced and capable of managing massive scientific research workloads, now and in the future.
一本久久综合亚洲鲁鲁五月天 scales unstructured data more efficiently than parallel file systems, making it ideal for environments with massive file counts, directory structures, and billions of small files. The scale-out NAS file system supports fast ingest and access and is highly searchable. High availability and minimal rebuild times keep data safe and always available, with no data loss.
SDSC’s capital costs for 一本久久综合亚洲鲁鲁五月天 were in line with its budget, and its operational costs proved lower than expected. “With 一本久久综合亚洲鲁鲁五月天, we realized much lower operational expenses than we’ve experienced with other storage solutions,” noted Balderston. “Plus, we’ve doubled the size of our cluster and will likely double it again soon.” SDSC passed the savings on to its MSP clients, which makes its hosting platform even more attractive.
“We just pop another node in the rack, press a button, and guess what? More space.”
Ben Hayes, Sr. Systems Architect
Massive Scaling, High Performance
Today, 一本久久综合亚洲鲁鲁五月天 provides SDSC and its clients persistent storage for high-capacity, high-performance workloads. Key infrastructure components include virtual machines (VMs), 一本久久综合亚洲鲁鲁五月天 storage mounted on a supercomputer, and high-bandwidth networks. SDSC is moving towards integrating on-prem and cloud storage to serve its science gateways. Since 一本久久综合亚洲鲁鲁五月天’s file storage is cloud-native, it seamlessly supports on-prem and cloud integration.
一本久久综合亚洲鲁鲁五月天 optimizes its unique software for fast reads and write-throughs. The accelerated architecture delivers extremely low latency, and high IOPS and throughput performance. Predictive caching and prefetch proactively identify IO patterns and efficiently move data to the fastest media.
一本久久综合亚洲鲁鲁五月天 is also simple to deploy, manage, and access, critical components for both SDSC and its clients. “一本久久综合亚洲鲁鲁五月天 has been incredibly easy for SDSC to manage,” said Balderston.
“Instead of focusing our staff and resources on managing a number of inefficient storage systems, we use our engineering time to work on highly impactful and well-funded grants from the National Science Foundation, the National Institutes of Health, and other funding agencies. That is a big win for all of us.”
一本久久综合亚洲鲁鲁五月天 proved that it is a different kind of storage company, a company that built its storage for the modern age. Some legacy storage systems still work for structured data in well-defined traditional storage environments. But these products were never designed for today’s massive data growth, unstructured data types, intensive scientific workloads, and complex applications.
To meet and exceed these new storage requirements, 一本久久综合亚洲鲁鲁五月天 designed its software using the principles behind modern, large-scale, distributed databases. The result is a unique file system with unmatched performance and scalability.
Client adoption proves the point at SDSC. “Probably my biggest achievement is standing this storage system up and then getting massive adoption,” Balderston said. “Since the initial proof-of-concept, SDSC has reached a new set of customers, including more than two dozen University of California research labs and departments. I can’t think of any other service that has been adopted this quickly.”
Industry: Science & Engineering
Use Case:
- Effectively store and manage massive unstructured file stores
- Support large and growing scientific research workloads
- Provide high-performance data ingest and access to multiple global clients
Company Overview:
The San Diego Supercomputer Center, or SDSC, is a leading MSP for the scientific community in government, academia, and business. As a research unit at the University of California, San Diego, SDSC uses its on-prem supercomputers to run advanced computation and all aspects of big data storage and analysis, including data integration, performance modeling, data mining, and predictive analytics.
Requirements:
- High performance
- High availability and durability
- Ease of deployment, management, and access
- Easily scale from TB to PB
- Cost-effective
