Redbook IBM Storage Ceph
Data Lakehouse platform for IBM watsonx.data and beyond
This is a Press Release edited by StorageNewsletter.com on July 17, 2024 at 2:18 pm
By Daniel Alexander Parkes, technical product manager IBM Storage Ceph, IBM Corp.
Introducing the latest addition to the IBM Redbooks family for IBM Storage Ceph: IBM Storage Ceph as a Data Lakehouse Platform for IBM watsonx.data and Beyond.
This new publication explains why IBM Storage Ceph, with its robust and scalable object storage capabilities, is well suited to modern data lakehouse environments. Paired with the efficiency and performance of Iceberg for curated tabular data, Storage Ceph stands out as a premier on-premises object store. As the first chapter of the book puts it:
“This is where Storage Ceph takes the stage. Storage Ceph is open source, software-defined, runs on industry-standard hardware, and has best-in-class coverage of the lingua franca of object storage, the Amazon S3 API. It was designed from the ground up as an object store, contrasting with approaches that bolt-on S3 API servers to a distributed filesystem. With Ceph, data placement is by algorithm instead of by lookup. This allows Ceph to scale well into the billions of objects, even on modestly sized clusters. Data stored in Ceph is protected with efficient erasure coding, in-flight and at-rest checksums and encryption, and robust access control that thoughtfully integrates with enterprise identity systems.”
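To make "placement by algorithm instead of by lookup" concrete, here is a minimal Python sketch. It uses rendezvous (highest-random-weight) hashing as a simplified stand-in; it is not Ceph's actual CRUSH algorithm, and the `placement` function and the `osd.N` node names are hypothetical. The point it illustrates is that any client can compute where an object lives from its name and the node list alone, with no central lookup table.

```python
import hashlib


def placement(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick the nodes for an object purely from its name and the node list.

    Simplified illustration using rendezvous hashing, NOT Ceph's CRUSH.
    """
    def score(node: str) -> int:
        # Hash the (node, object) pair; the highest scores win.
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical six-node cluster; every caller computes the same answer.
nodes = [f"osd.{i}" for i in range(6)]
print(placement("bucket/logs/2024-07-17.json", nodes))
```

Because the result depends only on the inputs, removing a node that does not hold a given object leaves that object's placement unchanged, which is the property that lets this style of placement scale without a per-object index.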
Chapter 3 of the book shows how the Storage Ceph object storage feature set meets on-premises data lake and lakehouse requirements, with sections covering scale, performance, security, efficiency, resiliency, cost effectiveness, and management.
Chapter 6 of the book walks through a hands-on retail use case, covering step by step the setup and configuration of the required data pipelines and transformations. We start with raw, unstructured data (browser logs, transaction data, and customer feedback) and end up with easily consumable curated data that feeds visualization dashboards for interpreting and acting on the data.
Chapter 6 provides comprehensive, hands-on examples of configuring various Storage Ceph Object features.
This chapter serves not only as an illustrative use case but also as a detailed configuration guide. Here is a brief list of some features we cover:
- Object Storage S3 Lifecycle Policy Expiration, filtered by tags.
- Object Data at rest encryption using S3 SSE-KMS.
- Object Storage Secure Token Service. IDP authentication through SSO.
- Object Storage IAM Role RBAC-based Authorization.
- Object Storage Bucket Notification Integration with Knative & Kafka.
- S3 Object Lock and Versioning Configuration.
- Object Multi-Factor Authentication delete.
- Object Cold Storage Classes, Data Tiering Configuration.
- Object Early data query and filtering with S3 Select.
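As a flavor of the first item in the list, the following Python sketch builds an S3 lifecycle rule that expires only objects carrying a given tag. It is an illustration under stated assumptions, not code from the book or its repository: the helper names, the rule ID, the `processed=true` tag, and the 30-day period are all hypothetical. The rule dictionary follows the shape the standard S3 `PutBucketLifecycleConfiguration` call expects, and `apply_lifecycle` assumes the third-party `boto3` package plus credentials configured in the environment.

```python
def tag_expiration_rule(rule_id: str, tag_key: str, tag_value: str, days: int) -> dict:
    """Build one S3 lifecycle rule that expires objects matching a single tag."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Only objects tagged tag_key=tag_value are affected by this rule.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Expiration": {"Days": days},
    }


def apply_lifecycle(endpoint_url: str, bucket: str, rules: list[dict]) -> None:
    """Send the rules to an S3-compatible endpoint (assumes boto3 is installed)."""
    import boto3  # imported lazily so the rule builder works without it

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Hypothetical rule: expire objects tagged processed=true after 30 days.
rule = tag_expiration_rule("expire-processed-logs", "processed", "true", 30)
print(rule)
```

Calling `apply_lifecycle` with the object gateway endpoint, a bucket name, and `[rule]` would then install the policy on that bucket.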
The book also has an associated GitHub repo containing the code and configuration examples used in Chapter 6: https://github.com/IBMRedbooks/IBM-Storage-Ceph-as-a-Data-Lakehouse-platform-for-IBM-watsonx.data-and-Beyond
Two other IBM Redbooks on IBM Storage Ceph are already available:
- IBM Storage Ceph Concepts and Architecture Guide. This book is a perfect place to start your Ceph Journey.
- IBM Storage Ceph Solutions Guide. This book provides a comprehensive list of hands-on examples for different Ceph use cases.
For more information, visit the official IBM Redbooks website.