
Big data : principles and best practices of scalable real-time data systems / Nathan Marz and James Warren.

By: Marz, Nathan
Contributor(s): Warren, James
Material type: Text
Publisher: Shelter Island, NY : Manning, 2015
Copyright date: ©2015
Description: xx, 308 pages : illustrations ; 24 cm
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
ISBN:
  • 1617290343
  • 9781617290343
Subject(s):
DDC classification:
  • 658.4038 23
Contents:
1. A new paradigm for Big Data -- Part 1. Batch layer : -- 2. Data model for Big Data -- 3. Data model for Big Data: Illustration -- 4. Data storage on the batch layer -- 5. Data storage on the batch layer: Illustration -- 6. Batch layer -- 7. Batch layer: Illustration -- 8. An example batch layer: Architecture and algorithms -- 9. An example batch layer: Implementation -- Part 2. Serving layer : -- 10. Serving layer -- 11. Serving layer: Illustration -- Part 3. Speed layer : -- 12. Realtime views -- 13. Realtime views: Illustration -- 14. Queuing and stream processing -- 15. Queuing and stream processing: Illustration -- 16. Micro-batch stream processing -- 17. Micro-batch stream processing: Illustration -- 18. Lambda Architecture in depth -- --
1. A new paradigm for Big Data -- 1.1. How this book is structured -- 1.2. Scaling with a traditional database -- 1.3. NoSQL is not a panacea -- 1.4. First principles -- 1.5. Desired properties of a Big Data system -- 1.6. The problems with fully incremental architectures -- 1.7. Lambda Architecture -- 1.8. Recent trends in technology -- 1.9. Example application: SuperWebAnalytics.com -- 1.10. Summary -- -- Part 1. Batch layer : -- -- 2. Data model for Big Data -- 2.1. The properties of data -- 2.2. The fact-based model for representing data -- 2.3. Graph schemas -- 2.4. A complete data model for SuperWebAnalytics.com -- 2.5. Summary -- -- 3. Data model for Big Data: Illustration -- 3.1. Why a serialization framework? -- 3.2. Apache Thrift -- 3.3. Limitations of serialization frameworks -- 3.4. Summary -- -- 4. Data storage on the batch layer -- 4.1. Storage requirements for the master dataset -- 4.2. Choosing a storage solution for the batch layer -- 4.3. How distributed filesystems work -- 4.4. Storing a master dataset with a distributed filesystem -- 4.5. Vertical partitioning -- 4.6. Low-level nature of distributed filesystems -- 4.7. Storing the SuperWebAnalytics.com master dataset on a distributed filesystem -- 4.8. Summary -- -- 5. Data storage on the batch layer: Illustration -- 5.1. Using the Hadoop Distributed File System -- 5.2. Data storage in the batch layer with Pail -- 5.3. Storing the master dataset for SuperWebAnalytics.com -- 5.4. Summary -- -- 6. Batch layer -- 6.1. Motivating examples -- 6.2. Computing on the batch layer -- 6.3. Recomputation algorithms vs. incremental algorithms -- 6.4. Scalability in the batch layer -- 6.5. MapReduce: a paradigm for Big Data computing -- 6.6. Low-level nature of MapReduce -- 6.7. Pipe diagrams: a higher-level way of thinking about batch computation -- 6.8. Summary -- -- 7. Batch layer: Illustration -- 7.1. An illustrative example -- 7.2. Common pitfalls of data-processing tools -- 7.3. 
An introduction to JCascalog -- 7.4. Composition -- 7.5. Summary -- -- 8. An example batch layer: Architecture and algorithms -- 8.1. Design of the SuperWebAnalytics.com batch layer -- 8.2. Workflow overview -- 8.3. Ingesting new data -- 8.4. URL normalization -- 8.5. User-identifier normalization -- 8.6. Deduplicate pageviews -- 8.7. Computing batch views -- 8.8. Summary -- -- 9. An example batch layer: Implementation -- 9.1. Starting point -- 9.2. Preparing the workflow -- 9.3. Ingesting new data -- 9.4. URL normalization -- 9.5. User-identifier normalization -- 9.6. Deduplicate pageviews -- 9.7. Computing batch views -- 9.8. Summary -- -- Part 2. Serving layer : -- -- 10. Serving layer -- 10.1. Performance metrics for the serving layer -- 10.2. The serving layer solution to the normalization/denormalization problem -- 10.3. Requirements for a serving layer database -- 10.4. Designing a serving layer for SuperWebAnalytics.com -- 10.5. Contrasting with a fully incremental solution -- 10.6. Summary -- -- 11. Serving layer: Illustration -- 11.1. Basics of ElephantDB -- 11.2. Building the serving layer for SuperWebAnalytics.com -- 11.3. Summary -- -- Part 3. Speed layer : -- -- 12. Realtime views -- 12.1. Computing realtime views -- 12.2. Storing realtime views -- 12.3. Challenges of incremental computation -- 12.4. Asynchronous versus synchronous updates -- 12.5. Expiring realtime views -- 12.6. Summary -- -- 13. Realtime views: Illustration -- 13.1. Cassandra's data model -- 13.2. Using Cassandra -- 13.3. Summary -- -- 14. Queuing and stream processing -- 14.1. Queuing -- 14.2. Stream processing -- 14.3. Higher-level, one-at-a-time stream processing -- 14.4. SuperWebAnalytics.com speed layer -- 14.5. Summary -- -- 15. Queuing and stream processing: Illustration -- 15.1. Defining topologies with Apache Storm -- 15.2. Apache Storm clusters and deployment -- 15.3. Guaranteeing message processing -- 15.4. 
Implementing the SuperWebAnalytics.com uniques-over-time speed layer -- 15.5. Summary -- -- 16. Micro-batch stream processing -- 16.1. Achieving exactly-once semantics -- 16.2. Core concepts of micro-batch stream processing -- 16.3. Extending pipe diagrams for micro-batch processing -- 16.4. Finishing the speed layer for SuperWebAnalytics.com -- 16.5. Pageviews over time -- Bounce-rate analysis -- 16.6. Another look at the bounce-rate-analysis example -- 16.7. Summary -- -- 17. Micro-batch stream processing: Illustration -- 17.1. Using Trident -- 17.2. Finishing the SuperWebAnalytics.com speed layer -- 17.3. Fully fault-tolerant, in-memory, micro-batch processing -- 17.4. Summary -- -- 18. Lambda Architecture in depth -- 18.1. Defining data systems -- 18.2. Batch and serving layers -- 18.3. Speed layer -- 18.4. Query layer -- 18.5. Summary.
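The chapter outline above is organized around the Lambda Architecture's three layers (batch, serving, speed). As a minimal sketch of the core idea those layers serve, a query merges a precomputed batch view with a realtime view covering events that arrived since the last batch run. All names and data here are illustrative, not the book's actual API or examples.

```python
# Sketch of the Lambda Architecture query pattern: combine a batch view
# (precomputed from the master dataset) with a realtime view (covering
# recent, not-yet-batch-processed events). Names are hypothetical.

def merge_counts(batch_view: dict, realtime_view: dict) -> dict:
    """Merge per-key counts from the batch and speed layers."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

# Example: pageviews per URL, in the spirit of the book's
# SuperWebAnalytics.com running example.
batch_view = {"/home": 1000, "/about": 250}
realtime_view = {"/home": 7, "/contact": 3}

print(merge_counts(batch_view, realtime_view))
# {'/home': 1007, '/about': 250, '/contact': 3}
```

The split keeps the batch layer simple (recompute from scratch) while the speed layer only has to stay correct for the small window of recent data.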
Summary: "Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built."--Publisher's website.

Includes index.


