
Big data : principles and best practices of scalable real-time data systems / Nathan Marz and James Warren.

By: Marz, Nathan
Contributor(s): Warren, James
Material type: Text
Publisher: Shelter Island, NY : Manning, 2015
Copyright date: ©2015
Description: xx, 308 pages : illustrations ; 24 cm
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
ISBN:
  • 1617290343
  • 9781617290343
DDC classification:
  • 658.4038 23
Contents:
1. A new paradigm for Big Data -- Part 1. Batch layer : -- 2. Data model for Big Data -- 3. Data model for Big Data: Illustration -- 4. Data storage on the batch layer -- 5. Data storage on the batch layer: Illustration -- 6. Batch layer -- 7. Batch layer: Illustration -- 8. An example batch layer: Architecture and algorithms -- 9. An example batch layer: Implementation -- Part 2. Serving layer : -- 10. Serving layer -- 11. Serving layer: Illustration -- Part 3. Speed layer : -- 12. Realtime views -- 13. Realtime views: Illustration -- 14. Queuing and stream processing -- 15. Queuing and stream processing: Illustration -- 16. Micro-batch stream processing -- 17. Micro-batch stream processing: Illustration -- 18. Lambda Architecture in depth.
1. A new paradigm for Big Data -- 1.1. How this book is structured -- 1.2. Scaling with a traditional database -- 1.3. NoSQL is not a panacea -- 1.4. First principles -- 1.5. Desired properties of a Big Data system -- 1.6. The problems with fully incremental architectures -- 1.7. Lambda Architecture -- 1.8. Recent trends in technology -- 1.9. Example application: SuperWebAnalytics.com -- 1.10. Summary -- -- Part 1. Batch layer : -- -- 2. Data model for Big Data -- 2.1. The properties of data -- 2.2. The fact-based model for representing data -- 2.3. Graph schemas -- 2.4. A complete data model for SuperWebAnalytics.com -- 2.5. Summary -- -- 3. Data model for Big Data: Illustration -- 3.1. Why a serialization framework? -- 3.2. Apache Thrift -- 3.3. Limitations of serialization frameworks -- 3.4. Summary -- -- 4. Data storage on the batch layer -- 4.1. Storage requirements for the master dataset -- 4.2. Choosing a storage solution for the batch layer -- 4.3. How distributed filesystems work -- 4.4. Storing a master dataset with a distributed filesystem -- 4.5. Vertical partitioning -- 4.6. Low-level nature of distributed filesystems -- 4.7. Storing the SuperWebAnalytics.com master dataset on a distributed filesystem -- 4.8. Summary -- -- 5. Data storage on the batch layer: Illustration -- 5.1. Using the Hadoop Distributed File System -- 5.2. Data storage in the batch layer with Pail -- 5.3. Storing the master dataset for SuperWebAnalytics.com -- 5.4. Summary -- -- 6. Batch layer -- 6.1. Motivating examples -- 6.2. Computing on the batch layer -- 6.3. Recomputation algorithms vs. incremental algorithms -- 6.4. Scalability in the batch layer -- 6.5. MapReduce: a paradigm for Big Data computing -- 6.6. Low-level nature of MapReduce -- 6.7. Pipe diagrams: a higher-level way of thinking about batch computation -- 6.8. Summary -- -- 7. Batch layer: Illustration -- 7.1. An illustrative example -- 7.2. Common pitfalls of data-processing tools -- 7.3. An introduction to JCascalog -- 7.4. Composition -- 7.5. Summary -- -- 8. An example batch layer: Architecture and algorithms -- 8.1. Design of the SuperWebAnalytics.com batch layer -- 8.2. Workflow overview -- 8.3. Ingesting new data -- 8.4. URL normalization -- 8.5. User-identifier normalization -- 8.6. Deduplicate pageviews -- 8.7. Computing batch views -- 8.8. Summary -- -- 9. An example batch layer: Implementation -- 9.1. Starting point -- 9.2. Preparing the workflow -- 9.3. Ingesting new data -- 9.4. URL normalization -- 9.5. User-identifier normalization -- 9.6. Deduplicate pageviews -- 9.7. Computing batch views -- 9.8. Summary -- -- Part 2. Serving layer : -- -- 10. Serving layer -- 10.1. Performance metrics for the serving layer -- 10.2. The serving layer solution to the normalization/denormalization problem -- 10.3. Requirements for a serving layer database -- 10.4. Designing a serving layer for SuperWebAnalytics.com -- 10.5. Contrasting with a fully incremental solution -- 10.6. Summary -- -- 11. Serving layer: Illustration -- 11.1. Basics of ElephantDB -- 11.2. Building the serving layer for SuperWebAnalytics.com -- 11.3. Summary -- -- Part 3. Speed layer : -- -- 12. Realtime views -- 12.1. Computing realtime views -- 12.2. Storing realtime views -- 12.3. Challenges of incremental computation -- 12.4. Asynchronous versus synchronous updates -- 12.5. Expiring realtime views -- 12.6. Summary -- -- 13. Realtime views: Illustration -- 13.1. Cassandra's data model -- 13.2. Using Cassandra -- 13.3. Summary -- -- 14. Queuing and stream processing -- 14.1. Queuing -- 14.2. Stream processing -- 14.3. Higher-level, one-at-a-time stream processing -- 14.4. SuperWebAnalytics.com speed layer -- 14.5. Summary -- -- 15. Queuing and stream processing: Illustration -- 15.1. Defining topologies with Apache Storm -- 15.2. Apache Storm clusters and deployment -- 15.3. Guaranteeing message processing -- 15.4. Implementing the SuperWebAnalytics.com uniques-over-time speed layer -- 15.5. Summary -- -- 16. Micro-batch stream processing -- 16.1. Achieving exactly-once semantics -- 16.2. Core concepts of micro-batch stream processing -- 16.3. Extending pipe diagrams for micro-batch processing -- 16.4. Finishing the speed layer for SuperWebAnalytics.com -- 16.5. Pageviews over time -- Bounce-rate analysis -- 16.6. Another look at the bounce-rate-analysis example -- 16.7. Summary -- -- 17. Micro-batch stream processing: Illustration -- 17.1. Using Trident -- 17.2. Finishing the SuperWebAnalytics.com speed layer -- 17.3. Fully fault-tolerant, in-memory, micro-batch processing -- 17.4. Summary -- -- 18. Lambda Architecture in depth -- 18.1. Defining data systems -- 18.2. Batch and serving layers -- 18.3. Speed layer -- 18.4. Query layer -- 18.5. Summary.
Summary: "Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built."--Publisher's website.
Holdings
Item type | Current library | Call number | Copy number | Status | Date due | Barcode
Book | City Campus, Main Collection | 658.4038 MAR | 1 | Available | | A507726B
Book | City Campus, Main Collection | 658.4038 MAR | 1 | Available | | A556463B

Includes index.

