Dreamhost | Previous | 2009-07-02 Thu | Next |

2009-07-02 Thu

10:28 It Must be Crap on Relational Dabases Week (1576 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

It's hard to be a relational database lately. After years of faithful service everywhere you look the world is turning against you:

  • Recently at the NoSQL conference 150 revolutionaries met with their new anti-RDBMS arms suppliers. And you know what happens when revolutionaries are motivated, educated, funded, and well armed.
  • The revolution has gone mainstream when Computerworld writes No to SQL? Anti-database movement gains steam. It's not just whispers anymore, it's everywhere.
  • And perennial revolutionary Michael Stonebraker runs from blog to blog shouting the The End of a DBMS Era (Might be Upon Us). Relational vendors are selling legacy software, are 50x slower than other alternatives, and that can not stand.
  • The Greek Chorus on Hacker News sings of anger and lies.
  • Certainly some say stick with the past. It's your fault, you aren't doing it right, give us another chance and all will be as it ever was. Some smirk saying this is nothing but a return to a more ancient time when IBM was King.

    But it's in the air. It's in the code. A revolution is coming. To what? That is what is not yet clear.

    07:43 Product: Hbase (2574 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update 3: Presentation from the NoSQL Conference: slides, video.
    Update 2: Jim Wilson helps with the Understanding HBase and BigTable by explaining them from a "conceptual standpoint."
    Update: InfoQ interview: HBase Leads Discuss Hadoop, BigTable and Distributed Databases. "MapReduce (both Google's and Hadoop's) is ideal for processing huge amounts of data with sizes that would not fit in a traditional database. Neither is appropriate for transaction/single request processing."

    Hbase is the open source answer to BigTable, Google's highly scalable distributed database. It is built on top of Hadoop (product), which implements functionality similar to Google's GFS and Map/Reduce systems.

    read more

    07:38 Hypertable is a New BigTable Clone that Runs on HDFS or KFS (2979 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update 3: Presentation from the NoSQL conference: slides, video 1, video 2.
    Update 2: The folks at Hypertable would like you to know that Hypertable is now officially sponsored by Baidu, China’s Leading Search Engine. As a sponsor of Hypertable, Baidu has committed an industrious team of engineers, numerous servers, and support
    resources to improve the quality and development of the open source technology.

    Update: InfoQ interview on Hypertable Lead Discusses Hadoop and Distributed Databases. Hypertable differs from HBase in that it is a higher performance implementation of Bigtable.

    Skrentablog gives the heads up on Hypertable, Zvents' open-source BigTable clone. It's written in C++ and can run on top of either HDFS or KFS. Performance looks encouraging at 28M rows of data inserted at a per-node write rate of 7mb/sec.

    07:31 Product: Facebook's Cassandra - A Massive Distributed Store (3514 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update 2: Presentation from the NoSQL conference: slides, video.
    Update: Why you won't be building your killer app on a distributed hash table by Jonathan Ellis. Why I think Cassandra is the most promising of the open-source distributed databases --you get a relatively rich data model and a distribution model that supports efficient range queries. These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.

    James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used in Facebook as an email search system containing 25TB and over 100m mailboxes.

  • Google Code for Cassandra - A Structured Storage System on a P2P Network
  • SIGMOD 2008 Presentation.
  • Video Presentation at Facebook
  • Facebook Engineering Blog for Cassandra
  • Anti-RDBMS: A list of distributed key-value stores
  • Facebook Cassandra Architecture and Design by James Hamilton
  • 07:02 Product: Project Voldemort - A Distributed Database (839 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update: Presentation from the NoSQL conference: slides, video 1, video 2.

    Project Voldemort is an open source implementation of the basic parts of Dynamo (Amazon’s Highly Available Key-value Store) distributed key-value storage system. LinkedIn is using it in their production environment for "certain high-scalability storage problems where simple functional partitioning is not sufficient."

    read more

    06:53 Anti-RDBMS: A list of distributed key-value stores (1546 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update 2: They are now called NoSQL databases. So keep up! Eric Lai wrote a good article in Computerworld No to SQL? Anti-database movement gains steam about the phenomena. There was even a NoSQL conference. It was unfortunately full by the time I wanted to sign up, but there are presentations by all the major players. Nice Hacker News thread too.
    Update: Some Notes on Distributed Key Stores by Leonard Lin. What's the best way to handle a fast growing system with 100M items that requires low latency and lots of inserts? Leanord takes a trip through several competing systems. The winner was: Tokyo Cabinet.

    Richard Jones has put together a very nice list of various key-value stores around the internets. The list includes: Project Voldemort, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, Cassandra, HBase, and Hypertable. Richard also includes some commentary and their basic components (language, fault tolerance, persistence, client protocol, data model, docs, community).

    There's an excellent discussion in the comments of Paxos vs Vector Clock techniques for synchronizing writes in the face of network failures.

    2009-07-01 Wed

    21:30 Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases (569 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    In this podcast, we interview Jonathan Ellis about how Facebook's open sourced Cassandra Project took lessons learned from Amazon's Dynamo and Google's BigTable to tackle the difficult problem of building a highly scalable, always available, distributed data store.

    21:10 Gathering queries from a server with Maatkit and tcpdump (27320 Bytes) » MySQL Performance Blog

    For the last couple of months, we've been quietly developing a MySQL protocol parser for Maatkit. It isn't an implementation of the protocol: it's an observer of the protocol. This lets us gather queries from servers that don't have a slow query log enabled, at very high time resolution.

    With this new functionality, it becomes possible for mk-query-digest to stand on the sidelines and watch queries fly by over TCP. It is only an observer on the sidelines: it is NOT a man in the middle like mysql-proxy, so it has basically zero impact on the running server (tcpdump is very efficient) and zero impact on the query latency. There are some unique challenges to watching an entire server's traffic, but we've found ways to solve those. Some of them are harder than others, such as making sense of a conversation when you start listening in the middle. In general, it's working very well. We can gather just about every bit of information about queries that mysql-proxy can, making this a viable way to monitor servers without the disadvantages of a proxy. In theory, we can gather ALL the same information, but in practice we are going for the 95% case.

    As always with Maatkit, this has minimal dependencies. It doesn't require any Net::Pcap or other modules from CPAN. It's written in pure Perl, and it parses the output of tcpdump, rather than watching the network traffic directly. This might sound useless, but it's not. It means you can go tcpdump some traffic on a machine without Perl installed, and copy it to another machine for analysis, or send it via email to your friendly consultant, or do any of a number of other things. Decoupling things is very helpful sometimes.

    Let's see how to gather queries and do something useful with them. I'll just watch the queries on a sandbox server on my laptop, and print out the profile synopsis so you can see how it works.

    CODE:
    1. sudo tcpdump -i lo port 3306 -s 65535 -x -n -q -tttt> tcpdump.out

    I run a few queries, quit, and cancel tcpdump. Now I've got a file and I'm ready to analyze it. Let's see:

    CODE:
    1. mk-query-digest --type=tcpdump --report-format=profile tcpdump.out
    2. # Rank Query ID           Response time    Calls   R/Call     Item
    3. # ==== ================== ================ ======= ========== ====
    4. #    1 0x088084BF139954D8     0.1009 90.2%       1   0.100881 SELECT dual
    5. #    2 0x261703E684370D2C     0.0110  9.8%       1   0.011017

    I'm kind of showing off the summary profile here to illustrate that you can get really compact results to see what's going on inside your server. What do you suppose that one query was that took a tenth of a second? We can find out.

    CODE:
    1. mk-query-digest --type=tcpdump --limit 1 tcpdump.out
    2. # 460ms user time, 30ms system time, 8.88M rss, 11.79M vsz
    3. # Overall: 8 total, 6 unique, 0.15 QPS, 0.00x concurrency ________________
    4. #                    total     min     max     avg     95%  stddev  median
    5. # Exec time          114ms       0   101ms    14ms   100ms    33ms   737us
    6. # Hosts                  8
    7. # Time range        2009-07-01 23:55:41.334082 to 2009-07-01 23:56:33.929945
    8. # bytes                215      14      49   26.88   46.83    9.76   26.08
    9. # Errors                 8
    10. # Rows affe              0       0       0       0       0       0       0
    11. # Warning c              0       0       0       0       0       0       0
    12. #   0% No_good_index_used
    13. 12% No_index_used
    14.  
    15. # Query 1: 0 QPS, 0x concurrency, ID 0x088084BF139954D8 at byte 7517 _____
    16. # This item is included in the report because it matches --limit.
    17. #              pct   total     min     max     avg     95%  stddev  median
    18. # Count         12       1
    19. # Exec time     88   101ms   101ms   101ms   101ms   101ms       0   101ms
    20. # Users                  1 msandbox
    21. # Hosts                  1 127.0.0.1
    22. # Databases              1
    23. # Time range 2009-07-01 23:56:31.022602 to 2009-07-01 23:56:31.022602
    24. # bytes         22      49      49      49      49      49       0      49
    25. # Errors                 1    none
    26. # Rows affe      0       0       0       0       0       0       0       0
    27. # Warning c      0       0       0       0       0       0       0       0
    28. #   0% No_good_index_used
    29. #   0% No_index_used
    30. # Query_time distribution
    31. #   1us
    32. #  10us
    33. # 100us
    34. #   1ms
    35. #  10ms
    36. # 100ms  ################################################################
    37. #    1s
    38. #  10s+
    39. # Tables
    40. #    SHOW TABLE STATUS LIKE 'dual'\G
    41. #    SHOW CREATE TABLE `dual`\G
    42. # EXPLAIN
    43. select 1 from ( select sleep(.1) from dual ) as x\G

    Indeed, it's no surprise the query took a tenth of a second to execute, and now you see where "SELECT dual" comes from.

    Notice that it is inspecting the protocol enough to see the flags set in the protocol, indicating the warning count, error count, rows affected, and whether no index or no good index was available. Look at the top of the report -- what is up with the 12% of queries that say No_index_used? If we increase --limit a bit, we can see

    CODE:
    1. #   0% No_good_index_used
    2. # 100% No_index_used
    3. # Query_time distribution
    4. ... snip ...
    5. show databases\G

    I did not know that SHOW DATABASES sets the "no index used" flag, did you? Now we both do!

    This is just a brief introduction to what the protocol parser can do. Of course, in real life it's much more useful than just seeing a query or two -- it has all the power of mk-query-digest for filtering, aggregating, printing and so forth.


    Entry posted by Baron Schwartz | No comment

    Add to: delicious | digg | reddit | netscape | Google Bookmarks

    16:45 Apologies to those I might have offended :) (706 Bytes) » My SQL Dump
    I'm quite the emotional person, so I end up saying something along the lines of the subject to oh, say, two or three people, some days. I emotionally invest myself in things that I probably shouldn't, as all of us are probably guilty of that from time to time.

    I was particularly mean spirited in my recent post about the TPC-H(tm) benchmark when I suggested that an unnamed analyst should perhaps leave the industry behind. I didn't really mean that.

    I choose to give the TPC-H(tm) numbers significance, especially as an indicator of price/performance ratios. Kickfire kicks ass with a PP ratio of $0.81 per QPPH. This is very impressive. You can choose to disagree if you like.
    16:31 Latency is Everywhere and it Costs You Sales - How to Crush it (6671 Bytes) » High Scalability - Building bigger, faster, more reliable websites.

    Update 5: Shopzilla's Site Redo - You Get What You Measure. At the Velocity conference Phil Dixon, from Shopzilla, presented data showing a 5 second speed up resulted in a 25% increase in page views, a 10% increase in revenue, a 50% reduction in hardware, and a 120% increase traffic from Google. Built a new service oriented Java based stack. Keep it simple. Quality is a design decision. Obsessively easure everything. Used agile and built the site one page at a time to get feedback. Use proxies to incrementally expose users to new pages for A/B testing. Oracle Coherence Grid for caching. 1.5 second page load SLA. 650ms server side SLA. Make 30 parallel calls on server. 100 million requests a day. SLAs measure 95th percentile, averages not useful. Little things make a big difference.
    Update 4: Slow Pages Lose Users. At the Velocity Conference Jake Brutlag (Google Search) and Eric Schurman (Microsoft Bing) presented study data showing delays under half a second impact business metrics and delay costs increase over time and persist. Page weight not key. Progressive rendering helps a lot.
    Update 3: Nati Shalom's Take on this article. Lots of good stuff on designing architectures for latency minimization.
    Update 2: Why Latency Lags Bandwidth, and What it Means to Computing by David Patterson. Reasons: Moore's Law helps BW more than latency; Distance limits latency; Bandwidth easier to sell; Latency help BW, but not vice versa; Bandwidth hurts latency; OS overhead hurts latency more than BW. Three ways to cope: Caching, Replication, Prediction. We haven't talked about prediction. Games use prediction, i.e, project where a character will go, but it's not a strategy much used in websites.
    Update: Efficient data transfer through zero copy. Copying data kills. This excellent article explains the path data takes through the OS and how to reduce the number of copies to the big zero.

    Latency matters. Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.

    read more

    08:09 xtrabackup-0.8 » MySQL Performance Blog

    2009-06-30 Tue

    11:38 Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines » High Scalability - Building bigger, faster, more reliable websites.

    2009-06-29 Mon

    20:21 Few more ideas for InnoDB features » MySQL Performance Blog
    14:31 eHarmony.com describes how they use Amazon EC2 and MapReduce » High Scalability - Building bigger, faster, more reliable websites.

    2009-06-26 Fri

    2009-06-25 Thu

    14:00 What is “Network Delay”? » I'm just a simple DBA on a complex production system
     123
     123