Adding MySQL Slow Query Logs to Logstash
May 6, 2014 |

Logstash is a great tool for consolidating logs and making them easy to analyze. Adding multiline log files such as MySQL's slow query log, however, can be a bit of a challenge. Once you have, asking questions like "how many of my slow queries last week involved the taxonomy_hierarchy table and came from user slow_query_writer" becomes a snap. There are a few resources for this online1, but none that I found worked out of the box. Here is how to set up a logstash recipe for MySQL's slow query logs. If you are using a simple logstash instance, you may want to put the snippets below into /etc/logstash/logstash.conf.

This was developed using logstash 1.2.2 and MySQL 5.5 on an Ubuntu 12.04 LTS system. The recipe also appears to work with logstash 1.3.3.

To handle MySQL's slow query log format, the multiline codec must be used in the input section of the configuration file. Some of the examples I found show multiple multiline entries. Examination of the multiline codec and testing shows, however, that a single entry must be used to break up each record2. Due to the way the matching works, a negated "previous" entry must be used so that anything which does not match the record boundary is buffered.

[ruby]input {
  file {
    type => "mysql-slow"
    path => "/var/log/mysql/mysql-slow.log"

    # Key breaking the log up on the # User@Host line, this will mean
    # sometimes a # Time line from the next entry will be incorrectly
    # included but since that isn't consistently present it can't be
    # keyed off of
    # Due to the way the multiline codec works, previous must be used
    # to collect everything which isn't the User line up. Since
    # queries can be multiline the User line can't be pushed forward
    # as it would only collect the first line of the actual query
    # data.
    # logstash will always be one slow query behind because it needs
    # the User line to trigger that it is done with the previous entry.
    # A periodic "SELECT SLEEP(1);" where 1 is above the slow query
    # threshold can be used to "flush" these events through at the
    # expense of having spurious log entries (see the drop filter)
    codec => multiline {
      pattern => "^# User@Host:"
      negate => true
      what => previous
    }
  }
}

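To see why the codec ends up one event behind, the negate/previous buffering can be sketched in plain Ruby (this is an illustration, not the codec's actual implementation; the `group_slow_log` helper is hypothetical):

```ruby
# Sketch of negate+previous buffering: every line that does NOT match the
# pattern is appended to the current buffer; a matching "# User@Host:" line
# flushes the buffered previous record as one event.
def group_slow_log(lines)
  events = []
  buffer = []
  lines.each do |line|
    if line =~ /^# User@Host:/ && !buffer.empty?
      events << buffer.join("\n")   # flush the completed previous record
      buffer = [line]               # start buffering the next record
    else
      buffer << line                # non-matching lines join the current record
    end
  end
  # For demonstration we flush the trailing buffer here; the real codec
  # holds it until the next "# User@Host:" line arrives, which is why
  # logstash stays one slow query behind the end of the file.
  events << buffer.join("\n") unless buffer.empty?
  events
end
```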
This has the side effect that recorded events are always one behind the last entry in the slow query log file. See the comments in the code for more details and a tip on how to handle this if it is an issue for you. Now that each record is gathered into a single event, various filters can be used to parse interesting fields out of it. Many of these are straight data captures. The last three allow setting the exact time of the event based on the logged UNIX timestamp. This can be handy if you just want to periodically rotate a log file and then pull it into logstash instead of monitoring it continuously. Without this change, logstash would use a timestamp based on when it processed the item from the log.

[ruby]filter {
  # Capture user, optional host and optional ip fields
  # sample log file lines:
  # User@Host: logstash[logstash] @ localhost []
  # User@Host: logstash[logstash] @  []
  grok {
    match => [ "message", "^# User@Host: %{USER:user}(?:\[[^\]]+\])?\s+@\s+%{HOST:host}?\s+\[%{IP:ip}?\]" ]
  }

  # Capture query time, lock time, rows returned and rows examined
  # sample log file lines:
  # Query_time: 102.413328  Lock_time: 0.000167 Rows_sent: 0  Rows_examined: 1970
  # Query_time: 1.113464  Lock_time: 0.000128 Rows_sent: 1  Rows_examined: 0
  grok {
    match => [ "message", "^# Query_time: %{NUMBER:duration:float}\s+Lock_time: %{NUMBER:lock_wait:float}\s+Rows_sent: %{NUMBER:results:int}\s+Rows_examined: %{NUMBER:scanned:int}" ]
  }

  # Capture the time the query happened
  grok {
    match => [ "message", "^SET timestamp=%{NUMBER:timestamp};" ]
  }

  # Set the event time based on the time of the query and
  # not the time the item got logged
  date {
    match => [ "timestamp", "UNIX" ]
  }

  # Drop the captured timestamp field since it has been moved to the
  # time of the event
  mutate {
    remove_field => "timestamp"
  }
}

Once the multiline codec has broken the log up into one event per record, extracting the fields you are interested in is fairly straightforward. The configuration above results in the overall SQL query being the primary message for the log item, with particular elements pulled out as fields. Here's a sample of what you see using the stdout debugging output:

{
      "@timestamp" => "2014-03-04T19:59:06.000Z",
         "message" => "# User@Host: logstash[logstash] @ localhost []\n# Query_time: 5.310431  Lock_time: 0.029219 Rows_sent: 1  Rows_examined: 24575727\nSET timestamp=1393963146;\nselect count(*) from node join variable order by rand();\n# Time: 140304 19:59:14",
        "@version" => "1",
            "tags" => [
                [0] "multiline"
            ],
            "type" => "mysql-slow",
            "host" => [
                [0] "cj",
                [1] "localhost"
            ],
            "path" => "/var/log/mysql/mysql-slow.log",
            "user" => "root",
              "ip" => "",
        "duration" => 5.310431,
       "lock_wait" => 0.029219,
         "results" => 1,
         "scanned" => 24575727,
    "date_matched" => "yes"
}
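Note how `@timestamp` comes from the `SET timestamp=1393963146;` line rather than the processing time. The date filter's UNIX conversion is equivalent to:

```ruby
# Convert the slow log's "SET timestamp=..." epoch seconds the same way
# the date filter does, yielding the event's @timestamp in UTC.
ts = 1393963146  # from "SET timestamp=1393963146;" in the sample record
puts Time.at(ts).utc.strftime("%Y-%m-%dT%H:%M:%S.%LZ")
# => 2014-03-04T19:59:06.000Z, matching @timestamp in the output above
```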

One final tip: if you go the route of issuing periodic SLEEP calls to flush logging, as hinted at in the multiline codec comments, you can add the following at the top of your filter so those events are dropped rather than imported into logstash.

[ruby]filter {
  # drop sleep events
  grok {
    match => { "message" => "SELECT SLEEP" }
    drop_if_match => true # appears ineffective, thus tag and conditional
    add_tag => [ "sleep_drop" ]
    tag_on_failure => [] # prevent default _grokparsefailure tag on real records
  }
  if "sleep_drop" in [tags] {
    drop {}
  }

  # ... previous filter items here ...
}
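The tag-and-conditional-drop logic boils down to a simple two-pass filter, sketched here in plain Ruby over hypothetical event hashes (not logstash's Event API):

```ruby
# Tag events whose message contains the flush query, then reject the
# tagged events so only real slow queries remain.
def drop_sleeps(events)
  events.each do |e|
    e[:tags] << "sleep_drop" if e[:message].include?("SELECT SLEEP")
  end
  events.reject { |e| e[:tags].include?("sleep_drop") }
end
```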


  1. This recipe was derived from info at and
  2. See the decode, do_next and do_previous handler functions, which match, buffer and flush based on whether the pattern matched. If you attempt multiple patterns, for example to push the "# Time: ..." portion forward into the next record, logstash triggers a flush of the record whenever it encounters a line that does not match.

CJ is a seasoned hands-on web developer and enterprise architect whose specialties include high-performance database-backed content management systems, large-scale Drupal systems, complex multi-system integrations, and continuous integration and delivery pipelines. He ensures our Engineering team excels at solving problems and applying the right technology, or none at all, to the job at hand.
