James Dabbs

TracePoint by Example

30 Nov 2016

TracePoint is a uniquely powerful sledgehammer for introspecting Ruby programs. In this post, we’ll take a look at what tracing can do by sketching a couple utilities.

The basic idea behind TracePoint is registering low-level hooks in the Ruby VM to be able to react to events throughout our program. Possible “hooks” include “every time a thread starts”, “every time a block is called”, “every new line” and many more, which allow you to implement some surprisingly wide-reaching features.

I haven’t profiled any of the following (but you could build a basic profiler with TracePoint!), and would have some concerns about over-tracing or robust error handling for production code, but these sorts of traces can be great for exploring how a program executes in development, as the following examples suggest -

Watching for monkeypatching

Monkeypatching is the act of reopening an existing class to add your own functionality. While it’s neat and can be helpful for small projects, it’s often considered a bad idea as it can cause some confusing collisions.

We can use TracePoint’s class-start and -end events to watch for changes in defined methods, and alert (or raise) if new methods are added -

# Not required, but `colorize` is a nice way to prettify our
#   output. It adds e.g. `red` and `light_blue` methods to
#   String (amusingly, a monkeypatch)
require 'colorize'

module Monkeys
  Methods = {}

  # This registers a trace point hook
  # Note that this trace is _not_ active
  #   until we explicitly enable it
  Start = TracePoint.new :class do |t|
    # This hook fires any time a class or module is opened
    #   and `t.self` will be that class or module
    # Let's just focus on Classes for now
    next unless t.self.is_a?(Class)

    # Store the names of methods that the class defines
    Methods[t.self] = t.self.instance_methods
  end

  # This hook fires when a module or class hits its `end`
  #   and sets `t.self` similarly
  End = TracePoint.new :end do |t|
    next unless t.self.is_a?(Class)

    # Compare the methods that the class currently has
    #   with the ones it had when we opened it
    old     = Methods[t.self]
    current = t.self.instance_methods
    new     = current - old

    if old.any? && new.any? # Ignore the first opening
      warn "#{t.self.to_s.light_blue} has added"
      new.each do |method|
        warn " #{'*'.light_black} #{method}"
      end
      warn "#{'in'.light_black} #{t.path}:#{t.lineno}"
      warn ""
    end
  end

  # This is how we start and stop the previous hooks
  def self.watch
    Start.enable
    End.enable
  end
  def self.stop
    Start.disable
    End.disable
  end
end

Example useage:

Monkeys.watch

class String
  def is_palindrome?
    self == reverse
  end
end

Monkeys.stop

which produces

String has added
 * is_palindrome?
in <file>

Tracing Exception Handling

This is the example that inspired this post. I wanted to better understand where Rails’ exception backtrace logging was coming from, but - unsurprisingly - the backtrace itself wasn’t much help. So how do you answer a question like “where does this error go after it gets raised” in a system you’re not as familiar with? Well, you can trace and examine where methods on that error object are called.

class Trace
  class Calls
    # `item` is the object that we want to spy on
    def initialize item, methods: nil
      @values  = {}
      @files   = {}
      @callers = {}

      @trace = TracePoint.new :return do |t|
        # This hook fires every time a method returns and
        #   `t.self` is the object which defines the method
        # We only care about the item we're watching
        next unless item == t.self

        # We might only care about a known subset of methods
        next unless methods.nil? || methods.include?(t.method_id)

        # Record all of the return values for a given method,
        #   along with how many times the given value was
        #   returned
        @values[t.method_id] ||= {}
        @values[t.method_id][t.return_value] ||= 0
        @values[t.method_id][t.return_value]  += 1

        # We can get the binding at the point of call
        #   and use that to `eval` any expression we
        #   want.
        # In this case, we want to examine the call
        #   stack and record which files and lines
        #   show up often
        t.binding.eval('caller').each do |line|
          file = line.split(':').first
          @files[file] ||= 0
          @files[file]  += 1

          @callers[line] ||= 0
          @callers[line]  += 1
        end
      end

      @trace.enable
    end

    def disable
      @trace.disable
    end

    # Show which files contain code which call into the
    #   traced object, sorted by frequency of file
    def files
      @files.sort_by { |file, hits| -hits }.to_h
    end

    # Similarly, but by frequency of the exact call site
    #   (including line number)
    def callers
      @callers.sort_by { |file, hits| -hits }.to_h
    end

    # Which methods on this object were used (during this watch)?
    def called_methods
      @values.keys
    end

    # What values were returned by a given method (and how often)
    def values method
      @values[method]
    end
  end

  # Some utilities for starting a call trace and storing the results
  #   for later inspection
  Traces = {}

  def self.watch label, item, methods: nil
    Traces[label] = Trace::Calls.new item, methods: methods
    item
  end

  def self.stop label
    Traces.delete(label).tap(&:disable)
  end
end

You could use this to explore error handling in a Rails app by adding actions like

class MyController < ApplicationController
  def error
    raise Trace.watch "error", RuntimeError.new("whoops")
  end

  def inspect
    calls = Trace.stop "error"
    binding.pry # to inspect
  end
end

In this particular case, we see something like

[1] pry(#<MyController>)> calls.files.first 5
=> [["/Users/james/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.1/lib/action_dispatch/middleware/exception_wrapper.rb", 1816],
 ["/Users/james/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.1/lib/action_dispatch/middleware/debug_exceptions.rb", 1783],
 ["/Users/james/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.0.0.1/lib/rails/rack/logger.rb", 1074],
 ["/Users/james/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.0.0.1/lib/active_support/tagged_logging.rb", 1074],
 ["/Users/james/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/puma-3.6.2/lib/puma/server.rb", 1074]]

and I know exactly where to go look next.

Enforcing Idempotence

An idempotent method is one that always returns the same value, no matter how many times you call it, and is generally a Good Idea™. Referentially transparent methods have to be idempotent, and methods on value objects should generally be idempotent as well. We can use TracePoint to enforce that (e.g. by raising an error whenever that contract is violated in development).

module Idempotizer
  Traces = {}

  def self.included klass
    ret = nil
    Traces[klass] = TracePoint.new :return do |t|
      next unless t.self.is_a? klass

      method = t.self.method t.method_id
      # Methods which take input parameters generally produce different return
      #   values with different inputs. Regrettably, TracePoint doesn't seem
      #   to give us a reasonable way to get the inputs to this function call.
      next unless method.arity == 0

      # Somewhat fiddly, but if we `raise`, we'll end up triggering another `return`
      # event but with `return_value == nil` and we want the earlier return value
      ret   = t.return_value if t.return_value

      # Call the method again (without traces) and compare the values
      check = t.disable { method.call }
      if check != ret
        raise "#{t.self}##{t.method_id} is not idempotent! #{check} != #{ret}"
      end
    end
    Traces[klass].enable
  end
end

class Demo
  include Idempotizer

  # This will now raise an error the second time it's called
  def now
    Time.now.to_f
  end

  def sum
    2 + 2
  end
end

Other Ideas?

Hopefully you get the gist of what TracePoint can do, and how to use it. These examples are just a jumping off point though - if you’ve got another nice hack, let me know.