Overview

  • Two projects during this internship
  • Support.mozilla.org (SUMO)
    • Bringing SUMO offline with IndexedDB
  • Firefox crash stats (Socorro)
    • Automagically identify "explosive crashes"
  • Lots of fun all around!

Taking SUMO offline

Basic idea

  • Allow users to look up help articles offline, such as on a Firefox OS device

Technical requirements

  • Must run entirely offline
    • Appcache and IndexedDB
  • Must run reasonably fast on a low powered device such as a Firefox OS phone
  • Mobile friendly
  • Translatable
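Running entirely offline usually means AppCache keeps the app's own files (HTML, JavaScript, CSS) while IndexedDB holds the data. An AppCache manifest is a plain-text file that starts with `CACHE MANIFEST` and lists the resources to store locally. Here is a minimal sketch of generating one on the server, assuming a hypothetical list of asset paths and their contents. `build_manifest` is an illustrative name, not part of the SUMO code. The version comment is a content hash, so any file change produces a new manifest and browsers refetch the cache. (AppCache has since been removed from modern browsers in favor of Service Workers.)

```python
import hashlib

def build_manifest(asset_paths, asset_bytes):
    """Return AppCache manifest text listing asset_paths for offline use.

    asset_bytes maps each path to its file contents; hashing them into the
    version comment means any content change produces a new manifest, which
    is what prompts browsers to re-download the cached files.
    """
    digest = hashlib.sha1()
    for path in asset_paths:
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(asset_bytes[path] + b"\0")
    lines = ["CACHE MANIFEST", "# version " + digest.hexdigest()[:12], "", "CACHE:"]
    lines.extend(asset_paths)
    # NETWORK: * lets requests for anything not cached go to the network when online.
    lines.extend(["", "NETWORK:", "*"])
    return "\n".join(lines) + "\n"
```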

Mandatory "I hate CSS" slide

I hate CSS, but whatever

App with AngularJS

Challenges

  • Downloading content
  • Fiddling and debugging IndexedDB problems
  • Offline search (more on that later)

Searching

  • Built on term frequency-inverse document frequency (TF-IDF)
    • Scores each word and its importance for each document
    • If a word appears frequently in one document but not overall, it will have a high score for that document
    • Math is all on Wikipedia
  • The index is a large map from each word to document IDs and their scores
    • Generated on the server
    • Stored in IndexedDB
  • Loading from IndexedDB is a major bottleneck
    • Searching itself takes only 1% to 5% of the total search time
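Here is a minimal sketch of building the word-to-(document, score) map and running a query against it. It assumes TF is the raw term count divided by document length and IDF is log(N / df). The slides don't say which TF-IDF variant SUMO used, and ranking documents by summing per-word scores is also an assumption. `tokenize`, `build_index`, and `search` are hypothetical names. In the app itself the lookup ran in the browser against the index loaded from IndexedDB; Python is used here only to illustrate the math.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: tf-idf score}, given docs as {doc_id: text}."""
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each word appears at least once.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            tf = count / total                       # frequent in this document...
            idf = math.log(n_docs / doc_freq[word])  # ...but rare across all documents
            index[word][doc_id] = tf * idf
    return dict(index)

def search(index, query):
    """Rank documents by the summed scores of the query words they contain."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, score in index.get(word, {}).items():
            scores[doc_id] += score
    return [doc_id for doc_id, score in scores.most_common() if score > 0]
```

Consider three articles that all mention "firefox". That word gets an IDF of log(3/3) = 0 and contributes nothing to the ranking. A word that appears in only one article, such as "cache", ranks that article first.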

Demo time!

Current status

Time to switch gears

Crash stats here we come

Source: http://www.flickr.com/photos/binaryape/4882162452/

Firefox crash stats

  • Named Socorro
  • Collects crash reports and displays them in a convenient web app
  • Helps Firefox developers discover and eventually fix bugs

My task

  • Identify "explosive" crashes automatically

What is an "explosive" crash?

  • A lot of different definitions
    • Positive deviation from expected value
    • Large derivative values
  • Many different applications
    • Disease outbreak detection, network traffic anomaly, trending topics, stock markets, industrial process control

How can we find these signatures?

  • Defined as a big, positive deviation from some expected value.
  • Tried many different approaches
    • Prediction based approach
    • Time series analysis
    • Fancy math everywhere
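Here is a minimal sketch of the "big, positive deviation from some expected value" definition applied to one signature's daily crash counts. It assumes the expected value is the mean of the previous `window` days and that "big" means more than `k` standard deviations above it. The slides don't say which models were tried. `window`, `k`, `positive_deviation_scores`, and `flag_explosive` are hypothetical names and settings.

```python
import statistics

def positive_deviation_scores(counts, window=7):
    """Score each day by how far it sits above the mean of the previous
    `window` days, in units of their standard deviation.
    Days without a full window of history get None."""
    scores = []
    for i, count in enumerate(counts):
        if i < window:
            scores.append(None)
            continue
        history = counts[i - window:i]
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Perfectly flat history: any rise at all is an unbounded deviation.
            scores.append(0.0 if count <= mean else float("inf"))
        else:
            scores.append((count - mean) / std)
    return scores

def flag_explosive(counts, window=7, k=3.0):
    """Return indices of days whose positive deviation exceeds k."""
    scores = positive_deviation_scores(counts, window)
    return [i for i, score in enumerate(scores) if score is not None and score > k]
```

A single global `window` and `k` is exactly what the next slide questions: signatures behave differently, so one fixed model may not generalize.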

Problems with time series approach

  • Not all signature time series exhibit the same behaviour
  • Cannot easily generalize a model developed by looking at a very small subset of the data
  • Only one known training example

A different approach

  • Attempted detection with the derivative formula
  • Used an overly optimistic approach to find potential examples
    • Manually labeled everything
    • Was able to find many examples
  • Turns out simple is good enough
  • Found most manually labeled examples
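The slides don't give the exact derivative formula. This sketch assumes the simplest discrete version: the day-over-day change in crash count for one signature, compared against a fixed `threshold`. Setting the threshold low gives a deliberately over-inclusive candidate list, similar in spirit to the optimistic pass described above, which can then be reviewed by hand. `first_differences` and `explosive_days` are hypothetical names.

```python
def first_differences(counts):
    """Discrete derivative: the change in crash count from each day to the next."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

def explosive_days(counts, threshold):
    """Indices of days whose count rose by more than `threshold` over the previous day."""
    return [day + 1 for day, delta in enumerate(first_differences(counts)) if delta > threshold]
```

For example, `explosive_days([10, 12, 11, 40, 90, 85], threshold=20)` returns `[3, 4]`, the two days with jumps of 29 and 50.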

Socorro is complicated

A lot of components to get familiar with

Implementing in Socorro

  • Starting to run low on time
  • Implemented a method to pick the threshold automatically based on historical observations
  • Implemented as a cron job that runs every day
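Here is a minimal sketch of what one daily run might do. For each signature, it learns a threshold from that signature's own past day-over-day changes and checks whether the latest jump exceeds it. The rule (mean plus `k` standard deviations, floored at `min_jump` crashes) is a hypothetical stand-in, since the slides don't describe how the threshold was actually chosen. `pick_threshold`, `daily_check`, `k`, and `min_jump` are illustrative names. Scheduling and Socorro's data access are not shown.

```python
import statistics

def pick_threshold(history, k=3.0, min_jump=5):
    """Hypothetical rule: mean + k standard deviations of the signature's past
    day-over-day changes, never lower than min_jump crashes."""
    diffs = [later - earlier for earlier, later in zip(history, history[1:])]
    learned = statistics.mean(diffs) + k * statistics.pstdev(diffs)
    return max(learned, min_jump)

def daily_check(counts_by_signature, k=3.0, min_jump=5):
    """One run of a once-a-day job: compare each signature's latest jump with
    a threshold learned from its own earlier days."""
    explosive = []
    for signature, counts in counts_by_signature.items():
        if len(counts) < 3:
            continue  # too little history to learn a threshold
        history, today = counts[:-1], counts[-1]
        jump = today - history[-1]
        if jump > pick_threshold(history, k, min_jump):
            explosive.append(signature)
    return explosive
```

Learning the threshold per signature, rather than using one global value, is one way to cope with signatures whose normal day-to-day noise differs widely.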

Current Status

Conclusion

  • Awesome experience
  • Working on open source is a big plus
  • Special thanks to my mentors: Mike Cooper and Chris Lonnen