webigator is a tool for filtering and aggregating data from the web. It was developed based on our experience after the Great East Japan Earthquake of 2011, when a large amount of useful information on the web was drowned out by an even larger amount of irrelevant information. The tool performs a keyword search over text data (such as the Twitter stream), then uses machine learning techniques to filter out irrelevant information. It comes with a web interface that multiple users can use at the same time, allowing for collaborative construction of lists of useful information based on these results. You can read more details in this paper:
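To make the two-stage idea concrete, here is a minimal sketch of a keyword pre-filter followed by a classifier trained from user feedback. This is only an illustration of the general technique, not webigator's actual implementation; all function and class names here are made up for the example.

```python
# Sketch of a two-stage pipeline: a cheap keyword match for recall,
# then a learned bag-of-words perceptron to filter out irrelevant hits.
from collections import defaultdict

def keyword_match(text, keywords):
    """Stage 1: recall-oriented keyword filter."""
    low = text.lower()
    return any(k.lower() in low for k in keywords)

class PerceptronFilter:
    """Stage 2: bag-of-words perceptron trained from user relevance feedback."""
    def __init__(self):
        self.w = defaultdict(float)

    def score(self, text):
        return sum(self.w[tok] for tok in text.lower().split())

    def update(self, text, relevant):
        # Standard perceptron update: +1 for relevant, -1 for irrelevant.
        y = 1 if relevant else -1
        if y * self.score(text) <= 0:
            for tok in text.lower().split():
                self.w[tok] += y

# Toy usage: match tweets mentioning "shelter", then learn that spam is irrelevant.
stream = ["Shelter open at city hall",
          "Buy cheap shelter insurance now",
          "No keywords here"]
hits = [t for t in stream if keyword_match(t, ["shelter"])]
clf = PerceptronFilter()
clf.update("Shelter open at city hall", True)
clf.update("Buy cheap shelter insurance now", False)
ranked = sorted(hits, key=clf.score, reverse=True)
```

After the two feedback updates, the relevant announcement is scored above the spam message, so it would be shown to users first.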
A Framework and Tool for Collaborative Extraction of Reliable Information
Graham Neubig, Shinsuke Mori, Masahiro Mizukami. Workshop on Language Processing and Crisis Information (LPCI). 2013.
Source Code: @github
The webigator code is distributed under the Eclipse Public License v1.0, and may be redistributed freely under the terms of that license.
You can see a (probably) working demo of Webigator here: webigator demo.
This section is only necessary if you need to set up your own server. If you are using a server that someone else set up, you can skip it.
The server works on Linux, and will probably work on Mac or Cygwin. Before running the program you need to install the Boost and XML-RPC libraries. This can be done with your package manager; for example, on Ubuntu:
sudo apt-get install libboost-all-dev libxmlrpc-c3-dev
You should also install the XML::RPC package for Perl, for example through CPAN:

cpan XML::RPC
Next, you can build the server:
autoreconf -i
./configure
make
If this works properly, you should be able to run the resulting server binary.
Instructions are under construction.
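In the meantime, since the server exposes an XML-RPC interface, a client can talk to it from any language with an XML-RPC library. The sketch below uses Python's standard library; the port number and method names are assumptions for illustration only, so check the server source for the real interface.

```python
# Hypothetical client for a running webigator server over XML-RPC.
# The URL, port, and method names below are illustrative assumptions,
# not webigator's documented API.
import xmlrpc.client

# Creating the proxy does not open a connection; calls are made lazily.
server = xmlrpc.client.ServerProxy("http://localhost:9000")

# Calls like these would only succeed against a live server:
# server.add_keyword("shelter")   # hypothetical: register a search keyword
# server.pop_best()               # hypothetical: fetch the highest-scored text
```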
If you are interested in participating in the webigator project, particularly tackling any of the challenges below, please send an email to neubig at gmail dot com.
There are a bunch of possible improvements that would be quite interesting and useful: