Getting Started with StreamMill (for first-time users)

 

Once you have downloaded the StreamMill Client:

-         Start the client

-         Click on the Library->View Library Menu

o       A dialog box with a list of libraries will pop up.

-         Select the demoTraffic library and click Ok.

-         Now click on the View/Modify->Monitor Output menu.

o       This will pop up another window (Output Buffer List), which lists the available buffers.

-         Click on the Monitor button for the first buffer in the list.

o       This will pop up another window that shows the contents of this buffer. You should see incoming data in this new window. In the worst case, it can take about half a minute for data to show up. This is live data coming from http://www.dot.ca.gov/ (no endorsement intended).

-         Close the Output Buffer List window.

 

Details of the Traffic demo

This is a very simple demo that illustrates some of the functionality of the StreamMill system. The demo takes sensor data from a traffic site, feeds it into a stream with a timestamp, and then poses a simple continuous query on this data. It is stored as the library demoTraffic in the StreamMill system. The rest of this example uses the words library and public user interchangeably; they mean the same thing.

 

Stream(s):

stream traffic(station_id int, speed int, time timestamp) source 'iomod_traffic.so';

 

The syntax is similar to Atlas syntax, where we would use table instead of stream. The traffic stream has three attributes: station_id, speed, and time. The source construct is very important here: the source is the file that feeds data into the stream, i.e. the data source. StreamMill also provides built-in data sources; please refer to the StreamMill manual for more details.

 

Data Source(s):

The data sources are defined in a dataSourceName.cc file. For the running example, the name of the file would be iomod_traffic.cc. The source for this sample data source is provided here. Any data source file should compile correctly and define the following two functions.

extern "C" int getTuple(buffer* dest);

extern "C" int closeConnection();
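As a rough illustration of how these two entry points could be laid out in a data source file, here is a minimal skeleton. It is a sketch under assumptions, not the sample iomod_traffic.cc: the forward declaration of buffer, the SketchCode values, and the sourceOpen flag are all hypothetical and only serve to make the example compile on its own.

```cpp
// ASSUMPTION: `buffer` is the tuple buffer type supplied by StreamMill.
// It is only forward-declared here so that this sketch compiles on its own.
struct buffer;

// HYPOTHETICAL return codes, used only in this sketch. Take the real
// values from the sample iomod_traffic.cc.
enum SketchCode { SKETCH_NO_TUPLE = 0, SKETCH_TUPLE_READY = 1 };

// Declarations with C linkage, so the C++ compiler does not mangle the names.
extern "C" int getTuple(buffer* dest);
extern "C" int closeConnection();

// Hypothetical state: whether the underlying port or file is still open.
static bool sourceOpen = true;

extern "C" int getTuple(buffer* dest) {
    if (!sourceOpen || dest == nullptr) {
        return SKETCH_NO_TUPLE;
    }
    // A real data source would read its input here, write one tuple
    // into *dest, and return SKETCH_TUPLE_READY.
    return SKETCH_NO_TUPLE;
}

extern "C" int closeConnection() {
    // Release ports, files, and other resources here.
    sourceOpen = false;
    return 0;
}
```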

 

Also, remember to include these extern declarations in the data source file. If you plan to write more data sources, please look carefully at the following functions in the sample data source file:

-         getTuple(): Reads port 5439 for incoming data. Whenever data is read, it calls putDataInBuffer(). It also returns the appropriate return codes.

-         closeConnection(): Called when the data source is deleted. You can do cleanup here, e.g. closing ports or files.

-         putDataInBuffer(): Parses a chunk of data. For the given example, it splits a string at line endings and calls processMessage() for each tuple.

-         processMessage(): Parses a line into a tuple. If you define other streams, you should change this function accordingly.
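The split between putDataInBuffer() and processMessage() described above can be sketched roughly as follows. This is not the code from the sample file. The TrafficTuple struct, the function signatures, and the comma-separated line layout "station_id,speed,time" are assumptions made for illustration; the real functions write into StreamMill's buffer rather than returning a vector.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// HYPOTHETICAL in-memory form of one tuple of the traffic stream.
struct TrafficTuple {
    int station_id;
    int speed;
    std::string time;  // timestamp kept as text in this sketch
};

// Sketch of processMessage(): parse one line into a tuple.
// ASSUMPTION: the fields are comma-separated as "station_id,speed,time".
// Returns false when the line does not match that layout.
bool processMessage(const std::string& line, TrafficTuple& out) {
    std::istringstream in(line);
    std::string id, speed, time;
    if (!std::getline(in, id, ',') || !std::getline(in, speed, ',') ||
        !std::getline(in, time)) {
        return false;
    }
    try {
        out.station_id = std::stoi(id);
        out.speed = std::stoi(speed);
    } catch (const std::exception&) {
        return false;  // a numeric field could not be converted
    }
    out.time = time;
    return true;
}

// Sketch of putDataInBuffer(): split a chunk at line endings and hand
// each non-empty line to processMessage(). Malformed lines are skipped.
std::vector<TrafficTuple> putDataInBuffer(const std::string& chunk) {
    std::vector<TrafficTuple> tuples;
    std::istringstream in(chunk);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        TrafficTuple t;
        if (!line.empty() && processMessage(line, t)) {
            tuples.push_back(t);
        }
    }
    return tuples;
}
```

If you define a stream with different attributes, only TrafficTuple and processMessage() would need to change in a layout like this; the line splitting stays the same.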

 

NOTE: Don't try to use the same data source file for multiple streams; it will not work.

 

The getTuple() function for this particular data source reads from port 5439, so we have to be able to send data to this port. We are already sending data to this port, which is why you can monitor the incoming data. We use the ss.pl script to send the data (the script was originally borrowed from the Telegraph project and modified for our needs). This script reads data from http://www.dot.ca.gov/ and sends it to the specified port.
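If you want to feed a data source from your own program instead of ss.pl, the sending side might look roughly like the sketch below. It makes several assumptions that this guide does not confirm: that the data source accepts a TCP connection, that readings are sent as newline-terminated text lines, and that each line has the layout "station_id,speed,time". The helper names formatReading and sendReadings are hypothetical. It uses POSIX sockets.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Format one reading as a newline-terminated text line.
// ASSUMPTION: comma-separated "station_id,speed,time"; the format that
// ss.pl actually emits is not documented in this guide.
std::string formatReading(int station_id, int speed, const std::string& time) {
    return std::to_string(station_id) + "," + std::to_string(speed) + "," +
           time + "\n";
}

// Send all lines to host:port over a single TCP connection.
// ASSUMPTION: the data source listens for TCP connections on that port.
// `host` must be a dotted IPv4 address such as "127.0.0.1".
// Returns true only if the connection succeeded and every byte was sent.
bool sendReadings(const std::string& host, std::uint16_t port,
                  const std::vector<std::string>& lines) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return false;
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    bool ok = inet_pton(AF_INET, host.c_str(), &addr.sin_addr) == 1 &&
              connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    for (std::size_t i = 0; ok && i < lines.size(); ++i) {
        const std::string& line = lines[i];
        std::size_t sent = 0;
        while (ok && sent < line.size()) {
            // send() may write fewer bytes than requested, so loop.
            ssize_t n = send(fd, line.data() + sent, line.size() - sent, 0);
            if (n <= 0) {
                ok = false;
            } else {
                sent += static_cast<std::size_t>(n);
            }
        }
    }
    close(fd);
    return ok;
}
```

A call such as sendReadings("127.0.0.1", 5439, {formatReading(101, 65, "2004-05-01 10:00:00")}) would then push one reading to a data source listening locally on port 5439, provided the assumptions above hold for your setup.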

 

Now that we have data coming into the stream, we want to be able to query it.

 

Query:

select station_id, speed, time from traffic;

 

This is a very simple query on the stream. You can define more complex queries and aggregates on this stream.

 

 

NOTE: Please Read

To support multiple users and avoid object name clashes, you have to use $ to refer to objects of other users (these must be public users; you yourself can be a private user), i.e. you have to write the query as follows:

select station_id, speed, time from demoTraffic$traffic;

 

demoTraffic is a public user in the system, and you can find the above definitions by viewing this library. To get acquainted with the system, you should try the following:

-         Define queries on the demoTraffic$traffic stream

-         Activate the queries

-         View the stdout buffers resulting from the queries

-         Feed data to the stream using the ss.pl script.

 

In general, to refer to a public user's object, you have to use publicUserName$objectName.