TU Delft
 
MultiProbe
The Data Set
Parallel and Distributed Systems


2 types of measurements, 20 types of files
what did we measure?

In the MultiProbe framework page, we have defined two types of measurements: active-start measurements and passive-start measurements. Each type of measurement yields a number of files, each with a specific format. We have used 20 types of files; their structure is detailed on this web page.
 


File formats for the active-start measurements
detailed description
Quick links to active-start measurements' file types:
A-1 | A-2 | A-3

During the active-start measurements, we successfully tracked the top 2,000 files from Pirate Bay (for more details, see the measurements page). For each tracked file, we stored the following types of files:

  1. .smallres files [File type A-1]

    Info:
    Used to store information about the contacted peers.

    Format:

    TimeStamp<sp>IP<sp>Port<sp>ID<sp>Chunks

    where:
    1. TimeStamp is the time stamp of the observation;
    2. IP is the anonymized IP address of the observed peer;
    3. Port is the port on which the observed peer was contacted;
    4. ID is the network-specific identifier of the observed peer;
    5. Chunks is the number of chunks that the observed peer reported to have.
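
A record in this format could be read with a few lines of Python. The sketch below is illustrative only: the names `PeerObservation` and `parse_smallres_line` are hypothetical, the sample line is made up, and it assumes that no field contains a space. The time stamp and the anonymized IP are kept as plain strings, because their exact encoding is not specified on this page.

```python
from typing import NamedTuple


class PeerObservation(NamedTuple):
    timestamp: str  # kept as text; the time stamp encoding is not specified here
    ip: str         # anonymized address, treated as an opaque string
    port: int
    peer_id: str
    chunks: int


def parse_smallres_line(line: str) -> PeerObservation:
    """Split one space-separated A-1 record into its five fields."""
    fields = line.strip().split(" ")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    timestamp, ip, port, peer_id, chunks = fields
    return PeerObservation(timestamp, ip, int(port), peer_id, int(chunks))


# Made-up sample line, for illustration only (not taken from the data set).
sample = "1128300000 10.0.0.1 6881 PEER-0001 42"
obs = parse_smallres_line(sample)
print(obs.port, obs.chunks)
```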

     
  2. .error files [File type A-2]

    Info:
    Used to store information about the errors in contacting peers.

    Format:

    TimeStamp<sp>IP<sp>Port<sp>ID<sp>Error reason

    where:
    1. TimeStamp is the time stamp of the observation;
    2. IP is the anonymized IP address of the observed peer;
    3. Port is the port on which the observed peer was contacted;
    4. ID is the network-specific identifier of the observed peer;
    5. Error reason is the reason for the connection failure.
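
Because the error reason is free text, a reader of this format probably wants to split on the first four separators only. The following is a minimal sketch under that assumption; `parse_error_line` is a hypothetical helper and the sample line is invented.

```python
def parse_error_line(line: str) -> dict:
    """Parse one A-2 record; the fifth field may itself contain spaces."""
    # Split at most four times so the error reason stays in one piece.
    parts = line.rstrip("\n").split(" ", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields: {line!r}")
    timestamp, ip, port, peer_id, reason = parts
    return {
        "timestamp": timestamp,  # kept as text; encoding not specified here
        "ip": ip,                # anonymized, treated as an opaque string
        "port": int(port),
        "peer_id": peer_id,
        "reason": reason,
    }


# Made-up sample line, for illustration only.
err = parse_error_line("1128300000 10.0.0.1 6881 PEER-0001 connection timed out\n")
print(err["reason"])
```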
  3. .active files [File type A-3]

    Info:
    Used to store internal measurement information about the contacted peers.
     


File formats for the passive-start measurements
detailed description
Quick links to passive-start measurements' file types:
P-1 | P-2 | P-3 | P-4 | P-5
P-6 | P-7 | P-8 | P-9 | P-10
P-11 | P-12 | P-13 | P-14 | P-15
P-16 | P-17

During the passive-start measurements, we successfully tracked the top 750 files from Pirate Bay (for more details, see the measurements page). Files were tracked from separate nodes in PlanetLab. For each PlanetLab node, we created a number of timestamped archives, each with the following structure:

  1. Directory base_data/
    Info:
    Used to store various files regarding the general monitoring process.

    1. trackinginfo files [File type P-1]

      Info:
      Used to store information about the peers contacting the passive-start infrastructure.

      Format:

      TimeStamp <tab> <IP> <tab> BatchNo <tab> FileName <tab> IP <tab> Port <tab> Unique
      or
      TimeStamp <tab> <E-IP> <tab> BatchNo <tab> RootIP


      where:
      1. TimeStamp is the time stamp of the observation;
      2. <IP> is a constant string (IP), defining that the line reports information about an IP;
      3. BatchNo is an integer identifying the set of measurements to which the IP belongs; unique per machine;
      4. FileName is the monitored (torrent) file to which this information refers;
      5. IP is the anonymized IP address of the observed peer;
      6. Port is the port used by the contacting peer;
      7. Unique reports whether the peer was previously tracked (possible values are Y and N);
      8. <E-IP> is a constant string (E-IP), defining that the line reports information about a peer tracking;
      9. RootIP shows the IP to which the multi-source traceroutes were started (the tracking target), and should match the IP addresses identified in the previous <IP> lines.
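
The two line variants can be told apart by their second field. The sketch below shows one way to do that; it is not part of the MultiProbe tooling. `parse_trackinginfo_line` is a hypothetical name, the sample values are invented, and the code assumes the constant strings appear either bare (`IP`, `E-IP`) or wrapped in angle brackets.

```python
def parse_trackinginfo_line(line: str) -> dict:
    """Parse one tab-separated P-1 record of either variant."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) < 2:
        raise ValueError(f"not a trackinginfo record: {line!r}")
    tag = fields[1].strip("<>")  # tolerate both 'IP' and '<IP>'
    if tag == "IP" and len(fields) == 7:
        timestamp, _, batch_no, file_name, ip, port, unique = fields
        return {
            "kind": "IP",
            "timestamp": timestamp,
            "batch_no": int(batch_no),
            "file_name": file_name,
            "ip": ip,
            "port": int(port),
            "unique": unique == "Y",
        }
    if tag == "E-IP" and len(fields) == 4:
        timestamp, _, batch_no, root_ip = fields
        return {
            "kind": "E-IP",
            "timestamp": timestamp,
            "batch_no": int(batch_no),
            "root_ip": root_ip,
        }
    raise ValueError(f"unrecognized trackinginfo record: {line!r}")


# Made-up sample lines, for illustration only.
ip_line = "\t".join(["1128300000", "IP", "3", "file.torrent", "10.0.0.7", "6881", "Y"])
eip_line = "\t".join(["1128300060", "E-IP", "3", "10.0.0.7"])
ip_rec = parse_trackinginfo_line(ip_line)
eip_rec = parse_trackinginfo_line(eip_line)
# The tracking target should be one of the IPs seen in earlier IP lines.
print(eip_rec["root_ip"] == ip_rec["ip"])
```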

       
    2. trackingstats files [File type P-2]

      Info:
      Used to store information about the tracking process.

      Format:

      TimeStamp <sp> <BATCH> <sp> BatchNo <sp> NBatchIPs <sp> NUniqueBatchIPs
      or
      TimeStamp <sp> <EDGES> <sp> BatchNo <sp> NoDestinations <sp> NPackets


      where:
      1. TimeStamp is the time stamp of the recording;
      2. <BATCH> is a constant string (BATCH), defining that the line reports information about a batch of IP addresses to be tracked;
      3. BatchNo is an integer identifying the set of measurements to which the recording belongs; unique per machine;
      4. NBatchIPs is the number of IP addresses in this batch;
      5. NUniqueBatchIPs is the number of unique IP addresses to be monitored in this batch;
      6. <EDGES> is a constant string (EDGES), defining that the line reports information about the tracking process related to a certain batch of IP addresses;
      7. NoDestinations shows the number of IP addresses to which the multi-source traceroutes were started, and should match the number identified in the previous <BATCH> line;
      8. NPackets shows the number of packets used to track the batch of IP addresses specified in the current line.
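
To compare the batch and edge records of one batch, the two line variants can be grouped by BatchNo. The following is a rough sketch, assuming that all five fields are free of embedded spaces; `summarize_trackingstats` is a hypothetical helper and the sample lines are invented.

```python
def summarize_trackingstats(lines):
    """Group BATCH and EDGES records by batch number."""
    batches = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _timestamp, tag, batch_no, first, second = fields
        tag = tag.strip("<>")  # tolerate both 'BATCH' and '<BATCH>'
        if tag == "BATCH":
            entry = batches.setdefault(int(batch_no), {})
            entry["n_batch_ips"] = int(first)
            entry["n_unique_batch_ips"] = int(second)
        elif tag == "EDGES":
            entry = batches.setdefault(int(batch_no), {})
            entry["n_destinations"] = int(first)
            entry["n_packets"] = int(second)
    return batches


# Made-up sample lines, for illustration only.
stats = summarize_trackingstats([
    "1128300000 BATCH 3 80 75",
    "1128300900 EDGES 3 75 1200",
])
print(stats[3])
```

With both records in one dictionary, the destination count of the EDGES line can be checked against the counts of the BATCH line for the same batch.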

    3. destinations.dat files [File type P-3]

      Info:
      Used to store the tracked IP addresses.

      Format:

      IP

      One IP address per line.

    4. edges_cache.dat files [File type P-4]

      Info:
      Used to store visited paths.

      Format:

      ExtIP1 <sp> ExtIP2

      Two IP addresses per line, except that some IP addresses may be concatenated with extra '+' signs, which signal unresponsive hops in the traceroute process (see Scriptroute's Reverse Path Tree script for a more detailed description of the '+' sign's significance).
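
When loading these edges, the '+' signs can be separated from the address itself. The sketch below counts and removes them; it assumes nothing about where in the token the '+' signs appear, and both the function name `parse_edge_line` and the sample line are invented for illustration.

```python
def parse_edge_line(line: str):
    """Return ((ip1, plus1), (ip2, plus2)) for one edges_cache.dat line.

    Each element pairs the address, with all '+' signs removed, with the
    number of '+' signs that were attached to it.
    """
    tokens = line.split()
    if len(tokens) != 2:
        raise ValueError(f"expected two addresses: {line!r}")
    return tuple((tok.replace("+", ""), tok.count("+")) for tok in tokens)


# Made-up sample line, for illustration only.
edge = parse_edge_line("10.0.0.1 10.0.0.2++")
print(edge)
```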

    5. sources.dat files [File type P-5]

      Info:
      Used to store the IP addresses of the PlanetLab multi-traceroute sources. Only a limited number of these sources were used in a single multi-source traceroute (we did not trace from all sources to all peers, but from a fixed number of randomly selected sources to all peers).

      Format:

      IP

      One IP address per line.
       

     
  2. Directory torrent_data/
    Info:
    Used to store various files regarding the per-file monitoring process.

    1. buffer.err, buffer.res files [File types P-6, P-7]

      Info:
      Used to store information about the IP addresses buffer [debugging information].
       
    2. Torrent<ID>-LP.err, Torrent<ID>-LP.res files [File types P-8, P-9]

      Info:
      Used to store information about tracking a file (<ID> is the tracked file's ID, given as an 8-digit number, e.g., 0000000).

      Format:

      The err file should be empty; otherwise, it contains a detailed error report.
      The res file has the following content:

      # Comment line
      or
      TimeStamp <sp> IP <sp> Port <sp> FileName <sp> <[connection]>


      where:
      1. TimeStamp is the time stamp of the recording;
      2. IP is the IP address of the contacting peer;
      3. Port is the TCP port on which the contacting peer listens;
      4. FileName is the name of the torrent for which this connection was issued;
      5. <[connection]> is a constant string ([connection]), defining that the line reports information about an incoming connection.
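
A reader for the res file has to skip comment lines and cope with torrent names that may contain spaces. The sketch below handles that by peeling the trailing marker off first and then splitting the rest at most three times. This is an assumption about the data rather than a documented guarantee; `parse_lp_res` is a hypothetical name and the sample lines are invented.

```python
def parse_lp_res(lines):
    """Collect the connection records of a Torrent<ID>-LP.res file."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        head, _, marker = line.rpartition(" ")
        if marker.strip("<>") != "[connection]":
            continue  # not a connection record
        parts = head.split(" ", 3)  # the file name may contain spaces
        if len(parts) != 4:
            continue  # malformed record
        timestamp, ip, port, file_name = parts
        records.append({
            "timestamp": timestamp,
            "ip": ip,
            "port": int(port),
            "file_name": file_name,
        })
    return records


# Made-up sample content, for illustration only.
conns = parse_lp_res([
    "# tracking started",
    "1128300000 10.0.0.7 6881 Some File.torrent [connection]",
])
print(conns[0]["file_name"])
```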

    3. tracker.err, tracker.res files [File types P-10, P-11]

      Info:
      Used to store debugging information. Should be empty.


     
  3. Directory batch<ID>data/
    Info:
    Used to store per-batch information regarding the tracking process. <ID> represents the batch identification number, unique per PlanetLab node.

    1. ipaddresses file [File type P-12]

      Info:
      Used to store information about the IP addresses to be tracked in this batch.

      Format:

      IP

      One IP address per line (typically fewer than 100 addresses per file).
       
    2. measure_edges.err, measure_edges.out files [File types P-13, P-14]

      Info:
      Used to store information about the multi-source tracerouting process. Useful only for the authors of Scriptroute.

    3. new_edges_cache.dat file [File type P-15]

      Info:
      Used to store information about the edges newly discovered in the multi-source tracerouting process. Same structure as file type P-4.

    4. Sub-directory paths/

      Info:
      Used to store detailed information about the multi-source traceroutes.

      1. path_<SrcIP>_<DstIP> files [File type P-16]

        Info:
        Used to store detailed information about the hops on the path from SrcIP to DstIP.

        Format:

        HopNo <sp> IP <sp> Time <ms>
        or
        <packetcount:> <sp> NoPackets
        or
        Error

        where:
        1. HopNo is the number of the hop observed in the path;
        2. IP is the IP address identified at this hop, or unresponsive, if the hop was unresponsive;
        3. Time is the latency of the packets towards the identified hop;
        4. NoPackets is the number of packets used to identify all the path's hops.
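
The three line variants of a path file can be sorted out with a small reader. This sketch is deliberately tolerant, because the exact spelling of the latency field is not shown on this page: it accepts the unit either as a separate `ms` token or glued to the number, and it files every line it cannot classify as an error line. `parse_path_file` and the sample lines are invented for illustration.

```python
def _to_ms(token):
    """Convert '12.5' or '12.5ms' to a float; return None if that fails."""
    if token.endswith("ms"):
        token = token[:-2]
    try:
        return float(token)
    except ValueError:
        return None


def parse_path_file(lines):
    """Return (hops, packet_count, errors) for one path_<SrcIP>_<DstIP> file."""
    hops, packet_count, errors = [], None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].strip("<>") == "packetcount:" and len(fields) == 2:
            packet_count = int(fields[1])
        elif fields[0].isdigit() and len(fields) >= 2:
            responsive = fields[1] != "unresponsive"
            ip = fields[1] if responsive else None
            latency = _to_ms(fields[2]) if responsive and len(fields) > 2 else None
            hops.append((int(fields[0]), ip, latency))
        else:
            errors.append(line.strip())  # anything else is kept as error text
    return hops, packet_count, errors


# Made-up sample content, for illustration only.
hops, packet_count, errors = parse_path_file([
    "1 10.0.0.1 0.8 ms",
    "2 unresponsive",
    "3 10.0.0.9 12.5 ms",
    "packetcount: 37",
])
print(len(hops), packet_count)
```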

      2. sources-<DstIP>.dat files [File type P-17]

        Info:
        Used to store information about the source IP addresses in the multi-source traceroute towards a destination IP address (<DstIP>).

        Format:

        IP

        One IP address per line.
         


 
     

Last modified: Thu, 13 October, 2005 8:28 PM
The newest version of this page can be found at: http://multiprobe.ewi.tudelft.nl/dataset.html
Copyright © 1998-2005 Alexandru Iosup. All Rights Reserved.