Andrea Cimmino committed
# SD Synthetic Data Generator


The SD Synthetic Data Generator creates a set of CSEs with several Semantic Descriptors registered. These Semantic Descriptors contain instances and data that follow the ASD ontology:
![ASD Ontology](./img/ASD-Ontology.jpg)

The goal of the SD Synthetic Data Generator is to produce an experimental environment for the SD simulation task, specifically for the DSRD Simulator. 

## Quick start
To use the SD Synthetic Data Generator, follow these steps:

1. Configure the SD Synthetic Data Generator by editing the *config.json* file
2. Run the SD Synthetic Data Generator with the following command:
`````
bash run.sh
`````
3. A file with the generated data appears in the directory of the SD Synthetic Data Generator
4. Load the data into the DSRD Simulator
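
The run step above can also be scripted. The following is a minimal Python sketch, not part of the generator itself: the helper name `run_generator` is hypothetical, and it assumes that `run.sh` reads *config.json* from its own directory and writes the file named by the `"output"` key into that same directory.

```python
import json
import subprocess
from pathlib import Path


def run_generator(workdir, command=("bash", "run.sh")):
    """Run the generator in `workdir` and return the path of the dataset it wrote.

    Hypothetical helper: assumes config.json lives next to run.sh and that the
    "output" key is resolved relative to that directory.
    """
    workdir = Path(workdir)
    config = json.loads((workdir / "config.json").read_text(encoding="utf-8"))

    # check=True raises CalledProcessError if the command exits with a non-zero status.
    subprocess.run(list(command), cwd=workdir, check=True)

    output = workdir / config["output"]
    if not output.is_file():
        raise FileNotFoundError(f"expected dataset not found: {output}")
    return output
```

Calling `run_generator(".")` from the generator directory would run `bash run.sh` and return the path of the generated `.nq` file, or raise an error if the file did not appear.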

### Configuring the SD Synthetic Data Generator

The SD Synthetic Data Generator requires a JSON configuration file with the following keys:

 * **"cses"** (integer): the number of CSEs that will be created
 * **"devices"** (integer): the number of devices that will be created. The maximum value is 101 (inclusive)
 * **"deviceToCSE"** (double): the probability that a CSE has each created device registered
 * **"buildings"** (integer): the number of buildings that will be created. The maximum value is 99 (inclusive)
 * **"deviceToBuilding"** (double): the probability that a device is randomly assigned to a building
 * **"output"** (string): the file where the dataset will be written. The file must have the .nq extension

A sample configuration file could look like this:

````
{
	"cses" : 1000,
	"devices" : 100, 
	"deviceToCSE" : 0.20,
	"buildings" : 5,
	"deviceToBuilding" : 0.9,
	"output" : "SD-dataset.nq"
}
````
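
Before running the generator, it can help to check a configuration against the constraints listed above. The sketch below is one possible check in Python, not something shipped with the generator: the function name `validate_config` is hypothetical, and two rules are assumptions rather than documented behavior, namely that counts are non-negative and that probabilities lie between 0 and 1.

```python
import json


def validate_config(config):
    """Return a list of problems found in a generator configuration dict."""
    problems = []

    # Integer keys with the upper bounds stated in the key list (None = no stated bound).
    for key, upper in (("cses", None), ("devices", 101), ("buildings", 99)):
        value = config.get(key)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{key}: expected a non-negative integer")  # assumption
        elif upper is not None and value > upper:
            problems.append(f"{key}: must be at most {upper}")

    # Probability keys: assumed to be numbers in the closed interval [0, 1].
    for key in ("deviceToCSE", "deviceToBuilding"):
        value = config.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append(f"{key}: expected a number between 0 and 1")

    output = config.get("output")
    if not isinstance(output, str) or not output.endswith(".nq"):
        problems.append("output: expected a file name ending in .nq")

    return problems
```

For example, `validate_config(json.load(open("config.json")))` returns an empty list for the sample configuration shown above.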

### Testing the generated data

After generating a sample dataset, load it into a triple store that supports named graphs. The synthetic semantic descriptor generator automatically assigns each CSE to a named graph. Once the dataset is loaded, a CSE can be queried individually by restricting the query to a specific named graph (i.e., CSE), and CSEs can be queried in groups by issuing the query against several named graphs.
To ease testing, the file [queries.zip]() in the current directory contains a set of queries that must return results for the generated synthetic data. The queries can be pasted directly into a SPARQL interface or sent through a REST API, depending on the triple store implementation.
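
As a rough illustration of querying per named graph, the Python sketch below first counts the statements in each named graph of an N-Quads file and then builds a SPARQL query restricted to a chosen set of graphs. Both function names are hypothetical, the tokenizer is a simplified regular expression rather than a full N-Quads parser, and the graph IRIs used by the generator are not documented here, so any IRIs passed in are placeholders.

```python
import re
from collections import Counter

# Simplified N-Quads term matcher: IRIs, blank nodes, and literals with an
# optional language tag or datatype. Not a complete implementation of the grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)


def quads_per_graph(lines):
    """Count statements per named graph; statements without a graph label are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = TERM.findall(line)
        if len(terms) == 4:  # subject, predicate, object, graph label
            counts[terms[3]] += 1
    return counts


def build_query(graph_iris, pattern="?s ?p ?o"):
    """Build a SPARQL SELECT that matches `pattern` only inside the given named graphs."""
    values = " ".join(graph_iris)
    return f"SELECT * WHERE {{ VALUES ?g {{ {values} }} GRAPH ?g {{ {pattern} }} }}"
```

Passing an open file such as `open("SD-dataset.nq", encoding="utf-8")` to `quads_per_graph` gives the graph labels present in the dataset. One label passed to `build_query` yields a query for a single CSE, and several labels yield a query over a group of CSEs.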