Andrea Cimmino's avatar
Andrea Cimmino committed
# SD Synthetic Data Generator


The SD Synthetic Data Generator creates a set of CSEs with several Semantic Descriptors registered in them. These Semantic Descriptors contain instances and data that conform to the ASD ontology:
![ASD Ontology](./img/ASD-Ontology.jpg)

The goal of the SD Synthetic Data Generator is to produce an experimental environment for the SD simulation task, specifically for the DSRD Simulator. 

## Quick start
To use the SD Synthetic Data Generator, follow these steps:

1. Configure the SD Synthetic Data Generator by editing the *config.json* file.
2. Run the SD Synthetic Data Generator with the following command:
   `````
   %$> bash run.sh
   `````
3. A file containing the generated data, named as set by the "output" key of *config.json*, will appear in the directory of the SD Synthetic Data Generator.
4. Load the data into any triple store that supports named graphs. Because the generator places each CSE in its own named graph, the triple store will then act as a Distributed Semantic Resource Directory.
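To check that the generated file is split into named graphs before loading it, you can count the statements per graph. The sketch below is a deliberately simplified, line-based N-Quads reader built on the Python standard library. It is an assumption on our part, not part of the generator, and it does not handle every N-Quads edge case (for example, comments after a statement):

```python
import re
from collections import Counter

# Simplified N-Quads term patterns (assumption: one statement per line,
# no trailing comments); this is not a complete N-Quads parser.
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
QUAD = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})"
    rf"\s*({IRI}|{BNODE})?\s*\.\s*$"
)


def count_quads_per_graph(lines):
    """Count statements per graph label; statements without one go to 'DEFAULT'."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        match = QUAD.match(stripped)
        if match is None:
            raise ValueError(f"unsupported N-Quads line: {line!r}")
        counts[match.group(4) or "DEFAULT"] += 1
    return counts
```

For example, passing an open file handle for `SD-dataset.nq` to `count_quads_per_graph` returns a `Counter` mapping each graph IRI (one per CSE) to its number of statements.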

### Configuring the SD Synthetic Data Generator

The SD Synthetic Data Generator requires a JSON configuration file with the following keys:

 * **"cses"** (integer) its value represents the number of CSEs that will be created
 * **"devices"** (integer) its value represents the number of devices that will be created. The maximum allowed value is 101 (inclusive)
 * **"deviceToCSE"** (double) its value represents the probability that a CSE will have each created device registered
 * **"buildings"** (integer) its value represents the number of buildings that will be created. The maximum allowed value is 99 (inclusive)
 * **"deviceToBuilding"** (double) its value represents the probability that a device will be assigned to a randomly chosen building
 * **"output"** (string) its value is the file where the dataset will be written. The file must have the .nq (N-Quads) extension
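The descriptions above do not say exactly how the two probabilities are applied. One plausible reading is shown below as a hypothetical sketch; `assign_devices` is not part of the generator. It treats every (CSE, device) pair as an independent trial with probability `deviceToCSE`, and gives each device a uniformly chosen building with probability `deviceToBuilding`:

```python
import random


def assign_devices(cses, devices, device_to_cse, buildings, device_to_building, seed=0):
    """Hypothetical model of the two probabilities; NOT the generator's actual code.

    Assumes each (CSE, device) pair is an independent trial with probability
    device_to_cse, and each device independently receives a uniformly random
    building with probability device_to_building (otherwise it has none).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    registrations = {
        cse: [d for d in range(devices) if rng.random() < device_to_cse]
        for cse in range(cses)
    }
    building_of = {
        d: (rng.randrange(buildings) if rng.random() < device_to_building else None)
        for d in range(devices)
    }
    return registrations, building_of
```

Under this assumption, the expected number of devices registered in each CSE is `devices * deviceToCSE`, which is 20 for the sample configuration that follows.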

A sample configuration file looks like this:

````json
{
	"cses" : 1000,
	"devices" : 100, 
	"deviceToCSE" : 0.20,
	"buildings" : 5,
	"deviceToBuilding" : 0.9,
	"output" : "SD-dataset.nq"
}
````
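Before running `run.sh`, you can check the constraints listed above with a small script. This is a hypothetical helper that does not ship with the generator. The maxima and the `.nq` extension come from the key descriptions; the lower bounds of 1 and the requirement that probabilities lie in [0, 1] are our assumptions:

```python
import json

# Expected keys and types, taken from the key descriptions above.
# "double" values are accepted as int or float (e.g. 1 or 1.0).
SCHEMA = {
    "cses": int,
    "devices": int,
    "deviceToCSE": float,
    "buildings": int,
    "deviceToBuilding": float,
    "output": str,
}


def _has_type(value, expected):
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_config(config):
    """Return a list of problems in a parsed configuration (empty list means valid)."""
    problems = [f"missing key: {k}" for k in SCHEMA if k not in config]
    problems += [
        f"{k} should be of type {t.__name__}"
        for k, t in SCHEMA.items()
        if k in config and not _has_type(config[k], t)
    ]
    if problems:
        return problems  # range checks below assume well-typed values
    if config["cses"] < 1:  # assumption: at least one CSE
        problems.append("cses must be at least 1")
    if not 1 <= config["devices"] <= 101:  # lower bound of 1 is an assumption
        problems.append("devices must be between 1 and 101")
    if not 1 <= config["buildings"] <= 99:  # lower bound of 1 is an assumption
        problems.append("buildings must be between 1 and 99")
    for key in ("deviceToCSE", "deviceToBuilding"):
        if not 0.0 <= config[key] <= 1.0:  # assumption: values are probabilities
            problems.append(f"{key} must be in [0, 1]")
    if not config["output"].endswith(".nq"):
        problems.append("output must have the .nq extension")
    return problems


def load_and_validate(path="config.json"):
    """Read a configuration file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_config(json.load(f))
```

`load_and_validate()` returns an empty list for the sample configuration above.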

### Testing the generated data

After generating a sample dataset, load it into a triple store that supports named graphs. The synthetic semantic descriptor generator automatically assigns each CSE to its own named graph. Once the dataset is loaded, a single CSE can be queried by restricting the query to its named graph (i.e., the CSE), or a group of CSEs can be queried by issuing the query against several named graphs.
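Restricting a query to one CSE or to a group of CSEs can be expressed in standard SPARQL 1.1 with a `VALUES` block that lists the graph IRIs, followed by a `GRAPH` pattern. The helper below is a sketch; `scope_to_graphs` is a hypothetical name, and the graph IRIs in the usage note are placeholders, since the real IRIs depend on the generated dataset:

```python
def scope_to_graphs(pattern, graph_iris, select="*"):
    """Wrap a SPARQL graph pattern so it only matches inside the given named graphs.

    One IRI queries a single CSE; several IRIs query a group of CSEs.
    The variable ?cseGraph is bound to the graph each result came from.
    """
    if not graph_iris:
        raise ValueError("at least one named graph IRI is required")
    values = " ".join(f"<{iri}>" for iri in graph_iris)
    return (
        f"SELECT {select} WHERE {{\n"
        f"  VALUES ?cseGraph {{ {values} }}\n"
        f"  GRAPH ?cseGraph {{ {pattern} }}\n"
        f"}}"
    )
```

For instance, `scope_to_graphs("?s ?p ?o .", ["http://example.org/cse/1", "http://example.org/cse/2"], select="?cseGraph ?s")` produces a query over two CSEs at once; passing a single IRI limits it to one CSE.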
To ease testing, the file [queries.zip]() in the current directory contains a set of queries that are expected to return results for the generated synthetic data. The queries can be pasted directly into a SPARQL interface or sent through a REST API, depending on the triple store implementation.
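For triple stores that expose a SPARQL 1.1 Protocol endpoint, a query can be sent over HTTP as a POST request with the `application/sparql-query` content type. The sketch below uses only the Python standard library; the function names are hypothetical, and the endpoint URL in the usage note is a placeholder, because each triple store exposes its own path:

```python
import json
import urllib.request


def build_sparql_request(endpoint, query):
    """Build (without sending) a SPARQL 1.1 Protocol query request using POST."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint, query, timeout=30):
    """Send a SELECT or ASK query and return the SPARQL JSON results as a dict."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `run_query("http://localhost:3030/sd/sparql", query)` would post one of the test queries to a store listening at that (assumed) address. CONSTRUCT or DESCRIBE queries return RDF rather than JSON results, so they would need a different `Accept` header.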