# SD Synthetic Data Generator
The SD Synthetic Data Generator creates a set of CSEs with several registered Semantic Descriptors. These Semantic Descriptors contain instances and data according to the ASD ontology:
![ASD Ontology](./img/ASD-Ontology.jpg)
The goal of the SD Synthetic Data Generator is to produce an experimental environment for the SD simulation task, specifically for the DSRD Simulator.
## Quick start
To use the SD Synthetic Data Generator, follow these steps:
1. Configure the SD Synthetic Data Generator by modifying the *config.json* file
2. Run the SD Synthetic Data Generator using the following command:
`````
%$> bash run.sh
`````
3. A file with the data will appear in the directory of the SD Synthetic Data Generator
4. Load the data into the DSRD Simulator
### Configuring the SD Synthetic Data Generator
The SD Synthetic Data Generator requires a JSON configuration file with the following keys:
* **"cses"** (integer): the number of CSEs that will be created
* **"devices"** (integer): the number of devices that will be created. The maximum allowed value is 101 (inclusive)
* **"deviceToCSE"** (double): the probability with which a CSE will have each created device registered
* **"buildings"** (integer): the number of buildings that will be created. The maximum allowed value is 99 (inclusive)
* **"deviceToBuilding"** (double): the probability with which a device will be randomly assigned to a building
* **"output"** (string): the file where the dataset will be written. Note that the file must have the .nq extension
A sample configuration file looks like this:
````
{
"cses" : 1000,
"devices" : 100,
"deviceToCSE" : 0.20,
"buildings" : 5,
"deviceToBuilding" : 0.9,
"output" : "SD-dataset.nq"
}
````
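Before running the generator, it can help to check a configuration against the constraints listed above. The following is a minimal sketch in Python, not part of the generator itself: the function name `validate_config` is hypothetical, and the lower bound of 0 for the integer keys and the [0, 1] range for the probabilities are assumptions (only the maximum values and the .nq extension are stated above).

```python
import json

# Maximum values as documented in the key descriptions (inclusive).
MAX_DEVICES = 101
MAX_BUILDINGS = 99


def validate_config(config):
    """Return a list of problems found in a parsed config (empty if none).

    Hypothetical helper: the generator may apply different or extra checks.
    """
    problems = []

    # Integer keys; None means no documented maximum.
    limits = {"cses": None, "devices": MAX_DEVICES, "buildings": MAX_BUILDINGS}
    for key, maximum in limits.items():
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key}: expected a non-negative integer")
        elif maximum is not None and value > maximum:
            problems.append(f"{key}: maximum is {maximum} (inclusive)")

    # Probability keys; the [0, 1] range is an assumption.
    for key in ("deviceToCSE", "deviceToBuilding"):
        value = config.get(key)
        if (isinstance(value, bool)
                or not isinstance(value, (int, float))
                or not 0 <= value <= 1):
            problems.append(f"{key}: expected a probability between 0 and 1")

    # Output file name must end in .nq.
    output = config.get("output")
    if not isinstance(output, str) or not output.endswith(".nq"):
        problems.append("output: expected a file name ending in .nq")

    return problems


# Check the sample configuration shown above.
sample = json.loads("""
{
  "cses": 1000,
  "devices": 100,
  "deviceToCSE": 0.20,
  "buildings": 5,
  "deviceToBuilding": 0.9,
  "output": "SD-dataset.nq"
}
""")
print(validate_config(sample))  # prints []
```

To check a real file, the same function could be applied to the result of `json.load` on an open *config.json*.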
### Testing the generated data
After generating a sample dataset, load it into a triple store that supports named graphs. The synthetic semantic descriptor generator automatically assigns each CSE to a named graph. Once the dataset is loaded, CSEs can be queried individually by limiting the query to a specific named graph (i.e., CSE), or in groups by issuing the query against several named graphs.
To ease testing, the file [queries.zip]() in the current directory contains a set of queries that must return results for the generated synthetic data. The queries can be pasted directly into a SPARQL interface or sent through a REST API, depending on the triple store implementation.
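As a quick sanity check before loading the dataset into a triple store, the number of statements per named graph (i.e., per CSE) can be counted directly from the .nq file. The sketch below uses only the Python standard library and is not a full N-Quads parser: it assumes one statement per line and that every statement carries a graph label as its last term. The function name `count_quads_per_graph` and the `example.org` IRIs are hypothetical.

```python
from collections import Counter


def count_quads_per_graph(lines):
    """Count statements per graph label in an iterable of N-Quads lines.

    Assumes each non-empty, non-comment line is one statement whose last
    term before the terminating "." is the graph label.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Drop the terminating "." and any whitespace before it.
        body = line.rstrip(".").rstrip()
        # The graph label is the last whitespace-separated term.
        graph = body.rsplit(None, 1)[-1]
        counts[graph] += 1
    return counts


# Hypothetical sample data; a real run would iterate over the output file,
# e.g. open("SD-dataset.nq", encoding="utf-8").
sample_lines = [
    '<http://example.org/d1> <http://example.org/p> "a b" <http://example.org/cse1> .',
    '<http://example.org/d2> <http://example.org/p> <http://example.org/o> <http://example.org/cse1> .',
    '<http://example.org/d3> <http://example.org/p> "x" <http://example.org/cse2> .',
]
for graph, total in count_quads_per_graph(sample_lines).items():
    print(graph, total)
```

With a "deviceToCSE" probability well above zero, each generated CSE would be expected to show up with a non-zero count. A graph that is missing from the result is a hint to re-check the configuration.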