
# STF584

ETSI STF584 - oneM2M architecture for AI

This repository contains the source code for ETSI STF584 Use Case 3. It uses the OpenMTC middleware, which is compliant with the
 oneM2M standard.
 
It consists of a Common Service Function (CSF), which is included in a Middle Node Common Service
 Entity (CSE), and two applications: one generates data and the other receives that data after it has been
 processed by our CSF.


## Installation and startup

Run the [build.sh](build.sh) shell script. It downloads the OpenMTC source code from its repository, applies a fix to some files
 that have since been affected by changes in their dependencies, builds OpenMTC's Docker images and, finally, removes the
 downloaded files, as they are no longer necessary.

Once the Docker images have been built, start the services defined in [docker-compose.yml](docker-compose.yml) with `docker-compose up -d`. The gateway
 and the two applications will then run in the background, each in a separate container. The logs can be viewed with the
 following commands:
 * Gateway: `docker-compose logs -f gateway`
 * Device Application: `docker-compose logs -f device`
 * NLP Application: `docker-compose logs -f nlp`

## Common Service Function (CSF)

The CSF is located in the [clean_text_handler](clean_text_handler) directory, which the docker-compose file maps to the correct
 location inside the gateway container. The main code is in the [\_\_init\_\_.py](clean_text_handler/__init__.py)
 file, while the [utils.py](clean_text_handler/utils.py) file
 contains helper functions for the text-cleaning process.
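
To give an idea of what such a text-cleaning step might look like, here is a minimal sketch. The function name `clean_text` and the abbreviated stopword set are hypothetical and chosen only for illustration; the actual helpers in `utils.py` may work differently (the `stopwords_langs` option in the gateway configuration suggests that language-specific stopword lists are used).

```python
import re

# Hypothetical, abbreviated stopword list for illustration only; the real
# CSF presumably uses a full list for each configured language.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase the text, strip URLs and non-letter characters, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = [token for token in text.split() if token not in stopwords]
    return " ".join(tokens)


print(clean_text("Fire in the forest! See https://example.com"))
```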

The configuration file for the gateway is the [config-gateway.json](config-gateway.json) file. Note the following
 entry, which is required to enable the new CSF:
 
```json
{
  ...,
  "plugins": {
    ...,
    "openmtc_cse": [
      ...,
      {
        "name": "CleanTextHandler",
        "package": "openmtc_cse.plugins.clean_text_handler",
        "disabled": false,
        "config": {
          "labels_input": [
            "to_clean"
          ],
          "labels_output": [
            "cleaned"
          ],
          "stopwords_langs": [
            "english"
          ]
        }
      }
    ]
  }
}
```
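
One plausible reading of these options is that the CSF processes resources carrying one of the `labels_input` labels and tags its results with the `labels_output` labels. The sketch below illustrates that reading only; it is an assumption rather than a description of the actual plugin code, and the function names `should_process` and `output_labels` are hypothetical.

```python
# Values copied from the "config" object of the CleanTextHandler entry above.
plugin_config = {
    "labels_input": ["to_clean"],
    "labels_output": ["cleaned"],
    "stopwords_langs": ["english"],
}


def should_process(resource_labels, config):
    """Return True if the resource carries at least one configured input label."""
    return bool(set(resource_labels) & set(config["labels_input"]))


def output_labels(config):
    """Return the labels to attach to the resource holding the cleaned text."""
    return list(config["labels_output"])


print(should_process(["to_clean", "sensor"], plugin_config))
print(output_labels(plugin_config))
```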

## Applications

### Device Application

The device application code is in the [device_ae.py](Device_AE/src/device_ae/device_ae.py) file. It starts by
 attempting to load a [CSV file](Device_AE/example.csv) with example messages and then registers the
 resources the application needs to run. The data source can be changed inside the `load_data` method,
 and the trigger for publishing data can be changed inside the `get_data` method.
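
As a rough illustration of swapping in a different data source, the sketch below reads messages from a CSV file with Python's standard `csv` module. The function name `load_messages` and the `text` column name are assumptions made for this example; check the real `load_data` method and the header of `example.csv` before adapting it.

```python
import csv
import io


def load_messages(csv_file, column="text"):
    """Return the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]


# In-memory stand-in for a CSV file; the "text" column is an assumed name.
sample = io.StringIO(
    "id,text\n"
    "1,Forest fire near the village\n"
    "2,\n"
    "3,What a sunny day\n"
)
messages = load_messages(sample)
print(messages)
```

With a real file, the same function could be called as `load_messages(open(path, newline="", encoding="utf-8"))`.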

The configuration file for this application is present [here](Device_AE/config.json).

### NLP Application

The NLP application code is in the [nlp_ae.py](NLP_AE/src/nlp_ae/nlp_ae.py) file. The [models](NLP_AE/models)
 and [tokenizers](NLP_AE/tokenizers) folders contain the pre-trained model for occurrence detection. The application
 starts by loading the model and then registers the resources it needs to run. The classification and output
 performed when a message is received are implemented inside the `handle_processed_text` method.

The configuration file for this application is present [here](NLP_AE/config.json).

## Credits
Ubiwhere \
Dataset: The project was based on the dataset provided in the Kaggle competition [Real or Not? NLP with Disaster Tweets](https://www.kaggle.com/c/nlp-getting-started)