This repo contains code for the paper:

**Is Everything in Order? A Simple Way to Order Sentences**

First, create the dataset splits and put them in the `./data` folder.
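
The README does not describe how the splits are built, so the following is only a minimal sketch of one way to do it. Only `test.jsonl` is named elsewhere in this README. The `train`/`val` file names, the 80/10/10 ratio, the helper names `split_records` and `write_splits`, and the record schema (left opaque here) are all assumptions; check `finetune_bart.sh` for what the training script actually expects.

```python
import json
import random
from pathlib import Path


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records deterministically and cut them into train/val/test.

    The fractions and the split names are assumptions, not values from the paper.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "test": shuffled[:n_test],
        "val": shuffled[n_test:n_test + n_val],
        "train": shuffled[n_test + n_val:],
    }


def write_splits(splits, out_dir):
    """Write each split to <out_dir>/<name>.jsonl, one JSON object per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
```

For example, `write_splits(split_records(records), "data/arxiv-abs")` would produce `data/arxiv-abs/test.jsonl`, matching the `DATA_DIR` layout used in the generation step.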

### Train the ReBART model:

To train the ReBART model, run the following command:

```
bash finetune_bart.sh
```
You can specify the hyper-parameters inside the bash script.

### Generate

To generate the outputs (position markers) using the trained model, run the following commands:

```
export DATA_DIR="data/arxiv-abs"
export MODEL_PATH="outputs/reorder_exp/bart-large_arxiv"
python source/generate.py \
    --in_file "$DATA_DIR/test.jsonl" \
    --out_file "$MODEL_PATH/test_bart_greedy.jsonl" \
    --model_name_or_path "$MODEL_PATH" \
    --beams 1 \
    --max_length 40 \
    --task index_with_sep \
    --device 0
```

### Evaluate

To evaluate the model and get the performance metrics, run:

```
python eval/evaluation.py --output_path $MODEL_PATH/test_bart_greedy.jsonl
```
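
This README does not list which metrics the script reports. As a rough illustration only, the sketch below computes three measures that are commonly reported for sentence ordering: Kendall's tau, positional accuracy, and perfect match. The function names are hypothetical and this is not the code in `eval/evaluation.py`. It assumes `pred` and `gold` are orderings of the same sentence indices with no repeats.

```python
from itertools import combinations


def kendall_tau(pred, gold):
    """Kendall's tau between two orderings of the same items (range -1 to 1)."""
    n = len(gold)
    if n < 2:
        return 1.0
    rank = {item: i for i, item in enumerate(pred)}
    # Count item pairs whose relative order in pred is the reverse of gold.
    inversions = sum(1 for a, b in combinations(gold, 2) if rank[a] > rank[b])
    return 1.0 - 4.0 * inversions / (n * (n - 1))


def positional_accuracy(pred, gold):
    """Fraction of positions where pred places the same item as gold."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def perfect_match(pred, gold):
    """1.0 if the whole predicted order equals the gold order, else 0.0."""
    return float(list(pred) == list(gold))
```

Averaging `perfect_match` over a test set gives a perfect match ratio. Averaging the other two gives corpus-level tau and accuracy.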



#### Implementation Details:
* Number of BART-L parameters: 400M
* Runtime per epoch for each dataset:
    * ROCStories: 1 hour
    * NIPS: 15 minutes
    * AAN: 45 minutes
    * SIND: 1.5 hours
    * arXiv: 24 hours
    * NSF: 4-5 hours
    * Wiki Movie: 1.5 hours

Links to the datasets:

* arXiv - https://drive.google.com/drive/folders/0B-mnK8kniGAiNVB6WTQ4bmdyamc
* Wiki Movie Plots - https://www.kaggle.com/jrobischon/wikipedia-movie-plots
* SIND - http://visionandlanguage.net/VIST/dataset.html
* NSF - https://archive.ics.uci.edu/ml/datasets/NSF+Research+Award+Abstracts+1990-2003
* ROCStories - https://www.cs.rochester.edu/nlp/rocstories/
* NeurIPS - https://www.kaggle.com/benhamner/nips-papers
* AAN - https://github.com/EagleW/ACL_titles_abstracts_dataset