restructured + cleansed code
- data/c4_realnewslike_big/c4-train.00000-of-00512.json.gz: 0 additions, 0 deletions
- data/c4_realnewslike_big/c4-train.00001-of-00512.json.gz: 0 additions, 0 deletions
- data/processed/processed_0.jsonl: 0 additions, 0 deletions
- data/processed/processed_1.jsonl: 0 additions, 0 deletions
- data/processed/shard_0.jsonl: 0 additions, 0 deletions
- data/processed/shard_1.jsonl: 0 additions, 0 deletions
- data_loader.py: 1 addition, 96 deletions
- main.py: 53 additions, 87 deletions
- preprocess_data.py: 0 additions, 98 deletions
- shard_processor.py: 25 additions, 37 deletions
- text_preprocessor.py: 0 additions, 30 deletions