Joel Dag
scalable-c4-preprocessing
Repository graph
5d71fe60fdcbf972d8d208f2baccd1430d289dc0
Branch: assignment-template (default, protected)
Commit graph (date axis labels: 16 Mar, 15, 12, 16 Feb, 13)
minor readme adaption (branch: assignment-template)
removed setup.sh
added comments
adapted README.md
final adaptations
minor changes, added TODOs
adapted readme + shell scripts + requirements
restructured + cleansed code
Add master file list for consistent input distribution across processes
added pipeline performance measurement logs
working pipeline for parallel processing, added modular processing logic
efficient multiprocess data loader, parallel streaming input files line by line
initial preprocessing logic for cc_news multi-process preprocessing with sharding + lock
updating sample readme
updating sample readme
updating comments
updating preprocess_dataset.sh
adding assignment template files