To train our models well, we often need data from our clients. These files can be very large, ranging from 5 GB to several terabytes. To transfer such large files safely and efficiently, we have built a system that uploads them in smaller chunks, which keeps the upload process smooth and reliable.

1. Chunked Upload for Improved Reliability and Resumability

Rather than attempting to upload the entire file at once, which can lead to connection timeouts and data loss, we divide the file into chunks of roughly 5 MB. Each individual transfer is therefore smaller and more stable, and the upload becomes resumable: if the connection is lost or an error occurs, the process simply continues from the last completed chunk instead of starting over from the beginning.
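
To illustrate the idea, here is a minimal sketch of how a file could be split into ~5 MB chunks in the browser. The chunk size matches the figure above; the type and helper names are illustrative, not our exact implementation.

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // ~5 MB per chunk

interface FileChunk {
  index: number; // position of the chunk within the file
  blob: Blob;    // the chunk's bytes, sliced from the original File
}

function splitIntoChunks(file: File): FileChunk[] {
  const chunks: FileChunk[] = [];
  for (let offset = 0, index = 0; offset < file.size; offset += CHUNK_SIZE, index++) {
    // Blob.slice is cheap: it creates a view on the file, not a copy of the data.
    chunks.push({ index, blob: file.slice(offset, offset + CHUNK_SIZE) });
  }
  return chunks;
}
```

Because each chunk is addressed by its index, the uploader only needs to remember which indices have already been confirmed in order to resume.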

2. Secure Transfer Using Signed S3 URLs

To ensure that only authorized parties can access the uploaded files, we transfer the chunks to an S3 bucket using signed S3 URLs. Each URL carries a signature that verifies the authenticity of that specific request, so the chunk can be uploaded directly to the designated bucket without exposing long-lived credentials to the browser. This protects the confidentiality and integrity of the uploaded files.
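
A hedged sketch of this step is below, reusing the FileChunk type from the previous snippet. The backend endpoint and its response shape are assumptions made for illustration; the actual API may look different.

```typescript
async function uploadChunk(uploadId: string, chunk: FileChunk): Promise<void> {
  // Ask the backend for a short-lived signed URL for this specific chunk
  // (hypothetical endpoint, shown only to illustrate the flow).
  const response = await fetch(`/api/uploads/${uploadId}/signed-url?part=${chunk.index}`);
  const { url } = await response.json();

  // PUT the chunk directly to S3; the signature embedded in the URL
  // authorizes exactly this request, so no credentials live in the browser.
  const put = await fetch(url, { method: "PUT", body: chunk.blob });
  if (!put.ok) {
    throw new Error(`Chunk ${chunk.index} failed with status ${put.status}`);
  }
}
```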

3. Error Recovery System to Minimize Data Loss

Despite these measures, issues can still arise. To minimize the potential for data loss in those situations, we have implemented an error recovery system that runs a cleanup pass at the end of each upload session to recover any chunks that were lost along the way. This makes the process far more resilient and ensures that the uploaded data is complete.
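
One way such a cleanup pass can work is sketched below: compare the chunks the server confirms against the chunks we expected to send, and upload anything missing again. The parts-listing endpoint is an assumption for illustration, and the functions reuse the earlier sketches.

```typescript
async function recoverLostChunks(uploadId: string, chunks: FileChunk[]): Promise<void> {
  // Ask the backend which part indices actually arrived in S3
  // (hypothetical endpoint returning an array of confirmed indices).
  const res = await fetch(`/api/uploads/${uploadId}/parts`);
  const uploaded: number[] = await res.json();
  const uploadedSet = new Set(uploaded);

  // Any chunk we sliced locally but that never arrived gets uploaded again.
  const missing = chunks.filter((chunk) => !uploadedSet.has(chunk.index));
  for (const chunk of missing) {
    await uploadChunk(uploadId, chunk);
  }
}
```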

4. User-Friendly Experience with Automatic Resuming and Local Storage

We understand that uploading a very large file can be time-consuming, sometimes taking a full night or more. To make the process seamless, we have added several features. If an upload is interrupted, it resumes automatically rather than requiring the user to restart it manually. And because we store the file in a local database on the user's computer, the upload can pick up where it left off even after a page refresh, a browser restart, or a computer restart.
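
As a rough sketch of how a browser can persist the file for resuming, the snippet below stores the File object in IndexedDB, keyed by the upload session. The database and store names are illustrative assumptions, not our exact schema.

```typescript
function saveFileForResume(uploadId: string, file: File): Promise<void> {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open("pending-uploads", 1);
    // Create the object store the first time the database is opened.
    open.onupgradeneeded = () => open.result.createObjectStore("files");
    open.onsuccess = () => {
      const db = open.result;
      const tx = db.transaction("files", "readwrite");
      // Store the File itself, keyed by the upload session id, so it can
      // be read back and re-chunked after a refresh or restart.
      tx.objectStore("files").put(file, uploadId);
      tx.oncomplete = () => { db.close(); resolve(); };
      tx.onerror = () => reject(tx.error);
    };
    open.onerror = () => reject(open.error);
  });
}
```

On startup, the uploader can check this database for a pending file, compare it against the chunks the server has already confirmed, and continue from there without any action from the user.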

In summary, our large file upload system is designed to provide a secure and reliable experience for our users, while also ensuring the efficient and accurate handling of large files. By implementing a chunked upload system, secure transfer using signed S3 URLs, and an error recovery system, we are able to minimize the potential for issues and ensure a smooth and successful upload process.
