To train our models well, we often need data from our clients. These files can be very large, ranging from 5 GB to several terabytes. To transfer such large files safely and efficiently, we have built a system that uploads them in smaller chunks, which keeps the upload process smooth and reliable.

1. Chunked Upload for Improved Reliability and Resumability

Rather than attempting to upload the entire file at once, which can lead to connection timeouts and data loss, we divide the file into chunks of roughly 5 MB. Each individual transfer is therefore smaller and more stable, and the upload becomes resumable: if the connection is lost or an error occurs, the process simply continues from the last completed chunk instead of starting over from the beginning.
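
To illustrate the idea, here is a minimal sketch of how a file could be split into ~5 MB chunks in the browser. The chunk size matches the figure above; the type and helper names are illustrative, not our exact implementation.

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // ~5 MB per chunk

interface FileChunk {
  index: number; // position of the chunk within the file
  blob: Blob;    // the chunk's bytes, sliced from the original File
}

function splitIntoChunks(file: File): FileChunk[] {
  const chunks: FileChunk[] = [];
  for (let offset = 0, index = 0; offset < file.size; offset += CHUNK_SIZE, index++) {
    // Blob.slice is cheap: it creates a view on the file, not a copy of the data.
    chunks.push({ index, blob: file.slice(offset, offset + CHUNK_SIZE) });
  }
  return chunks;
}
```

Because each chunk is addressed by its index, the uploader only needs to remember which indices have already been confirmed in order to resume.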

2. Secure Transfer Using Signed S3 URLs

To ensure that only authorized parties can access the uploaded files, we transfer the chunks to an S3 bucket using signed S3 URLs. Each URL carries a signature that verifies the authenticity of that specific request, so the chunk can be uploaded directly to the designated bucket without exposing long-lived credentials to the browser. This protects the confidentiality and integrity of the uploaded files.
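
A hedged sketch of this step is below, reusing the FileChunk type from the previous snippet. The backend endpoint and its response shape are assumptions made for illustration; the actual API may look different.

```typescript
async function uploadChunk(uploadId: string, chunk: FileChunk): Promise<void> {
  // Ask the backend for a short-lived signed URL for this specific chunk
  // (hypothetical endpoint, shown only to illustrate the flow).
  const response = await fetch(`/api/uploads/${uploadId}/signed-url?part=${chunk.index}`);
  const { url } = await response.json();

  // PUT the chunk directly to S3; the signature embedded in the URL
  // authorizes exactly this request, so no credentials live in the browser.
  const put = await fetch(url, { method: "PUT", body: chunk.blob });
  if (!put.ok) {
    throw new Error(`Chunk ${chunk.index} failed with status ${put.status}`);
  }
}
```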

3. Error Recovery System to Minimize Data Loss

Despite these measures, issues can still arise. To minimize the potential for data loss in those situations, we have implemented an error recovery system that runs a cleanup pass at the end of each upload session to recover any chunks that were lost along the way. This makes the process far more resilient and ensures that the uploaded data is complete.
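
One way such a cleanup pass can work is sketched below: compare the chunks the server confirms against the chunks we expected to send, and upload anything missing again. The parts-listing endpoint is an assumption for illustration, and the functions reuse the earlier sketches.

```typescript
async function recoverLostChunks(uploadId: string, chunks: FileChunk[]): Promise<void> {
  // Ask the backend which part indices actually arrived in S3
  // (hypothetical endpoint returning an array of confirmed indices).
  const res = await fetch(`/api/uploads/${uploadId}/parts`);
  const uploaded: number[] = await res.json();
  const uploadedSet = new Set(uploaded);

  // Any chunk we sliced locally but that never arrived gets uploaded again.
  const missing = chunks.filter((chunk) => !uploadedSet.has(chunk.index));
  for (const chunk of missing) {
    await uploadChunk(uploadId, chunk);
  }
}
```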

4. User-Friendly Experience with Automatic Resuming and Local Storage

We understand that uploading a very large file can be time-consuming, sometimes taking a full night or more. To make the process seamless, we have added several features. If an upload is interrupted, it resumes automatically rather than requiring the user to restart it manually. And because we store the file in a local database on the user's computer, the upload can pick up where it left off even after a page refresh, a browser restart, or a computer restart.
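
As a rough sketch of how a browser can persist the file for resuming, the snippet below stores the File object in IndexedDB, keyed by the upload session. The database and store names are illustrative assumptions, not our exact schema.

```typescript
function saveFileForResume(uploadId: string, file: File): Promise<void> {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open("pending-uploads", 1);
    // Create the object store the first time the database is opened.
    open.onupgradeneeded = () => open.result.createObjectStore("files");
    open.onsuccess = () => {
      const db = open.result;
      const tx = db.transaction("files", "readwrite");
      // Store the File itself, keyed by the upload session id, so it can
      // be read back and re-chunked after a refresh or restart.
      tx.objectStore("files").put(file, uploadId);
      tx.oncomplete = () => { db.close(); resolve(); };
      tx.onerror = () => reject(tx.error);
    };
    open.onerror = () => reject(open.error);
  });
}
```

On startup, the uploader can check this database for a pending file, compare it against the chunks the server has already confirmed, and continue from there without any action from the user.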

In summary, our large file upload system is designed to provide a secure and reliable experience for our users, while also ensuring the efficient and accurate handling of large files. By implementing a chunked upload system, secure transfer using signed S3 URLs, and an error recovery system, we are able to minimize the potential for issues and ensure a smooth and successful upload process.
