Data guidelines

How to make the transfer of your data to Chattermill as seamless as possible 🚀

Written by Billie Bradley
Updated over a week ago

Chattermill strives to uphold its mission of taking customer centricity to the next level by providing the best experience to our customers. To ensure the best experience possible, it is important that data and file formatting are kept consistent at all times to avoid data flow interruptions or inconsistencies.

Below are the most common principles for working with data at Chattermill. Following them helps ensure a robust data pipeline and accurate insights.

Example CSV Sheet

For a hands-on example of how to format a spreadsheet, please see this Google Sheet: https://docs.google.com/spreadsheets/d/1682BmBHrnvKnM1EdlPH8lkziysBpn9awYt0MhFUS8A0/edit?ts=5f3fabdf#gid=0

Data CSV Uploading Instructions

Data for a single dataset should be sent in a single CSV 

  • To provide timely insights, we ingest one CSV per dataset so that the upload process can be automated

  • Please join any data sources/tables on your side and send us a single CSV containing the combined data

  • Please ensure that individual files are no larger than 300MB

  • Kindly do not upload zipped files to your AWS S3 bucket, as Chattermill ingests files directly from S3
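
As a rough illustration of the points above, the sketch below joins two sources into one CSV and checks the result against the 300MB limit. The function names, column names and sample rows are hypothetical, and the limit is taken here as 300 * 1024 * 1024 bytes, which is an assumption.

```python
import csv
import io

# Assumption: "300MB" is interpreted as 300 * 1024 * 1024 bytes.
MAX_BYTES = 300 * 1024 * 1024


def join_rows(responses, customers, key):
    """Left-join each response row with its matching customer row on `key`."""
    customers_by_key = {row[key]: row for row in customers}
    joined = []
    for row in responses:
        extra = customers_by_key.get(row[key], {})
        joined.append({**extra, **row})
    return joined


def to_csv_bytes(rows, fieldnames):
    """Serialise rows into a single UTF-8 encoded CSV payload."""
    buffer = io.StringIO(newline="")
    # restval fills columns that are missing from a row with an empty string.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical sample data standing in for two tables on your side.
responses = [
    {"response_id": "r1", "customer_id": "c1", "comment": "Great"},
    {"response_id": "r2", "customer_id": "c2", "comment": "Slow delivery"},
]
customers = [{"customer_id": "c1", "country": "UK"}]

joined = join_rows(responses, customers, "customer_id")
payload = to_csv_bytes(joined, ["response_id", "customer_id", "country", "comment"])
within_limit = len(payload) <= MAX_BYTES
```

A payload over the limit would need to be split into several files before upload.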

Ensure all uploads are in CSV format and are UTF-8 encoded  

  • Character encoding tells the computer how to translate bits (1s and 0s) into human-readable characters. UTF-8 is the most widely used encoding

  • To check for incorrect character encoding, please open the CSV in a text editor (such as Notepad or Sublime Text), and check that special characters such as ', &, é and Ü are displayed correctly

  • If a CSV is sent with incorrect encoding, some of the characters may not be displayed correctly. This makes the text difficult to read and can affect our ability to detect the topics and sentiment of your comments

  • A common cause of incorrect encoding is a user opening a CSV in a tool such as Microsoft Excel, making changes, and overwriting the original file 
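
The manual check above can also be approximated in code. The sketch below tests whether a file's bytes decode as UTF-8 and, if the original encoding is known, re-encodes them. The helper names are hypothetical, and cp1252 is only an assumed example of a source encoding that a spreadsheet tool might produce.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


text = "comment\nCafé was über nice\n"
good = text.encode("utf-8")
bad = text.encode("cp1252")  # assumed example of a non-UTF-8 export

good_ok = is_valid_utf8(good)
bad_ok = is_valid_utf8(bad)
fixed = reencode_to_utf8(bad, "cp1252")
```

For a real file, the bytes could be read with open(path, "rb").read(). A clean decode does not prove that the characters are the intended ones, so the visual check is still worthwhile.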

Consistent, descriptive file names 📝

All file names should include:

  • Company Name

  • Data Source

  • Data Type

  • Timestamp / Date / Month

Correct example: chattermill_survey_nps_2019_11_01.csv
Incorrect example: file_291283.csv

Please maintain a consistent naming convention for all file uploads. If a file name is changed it could result in a missed upload.
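
One possible way to keep names consistent is to generate them from the four components rather than typing them by hand. The sketch below is only an illustration: the function name and the regular expression are assumptions modelled on the correct example above, not a naming rule enforced by Chattermill.

```python
import re
from datetime import date

# Assumed pattern: company_source_type_YYYY_MM_DD.csv, all lowercase.
FILE_NAME_PATTERN = re.compile(
    r"^[a-z0-9]+_[a-z0-9]+_[a-z0-9]+_\d{4}_\d{2}_\d{2}\.csv$"
)


def build_file_name(company, source, data_type, day):
    """Build a file name from company, data source, data type and date."""
    parts = [company, source, data_type]
    slug = "_".join(p.strip().lower().replace(" ", "") for p in parts)
    return f"{slug}_{day:%Y_%m_%d}.csv"


def looks_consistent(file_name):
    """Return True if the file name follows the assumed pattern."""
    return FILE_NAME_PATTERN.match(file_name) is not None


name = build_file_name("Chattermill", "survey", "nps", date(2019, 11, 1))
```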


Headers should contain only alphanumeric characters and underscores, be unique and descriptive, and remain consistent across uploads

  • Each header must be distinct from all other headers in the upload, otherwise the data will not be uploaded

  • Please ensure headers are descriptive so it is clear how the data should be displayed in the app. If possible, please send a codebook including descriptions of each header

  • The header should be alphanumeric, descriptive, and in snake_case (lowercase words separated by underscores):

Correct example: product_purchase_count
Incorrect example: product purchase count

  • Please do not include apostrophes in the header row

  • Our CSV imports are based on mapping header strings to certain fields which you can see in the app. If a column header changes, we will not be able to upload the data in that column
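
The header rules above lend themselves to an automated check before upload. The following is a minimal sketch, assuming snake_case means lowercase letters and digits separated by single underscores; the function name and message wording are hypothetical.

```python
import re

# Assumed definition of snake_case: lowercase alphanumeric words joined by "_".
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def header_problems(headers):
    """Return a list of human-readable problems found in a header row."""
    problems = []
    seen = set()
    for header in headers:
        if header in seen:
            problems.append(f"duplicate header: {header!r}")
        seen.add(header)
        if not SNAKE_CASE.match(header):
            problems.append(f"not snake_case: {header!r}")
    return problems


ok = header_problems(["response_id", "product_purchase_count"])
bad = header_problems(["product purchase count", "score", "score"])
```

An empty list means the header row passed both checks. Comparing the header row against the previous upload would additionally catch renamed columns.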

All responses must have a Unique Response ID

  • Please clearly identify the Unique Response ID field

  • Preferably, this is the index/primary key for each response from your database or the unique ID stored by your survey provider

  • This Response ID must be unique across the entire history of the dataset including previous files

  • Please check for any duplicates of this Response ID within your files before sending them to us
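
A duplicate check along these lines could look like the sketch below. It assumes you keep a record of the Response IDs sent in previous files; the function name and sample IDs are hypothetical.

```python
from collections import Counter


def duplicate_ids(new_ids, previous_ids=()):
    """Return IDs repeated within the new file or already sent earlier."""
    counts = Counter(new_ids)
    repeated_in_file = {rid for rid, n in counts.items() if n > 1}
    already_sent = set(new_ids) & set(previous_ids)
    return sorted(repeated_in_file | already_sent)


# "r2" appears twice in the new file; "r3" was already sent in an older file.
dupes = duplicate_ids(["r1", "r2", "r2", "r3"], previous_ids=["r0", "r3"])
```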

Date formatting must be consistent 📅

  • All dates within a CSV must use the same date format. This format should remain the same for all subsequent CSVs

  • Ideally, all dates should be in YYYY-MM-DD hh:mm:ss

  • If you are unable to send in the above format, please let us know so we can discuss the most appropriate alternative. Likewise, please notify us in advance if you are planning to make any changes to the date format

  • If the date format were to change, the date field may be stored incorrectly, or the response may not be uploaded at all

Correct format: 2019-10-11 13:14:15
Incorrect format: 10/11/19 01:14:15 PM
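
To illustrate, the sketch below checks a value against YYYY-MM-DD hh:mm:ss and converts a value from another known format. The function names are hypothetical, and the month-first reading of the sample input is an assumption made purely for the example.

```python
from datetime import datetime

# strptime/strftime equivalent of YYYY-MM-DD hh:mm:ss (24-hour clock).
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_preferred_format(value):
    """Return True if the value is exactly in YYYY-MM-DD hh:mm:ss form."""
    try:
        parsed = datetime.strptime(value, DATE_FORMAT)
    except ValueError:
        return False
    # Round-trip so that values without zero padding are also rejected.
    return parsed.strftime(DATE_FORMAT) == value


def normalise(value, source_format):
    """Convert a date string from a known source format to the preferred one."""
    return datetime.strptime(value, source_format).strftime(DATE_FORMAT)


# Assumption: the source system writes month/day/two-digit year, 12-hour clock.
converted = normalise("10/11/19 01:14:15 PM", "%m/%d/%y %I:%M:%S %p")
```

Because a value such as 10/11/19 can be read as either October 11 or 10 November, the source format has to be known in advance rather than guessed.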

Comment columns are clearly identified 💬

  • Please clearly identify all comment fields you wish to include and the question they are related to

  • Where multiple comment columns exist in your data please identify all question/comment pairs you wish to include from the data

Score columns are clearly identified 

  • Scores must be in a numeric format

  • Where multiple scores exist in your data please identify all score/comment pairs you wish to include 

  • Scores are required for all NPS and CSAT responses. We also recommend providing scores for other data types where relevant (e.g. an app review may come with a score from 1-5), to make full use of the platform
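
A quick way to confirm that a score column is numeric is to try parsing every value. This is a minimal sketch with a hypothetical function name; it treats anything that cannot be read as a finite number as invalid.

```python
import math


def parse_score(raw):
    """Return the score as a number, or None if it is not numeric."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    if not math.isfinite(value):
        # Reject "nan" and "inf", which float() would otherwise accept.
        return None
    return int(value) if value.is_integer() else value


scores = [parse_score(s) for s in ["9", " 4.5 ", "nine", "10/10"]]
```

Rows where the result is None would need correcting before the file is sent.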

In general, once we have a format for CSV uploads, all future CSV files should be consistent to ensure continuity in the data pipeline.

The comments/responses you wish to analyse using Chattermill should contain raw text only

Please remove any noise (e.g. HTML tags, email subject lines) from the comments/responses and only send the raw text you wish to use for insight analysis.
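
As an example of removing one kind of noise, the sketch below strips HTML tags from a comment using Python's standard html.parser module and collapses leftover whitespace. The class and function names are hypothetical, and other noise such as email subject lines would need separate handling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content, discarding all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(comment):
    """Return the comment with HTML tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(comment)
    parser.close()
    return " ".join("".join(parser.chunks).split())


raw = "<p>Great service, but delivery was <b>slow</b> &amp; late.</p>"
clean = strip_html(raw)
```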

If you have any questions, please get in touch at [email protected]
