Roast Data - Python

I started using Python to analyze my roasts (load, calculate, clean up, display data) as I teach myself a bit of data analytics / data science. Cleaning and rearranging data was the bulk of my effort in the last few weeks. I’m slowly building my skills.

Luckily, I’ve been mostly consistent with ~500g roasts. There are always outliers and reasons for deviation but I’m starting to see a few nice trends and am having fun with it. Nothing wildly new. But clearer than tracking multiple overlays. For example:
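For anyone curious, a minimal sketch of the kind of trend plot I mean, assuming a pandas DataFrame of per-roast summaries (the column names and numbers here are my own invention, not actual RoasTime fields):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical per-roast summary data; values invented for illustration.
roasts = pd.DataFrame({
    "charge_temp_c": [210, 215, 212, 218, 214],
    "first_crack_s": [540, 525, 532, 510, 528],
})

ax = roasts.plot.scatter(x="charge_temp_c", y="first_crack_s")
ax.set_title("Charge temp vs time to first crack")
plt.savefig("trend.png")
```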

Anyone else fancy their own roast data analytics?


Nice work!

I did a series of “unmanaged” roasts to investigate the effect of independently varying preheat, power, fan, and drum settings. Each was a 325g batch of Brazil Cerrado roasted without changing settings during the roast, all dropped at 218C IBTS. Not as fancy as your python study, but it was interesting to see for example how much increase in power is required to compensate for an increase in fan speed, or how much a 10C increase in preheat shortened time to FC. The results give you some intuition to use in real roasting where you’re changing settings during the roast and the situation becomes more complicated.


Thanks, this is cool.

Knowing air temperature and humidity are important for repeatability too, I’ve noticed.


I agree there can be a relation, but it’s likely minor and multi-dimensional in ways that aren’t visible here. As long as you give your machine a long pre-heat, there are many other important factors that affect early-roast parameters.


Still just playing with linear regression. It confirms that we should see tighter correlations between data points that are temporally close, especially early in the roast before you start mucking with settings. E.g. a strong positive correlation between charge temps and turning-point temps.

Medium-to-strong negative correlation between TP temps and FC.
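As a sketch, computing those correlations with pandas could look like this (the numbers and column names are invented for illustration, not taken from my roast logs):

```python
import pandas as pd

# Toy numbers invented to mimic the trend described above.
df = pd.DataFrame({
    "charge_temp": [205, 210, 215, 220, 225],
    "tp_temp":     [98, 100, 103, 105, 108],
    "fc_time_s":   [560, 550, 535, 525, 510],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr(method="pearson")
print(corr.loc["charge_temp", "tp_temp"])   # strong positive
print(corr.loc["tp_temp", "fc_time_s"])     # strong negative
```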


Hi ryweller,

Are all your data posted above from different roasts of the same beans?

No, not all of the same bean. I’m typically working on 10 lbs of African beans, but they vary between washed and natural, Ethiopian and Kenyan.

The way I have it set up, I can easily filter results down to, say, specific Ethiopian naturals. However, I’ve been on the road for the last 3 months and haven’t roasted or played with this for a while.
In September I’ll get back at it with more results.

No doubt that plays a part in the non-linearity you’re seeing (you saw, 6 months ago :slight_smile: ).

It would be interesting to re-map the relationships on a per-bean basis?

I hope the road is being kind to you.

If there’s not enough data to split them out into individual graphs, then you could colour each bean (or origin) differently, so you could more easily identify clusters.

@ryweller - I did some Python-based data analysis on the data I collected from my previous roaster (a Behmor 1600), so I didn’t have the richness of data that the Bullet provides. But I did some analysis on things like bean weight, batch size, and ambient temperature.
Turns out time in roaster was the only highly correlated factor for that… but it was the strongest correlation (r values around 0.99999)!

I am interested in doing the same sort of exploration with the Bullet, when I get more than a few roasts under my belt.

How did you get the data? I’ve found the folder in RoasTime where it keeps a bunch of JSON files with roast data - but I would ideally like to ingest directly from it if possible.

Also, have you looked at using the seaborn library? That gives prettier visualizations and lets you more readily do things like size and colour different subsets of your data.
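To show what I mean, a small seaborn sketch mapping colour and size to hypothetical grouping columns (data and column names are made up):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # no display needed
import seaborn as sns

# Invented example data; "origin" stands in for whatever grouping you build.
df = pd.DataFrame({
    "charge_temp": [210, 212, 215, 218, 214, 216],
    "fc_time_s":   [540, 535, 525, 515, 530, 520],
    "origin":      ["Ethiopia", "Ethiopia", "Kenya", "Kenya", "Brazil", "Brazil"],
    "batch_g":     [450, 500, 500, 550, 480, 520],
})

# hue colours points by origin; size scales them by batch weight.
ax = sns.scatterplot(data=df, x="charge_temp", y="fc_time_s",
                     hue="origin", size="batch_g")
ax.figure.savefig("by_origin.png")
```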

I loaded the .json files and cleaned up the data in this notebook: GitHub - rockpyer/bullet-roasting: Notebook for ingesting, expanding, displaying, and analyzing Allio Bullet coffee roasting data.

It’s a bit sloppy because I was just learning Python and never got really good.

I last worked on it about a year ago and noted that software updates may change the file formats or the records recorded. But I think it should be a good start. There’s also a lot of data cleaning in there.
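For anyone following along, the ingest pattern boils down to something like this. Since the real RoasTime folder location and JSON field names vary by platform and software version, this sketch fakes a folder with one sample file:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the RoasTime roast folder; the real path and the field
# names ("roastName", "weightGreen") are assumptions for illustration.
roast_dir = Path(tempfile.mkdtemp())
(roast_dir / "sample").write_text(
    json.dumps({"roastName": "Demo", "weightGreen": 500}), encoding="utf-8"
)

records = []
for f in sorted(roast_dir.glob("*")):  # roast files may have no extension
    try:
        records.append(json.loads(f.read_text(encoding="utf-8")))
    except (json.JSONDecodeError, OSError):
        continue  # skip anything that isn't valid roast JSON

print(f"loaded {len(records)} roasts")
```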


That’s great!
They say 80% of data science is just cleaning the data up enough to be useful :slight_smile:

I want to use the data to help document my roast history as well as the data analytics, so I will probably build on what you’ve got there outside of a Jupyter notebook - the notes on things like isFork are hugely valuable time savers thanks!

I’ll report back with what I find!

And it looks like everyone is just working off the (obfuscated) JSON files on their computer - it would be good to know if we could hit up the Firebase back-end of RW instead.

OK, so thanks to your help, I have managed to process all the roast information.

For starters, I am using it to automatically create labels, and to push some basic info out to a website.

Next step will be to recreate the roast profile graph.
I tried a variety of ways to automate taking a screenshot of the roast profile graph - but they’re doing something funky to generate it, so it’s not an element on the web page that I can find, and it refreshes every time you focus on the page, so the Selenium screengrabber only got blank screens :confused:

That’s the progress so far!


Nice one. I like that idea of generating information for a website or other display. You could probably use matplotlib, seaborn, or something to plot the roast profile as you wish it to be displayed. I’ve considered dashboards outside of roastime to help me track and compare notes of the same roast, but as you can tell I’ve fallen off the project.
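Something like this, assuming you have pulled time/temperature samples out of the roast JSON (the arrays and field names differ by RT version, so the numbers here are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical bean-temperature samples: charge, drop to turning point,
# then a rise toward first crack. Real curves come from the roast JSON.
times_s = list(range(0, 600, 60))
bean_temp_c = [210, 120, 130, 145, 160, 172, 183, 193, 202, 210]

fig, ax = plt.subplots()
ax.plot([t / 60 for t in times_s], bean_temp_c, label="Bean temp")
ax.set_xlabel("Time (min)")
ax.set_ylabel("Temperature (°C)")
ax.legend()
fig.savefig("profile.png")
```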

Glad the loading aspect was useful for you. Keep us updated - looking forward to others jumping in!


Latest news:
I made the repo public.

I originally wrote a very… targeted script (i.e. messy and hacky) just to prove the concept. I’m still working to get the repo to the point where it replicates all the functionality of my prototype, but it’s now at the point where it works and isn’t too embarrassing to share.

In addition to recreating the roast profile, I am looking at adding in the ability to save the data as a local markdown file as well as the option to push it to a Headless CMS.

Feel free to share ideas thoughts etc here - or in the github project if you’re motivated enough :slight_smile:


Further news:
The library has been updated to have parity with my original (hacky) script.

It now uses a three stage workflow

  1. Extract the data from RT, compile it into Markdown files, and generate a label
  2. Using a Markdown editor, you can create annotation .md files that merge with the auto-generated content, so you can add your own descriptions, scores, etc. that we can’t keep in RT.
  3. Using Gatsby, publish the content to the web!
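Stage 1 reduces to something like the following sketch: roast data in, a Markdown file with YAML frontmatter out, ready for Gatsby to pick up (field names and layout are illustrative only, not the library's actual output):

```python
import json
from pathlib import Path

# A single roast record; field names are hypothetical examples.
roast = {"roastName": "Ethiopia Natural", "weightGreen": 500,
         "dateTime": "2023-09-01"}

# JSON-encoded values are valid YAML, so this makes safe frontmatter.
frontmatter = "\n".join(f"{k}: {json.dumps(v)}" for k, v in roast.items())
md = f"---\n{frontmatter}\n---\n\n# {roast['roastName']}\n"

out = Path("roast.md")
out.write_text(md, encoding="utf-8")
print(md)
```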

So now there’s very little human intervention between roasting a coffee and having it appear on my website!


@themcclure this is so cool. I don’t have the focus to do something like this, but I feel like with enough data you could potentially figure out what kind of bean you roasted even with no prior knowledge of where it was from. I bet you could also use all this data to create a simple “wizard” where someone inputs the bean’s origin, process, altitude, and variety, and it outputs ideal temperatures, possibly even scaling the data based on batch size. This would be such a useful tool for getting started with new beans.



Yeah, getting a large enough dataset to generate a “default overlay” for your first roast would be a great goal - but it would need a pretty broad dataset.

That’s one of the reasons I’m hoping to connect to the RW API rather than the files on the local RT - we might be able to query everyone else’s roasts to build the models.

File that under “future dreams” :slight_smile: