EARL Conference

FULL EARL 2025 AGENDA ANNOUNCED

We’re thrilled to unveil the agenda for EARL 2025, taking place from October 14th to 16th at the DoubleTree by Hilton Brighton Metropole.
 
This year’s conference continues to bridge the worlds of R and Python, offering a rich program of keynotes, workshops, and sessions that showcase the latest in data science applications across industries. 
 
Join us in Brighton to learn, share, and connect with the data science community.

Companies That Presented Last Year

EARL 2025 Keynote Speakers

More information to come

Eric is an artist making work in a range of media, from painting to music to machine-learning AI art.

Some of his favourite themes are identity, consciousness, the philosophical ramifications of artificial intelligence, big data and the relationship between humans and machines.

We’re excited to see Eric take to the stage in Brighton, both as a local and as an inspirational AI genius. In his own words: “I’ll ponder what to talk about closer to the time – AI is an alarmingly fast-moving space!”

Wes McKinney

More information to come

Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow.

He is a Member of the Apache Software Foundation and a project PMC member for Apache Parquet. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data. We cannot wait to learn more from Wes at EARL 2025!

Founder of The Data Inspiration Group, Rachel created Digdata to equip secondary and tertiary students with vital data literacy skills and real-world virtual work experience. With over a decade in data recruitment, she saw first-hand the industry’s struggle to build diverse, inclusive teams – and the lack of awareness among young people about careers in data.

Determined to change this, she built Digdata into a national movement, now engaging over 25,000 students across 2,500 schools and 106 universities. Through partnerships with organisations like National Highways, NHS, GSK, HSBC, Meta and more, Rachel is helping shape a more inclusive and future-ready data talent pipeline for the UK.

More information to come

More information to come

More information to come

EARL 2025 Workshop Hosts

Myles Mitchell

Dynamic presentations with Quarto

This interactive tutorial guides you through building dynamic Quarto presentations with R or Python, embedding live code, plots, and tables, applying custom styles, and publishing online—using a cloud setup with all tools provided.
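For a flavour of what attendees will build, here is a minimal revealjs source file of the kind the tutorial works towards (an illustrative sketch only, not the workshop material):

````markdown
---
title: "Dynamic presentations with Quarto"
format: revealjs
---

## A slide with live code

```{r}
#| echo: true
# The code, its output, and the plot all render directly onto the slide
plot(pressure)
```
````

Rendering with `quarto render slides.qmd` produces an HTML slide deck that can be styled and published online.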

Myles holds a PhD in Physics and works as a Principal Data Scientist at Jumping Rivers. With over a decade of experience in Python programming, he likes to apply himself to projects ranging from predictive analytics to software development. Keen to share his expertise, he enjoys teaching courses in beginner programming, database management, machine learning and more. When he’s not staring at computers, he enjoys running, hiking and anything else outdoors!

Simulation-guided Bayesian Designs using R

This workshop introduces Bayesian methods in clinical trials, focusing on confirmatory designs, regulatory aspects, and hands-on R sessions for simulating and evaluating design performance.
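As a taste of the simulation-guided approach, the sketch below (illustrative only, not the workshop code) uses base R to estimate the probability of success for a single-arm design with a Beta-Binomial model, declaring success when the posterior probability that the response rate exceeds 30% reaches 95%:

```r
set.seed(42)

n_sims <- 10000   # number of simulated trials
n      <- 40      # patients per trial
p_true <- 0.45    # assumed true response rate (a design assumption)

success <- replicate(n_sims, {
  x <- rbinom(1, n, p_true)                        # simulated responders
  # Beta(1, 1) prior => posterior is Beta(1 + x, 1 + n - x)
  post_prob <- 1 - pbeta(0.30, 1 + x, 1 + n - x)   # P(rate > 0.30 | data)
  post_prob >= 0.95                                # decision rule
})

mean(success)   # estimated probability the design meets its success criterion
```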

Rajat has over 20 years of experience providing statistical consultancy to Pharma, Biotech and Medtech. He started his career in Public Health after completing his PhD in mathematical statistics at the University of Wisconsin–Madison, USA.

Imran has over 15 years of experience in statistical software development and applications of machine learning in the pharma industry. He has a master’s degree in Biomedical Engineering from the Indian Institute of Technology (IIT), Mumbai, India.

At MuSigmas we provide strategic planning for clinical trial design and pipeline design in several therapeutic areas including oncology, rare diseases, vaccines, CNS and CVD. We work on innovative and adaptive trial designs in both frequentist and Bayesian frameworks. Our interests and expertise also include developing statistical software and applying machine learning to diagnostics, biomarker discovery and mining real-world data.

Core Machine Learning Concepts in Python

This workshop covers key machine learning concepts—regression, classification, clustering—through theory and hands-on Python exercises, with no prior experience required and cloud setup provided.

Aida holds a PhD in Statistics and is currently a Data Scientist at Jumping Rivers. She has extensive experience applying Machine Learning methods to real-world problems and teaching various statistics courses to diverse audiences. In her free time, she enjoys spending time outdoors and climbing.

Deploying AI in R with {ellmer} and {shiny}: From Experimentation to Production

This workshop explores real-world LLM deployment in R using the {ellmer} package, guiding you through building, coding, and deploying AI-powered Shiny apps with best practices.
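For orientation, here is a minimal sketch of the kind of app the workshop builds up to, assuming {ellmer}’s chat_openai() constructor and an OPENAI_API_KEY set in the environment (the workshop’s own examples and best practices go well beyond this):

```r
library(shiny)
library(ellmer)

ui <- fluidPage(
  textInput("prompt", "Ask the model"),
  actionButton("go", "Send"),
  verbatimTextOutput("reply")
)

server <- function(input, output, session) {
  # One chat object per session; the system prompt sets the assistant's behaviour
  chat <- chat_openai(system_prompt = "You are a concise data-science assistant.")

  answer <- eventReactive(input$go, {
    chat$chat(input$prompt)   # sends the prompt and returns the model's text reply
  })

  output$reply <- renderText(answer())
}

shinyApp(ui, server)
```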

Nic is a data scientist, software engineer, and R enthusiast. They are part of the team that maintains the arrow R package and a co-author of Scaling Up with R and Arrow.

Colin Magee and Jay Emerson

This talk introduces a long-term project analysing British horse racing data using R. It covers data cleaning, visualization, and modeling, including a Bayesian approach to track conditions—blending statistical rigor with the observational insights of racing enthusiasts. It also serves as a hands-on introduction to exploratory data analysis in R.

Dr Sarah Weidman and Chris Newton

This talk presents R-based analyses of 92,165 dog handover requests to Dogs Trust (2023–2025). Hierarchical clustering revealed key owner profiles and reasons for relinquishment. Spatio-temporal modeling and travel analysis identified geographic service gaps. Insights will help tailor outreach and support for struggling dog owners across the UK.

Will Millard and Rhiann Stock

The NBN Trust shares over 300 million UK biodiversity records via the NBN Atlas. This talk explores the challenges of mobilising messy biodiversity data, tools that help, and data pathways in UK wildlife recording. It also highlights Brighton’s wildlife trends and how anyone can access or contribute records via iNaturalist.

Laura Mawer

More information to come

Marcus

More information to come

Luke Bandy

Post-COVID, TPR has embraced hybrid working with limited desks, supporting smart booking. Join Senior Data Scientist Luke Bandy as he unveils a real-time map of desk and room availability—built with web scraping, trigonometry, API data, and powerful visualizations using {magick}, {leaflet}, and {shiny}. Don’t miss it!
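For a flavour of the mapping side, a tiny {leaflet} snippet along these lines (purely illustrative, with made-up coordinates rather than TPR’s floor plan) colours markers by live availability:

```r
library(leaflet)

desks <- data.frame(
  desk = c("A1", "A2", "B1"),
  lng  = c(-0.1370, -0.1372, -0.1374),   # hypothetical coordinates
  lat  = c(50.8225, 50.8226, 50.8227),
  free = c(TRUE, FALSE, TRUE)
)

leaflet(desks) |>
  addTiles() |>
  addCircleMarkers(
    ~lng, ~lat,
    color = ~ifelse(free, "green", "red"),   # green = available, red = booked
    label = ~desk
  )
```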

Romain François

Jupyter widgets offer a powerful, intuitive way to build interactive frontends for data analysis, teaching, and rapid prototyping—all within a notebook. This talk introduces new R packages that bring full Jupyter widget support to R, enabling rich, reactive notebook interfaces and even development of new widgets entirely within R.

Katy Morgan

Generative AI can boost creativity and efficiency, but tools like ChatGPT can feel overwhelming. GIAA’s data team built R Shiny apps using NLP and Generative AI to analyse audit reports, surface insights, and streamline reporting. This talk covers their user-focused development process and key lessons from design to deployment.

Nick Howlett

A&E crowding is a major NHS issue. We analysed three years of data across departments, linking medical patient boarding to longer stays, ambulance delays, and worse outcomes. Our findings informed a live dashboard for NHS managers, enabling real-time monitoring and data-driven interventions to improve efficiency and patient flow.

Kylie Ainslie

High-stakes public health decisions demand speed, accuracy, and transparency. As an infectious disease modeller, I use R to support Dutch government responses—structuring every project as an R package. This talk shows how this workflow boosts reproducibility, collaboration, and trust, with lessons applicable across sectors and skill levels.

Zac Nash

Managing multiple clients can lead to reactive firefighting. At Fresh Egg, we built an internal anomaly detection system using Python and Prophet to monitor GA4 data daily and send Slack alerts. This talk covers why we built it, how it works, lessons learned, and how it’s improved efficiency and oversight.

Andres Baravalle

A quick prototype to ingest competitor prices grew into a complex, scalable Price Intelligence system using Python, Spark, and Delta Live Tables. This talk shares lessons from scaling without over-engineering, managing data chaos, debugging tough issues, and maintaining quality—offering real insights from building robust pipelines under pressure.

Elizabeth Brown

Manual interview transcription slows market research in fast-paced industries like pharma. What if AI could do it instead? In this talk, Elizabeth Brown, Data Scientist at Branding Science, shares how she built an internal AI transcription platform using R Shiny, Python, and CI/CD pipelines—from idea to production. She’ll cover platform design, backend processes, user feedback loops, and responsible AI use in a regulated environment.

Jack Westcott and Kasidit Tipayawatn

Sensory testing demands consistency, but traditional vocabulary generation is slow and error-prone. Using R, Python, and LLMs, we built databases and the Portrait web app to standardise terms, cut setup time to under an hour, and improve data quality—empowering sensory scientists to work faster, more accurately, and at scale.

Amit Kholi

More information to come

Colin Gillespie

More information to come

James Mullan

Python is a powerful tool for automating repetitive data tasks. This talk explores how to streamline data mappings, automate DDL conversions, and generate test data using libraries like Faker—sharing practical code snippets and tips to boost efficiency, reduce errors, and simplify data prep for migrations or testing environments.

Craig West

What if your data pipeline could talk back? This talk explores AI Agents—autonomous programs that help data professionals work faster and smarter. Learn what AI Agents are, see hands-on examples assisting ETL, EDA, and ML tasks, and discover how clients can query data in plain English with instant answers.

Maria (Masha) Gaganova

Building great tools is only half the battle—driving adoption is the real challenge. This talk explores how to bridge technical brilliance and user engagement, with lessons from healthtech to the public sector. Learn strategies for fostering trust, embracing user critique, and designing tools people truly want to use.

Stephen Price

This talk shares how a team fostered R adoption in a traditionally Excel- and Power BI-driven organisation. Through fun, inclusive training and a supportive community of practice, they helped colleagues transition to R—empowering beginners to solve real business problems in just three sessions using a creative, themed learning approach.

Joseph Osborne

Literate programming enhances analysis but suffers from slow re-render times. Tools like targets and Tasks automate and accelerate updates, enabling near-instant, live refresh of Rmd/Quarto files. This session explores how combining these tools improves workflow, focus, and output quality in complex analytics projects.
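As a sketch of the {targets} side of this workflow (illustrative only, with hypothetical file and column names, not the speaker’s project):

```r
# _targets.R -- run with targets::tar_make(); only outdated steps are rebuilt
library(targets)
library(tarchetypes)
tar_option_set(packages = c("readr", "dplyr"))

list(
  tar_target(raw_file, "data/sales.csv", format = "file"),              # watched input file
  tar_target(raw, readr::read_csv(raw_file, show_col_types = FALSE)),
  tar_target(daily_totals, dplyr::count(raw, date, wt = amount)),
  tar_quarto(report, "report.qmd")   # the Quarto report re-renders only when upstream targets change
)
```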

Mike Smith

Five years after the R Validation Hub’s white paper, questions about validation still persist. This talk explores how one organisation defines and implements R package validation, addresses regulatory concerns, and offers practical discussion points to help others apply a risk-based approach within their own validation frameworks.

Mark Sellors

Reproducibility and scalability are essential in data science. This talk introduces Nextflow, a tool for building robust, scalable pipelines in R and Python. We’ll explore practical workflows and demonstrate how to create validated pipelines suitable for regulated environments like finance and pharma.

Nic Crane

Working with large datasets in R can be challenging, especially in enterprise settings. This talk introduces Apache Arrow and Parquet—tools that enhance performance, reduce file sizes, and streamline workflows. Through real-world examples, you’ll learn how these technologies boost efficiency in R for analysis, pipelines, and Shiny apps.
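To illustrate the kind of workflow the talk covers (a sketch under assumed file paths and column names):

```r
library(arrow)
library(dplyr)

# Parquet is columnar and compressed, so files are typically much smaller than CSV
write_parquet(mtcars, "mtcars.parquet")

# Open a directory of Parquet files lazily; filters and aggregations are pushed down,
# and collect() pulls only the final result into memory
open_dataset("data/orders/") |>                  # hypothetical partitioned dataset
  filter(region == "UK", amount > 100) |>
  group_by(region) |>
  summarise(total = sum(amount)) |>
  collect()
```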

Jana Muschinski and Mel Weedon

This talk explores how free-text data can reveal insights into dog behaviour and welfare. Using BERT in Python and analysis in R, two studies classify public perceptions of dog emotions and track behavioural medication use over time—highlighting how NLP and statistical modelling improve understanding of canine health and human-dog interactions.

Jason Verrall

This talk presents an innovative method for estimating the indirect carbon footprint of staff commutes and home working. By combining anonymised survey responses with administrative data, The Pensions Regulator gained clearer insight into emissions beyond standard metrics—supporting more informed, sustainable decision-making on the path to Net Zero.

Stephen Wilkins

This talk explores how astrophysicists simulate entire universes to understand galaxy formation. By combining observations from major telescopes with large-scale simulations incorporating complex physics, researchers compare models with real data to refine theories. While simulations run in C, Python is key for analysis and interpreting the cosmic evolution of galaxies.

Speaker to be announced

This talk places AI in historical context, comparing it to transformative shifts like language, writing, and the printing press. It explores the widespread anxiety around AI and argues that understanding its broader societal impact can help us navigate and integrate it more effectively, as we’ve done with past revolutions.

Gabe Musker

This talk explores the creation and impact of metAInsights™, a Retrieval-Augmented Generation (RAG) chatbot platform developed to unlock 20+ years of pharmaceutical market research. It covers technical, design, and stakeholder challenges, user adoption strategies, and how the platform evolved into a revenue-generating product and key part of business operations.

Chris Campbell

This talk details how a Large Language Model (LLM) was used to analyse inbound customer calls in the vehicle advertising process. By comparing LLM-derived topics with manual classifications, the team identified key pain points driving support demand, improving understanding of user issues and enabling a smoother seller experience.

Megan Bourne

This talk shares the development of a proof-of-concept sales prediction tool at a major catering company using R. It covers data cleaning, EDA, modelling (linear regression and random forest), and early deployment via Power BI, highlighting lessons learned, integration challenges, and how the project supports sustainability and operational efficiency.

AJ Small

This talk explores how Sky uses causal inference techniques to address imbalances between treatment and control groups in data analysis. It covers methods like matched pairs and inverse propensity weighting to reduce bias, improve result accuracy, and support more reliable, data-driven decision-making in complex real-world scenarios.
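As a flavour of the inverse propensity weighting idea in base R (an illustrative sketch with made-up variable names, not Sky’s code):

```r
# df holds a binary treatment flag, an outcome, and pre-treatment covariates x1, x2
ps_model <- glm(treated ~ x1 + x2, data = df, family = binomial())
p <- predict(ps_model, type = "response")   # estimated propensity scores

# Weight each unit by the inverse probability of the treatment it actually received,
# so the weighted groups resemble each other on the measured covariates
w <- ifelse(df$treated == 1, 1 / p, 1 / (1 - p))

# Weighted difference in mean outcomes approximates the average treatment effect
ate <- weighted.mean(df$outcome[df$treated == 1], w[df$treated == 1]) -
       weighted.mean(df$outcome[df$treated == 0], w[df$treated == 0])
ate
```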

Dr Alison Telford

This talk showcases how an R Shiny app was developed to replace complex Excel models for assessing UK railway level crossings. The tool streamlined lifecycle costing, enabled non-technical users to model upgrade scenarios, integrated safety and cost data, and supported more efficient, cost-effective, and safer infrastructure decisions at Network Rail.

Aida

This talk explores R’s evolution from a statistical tool to a versatile language used across industries. It highlights packages that support everything from data wrangling to dashboards, showing how R empowers both beginners and experts to build reproducible workflows and make data-driven decisions more accessible and impactful.

Samrit Pramanik

This talk introduces {AstronomR}, a new R package bringing astrophysics and cosmology tools to R. It enables access to astronomical data (e.g. Gaia), spectral analysis, and cosmological calculations—bridging R’s statistical power with astrophysical research needs. Learn how {AstronomR} expands R’s potential in astrostatistics and universe modeling.