FULL EARL 2025 AGENDA ANNOUNCED

EARL 2025 Keynote Speakers

More information to come
Eric is an artist making work in a range of media, from painting, to music, to machine-learning AI art.
Some of his favourite themes are identity, consciousness, the philosophical ramifications of artificial intelligence, big data and the relationship between humans and machines.
We’re excited to see Eric, a local and inspirational AI genius, take to the stage in Brighton. In his own words: “I’ll ponder what to talk about closer to the time – AI is an alarmingly fast-moving space!”

More information to come
Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow.
He is a Member of the Apache Software Foundation and a project PMC member for Apache Parquet. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data. We cannot wait to learn more from Wes at EARL 2025!

Rachel is the founder of The Data Inspiration Group. Digdata equips secondary and tertiary students with vital data literacy skills and real-world virtual work experience. With over a decade in data recruitment, Rachel saw first-hand the industry’s struggle to build diverse, inclusive teams – and the lack of awareness among young people about careers in data.
Determined to change this, she built Digdata into a national movement, now engaging over 25,000 students across 2,500 schools and 106 universities. Through partnerships with organisations like National Highways, NHS, GSK, HSBC, Meta and more, Rachel is helping shape a more inclusive and future-ready data talent pipeline for the UK.
More Information To Come

EARL 2025 Workshop Hosts

Dynamic presentations with Quarto
This interactive tutorial guides you through building dynamic Quarto presentations with R or Python, embedding live code, plots, and tables, applying custom styles, and publishing online—using a cloud setup with all tools provided.
Myles holds a PhD in Physics and works as a Principal Data Scientist at Jumping Rivers. With over a decade of experience in Python programming, he likes to apply himself to projects ranging from predictive analytics to software development. Keen to share his expertise, he enjoys teaching courses in beginner programming, database management, machine learning and more. When he’s not staring at computers, he enjoys running, hiking and anything else outdoors!

Simulation guided Bayesian Designs using R
This workshop introduces Bayesian methods in clinical trials, focusing on confirmatory designs, regulatory aspects, and hands-on R sessions for simulating and evaluating design performance.
Rajat has over 20 years of experience providing statistical consultancy to Pharma, Biotech and Medtech. He started his career in Public Health after completing his PhD in mathematical statistics at the University of Wisconsin–Madison, USA.
Imran has over 15 years of experience in statistical software development and applications of machine learning in the pharma industry. He has a master's degree in Biomedical Engineering from the Indian Institute of Technology (IIT), Mumbai, India.
At MuSigmas we provide strategic planning for clinical trial design and pipeline design in several therapeutic areas including oncology, rare diseases, vaccines, CNS and CVD. We work on innovative and adaptive trial designs in both frequentist and Bayesian frameworks. Our interests and expertise also include developing statistical software and applying machine learning to diagnostics, biomarker discovery and mining real-world data.

Core Machine Learning Concepts in Python
This workshop covers key machine learning concepts—regression, classification, clustering—through theory and hands-on Python exercises, with no prior experience required and cloud setup provided.
Aida holds a PhD in Statistics and is currently a Data Scientist at Jumping Rivers. She has extensive experience applying Machine Learning methods to real-world problems and teaching various statistics courses to diverse audiences. In her free time, she enjoys spending time outdoors and climbing.

Deploying AI in R with {ellmer} and {shiny}: From Experimentation to Production
This workshop explores real-world LLM deployment in R using the {ellmer} package, guiding you through building, coding, and deploying AI-powered Shiny apps with best practices.
Nic is a data scientist, software engineer, and R enthusiast. They are part of the team that maintains the arrow R package and a co-author of Scaling Up with R and Arrow.

This talk introduces a long-term project analysing British horse racing data using R. It covers data cleaning, visualization, and modeling, including a Bayesian approach to track conditions—blending statistical rigor with the observational insights of racing enthusiasts. It also serves as a hands-on introduction to exploratory data analysis in R.

This talk presents R-based analyses of 92,165 dog handover requests to Dogs Trust (2023–2025). Hierarchical clustering revealed key owner profiles and reasons for relinquishment. Spatio-temporal modeling and travel analysis identified geographic service gaps. Insights will help tailor outreach and support for struggling dog owners across the UK.

The NBN Trust shares over 300 million UK biodiversity records via the NBN Atlas. This talk explores the challenges of mobilising messy biodiversity data, tools that help, and data pathways in UK wildlife recording. It also highlights Brighton’s wildlife trends and how anyone can access or contribute records via iNaturalist.

Post-COVID, TPR has embraced hybrid working with limited desks, supporting smart booking. Join Senior Data Scientist Luke Bandy as he unveils a real-time map of desk and room availability—built with web scraping, trigonometry, API data, and powerful visualizations using {magick}, {leaflet}, and {shiny}. Don’t miss it!

Jupyter widgets offer a powerful, intuitive way to build interactive frontends for data analysis, teaching, and rapid prototyping—all within a notebook. This talk introduces new R packages that bring full Jupyter widget support to R, enabling rich, reactive notebook interfaces and even development of new widgets entirely within R.

Generative AI can boost creativity and efficiency, but tools like ChatGPT can feel overwhelming. GIAA’s data team built R Shiny apps using NLP and Generative AI to analyse audit reports, surface insights, and streamline reporting. This talk covers their user-focused development process and key lessons from design to deployment.

A&E crowding is a major NHS issue. We analysed three years of data across departments, linking medical patient boarding to longer stays, ambulance delays, and worse outcomes. Our findings informed a live dashboard for NHS managers, enabling real-time monitoring and data-driven interventions to improve efficiency and patient flow.

High-stakes public health decisions demand speed, accuracy, and transparency. As an infectious disease modeller, I use R to support Dutch government responses—structuring every project as an R package. This talk shows how this workflow boosts reproducibility, collaboration, and trust, with lessons applicable across sectors and skill levels.

Managing multiple clients can lead to reactive firefighting. At Fresh Egg, we built an internal anomaly detection system using Python and Prophet to monitor GA4 data daily and send Slack alerts. This talk covers why we built it, how it works, lessons learned, and how it’s improved efficiency and oversight.

A quick prototype to ingest competitor prices grew into a complex, scalable Price Intelligence system using Python, Spark, and Delta Live Tables. This talk shares lessons from scaling without over-engineering, managing data chaos, debugging tough issues, and maintaining quality—offering real insights from building robust pipelines under pressure.

Manual interview transcription slows market research in fast-paced industries like pharma. What if AI could do it instead? In this talk, Elizabeth Brown, Data Scientist at Branding Science, shares how she built an internal AI transcription platform using R Shiny, Python, and CI/CD pipelines—from idea to production. She’ll cover platform design, backend processes, user feedback loops, and responsible AI use in a regulated environment.

Sensory testing demands consistency, but traditional vocabulary generation is slow and error-prone. Using R, Python, and LLMs, we built databases and the Portrait web app to standardise terms, cut setup time to under an hour, and improve data quality—empowering sensory scientists to work faster, more accurately, and at scale.

Python is a powerful tool for automating repetitive data tasks. This talk explores how to streamline data mappings, automate DDL conversions, and generate test data using libraries like Faker—sharing practical code snippets and tips to boost efficiency, reduce errors, and simplify data prep for migrations or testing environments.

What if your data pipeline could talk back? This talk explores AI Agents—autonomous programs that help data professionals work faster and smarter. Learn what AI Agents are, see hands-on examples assisting ETL, EDA, and ML tasks, and discover how clients can query data in plain English with instant answers.

Building great tools is only half the battle—driving adoption is the real challenge. This talk explores how to bridge technical brilliance and user engagement, with lessons from healthtech to the public sector. Learn strategies for fostering trust, embracing user critique, and designing tools people truly want to use.

This talk shares how a team fostered R adoption in a traditionally Excel- and Power BI-driven organisation. Through fun, inclusive training and a supportive community of practice, they helped colleagues transition to R—empowering beginners to solve real business problems in just three sessions using a creative, themed learning approach.

Literate programming enhances analysis but suffers from slow re-render times. Tools like targets and Tasks automate and accelerate updates, enabling near-instant, live refresh of Rmd/Quarto files. This session explores how combining these tools improves workflow, focus, and output quality in complex analytics projects.

Five years after the R Validation Hub’s white paper, questions about validation still persist. This talk explores how one organisation defines and implements R package validation, addresses regulatory concerns, and offers practical discussion points to help others apply a risk-based approach within their own validation frameworks.

Reproducibility and scalability are essential in data science. This talk introduces Nextflow, a tool for building robust, scalable pipelines in R and Python. We’ll explore practical workflows and demonstrate how to create validated pipelines suitable for regulated environments like finance and pharma.

Working with large datasets in R can be challenging, especially in enterprise settings. This talk introduces Apache Arrow and Parquet—tools that enhance performance, reduce file sizes, and streamline workflows. Through real-world examples, you’ll learn how these technologies boost efficiency in R for analysis, pipelines, and Shiny apps.

This talk explores how free-text data can reveal insights into dog behaviour and welfare. Using BERT in Python and analysis in R, two studies classify public perceptions of dog emotions and track behavioural medication use over time—highlighting how NLP and statistical modelling improve understanding of canine health and human-dog interactions.

This talk presents an innovative method for estimating the indirect carbon footprint of staff commutes and home working. By combining anonymised survey responses with administrative data, The Pensions Regulator gained clearer insight into emissions beyond standard metrics—supporting more informed, sustainable decision-making on the path to Net Zero.

This talk explores how astrophysicists simulate entire universes to understand galaxy formation. By combining observations from major telescopes with large-scale simulations incorporating complex physics, researchers compare models with real data to refine theories. While the simulations run in C, Python is key for analysing the results and interpreting the cosmic evolution of galaxies.

This talk places AI in historical context, comparing it to transformative shifts like language, writing, and the printing press. It explores the widespread anxiety around AI and argues that understanding its broader societal impact can help us navigate and integrate it more effectively, as we’ve done with past revolutions.

This talk explores the creation and impact of metAInsights™, a Retrieval-Augmented Generation (RAG) chatbot platform developed to unlock 20+ years of pharmaceutical market research. It covers technical, design, and stakeholder challenges, user adoption strategies, and how the platform evolved into a revenue-generating product and key part of business operations.

This talk details how a Large Language Model (LLM) was used to analyse inbound customer calls in the vehicle advertising process. By comparing LLM-derived topics with manual classifications, the team identified key pain points driving support demand, improving understanding of user issues and enabling a smoother seller experience.

This talk shares the development of a proof-of-concept sales prediction tool at a major catering company using R. It covers data cleaning, EDA, modelling (linear regression and random forest), and early deployment via Power BI, highlighting lessons learned, integration challenges, and how the project supports sustainability and operational efficiency.

This talk explores how Sky uses causal inference techniques to address imbalances between treatment and control groups in data analysis. It covers methods like matched pairs and inverse propensity weighting to reduce bias, improve result accuracy, and support more reliable, data-driven decision-making in complex real-world scenarios.
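For readers new to inverse propensity weighting, the idea can be sketched in a few lines of Python. This is a minimal illustration only, not the method Sky uses: the `Unit` class, the `ipw_ate` and `naive_difference` functions, and the toy data are all hypothetical, and the propensity scores are assumed to be known rather than estimated from covariates.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    treated: bool      # whether the unit received the treatment
    propensity: float  # assumed-known P(treated | covariates)
    outcome: float     # observed outcome


def naive_difference(units):
    """Unweighted difference in mean outcome between treated and control."""
    treated = [u.outcome for u in units if u.treated]
    control = [u.outcome for u in units if not u.treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(units):
    """Normalised inverse-propensity-weighted estimate of the average effect."""
    t_num = t_den = c_num = c_den = 0.0
    for u in units:
        if u.treated:
            w = 1.0 / u.propensity          # up-weight unlikely treated units
            t_num += w * u.outcome
            t_den += w
        else:
            w = 1.0 / (1.0 - u.propensity)  # up-weight unlikely control units
            c_num += w * u.outcome
            c_den += w
    return t_num / t_den - c_num / c_den


# Hypothetical toy data: two groups with different baselines and a true
# treatment effect of 2.0 in each. Treatment is far more common in the
# high-baseline group, so the two arms are imbalanced.
units = (
    [Unit(True, 0.8, 12.0)] * 4
    + [Unit(False, 0.8, 10.0)]
    + [Unit(True, 0.2, 2.0)]
    + [Unit(False, 0.2, 0.0)] * 4
)

print("naive difference:", naive_difference(units))
print("IPW estimate:    ", ipw_ate(units))
```

On this toy data the naive comparison is inflated by the imbalance between groups (it gives 8.0), while the weighted estimate comes out at approximately the true effect of 2.0. In practice the propensities would themselves be estimated, for example with a logistic regression, which adds its own uncertainty.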

This talk showcases how an R Shiny app was developed to replace complex Excel models for assessing UK railway level crossings. The tool streamlined lifecycle costing, enabled non-technical users to model upgrade scenarios, integrated safety and cost data, and supported more efficient, cost-effective, and safer infrastructure decisions at Network Rail.

This talk explores R’s evolution from a statistical tool to a versatile language used across industries. It highlights packages that support everything from data wrangling to dashboards, showing how R empowers both beginners and experts to build reproducible workflows and make data-driven decisions more accessible and impactful.

This talk introduces {AstronomR}, a new R package bringing astrophysics and cosmology tools to R. It enables access to astronomical data (e.g. Gaia), spectral analysis, and cosmological calculations—bridging R’s statistical power with astrophysical research needs. Learn how {AstronomR} expands R’s potential in astrostatistics and universe modeling.