An Admin's Cornucopia - Python Is More Than Just A Better Bash
Christian Theune
Python's versatility is known to admins. In this talk I'd like to show how it fits the many small and big challenges I meet regularly, from tiny scripts to large systems. I'll also show how using the language's advanced and/or newer features makes scripts more compact and robust.
An introduction to PyMC3
Adrian Seyboldt
PyMC3 allows you to build statistical models for a wide range of datasets, use those models to estimate underlying parameters, and compute the uncertainty about those parameters. In this talk I will try to give a gentle introduction to PyMC3 and help new users avoid common pitfalls.
And now to something ELSE: Real Time Data Processing @ billiger.de
Axel Arnold
billiger.de is one of the largest price comparison websites in Germany. In this talk, we want to share how we built the scalable, event-driven processing system which renders the products for our website using Python, Elasticsearch and redis.
Automated testing with 400TB memory
Christoph Heer
SAP operates a dedicated test infrastructure with more than 400TB of main memory for its in-memory database SAP HANA. All custom components, such as improved scheduling, artifact caching and monitoring, were implemented in our favorite programming language, Python.
Building your own SDN with Debian Linux, Salt Stack and Python
Maximilian Wilhelm
In this talk you will get an overview of some awesome features of contemporary Linux networking, learn how to easily integrate them with some cool open source tools, and see how to glue all this together with Salt Stack and some Python to get your very own SDN controller for a service-provider-style network.
Connecting PyData to other Big Data Landscapes using Arrow and Parquet
Uwe L. Korn
While Python itself hosts a wide range of machine learning and data tools, other ecosystems like the Hadoop world also provide beneficial tools that can be connected either via Apache Parquet files or in memory using Apache Arrow. This talk shows recent developments that allow interoperation at speed.
Data Plumbing 101 - ETL Pipelines for Everyday Projects
Eberhard Hansis
There is no data science without ETL! This presentation is about implementing maintainable data integration for your projects. We will have a first look at ‘Ozelot’, a library based on Luigi and SQLAlchemy that helps you get started with building ETL pipelines.
Data Science Best Practices: From Proof of Concept to Production
Yasir Khan
This presentation addresses the practical issues the industry faces today as it moves towards industrializing data science algorithms. We will discuss the best practices around organization, methodology and tools to integrate a data science project into production.
Data Science Project for Beginners
Natalie Speiser, Jens Beyer
AI and Machine Learning are taking over the world - but how do you actually start with understanding your data and predicting events? And what kind of "political" trouble could you run into? With examples from real projects, we try to give you a feeling for data science projects.
Deep Learning for Computer Vision
Alex Conway
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge and some of the core mathematical concepts, and will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in Python…
Effective Data Analysis with Pandas Indexes
Alexander Hendorf
Pandas is the Swiss-Multipurpose Knife for Data Analysis in Python. In this talk we will look deeper into how to gain productivity by utilizing Pandas' powerful indexing and make advanced analytics a piece of cake. Pandas features multiple index types. This talk will give you a deep insight into the Pandas indexes and showcase the handiness of special indexes such as the DatetimeIndex.
Empowered by Python - A success story
Jens Nie, Peer Wagner
Introducing a new programming language in a company is always a daring task, usually involving a lot of effort and the will to change.
We'd like to take you on a journey reflecting eight years of challenges, solutions and success ending in a best practice guide helping you to achieve the same.
Flow is in the Air: Best Practices of Building Analytical Data Pipelines with Apache Airflow
Dominik Benz
Apache Airflow is an open-source Python project which facilitates an intuitive programmatic definition of analytical data pipelines. Based on 2+ years of productive experience, we summarize its core concepts, detail lessons learned and set it in context with the Big Data analytics ecosystem.
From 0 to Continuous Delivery in 30 minutes.
David Wölfle
An introduction and hands-on example of how to start Continuous Delivery for Python (or whatever) projects with conda and GitLab, which are open source, free to use and, if you wish, even available as a cloud service.
From Java to Python: Migrating Search Functionality at billiger.de
Patrick Schemitz
billiger.de is a German price comparison site. Search is handled by a heavily customized Solr setup. When switching to SolrCloud earlier this year, instead of porting our custom SolrComponents to SolrCloud, we ended up re-implementing them in a Python service layer. Here we show how, and why.
Getting Scikit-Learn To Run On Top Of Pandas
Ami Tavory
Scikit-Learn is built directly over numpy, Python's numerical array library. Pandas adds metadata and higher-level munging capabilities on top of numpy. This talk describes how to intelligently auto-wrap Scikit-Learn to create a version that can leverage pandas's added features.
GraphQL in the Python World
Nafiul Islam
So you've heard about this new thing called GraphQL. What is it all about? What problems does it solve, and most importantly, how can you leverage it in the Python ecosystem? This talk is a tell-all on what GraphQL is and how you can start using it with Python.
Hacking the Python AST
Suhas SG
Computer languages are a remarkable feat of human scientific engineering. In this talk, we'll look at the innards of CPython, and specifically learn how to modify and hack Abstract Syntax Trees (for world peace, of course).
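As a small taste of what "hacking" an AST means, here is a toy sketch (not taken from the talk) built only on the standard-library ast module: it parses source code, uses an ast.NodeTransformer to rewrite every binary "+" into "*", and then compiles and runs the modified tree. The transformer class and helper function names are hypothetical.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Toy transformation: rewrite every binary `+` into `*`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        # Visit the children first so nested expressions are rewritten too.
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def run_hacked(source: str) -> dict:
    """Parse `source`, transform its AST, execute it and return the namespace."""
    tree = ast.parse(source)
    tree = AddToMult().visit(tree)
    ast.fix_missing_locations(tree)  # new nodes need line/column info to compile
    namespace: dict = {}
    exec(compile(tree, filename="<hacked>", mode="exec"), namespace)
    return namespace


ns = run_hacked("result = 3 + 4")
print(ns["result"])  # 12
```

The source text still says "3 + 4"; only the tree the compiler sees has changed, which is the essence of AST-level tricks such as the assertion rewriting done by some test frameworks.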
High-Performance Ingestion with Python and Swarm64DB
Sebastian Dreßler
Swarm64DB is a hardware-accelerated plugin for PostgreSQL and other RDBMS. By using Swarm64DB in combination with PostgreSQL, Python and the right scaling mechanism, we are able to push the ingestion throughput into areas where Python can easily compete with compiled languages. The talk highlights the architecture of our solution and showcases a real-world use case.
How efficient is your public transport network? A data-driven approach using Geopandas and GTFS
Pieter Mulder
The presentation will explain how GeoPandas and other tools are used to analyse GTFS files to calculate the reachability of a public transport system.
Integrating Jupyter Notebooks into your Infrastructure
Florian Rhiem
Jupyter Notebooks combine executable code and rich text elements in a web application. In this talk you will learn how a custom JupyterHub installation can be used to integrate Jupyter Notebooks into your infrastructure, including existing authentication methods and custom software distributions.
Keeping the grip on decoupled code using CLIs
Anne Matthies
So you’ve decoupled your code monolith into all those micro chunks. When someone asks „How can I…“ you want to answer: „That’s easy! We’ve built that.“ Actually, you’ve built all parts needed for that. Who plugs them together? And how?
Large-scale machine learning pipelines using Luigi, PySpark and scikit-learn
Alexander Bauer
For prescriptive analytics applications, data science teams need to design, build and maintain complex machine learning pipelines. In this talk, we demonstrate how such pipelines can be implemented in a robust, scalable and extensible manner using Python, Luigi, PySpark and scikit-learn.
Lift your Speed Limits with Cython
Stefan Behnel
Think you can benefit from making your Python application run faster? Then come along and learn how to tune your code with Cython.
Master 2.5 GB of unstructured specification documents with ease
Dr. Andreas Schilling
How do you kick-start a project which is based on 2.5 GB of unstructured specification documents? To answer this question, we present our lessons learned from developing a Python-based knowledge management tool which provides a lightweight and intuitive browser frontend.
Migrating existing codebases to using type annotations
Stephan Jaensch
You have an existing codebase of tens or hundreds of thousands of lines of Python code? Learn how to get started with type annotations! Get your teammates (and yourself!) to always annotate your code. Find out what unexpected issues you might run into and how to solve them, all with this talk.
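To make the idea concrete, here is a hedged sketch of what annotating an existing function can look like; the functions are hypothetical examples, not code from the talk. The annotations change nothing at runtime, but a static checker such as mypy can use them to flag misuse, for instance indexing a value that may be None.

```python
from typing import Dict, List, Optional


# Before: def find_user(users, name): ...
# After: the same function with annotations; runtime behaviour is unchanged.
def find_user(users: List[Dict[str, str]], name: str) -> Optional[Dict[str, str]]:
    """Return the first user dict whose "name" matches, or None."""
    for user in users:
        if user.get("name") == name:
            return user
    return None


def greeting(users: List[Dict[str, str]], name: str) -> str:
    user = find_user(users, name)
    # Without this None check, mypy would report that `user` may be None
    # and therefore cannot be indexed.
    if user is None:
        return "Hello, stranger!"
    return f"Hello, {user['name']}!"
```

Because annotations are optional, a large codebase can be migrated one module or function at a time, which is what makes gradual adoption feasible.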
Modern ETL-ing with Python and Airflow (and Spark)
Tamara Mendt
The challenge of data integration is real. The sheer number of tools that exist to address this problem is proof that organizations struggle with it. This talk will discuss the inherent challenges of data integration and show how they can be tackled using Python, Apache Airflow and Apache Spark.
No Compromise: Use Ansible properly or stick to your scripts
Bjoern Meier
What you do in Ansible should be clean and simple. What we did was not. So I will show what we did wrong, but also what we have changed, or still have to change, to make our lives easier again. I will also show how we progressively utilize Ansible to deploy our Data Science infrastructure.
Observing your applications with Sentry and Prometheus
Patrick Mühlbauer
If you have services running in production, something will fail sooner or later. We cannot avoid this completely, but we can prepare for it. In this talk we will have a look at how Sentry and Prometheus can help to get better insight into our systems to quickly track down the cause of failure.
Platform intrusion detection with deep learning
Carsten Pohl
shop.rewe.de is not only visited by human customers, but also by machines. We have built a deep learning platform in Python with Keras and TensorFlow on the Google infrastructure. In this talk we would like to show you how Python is used in practice, supporting 2.5 million visitors each day.
Plugin ecosystems for Python web-applications
Raphael Michel
The power of some popular web applications like WordPress comes from a flexible plugin system. This talk will show how to implement such plugin architectures for Python web applications including real-world examples. I'll give examples with Django, but the important bits aren't Django-specific.
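The core of many plugin architectures is a hook registry: the host application defines named extension points, plugins register callbacks for them, and the host calls all registered callbacks at the right moment. The following framework-agnostic sketch illustrates that pattern; the hook name and all functions are hypothetical and not the speaker's design. In practice plugins are often discovered through packaging entry points (importlib.metadata.entry_points) and, in Django projects, wired up with django.dispatch.Signal.

```python
from typing import Callable, Dict, List

# Hypothetical registry: hook name -> callbacks registered by plugins.
_HOOKS: Dict[str, List[Callable[..., object]]] = {}


def register(hook_name: str) -> Callable[[Callable[..., object]], Callable[..., object]]:
    """Decorator used by plugins to attach a callback to a named hook."""
    def decorator(func: Callable[..., object]) -> Callable[..., object]:
        _HOOKS.setdefault(hook_name, []).append(func)
        return func
    return decorator


def call_hook(hook_name: str, *args: object, **kwargs: object) -> List[object]:
    """Called by the host application; returns the results of all plugins."""
    return [func(*args, **kwargs) for func in _HOOKS.get(hook_name, [])]


# --- inside a plugin module ---
@register("order_placed")
def send_confirmation(order_id: int) -> str:
    return f"confirmation sent for order {order_id}"


@register("order_placed")
def update_stock(order_id: int) -> str:
    return f"stock updated for order {order_id}"


# --- inside the host application ---
print(call_hook("order_placed", 42))
```

The host never imports plugin code by name; it only knows the hook names, which is what keeps the core decoupled from its extensions.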
Programming the Web of Things with Python and MicroPython
Hardy Erlinger
In this session you will get a gentle introduction to the ever-expanding world of small programmable devices: learn to use single board computers and microcontrollers to connect to sensors and talk to APIs - all using Python or MicroPython, a subset of Python 3 for use in constrained environments.
Project Avatar - Telepresence robotics with Nao and Kinect
Thomas Reifenberger, Martin Foertsch
Using humanoid robots, VR glasses and 3D cameras you can experience the world through the eyes of a robot and control it via gestures. We built a telepresence robotics system based on a Nao robot, an Oculus Rift and a Kinect One to realize an immersive "out-of-body experience" as in "Avatar".
PyGenSA: An Efficient Global Optimization for Generalized Simulated Annealing
Stephane Cano
The PyGenSA Python module implements generalized simulated annealing to optimize complicated non-linear objective functions with a large number of local minima.
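For readers new to annealing, here is a sketch of classical simulated annealing, not PyGenSA's API: generalized simulated annealing uses a different (Tsallis-statistics-based) visiting distribution and acceptance rule. The example minimizes the 1-D Rastrigin function, which has many local minima and a global minimum of 0 at x = 0. The temperature, cooling rate, step size and step count are arbitrary assumptions.

```python
import math
import random
from typing import Callable, Tuple


def rastrigin(x: float) -> float:
    """1-D Rastrigin function: many local minima, global minimum f(0) = 0."""
    return 10 + x * x - 10 * math.cos(2 * math.pi * x)


def simulated_annealing(
    f: Callable[[float], float],
    x0: float,
    t0: float = 10.0,
    cooling: float = 0.995,
    steps: int = 5000,
    step_size: float = 1.0,
    seed: int = 0,
) -> Tuple[float, float]:
    """Classical simulated annealing with Gaussian moves and geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)
        fc = f(candidate)
        # Always accept improvements; accept worse moves with the Metropolis
        # probability exp(-delta / T) so the search can leave local minima.
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # lower temperature: fewer uphill moves over time
    return best_x, best_f


print(simulated_annealing(rastrigin, x0=4.5))
```

Early on, the high temperature lets the search jump between basins; as it cools, the search settles into one basin and refines the best point found.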
Python in Space - The N Body Problem
Daniel Jilg
The N Body Problem is a computationally complex problem that we use to predict how planets and galaxies – and everything in between – move through space. I'll show you some interesting ways to calculate it, and we'll have a look at what to do, should you find yourself in a space ship's pilot seat.
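The simplest way to attack the problem numerically is direct summation: for every body, add up the Newtonian pull of every other body, which costs O(N²) per time step. The sketch below does this in 2-D and advances the system with a semi-implicit (symplectic) Euler step; it is an illustrative assumption, and the talk may well use other integrators or approximations such as Barnes-Hut.

```python
from dataclasses import dataclass
from typing import List, Tuple

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2


@dataclass
class Body:
    mass: float                 # kg
    pos: Tuple[float, float]    # m
    vel: Tuple[float, float]    # m/s


def accelerations(bodies: List[Body]) -> List[Tuple[float, float]]:
    """Direct O(N^2) summation of Newtonian gravity in 2-D."""
    acc = []
    for i, bi in enumerate(bodies):
        ax = ay = 0.0
        for j, bj in enumerate(bodies):
            if i == j:
                continue
            dx = bj.pos[0] - bi.pos[0]
            dy = bj.pos[1] - bi.pos[1]
            r3 = (dx * dx + dy * dy) ** 1.5
            ax += G * bj.mass * dx / r3
            ay += G * bj.mass * dy / r3
        acc.append((ax, ay))
    return acc


def step(bodies: List[Body], dt: float) -> None:
    """Advance all bodies by dt: update velocity first, then position."""
    # All accelerations are computed before any body is moved.
    for body, (ax, ay) in zip(bodies, accelerations(bodies)):
        vx = body.vel[0] + ax * dt
        vy = body.vel[1] + ay * dt
        body.vel = (vx, vy)
        body.pos = (body.pos[0] + vx * dt, body.pos[1] + vy * dt)
```

Updating the velocity before the position (rather than plain explicit Euler) keeps orbits far more stable over long runs at the same cost.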
Python is Weird
Dave Halter
A lot of people think that Python is a really simple and straightforward language. Python hides a lot of peculiarities very well, but for the sake of this talk we will try to uncover them.
Is ++4; valid Python? And what does it do? Let me give you an introduction to tokenizers and parsers.
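Without spoiling the whole talk, the standard-library tokenize and ast modules already answer the opening question. This sketch shows that the tokenizer produces two separate "+" tokens (Python has no "++" operator), that the parser turns them into unary plus applied twice, and that the trailing semicolon merely ends the statement.

```python
import ast
import io
import tokenize

source = "++4;\n"

# Tokenizer: two separate "+" operators followed by ";" (there is no "++").
ops = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.OP
]
print(ops)  # ['+', '+', ';']

# Parser: an expression statement holding UnaryOp(UAdd, UnaryOp(UAdd, 4)).
tree = ast.parse(source)
expr = tree.body[0].value
print(ast.dump(expr))

# It runs fine; the value of the expression statement is simply discarded.
exec(compile(tree, "<weird>", "exec"))
print(eval("++4"))  # 4
```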
Python on bare metal – Beginners tutorial with MicroPython on the pyboard
Christine Spindler
MicroPython is a complete reimplementation of Python that runs on small devices like microcontrollers. In this hands-on workshop I'll show how easy it is to use MicroPython on a pyboard.
Python with Apache OpenWhisk
Ansgar Schmidt
OpenWhisk is an open-source implementation of a so-called serverless computing platform. In a live presentation I will show how to write a serverless application and how to deal with libraries and events. OpenWhisk is an open-source alternative to AWS Lambda or Microsoft Azure Functions.
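A Python action in OpenWhisk is, by default, a module with a main function that receives the invocation parameters as a dict and returns a JSON-serializable dict. The hello-world sketch below follows that convention; the parameter name "name" is just an example. Because it is a plain function, it can be unit-tested locally before deployment.

```python
def main(args: dict) -> dict:
    """OpenWhisk entry point: parameters in, JSON-serializable dict out."""
    name = args.get("name", "stranger")
    greeting = "Hello " + name + "!"
    print(greeting)  # printed output ends up in the activation logs
    return {"greeting": greeting}


# Local smoke test, no OpenWhisk needed:
print(main({"name": "World"}))  # {'greeting': 'Hello World!'}
```

With the wsk command-line tool, such a file is typically deployed with "wsk action create" and then invoked with parameters; libraries beyond those in the runtime image have to be packaged with the action, which is one of the points the talk covers.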
Rasa: open source conversational AI to build next generation chatbots
Joey Faulkner
Soon you will primarily communicate with your computer through conversation. At Rasa, we believe that this revolution in user experience should be available to everyone. In this spirit we have developed open source tools that use machine learning to make chatbots in a developer-friendly interface.
Really Deep Neural Networks with PyTorch
David Dao
Modern neural networks have hundreds of layers! How can we train such deep networks? Simply stacking layers on top doesn't work! This talk introduces the deep learning library PyTorch by explaining the exciting math, cool ideas and simple code behind what makes really deep neural networks work.
Simple Data Engineering in Python 3.5+ with Bonobo
Romain Dorgueil
Simple is better than complex, and that's True for data pipelines, too.
Bonobo is a Python 3.5+ tool used to write and monitor data pipelines. It’s plain, simple, modern, and atomic Python.
This talk is a practical encounter, from zero to a complete data pipeline.
Spoiler: no «big data» here.
Sport analysis with Python
Thuy Le
Sport analysis with Python and data visualization with Tableau. We have sample data of a team in a football match (player names, positions and velocities), recorded every 20 milliseconds. We use Python for the analysis and Tableau to visualize the activities of each player.
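With positions sampled at a fixed 20 ms interval, a player's speed can be estimated from consecutive samples by finite differences: distance between two points divided by the sampling interval. This sketch assumes (x, y) positions in metres and a hypothetical helper name; it is not the speaker's actual pipeline.

```python
import math
from typing import List, Tuple

SAMPLE_INTERVAL_S = 0.020  # positions are recorded every 20 milliseconds


def speeds(positions: List[Tuple[float, float]], dt: float = SAMPLE_INTERVAL_S) -> List[float]:
    """Speed in m/s between consecutive (x, y) samples, via finite differences."""
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


# A player moving 0.1 m every 20 ms runs at about 5 m/s.
print(speeds([(0.0, 0.0), (0.06, 0.08), (0.12, 0.16)]))
```

At 50 samples per second, raw differences are noisy, so smoothing the positions (for example with a rolling mean) before differencing is usually worthwhile.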
Synthetic Data for Machine Learning Applications
Dr. Hendrik Niemeyer
In this talk I will show how we use real and synthetic data to create successful models for risk assessing pipeline anomalies. The main focus is the estimation of the difference in the statistical properties of real and generated data by machine learning methods.
Technical Lessons Learned from Pythonic Refactoring
Yenny Cheung
Ever stumbled upon poorly-maintained codebases that suck away your productivity? Fear no more! This talk addresses how to identify code smells (from Brie to Bleu cheese) and goes through examples of refactoring code and APIs. You will learn the art of writing clean, maintainable and idiomatic Python code.
The BorgBackup Project
Thomas Waldmann
BorgBackup is a modern, deduplicating backup software written in Python 3.4+, Cython and C.
The talk will start with a quick presentation about the software and why you may want to use it for your backups.
Then, I will show how we run the software project: Tools, Services, Best Practices.
The Mustache Movement
Heidi Thorpe
Generative Adversarial Networks (GANs) are a class of neural networks which are powerful and flexible tools. A common application is image generation. I would like to give a simple introduction to GANs using existing python modules and an example of how "mustache-ness" can be learned and applied.
The Python Ecosystem for Data Science: A Guided Tour
Christian Staudt
Pythonistas have access to an extensive collection of tools for data analysis. The space of tools is best understood as an ecosystem: Libraries build upon each other, and a good library fills an ecological niche by doing certain jobs well. This is a guided tour of the Python data science ecosystem.
The Snake in the Tar Pit: Complex Systems with Python
Stephan Erb
The Zen of Python motivates us to build software that is easy to maintain and extend. In reality however, we often end up with systems that are quite the opposite: complex and hard to change. In this talk, we will have a look at why this happens and how we can try to prevent it.
The eye of the Python, an eye tracking system. From zero to... what eye learned.
Samuel Muñoz Hidalgo | BEEVA
Is it possible to predict the point on the screen a person is looking at? Easy to say but hard to do. An eye tracking system is the perfect project to learn the difficulties of applied machine learning, from gathering training data to building the final software with acceptable performance.
Theoretical physics with sympy
Florian Thöle
In this talk, I will introduce the basics of sympy. Using a simple model system in magnetism, we'll play around with simplifications, then do a bit of numerical optimization and in the end make psychedelic-looking figures.
Time series feature extraction with tsfresh - “get rich or die overfitting”
Nils Braun
Have you ever thought about developing a time series model to predict stock prices? Or do you find log time series from the operation of cloud resources more compelling? In either case, you should really consider using the time series feature extraction package tsfresh for your project.
Turbodbc: Turbocharged database access for data scientists
Michael König
Python's database API 2.0 is well suited for transactional database workflows, but not so much for column-heavy data science. This talk explains how the ODBC-based turbodbc database module extends this API with first-class, efficient support for familiar NumPy and Apache Arrow data structures.
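The mismatch the abstract describes is easy to see with any DB API 2.0 driver: fetchall() returns one tuple per row, while column-oriented tools want one array per column, so the rows must be pivoted in Python. This sketch uses the standard-library sqlite3 module, not turbodbc, and a hypothetical helper; turbodbc's point is to avoid this per-row step by returning columns directly (its documentation describes cursor methods such as fetchallnumpy()).

```python
import sqlite3
from typing import Dict, List


def fetch_columns(cursor: sqlite3.Cursor, query: str) -> Dict[str, List[object]]:
    """Run a query through the DB API and pivot the row tuples into columns."""
    cursor.execute(query)
    names = [description[0] for description in cursor.description]
    rows = cursor.fetchall()  # DB API 2.0: one Python tuple per row
    # Transpose into one list per column, the layout column-heavy tools expect.
    columns = list(zip(*rows)) if rows else [() for _ in names]
    return {name: list(values) for name, values in zip(names, columns)}


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE prices (product TEXT, price REAL)")
cur.executemany("INSERT INTO prices VALUES (?, ?)", [("a", 1.5), ("b", 2.0)])
print(fetch_columns(cur, "SELECT product, price FROM prices ORDER BY product"))
```

For a few rows this is harmless, but for millions of rows the creation of one Python object per value and the transpose dominate the cost, which is where a columnar fetch pays off.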
Verified fakes with OpenAPI
Lauris Jullien
It can be hard to test code that depends on external services. Often such services are mocked, but with time, it can be challenging to keep these mocks up to date. Verified fakes can solve this problem, and we will see how to set them up using OpenAPI and python.
Vim your Python, Python your Vim
Miroslav Šedivý
What do you use to write source code, docs, books or e-mails? Single brain, single pair of hands, single keyboard, but a different keyboard layout for each language and a different text editor for each purpose?
Why Python Has Taken Over Finance
Yves Hilpisch
Not too long ago, the finance field was dominated by compiled languages, such as C or C++, since they were considered to be the right choice for the implementation of computationally demanding algorithms. This talk explains why Python has become No. 1 in the field.