Python's versatility is well known to admins. In this talk I'd like to show how it fits the many small and big challenges I meet regularly, from tiny scripts to large systems. I'll also show how using the language's advanced and/or newer features makes scripts more compact and robust.
PyMC3 allows you to build statistical models for a wide range of datasets, use those models to estimate underlying parameters, and compute the uncertainty about those parameters. In this talk I will try to give a gentle introduction to PyMC3, and help avoid common pitfalls for new users.
billiger.de is one of the largest price comparison websites in Germany. In this talk, we want to share how we built the scalable, event-driven processing system that renders the products for our website using Python, Elasticsearch and Redis.
SAP operates a dedicated test infrastructure with more than 400TB main memory for its in-memory database SAP HANA. All custom implementations like improved scheduling, caching of artifacts and monitoring were implemented in our favorite programming language Python.
In this talk you will get an overview of some awesome features of contemporary Linux networking, how to easily integrate them with some cool open-source tools, and how to glue all this together with SaltStack and some Python to get your very own SDN controller for a service-provider-style network.
Uwe L. Korn
While Python itself hosts a wide range of machine learning and data tools, other ecosystems like the Hadoop world also provide beneficial tools that can be connected either via Apache Parquet files or in memory using Arrow. This talk shows recent developments that allow interoperation at speed.
There is no data science without ETL! This presentation is about implementing maintainable data integration for your projects. We will have a first look at ‘Ozelot’, a library based on Luigi and SQLAlchemy that helps you get started with building ETL pipelines.
This presentation brings forward the practical issues the industry faces today as we move toward industrializing data science algorithms. We will discuss best practices around organization, methodology and tools to integrate a data science project into production.
Natalie Speiser, Jens Beyer
AI and Machine Learning are taking over the world - but how do you actually start with understanding your data and predicting events? And what kind of "political" trouble could you run into? With examples from real projects, we try to give you a feeling for data science projects.
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge and some of the core mathematical concepts, and will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in Python.
Pandas is the Swiss Army knife for data analysis in Python. In this talk we will look deeper into how to gain productivity by utilizing Pandas' powerful indexing and make advanced analytics a piece of cake. Pandas features multiple index types. This talk will give you a deep insight into Pandas indexes and showcase the handiness of special indexes such as the DatetimeIndex.
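As a small, hedged illustration of the indexing power the abstract refers to, here is a sketch of partial-string slicing on a pandas DatetimeIndex (the data and column names are made up):

```python
import pandas as pd
import numpy as np

# Hourly measurements indexed by a DatetimeIndex (illustrative data).
idx = pd.date_range("2017-10-01", periods=72, freq="h")
df = pd.DataFrame({"load": np.arange(72)}, index=idx)

# Partial-string indexing: select a whole day without spelling out
# exact start/end timestamps.
day = df.loc["2017-10-02"]
print(len(day))  # 24 rows, one per hour of that day
```

The same partial-string trick works for months ("2017-10") and years, which is a large part of what makes time-series slicing in pandas so convenient.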
Jens Nie, Peer Wagner
Introducing a new programming language in a company is always a daring task, usually involving a lot of effort and the will for change.
We'd like to take you on a journey reflecting eight years of challenges, solutions and success, ending in a best-practice guide that helps you achieve the same.
Apache Airflow is an open-source Python project which facilitates an intuitive programmatic definition of analytical data pipelines. Based on 2+ years of production experience, we summarize its core concepts, detail lessons learned and set it in context within the Big Data analytics ecosystem.
An introduction and hands-on example of how to start Continuous Delivery for Python (or any other) projects with conda and GitLab, which are open source, free to use, and, if you wish, even available as a cloud service.
billiger.de is a German price comparison site. Search is handled by a heavily customized Solr setup. When switching to SolrCloud earlier this year, instead of porting our custom SolrComponents to SolrCloud, we ended up re-implementing them in a Python service layer. Here we show how, and why.
Scikit-Learn is built directly on numpy, Python's numerical array library. Pandas adds metadata and higher-level munging capabilities on top of numpy. This talk describes how to intelligently auto-wrap Scikit-Learn to create a version that can leverage pandas' added features.
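To make the auto-wrapping idea concrete, here is a minimal sketch of the pattern under assumed names (`MeanImputer` stands in for any array-based Scikit-Learn-style transformer; `pandas_wrap` is a hypothetical helper, not the talk's actual library):

```python
import numpy as np
import pandas as pd

class MeanImputer:
    """Stand-in for an array-based transformer: fills NaNs with column means."""
    def fit(self, X):
        self.means_ = np.nanmean(X, axis=0)
        return self
    def transform(self, X):
        X = np.asarray(X, dtype=float).copy()
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = np.take(self.means_, cols)
        return X

def pandas_wrap(est):
    """Wrap an array-based transformer so DataFrames go in and come out,
    preserving the index and column labels that numpy would drop."""
    class Wrapped:
        def fit(self, df):
            est.fit(df.to_numpy())
            return self
        def transform(self, df):
            out = est.transform(df.to_numpy())
            return pd.DataFrame(out, index=df.index, columns=df.columns)
    return Wrapped()

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
imputed = pandas_wrap(MeanImputer()).fit(df).transform(df)
print(list(imputed.columns))  # ['a', 'b'] -- labels survive the round trip
```

The design point: the estimator keeps working on bare arrays; the wrapper is the only place that knows about pandas metadata.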
So you've heard about this new thing called GraphQL. What is it all about? What problems does it solve, and most importantly, how can you leverage it in the Python ecosystem? This talk is a tell-all on what GraphQL is and how you can start using it with Python.
Computer languages are a remarkable feat of human scientific engineering. In this talk, we'll look at the innards of CPython, and specifically learn how to modify and hack Abstract Syntax Trees (for world peace, of course).
Swarm64DB is a hardware-accelerated plugin for PostgreSQL and other RDBMS. By using Swarm64DB in combination with PostgreSQL, Python and the right scaling mechanism, we are able to push the ingestion throughput into areas where Python can easily compete with compiled languages. The talk highlights the architecture of our solution and showcases a real-world use case.
The presentation will explain how GeoPandas and other tools are used to analyse GTFS files to calculate the reachability of a public transport system.
Jupyter Notebooks combine executable code and rich text elements in a web application. In this talk you will learn how a custom JupyterHub installation can be used to integrate Jupyter Notebooks into your infrastructure, including existing authentication methods and custom software distributions.
So you’ve decoupled your code monolith into all those micro chunks. When someone asks “How can I…”, you want to answer: “That’s easy! We’ve built that.” Actually, you’ve built all the parts needed for that. Who plugs them together? And how?
For prescriptive analytics applications, data science teams need to design, build and maintain complex machine learning pipelines. In this talk, we demonstrate how such pipelines can be implemented in a robust, scalable and extensible manner using Python, Luigi, PySpark and scikit-learn.
Think you can benefit from making your Python application run faster? Then come along and learn how to tune your code with Cython.
Dr. Andreas Schilling
How do you kick-start a project that is based on 2.5 GB of unstructured specification documents? To answer this question, we present our lessons learned from developing a Python-based knowledge management tool that provides a lightweight and intuitive browser frontend.
You have an existing codebase of tens or hundreds of thousands of lines of Python code? Learn how to get started with type annotations! Get your teammates (and yourself!) to always annotate your code. Find out what unexpected issues you might run into and how to solve them, all with this talk.
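As a hedged before/after sketch of what gradual annotation looks like (the functions and data are invented for illustration; a checker such as mypy would enforce the annotated contract):

```python
from typing import Dict, List, Optional

# Before: an unannotated function, typical of a legacy codebase.
def find_user(users, name):
    for u in users:
        if u["name"] == name:
            return u
    return None

# After: the annotations document that the function may return None,
# so a type checker can flag call sites that forget to handle it.
def find_user_typed(users: List[Dict[str, str]], name: str) -> Optional[Dict[str, str]]:
    for u in users:
        if u["name"] == name:
            return u
    return None

users = [{"name": "ada"}, {"name": "linus"}]
print(find_user_typed(users, "ada"))  # {'name': 'ada'}
```

Annotations change nothing at runtime, which is exactly why they can be added incrementally to a large codebase, module by module.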
The challenge of data integration is real. The sheer amount of tools that exist to address this problem is proof that organizations struggle with it. This talk will discuss the inherent challenges of data integration, and show how it can be tackled using Python, Apache Airflow and Apache Spark.
What you do in Ansible should be clean and simple. What we did was not. I will show what we did wrong, but also what we have changed, or still have to change, to make our lives easier again. I will also show how we progressively utilize Ansible to deploy our data science infrastructure.
If you have services running in production, something will fail sooner or later. We cannot avoid this completely, but we can prepare for it. In this talk we will have a look at how Sentry and Prometheus can help to get better insight into our systems to quickly track down the cause of failure.
shop.rewe.de is not only visited by human customers, but also by machines. We have built a deep learning platform in Python using Keras and TensorFlow on Google's infrastructure. In this talk we would like to show you how Python is used in practice, supporting 2.5 million visitors each day.
The power of some popular web applications like WordPress comes from a flexible plugin system. This talk will show how to implement such plugin architectures for Python web applications including real-world examples. I'll give examples with Django, but the important bits aren't Django-specific.
In this session you will get a gentle introduction to the ever-expanding world of small programmable devices: learn to use single board computers and microcontrollers to connect to sensors and talk to APIs - all using Python or MicroPython, a subset of Python 3 for use in constrained environments.
Thomas Reifenberger, Martin Foertsch
Using humanoid robots, VR glasses and 3D cameras you can experience the world through the eyes of a robot and control it via gestures. We built a telepresence robotics system based on a Nao robot, an Oculus Rift and a Kinect One to realize an immersive "out-of-body experience" as in "Avatar".
The PyGenSA Python module has been developed for generalized simulated annealing, to process complicated non-linear objective functions with a large number of local minima.
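To show the core idea behind annealing-style optimizers, here is a minimal, self-contained sketch (this is plain simulated annealing written from scratch, not PyGenSA itself or its API; function and parameter names are invented):

```python
import math
import random

def rastrigin(x):
    # Classic multimodal test function: many local minima, global minimum 0 at x = 0.
    return 10 + x * x - 10 * math.cos(2 * math.pi * x)

def anneal(f, x0, temp=5.0, cooling=0.995, steps=5000, seed=42):
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        cand = x + rng.gauss(0, 1)
        # Always accept downhill moves; accept uphill moves with a
        # probability that shrinks as the temperature cools, which is
        # what lets the search escape local minima early on.
        if f(cand) < f(x) or rng.random() < math.exp(-(f(cand) - f(x)) / temp):
            x = cand
        if f(x) < f(best):
            best = x
        temp *= cooling
    return best

best = anneal(rastrigin, x0=4.5)
print(best)
```

Generalized simulated annealing (as in PyGenSA) replaces the Gaussian proposals and exponential acceptance rule with heavier-tailed, tunable distributions, but the accept/cool loop above is the shared skeleton.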
The N Body Problem is a computationally complex problem that we use to predict how planets and galaxies – and everything in between – move through space. I'll show you some interesting ways to calculate it, and we'll have a look at what to do, should you find yourself in a space ship's pilot seat.
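One of the "interesting ways to calculate it" can be sketched in a few lines of numpy; this is a generic softened-gravity leapfrog integrator in dimensionless units, not the speaker's code, and all names are mine:

```python
import numpy as np

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Pairwise gravitational accelerations, softened to avoid singularities."""
    diff = pos[None, :, :] - pos[:, None, :]           # r_j - r_i for all pairs
    dist3 = (np.sum(diff**2, axis=-1) + eps**2) ** 1.5
    np.fill_diagonal(dist3, np.inf)                    # no self-interaction
    return G * np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)

def leapfrog(pos, vel, mass, dt=0.01, steps=1000):
    """Kick-drift-kick leapfrog: symplectic, so energy drift stays bounded."""
    acc = accelerations(pos, mass)
    for _ in range(steps):
        vel += 0.5 * dt * acc
        pos += dt * vel
        acc = accelerations(pos, mass)
        vel += 0.5 * dt * acc
    return pos, vel

# Two equal masses on a rough mutual orbit (arbitrary units).
mass = np.array([1.0, 1.0])
pos = np.array([[-0.5, 0.0], [0.5, 0.0]])
vel = np.array([[0.0, -0.5], [0.0, 0.5]])
pos, vel = leapfrog(pos, vel, mass)
print(np.sum(mass[:, None] * vel, axis=0))  # total momentum stays ~0
```

Because the pairwise forces are equal and opposite, total momentum is conserved to floating-point precision, which makes a handy sanity check for any N-body code.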
A lot of people think that Python is a really simple and straightforward language. Python hides a lot of peculiarities very well, but for the sake of this talk we will try to uncover them.
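One well-known peculiarity of the kind the talk hints at is the mutable default argument; a minimal demonstration:

```python
# Default arguments are evaluated once, at function definition time,
# so a mutable default is shared across all calls.
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2] -- the same list again!

# The idiomatic fix: use None as a sentinel and create the list per call.
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_good(1))  # [1]
print(append_good(2))  # [2]
```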
++4; valid Python? And what does it do? Let me give you an introduction to tokenizers/parsers.
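For the curious: the standard library's `ast` module already answers the teaser. `++4` parses as two nested unary plus operators, `+(+4)`, so there is no C-style increment hiding in Python:

```python
import ast

# ++4 is valid Python: two nested unary plus operators, +(+4).
tree = ast.parse("++4", mode="eval")
node = tree.body
print(type(node).__name__)        # UnaryOp
print(type(node.op).__name__)     # UAdd
print(type(node.operand).__name__)  # UnaryOp -- the inner +4
print(eval("++4"))                # 4
```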
MicroPython is a complete reimplementation of Python that runs on small devices like microcontrollers. In this hands-on workshop I'll show how easy it is to use MicroPython on a pyboard.
OpenWhisk is an open-source implementation of a so-called serverless computing platform. In a live presentation I will show how to write a serverless application and how to deal with libraries and events. OpenWhisk is an open-source alternative to AWS Lambda or Microsoft Azure Functions.
Soon you will primarily communicate with your computer through conversation. At Rasa, we believe that this revolution in user experience should be available to everyone. In this spirit we have developed open-source tools that use machine learning to build chatbots through a developer-friendly interface.
Modern neural networks have hundreds of layers! How can we train such deep networks? Simply stacking layers on top doesn't work! This talk introduces the deep learning library PyTorch by explaining the exciting math, cool ideas and simple code behind what makes really deep neural networks work.
Simple is better than complex, and that's True for data pipelines, too.
Bonobo is a Python 3.5+ tool used to write and monitor data pipelines. It’s plain, simple, modern, and atomic Python.
This talk is a practical encounter, from zero to a complete data pipeline.
Spoiler: no “big data” here.
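The extract → transform → load shape such pipelines take can be sketched with plain generators (this is not Bonobo's API, just the underlying idea, with invented data and names):

```python
# A data pipeline in the plain-Python spirit: each stage is a generator,
# so rows stream through one at a time instead of being materialized.
def extract():
    yield from ["alice,3", "bob,5", "carol,2"]

def transform(rows):
    for row in rows:
        name, count = row.split(",")
        yield {"name": name, "count": int(count)}

def load(records, sink):
    for rec in records:
        sink.append(rec)

sink = []
load(transform(extract()), sink)
print(sink[0])  # {'name': 'alice', 'count': 3}
```

A framework like Bonobo adds what generators alone lack: wiring stages into a graph, running them, and monitoring throughput.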
Sports analysis with Python, visualized with Tableau. We have sample data from a team in a football match (player names, positions and velocities), recorded every 20 milliseconds. We use Python for the analysis and Tableau to visualize the activities of each player.
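A hedged sketch of one analysis step on such tracking data: deriving a player's speed from consecutive position samples with pandas (the data here is invented; real files would carry one row per player per 20 ms sample):

```python
import pandas as pd

# Hypothetical tracking data: one player's x/y position every 20 ms.
df = pd.DataFrame({
    "t": [0.00, 0.02, 0.04, 0.06],   # seconds
    "x": [0.0, 0.1, 0.3, 0.6],       # metres
    "y": [0.0, 0.0, 0.1, 0.1],
})

# Speed from consecutive samples: distance travelled / time step (m/s).
dx, dy, dt = df["x"].diff(), df["y"].diff(), df["t"].diff()
df["speed"] = ((dx**2 + dy**2) ** 0.5) / dt
print(df)
```

The resulting per-sample speed column is exactly the kind of derived measure one would then hand to Tableau for visualization.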
Dr. Hendrik Niemeyer
In this talk I will show how we use real and synthetic data to create successful models for risk assessment of pipeline anomalies. The main focus is estimating the difference in the statistical properties of real and generated data using machine learning methods.
Ever stumbled upon poorly-maintained codebases that suck away your productivity? Fear no more! This talk addresses how to identify code smell (from Brie to Bleu cheese) and go through examples to refactor code and APIs. You will learn the art of writing clean, maintainable and idiomatic Python code.
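In the spirit of the talk, a tiny before/after refactoring example (entirely invented, but typical of the smells mentioned: deep nesting and an unnamed condition):

```python
# Smelly: deep nesting buries the actual rule being applied.
def total_bad(orders):
    result = 0
    for o in orders:
        if o is not None:
            if o["status"] == "paid":
                if o["amount"] > 0:
                    result = result + o["amount"]
    return result

# Refactored: the condition gets a name, the loop becomes a comprehension.
# Same behavior, but the intent is now stated rather than implied.
def is_countable(order):
    return order is not None and order["status"] == "paid" and order["amount"] > 0

def total_good(orders):
    return sum(o["amount"] for o in orders if is_countable(o))

orders = [{"status": "paid", "amount": 10}, None, {"status": "open", "amount": 99}]
print(total_bad(orders), total_good(orders))  # 10 10
```

Keeping both versions around briefly, with a test asserting they agree, is a safe way to refactor legacy code.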
BorgBackup is a modern, deduplicating backup software written in Python 3.4+, Cython and C.
The talk will start with a quick presentation about the software and why you may want to use it for your backups.
Then, I will show how we run the software project: Tools, Services, Best Practices.
Generative Adversarial Networks (GANs) are a class of neural networks which are powerful and flexible tools. A common application is image generation. I would like to give a simple introduction to GANs using existing python modules and an example of how "mustache-ness" can be learned and applied.
Pythonistas have access to an extensive collection of tools for data analysis. The space of tools is best understood as an ecosystem: Libraries build upon each other, and a good library fills an ecological niche by doing certain jobs well. This is a guided tour of the Python data science ecosystem.
The Zen of Python motivates us to build software that is easy to maintain and extend. In reality however, we often end up with systems that are quite the opposite: complex and hard to change. In this talk, we will have a look at why this happens and how we can try to prevent it.
Samuel Muñoz Hidalgo | BEEVA
Is it possible to predict the point on the screen where a person is looking? Easy to say but hard to do. An eye-tracking system is the perfect project to learn the difficulties of applied machine learning, from gathering training data to building the final software with acceptable performance.
In this talk, I will introduce the basics of sympy. Using a simple model system in magnetism, we'll play around with simplifications, then do a bit of numerical optimization and in the end make psychedelic-looking figures.
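A taste of the workflow the abstract describes, as a hedged sketch (the expression is a toy stand-in, not the talk's actual magnetism model):

```python
import sympy as sp

# A toy trigonometric expression: simplify symbolically, then make it numeric.
theta = sp.symbols("theta")
expr = sp.sin(theta)**2 + sp.cos(theta)**2 + sp.cos(2 * theta)
simplified = sp.simplify(expr)   # collapses to 1 + cos(2*theta)
print(simplified)

# lambdify turns the symbolic result into an ordinary numerical function,
# the usual bridge from sympy to numerical optimization or plotting.
f = sp.lambdify(theta, simplified)
print(f(0.0))  # 2.0
```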
Have you ever thought about developing a time series model to predict stock prices? Or do you consider log time series from the operation of cloud resources more compelling? In either case you really should consider using the time series feature extraction package tsfresh for your project.
Python's database API 2.0 is well suited for transactional database workflows, but not so much for column-heavy data science. This talk explains how the ODBC-based turbodbc database module extends this API with first-class, efficient support for familiar NumPy and Apache Arrow data structures.
It can be hard to test code that depends on external services. Often such services are mocked, but with time, it can be challenging to keep these mocks up to date. Verified fakes can solve this problem, and we will see how to set them up using OpenAPI and Python.
What do you use to write source code, docs, books or e-mails? Single brain, single pair of hands, single keyboard, but a different keyboard layout for each language and a different text editor for each purpose?
Not too long ago, the finance field was dominated by compiled languages, such as C or C++, since they were considered to be the right choice for the implementation of computationally demanding algorithms. This talk explains why Python has become the No. 1 language in the field.