Event Date: June 25, 2015 15:15
Apache Flink and the Berlin Big Data Center
Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low barriers to entry and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today’s multi-billion dollar data management market include data independence, which separates physical representation and storage from the actual information, and declarative languages, which separate the program specification from its intended execution environment. In contrast, today’s big data solutions do not offer data independence and declarative specification.
As a result, big data technologies are mostly employed in newly established companies with IT-savvy employees or in large, well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption, due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to the IT-savvy industries. Specifically, data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment.
We believe that computer science research needs to bring the powerful concepts of declarative specification, query optimization, and automatic parallelization, as well as adaptation to novel hardware, data characteristics, and workloads, to current data analysis systems, in order to achieve broad adoption of big data technology and effectively deliver on the promise that novel big data technologies offer. We will present the technologies that we have researched and developed in the context of Apache Flink (http://flink.apache.org) and will give an outlook on further research and development that we are conducting at the Database Systems and Information Management Group (DIMA) at TU Berlin and the Berlin Big Data Center (http://bbdc.berlin, http://www.dima.tu-berlin.de), as well as some current research challenges.
Bio
Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin (TU Berlin). Volker also holds a position as an adjunct full professor at the University of Toronto and is director of the research group “Intelligent Analysis of Mass Data” at DFKI, the German Research Center for Artificial Intelligence. Earlier in his career, Dr. Markl led a research group at FORWISS, the Bavarian Research Center for Knowledge-based Systems in Munich, Germany, and was a Research Staff Member and Project Leader at the IBM Almaden Research Center in San Jose, California, USA. His research interests include new hardware architectures for information management, scalable processing and optimization of declarative data analysis programs, and scalable data science, including graph and text mining and scalable machine learning. Volker Markl has presented over 200 invited talks in numerous industrial settings and at major conferences and research institutions worldwide.
He has authored and published more than 100 research papers at world-class scientific venues. Volker regularly serves as member and chair for program committees of major international database conferences. He has been a member of the computer science evaluation group of the Natural Science and Engineering Research Council of Canada (NSERC). Volker has 18 patent awards, and he has submitted over 20 invention disclosures to date. Over the course of his career, he has garnered many prestigious awards, including the European Information Society and Technology Prize, an IBM Outstanding Technological Achievement Award, an IBM Shared University Research Grant, an HP Open Innovation Award, an IBM Faculty Award, a Trusted-Cloud Award for Information Marketplaces by the German Ministry of Economics and Technology, the Pat Goldberg Memorial Best Paper Award, and a VLDB Best Paper Award. He has been speaker and principal investigator of the Stratosphere collaborative research unit funded by the German National Science Foundation (DFG), which resulted in numerous top-tier publications as well as the "Apache Flink" big data analytics system. Apache Flink is available as open source, is currently used in production by several companies, and serves as a basis for teaching and research at several institutions in Germany, Europe, and the United States. Volker currently serves as the secretary of the VLDB Endowment, advises several companies and startups, and in 2014 was elected as one of Germany's leading "digital minds" (Digitale Köpfe) by the German Informatics Society (GI).