From: Florent Masseglia <florent.masseglia@inria.fr>
Date: Mon, Mar 12, 2012 at 11:59 PM
Subject: [Dbworld] Postdoc Position: Data Mining in the Cloud (Montpellier, France)
Title: Data Mining in the Cloud

Topic: Cloud platforms rely on technologies and architectures that handle massive distribution of data and computation. They are usually provided and maintained by major companies (Amazon, Google, Yahoo, Microsoft). Hadoop is a free platform written in Java that allows data management and processing in a cloud environment. Hadoop is maintained by the Apache Software Foundation and implements Google's MapReduce programming model. Today, most solutions for data mining in the cloud are straightforward implementations of existing algorithms in the programming model of the selected cloud platform. A basic illustration is the MapReduce implementation of the Apriori algorithm, which performs successive counting steps that rely on the native cloud primitives.
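To make the "successive counting steps" concrete, here is a minimal single-machine sketch of how Apriori's counting passes map onto a map phase and a reduce phase. It is only an illustration under stated assumptions: it uses plain Python rather than Hadoop, and the function names (map_phase, reduce_phase, apriori) and the sample transactions are hypothetical, not taken from any existing implementation.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k, prev_frequent):
    """Mapper (hypothetical): emit (itemset, 1) for each k-item subset of one
    transaction whose (k-1)-item subsets were all frequent in the previous pass."""
    for itemset in combinations(sorted(set(transaction)), k):
        if k == 1 or all(sub in prev_frequent
                         for sub in combinations(itemset, k - 1)):
            yield itemset, 1


def reduce_phase(pairs, min_support):
    """Reducer (hypothetical): sum the counts per itemset and keep the
    itemsets whose total reaches min_support."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def apriori(transactions, min_support):
    """Run one map/reduce counting pass per itemset size until no itemset
    of the current size is frequent."""
    frequent, prev, k = {}, {}, 1
    while True:
        pairs = (pair for t in transactions
                 for pair in map_phase(t, k, prev))
        level = reduce_phase(pairs, min_support)
        if not level:
            return frequent
        frequent.update(level)
        prev, k = level, k + 1


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    baskets = [["bread", "milk"], ["bread", "butter", "milk"],
               ["butter", "milk"], ["bread", "butter"]]
    for itemset, support in sorted(apriori(baskets, min_support=2).items()):
        print(itemset, support)
```

In this sketch each pass needs only the previous pass's frequent itemsets, which is why the algorithm fits the native map and reduce primitives so directly; on a real cluster the mapper calls would run in parallel over partitions of the transactions.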

However, not all algorithms lend themselves to such straightforward implementations. This work aims to develop a set of data mining primitives optimized for a cloud environment. Such primitives have to be useful for different data mining tasks (e.g., finding frequent itemsets and sequential patterns, clustering, etc.).

Missions and activities:

Your mission will consist of:

  • Proposing efficient algorithms for some primitives that are useful for different data mining tasks and require a specific adaptation in the cloud.

  • Implementing the proposed algorithms on an experimental platform for large scale parallel and distributed systems.

  • Performing experiments on real scientific data and evaluating the performance of your implementation for the tackled data mining primitives and the associated data mining tasks.

Skills and profiles:

– Strong knowledge of statistics.

– Good proficiency in English.

– Good programming skills in Java.

– A Ph.D. in computer science or mathematics.

Duration, Location and Salary:

Duration is 12 months and location is Montpellier.

The net salary is 2138 euros and includes social security.

A first round of selection will be organized in April for those who apply before March 16, 2012. If some positions remain available after the first round, a second round will be organized in late June. For this second round, the deadline for applying is June 29, 2012. We strongly recommend that applicants submit before the first deadline, i.e., before March 16, 2012.


The Zenith project-team of INRIA, headed by Patrick Valduriez, aims to propose new solutions related to scientific data and activities. Our research topics incorporate the management and analysis of massive and complex data, such as uncertain data, in highly distributed environments.

Our team is located in Montpellier and hosted by the LIRMM Laboratory. Montpellier is a very active town located in the south of France. It is home to major research labs that work on environment and health, such as INRA, CIRAD, and IRD. Generally speaking, these scientific activities generate extremely large amounts of complex data that need to be managed and analyzed.


Reza Akbarinia (reza.akbarinia@inria.fr) and Florent Masseglia (florent.masseglia@inria.fr).

Zenith Web page: http://www-sop.inria.fr/teams/zenith/

Application page:


