Analyze Oracle Data Directly from Hadoop Compute Engines (HiveQL, Spark, Impala, etc.)

For most companies, big data comes from enterprise data, including point-of-sale records and financial, medical, intelligence, and call records, stored in Oracle databases. However, MapReduce algorithms and patterns such as clustering (K-Means, Canopy), classification (Naive Bayes, k-NN), and recommenders cannot easily be implemented in SQL. Furthermore, shipping enterprise data out of the database introduces latency, scalability, integrity, and security issues.
How can you analyze enterprise data in Oracle databases using modern Hadoop compute engines (HiveQL, MapReduce, Tez, Cascading, Spark SQL, Impala, Redshift, etc.) and programming models without copying the data over? This session describes (i) an HCatalog-based JDBC storage handler for safely processing relational data from a Hadoop cluster without copying it, and (ii) an "In-Database Container for Hadoop" for processing data in place.
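As a rough illustration of the storage-handler idea, the HiveQL sketch below defines an external table whose rows are fetched from Oracle over JDBC at query time rather than imported. The handler class, property names, connection URL, credentials, and table/column names are all illustrative assumptions (they follow the Apache Hive JDBC storage handler conventions and may differ from the handler presented in the session):

```sql
-- Sketch: a Hive external table backed by an Oracle table over JDBC.
-- Handler class and property names follow Apache Hive's JDBC storage
-- handler conventions and are assumptions, not this session's exact API.
CREATE EXTERNAL TABLE sales_oracle (
  sale_id    BIGINT,
  product_id BIGINT,
  amount     DOUBLE,
  sale_date  STRING
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "ORACLE",
  "hive.sql.jdbc.driver"   = "oracle.jdbc.OracleDriver",
  "hive.sql.jdbc.url"      = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",
  "hive.sql.dbcp.username" = "scott",
  "hive.sql.dbcp.password" = "tiger",
  "hive.sql.table"         = "SALES"
);

-- Queries run on the Hadoop cluster but read rows from Oracle in place,
-- with no import or copy step:
SELECT product_id, SUM(amount) AS revenue
FROM sales_oracle
GROUP BY product_id;
```

Because the data stays in Oracle, the enterprise data remains under the database's security and integrity controls while still being queryable from Hive-compatible engines.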

Kuassi Mensah is Group Product Manager for Oracle Database data access services, Net Services, and database programming APIs (Java, C/C++, PHP, Ruby, Python, Perl). Mr. Mensah holds an MS in Computer Science from the Programming Institute of University of Paris VI. He is a frequent speaker at Oracle and IT events, has published several articles and a book, and maintains a blog as well as Facebook, LinkedIn, and Twitter pages.


About Us

AIOUG is a non-profit organization that provides Oracle technology and database professionals the opportunity to enhance their productivity and influence the quality, usability, and support of Oracle technology.
