Components: Apache Hive
Description: The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
Provision script (Puppet manifest): hadoop.pp
Additional info: Hive site
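Example: a minimal sketch of what "projecting structure onto data already in storage" can look like. It builds a HiveQL `CREATE EXTERNAL TABLE` statement and a HiveServer2 JDBC URL. The table name `crawled_pages`, its columns, the HDFS path `/data/crawled_pages`, and the host `hive.example.local` are hypothetical placeholders, not values taken from this project. Port 10000 is the usual HiveServer2 default and may differ on the provisioned stand.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Build a HiveQL statement that defines a schema over files already in storage.

    An EXTERNAL table only records metadata; Hive does not move or own the files,
    so dropping the table leaves the data in place.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL (port 10000 is assumed as the default)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical table, columns, and path, for illustration only.
    print(external_table_ddl(
        "crawled_pages",
        [("url", "STRING"), ("fetched_at", "TIMESTAMP"), ("status", "INT")],
        "/data/crawled_pages",
    ))
    print(hive_jdbc_url("hive.example.local"))
```

The generated statement can be run from the Beeline command line tool, for example `beeline -u <jdbc-url> -e "<statement>"`, or through any JDBC client that has the Hive driver available.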
Table of contents:
- Home
- Prerequisites
- Development stand provisioning
- Components
- Crawler (Spring Boot, Java)
- Message Broker (Kafka)
- Distributed file storage (Hadoop’s HDFS)
- Data warehouse (Apache Hive)
- Distributed business logic cluster (Akka, Scala)
- Distributed data processing cluster (Spark, Scala)
- Indexing/Search engine (Elasticsearch)
- REST server (Lagom, Scala)
- Service coordination (Zookeeper)
- Time-series database (monitoring data) (InfluxDB)
- Metrics collector (Telegraf)
- Monitoring visualization service (Grafana)
- Reverse proxy/load-balancer (nginx)
- Monitoring Links
- Development