A little bit of info about the history of this project...
Disclaimer
Because I no longer have enough time for this project (and I've cancelled my hosting subscription), most of the links to services are dead. However, I decided to keep this repo and simply not publish links to it.
I'm a little bit ashamed of some of the code in it :(((
How many projects have suffered this fate!...
Main part
When I thought about how to put into practice all the technologies I had studied in recent years, I decided to create a project that could cover all my interests:
- a lot of data,
- distributed/cluster programming,
- ML,
- reactive/functional programming,
- network/REST programming,
- DevOps, and so on.
I chose the idea of implementing a news aggregator along with its server side for data collection, processing, storage and presentation. Not because I think the world needs yet another news aggregator, but because it makes it very easy to apply all of the technologies listed above.
From a high-level point of view, the project could be drawn like this:
The project is divided into many parts (one for each component, plus a few others), which are available in my repos with the prefix “story_line2_”.
I have described all components of the project (you can see them in the TOC), with links to their repos, CI status and deployment code.
Completed parts of the project run 24/7; you can monitor their status in Monitoring Links.
If you find any typos or have any questions, feel free to contact me (contacts are in the footer).
Table of contents:
- Home
- Prerequisites
- Development stand provisioning
- Components
  - Crawler (Spring Boot, Java)
  - Message Broker (Kafka)
  - Distributed file storage (Hadoop’s HDFS)
  - Data warehouse (Apache Hive)
  - Distributed business logic cluster (Akka, Scala)
  - Distributed data processing cluster (Spark, Scala)
  - Indexing/Search engine (Elasticsearch)
  - REST server (Lagom, Scala)
  - Service coordination (Zookeeper)
  - Time-series database (monitoring data) (InfluxDB)
  - Metrics collector (Telegraf)
  - Monitoring visualization service (Grafana)
  - Reverse proxy/load-balancer (nginx)
- Monitoring Links
- Development