Photo by Stephen Dawson on Unsplash

Kubernetes is our new operating system; no one can doubt that anymore. But while a lot of effort has gone into developing a microservices approach and migrating workloads to Kubernetes, organizations have left their data services behind.

COVID-19 showed us all how important data is, and how important it is to have the proper architecture and data fabric in place. Data won't stop growing; on the contrary, it will keep breaking consumption records year after year.

This challenge forces us to provide a more automated and scalable solution for our organizations by moving our data services…


Photo by Chris Liverani on Unsplash

The Big Data world is making its way toward Kubernetes, and we already see many data processing and AI/ML products building their solutions around Kubernetes to ensure stability, scalability, and availability.

Until now, most solutions were VM-based, with no orchestration, automation, or configuration-management layer above them, which made them less scalable and somewhat painful to operate.

With Kubernetes, we can provide far more scalable solutions that are fully automated and preserve their state after failures.

With the desire to run Big Data workloads on Kubernetes comes the need for a simple…


Photo by ev on Unsplash

Kubernetes has become the de facto standard container orchestration platform. With this approach, organizations are trying to gather all their applications and platforms around Kubernetes to take advantage of its stability, agility, and simplicity. Running your whole stack in Kubernetes gives you a single API and a common language, whether it's an application, a database, or a storage engine that needs to be deployed.

A few years ago, people believed that to gain more performance for big data workloads, your applications needed performant local disks, mostly based on flash media. …



After my last article on Ceph deployments, I have decided to tell you about a new capability called cephadm, which is now available in the Ceph Octopus upstream version and will be available in later RHCS versions as well. This capability allows you to deploy a whole Ceph cluster in under 10 minutes. cephadm is a deployment tool that makes your life much easier when dealing with Ceph cluster deployments: it uses Podman to run all the Ceph daemons, and deployment management is done over an SSH connection. There are a…
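As a rough sketch of such a bootstrap (the IP addresses and host names are illustrative placeholders, and the exact download location should be checked against the cephadm documentation for your Ceph version):

```shell
# Fetch the cephadm bootstrap script (Octopus build; URL is an assumption).
curl --silent --remote-name https://download.ceph.com/rpm-octopus/el8/noarch/cephadm
chmod +x cephadm

# Bootstrap a minimal one-node cluster; the monitor IP is a placeholder.
./cephadm bootstrap --mon-ip 192.168.1.10

# Add further hosts over SSH, then let the orchestrator place OSDs on
# every available device (host names/IPs below are hypothetical).
./cephadm shell -- ceph orch host add host2 192.168.1.11
./cephadm shell -- ceph orch apply osd --all-available-devices
```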



Today more and more organizations are moving away from ETL, a process in the form of Extract -> Transform -> Load, where data is extracted from its source location, transformed into clean, valuable data, and then loaded into a target database or warehouse. ETL jobs are batch-driven, time-consuming, and messy, mainly because there is no alignment between the different ETL pipeline components, which makes the ETL architecture look like a big plate of spaghetti.
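To make the Extract -> Transform -> Load flow concrete, here is a minimal, self-contained sketch in Python; the function names, sample records, and SQLite target are all illustrative assumptions, not part of any real pipeline discussed here:

```python
import sqlite3

def extract():
    # Stand-in for pulling raw, messy records from a source system.
    return [{"name": " Alice ", "amount": "10.5"},
            {"name": "bob", "amount": "3"}]

def transform(rows):
    # Clean the raw data: trim whitespace, normalize case, cast types.
    return [{"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in rows]

def load(rows, conn):
    # Load the cleaned records into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:name, :amount)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall())
# → [('Alice', 10.5), ('Bob', 3.0)]
```

Each stage here is a separate batch step that must finish before the next starts, which is exactly the coupling that makes large ETL pipelines hard to align.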

So, is ETL dead? Not at all; it's just being renewed.

Many organizations understand that in today’s world they cannot…



With the massive adoption of Apache Kafka, enterprises are looking for a way to replicate data across different sites. Kafka has its own internal replication and self-healing mechanisms, but they are only relevant to the local cluster and cannot tolerate a whole-site failure. The solution is the MirrorMaker feature: with this capability, your local Kafka cluster can be replicated asynchronously to an external or central Kafka cluster in a completely different location, in order to persist your data pipelines, log collection, and metrics-gathering processes.

MirrorMaker connects two clusters, as…
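As an illustration, a one-way MirrorMaker 2 setup between two sites might be configured roughly like this; the cluster aliases and bootstrap addresses are placeholder assumptions:

```properties
# Sketch of a MirrorMaker 2 properties file (connect-mirror-maker.properties).
clusters = local, central
local.bootstrap.servers = kafka-local:9092
central.bootstrap.servers = kafka-central:9092

# Replicate all topics and consumer groups from the local site to central.
local->central.enabled = true
local->central.topics = .*
local->central.groups = .*

# No replication back from central to local.
central->local.enabled = false
```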



Ceph is a distributed, unified, software-defined storage solution; it can serve all your relevant storage protocols by exposing block, file, and object storage. Most Ceph installations today run the daemons as system services that can be started, stopped, reloaded, disabled, and so on. With the massive adoption of microservices and container engines, Ceph daemons can be installed as containers too, without introducing any bottlenecks. Running Ceph in containers dramatically simplifies management, since containers are stateless and can easily be respawned after a failure. …



As the world adopts a data-centric approach and people become more familiar with Kubernetes as an end-to-end platform for their application lifecycle, the need for persistence arises. By default, containers are stateless: they don't save any state and treat data as ephemeral. To solve this problem in Kubernetes, storage classes are used. With a storage class, we have a storage provider (whether block, file, or object storage) that Kubernetes can use to save the information produced by the containers in a volume. This volume is attached to the container at runtime…
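A minimal sketch of this wiring, assuming a Ceph RBD CSI driver as the storage provider (the provisioner name, class name, and sizes below are illustrative assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block          # hypothetical class name
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
allowVolumeExpansion: true
---
# A claim against that class; Kubernetes provisions a volume and
# attaches it to whichever pod mounts this PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-block
  resources:
    requests:
      storage: 10Gi
```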



Today, when running Kubernetes in production, we sometimes have a hard time collecting logs from our cluster components, and we often end up with custom-made solutions that don't provide the desired user experience. To solve this problem, we can use the Cluster Logging Operator (CLO), provided by Red Hat as an out-of-the-box solution for the OpenShift Container Platform. The CLO deploys a whole Elasticsearch -> Fluentd -> Kibana stack that automatically collects logs from our cluster components. …
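As a sketch, the CLO is driven by a ClusterLogging custom resource along these lines; the exact fields vary by OpenShift version, and the node count and storage size below are illustrative assumptions:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch       # the log store backing the stack
    elasticsearch:
      nodeCount: 3
  visualization:
    type: kibana              # the UI for querying collected logs
    kibana:
      replicas: 1
  collection:
    logs:
      type: fluentd           # the per-node log collector
      fluentd: {}
```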


There are a few challenges in moving to a microservices approach, mainly stemming from the fact that the application has a lot of moving parts. With a monolithic application, everything comes in one piece: easier to deploy, monitor, observe, and secure.

As microservices became very popular, container orchestration engines such as Kubernetes/OpenShift became very popular too. Each microservice is essentially one pod (generally speaking), and all the microservices live in the Kubernetes cluster. With all these small components, it becomes harder to monitor, observe, trace, and secure your applications. For example, you have…

Shon Paz

Solution Architect, Red Hat
