
Re: Prometheus & Observability documentation

RichiH Hartmann
 

Thanks for this feedback!

On Wed, Jul 22, 2020 at 2:12 AM Arthur Silva Sens
<arthursens2005@...> wrote:

One thing that came to my mind when you talked about Monitoring vs. Datalake: I guess Machine Learning use cases are starting to grow in the observability field, so metrics datalakes may not be garbage anymore. Of course, Monitoring and Machine Learning are two different things, but I think both have some space within the observability scope. WDYT?
I agree that ML needs more nuance. I now have better mental pictures
with which to convey my caveats, and those would hopefully enable more
deliberate tradeoff considerations.


One topic that is still unclear to me is: when do I need tracing? You say that tracing is expensive and should be treated with care, so how do I know if tracing is worth it for a particular case?
Fair point. I should have talked about trade-offs more.


Overall, I think it was a wonderful presentation and I would recommend it to anyone who is getting started with observability, like myself. And again, this is the opinion of someone without much experience in all of this; maybe someone more experienced could confirm whether what you are saying still holds today.
Thanks; and I think the ones with experience are good at nitpicking
but, by definition, can't _truly_ say whether it helps newcomers. The
newcomers need to do that.


Best,
Richard


Re: Prometheus & Observability documentation

Bartłomiej Płotka
 

Thanks, Arthur; those questions are an amazing indication for us of which sections of our potential SIG 101 doc should be extended (:

> One topic that is still unclear to me is: when do I need tracing? You say that tracing is expensive and should be treated with care, so how do I know if tracing is worth it for a particular case?

TL;DR the answer is simple: when other signals do not give you the answer you are looking for (:
For example: "Why was my request slow? Which part of the request execution (e.g. which service call) was the bottleneck?" I find Peter Bourgon's definition of tracing quite useful:

> (...) tracing then, is that it deals with information that is request-scoped. Any bit of data or metadata that can be bound to lifecycle of a single transactional object in the system. As examples: the duration of an outbound RPC to a remote service; the text of an actual SQL query sent to a database; or the correlation ID of an inbound HTTP request.
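To make "request-scoped" concrete, here is a minimal, dependency-free Python sketch. The `Span` class and its fields are invented purely for illustration; this is not any real tracing library's API:

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """Toy span: every field is bound to a single request's lifecycle."""
    trace_id: str                     # correlation ID of the inbound request
    name: str                         # e.g. the outbound RPC being timed
    start: float = field(default_factory=time.monotonic)
    metadata: dict = field(default_factory=dict)
    duration: Optional[float] = None  # filled in when the span finishes

    def finish(self) -> None:
        self.duration = time.monotonic() - self.start

# One span per outbound call, carrying exactly the request-scoped data
# Bourgon lists: a duration, a query text, a correlation ID.
span = Span(trace_id=str(uuid.uuid4()), name="db.query")
span.metadata["sql"] = "SELECT name FROM users WHERE id = ?"
span.finish()
```

A real tracer (OpenTelemetry, Jaeger, etc.) adds parent/child span relationships and sampling on top of exactly this kind of per-request record.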

Kind Regards,
Bartek Plotka

On Wed, 22 Jul 2020 at 01:12, Arthur Silva Sens <arthursens2005@...> wrote:
Hello Richard and everyone,

Maybe some of you recognize me from GitHub, some may not, but for context to everyone: I'm a recently graduated student without much experience across the whole observability scope, currently working on improving the observability of KubeVirt (a CNCF sandbox project).

I've just watched the Observability 101 talk and I must say that it gave me some really good directions on how to proceed with my project.
I can recognize some old mistakes I made in past monitoring projects, often cargo-culting and building metrics datalakes instead of good monitoring solutions. That probably wouldn't have happened if I had watched your presentation back then.

One thing that came to my mind when you talked about Monitoring vs. Datalake: I guess Machine Learning use cases are starting to grow in the observability field, so metrics datalakes may not be garbage anymore. Of course, Monitoring and Machine Learning are two different things, but I think both have some space within the observability scope. WDYT?

Another thing that was tremendously helpful for me in your presentation was the clear definition of what I'm supposed to accomplish with Metrics, Logs, and Traces. When implementing new metrics, I often try to solve problems that are better suited to logging/tracing. 

One topic that is still unclear to me is: when do I need tracing? You say that tracing is expensive and should be treated with care, so how do I know if tracing is worth it for a particular case?

Overall, I think it was a wonderful presentation and I would recommend it to anyone who is getting started with observability, like myself. And again, this is the opinion of someone without much experience in all of this; maybe someone more experienced could confirm whether what you are saying still holds today.

thanks,
Arthur

Em ter., 21 de jul. de 2020 às 14:04, Richard Hartmann <richih@...> escreveu:
Post it here, please. I do get FOSDEM feedback email too, but that's
invisible to this list.

On Tue, Jul 21, 2020 at 6:58 PM <a@...> wrote:
>
> Hi Richard,
>
> Regarding Observability 101 content: I noticed the recording page has a feedback link. Should I send feedback there, or post it here?
>
> Thank you,
> Amin
>




Re: Prometheus & Observability documentation

Arthur Silva Sens
 

Hello Richard and everyone,

Maybe some of you recognize me from GitHub, some may not, but for context to everyone: I'm a recently graduated student without much experience across the whole observability scope, currently working on improving the observability of KubeVirt (a CNCF sandbox project).

I've just watched the Observability 101 talk and I must say that it gave me some really good directions on how to proceed with my project.
I can recognize some old mistakes I made in past monitoring projects, often cargo-culting and building metrics datalakes instead of good monitoring solutions. That probably wouldn't have happened if I had watched your presentation back then.

One thing that came to my mind when you talked about Monitoring vs. Datalake: I guess Machine Learning use cases are starting to grow in the observability field, so metrics datalakes may not be garbage anymore. Of course, Monitoring and Machine Learning are two different things, but I think both have some space within the observability scope. WDYT?

Another thing that was tremendously helpful for me in your presentation was the clear definition of what I'm supposed to accomplish with Metrics, Logs, and Traces. When implementing new metrics, I often try to solve problems that are better suited to logging/tracing. 

One topic that is still unclear to me is: when do I need tracing? You say that tracing is expensive and should be treated with care, so how do I know if tracing is worth it for a particular case?

Overall, I think it was a wonderful presentation and I would recommend it to anyone who is getting started with observability, like myself. And again, this is the opinion of someone without much experience in all of this; maybe someone more experienced could confirm whether what you are saying still holds today.

thanks,
Arthur

Em ter., 21 de jul. de 2020 às 14:04, Richard Hartmann <richih@...> escreveu:

Post it here, please. I do get FOSDEM feedback email too, but that's
invisible to this list.

On Tue, Jul 21, 2020 at 6:58 PM <a@...> wrote:
>
> Hi Richard,
>
> Regarding Observability 101 content: I noticed the recording page has a feedback link. Should I send feedback there, or post it here?
>
> Thank you,
> Amin
>




Re: Prometheus & Observability documentation

Richard Hartmann
 

Post it here, please. I do get FOSDEM feedback email too, but that's
invisible to this list.

On Tue, Jul 21, 2020 at 6:58 PM <a@...> wrote:

Hi Richard,

Regarding Observability 101 content: I noticed the recording page has a feedback link. Should I send feedback there, or post it here?

Thank you,
Amin


Re: Prometheus & Observability documentation

Amin Amos
 

Hi Richard,

Regarding Observability 101 content: I noticed the recording page has a feedback link. Should I send feedback there, or post it here?

Thank you,
Amin


Skipping August meetings

Richard Hartmann
 

Dear all,

as we were only nine people during the call, as PTO time is coming,
and as we don't have any pressing action items, we decided to skip
both August meetings.


Best,
Richard


Prometheus & Observability documentation

Richard Hartmann
 

Dear all,

during our call I mentioned that Prometheus will be overhauling our
complete documentation, see
https://docs.google.com/document/d/1yuaPKLDvhJNXMF1ubsOOm5kE2_6dvCxHowBQIDs0KdU/edit
for details.

Also, we are working on making 101 content. If you have someone who
can try https://archive.fosdem.org/2019/schedule/event/on_observability_2019/
and give honest feedback, that would be great.


Best,
Richard


Data analysis use case collection & collation

Richard Hartmann
 

Dear all,

as per yesterday's call, we will start collecting use cases for data
analysis scenarios. Please use
https://docs.google.com/document/d/1yfbBG8MBllLRCXVNMp2fgRJm5TiCooO9NSD2CfOQ4ck/edit
as the shared resource for this.


Best,
Richard


Meeting times

Richard Hartmann
 

Dear all,

as agreed during the last meeting, we will start future calls s.t.
(sine tempore), i.e. on time. While I understand the struggle of
juggling endless meetings all too well, we end after 50 minutes to
allow people to jump to their next meeting. Rather than lose ~5% of
our time and encourage/perpetuate this going forward, we want to try
and lead by example here :)


Best,
Richard


Re: [Action Required] Thanos to Incubation Due Diligence Ask For Review

Bartłomiej Płotka
 

Hi.

A short reminder to check the due diligence document before tomorrow's meeting, as the plan is to review it together, similar to the Cortex document.

Doc: https://docs.google.com/document/d/1jJk5seSUcgwybT4nVGOzaRGrugc90uL3WCY0fUgQh1M/edit?usp=sharing 

Kind Regards,
Bartek


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Ricardo Aravena
 

> Thanks for those examples; a recommendation on using those would be amazing as well. What I will say might be controversial, but the goal of this initiative is NOT to steal projects from, or compete with, Apache. It's actually the opposite: integrate better with the most promising open-source systems that solve the community's use cases. I think if we can encourage some amazing project to join the CNCF, that's great, but IMO the CNCF is only here to help projects that need help. If Druid or others already have help from other organizations, that's fine; does it matter to us? (: In my opinion, what matters is that a promising project has the help, funding, and support it needs.

I like Druid; they are backed by https://imply.io too. I think you have multiple options for integrating Prometheus or other time-series databases with these systems. One aspect would be to support the Lambda (batch + streaming) architecture, the Kappa (all-in-one streaming) architecture, or both.

At the batch layer, and to create a warehouse, one way would be to support exporting from the TSDB to standard data formats, e.g. Parquet, Avro, Arrow (in-memory columnar), Protobuf, etc. For streaming, you can support the TSDB publishing to Kafka. Then you could orchestrate everything with Spark or Flink (batch and streaming), on top of K8s of course :-). Because there are many projects, and no one size fits all (data sources come in all different forms), I think it would be great to come up with a reference architecture that works for Prometheus (assuming all the integration points have been created).
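As a tiny illustration of the batch-export idea, the sketch below flattens Prometheus-style series into columnar arrays, which is the shape Parquet/Arrow writers consume. The series layout here is invented for illustration and is not any real TSDB export format:

```python
# Hypothetical series shape; a real export would come from the TSDB's own API.
series = [
    {"labels": {"__name__": "http_requests_total", "job": "api"},
     "samples": [(1595376000, 10.0), (1595376015, 12.0)]},
    {"labels": {"__name__": "http_requests_total", "job": "web"},
     "samples": [(1595376000, 7.0)]},
]

# Column-oriented layout: one array per field, rows aligned by index.
columns = {"metric": [], "job": [], "timestamp": [], "value": []}
for s in series:
    for ts, val in s["samples"]:
        columns["metric"].append(s["labels"]["__name__"])
        columns["job"].append(s["labels"]["job"])
        columns["timestamp"].append(ts)
        columns["value"].append(val)

# `columns` could now be handed to a columnar writer
# (e.g. pyarrow.table(columns)) and persisted as Parquet.
```

The same dict-of-arrays shape also maps directly onto Kafka messages for the streaming path, one row per record.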

It would also be super interesting to understand how some of the observability vendors have implemented their systems, if they'd like to share. Although it might be too much to ask, since that typically constitutes a lot of their bread and butter :-)

On Sat, May 30, 2020 at 5:44 AM Bartłomiej Płotka <bwplotka@...> wrote:
Ricardo, Rob thanks for the answers so far! (: 

Ricardo:

> This is a great idea. Keep in mind that OLAPs are not necessarily used for monitoring and observability. 

Yup, we already have solid projects in the observability space. However, business-oriented analytics results are one of the best use cases/outcomes of the long-term observability data we collect for monitoring needs, right? (: It's quite an amazing side effect and benefit of collecting such data. The idea is to connect both worlds through integrations and open APIs.

> In this space, I can think of these projects: Druid, Pinot, Kylin, Clickhouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others that we could approach to join the CNCF.

Thanks for those examples; a recommendation on using those would be amazing as well. What I will say might be controversial, but the goal of this initiative is NOT to steal projects from, or compete with, Apache. It's actually the opposite: integrate better with the most promising open-source systems that solve the community's use cases. I think if we can encourage some amazing project to join the CNCF, that's great, but IMO the CNCF is only here to help projects that need help. If Druid or others already have help from other organizations, that's fine; does it matter to us? (: In my opinion, what matters is that a promising project has the help, funding, and support it needs.

> For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network).

I agree this is all very connected. However, my honest opinion is that analytics, even in an OLAP-based fashion, overlaps a little bit with SIG-Observability, and this is why I am interested in finding some solutions for our communities.

Rob:
> experience with it at Uber (running OLAP queries against monitoring and observability data).

Yes! This is what we are looking for - production-grade experience and recommendations for this.

> 1. With respect to SIG Observability, I think talking and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space and would probably be better served by a dedicated data engineering SIG.

I tend to agree; however, given that no one has started a SIG-BigData yet, and given that observability data is quite an enormous source of meaningful information, I would love to explore at least the API and integration possibilities here. Maybe I'm wrong (:

> 2. At Uber we ETL'd subsets of data users wanted to do large processing on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching the query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there).

Awesome, good examples; it's worth revisiting those amazing projects and the integrations there (Spark, Presto, Hadoop).

> I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.

Yes! Does anyone have more info about that Spark integration? I remember some teams at Red Hat are already using Presto on Thanos data; I might try to find more information on that as well. (:

> The major issue with other systems in this space tends to be owning the whole data pipeline that results; e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, Clickhouse, etc. You then also have to store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two.
That is why I think projects that support interactive and batch ETL operating on the dataset in the Prometheus metrics store itself, and then saving results elsewhere, are quite interesting, rather than ones that warehouse the whole dataset themselves.


Yes! This is actually the amazing novelty we would love to push toward as well. Instead of storing the same data in five places, can we keep it in just one? The idea would be to promote an efficient streaming read API rather than copying the data into different formats. I mentioned this in one of the requirements here. This might mean more work on those Thanos/Cortex/M3/ecosystem projects, but given we are already collaborating, it might be easier (: This is along the lines of what we try to push in the metrics/logs/tracing world, as mentioned by my team colleague Frederic: can we reuse a similar index for those three, since we collect different data but from the same resources?

Kind Regards,
Bartek

On Fri, 29 May 2020 at 18:16, Rob Skillington <rob@...> wrote:
Hey Bartek,

Glad to hear this topic brought up; it's something we think a lot about and have some experience with at Uber (running OLAP queries against monitoring and observability data).

1. With respect to SIG Observability, I think talking and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space and would probably be better served by a dedicated data engineering SIG.

2. At Uber we ETL'd subsets of data users wanted to do large processing on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching the query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there).

I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.
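For anyone prototyping such an extraction, a simpler stand-in for the snappy/protobuf remote-read protocol is Prometheus' plain HTTP range-query API. The sketch below only builds the request URL; the base URL and query are made up for illustration:

```python
from urllib.parse import urlencode

def query_range_url(base: str, promql: str, start: int, end: int, step: str) -> str:
    """Build a URL for Prometheus' /api/v1/query_range HTTP endpoint."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"

# A batch job could page over (start, end) windows and feed the JSON
# responses into Spark/Presto, instead of warehousing everything up front.
url = query_range_url("http://prometheus.example:9090",
                      "rate(http_requests_total[5m])",
                      1595376000, 1595379600, "15s")
```

A proper Remote Read integration would instead speak the streaming protobuf protocol, avoiding the JSON encoding overhead for large windows.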

The major issue with other systems in this space tends to be owning the whole data pipeline that results; e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, Clickhouse, etc. You then also have to store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two.

That is why I think projects that support interactive and batch ETL operating on the dataset in the Prometheus metrics store itself, and then saving results elsewhere, are quite interesting, rather than ones that warehouse the whole dataset themselves.

Best,
Rob


On Fri, May 29, 2020 at 12:50 PM Ricardo Aravena <raravena80@...> wrote:
Bartek,

This is a great idea. Keep in mind that OLAPs are not necessarily used for monitoring and observability. For example, in the past, I worked on implementing Apache Druid to collect mobile analytics. In this space, I can think of these projects: Druid, Pinot, Kylin, Clickhouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others that we could approach to join the CNCF.

Having said that because OLAP systems can be quite complex, there are multiple components that may fall into the scope of other CNCF SIGs. For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network).

In any case, it would be great to approach the different projects so that the CNCF community is aware of how OLAPs work and foster general interest. 

Ricardo



On Fri, May 29, 2020 at 9:01 AM Bartłomiej Płotka <bwplotka@...> wrote:
Hi SIG Observability! 👋

I recently noticed that many of the CNCF's Prometheus and Thanos users want to use the metric data collected by Prometheus for more advanced analytics: something more suitable for Business Intelligence / OLAP use cases.

As the Prometheus maintainers, we designed the Prometheus Query API and PromQL for real-time monitoring, or at most for simple analytics. It's far from efficient for data mining or data exploration.
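For instance, PromQL shines at real-time questions like the one below; sweeping months of raw samples into a join against business data is outside its design (the metric name is illustrative):

```promql
# Per-job request rate over the last 5 minutes: ideal PromQL territory.
sum by (job) (rate(http_requests_total[5m]))
```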

I feel there are two things we are missing in the CNCF space: 

1. Please tell me if I am wrong here, but I don't see any particular BI/OLAP open-source project in the CNCF space. If not, I think that as CNCF SIG Observability we have some room to encourage such a project to either join or at least integrate more closely with the community. Do you think we, as CNCF SIG Observability, should be doing this? 🤔

2. Metric data, especially if you have years of it thanks to Thanos or Cortex, is an amazing source of information. In the Thanos community, we are actively looking for a project that will fit most of the requirements stated here. Are you currently a user of some open-source OLAP system worth recommending? If yes, which one? Would you like to have good integration of such a system with metrics?

We are looking for your feedback, preferably on this GitHub issue: https://github.com/thanos-io/thanos/issues/2682. I also plan to put this topic on the next SIG agenda if we have time for it. 🤗

Kind Regards and have a good weekend!
Bartek


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Bartłomiej Płotka
 

BTW, it looks like Dan also suggests Analytics is currently in scope for our SIG: https://github.com/cncf/landscape/issues/1632#issuecomment-638763810 (:

Kind Regards,
Bartek

On Thu, 4 Jun 2020 at 15:40, RichiH Hartmann via lists.cncf.io <richih=grafana.com@...> wrote:
On Wed, Jun 3, 2020 at 3:38 PM Rob Skillington <rob@...> wrote:
 
Just my two cents though, maybe at first it could be folded into SIG-Observability and later broken out perhaps.

The IETF, RIPE, and Prometheus model is to make something that works first, and then start branching out from there. So yes, I would strongly argue it should live somewhere related first, and then take on a life of its own.

NB: This is not SIG IETF, SIG RIPE, or SIG Prometheus. Just going on what I have seen work in the past.


Best
Richard


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

RichiH Hartmann
 

On Wed, Jun 3, 2020 at 3:38 PM Rob Skillington <rob@...> wrote:
 
Just my two cents though, maybe at first it could be folded into SIG-Observability and later broken out perhaps.

The IETF, RIPE, and Prometheus model is to make something that works first, and then start branching out from there. So yes, I would strongly argue it should live somewhere related first, and then take on a life of its own.

NB: This is not SIG IETF, SIG RIPE, or SIG Prometheus. Just going on what I have seen work in the past.


Best
Richard


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Rob Skillington
 

Hey Richi,

With respect to using R with Prometheus data, both Spark and Presto let you write R programs against them; Spark is much more mature in this space, however. SparkR can run R functions over large datasets and, via MLlib, has several machine-learning algorithms already packaged for use with its native types, along with model persistence:
https://spark.apache.org/docs/latest/sparkr.html

I agree that integration of OLAP for observability and monitoring data can and should be led by SIG-Observability. And agreed on deeper analysis of data falling within Observability.

I thought there was a question of whether SIG-Observability should drive general-purpose OLAP/BI within the CNCF, which I thought might not be best, since data engineering is a huge space and there are many non-observability use cases of OLAP/BI better served by a SIG focused solely on driving work in that area.

Just my two cents though, maybe at first it could be folded into SIG-Observability and later broken out perhaps.

Rob


On Tue, Jun 2, 2020 at 8:32 AM Richard Hartmann <richih@...> wrote:
Hijacking top email to reply across the board.

As many of you will know, I have been nagging Prometheus-team about this for years, so yes, I think we should cover this.

At PromCon 2017's dev summit hallway track, we talked about connectors to existing data-analysis tooling, e.g. an R interface to natively access data stored in the Prometheus format. Thanos' block storage would solve a lot of pain points, and Prometheus' remote read/write API is another obvious immediate attach point. Also, at around the same time, I started a discussion about extending PromQL in this direction; that discussion never went anywhere, but I can see it being revived.

I disagree that the topic should be death-by-committee'd day 1 by splitting it across several SIGs. Concerted effort and input from subject-matter experts is good, though. But get something off the ground first before making it more cumbersome.

Overall, I think it's something which we should at least take a look at in the context of this SIG. Deeper analysis of data definitely falls under o11y.


Best,
Richard


On Fri, May 29, 2020 at 6:00 PM Bartłomiej Płotka <bwplotka@...> wrote:
Hi SIG Observability! 👋

I recently noticed that many of the CNCF's Prometheus and Thanos users want to use the metric data collected by Prometheus for more advanced analytics: something more suitable for Business Intelligence / OLAP use cases.

As the Prometheus maintainers, we designed the Prometheus Query API and PromQL for real-time monitoring, or at most for simple analytics. It's far from efficient for data mining or data exploration.

I feel there are two things we are missing in the CNCF space: 

1. Please tell me if I am wrong here, but I don't see any particular BI/OLAP open-source project in the CNCF space. If not, I think that as CNCF SIG Observability we have some room to encourage such a project to either join or at least integrate more closely with the community. Do you think we, as CNCF SIG Observability, should be doing this? 🤔

2. Metric data, especially if you have years of it thanks to Thanos or Cortex, is an amazing source of information. In the Thanos community, we are actively looking for a project that will fit most of the requirements stated here. Are you currently a user of some open-source OLAP system worth recommending? If yes, which one? Would you like to have good integration of such a system with metrics?

We are looking for your feedback, preferably on this GitHub issue: https://github.com/thanos-io/thanos/issues/2682. I also plan to put this topic on the next SIG agenda if we have time for it. 🤗

Kind Regards and have a good weekend!
Bartek


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Richard Hartmann
 

Hijacking top email to reply across the board.

As many of you will know, I have been nagging Prometheus-team about this for years, so yes, I think we should cover this.

At PromCon 2017's dev summit hallway track, we talked about connectors to existing data-analysis tooling, e.g. an R interface to natively access data stored in the Prometheus format. Thanos' block storage would solve a lot of pain points, and Prometheus' remote read/write API is another obvious immediate attach point. Also, at around the same time, I started a discussion about extending PromQL in this direction; that discussion never went anywhere, but I can see it being revived.

I disagree that the topic should be death-by-committee'd day 1 by splitting it across several SIGs. Concerted effort and input from subject-matter experts is good, though. But get something off the ground first before making it more cumbersome.

Overall, I think it's something which we should at least take a look at in the context of this SIG. Deeper analysis of data definitely falls under o11y.


Best,
Richard


On Fri, May 29, 2020 at 6:00 PM Bartłomiej Płotka <bwplotka@...> wrote:
Hi SIG Observability! 👋

I recently noticed that many of the CNCF's Prometheus and Thanos users want to use the metric data collected by Prometheus for more advanced analytics: something more suitable for Business Intelligence / OLAP use cases.

As the Prometheus maintainers, we designed the Prometheus Query API and PromQL for real-time monitoring, or at most for simple analytics. It's far from efficient for data mining or data exploration.

I feel there are two things we are missing in the CNCF space: 

1. Please tell me if I am wrong here, but I don't see any particular BI/OLAP open-source project in the CNCF space. If not, I think that as CNCF SIG Observability we have some room to encourage such a project to either join or at least integrate more closely with the community. Do you think we, as CNCF SIG Observability, should be doing this? 🤔

2. Metric data, especially if you have years of it thanks to Thanos or Cortex, is an amazing source of information. In the Thanos community, we are actively looking for a project that will fit most of the requirements stated here. Are you currently a user of some open-source OLAP system worth recommending? If yes, which one? Would you like to have good integration of such a system with metrics?

We are looking for your feedback, preferably on this GitHub issue: https://github.com/thanos-io/thanos/issues/2682. I also plan to put this topic on the next SIG agenda if we have time for it. 🤗

Kind Regards and have a good weekend!
Bartek


[Action Required] Thanos to Incubation Due Diligence Ask For Review

p.versockas@...
 

Hello,


I’m Povilas Versockas, one of the Thanos maintainers & Prometheus community organizers.


Similar to the Cortex document, and as per the next SIG Observability task, I have prepared a Thanos due diligence document; I kindly invite all SIG Observability members to read and review it.


Feel free to add comments and questions in the doc, so we can discuss them all during the next SIG Observability meeting on 9 June 2020, as per our agenda.


--

Best Regards,

Povilas Versockas


Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Bartłomiej Płotka
 

Ricardo, Rob thanks for the answers so far! (: 

Ricardo:

> This is a great idea. Keep in mind that OLAPs are not necessarily used for monitoring and observability. 

Yup, we already have solid projects in the observability space. However, business-oriented analytics results are one of the best use cases/outcomes of the long-term observability data we collect for monitoring needs, right? (: It's quite an amazing side effect and benefit of collecting such data. The idea is to connect both worlds through integrations and open APIs.

> In this space, I can think of these projects: Druid, Pinot, Kylin, Clickhouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others that we could approach to join the CNCF.

Thanks for those examples; a recommendation on using those would be amazing as well. What I will say might be controversial, but the goal of this initiative is NOT to steal projects from, or compete with, Apache. It's actually the opposite: integrate better with the most promising open-source systems that solve the community's use cases. I think if we can encourage some amazing project to join the CNCF, that's great, but IMO the CNCF is only here to help projects that need help. If Druid or others already have help from other organizations, that's fine; does it matter to us? (: In my opinion, what matters is that a promising project has the help, funding, and support it needs.

> For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network).

I agree this is all very connected. However, my honest opinion is that analytics, even in an OLAP-based fashion, overlaps a little bit with SIG-Observability, and this is why I am interested in finding some solutions for our communities.

Rob:
> experience with it at Uber (running OLAP queries against monitoring and observability data).

Yes! This is what we are looking for - production-grade experience and recommendations for this.

> 1. With respect to SIG Observability, I think talking and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space and would probably be better served by a dedicated data engineering SIG.

I tend to agree; however, given that no one has started a SIG-BigData yet, and given that observability data is quite an enormous source of meaningful information, I would love to explore at least the API and integration possibilities here. Maybe I'm wrong (: 

> 2. At Uber we ETL'd subsets of data users wanted to do large processing on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching the query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there).

Awesome, good examples; worth revisiting those projects and the integrations there (Spark, Presto, Hadoop).

> I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.

Yes! Does anyone have more info about that Spark integration? I remember some teams are already using Presto on Thanos data at Red Hat; I might try to find more information on that as well. (: 
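For anyone who wants to experiment before a first-class integration exists, the glue is mostly flattening. A minimal sketch in plain Python (the function name is mine, and the HTTP call is omitted; `sample_response` just mimics the documented `/api/v1/query_range` payload shape) of turning a range-query result into tabular rows that Spark or Presto could ingest:

```python
# Hypothetical sketch: flatten a Prometheus/Thanos query_range JSON response
# into (metric_name, labels, timestamp, value) rows suitable for loading
# into a Spark DataFrame, a Parquet file, etc.

def flatten_query_range(response):
    """Turn a matrix result into flat tabular rows."""
    rows = []
    for series in response["data"]["result"]:
        labels = dict(series["metric"])
        name = labels.pop("__name__", "")
        for ts, value in series["values"]:
            rows.append((name, labels, float(ts), float(value)))
    return rows

# Hardcoded stand-in for a real /api/v1/query_range response body.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "http_requests_total", "job": "api", "code": "200"},
                "values": [[1590000000, "1027"], [1590000015, "1043"]],
            }
        ],
    },
}

rows = flatten_query_range(sample_response)
# Each row could become one record in the analytics store.
```

This is only a toy; a real integration would paginate, stream, and preserve types, but the row shape is the part the OLAP side cares about.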

> The major issue with other systems in this space tends to be owning the whole data pipeline that results, e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, Clickhouse, etc. You also then have to now store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two. 
> That is why I think seeing projects that support interactive and ETL that operate on the dataset from the Prometheus metrics store itself and then save elsewhere being quite interesting, rather than warehouse the whole dataset themselves.


Yes! This is actually the novelty we would love to push toward as well. Instead of storing the same data in five places, can we keep it in just one? The idea would be to promote an efficient streaming read API rather than copying the data into different formats. I mentioned this in one of the requirements here. This might mean more work on those Thanos/Cortex/M3/ecosystem projects, but given we are already collaborating, it might be easier (: This is along the lines of what we are trying to push in the metrics/logs/tracing world, as mentioned by my team colleague Frederic: can we reuse a similar index for those three, since we collect different data... but from the same resources?
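To illustrate the "read in place instead of copying" idea, here is a small sketch (illustrative names only, not an actual Thanos/Cortex API) of streaming samples lazily, so a downstream ETL can fold over them without materializing the whole dataset in a second system:

```python
# Hedged sketch: a generator that streams (labels, timestamp, value) tuples
# out of a query_range-style payload one sample at a time. A consumer can
# aggregate over the stream without holding all samples in memory.

def stream_samples(response):
    """Lazily yield (labels, timestamp, value) from a matrix result."""
    for series in response["data"]["result"]:
        labels = dict(series["metric"])
        for ts, value in series["values"]:
            yield labels, float(ts), float(value)

# Hardcoded stand-in for a streamed read response.
payload = {
    "data": {
        "result": [
            {"metric": {"__name__": "up", "job": "api"},
             "values": [[1590000000, "1"], [1590000015, "0"]]},
        ]
    }
}

# The consumer folds over the stream; nothing is copied into a warehouse.
total = sum(value for _, _, value in stream_samples(payload))
```

The real streaming read APIs work on compressed chunks rather than JSON, but the consumption pattern (iterate, aggregate, discard) is the point.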

Kind Regards,
Bartek



Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Rob Skillington
 

Hey Bartek,

Glad to hear this topic brought up, it's something we think a lot about and have some experience with it at Uber (running OLAP queries against monitoring and observability data).

1. With respect to SIG Observability, I think talking and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space and would probably be better served by a dedicated data engineering SIG.

2. At Uber we ETL'd subsets of data users wanted to do large processing on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching the query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there).

I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.
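To make the idea concrete, here is a hedged, plain-Python stand-in for the kind of batch rollup such a Remote Read -> Spark/Presto job might run: group raw samples by a label and hour, then average. The function and data shapes are illustrative, not any real API:

```python
# Illustrative sketch of an OLAP-style rollup over raw metric samples:
# group by one label plus hour bucket, then compute the mean per group.
from collections import defaultdict

def hourly_avg_by_label(samples, label):
    """samples: iterable of (labels_dict, unix_ts, value).
    Returns {(label_value, hour_start_ts): average_value}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for labels, ts, value in samples:
        # Truncate the timestamp down to the start of its hour bucket.
        key = (labels.get(label, ""), int(ts) // 3600 * 3600)
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

samples = [
    ({"job": "api"}, 1590000000, 10.0),
    ({"job": "api"}, 1590001000, 20.0),
    ({"job": "db"}, 1590000500, 5.0),
]
rollup = hourly_avg_by_label(samples, "job")
```

In a real engine this is a one-line group-by; the sketch just shows the query shape that is awkward in PromQL but trivial over warehoused rows.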

The major issue with other systems in this space tends to be owning the whole data pipeline that results, e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, Clickhouse, etc. You also then have to now store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two. 

That is why I think seeing projects that support interactive and ETL that operate on the dataset from the Prometheus metrics store itself and then save elsewhere being quite interesting, rather than warehouse the whole dataset themselves.

Best,
Rob




Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Ricardo Aravena
 

Bartek,

This is a great idea. Keep in mind that OLAPs are not necessarily used for monitoring and observability. For example, in the past, I worked on implementing Apache Druid to collect mobile analytics. In this space, I can think of these projects: Druid, Pinot, Kylin, Clickhouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others that we could approach to join the CNCF.

Having said that, because OLAP systems can be quite complex, there are multiple components that may fall under the scope of other CNCF SIGs. For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network).

In any case, it would be great to approach the different projects so that the CNCF community is aware of how OLAPs work and foster general interest. 

Ricardo





Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.

Bartłomiej Płotka
 

Hi SIG Observability! 👋

I recently noticed that many of the CNCF's Prometheus and Thanos users desire to use the metric data collected by Prometheus for more advanced analytics, something more suitable for Business Intelligence / OLAP use cases. 

As the Prometheus maintainers, we designed the Prometheus Query API and PromQL for real-time monitoring, or at most for simple analytics. They are far from efficient for data mining or data exploration.

I feel there are two things we are missing in the CNCF space: 

1. Please tell me if I am wrong here, but I don't see any particular BI/OLAP open-source project in the CNCF space. If so, I think as CNCF SIG Observability there is some possibility for us to encourage such a project to either join or at least integrate more closely with the community. Do you think we, as CNCF SIG Observability, should be doing this? 🤔

2. Metric data, especially if you have years of it thanks to Thanos or Cortex, is an amazing source of information. In the Thanos community, we are actively looking for a project that will fit most of the requirements stated here. Are you currently a user of some open-source OLAP system worth recommending? If yes, which one? Would you like to have good integration of such a system with metrics? 

We are looking for your feedback, preferably on this GitHub issue: https://github.com/thanos-io/thanos/issues/2682. I also plan to put this topic on the next SIG agenda if we have time for it. 🤗 

Kind Regards and have a good weekend!
Bartek
