SIG Observability Sync: New hour for our meeting.
Hi! Not sure if everyone is on the CNCF #sig-observability Slack, but Matt announced a slightly changed meeting time. See the note below:
|
||||||||||||
|
||||||||||||
Getting started
Mahmoud Saada <mahmoud@...>
Hey all,
I was wondering if there were some guidelines on how to join, participate, and contribute to this SIG. Any documentation? Recurring meetings? Thank you
|
||||||||||||
|
||||||||||||
Re: Getting started
Amye Scavarda Perrin
The SIG charter is currently in review with the TOC. This space will have recurring meetings once that review completes and the charter passes a TOC vote to set up this SIG officially. So, soon!
On Mon, Mar 16, 2020 at 9:40 AM Mahmoud Saada <mahmoud@...> wrote:
Amye Scavarda Perrin | Program Manager | amye@...
|
||||||||||||
|
||||||||||||
SIG Observability has a TOC Liaison!
Hello! We're happy to announce that Brendan Burns has agreed to be the TOC Liaison for SIG Observability. That means... The SIG Observability Charter is now ready for a vote! Matt
|
||||||||||||
|
||||||||||||
Re: SIG Observability has a TOC Liaison!
Awesome! What would be the next steps for us? (: Does the vote mean a vote at the next CNCF TOC meeting? Kind Regards, Bartek
On Wed, 1 Apr 2020 at 05:56, Matt Young <myoung@...> wrote:
|
||||||||||||
|
||||||||||||
Re: SIG Observability has a TOC Liaison!
Richard Hartmann
I think "next" is up to TOC's discretion, but "soon" sounds realistic.
On Wed, Apr 1, 2020 at 3:51 PM Bartłomiej Płotka <bwplotka@...> wrote:
|
||||||||||||
|
||||||||||||
Meeting date
Richard Hartmann
Dear all,
before we start finding a meeting date, we would like to get a feeling for the room on overall meeting times. There are three different options in this privacy-preserving alternative to doodle: https://dudle.inf.tu-dresden.de/jrGTINFDmA/ Based on my experience, we will not need option 3 as the absolute majority of contributions tends to come from EMEA & US, but maybe that's a function of usual meeting times. Best, Richard
|
||||||||||||
|
||||||||||||
Re: Meeting date
Done. (: Thanks for setting this up Richi! Kind Regards, Bartek
On Fri, 10 Apr 2020 at 09:58, Richard Hartmann <richih@...> wrote:
|
||||||||||||
|
||||||||||||
[SIG o11y] Tech lead nomination: Bartek (Bartłomiej Płotka)
Richard Hartmann
Dear all,
as announced during yesterday's TOC call, Matt Young and I are suggesting Bartek of Red Hat & Prometheus and Thanos as Tech Lead for our SIG. You can find Bartek's platform here: https://docs.google.com/document/d/194INvrWMRZT9p0VxhlkRa9yXK8a4npdBlDvOetWvPb0/edit Discussion and voting should happen on the TOC list, SIG o11y is BCC'ed in for information only. Best, Richard
|
||||||||||||
|
||||||||||||
[SIG o11y] Call cadence
Richard Hartmann
Dear all,
as announced during yesterday's TOC call, we will have our call fortnightly on every second Tuesday, 16:00 UTC, starting next week. Amye will also add it to our CNCF calendar. Best, Richard
|
||||||||||||
|
||||||||||||
[SIG o11y] Third chair nomination : Steve Flanders
Richard Hartmann
Dear all,
as announced during yesterday's TOC call, Matt Young and I are suggesting Steve Flanders of Splunk & OpenTelemetry as the third chair for our SIG. I will leave the honours of introduction to Steve :) Discussion and voting should happen on the TOC list, SIG o11y is BCC'ed in for information only. Best, Richard
|
||||||||||||
|
||||||||||||
Invitation: CNCF SIG-Observability Meeting @ Every 2 weeks from 9am to 9:50am on Tuesday (PDT) (cncf-sig-observability@lists.cncf.io)
Amye Scavarda Perrin
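The call cadence announced above (fortnightly on Tuesdays at 16:00 UTC) and this invitation (9am PDT) describe the same slot. As a rough sanity check, here is a minimal sketch of that arithmetic. It assumes the series starts with the first SIG meeting on April 28th mentioned elsewhere in this archive, takes the year as 2020 from the surrounding message dates, and uses a fixed UTC-7 offset for PDT; `meeting_times` is a hypothetical helper name, not something from the calendar invite.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: a fixed UTC-7 offset stands in for PDT; it does not model
# the autumn switch back to PST.
PDT = timezone(timedelta(hours=-7), "PDT")

# Assumption: the recurrence is anchored on the first SIG meeting,
# Tuesday 28 April 2020, at 16:00 UTC.
FIRST_MEETING = datetime(2020, 4, 28, 16, 0, tzinfo=UTC)


def meeting_times(count, first=FIRST_MEETING, every=timedelta(weeks=2)):
    """Return the first `count` fortnightly meeting start times in UTC."""
    return [first + i * every for i in range(count)]


if __name__ == "__main__":
    for start in meeting_times(4):
        local = start.astimezone(PDT)
        print(start.strftime("%Y-%m-%d %H:%M UTC"), "=", local.strftime("%H:%M %Z"))
```

Under these assumptions the fourth occurrence falls on 9 June 2020, which matches the meeting date given in the Thanos due diligence message.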
|
||||||||||||
|
||||||||||||
Cortex Incubation Recommendation Document Review
Hi SIG Observability! As per our discussion in our first SIG meeting on April 28th, the Cortex team has prepared a document based on the Due Diligence Template. It's mostly finalized (reviewed only by myself so far), so tomorrow Goutham will walk us through it. As a SIG we will try to review it together and comment. The desired outcome is consensus as a SIG on whether or not we recommend that the Cortex project be promoted to the Incubation stage. It has been added to the agenda in tomorrow's Meeting Notes. Please familiarize yourself with the document a little, so we can review it together within a reasonable time tomorrow! See you! 🤗 Kind Regards, Bartek Płotka
|
||||||||||||
|
||||||||||||
Request for feedback: Cortex end user survey
Goutham Veeramachaneni
Hi folks, Thanks for taking a look at the Cortex Due Diligence for incubation. One of the requirements there is end-user feedback. I've created a short survey here: https://forms.gle/w1BJD9B62D8uM8Vz8 This is the first time we're doing this, and we'd love to get feedback from the SIG and other project maintainers before I send it out to our users and community. If there are existing surveys from other projects, I'd love to take a look at them as well. Thanks, Goutham.
|
||||||||||||
|
||||||||||||
Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.
Hi SIG Observability! 👋 I recently noticed that many of CNCF's Prometheus and Thanos users want to use the metric data collected by Prometheus for more advanced analytics: something more suitable for Business Intelligence / OLAP use cases. As the Prometheus maintainers, we designed the Prometheus Query API and PromQL for realtime monitoring, or at most for simple analytics. They are far from efficient for data mining or data exploration. I feel there are two things we are missing in the CNCF space:
1. Please tell me if I am wrong here, but I don't see any particular BI/OLAP open source project in the CNCF space. If there is none, I think that as CNCF SIG Observability we have an opportunity to encourage such a project to either join or at least integrate more closely with the community. Do you think that we, as CNCF SIG Observability, should be doing this? 🤔
2. Metric data, especially if you have years of it thanks to Thanos or Cortex, is an amazing source of information. In the Thanos community, we are actively looking for a project that will fit most of the requirements stated here. Are you currently a user of some open source OLAP system worth recommending? If yes, which one? Would you like to have good integration of such a system with metrics?
We are looking for your feedback, preferably on this GitHub issue: https://github.com/thanos-io/thanos/issues/2682. I also plan to put this topic on the next SIG agenda if we have time for it. 🤗 Kind Regards and have a good weekend! Bartek
|
||||||||||||
|
||||||||||||
Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.
Ricardo Aravena
Bartek, This is a great idea. Keep in mind that OLAP systems are not necessarily used for monitoring and observability. For example, in the past, I worked on implementing Apache Druid to collect mobile analytics. In this space, I can think of these projects: Druid, Pinot, Kylin, ClickHouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others as projects we could approach to join the CNCF. Having said that, because OLAP systems can be quite complex, there are multiple components that may fall into the scope of other CNCF SIGs. For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network). In any case, it would be great to approach the different projects so that the CNCF community is aware of how OLAP systems work, and to foster general interest. Ricardo
On Fri, May 29, 2020 at 9:01 AM Bartłomiej Płotka <bwplotka@...> wrote:
|
||||||||||||
|
||||||||||||
Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.
Hey Bartek, Glad to hear this topic brought up. It's something we think a lot about and have some experience with at Uber (running OLAP queries against monitoring and observability data).
1. With respect to SIG Observability, I think talking about and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space, and that it would probably be better served by a dedicated data engineering SIG.
2. At Uber we ETL'd the subsets of data that users wanted to run large processing jobs on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching the query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there). I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.
The major issue with other systems in this space tends to be owning the whole data pipeline that results, e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, ClickHouse, etc. You then also have to store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two. That is why I find projects that support interactive queries and ETL operating on the dataset from the Prometheus metrics store itself, and then saving the results elsewhere, quite interesting, rather than ones that warehouse the whole dataset themselves. Best, Rob
On Fri, May 29, 2020 at 12:50 PM Ricardo Aravena <raravena80@...> wrote:
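To make the idea of pulling raw data at query time from the metrics store, rather than warehousing everything, a little more concrete, here is a minimal sketch. Note the swap: it uses the Prometheus HTTP query API (`/api/v1/query_range`), not the Remote Read path discussed above, and as noted earlier in the thread that API is not efficient for large extractions, so this only illustrates the shape of an on-demand extract. The function names, the sample payload, and the flat row layout are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_range_url(base_url, promql, start, end, step):
    """Build a URL for the Prometheus HTTP API range-query endpoint."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def flatten_matrix(response):
    """Flatten a 'matrix' range-query response into one row per sample.

    Each row holds the series labels plus 'timestamp' and 'value' keys.
    Assumption: no series carries labels named 'timestamp' or 'value'.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError(f"expected a matrix result, got {data['resultType']}")
    rows = []
    for series in data["result"]:
        labels = dict(series["metric"])
        for ts, value in series["values"]:
            # Sample values arrive as strings in the JSON response.
            rows.append({**labels, "timestamp": float(ts), "value": float(value)})
    return rows


def fetch_rows(base_url, promql, start, end, step):
    """Run a range query against a live server and return flat rows."""
    url = build_query_range_url(base_url, promql, start, end, step)
    with urlopen(url) as resp:
        return flatten_matrix(json.load(resp))


# Hypothetical payload in the documented response shape, for offline use.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "up", "job": "node"},
             "values": [[1590768000, "1"], [1590768060, "1"]]},
            {"metric": {"__name__": "up", "job": "api"},
             "values": [[1590768000, "0"]]},
        ],
    },
}

if __name__ == "__main__":
    for row in flatten_matrix(SAMPLE_RESPONSE):
        print(row)
```

Rows in this shape could then be handed to whatever batch or interactive engine is in use; only the subset matching the query leaves the metrics store.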
|
||||||||||||
|
||||||||||||
Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.
Ricardo, Rob, thanks for the answers so far! (:
Ricardo: Yup, we already have solid projects in the observability space. However, business-oriented analytics results are one of the best use cases/outcomes of the long-term observability data we collect for monitoring needs, right? (: It's quite an amazing side effect and benefit of collecting such data. The idea is to connect both worlds through integrations and open APIs.
> In this space, I can think of these projects: Druid, Pinot, Kylin, ClickHouse, Mondrian, Cubes (there might be others). Druid, Pinot, and Kylin are already part of the Apache Foundation, so that leaves the others as projects we could approach to join the CNCF.
Thanks for those examples; recommendations on using them would be amazing as well. What I will say might be controversial, but the goal of this initiative is NOT to steal projects or compete with Apache. It's actually the opposite: integrate better with the most promising open-source systems that solve the community's use cases. I think if we can encourage some amazing project to join CNCF that's great, but IMO CNCF is only here to help projects that need help. If Druid or others already have help from other organizations, that's fine; does it matter for us? (: In my opinion, what matters is that a promising project has the help, funding, and support it needs.
> For example, storing historical data (SIG-Storage), running your batch processor workers (SIG-Runtime), serving your real-time and historical data (SIG-Network).
I agree this is very connected. However, my honest opinion is that analytics, even in an OLAP-based fashion, overlaps a little bit with SIG-Observability, and this is why I am interested in finding some solutions for our communities.
Rob:
> experience with it at Uber (running OLAP queries against monitoring and observability data).
Yes! This is what we are looking for: production-grade experience and recommendations for this.
> 1. With respect to SIG Observability, I think talking and moving forward on options/standardized approaches to OLAP on monitoring and observability data makes sense. With regards to BI/OLAP in general, I would say that SIG Observability should not be focused on this space and would probably be better served by a dedicated data engineering SIG.
I tend to agree. However, given that no one has started a SIG-BigData yet, and given that observability data is quite an enormous source of meaningful information, I would love to explore at least the API and integration possibilities here. Maybe I'm wrong (:
> 2. At Uber we ETL'd subsets of data users wanted to do large processing on into an existing platform. The data warehouse supported Spark and Presto for interactive queries (i.e. pull raw data matching query at query time) and HDFS (ingest raw data as it arrives via Kafka into HDFS and ETL/query there).
Awesome, good examples. It's worth revisiting those amazing projects and the integrations there (Spark, Presto, Hadoop).
> I'd love to see a project that was Prometheus Remote Read -> Spark for interactive or batch ETL against Prometheus data. Also Prometheus Remote Read -> Presto could be interesting, although Presto focuses more on interactive queries vs say Spark.
Yes! Does anyone have more info about that Spark integration? I remember some teams at Red Hat are already using Presto on Thanos data; I might try to find more information on that as well. (:
> The major issue with other systems in this space tends to be owning the whole data pipeline that results, e.g. Thanos/Cortex/M3/ecosystem would need to support an ongoing export of data into another stateful system such as Druid, Pinot, ClickHouse, etc. You also then have to now store the data in these other warehouses with smart policies, otherwise a lot of users end up just whitelisting all of the data to be warehoused. Typically this ends up with really large datasets existing in two different systems and a significant investment to keep the pipeline flowing between the two. That is why I think seeing projects that support interactive and ETL that operate on the dataset from the Prometheus metrics store itself and then save elsewhere being quite interesting, rather than warehouse the whole dataset themselves.
Yes! This is actually the amazing novelty we would love to push toward as well. Instead of storing the same data in 5 places, can we keep it in just one? The idea would be to promote an efficient streaming read API rather than copying the data into different formats. I mentioned this in one of the requirements here. This might mean more work on those Thanos/Cortex/M3/ecosystem projects, but given we are already collaborating, it might be easier (: This is along the lines of what we are trying to push in the metrics/logs/tracing world, as mentioned by my teammate Frederic: can we reuse a similar index for those three, since we collect different data but from the same resources? Kind Regards, Bartek
On Fri, 29 May 2020 at 18:16, Rob Skillington <rob@...> wrote:
|
||||||||||||
|
||||||||||||
[Action Required] Thanos to Incubation Due Diligence Ask For Review
p.versockas@...
Hello, I’m Povilas Versockas, one of the Thanos maintainers & Prometheus Community organizers. Similar to the Cortex document, and as per the next SIG Observability task, I have prepared a Thanos Due Diligence document, and I kindly invite all SIG Observability members to read and review it. Feel free to add comments and questions to the doc, so we can discuss them all during the next SIG Observability meeting on 9th June 2020, as per our agenda. Best Regards, Povilas Versockas
|
||||||||||||
|
||||||||||||
Re: Recommendations for Open Source Analytic (OLAP) system / API to mine Thanos/Prometheus data.
Richard Hartmann
Hijacking top email to reply across the board. As many of you will know, I have been nagging Prometheus-team about this for years, so yes, I think we should cover this. At PromCon 2017's dev summit hallway track, we talked about connectors to existing data analysis, e.g. an R interface to natively access data stored in Prometheus format. Thanos' block storage would solve a lot of pain points, and Prometheus' remote read/write API is another obvious immediate attach point. Also at around the same time, I started a discussion about extending PromQL in this direction, a discussion which never went anywhere, but which I can see being revived. I disagree that the topic should be death-by-committee'd on day 1 by splitting it across several SIGs. Concerted effort and input from subject-matter experts is good, though. But get something off the ground first before making it more cumbersome. Overall, I think it's something which we should at least take a look at in the context of this SIG. Deeper analysis of data definitely falls under o11y. Best, Richard
On Fri, May 29, 2020 at 6:00 PM Bartłomiej Płotka <bwplotka@...> wrote:
|
||||||||||||
|