17:00:11 <ahf> #startmeeting Network team meeting, 9th august 2021
17:00:13 <ahf> hello everybody
17:00:21 <ahf> pad is at https://pad.riseup.net/p/tor-netteam-2021.1-keep
17:00:26 <dgoulet> o/
17:00:33 <ahf> o/
17:00:38 <jnewsome> o/
17:00:42 <nickm> woo!  hi!
17:01:10 <juga> o/
17:01:25 <ahf> o/ woh looks like we are many people today, very nice
17:01:32 <ahf> how are people doing with their boards ?
17:01:47 <GeKo> o/
17:01:57 <mikeperry> o/
17:02:08 <nickm> my board is good; arti can be harder than I had expected :)
17:02:24 <nickm> I need to offload arti#128 to the whole team, I think.  I don't have a coherent thing to write there
17:02:39 <ahf> very nice, i am mostly in s30 land right now
17:02:53 <ahf> ah, we can lift it over to the tpo/core/team repo if we want to?
17:03:25 <nickm> ah, yes please
17:03:55 <ahf> i haven't given this much thought other than the conversations we had last week for the thursday meeting
17:04:24 <ahf> i will get it moved when i have my admin account around
17:05:23 <nickm> works for me
17:05:28 <ahf> release things look no different from last week from what i could tell, but dgoulet, nickm and i should probably talk afterwards about TROVE-2021-007
17:05:34 <ahf> the james bond trove
17:06:24 <ahf> is that correct? :-)
17:06:37 <nickm> makes sense to me
17:06:57 <dgoulet> yes
17:07:13 <ahf> don't see anything from other teams
17:07:58 <ahf> no discussion items or announcements w00tw00t
17:08:04 <ahf> ok then i think it's s61 time
17:08:07 <nickm> i thought i added a discussion item
17:08:08 <nickm> hang on
17:08:10 <ahf> huh
17:08:22 <ahf> you did!
17:08:33 <nickm> it was in the wrong place; I moved it up, sorry
17:08:46 <ahf> 2021-08-09 [nickm] It looks like we never made the tickets for TROVE-2021-00[356] public.  Can we safely do so now?
17:09:09 <ahf> i would say we can, yeah. i don't remember number 5, but 3 and 6 i think were OK to make public
17:10:31 <nickm> ok
17:10:34 <nickm> doing now
17:11:13 <ahf> ok
17:11:19 <ahf> next item is:     2021-08-09 [nickm] Plan dates for next releases, and TROVE-2021-007 fix.
17:11:53 <ahf> it sounds like we need to chat a bit about 2021-007 after, but we are also talking about how dgoulet and i need to get more involved with releases
17:12:05 <ahf> maybe this is a good opportunity for us to dive into it head first
17:12:14 <nickm> looks like next ff release date is early september if we want to sync with that
17:12:28 <nickm> tbb-team: can you pick up a security release earlier than that if we have one?
17:12:49 <ahf> oh hm
17:12:57 <nickm> ?
17:13:02 <gaba> nickm: i will take a look at arti#128
17:13:10 <ahf> yeah, syncing with them is fine, nothing there
17:13:22 <nickm> gaba: you already did; we're just moving it into the team issues list
17:13:43 <gaba> we  can discuss it on thursday
17:14:03 <nickm> dgoulet, ahf: I think it would be reasonable to target August 16 (1 week) for the releases with this fix; what do you think?
17:14:34 <dgoulet> plausible!
17:14:41 <ahf> i think that is OK
17:15:23 <nickm> and only 3 releases to do, since 047 isn't releasing yet and 044 is EOL
17:15:29 <nickm> nice!
17:15:42 <nickm> shall we try to get a new set of fallbacks by that date too?
17:15:53 <dgoulet> yes absolutely
17:16:02 <nickm> dgoulet: ok. can I leave that to you? :)
17:16:05 <dgoulet> yes ofc
17:16:08 <nickm> woot
17:16:16 <ahf> awesome
17:16:29 <ahf> then david and i can continue our plan with talking about release process tomorrow i think
17:16:34 <ahf> cool!
17:16:40 <nickm> cool; pull me in if you have any questions
17:16:43 <ahf> i think that was all for discussion items. let's move to s61?
17:16:48 <ahf> nickm: ya, we will for sure
17:16:58 <nickm> 2 notes.  1: current process is in doc/HACKING/ReleasingTor.md
17:17:07 <ahf> yep
17:17:13 <dgoulet> ahf: yes, lets do that
17:17:14 <nickm> 2: I forget what my second note was
17:17:15 <nickm> :)
17:17:20 <nickm> ok, s61 now :)
17:17:21 <ahf> goto 1;
17:17:23 <ahf> :-D
17:17:28 <ahf> mikeperry: you're on
17:17:34 <mikeperry> ok
17:17:46 <mikeperry> so ppl seem back from vacations; yay
17:18:43 <ahf> :-)
17:18:44 <mikeperry> I updated the Sponsor61 section as best I could. I think it captures the stuff we went over in the meeting last week, for those who were out
17:18:57 <mikeperry> we have some blockers in that the shadow box is busted
17:19:22 <mikeperry> and we also need to figure out how to get its output to match metrics.tpo
17:19:38 <nickm> [i have something for after the s61 section; sorry! it will be short]
17:20:11 <mikeperry> I am not sure if the shadow issue needs input from jnewsome to diagnose the log, or if lavamind is still looking into it: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40350
17:20:27 <ahf> oh, it was the sudden gitlab issue?
17:20:48 <mikeperry> yeah it is not accepting jobs. but there was a lot of gitlab runner damage last week with disk space, etc
17:21:04 <jnewsome> my impression is that lavamind is still looking into it
17:21:20 * gaba still needs to look at the s61 items from last week
17:22:43 <mikeperry> once we get shadow running again, we need to figure out how to make its baseline match the metrics website onionperf data
17:23:09 <mikeperry> I think what needs to be done there is to import a handful of instances of the onionperf models into shadow
17:23:24 <hiro> so with acute last week we were discussing that the models from onionperf could maybe be used in shadow
17:23:48 <mikeperry> to get a representative amount of data points from an hour sim, I think we will need multiple copies of the onionperf models
17:24:04 <hiro> she was confident some of the code could be reused.. but I haven't looked into that yet
17:24:17 <sysrqb> nickm: regarding earlier Tor Browser release - yes, we can release earlier if needed
17:24:29 <nickm> great!
17:24:51 <jnewsome> afaik the main difference is just how the data is reported - the shadow postprocessing (tornettools) reports aggregate data over the onionperf instances in the simulation. the raw data is there for the individual instances, but we don't have scripts to graph it
17:25:13 <mikeperry> hiro,jnewsome,acute: this probably requires some coordination on how to add the onionperf tgen models to the shadow sim. which I imagine requires a working gitlab runner to test
17:25:23 <gaba> wom 2
17:26:04 <hiro> sporksmith[m]: you mean how the data is graphed in onionperf or in metrics website?
17:26:42 <jnewsome> in the shadow sim postprocessing - it shows aggregate data, vs the web site showing individual instances
17:26:54 <hiro> uhm
17:27:21 <hiro> so it's a matter of understanding how shadow aggregates the data and having it graph individual instances if requested?
17:27:45 <hiro> or the other way around? aggregate the onionperf data in metrics website?
17:28:15 <jnewsome> I think graphing individual instances of the shadow sim data, though I guess we could do the other way around too
17:28:45 <jnewsome> maybe this is too much detail for this meeting. should we try syncing up again this week?
17:29:02 <hiro> ok sounds good. I think I'll create a ticket to track this in the website
17:29:05 <jnewsome> though it's going to be hard to do very much before the runner is working again
17:29:53 <mikeperry> jnewsome: can you check with lavamind to see if he needs anything from you to help diagnose the runner failure?
17:30:10 <jnewsome> mikeperry: will do
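The difference jnewsome describes (one aggregate over all onionperf instances in the sim versus one series per instance on the website) can be sketched roughly as follows. This is only an illustration: the function names, the `(instance, value)` record shape, and the sample numbers are hypothetical toy values, not tornettools or onionperf code or data.

```python
from collections import defaultdict
from statistics import median


def aggregate_median(samples):
    """Median over all samples, ignoring which instance produced them."""
    return median(value for _, value in samples)


def per_instance_median(samples):
    """Median per instance, so each instance can be graphed on its own."""
    grouped = defaultdict(list)
    for instance, value in samples:
        grouped[instance].append(value)
    return {instance: median(values) for instance, values in grouped.items()}


# Hypothetical toy records: (instance name, download time in seconds).
samples = [
    ("perf-a", 1.0),
    ("perf-a", 3.0),
    ("perf-b", 10.0),
    ("perf-b", 20.0),
]

if __name__ == "__main__":
    print("aggregate:", aggregate_median(samples))
    print("per instance:", per_instance_median(samples))
```

Either direction discussed in the meeting (splitting the sim data per instance, or aggregating the website data) comes down to choosing one of these two groupings on both sides before comparing.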
17:30:44 <ahf> is that it for s61 things?
17:31:00 <hiro> I have one thing about the overload metrics
17:31:13 <hiro> https://gitlab.torproject.org/tpo/network-health/metrics/relay-search/-/issues/40005
17:31:59 <hiro> so we found some performance issues in onionoo, because of which we might not be able to expose all the information about the overload-ratelimits and overload-fd-exhausted lines right away
17:33:03 <mikeperry> hrmm, is this due to extra-info handling? the overload-general line is ok?
17:33:15 * hiro < https://matrix.org/_matrix/media/r0/download/matrix.org/wsHmhslGEYcJxPreXWNhmIgy/message.txt >
17:33:39 <ahf> hm
17:33:39 <hiro> the overload-general could be ok
17:33:48 <hiro> unless we want the operators to know what is overloading
17:34:35 <ahf> not sure i understand, but you see nodes hitting the fd limit from the overload-fd-exhausted entry in the extra-info's ?
17:34:43 <hiro> yes
17:35:23 <ahf> interesting. if there is a way to cluster it we might be able to find out if they run tor by hand or use some init system that forgets to bump these limits for tor. could be bugs in distro's init scripts
17:35:33 <hiro> but because of the way onionoo processes the extrainfos we might not be able to expose all the info on relay-search
17:35:50 <ahf> ah
17:35:51 <dgoulet> I think we only want the operator to know that it is "overloaded"
17:35:54 <mikeperry> for overload-general, we should give them an alert that includes instructions on how to get the metricsport details into prometheus for their own diagnosis
17:35:59 <dgoulet> and then the operator can go on the MetricsPort to learn why
17:36:10 <dgoulet> there, what mikeperry says :) /me shuts up
17:36:21 <hiro> ok!
17:36:38 <hiro> so that's useful to know thanks
17:36:50 <mikeperry> for the fd-exhausted issue, I imagine geko and arma2 inspecting that while doing reachability tests, etc
17:37:11 <mikeperry> but that is just a ulimit change to fix, or it should be
17:37:44 <GeKo> yep
17:37:59 <ahf> yeah
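For the ulimit point above, a minimal sketch of how one could check the open-file limit that a process actually runs with, using Python's standard `resource` module (Unix only). The helper names are hypothetical, and this inspects the limits of the Python process itself, not of a running tor. To learn tor's limits it would have to run in the same environment tor is started in, for example under the same init system.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one enforced; it can be raised up to the hard limit.
    print("soft:", describe(soft), "hard:", describe(hard))
```

A low soft limit with a much higher hard limit would fit ahf's guess that something in the startup path forgets to bump the limit for tor.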
17:38:03 <mikeperry> I guess if getting that data causes perf issues on the metrics server, that is not surprising. we ran into that in early testing
17:38:12 <GeKo> and i agree with dgoulet on just showing that relays are overloaded
17:38:33 <GeKo> with some hint on how to figure out what is going on
17:38:38 <GeKo> looking at the metrics port
17:38:49 <dgoulet> I would put it like "red" or something very noticeable!
17:38:52 <hiro> so we will consume the fd-exhausted information? and we can offer the operators just the boolean flag?
17:39:20 <hiro> because we can expose a fd-exhausted flag with the bandwidth information
17:39:40 <hiro> and don't expose that on relay-search
17:39:52 <hiro> maybe just on the bandwidth graph
17:40:06 <GeKo> we could experiment a bit i guess
17:40:12 <hiro> sounds good
17:40:22 <GeKo> to figure out what approach is not confusing operators too much
17:40:24 <mikeperry> I am not sure what the perf issue is, but the theory is that fd-exhausted should be an easy ulimit fix. so a boolean is fine there, if that is easier
17:40:59 <hiro> it's just on the onionoo data models and how the endpoints output the documents they produce
17:41:19 <hiro> I think that's ok mikeperry
17:42:50 <ahf> very good
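The "just a boolean flag" idea can be sketched as below: scan an extra-info document for the overload keywords and report only whether each one is present. This is not onionoo code. The function name and the sample text are hypothetical, and the argument layout shown after each keyword is an assumption made for illustration; the sketch only looks at the first token of each line.

```python
OVERLOAD_KEYWORDS = ("overload-ratelimits", "overload-fd-exhausted")


def overload_flags(extra_info_text):
    """Map each overload keyword to True if a line starts with it."""
    flags = {keyword: False for keyword in OVERLOAD_KEYWORDS}
    for line in extra_info_text.splitlines():
        parts = line.split()
        if parts and parts[0] in flags:
            flags[parts[0]] = True
    return flags


# Hypothetical fragment; the fields after the keyword are assumed, not authoritative.
sample = """extra-info ExampleRelay 0000000000000000000000000000000000000000
overload-fd-exhausted 1 2021-08-09 17:00:00
"""

if __name__ == "__main__":
    print(overload_flags(sample))
```

Dropping the per-line details keeps the exposed document small, which is the trade-off hiro and mikeperry settle on above.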
17:42:51 <hiro> That's all from me
17:42:57 * GeKo does not have anything else for s61
17:43:01 <mikeperry> juga: while you were away, ggus experimented with a research prototype for unlisted exits. some of that work might be useful for the sbws pinned exit ticket: https://gitlab.torproject.org/tpo/network-health/sbws/-/issues/40022#note_2746514
17:43:03 * ahf good too
17:43:10 <nickm> qq from me not on s61: Anybody mind if I take off from 30 Aug through Sep 3?
17:43:23 <dgoulet> nickm: please do!
17:43:39 <juga> mikeperry: i looked at that, but i think we'll run into the onion service issue we mentioned
17:43:56 <mikeperry> juga: that was a way to do it without the onion service
17:44:22 <juga> mikeperry: ok, let's talk later, cause i think bridge uses onion service there
17:44:34 <ahf> nickm: nope, hope you enjoy it :-)
17:44:44 <mikeperry> oh interesting. ok
17:46:10 <mikeperry> well I think that is it for the s61 part of the meeting then
17:46:44 <ahf> sweet <3
17:47:04 <ahf> i am also gonna take some holiday later this month but figuring that out this week. last month while i was away from work it was more doing emotional paperwork /o\
17:47:10 <ahf> ok, i don't think we have anything else for our meeting today
17:47:12 <ahf> everybody good?
17:47:25 <dgoulet> yes
17:47:34 <jnewsome> 👍️
17:47:45 <nickm> ok w me.  do we still need to talk about the TROVE after the meeting?
17:47:55 <nickm> or should we do that when we cover backports and releases tomorrow?
17:48:06 <dgoulet> later imo
17:48:19 <nickm> ok. we'll confer about that tomorrow.
17:48:22 <ahf> dgoulet: later today or later tomorrow? :-S
17:48:32 <dgoulet> lol the second option Nick game :)
17:48:39 <dgoulet> former vs later lol
17:48:51 <ahf> ok, let's chat about it tomorrow?
17:49:00 <dgoulet> yes
17:49:12 <nickm> spelling is latter :)
17:49:18 <dgoulet> knew it ...
17:49:18 <nickm> hence the confusion
17:49:28 <ahf> ah!
17:49:34 <ahf> ok, we talk tomorrow then
17:49:44 <ahf> thanks all for the meeting. nice to have everybody back
17:49:47 <dgoulet> o/
17:49:48 <ahf> #endmeeting