17:00:14 <nickm> #startmeeting network team meeting, 3 Jun 2019
17:00:20 <asn> o/
17:00:22 <nickm> hi!  I think it's that time again!
17:00:25 <catalyst> o/
17:00:28 <nickm> https://pad.riseup.net/p/tor-netteam-2019.1-keep
17:00:29 <ahf> hello hello
17:00:29 <asn> (around but will soon go for dinner with family)
17:00:30 <dgoulet> hello
17:00:31 <wayward> nothing from me!
17:01:10 <nickm> asn: ok! Anything you want to make sure we talk about while you are around?
17:01:12 <gaba> o/
17:01:19 <asn> nope im covered
17:01:44 <asn> i've been in contact with the s27 team and the scaling team and they are aware of my next moves
17:01:51 <nickm> woo
17:02:11 <nickm> So actually I suggest we skip past CI for now and talk about it when we do rotation handoff and discussion
17:02:19 <nickm> let's go straight to 041 release status
17:02:27 <nickm> https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases/041Status
17:03:14 <nickm> we have a bunch fewer than we did before, and once this week's reviews are done, we'll have fewer still, I hope
17:04:09 <nickm> I triaged a bunch of stuff out of the 0.4.1.x milestone last week; sent a tor-dev email about that at https://lists.torproject.org/pipermail/tor-dev/2019-May/013840.html
17:04:28 <nickm> Here are the tickets currently marked as 041-must: https://trac.torproject.org/projects/tor/query?status=accepted&status=assigned&status=merge_ready&status=needs_information&status=needs_review&status=needs_revision&status=new&status=reopened&keywords=~041-must&col=id&col=summary&col=status&col=type&col=priority&col=milestone&col=component&order=status
17:05:32 <nickm> If everybody with one of those can focus on closing it, and everybody with reviews to do for 041 can get reviews done, we'll get it out this week
17:05:56 <nickm> Does that sound like an okay plan?
17:06:19 <ahf> yes
17:06:46 <dgoulet> nickm: (1) is not simple, I've been all morning at it
17:07:03 <dgoulet> nickm: (3) will probably be closed imo through another one in needs review I think
17:07:20 <nickm> dgoulet: Wrt (1) would you like some help or do you want to poke at it longer?
17:08:14 <dgoulet> nickm: I'm hoping the person on the ticket will send me more debug logs or I'm finally able to reproduce, but if both fail, yes help++
17:09:17 <nickm> ok
17:09:38 <nickm> another option is to log more information when this happens (like, what kind of circuit and which hop the sendme came from)
17:10:07 <dgoulet> nickm: I have a relay logging 5 lines of text every time it happens :) so far not helping much, I need the end2end correlation :S
17:10:16 <dgoulet> nickm: somehow the deliver/package windows are out of sync :S
17:10:33 <nickm> weird
17:10:38 <nickm> I'll ask more on #tor-dev :)
17:10:49 <dgoulet> very... especially code I didn't change (stream level :S)
17:10:52 <nickm> next thing to do is roadmap
17:10:53 <dgoulet> nickm: great
17:11:33 <nickm> everybody please take a look at that kanban, filter it, and move stuff to a good place :)
17:12:57 <nickm> any questions/issues there?
17:13:10 <nickm> If not, let's move on to reviews...
17:13:44 <nickm> looks like I only got 1 this week, so if anybody needs help, please feel free to pass me something if you're overloaded
17:14:23 <ahf> i'd really like some extra eyes on #25140, i think it looks good, but it's pretty big and has had a ton of iterations
17:14:39 <ahf> we have found something at every iteration i believe
17:15:08 <nickm> ahf: ok, I can help. Let's plan a time tomorrow to look at it together?
17:15:25 <ahf> sounds good, at your morning'ish time?
17:16:15 <nickm> I have a 9am meeting my time, so my 10am would be good?
17:16:26 <nickm> == 1400 UTC I think
17:16:34 <ahf> sounds good! yep
17:17:18 <nickm> Rotations are under discussion too, so let's move to rotations/discussion :)
17:18:19 <nickm> I'm passing CI to teor, but I plan to keep working on test-stem and test-rebind stuff
17:18:40 <nickm> one thing we should talk about is whether we disable the intermittent failing stuff that we have not yet been able to fix
17:18:51 <nickm> This is test-stem and test_rebind.py
17:18:59 <nickm> That is, we would have to make it allow_failures
17:19:08 <nickm> and keep working on a fix so we can turn it back on
17:19:12 <nickm> What do people think about that?
17:20:22 <catalyst> the test_rebind.py failure seems to be macOS-only, so we could allow_failures the macOS builds
17:20:31 <ahf> nice
17:20:34 <nickm> or disable test_rebind on them
17:20:45 <catalyst> anyone seen the test_rebind failure on not-macOS?
17:21:02 <nickm> I thought I had, but I turned out to be wrong
17:21:50 <nickm> Does anybody think we should or shouldn't allow_failure these for now?
17:22:38 <nickm> Maybe we should note this stuff on the CI status page, and have a note in ReleasingTor.md saying that we should manually try all the CI-disabled tests before releasing
17:23:16 <catalyst> we could also conditionally not run test_rebind on macOS somehow, so we don't have to disable all the macOS jobs
17:23:21 <catalyst> s/disable/allow_failures/
17:23:28 <nickm> yeah
17:23:42 <nickm> If nobody objects, I think this is the approach we should go with
17:25:03 <catalyst> we could make it also depend on running in Travis so a developer on macOS running `make check` still has it run
17:25:28 <nickm> I'd just add an environment variable for disabling test_rebind.py, and add it to the relevant entries in the travis file
17:25:42 <catalyst> that works too
17:26:11 <catalyst> maybe we should also have a process for the CI rotation person to check the allow_failures Travis jobs occasionally?
17:26:44 <nickm> +1
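
The environment-variable approach nickm describes above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the change that was actually made: the variable name SKIP_TEST_REBIND and the helper names are hypothetical, and the only outside fact relied on is that Automake's test harness treats exit status 77 as "skipped". The Travis file would then set the variable only on the macOS entries, so a developer running `make check` locally on macOS still runs the test.

```python
import os

# Automake's test harness reports a test as "skipped" when it exits with 77.
SKIP_EXIT_CODE = 77


def should_skip(environ, var_name="SKIP_TEST_REBIND"):
    """Return True when the opt-out variable is set to a truthy value.

    The variable name is hypothetical; it is not taken from the meeting log.
    """
    value = environ.get(var_name, "")
    return value.strip().lower() in ("1", "true", "yes")


def main(environ=None):
    """Return the exit status the test script would use."""
    environ = os.environ if environ is None else environ
    if should_skip(environ):
        print("test_rebind: skipped by environment variable")
        return SKIP_EXIT_CODE
    # The real rebind test body would run here (omitted in this sketch).
    return 0
```

A script using this would finish with `sys.exit(main())`. Because the skip is reported rather than silently passing, the CI rotation person can still see which jobs skipped the test when checking on it occasionally.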
17:27:01 <nickm> if nothing else on this, let's look at teor's other questions about sbws deployment?
17:27:53 <nickm> they are:
17:27:57 <nickm> Should we deploy sbws to half the bandwidth authorities?
17:27:58 <nickm> Should we raise AuthDirMaxServersPerAddr to 4?
17:28:09 <nickm> any thoughts?
17:30:24 <ahf> is this a suggestion we make to the dirauth operators?
17:30:27 <dgoulet> about AuthDirMaxServersPerAddr, on tor-dev the
17:30:36 <dgoulet> discussion seems to be advancing
17:31:16 <nickm> In my ideal world this is a question dirauth operators would figure out
17:32:06 <nickm> I think we made that value tunable because we didn't know what it should be
17:32:13 <nickm> I don't mind keeping it at 2 or raising it to 4
17:32:20 <mikeperry> it's not just a measurement problem.. operators need multiple instances to use all bandwidth because of single-threaded CPU limits
17:32:23 <nickm> good to see the thread is making progress
17:32:54 <nickm> for sbws -- I'd like us to ask ourselves how sure we are this won't break, how fast we can change back if it does break.
17:33:21 <nickm> also maybe we should ask "are we measuring the right things so that we will notice any performance changes that will happen as a result of this?"
17:33:28 <nickm> so maybe we should pull in metrics too
17:35:49 <mikeperry> isn't it unlikely to change metrics until relay operators actually run more than 2 relays per IP?
17:36:07 <nickm> I was talking about the SBWS change
17:36:10 <mikeperry> might take a bit for enough relay operators to take advantage/run more relay instances
17:36:13 <mikeperry> oh
17:37:13 <nickm> mikeperry: I agree with you about the MaxServers change
17:38:31 <mikeperry> we had some good aggregate graphs that we used from metrics when we last did a major torflow update, many years ago, to verify similar distributions between old and new instances
17:39:18 <nickm> mikeperry: could you introduce those to the tor-dev thread that teor linked?
17:40:10 <mikeperry> I am trying to find it.. it was a long long time ago
17:40:17 <nickm> ok
17:40:36 <nickm> That's the end of the discussion section on the pad.  Do we have any other items for this week's meeting?
17:40:50 * ahf has none
17:41:59 <nickm> okay. Let's have a great week hacking, then.  Thanks, everybody!
17:42:41 <ahf> o/
17:42:49 <gaba> o/ thanks
17:43:51 <nickm> #endmeeting