15:01:16 <karsten> #startmeeting metrics team meeting
15:01:16 <MeetBot> Meeting started Thu Feb 20 15:01:16 2020 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:16 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:01:27 <karsten> please add more topics, if you want.
15:03:15 <karsten> let's start. if more topics come up, we can append them.
15:03:20 <karsten> Review tasks from roadmap session (ticket creation, old cards in trello)
15:03:33 <karsten> gaba: ^
15:04:28 <gaba> yes
15:04:47 <gaba> there are still some tickets that need to be created. I wanted to check with you all before going and creating them myself.
15:05:02 <gaba> they are the issues marked as NEED_TICKET in the temporary roadmap pad
15:05:30 <gaba> the update (shown with strikethrough) is that all other tickets were already added and imported into the trello roadmap
15:06:51 <acute> sorry about that, I thought all onionperf tickets were done
15:07:10 <gaba> and this brings us to the item about migrating from gitlab to trac
15:07:27 <acute> I am creating the remaining ones just now
15:07:27 <gaba> that irl added?
15:07:45 <gaba> thanks acute!
15:08:01 <irl> we didn't move the onionperf tickets yet
15:08:08 <irl> if that's what you're asking
15:08:22 <irl> we have moved all the metrics-cloud PRs, and even processed, reviewed and merged them
15:08:50 <gaba> ok
15:10:53 <karsten> I added a comment to a NeedsTicket line. do you need anything else from me?
15:11:03 <gaba> nop. I think we are fine
15:11:43 <karsten> okay.
15:12:06 <karsten> what else remains on this topic?
15:13:00 <gaba> I think we are done. I will include those into the roadmap once we have them.
15:13:14 <karsten> sounds great!
15:13:26 <karsten> okay, moving on.
15:13:32 <karsten> GeoIP database (karsten)
15:14:04 <karsten> last week we considered moving to another database provider. but it turns out we'd be running into similar issues as with the current database.
15:14:10 <karsten> CCPA.
15:14:23 <irl> yeah
15:14:30 <karsten> (California Consumer Privacy Act)
15:14:43 <karsten> we should talk about potential workarounds.
15:14:49 <irl> i'm sure this has to be a bad interpretation of the act
15:14:57 <irl> or the legislators are incompetent
15:15:26 <karsten> that's a fine question for a lawyer.
15:15:33 <irl> i like the idea of removing the need for clients to have a database
15:15:44 <karsten> yes, that's one option.
15:15:51 <karsten> which doesn't fully solve the problem, but part of it.
15:15:54 <irl> i also like the idea of distributing a cut down version of the database as a new dirauth document
15:16:10 <karsten> cut down how?
15:16:20 <irl> subnet -> country code is i think all we need
15:16:27 <karsten> ah, yes.
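As an illustration of irl's "subnet -> country code is all we need" idea, here is a minimal sketch of how a client could look up an address in such a cut-down table. The meeting does not specify any document format; the `SubnetCountryTable` class, the IPv4-only restriction, the non-overlapping-entries assumption, and the sample subnets and country codes are all hypothetical.

```python
import bisect
import ipaddress


class SubnetCountryTable:
    """Hypothetical cut-down GeoIP table: IPv4 subnet -> country code only."""

    def __init__(self, entries):
        ranges = []
        for cidr, cc in entries:
            net = ipaddress.ip_network(cidr)
            if net.version != 4:
                raise ValueError("this sketch only handles IPv4 subnets")
            # Store each subnet as an inclusive integer range.
            ranges.append((int(net.network_address), int(net.broadcast_address), cc))
        # Assumption: subnets do not overlap, so sorting by start is enough.
        ranges.sort()
        self._starts = [start for start, _, _ in ranges]
        self._ranges = ranges

    def lookup(self, ip):
        """Return the country code for ip, or None if no subnet covers it."""
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            return None
        value = int(addr)
        # Find the last range whose start is <= value, then check its end.
        i = bisect.bisect_right(self._starts, value) - 1
        if i >= 0:
            _, end, cc = self._ranges[i]
            if value <= end:
                return cc
        return None


# Made-up sample data using documentation address blocks.
table = SubnetCountryTable([("192.0.2.0/24", "AA"), ("198.51.100.0/25", "BB")])
print(table.lookup("192.0.2.55"))
```

Binary search over sorted ranges keeps lookups logarithmic in the table size, which matters little for a client that only needs occasional lookups but keeps the table format trivially simple.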
15:16:59 <karsten> I mentioned cutting out californian addresses, and you didn't like that. why?
15:17:29 <irl> a) it's ineffective and i don't want to encourage stuff like this, b) it's a slippery slope
15:18:06 <karsten> the two ideas above (no need for clients to use database, distribute via dirauths)
15:18:13 <karsten> only work in 6 months to 2 years from now.
15:18:26 <karsten> depending on how fast we get this implemented and depending on release cycles.
15:18:52 <irl> right but until then nothing actually breaks
15:18:53 <karsten> cutting out addresses might be a short-term solution.
15:18:59 <karsten> well, yes.
15:19:33 <irl> and we have plenty of relays submitting stats with old geoip files
15:20:04 <irl> (enthusiastic students watching should think about extrapolating network stats from only those relays with recent geoip files and seeing how the metrics change)
15:20:12 <karsten> hehe
15:20:29 <karsten> okay, I can see both sides here.
15:20:37 <karsten> I can see how it would be effective in a legal sense.
15:20:47 <karsten> and I can see how we survive 6 months to 2 years without new geoip data.
15:21:20 <karsten> are these the two ideas we should pursue?
15:21:35 <karsten> we would still have to find out whether they would be acceptable.
15:21:39 <irl> it's the being able to distribute updates that could be the problem
15:21:41 <karsten> and then talk about design and code.
15:21:48 <karsten> which part?
15:22:12 <irl> i remember reading we could ship a database with the product but each user had to sign up to receive updates
15:22:20 <irl> i don't remember which provider this was
15:22:33 <karsten> yes, that's something to discuss with them.
15:22:57 <karsten> I could imagine getting an exception for that. it's the part where they require us to make sure our users get updates.
15:23:24 <karsten> so, yes, we should be sure about that before writing a proposal or any code.
15:23:28 <irl> let's write a document to describe what we want to do
15:23:57 <gaba> I think regardless of the questions to IP2Location we should consult a lawyer
15:24:17 <irl> i think we need to work out what to ask the lawyer first
15:24:34 <irl> unless we have some big fund for lawyer questions
15:24:37 <karsten> my plan was to write that document in an email.
15:25:01 <karsten> we can discuss that internally before asking them. should we do that?
15:25:08 <irl> yeah
15:25:11 <gaba> ok
15:25:31 <karsten> and yes, asking a lawyer should be part of the process. not necessarily step 1, but somewhere.
15:25:52 <karsten> okay.
15:26:05 <karsten> that's all from me on this topic. moving on?
15:26:08 <irl> ok
15:26:24 <karsten> Onionoo 8.0 upgrade (karsten)
15:26:28 <karsten> currently in progress.
15:26:42 <karsten> is RS going to work with this?
15:27:08 <karsten> I guess we'll find out.
15:27:21 <irl> yes
15:27:24 <irl> it will work
15:27:42 <karsten> the updater is currently running, and I'm going to update the server after this meeting. and the metrics website documentation.
15:27:45 <karsten> cool!
15:28:06 <karsten> nice catch with running tests in 1990, by the way. ;)
15:28:21 <karsten> okay, that's all on this topic.
15:28:24 <irl> i have expenses to submit for my time travel
15:28:33 <karsten> ah, things were cheap back then.
15:28:37 <irl> heh
15:28:48 <karsten> Simplifying (but also breaking) TorDNSEL (irl)
15:29:23 <irl> we can make a simple dns service to replace the current thing, instead of replicating the functionality
15:29:38 <irl> it would tell you if an exit relay is an exit relay or not, but not look at exit policies
15:30:03 <irl> exit lists don't contain exit policies so this gets a lot more complicated than i had originally imagined it would be
15:30:28 <irl> for consumers i know of, they don't care so much if it's the simple method or not
15:30:30 <karsten> hmm, how does the current thing solve this?
15:30:35 <irl> but exit operators might notice
15:30:49 <irl> the current thing downloads server descriptors from collector on a schedule
15:31:40 <karsten> this is a fine question...
15:31:41 <irl> so if it's just for exit relays and not for exit relays by dest ip/port, we can actually just write out a bind compatible zone file
15:31:55 <irl> and that means that the task is 1 pt
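To make the "bind compatible zone file" idea concrete, here is a minimal sketch that collects addresses from `ExitAddress <ip> <date> <time>` lines of an exit list and writes one DNSBL-style record per exit. The zone name `dnsel.example.org.`, the name server address, the TTL and SOA timers, and the `127.0.0.2` answer value (a common DNSBL convention) are assumptions, not what the new service actually uses.

```python
import ipaddress


def exit_addresses(exit_list_text):
    """Collect unique IPv4 addresses from 'ExitAddress <ip> <date> <time>' lines."""
    addrs = set()
    for line in exit_list_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ExitAddress":
            addr = ipaddress.ip_address(parts[1])
            if addr.version == 4:  # this sketch skips any non-IPv4 entries
                addrs.add(addr)
    return sorted(addrs)


def zone_file(addrs, origin="dnsel.example.org.", serial=1):
    """Render a BIND-style zone; origin, NS address and timers are hypothetical."""
    lines = [
        f"$ORIGIN {origin}",
        "$TTL 300",
        f"@ IN SOA ns1.{origin} hostmaster.{origin} ({serial} 3600 600 86400 300)",
        f"@ IN NS ns1.{origin}",
        "ns1 IN A 192.0.2.1",  # documentation address, placeholder only
    ]
    for addr in addrs:
        # DNSBL-style owner name: octets in reverse order under the zone origin.
        owner = ".".join(reversed(str(addr).split(".")))
        lines.append(f"{owner} IN A 127.0.0.2")
    return "\n".join(lines) + "\n"


sample = (
    "ExitNode 0000000000000000000000000000000000000000\n"
    "ExitAddress 198.51.100.7 2020-02-20 10:05:00\n"
    "ExitAddress 198.51.100.7 2020-02-20 11:05:00\n"
    "ExitAddress 203.0.113.9 2020-02-20 10:05:00\n"
)
print(zone_file(exit_addresses(sample)))
```

Because the record only says "this address was seen exiting", no exit policy is consulted, which is exactly the trade-off discussed here: a query for a specific destination IP/port can no longer be answered differently per exit policy.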
15:32:55 <karsten> how can we ask users?
15:33:17 <irl> i have privately spoken to irc network operators
15:33:38 <irl> i don't know who the other users might be
15:33:54 <gaba> the exit operators through the tor-relay mailing list?
15:34:09 <irl> i could write a mail on tor-relays@
15:34:21 <karsten> and exit relay or not is determined by having the Exit flag?
15:34:32 <karsten> or having an exit policy that is not reject *?
15:34:49 <irl> exit relay ip addresses are ip addresses we found in exit lists
15:35:44 <karsten> how do we decide which relays we scan?
15:35:55 <karsten> and is that different with the current and the new thing?
15:36:13 <irl> exits that are allowed to connect to https websites
15:36:27 <irl> we did a comparison on how it is different, and it's not that different
15:36:40 <karsten> okay.
15:36:50 <irl> the difference is that in the past you could reject say freenode:6667
15:36:54 <karsten> I like simple. I just don't feel able to answer whether it's good enough.
15:37:02 <irl> and then freenode wouldn't list your ip as a tor exit when they queried
15:37:16 <irl> the alternative is spending two more weeks on this
15:37:26 <karsten> I can imagine.
15:37:27 <irl> maybe 3 or 4 when i find the complications
15:38:12 <karsten> okay, I think asking on tor-relays@ is a reasonable next step. and then decide based on feedback.
15:38:18 <irl> yeah agreed
15:38:33 <irl> ok that's probably all on this topic
15:38:46 <karsten> great!
15:38:52 <karsten> Onionoo new hosts (irl)
15:39:01 <irl> we got these hosts
15:39:03 <irl> should we use them?
15:39:21 <karsten> how many?
15:39:24 <irl> we have 2 now
15:39:34 <karsten> to replace omeiense and oo-hetzner-03?
15:39:44 <irl> yeah
15:39:58 <karsten> let's do it. when?
15:40:06 <karsten> next week?
15:40:13 <irl> good question
15:40:49 <irl> 4th march?
15:41:08 <irl> or 5th march
15:41:30 <karsten> the plan would be to set up 2 new hosts, but not switch immediately?
15:41:40 <karsten> and when everything looks good, switch and keep the old ones around?
15:41:50 <karsten> and then when everything still looks good, kill the old ones?
15:41:53 <irl> yeah
15:42:08 <irl> i think they are currently masked in the varnish config
15:42:36 <karsten> how long do we need for step 1? an hour or two?
15:43:12 <karsten> if so, maybe we can do it sooner, and then do the next step in march.
15:43:21 <karsten> leaving enough time between steps to do checks.
15:43:40 <irl> the time consuming bit is syncing the state
15:43:56 <karsten> you mean copying over the directory from omeiense?
15:44:04 <irl> yeah
15:44:51 <karsten> okay. let's note down march 4 or 5, and do it sooner if we find free time.
15:44:54 <irl> but also i have none of this in my head right now, i have to read the docs again
15:45:31 <irl> we can look again at the next meeting to fix when we do it
15:45:57 <karsten> ok.
15:46:30 <karsten> (Quick) Questions on Metrics DB / Metrics Website Refresh (dj)
15:46:39 <dennis_jackson> My current understanding of the Metrics Backend is drawn from the March 2019 report by Karsten and irl which was super helpful. (Found at https://research.torproject.org/techreports/metrics-evaluation-2019-03-25.pdf )
15:46:49 <dennis_jackson> It suggests the back end is currently mostly Java + R reading from CSVs / PostgreSQL databases which are created from various tor-specific flat files.
15:47:00 <dennis_jackson> 1) Is this correct / up to date?
15:47:13 <irl> yes
15:47:15 <dennis_jackson> A few weeks ago it was mentioned that the Metrics Website is getting a refresh and there will be some UX research prior to that.
15:47:22 <dennis_jackson> 2) Any idea on timeline? When is the work expected to take place and the new website land?
15:47:33 <dennis_jackson> (I tried to find this myself, but could not track it down)
15:47:35 <irl> not the metrics website, but a portal to metrics data
15:48:08 <dennis_jackson> So this will be an alternative set of visualisations of the same data? Using the same backend?
15:48:33 <irl> it's more of an index although we might have visualisations in it
15:48:39 <gaba> the portal work on the user research part will be done in August. We will be implementing it after that depending on capacity.
15:49:00 <dennis_jackson> Okay - so it is more like Onionoo? I don't quite know what to picture sorry
15:49:23 <irl> https://data.gov.uk/ but for tor
15:49:36 <gaba> Simply Secure is working on the assumption that we are going to have visualizations there and it will be a useful place for researchers, journalists and whoever wants to use the data from Tor
15:49:43 <irl> yeah
15:50:27 <dennis_jackson> Okay, great. So will this involve back end work on how metrics are stored?
15:50:34 <irl> if it does, we did it wrong
15:50:39 <dennis_jackson> Haha. okay :)
15:51:02 <irl> this will also help us to make other data available more easily
15:51:08 <dennis_jackson> That's super helpful, thank you both for saving me a lot of time looking through Tracs and Pads!
15:51:13 <irl> say if we did a one-off analysis and have some CSV files that will never be updated
15:51:19 <dennis_jackson> Ah! Gotcha!
15:51:36 <dennis_jackson> Yeah - there's a ton of stuff like that hanging around and it would be great to keep it together
15:51:46 <dennis_jackson> Final question / motivation
15:52:03 <dennis_jackson> Was any thought given to using a backend for time series metrics like InfluxDB or Grafana or whatever?
15:52:21 <dennis_jackson> Has it been tried? Or rejected for some reason?
15:52:50 <irl> i have been working on/off on a backend using apache beam for windowed batch jobs to get more real-time output
15:53:12 <irl> like extrapolating network traffic from the subset of relays that updated extra-info descriptors more recently
15:53:50 <irl> it would be awesome to have a funded project and make this a real thing, but we're never going to have the time unless it's funded
15:54:02 <irl> and metrics is not the big cool thing that gets funded
15:54:20 <dennis_jackson> Okay, that's interesting. Thanks!
15:54:40 <dennis_jackson> Yeah, I appreciate that, although (IMO) it really should be the first line in any perf / scaling proposal
15:54:47 <dennis_jackson> "All the metrics!"
15:55:03 <irl> yes we should have metrics in every proposal
15:55:08 <irl> otherwise how do you know if you did anything?
15:55:14 <dennis_jackson> Amen!
15:55:47 <dennis_jackson> Thanks for the info. Saves me a lot of time. Context: students looking for mini projects and something like this as PoC came up.
15:55:55 <karsten> while we're at it, let's all pay for the current metrics proposal to come through.
15:55:58 <karsten> pray*
15:55:58 <dennis_jackson> Wanted to check it wasn't already being done / had been done / didn't work
15:56:00 <karsten> not pay.
15:56:19 <gaba> lol
15:56:24 <karsten> so many things to do.
15:56:51 <irl> dennis_jackson: i can email you a "requirements" doc if you have students that will make it a thing
15:57:09 <dennis_jackson> The mozilla proposal? Any idea when they announce?
15:57:42 <dennis_jackson> irl: That would be super handy. I can't promise anything though.
15:57:52 <irl> ok
15:57:54 <gaba> hopefully very soon :)
15:57:56 <dennis_jackson> I.e. if you one already. please do send it, but don't make one or spend time on it
15:58:05 <dennis_jackson> if you have one already*
15:58:21 <dennis_jackson> gaba: fingers crossed then :)
15:58:35 <irl> i have all the things on Post-it notes, i can put them in a doc
15:58:42 <karsten> approaching the hour. anything else for today?
15:58:54 <karsten> irl: please copy me, if you write something. curious.
15:59:04 <irl> nothing else from me
15:59:06 <irl> will do
15:59:07 <dennis_jackson> irl: that would be great then
15:59:10 <dennis_jackson> nothing else from me
15:59:28 <karsten> thanks, everyone! talk to you next week!
15:59:32 <karsten> #endmeeting