16:00:17 <onyinyang[m]> #startmeeting tor anti-censorship meeting
16:00:17 <MeetBot> Meeting started Thu Nov  9 16:00:17 2023 UTC.  The chair is onyinyang[m]. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:28 <onyinyang[m]> hello everyone!
16:00:28 <onyinyang[m]> here is our meeting pad: [https://pad.riseup.net/p/tor-anti-censorship-keep](https://pad.riseup.net/p/tor-anti-censorship-keep)
16:00:34 <cohosh> hi
16:00:36 <meskio> hello
16:00:36 <shelikhoo> hi~
16:00:58 <onyinyang[m]> sorry for the late start, I blame DST
16:01:08 <shelikhoo> Do we want to try a private pad now, or try that in the next meeting?
16:01:20 <shelikhoo> private=read only for public
16:01:38 <meskio> shelikhoo: let's explain a bit the problem:
16:01:39 <onyinyang[m]> since we only discussed it last week and didn't land on a solution, perhaps it would be best to discuss it fully in this meeting?
16:01:56 <meskio> we have our pad vandalized regularly and have to recover it manually
16:02:20 <meskio> we could try to have a public link that is read only and share the edit link with the people that usually participate in this meeting
16:02:30 <meskio> riseup pads do support that
16:02:46 <meskio> ahh, it was discussed last week, sorry I was not around
16:02:59 <meskio> I think we should give it a try next week
16:04:44 <onyinyang[m]> we discussed the issue last week but we were going to look into some things over the course of this week
16:05:10 <meskio> ahh, cool
16:05:11 <onyinyang[m]> I think there has been some discussion here: https://gitlab.torproject.org/tpo/community/hackweek/-/issues/16#note_2964041
16:05:39 <onyinyang[m]> I'm not sure if anything has been finalized yet though as I haven't been following the issue very closely since I posted on it
16:05:54 <meskio> I'm ok changing tools if needed, but we might not even need to do that, the only *problem* I see is that read-only links in riseup pads are not human friendly
16:06:12 <dcf1> I never knew about the read-only share link, thanks. There it is right in the toolbar.
16:06:30 <meskio> dcf1: exactly, it's in the share icon
16:06:47 <shelikhoo> yes, we also had a look at other tools, and etherpad with read-only is the step that makes the least change and still fulfills our requirement
16:07:44 <meskio> onyinyang[m]: in hackweek#16 we didn't reach any conclusions, more like exploring things, and I've been experimenting with pad backups
16:07:57 <onyinyang[m]> ok so is the proposed course of action: read-only links for riseup pad shared publicly with edit links for regular attendees starting next week
16:08:30 <onyinyang[m]> and then possibly move to one of the other ideas (etherpad/cryptpad) if that doesn't work as expected?
16:09:01 <shelikhoo> I think this is the right move...
16:09:20 <meskio> sounds good, I think I'm next week's facilitator, I can take care of setting it up
16:09:21 <cohosh> sounds good to me
16:09:29 <onyinyang[m]> sounds reasonable to me :)
16:10:18 <onyinyang[m]> ok so we're a bit out of order, but that's fine. Going back up to the top, we have a Fastly discussion point:
16:10:21 <onyinyang[m]> Fastly to block domain fronting in February 2024          https://lists.torproject.org/pipermail/anti-censorship-team/2023-October/000328.html
16:10:48 <onyinyang[m]> hmm, sorry about that formatting
16:10:55 <meskio> not sure if there is anything concrete to discuss on that topic, but I added a related topic:
16:11:00 <onyinyang[m]> this is the link: https://lists.torproject.org/pipermail/anti-censorship-team/2023-October/000328.html
16:11:05 <meskio> azure is giving a new date for the domain front closing
16:11:09 <meskio> January 8
16:11:19 <meskio> https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/33#note_2963884
16:11:24 <meskio> it was supposed to be yesterday
16:11:35 <onyinyang[m]> right
16:12:15 <meskio> I guess this is in the hands of cohosh to investigate and we might need to wait for her results to react
16:12:29 <cohosh> i don't have any updates yet
16:13:05 <meskio> we still have a couple of months
16:13:35 <meskio> EOF from my side
16:14:17 <cohosh> same from me
16:14:51 <onyinyang[m]> ok I think that's all of the discussion points
16:15:29 <onyinyang[m]> There are a couple of interesting links, namely video recordings from FOCI & PETS 2023
16:15:58 <onyinyang[m]> and a forum post about Snowflake
16:16:05 <onyinyang[m]> https://forum.torproject.org/t/snowflake-daily-operations-october-2023-update/10106
16:16:46 <onyinyang[m]> is there anything in particular anyone would like to mention about any of those?
16:17:37 <onyinyang[m]> If not, we can move to the reading group discussion
16:18:21 <onyinyang[m]> Ok, let's move on.
16:18:39 <onyinyang[m]> The paper we decided to discuss today is: On Precisely Detecting Censorship Circumvention in Real-World Networks
16:18:53 <onyinyang[m]> It can be found here: https://www.robgjansen.com/publications/precisedetect-ndss2024.html
16:18:59 <rwails> For the reading group.. hi! Rob Jansen and I (Ryan Wails) are here to aid with discussion :)
16:19:07 <robgjansen[m]> 👋
16:19:11 <cohosh> welcome!
16:19:14 <dcf1> oh great you're here
16:19:16 <shelikhoo> hi~ welcome!
16:19:21 <meskio> nice to have you around, congrats on the paper, it's pretty good
16:19:25 <onyinyang[m]> Great! Thank you for coming :)
16:21:11 <dcf1> So this paper takes another look at past work that has claimed to be able to classify circumvention traffic with high precision
16:21:22 <dcf1> notably Wang et al. 2015 https://censorbib.nymity.ch/#Wang2015a
16:22:01 <dcf1> The biggest problem, of course, is the base rate: since circumvention flows are only a very small proportion of traffic, classifiers need to have very low false positives
16:23:06 <dcf1> In this work, they use notation with a λ to quantify the traffic mix. λ is how many non-circumventing flows there are for each circumventing flow. λ = 1 is an equal mix. λ = 100 means roughly 1% of flows are circumventing.
16:23:31 <dcf1> Take a look at Table I on page 6 https://www.robgjansen.com/publications/precisedetect-ndss2024.pdf#page=6
16:24:16 <dcf1> It shows how precision/recall figures that look rosy at λ = 1 become pretty dire at even λ = 1000 (which is still probably wildly high compared to real-world conditions)
16:25:00 <dcf1> They build hand-tuned classifiers that are better than Wang et al.'s, and then a deep learning classifier that does even better.
16:25:26 <dcf1> But even that, they say, has too many false positives to be useful at λ > 10,000 or so.
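The base-rate effect described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: it assumes one circumventing flow for every λ non-circumventing flows, and the function names and the true/false positive rates (0.99 and 0.001) are made-up illustration values, not the paper's measurements.

```python
def circumventing_fraction(lam: float) -> float:
    """Share of all flows that are circumventing when there are
    `lam` non-circumventing flows per circumventing flow."""
    return 1.0 / (1.0 + lam)


def precision_at_lambda(tpr: float, fpr: float, lam: float) -> float:
    """Precision of a per-flow classifier under traffic mix `lam`.

    Per circumventing flow we expect `tpr` true positives and,
    from the `lam` non-circumventing flows, `fpr * lam` false positives.
    """
    true_pos = tpr * 1.0
    false_pos = fpr * lam
    if true_pos + false_pos == 0:
        return 0.0  # classifier never fires; define precision as 0
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the trend.
    for lam in (1, 100, 1000, 10_000):
        p = precision_at_lambda(tpr=0.99, fpr=0.001, lam=lam)
        print(f"lambda={lam:>6}  fraction={circumventing_fraction(lam):.4%}  precision={p:.3f}")
```

With these illustrative rates, precision is already below one half at λ = 1000, even though the classifier would look nearly perfect at λ = 1.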
16:26:06 <dcf1> So to mitigate the low per-flow precision, they propose "host-based analysis" (Section VI), where you watch multiple flows to the same IP address over time.
16:26:41 <dcf1> Snowflake naturally mitigates this kind of host-based analysis (it is believed), because of the way the proxies are not at consistent IP addresses.
16:26:52 <dcf1> That's the end of my summary.
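The idea of watching multiple flows to the same host over time can be sketched as follows. This is an illustrative stand-in, not the procedure from Section VI of the paper: it assumes a host is keyed by (IP, port), that a per-flow classifier has already produced a yes/no decision for each flow, and that a host is flagged only when enough flows were seen and a large enough share of them were positive. The name `flag_hosts` and both thresholds are hypothetical.

```python
from collections import defaultdict


def flag_hosts(flow_decisions, min_flows=5, min_positive_fraction=0.8):
    """Aggregate per-flow decisions by host and return the flagged hosts.

    `flow_decisions` is an iterable of ((ip, port), is_positive) pairs.
    Both thresholds are assumptions made for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # host -> [positives, total]
    for host, is_positive in flow_decisions:
        counts[host][0] += int(is_positive)
        counts[host][1] += 1
    return {
        host
        for host, (positives, total) in counts.items()
        if total >= min_flows and positives / total >= min_positive_fraction
    }


if __name__ == "__main__":
    bridge = ("192.0.2.1", 443)       # documentation-range addresses
    website = ("198.51.100.7", 443)
    flows = [(bridge, True)] * 9 + [(bridge, False)]
    flows += [(website, True)] + [(website, False)] * 9
    print(flag_hosts(flows))
```

A single false positive on the website does not get it flagged, which is the point of aggregating: per-flow errors are tolerated as long as they do not dominate a host's traffic. A host whose address or port changes for every connection never accumulates `min_flows` observations under this rule.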
16:27:08 <meskio> the nice thing about the conclusions is that it's not only snowflake that mitigates that
16:27:10 <onyinyang[m]> Thanks for that great summary dcf1 !
16:27:22 <meskio> also our new PTs: conjure and webtunnel do mitigate it
16:27:28 <meskio> conjure by using ephemeral hosts
16:27:29 <dcf1> Besides the general research, this paper is interesting to us, because it looks specifically at obfs4 (and a hacked entropy-reduced obfs4 called obfs4*) as well as Snowflake rendezvous and data transfer.
16:27:39 <meskio> and webtunnel by having other traffic on the same ip:port
16:28:55 <dcf1> By the way, I wrote to the authors of "Covertness Analysis of Snowflake Proxy Request" (https://ieeexplore.ieee.org/document/10152736) that we linked a few weeks ago
16:29:10 <dcf1> and asked what their effective λ was in evaluation
16:29:41 <shelikhoo> I think there is another paper, being published, that shows traffic shape analysis may be able to identify TLS-in-TLS traffic, as in the case of webtunnel
16:29:51 <dcf1> they said λ = 3.97 (4100 negative to 1032 positive)
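The reported λ follows directly from the two counts. A small sketch of the arithmetic, where the helper name `traffic_mix` is hypothetical:

```python
def traffic_mix(negatives: int, positives: int) -> tuple[float, float]:
    """Return (lambda, fraction of flows that are positive)."""
    lam = negatives / positives
    positive_fraction = positives / (negatives + positives)
    return lam, positive_fraction


lam, frac = traffic_mix(negatives=4100, positives=1032)
print(f"lambda = {lam:.2f}, circumventing fraction = {frac:.1%}")
# prints: lambda = 3.97, circumventing fraction = 20.1%
```

In other words, about one flow in five in that evaluation was a positive example, far more than the mixes of λ = 1000 and above discussed for real networks.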
16:30:28 <cohosh> i'm not sure in practice how ephemeral conjure hosts are
16:31:10 <cohosh> theoretically, the unused IP address space changes, but in practice it might not be enough or a large enough space if this technique were deployed
16:31:17 <dcf1> shelikhoo: yes, good point. In fact the DL classifier in this paper does not use all the features it conceivably could: it just uses traffic sizes and directions.
16:31:34 <dcf1> V-E: "It is perhaps surprising that the CNN classifiers outperform the classical approaches using only packet sizes and directions."
16:32:11 <meskio> cohosh: in this paper the researchers call a host the combination of IP+port, so as long as conjure uses different ports it might work
16:32:31 <meskio> snowflake is nice there as it does use different port per connection
16:32:32 <shelikhoo> from real users' feedback, I am aware that many users are reporting websocket-tls-vmess traffic blocked by IP address or port, which reinforces the idea of censors using host-based censorship analysis
16:33:01 <shelikhoo> I was not aware of a way to reliably reproduce such censorship
16:33:22 <cohosh> at the moment, conjure uses fixed ports, but it could depend on the transport being used
16:33:38 <cohosh> if they eventually support a webrtc transport, for example, this could change
16:33:59 <meskio> ohh, I see, it might be a nice thing to improve...
16:34:14 <shelikhoo> and this is one of the challenges of dealing with host-based analysis and censorship: the difficulty of reproducing it reliably in a real-world environment
16:34:19 <rwails> it may not be necessary for a censor to use ports, btw, but it was the easiest way for us to isolate flows; for the host-based analysis we proposed to work, the censor needs a reliable way to capture sets of flows corresponding to one protocol/activity
16:35:11 <rwails> (but the censor is not limited to such choices)
16:35:46 <meskio> sure, but then having hosts that have other services will make the censor's life harder
16:35:54 <rwails> right
16:35:57 <meskio> like if I host an obfs4 bridge in a server that I also have a website
16:36:35 <rwails> yes, if the censor looks across flows only for the IP address, then the website flows may serve as confusion
16:36:55 <shelikhoo> At the same time, creating a website does increase the cost of creating a bridge....
16:37:10 <dcf1> There is a piece of somewhat related research, coming out of a Chinese research lab in 2020
16:37:19 <dcf1> https://ieeexplore.ieee.org/document/9408011 "Towards Aggregated Features: A Novel Proxy Detection Method Using NetFlow Data"
16:38:24 <dcf1> They use NetFlow data, which is aggregated and fairly information-poor. To compensate for the low quality of the data, they look at multiple measurements of the same IP/port tuple over time, like the host-based analysis of this paper.
16:38:36 <dcf1> > Although NetFlow data is widely available today, it also brings about some challenging problems. The main reason is that NetFlow data is collected by sampling, resulting in the inability to obtain comprehensive information. Meanwhile, the statistical attributes of the sampled data lose original representation meaning. Therefore, we adopt NetFlow data aggregation method to overcome the challenges
16:38:42 <dcf1> imposed by using NetFlow data to achieve better proxy detection effect. In addition, through the approach, we aggregate statistics across multiple flows, which is not possible in a single flow.
16:38:46 <dcf1> > In order to deal with this problem, we design effective features from raw NetFlow data by data aggregation. In order to extract the aggregated features, we should first select the aggregation key and aggregation time window. In this paper, the IP (source IP/destination IP) is treated as a keyword, and the choice of the appropriate time window will be discussed in depth in section III-F2. Then all
16:38:52 <dcf1> NetFlow records with the key IP in the time window are aggregated to generate a feature vector.
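The aggregation step described in the quoted passage (pick an aggregation key and a time window, then combine all records sharing that key and window into one feature vector) can be sketched as below. This is only an illustration under assumptions: the record fields, the 300-second window, the choice of destination IP as the key, and the four summary features are all hypothetical and are not the features used in the cited paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """A simplified, hypothetical NetFlow-style record."""
    timestamp: float  # seconds
    dst_ip: str
    n_bytes: int
    n_packets: int


def aggregate(records, window_seconds=300.0):
    """Group records by (dst_ip, window index) and summarize each group."""
    buckets = defaultdict(list)
    for r in records:
        window = int(r.timestamp // window_seconds)
        buckets[(r.dst_ip, window)].append(r)

    features = {}
    for key, group in buckets.items():
        total_bytes = sum(r.n_bytes for r in group)
        total_packets = sum(r.n_packets for r in group)
        features[key] = {
            "flows": len(group),
            "bytes": total_bytes,
            "packets": total_packets,
            "mean_bytes_per_flow": total_bytes / len(group),
        }
    return features


if __name__ == "__main__":
    sample = [
        Record(10.0, "203.0.113.5", 1000, 10),
        Record(20.0, "203.0.113.5", 3000, 30),
        Record(310.0, "203.0.113.5", 500, 5),
    ]
    for key, vec in sorted(aggregate(sample).items()):
        print(key, vec)
```

The first two sample records fall in window 0 and are combined; the third falls in window 1 and forms its own feature vector. Statistics such as the flow count only exist after aggregation, which matches the passage's remark that aggregating gives statistics not available from a single flow.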
16:39:20 <rwails> oh that's really interesting
16:40:33 <dcf1> Tunneled traffic features are something we are going to have to start paying more serious attention to. Now that we have more empirically informed threat models for it, we can do better than the best-effort attempts of ScrambleSuit and obfs4.
16:41:54 <shelikhoo> one of the quick fixes for this issue is traffic multiplexing, just tunneling more than one payload connection in a proxy connection
16:42:06 <rwails> maybe related -- here's a link to new work appearing at ACM SIGCOMM where they develop efficient classifiers able to detect tunneled connections: <https://dl.acm.org/doi/10.1145/3603269.3604840>
16:42:06 <dcf1> I have a vague concept of introducing a list of traffic shaping "challenges" to encourage developers to start making the changes to their programs that will be necessary for more sophisticated traffic shaping
16:42:06 <shelikhoo> this does create other issue such as performance
16:43:14 <dcf1> My thinking is, there's currently a chicken-and-egg problem: current tools don't support arbitrary shaping, and also no one knows what a "good" shape should be, even if you could achieve it
16:44:13 <dcf1> My idea was to propose challenges that are admittedly not "good", but still provide something to target to make changes that have to be made anyway. And then, with better tool support, the community is in a better position overall to do experiments and find that "good" family of traffic schedules.
16:44:41 <dcf1> I have ambitions to write up something more complete about it, but I did post a sketch to give you an idea: https://github.com/net4people/bbs/issues/281#issuecomment-1724755111
16:45:12 <shelikhoo> https://github.com/3andne/restls/blob/main/Restls-Script%3A%20Hide%20Your%20Proxy%20Traffic%20Behavior.md
16:45:41 <shelikhoo> I think there are some tools that are trying to achieve scriptable padding
16:45:43 <dcf1> rwails, robgjansen[m]: my intuition is that, for example, if you had something like obfs4, but the server does the first send, rather than the client, it would confuse classifier based on direction/size. Is that right?
16:46:35 <rwails> yes, I do think these DL classifiers are fairly sensitive to perturbations like that
16:46:59 <meskio> robgjansen[m]: could proteus do this kind of thing?
16:47:06 <robgjansen[m]> i would expect some resilience to a single bit-flip though
16:47:13 <rwails> we found that obfs4 had other identifiable features, specifically in the size of packets sent, but I think it would help
16:47:25 <dcf1> shelikhoo: because yeah, the packet-at-a-time padding/chopping is not good enough, we're learning
16:48:18 <robgjansen[m]> yeah we aim to generate protocols in proteus (github.com/unblockable/proteus) that can do many different handshake patterns including server-sends-first
16:48:20 <rwails> like Rob is alluding to though, I would imagine it would be easy to re-train the classifier to learn features that are more robust if the censor was able to collect flows from the modified obfs4 instance, if only bit flips were considered
16:49:26 <cohosh> the strength of proteus being that you can more easily provide a moving target of the features you'd need to learn, right?
16:49:37 <robgjansen[m]> yeah any changes to the PTs will always be public and hence the adversary can always retrain
16:49:42 <meskio> dcf1: does it make sense to add some of that to our research ideas wiki page? https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Research-ideas
16:49:57 <robgjansen[m]> so i think a very important design point is being able to adapt quickly
16:50:16 <dcf1> meskio: I will write it up at some point.
16:50:17 <shelikhoo> or procedurally generated protocols, so that it is difficult to ban them all
16:50:18 <rwails> cohosh: that's the idea :) make identifiable features tweakable
16:50:36 <meskio> dcf1: thanks :)
16:50:44 <dcf1> robgjansen[m]: well, adapt quickly, or else match something valuable to the censor so well that it doesn't get blocked.
16:50:54 <robgjansen[m]> hopefully both
16:51:05 <shelikhoo> yeah..
16:51:07 <dcf1> It's the same as in the "Grounding Circumvention in Empiricism": the two strategies are polymorphic and steganographic
16:51:23 <dcf1> same for traffic signature as for protocol payload features, as I see it.
16:51:56 <dcf1> But yeah, that's part of my idea, every time this topic comes up, everyone gets bogged down in talking about specifics, and no progress is made.
16:52:20 <dcf1> I'm hoping to cut through that impasse by giving developers some concrete targets, even if those targets are not directly useful for circumvention.
16:52:58 <robgjansen[m]> also not all censors behave the same so while we maybe only have some types of protocols that work in one country (eg because those protocols match well to something and have high collateral damage), in other countries maybe a wider set of protocols are still useful
16:53:22 <dcf1> good point robgjansen[m]
16:55:04 <meskio> yes, that will make an interesting challenge on how we classify the protocols that work in each country to make sure to distribute working ones in each country
16:55:22 <dcf1> robgjansen[m], I remember you were in the audience for Wang et al. 2015 at CCS (so was I)
16:55:34 <meskio> or we could just ignore that problem completely and head towards the lox idea of testing bridges and know what is blocked where without checking what protocol is there
16:56:15 <dcf1> And you asked a question afterward like, "so all these systems are totally broken, what do we do now?"
16:56:35 <robgjansen[m]> i think protocol adaptation to find the ones that work best in a target environment, and then using a non-traffic shaped approach like obfs4 still has legs, but the next step will be traffic shaping as mentioned earlier. traffic shaping is more complicated but definitely on our research plan. i do wonder how long it will take censors to just move to allowlisting and i fear we are accelerating toward that point.
16:57:57 <robgjansen[m]> > "so all these systems are totally broken, what do we do now?"
16:57:57 <robgjansen[m]> Such a good question :D
16:57:58 <shelikhoo> let's say in Turkmenistan, they are blocking entire /24 when there are proxies discovered in that ip range
16:58:33 <shelikhoo> as a result, there are less and less ip ranges that are reachable
16:58:39 <dcf1> sure, but Turkmenistan is hardly a representative example
16:59:15 <shelikhoo> yes, it was just an extreme example of what could happen in an allowlist future
16:59:24 <cohosh> i had a followup question about the utility of webtunnel (based on HTTPT) against host-based attacks
16:59:25 <dcf1> Allowlisting has a lot of downsides for a censor, it's not free to "just" move to allowlisting. It's only feasible in an environment like TM, where the network is so little valued they don't care about breaking it further.
16:59:37 <shelikhoo> that being said I have no idea how long or if that will actually happen in other regions
17:00:06 <cohosh> if you have two (or maybe more) different shapes of traffic going to the same ip:port, can this attack be adapted to deal with that?
17:00:39 <cohosh> and how much of the benign, probe-resisting traffic would you need to throw it off?
17:00:48 <dcf1> As in a lot of activism, we have to contend with our opponent's degree of sociopathy
17:01:08 <dcf1> And it's probably true that "the censor can stay sociopathic longer than you can remain solvent" :D
17:01:47 <robgjansen[m]> sure it could possibly learn the two modes very tightly, and maybe even learn that those are actually two protocols on the same ip:port.
17:01:47 <shelikhoo> dcf1: yes, i agree with you that allowlist based censorship is a costly move, so it is not like it is imminent
17:02:05 <robgjansen[m]> I think that's what the ggfast paper is doing maybe?
17:02:36 <cohosh> what is the ggfast paper?
17:02:46 <meskio> one misfeature of webtunnel is that λ might actually be 1 or lower for the traffic into that host from that censored country
17:02:54 <dcf1> ggfast https://dl.acm.org/doi/10.1145/3603269.3604840
17:02:57 <rwails> cohosh: I think that would help if the censor is using a classifier that is tuned to only one of the traffic shapes. We did think about possibly adapting a host-based aggregation method that allows for background traffic to exist, but it's hard to be confident that a host is participating in circumvention if there is a large fraction of flows that are not doing so
17:03:08 <cohosh> thanks for the link
17:03:45 <cohosh> one thing about webtunnel is that the probe-resistant shape can be modified without involving a protocol update in the client
17:04:08 <cohosh> it's potentially more adaptable than proteus because you don't need to ship a new bridge line with the new spec
17:04:20 <cohosh> but getting traffic to it is the tricky part
17:04:31 <cohosh> it's just there if a censor is probing it, not for regular use
17:05:08 <cohosh> which is what meskio mentioned above, i think
17:05:26 <rwails> is there a link you can drop where we can read more about webtunnel?
17:05:48 <meskio> I mean, most people I expect to host webtunnels in real websites, but those websites might not have much traffic from the censored places...
17:06:00 <cohosh> rwails: it's based on HTTPT https://censorbib.nymity.ch/#Frolov2020b
17:06:01 <shelikhoo> webtunnel is an alias of HTTPT
17:06:18 <rwails> oh ok, got it, thanks :)
17:06:20 <meskio> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/webtunnel/
17:06:50 <shelikhoo> although I am not so sure we can update its shape without an update to the client when it is used as a proxy
17:07:20 <shelikhoo> we are able to send traffic to the website it is fronting with
17:07:20 <cohosh> shelikhoo: not the shape of the circumvention flows to it, the shape of the benign flows
17:07:32 <shelikhoo> cohosh: yes
17:08:28 <dcf1> webtunnel forum posts (incl. setup guide) https://forum.torproject.org/t/tor-relays-announcement-webtunnel-a-new-pluggable-transport-for-bridges-now-available-for-deployment/8180
17:08:32 <dcf1> https://forum.torproject.org/t/call-for-testers-webtunnel-a-new-way-to-bypass-censorship-with-tor-browser/9855
17:08:44 <cohosh> i guess the shape is limited by changing the website it serves
17:10:13 <robgjansen[m]> cohosh: i don't understand your point about webtunnel enough to defend proteus the way i want, but i think the general point is either you ship a bunch of configuration choices ahead of time and have an algorithm on the client/server for choosing the right one, or you eventually have to update something
17:10:37 <robgjansen[m]> both are viable strategies for any PT i think
17:10:51 <cohosh> oh i didnt mean to say that webtunnel is strictly better than proteus
17:11:23 <shelikhoo> (BTW: research frontier in China is mostly about deniable anti-censorship like ShadowTLSv3/restls or throttle resistant proxy like hysteria2)
17:11:42 <cohosh> webtunnel is still limited in the adaptability of the circumvention protocol which seems like a pretty big shortcoming in light of this work
17:11:54 <cohosh> i was just trying to understand if its other features were useful here
17:12:52 <robgjansen[m]> ahh ok. yeah i think adaptability is huge in our game of cat-and-mouse. hopefully it's already on dcf1 's list of dev challenges :)
17:12:53 <shelikhoo> and tool to enable traffic shaping could be one of the next
17:14:00 <onyinyang[m]> it seems like the discussion is winding down a bit and we're ~15min over time
17:14:20 <onyinyang[m]> does anyone have any final thoughts or questions?
17:14:37 <onyinyang[m]> Otherwise, perhaps we can move further discussion to #tor-anticensorship:matrix.org ?
17:15:07 <cohosh> rwails: robgjansen[m]: this paper was really great
17:15:15 <cohosh> thanks for writing it and discussing with us
17:15:21 <dcf1> agreed, quality research
17:15:30 <dcf1> I appreciate the point you make in the conclusion:
17:15:35 <dcf1> "We focus on exploring realistic censorship adversaries *in service of understanding how to develop stronger CRSes*."
17:15:43 <meskio> yes, was a great paper and conversation
17:15:56 <shelikhoo> thanks for your work! nice paper!
17:16:30 <onyinyang[m]> yes! Great job and hopefully this will spur future research in helpful directions for censorship resistance!
17:16:56 <rwails> thanks! we're glad it was useful :) happy to have any follow up discussion too if there are any other questions
17:17:11 <robgjansen[m]> Thanks for the nice comments and great discussion! Of course we're available if you have any more questions later or want to discuss further.
17:18:26 <onyinyang[m]> Great, thank you both for joining in the discussion today :)
17:18:28 <onyinyang[m]> With that, I will end the meeting now
17:18:31 <onyinyang[m]> #endmeeting