Telemetry (WAS: Attempt at a real world benchmark)

Fri Dec 9 14:52:26 UTC 2016

>> It could tell us which language features are most used.
>
> Language features are hard if they are not available in separate libs. 
> If in libs, then IIRC Debian is packaging those in separate packages,
> again you can use their package contest. 

What in particular makes them hard? Sorry if this seems like a stupid
question to you; I'm just not that knowledgeable yet. One reason I can
think of is attribution: did the developer turn on the extension
themselves, or is it only used in a lib or template? That should be easy
to solve with a source hash, right? The source hash itself might need a
bit of thought, though. Maybe it should not be a hash of a source file,
but of the parse tree.
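
To make the difference concrete, here is a minimal sketch. It uses
Python's `ast` module purely as a stand-in for a compiler's parse tree;
GHC would have to hash its own AST, and the function names are made up
for illustration. The point is only that a parse-tree hash ignores
layout and comments, while a source-file hash does not.

```python
import ast
import hashlib


def source_hash(source: str) -> str:
    """Hash the raw source text (hypothetical helper)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def parse_tree_hash(source: str) -> str:
    """Hash a canonical dump of the parse tree instead of the text."""
    tree = ast.parse(source)
    # With include_attributes=False the dump carries no line or column
    # numbers, and comments never reach the tree, so layout is ignored.
    canonical = ast.dump(tree, annotate_fields=True, include_attributes=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = "def f(x):\n    return x + 1\n"
reformatted = "# a comment\ndef f( x ):\n\n    return x+1\n"
changed = "def f(x):\n    return x + 2\n"

# Reformatting changes the source hash but not the parse-tree hash.
print(source_hash(original) == source_hash(reformatted))
print(parse_tree_hash(original) == parse_tree_hash(reformatted))
# A real change to the code changes the parse-tree hash as well.
print(parse_tree_hash(original) == parse_tree_hash(changed))
```

Whether renamed variables or reordered declarations should also map to
the same hash is exactly the "bit of thought" part; this sketch does not
try to answer that.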

>> The big issue is (a) design and implementation effort, and (b) 
>> dealing with the privacy issues. I think (b) used to be a big deal, 
>> but nowadays people mostly assume that their software is doing 
>> telemetry, so it feels more plausible.  But someone would need to 
>> work out whether it had to be opt-in or opt-out, and how to actually 
>> make it work in practice.
>
> Privacy here is a complete can of worms (keep in mind you are dealing
> with a lot of different legal systems); I strongly suggest not to even
> think about it for a second. Your note "but nowadays people mostly
> assume that their software is doing telemetry" may perhaps be true in
> the sick mobile apps world, but I guess it is not true in the world of
> developing secure and security-related applications for either server
> usage or embedded.

My first reaction to "nowadays people mostly assume that their software 
is doing telemetry" was to amend it with "* in the USA" in my mind. But 
yes, mobile is another place. Nowadays I do assume that most software
has some sort of phone-home feature, but only because looking for one is
on my to-do list whenever I first configure something. Note that I am
using "phone home" instead of "telemetry" because some companies hide it 
in "check for updates" or mix it with some useless "account" stuff. 
Finding out where it's hidden and how much information they give about 
the details tells a lot about the developers, as does opt-in vs opt-out. 
Therefore it can be a reason to not choose a piece of software or even 
an ecosystem after a first try. (Let's say an operating system almost 
forces me to create an online account on installation. That not only 
tells me I might not want to use that operating system, it also sends a 
marketing message that the whole ecosystem is potentially toxic to my 
privacy because they live in a bubble where that appears to be 
acceptable.) So I do have that aversion even in non-security-related 
contexts.

I would say people are aware that telemetry exists, and developers in 
particular. I would also say developers are aware of the potential 
benefits, so they might be open to it. But what they care and worry 
about is /what/ is reported and how they can /control/ it. Software 
being Open Source is a huge factor in that, because they know that, at 
least in theory, they could vet the source. But the reaction might still 
be very mixed – see Mozilla Firefox.

My suggestion would be a solution that gives the developer the feeling 
of making the choices, and puts them in control. It should also be 
compatible with configuration management so that it can be integrated 
into company policies as easily as possible. My suggestions would
therefore be:

  * Opt-In. Nothing takes away the feeling of being in control more than
    perceived "hijacking" of a device with "spyware". This also helps
    avoid legal problems because the users or their employers now
    carry the responsibility.

  * The switches to turn it on or off should be in a configuration file.
    There should be several staged configuration files: one for a
    project, one for a user, one system-wide. This is for compatibility
    with configuration management. Configurations higher up the
    hierarchy override ones lower in the hierarchy, but they can't force
    telemetry to be on, at least not the sensitive kind.

  * There should be several levels, or a set of options that can be
    switched on or off individually, for fine-grained control. All
    should be very well documented. Once integrated and documented,
    what an option reports can never change without also changing the
    configuration flag that switches it on.

There still might be some backlash, but a careful approach like this
could put most minds at ease.
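
The staged configuration suggested above could be sketched roughly like
this. Everything specific here is an assumption made for illustration:
the option names, which options count as "sensitive", the ordering
system > user > project, and the rule that an explicit "off" in any
layer acts as a veto for sensitive options. It is not an existing GHC
feature or file format.

```python
from typing import Dict, List

# Hypothetical option names; which ones are "sensitive" is an assumption.
ALL_OPTIONS = {"language_extensions", "source_hashes", "performance_data"}
SENSITIVE = {"source_hashes", "performance_data"}

Layer = Dict[str, bool]


def effective_settings(layers: List[Layer]) -> Dict[str, bool]:
    """Merge layers given from highest (system) to lowest (project)."""
    result = {}
    for option in sorted(ALL_OPTIONS):
        values = [layer[option] for layer in layers if option in layer]
        if not values:
            # Opt-in: an option nobody mentions stays off.
            result[option] = False
        elif option in SENSITIVE:
            # Any layer may veto; a higher layer cannot force it on.
            result[option] = all(values)
        else:
            # Otherwise the highest layer that sets the option wins.
            result[option] = values[0]
    return result


system = {"language_extensions": True, "source_hashes": True}
user = {"source_hashes": False}
project = {"language_extensions": False, "performance_data": True}

settings = effective_settings([system, user, project])
for name, enabled in settings.items():
    print(name, "on" if enabled else "off")
```

In this example the system-wide file overrides the project for the
harmless option, but it cannot switch `source_hashes` back on once the
user has turned it off.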

If you are worried that we might get too little data this way, here's 
another thought, leading back to performance data: The most benefit in 
that regard would come from projects that are built regularly, on 
different architectures, with sources that can be inspected and with an 
easy way to get diffs. In other words, projects that live on GitHub and
Travis CI anyway. Their maintainers should be easy to convince to set that
little switch to "on".

Regards,
MarLinn
