[Haskell-community] 2018 state of Haskell survey results

Gershom B gershomb at gmail.com
Mon Nov 19 05:06:07 UTC 2018


On Sun, Nov 18, 2018 at 11:20 PM Richard Eisenberg <rae at cs.brynmawr.edu> wrote:
>
> I have not analyzed the data myself, but I wonder how we jumped to the conclusion that the troll was trying to promote Stack. Is there statistical data that supports that conclusion? For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.

Roughly 90% of the bogus responses disliked the new GHC release
schedule and 10% left the answer blank. As far as I know, 100% of the
bogus responses said they used stack exclusively. The answers to
almost every other question (except, I think, for targeted platform?)
varied significantly (although for the most part they followed
uniform, linear, or normal distributions). So as guesses go, this
seems pretty strong.
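
The kind of check described above (one answer that never varies across
the suspected responses while the others do) could be sketched roughly
as follows. This is only an illustration: the field names and sample
rows are hypothetical and are not taken from the survey data or from
the script that was actually used to filter it.

```python
from collections import Counter


def dominant_share(rows, field):
    """Return (most common answer, its share) for one field across rows."""
    counts = Counter(row.get(field, "") for row in rows)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(rows)


def constant_fields(rows, threshold=1.0):
    """List fields whose most common answer reaches the threshold share."""
    fields = sorted({key for row in rows for key in row})
    return [f for f in fields if dominant_share(rows, f)[1] >= threshold]


# Hypothetical suspected responses, made up purely for illustration.
suspected = [
    {"build_tool": "stack", "release_schedule": "dislike", "editor": "vim"},
    {"build_tool": "stack", "release_schedule": "dislike", "editor": "emacs"},
    {"build_tool": "stack", "release_schedule": "", "editor": "vscode"},
]

# Fields on which every suspected response gave the same answer.
print(constant_fields(suspected))
```

On this made-up sample only build_tool is reported as constant, while
release_schedule is merely skewed and editor varies freely. A field
that is constant across thousands of otherwise varied responses is the
sort of signal the paragraph above relies on.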

I will also say that, though there's speculation about "false flags"
and other silliness floating around, I personally have a very good
guess as to who did this. There's one well-known troll who has these
preoccupations, is known for creating serial sockpuppet accounts, and
is just the right amount of obsessed to do something like this. A few
of the bogus responses actually had comments, and those comments were
all written in a voice that was unmistakably this troll's as well.
Occam's razor seems to apply.

Finally, let me add why I don't think this was a "false flag" -- while
there were enough telltale markers that the fake answers could be
detected, I don't think this was on purpose. There was _too much_
effort put into the distributions of other choices, etc. If they had
wanted the fakes to be detected they would have left much stronger
evidence. Rather, from a forensic standpoint, it seems pretty clear to
me that the pattern of data is that of someone _trying_ to cover their
tracks, but making four or five errors which I could assemble into a
pattern. If they hadn't made those errors -- likely based on bad
priors about what the organic data would look like, which theirs would
need to "mesh" into -- then I think the deception would have been much
harder to detect.

--Gershom

> Given the contention around cabal vs stack, I agree that sociological concerns suggest that the troll meant to tilt those scales. But I wouldn't want a public accusation without at least some statistical analysis that independently supports that conclusion.
>
> In any case, thanks to all for putting this together!
>
> Richard
>
> On Nov 18, 2018, at 4:31 PM, Taylor Fausak <taylor at fausak.me> wrote:
>
> Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.
>
> https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c492eb6229a1b5c7/_posts/2018-11-18-2018-state-of-haskell-survey-results.markdown
>
>
> On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
>
> The language extensions section doesn’t appear to be sorted properly. Outside of that, I think that these results are looking much better and any effort to find any additional outliers is probably not worth it for the moment. Thanks for your work on this, and I appreciate you being responsive and attentive when problems with the data were pointed out. There’s certainly some interesting and helpful information to be gleaned from this data.
>
> Cheers,
> Gershom
>
> On November 18, 2018 at 2:55:10 PM, Taylor Fausak (taylor at fausak.me) wrote:
>
> Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.
>
> https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4ce6a82a8b1a73f/_posts/2018-11-18-2018-state-of-haskell-survey-results.markdown
>
>
> On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
>
> Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
>
> On 18 Nov 2018, at 21:17, Taylor Fausak <taylor at fausak.me> wrote:
>
> Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.
>
> That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.
>
>
> On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
>
> What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, and said that a new survey would be conducted early next year?
>
> The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.
>
>
> Maybe the Stack and Cabal questions could be split into separate surveys, conducted by their maintainers through their own channels?
>
> Not sure how much value there is in exact numbers of users of Stack or Cabal. Both groups are large enough. The maintainers of both tools are aware of the usage stats.
>
> Is either library likely to be influenced by this survey?
> _______________________________________________
> Haskell-community mailing list
> Haskell-community at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-community