<div dir="ltr">Hi Alex,<br><br>This is obviously highly ambitious if you want to get it right. If<br>you actually plan to start on this, then since there are plenty of<br>DSLs that eventually run on one machine, I would start with<br>the distributed part. I.e., make something that passes around an Int,<br>and have it deploy to any number of machines. Then gradually add<br>complexity, like distributed queues and workers, and ways to enforce<br>ordering on when results of work are to be<br>submitted/accepted. Implementing precedence graphs would be<br>interesting. Then there is limiting congestion and probably many more<br>kinds of limits you want to add to different points in the graph.<br><br>Just some random ideas. But start and deploy something very simple at<br>first.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 8, 2017 at 2:21 PM, Alex <span dir="ltr"><<a href="mailto:alex@centromere.net" target="_blank">alex@centromere.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

I'm seeking feedback for a project I'd like to start. I have a bit of<br>

experience developing large scale systems containing many<br>

microservices, databases, message queues, and caches over many VMs. Time<br>

and time again I find myself confronted with the same problems:<br>

<br>

1. It is difficult to trace events through the system: Consider an HTTP<br>

request made by a customer to a public API. Which microservices were<br>

impacted by that request? What SQL queries were run as a result of that<br>

request? What 3rd party APIs were consulted during the request's<br>

fulfillment? Answers to these questions are essential to fixing bugs<br>

quickly, and yet they are so difficult to answer (at least in my<br>

experience).<br>

<br>

2. Problems are difficult to reproduce: When Customer Success walks in<br>

and says, "I have an angry customer on the phone. They want to know why<br>

[FOO] wasn't properly [BAR]" it is often impossible to give an answer<br>

without interactive troubleshooting and hours of grepping through<br>

unstructured log files. Troubleshooting may incur additional expenses<br>

too, since (for instance) you may hit your API request limit for a 3rd<br>

party service.<br>

<br>

3. Business and non-business logic are not well encapsulated: Often I<br>

see code related to (for example) RabbitMQ interwoven with core business<br>

logic when calls need to be made to other microservices. The fact that<br>

RabbitMQ facilitates communication between microservices is an<br>

implementation detail that I shouldn't have to think about.<br>

<br>

4. Resource consumption is non-uniform: Some microservices are more<br>

demanding than others in terms of CPU, memory, and disk usage.<br>

Achieving optimal "packing" is difficult. In other words, some VMs<br>

will have a high load and others will remain idle. Auto scaling groups<br>

can help with this in theory, but I don't think they can achieve the<br>

kind of density I would like to see.<br>

   Moreover, what constitutes a "resource"? If a 3rd party service<br>

rate limits requests by IP address, couldn't each request be considered<br>

a resource unit which needs to be properly load balanced, just as you<br>

would with CPU?<br>

<br>

Given these motivations, I would like to flesh out some ideas for a<br>

framework/platform which addresses these issues. These ideas are<br>

half-baked and may not tie in well with one another.<br>

<br>

I envision a distributed system as follows:<br>

<br>

1. One kind of VM:<br>

    DevOps people have a saying: "Treat your VMs like cattle, not<br>

pets". In practice, "cattle" becomes "cows, chickens, pigs, and<br>

lobster". VMs typically have an assigned role, and they become part of<br>

a group which may or may not be auto-scaling. For a given instantiation<br>

of this hypothetical platform, I would like to see a single kind of VM.<br>

That is, every VM is identical to every other VM, and they all run the<br>

same Haskell application.<br>

<br>

2. Strict separation of business and non-business logic: The framework<br>

should handle all aspects of communication between nodes (like Cloud<br>

Haskell does) in a pluggable and transparent way, but that's not all.<br>

The framework should have first class support for other integrations<br>

(such as PagerDuty alerting, performance monitoring, etc) which are<br>

described below.<br>

<br>

3. Pool coordination via DSL: The entire pool of VMs is<br>

orchestrated/coordinated by one or more "scripts" written in a DSL,<br>

which is implemented as a Free Monad. Every single "operation" or<br>

"primitive" in your AST data type is Serializable, and when the<br>

framework interprets the DSL, it serializes the instruction and sends<br>

it over the network to a node for execution. The particular node on<br>

which the instruction gets executed is chosen by the platform, not the<br>

developer.<br>

<br>
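To make idea 3 concrete, here is a minimal sketch of a free-monad DSL whose<br>
primitives are serialized before execution. Everything in it is an assumption<br>
made for illustration: the names Instr, Result, Step, Script, fetchUser,<br>
sendEmail, runScript and executeOnNode are hypothetical, Show/Read stands in<br>
for a real wire format, and a plain function stands in for the network hop to<br>
a node chosen by the platform.<br>
<pre>
```haskell
-- Hypothetical sketch: all names below are illustrative, and Show/Read
-- stands in for a real serialization format.

-- A minimal free monad, defined locally so only `base` is needed.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- First-order description of one primitive; this is what goes on the wire.
data Instr
  = FetchUser Int
  | SendEmail String String
  deriving (Show, Read, Eq)

-- What a node sends back, also serializable.
data Result = UserName String | Done
  deriving (Show, Read, Eq)

-- One DSL step: an instruction plus what to do with its result.
data Step next = Step Instr (Result -> next)

instance Functor Step where
  fmap g (Step i k) = Step i (g . k)

type Script = Free Step

fetchUser :: Int -> Script String
fetchUser uid = Free (Step (FetchUser uid) k)
  where
    k (UserName n) = Pure n
    k other        = error ("unexpected result: " ++ show other)

sendEmail :: String -> String -> Script ()
sendEmail to body = Free (Step (SendEmail to body) (const (Pure ())))

-- The interpreter serializes each instruction, hands it to a transport,
-- and feeds the decoded reply to the continuation.
runScript :: (String -> IO String) -> Script a -> IO a
runScript _    (Pure a)          = pure a
runScript send (Free (Step i k)) = do
  reply <- send (show i)
  runScript send (k (read reply))

-- Stand-in for a worker node: decode, execute, encode.
executeOnNode :: String -> IO String
executeOnNode wire = pure (show (execute (read wire)))
  where
    execute (FetchUser uid) = UserName ("user" ++ show uid)
    execute (SendEmail _ _) = Done

welcome :: Int -> Script String
welcome uid = do
  name <- fetchUser uid
  sendEmail name "Welcome!"
  pure name

main :: IO ()
main = do
  name <- runScript executeOnNode (welcome 42)
  putStrLn name
```
</pre>
Keeping Instr first-order and holding the continuation only on the<br>
interpreter side is one possible design: only the instruction and its result<br>
need to be serializable, while the rest of the script never leaves the<br>
coordinator.<br>
<br>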

4. Smart resource consumption: Each node brings with it a set of<br>

resources. It is *not* my intention to create a system which views CPU,<br>

memory, etc as a contiguous unit. Rather, each primitive instruction in<br>

the AST is viewed as a "black box" which can only consume as much CPU<br>

and memory as the node has available to it. The framework is<br>

responsible for profiling each instruction and scheduling future<br>

instructions to a node for which resources are predicted to be<br>

available.<br>

   The developer should be able to define new resources such as 3rd<br>

party API calls, bandwidth, database connections, etc, all of which are<br>

profiled just as CPU and memory would.<br>

<br>
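The scheduling part of idea 4 could be sketched roughly as follows. This is<br>
only an illustration under stated assumptions: Node, Cost, fits, reserve and<br>
schedule are hypothetical names, the cost profile is assumed to come from<br>
earlier profiling, and "first node that fits" is a placeholder policy rather<br>
than a proposal for the real one. Resources are keyed by name so that<br>
developer-defined ones (here "api-calls") sit next to CPU and memory.<br>
<pre>
```haskell
-- Hypothetical sketch: names, resource keys and the first-fit policy are
-- illustrative assumptions, not part of any existing framework.
import qualified Data.Map.Strict as Map

-- Resource names are plain strings so developers can define their own.
type Resources = Map.Map String Double

data Node = Node
  { nodeName :: String
  , nodeFree :: Resources  -- predicted free capacity per resource
  } deriving (Show, Eq)

-- Predicted cost of one instruction, e.g. derived from past profiling.
type Cost = Resources

-- A node fits if it has at least the predicted amount of every resource.
fits :: Cost -> Node -> Bool
fits cost node = all enough (Map.toList cost)
  where
    enough (res, need) = Map.findWithDefault 0 res (nodeFree node) >= need

-- Deduct the predicted cost from a node's free capacity.
reserve :: Cost -> Node -> Node
reserve cost node = node { nodeFree = Map.mapWithKey deduct (nodeFree node) }
  where
    deduct res free = free - Map.findWithDefault 0 res cost

-- Pick the first node that fits and return its name with the updated pool;
-- Nothing means the instruction has to wait.
schedule :: Cost -> [Node] -> Maybe (String, [Node])
schedule cost nodes =
  case break (fits cost) nodes of
    (_, [])             -> Nothing
    (before, n : after) -> Just (nodeName n, before ++ reserve cost n : after)

main :: IO ()
main = do
  let pool =
        [ Node "a" (Map.fromList [("cpu", 1), ("mem", 512), ("api-calls", 0)])
        , Node "b" (Map.fromList [("cpu", 4), ("mem", 2048), ("api-calls", 10)])
        ]
      scanCost = Map.fromList [("cpu", 2), ("api-calls", 1)]
  print (fmap fst (schedule scanCost pool))
```
</pre>
With this pool, node "a" lacks both CPU and API-call budget, so the<br>
instruction lands on node "b".<br>
<br>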

5. Browser based control panel: Engineers should have a GUI at their<br>

disposal which allows them to watch -- in real time -- the execution<br>

flow of the DSL script.<br>

<br>

6. Structured logs with advanced filtering: All log output should be<br>

structured with first class support for shipping the data to<br>

Logstash/Elasticsearch. The aforementioned GUI should be able to<br>

selectively filter output based on certain pre-defined predicates and<br>

display them to the developer. For example, if you're building an email<br>

virus scanning system (which may see millions of emails per day), you<br>

may want to limit the real-time debugging output to only a specific<br>

customer.<br>

<br>

7. First class integration with modern tools and services: The system<br>

should integrate with Consul, PagerDuty, statsd, RabbitMQ, memcache,<br>

DataDog, Logstash, and Slack, with new integrations being easy to add.<br>

This is vital for clean separation of business and non-business logic.<br>

For example, the developer should be able to cache certain bits of data<br>

at will, without having to worry about opening and managing a TCP<br>

connection to memcache.<br>

<br>

This is my vision, and I want to build it completely in Haskell. What<br>

do you all think?<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Alex<br>

______________________________<wbr>_________________<br>

Haskell-Cafe mailing list<br>

To (un)subscribe, modify options or view archives go to:<br>

<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-<wbr>bin/mailman/listinfo/haskell-<wbr>cafe</a><br>

Only members subscribed via the mailman list are allowed to post.</font></span></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Markus Läll<br></div>

</div>