[Haskell-cafe] Speculation, OT: Program a Spreadsheet

Wed Nov 22 21:06:03 UTC 2017

Am 22.11.2017 um 06:06 schrieb trent shipley:
> On Tue, Nov 21, 2017 at 1:50 PM Joachim Durchholz <jo at durchholz.org 
> <mailto:jo at durchholz.org>> wrote:
> 
>     Am 21.11.2017 um 08:13 schrieb trent shipley:
>      >     Which one it is that you are interested in?
>      >     "Three-dimensional" as in "we have row, column, and layer",
>     or as in "we
>      >     have hierarchical workbooks"?
>      >
>      > They are not mutually exclusive. A workbook can contain sheets--just
>      > like now. IN ADDITION, you can put workbooks in workbooks, until you
>      > reach the root workbook, which is also a file.
> 
>     Not an answer that helps me understand your point.
> 
> 
> One more try.

So you mean you want both.

Not sure how useful a 3D sheet is. It's too abstract for those who need 
the visual feedback, and not powerful for those who deal with abstractions.

A hierarchy of sheets is pretty useful in my book.

I think the main problem for me was that you didn't spell out clearly 
which part of your description related to existing concepts and which 
was new, which one was desirable and which one was not interesting to you.
(Defining what's NOT in a project is often the most important task when 
outlining it.)

> Note, that it would be convenient if the spreadsheet application 
> numbered columns, rows, sheets, and virtual workbooks by default instead 
> of naming sheets like Excel seems to do.

Excel has the problem that there is no uniform naming convention for a 
cell. An address that includes the sheet name looks and acts in a pretty 
different fashion than one that's in a cell.

> If you were to allow function libraries or objects in a spreadsheet 
> application you could store them in workbooks or virtual workbooks, 
> extending the concept of putting a proper function in sheet marked with 
> the function property.

I don't see what problem is being solved with that.
But then I do not really share your vision it seems, so it's okay.

> -------------------------------------------------------
> 
> 
>      >      > Also, if sheets can be marked as being functions, virtual
>      >     workbooks can
>      >      > support libraries of functions, or objects, if you want
>     such things.
>      >
>      >     Sometimes I think Excel etc. got it all wrong.
>      >     People routinely reserve areas inside the spreadsheet for
>     different
>      >     tasks: input, intermediate results, output. Increasing the
>     size of each
>      >     area becomes a pain.
> 
> You would need some objects.  The idea of linked areas. I'm guessing 
> that Excel tables, kinda do this, but only for one contiguous area on 
> one sheet.

Exactly.

>      >     I'd want to have a spreadsheet where it's easy to define
>     multiple sheets
>      >     on the same "table". Extending the input sheet with another row
>      >     automatically extends the output sheet by a row (and if you
>     have rollup
>      >     then we get more than one row). 
> 
> So this would be a three dimensional Excel table basically? It would 
> involve the program taking a best guess at what you want in a lot of cases.

No, what I described above is entirely within the 2d area.

Though it should extend naturally to an arbitary number of dimensions. 
Not because I find multiple dimensions a killer feature, but it should 
be there for those who need it.
Plus this removes the somewhat arbitrary definitions of a cell address, 
which in Excel is "the file or data source name, the sheet name, and 
exactly two numbered coordinates". Plus all kinds of extra ad-hockery to 
integrade ODBC data sources (which is useful but requires jumping 
through extra hoops to get a complete definition of "what is a cell 
address").

>     Intermediate sheets could be replaced by
>      >     a function definition.
>      >
>      >     It's a bit of an ergonomic challenge; functions are much more
>     abstract,
>      >     and users will want to see how the results of applying a
>     function to an
>      >     input area look like. So the mental model might be that we
>     still have
>      >     intermediate sheets, but with just a single function, 
> 
> It has been suggested that you can write functions when you would want a 
> complex chain of calculations.

That's exactly my thinking.

> How would you bind one function to a column/row/range. You would need a 
> visual representation of the binding. Current spreadsheets don't do 
> that.

That's one of the major reasons why spreadsheets are such a maintenance 
nightmare.

 > (Except Emacs Org Mode spreadsheets have column functions.)
> 
> In the existing paradigm, you would try your best to generate a 
> plausible solution, then copy it to each cell in the new range in the 
> calculation layer(s). My intuition is to get that working, THEN conquer 
> the single function bound to many cells problem.

You will need a rewrite then.
Well, that's just my intuition, based on a few decades of designing and 
coding software but even that much experience can lead one astray, so 
don't take just my word for it.

>     and the
>      >     intermediate sheets can be create, destroyed, shown and
>     hidden easily.
> 
> Are you hoping to autogenerate the functions?

No... rather the other way round: Have the user define a function 
(ideally without calling it that) and let him put that function into a 
column.
CPU people call this the "SIMD model": SIMD = Single Instruction, 
Multiple Data. Or "vector operation". But don't mention that in the 
manual :-)
It's pretty near to the GPU model of doing things. Though you wouldn't 
want to model that you cannot really look into what a GPU is doing and 
why, which you definitely don't want in a spreadsheet (and not in a GPU 
either if you're struggling to make it do what you wanted, instead of 
just painting the whole screen in pink and you don't know why).

> Data -> range bound mono-function on intermediate function page -> Results

I'm more after functions bound to columns.

>      >     The other challenge would be dealing with headers. Headers are
>      >     important, but for the intermediate we have just the function
>     so it
>      >     needs to provide the headers. I.e. the function result would
>     have to be
>      >     a named tuple, and if you want multiline headers, we get a
>     hierarchical
>      >     named tuple.
> 
> 
> Concatenated names, generated functions get numbers added to the 
> concatenated name. I don't think you can make the names you generate 
> terribly pretty or human readable.

I'm definitely not after anything autogenerated. Particularly not names.
One could autogenerate a tuple definition from individual field names or 
something like that, but even then I wouldn't leave it to the user to 
give the thing a name.

> The intermediate function guessed and generated for the addition isn't 
> transient, it's durable data too.

I think we're not talking about the same concept here.
No guessing, no name autogeneration.

>      >     Manipulating the column order would also manipulate the tuple
>      >     definition, i.e. you'd have to think about how GUI
>     manipulation affects
>      >     the function definition.
> 
> 
> That's more heuristics, and UX/UI testing. Excel guesses wrong on this 
> frequently, and when it does it's a pain. But mostly it's behaves well. 
> I'd want to leverage the current engine that deals with GUI implications 
> for the model.

If leveraging the existing engine were possible, Microsoft would have 
done it years ago.

> I'm still not certain I'm on the same page as you, but that's a little 
> more effort.

Probably not, no.

>      > Any suggestions for a GUI library--that kind of implies C++ or Java.
>      > Ideally your GUI would be cross platform.
> 
>     I know only the Java GUI landscape well enough to say anything.
>     In that area, dialog boxes, menus and such will be easy enough with any
>     lib, but you'll be mostly on your own for the grid. Also if you aim for
>     unlimited size you'll get into a lot of hairy memory management issues -
>     a 64kx64k grid has 1 million cells, multiply that with 1KB per cell and
>     you end with a memory footprint of a gigabyte. Evaluating a formula for
>     each cell is going to get you a slideshow-like performance.
> 
> 
> <https://en.wikipedia.org/wiki/List_of_spreadsheet_software>
> https://en.wikipedia.org/wiki/List_of_spreadsheet_software

Spreadsheet software is definitely not a GUI library.
But I guess I'm talking about a new spreadsheet software and you're 
talking about extending an existing one. As I said above, I don't think 
the latter will work.

Anyway. We've been talking at cross-purposes for a while now, and I 
guess we have given each other enough keywords for our own thinking, but 
we're probably after too different things to make further discussions.
I'm still open for questions to clarify a point, but I don't think I can 
help you much when it comes to elaborating project options.

Regards,
Jo