Matrices and Graphs (thepalindrome.org)
205 points by jgrodziski on July 15, 2023 | 45 comments



Fun fact: this is only valid for domains that have a notion of "selfness", i.e. that there is such a thing as an "identity matrix" for the quantities.

Consider the following square matrix:

            TSLA  AAPL  GOOG  MSFT
    Alice |  100     5     0     1
    Bob   |    0    30   100     5
    Carol |    2     2     2     2
    Dan   |    0     0     0  1000
An input vector of stock prices gives an output vector of net worths. However, that is about the only way you can use this matrix. You cannot transform the table arbitrarily and still have it make sense, such as applying a rotation matrix -- it is nonsensical to speak of a rotation from Tesla-coordinates to Google-coordinates. The input and output vectors lack tensor transformation symmetries, so they are not tensors.
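
For concreteness, a minimal numpy sketch of the one transformation that does make sense here (the share prices are made up):

    import numpy as np

    # rows: Alice, Bob, Carol, Dan; columns: TSLA, AAPL, GOOG, MSFT
    holdings = np.array([
        [100,  5,   0,    1],
        [  0, 30, 100,    5],
        [  2,  2,   2,    2],
        [  0,  0,   0, 1000],
    ])

    prices = np.array([250.0, 190.0, 120.0, 340.0])  # hypothetical share prices

    net_worths = holdings @ prices                    # one number per person
    print(dict(zip(["Alice", "Bob", "Carol", "Dan"], net_worths)))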

This is also why Principal Component Analysis and other data science notions in the same vein are pseudoscience (unless you evaluate the logarithm of the quantities, but nobody seems to recognize the significance of unit dimensions and multiplicative vs. additive quantities).


There's a little more nuance:

1. Technically, the table you shared is better thought of as a two-dimensional tensor, rather than a "graph-like matrix" -- which as you point out must be a linear map from a (vector) space to itself.

2. While not technically "Principal Component Analysis", one could do "Singular Value Decomposition" for an arbitrarily shaped 2-tensor. Further, there are other decomposition schemes that make sense for more generic tensors.

3. (Rotations / linear combinations in such spaces) Given a table of stock holdings, it can be sensible to talk about linear combinations / rotations etc. Eg: The "singular vectors" in this space could give you a decomposition in terms of companies held simultaneously by people (eg: SAAS, energy sector, semiconductors, entertainment, etc). Likewise, singular vectors on the other side would tell you the typical holding patterns among people (and clustering people by those, eg. retired pensioner invested for steady income stream, young professional investing for long-term capital growth, etc). As it turns out, this kind of approximate (low-rank) factorization is at the heart of recommender systems.
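
To make point 3 concrete, here's a rough numpy sketch of a rank-k SVD of such a holdings table (the data and the rank are arbitrary, and this is just the bare factorization, not a full recommender):

    import numpy as np

    # people x stocks holdings table (same shape as the example above)
    A = np.array([
        [100,  5,   0,    1],
        [  0, 30, 100,    5],
        [  2,  2,   2,    2],
        [  0,  0,   0, 1000],
    ], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2                                # keep the two strongest "factors"
    A_k = U[:, :k] * s[:k] @ Vt[:k, :]   # rank-k approximation of the table

    # rows of Vt: which stocks each factor bundles together ("sectors")
    # columns of U: how strongly each person loads on each factor ("holding patterns")
    print(np.round(A_k, 1))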


Yeah, this is also the case if the table is not square, as the values can't represent edges any more. So it's more that the rows and columns should index the same "thing".

By the way, by changing the graph representation we can give meaning even to non-square matrices, as described in this article: https://www.math3ma.com/blog/matrices-probability-graphs


I think the most common application is describing mesh connectivity in Finite Element Methods, where each entry in the matrix represents the influence one node has on another. Basically, any N^2 table can be constructed to describe the general dependency of components in any simulation or system in general.
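
For a tiny illustration (a sketch of a 1-D Poisson problem with linear elements, not tied to any particular FEM library): the assembled stiffness matrix is nonzero exactly where two nodes share an element, i.e. it has the (weighted) adjacency structure of the mesh.

    import numpy as np

    n_elems = 4                          # 1-D mesh: 5 nodes, 4 elements
    h = 1.0 / n_elems
    K = np.zeros((n_elems + 1, n_elems + 1))

    # each element couples its two end nodes -> nonzero entries in K
    k_local = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e in range(n_elems):
        K[e:e + 2, e:e + 2] += k_local

    print(K)    # tridiagonal: a node only "influences" its mesh neighbours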


Another notable example is the PageRank algorithm [0]: consider the graph where nodes are web pages and edges are links between them; you can build the adjacency matrix of this graph and, with this algorithm, sort the pages by "popularity" (intuitively, which pages have more links pointing to them).

Let's say that in most cases you have a graph and you consider the corresponding matrix. Doing the inverse is not as useful in practice except in some cases as explained in the article.

[0]: https://en.wikipedia.org/wiki/PageRank
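
For anyone curious, a bare-bones power-iteration sketch on a tiny made-up link graph (damping factor 0.85, as in the original paper):

    import numpy as np

    # A[i, j] = 1 if page j links to page i (4 pages, made-up links)
    A = np.array([
        [0, 1, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ], dtype=float)

    M = A / A.sum(axis=0)            # each page splits its "vote" among its outlinks

    d, n = 0.85, A.shape[1]
    r = np.full(n, 1 / n)
    for _ in range(100):             # power iteration
        r = (1 - d) / n + d * M @ r

    print(r)                         # higher score = more "popular" page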


You make a good point about the types of matrices that a graph representation makes sense for, but it seems a bit much to say that PCA is pseudoscience?

If you had a lot of people and a lot of stocks, a low-rank representation of the matrix (probably not PCA per se with that particular matrix, but something closely related) could convey a lot of information about, e.g., submarkets and how they're valuated together. Or not, depending on how those prices covary over time.


I disagree... there are more ways to creatively extract information from this matrix.

For instance, you can normalize along the columns, and build a "recommender system" using matrix factorization.

With that, when a new person comes along with a portfolio, the system will output a probability that this new person will acquire the other assets they don't have.

It's the (very basic) idea of how Netflix recommends movies.
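
A toy sketch of that idea (purely illustrative; a plain truncated SVD on a made-up table, not what Netflix actually runs):

    import numpy as np

    # people x assets; zeros mark assets the person doesn't hold
    R = np.array([
        [5, 4, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 1, 5, 4],
    ], dtype=float)

    Rn = R / np.linalg.norm(R, axis=0)        # normalize along the columns

    # rank-2 factorization; the reconstruction fills the empty cells with scores
    U, s, Vt = np.linalg.svd(Rn, full_matrices=False)
    scores = U[:, :2] * s[:2] @ Vt[:2, :]

    person = 1
    unheld = R[person] == 0
    print(scores[person][unheld])    # higher score = more likely to acquire that asset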


When I try to get this point across about techniques like the PCA, I like to show that the measurement units strongly affect the inference.

Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!


I would disagree and here is why:

> When I try to get this point across about techniques like the PCA, I like to show that the measurement units strongly affect the inference.

In such a case the problem is not with PCA but with application. PCA is just a rotation of the original coordinate system that projects the data on new axes which are aligned with the directions of highest variability. It is not the job of PCA to parse out the origin of that variability (is it because of different units, or different effects).

> Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!

To get a statistical distance one should: subtract the mean if the measurements differ in origin; divide by standard deviation if the measurements differ in scale; rotate (or equivalently compute Mahalanobis distance) if the measurements are dependent (co-vary). The PCA itself is closely related to Mahalanobis distance: Euclidean distance on PCA-transformed data should be equivalent to Mahalanobis distance on the original data. So, saying that something is wrong with PCA because it doesn't take units of measurement into account is close to saying that something is wrong with dividing by standard deviation because it doesn't subtract the mean.
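
A quick sketch of that last point (made-up heights and weights): the leading principal direction changes if you switch one column from centimeters to inches, but after standardizing each column it is essentially the same either way:

    import numpy as np

    rng = np.random.default_rng(0)
    height_cm = rng.normal(170, 10, 200)
    weight_kg = 0.5 * height_cm + rng.normal(0, 5, 200)

    def first_pc(X):
        Xc = X - X.mean(axis=0)
        return np.linalg.svd(Xc, full_matrices=False)[2][0]   # first principal direction

    X_cm = np.column_stack([height_cm, weight_kg])
    X_in = np.column_stack([height_cm / 2.54, weight_kg])     # same data, inches

    print(first_pc(X_cm), first_pc(X_in))          # directions differ

    z = lambda X: (X - X.mean(axis=0)) / X.std(axis=0)
    print(first_pc(z(X_cm)), first_pc(z(X_in)))    # essentially identical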


Is the effect of measurement units eliminated by applying something like zero mean unit variance normalization prior to dimensionality reduction?


I dunno, there are some semi-useful things you can do.

For example, the transform from (Alice, Bob, Carol, Dan) to (Male, Female) is linear -- it's another matrix that you can compose with the individual-ownership one you have here.

Or, call your individual-ownership matrix A, and say that P is the covariance of daily changes to prices of the four stocks listed. Then A P A' is the covariance of daily changes to the peoples' wealths. The framing as linear algebra hasn't been useless.
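
In numpy terms (with a made-up price covariance P, just to show the shape of the computation):

    import numpy as np

    A = np.array([                 # the individual-ownership matrix from above
        [100,  5,   0,    1],
        [  0, 30, 100,    5],
        [  2,  2,   2,    2],
        [  0,  0,   0, 1000],
    ], dtype=float)

    P = np.array([                 # made-up covariance of daily price changes
        [25.0,  2.0,  3.0,  1.0],
        [ 2.0,  4.0,  1.0,  0.5],
        [ 3.0,  1.0,  9.0,  1.5],
        [ 1.0,  0.5,  1.5,  4.0],
    ])

    wealth_cov = A @ P @ A.T             # covariance of daily changes in wealth
    print(np.sqrt(np.diag(wealth_cov)))  # daily wealth volatility per person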

I kinda get what you're saying though. Like, why would powers of this matrix be useful? It only makes sense if there's some implicit transform between prices and people, or vice versa, that happens to be an identity matrix.

You can make up a story. Say the people can borrow on margin some fraction of their wealth. Then say that they use that borrowing to buy stock, and that that borrowing affects prices. Composing all these transforms, you could get from price to price, and then ask what the dynamics are as the function is iterated.

But, ok, "I'm just going to do an SVD of the matrix and put it in a slide" isn't going to tell anybody much.

Maybe there's a use for a rank-one approximation to this system? Like, "this is pretty close to a situation where there's a single ETF with those stocks in these proportions, and where the people own the following numbers of shares in the ETF"? Maybe if you have millions of people and millions of stocks and wanted to simulate this "stock market" at 100Hz on a TI-83?

I dunno. You can make up stories.


Is there any ___domain where you can apply arbitrary transformations to a table and still have it make sense? I feel there is some depth in your argument that I cannot infer just from the content of your comment, and I would be keen to look further into it. I.e. in your ___domain example, a currency would be a coordinate and you could move to alternate currencies? Would that be the identity you are looking for?


This approach reminds me of RedisGraph[1] (which is now unfortunately EoL).

"RedisGraph is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph."

1. https://github.com/RedisGraph/RedisGraph


RDF-star and SPARQL-star are basically Property Graph interfaces if you don't validate with e.g. RDFS (schema.org), SHACL, json-ld-schema (jsonschema+shacl), and/or OWL.

Justify Linked Data; https://5stardata.info/

W3C RDF-star and SPARQL-star > 2.2 RDF-star Graph Examples: https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#rd...

    def to_matrices(g: rdflib.MultiDiGraph) -> Union[Matrix, Tensor]
rdflib.MultiDiGraph: https://networkx.org/documentation/stable/reference/classes/...

Multigraph: https://en.wikipedia.org/wiki/Multigraph :

> In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called parallel edges[1]), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge ... [which requires multidimensional matrices, netcdf (pydata/xarray), tensors, or a better implementation of a representation; and edge reification in RDF]
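
A rough sketch of what such a to_matrices could look like with networkx (the function name and the "predicate" attribute are my own conventions, not a standard API): one adjacency matrix per edge label, stacked into a 3-D array.

    import networkx as nx
    import numpy as np

    def to_tensor(g: nx.MultiDiGraph, key: str = "predicate"):
        # one adjacency matrix per distinct edge label, stacked into a 3-D array
        nodes = list(g.nodes)
        idx = {n: i for i, n in enumerate(nodes)}
        labels = sorted({d[key] for _, _, d in g.edges(data=True)})
        T = np.zeros((len(labels), len(nodes), len(nodes)))
        for u, v, d in g.edges(data=True):
            T[labels.index(d[key]), idx[u], idx[v]] += 1   # parallel edges add up
        return labels, nodes, T

    g = nx.MultiDiGraph()
    g.add_edge("alice", "bob", predicate="knows")
    g.add_edge("alice", "bob", predicate="follows")
    labels, nodes, T = to_tensor(g)
    print(labels, T.shape)    # ['follows', 'knows'] (2, 2, 2)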

From "Why tensors? A beginner's perspective" https://news.ycombinator.com/item?id=30629931 :

> https://en.wikipedia.org/wiki/Tensor

... Tensor product of graphs: https://en.wikipedia.org/wiki/Tensor_product_of_graphs

Hilbert space: https://en.wikipedia.org/wiki/Hilbert_space :

> The inner product between two state vectors is a complex number known as a probability amplitude.


Beautifully done. I subscribe to the author’s Substack, lots of other really nice stuff there.

A little off topic, but this is just one more example of beautifully done content on Substack. I have seriously considered setting my freedom.to settings to only allow accessing HN, FB, Twitter, Mastodon, etc., 1 or 2 mornings a week, and that time would mostly be for ensuring that I was always subscribed to a few good Substack channels.

With the explosion of ‘tech stuff I should read’, I think I need a more extreme culling of what I spend my time on.


Another interesting mapping is that a vector is (or can be thought of as) a discrete function (f(x) = ...) over an interval, the dot product of two vectors is a discrete integral of their product, and a matrix is a discrete scalar field.
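
For example (a quick numpy check with made-up functions): sampling two functions on a grid and taking the dot product times the grid spacing is just a Riemann sum for the integral of their product.

    import numpy as np

    x, dx = np.linspace(0, np.pi, 1000, retstep=True)
    f = np.sin(x)               # the "vectors" are sampled functions
    g = np.cos(x) ** 2

    print(f @ g * dx)           # discrete "integral product" ~ 0.667
    # exact value of the integral of sin(x)*cos(x)^2 over [0, pi] is 2/3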

I wonder what the continuous form of a graph is... Some sort of a manifold perhaps?


> I wonder what the continuous form of a graph is... Some sort of a manifold perhaps?

Exactly!

The correspondence between manifolds and graphs is very beautiful. What many folks call today "graph signal processing" has traditionally been called "discrete differential geometry". Scalar fields are functions defined on vertices, vector fields are functions defined on edges, the incidence matrix is the gradient operator, its transpose is the divergence, the Laplacian is the divergence of the gradient, integrals and fluxes are scalar products by indicator functions, the boundary operator is minus the gradient, Green's formula is just matrix transposition, etc.

You can even go further in the analogy and define p-forms as functions defined on the p-cliques of the graph, and from that rebuild a whole discrete Hodge theory. The correspondence is almost perfect, except for the fact that you cannot easily write the product rule for derivatives (because you cannot multiply scalar fields with vector fields pointwise).
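
A tiny concrete version of that dictionary (a path graph on 4 vertices, with an arbitrary edge orientation):

    import numpy as np

    # incidence matrix B of the path 0-1-2-3: one row per oriented edge
    B = np.array([
        [-1,  1,  0,  0],
        [ 0, -1,  1,  0],
        [ 0,  0, -1,  1],
    ])

    f = np.array([3.0, 5.0, 4.0, 1.0])   # scalar field: one value per vertex

    grad_f = B @ f       # "gradient": differences of f along edges
    L = B.T @ B          # divergence of the gradient = graph Laplacian
    print(grad_f)        # [ 2. -1. -3.]
    print(L)             # the familiar tridiagonal Laplacian of a path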


And electromagnetism seems to be a byproduct of discrete differential geometry. Such a fascinating subject. Makes the continuous treatment look like a mess.


Extremal graph theory models graphs with an infinite number of vertices as real-valued functions on the unit square. Book: https://lovasz.web.elte.hu/bookxx/hombook-almost.final.pdf


>continuous form of a graph

A graphon?

Edit: This was already mentioned by meindnoch.


A bivariate function?


GraphBLAS completely revolves around this duality. Sparse linear algebra is really powerful. https://graphblas.org/


If folks are looking for terms to Google, try "spectral graph theory" and "algebraic graph theory"

https://en.wikipedia.org/wiki/Spectral_graph_theory

https://en.wikipedia.org/wiki/Algebraic_graph_theory

Pretty much every field in math has a related field where you try to turn problems from the former into linear algebra problems.

The "spectral theorem" (https://en.wikipedia.org/wiki/Spectral_theorem) is an important theorem in linear algebra that gives conditions for when a matrix can be diagonalized, which is closely related to what its eigenvalues/eigenvectors look like.

The simplest version of the spectral theorem says that a symmetric matrix with real-number entries has real-number eigenvalues. The eigenvalues of a matrix are called the "spectrum of the matrix", hence "spectral theorem" and "spectral graph theory".

The adjacency matrix of any undirected graph is real symmetric, so its eigenvalues are all real numbers and it's natural to ask whether they say anything about the underlying graph.

Lucky for us, there are lots of surprising connections!

For example, say G is a finite undirected graph. The chromatic number of G, denoted χ(G), is the fewest colors needed to color its vertexes so that no two adjacent vertexes have the same color.

If λ₁ is the largest eigenvalue of G's adjacency matrix then there's a theorem (Wilf's theorem) that says

    χ(G) ≤ 1 + ⌊λ₁⌋
That is, you can always color a graph with 1 + ⌊λ₁⌋ colors, where ⌊x⌋ is the floor of x.

And there are some (finite, undirected) graphs that require exactly 1 + ⌊λ₁⌋ colors, so we're not doing any better unless we can say something more specific about the graph.

Wilf's Theorem:

https://www2.math.upenn.edu/~wilf/website/Eigenvalues%20of%2...

https://www2.math.upenn.edu/~wilf/website/Inequality%20for%2...
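
A quick numerical sanity check of the bound (a sketch; networkx's greedy_color gives some proper coloring, which is an upper bound on χ(G), not χ(G) itself):

    import numpy as np
    import networkx as nx

    G = nx.petersen_graph()                # 3-regular, chromatic number 3
    A = nx.to_numpy_array(G)               # adjacency matrix, real symmetric
    lam1 = np.linalg.eigvalsh(A).max()     # largest eigenvalue (here 3)

    coloring = nx.greedy_color(G)          # some proper coloring
    used = len(set(coloring.values()))

    print(used, "<=", 1 + int(np.floor(lam1)))   # Wilf: chi(G) <= 1 + floor(lambda_1)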


If expressed as an adjacency matrix, every graph is a square matrix. And every square matrix has a characteristic polynomial. So that means: every graph is also a polynomial!

see https://mathworld.wolfram.com/CharacteristicPolynomial.html
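
E.g. with numpy, whose poly() returns the characteristic polynomial's coefficients when given a square matrix:

    import numpy as np

    # adjacency matrix of the 4-cycle C4
    A = np.array([
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ])

    coeffs = np.poly(A)    # coefficients of det(xI - A)
    print(coeffs)          # ~ [1, 0, -4, 0, 0], i.e. x^4 - 4x^2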


What book should I read to develop these concepts and intuitions at an engineering level of math (not proofs)?


There is a book I would like to recommend, as it was not mentioned. The book "Graph algorithms in the language of linear algebra"[1] gives an overview and intuition of different graph algorithms expressed in linear algebra using semirings. The concept of expressing graphs as adjacency matrices (or incidence matrices) is quite old and was already noted by Koenig[4][5]. I remember the duality was used in some proofs and algorithms while doing my CS degree, e.g., see the Floyd–Warshall algorithm. One advantage of using linear algebra, apart from the beauty of it, is that naturally, vectorization and parallelization strategies become easier to implement. In practice, sparse linear algebra is used to reduce space and computational complexity. This is done by storing the matrices in compressed formats, such as COO/CSR/ELLPack.

In HPC, there are multiple frameworks to do distributed sparse linear algebra, and also more graph and GPU focussed frameworks using semirings as the abstraction layer (CombBLAS[2], GraphBLAST[3]).
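
To give a feel for the "graph algorithms as sparse linear algebra" idea, here's one BFS-style frontier expansion written as a sparse matrix-vector product in scipy (over the ordinary (+, *) semiring; GraphBLAS-style libraries let you swap in others such as (min, +) for shortest paths):

    import numpy as np
    from scipy.sparse import csr_matrix

    # directed edges of a small graph: 0->1, 0->2, 1->3, 2->3
    rows, cols = [0, 0, 1, 2], [1, 2, 3, 3]
    A = csr_matrix((np.ones(4), (rows, cols)), shape=(4, 4))   # CSR adjacency matrix

    frontier = np.zeros(4)
    frontier[0] = 1.0                      # start a BFS at vertex 0

    next_frontier = A.T @ frontier         # one BFS step: neighbours of the frontier
    print(np.flatnonzero(next_frontier))   # -> [1 2]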

References:

[1] Kepner, Jeremy, and John Gilbert, eds. Graph algorithms in the language of linear algebra. Society for Industrial and Applied Mathematics, 2011.

[2] Buluç, Aydın, and John R. Gilbert. "The Combinatorial BLAS: Design, implementation, and applications." The International Journal of High Performance Computing Applications 25.4 (2011): 496-509.

[3] Yang, Carl, Aydın Buluç, and John D. Owens. "GraphBLAST: A high-performance linear algebra-based graph framework on the GPU." ACM Transactions on Mathematical Software (TOMS) 48.1 (2022): 1-51.

[4] D. Konig. Graphen und Matrizen (Graphs and matrices). Matematikai Lapok, 38:116–119, 1931.

[5] D. Konig. Theorie der endlichen und unendlichen graphen (Theory of Finite and Infinite Graphs). Leipzig: Akademie Verlag M.B.H. 1936.


I don't know how proof-heavy this course is, and it is not specialized in graphs: "Computational Science and Engineering I" -- Gilbert Strang (MIT)

https://ocw.mit.edu/courses/18-085-computational-science-and...


The textbook for this course is probably one of the finest Strang ever wrote. It's also worth looking at the original edition of the textbook, which was called "Introduction to Applied Mathematics" and has a much more thorough treatment of the parallels between matrices, graphs, and differential operators, and their use in optimization problems. CSE is much more practical for solving actual scientific computing problems, though, even if I find the layout somewhat less beautiful.


Oh my, thanks! Ed. 1 is the exact book I wanted.


Were you able to find a copy? I found this on archive.org -- https://archive.org/details/introductiontoap0000stra/page/n9...


I highly recommend anything on Finite Element Methods; it gives you immediate grounding in a concrete application. I personally benefited a lot from the following YouTube lecture series:

https://www.math.colostate.edu/~bangerth/videos.html


I think I’m misunderstanding. The node relabeling seems backwards.

He says start with the highest order, which makes me think the neighborhood with order 3 would get the smaller node labels, and the neighborhoods with order 0 would get the highest.

It looks like the opposite was done.


Does anyone know how the graph illustrations were created? I have driven myself mad with TikZ and dot.


Author here. I rendered (most of the) equations with QuickLaTeX (https://quicklatex.com/), and put the illustrations together in Inkscape by hand (https://inkscape.org/).


looks like Manim.


I would like to know this too



This is also the basis of process mining. And why process graphs are so easy to analyse.


This is especially fascinating when you consider graphs/diagrams are a way to encode math.


The fact that you can represent a graph (the mathematical abstract object) as a diagram is sort of by-the-by here.

The most important thing is that graph algorithms and concepts have a strong relation to numerical aspects of the linear algebra and can be used to accelerate computation.

(You could of course argue that the fact that graphs can be represented as a diagram helps humans come up with such algorithms, but that's basically equivalent to saying that you can represent a matrix as a block of numbers and that helps humans look at it).


I was pointing out the other direction:

Diagrams are how you encode categorical models of semantics, which naturally can be represented as graphs. Those graphs can in turn be encoded as matrices. So you have a way to encode semantic foundations as matrices — which you can then use graph algorithms to analyze.

Being able to move your semantic models (eg, diagrams) into a computational framework (eg, linear algebra) is neat.


Umm... What?

Please show us the graph that "encodes" the fundamental theorem of algebra.


https://en.wikipedia.org/wiki/Category_theory

The fundamental theorem of algebra is a fact about the diagram which relates polynomials via division by monomials.

Every polynomial of degree n is n divisions of a monomial away from the empty product.

- - - -

This is probably easier to see the other direction:

Polynomials of degree n are isomorphic to n-products of monomials, and you can build a graph of the assembly where each arrow represents a multiplication by a particular monomial. (Then reverse the arrows, to get my original diagram.)


Sparse Linear Algebra is Graphs all the way down.


Beautiful illustrations, thank you for sharing.



