this post was submitted on 16 Jun 2026
Except that it's been demonstrated multiple times that original training data can be extracted from a language model, so it is completely valid to talk about the model as a database, because the training data is stored within it.
Here's a broad survey of more than 100 research papers demonstrating this: Training Data Extraction From Pre-trained Language Models: A Survey
So, this is a good analogy in this case.
See, I know how an internal combustion engine works. I don't know, by looking at the hood of a particular vehicle, how exactly a specific car's engine operates (maybe it has 4 cylinders, or 6 or 8, maybe it has fuel injectors, maybe it has a carburetor, etc). However, I do know that the principles are the same for all internal combustion engines, and that just because I don't know the details of how a particular engine operates, that does not mean that its operation is beyond my understanding.
The same is true for machine learning models. There may be uncertainty as to how a particular model operates "under the hood", but the principles of operation are the same for all, and are not incomprehensible.
We actually don't know this. This is called computationalism. It is speculative, there are several alternative theories, and little in the way of experimental evidence supporting any particular theory.
You have to understand, the current branch of machine learning models grew out of algorithms whose purpose was processing large data sets with thousands or millions of variables and optimizing for areas in the data set where many of those variables were maximized (or minimized). Here's a better explanation:
Hill Climbing Algorithm & Artificial Intelligence - Computerphile
How these tools perform their optimization, and what they optimize for, has been recombined in different ways to produce different types of models, and the search space of variables has been expanded with increased computing power, but the underlying operating principles are still the same. This is not a tool that can comprehend what it is doing, and it can't be self-aware. It can only process large amounts of input data and attempt to maximize for particular dimensions. This seems vague to humans because the number of variables being handled at any given time is far more than a human mind can focus on, but that doesn't make the optimization routine intelligent or conscious. It's just doing a lot of number crunching really fast, optimizing for specific aspects as directed by its developers.
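For anyone who wants to see the basic idea in code, here is a minimal hill-climbing sketch in Python. It is only a toy: the objective function `example_score`, its peak at (1, -2), and the step size are all made up for illustration, and real model training uses gradient-based methods over far more variables.

```python
import random


def hill_climb(score, start, step=0.1, iterations=1000, seed=0):
    """Greedy hill climbing: keep a random neighbour only if it scores higher."""
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current)
    for _ in range(iterations):
        # Propose a neighbour by nudging every variable a little.
        candidate = [x + rng.uniform(-step, step) for x in current]
        candidate_score = score(candidate)
        # Accept the move only when it improves the score.
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
    return current, current_score


def example_score(point):
    # Hypothetical objective with a single peak at (1, -2), where it equals 0.
    x, y = point
    return -((x - 1) ** 2 + (y + 2) ** 2)


best, best_score = hill_climb(example_score, start=[0.0, 0.0])
print(best, best_score)
```

The loop never looks at what the variables mean. It only compares two scores and keeps the higher one.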
It's been demonstrated that some of the more prominent pieces of training data can be reproduced, but the majority of it cannot. This shows that those particular pieces of data are represented in some form within the model; it does not show that the way it works is equivalent to database lookups. If I can write down the lyrics of a song from memory, it shows that those lyrics are encoded as data in some form in my brain, but that doesn't mean it's valid to talk about my brain as a literal database, especially not in the sense that the limitations in the capabilities of a database can be ascribed to me (or its strengths: I cannot remember the exact lyrics of most songs I've heard, even if I can remember some).
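A toy way to see the difference between "represented in some form" and "database lookup" is a word-level bigram model. This is a rough sketch under heavy assumptions: the corpus sentences are invented, and a bigram counter is nothing like a real language model. It only shows that a model storing statistics rather than rows can still reproduce a frequently repeated sentence while failing to reproduce a rare one.

```python
from collections import Counter, defaultdict

END = "<end>"  # hypothetical end-of-sentence marker


def train_bigram(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def complete(counts, prompt, max_words=20):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    while len(words) < max_words:
        followers = counts.get(words[-1])
        if not followers:
            break
        next_word = followers.most_common(1)[0][0]
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words)


# Invented corpus: one sentence seen three times, another seen once.
corpus = ["the quick fox jumps high"] * 3 + ["the quick dog jumps low"]
model = train_bigram(corpus)

frequent = complete(model, "the")        # the repeated sentence comes back verbatim
rare = complete(model, "the quick dog")  # the rare sentence does not
print(frequent)
print(rare)
```

The model holds no copy of either sentence, only follow-up counts. The repeated sentence can be extracted from it, while prompting with the start of the rare one drifts onto the more common ending instead.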
This video literally starts out by describing evolution as a similar optimization algorithm. If you know the basic mechanism of evolution, does that mean you can use that to then say with certainty and specificity what biological life in its vast diversity of techniques is not capable of? The "underlying operating principles" of evolution don't "understand" chemistry or deception, but they still produce organisms capable of photosynthesis and camouflage. It's an algorithm that produces other algorithms, which is what puts those resulting algorithms in a different category of comprehensibility than fixed algorithms that were explicitly written by someone. We are very far from having a comprehensive understanding of biological systems, despite knowing how evolution works.
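To make the comparison concrete, here is a minimal mutation-plus-selection loop in Python. It is a sketch, not a model of biological evolution: the bit-string genomes, the `count_ones` fitness function, and all the parameter values are assumptions picked to keep it short.

```python
import random


def evolve(fitness, genome_length=20, population_size=30,
           generations=200, mutation_rate=0.05, seed=0):
    """Mutation and selection only: keep the fitter half, copy it with random bit flips."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Mutation: each survivor leaves one child with randomly flipped bits.
        children = [
            [bit ^ 1 if rng.random() < mutation_rate else bit for bit in parent]
            for parent in survivors
        ]
        population = survivors + children
    return max(population, key=fitness)


def count_ones(genome):
    # Hypothetical fitness: the number of 1 bits in the genome.
    return sum(genome)


best = evolve(count_ones)
print(best, count_ones(best))
```

Nothing in `evolve` refers to what a good genome looks like. It only ranks and copies, and whatever structure ends up in `best` comes from the fitness function and the accumulated mutations, not from the loop itself.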
This is like saying evolution is only a simple mechanism taking in the world as data, which, yeah, obviously, but that property doesn't carry forward to what it produces. The bigger problem here, though, is that, again, concepts like comprehension, consciousness, and intelligence are not well defined in computational terms, and it is unclear what statements involving them mean in any practical sense. These sorts of claims are non-falsifiable and don't make testable predictions about the boundaries of AI capability.