How could an artificial intelligence (as in large-language-model-based generative AI) be better for information access and retrieval than an encyclopedia with a clean classification model and a search engine?
If we add a step of processing, where a genAI "digests" perfectly structured data and tries, as badly as it can, to regurgitate things it doesn't understand, aren't we just adding noise?
I'm talking about specific use cases like "draw me a picture explaining how a pressure regulator works", or "can you explain to me how to code a recursive pattern matching algorithm, please".
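(To be concrete about the second one, the level of answer I'd hope for is roughly this; the toy wildcard matcher below is just my own illustration of "recursive pattern matching", not something an AI or an encyclopedia actually gave me:)

```python
def matches(pattern: str, text: str) -> bool:
    """Toy recursive matcher: '?' matches any one character, '*' matches any run of characters."""
    if not pattern:
        return not text                       # empty pattern matches only empty text
    if pattern[0] == "*":
        # '*' either matches nothing (skip it) or swallows one more character of text
        return matches(pattern[1:], text) or (bool(text) and matches(pattern, text[1:]))
    if not text:
        return False                          # a non-'*' pattern char but no text left
    if pattern[0] in ("?", text[0]):
        return matches(pattern[1:], text[1:]) # consume one character from each side
    return False

# e.g. matches("re*r?ion", "recursion") -> True
```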
I also understand how it can help people who do not want to, or cannot, make the effort to learn an encyclopedia's classification plan, or how a search engine's syntax works.
But on a fundamental level, aren't we just adding an uncontrollable noise-injection step to a decent, time-tested information flow?
But on a fundamental level, aren't we just adding an uncontrollable noise-injection step to a decent, time-tested information flow?
Yes.
Well, the primary thing is that you can ask extremely specific questions and get tailored responses.
That's the best use case for LLMs, imo. It's less a replacement for a traditional encyclopedia (though people use it like that too) and more a replacement for googling your question and landing on a Reddit thread where someone explains it.
The issue comes when people take everything it spits out as gospel and do zero fact-checking on it. Basically, the way they hallucinate is the problem I have with it.
If there's a chance it's going to just flatly make things up, invent statistics, or just be entirely wrong… I'd rather use a normal forum and ask a real person who probably has a clue about whatever question I have. Or try to find where someone has already asked that question and gotten an answer.
If you have to go and fact-check the results anyway, is there even a point? At work I'm now getting entirely AI-generated pull requests with AI-generated descriptions, and when I challenge the dev on why they went with particular choices, they can't explain or back them up.
That’s why I don’t really use them myself. I’m not willing to spread misinformation just because ChatGPT told me it was true, but I also have no interest in going back over every response and double checking that it’s not just making shit up.
Google is so shit nowadays; its main purpose is to sell you things, not to actually retrieve the things you ask for.
Mainly you see this with coding-related questions; results were much better 5 years ago. Now the only way to get anywhere is to ask an LLM and hope it doesn't hallucinate some library that doesn't exist.
Part of the issue is that SEO got better and Google stopped changing things to avoid SEO manipulation.
LLMs are nice for basic research or for explaining stuff in your own terms. Kind of like an interactive encyclopedia. This does sacrifice accuracy, though.
If it actually worked reliably enough, it would be like having a dedicated, knowledgeable, and infinitely patient tutor that you can ask questions to and interactively explore a subject with who can adapt their explanations specifically to your way of thinking. i.e. it would understand not just the subject matter but also you. That would help facilitate knowledge transfer and could reduce the tedium of trying to make sense of something that’s not explained well enough for you to understand (as written) with your current background knowledge but which you are capable of understanding.
Looks like we just found our next head of the Department of Education!
Now we just have to tweak Grok a little and our children will be ready for the first lesson of their new AI education: *checks notes* Was the Holocaust real or just a woke story?
Looking at my ChatGPT "random questions" tab and the things I've asked it, much of it is the kind of thing you probably couldn't look up in an encyclopedia.
For example:
"Is a slight drop in engine rpm when shifting from neutral to 1st gear while holding the clutch pedal down a sign of a worn-out clutch?"
Or:
“What’s the difference between Mirka’s red and yellow sandpaper?”
Now I want to know the answer to the clutch question ☺️
Hopefully, it told you that's not a sign of a worn clutch. Assuming no computer interference and purely mechanical effects, that's a sign the clutch is dragging. A worn clutch would give more of an air gap with the pedal depressed than a fresh clutch would. If you want a partial list of potential causes, see my reply to the other comment that replied to you.
Your questions are still not proof that LLMs are filling some void. If you think of a traditional encyclopedia, of course it's not going to know what the colors of one manufacturer's sandpapers mean. I'm sure that's answered somewhere on their website or wherever you came across the two colors in the same grit and format. Chances are, if one is more expensive and doesn't have a defined difference in abrasive material, the pricier one is going to last longer by way of having stronger backing paper, better abrasive adhesive, and better resistance to clogging. Whether or not the price is necessary for your project is a different story. ChatGPT is reading the same info available to you. But if you don't understand the facts presented on the package, then how can you trust the LLM to tokenize it correctly for you?
Similarly, a traditional encyclopedia isn’t going to have a direct answer to your clutch question, but, if it has thorough mechanical entries (with automotive specifics), you might be able to piece it together. You’d learn the “engine” spins in unison up to the flywheel, the flywheel is the mating surface for the clutch, the clutch pedal disengages the clutch from the flywheel, and that holding the pedal down for 5+ seconds should make the transmission input components spin down to a stop (even in neutral). You’re trusting the LLM here to have a proper understanding of those linked mechanical devices. It doesn’t. It’s aggregating internet sources, buzzfeed style, and presenting anything it finds in a corrupted stream of tokens. Again, if you’re not brought up to speed on how those components interact, then how do you know what it’s saying is correct?
Obviously, the rebuttal is: how can you trust anyone's answer if you're not already knowledgeable? Peer review is great for forums/social sites/wikipedias, in the way of people correcting other comments. But beyond that, for formal informational sites, you have to vet places as a source, a skill being actively eroded by Google or ChatGPT "giving" answers. Neither is actually answering your questions. They're regurgitating things they found elsewhere. Remember, Google was happy to take reddit answers as fact and tell you Elmer's glue will hold cheese to pizza and cockroaches live in cocks. If you saw those answers with their high upvote count, you'd understand the nuance that reddit loves shitty sarcastic answers for entertainment value. LLMs don't, because they literally don't understand anything. It's up to you to figure out if you should trust an algorithm-promoted Facebook page called "car hacks and facts" filled with bullshit videos. It's up to you to figure out if everythingcar.com is untrustworthy because it has vague, expansive wording and more ad space than information.
What's understanding, though? Isn't understanding just a consequence of neurons communicating with each other? In that case, LLMs with deep learning can understand things.
Yeah no
Any explanation? If they can write text, I assume they understand grammar. They are definitely skilled in a way. If you do snowboarding, do you understand snowboarding? The word "understand" can be misleading. That's why I'm asking: what is understanding?
https://en.wikipedia.org/wiki/Disjunctive_sequence
By your logic, these numbers understand grammar too, because they can form sentences.
Even better: anything that any human could ever say is contained in them, and as such, humanity has a more limited understanding of grammar than a sequence of digits does.
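To make that concrete, here's a toy sketch I put together from nothing more than the definition on that Wikipedia page: concatenating every binary string in shortlex order gives a disjunctive sequence, so any short message you encode in binary is guaranteed to appear in it, with zero understanding involved.

```python
from itertools import product

def disjunctive_prefix(max_len: int) -> str:
    """Concatenate every binary string of length 1..max_len, in shortlex order."""
    return "".join(
        "".join(bits)
        for length in range(1, max_len + 1)
        for bits in product("01", repeat=length)
    )

# Any finite message, encoded in binary, appears in a long enough prefix --
# simply because every binary string of that length is one of the blocks.
message = "hi"                                   # keep it short: the prefix grows ~2^len
encoded = "".join(f"{ord(c):08b}" for c in message)
prefix = disjunctive_prefix(len(encoded))        # 16-bit strings -> roughly 2 MB of digits
print(encoded in prefix)                         # True: containment, not comprehension
```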
You cannot define understanding by the results, and even if you did, AIs give horrible results that prove they do nothing more than automatically put words next to each other based on the likelihood of them making sense to humans.
They do not understand grammar, just as they do not understand anything else; they are simply an algorithm made to spit out "realistic" answers without having to actually understand them.
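To make "put words next to each other based on likelihood" concrete, here's a toy sketch of the principle (a bigram counter I made up for illustration; a real LLM is a far bigger neural net, but the output step is still "pick a likely next token"):

```python
import random
from collections import Counter, defaultdict

# Count which word tends to follow which in a tiny corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word(word: str) -> str:
    counts = following[word]
    if not counts:                             # dead end (last word of the corpus)
        return random.choice(corpus)
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]   # sample by likelihood, nothing more

word, output = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    output.append(word)
print(" ".join(output))    # plausible-looking word salad, no understanding required
```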
Another example of that is AIs that generate images: they're full of nonsense because the AI doesn't understand what it's making, and that's why you end up with weird artifacts that seem completely absurd to any human with a basic understanding of reality.
But LLMs are not simply probabilistic machines. They are neural nets. For sure, they haven't seen the world. They didn't learn the way we learn. What they mean by a caterpillar is just a vector; for humans, it's a 3D, colorful, soft object with certain traits.
You can't expect a being that sees characters and produces characters to know what we mean by a caterpillar. Its job is to figure out the next character. But you could expect it to pick up some grammar rules, even though we can't expect it to explain the grammar.
For another example, I wrote a simple neural net, and with 6 neurons it could learn XOR. I think we can say that it understands XOR, can't we? Or would you then say that an XOR gate understands XOR better? I wouldn't use the word "understand" for something that cannot learn, but why wouldn't we use it for a NN?
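It was something along these lines (not the exact code, just a rough sketch of the idea: 2 inputs, 3 hidden sigmoid units, 1 output, so six units counting the inputs, trained with plain gradient descent):

```python
import numpy as np

# Tiny MLP learning XOR: 2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (squared-error loss), full-batch gradient descent
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(3).ravel())  # typically ends up close to [0, 1, 1, 0] -- but is that "understanding"?
```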
Your whole logic is based on the idea that being able to do something means understanding that thing. This is simply wrong.
Humans feel emotions, yet they don’t understand them. A calculator makes calculations, but no one would say that it understands math. People blink and breathe and hear, without any understanding of it.
The concept of understanding implies some form of meta-knowledge about the subject. Understanding math is more than using math; it's about understanding what you're doing and doing it with intention. All of those things are absent in an AI, neural net or not. They cannot "see the world" because they need to be programmed specifically for a task to be able to do it; they are unable to actually grow out of their programming, which is what understanding would ultimately lead to. They simply absorb data and spit it back out after doing some processing, and the fact that an AI can be made to produce completely incompatible results shows that there is nothing behind it.