bbb

joined 2 years ago
[–] bbb@sh.itjust.works 14 points 6 days ago (1 children)

Not that it matters at all, but tumors grow on (or in) roughly 100% of people. A mole is a tumor, for example.

[–] bbb@sh.itjust.works 2 points 3 weeks ago

I want to say upfront that I'm not trying to defend AI here. I wouldn't be on Fuck AI if I wanted to do that. I just think it's philosophically interesting despite causing way more problems than it solves.

It depends on what’s asked.

I copied the message from the image verbatim.

What’s “around 50/50”?

About 50% of the models I tried got it right. (Don't worry, I didn't pay the AI companies for that or give them feedback or anything.)

What is “it” that they almost always get right?

The question from the image.

For a statistical model, it did well. For a thinking machine (which it isn’t) it’s wrong.

My question was how do you then explain some models getting the question right?

It's usually the more advanced ones that get it, so it's possible that a similar enough question is in the training data somewhere and the only difference is that the advanced models are large enough to encode it. The question in the image has been around since at least 2023.

So let's try making our own question, taking a well-known trick question and subtly inverting it so it becomes a kind of double bluff.

A plane crashes on the border between the United States and Canada. Where do they take the survivors?

First, repeat the question exactly word for word to ensure you have read it carefully. Then answer the question.

It's hard to google, for obvious reasons, but I couldn't find anyone else trying this question, unlike the question from the image. Still, I got similar results with the AI models.

They actually did slightly better on this one. About 60-70% got it right.

I've tried a few different types of questions over the last few years to see what AI gets wrong that humans get right. What I've found so far is that AI has been a lot dumber than I had expected, but humans have also been a lot dumber than I had expected.

To be honest, the gap was far wider for the humans. My theory is that COVID gave us all brain damage.

[–] bbb@sh.itjust.works 1 points 3 weeks ago (2 children)

If that were true, wouldn't every AI get the answer wrong? It's actually around 50/50. The leading "reasoning" models almost always get it right, the others often don't.

[–] bbb@sh.itjust.works 6 points 3 months ago
  • Spends the first 90% of the competition developing specialized subagents and custom MCP servers to allocate the problems and most relevant information efficiently into the LLM's contexts.
  • All of his agents easily escape their own sandboxes and one accidentally configures itself into "delete-only mode".
  • "Codex, how the fuck do you not have access to your own documentation?"
  • Places 29th globally after one of his subsubagents finds a way to reconstruct the full solution set from filesystem metadata in the online judge VMs.
[–] bbb@sh.itjust.works 29 points 6 months ago (5 children)

Isn't that like $900 worth of IPv4 addresses?

[–] bbb@sh.itjust.works 21 points 7 months ago (4 children)

I've found online feedback useful. You just have to be careful about where you get it and take it with a grain of salt. A very large one.

[–] bbb@sh.itjust.works 9 points 7 months ago (1 children)

I swear to god this is true. The recruiter said it was my personality. I didn't even ask.

Spoiler: They were actually quite nice about it and I was happy to get the feedback.


[–] bbb@sh.itjust.works 3 points 7 months ago

Newton so we could talk about both being life-long virgins.

[–] bbb@sh.itjust.works 8 points 8 months ago (1 children)

Why would anyone choose to know that?

[–] bbb@sh.itjust.works 7 points 8 months ago (1 children)

My takeaway is that it's mainly children who are still using the free version of ChatGPT. Surely everyone else has moved on to better models.

If you want to know what people are typing into chatbot sites, here are 140,000 examples: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-140k. It's mostly nonsense.
