Ask HN: Having terrible time with paid versions of ChatGPT and Claude
Using for really simple bash programming tasks.
Paid versions (lowest tiers) of both: Claude Sonnet 4; ChatGPT 4o; Code is MacOS.
Going around in circles for things as simple as 'please mark the end of the script with #finish of script', and often leaving off parts of the script (Claude).
Failing to find missing braces, a task that is easy for a human.
Asking me to run a sed command to count up the braces, then saying 'oh, I see we need an extra "{"' (but then not even fixing it). Annoying.
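For reference, this is roughly the check I end up doing by hand anyway (a rough sketch; build.sh is just a placeholder file name):

    # rough brace count; ignores braces inside strings and comments
    grep -o '{' build.sh | wc -l
    grep -o '}' build.sh | wc -l

    # bash's own syntax check usually pinpoints the problem faster
    bash -n build.sh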
Often told to 'start a new chat, limit reached'.
Can't properly handle coloring of text in the terminal: it figures out the fix, then forgets it later when making other changes.
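The coloring I'm asking for is nothing fancy, just the usual tput pattern (a minimal sketch, assuming an ANSI-capable terminal):

    # define the colors once so every echo can reuse them
    red=$(tput setaf 1)
    green=$(tput setaf 2)
    reset=$(tput sgr0)

    echo "${green}OK${reset} script finished"
    echo "${red}ERROR${reset} missing brace"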
What are others experiencing?
> Often told to 'start a new chat, limit reached'.
Are you using one conversation thread to implement lots of features? That's not recommended, as the huge context makes the model's behaviour unstable. Restart the conversation with an updated prompt when you reach the end of a task; assuming it has all the required information, the new conversation will perform better.
Generate multi-shot prompts of previous requests and successful outputs.
LLMs below Opus produce a surprisingly high rate of malformed bash commands that have to be corrected iteratively, or just shown the right way from the beginning using the aforementioned multi-shot prompting.
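In practice that just means pasting a couple of known-good request/output pairs ahead of the new task, something like this (the examples below are invented for illustration):

    Request: print "done" in green at the end of the script
    Output: echo "$(tput setaf 2)done$(tput sgr0)"

    Request: count the opening braces in deploy.sh
    Output: grep -o '{' deploy.sh | wc -l

    New request: <your actual task>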
For the Claude web app, the best way to have the agent remember things is to have it open a running to-do list in an artifact that you ask it to update frequently.
Thank you for that idea. I've experimented with it, starting with the text below and then asking how I 'tell you' to use this (which it replied to).
"This is where I would keep track of things that I always want included in any bash or other routine I write that is executed on the command line. I will add to it as I think of new items:
I'm getting Claude Code to one-shot deployment scripts. It's pretty great.
I use Claude Code for all of my interactions with Claude now. It's a way better experience.
It’s been pretty hit or miss for me with Copilot. Between ChatGPT, Claude, and Gemini, I’ve had the best luck with Gemini. Different models may work better or worse with what you’re trying to do.
Sometimes it works well. Sometimes it doesn't.
Opus and o3 usually work ok for me, meaning they get the job done with some steering.
Opus seems like overkill. o3 is probably the most accurate and cost-effective at the cost of a little creativity.
> Code is MacOS
What do you mean by this? AppleScript isn't a super common language these days, and you gotta acknowledge it's not what LLMs are fine-tuned on. You might be better off asking for a zsh script that does the same thing.
My experience with Claude has been positive as long as the code itself is broken down into ~500 SLOC or less in each module. Huge run-on programs don't seem to play nice with modern context windows.
Sorry. I meant platform is MacOS.