Can Manus AI Build a Multi-Agent System That Runs on 4GB VRAM? I Put It to the Test.

I got early access to Manus AI a few days ago, but at first, I wasn’t quite sure what I was going to use it for.

For a while now, I’ve been itching to build something on the agentic end: something that could blend Ollama models, diffusion models, and vision-language models (VLMs), and integrate with MCP tools. I also wanted it to be plug-and-play and Docker-ready.


So I dropped this prompt into Manus:

“Write me an Agent that can work with ollama models. My focus is on VLMs and stable diffusion. I plan to deploy this on a free software service. Also, create a git repo that can push to GitHub.”

This being my first experience with Ollama, I hadn’t realized it lacks support for Stable Diffusion until after I started using it. Apparently, neither had Manus: it happily wrote code for Ollama with SD support, which was wrong.

What happened next honestly surprised me.

After about 10–15 minutes of searching and processing, it began documenting all components in Markdown files, setting up the Git repo, and diving deep into concepts like MCP, Ollama, and deployment setups.

Midway, I pushed it further:

“Accommodate multi-agent plug-and-play mechanisms, tools, and MCP tools. You can use Smithery.ai for MCP. Also, what can be done with a 4GB VRAM machine? If possible, have agents that can run on 4GB VRAM.”

It didn’t blink.

Manus generated all the required code, packaged it neatly into a zip, and delivered a fully structured project, ready to roll.

Here’s what the project structure looks like:

AGENTS/
├── design/                 # 🧩 Architectural and system design documents
├── docs/                    # 📚 User and developer documentation
├── plugins/                # 🔌 Plugin modules for extending agent capabilities
├── research/              # 📖 Researched topics and notes
├── scripts/                 # 🛠️ Utility scripts for setup, testing, or running pipelines
├── src/                       # 🧠 Core source code
│   ├── agents/            # Different types of autonomous agents
│   ├── api/                  # REST or internal APIs to interact with agents
│   ├── config/             # Configuration files and schemas
│   ├── core/                # Core logic, orchestrator, pipelines
│   ├── mcp/                # Model Context Protocol (MCP)
│   ├── models/           # ML or LLM models and wrappers
│   ├── plugins/           # Plugin integrations
│   ├── tools/               # Tools usable by agents (e.g., web search, calculators)
│   ├── utils/                # Utility functions (logging, helpers, etc.)
│   └── main.py           # Entry point for running the project
├── tests/                    # ✅ Unit and integration tests
├── .gitignore
├── README.md            # 📄 Project overview and instructions
├── requirements.txt      # 📦 Python dependencies
├── setup.py                  # ⚙️ Installation script
└── todo.md                  # 📝 Pending tasks or roadmap        


Now the actual question is—does it really work?

Well... not out of the box.

I had to make quite a few changes to get everything running smoothly. First off, I forgot to mention that I’d be running this on a Windows system—so naturally, the asyncio event loop had to be configured like this:

import asyncio
import platform

# Windows defaults to the Proactor loop; some async libraries need the selector loop
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

Next, the ollamaclient code Manus generated used aiohttp, which, for some reason, couldn’t communicate with my local Ollama server. So I rewrote it (well, prompted GPT/Grok) to use the official Ollama Python package instead.

Since ollamaclient was the backbone for all agents—VLM, Stable Diffusion, etc.—I had to update all related calls to work with the new Ollama Python API.
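For reference, here’s roughly what the rewritten client call looks like. This is a minimal sketch using the official ollama package’s AsyncClient; the model tag and image path are just examples, and the real agent code wraps this in task handling:

import asyncio
from ollama import AsyncClient

async def describe_image(image_path: str) -> str:
    # AsyncClient defaults to the local server at http://localhost:11434
    client = AsyncClient()
    response = await client.chat(
        model="llava:7b-v1.5-q4_0",
        messages=[{
            "role": "user",
            "content": "Describe this image.",
            "images": [image_path],  # attaches the file for the VLM
        }],
    )
    return response["message"]["content"]

print(asyncio.run(describe_image("dog_frisbee.jpg")))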

Then came the API server, which was originally set up with blocking calls using uvicorn. This clashed with the async event loop, so I had to refactor it to run in async mode as well.
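The fix, in short: instead of calling uvicorn.run(), which starts its own blocking event loop, build a uvicorn.Server and await it from the loop the agents already use. A rough sketch, assuming a FastAPI app object:

import uvicorn
from fastapi import FastAPI

app = FastAPI()

async def start_api_server():
    # Server.serve() runs on the current event loop instead of creating a new one
    config = uvicorn.Config(app, host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)
    await server.serve()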


Some smaller issues popped up along the way too:

  • CUDA version: I was on 11.5, which was ancient. I had to upgrade to 12.5 to support resource monitoring with PyTorch. Manus had written a resource tracker that detects GPU, RAM, and CPU capacity and dynamically decides how to run models (4-bit, 8-bit, or no quantization); see the sketch after this list.
  • A few misplaced arguments and broken module imports needed manual fixes.
  • requirements.txt was generated empty at first. I prompted again and got a decent list, but it was missing PyTorch, which I had to add myself.
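Manus’s tracker code is too long to paste here, but the core idea fits in a few lines. A minimal sketch, assuming PyTorch and psutil are installed (the thresholds are illustrative, not Manus’s actual values):

import torch
import psutil

def pick_quantization() -> str:
    # Choose a quantization level from the available GPU VRAM
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb >= 12:
            return "none"   # full precision fits comfortably
        if vram_gb >= 6:
            return "8bit"
        return "4bit"       # e.g. my 4GB card
    # No GPU: fall back to CPU and let system RAM decide
    ram_gb = psutil.virtual_memory().total / 1024**3
    return "8bit" if ram_gb >= 16 else "4bit"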


Finally, after patching all the bits, I was able to run llava:7b-v1.5-q4_0 on a 4GB VRAM machine.
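If you want to try the same model, the tag comes straight from the Ollama registry. The image filename below is just an example; for multimodal models, the Ollama CLI picks up local file paths mentioned in the prompt:

ollama pull llava:7b-v1.5-q4_0
ollama run llava:7b-v1.5-q4_0 "What is in this image? ./dog_frisbee.jpg"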


Below is a screenshot showing the image I passed to LLaVA (a dog with a frisbee) along with some logs.

  • The red box highlights the request being received and the corresponding task creation for image understanding.
  • The green box shows the response generated by the Agent.

The first few sentences in the response were actually spot-on, but after that, the LLaVA model started to hallucinate a bit. Classic.

The first request took 40.19 seconds, most of which was presumably the model being loaded into VRAM; subsequent requests were served in around 10 seconds.

[Screenshot: VLM Agent running the LLaVA-1.5 4-bit quant on a 4GB VRAM machine.]


Next up

I’ll be integrating more agents and models. I still need to add MCP tools—I’ll explore the best options out there, test them with Docker, and try out cloud deployment.

I’m also planning to optimize model inference and integrate a Stable Diffusion pipeline that can run smoothly on a 4GB VRAM machine.
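The usual recipe for squeezing Stable Diffusion onto a 4GB card is fp16 weights plus the memory helpers in diffusers. A minimal sketch of what I plan to try (the model ID is the standard SD 1.5 checkpoint, not something Manus picked):

import torch
from diffusers import StableDiffusionPipeline

# fp16 halves the weight memory; slicing and CPU offload trade speed for VRAM
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
pipe.enable_model_cpu_offload()  # keeps only the active submodule on the GPU

image = pipe("a dog catching a frisbee").images[0]
image.save("frisbee_dog.png")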

After that, I will finally release it on GitHub for people to access.


Thank you, Manus AI, for providing early access and giving free credits to try the service.


Follow me, Jaimin B., for more upcoming blogs and updates.

I also run an Instagram page @cv_for_everyone where I post daily updates on new computer vision papers, along with quick TLDRs.

Coming soon: Podcasts diving into in-depth computer vision research and casual conversations around the field.

