Devin AI Software Engineer is Hugely Misrepresented

Devin, the AI software engineer from Cognition Labs, is causing a stir in the tech world. This advanced system aims to replace human coders with artificial intelligence. But does Devin live up to the hype, or is it just clever marketing?

After a week of hands-on testing, it’s clear that Devin doesn’t quite meet expectations. While it tackles complex projects, from web development to machine learning, its performance falls short.

Devin can create front-end interfaces, build APIs, train models, and even deploy applications, but not without significant issues and human intervention.

Key takeaways

Devin struggles with diverse coding tasks across front-end, back-end, and machine learning. It can deploy applications but requires extensive human guidance and oversight.

The promotional video misrepresented its capabilities, making it seem far more advanced than it is.

About Devin: Cognition Labs’ AI software engineer

Cognition Labs created Devin, an AI software engineer. The company announced Devin in March 2024, sparking discussions about AI replacing human jobs. After weeks of silence, the company opened access to the tool for testing.

Devin works in a user interface with sessions, a chat area, and a workspace. The workspace includes a terminal, browser, code editor, and planner. You can interact with Devin throughout the process, providing guidance and corrections.

In tests, Devin tackled projects like building a digit classification app. It made decisions on technology choices, such as using TensorFlow.js. When faced with challenges, Devin asked for help and adapted based on user input, but not efficiently.

What Devin can do (and can’t do well)

Devin can handle various tasks, but with significant limitations:

  • Creating front-end and back-end applications, but often with errors
  • Training machine learning models, but with low accuracy
  • Deploying applications (e.g., to Heroku), but with frequent failures
  • Integrating different components, but leaving unused code and errors

While Devin completed projects, it frequently left unused code files and encountered deployment issues that required manual resolution.

AI software engineers: A new frontier

AI software engineers aim to change the tech landscape. These systems are pitched as replacements for human coders. Devin, created by Cognition Labs, is one of the first AI engineers making waves, but it falls short of its promises.

Real-world performance

Devin can handle complex coding tasks on paper, but in reality, it fails to deliver. It creates web interfaces, builds backend systems, and even trains machine learning models, but not without numerous errors and significant human intervention.

Lack of adaptability

Devin’s supposed strength is its ability to adapt. In practice, when faced with challenges, it frequently stalls and requires detailed human guidance. This lack of true adaptability limits its usefulness in tackling a wide range of projects.

Deployment struggles

Devin also manages deployments but often fails to execute them correctly. It can set up hosting accounts and push code to platforms like Heroku, but with many errors. The promotional video overstates its effectiveness in getting applications online and running smoothly.

Challenges: The AI engineer isn’t reliable. It often creates unnecessary files or leaves unused code in projects, requiring extensive cleanup and optimization.

Despite its advertised capabilities, Devin isn’t ready to replace human developers. It still needs constant guidance and oversight. The promotional materials misrepresent the reality of Devin’s abilities, making it seem more competent than it is.

How we tested Devin

What we aimed to do

We assessed Devin’s capabilities through five assigned projects. These tasks tested Devin’s skills in:

1) Front-end development
2) Back-end programming
3) Machine learning
4) System integration
5) Deployment

The first project involved creating a web app that classifies handwritten digits using machine learning. This task required Devin to:

1) Build a web interface for image uploads
2) Train a TensorFlow model on the MNIST dataset
3) Create a backend API
4) Connect all components
5) Deploy the finished application

Devin’s choices:

  • Initially tried using TensorFlow.js before switching to a Flask API
  • Attempted deployment on Heroku
  • Used ngrok for temporary link sharing

While Devin completed the project, the code had significant issues:

  • Unused files and “dead” code remained
  • The editor interface had display problems with some filenames
  • Multiple unused model conversion scripts were present

Devin’s limitations

AI-powered model integration

Devin struggles with integrating machine learning models into applications. Its work with popular frameworks like TensorFlow often results in low accuracy and numerous errors across tasks such as:

  • Importing and preprocessing datasets (e.g., MNIST for digit recognition)
  • Training models using appropriate algorithms
  • Converting models between formats as needed
  • Deploying trained models as part of web applications
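
To make the preprocessing step concrete, here is a minimal Python sketch of what preparing an uploaded image for an MNIST-style model usually involves. It is not Devin’s code. The function names `preprocess` and `predicted_digit`, the `invert` flag, and the assumption that the upload has already been resized to a 28x28 grayscale grid are all illustrative.

```python
from typing import List, Sequence

MNIST_SIDE = 28  # MNIST images are 28x28 grayscale


def preprocess(pixels: Sequence[Sequence[int]], invert: bool = False) -> List[float]:
    """Scale a 28x28 grid of 0-255 grayscale values to a flat list of floats in [0, 1].

    MNIST digits are light strokes on a dark background, so pass invert=True
    for dark-on-light uploads (an assumption about the input image).
    """
    if len(pixels) != MNIST_SIDE or any(len(row) != MNIST_SIDE for row in pixels):
        raise ValueError("expected a 28x28 image")
    flat: List[float] = []
    for row in pixels:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError("pixel values must be in 0..255")
            scaled = value / 255.0
            flat.append(1.0 - scaled if invert else scaled)
    return flat


def predicted_digit(scores: Sequence[float]) -> int:
    """Return the index (0-9) of the highest of the 10 class scores."""
    if len(scores) != 10:
        raise ValueError("expected 10 class scores")
    return max(range(10), key=lambda i: scores[i])
```

A mismatch at this stage, such as unscaled pixel values or inverted colors, is one common reason a model that trains well still predicts poorly on real uploads. Whether that explains the low accuracy seen in this test is not something the session confirmed.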

Decision-making: Devin’s decisions about model architectures and deployment strategies are often flawed. For example, it may opt to use TensorFlow.js for client-side inference when it’s not appropriate.

Application development challenges

Devin attempts to build full-stack web applications but falls short. It attempts tasks such as:

  • Creating Flask APIs to serve machine learning models, often with errors
  • Developing responsive front-end interfaces that are prone to bugs
  • Integrating front-end and back-end components poorly
  • Deploying applications to cloud platforms like Heroku with frequent failures

Devin’s attempts at handling the entire development lifecycle—from initial setup to final deployment—are often unsuccessful. It requires constant adjustments based on feedback and significant human intervention.

Devin’s digital project

Breaking down Devin’s interface

Devin’s workspace offers a user-friendly layout. On the left, you’ll find project sessions. The main interface lets you chat with Devin. The workspace displays:

  • A terminal window
  • A web browser
  • A code editor
  • A task planner

This setup allows you to see Devin’s actions in real-time, highlighting its frequent errors and inefficiencies.

Digit recognition challenge

Devin tackled a digit classification project. The task:

1) Create a web interface for image uploads
2) Build a system to identify digits (0-9)
3) Use TensorFlow and the MNIST dataset

Devin chose TensorFlow.js initially but switched to a Flask API after significant guidance. This change showcased Devin’s inflexibility and need for constant oversight.

Working with Devin

You can interact with Devin throughout the process. Key features:

  • Add information
  • Make corrections
  • Steer Devin in new directions

Help requests: When Devin needs help, it often stalls. For example, Devin requested Heroku credentials for deployment but failed to use them efficiently. The secure input method keeps your data safe but doesn’t compensate for Devin’s other shortcomings.

Devin completed the project despite numerous hiccups. It fixed deployment issues only after repeated attempts and significant human intervention. The final code included many unused files, highlighting Devin’s inefficiency in code organization.

Technical choices and changes

TensorFlow.js model selection

Initially, Devin chose TensorFlow.js for the digit classification project. This decision aimed to simplify the application by running the model directly in the browser. However, the TensorFlow.js approach led to numerous issues.

Switching to Flask API

Because the TensorFlow.js approach ran into obstacles, the project had to be redirected towards a Flask API solution.

This change separated the model inference from the front-end, creating a more traditional web application architecture. The Flask API handles the digit classification on the server, while the front-end focuses on user interaction and image upload.
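
The server-side split described above can be sketched in a few lines of Python. This is an illustration, not the code Devin produced. To keep it runnable without extra dependencies it uses the standard library’s `http.server` in place of Flask. The `/predict` route, the `pixels` JSON field, and `stub_model` (a stand-in that always answers 0) are hypothetical names chosen for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List, Tuple


def stub_model(pixels: List[float]) -> List[float]:
    """Hypothetical stand-in for a trained model: returns 10 class scores.

    It always favours digit 0; real inference would replace this function.
    """
    return [1.0] + [0.0] * 9


def classify_request(
    body: bytes, model: Callable[[List[float]], List[float]] = stub_model
) -> Tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, response payload)."""
    try:
        pixels = json.loads(body)["pixels"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'pixels' field"}
    if not isinstance(pixels, list) or len(pixels) != 784:
        return 400, {"error": "pixels must be a list of 784 values"}
    scores = model(pixels)
    digit = max(range(len(scores)), key=lambda i: scores[i])
    return 200, {"digit": digit}


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: all decisions live in classify_request."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            status, payload = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, payload = classify_request(self.rfile.read(length))
        encoded = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def run(port: int = 8000) -> None:
    """Start the server on localhost; call this manually to try the endpoint."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the validation and inference in a plain function, separate from the HTTP handler, is what makes the server side easy to test without starting a server.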

Benefits: This approach offers several advantages, although Devin struggled to implement it efficiently:

  • Easier debugging and testing of the model
  • More control over the inference process
  • Ability to use Python’s robust machine learning ecosystem
  • Simpler front-end code, as it only needs to make API calls

Even so, Devin’s frequent errors and inefficiencies were apparent throughout.

Evaluating Devin’s performance

Code review and efficiency

Devin’s approach to the digit classification project showed both strengths and weaknesses. Devin struggled to create a working application without significant issues.

The AI engineer built a Flask API and deployed it on Heroku, but not without frequent failures and inefficiencies.

Inefficiencies: Devin’s code contains numerous unused files and redundant conversions. These include multiple model conversion scripts in JavaScript and Python that aren’t necessary for the final product. This excess code clutters the project and makes it harder to maintain.

The Flask API implementation appears functional, but the presence of unused code suggests Devin may struggle with cleaning up after exploring different solutions.
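
Leftover files of this kind can be flagged with a short script. The sketch below is one possible approach, not a tool used in this review. It covers only Python files in a single flat directory, and the names `imported_names`, `unreferenced_modules`, and `entry_points` are hypothetical. It reports any `.py` file that no other file in the folder imports.

```python
import ast
from pathlib import Path
from typing import Set


def imported_names(source: str) -> Set[str]:
    """Collect top-level module names imported anywhere in a source string."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.add(node.module.split(".")[0])
            else:  # handles "from . import name"
                names.update(alias.name for alias in node.names)
    return names


def unreferenced_modules(project_dir: str, entry_points: Set[str]) -> Set[str]:
    """Return .py file names in project_dir that no other file there imports.

    entry_points lists files that are run directly (for example "app.py")
    and are therefore expected not to be imported by anything.
    """
    files = {path.stem: path for path in Path(project_dir).glob("*.py")}
    used: Set[str] = set()
    for stem, path in files.items():
        # A file importing itself does not count as a reference.
        used |= imported_names(path.read_text(encoding="utf-8")) - {stem}
    return {
        path.name
        for stem, path in files.items()
        if stem not in used and path.name not in entry_points
    }
```

A file on the resulting list is only a candidate for removal: scripts that are meant to be run by hand will show up too unless they are listed in `entry_points`.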

The viability of deployed solutions

Devin’s ability to deploy applications is limited. You’ll see that it struggled to deploy the digit classification app on Heroku, requiring significant human intervention.

  • Adaptability: When faced with deployment problems, Devin frequently needed guidance and manual fixes.
  • Temporary solutions: The AI also used ngrok for temporary link sharing, which limits long-term access to the deployed application. This choice may not be ideal for projects needing persistent accessibility.

Devin’s deployment process includes asking for necessary credentials securely, a positive feature for handling sensitive information. You’ll appreciate that it prompts for API keys and secrets through a secure input method, protecting your data, but it doesn’t make up for the overall inefficiencies.

While Devin can deploy applications, its reliance on temporary solutions like ngrok for sharing links is a weakness. For more robust deployments, you might need to guide Devin towards using more permanent hosting options.

What’s next for Devin and AI software engineers

Devin shows potential as an AI software engineer, but it still has significant room to grow.

  • Complex projects: While Devin can attempt to tackle complex projects involving multiple technologies, it often fails to deliver without extensive human intervention.
  • Interactive workflow: The back-and-forth interaction with Devin highlights its dependency on human oversight. You’ll need to course-correct its approach and provide additional information constantly. This dependency limits its efficiency.
  • Decision-making: Devin’s decision-making skills are flawed. It often chooses inappropriate technologies and strategies, requiring significant cleanup and reworking.
  • Deployment capabilities: Deployment capabilities are limited and frequently fail. Devin can attempt tasks like setting up APIs and deploying to cloud platforms but often needs substantial guidance.
  • Future improvements: Going forward, significant improvements are needed in Devin’s code organization, decision-making, and efficiency. As AI software engineers evolve, they will hopefully produce cleaner code with less redundancy. Integration with more tools and platforms will expand their usefulness.
  • Human developers: Devin isn’t ready to replace human developers. It remains far from being an efficient AI engineer. However, with substantial improvements, it could become a more reliable assistant in the future.

You can look forward to future AI tools that might enhance productivity and tackle routine coding tasks more effectively, leaving you free to focus on higher-level design and innovation.

AI Mode

AI Mode is a blog that focuses on using AI tools to improve website copy, write content faster, and increase productivity for bloggers and solopreneurs.
