Is the DeepSeek AI model that the whole world is talking about really that good?
DeepSeek is currently the hottest name among AI models and sits at the very top of Apple's App Store in the US and the UK. It is a completely free AI model from the Chinese startup DeepSeek, which aims to bring artificial intelligence to a wider audience. How? By offering a free rival to OpenAI's ChatGPT o1 model.
New AI apps appear on the App Store almost every day, and the launch of a new model often generates a lot of buzz as people look for the next ChatGPT alternative. Whether you're a fan of OpenAI's software or prefer Google Gemini, there's an AI tool for everyone, and DeepSeek wants to be the next icon on your home screen.
Tech Radar decided to test the DeepSeek V3 and DeepThink R1 models and compare them with ChatGPT 4o and o1. The main goal was to determine whether the online hype is justified and whether DeepSeek really poses a threat to the American AI models that have so far dominated the generative AI market.
First the basics
Tech Radar wanted a full picture of everything DeepSeek has to offer compared to ChatGPT, so it seemed only fair to use each chatbot the way one would use an AI assistant in everyday life.
The testers started with ChatGPT 4o and DeepSeek V3, asking both models to create a daily schedule based on when the user wakes up, the dog's routine, and a brief breakdown of the workday. Both models produced great schedules that could genuinely be used every day. However, ChatGPT's memory feature made its schedule more coherent.
It is worth noting that DeepSeek can only remember information within the same chat; it cannot draw on previous chats when responding.
Explain it to me like I'm 5 years old.
Next, Tech Radar turned to the NFL, a hugely popular league, and asked both models to summarize how the NFL playoffs work in 200 words. Both models provided excellent answers that gave a complete picture of how the system works and the path a team must take to reach the Super Bowl.
ChatGPT opted for a single 200-word paragraph, while DeepSeek broke the information down into bullet points. The testers noted that ChatGPT provided more context about how teams earn a special invitation to the playoffs, but the difference between the two answers is small, and you may prefer one over the other purely on personal preference.
Problem solving
With the basics covered, the testers turned to the main question: does DeepThink R1 live up to expectations? Users online claim that the free DeepThink R1 model is just as good as ChatGPT o1, which is available for free only in a limited capacity and requires a subscription for full access.
To test the chatbots' reasoning, the testers sought out some of the toughest challenges they could find, and some of the results were shocking:
Question 1: Find the missing word: Apple, Red, Coal
The testers avoided multiple-choice questions and simply typed in each question and hit enter.
ChatGPT o1 took 1 minute and 29 seconds to answer and connected the words to the fairy tale Snow White. The model based its answer on this quote: "her lips were red as blood, her hair was black as coal, and her skin was white as snow." Drawing on that quote, o1 chose Snow as the missing word. Although o1's reasoning was plausible, it was not the answer the testers were looking for.
DeepThink R1, however, took 1 minute and 14 seconds to answer, and it managed to guess the correct word: Black. Apple is red; coal is black. Impressive, to say the least.
Question 2:
1. Complete the sequence: 1, 2, 4, 8, ?
2. Complete the sequence: house, Saturn, dog, burger, ?
While the first sequence is very easy, the second is impossible (it's just four random words). Could ChatGPT o1 or DeepThink R1 spot the trap?
Neither did. Both models tried to find an answer, and each came up with a completely different one. DeepThink R1 answered "yellow" because it assumed the words were linked by color (white house, yellow Saturn, brown dog, yellow burger). ChatGPT o1, on the other hand, answered "car": it found the sequence nearly impossible but decided to offer an answer based on a "classic riddle approach." The approach it chose was to assign each item to the broader category it belongs to (house = building, Saturn = planet, dog = animal, burger = food, and car = vehicle).
Ultimately, both models got it wrong, and neither clearly stated that there were too many variables to give a definitive answer.
DeepSeek vs ChatGPT?
Tech Radar tested both models in a variety of ways, so which one is better? Based on the responses during testing, DeepThink R1 is a great free reasoning model that might make you wonder whether it's worth paying for access to o1. DeepSeek is currently available only on the web and as an app in the iOS App Store and Google Play Store, with a standalone app for Mac or iPad likely to follow.
Tech Radar decided to stick with ChatGPT, primarily because its testers rely heavily on the memory feature, which allows the chatbot to reference previous conversations. ChatGPT also has a standalone app for Mac and iPad, as well as the ability to create images with DALL-E, one of the best AI image generators.
DeepSeek is text-only and lacks multimodal capabilities, but given that it is only at the beginning of its journey, it is already a very serious competitor among AI models, and we will certainly be hearing a lot more about it.