Microsoft’s Bing AI made several factual errors in its launch demo last week

Microsoft CEO Satya Nadella

Jordan Novet | CNBC

Last week, as Microsoft and Google tried to one-up each other with demos of early versions of artificial intelligence-powered search, more than 1 million people signed up to try Microsoft’s tool in the first 48 hours, the company said.

Microsoft CEO Satya Nadella told CNBC that the technology, which can produce complete answers that read as if a human wrote them, was “probably the industrial revolution brought to bear on knowledge.”

But for anyone concerned about accuracy, the AI leaves a lot to be desired.

In Microsoft’s demo, the company used the ChatGPT-like technology built into its Bing search engine to analyze earnings reports from Gap and Lululemon in front of reporters. Compared with the actual reports, the chatbot’s answers missed some numbers, and others appear to have been made up.

“The Bing AI got some answers completely wrong during their demo. But no one cared,” wrote independent search researcher Dmitri Brereton in a Substack post on Monday. “Instead, everyone jumped on the Bing hype train.”

In addition to the financial errors, Brereton flagged potential factual problems in the demo’s answers about vacuum cleaner specifications and travel plans to Mexico. He told CNBC that he wasn’t initially looking for errors and only found them when he looked more closely while writing a comparison of the AI unveilings from Microsoft and Google.

AI experts call the phenomenon “hallucination”: the tendency of tools built on large language models to simply make things up. Last week, Google introduced a competing AI tool that also made factual errors, although its inaccuracies were quickly called out.

Both companies are racing to incorporate new types of generative AI into search engines and are eager to show off their progress following the explosion of ChatGPT, which OpenAI released to the public in November. OpenAI has raised billions from Microsoft, while competing startups such as Stability AI and Hugging Face have raised private funding at billion-dollar valuations.

While Google has been reluctant to add AI-generated answers to its search engine because of reputational risk and safety concerns, Microsoft, in its announcement last week, emphasized the near-term benefits of releasing the technology to a limited group of people.

“I think it’s important not to be in the lab,” Nadella said. “You have to get these things out safely.”

When it came time to display Bing AI’s response to a query on corporate earnings, there were a few problems.

Yusuf Mehdi, a marketing executive at Microsoft, navigated to Gap’s investor relations site and asked Bing AI to summarize the “key takeaways” from the retailer’s third-quarter earnings release, published in November.

“Great. Huge time saver,” said Mehdi.

These are screenshots from Microsoft’s demo:

The summary contains several mistakes:

  • Gap’s gross margin was 37.4%. But after removing Yeezy-related charges, adjusted gross margin was 38.7%.
  • Gap’s operating margin was 4.6%, not 5.9%, a figure that appears nowhere in the company’s reports.
  • Adjusted diluted earnings per share was $0.71, not $0.42, a number Gap did not report. The figure Gap reported includes an adjusted income tax benefit of approximately $0.33.
  • Gap withdrew its full-year outlook in August and said in its third-quarter report that “net sales could be down mid-single digits year-over-year in the fourth quarter.” That implies a decline in revenue for the full year, as opposed to the “low double-digit growth” in Bing’s summary. Gap gave no forecast for operating margin or EPS.

Microsoft said that it is aware of the errors and that it expects Bing AI to make mistakes.

“We are aware of this report and have analyzed its findings in our efforts to improve this experience,” a Microsoft spokesperson told CNBC. “We recognize there is still work to be done and expect the system to make mistakes during this preview period, which is why feedback is important so we can learn and help improve the model.”

Mehdi then asked Bing AI to compare Gap’s earnings with Lululemon’s report, pulling information from the two reports into a single table.

“Look how amazing it is,” he said. “Just like that, in one table, I can get the answer to this question. Think how long it would have taken otherwise.”

Here’s what the Bing AI tool returned:

There are several errors in the table, starting with the margins.

  • Lululemon’s gross margin was 55.9%, not 58.7%.
  • The company’s operating margin was 19%, not 20.7%.
  • Lululemon reported diluted EPS of $2 and adjusted EPS of $1.62. Bing showed a diluted EPS number of $1.65.
  • Gap had $679 million in cash and cash equivalents, not $1.4 billion.
  • Gap had $3.04 billion in inventory, not $1.9 billion.
