The New York Ledger

The Surprising Idea That Generative AI Might Be Better Off Using Visual Images Of Text Rather Than Pure Text As Tokens

October 25, 2025
in Business

In today’s column, I examine a rather ingenious idea that cleverly turns the standard approach of generative AI and large language models (LLMs) on its head. Simply stated, consider the bold notion that instead of generative AI receiving pure text, the text is first captured as images, and the images are then fed into the AI.

Say what?

For anyone versed in the technical foundations of LLMs, this seems utterly oddball and counterintuitive. You might already be shouting aloud that it makes no sense. Here’s why. An LLM is built to handle natural languages such as English and, accordingly, makes abundant use of text. Text is how we normally enter prompts and pose our questions to LLMs. Opting to use pictures of text, in place of actual text, has got to be a screwball notion. Blasphemous.

Hold onto your hat, because some earnest researchers tried the approach, and there is enough merit that we ought to give this flight of fancy a dose of seriously devoted, thorough attention.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage of the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Tokenization Is Vital

The heart of the matter involves the tokenization aspects of modern-era generative AI and LLMs. I have covered the details of tokenization at the link here. I will give a quick overview to get you up to speed.

When you enter text into AI, the text gets converted into a series of numbers. Those numbers are then handled throughout the rest of the processing of your prompt. When the AI has arrived at a response, the response is actually in a numerical format and needs to be converted back into text so it is readable by the user. The AI proceeds to convert the numbers into text and displays the response accordingly.

That entire process is known as tokenization. The text that you enter is encoded into a set of numbers. The numbers are referred to as tokens. The numbers, or shall we say tokens, flow through the AI and are used to figure out answers to your questions. The response is initially in the numerical format of tokens and needs to be decoded back into text.
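
To make the encode-and-decode round trip concrete, here is a minimal sketch. The choice of the open-source tiktoken package is my own for illustration; any comparable tokenizer library would show the same thing.

    # Minimal sketch of the tokenization round trip, assuming the open-source
    # tiktoken package (pip install tiktoken); any comparable tokenizer works.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # a commonly used encoding

    prompt = "The dog chased the ball across the yard."
    tokens = enc.encode(prompt)      # text -> list of integer tokens
    restored = enc.decode(tokens)    # tokens -> text

    print(tokens)              # a handful of integers, roughly one per word piece
    print(len(tokens))         # how much of the context window this prompt consumes
    assert restored == prompt  # lossless round trip for ordinary text

Note that it is the token count, not the character count, that gets charged against the model’s context window.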

Thankfully, an everyday user is blissfully unaware of the tokenization process. There is no need for them to know about it. The topic is of keen interest to AI developers, but of little interest to the public. All sorts of numerical trickery is routinely employed to make the tokenization process as fast as possible so that the AI isn’t held up during the encoding and decoding that needs to take place.

Tokens Are A Problem

I mentioned that the public generally does not know about the tokenization aspects of LLMs. That’s not always the case. Anyone who has pushed AI to its limits is likely vaguely familiar with tokens and tokenization.

The deal is this.

Most of the major LLMs, such as OpenAI’s ChatGPT and GPT-5, Anthropic Claude, Meta Llama, Google Gemini, xAI Grok, and others, are rather limited in the number of tokens they can properly handle at one time. When ChatGPT first burst onto the scene, the number of allowed tokens in a single conversation was quite limited.

You would rudely discover this reality when ChatGPT suddenly could no longer recall the earlier parts of your conversation. This was due to the AI hitting the wall on how many active tokens could exist at one time. The tokens from earlier in your conversation were summarily being tossed aside.

If you were carrying on any lengthy and intricate conversations, these constraints were exasperating and practically knocked any big-time use of generative AI out of contention. You were restricted to relatively short conversations. The same problem arose when you imported text via an approach such as RAG (see my discussion at the link here). The text had to be tokenized and once again counted against the limit on how many active tokens the AI could handle.

It was maddening to those who had dreams of using generative AI for larger-scale problem-solving.

Limits Are Higher But Still Exist

The early versions of ChatGPT had a limit of less than 10,000 tokens that could be active at any moment. If you think of a token as representing a small word, such as “the” or “dog”, this means you hit the wall once your conversation had consumed roughly ten thousand ordinary words. This was intolerable at the time for any extended or intricate use.

Nowadays, the standard version of GPT-5 has a token context window of about 400,000 tokens. That is considered the total capacity associated with both the input tokens and the output tokens as a combined total. Context window sizes can vary. For instance, Claude has a limit of about 200,000 tokens on some of its models, while others extend further to around 500,000 tokens.
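
Because the window covers input and output combined, developers typically budget prompt content, such as RAG passages, against it before sending a request. Here is a minimal sketch of that bookkeeping, again assuming a tiktoken-style tokenizer; the window size and output reserve below are illustrative numbers, not official limits of any particular model.

    # Hedged sketch: trimming retrieved passages to fit a context window.
    # The window size and output reserve are illustrative, not official limits.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    CONTEXT_WINDOW = 400_000   # combined input + output budget (illustrative)
    OUTPUT_RESERVE = 8_000     # tokens held back for the model's reply

    def fit_passages(question: str, passages: list[str]) -> list[str]:
        """Keep only as many retrieved passages as the token budget allows."""
        budget = CONTEXT_WINDOW - OUTPUT_RESERVE - len(enc.encode(question))
        kept = []
        for passage in passages:
            cost = len(enc.encode(passage))
            if cost > budget:
                break              # the next passage would blow the budget
            kept.append(passage)
            budget -= cost
        return kept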

A visionary view of the future is that there won’t be any constraints on the allowed number of tokens. There is advanced work on so-called infinite or unlimited memory in AI that would practically enable any number of tokens. Of course, in a practical sense, there is only so much server memory that can exist; thus, it isn’t truly infinite, but the claim is catchy and reasonably fair. For my explanation of how AI infinite memory works, see the link here.

Managing The Token Issue

Because tokenization is at the core of how most LLMs are devised and utilized, a great deal of effort has been strenuously devoted to trying to improve the tokenization aspects. The aim is to somehow make tokens smaller, if possible, allowing more tokens to exist within whatever memory constraints the system has.

AI developers have repeatedly sought to compress tokens. Doing so could be a big help. Whereas a token window might normally be limited to 200,000 tokens, if you could shrink each token down to half its usual size, you could double the limit to 400,000 tokens. Nice.

There is a troublesome catch associated with the compression of tokens. Generally, yes, you can squeeze them down in size, but precision gets undercut when you do so. That’s bad. It might not be overly bad in the sense that the tokens are still viable and usable. It all depends on how much precision gets sacrificed.

Ideally, you would want the maximum possible compression while retaining 100% of the precision. It’s a lofty goal. The odds are that you will need to weigh compression levels against precision. Like most things in life, there is never a free lunch.

Knock Your Socks Off

Suppose we allowed ourselves to think outside the box.

The typical approach with LLMs is to accept pure text, encode the text into tokens, and proceed on our merry way. We would normally begin our thinking about tokenization by logically and naturally assuming that the input from the user will be pure text. They enter text via their keyboard, and text is what gets converted into tokens. It’s a straightforward approach.

Ponder what else we might do.

Seemingly out of left field, suppose we treated text as images.

You already know that you can take a picture of text, have it optically scanned, and either keep it as an image or later convert it into text. The process is a longstanding practice known as OCR (optical character recognition). OCR has been around since the early days of computers.

The typical OCR process involves converting images into text and is referred to as image-to-text. Sometimes you might want to do the reverse, namely, you have text and want to turn the text into images, which is text-to-image processing. There are lots and lots of existing software applications that will gladly do image-to-text and text-to-image. It is old hat.
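
As a quick illustration of how routine that round trip is, here is a minimal sketch that renders a string to an image with Pillow and reads it back with pytesseract. Both libraries are my own choice for illustration, and pytesseract additionally requires the Tesseract OCR engine to be installed on the machine.

    # Hedged sketch of the text-to-image and image-to-text round trip,
    # using Pillow for rendering and pytesseract for OCR (both are
    # illustrative choices; pytesseract also needs the Tesseract binary).
    from PIL import Image, ImageDraw, ImageFont
    import pytesseract

    def text_to_image(text: str) -> Image.Image:
        """Render a line of text onto a plain white canvas."""
        img = Image.new("RGB", (800, 60), "white")
        draw = ImageDraw.Draw(img)
        draw.text((10, 20), text, fill="black", font=ImageFont.load_default())
        return img

    original = "OCR has been around since the early days of computers."
    image = text_to_image(original)                          # text -> image
    recovered = pytesseract.image_to_string(image).strip()   # image -> text

    print(recovered)   # ideally matches the original, minus any OCR hiccups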

Here’s the crazy idea about LLMs and tokenization.

We still have people enter text, but we take that text and convert it into an image (i.e., text-to-image). Next, the image of the text is fed to the token encoder. Thus, rather than encoding pure text, the encoder is encoding based on pictures of text. When the AI is ready to provide a response to the user, the tokens are converted back into text, relying on image-to-text conversion.
Boom, drop the mic.
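
To make the shape of that pipeline concrete, here is a heavily hedged skeleton: render the prompt to a page image, hand the page to a vision encoder, and later decode the model’s vision tokens back into text. Only the rendering step is real code; the encoder and decoder are placeholders standing in for a model along the lines of DeepSeek-OCR, not actual APIs, and the page layout and sizes are assumptions for illustration.

    # Hedged skeleton of the proposed flow; the vision encoder and decoder are
    # placeholders standing in for a DeepSeek-OCR-style model, not real APIs.
    from PIL import Image, ImageDraw, ImageFont

    def render_text_as_image(text: str) -> Image.Image:
        """Text-to-image: lay the prompt out on a white page (illustrative layout)."""
        img = Image.new("RGB", (1024, 1024), "white")
        draw = ImageDraw.Draw(img)
        lines = [text[i:i + 100] for i in range(0, len(text), 100)]
        draw.text((16, 16), "\n".join(lines), fill="black",
                  font=ImageFont.load_default())
        return img

    def vision_encode(page: Image.Image) -> list[int]:
        """Placeholder: a vision encoder would map the page to a short sequence
        of vision tokens (the paper reports a few hundred per page)."""
        raise NotImplementedError("stand-in for a model such as DeepSeek-OCR")

    def decode_to_text(vision_tokens: list[int]) -> str:
        """Placeholder: the decoder reads the text back out of the vision tokens."""
        raise NotImplementedError("stand-in for the model's text decoder")

    # Intended flow, end to end:
    #   page = render_text_as_image(user_prompt)
    #   vision_tokens = vision_encode(page)   # far fewer tokens than the raw text
    #   ...the LLM reasons over those tokens...
    #   reply = decode_to_text(reply_tokens)  # image-to-text on the way back out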

Understanding The Surprise

Whoa, you might be saying, what good does this fiddling with images accomplish?

If the image-to-token conversion can get us to smaller tokens, we may be able to compress tokens. This, in turn, means we can potentially have more tokens within the bounds of limited memory. Remember that the compression of tokens is squarely on our mind.

In a recently posted research paper entitled “DeepSeek-OCR: Contexts Optical Compression” by Haoran Wei, Yaofeng Sun, and Yukun Li, arXiv, October 21, 2025, the researchers made these claims (excerpts):

  • “A single image containing document text can represent rich information using significantly fewer tokens than the equivalent digital text, suggesting that optical compression through vision tokens may achieve much higher compression ratios.”
  • “This insight motivates us to reexamine vision-language models (VLMs) from an LLM-centric perspective, focusing on how vision encoders can enhance LLMs’ efficiency in processing textual information rather than conventional VQA, which humans excel at.”
  • “OCR tasks, as an intermediate modality bridging vision and language, provide an ideal testbed for this vision-text compression paradigm, as they establish a natural compression-decompression mapping between visual and textual representations while offering quantitative evaluation metrics.”
  • “Our method achieves 96%+ OCR decoding precision at 9-10x text compression, ~90% at 10-12x compression, and ~60% at 20x compression on Fox benchmarks featuring diverse document styles (with actual accuracy being even higher when accounting for formatting differences between output and ground truth).”

As noted above, the experimental work appears to suggest that a compression ratio of 10x can at times be attained at 96% precision. If that could be done across the board, it would mean that, whereas a token window limit today might be 400,000 tokens, the limit could effectively be raised to 4,000,000 tokens, albeit at a 96% precision rate.
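
The back-of-the-envelope arithmetic is easy to check; in the sketch below, the window size is the GPT-5 figure cited earlier in this column, and the ratio/precision pairs are the ones reported in the paper’s excerpts above.

    # Back-of-the-envelope arithmetic: effective text capacity of a fixed window
    # at the compression ratios and precisions reported in the DeepSeek-OCR paper
    # (the 400,000-token window is the GPT-5 figure cited above).
    WINDOW_TOKENS = 400_000

    reported = [
        (10, 0.96),   # ~10x compression at 96%+ decoding precision
        (12, 0.90),   # 10-12x compression at roughly 90%
        (20, 0.60),   # 20x compression at roughly 60%
    ]

    for ratio, precision in reported:
        effective = WINDOW_TOKENS * ratio
        print(f"{ratio:>2}x compression -> ~{effective:,} effective text tokens "
              f"at about {precision:.0%} precision")

    # 10x compression -> ~4,000,000 effective text tokens at about 96% precision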

Precision at 96% may be tolerable or intolerable, depending on what the AI is being used for. You can’t get a free lunch, at least so far. A compression rate of 20x would be even better, though precision at 60% would seem rather unappealing. Still, there may be circumstances in which one might begrudgingly accept the 60% for the 20x boost.

Famed AI luminary Andrej Karpathy posted his initial thoughts online about this approach overall: “I quite like the new DeepSeek-OCR paper. It’s a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn’t matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images.” (source: Twitter/X, October 20, 2025).

Brainstorming Works

The research also tried out a wide variety of natural languages. This is yet another virtue of using images rather than pure text. As you know, there are natural languages that use pictorial characters and words. Those languages would seem especially well-suited to an image-based approach to tokenization.

Yet another intriguing facet is that we already have VLMs, namely AI that deals with visual images rather than text per se (i.e., vision-language models). We do not have to reinvent the wheel when it comes to doing likewise with LLMs. Simply borrow what has worked for VLMs and adapt it for use in LLMs. That’s using the whole noggin and leveraging reuse whenever possible.

The idea deserves acknowledgment and additional digging. I would not suggest running around and instantly proclaiming that all LLMs need to switch to this kind of approach. The jury is still out. We need more research to see how far this goes, along with understanding both the upsides and the downsides.

In the meantime, I believe we can at least make this bold statement: “Sometimes, a picture really is worth a thousand words.”

Source: Forbes.
