


Significantly surpassing SFT: the secret behind o1/DeepSeek-R1 also works for multimodal large models
Mar 12, 2025, 01:03 PM
Researchers from Shanghai Jiao Tong University, Shanghai AI Lab and the Chinese University of Hong Kong have released Visual-RFT (Visual Reinforcement Fine-Tuning), an open-source project that significantly improves the performance of large vision-language models (LVLMs) using only a small amount of data. Visual-RFT combines DeepSeek-R1's rule-based reinforcement learning approach with OpenAI's reinforcement fine-tuning (RFT) paradigm, extending the approach from the text domain to the visual domain.
By designing rule-based rewards tailored to tasks such as fine-grained image classification and object detection, Visual-RFT moves beyond the text and mathematical-reasoning domains to which the DeepSeek-R1 method had been confined, offering a new route for LVLM training.
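DeepSeek-R1's rule-based reinforcement learning is usually described in terms of Group Relative Policy Optimization (GRPO): several answers are sampled for the same prompt, each is scored by a rule, and each answer's advantage is its reward relative to the rest of the group. The article does not say which optimizer Visual-RFT uses, so the following is only a minimal sketch of that group-normalization step, assuming Visual-RFT follows the same recipe. The function name is hypothetical, and the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for responses sampled from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation is used here; some implementations
    use the sample estimate instead. eps avoids division by zero when
    every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one image-question pair, scored by a
# hypothetical rule-based reward (1.0 = correct, 0.0 = wrong).
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # values very close to [1, -1, 1, -1]
```

Because advantages are computed within each group, no learned value model is needed: correct answers are pushed up and wrong ones pushed down relative to their siblings, and a group in which all answers score the same contributes no learning signal.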
Advantages of Visual-RFT:
Compared with conventional supervised fine-tuning (SFT) on visual instruction data, Visual-RFT offers the following key advantages:
- Few-shot learning: effective fine-tuning with only 10 to 1,000 training samples.
- Stronger generalization: better performance than SFT when data is limited.
The researchers evaluated Visual-RFT on multiple visual perception tasks (detection, classification, grounding, etc.). The results show that Visual-RFT achieves significant performance gains and transfers its capabilities readily, even in open-vocabulary and few-shot settings.
The researchers designed verifiable rewards for each task: IoU-based rewards for detection and grounding tasks, and classification-accuracy-based rewards for classification tasks.
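The article names the reward types but not their exact formulas. The following is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) and that a multi-box detection reward greedily matches predicted boxes to ground-truth boxes and averages IoU. The function names and the matching scheme are hypothetical illustrations, not the project's code.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed coordinate format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(predicted: list[Box], ground_truth: list[Box]) -> float:
    """Hypothetical detection/grounding reward.

    Each predicted box (in output order) is greedily matched to the
    unmatched ground-truth box with the highest IoU. The summed IoU is
    divided by max(#predicted, #ground truth), so both missed objects and
    extra boxes lower the reward.
    """
    if not predicted and not ground_truth:
        return 1.0
    unmatched = list(ground_truth)
    total = 0.0
    for p in predicted:
        if not unmatched:
            break  # remaining predictions are false positives
        best = max(unmatched, key=lambda g: iou(p, g))
        total += iou(p, best)
        unmatched.remove(best)
    return total / max(len(predicted), len(ground_truth))


def classification_reward(predicted_label: str, true_label: str) -> float:
    """Classification reward: 1.0 if the labels match (case-insensitive), else 0.0."""
    return 1.0 if predicted_label.strip().lower() == true_label.strip().lower() else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
print(classification_reward(" Tabby Cat", "tabby cat"))  # 1.0
```

Both rewards are computed by fixed rules from the model's output and the labels, which is what makes them "verifiable": no learned reward model is involved, so they can be plugged directly into a group-relative update of the kind DeepSeek-R1 uses.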
In the reasoning grounding task, Visual-RFT demonstrates strong visual reasoning, for example accurately locating the waterproof goggles an athlete needs to wear in a picture.
Experimental results:
Experiments on Qwen2-VL-2B/7B models show that Visual-RFT outperforms SFT on open-vocabulary object detection, few-shot detection, fine-grained classification and reasoning grounding. Even for detecting a specific anime character (such as Slime), Visual-RFT needs only a small amount of data.
Open source information:
The Visual-RFT project is open source and includes training code, evaluation code and data.
Project address: http://miracleart.cn/link/ec56522bc9c2e15be17d11962eeec453