📢 Gate Square Exclusive: #PUBLIC Creative Contest# Is Now Live!
Join Gate Launchpool Round 297 — PublicAI (PUBLIC) and share your post on Gate Square for a chance to win from a 4,000 $PUBLIC prize pool
🎨 Event Period
Aug 18, 2025, 10:00 – Aug 22, 2025, 16:00 (UTC)
📌 How to Participate
Post original content on Gate Square related to PublicAI (PUBLIC) or the ongoing Launchpool event
Content must be at least 100 words (analysis, tutorials, creative graphics, reviews, etc.)
Add hashtag: #PUBLIC Creative Contest#
Include screenshots of your Launchpool participation (e.g., staking record, reward
Comprehensive Assessment of GPT Model Credibility: Unveiling Potential Vulnerabilities and Areas for Improvement
New Research on Comprehensive Assessment of GPT Model Credibility
A study jointly conducted by several top universities and research institutions comprehensively assessed the reliability of large language models like GPT. The research team developed a comprehensive evaluation platform and presented the related findings in the latest paper "Decoding Trust: A Comprehensive Assessment of the Reliability of GPT Models."
Research has uncovered some previously undisclosed vulnerabilities related to credibility. For example, the GPT model is prone to being misled into producing toxic and biased outputs, and it may also leak private information from training data and conversation history. Although GPT-4 is generally more reliable than GPT-3.5 in standard benchmark tests, it is more susceptible to attacks when faced with maliciously designed system prompts or user prompts, which may be due to GPT-4's stricter adherence to misleading instructions.
The research team conducted a comprehensive evaluation of the GPT model from eight perspectives of credibility, including robustness against adversarial attacks, toxicity and bias, privacy leakage, and so on. For example, when assessing robustness against text adversarial attacks, the researchers constructed three evaluation scenarios, including standard benchmark tests, performance under different task instructions, and vulnerability when facing more challenging adversarial texts.
Research also found that GPT models exhibit unexpected advantages in certain cases. For example, GPT-3.5 and GPT-4 are not misled by counterfactual examples added during demonstrations and may even benefit from them. However, providing anti-fraud demonstrations may mislead the model into making incorrect predictions about counterfactual inputs, especially when the counterfactual demonstrations are close to user inputs.
In terms of toxicity and bias, the GPT model shows little deviation on most stereotype topics in benign environments, but may be "tricked" into agreeing with biased content under misleading system prompts. GPT-4 is more susceptible to targeted misleading system prompts than GPT-3.5. The degree of bias in the model is also related to the demographic groups and stereotype topics mentioned in user prompts.
Regarding the issue of privacy leakage, research has found that GPT models may leak sensitive information from the training data, such as email addresses. In certain cases, utilizing supplementary knowledge can significantly improve the accuracy of information extraction. GPT models may also leak private information injected into the conversation history. Overall, GPT-4 is more robust in protecting personal identity information compared to GPT-3.5, but both models may leak various types of personal information when faced with privacy leakage demonstrations.
This study provides a comprehensive assessment of the reliability of GPT models, revealing potential vulnerabilities and areas for improvement. The research team hopes that this work will encourage more researchers to participate and work together to create more robust and trustworthy models.