DeepSeek in the App Store | Chinese AI Ranks First
Less than a month after its release, the Chinese AI startup DeepSeek has made waves with its chatbot app. Despite being offered for free, it stands as a formidable competitor to the established models from OpenAI. Now, marking a significant achievement, the DeepSeek iOS app has ascended to the top spot on the App Store charts. This article, brought to you by Orcacore, delves into the fascinating story of DeepSeek in the App Store and its implications for the AI landscape.
DeepSeek App Tops the App Store
The AI app DeepSeek has officially surpassed ChatGPT to claim the coveted title of the most downloaded free app on the App Store. This is no small feat, especially considering the market dominance of OpenAI’s flagship product.
The power behind the DeepSeek app lies in its DeepSeek-V3 model. According to the developers, this model is "the number one open-source model and rivals the most advanced closed-source models in the world." Since its launch in January, the app has resonated strongly with users, propelling it to the top of the charts. The rapid adoption of DeepSeek in the App Store signals a shift in the AI consumer market.

AI models, including ChatGPT and DeepSeek, require substantial computational power for training. This often translates to the use of advanced chips, such as Nvidia’s H100. However, geopolitical factors have come into play, with the US government expanding export bans on these high-performance chips to China since 2021.
The sanctions prevent China from directly accessing the H100. Instead, they rely on the H800, which offers a lower data transfer rate. Despite this limitation, DeepSeek researchers have confirmed that they utilized Nvidia’s H800 chips to train the DeepSeek-V3 model, with the total cost estimated at under $6 million. This impressive feat highlights the ingenuity and resourcefulness of the DeepSeek team.

The success of DeepSeek, despite hardware constraints, has sent ripples through the industry. Its app’s ascent to the top of the App Store has reportedly caused concern among major American companies and even led to a dip in US stock index futures. Conversely, the news of DeepSeek’s App Store dominance has boosted the shares of Chinese technology companies associated with DeepSeek, such as Iflytek. The rise of DeepSeek in the App Store showcases a new era of competition.
Adding to the intrigue, DeepSeek recently unveiled the open-source R1 model. Remarkably, this model outperformed OpenAI’s o1 reasoning model in certain benchmarks, despite costing 95% less to develop. This raises questions about the efficiency and cost-effectiveness of different AI development approaches.
The industry now awaits the response of American companies to this emerging Chinese competitor. The question is whether they will adjust their pricing strategies or explore alternative technological pathways to maintain their competitive edge. The competition spurred by DeepSeek in the App Store will be interesting to watch.
DeepSeek Removed from App Store in Italy
In a separate development, the Italian data protection authority has blocked DeepSeek in the App Store, citing concerns over a lack of clarity regarding how it handles personal data.
This action came shortly after DeepSeek launched its free AI assistant, boasting lower data usage and cost compared to its rivals. The app’s rapid rise to the top of the Apple App Store, surpassing ChatGPT, had already generated anxiety among tech stock market investors.

Pasquale Stanzione, head of the Italian data protection authority, stated: "The news of the removal of the app was published a few hours ago, but I cannot say whether this action was the result of our investigation or not." He further added: "The authority will launch an in-depth investigation to check whether DeepSeek complies with the European Union’s General Data Protection Regulation (GDPR)."
The Italian watchdog, Garante, will investigate the types of personal data collected from users, its source, the purposes for which it is used, the legal basis for its processing, and whether the data is stored in China.
DeepSeek and its affiliates have been given 20 days to respond to the questions.
Stanzione also emphasized the authority’s commitment to safeguarding minors, preventing algorithmic bias, and preventing electoral interference.
Italy has been proactive in AI monitoring. Two years prior, the country temporarily blocked ChatGPT due to potential breaches of EU privacy regulations.
Alternative Solutions to Hardware Limitations
As highlighted above, DeepSeek managed to train a powerful AI model despite limited access to cutting-edge hardware such as Nvidia’s H100 chips. While DeepSeek relied on the H800 as a workaround, here are two alternative strategies for overcoming hardware limitations in AI training:
1. Model Distillation and Quantization:
- Explanation: Model distillation involves training a smaller, more efficient "student" model to mimic the behavior of a larger, more complex "teacher" model. The teacher model, potentially trained on more powerful hardware, transfers its knowledge to the student model, which can then run on less resource-intensive hardware. Quantization further reduces the memory footprint and computational requirements of the model by representing weights and activations with lower precision (e.g., 8-bit integers instead of 32-bit floating-point numbers). This approach allows for deployment on edge devices or in environments with limited computing resources.
- Code Example (Conceptual, using PyTorch):
import torch
import torch.nn as nn
import torch.optim as optim

# Assume you have a trained "teacher" model (teacher_model) and a "student" model (student_model)
# Define a distillation loss function (a combination of KL divergence and cross-entropy)
def distillation_loss(student_output, teacher_output, targets, temperature=2.0, alpha=0.5):
    """
    Calculates the distillation loss.

    Args:
        student_output: Logits from the student model.
        teacher_output: Logits from the teacher model.
        targets: Ground truth labels.
        temperature: Softmax temperature for smoothing probabilities.
        alpha: Weighting factor between distillation loss and cross-entropy loss.

    Returns:
        The calculated loss.
    """
    # Soften the student and teacher outputs using the temperature
    student_log_prob = torch.log_softmax(student_output / temperature, dim=1)
    teacher_prob = torch.softmax(teacher_output / temperature, dim=1)

    # KL divergence between the softened distributions, scaled by temperature**2
    kl_loss = nn.KLDivLoss(reduction='batchmean')(student_log_prob, teacher_prob) * (temperature ** 2)

    # Cross-entropy loss against the ground truth labels
    cross_entropy_loss = nn.CrossEntropyLoss()(student_output, targets)

    # Combine the two losses
    return alpha * kl_loss + (1 - alpha) * cross_entropy_loss

# Training loop for the student model
optimizer = optim.Adam(student_model.parameters(), lr=0.001)
teacher_model.eval()  # The teacher is only used for inference
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        student_output = student_model(inputs)
        with torch.no_grad():  # No gradients are needed for the teacher
            teacher_output = teacher_model(inputs)
        loss = distillation_loss(student_output, teacher_output, targets)
        loss.backward()
        optimizer.step()

# After distillation, quantize the student model's Linear layers to 8-bit integers
quantized_student_model = torch.quantization.quantize_dynamic(
    student_model, {torch.nn.Linear}, dtype=torch.qint8
)
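To sanity-check the benefit of quantization, a minimal sketch like the one below compares the serialized size of the student model before and after dynamic quantization and runs a quick inference on the quantized copy. It assumes the student_model and quantized_student_model from the snippet above exist on the CPU; input_dim is a hypothetical placeholder for your input feature size.

import io
import torch

def serialized_size_mb(model):
    """Returns the size of a model's state_dict in megabytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / (1024 ** 2)

print(f"Student model (fp32): {serialized_size_mb(student_model):.2f} MB")
print(f"Quantized model (int8): {serialized_size_mb(quantized_student_model):.2f} MB")

# Inference with the quantized model works the same way as with the original
sample_input = torch.randn(1, input_dim)  # input_dim is a placeholder for your feature size
with torch.no_grad():
    prediction = quantized_student_model(sample_input).argmax(dim=1)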
2. Federated Learning:
- Explanation: Federated learning allows for training a global model across multiple decentralized devices (e.g., smartphones, edge servers) without directly sharing the training data. Each device trains a local model on its own data, and then only the model updates (e.g., gradients) are aggregated to update the global model. This approach can leverage the collective computational power of many devices, reducing the reliance on expensive, centralized hardware. It also addresses privacy concerns, as the raw data remains on the individual devices.
- Code Example (Conceptual, using a hypothetical federated learning framework):
# Assuming a federated learning framework is available (e.g., Flower, PySyft)

# On each client device:
def train_local_model(local_data):
    """Trains a local model on the client's data."""
    local_model = YourModel()  # Instantiate your model
    optimizer = optim.Adam(local_model.parameters(), lr=0.001)
    for inputs, targets in local_data:
        optimizer.zero_grad()
        outputs = local_model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()
    return local_model.state_dict()  # Return the model weights

# On the central server:
def aggregate_model_updates(client_updates):
    """Aggregates model updates from multiple clients."""
    global_model = YourModel()  # Instantiate your global model
    # Average the weights from all client updates (a simple aggregation method)
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            param_values = [client_update[name] for client_update in client_updates]
            averaged_value = torch.stack(param_values).mean(dim=0)
            param.copy_(averaged_value)  # Update the global model weights
    return global_model.state_dict()

# Federated learning loop (simplified)
for round_num in range(num_rounds):
    # Select a subset of clients for this round (e.g., at random)
    selected_clients = select_clients(all_clients, num_clients_per_round)

    # Train local models on the selected clients
    client_updates = []
    for client in selected_clients:
        client_updates.append(train_local_model(client.local_data))

    # Aggregate the model updates on the server
    global_model_weights = aggregate_model_updates(client_updates)

    # Distribute the updated global model to the clients (not shown in this snippet)
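The simple mean above treats every client equally. In practice, federated averaging (FedAvg) usually weights each client's contribution by the number of samples it trained on. A minimal sketch of that variant is shown below; it assumes each client also reports a local sample count alongside its state_dict (client_sample_counts is a hypothetical list, not part of the snippet above).

def weighted_aggregate(client_updates, client_sample_counts):
    """Averages client state_dicts weighted by local dataset size (FedAvg-style)."""
    total_samples = sum(client_sample_counts)
    aggregated = {}
    for name in client_updates[0]:
        # Weighted sum of each parameter tensor across clients
        aggregated[name] = sum(
            update[name] * (count / total_samples)
            for update, count in zip(client_updates, client_sample_counts)
        )
    return aggregated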
These alternative approaches offer viable pathways for AI development in resource-constrained environments. They allow companies to overcome hardware limitations and develop powerful AI models without relying solely on expensive, high-performance chips.